HumphrysFamilyTree.com

The genealogy site of Mark Humphrys



"Hypertext Indented Narrative" pedigree format:
Adapting the Burke's Peerage format for the Web

or:

How to draw indefinitely large family trees by hand


Mark Humphrys

Computers in Genealogy 7(1):26-46 (Mar 2000)




Abstract - This paper discusses how to add hypertext to the "Indented Narrative" pedigree format (which I refer to as "IN" format, and which is exemplified by the format used in Burke's Peerage). The result is a free-form, hand-drawable, hyperlinked data format which I call "Hypertext Indented Narrative", or "HIN" format. This paper analyses the merits of HIN and argues for it as a real alternative to the rigid and constraining databases that have dominated genealogy since the advent of the computer (but which in many ways represent a philosophy from the era of the PC, i.e. before the advent of the Web).
    It is argued that the main advantage of the IN format was its variable resolution - an advantage that is lost and unappreciated in the modern databases. It is argued that there were some basic problems with the IN format when implemented on paper, that prevented it reaching its full potential. These problems can be overcome with the simple and controlled use of hypertext.
    The result is a format that allows one for the first time to properly draw and maintain networks of arbitrarily-complex interconnected pedigrees entirely by hand. This is a real alternative to the usual practice of wrestling one's information into the limited pre-defined fields of some rigid database. The (much-underplayed) disadvantages of databases and the possible advantages of free-form formats are discussed. Finally, HIN does not have to be an alternative to databases: it is also argued that HIN could be automatically generated as one of the standard output formats of a database.

Keywords - Hypertext, Genealogy, Pedigree formats, Indented Narrative pedigree format, Burke's Peerage.

Note - No actual connection between this work and Burke's Peerage is intended or implied. Burke's Peerage is merely held up as a model of how to lay out an Indented Narrative pedigree on paper.

(I originally wanted to call this "Hypertext Burke's Peerage" format, but was advised against this by Computers in Genealogy (CiG).)












Part 1 - Introduction


1. Introduction

For centuries, humans have tried (and often failed) to draw large family trees on paper. Many have not even recognised the nature of the data structure they are trying to draw - insisting on treating it as a binary tree (going up) or an ordinary branching tree (going down), and regarding all the exceptions - multiple marriages, marriages between relations, marriages between step-relations, marriages to widows of relations, and resulting offspring of such marriages - as anomalies rather than as what they are - fundamental features of any large pedigree.

"Family trees" are not trees (they are networks)

Those of us who have worked with very large, interconnecting sets of pedigrees (such as the families that connect to the Peerage, or the Royal Houses of Europe) realise that family trees are not actually "trees" at all in the sense that such a data structure is defined in Computer Science (strictly branching, with no cross-links). Rather - and especially if you follow multiple surnames - they are multi-dimensional highly-connected networks, where arbitrary links may be made between widely separated points. The resulting many-dimensional data structure may not be drawable in any obvious way on 2-dimensional paper or on a 2-dimensional screen.
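To make the Computer Science distinction concrete, here is a minimal sketch in Python. The function name and the representation (a dictionary mapping each person to a list of their children) are my own illustrative assumptions, not any genealogy standard. It checks whether every descendant of a chosen ancestor is reached by exactly one line of descent; a single marriage between cousins is enough to make the answer false.

```python
def is_strict_tree(children: dict[str, list[str]], root: str) -> bool:
    """Return True if every descendant of `root` is reached by exactly one
    line of descent, i.e. the structure below `root` is a true tree."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        person = stack.pop()
        if person in seen:
            # Reached by a second line of descent: a cross-link.
            return False
        seen.add(person)
        stack.extend(children.get(person, []))
    return True


# Hypothetical family: A's grandchildren D and E are first cousins.
family = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
print(is_strict_tree(family, "A"))  # True

# D and E marry and have a child F, who now descends from A twice.
family["D"] = ["F"]
family["E"] = ["F"]
print(is_strict_tree(family, "A"))  # False
```

Note that no person is their own ancestor here, so there is no "cycle" in the everyday sense; the structure stops being a tree simply because F can be reached from A along two different paths.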

Before computers, genealogists struggled for centuries with formats in which to present this data on paper [1]. Many formats (e.g. almost any kind of graphical chart) were suitable only for a once-off presentation of the data, and could not cope as the family expanded in size and new connections were made. Many (perhaps even most) genealogists begin their research by drawing graphical charts - only to learn the hard way how impossible to maintain they are, as what is known about the family expands and changes without cease.

The Indented Narrative ("IN", or "Burke's Peerage") format on paper

Burke's Peerage, more than perhaps anybody else, were faced with the problem of maintaining large, ever-expanding pedigrees as the generations passed (some of their pedigrees have been running through edition after edition for 170 years now). As a result, they developed a way of laying out the pedigrees in an expandable format, where the addition of new generations did not require the rewriting of earlier ones, and where the addition of newly-discovered siblings and families in earlier generations did not require a complete re-drawing of the whole tree.

In Burke's, pedigrees are written as a continuous narrative, with successive generations indented. Complexity is managed by following only one particular line (the "head of the family") downwards. Of each new head they will say "of whom presently", and then illustrate the complete descendants of his siblings, before moving on to illustrate the family entirely below the new head (this is a primitive form of hyperlink, as I shall argue below). This general type of format is known as "Indented Narrative", and I shall refer to it as "IN" format for short. For an introduction to the IN format see [3], [6].

It could be argued that IN was the ultimate format for the paper era - though some problems clearly existed. For example, the use of the "head of the family" idea enforced a 1-dimensional structure that mapped poorly to very "bushy" trees, leading to difficult problems of visualisation. Also, the problem of cross-referencing and cross-linking still remained. Burke's could note that a spouse came from another pedigree in the collection, but it was hard to express where exactly in that other huge multi-page pedigree the person was to be found. Likewise, it is often not obvious where exactly in the text below the head of the family, "of whom presently", will be found. His siblings' descendants may go on for pages before we return to the head of the family. I will deal in more detail with the problems of the IN format below.


2. Computers in Genealogy

Despite the advent of computers in genealogy, this problem of visualisation or representation has not gone away. Computer databases allow the input of large, connected pedigrees. However, their representation on screen or paper is then done by automatic generation of some form of chart. What should this output look like?

1 person per page

The simplest, most completely flexible format, and very commonly done by database output programs, is to output 1 person per page, displayed on a computer where the user clicks to move to another person. Clearly this can handle any complexity of intermarriage.

For instance, in the Genealogy of the British Royal Family, presented by Brian Tompsett on the Web [7], the GEDCOM database presents its output as 1 person per web page. See for example William IV [8]. This is scalable to deal with the high complexity of the British Royal Family, as Tompsett demonstrates.

However, there are some problems. [The non-technical reader may wish to skip at this point to the section "The need for variable resolution" below.] If each person needs an actual file in the computer's file system, Tompsett's tree would need about 40,000 files (at time of writing). This number of separate files can cause problems for some computer file systems. Tompsett gets around this by using a CGI script (i.e. a program to build a web page on the fly) to build and display a temporary "file" at "run-time" (i.e. at the moment when the user requests to see the entry). As a result, the page presented does not actually represent a file in the file system, and the tree can be of unlimited size - while still retaining permanent URLs (addresses) that one can bookmark or link to. The JavaGED program [12], [13] implements a similar solution using the more complex Java technology. Incidentally, presenting data calculated at run-time via a CGI script (or Java) has other advantages - one can easily find out what pages link to your site - and one can even customise the data returned for different requests. See discussion by Tompsett [9].
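The essence of the run-time approach can be sketched in a few lines. This is not Tompsett's code (which I have not seen); it is a minimal Python illustration under my own assumptions: a hypothetical in-memory PEOPLE table stands in for the GEDCOM database, and the person is chosen by an id parameter in the URL's query string. No file exists for any individual; the page is assembled only when requested, yet a URL such as ?id=I2 stays stable and can be bookmarked.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for the database: one record per person.
PEOPLE = {
    "I1": {"name": "John Smith", "children": ["I2", "I3"]},
    "I2": {"name": "Mary Smith", "children": []},
    "I3": {"name": "Thomas Smith", "children": []},
}


def render_person(query_string: str) -> str:
    """Build the HTML for one person at request time from a query like 'id=I1'."""
    person_id = parse_qs(query_string).get("id", [""])[0]
    person = PEOPLE.get(person_id)
    if person is None:
        return "<p>No such person.</p>"
    # Each child becomes a link back into the same script with a new id.
    items = "".join(
        f'<li><a href="?id={cid}">{escape(PEOPLE[cid]["name"])}</a></li>'
        for cid in person["children"]
    )
    return f"<h1>{escape(person['name'])}</h1><ul>{items}</ul>"


print(render_person("id=I1"))
```

In a real CGI setting the web server would pass the query string to the script in the QUERY_STRING environment variable; here it is passed in directly so that the sketch runs on its own.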

1 person per "virtual page"

Another way of getting around the file system problem is to have 1 person per "virtual page", where there are many "virtual pages" on a single web page (which is a single real file). Each virtual page is marked by a label tag in the middle of the real web page on which it appears. In HTML, the language in which web pages are written, this looks something like   <A NAME=label>. The virtual page is then referenced by the URL (address):   file.html#label. This is probably the most common way of presenting GEDCOM data on the Web, exemplified by the Royal Family Tree site at Penn State University [14]. Such a site has advantages for the user in that, after a certain amount of browsing, he has probably already downloaded the file that contains somewhere within it the next individual he is going to click on (and so it can be retrieved from the browser cache on disk, instead of having to re-contact the site). This contrasts with Tompsett's scheme, where one must contact the web site (and suffer a small delay) at every step as one clicks on the next individual.
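A minimal Python sketch of how such a file might be generated follows. The function name, the filename royal.html and the (name, children) record layout are illustrative assumptions. HTML tag names are not case-sensitive, so the lowercase <a name=...> used here is the same label tag. Every person gets an anchor inside one real file, and every reference to a person becomes a file.html#label URL.

```python
from html import escape


def build_virtual_pages(filename: str, people: dict[str, tuple[str, list[str]]]) -> str:
    """Write many "virtual pages" into one real HTML file.

    `people` maps a label to (display name, list of children's labels).
    """
    parts = []
    for label, (name, child_labels) in people.items():
        # The label tag marks where this person's virtual page begins.
        parts.append(f'<a name="{label}"></a><h2>{escape(name)}</h2>')
        for child in child_labels:
            child_name = people[child][0]
            parts.append(f'<p><a href="{filename}#{child}">{escape(child_name)}</a></p>')
    return "\n".join(parts)


html = build_virtual_pages(
    "royal.html",
    {"p1": ("John Smith", ["p2"]), "p2": ("Mary Smith", [])},
)
print(html)
```

Because both people live in the one file, following the link from John to Mary needs no new download once royal.html is in the browser cache, which is exactly the advantage (and, as discussed next, the source of the clutter) of this scheme.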

However, the drawback with the Penn State scheme is that the user is often distracted and confused by the virtual pages above and below the virtual page he is trying to concentrate on. See for example William IV [15]. Better use may be made of bandwidth, but readability is clearly worse than in the Tompsett model.

The need for variable resolution

There are problems, however, with the readability of any 1 person per page scheme. The basic problem is that one does not know what lies behind a link. An individual may open up into a vast sub-tree of children and descendants. Or he may have died in infancy. One has no way of knowing before clicking the link. Consider, for example, where there is a long list of individuals in a very "thin" line of descent:


  1. Thomas, died in infancy.
  2. Philip, died unmarried.
  3. JOHN, born 1600, of whom presently.
  4. Edward, had issue:
    1. Andrew, had issue:
      1. Edward, had issue:
        1. Andrew, had issue:
          1. John, died in infancy.
          2. Andrew, died unmarried 1720.
  5. Michael, died young.
  6. Andrew, had issue:
    1. John, had issue:
      1. Andrew, had issue:
        1. John, had issue:
          1. Andrew, had issue:
            1. John, died unmarried 1755.  


The IN format (which is what I have just used, above) can show us very quickly and at a glance that 3 of the brothers left no descendants, 2 left very thin (and quickly extinct) lines, while just 1, namely John, left a large sub-tree of descendants, about which we shall discover later. If this information had been presented in a 1 person per page format however:


  1. Thomas.  
  2. Philip.
  3. John.
  4. Edward.
  5. Michael.
  6. Andrew.

then we have to make at least 6 clicks before even finding out who left descendants at all. The thin lines of Edward and Andrew's descendants would not be immediately obvious, but would require many clicks to bring out. And to make it worse, we would probably explore John first before exploring Edward or Andrew, not realising the vast sub-tree that we would be entering by doing so.

Having 1 person per page seems perfectly flexible, but forced into that format, this pedigree is clearly less readable than it is in the IN format. The IN format allows us to dispose quickly of short lines and thin lines, and to separate out the long, wide line and move it below to a new section. In this example, if the output program of the database could have automatically built a short narrative page illustrating all lines except John's, and then had just a single link to John's descendants instead of 6 links, the tree would have been much easier to visualise.
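One way such an output program might work is sketched below in Python. The rule is my own assumption, chosen only for illustration: count each child's descendants, write small sub-trees inline as indented narrative, and replace any sub-tree larger than a threshold (the hypothetical parameter max_inline) with a single link. Run on part of the family above, with John's large sub-tree invented for the example, it reproduces the IN layout with John reduced to one link.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    note: str = ""  # e.g. "died in infancy"
    children: list["Person"] = field(default_factory=list)


def count_descendants(person: Person) -> int:
    """Number of descendants (children, grandchildren, ...) below `person`."""
    return sum(1 + count_descendants(child) for child in person.children)


def narrative(parent: Person, max_inline: int, depth: int = 0) -> list[str]:
    """Render the parent's issue as indented narrative lines.

    Sub-trees with more than `max_inline` descendants become a single link
    (to be drawn on a separate page) instead of being written out inline.
    """
    indent = "  " * depth
    lines = []
    for number, child in enumerate(parent.children, start=1):
        entry = f"{indent}{number}. {child.name}"
        if count_descendants(child) > max_inline:
            lines.append(f"{entry} [link to separate page]")
        elif child.children:
            lines.append(f"{entry}, had issue:")
            lines.extend(narrative(child, max_inline, depth + 1))
        else:
            lines.append(f"{entry}, {child.note}." if child.note else f"{entry}.")
    return lines


# John's large sub-tree is invented here: 3 children with 4 children each.
john = Person("John", children=[
    Person(f"Child {k}", children=[Person(f"Grandchild {k}.{j}") for j in range(4)])
    for k in range(3)
])
edward = Person("Edward", children=[Person("Andrew", children=[
    Person("Edward", children=[Person("Andrew", children=[
        Person("John", "died in infancy"),
        Person("Andrew", "died unmarried 1720"),
    ])]),
])])
family = Person("Parent", children=[
    Person("Thomas", "died in infancy"),
    Person("Philip", "died unmarried"),
    john,
    edward,
    Person("Michael", "died young"),
])
print("\n".join(narrative(family, max_inline=10)))
```

Edward's thin line (5 descendants) stays inline while John's 15 descendants collapse into one link. Note that this simple count would count a child of a cousin marriage twice; a real implementation over a network would need to count distinct people.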


3. Problems with the IN format on paper

There are problems with the IN format, however. In each generation, we have a single person of whom we say "of whom presently". This works well when in each generation the other siblings are of little importance or leave relatively small sub-trees of descendants, in which case we can follow the head of the family briskly from generation to generation.

It does not work so well, however, when the other siblings leave enormous family trees in their own right. As an example, see the BLENNERHASSETT entry in Burke's Irish Family Records, 1976 [5]. On p.136, John Blennerhassett and Martha Lynne leave issue:


  1. JOHN, of whom presently.
  2. Robert, who leaves an enormous sub-tree filling 7 consecutive columns of text  
    .... after which we finally come to:  
  3. Thomas, and various daughters, after which we finally come back to the eldest son John.  

The entire pedigree of the family in Ireland is less than 13 columns, so the "diversion" of Robert's sub-tree is actually bigger than the entire rest of the genealogy! The difficulty of reading the presentation is increased considerably by the fact that Robert is not the youngest - so that when they are finally listed, Thomas and the daughters are far away from the parents and older brothers, and they seem out of place and disjoint. To make things worse, the sub-tree below Robert itself suffers from the same problems, with two enormous sub-trees opening up inconveniently under the 3rd and 6th sons.










Part 2 - Hypertext Indented Narrative


4. Hypertext Indented Narrative ("HIN") format

To convert an IN format pedigree to hypertext, first of all we convert this "of whom presently" statement into a hyperlink.

The problem, however, is that, because it is designed for paper, the IN format only allows us to separate off 1 person per generation, no matter how large the tree under a second candidate becomes. This constraint cannot be broken on paper, since otherwise we would have great difficulty illustrating where each of the 2 hyperlinks leads to in the text below (it is hard enough to tell with just 1 link as it is). It can, however, be broken on screen, where 2 hyperlinks can lead to 2 different destinations. For example, we can represent the family above as:


  1. John
  2. Robert
  3. Thomas, had issue:
    1. Various information.
  1. Mary.
  2. Alice.
  3. Lucy.

where the structure of the family is now clear, and the complexity of both John's and Robert's descendants is moved onto separate pages. To see this in greater detail, there is a hypertext implementation of this very tree on my website [17]. On Robert's page, we likewise have 2 branching points:


  1. Edward.
  2. Robert.
  3. John
  4. Conway.
  5. Thomas, had issue:
    1. Various information.
  6. Henry
  1. Catherine.
  2. Avice.
  3. Alice.
  4. Lucy.
  5. Anne.

to remove the complexity of John's and Henry's sub-trees, so that this generation can be properly visualised, including the daughters (who were otherwise lost in the tail end of all of Henry's data). See again my website [18]. For another example where 2 branching points greatly simplify the tree, see William Gibbon on my website [19].

What we are adopting here is a variable-resolution approach: Instead of strictly 1 link per person (most databases), or strictly 1 link per generation (IN), we adapt the number of links to the complexity of the pedigree. We try to show as much information as possible on one page, but whenever it gets too complex, we break the tree and move a sub-tree to a separate page. For instance the tree earlier:


  1. Thomas, died in infancy.
  2. Philip, died unmarried.
  3. John, born 1600 [ancestor of the main Co.Kerry family].
  4. Edward, had issue:
    1. Andrew, had issue:
      1. Edward, had issue:
        1. Andrew, had issue:
          1. John, died in infancy.
          2. Andrew, died unmarried 1720.
  5. Michael, died young.
  6. Andrew, had issue:
    1. John, had issue:
      1. Andrew, had issue:
        1. John, had issue:
          1. Andrew, had issue:
            1. John, died unmarried 1755.  

If we are restrained in our breaking of trees onto new pages, then we get all the benefits of the visualisation of a large area of the tree as in the IN format, combined with the hypertext ability to follow our own path through the pedigree.

Annotation

Note also here that with a hand-drawn format we can easily annotate the hyperlink to give some further idea of what is behind it - "[ancestor of the main Co.Kerry family]". If the pedigree is built automatically from a database, we may or may not be able to do this. With most current systems we cannot - there is no alternative to the 6 bland hyperlinks shown earlier.

However there is nothing in principle to prevent the database designer from allowing annotations of each link, or even from automatically displaying an indication of how much information lies below a link, as for example the Yahoo website does (thanks to Peter Christian for this suggestion). Indeed, I will discuss in the section "Automatically-generated HIN output" below how a database might automatically generate full variable-resolution HIN output itself.
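As a minimal sketch of the Yahoo-style indication (in Python; the function names and the children-mapping representation are illustrative assumptions), a link can carry both a hand-written note and an automatically computed count. Because a pedigree is a network, the count below collects distinct descendants, so that the child of a cousin marriage is not counted twice.

```python
from html import escape


def distinct_descendants(children: dict[str, list[str]], person: str) -> int:
    """Count each descendant once, even if reached by two lines of descent."""
    seen: set[str] = set()
    stack = list(children.get(person, []))
    while stack:
        descendant = stack.pop()
        if descendant not in seen:
            seen.add(descendant)
            stack.extend(children.get(descendant, []))
    return len(seen)


def annotated_link(person: str, href: str, children: dict[str, list[str]], note: str = "") -> str:
    """An HTML link followed by an optional note and a descendant count."""
    text = f'<a href="{escape(href)}">{escape(person)}</a>'
    if note:
        text += f" [{escape(note)}]"
    return text + f" ({distinct_descendants(children, person)} descendants)"


# Hypothetical data: John's grandchildren D and E (first cousins) marry;
# their child F descends from John twice but is counted once.
children = {"John": ["A", "B"], "A": ["D"], "B": ["E"], "D": ["F"], "E": ["F"]}
print(annotated_link("John", "john.html", children, "ancestor of the main Co.Kerry family"))
```

This prints John's link followed by the note and "(5 descendants)": A, B, D, E and F, with F counted only once.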




5. Features of the Hypertext Indented Narrative (HIN) format

Variable number of people per page (and relatively small number of pages)

With what I call a "Hypertext Indented Narrative" or "HIN" format, we don't have the problem of 40,000 pages, because whole areas of the tree collapse. Everyone who died young or unmarried, or even just left few descendants, does not need a separate page but can be included on a page with their parents. We can draw the entire pedigree in a relatively modest number of web pages, each an ordinary file in the file system.

Defining Permanent URLs

There is, admittedly, some difficulty in maintaining a permanent URL (or Web address) for an individual if sub-trees may be moved to new pages as the pedigree grows. For example, consider my page on Henry Herbert, the 2nd Earl of Pembroke [20]. This has the address:   Herbert/2nd.earl.html   - and this URL should always stand as the starting point for all of the 2nd Earl's descendants. As the pedigree has developed, the sub-tree of the 2nd Earl's son, the 4th Earl, has grown too big, and so has been broken off onto a further separate page:   Herbert/4th.earl.html   However his brother the 3rd Earl has a modest sub-tree and so is still just listed on the previous page with his father.

Now as the amount of information about the 3rd Earl increases, he may also be moved to his own sub-page. So we cannot actually guarantee that the URL   Herbert/2nd.earl.html   will always point directly to the 3rd Earl. However, we can say that at least it should not be hard to find him from this URL (he will be at most 1 click away).

The 3rd Earl has now moved to a separate page (1 click away).
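The "at most 1 click away" guarantee can be stated as a simple lookup. In this Python sketch the page contents and the file Herbert/3rd.earl.html are hypothetical (only Herbert/2nd.earl.html and Herbert/4th.earl.html come from the text above): the search checks the bookmarked page first, and then only the pages it links to directly.

```python
from typing import Optional

# Hypothetical site: whose full entries each page holds, and where it links.
PAGES = {
    "Herbert/2nd.earl.html": {
        "people": ["2nd Earl"],
        "links": ["Herbert/4th.earl.html", "Herbert/3rd.earl.html"],
    },
    "Herbert/4th.earl.html": {"people": ["4th Earl"], "links": ["Herbert/2nd.earl.html"]},
    "Herbert/3rd.earl.html": {"people": ["3rd Earl"], "links": ["Herbert/2nd.earl.html"]},
}


def locate(person: str, start: str) -> Optional[str]:
    """Return the page holding `person`'s entry, looking at most 1 click from `start`."""
    if person in PAGES[start]["people"]:
        return start
    for page in PAGES[start]["links"]:
        if person in PAGES[page]["people"]:
            return page
    return None  # more than 1 click away: the guarantee has been broken


print(locate("3rd Earl", "Herbert/2nd.earl.html"))  # Herbert/3rd.earl.html
```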

A 1 person per page scheme does, admittedly, lend itself more naturally to permanent URLs. There would be a unique URL for each individual, and this URL should be able to survive any re-arrangement of the pedigree, as Tompsett [7] has in fact demonstrated now for some years. With 1 person per "virtual page" schemes, though, trying to maintain permanent URLs may mean that individuals end up sharing "real" web pages with arbitrary unrelated individuals, see discussion in [21]. Permanent URLs are very important as an aid to bookmarking and linking - see Humphrys, 1999 [22]. See also [23].

Better use of bandwidth

The slowness of response of the Internet dominates its usability, and is likely to do so for some time to come - see Nielsen, 1997 [24] and 1998 [25]. Any scheme that makes better use of bandwidth is therefore of interest.

If you are interested in some individual, then you are probably interested in their surrounding family. In the Tompsett scheme of 1 person per page, you have to make a separate download for every individual in the family. It takes dedication, and many downloads (with all the associated Net delays), to visualise the surrounding family.

With the Penn State scheme, the Web page downloaded returns more than just 1 individual. Sometimes the surrounding family is found somewhere else on the same Web page, but this is not guaranteed. Sometimes the surrounding family is to be found elsewhere - and the rest of what was downloaded is simply irrelevant.

The HIN scheme makes better use of bandwidth than either of these. It downloads a reasonable chunk of information in one go, and that information is likely to be precisely the information the user would have downloaded next, with little to no other material. What is downloaded is not the target person in the middle of a large, confusing collection of other disconnected virtual pages, as in the Penn State scheme. Rather, what is downloaded is the target person shown embedded within the narrative structure in their correct place.

Easier to guess what lies behind a link

With a HIN format you also have some idea of what you are going to get when you click on an individual. The individual must have some large number of descendants, or at the very least some lengthy biography. Otherwise he would not be made into an extra link. He would be incorporated on the parent page. All rapidly terminating sub-trees get listed at a glance on the parent page. One does not know at a glance, of course, what is behind the remaining links, but one knows that these are all at least major new sub-trees.

In fact - as noted in the section "Annotation" above - because these are free-form, possibly hand-drawn trees, we may annotate the hyperlinks with any amount of small notes to summarise what lies behind them.

Indefinitely expandable

Clearly the HIN format, like its paper IN ancestor, is expandable to any imaginable pedigree complexity. As unusual relationships such as cousin and step-relation marriages increase the complexity of the pedigree, we adapt by splitting off more separate pages. For instance, see my page for Elizabeth Fitz-Hugh [26]. She marries twice, both times leaving descendants. I split these onto separate pages to make them more readable. Then her grand-daughter by one of these marriages, Elizabeth Cheyne [27], marries the son of her husband in the other marriage, the 2nd Baron Vaux [28]. To finally ruin us (if we had been trying to draw this as a graphic chart), their son marries the daughter of his own 2nd cousin.

In extreme cases, we may actually end up, for the sake of clarity, having to separate it into more or less 1 person per page. But this is only done at certain very difficult points. As the other parts of the tree settle down into a simpler structure with no cross-links, we increase the number of people per page again.

Marriages, Cross-linking and Cross-referencing

I find that to manage this complexity it is useful to strictly separate the surnames onto different pages. A female's birth and youth is found on the surname page for her parents. A male's birth and youth is on the surname page for his parents. However, after they marry, their issue and descendants are found in one place only - typically on the husband's page if the issue follow his surname. The wife's page then does not list the issue but rather contains a prominent hyperlink to her husband's page. (This is, of course, basically the same scheme as Burke's.) For example, here the male is on the Smith surname page:


  1. John Smith.
  2. Michael Smith, mar Alice Jones and had issue:  
    1. John Smith.
    2. Andrew Smith.
    3. Thomas Smith, had issue:
      1. John Smith.
  3. Catherine Smith.

The female is on the Jones surname page. But it does not list her issue, but rather contains a hyperlink to the Smith surname page, where her issue and descendants will be found:


  1. Thomas Jones.
  2. Alice Jones, mar Michael Smith and had issue.  
  3. Frank Jones, had issue:
    1. Andrew Jones.

For an example, see my pedigree of the Eagar family [29], who have densely intermarried with the Blennerhassett and allied families. No matter what the intermarriages, I strictly keep all the Eagars on the Eagar page, and all the Blennerhassetts on the Blennerhassett page, and extensively crosslink between them. Maintaining different surnames on separate pages helps us predict on which page an individual will be found.
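The rule can be written down as a small function. This Python sketch is my own formalisation, under the assumption that the surname the children bear decides where they are listed: the page for that surname gets the full list of issue, and the other spouse's surname page gets only a link to it. A marriage between two people of the same surname then collapses naturally to a single page.

```python
def place_issue(husband: str, wife: str, issue_surname: str) -> dict[str, str]:
    """Say what each surname page shows for a couple's children."""
    # By default both spouses' pages just point to where the issue are listed...
    pages = {
        husband: f"link to {issue_surname} page",
        wife: f"link to {issue_surname} page",
    }
    # ...and the page for the surname the children bear holds the full list.
    pages[issue_surname] = "full list of issue"
    return pages


print(place_issue("Smith", "Jones", "Smith"))
# {'Smith': 'full list of issue', 'Jones': 'link to Smith page'}
```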

It is of course easier to follow hyperlinks direct to the relevant page than to use the solutions on paper - where we just name the page (e.g. Foster, 1887 [30]), or just name an entire multi-page pedigree somewhere in the volume (e.g. Burke's), and leave it up to the reader to find the reference in it.

Internal and External Hyperlinks

Any modern author will wish to include in his pedigree hyperlinks to related material on the Web. These may be entirely for his own convenience (if the pedigree is offline) or for the added convenience of his readers. The question may arise of whether these "external" links will get confused with the main "internal" hyperlinks used to navigate through the tree.

There may also be internal hyperlinks for reasons other than to do with navigating through the tree. For instance, a casual reference to somebody else in the pedigree:


  1. Michael O'Connor,
    set up a company with John Smith,  
    mar Alice McCarthy and had issue:  
    1. John O'Connor.
    2. David O'Connor.

is clearly a different type of reference to a major "structural" link:


  1. Sarah Jones,
    mar John Smith and had issue.  

I have therefore found it useful to develop a system whereby all major structural links (marriage, parentage) are in bold, all minor cross-references are in normal type, and almost all external hyperlinks are in normal type, except for a few ones that essentially serve as external structural links. In this way, the reader knows that, on a page filled with hyperlinks, to follow the basic pedigree, he should follow the links in bold.
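In HTML this convention amounts to wrapping structural links in <b>. A minimal Python sketch follows; the kind labels ("marriage", "parentage", "cross-reference") are my own illustrative names, not a standard, and the few external structural links could be handled by adding a further kind to the set.

```python
from html import escape

# Link kinds that carry the pedigree itself and so are shown in bold.
STRUCTURAL_KINDS = {"marriage", "parentage"}


def render_link(href: str, text: str, kind: str) -> str:
    """Render a link, in bold if it is a structural link of the pedigree."""
    link = f'<a href="{escape(href)}">{escape(text)}</a>'
    return f"<b>{link}</b>" if kind in STRUCTURAL_KINDS else link


print(render_link("smith.html", "John Smith", "marriage"))
print(render_link("smith.html", "John Smith", "cross-reference"))
```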

I got the idea of having differently emphasised links from Luke Stevens' "Descent from Adam" site [31] - a site whose information may not be true, but whose use of colour and HTML is original and can be learnt from [32]. (In a far less extreme way, a similar situation holds with Burke's Peerage and The Complete Peerage. The latter is generally agreed to have a higher standard of scholarship (in that it references its sources), yet the former clearly has a much better idea of how to lay out family trees.)

This distinction between external and internal links is in fact an issue in general with Web pages. Many authors solve it by simply refusing to have any external links. Most of the family trees online at the moment (including most of the major sites) have almost no external links, and few internal cross-references either - all links are internal structural links. As Tompsett notes [11], some of this may be because of the difficulty of getting GEDCOM software to recognise the hyperlink object - which illustrates rather nicely the dangers of leaving yourself at the mercy of a database.

Side-notes, partial trees, and other additions that do not fit into any narrative or pedigree structure

Perhaps the real strength of HIN is that it easily allows many additions - images, separate image galleries, side notes, long biographies, short parish or building histories, historical background notes, local maps, lists of references, lists of related Internet sites, partial pedigrees and so on - that are difficult or impossible to force into the structure of an IN narrative on paper, or indeed into the structure of any database.

For instance, in my page on Charles I [33], I have a timetable of events, and a collection of notes about the origin of certain names. In my page on Arthur Gibbon [34], I have inline images, and a collection of Internet links related to (and only to) what is on that page. We can add any form of data or hyperlink, without waiting for the GEDCOM standard or database program to recognise it. If all these additions get too complex, simply start another page.

We can have pages for properties. In my page on Ballyseedy House [35], I could not decide under which of its many owners I should place the information. So I gave it its own web page, and hyperlink to and from all its various owners. Currently I make these into "structural" links - and therefore put them in bold type - though some may feel this risks confusion with the "main" structural links of marriage and parentage.

Finally, consider partial pedigrees. Often in a family we collect a number of partial and disconnected pedigrees which we hope one day to connect to each other. Trying to list all these partial families and stray disconnected individuals causes havoc with formats that expect the entire family to be connected - especially if we want to add explicit notes pointing to each fragment and suggesting theories as to how they might be connected. In HIN format, each partial pedigree would begin on the home page for that family, and then hyperlinks would continue them on separate pages. The home page can describe in any number of notes suggested theories for how the fragments might be related. See for example my pages for Humphrys [36], Gibbon [37], Blennerhassett [38] and Maltass [39].

Sources

Related to the above is the fact that because all of our references to sources are done with hyperlinks, the sources themselves can be distributed throughout the tree - they don't need to be all on one long "Sources" page. For example, if a biography of someone exists, I will list it on that person's page, and then link to that page when referencing that biography throughout the tree. See for example the 2 biographies of The O'Rahilly [40].

It seems to me, especially if there are hundreds, or thousands of sources, that it makes more sense to break them up into groups like this, rather than force them all to be in one vast list (as some of the database programs do). If you were only interested in 1 person on the pedigree (like The O'Rahilly), for example, this shows you at a glance what is the relevant bibliography.

A possible drawback, if sources can be scattered everywhere, is that someone making a printout of a sub-section of the site (e.g. someone interested in just 1 surname) may not know if he has printed out all of the pages on which referenced sources appear (or indeed other cross-links, or external links). This will be discussed further in the section on "Printouts" below.

Link to a new web page or just a "virtual" page?

"Moncreiffe's Family Records" at the Baronage Press [41] use a format similar to HIN, but in their narrative the statement "of whom presently" leads further down the same web page rather than to a separate web page. That is, it links to a "label" on the page, similar to the schemes in the section "1 person per virtual page" above.

Jumping to different parts of the same page does save on bandwidth, but is, it seems to me, normally more confusing than schemes where the link opens up a new web page. The Moncreiffe scheme would also not work very well if one wished to add images and side notes in each generation, as in the section "Side-notes, partial trees, and other additions" above. Moncreiffe's, it seems to me, are too faithful to the Burke's format in their layout.

There may be some room, though, for the use of labels in marriages and other cross-references. This will be returned to in the section "Linking to a label on a page" in "Future Work" below.

Maintenance

The main reason a graphical chart is impossible to maintain is because the spacing of each generation depends on the spacing of the generations above and below it. As information is added, sub-trees need to be moved left or right to match the generation above them. And we all too quickly run out of room altogether on the paper.

This is why Burke's adopted a narrative style, where new generations and siblings can be added indefinitely, without disturbing previous work. It is simple to edit such a narrative in any text editing program. Insert (or delete) siblings, and the rest of the family is simply moved down (or up) a few lines. In HIN, when a page gets crowded, it is easy to separate out a sub-tree to a new page. Simply select the lines of HTML code covering the sub-tree, Cut, open a blank page, Paste, and set up reciprocal links between the two. Moving large sub-trees around is a lot more cumbersome in a graphical chart.

Incidentally, in my case, on top of writing the pedigree by hand, I write the HTML code itself by hand, so not only do I not have to worry about the character-by-character spacing of the graphical chart, but, when I am cutting, pasting and moving my information around, I don't even have to worry about whitespace or new lines. The browser collapses any random whitespace or new lines found in the HTML.
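The collapsing rule can be roughly approximated in a few lines of Python. This is a sketch that handles ordinary flowing text only. It shows why carelessly laid-out text, left behind after cutting and pasting, still displays as one tidy line. The sample text is invented.

```python
import re

# Runs of HTML whitespace characters (space, tab, line feed, form feed, carriage return).
_WHITESPACE = re.compile(r"[ \t\n\f\r]+")


def rendered_text(text):
    """Roughly what a browser shows for ordinary flowing text: each run of
    whitespace becomes a single space. Text inside <pre>, or styled with
    white-space: pre, is not collapsed by browsers; that case is ignored here."""
    return _WHITESPACE.sub(" ", text).strip()


# Invented text between two tags, laid out carelessly after some editing.
messy = "John,   b. 1790,\n        d. unm.\n"
print(rendered_text(messy))
```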

Printouts

As mentioned in the section "Sources" above, the cautious computer genealogist will still consider how his pedigree will read when printed out.

Now that web browsers are universally available, it is probably no longer foolhardy to adopt a format that reads better on screen. But will the electronic version always be available? Genealogists, of all people, know what disasters can befall records over the years, and what is true for paper documents is even more so for electronic data. It is possible that printouts of the pedigree (on acid-free paper) are all that will survive in the long-term, long after the electronic data itself has been lost.

HIN is readable on screen and on paper, though it is always easier to follow on screen. If there are a lot of links it may be difficult to arrange the printed pages in some obvious serial order. But this, as discussed, is an inherent feature of the nature of the data structure itself, rather than a flaw in HIN.


I now provide "Contents" pages for each surname, in which I am forced to arrange the pages in some serial order. Sometimes, as with Herbert, this is easy enough. Sometimes, as with Blennerhassett, it is very difficult to find an obvious serial ordering.
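One mechanical way to get a serial order is a depth-first walk of the links, starting from the surname's home page. Below is a minimal Python sketch. It assumes the links of each page are known, in the order they appear on the page. The example is modelled loosely on the Gibbon pages described in the "Screen shots" section, and the page "Gibbon/william.son.html" is hypothetical.

```python
def serial_order(home, links):
    """Depth-first (preorder) listing of the pages reachable from a surname's
    home page, for use as a 'Contents' page or a printing order.

    links maps each page to the pages it links to, in the order the links
    appear on that page. A page already listed (for example via a back link
    or a marriage cross-link) is not listed again, so each page appears once.
    """
    order, seen = [], set()

    def visit(page):
        if page in seen:
            return
        seen.add(page)
        order.append(page)
        for target in links.get(page, []):
            visit(target)

    visit(home)
    return order


gibbon = {
    "Gibbon/": ["Gibbon/william.html", "Gibbon/arthur.html"],
    "Gibbon/william.html": ["Gibbon/", "Gibbon/william.son.html"],
    "Gibbon/william.son.html": ["Gibbon/william.html"],
    "Gibbon/arthur.html": ["Gibbon/"],
}
print(serial_order("Gibbon/", gibbon))
```

Such a walk guarantees that every reachable page is listed exactly once. It cannot make the order obvious, though. With a heavily cross-linked family like Blennerhassett, the result simply depends on which link happens to come first on each page.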


HIN also makes it simple for anyone browsing a HIN pedigree on the Web to make readable printouts of their own, using their browser's Print button. With many of the database-generated 1-person-per-page web sites (and especially the 1-person-per-virtual-page ones), making one's own printouts is much more problematic for the reader (unless the site author has provided special summary charts suitable for printing - which many, probably most, do not).

It could also be argued that, if you are trying to read the pedigree on paper, an IN narrative is easier to follow than any of the summary charts produced by databases. But again this argument predates the computer age, and many do not agree.



6. Discussion - You can draw large family trees by hand

In conclusion, it seems like heresy to say it in this computer age, but the fact is that you can draw large, interconnected pedigrees by hand, with no support other than a web page editor and browser. Many of us do not enjoy trying to wrestle our data into the pre-determined formats and rigid fields of someone else's database. We prefer to have total control over our data and layout ourselves. And now, because of hypertext, we can at last draw these structures by hand.

Hypertext provides a lot more than just the ability to link. It provides the ability to draw arbitrary n-dimensional data structures with connecting links between arbitrary points. Previously we could only draw simple 1- or 2-dimensional data structures. Some people (understandably) thought that 3 dimensions were the obvious next step, but Nielsen, 1998 [42] sums up how pointless that would be:

"Most abstract information spaces work poorly in 3D because they are non-physical. If anything, they have at least a hundred dimensions, so visualizing an information space in 3D means throwing away 97 dimensions instead of 98: hardly a big .. improvement"

Real progress had to await the development of a tool capable of dealing with arbitrary complexity - hypertext. Since family trees are precisely the type of data structure that is not 1- or 2- (or even 3-) dimensional, the recent invention of (or rather, universal access to) hypertext adds something genuinely new to the ancient debate about how to draw them.

Because of the Web, hypertext tools are now built into every modern computer system, but it is important to understand that your pedigree does not have to actually be on the Web to be written in hypertext. It can simply be a collection of pages on your local disk or CD-ROM, where the web browser views the "address":   file:///C:/My Documents/Family Tree/index.html   or   file:///D:/index.html   (or wherever the main contents page of your hypertext pedigree is located).
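Browsers will usually accept such an address typed with plain spaces. Strictly, though, a URL should percent-encode them. Below is a minimal Python sketch, using the standard urllib.parse.quote function, that builds the encoded form. The helper name local_address is an invention for illustration, and the sketch assumes a Windows-style drive letter as in the examples above.

```python
from urllib.parse import quote


def local_address(path):
    """Build a file:/// address for a page stored on a local Windows drive.

    Backslashes become forward slashes, and characters that are not allowed
    in a URL (such as the spaces in "My Documents") are percent-encoded.
    The drive colon and the slashes are kept as they are.
    """
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")


print(local_address(r"C:\My Documents\Family Tree\index.html"))
print(local_address("D:/index.html"))
```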

Each piece of data only appears in 1 place

Crucial to the ability to draw a large family tree by hand is that there should be as little duplication as possible. Information should only need to be written in one place, and never (or as rarely as possible) re-copied somewhere else. It can obviously be linked to from anywhere.

For instance, I described in the section "Marriages, Cross-linking and Cross-referencing" above how issue are only listed on the father's page. I have further developed a convention that, when a female marries, her life and her husband's life are written up as a joint life on his page, where their issue will be found and where both of their deaths will be recorded. For instance, in my page on Ellen Pigott [43], see how her life before marriage is recorded on her parents' page, and then her life after marriage is combined with that of her husband, on the page where their children are listed [44].

I have now moved Ellen's life before marriage to a separate page. But the point is made with many other females throughout my tree.

There can be some discussion about these rules - for instance, what if the female marries again after her husband's death? - but the principle remains that information should only be written once, and it should be obvious where it is to be found.

But how about sharing data?

One of the main arguments for a standard format like GEDCOM is that you can easily share your data with other people. Narratives written by hand, by contrast, are much harder to merge.

But this in itself is a rather old-fashioned, pre-Web attitude, from the days when everyone had PCs that were not connected to each other. Why should you take a soon-to-be-out-of-date copy of someone else's data, import it into your database on your PC, and then worry about keeping it up to date, importing their latest changes, and so on? Surely it makes more sense to just link to them?

Link, don't share

For instance, there are currently dozens (if not hundreds) of people and groups working on the pedigrees of the Royal Houses of Europe and publishing their results online. Sometimes their work is available as massive GEDCOM files to download, e.g. see the Penn State site [14]. Does it really make any sense to actually download this work (i.e. a snapshot of just one person's work at just one particular moment in time) and import it into your database?

For instance, in my case, all appearances to the contrary, I have no intention of ever engaging in any research on the English Royal Family. My pages on them are there purely to provide continuity with the rest of my pages, and simply to point to places online where further information can be found, where true specialists are working away and constantly updating their data.

Again, remember that links do not have to be actually online. Private hypertext pages on disk can contain embedded links to remote pages just the same as public pages on the Web can. For more on the much-neglected art of linking, instead of copying, see Humphrys, 1999 [22]. The issue of distributing the work in genealogy, and linking to each other, is discussed by Tompsett [10].

Disadvantages of databases

To continue this point, the reader may wonder whether databases have any disadvantages at all, and indeed why anyone should prefer to write their pedigree by hand. I have tried throughout this paper to show how, even (or perhaps especially) for a computer scientist, fear of databases is quite reasonable.

For many of us, committing to a database means a fear of having to interact with our data through a narrow set of pre-defined menus. Consider the problem of embedding hyperlinks in our data, discussed in the section "Internal and External Hyperlinks" above. We may understand what a hyperlink is, but we have to wait until the database understands before we can use them. Hence the impoverished linking of almost every genealogy online today.

A database must foresee every eventuality, in a way that the blank screen on which you hand-draw your narrative need not. Is that personal name too long to fit into the pre-defined field [11]? Does that person have too many wives [11]? (And if so, what do you do?) Did the database anticipate that someone might marry their own niece [45]? And so on. The discussion "Side-notes, partial trees, and other additions" above is a long list of additions and enhancements to our pedigree that may be difficult or impossible to construct through a database. When hand-drawing a narrative, these problems that database users wrestle with every day simply do not exist.

Am I arguing against using databases at all?

Am I arguing against using databases at all? Of course not. There are also disadvantages to writing pedigrees by hand. You miss out on all the dedicated software designed to process GEDCOM (and other format) databases - find date inconsistencies, generate summary charts, a surname index, and so forth. You can still have a search engine, only it will have to search the raw text of the web pages rather than being able to search a particular field, like, for example, "surname". On my site [16], I have a full search engine, but no surname index.
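The difference can be shown with a minimal sketch built on Python's standard html.parser module; the pages and their text are invented. A raw-text search strips the tags and looks for the term anywhere on the page. So it cannot tell a surname from a street name, which a search on a "surname" field in a database could.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def page_text(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements does not run together,
    # then collapse whitespace.
    return " ".join(" ".join(parser.parts).split())


def search(pages, term):
    """Names of pages whose visible text contains term (case-insensitive)."""
    term = term.lower()
    return sorted(name for name, html in pages.items()
                  if term in page_text(html).lower())


# Invented pages: only the Herbert page uses "Herbert" as a surname.
pages = {
    "Herbert/2nd.earl.html": "<p>Henry <b>Herbert</b>, 2nd Earl of Pembroke</p>",
    "Example/street.html": "<p>They lived in Herbert Street.</p>",
    "Example/other.html": "<p>Nothing relevant.</p>",
}
print(search(pages, "herbert"))
```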

In particular, it is easy to generate binary ancestor charts from a database. See for example the ancestors of the 8th Earl of Ormonde [47] in the genealogy of Paddy Waldron [46]. A quite impressive system is in operation at the online Ancestral File of the Church of Jesus Christ of the Latter-Day Saints [48]. See for example the ancestors of Edward III [49]. Another good example of an automated ancestors chart showing a large number of generations on screen at the same time (at some cost in readability) is the Ancestors of Paul B. McBride [50]. For a survey of GEDCOM-to-web-page convertors see [51]. For a survey of genealogy databases with native web-output see [52].

However, just because we want to use HIN doesn't mean we have to abandon databases. Which leads us to the next section.







Part 3 - Future Work and Conclusion


7. Future Work

Automatically-generated HIN output

It is nice to know that we can draw serious pedigrees by hand if, like me, you dislike interacting with databases. However, HIN can also be seen as just another possible output format for a database - an output format optimised for hypertext and variable resolution. It should in fact be possible to automatically build a Hypertext Indented Narrative set of web pages from any of the standard databases.

The main difficulty would be in learning when to break the tree and start a new page. Too much breaking and it is too much like 1 person per page. Too little breaking and it is too much like the paper IN format. I currently have a Computer Science student, Andrew Martin, working on a "GEDCOM 2 narrative" program [53] that will automatically build well-balanced HIN narrative output from a GEDCOM database. Defining reliable heuristics for breaking a new page is proving to be a challenge.
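To make the difficulty concrete, here is a toy page-breaking rule in Python. It is not Andrew Martin's program, whose heuristics are not described here. The rule is simply an assumed size threshold: a child whose sub-tree holds more than max_inline people gets a page of its own, and everything smaller stays inline. The names Robert, William and Arthur follow the Gibbon example in the "Screen shots" section; the other people are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    children: list = field(default_factory=list)


def tree_size(person):
    """Number of people in this person's sub-tree, including the person."""
    return 1 + sum(tree_size(c) for c in person.children)


def paginate(root, max_inline=20):
    """Split a descendant tree into pages using an assumed size threshold.

    A child whose sub-tree holds more than max_inline people is moved to a
    page of its own, leaving only a link line behind; smaller sub-trees
    (died young, died unmarried, short lines) are written inline.
    Returns {page head's name: indented lines on that page}; names are
    assumed to be unique. Sizes are recomputed naively, which is fine for a sketch.
    """
    pages = {}

    def write_page(head):
        lines = []

        def walk(person, depth):
            lines.append("  " * depth + person.name)
            for child in person.children:
                if tree_size(child) > max_inline:
                    lines.append("  " * (depth + 1) + child.name + " (see own page)")
                    write_page(child)
                else:
                    walk(child, depth + 1)

        walk(head, 0)
        pages[head.name] = lines

    write_page(root)
    return pages


robert = Person("Robert", [
    Person("William", [Person("W1"), Person("W2"), Person("W3")]),
    Person("Mary"),
    Person("Arthur", [Person("A1", [Person("A1a")]), Person("A2")]),
    Person("Jane"),
])
for head, lines in paginate(robert, max_inline=3).items():
    print(f"== page for {head} ==")
    print("\n".join(lines))
```

Even this toy shows why reliable heuristics are hard. A single threshold ignores where a natural break in the family would fall, and a sub-tree just under the limit stays inline however long it makes the page.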


Hypertext version of Burke's Peerage itself

It should of course be possible to write a program to automatically generate HIN output from the electronic source of the most prominent user of an IN format, the current Burke's Peerage itself [2], for presentation on a website or CD-ROM (it is assumed that there is an electronic source [4]).

Burke's Peerage has now gone online, but has not used any hypertext format. Instead it is just the paper format published electronically, as shown by this sample page.


Linking to a label on a page

In the case of the 3rd Earl of Pembroke [20], described in the section "Defining Permanent URLs" above, to link to the 3rd Earl I simply link to the page on which he is found:   Herbert/2nd.earl.html  

Sometimes it can take a few moments to see where someone appears on a multi-individual page, though. Possibly I should use a label tag on the page, and link to:   Herbert/2nd.earl.html#3rd.earl   I cannot decide if this aids usability (by jumping direct to the person) or hinders it (by obscuring the context that surrounds the person). Churchyard [54] discusses the problem of how browsers display links to labels on a page.

This is not a good example any more since the 3rd Earl has now moved to his own page. Presumably the point is still clear though.


File Naming

Currently I have developed a system where the "home page" for each surname I cover is of the form:   FamilyTree/Surname/   and if the pedigree for that surname spills over onto more than 1 page, it will be continued on pages with names like:   FamilyTree/Surname/john.of.kilcash.html   for the sub-tree descending from John of Kilcash, or:   FamilyTree/Surname/5th.viscount.html   for the sub-tree descending from the 5th Viscount. That is, URLs that I hope fully describe what will be found there, and that I hope will stay working as the tree grows and changes. Whereas with an address like:   FamilyTree/41639.html   one has a lot less confidence that this will always point to the same thing (what if the database reorganises, and re-numbers its pages?).
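Names of this kind are easy to normalise mechanically once the description has been chosen. Below is a hypothetical Python helper, not part of the author's tooling. It lower-cases the words, drops apostrophes and other punctuation, and joins the words with dots. It handles ASCII letters and digits only.

```python
import re


def page_name(description):
    """Turn a short description into a file name in the style used above:
    lower-case words joined by dots, apostrophes and other punctuation dropped.
    (Hypothetical helper; ASCII letters and digits only.)"""
    text = description.lower().replace("'", "").replace("\u2019", "")
    words = re.findall(r"[a-z0-9]+", text)
    if not words:
        raise ValueError("description has no usable words")
    return ".".join(words) + ".html"


for d in ["John of Kilcash", "5th Viscount", "2nd Earl", "The O'Rahilly"]:
    print(d, "->", page_name(d))
```

The helper only normalises spelling. Choosing a description that stays meaningful as the tree grows, and that is unique within the surname's directory, is still the author's job.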

I am not sure this is the ideal Web site file structure, however. For instance, what if you are dealing with ancient families, before surnames? There are many other minor implementation issues. For instance, what if I want to include a brief sketch of an in-law's family? Currently I would use the address:   FamilyTree/Surname/othersurname.html   but other schemes could be imagined. Tompsett [10] discusses naming conventions in URLs.

Automatic numbering 1, (1), 1a, 1b, ..

Currently I number the offspring by using the "ordered list" <ol> tag in HTML, and then listing each child after a "list item" <li> tag. This automatically numbers the list from "1." to "n." without the author having to keep track of the numbers. Even better, the numbers are automatically recalculated as siblings are added to or removed from the list.

Now it would be nice to be able to replicate the system used in Burke's Peerage, where successive indented generations are numbered   1, 1, (1), 1a, 1b, etc. That is, we automatically change the numbering scheme depending on how many levels we have recursed into. [The non-technical reader may wish to skip at this point to the section "Conclusion" below.]

Currently in HTML the <ol> tag supports the numbering schemes 1, I, i, A or a. We can write a style sheet like the following, called   hyper.css:


  ol { list-style-type: decimal }
  ol ol { list-style-type: lower-roman }
  ol ol ol { list-style-type: lower-alpha }   
  ol ol ol ol { list-style-type: decimal }

put the following line in the   <HEAD>   section of each web page:


  <LINK rel="stylesheet" type="text/css"   
        href="hyper.css" title="hin">

and then all subsequent use of the <ol> tag will have the following properties. It will automatically use the numbering 1,2,3,4,... for the first level, i,ii,iii,iv,... for the next level, a,b,c,d,... for the next level, and 1,2,3,4,... for all lower levels. The style sheet can be altered to implement other combinations of the basic schemes 1, I, i, A and a.

We are nearly there, but it is still not quite the same as Burke's. The Burke's format would be 1,2,3,4,... for the first level, 1,2,3,4,... for the next level, (1),(2),(3),(4),... for the next level, 1a,2a,3a,4a,... for the next level, 1b,2b,3b,4b,... for the next level, and so on. It seems to me that Burke's, by keeping the decimal number throughout, is still the more readable format. It is not clear to me how such a scheme may be implemented in HTML at present (apart from just using <UL type="none"> and numbering them all by hand). We want to use the decimal value calculated by the <ol> tag, but append "a" or "b" to it. Perhaps there is some way of doing this.
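One possible workaround assumes that the pedigree is kept as nested data and the HTML is generated by a small script rather than typed by hand. The script can then compute the Burke's labels and write them into un-numbered lists. Below is a minimal Python sketch of that idea. It follows the scheme exactly as described above: 1, 1, (1), 1a, 1b, and so on. The trade-off is that the labels become fixed text, so adding a sibling means regenerating the page instead of relying on the browser's automatic renumbering. CSS counters, using counter-reset, counter-increment and generated content, may offer a pure style-sheet route; that route is not explored here.

```python
def burke_label(depth, n):
    """Label for the n-th sibling (n >= 1) at indentation depth `depth`
    (0 = outermost), following the Burke's scheme as described above:
    1, 1, (1), 1a, 1b, 1c, ..."""
    if depth < 0 or n < 1:
        raise ValueError("need depth >= 0 and n >= 1")
    if depth <= 1:
        return str(n)
    if depth == 2:
        return f"({n})"
    suffix = depth - 3              # depth 3 -> "a", depth 4 -> "b", ...
    if suffix >= 26:
        raise ValueError("deeper than the single-letter suffixes allow")
    return f"{n}{chr(ord('a') + suffix)}"


def render(siblings, depth=0):
    """Write nested un-numbered HTML lists with the labels filled in.
    siblings is a list of (text, children) pairs."""
    html = ['<ul style="list-style-type: none">']
    for n, (text, children) in enumerate(siblings, start=1):
        item = f"<li>{burke_label(depth, n)} {text}"
        if children:
            item += render(children, depth + 1)
        html.append(item + "</li>")
    html.append("</ul>")
    return "".join(html)


family = [("A", [("B", [("C", [("D", [("E", [])])])])])]
print(render(family))
```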


8. Conclusion

The Burke's Peerage format was the product of well over a century of hard experience, and contains within it all the basic ideas of hypertext, that simply had to wait for recent technology to come along.

Hypertext gives the Burke's IN format a new lease of life, and brings out one of its lost and forgotten strengths - variable resolution. A Hypertext Indented Narrative allows us to see at a glance what the major components of the pedigree are, and gives us, for the first time in history, the ability to draw these structures properly by hand - a welcome alternative to having to enter them into some inflexible database.

It may surprise many in this age of databases that you can still draw a multi-thousand member family tree by hand. You can even start with a single page and just start writing, splitting off extra pages if and when the need arises as your tree grows. Unlike a graphical chart, you never have to rewrite or redraw any of it as it grows. Unlike a database, you are free to insert pictures, small histories, hyperlinks and diversions of all sorts at any point whatsoever.

One could not recommend constructing pedigrees in this format, of course, were it not for the fact that hypertext browsers are now so universally available, and also the confidence that, if necessary, the HIN format can still generate a reasonably comprehensible printout - for long-term archives, and for those readers without any computer equipment at all.





References

[1]   Christian, Peter (1996), Genealogical Publishing on the World-Wide Web, Computers in Genealogy 5(9): 381-7. Online at: http://www.sog.org.uk/cig/vol5/509christian.html

[2]   Burke's Peerage and Baronetage, 106th edn, 1999.

[3]   "Reader's Guide", Burke's Peerage and Baronetage, 1999, pp. il-li.

[4]   Review of Burke's Peerage and Baronetage, 1999, The Baronage Press, http://www.baronage.co.uk/bphtm-01/books-04.html

[5]   Burke's Irish Family Records, 1976.

[6]   "Guide to the Reader", Burke's Irish Family Records, 1976, pp. xxvii-xxix.

[7]   Genealogy of the British Royal Family, presented by Brian Tompsett on the Web at the University of Hull, at: http://www.hull.ac.uk/php/cssbct/genealogy/royal/

[8]   William IV, in the Genealogy of the British Royal Family, presented by Brian Tompsett on the Web at: http://www.hull.ac.uk/php/cssbct/cgi-bin/gedlkup.php/n=royal?royal00203

[9]   Discussion of the GEDCOM 2 CGI software used to present the Genealogy of the British Royal Family, Brian Tompsett, http://www.hull.ac.uk/php/cssbct/genealogy/SoftwareUsed.html

[10]   Problems of distributed Genealogical Databases, Brian Tompsett, http://www.hull.ac.uk/php/cssbct/genealogy/DistributedGenealogy.html

[11]   Limitations of the GEDCOM to Web Experiment, Brian Tompsett, http://www.hull.ac.uk/php/cssbct/genealogy/Limitations.html

[12]   Cooper, Chris Shearer (1998), Display Your Family on the Web With JavaGED, Journal of Online Genealogy, Vol. 2, No. 9 (March 1998). Online at: http://www.onlinegenealogy.com/archive/mar98/sof006l.htm

[13]   JavaGED Genealogical Display System, http://www.sc3.net/JavaGEDHome.html

[14]   Genealogy of the British Royal Family, on the Web at Penn State University, at: http://ftp.cac.psu.edu/~saw/royal/


[15]   William IV, in the Genealogy of the British Royal Family, on the Web at Penn State University, at: http://ftp.cac.psu.edu/~saw/royal/r07.html#I203

Comparisons:


[16]   History and Genealogy Web pages of Mark Humphrys, http://humphrysfamilytree.com/

[17]   Descendants of John Blennerhassett, of Ballycarty, Co.Kerry, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/john.ballycarty.html

[18]   Descendants of Capt. Robert Blennerhassett, of Castle Conway, Co.Kerry, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/robert.conway.html

[19]   Descendants of William Gibbon, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Gibbon/william.html

[20]   Descendants of Henry Herbert, 2nd Earl of Pembroke, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Herbert/2nd.earl.html

[21]   Discussion of problems with permanent URLs when using the GED2HTML program, Eugene W. Stark, http://www.gendex.com/ged2html/3.0/linking.html

[22]   Humphrys, Mark (1999), Why on earth would I link to you?, The Irish Times, 15th Feb 1999. This is online at: http://computing.dcu.ie/~humphrys/why.link.html

[23]   Howells, Mark (1998), Link Rot in your Family Tree?, Computers in Genealogy 6(8): 365-72. Online at: http://www.oz.net/~markhow/writing/linkrot.htm

[24]   Nielsen, Jakob (1997), The Need for Speed, The Alert Box, 1st Mar 1997. Online at: http://www.useit.com/alertbox/9703a.html

[25]   Nielsen, Jakob (1998), Nielsen's Law of Internet Bandwidth, The Alert Box, 5th Apr 1998. Online at: http://www.useit.com/alertbox/980405.html

[26]   Family of Elizabeth Fitz-Hugh, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/FitzHugh/

[27]   Family of Elizabeth Cheyne, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Parr/

[28]   Family of 2nd Baron Vaux, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Vaux/

[29]   Eagar family tree, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/eagar.html

[30]   Foster, Joseph (1887), The Royal Lineage of Our Noble And Gentle Families, London.

[31]   "Descent from Adam" site, Luke Stevens, http://www.geocities.com/Athens/Aegean/2444/descent.htm

[32]   Legend of "Descent from Adam" site, Luke Stevens, http://www.geocities.com/Athens/Aegean/2444/legend.htm

[33]   Descendants of Charles I, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Royal/charles.i.html

[34]   Descendants of Arthur Gibbon, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Gibbon/arthur.html

[35]   Ballyseedy House, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/ballyseedy.new.html

[36]   Humphrys family tree, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Humphrys/

[37]   Gibbon family tree, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Gibbon/

[38]   Blennerhassett family tree, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/robert.ballycarty.html

[39]   Maltass family tree, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Maltass/

[40]   Family of The O'Rahilly, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/ORahilly/the.orahilly.html

[41]   "Moncreiffe's Family Records" at the Baronage Press, http://www.baronage.co.uk/bphtm-01/aboutmfr.html

[42]   Nielsen, Jakob (1998), 2D is Better Than 3D, The Alert Box, 15th Nov 1998. Online at: http://www.useit.com/alertbox/981115.html

[43]   Family of Ellen Pigott, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Pigott/patrick.html

[44]   Descendants of Stephen O'Mara and Ellen Pigott, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/OMara/stephen.html

[45]   Harman Blennerhassett, presented by Mark Humphrys on the Web at: http://humphrysfamilytree.com/Blennerhassett/harman.html

[46]   Genealogy of Paddy Waldron, formerly at: http://www.bess.tcd.ie/genealog.htm. Now here.

[47]   The 8th Earl of Ormonde, in the Genealogy of Paddy Waldron, formerly at: http://pwaldron.bess.tcd.ie/ged2html/I4530.html. Now here.

[48]   Ancestral File, Church of Jesus Christ of the Latter-Day Saints (LDS), http://www.familysearch.org/

[49]   Ancestors of Edward III, LDS Ancestral File, http://www.familysearch.org/Eng/Search/af/pedigree_view.asp?recid=7250172

[50]   Ancestors of Paul B. McBride, http://homepages.rootsweb.com/~pmcbride/rfc/p1.htm

[51]   Survey of GEDCOM-to-web-page convertors, Mark A. Knight, http://help.surnameweb.org/knight/

[52]   Survey of genealogy databases with native web-output, http://www.geocities.com/Heartland/Acres/7002/gc2gedcom.html

[53]   "GEDCOM 2 narrative" project, by Andrew Martin, Dublin City University, http://www.geocities.com/am97395331/

[54]   Discussion of how browsers display links to labels on a page, Henry Churchyard, http://www.pemberley.com/janeinfo/pridprej.html#useppjaht



Screen shots

Some good screen shots to illustrate HIN format:
The descendants of Capt. Robert Gibbon of Aberdeen
Illustrates that we can dispose of all died young, died unmarried, etc., and all short trees, on 1 page, and only need to branch off 2 sub pages for the 2 large sub-trees of William and Arthur. Also illustrates that because it is free-flow, hand-written, I can easily include portraits, modern photos, an entire Regency diary as a linked page, links to external sites (e.g. a map) and even a nice quote along the bottom.

Frank Flanagan
Illustrates a long biography and a short list of descendants.

Dick Humphreys
A long biography and a longer list of descendants.

Cashel of Tralee, Co.Kerry
Illustrates where we have a number of fragmentary family trees that may or may not be connected to each other - difficult to show if your database expects your tree to be connected. Also illustrates what the home page for a surname looks like under my scheme. Lots of background notes. Connected sub-trees then launch off on separate sub-pages. Also illustrates some linking to various sources, e.g. the NLI.

Provost Skene's House, Aberdeen
Illustrates a page for a property. This property was owned by a number of different families. Instead of trying to decide on which of their pages to put the information about it, I give it its own page, and hyperlink back and forth. I also show on this page what happened to it before and after all of these owners.

Ballyseedy House, Co.Kerry
Another property page, with different owners hyperlinked back and forth. Also shows what happened to it before and after all these owners.

Followup paper

There is a followup paper: Humphrys, Mark (2000), Hypertext for 1-name studies v. Hypertext for family histories: A reply to John Bending, Computers in Genealogy 7(3):121-8 (Sept 2000).



