

Mendel/SimWalk2: Pedigree Data File Format

Summary:

This file contains the data specific to each pedigree and individual. For example, for each individual one provides their name, their parents’ names (if they are in the pedigree), their sex, and the phenotype or genotype at each locus for which they are typed.

The pedigree file and the locus file must be perfectly coordinated in the sense that the phenotype fields for individuals must match exactly the order of the loci in the locus data file.

Simple Format Descriptors:

Fortran uses the following format codes, also called descriptors, to describe data: (A) is used for character data, (I) for integer data, (F) for numbers with decimals, and (X) for blank spaces. For example, (A8) specifies a word of length eight characters, (I2) specifies an integer occupying two spaces, (F8.5) specifies that the following eight spaces contain a number with five digits after the decimal point, and (1X) specifies a single blank space. A count placed in front of a descriptor repeats it, so (3A8) specifies three consecutive words of eight characters each.
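The descriptors above can be expanded mechanically into a list of field widths. The following Python sketch is only an illustration and is not part of SimWalk2 or Mendel; the function name parse_format is hypothetical, and the sketch assumes a flat format line with no nested parentheses and only the A, I, F, and X descriptors.

```python
import re

def parse_format(fmt):
    """Expand a simple Fortran format such as '(3A8,2A1,1X,A8)' into a
    list of (code, width) pairs, one pair per field.

    Assumption: the format is flat (no nested groups) and uses only the
    A, I, F, and X descriptors described in the text.
    """
    body = fmt.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("format must be enclosed in parentheses")
    fields = []
    for item in body[1:-1].split(","):
        item = item.strip().upper()
        m = re.fullmatch(r"(\d*)([AIFX])(\d*)(?:\.(\d+))?", item)
        if m is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        count, code, width, _decimals = m.groups()
        if code == "X":
            # For X the leading number is the number of blanks: 1X = one blank.
            fields.append(("X", int(count or 1)))
        else:
            if not width:
                raise ValueError(f"descriptor needs a width: {item!r}")
            # A leading number repeats the descriptor: 3A8 = A8,A8,A8.
            fields.extend([(code, int(width))] * int(count or 1))
    return fields

def record_width(fields):
    """Total number of columns covered by the expanded format."""
    return sum(width for _code, width in fields)

if __name__ == "__main__":
    for code, width in parse_format("(I5,1X,A8)"):
        print(code, width)
```

Expanding the format first keeps the column arithmetic in one place, which makes it easier to confirm that the format line and the data lines agree.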

Pedigree Data File Format:

The pedigree file contains records for pedigrees and records for individuals within these pedigrees. A pedigree record specifies the number of people in a pedigree and an ID for the pedigree. Pedigree IDs need not be unique and may be left blank. An ID can have a maximum of eight characters. After a pedigree record come the individual records, one for each person in the pedigree. Each individual record provides, in order: an individual ID, the IDs of the individual's parents if present in the pedigree, his/her sex, identical twin status, all his/her relevant phenotypes, and then any liability classification and quantitative variables.

The term pedigree has certain technical connotations. To reconstruct the relationships between individuals properly, people who are dead or otherwise unavailable for study must often be included. One rule is important to keep in mind: either both parents or neither parent of a person must be listed in the pedigree. Those people without parents in the pedigree can be thought of as founders of the pedigree.
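The both-or-neither rule is easy to check before running an analysis. The Python sketch below is an illustration only; the function names check_parents and founders and the tuple layout (ID, first parent, second parent) are assumptions made for this example, with a blank parent represented by an empty string.

```python
def check_parents(individuals):
    """Return a list of problems found in one pedigree.

    individuals: list of (id, parent1, parent2) tuples, where a parent
    absent from the pedigree is given as an empty string (hypothetical
    layout chosen for this sketch).
    """
    ids = {ind_id for ind_id, _p1, _p2 in individuals}
    problems = []
    for ind_id, p1, p2 in individuals:
        # Exactly one parent listed violates the both-or-neither rule.
        if bool(p1) != bool(p2):
            problems.append(f"{ind_id}: only one parent listed")
        # A named parent must itself have a record in the same pedigree.
        for parent in (p1, p2):
            if parent and parent not in ids:
                problems.append(f"{ind_id}: parent {parent} not in pedigree")
    return problems

def founders(individuals):
    """People with neither parent listed are the founders."""
    return [ind_id for ind_id, p1, p2 in individuals if not p1 and not p2]

if __name__ == "__main__":
    pedigree = [
        ("FATHER", "", ""),
        ("MOTHER", "", ""),
        ("CHILD1", "FATHER", "MOTHER"),
        ("CHILD2", "FATHER", "MOTHER"),
    ]
    print(check_parents(pedigree))
    print(founders(pedigree))
```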

Preceding the pedigree and individual records there should be two Fortran format lines for reading in the data. The first format line tells how to read the pedigree records. It should consist of an integer format (I) for reading the number of individuals in a pedigree and a character format (A) for reading the pedigree ID. (I5,1X,A8) gives a typical example of this format line. The 1X format allows skipping one space between the two fields.
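As a rough illustration of how a pedigree record is split under a format such as (I5,1X,A8), consider the following Python sketch. It is not the reader used by SimWalk2; the function name read_pedigree_record and its parameters are hypothetical, and the sketch assumes that blanks inside the integer field are ignored and that blanks surrounding the pedigree ID can be trimmed.

```python
def read_pedigree_record(line, count_width=5, skip=1, id_width=8):
    """Split one pedigree record into (number of individuals, pedigree ID).

    The defaults correspond to the format (I5,1X,A8). Assumptions of this
    sketch: blanks in the integer field are ignored, and the ID is
    returned with surrounding blanks trimmed.
    """
    total = count_width + skip + id_width
    # Pad short lines so that a blank (unnamed) pedigree ID reads as "".
    line = line.rstrip("\n").ljust(total)
    count = int(line[:count_width])
    start = count_width + skip
    ped_id = line[start:start + id_width].strip()
    return count, ped_id

if __name__ == "__main__":
    # Format (I5,1X,A8): five columns for the count, one blank, then the ID.
    print(read_pedigree_record("    4 Ped1"))
    # Format (I2,A8): no blank is skipped between the two fields.
    print(read_pedigree_record("3   Ped2", count_width=2, skip=0))
```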

The second format line directs the reading of the individual records. Consider the sample pedigree file below.

_______________________<top of file>_______________________
(I2,A8)
(3A8,2A1,1X,A8,1X,A8)
4
FATHER                  M  B        1/2     
MOTHER                  F  AB       1/1     
CHILD1  FATHER  MOTHER  F  B        1/2     
CHILD2  FATHER  MOTHER  M  AB       1/2     
3   Ped2
FATHER                  M
MOTHER                  F  AB       1/1     
CHILD   FATHER  MOTHER  F  B        1/2     
_____________________<bottom of file>______________________


In this example the person ID and parental IDs are read in A8 format. Naming pedigrees is optional. Here the first pedigree's name is left blank. Although the mother is listed as the second parent in each pedigree, there is no requirement that the first parent be the father and the second parent the mother. Sex and monozygotic twin status are read in A1 format, ABO phenotype in A8 format and MK phenotype in A8 format.

As in this example, all items or fields on an individual record must be read in character format (A) and consist of eight characters or fewer. Missing values for any field are represented by blanks.
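To make the column layout concrete, the following Python sketch slices an individual record according to the format (3A8,2A1,1X,A8,1X,A8) from the sample file. It is an illustration under stated assumptions, not the program's own reader: the field names (id, parent1, parent2, sex, twin, abo, mk) are labels invented for this example, and trimming the surrounding blanks from each field is a convenience of the sketch.

```python
# Column ranges (start, end) implied by (3A8,2A1,1X,A8,1X,A8).
# The two 1X blanks at columns 26 and 35 are skipped.
FIELDS = [
    ("id", 0, 8),
    ("parent1", 8, 16),
    ("parent2", 16, 24),
    ("sex", 24, 25),
    ("twin", 25, 26),
    ("abo", 27, 35),
    ("mk", 36, 44),
]

RECORD_WIDTH = 44

def read_individual(line):
    """Return a dict of trimmed fields; a blank field becomes ''."""
    # Pad short lines so that missing trailing fields read as blanks,
    # which the file format treats as missing values.
    line = line.rstrip("\n").ljust(RECORD_WIDTH)
    return {name: line[start:end].strip() for name, start, end in FIELDS}

if __name__ == "__main__":
    child = "CHILD1  FATHER  MOTHER  F  B        1/2     "
    untyped_father = "FATHER".ljust(24) + "M"
    print(read_individual(child))
    print(read_individual(untyped_father))
```

Padding each line to the full record width means that a record ending early, such as an untyped founder, yields empty strings for the missing phenotypes instead of raising an error.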

Two people in the same pedigree cannot share an ID, but people from different pedigrees can. In the above example we have used an 'F' for female and 'M' for male. Other sex codes could be substituted if they are properly coordinated with the corresponding batch item (for SimWalk2, batch item #12 in the BATCH2.DAT file). The monozygotic twin field is left blank in the example because no one has an identical twin. If there are identical twins in a pedigree, then each pair of them should be assigned a unique non-blank identifier. Identical triplets, etc. are handled in the same way.
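Both conventions in this paragraph can be checked with a few lines of code. The Python sketch below is illustrative only; the function names duplicate_ids and twin_sets are hypothetical. It also assumes one reading of the twin rule, namely that all members of a set of identical twins carry the same non-blank code and that different sets within a pedigree carry different codes.

```python
from collections import Counter, defaultdict

def duplicate_ids(ids):
    """Return the individual IDs that occur more than once in one pedigree."""
    return sorted(ind_id for ind_id, n in Counter(ids).items() if n > 1)

def twin_sets(records):
    """Group individuals by their monozygotic twin code.

    records: list of (id, twin_code) pairs for one pedigree. A blank code
    means the person has no identical twin. Assumption of this sketch:
    members of the same twin set share one code.
    """
    groups = defaultdict(list)
    for ind_id, code in records:
        code = code.strip()
        if code:
            groups[code].append(ind_id)
    return dict(groups)

if __name__ == "__main__":
    people = [("FATHER", " "), ("MOTHER", " "), ("TWIN1", "A"), ("TWIN2", "A")]
    print(duplicate_ids([ind_id for ind_id, _code in people]))
    print(twin_sets(people))
```

A twin code that ends up with only one member in the result of twin_sets probably indicates a data entry error.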

The example pedigree data file PEDIGREE.DAT has a companion annotation file that describes its entries.

Differences between SimWalk2 and Mendel3 format:

The Sampling analysis option of SimWalk2 permits one to indicate for each phenotype in the pedigree file: 1) whether or not that marker phenotype should be fixed in the simulations, i.e., conditioned upon; and 2) whether or not that phenotype should be set as unknown in each of the output simulated pedigrees. To specify that a marker phenotype should not be fixed, simply set that phenotype to be unknown in the input pedigree file. To indicate that a phenotype should be set as unknown in the output pedigrees, include the extra character '*' (asterisk) somewhere in that phenotype in the input pedigree file. (As stated above, all phenotypes, including the extra '*' if present, must be at most eight characters in length.)
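The two indications described above can be summarized as a pair of flags per phenotype field. The Python sketch below shows one possible reading and is not taken from the SimWalk2 source: the function name sampling_flags and the dictionary keys are hypothetical, a blank field is taken to mean an unknown phenotype, and a field containing only '*' is treated as unknown, a case the text does not address.

```python
def sampling_flags(field):
    """Interpret one phenotype field under the Sampling analysis option.

    Returns a dict with the phenotype (asterisk removed), whether it would
    be fixed (conditioned upon), and whether it would be set as unknown in
    the simulated output pedigrees.
    """
    text = field.strip()
    if len(text) > 8:
        # The limit of eight characters includes the '*' if present.
        raise ValueError("phenotype fields are at most eight characters")
    hide = "*" in text
    observed = text.replace("*", "").strip()
    return {
        "phenotype": observed,
        # An unknown (blank) phenotype is not fixed in the simulations.
        "fixed": observed != "",
        "hide_in_output": hide,
    }

if __name__ == "__main__":
    for field in ("1/2     ", "1/2*    ", "        "):
        print(repr(field), sampling_flags(field))
```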

[Partially abstracted, with kind permission, from "Documentation for Mendel, Version 3.0" which is copyright 1985-1991 Kenneth Lange.]

