

SimWalk2: Penetrance Data File Format

Summary:

This file is optional and is used only by some analysis options, e.g., parametric location scores. It lists the penetrances for each phenotype at the trait locus, i.e., Prob(phenotype | genotype).

Simple Format Descriptors:

Fortran uses the following format codes, also called descriptors, to describe data: (A) for character data, (I) for integer data, (F) for real numbers with a decimal part, and (X) for skipped blank spaces. For example, (A8) specifies a character field eight characters wide, (I2) specifies an integer occupying two spaces, (F8.5) specifies a real number occupying the next eight spaces, and (1X) specifies a single blank space. In (F8.5), the .5 means that if no decimal point is written in the field, the last five digits are taken as the fractional part; a decimal point written in the field overrides this.
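To make the descriptors concrete, here is a minimal Python sketch of fixed-width reading for these four descriptors. The function name `read_fixed` and its handling of edge cases are assumptions for illustration, not SimWalk2's own reader; it supports repeat counts such as 3F8.5, but not exponents, embedded blanks, or any other Fortran descriptors.

```python
import re

# One A, I or F item with an optional repeat count, e.g. "A8", "I8", "3F8.5".
_FIELD = re.compile(r"(\d*)([AIF])(\d+)(?:\.(\d+))?")


def read_fixed(line, fmt):
    """Split `line` into values according to a format list such as "(A8, I8)"."""
    values = []
    pos = 0
    for item in fmt.strip().strip("()").split(","):
        item = item.strip().upper()
        if item.endswith("X"):
            # nX skips n columns; a bare X skips one.
            pos += int(item[:-1] or "1")
            continue
        m = _FIELD.fullmatch(item)
        if m is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        repeat, code, width, decimals = m.groups()
        width = int(width)
        for _ in range(int(repeat or "1")):
            # A short line is treated as if padded with blanks.
            field = line[pos:pos + width].ljust(width)
            pos += width
            text = field.strip()
            if code == "A":
                values.append(field.rstrip())  # trailing blanks dropped for convenience
            elif code == "I":
                values.append(int(text) if text else 0)  # an all-blank field reads as 0
            elif not text:
                values.append(0.0)
            elif "." in text:
                values.append(float(text))  # explicit decimal point wins
            else:
                # No decimal point: the last d digits are the fractional part.
                values.append(int(text) / 10 ** int(decimals or "0"))
    return values


print(read_fixed("traitnam 3", "(A8, I8)"))  # ['traitnam', 3]
```

Reading strictly by column position, rather than splitting on whitespace, mirrors how a Fortran formatted read sees the line.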

Penetrance Data File Format:

To use reduced penetrance values at the trait locus, one needs an additional input file beyond the usual four: BATCH2.DAT (control parameters), MAP.DAT (map data), LOCUS.DAT (locus data), and PEDIGREE.DAT (pedigree data). For SimWalk2, specify the name of this additional penetrance file, e.g., PEN.DAT, in batch item #19 of BATCH2.DAT.

The penetrance file is laid out as follows. The first line contains the number of trait loci in I8 format, i.e., in the first eight characters [SimWalk2 expects a '1']. The second line contains the name of the trait locus, spelled exactly as in the locus data file, in A8 format [this is just a check that one is using the correct penetrance file], followed by the number of liability classes in I8 format.
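The two header lines can be checked with plain column slicing. The helper below is hypothetical (`read_header` and its `trait_name` argument are illustrative names, not part of SimWalk2), and it assumes the trait name must match the locus data file exactly, including case, which may be stricter or looser than SimWalk2's own check.

```python
def read_header(lines, trait_name):
    """Check the two header lines and return the number of liability classes.

    `lines` holds the lines of the penetrance file; `trait_name` is the trait
    locus name as spelled in the locus data file. Both names are hypothetical.
    """
    # Line 1, columns 1-8 (I8): number of trait loci; SimWalk2 expects 1.
    n_trait_loci = int(lines[0][0:8])
    if n_trait_loci != 1:
        raise ValueError(f"expected 1 trait locus, found {n_trait_loci}")

    # Line 2, columns 1-8 (A8): trait locus name, used only as a sanity check.
    # Assumption: exact comparison, ignoring trailing blanks; A8 holds 8 chars.
    name = lines[1][0:8].rstrip()
    if name != trait_name[:8].rstrip():
        raise ValueError(f"file is for trait {name!r}, not {trait_name!r}")

    # Line 2, columns 9-16 (I8): number of liability classes.
    return int(lines[1][8:16])


header = ["1                <= number of trait loci",
          "traitnam 3       <= trait locus name, number of liability classes"]
print(read_header(header, "traitnam"))  # 3
```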

Then, for each liability class, there must be two lines of data, one per phenotype (unaffected and affected). Each line starts with an integer label for that liability class and phenotype, in I8 format. For example, for liability class 1 the unaffecteds might be labeled 101 and the affecteds 201. (This scheme is meant to ease the transition from Linkage-format data, where the affection status column immediately precedes the liability class column: the Linkage-format entry 1 1 becomes 101 in SimWalk2 format, and 2 1 becomes 201.) The label is followed by the penetrances for that phenotype in that liability class, in 3F8.5 format. As in Linkage format, the penetrances are listed in the order of the three genotypes 1/1, 1/2, and 2/2.
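The labelling convention suggested in this paragraph amounts to 100 times the affection status plus the liability class. A hypothetical pair of helpers for converting Linkage-format columns follows; they assume affection status is coded 1 (unaffected) or 2 (affected) and that there are fewer than 100 liability classes, and they do not cover an unknown affection status, which this document does not address.

```python
def linkage_to_label(affection, liability_class):
    """Build a SimWalk2-style label from Linkage-format columns: (2, 1) -> 201.

    Assumes affection is 1 (unaffected) or 2 (affected) and fewer than 100
    liability classes, so the two parts of the label never overlap.
    """
    if affection not in (1, 2):
        raise ValueError("affection status must be 1 or 2 in this sketch")
    if not 1 <= liability_class <= 99:
        raise ValueError("liability class must be between 1 and 99")
    return 100 * affection + liability_class


def label_to_linkage(label):
    """Invert linkage_to_label: 201 -> (2, 1)."""
    affection, liability_class = divmod(label, 100)
    return affection, liability_class


print(linkage_to_label(1, 1), linkage_to_label(2, 1))  # 101 201
print(label_to_linkage(203))                           # (2, 3)
```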

There is a subtle but important point here. The '1' in the genotype 1/1 above does not denote the allele named '1'; it means the first trait allele listed in the locus data file, and '2' means the second. So in the locus data file, list the normal, or wildtype, allele (usually '1') first and the affected allele (usually '2') second, and all will be well.
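The allele-order pitfall can be illustrated with a small hypothetical function that picks the penetrance column from the position of each allele in the locus data file rather than from the allele names. `penetrance_column` and its arguments are invented for this sketch.

```python
def penetrance_column(allele_order, genotype):
    """Return the penetrance column (0, 1 or 2) that applies to `genotype`.

    `allele_order` is the pair of trait alleles in the order they appear in
    the locus data file; `genotype` is a pair of allele names in either phase.
    Column 0 is "first/first" (written 1/1), column 1 is "first/second" (1/2)
    and column 2 is "second/second" (2/2).
    """
    if len(allele_order) != 2:
        raise ValueError("expected exactly two trait alleles")
    # Positions in the locus file, not allele names, decide the column;
    # list.index raises ValueError for an allele not in allele_order.
    positions = sorted(list(allele_order).index(a) for a in genotype)
    return {(0, 0): 0, (0, 1): 1, (1, 1): 2}[tuple(positions)]


print(penetrance_column(["1", "2"], ("2", "2")))  # 2: the 2/2 column, as intended
print(penetrance_column(["2", "1"], ("2", "2")))  # 0: alleles listed in the wrong order
```

With the alleles listed as '2' then '1', a person with genotype 2/2 silently receives the 1/1 penetrances, which is exactly the mistake the paragraph warns about.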

Consider the following example penetrance data file:

_______________________________<top of file>_______________________________
1                <= number of trait loci (I8) [you will have just one]
traitnam 3       <= trait locus name (A8), number of liability classes (I8)
101      0.90    0.90    0.10    <= liab. class (I8), 3 penetrances (3F8.5)
201      0.10    0.10    0.90    <= liab. class (I8), 3 penetrances (3F8.5)
102      0.95    0.95    0.05    <= liab. class (I8), 3 penetrances (3F8.5)
202      0.05    0.05    0.95    <= liab. class (I8), 3 penetrances (3F8.5)
103      0.99    0.99    0.00    <= liab. class (I8), 3 penetrances (3F8.5)
203      0.01    0.01    1.00    <= liab. class (I8), 3 penetrances (3F8.5)
_____________________________<bottom of file>______________________________

In the above example, the comments (everything from '<=' onwards) are optional and are ignored. Here, an affected person in liability class 1 should have the value 201 in their first quantitative variable in the pedigree data file, i.e., the first data item after that person's last marker genotype. For that person, the penetrance of the 2/2 genotype at the trait locus is then 0.90.
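Putting the pieces together, the sketch below parses the example file above into a table keyed by liability-class label and looks up the value just discussed. It reads only the fixed columns (I8 followed by 3F8.5), which is why the '<=' comments, starting after column 32, are ignored. `read_penetrances` is a hypothetical name; the sketch skips the header checks and assumes a well-formed file whose penetrances all contain an explicit decimal point.

```python
EXAMPLE = """\
1                <= number of trait loci (I8) [you will have just one]
traitnam 3       <= trait locus name (A8), number of liability classes (I8)
101      0.90    0.90    0.10    <= liab. class (I8), 3 penetrances (3F8.5)
201      0.10    0.10    0.90    <= liab. class (I8), 3 penetrances (3F8.5)
102      0.95    0.95    0.05    <= liab. class (I8), 3 penetrances (3F8.5)
202      0.05    0.05    0.95    <= liab. class (I8), 3 penetrances (3F8.5)
103      0.99    0.99    0.00    <= liab. class (I8), 3 penetrances (3F8.5)
203      0.01    0.01    1.00    <= liab. class (I8), 3 penetrances (3F8.5)
"""


def read_penetrances(text):
    """Parse a penetrance file into {label: (pen_11, pen_12, pen_22)}."""
    lines = text.splitlines()
    n_classes = int(lines[1][8:16])  # I8 in columns 9-16 of line 2
    table = {}
    # Two data lines per liability class follow the two header lines.
    for line in lines[2:2 + 2 * n_classes]:
        label = int(line[0:8])  # I8: liability-class label
        # 3F8.5: columns 9-16, 17-24 and 25-32; anything later is ignored.
        table[label] = tuple(float(line[c:c + 8]) for c in (8, 16, 24))
    return table


table = read_penetrances(EXAMPLE)
# Affected person in liability class 1 (label 201), genotype 2/2 (column 2):
print(table[201][2])  # 0.9
```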

Adding liability classes to the pedigree data may require adjusting the format listing on the second line of the pedigree data file so that it includes this additional data item immediately after the last marker genotype. Recall, however, that this format listing must read every variable as a character string, that is, with the 'A' edit descriptor, e.g., A8.

One last reminder: since the liability class is stored in the first quantitative variable, be sure to set the number of quantitative variables to at least 1. For SimWalk2, this number is set with batch item #18 in BATCH2.DAT.

[Partially abstracted, with kind permission, from "Documentation for Mendel, Version 3.0", copyright 1985-1991 Kenneth Lange.]

