28 Oct 2010

SLINK: a general simulation program for linkage analysis

Based on an algorithm by Jurg Ott
Programming by Daniel E. Weeks with the help of Mark Lathrop and Jurg Ott
Documentation by Daniel E. Weeks and Jurg Ott

**** WARNING ****
Familiarity with the LINKAGE programs is assumed. You will not be able to use these programs correctly unless you are familiar with the LINKAGE programs, and this documentation does not attempt to teach you how to use them. It is also assumed that you already have the LINKAGE package. If you do not have it, it may be obtained from

ftp://linkage.rockefeller.edu/software/linkage/

However, we recommend using the faster version known as FASTLINK, which is available from:

http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastlink.html
*****************************************************************

0. CURRENT VERSION
1. INTRODUCTION
2. OVERVIEW OF SLINK
3. INPUT FILES AND AVAILABILITY CODES
4. HOW TO USE SLINK
5. TYPICAL APPLICATIONS
   5A. MSIM: Approximating the expected lod score
   5B. LSIM: Disease versus map of markers
   5C. ISIM: Average maximum lod score and power
   5D. MSIM: Interpolating Zmax in each replicate
   5E. Finding the peak of the expected lod score curve
   5F. MSIM and HOMOG and ELODHET: analysis under heterogeneity
6. TECHNICAL INFORMATION
   6A. Program constants
   6B. References for SLINK
   6C. Acknowledgments
7. LITERATURE

0. CURRENT VERSION

The current version of FastSLINK is 3.00, dated 28 Oct 2010. FastSLINK is written in C and executes much more rapidly than the original version of SLINK, which was written in Pascal. The documentation below describes the original SLINK program but is, for the most part, applicable to FastSLINK. For details of how FastSLINK differs from SLINK, please see the README.txt file distributed with FastSLINK.

Compared to previous versions, the following changes have been implemented:

1) The default buffer size for the input or output files (whichever is usually larger) was increased from 128 bytes to 4K bytes. This required at least one extra line of code in nonstandard Pascal.

2) The critical levels for the maximum lod score (previously fixed at 1, 2, and 3) are now user defined and must be furnished to the analysis programs in an input file called LIMIT.DAT (see section 5). Similarly, input previously requested interactively by SLINK must now be furnished in an input file called SLINKIN.DAT (see section 3).

3) The quadratic interpolation routine QUADMAX in the MSIM program has been rewritten, because it occasionally seemed to give nonsensical results. It is now based on formulas (8.8)-(8.14) in Ott (1991) and interpolates the maximum lod score from three pairs of values (Z, theta). It does not extrapolate beyond the smallest or largest theta value. If the lengths of the two theta intervals differ by more than a factor of 4, no quadratic interpolation is carried out, because the quadratic approximation may then fit the lod score curve poorly. No interpolation takes place either when the first of the three lod scores is less than -100 (presumably at theta = 0).

Known bugs: In the present version of SLINK, pedigree IDs may be names or numbers.
If they are numbers, these numbers must not exceed the maximum number of pedigrees specified (it is safest to number pedigrees consecutively starting with 1). This is inconsistent with the LINKAGE programs, and the next program version will correct this inconsistency.

1. INTRODUCTION

Simulation can provide approximate answers to questions that are cumbersome or impossible to answer analytically. For example, in a linkage study of a disease it is important to know whether the pedigrees you have collected (or plan to collect) will be sufficient to detect linkage. The power to detect linkage depends on a variety of factors, such as the structure of the pedigrees, the number of affected individuals, and the informativeness of the markers. The SLINK program described below allows one to carry out such power calculations by simulating genotypes at one locus given the phenotypes at another locus linked to the first. If one is interested in simulating under no linkage (e.g., to investigate exclusion possibilities or p-values), SLINK may also be used, but other programs such as SIMULATE are much more efficient for that purpose.

The SLINK package consists of a simulation program (SLINK) and several separate analysis programs (MSIM, LSIM, ISIM, ELODHET). Keeping simulation and analysis separate allows the data to be analyzed under models different from those under which they were generated. As outlined below, the input files to these programs conform to the rules required by the LINKAGE programs.

SLINK implements a simulation algorithm developed by Jurg Ott and described in:

1) Ott J (1989) Computer-simulation methods in human linkage analysis. Proc Natl Acad Sci USA 86:4175-4178

The algorithm was implemented in the original SLINK computer program package by Weeks, Ott, and Lathrop:

2) Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation program for linkage analysis.
Am J Hum Genet 47:A204 (abstr)

SLINK is based on the LINKAGE programs version 4.9 (Lathrop et al., 1984) and accepts slightly modified LINKAGE data files, as explained in the file 'slink.txt'. The code has been updated to be consistent with LINKAGE version 5.1. The SLINK simulation program has been modified by Schaffer and Weeks to use the algorithms developed by Cottingham et al:

3) Cottingham Jr RW, Idury RM, Schaffer AA (1993) Faster sequential genetic linkage computations. Am J Hum Genet 53:252-263

Please cite references 1-3 if you use FastSLINK. Thank you.

Note that SLINK by itself is quite limited in the number of markers it can handle. If you wish to simulate a large number of markers, you will also need the SUP program:

Lemire M. SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values. BMC Genet. 2006 Jul 3;7:40. PubMed PMID: 16803631; PubMed Central PMCID: PMC1524809.

SUP is available from the web site: http://mlemire.freeshell.org/software.html

2. OVERVIEW OF SLINK

The documentation below describes the original SLINK program but is, for the most part, applicable to FastSLINK. For details of how FastSLINK differs from SLINK, please see the README.txt file distributed with FastSLINK.

The program SLINK is a general computer simulation program that employs a variation of the algorithm described by Ott (1989). Suppose there are N people in a pedigree. Let x = (x1, x2, ..., xN) represent the vector of phenotypes of the N people in the pedigree. Likewise, let g = (g1, g2, ..., gN) represent the vector of their multi-locus genotypes, including phase information. Then the conditional probability distribution of the genotypes given the phenotypes may be calculated by a series of successive risk calculations:

   P(g|x) = P(g1|x) P(g2|g1,x) P(g3|g1,g2,x) ... P(gN|g1,...,gN-1,x)
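The factorization above can be turned into a sampler that draws one genotype at a time. The sketch below illustrates only that idea and is not SLINK's implementation: SLINK obtains each risk with the LINKAGE likelihood machinery, whereas here each risk is computed by brute-force summation over every completion of the remaining people, which is feasible only for tiny examples. The names `conditional_risks` and `simulate`, the genotype labels, and the toy weights are hypothetical; `weight(g)` is assumed to be proportional to P(g, x) for a complete assignment g.

```python
import itertools
import random
from typing import Callable, Dict, Sequence, Tuple

Genotype = str
Assignment = Tuple[Genotype, ...]
# Hypothetical: weight(g) must be proportional to P(g, x) for a full assignment g.
Weight = Callable[[Assignment], float]


def conditional_risks(prefix: Assignment,
                      choices: Sequence[Sequence[Genotype]],
                      weight: Weight) -> Dict[Genotype, float]:
    """P(g_k = c | g_1..g_{k-1} = prefix, x) for every candidate c of person k.

    People after k are summed out by brute force (exponential cost)."""
    k = len(prefix)
    totals = {}
    for c in choices[k]:
        completions = itertools.product(*choices[k + 1:])
        totals[c] = sum(weight(prefix + (c,) + rest) for rest in completions)
    norm = sum(totals.values())
    return {c: t / norm for c, t in totals.items()}


def simulate(choices: Sequence[Sequence[Genotype]], weight: Weight,
             rng: random.Random) -> Assignment:
    """Assign genotypes person by person, conditioning on those already drawn."""
    g: Assignment = ()
    for _ in range(len(choices)):
        risks = conditional_risks(g, choices, weight)
        candidates = list(risks)
        pick = rng.choices(candidates, weights=[risks[c] for c in candidates])[0]
        g = g + (pick,)
    return g


# Toy joint weights for two people (made-up numbers, not a genetic model).
toy = {("A", "A"): 3.0, ("A", "B"): 1.0, ("B", "A"): 0.0, ("B", "B"): 0.0}
choices = [["A", "B"], ["A", "B"]]
rng = random.Random(25432)
draws = [simulate(choices, toy.get, rng) for _ in range(10000)]
print(draws.count(("A", "A")) / len(draws))  # should be close to 0.75
```

Because every later risk is conditioned on the genotypes already drawn, the product of the successive risks equals P(g|x), so each complete draw comes from the correct joint distribution.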
In the SLINK simulation algorithm, we calculate the conditional probabilities (or risks) P(g1|x) of all the possible multi-locus genotypes with phase for person 1 and, based on these, randomly assign one of the genotypes to person 1. We then calculate P(g2|g1,x), taking into account the genotype g1 just generated, randomly assign a genotype to person 2, and so on. The process of successive risk calculations continues until all individuals in the pedigree have been assigned a multi-locus genotype. This approach is quite efficient, since once a multi-locus genotype has been assigned to an individual, it is treated as known in all subsequent steps.

Note that this algorithm permits simulation conditional on any combination of phenotypic data. For example, if the pedigree were partially typed at a marker of interest, one could simulate conditional on both the disease phenotypes and the marker data currently available.

SLINK is based on the LINKAGE programs version 4.9 (Lathrop et al., 1984) and accepts slightly modified LINKAGE data files, as explained below. The simulated pedigrees may be analyzed either by the standard LINKAGE programs or by the special companion programs (MSIM, ISIM, or LSIM), which have been modified to read one replicate of the pedigrees at a time rather than reading in the entire pedigree file. In addition, the companion programs provide some simple statistical summaries.

As described above, the data are simulated conditional on the phenotypes given in the pedigree file, at the recombination fraction(s) given in the data file. The loci are conceptually divided into two categories: the 'trait' locus and the marker loci. The trait locus need not be of the affection-status type; it may be a codominant marker as well. There may be either one trait locus or none, and one or more marker loci (as noted in section 1, SLINK by itself handles only a limited number of markers; see also section 6A, Program constants). The trait locus is distinguished from the marker loci to provide more flexibility in simulating the data, as described below.

3. INPUT FILES AND AVAILABILITY CODES

SLINK requires three input files:

A) 'slinkin.dat', holding various parameter values that previously had to be furnished interactively. No text is permitted between the values, but text may follow the last input value (see example file). The values are:

   a) A seed for the random number generator. This should be a different integer between 1 and 30,000 each time. Larger numbers (> 25,000) are better.
   b) The number of replicates desired.
   c) The locus number identifying the trait locus. This is the number of the trait locus in the 'simdata.dat' file (see below). If you placed the trait locus first in your 'simdata.dat' file, then the locus number is 1. Enter 0 if there is no trait locus, i.e., if all the loci are markers.
   d) The proportion of unlinked families (if you want to allow for heterogeneity). Enter 0 to simulate under the assumption of homogeneity (all families are of the linked type; the normal situation). See section 5F for analyzing data generated under heterogeneity.

B) 'simdata.dat', a standard LINKAGE data file in MLINK format.

C) 'simped.dat', a standard LINKAGE pedigree file with an additional column inserted after the last phenotype column.

This additional column in the SLINK pedigree file 'simped.dat' contains the availability code, which controls what type of phenotypes are written to the output file. These codes are consistent with the codes used by the simulation program SIMLINK (Boehnke, 1986; Ploughman and Boehnke, 1989).

        Meaning of code for
Code    Markers        Trait
 0      Unavailable    Use original phenotypes as given in 'simped.dat'
 1      Available      Use simulated phenotypes
 2      Available      Use original phenotypes as given in 'simped.dat'
 3      Unavailable    Use simulated phenotypes

A person should be coded as "marker unavailable" if that person will be assigned the phenotype "unknown" at each marker locus in the output file 'pedfile.dat'.
This is appropriate if the person will not be typed for any markers because no DNA sample is available from that person (e.g., the person is dead or uncooperative). Likewise, a person should be coded as "marker available" if that person will be assigned simulated marker phenotypes in the output file 'pedfile.dat'.

It is usually appropriate to "use original phenotypes" as given in 'simped.dat' at the trait locus, because most simulation studies of diseases are carried out conditional on the observed phenotypes. If the "use original phenotypes" option is chosen, the trait phenotype remains as it was in the 'simped.dat' input file. In rare cases, you may want to simulate trait phenotypes by choosing the "use simulated phenotypes" option.

Keep in mind that these simulations are carried out conditional on the phenotypes given in the 'simped.dat' input file. Thus, if you want to indicate that someone is unavailable at the trait locus, make their original phenotype unknown in 'simped.dat' and use one of the two "use original phenotypes" codes (0 or 2).

If a simulation is being carried out using marker data only, then only two availability codes are needed, as shown in the table below (codes 2 and 3 can still be used, if desired). Every person with an availability code of 0 will be given the 'unknown' phenotype at each marker locus. Every person with an availability code of 1 will be given a simulated phenotype at each marker locus.

Code    Meaning of code for Markers
 0      Unavailable: Assign phenotype 'unknown' at each marker
 1      Available: Assign simulated phenotypes at each marker

It is highly recommended that you read the example problems below before attempting to use this simulation package.

EXAMPLE 1: Small 4-member family

Suppose we want to simulate the marker genotypes for two affected children in a simple nuclear family where the two parents must be carriers for a fully penetrant recessive disease, and they are known to have four different marker alleles.
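As a rough check of what SLINK should produce for this example, one can work out how often the two affected children ought to receive identical marker genotypes. The sketch assumes, as in the 'simdata.dat' for this example, a fully penetrant recessive disease (so each affected child received the disease allele from both parents) and that theta is the trait-marker recombination fraction. Because the parents carry four different marker alleles, the children's marker genotypes match only if they received the same paternal and the same maternal allele. The function names are illustrative only.

```python
from itertools import product


def p_same_allele_from_parent(theta: float) -> float:
    """Both children carry the parent's disease allele; they got the same marker
    allele from that parent if both transmissions are non-recombinant or both
    are recombinant. This holds whatever the parent's (unknown) phase is."""
    return (1 - theta) ** 2 + theta ** 2


def p_identical_genotypes(theta: float) -> float:
    """Paternal and maternal transmissions are independent."""
    return p_same_allele_from_parent(theta) ** 2


def p_identical_by_enumeration(theta: float) -> float:
    """Same quantity, summing over the four recombination indicators
    (child 1 paternal, child 2 paternal, child 1 maternal, child 2 maternal)."""
    total = 0.0
    for rec in product([False, True], repeat=4):
        prob = 1.0
        for r in rec:
            prob *= theta if r else 1 - theta
        if rec[0] == rec[1] and rec[2] == rec[3]:
            total += prob
    return total


for theta in (0.01, 0.5):
    print(theta, round(p_identical_genotypes(theta), 4))
# 0.01 0.9608
# 0.5 0.25
```

So at the theta of 0.01 used below, nearly all simulated pairs of affected children should share both marker alleles, compared with one pair in four if the two loci were unlinked.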
We first create a pedigree file called 'simpre.dat', which, after processing with MAKEPED (one of the programs of the LINKAGE package), becomes 'simped.dat'. ('simpre.dat' is shown below; the comments beginning with <= are not part of the file.) We use the availability code 2 for everyone, which means "Markers available, use original trait phenotypes". Thus, the 'pedfile.dat' file (to be created by SLINK) will contain the original trait phenotypes as given in 'simpre.dat' (and 'simped.dat'). At the marker locus, person 1 will be 1/2 and person 2 will be 3/4, since the simulations are carried out conditional on the phenotypes given in the 'simped.dat' file. The two affected children (persons 3 and 4) will be assigned simulated marker phenotypes, since their original marker phenotypes in the 'simped.dat' file are unknown (0 0).

File: simpre.dat

1 1 0 0 1 1 1 2 2   <= Father who is 1/2 at the marker
1 2 0 0 2 1 3 4 2   <= Mother who is 3/4 at the marker
1 3 1 2 1 2 0 0 2   <= First affected child
1 4 1 2 2 2 0 0 2   <= Second affected child

In the corresponding 'simdata.dat' file below, we give penetrances corresponding to a fully penetrant recessive disease at the trait locus and define a codominant four-allele system at the marker locus.

File: simdata.dat

2 0 0 5 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1 2
1 2 << AFFECTION, NO. OF ALLELES
0.9900 0.0100 << GENE FREQUENCIES Trait locus
1 << NO. OF LIABILITY CLASSES
0.0000 0.0000 1.0000 << PENETRANCES
3 4 << ALLELE NUMBERS, NO. OF ALLELES Marker locus
0.250000 0.250000 0.250000 0.250000 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.01000 << RECOMBINATION VALUES
1 0.01000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE

This example (files in subdirectory EXAMPLE1) is referred to again in sections 5C and 5D.

4. HOW TO USE SLINK

First, compile the programs in the SLINK package to create executables.
Compilation instructions are in the documentation file README.txt. To carry out a simulation, proceed as follows (see also figure 1, appended):

1) Create a 'simdata.dat' file defining the locus systems and the true thetas.
2) Create a 'simped.dat' file defining the pedigree structure, phenotypes, and availability codes.
3) Create the 'slinkin.dat' file defining parameters for the simulation (see section 3, above).
4) Run SLINK to create a 'pedfile.dat' containing the simulated data.
5) Create a 'datafile.dat' defining how the simulated data should be analyzed.
6) Run UNKNOWN to create a 'speedfile.dat' and an 'ipedfile.dat'.
7) Run the appropriate analysis program, such as MSIM or LSIM.

These steps are described in detail below; the files of the example data referred to below are in subdirectory EXAMPLE2.

Step 1) Create a 'simdata.dat' file defining the locus systems and the true thetas.

The file 'simdata.dat' defines the locus systems and the TRUE thetas (recombination fractions) under which the simulation will be carried out. These true theta values are given in the RECOMBINATION VALUES line; the subsequent line with the increment and finishing value is irrelevant for SLINK. The 'simdata.dat' file must be in standard MLINK format and can be created with the PREPLINK program of the LINKAGE package.

In this example, we are interested in simulating two linked markers segregating through two schizophrenia pedigrees, conditional on the current schizophrenia diagnoses. The simulation will be carried out under the assumption that the locus order is 599Ha---Schizophrenia---153Ra, with 8 cM between 599Ha and the disease locus and 5.7 cM between the disease locus and 153Ra.

File: 'simdata.dat'

3 0 0 5 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1 2 3 << Order of loci
3 3 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-599Ha/TaqI
0.3200 0.1600 0.5200 << GENE FREQUENCIES
1 2 << AFFECTION, NO. OF ALLELES Locus Schizophrenia
0.991500 0.008500 << GENE FREQUENCIES
3 << NO. OF LIABILITY CLASSES
1.0000 0.1400 0.0000 Normal
0.0000 0.8600 0.0000 Affected
1.0000 0.0000 0.0000 Married in Normals << PENETRANCES
3 2 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-153Ra/XbaI
0.3300 0.6700 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.0739 0.0539 << RECOMB. VALUES Schiz. is 8 cM from p105-599Ha
2 0.01000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE

Step 2) Create a 'simped.dat' file defining the pedigree structure, phenotypes, and availability codes.

First use a text editor to create a pedigree file 'simpre.dat' analogous to the one shown above in Example 1. Then use MAKEPED to create the 'simped.dat' pedigree file.

The file 'simped.dat' defines the pedigrees to be simulated. It is a LINKAGE pedigree file with an additional column containing the availability codes after the last phenotype field. All simulations are carried out CONDITIONAL on the phenotypes given in 'simped.dat'.

The example pedigree file contains two pedigrees from Sherrington et al (1988), with marker locus 599Ha, the schizophrenia locus, and marker locus 153Ra, in that order. Note that all individuals are untyped at both markers. Also, most individuals have an availability code of 2, indicating that all their marker phenotypes will be simulated while their trait phenotypes will be left as specified here. When a person is "marker available", marker phenotypes are simulated for ALL the marker loci; it is not possible to make a person "marker available" for only a subset of the marker loci.
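The availability-code table in section 3 amounts to a small lookup that decides, person by person, what ends up in 'pedfile.dat'. The sketch below encodes that table; the function name, its arguments, and the use of (0, 0) for an unknown phenotype at a numbered-allele locus are illustrative assumptions, not SLINK's internal code. For a person already typed in 'simped.dat', the observed genotypes are assumed to be passed as `simulated_markers`, since SLINK conditions on them.

```python
from typing import List, Tuple

Pair = Tuple[int, int]

# Availability code -> (marker phenotypes written?, source of trait phenotype),
# transcribed from the table in section 3.
AVAILABILITY = {
    0: (False, "original"),
    1: (True, "simulated"),
    2: (True, "original"),
    3: (False, "simulated"),
}
UNKNOWN: Pair = (0, 0)  # assumed 'unknown' phenotype at a numbered-allele locus


def output_phenotypes(code: int, original_trait: Pair, simulated_trait: Pair,
                      simulated_markers: List[Pair]) -> Tuple[Pair, List[Pair]]:
    """Return the (trait, markers) phenotypes written for one person."""
    markers_available, trait_source = AVAILABILITY[code]
    trait = simulated_trait if trait_source == "simulated" else original_trait
    # "Marker available" applies to ALL marker loci at once, never to a subset.
    if markers_available:
        markers = simulated_markers
    else:
        markers = [UNKNOWN] * len(simulated_markers)
    return trait, markers


# Person 2 of pedigree 1 (code 0): markers blanked, original trait "2 2" kept.
print(output_phenotypes(0, (2, 2), (1, 1), [(1, 3), (1, 2)]))
# ((2, 2), [(0, 0), (0, 0)])
```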
File: 'simped.dat'

1 1 0 0 3 0 0 1 1 0 0 2 3 0 0 2
1 2 0 0 3 0 0 2 0 0 0 2 2 0 0 0
1 3 1 2 0 4 4 2 0 0 0 2 2 0 0 2
1 4 1 2 0 5 5 2 0 0 0 2 1 0 0 2
1 5 1 2 0 6 6 1 0 0 0 2 2 0 0 2
1 6 1 2 0 7 7 2 0 0 0 2 1 0 0 2
1 7 1 2 0 8 8 1 0 0 0 2 1 0 0 2
1 8 1 2 0 9 9 1 0 0 0 2 2 0 0 2
1 9 1 2 0 10 10 1 0 0 0 2 2 0 0 2
1 10 1 2 14 12 12 2 0 0 0 2 2 0 0 2
1 11 0 0 14 0 0 1 0 0 0 2 3 0 0 2
1 12 1 2 0 13 13 1 0 0 0 2 2 0 0 2
1 13 1 2 0 0 0 2 0 0 0 2 1 0 0 2
1 14 11 10 0 15 15 2 0 0 0 2 1 0 0 2
1 15 11 10 0 16 16 1 0 0 0 2 2 0 0 2
1 16 11 10 0 17 17 1 0 0 0 2 2 0 0 2
1 17 11 10 0 0 0 2 0 0 0 2 1 0 0 2
2 1 0 0 3 0 0 1 1 0 0 2 3 0 0 0
2 2 0 0 3 0 0 2 0 0 0 2 2 0 0 2
2 3 1 2 0 4 4 2 0 0 0 2 2 0 0 2
2 4 1 2 0 5 5 1 0 0 0 2 2 0 0 2
2 5 1 2 0 6 6 1 0 0 0 2 2 0 0 2
2 6 1 2 10 8 8 1 0 0 0 2 2 0 0 2
2 7 0 0 10 0 0 2 0 0 0 2 3 0 0 2
2 8 1 2 13 0 0 2 0 0 0 2 2 0 0 2
2 9 0 0 13 0 0 1 0 0 0 2 3 0 0 2
2 10 6 7 0 11 11 1 0 0 0 2 1 0 0 2
2 11 6 7 0 12 12 1 0 0 0 2 1 0 0 2
2 12 6 7 0 0 0 1 0 0 0 2 2 0 0 2
2 13 9 8 0 14 14 1 0 0 0 2 1 0 0 2
2 14 9 8 0 0 0 1 0 0 0 2 2 0 0 2

Step 3) Create the 'slinkin.dat' input file for SLINK, holding the parameters defined in section 3, above. Please note that the random number seed should be changed for every new simulation.

Step 4) Run SLINK to create a 'pedfile.dat' containing the simulated data.

As outlined in section 3, SLINK reads its parameters from the input file 'slinkin.dat'; its other input files are 'simdata.dat' and 'simped.dat' (see the flow chart in figure 1). SLINK writes the simulated data to the file 'pedfile.dat', while the parameters (as described above) and the thetas are written to 'simout.dat'. When one of the companion analysis programs is run, it copies the information from the most recent 'simout.dat' into the output file it creates (figure 1).

Note that 'pedfile.dat' is a bona fide LINKAGE pedigree file and can be analyzed by any of the regular LINKAGE programs.
However, it may contain many replicates of the pedigrees (one set after another), so in most cases it is more practical to use the modified versions of the LINKAGE programs (MSIM, ISIM, LSIM), which are designed to analyze the data one replicate at a time.

In our example, we asked for simulated marker data while keeping the original trait phenotypes (availability code 2) for all but two people. Note that the two people with availability code 0 have unknown marker genotypes.

File: 'pedfile.dat' (first replicate of both families)

1 1 0 0 3 0 0 1 1 1 3 2 3 1 2 2  Linked Mk Avail; Tr orig
1 2 0 0 3 0 0 2 0 0 0 2 2 0 0 0  Linked Mk Unkno; Tr orig
1 3 1 2 0 4 4 2 0 1 3 2 2 1 2 2  Linked Mk Avail; Tr orig
1 4 1 2 0 5 5 2 0 1 3 2 1 1 1 2  Linked Mk Avail; Tr orig
1 5 1 2 0 6 6 1 0 1 3 2 2 1 2 2  Linked Mk Avail; Tr orig
1 6 1 2 0 7 7 2 0 3 3 2 1 2 2 2  Linked Mk Avail; Tr orig
1 7 1 2 0 8 8 1 0 3 3 2 1 2 2 2  Linked Mk Avail; Tr orig
1 8 1 2 0 9 9 1 0 1 1 2 2 1 1 2  Linked Mk Avail; Tr orig
1 9 1 2 0 10 10 1 0 1 3 2 2 1 2 2  Linked Mk Avail; Tr orig
1 10 1 2 14 12 12 2 0 1 1 2 2 1 1 2  Linked Mk Avail; Tr orig
1 11 0 0 14 0 0 1 0 1 3 2 3 2 2 2  Linked Mk Avail; Tr orig
1 12 1 2 0 13 13 1 0 1 1 2 2 2 1 2  Linked Mk Avail; Tr orig
1 13 1 2 0 0 0 2 0 3 3 2 1 1 2 2  Linked Mk Avail; Tr orig
1 14 11 10 0 15 15 2 0 1 3 2 1 1 2 2  Linked Mk Avail; Tr orig
1 15 11 10 0 16 16 1 0 1 1 2 2 2 1 2  Linked Mk Avail; Tr orig
1 16 11 10 0 17 17 1 0 1 3 2 2 1 2 2  Linked Mk Avail; Tr orig
1 17 11 10 0 0 0 2 0 1 3 2 1 1 2 2  Linked Mk Avail; Tr orig
2 1 0 0 3 0 0 1 1 0 0 2 3 0 0 0  Linked Mk Unkno; Tr orig
2 2 0 0 3 0 0 2 0 1 2 2 2 2 2 2  Linked Mk Avail; Tr orig
2 3 1 2 0 4 4 2 0 1 1 2 2 2 2 2  Linked Mk Avail; Tr orig
2 4 1 2 0 5 5 1 0 1 1 2 2 2 2 2  Linked Mk Avail; Tr orig
2 5 1 2 0 6 6 1 0 1 1 2 2 2 2 2  Linked Mk Avail; Tr orig
2 6 1 2 10 8 8 1 0 1 2 2 2 2 2 2  Linked Mk Avail; Tr orig
2 7 0 0 10 0 0 2 0 3 3 2 3 1 2 2  Linked Mk Avail; Tr orig
2 8 1 2 13 0 0 2 0 1 2 2 2 2 2 2  Linked Mk Avail; Tr orig
2 9 0 0 13 0 0 1 0 1 3 2 3 2 1 2  Linked Mk Avail; Tr orig
2 10 6 7 0 11 11 1 0 2 3 2 1 2 1 2  Linked Mk Avail; Tr orig
2 11 6 7 0 12 12 1 0 2 3 2 1 2 2 2  Linked Mk Avail; Tr orig
2 12 6 7 0 0 0 1 0 1 3 2 2 2 1 2  Linked Mk Avail; Tr orig
2 13 9 8 0 14 14 1 0 1 2 2 1 2 2 2  Linked Mk Avail; Tr orig
2 14 9 8 0 0 0 1 0 1 1 2 2 2 2 2  Linked Mk Avail; Tr orig

Step 5) Create a 'datafile.dat' defining how the simulated data should be analyzed.

The analysis programs require five input files (figure 1). Two are made by UNKNOWN ('ipedfile.dat' and 'speedfile.dat'; see step 6). One is made by SLINK ('simout.dat'; see step 4). The fourth, 'datafile.dat', must be made using PREPLINK before running the desired analysis program. The fifth input file, 'limit.dat', contains three threshold values (e.g., 1 2 3) used to determine the proportion of replicates exceeding a given lod score limit.

The 'datafile.dat' is a standard LINKAGE data file that determines how the analyses of the simulated data will be carried out. It must be in MLINK format for MSIM, ILINK format for ISIM, and LINKMAP format for LSIM. The example 'datafile.dat' below is in MLINK format and is thus appropriate as input to MSIM (the modified version of MLINK).

File: 'datafile.dat'

3 0 0 5 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1 3 2
3 3 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-599Ha/TaqI
0.320000 0.160000 0.520000 << GENE FREQUENCIES
1 2 << AFFECTION, NO. OF ALLELES Locus Schizophrenia
0.991500 0.008500 << GENE FREQUENCIES
3 << NO. OF LIABILITY CLASSES
1.0000 0.1400 0.0000 Normal
0.0000 0.8600 0.0000 Affected
1.0000 0.0000 0.0000 Married in Normals << PENETRANCES
3 2 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-153Ra/XbaI
0.330000 0.670000 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.11970 0.12500 << RECOMBINATION VALUES
2 0.12500 0.38000 << REC VARIED, INCREMENT, FINISHING VALUE

NOTE: If you choose too small an increment, you will get an error message that maxpnt is too small. Maxpnt is the maximum number of thetas (or map positions in LSIM) at which the lod score can be evaluated. You may fix this problem either by choosing a larger increment or by increasing maxpnt and recompiling.

Step 6) Run UNKNOWN to create a 'speedfile.dat' and an 'ipedfile.dat'.

Once 'pedfile.dat' has been created by SLINK, it must be processed with the program UNKNOWN (of the LINKAGE package) before running any of the analysis programs. UNKNOWN creates the pedigree file 'ipedfile.dat' and the file 'speedfile.dat' ('speedfil.dat' on DOS machines). These two files are needed as input to MSIM, ISIM, or LSIM.

Step 7) Run the appropriate analysis program, such as MSIM or LSIM.

5. TYPICAL APPLICATIONS

Besides 'speedfile.dat' (created by UNKNOWN) and 'simout.dat' (created by SLINK), the analysis programs MSIM, LSIM, and ISIM each read the following three input files:

1) A parameter file called 'limit.dat'. This file holds three thresholds (limits) for the maximum lod score; the programs approximate the probability with which the maximum lod score exceeds each of the three thresholds. Typical threshold values are 1, 2, and 3. Note that numbers must have at least one digit to the left of any decimal point, for example, 0.5; a number given as .5 would lead to an error.

2) A locus file, 'datafile.dat', analogous to the one used for the LINKAGE programs. You may simply copy 'simdata.dat' to 'datafile.dat' and modify it (i) to correspond to the analysis program to be used and (ii) to reflect the theta values at which the analysis is to be carried out. Note that 'datafile.dat' is also used as an input file to the UNKNOWN program.

3) A pedigree file, 'ipedfile.dat', created by the UNKNOWN program.
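The three thresholds in 'limit.dat' are applied to each replicate's maximum lod score. The sketch below illustrates that bookkeeping under stated assumptions: it checks the "digit before the decimal point" rule, takes each replicate's grid maximum and refines it with a three-point parabola using the guards listed for QUADMAX in section 0 (no extrapolation, interval lengths within a factor of 4, first lod score not below -100), and then counts exceedances. The parabola-vertex formula is the generic one and may differ in detail from the Ott (1991) formulas used by QUADMAX; reading only the first three tokens of 'limit.dat' and using a strict '>' comparison are also assumptions. All function names are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

_NUMBER = re.compile(r"-?\d+(\.\d*)?$")  # requires a digit before any '.'


def read_limits(text: str) -> List[float]:
    """Parse three thresholds; reject forms like '.5' that the programs reject."""
    tokens = text.split()[:3]
    if len(tokens) != 3:
        raise ValueError("limit.dat needs three thresholds")
    for tok in tokens:
        if not _NUMBER.match(tok):
            raise ValueError(f"bad threshold {tok!r}: write 0.5, not .5")
    return [float(tok) for tok in tokens]


def zmax(thetas: Sequence[float], lods: Sequence[float]) -> Tuple[float, float]:
    """(theta, Z) at the grid maximum, refined by a parabola through it and its
    two neighbours when the QUADMAX-style guards allow."""
    i = max(range(len(lods)), key=lambda j: lods[j])
    best = (thetas[i], lods[i])
    if i == 0 or i == len(lods) - 1:
        return best  # maximum at an end of the grid: no extrapolation
    (t1, t2, t3), (z1, z2, z3) = thetas[i - 1:i + 2], lods[i - 1:i + 2]
    h1, h2 = t2 - t1, t3 - t2
    if max(h1, h2) > 4 * min(h1, h2) or z1 < -100:
        return best
    # Leading coefficient of the interpolating parabola; must be negative.
    a = (z1 / ((t1 - t2) * (t1 - t3)) + z2 / ((t2 - t1) * (t2 - t3))
         + z3 / ((t3 - t1) * (t3 - t2)))
    if a >= 0:
        return best
    num = (t2 - t1) ** 2 * (z2 - z3) - (t2 - t3) ** 2 * (z2 - z1)
    den = (t2 - t1) * (z2 - z3) - (t2 - t3) * (z2 - z1)
    t = t2 - 0.5 * num / den  # vertex; lies between t1 and t3 here
    z = (z1 * (t - t2) * (t - t3) / ((t1 - t2) * (t1 - t3))
         + z2 * (t - t1) * (t - t3) / ((t2 - t1) * (t2 - t3))
         + z3 * (t - t1) * (t - t2) / ((t3 - t1) * (t3 - t2)))
    return t, z


def exceedance(zmaxes: Sequence[float], limits: Sequence[float]) -> List[float]:
    """Proportion of replicates whose maximum lod score exceeds each limit."""
    return [sum(z > lim for z in zmaxes) / len(zmaxes) for lim in limits]


limits = read_limits("1 2 3")
print(exceedance([0.5, 1.5, 2.5, 3.5], limits))  # [0.75, 0.5, 0.25]
```

Because the parabola is fitted around the grid maximum, its vertex always falls between the two neighbouring thetas, which is one simple way to honour the "no extrapolation" rule.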
Details of program usage are given below.

5A. MSIM: Approximating the expected lod score

First we show how to use MSIM, which summarizes its results in the file 'msim.dat'. The first part of 'msim.dat' contains information defining the simulation, such as the random number seed, the number of replicates, the requested proportion of unlinked families, and the trait locus number. This information is taken from the most recent 'simout.dat'. Note that the thetas and the locus order presented in the 'simout.dat' section pertain to the model under which the simulation was carried out (as previously specified in the 'simdata.dat' file). These may differ from the thetas and locus order used in the analysis of the simulated data.

The rest of 'msim.dat' provides statistical information. If two loci are used, the results are reported on the traditional lod score scale. When three or more loci are used, multipoint lod scores are computed as the log likelihood at the current thetas minus the log likelihood with the theta involving the trait locus set to 0.50. For this reason, when MLINK or MSIM is used, the trait locus must be either the leftmost or the rightmost locus for these statistics to make sense.

Normally, MSIM, like MLINK, will be used for analyses involving only two loci. In this example, however, we use three loci but place the trait locus (number 2) on the right by specifying the locus order '1 3 2' in the 'datafile.dat' file (see above). Lod scores for the position of a disease relative to a map of markers are usually computed with LINKMAP (or LSIM; see the next section); MSIM is used here for demonstration purposes.
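The 'simout.dat' section of the 'msim.dat' listing reports true thetas of 0.0739 and 0.0539 for the linked order and 0.119834 between the two markers for the unlinked order. This document does not say which map function turned the 8 cM and 5.7 cM of Step 1 into thetas; the sketch below assumes Haldane's map function, which reproduces the 'simdata.dat' values to four decimals, and combines the two adjacent intervals under no interference (interference is 0 in 'simdata.dat'), which reproduces 0.119834. The function names are illustrative.

```python
import math


def haldane_theta(cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def combine_no_interference(r1: float, r2: float) -> float:
    """Recombination fraction across two adjacent intervals, no interference:
    a recombinant results when exactly one of the two intervals recombines."""
    return r1 + r2 - 2.0 * r1 * r2


print(round(haldane_theta(8.0), 4), round(haldane_theta(5.7), 4))  # 0.0739 0.0539
print(round(combine_no_interference(0.0739, 0.0539), 6))            # 0.119834
```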
File: 'msim.dat'

********* Data from most recent SIMOUT.DAT *********
The random number seed is: 25432
The number of replications is: 20
The requested proportion of unlinked families is: 0.000
The trait locus is locus number: 2

Summary Statistics about simped.dat
Number of pedigrees 2
Number of people 31
Number of females 12
Number of males 19
There were 2 in category: Marker Unknown; Trait original
There were 0 in category: Marker Available; Trait simulated
There were 29 in category: Marker Available; Trait original
There were 0 in category: Marker Unknown; Trait simulated

LINKAGE (V4.91) WITH 3-POINT AUTOSOMAL DATA
-----------------------------------
LINKED ORDER OF LOCI: 1 2 3
-----------------------------------
-----------------------------------
TRUE THETAS FOR LINKED ORDER
0.073900 0.053900
-----------------------------------
-----------------------------------
UNLINKED ORDER OF LOCI: 2 1 3
-----------------------------------
-----------------------------------
TRUE THETAS FOR UNLINKED ORDER
0.500000 0.119834
-----------------------------------
Elapsed Time for one replicate = 8 seconds
Elapsed Time = 162 seconds or 2.70 minutes.
-----------------------------------
Actual proportion of unlinked families: 0.000
********* End of most recent SIMOUT.DAT *********

ORDER OF LOCI: 1 3 2
Average Multipoint Lod Scores at Given Thetas
Number of replicates = 20
----------------------------------------------------------------
THETAS 0.120 0.125
----------------------------------------------------------------
Pedigree   Average    StdDev      Min         Max
1          0.928233   0.823127   -0.499552    2.372393
2          0.581052   0.458913   -0.066598    1.564117
Study      1.509285   0.725744    0.163738    3.215271
----------------------------------------------------------------
THETAS 0.120 0.250
----------------------------------------------------------------
Pedigree   Average    StdDev      Min         Max
1          0.673438   0.524425   -0.170536    1.607466
2          0.355322   0.371757   -0.404606    0.957419
Study      1.028760   0.507228   -0.259767    2.132345
----------------------------------------------------------------
THETAS 0.120 0.375
----------------------------------------------------------------
Pedigree   Average    StdDev      Min         Max
1          0.313556   0.231926   -0.036782    0.742686
2          0.140380   0.209244   -0.369045    0.398341
Study      0.453935   0.265833   -0.275562    0.937106
----------------------------------------------------------------

Brief explanation of the output:

The file 'msim.dat' contains information defining the simulation, followed by tables providing statistical information about the distribution of the simulated lod scores. The 'Average' column gives the expected (mean) lod score for each pedigree and for the study (i.e., the set of families analyzed together). The 'StdDev' column gives the standard deviation of the lod score. The 'Min' and 'Max' columns list the smallest and largest lod scores encountered over all the replicates, for each pedigree separately and for the study as a whole. Note that the 'Average' column is the only column in which the Study value equals the sum of the Pedigree values.
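The statement about the 'Average' column follows from the Study lod score of each replicate being the sum of the pedigree lod scores for that replicate: means add, but standard deviations, minima, and maxima need not. The sketch below recomputes such a summary from made-up replicate lod scores (not the 'msim.dat' data); the function name is hypothetical, and the sample (n - 1) standard deviation is an assumption, since this document does not say which divisor MSIM uses.

```python
import statistics
from typing import Dict, List, Tuple, Union

Row = Tuple[float, float, float, float]  # Average, StdDev, Min, Max


def summarize(lods_by_pedigree: Dict[int, List[float]]) -> Dict[Union[int, str], Row]:
    """lods_by_pedigree[p][r] is the lod score of pedigree p in replicate r."""
    def row(values: List[float]) -> Row:
        return (statistics.mean(values), statistics.stdev(values),
                min(values), max(values))

    rows: Dict[Union[int, str], Row] = {p: row(v) for p, v in lods_by_pedigree.items()}
    # The study lod score of a replicate is the sum over pedigrees.
    study = [sum(reps) for reps in zip(*lods_by_pedigree.values())]
    rows["Study"] = row(study)
    return rows


# Made-up lod scores for 2 pedigrees in 3 replicates.
rows = summarize({1: [1.0, -0.5, 2.0], 2: [0.0, 1.5, -1.0]})
for name, (avg, sd, lo, hi) in rows.items():
    print(name, round(avg, 4), round(sd, 4), lo, hi)
```

Here every replicate gives a study lod score of 1.0, so the Study average equals the sum of the pedigree averages, while the Study maximum (1.0) is far below the sum of the pedigree maxima (2.0 + 1.5); the sum of pedigree maxima is only an upper bound for the study.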
If one wants to approximate the absolute smallest (or largest) lod score that the whole study could yield, one should add up the Min (or Max) values over pedigrees (if all have the same sign) rather than look at the Study Min (or Max) value.

5B. LSIM: Disease versus map of markers

If we want to run LSIM (the modified version of LINKMAP), we need a different 'datafile.dat' from the one used above for MSIM, because the data file for LSIM must be in standard LINKMAP format. Like LINKMAP, LSIM is most appropriate for calculating lod scores of the trait locus at various positions across a fixed map of marker loci. As mentioned above, a multipoint lod score requires that the likelihood of the data first be calculated with the trait locus "off" the map. Thus, there are two requirements for getting accurate results out of LSIM:

    1) the trait locus must start out as the leftmost locus; and
    2) the trait locus must be placed "off" the map on the left, at a
       recombination fraction of 0.50.

This allows calculation of multipoint lod scores as the trait locus is moved across the whole map. In this example, the trait locus is number 2, so the locus order must be specified as '2 1 3' in order to have the trait locus off the map on the left. Since the trait locus must be placed off the map, the first recombination fraction is 0.50.

File: 'datafile.dat'

    3 0 0 4 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
    0 0.0 0.0 0 << MUT LOCUS, MUT RATE, HAPLOTYPE FREQUENCIES (IF 1)
    2 1 3
    3 3 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-599Ha/TaqI
    0.320000 0.160000 0.520000 << GENE FREQUENCIES
    1 2 << AFFECTION, NO. OF ALLELES Locus Schizophrenia
    0.991500 0.008500 << GENE FREQUENCIES
    3 << NO. OF LIABILITY CLASSES
    1.0000 0.1400 0.0000 Normal
    0.0000 0.8600 0.0000 Affected
    1.0000 0.0000 0.0000 Married in Normals << PENETRANCES
    3 2 << ALLELE NUMBERS, NO. OF ALLELES Locus p105-153Ra/XbaI
    0.330000 0.670000 << GENE FREQUENCIES
    0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
    0.50000 0.11980 << RECOMBINATION VALUES
    2 0.11970 4 << LOCUS VARIED, FINISHING VALUE, NU OF EVALUATIONS

LSIM (the modified version of LINKMAP) creates the following output file, which has been edited to conserve space. Note how LSIM, unlike LINKMAP, automatically moves the trait locus across each interval in the fixed map of marker loci, as indicated by the different locus orders in the tables below (the middle number on the last line of 'datafile.dat', the finishing value, is irrelevant for LSIM).

File: 'lsim.dat' (edited)

    ********* Data from most recent SIMOUT.DAT *********
    -----------------------------------
    LINKED ORDER OF LOCI: 1 2 3
    -----------------------------------
    -----------------------------------
    TRUE THETAS FOR LINKED ORDER
    0.073900 0.053900
    -----------------------------------
    -----------------------------------
    UNLINKED ORDER OF LOCI: 2 1 3
    -----------------------------------
    -----------------------------------
    TRUE THETAS FOR UNLINKED ORDER
    0.500000 0.119834
    -----------------------------------
    Actual proportion of unlinked families: 0.000
    ********* End of most recent SIMOUT.DAT *********

    Average Multipoint Lod Scores at Given Thetas
    Number of replicates = 20
    ----------------------------------------------------------------
    Locus Order: 2 1 3
    THETAS  0.500  0.120
    Number of replicates with a maximum at this location  0
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.000000    0.000000    0.000000    0.000000
    2           0.000000    0.000000    0.000000    0.000000
    Study       0.000000    0.000000    0.000000    0.000000
    ----------------------------------------------------------------
    Locus Order: 2 1 3
    THETAS  0.125  0.120
    Number of replicates with a maximum at this location  1
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.986943    0.957538   -0.563674    2.811669
    2           0.553123    0.587166   -0.495850    1.583681
    Study       1.540066    0.943955   -1.059524    3.268075
    ----------------------------------------------------------------
    Locus Order: 1 2 3
    THETAS  0.060  0.068
    Number of replicates with a maximum at this location  2
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           1.147581    1.148037   -0.900799    3.205506
    2           0.751585    0.636813   -0.343115    2.097035
    Study       1.899166    1.040807   -0.212925    4.120526
    ----------------------------------------------------------------
    Locus Order: 1 2 3
    THETAS  0.090  0.037
    Number of replicates with a maximum at this location  0
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           1.097943    1.163443   -1.264877    3.126892
    2           0.733534    0.603973   -0.154687    2.095361
    Study       1.831477    1.055195   -0.426049    4.131400
    ----------------------------------------------------------------
    Locus Order: 1 3 2
    THETAS  0.120  0.125
    Number of replicates with a maximum at this location  3
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.928120    0.823000   -0.499557    2.372007
    2           0.580943    0.458917   -0.066556    1.564077
    Study       1.509064    0.725652    0.163398    3.214886
    ----------------------------------------------------------------
    Locus Order: 1 3 2
    THETAS  0.120  0.250
    Number of replicates with a maximum at this location  0
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.673346    0.524329   -0.170544    1.607176
    2           0.355246    0.371761   -0.404694    0.957392
    Study       1.028592    0.507163   -0.259854    2.132040
    ----------------------------------------------------------------
    Locus Order: 1 3 2
    THETAS  0.120  0.375
    Number of replicates with a maximum at this location  0
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.313506    0.231874   -0.036785    0.742532
    2           0.140356    0.209227   -0.369050    0.398359
    Study       0.453862    0.265785   -0.275485    0.936943
    ----------------------------------------------------------------

The output in LSIM.DAT is very similar to that in MSIM.DAT (Average, Standard Deviation, Minimum and Maximum). However, there is an additional line that records the number of replicates with a maximum at that location. This number indicates how often the trait locus would be mapped to that particular location, and so gives a sense of the best-supported location for the trait locus. In our example, we simulated with the trait locus between the two marker loci (linked order 1 2 3). In 3 of the 20 replicates, however, the maximum occurred with the trait locus outside this interval, at a theta of 0.125 beyond the second marker locus (locus order 1 3 2; the theta of 0.120 is the one between the two marker loci).

NOTE: The program parameter maxpnt, which defines the maximum number of points at which the lod score can be evaluated, may cause problems when the increment of theta is too small. Maxpnt appears in MSIM and LSIM, but not in ISIM. If the increment of theta is set too small, MSIM or LSIM may terminate with the following error message:

    ERROR: maxpnt is too small.

The solution is either 1) to evaluate the lod score at fewer points or 2) to set maxpnt higher and recompile the programs. In LSIM, the points are evaluated in each interval rather than once over the whole map, which should be kept in mind when choosing the number of points.

5C. ISIM: Average maximum lod score and power

With ISIM (as with ILINK), the recombination fraction between two loci can be iterated, so the lod score obtained for a given replicate is maximized over theta. Thus, ISIM can approximate the average (expectation) of the maximum lod score. In our example application of ISIM, we use the files from Example 1, where we simulated the marker genotypes for two affected children in a simple nuclear family in which the two parents are known to have four different marker alleles. The files 'simdata.dat' and 'simped.dat' have been explained above. The 'datafile.dat' must be in ILINK format. If the recombination fraction between the two loci is iterated, then the average of the maximum lod score will be calculated.
If the recombination fraction is not iterated, then the average of the (unmaximized) lod score will be calculated at the requested thetas. In the file below, we request that the recombination fraction be iterated. Note: ISIM may not maximize the log likelihood over theta correctly if the disease gene frequencies are extreme (even when they are fixed). You may also want to designate locus 1 or locus 2 as the locus with iterated parameters.

File: 'datafile.dat'

    2 0 0 3 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
    0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
    1 2
    1 2 << AFFECTION, NO. OF ALLELES
    0.9900 0.0100 << GENE FREQUENCIES
    1 << NO. OF LIABILITY CLASSES
    0.0000 0.0000 1.0000 << PENETRANCES
    3 4 << ALLELE NUMBERS, NO. OF ALLELES
    0.250000 0.250000 0.250000 0.250000 << GENE FREQUENCIES
    0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
    0.01000 << RECOMBINATION VALUES
    1 << THIS LOCUS MAY HAVE ITERATED PARS
    1

When we run ISIM, we obtain the following output, which reports the average of the maximum lod score:

File: 'isim.dat' (edited)

    ********* Data from most recent SIMOUT.DAT *********
    The random number seed is: 29876
    The number of replications is: 200
    The requested proportion of unlinked families is: 0.000
    The trait locus is locus number: 1
    Summary Statistics about simped.dat
    Number of pedigrees 1
    Number of people 4
    Number of females 2
    Number of males 2
    There were 0 in category: Marker Unknown; Trait original
    There were 0 in category: Marker Available; Trait simulated
    There were 4 in category: Marker Available; Trait original
    There were 0 in category: Marker Unknown; Trait simulated
    LINKAGE (V4.91) WITH 2-POINT AUTOSOMAL DATA
    -----------------------------------
    LINKED ORDER OF LOCI: 1 2
    -----------------------------------
    -----------------------------------
    TRUE THETAS FOR LINKED ORDER
    0.010000
    -----------------------------------
    Actual proportion of unlinked families: 0.000
    ********* End of most recent SIMOUT.DAT *********

    Average Maximum Lod Score
    The number of replicates is 200
    --------------------------------------------------------
    Average Maximum |   StdDev   |    Min     |    Max
    0.576825           0.117744     0.000000     0.600859
    --------------------------------------------------------

    --------------------------------------------------------
    Number of maximum lod scores greater than a given constant
    --------------------------------------------------------
    Constant    Number    Percent
    1           161       80.500    (assumed values)
    2           0          0.000
    3           0          0.000
    --------------------------------------------------------

Note that no theta value is reported with the Average Maximum Lod Score, because the maximum lod score in each replicate may correspond to a different estimate of theta. Also note that the average maximum lod score is not additive over families or studies. The average lod score (ELOD), calculated previously, does possess this desirable property of additivity (see Ott, 1991).

The last table in 'isim.dat' gives the number and percentage of replicates in which the maximized lod score (Zmax) was greater than each of the given constants (the critical levels furnished in 'limit.dat'). The proportion of replicates with Zmax > 3 is an approximation to the power, i.e., to the probability of finding significant linkage.

5D. MSIM: Interpolating Zmax in each replicate

The average of the maximum lod score (as calculated by ISIM) may be much more time consuming to compute than average lod scores at fixed thetas, because maximizing the lod score requires many evaluations of it. To partially avoid this problem, MSIM can estimate maximum lod scores by quadratic interpolation. The quadratic interpolation option works automatically, but only in the special case where there are two loci, autosomal inheritance, and the 'datafile.dat' for MSIM requests evaluation of the lod score at three or more distinct points. To get the best results from the quadratic interpolation, you should evaluate the lod score at more than three points.
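The idea behind the interpolation can be illustrated with a short Python sketch. It is not a transcription of MSIM's QUADMAX routine (which is based on formulas (8.8)-(8.14) of Ott, 1991); it fits a generic parabola through three (theta, Z) pairs and applies the safeguards listed for QUADMAX in section 0: no extrapolation beyond the outer thetas, no interpolation when the two intervals differ in length by more than a factor of 4, and none when the first lod score is below -100. The function names, and the choice of the grid maximum plus its two neighbours as the three points, are assumptions made for illustration.

```python
def parabola_max(thetas, lods):
    """Vertex of the parabola through three (theta, Z) points.

    Returns (theta_hat, z_hat), or None when no interpolation should be done.
    """
    (t0, t1, t2), (z0, z1, z2) = thetas, lods
    d1, d2 = t1 - t0, t2 - t1
    if d1 <= 0 or d2 <= 0:
        raise ValueError("thetas must be strictly increasing")
    if z0 < -100:  # lod score effectively minus infinity (e.g. theta = 0)
        return None
    if max(d1, d2) > 4 * min(d1, d2):  # intervals too unequal
        return None
    s1 = (z1 - z0) / d1        # first divided differences
    s2 = (z2 - z1) / d2
    a = (s2 - s1) / (t2 - t0)  # second divided difference = leading coefficient
    if a >= 0:                 # not concave: no interior maximum
        return None
    b = s1 - a * (t0 + t1)     # linear coefficient of a*t^2 + b*t + c
    theta_hat = -b / (2 * a)
    if not t0 <= theta_hat <= t2:  # never extrapolate
        return None
    # Newton form of the interpolating parabola, evaluated at the vertex.
    z_hat = z0 + s1 * (theta_hat - t0) + a * (theta_hat - t0) * (theta_hat - t1)
    return theta_hat, z_hat


def zmax_interpolated(thetas, lods):
    """Zmax for one replicate: the grid maximum, refined by a parabola through
    the grid maximum and its two neighbours when it is an interior point."""
    k = max(range(len(lods)), key=lambda i: lods[i])
    best = lods[k]
    if 0 < k < len(lods) - 1:
        fit = parabola_max(thetas[k - 1:k + 2], lods[k - 1:k + 2])
        if fit is not None:
            best = max(best, fit[1])
    return best


if __name__ == "__main__":
    grid = [0.01, 0.11, 0.21, 0.31, 0.41]
    lods = [1 - 10 * (t - 0.15) ** 2 for t in grid]  # made-up lod curve
    print(max(lods), zmax_interpolated(grid, lods))
```

For the made-up curve Z = 1 - 10(theta - 0.15)^2 on the grid 0.01, 0.11, 0.21, 0.31, 0.41, the largest grid value is 0.984 (at theta = 0.11), while the parabola through the points at 0.01, 0.11 and 0.21 recovers the true peak, Z = 1.0 at theta = 0.15. On a real lod score curve, which is not exactly quadratic, the interpolated value is only an approximation.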
When choosing the points, it is better not to include theta = 0.0, because if there is an obligate recombination, the lod score is minus infinity (use theta = 0.00001 or 0.001 instead). For example, if we run the problem above, starting theta at 0.01 and increasing it in increments of 0.10 up to 0.41, for a total of 5 points, we get the results below, which are very similar to those obtained above by ISIM:

From the file: 'msim.dat'

    ----------------------------------------------------------------
    Average Maximum Lod Scores based on quadratic interpolation
    ----------------------------------------------------------------
    Pedigree     Average      StdDev         Min         Max
    1           0.579848    0.118361    0.000000    0.604008
    ----------------------------------------------------------------

5E. Finding the peak of the expected lod score curve

Theoretically, the expected lod score should have its maximum (the ELOD) at the true recombination fraction. In a simulation, one may verify this by computing the average lod score at several assumed theta values in the vicinity of the true recombination fraction under which the data were generated. The peak should occur at or near the true recombination fraction. MSIM or LSIM may be used for these calculations.

5F. MSIM and HOMOG and ELODHET: analysis under heterogeneity

As outlined above, SLINK can generate data under heterogeneity (alpha < 1), but data so generated should not be analyzed assuming homogeneity, for example, by using the output of the MSIM program. If families generated with alpha < 1 are analyzed with MSIM, that is, assuming alpha = 1, the resulting lod scores are too low and do not correspond to the ELOD under heterogeneity; also, the estimates of theta are biased upwards. When data are generated under heterogeneity, the easiest approach to analysis is to focus on the expected lod score (ELOD); working with the probability that the maximum lod score exceeds a given threshold requires more complicated manipulations than described here.
The replicates generated by SLINK are analyzed with MSIM, but the lod score summaries that MSIM writes to MSIM.DAT are disregarded. Instead, one takes the LODFILE.DAT output generated by MSIM and analyzes it using either the HOMOG program (version 3.30 or higher) or a utility program called ELODHET. The latter program uses LODFILE.DAT and MSIM.DAT as input files and computes, for each family, its expected lod score under heterogeneity. Presently, its use is restricted to two loci (a trait and a marker locus). For analysis by the HOMOG program, the data in LODFILE.DAT must be completed to form a proper input file for HOMOG.

When data are generated under heterogeneity, it is usually most appropriate to analyze them under heterogeneity as well. The ELODHET program accomplishes this task. It carries out two types of analyses:

1) For the given values of alpha (proportion of families with linkage) and theta (recombination fraction in families with linkage) under which the family data were generated, ELODHET calculates the expected lod score Z for each family and, for all families jointly, the probability (power) that the lod score at alpha and theta exceeds given thresholds (as furnished in the 'limit.dat' file). The lod score under heterogeneity, whose average over replicates is the expected lod score, is defined as

    Z(alpha, theta) = log10[L(alpha, theta)] - log10[L(1, 0.5)],

where L is the usual likelihood under heterogeneity, that is, for the i-th family,

    Li(alpha, theta) = alpha * Li(theta) + 1 - alpha,

and Li(theta) is the antilog of the i-th family's lod score.

2) Expected maximum lod scores are calculated by evaluating, in each replicate and for all families jointly, the maximum likelihood over a grid of values of alpha and theta (irrespective of the alpha and theta values used to generate the data). That is, each simulated replicate furnishes

    Zmax = log10[Lmax] - log10[L(0, 0.5)],

where Lmax is the likelihood L(alpha, theta) at the estimates of alpha and theta obtained in that replicate. ELODHET then computes the average of the Zmax values and the proportion of replicates in which Zmax exceeds the given thresholds (power).
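The two ELODHET calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not ELODHET itself: the per-family lod scores of each replicate are assumed to be already in memory on a common theta grid (a hypothetical layout; the format of LODFILE.DAT is not described here), the generating theta is assumed to be one of the grid points, and the alpha grid uses the step of 0.025 that ELODHET uses. As in ELODHET, no interpolation is done: Zmax is simply the largest heterogeneity lod score on the (alpha, theta) grid. All function names are illustrative.

```python
import math


def het_lod(lods_at_theta, alpha):
    """Heterogeneity lod score at one theta, summed over families:
    sum_i log10(alpha * Li(theta) + 1 - alpha), with Li(theta) = 10**Z_i(theta)."""
    total = 0.0
    for z in lods_at_theta:
        term = alpha * 10.0 ** z + (1.0 - alpha)
        if term <= 0.0:  # alpha = 1 and a family lod of minus infinity
            return -math.inf
        total += math.log10(term)
    return total


def zmax_het(family_lods, alpha_step=0.025):
    """Maximum heterogeneity lod over a grid of alpha and theta values.

    family_lods[i][j] is the lod score of family i at the j-th grid theta.
    No interpolation is done between grid points.
    """
    n_alpha = round(1.0 / alpha_step)
    best = 0.0  # alpha = 0 always gives a lod score of 0
    for k in range(n_alpha + 1):
        alpha = k / n_alpha  # exactly 0.0 and 1.0 at the ends of the grid
        for j in range(len(family_lods[0])):
            best = max(best, het_lod([fam[j] for fam in family_lods], alpha))
    return best


def summarize_replicates(replicates, alpha0, j0, limits, alpha_step=0.025):
    """replicates[r][i][j]: lod of family i in replicate r at grid theta j.
    alpha0 and the grid index j0 are the values used to generate the data.
    Returns (ELOD per family, average Zmax, power of Z(alpha0, theta0),
    power of Zmax); power is the proportion of replicates above each limit."""
    n = len(replicates)
    n_fam = len(replicates[0])
    elod = [sum(het_lod([rep[i][j0]], alpha0) for rep in replicates) / n
            for i in range(n_fam)]
    study_z = [het_lod([fam[j0] for fam in rep], alpha0) for rep in replicates]
    zmax = [zmax_het(rep, alpha_step) for rep in replicates]
    power_at = {c: sum(z > c for z in study_z) / n for c in limits}
    power_max = {c: sum(z > c for z in zmax) / n for c in limits}
    return elod, sum(zmax) / n, power_at, power_max


if __name__ == "__main__":
    # Two made-up replicates with two families each, on a two-point theta grid.
    reps = [[[1.0, 0.5], [-1.0, -0.2]],
            [[-0.5, 0.1], [0.2, 0.1]]]
    print(summarize_replicates(reps, alpha0=0.5, j0=0, limits=[0.1, 0.3]))
```

Because each family contributes a separate log10 term, the per-family expected lod scores add up to the expected study lod score, just as under homogeneity; the Zmax values, being maxima over the grid, do not share this property.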
Note that Zmax involves two degrees of freedom (alpha and theta), so a given lod score threshold is somewhat less conservative for it than for the lod score under homogeneity. In this sense, the Zmax under heterogeneity is comparable to obtaining separate estimates of the male and female recombination fractions.

To use the ELODHET program, one must first run SLINK and MSIM. For the MSIM run, lod scores should be evaluated at suitable intervals, for example, starting at theta = 0.00001 and increasing in steps of 0.05 up to a limiting value of 0.47. In its present implementation, ELODHET uses a step size of 0.025 for alpha. As in the HOMOG programs, no interpolation is carried out to approximate maximum lod scores. The ELODHET program reads the LODFILE.DAT file created by MSIM and writes its results to an output file called ELODHET.DAT.

6. TECHNICAL INFORMATION

As indicated below, the programs in this package are modified versions of LINKAGE programs. They were modified by Daniel E. Weeks with help from Mark Lathrop. Please address any questions or problems to:

    Daniel E. Weeks
    University of Pittsburgh
    Department of Human Genetics
    130 DeSoto Street
    A300 Crabtree Hall
    Pittsburgh, PA 15261
    (412) 624-5388 or (412) 624-3018
    FAX: (412) 624-3020
    Email: weeks@pitt.edu

The programs are distributed by J. Ott in versions for Vax (VMS) computers ("Disk" 21) and microcomputers running under DOS or OS/2 ("Disks" 20a through 20d). The DOS and OS/2 versions have been adapted to Prospero Pascal. For details on compiling and linking, see the PROSPERO.TXT file. Disk 20b contains a running version of the UNKNOWN program; you may also use the UNKNOWN program furnished with the LINKAGE programs.
Differences between Prospero and Vax Pascal (double-precision real variables):

                          Prospero              Vax
                          ------------------    --------------
    Top line              none                  [G_FLOATING]
    Type real =           LONGREAL              DOUBLE
    Closing files         CLOSE(ff,true)        CLOSE(ff)
    Clock function        explicit (present)    built-in
    File buffer           FBUFFER(ff,4096)      not used
    Assign statements     ASSIGN                not used

In addition, in procedure inib, replace ASSIGN by OPEN for Vax Pascal.

6A. Program constants

    Program       Purpose/Description
    ---------------------------------------------------
    Simulation
      SLINK       General simulation program
    Analysis
      MSIM        Modified version of MLINK
      ISIM        Modified version of ILINK
      LSIM        Modified version of LINKMAP

The analysis programs, MSIM, ISIM, and LSIM, read the number of pedigrees per replicate (i.e., per study) from the summary file 'simout.dat' produced by SLINK.

There are several constants that may be changed to modify the behavior of the programs. This requires editing the source code followed by recompilation. Compilation instructions are in the documentation file README.txt.

A standard version of one of the LINKAGE programs, such as MLINK, produces a lot of output. To conserve disk space and speed up the SLINK programs, most of the 'standard' output has been turned off by a logical flag. This constant is called 'prn' or 'print' and is identified by a comment in the source code of each program. Set the constant to TRUE if you want more output.

For an affection status locus, unaffected individuals are indicated by the integer constant assigned to 'unaff'. In the LINKAGE programs, the phenotype at a quantitative locus may consist of several measurements (or traits). However, SLINK supports only ONE measurement (or trait) for a quantitative locus. Unaffected males at a sex-linked locus are indicated by the integer constant assigned to 'unafqu'.

MSIM will write a list of lod scores by pedigree to the file 'lodfile.dat' if the constant 'lodprint' is set to TRUE; however, it only writes out the male thetas. This list may be useful if you wish to produce a histogram of lod scores.

6B. References for SLINK

Please use the following two references when reporting results based on SLINK:

Ott J (1989) Computer-simulation methods in human linkage analysis.
    Proc Natl Acad Sci USA 86:4175-4178

Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation program for linkage analysis.
    Am J Hum Genet 47:A204 (abstr)

6C. Acknowledgments

We would like to thank Joseph Terwilliger and Suzanne Leal for their helpful comments. Support by the W.M. Keck Foundation and grants MH44292 from NIMH and HG00008 from the National Center for Human Genome Research is gratefully acknowledged.

7. LITERATURE

Boehnke M (1986) Estimating the power of a proposed linkage study: a practical computer simulation approach.
    Am J Hum Genet 39:513-527

Lange K, Matthysse S (1989) Simulation of pedigree genotypes by random walks.
    Am J Hum Genet 45:959-970

Lathrop GM, Lalouel JM, Julier C, Ott J (1984) Strategies for multilocus linkage analysis in humans.
    Proc Natl Acad Sci USA 81:3443-3446

Ott J (1985) Analysis of human genetic linkage, first edition.
    Johns Hopkins University Press, Baltimore

Ott J (1989) Computer-simulation methods in human linkage analysis.
    Proc Natl Acad Sci USA 86:4175-4178

Ott J (1991) Analysis of human genetic linkage, second edition.
    Johns Hopkins University Press, Baltimore and London

Ploughman LM, Boehnke M (1989) Estimating the power of a proposed linkage study for a complex genetic trait.
    Am J Hum Genet 44:543-551

Sandkuyl L, Ott J (1989) Determining informativity of marker typing for genetic counseling in a pedigree.
    Hum Genet 82:159-162

Sherrington R, Brynjolfsson J, Petursson H, Potter M, Dudleston K, Barraclough B, Wasmuth J, Dobbs M, Gurling H (1988) Localization of a susceptibility locus for schizophrenia on chromosome 5.
    Nature 336:164-167

Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation program for linkage analysis.
    Am J Hum Genet 47:A204 (abstr)

Figure 1: data flow between the programs

    simdata.dat, simped.dat, slinkin.dat  --->  SLINK    --->  pedfile.dat, simout.dat
    datafile.dat, pedfile.dat             --->  UNKNOWN  --->  ipedfile.dat, speedfile.dat
    datafile.dat, ipedfile.dat, speedfile.dat, simout.dat, limit.dat
                                          --->  MSIM, ISIM, LSIM
    MSIM, ISIM, LSIM                      --->  outfile.dat, lodfile.dat,
                                                msim.dat, isim.dat, lsim.dat
    lodfile.dat, msim.dat, limit.dat      --->  ELODHET  --->  elodhet.dat