V1.2
Last Modification: November 10, 1996
SimIBD is the software implementation of the algorithms described in Davis et al., "Nonparametric simulation-based statistics for detecting linkage in general pedigrees," (1996) Am J Hum Genet 58:867-880
Please use the following four references when using results from these programs.
The algorithms used in simibd are based on:
1) Davis S, Schroder M, Goldin LR, and Weeks DE, "Nonparametric simulation-based statistics for detecting linkage in general pedigrees," (1996) Am J Hum Genet 58: 867-880.
Also, please reference the following three papers that deal with the SLINK code:
SLINK implements a simulation algorithm developed by Jurg Ott and described in:
2) Ott J (1989) Computer-simulation methods in human linkage analysis. Proc Natl Acad Sci USA 86:4175-4178
The algorithm was implemented in the original SLINK computer program package by Weeks, Ott, and Lathrop:
3) Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation program for linkage analysis. Am J Hum Genet 47:A204 (abstr)
The SLINK simulation program has been modified by Schaffer and Weeks to use the algorithms developed by Cottingham et al:
4) Cottingham Jr RW, Idury RM, Schaffer AA (1993) Faster sequential genetic linkage computations. Am J Hum Genet 53:252-263
Please cite references 1-4 if you use this package. Thank you.
In order to use the programs included here, you must compile them to run on your machine. Simply issuing the command:
make
will produce the executable "simibd".
Note that if you have a version of the program already "made", you must issue the command:
make cleanall
before attempting to make a new version.
In our experience, compiling with optimization provides a significant increase in speed while still giving correct answers. If you are unfamiliar with producing optimized code when compiling, see your system administrator for assistance. We recommend using gcc with at least one level of optimization, which is the default setting when compiling with make. The default compiler and options can be changed by editing the "Makefile" in this directory.
To use simibd, you need a LINKAGE-formatted pedigree file and locus file. Then, issue the command:
simibd <pedfile> <locfile>
where <pedfile> is the name of the pedigree file and <locfile> is the name of the locus or data file. You will be prompted for several pieces of information including the family weighting function, the total number of replicates (this number will determine the accuracy of the resulting p-value), the number of the trait locus, the value of the trait signifying affection, and the marker locus to be analyzed. We recommend using the weighting function 1/sqrt(p) [p is the population frequency of a given allele]. The number of replicates will vary depending on the level of accuracy desired for the p-value and, to a lesser extent, on the number of affecteds in the pedigree.
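The three weighting functions offered at the program's prompt (f(p) = 1, 1/sqrt(p), and 1/p) can be compared with a short Python sketch. This is only an illustration of the formulas; the names WEIGHTS and allele_weight are hypothetical and are not part of simibd.

```python
import math

# The three weighting functions offered at the simibd prompt (options A, B, C).
# p is the population frequency of a given allele.
WEIGHTS = {
    "A": lambda p: 1.0,
    "B": lambda p: 1.0 / math.sqrt(p),
    "C": lambda p: 1.0 / p,
}


def allele_weight(option, p):
    """Return f(p) for weighting option 'A', 'B', or 'C' (case-insensitive)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    return WEIGHTS[option.upper()](p)


# Compare how strongly each option up-weights rarer alleles.
for p in (0.5, 0.1, 0.01):
    row = ", ".join(f"{opt}: {allele_weight(opt, p):8.3f}" for opt in "ABC")
    print(f"p = {p:<5} {row}")
```

Option B sits between the other two: it gives rare alleles more weight than option A does, but less than option C.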
If you would like to compute the SimAPM or SimKIN statistics in addition to the SimIBD statistic, issue the same command as above with the optional "-a" or "-k" flag. For example, to get the SimAPM result, issue the command:
simibd -a <pedfile> <locfile>
The command
simibd -ak <pedfile> <locfile>
will generate results for SimIBD, SimKIN, and SimAPM. Simibd keeps you informed of its progress and estimates the time needed to complete the current task. When the run is complete, a great deal of information will have been printed. For your convenience, the brief output is contained in the file "simibd.out", which contains only the summary values of the statistics and the associated p-values for each family. Other output files give more detailed results, including histograms of the data.
The output files are summarized below:
FILE              CONTENTS
---------------------------------------------------------
simibd.out        Brief output from all statistics run
simibd.aff.out    Long output from the SimIBD calculations
simkin.out        Long output from the SimKIN calculations
simapm.out        Long output from the SimAPM calculations
A new feature has been added to SimIBD. Because the pedigrees are independent of one another, bootstrap techniques can be used to improve the precision and accuracy of the p-value for the full set of pedigrees. Version 2.0cond has this option enabled by default, but it can be disabled using the "-b" flag on the command line. If you use it, the null distribution for all families will consist of many more points than the number of simulations performed: exactly (number of simulated replicates) x (number of bootstraps per replicate).
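The idea of bootstrapping across independent pedigrees can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the resampling scheme actually coded in SimIBD: it assumes the total statistic is a sum of per-family values and that each bootstrap point draws one simulated replicate independently for every family. The function name bootstrap_null_totals and the layout of null_stats are hypothetical.

```python
import random


def bootstrap_null_totals(null_stats, n_bootstraps, seed=None):
    """Build a bootstrapped null distribution of total statistics.

    null_stats[r][f] is the (weighted) statistic of family f in simulated
    replicate r. Each returned total sums one independently chosen replicate
    value per family (an assumed scheme). The result has
    len(null_stats) * n_bootstraps points.
    """
    rng = random.Random(seed)
    n_reps = len(null_stats)
    n_fams = len(null_stats[0])
    totals = []
    for _ in range(n_reps * n_bootstraps):
        total = sum(null_stats[rng.randrange(n_reps)][f] for f in range(n_fams))
        totals.append(total)
    return totals


# Toy data: 5 replicates of 3 families, 4 bootstraps per replicate.
toy = [[0.1 * r + f for f in range(3)] for r in range(5)]
points = bootstrap_null_totals(toy, n_bootstraps=4, seed=1)
print(len(points))  # 5 replicates x 4 bootstraps = 20 points
```

Mixing replicates across families is only valid because the families are independent, which is the property the text above relies on.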
It may still be beneficial to use a larger number of replicates when the pedigrees being simulated have a large number of affected individuals, because covering the large sample space well enough to produce a null distribution may take more replicates than are needed for p-value accuracy alone. In practice, if the p-value varies much when the same data are run several times, increase the number of replicates simulated.
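To get a rough feel for how the number of replicates limits p-value accuracy, the following Python sketch estimates an empirical p-value from a simulated null distribution and its approximate binomial standard error. It assumes the p-value is the fraction of null values at least as large as the observed value; the "+/-" range that simibd prints may be computed differently. The function names are hypothetical.

```python
import math


def empirical_p_value(observed, null_values):
    """Fraction of simulated null statistics >= the observed statistic."""
    exceed = sum(1 for v in null_values if v >= observed)
    return exceed / len(null_values)


def binomial_std_error(p, n_replicates):
    """Approximate standard error of a simulation-based p-value estimate."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


# The standard error shrinks with the square root of the replicate count.
for n in (100, 1000, 10000):
    print(f"replicates = {n:6d}  std. error at p = 0.05: "
          f"{binomial_std_error(0.05, n):.4f}")
```

Because the error falls only as the square root of the number of replicates, halving the run-to-run variability takes roughly four times as many replicates.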
If you have any comments about how to improve this code, please address them to:
Sean Davis
University of Pittsburgh
Department of Human Genetics
A300 Crabtree Hall
130 DeSoto Street
Pittsburgh, PA 15213
davis@moriarty.hgen.pitt.edu

or

Daniel E. Weeks
University of Pittsburgh
Department of Human Genetics
A310 Crabtree Hall
130 DeSoto Street
Pittsburgh, PA 15213
daniel.weeks@well.ox.ac.uk
The example pedigree and locus files given here are taken from Figure 2 in the paper, "Nonparametric simulation-based statistics for detecting linkage in general pedigrees." This is a very simple example so that you can quickly understand the correspondence between the text of the paper and the output generated by the program when run on this data.
This example assumes that you have compiled the program as described in the Compilation files.
At the shell prompt, type:
simibd example.ped example.dat
The simibd executable must be in the search path, and the example.ped and example.dat files (as shown here) must be in the current working directory. You will be presented with the following prompts, which you should answer with the values given here between < >. (For example, <2> <RETURN> means type 2 and then press the RETURN key.)
SIMIBD V1.2
Last modified: November 10, 1996

A : f(p) = 1
B : f(p) = 1/sqrt(p)
C : f(p) = 1/p
Select a weighting function: <b> <RETURN>
Using f(p) = 1/sqrt(p)
Weighting function is f(p) = 1/sqrt(p)
Number of replicates total: <1000> <RETURN>

Loci loaded by SimIBD
Locus Number   Locus Name      Locus Type
-------------------------------------------
1              No name given   Affection
2              No name given   Allele Numbers

Trait locus number: <1> <RETURN>
Affection status of trait: <2> <RETURN>
Marker locus number: <2> <RETURN>
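If you want to repeat the example run several times without retyping the answers, the prompts might be answered from a script. The Python sketch below is only an assumption-laden illustration: it assumes that simibd reads its answers from standard input, which this documentation does not confirm, and the helper names build_answers and run_simibd are hypothetical. The answer order follows the V1.2 prompts shown above.

```python
import subprocess


def build_answers(weighting="b", replicates=1000, trait_locus=1,
                  affection_status=2, marker_locus=2):
    """Return the prompt answers, one per line, in the order V1.2 asks them."""
    fields = [weighting, replicates, trait_locus, affection_status, marker_locus]
    return "".join(f"{value}\n" for value in fields)


def run_simibd(pedfile, locfile, answers):
    """Run simibd with scripted answers (assumes prompts are read from stdin)."""
    return subprocess.run(
        ["simibd", pedfile, locfile],
        input=answers,
        text=True,
        capture_output=True,
        check=True,
    )


# Show the text that would be sent for the example in this file.
print(repr(build_answers()))
# Example call (needs simibd on the search path and the example files):
# result = run_simibd("example.ped", "example.dat", build_answers())
```

Running the same answers several times is also a simple way to see how much the p-value varies between runs.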
At this point, simibd will give the observed values for the original families and begin simulating the requested number of replicates. It will make an attempt to determine the total time required (based on the time for one replicate) and inform you of its progress.
SimIBD Observed Values
Family   ZObs      # Aff.   Weight    Weighted ZObs
----------------------------------------------
11       2.23744   4        0.57735   1.29178
TOTAL                                 1.29178

Beginning Simulations
YOU ARE USING LINKAGE/SLINK (V2.50) WITH 1-POINT AUTOSOMAL DATA
The random number seed is: 13093
Summary of pedigrees under analysis
Number of pedigrees 1
Number of people 11
Number of females 6
Number of males 5
There were 3 in category: Marker Unknown
There were 0 in category: Marker Available
There were 8 in category: Marker Available
There were 0 in category: Marker Unknown
Simulating and calculating 1000 replicates for SimIBD computations
Maxneed may be reduced to 2
I took 0.01 seconds for one replicate
Estimated Time to finish = 0.13 minutes = 0.00 hours
.........*.........*.........*.........*.........* 50 replicates completed
.........*.........*.........*.........*.........* 100 replicates completed
.........*.........*.........*.........*.........* 150 replicates completed
.........*.........*.........*.........*.........* 200 replicates completed
.........*.........*.........*.........*.........* 250 replicates completed
.........*.........*.........*.........*.........* 300 replicates completed
.........*.........*.........*.........*.........* 350 replicates completed
.........*.........*.........*.........*.........* 400 replicates completed
.........*.........*.........*.........*.........* 450 replicates completed
.........*.........*.........*.........*.........* 500 replicates completed
.........*.........*.........*.........*.........* 550 replicates completed
.........*.........*.........*.........*.........* 600 replicates completed
.........*.........*.........*.........*.........* 650 replicates completed
.........*.........*.........*.........*.........* 700 replicates completed
.........*.........*.........*.........*.........* 750 replicates completed
.........*.........*.........*.........*.........* 800 replicates completed
.........*.........*.........*.........*.........* 850 replicates completed
.........*.........*.........*.........*.........* 900 replicates completed
.........*.........*.........*.........*.........* 950 replicates completed
.........*.........*.........*.........*.........* 1000 replicates completed
Actual Elapsed Time = 0.08 min. or 0.00 hours

Affected Comparison -- Summary --
Family   ZObs      # Aff.   Weight    Weighted ZObs   PValue   Range
--------------------------------------------------------------
11       2.23744   4        0.57735   1.29178         0.3915   +/- 0.0265
TOTAL                                 1.29178         0.3915   +/- 0.0265

Combined output can be found in file: simibd.out
Full output from affected comparisons can be found in file: simibd.aff.out
Output files simibd.out and simibd.aff.out will also be created. These are fairly self-explanatory. Note that your output will not necessarily match exactly because the p-value is based on simulation, a somewhat random process. If a family doesn't have at least two genotyped affected individuals, the p-value column will have the value "N/A" and this family will not be used in the calculations of the total p-value. See above for information about using simibd to get SimAPM values and SimKIN values as well as for information about the number of replicates to be used. We recommend using the weighting function B : f(p) = 1/sqrt(p).
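As a quick check on how to read the summary table, the "Weighted ZObs" column in the example output is consistent with the product of the "ZObs" and "Weight" columns. The Python sketch below only verifies that arithmetic from the printed (rounded) values; how simibd derives the family weight itself is not shown here, and the function name weighted_z is hypothetical.

```python
def weighted_z(z_obs, weight):
    """Weighted statistic as the product of the two printed columns."""
    return z_obs * weight


# Values copied from the example output for family 11.
z_obs = 2.23744
weight = 0.57735
reported = 1.29178

computed = weighted_z(z_obs, weight)
# The inputs are rounded to five decimals, so allow a small tolerance.
print(f"computed = {computed:.4f}, reported = {reported:.4f}")
print("consistent:", abs(computed - reported) < 1e-4)
```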
Version 2.0cond adds two functionalities to version 1.21. The first increases the power of SimIBD to detect linkage when incomplete typing is present. The approximation in version 1.21 has been replaced by a conditional simulation approach in which the untyped people in the pedigree are simulated conditional on the typed people (the affecteds and unaffecteds for the OBSERVED replicates or the unaffecteds for the NULL DISTRIBUTION replicates). The new methods in SimIBD V2.0cond increase the power to detect linkage but at the expense of slightly increased computational effort.
Also new in SimIBD is the ability to do bootstrapping in the total null distribution. When multiple pedigrees are present in the datafile, using bootstrapping allows more precise and accurate p-values to be generated with little added computational effort.
Two new prompts have been added for the additional user input. The first asks for the number of OBSERVED replicates; for partially typed pedigrees, I typically use 50-100. For the NULL DISTRIBUTION replicates I typically use 500-1000. Finally, I use between 250 and 1000 bootstraps per replicate. Note that 500 NULL DISTRIBUTION replicates with 1000 bootstraps each give 500,000 points in the total null distribution.