SimIBD Documentation

INTRODUCTION
REFERENCES
COMPILATION
USE
NUMBER OF REPLICATES TO SIMULATE
QUESTIONS AND COMMENTS
EXAMPLE
        INTRODUCTION
        RUNNING THE EXAMPLE
DIFFERENCES BETWEEN VERSIONS 1.21 and 2.0cond

V1.2

Last Modification: November 10, 1996

INTRODUCTION

SimIBD is the software implementation of the algorithms described in Davis et al., "Nonparametric simulation-based statistics for detecting linkage in general pedigrees," (1996) Am J Hum Genet 58:867-880.

REFERENCES

Please use the following four references when using results from these programs.

The algorithms used in simibd are based on:

1) Davis S, Schroder M, Goldin LR, and Weeks DE, "Nonparametric simulation-based statistics for detecting linkage in general pedigrees," (1996) Am J Hum Genet 58: 867-880.

Also, please reference the following three papers that deal with the SLINK code:

SLINK implements a simulation algorithm developed by Jurg Ott and described in:

2) Ott J (1989) Computer-simulation methods in human linkage analysis. Proc Natl Acad Sci USA 86:4175-4178

The algorithm was implemented in the original SLINK computer program package by Weeks, Ott, and Lathrop:

3) Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation program for linkage analysis. Am J Hum Genet 47:A204 (abstr)

The SLINK simulation program has been modified by Schaffer and Weeks to use the algorithms developed by Cottingham et al:

4) Cottingham Jr RW, Idury RM, Schaffer AA (1993) Faster sequential genetic linkage computations. Am J Hum Genet 53:252-263

Please cite references 1-4 if you use this package. Thank you.

COMPILATION

In order to use the programs included here, you must compile them to run on your machine. Simply issuing the command:

                        make

will produce the executable "simibd".

Note that if you have a version of the program already "made", you must issue the command:

        make cleanall

before attempting to make a new version.

In our experience, optimized code runs significantly faster while still giving correct answers. If you are unfamiliar with producing optimized code when compiling, see your system administrator for assistance. We recommend using gcc with at least one level of optimization, which is the default setting when compiling with make. The default compiler and options can be changed by editing the "Makefile" in this directory.

USE

To use simibd, you need only a LINKAGE-formatted pedigree file and locus file. With these in hand, issue the command:

simibd <pedfile> <locfile>

where <pedfile> is the name of the pedigree file and <locfile> is the name of the locus (data) file. You will be prompted for several pieces of information: the family weighting function, the total number of replicates (which determines the accuracy of the resulting p-value), the number of the trait locus, the value of the trait signifying affection, and the marker locus to be analyzed. We recommend the weighting function 1/sqrt(p), where p is the population frequency of a given allele. The number of replicates needed depends on the accuracy desired for the p-value and, to a lesser extent, on the number of affecteds in the pedigree.
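
The three weighting functions offered at the program prompt (listed as A, B, and C in the EXAMPLE section) can be written out as below. This is a minimal Python sketch for illustration only; the names WEIGHTING_FUNCTIONS and allele_weight are assumptions made here and are not part of simibd.

```python
import math

# The three choices shown at the weighting-function prompt.
# This table and the function below are illustrative, not simibd code.
WEIGHTING_FUNCTIONS = {
    "A": lambda p: 1.0,                 # f(p) = 1
    "B": lambda p: 1.0 / math.sqrt(p),  # f(p) = 1/sqrt(p), the recommended choice
    "C": lambda p: 1.0 / p,             # f(p) = 1/p
}

def allele_weight(choice, p):
    """Return f(p) for an allele with population frequency p, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    return WEIGHTING_FUNCTIONS[choice.upper()](p)

# Compare the three functions for a common, an uncommon, and a rare allele.
for p in (0.5, 0.1, 0.01):
    print(p, [round(allele_weight(c, p), 3) for c in "ABC"])
```

Functions B and C give more weight to rarer alleles, with B falling between the unweighted choice A and the steeper choice C.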

If you would like to compute the SimAPM or SimKIN statistic in addition to the SimIBD statistic, issue the same command as above with the optional "-a" or "-k" flag. For example, to get the SimAPM result, issue the command:

simibd -a <pedfile> <locfile>

The command

simibd -ak <pedfile> <locfile>

will generate results for SimIBD, SimKIN, and SimAPM. simibd reports its progress as it runs and estimates the time needed to complete the current task. When the run is complete, a great deal of information will have been printed to the screen. For your convenience, the brief output is also written to the file "simibd.out". This file contains only the summary values of the statistics and the associated p-values for each family. Other output files give more detailed results, including histograms of the data.

The output files are summarized below:

FILE                    CONTENTS
---------------------------------------------------------
simibd.out              Brief output from all statistics run
simibd.aff.out          Long output from the SimIBD calculations
simkin.out              Long output from the SimKIN calculations
simapm.out              Long output from the SimAPM calculations

NUMBER OF REPLICATES TO SIMULATE

A bootstrap option has been added to SimIBD. Because the pedigrees are independent of one another, bootstrap techniques can be used to improve the precision and accuracy of the p-value for all of the pedigrees. Version 2.0cond has this option enabled, but it can be disabled using the "-b" flag on the command line. If you choose to use it, the null distribution for all families will consist of many more points than the number of simulations performed (exactly the number of simulated replicates times the number of bootstraps per replicate).
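
The idea can be sketched as follows. This is only an illustration in Python under stated assumptions: it assumes that each family has one simulated statistic per replicate and that a bootstrap total is formed by drawing one simulated value per family at random and summing. The resampling scheme actually used by simibd may differ, and the name bootstrap_total_null is hypothetical.

```python
import random

def bootstrap_total_null(family_null, n_boot, seed=None):
    """Build a total null distribution by resampling across families.

    family_null maps a family id to its list of simulated statistics,
    one per replicate (all lists must have the same length).
    Returns n_replicates * n_boot bootstrap totals.
    """
    lengths = {len(values) for values in family_null.values()}
    if len(lengths) != 1:
        raise ValueError("every family needs the same number of replicates")
    n_reps = lengths.pop()
    rng = random.Random(seed)
    totals = []
    for _ in range(n_reps * n_boot):
        # Families are independent, so draw each family's value separately.
        totals.append(sum(rng.choice(values) for values in family_null.values()))
    return totals

# Two families, three replicates each, four bootstraps per replicate.
null = {"fam1": [0.2, 1.1, -0.4], "fam2": [0.9, -0.3, 0.5]}
print(len(bootstrap_total_null(null, n_boot=4, seed=1)))  # 3 * 4 = 12 points
```

With a single family the totals can only repeat that family's simulated values, which is consistent with the bootstrap being useful when multiple pedigrees are present.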

It may still be beneficial to use a larger number of replicates when the pedigrees being simulated have a large number of affected individuals: covering the large sample space well enough to produce an adequate null distribution may take more replicates than are needed for p-value accuracy alone. In practice, if the p-value varies substantially when the same data are run several times, increase the number of replicates simulated.
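
As a rough guide to how run-to-run variability shrinks with more replicates, the standard binomial approximation for the standard error of a simulated p-value can be used. The Python sketch below assumes independent replicates and is not necessarily the formula simibd uses for the "Range" it reports; the function name p_value_standard_error is hypothetical.

```python
import math

def p_value_standard_error(p, n_points):
    """Binomial standard error of a p-value estimated from n_points null values."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return math.sqrt(p * (1.0 - p) / n_points)

# The standard error falls with the square root of the number of points.
for n in (100, 1000, 10000):
    print(n, round(p_value_standard_error(0.05, n), 4))
```

Under this approximation, halving the standard error requires four times as many points in the null distribution.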

QUESTIONS AND COMMENTS

If you have any comments about how to improve this code, please address them to:

                        Sean Davis
                        University of Pittsburgh
                        Department of Human Genetics
                        A300 Crabtree Hall
                        130 DeSoto Street
                        Pittsburgh, PA 15213
                        davis@moriarty.hgen.pitt.edu
                                or
                        Daniel E. Weeks
                        University of Pittsburgh
                        Department of Human Genetics
                        A310 Crabtree Hall
                        130 DeSoto Street
                        Pittsburgh, PA 15213
                        daniel.weeks@well.ox.ac.uk

EXAMPLE

INTRODUCTION

The example pedigree and locus files given here are taken from Figure 2 in the paper, "Nonparametric simulation-based statistics for detecting linkage in general pedigrees." This is a very simple example so that you can quickly understand the correspondence between the text of the paper and the output generated by the program when run on this data.

This example assumes that you have compiled the program as described in the COMPILATION section.

RUNNING THE EXAMPLE

At the shell prompt, type:

simibd example.ped example.dat

The simibd executable must be in the search path, and the files example.ped and example.dat must be in the current working directory. You will be presented with the following prompts, which you should answer with the values given here between < >. (For example, <2> <RETURN> means type 2 and then press the RETURN key.)

 
SIMIBD  V1.2
Last modified:  November 10, 1996
 
  A : f(p) = 1
  B : f(p) = 1/sqrt(p)
  C : f(p) = 1/p
Select a weighting function: <b> <RETURN>
 
Using f(p) = 1/sqrt(p)
 
Weighting function is f(p) = 1/sqrt(p)
 
Number of replicates total: <1000> <RETURN>
 
 
  Loci loaded by SimIBD
 
  Locus Number   Locus Name      Locus Type
-------------------------------------------
        1     No name given       Affection
        2     No name given  Allele Numbers

Trait locus number: <1> <RETURN> 
Affection status of trait: <2> <RETURN> 
Marker locus number: <2> <RETURN>

At this point, simibd will give the observed values for the original families and begin simulating the requested number of replicates. It will make an attempt to determine the total time required (based on the time for one replicate) and inform you of its progress.

 
SimIBD Observed Values
 
  Family   ZObs    # Aff.  Weight  Weighted ZObs
  ----------------------------------------------
      11   2.23744     4   0.57735     1.29178
   TOTAL                               1.29178
 
 
Beginning Simulations
 
YOU ARE USING LINKAGE/SLINK (V2.50) WITH  1-POINT AUTOSOMAL DATA
 The random number seed is: 13093
    Summary of pedigrees under analysis 
 Number of pedigrees       1
 Number of people         11
 Number of females         6
 Number of males           5
 There were   3 in category:  Marker Unknown
 There were   0 in category:  Marker Available
 There were   8 in category:  Marker Available
 There were   0 in category:  Marker Unknown
 
 
Simulating and calculating 1000 replicates for
    SimIBD computations
 Maxneed may be reduced to            2
 I took     0.01 seconds for one replicate
 Estimated Time to finish =     0.13 minutes =   0.00 hours
.........*.........*.........*.........*.........* 50 replicates completed
.........*.........*.........*.........*.........* 100 replicates completed
.........*.........*.........*.........*.........* 150 replicates completed
.........*.........*.........*.........*.........* 200 replicates completed
.........*.........*.........*.........*.........* 250 replicates completed
.........*.........*.........*.........*.........* 300 replicates completed
.........*.........*.........*.........*.........* 350 replicates completed
.........*.........*.........*.........*.........* 400 replicates completed
.........*.........*.........*.........*.........* 450 replicates completed
.........*.........*.........*.........*.........* 500 replicates completed
.........*.........*.........*.........*.........* 550 replicates completed
.........*.........*.........*.........*.........* 600 replicates completed
.........*.........*.........*.........*.........* 650 replicates completed
.........*.........*.........*.........*.........* 700 replicates completed
.........*.........*.........*.........*.........* 750 replicates completed
.........*.........*.........*.........*.........* 800 replicates completed
.........*.........*.........*.........*.........* 850 replicates completed
.........*.........*.........*.........*.........* 900 replicates completed
.........*.........*.........*.........*.........* 950 replicates completed
.........*.........*.........*.........*.........* 1000 replicates completed
 Actual Elapsed Time =     0.08 min. or     0.00 hours


 
Affected Comparison
 
 
                          -- Summary --
  Family   ZObs    # Aff.  Weight  Weighted ZObs  PValue   Range
  --------------------------------------------------------------
      11   2.23744     4   0.57735     1.29178   0.3915 +/- 0.0265 
   TOTAL                               1.29178   0.3915 +/- 0.0265 

 
 
Combined output can be found in file: simibd.out
Full output from affected comparisons can be found in file: simibd.aff.out

Output files simibd.out and simibd.aff.out will also be created. These are fairly self-explanatory. Note that your output will not necessarily match exactly because the p-value is based on simulation, a somewhat random process. If a family doesn't have at least two genotyped affected individuals, the p-value column will have the value "N/A" and this family will not be used in the calculations of the total p-value. See above for information about using simibd to get SimAPM values and SimKIN values as well as for information about the number of replicates to be used. We recommend using the weighting function B : f(p) = 1/sqrt(p).
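
In the summary table above, the "Weighted ZObs" column appears to be the product of the "ZObs" and "Weight" columns. The short Python check below uses the values printed for family 11; the function name weighted_zobs is hypothetical, and the interpretation of the columns is an inference from the printed numbers rather than a statement of how simibd computes them.

```python
def weighted_zobs(z_obs, weight):
    """Multiply a family's observed statistic by its family weight."""
    return z_obs * weight

# Values taken from the family 11 row of the example output.
z_obs = 2.23744
weight = 0.57735
reported = 1.29178

product = weighted_zobs(z_obs, weight)
# The inputs are themselves rounded to five decimals, so compare with a tolerance.
print(abs(product - reported) < 5e-5)
```

The product agrees with the reported value to within the rounding of the displayed figures.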

DIFFERENCES BETWEEN VERSIONS 1.21 and 2.0cond

Version 2.0cond adds two functionalities to version 1.21. The first increases the power of SimIBD to detect linkage when incomplete typing is present. The approximation in version 1.21 has been replaced by a conditional simulation approach in which the untyped people in the pedigree are simulated conditional on the typed people (the affecteds and unaffecteds for the OBSERVED replicates or the unaffecteds for the NULL DISTRIBUTION replicates). The new methods in SimIBD V2.0cond increase the power to detect linkage but at the expense of slightly increased computational effort.

Also new in SimIBD is the ability to do bootstrapping in the total null distribution. When multiple pedigrees are present in the datafile, using bootstrapping allows more precise and accurate p-values to be generated with little added computational effort.

Two new questions have been added to account for the added user input. The first asks for the number of OBSERVED replicates. For partially typed pedigrees, I typically use 50-100 OBSERVED replicates. For the NULL DISTRIBUTION replicates I typically use 500-1000. Finally, I use between 250 and 1000 bootstraps per replicate. Note that 500 NULL DISTRIBUTION replicates and 1000 bootstraps gives 500000 points in the total null distribution.
