SimWalk2: Additional Usage Notes

General:

To use SimWalk2 on data in Linkage format, first extract the disease locus and the codominant marker loci with the program lsp from the Linkage package; this creates the files datafile.dat and pedfile.dat. Next, run Mega2 to convert these files from Linkage format to a format suitable for SimWalk2. Mega2 creates the locus, pedigree, and penetrance data files in the correct format, along with a SimWalk2 batch file and a shell script to run SimWalk2. For the availability of the Mega2 program, please see the Additional Resources section.

Use the simwalk2snp executable when the dataset contains only biallelic markers (and, optionally, a trait locus); it accepts up to 1000 biallelic markers. Otherwise, use the standard executable, simwalk2.
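The choice of executable can be expressed as a small rule. The sketch below is a hypothetical helper (choose_executable is not part of SimWalk2), and it assumes that a dataset with more than 1000 biallelic markers should fall back to simwalk2.

```python
# Hypothetical helper, not part of SimWalk2: pick the executable to run.
MAX_SNP_MARKERS = 1000  # the limit stated for simwalk2snp


def choose_executable(marker_allele_counts: list[int]) -> str:
    """Return 'simwalk2snp' if every marker is biallelic and there are at
    most 1000 markers; otherwise return 'simwalk2'.

    marker_allele_counts holds the number of alleles at each marker locus.
    The trait locus, if any, is deliberately left out of this list.
    """
    all_biallelic = all(n == 2 for n in marker_allele_counts)
    if all_biallelic and len(marker_allele_counts) <= MAX_SNP_MARKERS:
        return "simwalk2snp"
    # Assumption: anything else (multiallelic markers, or too many SNPs)
    # goes to the standard executable.
    return "simwalk2"
```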

Location Scores:

When using this parametric linkage analysis option, it is important to specify as fully as possible your model of how the genotype at the trait locus leads to the phenotype. This model is specified in several places. In the locus data file you set the trait allele frequencies. In the penetrance data file you set the trait penetrances, e.g., the phenocopy rate and the degree of reduced penetrance. Finally, in the batch file, at batch item #13.2, you can set the likely proportion of pedigrees that are segregating a trait gene linked to this set of marker loci.
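To make these model components concrete, the following sketch assumes the standard single-locus penetrance model (f0 is the phenocopy rate, f2 below 1 expresses reduced penetrance) under Hardy-Weinberg equilibrium, and the classic admixture model of heterogeneity, in which a fraction alpha of pedigrees is linked. SimWalk2's internal computation of location scores may differ; the function names are illustrative only.

```python
import math


def prevalence(p: float, f0: float, f1: float, f2: float) -> float:
    """Population risk of being affected under Hardy-Weinberg equilibrium.

    p  : frequency of the disease allele (as set in the locus file)
    f0 : penetrance of the non-carrier genotype, i.e. the phenocopy rate
    f1 : penetrance of the heterozygote
    f2 : penetrance of the disease homozygote (< 1 means reduced penetrance)
    """
    q = 1.0 - p
    return q * q * f0 + 2 * p * q * f1 + p * p * f2


def admixture_lod(lod: float, alpha: float) -> float:
    """Heterogeneity LOD for one pedigree under the admixture model:
    with probability alpha the pedigree is linked (likelihood ratio 10**lod),
    otherwise it is unlinked (likelihood ratio 1)."""
    return math.log10(alpha * 10 ** lod + (1.0 - alpha))
```

For example, a rare dominant allele (p = 0.01) with 80% penetrance and a 0.1% phenocopy rate gives prevalence(0.01, 0.001, 0.8, 0.8) of about 0.0169, and setting alpha to 1 leaves a pedigree's LOD unchanged.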

Haplotyping:

Because SimWalk2 uses simulated annealing to search what is often an immense space of haplotype configurations, it may not converge to the best answer on the first run. It may be necessary to run SimWalk2 several times on your data to be confident of having found the optimal haplotype configuration.

If you do rerun the program with the same data and parameters, remember to change the seeds of the random number generator (see batch item #36); otherwise the results will be identical. You may also wish to change the run label (see batch item #2) so that the new results do not overwrite the old output files.
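Why the seeds matter can be seen in a toy simulated annealing run. This is a generic annealer on a made-up bit-string problem, not SimWalk2's algorithm: every random choice is drawn from a generator built from the seed, so the same seed replays exactly the same run, while a new seed lets the search explore differently. It also records the step at which the best state is first found, the kind of quantity batch item #45 examines.

```python
import math
import random


def anneal(cost, state, neighbor, seed, steps=2000, t0=2.0, t_end=0.01):
    """Toy simulated annealing; returns (best_state, best_cost, step_found)."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    current, current_cost = state, cost(state)
    best, best_cost, best_step = current, current_cost, 0
    for step in range(1, steps + 1):
        # Geometric cooling schedule from t0 down to t_end.
        t = t0 * (t_end / t0) ** (step / steps)
        candidate = neighbor(current, rng)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with prob exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost, best_step = current, current_cost, step
    return best, best_cost, best_step


WEIGHTS = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]


def cost(bits):
    # Toy objective: make the weighted sum of chosen items as close to 100 as possible.
    return abs(sum(w for w, b in zip(WEIGHTS, bits) if b) - 100)


def flip_one(bits, rng):
    # Neighbor move: flip one randomly chosen bit.
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


start = (0,) * len(WEIGHTS)
run_a = anneal(cost, start, flip_one, seed=1)
run_b = anneal(cost, start, flip_one, seed=1)  # same seed: identical to run_a
run_c = anneal(cost, start, flip_one, seed=2)  # new seed: an independent search
```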

SimWalk2 creates the files RERUN-nn.BCH, RERUN-nn.LOC and RERUN-nn.PED to ease rerunning the haplotype analysis on just those pedigrees for which, according to the criteria below, the program may not have found the optimal haplotype vectors. The 'nn' in these filenames is the integer label assigned to that run of the program (see batch item #2).

Note that after a rerun using the RERUN-nn.BCH file, every new output file will contain, for each pedigree, a record of the best haplotype vector found over all the runs. Thus, the haplotypes can never get worse after such a rerun, and at the end of these reruns all the best haplotypes are listed, even if only a few of the pedigrees were actually rerun.
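The "never gets worse" property amounts to keeping, per pedigree, the better of the old and new records. The sketch below is hypothetical: the record layout and the score (any higher-is-better measure, such as a log-likelihood) are assumptions, not SimWalk2's file format.

```python
def merge_best(previous: dict, new_run: dict) -> dict:
    """Keep, for each pedigree, the haplotype record with the higher score.

    Both dicts map a pedigree name to (score, haplotype_vector), where the
    score is a hypothetical higher-is-better measure. Pedigrees that were
    not rerun simply keep their earlier record.
    """
    merged = dict(previous)
    for ped, (score, hap) in new_run.items():
        if ped not in merged or score > merged[ped][0]:
            merged[ped] = (score, hap)
    return merged
```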

To perform such a rerun after a haplotype run that used label nn, simply rename the file RERUN-nn.BCH to BATCH2.DAT and re-execute SimWalk2.
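The rename step can be scripted. This sketch is a hypothetical helper: it assumes the files sit in one working directory with exactly the upper-case names shown above, and it keeps any existing BATCH2.DAT as BATCH2.DAT.bak, a backup name chosen here for illustration rather than a SimWalk2 convention. SimWalk2 itself is then run as before.

```python
from pathlib import Path


def prepare_rerun(label: str, workdir: str = ".") -> Path:
    """Rename RERUN-<label>.BCH to BATCH2.DAT so the next SimWalk2 run
    performs the rerun. An existing BATCH2.DAT is first moved to the
    hypothetical backup name BATCH2.DAT.bak.
    """
    wd = Path(workdir)
    rerun_batch = wd / f"RERUN-{label}.BCH"
    if not rerun_batch.exists():
        # No RERUN files usually means no pedigree met the rerun criteria.
        raise FileNotFoundError(f"{rerun_batch} not found")
    batch = wd / "BATCH2.DAT"
    if batch.exists():
        batch.replace(wd / "BATCH2.DAT.bak")
    return rerun_batch.replace(batch)  # returns the new path (Python 3.8+)
```

The label is passed as a string, exactly as it appears in the filename, so no assumption is made about zero-padding of nn.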

The RERUN-nn.BCH file contains the same parameters as the originating BATCH2.DAT file, except for: new random seeds, a pointer to the pedigrees that need to be rerun, parameters for a progressively more in-depth search, and a run label incremented by 1 so that previous output is not overwritten.

The RERUN-nn.LOC file contains data equivalent to that in the originating locus and map files.

The RERUN-nn.PED file lists all pedigrees that meet at least one of the following two conditions. The first condition is set using batch item #44. With its default values (0 and 0), batch item #44 instructs SimWalk2 to include in the rerun all pedigrees for which the best haplotype vector found has more than the expected number of recombinants. This expected number is based on the number of meioses in the pedigree and the lengths of the marker intervals.
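One plausible way to form such an expected count is to convert each interval length to a recombination fraction and sum over intervals and meioses. The formula SimWalk2 actually uses is not given here; this sketch assumes the Haldane map function and independent meioses. In a pedigree each non-founder contributes two meioses, one maternal and one paternal.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def expected_recombinants(n_meioses: int, intervals_cm: list[float]) -> float:
    """Expected number of recombinant marker intervals over all meioses.

    Assumption: each meiosis recombines in interval i with probability
    theta_i, so the per-meiosis expectation is the sum of the thetas.
    """
    per_meiosis = sum(haldane_theta(d / 100.0) for d in intervals_cm)
    return n_meioses * per_meiosis
```

For example, 10 meioses across three 10 cM intervals give an expectation of about 2.72 recombinants.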

The user may, however, use batch item #44 to set explicit thresholds specifying when a pedigree should be rerun. Two thresholds are set: the haplotype threshold (haplo_threshold) and the recombination threshold (recomb_threshold). A pedigree is flagged as a candidate for rerunning if the best haplotype vector found contains at least haplo_threshold haplotypes with recomb_threshold recombinants each, or any haplotype with more than recomb_threshold recombinants. For example, suppose batch item #44 is set to 3 and 2. Then a pedigree that contains 3 or more doubly recombinant haplotypes, or any haplotype with more than 2 recombinants, meets this criterion and is a candidate for rerunning.
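The first rerun condition can be written as a predicate over the number of recombinants in each haplotype of the best haplotype vector. This is an illustrative reading of the rule above, not SimWalk2 code; for the default (0, 0) case it assumes the comparison is between the total number of recombinants and an externally supplied expected number.

```python
def meets_recombinant_criterion(recombs_per_haplotype, haplo_threshold,
                                recomb_threshold, expected=None):
    """Batch item #44 style test on one pedigree's best haplotype vector.

    recombs_per_haplotype: number of recombinants in each haplotype.
    """
    if haplo_threshold == 0 and recomb_threshold == 0:
        # Default (0, 0): flag if the total exceeds the expected number.
        if expected is None:
            raise ValueError("expected number of recombinants is required")
        return sum(recombs_per_haplotype) > expected
    # Any haplotype with more than recomb_threshold recombinants is enough.
    if any(r > recomb_threshold for r in recombs_per_haplotype):
        return True
    # Otherwise count haplotypes with exactly recomb_threshold recombinants.
    n_at = sum(1 for r in recombs_per_haplotype if r == recomb_threshold)
    return n_at >= haplo_threshold
```

With thresholds 3 and 2, a pedigree whose haplotypes carry [2, 2, 2, 0] recombinants is flagged, while [2, 2, 1, 0] is not.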

The second condition for adding a pedigree to those that should be rerun is set using batch item #45, which takes a single parameter: the step-fraction threshold (step_threshold). This threshold is useful because, when simulated annealing is performing correctly, the optimal result is usually discovered somewhere near the middle of the procedure. A pedigree is therefore flagged as a candidate for rerunning if its best overall haplotype vector is first encountered after the step_threshold fraction of the total number of steps in the simulated annealing run. For example, the default value for batch item #45 is 0.85. So, with the defaults, if the best haplotype vector is not found until after 85% of the steps of the simulated annealing algorithm, then that pedigree meets this criterion and is a candidate for rerunning.
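The step-fraction test is a single comparison. The sketch below is illustrative (the function name is hypothetical); it treats "after" as strictly greater than the threshold fraction.

```python
def meets_step_criterion(step_found: int, total_steps: int,
                         step_threshold: float = 0.85) -> bool:
    """True if the best haplotype vector was first found after the
    step_threshold fraction of the simulated annealing run."""
    return step_found / total_steps > step_threshold
```

A pedigree is listed in RERUN-nn.PED when this test or the batch item #44 test is true.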

The pedigrees that met these criteria, and thus are listed in RERUN-nn.PED, are indicated in the overall summary file: TABLE-nn.ALL.

If no pedigrees meet the 'rerun' criteria, then the RERUN-nn.* files will not exist at the completion of the program run.


To draw the overall haplotype results found by SimWalk2 on a Macintosh, have SimWalk2 produce the PEDRW-nn.mmm files (see batch item #43). Once transferred to a Macintosh, these files can be opened by the program Pedigree/Draw version 5, which will then draw out the haplotypes. For the availability of Pedigree/Draw, please see the Additional Resources section.

(Unfortunately, Mac, Windows and Unix use slightly different text-file conventions. The relevant difference here is the end-of-line character: classic Mac OS ends each line with a carriage return (CR), Unix with a line feed (LF), and Windows with a CR LF pair. So, when transferring a SimWalk2 output file created on one platform to another, be sure the end-of-line characters are also converted. Many simple utilities do this. In addition, an FTP transfer in ASCII mode performs this conversion automatically, as do many text editors and word processors, e.g., Microsoft Word, when reading a text file.)
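A minimal converter, sketched here as an assumption-free byte transformation, first normalizes CR LF and lone CR to LF and then writes the target convention. The "mac" target means the classic Mac OS (CR) convention.

```python
def convert_line_endings(data: bytes, target: str = "unix") -> bytes:
    """Normalize CR LF (Windows) and lone CR (classic Mac OS) to LF, then
    emit the end-of-line convention of the target platform."""
    eol = {"unix": b"\n", "windows": b"\r\n", "mac": b"\r"}[target]
    # Replace CR LF first so its CR is not turned into a second line break.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", eol)
```

To convert a file in place, read it with pathlib.Path.read_bytes and write the result back with write_bytes; working on bytes avoids any automatic newline translation by Python itself.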

Pedigree/Draw limits the number of loci that can be included in the haplotypes. If the longest allele name has one character, at most 30 loci are drawn; if it has two characters, 20 loci; and if any allele name has three characters, only 15 loci. The loci drawn are the initial ones, including the trait locus, if any; any loci beyond these maximums are simply not drawn.
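The limit can be checked before drawing. This hypothetical helper applies the three stated limits to an ordered list of loci (trait first, if present); the document gives no limit for allele names longer than three characters, so the sketch rejects them.

```python
# Stated Pedigree/Draw 5 limits: longest allele name length -> max loci drawn.
MAX_LOCI_BY_NAME_LENGTH = {1: 30, 2: 20, 3: 15}


def loci_drawn(loci: list[str], allele_names: list[str]) -> list[str]:
    """Return the leading loci that Pedigree/Draw 5 can show, given every
    allele name appearing in the haplotypes."""
    longest = max(len(a) for a in allele_names)
    if longest not in MAX_LOCI_BY_NAME_LENGTH:
        raise ValueError(f"no stated limit for {longest}-character allele names")
    return loci[:MAX_LOCI_BY_NAME_LENGTH[longest]]
```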

The following options should be set in Pedigree/Draw 5 so that the PEDRW-nn.mmm files produced by SimWalk2 draw properly. In the command window (the initial window that Pedigree/Draw opens), the options Symbol, Text and Identifier should be enabled (i.e., their boxes should be checked), and the options Hap. Text, Hap. Bars and Picture should be disabled. (These options are not fully implemented in the current beta of Pedigree/Draw; when they are, SimWalk2 will be updated to use them.) Also in the command window, click the Reset button before each new pedigree is 'Mapped and Drawn'; this ensures the drawing is sized appropriately.

In the Drawing Style window (accessed via the Pedigree -> Drawing Style... menu), in the Text settings area, all Lines (1 through 5) must be enabled and the Divider and Indent options must be disabled. (The Length option should not be altered, and the Show blank lines option is not applicable.)

All these settings, except clicking the Reset button for each new pedigree, will probably need to be set only once; the program then uses them as its defaults until they are changed.

When the PEDRW-nn.mmm files are used as input with the above settings, the Pedigree/Draw output includes the following: affected individuals are drawn solid black; the first line of text under each individual is their name; if a trait locus is included in the data, the second line is the trait phenotype (an @ symbol here represents an unknown phenotype); and the remaining lines are the ordered marker genotypes, as many as fit within the locus limits described above. Each marker genotype consists of the maternal allele, a separator symbol, the paternal allele and, possibly, a trailing symbol. An @ symbol is also used for unknown alleles, i.e., alleles that are never typed as they descend through the pedigree. The separator symbols indicate which recombination events occur in the subsequent marker interval:

  | indicates no recombination in the subsequent interval;
  / indicates recombination in the subsequent interval only in the maternal haplotype
    (this symbol points towards the marker interval on the maternal haplotype);
  \ indicates recombination in the subsequent interval only in the paternal haplotype
    (this symbol points towards the marker interval on the paternal haplotype);
  + indicates recombination in both maternal and paternal haplotypes.
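Because each separator describes the following interval, an individual's recombinations can be tallied by reading the separators in marker order. The sketch below is illustrative only; it assumes the separators have already been extracted from the drawn genotypes.

```python
# Separator symbol -> (maternal recombination, paternal recombination)
# in the marker interval that follows the genotype.
SEPARATORS = {
    "|": (False, False),  # no recombination
    "/": (True, False),   # maternal haplotype only
    "\\": (False, True),  # paternal haplotype only
    "+": (True, True),    # both haplotypes
}


def count_recombinants(separators: str) -> tuple[int, int]:
    """Count (maternal, paternal) recombinations from one individual's
    separator symbols, taken in marker order."""
    maternal = paternal = 0
    for sep in separators:
        mat, pat = SEPARATORS[sep]
        maternal += mat  # bools add as 0 or 1
        paternal += pat
    return maternal, paternal
```

For example, the separator sequence | / \ + | gives two maternal and two paternal recombinations.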

The trailing symbols indicate whether the genotype was typed in the data files and whether the inferred phase is fixed by the parental genotypes:

  ! indicates the original phenotype was unknown but inferred phase is fixed;
  * indicates the original phenotype was known but inferred phase is NOT fixed;
  & indicates the original phenotype was unknown and inferred phase is NOT fixed.
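A drawn genotype can be decoded into its parts with a regular expression. This is a hypothetical parser for tokens such as 12/7*; it assumes allele names never contain separator or trailing symbols, and it reads a missing trailing symbol as "typed and phase fixed", which the list above implies but does not state.

```python
import re

# maternal allele, separator, paternal allele, optional trailing symbol
TOKEN = re.compile(
    r"^(?P<mat>[^|/\\+]+)(?P<sep>[|/\\+])(?P<pat>[^|/\\+!*&]+)(?P<flag>[!*&]?)$"
)

# trailing symbol -> (originally typed, inferred phase fixed)
TRAILING = {
    "": (True, True),   # assumed meaning of no trailing symbol
    "!": (False, True),
    "*": (True, False),
    "&": (False, False),
}


def parse_genotype(token: str) -> dict:
    """Split one drawn genotype into its components."""
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"unrecognized genotype token: {token!r}")
    typed, phase_fixed = TRAILING[m["flag"]]
    return {"maternal": m["mat"], "separator": m["sep"], "paternal": m["pat"],
            "typed": typed, "phase_fixed": phase_fixed}
```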

(Setting batch item #46.2 to Y causes the haplotypes to be drawn using letters that indicate from which founder each allele is descended, rather than the actual allele values. The HAPLO-nn.mmm file uses the same letters and includes a key showing which allele value each letter corresponds to.)
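The exact lettering scheme SimWalk2 uses is not described here. As a hypothetical illustration of such a key, the sketch below assigns one letter to each founder chromosome, so that at every locus a letter corresponds to a single allele value.

```python
from string import ascii_uppercase


def founder_letter_key(founders: dict) -> dict:
    """Hypothetical key: one letter per founder chromosome.

    founders maps a founder name to (maternal, paternal), each a tuple of
    allele values, one per locus. Returns {letter: allele values by locus}.
    """
    chromosomes = [hap for name in sorted(founders) for hap in founders[name]]
    if len(chromosomes) > len(ascii_uppercase):
        raise ValueError("more founder chromosomes than letters")
    return dict(zip(ascii_uppercase, chromosomes))
```

A descendant drawn with letter C at the second locus would then carry the allele key["C"][1].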


The file HEF-nn.ALL may be importable by a future version of the Windows pedigree drawing program Cyrillic. This single file contains the best haplotypes for each original pedigree. It is in Haplotype Exchange Format 1.1.1, which is designed to be human- as well as machine-readable. For a description of this format, see the file HEF-111.txt.

