To use SimWalk2 on data in Linkage format, first extract the disease locus and the codominant marker loci using the program lsp from the Linkage package; this creates the files datafile.dat and pedfile.dat. Next run Mega2 to convert from Linkage to a format suitable for SimWalk2. This will create the locus, pedigree, and penetrance data files in the correct format. Mega2 will also create a SimWalk2 batch file and a shell script to run SimWalk2. For availability of the Mega2 program, please see the Additional Resources section.
Use the simwalk2snp executable when the dataset contains only biallelic markers (and optionally a trait). There can be up to 1000 biallelic markers. Otherwise use the normal executable, simwalk2.
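As a sketch of the rule above, the hypothetical Python helper below picks an executable name from a list of allele counts, one per marker, with the trait locus excluded. How those counts are read from the locus file is not shown, and treating "biallelic" as exactly two alleles is an assumption.

```python
# Hypothetical helper: not part of SimWalk2 or Mega2.
SNP_MARKER_LIMIT = 1000  # maximum number of biallelic markers for simwalk2snp


def choose_executable(alleles_per_marker: list[int]) -> str:
    """Return 'simwalk2snp' if every marker has exactly two alleles and there
    are at most 1000 markers; otherwise return 'simwalk2'.
    The trait locus, if any, must not be included in alleles_per_marker."""
    all_biallelic = all(n == 2 for n in alleles_per_marker)
    if all_biallelic and len(alleles_per_marker) <= SNP_MARKER_LIMIT:
        return "simwalk2snp"
    return "simwalk2"
```

For example, a dataset with one four-allele microsatellite among many SNPs would be routed to the normal simwalk2 executable.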
When using this parametric linkage analysis option it is important to specify as fully as possible your model of how the genotype at the trait locus leads to the phenotype. There are several places where parts of this model are specified. In the locus data file you set the trait allele frequencies. In the penetrance data file you set the trait penetrances, e.g., the phenocopy rate and the degree of reduced penetrance. Finally, in the control file, at batch item #13.2, you can set the likely proportion of the pedigrees that are segregating an affected gene linked to this set of marker loci.
Since SimWalk2 uses simulated annealing to search a space of often
immense size, it may not converge to the best answer on the first run. It may be
necessary to run SimWalk2 several times on your data in order to be assured of finding
the optimal haplotype configuration.
If you do rerun the program with the same data and parameters, then remember to alter
the seeds to the random number generator (see batch item #36); otherwise the results
will be identical. Also, you may wish to change the run label (see batch item #2)
so that the new results do not overwrite the old output files.
The files RERUN-nn.BCH, RERUN-nn.LOC and RERUN-nn.PED
are created to ease the process of rerunning the haplotype analysis on just those
pedigrees for which, according to the criteria below, the program may not have found
the optimal haplotype vectors. The 'nn' in the output filenames is the integer
label assigned to that run of the program (see batch item #2).
Note that after a rerun using the RERUN-nn.BCH file, all new output files
will contain for each pedigree a record of the best haplotype vector found over all
the reruns. Thus the haplotypes can never get worse after such a rerun and at the
end of these reruns all the best haplotypes are listed even if only a few of the
pedigrees actually were rerun.
To perform such a 'rerun' after a haplotype run that used label nn, simply
rename the file RERUN-nn.BCH to BATCH2.DAT and then re-execute
SimWalk2.
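The rename step can be scripted. The Python sketch below is a hypothetical helper, not part of SimWalk2: it takes the run label exactly as it appears in the filename (for example "01"), saves any existing BATCH2.DAT as BATCH2.DAT.bak (an extra precaution of this sketch only), and then renames RERUN-nn.BCH to BATCH2.DAT. Re-executing SimWalk2 is left to the user.

```python
import shutil
from pathlib import Path


def prepare_rerun(run_dir: str, label: str) -> Path:
    """Rename RERUN-<label>.BCH to BATCH2.DAT inside run_dir.

    Hypothetical helper: the .bak backup of the old BATCH2.DAT is an
    assumption of this sketch, not something SimWalk2 does itself."""
    directory = Path(run_dir)
    rerun_batch = directory / f"RERUN-{label}.BCH"
    batch = directory / "BATCH2.DAT"
    if not rerun_batch.exists():
        # No RERUN files means no pedigree met the rerun criteria.
        raise FileNotFoundError(f"{rerun_batch} not found; nothing to rerun")
    if batch.exists():
        shutil.copy2(batch, directory / "BATCH2.DAT.bak")
    rerun_batch.replace(batch)  # the rename step described above
    return batch
```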
The RERUN-nn.BCH file contains equivalent parameters to the originating
BATCH2.DAT file, except for: new random seeds, a pointer to those pedigrees
that need to be rerun, parameters for a progressively more in-depth search, and a
run label incremented by 1 in order not to overwrite previous output.
The RERUN-nn.LOC file contains equivalent data to the originating locus
and map files.
The RERUN-nn.PED file lists all pedigrees which meet at least one of the
following two conditions. The first condition is set using batch item #44. Using
the default values (0 and 0), batch item #44 instructs SimWalk2 to include for the
rerun all pedigrees for which the best haplotype found has more than the expected
number of recombinants. This expected number is based on the number of meioses in
the pedigree and the lengths of the marker intervals.
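The exact formula SimWalk2 uses for this expected number is not given here. As one plausible reading, the Python sketch below assumes Haldane's map function converts each interval length (in centiMorgans) to a recombination fraction, and that every meiosis contributes one expected recombination fraction per interval. All function names are hypothetical.

```python
import math


def haldane_theta(distance_cm: float) -> float:
    """Recombination fraction for a map distance in centiMorgans,
    assuming Haldane's (no-interference) map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def expected_recombinants(n_meioses: int, interval_lengths_cm: list[float]) -> float:
    """Expected recombinant count for a pedigree: each meiosis contributes
    theta for every marker interval (an assumed model, not SimWalk2's code)."""
    return n_meioses * sum(haldane_theta(d) for d in interval_lengths_cm)


def exceeds_expected(observed: int, n_meioses: int,
                     interval_lengths_cm: list[float]) -> bool:
    """Default-style check: flag the pedigree if its best haplotype vector
    has more recombinants than expected."""
    return observed > expected_recombinants(n_meioses, interval_lengths_cm)
```

For example, 10 meioses across two 10 cM intervals give an expected count of 10 × (1 − e^−0.2), about 1.81, so a best haplotype vector with 3 recombinants would be flagged.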
The user, however, may use batch item #44 to set exact thresholds
specifying when a pedigree should be rerun. There are two thresholds set: the haplotype
threshold (haplo_threshold) and the recombination threshold (recomb_threshold).
A pedigree is indicated to be a candidate for 'rerunning' if the best haplotype vector
found has at least haplo_threshold haplotypes with recomb_threshold
recombinants each, or any haplotype with more than
recomb_threshold recombinants. For example, suppose the values for
batch item #44 are set at 3 and 2. Then, when a pedigree contains 3 or more doubly
recombinant haplotypes or contains any haplotypes with more than 2 recombinants,
that pedigree will meet this criterion and will be a candidate for rerunning.
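The user-set threshold rule can be written as a small predicate. In this hypothetical Python sketch the input is the number of recombinants on each haplotype of the pedigree's best haplotype vector; the default (0, 0) setting, which compares against the expected number instead, is not modeled here.

```python
def meets_recomb_threshold(recombinants_per_haplotype: list[int],
                           haplo_threshold: int,
                           recomb_threshold: int) -> bool:
    """Batch item #44 rule with user-set thresholds (hypothetical helper):
    flag the pedigree if at least haplo_threshold haplotypes carry
    recomb_threshold recombinants, or any haplotype carries more."""
    at_threshold = sum(1 for r in recombinants_per_haplotype if r == recomb_threshold)
    above_threshold = any(r > recomb_threshold for r in recombinants_per_haplotype)
    return at_threshold >= haplo_threshold or above_threshold
```

With the example values 3 and 2, a pedigree with three doubly recombinant haplotypes is flagged, one with only two is not, and any haplotype with 3 recombinants flags it regardless.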
The second condition for adding a pedigree to those that should be rerun is set using
batch item #45. In batch item #45 one sets only one parameter: the step-fraction
threshold (step_threshold). This threshold is useful because, when simulated
annealing is performing correctly, the optimal result is usually discovered somewhere
near the middle of the procedure. A pedigree can thus be indicated to be a candidate
for 'rerunning' if the best overall haplotype vector is first encountered after the
step_threshold fraction of the total number of steps in the simulated annealing
run. For example, the default value for batch item #45 is 0.85. So, using the defaults,
if a best haplotype vector is not found until after 85% of the steps in the simulated
annealing algorithm, then that pedigree will meet this criterion and will be a candidate
for rerunning.
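The step-fraction rule reduces to one comparison. In this hypothetical Python sketch, best_first_step is the annealing step at which the best overall haplotype vector was first encountered and total_steps is the length of the run; neither name comes from SimWalk2 itself.

```python
def found_late(best_first_step: int, total_steps: int,
               step_threshold: float = 0.85) -> bool:
    """Batch item #45 rule (hypothetical helper): flag the pedigree if the
    best haplotype vector was first encountered after step_threshold of
    the total annealing steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return best_first_step / total_steps > step_threshold
```

With the default of 0.85, a best vector first seen at step 90 of 100 makes the pedigree a rerun candidate, while one first seen at step 50 does not.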
The pedigrees that met these criteria, and thus are listed in RERUN-nn.PED,
are indicated in the overall summary file: TABLE-nn.ALL.
If no pedigrees meet the 'rerun' criteria, then the RERUN-nn.* files will
not exist at the completion of the program run.
To draw the overall haplotype results found by SimWalk2 on a Macintosh, have
SimWalk2 produce the files called PEDRW-nn.mmm (see batch item #43). These
files, once transferred to a Macintosh, can be opened by the program Pedigree/Draw
version 5, which will then draw out the haplotypes. For availability of the
program Pedigree/Draw, please see the Additional Resources section.
(Unfortunately, Mac, Windows and Unix use slightly different text-file conventions. The relevant
difference here is the end-of-line character. So, when transferring
a SimWalk2 output file created on one platform to another, one must also be sure
the end-of-line character is converted. There are many simple utilities to
do this. In addition, an ftp transfer in ASCII mode does this conversion automatically,
as do many text editors and word processors, e.g., Microsoft Word, when reading a
text file.)
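If no conversion utility is at hand, a few lines of Python will do. This sketch assumes the three classic conventions (CR on the classic Mac OS, CRLF on Windows, LF on Unix) and that Pedigree/Draw 5 on a classic Macintosh expects CR; the function names are hypothetical.

```python
import re

# End-of-line bytes per platform; "mac" means the classic Mac OS (CR).
EOL = {"mac": b"\r", "windows": b"\r\n", "unix": b"\n"}


def convert_eol(data: bytes, target: str) -> bytes:
    """Normalise any mix of CRLF, CR and LF line endings to the target platform.
    CRLF is listed first so it is replaced as a single line break."""
    return re.sub(rb"\r\n|\r|\n", EOL[target], data)


def convert_file(src: str, dst: str, target: str) -> None:
    """Read src as bytes, convert its line endings, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(convert_eol(data, target))
```

For example, convert_file("PEDRW-01.001", "PEDRW-01.001.mac", "mac") would prepare one output file for the Macintosh; the filenames are illustrative only.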
The Pedigree/Draw program has a limitation on the number of loci that can be included
in the haplotypes. If the maximum number of characters in any allele name is one,
then the maximum number of loci drawn is 30. If the maximum allele name length is
two characters, then 20 loci can be drawn. Finally, if any allele has three characters,
then only 15 loci can be drawn. The initial loci, including the trait, if any, are
the loci that are drawn. Any loci beyond these stated maximums are simply not drawn
out.
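The limits above can be expressed as a lookup on the longest allele name. This is a hypothetical Python sketch; the documentation gives no limit for allele names longer than three characters, so the sketch rejects them rather than guess.

```python
# Maximum loci Pedigree/Draw 5 draws, keyed by the longest allele-name length.
MAX_LOCI_BY_NAME_LENGTH = {1: 30, 2: 20, 3: 15}


def drawable_loci(allele_names: list[str], n_loci: int) -> int:
    """Number of initial loci (trait included) that will actually be drawn."""
    longest = max(len(name) for name in allele_names)
    if longest not in MAX_LOCI_BY_NAME_LENGTH:
        raise ValueError(f"no documented limit for {longest}-character allele names")
    return min(n_loci, MAX_LOCI_BY_NAME_LENGTH[longest])
```

So a 40-locus run whose allele names include "12" would have only its first 20 loci drawn.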
The following options should be set in Pedigree/Draw 5 for the PEDRW-nn.mmm
files produced by SimWalk2 to draw properly. In the command window (this is the initial
window that Pedigree/Draw opens) the options Symbol, Text and Identifier should be
enabled (i.e., should have a check in the associated box). The options Hap. Text,
Hap. Bars and Picture should be disabled. (These options are not fully implemented
in the current beta of Pedigree/Draw. When implemented, SimWalk2 will be updated
to use these additional options.) Also in the command window, the Reset button should
be clicked on before each new pedigree is 'Mapped and Drawn'. This will ensure the
drawing is sized appropriately. Finally, in the Drawing Style window (accessed via
the Pedigree -> Drawing Style... menu) in the Text settings area, all
Lines (1 through 5) must be enabled and the Divider and Indent options must
be disabled. (The Length option should not be altered and the Show blank lines
option is not applicable.) All these settings, except clicking on the Reset button
for each new pedigree, will probably only need to be set once. The program will then
use them as the default settings until they are changed.
When using the PEDRW-nn.mmm files as input and the above settings, the output
of the Pedigree/Draw program will include the following: the affected individuals
are solid black; the first line of text under each individual is their name; if a
trait locus is included in the data, then the second line is the trait phenotype
(an @ symbol here represents an unknown phenotype value); and next are all
the ordered marker genotypes that fit within the locus limits given above. Each marker genotype consists of the maternal
allele, a separator symbol, the paternal allele and, perhaps, a trailing symbol.
An @ symbol is also used for unknown alleles, i.e., for alleles that are
never typed as they descend through the pedigree. The separator symbols indicate
what recombination events occur in the subsequent marker interval:
| indicates no recombination in the subsequent interval;
/ indicates recombination in the subsequent interval only in the maternal haplotype (this symbol points towards the marker interval on the maternal haplotype);
\ indicates recombination in the subsequent interval only in the paternal haplotype (this symbol points towards the marker interval on the paternal haplotype);
+ indicates recombination in both maternal and paternal haplotypes.
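Reading the separators in marker order gives the recombinant intervals on each haplotype. The Python sketch below is hypothetical: it assumes the separator symbols of one individual have already been collected into a string, and numbers interval i as the one between marker i and marker i+1.

```python
# (maternal recombination, paternal recombination) for each separator symbol.
SEPARATOR_FLAGS = {
    "|": (False, False),
    "/": (True, False),
    "\\": (False, True),
    "+": (True, True),
}


def recombinant_intervals(separators: str) -> tuple[list[int], list[int]]:
    """Return 1-based indices of the intervals that recombine on the
    maternal and on the paternal haplotype, respectively."""
    maternal: list[int] = []
    paternal: list[int] = []
    for i, symbol in enumerate(separators, start=1):
        mat, pat = SEPARATOR_FLAGS[symbol]
        if mat:
            maternal.append(i)
        if pat:
            paternal.append(i)
    return maternal, paternal
```

The lengths of the two returned lists are the per-haplotype recombinant counts that the rerun thresholds of batch item #44 refer to.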
The trailing symbols indicate whether the genotype was typed in the data files and
whether the inferred phase is fixed by the parental genotypes:
! indicates the original phenotype was unknown but inferred phase is fixed;
* indicates the original phenotype was known but inferred phase is NOT fixed;
& indicates the original phenotype was unknown and inferred phase is NOT fixed.
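A single genotype token can be decoded with one regular expression. This hypothetical Python sketch assumes the tokens have already been split apart, that allele names never contain a separator or trailing symbol, and that a token with no trailing symbol was typed and has its phase fixed.

```python
import re
from typing import NamedTuple

# maternal allele, separator, paternal allele, optional trailing symbol
GENOTYPE = re.compile(r"^([^|/\\+!*&]+)([|/\\+])([^|/\\+!*&]+)([!*&]?)$")

# (typed, phase_fixed) per trailing symbol; "" is the assumed no-symbol case.
TRAILING = {"": (True, True), "!": (False, True), "*": (True, False), "&": (False, False)}


class Genotype(NamedTuple):
    maternal: str
    separator: str
    paternal: str
    typed: bool
    phase_fixed: bool


def parse_genotype(token: str) -> Genotype:
    """Split a drawn genotype such as '12/3*' into its parts."""
    match = GENOTYPE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised genotype: {token!r}")
    maternal, separator, paternal, trailing = match.groups()
    typed, phase_fixed = TRAILING[trailing]
    return Genotype(maternal, separator, paternal, typed, phase_fixed)
```

For example, "12/3*" decodes to maternal allele 12, a maternal recombination in the next interval, paternal allele 3, and a typed genotype whose phase is not fixed.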
(Batch item #46.2, when set to Y, causes the haplotypes to be drawn using letters
indicating the founder from which each allele descends, rather than the actual allele
value. The same lettering is used in the HAPLO-nn.mmm file, which also contains a key
showing which allele value each letter corresponds to.)
A future version of the Windows pedigree drawing program Cyrillic may be able to
import the file HEF-nn.ALL. This single file contains the best haplotypes
for each original pedigree. This file, which is in Haplotype Exchange Format 1.1.1,
is designed to be human as well as machine readable. For a description of this format,
see the file HEF-111.txt.