SimWalk2: `BATCH2.DAT` Control File Format

Summary:

In this file one sets the values of the parameters that control the current SimWalk2 run. For example, in this file one may set the names of all the other input files and the type of analysis that should be performed. This is the only data file that has a fixed name: BATCH2.DAT (note that some operating systems, notably Unix-derived systems, are case sensitive, so that on those systems the name must be exactly as written here).

`BATCH2.DAT` Control File Format:

The BATCH2.DAT file is similar in format to that required by Mendel3, i.e., the data are contained in a series of menu-driven choices: each instruction is formatted as a blank line followed by a line containing the batch item number, in I6 format, followed by the data values. Each data value is on a separate line. Unless otherwise noted, each batch item has only one data value. The batch items may appear in the BATCH2.DAT file in any order (except item #50, which indicates that the rest of the file is ignored). Batch item #1 is required; all others are optional. The only commonly altered values are batch items #1-16.
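As a rough illustration of the layout just described, the following Python sketch renders a few batch items as text. The helper name `format_batch_items` is hypothetical. Right-justifying integers in six columns is an assumption based on the I6 format; other values are written unpadded.

```python
def format_batch_items(items):
    """Render (item_number, [data values]) pairs in the described layout.

    Assumption: integers are right-justified in 6 columns (I6);
    all other data values are written as plain text, one per line.
    """
    lines = []
    for number, values in items:
        lines.append("")              # a blank line starts each instruction
        lines.append(f"{number:6d}")  # batch item number in I6 format
        for value in values:
            if isinstance(value, int):
                lines.append(f"{value:6d}")
            else:
                lines.append(str(value))
    return "\n".join(lines) + "\n"


text = format_batch_items([
    (1, [1]),                        # item #1: haplotype analysis
    (2, [7]),                        # item #2: integer label for this run
    (3, ["Example haplotype run"]),  # item #3: problem title
    (50, []),                        # item #50: rest of file is ignored
])
print(text)
```

The output would be saved under the fixed name BATCH2.DAT in the directory where SimWalk2 is run.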

Please see the example BATCH2.DAT file with annotations.

The BATCH2.DAT file may contain the following batch items:

GENERAL PARAMETERS

#1) Integer indicating which type of analysis should be performed:

0 => Pedigree Sampling
1 => Haplotype Analysis
2 => Parametric Linkage Analysis
3 => Non-Parametric Linkage Analysis
4 => IBD Analysis
5 => Mistyping Analysis
10 => Setup & Error Checking (ignoring affection status)
11 => Setup & Error Checking (including affection status)

format I6 [No default value!]
(Number of lines of data = 1)

#2) Integer label for this run of the program. This label will be appended to the names of the output files to make them unique. For example, if this label is nn, then for pedigree number mmm the haplotype analysis will be in file HAPLO-nn.mmm.

format I6 [Default value: 1]
(Number of lines of data = 1)

#3) Problem title. This will be the first line in most output files.

format A40 [Default value: Pedigree Analysis by SimWalk]
(Number of lines of data = 1)

#4) Is this a continuation of a previous SimWalk2 location score analysis to which one now wishes to add pedigrees ?
(If so, the pedigree file should now contain only the additional pedigrees and the locus and batch files should be identical to the earlier run except for this batch item and batch item #2. This batch item should be set to Y and batch item #2 should be incremented by 1. All references to the number of a pedigree in any output file or error message will reflect all pedigrees, including those analyzed before this continuation.)

{Only used for location score analysis.}
format A1 (Y or N) [Default value: N]
(Number of lines of data = 1)

#5) Should SimWalk2 run silently, i.e., without any screen output ?
In any case all progress messages are written to the text file: VIDEO-nn.TXT
This batch item is available in SimWalk2 versions 2.80 and above.

format A1 (Y or N) [Default value: N]
(Number of lines of data = 1)

INPUT-FILE PARAMETERS

#9) Map input file name.

format A12 [Default value: MAP.DAT]
(Number of lines of data = 1)

#10) Locus input file name. (See this note to use multiple matched locus and pedigree files.)

format A12 [Default value: LOCUS.DAT]
(Number of lines of data = 1)

#11) Pedigree input file name. (See this note to use multiple matched locus and pedigree files.)

format A12 [Default value: PEDIGREE.DAT]
(Number of lines of data = 1)

#12) Female symbol and male symbol (not case sensitive).

format A1 [Default values: F and M]
(Number of lines of data = 2)

#13) Information concerning the possible trait locus.

(1) Is there a trait locus listed in the locus and pedigree files ?
If so, it must be the initial locus in both the locus and the map files.

(2) The value for alpha, i.e., the a priori proportion of the pedigrees that are segregating an affected gene linked to this set of marker loci. (This line is optional. If missing, the default value 1.0 is assumed.)

format A1 (Y or N) [Default value: Y]
format F6.2 [Default value: 1.0]
(Number of lines of data = 1 or 2)

#16) The label used to indicate the affected individuals. A primitive wildcard is available here. If a '*' is the n-th non-blank character in this label, then all individuals whose affection-status matches this label in the first n-1 non-blank characters will be considered affected. (Batch item #17 indicates which field contains the affection-status.)

format A8 [Default value: '2' (as used by LINKAGE programs)]
(Number of lines of data = 1)
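The wildcard rule for batch item #16 can be sketched in Python as follows. The function name `is_affected` is hypothetical. Exact, case-sensitive matching when no '*' is present is an assumption, since the text above only describes the wildcard case.

```python
def is_affected(status, label="2"):
    """Check an affection-status string against the batch item #16 label.

    Blanks are ignored on both sides. If '*' is the n-th non-blank
    character of the label, only the first n-1 non-blank characters
    are compared.
    """
    status_chars = "".join(status.split())
    label_chars = "".join(label.split())
    star = label_chars.find("*")
    if star == -1:
        # Assumption: without a wildcard the whole label must match.
        return status_chars == label_chars
    return status_chars.startswith(label_chars[:star])


print(is_affected("2"))               # default LINKAGE-style label
print(is_affected("AFF1", "AFF*"))    # wildcard: prefix "AFF" matches
print(is_affected("UNAFF", "AFF*"))   # prefix does not match
```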

#17) An integer indicating where the affection-status label for an individual is placed: 0 => in the trait locus; n>0 => in the n-th quantitative variable; -1 => everyone should be considered affected. (Batch item #16 is the label that indicates who is affected.)

format I6 [Default value: 0 (as used by LINKAGE programs)]
(Number of lines of data = 1)

#18) Number of quantitative variables. If there is any liability class information, then it must be placed in the first quantitative variable, and thus this batch item would be set to at least 1.

format I6 [Default value: 0]
(Number of lines of data = 1)

#19) Name of the file containing the penetrance values for the liability classes defined at the trait locus. If the name is blank, then no penetrance file is used and penetrance values are 0 or 1 as implied by the locus file.

{Only used for location score analysis.}
format A12 [Default value: ' ' (blank)]
(Number of lines of data = 1)

RUN-SPECIFYING PARAMETERS

#20) Number of sampled pedigrees to find for each original pedigree.

format I6 [Default value: 1000]
(Number of lines of data = 1)

#21) Number of parallel runs, i.e., the number of simulated annealing runs started at each initial pedigree. At completion the single best result found over the set of parallel runs is reported.

format I6 [Default value: 1]
(Number of lines of data = 1)

#22) Initial 'temperature' in simulated annealing procedure.

format F6.2 [Default value: 100.0]
(Number of lines of data = 1)

#23) Factor by which the 'temperature' changes during simulated annealing.

format F6.2 [Default value: 0.99]
(Number of lines of data = 1)

#24) Number of 'temperature' changes during simulated annealing.

format I6 [Default value: 800]
(Number of lines of data = 1)
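To get a feel for how batch items #22-24 interact, the sketch below computes the temperature after all changes. It assumes a geometric schedule in which each change multiplies the current temperature by the factor from item #23; the text above does not spell out the schedule, and the function name is hypothetical.

```python
def final_temperature(initial=100.0, factor=0.99, changes=800):
    """Temperature after `changes` multiplicative updates.

    Assumption: each temperature change multiplies the current
    value by `factor` (geometric cooling).
    """
    return initial * factor ** changes


# With the default values the temperature falls to roughly 0.03.
print(final_temperature())
```

Under this assumption, raising the factor toward 1 or lowering the number of changes leaves the final temperature higher, i.e., the annealing ends less "cold".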

#25) Number of pre-simulated annealing steps.

format I6 [Default value: 0]
(Number of lines of data = 1)

#26) Maximum number of attempts that will be made for each pedigree to find an initial legal descent state using the iterative genotype elimination technique.

format I6 [Default value: 64]
(Number of lines of data = 1)

#29) Minimum number of steps to take between each sampled pedigree:

(1) during simulated annealing and
(2) during the MCMC process.

format I6 [Default values: 1000 and 1000]
(Number of lines of data = 2)

#30) Multiplicative factor for the number of steps:

(1) between temperature changes during simulated annealing and
(2) between realizations during the MCMC process

(The number of steps = max{1000, MF*M*P*2} where MF = this multiplicative factor, M = the number of marker loci and P = the number of people in the current pedigree.)

format I6 [Default values: 10 and 10]
(Number of lines of data = 2)
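The step-count formula for batch item #30 can be checked with a few lines of Python. The function name and the example pedigree sizes are made up for illustration; the floor of 1000 is taken literally from the formula above.

```python
def steps_between(mf, n_markers, n_people, floor=1000):
    """Number of steps = max{1000, MF * M * P * 2}."""
    return max(floor, mf * n_markers * n_people * 2)


# Default factor 10, 12 marker loci, 30 people: 10 * 12 * 30 * 2 = 7200
print(steps_between(10, 12, 30))
# A small pedigree falls back to the floor: 10 * 3 * 10 * 2 = 600 -> 1000
print(steps_between(10, 3, 10))
```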

#31) Mean number of transitions per step:

(1) during simulated annealing and
(2) during the MCMC process.

format F6.2 [Default values: 2 and 2]
(Number of lines of data = 2)

#32) Multiplicative factor of the relative weight given untyped people versus typed people when choosing the pivot person:

(1) during simulated annealing and
(2) during the MCMC process.

format I6 [Default values: 10 and 10]
(Number of lines of data = 2)

#33) Fraction of time the next transition within the same step will pivot on a neighboring person or locus of the previous pivot:

(1) during simulated annealing and
(2) during the MCMC process.

format F6.2 [Default values: 0.5 and 0.5]
(Number of lines of data = 2)

#34) Frequency of the type 2 transition on founders:

(1) during simulated annealing and
(2) during the MCMC process.

format F6.2 [Default values: 0.25 and 0.25]
(Number of lines of data = 2)

#35) Number of unconstrained steps during free runs.

format I6 [Default value: 0 (i.e., no free runs allowed)]
(Number of lines of data = 1)

#36) Random seeds: three integers from the interval [1, 30000].

format I6 [Default values: 27713, 2321 and 18777]
(Number of lines of data = 3)
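A small Python check for the seeds of batch item #36 might look like the following. The function name is hypothetical, and right-justifying each seed in six columns is an assumption based on the I6 format.

```python
def seed_lines(seeds=(27713, 2321, 18777)):
    """Validate three random seeds and format them as I6 data lines."""
    if len(seeds) != 3:
        raise ValueError("exactly three seeds are required")
    for seed in seeds:
        if not isinstance(seed, int) or not 1 <= seed <= 30000:
            raise ValueError(f"seed {seed!r} is not an integer in [1, 30000]")
    return [f"{seed:6d}" for seed in seeds]


print(seed_lines())
```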

#37) Information concerning the error model used for the mistyping analysis.

(1) Integer indicating the error model to use. The default is a uniform error model obtained by setting this batch item to 0. An alternative empirical error model is available by setting this batch item to 1. Additional error models may be easily implemented by minor coding within the SimWalk2 source.

(2) The overall error rate for mistyping. (This line is optional. If missing, the default value 0.025 is assumed. However, this line is necessary when setting the following batch item, 37(3).)

(3) Should a modified layout be used in the mistyping output files? By default the posterior probabilities of mistyping are shown grouped by marker. Alternatively, by setting this batch item to Y, the probabilities are grouped by individual. (This line is optional. If missing, the default value N is assumed.)

format I6 [Default value: 0]
format F8.5 [Default value: 0.025]
format A1 (Y or N) [Default value: N]
(Number of lines of data = 1 or 2 or 3)
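Because the second data line of batch item #37 becomes mandatory once the third is supplied, a helper that builds the data lines could fill in the default error rate automatically. This is only a sketch: the function name is hypothetical and the column alignment is an assumption based on the I6 and F8.5 formats.

```python
def item37_lines(model=0, error_rate=None, by_individual=None):
    """Build the 1, 2 or 3 data lines for batch item #37."""
    lines = [f"{model:6d}"]                  # (1) error model, I6
    if by_individual is not None and error_rate is None:
        error_rate = 0.025                   # line (2) is needed before line (3)
    if error_rate is not None:
        lines.append(f"{error_rate:8.5f}")   # (2) overall error rate, F8.5
    if by_individual is not None:
        lines.append("Y" if by_individual else "N")  # (3) layout, A1
    return lines


print(item37_lines())                         # one line: uniform model
print(item37_lines(1, by_individual=True))    # three lines, default rate filled in
```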

#38) File name for importing precomputed NPL scores. SimWalk2 2.83 (or later) can use precomputed NPL statistics, which can be computed quickly on small pedigrees by other programs, e.g., Mendel and Merlin. SimWalk2 will combine these precomputed scores with the estimates it obtains for large pedigrees and then compute empirical p-values both for individual pedigrees and overall. If the name is blank, then no import file is used.

format A12 [Default value: ' ' (blank)]
(Number of lines of data = 1)

OUTPUT-FILE PARAMETERS

#39) In the IBD analysis option two types of files can be output. The three IBD files include the probabilities of each of the possible descent configurations given the observed data, and include both the theoretical and empirical kinship coefficients. The two kinship files include only the empirical (also known as conditional) kinship coefficients. The kinship files are designed to export all the kinship data to QTL programs (using variance component modeling) such as Mendel or SOLAR. There are four recognized values for this batch item:
KIN (to output only the two kinship files) or
IBD (to output only the three IBD files for each pedigree) or
NONE (to output none of these five files) or
ALL (to output all five files; the default value).

{Only used for IBD analysis.}
format A [Default value: ALL]
(Number of lines of data = 1)

#40) Output the individual pedigrees into the files INPED-nn.mmm ?
The pedigrees will reflect any reordering of the loci, any renaming of the alleles and any obligate phenotype additions.

Here nn is the integer label for this run of the program
and mmm is the number of the pedigree in this run.

format A1 (Y or N) [Default value: N]
(Number of lines of data = 1)

#41) Output the location scores computed from each original pedigree individually in files SCORE-nn.mmm ?

Here nn is the integer label for this run of the program
and mmm is the number of the pedigree in this run.

{Only used for parametric linkage analysis.}
format A1 (Y or N) [Default value: N]
(Number of lines of data = 1)

#42) Threshold for a mistyping probability to be considered significant. Significant mistypings are flagged in the TYPNG-nn.mmm files and are blanked out in the PEDNU-nn.mmm files.

{Only used for mistyping analysis.}
format F8.5 [Default value: 0.50]
(Number of lines of data = 1)

#43) Create the haplotype output files, PEDRW-nn.mmm and HEF-nn.ALL, for export of the best haplotypes found to pedigree drawing programs: Pedigree/Draw (on Macintosh) and Cyrillic (on Windows) ?

Here nn is the integer label for this run of the program
and mmm is the number of the pedigree in this run.

{Only used for haplotyping.}
format A1 (Y or N) [Default value: Y]
(Number of lines of data = 1)

#44) By default, if the best haplotype found contains more than the expected number of recombinants given the number of meioses and the lengths of the intervals, then the original pedigree is included in RERUN-nn.PED as a candidate for rerunning. However, this batch item may be used to set specific thresholds: the threshold number of haplotypes (HAPLO_THRESHOLD) and the threshold number of recombinants (RECOMB_THRESHOLD). If a pedigree has at least HAPLO_THRESHOLD number of haplotypes with RECOMB_THRESHOLD number of recombinants in each, or any number of haplotypes with more than RECOMB_THRESHOLD number of recombinants, then the original pedigree will be included in RERUN-nn.PED.

{Only used for haplotyping.}
format I6 [Default values: 0 and 0]
(Number of lines of data = 2)
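Read literally, the explicit-threshold rule of batch item #44 can be sketched as below. The function name and the list-of-counts input are hypothetical, and the sketch covers only explicitly set thresholds; it does not model the default behavior based on the expected number of recombinants.

```python
def needs_rerun(recombinants_per_haplotype, haplo_threshold, recomb_threshold):
    """Apply the explicit thresholds of batch item #44 to one pedigree.

    `recombinants_per_haplotype` holds the recombinant count of each
    haplotype in the pedigree (an illustrative input format).
    """
    at_threshold = sum(
        1 for count in recombinants_per_haplotype if count == recomb_threshold
    )
    above_threshold = any(
        count > recomb_threshold for count in recombinants_per_haplotype
    )
    return above_threshold or at_threshold >= haplo_threshold


print(needs_rerun([0, 1, 2, 2], haplo_threshold=2, recomb_threshold=2))
print(needs_rerun([0, 1, 2], haplo_threshold=2, recomb_threshold=2))
print(needs_rerun([0, 3], haplo_threshold=5, recomb_threshold=2))
```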

#45) The threshold fraction of the total simulated annealing steps. If a best haplotype is first encountered after this level, then the original pedigree will be included in RERUN-nn.PED.

{Only used for haplotyping.}
format F6.2 [Default value: 0.85]
(Number of lines of data = 1)

#46) This batch item controls the display of the output haplotypes. There are three choices, each to be set with a Y or N.

(1) List the marker haplotypes vertically in the HAPLO-nn.mmm files ?
Vertical may be the preferred orientation when using more than 10 markers, since the horizontal output would be harder to print.

(2) Use a label indicating the founder haplotype from which each allele is descended, instead of the allele itself ?
A key is included showing the alleles that the labels correspond to.

(3) Force an allele choice even when there is no information, except allele frequencies ?
This is NOT recommended.

{Only used for haplotyping.}
format A1 (Y or N) [Default values: N, N and N]
(Number of lines of data = 3)

#47) Write all output pedigrees in Linkage-format, instead of the default Mendel-format ?

format A1 (Y or N) [Default value: N]
(Number of lines of data = 1)

#48) The number of unconditional simulated pedigrees to use to obtain the empirical p-values for the non-parametric linkage statistics.

format I6 [Default value: 10000]
(Number of lines of data = 1)

#49) This batch item determines at which points within the marker region the IBD and kinship values will be calculated. The first value is an integer that sets the number of evenly spaced points within each marker interval that will be used. This value may be 0 (zero), in which case only the values at the markers will be calculated. If the first value is set to -1, then the second value, which must be a positive real number, is used to set the increment in cM for a grid of points that starts at the first marker and extends up to the last marker. Note that the marker positions are always used, in addition to this grid of points.

format I6 [Default value: 3]
format F6.2 [Default value: 1.0]
(Number of lines of data = 1 or 2)
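The two modes of batch item #49 can be illustrated with a short Python sketch. The function name is hypothetical, marker positions are taken to be in cM, and "evenly spaced points within each marker interval" is assumed to mean interior points that divide the interval into equal parts.

```python
def grid_positions(marker_cm, points_per_interval=3, increment_cm=1.0):
    """Positions (in cM) at which IBD and kinship values would be computed.

    Marker positions are always included. With -1 as the first value,
    a grid with spacing `increment_cm` starts at the first marker and
    extends up to the last marker.
    """
    markers = sorted(marker_cm)
    positions = set(markers)
    if points_per_interval == -1:
        if increment_cm <= 0:
            raise ValueError("the increment must be a positive real number")
        k = 1
        while markers[0] + k * increment_cm < markers[-1]:
            positions.add(markers[0] + k * increment_cm)
            k += 1
    elif points_per_interval >= 0:
        for left, right in zip(markers, markers[1:]):
            step = (right - left) / (points_per_interval + 1)
            for k in range(1, points_per_interval + 1):
                positions.add(left + k * step)
    else:
        raise ValueError("first value must be -1, 0 or a positive integer")
    return sorted(positions)


print(grid_positions([0.0, 4.0], 3))              # default: 3 points per interval
print(grid_positions([0.0, 4.0, 10.0], -1, 2.5))  # fixed 2.5 cM grid plus markers
```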

#50) The end of the data in the BATCH2.DAT file. All subsequent text is ignored.
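For checking a control file before a run, a minimal reader can be sketched in Python. This is not SimWalk2's own parser: the function name is hypothetical, and it assumes that every instruction starts after a blank line, that the item number occupies the first six columns, and that no data value is itself a blank line (a blank file name, as allowed for items #19 and #38, would confuse it).

```python
def parse_batch(text):
    """Return {item_number: [data lines]} for the text of a BATCH2.DAT file."""
    items = {}
    current = None
    previous_blank = True
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            previous_blank = True
            continue
        if previous_blank:
            number = int(line[:6])   # batch item number, I6
            if number == 50:
                break                # item #50: the rest of the file is ignored
            current = items.setdefault(number, [])
        else:
            current.append(line.strip())
        previous_blank = False
    return items


sample = "\n     1\n     1\n\n    36\n 27713\n  2321\n 18777\n\n    50\nignored text\n"
print(parse_batch(sample))
```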
