IV. DEFINING THE MODEL

IV.1. Execution Options
IV.2. Transmission Parameters
IV.3. Recombination Parameters
IV.4. Frequency Parameters
IV.5. Discrete Trait Parameters
IV.6. Quantitative Trait Parameters
IV.7. Within-Genotype Parameters
IV.8. Multivariate Parameters
IV.9. Genotype-Assignment Codes

Program prepap (section III.3) is used to define the model to compute likelihoods or simulate phenotypes using either simul (section III.4) or papdr (section III.5). PAP can assume a variety of genetic models related to any number of markers and traits. Different parameterizations are available to define the model. This section describes the features which are available in PAP; models can be defined by combining these features.

Section IV.1 describes the execution options for program papdr. Sections IV.2-8 describe the parameterizations and section IV.9 describes the genotype assignment options available in papdr and also program simul. See section D for examples of each feature.

IV.1. Execution Options

Program papdr (section III.5) can perform eight functions: (1) compute a likelihood for a set of parameter values, (2) compute genotypic probabilities, (3) simulate phenotypes and output them to files, (4) maximize the likelihood with or without standard error calculation, (5) compute standard errors when the estimates are known, (6) simulate phenotypes and estimate the parameter values, (7) estimate expected lod scores, (8) compute likelihoods of a grid of parameter values.

See section D.1 for examples demonstrating the execution options. See sections IV.1.1 through IV.1.8 for descriptions of the execution options.

IV.1.1. Single likelihood

Execution option 1 computes the likelihood of the specified parameter values. File pap.out lists by pedigree, as well as for the complete sample, a count of total and measured members and common and natural (multiplied by -2) logarithms of the likelihoods before and after ascertainment correction.

I use execution option 1 to: (1) check for the correct sample size in my data file, (2) test for genotype inconsistencies in the markers, (3) compare the likelihoods of two different models within pedigrees, (4) obtain a likelihood for comparison to a known value (hand computed or using other subroutines).

See section D.1.1 for an example demonstrating execution option 1.

IV.1.2. Genotypic probabilities

Execution option 2 computes genotypic probabilities for either designated individuals or all sample members. The genotypic probabilities computed pertain to the genetic model and parameter values contained in model.dat. File pap.out includes only the ID numbers and the corresponding probabilities. For multi-locus genotypes, either single-locus or multi-locus probabilities may be output. For pedigree members untyped for a marker, pap.out will indicate possible incorrect probabilities due to genotype combining (section VI.3). File pap.out as output from option 2, sorted, and renamed prob.dat, serves as input to gpe (section III.6).

I use execution option 2 to: (1) identify possible gene-carriers for a gene inferred from segregation analysis, (2) produce the probabilities to estimate parameters using GPEs [Hasstedt & Moll 1989] with program gpe (section III.6), (3) explore counter-intuitive results by checking the genotypic assignment of selected pedigree members.

See section D.1.2 for an example demonstrating execution option 2.

IV.1.3. Simulation and output to files

Execution option 3 simulates phenotypes reflecting the genetic model and parameter values contained in model.dat and outputs files header.dat, phen.01, phen.02, ..., phen.n for n replicates. Pedigree structure is fixed. Phenotypes for some pedigree members may be fixed or designated as missing. For unconditional simulation (no phenotypes fixed) program simul (section III.4) may be used instead. In either case, the assigned genotypes for each individual and each replicate are output in file pap.log.

I use execution option 3 to: (1) quickly generate distinct but similar data for each student in a linkage class, (2) produce replicates with identical inheritance to estimate power for segregation analysis (option 7 estimates power for linkage analysis), (3) produce data with known inheritance in order to evaluate analysis methods.

See section D.1.3 for an example demonstrating execution option 3.

IV.1.4. Maximization of parameters

Execution option 4 maximizes the likelihood over the specified parameters using either GEMINI [Lalouel 1979] or NPSOL [Gill et al 1986]. You may maximize any number of parameters simultaneously and may restrict two or more parameters to the same value. With GEMINI, execution terminates upon reaching a boundary value. NPSOL tests boundary values and determines if the maximum occurs on a boundary. Upon attaining the maximum, papdr outputs a new version of model.dat containing the estimated parameter values.

Maximization, the primary execution option selected, simultaneously produces maximum likelihood estimates of the parameters and maximized likelihoods for statistical testing; you may also compute standard errors. However, you will save computer time by reserving standard error computation (using execution option 5) for a restricted set of models determined from a set of exploratory runs executed without standard errors. Section V.3 discusses the problems encountered when maximizing the likelihood.

See section D.1.4 for an example demonstrating execution option 4.

IV.1.5. Standard errors of parameters

Execution option 5 approximates standard errors of parameters by computing numerical central derivatives. You must provide maximum likelihood estimates; file model.dat output from execution option 4 serves as input. Standard errors will not be computed for a parameter estimated at either a lower or an upper bound. Execution will terminate prematurely upon finding a higher likelihood when computing derivatives; papdr outputs a new version of model.dat containing the more likely parameter values.
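The idea behind option 5 can be illustrated with a single parameter. The following Python sketch is not papdr's code: it approximates a standard error from a central second difference of the log likelihood at its maximum. The binomial log likelihood, the step size h, and the function names are assumptions chosen only for illustration.

```python
import math

def log_lik(p, k=30, n=100):
    """Binomial log likelihood (constant omitted); a stand-in for the pedigree likelihood."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def standard_error(f, estimate, h=1e-4):
    """Approximate the SE from a central second difference of f at its maximum."""
    second = (f(estimate + h) - 2.0 * f(estimate) + f(estimate - h)) / (h * h)
    return math.sqrt(-1.0 / second)

# The maximum likelihood estimate of p is k/n = 0.30.
se = standard_error(log_lik, 0.30)
print(round(se, 4))
```

For this toy likelihood the result can be checked against the analytic value, the square root of p(1 - p)/n. A nonnegative second difference would mean the supplied value is not a maximum, which corresponds to the situation in which papdr stops and reports more likely parameter values.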

I use execution option 5 to: (1) compute standard errors for the models for which I plan to publish parameter estimates, (2) verify or refute a local maximum when option 4 terminates with an equivocal code.

See section D.1.5 for an example demonstrating execution option 5.

IV.1.6. Simulation and estimation

Execution option 6 simulates phenotypes reflecting the genetic model and parameter values in model.dat, estimates the specified parameters, and outputs replicate means and standard deviations of the estimates and the replicate mean log likelihood. File pap.log contains the parameter estimates and log likelihood of each replicate.

I use execution option 6 to: (1) evaluate the accuracy of the mixed model approximation on my data through its ability to return estimates close to the simulated values, (2) test goodness-of-fit by comparing the maximum likelihood on my data to a maximum likelihood distribution produced by simulation.

See section D.1.6 for an example demonstrating execution option 6.

IV.1.7. Expected lod score estimation

Execution option 7 simulates marker phenotypes, estimates the recombination probabilities, and outputs replicate means and standard deviations of the lod score. File pap.log contains the recombination estimates and lod scores of each replicate.

I use execution option 7 to: (1) determine if a set of disease pedigrees contains sufficient information to detect linkage, (2) estimate the number of pedigrees of a fixed structure to collect for a linkage study, (3) identify which pedigrees with partial marker typing are worth more typing.

See section D.1.7 for an example demonstrating execution option 7.

IV.1.8. Grid on a parameter

Execution option 8 divides user-specified ranges of one or two parameters into 5 equal intervals and computes the likelihood of each value. Program papdr outputs both common and natural (multiplied by -2) logarithms of the likelihood. For values of frequencies or variance components which exceed a sum of 1, papdr outputs 0.
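The grid can be pictured with the short Python sketch below. It assumes that both endpoints of the range are evaluated, giving six values for five intervals; papdr's exact choice of points is not stated here. The toy likelihood and all names are hypothetical stand-ins for the pedigree likelihood.

```python
import math

def grid_values(lower, upper, intervals=5):
    """Boundaries of `intervals` equal subdivisions of [lower, upper]."""
    step = (upper - lower) / intervals
    return [lower + i * step for i in range(intervals + 1)]

def toy_likelihood(p, copies=3, alleles=20):
    """Hypothetical stand-in for the pedigree likelihood: `copies` of an
    allele with frequency p observed among `alleles` sampled alleles."""
    return p ** copies * (1.0 - p) ** (alleles - copies)

for p in grid_values(0.05, 0.30):
    lik = toy_likelihood(p)
    print(f"p = {p:.2f}  log10 L = {math.log10(lik):9.4f}  -2 ln L = {-2.0 * math.log(lik):9.4f}")
```

The two columns carry the same information: -2 ln L equals -2 ln(10) times log10 L.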

I use execution option 8 to: (1) provide detail of the likelihood surface when I suspect a flat surface, (2) search for a higher likelihood when a parameter maximizes on the boundary (see section V.3.2), (3) test for linkage using a grid on the recombination probability.

See section D.1.8 for an example demonstrating execution option 8.

IV.2. Transmission Parameters

Subroutine paptcms (section C.2.7) assumes Mendelian segregation. Alternatively, subroutine paptctp (section C.2.8) allows you to estimate the allele transmission probabilities in order to test for major locus inheritance. This test compares the likelihood of the general model with estimated allele transmission probabilities to the likelihood of its submodel, Mendelian segregation; a similar comparison to the likelihood of another submodel, environmental nontransmission, tests an alternative hypothesis. Rejecting environmental nontransmission while failing to reject Mendelian segregation supports major locus inheritance.

Subroutine paptctp defines t₁, t₂, t₃, the allele transmission probabilities, as the probabilities that a parent of genotype 1, 2, or 3, respectively, transmits allele 1 to an offspring. Subroutine paptctp then applies the Hardy-Weinberg law to compute offspring genotype transmission probabilities for each pair of parental genotypes. For this reason, the frequency subroutine (papfq, section C.1) selected should also assume Hardy-Weinberg equilibrium and compute the genotype frequencies from the allele frequency p.
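One reading of this step is that the alleles transmitted by the two parents combine independently. The Python sketch below works under that assumption; the function name is hypothetical and the code is not taken from paptctp.

```python
def offspring_genotype_probs(t_father, t_mother):
    """Probabilities of offspring genotypes 1, 2, 3 (two, one, or no copies
    of allele 1), given each parent's probability of transmitting allele 1
    and combining the two transmitted alleles independently."""
    return (
        t_father * t_mother,
        t_father * (1.0 - t_mother) + (1.0 - t_father) * t_mother,
        (1.0 - t_father) * (1.0 - t_mother),
    )

# Mendelian values t1 = 1, t2 = 1/2, t3 = 0, indexed by parental genotype.
t = {1: 1.0, 2: 0.5, 3: 0.0}
print(offspring_genotype_probs(t[2], t[2]))  # genotype 2 x genotype 2 -> (0.25, 0.5, 0.25)
print(offspring_genotype_probs(t[1], t[3]))  # genotype 1 x genotype 3 -> (0.0, 1.0, 0.0)
```

With the Mendelian values the familiar 1:2:1 segregation ratio results for two heterozygous parents. Other values of t₁, t₂, t₃ give the general transmission model.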

Both Mendelian segregation and environmental nontransmission maintain a constant allele frequency across generations. Imposing the same restriction on the general model requires p = p²t₁ + 2pqt₂ + q²t₃, where q = 1 - p. Subroutine paptcet (section C.2.6) includes only t₁ and t₃ as parameters; t₂ is computed to conform to the constraint.
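Solving the equilibrium constraint p = p²t₁ + 2pqt₂ + q²t₃ (with q = 1 - p) for t₂ gives t₂ = (p - p²t₁ - q²t₃)/(2pq). The Python sketch below evaluates this expression; it illustrates the algebra only and is not the code of paptcet, and the function name is hypothetical.

```python
def t2_from_equilibrium(p, t1, t3):
    """Solve p = p^2*t1 + 2pq*t2 + q^2*t3 for t2, with q = 1 - p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    return (p - p * p * t1 - q * q * t3) / (2.0 * p * q)

print(t2_from_equilibrium(0.3, 1.0, 0.0))  # Mendelian t1 and t3: about 0.5
print(t2_from_equilibrium(0.3, 0.3, 0.3))  # t1 = t3 = p: about 0.3
```

The two calls recover the Mendelian and environmental submodels. For other combinations of p, t₁, and t₃ the computed t₂ can fall outside the interval from 0 to 1, so not every pair of values is admissible.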

Subroutines paptcms, paptctp, and paptcet apply to any genotype-assignment code (section IV.9). For codes with sex-specific genotypes, the allele transmission probabilities depend on the sex of both parent and offspring [Demenais & Elston 1981]. Subroutines paptctp and paptcet restrict the genetic model to 1 locus with 2 alleles; paptcms makes no restrictions.

See section D.2 for examples demonstrating the use of paptctp. See sections IV.2.1 through IV.2.3 for examples demonstrating the use of transmission probabilities.

IV.2.1. Mendelian segregation

Mendelian transmission probabilities of t₁ = 1, t₂ = ½, and t₃ = 0 restrict a parent of genotype 1 (AA) to transmit exclusively allele A, a parent of genotype 2 (Aa) to transmit allele A with probability ½, and a parent of genotype 3 (aa) to never transmit allele A.

See section D.2.1 for an example demonstrating Mendelian segregation using paptctp.

IV.2.2. Environmental nontransmission

Environmental nontransmission results from making the transmission probabilities independent of the parental genotypes. That is, the transmission probabilities are all the same regardless of the parental or offspring genotypes. The allele frequency may be set equal to the common transmission probability to ensure equilibrium across generations. Otherwise, a different frequency may be estimated for founders than for nonfounders. This is unreasonable in multi-generation pedigrees where founders occur in all generations and at all ages. On the other hand, a frequency difference between founders and nonfounders may accurately represent nuclear families with all founders (parents) older than all nonfounders (children).

Application of the Hardy-Weinberg law in paptctp forces a relationship between the genotype transmission probabilities. When dominance reduces the model to two distinguishable phenotypes, the two proportions are unrestricted. However, when three phenotypes can be distinguished, their occurrence in Hardy-Weinberg proportions places a genetic constraint on a supposedly environmental model.

You can extend the environmental model to any number of alleles and loci by using paptce (section C.2.5), which equates the genotype transmission probabilities to the genotype frequencies. You can eliminate the Hardy-Weinberg assumption by using paptce with papfqg (section C.1.3). You can perform commingling analysis of a quantitative trait by using the environmental model with a power transformation (section IV.6).

See section D.2.2 for an example demonstrating environmental nontransmission using paptctp.

IV.2.3. General transmission

Both Mendelian segregation and environmental nontransmission constitute submodels of a general transmission model with each allele transmission probability estimated to a value between 0 and 1. However, t₁ and t₃ often estimate to 1 and 0, respectively. Estimation on the boundary complicates determining the degrees of freedom in testing the Mendelian and environmental models.

Estimating p, t₁, t₂, and t₃ ignores the assumption of equilibrium across generations made in the Mendelian and environmental nontransmission models. Alternatively, you can estimate p, t₁ and t₃ and compute t₂ from p, t₁ and t₃ using paptcet (section C.2.6) [Demenais & Elston 1981].

See section D.2.3 for an example demonstrating general transmission using paptctp.

IV.3. Recombination Parameters

The recombination probability θ becomes a parameter in multi-locus models. Parameter θ may be gender-specific or equal for men and women. For more than two loci, you specify the order of the loci and θ_i represents the recombination probability between locus i and locus i + 1. When estimating θ, you may fix or simultaneously estimate other parameters. In linkage analysis, you test if θ differs significantly from ½. For the hypothesis that two traits result from the same locus, you test if θ differs significantly from 0.0.
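The linkage test compares the likelihood at the estimated recombination probability with the likelihood under free recombination (a recombination probability of 1/2); the common logarithm of that ratio is the lod score. The Python sketch below computes it for the simplest textbook case, a count of recombinants among phase-known meioses. That setting and the function name are assumptions for illustration; papdr computes the likelihoods on the pedigrees themselves.

```python
import math

def lod(theta, recombinants, meioses):
    """Two-point lod score for phase-known meioses: log10 of L(theta) / L(1/2)."""
    nonrecombinants = meioses - recombinants
    log_l = 0.0
    if recombinants:
        log_l += recombinants * math.log10(theta)
    if nonrecombinants:
        log_l += nonrecombinants * math.log10(1.0 - theta)
    return log_l - meioses * math.log10(0.5)

print(round(lod(0.1, 1, 10), 3))
```

The lod score is zero at a recombination probability of 1/2 by construction. A value of 0 for theta with at least one recombinant has zero likelihood, and the sketch raises an error in that case rather than returning minus infinity.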

To assume free recombination in a multi-locus model, use paptcms (section C.2.7). To assume linkage and autosomal inheritance use paptcal (section C.2.1). For linkage with other genotype-assignment codes (section IV.9), consult the list of subroutines in section C.2. None of the linkage subroutines restrict the number of loci or alleles.

See section D.3 for examples demonstrating recombination.

IV.4. Frequency Parameters

The probability of a specified genotype for a founder equals the corresponding genotype frequency; this assumes that founders in the pedigrees constitute a random sample of the population. Since the frequencies sum to 1, a constraint reduces the number of parameters; the final frequency (for each locus or total) cannot be estimated.

To assume Hardy-Weinberg equilibrium and linkage equilibrium for the genotype frequencies, use papfqhw (section C.1.6). Alternatively, to assume or test for linkage disequilibrium or deviations from Hardy-Weinberg equilibrium, use another version of papfq (section C.1).

See section D.4 for examples demonstrating the frequency parameters.

IV.4.1. Allele frequencies

Subroutine papfqh2 (section C.1.5) uses linkage disequilibrium D to compute genotype frequencies. D equals the excess of the haplotype frequency over the product of allele frequencies (locus 1, allele 1 and locus 2, allele 1). Therefore, the range of D depends on the allele frequencies; 0 represents equilibrium. You might select this parameterization over papfqa2 (section IV.4.2) in order to fix the allele frequencies.
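The dependence of the range of D on the allele frequencies follows from requiring all four haplotype frequencies to be nonnegative. The Python sketch below uses the standard two-locus relations; the function name and the example values are hypothetical, and the code is not taken from papfqh2.

```python
def haplotype_frequencies(p1, p2, d):
    """Haplotype frequencies for two diallelic loci from the allele-1
    frequencies p1 (locus 1) and p2 (locus 2) and the disequilibrium d."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    d_min = max(-p1 * p2, -q1 * q2)
    d_max = min(p1 * q2, q1 * p2)
    if not d_min <= d <= d_max:
        raise ValueError(f"D must lie in [{d_min:.4f}, {d_max:.4f}]")
    return {
        (1, 1): p1 * p2 + d,   # allele 1 at both loci
        (1, 2): p1 * q2 - d,
        (2, 1): q1 * p2 - d,
        (2, 2): q1 * q2 + d,
    }

freqs = haplotype_frequencies(0.3, 0.6, 0.05)
print(freqs)
```

With allele frequencies 0.3 and 0.6, D may range from -0.18 to 0.12. Setting D to 0 returns the products of the allele frequencies, that is, linkage equilibrium.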

Subroutine papfqx2 (section C.1.7) parameterizes D equivalently for X-linked inheritance. Both papfqh2 and papfqx2 restrict the genetic model to two loci with 2 alleles each.

See section D.4.1 for examples demonstrating the use of papfqh2.

IV.4.2. Conditional allele frequencies

Subroutine papfqa2 (section C.1.1) uses conditional allele frequencies in computing the genotype frequencies. For locus 1, the allele frequency is conditional on the allele at locus 2; for locus 2, allele frequencies are unconditional. Equality of the conditional allele frequencies equates to linkage equilibrium; significant differences indicate association between the loci. You might select this parameterization over papfqh2 (section IV.4.1) for diseases associated with a marker locus. For example, estimate the frequency of the allele for diabetes mellitus (locus 1) conditional on HLA-DR3, conditional on HLA-DR4, and conditional on HLA-DRX (locus 2).

Subroutine papfqa2s (section C.1.2) parameterizes the frequencies equivalently for category-specific inheritance. Both papfqa2 and papfqa2s restrict the first locus to 2 alleles, but do not restrict the number of alleles at the second locus.

See section D.4.2 for examples demonstrating the use of papfqa2.

IV.4.3. Haplotype frequencies

Subroutine papfqh (section C.1.4) uses haplotype frequencies in computing the genotype frequencies assuming Hardy-Weinberg equilibrium. This parameterization removes any constraints on the numbers of loci or alleles.

See section D.4.3 for examples demonstrating the use of papfqh.

IV.4.4. Genotype frequencies

Subroutine papfqg (section C.1.3) uses genotype frequencies directly, eliminating all assumptions about relationships between the frequencies.

See section D.4.4 for examples demonstrating the use of papfqg.

IV.5. Discrete Trait Parameters

For a discrete trait, the genetic model assumes an underlying continuous liability scale distributed as a normal density within each genotype. Disease occurs when liability exceeds a threshold determined such that the integral to the right of the threshold within each genotype equals the corresponding affection probability.
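The relation between a threshold and an affection probability can be sketched in Python with the standard library's normal distribution. The function names and the numerical values are hypothetical; the sketch shows only the tail-area relation described above, not the code of any dmlpr subroutine.

```python
from statistics import NormalDist

def penetrance(threshold, genotype_mean, sd=1.0):
    """Affection probability: area of the genotype's normal liability
    density to the right of the threshold."""
    return 1.0 - NormalDist(genotype_mean, sd).cdf(threshold)

def threshold_for(affection_prob, genotype_mean, sd=1.0):
    """Threshold whose upper-tail area under the genotype's density
    equals the given affection probability."""
    return NormalDist(genotype_mean, sd).inv_cdf(1.0 - affection_prob)

T = threshold_for(0.05, genotype_mean=0.0)
print(round(T, 3))                   # about 1.645
print(round(penetrance(T, 1.5), 3))  # a genotype whose mean lies 1.5 SD higher
```

A genotype whose mean liability lies closer to the threshold has a larger area to its right and therefore a higher affection probability.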

All the subroutines for discrete data except dmlprsv (section IV.5.4) apply only to dichotomous (unaffected/affected) phenotypes. All the subroutines for discrete data except dmlprpn (section IV.5.1) restrict the genetic model to one locus with two alleles with the second the disease allele. Subroutines dmlprin, dmlprpr and dmlprsv require that the population incidence or prevalence figures be entered into popln.dat (section II.5).

See section D.5 for examples demonstrating the discrete trait parameters. See sections IV.5.1 through IV.5.4 for descriptions of the discrete trait parameters.

IV.5.1. Affection probability

Subroutine dmlprpn (section C.3.2) uses as parameters the affection probability for each genotype. An advantage of this parameterization is that the genetic model can include any number of loci and alleles. A disadvantage of this parameterization is that disease prevalence cannot be restricted to the same value for different models. You can restrict the penetrances to increase across genotypes by editing file model.dat (section B.7) to specify linear constraints on the parameters.

Subroutines dmlprpr (section IV.5.2) or dmlprin (section IV.5.3) are alternatives if the disease is age-dependent. Subroutine dmlprsv (section IV.5.4) can be used to restrict the prevalence.

See section D.5.1 for an example demonstrating the use of dmlprpn.

IV.5.2. Prevalence

Subroutine dmlprpr (section C.3.3) restricts the affection probabilities to correspond to age-specific (and possibly gender-specific) population prevalence figures (given in popln.dat). The subroutine applies only to 1-locus, 2-allele autosomal inheritance. The prevalence figures determine a series of points T_i ordered inversely on the liability scale such that the sum over the three genotypes of the integrals from T_i to ∞ equals the corresponding prevalence. Therefore, the model assumes that earlier onset corresponds to higher liability. For an affected individual with age at examination in interval i, the affection probability for a genotype equals the integral from T_i to ∞ of the normal density for that genotype. For an unaffected individual with age at examination in interval i, the affection probability for a genotype equals the integral from -∞ to T_{i-1} of the normal density for that genotype.

The disease allele is the second of the two alleles. Age at examination must be available, as a separate column in phen.dat (section II.4), for any individual assigned disease status.

The major locus effect is parameterized as dominance d and displacement t. If we represent the mean liability for the three genotypes as µ₁, µ₂, and µ₃ and the variance within genotypes as s² with the total mean and variance restricted to equal 0 and 1, respectively, d = (µ₂ - µ₁)/(µ₃ - µ₁) and t = (µ₃ - µ₁)/s. Note that t differs from the definition of displacement in Morton & MacLean [1974]. The output in pap.out includes the affection probability for each age interval as computed for each genotype.
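The definitions of d and t, together with the restrictions on the total mean and variance, determine the three genotypic means and the within-genotype standard deviation s. The Python sketch below carries out that algebra under the assumption of Hardy-Weinberg genotype frequencies for an allele 1 frequency p. The function name is hypothetical and the code is not taken from PAP.

```python
import math

def liability_means(p, d, t):
    """Genotypic means (mu1, mu2, mu3) and within-genotype SD s for
    allele-1 frequency p, dominance d and displacement t, scaled so the
    total mean is 0 and the total variance is 1."""
    q = 1.0 - p
    freqs = (p * p, 2.0 * p * q, q * q)  # Hardy-Weinberg proportions (assumed)
    # Means measured from mu1 in units of (mu3 - mu1): 0, d, 1.
    offsets = (0.0, d, 1.0)
    mean_offset = sum(f * x for f, x in zip(freqs, offsets))
    var_offset = sum(f * (x - mean_offset) ** 2 for f, x in zip(freqs, offsets))
    # Total variance = (t*s)^2 * var_offset + s^2 = 1 fixes s.
    s = 1.0 / math.sqrt(1.0 + t * t * var_offset)
    means = tuple((x - mean_offset) * t * s for x in offsets)
    return means, s

means, s = liability_means(p=0.9, d=0.5, t=2.0)
print(means, s)
```

Setting d to 0 makes the heterozygote mean equal µ₁, and setting d to 1 makes it equal µ₃. Because the disease allele is the second allele, these correspond to a recessive and a dominant disease allele, respectively.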

For a gender-specific model, use category-specific autosomal inheritance (genotype-assignment code = 5) and include prevalence figures for both males and females in popln.dat. Dominance and displacement are gender-specific.

Subroutine dmlprin (section IV.5.3) is an alternative if onset age is available. Subroutines dmlprpn (section IV.5.1) or dmlprsv (section IV.5.4) could be used if the disease is not age-dependent.

See section D.5.2 for an example demonstrating the use of dmlprpr.

IV.5.3. Incidence

Subroutine dmlprin (section C.3.1) restricts the affection probabilities to correspond to age-specific (and possibly gender-specific) population incidence figures (given in popln.dat). The subroutine applies only to 1-locus, 2-allele, autosomal inheritance. The incidence figures determine a series of points T_i ordered inversely on the liability scale such that the sum over the three genotypes of the integrals from T_i to T_{i-1} equals the corresponding incidence. Therefore, the model assumes that earlier onset corresponds to higher liability. For an affected individual with onset age in interval i, the affection probability for a genotype equals the integral from T_i to T_{i-1} of the normal density for that genotype. For an affected individual with unknown onset age but age at examination in interval i, the affection probability for a genotype equals the integral from T_i to ∞ of the normal density for that genotype. For an unaffected individual with age at examination in interval i, the affection probability for a genotype equals the integral from -∞ to T_{i-1} of the normal density for that genotype.

The disease allele is the second of the two alleles. Onset age or age at examination must be available, as separate columns in phen.dat (section II.4), for any individual assigned disease status.

Subroutine dmlprpr (section IV.5.2) is an alternative if onset age is not available. Subroutine dmlprpn (section IV.5.1) or dmlprsv (section IV.5.4) could be used if the disease is not age-dependent.

See section D.5.3 for an example demonstrating the use of dmlprin.

IV.5.4. Severity Classes

Subroutine dmlprsv (section C.3.4) restricts the affection probabilities to correspond to severity-specific (and possibly gender-specific) population prevalence figures (given in popln.dat). The subroutine applies only to 1-locus, 2-allele, autosomal inheritance. The prevalence figures determine a series of points T_i, i increasing across the liability scale, such that the sum over the three genotypes of the integrals from T_{i-1} to T_i equals the corresponding prevalence. Therefore, the model assumes that greater severity corresponds to higher liability. For an individual affected at severity i, the affection probability for a genotype equals the integral from T_{i-1} to T_i of the normal density for that genotype. For an unaffected individual, the affection probability for a genotype equals the integral from -∞ to T₁ of the normal density for that genotype.

The disease allele is the second of the two alleles. Severity may be coded in any manner. However, the severity categories in popln.dat should be ordered with unaffected first, then disease categories through increasing levels of severity. The corresponding ranges will associate the appropriate severity codes with each category. For an unaffected/affected dichotomy, popln.dat contains two entries, for unaffected and affected.

Subroutines dmlprpr (section IV.5.2) and dmlprin (section IV.5.3) are alternatives for age-dependent diseases, but allow only an unaffected/affected dichotomy.

See section D.5.4 for an example demonstrating the use of dmlprsv.

IV.6. Quantitative Trait Parameters

For a quantitative trait, the phenotypes are assumed to distribute normally within genotypes. The penetrance equals the height of the normal density (including 1/√(2π)) at the phenotype. You can scale, standardize, or power-transform phenotypes by specification in file header.dat (see section II.3).

The effects on the genotypic means of up to five covariates may be estimated for each quantitative trait. Only linear effects are estimated; for cross or squared terms, include the square or product as a variable in file phen.dat and treat as another covariate. This extension allows for genotype-specific age or measured environmental effects.
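As an illustration, the Python sketch below evaluates the penetrance of a quantitative phenotype as the height of a normal density whose mean is the genotypic mean shifted by linear covariate effects. How PAP centers or scales covariates is not described here, so the uncentered form, the function name, and the example values are all assumptions.

```python
from statistics import NormalDist

def quantitative_penetrance(phenotype, genotype_mean, sd, covariates=(), effects=()):
    """Height of the normal density at the phenotype, with the genotypic
    mean shifted by linear covariate effects."""
    if len(covariates) != len(effects):
        raise ValueError("one effect is needed per covariate")
    mean = genotype_mean + sum(b * z for b, z in zip(effects, covariates))
    return NormalDist(mean, sd).pdf(phenotype)

# Hypothetical values: a trait with genotypic mean 200, SD 30, and an age
# effect of 0.6 units per year for an individual aged 50.
print(quantitative_penetrance(230.0, 200.0, 30.0, covariates=(50.0,), effects=(0.6,)))
```

A squared or product term is handled the same way: supply the square or product as an additional covariate value with its own effect.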

Quantitative phenotypes may be power-transformed using the power function (r/P)[(x/r + 1)^P - 1] [MacLean et al 1976], where x represents the phenotype, r equals 6, and P represents the power parameter. Set P = 1 if you don't want any transformation. Otherwise, the phenotypes must be standardized; you may standardize by specification in header.dat (section II.3). Estimating P within a 1-genotype model allows you to determine the transformation to correspond to a single normal density. Estimating P within the environmental model (section IV.2.2) allows you to test for a mixture of distributions. If you then insert the estimate of P into header.dat (section II.3) and run preped (section III.1), further analyses will consider the transformed phenotypes. Estimating P within a genetic model allows transformation of the phenotypes to obtain the best fit to each model.
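The transformation is simple to evaluate directly. The Python sketch below implements the power function with r = 6 as the default. The guard against a nonpositive base and the logarithmic form at P = 0 (the mathematical limit of the expression) are additions for this illustration and are not documented PAP behavior.

```python
import math

def power_transform(x, power, r=6.0):
    """Power transform (r/P) * ((x/r + 1)**P - 1) of a standardized phenotype x."""
    base = x / r + 1.0
    if base <= 0.0:
        raise ValueError("x/r + 1 must be positive")
    if power == 0.0:
        # Limiting form as P -> 0 of the expression below.
        return r * math.log(base)
    return (r / power) * (base ** power - 1.0)

print(power_transform(1.5, 1.0))  # P = 1 leaves the phenotype unchanged: 1.5
print(power_transform(1.5, 0.5))  # P < 1 pulls in the upper tail
```

Values of P below 1 compress the upper tail of the distribution and values above 1 stretch it, which is how estimating P can remove skewness.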

Only subroutine qmlprdd (section IV.6.2) restricts the number of loci to one, the number of alleles to two and the within-genotype standard deviations to equality.

See section D.6 for examples demonstrating the quantitative trait parameters. See sections IV.6.1 through IV.6.3 for descriptions of the quantitative trait parameters.

IV.6.1. Means/Standard Deviations

Subroutine qmlprmv (section C.4.2) parameterizes the model as means µ_i and standard deviations s_i for each genotype i. The number of loci and alleles included in the genetic model are unrestricted.

See section D.6.1 for examples demonstrating the use of qmlprmv.

IV.6.2. Dominance/Displacement

Subroutine qmlprdd (section C.4.1) restricts the genetic model to one locus with two alleles. The parameters comprise the total mean µ_T, total standard deviation s_T, dominance d, and displacement t. If p represents the frequency of allele 1, q = 1 - p, µ_i represent the genotypic means and s_i represent the genotypic standard deviations, i = 1, 2, 3, then µ_T = p²µ₁ + 2pqµ₂ + q²µ₃, s_T² = p²µ₁² + 2pqµ₂² + q²µ₃² + s², d = (µ₂ - µ₁)/(µ₃ - µ₁), and t = (µ₃ - µ₁)/s. Note that t differs from the definition of displacement in Morton & MacLean [1974].

See section D.6.2 for examples demonstrating the use of qmlprdd.

IV.6.3. Means/Standard Deviations/Threshold

Subroutine qmlprmvt (section C.4.3) parameterizes the model as means for each genotype µ_i, standard deviations for each genotype s_i, and threshold T. Parameter T specifies the lower limit of the trait for individuals whose phenotypes have been rendered impossible to obtain because medication or the disease process has altered the level. For example, medicated hypertensives might be included in an analysis of blood pressure by specifying that their level exceeds the diagnostic threshold. Each would be designated a missing value for the quantitative trait in phen.dat, and as affected for a corresponding disease trait. The number of loci and alleles included in the genetic model are unrestricted.
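One plausible reading of this parameterization is that a measured individual contributes the height of the normal density, while an individual known only to exceed T contributes the area of the density above T. The Python sketch below follows that reading. It is an assumption about the form of the penetrance, not code from qmlprmvt, and the function name and values are hypothetical.

```python
from statistics import NormalDist

def trait_penetrance(phenotype, genotype_mean, sd, threshold, above_threshold=False):
    """Density at the phenotype when it is measured; for an individual known
    only to exceed the threshold, the upper-tail area beyond the threshold."""
    dist = NormalDist(genotype_mean, sd)
    if above_threshold:
        return 1.0 - dist.cdf(threshold)
    if phenotype is None:
        return 1.0  # missing and uninformative: contributes no information
    return dist.pdf(phenotype)

# A medicated individual (trait missing, affected for the disease trait),
# under a genotype with mean 120 and SD 15 and a threshold of 140.
print(trait_penetrance(None, 120.0, 15.0, 140.0, above_threshold=True))
```

Under this reading, genotypes with higher means are favored for individuals whose level is known only to exceed the threshold, which is the information such individuals add to the analysis.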

See section D.6.3 for examples demonstrating the use of qmlprmvt.

IV.7. Within-Genotype Parameters

The variation within the normal densities for each genotype may be attributed to polygenes or environmental factors shared by pedigree members, which contribute to the correlation between pedigree members, or to individual-specific environmental factors. No correlations are possible between members of different pedigrees.

Subroutine papwgml (section C.5.3) assumes all variation within genotypes is due to individual-specific environmental factors. There are no associated parameters.

Subroutine papwgvc (section C.5.4) attributes within-genotype variation to variance components: heritability h² and shared environmental factors c_i². The number of shared environmental components is unrestricted; each corresponds to a column in phen.dat (section II.4) designating individuals who share the environment. Parameters h² and c_i² are proportions of the within-genotype variance. Note that h² differs from the definition of heritability in Morton & MacLean [1974].
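Under the usual additive polygenic model, the variance components imply a within-genotype correlation between two pedigree members of twice their kinship coefficient times the heritability, plus each shared-environment component the pair has in common. The Python sketch below assumes that standard form; it is not taken from papwgvc, and the function and argument names are hypothetical.

```python
def within_genotype_correlation(kinship, h2, shared, c2):
    """Correlation between two relatives' within-genotype deviations:
    2 * kinship * h2 plus c2[i] for every environment i the pair shares."""
    if h2 + sum(c2) > 1.0:
        raise ValueError("h2 and the c2 components cannot sum to more than 1")
    return 2.0 * kinship * h2 + sum(c for c, s in zip(c2, shared) if s)

# Full sibs (kinship coefficient 1/4) who share one household environment.
print(within_genotype_correlation(0.25, h2=0.4, shared=(True,), c2=(0.1,)))
```

Because the components are proportions of the within-genotype variance, their sum cannot exceed 1; the remainder is individual-specific environmental variance.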

Subroutine papwgam (section C.5.1) adds assortative mating [Hasstedt 1995] to the variance components model. The added parameter comprises a correlation a² which is constant across the range of the phenotype and contributes to a within-genotype correlation between spouses and a modification of the major locus genotype frequencies. When including assortative mating, the genetic model is restricted to autosomal inheritance of one locus with two alleles, the genotypic means or penetrances must be non-decreasing across the genotypes, and, in linkage analysis, there must be linkage equilibrium. You can restrict the genotypic means or penetrances to increase across genotypes by editing file model.dat (section B.7) to specify linear constraints on the parameters. Program preped (section III.1) generates a different input file when the model includes assortative mating.

Subroutine papwgfc (section C.5.2) parameterizes within-genotype correlations according to genetic relationships. The gender-specific familial correlations comprise husband-wife ρ_hw, mother-daughter ρ_md, mother-son ρ_ms, father-daughter ρ_fd, father-son ρ_fs, sister-sister ρ_ss, sister-brother ρ_sb, and brother-brother ρ_bb. It is possible to restrict ρ_md = ρ_ms = ρ_fd = ρ_fs and ρ_ss = ρ_sb = ρ_bb. Outside the nuclear family, zero correlation is assumed for all relative pairs. Gender must be included in phen.dat (section II.4) for any individual assigned a trait phenotype.

Subroutine papwgml is used with papend/papcr for discrete traits, papenq/papcr for quantitative traits, or papendq/papcr for models including both discrete and quantitative traits. To compute the likelihood exactly, subroutines papwgvc, papwgam, and papwgfc are used with papende/papcrde for discrete traits, papenqe/papcrqe for quantitative traits, or papendqe/papcrdqe for models including both discrete and quantitative traits. Exact computation of the likelihood when the model includes correlations between pedigree members requires summing over the probabilities of all combinations of genotypes for pedigree members; exact calculation requires too much computer time for pedigrees with more than about ten members. Alternatively, papenda/papcrda for discrete traits, papenqa/papcrqa for quantitative traits, or papendqa/papcrdqa for models including both discrete and quantitative traits approximate the likelihood [Hasstedt 1993].

See section D.7 for examples demonstrating the within-genotype parameters.

IV.8. Multivariate Parameters

One marker and any number of traits may be associated with each locus in the genetic model; a trait, but not a marker, may be associated with multiple loci. You specify the locus-variable associations through responses to queries from prepap (section III.3).

There are no multivariate parameters for a marker and a trait associated with the same locus, as in measured genotype analysis [Boerwinkle et al 1986]. For example, you can estimate means for total serum cholesterol level within genotypes for apolipoprotein E defined by electrophoresis. By associating total serum cholesterol with a second locus as well, you can simultaneously account for the LDL receptor defect.

Using the within-genotype subroutines papwgml or papwgfc (section IV.7), the multivariate parameter for each pair of traits is the correlation ρ between the traits. Using the within-genotype subroutine papwgvc (section IV.7), each correlation is partitioned into two: the genetic correlation ρ_g reflects the effect of the same polygenes on both traits; the environmental correlation ρ_e derives from shared environmental factors. Parameters ρ_g and ρ_e reflect correlation between the two traits in different pedigree members in addition to the correlation within a single individual; parameter ρ reflects the correlation only within an individual.

See section D.8 for examples demonstrating the multivariate parameters.

IV.9. Genotype-Assignment Codes

PAP allows seven genotype-assignment codes. The options include: (1) autosomal, (2) X-linked, (3) parent-specific autosomal, (4) parent-specific X-linked, (5) category-specific autosomal, (6) autosomal/X-linked mixed, (7) autosomal/X-linked admixture.

For each option, you may specify any number of loci and any number of alleles per locus. Program prepap (section III.3) presents the order of the genotypes when requesting parameter values. Alternatively, section VI.2 describes the order of the genotypes.

See section D.9 for examples demonstrating genotype-assignment codes. See sections IV.9.1 through IV.9.7 for descriptions of the genotype-assignment codes.

IV.9.1. Autosomal inheritance

This is the standard model for autosomal inheritance.

See section D.9.1 for an example demonstrating autosomal inheritance.

IV.9.2. X-linked inheritance

This is the standard model for X-linked inheritance.

See section D.9.2 for an example demonstrating X-linked inheritance.

IV.9.3. Parent-specific autosomal

This extension of autosomal inheritance distinguishes between heterozygotes according to the parental origin of each allele. This distinction allows the penetrance or transmission probabilities to differ between the two types of heterozygotes. For example, parent-specific autosomal genotype assignments allow you to independently estimate parameters describing age of onset in maternally and paternally transmitted disease.

See section D.9.3 for examples demonstrating parent-specific autosomal inheritance.

IV.9.4. Parent-specific X-linked

This extension of X-linked inheritance distinguishes between heterozygotes according to the parental origin of each allele. This distinction allows the penetrance or transmission probabilities to differ between the two types of heterozygotes.

See section D.9.4 for examples demonstrating parent-specific X-linked inheritance.

IV.9.5. Category-specific autosomal

This extension of autosomal inheritance allows you to classify individuals into two categories and independently specify the penetrance or transmission probabilities for all genotypes within each category. Possible dichotomies for separate parameter specification include males and females, nonsmokers and smokers, and unmedicated individuals and individuals taking medication. Just as genotypes are assigned for females (category 2) before males (category 1), category 2 always precedes category 1 in the genotype order.

The genotype order for category-specific autosomal inheritance equals the order for the autosomal/X-linked mixed model (section IV.9.6).

See section D.9.5 for examples demonstrating category-specific autosomal inheritance.

IV.9.6. Autosomal/X-linked mixed model

The autosomal/X-linked mixed model [Hasstedt & Skolnick 1984] encompasses both autosomal and X-linked inheritance. Therefore, successively comparing the likelihoods of the autosomal and X-linked models to the likelihood of the general model tests the two modes of inheritance. Rejection of both modes of inheritance suggests an intermediate form of transmission which is not interpretable as genetic. Alternatively, the general form of the autosomal/X-linked admixture model (section IV.9.7) more realistically assumes that alleles with each mode of inheritance occur in the sample. However, the admixture model requires more parameters than the mixed model and is not realistic for a single pedigree.

The autosomal/X-linked mixed model includes three genotypes for males. When the parameters correspond to X-linkage, the frequency of the third genotype equals 0. You must define the alleles to specify phenotypic equivalence between the second and third genotypes; a dominant model has the normal allele first; a recessive model has the disease allele first; a codominant model is nonsensical.

Transmission probabilities from a father with genotype 2 distinguish between autosomal and X-linked inheritance. If both the son and daughter allele transmission probabilities equal 1/2, you obtain autosomal inheritance. If the son allele transmission probability equals 1 and the daughter allele transmission probability equals 0, you obtain X-linked inheritance. All the other transmission probabilities assume their Mendelian values.
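The two special cases can be summarized in a short Python sketch that labels a pair of transmission probabilities for a father with genotype 2. The function name, the tolerance, and the label for values matching neither case are assumptions made for this illustration; PAP itself does not classify parameter values this way.

```python
def inheritance_mode(t_son, t_daughter, tol=1e-9):
    """Label the father-to-son and father-to-daughter allele transmission
    probabilities (hypothetical helper, not part of PAP)."""
    if abs(t_son - 0.5) <= tol and abs(t_daughter - 0.5) <= tol:
        return "autosomal"      # both probabilities equal 1/2
    if abs(t_son - 1.0) <= tol and abs(t_daughter) <= tol:
        return "X-linked"       # son 1, daughter 0
    return "intermediate"       # matches neither submodel


print(inheritance_mode(0.5, 0.5))
print(inheritance_mode(1.0, 0.0))
print(inheritance_mode(0.8, 0.3))
```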

Frequency subroutine papfqax assumes Hardy-Weinberg equilibrium for females and generational equilibrium for males. The frequencies in males of the three genotypes equal:

(1) p^2 t/[q + (p - q)t],

(2) pq/[q + (p - q)t],

(3) q^2 (1 - t)/[q + (p - q)t],

where p and q = 1 - p represent the frequencies of the two alleles and t represents the son's allele transmission probability. If t = 1/2, you obtain autosomal frequencies; if t = 1, you obtain X-linked frequencies. Subroutine papfqax restricts the model to 1 locus with 2 alleles.
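The two limiting cases can be checked numerically. The following Python sketch evaluates the three male genotype frequencies, assuming p and q = 1 - p are the frequencies of the two alleles; the function name and the example value p = 0.3 are assumptions for illustration and are not taken from papfqax.

```python
def male_genotype_frequencies(p, t):
    """Frequencies of the three male genotypes in the autosomal/X-linked
    mixed model, for allele frequency p and son's allele transmission
    probability t (illustrative sketch, not PAP code)."""
    q = 1.0 - p
    denom = q + (p - q) * t
    if denom == 0.0:
        raise ValueError("frequencies are undefined when q + (p - q)t = 0")
    return (p * p * t / denom,
            p * q / denom,
            q * q * (1.0 - t) / denom)


# t = 1/2 gives Hardy-Weinberg proportions p^2, 2pq, q^2.
print(male_genotype_frequencies(0.3, 0.5))
# t = 1 gives X-linked proportions p, q, 0.
print(male_genotype_frequencies(0.3, 1.0))
```

For any t, the three frequencies sum to 1, because p^2 t + pq + q^2 (1 - t) = q + (p - q)t when p + q = 1.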

See section D.9.6 for examples demonstrating the autosomal/X-linked mixed model.

IV.9.7. Autosomal/X-linked admixture

The autosomal/X-linked admixture model encompasses both autosomal and X-linked inheritance. Therefore, successively comparing the likelihoods of the autosomal and X-linked models to the likelihood of the general model tests the two modes of inheritance. Rejection of both submodels supports heterogeneity in the mode of inheritance. Alternatively, the autosomal/X-linked mixed model (section IV.9.6) requires fewer parameters for testing the alternative modes of inheritance, but has an unrealistic general model.
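The comparison of each submodel to the general model can be sketched as a likelihood-ratio statistic computed from -2 ln L values, such as the natural logarithms multiplied by -2 listed in pap.out (section IV.1.1). The function name and all numeric values below are made up for illustration; the degrees of freedom and significance threshold depend on the parameters restricted in each submodel and are not computed here.

```python
def likelihood_ratio_statistic(minus2lnl_submodel, minus2lnl_general):
    """Difference in -2 ln L between a restricted submodel and the
    general model that contains it (hypothetical helper, not PAP code)."""
    return minus2lnl_submodel - minus2lnl_general


# Made-up -2 ln L values for illustration only.
general = 405.2
autosomal = 412.6
x_linked = 406.0

print(round(likelihood_ratio_statistic(autosomal, general), 1))
print(round(likelihood_ratio_statistic(x_linked, general), 1))
```

A large statistic rejects the corresponding submodel; a negative statistic would indicate that the general model had not reached its maximum, since a nested submodel cannot fit better than the model that contains it.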

An autosomal/X-linked admixture model requires a minimum of two loci, one autosomal and one X-linked. The submodels, autosomal or X-linked inheritance, restrict locus 1 or 2, respectively, to 1 allele. Additional loci may represent markers linked to the autosomal or X-linked form.

See section D.9.7 for examples demonstrating autosomal/X-linked admixture.