Changes between Version 40 and Version 41 of ImputationPipeline

Dec 6, 2011 1:03:11 PM (7 years ago)



  • ImputationPipeline

    v40 v41  
    52 52 * chromosome : The chromosome of this study
    53 53 * r2_threshold : The R2 threshold
       55 === Statistics_of_imputation_results ===
     56 * Location:
     57 Computes several statistics of imputation results. This is useful when "real" genotype data are available to benchmark the imputation pipeline. The computed statistics are:
     58 * Allelic R2 : according to
     59 * Real_Allelic_R2 : Computes the R2 (or coefficient of determination) between a real and an imputed genotype.
     60 * Imputation_Allele_Frequency and Standardized_allele_frequency_error : (From: Allele-frequency error is the difference between the true allele frequency in the sample and the estimated allele frequency in the sample computed from the posterior genotype probabilities. If the three posterior genotype probabilities for an individual are denoted pAA, pAB, and pBB, then the estimated A allele frequency is found by summing (2pAA + pAB) over all individuals and dividing by twice the number of individuals. However, allele-frequency error is difficult to interpret unless the true allele frequency and sample size are known. The standardized allele-frequency error is therefore defined as abs(p - q) / sqrt((p * (1 - p)) / (2 * n)), where p is the allele frequency in the sample of n individuals from a population in Hardy-Weinberg equilibrium and q is the estimated allele frequency obtained from the imputed posterior genotype probabilities.)
     63 * input_beagle_dosage_filename : The output of the beagle imputation
     64 * input_beagle_unimputed_filename : The beagle file with the "real", un-imputed genotypes
     65 * output_filename : Output filename for the stats
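
The statistics described in this section can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above, not the pipeline's actual code: the function names are hypothetical, the posterior probabilities are made-up example values, and `r_squared` assumes that the R2 between real and imputed genotypes is the squared Pearson correlation of allele dosages, which the page does not state explicitly.

```python
import math


def estimated_allele_frequency(posteriors):
    """Estimated A allele frequency from (pAA, pAB, pBB) triples, one per individual.

    Sums (2*pAA + pAB) over all individuals and divides by twice their number.
    """
    n = len(posteriors)
    return sum(2 * p_aa + p_ab for p_aa, p_ab, _p_bb in posteriors) / (2 * n)


def standardized_allele_frequency_error(p, q, n):
    """abs(p - q) / sqrt((p * (1 - p)) / (2 * n)).

    p: true allele frequency in the sample, q: estimated frequency, n: sample size.
    """
    if not 0 < p < 1:
        # The denominator is zero for a monomorphic marker.
        raise ValueError("p must lie strictly between 0 and 1")
    return abs(p - q) / math.sqrt((p * (1 - p)) / (2 * n))


def r_squared(real, imputed):
    """Squared Pearson correlation between real and imputed dosages (assumption)."""
    n = len(real)
    mean_r = sum(real) / n
    mean_i = sum(imputed) / n
    cov = sum((r - mean_r) * (i - mean_i) for r, i in zip(real, imputed))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_r == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov * cov / (var_r * var_i)


# Made-up posterior probabilities for four individuals.
posteriors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0)]
q = estimated_allele_frequency(posteriors)
error = standardized_allele_frequency_error(p=0.5, q=q, n=len(posteriors))
print(q, error)
```

Standardizing by sqrt((p * (1 - p)) / (2 * n)) expresses the error in units of the binomial sampling standard deviation of the allele frequency, which is what makes values comparable across markers with different frequencies and sample sizes.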
    54 66 == Complete pipelines ==
    55 67 == Results ==