Abstract
A genome-wide association study (GWAS) can be conducted to systematically analyze the contributions of genetic factors to a wide variety of complex diseases. However, existing GWASs have provided highly ethnicity-specific data. Accordingly, to provide data specific to Taiwan, we established a large-scale genetic database at a single medical institution, the China Medical University Hospital. Because of current technological limitations, microarray analysis can detect only a limited number of single-nucleotide polymorphisms (SNPs) with a minor allele frequency of >1%. Imputation offers a useful means of expanding these data. In this study, we compared four imputation algorithms in terms of various metrics. Among the compared algorithms, Beagle5.2 achieved the fastest calculation speed, the smallest storage space, the highest specificity, and the highest number of high-quality variants. Using Beagle5.2, we obtained 15,277,414 high-quality variants in 175,871 people. In our internal verification process, Beagle5.2 exhibited an accuracy rate of up to 98.75%. We also conducted external verification, in which our imputed variants had a 79.91% mapping rate and 90.41% accuracy. These results will be combined with clinical data in future research. We have made the results available for researchers to use in formulating imputation algorithms, in addition to establishing a complete SNP database for GWAS and polygenic risk score (PRS) researchers. We believe that these data can help improve overall medical capabilities, particularly precision medicine, in Taiwan.
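The verification metrics named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "mapping rate" is taken here to mean the fraction of imputed variants that also appear in a reference call set (for example, whole-genome sequencing of the same sample), and "accuracy" is taken to mean the fraction of those shared variants whose genotype calls agree. The function names, the variant key layout, and the toy genotypes are all hypothetical and are not the study's pipeline or data.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]
# Alt-allele dosage (0, 1, or 2) per variant for a single diploid sample.
Genotypes = Dict[Variant, int]


def minor_allele_frequency(dosages: List[int]) -> float:
    """Return the MAF given alt-allele dosages (0/1/2) across diploid samples."""
    if not dosages:
        raise ValueError("no genotypes supplied")
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def mapping_rate_and_accuracy(imputed: Genotypes, reference: Genotypes) -> Tuple[float, float]:
    """Compare imputed calls with reference calls for one sample.

    Assumed definitions (not taken from the paper):
      mapping rate = shared variants / imputed variants
      accuracy     = concordant genotypes / shared variants
    """
    shared = [v for v in imputed if v in reference]
    mapping_rate = len(shared) / len(imputed) if imputed else 0.0
    matches = sum(1 for v in shared if imputed[v] == reference[v])
    accuracy = matches / len(shared) if shared else 0.0
    return mapping_rate, accuracy


# Toy genotypes, invented for illustration only.
imputed = {
    ("1", 100, "A", "G"): 0,
    ("1", 200, "C", "T"): 1,
    ("1", 300, "G", "A"): 2,
    ("1", 400, "T", "C"): 1,  # absent from the reference set
    ("2", 150, "A", "C"): 0,
}
reference = {
    ("1", 100, "A", "G"): 0,
    ("1", 200, "C", "T"): 1,
    ("1", 300, "G", "A"): 1,  # discordant call
    ("2", 150, "A", "C"): 0,
}

rate, acc = mapping_rate_and_accuracy(imputed, reference)
print(f"mapping rate = {rate:.2%}, accuracy = {acc:.2%}")
print(f"MAF = {minor_allele_frequency([0, 1, 2, 0]):.3f}")
```

In practice such a comparison would run over genotype files for many samples and would likely be stratified by minor allele frequency, since imputation quality typically drops for rarer variants.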
Recommended Citation
Liu, Ting-Yuan; Lin, Chih-Fan; Wu, Hsing-Tsung; Wu, Ya-Lun; Chen, Yu-Chia; Liao, Chi-Chou; Chou, Yu-Pao; Chao, Dysan; Lu, Hsing-Fang; Chang, Ya-Sian; Chang, Jan-Gowth; Hsu, Kai-Cheng; and Tsai, Fuu-Jen (2021) "Comparison of Multiple Imputation Algorithms and Verification Using Whole-Genome Sequencing in the CMUH Genetic Biobank," BioMedicine: Vol. 11: Iss. 4, Article 7.
DOI: 10.37796/2211-8039.1302
Fig1
wgs_impute_accuracies_09.png (121 kB): Fig2A
imputed_variants.png (70 kB): Fig2B
chip_maf.png (52 kB): Fig3A
wgs_maf.png (39 kB): Fig3B
imputed_maf.png (39 kB): Fig3C
beagle_maf_300.png (58 kB): Fig4A
R2.png (77 kB): Fig4B
Fig5.tif (30 kB): Fig5
table1.docx (28 kB): Table1
title page.docx (27 kB): Title page
Abstract.docx (27 kB): Abstract
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Included in
Bioinformatics Commons, Medical Sciences Commons, Molecular Genetics Commons, Numerical Analysis and Scientific Computing Commons, Theory and Algorithms Commons