GWAS Study

Efficient genome-wide association in biobanks using topic modeling identifies multiple novel disease loci.

McCoy TH, Castro VM, Snapper LA et al.

PubMed ID: 28861588
Study Type: GWAS
Participants: 10,845
Chapter I

Publication Details

Comprehensive information about this research publication

Authors

McCoy TH
Castro VM
Snapper LA
Hart KL
Perlis RH
Chapter II

Abstract

Summary of the research findings

Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that may be unreliable and fail to capture the relationship between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records (EHR) for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes was included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p<1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than for single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of phenome-wide association in biobanks with coded clinical data.

10,845 European ancestry individuals
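
The two-step approach summarized in the abstract (fit disease topics from diagnostic codes, then test each topic against genotype) can be illustrated with a toy sketch. This is not the authors' pipeline: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the helper names `fit_lda` and `ols_slope`, the made-up patient data, and the use of an unadjusted least-squares slope as the association statistic are all assumptions chosen for brevity. The study fit 50 topics; the toy uses 2.

```python
import random

def fit_lda(docs, n_topics, n_codes, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for latent Dirichlet allocation.

    docs: one list of diagnostic-code indices per patient (hypothetical input format).
    Returns, per patient, a list of topic proportions that sums to 1.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per patient
    topic_code = [[0] * n_codes for _ in range(n_topics)]  # code counts per topic
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every code occurrence.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_code[k][w] += 1
            topic_total[k] += 1
        assign.append(z)
    # Resample each assignment conditional on all the others.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_code[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_code[t][w] + beta)
                    / (topic_total[t] + n_codes * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_code[k][w] += 1
                topic_total[k] += 1
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x (no covariates)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up data: 6 patients, 4 diagnostic codes, one variant's allele dosage.
patients = [[0, 1, 0, 1], [0, 0, 1], [1, 0, 1, 1], [2, 3, 3], [3, 2, 2, 3], [2, 3, 2]]
dosage = [0, 0, 1, 1, 2, 2]

theta = fit_lda(patients, n_topics=2, n_codes=4)
# One association statistic per topic: topic loading regressed on dosage.
betas = [ols_slope(dosage, [row[k] for row in theta]) for k in range(2)]
```

Because each patient's topic loadings are continuous and draw on every code the patient carries, the phenotype in the second step is quantitative rather than a single yes/no diagnosis. A real analysis would add covariates such as ancestry components, use a dedicated association tool, and repeat the test across all common variants.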

Chapter III

Study Statistics

Key metrics and study information

Total Participants: 10,845
Study Type: GWAS
Replicated: No
Ancestry: European
Chapter IV

Analysis

Comprehensive review of health and genetic findings

Important Disclaimer: This review has been performed semi-automatically and is provided for informational purposes only. While we strive for accuracy, this analysis may contain errors, omissions, or misinterpretations of the original research. DNA Genics disclaims all liability for any inaccuracies, errors, or consequences arising from the use of this information. Users should independently verify all information and consult original research publications before making any decisions based on this content. This analysis is not intended as a substitute for professional scientific review or medical advice.

Analysis In Progress

Our analysis of this publication is currently being prepared. Please check back soon for comprehensive insights into the health and genetic findings discussed in this research.