Objective

This project will develop a machine learning solution for building effective and robust genetic risk score (GRS) models that predict accurately across multiple ethnic groups. If successful, this project will provide the modeling and computational means that T1D researchers need to build better GRS models for population-level use.

Background Rationale

As genotyping technologies have advanced, data quality has increased markedly while cost has fallen. These relatively inexpensive, non-invasive genotyping technologies provide a scalable means for future population screening efforts in general healthcare settings, and our knowledge of T1D genetic risk continues to grow. A number of genetic risk score (GRS) models have been developed in the literature, demonstrating this promise. They also expose an important gap: a scientific foundation is still lacking for building a GRS model that robustly achieves accurate performance across ethnic groups. Evidence in the literature has shown that current state-of-the-art GRS models perform unevenly across ethnic groups.

Description of Project

As genotyping technologies have advanced, data quality has increased markedly while cost has fallen. These relatively inexpensive, non-invasive genotyping technologies provide a scalable means for future population screening efforts in general healthcare settings. A number of genetic risk score (GRS) models have been developed in the literature, demonstrating this promise, but evidence in the literature has also shown that current state-of-the-art GRS models perform unevenly across ethnic groups. Our currently available samples are mostly from populations of European descent, although the prevalence of T1D is also high in other ethnic groups. T1D researchers are therefore caught in a dilemma. One option is to build a single GRS model from the data of all groups, which emphasizes what the groups have in common but not their group-specific traits and risk factors. The other is to build customized GRS models for each ethnic context, where small sample sizes would lead to inaccurate models with many false positives and poor replication. In this project, we aim to provide a middle ground that blends these two approaches by developing a machine learning solution that leverages state-of-the-art research concepts and results in AI and machine learning. The project is built on an interdisciplinary research collaboration among experts in statistics, machine learning, AI, bioinformatics, genetics, and T1D research. If successful, it will provide the modeling and computational means that T1D researchers need to build accurate and equitable GRS models for population-level use.

Anticipated Outcome

The deliverables will be 1) a set of integrated algorithms and implementation code that the T1D research community can use to analyze their datasets; 2) tutorials and workshop presentations to help T1D researchers adopt the methods; 3) discovery of complex interactions (identified by the rule-based method) involving environmental features and genes, which will support mechanistic hypothesis generation and help the T1D research community further enhance their GRS models by including these interaction terms.

Relevance to T1D

To date, both diabetes researchers and clinicians have relied on inadequate criteria for classifying diabetes, especially in pediatric and young adult patients. Current criteria are not sensitive or specific enough to correctly classify the type of diabetes at the time of onset, which leads to incorrect clinical management of the disease. Current GRS models are limited in how they capture race/ethnicity-specific features and gene-gene synergy. It is known that the diabetes phenotype is influenced by genetics, age, sex, family history, environment, and race/ethnicity, and all of these factors also influence progression. In both the short and long term, this project will yield a robust GRS that allows more accurate clinical classification and better management of each diabetes type across races/ethnicities, which would result in a lower risk of acute and chronic diabetic complications. Furthermore, incorporating gene-gene synergies in the GRS model can help researchers better identify environmental and lifestyle triggers.