Expanded data mining of accumulated functional genomics data for identifying mechanistic biomarkers informative of T1D risk and prevention

Objective

This project's overarching objective is to expand data analysis efforts - called "data mining" - of functional genomics data that have already been collected to identify new biomarkers that will inform type 1 diabetes (T1D) risk and prevention, with a particular focus on pregnancy and early life. Leveraging existing proteomics, lipidomics, and genomics datasets from the ENDIA NCC Study, this project aims to deliver on the following specific objectives:

1. Link Genetic Variation to Functional Consequences: This project will establish a connection between genes and genetic risk scores, proteins, and lipids, shedding light on the mechanisms underpinning T1D pathogenesis. By doing so, this project aims to bridge the gap between prediction and causation in the context of T1D risk.

2. Exploit Existing Datasets: Capitalising on the significant past investments made by organisations including JDRF Australia and the Helmsley Charitable Trust in acquiring proteomics, lipidomics, and genomics data, this project aims to employ advanced data mining techniques to expand the analysis of these existing datasets in novel ways, including through techniques such as machine learning.

3. Enhance Functional Genomics Capability: This project seeks to enhance Australia's functional genomics capability by linking Australian groups with expertise in proteomics and lipidomics with clinical researchers who have demonstrated translational capacity. The project also aims to support junior researchers in conducting key informatics analyses. This will boost our capacity to undertake world-leading research in Australia, and deliver better outcomes for Australians living with or at risk of T1D.

4. Advance T1D Knowledge and Care: The ultimate goal is to advance knowledge in the field of T1D and contribute to transforming T1D care in two significant ways. First, by analysing existing proteomics, lipidomics and genomics data in depth, this project aims to discover new biomarkers of islet autoimmunity (IA) risk that can complement or improve current risk-prediction methods. Second, by linking genetic variation to functional consequences in the proteome and lipidome, the project may reveal new opportunities for treating, delaying, and preventing T1D.

Background Rationale

Proteins and T1D:
Proteins play a vital role in many aspects of T1D. Proteomics, the study of proteins, has revealed crucial insights into how the insulin-producing beta cells are destroyed by the immune system in T1D. Recently, the TEDDY Study Group used proteomics to identify proteins in the blood of children that could serve as predictive biomarkers of T1D. However, a limitation of this important study was that its discoveries were not confirmed in an independent group of individuals. Our project aims to fill this gap. Moreover, ENDIA collects samples from both mothers and newborns, adding a unique and highly relevant dimension to our project.

The Lipid Connection:
Unlike proteins, lipids are not directly determined by our genetic code, and their levels are influenced by a combination of our biology and the environment. Several studies have explored the lipid composition in the blood of individuals before the onset of islet autoimmunity and T1D. These investigations are based on the hypothesis that lipids may play a role in triggering the autoimmune response that leads to T1D. Dysregulation of lipids in T1D could either be a reaction to other factors or, intriguingly, lipids might themselves contribute to the inflammation that triggers and intensifies the immune response, ultimately causing the destruction of beta cells.

The Study Design:
ENDIA is a comprehensive Australian study that has followed expectant mothers and their children who have a close family member with T1D. It aims to pinpoint the environmental factors during pregnancy and early childhood that increase a child's risk of developing T1D. The project began in 2013 and by the end of 2019, 54 children had developed persistent islet autoimmunity or progressed to clinical T1D. These children, referred to as cases, were matched to 161 children in the ENDIA cohort who didn't have islet autoimmunity (i.e., controls). The cases and controls represented 190 unique mother-infant pairs.

Plasma Samples:
Blood samples were taken from both mothers and infants, and the plasma (the liquid portion of blood) was separated and stored. In total, there were 931 unique plasma samples stored in the ENDIA Biobank from the 190 mother-infant pairs, spanning pregnancy, birth, and infancy. Researchers at WEHI and the Baker Institute in Melbourne analysed these samples for their proteins and lipids, respectively, in 2022/2023.

In summary, this study is part of a larger effort to unravel the mysteries of T1D by examining the roles of genes, proteins and lipids in children at risk of T1D due to their family history. By thoroughly studying a wide range of samples and using advanced statistical methods, this research hopes to provide critical insights into the early triggers of T1D, potentially paving the way for preventive strategies and more effective treatments in the future.

Description of Project

Summary:
In the era of functional genomics and precision medicine, understanding the complex biology behind diseases like type 1 diabetes (T1D) is crucial for improving diagnosis and developing effective prevention strategies. This proposal outlines an ambitious project aimed at identifying "mechanistic biomarkers" that can predict T1D risk and will inform strategies for primary prevention.

Significance:
The primary focus of this project is to dig deeper into proteomics, lipidomics and genomics datasets that have already been generated from participants in the Environmental Determinants of Islet Autoimmunity (ENDIA) Nested Case-Control (NCC) study. By linking genetic variation with the functional proteome and lipidome, researchers hope to gain a more comprehensive understanding of the biology that underlies a person's risk of developing T1D.

The first molecular "red flags" signalling the onset of T1D are circulating islet autoantibodies in the bloodstream, which are frequently detected in the first years of life in individuals destined to develop T1D. This suggests that the mechanisms triggering and progressing T1D are encountered in pregnancy and/or early infancy. The ENDIA NCC is the first study in the world to collect samples from genetically at-risk children who are progressing towards T1D, with biospecimen collection spanning pregnancy, birth, and early childhood prior to and at the emergence of islet autoimmunity (IA).

The key strengths of this proposal include (i) its concept of connecting genetics to the functional proteome and lipidome, (ii) the robust ENDIA cohort, (iii) existing proteomics, lipidomics, and genomics datasets, (iv) a capable research team, and (v) the potential to advance T1D knowledge by discovering new risk biomarkers and avenues for therapeutic intervention. The project employs rigorous statistical methods to ensure data reliability and power in detecting significant associations.

Data Availability:
The project benefits from substantial datasets that include proteomics, lipidomics, and genomics data from 931 longitudinal samples representing 190 unique mother-infant pairs. These datasets have been acquired through significant past investments from JDRF Australia, The Helmsley Charitable Trust, and Diabetes SA.

Research Design and Methods:
The ENDIA NCC study comprises 54 children who have progressed to IA matched to antibody-negative controls of the same age and assigned sex at birth. Plasma was collected longitudinally across pregnancy, birth (cord blood), and childhood culminating in 931 samples, which were subjected to proteomics and lipidomics analyses in 2022/2023. The overall project has four phases: data acquisition (completed), data processing, data mining, and validation. Researchers will analyse existing data, process it for further analysis, apply various statistical and machine learning techniques to identify associations between proteins, lipids, and IA risk, and validate their findings through targeted assays.

Project Team:
The project team is multidisciplinary, comprising experts in molecular biology, clinical medicine, biostatistics/bioinformatics, and omics technologies. Their collective expertise ensures the project's success, with a clear goal of translating discoveries into clinical practice.

In conclusion, this project is poised to unravel critical mechanistic biomarkers for T1D risk and prevention by integrating and mining extensive omics data. The project has the potential to transform T1D care and contribute to our understanding of this complex condition.

Anticipated Outcome

In this ambitious project, researchers from the University of Adelaide, WEHI and the Baker Institute are exploring critical connections between genes, proteins, and lipids in relation to the risk of islet autoimmunity (IA) preceding type 1 diabetes (T1D). To achieve this, the project is divided into several phases, each designed to bring us closer to a comprehensive understanding of these intricate relationships.

Phase 1: Data Acquisition
The first phase, which was successfully completed in 2022/2023, involved acquiring proteomics and lipidomics data via mass spectrometry from 931 plasma samples representing 190 unique mother-infant pairs, and genomics data via the ImmunoChip genome array. Preliminary results suggest that there are numerous proteins and lipids worth investigating further, setting the stage for the project's next phases.

Phase 2: Data Processing
The project is now advancing to Phase 2, where the data collected will undergo a rigorous process of filtering, normalisation, and imputation. These steps are crucial to ensure that any associations that we find between proteins, lipids and genes reflect the underlying biology and not just the experimental design. During this phase, the Genetic Risk Score (GRS2) will be calculated for both mothers and infants from the ImmunoChip data, providing a critical genetic context for the subsequent analysis.
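
As an illustration only, the three processing steps named above (filtering, normalisation, and imputation) could be sketched as follows. This is a minimal sketch and not the project's actual pipeline: the function name, the 50% missingness threshold, the per-sample median normalisation, and the minimum-value imputation are all assumptions chosen for simplicity.

```python
import math
from statistics import median


def process(matrix, max_missing=0.5):
    """Filter, normalise and impute a samples-by-features intensity matrix.

    matrix: list of samples, each a list of raw intensities (> 0),
            with None where a protein or lipid was not detected.
    max_missing: hypothetical threshold; features missing in a larger
            fraction of samples are dropped.
    Assumes every sample retains at least one observed value.
    Returns (kept_feature_indices, processed_matrix).
    """
    n_samples = len(matrix)
    n_features = len(matrix[0])

    # 1. Filtering: keep features detected in enough samples.
    kept = [
        j for j in range(n_features)
        if sum(row[j] is None for row in matrix) / n_samples <= max_missing
    ]

    # 2. Log2-transform the observed values of the kept features.
    logged = [
        [math.log2(row[j]) if row[j] is not None else None for j in kept]
        for row in matrix
    ]

    # 3. Normalisation: subtract each sample's median log-intensity
    #    so that samples are on a comparable scale.
    processed = []
    for row in logged:
        centre = median(v for v in row if v is not None)
        processed.append([v - centre if v is not None else None for v in row])

    # 4. Imputation: fill gaps with the feature's lowest observed value
    #    (assumes values are missing because they fell below detection).
    for k in range(len(kept)):
        floor = min(row[k] for row in processed if row[k] is not None)
        for row in processed:
            if row[k] is None:
                row[k] = floor

    return kept, processed


# Hypothetical toy data: 3 samples x 3 features.
raw = [[8.0, 4.0, None], [16.0, None, None], [32.0, 16.0, 2.0]]
kept, processed = process(raw)
```

In this toy example the third feature is dropped because it is missing in two of three samples, and the single gap in the second feature is filled with that feature's lowest normalised value.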

Phase 3: Data Mining
This phase constitutes the detective work. Bioinformaticians will apply a range of statistical and machine learning techniques to identify associations between genes, proteins, lipids, and IA risk in the ENDIA mothers and children. We will assess the relative abundance of specific proteins and lipids, comparing cases (children with IA) with controls (children without IA). Any associations found would suggest that these molecules may be predictive of future T1D risk. Machine learning algorithms will also be applied to discover patterns within the data that could implicate particular biological pathways in the pathogenesis of T1D, which could in turn identify new targets for prevention.
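
To make the case-versus-control comparison concrete, the sketch below tests whether the mean abundance of a single protein or lipid differs between the two groups using a permutation test. The choice of test, the function name, and the example values are assumptions for illustration. In particular, this simple version ignores the matching between cases and controls, which the project's own analysis may need to account for.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in mean abundance.

    cases, controls: processed (e.g. log2) abundances of one feature.
    Returns the estimated probability of seeing a difference at least
    as large as the observed one if group labels were assigned at random.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical abundances for one feature.
case_values = [1.9, 2.1, 2.4, 2.0, 2.2]
control_values = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
p = permutation_p_value(case_values, control_values)
```

Because many proteins and lipids would be tested in this way, the resulting p-values would normally need to be adjusted for multiple comparisons before any feature is declared associated with IA.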

Phase 4: Validation
Once potential associations are identified in Phase 3, they will be validated in Phase 4. This will involve analysing samples from additional children who developed IA after 2020 and their matched controls. This independent dataset will be used to test the machine learning models trained in Phase 3, helping to ensure that the patterns and associations identified in the first dataset hold when applied to a new group. If the machine learning models don't perform as expected, they will be refined based on the new data.
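
One common way to check whether a model trained on one group holds up in another is to compute the area under the ROC curve (AUC) from the risk scores it assigns to new cases and controls. The sketch below assumes this metric purely for illustration, since the proposal does not specify how model performance will be measured. The function name and the scores are hypothetical.

```python
def roc_auc(scores_cases, scores_controls):
    """Area under the ROC curve from model risk scores.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control (ties count as half).
    0.5 means no better than chance; 1.0 means perfect separation.
    """
    wins = 0.0
    for case_score in scores_cases:
        for control_score in scores_controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical scores from a Phase 3 model applied to validation samples.
validation_auc = roc_auc([0.9, 0.4], [0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

An AUC in the validation group that is much lower than in the original group would suggest the model had fitted noise in the first dataset and needs refinement.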

Anticipated Outcomes:
The ultimate goal is to discover significant links between proteins, lipids, genetics, and the risk of developing IA. By understanding these associations, we hope to identify potential biomarkers and pathways that could be crucial in the pathophysiology of T1D. In simpler terms, we want to find clues that help explain why and how T1D develops and, ultimately, to develop new ways to prevent the condition.

Relevance to T1D

The relevance of this for people living with or at risk of T1D is multifaceted and has the potential to positively impact their lives in several ways:

Improved Understanding of T1D Pathophysiology:
The project aims to discover new associations between proteins, lipids, genetics, and future T1D risk. This deeper understanding of the underlying mechanisms of T1D will lead to new insights into how T1D progresses and may inform better strategies for reducing T1D complications if novel biological pathways are implicated and can be therapeutically targeted.

Identification of Biomarkers:
If the project identifies specific proteins, lipids, or genetic factors that are strongly linked to T1D development, these could potentially serve as biomarkers. Biomarkers are essential for risk prediction, early diagnosis, and monitoring of disease progression (for example, identifying new endpoints for interventional trials aimed at preventing or delaying T1D).

Potential for Prevention Strategies:
By identifying factors that contribute to T1D risk, the project could pave the way for new preventative strategies, particularly for individuals who are already at increased risk of developing T1D because a close relative has the condition.

Capacity Building and Community Awareness:
This project brings together the University of Adelaide and Baker Institute research groups, both of which have an established track record in the T1D field, with the WEHI Proteomics Facility, which is new to T1D research. As the facility is one of Australia's most advanced proteomics centres, we hope its association with T1D research will continue and expand. Multicentre research projects such as this also increase awareness of T1D and the importance of research in the broader community. This heightened awareness can lead to increased funding for research, better support for people with T1D, and more advocacy for policies that benefit those with the condition.