Objective
The overall objective of this proposal is to use three large observational clinical databases to: (i) quantify the 20-year trends in the proportion of patients with T1D who present with DKA; (ii) identify the patient, familial, and socioeconomic factors associated with DKA at diagnosis; and (iii) develop predictive models to identify children at the highest risk of DKA at diagnosis. We will use rich, longitudinal, population-based clinical data from 2000 to 2020, sourced from the Medicaid (US), MarketScan (US), and CPRD (UK) databases. The validity of these data will be enhanced through additional linkages with socioeconomic and healthcare access and quality variables. Together, these three databases are estimated to contain >200,000 children and young adults ≤21 years old newly diagnosed with T1D in the US and UK between 2000 and 2020, making this one of the largest T1D cohorts of its kind, with high generalizability.
Background Rationale
Diabetic ketoacidosis (DKA) is an acute complication of type 1 diabetes (T1D) and is the leading cause of hospitalization, morbidity, and mortality among children and younger adults with T1D in the US, the UK, and worldwide. DKA is also a common presentation among patients newly diagnosed with T1D, accounting for 30% to 40% of new cases in the US and UK and 15% to 67% of cases worldwide.
However, because of limitations in the available data, it remains unclear whether patients presenting with DKA at diagnosis of T1D differ from those presenting without DKA, or whether such patients can be identified before they progress to DKA. The utility of existing studies is limited by their cross-sectional or descriptive designs; their single-center settings, which restrict both sample size and generalizability; their focus on broad categories of risk factors such as age or gender; and their lack of longitudinal clinical data (i.e., they focus only on the period immediately prior to DKA rather than the weeks or months preceding it). Further, no prior study has attempted to use patient-level risk factors to predict and identify children and younger adults who may be at higher risk of DKA at diagnosis.
The scientific rationale underpinning our proposal is as follows: (a) children presenting with DKA at diagnosis differ systematically from their counterparts without DKA with respect to patient-level clinical and non-clinical factors, and such factors are ascertainable in large observational healthcare databases; (b) although DKA has an acute onset, the physiological changes leading up to the development of T1D with DKA manifest in the weeks or months prior to DKA; (c) conversely, patients who present in DKA may also have a higher prevalence of medical or pharmaceutical precipitants that acutely trigger the onset of DKA; (d) in younger children, the hallmark signs of hyperglycemia (e.g., polydipsia) are difficult to ascertain, and T1D may present in a myriad of other ways (e.g., perineal candidiasis in infants); and (e) the differences in the prevalence of DKA at presentation can be at least partially attributed to varying degrees of awareness of T1D among healthcare providers and caregivers, as well as to socioeconomic vulnerability and a lack of access to healthcare providers and resources.
Description of Project
Diabetic ketoacidosis (DKA) is an acute complication of type 1 diabetes (T1D) and is the leading cause of hospitalization, morbidity, and mortality among children and younger adults with T1D in the US, the UK, and worldwide. DKA is also a common presentation among patients newly diagnosed with T1D, accounting for 30% to 40% of new cases in the US and UK and 15% to 67% of cases worldwide. However, because of limitations in the available data, it remains unclear whether patients presenting with DKA at diagnosis of T1D differ from those presenting without DKA, or whether such patients can be identified earlier.
The overall objective of this proposal is to use three large observational clinical databases to: (i) quantify the 20-year trends in the proportion of patients with T1D who present with DKA; (ii) identify the patient, familial, and socioeconomic factors associated with DKA at diagnosis; and (iii) develop predictive models to identify children at the highest risk of DKA at diagnosis. We will use rich, longitudinal, population-based clinical data from 2000 to 2020, sourced from the Medicaid (US), MarketScan (US), and CPRD (UK) databases. The validity of these data will be enhanced through additional linkages with socioeconomic and healthcare access and quality variables. Together, these three databases are estimated to contain >200,000 children and young adults ≤21 years old newly diagnosed with T1D in the US and UK between 2000 and 2020, making this one of the largest T1D cohorts of its kind, with high generalizability.
Aim 1: To estimate the trends in the incidence of T1D with and without DKA at presentation and compare short- and long-term mortality rates in the two groups after T1D diagnosis. We hypothesize that (a) the proportion of patients who present with DKA at diagnosis of T1D has either remained unchanged or increased over the 20-year period; (b) patients with DKA at diagnosis have higher short- and long-term mortality rates; and (c) the majority of children with DKA will have had an encounter with a healthcare provider in the period immediately preceding DKA (e.g., within 30 days). We will also quantify these trends by age, biological sex, socioeconomic status, and rural vs urban residence.
Aim 2: To identify the patient-level factors associated with DKA at diagnosis. Using longitudinal clinical data, we will identify the key clinical (e.g., gastroenteritis) and non-clinical (e.g., healthcare access, socioeconomic status) factors associated with DKA at diagnosis. We will explore whether these factors vary across subgroups defined by age (<4, 4-6, 7-9, 10-13, 14-17, and 18-21 years), biological sex, and key patient-level characteristics (e.g., strata of socioeconomic status).
Aim 3: To create predictive models to identify individuals and patient subgroups at the highest risk of DKA at diagnosis. Using a combination of putative risk factors and data-driven variables, we will develop and validate predictive models to identify individuals and subgroups of patients who are at high risk of presenting with DKA at diagnosis. We will construct both traditional statistical models and advanced machine learning algorithms to determine which approach best predicts individualized patient-level risk. We will also classify patients into categories of low, medium, high, and very high risk of DKA at diagnosis, thereby identifying key intervenable populations that may benefit from T1D screenings.
Anticipated Outcome
Aim 1. The objective of this aim is to estimate the trends in the incidence of T1D with and without DKA at diagnosis among children and young adults ≤21 years in the US and UK over the 20-year period from 2000 to 2020, and to compare short- and long-term mortality rates between the two groups. Our working hypotheses are that: (a) despite the rising incidence and awareness of T1D, the proportion of children with a DKA presentation has either remained unchanged or increased over this period (preliminary analysis); (b) the majority of children who present in DKA will have had contact with a healthcare provider in the 30- or 60-day period prior to DKA; and (c) DKA at diagnosis is associated with higher 1-, 3-, and 5-year mortality rates.
Aim 2. The objective of this aim is to identify the clinical and non-clinical factors associated with DKA at diagnosis. Using longitudinal clinical data, we will identify patient-level factors associated with the development of DKA. Our working hypotheses are that: (a) children and younger adults presenting with and without DKA differ systematically with respect to clinical and non-clinical risk factors; (b) both the acute precipitants and the chronic pathophysiological changes leading to T1D and DKA have a longitudinal clinical footprint that is ascertainable in our databases; and (c) differences in DKA can be partially attributed to varying degrees of awareness of T1D among healthcare providers and caregivers.
Aim 3. The objective of this aim is to use a combination of potential risk factors and data-driven variables to develop and validate predictive models that identify patients at risk of presenting with DKA at diagnosis. We will construct both traditional statistical models and advanced machine learning algorithms to determine the optimal approach for predicting individualized patient-level risk. We will also classify patients, according to their predicted probabilities, into categories of low, medium, high, and very high risk of DKA at diagnosis. Our working hypotheses are that patients with and without a DKA presentation differ systematically with respect to their clinical and non-clinical factors, and that these factors can be used to identify individuals at risk of a DKA presentation. Further, because traditional statistical models are limited in their ability to account for non-linear relationships and complex interactions between predictors, we hypothesize that machine learning approaches will outperform traditional statistical methods in predicting DKA risk.
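As an illustration only, the classification step in Aim 3 could be sketched as follows in Python. The cut points (0.10, 0.25, 0.50) and the names `CUT_POINTS`, `risk_category`, and `categorize` are hypothetical and are not specified in this proposal; in practice the thresholds would be chosen during model development and validation.

```python
from bisect import bisect_right

# Hypothetical cut points on the predicted probability of DKA at diagnosis.
# The proposal does not specify thresholds; these are placeholders.
CUT_POINTS = (0.10, 0.25, 0.50)
LABELS = ("low", "medium", "high", "very high")


def risk_category(predicted_probability, cut_points=CUT_POINTS, labels=LABELS):
    """Map one predicted probability to a risk category label."""
    if not 0.0 <= predicted_probability <= 1.0:
        raise ValueError("predicted probability must lie in [0, 1]")
    # bisect_right counts the cut points that are <= the probability,
    # so a value exactly on a cut point falls into the higher category.
    return labels[bisect_right(cut_points, predicted_probability)]


def categorize(probabilities):
    """Apply risk_category to an iterable of predicted probabilities."""
    return [risk_category(p) for p in probabilities]
```

The same mapping would apply regardless of whether the probabilities come from a traditional statistical model or a machine learning algorithm, which keeps the risk categories comparable across modeling approaches.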
Relevance to T1D
Aim 1. The objective of this aim is to estimate the trends in the incidence of T1D with and without DKA at diagnosis among children and young adults ≤21 years in the US and UK over the 20-year period from 2000 to 2020, and to compare short- and long-term mortality rates between the two groups. Successful completion of this aim will generate rigorous and highly generalizable evidence on the 20-year trends of DKA at diagnosis, quantify the potential missed opportunities for early disease detection, and underscore the importance of avoiding DKA. These data will therefore be critical for increasing awareness of T1D and of DKA at diagnosis in children and younger adults among healthcare providers and policymakers. Differences in the rates of avoidable complications between children in Medicaid and those in MarketScan (a commercial insurance database) will have pertinent ramifications for national policy on Medicaid, which is the largest provider of healthcare insurance to children in the US. Such data will also have important implications for expanding access to and coverage of T1D screening and diagnostics, in line with the stated priorities and initiatives (e.g., T1Detect) of JDRF.
Aim 2. The objective of this aim is to identify the clinical and non-clinical factors associated with DKA at diagnosis. Successful completion of this aim will identify the key, and potentially understudied, factors associated with DKA at diagnosis, thereby enhancing our understanding of this avoidable complication. Examining how these risk factors vary across relevant subgroups of age, sex, clinical characteristics, and socioeconomic strata will yield insights into biological mechanisms while also informing clinical practice and healthcare policy. Study findings will aid in identifying children with risk factors associated with a DKA presentation, support the development of strategies to prevent this complication, and inform population- and individual-based screenings and interventions for the earlier detection of T1D. To our knowledge, this will be the first population-based cohort study to examine and establish the factors associated with DKA at diagnosis using a comprehensive approach that integrates longitudinal clinical data on medical history and medication use with factors relating to social deprivation and healthcare access.
Aim 3. The objective of this aim is to use a combination of potential risk factors and data-driven variables to develop and validate predictive models that identify patients at risk of presenting with DKA at diagnosis. Successful completion of this aim will generate predictive models that aid in identifying patients and subgroups who are at high risk of DKA at diagnosis, thereby identifying the key intervenable populations that may benefit from T1D screenings. Notably, all patient-level variables included in this aim are ascertainable in electronic health records and claims data, permitting future deployment of the models in a variety of healthcare settings. To our knowledge, this will be the first study to use large, population-based data to construct predictive models that identify children at the greatest risk of presenting with DKA at diagnosis. Study findings will help healthcare providers, policymakers, and caregivers identify these high-risk individuals and thus potentially prevent this avoidable complication. Finally, given the lack of prior studies, our approach of using data-driven predictors may identify previously unknown risk factors or predictors of DKA at diagnosis, furthering our understanding of T1D and DKA.