Prediction of mortality in young adults with cardiovascular disease using artificial intelligence



INTRODUCTION
Cardiovascular disease (CVD) is one of the world's leading causes of death [1,2]. Elevated CVD risk can be identified during childhood and has been observed in young adults [3]. According to the most recent WHO report, from 2018, heart disease deaths in Jordan reached 7,615, representing 22.97% of all deaths. With a death rate of 172.28 per 100,000 population, Jordan ranks 52nd in the world for heart disease [4].
Many patients with CVD continue to face a high risk of mortality at a young age [5]. Pulse pressure (PP) is regarded as an indicator of arterial stiffness and/or atherosclerosis, and it predicts the onset of CVD [6]. A sizable number of studies have assessed the primary risk factors for CVD mortality [7], testing them with a variety of tools and sample sizes. The Framingham tool identified age, sex, systolic blood pressure (SBP), diabetes, smoking status, total cholesterol, and high-density lipoprotein as the main antecedents of CVD [8]. Assessing CVD risk is therefore essential for building prediction models that estimate the magnitude of CVD mortality at the population level and in specific subgroups such as young adults [9,10].
Early detection of heart disease using machine learning algorithms (MLAs), a subfield of artificial intelligence (AI), is an essential step in preventing further coronary artery damage in cardiac patients and saving their lives [11]. MLAs and other large-scale data analytics techniques should be adopted as an effective, evidence-based way to predict heart diseases such as CVD, ischemic heart disease (IHD), and congestive heart failure (CHF) [12]. These enormous data sets, also known as big data or large-scale data, can be managed with MLAs that build a variety of prediction models to accurately assess the presence or absence of risk factors contributing to CVD [13,14]. Several prediction models have been created for various populations and subgroups using the well-established risk factors and antecedents of CVD [9].
The literature indicates that prediction models are lacking in Jordan and the Arab world as a whole, particularly for young people with CVD. Moreover, most heart disease research focuses on elderly patients, whereas this study focuses on younger adults. Developing a trustworthy model that classifies each population's risk of CVD has consequently become a top priority for researchers and organizations in this field. The goal of the current study is to develop a prediction model for young adults with IHD or CHF, with death versus survival as the outcome.

Study Design
The study employed a retrospective design, collecting data from the electronic health records (EHRs) system "Hakeem" for young adult Jordanian patients with CVD. Data were gathered for young Jordanians hospitalized in public health facilities since 2015.

Sample & Variables of Study
The dataset was obtained through a retrospective analysis of the EHRs of 809 young adult patients diagnosed with CVD. The sample comprised Jordanians hospitalized in public health facilities between 2015 and 2021.
EHRs contain patient variables at various levels, such as demographic information, diagnoses, laboratory results, medications, and a history of problems, all of which allow meaningful use of patient information [15]. Patient identification (ID) numbers, sex, death status, governorate, medical diagnosis based on the International Classification of Diseases (ICD-10), laboratory results including high-density lipoprotein (HDL), lactate dehydrogenase (LDH), cholesterol level, and fasting blood sugar (FBS), as well as SBP and diastolic blood pressure (DBP), were obtained from the HDA department for patients admitted from 2015 to the end of 2021. Data were downloaded from the HDA as multiple Excel files. Age, gender, and geographic location had no missing data. The files were merged into a single file using SPSS [16].
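The merge step can be illustrated with a minimal sketch. It assumes each export has been saved as CSV and that all files share a patient ID column; the column names, the toy rows, and the `merge_on_id` helper are hypothetical, and the study itself performed this step in SPSS rather than Python.

```python
import csv
import io

# Hypothetical exports (toy rows, not study data). The real HDA files and
# their column names are not described in the text; "patient_id" is assumed.
DEMOGRAPHICS_CSV = """patient_id,sex,governorate,death_status
1001,F,Amman,alive
1002,M,Zarqa,dead
"""
VITALS_CSV = """patient_id,sbp,dbp
1001,130,85
1002,160,80
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_id(*tables, key="patient_id"):
    """Left-join every further table onto the first one via the shared ID."""
    merged = {row[key]: dict(row) for row in tables[0]}
    for table in tables[1:]:
        for row in table:
            if row[key] in merged:  # IDs missing from the base table are ignored
                merged[row[key]].update(row)
    return list(merged.values())


records = merge_on_id(read_rows(DEMOGRAPHICS_CSV), read_rows(VITALS_CSV))
print(records[1])
```

Keying on the patient ID mirrors the anonymized-ID convention described under Ethical Consideration; repeated readings per patient would need to be reduced to one row (for example, the first admission reading) before such a join.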

Data Analysis
Frequency analysis and outlier detection were used to examine the data for noise, inconsistency, and missing values. A large amount of redundant data was removed after the extracted data were sorted, cleaned, and organized. For example, variables such as HDL, LDH, and FBS had extensive missing data (more than 80.00%), so they were excluded from the analysis.
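A minimal sketch of this exclusion rule follows, assuming records are dictionaries in which a missing value is `None` or a blank string. The 80% threshold comes from the text; the helper names and toy records are illustrative only.

```python
def missing_fraction(records, field):
    """Share of records whose value for `field` is absent or blank."""
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def drop_sparse_fields(records, fields, threshold=0.80):
    """Split `fields` into those kept and those dropped (> threshold missing)."""
    kept, dropped = [], []
    for field in fields:
        if missing_fraction(records, field) > threshold:
            dropped.append(field)
        else:
            kept.append(field)
    return kept, dropped


# Toy data (not study data): HDL is blank in 9 of 10 records.
toy = [{"sbp": "120", "hdl": ""} for _ in range(9)] + [{"sbp": "135", "hdl": "45"}]
kept, dropped = drop_sparse_fields(toy, ["sbp", "hdl"])
print(kept, dropped)  # ['sbp'] ['hdl']
```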
Chi-squared automatic interaction detector (CHAID) is an AI technique for building decision trees. It is a highly effective statistical method for segmentation, also known as tree growing [17]. CHAID evaluates each value of a potential predictor field using the significance of a statistical test as the criterion [18].

Transformation of data
The researchers selected the most relevant attributes of CVD using data visualization. IBM SPSS Modeler (version 18.0) was used to manipulate, analyze, and visualize the data; it provides powerful data presentation for statistical analysis and data management in both descriptive and predictive modelling [19]. Descriptive modelling was used to identify the main risk factors of CVD, visualize the data, and rank CVD attributes from the most to the least frequent. As a decision tree technique, CHAID merges predictor values that are statistically homogeneous with respect to the target variable and keeps all other, heterogeneous values separate. The best predictor is then chosen to form the first branch of the decision tree, with each child node consisting of a group of homogeneous values of the chosen field. This process continues recursively until the tree is fully grown. The statistical test applied depends on the target field's level of measurement; a chi-squared test is applied if the target field is categorical.
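To make the merge step concrete, the sketch below implements a simplified version of CHAID's category merging for a binary died/alive target: it repeatedly finds the most similar pair of categories (the largest pairwise chi-square p-value) and merges them until every remaining pair differs at `alpha_merge`. It is an illustration under stated assumptions, not SPSS Modeler's implementation: it omits the re-splitting check, the Bonferroni adjustment, and minimum node sizes, and `alpha_merge = 0.05` is an assumed default.

```python
import math
from itertools import combinations


def chi2_2x2(a, b):
    """Pearson chi-square (no continuity correction) for two categories.

    `a` and `b` are (deaths, survivors). Returns (statistic, p-value), df = 1.
    """
    col_totals = (a[0] + b[0], a[1] + b[1])
    total = sum(col_totals)
    stat = 0.0
    for row in (a, b):
        for observed, col_total in zip(row, col_totals):
            expected = sum(row) * col_total / total
            if expected > 0:  # an all-zero column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, df = 1


def chaid_merge(categories, alpha_merge=0.05, ordinal=False):
    """Merge categories that look homogeneous with respect to a binary target.

    `categories` maps label -> (deaths, survivors), in order for ordinal
    predictors, which may only merge adjacent categories.
    """
    groups = [((label,), counts) for label, counts in categories.items()]
    while len(groups) > 1:
        if ordinal:
            pairs = [(i, i + 1) for i in range(len(groups) - 1)]
        else:
            pairs = list(combinations(range(len(groups)), 2))
        # The pair with the largest p-value is the most similar pair.
        p_value, i, j = max(
            (chi2_2x2(groups[i][1], groups[j][1])[1], i, j) for i, j in pairs
        )
        if p_value <= alpha_merge:
            break  # all remaining pairs differ significantly: stop merging
        merged = (groups[i][0] + groups[j][0],
                  (groups[i][1][0] + groups[j][1][0], groups[i][1][1] + groups[j][1][1]))
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.insert(i, merged)
    return groups


# Toy counts (not study data): A and B look alike, C does not.
merged = chaid_merge({"A": (1, 99), "B": (2, 98), "C": (20, 80)})
print(merged)  # [(('A', 'B'), (3, 197)), (('C',), (20, 80))]
```

Passing `ordinal=True` restricts merges to neighbouring categories, as CHAID does for ordered predictors such as age or PP bands.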

Ethical Consideration
Patient records were handled anonymously, with an ID serving as the distinguishing characteristic of each record. The extracted data were kept in a separate, locked file stored in a secure location in the researcher's office. Only the researchers had access to the data for the purposes of the study.

Sample Characteristics
The health records of 809 young adults (18-45 years old) were extracted. Females made up slightly more than half of the sample (51.20%), and the largest age group was 41-45 years (n=307, 37.90%). About three-quarters of the patients with CVD had been diagnosed with IHD (n=588, 72.70%). The sample represented all governorates in Jordan, with the majority from Amman (n=424, 52.40%) (Table 1).
The blood pressure readings, including SBP, DBP, PP, and mean arterial pressure (MAP), are presented in Table 2.
Not all patient health records had complete work-up results. Because each patient had multiple BP readings, the first reading on admission was used for each patient.
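The derived pressures can be sketched as follows. PP is SBP minus DBP; the text does not say how MAP was computed, so the common approximation MAP ≈ DBP + PP/3 is assumed here. The record layout, timestamps, and values are hypothetical.

```python
from datetime import datetime


def first_admission_reading(readings):
    """Return the earliest reading, following the 'first reading on admission' rule."""
    return min(readings, key=lambda r: r["time"])


def derived_pressures(sbp, dbp):
    """Return (PP, MAP) in mmHg; MAP uses the assumed approximation DBP + PP/3."""
    pp = sbp - dbp
    return pp, dbp + pp / 3


# Hypothetical readings for one admission (not study data).
readings = [
    {"time": datetime(2019, 3, 2, 14, 0), "sbp": 150, "dbp": 85},
    {"time": datetime(2019, 3, 2, 9, 30), "sbp": 160, "dbp": 80},
]
first = first_admission_reading(readings)
pp, mean_arterial = derived_pressures(first["sbp"], first["dbp"])
print(pp, round(mean_arterial, 1))  # 80 106.7
```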

Predictive Models
Seven models were built, and one of them was adopted in this study. Models were evaluated on overall accuracy and the area under the curve (AUC). The CHAID model was adopted because it achieved the highest accuracy (98.27%) and an AUC of 0.89 among the built models (Table 3).
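Both evaluation criteria are simple to compute, as the sketch below shows. AUC is computed with the pairwise (Mann-Whitney) formulation; SPSS Modeler's internal calculation may differ in detail. For context, the root split in Figure 1 contains 2 + 5 + 7 = 14 deaths among 809 patients, so a classifier that labels every patient alive would already reach 795/809 ≈ 98.27% accuracy; this is one reason AUC is a useful companion to accuracy on such imbalanced data.

```python
def accuracy(y_true, y_pred):
    """Share of patients whose predicted status equals the recorded status."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC: probability that a death is scored above a survivor (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 = died, 0 = alive; counts follow the root split in Figure 1 (14 of 809 died).
y_true = [1] * 14 + [0] * 795
all_alive = [0] * len(y_true)  # majority-class "model"
print(round(100 * accuracy(y_true, all_alive), 2))  # 98.27
print(auc(y_true, [0.0] * len(y_true)))  # 0.5 (uninformative scores)
```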
The CHAID algorithm, an AI technique, was found to be the most effective model for predicting death versus survival among young adults (18-45 years old) with CVD. SPSS Modeler was used to generate the seven model nodes. During model training, about 30.00% of the data were held out to check whether the model produced valid results.
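A holdout partition like the one described can be sketched in a few lines. The 30% share is from the text; the simple shuffle and the seed value are assumptions, since the text does not describe how SPSS Modeler assigned records to each partition.

```python
import random


def holdout_split(records, test_share=0.30, seed=2023):
    """Shuffle a copy of `records` and hold out `test_share` of them for checking."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)


train, test = holdout_split(range(809))
print(len(train), len(test))  # 566 243
```

Because deaths are rare in this sample, a partition stratified by outcome would guarantee deaths in both subsets; the text does not state whether the partition used here was stratified.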
Exhaustive CHAID is a modification of CHAID designed to address some of the method's shortcomings. Because CHAID stops merging categories as soon as it determines that all remaining categories are statistically different, it may occasionally miss the best split for a variable. Exhaustive CHAID fixes this by continuing to merge the predictor's categories until only two supercategories remain. After the full series of merges for the predictor has been examined, the set of categories most strongly associated with the target variable is identified.
An adjusted p-value is then calculated for that association. By comparing the adjusted p-values, exhaustive CHAID can determine the best split for each predictor and then decide which predictor to split on.
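The exhaustive variant can be sketched for an ordered predictor (for example, banded PP). The sketch keeps merging the most similar adjacent pair until two groups remain, scores every grouping along the way with a chi-square test, and keeps the grouping with the smallest adjusted p-value. The Bonferroni multiplier used here, C(c-1, k-1) (the number of ways to cut c ordered categories into k contiguous groups), is an assumption for illustration; SPSS Modeler's exact adjustment may differ. The counts are toy data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, via the regularized lower gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def pearson_chi2(groups):
    """Pearson chi-square for a k x 2 table of (deaths, survivors) rows."""
    deaths = sum(d for d, _ in groups)
    total = sum(d + s for d, s in groups)
    stat = 0.0
    for d, s in groups:
        for observed, col_total in ((d, deaths), (s, total - deaths)):
            expected = (d + s) * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def exhaustive_chaid_ordinal(counts):
    """Merge adjacent categories down to two groups and keep the best grouping.

    `counts` is an ordered list of (deaths, survivors) with at least two
    entries. Returns (adjusted p-value, grouping as lists of category indices).
    """
    c = len(counts)
    groups = [([i], counts[i]) for i in range(c)]
    best = None
    while True:
        k = len(groups)
        stat = pearson_chi2([g[1] for g in groups])
        # Assumed Bonferroni multiplier: ways to cut c ordered categories into k groups.
        adjusted = min(1.0, math.comb(c - 1, k - 1) * chi2_sf(stat, k - 1))
        if best is None or adjusted < best[0]:
            best = (adjusted, [g[0] for g in groups])
        if k == 2:
            return best
        # Merge the most similar adjacent pair (largest pairwise p-value).
        i = max(range(k - 1),
                key=lambda j: chi2_sf(pearson_chi2([groups[j][1], groups[j + 1][1]]), 1))
        merged = (groups[i][0] + groups[i + 1][0],
                  (groups[i][1][0] + groups[i + 1][1][0], groups[i][1][1] + groups[i + 1][1][1]))
        groups[i:i + 2] = [merged]


best = exhaustive_chaid_ordinal([(1, 199), (2, 198), (6, 94), (7, 93)])
print(best[1])  # [[0, 1], [2, 3]]
```

Unlike the plain merge step, this search does not stop at the first significant difference, which is the shortcoming of standard CHAID that the exhaustive variant is described as fixing.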
The decision tree in Figure 1 begins with the root node (node 0), which shows death status as the outcome field. PP was the predictor with the strongest statistically significant relationship with the target field, and it split the data into three nodes. Patients with PP<59 had two deaths (a rate of 0.368%), those with PP between 59 and 73 had five deaths (2.717%), and those with PP>73 had seven deaths, the highest rate (8.642%) (Chi-square=29.748, p<.001).
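The reported statistic for this first split can be checked with a short calculation. The group sizes below are back-calculated from the reported death rates (2/544 = 0.368%, 5/184 = 2.717%, 7/81 = 8.642%) and together account for all 809 patients; with these counts, Pearson's chi-square without continuity correction reproduces the reported value of 29.748. The sizes are an inference from the rates, not figures quoted from the paper's tables.

```python
import math

# (deaths, patients) per PP band. Deaths are from the text; group sizes are
# back-calculated from the reported rates and sum to 809.
pp_bands = {"PP<59": (2, 544), "PP 59-73": (5, 184), "PP>73": (7, 81)}


def pearson_chi2(groups):
    """Pearson chi-square for a k x 2 (died, survived) table."""
    n = sum(size for _, size in groups)
    deaths = sum(d for d, _ in groups)
    stat = 0.0
    for d, size in groups:
        for observed, col_total in ((d, deaths), (size - d, n - deaths)):
            expected = size * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


for band, (d, size) in pp_bands.items():
    print(f"{band}: {100 * d / size:.3f}% died")  # 0.368, 2.717, 8.642
stat = pearson_chi2(list(pp_bands.values()))
p_value = math.exp(-stat / 2)  # exact chi-square upper tail when df = 3 - 1 = 2
print(f"chi-square = {stat:.3f}, p = {p_value:.1e}")
```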
DBP was the next significant predictor in the model, splitting the cases with PP≤59 into node 4 and node 5. Patients with DBP≤90 mmHg had no deaths, while those with DBP>90 mmHg had two deaths (2.35%). Node 6 and node 7 stemmed from node 3 (PP>73): node 6 (IHD) had two deaths (3.64%), while node 7, which represents CHF cases, had five deaths (19.23%) (Chi-square=5.438, p=.020).
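The diagnosis split under node 3 can be checked the same way. The 55 IHD patients in this branch are stated in the Discussion; the CHF group size of 26 is back-calculated from 5 deaths at 19.23%. With these counts, Pearson's chi-square with one degree of freedom reproduces the reported Chi-square=5.438 and p=.020.

```python
import math

# Node 3 (PP>73) split by diagnosis, as (deaths, patients).
ihd = (2, 55)  # 55 IHD patients, as stated in the Discussion
chf = (5, 26)  # 26 is back-calculated from 5 deaths at 19.23%


def chi2_2x2(group_a, group_b):
    """Pearson chi-square (no continuity correction) for two (deaths, n) groups."""
    n = group_a[1] + group_b[1]
    deaths = group_a[0] + group_b[0]
    stat = 0.0
    for d, size in (group_a, group_b):
        for observed, col_total in ((d, deaths), (size - d, n - deaths)):
            expected = size * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail for df = 1


stat, p_value = chi2_2x2(ihd, chf)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")  # chi-square = 5.438, p = 0.020
```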
Two governorate nodes emerged from node 6. Node 8 includes 11 governorates with no IHD deaths, while node 9 consists of Zarqa governorate alone, with two deaths (22.22%) (Chi-square=10.610, p=.017). To confirm the validity of the selected CHAID model, the predictor importance of the runner-up model, a neural network (overall accuracy 98.269%, AUC 0.866), is plotted in Figure 2. The results were consistent with, and supported, the CHAID model findings.

DISCUSSION
Retrospective data from the Jordanian health information system were used to create a model predicting death versus survival in this large group of young adults with heart disease. Unlike most previous heart disease research, this study focused on younger adults, specifically those aged 18 to 45, rather than the elderly. All patients in the sample were diagnosed with either IHD or CHF.
Prior work has used a regression model incorporating demographics and blood pressure readings as predictors of death [14]. Data-driven modeling with MLAs, as applied in our research, can improve risk prediction. CHAID is one example: it splits the data and visualizes the most important risk factors in the model while taking the interactions between predictors into account. This can be accomplished even with a limited number of attributes and algorithms [20]. The neural network model produced results supporting the adopted CHAID model, which was selected for having the highest AUC and prediction accuracy of the seven models.
PP was the first predictor of death status to emerge in our study. Patients with PP higher than 73 had the highest death rate (8.64%). This finding is consistent with a Korean study, which found that high PP in patients aged 40 to 69 was a substantial predictor of cardiovascular mortality [21]. PP is simple to compute, and it enables the prognosis of cardiovascular death in individuals with advanced heart failure.
The third predictor, which emerged from node 2 (PP<59), was DBP, with a cut point of 90 mmHg. In this study, patients with DBP>90 mmHg had a death rate of 2.35%. In another study, patients with CVD and DBP>80 mmHg had a death rate of 18.00% [22]. The difference in death rates between our study and others could be due to the AI method of analysis.
The fourth attributable risk factor was medical diagnosis, classified as IHD or non-IHD, which emerged only among patients with PP higher than 73. In this study, the death rate was only 3.64% for IHD but 19.23% for CHF. These findings are consistent with a study [23] that examined CVD trends among young adults (25-44 years) between 1999 and 2018 and found a death rate of 13.40%. The reasons for these trends in CVD death rates in this age group were unclear; they could be attributed to low socioeconomic status, a lack of information on comorbid conditions, and the presence of risk factors such as smoking and substance abuse [24,25]. Residential location was the last attribute identified for death versus survival status among young Jordanian adults in our study.
Of the 55 patients with IHD in node 6 (PP>73), nine were from Zarqa governorate, where the mortality rate was 22.22%.
No other governorate had IHD deaths in this node. Research from many nations has documented a relationship between area-level socioeconomic disadvantage and spatial disparities in the distribution of CVD at various scales [25][26][27][28].
According to 2017 statistics from the Jordanian Ministry of Health (MOH), circulatory system diseases are the leading cause of death across all ages in Jordan (42.36%). However, no studies were found that could be compared with our results regarding geographic area and IHD together. Because our study is one of the few to apply an AI model among young adults, additional research is needed in a variety of countries to validate our findings.

CONCLUSIONS
In summary, the CHAID model in our study correctly classified cases and distributed the mortality rate across numerous factors. The AI model therefore added detail on how various factors relate to death among patients with CVD. Although the sample size was relatively large, a limitation of this study was the quality of the data entered in the EHRs. The patient's medications, laboratory results, radiological examination findings and body mass index

Figure 1. Study predictive model of death versus survival outcome among young adults (Source: Authors' own elaboration)

Table 2. Outcomes of clinical work-up for patients (n=809)

Table 3. Seven research data models