Published in Vol 10 (2024)

Machine Learning Approaches to Predict Symptoms in People With Cancer: Systematic Review



1Department of Computer Science and Informatics, University of Iowa, Iowa City, IA, United States

2College of Nursing, University of Iowa, Iowa City, IA, United States

3Department of Business Analytics, University of Iowa, Iowa City, IA, United States

Corresponding Author:

Stéphanie Gilbertson White, PhD

College of Nursing

University of Iowa

452 CNB, 50 Newton Rd 52246

Iowa City, IA, 52246

United States

Phone: 1 319 335 7023


Background: People with cancer frequently experience severe and distressing symptoms associated with cancer and its treatments. Predicting symptoms in patients with cancer continues to be a significant challenge for both clinicians and researchers. The rapid evolution of machine learning (ML) highlights the need for a current systematic review to improve cancer symptom prediction.

Objective: This systematic review aims to synthesize the literature that has used ML algorithms to predict the development of cancer symptoms and to identify the predictors of these symptoms. This is essential for integrating new developments and identifying gaps in existing literature.

Methods: We conducted this systematic review in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. We conducted a systematic search of CINAHL, Embase, and PubMed for English records published from 1984 to August 11, 2023, using the following search terms: cancer, neoplasm, specific symptoms, neural networks, machine learning, specific algorithm names, and deep learning. All records that met the eligibility criteria were individually reviewed by 2 coauthors, and key findings were extracted and synthesized. We focused on studies using ML algorithms to predict cancer symptoms, excluding nonhuman research, technical reports, reviews, book chapters, conference proceedings, and inaccessible full texts.

Results: A total of 42 studies were included, the majority of which were published after 2017. Most studies were conducted in North America (18/42, 43%) and Asia (16/42, 38%). The sample sizes in most studies (27/42, 64%) typically ranged from 100 to 1000 participants. The most prevalent category of algorithms was supervised ML, accounting for 39 (93%) of the 42 studies. Each of the methods—deep learning, ensemble classifiers, and unsupervised ML—constituted 3 (3%) of the 42 studies. The ML algorithms with the best performance were logistic regression (9/42, 17%), random forest (7/42, 13%), artificial neural networks (5/42, 9%), and decision trees (5/42, 9%). The most commonly included primary cancer sites were the head and neck (9/42, 22%) and breast (8/42, 19%), with 17 (41%) of the 42 studies not specifying the site. The most frequently studied symptoms were xerostomia (9/42, 14%), depression (8/42, 13%), pain (8/42, 13%), and fatigue (6/42, 10%). The significant predictors were age, gender, treatment type, treatment number, cancer site, cancer stage, chemotherapy, radiotherapy, chronic diseases, comorbidities, physical factors, and psychological factors.

Conclusions: This review outlines the algorithms used for predicting symptoms in individuals with cancer. Given the diversity of symptoms people with cancer experience, analytic approaches that can handle complex and nonlinear relationships are critical. This knowledge can pave the way for crafting algorithms tailored to a specific symptom. In addition, to improve prediction precision, future research should compare cutting-edge ML strategies such as deep learning and ensemble methods with traditional statistical models.

JMIR Cancer 2024;10:e52322




Cancer poses considerable physical and psychological challenges for those diagnosed with the disease. The Global Cancer Observatory estimated that there were 19.3 million new cancer cases and 43.8 million individuals living with cancer within 5 years of diagnosis globally in 2020 [1]. Symptoms such as fatigue, pain, nausea, vomiting, depression, and anxiety often persist beyond treatment [2-5], detrimentally affecting individuals’ quality of life [6]. Moreover, people with cancer frequently grapple with multiple intertwined symptoms [7], intensifying their distress [8]. Unmanaged cancer symptoms can lead to increased health care use, including emergency department visits and unscheduled hospitalizations to address these symptoms; a decline in the quality of life [9]; and even a reduced life expectancy. Providing precision symptom management tailored to the individual at the right moment has the potential to significantly improve outcomes, which is crucial for both people with cancer and their health care providers. Accurately predicting and addressing these symptoms is fundamental to providing such precision in symptom management.

Artificial intelligence, incorporating machine learning (ML) and deep learning (DL) models, excels in handling complex, high-dimensional, and noisy data. It has demonstrated effectiveness in disease diagnosis, predicting disease recurrence, enhancing quality of life, and symptom management [10-16]. There is a growing interest in ML in the emerging field of predictive analytics for cancer symptoms. ML contributes to the development of robust clinical decision systems, enhancing overall health care delivery [17]. ML algorithms can be broadly categorized into supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning. DL, a subset of ML, addresses complex tasks such as speech recognition, image identification, and natural language processing [18].


This study seeks to offer a comprehensive and systematic review of the literature on the application of ML algorithms in predicting symptoms for people with cancer. Conducting this review of a rapidly expanding body of literature is imperative to understand the current state of the science for ML models in symptom prediction for cancer and to guide future research. This research aims to provide a comprehensive understanding of the current state of research; identify areas for improvement; and understand the limitations and gaps in the current literature, such as a lack of specific focus on ML models for patients with cancer. By comparing model performances across diverse symptom prediction tasks, we can identify the best practices, highlight areas for improvement, and offer informed recommendations that will propel the field of predictive analytics in cancer symptom research forward.

Search Strategy and Data Sources

This study was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol [19] and involved a comprehensive database search spanning from 1984 to August 11, 2023, including the PubMed, Embase, CINAHL, and Google Scholar databases. The search terms encompassed cancer, neoplasm, signs and symptoms, neural networks, machine learning, and specific algorithm names. We combined these keywords and phrases using Boolean expressions to account for the variability in terminology across studies. Search results were compiled using EndNote 20 (Clarivate Analytics). The detailed search strategy and the PRISMA checklist can be found in Multimedia Appendices 1 and 2.
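As an illustration of how such Boolean expressions are typically assembled, the following minimal Python sketch joins synonyms within a concept using OR and joins the concepts using AND. The term groups and the helper names (or_group and build_query) are hypothetical and chosen only for this example; the strategy actually used in this review is the one reported in Multimedia Appendix 1.

```python
# Hypothetical term groups for illustration only; the review's actual
# search strategy is reported in Multimedia Appendix 1.

def or_group(terms):
    """Join synonyms for one concept with OR, quoting multiword phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(*groups):
    """Join concept groups with AND so a record must match every concept."""
    return " AND ".join(or_group(g) for g in groups)

population = ["cancer", "neoplasm"]
outcome = ["signs and symptoms", "xerostomia", "fatigue"]
method = ["machine learning", "neural networks", "deep learning", "random forest"]

query = build_query(population, outcome, method)
print(query)
```

Grouping synonyms with OR widens each concept, whereas AND between groups keeps only records that mention a population term, an outcome term, and a method term together.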

Inclusion and Exclusion Criteria

To identify relevant research focusing on the application of ML methods in predicting cancer symptoms, we applied the following inclusion criteria: (1) papers published in English, (2) studies that used ML algorithms, and (3) research specifically aimed at predicting cancer symptoms. The exclusion criteria were as follows: (1) nonhuman studies, (2) technical reports, (3) review papers, (4) book chapters or series, (5) conference proceedings, and (6) studies for which full texts were unavailable. Two authors, NZ and NY, independently screened and cross-checked the candidate records. During the screening process, conducted using EndNote 20, any disagreements were resolved by consulting a third reviewer (SGW). The screening process involved an initial review of titles and abstracts, followed by a full-text examination to determine the study’s eligibility for inclusion in the review.

Data Extraction and Analysis

We implemented a systematic, multistep process for data synthesis. Initially, relevant studies were identified and selected based on the predefined inclusion and exclusion criteria. Two researchers, NZ and NY, extracted data from the 42 selected studies, working independently to mitigate bias and enhance the accuracy of the data extraction process. Discrepancies were resolved through discussion or consultation with a third reviewer, SGW. The extracted data were then aggregated by collating study characteristics such as research location, sample size, study design, types of ML algorithms, validation metrics, identified significant predictors, cancer types, and the specific symptoms focused on. For the analysis, we used both quantitative and qualitative methods. Quantitative data, such as frequencies and percentages, were compiled and analyzed using Python. This included the creation of plots and heat maps to identify patterns and trends, illustrate relationships among variables, and highlight key findings. Qualitative aspects, such as algorithm implementation or study design, were explored through narrative synthesis, which allowed for a deeper understanding of the context and nuances in the application of ML algorithms for cancer symptom prediction.

We conducted a cross-analysis to compare findings from different studies, assessing the effectiveness of various ML algorithms across different cancer types and symptoms and identifying common predictors of success and the challenges faced. Finally, we interpreted the findings in the context of the existing literature. We discussed how our results align with or differ from previous studies and what new insights our synthesis brings to the field of ML in cancer symptom prediction.
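The frequency and percentage summaries described in this section can be sketched in a few lines of Python. The example below is a minimal sketch under stated assumptions: the study records and the summarize helper are hypothetical and are not the extraction data or the code used in this review. Because a single study can report several symptoms, the sketch makes the denominator explicit: a percentage can be expressed per study or per symptom mention, and the 2 values differ.

```python
from collections import Counter

def summarize(values, total=None):
    """Return (label, count, percentage) tuples, most frequent first.

    If total is None, percentages use the number of values (mentions);
    otherwise they use the supplied denominator (e.g., number of studies).
    """
    counts = Counter(values)
    denom = total if total is not None else sum(counts.values())
    return [(label, n, round(100 * n / denom)) for label, n in counts.most_common()]

# Hypothetical extraction records (not the studies in this review).
studies = [
    {"id": "S1", "symptoms": ["pain"]},
    {"id": "S2", "symptoms": ["pain", "fatigue"]},
    {"id": "S3", "symptoms": ["xerostomia"]},
    {"id": "S4", "symptoms": ["fatigue", "depression", "pain"]},
]

mentions = [s for study in studies for s in study["symptoms"]]

per_study = summarize(mentions, total=len(studies))  # share of studies
per_mention = summarize(mentions)                    # share of all mentions

for label, n, pct in per_study:
    print(f"{label}: {n}/{len(studies)} studies ({pct}%)")
```

Stating which denominator a percentage uses next to each count makes the reported figures easier to verify.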

Overall Results

A search across the 3 databases produced 1788 papers. After removing 289 duplicates, we screened the titles and abstracts of the remaining records and excluded another 1352 irrelevant records. However, 1 study was not retrieved. We reviewed the full text of the remaining 146 records, omitting 105 due to the absence of ML application in predicting cancer symptoms (69/146, 47.3%), not being a research article (34/146, 23.3%), and not being an English article (1/146, 1%). In the second phase, we searched Google Scholar, which yielded an additional 113 articles not found in our main databases, although 1 study was not retrieved. We reviewed the full text of the remaining 99 records, ultimately excluding all of them for reasons such as the lack of ML applications in cancer symptom prediction (89/99, 90%) and not being a research article (10/99, 10%). Eventually, 42 studies met the inclusion criteria, as depicted in Figure 1.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart. ML: machine learning.

Of the 42 studies, all 42 (100%) are listed in PubMed, 37 (88%) are covered by Embase, and 18 (43%) are included in CINAHL. The distribution and overlap of these research articles across the databases are illustrated in Multimedia Appendix 3.

The data extracted from these studies, which include the reference number, research location, year, data type, cancer site, symptoms, significant predictors, ML algorithms, and validation methods, are detailed in Table 1 and in Multimedia Appendix 4.

Table 1. Details of the included studies (n=42).
| Study | Country, year | Data type; number of data | Population | Cancer symptoms | Significant predictors | Algorithms | Validation methods |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sun et al [20] | China, 2023 | Clinical data; 1152 | People with breast cancer | Pain | Postmenopausal status, urban medical insurance, history of at least 1 operation, underwent general anesthesia with fentanyl and sevoflurane, and received axillary lymph node dissection | LR^a,b, RF^c, GBDT^d, and XGB^e | Random |
| Xinran et al [21] | China, 2023 | Clinical data; 494 | People with advanced cancer | Cognitive impairment | Cancer course, anxiety, and age | LR and ANN^f | Random |
| Shaikh et al [22] | United States, 2023 | Clinical data; 1152 | Survivors of cancer with osteoarthritis | Depression | Age, education, care fragmentation, polypharmacy, and zip code–level poverty | XGB | 10-fold CV^g |
| Kober et al [23] | United States, 2023 | Clinical data; 1217 | People with cancer receiving chemotherapy | Morning fatigue | 13 individual Li-Fraumeni syndrome items | EN^h, RF, LASSO^i, LR (filtered/unfiltered), RPAR^j, and SVM^k | Random |
| Du et al [24] | China, 2023 | Clinical data; 565 | People with cancer | Fatigue | Pain score, Eastern Cooperative Oncology Group score, platelet distribution width, and continuous erythropoiesis receptor activator | LR, RF, NB^l, and XGB | 5-fold CV |
| Moscato et al [25] | Italy, 2022 | Clinical data; 21 | People with cancer | Pain | N/A^m | SVM, RF, MP^n, LR, and AdaBoost^o | 10-fold CV |
| Masukawa et al [26] | Japan, 2022 | Clinical data; 808 | People with cancer | Social distress, spiritual pain, pain, dyspnea, nausea, and insomnia | N/A | LR, RF, light GBM^p, SVM, and ensemble | 5-fold CV |
| Fanizzi et al [27] | Italy, 2022 | CT^q image data; 61 | People with oropharyngeal cancer receiving radiotherapy | Xerostomia | Weight preradiotherapy, induction chemotherapy, sex, platinum-based chemotherapy, current chemotherapy, alcohol history, age at diagnosis, smoking history, surgery, clinical tumor, and clinical node | SVM and CNN^r | 10-fold CV |
| Ueno et al [28] | Japan, 2022 | Clinical data; 284 | People with breast cancer | Insomnia | General fatigue, physical fatigue, and cognitive fatigue | L2 penalized LR and XGB | 8-fold CV |
| On et al [29] | Korea, 2022 | Clinical data; 935 | People with cancer receiving chemotherapy | Nausea-vomiting, fatigue-anorexia, diarrhea, hypersensitivity, stomatitis, hand-foot syndrome, peripheral neuropathy, and constipation | Earlier history of adverse drug reaction, comorbidity, cancer site and type of chemotherapy, demographics, and antineoplastic therapy–related features | LR, DT^s, and ANN | 3-fold CV |
| Li et al [30] | China, 2022 | Clinical data and CT image data; 365 | People with cancer receiving radiotherapy | Xerostomia | Hypertension, age, total radiotherapy dose, dose at 50% of the left parotid volume, mean dose to right parotid gland, mean dose to oral cavity, and course of induction chemotherapy | RF, DT, and XGB | External validation |
| Kurisu et al [31] | Japan, 2022 | Clinical data; 668 | People with advanced cancer receiving pharmacological interventions | Delirium | The baseline Delirium Rating Scale-R98 severity score (cutoff of 15), hypoxia, and dehydration | DT | 5-fold CV |
| Guo et al [32] | China, 2022 | Clinical data; 80 | People with lung cancer receiving chemotherapy | Lung infection | Age ≥60 years, length of stay ≥14 days, surgery history, combined chemotherapy, myelosuppression, diabetes, and hormone application | LR and ANN | Random |
| Baglione et al [33] | United States, 2022 | Clinical data; 40 | People with breast cancer | Depressed mood and anxiety | Connectedness, receive support, frequency and duration use of mobile app, and physical pain | RF and XGB | LOOCV^t |
| Chao et al [34] | United States, 2022 | Clinical data and CT image data; 155 | People with HNC^u receiving radiotherapy | Xerostomia | N/A | SVM, KNN^v, NB, and RF | Nested |
| Wakabayashi et al [35] | Japan, 2021 | Clinical data and CT image data; 69 | People with cancer receiving radiotherapy | Pain | Age, numeric rating scale, and biological effective dose 10 | RF | LOOCV |
| Zhou et al [36] | China, 2021 | Clinical data; 386 | People with colorectal cancer after chemotherapy | Cognitive impairment | Age, BMI, colostomy, treatment complications, cancer-related anemia, depression, diabetes, Quality of Life Questionnaire Core 30 score, exercise, hypercholesterolemia, diet, marital status, education level, and pathological stage | RF, LR, and SVM | Random |
| Xuyi et al [37] | Canada, 2021 | Clinical data; 46,104 | Specific cancer site or treatment not mentioned | Pain, depression, and well-being | Lung cancer, late-stage cancer, existing chronic conditions such as osteoarthritis, mood disorder, hypertension, diabetes, and coronary disease | ANN | Random |
| Xu et al [38] | China, 2021 | Clinical data; 598 | People with gastrointestinal tumors after surgery | Postoperative fatigue | Age, higher degree of education, lower personal monthly income, advanced cancer, hypoproteinemia, preoperative anxiety or depression, and limited social support | LR, ANN, and CART^w | Random |
| Wei et al [39] | China, 2021 | Clinical data; 533 | People with breast cancer | Lymphedema | N/A | ANN, LR, C5.0, RF, SVM, and CART | 10-fold CV |
| Wang et al [40] | United States, 2021 | Clinical data; 823 | People with HNC | Pain, taste, and general activity | N/A | SVM, KNN, and RF; Gaussian NB and MLP^x; and ARIMA^y and LSTM^z | Random |
| Wang et al [41] | United States, 2021 | Clinical data and CT image data; 138 | Specific cancer site or treatment not mentioned | Depression | N/A | Fine tree, medium tree, coarse tree, linear discriminant, quadratic discriminant, LR, Gaussian NB, kernel NB, linear SVM, quadratic SVM, cubic SVM, fine Gaussian SVM, medium Gaussian SVM, coarse Gaussian SVM, fine KNN, medium KNN, coarse KNN, cosine KNN, cubic KNN, weighted KNN, boosted trees, bagged trees, subspace discriminant, subspace KNN, and random undersampling boosted trees | 5-fold CV |
| Mosa et al [17] | United States, 2021 | Clinical data; 6124 | People with cancer receiving chemotherapy | Nausea-vomiting | Smoking, alcohol status, sex, age, and BMI | NB, LR, ANN, SVR^aa, and DT | 10-fold CV |
| Low et al [42] | United States, 2021 | Clinical data; 44 | People with pancreatic cancer after surgery | Diarrhea, fatigue, and pain | Physical activity bouts, sleep, heart rate, and location | LR, KNN, SVM, RF, GB^ab, XGB, and LightGBM | 3-fold CV and LOOCV |
| Kourou et al [43] | Greece, 2021 | Clinical data; 609 | People with breast cancer | Depression | A set of psychological traits (optimism, perceived ability to cope with trauma, resilience as a trait, and ability to understand the illness) and subjective perceptions of personal functionality (physical, social, and cognitive) | RF, SVM, and GB | 5-fold CV |
| Kober et al [44] | United States, 2021 | Clinical data; 1217 | People with cancer receiving chemotherapy | Evening fatigue | Morning fatigue, lower evening energy, and sleep disturbance | RF, LR (filtered or unfiltered), RPAR, and SVM | 10-fold CV |
| Hu et al [45] | China, 2021 | Clinical data; 238 | People with non-Hodgkin lymphoma receiving chemotherapy | Depression | Education level, sex, age, marital status, medical insurance, per capita monthly household income, pathological stage, Suicide Severity Rating Scale, Pittsburgh Sleep Quality Index, and Quality of Life Questionnaire Core 30 | SVM, RF, and LASSO+LR | Random |
| Haun et al [46] | Germany, 2021 | Clinical data; 496 | People with cancer seen in primary care | Anxiety | Fatigue or weakness, insomnia, and pain | OLS^ac, RR^ad, LASSO, ENR^ae, RF, and XGB | 10-fold CV |
| Lee et al [47] | United States, 2020 | Clinical data and CT image data; 388 | People with lung cancer after intensity-modulated radiation therapy | Weight loss | Joint gross tumor volume L1+L2+L3 radiomics, gross tumor volume, and esophagus L3 dosiomic | SVM, DNN^af, and ensemble classifier | Nested CV |
| Juwara et al [48] | Canada, 2020 | Clinical data; 204 | People with breast cancer after surgery | NP^ag | Anxiety, type of surgery, and acute pain | LS^ah, RR, ENR, RF, GB, and ANN | 10-fold CV |
| Men et al [49] | United States, 2019 | Clinical data and CT image data; 784 | People with HNC receiving radiotherapy | Xerostomia | Feature map visualization | LR and 3D-RCNN^ai | Random |
| Jiang et al [50] | United States, 2019 | Clinical data and CT image data; 427 | People with HNC | Xerostomia | The patient has human papillomavirus, completed chemotherapy, their baseline xerostomia grade, tumor site, N stage, and use of feeding tube | RR, LASSO, and RF | 10-fold CV |
| Sheikh et al [51] | United States, 2019 | CT image data; 266 | People with HNC | Xerostomia | N/A | Generalized linear model | 10-fold CV |
| Papachristou et al [52] | United States, 2019 | Clinical data; 799 | People with cancer receiving chemotherapy | Sleep disturbance, anxiety, and depression | Age, gender, cancer site, the number of prior cancer treatments, and initial diagnosis | SVR (linear, polynomial, and radial Sigma) and n-CCA^aj | 10-fold CV and bootstrap |
| Zhang et al [53] | China, 2018 | Clinical data; 375 | People with cancer receiving radiotherapy | Weight loss | Head and neck tumor location and total radiation dose of ≥70 Gray, and without postsurgery | DT and LR | Random |
| Olling et al [54] | Denmark, 2018 | Clinical and CT image; 131 | People with lung cancer receiving radiotherapy | Odynophagia (painful swallowing) | N/A | Multivariable LR, LASSO and elastic net regularized generalized linear models, and SVM | 10-fold CV |
| Gabryś et al [55] | Germany, 2018 | Clinical and CT image; 153 | People with HNC after radiotherapy | Xerostomia | The parotid gland volume, the spread of the contralateral dose-volume histogram, the parotid gland eccentricity, and sex | LRL1^ak, LRL2^al, LR-EN^am, KNN, SVM, ET^an, and GTB^ao | Single and nested CV |
| Lötsch et al [56] | Germany, 2018 | Clinical data; 1000 | People with breast cancer after surgery | Pain | Age, chronic pain of any type, number of previous operations, BMI, preoperative pain in the area to be operated on, smoking, and psychological factors | Unsupervised ML^ap | Random |
| Abdollahi et al [57] | Iran, 2018 | Clinical and CT image; 47 | People with HNC receiving chemotherapy | Hearing loss | 10 of the 490 radiomic features selected as the associated features with significant sensorineural hearing loss status | Decision stump, Hoeffding, C4.5, NB, AdaBoost, bootstrap aggregating, and LR | 10-fold CV |
| van Dijk et al [58] | United States, 2018 | Clinical data and CT image; 68 | People with HNC | Xerostomia | N/A | LR | External validation |
| Cvetković [59] | Serbia, 2017 | Clinical data; 84 | People with breast cancer | Depression | N/A | ELM^aq, ANN, and Fuzzy Genetic Algorithm | Random |
| van Dijk et al [60] | United States, 2017 | CT image features; 249 | People with HNC | Xerostomia | N/A | LR | 10-fold CV |

^a LR: logistic regression.
^b Italic text in this column indicates the best results used in the study.
^c RF: random forest.
^d GBDT: gradient boosting decision tree.
^e XGB: extreme gradient boosting.
^f ANN: artificial neural network.
^g CV: cross-validation.
^h EN: elastic net.
^i LASSO: least absolute shrinkage and selection operator.
^j RPAR: recursive partitioning and regression trees.
^k SVM: support vector machine.
^l NB: naïve Bayes.
^m N/A: not applicable.
^n MP: multiple perceptron.
^o AdaBoost: adaptive boosting.
^p light GBM: light gradient boosting machine.
^q CT: computed tomography.
^r CNN: convolutional neural network.
^s DT: decision tree.
^t LOOCV: leave-one-out cross-validation.
^u HNC: head and neck cancer.
^v KNN: k-nearest neighbor.
^w CART: classification and regression tree.
^x MLP: multilayer perceptron.
^y ARIMA: autoregressive integrated moving average.
^z LSTM: long short-term memory neural network.
^aa SVR: support vector regression.
^ab GB: gradient boosting.
^ac OLS: ordinary least squares.
^ad RR: ridge regression.
^ae ENR: elastic net regression.
^af DNN: deep neural network.
^ag NP: neuropathic pain.
^ah LS: least squares.
^ai 3D-RCNN: 3D region-based convolutional neural network.
^aj n-CCA: nonlinear canonical correlation analysis.
^ak LRL1: L1 penalized logistic regression.
^al LRL2: L2 penalized logistic regression.
^am LR-EN: logistic regression-elastic net.
^an ET: extra tree.
^ao GTB: gradient tree boosting.
^ap ML: machine learning.
^aq ELM: extreme learning machine.

Two researchers (NZ and NY) extracted data from each study, working independently of each other. This approach was used to reduce bias and increase the accuracy of the data extraction process. Discrepancies between the 2 authors were resolved through discussion or by consulting a third reviewer (SGW).

Primary Database Information

The studies selected were published between 2017 and 2023 and were conducted in North America (18/42, 43%), Asia (16/42, 38%), and Europe (8/42, 19%). Methods of data collection varied, with studies originating from individual centers (23/42, 55%) and multiple centers (19/42, 45%). The average sample size was 1686, and the studies varied in sample size: <100 participants (8/42, 19%), between 100 and 1000 participants (27/42, 64%), and >1000 participants (7/42, 17%). Most studies relied on clinical data (28/42, 67%), although some integrated clinical data with computed tomography (CT) images (14/42, 33%). The study designs were diverse, including retrospective (18/42, 43%), cross-sectional (15/42, 38%), prospective (5/42, 12%), and longitudinal (4/42, 10%) approaches.

Cancer Primary Sites and Predicted Symptoms

Various primary cancer sites were studied, with head and neck cancers being the most prevalent (9/42, 21%). Breast cancer was the focus of 19% (8/42) of the studies, and lung cancer was studied in 17% (3/42) of the cases. The studies included participants undergoing a range of treatments, including chemotherapy (9/42, 21%), radiotherapy (9/42, 21%), and surgery (4/42, 10%), as well as posttreatment survivors (2/42, 5%). Across the 42 included studies, 10 symptoms were reported as outcome variables in at least 2 studies: xerostomia (9/42, 14%) [27,30,34,49-51,55,58,60], depression (8/42, 13%) [22,33,37,41,43,45,52,59], pain (8/42, 13%) [20,25,26,35,37,40,42,56], fatigue (6/42, 10%) [23,24,29,38,42,44], anxiety (3/42, 5%) [33,46,52], sleep disturbance or insomnia (3/42, 5%) [26,28,52], nausea or vomiting (3/42, 5%) [17,26,29], weight loss (2/42, 3%) [47,53], cognitive impairment (2/42, 3%) [21,36], and diarrhea (2/42, 3%) [29,42].

One study reported multiple symptoms, including hypersensitivity [29], stomatitis [29], hand-foot syndrome [29], peripheral neuropathy [29], and constipation [29]. Another study delved into taste and general activity [40]. Individual studies were dedicated to each of the following symptoms: delirium [31], lung infection [32], lymphedema [39], well-being [37], odynophagia [54], social distress [26], spiritual pain [26], dyspnea [26], and hearing loss [57]. The distribution of these symptoms is depicted in Multimedia Appendix 5.

Significant Candidate Predictors of Symptoms

Numerous predictors were frequently used for predicting symptoms, which can be grouped into demographic features and clinical characteristics.

Demographic Features

The demographic features include age, sex, BMI, income, medical insurance, education, marital status, and zip code–level poverty.

Clinical Characteristics

The clinical characteristics include smoking and alcohol use, initial diagnosis, presence of cancer, stage of cancer, cancer course, tumor site, type and number of prior treatments, chemotherapy type, and radiotherapy dose and volume. Health conditions such as comorbidity, diabetes, hypertension, osteoarthritis, and coronary disease also play a significant role. In addition, psychological factors such as depression and anxiety, fatigue, sleep disturbance, and pain are considered. Other influential predictors encompass care fragmentation, polypharmacy, hormone levels, physical activity, diet, heart rate, and social support factors.

The significant predictors identified across the 42 studies are compiled in Figure 2. Here, we detail the predictors for the 4 most frequently reported cancer symptoms identified in this study (xerostomia, pain, depression, and fatigue), each of which has a distinct set of influencing factors.

Figure 2. Significant predictors of individual symptoms.

For xerostomia, age, gender, chemotherapy type, radiotherapy dose and volume, cancer stage, tumor site, and hypertension are crucial predictors. In the case of pain, factors such as age, BMI, smoking and alcohol habits, cancer site and stage, tumor site, diabetes, hypertension, osteoarthritis, coronary disease, physical activity, psychological factors, sleep disorders, and existing pain conditions emerge as significant. Significant predictors for depression include age; gender; education; cancer site and stage; economic factors such as insurance, income, and poverty level; marital status; initial diagnosis impact; comorbidities (diabetes, hypertension, osteoarthritis, and coronary disease); pain; social support; care fragmentation; polypharmacy; and various scale scores. Finally, for fatigue, the key predictors are existing fatigue and low energy, cancer site, sleep disturbances, age, income, education, chemotherapy type, tumor site, comorbidities, hypercholesterolemia, heart rate, hypoproteinemia, physical and psychological factors, pain, adverse drug reaction history, limited social support, Eastern Cooperative Oncology Group score, platelet distribution width, and erythropoiesis.

When examining the commonalities across these predictors for xerostomia, pain, depression, and fatigue, several factors stand out as influential across multiple symptoms: age; gender; cancer site and stage; treatment-related factors such as the type of chemotherapy and radiotherapy, which demonstrate the impact of cancer treatments on symptom development; comorbidities such as diabetes, hypertension, and coronary disease; physical and psychological factors; and socioeconomic factors such as income and education level. These common predictors underscore the complex, multifactorial nature of symptom manifestation in patients with cancer, necessitating a comprehensive approach to their management and care.

ML Algorithms and Validation Metrics

Of the 42 studies analyzed, 7 (17%) used a single ML algorithm, whereas 35 (83%) used multiple algorithms. The most effective models, in terms of performance, were logistic regression (LR; 9/42, 17%), random forest (7/42, 13%), artificial neural networks (5/42, 9%), decision trees (DTs; 5/42, 9%), and extreme gradient boosting (3/42, 6%). For validation methods, 10-fold cross-validation was the most used (14/42, 31%), followed by 5-fold cross-validation (5/42, 11%), 3-fold cross-validation (2/42, 4%), and 8-fold cross-validation (1/42, 2%). The primary evaluation metric across these studies was the area under the curve, which was adopted in 24% (26/42) of the studies. A visual representation of the leading ML models, along with the validation and evaluation metrics used in the studies, is presented in Multimedia Appendix 6.
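To make the validation terminology concrete, the following minimal Python sketch shows k-fold cross-validation combined with the area under the curve. It is an illustration under stated assumptions rather than the procedure of any included study: the data are synthetic, the class-mean scorer stands in for a real model, and the function names (k_fold_indices, auc, fit_class_means, and score) are hypothetical. Predictions are pooled across the held-out folds before a single area under the curve is computed, which is one of several accepted ways to summarize cross-validated performance.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k disjoint, nearly equal folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires both classes")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_class_means(xs, ys):
    """Toy stand-in for a model: the mean feature value of each class."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def score(x, means):
    """Higher when x is closer to the positive-class mean."""
    mean_pos, mean_neg = means
    return (x - mean_neg) ** 2 - (x - mean_pos) ** 2

# Synthetic single-feature data (hypothetical, 10 positives and 10 negatives).
xs = [0.05 * i for i in range(1, 21)]
ys = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Each sample is scored by a model that never saw it during fitting.
out_of_fold = [0.0] * len(xs)
for train, test in k_fold_indices(len(xs), k=5):
    means = fit_class_means([xs[j] for j in train], [ys[j] for j in train])
    for j in test:
        out_of_fold[j] = score(xs[j], means)

pooled_auc = auc(ys, out_of_fold)
print(f"pooled out-of-fold AUC: {pooled_auc:.2f}")
```

Leave-one-out cross-validation is the special case in which k equals the number of samples, and nested cross-validation wraps a second such loop around model selection.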

Principal Findings

In this review, we present the first systematic analysis of ML applications for predicting the development of cancer symptoms. We explore the most frequently studied cancer sites and delve into the intricacies of ML procedures. Breast, head and neck, and lung cancers are the most frequently studied sites in current research, with xerostomia, depression, pain, and fatigue being the most prominent symptoms. The application of various ML techniques is on the rise, with data acquisition and preprocessing being pivotal for successful ML models. While a range of algorithms, from traditional methods such as LR and DT to advanced ones such as DL, are used, there is a growing emphasis on data quality, external validation, and a standardized approach to model evaluation. The future of ML in cancer symptom prediction looks promising, with a need for collaborative efforts among oncologists, data scientists, and patient groups, combined with more comprehensive research on lesser-studied cancer sites and standardized methodologies.

Regarding the cancer sites covered in the studies, breast, head and neck, and lung cancers emerged as the most frequently researched primary cancer sites. The range of symptoms and side effects that patients experienced varied from one study to another. Some symptoms depended on the specific cancer site and the treatments patients received. For example, xerostomia, which can either arise from the tumor itself or manifest as a treatment side effect, has a significant impact on patients’ dental health and compromises antimicrobial functions [61]. However, most symptoms were not directly attributed to a particular cancer site or treatment.

Our review revealed a notable emphasis on predicting xerostomia, which was addressed in 14% (9/42) of the studies, despite head and neck cancers being relatively less prevalent. This emphasis is likely due to advancements in integrating ML with computed tomography (CT) imaging, a pivotal tool in the diagnosis and treatment planning of head and neck cancers. ML techniques applied to CT images can potentially identify patterns and indicators that are not easily discernible by human observers. This capability can lead to earlier and more precise predictions of xerostomia, thereby enabling better preventive measures and treatment planning to mitigate this side effect.

Depression, a widespread emotional challenge for people with cancer [62,63], was the focus of prediction in many studies (8/42, 13%). Similarly, pain, a recurrent concern for palliative care patients [64] and survivors of cancer [65,66], was the subject of prediction in >13% (8/42) of the studies. Fatigue, prevalent across all age groups with cancer [67,68], was highlighted in 6 (10%) of the 42 studies reviewed.

In terms of the ML approaches used in the studies, a wide range of techniques were used to construct the predictive models, spanning all phases of the ML process, from data collection and preprocessing to feature and algorithm selection, model training, testing, and evaluation. Data acquisition is pivotal for the development of ML models, which makes an adequate sample size important. Across the 42 studies, the most frequent sample sizes for ML applications ranged between 100 and 1000 samples. More advanced ML techniques require larger data sets to bolster robustness and mitigate the risk of overfitting. Some studies in our review applied ML to comparatively small data sets, introducing the risk of model overfitting and potential biases in the subsequent performance metrics [69]. Challenges tied to sample size might impede the creation of robust and trustworthy ML models [70]. Data preprocessing is indispensable to yield clean and interpretable data, which is a cornerstone of effective ML models. Data cleaning approaches include addressing missing values, handling noisy data, and normalizing data. Within health care data sets, noisy or missing data are frequently a by-product of inaccuracies in manual entries or instrument recordings made by medical personnel or ancillary staff [71]. However, most of the reviewed studies lacked comprehensive descriptions of their data cleaning methodologies or strategies for handling noisy data and normalization, possibly because of word or page limits in publications.
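
As an illustration of two of the data cleaning steps named above (handling missing values and normalization), the following minimal Python sketch applies mean imputation followed by z-score normalization to a single column. Mean imputation and z-scoring are swapped in here as simple, common choices; the reviewed studies did not necessarily use them, and the `ages` values are hypothetical.

```python
import math

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def z_score(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(v - mean) / std for v in column]

# Hypothetical patient ages with two missing entries.
ages = [54, None, 61, 47, None, 70]
ages_imputed = impute_mean(ages)     # missing ages become the observed mean (58.0)
ages_scaled = z_score(ages_imputed)  # values now have mean 0 and variance 1
```

When a data set is split for validation, the imputation mean and the scaling parameters should be estimated on the training portion only and then applied to the test portion, so that no information leaks from the test data into model development.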

Given the crucial importance of data quality in developing ML models, it is essential for researchers to focus equally on effective data preparation and choosing suitable algorithms. Future endeavors would benefit from exhaustive procedural documentation made available on public platforms such as GitHub. In a research context, GitHub can be used for sharing and collaborating on various aspects of a research project, including but not limited to code. It allows researchers to maintain version control of their scripts, data analysis procedures, and even documentation. This feature is particularly beneficial for replicating studies and verifying results, as it provides a transparent view of the methodologies and analyses used.

Overloading an ML model with excessive features can undermine its ability to differentiate between pertinent data and superfluous noise, leading to the challenge often referred to as the “curse of dimensionality.” The goal of feature engineering is to mitigate model complexity, expedite the training process, reduce the data’s dimensionality, and avert overfitting [72]. Streamlining the model with a curated set of predictors makes it more accessible and transparent, which underscores the importance of feature selection during data preparation. Our review pinpointed the most frequently used significant predictors in cancer symptom prediction. The efficacy of prediction models is heavily influenced by the number and interplay of the relevant predictors. Factors that have consistently featured as determinants in numerous predictive frameworks include age; gender; type and number of previous treatments; cancer location; cancer stage; chemotherapy type; dosage and volume of radiotherapy; chronic conditions such as diabetes and hypertension; concurrent diseases; and symptoms including depression, anxiety, fatigue, pain, and sleep disturbances. Among these, age stood out as a pivotal predictor, associated with the predominant symptoms of depression, pain, xerostomia, and fatigue. While numerous other elements, from gender to type of treatment and cancer stage, influence the predictive models, age consistently emerged as a central predictor across the reviewed studies.
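
One simple way to curate a set of predictors is a filter method that ranks each candidate feature by the strength of its association with the outcome. The sketch below, in Python, ranks features by absolute Pearson correlation and keeps the top k. This technique is chosen only for illustration and is not claimed to be the method of any reviewed study; the feature names and all values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def top_k_features(features, outcome, k):
    """Return the k feature names with the largest absolute correlation to the outcome."""
    scores = {name: abs(pearson(values, outcome)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical predictors for 6 patients and a binary fatigue outcome.
features = {
    "age": [45, 52, 60, 68, 71, 79],
    "prior_treatments": [1, 0, 2, 1, 3, 2],
    "noise": [3, 3, 3, 3, 3, 3],  # constant column, carries no signal
}
fatigue = [0, 0, 1, 1, 1, 1]
selected = top_k_features(features, fatigue, k=2)
```

A univariate filter like this ignores interactions between predictors, so it is best read as a first screening step rather than a complete feature selection strategy.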

Regarding algorithm selection, traditional methods often struggle with handling high-dimensional data and processing extensive information. To tackle these challenges, researchers have increasingly shifted toward newer ML algorithms that are known for their robust predictive power and strong generalization capacities. These algorithms are well suited to discerning intricate interrelationships among variables. Because modeling challenges vary from one data set to another, it is advantageous for researchers to compare a diverse array of ML algorithms. Most studies used multiple predictive models, with techniques such as LR, random forest, artificial neural networks, and DT consistently delivering strong performance. The introduction of advanced ML techniques, such as DL and ensemble classifiers, provides promising opportunities to improve prediction accuracy in future research.

After their design, the ML models undergo training and testing on different data sets. However, these models can suffer from overfitting and underfitting. Overfitting occurs when a model becomes overly complex, which leads to increased variance and poorer performance on new data. In contrast, underfitting results from an oversimplified model, causing it to overlook key data patterns and diminish its predictive capacity. Therefore, the ideal learning model should strike a balance between variance and bias. To mitigate these issues, the common strategy is to divide the data set into training and testing subsets, followed by internal or external validation. While most studies in our review used internal validation, only 1 study reported external validation [58], which was demonstrated on a small cohort of 25 patients with head and neck cancer. Although performance under external validation is typically lower than in evaluations using the original data sets, external validation remains crucial for gauging ML models [72]. It helps ensure that the model’s performance is not limited to the conditions and data it was originally trained on but is also applicable and reliable in broader, real-world clinical settings, across different patient populations.
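
The gap between training and testing performance described above can be sketched with a small synthetic example in Python. The sketch splits the data into training and testing subsets and evaluates a 1-nearest-neighbor classifier, which memorizes its training data and is therefore prone to overfitting. The data, the 20% label noise rate, the 70/30 split, and the choice of classifier are all assumptions made for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, rows):
    """Fraction of rows whose label matches the 1-nearest-neighbor prediction."""
    return sum(predict_1nn(train, x) == y for x, y in rows) / len(rows)

# Synthetic data: the true rule is y = 1 when x > 0.5, with about 20% of labels flipped.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y  # label noise that an overfit model will memorize
    data.append((x, y))

train, test = train_test_split(data)
train_acc = accuracy(train, train)  # the model reproduces every training label, noise included
test_acc = accuracy(train, test)    # held-out estimate, usually noticeably lower
```

Scoring the model on the data it was trained on overstates its performance; the held-out estimate is the more honest one, and an external cohort from a different setting would be a stricter test still.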

Understanding and interpreting ML models continue to pose challenges. Determining the variables that significantly impact symptom prediction can be elusive due to the intricate prediction processes. Many studies gauge the performance of ML models using metrics that examine their ability to distinguish between 2 classes. From our systematic review of 42 studies, the area under the curve emerged as the predominant metric for the prediction models. Other metrics included accuracy, sensitivity, specificity, positive predictive value, root mean square error, and negative predictive value. These metrics provide a holistic view of a model’s efficacy, facilitating its refinement and enabling more precise predictions. However, the diverse emphasis on distinct metrics in numerous studies underscores the need for a uniform approach to evaluating ML models in cancer symptom prediction.
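
For reference, the metrics listed above can be computed as in the following minimal Python sketch. The area under the curve is computed here as the area under the receiver operating characteristic curve, using its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case); this reading of "area under the curve" is an assumption. The labels, scores, and 0.5 threshold are hypothetical.

```python
import math

def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }

def auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))

def rmse(y_true, y_pred):
    """Root mean square error for continuous outcomes such as symptom severity scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical symptom labels (1 = symptom present) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]
metrics = binary_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
```

The threshold-based metrics change with the chosen cutoff, whereas the area under the curve summarizes discrimination across all cutoffs, which is one reason studies reporting different metrics are hard to compare directly.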

As interest grows in using ML for predicting cancer symptoms, there are several areas that merit deeper investigation. A crucial area is broadening the range of studied cancer sites and more comprehensively correlating symptoms with various treatment methods. To fully understand symptom prediction, it is essential that future studies delve into lesser-explored or infrequently studied cancer sites. Furthermore, the methodologies used for data preprocessing and cleaning should be documented more thoroughly, focusing on best practices to ensure data integrity. As data are foundational to ML models, transparent and detailed preprocessing can improve the reliability and repeatability of these models. Although our analysis highlighted common predictors for symptom forecasting, examining potentially underrepresented or emerging indicators could refine these models further. On the algorithmic front, exploring hybrid ML methods that merge the strengths of multiple algorithms might be particularly beneficial for cancer symptom prediction. Standardizing evaluation metrics across studies would also provide clarity and facilitate a more accurate comparison of various ML techniques. To genuinely progress, collaborations among oncologists, data scientists, and patient advocacy groups are vital to ensure that the developed models are technically robust and clinically pertinent. With these insights, ML stands poised to transform cancer care, creating treatment plans based on patient-focused and accurate symptom prediction models.


Limitations

This review is not without its limitations. Although we established clear inclusion and exclusion criteria, potential biases in the studies we analyzed could inherently limit our review. We might have missed or excluded relevant studies due to inadequate information or the absence of keywords in their titles or abstracts. Many of the studies we reviewed did not specify the cancer site, potentially limiting the accuracy and applicability of our findings to specific cancer types. The broad range of predictors used across the studies also made it difficult to draw definitive conclusions about the most influential factors in predicting cancer symptoms using ML algorithms. As such, readers should interpret these results cautiously, given this variability.


Conclusions

ML offers an intriguing potential for predicting cancer symptoms, thereby preemptively mitigating the associated challenges. Predicting the symptoms that people with cancer might experience and determining their onset throughout their treatment journey is a pivotal clinical issue that can enhance patients’ quality of life. Notably, all studies in our review were published in 2017 or later, highlighting the nascent nature of this research area. Our investigation primarily sought to outline the ML methodologies used for symptom prediction in people with cancer. While ML techniques hold an edge over traditional statistical approaches in analyzing vast data sets and comparing the efficacy of diverse prediction models, certain impediments such as a limited pool of symptoms; suboptimal data preparation; challenges in feature engineering; and complexities in ML algorithm design, validation, and evaluation can constrain the broad applicability of these predictive models. Future research should focus on improving the efficacy of ML strategies. This can be achieved by using large, high-quality data sets; adopting innovative technologies for data refinement; and building more refined models. Harnessing ML can potentially free health care practitioners (including doctors, nurses, and clinic personnel) to emphasize the human touch in managing cancer symptoms.


Acknowledgments

The authors would like to express their profound gratitude to their esteemed colleagues and academic mentors. Their invaluable insights, unwavering support, and dedicated time significantly enriched the authors’ research journey. Special acknowledgment is reserved for the myriad of researchers and study participants whose dedication and data have underpinned this comprehensive review. Their collective efforts have made this work possible. The authors extend special thanks to Jennifer Deberg, a specialist at the Hardin Library for the Health Sciences, for her invaluable support in selecting search terms and databases.

Conflicts of Interest

None declared.

Multimedia Appendix 1

The detailed search strategy for the databases and the Boolean expressions used.

DOCX File, 30 KB

Multimedia Appendix 2

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist.

PDF File (Adobe PDF File), 942 KB

Multimedia Appendix 3

The distribution and overlap of 42 studies across the databases.

PNG File, 88 KB

Multimedia Appendix 4

The data extracted from 42 studies.

DOCX File, 31 KB

Multimedia Appendix 5

Number of studies per cancer symptom.

PNG File, 62 KB

Multimedia Appendix 6

Visual overview of the machine learning models and metrics.

PNG File, 74 KB

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. May 2021;71(3):209-249.
  2. Yates P. Symptom management and palliative care for patients with cancer. Nurs Clin North Am. Mar 2017;52(1):179-191.
  3. Abd El-Aziz N, Khallaf S, Abozaid W, Elgohary G, Abd El-Fattah O, Alhawari M, et al. Is it the time to implement the routine use of distress thermometer among Egyptian patients with newly diagnosed cancer? BMC Cancer. Oct 27, 2020;20(1):1033.
  4. Abu-Odah H, Molassiotis A, Yat Wa Liu J. Analysis of the unmet needs of Palestinian advanced cancer patients and their relationship to emotional distress: results from a cross-sectional study. BMC Palliat Care. May 14, 2022;21(1):72.
  5. Martínez Arroyo O, Andreu Vaíllo Y, Martínez López P, Galdón Garrido MJ. Emotional distress and unmet supportive care needs in survivors of breast cancer beyond the end of primary treatment. Support Care Cancer. Mar 9, 2019;27(3):1049-1057.
  6. Saeidzadeh S, Kamalumpundi V, Chi N, Nair R, Gilbertson-White S. Web and mobile-based symptom management interventions for physical symptoms of people with advanced cancer: A systematic review and meta-analysis. Palliat Med. Jun 2021;35(6):1020-1038.
  7. Cleeland CS, Bennett GJ, Dantzer R, Dougherty PM, Dunn AJ, Meyers CA, et al. Are the symptoms of cancer and cancer treatment due to a shared biologic mechanism? A cytokine-immunologic model of cancer symptoms. Cancer. Jun 01, 2003;97(11):2919-2925.
  8. Cleeland CS. Symptom burden: multiple symptoms and their impact as patient-reported outcomes. J Natl Cancer Inst Monogr. 2007;(37):16-21.
  9. Kaasa S, Loge JH, Aapro M, Albreht T, Anderson R, Bruera E, et al. Integration of oncology and palliative care: a Lancet oncology commission. Lancet Oncol. Nov 2018;19(11):e588-e653.
  10. Hunter B, Hindocha S, Lee RW. The role of artificial intelligence in early cancer diagnosis. Cancers (Basel). Mar 16, 2022;14(6):1524.
  11. Ruffle JK, Farmer AD, Aziz Q. Artificial intelligence-assisted gastroenterology- promises and pitfalls. Am J Gastroenterol. Mar 2019;114(3):422-428.
  12. Ahmed H, Soliman H, Elmogy M. Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree. Comput Biol Med. Jul 2022;146:105622.
  13. Abouzari M, Goshtasbi K, Sarna B, Khosravi P, Reutershan T, Mostaghni N, et al. Prediction of vestibular schwannoma recurrence using artificial neural network. Laryngoscope Investig Otolaryngol. Apr 2020;5(2):278-285.
  14. Liu YH, Jin J, Liu YJ. Machine learning-based random forest for predicting decreased quality of life in thyroid cancer patients after thyroidectomy. Support Care Cancer. Mar 2022;30(3):2507-2513.
  15. van de Wiel M, Derijcke S, Galdermans D, Daenen M, Surmont V, De Droogh E, et al. Coping strategy influences quality of life in patients with advanced lung cancer by mediating mood. Clin Lung Cancer. Mar 2021;22(2):e146-e152.
  16. Aafjes-van Doorn K, Kamsteeg C, Bate J, Aafjes M. A scoping review of machine learning in psychotherapy research. Psychother Res. Jan 2021;31(1):92-116.
  17. Mosa AS, Rana MK, Islam H, Hossain AK, Yoo I. A smartphone-based decision support tool for predicting patients at risk of chemotherapy-induced nausea and vomiting: retrospective study on app development using decision tree induction. JMIR Mhealth Uhealth. Dec 02, 2021;9(12):e27024.
  18. Geron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, CA. O'Reilly Media; 2019.
  19. Tam WW, Tang A, Woo B, Goh SY. Perception of the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement of authors publishing reviews in nursing journals: a cross-sectional online survey. BMJ Open. Apr 20, 2019;9(4):e026271.
  20. Sun C, Li M, Lan L, Pei L, Zhang Y, Tan G, et al. Prediction models for chronic postsurgical pain in patients with breast cancer based on machine learning approaches. Front Oncol. 2023;13:1096468.
  21. Xinran Z, Shumei Z, Xueying Z, Linan W, Ying G, Peng W, et al. Construction of a predictive model for cognitive impairment risk in patients with advanced cancer. Int J Nurs Pract. Aug 2023;29(4):e13140.
  22. Shaikh NF, Shen C, LeMasters T, Dwibedi N, Ladani A, Sambamoorthi U. Prescription non-steroidal anti-inflammatory drugs (NSAIDs) and incidence of depression among older cancer survivors with osteoarthritis: a machine learning analysis. Cancer Inform. 2023;22:11769351231165161.
  23. Kober KM, Roy R, Conley Y, Dhruva A, Hammer MJ, Levine J, et al. Prediction of morning fatigue severity in outpatients receiving chemotherapy: less may still be more. Support Care Cancer. Apr 11, 2023;31(5):253.
  24. Du L, Du J, Yang M, Xu Q, Huang J, Tan W, et al. Development and external validation of a machine learning-based prediction model for the cancer-related fatigue diagnostic screening in adult cancer patients: a cross-sectional study in China. Support Care Cancer. Jan 10, 2023;31(2):106.
  25. Moscato S, Orlandi S, Giannelli A, Ostan R, Chiari L. Automatic pain assessment on cancer patients using physiological signals recorded in real-world contexts. Annu Int Conf IEEE Eng Med Biol Soc. Jul 2022;2022:1931-1934.
  26. Masukawa K, Aoyama M, Yokota S, Nakamura J, Ishida R, Nakayama M, et al. Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records. Palliat Med. Sep 2022;36(8):1207-1216.
  27. Fanizzi A, Scognamillo G, Nestola A, Bambace S, Bove S, Comes MC, et al. Transfer learning approach based on computed tomography images for predicting late xerostomia after radiotherapy in patients with oropharyngeal cancer. Front Med (Lausanne). 2022;9:993395.
  28. Ueno T, Ichikawa D, Shimizu Y, Narisawa T, Tsuji K, Ochi E, et al. Comorbid insomnia among breast cancer survivors and its prediction using machine learning: a nationwide study in Japan. Jpn J Clin Oncol. Jan 03, 2022;52(1):39-46.
  29. On J, Park HA, Yoo S. Development of a prediction models for chemotherapy-induced adverse drug reactions: a retrospective observational study using electronic health records. Eur J Oncol Nurs. Feb 2022;56:102066.
  30. Li M, Zhang J, Zha Y, Li Y, Hu B, Zheng S, et al. A prediction model for xerostomia in locoregionally advanced nasopharyngeal carcinoma patients receiving radical radiotherapy. BMC Oral Health. Jun 17, 2022;22(1):239.
  31. Kurisu K, Inada S, Maeda I, Ogawa A, Iwase S, Akechi T, et al. Phase-R Delirium Study Group. A decision tree prediction model for a short-term outcome of delirium in patients with advanced cancer receiving pharmacological interventions: a secondary analysis of a multicenter and prospective observational study (Phase-R). Palliat Support Care. Apr 2022;20(2):153-158.
  32. Guo W, Gao G, Dai J, Sun Q. Prediction of lung infection during palliative chemotherapy of lung cancer based on artificial neural network. Comput Math Methods Med. 2022;2022:4312117.
  33. Baglione AN, Cai L, Bahrini A, Posey I, Boukhechba M, Chow PI. Understanding the relationship between mood symptoms and mobile app engagement among patients with breast cancer using machine learning: case study. JMIR Med Inform. Jun 02, 2022;10(6):e30712.
  34. Chao M, El Naqa I, Bakst RL, Lo Y, Peñagarícano JA. Cluster model incorporating heterogeneous dose distribution of partial parotid irradiation for radiotherapy induced xerostomia prediction with machine learning methods. Acta Oncol. Jul 2022;61(7):842-848.
  35. Wakabayashi K, Koide Y, Aoyama T, Shimizu H, Miyauchi R, Tanaka H, et al. A predictive model for pain response following radiotherapy for treatment of spinal metastases. Sci Rep. Jun 18, 2021;11(1):12908.
  36. Zhou SP, Fei SD, Han HH, Li JJ, Yang S, Zhao CY. A prediction model for cognitive impairment risk in colorectal cancer after chemotherapy treatment. Biomed Res Int. Feb 20, 2021;2021:1-13.
  37. Xuyi W, Seow H, Sutradhar R. Artificial neural networks for simultaneously predicting the risk of multiple co-occurring symptoms among patients with cancer. Cancer Med. Feb 2021;10(3):989-998.
  38. Xu XY, Lu JL, Xu Q, Hua HX, Xu L, Chen L. Risk factors and the utility of three different kinds of prediction models for postoperative fatigue after gastrointestinal tumor surgery. Support Care Cancer. Jan 2021;29(1):203-211.
  39. Wei X, Lu Q, Jin S, Li F, Zhao Q, Cui Y, et al. Developing and validating a prediction model for lymphedema detection in breast cancer survivors. Eur J Oncol Nurs. Oct 2021;54:102023.
  40. Wang Y, Van Dijk L, Mohamed AS, Fuller CD, Zhang X, Marai GE, et al. Predicting late symptoms of head and neck cancer treatment using LSTM and patient reported outcomes. Proc Int Database Eng Appl Symp. Jul 2021:273-279.
  41. Wang X, Eichhorn J, Haq I, Baghal A. Resting-state brain metabolic fingerprinting clusters (biomarkers) and predictive models for major depression in multiple myeloma patients. PLoS One. 2021;16(5):e0251026.
  42. Low CA, Li M, Vega J, Durica KC, Ferreira D, Tam V, et al. Digital biomarkers of symptom burden self-reported by perioperative patients undergoing pancreatic surgery: prospective longitudinal study. JMIR Cancer. Apr 27, 2021;7(2):e27975.
  43. Kourou K, Manikis G, Poikonen-Saksela P, Mazzocco K, Pat-Horenczyk R, Sousa B, et al. A machine learning-based pipeline for modeling medical, socio-demographic, lifestyle and self-reported psychological traits as predictors of mental health outcomes after breast cancer diagnosis: an initial effort to define resilience effects. Comput Biol Med. Apr 2021;131:104266.
  44. Kober KM, Roy R, Dhruva A, Conley YP, Chan RJ, Cooper B, et al. Prediction of evening fatigue severity in outpatients receiving chemotherapy: less may be more. Fatigue. 2021;9(1):14-32.
  45. Hu C, Li Q, Shou J, Zhang F, Li X, Wu M, et al. Constructing a predictive model of depression in chemotherapy patients with Non-Hodgkin's lymphoma to improve medical staffs' psychiatric care. Biomed Res Int. 2021;2021:1-12.
  46. Haun MW, Simon L, Sklenarova H, Zimmermann-Schlegel V, Friederich H, Hartmann M. Predicting anxiety in cancer survivors presenting to primary care - a machine learning approach accounting for physical comorbidity. Cancer Med. Jul 2021;10(14):5001-5016.
  47. Lee SH, Han P, Hales RK, Voong KR, Noro K, Sugiyama S, et al. Multi-view radiomics and dosiomics analysis with machine learning for predicting acute-phase weight loss in lung cancer patients treated with radiotherapy. Phys Med Biol. Sep 28, 2020;65(19):195015.
  48. Juwara L, Arora N, Gornitsky M, Saha-Chaudhuri P, Velly AM. Identifying predictive factors for neuropathic pain after breast cancer surgery using machine learning. Int J Med Inform. Sep 2020;141:104170.
  49. Men K, Geng H, Zhong H, Fan Y, Lin A, Xiao Y. A deep learning model for predicting xerostomia due to radiation therapy for head and neck squamous cell carcinoma in the RTOG 0522 clinical trial. Int J Radiat Oncol Biol Phys. Oct 01, 2019;105(2):440-447.
  50. Jiang W, Lakshminarayanan P, Hui X, Han P, Cheng Z, Bowers M, et al. Machine learning methods uncover radiomorphologic dose patterns in salivary glands that predict xerostomia in patients with head and neck cancer. Adv Radiat Oncol. 2019;4(2):401-412.
  51. Sheikh K, Lee SH, Cheng Z, Lakshminarayanan P, Peng L, Han P, et al. Predicting acute radiation induced xerostomia in head and neck cancer using MR and CT radiomics of parotid and submandibular glands. Radiat Oncol. Jul 29, 2019;14(1):131.
  52. Papachristou N, Puschmann D, Barnaghi P, Cooper B, Hu X, Maguire R, et al. Learning from data to predict future symptoms of oncology patients. PLoS One. Dec 31, 2018;13(12):e0208808.
  53. Zhang Z, Zhu Y, Zhang L, Wang Z, Wan H. Prediction model of critical weight loss in cancer patients during particle therapy. Jpn J Clin Oncol. Jan 01, 2018;48(1):75-81.
  54. Olling K, Nyeng DW, Wee L. Predicting acute odynophagia during lung cancer radiotherapy using observations derived from patient-centred nursing care. Tech Innov Patient Support Radiat Oncol. Mar 2018;5:16-20.
  55. Gabryś HS, Buettner F, Sterzing F, Hauswald H, Bangert M. Design and selection of machine learning methods using radiomics and dosiomics for normal tissue complication probability modeling of xerostomia. Front Oncol. 2018;8:35.
  56. Lötsch J, Sipilä R, Tasmuth T, Kringel D, Estlander A, Meretoja T, et al. Machine-learning-derived classifier predicts absence of persistent pain after breast cancer surgery with high accuracy. Breast Cancer Res Treat. Sep 6, 2018;171(2):399-411.
  57. Abdollahi H, Mostafaei S, Cheraghi S, Shiri I, Rabi Mahdavi S, Kazemnejad A. Cochlea CT radiomics predicts chemoradiotherapy induced sensorineural hearing loss in head and neck cancer patients: a machine learning and multi-variable modelling study. Phys Med. Jan 2018;45:192-197.
  58. van Dijk LV, Thor M, Steenbakkers RJ, Apte A, Zhai T, Borra R, et al. Parotid gland fat related magnetic resonance image biomarkers improve prediction of late radiation-induced xerostomia. Radiother Oncol. Sep 2018;128(3):459-466.
  59. Cvetković J. Breast cancer patients' depression prediction by machine learning approach. Cancer Invest. Sep 14, 2017;35(8):569-572.
  60. van Dijk LV, Brouwer CL, van der Schaaf A, Burgerhof JG, Beukinga RJ, Langendijk JA, et al. CT image biomarkers to improve patient-specific prediction of radiation-induced xerostomia and sticky saliva. Radiother Oncol. Feb 2017;122(2):185-191.
  61. Jensen SB, Vissink A, Limesand KH, Reyland ME. Salivary gland hypofunction and xerostomia in head and neck radiation patients. J Natl Cancer Inst Monogr. Aug 01, 2019;2019(53):lgz016.
  62. Fervaha G, Izard JP, Tripp DA, Rajan S, Leong DP, Siemens DR. Depression and prostate cancer: a focused review for the clinician. Urol Oncol. Apr 2019;37(4):282-288.
  63. Slovacek L, Slovackova B, Slanska I, Petera J, Priester P, Filip S, et al. Depression symptoms and health-related quality of life among patients with metastatic breast cancer in programme of palliative cancer care. Neoplasma. 2009;56(6):467-472.
  64. Webber K, Davies AN, Leach C, Waghorn M. Symptom prevalence and severity in palliative cancer medicine. BMJ Support Palliat Care. Dec 07, 2023;13(e2):e270-e272.
  65. Zomkowski K, Cruz de Souza B, Pinheiro da Silva F, Moreira GM, de Souza Cunha N, Sperandio FF. Physical symptoms and working performance in female breast cancer survivors: a systematic review. Disabil Rehabil. Jun 2018;40(13):1485-1493.
  66. Bean HR, Diggens J, Ftanou M, Weihs KL, Stanton AL, Wiley JF. Insomnia and fatigue symptom trajectories in breast cancer: a longitudinal cohort study. Behav Sleep Med. 2021;19(6):814-827.
  67. Pozzar RA, Hammer MJ, Cooper BA, Kober KM, Chen L, Paul SM, et al. Symptom clusters in patients with gynecologic cancer receiving chemotherapy. Oncol Nurs Forum. Jul 01, 2021;48(4):441-452.
  68. Harris CS, Kober KM, Conley YP, Dhruva AA, Hammer MJ, Miaskowski CA. Symptom clusters in patients receiving chemotherapy: a systematic review. BMJ Support Palliat Care. Mar 2022;12(1):10-21.
  69. Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One. Nov 7, 2019;14(11):e0224365.
  70. Balki I, Amirabadi A, Levman J, Martel AL, Emersic Z, Meden B, et al. Sample-Size determination methodologies for machine learning in medical imaging research: a systematic review. Can Assoc Radiol J. Nov 2019;70(4):344-353.
  71. Rahmani AM, Yousefpoor E, Yousefpoor MS, Mehmood Z, Haider A, Hosseinzadeh M, et al. Machine learning (ML) in medicine: review, applications, and challenges. Mathematics. Nov 21, 2021;9(22):2970.
  72. Cabitza F, Campagner A, Soares F, García de Guadiana-Romualdo L, Challa F, Sulejmani A, et al. The importance of being external. Methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed. Sep 2021;208:106288.

Abbreviations

CT: computed tomography
DL: deep learning
DT: decision tree
LR: logistic regression
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Edited by T de Azevedo Cardoso; submitted 12.09.23; peer-reviewed by Z Su, J Chow, F Alam; comments to author 11.12.23; revised version received 18.01.24; accepted 19.01.24; published 19.03.24.


©Nahid Zeinali, Nayung Youn, Alaa Albashayreh, Weiguo Fan, Stéphanie Gilbertson White. Originally published in JMIR Cancer, 19.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cancer, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.