@Article{info:doi/10.2196/70706, author="Yao, Jiarui and Perova, Zinaida and Mandloi, Tushar and Lewis, Elizabeth and Parkinson, Helen and Savova, Guergana", title="Extracting Knowledge From Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation Study", journal="JMIR Bioinform Biotech", year="2025", month="Jun", day="30", volume="6", pages="e70706", keywords="patient-derived cancer models", keywords="large language models", keywords="knowledge extraction", keywords="in-context learning", keywords="soft prompting", keywords="prompt tuning", keywords="information extraction", abstract="Background: Patient-derived cancer models (PDCMs) have become essential tools in cancer research and preclinical studies. Consequently, the number of publications on PDCMs has increased significantly over the past decade. Advances in artificial intelligence, particularly in large language models (LLMs), offer promising solutions for extracting knowledge from scientific literature at scale. Objective: This study aims to investigate LLM-based systems, focusing specifically on prompting techniques for the automated extraction of PDCM-related entities from scientific texts. Methods: We explore 2 LLM-prompting approaches. The classic method, direct prompting, involves manually designing a prompt. Our direct prompt consists of an instruction, entity-type definitions, gold examples, and a query. In addition, we experiment with a novel and underexplored prompting strategy---soft prompting. Unlike direct prompting, soft prompts are trainable continuous vectors that learn from provided data. We evaluate both prompting approaches across state-of-the-art proprietary and open LLMs. Results: We manually annotated 100 abstracts of PDCM-relevant papers, focusing on PDCM papers with data deposited in the CancerModels.Org platform. 
The resulting gold annotations span 15 entity types for a total of 3313 entity mentions, which we split across training (2089 entities), development (542 entities), and held-out, eye-off test (682 entities) sets. Evaluation includes the standard metrics of precision or positive predictive value, recall or sensitivity, and F1-score (harmonic mean of precision and recall) in 2 settings: an exact match setting, where spans of gold and predicted annotations have to match exactly, and an overlapping match setting, where the spans of gold and predicted annotations have to overlap. GPT4-o with direct prompting achieved F1-scores of 50.48 and 71.36 for exact and overlapping match settings, respectively. In both evaluation settings, LLaMA3 soft prompting improved performance over direct prompting (F1-score from 7.06 to 46.68 in the exact match setting and from 12.0 to 71.80 in the overlapping match setting). Results with LLaMA3 soft prompting are slightly higher than those with GPT4-o direct prompting in the overlapping match evaluation setting. Conclusions: We investigated LLM-prompting techniques for the automatic extraction of PDCM-relevant entities from scientific texts, comparing the traditional direct prompting approach with the emerging soft prompting method. In our experiments, GPT4-o demonstrated strong performance with direct prompting, maintaining competitive results. Meanwhile, soft prompting significantly enhanced the performance of smaller open LLMs. Our findings suggest that training soft prompts on smaller open models can achieve performance levels comparable to those of proprietary very large language models. ", doi="10.2196/70706", url="https://bioinform.jmir.org/2025/1/e70706" } @Article{info:doi/10.2196/70275, author="Virdee, S. Pradeep and Collins, K. Kiana and Smith, Friedemann Claire and Yang, Xin and Zhu, Sufen and Roberts, Nia and Oke, L. Jason and Bankhead, Clare and Perera, Rafael and Hobbs, Richard F. D. and Nicholson, D.
Brian", title="Clinical Prediction Models Incorporating Blood Test Trend for Cancer Detection: Systematic Review, Meta-Analysis, and Critical Appraisal", journal="JMIR Cancer", year="2025", month="Jun", day="27", volume="11", pages="e70275", keywords="blood test", keywords="hematologic tests", keywords="trend", keywords="prediction model", keywords="primary health care", keywords="cancer", keywords="neoplasms", keywords="systematic review", abstract="Background: Blood tests used to identify patients at increased risk of undiagnosed cancer are commonly used in isolation, primarily by monitoring whether results fall outside the normal range. Some prediction models incorporate changes over repeated blood tests (or trends) to improve individualized cancer risk identification, as relevant trends may be confined within the normal range. Objective: Our aim was to critically appraise existing diagnostic prediction models incorporating blood test trends for the risk of cancer. Methods: MEDLINE and EMBASE were searched until April 3, 2025, for diagnostic prediction model studies using blood test trends for cancer risk. Screening was performed by 4 reviewers. Data extraction for each article was performed by 2 reviewers independently. To critically appraise models, we narratively synthesized studies, including model building and validation strategies, model reporting, and the added value of blood test trends. We also reviewed the performance measures of each model, including discrimination and calibration. We performed a random-effects meta-analysis of the c-statistic for a trends-based prediction model if there were at least 3 studies validating the model. The risk of bias was assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). Results: We included 16 articles, with a total of 7 models developed and 14 external validation studies. Of the 7 models derived, full blood count (FBC) trends were most commonly used (86\%, n=6 models).
Cancers modeled were colorectal (43\%, n=3), gastrointestinal (29\%, n=2), non--small cell lung (14\%, n=1), and pancreatic (14\%, n=1). In total, 2 models used logistic regression, 2 used joint modeling, and 1 each used XGBoost, decision trees, and random forests. The number of blood test trends included in the models ranged from 1 to 26. A total of 2 of 4 models were reported with the full set of coefficients needed to predict risk; the remaining models either omitted at least one coefficient from their article or were not publicly accessible. The c-statistic ranged from 0.69 to 0.87 among validation studies. The ColonFlag model, which uses FBC trends, was commonly externally validated, with a pooled c-statistic of 0.81 (95\% CI 0.77-0.85; n=4 studies) for 6-month colorectal cancer risk. Models were often inadequately tested, with only one external validation study assessing model calibration. All 16 studies scored a low risk of bias regarding predictor and outcome details. All but one study scored a high risk of bias in the analysis domain, with most studies removing patients with missing data from the analysis or not adjusting the derived model for overfitting. Conclusions: Our review highlights that blood test trends may inform further investigation for cancer. However, models were not available for most cancer sites, were rarely externally validated, and rarely assessed calibration when they were externally validated. 
Trial Registration: PROSPERO CRD42022348907; https://www.crd.york.ac.uk/PROSPERO/view/CRD42022348907 ", doi="10.2196/70275", url="https://cancer.jmir.org/2025/1/e70275" } @Article{info:doi/10.2196/68898, author="Alhumaidi, Hamad Norah and Dermawan, Doni and Kamaruzaman, Farhana Hanin and Alotaiq, Nasser", title="The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review", journal="JMIR Med Inform", year="2025", month="Jun", day="19", volume="13", pages="e68898", keywords="machine learning", keywords="big data", keywords="real-world data", keywords="disease prediction", keywords="health care management", keywords="real-world evidence", keywords="artificial intelligence", keywords="AI", abstract="Background: Machine learning (ML) and big data analytics are rapidly transforming health care, particularly disease prediction, management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sources, such as electronic health records (EHRs), patient registries, and wearable devices, ML techniques present substantial potential to enhance clinical outcomes. Despite this promise, challenges such as data quality, model transparency, generalizability, and integration into clinical practice persist. Objective: This systematic review aims to examine the use of ML for analyzing RWD in disease prediction and management, identifying the most commonly used ML methods, prevalent disease types, study designs, and the sources of real-world evidence (RWE). It also explores the strengths and limitations of current practices, offering insights for future improvements. Methods: A comprehensive search was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies using ML techniques for analyzing RWD in disease prediction and management. 
The search focused on extracting data on the ML algorithms applied; disease categories studied; types of study designs (eg, clinical trials and cohort studies); and the sources of RWE, including EHRs, patient registries, and wearable devices. Studies published between 2014 and 2024 were included to capture the most recent advances in the field. Results: This review identified 57 studies that met the inclusion criteria, with a total sample size of >150,000 patients. The most frequently applied ML methods were random forest (n=24, 42\%), logistic regression (n=21, 37\%), and support vector machines (n=18, 32\%). These methods were predominantly used for predictive modeling across disease areas, including cardiovascular diseases (n=19, 33\%), cancer (n=9, 16\%), and neurological disorders (n=6, 11\%). RWE was primarily sourced from EHRs, patient registries, and wearable devices. A substantial portion of studies (n=38, 67\%) focused on improving clinical decision-making, patient stratification, and treatment optimization. Among these studies, 14 (25\%) focused on decision-making; 12 (21\%) on health care outcomes, such as quality of life, recovery rates, and adverse events; and 11 (19\%) on survival prediction, particularly in oncology and chronic diseases. For example, random forest models for cardiovascular disease prediction demonstrated an area under the curve of 0.85 (95\% CI 0.81-0.89), while support vector machine models for cancer prognosis achieved an accuracy of 83\% (P=.04). Despite the promising outcomes, many studies (n=34, 60\%) faced challenges related to data quality, model interpretability, and generalizability across diverse patient populations. Conclusions: This systematic review highlights the significant potential of ML and big data analytics in health care, especially for improving disease prediction and management. 
However, to fully realize the benefits of these technologies, future research must focus on addressing the challenges of data quality, enhancing model transparency, and ensuring the broader applicability of ML models across diverse populations and clinical settings. ", doi="10.2196/68898", url="https://medinform.jmir.org/2025/1/e68898" } @Article{info:doi/10.2196/64506, author="Sun, Chengkun and Mobley, Erin and Quillen, Michael and Parker, Max and Daly, Meghan and Wang, Rui and Visintin, Isabela and Awad, Ziad and Fishe, Jennifer and Parker, Alexander and George, Thomas and Bian, Jiang and Xu, Jie", title="Predicting Early-Onset Colorectal Cancer in Individuals Below Screening Age Using Machine Learning and Real-World Data: Case Control Study", journal="JMIR Cancer", year="2025", month="Jun", day="19", volume="11", pages="e64506", keywords="prediction", keywords="machine learning", keywords="ML", keywords="rectal cancer", keywords="colorectal cancer", keywords="CRC", keywords="youth", keywords="adolescent", keywords="middle-aged", keywords="United States", keywords="Americans", keywords="electronic health record", keywords="EHR", keywords="Shapley Additive Explanations", keywords="SHAP", keywords="diagnosis", keywords="prevention and treatment", abstract="Background: Colorectal cancer is now the leading cause of cancer-related deaths among young Americans. Accurate early prediction and a thorough understanding of the risk factors for early-onset colorectal cancer (EOCRC) are vital for effective prevention and treatment, particularly for patients below the recommended screening age. Objective: Our study aims to predict EOCRC using machine learning (ML) and structured electronic health record data for individuals under the screening age of 45 years and to explore potential risk and protective factors that could support early diagnosis. Methods: We identified a cohort of patients under the age of 45 years from the OneFlorida+ Clinical Research Consortium. 
Given the distinct pathology of colon cancer (CC) and rectal cancer (RC), we created separate prediction models for each cancer type with various ML algorithms. We assessed multiple prediction time windows (ie, 0, 1, 3, and 5 y) and ensured robustness through propensity score matching to account for confounding variables, including sex, race, ethnicity, and birth year. We conducted a comprehensive performance evaluation using metrics including area under the curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and F1-score. Both linear (ie, logistic regression and support vector machine) and nonlinear (ie, Extreme Gradient Boosting and random forest) models were assessed to enable rigorous comparison across different classification strategies. In addition, we used Shapley Additive Explanations (SHAP) to interpret the models and identify key risk and protective factors associated with EOCRC. Results: The final cohort included 1358 CC cases with 6790 matched controls, and 560 RC cases with 2800 matched controls. The RC group had a more balanced sex distribution (2:3 male-to-female) compared to the CC group (2:5 male-to-female), and both groups showed diverse racial and ethnic representation. Our predictive models demonstrated reasonable results, with AUC scores for CC prediction of 0.811, 0.748, 0.689, and 0.686 at 0, 1, 3, and 5 years before diagnosis, respectively. For RC prediction, AUC scores were 0.829, 0.771, 0.727, and 0.721 across the same time windows. Key predictive features across both cancer types included immune and digestive system disorders, secondary malignancies, and underweight status. In addition, blood diseases emerged as prominent indicators specifically for CC. Conclusions: Our findings demonstrate the potential of ML models leveraging electronic health record data to facilitate the early prediction of EOCRC in individuals under 45 years. 
By uncovering important risk factors and achieving promising predictive performance, this study provides preliminary insights that could inform future efforts toward earlier detection and prevention in younger populations. ", doi="10.2196/64506", url="https://cancer.jmir.org/2025/1/e64506" } @Article{info:doi/10.2196/71091, author="She, Lizhen and Li, Yunfeng and Wang, Hongyong and Zhang, Jun and Zhao, Yuechen and Cui, Jie and Qiu, Ling", title="Imaging-Based AI for Predicting Lymphovascular Space Invasion in Cervical Cancer: Systematic Review and Meta-Analysis", journal="J Med Internet Res", year="2025", month="Jun", day="16", volume="27", pages="e71091", keywords="artificial intelligence", keywords="uterine cervical neoplasms", keywords="lymphovascular space invasion", keywords="diagnostic performance", keywords="meta-analysis", abstract="Background: The role of artificial intelligence (AI) in enhancing the accuracy of lymphovascular space invasion (LVSI) detection in cervical cancer remains debated. Objective: This meta-analysis aimed to evaluate the diagnostic accuracy of imaging-based AI for predicting LVSI in cervical cancer. Methods: We conducted a comprehensive literature search across multiple databases, including PubMed, Embase, and Web of Science, identifying studies published up to November 9, 2024. Studies were included if they evaluated the diagnostic performance of imaging-based AI models in detecting LVSI in cervical cancer. We used a bivariate random-effects model to calculate pooled sensitivity and specificity with corresponding 95\% confidence intervals. Study heterogeneity was assessed using the I$^2$ statistic. Results: Of 403 studies identified, 16 studies (2514 patients) were included. For the internal validation set, the pooled sensitivity, specificity, and area under the curve (AUC) for detecting LVSI were 0.84 (95\% CI 0.79-0.87), 0.78 (95\% CI 0.75-0.81), and 0.87 (95\% CI 0.84-0.90), respectively. 
For the external validation set, the pooled sensitivity, specificity, and AUC for detecting LVSI were 0.79 (95\% CI 0.70-0.86), 0.76 (95\% CI 0.67-0.83), and 0.84 (95\% CI 0.81-0.87). Using the likelihood ratio test for subgroup analysis, deep learning demonstrated significantly higher sensitivity compared to machine learning (P=.01). Moreover, AI models based on positron emission tomography/computed tomography exhibited superior sensitivity relative to those based on magnetic resonance imaging (P=.01). Conclusions: Imaging-based AI, particularly deep learning algorithms, demonstrates promising diagnostic performance in predicting LVSI in cervical cancer. However, the limited external validation datasets and the retrospective nature of the research may introduce potential biases. These findings underscore AI's potential as an auxiliary diagnostic tool, necessitating further large-scale prospective validation. Trial Registration: PROSPERO CRD42024612008; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024612008 ", doi="10.2196/71091", url="https://www.jmir.org/2025/1/e71091" } @Article{info:doi/10.2196/64399, author="Brnabic, Michael Alan James and Lipkovich, Ilya and Kadziola, Zbigniew and He, Dan and Krein, M. Peter and Hess, M. 
Lisa", title="Next-Generation Sequencing--Based Testing Among Patients With Advanced or Metastatic Nonsquamous Non--Small Cell Lung Cancer in the United States: Predictive Modeling Using Machine Learning Methods", journal="JMIR Cancer", year="2025", month="Jun", day="11", volume="11", pages="e64399", keywords="lung cancer", keywords="NGS testing", keywords="next-generation sequencing", keywords="real-world data", keywords="machine learning", keywords="biomarkers", keywords="predictive modeling", keywords="artificial intelligence", keywords="treatment guidelines", keywords="tumor biomarker", keywords="oncology", abstract="Background: Next-generation sequencing (NGS) has become a cornerstone of treatment for lung cancer and is recommended in current treatment guidelines for patients with advanced or metastatic disease. Objective: This study was designed to use machine learning methods to determine demographic and clinical characteristics of patients with advanced or metastatic non--small cell lung cancer (NSCLC) that may predict the likelihood of receiving NGS-based testing (ever vs never NGS-tested) as well as the timing of testing (early vs late NGS-tested). Methods: This study analyzed deidentified patient-level data from a real-world cohort of patients with advanced or metastatic NSCLC in the United States. Patients with nonsquamous disease who received systemic therapy for NSCLC and had at least 3 months of follow-up data were included. Three strategies (logistic regression, penalized logistic regression using the least absolute shrinkage and selection operator penalty, and extreme gradient boosting with classification trees as base learners) were used to identify predictors of ever versus never and early versus late NGS testing. 
Data were split into D1 (training+validation; 80\%) and D2 (testing; 20\%) sets; the 3 strategies were evaluated by comparing their performance across m=1000 splits of the D1 set into training (70\%) and validation (30\%) data. The final model was selected based on the area under the receiver operating characteristic curve while also considering simplicity and clinical interpretability. Performance was re-estimated using the test data D2. Results: A total of 13,425 patients met the criteria for the ever NGS-tested group, and 17,982 were included in the never NGS-tested group. The area under the receiver operating characteristic curve evaluated on the validation data was similar across all models (77\%-84\%). Among those in the ever NGS-tested group, 84.08\% (n=11,289) were early NGS-tested, and 15.91\% (n=2136) were late NGS-tested. Factors associated with both ever having NGS testing and early NGS testing included later year of NSCLC diagnosis, no smoking history, and evidence of programmed death ligand 1 testing (all P<.05). Factors associated with a greater chance of never receiving NGS testing included older age, lower performance status, Black race, higher number of single-gene tests, public insurance, and treatment in a geography with Molecular Diagnostics Services Program adoption (all P<.05). Conclusions: Predictors of ever versus never as well as early versus late NGS testing in the setting of advanced or metastatic NSCLC were consistent across machine learning methods in this study, demonstrating the ability of these models to identify factors that may predict NGS-based testing. There is a need to ensure that all patients, regardless of age, race, insurance status, and geography (factors associated with lower odds of receiving NGS testing in this study), are provided with equitable access to NGS-based testing. 
", doi="10.2196/64399", url="https://cancer.jmir.org/2025/1/e64399" } @Article{info:doi/10.2196/64000, author="Heudel, Pierre and Ahmed, Mashal and Renard, Felix and Attye, Arnaud", title="Leveraging Digital Twins for Stratification of Patients with Breast Cancer and Treatment Optimization in Geriatric Oncology: Multivariate Clustering Analysis", journal="JMIR Cancer", year="2025", month="May", day="23", volume="11", pages="e64000", keywords="digital twins", keywords="artificial intelligence", keywords="breast cancer", keywords="older adult patients with cancer", keywords="treatment", keywords="geriatric oncology", keywords="geriatric", keywords="oncology", keywords="cancer", keywords="clustering analysis", keywords="therapeutic", keywords="older adult", keywords="elder", keywords="old", keywords="patients with cancer", keywords="decision-making tools", keywords="decision-making", keywords="manifold learning model", keywords="chemotherapy", keywords="comorbidities", keywords="comorbidity", keywords="health care", abstract="Background: Defining optimal adjuvant therapeutic strategies for older adult patients with breast cancer remains a challenge, given that this population is often overlooked and underserved in clinical research and decision-making tools. Objectives: This study aimed to develop a prognostic and treatment guidance tool tailored to older adult patients using artificial intelligence (AI) and a combination of clinical and biological features. Methods: A retrospective analysis was conducted on data from women aged 70+ years with HER2-negative early-stage breast cancer treated at the French L{\'e}on B{\'e}rard Cancer Center between 1997 and 2016. Manifold learning and machine learning algorithms were applied to uncover complex data relationships and develop predictive models. Predictors included age, BMI, comorbidities, hemoglobin levels, lymphocyte counts, hormone receptor status, Scarff-Bloom-Richardson grade, tumor size, and lymph node involvement. 
The dimension reduction technique PaCMAP was used to map patient profiles into a 3D space, allowing comparison with similar cases to estimate prognoses and potential treatment benefits. Results: Out of 1229 initial patients, 793 were included after data refinement. The selected predictors demonstrated high predictive efficacy for 5-year mortality, with mean area under the curve scores of 0.81 for Random Forest Classification and 0.76 for Support Vector Classifier. The tool categorized patients into prognostic clusters and enabled the estimation of treatment outcomes, such as chemotherapy benefits. Unlike traditional models that focus on isolated factors, this AI-based approach integrates multiple clinical and biological features to generate a comprehensive biomedical profile. Conclusions: This study introduces a novel AI-driven prognostic tool for older adult patients with breast cancer, enhancing treatment guidance by leveraging advanced machine learning techniques. The model provides a more nuanced understanding of disease dynamics and therapeutic strategies, emphasizing the importance of personalized oncology care. ", doi="10.2196/64000", url="https://cancer.jmir.org/2025/1/e64000" } @Article{info:doi/10.2196/64697, author="Varma, Gowtham and Yenukoti, Kumar Rohit and Kumar M, Praveen and Ashrit, Sai Bandlamudi and Purushotham, K. and Subash, C. and Ravi, Kumar Sunil and Kurien, Verghese and Aman, Avinash and Manoharan, Mithun and Jaiswal, Shashank and Anand, Akash and Barve, Rakesh and Thiagarajan, Viswanathan and Lenehan, Patrick and Soefje, A. 
Scott and Soundararajan, Venky", title="A Deep Learning--Enabled Workflow to Estimate Real-World Progression-Free Survival in Patients With Metastatic Breast Cancer: Study Using Deidentified Electronic Health Records", journal="JMIR Cancer", year="2025", month="May", day="15", volume="11", pages="e64697", keywords="real-world evidence", keywords="data-driven oncology", keywords="real-world progression-free survival", keywords="metastatic breast cancer", keywords="natural language processing", keywords="NLP", keywords="survival", keywords="cancer", keywords="oncology", keywords="breast", keywords="metastatic", keywords="deep learning", keywords="machine learning", keywords="ML", keywords="workflow", keywords="report", keywords="notes", keywords="electronic health record", keywords="EHR", keywords="documentation", abstract="Background: Progression-free survival (PFS) is a crucial endpoint in cancer drug research. Clinician-confirmed cancer progression, namely real-world PFS (rwPFS) in unstructured text (ie, clinical notes), serves as a reasonable surrogate for real-world indicators in ascertaining progression endpoints. Response Evaluation Criteria in Solid Tumors (RECIST) are traditionally used in clinical trials with serial imaging evaluations but are impractical when working with real-world data. Manual abstraction of clinical progression from unstructured notes remains the gold standard. However, this process is resource-intensive and time-consuming. Natural language processing (NLP), a subdomain of machine learning, has shown promise in accelerating the extraction of tumor progression from real-world data in recent years. Objectives: We aim to configure a pretrained, general-purpose health care NLP framework to transform free-text clinical notes and radiology reports into structured progression events for studying rwPFS in metastatic breast cancer (mBC) cohorts. 
Methods: This study developed and validated a novel semiautomated workflow to estimate rwPFS in patients with mBC using deidentified electronic health record data from the Nference nSights platform. The developed workflow was validated in a cohort of 316 patients with hormone receptor--positive, human epidermal growth factor receptor 2 (HER2)--negative mBC who were started on palbociclib and letrozole combination therapy between January 2015 and December 2021. Ground-truth datasets were curated to evaluate the workflow's performance at both the sentence and patient levels. NLP-captured progression or a change in therapy line were considered outcome events, while death, loss to follow-up, and end of the study period were considered censoring events for rwPFS computation. Peak reduction and cumulative decline in Patient Health Questionnaire-8 (PHQ-8) scores were analyzed in the progressed and nonprogressed patient subgroups. Results: The configured clinical NLP engine achieved a sentence-level progression capture accuracy of 98.2\%. At the patient level, initial progression was captured within {\textpm}30 days with 88\% accuracy. The median rwPFS for the study cohort (N=316) was 20 (95\% CI 18-25) months. In a validation subset (n=100), rwPFS determined by manual curation was 25 (95\% CI 15-35) months, closely aligning with the computational workflow's 22 (95\% CI 15-35) months. A subanalysis revealed rwPFS estimates of 30 (95\% CI 24-39) months from radiology reports and 23 (95\% CI 19-28) months from clinical notes, highlighting the importance of integrating multiple note sources. External validation also demonstrated high accuracy (92.5\% sentence level; 90.2\% patient level). Sensitivity analysis revealed stable rwPFS estimates across varying levels of missing source data and event definitions. Peak reduction in PHQ-8 scores during the study period highlighted significant associations between patient-reported outcomes and disease progression. 
Conclusions: This workflow enables rapid and reliable determination of rwPFS in patients with mBC receiving combination therapy. Further validation across more diverse external datasets and other cancer types is needed to ensure broader applicability and generalizability. ", doi="10.2196/64697", url="https://cancer.jmir.org/2025/1/e64697" } @Article{info:doi/10.2196/63964, author="Mushcab, Hayat and Al Ramis, Mohammed and AlRujaib, Abdulrahman and Eskandarani, Rawan and Sunbul, Tamara and AlOtaibi, Anwar and Obaidan, Mohammed and Al Harbi, Reman and Aljabri, Duaa", title="Application of Artificial Intelligence in Cardio-Oncology Imaging for Cancer Therapy--Related Cardiovascular Toxicity: Systematic Review", journal="JMIR Cancer", year="2025", month="May", day="9", volume="11", pages="e63964", keywords="artificial intelligence", keywords="cardiology", keywords="oncology", keywords="cancer therapy--induced", keywords="cardiotoxicity", keywords="cardiovascular toxicity", keywords="machine learning", keywords="imaging", keywords="radiology", abstract="Background: Artificial intelligence (AI) is a revolutionary tool yet to be fully integrated into several health care sectors, including medical imaging. AI can transform how medical imaging is conducted and interpreted, especially in cardio-oncology. Objective: This study aims to systematically review the available literature on the use of AI in cardio-oncology imaging to predict cardiotoxicity and describe the possible improvement of different imaging modalities that can be achieved if AI is successfully deployed to routine practice. Methods: We conducted a database search in PubMed, Ovid MEDLINE, Cochrane Library, CINAHL, and Google Scholar from inception to 2023 using the AI research assistant tool (Elicit) to search for original studies reporting AI outcomes in adult patients diagnosed with any cancer and undergoing cardiotoxicity assessment. 
Outcomes included incidence of cardiotoxicity, left ventricular ejection fraction, risk factors associated with cardiotoxicity, heart failure, myocardial dysfunction, signs of cancer therapy--related cardiovascular toxicity, echocardiography, and cardiac magnetic resonance imaging. Descriptive information about each study was recorded, including imaging technique, AI model, outcomes, and limitations. Results: The systematic search identified 7 studies, conducted between 2018 and 2023, that were included in this review. Most of these studies were conducted in the United States (71\%), included patients with breast cancer (86\%), and used magnetic resonance imaging as the imaging modality (57\%). The included studies showed an average compliance of 86\% across all sections of the quality assessment tool. In conclusion, this systematic review demonstrates the potential of AI to enhance cardio-oncology imaging for predicting cardiotoxicity in patients with cancer. Conclusions: Our findings suggest that AI can enhance the accuracy and efficiency of cardiotoxicity assessments. However, further research through larger, multicenter trials is needed to validate these applications and refine AI technologies for routine use, paving the way for improved patient outcomes in cancer survivors at risk of cardiotoxicity. 
Trial Registration: PROSPERO CRD42023446135; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023446135 ", doi="10.2196/63964", url="https://cancer.jmir.org/2025/1/e63964" } @Article{info:doi/10.2196/62833, author="Huang, Xiayuan and Ren, Shushun and Mao, Xinyue and Chen, Sirui and Chen, Elle and He, Yuqi and Jiang, Yun", title="Association Between Risk Factors and Major Cancers: Explainable Machine Learning Approach", journal="JMIR Cancer", year="2025", month="May", day="2", volume="11", pages="e62833", keywords="electronic health record", keywords="EHR", keywords="cancer risk modeling", keywords="risk factor analysis", keywords="explainable machine learning", keywords="machine learning", keywords="ML", keywords="risk factor", keywords="major cancers", keywords="monitoring", keywords="cancer risk", keywords="breast cancer", keywords="colorectal cancer", keywords="lung cancer", keywords="prostate cancer", keywords="cancer patients", keywords="clinical decision-making", abstract="Background: Cancer is a life-threatening disease and a leading cause of death worldwide, with an estimated 611,000 deaths and over 2 million new cases in the United States in 2024. The rising incidence of major cancers, including among younger individuals, highlights the need for early screening and monitoring of risk factors to manage and decrease cancer risk. Objective: This study aimed to leverage explainable machine learning models to identify and analyze the key risk factors associated with breast, colorectal, lung, and prostate cancers. By uncovering significant associations between risk factors and these major cancer types, we sought to enhance the understanding of cancer diagnosis risk profiles. Our goal was to facilitate more precise screening, early detection, and personalized prevention strategies, ultimately contributing to better patient outcomes and promoting health equity. 
Methods: Deidentified electronic health record data from the Medical Information Mart for Intensive Care (MIMIC)-III database were used to identify patients with 4 types of cancer who had longitudinal hospital visits prior to their cancer diagnosis. Their records were matched and combined with those of patients without cancer diagnoses using propensity scores based on demographic factors. Three models, namely penalized logistic regression, random forest, and multilayer perceptron (MLP), were used to rank risk factors for each cancer type, with feature importance analysis for the random forest and MLP models. Rank-biased overlap was used to compare the similarity of ranked risk factors across cancer types. Results: Our framework evaluated the prediction performance of explainable machine learning models, with the MLP model demonstrating the best performance. It achieved an area under the receiver operating characteristic curve of 0.78 for breast cancer (n=58), 0.76 for colorectal cancer (n=140), 0.84 for lung cancer (n=398), and 0.78 for prostate cancer (n=104), outperforming other baseline models (P<.001). In addition to demographic risk factors, the most prominent nontraditional risk factors overlapped across models and cancer types, including hyperlipidemia (odds ratio [OR] 1.14, 95\% CI 1.11-1.17; P<.01), diabetes (OR 1.34, 95\% CI 1.29-1.39; P<.01), depressive disorders (OR 1.11, 95\% CI 1.06-1.16; P<.01), heart diseases (OR 1.42, 95\% CI 1.32-1.52; P<.01), and anemia (OR 1.22, 95\% CI 1.14-1.30; P<.01). The similarity analysis indicated that lung cancer had a risk factor pattern distinct from those of the other cancer types. Conclusions: The study's findings demonstrated the effectiveness of explainable ML models in assessing nontraditional risk factors for major cancers and highlighted the importance of considering unique risk profiles for different cancer types. 
Moreover, this research served as a hypothesis-generating foundation, providing preliminary results for future investigation into cancer diagnosis risk analysis and management. Furthermore, expanding collaboration with clinical experts for external validation would be essential to refine model outputs, integrate findings into practice, and enhance their impact on patient care and cancer prevention efforts. ", doi="10.2196/62833", url="https://cancer.jmir.org/2025/1/e62833" } @Article{info:doi/10.2196/66189, author="Li, Hui and Yao, Haiyang and Gao, Yuxiang and Luo, Hang and Cai, Changbin and Zhou, Zhou and Yuan, Muhan and Jiang, Wei", title="Identification of Major Bleeding Events in Postoperative Patients With Malignant Tumors in Chinese Electronic Medical Records: Algorithm Development and Validation", journal="JMIR Form Res", year="2025", month="May", day="1", volume="9", pages="e66189", keywords="machine learning", keywords="electronic medical record", keywords="postoperative patients with malignant tumors", keywords="postoperative bleeding", keywords="tumor surgery", keywords="abdominal", abstract="Background: Postoperative bleeding is a serious complication following abdominal tumor surgery, but it is often not clearly diagnosed and documented in clinical practice in China. Previous studies have relied on manual interpretation of medical records to determine the presence of postoperative bleeding in patients, which is time-consuming and laborious. More critically, this manual approach severely hinders the efficient analysis of large volumes of medical data, impeding in-depth research into the incidence patterns and risk factors of postoperative bleeding. It remains unclear whether machine learning can play a role in processing large volumes of medical text to identify postoperative bleeding effectively. 
Objective: This study aimed to develop a machine learning tool for identifying postoperative patients with major bleeding based on the electronic medical record system. Methods: This study used data available from the National Health and Medical Big Data (Eastern) Center in Jiangsu Province of China. We randomly selected from the database the medical records of 2000 patients who underwent in-hospital tumor resection surgery between January 2018 and December 2021. Physicians manually classified each note as present or absent for a major bleeding event during the postoperative hospital stay. Feature engineering involved bleeding expressions, high-frequency related expressions, and quantitative logical judgment, resulting in 270 features. Logistic regression (LR), K-nearest neighbor (KNN), and convolutional neural network (CNN) models were developed and trained using the 1600-note training set. The main outcomes were accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each model. Results: Major bleeding was present in 4.31\% (69/1600) of the training set and 4.75\% (19/400) of the test set. In the test set, the LR method achieved an accuracy of 0.8275, a sensitivity of 0.8947, a specificity of 0.8241, a PPV of 0.2024, an NPV of 0.9937, and an F1-score of 0.3301. The CNN method demonstrated an accuracy of 0.8900, sensitivity of 0.8421, specificity of 0.8924, PPV of 0.2807, NPV of 0.9913, and an F1-score of 0.4211. While the KNN method showed a high specificity of 0.9948 and an accuracy of 0.9575 in the test set, its sensitivity was notably low at 0.2105. The C-statistic was 0.9018 for the LR method and 0.8830 for the CNN method. Conclusions: Both the LR and CNN methods demonstrate good performance in identifying major bleeding in postoperative patients with malignant tumors from electronic medical records, exhibiting high sensitivity and specificity. 
Given the higher sensitivity of the LR method (89.47\%) and the higher specificity of the CNN method (89.24\%) in the test set, both models hold promise for practical application, depending on specific clinical priorities. ", doi="10.2196/66189", url="https://formative.jmir.org/2025/1/e66189" } @Article{info:doi/10.2196/69864, author="Jin, Yudi and Zhao, Min and Su, Tong and Fan, Yanjia and Ouyang, Zubin and Lv, Fajin", title="Comparing Random Survival Forests and Cox Regression for Nonresponders to Neoadjuvant Chemotherapy Among Patients With Breast Cancer: Multicenter Retrospective Cohort Study", journal="J Med Internet Res", year="2025", month="Apr", day="8", volume="27", pages="e69864", keywords="breast cancer", keywords="neoadjuvant chemotherapy", keywords="pathological complete response", keywords="survival risk", keywords="random survival forest", abstract="Background: Breast cancer is one of the most common malignancies among women worldwide. Patients who do not achieve a pathological complete response (pCR) or a clinical complete response (cCR) post--neoadjuvant chemotherapy (NAC) typically have a worse prognosis compared to those who do achieve these responses. Objective: This study aimed to develop and validate a random survival forest (RSF) model to predict survival risk in patients with breast cancer who do not achieve a pCR or cCR post-NAC. Methods: We analyzed patients with no pCR/cCR post-NAC treated at the First Affiliated Hospital of Chongqing Medical University from January 2019 to 2023, with external validation in Duke University and Surveillance, Epidemiology, and End Results (SEER) cohorts. RSF and Cox regression models were compared using the time-dependent area under the curve (AUC), the concordance index (C-index), and risk stratification. Results: The study cohort included 306 patients with breast cancer, with most aged 40-60 years (204/306, 66.7\%). 
The majority had invasive ductal carcinoma (290/306, 94.8\%), with estrogen receptor (ER)+ (182/306, 59.5\%), progesterone receptor (PR)-- (179/306, 58.5\%), and human epidermal growth factor receptor 2 (HER2)+ (94/306, 30.7\%) profiles. Most patients presented with T2 (185/306, 60.5\%), N1 (142/306, 46.4\%), and M0 (295/306, 96.4\%) staging (TNM meaning ``tumor, node, metastasis''), with 17.6\% (54/306) experiencing disease progression during a median follow-up of 25.9 months (IQR 17.2-36.3). External validation using Duke (N=94) and SEER (N=2760) cohorts confirmed consistent patterns in age (40-60 years: 59/94, 63\%, vs 1480/2760, 53.6\%), HER2+ rates (26/94, 28\%, vs 935/2760, 33.9\%), and invasive ductal carcinoma prevalence (89/94, 95\%, vs 2506/2760, 90.8\%). In the internal cohort, the RSF achieved significantly higher time-dependent AUCs than Cox regression at 1 year (0.811 vs 0.763), 3 years (0.834 vs 0.783), and 5 years (0.810 vs 0.771) (overall C-index: 0.803, 95\% CI 0.747-0.859, vs 0.736, 95\% CI 0.673-0.799). External validation confirmed robust generalizability: the Duke cohort showed 1-, 3-, and 5-year AUCs of 0.912, 0.803, and 0.776, respectively, while the SEER cohort maintained consistent performance with AUCs of 0.771, 0.729, and 0.702, respectively. Risk stratification using the RSF classified 25.8\% (79/306) of patients as high risk, and these patients had significantly shorter survival (P<.001). Notably, the RSF maintained improved net benefits across decision thresholds in decision curve analysis (DCA); similar results were observed in the external validation cohorts. The RSF model also showed promising performance across different molecular subtypes in all datasets. Based on the RSF predicted scores, patients were stratified into high- and low-risk groups, with notably poorer survival outcomes observed in the high-risk group compared to the low-risk group. 
Conclusions: The RSF model, based solely on clinicopathological variables, provides a promising tool for identifying high-risk patients with breast cancer post-NAC. This approach may facilitate personalized treatment strategies and improve patient management in clinical practice. ", doi="10.2196/69864", url="https://www.jmir.org/2025/1/e69864" } @Article{info:doi/10.2196/65645, author="Goes Job, Eduarda Maria and Fukumasu, Heidge and Malta, Maistro Tathiane and Porfirio Xavier, Luiz Pedro", title="Investigating Associations Between Prognostic Factors in Gliomas: Unsupervised Multiple Correspondence Analysis", journal="JMIR Bioinform Biotech", year="2025", month="Mar", day="12", volume="6", pages="e65645", keywords="brain tumors", keywords="bioinformatics", keywords="stemness", keywords="multiple correspondence analysis", abstract="Background: Multiple correspondence analysis (MCA) is an unsupervised data science methodology that aims to identify and represent associations between categorical variables. Gliomas are an aggressive type of cancer characterized by diverse molecular and clinical features that serve as key prognostic factors. Thus, advanced computational approaches are essential to enhance the analysis and interpretation of the associations between clinical and molecular features in gliomas. Objective: This study aims to apply MCA to identify associations between glioma prognostic factors and also explore their associations with the stemness phenotype. Methods: Clinical and molecular data from 448 patients with brain tumors were obtained from the Cancer Genome Atlas. The DNA methylation stemness index, derived from DNA methylation patterns, was built using a one-class logistic regression. Associations between variables were evaluated using the $\chi^2$ test with k degrees of freedom, followed by analysis of the adjusted standardized residuals (ASRs >1.96 indicate a significant association between variables). 
MCA was used to uncover associations between glioma prognostic factors and stemness. Results: Our analysis revealed significant associations among molecular and clinical characteristics in gliomas. Additionally, we demonstrated the capability of MCA to identify associations between stemness and these prognostic factors. Our results showed a strong association between a higher DNA methylation stemness index and features related to poorer prognosis, such as glioblastoma cancer type (ASR: 8.507), grade 4 (ASR: 8.507), isocitrate dehydrogenase wild type (ASR: 15.904), unmethylated MGMT (methylguanine methyltransferase) promoter (ASR: 9.983), and telomerase reverse transcriptase expression (ASR: 3.351), demonstrating the utility of MCA as an analytical tool for elucidating potential prognostic factors. Conclusions: MCA is a valuable tool for understanding the complex interdependence of prognostic markers in gliomas. MCA facilitates the exploration of large-scale datasets and enhances the identification of significant associations. 
", doi="10.2196/65645", url="https://bioinform.jmir.org/2025/1/e65645" } @Article{info:doi/10.2196/64364, author="Berman, Eliza and Sundberg Malek, Holly and Bitzer, Michael and Malek, Nisar and Eickhoff, Carsten", title="Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study", journal="J Med Internet Res", year="2025", month="Mar", day="5", volume="27", pages="e64364", keywords="large language models", keywords="retrieval augmented generation", keywords="LLaMA", keywords="precision oncology", keywords="molecular tumor board", keywords="molecular tumor", keywords="LLMs", keywords="augmented therapy", keywords="MTB", keywords="oncology", keywords="tumor", keywords="clinical trials", keywords="patient care", keywords="treatment", keywords="evidence-based", keywords="accessibility to care", abstract="Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology. Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatment recommendations for MTB patients. We specifically investigated the LLMs' ability to generate evidence-based treatment recommendations using PubMed references. Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses. Results: A total of 75\% of the referenced articles were properly cited from PubMed, 17\% were hallucinations, and the remainder were not properly cited from PubMed. 
LLM queries written by clinicians achieved higher accuracy, as judged by clinician evaluation, than automated queries, with clinicians labeling 25\% of LLM responses as equal to their recommendations and 37.5\% as alternative plausible treatments. Conclusions: This study demonstrates how retrieval augmented generation--enhanced LLMs can be a powerful tool in accelerating MTB conferences, as LLMs are sometimes capable of achieving clinician-equal treatment recommendations. However, further investigation is required to achieve stable results with zero hallucinations. LLMs represent a scalable solution to the time-intensive process of MTB investigations. However, LLM performance shows that they must be used with heavy clinician supervision and cannot yet fully automate the MTB pipeline. ", doi="10.2196/64364", url="https://www.jmir.org/2025/1/e64364", url="http://www.ncbi.nlm.nih.gov/pubmed/40053768" } @Article{info:doi/10.2196/62851, author="Fu, Yao and Huang, Zongyao and Deng, Xudong and Xu, Linna and Liu, Yang and Zhang, Mingxing and Liu, Jinyi and Huang, Bin", title="Artificial Intelligence in Lymphoma Histopathology: Systematic Review", journal="J Med Internet Res", year="2025", month="Feb", day="14", volume="27", pages="e62851", keywords="lymphoma", keywords="artificial intelligence", keywords="bias", keywords="histopathology", keywords="tumor", keywords="hematological", keywords="lymphatic disease", keywords="public health", keywords="pathologists", keywords="pathology", keywords="immunohistochemistry", keywords="diagnosis", keywords="prognosis", abstract="Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. 
Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included the use of AI on human lymphoma tissue pathology images for diagnosis, prognosis, gene mutation prediction, and related tasks. The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all articles presented an unclear risk of bias in at least one domain, most frequently participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. 
In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. Across the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3\% to 100\%. In models with external validation results, the AUC ranged from 0.93 to 0.99. Conclusions: From a methodological perspective, all models exhibited biases. Improving the accuracy of AI models and accelerating their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. 
", doi="10.2196/62851", url="https://www.jmir.org/2025/1/e62851" } @Article{info:doi/10.2196/66269, author="Shan, Rui and Li, Xin and Chen, Jing and Chen, Zheng and Cheng, Yuan-Jia and Han, Bo and Hu, Run-Ze and Huang, Jiu-Ping and Kong, Gui-Lan and Liu, Hui and Mei, Fang and Song, Shi-Bing and Sun, Bang-Kai and Tian, Hui and Wang, Yang and Xiao, Wu-Cai and Yao, Xiang-Yun and Ye, Jing-Ming and Yu, Bo and Yuan, Chun-Hui and Zhang, Fan and Liu, Zheng", title="Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review", journal="JMIR Cancer", year="2025", month="Feb", day="10", volume="11", pages="e66269", keywords="follicular thyroid neoplasm", keywords="machine learning", keywords="prediction model", keywords="malignancy", keywords="unbalanced data", keywords="literature review", abstract="Background: Diagnosing and managing follicular thyroid neoplasms (FTNs) remains a significant challenge, as the malignancy risk cannot be determined until after diagnostic surgery. Objective: We aimed to use interpretable machine learning to predict the malignancy risk of FTNs preoperatively in a real-world setting. Methods: We conducted a retrospective cohort study at the Peking University Third Hospital in Beijing, China. Patients with postoperative pathological diagnoses of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) were included, excluding those without preoperative thyroid ultrasonography. We used 22 predictors involving demographic characteristics, thyroid sonography, and hormones to train 5 machine learning models: logistic regression, least absolute shrinkage and selection operator regression, random forest, extreme gradient boosting, and support vector machine. The optimal model was selected based on discrimination, calibration, interpretability, and parsimony. 
To address the highly imbalanced data (FTA:FTC ratio>5:1), model discrimination was assessed using both the area under the receiver operating characteristic curve and the area under the precision-recall curve (AUPRC). To interpret the model, we used Shapley Additive Explanations values and partial dependence and individual conditional expectation plots. Additionally, a systematic review was performed to synthesize existing evidence and validate the discrimination ability of the previously developed Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria to differentiate between benign and malignant FTNs using our data. Results: The cohort included 1539 patients (mean age 47.98, SD 14.15 years; female: n=1126, 73.16\%) with 1672 FTN tumors (FTA: n=1414; FTC: n=258; FTA:FTC ratio=5.5). The random forest model emerged as optimal, identifying mean thyroid-stimulating hormone (TSH) score, mean tumor diameter, mean TSH, TSH instability, and TSH measurement levels as the top 5 predictors in discriminating FTA from FTC, with the area under the receiver operating characteristic curve of 0.79 (95\% CI 0.77-0.81) and AUPRC of 0.40 (95\% CI 0.37-0.44). Malignancy risk increased nonlinearly with larger tumor diameters and higher TSH instability but decreased nonlinearly with higher mean TSH scores or mean TSH levels. FTCs with small sizes (mean diameter 2.88, SD 1.38 cm) were more likely to be misclassified as FTAs compared to larger ones (mean diameter 3.71, SD 1.36 cm). The systematic review of the 7 included studies revealed that (1) the FTA:FTC ratio varied from 0.6 to 4.0, lower than the natural distribution of 5.0; (2) no studies assessed prediction performance using AUPRC in unbalanced datasets; and (3) external validations of Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria underperformed relative to the original study. 
Conclusions: Tumor size and TSH measurements were important in screening FTN malignancy risk preoperatively, but accurately predicting the risk of small-sized FTNs remains challenging. Future research should address the limitations posed by the extreme imbalance in FTA and FTC distributions in real-world data. ", doi="10.2196/66269", url="https://cancer.jmir.org/2025/1/e66269" } @Article{info:doi/10.2196/58760, author="Li, Yanong and He, Yixuan and Liu, Yawei and Wang, Bingchen and Li, Bo and Qiu, Xiaoguang", title="Identification of Intracranial Germ Cell Tumors Based on Facial Photos: Exploratory Study on the Use of Deep Learning for Software Development", journal="J Med Internet Res", year="2025", month="Jan", day="30", volume="27", pages="e58760", keywords="deep learning", keywords="facial recognition", keywords="intracranial germ cell tumors", keywords="endocrine indicators", keywords="software development", keywords="artificial intelligence", keywords="machine learning models", keywords="software engineering", keywords="neural networks", keywords="algorithms", keywords="cohort studies", abstract="Background: Primary intracranial germ cell tumors (iGCTs) are highly malignant brain tumors that predominantly occur in children and adolescents, with an incidence rate ranking third among primary brain tumors in East Asia (8\%-15\%). Due to their insidious onset and impact on critical functional areas of the brain, these tumors often result in irreversible abnormalities in growth and development, as well as cognitive and motor impairments in affected children. Therefore, early diagnosis through advanced screening techniques is vital for improving patient outcomes and quality of life. Objective: This study aimed to investigate the application of facial recognition technology in the early detection of iGCTs in children and adolescents. 
Methods: A multicenter, phased approach was adopted for the development and validation of a deep learning model, GVisageNet, dedicated to distinguishing midline brain tumors from normal controls (NCs) and iGCTs from other midline brain tumors. The study comprised the collection and division of datasets into training (n=847, iGCTs=358, NCs=300, other midline brain tumors=189) and testing (n=212, iGCTs=79, NCs=70, other midline brain tumors=63), with an additional independent validation dataset (n=336, iGCTs=130, NCs=100, other midline brain tumors=106) sourced from 4 medical institutions. A regression model using clinically relevant, statistically significant data was developed and combined with GVisageNet outputs to create a hybrid model. This integration sought to assess the incremental value of clinical data. The model's predictive mechanisms were explored through correlation analyses with endocrine indicators and stratified evaluations based on the degree of hypothalamic-pituitary-target axis damage. Performance metrics included area under the curve (AUC), accuracy, sensitivity, and specificity. Results: On the independent validation dataset, GVisageNet achieved an AUC of 0.938 (P<.01) in distinguishing midline brain tumors from NCs. Further, GVisageNet demonstrated significant diagnostic capability in distinguishing iGCTs from the other midline brain tumors, achieving an AUC of 0.739, which is superior to the regression model alone (AUC=0.632, P<.001) but less than the hybrid model (AUC=0.789, P=.04). Significant correlations were found between GVisageNet's outputs and 7 endocrine indicators. Performance varied with the degree of hypothalamic-pituitary-target axis damage, offering further insight into the working mechanism of GVisageNet. 
Conclusions: GVisageNet, capable of high accuracy both independently and with clinical data, shows substantial potential for the early detection of iGCTs, highlighting the importance of combining deep learning with clinical insights for personalized health care. ", doi="10.2196/58760", url="https://www.jmir.org/2025/1/e58760" } @Article{info:doi/10.2196/57275, author="Yamagishi, Yosuke and Nakamura, Yuta and Hanaoka, Shouhei and Abe, Osamu", title="Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation", journal="JMIR Cancer", year="2025", month="Jan", day="23", volume="11", pages="e57275", keywords="radiology reports", keywords="clustering", keywords="large language model", keywords="natural language processing", keywords="information extraction", keywords="lung cancer", keywords="machine learning", abstract="Background: The application of natural language processing in medicine has increased significantly, including tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging. However, most publicly available medical datasets are in English, with limited resources in other languages. This scarcity poses a challenge for the development of models geared toward non-English downstream tasks. Objective: This study aimed to develop and evaluate an algorithm that uses large language models (LLMs) to extract information from Japanese lung cancer radiology reports and perform clustering analysis. The effectiveness of this approach was assessed and compared with previous supervised methods. 
Methods: This study employed the MedTxt-RR dataset, comprising 135 Japanese radiology reports from 9 radiologists who interpreted the computed tomography images of 15 patients with lung cancer obtained from Radiopaedia. Previously used in the clustering competition of the NTCIR-16 (NII Testbeds and Community for Information Access Research) shared task, this dataset was ideal for comparing the clustering ability of our algorithm with those of previous methods. The dataset was split into 8 cases for development and 7 for testing. The study's approach involved using the LLM to extract information pertinent to lung cancer findings and transforming it into numeric features for clustering with the K-means method. Performance was evaluated using 135 reports for information extraction accuracy and 63 test reports for clustering performance. This study focused on the accuracy of automated systems for extracting tumor size, location, and laterality from clinical reports. The clustering performance was evaluated using normalized mutual information, adjusted mutual information, and the Fowlkes-Mallows index for both the development and test data. Results: The tumor size was accurately identified in 99 of 135 reports (73.3\%), with errors in 36 reports (26.7\%), primarily due to missing or incorrect size information. Tumor location and laterality were identified more accurately, in 112 of 135 reports (83\%); however, 23 reports (17\%) contained errors, mainly due to empty values or incorrect data. Clustering performance on the test data yielded a normalized mutual information of 0.6414, adjusted mutual information of 0.5598, and Fowlkes-Mallows index of 0.5354. The proposed method demonstrated superior performance across all evaluation metrics compared to previous methods. Conclusions: The unsupervised LLM approach surpassed the existing supervised methods in clustering Japanese radiology reports. 
These findings suggest that LLMs hold promise for extracting information from radiology reports and integrating it into disease-specific knowledge structures. ", doi="10.2196/57275", url="https://cancer.jmir.org/2025/1/e57275" } @Article{info:doi/10.2196/59480, author="Gopukumar, Deepika and Menon, Nirup and Schoen, W. Martin", title="Medication Prescription Policy for US Veterans With Metastatic Castration-Resistant Prostate Cancer: Causal Machine Learning Approach", journal="JMIR Med Inform", year="2024", month="Nov", day="19", volume="12", pages="e59480", keywords="prostate cancer", keywords="metastatic castration resistant prostate cancer", keywords="causal survival forest", keywords="machine learning", keywords="heterogeneity", keywords="prescription policy tree", keywords="oncology", keywords="pharmacology", abstract="Background: Prostate cancer is the second leading cause of cancer death among American men. If detected and treated at an early stage, prostate cancer is often curable. However, an advanced stage such as metastatic castration-resistant prostate cancer (mCRPC) carries a high risk of mortality. Multiple treatment options exist, the most common of which are docetaxel, abiraterone, and enzalutamide. Docetaxel is a cytotoxic chemotherapy, whereas abiraterone and enzalutamide are androgen receptor pathway inhibitors (ARPIs). ARPIs are preferred over docetaxel due to their lower toxicity. No study has used machine learning with patients' demographics, test results, and comorbidities to identify heterogeneous treatment rules that might improve the survival duration of patients with mCRPC. Objective: This study aimed to measure patient-level heterogeneity in the association of medication prescribed with overall survival duration (in the form of follow-up days) and arrive at a set of medication prescription rules using patient demographics, test results, and comorbidities. 
Methods: We excluded patients with mCRPC who received docetaxel, cabazitaxel, mitoxantrone, or sipuleucel-T either before or after the prescription of an ARPI. We included only the African American and white populations. In total, 2886 identified veterans treated for mCRPC who were prescribed either abiraterone or enzalutamide as the first line of treatment from 2014 to 2017, with follow-up until 2020, were analyzed. We used causal survival forests for analysis. The unit of analysis was the patient. The primary outcome of this study was follow-up days indicating survival duration while on the first-line medication. After estimating the treatment effect, a prescription policy tree was constructed. Results: For 2886 veterans, enzalutamide was associated with an average of 59.94 (95\% CI 35.60-84.28) more days of survival than abiraterone. The increase in overall survival duration for the 2 drugs varied across patient demographics, test results, and comorbidities. Two data-driven subgroups of patients were identified by ranking them on their augmented inverse-propensity weighted (AIPW) scores. The average AIPW scores for the 2 subgroups were 19.36 (95\% CI --16.93 to 55.65) and 100.68 (95\% CI 62.46-138.89). Based on visualization and a t test, the difference in AIPW scores between the low and high subgroups was significant (P=.003), thereby supporting heterogeneity. The analysis resulted in a set of prescription rules for the 2 ARPIs based on a few covariates available to the physicians at the time of prescription. Conclusions: This study of 2886 veterans showed evidence of heterogeneity and that survival days may be improved for certain patients with mCRPC based on the medication prescribed. Findings suggest that prescription rules based on the patient characteristics, laboratory test results, and comorbidities available to the physician at the time of prescription could improve survival by providing personalized treatment decisions. 
", doi="10.2196/59480", url="https://medinform.jmir.org/2024/1/e59480" } @Article{info:doi/10.2196/60323, author="Janbain, Ali and Farolfi, Andrea and Guenegou-Arnoux, Armelle and Romengas, Louis and Scharl, Sophia and Fanti, Stefano and Serani, Francesca and Peeken, C. Jan and Katsahian, Sandrine and Strouthos, Iosif and Ferentinos, Konstantinos and Koerber, A. Stefan and Vogel, E. Marco and Combs, E. Stephanie and Vrachimis, Alexis and Morganti, Giuseppe Alessio and Spohn, KB Simon and Grosu, Anca-Ligia and Ceci, Francesco and Henkenberens, Christoph and Kroeze, GC Stephanie and Guckenberger, Matthias and Belka, Claus and Bartenstein, Peter and Hruby, George and Emmett, Louise and Omerieh, Afshar Ali and Schmidt-Hegemann, Nina-Sophie and Mose, Lucas and Aebersold, M. Daniel and Zamboglou, Constantinos and Wiegel, Thomas and Shelan, Mohamed", title="A Machine Learning Approach for Predicting Biochemical Outcome After PSMA-PET--Guided Salvage Radiotherapy in Recurrent Prostate Cancer After Radical Prostatectomy: Retrospective Study", journal="JMIR Cancer", year="2024", month="Sep", day="20", volume="10", pages="e60323", keywords="cancer", keywords="oncologist", keywords="metastases", keywords="prostate", keywords="prostate cancer", keywords="prostatectomy", keywords="salvage radiotherapy", keywords="PSMA-PET", keywords="prostate-specific membrane antigen--positron emission tomography", keywords="prostate-specific membrane antigen", keywords="PET", keywords="positron emission tomography", keywords="radiotherapy", keywords="radiology", keywords="radiography", keywords="machine learning", keywords="ML", keywords="artificial intelligence", keywords="AI", keywords="algorithm", keywords="algorithms", keywords="predictive model", keywords="predictive models", keywords="predictive analytics", keywords="predictive system", keywords="practical model", keywords="practical models", keywords="deep learning", abstract="Background: Salvage radiation therapy (sRT) is often the 
sole curative option in patients with biochemical recurrence after radical prostatectomy. After sRT, we developed and validated a nomogram to predict freedom from biochemical failure. Objective: This study aims to evaluate prostate-specific membrane antigen--positron emission tomography (PSMA-PET)--based sRT efficacy for postprostatectomy prostate-specific antigen (PSA) persistence or recurrence. Objectives include developing a random survival forest (RSF) model for predicting biochemical failure, comparing it with a Cox model, and assessing predictive accuracy over time. Multinational cohort data will validate the model's performance, aiming to improve clinical management of recurrent prostate cancer. Methods: This multicenter retrospective study collected data from 13 medical facilities across 5 countries: Germany, Cyprus, Australia, Italy, and Switzerland. A total of 1029 patients who underwent sRT following PSMA-PET--based assessment for PSA persistence or recurrence were included. Patients were treated between July 2013 and June 2020, with clinical decisions guided by PSMA-PET results and contemporary standards. The primary end point was freedom from biochemical failure, defined as 2 consecutive PSA rises >0.2 ng/mL after treatment. Data were divided into training (708 patients), testing (271 patients), and external validation (50 patients) sets for machine learning algorithm development and validation. RSF models were used, with 1000 trees per model, optimizing predictive performance using the Harrell concordance index and Brier score. Statistical analysis used R Statistical Software (R Foundation for Statistical Computing), and ethical approval was obtained from participating institutions. Results: Baseline characteristics of 1029 patients undergoing sRT PSMA-PET--based assessment were analyzed. The median age at sRT was 70 (IQR 64-74) years. PSMA-PET scans revealed local recurrences in 43.9\% (430/979) and nodal recurrences in 27.2\% (266/979) of patients. 
Treatment included dose-escalated sRT to pelvic lymphatics in 35.6\% (349/979) of cases. The external outlier validation set showed distinct features, including higher rates of positive lymph nodes (47/50, 94\% vs 266/979, 27.2\% in the learning cohort) and lower delivered sRT doses (<66 Gy in 46/50, 92\% vs 57/979, 5.8\% in the learning cohort; P<.001). The RSF model, validated internally and externally, demonstrated robust predictive performance (Harrell C-index range: 0.54-0.91) across training and validation datasets, outperforming a previously published nomogram. Conclusions: The developed RSF model demonstrates enhanced predictive accuracy, potentially improving patient outcomes and assisting clinicians in making treatment decisions. ", doi="10.2196/60323", url="https://cancer.jmir.org/2024/1/e60323" } @Article{info:doi/10.2196/56022, author="Lin, Tai-Han and Chung, Hsing-Yi and Jian, Ming-Jr and Chang, Chih-Kai and Perng, Cherng-Lih and Liao, Guo-Shiou and Yu, Jyh-Cherng and Dai, Ming-Shen and Yu, Cheng-Ping and Shang, Hung-Sheng", title="An Advanced Machine Learning Model for a Web-Based Artificial Intelligence--Based Clinical Decision Support System Application: Model Development and Validation Study", journal="J Med Internet Res", year="2024", month="Sep", day="4", volume="26", pages="e56022", keywords="breast cancer recurrence", keywords="artificial intelligence--based clinical decision support system", keywords="machine learning", keywords="personalized treatment planning", keywords="ChatGPT", keywords="predictive model accuracy", abstract="Background: Breast cancer is a leading global health concern, necessitating advancements in recurrence prediction and management. The development of an artificial intelligence (AI)--based clinical decision support system (AI-CDSS) using ChatGPT addresses this need with the aim of enhancing both prediction accuracy and user accessibility. 
Objective: This study aims to develop and validate an advanced machine learning model for a web-based AI-CDSS application, leveraging the question-and-answer guidance capabilities of ChatGPT to enhance data preprocessing and model development, thereby improving the prediction of breast cancer recurrence. Methods: This study focused on developing an advanced machine learning model by leveraging data from the Tri-Service General Hospital breast cancer registry of 3577 patients (2004-2016). As a tertiary medical center, the hospital accepts referrals from 4 branches (3 in the northern region and 1 on an offshore island in our country) that manage chronic diseases but refer complex surgical cases, including breast cancer, to the main center, enriching our study population's diversity. Model training used patient data from 2004 to 2012, with subsequent validation using data from 2013 to 2016, to assess the robustness of our predictive models. ChatGPT was integral to preprocessing and model development, aiding in hormone receptor categorization, age binning, and one-hot encoding. Techniques such as the synthetic minority oversampling technique addressed the imbalance of the data sets. Various algorithms, including light gradient-boosting machine, gradient boosting, and extreme gradient boosting, were used, and their performance was evaluated using metrics such as the area under the curve, accuracy, sensitivity, and F1-score. Results: The light gradient-boosting machine model demonstrated superior performance, with an area under the curve of 0.80, followed closely by the gradient boosting and extreme gradient boosting models. The web interface of the AI-CDSS tool was effectively tested in clinical decision-making scenarios, demonstrating its utility in personalized treatment planning and patient involvement. 
Conclusions: The AI-CDSS tool, enhanced by ChatGPT, marks a significant advancement in breast cancer recurrence prediction, offering a more individualized and accessible approach for clinicians and patients. Although promising, further validation in diverse clinical settings is recommended to confirm its efficacy and expand its use. ", doi="10.2196/56022", url="https://www.jmir.org/2024/1/e56022" } @Article{info:doi/10.2196/54740, author="Islam, Nazmul and Reuben, S. Jamie and Dale, Justin and Coates, W. James and Sapiah, Karan and Markson, R. Frank and Jordan, T. Craig and Smith, Clay", title="Predictive Models for Long Term Survival of AML Patients Treated with Venetoclax and Azacitidine or 7+3 Based on Post Treatment Events and Responses: Retrospective Cohort Study", journal="JMIR Cancer", year="2024", month="Aug", day="21", volume="10", pages="e54740", keywords="Leukemia, Myeloid, Acute", keywords="Venetoclax", keywords="Azacitidine", keywords="Anthracycline", keywords="Arabinoside, Cytosine", keywords="Clinical Decision Support", keywords="Clinical Informatics", keywords="Machine Learning", keywords="Predictive Model", keywords="Overall Survival", abstract="Background: The treatment of acute myeloid leukemia (AML) in older or unfit patients typically involves a regimen of venetoclax plus azacitidine (ven/aza). Toxicity and treatment responses are highly variable following treatment initiation and clinical decision-making continually evolves in response to these as treatment progresses. To improve clinical decision support (CDS) following treatment initiation, predictive models based on evolving and dynamic toxicities, disease responses, and other features should be developed. Objective: This study aims to generate machine learning (ML)--based predictive models that incorporate individual predictors of overall survival (OS) for patients with AML, based on clinical events occurring after the initiation of ven/aza or 7+3 regimen. 
Methods: Data from 221 patients with AML, who received either the ven/aza (n=101 patients) or 7+3 regimen (n=120 patients) as their initial induction therapy, were retrospectively analyzed. We performed stratified univariate and multivariate analyses to quantify the associations of toxicities, hospital events, and short-term disease responses with OS for the 7+3 and ven/aza subgroups separately. We compared the estimates of confounders to assess potential effect modification by treatment. Seventeen ML-based predictive models were developed. The optimal predictive models were selected based on their predictability and discriminability using cross-validation. Uncertainty in the estimation was assessed through bootstrapping. Results: The cumulative incidence of posttreatment toxicities varied between the ven/aza and 7+3 regimens. A variety of laboratory features and clinical events during the first 30 days were differentially associated with OS for the two treatments. An initial transfer to the intensive care unit (ICU) worsened OS for 7+3 patients (aHR 1.18, 95\% CI 1.10-1.28), while ICU readmission adversely affected OS for those on ven/aza (aHR 1.24, 95\% CI 1.12-1.37). At the initial follow-up, achieving a morphologic leukemia-free state (MLFS) did not affect OS for ven/aza (aHR 0.99, 95\% CI 0.94-1.05), but worsened OS following 7+3 (aHR 1.16, 95\% CI 1.01-1.31) compared with complete remission (CR). Having blasts over 5\% at the initial follow-up negatively impacted OS for both 7+3 (P<.001) and ven/aza (P<.001) treated patients. A best response of CR and CR with incomplete recovery (CRi) was superior to MLFS and refractory disease after ven/aza (P<.001), whereas for 7+3, CR was superior to CRi, MLFS, and refractory disease (P<.001), indicating unequal outcomes. Treatment-specific predictive models, trained on data from 120 patients receiving 7+3 and 101 receiving ven/aza using more than 114 features, achieved survival AUCs over 0.70. 
Conclusions: Our findings indicate that toxicities, clinical events, and responses evolve differently in patients receiving ven/aza compared with those receiving the 7+3 regimen. ML-based predictive models were shown to be a feasible strategy for CDS in both forms of AML treatment. If validated with larger and more diverse data sets, these findings could offer valuable insights for developing AML-CDS tools that leverage posttreatment clinical data. ", doi="10.2196/54740", url="https://cancer.jmir.org/2024/1/e54740" } @Article{info:doi/10.2196/56538, author="Raghu, Ananya and Raghu, Anisha and Wise, F. Jillian", title="Deep Learning--Based Identification of Tissue of Origin for Carcinomas of Unknown Primary Using MicroRNA Expression: Algorithm Development and Validation", journal="JMIR Bioinform Biotech", year="2024", month="Jul", day="24", volume="5", pages="e56538", keywords="cancer genomics", keywords="machine learning algorithms", keywords="deep learning", keywords="gene expression", keywords="RNA", keywords="RNAs", keywords="cancer", keywords="oncology", keywords="tumor", keywords="tumors", keywords="tissue", keywords="tissues", keywords="metastatic", keywords="microRNA", keywords="microRNAs", keywords="gene", keywords="genes", keywords="genomic", keywords="genomics", keywords="machine learning", keywords="algorithm", keywords="algorithms", keywords="carcinoma", keywords="genetics", keywords="genome", keywords="detection", keywords="bioinformatics", abstract="Background: Carcinoma of unknown primary (CUP) is a subset of metastatic cancers in which the primary tissue source of the cancer cells remains unidentified. CUP is the eighth most common malignancy worldwide, accounting for up to 5\% of all malignancies. CUP is an exceptionally aggressive metastatic cancer, with a median survival of approximately 3 to 6 months. The tissue in which cancer arises plays a key role in our understanding of sensitivities to various forms of cell death. 
Thus, the lack of knowledge of the tissue of origin (TOO) makes it difficult to devise tailored and effective treatments for patients with CUP. Developing quick and clinically implementable methods to identify the TOO is crucial in treating patients with CUP. Noncoding RNAs may hold potential for origin identification and provide a robust route to clinical implementation due to their resistance to chemical degradation. Objective: This study aims to investigate the potential of microRNAs, a subset of noncoding RNAs, as highly accurate biomarkers for detecting the TOO of metastatic cancers through data-driven, machine learning approaches. Methods: We used microRNA expression data from The Cancer Genome Atlas data set and assessed various machine learning approaches, from simple classifiers to deep learning approaches. As a test of our classifiers, we evaluated the accuracy on a separate set of 194 primary tumor samples from the Sequence Read Archive. We used permutation feature importance to determine the potential microRNA biomarkers and assessed them with principal component analysis and t-distributed stochastic neighbor embedding visualizations. Results: Our results show that it is possible to design robust classifiers to detect the TOO for metastatic samples on The Cancer Genome Atlas data set, with an accuracy of up to 97\% (351/362), which may be used in situations of CUP. Our findings show that deep learning techniques enhance prediction accuracy. Accuracy on metastatic samples improved from an initial 62.5\% (226/362) with decision trees to 93.2\% (337/362) with logistic regression, finally reaching 97\% (351/362) with deep learning. On the Sequence Read Archive validation set, a lower accuracy of 41.2\% (77/188) was achieved by the decision tree, while deep learning achieved a higher accuracy of 80.4\% (151/188). 
Notably, our feature importance analysis showed the top 3 most important features for predicting TOO to be microRNA-10b, microRNA-205, and microRNA-196b, which aligns with previous work. Conclusions: Our findings highlight the potential of using machine learning techniques to devise accurate tests for detecting TOO for CUP. Since microRNAs are carried throughout the body via extracellular vesicles secreted from cells, they may serve as key biomarkers for liquid biopsy due to their presence in blood plasma. Our work serves as a foundation toward developing blood-based cancer detection tests based on the presence of microRNA. ", doi="10.2196/56538", url="https://bioinform.jmir.org/2024/1/e56538", url="http://www.ncbi.nlm.nih.gov/pubmed/39046787" } @Article{info:doi/10.2196/46360, author="Chen, Yi-Chu and Chen, Yun-Yuan and Su, Shih-Yung and Jhuang, Jing-Rong and Chiang, Chun-Ju and Yang, Ya-Wen and Lin, Li-Ju and Wu, Chao-Chun and Lee, Wen-Chung", title="Projected Time for the Elimination of Cervical Cancer Under Various Intervention Scenarios: Age-Period-Cohort Macrosimulation Study", journal="JMIR Public Health Surveill", year="2024", month="Apr", day="18", volume="10", pages="e46360", keywords="age-period-cohort model", keywords="population attributable fraction", keywords="macrosimulation", keywords="cancer screening", keywords="human papillomavirus", keywords="HPV", keywords="cervical cancer", keywords="intervention", keywords="women", keywords="cervical screening", keywords="public health intervention", abstract="Background: The World Health Organization aims for the global elimination of cervical cancer, necessitating modeling studies to forecast long-term outcomes. Objective: This paper introduces a macrosimulation framework using age-period-cohort modeling and population attributable fractions to predict the timeline for eliminating cervical cancer in Taiwan. 
Methods: Data for cervical cancer cases from 1997 to 2016 were obtained from the Taiwan Cancer Registry. Future incidence rates under the current approach and various intervention strategies, such as scaled-up screening (cytology based or human papillomavirus [HPV] based) and HPV vaccination, were projected. Results: Our projections indicate that Taiwan could eliminate cervical cancer by 2050 with either 70\% compliance in cytology-based or HPV-based screening or 90\% HPV vaccination coverage. The years projected for elimination are 2047 and 2035 for cytology-based and HPV-based screening, respectively; 2050 for vaccination alone; and 2038 and 2033 for combined screening and vaccination approaches. Conclusions: The age-period-cohort macrosimulation framework offers a valuable policy analysis tool for cervical cancer control. Our findings can inform strategies in other high-incidence countries, serving as a benchmark for global efforts to eliminate the disease. ", doi="10.2196/46360", url="https://publichealth.jmir.org/2024/1/e46360", url="http://www.ncbi.nlm.nih.gov/pubmed/38635315" } @Article{info:doi/10.2196/47744, author="Ru, Boshu and Sillah, Arthur and Desai, Kaushal and Chandwani, Sheenu and Yao, Lixia and Kothari, Smita", title="Real-World Data Quality Framework for Oncology Time to Treatment Discontinuation Use Case: Implementation and Evaluation Study", journal="JMIR Med Inform", year="2024", month="Mar", day="6", volume="12", pages="e47744", keywords="data quality assessment", keywords="real-world data", keywords="real-world time to treatment discontinuation", keywords="systemic anticancer therapy", keywords="Use Case Specific Relevance and Quality Assessment", keywords="UReQA framework", abstract="Background: The importance of real-world evidence is widely recognized in observational oncology studies. However, the lack of interoperable data quality standards in the fragmented health information technology landscape represents an important challenge. 
Therefore, adopting validated systematic methods for evaluating data quality is important for oncology outcomes research leveraging real-world data (RWD). Objective: This study aims to implement real-world time to treatment discontinuation (rwTTD) for a systemic anticancer therapy (SACT) as a new use case for the Use Case Specific Relevance and Quality Assessment, a framework linking data quality and relevance in fit-for-purpose RWD assessment. Methods: To define the rwTTD use case, we mapped the operational definition of rwTTD to RWD elements commonly available from oncology electronic health record--derived data sets. We identified 20 tasks to check the completeness and plausibility of data elements concerning SACT use, line of therapy (LOT), death date, and length of follow-up. Using descriptive statistics, we illustrated how to implement the Use Case Specific Relevance and Quality Assessment on 2 oncology databases (Data sets A and B) to estimate the rwTTD of an SACT drug (target SACT) for patients with advanced head and neck cancer diagnosed on or after January 1, 2015. Results: A total of 1200 (24.96\%) of 4808 patients in Data set A and 237 (5.92\%) of 4003 patients in Data set B received the target SACT, suggesting better relevance of the former in estimating the rwTTD of the target SACT. The 2 data sets differed with regard to the terminology used for SACT drugs, LOT format, and target SACT LOT distribution over time. Data set B appeared to have less complete SACT records, longer lags in incorporating the latest data, and incomplete mortality data, suggesting a lack of fitness for estimating rwTTD. Conclusions: The fit-for-purpose data quality assessment demonstrated substantial variability in the quality of the 2 real-world data sets. The data quality specifications applied for rwTTD estimation can be expanded to support a broad spectrum of oncology use cases. 
", doi="10.2196/47744", url="https://medinform.jmir.org/2024/1/e47744", url="http://www.ncbi.nlm.nih.gov/pubmed/38446504" } @Article{info:doi/10.2196/42129, author="Gassner, Mathias and Barranco Garcia, Javier and Tanadini-Lang, Stephanie and Bertoldo, Fabio and Fr{\"o}hlich, Fabienne and Guckenberger, Matthias and Haueis, Silvia and Pelzer, Christin and Reyes, Mauricio and Schmithausen, Patrick and Simic, Dario and Staeger, Ramon and Verardi, Fabio and Andratschke, Nicolaus and Adelmann, Andreas and Braun, P. Ralph", title="Saliency-Enhanced Content-Based Image Retrieval for Diagnosis Support in Dermatology Consultation: Reader Study", journal="JMIR Dermatol", year="2023", month="Aug", day="24", volume="6", pages="e42129", keywords="dermatology", keywords="deep learning", keywords="melanoma", keywords="saliency maps", keywords="image retrieval", keywords="dermoscopy", keywords="skin cancer", keywords="diagnosis", keywords="algorithms", keywords="convolutional neural network", keywords="dermoscopic images", abstract="Background: Previous research studies have demonstrated that medical content image retrieval can play an important role by assisting dermatologists in skin lesion diagnosis. However, current state-of-the-art approaches have not been adopted in routine consultation, partly due to the lack of interpretability limiting trust by clinical users. Objective: This study developed a new image retrieval architecture for polarized or dermoscopic imaging guided by interpretable saliency maps. This approach provides better feature extraction, leading to better quantitative retrieval performance as well as providing interpretability for an eventual real-world implementation. Methods: Content-based image retrieval (CBIR) algorithms rely on the comparison of image features embedded by convolutional neural network (CNN) against a labeled data set. 
Saliency maps are interpretability methods from computer vision that highlight the regions most relevant to the prediction made by a neural network. By introducing a fine-tuning stage that includes saliency maps to guide feature extraction, the accuracy of image retrieval is optimized. We refer to this approach as saliency-enhanced CBIR (SE-CBIR). A reader study was designed at the University Hospital Zurich Dermatology Clinic to evaluate SE-CBIR's retrieval accuracy as well as its impact on the participants' confidence in their diagnoses. Results: SE-CBIR improved the retrieval accuracy by 7 percentage points (77\% vs 84\%) for single-lesion retrieval compared with traditional CBIR. The reader study showed an overall increase in classification accuracy of 22 percentage points (62\% vs 84\%) when participants were provided with SE-CBIR--retrieved images. In addition, the overall confidence in the lesion's diagnosis increased by 24\%. Finally, the use of SE-CBIR as a support tool helped the participants reduce the number of nonmelanoma lesions previously diagnosed as melanoma (overdiagnosis) by 53\%. Conclusions: SE-CBIR presents better retrieval accuracy compared to traditional CBIR CNN-based approaches. Furthermore, we have shown how these support tools can help dermatologists and residents improve diagnosis accuracy and confidence. Additionally, by introducing interpretable methods, we should expect increased acceptance and use of these tools in routine consultation. 
", doi="10.2196/42129", url="https://derma.jmir.org/2023/1/e42129", url="http://www.ncbi.nlm.nih.gov/pubmed/37616039" } @Article{info:doi/10.2196/45455, author="Yu, Yushuai and Xu, Zelin and Shao, Tinglei and Huang, Kaiyan and Chen, Ruiliang and Yu, Xiaoqin and Zhang, Jie and Han, Hui and Song, Chuangui", title="Epidemiology and a Predictive Model of Prognosis Index Based on Machine Learning in Primary Breast Lymphoma: Population-Based Study", journal="JMIR Public Health Surveill", year="2023", month="Jun", day="8", volume="9", pages="e45455", keywords="primary breast lymphoma", keywords="epidemiology", keywords="prognosis", keywords="machine learning", keywords="disparities", abstract="Background: Primary breast lymphoma (PBL) is a rare disease whose epidemiological features, treatment principles, and prognostic factors remain controversial. Objective: The aim of this study was to explore the epidemiology of PBL and to develop a better model based on machine learning to predict the prognosis for patients with primary breast lymphoma. Methods: The annual incidence of PBL was extracted from the Surveillance, Epidemiology, and End Results (SEER) database between 1975 and 2019 to examine disease occurrence trends using Joinpoint software (version 4.9; National Cancer Institute). We enrolled data from 1251 female patients with primary breast lymphoma from the SEER database for survival analysis. Univariable and multivariable analyses were performed to explore independent prognostic factors for overall survival and disease-specific survival of patients with primary breast lymphoma. Eight machine learning algorithms were developed to predict the 5-year survival of patients with primary breast lymphoma. 
Results: The overall incidence of PBL increased drastically between 1975 and 2004, followed by a significant downward trend in incidence around 2004, with an average annual percent change (AAPC) of --0.8 (95\% CI --1.1 to --0.6). Disparities in trends of PBL existed by age and race. The AAPC of the 65 years or older cohort was about 1.2 higher than that for the younger than 65 years cohort. The AAPC of White patients was 0.9 (95\% CI 0.0-1.8), while that of Black patients was significantly higher at 2.1 (95\% CI --2.5 to 6.9). We also identified that the risk of death from PBL is multifactorial and includes patient factors and treatment factors. Survival analysis revealed that the patients diagnosed between 2007 and 2015 had a significant reduction in mortality risk compared to those diagnosed between 1983 and 1990. The gradient boosting model outperformed the other models, with a sensitivity of 0.752 and an area under the curve of 0.817. The important features established with the gradient boosting model were the year of diagnosis, age, histologic type, and primary site, which were the 4 most relevant variables to explain 5-year survival status. Conclusions: The incidence of PBL started demonstrating a tendency to decrease after 2004, which varied by age and race. In recent years, the prognosis of patients with primary breast lymphoma has been remarkably improved. The gradient boosting model had a promising performance. This model can help clinicians assess the prognosis of patients with primary breast lymphoma early and therefore improve the clinical outcome by changing management strategies and patient health care. 
", doi="10.2196/45455", url="https://publichealth.jmir.org/2023/1/e45455", url="http://www.ncbi.nlm.nih.gov/pubmed/37169516" } @Article{info:doi/10.2196/43409, author="Seo, Dongjin and Kim, Sang Han and Ahn, Bae Joong and Park, Rang Yu", title="Investigation of the Trajectory of Muscle and Body Mass as a Prognostic Factor in Patients With Colorectal Cancer: Longitudinal Cohort Study", journal="JMIR Public Health Surveill", year="2023", month="Mar", day="22", volume="9", pages="e43409", keywords="body mass index", keywords="BMI", keywords="colorectal cancer", keywords="deep neural network model", keywords="skeletal muscle", keywords="skeletal muscle volume index", keywords="SMVI", abstract="Background: Skeletal muscle and BMI are essential prognostic factors for survival in colorectal cancer (CRC). However, few studies have examined how these variables change over time, so their longitudinal behavior remains poorly understood. Objective: This study aimed to evaluate the prognostic impact of the initial status and trajectories of muscle and BMI on overall survival (OS) and assess whether these 4 profiles within 1 year can represent the profiles 6 years later. Methods: We analyzed 4056 patients newly diagnosed with CRC between 2010 and 2020. The volume of a 5-mm-thick slab of muscle at the level of the third lumbar vertebra was measured using a pretrained deep learning algorithm. The skeletal muscle volume index (SMVI) was defined as the muscle volume divided by the square of the height. The correlation between BMI status at the first, third, and sixth years after diagnosis was analyzed and assessed similarly for muscle profiles. The prognostic significance of baseline BMI and SMVI and of their 1-year trajectories for OS was evaluated by restricted cubic spline analysis and survival analysis. Patients were categorized based on these 4 dimensions, and prognostic risks were predicted and displayed using heat maps. 
Results: Trajectories of SMVI were categorized as decreased (812/4056, 20\%), steady (2014/4056, 49.7\%), or increased (1230/4056, 30.3\%). Similarly, BMI trajectories were categorized as decreased (792/4056, 19.5\%), steady (2253/4056, 55.5\%), or increased (1011/4056, 24.9\%). BMI and SMVI values in the first year after diagnosis showed a statistically significant correlation with those in the third and sixth years (P<.001). Restricted cubic spline analysis showed a nonlinear relationship between baseline BMI and SMVI change ratio and OS; BMI, in particular, showed a U-shaped correlation. According to survival analysis, increased BMI (hazard ratio [HR] 0.83; P=.02), high baseline SMVI (HR 0.82; P=.04), and obesity stage 1 (HR 0.80; P=.02) had a favorable impact, whereas decreased SMVI trajectory (HR 1.31; P=.001), decreased BMI (HR 1.23; P=.02), and initial underweight (HR 1.38; P=.02) or obesity stages 2-3 (HR 1.79; P=.01) were negative prognostic factors for OS. Considered together, BMI >30 kg/m$^2$ with a low SMVI at the time of diagnosis resulted in the highest mortality risk. We observed improved survival in patients with increased muscle mass without BMI loss compared to those with steady muscle mass and BMI. Conclusions: BMI and muscle profiles within 1 year of diagnosis were surrogate indicators for predicting later profiles. Continuous trajectories of body and muscle mass are independent prognostic factors in patients with CRC. An automatic algorithm provides a unique opportunity to conduct longitudinal evaluations of body composition. Further studies to understand the complicated natural courses of muscularity and adiposity are necessary for clinical application. 
", doi="10.2196/43409", url="https://publichealth.jmir.org/2023/1/e43409", url="http://www.ncbi.nlm.nih.gov/pubmed/36947110" } @Article{info:doi/10.2196/35750, author="Gao, Ying and Li, Shu and Jin, Yujing and Zhou, Lengxiao and Sun, Shaomei and Xu, Xiaoqian and Li, Shuqian and Yang, Hongxi and Zhang, Qing and Wang, Yaogang", title="An Assessment of the Predictive Performance of Current Machine Learning--Based Breast Cancer Risk Prediction Models: Systematic Review", journal="JMIR Public Health Surveill", year="2022", month="Dec", day="29", volume="8", number="12", pages="e35750", keywords="breast cancer", keywords="machine learning", keywords="risk prediction", keywords="cancer", keywords="oncology", keywords="systematic review", keywords="review", keywords="meta-analysis", keywords="cancer research", keywords="risk model", abstract="Background: Several studies have explored the predictive performance of machine learning--based breast cancer risk prediction models and have reached conflicting conclusions. Thus, the performance of the current machine learning--based breast cancer risk prediction models and their benefits and weaknesses need to be evaluated for the future development of feasible and efficient risk prediction models. Objective: The aim of this review was to assess the performance and the clinical feasibility of the currently available machine learning--based breast cancer risk prediction models. Methods: We searched PubMed, Embase, and Web of Science for papers on machine learning--based breast cancer risk prediction models published until June 9, 2021. Studies describing the development or validation of models for predicting future breast cancer risk were included. The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess the risk of bias and the clinical applicability of the included studies. The pooled area under the curve (AUC) was calculated using the DerSimonian and Laird random-effects model. 
Results: A total of 8 studies with 10 data sets were included. Neural networks were the most common machine learning method for developing breast cancer risk prediction models. The pooled AUC of the optimal machine learning--based risk prediction model reported in each study was 0.73 (95\% CI 0.66-0.80; approximate 95\% prediction interval 0.56-0.96), with a high level of heterogeneity between studies (Q=576.07, I$^2$=98.44\%; P<.001). Head-to-head comparisons of the 2 types of models trained on the same data set showed that machine learning models had a slight advantage over traditional risk factor--based models in predicting future breast cancer risk. The pooled AUC of the neural network--based risk prediction model was higher than that of the optimal nonneural network--based risk prediction model (0.71 vs 0.68, respectively). Subgroup analysis showed that risk models incorporating imaging features had a higher pooled AUC than those without imaging features (0.73 vs 0.61, respectively; P for heterogeneity=.001). The PROBAST analysis indicated that many machine learning models had a high risk of bias and poorly reported calibration analysis. Conclusions: Our review shows that the current machine learning--based breast cancer risk prediction models have some technical pitfalls and that their clinical feasibility and reliability are unsatisfactory. 
", doi="10.2196/35750", url="https://publichealth.jmir.org/2022/12/e35750", url="http://www.ncbi.nlm.nih.gov/pubmed/36426919" } @Article{info:doi/10.2196/27694, author="Chen, Pei-Chin and Lu, Yun-Ru and Kang, Yi-No and Chang, Chun-Chao", title="The Accuracy of Artificial Intelligence in the Endoscopic Diagnosis of Early Gastric Cancer: Pooled Analysis Study", journal="J Med Internet Res", year="2022", month="May", day="16", volume="24", number="5", pages="e27694", keywords="artificial intelligence", keywords="early gastric cancer", keywords="endoscopy", abstract="Background: Artificial intelligence (AI) for gastric cancer diagnosis has been discussed in recent years. The role of AI is more important in early gastric cancer than in advanced gastric cancer, since early gastric cancer is not easily identified in clinical practice. However, to our knowledge, past syntheses appear to have given limited attention to populations with early gastric cancer. Objective: The purpose of this study is to evaluate the diagnostic accuracy of AI in the diagnosis of early gastric cancer from endoscopic images. Methods: We conducted a systematic review, from database inception to June 2020, of all studies assessing the performance of AI in the endoscopic diagnosis of early gastric cancer. Studies not concerning early gastric cancer were excluded. The outcome of interest was the diagnostic accuracy (comprising sensitivity, specificity, and accuracy) of AI systems. Study quality was assessed using the revised Quality Assessment of Diagnostic Accuracy Studies. Meta-analysis was primarily based on a bivariate mixed-effects model. A summary receiver operating characteristic curve and a hierarchical summary receiver operating characteristic curve were constructed, and the area under the curve was computed. Results: We analyzed 12 retrospective case-control studies (n=11,685) in which AI identified early gastric cancer from endoscopic images. 
The pooled sensitivity and specificity of AI for early gastric cancer diagnosis were 0.86 (95\% CI 0.75-0.92) and 0.90 (95\% CI 0.84-0.93), respectively. The area under the curve was 0.94. Sensitivity analysis of studies using support vector machines and narrow-band imaging demonstrated more consistent results. Conclusions: To our knowledge, this is the first synthesis of studies using AI to diagnose early gastric cancer from endoscopic images. AI may support the diagnosis of early gastric cancer. However, the optimal combination of imaging techniques and algorithms remains unclear. Competing AI models for the diagnosis of early gastric cancer are worthy of future investigation. Trial Registration: PROSPERO CRD42020193223; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=193223 ", doi="10.2196/27694", url="https://www.jmir.org/2022/5/e27694", url="http://www.ncbi.nlm.nih.gov/pubmed/35576561" } @Article{info:doi/10.2196/35768, author="Ma, Zhuo and Huang, Sijia and Wu, Xiaoqing and Huang, Yinying and Chan, Wai-Chi Sally and Lin, Yilan and Zheng, Xujuan and Zhu, Jiemin", title="Development of a Prognostic App (iCanPredict) to Predict Survival for Chinese Women With Breast Cancer: Retrospective Study", journal="J Med Internet Res", year="2022", month="Mar", day="9", volume="24", number="3", pages="e35768", keywords="app", keywords="breast cancer", keywords="survival prediction model", keywords="iCanPredict", abstract="Background: Accurate prediction of survival is crucial for both physicians and women with breast cancer to enable clinical decision making on appropriate treatments. The currently available survival prediction tools were developed based on demographic and clinical data obtained from specific populations and may underestimate or overestimate the survival of women with breast cancer in China. Objective: This study aims to develop and validate a prognostic app to predict the overall survival of women with breast cancer in China. 
Methods: Clinical data covering 9 years (January 2009-December 2017) on women with breast cancer who received surgery and adjuvant therapy at 2 hospitals in Xiamen were collected and matched against death data from the Xiamen Center of Disease Control and Prevention. All samples were randomly divided (7:3 ratio) into a training set for model construction and a test set for model external validation. Multivariable Cox regression analysis was used to construct a survival prediction model. Model performance was evaluated by the receiver operating characteristic (ROC) curve and the Brier score. Finally, the prognostic app, called iCanPredict, was developed for women with breast cancer in China, with the survival prediction model running in the app's background thread. Results: A total of 1592 samples were included for data analysis. The training set comprised 1114 individuals and the test set comprised 478 individuals. Age at diagnosis, clinical stage, molecular classification, operative type, axillary lymph node dissection, chemotherapy, and endocrine therapy were incorporated into the model, where age at diagnosis (hazard ratio [HR] 1.031, 95\% CI 1.011-1.051; P=.002), clinical stage (HR 3.044, 95\% CI 2.347-3.928; P<.001), and endocrine therapy (HR 0.592, 95\% CI 0.384-0.914; P=.02) significantly influenced the survival of women with breast cancer. The operative type (P=.81) and the other 4 variables (molecular classification [P=.91], breast reconstruction [P=.36], axillary lymph node dissection [P=.32], and chemotherapy [P=.84]) were not significant. The ROC curve of the training set showed that the model exhibited good discrimination for predicting 1- (area under the curve [AUC] 0.802, 95\% CI 0.713-0.892), 5- (AUC 0.813, 95\% CI 0.760-0.865), and 10-year (AUC 0.740, 95\% CI 0.672-0.808) overall survival. 
The Brier scores at 1, 5, and 10 years after diagnosis were 0.005, 0.055, and 0.103 in the training set, respectively, all less than 0.25, indicating good predictive ability. The test set externally validated model discrimination and calibration. In the iCanPredict app, when physicians or patients enter a woman's clinical information and her choice of surgery and adjuvant therapy, the app displays the corresponding 10-year survival prediction. Conclusions: This survival prediction model provided good model discrimination and calibration. iCanPredict is the first tool of its kind in China to provide survival predictions to women with breast cancer. iCanPredict will increase women's awareness of the similar survival rates across different surgeries and the importance of adherence to endocrine therapy, ultimately helping women to make informed decisions regarding treatment for breast cancer. ", doi="10.2196/35768", url="https://www.jmir.org/2022/3/e35768", url="http://www.ncbi.nlm.nih.gov/pubmed/35262503" } @Article{info:doi/10.2196/25800, author="Plasek, Joseph and Weissert, John and Downs, Tracy and Richards, Kyle and Ravvaz, Kourosh", title="Clinicopathological Criteria Predictive of Recurrence Following Bacillus Calmette-Gu{\'e}rin Therapy Initiation in Non--Muscle-Invasive Bladder Cancer: Retrospective Cohort Study", journal="JMIR Cancer", year="2021", month="Jun", day="22", volume="7", number="2", pages="e25800", keywords="urinary bladder neoplasms", keywords="risk factor", keywords="bacillus Calmette-Gu{\'e}rin", keywords="recurrence", abstract="Background: Bacillus Calmette-Gu{\'e}rin (BCG) is currently the most clinically effective intravesical treatment for non--muscle-invasive bladder cancer (NMIBC), particularly for patients with high-risk NMIBC such as those with carcinoma in situ. BCG treatments could be optimized to improve patient safety and conserve supply by predicting BCG efficacy based on tumor characteristics or clinicopathological criteria. 
Objective: The aim of this study is to assess the ability of specific clinicopathological criteria to predict tumor recurrence in patients with NMIBC who received BCG therapy along various treatment timelines. Methods: A total of 1331 patients (stage Ta, T1, or carcinoma in situ) who underwent transurethral resection of a bladder tumor between 2006 and 2017 were included. Univariate analysis was performed to assess the ability of laboratory tests (eg, complete blood panels, creatinine levels, and hemoglobin A1c levels) within 180 days of BCG therapy initiation, medications, and clinical and demographic variables to predict NMIBC recurrence. This was followed by multivariate regression that included the elements of the Club Urol{\'o}gico Espa{\~n}ol de Tratamiento Oncol{\'o}gico (CUETO) scoring model and variables that were significant predictors of recurrence in univariate analysis. Results: BCG was administered to 183 patients classified as intermediate or high risk, and 76 (41.5\%) experienced disease recurrence. In multivariate analysis, an abnormal neutrophil-to-lymphocyte ratio measured within 180 days of induction BCG therapy was a significant predictor (P=.047) of future cancer recurrence and a stronger predictor than the CUETO score or the individual variables included in the CUETO scoring model. Conclusions: An abnormal neutrophil-to-lymphocyte ratio within 180 days of BCG therapy initiation is predictive of recurrence and could indicate a need for additional or alternative interventions. ", doi="10.2196/25800", url="https://cancer.jmir.org/2021/2/e25800", url="http://www.ncbi.nlm.nih.gov/pubmed/34156341" }