Search Articles


Search Results (1 to 10 of 145)



Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking: Development and Usability Study


Retrieval-augmented generation (RAG) is a state-of-the-art technique that enhances LLMs by integrating external data retrieval, improving factual accuracy, and reducing costs [13]. By retrieving relevant information from external sources and incorporating it as contextual input, RAG effectively mitigates the issue of hallucinations in LLMs [14].
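The retrieve-then-generate pattern described in this excerpt can be sketched as follows. This is a minimal illustration with a toy keyword-overlap retriever; the function names (`retrieve`, `build_prompt`) are hypothetical and not from the article:

```python
# Minimal RAG sketch: retrieve the most relevant document for a query,
# then prepend it as contextual input for the language model.
def retrieve(query, documents, top_k=1):
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_tokens = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_tokens & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, documents):
    """Incorporate retrieved passages as context to ground the answer."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "COVID-19 vaccines underwent large randomized controlled trials.",
    "The recipe calls for two cups of flour.",
]
prompt = build_prompt("Were COVID-19 vaccines tested in trials?", docs)
```

In a production system the keyword retriever would be replaced by dense vector search over an external knowledge base; the grounding step (injecting retrieved text into the prompt) is what mitigates hallucination.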

Hai Li, Jingyi Huang, Mengmeng Ji, Yuyi Yang, Ruopeng An

J Med Internet Res 2025;27:e66098

Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis


Accuracy for objective questions was calculated as the number of correctly answered questions divided by the total number of questions. For diagnosis and classification, accuracy was defined as the number of cases correctly diagnosed or triaged divided by the total number of cases. Specifically for open-ended questions, accuracy was determined based on the number of questions rated “good” or “accurate” on the accuracy scale divided by the total number of questions.
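All three accuracy definitions in this excerpt share the same form, correct items divided by total items; a small sketch (illustrative, not the authors' code):

```python
def accuracy(outcomes):
    """Fraction of items judged correct: correct / total.

    `outcomes` is an iterable of booleans, one flag per item:
    per question (objective items), per case (diagnosis/triage),
    or per open-ended answer rated "good"/"accurate" on the scale.
    """
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)

# Example: 8 of 10 objective questions answered correctly.
obj_accuracy = accuracy([True] * 8 + [False] * 2)  # 0.8
```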

Ling Wang, Jinglin Li, Boyang Zhuang, Shasha Huang, Meilin Fang, Cunze Wang, Wen Li, Mohan Zhang, Shurong Gong

J Med Internet Res 2025;27:e64486

Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis


Therefore, this study aims to comprehensively evaluate the performance and accuracy of LLMs in clinical diagnosis, providing references for their clinical application. This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [7]. Specific details can be found in Checklist 1.

Guxue Shan, Xiaonan Chen, Chen Wang, Li Liu, Yuanjing Gu, Huiping Jiang, Tingqi Shi

JMIR Med Inform 2025;13:e64963

Assessing the Quality and Reliability of ChatGPT’s Responses to Radiotherapy-Related Patient Queries: Comparative Study With GPT-3.5 and GPT-4


However, despite being one of the most favored informational modalities, websites often fall short in content accuracy and readability [1]. Recently, artificial intelligence (AI)–powered chatbots such as ChatGPT have signified a potential paradigm shift in how patients with cancer can access a vast amount of medical information [1,3,4].

Ana Grilo, Catarina Marques, Maria Corte-Real, Elisabete Carolino, Marco Caetano

JMIR Cancer 2025;11:e63677

Understanding the Relationship Between Ecological Momentary Assessment Methods, Sensed Behavior, and Responsiveness: Cross-Study Analysis


Despite these advantages, EMA implementation faces challenges, especially in the variability, completeness, and accuracy of participant responses to prompts. Factors such as distraction, self-awareness, boredom, time of day, and interruption burden [11] can impact participant responses. Addressing these issues is essential for maintaining the integrity of research findings. Furthermore, the design of notification strategies may dramatically impact response compliance and quality [12,13].

Diane Cook, Aiden Walker, Bryan Minor, Catherine Luna, Sarah Tomaszewski Farias, Lisa Wiese, Raven Weaver, Maureen Schmitter-Edgecombe

JMIR Mhealth Uhealth 2025;13:e57018

Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study


Therefore, a comprehensive evaluation of chatbots’ reliability and accuracy in addressing medical inquiries is essential to ensure their effective application in managing diseases like OMG [16]. Recent studies have explored the application of LLMs in ophthalmology. Jaskari et al [17] introduced a model named DR-GPT, designed to analyze fundus images, demonstrating that LLMs can be applied to unstructured medical report databases to aid in classifying diabetic retinopathy.

Bin Wei, Lili Yao, Xin Hu, Yuxiang Hu, Jie Rao, Yu Ji, Zhuoer Dong, Yichong Duan, Xiaorong Wu

J Med Internet Res 2025;27:e67883

Wrist-Worn and Arm-Worn Wearables for Monitoring Heart Rate During Sedentary and Light-to-Vigorous Physical Activities: Device Validation Study


Moreover, mean absolute error, mean absolute percentage error (MAPE), 5% accuracy (percentage of MAPE within a 5% range of the reference value), root-mean-squared error (RMSE), and ordinary least squares linear regression were used to evaluate accuracy.
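The error metrics listed in this excerpt can be computed directly from paired device and reference readings; a sketch (variable and key names are assumed, and "5% accuracy" is interpreted as the share of readings whose absolute percentage error is within 5% of the reference, per the definition above):

```python
import math

def error_metrics(measured, reference):
    """MAE, MAPE, 5% accuracy (share of readings with APE <= 5%), and RMSE."""
    errors = [m - r for m, r in zip(measured, reference)]
    pct_errors = [abs(e) / r * 100 for e, r in zip(errors, reference)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(pct_errors) / n,
        "acc5": sum(p <= 5.0 for p in pct_errors) / n * 100,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }

# Hypothetical wearable vs. reference heart rate readings (beats/min):
metrics = error_metrics([102, 118, 151], [100, 120, 160])
```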

Theresa Schweizer, Rahel Gilgen-Ammann

JMIR Cardio 2025;9:e67110

Synthetic Data-Driven Approaches for Chinese Medical Abstract Sentence Classification: Computational Study


Notably, when trained on dataset #1, the SBERT-Doc SCAN algorithm emerges as the leading performer, securing an accuracy and F1-score of 0.8985 on the test dataset. This standout performance highlights the algorithm’s capability to classify medical domain data with high precision. Additionally, the SBERT-MEC algorithm also displays comparable performance on the same dataset, with an accuracy and F1-score of 0.8938, making it the second most effective algorithm in our evaluation.

Jiajia Li, Zikai Wang, Longxuan Yu, Hui Liu, Haitao Song

JMIR Form Res 2025;9:e54803

Creation of Scientific Response Documents for Addressing Product Medical Information Inquiries: Mixed Method Approach Using Artificial Intelligence


The accuracy of an SRD is crucial in its creation. Furthermore, traceability and accountability are essential considerations. The use of LLMs like ChatGPT often results in the original authors and sources not being cited, leading to the misattribution of information [13]. This study has 2 aims. The first is to quantify the challenges of SRD creation by gathering the opinions of medical information professionals regarding the time consumption of the various steps of SRD development.

Jerry Lau, Shivani Bisht, Robert Horton, Annamaria Crisan, John Jones, Sandeep Gantotti, Evelyn Hermes-DeSantis

JMIR AI 2025;4:e55277

GPT-3.5 Turbo and GPT-4 Turbo in Title and Abstract Screening for Systematic Reviews


This study aimed to compare accuracy and efficiency between GPT-3.5 Turbo and GPT-4 Turbo (OpenAI)—widely used LLMs in the medical field—in title and abstract screening. We conducted a post hoc analysis of our previous study to evaluate the performance of GPT-3.5 Turbo and GPT-4 Turbo in LLM-assisted title and abstract screening, using data from 5 clinical questions (CQs) developed for the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock 2024 [6,10].

Takehiko Oami, Yohei Okada, Taka-aki Nakada

JMIR Med Inform 2025;13:e64682