Search Articles

Search Results (1 to 10 of 2547 Results)


Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study

We then calculated response consistency using the variance of these frequency distributions. For example, if the model answered “A” consistently across multiple trials, this would yield a frequency distribution of {A: 1, B: 0, C: 0, D: 0, E: 0} and a variance of 0.2. In contrast, if the model answered “A, A, B, B, C, C, D, D, E, E,” this would yield a frequency distribution of {A: 0.2, B: 0.2, C: 0.2, D: 0.2, E: 0.2} and a variance of 0.

Kaitlin Hanss, Karthik V Sarma, Anne L Glowinski, Andrew Krystal, Ramotse Saunders, Andrew Halls, Sasha Gorrell, Erin Reilly

J Med Internet Res 2025;27:e69910
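
The consistency measure quoted in the excerpt above can be reproduced with a short sketch. It assumes five answer options and the sample variance (n - 1 denominator), which is the convention that yields the quoted value of 0.2 for a perfectly consistent model; the population variance would give 0.16 instead. The function name `response_variance` is hypothetical and is not taken from the study.

```python
from collections import Counter
from statistics import variance

# Assumed answer options for a five-choice question.
OPTIONS = ("A", "B", "C", "D", "E")

def response_variance(answers, options=OPTIONS):
    """Variance of the relative answer frequencies across repeated trials.

    Uses the sample variance (n - 1 denominator); this is an assumption
    inferred from the 0.2 figure in the excerpt, not a documented choice.
    """
    counts = Counter(answers)
    total = len(answers)
    freqs = [counts[option] / total for option in options]
    return variance(freqs)

consistent = response_variance(["A"] * 10)
scattered = response_variance(list("AABBCCDDEE"))
print(consistent, scattered)
```

Under this measure a higher variance indicates a more consistent model, because all of the probability mass sits on one option.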

Co-Designed Digital Device for Tracking Rehabilitation Dosage in a Clinical Environment After Stroke: Mixed Methods Validity and Feasibility Study

Cohen d was calculated to quantify the effect sizes of any observed differences, with small effect sizes being indicative of negligible differences. Bland-Altman plots were used to visually explore the relationship between the 2 methods, and the limits of agreement (LOA) were calculated with a 95% CI to determine the range within which most differences would fall.

Fiona Boyd, Gillian Sweeney, Mark Barber, Elaine Forrest, Mark Dunlop, Andrew Kerr

JMIR Rehabil Assist Technol 2025;12:e68129
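
As an illustration of the Bland-Altman step mentioned in the excerpt above, the following sketch computes the mean difference (bias) between two measurement methods and the conventional 95% limits of agreement, taken as the bias plus or minus 1.96 standard deviations of the paired differences. The function name `bland_altman_limits` and the sample data are hypothetical; the study's exact computation is not shown in the excerpt.

```python
from statistics import mean, stdev

def bland_altman_limits(method_a, method_b, z=1.96):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    Assumes the conventional definition: bias +/- z * SD of the
    paired differences, with z = 1.96 for 95% limits.
    """
    if len(method_a) != len(method_b):
        raise ValueError("Both methods need the same number of measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread

# Made-up paired readings from two methods, for illustration only.
bias, lower, upper = bland_altman_limits([10, 12, 14, 16], [9, 12, 13, 17])
print(bias, lower, upper)
```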

Combining Artificial Intelligence and Human Support in Mental Health: Digital Intervention With Comparable Effectiveness to Human-Delivered Care

Clinical effectiveness was quantified by calculating the change in anxiety symptoms, measured using the GAD-7, from baseline to final score, and estimating a within-participant effect size (Cohen d). A negative mean change denotes a reduction in GAD-7 total scores. Absolute Cohen d values are presented. The threshold for a clinically meaningful reduction in symptoms was defined as a change greater than the reliable change index of the GAD-7 scale (minimum of a 4-point reduction) [54].

Clare E Palmer, Emily Marshall, Edward Millgate, Graham Warren, Michael Ewbank, Elisa Cooper, Samantha Lawes, Alastair Smith, Chris Hutchins-Joss, Jessica Young, Malika Bouazzaoui, Morad Margoum, Sandra Healey, Louise Marshall, Shaun Mehew, Ronan Cummins, Valentin Tablan, Ana Catarino, Andrew E Welchman, Andrew D Blackwell

J Med Internet Res 2025;27:e69351
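
The effectiveness calculation described in the excerpt above can be sketched as follows. The excerpt does not say which within-participant variant of Cohen d was used, so this sketch assumes the mean change divided by the standard deviation of the change scores; other variants exist. The function name `summarize_gad7_change` and the scores are hypothetical.

```python
from statistics import mean, stdev

# Minimum point reduction treated as reliable change, per the excerpt.
RELIABLE_CHANGE_POINTS = 4

def summarize_gad7_change(baseline, final):
    """Return (mean_change, abs_cohen_d, n_reliable) for paired GAD-7 scores.

    Change is final minus baseline, so a negative mean change denotes
    a reduction in symptoms. Cohen d here is mean change / SD of change
    (an assumed variant), reported as an absolute value.
    """
    changes = [f - b for b, f in zip(baseline, final)]
    mean_change = mean(changes)
    abs_d = abs(mean_change / stdev(changes))
    n_reliable = sum(1 for c in changes if -c >= RELIABLE_CHANGE_POINTS)
    return mean_change, abs_d, n_reliable

# Made-up scores for four participants, for illustration only.
mean_change, abs_d, n_reliable = summarize_gad7_change(
    [15, 12, 10, 14], [9, 10, 5, 8]
)
print(mean_change, abs_d, n_reliable)
```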