Assessing accuracy of chat generative pre-trained transformer's responses to common patient questions regarding congenital upper limb differences
Journal Article
Overview
abstract
  • PURPOSE: The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients' frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options.
    METHODS: Two pediatric hand surgeons were queried regarding the FAQs they receive from parents about CULDs. Sixteen FAQs were input into ChatGPT-4.0 for each of the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were also queried, and all responses were graded by the surgeons on a scale of 1-4 based on the quality of the response. An independent chat, with no pretraining of the software, was used for each question to reduce memory-retention bias.
    RESULTS: Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades were assigned to the 82 ChatGPT responses: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried about medical conditions associated with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes but did not mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that patients consult a health care provider for individualized care 81 times across 49 responses. It most commonly "referred" patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons or orthopedic surgeons (n = 16, 20%) and hand surgeons (n = 9, 11%).
    CONCLUSIONS: Chat Generative Pre-Trained Transformer provided evidence-based responses that required no clarification for a majority of FAQs about CULDs. However, there was considerable variation across responses, and it rarely "referred" patients to hand surgeons. ChatGPT and similar large language models are new tools for patient education, but patients seeking information about CULDs should approach them cautiously: responses do not consistently provide comprehensive, individualized information, and 8% were misleading.
    TYPE OF STUDY/LEVEL OF EVIDENCE: Economic/decision analysis IIC.

    publication date
  • 2025
    published in
Research
    keywords
  • Artificial Intelligence
  • Consumer Health Information
  • Hand
  • Orthopedics
  • Pediatrics
  • Surgery
Additional Document Info
    volume
  • 7
    issue
  • 4