Abstract
Background/Aim: The integration of artificial intelligence (AI) and natural language processing technologies, such as ChatGPT, into surgical practice has shown promising potential to enhance various aspects of abdominopelvic surgical procedures. This systematic review evaluates the current state of research on the applications and impact of AI and ChatGPT in abdominopelvic surgery, summarizing the existing literature to provide an overview of the applications, effectiveness, challenges, and future directions of these technologies. Materials and Methods: A systematic search of major electronic databases, including PubMed, Google Scholar, Cochrane Library, and Web of Science, was conducted from October to November 2023 to identify relevant studies. Inclusion criteria encompassed studies that investigated the use of AI and ChatGPT in abdominopelvic surgical settings, including, but not limited to, preoperative planning, intraoperative decision-making, postoperative care, and patient communication. Results: Fourteen studies met the inclusion criteria and were included in this review. Most studies analysed ChatGPT’s data output and decision-making, while two reported how patients and general surgery residents perceived the tool when applied to clinical practice. Most studies reported high accuracy of ChatGPT in data output and decision-making, albeit with a non-negligible number of errors. Conclusion: This systematic review contributes to the current understanding of the role of AI and ChatGPT in abdominopelvic surgery, providing insight into their applications and impact on clinical practice. The synthesis of available evidence will inform future research directions, clinical guidelines, and the development of these technologies to optimize their potential benefits in enhancing surgical care within the abdominopelvic domain.
In recent years, abdominal surgery has been undergoing a transformative shift driven by advancements in artificial intelligence (AI) and natural language processing. Among these technologies, ChatGPT, a large language model developed by OpenAI, has emerged as a promising tool with the potential to change surgical practice. As a novel application of AI, ChatGPT holds the promise of enhancing communication, decision-making, and information retrieval in the complex and dynamic environment of abdominal surgery. Indeed, its ability to understand and generate human-like text might be integrated into clinical practice to improve interaction between surgical teams, decision support, and patient education. With regard to medical knowledge, an earlier version of ChatGPT was shown to perform at or near the passing threshold of 60% accuracy on the United States Medical Licensing Exam (USMLE) (1, 2).
This review aims to provide a comprehensive exploration of the current utilization of ChatGPT in abdominal surgery, shedding light on its evolving role, applications, and implications for both surgeons and patients. By examining the current state of research, practical implementations, and future directions, it aims to offer insights into the potential of ChatGPT to optimize various facets of abdominal surgery.
From preoperative planning and intraoperative decision-making to postoperative care and patient communication, the integration of ChatGPT introduces a new dimension to surgical practice. The present review explores the impact of this technology on surgical workflows, its role in enhancing precision, and the challenges and opportunities associated with its adoption in the clinical setting. As the surgical community embraces the digital era, understanding the nuances of ChatGPT’s integration into abdominal surgery becomes paramount for harnessing its capabilities effectively and ensuring optimal patient outcomes.
As we navigate this dynamic intersection of artificial intelligence and surgery, this systematic review aims to offer a panoramic view of the current landscape, fostering a deeper understanding of ChatGPT’s potential and paving the way for further research and innovation in abdominal surgery (3).
Materials and Methods
Search strategy. The systematic review adhered to the guidelines set forth by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and was prospectively registered with the International Prospective Register of Systematic Reviews (PROSPERO) prior to commencing data extraction. A comprehensive search was conducted across multiple databases, including PubMed, Google Scholar, and ScienceDirect, with the search period spanning from October to November 2023. The search was restricted to articles published in English, and no additional filters were applied. The initial database retrieval involved screening by title and abstract, employing specific search terms, such as “chatgpt”, “abdominal”, “general”, “surgery”, and “artificial intelligence” (4).
Data extraction. The Rayyan software (Qatar Computing Research Institute, HBKU, Doha, Qatar) was used for the selection process. Following the elimination of duplicate publications, the titles, abstracts, and keywords were independently screened for inclusion by M.G. and M.P. Eligible articles then underwent a full-text review. Where inclusion decisions differed, a consensus was reached through discussion with a third author (G.G.). The inclusion criteria centred on articles describing the adoption of ChatGPT in the abdominopelvic surgical field. Articles that did not use the tool in this field were excluded, as were abstracts, reviews, meta-analyses, letters, and editorials. Studies reporting on ChatGPT as a tool for article writing were also excluded, as were studies in fields other than abdominopelvic surgery, such as plastic surgery or otorhinolaryngology. Subsequently, relevant data pertaining to authors, journal, field of application, aims, and study details were extracted for analysis.
Results
The search strategy identified studies reporting the use of ChatGPT in the field of abdominopelvic surgery. Initially, 112 studies were identified, of which 34 were selected for full-text review after title and abstract screening. Finally, 14 studies met the criteria for inclusion in the systematic review (Figure 1). Owing to the heterogeneity of the reports in data collection and study outcomes, a quantitative analysis of the results was not appropriate.
Figure 1. PRISMA flow diagram of study selection.
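The selection counts reported above can be tallied in a few lines. The following is a minimal sketch that uses only the three counts stated in the Results (112 identified, 34 selected for full-text review, 14 included); the variable and function names are illustrative assumptions. Because the text does not report duplicates separately, the first difference covers both duplicates and records excluded at title and abstract screening.

```python
# Counts reported in the Results text for each selection stage.
identified = 112          # records identified by the search
full_text_assessed = 34   # records selected for full-text review
included = 14             # studies included in the review


def stage_drop(before: int, after: int) -> int:
    """Return the number of records not carried forward between two stages."""
    if after > before:
        raise ValueError("a later stage cannot hold more records than an earlier one")
    return before - after


# Duplicates plus records excluded at title/abstract screening (not split in the text).
not_carried_to_full_text = stage_drop(identified, full_text_assessed)
# Records excluded after full-text review.
excluded_at_full_text = stage_drop(full_text_assessed, included)
# Share of identified records that reached the final synthesis, in percent.
inclusion_rate = round(100 * included / identified, 1)

print(not_carried_to_full_text, excluded_at_full_text, inclusion_rate)
```

Under these counts, 78 records were not carried forward to full-text review, 20 were excluded at full-text review, and 12.5% of the identified records were included.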
All included studies were published in 2023. Most (57.1%) were from the field of general surgery (GS); the remainder (42.9%) were from other fields of abdominopelvic surgery, including urology and pediatric, bariatric, and gynaecologic surgery.
Of the included studies, eight (50%) addressed the capability of the system to generate data output (3, 5-9); three (24.4%) considered decision-making accuracy in a tumour board (10) or in an emergency/intraoperative setting; two (14.2%) examined the perception of response quality by medical students and patients (11-14); one (7%) assessed data collection and one (7%) data generation (15, 16) (Table I).
Table I. Details of selected studies.
Discussion
Summary of main results. This systematic review encompassed a diverse range of investigations into the capabilities and limitations of ChatGPT within the medical domain and specifically the surgical field. The analysis revealed that while ChatGPT demonstrated a comprehensive understanding of established general surgical knowledge, its capacity to generate ground-breaking concepts or discoveries beyond existing paradigms was limited. The potential for ChatGPT to serve as an informative resource for patients was acknowledged, albeit accompanied by concerns regarding information accuracy and the generation of non-existent references. ChatGPT was also able to give useful responses to clinical queries, although further refinement is imperative. Patient perceptions, particularly among those with lower education levels, shifted negatively following explanations provided by the AI chatbot. One study underscored ChatGPT’s feasibility in data collection scenarios, indicating its potential utility in certain research contexts. Additionally, a substantial inconsistency in responses to repeated queries was observed. The results also emphasized the dependence of ChatGPT’s output quality on the prompt and highlighted the necessity for content verification, along with secure integration into electronic health records, before adoption in healthcare systems. The synthesis of findings elucidated both the promise and the challenges associated with ChatGPT’s integration into clinical medicine, emphasizing the importance of collaboration between AI and human expertise (17, 18).
Indeed, ChatGPT is a highly innovative tool but is still difficult to use and to incorporate into current clinical practice. The topic most explored by authors across all the surgical specialities considered is data and information generation, that is, the use of ChatGPT as an interactive colleague that can provide answers to complex clinical questions in real time. For example, Hermann et al. queried the system with a questionnaire about cervical cancer, covering its prevention, diagnosis, treatment, and survival/quality of life (QoL), and rated the system’s answers on a scale ranging from ‘correct and appropriate answer’ to ‘completely incorrect and inappropriate answer’. They found high rates of correct and appropriate answers (91.7% to 93.8%) for prevention and survival/QoL, a modestly lower rate for treatment (71.4% correct answers), and an entirely insufficient rate for diagnosis (33% correct answers) (9). Samaan et al. also reported on the appropriateness of ChatGPT-generated answers to bariatric surgery questionnaires. In this case, the system produced 25% partially correct answers, with rare cases of completely incorrect answers (8).
The need for such software in the surgical field stems from the fact that numerous studies have shown the tendency of patients to search online for answers to clinical, surgical, and medical questions. However, the information conveyed via online search engines is frequently affected by misinformation and is difficult to interpret. A search tool capable of generating correct, appropriate, and human-like answers could provide great support for the patient and a foothold for doctor-patient communication, which remains a weak point. However, other studies investigating ChatGPT data output registered lower rates of accuracy, underlining the limitations of large language models and highlighting the need to use them in conjunction with human expertise and judgment (7). Beaulieu-Jones et al. evaluated ChatGPT responses using two commonly used surgical educational resources: the Surgical Council on Resident Education (SCORE) and Data-B (1). They registered around 68-71% correct answers and attributed the inaccurate responses to inaccurate information in a complex question (n=16, 36.4%), inaccurate information in a fact-based question (n=11, 25.0%), and accurate information with circumstantial discrepancy (n=6, 13.6%). Moreover, they reported a substantial inconsistency in ChatGPT responses to repeated queries. Nevertheless, all the studies in this review that discuss the use of ChatGPT for data output agree that its accuracy in answering clinical questions is high but not infallible, and that it still has certain limitations, such as the production of non-existent or incorrect references and difficulty in understanding non-linear questions or in answering repeated questions (3, 5).
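The error breakdown attributed to Beaulieu-Jones et al. above gives counts and percentages but no denominator. As a purely arithmetic cross-check, the sketch below searches for the totals that reproduce each reported percentage at one decimal place. The function name, the search bound of 200, and the resulting denominator are inferences from the quoted figures, not values stated in the text.

```python
# (count, reported percentage) pairs quoted in the paragraph above.
reported = {
    "inaccurate information in a complex question": (16, 36.4),
    "inaccurate information in a fact-based question": (11, 25.0),
    "accurate information with circumstantial discrepancy": (6, 13.6),
}


def consistent_denominators(n: int, pct: float, upper: int = 200) -> set[int]:
    """Totals d for which n/d, as a percentage rounded to one decimal, equals pct.

    The upper bound of 200 is an arbitrary assumption for this sketch.
    """
    return {
        d for d in range(n, upper + 1)
        if abs(round(100 * n / d, 1) - pct) < 1e-9
    }


# Denominators compatible with every reported category at once.
common = set.intersection(
    *(consistent_denominators(n, pct) for n, pct in reported.values())
)
common_sorted = sorted(common)

# Responses not covered by the three listed categories, for each candidate total.
listed_total = sum(n for n, _ in reported.values())
unlisted = {d: d - listed_total for d in common_sorted}

print(common_sorted, listed_total, unlisted)
```

Within this search range, the only total consistent with all three percentages is 44, so the three listed categories cover 33 responses and 11 fall outside them. This is an inference from the rounding alone and should be checked against the original study before it is relied on.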
On the other hand, some of these studies concern the user’s perception of ChatGPT. In particular, the perceptions of general surgery residents regarding the role of artificial intelligence in medicine and those of patients confronted with certain medical explanations provided by ChatGPT were investigated.
In the first case, the trainees welcomed the ChatGPT technology positively as an added value to their training in terms of research information and clinical collaboration for the improvement of both training and patient care. In the second case, on the other hand, opinions were more heterogeneous and not always positive. The authors identified negative opinions especially in patients with a lower level of education (13, 14, 19).
Another aspect of using ChatGPT as a support tool for the medical profession is decision-making support. Here, studies have evaluated the opinions produced by ChatGPT when faced with complex clinical decisions, for example within a tumour board planning the correct therapy for breast cancer, or in the management of complex clinical conditions such as pancreatitis.
However, the software, while showing promising capabilities, did not prove infallible and was found to be out of date with the latest therapies and guidelines (10, 11, 20, 21).
Alongside decision-making and data generation, ChatGPT’s capabilities include text generation, which is perhaps the most direct use of the software. In fact, one study used the tool to generate surgical notes with good results. However, the quality of the reports produced depended strictly on the user’s prompts and instructions; the tool therefore simplified and sped up the work without eliminating human error (16). Finally, given these premises and the rapid improvements over time, it is worth considering that ChatGPT may in the future be listed among the authors who collaborated in drafting a scientific paper, making a substantial contribution not only to structure and methodology but also to content (6).
Results in the context of published literature. As part of the generative pre-trained transformer (GPT) family of models, specifically based on GPT-3.5 architecture, ChatGPT represents a model trained on a diverse range of internet text and capable of understanding and generating coherent and contextually relevant text. “Chat” in ChatGPT refers to its ability to generate human-like responses in a conversational format.
It’s important to note that while ChatGPT can produce impressive outputs, it still has many limitations. It may generate incorrect or nonsensical information, be sensitive to input phrasing, and might not always ask for clarification in case of ambiguous queries. OpenAI has made efforts to improve upon these limitations, and it is a part of ongoing research in the field of natural language processing and AI (17, 22, 23).
The systematic review presented in the paper sheds light on the expanding role of artificial intelligence (AI), particularly leveraging ChatGPT, in the realm of abdominopelvic surgery. The integration of AI into surgical practices has the potential to revolutionize patient care and procedural outcomes.
Implications for practice and further research. The utilization of ChatGPT in medicine brings forth significant implications for both clinical practice and future research. While the model showcases a commendable understanding of medical knowledge and offers potential benefits, it also poses challenges that need to be carefully addressed. In practice, ChatGPT could serve as a valuable resource for disseminating timely and comprehensible medical information to patients, enhancing health literacy. However, the identified issues of inaccuracy and the generation of non-existent or erroneous references underscore the importance of thorough validation and caution in relying solely on AI-generated content for critical medical decisions. The observed negative perception changes among certain patient groups further emphasize the need for tailored communication strategies and consideration of patient demographics in deploying AI-driven tools. Future research should delve into refining ChatGPT’s accuracy and reliability, exploring its impact on clinical decision-making processes, patient outcomes, and healthcare provider workflows. Additionally, investigations into ethical considerations, privacy safeguards, and secure integration with electronic health records are paramount to ensure responsible and secure implementation in medical practice. Continuous collaboration between AI developers, healthcare professionals, and researchers is crucial to harness the potential benefits of ChatGPT while mitigating its limitations for safe and effective integration into medical settings (18, 24-27).
Strengths and weaknesses. This is the first systematic review performed according to the PRISMA statement on the adoption of ChatGPT in abdominopelvic surgery. The study is an early exploration of a very recent subject: all existing literature on this topic emerged in 2023, underscoring the contemporary nature of the research. Nevertheless, the very novelty of the subject contributes to a notable diversity in objectives and outcomes, which makes it difficult to establish measurable or comparable parameters across studies. While the systematic review highlights the potential benefits of AI in abdominopelvic surgery, it is crucial to acknowledge the challenges and considerations. Issues such as data security, model interpretability, and ethical considerations must be addressed to ensure the responsible and effective integration of AI into surgical workflows. Moreover, the review emphasizes the need for continuous model training to enhance stability and consistency in responses (18).
Conclusion
This systematic review provides a comprehensive overview of the current landscape and future potential of AI language models in the surgical setting. While recognizing the remarkable advancements, the review encourages a cautious and thoughtful approach to address challenges and optimize the collaborative relationship between surgeons and AI technologies. As the field continues to evolve, ongoing research and technological refinement are essential to unlock the full benefits of AI in abdominopelvic surgery.
Declaration
The Authors declare that AI was not used in any part of the manuscript composition process.
Footnotes
Authors’ Contributions
PA approved the final version to be published; MP and MG conceived, designed, and wrote the study; MY provided data; MP collected data; GG analysed data; NP critically revised the article.
Conflicts of Interest
The Authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Received January 10, 2024.
- Revision received February 27, 2024.
- Accepted March 11, 2024.
- Copyright © 2024, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).