Abstract
Background/Aim: Demographic change and increasing complexity of therapy decisions lead to a growing burden on the healthcare system, necessitating efforts to simplify and enhance the efficiency of patient care. The present study evaluates ChatGPT’s ability to provide therapy recommendations for gynecological malignancies that are both in line with the local guidelines and individually tailored to the patient. Patients and Methods: Sixteen patients with endometrial, cervical, and ovarian cancer who were treated in the gynecological clinic of the University Hospital Erlangen from January 2022 to August 2023 were included in the analysis. Data collected within clinical routine care were communicated to the chat-based AI model ChatGPT (version 3.5). ChatGPT’s performance in generating treatment plans was evaluated using an answer scoring system and descriptive analysis. Results: According to the answer scoring system [range: −1 point (minimum) to 2 points (maximum)], ChatGPT demonstrated good potential in generating therapy recommendations, with an average score of 0.75 points for patients with ovarian cancer, 0.7 points for patients with cervical cancer and 1.5 points for patients with endometrial cancer. The most common reason for point deductions was incomplete therapy recommendations, whereas contraindicated treatment modalities were rarely suggested. Individual patient characteristics were regularly considered by ChatGPT. ChatGPT reliably indicated aftercare and provided detailed information on preventive measures as well as supportive treatment. Conclusion: ChatGPT is a promising tool for the generation of therapy suggestions for gynecological carcinomas with high flexibility in response to individual patient differences. At the current state, however, ChatGPT is not suitable for replacing expert panels.
Due to demographic change with an increase in the older population, a growing burden on the healthcare system is to be expected (1, 2), necessitating efforts to simplify and enhance the efficiency of patient care. In gynecological oncology, the complexity of therapy decisions is escalating. Molecular classifications, evolving surgical recommendations, and novel therapeutics like checkpoint inhibitors and PARP [poly(ADP-ribose)-polymerase] inhibitors demand personalized approaches for endometrial, cervical, and ovarian cancers (3-5). The implementation of artificial intelligence (AI) and machine learning to address increasingly complex patient care is currently being intensively investigated in the oncological field (6-8). Tested applications include the detection of risk factors, generation of therapy recommendations and diagnostics (6-8). One of the most advanced and openly accessible examples of AI is the chatbot ChatGPT (Chat Generative Pre-trained Transformer), developed by OpenAI in San Francisco, USA. It is capable of understanding and generating human-like text. ChatGPT stands out because it has been trained on the most extensive dataset ever used by any AI (9, 10). Additionally, ChatGPT shows a high internal concordance (11). ChatGPT is able to pass medical exams, remarkably without specialized human training (10, 11). ChatGPT’s application in treatment decisions has been investigated for various carcinomas (10, 12-14). For example, its potential as a supportive tool for treatment decisions was shown in the primary diagnosis of early breast cancer (13). In another study, case summaries of brain glioma patients were used for the communication with ChatGPT; the AI was found to recommend useful therapy strategies but to take insufficient account of the patients’ functional status (12). However, the patient’s general state of health, including comorbidities, as well as social factors must be considered in the decision-making process.
While interdisciplinary tumor boards (ITB) convene specialists from diverse fields and show numerous benefits (4, 5, 15, 16), the sheer volume of available and pertinent information for making optimal therapy decisions cannot be comprehensively compiled by human capacity alone. The present study evaluates ChatGPT’s ability to create treatment plans for gynecological malignancies that are both in line with the local guidelines and individually tailored to the patient.
Patients and Methods
Patient cohort. Ethics approval for the present work was granted by the ethics committee of FAU Erlangen-Nürnberg. Patients with endometrial, cervical, and ovarian cancer who were treated in the gynecological clinic of the University Hospital Erlangen (Erlangen, Germany) between January 2022 and August 2023 were included in the analysis. The only inclusion criteria were that the initial diagnosis of the gynecological carcinoma was made in 2022 and that the patient had already been presented to the interdisciplinary tumor board (ITB) at least once. Otherwise, the cohort comprised patients who differed in age, comorbidities, tumor stage, and histological and molecular subtype. The patients’ characteristics are depicted in Table I, Table II, and Table III.
Ovarian carcinoma patients: patient characteristics with corresponding treatment recommendations from ChatGPT (=AI-R) and the interdisciplinary tumor board (ITB).
Cervical carcinoma patients: patient characteristics with corresponding treatment recommendations from ChatGPT (=AI-R) and the interdisciplinary tumor board (ITB).
Endometrial carcinoma patients: patient characteristics with corresponding treatment recommendations from ChatGPT (=AI-R) and the interdisciplinary tumor board (ITB).
Methodology of data collection. Data collected within clinical routine care from medical history, clinical and instrumental examinations, laboratory diagnostics and imaging as well as further individual patient characteristics (e.g., social environment) were communicated to the chat-based AI model ChatGPT (version 3.5). The entered information corresponded exactly to the data presented in the ITB. Only patient-identifying information, namely name, birth date, contact information, provenance and attending physicians, was explicitly excluded from the query. The study included initial diagnoses and second opinions. In addition, several patients were discussed in the tumor board multiple times during the course of their disease. The information from each individual tumor board held between January 2022 and August 2023 was presented to ChatGPT step-by-step in individual queries, and the treatment suggestions were summarized in one treatment plan. This approach was chosen in order to reflect reality as accurately as possible. At the time of the analysis, the training data of ChatGPT (version 3.5) contained information up until September 2021. The communication with ChatGPT was performed in a standardized manner in German, with “Bitte gib mir eine Therapieempfehlung analog der aktuellen S3-Leitlinie für folgende Patientin” (English: “Please give me a therapy recommendation analogous to the current S3 guideline for the following patient”) as an introductory prompt. To prevent previous responses from affecting the results generated by the model, a fresh ChatGPT session was initiated for every new prompt. Two different investigators presented the data to ChatGPT a total of four times on four different days in order to check the reliability of the answers. In this way, four treatment plans were created for each patient by the AI, two based on the prompts from investigator A and two based on the prompts from investigator B.
Five patients with ovarian cancer, five patients with endometrial cancer and six patients with cervical cancer were analyzed, resulting in a total of 20-24 treatment plans per tumor entity. Hereafter, the abbreviation “AI-R” (for “AI-response”) is used synonymously with the term “treatment plan” in order to present the results as simply as possible.
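The number of treatment plans per tumor entity follows directly from the query protocol described above, in which each patient was presented four times (two investigators, two runs each). The following minimal Python sketch reproduces this arithmetic; the variable names are illustrative assumptions and not part of the study.

```python
# Patients per tumor entity, as reported in the text.
patients_per_entity = {"ovarian": 5, "endometrial": 5, "cervical": 6}

# Two investigators (A and B) each presented every case twice.
investigators = 2
runs_per_investigator = 2
plans_per_patient = investigators * runs_per_investigator

# Number of AI-Rs (treatment plans) generated per tumor entity.
plans_per_entity = {
    entity: n_patients * plans_per_patient
    for entity, n_patients in patients_per_entity.items()
}
total_plans = sum(plans_per_entity.values())

print(plans_per_entity)
print(total_plans)
```

This yields 20 AI-Rs each for ovarian and endometrial cancer and 24 for cervical cancer, in line with the range of 20-24 stated above.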
Data analysis. Answer scoring system. In order to illustrate the performance of the AI, a score was calculated for each AI-R as well as for each tumor board decision following the model of Lukac et al. (see Table IV) (13).
Answer scoring system, modified according to Lukac et al. (13).
Descriptive analysis. Besides the answer scoring system, the data were primarily reviewed descriptively. ChatGPT-generated therapy decisions for cervical carcinoma were evaluated using the three categories of surgical treatment, lymphadenectomy [including the sentinel node biopsy (SNB) procedure] and radiochemotherapy (RCT). Endometrial and ovarian cancer were each analyzed on the basis of four variables: surgical treatment, lymphadenectomy (including SNB), radiotherapy, and chemotherapy for endometrial cancer; and surgical therapy, chemotherapy, targeted therapies with vascular endothelial growth factor (VEGF) antibodies or PARP inhibitors, and genetic consulting for ovarian cancer. Table I, Table II, and Table III compare ChatGPT’s responses with the results of the ITB. In addition to the variables shown in Table I, Table II, and Table III, the recommendations for follow-up examinations and monitoring were also evaluated for each entity. It was also analyzed whether treatment plans included psychosocial support, supportive therapy, and interdisciplinary care.
Results
A total of 16 patients with gynecological carcinomas were identified, including five each with ovarian and endometrial carcinoma and six with cervical carcinoma. The age of the patients examined ranged from 27 to 86 years. The study included a comprehensive examination of both early-stage carcinomas and advanced tumors [Fédération Internationale de Gynécologie et d’Obstétrique (FIGO) IA-IVB]. Notably, the endometrial carcinoma group primarily consisted of early-stage carcinomas, while the ovarian carcinoma group was characterized by a higher proportion of advanced tumors. ChatGPT was able to successfully design four treatment plans for each patient.
Ovarian carcinoma. Answer scores for ovarian carcinoma patients. Across all queries for ovarian carcinoma (OC), ChatGPT achieved an average score of 0.75 points according to the answer scoring system [range: −1 point (minimum) to 2 points (maximum)]. For patients 1-4, 11 out of 16 AI-Rs were awarded one point, as the proposed treatments were correct and described in detail, but individual treatment steps (e.g., genetic testing, further surgical interventions or targeted therapy with VEGF and PARP inhibitors) were missing from the treatment plan (see Table I). For patient 5, one point was deducted from each AI-R for mentioning chemotherapy despite the patient’s limited general condition. A score of 0 points was awarded a total of five times among all treatment plans for ovarian cancer (5 out of 20 AI-Rs) because, in addition to detailed though incomplete therapy recommendations, incorrect treatment modalities such as radiotherapy were also suggested by ChatGPT (see Table I). The maximum score was not achieved in any AI-R in the OC subgroup.
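The reported mean score for the OC subgroup can be reproduced from the tallies above. The following Python sketch assumes that the four AI-Rs for patient 5 each ended at 1 point after the deduction; this per-plan value is not stated explicitly in the text and is inferred here because it is the tally consistent with the reported mean. Variable names are illustrative.

```python
# Score tallies for the ovarian carcinoma subgroup (20 AI-Rs in total).
one_point_patients_1_to_4 = 11  # stated in the text
zero_point_all_patients = 5     # stated in the text
# ASSUMPTION: the four AI-Rs for patient 5 each ended at 1 point
# (inferred, not stated explicitly in the text).
one_point_patient_5 = 4

scores = (
    [1] * (one_point_patients_1_to_4 + one_point_patient_5)
    + [0] * zero_point_all_patients
)

# Arithmetic mean over all AI-Rs of the subgroup.
mean_score = sum(scores) / len(scores)
print(len(scores), mean_score)
```

Under this assumption, 15 AI-Rs with one point and five with zero points give a mean of 0.75 over 20 AI-Rs.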
Surgical treatment. In the present study, the ITB identified an indication for surgical treatment in all of the OC patients. With the exception of one patient, the recommendations by ChatGPT were consistent with this. Patient 4, a 45-year-old perimenopausal woman with a low-grade serous carcinoma of the right ovary, contacted the clinic for a second opinion after having undergone hysterectomy, salpingectomy on both sides, biopsies of the ovaries and omentectomy at another hospital. The ITB decided on a completing surgery with ovariectomy on both sides, appendectomy as well as pelvic and paraaortic lymphadenectomy. ChatGPT failed to recognize the need for further surgical treatment. Another case worth discussing was patient 5, an 86-year-old woman with a number of concomitant diseases (including dementia, arterial hypertension and severe stress incontinence). Both the ITB and ChatGPT suggested surgical treatment. However, ChatGPT pointed out that surgery must be carefully considered in this patient due to her general state of health and advanced age. As an alternative to radical surgery, ChatGPT suggested less invasive surgical methods, such as laparoscopic removal of the tumor without further interventions on other organs. Another critical observation was ChatGPT’s tendency to omit salpingectomy and lymphadenectomy from surgical recommendations. However, hysterectomy and ovariectomy as well as peritoneal sampling and removal of all other affected tissue in the sense of a debulking operation were reliably mentioned by ChatGPT.
Chemotherapy and other systemic therapies. Chemotherapy was suggested by the ITB for four out of five ovarian cancer patients. The ITB did not recommend systemic therapy for patient 5 due to her known dementia. However, all of ChatGPT’s treatment plans for the patients with carcinoma of the ovaries covered chemotherapy. The AI was able to identify carboplatin and paclitaxel as suitable pharmaceutical agents for this treatment.
Other systemic therapies indicated by the ITB included VEGF antibody and PARP inhibitor therapy. ChatGPT recommended targeted therapy or immunotherapy for all patients with higher FIGO stages (IVA and B), in line with the German guidelines (5, 17). For one patient with ovarian cancer of FIGO stage IVA, the AI suggested maintenance therapy with bevacizumab after completion of chemotherapy with carboplatin and paclitaxel in three out of four treatment plans. PARP inhibitors were not explicitly mentioned once by ChatGPT. However, in two patients with FIGO stage IV ovarian cancer, the AI suggested testing the tumor tissue for specific mutations, in order to check the eligibility for targeted therapy. It should be mentioned that even the ITB did not consistently indicate PARP inhibitor therapy for patients with FIGO stage IV. This can be explained by the fact that genetic testing had not yet been carried out at the time of the treatment decision and the choice of a PARP inhibitor was directly dependent on this test result (5, 18, 19).
For a patient with low-grade serous ovarian carcinoma, both ITB and ChatGPT recommended endocrine therapy based on the hormone receptor status of the tumor. ChatGPT suggested postoperative hormone replacement therapy after bilateral adnexectomy for pre- and perimenopausal women only after careful consideration of the risks, reflecting current guideline ambiguities (5).
Genetic consulting. ChatGPT recommended genetic consulting to four out of five ovarian cancer patients, accurately identifying relevant genes like BReast CAncer 1/2 (BRCA 1/2). It appropriately did not suggest genetic testing for an 86-year-old patient with no family history, in line with current guidelines (20, 21). For two patients, genetic consulting was consistently advised in all treatment plans. For the remaining two, ChatGPT’s recommendations varied, potentially influenced by one patient’s explicit opposition to genetic testing, demonstrating its capacity to integrate patient preferences into its suggestions.
Further recommendations. Radiotherapy, typically not established in ovarian cancer treatment (5), was included in seven of ChatGPT’s 20 treatment plans. ChatGPT frequently suggested additional diagnostics, like supplemental imaging and tissue biopsies. It often specified biopsy methods, explicitly recommending surgical sampling and rightly advising against transdermal puncture for ovarian tumors due to metastasis risk (22). The AI also proposed cytology from peritoneal fluid in a case with ascites.
Cervical carcinoma. Answer scores for cervical carcinoma patients. In early-stage cervical carcinoma (CC) patients 6 and 7, ChatGPT performed relatively poorly, receiving 0 points for incomplete recommendations in five out of eight AI-Rs (see Table II). For patients with advanced tumor stages (patients 8-11), ChatGPT’s recommendations were more frequently in line with those of the ITB (see Table II). A score of at least 1 was achieved in 12 out of 16 AI-Rs, and 0 points were awarded only four times (4 out of 16 AI-Rs). Across all queries for cervical carcinoma, ChatGPT achieved an average score of 0.7 points according to the answer scoring system.
Surgical treatment. Surgical resection of the tumor was suggested in two out of six cervical carcinoma patients by our ITB in accordance with the local guidelines. As one can see from Table II, there is a clear discrepancy with the answers from ChatGPT at this point. In patients 6 and 7, who suffered from early-stage cervical carcinoma (FIGO ≤ IA2), ChatGPT did not consistently suggest surgical treatment, in contrast to ITB.
Patient 6, for example, was a 34-year-old premenopausal woman with completed family planning. At the time of her initial presentation at our clinic she had already undergone laser conisation with cervical curettage and the histopathological examination revealed residual tumor tissue (R1). ChatGPT only recognized the need for further surgical intervention in two out of four AI-Rs.
Patient 7, a 27-year-old premenopausal woman, was presented to the ITB following a trachelectomy with bilateral sentinel node biopsy (histopathological tumor stage according to TNM 2017: pT1a2 pN0 L0 V0 Pn0 R0). According to the current German guidelines, a radical trachelectomy is an adequate treatment option in this tumor stage if the patient desires to have children and the lymph nodes are histologically negative (4). In this individual case, as patient 7 had not yet completed family planning, not suggesting any further surgical intervention should not be considered a mistake. However, it was not explicitly stated by ChatGPT that a secondary hysterectomy should be performed after a successful pregnancy, especially in cases of HPV persistence, PAP abnormalities, desire for safety, or limited evaluability of the cervix (4, 23).
Although the AI did not suggest adnexectomy for patients 6 and 7, ChatGPT proposed postoperative hormone replacement therapy for both women if necessary due to their premenopausal status.
Radiochemotherapy. In patients 8-11, our ITB recommended primary radiochemotherapy without prior tumor resection. ChatGPT regularly recognized the indication for radiochemotherapy and was able to identify cisplatin as a suitable radiosensitizer. In two patients, the AI additionally addressed the radiotherapeutic procedures of percutaneous radiotherapy and brachytherapy. Nevertheless, ChatGPT recommended additional radical hysterectomy in multiple AI-Rs (see Table II).
Lymphadenectomy. A lymphadenectomy in the form of a sentinel node biopsy was suggested by the ITB for one patient with early-stage cervical carcinoma (patient 6) due to the tumor type (adenosquamous cervical carcinoma), high grading (G3) of the tumor cells and the patient’s young age. According to the current German guidelines, there is no clear indication for sentinel node biopsy in the presence of an early tumor stage FIGO IA1 and lack of lymph vessel invasion (4). The recommendation for sentinel node biopsy can certainly be discussed in this case. However, not suggesting this procedure in the treatment plan is clearly not a mistake on ChatGPT’s part.
Furthermore, a lymph node examination was indicated by the ITB in patients 8, 10 and 11, each with advanced cervical carcinoma. It is important to note that patient 10 was suspected to have FIGO stage III preoperatively and was only diagnosed with FIGO stage IV during the operation. ChatGPT recognized the indication for surgical lymph node examination in the majority of queries (8 out of 12 AI-Rs).
ChatGPT did not recommend unnecessary lymphadenectomy in patients who had already undergone this procedure (patients 7 and 9), demonstrating its capacity to integrate prior medical interventions into its treatment plans.
Further recommendations. In the present study, ChatGPT demonstrated its versatility in treatment planning for advanced cervical carcinoma, repeatedly suggesting immunotherapy and delving into specific therapies like checkpoint inhibition with pembrolizumab. It also emphasized comprehensive patient care, advocating for preventive measures, such as HPV (human papilloma virus) vaccination, lifestyle changes (e.g., smoking and alcohol abstinence, healthy diet, exercise, stress management, sexual abstinence), and genetic consulting when relevant.
Endometrial carcinoma. Answer scores for endometrial carcinoma patients. For two patients, ChatGPT scored maximum points for every single AI-R. In the other three cases, the maximum answer score was achieved with at least one of the four treatment plans. The results are shown in Table III. Across all queries for endometrial carcinoma (EC), ChatGPT achieved an average score of 1.5 points according to the answer scoring system.
Surgical treatment and lymphadenectomy. In the presented data, surgical treatment recommendations from ChatGPT aligned closely with the ITB, suggesting surgical treatment for all EC patients, and emphasizing the preference for minimally invasive procedures if the oncological risk permitted. ChatGPT explicitly recommended hysterectomy in most cases (18 out of 20 AI-Rs). However, bilateral adnexectomy was suggested less consistently, in only nine AI-Rs, although all the endometrial carcinoma patients analyzed were postmenopausal and this intervention would have been indicated according to the German guidelines (16, 24). In three further AI-Rs, ChatGPT did recommend at least ovariectomy.
The need for lymphadenectomy was well identified by ChatGPT (see Table III), with the exception of patient 16. It is important to note that patient 16 was initially only known to have atypical endometrial hyperplasia and that the plans for surgical treatment - both from ChatGPT and the ITB - were based on this information. The diagnosis of endometrial cancer was not made until hysterectomy. This explains why ChatGPT did not indicate lymphadenectomy for this patient. In this case, the ITB did not clearly recommend a lymphadenectomy either. Instead, the patient was only offered a sentinel node biopsy. Eventually, the patient opted for lymph node surgery, which resulted in a negative nodal status [pN0 (0/3)]. ChatGPT’s deviation in recommending endometrial ablation as an alternative to hysterectomy for patient 16, contrary to guidelines, underscores the AI’s limitation in discerning suitable surgical interventions in complex cases (16).
Radiotherapy. According to the ITB, two of the patients were eligible for radiotherapy. ChatGPT aligned with this recommendation in three out of four therapy plans. However, it should be mentioned here that the AI classified radiotherapy in one of the treatment plans as “not absolutely necessary”, but merely as “to be considered”. For those patients for whom radiotherapy was not indicated by the ITB, ChatGPT also consistently did not suggest radiation.
Chemotherapy and other systemic therapies. Chemotherapy was recommended by the ITB for one out of five patients with endometrial cancer. For this patient, three out of four treatment plans generated by ChatGPT also included chemotherapy. Conversely, the AI suggested chemotherapy twice for a patient without a corresponding indication (see Table III). Other systemic therapies proposed by ChatGPT included immunotherapy and endocrine therapy. It should be mentioned in advance that these types of therapy were not recommended by the ITB even once in the patients examined. ChatGPT suggested immunotherapy for a total of two patients. In one of these patients, the tumor cells showed aberrant strong nuclear overexpression of p53 (p53 mutation signature). The AI presented the possibility of using p53 as a target for immunotherapy. In another patient with a first diagnosis of endometrial carcinoma FIGO stage IB with grading G3 and mismatch-repair deficiency, ChatGPT suggested considering immunotherapy (e.g., with immune checkpoint inhibitors), as mismatch repair (MMR)-deficient tumors may respond to this. According to the current German guideline, patients with relapsed or primarily advanced endometrial cancer with microsatellite-instable/mismatch-repair deficient tumor tissue can be treated with immunotherapy after prior treatment with platinum-based chemotherapy (16). Dostarlimab or pembrolizumab are particularly suitable substances for this therapy (16, 24). ChatGPT’s suggestions for immunotherapy are consequently not supported by the current guidelines. However, its therapy recommendations indicate that the AI is aware of certain correlations, for example between MMR deficiency and response to checkpoint inhibitor therapy (16, 24-26), and that ChatGPT incorporates this information into its treatment plans. Endocrine therapy was suggested more frequently by ChatGPT, in 11 out of 20 AI-Rs. In many cases, the type of hormonal therapy was not specified. 
In response to one query, the AI specifically recommended hormone replacement therapy in the postoperative setting after hysterectomy. A total of four times, endocrine therapy with progesterone was listed as a treatment option. ChatGPT identified megestrol acetate as an example of a suitable substance for such therapy. The AI emphasized that hormonal therapy is useful either in the adjuvant setting or if patients are not eligible for surgical treatment or radiotherapy.
Further recommendations. Another suggestion ChatGPT mentioned very frequently - more precisely in 15 out of 20 AI-Rs - was genetic consulting. In a patient with a p53 mutation of the tumor tissue and a family history of pancreatic cancer, the AI pointed out the possible presence of Li-Fraumeni syndrome. In another case where mutS homolog 6 (MSH6) loss of the tumor cells was detected, ChatGPT advised testing for Lynch syndrome. There was also one patient whose tumor showed an MMR protein deficiency of mutL homolog 1 (MLH1) and postmeiotic segregation increased 2 (PMS2). In this case, in addition to genetic consulting, the AI recommended an MLH1 promoter methylation analysis for further clarification. Other examinations ChatGPT occasionally requested included mainly imaging examinations for more precise staging (e.g. MRI examinations), which was the case for four patients in one AI-R each.
As can be seen from its answers, ChatGPT based its suggestions for diagnostics and therapy on, among other things, immunohistochemical test results and, if these were not available, actively requested them. In particular, subtyping of the tumor based on microsatellite instability, p53 and DNA polymerase epsilon (Polε) mutation status was requested by ChatGPT multiple times. Using the molecular classification of endometrial carcinoma, the AI was also able to make statements about the prognosis of patients. For example, ChatGPT stated that Polε mutations can be associated with a more favorable prognostic profile.
Cross-entity observations. One observation for all carcinomas examined was that, depending on the patient’s history, the AI called for further diagnostics to be carried out pre-therapeutically, for example imaging with computer tomography or magnetic resonance imaging as well as clinical examination and determination of tumor markers. In the event of abnormalities in the staging, for example calcification in the liver, pulmonary lesions, splenic cysts, and nipple retraction, ChatGPT called for immediate diagnostic clarification or recommended a follow-up.
With regard to the indication of regular aftercare and follow-up examinations, ChatGPT was reliable in 93.75% of cases. For cervical carcinoma, the AI was able to correctly list the necessary examinations, for example gynecological examination, cytological smear test and, if necessary, HPV test, tumor marker controls, colposcopy, or imaging procedures. ChatGPT also provided details about the recommended check-ups for ovarian and endometrial cancer patients, including physical examinations, imaging, and tumor marker controls. In individual cases, ChatGPT gave additional information on the necessary follow-up intervals.
In ChatGPT’s responses, great importance was attached to supportive therapy, including analgesia, management of comorbidities as well as lifestyle adjustments, such as a balanced diet, regular physical activity, smoking cessation and abstaining from alcohol consumption. Depending on the patient’s complaints, specific suggestions were also made, for example pleural or ascites drainage. Interestingly, psychosocial support was suggested most frequently for ovarian cancer, in 65% of treatment plans (13/20 AI-Rs). In cervical cancer, this suggestion was made in 50% of cases (12/24 AI-Rs), and in endometrial cancer in only 20% (4/20 AI-Rs). One explanation for this could be that the patients with ovarian cancer in this study had on average a higher tumor stage than the patients of the CC and EC subgroups and therefore a significantly worse prognosis (27). It should be emphasized that these measures are part of the standard of care at our clinic, even without a prior tumor conference decision, and are therefore not regularly recorded in the ITB.
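The proportions of treatment plans suggesting psychosocial support reported above can be recomputed from the stated counts. The following Python sketch is given for illustration only; the counts are taken from the text, while the variable names are assumptions.

```python
# (AI-Rs suggesting psychosocial support, total AI-Rs) per tumor entity,
# as reported in the text.
psychosocial_support = {
    "ovarian": (13, 20),
    "cervical": (12, 24),
    "endometrial": (4, 20),
}

# Share of AI-Rs with a psychosocial support suggestion, in percent.
share_percent = {
    entity: 100 * suggested / total
    for entity, (suggested, total) in psychosocial_support.items()
}
print(share_percent)
```

The resulting shares of 65%, 50% and 20% match the percentages given for ovarian, cervical and endometrial cancer, respectively.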
ChatGPT repeatedly emphasized the need for interdisciplinary care, tailored to the individual patient. In the case of an elderly patient with severe stress incontinence, for example, the AI suggested geriatric and palliative care as well as urological treatment in addition to gynecological-oncological therapy. ChatGPT’s consideration of clinical trial participation, especially for ovarian and cervical carcinoma cases, further underscores its potential in supporting nuanced cancer treatment strategies.
Discussion
ChatGPT’s potential in tailoring gynecological cancer therapies. Unlike previous scientific work in the field of ChatGPT-based therapy decisions, which rely on standardized input forms, this study showed that ChatGPT was able to provide oncological treatment recommendations based on real individual patient cases from clinical practice. In this study, the information for the AI was not reduced to the essentials, such as tumor entity, tumor stage, immunohistochemistry, and patient age. Instead, the entered data corresponded exactly to the data presented in the ITB, including comorbidities, social environment, and the patient’s attitude towards certain treatment modalities.
ChatGPT demonstrated adaptability in cancer therapy recommendations, tailoring plans to individual patient profiles, including secondary diseases, social factors, and overall health. It suggested custom follow-ups for abnormal findings, involved family in decisions for a dementia patient, and considered advance directives. ChatGPT offered alternative treatments for older or medically complex patients, such as less invasive surgery for an elderly ovarian cancer patient, and acknowledged the challenges of certain interventions in advanced cases. This illustrates ChatGPT’s nuanced handling of the individual cases, which goes well beyond basic algorithmic decision-making.
ChatGPT consistently recommended comprehensive supportive therapy, covering pain management, stress avoidance, psycho-oncological care, and lifestyle adjustments. It provided specific instructions for symptom control and emphasized regular monitoring and treatment of pre-existing conditions, stressing the importance of interdisciplinary collaboration. Notably, when suggesting hormone replacement therapy, ChatGPT considered tumor characteristics to assess potential risks, demonstrating its ability to weigh the benefits and disadvantages of specific interventions.
ChatGPT identified risk factors for various carcinomas, recommending nicotine abstinence and HPV vaccination for CC patients, for example. However, it occasionally missed genetic testing indications unless a significant family history was present. Addressing this by pre-training ChatGPT on testing criteria could enhance its performance.
ChatGPT was able to identify carboplatin and paclitaxel as suitable for systemic treatment of ovarian and endometrial cancer. In the context of radiochemotherapy for cervical cancer, cisplatin was rightly suggested as an appropriate pharmaceutical agent. The VEGF inhibitor bevacizumab was correctly chosen in individual ovarian cancer cases. PARP inhibitors were not specifically listed; however, ChatGPT recommended targeted therapy depending on the genetic characteristics of the tumor for one patient with ovarian cancer. No immunotherapy with a PD-1 antibody was indicated by the ITB for any of the included patients. Accordingly, it remains unclear whether ChatGPT would have recognized the indication for such therapy on a regular basis. However, ChatGPT highlighted the potential of immunotherapy with checkpoint inhibitors in cases with advanced cervical and endometrial carcinoma, even specifying pembrolizumab as a suitable agent.
The AI regularly recognized the indications for surgery or radiotherapy. In this context, ChatGPT was also able to interpret immunohistochemical and molecular genetic findings, for example explicitly deciding against radiotherapy in the case of a Polε mutation. On the other hand, the AI did not always go into the details of these therapies and, when it did, individual steps were sometimes omitted. Because ChatGPT’s therapy suggestions were partially incomplete, it is best used as a supporting tool and cannot replace precise intervention planning by medical experts.
ChatGPT reliably recommended aftercare, including specific instructions for clinical examinations, tumor marker monitoring, and imaging. The AI also discussed aftercare intervals, especially for cervical and endometrial cancer, and suggested appropriate follow-up examinations in case of abnormal staging findings. On the other hand, ChatGPT also made some misjudgments, recommending adjuvant radiotherapy in ovarian cancer in single cases or suggesting endometrial ablation as an alternative to hysterectomy for one patient with atypical endometrial hyperplasia. ChatGPT also produced rather long answers containing numerous safety instructions for the user. Although this helps to prevent misuse of the provided information by medical laypersons, including patients and their relatives, it is impractical for clinical routine.
Limitations of the present work. It should be noted that all responses of ChatGPT were based on one-time queries and no further requests for clarification or summary of certain information were performed. This approach was chosen to ensure better comparability between the prompts. However, it remains to be investigated to what extent the AI is able to specify its answers on further inquiry.
ChatGPT version 3.5 (released on March 15, 2022) was used for this study, which was the latest version of the chatbot broadly available at the time of data retrieval. ChatGPT version 4.0, which has limited accessibility, was released on March 14, 2023, and was shown to have a higher accuracy compared to the older version 3.5 (28-30). However, current publications suggest that both the older version 3.5 and version 4.0 generate partially incorrect or non-existent references (28). Relying solely on ChatGPT to answer medical questions can therefore pose a safety risk. At the moment, ChatGPT is not suitable for making reliable treatment decisions in oncology.
When evaluating the answers of ChatGPT, it should also be emphasized that its dataset only contained information up to September 2021 at the time of the prompts. The current S3 guidelines for all carcinomas examined in the present work were published later, in 2022. Since ChatGPT is primarily English-based and a wide range of scientific work in the field of gynecological oncology is available in English only, the interaction with ChatGPT in German may represent a limitation of the present work. It is possible that providing ChatGPT with information in other languages would have led to different recommendations. Limitations of this study also include the small sample size and the inhomogeneity of the cohort.
The differences in the answer scores between the tumor entities are most likely due to the fact that the EC subgroup contained only early-stage carcinomas, albeit with different molecular biological risk profiles. In contrast, the OC and CC subgroups also included patients with advanced tumors, who naturally require more complex treatment plans.
Conclusion
The present feasibility study indicates that ChatGPT is a promising tool for generating therapy suggestions in gynecological oncology, with a strong focus on supportive care and individual patient differences. At the current state, however, ChatGPT is not suitable for replacing expert panels, and its use by medical laypersons for medical questions cannot be recommended. Integrating artificial intelligence into a trusting relationship between the physician and the patient will be a major challenge in the future. Instead of simple one-time queries, further work should also analyze real dialogues with ChatGPT and include future versions of the AI.
Acknowledgements
ChatGPT 4.0 (OpenAI, https://chat.openai.com) was used to proofread the final draft.
Footnotes
Authors’ Contributions
All Authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Annika Krückel and Lena Brückner. The first draft of the manuscript was written by Annika Krückel, and all Authors commented on previous versions of the manuscript. All Authors read and approved the final manuscript.
Conflicts of Interest
The Authors have no relevant financial or non-financial interests to disclose with regard to the present work.
- Received January 24, 2024.
- Revision received March 3, 2024.
- Accepted March 19, 2024.
- Copyright © 2024, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).