Abstract
Background/Aim: Although acute appendicitis (AA) and nonspecific abdominal pain (NSAP) are the most common diagnoses among secondary care patients with acute abdominal pain, the diagnostic performance of leucocyte count (LC) in a diagnostic score (DS) model is rarely considered. Patients and Methods: As an extension of the World Organization of Gastroenterology Research Committee (OMGE) acute abdominal pain study, 1,333 patients presenting with acute abdominal pain were included in the study. The clinical history and diagnostic symptoms (n=22), signs (n=14) and tests (n=3) of each patient were recorded in detail, and the collected data were related to the final diagnoses of the patients. Results: In the ROC comparison test, there was no statistically significant difference in the performance of DSLC− (DS without LC) and DSLC+ (DS with LC). The highest sensitivities of the DSLC− and DSLC+ tests for detecting AA were 86% (95%CI=81-90%) and 87% (95%CI=82-91%), respectively. The highest specificities of the DSLC− and DSLC+ tests for detecting AA were 98% (95%CI=97-99%) and 98% (95%CI=96-99%), respectively. Conclusion: DS could assist the clinician in differentiating AA from NSAP and other causes of acute abdominal pain. Importantly, LC does not improve the diagnostic performance of a DS in AA.
We have studied acute abdominal pain in connection with the survey on acute abdominal pain by the Research Committee of the World Organization of Gastroenterology (OMGE) (1) and investigated the diagnostic performance of history-taking and clinical examination in acute appendicitis (AA) (2), nonspecific abdominal pain (NSAP) (3), acute small bowel obstruction (4) and acute renal colic (5). Given that AA and NSAP are the most common diagnoses among secondary care patients with acute abdominal pain, the diagnostic performance of history-taking, clinical examination and a possible diagnostic score (DS) is extremely important. However, the differential diagnosis of AA and NSAP is not always easy because of the many similarities in clinical presentation at onset, and many cases may be misdiagnosed in the initial diagnostic setting. Although some DS models are available (2, 6-10) for the diagnosis of acute abdominal pain (AAP), there is continuing debate on their shortcomings. We therefore aimed to examine the performance of our DS model i) without leucocyte count (DSLC−) and ii) with leucocyte count (DSLC+).
Patients and Methods
Criteria for inclusion in this study and the diagnostic criteria were those set out by the OMGE Committee (1). There were 636 males (47.7%) and 697 females (52.3%) with a mean age (±SD) of 38.0±22.1 years (Table I).
The clinical findings in each patient were recorded in detail (Tables II and III), using a predefined structured data collection sheet. The disease history was recorded and categorised as shown in Tables II and III. The examination of the clinical symptoms, signs and tests was conducted using a standard technique and the results were graded as positive or negative (Tables II and III). The diagnosis of acute abdominal pain (AAP) was made by considering all symptoms, signs and laboratory test results together, along with the diagnostic criteria of AA defined elsewhere (1-3).
The likelihood ratio of a positive test result (LR+) shows how many times greater the probability of a positive test result is among patients with acute appendicitis (AA) than in subjects without AA. LR+ should always be higher than 1.0, and the LR+ of a good test (diagnostic method) is 10 or higher. The likelihood ratio of a negative test result (LR−) is the probability of a negative test result among patients with AA divided by the corresponding probability among the subjects without AA. LR− should be less than 1, and the LR− of a good test is less than 0.1.
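The two ratios defined above follow directly from sensitivity and specificity. The short Python sketch below is an illustration only, not part of the original analysis; the function name is hypothetical, and the example input (Se=0.82, Sp=0.95) is the best DS performance level reported in the Results.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for sensitivity and specificity given as fractions."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    # LR+ = P(test positive | AA) / P(test positive | no AA)
    lr_pos = sensitivity / (1.0 - specificity)
    # LR- = P(test negative | AA) / P(test negative | no AA)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Illustrative input: Se = 82%, Sp = 95%
lr_pos, lr_neg = likelihood_ratios(0.82, 0.95)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 16.4, LR- = 0.19
```

By the thresholds stated above, these inputs would give an LR+ above 10 but an LR− above 0.1.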
Statistical analysis. The diagnostic score (DS) was computed using stepwise multivariate logistic regression analysis in SPSS Statistics 26.0.0.1 (IBM, NY, USA). All the variables presented in Tables II and III were included in the analysis as binary data, e.g., AA (1) and NSAP (0). The multivariate analysis was used to disclose the variables with an independent predictive value. Using the coefficients of the regression model, a DS was built and its predictive value for AA was studied. The coefficient of the multivariate analysis shows the relative risk ($\mathrm{RR}=e^{n}$, $n=\beta$) of a patient with a given symptom or sign having AA.
The rest of the analyses were performed with STATA/SE version 16.1 (StataCorp, College Station, TX, USA). Statistical tests presented were two-sided, and a p-value <0.05 was considered statistically significant. Using 2×2 tables, we calculated sensitivity (Se) and specificity (Sp) with 95% confidence intervals (95%CI) for each symptom, sign or test, and created forest plots showing each set of data separately for each diagnostic variable. We calculated the summary estimates of Se and Sp, the positive (LR+) and negative (LR−) likelihood ratios and the diagnostic odds ratio (DOR), using a random-effects bivariate model, and fitted the summary hierarchical receiver operating characteristic (HSROC) curves including all diagnostic variables in the DSLC− and DSLC+ models, using the AA endpoint.
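As a minimal sketch of the 2×2 table step described above, the following Python code derives Se, Sp and DOR from the four cell counts. The counts are hypothetical and are not study data. The Wilson score interval is an assumption made for illustration, because the text does not state which interval method was used in STATA.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, not from the paper)."""
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def two_by_two(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity and diagnostic odds ratio from 2x2 cell counts."""
    return {
        "se": tp / (tp + fn),         # positives among patients with AA
        "sp": tn / (tn + fp),         # negatives among patients without AA
        "dor": (tp * tn) / (fp * fn)  # undefined when fp or fn is zero
    }


# Hypothetical counts for one diagnostic variable (not study data)
tp, fn, fp, tn = 90, 10, 20, 180
result = two_by_two(tp, fn, fp, tn)
se_lo, se_hi = wilson_ci(tp, tp + fn)
sp_lo, sp_hi = wilson_ci(tn, tn + fp)
print(f"Se = {result['se']:.2f} (95%CI={se_lo:.2f}-{se_hi:.2f})")
print(f"Sp = {result['sp']:.2f} (95%CI={sp_lo:.2f}-{sp_hi:.2f})")
print(f"DOR = {result['dor']:.1f}")
```

The bivariate random-effects pooling and the HSROC fit are not reproduced here; the sketch covers only the per-variable estimates that feed into them.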
Using Stata's predict command, we also made posterior predictions [empirical Bayes (EB) estimates] of the Se and Sp of each variable in DSLC− and DSLC+. Analogous to its use in meta-analysis, EB estimates here give the best estimates of the true Se and Sp for each diagnostic variable, the variable-specific point estimates usually shrinking toward the summary point of the HSROC. We explored the statistical heterogeneity between diagnostic variables (Tables II and III) and DS models (Tables II and III) through visual examination of the forest plots and the HSROC curves. To study potential bias, we used Cook's distance to check for particularly influential variables, together with a scatter plot of the standardised (level 2) residuals to check for variables that are distinct outliers.
Results
Diagnostic performance of the symptoms. The pooled overall Se and Sp of the diagnostic symptoms for detecting AA were 75% (95%CI=60-87%) and 35% (95%CI=23-49%), respectively (Figures 1 and 2). In 13 diagnostic symptoms the Se was higher than 75%, and the Sp was higher than 35% in 10 diagnostic symptoms. The five best diagnostic symptoms (vertigo, jaundice, micturition, drugs for abdominal pain and use of alcohol) showed 97-100% Se in the diagnosis of AA (Figure 1). The four best diagnostic symptoms showed 69-91% Sp, the initial pain being the most specific (91%) followed by the intensity of the abdominal pain, sex and location of the pain at diagnosis (Figure 2).
Diagnostic performance of the signs and tests. The pooled overall Se and Sp of the clinical signs and tests for detecting AA were 87% (95%CI=81-92%) and 38% (95%CI=19-59%), respectively (Figures 3 and 4). For 10 clinical signs and tests, the Se exceeded 87%, and the Sp was higher than 38% for 10 diagnostic signs. The best four clinical signs and tests (mass, urine, distension and Murphy's sign positive) showed 96-100% Se for AA (Figure 3). The best four clinical signs and tests showed 71-98% Sp, rigidity (98%) being the most specific, followed by rectal digital tenderness, leucocyte count (LC) and rebound (Figure 4).
Diagnostic performance of the DS without leucocytes (DSLC−). The most significant predictors were used to construct six different DSLC− formulas for AA diagnosis (Table IV). The pooled overall Se and Sp of these six DSLC− models for AA diagnosis were 77% (95%CI=70-84%) and 95% (95%CI=93-97%), respectively (Figures 5 and 6). The formula without LC showing the highest diagnostic performance for AA in HSROC analysis (formula DS I) is as follows: DSLC−=−1.72×guarding −0.56×type of pain −0.9×pain at diagnosis −1.36×tenderness −3.32×rigidity −1.1×vomiting −1.44×previous abdominal surgery +6.91, where each variable is coded 1 for a positive endpoint and 0 for a negative endpoint. The mean (SD) DSLC− value was −2.06 (2.2) for AA (n=270) and 3.38 (1.91) for NSAP (n=613). This formula showed Se of 82% (95%CI=77-86%) and Sp of 95% (95%CI=93-97%), which is the best diagnostic performance level for DS without LC (Figures 5 and 6).
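To make the arithmetic of the DSLC− formula concrete, it can be sketched in Python as below. The coefficients and intercept are those quoted in the text; the dictionary keys and the function name are hypothetical, and the coding of each variable as a positive or negative endpoint is assumed to follow Tables II and III. Lower values were observed in AA (mean −2.06) than in NSAP (mean 3.38), but no cut-off is stated in this section, so none is applied.

```python
# Coefficients quoted in the text for the best DSLC- formula (DS I)
DSLC_MINUS_WEIGHTS = {
    "guarding": -1.72,
    "type_of_pain": -0.56,
    "pain_at_diagnosis": -0.9,
    "tenderness": -1.36,
    "rigidity": -3.32,
    "vomiting": -1.1,
    "previous_abdominal_surgery": -1.44,
}
DSLC_MINUS_INTERCEPT = 6.91


def dslc_minus(findings: dict[str, int]) -> float:
    """Compute DSLC- from binary findings (1 = positive endpoint, 0 = negative)."""
    missing = set(DSLC_MINUS_WEIGHTS) - set(findings)
    if missing:
        raise ValueError(f"missing findings: {sorted(missing)}")
    for name in DSLC_MINUS_WEIGHTS:
        if findings[name] not in (0, 1):
            raise ValueError(f"{name} must be coded 0 or 1")
    weighted = sum(w * findings[name] for name, w in DSLC_MINUS_WEIGHTS.items())
    return DSLC_MINUS_INTERCEPT + weighted


# Range of the score: all endpoints negative versus all endpoints positive
all_negative = dslc_minus({name: 0 for name in DSLC_MINUS_WEIGHTS})
all_positive = dslc_minus({name: 1 for name in DSLC_MINUS_WEIGHTS})
print(f"{all_negative:.2f} {all_positive:.2f}")  # 6.91 -3.49
```

With these coefficients the score can range from −3.49 to 6.91, which brackets both reported group means.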
Diagnostic performance of the DS with leucocytes (DSLC+). The most powerful predictors were used to build up six different DSLC+ formulas for AA diagnosis (Table V). The pooled overall Se and Sp of these six DSLC+ models for AA diagnosis were 79% (95%CI=72-85%) and 95% (95%CI=93-97%) (Figures 7 and 8). At the best diagnostic performance level for AA, the DSLC+ (formula DS VII, Figures 7 and 8) showed Se of 82% (95%CI=76-86%) and Sp of 95% (95%CI=93-97%), which is the best diagnostic performance level for DS with LC (Figures 7 and 8).
The Se of the best DSLC− and DSLC+ formulas for detecting AA was equal: 82% (95%CI=77-86%) and 82% (95%CI=76-86%), respectively. The Sp of the two formulas was identical: 95% (95%CI=93-97%) for both. The formula with LC (DSLC+) showing the highest diagnostic performance for AA in HSROC analysis (Figure 7) is the following: DSLC+=−0.95×location of pain at diagnosis −1.77×previous abdominal surgery −1.16×rebound −1.61×guarding −3.32×rigidity −0.97×tenderness −2.2×LC +7.035, where each variable is coded 1 for a positive endpoint and 0 for a negative endpoint. The mean (SD) DSLC+ value was −2.50 (2.27) for AA (n=247) and 3.38 (2.12) for NSAP (n=492) (Figures 7 and 8).
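The contribution of the LC term to the DSLC+ formula can be illustrated with a small Python sketch. The coefficients and intercept are taken from the text; the class, field and function names are hypothetical, and the example patient is invented for illustration rather than drawn from the study.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Findings:
    """Binary findings; True = positive endpoint (coding assumed per Tables II and III)."""
    location_of_pain_at_diagnosis: bool
    previous_abdominal_surgery: bool
    rebound: bool
    guarding: bool
    rigidity: bool
    tenderness: bool
    leucocyte_count: bool


# Coefficients quoted in the text for the best DSLC+ formula
DSLC_PLUS_WEIGHTS = {
    "location_of_pain_at_diagnosis": -0.95,
    "previous_abdominal_surgery": -1.77,
    "rebound": -1.16,
    "guarding": -1.61,
    "rigidity": -3.32,
    "tenderness": -0.97,
    "leucocyte_count": -2.2,
}
DSLC_PLUS_INTERCEPT = 7.035


def dslc_plus(findings: Findings) -> float:
    """Compute DSLC+ as the intercept plus the weighted sum of positive findings."""
    values = asdict(findings)
    return DSLC_PLUS_INTERCEPT + sum(
        weight * int(values[name]) for name, weight in DSLC_PLUS_WEIGHTS.items()
    )


# Invented example: rebound, guarding and tenderness positive, LC negative
patient = Findings(False, False, True, True, False, True, False)
without_lc = dslc_plus(patient)
with_lc = dslc_plus(replace(patient, leucocyte_count=True))
print(f"{without_lc:.3f} {with_lc:.3f}")  # 3.295 1.095
```

A positive LC lowers the score by a fixed 2.2 points whatever the other findings are, which is the only way LC enters this formula.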
HSROC analyses and empirical Bayes (EB) estimates. The STATA metandiplot command was used to draw the HSROC curves and empirical Bayes (EB) estimates to visualise the comparison of the pooled overall diagnostic performance of the diagnostic symptoms with that of the clinical signs and tests in AA diagnosis (Figures 9, 10, 11, and 12). HSROC curves and HSROC-EB estimates were also used to compare the pooled overall diagnostic performance of the different DS formulas in detecting AA (Figures 13, 14, 15, and 16). In the HSROC analysis, there was no statistically significant difference between the DSLC− and DSLC+ formulas, with AUC=0.860 (95%CI=0.85-0.86) and AUC=0.870 (95%CI=0.86-0.88), respectively (p=0.799, ROC comparison test).
Discussion
In this analysis, we focused on the diagnostic performance of the patients' symptoms/signs and DS in a clinical setting of patients with acute abdominal pain. The present study compared all predictive factors for AA diagnosis including 22 clinical symptoms and history variables with 14 diagnostic tests or signs. Sensitivity was defined as the proportion of patients with AA who had a positive test result. Specificity was defined as the proportion of patients without AA who had a negative test result.
Although there is a general impression that DSLC+ performs better than DSLC−, the limited data on DS performance make it difficult to decide which DS to choose in the clinical diagnosis of AA so as to keep both the negative appendectomy rate (FP rate) and the perforated appendix rate (FN rate) at a minimum. To improve the diagnostic performance in AA, inflammatory biomarkers such as LC and C-reactive protein (CRP) have been included in DS models, as in the Alvarado score (6). We feel that LC is more sensitive in early AA than CRP, which is related to the severity of AA and is a possible biomarker of AA perforation.
Alvarado's DS is based on retrospective data of 305 patients with AAP and includes 8 predictive factors for AA, each given a value of 1 point or 2 points based on its diagnostic weight for AA. One point was given for shifting of pain to the right lower quadrant (RLQ), anorexia, nausea or vomiting, rebound, body temperature >37.3°C and LC left shift. Two points were given for tenderness at RLQ and LC>10,000/μl. Alvarado's recommendations for the management of AA patients are based on the sum of the points of these eight variables. An Alvarado score of 7 or 8 suggests “AA probable”, and a score of 9 or 10 denotes “AA very probable”. In a meta-analysis, Ohle et al. (11) estimated that the Alvarado score has 82% Se and 81% Sp at the cut-off score of 7.
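The point scheme described above can be written out as a short Python sketch. The item weights and the two interpretation bands are those given in the text; the identifiers are hypothetical, and scores below 7 are deliberately left unlabelled because this section does not describe how they are managed.

```python
from typing import Optional

# Items and weights as described in the text; identifiers are hypothetical
ONE_POINT_ITEMS = (
    "pain_shift_to_rlq",
    "anorexia",
    "nausea_or_vomiting",
    "rebound",
    "temperature_above_37_3_c",
    "lc_left_shift",
)
TWO_POINT_ITEMS = (
    "rlq_tenderness",
    "lc_above_10000_per_ul",
)


def alvarado_score(present: set[str]) -> int:
    """Sum the points of the items that are present (maximum 10)."""
    unknown = present - set(ONE_POINT_ITEMS) - set(TWO_POINT_ITEMS)
    if unknown:
        raise ValueError(f"unknown items: {sorted(unknown)}")
    ones = sum(1 for item in ONE_POINT_ITEMS if item in present)
    twos = sum(2 for item in TWO_POINT_ITEMS if item in present)
    return ones + twos


def interpret(score: int) -> Optional[str]:
    """Return the label quoted in the text, or None for scores below 7."""
    if 9 <= score <= 10:
        return "AA very probable"
    if 7 <= score <= 8:
        return "AA probable"
    return None  # bands below 7 are not described in this section


example = {"pain_shift_to_rlq", "anorexia", "rebound",
           "rlq_tenderness", "lc_above_10000_per_ul"}
score = alvarado_score(example)
print(score, interpret(score))  # 7 AA probable
```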
We have studied AAP in connection with the survey of OMGE (1) and investigated the diagnostic performance of history-taking, clinical signs and tests and computer-based decision in confirming AA (2). In Finland, a total of 1,333 patients presenting with AAP were included in the OMGE study (1), and 25 clinical history variables, 13 clinical signs and 3 tests were evaluated in multivariate analyses to find the optimal combinations of independent predictors of AA. The most important predictors of AA were tenderness, rigidity, rebound, LC, location of pain and duration of pain. In practice, the use of DS is relatively simple, as shown by the following example: “A patient is admitted to the emergency room with abdominal pain of ≤48 h duration (2 points×2.13); at onset the pain was localized in the upper abdomen, but has shifted to RLQ (2 points×3.51); clinical examination showed RLQ tenderness (2 points×11.4) and rigidity (2 points×6.62), and the rebound tests were positive (2 points×5.58); LC test showed positive value (≥10,000/μl, 5.87).” In this example, the total score is 67 points and the diagnosis is AA. Our original cut-off level for AA was 55 points (2). Sitter et al. (8) tested our DS in a prospective trial including 2,359 patients with AAP. After careful analysis, they suggested a higher cut-off value of 57 points, which gives 91% AUC for the AA endpoint.
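The arithmetic of the worked example can be retraced with the short Python sketch below. It uses only the products quoted in the example, taking the duration weight as 2.13 and assuming a multiplier of 1 for the LC term, since none is given. These quoted terms sum to about 64.35 rather than the 67 points stated in the text. The difference is not itemised in this section, so the sketch reports only what can be recomputed.

```python
# (finding, points, weight) as quoted in the worked example
TERMS = [
    ("pain duration <=48 h", 2, 2.13),
    ("pain shifted to RLQ", 2, 3.51),
    ("RLQ tenderness", 2, 11.4),
    ("rigidity", 2, 6.62),
    ("rebound positive", 2, 5.58),
    ("LC >= 10,000/ul", 1, 5.87),  # multiplier of 1 is an assumption
]

ORIGINAL_CUTOFF = 55  # original cut-off level for AA (2)
SITTER_CUTOFF = 57    # higher cut-off suggested by Sitter et al. (8)

total = sum(points * weight for _, points, weight in TERMS)
print(f"sum of quoted terms = {total:.2f}")  # sum of quoted terms = 64.35
print("above original cut-off:", total > ORIGINAL_CUTOFF)  # True
print("above Sitter cut-off:", total > SITTER_CUTOFF)      # True
```

Either total, the recomputed 64.35 or the stated 67, lies above both cut-off values, so the classification of the example as AA is unaffected.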
Ohmann et al. (7) formulated a DS including eight factors: age <50 years, shifting of pain to RLQ, type of pain (steady), micturition (normal), tenderness, rebound, rigidity and LC≥10,000/μl. However, when tested in a clinical setting, the Ohmann score did not significantly improve the clinical performance of AA diagnosis (7).
Tzanakis et al. (10) introduced a DS for AA diagnosis that combines clinical tests for tenderness and rebound with abdominal ultrasound examination (US) and LC (cut-off >12,000/μl). When tested in a clinical setting, their DS reached 90% AUC for the AA endpoint. Although US is less expensive than other imaging methods, its diagnostic performance remains questionable, especially when US findings are borderline or when a retrocaecal appendix is difficult to visualise (12). In a prospective multicenter trial of 2,280 patients with AAP, Franke et al. (12) reported no correlation between the diagnostic performance of US and a clinician, and the negative appendectomy or perforation rate, thus showing no clear benefit of US in AA diagnosis. In another study, Lee et al. (13) found that US even delayed appendectomy in a cohort of 766 patients.
In our study, the LC test showed Se of 82% (95%CI=76-86%) and Sp of 95% (95%CI=93-97%), suggesting that the routine determination of the total number of leucocytes and their relative ratio could help in AA diagnosis. However, the delay of appendectomy should be kept in mind, especially when the LC is not fully necessary to support the AA diagnosis. To find the optimal combination of symptoms, signs and tests in the DS formula, we compared the DSLC− and DSLC+ models at six different combinations of predictors. No significant difference in performance between the DSLC− and DSLC+ formulas was detected. For the AA endpoint, the DSLC− and DSLC+ formulas showed almost equal AUC values in HSROC analysis (0.86 versus 0.87, p=0.799).
New diagnostic strategies for AA, besides the DS formulas and US, may include interleukin 6 (IL-6), which is an early marker of inflammation. IL-6 blood levels were shown to increase by as much as 3-fold over the IL-6 reference levels in 90% of the patients with perforated appendicitis (14, 15). Anielski et al. (15) found significantly higher IL-6 serum levels in patients with gangrenous perforated AA, suggesting that the IL-6 test could be useful in assessing the risk of complications during the course of AA. Although the IL-6 results are promising, the current enzyme-linked immunosorbent assay (ELISA) has so far precluded its use as a point-of-care (POC) test in AA (16, 17).
In conclusion, the DS test could assist the clinician in differentiating AA from NSAP and other causes of acute abdominal pain. Importantly, LC does not improve the diagnostic performance of a DS in AA.
Acknowledgements
The study was funded by the Päivikki ja Sakari Sohlberg Foundation.
Footnotes
Authors' Contributions
All Authors have met all of the following four criteria: 1. Substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work, 2. Drafting the work or revising it critically for important intellectual content, 3. Final approval of the version to be published, 4. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
This article is freely accessible online.
Conflicts of Interest
The Authors report no conflicts of interest or financial ties in relation to this study. The Authors alone are responsible for the content and writing of this article.
- Received July 17, 2020.
- Revision received July 30, 2020.
- Accepted August 3, 2020.
- Copyright© 2020, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved