Abstract
Background/Aim: T2-weighted magnetic resonance (MR) imaging is the gold standard for locally advanced rectal cancer (LARC) staging. Functional imaging, such as diffusion-weighted MR (DWI) and positron emission tomography-computed tomography (PET-CT), could be of benefit for treatment intensification strategies, since dose intensification has resulted in better pathological complete response (pCR) rates. This study evaluated the inter-observer agreement between two radiation oncologists, and the difference in gross tumor volume (GTV) delineation on simulation-CT, T2-MR, DWI-MR, and PET-CT in patients with LARC. Patients and Methods: Two radiation oncologists prospectively delineated GTVs of 24 patients on simul-CT (CTGTV), T2-weighted MR (T2GTV), echo planar b1000 DWI (DWIGTV) and PET-CT (PETGTV). Observers’ agreement was assessed using the Dice index. The Kruskal-Wallis test was used to assess differences between methods. Results: Mean CTGTV, T2GTV, DWIGTV, and PETGTV were 41.3±26.9 cc, 25.9±15.2 cc, 21±14.8 cc, and 37.7±27.7 cc for the first observer, and 42.2±27.9 cc, 27.6±16.9 cc, 19.9±14.9 cc, and 34.8±24.3 cc for the second observer, respectively. Mean Dice index was 0.85 for CTGTV, 0.84 for T2GTV, 0.82 for DWIGTV, and 0.89 for PETGTV, representative of almost perfect agreement. The Kruskal-Wallis test showed a statistically significant difference between methods (p=0.009). The Dunn test showed differences between DWIGTV vs. PETGTV (p=0.040) and DWIGTV vs. CTGTV (p=0.008). Conclusion: DWI resulted in smaller volume delineation compared to CT, T2-MR, and PET-CT images. Almost perfect agreement was found between the two observers for each imaging modality. DWI-MR seems to remain the optimal strategy for boost volume delineation for dose escalation in patients with LARC.
- Diffusion weighted imaging
- magnetic resonance imaging
- GTV
- inter-observer agreement
- 18F-FDG PET-CT
- rectal cancer
- radiotherapy
Standard treatment for locally advanced rectal cancer (LARC) is neoadjuvant chemoradiotherapy (CRT) followed by organ preservation surgery. Clinical outcomes depend on the response to these treatments, in particular on achieving a pathological complete response (pCR).
Response to neoadjuvant CRT is dose-dependent, with pCR rates reaching 20.4% with treatment intensification, when doses above 60 Gy are delivered (1). Furthermore, both better dose distribution to the target and sparing of the adjacent small bowel and other organs at risk (OARs) can be obtained with modern radiotherapy techniques, such as intensity-modulated radiotherapy (IMRT) (1-3) and volumetric arc therapy (VMAT) with simultaneous integrated boost (SIB) strategies. In this scenario, an accurate definition of the target is required.
Despite its poor soft-tissue contrast, computed tomography (CT) remains necessary for treatment planning. In contrast, magnetic resonance (MR) imaging, with its better resolution compared to CT, is considered the gold standard for rectal tumor staging and assessment after neoadjuvant CRT (4, 5).
Diffusion-weighted MR (DWI-MR) can detect areas with high cellularity, and rectal tumors are densely cellular compared to normal tissues, showing restricted diffusion (6, 7). This characteristic allows prediction of tumor response during and after CRT (8, 9).
In addition, positron emission tomography-computed tomography (PET-CT) with 18Fluoro-deoxyglucose (18F-FDG PET-CT) can identify both the primary rectal lesion and metastatic disease, providing metabolic information (10). Changes in maximum standardized uptake value between pre- and post-treatment PET-CT can help predict tumor response (10, 11).
MR and 18F-FDG PET-CT have been studied for their potential to define a biological target volume based on metabolic information (12-16), suggesting a potential benefit of these images when a radiation boost dose is planned.
In this regard, we previously reported our results on 322 LARC patients treated in our Radiotherapy Department. Dose escalation up to 55 Gy combined with fluoropyrimidine chemotherapy obtained a significantly higher tumor regression grade (TRG)1-2 rate of 59.4% (p=0.046) compared to standard doses of 50 Gy with fluoropyrimidine (TRG1-2: 42.2%). Furthermore, tumor response as TRG1-2 was associated with significantly higher rates of 5- and 10-year overall survival (OS) (p=0.001) and disease-free survival (DFS) (p=0.014) (17).
With the aim of evaluating a biological target volume for treatment intensification, we analyzed the differences in gross tumor volume (GTV) delineated on simul-CT, T2-MR, DWI-MR, and 18F-FDG PET-CT in patients with LARC and calculated the inter-observer agreement between two radiation oncologists.
Patients and Methods
Study population. Twenty-four consecutive patients with LARC were enrolled in this prospective study. All patients underwent colonoscopy and had a biopsy-proven non-mucinous rectal adenocarcinoma, clinically staged as cT2-4, N0-2, M0 by a diagnostic CT scan, 3T rectal MR, and 18F-FDG PET-CT before starting treatment. All patients were treated with long-course CRT (capecitabine-based chemotherapy and radiotherapy with a total dose of 5500 cGy, 220 cGy/day, with SIB-IMRT or SIB-VMAT techniques). A simulation CT scan was performed for treatment planning. Patient characteristics are reported in Table I.
Table I. Patient characteristics.
MR technique. The MR studies were performed on a 3T scanner (Achieva, Philips Healthcare). For all patients, T2-weighted fast spin-echo sequences were obtained in three orientations, sagittal, coronal, and axial, perpendicular to the long axis of the tumor. DWI echo planar images were acquired in the transverse plane.
18F-FDG PET-CT technique. 18F-FDG PET-CT images were acquired according to standard procedures (18), 60 minutes after 18F-FDG injection (5 MBq/kg of body weight), using a GE Discovery STE. Images were acquired from the base of the skull to the proximal femur (3 min per bed position) and then reconstructed using ordered subset expectation maximization (OSEM)-based algorithms. The CT scanner was used both for attenuation correction and anatomic localization of 18F-FDG uptake. Fused 18F-FDG PET-CT images were displayed in coronal, transverse, and sagittal planes.
Target volume delineation. Two radiation oncologists, both with specific experience in rectal cancer diagnosis and treatment, delineated the GTV on simul-CT (CTGTV), T2 axial (T2GTV), echo planar b1000 DWI (DWIGTV) axial sequences, and 18F-FDG PET-CT (PETGTV) on the RayStation platform (RaySearch Laboratories, Stockholm, Sweden). They delineated the entire volume independently and blinded to each other's contours. They could adjust the window and level settings for MR and 18F-FDG PET-CT.
The tumor appeared as a hyper-intense signal on DWI corresponding to the mass-like signal alteration on T2-weighted MR. Regarding 18F-FDG PET-CT images, the tumor volume was manually contoured using a visual interpretation technique in collaboration with an experienced nuclear medicine physician. Any area of abnormal FDG uptake not explained by normal anatomic structures was considered to be tumor tissue. Neither MR nor 18F-FDG PET-CT was co-registered with the simulation CT scan.
Statistical analysis. Descriptive statistics were expressed as mean and standard deviation (SD) for normally distributed variables and as median and quartiles (q1=first quartile; q3=third quartile) for non-normally distributed variables; categorical variables were expressed as frequencies and percentages (%). The Dice similarity index (DICE) was computed to assess the agreement between readers for each method (19). The DICE was used as a statistical validation metric to evaluate the spatial overlap of the different volume delineations. Given volumes A and B contoured by the two observers, DICE is defined as: DICE=2×|A∩B|/(|A|+|B|), where |A∩B| is the volume of the overlap and |A| and |B| are the volumes of the two contours. The DICE is a scalar coefficient ranging from 0, indicating no spatial overlap, to 1, indicating complete overlap. In steps of 0.2 from 0 to 1, values indicate slight, fair, moderate, substantial, and almost perfect agreement, respectively.
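As an illustration of the DICE definition above, the following is a minimal sketch, not the software used in this study. It assumes each contour is available as a set of voxel indices; the function names `dice_index` and `agreement_label`, the handling of values exactly on a band boundary, and the example cubes are hypothetical choices made for this sketch.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice_index(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity index 2|A∩B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        raise ValueError("DICE is undefined for two empty contours")
    return 2.0 * len(a & b) / (len(a) + len(b))


def agreement_label(dice: float) -> str:
    """Map a DICE value to the 0.2-wide agreement bands.

    Assumption: a value exactly on a boundary (e.g. 0.8) is assigned
    to the lower band; the text does not specify this.
    """
    if not 0.0 <= dice <= 1.0:
        raise ValueError("DICE must lie between 0 and 1")
    bands = [(0.2, "slight"), (0.4, "fair"),
             (0.6, "moderate"), (0.8, "substantial")]
    for upper, label in bands:
        if dice <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical contours: two 10x10x10 voxel cubes, the second
    # shifted by one voxel along x, so 900 of 1000 voxels overlap.
    obs1 = {(x, y, z) for x in range(10)
            for y in range(10) for z in range(10)}
    obs2 = {(x + 1, y, z) for (x, y, z) in obs1}
    d = dice_index(obs1, obs2)
    print(f"DICE = {d:.2f} ({agreement_label(d)})")
```

Working on voxel sets keeps the sketch independent of any image library; with binary mask arrays the same ratio would be computed from element-wise counts.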
The agreement between readers for the volume measurement was assessed by Lin’s concordance correlation coefficient (CCC). The CCC evaluates the degree to which pairs of observations fall on the 45° line through the origin. It contains a measurement of precision ρ (the Pearson correlation coefficient, which measures how far each observation deviates from the best-fit line) and accuracy Cb (a bias correction factor that measures how far the best-fit line deviates from the 45° line through the origin): ρc=ρCb. The CCC suggests a poor strength of agreement for values below 0.90, moderate from 0.90 to 0.95, substantial from 0.95 to 0.99, and perfect for values >0.99. The level of significance was set at p<0.05.
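The decomposition ρc=ρCb can be illustrated with a short sketch. It is not the NCSS implementation; it assumes the usual form of Lin's coefficient with 1/n variances, and the function names and the example volumes are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence, Tuple


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (ccc, rho, cb) with ccc = rho * cb.

    Assumption: variances and covariance use the 1/n form.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    if sxx == 0 or syy == 0:
        raise ValueError("CCC is undefined when one reader has zero variance")
    rho = sxy / sqrt(sxx * syy)                               # precision
    cb = 2.0 * sqrt(sxx * syy) / (sxx + syy + (mx - my) ** 2)  # accuracy
    return rho * cb, rho, cb


def ccc_strength(ccc: float) -> str:
    """Strength-of-agreement bands as listed in the text."""
    if ccc > 0.99:
        return "perfect"
    if ccc >= 0.95:
        return "substantial"
    if ccc >= 0.90:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Hypothetical volumes (cc) from two readers, not study data.
    reader1 = [41.0, 25.5, 21.0, 37.5, 60.2]
    reader2 = [42.5, 27.0, 19.5, 35.0, 58.8]
    ccc, rho, cb = lin_ccc(reader1, reader2)
    print(f"CCC={ccc:.3f} rho={rho:.3f} Cb={cb:.3f} ({ccc_strength(ccc)})")
```

Returning ρ and Cb separately shows whether a low CCC comes from scatter around the best-fit line or from a systematic offset between readers.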
Bland-Altman analysis (mean difference, 95% limits of agreement) was used to assess reliability between methods (20). The Kruskal-Wallis test was used to assess the difference between methods, and the Dunn test with Bonferroni correction was used for multiple comparisons. All tests were performed using the NCSS statistical software. Before carrying out the non-parametric analysis, normality was tested with the D'Agostino test.
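For readers unfamiliar with the Kruskal-Wallis test, the statistic can be sketched as follows. This is an illustrative standard-library version, not the NCSS procedure, and it omits the Dunn post hoc test. The function names are hypothetical, the p-value helper is valid only for three degrees of freedom (four groups), and the example volumes are invented.

```python
from math import erfc, exp, pi, sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def chi2_sf_df3(x: float) -> float:
    """Upper tail of the chi-square distribution with 3 degrees of freedom."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


if __name__ == "__main__":
    # Hypothetical volumes (cc) for four methods, not study data.
    ct = [41.0, 55.2, 30.1, 62.4, 38.0]
    t2 = [25.0, 33.1, 20.4, 40.2, 27.5]
    dwi = [20.0, 24.3, 15.2, 31.0, 18.9]
    pet = [37.0, 48.5, 28.3, 57.7, 35.1]
    h = kruskal_wallis_h([ct, t2, dwi, pet])
    print(f"H = {h:.3f}, p = {chi2_sf_df3(h):.4f}")
```

The closed-form tail probability avoids a dependency on a statistics package but applies only when exactly four groups are compared.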
Results
Twenty-four LARC patients (20 males and 4 females), with a mean age of 69 years (range=40-88 years), were included in this study and prospectively analyzed. Each observer analyzed 24 simul-CT, 24 T2-weighted MR, 24 b1000 DWI-MR, and 24 18F-FDG PET-CT studies. An example of a CTGTV, T2GTV, DWIGTV, and PETGTV delineation performed by both observers is shown in Figure 1.
Figure 1. Graphic representation on simulation-computed tomography (CT) (Panel A), positron emission tomography-computed tomography (PET-CT) (Panel B), magnetic resonance-T2 (MR-T2) (Panel C), and diffusion-weighted magnetic resonance (DWI-MR) (Panel D) for both observers.
As reported in Table II, mean CTGTV, T2GTV, DWIGTV, and PETGTV were 41.3±26.9 cc (5.6-102.1), 25.9±15.2 cc (3.1-53.1), 21±14.8 cc (2.4-52.6), and 37.7±27.7 cc (2.4-104.5) for the first observer, and 42.2±27.9 cc (7-117.7), 27.6±16.9 cc (3.4-65.4), 19.9±14.9 cc (2.4-48.1), and 34.8±24.3 cc (2.9-102.2) for the second observer, respectively. Mean Dice index was 0.85 for CTGTV, 0.84 for T2GTV, 0.82 for DWIGTV, and 0.89 for PETGTV, representative of an almost perfect agreement (Table II). These values show the feasibility of using all the modalities, with near complete overlap between the two radiation oncologists in CTGTV, T2GTV, DWIGTV, and PETGTV delineation.
Table II. Mean and standard deviation (SD) for gross tumor volume (GTV) for simul-CT (CTGTV), T2-weighted MRI (T2GTV), echo planar b1000 DWI (DWIGTV), and PET-CT (PETGTV) by each observer. DICE index for CTGTV, T2GTV, DWIGTV, and PETGTV obtained by each radiation oncologist, with its range.
The inter-rater repeatability of the measurements, evaluated by calculating the CCC, showed a high strength of agreement for all considered variables (Figure 2). In particular, the CCC was 0.968 (0.928-0.986) for CTGTV, 0.962 (0.921-0.982) for T2GTV, 0.968 (0.928-0.986) for DWIGTV, and 0.887 (0.766-0.947) for PETGTV. Because of the high concordance between observers, we evaluated the agreement between methods using, for each method, the mean of the two observers' measurements.
Figure 2. Lin’s concordance correlation coefficient (CCC) evaluated for each method between observers.
The Bland-Altman plots (Figure 3) showed some outliers beyond the limits of agreement, but overall the methods for the two observers were in accordance. The bias (difference) was 14.99±16.69 for CTGTV vs. T2GTV, 6.33±6.90 for T2GTV vs. DWIGTV, and 15.79±17.55 for DWIGTV vs. PETGTV.
Figure 3. Bland-Altman concordance plot between observer mean and methods difference. The lower and upper 95% limits of agreement are represented as blue lines; the mean difference is represented as a red line.
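The quantities drawn in a Bland-Altman plot (mean difference and 95% limits of agreement) can be computed with a few lines. This is a minimal sketch under the conventional assumption that the limits are the bias ± 1.96 sample standard deviations of the paired differences; the function name and the example volumes are hypothetical and are not study data.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (bias, sd, lower_limit, upper_limit) for paired measurements.

    Assumption: limits of agreement are bias ± 1.96 * SD of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("a and b must have equal length of at least 2")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    bias = fmean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)     # sample SD of the differences
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical per-patient volumes (cc) for two methods.
    method_a = [41.0, 55.2, 30.1, 62.4, 38.0]
    method_b = [25.0, 33.1, 20.4, 40.2, 27.5]
    bias, sd, low, high = bland_altman(method_a, method_b)
    print(f"bias = {bias:.2f} ± {sd:.2f}, limits = [{low:.2f}, {high:.2f}]")
```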
The Kruskal-Wallis test showed a significant difference between methods (p=0.009). The Dunn test showed differences between DWIGTV vs. PETGTV (p=0.040) and DWIGTV vs. CTGTV (p=0.008) (Table III).
Table III. Results of Kruskal-Wallis test and Dunn test with Bonferroni correction used to assess differences between methods.
Discussion
In patients with LARC, increasing the dose of neoadjuvant CRT (above 60 Gy) without increasing toxicity is an interesting approach, allowing pCR rates of up to 20.4% (1, 2). Compared with standard doses of 50 Gy, dose intensification up to 55 Gy, both combined with fluoropyrimidine-based chemotherapy, can achieve a higher rate of tumor response (TRG1-2 of 59.4% versus 42.2%, respectively) (17).
This approach could be useful in high-risk patients with non-resectable T4 tumors, tumors close to the mesorectal fascia, or extra-mesorectal lymph node involvement. Furthermore, the proportion of patients showing a ‘good’ clinical response on imaging restaging during and after neoadjuvant CRT (8, 9, 21, 22) could potentially increase with boost dose escalation using SIB procedures (23, 24).
The use of SIB-IMRT, with its high dose rate, offers the possibility of obtaining high pCR rates while also sparing OARs. Furthermore, considering that the rectum and mesorectum are moving structures and that bladder filling varies, quantification of organ motion remains mandatory for SIB-IMRT (25).
In this context of modern and precise radiotherapy techniques, with increasing research on predictive biomarkers to facilitate personalized treatment as well as a wait-and-see strategy (26), good knowledge of target volume definition and accurate target delineation are required.
CT remains the standard imaging modality for target volume delineation and for conformal RT treatment planning. MR imaging, however, is superior for rectal tumor definition, including the depth of tumor invasion through the rectal wall into local structures, tumor extension into the presacral space, and assessment of mesorectal fascia involvement, thanks to its high soft-tissue contrast (27). Therefore, it is now considered the gold standard for local staging and restaging of rectal tumors (28).
Furthermore, new techniques offer the possibility to evaluate a “biological target volume”, using the biological information related either to the better image contrast based on water mobility differences (DWI-MR) or to the better definition of the tumor with respect to adjacent organs (18F-FDG PET-CT) (15).
In addition to MR imaging, there is also the potential of 18F-FDG PET-CT in detecting the primary lesion with its metabolic activity, estimating tumor size, determining T and N stages as well as synchronous metastases (10), and predicting treatment response (8-10) in patients with LARC.
DWI-MR has already been studied and established as a valid method for use by non-expert readers; therefore, radiation oncologists with rectal cancer treatment expertise can use DWI-MR even without specific training (29). In contrast, the use of PET-CT, including with novel tracers, may require further validation before routine implementation, as reported in the review by Gwinne et al. (27). These imaging modalities are now being studied for target volume delineation, particularly when a radiation boost is planned (12-14, 16). In this context, previous studies showed that CT may overestimate rectal tumor volume with respect to T2-MR (30, 31).
Regarding MR imaging, tumor definition using DWI has already been compared with T2-MR. GTVs delineated on DWI-MR were smaller than those delineated on T2-MR, as reported by different studies (12-14). Furthermore, T2 showed significantly larger volumes also when rectal tumors were defined using both T2 and DWI (14). The feasibility of these methods was confirmed by the good inter-observer agreement among radiologists and radiation oncologists. The authors concluded that boost delineation using DWI images could be of interest when dose intensification is required.
The inter-observer agreement was moderate (DICE index of 0.666) between two radiation oncologists and two radiologists for T2-weighted, DWI-MR, and co-registered T2/DWI-MR contours. The same moderate agreement (DICE of 0.581) was observed for semi-automated diffusion-based volume delineation. Also, semi-automated delineation based on specific ADC thresholds seemed able to standardize rectal contouring when co-registration is accurate, so that this method could be applied in dose escalation or “dose painting” protocols (32).
New evaluations emerged from MR and 18F-FDG PET-CT comparison. Roels et al. evaluated 45 18F-FDG PET-CT and 45 T2-MR exams from 15 LARC patients, obtained before, during, and after preoperative CRT. Larger tumor volumes were found on MR imaging compared to 18F-FDG PET-CT, with an approximately 50% mismatch between the 18F-FDG PET-CT and the MR tumor volume at baseline and during treatment (4).
Similarly, larger volumes on T2-weighted MR (111 cm3) than on 18F-FDG PET-CT (87 cm3) (p<0.001) were reported in a prospective study of 68 patients with rectal cancer (6). The authors reported the largest volumes on MRGTV and PET-CTGTV in the middle third of the rectum, whereas the smallest were in the upper third. The GTV including the union of MRGTV, with information derived from both CT and MR imaging, and PET-CTGTV became larger than the standard GTV in several patients (6).
Considering CT and 18F-FDG PET-CT evaluation, 18F-FDG PET-CT co-registered with planning CT resulted in smaller volumes than CT alone, allowing also reduction in the inter-observer variation (27, 33). The inter-observer variability was analyzed by an Italian group using 18F-FDG PET-CT images in two different cases of rectal cancer, treated with neoadjuvant radiotherapy. Five radiation oncologists contoured the GTV and the clinical target volume (CTV) on CT and another five contoured on 18F-FDG PET-CT images. The authors concluded that using 18F-FDG PET-CT could decrease variability in GTV size and position, increasing the reproducibility of GTV delineation. Furthermore, the inter-observer variability reduction on the GTV contoured using 18F-FDG PET-CT images could be important for standardizing delineation modalities, guaranteeing more reproducibility when a boost is necessary (16).
In contrast to previous studies, we analyzed both morphological and biological assessments represented by CT, T2-MR, DWI-MR, and 18F-FDG PET-CT. To the best of our knowledge, and unlike other authors, we also compared DWI-MR with 18F-FDG PET-CT. We obtained the smallest volume for DWIGTV for both observers (21±14.8 cm3 and 19.9±14.9 cm3), followed by T2GTV (25.9±15.2 cm3 and 27.6±16.9 cm3), PETGTV (37.7±27.7 cm3 and 34.8±24.3 cm3) and CTGTV (41.3±26.9 cm3 and 42.2±27.9 cm3). The Dunn test confirmed that DWIGTV resulted in smaller volumes compared to PETGTV (p=0.040) and CTGTV (p=0.008). Therefore, DWI-MR seems to remain the best imaging modality for boost delineation, allowing a reduction in side effects to nearby OARs when dose intensification is required.
Regarding the agreement between readers for all volumes, both DICE and Lin’s concordance correlation coefficient showed high agreement between observers for each modality, with all values higher than 0.8. These results underline the feasibility of using DWI sequences and 18F-FDG PET-CT images for radiation oncologists.
Our study has some limitations. Firstly, the number of patients was relatively small. Secondly, 18F-FDG PET-CT is known to have the disadvantages of limited image resolution, inter-observer variability, and dependence on the experience of the physician (34). Despite these possible difficulties, in agreement with our nuclear medicine physicians, we adjusted the background intensity to what was considered normal based on FDG uptake in the liver, considering as tumor tissue all areas with elevated FDG uptake.
In conclusion, DWI-MR resulted in smaller volume delineation compared to T2-weighted MR, 18F-FDG PET-CT, and CT images. Almost perfect agreement, as measured by the DICE index, was found for each imaging modality between the two observers, both radiation oncologists. Among functional imaging modalities, DWI yielded smaller volumes than 18F-FDG PET-CT. DWI-MR seems to remain the optimal strategy for boost volume delineation in case of dose escalation. When a rectal MR cannot be performed, 18F-FDG PET-CT can provide biological information for a more accurate boost volume delineation compared to CT. Although both 18F-FDG PET-CT and DWI-MR yield smaller volumes than CT alone, their use in target volume definition requires further validation.
Footnotes
Authors’ Contributions
CR, MDT, and DG designed and coordinated the study and analysis. CR, LG and FDG collected the data. LC, MDT, ADP, PC and RM reviewed and approved data selection. CR, LG and FDG performed main data analysis and provided pictures elaboration. CR, LC, and MDT drafted the article. AP and MDN performed statistical data analysis. ADP, GM, PC, RM, MLC, and DG critically revised the study and the article. All Authors reviewed and approved the final article.
Conflicts of Interest
The Authors report no conflicts of interest in relation to this study.
- Received November 20, 2022.
- Revision received December 8, 2022.
- Accepted December 9, 2022.
- Copyright © 2023 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).