Main

Breast cancer screening with mammography is known to reduce mortality from the disease (Smith et al, 2004), and although there is some dissent (Gøtzsche et al, 2009), the majority opinion is that mammographic screening is effective (Vainio, 2002). The major mechanism of this mortality reduction is the diagnosis of disease at an early stage, when it is likely to be successfully treatable (Tabár et al, 1985; Smith et al, 2004).

In recent years, there has been interest in the extent to which screen-detected breast cancer differs from symptomatic disease in biological terms (Collett et al, 2005; Wishart et al, 2008). Survival studies have indicated that the majority of the survival benefit can be attributed to smaller size and a lesser rate of node involvement at presentation (Wishart et al, 2008). Biological variables such as HER-2 status apparently account for <10% of the difference in prognosis between screen-detected and symptomatic cancers (Dawson et al, 2009). Around 30% of the difference remains to be explained (Wishart et al, 2008; Dawson et al, 2009).

It is also of interest to study survival differences in narrow prognostic categories, to ascertain whether the difference can be better explained by finer categorisation of factors such as tumour size, and whether the survival advantage of screen-detected tumours is more marked in higher-risk or lower-risk tumours. It is also desirable to take lead time into account in explaining survival differences.

In this paper, we investigate the proportion of the survival difference between screen-detected and symptomatic tumours that can be explained by tumour size, a combination of tumour size and node status, histological grade and the Nottingham Prognostic Index (NPI), which takes into account all three prognostic factors. In addition, we estimate the difference that can be explained by lead time, the additional observation time added to the survival as a result of early detection by screening.

We also use a method described by Bashir and Estève (2000) to partition the variation in survival between the two modes of breast cancer detection (screening or symptomatic) into components due to (1) the distribution of prognostic factors by detection mode and (2) differences in survival specific to prognostic factor status in narrow categories. In this study, we used 19 411 invasive breast tumours diagnosed in women aged 50–64 years, recorded by the West Midlands Cancer Intelligence Unit. The size of the survival difference between screen-detected and symptomatic tumours that remains after taking into account lead time and differences in pathological prognostic factors indicates the scope for survival differences attributable to length bias and overdiagnosis. Length bias, in the context of screening, is the tendency of screening to preferentially detect slower-growing tumours, which therefore have a better prognosis. Overdiagnosis is the extreme form of length bias, whereby screening detects some tumours that would never have been diagnosed in the host's lifetime had screening not taken place.

Materials and methods

In collaboration with the NHS Breast Screening Programme, the West Midlands Cancer Intelligence Unit aims to determine the screening histories of all women diagnosed with breast cancer in the West Midlands, UK. This study includes the screening histories of 19 411 women aged between 50 and 64 years with invasive breast tumours diagnosed between 1988 and 2004: 11 674 (60.1%) diagnosed symptomatically and 7737 (39.9%) screen detected. We studied the survival difference between symptomatic and screen-detected tumours in relation to tumour size, grade, nodal status and the NPI. The NPI is a validated prognostic tool based on tumour size, grade and lymph node status (Todd et al, 1987). It is frequently categorised into five prognostic groups (Lee and Ellis, 2008): excellent (NPI < 2.41), good (2.41 ≤ NPI < 3.41), moderate 1 (3.41 ≤ NPI < 4.41), moderate 2 (4.41 ≤ NPI < 5.41) and poor (NPI ≥ 5.41). Note that the number of cases varies among analyses because of differing numbers of missing values for size, node status and grade. We also considered socioeconomic status, as measured by the area-based Townsend score.
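The five-group categorisation can be expressed as a small Python sketch. The NPI formula itself is not spelled out in this section; the function `npi` below assumes the standard Nottingham definition (0.2 × tumour size in cm + lymph node stage 1–3 + histological grade 1–3), and both function names are illustrative rather than taken from the study.

```python
def npi(size_cm: float, node_stage: int, grade: int) -> float:
    """Nottingham Prognostic Index, assuming the standard formula.

    node_stage: 1 = node negative, 2 = 1-3 positive nodes, 3 = 4 or more (assumed coding).
    grade: histological grade 1-3.
    """
    if node_stage not in (1, 2, 3) or grade not in (1, 2, 3):
        raise ValueError("node_stage and grade must each be 1, 2 or 3")
    if size_cm <= 0:
        raise ValueError("size_cm must be positive")
    return 0.2 * size_cm + node_stage + grade


def npi_group(score: float) -> str:
    """Map an NPI score to the five prognostic groups of Lee and Ellis (2008)."""
    # Upper (exclusive) bound of each group; anything at or above 5.41 is "poor"
    cut_points = [(2.41, "excellent"), (3.41, "good"),
                  (4.41, "moderate 1"), (5.41, "moderate 2")]
    for upper, label in cut_points:
        if score < upper:
            return label
    return "poor"


if __name__ == "__main__":
    # A 1.5 cm, node-negative, grade 2 tumour scores 0.3 + 1 + 2 = 3.3
    print(npi_group(npi(1.5, 1, 2)))
```

Because each cut-point is an exclusive upper bound, a score exactly on a boundary (for example 4.41) falls into the higher group, matching the inequalities listed above.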

Categorical variables were compared between symptomatic and screen-detected tumours using the χ²-test, and continuous variables using the Wilcoxon test (Wilcoxon, 1945). For the survival analysis, we first examined the difference in 10-year Kaplan–Meier survival (Kaplan and Meier, 1958) between symptomatic and screen-detected tumours in five size categories. We then estimated the expected overall survival for the symptomatic cases if they had had the same size distribution as the screen-detected cases, using the method of Bashir and Estève (2000). This yielded an estimate of the proportion of the survival difference attributable to the more favourable size distribution of screen-detected cancers, with the complementary proportion attributable to size-specific survival differences between the two detection modes.
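The standardisation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that overall survival can be approximated as a case-weighted average of the category-specific survival figures, whereas the study uses Kaplan–Meier estimates. The function names and the four-category input data are hypothetical.

```python
from typing import Sequence


def weighted_survival(surv: Sequence[float], counts: Sequence[int]) -> float:
    """Overall survival as a case-weighted average of category-specific survival."""
    if len(surv) != len(counts):
        raise ValueError("surv and counts must have the same length")
    total = sum(counts)
    return sum(s * n for s, n in zip(surv, counts)) / total


def proportion_explained(sym_surv, sym_counts, scr_surv, scr_counts) -> float:
    """Share of the screen vs symptomatic survival gap due to the category distribution."""
    observed_sym = weighted_survival(sym_surv, sym_counts)
    observed_scr = weighted_survival(scr_surv, scr_counts)
    # Symptomatic category-specific survival, reweighted to the screen-detected mix
    expected_sym = weighted_survival(sym_surv, scr_counts)
    return (expected_sym - observed_sym) / (observed_scr - observed_sym)


if __name__ == "__main__":
    # Hypothetical four-category example (not data from this study)
    sym_surv, sym_counts = [0.90, 0.80, 0.65, 0.50], [10, 30, 40, 20]
    scr_surv, scr_counts = [0.95, 0.85, 0.70, 0.55], [50, 30, 15, 5]
    print(round(proportion_explained(sym_surv, sym_counts, scr_surv, scr_counts), 3))
```

The complementary share, one minus the returned value, corresponds to the part of the gap due to category-specific survival differences.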

The analysis was performed with and without adjustment for lead time. We repeated this analysis for size categorised into 10 classes, for a combination of tumour size and node status, for histological grade and for the NPI divided into 10 prognostic groups. We adjusted for lead time bias using the method of Duffy et al (2008), which estimates, for each screen-detected case, the additional observation time between diagnosis and either death or censoring that is due to screening lead time. They showed that for a subject who dies of breast cancer at time t, the additional time is on average

$$E(s) = \frac{1}{\lambda} - \frac{t\,e^{-\lambda t}}{1 - e^{-\lambda t}}.$$

For a subject censored at time t, the average additional time is

$$E(s) = \frac{1 - e^{-\lambda t}}{\lambda},$$

where λ is the rate of transition from asymptomatic to symptomatic disease, that is, the reciprocal of the average asymptomatic screen-detectable period. We calculated E(s) for every screen-detected case and subtracted it from that case's survival time. We estimated λ as 0.26 from the largest of the breast cancer screening trials (Tabár et al, 2000), which corresponds to an average asymptomatic screen-detectable period of 1/0.26 ≈ 3.9 years.
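A minimal Python sketch of the per-case correction, assuming the Duffy et al (2008) expected lead times take the forms 1/λ − t·e^(−λt)/(1 − e^(−λt)) for a case dying of breast cancer at time t and (1 − e^(−λt))/λ for a case censored at time t, with λ = 0.26 per year as estimated above. Function names are hypothetical, and the code assumes survival times are positive and measured in years.

```python
import math


def lead_time_if_died(t: float, lam: float) -> float:
    """Expected lead time for a screen-detected case dying of breast cancer at time t."""
    if t <= 0:
        raise ValueError("t must be positive")
    e = math.exp(-lam * t)
    return 1.0 / lam - t * e / (1.0 - e)


def lead_time_if_censored(t: float, lam: float) -> float:
    """Expected lead time for a screen-detected case censored at time t."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (1.0 - math.exp(-lam * t)) / lam


def corrected_time(t: float, died: bool, lam: float = 0.26) -> float:
    """Observed survival time minus the expected lead time for that case."""
    s = lead_time_if_died(t, lam) if died else lead_time_if_censored(t, lam)
    return t - s


if __name__ == "__main__":
    # Illustrative cases: early death, later death, and censoring at 10 years
    for t, died in [(2.0, True), (5.0, True), (10.0, False)]:
        print(t, died, round(corrected_time(t, died), 2))
```

Both expressions stay below the observed time t, so the corrected time is always positive. The subtracted lead time is also smaller for early deaths than for long survivors, which is why the correction is on average smaller for poor-prognosis tumours.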

With the correction for lead time, the proportion of the survival difference accounted for by a pathological prognostic factor such as size can be interpreted as the residual proportion attributable to that factor after removal of the lead time effect. The difference that remains unaccounted for is attributable to unobserved factors and to length bias or overdiagnosis.

The above analysis was complemented by Cox proportional hazards regression (Clayton and Hills, 1993), estimating the relative hazard for screen-detected cancers unadjusted and adjusted for pathological factors and lead time. In addition, the Freedman statistic for the proportion of the survival difference accounted for by the various adjustment factors was calculated (Freedman et al, 1992).
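The Freedman statistic can be sketched from the two fitted Cox models. This sketch assumes the usual formulation on the log relative hazard scale, 1 − β_adjusted/β_unadjusted, where β is the coefficient for screen detection; fitting the Cox models is not shown, and the relative hazards in the example are hypothetical.

```python
import math


def freedman_proportion(hr_unadjusted: float, hr_adjusted: float) -> float:
    """Proportion of the screen-detection effect explained by the adjustment.

    Computed as 1 - log(HR_adjusted) / log(HR_unadjusted).
    """
    beta_unadj = math.log(hr_unadjusted)
    if beta_unadj == 0:
        raise ValueError("unadjusted relative hazard must differ from 1")
    return 1.0 - math.log(hr_adjusted) / beta_unadj


if __name__ == "__main__":
    # Hypothetical relative hazards, not results from this study
    print(round(freedman_proportion(0.40, 0.70), 2))
    # An adjusted hazard above 1 gives a value above 100% (an overcorrection)
    print(round(freedman_proportion(0.40, 1.10), 2))
```

An adjusted relative hazard of exactly 1 gives a proportion of 100%. A value above 100% means the adjustment reverses the direction of the effect.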

Results

Patient and tumour characteristics of the screen-detected and symptomatic cases are shown in Table 1. All variables differed significantly between the two detection modes. Women with symptomatic cancers were slightly but significantly younger and more deprived, and their tumours were larger, more often node positive and of higher grade. Consequently, women with symptomatic tumours had a poorer prognosis than women with cancers detected by screening.

Table 1 Patient characteristics by mode of detection

Table 2 shows invasive breast tumours categorised into five size groups for symptomatic and screen-detected tumours, with their 10-year survival rates. Unadjusted 10-year survival was better for women with screen-detected tumours than for women with symptomatic tumours in all size groups and overall. The advantage was most marked in the 21–50 mm size groups, where adjustment for lead time also had the strongest attenuating effect. Note that the overall survival difference is greater than the differences observed within specific size categories. This indicates that a substantial part of the survival benefit of screen detection is due not to size-specific differences but to the shift towards smaller tumours associated with screen detection. This phenomenon was also observed in the subsequent analyses described below. Overall, the absolute survival advantage for women with screen-detected tumours was 85.9% − 65.3% = 20.6 percentage points.

Table 2 10-year survival for women aged 50–64 years with symptomatic and screen-detected invasive breast tumours by size of tumour in five categories

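The expected overall survival in the symptomatic cases if they had had the same size distribution as the screen-detected cases was calculated as

$$S^{*} = \sum_{i} p_{i}^{\mathrm{scr}} S_{i}^{\mathrm{sym}},$$

where $p_{i}^{\mathrm{scr}}$ is the proportion of screen-detected tumours in size category $i$ and $S_{i}^{\mathrm{sym}}$ is the 10-year survival of symptomatic tumours in that category. With $S^{\mathrm{scr}}$ and $S^{\mathrm{sym}}$ denoting the overall 10-year survival of screen-detected and symptomatic cases, the proportion of the survival difference explained by the different size distributions was therefore

$$\frac{S^{*} - S^{\mathrm{sym}}}{S^{\mathrm{scr}} - S^{\mathrm{sym}}} = \frac{S^{*} - 65.3}{85.9 - 65.3} \approx 0.45.$$

That is, 45% of the survival difference between screen-detected and symptomatic cases can be attributed to the more favourable size distribution (using these five size categories) of the screen-detected tumours, and 55% to differences in size-specific survival. The overall 10-year survival of the screen-detected tumours adjusted for lead time was 79.8%. This suggests that (85.9 − 79.8)/(85.9 − 65.3) ≈ 30% of the difference is due to lead time. The proportion of the remaining survival difference attributable to the differing size distributions was

$$\frac{S^{*} - 65.3}{79.8 - 65.3} \approx 0.64.$$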

That is, 64% of the difference in survival after adjustment for lead time is attributable to the better size distribution of screen-detected cases.
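The arithmetic in the size analysis can be checked with a few lines of Python. Both reported shares have the same numerator: the survival gain from giving symptomatic cases the screen-detected size distribution. So 45% of the unadjusted gap and 64% of the lead-time-adjusted gap should agree up to rounding. The variable names are illustrative; the survival figures are those quoted above.

```python
# 10-year survival (%) for the five-category size analysis, as reported above
scr, sym, scr_lead_time_adj = 85.9, 65.3, 79.8

gap = scr - sym                                    # total difference
lead_time_share = (scr - scr_lead_time_adj) / gap  # part removed by lead time
remaining_gap = scr_lead_time_adj - sym            # difference left after lead time

# Absolute survival gain from the size redistribution, implied by each reported share
gain_from_45pct = 0.45 * gap
gain_from_64pct = 0.64 * remaining_gap

print(f"gap = {gap:.1f}, lead-time share = {lead_time_share:.2f}")
print(f"remaining gap = {remaining_gap:.1f}")
print(f"implied gains: {gain_from_45pct:.2f} vs {gain_from_64pct:.2f}")
```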

Survival differences were markedly changed when the tumours were divided by size and node status simultaneously (Table 3). For node-negative tumours, the greatest survival advantage for screen-detected cases was in the 31–50 mm size group for both unadjusted and adjusted figures. The smallest difference was seen in the smallest tumours where indeed a slight survival advantage was observed for women with symptomatic tumours after adjustment for lead time. For node-positive tumours, the greatest survival advantage was seen in women with the smallest tumours using either unadjusted or adjusted survival figures.

Table 3 10-year survival for women aged 50–64 years with symptomatic and screen-detected invasive breast tumours by a combination of tumour size and nodal status

The expected survival in the symptomatic tumours if they had had the same size and node status distribution as the screen-detected cases was 77.0%. Unadjusted for lead time, the overall survival of the screen-detected tumours was 85.0%. Thus, before adjusting for lead time, 60% of the survival advantage of screen-detected tumours was attributable to the difference between the joint distributions of tumour size and node status. After adjustment for lead time, the survival difference between screen-detected and symptomatic tumours was 12.5 percentage points. The difference between the survival of screen-detected cases and that expected in the symptomatic cases, had they had the same size/node status distribution as the screen-detected cases, was 77.4% − 77.0% = 0.4 percentage points. Thus, almost all (97%) of the remaining survival difference after adjusting for lead time was attributable to the difference between screen-detected and symptomatic tumours in terms of size and node status.
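The 60% and 97% figures for the size and node status analysis can be reproduced in Python. The overall symptomatic survival is not stated in this paragraph, so the sketch back-calculates it from the 12.5-point lead-time-adjusted gap. This is an assumption based on the rounded figures quoted above, and the variable names are illustrative.

```python
# 10-year survival (%) for the size and node status analysis, as reported above
scr, scr_lead_time_adj, expected_sym = 85.0, 77.4, 77.0
adjusted_gap = 12.5  # screen-detected (lead-time adjusted) minus symptomatic

# Symptomatic survival is back-calculated (assumption) from the adjusted gap
sym = scr_lead_time_adj - adjusted_gap

share_unadjusted = (expected_sym - sym) / (scr - sym)
share_adjusted = (expected_sym - sym) / (scr_lead_time_adj - sym)

print(f"symptomatic survival ~ {sym:.1f}%")
print(f"share before lead-time adjustment: {share_unadjusted:.2f}")
print(f"share after lead-time adjustment:  {share_adjusted:.2f}")
```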

Table 4 shows invasive breast tumours categorised by histological grade for symptomatic and screen-detected tumours, with their 10-year survival rates. Both unadjusted and lead-time-adjusted 10-year survival were better for the screen-detected cases than for the symptomatic cases for all grades and overall, although the advantage was smaller after adjusting for lead time. Overall, the unadjusted absolute survival advantage for women with screen-detected tumours was 85.0% − 64.9% = 20.1 percentage points. The expected overall survival in the symptomatic cases if they had had the same grade distribution as the screen-detected cases was 71.6%. The proportion of the survival difference explained by the different grade distributions was therefore 34%, so 66% was due to the difference in grade-specific survival. The overall 10-year survival of the screen-detected tumours adjusted for lead time was 77.4%. This suggests that 37.6% of the difference is due to lead time. The proportion of the remaining survival difference attributable to the differing grade distributions was 0.54; that is, 54% of the difference in survival after adjustment for lead time is attributable to the more favourable grade distribution of screen-detected cases. The greatest survival advantage was seen in women with grade 2 tumours, both before and after adjustment for lead time, and the smallest in women with grade 1 tumours.

Table 4 10-year survival for women aged 50–64 years with symptomatic and screen-detected invasive breast tumours by histological grade of tumour

Since size, node status and grade are correlated, the attributable percentages are not mutually exclusive and cannot be combined additively. Table 5 shows 10-year survival for symptomatic and screen-detected cases when tumours were divided into 10 NPI categories. Overall survival was 66.1% for the symptomatic tumours. For the screen-detected tumours, it was 84.7% unadjusted and 75.5% after adjustment for lead time. Using the unadjusted survival figures, there was a screen-detected survival advantage in all prognostic groups except 4.21 ≤ NPI < 4.38, where survival was slightly better for women with symptomatic tumours. Using the lead-time-adjusted figures, the advantage for women with symptomatic tumours in this prognostic group was even larger. The expected survival for symptomatic tumours if they had had the same NPI distribution as the screen-detected cases was 79.7%. Thus, the NPI distribution accounted for 73% of the survival difference without adjustment for lead time. After lead time adjustment it accounted for the entire difference, since the expected 79.7% exceeds the lead-time-adjusted screen-detected survival of 75.5%.
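The same calculation applied to the 10-category NPI analysis shows what "entirely accounted for" means numerically. The share attributable to the NPI distribution exceeds 1 after lead time adjustment, which is consistent with the possible overadjustment discussed in the text. The variable names are illustrative; the figures are those quoted above.

```python
# 10-year survival (%) for the 10-category NPI analysis, as reported above
sym, scr, scr_lead_time_adj, expected_sym = 66.1, 84.7, 75.5, 79.7

share_unadjusted = (expected_sym - sym) / (scr - sym)
share_adjusted = (expected_sym - sym) / (scr_lead_time_adj - sym)

print(f"share before lead-time adjustment: {share_unadjusted:.2f}")
# A value above 1 means the reweighted symptomatic survival exceeds the
# lead-time-adjusted screen-detected survival
print(f"share after lead-time adjustment:  {share_adjusted:.2f}")
```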

Table 5 10-year survival by mode of detection for women aged 50–64 years with invasive breast tumours in 10 NPI categories

For some of the categories, survival of the screen-detected tumours is poorer after lead time adjustment. This may be because lead time is highly correlated with the prognostic factors that make up the NPI. Within very narrow NPI categories there is therefore little residual lead time, and the correction may be an overadjustment.

Table 6 shows the relative hazard for screen-detected vs symptomatic cancers, unadjusted and adjusted for prognostic factors, and uncorrected and corrected for lead time. The Freedman statistics indicate that size and node status account for 46% of the survival difference, and that correction for lead time together with size and node status accounts for 90% of the difference. The NPI accounts for 67% of the difference, but together with the correction for lead time it accounts for 100%. These results are consistent with those of the Bashir and Estève (2000) method. The lead-time-corrected, NPI-adjusted results again suggest an overcorrection.

Table 6 Cox regression results – relative hazards for screen-detected vs symptomatic tumours, the effect of adjustment for prognostic factors and correction for lead time, and the Freedman et al (1992) percentage of the survival advantage of screen-detected tumours accounted for by adjustment and correction

Discussion

We analysed the 10-year survival data of 19 411 women aged 50–64 years diagnosed with invasive breast cancer in the West Midlands region of the United Kingdom. The availability of this very large tumour series with detailed screening histories made it possible to divide the cancers into very narrow prognostic bands. We found a strong survival advantage for women with screen-detected tumours, as seen in many studies comparing screen-detected and symptomatic breast cancers (Wishart et al, 2008; Dawson et al, 2009; Lawrence et al, 2009). The survival advantage was partly explained by the more favourable distribution of tumour size in narrow prognostic categories. When screen-detected tumour survival was additionally adjusted for lead time, the survival advantage was still evident, but smaller. When the tumours were classified into 10 categories by size and node status, the survival difference was almost entirely accounted for by a combination of lead time and the more favourable size and node status of the screen-detected cancers, with a remaining absolute survival difference of <1%.

A strong survival advantage was also seen for women with screen-detected tumours after adjustment for histological grade, and it was again attenuated by adjustment for lead time. When we adjusted simultaneously for lead time and the NPI, which incorporates tumour size, node status and histological grade, the survival difference between screen-detected and symptomatic tumours was entirely accounted for. However, one might argue that histological grade is in many cases an innate feature of tumour biology rather than a time-progressive attribute of the tumour, so that adjustment for size and node status may be more appropriate.

The lead time adjustment is rigorous and based on empirical estimation of the average preclinical screen-detectable period from a large randomised trial, which estimated the average sojourn time as 3.9 years (Tabár et al, 2000). This gave an average additional observation time due to lead time of 3 years in the screen-detected cases in our data. The method also depends on the observed survival time, so the lead time correction is on average smaller for poor-prognosis tumours than for tumours with favourable prognostic attributes. This makes overcorrection unlikely, although there may be some overcorrection within prognostic categories defined partly by non-progressive features. This may be the case for the NPI results, since for at least some tumours grade is an innate rather than a progressive characteristic. There is a wide range of sojourn time estimates in the literature, and a shorter mean sojourn time would give a smaller proportion of the survival difference accounted for by lead time. However, estimated mean sojourn times vary by age, and in this age group (50–64 years) they are mostly close to our estimate of 3.9 years (Paci and Duffy, 1991; Tabár et al, 2000; Weedon-Fekjaer et al, 2005).
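The dependence of the correction on the assumed mean sojourn time can be illustrated in Python by evaluating the expected lead time for a case censored at 10 years. The sketch assumes the censored-case form (1 − e^(−λt))/λ, with λ the reciprocal of the mean sojourn time. The value 3.9 years is the estimate used in this study; 2 and 6 years are arbitrary comparison values, and the function name is hypothetical.

```python
import math


def lead_time_if_censored(t: float, mean_sojourn: float) -> float:
    """Expected lead time for a case censored at t, with lambda = 1 / mean sojourn time."""
    if t <= 0 or mean_sojourn <= 0:
        raise ValueError("t and mean_sojourn must be positive")
    lam = 1.0 / mean_sojourn
    return (1.0 - math.exp(-lam * t)) / lam


if __name__ == "__main__":
    # Mean sojourn times in years: 3.9 as estimated above, 2 and 6 for comparison
    for mean_sojourn in (2.0, 3.9, 6.0):
        print(mean_sojourn, round(lead_time_if_censored(10.0, mean_sojourn), 2))
```

A shorter mean sojourn time yields a smaller subtracted lead time, and therefore a smaller share of the survival difference attributed to lead time.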

The conclusion up to now has been that the portion of the survival advantage of screen-detected cancers that could not be attributed to the prognostic factors size and node status (and possibly grade) must be attributable to unobserved biological covariates (Collett et al, 2005; Wishart et al, 2008). Our results suggest that a combination of lead time with size and node status in 10 categories explains almost all of the survival advantage. This does not invalidate the hypothesis of further unobserved biological differences, since biological tumour features will almost certainly affect tumour progression rates and therefore lead time. It does, however, suggest that only a small proportion of the survival advantage of screen-detected cancers remains to be explained by biological differences between screen-detected and symptomatic tumours.

Such biological differences are likely to give rise to length bias, the tendency of screening to detect slower-growing tumours. The extreme form of length bias is overdiagnosis, the detection by screening of cancers that would never have been diagnosed in the host's lifetime if screening had not taken place. Estimates of overdiagnosis vary considerably (Biesheuvel et al, 2007). The results here do not formally estimate the overdiagnosis rate. However, the small part of the survival benefit that remains unattributed after correction for lead time and adjustment for tumour size and node status would require only a small degree of overdiagnosis (between 3% and 10%) to account for it.

In addition to the survival difference conferred by different distributions of prognostic factors, there were notable differences in survival within prognostic categories. Broadly, substantially better survival was observed with screen detection for node-negative tumours of size 21–50 mm and node-positive tumours of size ≤30 mm. These differences were partly but not entirely explained by lead time.

In conclusion, in this large tumour series, the better survival of screen-detected breast cancers was almost entirely explained by a combination of lead time and the improved size and node status of screen-detected tumours.