Abstract
Background/Aim: Breast cancer remains a major global health concern. This study aimed to develop a deep-learning-based artificial intelligence (AI) model that predicts the malignancy of mammographic lesions and reduces unnecessary biopsies in patients with breast cancer. Patients and Methods: In this retrospective study, we used deep-learning-based AI to predict whether lesions in mammographic images are malignant. The AI model learned the malignancy as well as margins and shapes of mass lesions through multi-label training, similar to the diagnostic process of a radiologist. We used the Curated Breast Imaging Subset of Digital Database for Screening Mammography. This dataset includes annotations for mass lesions, and we developed an algorithm to determine the exact location of the lesions for accurate classification. A multi-label classification approach enabled the model to recognize malignancy and lesion attributes. Results: Our multi-label classification model, trained on both lesion shape and margin, demonstrated superior performance compared with models trained solely on malignancy. Gradient-weighted class activation mapping analysis revealed that by considering the margin and shape, the model assigned higher importance to border areas and analyzed pixels more uniformly when classifying malignant lesions. This approach improved diagnostic accuracy, particularly in challenging cases, such as American College of Radiology Breast Imaging-Reporting and Data System categories 3 and 4, where the breast density exceeded 50%. Conclusion: This study highlights the potential of AI in improving the diagnosis of breast cancer. By integrating advanced techniques and modern neural network designs, we developed an AI model with enhanced accuracy for mammographic image analysis.
Breast cancer is one of the most common cancers affecting women worldwide (1). A concerning statistic revealed that one in every eight women in the United States will be diagnosed with this condition during their lifetime (2). The importance of early detection cannot be overstated; diagnosis at stage 0 or 1 promises a 5-year survival rate of 99% (3). However, for those detected at stage 3, the rate decreases drastically to 72% (4). Accordingly, an accurate diagnosis of breast cancer has become an important research topic in the medical field (5).
Mammography, the gold standard for early breast cancer detection, primarily identifies the tumor location and size. However, this method can misdiagnose 10-30% of malignancies (6). Furthermore, of the women recommended for biopsies based on mammographic findings (7), 80% have benign conditions (8). An unnecessary breast biopsy causes a short-term decrease in the patient’s quality of life owing to anxiety and pain before the procedure (9). When diagnosis is delayed, tumors present at a less favorable stage than at initial detection, resulting in more mastectomies (10). Additionally, mammography sometimes fails to detect malignancies in women with dense breast tissue because dense tissue can obscure tumors (11).
The potential of artificial intelligence (AI) to address these shortcomings has become a popular research topic (12, 13). However, it is often unclear whether an AI model draws on medical knowledge comparable to that of a specialist when making decisions, and its interpretability remains ambiguous (14). Many of these models have achieved detection accuracies of approximately 80-85% (15, 16). Moreover, they are vulnerable to data imbalances, leading to misclassification rates of up to 15% (17-20).
In this study, we developed an AI model to classify malignant lesions using mammographic images from the widely recognized Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM). We aimed to design AI methods that mirror the intricate observations made by radiologists and capture the diverse locations and morphologies of lesions. To achieve this objective, we integrated a sophisticated cascade region-based convolutional neural network (R-CNN) design and employed data-augmentation techniques. Our primary goal was to increase the accuracy of lesion detection using mammography. We hypothesized that, by imitating the diagnostic methods of actual radiologists, the properties of mass lesions could be learned using deep learning-based AI to improve the accuracy of malignancy prediction by mammography.
Patients and Methods
Dataset. This study was approved by the Institutional Review Board (IRB) of the authors’ affiliated institutions (IRB No. AJOUIRB-EX-2023-422). The need for informed consent from participants was waived by the IRB because of the retrospective nature of this study. In this study, we used the CBIS-DDSM in the Digital Imaging and Communications in Medicine format for patients with breast cancer (21). The repository includes 3,568 radiographic images from 1,566 patients. Each patient’s dataset predominantly featured craniocaudal (CC) and mediolateral oblique (MLO) views. To integrate the shape and margin properties of mass lesions into our model training framework, which is the main focus of our study, we excluded images without mass lesions from the learning process. As a result, a total of 892 patients and 1,696 mammography images were used in this study. Among these, the test dataset, as defined by the CBIS-DDSM data curators, consisted of 201 patients and 378 images (Table I). The entire cohort included only malignant and benign tumors. Detailed information on the shapes and margins of the mass lesions is shown in Table II. To ensure a balanced distribution of malignant and benign cases in the training and validation datasets, we employed stratified k-fold cross-validation using the StratifiedKFold module from the Python library scikit-learn. This technique partitioned the data into 5 folds while preserving the relative class frequencies, maintaining an approximate 8:2 ratio of malignant to benign cases in each fold. By using stratification, we mitigated potential bias and promoted a balanced learning process for our model.
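The stratified split described above can be sketched with scikit-learn; the labels below are illustrative stand-ins mirroring the approximate 8:2 class ratio, not the actual CBIS-DDSM annotations:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative labels only (1 = malignant, 0 = benign), mirroring the
# approximate 8:2 class ratio described in the text.
labels = np.array([1] * 40 + [0] * 10)
X = np.zeros((len(labels), 1))  # placeholder for image references

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_counts = []
for train_idx, val_idx in skf.split(X, labels):
    # Each validation fold preserves the malignant:benign ratio of the cohort.
    fold_counts.append((int(labels[val_idx].sum()),
                        int((labels[val_idx] == 0).sum())))
```

With 40 malignant and 10 benign samples, every validation fold holds exactly 8 malignant and 2 benign cases, which is the balance-preserving behavior relied on in the text.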
Distribution of patients and mass images in the training and testing sets of the Curated Breast Imaging Subset of Digital Database for Screening Mammography dataset.
Mass margins and shapes and number of images.
Data preprocessing. Considering the memory limitations of the graphics processing unit, the images were resized using linear interpolation. During this process, zero-padding was applied along the shorter axis to preserve the original aspect ratio and produce square images. Before being entered into the machine learning model, the radiographic data were normalized, constraining pixel values to the interval [0, 1].
To prepare the input data for the detection model, the mammographic images were resized to a height of 960 pixels while maintaining their aspect ratio. This resizing process ensures that the images are compatible with the input requirements of the detection model while preserving the relative dimensions of the lesions. Once the detection model identified the lesions within the mammographic images, the regions of interest (ROIs) containing the lesions were extracted. These ROIs were then resized to a fixed size of 224 pixels in width and 224 pixels in height. The resized lesion ROIs served as the input for the malignancy classification model. This two-stage approach, consisting of lesion detection followed by malignancy classification, allows for a more focused analysis of the lesions and helps improve the overall accuracy of the breast cancer diagnosis system.
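A minimal numpy sketch of this two-stage preprocessing follows; nearest-neighbour index mapping stands in here for the linear interpolation used in the study, and the function names are our own:

```python
import numpy as np

def nn_resize(img: np.ndarray, target_h: int, target_w: int) -> np.ndarray:
    """Resize a 2-D image by index mapping (nearest-neighbour stand-in for
    the linear interpolation used in the study)."""
    h, w = img.shape
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(target_w) * w / target_w).astype(int)
    return img[np.ix_(rows, cols)]

def preprocess_full_image(img: np.ndarray, target_h: int = 960) -> np.ndarray:
    """Detector input: resize to a 960-pixel height keeping the aspect ratio,
    zero-pad to a square, and normalize pixel values into [0, 1]."""
    h, w = img.shape
    resized = nn_resize(img, target_h, max(1, round(w * target_h / h)))
    size = max(resized.shape)
    padded = np.zeros((size, size), dtype=np.float32)
    padded[:resized.shape[0], :resized.shape[1]] = resized
    return padded / 255.0

def preprocess_roi(roi: np.ndarray, size: int = 224) -> np.ndarray:
    """Classifier input: resize the detected lesion ROI to a fixed 224x224."""
    return nn_resize(roi, size, size).astype(np.float32) / 255.0
```

The detector thus always sees square 960-pixel inputs with undistorted lesions, while the classifier sees fixed 224×224 crops of the detected ROIs.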
Image augmentation. Compared with images from other fields, medical images are typically characterized by scarcity and class imbalance. Due to these characteristics, augmentation is crucial for achieving generalized model performance (22). We employed the following augmentation techniques to develop a model with robust and powerful generalization capabilities.
Brightness adjustment: The brightness and saturation of the images were modulated using the hue saturation, brightness, contrast, and gamma techniques.
Elastic transformations: Techniques, such as optical distortion, grid distortion, elastic transformation, and shift-scale rotation were used to introduce various elastic deformations into the images.
Noise injection: Blur, Gaussian, and multiplicative noise techniques were applied to introduce noise into the images.
Image blending techniques: The mix-up technique was used to overlap two images to produce a single composite, whereas the Mosaic (23) technique was used to combine four training images into a single one.
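As an illustration of the image-blending step, mix-up can be sketched in a few lines of numpy; alpha=0.4 is a common default for the Beta distribution, not a value taken from the study:

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, alpha=0.4, rng=None):
    """Mix-up: blend two images and their labels with a Beta-distributed
    weight lam, producing a single composite training example."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = float(rng.beta(alpha, alpha))
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    mixed_label = lam * np.asarray(label_a, float) + (1.0 - lam) * np.asarray(label_b, float)
    return mixed_img, mixed_label, lam
```

The blended label remains a valid probability distribution, which is what confers the regularization effect described above.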
Object detection in mammographic images. Recognizing that identifying the location of lesions is critical for a more precise diagnosis, we initially implemented an object detection model to identify these lesions, followed by their classification. A cascade R-CNN was applied to further fine-tune the detection of lesion locations using supervised learning. The detection performance was evaluated using mean average precision (mAP), which is a popular metric that provides a comprehensive assessment of object detection accuracy by averaging the precision values across different recall levels.
Multi-label classification. ViT-B DINO-v2, a vision transformer model pre-trained using the self-supervised learning method DINO (Self-Distillation with No Labels) (24), was used for training during the supervised learning phase. Vision transformers (ViTs) have recently emerged as a powerful alternative to convolutional neural networks (CNNs) for image classification tasks, showing improved performance and generalization capabilities (25). We applied multi-label classification to concurrently train the model on the malignancy and mass characteristics of each lesion. By learning multiple labels associated with an image, the AI model can capture a wide range of image features and their intricate relationships (26). This approach closely mirrors the diagnostic process employed by radiologists, as recommended by the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) guidelines (27).
The shape and margin of a mass provide critical information about its relationship with the surrounding tissues. A mass with an irregular margin or asymmetric shape, or one whose boundaries with adjacent tissues are blurred or appear infiltrative, often indicates malignancy (28). Rather than merely focusing on a binary classification into benign or malignant, we designed our training model to consider these intricate characteristics of the mass (Figure 1). In the final step, the model accepted two image views, namely CC and MLO. The maximum probability across these lesion images was computed to predict the likelihood of the patient’s breast being malignant (Figure 2).
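The multi-label objective and the per-patient aggregation can be sketched as follows; the equal weighting of the three heads is our assumption, as the study does not state its loss weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def multilabel_loss(logit_mal, logits_shape, logits_margin,
                    y_mal, y_shape, y_margin):
    """Combined objective: a binary head for malignancy plus categorical
    heads for mass shape and margin. Equal weighting of the three terms is
    an assumption, not a detail stated in the study."""
    p_mal = sigmoid(logit_mal)
    loss_mal = -(y_mal * np.log(p_mal) + (1 - y_mal) * np.log(1 - p_mal))
    loss_shape = -np.log(softmax(logits_shape)[y_shape])
    loss_margin = -np.log(softmax(logits_margin)[y_margin])
    return float(loss_mal + loss_shape + loss_margin)

def patient_malignancy(view_probs):
    """Patient-level prediction: the maximum malignancy probability over the
    lesion probabilities obtained from the available views (e.g., CC, MLO)."""
    return max(view_probs)
```

Training on the summed loss forces the shared encoder to represent margin and shape as well as malignancy, while the max rule converts per-view lesion probabilities into one patient-level score.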
Region of interest (ROI) classification. This figure shows the architecture of a classifier that learns to classify lesion images found by a detector. We used three classifiers, learning the malignancy (malignant or benign), shape, and margin of the mass, respectively. Through this process, the encoder creates a better image representation.
Training process. The detector identifies lesions in the mammography image and learns them using a multi-label classification method with a vision transformer.
To assess the performance of our multi-label classification model, we employed several widely used evaluation metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, and accuracy. The AUROC measures the model’s ability to discriminate between classes, with a higher value indicating better performance. The F1 score is the harmonic mean of precision and recall, providing a balanced measure of the model’s accuracy. Accuracy represents the proportion of correct predictions made by the model. Moreover, to gain insights into the regions of the images that most influenced the model’s predictions, we utilized Gradient-weighted Class Activation Mapping (GradCAM). GradCAM generates a heatmap that highlights the regions of the input image most important for a particular class prediction. Comparing the GradCAM visualizations of our multi-label model with those of a model trained solely on lesion malignancy facilitates a better understanding of the effect of incorporating mass shape and margin information into the model’s decision-making process.
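These three metrics are available directly in scikit-learn; the toy predictions below are for illustration only, not outputs of the study:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

# Toy predictions for illustration only (not the study's outputs).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.55])
y_pred = (y_prob >= 0.5).astype(int)

auroc = roc_auc_score(y_true, y_prob)  # threshold-free ranking quality
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
acc = accuracy_score(y_true, y_pred)   # fraction of correct predictions
```

Note that AUROC is computed from the raw probabilities, whereas F1 and accuracy require thresholded class predictions.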
Results
For mass detection on mammograms, our model demonstrated promising results, achieving a mean average precision (mAP) of 0.713, mAP@0.5 of 0.843, and mAP@0.75 of 0.628 (Figure 3). We employed multi-label classification to address the complexity of mass lesions and trained our model simultaneously on two distinct attributes: the shape and the margin of the mass. When trained on mass shape and margin, our model achieved an F1 score of 0.7789 (95% CI=0.7712-0.7867) and an AUROC of 0.8522 (95% CI=0.8480-0.8565).
Lesion detection model performance and examples. We demonstrate an example prediction of the lesion detection model on mammography images. The red box is the ground truth marking the actual lesion, and the green box is the model’s prediction, showing the predicted location and probability value of the lesion. The table below evaluates the detection performance of the model using mean average precision (mAP) across various intersection over union (IoU) thresholds. Our model achieved a performance of 0.843 at mAP@0.5.
In contrast, when trained solely on the malignancy of the lesion, the model’s performance decreased slightly, with an F1 score of 0.7697 (95% CI=0.7692-0.7703) and an AUROC of 0.8408 (95% CI=0.8362-0.8440). The difference in performance between the two models was statistically significant (p=0.0112 for F1 score, p=0.0018 for AUROC) (Table III).
Malignancy classification performance on the Curated Breast Imaging Subset of Digital Database for Screening Mammography (the developed model versus other models).
Figure 4 compares GradCAM visualizations of the model trained only on the malignancy of the lesion with those of the model trained on the overall characteristics (margin and shape) of the tumor. GradCAM confirmed that the model that learned the margin and shape types assigned higher weight to the border area and attended to the pixels more uniformly when classifying malignant lesions.
Comparison of the developed model with other models using gradient-weighted class activation mapping. The image features generated by an image encoder that learns both the margin and shape of the mass delineate the lesion more specifically than those of an encoder that learns only whether the lesion is malignant.
In contrast, the model that learned only whether a malignant lesion was present tended to make decisions based on specific pixels only. Table IV presents the performance evaluation results stratified by breast density according to the ACR BI-RADS guidelines. In this case, we obtained the probability of malignancy for lesions detected in four mammographic views: LEFT, RIGHT, CC, and MLO. The maximum probability was subsequently used to evaluate whether the patient had a malignant lesion. In particular, in categories 3 and 4, where the density was ≥50% (29), our methodology predicted breast cancer better than training on lesion malignancy alone.
Performance results according to breast density.
Discussion
In this study, we successfully developed an AI model tailored to classify lesions using the CBIS-DDSM. The pathological diagnosis of malignant lesions poses a unique challenge, and our model was designed to capture the diverse shapes and margins of the mass. As a preliminary step, we developed a lesion detection model to localize lesions before classification; this model, with a ConvNeXt backbone integrated with a cascade R-CNN detector, showed notable precision and efficiency. Our model, which employs this architecture, significantly outperformed conventional models (30). We believe that the insights gained from this study can strengthen the role of AI in the medical domain.
Image augmentation increases the amount of image data by applying various transformations to existing images (31). Compared with general images, medical images have complex distributions, less diversity, and severe class imbalance (32), and available data are often scarce, making augmentation essential. In our study, we harnessed the capabilities of the “albumentations” and “mmdetection” Python libraries to implement diverse augmentation strategies. One such strategy, distortion, involves the application of elastic transformations to warp the image. Given the inherent variability in lesion shapes on mammographic images among different patients (33), such augmentation strengthens the model’s generalization capabilities across a broad spectrum of image variants (34). Furthermore, we used the mix-up technique, wherein two images are superimposed, conferring robustness against adversarial data and curtailing the risk of overfitting (35).
ResNet, a traditional CNN model, has long demonstrated good performance using a simple idea, the residual block. However, this model does not reflect cross-channel information (36). The overall structure of ConvNeXt is similar to that of ResNet50. Nevertheless, it comprises an initial feature-extraction (stem) layer, middle stages in which bottleneck blocks of four different dimensions are stacked separately, and a final high-dimensional classification layer (37).
The architecture of standard CNNs was modernized to construct a hierarchical vision transformer, enabling newer models to surpass the performance of existing CNN models, including ResNet (38). ConvNeXt has a multistage design with varying feature map resolutions for each stage. Therefore, it is easy to extract features from radiographic images (39).
The cascade R-CNN introduces a three-stage detector structure to address these challenges. The intersection over union (IoU) is the most popular evaluation metric for object detection benchmarks (40). The IoU is the ratio of the overlap between the predicted and annotated regions to their union, and a high IoU indicates that the model localizes objects well (41). Each stage was trained with an incrementally higher IoU threshold. Initially trained with an IoU of 0.5, the detector generated region proposals. These proposals were then fed into the subsequent detector, which was trained with an IoU of 0.6. Finally, the output values were derived from a detector operating at an IoU of 0.7. This methodology substantially mitigates the overfitting problem and ensures consistent performance during the training and inference phases (42).
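The IoU used for these stage thresholds can be computed directly from box coordinates; a minimal sketch (the helper `passes_stage` is our own illustrative name):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Each cascade stage treats a proposal as a positive sample only if its IoU
# with a ground-truth box meets that stage's threshold (0.5, then 0.6, then 0.7).
def passes_stage(pred, gt, threshold):
    return iou(pred, gt) >= threshold
```

Identical boxes give an IoU of 1.0, disjoint boxes 0.0, and partially overlapping boxes a value in between, which is what the rising 0.5/0.6/0.7 thresholds progressively tighten.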
The unique characteristics of cascade R-CNNs are essential for detecting lesions on mammographic images. Lesions often require comparison with surrounding tissues for accurate assessment (43), making higher IoU values indispensable for precision detection. The trilevel structure of the cascade R-CNN facilitates this process. Furthermore, the integration of this detector considerably augmented the accuracy of lesion detection in dense breast tissue (Figure 3).
The core of our model for classifying the malignancy of lesions after detection is the multi-label classification approach. In the GradCAM analysis, our methodology focused on pixels that closely resembled the actual ROI mask of the mass lesions (Figure 4). Notably, the performance difference depended on breast density. Our model did not show a significant performance increase over the model that learned only the malignancy of lesions when the density was <25%. However, for cases with density >25%, a significant improvement in AUROC was observed (Table IV).
Study limitations. First, the data used for training the model were derived exclusively from the CBIS-DDSM. The retrospective nature of the data may introduce potential biases and limit the generalizability of the findings. Moreover, the model’s performance on other datasets, particularly prospective data, is yet to be evaluated. External validation using independent datasets is necessary to assess the model’s robustness and generalizability. Furthermore, it remains unclear how the AI model employed in this study classifies other types of breast cancer. Second, our model follows a two-stage approach, detecting lesions and then classifying their malignancy, rather than a more streamlined and efficient end-to-end structure. Therefore, a method is required to handle cases in which the detector fails to identify a lesion. Third, further improvement in the prediction score could be achieved using techniques such as ensembling with other models. Finally, convolutional networks are prone to information loss during pooling (44); comparing the results using a transformer backbone such as the Swin transformer (45) would therefore be valuable.
Conclusion
In conclusion, our findings demonstrate the potential of AI in improving breast cancer diagnosis, particularly regarding dense breast tissue. By integrating cutting-edge techniques with contemporary neural network designs, we present an AI model demonstrating superior accuracy in mammographic image analysis. Future studies should assess its broader clinical significance and potential utility in patients with dense breast tissues.
Acknowledgements
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant numbers: HR21C1003, HR22C1734). This work was also supported by a research fund of Ajou University Medical Center (2023).
Footnotes
Authors’ Contributions
Conceived and designed the analysis: J.H.P., J.H.; Collected the data: J.H.P., J.H.L., S.K.; Contributed data or analysis tools: J.H.P., J.H.; Performed the analysis: J.H.P.; Wrote the paper: J.H.P.; Manuscript editing: J.H.
Conflicts of Interest
The Authors have no conflicts of interest to declare relevant to this article.
- Received July 8, 2024.
- Revision received July 17, 2024.
- Accepted July 18, 2024.
- Copyright © 2024 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).