Abstract
Background/Aim: Hashimoto thyroiditis (HT) association with thyroid lymphoma is well established; however, the association with papillary thyroid cancer (PTC) is still unclear. Thyroid cancer incidence has shown an increasing trend in recent years. It is characterized by slow growth, making it generally amenable to successful treatment. Materials and Methods: We aimed to identify genes considered as promising biomarkers of the progression from thyroiditis to thyroid cancer in public gene expression datasets. Results: We identified 70 differentially expressed genes (DEGs) and used them to prioritize biological risk genes for thyroiditis and thyroid cancer. Statistics and a scoring system based on six functional annotations of significant biological impact identified four genes of interest: CXCR4, IL6ST, PPARG and TP53. Kaplan–Meier plots were used to assess the expression levels related to overall survival. Furthermore, a manual bibliographic search was carried out for each gene, and a protein–protein interaction (PPI) network was built to verify their known associations. Conclusion: The results showed that all four genes (CXCR4, IL6ST, PPARG, TP53) were highly relevant to thyroiditis and thyroid cancer, thus making them worthy of further investigation to understand their relationship with these two diseases.
Between 1990 and 2013, the worldwide age-normalized rate of occurrence of thyroid cancer surged by 20% (1, 2). The frequency of thyroid cancer is increasing, and this disease is forecasted to become the fourth most prevalent category of cancer worldwide (1, 2). Thyroid cancer is the predominant endocrine malignancy, showing substantial rises in occurrence in China and East Asia in the previous decade (3).
Hashimoto’s thyroiditis (HT) is triggered by an autoimmune reaction in which immune cells generate autoantibodies (4-9). Most papillary thyroid cancers (PTCs) arise from the thyroid glands without HT and most persons with HT do not develop PTC. In addition, non-environmental risk factors, such as genetic predisposition, may be interconnected with a background of hereditary autoimmune disorders (10, 11).
Genomic and biomedical information in the form of databases has rapidly become accessible, owing to technological progress in experimental and computational biology (12). The field of bioinformatics has steered research toward the realm of precision and personalized medicine (13). This study investigated the intersection between differentially expressed genes (DEGs) in datasets linking thyroiditis and thyroid cancer. Assignment of higher scores (>3) to candidate genes during the annotation process of a scoring system based on six functional annotations indicated a more significant biological impact. Therefore, we employed Gene Expression Profiling Interactive Analysis (GEPIA) and Gene Set Enrichment Analysis (GSEA) to analyze data from various databases, aiming to identify potential biomarkers indicating the likelihood of developing thyroid cancer in thyroiditis.
Materials and Methods
Identification of differentially expressed genes in the two datasets. The gene expression profiling datasets GSE3678 and GSE138198 were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/, accessed on August 7, 2023). The GEO database contains 7 PTC tissue samples and 7 normal thyroid tissue samples in the GSE3678 dataset and 13 PTC samples and 13 HT samples in the GSE138198 dataset. The downloaded data had already been normalized. Here, we used the limma package in R to identify DEGs (14, 15). For the paired samples in GSE138198, we selected correlation between HT and PTC. DEGs were filtered using the following criteria: [log fold change (log FC)] >2 and p-Value <0.05. Our study included samples solely from human skin tissue, and we ensured that the datasets had complete data for analysis and had obtained ethical approval.
The set of DEGs was expanded using the STRING database to identify additional potential target genes. The STRING database (https://string-db.org, accessed on August 7, 2023) was established to combine protein–protein interactions with functional associations with protein expression (16). We inputted the selected list of DEGs from the previous stages and set a threshold of 50 interactions to augment the gene count in the initial DEG set. The rationale for this was that the likelihood of uncovering novel targets for disease treatment increased with increasing the number of disease-protein networks (17).
Prioritizing biological thyroiditis and thyroid cancer risk genes. Next, we employed the extended list of DEGs within the network, using the STRING database for additional functional annotation. This approach aimed to obtain a deeper understanding of thyroid cancer in thyroiditis and to pinpoint potential biomarkers as targets. DEGs were filtered through six functional annotation scoring systems, employing the criteria described herein. 1) Knockout mouse phenotype (KMP): To ascertain whether the gene was associated with specific phenotypic diseases in mice, genes were ranked based on Mammalian Phenotype Ontology (MP) from WebGestalt (2019), with a false discovery rate (FDR) of q-value <0.05, indicating significance (18). 2) Primary immunodeficiency (PID): Thyroiditis and thyroid cancer is linked to innate immune disorders. Importantly, genetic variations that coincide with genes related to PID might play a role in the development of thyroid cancer within the context of thyroiditis. The PID genes were sourced from the 2019 update of the IUIS phenotypical classification (19, 20). Enrichment analysis of the data was performed using a hypergeometric test, with a significance threshold set at p-value <0.05. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database was employed to identify relevant molecular pathways (21, 22). KEGG prioritized genes in STRING. The significance threshold was set at q-value <0.05 (FDR). 3) Gene Ontology (GO): GO categories were differentiated as biological processes (BP), cellular components (CC), and molecular functions (MF) to identify specific biological functions involved in thyroiditis and thyroid cancer. A GO enrichment analysis was performed using STRING (18), with the significance threshold of FDR set as q<0.05. Genes with scores of ≥3 were categorized as “biological risk genes” for the transition to thyroid cancer from thyroiditis. Higher annotation scores corresponded to genes exerting a more significant biological influence on the development of thyroid cancer within the context of thyroiditis and were thus referred to as “biological risk genes”.
Analysis of gene correlation in GEPIA. The online database GEPIA (http://gepia2.cancer-pku.cn) was employed to validate the genes that showed significant correlations in TIMER. GEPIA is an interactive web resource that incorporates data from 9,736 tumor samples and 8,587 normal samples sourced from TCGA and the GTEx projects. This platform can be used to analyze RNA sequencing expression data (23, 24). GEPIA was used to produce survival curves for overall survival (OS) and disease-free survival (DFS) based on gene expression. Correlation analysis of gene expression was performed using specific sets of TCGA expression data. The Spearman correlation was employed to ascertain the correlation coefficient. In the visualization, CXCR4, IL6ST, PPARG, and TP53 were placed on the x-axis, while other genes were depicted on the y-axis. The analysis involved datasets of both tumor and normal tissue.
GSEA for examining the transition from thyroiditis to thyroid cancer. To pinpoint potential candidate genes for thyroid cancer in thyroiditis, we analyzed the collection of biological risk genes associated with thyroid cancer in thyroiditis using the GSEA technique (https://www.gsea-msigdb.org/gsea/index.jsp). GSEA is a computational approach used to assess whether a predefined set of genes exhibits statistically significant and consistent differences between two biological conditions (25).
Statistical analysis. Continuous variables were expressed as the mean±standard deviation (SD). All statistical analyses in this study were performed using R (version 4.3.1) and GraphPad Prism 10 (GraphPad Software, Boston, MA, USA) (26). DEGs were identified using the limma package (14). The WebGestalt 2019 R package was employed to perform over-representation analysis (ORA), which included enrichment analyses of pathways, such as KMP, GO, and KEGG (18). PID was analyzed using the hypergeometric test with a significance threshold set at p-value <0.05. GraphPad Prism 10 was used to visually represent the complete set of DEGs by illustrating the overlapping regions between GSE3678 and GSE138198 (26-28).
In the final phase, we conducted a manual literature search to delve into previous studies investigating each of these genes in relation to thyroiditis and thyroid cancer. Additionally, we employed STRING (16) to construct a protein-protein interaction network, assessing the collective relationships among this group of genes by drawing insights from diverse knowledge sources. The network construction settings were defined as follows: Neighborhood, Gene Fusion, Co-occurrence.
Results
Screening results for DEGs in the two datasets. We successfully identified 194 DEGs – 90 up-regulated (red) and 104 down-regulated (blue) genes – in the GSE3678 dataset as shown in the volcano plots in Figure 1A. Next, we identified the DEGs in the 13 HT and 14 PTC samples from GSE138198. In this step, 146 DEGs – 54 up-regulated (red) and 92 down-regulated(bule) genes – were identified, as shown in volcano plots in Figure 1B. To enhance the rigor of the risk gene identification, we gathered all DEGs by intersecting the analysis outcomes from the two datasets. Seventy genes were found to be common to the two groups, as shown in the Venn diagram in Figure 1C and Supplementary Table S1.
Biological risk genes for thyroiditis and thyroid cancer. We employed a rigorous functional annotation process to identify biological risk genes for PTC in thyroiditis. Before identifying the biological progression from thyroiditis to thyroid cancer in risk genes, we expanded the interaction network of DEGs using the STRING database.
We established a threshold of 50 interactions, inputted 70 DEGs from the earlier stages, and yielded 79 DEGs for subsequent analysis (Supplementary Table S2). In addition, to allocate priority to genes for this study, we employed a scoring system that was previously used in other studies (29-31): 1) KMP (n=3); 2) gene prioritized by PID 2019 (n=2); 3) gene prioritized by KEGG (n=51); 4) gene prioritized by BP (n=76); 5) gene prioritized by CC (n=67); and 6) gene prioritized by MF (n=72). Figure 2A and B show a distribution score for each criterion. Eventually, a total of 79 risk genes associated with thyroiditis and thyroid cancer met the criteria, with a score of ≥3. Our findings from a further analysis of the gene scores revealed top four genes, i.e., CXCR4, IL6ST, PPARG, and TP53, all having higher scores (Figure 2C).
Discovery of candidate genes for thyroiditis and thyroid cancer. GEPIA (using the TCGA cohort data) performs survival analysis to identify the most significant associations with patient survival. The survival analysis was based on the levels of expression of PPARG and TP53 is p<0.05 (Figure 3). Important genes and pathways determined using GSEA. To better understand the role of thyroiditis in the development of thyroid cancer, we applied GSEA to analyze the signatures of thyroiditis and thyroid cancer (p<0.05, FDR <0.25) (Figure 4). The results showed that the main pathways of PPARG and TP53 were HALLMARK_UV_RESPONSE_DN, HALLMARK_ADIPO GENESIS, HALLMARK_CHOLESTEROL_HOMEOSTASIS, HALLMARK_P53_PATHWAY, and HALLMARK_WNT_ BETA_CATENIN_SIGNALING. The set of DEGs was expanded using the STRING database to identify additional potential target genes. We inputted the list of DEGs selected in the previous stages and set a threshold of CXCR4, IL6ST, PPARG, and TP53 interactions to augment the initial DEG count. The rationale was that the inclusion of more disease-protein networks Functional Enrichment in the analysis would lead to increased likelihood of uncovering novel targets for disease treatment (17). The PPI network is shown in (Figure 5).
Discussion
Bioinformatics-driven gene repositioning is a method that uses computational techniques to identify new potential uses for known genes (32). Biomolecular analysis has been extensively used to uncover cancer-causing genes and to dissect the underlying molecular mechanisms of various diseases at the genetic level to improve diagnostics (33). In this study, we attempted to identify new biomarkers for thyroiditis and thyroid cancer, through multiple STRING database approaches.
By identifying existing bioinformatics methods that can target the underlying molecular disease mechanisms, gene repositioning may provide an alternative, promising approach that quickly identifies and prioritizes new potential treatment options for thyroiditis patients. This study analyzed the gene expression profiles of tissue samples from patients with thyroiditis and patients with thyroid cancer and compared them to the profiles of normal samples. This could help identify up-regulated or down-regulated genes in thyroiditis and thyroid cancer tissue, which suggested potential targets for bioinformatics-driven gene repositioning.
In the initial phase of this research, the hypothesis aimed to validate the coherence between the information derived from differential expression in microarray experiments and that inferred from RNA-seq experiments. As depicted in Figure 1, when we specifically examined the two Gene Expression Omnibus (GEO) experiments, it became evident that the number of shared genes is significantly higher. This suggests that, when seeking highly representative genes, it may be necessary to undergo at least two filtering stages to identify those genes that appear to be the most relevant for making distinctions. Consequently, all the integration procedure is left as future work. All in all, an interesting group of DEGs in common was found, the 70 genes of interest in thyroiditis and thyroid cancer. Utilizing the STRING database, a threshold of 50 interactions was set. Seventy DEGs from earlier stages were inputted, resulting in 79 DEGs for subsequent analysis.
Our next step was prioritizing the DEGs for candidate bioinformatics-driven gene repositioning prediction using a scoring system based on six functional annotations. Through this pipeline, we successfully identified the top four biological risk genes as potential biomarkers of thyroiditis and thyroid cancer: CXCR4, IL6ST, PPARG, and TP53. CXCR4 is a G protein-coupled chemokine receptor that regulates tissue development, cell trafficking, proliferation, and the immune response. It is expressed in anaplastic thyroid carcinomas and potentially holds a pivotal role in facilitating tumor cell migration and local invasion (34, 35). IL6ST, also known as gp130, has been found to have a strong association with IL-6 and thyroid disorders (36, 37). PPARG is a nuclear receptor that regulates cell cycle and apoptosis (38). It is not only crucial for synthesizing thyroid hormones and ensuring consistent thyroid function, but also highly relevant to the diagnosis and treatment of numerous thyroid ailments (39). PPARG may be associated with chronic lymphocytic thyroiditis (CLT), a condition sometimes considered a precursor to thyroid cancer, which is negative for anti-thyroid antibodies (38). The point mutations in TP53 are situated within the genomic region from exons 5 to 8 (40). Changes in the sequence, stability, and downstream signaling of p53 play a significant role in the development of thyroid cancer (41). Due to their frequent occurrence in undifferentiated thyroid cancer, the presence of P53 mutations might indicate a highly aggressive thyroid cancer. However, as a standalone factor, the sensitivity of P53 is too low for it to serve as a dependable clinical biomarker (38).
Notably, among the top four potential biomarkers of thyroiditis and thyroid cancer identified from the databases, the PPARG and TP53 genes also demonstrated consistent results in gene pathway analysis. In the survival analysis conducted in GEPIA, only the presence of PPARG showed meaningful results. This highlights the potential of the PPARG pathway as a target for future development to treat thyroiditis and thyroid cancer. These genes have the potential to become markers for thyroiditis and thyroid Cancer diagnosis and may even help to further enhance the care of these patients.
Conclusion
In summary, we found 70 common intersection genes in patients with HT or PTC. The use of a scoring system based on functional annotations revealed that CXCR4, IL6ST, PPARG, and TP53 could serve as important upstream regulatory genes. These findings helped to uncover relationships between certain thyroid-related genes and their polymorphisms in the pathogenesis of PTC. Although thyroglobulin antibody epitope patterns differ between HT and PTC (42), the latter induces a lymphocytic reaction and consequent HT (43). However, the mechanisms of increased TSH in HT may have a role in the increased risk of PTC (44). Therefore, PPARG and TP53 will be the focus of our future research.
Acknowledgements
We thank Tungs’ Taichung MetroHarbor Hospital for assistance with the preparation of this manuscript.
Footnotes
Authors’ Contributions
Conceptualization, Szu-I Yu and Meei-Ling Sheu; methodology, Szu-I Yu; software, Szu-I Yu; validation and formal analysis, Szu-I Yu and Meei-Ling Sheu; investigation, Meei-Ling Sheu; resources, Yao-Hsien Tseng; data curation, Szu-I Yu; writing – original draft preparation, Szu-I Yu; writing – review and editing, Yao-Hsien Tseng; visualization, Szu-I Yu; supervision, Szu-I Yu; project administration, Yao-Hsien Tseng; funding acquisition, Meei-Ling Sheu and Yao-Hsien Tseng. All Authors have read and agreed to the published version of the manuscript.
Supplementary Material
Supplementary material can be downloaded from: https://figshare.com/s/a948e646b4ad8b0ca8d5
Conflicts of Interest
The Authors declare no conflicts of interest.
Funding
This research was funded by Tungs’ Taichung MetroHarbor Hospital, grant number TTMHH-R1120048.
- Received May 9, 2024.
- Revision received June 19, 2024.
- Accepted July 3, 2024.
- Copyright © 2024, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).