Introduction

Anaplastic thyroid carcinoma (ATC) is a very rare tumor of the thyroid gland, featuring undifferentiated tissue. ATC is diagnosed in about one percent of thyroid cancers with nearly 100 percent disease-specific mortality1. In most cases, surgery is intended to prevent imminent airway compromise and despite multimodality approach including post-surgery radio/chemotherapies, many ATC patients have very poor outcome2. Radioactive iodine administration which has been routinely applied and considered to be effective in differentiated thyroid cancers, is not successful in ATC patients as ATC tissue does not concentrate iodide due to defects in sodium-iodide symporter (NIS) expression, structure or translocation3. Attempts have been made to induce radioactive iodine uptake in ATC cells4, e.g., by stable expression of NIS5 or suppression of inhibitory signaling pathways6,7, however none have yet been clinically used for ATC. The inability of ATC cells to respond to radioactive iodine seems to be the result of several genetic and epigenetic abnormalities. ATC is a genetically complex disease, and various mutations, notably in BRAF, TP53, TERT, RAS, and PIK3CA, have been noted in samples8,9,10,11. Moreover, many breaks and copy number variations have been observed in ATC samples12. For instance, with regard to NIS expression it was found that BRAFV600E, one of the most frequent mutations in ATC, leads to NIS downregulation through induction of TGFβ secretion13 or regulation of DNA methyltransferase 114. In addition, membranous NIS-expressing thyroid tumor samples were shown to have wild-type BRAF and N-RAS15. Moreover, suppression of MAPK and PI3K/AKT signaling pathways, two pathways with the most frequent mutations in ATC, led to NIS restoration6,7. This complicated and multi-pathway regulation of NIS expression has persuaded us to exploit other approaches in order to better understand it.

At several levels, biological entities show such complicated structures and behaviors that systemic approaches are required to complement studies on single molecules. Further encouragement to try these approaches came from the similarities between biological and non-biological (e.g., social) networks that facilitate use of similar methods16. Centrality analysis is one of these methods employed by biological researchers to determine the importance of each node (e.g., protein) in a network (e.g., protein–protein interaction (PPI) networks)17. In such networks, protein interactions are undirected and unweighted edges18. Centrality analysis has been applied in order to find essential proteins of PPI networks in a wide range of organisms19,20,21,22,23. This method has also been brought to cancer research24. In almost all studies, the goal was finding the most suitable centrality indices or a combination of them, usually by comparing the ranking results within experimental data17,18,19,20,21,22,23,25,26,27,28. Finding essential proteins of a network using centrality analysis can help researchers to design therapeutics and genetic manipulations with a minimalistic approach, and therefore lower cost.

Considering the complexity of NIS expression regulatory pathways in ATC, we attempted to simplify it by building a model network comprising the transcription factors and signaling pathways (MAPK, TGFβ/SMAD, PI3K/AKT, and TSHR) related to NIS expression13,29,30. Based on 167 input proteins, the NIS regulatory protein interaction network (NIS-ERPIN) was established. These proteins include transcription factors and signaling pathway proteins that regulate NIS expression, along with some other proteins that interact with them. NIS-ERPIN has then been analyzed for both modularity and centrality and sorted based on different centrality indices. Ranks of proteins in each centrality index have been considered as indicators of proteins essentiality within the network. Protein essentiality in this network may be utilized in order to choose molecular targets for NIS expression and consequently response to radioactive iodine therapy in ATC patients if one is going to disrupt the network.

Methods

Creating the NIS-ERPIN

We have built a network of proteins, mainly including the four signaling pathways and transcription factors that regulate NIS expression (Supplementary Table S1). These proteins were initially mined manually through literature review and were enriched by several interacting proteins that were recommended by BioGRID31, together making the list of 167 input proteins (Supplementary Figure S1). BioGRID is the Biological General Repository for Interaction Datasets, a curated database of different types of interactions, notably protein–protein interactions. Using Cytoscape (version 3.7.1)32, a software capable of modeling the networks of molecular interactions, a network was reconstructed with the data from BioGRID. It is possible to create a network for which all nodes are not necessarily connected to each other. Therefore, we have a whole network including different connected sub-networks. In most cases, there is a very large connected sub-network and other small connected sub-networks. If the other small sub-networks contain a very small number of nodes (compared to the largest one), it is possible to remove them from the analyses. At the present study, the largest connected sub-network was considered for further topological analyses which contains 1,278 nodes and 76,924 edges or interactions (Supplementary Figure S2).

Centrality analysis

Eleven centrality indices were used in this study: Degree, Betweenness, Closeness, Bottleneck, Radiality, Stress, Clustering Coefficient, EcCentricity, Edge Percolated Component (EPC), Density of Maximum Neighborhood Component (DMNC), and Maximal Clique Centrality (MCC)33. Based on the primary ranking, eight centrality indices were informative and thus categorized in three groups: (1) representing the number of immediate neighbors of a node (Degree), (2) indicating the number of interactions in the circle of neighbors (DMNC, and Clustering Coefficient), and (3) reflecting distance and information flow in the network (EPC, Closeness, Radiality, Betweenness, and Stress).

Centrality indices in each category averaged out to give a combination index which we used to compute weighted and unweighted averages of protein ranks. Since we hypothesized that, in NIS-ERPIN, distance- and information flow-based centrality indices are better predictors of protein essentiality, weighted averages were computed to reflect this assumption. The weighted average was calculated by giving a factor of 0.7 to the average of information flow indices, 0.2 to Degree, and 0.1 to the average of Clustering Coefficient and DMNC.

Modularity analysis

A network of protein interactions was generated using STRING database34. The output file, the list of protein interactions, was imported in Gephi 0.9.235 for modularity analysis by Blondel’s method36. A graph of 2039 nodes and 76,924 edges was created with the resolution set at one. For comparison, three random graphs with the same number of nodes and resolutions were also generated.

Mutated genes in ATC

To obtain the mutational landscape of genes that encode proteins of NIS-ERPIN, we used the results of the whole-exome sequencing on ATC samples with the largest sample size, including 22 ATC tissue specimens and four ATC cell lines37. At the present study, the list of mutation frequencies for proteins of NIS-ERPIN in ATC is extracted from the results of Kunstman et al.37 study and is presented in Supplementary Table S2. The mutation frequencies, normalized and sorted based on the protein length, in ATC were compared with the lists obtained from the centrality analysis. The correlation between lists was evaluated using R software (version 3.5.2) in Mac Operating System.

Results

Scoring based on centrality indices and averaging out the proteins ranks

Of the eleven centrality indices used, the results of three (Bottleneck, MCC, and EcCentricity) were not informative (data are not shown) and as a result were excluded from further analyses. They were not informative because their lists consist of mostly repetitive scores. The high-ranked proteins of the eight informative indices are presented in Table 1. Results of the weighted and unweighted averages of protein ranks of three centrality categories are presented in Table 2.

Table 1 Top-20 proteins of the eight informative centrality indices.
Table 2 Weighted and unweighted averages of proteins rank after categorizing centrality indices in three groups, as mentioned in “Methods”.

Modularity analysis

The modularity score of 0.411 was calculated for NIS-ERPIN (plot of size distribution is presented in Fig. 1), whereas three random graphs with the same size of nodes had modularity scores of about one fifth (0.085, 0.084 and 0.082; plots of size distribution are presented in Supplementary Figure S3. Moreover, the graph of NIS-ERPIN contained seven communities of nodes, and the random graphs had nine, eight and ten communities, respectively.

Figure 1
figure 1

Size distribution of protein communities of NIS expression regulatory network. 7 communities were detected, and the modularity score was calculated 0.411.

Comparing centrality analysis with mutation frequencies

Interested in the correlation between the rate of mutations and the essentiality of the protein products in NIS-ERPIN, we focused on the mutations of 167 protein-coding genes, reported by Kunstman et al. in the whole-exome sequencing of ATC samples and cell lines37. The list of mutations was compared with the lists of protein ranks from centrality analyses. There was no statistically significant correlation between the list of mutation frequencies and any of the centrality indices (Table 3). Also, all correlation coefficients were below 0.3 and considered weak.

Table 3 Correlation analysis of mutation rates and centrality indices shows that there is no statistically significant correlation between any of the centrality indices and normalized mutation rates. Mutation data are from a study by Kunstman et al.37.

Discussion

Centrality analysis has been recently explored in thyroid cancer research38,39. In doing so, the two groups of Shang et al. and Hossain et al. first regarded differential gene expression as their criterion for the importance of genes and then built their networks. Taking a different approach here, we have built our network without referring to the gene expression data and therefore we provide a protein interaction network which can be used as a basis for further studies. When creating strategies to disrupt the NIS expression regulatory network, gene expression data can be imported to strengthen or weaken edges and nodes of the network.

An interesting result of our investigation was finding the concentration of apoptotic proteins on top of the unweighted average list of the three centrality categories (Table 2). This outcome indicates that considering all eight centrality indices explored in this study, several apoptosis-related proteins, including BCL2L1, CASP9, XIAP, CASP3, and PARP1, are important in the network. Apoptosis has been proposed as a mechanism of cell death after radioactive iodine uptake in thyroid40 and non-thyroid41,42,43,44 cancer cells. However, our results suggest that apoptosis-related proteins may have a notable role in regulating NIS expression.

Categorizing centrality indices, we hypothesized that those indices, indicating distance and information flow in the network, were better indicators of essentiality in the NIS-ERPIN, if one was going to disrupt the function of the network in ATC. These indices, including Betweenness, Closeness, Stress, Radiality, and EPC, can provide information about the fast flow of signal in the network. Betweenness and Closeness have been shown to present essential proteins of PPI networks in yeast, worm, and fruit fly22. Betweenness effectiveness in discriminating essential proteins is proposed to be independent of the number of connections (Degree centrality)22. In a homology-based study in yeast PPI network, Xiong et al. found that cancer proteins tend to have higher Betweenness scores than average24. In a study on prostate cancer, Closeness and Betweenness were found to be more predictive of unknown genes/proteins related to the disease, whereas Degree was able to explore known related genes/proteins accurately25. Centrality indices of Degree, Betweenness, and Closeness have been proposed to be profitable in exploring essential proteins of cancer PPI networks45 and have been applied in studying the pan-cancer network of proteins related to epithelial-mesenchymal transition46. However, we reasoned that Degree centrality could give high scores to more known proteins due to the simple fact that these more recognized proteins have been the focus of more studies, and therefore more interactions (network neighbors) have been found for them. Considering the distance-based indices, we found that many top proteins in these lists are encoded by highly mutated genes in ATC (Table 2). The results of the present study suggest that information-flow- and distance-based centrality indices may be useful in predicting the essential proteins in regulating NIS expression.

To emphasize the importance of distance and information flow in the network, we put more weight on this category when averaging ranks. We found that top-20 positions are concentrated with proteins which have been frequently mutated in ATC, including AKT1, TP53, PIK3CA, PIK3CB, MAPK3, PIK3CD, PIK3CG, SRC, MAPK1, and MYC (Table 2). These are all main proteins of AKT/PI3K or MAPK pathways. The essentiality of distance-based and information-flow-based centrality indices in our study is in accordance with experimental efforts to restore the NIS function. By using small molecules to inhibit MEK, AKT, and histone deacetylase, Liu et al. could restore NIS expression and iodine uptake in several non-thyroid cell lines6. Also, Hou et al. gained similar results upon downregulation of BRAF and AKT in melanoma cells7. These observations confirm the significance of MAPK and AKT signaling pathways and epigenetic regulation in NIS expression. PI3K and TGFβ were shown to inhibit radioactive iodine uptake, and one mechanism in the case of TGFβ was NIS repression47. In another study to restore iodide uptake, transfection of ATC cells with vector encoding wide-type TP53 gene was successful to express NIS at the mRNA and protein levels and induce radioactive iodine concentration and eventually cell death48. Developing a high-throughput screening, Oh et al. recently found a new tyrosine kinase inhibitor leading to MAPK signaling pathway inactivation and NIS expression49. Accordingly, we conclude that not only do our results suggest that MAPK and AKT/PI3K pathways may be more important in regulating NIS expression but also, they confirm the importance of distance-based centrality indices in this network. Additionally, other high-ranked proteins in the category of distance-related centrality indices may be interesting subjects of study.

In a modular network, communities of nodes which are densely connected can be found, whereas their connections with nodes of other communities are sparse36. Finding evidence of modularity is encouraging, as it suggests that there might be nodes which are highly essential for the integrity of the network, regarded as its weakness. The modularity of NIS-ERPIN emphasizes the significance of centrality analysis presented here. Moreover, we investigated if we can find nodes with high scores in Betweenness and low scores in Degree, which might be the connecting nodes of protein communities50. However, we observed that there was no such node, and that there was a very strong correlation between scores of Betweenness and Degree (data are not presented).

Curious about any correlation between centrality ranks of proteins and their rate of mutations, we also compared the results of the only available whole-exome sequencing of ATC samples with our lists of centrality indices and combinatory indices. Such a correlation may indicate the importance of highly mutated proteins for the NIS-ERPIN; however, we found no statistically significant correlation. This lack of meaningful correlation was not unexpected since mutation frequencies found in the whole-exome sequencing were only partially in accordance with mutation frequencies found in target-based sequencing studies8,9,10,11,12. Whether this controversial outcome is due to possibly inadequate ATC samples investigated in the whole-exome sequencing will be clarified by further examinations. Therefore, future genetic studies may strengthen or weaken the implications of our centrality analysis.

Beginning with 167 input proteins to create our network and listing the top-20 in each index, we propose here the essential proteins of the NIS expression regulatory network (Fig. 2). We hypothesized that those centrality indices that represent the distance and flow of information would be better indicators of essential proteins of NIS-ERPIN. We also found that several proteins encoded by highly mutated genes of ATC were high-ranked in these lists. This study can be used to exclude some targets and include other targets in experimental efforts to restore NIS expression. However, we are aware that NIS downregulation is not the only abnormality that results in an inability of thyroid cells to uptake iodide, and that NIS translocation is also worth considering15. Nevertheless, some proteins of our network probably have a role in the suppression of NIS function in ways other than expression regulation. For example, it has been shown that β-catenin regulates NIS distribution in thyroid cells51. Additionally, NIS inhibitory mutations should be considered, particularly in personalized therapy. In addition to the complicated manner of NIS suppression, it is worth noting that, to interpret raw data from the network analysis, careful reflection on biological context and/or experimental data is required. Besides, there are some limitations regarding the centrality analysis. One of them is different indices could lead to difference on the most important vertex in a network. Hence , using a proper index depends heavily on the context of the reconstructed network and should be chosen with a reasonable process. Another limitation is whether a combination of different indices is more useful than considering an index alone. However, this issue also depends on the basics and properties of the reconstructed network. Therefore, one must find or suggest the proper indices in the study based on the context of the biological problem in the study52. Moreover, PPI networks are affected by false positives and, have also not yet been completed45, which is a source of inaccuracy in these analyses. Therefore, a more cautious approach to network analysis results is recommended.

Figure 2
figure 2

Several proteins of the four signaling pathways, including MAPK (A), PI3K/AKT (B), TGFβ/SMAD (C) and TSH/TSHR (D), alongside a few transcription factors (E) with a role in the regulation of NIS expression in ATC from the list of 167 input proteins, also appeared in top-20 lists of centrality indices.

Conclusion

The results of our modularity and centrality analyses suggest a combination therapy approach to induce NIS expression in ATC cases, particularly if the targets are essential proteins we present in this study. We propose several genes/proteins in the NIS expression regulatory network as interesting targets for manipulation, including: AKT1, TP53, PIK3C, MAPK1, MAPK3, SRC, MYC, EGR1, XIAP, BCL2L1, CASP3, and CASP9. Also, several previously unnoticed proteins which rank high in our analyses might prove to be interesting targets of study. Moreover, we presume that lower rank proteins of our weighted and unweighted combinatory indices are of little importance and interest, and that manipulations based on them will probably result in failure due to the fact that they are not essential enough in the NIS expression regulatory network.