Introduction

Transthyretin (TTR) is a plasma protein synthesised predominantly in the liver. It functions as a carrier for thyroxine (T4) and retinol, the latter in complex with retinol-binding protein1,2. Its three-dimensional structure, illustrated in Fig. 1, comprises four identical subunits of ~14 kDa each, forming a highly stable homotetramer rich in β-sheet content2,3,4,5,6,7,8,9,10.

Fig. 1: Wild-type transthyretin.

PyMOL visualisation of the three-dimensional structure of wild-type transthyretin (TTR) predicted using AlphaFold3.

The TTR gene (Gene ID 7276; GenBank accession NG_009490.1) is located on chromosome 18q12.1 and spans ~6.9 kb7. It consists of four exons separated by three introns. Exon 1 encodes a 23-residue sequence that includes a 20-residue signal peptide and the first three residues of the mature protein, while exons 2–4 encode residues 4–127. Throughout this study, residue numbering refers to the mature 127-residue monomer, excluding the signal peptide. Variants are therefore annotated using mature protein nomenclature (e.g., Val30Met or V30M), while genomic nomenclature includes the signal peptide (e.g., p.Val50Met or p.V50M).
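Because the two nomenclatures differ only by the 20-residue signal-peptide offset described above, conversion between them is mechanical. The minimal sketch below assumes one-letter amino-acid codes and simple substitutions only (no deletions or duplications); the function names are hypothetical helpers, not part of any published tool.

```python
import re
from typing import Tuple

# Length of the TTR signal peptide, as stated in the text
# (mature residue 1 corresponds to precursor residue 21).
SIGNAL_PEPTIDE_LENGTH = 20

# One-letter substitution such as "V30M" or "p.V50M" (hypothetical pattern).
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")


def _parse(variant: str) -> Tuple[str, int, str]:
    match = _SUBSTITUTION.match(variant.strip())
    if match is None:
        raise ValueError(f"unsupported variant format: {variant!r}")
    wild_type, position, alternative = match.groups()
    return wild_type, int(position), alternative


def mature_to_precursor(variant: str) -> str:
    """'V30M' (mature numbering) -> 'p.V50M' (signal peptide included)."""
    wild_type, position, alternative = _parse(variant)
    return f"p.{wild_type}{position + SIGNAL_PEPTIDE_LENGTH}{alternative}"


def precursor_to_mature(variant: str) -> str:
    """'p.V50M' or 'V50M' (precursor numbering) -> 'V30M' (mature numbering)."""
    wild_type, position, alternative = _parse(variant)
    mature_position = position - SIGNAL_PEPTIDE_LENGTH
    if mature_position < 1:
        raise ValueError(f"{variant} lies within the signal peptide")
    return f"{wild_type}{mature_position}{alternative}"
```

This is useful when cross-referencing names such as V50M and V142I, which follow precursor numbering (mature V30M and V122I, respectively).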

To date, 216 point mutations have been reported within the first 127 residues of the mature TTR monomer11, encompassing both benign and pathogenic variants12,13,14. While some variants have been extensively characterised, biochemical and clinical data on rare mutations remain limited. Most reported changes are single amino acid substitutions, although notable exceptions include the Val122_del deletion14, duplications at Met13, Glu51, and Ser52, and compound heterozygous mutations observed in individual patients14. In this study, the initial selection of mutations was guided primarily by their established pathogenicity, as reported in the clinical and molecular literature. Among these pathogenic variants, we prioritised mutations predominantly associated with cardiological and neurological phenotypes, the two most clinically relevant manifestations of ATTR. The final subset was chosen based on a combination of disease prevalence, clinical severity, and evidence of structural destabilisation, and the three most globally prevalent variants were selected for molecular dynamics analysis. Figure 2 summarises the computational pipeline applied to 133 point mutations, primarily those classified as pathogenic or of uncertain significance.

Fig. 2: Overview of the computational pipeline.

Starting from the wild-type (WT) TTR sequence and known point mutations, AlphaFold3 was used to generate structural models for all variants. Predicted structures were assessed and analysed in latent space and by means of contact networks to evaluate structural impact. Ligand docking was performed to assess mutation-dependent binding affinities. Finally, optimisation of the existing ligands was proposed to improve their fit to the variants.

Previous investigations have examined the spatial distribution of TTR mutations to assess potential clustering. While residue-level analysis did not reveal significant clustering, sliding-window approaches (e.g., seven-residue intervals)14,15 identified mutation hotspots. Notably, highly amyloidogenic and clinically aggressive variants cluster in regions 1 and 2 (residues 26–59), whereas non-amyloidogenic variants are more prevalent in regions 4 and 6 (residues 97–125), suggesting a non-random distribution across the sequence.

Pathogenic mutations destabilise the native tetrameric structure, facilitating dissociation into monomers that misfold, aggregate, and form insoluble amyloid fibrils. This process underlies both hereditary transthyretin amyloidosis (ATTRv) and wild-type transthyretin amyloidosis (ATTRwt)16,17. Over 150 pathogenic variants have been reported, with Val30Met being the most studied due to its high prevalence in endemic populations affected by familial amyloid polyneuropathy18,19. Disease progression is marked by systemic fibril deposition, leading to multi-organ dysfunction and ultimately death1,20,21.

Clinical manifestations vary by mutation. Destabilising variants often result in early-onset polyneuropathy or cardiomyopathy, whereas ATTRwt primarily presents later in life as cardiomyopathy22,23. Progressive fibril accumulation leads to debilitating complications, including cardiac failure and peripheral neuropathy, with rapid clinical decline in the absence of treatment24,25. Inflammatory responses have also been implicated in disease progression, as elevated cytokines may suppress hepatic TTR synthesis26,27.

Beyond amyloidosis, altered TTR expression has been associated with other pathological conditions such as preeclampsia28, highlighting its broader clinical relevance and potential utility as a biomarker. The complex relationship between TTR sequence variation, structural stability, and disease phenotype underscores the need for systematic molecular characterisation29.

Under physiological conditions, TTR assembles into a kinetically stable tetramer17,30. Mutations that perturb this assembly increase the pool of monomeric intermediates susceptible to aggregation1,18. In contrast, stabilising mutations are associated with higher circulating TTR levels and greater longevity31. Therapeutic strategies aimed at stabilising the tetramer—most notably tafamidis—have shown substantial clinical benefit32,33,34.

Comprehensive structural and functional profiling of TTR variants is critical for advancing precision medicine. Elucidating mutation-specific effects on folding stability, aggregation propensity, and drug binding provides key insights into disease mechanisms and therapeutic responsiveness35,36. Moreover, circulating TTR concentrations and conformational changes may serve as accessible biomarkers for diagnosis and treatment monitoring37,38.

This study focuses on a prioritised and curated subset of 133 TTR variants based on reported pathogenicity, with an emphasis on those associated with neurological and cardiac phenotypes—the two most clinically impactful forms of ATTR. Selection criteria included mutation prevalence, severity of clinical presentation, and evidence of structural destabilisation. The three most prevalent pathogenic variants worldwide were selected for detailed structural and dynamic characterisation via molecular simulations.

Key findings of this analysis include:

  • A comprehensive mapping of mutations across the TTR sequence revealed no consistent correlation between residue position and pathogenicity.

  • High-confidence structural models of both monomeric and tetrameric forms were generated for each variant using AlphaFold3.

  • Predicted changes in Gibbs free energy (ΔΔG) did not exhibit a direct relationship with pathogenic potential.

  • Clustering based on structural similarity identified three major variant groups and suggested that R123S may warrant reclassification as non-pathogenic.

  • Protein contact network analysis demonstrated that pathogenic mutations disproportionately impact structural regions critical for tetramer stability.

  • In silico evaluations of drug binding for two clinically approved stabilisers revealed substantial mutation-dependent variability in predicted affinity.

  • Structure-based modifications of existing therapeutics were proposed to improve binding efficacy in the context of destabilising variants.

Together, these findings provide a comprehensive and structure-guided framework for understanding the molecular basis of TTR amyloidosis and informing the development of mutation-specific therapeutic strategies within a precision medicine paradigm.

Results

Overview of the results

The computational analysis of TTR variants revealed consistent structural and functional signatures associated with pathogenicity. AlphaFold3-predicted models showed high agreement with experimental structures (mean RMSD = 0.2662 Å), and TM-score clustering indicated that pathogenic mutations tend to co-localise, suggesting shared conformational effects. Embeddings from the ESM2 language model, projected via UMAP, accurately distinguished pathogenic from benign variants (ROC AUC = 0.9948), outperforming AlphaMissense, E-SNPs&GO, and VESM++. To assess functional impact, classical and AI-based docking were combined: DiffDock-L and AutoDock Vina revealed mutation-specific binding profiles for tafamidis and acoramidis, with the latter showing higher predicted affinity in several non-canonical contexts. Ligand optimisation via DiffSBDD improved predicted binding to destabilised variants such as Y98F. Network analysis and MD simulations further indicated that both ligands restore structural stability in key pathogenic variants, supporting the utility of AI-guided pipelines for mutation-aware drug design. A complete overview of the results is depicted in Fig. 3.

Fig. 3: Integrated computational framework for the structural and functional characterization of transthyretin (TTR) variants.

A multi-level computational analysis was performed, combining structural prediction, sequence-based representation, and dynamic simulations of TTR variants. A Structural alignment of variant structures. B Experimental validation of the predicted structures. C Two-dimensional UMAP projection of TTR variants for classifying pathogenic and benign variants. D Distribution of TM-scores for the variants, showing that tetrameric structures are more conserved than monomeric ones. E Receiver operating characteristic (ROC) curves illustrating performance in distinguishing benign and pathogenic TTR variants. F PCN analysis identifying mesoscale changes in variants. F Docking pose of acoramidis on a generatively designed Y98F variant, revealing predicted binding interactions from AI-based docking. G RMSD time evolution from molecular dynamics simulations of the V50M variant, supporting the structural stability of the predicted model. H Optimisation of selected ligands. I Molecular dynamics simulation of the binding.

Structural alignment of TTR variants

Structural comparisons between AlphaFold3-predicted and experimentally resolved TTR structures revealed a high degree of agreement. The predicted tetramer achieved a TM-score of 0.9904 when aligned to the crystallographic tetramer (PDB ID: 1ICT), while the predicted monomer obtained a TM-score of 0.9958 compared to the crystallographic dimer structure (PDB ID: 3A4D), demonstrating the reliability of AlphaFold3 in reproducing native conformations.
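For reference, the TM-score of a model against a target of length L_target, for a given superposition and residue correspondence, is (1/L_target) Σ_i 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L_target − 15)^(1/3) − 1.8 Å. Tools such as TM-align and US-align additionally search for the superposition that maximises this quantity. The minimal sketch below only evaluates the formula for coordinates that are already superposed and residue-matched (assumed Cα atoms, in Å); the function name and the lower bound on d0 are assumptions for illustration.

```python
from typing import Optional

import numpy as np


def tm_score_fixed(model: np.ndarray, target: np.ndarray,
                   target_length: Optional[int] = None) -> float:
    """TM-score for pre-superposed, residue-matched (n, 3) coordinates in Å."""
    model = np.asarray(model, dtype=float)
    target = np.asarray(target, dtype=float)
    if model.shape != target.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (n, 3) arrays of matching shape")
    # Normalise by the full target length; unaligned target residues contribute zero.
    length = target.shape[0] if target_length is None else target_length
    # Distance scale d0; the max() guards keep it positive for very short chains.
    d0 = max(1.24 * np.cbrt(max(length - 15, 1)) - 1.8, 0.5)
    distances = np.linalg.norm(model - target, axis=1)
    return float(np.sum(1.0 / (1.0 + (distances / d0) ** 2)) / length)
```

A score of 1 means every target residue is matched exactly; values close to 1, such as those reported above, indicate near-identical folds.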

To validate the predicted mutant structures, all available experimental mutant TTR structures were retrieved from the RCSB Protein Data Bank. For each mutation, the AlphaFold3-predicted model was aligned to its corresponding experimental structure, and the root-mean-square deviation (RMSD) was computed. A mean RMSD of 0.2662 Å (standard deviation: 0.0635 Å) was observed for tetramer-only structures, while a slightly higher value of 0.5083 Å was recorded for tetramer-ligand complexes. These values support the structural validity of the predicted models (Table 1).
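The RMSD after optimal rigid-body superposition can be computed with the Kabsch algorithm. The minimal sketch below assumes the two structures have already been matched atom by atom (same atoms, same order, e.g. Cα atoms); PyMOL's align command additionally performs a sequence alignment and iterative outlier rejection, so values it reports may differ from this plain calculation. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD after optimal rigid superposition of two matched (n, 3) coordinate sets."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (n, 3) arrays of matching shape")
    # Remove translation by centring both sets on their centroids.
    p_centred = p - p.mean(axis=0)
    q_centred = q - q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_centred.T @ q_centred)
    # Flip the last axis if needed so the result is a proper rotation (no reflection).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = p_centred @ rotation.T - q_centred
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```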

Table 1 Validation of AlphaFold3-predicted mutant structures against experimental references

AlphaFold3-predicted models offer additional benefits: (i) they are complete, without missing residues; (ii) they include the signal peptide (residues 1–20); and (iii) they account for asymmetric interactions between monomers, whereas experimental tetramers often assume idealised global symmetries (e.g., D2 symmetry).

Pairwise structural alignments were performed for all variants, producing TM-score matrices for both monomeric and tetrameric forms. Hierarchical clustering using average linkage39 revealed that pathogenic variants tended to cluster together, suggesting shared conformational perturbations, whereas benign mutations formed structurally distinct groups (Fig. 4).
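TM-score is normalised by one of the two chains, so a pairwise matrix is generally not exactly symmetric. A minimal sketch of average-linkage clustering on such a matrix follows, assuming it is symmetrised by averaging and converted to a distance as 1 − TM; both choices are assumptions and not necessarily those used in this study, and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_by_tm_score(tm: np.ndarray, n_clusters: int) -> np.ndarray:
    """Average-linkage clustering of variants from a square TM-score matrix."""
    tm = np.asarray(tm, dtype=float)
    # Symmetrise (TM-score depends on the normalising chain) and convert to a distance.
    distance = np.clip(1.0 - (tm + tm.T) / 2.0, 0.0, None)
    np.fill_diagonal(distance, 0.0)
    # SciPy's linkage expects a condensed (upper-triangular) distance vector.
    tree = linkage(squareform(distance, checks=False), method="average")
    # Cut the dendrogram into at most n_clusters flat clusters (labels start at 1).
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Cutting the tree at n_clusters=3 would correspond to the three main groups annotated in Fig. 4.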

Fig. 4: Hierarchical clustering of TTR tetrameric variants based on TM-score similarity.

Clinical classifications are colour-coded: red (Pathogenic), orange (Likely pathogenic), green (Benign), and blue (Unknown). Three main clusters are annotated (purple, black, and sea-green).

The TM-score distributions for monomeric and tetrameric forms are shown in Fig. 5. Tetrameric structures exhibited narrow distributions centred around 0.96, indicating strong structural conservation. By contrast, monomeric forms showed broader variability, suggesting conformational flexibility that may underlie aggregation propensity.

Fig. 5: Density plot of TM-score distributions for monomeric and tetrameric TTR variants.

Tetrameric structures display high structural conservation; monomeric forms show greater variability, potentially reflecting aggregation-prone states.

Latent space analysis using ESM2 and UMAP

To explore sequence-level determinants of pathogenicity, the ESM2 transformer-based protein language model40 was used to compute 1280-dimensional embeddings for each TTR variant. These embeddings were projected into two dimensions using Uniform Manifold Approximation and Projection (UMAP)41, yielding an interpretable latent space representation (Fig. 6).
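How per-residue embeddings were reduced to one vector per variant is not specified here. A common choice, assumed in the minimal sketch below, is to average the final-layer token representations over the sequence, excluding the beginning- and end-of-sequence tokens that ESM models add (with the fair-esm package, these representations come from model(tokens, repr_layers=[33])["representations"][33] for the 650M-parameter ESM2 model). To keep the sketch dependency-free, PCA via NumPy's SVD stands in for UMAP; with the umap-learn package the projection step would instead be umap.UMAP(n_components=2).fit_transform(X). Function names are hypothetical.

```python
import numpy as np


def mean_pool(token_representations: np.ndarray, sequence_length: int) -> np.ndarray:
    """Average per-residue vectors, skipping the BOS token (index 0) and EOS/padding.

    Assumes the ESM token layout [BOS, residue_1, ..., residue_L, EOS, padding...].
    """
    reps = np.asarray(token_representations, dtype=float)
    if reps.shape[0] < sequence_length + 2:
        raise ValueError("fewer tokens than sequence_length + 2")
    return reps[1:sequence_length + 1].mean(axis=0)


def project_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_variants, d) embeddings onto their first two principal components.

    PCA is a stand-in for UMAP here; it preserves global variance, not local neighbourhoods.
    """
    x = np.asarray(embeddings, dtype=float)
    centred = x - x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T
```

UMAP and PCA can give quite different layouts; the swap only keeps the example self-contained.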

Fig. 6: Two-dimensional UMAP projection of ESM2 embeddings for TTR variants.

Each point represents a single variant, coloured by clinical annotation from UniProt (accession ID: P02766). Clusters indicate shared functional or structural properties.

Clinical annotations were obtained from the UniProt Proteins API, with additional classification into amyloidogenic or non-amyloidogenic categories using the Mutations-TTR database42. Variants of unknown significance were often located near benign or non-amyloidogenic mutations, suggesting possible reclassification.

Subclusters were observed around residues such as L55 and V28, whose mutations (e.g., L55P, L55R, V28M) are known to cause destabilisation. Importantly, several “Unknown” variants, including S132I, M33I, and R124C, were located close to benign or non-amyloidogenic variants in embedding space, suggesting that they may have similar functional profiles. For instance, R124H and R124C nearly overlapped in UMAP space; because R124H is known to be benign and protective43, R124C may similarly lack pathogenic effects. Conversely, S132I, still annotated as “Unknown”, clustered with amyloidogenic variants, consistent with reports of increased aggregation propensity44. F64L, classified as “likely pathogenic”, was also part of this cluster, consistent with its association with late-onset neuropathy and mild cardiac symptoms45.

Taken together, the clustering patterns suggest that UMAP projections of ESM2 embeddings can reveal structural and functional similarity among variants, including those currently lacking definitive clinical annotation.

To quantify classification performance, the UMAP-based approach was compared with three state-of-the-art methods: AlphaMissense46, E-SNPs&GO47, and VESM++48. As shown in Table 2, it outperformed all baselines, achieving a ROC AUC of 0.9948, compared with 0.9219 for VESM++ and substantially lower values for AlphaMissense and E-SNPs&GO. These results support the ability of pretrained protein language models, combined with dimensionality reduction, to capture biologically meaningful determinants of variant pathogenicity.
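The ROC AUC equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, with ties counting one half. How the UMAP coordinates were converted into a per-variant score is not described here, so the minimal sketch below assumes a generic score (higher meaning more likely pathogenic) and uses a hypothetical function name. It compares all pathogenic-benign pairs, which is affordable for a few hundred variants.

```python
import numpy as np


def roc_auc(scores, labels) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)  # True = pathogenic
    positives, negatives = s[y], s[~y]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("need at least one pathogenic and one benign variant")
    diff = positives[:, None] - negatives[None, :]
    # A win counts 1, a tie counts 0.5, averaged over every positive/negative pair.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())
```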

Table 2 Comparison of ROC AUC scores for classification of pathogenic vs. benign TTR variants
Table 3 Molecular properties and drug-likeness metrics of ligand compounds

Network-based modelling of protein structural perturbations

Protein Contact Networks (PCNs)49 were constructed for both wild-type and mutant TTR tetramers to investigate how single-point mutations influence residue-level centrality. For each variant, the change in centrality of every residue i relative to the wild type was calculated as:

$$\Delta C_i = C_{i,\mathrm{wt}} - C_{i,\mathrm{mut}}. \tag{1}$$
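A minimal sketch of Eq. (1) for closeness centrality follows. It assumes one Cα coordinate per residue, an unweighted contact whenever two residues lie between 4 and 8 Å apart (a range often used for protein contact networks, taken here as an assumption rather than the exact PCN-Miner settings), and the closeness normalisation for possibly disconnected graphs that NetworkX applies by default. Because the variants are point mutants, wild-type and mutant residues correspond one to one. Function names are hypothetical.

```python
from collections import deque

import numpy as np


def contact_network(ca_coords, min_dist=4.0, max_dist=8.0):
    """Boolean adjacency: residues i, j in contact if min_dist <= d_ij <= max_dist (Å)."""
    x = np.asarray(ca_coords, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adjacency = (d >= min_dist) & (d <= max_dist)
    np.fill_diagonal(adjacency, False)
    return adjacency


def closeness_centrality(adjacency):
    """Unweighted closeness via breadth-first search, scaled by the reachable fraction."""
    n = adjacency.shape[0]
    neighbours = [np.flatnonzero(row) for row in adjacency]
    closeness = np.zeros(n)
    for source in range(n):
        hops = np.full(n, -1)
        hops[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if hops[v] < 0:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        reached = hops > 0
        total = hops[reached].sum()
        if total > 0:
            r = reached.sum()
            # (reachable / total distance) * (reachable / (n - 1))
            closeness[source] = (r / total) * (r / (n - 1))
    return closeness


def delta_closeness(ca_wt, ca_mut, **contact_kwargs):
    """Eq. (1): C_wt - C_mut per residue (positive = centrality lost in the mutant)."""
    c_wt = closeness_centrality(contact_network(ca_wt, **contact_kwargs))
    c_mut = closeness_centrality(contact_network(ca_mut, **contact_kwargs))
    return c_wt - c_mut
```

With this sign convention, residues whose centrality drops in the mutant appear as positive ΔC_i.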

Figure 7 illustrates residue-wise differences in closeness centrality across representative mutations. For visual clarity, only one chain is shown, as all four chains exhibited comparable profiles. Only mutations for which at least five residues showed non-zero centrality changes are included. Complete results are available in the public repository.

Fig. 7: Differences in closeness centrality between wild-type and mutant TTR tetramers.

Only one chain is displayed for clarity. Mutations shown correspond to those with at least five residues exhibiting altered centrality values.

To further characterise the structural organisation of TTR, PCN-Miner was used to identify residue communities, i.e. densely connected subgraphs that may correspond to cooperative structural or functional domains. The Leiden algorithm partitioned the PCN into seven and ten discrete modules in the monomeric and tetrameric wild-type structures, respectively (Fig. 8). These communities provide a mesoscale representation of the protein, potentially linking mutation hotspots to collective structural dynamics.
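Leiden searches for a partition that maximises a quality function. Taking modularity as that function (the exact objective and resolution used by PCN-Miner are not stated here, so this is an assumption), a partition is scored as Q = (1/2m) Σ_ij [A_ij − k_i k_j/(2m)] δ(c_i, c_j). The minimal sketch below only evaluates Q for a given partition; the search itself would typically be delegated to a library such as leidenalg (for example leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition) on an igraph graph). The function name is hypothetical.

```python
import numpy as np


def modularity(adjacency, labels) -> float:
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = a.sum(axis=1)
    two_m = degrees.sum()  # equals twice the number of edges
    if two_m == 0:
        raise ValueError("graph has no edges")
    same_community = labels[:, None] == labels[None, :]
    # Observed minus expected edge weight, summed within communities only.
    expected = np.outer(degrees, degrees) / two_m
    return float(((a - expected) * same_community).sum() / two_m)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, whereas placing every node in one community gives Q = 0.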

Fig. 8: Analysis of Residue Communities in TTR.

Residue communities detected in monomeric (left) and tetrameric (right) wild-type TTR structures using the Leiden algorithm. Communities represent densely interconnected regions within the protein contact network.

Variant-aware molecular docking and ligand optimisation in transthyretin

Molecular docking is a computational approach used to predict the binding orientation of a small molecule (ligand) to a target macromolecule, typically a protein. By estimating binding affinities and interaction modes, docking simulations offer valuable insights into the molecular basis of ligand-receptor recognition. In the context of disease-related protein variants, docking makes it possible to evaluate how specific mutations may alter ligand binding profiles, thereby informing drug efficacy, resistance mechanisms, and opportunities for drug repurposing or personalisation.

This work uses two docking tools, DiffDock-L50,51 and AutoDock Vina52, to simulate interactions between approved or AI-generated ligands and TTR variants. DiffDock-L was used to predict 20 binding poses per variant-ligand pair, from which the pose with the highest confidence score was selected. AutoDock Vina was used to estimate binding affinities for each selected pose.

Docking with approved ligands

Tafamidis and acoramidis are FDA-approved transthyretin stabilisers currently used in the clinical management of systemic amyloidosis14. Tafamidis53 is a non-NSAID (non-steroidal anti-inflammatory drug) benzoxazole compound that binds to the TTR tetramer, preventing its dissociation into monomers and subsequent amyloid formation. Approved in 201954, it is particularly effective in treating V50M-associated hereditary transthyretin amyloidosis.

Acoramidis55, marketed as Attruby, gained FDA approval in 2024 for transthyretin-mediated cardiomyopathy. Designed to emulate the stabilising T139M mutation56, Acoramidis achieves over 90% tetramer stabilisation throughout dosing intervals.

Figure 9 presents the highest DiffDock-L confidence scores for TTR variants docked with tafamidis and acoramidis. Two variants, A39D and G73R, exhibited consistently low confidence scores for both ligands, suggesting either a shared mutation-induced disruption of the binding site or a limitation of the docking algorithm.

Fig. 9: Confidence scores of docking between mutants and drugs.

Highest-confidence docking poses predicted by DiffDock-L for TTR tetramer variants with acoramidis (top) and tafamidis (bottom). A39D and G73R consistently show low confidence scores.

AutoDock Vina simulations further revealed mutation-specific differences in binding affinity. For each ligand-mutant pair, the lowest Vina score from ten poses was selected to represent optimal binding.
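The two selection rules described above (highest DiffDock-L confidence among the generated poses, lowest AutoDock Vina score among the Vina poses) can be sketched as follows. Parsing of DiffDock-L and Vina output files is omitted; the Pose record and its fields are hypothetical stand-ins for whatever those files are parsed into.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Pose:
    variant: str       # e.g. "V50M"
    ligand: str        # e.g. "tafamidis"
    pose_id: int
    confidence: float  # DiffDock-L confidence: higher means a more trusted pose
    vina_score: float  # AutoDock Vina score (kcal/mol): lower means stronger binding


Key = Tuple[str, str]  # (variant, ligand)


def top_confidence_poses(poses: Iterable[Pose]) -> Dict[Key, Pose]:
    """Keep, for each (variant, ligand) pair, the pose with the highest confidence."""
    best: Dict[Key, Pose] = {}
    for pose in poses:
        key = (pose.variant, pose.ligand)
        if key not in best or pose.confidence > best[key].confidence:
            best[key] = pose
    return best


def lowest_vina_poses(poses: Iterable[Pose]) -> Dict[Key, Pose]:
    """Keep, for each (variant, ligand) pair, the pose with the lowest Vina score."""
    best: Dict[Key, Pose] = {}
    for pose in poses:
        key = (pose.variant, pose.ligand)
        if key not in best or pose.vina_score < best[key].vina_score:
            best[key] = pose
    return best
```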

To ensure robustness, predicted poses were aligned with experimentally resolved ligand conformations using the PyMOL Align plugin, and the alignments with the lowest RMSD were retained for further analysis.

Relative binding affinities were calculated as shown in Fig. 10: for tafamidis, relative to its binding to the V50M variant; for acoramidis, relative to tafamidis binding to the same variant. Tafamidis showed diminished binding in several mutants, consistent with its optimisation for V50M. In contrast, acoramidis showed improved binding relative to tafamidis across all reported mutations.
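Both comparisons reduce to differences of Vina scores. Because more negative Vina scores mean stronger predicted binding, a positive difference corresponds to weaker binding. The sketch below uses a hypothetical tolerance of 0.5 kcal/mol for the "approximately no change" band (the threshold behind the white cells in Fig. 10 is not stated); function names are hypothetical.

```python
from typing import Dict

TOLERANCE = 0.5  # kcal/mol; hypothetical width of the "no change" band


def relative_to_variant(scores: Dict[str, float], reference: str = "V50M") -> Dict[str, float]:
    """Score of each variant minus the score of a reference variant (same ligand)."""
    ref = scores[reference]
    return {variant: score - ref for variant, score in scores.items()}


def relative_to_ligand(scores: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Per-variant score difference between two ligands (e.g. acoramidis minus tafamidis)."""
    shared = sorted(scores.keys() & baseline.keys())
    return {variant: scores[variant] - baseline[variant] for variant in shared}


def classify(delta: float, tolerance: float = TOLERANCE) -> str:
    """Map a score difference to the colour categories used in the heatmaps."""
    if delta > tolerance:
        return "decreased affinity (red)"
    if delta < -tolerance:
        return "increased affinity (green)"
    return "no change (white)"
```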

Fig. 10: Binding Affinity Analysis.

Binding affinity differences between TTR mutants and a tafamidis and b acoramidis. White cells depict mutations with approximately no change in binding affinity, while red and green cells depict mutations with decreased and increased binding affinity, respectively.

Ligand optimisation and de novo design

To improve binding to destabilising variants, DiffSBDD57 was used to optimise tafamidis. Among the ten variants with the weakest predicted binding, Y98F was randomly selected for ligand redesign.

In the TTR tetramer, Y98F lies distal to the tafamidis binding pocket (green). Despite this spatial separation, the mutation markedly reduced predicted binding affinity, highlighting long-range structural effects58. Optimisation via DiffSBDD improved binding to A129T and modestly benefited several other variants. These results, reported in Fig. 11, suggest both the need for and the potential of mutation-specific ligand design.

Fig. 11: Ligand optimisation against the Y98F mutation using DiffSBDD.

White cells depict mutations with approximately no change in binding affinity, while red and green cells depict mutations with decreased and increased binding affinity, respectively.

De novo ligand generation was performed using the Y98F tetramer structure as reference, with residues 15, 17, 54, 106, 108, 109, 110, 117 and 119 defining the binding pocket (ref. 59). Binding affinities of the generated compound were then evaluated across mutants.
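How the pocket residues were passed to DiffSBDD is not described here. One common way to turn a residue list into a search region, shown as a minimal sketch, is an axis-aligned box around the residues' atoms plus padding; a centre and edge lengths of this kind are, for instance, what the AutoDock Vina Python bindings accept in Vina.compute_vina_maps(center=..., box_size=...). The input format (residue number mapped to an array of atom coordinates, pooling atoms from every chain lining the thyroxine channel) and the 5 Å padding are assumptions.

```python
from typing import Dict, Iterable, Tuple

import numpy as np

# Pocket residues listed in the text (mature numbering).
POCKET_RESIDUES = (15, 17, 54, 106, 108, 109, 110, 117, 119)


def pocket_box(residue_atoms: Dict[int, np.ndarray],
               residues: Iterable[int] = POCKET_RESIDUES,
               padding: float = 5.0) -> Tuple[np.ndarray, np.ndarray]:
    """Return (centre, edge lengths) of a padded axis-aligned box around the residues (Å)."""
    points = np.vstack([np.asarray(residue_atoms[r], dtype=float).reshape(-1, 3)
                        for r in residues])
    lower, upper = points.min(axis=0), points.max(axis=0)
    centre = (lower + upper) / 2.0
    size = (upper - lower) + 2.0 * padding
    return centre, size
```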

As shown in Fig. 12, the generated ligand showed strong predicted affinities relative to acoramidis, although docking outcomes varied across mutations.

Fig. 12: Evaluation of de novo generated ligand targeting TTR mutants.

White cells depict mutations with approximately no change in binding affinity, while red and green cells depict mutations with decreased and increased binding affinity, respectively.

A moderate, statistically significant negative correlation (r = −0.4631, p = 1.9850 × 10⁻⁸) was found between DiffDock confidence values and AutoDock Vina scores for acoramidis docking. The negative sign reflects how the two scores are defined: lower (more negative) Vina scores represent stronger predicted binding affinities, whereas higher DiffDock confidence values represent greater certainty in the predicted binding poses. This negative correlation therefore indicates agreement between the two scoring methods. No correlation was found for tafamidis.
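The type of correlation coefficient is not stated. Assuming Pearson's r, with Spearman's ρ as a rank-based check, the calculation is a short SciPy call. The inputs below are made-up illustrative values that only demonstrate the sign convention: scores that agree produce a negative coefficient.

```python
from typing import Sequence, Tuple

from scipy.stats import pearsonr, spearmanr


def confidence_vina_correlation(confidence: Sequence[float],
                                vina: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (Pearson r, p-value, Spearman rho, p-value) for paired pose scores."""
    r, p_r = pearsonr(confidence, vina)
    rho, p_rho = spearmanr(confidence, vina)
    return float(r), float(p_r), float(rho), float(p_rho)


# Illustrative values only: higher confidence paired with more negative Vina scores.
r, p_r, rho, p_rho = confidence_vina_correlation(
    [0.1, 0.4, 0.7, 0.9, 1.2], [-5.0, -6.1, -7.0, -8.2, -9.0])
```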

An agreement assessment was also performed to determine whether DiffDock and AutoDock Vina identify the same optimal existing ligand for each mutation. The two methods agreed in 96% of cases, and acoramidis was almost always predicted by both as the optimal ligand.
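The agreement rate can be read as the fraction of variants for which the ligand with the highest DiffDock confidence is also the ligand with the lowest Vina score. A minimal sketch, taking hypothetical nested dictionaries (variant to ligand to score) as input:

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # variant -> ligand -> score


def optimal_ligand_agreement(confidence: Scores, vina: Scores) -> float:
    """Fraction of shared variants where both methods pick the same optimal ligand."""
    variants = sorted(confidence.keys() & vina.keys())
    if not variants:
        raise ValueError("no variants scored by both methods")
    matches = 0
    for variant in variants:
        # Highest confidence wins for DiffDock; lowest (most negative) score wins for Vina.
        best_by_confidence = max(confidence[variant], key=confidence[variant].get)
        best_by_vina = min(vina[variant], key=vina[variant].get)
        matches += best_by_confidence == best_by_vina
    return matches / len(variants)
```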

Figure 13 depicts the 2D structures of all the ligands studied in this work, from existing ones (tafamidis, acoramidis, tolcapone and diflunisal) to the ones generated and optimised with DiffSBDD (Y98F_T4 and tafamidis_optimised_V142I).

Fig. 13: Two dimensional structure of ligands.

2D structure of existing ligands (tafamidis, acoramidis, diflunisal, tolcapone), generated ligands (Y98F_T4) and optimised ones (tafamidis_optimised_V142I).

For all structures, a drug-likeness assessment was performed to validate the generated ligands, as reported in Table 3. Optimisation of tafamidis increased its QED value while maintaining compliance with major drug-likeness rules. The optimised tafamidis also shows enhanced membrane permeability, with increased blood-brain barrier permeability (BB Perm) and a reduced topological polar surface area (TPSA). Tolcapone, on the other hand, is a classic example of the drug-likeness paradox: although it fails multiple drug-likeness criteria (lowest QED score, poor permeability, high TPSA), it remains therapeutically valuable and effective. The ligand generated by DiffSBDD for the T4 binding site of Y98F-TTR has the best theoretical drug-likeness properties among the studied ligands: it satisfies Lipinski’s Rule of Five and has the highest QED score (0.91), with favourable lipophilicity (LogP) and good membrane permeability (BB Perm).
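Lipinski's Rule of Five flags molecules with a molecular weight above 500 Da, LogP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; a common reading allows at most one violation. The minimal sketch below applies that check to precomputed descriptors. In practice these (and QED) would come from a cheminformatics toolkit such as RDKit (Descriptors.MolWt, Crippen.MolLogP, Lipinski.NumHDonors, Lipinski.NumHAcceptors, QED.qed). The class name and the example values used to exercise it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandProperties:
    molecular_weight: float  # Da
    logp: float              # octanol-water partition coefficient (log scale)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(props: LigandProperties) -> int:
    """Count how many of Lipinski's four thresholds are exceeded."""
    return sum([
        props.molecular_weight > 500,
        props.logp > 5,
        props.h_bond_donors > 5,
        props.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(props: LigandProperties, max_violations: int = 1) -> bool:
    """Lenient reading by default (at most one violation); use max_violations=0 for strict."""
    return rule_of_five_violations(props) <= max_violations
```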

Assessing mutation-induced destabilisation and ligand-mediated rescue

The V50M TTR tetramer displayed increased flexibility in regions corresponding to the signal peptide of each monomer and to the thyroxine (T4) binding site (Fig. 14). These local fluctuations coincide with a loss of tetrameric symmetry and may underlie an enhanced propensity for aggregation (Fig. 15). Notably, all tested ligands restored local rigidity in the V50M variant, reducing root mean square fluctuations (RMSF) to levels comparable to the wild-type protein, consistent with their established stabilising effects on the TTR tetramer60. Root mean square deviation (RMSD) analysis and structural frame comparisons further indicated that ligand-bound V50M variants preserved compact conformations throughout the simulations, whereas unliganded V50M exhibited more pronounced conformational drift (Fig. 16).
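Both quantities are simple averages over a trajectory. The minimal sketch below assumes frames that have already been superposed on a common reference and matched atoms such as Cα; the simulation engine and analysis tools are not specified in this section, and the function names are hypothetical.

```python
import numpy as np


def rmsd_over_time(trajectory: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Per-frame RMSD to a reference; trajectory (frames, atoms, 3), reference (atoms, 3)."""
    traj = np.asarray(trajectory, dtype=float)
    ref = np.asarray(reference, dtype=float)
    squared = np.sum((traj - ref[None, :, :]) ** 2, axis=2)  # (frames, atoms)
    return np.sqrt(squared.mean(axis=1))


def rmsf(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom fluctuation around the time-averaged position."""
    traj = np.asarray(trajectory, dtype=float)
    mean_structure = traj.mean(axis=0)
    squared = np.sum((traj - mean_structure[None, :, :]) ** 2, axis=2)  # (frames, atoms)
    return np.sqrt(squared.mean(axis=0))
```

Comparing apo and ligand-bound runs, as in Fig. 14, then amounts to comparing two such per-residue RMSF profiles.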

Fig. 14: Impact of TTR point mutations and ligand binding.

RMSF comparison across wild-type and mutant TTR variants. V50M shows increased flexibility in chains C and D.

Fig. 15: Impact of TTR point mutations and ligand binding.

RMSD of V50M in apo form and bound to tafamidis or acoramidis, showing rescue of wild-type-like flexibility.

Fig. 16: Impact of TTR point mutations and ligand binding.

Representative frames from the V50M trajectory illustrating progressive structural changes over time.

Discussion

Transthyretin amyloidosis (ATTR) is a genetically heterogeneous disorder characterised by the accumulation of amyloid fibrils derived from destabilised transthyretin (TTR) protein variants. Despite the clinical availability of stabilising agents such as tafamidis and acoramidis, the results of this paper indicate that their predicted efficacy is not uniform across all known single-point TTR mutations. This finding has important implications for both therapeutic intervention and the fundamental understanding of genotype-phenotype relationships in amyloid diseases.

By combining AlphaFold3-based structure prediction, transformer-based protein embeddings (ESM2), diffusion-based molecular docking (DiffDock-L), and classical affinity prediction (AutoDock Vina), the structural and functional consequences of TTR mutations were analysed. The comprehensive docking analysis revealed that, while acoramidis generally shows higher average binding affinity than tafamidis, several variants, including W61L and Y98F, exhibit marked resistance to tafamidis. This observation reinforces the notion that the effect of a drug cannot be universally extrapolated across all genotypes, even when the ligand binds to the same pocket in the same protein scaffold.

This work indicates that mutations exert their influence not only through direct disruption of ligand-contact residues but also via long-range structural perturbations. For example, the Y98F and W61L mutations, although distant from the tafamidis binding site, markedly reduce predicted binding affinity, likely by altering the conformational ensemble of the protein. This highlights the necessity of considering the full structural context of a mutation, rather than merely its spatial proximity to the binding site. Docking results, further corroborated by pose alignment and RMSD analysis, suggest that traditional assumptions regarding locality in protein-ligand interactions may fall short in cases where dynamic allostery and subtle conformational shifts are involved.

Among the pathogenic mutations showing poor predicted binding affinity for tafamidis, V142I, the most clinically prevalent, was selected for tafamidis optimisation with DiffSBDD. The use of generative models such as DiffSBDD enabled the design of mutation-specific ligands, with the V142I-optimised analogue of tafamidis showing improved binding not only to the target mutation but also to a broader subset of destabilising variants. However, this optimisation was not universally beneficial, as some mutations displayed reduced binding affinity to the redesigned ligand. These findings suggest that, while generative ligand design holds promise for customising therapies, it may require careful constraints or multi-objective optimisation to avoid adverse trade-offs across the mutational landscape.

Taken together, the results strongly advocate for a synergistic and variant-resolved strategy in the development of TTR-directed therapeutics. Relying on a single stabiliser, even one optimised for a high-prevalence mutation like V50M, may be insufficient in the context of diverse genotypic presentations. A shift toward precision medicine in ATTR is thus both biologically warranted and technologically feasible. Tools such as ESM2 embeddings and UMAP projections, which were used here to identify benign-like behaviour in uncharacterised mutations, offer a scalable pipeline for preclinical triage and drug-response prediction.

Furthermore, this work opens the possibility for rational repurposing of ligands based on structural similarity in latent embedding space. For example, mutations clustering near known benign variants may benefit from similar therapeutic profiles, while outliers in embedding or docking space could be prioritized for custom ligand design or combinatorial therapy.

Future work should aim to extend these findings beyond computational frameworks. In vitro validation of predicted affinities and conformational shifts will be critical, particularly for variants of unknown significance (VUS). Longitudinal studies tracking clinical outcomes across genotypes treated with the same stabiliser could provide real-world evidence supporting the in silico predictions. Additionally, the integration of patient-specific omics data may further refine the variant-ligand interaction landscape, ultimately enabling personalised treatment plans.

In conclusion, this study demonstrates the necessity and promise of integrating structural modelling, AI-driven prediction, and generative design in addressing the unmet need for precision therapeutics in TTR amyloidosis. By acknowledging the structural and functional heterogeneity induced by point mutations, this paper suggests a scenario in which each patient’s genotype informs the most effective and targeted treatment strategy.

This study explored the hypothesis that an integrative, structure-based framework—comprising conformational modelling, molecular docking, and generative design—could help prioritize stabilising ligands specifically tailored to various transthyretin (TTR) variants. The results support this hypothesis across multiple areas.

It has been demonstrated that pathogenic TTR mutations lead to variant-specific conformational changes at the binding site. These structural alterations result in measurable differences in binding affinities for known stabilisers, including tafamidis and acoramidis. These observations are consistent with the idea that the varying response to ligands is due to mutation-induced structural heterogeneity, a central premise of this framework.

Building on these insights, a generative design approach was applied to propose novel ligands optimised for binding to representative TTR variants. The top-scoring candidates, ranked by docking-based affinity predictions, outperformed existing stabilisers in several variant contexts. These findings reinforce the concept that data-driven molecular design can exploit subtle structural variations to develop genotype-specific stabilisers with improved binding profiles.

Importantly, the approach presented here is readily extensible. Although this initial study was limited to two reference stabilisers and a set of structurally related analogues, the generative pipeline is compatible with other chemical libraries and alternative TTR conformations. This flexibility allows for broader applications in mutation-guided drug design.

From a clinical standpoint, this article underscores the potential of precision pharmacology in managing TTR amyloidosis. Instead of pursuing a universal stabiliser, the results support the development of personalised ligands tailored to a patient’s genetic background. This shift aligns with broader trends in genotype-informed therapy, especially for disorders related to protein misfolding.

However, the study does have limitations. All results are based on in silico predictions; experimental validation of binding and stabilising effects is a vital next step. Additionally, while docking scores provide a useful approximation of affinity, further evaluation of pharmacokinetic and toxicity properties will be necessary for lead optimisation.

In summary, this work provides evidence that mutation-aware ligand generation—grounded in structural modelling and generative chemistry—can inform the design of next-generation TTR stabilisers and serve as a framework for addressing similar amyloid-related diseases.

This approach could be readily applied to other protein misfolding disorders, such as amyotrophic lateral sclerosis (ALS), where mutations in SOD1 disrupt metal binding and promote aggregation. By modelling SOD1 variants and optimising ligands that stabilise the native structure, the framework could support the development of targeted therapeutic strategies. Similarly, in Alzheimer’s disease, tau protein, despite its intrinsic disorder, contains aggregation-prone regions that can be studied using structural embeddings and docking simulations to identify isoform- or modification-specific stabilisers. In systemic light chain (AL) amyloidosis, where immunoglobulin light chains exhibit extensive sequence variability, the framework could be used to classify and predict aggregation-prone variants, thereby supporting the design of therapeutic inhibitors. Moreover, in Parkinson’s disease and related synucleinopathies, latent space analysis and docking to emergent pockets of α-synuclein variants could inform the development of conformation-specific binders or modulators. Overall, the methodology supports a precision medicine paradigm in which ligand design and therapeutic strategies are tailored to the specific structural and mutational profile of each protein variant. Future work will aim to generalise this platform to additional disease systems, incorporating experimental structures (e.g., cryo-EM), mutagenesis data, and proteostasis models to enhance its clinical and translational utility.

Methods

Overview of the computational workflow

An overview of the computational pipeline is shown in Fig. 17. Beginning with the wild-type (WT) transthyretin (TTR) sequence and a curated set of single-point mutations from the literature, three-dimensional structural models were generated using AlphaFold3. These models were subsequently analysed using a suite of computational techniques, including molecular dynamics (MD) simulations, network-based metrics, and protein language model embeddings. To assess mutation-specific effects on ligand interaction, both classical and AI-based molecular docking approaches were applied. Generative models were further used to explore mutation-aware ligand design.

Fig. 17: Overview of the computational pipeline.

a Mutant structures are generated using AlphaFold3, and the impact of point mutations on protein stability is predicted using ΔΔG predictors. b The structural effects of these mutations are analysed alongside existing ligands: structural changes are quantified with TM-align, residue centrality and functional communities are investigated with PCN-Miner, embeddings are visualised and pathogenicity is predicted with ESM2, and the stabilising or destabilising effects of both mutations and ligands are assessed using molecular dynamics simulations. c Molecular docking is performed on the TTR binding sites of the mutant structures using DiffDock and AutoDock Vina to evaluate binding affinities and identify optimal ligands for each mutation. d DiffSBDD is used to generate new ligands, or optimise existing ones, for binding to specific mutants.

Structure prediction

All single-point variants were modelled using AlphaFold361, producing structures in both monomeric and tetrameric forms. The predicted wild-type structure was validated against crystallographic references for consistency. Structural alignments were performed using TM-align62, and structural similarity was quantified using the TM-score:

$${\text{TM-score}}=\max \left(\frac{1}{{L}_{T}}\mathop{\sum }\limits_{i=1}^{{L}_{A}}\frac{1}{1+{\left(\frac{{d}_{i}}{{d}_{0}}\right)}^{2}}\right),$$
(2)

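where \({L}_{A}\) is the number of aligned residue pairs and \({L}_{T}\) is the length of the target protein; \({d}_{i}\) is the distance between the \(i\)th pair of aligned residues after superposition; and \({d}_{0}=1.24\sqrt[3]{{L}_{T}-15}-1.8\) is a length-dependent distance scale (in Å). The maximum is taken over all possible superpositions. TM-scores above 0.5 generally indicate that two structures share the same fold.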
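As a minimal illustration of Eq. (2), the Python sketch below evaluates the bracketed sum for a single, fixed residue alignment and superposition, given the per-pair distances \({d}_{i}\). It does not perform the alignment or the search over superpositions that TM-align carries out, and the function names `d0` and `tm_score_for_superposition` are hypothetical.

```python
from collections.abc import Sequence


def d0(l_target: int) -> float:
    """Distance scale d0 (Å) from Eq. (2).

    For L_T < 19 the formula does not give a positive real number, so reject it.
    """
    if l_target < 19:
        raise ValueError("d0 formula is not a positive scale for L_T < 19")
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score_for_superposition(distances: Sequence[float], l_target: int) -> float:
    """Bracketed term of Eq. (2) for ONE fixed alignment and superposition.

    `distances` holds d_i (Å) for the L_A aligned residue pairs. TM-align
    additionally maximises this value over superpositions; that search is
    not performed here.
    """
    if len(distances) > l_target:
        raise ValueError("cannot align more residues than the target contains")
    scale = d0(l_target)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_target
```

For a 127-residue TTR monomer as the target, \({d}_{0}\) is roughly 4.18 Å, so an aligned pair 4.18 Å apart contributes half as much to the score as a perfectly superposed pair.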

To validate the AlphaFold3 models, each predicted structure was superposed onto the corresponding experimentally resolved mutant structure, where one was available, and the root mean square deviation (RMSD) between them was calculated. Validation was carried out on both unbound tetramers and ligand-bound complexes.
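RMSD is the square root of the mean squared distance between matched atoms. The sketch below assumes that the two structures have already been superposed (for example by TM-align) and that atoms are matched one-to-one, such as the Cα atoms of equivalent residues; it does not compute the optimal superposition itself, and the function name `rmsd` is hypothetical.

```python
import math
from collections.abc import Sequence

Coord = tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (Å) between matched atoms of two ALREADY superposed structures."""
    if len(model) != len(reference) or len(model) == 0:
        raise ValueError("need two equal-length, non-empty lists of matched atoms")
    # Sum of squared Euclidean distances over matched atom pairs.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))
```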

Latent space analysis using ESM2 and UMAP

Variant sequences were embedded using the ESM2 transformer model40. Each sequence was mapped to a 1280-dimensional vector and projected into two dimensions using Uniform Manifold Approximation and Projection (UMAP)41. Euclidean distances from the WT embedding were used to classify pathogenicity: variants within a predefined threshold radius were labelled benign, and those beyond it pathogenic. Because of the strong imbalance between pathogenic (n = 96) and benign (n = 2) annotations, classification performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC). The ground truth was obtained through the UniProt Variation API, with ‘likely pathogenic’ and ‘likely benign’ labels consolidated into the pathogenic and benign classes, respectively. Variants of uncertain significance were classified using the ROC-optimised threshold.
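The distance-based classification scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: embeddings are plain vectors of any length (the same function accepts 1280-dimensional ESM2 vectors or 2-D UMAP coordinates), ROC-AUC is computed as the probability that a pathogenic variant lies farther from WT than a benign one, and the "ROC-optimised" radius is assumed to be the one maximising Youden's J statistic (TPR - FPR), a common criterion that the text does not specify. The function names are hypothetical.

```python
import math
from collections.abc import Sequence


def distances_from_wt(wt: Sequence[float], variants: Sequence[Sequence[float]]) -> list[float]:
    """Euclidean distance of each variant embedding from the WT embedding."""
    return [math.dist(wt, v) for v in variants]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as P(pathogenic score > benign score); ties count one half.

    labels: 1 = pathogenic, 0 = benign. Larger distance = more pathogenic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_radius(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Radius maximising Youden's J = TPR - FPR (assumed criterion).

    A variant is called pathogenic when its distance is >= the radius.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one pathogenic and one benign variant")
    best_radius, best_j = math.inf, -math.inf
    for r in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= r)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= r)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_radius, best_j = r, j
    return best_radius
```

With only two benign annotations, each benign variant moves the false-positive rate by 0.5, so both the AUC and the selected radius are highly sensitive to those two points.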

Benchmarking against state-of-the-art predictors

Performance was benchmarked against three established predictors: AlphaMissense46,63, E-SNPs&GO47, and VESM++48,64. AlphaMissense adapts AlphaFold2 to generate pathogenicity scores in the [0, 1] range, which are mapped to three classes (likely benign, ambiguous, and likely pathogenic). E-SNPs&GO employs embeddings from ESM-1v65 and ProtTrans T566, reduced via PCA and classified using a support vector machine. VESM++ is a co-distilled ensemble of ESM-1b, ESM2-650M, and ESM3 that outputs log-likelihood ratios, which are transformed into pathogenicity scores using a sigmoid function. For a fair comparison, all models used the same classification threshold.
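To make the shared-threshold comparison concrete, the sketch below maps predictor outputs onto a common [0, 1] pathogenicity scale. The AlphaMissense class cut-offs (0.34 and 0.564) are those published with AlphaMissense. The orientation of the LLR-to-score sigmoid (a more negative LLR, meaning the variant residue is less likely than the wild-type one, gives a higher score) and the shared threshold of 0.5 are assumptions for illustration, not values taken from this study; all names are hypothetical.

```python
import math

# Class cut-offs published with AlphaMissense.
AM_LIKELY_BENIGN_BELOW = 0.34
AM_LIKELY_PATHOGENIC_ABOVE = 0.564

# Hypothetical common cut-off applied to every predictor's [0, 1] score.
SHARED_THRESHOLD = 0.5


def alphamissense_class(score: float) -> str:
    """Map an AlphaMissense score in [0, 1] to its three-class label."""
    if score < AM_LIKELY_BENIGN_BELOW:
        return "likely benign"
    if score > AM_LIKELY_PATHOGENIC_ABOVE:
        return "likely pathogenic"
    return "ambiguous"


def llr_to_score(llr: float) -> float:
    """Sigmoid of -LLR, so more negative LLRs map closer to 1 (assumed orientation).

    Written in a numerically stable form to avoid overflow in math.exp.
    """
    if llr >= 0:
        e = math.exp(-llr)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(llr))


def is_pathogenic(score: float, threshold: float = SHARED_THRESHOLD) -> bool:
    """Binary call applied identically to every predictor's score."""
    return score >= threshold
```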

Prediction of mutation-induced structural changes

Mutation-induced effects on protein stability were predicted using a consensus of sequence- and structure-based tools: mCSM67, SDM68, DUET69, DynaMut270, DDGun71, and SAAFEC72. These tools include energy-based statistical models, machine learning frameworks, and purely sequence-driven approaches. Predicted changes in Gibbs free energy (ΔΔG) were combined into a consensus matrix.
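One simple way to summarise such a consensus matrix is sketched below: rows are variants, columns are predictors, and each row is reduced to its mean ΔΔG and the fraction of predictors that call the mutation destabilising. Using the mean is an assumption here. Because the listed tools do not all report ΔΔG with the same sign convention, the sketch takes a hypothetical `flip` set of predictor names whose values are negated so that negative always means destabilising; which tools belong in that set should be checked against each tool's documentation.

```python
from collections.abc import Mapping
from statistics import mean

# {variant: {predictor: ddg_kcal_per_mol}}
DDGMatrix = Mapping[str, Mapping[str, float]]


def consensus(matrix: DDGMatrix, flip: frozenset[str] = frozenset()) -> dict[str, dict[str, float]]:
    """Summarise each variant's row of the ΔΔG consensus matrix.

    Predictors named in `flip` (hypothetical; fill in from each tool's docs)
    have their sign inverted so that negative always means destabilising.
    """
    summary: dict[str, dict[str, float]] = {}
    for variant, row in matrix.items():
        values = [-ddg if tool in flip else ddg for tool, ddg in row.items()]
        summary[variant] = {
            "mean_ddg": mean(values),
            "frac_destabilising": sum(1 for v in values if v < 0) / len(values),
        }
    return summary
```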

Network analysis of protein contact maps

For each TTR structure, a Protein Contact Network (PCN)73 was built in which each residue is represented as a node and an edge connects two residues whose distance lies within the 4–8 Å range74,75. The Euclidean distance between residues i and j, with coordinates \(({x}_{i},{y}_{i},{z}_{i})\) and \(({x}_{j},{y}_{j},{z}_{j})\), was defined as:

$${d}_{ij}=\sqrt{{({x}_{i}-{x}_{j})}^{2}+{({y}_{i}-{y}_{j})}^{2}+{({z}_{i}-{z}_{j})}^{2}}.$$
(3)
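The contact-network construction can be sketched with the Python standard library alone. The sketch assumes one coordinate triple per residue (for instance the Cα atom, which the text does not specify) and treats both ends of the 4–8 Å range as inclusive; the network is stored as a dictionary mapping each residue index to the set of its neighbours, and the function name `build_pcn` is hypothetical. If Cα atoms are used, the 4 Å lower bound excludes most sequence-adjacent pairs, whose Cα atoms sit about 3.8 Å apart.

```python
import math
from collections.abc import Sequence
from itertools import combinations

Coord = tuple[float, float, float]


def build_pcn(coords: Sequence[Coord], d_min: float = 4.0, d_max: float = 8.0) -> dict[int, set[int]]:
    """Protein contact network as adjacency sets.

    Node i is residue i (0-based index into `coords`); an edge joins i and j
    when their Euclidean distance, Eq. (3), lies in [d_min, d_max] Å.
    """
    adjacency: dict[int, set[int]] = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        d_ij = math.dist(coords[i], coords[j])
        if d_min <= d_ij <= d_max:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency
```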

Three centrality measures were computed: degree (the number of edges incident to a node), closeness, and betweenness. Closeness was defined as:

$${C}_{{\rm{closeness}}}(i)=\frac{n-1}{{\sum }_{j\ne i}d(i,j)},$$
(4)

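where n is the number of nodes and d(i, j) is the shortest-path distance between nodes i and j in the network (not the Euclidean distance of Eq. (3)); betweenness was defined as: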

$${C}_{{\rm{betweenness}}}(i)=\sum _{j\ne k\ne i}\frac{{\sigma }_{jk}(i)}{{\sigma }_{jk}},$$
(5)

where \({\sigma }_{jk}\) is the number of shortest paths between nodes j and k, and \({\sigma }_{jk}(i)\) is the number of those paths that pass through node i.
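Eqs. (4) and (5) can be evaluated directly from breadth-first searches on the unweighted contact network, as in the sketch below, which uses an adjacency-set representation (each residue index mapped to the set of its neighbours). Following Eq. (5) as written, betweenness sums over ordered pairs (j, k), so on an undirected network each unordered pair contributes twice; closeness assumes a connected network. The function names are hypothetical, and the triple loop favours clarity over speed: for networks of TTR-tetramer size, an optimised implementation such as NetworkX's `closeness_centrality` and `betweenness_centrality` would normally be preferred, bearing in mind that their default normalisation may differ from Eqs. (4) and (5).

```python
from collections import deque
from collections.abc import Mapping

Graph = Mapping[int, set[int]]


def bfs(adjacency: Graph, source: int) -> tuple[dict[int, int], dict[int, int]]:
    """Hop distances and shortest-path counts (sigma) from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def closeness(adjacency: Graph) -> dict[int, float]:
    """Eq. (4): (n - 1) / sum of shortest-path distances; needs a connected network."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("closeness needs at least two nodes")
    result = {}
    for i in adjacency:
        dist, _ = bfs(adjacency, i)
        if len(dist) < n:
            raise ValueError("Eq. (4) as written assumes a connected network")
        result[i] = (n - 1) / sum(dist.values())
    return result


def betweenness(adjacency: Graph) -> dict[int, float]:
    """Eq. (5) summed over ordered pairs (j, k), with i, j and k distinct."""
    paths = {s: bfs(adjacency, s) for s in adjacency}
    result = {i: 0.0 for i in adjacency}
    for j, (dist_j, sigma_j) in paths.items():
        for k in dist_j:  # only nodes reachable from j
            if k == j:
                continue
            for i in dist_j:
                if i in (j, k):
                    continue
                dist_i, sigma_i = paths[i]
                # i lies on a shortest j-k path iff d(j, i) + d(i, k) == d(j, k)
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    result[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return result
```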