ANARCII enables alignment-free antigen receptor numbering using a generalised language model

Greenshields-Watson, Alexander; Agarwal, Parth; Robinson, Sarah A.; Williams, Benjamin Heathcote; Gordon, Gemma L.; Capel, Henriette L.; Li, Yushi; Spoendlin, Fabian C.; Aguilar-Sanjuan, Broncio; Boyles, Fergus; Deane, Charlotte M.

doi:10.1038/s42003-026-10186-z

Article
Open access
Published: 21 May 2026

Communications Biology (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Antigen receptor numbering allows delineation of the antigen-binding regions of antibodies and T cell receptors from sequence alone. Numbering is currently achieved by aligning to a reference set. This approach may produce different numbering depending on the reference set used, or fail on sequences from rare species or formats. We present a method (ANARCII) which requires no alignment step and is based on a Seq2Seq language model. ANARCII improves upon existing methods through more consistent numbering of key regions, robustness to truncations, generalisation to unseen species, and easier user installation. The lightweight architecture allows numbering of 90,000 sequences per minute on a high-end GPU. The software is available via a web app (https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarcii/) and a package (https://github.com/oxpig/ANARCII). Ultimately, ANARCII allows numbering of more antibody-like sequences, with better recovery of full-length regions from existing databases, and enables comparative analysis of new receptors not numbered by existing tools.
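
To put the quoted throughput in context, the following is a minimal back-of-envelope sketch, not part of the ANARCII package. It assumes the 90,000 sequences per minute figure from the abstract holds as a constant rate, which in practice will depend on hardware, sequence length and batch size. The function name and constant are hypothetical and introduced here for illustration only.

```python
# Rough run-time estimate from the throughput quoted in the abstract.
# Assumption: the rate is constant; real throughput depends on hardware and input.
SEQUENCES_PER_MINUTE = 90_000  # figure quoted for a high-end GPU


def estimated_minutes(n_sequences: int,
                      rate_per_minute: float = SEQUENCES_PER_MINUTE) -> float:
    """Return the estimated minutes needed to number n_sequences."""
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return n_sequences / rate_per_minute


if __name__ == "__main__":
    # Example dataset sizes (illustrative, not taken from the paper).
    for n in (90_000, 1_000_000, 9_000_000):
        print(f"{n:>10,} sequences: about {estimated_minutes(n):.1f} min")
```

Under this assumption the quoted rate corresponds to 1,500 sequences per second, so a set of nine million sequences would take on the order of 100 minutes.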

Acknowledgements

The authors would like to thank Oliver Turnbull, Carlos Outeiral and David Prihoda for their helpful suggestions and feedback. We would also like to thank Chris Thorpe, Benjamin McMaster, Bruce MacLachlan and Nele Quast for their helpful discussions on numbering of MHC/HLA (currently under development and available as a development branch on GitHub). The work was supported through research funding by Exscientia awarded to A.G.W., and doctoral programme funding from the UK Engineering and Physical Sciences Research Council (EPSRC) awarded to S.A.R., G.L.G., H.L.C. and F.C.S. (EP/S024093/1). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

These authors contributed equally: Alexander Greenshields-Watson, Parth Agarwal, Sarah A. Robinson.

Authors and Affiliations

Department of Statistics, Oxford Protein Informatics Group, University of Oxford, Oxford, UK
Alexander Greenshields-Watson, Parth Agarwal, Sarah A. Robinson, Benjamin Heathcote Williams, Gemma L. Gordon, Henriette L. Capel, Yushi Li, Fabian C. Spoendlin, Broncio Aguilar-Sanjuan, Fergus Boyles & Charlotte M. Deane

Corresponding author

Correspondence to Charlotte M. Deane.

Ethics declarations

Competing interests

C.D. discloses membership of the Scientific Advisory Board of Fusion Antibodies and AI proteins, and is a founder of Dalton. All other authors declare no conflict of interest.

AI disclosure

Generative AI tools, including ChatGPT and GitHub Copilot, were utilised to assist in code generation and error checking during the development of this project.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary data (XLSX)

Reporting Summary (PDF)

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Greenshields-Watson, A., Agarwal, P., Robinson, S.A. et al. ANARCII enables alignment-free antigen receptor numbering using a generalised language model. Commun Biol (2026). https://doi.org/10.1038/s42003-026-10186-z

Received: 12 September 2025
Accepted: 23 April 2026
Published: 21 May 2026
DOI: https://doi.org/10.1038/s42003-026-10186-z