Communications Biology
ANARCII enables alignment-free antigen receptor numbering using a generalised language model
  • Article
  • Open access
  • Published: 21 May 2026

  • Alexander Greenshields-Watson1 na1,
  • Parth Agarwal1 na1,
  • Sarah A. Robinson1 na1,
  • Benjamin Heathcote Williams1 (ORCID: orcid.org/0000-0001-9544-0890),
  • Gemma L. Gordon1 (ORCID: orcid.org/0000-0002-8259-9111),
  • Henriette L. Capel1 (ORCID: orcid.org/0000-0002-3757-5313),
  • Yushi Li1,
  • Fabian C. Spoendlin1 (ORCID: orcid.org/0000-0002-3006-6217),
  • Broncio Aguilar-Sanjuan1 (ORCID: orcid.org/0000-0001-8068-6417),
  • Fergus Boyles1 &
  • Charlotte M. Deane1 (ORCID: orcid.org/0000-0003-1388-2252)

Communications Biology (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Data processing
  • Sequence annotation
  • Somatic hypermutation
  • VDJ recombination

Abstract

Antigen receptor numbering allows the antigen-binding regions of antibodies and T cell receptors to be delineated from sequence alone. Numbering is currently achieved by aligning to a reference set. This approach may produce different numbering depending on the reference set used, or fail on sequences from rare species or formats. We present a method (ANARCII) which requires no alignment step and is based on a Seq2Seq language model. ANARCII improves upon existing methods through more consistent numbering of key regions, robustness to truncations, generalisation to unseen species, and easier user installation. The lightweight architecture allows numbering of 90,000 sequences per minute on a high-end GPU. The software is available as a web app (https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabpred/anarcii/) and as a package (https://github.com/oxpig/ANARCII). Ultimately, ANARCII allows numbering of more antibody-like sequences, with better recovery of full-length regions from existing databases, and enables comparative analysis of new receptors not numbered by existing tools.
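
To illustrate how numbering delineates antigen-binding regions, the following is a minimal sketch and not ANARCII's own code or API. It assumes the IMGT scheme (the abstract does not name a scheme) and a hypothetical output format in which each residue is paired with a (position, insertion code) label. The names `IMGT_REGIONS`, `split_regions` and `toy` are illustrative only.

```python
from typing import Dict, List, Tuple

# IMGT variable-domain region boundaries (inclusive positions).
# Assumption: IMGT is used here purely as an example scheme.
IMGT_REGIONS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 38),
    ("FR2", 39, 55),
    ("CDR2", 56, 65),
    ("FR3", 66, 104),
    ("CDR3", 105, 117),
    ("FR4", 118, 128),
]

# Hypothetical numbered-sequence format: ((position, insertion_code), residue).
Numbered = List[Tuple[Tuple[int, str], str]]


def split_regions(numbered: Numbered) -> Dict[str, str]:
    """Group the residues of a numbered sequence into framework and CDR regions."""
    regions = {name: "" for name, _, _ in IMGT_REGIONS}
    for (position, _insertion), residue in numbered:
        if residue == "-":  # gap: position exists in the scheme but is unoccupied
            continue
        for name, start, end in IMGT_REGIONS:
            if start <= position <= end:
                regions[name] += residue
                break
    return regions


# Toy fragment around CDR3 (not a real receptor sequence).
toy: Numbered = [
    ((104, " "), "C"),
    ((105, " "), "A"),
    ((106, " "), "R"),
    ((107, " "), "D"),
    ((108, " "), "-"),
    ((116, " "), "D"),
    ((117, " "), "Y"),
    ((118, " "), "W"),
]

regions = split_regions(toy)
print(regions["CDR3"])
```

Once every residue carries a position label, extracting a region reduces to a range lookup, which is why consistent numbering of key regions matters for downstream comparison.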

Acknowledgements

The authors would like to thank Oliver Turnbull, Carlos Outeiral and David Prihoda for their helpful suggestions and feedback. We would also like to thank Chris Thorpe, Benjamin McMaster, Bruce MacLachlan and Nele Quast for their helpful discussions on numbering of MHC/HLA (currently under development – available as a development branch on GitHub). The work was supported through research funding by Exscientia awarded to A.G.W., and Doctoral programme funding from the UK Engineering and Physical Sciences Research Council (EPSRC) awarded to S.A.R., G.L.G., H.L.C. and F.C.S. (EP/S024093/1). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Author notes
  1. These authors contributed equally: Alexander Greenshields-Watson, Parth Agarwal, Sarah A. Robinson.

Authors and Affiliations

  1. Department of Statistics, Oxford Protein Informatics Group, University of Oxford, Oxford, UK

    Alexander Greenshields-Watson, Parth Agarwal, Sarah A. Robinson, Benjamin Heathcote Williams, Gemma L. Gordon, Henriette L. Capel, Yushi Li, Fabian C. Spoendlin, Broncio Aguilar-Sanjuan, Fergus Boyles & Charlotte M. Deane

Corresponding author

Correspondence to Charlotte M. Deane.

Ethics declarations

Competing interests

C.D. discloses membership of the Scientific Advisory Board of Fusion Antibodies and AI proteins, as well as being a founder of Dalton. All other authors declare no conflict of interest.

AI disclosure

Generative AI tools, including ChatGPT and GitHub Copilot, were utilised to assist in code generation and error checking during the development of this project.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary data (XLSX)

Reporting Summary (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Greenshields-Watson, A., Agarwal, P., Robinson, S.A. et al. ANARCII enables alignment-free antigen receptor numbering using a generalised language model. Commun Biol (2026). https://doi.org/10.1038/s42003-026-10186-z

  • Received: 12 September 2025

  • Accepted: 23 April 2026

  • Published: 21 May 2026

  • DOI: https://doi.org/10.1038/s42003-026-10186-z
