Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells

Gallo, Giulia; Di Nardo, Antonello; Lugano, Doreen; Roberts, Adam J.; Kutima, Bernadette Ataku; Okombo, Moses; Dewantari, Aghnianditya Kresno; Buckley, Florence M. M.; Wright, Gavin J.; Nyagwange, James; Agwanda, Bernard; Graham, Stephen C.; Bailey, Dalan

doi:10.1038/s41586-026-10394-x

Download PDF

Article
Open access
Published: 22 April 2026

Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells

Nature volume 653, pages 180–189 (2026)Cite this article

29k Accesses
1 Citations
447 Altmetric
Metrics details

Subjects

Abstract

Identifying viruses with zoonotic potential on the basis of their ability to enter human cells is a critical component of pandemic prediction, prevention and preparedness. Here using a computational approach that retains maximum phylogenetic diversity, we selected an optimal subset of alphacoronavirus spike proteins to screen against broad coronavirus receptor libraries. Most of the selected spike proteins did not use any of the established coronavirus receptors. However, the pseudotyped spike protein of Cardioderma cor (heart-nosed bat) coronavirus KY43 (CcCoV-KY43) could enter human cells. Using a recombinant CcCoV receptor-binding domain (RBD) and a human receptor screening platform, we identified direct interactions with the human CEACAM proteins CEACAM3, CEACAM5 and CEACAM6. Overexpression of human CEACAM6—a protein widely expressed in the human lung—conferred permissivity to otherwise refractory human cells. A crystal structure showed that the RBD binds the amino-terminal IgV-like domain of human CEACAM6. Immune surveillance studies using sera of individuals from the Taveta region of Kenya, where CcCoV-KY43 was identified, did not show significant evidence of recent spillover. Wider characterization of alphacoronaviruses related to CcCoV-KY43 showed that human CEACAM6 is used by two other CcCoVs collected in Kenya. Moreover, there was more restricted nonhuman CEACAM6 tropism for viruses isolated from Rhinolophus bats from Russia and China. Thus, alphacoronaviruses that use CEACAM6 are probably geographically widespread, and viruses from East Africa show potential for transmission to humans.

Bat coronaviruses related to SARS-CoV-2 and infectious for human cells

Article 16 February 2022

High host specificity of alphacoronaviruses in Nearctic, insectivorous bats

Article Open access 26 April 2025

Synchronized seasonal excretion of multiple coronaviruses coincides with high rates of coinfection in immature bats

Article Open access 17 July 2025

Main

Following the COVID-19 pandemic, there has been a renewed focus on the characterization and prediction of viruses with zoonotic potential, particularly among the coronaviruses. Between 60% and 75% of human pathogens are thought to have a zoonotic origin¹. However, acquiring comprehensive information on the zoonotic potential of all viruses is technically challenging because of their extensive diversity. Detailed data are usually available for only a few species of human viruses, or their close relatives, which belies the fact that the next pandemic is likely to be a previously undescribed virus, as SARS-CoV-2 was to virologists in 2019.

The first barrier for any cross-species viral jump is cell entry, a process that relies on the binding of viral attachment proteins to cellular receptors. To attempt wider characterization and identification of viruses with zoonotic potential, we focused on alphacoronavirus entry as a model, owing to the lack of detailed genus-wide information on the host range of these predominantly bat-borne viruses. So far, only two cellular receptors have been characterized for alphacoronaviruses. Human, porcine, canine and feline aminopeptidase N (APN) receptors provide entry to group I viruses, which include human coronavirus 229E (HCoV-229E)^2,3, transmissible gastroenteritis virus and the related porcine respiratory coronavirus (PRCV)⁴, as well as some canine and feline coronaviruses⁵. Human coronavirus NL63 (HCoV-NL63) is the only known alphacoronavirus to use ACE2 as a receptor⁶. Apart from ACE2 (used by SARS-CoV-1 and SARS-CoV-2), human betacoronaviruses use the receptors DPP4 (used by MERS-CoV)⁷, O-acetylated sialic acids (used by HKU1 (ref. ⁸) and OC43 (ref. ⁹)) and TMPRSS2 (used by HKU1)¹⁰. Recently, porcine and human DPEP1 were both shown to be efficient receptors for porcine haemagglutinating encephalomyelitis virus, although there is no evidence for zoonotic infection¹¹. Entry of the betacoronavirus mouse hepatitis virus (MHV) is facilitated by mouse CEACAM1 (ref. ¹²); although in this case, the human orthologue is not efficiently used¹³. How receptor usage partitions across the entire alphacoronavirus genus and whether there are other human tropic viruses and/or other permissive receptors are unclear.

APN- and ACE2-dependent entry is rare for alphacoronaviruses

For all coronaviruses, the spike protein is the major determinant of viral entry, and cleavage by cellular proteases produces two subunits: S1 and S2. S1 is involved in binding to the cellular receptor, whereas S2 is involved in fusion of the viral and cellular membranes. To select alphacoronavirus spike proteins that accurately represent the known diversity, we used a greedy algorithm¹⁴ on all full-length spike protein sequences (n = 2,714) retrieved from the Virus Pathogen Database and Analysis Resource database¹⁵ and deposited into GenBank (as of May 2021). Our goal was to scale down the number of spike proteins tested without significantly losing the richness and heterogeneity of diversity in this genus. The final number chosen (n = 40) was based on our capacity for pseudotyping and receptor screening (Fig. 1a and Supplementary Table 1). Among these sequences, which included representatives from taxonomically classified subgenera such as Colacovirus, Pedacovirus, Minacovirus, Tegacovirus, Nyctacovirus and Luchacovirus, the overall level of amino acid conservation was low, especially in the predicted RBD (Supplementary Fig. 1). The selected 40-sequence spike protein panel (around 1.5% of the full dataset) captured 53.4% of the total phylogenetic diversity, which substantially exceeded size-matched random panels (13.7 ± 3.2%, 10,000 permutations, around 3.9-fold enrichment, empirical P < 0.0001). Most selected spike proteins were from poorly characterized alphacoronaviruses isolated from bats (27 out of 40); however, the two endemic human viruses HCoV-229E and HCoV-NL63 were selected. To confirm that the commercially synthesized, cloned, codon-optimized spike protein open-reading frames (ORFs) could efficiently produce spike proteins that are pseudotyped onto HIV-1-based lentiviruses, pseudoviruses were purified and spike protein incorporation was confirmed by immunoblotting (Extended Data Fig. 1a). Concurrent to the establishment and validation of this alphacoronavirus spike protein pseudotype library, we developed equivalent plasmid-based APN and ACE2 expression libraries that represented 25 and 34 mammalian species, respectively (Supplementary Tables 2 and 3). Expression of these tagged receptors was verified by flow cytometry (Supplementary Fig. 2). First, we assessed use of known human receptors (APN, ACE2, DPP4, TMPRSS2 and DPEP1), which identified only HCoV-229E and HCoV-NL63 as human-tropic (Fig. 1b). The algorithm-selected HCoV-229E spike protein was not functional; therefore, we replaced it with the reference sequence for the virus (Supplementary Fig. 3a). Next, we assayed whether the pseudotyped spike protein from our alphacoronavirus library could use any APN or ACE2 from our receptor libraries (Fig. 1c (selected data) and Supplementary Figs. 4a and 5a (all data)). Canine coronaviruses (CCoVs) could enter cells that express APN, whereas despite a feline coronavirus being included in the library, it did not pseudotype. Moreover, two different porcine epidemic diarrhoea virus (PEDV) spike proteins did not use porcine APN in our experimental settings, which could be explained by the dynamics of PEDV spike protein internalization in non-susceptible cells¹⁶. However, another porcine alphacoronavirus, PRCV, was able to use the same receptor (Supplementary Fig. 3b). Where possible, we included the species for which individual viruses were identified or the closest relative available (Supplementary Figs. 4b and 5b). This selection included genetically divergent Chiroptera species to account for the reservoir biology of coronaviruses and domesticated and peridomestic animals that might bridge the gap between unknown reservoirs and humans. In general, APN-using viruses seemed to be restricted to lower numbers of host species than the ACE2-using HCoV-NL63. We suggest that this difference is determined by the spike protein–receptor interaction surface area, affinity and physicochemical nature (for example, charge and hydrophobicity), combined with receptor sequence variability across hosts. Notably, two bat-origin alphacoronaviruses, BtCoV-WA1087 and BtCoV-AT1A-F41, used APN receptors to enter cells, albeit not human APN (Fig. 1c). To our knowledge, this is the first study to confirm APN usage by alphacoronaviruses isolated from bats. Most of the alphacoronaviruses tested did not use any of the expressed APN or ACE2 receptors to enter cells. Some coronaviruses are only infectious after protease treatment, which facilitates transition of the RBD from a closed to open conformation and/or exposure of the fusion peptide at the S2′ site. To address this possibility, we repeated experiments in the presence or absence of human serine protease TMPRSS2 or trypsin (Extended Data Fig. 2a (TMPRSS2) and 2b (trypsin)). Co-expression or pretreatment did not affect host range, although there was some evidence of digestion of spike proteins by trypsin (Extended Data Fig. 1b). To examine whether closed RBD conformations were leading to false negatives in our screens, we purified a selection of RBDs and performed flow cytometry to assess receptor binding at the cell surface (Extended Data Fig. 3a,b). The observed correlation between entry and receptor binding indicated that an absence of pseudotype entry reflects a genuine inability to bind APN and ACE2. Together, these results suggest that use of the known receptors ACE2 and APN may be a relatively rare phenomenon in the alphacoronavirus genus.

**Fig. 1: Construction of the alphacoronavirus spike protein library and screening of the receptor libraries.**

**Fig. 3: CcCoV-KY43 RBD interacts with the IgV domain of CEACAM6.**

Human CEACAM6 supports entry of CcCoV-KY43

Recent high-throughput analyses of a diverse range of pseudotyped viruses (screened against the NCI-60 panel from the National Institutes of Health) concluded that coronavirus entry is a key determinant of host range¹⁷. As most alphacoronavirus spike proteins did not use human APN, ACE2, DPP4, TMPRSS2 or DPEP1 for entry, the next step in our zoonotic risk assessment was to examine pseudotype infection of a wider range of human cell lines. Across our screen (Fig. 2a), the only examples of infection we detected were with pseudotyped HCoV-NL63 and HCoV-229E, a finding consistent with our receptor screening results and their known human tropism (Fig. 1b), and with BtCoV-KY43 (Fig. 2a). The spike protein sequence of BtCoV-KY43 (hereafter referred to as CcCoV-KY43) was originally isolated from C. cor bats in Kenya in 2006 (ref. ¹⁸), with a partial genome sequence published in 2012 (GenBank accession: HQ728480)¹⁹. We speculated that successful infection of Calu3 and Caco2 cells by CcCoV-KY43 spike protein pseudotypes is facilitated by an unknown human receptor. To identify this receptor, we used an avidity-based method designed to systematically identify direct extracellular interactions with a library of 759 human receptor ectodomains²⁰, using the CcCoV-KY43 spike protein RBD (CcCoV-KY43 RBD) as prey (Fig. 2b). We identified three interactions, all of which were paralogues of the human carcinoembryonic antigen cell adhesion molecule (CEACAM) protein: CEACAM3, CEACAM5 and CEACAM6 (Fig. 2c and Supplementary Fig. 6). These cell surface glycoproteins contain immunoglobulin-like domains, with both CEACAM5 and CEACAM6 having a glycosylphosphatidylinositol anchor. Subsequent overexpression of a panel of full-length human CEACAM proteins in HEK293 cells, which are normally refractory to CcCoV-KY43 spike protein pseudotype infection, demonstrated that only human CEACAM6 expression resulted in a significant increase in permissivity (Fig. 2d). A split luciferase-based cell–cell fusion reporter assay subsequently confirmed this receptor usage profile in a separate cellular context (Extended Data Fig. 4). ELISA experiments with recombinant CcCoV-KY43 RBD and the extracellular domains of human CEACAM3, CEACAM5 and CEACAM6 enabled us to attribute this entry specificity to a higher affinity for CEACAM6 (Fig. 2e). Thermodynamic analyses using isothermal titration calorimetry (ITC) confirmed a 1:1 interaction between the CcCoV-KY43 RBD and CEACAM6, with a dissociation constant (K_d) of 271 ± 68 nM (mean ± s.d., n = 3), whereas the RBD did not measurably bind CEACAM5 (Fig. 2f and Supplementary Table 4). Dose-dependent neutralization of CcCoV-KY43 spike protein pseudotypes was demonstrated using soluble, recombinant CEACAM6 (Fig. 2g), and cell entry was blocked when using monoclonal antibodies against human CEACAM6 (Fig. 2h and Extended Data Fig. 5a). To further confirm the role of human CEACAM6 in CcCoV-KY43 entry, we used CEACAM6-specific short interfering RNAs (siRNAs) to knockdown endogenous protein expression in the susceptible cell lines Caco2 and Calu3. These experiments revealed reduced virus entry for CcCoV-KY43 (Fig. 2i, left) but not HCoV-229E inf1 (Supplementary Fig. 7). The same results were obtained in Calu3 and Caco2 cells treated with short hairpin RNA (shRNA) that were selected to stably downregulate the expression of human CEACAM6 (Fig. 2i, right). To assess whether human CEACAM6 could support entry of other greedy-selected alphacoronaviruses, we re-screened our library but did not find other examples of entry (Extended Data Fig. 6).

In humans, pathogenic coronaviruses often have tropism for the respiratory and/or gastrointestinal tract. Analyses of datasets from The Human Cell Atlas²¹, ranked by the number of cells expressing CEACAM6 in individual tissues, identified the lung, colon and bronchus as the three tissues with the most numerous CEACAM6-expressing cells (Supplementary Fig. 8). Stratification of single-cell transcriptomic data for the lung²² by cell type identified pulmonary alveolar type 1, goblet (bronchial, tracheobronchial and nasal) and epithelial cells of the lower respiratory tract as having the most prevalent CEACAM6 expression (Fig. 2j and Supplementary Fig. 9a). These cell types are frequent targets for respiratory virus infections. The only precedents for CEACAM6 being used by pathogens for infection is as an adhesion factor for Gram-negative bacteria^23,24, for example by Escherichia coli in patients with Crohn’s disease who have aberrant CEACAM6 expression in their ileal epithelium²⁵, and by the pathogenic yeast Candida albicans²⁶. Of note, expression of CEACAM6 in the human lung is both higher and more widespread than any of the established human coronavirus proteinaceous receptors APN, ACE2 or DPP4 (Fig. 2j and Supplementary Fig. 9b). Together, these data demonstrate that CcCoV-KY43 spike protein uses human CEACAM6 as a receptor to infect cells, and this receptor is expressed in the human lung.

KY43 RBD binds CEACAM6 N-terminal immunoglobulin domain

We used crystallography to solve the structure of a complex comprising the CcCoV-KY43 RBD and the three immunoglobulin-like domains that form the ectodomain of CEACAM6 to define the molecular architecture of their interaction (Fig. 3a). The structure was refined (R/R_free = 0.256/0.317) from crystals with anisotropic diffraction (3.0 and 5.1 Å in the best and worst direction, respectively) that contained one heterodimer of the CcCoV-KY43 RBD plus CEACAM6 per asymmetric unit (Supplementary Table 5 and Supplementary Fig. 10a). The RBD binds the tip of the N-terminal V-set immunoglobulin-like domain of CEACAM6, with receptor binding loops 1–3 of the RBD engaging the surface formed by CEACAM6 strands G-F-C-C′-C″ plus the F-G and C-C′ loops (Fig. 3b and Supplementary Table 6). This region is the same surface that mediates both homodimerization of CEACAM6 and its heterodimerization with CEACAM8 (ref. ²⁷). Moreover, the N-terminal domain of CEACAM6 is highly similar between the complexes with CcCoV-KY43 RBD and CEACAM6 and CEACAM8, with 0.51–0.54 Å root-mean-squared deviation across 99–103 Cα atoms (Supplementary Fig. 11). The interaction buries 1,300 Å² of the surface area, less extensive than CEACAM6 homodimerization or heterodimerization (1,600 Å²) but more than the interaction between the HCoV-229E RBD and human APN (1,000 Å²)²⁸ and comparable with that of other coronavirus–receptor interactions (Supplementary Fig. 12). The interaction is dominated by the hydrophobic interaction between CcCoV-KY43 RBD loop 3, especially residue W600, and the hydrophobic surface formed by strands C, C′ and F of CEACAM6, centred on residue L129 (Fig. 3c and Supplementary Table 6).

Elucidation of the structure of the binding interface enabled us to dissect the genetic determinants of CEACAM6 receptor specificity. Loops 1–3 in the RBDs from alphacoronavirus species included in our library differed markedly from CcCoV-KY43 in both their length (Supplementary Fig. 13a) and amino acid composition (Supplementary Fig. 13b). This result is consistent with the observation that only CcCoV-KY43 uses CEACAM6 for entry (Extended Data Fig. 6). Although human CEACAM5 was identified as an entry receptor in our sensitive avidity-based receptor discovery experiment, the ELISA and ITC experiments showed that it has reduced affinity for the CcCoV-KY43 RBD (Fig. 2e,f). Moreover, CEACAM5 overexpression was not sufficient for pseudotype entry (Fig. 2d). Alignment of the surface contact interface identified two important amino acid substitutions, I63F and Q123H, in CEACAM5 relative to CEACAM6 (Fig. 3d). A F63I, but not H123Q, substitution in CEACAM5 partially overcame the entry restriction of CcCoV-KY43 pseudotypes, with the corresponding substitutions in CEACAM6—I63F, Q123H or both—reducing pseudotype entry (Fig. 3d and Extended Data Fig. 7a). Note that the mutated receptors were expressed at comparable levels to the wild type for these experiments (Extended Data Fig. 7b). Detailed structural mechanics of coronavirus receptor binding have been extensively characterized, which have highlighted a fascinating pattern of either convergent or continued evolution. The architecture of the CcCoV-KY43 RBD is similar to those of the other two human alphacoronaviruses HCoV-229E and HCoV-NL63, even though their cellular receptors are structurally different (Fig. 3e). Although our data provide evidence that CEACAM6 can be used as a direct proteinaceous receptor by viruses, we note that mouse CEACAM1 is used as a receptor by the betacoronavirus MHV²⁹. MHV receptor binding is mediated through the N-terminal domain of S1, whereas most protein-binding coronavirus RBDs are found in the S1 C-terminal domain, as is the case for CcCoV-KY43. Nevertheless, the target region on the CEACAM receptor remains orthologous: the N-terminal immunoglobulin domain (Supplementary Fig. 14). As the human CEACAM N-terminal immunoglobulin domains are relatively conserved (Supplementary Fig. 15, binding region in teal), we also examined CEACAM1, CEACAM4, CEACAM8 and CEACAM20 in CcCoV-KY43 entry assays (Extended Data Fig. 8 and Supplementary Table 7). None of these CEACAM proteins was able to facilitate entry of CcCoV-KY43 pseudotypes. Moreover, CEACAM1, CEACAM4 and CEACAM8 did not show appreciable binding in biolayer interferometry assays, nor did the I63F mutant of human CEACAM6 (Extended Data Fig. 9). These data highlight another example of the diversity of receptor recognition modalities among coronaviruses, with two coronaviruses from distinct genera evolving to bind the same family of receptors (in the same orthologous domain on those receptors), but through different S1 domains. These results further confirm that CEACAM usage across the rich coronavirus diversity may be underexplored.

CEACAM6-using coronaviruses are globally distributed

CcCoV-KY43 was isolated from a bat collected in a rural setting: an artisanal mine in southeastern Kenya¹⁸ (Fig. 4a). C. cor is naturally distributed across Eastern Africa³⁰. However, more granular population distribution data for Kenya are currently lacking. To provide an interim assessment of sites where bat–human spillover might occur, we summarized our bat sampling data from across Kenya over the past 20 years (Fig. 4a, derived geographical range of C. cor coloured in teal). This analysis revealed that coastal regions in the southeast are human population centres at increased risk. Focusing on this region, we examined human sera from our biobank and assessed for the presence of CcCoV-KY43 RBD-specific antibodies (Fig. 4b). We analysed a total of 368 blood donor samples collected in 2020 and 2021 from mainly male individuals (95%) under 35 years of age (72%) living in the Tana River and Taita-Taveta counties where CcCoV-KY43 was identified (Supplementary Table 8). The samples were examined for CcCoV-KY43, HCoV-229E, HCoV-NL63 and SARS-CoV-2 S-protein-specific responses by ELISA (Fig. 4b, inset, Tana River and Taita-Taveta counties highlighted in pale red). We also included two RBDs from more distantly related alphacoronaviruses (BtCoV-HlYN18 and BtCoV-A701, both isolated from bats in China) as comparators. Across the full dataset, Spearman’s rank correlation analysis revealed weak monotonic associations between CcCoV-KY43 and HCoV-229E, SARS-CoV-2, BtCov-HlYN18 and BtCoV-A701 (ρ = 0.14–0.30), whereas no association was observed with HCoV-NL63 (ρ = 0.06) (Fig. 4b). Although ELISA reactivity against the human alphacoronaviruses HCoV-229E and HCoV-NL63 was commonly observed, increased CcCoV-KY43 ELISA signals were detected only sporadically, with optical density (OD) values exceeding one in nine samples. Using this OD threshold as a descriptive marker of high reactivity rather than evidence of seropositivity, similarly infrequent high OD signals were observed for BtCoV-HlYN18 (n = 5) and BtCoV-A701 (n = 2). Restricting the analysis to samples comprising the top 10% of CcCoV-KY43 ELISA signals, Pearson’s correlation analysis of matched sera revealed moderate correlations with HCoV-NL63 and SARS-CoV-2, but only weak correlation with HCoV-229E, BtCoV-HlYN18 and BtCoV-A701 (Supplementary Fig. 16). Together, these data suggest that widespread CcCoV-KY43 spillover is unlikely in these populations. Alternatively, sporadic CcCoV-KY43 reactivity may reflect cross-reactive antibody responses induced by exposure to antigenically related coronaviruses.

**Fig. 4: Human CEACAM6 is the receptor for other CcCoVs identified in Kenya.**

Recently, two more CcCoV sequences from Meru National Park in central Kenya, about 600 km from the site of the isolation of CcCoV-KY43, were published³¹: CcCoV 2A/Kenya/BAT2621/2015 (CcCoV-2A; GenBank identifier: PP273172.1) and CcCoV 2B/Kenya/BAT2618/2015 (CcCoV-2B; GenBank identifier: PP273173.1) (Fig. 4a). Notably, both viruses are relatively divergent. CcCoV-2B is more related to CcCoV-KY43; however, it shares only 79% and 83% nucleotide and amino acid identity, respectively, in the spike protein (79% and 85%, respectively, in the RBD). The more distantly related isolate, CcCoV-2A, shares only 70% and 77% nucleotide and amino acid identity, respectively, across the spike protein (72% and 76%, respectively, for the RBD) with CcCoV-KY43 (Fig. 4c). Despite this variability, entry assays using CcCoV-2A spike protein pseudotypes showed human CEACAM6-dependent entry (Fig. 4d). In accordance with the CcCoV-KY43 data, human CEACAM5 did not support pseudotype entry. Similarly, the F63I mutant in CEACAM5 led to increased entry to levels seen with human CEACAM6 (Extended Data Fig. 10a), whereas the I63F substitution in CEACAM6 reduced entry (Extended Data Fig. 10b). In parallel, an ELISA with a CcCoV-2B recombinant RBD confirmed binding to human CEACAM6 (Fig. 4e), and ITC showed that it had higher affinity for CEACAM6 than CcCoV-KY43 (70 ± 16 nM; Fig. 4f and Supplementary Table 4). The structure of the CcCoV-2B RBD in complex with human CEACAM6 was solved from crystals with an anisotropic diffraction limit of 3 Å (Supplementary Table 5). The structures showed an identical quaternary organization to the structure of human CEACAM6 in complex with the CcCoV-KY43 RBD (Fig. 4g) and highly similar residues at the interaction interface (Fig. 4h and Supplementary Fig. 10b).

The use of human CEACAM6 by genetically divergent CcCoVs strongly suggest that CEACAM6 usage might be a conserved ancestral trait of a larger group of uncharacterized alphacoronaviruses. To explore the evolutionary history of CEACAM6 receptor usage, we assembled a library of spike proteins from related viruses and reconstructed the ‘local phylogeny’ of CcCoVs on the alphacoronavirus tree (Supplementary Table 9 and Supplementary Fig. 17a). Overall, CcCoVs were phylogenetically placed in a sister clade to alphacoronaviruses isolated from Rhinolophus ferrumequinum bats in South China (Fig. 5a). The CcCoV clade was returned as monophyletic in all sampled posterior trees (n = 9,000, 95% binomial confidence interval (CI) of 0.999–1.000), with its most recent common ancestor (MRCA) estimated to have evolved around 1833 (95% highest posterior density (HPD) of 1794–1884). Time-dependent rate effects in viral molecular evolution suggest that our date estimates for the CEACAM6-adapted lineage are likely to be conservative (that is, biased towards more recent times). Across diverse viruses, substitution rate estimates are systematically higher over short time scales and decay as the measurement interval increases, consistent with a general time-dependent rate phenomenon³². Mechanistic work further implicated purifying selection and multiple-hit saturation as key drivers of this bias, which causes long-term evolutionary change to accumulate more slowly than expected under a simple, constant-rate clock³². For coronaviruses specifically, models that explicitly accommodated variable selection pressure and substitution saturation pushed the inferred origin of coronavirus radiation from around 10⁴ years to an ancient time scale of millions of years, a result in closer agreement with the diversification of their hosts³³. A full investigation of the time-dependent rate phenomenon (for example, using epoch or heterogenous clock models³⁴) is beyond the scope of this study. However, we acknowledge that estimates for the time to MRCA reported here should be interpreted as lower bounds and that the acquisition of CEACAM6 usage may predate our point estimates. To understand the potential host range of these viruses in more detail, entry through additional CEACAM6-like proteins from primates, bats and other mammals was examined (Fig. 5b, Supplementary Table 10 and Supplementary Fig. 17b). C. cor are taxonomically classified in the family Megadermatidae (African false vampire bat) and the superfamily Rhinolophoidea, which also contains the common coronavirus reservoir bat species Hipposideros and Rhinolophus. Typically, nonhuman primates and bats in the Yinpterochiroptera suborder (Megadermatidae, Rhinolophidae, Hipossideridae, Rhinopomatidae, Craseonycteridae and all fruit bats) have four to six CEACAM genes that produce CEACAM6-like proteins³⁵. Notably, this is not the case for Yangochiroptera bats, which have undergone large expansions in their CEACAM gene repertoire³⁶. As there are currently no Megadermatidae CEACAM6 sequences publicly available, we screened a wide diversity of other bat species. Our screens identified that human CEACAM6 tropism is exclusive to the Kenyan CcCoV isolates (Fig. 5b). The CcCoV-KY43 and CcCoV-2A pseudotypes showed broad use of primate CEACAM6 receptors as well as various Yinpterochiroptera bats, including Indian flying fox, Egyptian fruit bat, large flying fox and greater horseshoe bat. Moreover, pseudotyped spike proteins from alphacoronaviruses isolated from R. ferrumequinum bats in South China³⁷ and Southern European Russia³⁸ were able to use CEACAM6 as receptors to enter cells, but only from Egyptian fruit bat and/or greater horseshoe bat (Fig. 5b). ELISA experiments showed that the recombinant RBD from a related virus, BtCoV-977, which did not pseudotype, can bind Egyptian fruit bat, but not human, CEACAM6 (Extended Data Fig. 11). No consistent amino acid substitution pattern in these viral RBDs distinguished CEACAM6-like proteins that permit CcCoV entry from those that do not (Extended Data Fig. 12a). However, viruses that use human CEACAM6 conserve multiple RBD sequence features that are absent from other RBDs, for example, the length and amino acid compositions of loops 1 and 2 (Extended Data Fig. 12a). We introduced point mutations into the BtCoV-LN20 RBD, which uses various CEACAM6-like proteins but not the human orthologue, to better define the genetic determinants of human receptor usage and zoonotic spillover (Extended Data Fig. 12b). Although we could expand the host range phenotype of BtCoV-LN20, individual changes did not confer human CEACAM6 tropism, which indicated that multiple changes are needed to induce this characteristic and that the risk of rapid adaptation of CcCoV-related viruses to zoonotic spillover is low. To investigate the potential acquisition (or loss) of CEACAM6 usage along the full evolutionary history of alphacoronavirus, we further explicitly modelled CEACAM6 usage as a discrete trait in a Markov jump framework to assess whether the data support single versus multiple gains (or losses). The posterior distribution of CEACAM6 transitions supports ≥2 independent gains (median number of gains = 2; 95% HPD of 1−3), whereas losses are rare and are not mapped on the resulting tree generated using the method highest independent posterior subtree reconstruction (median number of gains = 0, 95% HPD of 0−1). Consistent with this result, the posterior probability that the ‘focal MRCA’ used CEACAM6 is 0.97. In the reconstructed history, CEACAM6 usage in Rhinolophus ferrumequinum bat (BtRf) CoV-Kudep arises on a branch leading to the MRCA of a sister clade to the local phylogeny, whereas the MRCA of these two sister clades itself is reconstructed as not using CEACAM6 (Supplementary Fig. 18). This pattern is most parsimoniously explained by two independent acquisitions of CEACAM6 usage: one along the lineage giving rise to the local phylogeny and one along the lineage leading to the Rhinolophus sinicus (BtRs) CoV-YN20 and BtRfCoV-Kudep strains. Collectively, these data provide evidence that CEACAM6 is a receptor for a broad range of geographically divergent alphacoronaviruses found across East Africa, European Russia and China.

**Fig. 5: CEACAM6 is the receptor for CcCoV-related viruses identified in Asia.**

Discussion

As a consequence of the COVID-19 pandemic, there has been a significant increase in research on coronavirus discovery, reservoir characterization and spillover, as well as the development of broad-acting antivirals and therapeutics^39,40,41,42. However, most studies have focused on betacoronaviruses, and the zoonotic and pandemic potential of alphacoronaviruses has remained relatively uncharacterized. Indeed, sequencing efforts to understand the origin of alphacoronaviruses has identified a rich diversity of virus genotypes in reservoir species such as rodents and bats^31,43. Here we used an approach aimed at analysing and capitalizing on viral heterogeneity at the genus-wide level to gain a broad understanding of receptor usage and host tropism. We confirmed the importance of APN and ACE2 for human, livestock and companion animal alphacoronavirus infections. However, the broader trend was that use of these two receptors is poorly conserved across the genus. This was especially true for spike proteins for which ORFs were sequenced after sampling of bats. Indeed, only 2 out of 25 functionally pseudotyped bat coronavirus spike proteins use a recognized alphacoronavirus receptor. One limitation of our study is that only 28 out of the 36 pseudotyped viruses have a single species assigned as a host, and for these 28 species, the corresponding receptor sequences are known in only 50% (APN) and 79% (ACE2) of cases. We directly matched 10 out of 28 and 9 out of 28 of viruses to their cognate host’s APN and ACE2 receptors, respectively. These numbers increased to 13 out of 28 and 18 out of 28, respectively, when the identity threshold for the receptors was reduced to >85%; however, only 5 pseudotypes used APN and only 1 used ACE2 (Supplementary Figs. 4 and 5). Although we cannot formally discount the hypothesis that some bat alphacoronaviruses are hyperspecialized to only one cognate bat APN or ACE2 that was not included in our libraries, our results strongly indicated the presence of other receptors. The subsequent discovery that one of these viruses—CcCoV-KY43 from Kenya—has at least partial tropism for human cells provides a critical risk assessment for regional and global health communities to prepare for any potential spillover of these viruses. This assessment is strengthened by the following findings: identification of the CcCoV receptor CEACAM6; examination of host reservoir distribution in Kenya; sero-surveillance in human populations; and finally, wider elucidation of receptor use by related bat viruses from outside Africa. These results are crucial when considering that bat coronaviruses closely related to the endemic human alphacoronaviruses HCoV-229E and HCoV-NL63 have been found across Sub-Saharan Africa (Supplementary Fig. 19), consistent with previous alphacoronavirus zoonotic spillover events in this region^44,45.

Many viruses use the N-terminal V-set immunoglobulin domain of cellular adhesion molecules for cell attachment⁴⁶, including MHV, which binds CEACAM1 (ref. ²⁹). The high abundance of CEACAM family proteins on the apical surfaces of mucosal epithelia make them highly suitable for pathogen attachment. Indeed, C. albicans²⁶ and several Gram-negative bacteria²⁴ use CEACAM proteins for adhesion. Given their immunoglobulin-domain architecture and abundance on barrier membranes, we propose that additional coronaviruses use CEACAM family proteins for cell entry. The ultimate goal of pandemic preparedness is to be able to predict and risk assess the zoonotic potential of viruses from their genome sequence alone. Various approaches are being leveraged to do this, one such being the application of machine-learning algorithms to predict zoonotic potential⁴⁷. By integrating computational, unbiased selection of sequences with high-throughput screening, receptor identification, structural characterization, field epidemiology and sero-surveillance, we identified a previously uncharacterized receptor used by alphacoronaviruses from Africa, Europe and East Asia. This finding both identifies a potential threat to human health and provides the underpinning characterization to enhance pandemic preparedness and prevention efforts.

Methods

Ethics statement

The use of human sera for this study was approved by the Scientific and Ethics Review Unit of the Kenya Medical Research Institute (protocol SSC 3426). Before the blood draw, donors gave individual consent for the use of their samples for research.

Construction of gene libraries

A schematic of the downstream analysis pipeline used for retrieving alphacoronavirus data and for generating the final spike protein library used in this study (n = 40) is provided in Fig. 1a. All publicly available alphacoronavirus genome sequences were retrieved from the Virus Pathogen Database and Analysis Resource platform hosted by the Bioinformatics Resource Center at the National Institute of Allergy and Infectious Diseases¹⁵. As of May 2021, the full database consists of 19,082 alphacoronavirus genomes, from which we extracted sequences of the whole spike protein-coding region to obtain a final database of 2,714 sequences. We constructed the spike protein-coding DNA sequence alignment using MAFFT (v.7.526)^49,50 by integrating structural alignments of homologous spike protein structures queried from the UniProt Reference Clusters⁵¹. Maximum-likelihood phylogenetic reconstruction was performed using IQTREE (v.2.3.4)⁵² with 1,000 ultrafast bootstrap replicates (UFBoot) and 1,000 SH-like approximate likelihood ratio tests⁵³ (Supplementary Fig. 20). We performed codon model reconstruction by determining the best-fit model using the model selection procedure implemented in IQTREE. Patristic distances (the sum of branch lengths along the shortest path) were then computed between all pairs of tips in R (v.4.4.1) using the ape package⁵⁴, which informed the unbiased selection of the n = 40 spike protein-coding sequences by applying a greedy algorithm. In brief, let δ(i,j) denote the patristic distance between tips i and j on the reconstructed maximum-likelihood tree, with the branch length estimated in substitutions or sites, and let k be a previous number of tips to be selected. The greedy algorithm identifies the farthest pair arg max_i,j δ(i,j) and initializes the final selection set S with these two tips. Next, for each subsequent selection step, it adds the tip x∉S that maximize its nearest-neighbour distance to the current set x = arg max_x∉S min_y∈S δ(i,j) and repeats the process until |S| = k. Although heuristic, this approach ensures that an optimal subset of tips (evolutionary units) is returned under the assumption of maximizing both minimum phylogenetic distance and phylogenetic diversity, as previously theoretically proposed and discussed¹⁴. We report Faith’s phylogenetic diversity⁵⁵ of the induced minimal subtree and benchmarked against a 10,000 random panels of matching size. For the most divergent alphacoronaviruses, a RBD could not be readily identified for two unclassified viruses and viruses classified in the Soracivirus and Luchacovirus subgenera.

Plasmids used for pseudotyping

Selected alphacoronavirus spike protein-coding sequences were ordered from Biobasic as codon-optimized synthetic genes and subcloned in pcDNA3.1 with a HA tag in the C-terminal cytoplasmic tail. For the APN library, genes were ordered from GenScript and subcloned with a N-terminal V5 tag in the vector pCAGGS. For the ACE2 library and human DPP4 (GenBank: NP_001926), the ectodomains of the ORFs were subcloned with a N-terminal HA tag in pDisplay. CEACAM libraries were obtained from GenScript and subcloned in pcDNA3.1(+) with a C-terminal 8×His tag. Flag-tagged human TMPRSS2 (GenBank: NP_001128571.1) and human DPEP1 (GenBank: NP_001121613.1) were cloned into pcDNA3.1. GenBank accession numbers of the sequences used for the study are provided in Supplementary Table 1 for the alphacoronavirus spike protein library, Supplementary Table 2 for APN, Supplementary Table 3 for ACE2, Supplementary Table 7 for human CEACAM receptors, Supplementary Table 9 for the CcCoV local phylogeny spike protein library and Supplementary Table 10 for CEACAM6-like proteins of different mammalian species.

Cells

Mycoplasma-free HEK293T (human kidney), Calu3 (human lung), Caco2 (human colorectal adenocarcinoma) and HuH7 (human hepatoma) were cultured in Dulbecco’s modified Eagle medium (DMEM) supplemented with 10% FBS, 100 U ml^–1 penicillin plus 100 µg ml^–1 streptomycin (PenStrep) and 1 mM sodium pyruvate. Mycoplasma-free THP1 (human monocytes) and LCL (human B lymphocytes) cell lines were maintained in suspension in RPMI1640 medium supplemented with 10% FBS and PenStrep. All reagents for cell culture were purchased from Gibco. All cells were cultured in a humidified atmosphere at 37 °C, 5% CO₂.

Pseudotype virus production

To pseudotype alphacoronavirus spike proteins, plasmids encoding their ORF were transfected in confluent HEK293T cells seeded in a 6-well plate using polyethylenimine (PEI, 5 µg ml^–1). HIV-1-based lentiviral vectors coding for viral structural proteins (p8.91), particle packaging signals and a luciferase reporter gene (pCSFLW) were also included in the transfection mix⁵⁶. The following day, the medium was replaced and pseudoviruses in the supernatant were collected 48 and 72 h after transfection and pooled together. Following the final collection, the supernatant was centrifuged for 10 min at 4,000 rpm to remove cellular debris. Finally, pseudoviruses were aliquoted and stored at −80 °C until further use. To verify spike protein incorporation, pseudoparticles were purified by ultracentrifugation at 23,000 rpm, 4 °C for 2 h using a 20% sucrose gradient. Supernatants were discarded and pellets were resuspended in PBS. Concentrated pseudoparticles were lysed by boiling with Laemmli (Bio-Rad) and spike protein expression was analysed by SDS–PAGE. Separated proteins were transferred onto a 0.45 µm nitrocellulose membrane (Cytiva), blocked in PBS supplemented with 0.05% Tween-20 and 5% (w/v) unskimmed milk powder and incubated with mouse monoclonal anti-HA (clone 6E2, Cell Signaling Technology, 1:5,000) and anti-p24 (clone 5, Abcam, 1:2,000) overnight at 4 °C. The following day, goat anti-mouse DyLight 680 (Invitrogen, 1:10,000) was used to probe primary antibodies, and signals were detected with an Odyssey DLx imaging system (Li-Cor Biosciences). Of note, for those spike proteins for which we saw entry (Fig. 1; HCoV-NL63, CcCoV-SD F3, CcCoV-A76, HCoV-229E, BtCoV-AT1A F41 and BtCoV-WA1087), protein expression in the purified pseudoparticle immunoblots (Extended Data Fig. 1) did not quantitatively correlate with entry signals, which indicated that it is difficult to establish a minimum threshold of spike protein incorporation needed for membrane fusion. For the six spike proteins that did not pseudotype (Supplementary Fig. 1), we attempted to substitute them for spike proteins that have been reported to pseudotype and that share at least 95% amino acid similarity (Supplementary Fig. 21). However, the only match was for PEDV-KDJ, which was replaced by PEDV-Colorado⁵⁷. The selected HCoV-229E spike protein sequence pseudotyped to low levels and was not functional in our downstream applications (Supplementary Fig. 3a); therefore, we replaced it with a different strain of the same species. Original uncropped immunoblots are provided in the Zenodo repository (https://doi.org/10.5281/zenodo.17951484)⁵⁸.

Pseudovirus entry assay

For receptor usage screening, HEK293T cells were transfected with plasmids coding for the receptors of interest. The following day, the medium was removed and replaced with fresh DMEM supplemented with 10% FBS to a final concentration of 2 × 10⁴ cells per ml. Next, 100 µl of cell preparation was aliquoted in each well of a 96-well plate and treated with 100 µl pseudovirus preparation, diluted 1:1 with fresh medium. Two days later, the supernatant was removed and cells were treated with Bright-Glo (Promega), diluted 1:1 with PBS. Luciferase signals were acquired using a Glomax Discover luminometer (Promega). To assess cell line permissivity to pseudovirus infection, confluent human cell lines were transduced with undiluted pseudoparticle supernatant in a 96-well plate. Two days later, plates were spun down, the medium replaced with 50 µl Bright-Glo and luciferase signals were acquired on a luminometer. To inhibit human CEACAM6-dependent entry, pseudotyped CcCoV-KY43 spike protein was incubated with commercial monoclonal antibodies against human CEACAM6, clone B6.2 (ThermoFisher) and clone 439424 (ThermoFisher), or recombinant C-terminally Fc-tagged human CEACAM6, for 1 h at room temperature in a 96-well plate. HEK293T cells transiently transfected with pcDNA-human CEACAM6 was then added to the plate at a concentration of 2 × 10⁴ cells per well. Two days later, luciferase signals were measured as described above.

Flow cytometry for expression of the receptor libraries

Plasmids were transfected in HEK293T cells using Trans-IT X2 (Mirus Bio). The following day, cells were resuspended in PBS, fixed using 2% paraformaldehyde for 20 min at 4 °C and permeabilized using PBS supplemented with 0.5% Triton X-100 for 5 min at 4 °C. After washing, cells were incubated for 1 h with anti-tag antibodies conjugated with PE for APN (V5 tag, Invitrogen, clone TCM5, 1:500) and APC for ACE2 (Flag-tag, Miltenyi Biotec, clone REA216, 1:500) and CEACAM (His-tag, Miltenyi Biotec, clone GG11-8F3.5.1, 1:500) libraries. Samples were run on a MACSQuant Analyzer 10 cytometer (Miltenyi Biotec), and data analysis was performed using FlowJo (BD Biosciences). To assess the specificity of anti-human CEACAM6 monoclonal antibodies, HEK293T cells were transfected with a library of different human CEACAM proteins. The following day, cells were washed in PBS and fixed using 2% paraformaldehyde for 20 min at 4 °C. After washing, cells were incubated with clone B6.2 (1:500) or clone 439424 (1:500) for 1 h on ice. Cells were washed three times before incubation with the secondary antibody conjugated with FITC and recognizing mouse IgG(H+L) (Invitrogen, 1:5,000). Samples were run on a MACSQuant Analyzer 10 cytometer and data were analysed using FlowJo.

Human virus receptor discovery

Human receptor ectodomains were expressed as enzymatically monobiotinylated soluble proteins as previously described²⁰ by co-transfecting HEK293 cells with a secreted BirA protein biotin⁵⁹, as previously described⁶⁰. Protein-containing supernatants were collected after 5 days of expression at 37 °C with 5% CO₂ and 70% humidity. Cells were removed by centrifugation 2,000g for 20 mins, before filtering away particulates (Acrodisc PF PES 0.8/0.2 micron filters, 4658). The imidazole concentration was adjusted to 20 mM and applied to a 96-well HisTrapHP plate as previously described⁶¹. Non-captured proteins were removed using 3 washes with buffer (20 mM sodium phosphate buffer, 400 mM NaCl and 40 mM imidazole, pH 7.4). Captured proteins were eluted after 15 min of incubation with elution buffer (20 mM sodium phosphate, 400 mM NaCl and 400 mM imidazole, pH adjusted to 7.4). Protein concentrations were determined using a Pierce Bradford protein assay kit, comparing the absorbance values for the purified proteins against values obtained for a dilution series of bovine serum albumin (BSA). Protein purity was assessed by separating protein preparations under reducing conditions (NuPAGE sample reducing agent and LDS sample loading buffer) by SDS–PAGE in MOPS–SDS buffer (NuPAGE) and visualized using Coomassie dye (InstantBlue). Human receptor screening assays were carried out as previously described²⁰. In brief, horseradish-peroxidase-labelled preys were produced by complexing monobiotinylated proteins (1.785 ml at 17.5 nM) with streptavidin HRP (714 µl of a 1 in 1,000 dilution, Pierce: 21130) for 1 h at 23 °C, diluted in HEPES buffered saline (HBS) containing 2% (w/v) BSA. Preys were further diluted 20-fold in the same buffer and applied to 2 × 384-well plates containing 759 immobilized human receptor ectodomains and incubated for 1 h at 23 °C. Plates were washed twice with HBS containing Tween-20 (0.1% w/v), followed by a final wash in HBS. Protein interactions were visualized using TMB/E solution (Merck ES001), stopping the reaction by the addition of NaF to a final concentration 0.15% (w/v). Absorbance readings at 652 nm were processed using a median polished Z score with the significance threshold set to Z > 2.

Site-directed mutagenesis

Substitutions in human CEACAM5 and CEACAM6 were introduced using a Quikchange Lightning Site-Directed Mutagenesis kit (Agilent) following the manufacturer’s instructions. Primers were designed using the Agilent online tool (https://www.agilent.com/store/primerDesignProgram.jsp).

Cell–cell fusion assay

HEK293T cells were transfected with either the human CEACAM receptor constructs and rLuc-GFP 1-7 plasmid⁶² or with plasmids encoding CcCoV-KY43 spike protein and rLuc-GFP 8-11 using Transit-X2 transfection reagent (Mirus) according to the manufacturer’s instructions. The following day, cells were resuspended in fresh medium and co-cultured at a ratio of 1:1 to a final density of 4 × 10⁴ cells per well in a 96-well plate. Two days later, medium was removed and cells were lysed in Passive Lysis buffer (Promega). Renilla luciferase substrate coelenterazine-H (Promega) was added and signals were read using a Glomax Discover Reader (Promega).

Knockdown of CEACAM6 in human cells

siRNA targeting human CEACAM6 (Dharmacon) was introduced in Calu3 and Caco2 cells by electroporation (Neon NxT, ThermoFisher). In brief, cells were washed, resuspended in electroporation buffer at a final concentration of 1 × 10⁸ cells per ml and mixed with siRNA at 100 mM. Electroporation was carried out for 2 pulses of 20 ms at 1,400 V. Afterwards, cells were grown for 24 h in DMEM supplemented with 20% FBS before transduction with pseudotyped CcCoV-KY43 spike protein. Entry signals using firefly luciferase were detected as described above. To evaluate reduction in protein expression, cellular lysates were obtained at the same time. Western blot membranes were incubated with mouse anti-human CEACAM6 (Fisher Scientific, clone B6.2, 1:1,000) and rabbit anti-human GAPDH (Proteintech, 1:10,000) as a loading control. The following day, goat anti-mouse DyLight 680 (Invitrogen, 1:10,000) was used to probe primary antibodies, and signals were detected using an Odyssey DLx imaging system (Li-Cor Biosciences). In parallel, stable Caco2 and Calu3 cell lines for which expression of human CEACAM6 was knocked down using shRNA were obtained. Hairpin sequences of 21-mers targeting the gene were obtained from the web portal of the Genetic Perturbation Consortium (Broad Institute, https://portals.broadinstitute.org/gpp/public/seq/search). The sequences TRCN0000424513 and TRCN000062298 were purchased as oligomers from IDT and cloned into the pKLO.1C vector (Addgene, 139470; PMID 16564017). The final constructs were transfected into HEK293T cells, along with the lentiviral packaging vector p8.91 and a plasmid encoding for VSV glycoprotein G. Pseudoviruses were collected as described above. Caco2 and Calu3 cells were plated in a 6-well plate and transduced when 80% confluency was reached. Pseudoviruses were added for 3 days, then removed and replaced with DMEM 20% FBS containing 8 µg ml^–1 puromycin. Functional entry assays were performed as described above, and reduced expression of CEACAM6 was again confirmed by immunoblotting. Original uncropped immunoblots are provided in the Zenodo repository (https://doi.org/10.5281/zenodo.17951484)⁵⁸.

Transcriptomic analysis of available datasets

Publicly available single-cell RNA-sequencing data from the Human Protein Atlas were downloaded (https://www.proteinatlas.org/humanproteome/single+cell/single+cell+type/data#datasets). The dataset included gene read counts per cell for 31 human organs. Count matrices per organ were imported into R (v.4.3.3) and processed using Seurat (v.5.3.0) in RStudio (v.2025.05.1). To further zoom in on lung-specific single-cell transcriptomic data, we downloaded the Human Lung Cell Atlas (HLCA v.1.0) dataset for normal human lungs samples (https://data.humancellatlas.org/hca-bio-networks/lung/atlases/lung-v1-0). Processed cell-type annotations and expression matrices were loaded into Seurat (v.4.3.3.) for downstream analyses.

Recombinant protein production

Receptors (human CEACAM6, residues 1–326; human CEACAM5, residues 1–684; human CEACAM3, residues 1–215; human CEACAM1, residues 1–428; human CEACAM8, residues 1–327; and Egyptian fruit bat CEACAM6, residues 1–388) were cloned into pOPINTT vectors, upstream of the human rhinovirus 3C protease site and human IgG-Fc tag. Full-length spike proteins (SARS-CoV-2, HCoV-NL63 and HCoV-229E) were cloned into pCDNA3.1 vectors. RBDs (CcCoV-KY43, residues 500–630; CcCoV-2B, residues 496–625; BtCoV-HlYN18, residues 490–627; BtCoV-A701, residues 488–622; BtMfCoV-HuB2013, residues 497–641; BtCoV-WA3607, residues 497–652; BtCoV-CpYN11, residues 491–641; BtCoV-180, residues 477–633; BtCoV-HKU8, residues 497–644; BtCoV-RmYN17, residues 506–632; BtCoV-77, residues 495–634, BtCoV-WA2028, residues 497–648; BtCoV-977, residues 508-638; mink coronavirus, residues 518–660; and CCoV-SD-F3, residues 525–679) were cloned into pOPINTT vectors, which encode a 6×His tag at the C terminus of the protein. Mycoplasma-free Expi293 cells were cultured in Expi293 Expression medium (Gibco) at 37 °C, 8% CO₂. Expression plasmids were transfected using Polyethylenimine Max (PEI 40K, Polyscience) at a mass ratio of 1:1.5 according to the manufacturer’s instructions. After 18 h, valproic acid (Merck), sodium propionate (Merck) and glucose (Merck) were added to the cellular suspension at a final concentration of 5 mM, 6.7 mM and 46 mM, respectively. After 3 days, supernatant was collected by centrifugation (3,800g for 10 min) and sterile filtered before storage at 4 °C. Fc-tagged proteins were purified using HiTrap protein G HP (Cytiva) prepacked affinity columns, washing with 10 mM sodium phosphate pH 7 buffer, eluting with 0.1 M glycine pH 2.7 that was immediately neutralized with 1 M Tris pH 8.0. His-tagged proteins were purified using HisTrap FF (Cytiva) prepacked affinity columns, washing with 10 mM sodium phosphate pH 7.5, 150 mM NaCl, 20 mM imidazole and eluting with 10 mM sodium phosphate pH 7.5, 150 mM NaCl and 1 M imidazole. For all samples, eluted fractions were analysed by SDS–PAGE, and fractions containing the relevant protein were pooled. Pooled protein was exchanged into 10 mM Tris pH 7.5, 150 mM NaCl via repeated concentration and dilution using Amicon Ultra Centrifugal filters (Merck). For crystallization, the CEACAM6 Fc tag was removed by overnight incubation at 4 °C with human rhinovirus 3C protease, and both the Fc tag plus uncleaved CEACAM6-Fc were depleted using a HiTrap protein G HP affinity column. For crystallography and biophysics (ITC and biolayer interferometry assays), proteins were further purified by size-exclusion chromatography using a Superdex 200 10/300 GL column (Cytiva) equilibrated in 10 mM Tris pH 7.5, 150 mM NaCl. Proteins were stored at 4 °C (<2 weeks) or snap-frozen and stored at –80 °C (long term).

Protein binding determined by ELISA

Recombinant RBDs were coated onto Maxisorp NUNC-immuno flat-bottomed 96-well plates (Thermo Scientific) at 1 μg ml^–1 overnight at 4 °C in carbonate–bicarbonate solution (0.6 M, pH 9.6). Plates were blocked for 1 h at room temperature with PBS Tween-20 containing 2% BSA (Merck), after which a dilution series of CEACAM proteins in PBS BSA 2% was added for 1 h at room temperature. Plates were washed 3 times with PBS Tween-20 0.1% (Sigma), followed by the addition of anti-human-Fc HRP conjugate diluted 1:10,000 for 1 h at room temperature. 1-step Ultra TMB (Merck) was added to each well, incubated for 5–10 min at room temperature and the reactions stopped with an equivalent volume of 2 M sulfuric acid solution. The OD at 450 nm was measured using a Glomax Discover luminometer (Promega).

The binding IgG ELISAs with human sera were performed according to previously published protocols⁶³. In brief, Maxisorp NUNC-immuno flat-bottomed 96-well plates were coated with 2 µg ml^–1 coronavirus spike antigens and CcCoV-KY43, BtCoV-HlYN18 and BtCoV-A701 RBDs at 37 °C for 1 h, then washed 3 times in 0.1% Tween-20 and blocked with blocker casein (Thermo Fisher) for 1 h. As full-length spike protein ectodomains could not be produced in sufficient quantities for CcCoV-KY43, ELISAs used stabilized full-length spike proteins for HCoV-229E, HCoV-NL63 and SARS-CoV-2 and RBDs for CcCoV-KY43, BtCoV-HlYN18 and BtCoV-A701; this modification represents a limitation of the study and precludes direct quantitative comparison of ELISA signal magnitudes across antigens. Samples were diluted 1:800 in blocker casein and added to both receptor binding domain and spike protein-coated plates, and incubated for 2 h at room temperature. After washing with 0.1% Tween-20, a 1:10,000 dilution of horseradish peroxidase-conjugated goat antihuman IgA antibody (Sigma) in wash buffer was added to plates, incubated for 1 h at room temperature, washed and O-phenylenediamine dihydrochloride substrate (Sigma) was added for colour development for 10 min. Absorbance was measured at 492 nm. Spearman’s rank correlations were applied to assess associations across the full dataset, whereas Pearson’s correlations were applied to the top 10% of CcCoV-KY43 ELISA signals to explore co-variation among high responders. As true negative control sera are unavailable in this emergent-virus setting, OD thresholds were used solely as descriptive markers of high reactivity (in each antigen-specific ELISA) and not to infer seropositivity or to compare absolute signal magnitudes across assays.

Flow cytometry for receptor–viral protein binding

HEK293T cells were transiently transfected with plasmids encoding APN and ACE2 from different species, as well as human TMPRSS2, human DPP4 and human DPEP1. The following day, cells were washed in PBS and fixed using 2% formaldehyde in PBS for 20 min on ice. After incubation, cells were permeabilized with Triton 1% in PBS, for 5 min on ice. Cells were washed twice in PBS and incubated with 10 mg ml^–1 viral-predicted RBD diluted in PBS BSA 1%, for 1 h on ice. After washing, the secondary antibodies were incubated with the cells for 1 h on ice. To label the receptors, PE-conjugated anti-V5 (Invitrogen, clone TCM5, 1:500) was used for the APN library, PE-conjugated anti-HA (Miltenyi Biotec, clone GG8-1F3.3.1, 1:500) was used for the ACE2 library and human DPP4 and PE conjugated anti-Flag (BioLegend, clone L5, 1:500) was used for human TMPRSS2 and human DPEP1. To recognize the His-tagged viral predicted RBDs, APC-conjugated anti-6×His was used (Miltenyi Biotec, 1:500). Cells were run on a MACSQuant flow cytometer (Miltenyi Biotec), and at least 25,000 events were acquired per sample. Data were analysed using FlowJo (BD Biosciences). The gating strategy is shown in Supplementary Fig. 22.

Crystallization

Complexes of CEACAM6 with RBDs from CcCoV-KY43 and CcCoV-2B were formed by mixing the proteins at a 1:1.2 molar ratio and incubating for 60 min at 22 °C. Proteins were crystallized in 96-well nanolitre-scale sitting drops (200 nl protein plus 200 nl of reservoir) equilibrated at 293 K against 80 μl reservoir solution as follows: 2 mg ml⁻¹ CcCoV-KY43 complex, 0.1 M sodium acetate pH 4.3, 21.5% PEG5K MME and 5% glycerol; 2.7 mg ml⁻¹ CcCoV-2B complex, 0.1 M sodium acetate, 20% PEG 6K and 0.2 M NaCl. Crystals were cryo-preserved by brief immersion in reservoir supplemented with 20–25% glycerol before collecting in SPINE standard nylon loops and flash-cryocooling in liquid nitrogen.

X-ray data collection and structure determination

Diffraction data were recorded from single crystals on the Diamond beamline I24 at 20 keV using an Eiger2 9M detector (CcCoV-KY43 complex) or on the beamline I04 at 13 keV using an Eiger2 X 16M detector (CcCoV-2B complex). Both crystals produced severely anisotropic diffraction; therefore, data were integrated using DIALS⁶⁴ before anisotropic scaling and merging using the STARANISO ‘aniso merge’ data processing pipeline⁶⁵. The structure of human CEACAM6 in complex with CcCoV-KY43 was solved by molecular replacement using PHASER⁶⁶ with AlphaFold 3 (ref. ⁶⁷) models of the ectodomain (residues 35–326) of CEACAM6 and the RBD (residues 500–630) of CcCoV-KY43, for which the atomic pLDDT values had been converted into pseudo-atomic displacement factors⁶⁸. The model was manually improved using ISOLDE⁶⁹ with adaptive restraints based on the AlphaFold quality metrics⁶⁸ before refinement using phenix.refine⁷⁰, with the coordinates from ISOLDE used as a reference model to prevent deterioration of model geometry. Subsequent rounds of iterative model building were performed using COOT⁷¹ and ISOLDE in consultation with the validation statistics provided by MolProbity⁷², in each case using the coordinates from ISOLDE as a reference model in phenix.refine. The final model had one molecule of CEACAM6 and CcCoV-KY43 RBD per asymmetric unit, with 95.6% of residues in the favoured area of the Ramachandran plot and no outliers, and it included 10 N-acetylglucosamine residues (9 attached to CEACAM6 and 1 to CcCoV-KY43 RBD). The structure of human CEACAM6 in complex with CcCoV-2B was solved by molecular replacement using PHASER⁶⁶ with models of the CEACAM6 ectodomain and CcCoV-2B RBD (residues 496–625) that were generated with ColabFold⁷³ using the crystal structure of CEACAM6 in complex with the CcCoV-KY43 RBD as a template. The structure was refined as described above. The final model had one molecule of CEACAM6 and CcCoV-2B RBD per asymmetric unit, with 96.2% of residues in the favoured area of the Ramachandran plot and no outliers, and it included 12 N-acetylglucosamine residues (9 attached to CEACAM6 and 3 to CcCoV-2B RBD).

Biolayer interferometry

To compare affinities of human CEACAM proteins for CcCoV-KY43 and CcCoV-2B, the recombinant viral RBDs were immobilized at 20 μg ml^–1 to Ni-NTA biosensors (Sartorius) pre-hydrated in 10 mM HEPES pH 7.5, 150 mM NaCl and 0.02% Tween-20. Biosensors were incubated with a a 2-fold serial dilution of CEACAM proteins from 600 nM to 37.5 nM (association) then buffer alone (dissociation). Between incubations, biosensors were regenerated by thrice incubating for 5 s with 10 mM glycine pH 1.7 then 5 s in buffer to neutralize, and biosensors were recharged using 10 mM NiCl₂ before fresh immobilization of the RBD. Two independent experiments were performed at 30 °C using a Octect Red (ForteBio). Results were analysed using data analysis software of the instrument (Sartorius).

ITC

ITC experiments were performed using a MicroCal PEAQ-ITC automated calorimeter (Malvern Panalytical) at 25 °C. CcCoV RBDs were titrated into human CEACAM proteins using 13 × 3 µl or 19 × 2 µl injections. Results were analysed using the analysis software of the instrument and fitted using a one-site binding model. The experiments were performed twice (CEACAM5) or thrice (CEACAM6) independently, and data from each experiment are shown in Supplementary Table 4.

C. cor and human population distribution data in Kenya

GPS points from all the study areas in Kenya where bats were surveyed were uploaded onto QGIS (2025, https://www.qgis.org). The points were later grouped based on bats species as either ‘other species of bats’ or ‘heart-nosed bat’. A shape file of the known extent of C. cor was added to ascertain the accuracy of the sampling. Sampling points containing C. cor were later classified based on the cave type as either natural cave, tree cave or mine cave. In two instances, bats were found in human habitations.

Phylogenetic reconstruction

Alphacoronavirus spike gene sequences of the 40 selected strains, in addition to those representing the local phylogeny of CcCoV-KY43, were aligned using MAFFT (v.7.526)⁴⁹ and molecular clock calibration was performed in BEAST (v.1.10.5)⁷⁴. Bayesian analysis was parameterized using the SRD06 codon position model (which implements a HKY₁₁₂ partition)^75,76, a relaxed clock defined by an underlying log-normal distribution to model variability in rates across branches⁷⁷, and a nonparametric skygrid prior with 50 grid points distributed over 65 years⁷⁸. An uninformative continuous-time Markov chain rate was set as reference prior on the clock rate, whereas other priors were left at their default values. Moreover, to investigate the potential acquisition (or loss) of CEACAM6 usage along the full evolutionary history of alphacoronavirus, discrete CEACAM6 usage was modelled as a binary trait under an asymmetric continuous-time Markov chain process with Markov jump counting in BEAST⁷⁹. CEACAM6 trait evolution was jointly inferred in BEAST (v.1.10.5) with time-calibrated phylogenies estimated under a SRD06 codon-partitioned substitution model, an uncorrelated log-normal relaxed molecular clock and a skygrid coalescent prior. Markov jumps were used to obtain posterior distributions for the number and direction of CEACAM6 transitions (gains and losses) and to reconstruct the most probable trait state at internal nodes. All Markov chain Monte Carlo analyses were run for 200 million iterations, with samples collected every 20,000 steps. Convergence and mixing of the chain were evaluated by ensuring that all posterior parameters returned effective sample size values greater than 200. Highest independent posterior subtree reconstructed⁸⁰ trees were constructed in TreeAnnotator by discarding the initial 10% of the chain.

Additional computational analysis

Molecular graphics were generated using PyMOL (Schrödinger) and ChimeraX⁸¹. Structure interfaces were analysed using PDBePISA⁸². For analysis of relative changes in pseudotype entry, significance was quantified using the Python3 module statsmodels (https://www.statsmodels.org/) with technical replicates averaged and treatment effects quantified as log₁₀-transformed fold changes relative to the control (untreated) sample. Significance was determined using a one-sample t-test against a theoretical mean of 0, with standard errors calculated using a pooled variance estimate across all groups to ensure robust variance estimation given the small sample sizes. P values were adjusted for multiple comparisons using the Holm–Bonferroni correction. GraphPad Prism 9 was used to generate all the other figures and to perform all other statistical analyses. Non-significant difference is shown when P > 0.032.

Biosecurity statement

Following the identification of CEACAM6 as a receptor in early 2024, we recognized the potential health implications of our findings and the need to notify and improve preparedness and biosecurity in Kenya. Contacting B. A. at the National Museum of Kenya, who first identified CcCoV-KY43 and is a co-author on this study, we let his team know of the potential risk, especially concerning sampling of this bat species. In consultation with biosecurity colleagues in the United Kingdom, we also made the decision to not attempt rescuing or propagating the virus in our laboratories. As such, all work detailed in this paper has been done without live virus work in UK laboratories.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All sequence data analysed were sourced from public databases, as described in the Methods. These, along with the raw, when necessary, analysed data presented in this article, have been deposited into Zenodo (https://doi.org/10.5281/zenodo.17951484)⁵⁸ and made freely accessible. Atomic coordinates, structure factors and protein sequences have been deposited into the Protein Data Bank (PDB) with the following accession numbers: 9RCS for human CEACAM6 in complex with the CcCoV-KY43 RBD and 9RCU for human CEACAM6 in complex with the CcCoV-2B RBD. In addition to the structures determined in this study, the following previously experimentally determined 3D structures were included: CEACAM6 homodimer (PDB:4Y8A); CEACAM6–CEACAM8 heterodimer (PDB: 4YIQ); BtCoV-MOW15-22 (PDB: 9C6O); HCoV-SARS2 (PDB: 6M0J); HCoV-SARS1 (PDB: 2AJF4); NeoCoV (PDB: 7WPO); BtCoV-HKU5 (PDB: 9D32); BtCoV-KY72 (PDB: 8K4U); BtCoV-PRD-0038 (PDB: 8U0T); HCoV-NL63 (PDB: 3KBH); HCoV-229E (PDB: 6ATK); CCoV strain HuPn2018 (PDB: 7U0L); PRCV (PDB: 4F5C); PDCoV (PDB: 7VPQ); and MHV spike trimer in complex with mouse CEACAM1 (PDB: 6VSJ). Human population data were obtained from the Kenya National Bureau of Statistics (2019, https://www.knbs.or.ke/download/2019-kenya-population-and-housing-census-volume-iii-distribution-of-population-byage-sex-and-administrative-units/) based on the 2019 national census. Viral sequences used for this study were downloaded (https://www.bv-brc.org/view/Virus/10239). Publicly available single-cell RNA-sequencing data from the Human Protein Atlas were downloaded (https://www.proteinatlas.org/humanproteome/single+cell/single+cell+type/data#datasets). Lung Cell Atlas (HLCA v1.0) dataset for normal human lungs samples was downloaded https://data.humancellatlas.org/hca-bio-networks/lung/atlases/lung-v1-0).

References

Ellwanger, J. H. & Chies, J. A. B. Zoonotic spillover: understanding basic aspects for better prevention. Genet. Mol. Biol. 44, e20200355 (2021).
Article CAS PubMed PubMed Central Google Scholar
Yeager, C. L. et al. Human aminopeptidase N is a receptor for human coronavirus 229E. Nature 357, 420–422 (1992).
Article CAS PubMed PubMed Central Google Scholar
Li, Z. et al. The human coronavirus HCoV-229E S-protein structure and receptor binding. eLife https://doi.org/10.7554/eLife.51230 (2019).
Reguera, J. et al. Structural bases of coronavirus attachment to host aminopeptidase N and its inhibition by neutralizing antibodies. PLoS Pathog. 8, e1002859 (2012).
Article CAS PubMed PubMed Central Google Scholar
Tusell, S. M., Schittone, S. A. & Holmes, K. V. Mutational analysis of aminopeptidase N, a receptor for several group 1 coronaviruses, identifies key determinants of viral host range. J. Virol. 81, 1261–1273 (2007).
Article CAS PubMed Google Scholar
Hofmann, H. et al. Human coronavirus NL63 employs the severe acute respiratory syndrome coronavirus receptor for cellular entry. Proc. Natl Acad. Sci. USA 102, 7988–7993 (2005).
Article CAS PubMed PubMed Central Google Scholar
Raj, V. S. et al. Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC. Nature 495, 251–254 (2013).
Article CAS PubMed PubMed Central Google Scholar
Huang, X. et al. Human coronavirus HKU1 spike protein uses O-acetylated sialic acid as an attachment receptor determinant and employs hemagglutinin-esterase protein as a receptor-destroying enzyme. J. Virol. 89, 7202–7213 (2015).
Article CAS PubMed PubMed Central Google Scholar
Vlasak, R., Luytjes, W., Spaan, W. & Palese, P. Human and bovine coronaviruses recognize sialic acid-containing receptors similar to those of influenza C viruses. Proc. Natl Acad. Sci. USA 85, 4526–4529 (1988).
Article CAS PubMed PubMed Central Google Scholar
Saunders, N. et al. TMPRSS2 is a functional receptor for human coronavirus HKU1. Nature 624, 207–214 (2023).
Article CAS PubMed PubMed Central Google Scholar
Dufloo, J. et al. Dipeptidase 1 is a functional receptor for a porcine coronavirus. Nat. Microbiol. 10, 2981–2996 (2025).
Article CAS PubMed PubMed Central Google Scholar
Williams, R. K., Jiang, G. S. & Holmes, K. V. Receptor for mouse hepatitis virus is a member of the carcinoembryonic antigen family of glycoproteins. Proc. Natl Acad. Sci. USA 88, 5533–5536 (1991).
Article CAS PubMed PubMed Central Google Scholar
Peng, G. et al. Crystal structure of mouse coronavirus receptor-binding domain complexed with its murine receptor. Proc. Natl Acad. Sci. USA 108, 10696–10701 (2011).
Article CAS PubMed PubMed Central Google Scholar
Bordewich, M., Rodrigo, A. G. & Semple, C. Selecting taxa to save or sequence: desirable criteria and a greedy solution. Syst. Biol. 57, 825–834 (2008).
Article PubMed Google Scholar
Olson, R. D. et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 51, D678–D689 (2023).
Article CAS PubMed PubMed Central Google Scholar
An, D. et al. Resolving the APN controversy in PEDV infection: comparative kinetic characterization through single-virus tracking. PLoS Pathog. 21, e1013317 (2025).
Article CAS PubMed Google Scholar
Dufloo, J., Andreu-Moreno, I., Moreno-García, J., Valero-Rello, A. & Sanjuán, R. Receptor-binding proteins from animal viruses are broadly compatible with human cell entry factors. Nat. Microbiol. 10, 405–419 (2025).
Article CAS PubMed PubMed Central Google Scholar
Tong, S. et al. Detection of novel SARS-like and other coronaviruses in bats from Kenya. Emerg. Infect. Dis. 15, 482–485 (2009).
Article PubMed PubMed Central Google Scholar
Tao, Y. et al. Genomic characterization of seven distinct bat coronaviruses in Kenya. Virus Res. 167, 67–73 (2012).
Article CAS PubMed PubMed Central Google Scholar
Shilts, J. et al. A physical wiring diagram for the human immune system. Nature 608, 397–404 (2022).
Article CAS PubMed PubMed Central Google Scholar
Regev, A. et al. The Human Cell Atlas. eLife https://doi.org/10.7554/eLife.27041 (2017).
Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023).
Article CAS PubMed PubMed Central Google Scholar
Ambrosi, C. et al. Acinetobacter baumannii targets human carcinoembryonic antigen-related cell adhesion molecules (CEACAMs) for invasion of pneumocytes. mSystems https://doi.org/10.1128/msystems.00604-20 (2020).
Tchoupa, A. K., Schuhmacher, T. & Hauck, C. R. Signaling by epithelial members of the CEACAM family—mucosal docking sites for pathogenic bacteria. Cell Commun. Signal. 12, 27 (2014).
Article PubMed PubMed Central Google Scholar
Barnich, N. et al. CEACAM6 acts as a receptor for adherent-invasive E. coli, supporting ileal mucosa colonization in Crohn disease. J. Clin. Invest. 117, 1566–1574 (2007).
Article CAS PubMed PubMed Central Google Scholar
Klaile, E. et al. Binding of Candida albicans to human CEACAM1 and CEACAM6 modulates the inflammatory response of intestinal epithelial cells. mBio https://doi.org/10.1128/mbio.02142-16 (2017).
Bonsor, D. A., Günther, S., Beadenkopf, R., Beckett, D. & Sundberg, E. J. Diverse oligomeric states of CEACAM IgV domains. Proc. Natl Acad. Sci. USA 112, 13561–13566 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wong, A. H. M. et al. Receptor-binding loops in alphacoronavirus adaptation and evolution. Nat. Commun. 8, 1735 (2017).
Article PubMed PubMed Central Google Scholar
Dveksler, G. S. et al. Cloning of the mouse hepatitis virus (MHV) receptor: expression in human and hamster cell lines confers susceptibility to MHV. J. Virol. 65, 6881–6891 (1991).
Article CAS PubMed PubMed Central Google Scholar
Monadjem, A., Bergmans, W., Mickleburgh, S. & Hutson, A. M. The IUCN Red List of Threatened Species (IUCN, 2017).
Wang, D. et al. Substantial viral diversity in bats and rodents from East Africa: insights into evolution, recombination, and cocirculation. Microbiome 12, 72 (2024).
Article PubMed PubMed Central Google Scholar
Aiewsakun, P. & Katzourakis, A. Time-dependent rate phenomenon in viruses. J. Virol. 90, 7184–7195 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wertheim, J. O., Chu, D. K., Peiris, J. S., Kosakovsky Pond, S. L. & Poon, L. L. A case for the ancient origin of coronaviruses. J. Virol. 87, 7039–7045 (2013).
Article CAS PubMed PubMed Central Google Scholar
Mifsud, J. C. O., Suchard, M. A., Holmes, E. C. & Lemey, P. Recent advances in the inference of deep viral evolutionary history. J. Virol. 99, e0029225 (2025).
Article PubMed PubMed Central Google Scholar
Chang, C. L. et al. Widespread divergence of the CEACAM/PSG genes in vertebrates and humans suggests sensitivity to selection. PLoS ONE 8, e61701 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kammerer, R. et al. Recent expansion and adaptive evolution of the carcinoembryonic antigen family in bats of the Yangochiroptera subgroup. BMC Genomics 18, 717 (2017).
Article PubMed PubMed Central Google Scholar
Han, Y. et al. Panoramic analysis of coronaviruses carried by representative bat species in Southern China to better understand the coronavirus sphere. Nat. Commun. 14, 5537 (2023).
Article CAS PubMed PubMed Central Google Scholar
Lenshin, S. V. et al. Identification of a new alphacoronavirus (Coronaviridae: Alphacoronavirus) associated with the greater horseshoe bat (Rhinolophus ferrumequinum) in the south of European part of Russia. Vopr. Virusol. 69, 546–557 (2024).
Article CAS PubMed Google Scholar
Kuchinski, K.S. et al. Targeted genomic sequencing with probe capture for discovery and surveillance of coronaviruses in bats. eLife https://doi.org/10.7554/eLife.79777 (2022).
Ghai, R. R. et al. Animal reservoirs and hosts for emerging Alphacoronaviruses and Betacoronaviruses. Emerg. Infect. Dis. 27, 1015–1022 (2021).
Article CAS PubMed PubMed Central Google Scholar
Karim, M., Lo, C.W. & Einav, S. Preparing for the next viral threat with broad-spectrum antivirals. J. Clin. Invest. 133, e170236 (2023).
Article CAS PubMed PubMed Central Google Scholar
Monrad, J. T., Sandbrink, J. B. & Cherian, N. G. Promoting versatile vaccine development for emerging pandemics. NPJ Vaccines 6, 26 (2021).
Article CAS PubMed PubMed Central Google Scholar
Cui, X. et al. Virus diversity, wildlife-domestic animal circulation and potential zoonotic viruses of small mammals, pangolins and zoo animals. Nat. Commun. 14, 2488 (2023).
Article CAS PubMed PubMed Central Google Scholar
Tao, Y. et al. Surveillance of bat coronaviruses in Kenya identifies relatives of human coronaviruses NL63 and 229E and their recombination history. J. Virol. 91, e01953-16 (2017).
Article PubMed PubMed Central Google Scholar
Corman, V. M. et al. Evidence for an ancestral association of human coronavirus 229E with bats. J. Virol. 89, 11858–11870 (2015).
Article CAS PubMed PubMed Central Google Scholar
Dermody, T. S., Kirchner, E., Guglielmi, K. M. & Stehle, T. Immunoglobulin superfamily virus receptors and the evolution of adaptive immunity. PLoS Pathog. 5, e1000481 (2009).
Article PubMed PubMed Central Google Scholar
Wardeh, M., Blagrove, M. S. C., Sharkey, K. J. & Baylis, M. Divide-and-conquer: machine-learning integrates mammalian and viral traits with network features to predict virus–mammal associations. Nat. Commun. 12, 3954 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wu, K., Li, W., Peng, G. & Li, F. Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor. Proc. Natl Acad. Sci. USA 106, 19970–19974 (2009).
Article CAS PubMed PubMed Central Google Scholar
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Article CAS PubMed PubMed Central Google Scholar
Rozewicki, J., Li, S., Amada, K. M., Standley, D. M. & Katoh, K. MAFFT-DASH: integrated protein sequence and structural alignment. Nucleic Acids Res. 47, W5–W10 (2019).
CAS PubMed PubMed Central Google Scholar
Suzek, B. E., Wang, Y., Huang, H., McGarvey, P. B. & Wu, C. H. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
Article CAS PubMed Google Scholar
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Article CAS PubMed PubMed Central Google Scholar
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Article CAS PubMed PubMed Central Google Scholar
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Article CAS PubMed Google Scholar
Daniel, P. F. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61, 1–10 (1992).
Google Scholar
Thakur, N., Gallo, G., Elreafey, A. M. E. & Bailey, D. Production of recombinant replication-defective lentiviruses bearing the SARS-CoV or SARS-CoV-2 attachment spike glycoprotein and their application in receptor tropism and neutralisation assays. Bio-Protocol 11, e4249 (2021).
Article CAS PubMed PubMed Central Google Scholar
Liu, C. et al. Receptor usage and cell entry of porcine epidemic diarrhea coronavirus. J. Virol. 89, 6121–6125 (2015).
Article CAS PubMed PubMed Central Google Scholar
Gallo, G. et al. Data from: Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Zenodo https://doi.org/10.5281/zenodo.17951484 (2026).
Bushell, K. M., Söllner, C., Schuster-Boeckler, B., Bateman, A. & Wright, G. J. Large-scale screening for novel low-affinity extracellular protein interactions. Genome Res. 18, 622–630 (2008).
Article CAS PubMed PubMed Central Google Scholar
Galaway, F., Yu, R., Constantinou, A., Prugnolle, F. & Wright, G. J. Resurrection of the ancestral RH5 invasion ligand provides a molecular explanation for the origin of P. falciparum malaria in humans. PLoS Biol. 17, e3000490 (2019).
Article CAS PubMed PubMed Central Google Scholar
Sun, Y., Gallagher-Jones, M., Barker, C. & Wright, G. J. A benchmarked protein microarray-based platform for the identification of novel low-affinity extracellular protein interactions. Anal. Biochem. 424, 45–53 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ishikawa, H., Meng, F., Kondo, N., Iwamoto, A. & Matsuda, Z. Generation of a dual-functional split-reporter protein for monitoring membrane fusion using self-associating split GFP. Protein Eng. Des. Sel. 25, 813–820 (2012).
Article CAS PubMed Google Scholar
Nyagwange, J. et al. Serum immunoglobulin G and mucosal immunoglobulin A antibodies from prepandemic samples collected in Kilifi, Kenya, neutralize SARS-CoV-2 in vitro. Int. J. Infect. Dis. 127, 11–16 (2023).
Article CAS PubMed Google Scholar
Winter, G. et al. DIALS: implementation and evaluation of a new integration package. Acta Crystallogr. D Struct. Biol. 74, 85–97 (2018).
Article CAS PubMed PubMed Central Google Scholar
Evans, P. R. & Murshudov, G. N. How good are my data and what is the resolution? Acta Crystallogr. D Biol. Crystallogr. 69, 1204–1214 (2013).
Article CAS PubMed PubMed Central Google Scholar
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Article CAS PubMed PubMed Central Google Scholar
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Article CAS PubMed PubMed Central Google Scholar
Oeffner, R. D. et al. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr. D Struct. Biol. 78, 1303–1314 (2022).
Article CAS PubMed PubMed Central Google Scholar
Croll, T. I. ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr. D Struct. Biol. 74, 519–530 (2018).
Article CAS PubMed PubMed Central Google Scholar
Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).
Article CAS PubMed PubMed Central Google Scholar
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
Article CAS PubMed PubMed Central Google Scholar
Williams, C. J. et al. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
Article CAS PubMed Google Scholar
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Article CAS PubMed PubMed Central Google Scholar
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Article PubMed PubMed Central Google Scholar
Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).
Article CAS PubMed Google Scholar
Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23, 7–9 (2006).
Article CAS PubMed Google Scholar
Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006).
Article PubMed PubMed Central Google Scholar
Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013).
Article CAS PubMed Google Scholar
Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520 (2009).
Article PubMed PubMed Central Google Scholar
Baele, G. et al. HIPSTR: highest independent posterior subtree reconstruction in TreeAnnotator X. Bioinformatics https://doi.org/10.1093/bioinformatics/btaf488 (2025).
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Article CAS PubMed Google Scholar
Krissinel, E. & Henrick, K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank T. Peacock, N. Thakur and J. Newman at The Pirbright Institute for provision of the TMPRSS2 plasmids, other protease constructs and the ACE2 expression library; S. Uyoga at the KEMRI–Wellcome Trust Research Program for the blood donor samples used in the serological assays; all the coronavirus research teams that deposited sequence information to GenBank that were selected by our greedy-algorithm-based approach; staff at the Diamond Light Source for beamtime (proposal mx36838) and staff of beamlines I04 and I24 for assistance with crystal testing and data collection. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission. D.B., A.D.N., S.C.G. and G.G. were supported by a BBSRC grant (BB/W006162/1). D.B. is supported by a BBSRC Institute Strategic Program Grant (BBS/E/PI/230002B) to The Pirbright Institute. D.B. and G.G. acknowledge the Pirbright Institute Flow Cytometry Facility (BBS/E/PI/23NB0003). A.D.N. is supported by a grant funded by the UK Department for Environment, Food and Rural Affairs (Defra, SE2947). The human receptor screening platform was supported by a MRC Partnership Grant (MR/X019705/1) awarded to D.B. and G.J.W. J.N., D.L. and B.A. are funded by Wellcome Trust grants 226141/Z/22/Z and 226130/Z/22/Z. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.

Author information

Authors and Affiliations

The Pirbright Institute, Woking, UK
Giulia Gallo, Antonello Di Nardo, Aghnianditya Kresno Dewantari & Dalan Bailey
Department of Pathology, University of Cambridge, Cambridge, UK
Giulia Gallo, Florence M. M. Buckley & Stephen C. Graham
KEMRI–Wellcome Trust Research Programme, Kilifi, Kenya
Doreen Lugano, Bernadette Ataku Kutima & James Nyagwange
Hull York Medical School, Department of Biology, York Biomedical Research Institute, University of York, York, UK
Adam J. Roberts & Gavin J. Wright
Loisaba Conservancy, Nanyuki, Kenya
Moses Okombo
Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
Aghnianditya Kresno Dewantari
Department of Zoology, National Museum of Kenya, Nairobi, Kenya
Bernard Agwanda

Authors

Giulia Gallo
View author publications
Search author on:PubMed Google Scholar
Antonello Di Nardo
View author publications
Search author on:PubMed Google Scholar
Doreen Lugano
View author publications
Search author on:PubMed Google Scholar
Adam J. Roberts
View author publications
Search author on:PubMed Google Scholar
Bernadette Ataku Kutima
View author publications
Search author on:PubMed Google Scholar
Moses Okombo
View author publications
Search author on:PubMed Google Scholar
Aghnianditya Kresno Dewantari
View author publications
Search author on:PubMed Google Scholar
Florence M. M. Buckley
View author publications
Search author on:PubMed Google Scholar
Gavin J. Wright
View author publications
Search author on:PubMed Google Scholar
James Nyagwange
View author publications
Search author on:PubMed Google Scholar
Bernard Agwanda
View author publications
Search author on:PubMed Google Scholar
Stephen C. Graham
View author publications
Search author on:PubMed Google Scholar
Dalan Bailey
View author publications
Search author on:PubMed Google Scholar

Contributions

G.G., A.D.N., S.C.G. and D.B. conceptualized the study and designed the methodology. G.G., A.J.R., B.A.K., A.K.D., F.M.M.B. and B.A. performed the investigation. G.G., D.L. and A.J.R. conducted formal analyses. G.G., A.K.D. and F.M.M.B. performed validation. G.G., A.D.N., D.L. and M.O. performed visualization. G.G., S.C.G. and D.B. wrote the original draft of the manuscript. All authors reviewed and edited the manuscript. S.C.G. and D.B. supervised the study. G.J.W., J.N., A.D.N., S.C.G. and D.B. acquired funding.

Corresponding authors

Correspondence to Stephen C. Graham or Dalan Bailey.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Huan Yan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Lentiviral pseudotyping of the alphaCoV library.

(a) Pseudoparticles were purified from the cellular supernatant using a 20% sucrose gradient, and centrifugated at 33,000 g for 2 h, at 4 °C. Pellets were resuspended in DMEM and lysed for immunoblot analysis. Spikes were detected using an antibody recognizing the attached HA tag, while efficiency of pseudoparticle purification was assessed using an antibody against the p24 capsid protein of HIV. (b) Pseudoparticles were also pre-treated with TDPK-trypsin for one hour at room temperature, before purification and immunoblot analysis, as described in (a). (c) Pseudoviruses of CcCoV-related viruses were purified from cellular supernatant using 20% sucrose, as previously described, and the incorporation of S on the particles assessed by immunoblot against an HA-tag fused to the S. To validate correct purification, expression of the lentiviral structural protein p24 was also assayed.

Extended Data Fig. 2 APN and ACE2 receptor screening in presence of human TMPRSS2, or preincubation with trypsin.

(a) HEK293T overexpressing libraries of APN or ACE2 from various mammals, as indicated, plus human TMPRSS2 were infected with the alphaCoV library of S pseudotypes. Expression of hTMPRSS2 did not impact the overall pattern of receptor usage. Initial screening was performed in technical triplicate, and positive results were validated in two additional independent experiments. (b) Treatment of pseudotyped alphaCoV S with trypsin does not affect usage of APN and ACE2 receptors. Pseudoviruses were incubated with 250μg/mL of TDPK-trypsin for one hour at room temperature, before infection of HEK293T overexpressing selected APN or ACE2. Cleavage of S did not impact receptor usage, or host range (through comparison to results shown in Fig. 1b and Fig. 1c). Experiments were performed once in technical triplicate.

Extended Data Fig. 3 Binding of a subset of alphaCoV predicted RBDs to known coronavirus receptors, as assessed by flow cytometry.

(a) Top: Phylogenetic tree of the alphaCoV library. Viruses with RBDs that were successfully produced recombinantly are highlighted in green, while those we could not obtain are coloured in red. Those viruses highlighted in grey were not tested (no outline) or alternatively the RBD could not be confidently identified by alignment (black outline). Below: Coomassie staining of the recombinant RBDs used in the binding assay. (b) Binding of viral RBDs to relevant APN, ACE2, TMPRSS2, DPP4 and DPEP1 receptors, as determined by antibody staining and flow cytometry. The numbers by each sample indicate the percentage of PE+ positive cells among the singlets (receptor + population). The numbers in coloured boxes show the percentage of APC+ cells (viral RBD + population) in this PE+ population. At least 25,000 events were acquired per sample.

Extended Data Fig. 4 Human CEACAM6 supports cell-cell fusion in the presence of CcCoV-KY43 S.

A fusion assay, based on a split firefly luciferase and GFP, was used to validate usage of human CEACAM6 as a cognate receptor for CcCoV-KY43 S. The average fold change from at least three independent experiments, each performed in triplicate, is shown along with the standard deviation. Statistical significance was examined using a one-sample t test of log₁₀ fold change against a theoretical mean of 0, with standard errors calculated using a pooled variance estimate across all groups to ensure robust variance estimation given the small sample sizes. P-values were adjusted for multiple comparisons using the Holm-Bonferroni correction (*p-value = 0.01, ns: non-significant difference).

Extended Data Fig. 5 Monoclonal antibodies B6.2 and 493424 are specific to human CEACAM6.

HEK293T cells transfected with plasmids encoding the indicated human CEACAMs were stained using the anti-CEACAM6 commercial monoclonal antibodies, B6.2 and 493424. FITC-conjugated anti-mouse was used as secondary antibody to detect binding of the monoclonals. Surface expression was then quantified using a flow cytometer.

Extended Data Fig. 6 Human CEACAM6 usage within the alphaCoV pseudotype S library.

Pseudovirus (PV) entry assays showed that only CcCoV-KY43 requires CEACAM6 for entry. Experiment was performed in technical triplicate, with the mean and SD relative light units (RLU) plotted.

Extended Data Fig. 7 Additional amino-acid substitutions in human CEACAM5 and CEACAM6 confirm a major contribution of position 63 to receptor specificity.

(a) Entry assays using CcCoV-KY43 S pseudotypes with HEK293T cells transiently transfected with human CEACAM5 and CEACAM6, or mutated forms of these proteins. The substitutions I63F and Q123H confirms that position 63 is a major determinant of CEACAM usage for KY43. (b) Flow cytometry data shows comparable expression of human CEACAM5 and CEACAM6, and their derived mutants, in HEK293T cells.

Extended Data Fig. 8 Entry of pseudotyped CcCoV|KY43 S using a wider library of human CEACAMs.

HEK293T cells were transiently transfected to individually overexpress human CEACAMs, as indicated. The experiment was performed with two independent replicates, both containing technical triplicates. The technical triplicates of two biological replicates, with SD, are shown as relative light units (RLU), compared to non-enveloped (NE) controls. ns: non-significant, ****p = <0.001 statistical analysis using two-way ANOVA.

Extended Data Fig. 9 Comparison of RBD binding to different human CEACAMs using bio-layer interferometry (BLI).

Ni-NTA sensors were loaded with His-tagged CcCoV|KY43 RBD (a) or CcCoV-2B (b) and then incubated with different human CEACAMs in a two-fold concentration gradient from 600 nM (dark colours) to 37.5 nM (light colours). BLI responses following subtraction of a reference sensor incubated with buffer are shown. The end of the association phase is marked (dashed vertical line). Human CEACAM6 binds strongly to CcCoV RBDs, and this binding is lost upon introduction of an I63F substitution. CcCoV RBDs bind weakly to human CEACAM5 and displays negligible binding to CEACAM1, CEACAM3 and CEACAM8. Data are representative of two independent experiments.

Extended Data Fig. 10 Usage of human CEACAM5, CEACAM6, and individual point mutants, by pseudotyped CcCoV-2A S.

(a) Pseudovirus entry of CcCoV-2A S in HEK293T cells expressing the indicated CEACAM protein. The F63I substitution in CEACAM5 increases CcCoV-2A S-mediated entry to the level seen with CEACAM6. Values are presented as relative light units (RLU). Assays were performed twice in technical triplicates with mean RLU and SD shown. (b) CcCoV-2A S entry into cells expressing CEACAM6 substitution I63F is reduced relative to wild type human CEACAM6 (hCEACAM6). Assays were performed twice in technical triplicates in triplicate with percentage PV entry and SD shown. Statistical analysis was performed using paired one-way ANOVA (****p-value < 0.0001) and paired t-test (***p-value = 0.0002), respectively.

Extended Data Fig. 11 Binding of recombinant btCoV-977 to Egyptian fruit bat, but not human, CEACAM6.

ELISA using the recombinant RBD of btCoV-977 to assess binding to soluble Egyptian fruit bat CEACAM6-like, or human CEACAM6, proteins. The mean OD from a representative experiment, performed in technical quadruplicates, is shown together with the SD.

Extended Data Fig. 12 Sequence determinants of human CEACAM6 usage.

(a) Alignment of RBD sequences from the CEACAM6-using alphaCoV RBDs included in this study. Loops that interact with human CEACAM6 are highlighted in red. Residues that interact with human CEACAM6 in the CcCoV-KY43 and CcCoV-2B complex structure are shown in bold red. Residues of btCoV-LN20 that were mutated to probe the sequence determinants of human CEACAM6 usage (see below) are underlined in grey. (b) Host range of btCoV-LN20, a CEACAM6 user, cannot be defined by point substitutions in its RBD. Entry assays using pseudotyped btCoV-LN20 S show that single mutations in the RBD, making it more like CcCoV-KY43 S, cannot recapitulate the broad host range of CcCoV-KY43, including usage of human CEACAM6. The background-corrected average of three independent experiments, each performed in triplicate, is shown.

Supplementary information

Supplementary Information (download DOCX )

Supplementary Figs. 1–22, Supplementary Tables 1–10 and Supplementary References.

Reporting Summary (download PDF )

Peer Review file (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Gallo, G., Di Nardo, A., Lugano, D. et al. Heart-nosed bat alphacoronaviruses use human CEACAM6 to enter cells. Nature 653, 180–189 (2026). https://doi.org/10.1038/s41586-026-10394-x

Download citation

Received: 23 July 2025
Accepted: 10 March 2026
Published: 22 April 2026
Version of record: 22 April 2026
Issue date: 07 May 2026
DOI: https://doi.org/10.1038/s41586-026-10394-x