Main

Even in sequenced species little is known about the Y chromosomes, because their heterochromatic state precludes sequence assembly into large and easily studied scaffolds; instead, short Y-linked scaffolds must be individually identified11,12. In most Drosophila species the Y chromosome is essential for male fertility13, and genetic data have identified between six and ten Y-linked factors required for this function14,15. The paucity of genes and the heterochromatic state of the chromosome suggested that, like the mammalian Y chromosome16, the Drosophila Y chromosome might be largely a degenerated X chromosome. The conservation of the fertility function in rather distant species fits well with the known conservation of the gene content of Drosophila chromosomal arms6,17. Therefore sex-chromosome evolutionary theory8,9, well-known patterns of chromosome evolution in Drosophila, and conservation of biological function all suggest that the Drosophila Y chromosome ought to be a degenerated X chromosome, with a few remaining and well-conserved genes. However, the 12 genes identified on the D. melanogaster Y chromosome were all acquired through gene duplications from the autosomes, rather than being a relic subset of the X-linked genes18,19,20,21,22. Furthermore, a Y-chromosome–autosome fusion in the D. pseudoobscura lineage made the ancestral Y chromosome into part of an autosome, and a new Y chromosome arose23. Both findings suggest that Drosophila Y chromosomes are labile and raise the question of how well conserved their gene content is.

The recent sequencing of 10 further Drosophila genomes24 allows a detailed study of this question. We first identified the putative orthologues of the 12 known D. melanogaster Y-linked genes18,19,20,21,22 in the remaining species (see Methods). Owing to the low coverage of the Y chromosome11 and its abundance of repetitive sequences, the sequences of almost all Y-linked genes have large gaps and sequencing errors, and different exons of the same gene are scattered in several scaffolds19,20 (Supplementary Fig. 1). These problems were corrected by direct sequencing of the products from polymerase chain reaction with reverse transcription (RT–PCR) and rapid amplification of complementary DNA ends (RACE) (see Methods) for all genes. We sequenced 150 kilobases (kb), and the average gene has one-third of its sequence generated de novo (Supplementary Table 1). Notably, we could not find the orthologue of the Pp1-Y1 gene in D. mojavensis or the orthologue of Ppr-Y in D. grimshawi, even among the raw sequencing traces. Synteny analysis strongly suggests that the Pp1-Y1 loss is real; degenerate PCR with a primer pair that amplifies Ppr-Y in a broad range of species confirmed its loss in D. grimshawi (Supplementary Discussion).

Molecular evolutionary analysis, revealing a substantial excess of synonymous over nonsynonymous changes in protein-coding genes, strongly indicates that all of these Y-linked genes are functional (Supplementary Table 2). Orthology was confirmed by phylogenetic analysis of all genes (Supplementary Fig. 2). We then tested their Y-linkage by PCR in males and females. Notably, many of the genes are not Y-linked in several species (Supplementary Fig. 3 and Table 1). The results of D. pseudoobscura and D. persimilis are expected, given the known Y-autosome fusion that occurred in this lineage23. The other linkage changes (Table 1) can be caused by individual movements of genes from the Y chromosome to the other chromosomes or vice versa. Movement direction was unambiguously ascertained by synteny analysis even in the kl-5 gene, with the data indicating two independent transfers to the Y chromosome (Fig. 1 and Supplementary Fig. 4). Using synteny (Supplementary Figs 4–8) and the known phylogenetic relationships among the sequenced species24, we could infer the direction and time of the gene movements, as shown in Fig. 2. Intron positions were conserved in all cases, which rules out retrotransposition and suggests a DNA-based mechanism for the gene movements (Supplementary Discussion). Most or all extant genes were acquired individually by the Y chromosome (as opposed to resulting from large segmental duplications), because they are not adjacent to each other at their original autosomal locations (Supplementary Figs 4–8 and Supplementary Table 3).
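The male/female PCR test for Y-linkage can be read as a simple decision rule. The sketch below is illustrative only: the rule (a product in males but not in females indicates Y-linkage) is the usual interpretation of such an assay and is assumed here rather than taken from the Methods, and the names `Linkage` and `call_linkage` are hypothetical.

```python
from enum import Enum


class Linkage(Enum):
    Y_LINKED = "Y-linked"
    NOT_Y_LINKED = "autosomal or X-linked"
    INCONCLUSIVE = "inconclusive"


def call_linkage(amplifies_in_males: bool, amplifies_in_females: bool) -> Linkage:
    """Classify a gene from PCR results on male and female genomic DNA.

    Assumed rule (not spelled out in the source): a product in males only
    implies Y-linkage; a product in both sexes implies the gene is not
    Y-linked; anything else (no male product) is treated as inconclusive.
    """
    if amplifies_in_males and not amplifies_in_females:
        return Linkage.Y_LINKED
    if amplifies_in_males and amplifies_in_females:
        return Linkage.NOT_Y_LINKED
    return Linkage.INCONCLUSIVE
```

For example, `call_linkage(True, False)` returns `Linkage.Y_LINKED`, whereas a gene that amplifies in both sexes is classified as `Linkage.NOT_Y_LINKED`.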

Table 1 Y-linkage across the 12 Drosophila species
Figure 1: Synteny analysis of the kl-5 gene.

a, b, The gene is Y-linked in all examined Drosophila species except D. willistoni (and D. pseudoobscura and D. persimilis), which might suggest a Y-chromosome-to-autosome transfer in the D. willistoni lineage. However, the conserved synteny between D. willistoni and Anopheles gambiae (a) shows that the autosomal D. willistoni location is ancestral (thick lines in b). Hence, there were two independent transfers of kl-5 to the Y chromosome (arrows in b). Note that the Drosophila CG3330 gene has no orthologue in Anopheles. See Supplementary Fig. 4 for the remaining species.

Figure 2: Gene movements in the Drosophila Y chromosome.

Gene gains (red arrows) and losses (blue arrows) were inferred by synteny. For changes that occurred before the split of the Drosophila and Sophophora subgenera (genes kl-2, kl-3, ORY, PRY and Ppr-Y; dashed arrows) there is no close outgroup for inferring the direction (gain versus loss) through synteny. However, all five genes are autosomal or X-linked in Anopheles, which suggests that they were acquired by the Y chromosome between 260 (that is, the Drosophila–Anopheles divergence time3,5) and 63 Myr ago.

It is clear from Fig. 2 that the gene content of the Drosophila Y chromosome is highly variable: among the 12 known Y-linked genes of D. melanogaster, only three (kl-2, kl-3 and ORY) are Y-linked in all sequenced species (we ignored the special case of the Y-chromosome–autosome fusion in the D. pseudoobscura lineage because the changes that happened there were not caused by individual gene gain and loss). All other genes (75% of the total) moved onto or off the Y chromosome at least once, or were lost. This contrasts sharply with the remainder of the genome, where 514 genes out of 13,000 (4% of the total) were found to have moved to different chromosome arms in the same set of species6, and may suggest that there is increased gene movement to and from the Y chromosome, as has been observed for the X chromosome25,26,27. However, the rate of gene movements in the Y chromosome is lower than that of similarly sized chromosome arms (Supplementary Discussion), and thus increased gene movement does not seem to be the main cause of the low conservation of Y-linked gene content.

The contrast between the Y chromosome and the other chromosomes seems to reflect their different evolutionary histories: whereas in the ancestor of all sequenced species the large chromosome arms had thousands of genes, the Y chromosome had very few genes (we know of five: kl-2, kl-3, Ppr-Y, PRY and ORY; Fig. 2). This, coupled with a small number of gene movements in both genomic compartments, would produce the present pattern of low conservation in the Y chromosome and high conservation in the other chromosomes. A possible caveat to this conclusion is that we do not know the full gene content of the Drosophila Y chromosome22. However, the low conservation of linkage we found should hold for the full gene set of the D. melanogaster Y chromosome, because the discovery of the 12 known Y-linked genes did not use any information from the other species (their genomic sequences were not even available at that time). Hence it is safe to conclude that most of the D. melanogaster Y-linked genes are recent acquisitions. In contrast, the mammalian Y chromosome mostly contains relic subsets of the X-linked genes, and variation in the Y-linked gene content among species reflects differential loss of these relic genes and some gene acquisitions28,29. In Drosophila no such relic genes have been found, and variation arises mainly from a continuing process of gene acquisition.

Figure 2 suggests that there are more gene gains than losses in the Y chromosome lineages examined, but these inferences were drawn using genes ascertained in D. melanogaster, raising a concern about bias. For example, D. virilis probably harbours Y-linked genes that were either acquired after its ancestor split from the D. melanogaster lineage, or were lost in the D. melanogaster lineage, and such genes would not be detected in the present study. Indeed, direct search in the D. virilis genome identified at least two Y-linked genes not shared with D. melanogaster (A.B.C. and A.G.C., unpublished data). Given the ascertainment issue, only the rate of gene gain can be estimated in the D. melanogaster lineage branches of the phylogeny, and only the rate of gene loss can be estimated in the other branches (Supplementary Fig. 9). This procedure produces an estimate of the raw rate of gene gain by the Y chromosome of 0.1111 genes per Myr (7 gains in 63 Myr), whereas the raw rate of gene loss is 0.0073 genes per Myr (2 losses in 275 Myr). After correcting for an ascertainment bias in the loss rate (Supplementary Methods), and under the assumption that the rates of gene gain and gene loss are homogeneous across the lineages, we found that the rate of gene gain is 10.9 times higher than the rate of gene loss (P = 0.003 under the null hypothesis of equal gain and loss rates), which strongly suggests that the gene content of the Y chromosome has indeed increased.
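The raw rates quoted above are simple ratios of event counts to summed branch lengths, and can be reproduced as a quick check. The sketch below uses only the counts and times given in the text; the function name `raw_rate` is hypothetical. It does not attempt the ascertainment correction of the Supplementary Methods, so the raw ratio it yields is not the corrected 10.9-fold figure.

```python
def raw_rate(events: int, branch_length_myr: float) -> float:
    """Return events per million years over the summed branch length."""
    return events / branch_length_myr


# Counts and branch lengths as stated in the text.
gain_rate = raw_rate(7, 63.0)    # gains on the D. melanogaster lineage branches
loss_rate = raw_rate(2, 275.0)   # losses on the remaining branches

# Uncorrected ratio of gain to loss; the published 10.9-fold value is
# obtained only after the ascertainment correction, which is not modelled here.
raw_ratio = gain_rate / loss_rate

print(f"gain: {gain_rate:.4f} per Myr, loss: {loss_rate:.4f} per Myr")
print(f"raw gain/loss ratio: {raw_ratio:.1f}")
```

The uncorrected ratio is larger than the corrected one, which is the direction expected if the raw loss rate is underestimated.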

To explore more fully the consequences of the ascertainment bias of gene content, we performed simulations of gene gain and loss using the observed phylogeny and branch lengths, and made inferences of gene loss conditional on observing the same genes in D. melanogaster (identical to the true ascertainment). Approximate Bayesian estimates of the posterior densities of the rates of gene gain and loss were obtained by a rejection-sampling procedure for 1,000 runs (Supplementary Methods). All 1,000 runs had a gene gain rate exceeding the gene loss rate across the phylogeny (Fig. 3 and Supplementary Fig. 11). Thus both the simulations and the analytical result provide strong evidence that the Y chromosome lineages examined have experienced a net gain in gene number. The origin of the Drosophila Y chromosome remains a controversial issue9,23; if one assumes that it arose from the degeneration of the X chromosome, then gene gains became important only more recently, after all of its ancestral genes (shared with the X chromosome) had been lost.
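The general shape of a rejection-sampling approach of this kind can be sketched as follows. This is a deliberately simplified toy, not the procedure of the Supplementary Methods: it ignores the tree topology and the conditioning on genes being observed in D. melanogaster, treats gains and losses as independent Poisson counts over the summed branch lengths quoted in the text (63 and 275 Myr), and uses uniform priors whose bounds are arbitrary assumptions. All function and constant names are hypothetical.

```python
import math
import random

OBSERVED_GAINS, GAIN_MYR = 7, 63.0        # counts and times from the text
OBSERVED_LOSSES, LOSS_MYR = 2, 275.0
MAX_GAIN_RATE, MAX_LOSS_RATE = 0.4, 0.04  # assumed uniform prior bounds


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def abc_rejection(n_accept: int, seed: int = 0) -> list[tuple[float, float]]:
    """Keep (gain_rate, loss_rate) draws whose simulated counts match the data."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        # Propose rates from the (assumed) uniform priors.
        gain_rate = rng.uniform(0.0, MAX_GAIN_RATE)
        loss_rate = rng.uniform(0.0, MAX_LOSS_RATE)
        # Simulate event counts under the toy Poisson model.
        gains = sample_poisson(rng, gain_rate * GAIN_MYR)
        losses = sample_poisson(rng, loss_rate * LOSS_MYR)
        # Reject unless the simulated counts equal the observed ones.
        if gains == OBSERVED_GAINS and losses == OBSERVED_LOSSES:
            accepted.append((gain_rate, loss_rate))
    return accepted


posterior = abc_rejection(n_accept=200)
net_gain = [gain - loss for gain, loss in posterior]
print(f"mean net gain rate: {sum(net_gain) / len(net_gain):.3f} genes per Myr")
```

Because the data are discrete counts, accepting only exact matches gives draws from the posterior of this toy model; its numbers should not be expected to reproduce those in Fig. 3, which come from the full phylogeny-aware simulation.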

Figure 3: Posterior density of the net rate of Y-linked gene gain in the Drosophila phylogeny.

A Bayesian rejection sampling procedure was applied (see text) to yield 1,000 estimates of the rates of gene gain and loss conditional on the observed gains and losses of genes on the Y chromosome, and conditional on the genes being observed in D. melanogaster (matching the actual ascertainment of Y genes used in this study). The average net gain rate (gain rate minus loss rate) is 0.130 genes per Myr, and all 1,000 simulations had a higher rate of gene gain than loss (range of net gain rate: 0.035 to 0.352).

Given the restrictive characteristics of the Y chromosome (for example, its heterochromatic state) it is puzzling that genes moved there. Several hypotheses, ranging from neutrality to positive selection, could explain this, but our data do not allow definitive support for one model (Supplementary Discussion). The Y-linked gene Suppressor of Stellate, which is a recent acquisition in the D. melanogaster lineage, may be a case of positive selection30 (we excluded it because it is multi-copy and RNA-encoding). Whatever its cause, the finding that the Y chromosome has gained genes has interesting consequences. A chromosome that on average has gained genes and yet has few of them must be relatively young. Further Diptera genome sequences may shed light on this issue. But the data in hand already strongly support the conclusion that the gene content of the Drosophila Y chromosome is younger than that of the other chromosomes, and that gene acquisitions have had a prominent role in its evolution.

Methods Summary

Genomic sequences

We used the WGS3 assembly of D. melanogaster (accession AABU00000000), the TIGR assembly of D. pseudoobscura (accession AAFS01000000) and the CAF1 assemblies for all other species (available at http://rana.lbl.gov/drosophila/caf1.html). Full details of the strains used, sequencing and assembly strategies are described in ref. 24.

Search of orthologues of D. melanogaster Y-linked genes

We searched for these genes with TblastN20, using as queries the protein sequences of the D. melanogaster Y-linked genes18,19,20,21,22 and as databases the genomes of the remaining species. Orthology was confirmed by phylogenetic analysis (Supplementary Fig. 2). Supplementary Table 1 shows the accession numbers of the finished CDS sequences.

Molecular biology methods

DNA and RNA were extracted from the same strains used for the genome sequencing24. RNA and DNA extractions, PCR and RT–PCR were performed using standard protocols19,20. 3′ RACE and 5′ RACE were performed with the Invitrogen Gene Racer Kit following the instructions of the manufacturer, using testis or whole body total RNA (in the case of D. grimshawi) as templates. DNA sequencing was done at Macrogen (Korea) and the Cornell DNA sequencing core facility.