Table 1 Proteomes and structure models considered.

From: Real-time structure search and structure classification for AlphaFold protein models

| Species | Common name | Reference proteome | # unique UniProt IDs | # original models | # domain models | # original models with no confident domains (1D) |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana | Arabidopsis | UP000006548 | 27,434 | 27,434 | 37,682 | 5722 |
| Caenorhabditis elegans | Nematode worm | UP000001940 | 19,694 | 19,694 | 26,160 | 4277 |
| Candida albicans | C. albicans | UP000000559 | 5974 | 5974 | 9978 | 743 |
| Danio rerio | Zebrafish | UP000000437 | 24,664 | 24,664 | 42,135 | 2530 |
| Dictyostelium discoideum | Dictyostelium | UP000002195 | 12,622 | 12,622 | 18,963 | 2986 |
| Drosophila melanogaster | Fruit fly | UP000000803 | 13,458 | 13,458 | 19,881 | 2335 |
| Escherichia coli | E. coli | UP000000625 | 4363 | 4363 | 5397 | 417 |
| Glycine max | Soybean | UP000008827 | 55,799 | 55,799 | 72,217 | 14,146 |
| Homo sapiens | Human | UP000005640 | 20,504 | 23,391 | 44,827 | 3302 |
| Leishmania infantum | L. infantum | UP000008153 | 7924 | 7924 | 12,257 | 1579 |
| Methanocaldococcus jannaschii | M. jannaschii | UP000000805 | 1773 | 1773 | 2097 | 131 |
| Mus musculus | Mouse | UP000000589 | 21,615 | 21,615 | 35,216 | 2477 |
| Mycobacterium tuberculosis | M. tuberculosis | UP000001584 | 3988 | 3988 | 5170 | 351 |
| Oryza sativa | Asian rice | UP000059680 | 43,649 | 43,649 | 39,775 | 19,756 |
| Plasmodium falciparum | P. falciparum | UP000001450 | 5187 | 5187 | 7283 | 1162 |
| Rattus norvegicus | Rat | UP000002494 | 21,272 | 21,272 | 33,818 | 2664 |
| Saccharomyces cerevisiae | Budding yeast | UP000002311 | 6040 | 6040 | 9837 | 967 |
| Schizosaccharomyces pombe | Fission yeast | UP000002485 | 5128 | 5128 | 8173 | 637 |
| Staphylococcus aureus | S. aureus | UP000008816 | 2888 | 2888 | 3283 | 415 |
| Trypanosoma cruzi | T. cruzi | UP000002296 | 19,036 | 19,036 | 26,205 | 5436 |
| Zea mays | Maize | UP000007305 | 39,299 | 39,299 | 48,433 | 11,582 |

1. For each proteome, the table gives three things: the number of unique proteins, the total numbers of original and domain models, and the number of original models that contain no confident domains. Confident domains are defined in the main text. For human, the number of original models (23,391) does not match the number of unique proteins (20,504). This is because the human structure predictions retrieved from the AlphaFold Database include models that are 1400-residue slices of larger proteins.
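The relationships described in the footnote can be checked directly against the table values. The sketch below assumes a plain transcription of Table 1 into a Python dictionary. The names `TABLE_1`, `mismatched_proteomes`, `no_domain_fraction` and `totals` are hypothetical helpers for illustration and do not come from the paper.

It flags every proteome whose original-model count differs from its unique-protein count. It also computes, for each proteome, the fraction of original models that have no confident domain.

```python
# Transcription of Table 1 (hypothetical helper data, not from the paper's code):
# species -> (unique UniProt IDs, original models, domain models,
#             original models with no confident domains)
TABLE_1 = {
    "Arabidopsis thaliana": (27434, 27434, 37682, 5722),
    "Caenorhabditis elegans": (19694, 19694, 26160, 4277),
    "Candida albicans": (5974, 5974, 9978, 743),
    "Danio rerio": (24664, 24664, 42135, 2530),
    "Dictyostelium discoideum": (12622, 12622, 18963, 2986),
    "Drosophila melanogaster": (13458, 13458, 19881, 2335),
    "Escherichia coli": (4363, 4363, 5397, 417),
    "Glycine max": (55799, 55799, 72217, 14146),
    "Homo sapiens": (20504, 23391, 44827, 3302),
    "Leishmania infantum": (7924, 7924, 12257, 1579),
    "Methanocaldococcus jannaschii": (1773, 1773, 2097, 131),
    "Mus musculus": (21615, 21615, 35216, 2477),
    "Mycobacterium tuberculosis": (3988, 3988, 5170, 351),
    "Oryza sativa": (43649, 43649, 39775, 19756),
    "Plasmodium falciparum": (5187, 5187, 7283, 1162),
    "Rattus norvegicus": (21272, 21272, 33818, 2664),
    "Saccharomyces cerevisiae": (6040, 6040, 9837, 967),
    "Schizosaccharomyces pombe": (5128, 5128, 8173, 637),
    "Staphylococcus aureus": (2888, 2888, 3283, 415),
    "Trypanosoma cruzi": (19036, 19036, 26205, 5436),
    "Zea mays": (39299, 39299, 48433, 11582),
}


def mismatched_proteomes(table):
    """Species whose original-model count differs from the unique-protein count."""
    return [sp for sp, (uniq, orig, _, _) in table.items() if uniq != orig]


def no_domain_fraction(table):
    """Fraction of original models containing no confident domain, per species."""
    return {sp: nodom / orig for sp, (_, orig, _, nodom) in table.items()}


def totals(table):
    """Column-wise sums over all proteomes, in the same column order as the rows."""
    return tuple(sum(col) for col in zip(*table.values()))


if __name__ == "__main__":
    print(mismatched_proteomes(TABLE_1))
    fractions = no_domain_fraction(TABLE_1)
    for species, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{species:32s} {frac:.3f}")
```

Run on the transcribed values, `mismatched_proteomes` returns only `Homo sapiens`, which matches the footnote. The totals of original models and unique IDs therefore differ by exactly the human excess (23,391 − 20,504 = 2887). Asian rice has the highest share of original models without a confident domain, at roughly 45%.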