Table 3 Summary of the top outlier pentapeptides.

From: Global pentapeptide statistics are far away from expected distributions

| Protein region | Permutation class | Number of sequences in the class | Total occurrences of the class | Average occurrences of the class | Variance | Sequence | Occurrences of the sequence | %Total | Z score |
|---|---|---|---|---|---|---|---|---|---|
| Under-represented | | | | | | | | | |
| DM | LLLLS | 5 | 1103203 | 220641 | 176512 | SLLLL | 129826 | 11.8 | −216 |
| DM | AAELR | 60 | 7061680 | 117695 | 115733 | LEARA | 46773 | 0.7 | −208 |
| DM | AELLR | 60 | 5987177 | 99786 | 98123 | LELRA | 35795 | 0.6 | −204 |
| DM | AAAAL | 5 | 2256485 | 451297 | 361038 | LAAAA | 335710 | 14.9 | −192 |
| DM | GGGGI | 5 | 442963 | 88593 | 70874 | GGGGI | 41341 | 9.3 | −177 |
| DM | AAAGL | 20 | 4815784 | 240789 | 228750 | LGAAA | 156132 | 3.2 | −177 |
| DM | AEKLL | 60 | 4064306 | 67738 | 66609 | LKLEA | 24337 | 0.6 | −168 |
| DM | ELLRR | 30 | 2152724 | 71757 | 69366 | RLELR | 29344 | 1.4 | −161 |
| DM | AAALR | 20 | 4377052 | 218853 | 207910 | LRAAA | 145448 | 3.3 | −161 |
| DM | EEKLL | 30 | 1999573 | 66652 | 64431 | ELKLE | 26021 | 1.3 | −160 |
| Over-represented | | | | | | | | | |
| DM | EGHKT | 120 | 827985 | 6900 | 6842 | HTGEK | 413267 | 49.9 | 4913 |
| DM | FMNSW | 120 | 268723 | 2239 | 2221 | NMSFW | 198813 | 74.0 | 4171 |
| DM | PTVWY | 120 | 351052 | 2925 | 2901 | WTVYP | 209428 | 59.7 | 3834 |
| DM | GKLST | 120 | 3009236 | 25077 | 24868 | GKSTL | 595718 | 19.8 | 3619 |
| DM | EGKPY | 120 | 911766 | 7598 | 7535 | GEKPY | 318604 | 34.9 | 3583 |
| DM | EGKPT | 120 | 1461844 | 12182 | 12081 | TGEKP | 389313 | 26.6 | 3431 |
| DM | GTVWY | 120 | 445075 | 3709 | 3678 | GWTVY | 210756 | 47.4 | 3414 |
| DM | EGMWY | 120 | 217167 | 1810 | 1795 | WMGYE | 145689 | 67.1 | 3396 |
| DM | FMNPR | 120 | 339635 | 2830 | 2807 | FPRMN | 177146 | 52.2 | 3290 |
| DM | FLMSW | 120 | 362542 | 3021 | 2996 | MSFWL | 182120 | 50.2 | 3272 |
| Under-represented | | | | | | | | | |
| ND | GGPPP | 10 | 586979 | 58698 | 52828 | GPGPP | 10793 | 1.8 | −208 |
| ND | AAAPP | 10 | 1239944 | 123994 | 111595 | APPAA | 71014 | 5.7 | −159 |
| ND | AGGGG | 5 | 839291 | 167858 | 134287 | GAGGG | 114103 | 13.6 | −147 |
| ND | DDSSS | 10 | 606514 | 60651 | 54586 | SDDSS | 26740 | 4.4 | −145 |
| ND | GPPPP | 5 | 388310 | 77662 | 62130 | PGPPP | 42721 | 11.0 | −140 |
| ND | DDDSS | 10 | 487092 | 48709 | 43838 | DDDSS | 19800 | 4.1 | −138 |
| ND | GGGRR | 10 | 580934 | 58093 | 52284 | GRRGG | 26697 | 4.6 | −137 |
| ND | AAGGG | 10 | 948949 | 94895 | 85405 | AGGGA | 58015 | 6.1 | −126 |
| ND | RRRSS | 10 | 510453 | 51045 | 45941 | RSSRR | 24251 | 4.8 | −125 |
| ND | RRSSS | 10 | 563675 | 56368 | 50731 | SRSSR | 29134 | 5.2 | −121 |
| Over-represented | | | | | | | | | |
| ND | DHKPW | 120 | 141538 | 1179 | 1170 | HPDKW | 122449 | 86.5 | 3546 |
| ND | DKPTW | 120 | 168175 | 1401 | 1390 | PDKWT | 121326 | 72.1 | 3217 |
| ND | KQTVW | 120 | 156893 | 1307 | 1297 | KWTVQ | 116218 | 74.1 | 3191 |
| ND | ILQTW | 120 | 171694 | 1431 | 1419 | QITLW | 121321 | 70.7 | 3183 |
| ND | DGKMP | 120 | 207222 | 1727 | 1712 | KPGMD | 129015 | 62.3 | 3076 |
| ND | DKTVW | 120 | 175850 | 1465 | 1453 | DKWTV | 117182 | 66.6 | 3036 |
| ND | GKLMP | 120 | 249781 | 2082 | 2064 | LKPGM | 128663 | 51.5 | 2786 |
| ND | ILPQT | 120 | 431365 | 3595 | 3565 | PQITL | 122621 | 28.4 | 1994 |
| ND | FIPPS | 60 | 241704 | 4028 | 3961 | FPISP | 110649 | 45.8 | 1694 |
| ND | EIPST | 120 | 536102 | 4468 | 4430 | SPIET | 106093 | 19.8 | 1527 |
| Under-represented | | | | | | | | | |
| NN | AGGGG | 5 | 395836 | 79167 | 63334 | AGGGG | 52481 | 13.3 | −106 |
| NN | GGGGN | 5 | 157711 | 31542 | 25234 | NGGGG | 17608 | 11.2 | −88 |
| NN | AAAPP | 10 | 469392 | 46939 | 42245 | APPAA | 29107 | 6.2 | −87 |
| NN | GGPPP | 10 | 100799 | 10080 | 9072 | GPGPP | 2337 | 2.3 | −81 |
| NN | AAGGG | 10 | 484217 | 48422 | 43580 | AGGGA | 32303 | 6.7 | −77 |
| NN | GGGNN | 10 | 126506 | 12651 | 11386 | GGGNN | 4638 | 3.7 | −75 |
| NN | GGGGT | 5 | 151280 | 30256 | 24205 | GTGGG | 18821 | 12.4 | −74 |
| NN | LLQQQ | 10 | 160339 | 16034 | 14431 | QQLQL | 8129 | 5.1 | −66 |
| NN | DDSSS | 10 | 179338 | 17934 | 16140 | SDDSS | 9821 | 5.5 | −64 |
| NN | DDDSS | 10 | 144807 | 14481 | 13033 | DDSSD | 7326 | 5.1 | −63 |
| Over-represented | | | | | | | | | |
| NN | CEFHK | 120 | 41470 | 346 | 343 | KHCFE | 26831 | 64.7 | 1431 |
| NN | CEFHV | 120 | 44399 | 370 | 367 | HCFEV | 27459 | 61.8 | 1414 |
| NN | CFHKS | 120 | 45270 | 377 | 374 | SKHCF | 26211 | 57.9 | 1336 |
| NN | DESTV | 120 | 348198 | 2902 | 2877 | TDEVS | 48183 | 13.8 | 844 |
| NN | CEFVV | 60 | 44179 | 736 | 724 | CFEVV | 22635 | 51.2 | 814 |
| NN | HKSSV | 60 | 89154 | 1486 | 1461 | VSSKH | 31133 | 34.9 | 776 |
| NN | DEFVV | 60 | 148928 | 2482 | 2441 | FEVVD | 37821 | 25.4 | 715 |
| NN | CHKSS | 60 | 34640 | 577 | 568 | SSKHC | 16273 | 47.0 | 659 |
| NN | DDERT | 60 | 122092 | 2035 | 2001 | DRTDE | 30087 | 24.6 | 627 |
| NN | DERTV | 120 | 255182 | 2127 | 2109 | RTDEV | 30351 | 11.9 | 615 |

  1. The ten most under-represented and the ten most over-represented pentapeptides in each of the DM, ND and NN protein regions.
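The numeric columns of the table appear to be linked by simple arithmetic, which the sketch below reproduces for two rows. It assumes that each of the n sequences in a permutation class is equally likely, so that the count of one sequence follows a binomial model with p = 1/n. That assumption is inferred from the tabulated values, not taken from the article's methods, and the function names are hypothetical.

```python
import math

def class_stats(total_occurrences, n_sequences):
    """Expected count and variance of one sequence in a permutation class.

    Assumption (inferred from the table): every sequence in the class is
    equally likely, so the count is binomial with p = 1 / n_sequences.
    """
    p = 1.0 / n_sequences
    mean = total_occurrences * p                   # "Average occurrences of the class"
    variance = total_occurrences * p * (1.0 - p)   # "Variance"
    return mean, variance

def z_score(observed, total_occurrences, n_sequences):
    """Deviation of the observed count from the class mean, in standard deviations."""
    mean, variance = class_stats(total_occurrences, n_sequences)
    return (observed - mean) / math.sqrt(variance)

def percent_of_total(observed, total_occurrences):
    """Share of the class occurrences taken by one sequence ("%Total")."""
    return 100.0 * observed / total_occurrences

# First DM row: class LLLLS, sequence SLLLL.
mean, variance = class_stats(1103203, 5)
print(round(mean), round(variance))               # 220641 176512
print(round(z_score(129826, 1103203, 5)))         # -216
print(round(percent_of_total(129826, 1103203), 1))  # 11.8

# First over-represented DM row: class EGHKT, sequence HTGEK.
print(round(z_score(413267, 827985, 120)))        # 4913
```

Under this reading, the very large positive Z scores arise because a single sequence accounts for a large share of a class whose expected per-sequence count is small.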