Introduction

Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are made of nucleotides (A, T or U, C and G), the monomer units of nucleic acids. Nucleotides can be grouped in three different ways according to their chemical properties: purines R = {A, G} and pyrimidines Y = {C, T/U}; amino group M = {A, C} and keto group K = {G, T/U}; strong H-bond group S = {C, G} and weak H-bond group W = {A, T/U}1. The two kinds of nitrogen-containing bases, purines and pyrimidines, were first isolated from hydrolysates of nucleic acids and identified using classical methods of organic chemistry. An important contribution was made by Emil Fischer, who is credited with the earliest synthesis of purines (1897)2. Purines consist of a nine-membered double ring of carbon and nitrogen atoms (two fused rings), whereas pyrimidines have a six-membered single ring of carbon and nitrogen3,4. Among the three classifications, namely purine/pyrimidine, strong/weak hydrogen bond and amino/keto, our analysis provides strong evidence that the organization of purine and pyrimidine bases over miRNAs is crucial. We therefore aim to understand the organization of these two chemical classes over one family of non-coding RNAs, the microRNAs, using several mathematical parameters.

MicroRNAs (abbreviated miRNAs) contain about 18–25 ribonucleotides and can play important gene regulatory roles by pairing to the messages of protein-coding genes, specifying messenger RNA (mRNA) cleavage or repression of productive translation5,6,7. miRNA genes are one of the more abundant classes of regulatory genes in animals, estimated to comprise between 0.5 and 1 percent of the predicted genes in worms, flies and humans, raising the prospect that they could have many more regulatory functions than those uncovered to date8,9,10. The main function of miRNAs is to down-regulate gene expression11. One miRNA may target several mRNAs, and a particular mRNA may be regulated by multiple miRNAs12,13,14,15,16,17, so it is important to identify miRNA targets accurately. miRNAs control gene expression by targeting mRNAs and triggering either translation repression or RNA degradation18,19,20. Their aberrant expression may be involved in various human diseases, including cancer21,22,23,24,25,26,27. miRNA regulatory mechanisms are complex and there is still no high-throughput, low-cost technique for screening miRNA targets28,29,30,31,32. It has been reported that each miRNA can potentially regulate around 100 or more target mRNAs and that about 30% of all human genes are regulated by miRNAs33.

In this article we attempt to decipher the patterns of purine and pyrimidine distribution over the miRNAs of three species from the family Hominidae (human, gorilla and chimpanzee) and two species from the family Muridae (mouse and rat). We wish to understand how the purine and pyrimidine bases are organized over the sequence and how far apart bases of the same type can be placed. One of our prime aims is to determine which of the two types of bases dominates the other in terms of frequency density over the sequence. A simple binomial distribution (i.e. location-independent occurrence of the bases) fails to describe the observed variation of purines and pyrimidines, which encourages us to look for further patterns. We investigate the self-organization of the purine and pyrimidine bases for all the miRNAs of the five species through the fractal dimension of the indicator matrix. The autocorrelation of purine-pyrimidine bases over the miRNAs is determined through the Hurst exponent, and many miRNAs are found to have identical autocorrelation even though their purine-pyrimidine organization differs. To compare the nearness of miRNAs in terms of their purine-pyrimidine distribution, the Hamming distance is computed between all pairs of miRNAs. The purine-pyrimidine distance patterns, together with their frequency distributions, are obtained for all the miRNAs of the five species, and all distinct patterns are enumerated. We draw the reader’s attention to one finding in particular: a single human miRNA, hsa-miR-6124 MIMAT0024597, is made of purine bases only, whereas no miRNA of any of the five species is made entirely of pyrimidines.

In order to understand the association between miRNAs and their target mRNAs, we take a set of human mRNAs and, based on the quantitative measures, examine the set of miRNAs associated with these targets.

Materials and Methods

Dataset Specification

From miRBase (a miRNA database: http://www.mirbase.org/, Release 21)34, we take a total of 2588 mature miRNAs of human, 357 of gorilla and 587 of chimpanzee from the family Hominidae, and 1915 mature miRNAs of mouse and 765 of rat from the family Muridae. Each miRNA is assigned a code: h1 to h2588 for human, g1 to g357 for gorilla, p1 to p587 for chimpanzee, m1 to m1915 for mouse and r1 to r765 for rat (Supplementary Table S1). We then transform the miRNA sequences (A, U, C, G) into binary sequences (1’s and 0’s) according to the following rules:

$$\begin{array}{c}A/G\to 1;\\ U/C\to 0.\end{array}$$

That is, purine and pyrimidine nucleotide bases are encoded as 1 and 0 respectively in the transformed binary sequences of miRNAs. We thus have five datasets of binary sequences, one for each of the five species human, gorilla, chimpanzee, mouse and rat. All the computational codes are written in MATLAB R2016a. The procedures are described in detail below and the source codes (MATLAB 2016 onwards) are provided in Supplementary Table S2, so the results of the methods discussed in this article can be reproduced for any dataset.
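The encoding rule above can be sketched in a few lines of Python. This is only an illustration and not the MATLAB code of Supplementary Table S2; the function name `encode_purine_pyrimidine` is hypothetical, and mapping T to 0 alongside U is an added convenience for DNA-style input. The 22-nt input is an illustrative sequence chosen so that the output equals the binary string quoted for Fig. 1.

```python
# Purine (A, G) -> 1 and pyrimidine (U, C) -> 0, following the rule above.
# T is also mapped to 0 (an assumption, so that DNA-style input is accepted).
PURINE_CODE = {"A": "1", "G": "1", "U": "0", "C": "0", "T": "0"}


def encode_purine_pyrimidine(seq):
    """Return the binary purine/pyrimidine string for a nucleotide sequence."""
    try:
        return "".join(PURINE_CODE[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"unexpected nucleotide symbol: {err.args[0]!r}") from None


# Illustrative 22-nt input sequence.
print(encode_purine_pyrimidine("AAGAUGUGGAAAAAUUGGAAUC"))
```

Raising an error on unknown symbols, rather than silently skipping them, keeps the binary string the same length as the input, which the later length-dependent measures rely on.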

Fractal Dimension of Indicator Matrices

Here we encode each binary sequence into its indicator matrix35,36. Several other techniques exist for finding the fractal dimension and self-organization structure of DNA sequences37,38. Consider the set S = {0, 1} and the indicator function f : S × S → {0, 1} defined, for all (x, y) ∈ S × S, by

$$f(x,y)=\begin{cases}1, & \text{if } x=y\\ 0, & \text{if } x\ne y\end{cases}$$
(1)

This indicator function yields a binary image of the binary sequence as a two-dimensional dot-plot: for a sequence x_1 x_2 … x_N, entry (i, j) of the N × N indicator matrix is f(x_i, x_j). The binary image obtained from the indicator matrix visualizes the distribution of ones and zeros within the sequence and reveals a kind of autocorrelation between the ones and zeros of the same sequence. It is easily drawn by assigning a black dot to 1 and a white dot to 0. An example of an indicator matrix is shown in Fig. 1 for the binary sequence of hsa-miR-576-3p MIMAT0004796: 1111010111111100111100.

Figure 1 Indicator matrix for the binary sequence of hsa-miR-576-3p MIMAT0004796: 1111010111111100111100.
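As a minimal sketch of Equation (1), the following Python builds the indicator matrix of a binary string and renders it as a text dot-plot. The helper names `indicator_matrix` and `render_dot_plot` are hypothetical, and the characters `#` (for 1, a black dot) and `.` (for 0, a white dot) are an arbitrary choice standing in for the image.

```python
def indicator_matrix(bits):
    """N x N matrix whose (i, j) entry is 1 when bits[i] == bits[j], else 0."""
    return [[1 if a == b else 0 for b in bits] for a in bits]


def render_dot_plot(matrix):
    """Text rendering: '#' for 1 (black dot), '.' for 0 (white dot)."""
    return "\n".join("".join("#" if v else "." for v in row) for row in matrix)


# Binary sequence quoted in the text for Fig. 1.
matrix = indicator_matrix("1111010111111100111100")
print(render_dot_plot(matrix))
```

By construction the matrix is symmetric and its diagonal consists of ones, since every position matches itself.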

The indicator matrix gives an idea of the “fractal-like” distribution of ones and zeros (purines and pyrimidines). The fractal dimension of the graphical representation of the indicator matrix can be computed from p(n), the average number of 1’s in randomly taken n × n minors of the N × N indicator matrix. Using p(n), the fractal dimension (FD) is defined as follows.

$$FD=\frac{1}{N}\sum_{n=2}^{N}\frac{\log p(n)}{\log n}$$
(2)

The self-organization of the purine and pyrimidine bases for all the miRNAs can be obtained through the fractal dimension of the indicator matrix.
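Equation (2) can be sketched as follows. Two assumptions are made here that the text does not settle: the “n × n minors” are read as contiguous n × n sub-blocks of the indicator matrix, and p(n) is taken as the exact average over all such sub-blocks rather than over a random sample (which makes the result deterministic). The MATLAB code in Supplementary Table S2 may differ on both points; the function names are hypothetical.

```python
import math


def indicator_matrix(bits):
    """N x N matrix whose (i, j) entry is 1 when bits[i] == bits[j], else 0."""
    return [[1 if a == b else 0 for b in bits] for a in bits]


def fractal_dimension(bits):
    """FD = (1/N) * sum_{n=2..N} log p(n) / log n, as in Equation (2).

    p(n) is the mean number of 1's over all contiguous n x n sub-blocks
    (an assumption standing in for 'randomly taken n x n minors').
    """
    size = len(bits)
    if size < 2:
        raise ValueError("need a sequence of length at least 2")
    matrix = indicator_matrix(bits)
    total = 0.0
    for n in range(2, size + 1):
        counts = [
            sum(matrix[i][j] for i in range(r, r + n) for j in range(c, c + n))
            for r in range(size - n + 1)
            for c in range(size - n + 1)
        ]
        p_n = sum(counts) / len(counts)  # > 0: diagonal blocks always hold 1's
        total += math.log(p_n) / math.log(n)
    return total / size


print(round(fractal_dimension("1111010111111100111100"), 4))
print(round(fractal_dimension("1" * 20), 4))
```

For a sequence made of a single symbol every sub-block is full, so p(n) = n² and each term equals 2; a length-20 sequence of this kind therefore gives 2 × 19/20 = 1.9 under this reading.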

Hurst Exponent of Binary Sequences

The Hurst exponent (HE) quantifies the autocorrelation of a time series and appears in several areas of applied mathematics39,40,41. The Hurst exponent ranges from 0 to 1. A value of HE in the interval [0, 0.5) indicates a negatively autocorrelated time series and a value of HE in the interval (0.5, 1] indicates a positively autocorrelated time series. A value of HE = 0.5 indicates a random series, in which the variable is uncorrelated with its past values. The further HE is from 0.5, the stronger the correlation.

The Hurst exponent of a binary sequence {x_n} is defined as

$${(\frac{n}{2})}^{HE}=\frac{R(n)}{S(n)}$$
(3)

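where \(m=\frac{1}{n}\sum_{i=1}^{n}{x}_{i}\) is the mean of the sequence, \(S(n)=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({x}_{i}-m)}^{2}}\) is its standard deviation, \(Y(i)=\sum_{j=1}^{i}({x}_{j}-m)\) is the cumulative deviation from the mean and \(R(n)=\max_{1\le i\le n}Y(i)-\min_{1\le i\le n}Y(i)\) is the range of the cumulative deviation. Equivalently, \(HE=\log (R(n)/S(n))/\log (n/2)\).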

The auto correlation of purine-pyrimidine bases for all the miRNAs is obtained through the Hurst exponent.
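A minimal sketch of Equation (3), using the usual rescaled-range quantities (mean m, standard deviation S(n), cumulative deviation Y(i) and its range R(n)) and solving for HE over the whole sequence in a single step. Returning NaN when S(n) = 0 (a sequence made of one symbol) or when n ≤ 2 is a choice made here, since the text does not say how such cases are handled; the function name is hypothetical.

```python
import math


def hurst_exponent(bits):
    """Solve (n/2)**HE = R(n)/S(n) for HE on the full binary sequence."""
    x = [int(b) for b in bits]
    n = len(x)
    if n <= 2:
        return float("nan")  # log(n/2) would be zero or negative
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    if s == 0:
        return float("nan")  # constant sequence: R/S is undefined
    cumulative, running = [], 0.0
    for v in x:
        running += v - m  # Y(i): cumulative deviation from the mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)
    return math.log(r / s) / math.log(n / 2)


print(hurst_exponent("10" * 11))   # strictly alternating sequence
print(hurst_exponent("11110000"))  # one block of each symbol
```

Under this definition a strictly alternating sequence has R(n) = S(n) and hence HE = 0, whereas a sequence split into one block of each symbol gives a large range and an HE at the upper end of the scale.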

Hamming Distance of Binary Sequences

The Hamming distance (HD) between two binary strings of equal length is the number of bits in which they differ42,43,44. Since the lengths of miRNAs may differ, special care is needed. Suppose there are two miRNAs S_n and S_m of lengths n and m respectively (n ≥ m); then

$$HD({S}_{n},\,{S}_{m})=min(hd({S}_{n},{S}_{m}))$$
(4)

where a window of length m containing S_m slides over S_n from the left alignment to the right alignment; at each of the n − m + 1 positions the Hamming distance (hd) between S_m and the aligned segment of S_n is calculated, and the minimum hd value is taken as the Hamming distance HD of the two binary sequences.

For example, take the two binary sequences S_n = 010100 and S_m = 1101. Sliding S_m over S_n with a window of length 4, from the left alignment to the right alignment, S_m is compared with the segments 0101, 1010 and 0100 of S_n in turn, giving hd(0101, 1101) = 1, hd(1010, 1101) = 3 and hd(0100, 1101) = 2. We therefore take HD = 1 (the minimum) for these two binary sequences. The minimum Hamming distance of two binary sequences expresses their maximum similarity with respect to the distribution of purines and pyrimidines. The minimum value HD = 0 occurs when the two binary sequences contain an exactly identical pattern of length min(n, m), i.e. an identical distribution of purines and pyrimidines over that stretch of the two miRNAs. The maximum value HD = min(n, m) occurs when the patterns of length min(n, m) are exactly opposite at every alignment, i.e. the distribution of purines and pyrimidines over the two miRNAs is completely dissimilar.

To get the nearness of the miRNAs based on their purine-pyrimidine distribution, minimum Hamming distance is deployed.
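The sliding-window minimum of Equation (4) can be sketched as below. The function names are hypothetical, and the shorter string is slid over the longer one regardless of argument order, which is a small generalization of the n ≥ m convention in the text.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def window_distances(s1, s2):
    """hd of the shorter string against every aligned window of the longer."""
    longer, shorter = (s1, s2) if len(s1) >= len(s2) else (s2, s1)
    m = len(shorter)
    return [hamming(longer[i:i + m], shorter) for i in range(len(longer) - m + 1)]


def min_hamming_distance(s1, s2):
    """HD of Equation (4): the minimum hd over all alignments."""
    return min(window_distances(s1, s2))


# Worked example from the text: S_n = 010100, S_m = 1101.
print(window_distances("010100", "1101"))      # [1, 3, 2]
print(min_hamming_distance("010100", "1101"))  # 1
```

For two sequences of equal length there is a single alignment, so the measure reduces to the ordinary Hamming distance.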

Distance pattern of purine and pyrimidine over miRNAs

Here we explore the distance pattern of purine bases across the miRNAs of the five species, that is, how sparsely or closely the purine bases are placed over the miRNAs. To this end we find the distance (gap) from each purine base to the next purine base along the miRNA sequence.

For example, take the transformed binary sequence S_m = 110100111000001, where 1 indicates a purine base and 0 a pyrimidine base. The positions of the 1’s and 0’s from left to right are shown in Table 1. From the distribution of 1’s, we find the purine distances 1 (two successive 1’s at a distance of 1: 11), 2 (two successive 1’s at a distance of 2: 101), 3 (two successive 1’s at a distance of 3: 1001) and 6 (two successive 1’s at a distance of 6: 1000001). So the distance pattern of purines (Purine-Distance pattern, Pu-Dp) over the sequence is [1-2-3-6], listing each distinct distance once in increasing order.

Table 1 The position of each bit (1/0) of the transformed binary sequence S_m = 110100111000001 from left to right.

The distance pattern of pyrimidine bases (0’s) across the miRNAs (Pyrimidine-Distance pattern, Py-Dp) can be determined in the same way. The distance pattern of pyrimidines of the above sequence is [1-2-4]. The pyrimidine distance pattern [1-2-4] further reveals that the miRNA contains at least one purine block of length 1 (= 2 − 1) and at least one purine block of length 3 (= 4 − 1). In the same way, a distance pattern of purines reveals the presence of pyrimidine blocks in the miRNA. If a miRNA has no purine (or pyrimidine) base at all, i.e. it is made of pyrimidine (or purine) bases only, we denote its purine (or pyrimidine) distance pattern by [−]; if the miRNA has a single purine (or pyrimidine) base, we denote the distance pattern of purine (or pyrimidine) by [0].
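The distance patterns described above can be sketched as follows. The function name `distance_pattern` is hypothetical, and the ASCII hyphen is used in the returned string in place of the typographic minus of [−].

```python
def distance_pattern(bits, symbol="1"):
    """Distinct gaps between successive occurrences of `symbol`, sorted.

    symbol="1" gives the purine pattern (Pu-Dp) and symbol="0" the
    pyrimidine pattern (Py-Dp). '[-]' means the symbol never occurs and
    '[0]' means it occurs exactly once, following the conventions above.
    """
    positions = [i for i, b in enumerate(bits) if b == symbol]
    if not positions:
        return "[-]"
    if len(positions) == 1:
        return "[0]"
    gaps = sorted({later - earlier for earlier, later in zip(positions, positions[1:])})
    return "[" + "-".join(str(g) for g in gaps) + "]"


# Worked example from the text.
print(distance_pattern("110100111000001", "1"))  # [1-2-3-6]
print(distance_pattern("110100111000001", "0"))  # [1-2-4]
```

A gap g > 1 between two successive occurrences of one symbol implies a block of g − 1 copies of the other symbol between them, which is how the pattern of one base type reveals blocks of the other.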

Shannon entropy of miRNAs

The Shannon entropy (SE) measures the information entropy of a Bernoulli process with probability p for one of the two outcomes (0/1)45,46,47. It is defined as

$$SE=-\sum_{i=1}^{2}{p}_{i}\,{\log }_{2}({p}_{i})$$
(5)

where \({p}_{1}=\frac{k}{l}\) and \({p}_{2}=\frac{l-k}{l}\); here l is the length of the binary string and k is the number of 1’s in it, so that \({p}_{1}+{p}_{2}=1\). A term with \({p}_{i}=0\) contributes 0 to the sum.

The binary Shannon entropy is a measure of the uncertainty in a binary string. When the probability p = 0, the event is certain never to occur, so there is no uncertainty and the entropy is 0. Similarly, when p = 1 the result is certain, so the entropy is again 0. When p = 1/2 the uncertainty is at its maximum and consequently SE = 1 (ref. 45).
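A short sketch of Equation (5), taking p1 and p2 as the fractions of 1’s and 0’s in the string so that SE runs from 0 to 1 as described above. The function name is hypothetical, and terms with zero probability are skipped, which corresponds to the usual convention 0 · log 0 = 0.

```python
import math


def shannon_entropy(bits):
    """Binary Shannon entropy of a 0/1 string, in bits per symbol."""
    length = len(bits)
    ones = bits.count("1")
    entropy = 0.0
    for count in (ones, length - ones):
        if count:  # skip zero-probability terms (0 * log 0 := 0)
            p = count / length
            entropy -= p * math.log2(p)
    return entropy


print(shannon_entropy("1100"))                        # equal densities
print(shannon_entropy("1111"))                        # a single symbol only
print(round(shannon_entropy("1" * 8 + "0" * 12), 4))  # densities 0.4 / 0.6
```

Because the entropy depends only on the two densities, any two miRNAs with the same purine density (or with purine and pyrimidine densities swapped) share the same SE, whatever the order of their bases.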

Results

Deviation from randomness

A simple random binomial (p, q) model48, in which each entry is a purine with probability p or a pyrimidine with probability q = 1 − p, independently of position, fails to describe the distribution of purines and pyrimidines over miRNAs. Let x_i denote the number of purines in the i-th miRNA. We calculate the sample mean (\(\bar{x}\)) of these counts and divide it by the average sequence length (n) to obtain the probability p. From this probability we calculate the expected variance npq. We also calculate the sample variance for the m samples x_1, x_2, …, x_m using \(\frac{1}{m-1}{\sum }_{i=1}^{m}{({x}_{i}-\bar{x})}^{2}\). The standard deviation (std) is the square root of the variance.

For purines, the mean gives p = 0.509 for human, p = 0.514 for gorilla, p = 0.505 for chimpanzee, p = 0.495 for mouse and p = 0.473 for rat. The expected std is therefore 2.323 (human), 2.322 (gorilla), 2.327 (chimpanzee), 2.340 (mouse) and 2.325 (rat).

The sample std = 3.42 (human), 2.64 (gorilla), 2.79 (chimpanzee), 3.48 (mouse) and 2.79 (rat).

So in all five species the standard deviation expected under the binomial model is noticeably smaller than the standard deviation observed in the sample; that is, the purine counts vary more from miRNA to miRNA than independent placement of bases would produce.
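The comparison above can be sketched as a small function that takes a list of binary sequences and returns p, the binomial standard deviation sqrt(npq) and the sample standard deviation of the purine counts. The function name and the four toy sequences are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
import math


def binomial_check(binary_seqs):
    """Return (p, expected_std, sample_std) for the purine counts.

    p is the mean purine count divided by the mean sequence length,
    expected_std is sqrt(n*p*q) with n the mean length, and sample_std
    uses the (m - 1) denominator as in the text.
    """
    counts = [s.count("1") for s in binary_seqs]
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two sequences")
    mean_count = sum(counts) / m
    mean_length = sum(len(s) for s in binary_seqs) / m
    p = mean_count / mean_length
    expected_std = math.sqrt(mean_length * p * (1 - p))
    sample_var = sum((x - mean_count) ** 2 for x in counts) / (m - 1)
    return p, expected_std, math.sqrt(sample_var)


# Toy data (hypothetical): purine counts 4, 0, 2 and 2 in sequences of length 4.
p, expected_std, sample_std = binomial_check(["1111", "0000", "1100", "1010"])
print(p, expected_std, round(sample_std, 3))
```

In the toy data the sample std exceeds the binomial std, the same direction of deviation reported for the five species.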

Classification Based on FDs of Indicator Matrices

For each binary sequence of miRNA of human, gorilla, chimpanzee, mouse and rat, the fractal dimension is calculated using Equation (2). Based on the fractal dimension, we classify the miRNAs of each of the five datasets into clusters. There are 10 clusters of miRNAs for each species, as shown in Table 2. Histograms of the fractal dimensions of all the miRNAs of human, gorilla, chimpanzee, mouse and rat are plotted in Fig. 2, and a normal distribution fitted to them is shown in Fig. 3.
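The text does not state how the 10 clusters and their centers are formed. One plausible reading, given that each cluster is reported by its center and size, is a histogram with 10 equal-width bins over the observed range. The sketch below implements that reading only; both the equal-width assumption and the function name `histogram_clusters` are hypothetical, and the values in the example are toy numbers.

```python
def histogram_clusters(values, bins=10):
    """Split values into equal-width bins; return (center, count) per bin.

    Assumption: clusters are histogram bins spanning [min, max] of the data.
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        return [(low, len(values))]  # all values identical: a single cluster
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # max value -> last bin
        counts[index] += 1
    centers = [low + (i + 0.5) * width for i in range(bins)]
    return list(zip(centers, counts))


# Toy values (hypothetical), not fractal dimensions from the study.
for center, count in histogram_clusters([0.0, 0.05, 0.15, 0.95, 1.0]):
    print(round(center, 2), count)
```

With this reading, the “largest cluster” is simply the bin with the highest count, and its center is the midpoint of that bin.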

Table 2 Clusters based on fractal dimension of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat.

Figure 2 Histograms of fractal dimensions of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

Figure 3 Normal distribution fitting over FDs of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

The members (miRNAs) of the FD clusters for human, gorilla, chimpanzee, mouse and rat are listed in detail in Supplementary Table S3. The FD of human miRNAs lies in the interval [1.50, 1.9] and the largest cluster (center at 1.60) contains 931 miRNAs. The FD of gorilla miRNAs lies in [1.55, 1.86], with the largest cluster (center at 1.60) containing 151 miRNAs, and the FD of chimpanzee miRNAs lies in [1.52, 1.84], with the largest cluster (center at 1.58) containing 255 miRNAs. For mouse, the FD lies in [1.52, 1.87] and the largest cluster (center at 1.57) contains 743 miRNAs; for rat, the FD lies in [1.53, 1.85] and the largest cluster (center at 1.57) contains 368 miRNAs. It is worth mentioning that the FD intervals of the other four species are all contained in the human interval [1.50, 1.9].

The centers of the largest FD clusters of human and gorilla miRNAs are approximately the same (1.60), which we take as a reflection of their evolutionary closeness. Likewise, the centers of the largest clusters of chimpanzee, mouse and rat miRNAs are approximately the same (about 1.57), which by the same measure places these three species close to one another. No gorilla miRNA has an FD between 1.76 and 1.84, whereas approximately 72 human miRNAs and 7 chimpanzee miRNAs have FDs in that interval. The clusters with the largest centers contain 5, 2, 3, 6 and 2 members for human, gorilla, chimpanzee, mouse and rat respectively.

Classification Based on HEs

For each binary sequence of miRNA of human, gorilla, chimpanzee, mouse and rat, the Hurst exponent is determined using Equation (3), and the resulting classification is shown in Table 3 for all the species. Histograms of the Hurst exponents of all the miRNAs of the five species are plotted in Fig. 4, and a normal distribution fitted to them is shown in Fig. 5.

Table 3 Clusters based on Hurst exponent of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat.

Figure 4 Histograms of Hurst exponents of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

Figure 5 Normal distribution fitting over HEs of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

The members (miRNAs) of the HE clusters for human, gorilla, chimpanzee, mouse and rat are listed in detail in Supplementary Table S4. The HE of human miRNAs lies in the interval [0.26, 0.96] and the largest cluster (center at 0.72) contains 671 miRNAs. The HE of gorilla miRNAs lies in [0.37, 0.96] and that of chimpanzee miRNAs in [0.27, 0.96]; the largest cluster of gorilla miRNAs (center at 0.64) contains 69 members and that of chimpanzee miRNAs (center at 0.72) contains 134. The HE of mouse miRNAs lies in [0.0, 0.96] and the largest cluster (center at 0.72) contains 618 miRNAs. The HE of rat miRNAs lies in [0.41, 0.96] and the largest cluster (center at 0.66) contains 158 miRNAs.

The centers of the largest HE clusters of human, chimpanzee and mouse are close to one another, whereas those of gorilla and rat differ markedly from the other three species, unlike the FD results of the previous section. This means that the long-range autocorrelations of gorilla and rat miRNAs differ markedly from those of human, chimpanzee and mouse miRNAs. With regard to the centers of the largest HE clusters we therefore observe two sets of close species: human, chimpanzee and mouse form one set (HE ≈ 0.72) and gorilla and rat form the other (HE ≈ 0.65).

Classification Based on HDs

The pairs of miRNAs with their minimum Hamming distances (using Equation (4)) for human, gorilla, chimpanzee, mouse and rat are given in detail in Supplementary Tables S5, S6, S7, S8 and S9 respectively. We then group the pairs of binary strings (miRNAs) of each species into classes by Hamming distance, from 0 to 22, with the percentage of each class shown in Table 4. Bar plots of these class frequencies are given in Fig. 6.

Table 4 Clusters based on minimum Hamming distance of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat.

Figure 6 Bar plots of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat based on Hamming distances from top to bottom respectively.

For all the species except rat, none of the clusters with Hamming distances from 0 to 21 is empty. Only in mouse are there pairs of miRNAs with HD = 22 (48 pairs). The largest cluster is the one with HD = 9 in every species: it contains 1042752, 23076, 56794, 617676 and 95526 pairs of miRNAs for human, gorilla, chimpanzee, mouse and rat respectively. In other words, the most common situation in all five species is that the purine-pyrimidine arrangements of two miRNAs differ in 9 positions at their best alignment.

Classification Based on Distance Pattern of Purine and Pyrimidine

For all the miRNAs of the five species, the distances from each purine base to the next purine base are obtained. There are 174, 47, 68, 168 and 99 clusters based on distinct purine distance (gap) patterns for the miRNAs of human, gorilla, chimpanzee, mouse and rat respectively, as shown in Supplementary Table S10. For example, the purine distance pattern of the human miRNA h421 is [1-2-3-4-12], which means that it contains successive purine bases that are 1, 2, 3, 4 and 12 positions apart. Bar plots of the cluster frequencies (number of miRNAs) for the five species are shown in Fig. 7.

Figure 7 Bar plots of purine (on top) and pyrimidine (on bottom) distances of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from left to right respectively.

The most frequent purine distance patterns are [1-2-3] and [1-2-3-4]. These two patterns occur in exactly 549 and 343 miRNAs of human, 71 and 65 of gorilla, 117 and 107 of chimpanzee, 374 and 278 of mouse, and 107 and 126 of rat respectively. No miRNA of gorilla, chimpanzee or rat has the purine distance pattern [1]. In all five species there are several clusters with only one member, which means that the miRNA in such a cluster has a unique purine distance pattern. In the same fashion, the distances from each pyrimidine base to the next pyrimidine base are found for all the miRNAs of human, gorilla, chimpanzee, mouse and rat and tabulated in Supplementary Table S10. The maximum length of the purine and pyrimidine distance patterns is 5 for all five sets of miRNAs, with the exception of five human miRNAs.

We have also determined the density of purine and pyrimidine bases of the miRNAs of human, gorilla, chimpanzee, mouse and rat, presented in detail in Supplementary Table S11. For example, if a miRNA has length 20 (20 nt), of which 8 bases are purines (1’s) and 12 are pyrimidines (0’s), then the purine density is 0.4 and the pyrimidine density is 0.6. Clusters based on purine density are tabulated in Table 5 for all the species, and histograms of the purine density are given in Fig. 8.
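The density computation in the example above amounts to counting symbols. A minimal sketch follows; the function name `base_densities` is hypothetical.

```python
def base_densities(bits):
    """Return (purine_density, pyrimidine_density) for a 0/1 string."""
    length = len(bits)
    if length == 0:
        raise ValueError("empty sequence")
    return bits.count("1") / length, bits.count("0") / length


# The 20-nt example from the text: 8 purines and 12 pyrimidines.
purine, pyrimidine = base_densities("1" * 8 + "0" * 12)
print(purine, pyrimidine)
```

The two densities always sum to 1 for a purely binary string, so a purine density below, equal to or above 0.5 is enough to tell which base type is in the majority.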

Table 5 Clusters based on density (purine) of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat.

Figure 8 Histograms of the density of purine bases of the miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

Classification Based on SEs

For all the miRNAs of the five species, the Shannon entropy is calculated using Equation (5); the details are given in Supplementary Table S12. There are exactly 80 distinct SE values for human miRNAs, and 38, 57, 73 and 55 distinct SE values for the miRNAs of gorilla, chimpanzee, mouse and rat respectively. Based on the Shannon entropy, the miRNAs of each of the five species are classified into 10 clusters, as shown in Table 6. Histograms of the SEs of all the miRNAs of human, gorilla, chimpanzee, mouse and rat are plotted in Fig. 9.

Table 6 Clusters based on Shannon entropy of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat.

Figure 9 Histograms of Shannon entropy (purine and pyrimidine) of miRNAs of Human, Gorilla, Chimpanzee, Mouse and Rat from top to bottom respectively.

Most miRNAs of all five species have a Shannon entropy in the cluster centered at 0.96, the largest cluster center, which contains 1977, 300, 446, 1296 and 597 miRNAs of human, gorilla, chimpanzee, mouse and rat respectively. No miRNA in any of the five species has SE = 0.5; since SE = 0.5 corresponds to a purine density of about 0.11 (or 0.89), this means that no miRNA has that particular composition. Overall, these observations give the impression that almost none of the miRNAs of the five species has a random-like purine-pyrimidine distribution.

Discussion

The comparison with the binomial distribution shows that purines and pyrimidines are not independently distributed over the miRNAs and that bases of the same type (purine or pyrimidine) tend to repeat within a miRNA. Using the different methods we obtain various classes; in most cases the classes are normally distributed, although the distribution of purines and pyrimidines itself is not random-like.

Cluster 10 based on fractal dimension, which has the maximum FD, contains 5 human miRNAs, as shown in Table 2. On closer inspection of those sequences we find that three of them (h2248, h1954 and h2552) are pyrimidine-rich (94%, 90% and 90% respectively) and the other two (h1835 and h1291) are purine-rich (95% and 100%), as shown in Table 7. For gorilla, chimpanzee, mouse and rat, cluster 10 contains 2, 3, 6 and 2 miRNAs respectively, all of which are either purine-rich or pyrimidine-rich, as shown in Table 7. These observations strongly suggest that a miRNA whose sequence is very rich in either purines or pyrimidines has an FD close to the maximum.

Table 7 MiRNAs (from the cluster-10 based on fractal dimension) and its density distribution of miRNAs of the five species.

Several clusters contain miRNAs of human, gorilla, chimpanzee, mouse and rat that share exactly the same HE. The purine and pyrimidine densities are fairly balanced for all such miRNAs. For example, 17 human miRNAs from cluster 7 based on the Hurst exponent (h463, h526, …, h1824 and h2202) all have the same HE (0.714484), as shown in Table 8. Their purine and pyrimidine densities are 60% and 40% (or 40% and 60%) respectively, and all 17 belong to the same cluster 3 based on fractal dimension. Only 0.8% of human, 1.1% of gorilla, 0.68% of chimpanzee, 0.67% of mouse and 0.4% of rat miRNAs have HE = 0.5, which indicates a completely uncorrelated spatial ordering of purines and pyrimidines. This supports the finding that the miRNAs of all five species deviate from randomness. Exactly 11 miRNAs, all from mouse, have a Hurst exponent of zero, which means that purines and pyrimidines alternate along those 11 sequences.

Table 8 MiRNAs (from the cluster-7 based on Hurst exponent) and its FD with the density distribution of miRNAs of Human.

Next we consider miRNAs whose purine and pyrimidine distance patterns are identical. Very few human miRNAs have this property. For example, the miRNA h61 has the identical purine and pyrimidine distance pattern [1-2], and the miRNAs h36, h51, h62, h83, h122 and h2584 have the identical purine and pyrimidine distance pattern [1-2-3]. The gorilla miRNAs g6, g13, g19, g20 and g21 have the identical purine and pyrimidine distance pattern [1-2-3-4], as do the chimpanzee miRNAs p55, p84, p106, p119 and p138. Similar distance patterns are also seen in the miRNAs of mouse and rat. Identical purine and pyrimidine distance patterns guarantee that the miRNA contains purine and pyrimidine blocks of the same length.

There are 142, 58, 92, 126 and 88 distinct densities of purine and pyrimidine bases across the miRNAs of human, gorilla, chimpanzee, mouse and rat respectively. Of the 2588 human miRNAs, 194 have equal purine and pyrimidine density (0.5), 1121 have a lower purine density (less than 0.5) and 1273 have a higher purine density (greater than 0.5). Purine-rich miRNAs therefore outnumber pyrimidine-rich ones in human. Equal density (0.5) is found in 43 of the 357 gorilla miRNAs, 67 of the 587 chimpanzee miRNAs, 154 of the 1915 mouse miRNAs and 69 of the 765 rat miRNAs. A lower density of purines than of pyrimidines is found in 146 gorilla, 268 chimpanzee, 893 mouse and 411 rat miRNAs, which leaves 168, 252, 868 and 285 miRNAs respectively with a higher purine density. By these counts, pyrimidine-rich miRNAs outnumber purine-rich ones in chimpanzee, mouse and rat, but not in human or gorilla. Of all 2588 human miRNAs, hsa-miR-6124 MIMAT0024597 (h1291) is the only one consisting entirely of purine bases.

The evolutionary closeness among the species according to the five parameters is shown in Table 9. For example, the mean FD of the 2588 human miRNAs is 1.62, approximately the same as for the 1915 mouse miRNAs, so these two species ({Human, Mouse}) are put in one set. The mean FDs of the miRNAs of gorilla, chimpanzee and rat are also close to one another (≈1.60), so they are put in another set. Similarly, Table 9 shows the close species based on the largest cluster center of each parameter, as discussed in the Results section.

Table 9 The set of evolutionarily close species based on mean quantitative value and largest cluster center of the discussed parameters. Pu-Purine.

It is reported that miR-200 (star miRNAs) is a family of tumour suppressor miRNAs consisting of five members (miR-200a-3p/h609, miR-200b-3p/h888, miR-200c-3p/h1520, miR-200a-429/h1670, miR-141-3p/h515), which are significantly involved in inhibition of epithelial-to-mesenchymal transition (EMT), repression of cancer stem cell (CSC) self-renewal and differentiation, modulation of cell division and apoptosis, and reversal of chemoresistance21. These five are shown in Table 10 along with four other miRNAs: miR-200a-5p/h1449, miR-200b-5p/h1677, miR-200c-5p/h2186 and miR-141-5p/h328. We consider all nine of these human miRNAs, together with the other human miRNAs that lie at a Hamming distance of 0, 1 or 2 from them.

Table 10 Star miRNAs (MiR-200) in human cancer and their quantifications. Pu-purine, Py-Pyrimidine.

The FDs of the five star miRNAs are almost the same, except for h1520, whose FD is slightly greater than that of the other four. The HEs, however, differ among all five miRNAs. We find that h888 is at Hamming distance 1 from both h609 and h1520, while h609 and h1520 are at Hamming distance 2 from each other. The miRNA h888 has approximately the same HE and density of purine and pyrimidine bases as h609 and h1520, and we therefore conjecture that h888 might function as h609 and h1520 do. We also observe that h1670 is at HD 2 from the miRNAs h609 and h515 and is very close to them according to the quantitative measures, so h1670 may function as h609 and h515 do. Since HD satisfies the triangle inequality, HD(a, b) + HD(b, c) ≥ HD(a, c), closeness carries over in a bounded way, and we conclude that h888 could also function as h1670 does (HD(h888, h1670) = 3). The two human miRNAs h1449 and h1677 are at Hamming distance 0 and share the same quantitative measures, so we firmly propose that these two miRNAs also function similarly; it is worth noting that h1449 and h1677 have identical purine-pyrimidine organization. Associations among the remaining miRNAs can be made by similar arguments. No human miRNA lies at HD 0, 1 or 2 from the miRNA h328. The five star miRNAs and their various combinations are associated with a variety of diseases (Table 1 in ref. 21, and refs 23, 49, 50, 51). The miRNAs miR-200a-3p/h609 and miR-200c-3p/h1520 are associated with the cancer type cutaneous melanoma, as reported in ref. 52. These two miRNAs are the same in their purine-pyrimidine distribution except at two bases (HD = 2).

In order to understand the association between mRNAs and miRNAs, we take eight target mRNAs for some diseases and the set of associated human miRNAs, as listed in Table 11 and Table 12. It is found that the FDs of the mRNAs are considerably higher than those of the associated miRNAs. The FDs of h150, h1617 and h22 are very close, and they all target TGFBR2. It is observed that the density distribution of purine/pyrimidine is balanced for hh2220 and h616, and they target DNMT3A, which is linked to the disease Lung Neoplasms. If we look at the distance patterns of purine and pyrimidine, we can observe some sub-patterns shared by the miRNAs and the corresponding mRNAs. The HD (using Equation (4)) between the purine-pyrimidine distribution of each miRNA and that of its corresponding target mRNA is determined, and it turns out to range from 1 to 5 (1 ≤ HD ≤ 5). This observation suggests that for some specific regions of the target mRNAs, of length around 22 nt, we have approximately (80–95)% similarity with the corresponding miRNAs.
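One way to picture the comparison between a short miRNA and a much longer mRNA is a sliding-window search, sketched below in Python. This is an assumption about how such a comparison could be carried out, not a statement of the procedure behind Equation (4): the miRNA's purine-pyrimidine pattern is compared directly (without taking complements) against every window of the same length in the mRNA, and the window with the smallest HD is reported. The function names and the toy sequences are hypothetical.

```python
# Minimal sketch (assumption: direct, non-complementary comparison of
# purine/pyrimidine patterns over sliding windows). Names are illustrative.
ENCODING = {"A": 1, "G": 1, "C": 0, "U": 0, "T": 0}  # 1 = purine, 0 = pyrimidine


def to_binary(seq):
    """Encode a nucleotide sequence as a list of 1 (purine) / 0 (pyrimidine)."""
    return [ENCODING[base] for base in seq.upper()]


def similarity(hd, length):
    """Fraction of matching positions for a given HD and window length."""
    return 1 - hd / length


def best_window(mirna, mrna):
    """Return (start, hd, similarity) of the mRNA window closest to the miRNA."""
    m = to_binary(mirna)
    t = to_binary(mrna)
    if len(m) > len(t):
        raise ValueError("miRNA must not be longer than the mRNA")
    best_start, best_hd = 0, len(m) + 1
    for start in range(len(t) - len(m) + 1):
        window = t[start:start + len(m)]
        hd = sum(x != y for x, y in zip(m, window))
        if hd < best_hd:  # keep the first window with the smallest HD
            best_start, best_hd = start, hd
    return best_start, best_hd, similarity(best_hd, len(m))


# Hypothetical toy example: the pattern of "AGCU" occurs at offset 2.
print(best_window("AGCU", "CCAGUUAG"))

# Matching fraction for a 22-nt window at the two ends of the HD range.
print(round(similarity(1, 22), 3), round(similarity(5, 22), 3))
```

For a 22-nt window, HD = 1 corresponds to about 95% matching positions and HD = 5 to about 77%, which is close to the approximate range quoted in the text.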

Table 11 Selected miRNAs of Human and their corresponding target mRNAs. En-Encrypted name.
Table 12 Target mRNAs/miRNAs and corresponding quantifications. Pu-Purine, Py-Pyrimidine, Dp-Distance pattern.

Through these investigations based on quantitative measures, we observe that there is no direct association between the miRNAs and the target mRNAs, owing to their many-to-many relationships. The complex relationship between miRNAs and their target mRNAs is highly dynamic under various specific conditions, as previously pointed out in the literature13,49,51. This has also been borne out by our quantitative analysis over a set of miRNAs and target mRNAs. Our analyses can suggest a set of candidate miRNAs that would play some key role on the target mRNAs, as they are very close with regard to the quantitative measures.

Concluding Remarks

One of the integral divisions of nucleotides based on their chemical properties is purine-pyrimidine. We attempted to understand the distribution of purine and pyrimidine bases over all the miRNAs in five species: human, gorilla, chimpanzee, mouse and rat. Quantitatively, we deciphered the self-organization of the purine and pyrimidine bases for all the miRNAs through the fractal dimension of the indicator matrix. We also determined the autocorrelation of the purine-pyrimidine bases through the Hurst exponent. To measure the nearness of the miRNAs based on their purine-pyrimidine distribution, HD is employed. The purine-pyrimidine distance patterns, including their frequency distributions, have been found for all the miRNAs. For each of these parameters, we grouped the miRNAs into several clusters. Based on the quantitative investigation, some crucial observations are outlined in the discussion. Our investigation of all the miRNAs of the five species through the purine and pyrimidine distributions suggests evolutionary closeness both between and within the families of different clusters.

Through the analysis using all the quantitative measures, we could provide the set of miRNAs related to the target mRNAs and also the set of miRNAs associated with specific diseases. Imperfect base-pairing between the miRNA and the 3′-UTR of its target mRNA blocks translation, or at least the accumulation of the mRNA’s protein product, whereas perfect or near-perfect base-pairing between the miRNA and the middle of its target mRNA causes cleavage of the mRNA, thereby inactivating it. Such diverse patterns of miRNAs may be responsible for making the correlation between miRNAs and target mRNAs complex, and it is yet to be resolved decisively, as pointed out by several researchers earlier. As miRNAs are much smaller than their target mRNAs, associating miRNAs with subsequences/subregions of specific mRNAs might improve the results. In this context, we plan to bring out the patterns of organization of nucleotides following the other two modes of classification (amino-keto and strong-weak H-bond) based on the chemical properties of nucleotides. The aim is to integrate all three kinds of grouping to find the correlation between miRNAs and mRNAs and also the targeted regions of the mRNAs.