Introduction

Recently, the number of the unidentified cadavers has increased due to disasters and terrorisms episodes worldwide, and as a consequence of social internationalization1,2.

Amongst the various types of information needed for cadaver identification, age is the most important. Most of the conventional age estimation methods, including anthropological methods and the dental method3 are mainly used in highly corrupted or skeletonized corpses. Closing of the cranial suture4, change of the pubic bone5, morphology of the ribs6, the developmental stages of the seven left permanent mandibular teeth7 and so on represent some examples of those methods. Some of the above-mentioned methods provide a high estimation accuracy8,9,10; however, they are mainly subjective and require extensive technical skills and experience2,11,12,13. Recently, computed tomography (CT) and magnetic resonance imaging (MRI) have also been used in this area14,15,16,17. CT and MRI require very expensive equipment, and therefore their use is not yet widely adopted.

Age estimation is somewhat thought to be necessary only for highly corrupted corpses, but fresh corpses that require age estimation are found on a daily basis. For age estimation in fresh cadavers or living individuals, in addition to above mentioned methods, bone X-ray examination, physical examination, dental eruptions, etc.18, molecular biological methods are also used. For instance, among the alterations that occur to our body as we age, many relate to proteins, and attach to intrinsic variations of their molecular structure. Stadtman19 proved how the age-related increase in oxidized proteins might reflect the age-dependent accumulation of unrepaired DNA damage.

The most advanced and objective concept of an “age predictor” in modern forensic science matches the so-called epigenetic (or DNA, or Horvath’s) clock20,21,22, which is based on the DNA methylation of CpG dinucleotides and can be used to estimate the age of a suspect based on blood remains. Usable on a wide spectrum of different tissues, the median error in age prediction of this method is 3.6 y, with a Pearson correlation coefficient, r = 0.96, to chronological age21. The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, with this coefficient being either + 1 or − 1 depending on whether the relationship is positive or negative, respectively. Horvath’s clock leads to a remarkably precise age estimation as compared to other proposed methods (e.g., telomere length, p16INK4a, or microsatellite mutations)23,24,25, which show distinctly lower correlation coefficients and only relate to specific types of cells rather than directly to the chronological age. The epigenetic clock was also found to relate to the body mass index (BMI) with a good correlation, r = 0.42, for analyses conducted on liver tissue21. Although Horvath’s method proved highly objective and has remarkable accuracy among the various age estimation methods, special technical skills, expensive reagents and equipment are still required as well as time and effort.

Looking for a different approach, we examined the possibility of using Raman spectroscopy of human skin in an attempt to establish a reliable and easily accessible method of age estimation in forensic medicine. Our spectroscopic method relies on the collection of incisional skin samples during forensic autopsy, and spectroscopic examination of the structural characteristics of both the proteins and lipids contained in the dermis (including epidermis, dermis, and hypodermis). Aging of the structures of skin has been deeply studied by employing a number of different analytical techniques26,27. Our previously published proof-of-concept study attempted to label all the skin-emitted Raman bands and to locate any biophysical link between vibrational fingerprints and specific structural variations associated with human age28. The existence of such links was indeed phenomenologically proven for protein structures; however, it is hindered by insufficient statistics. The final proof for our spectroscopic findings, thus, relies on the statistical validations.

The main purpose here is a statistical validation of the presence of Raman spectral markers for precise age identification from human skin samples. In doing so, we used the same spectral deconvolution algorithm previously proposed28. In that previous study, we investigated the Raman spectra of various skin samples to find a candidate of ideal structural change for age estimation with a small number of cases. Accordingly, we found that protein folding may be a good potential “biological clock”. In this study, upon increasing the number of autopsy samples (i.e., n = 132) and, concurrently, taking care that a wide age range was included, we attempted to assess the statistical exactness of age estimation by Raman spectrometry. The statistical validation presented here could be attractive for the forensic community because the Raman method is label-free and directly links vibrational features at the molecular scale to practical forensic purposes.

Figure 1 gives schematic drafts of the different secondary structures of proteins detected by Raman spectroscopy. Figure 1 also includes drafts of the specific vibrational modes that were monitored in this study, as well as the Raman spectroscopic links in terms of the vibrational frequencies of different structural assemblies. The structure of skin upon aging has been thoroughly studied with a number of different analytical techniques26,27. Unfortunately, explicit biophysical links are still missing between the observed vibrational fingerprints and the specific structural variations associated with human age. With reference to the relative intensities of specific Raman scatterings from different protein mesostructures (α-helix structure and disordered random coil protein structures as seen in Fig. 1), we defined the protein-folding intensity ratio (RPF). The RPF represents the ratio of Raman intensities related to the C=O stretching mode (i.e., in the overall Amide I vibrational spectrum depicted in Table S2) in disordered protein structures (i.e., the random coil Raman signal seen at 1681 cm−1) to α-helix structure (i.e., the Raman signal at 1652 cm−1) (Fig. 1)29,30,31,32,33,34,35,36,37.

Figure 1
figure 1

Explanatory schema of the different secondary structures involved with the spectroscopic measurement of human age based on the protein folding mechanism. Three kinds of protein mesostructures (shown in Table S2) are detected by Raman spectroscopy. Our previous study28 suggested that the rate of random coiled disordered protein structures (Raman signal seen at 1681 cm−1) becoming α-helix structures (Raman signal at 1652 cm−1) was age dependent.

Studies29,30 have reported that proteins irreversibly fold upon aging. Accordingly, in our previous study28, the Raman features representing those folding processes were located as parameters sensing biological age, although their age-sensing precision was yet unknown to us.

The aim of this study was to statistically validate the Raman spectroscopic parameters of protein folding in human skin as age-predicting parameters. The Raman spectra of abdominal skin sample were measured in 132 autopsy cases and the protein-folding intensity ratio (RPF) was calculated. And four models, based on linear (RPF), squared (RPF2), sex, and RPF by sex interaction terms, were examined.

Results

The data from a total of 132 subjects (92 males and 40 females) were analyzed. Table 1 shows the summary statistics of age and RPF. The RPF of our samples ranged from 0.16 to 0.90. The mean years of age and RPF were 53.4 and 0.41, respectively. On the other hand, the median years of age and RPF were 53.5 and 0.39, respectively. The estimated regression coefficients for four models were shown in Table 2. The results of cross validation suggested that Model 2 including linear and squared terms was the best model with the lowest root mean squared error (11.3 years of age) and the highest coefficient of determination (0.743). Both estimates for RPF and RPF2 were statistically significant at the 5% significance level in Model 2. There was a strong negative correlation between age and RPF with a Pearson correlation coefficient of r = − 0.878. Complete plots for all age intervals are given in Fig. 2 for the RPF. Figure 2 also shows the predicted values of age and the 95% prediction intervals. Table 3 shows the 95%, 80%, and 50% prediction intervals. For instance, if the RPF is 0.3, the 95%, 80%, and 50% prediction intervals are 46.7–91.3 years of age, 54.5–83.5 years of age and 61.4–76.6 years of age, respectively. If the RPF would be 0.5, the 95%, 80%, and 50% prediction intervals are 15.9–60.6 years of age, 23.7–52.8 years of age and 30.6–45.9 years of age, respectively. There was no statistically significant correlation between the RPF and BMI. There was no statistically significant difference between the male and female samples. Moreover, the specific cause of death was not seen in the outliers.

Table 1 Descriptive statistics for age and RPF.
Table 2 The regression formulae used in this study.
Figure 2
figure 2

Prediction curve using Model 2 (RPF). Complete plots for all age intervals are given for the protein-folding intensity ratio (RPF) and the 95% prediction intervals are also shown. There was a strong negative correlation between age and the RPF with a Pearson correlation coefficient of r = 0.878.

Table 3 RPF and the age prediction intervals for 50, 80 and 95% by confidence levels.

Discussion

To quickly identify unidentified cadavers, age estimation methods that can be used by non-experts at the places where bodies are found must be developed Recently, biomarkers, such as prostate specific antigen and thyroid hormones, skin imaging, and ultrasonography have been reported as alternative age estimation methods to the conventional subjective methods38,39,40,41,42. Skin imaging and ultrasonography are particularly elementary and can be used by non-experts on the scene38,41. Unfortunately, their estimation accuracy is relatively low.

In this study, a large number of analytical data showed that RPF, related to the folding in peptides and proteins, was a reliable marker for the estimation of human age. The reason for the observed precision of the age-predictor, RPF, might reside in the fact that protein oxidation and misfolding mainly affect the frequency shift of the sub-bands used for age assessment, while they minimally alter the Raman intensity. Although the outputs of this study were validated with a statistically analyzable number of patients and the shown equations are presently used at the Kyoto Prefectural University of Medicine for estimating the age of unidentifiable bodies, our presented method still requires a substantial amount of time and skill.

Recently, hand-held Raman spectrometric devices were developed43. This device is now available in each prefectural police department in Japan for drug detection at crime scenes and requires less than a minute for detection, no expensive reagents, and technical skills. Considering our results, this equipment appears a promising choice for age estimation of unidentified cadavers in instantaneous assessments at the crime scene by simply detecting RPF related bands.

There are certain limitations in our study. First, although there was no statistically significant difference between the male and female samples, this may be related to our sample size. Therefore, we need to continue increasing the sample number. Second, although the specific causes of death were not related to the outliers, we could not perform a statistical analysis due to the diverse causes of death. Third, the confidence intervals of age prediction were relatively wide especially in young people. Many methods that use growing and development characteristics are reported as reliable for individuals under 20 years of age44,45.

Therefore, we consider that age estimation should be done by the combination of many methods to obtain better accuracy and avoid erroneous judgments. Finally, our sampling was restricted to Japanese donors living in the same area and climate; nevertheless the results are evocative and must next be confirmed with future experiments regarding whether (or to what extent) the Raman biological clock of protein folding is common to all human beings.

Materials and methods

Materials

A total of 132 cadaveric skin samples were obtained from forensic autopsies for cadavers at different ages spanning from a few months to 93 y old (Table S1 in the Supplementary Materials). Relatives of all the subjects gave their informed consent before inclusion in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Institutional Review Board of Kyoto Prefectural University of Medicine, Japan (ERB C-191). Samples were taken during the autopsies conducted within 48 h.

Several samples for each patient were taken from the sub-umbilical region of abdominal skin in the body, which typically remains unexposed to solar irradiation. The sampling included only Japanese subjects. A list of the donors with information on their age, height and weight at death, sex is given in Table S1. In the statistical characterization, no distinction was made regarding the sampling; therefore, we included in the Raman analysis any available sample irrespective of the cause of death. The skin samples were 10 × 10 mm2, and encompassed the full thickness of the skin from the stratum corneum to the hypodermis.

Spectroscopy

The samples were stored in a humid and evacuated environment at approximately 0 °C until use. Raman spectroscopy was conducted within 5 days after the autopsy. The Raman excitation source in this experiments used a 532 nm Nd:YVO4 diode-pumped solid state laser operating with a power of 200 mW (SOC JUNO, Showa Optronics Co. Ltd., Tokyo, Japan). An objective lens with a numerical aperture of 0.5 was used both to focus the laser beam on the sample surface and to collect the scattered Raman light. A pinhole aperture of 100 μm was adopted employing an objective lens with a magnification of 100x. For each studied sample, several tens of spectra were collected from the epidermis to dermis (100–600 µm depth from the skin surface) in the cross-section and we obtained an average spectrum whose sub-bands were best-fitted to Gaussian–Lorentzian bells46. Table S2 shows a complete description of the sub-bands labeled in Fig. 3 for the protein spectral zone, which includes the frequency range at which these specific sub-bands were located and their vibrational origins. To have a reliable sampling of the bands per each donor, a number of Raman spectra in the order of > 100 needed to be averaged. Although the spectral acquisition procedure could be automatized, scanning of wide spectral range through the dermis to epidermis resulted in time consuming (about 20 h of measurement per donor).

Figure 3
figure 3

Average Raman spectra from skin samples from the patients listed in Table S1 for the selected age intervals (shown in inset). Spectroscopic related to the proteins are shown. In the protein spectral zone, the labels α and d refer to α-helix and disordered structures, respectively. The vibrational origins of all labeled bands are listed in Table S2 of the Supplementary Materials and are visualized in Fig. 1.

Statistics

We evaluated the correlation between age or the BMI and the RPF using the Pearson correlation coefficient. To build a prediction model, we performed regression analysis with age as the dependent variable for four candidate models. The independent variables of those models were follows: RPF in Model 1; RPF and RPF2 (the square of RPF) in Model 2; RPF , RPF2 and sex in Model 3; and RPF, sex, and RPF by sex interaction in Model 4. The goodness-of-fit of each model was assessed using a tenfold cross validation method (repeated 100 times), estimating the mean squared error and coefficient of determination. After selecting a best model, we estimated 95% prediction intervals for the predicted ages.