Introduction

Pitch identification of an acoustic signal is made possible by the capacity of the organ of Corti to resolve the individual frequencies composing the sound1,2. This resolution is possible thanks to the resonance characteristics of the basilar membrane: each frequency causes it to oscillate maximally at a different point, with low frequencies mapped toward the apex and high frequencies toward the base3. Furthermore, the frequency selectivity of this oscillation mechanism is sharpened by the contractile activity of the outer hair cells1,4.

Basilar membrane vibration explains the pitch identification of a pure (sinusoidal) tone, which contains a single frequency. According to the temporal theory of pitch perception, the periodicity of the waveform of every musical note is directly related to the pitch of the note itself. Thus, pitch is often described as “the perceptual correlate of the periodicity of the sound’s waveform”5.

Musical notes and voices are characterized by a more complex acoustic signal, defined as a complex periodic waveform (CPW)6. In a CPW, the waveform contains multiple frequencies sounding together. The lowest is called the fundamental frequency (f0) and represents, in the case of chordophone instruments and of the vocal folds, the vibration of the entire string or fold. The higher frequencies are called harmonics and are integer multiples of the fundamental frequency7,8. In this case, the basilar membrane vibrates at different points simultaneously, each point corresponding to one of the harmonics composing the CPW. In this way, the cochlea faithfully represents the musical or vocal acoustic signal9,10.

In a CPW, the perceived pitch is defined mainly by the fundamental frequency (f0), expressed in Hz. By current convention, the tuning reference of a piano is the keyboard’s fifth A note (A4), which has a fundamental frequency of ca. 440 Hz. From this value and the mathematical relationship between the notes, the fundamental frequency of every other note can be determined. Theoretically, every octave interval (frequency ratio 2:1) is divided into 12 equally sized semitones (equal-tempered tuning); in practice, however, piano tuning is slightly stretched11,12. The 88 keys of the piano keyboard cover fundamental frequencies from 27.5 Hz (A0) to 4186 Hz (C8)2,13.
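As a worked illustration of this relationship (our own sketch, not part of the study), the equal-tempered fundamental of any piano key can be derived from the A4 = 440 Hz reference, numbering the keys 1–88 with A4 as key 49:

```python
def key_frequency(key_number: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered fundamental of a piano key (1 = A0, 49 = A4, 88 = C8)."""
    return a4_hz * 2 ** ((key_number - 49) / 12)

print(key_frequency(1))   # A0: 27.5 Hz
print(key_frequency(49))  # A4: 440.0 Hz
print(key_frequency(88))  # C8: ~4186.0 Hz
```

Stretched tuning deviates slightly from these theoretical values, especially at the extremes of the keyboard.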

Pitch perception may also be influenced by timbre and loudness due to the filtering imposed by different vocal tracts or instrument bodies. On this point, McPherson et al. demonstrated that pitch discrimination was less accurate when musical notes came from different instruments than when the instrument was the same, and that it was biased by the associated timbre differences. They thereby demonstrated that relative pitch judgments are not invariant to timbre, even when such judgments are based on representations of f014.

According to previous authors, pitch estimation relies on two components: spectral pitch (which involves listening analytically to individual harmonics) and virtual pitch (which involves holistic listening to a single evoked pitch). Moreover, several neurophysiological processes may influence the perceived pitch of a complex tone, including pitch bending of an individual harmonic, masking of harmonics, and irregular auditory sensitivity in listeners15,16,17.

Due to masking or filtering, it can be difficult to hear the fundamental frequency of a note in the presence of background noise, whose acoustic pressure is concentrated at low frequencies1,6, or when listening to filtered acoustic signals such as MP3 recordings, radio, or telephone conversations14. In addition, human hearing sensitivity is considerably attenuated at low frequencies18.

In these cases, however, it is possible to identify the pitch of a note from the perception of consecutive harmonics, which, being higher in frequency, are less easily masked. Since the harmonics stand in a fixed mathematical relationship to the fundamental, being its integer multiples (in the case of harmonic tones), the pitch of the fundamental frequency can be perceived from consecutive harmonics even when the fundamental itself is inaudible. For example, for the 500–600 Hz frequency pair, the perceived fundamental is 100 Hz14, their greatest common divisor.
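A minimal sketch of this arithmetic (our illustration, not a model of the perceptual mechanism itself): the missing fundamental of a set of harmonic frequencies can be recovered as their greatest common divisor.

```python
from functools import reduce
from math import gcd

def missing_fundamental(harmonics_hz):
    """The GCD of harmonic frequencies (integer Hz) gives the missing f0."""
    return reduce(gcd, harmonics_hz)

print(missing_fundamental([500, 600]))        # 100 Hz
print(missing_fundamental([660, 880, 1100]))  # 220 Hz (harmonics 3-4-5 of A3)
```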

This study aimed to evaluate the pitch identification of acoustic signals composed of two or four consecutive harmonics without the fundamental, in amateur musicians without perfect pitch19. Moreover, we aimed to evaluate the influence of the distance between the theoretical fundamentals and the presented harmonics.

Materials and methods

This study was performed in accordance with the ethics standards laid down in the 1964 Declaration of Helsinki and was approved by the University of Turin’s ethics committee. The aim of the study was clearly explained to each participant, and written informed consent was obtained from all subjects.

The study was performed on a group of 60 participants, 26 (43%) male and 34 (57%) female, aged between 21 and 75 years (mean 37.2 years). We recruited participants with a non-professional ability to play an instrument and without perfect pitch.

In the design phase of the study, we first asked 6 musicians with perfect pitch to identify notes presented as harmonics without their fundamentals. In these participants, unlike people without perfect pitch, the stimulus evoked the sensation of several single notes sounding simultaneously, each with a fundamental frequency equal to one of the presented harmonics. Listeners with perfect pitch therefore behave perceptually like a person without this characteristic to whom the component frequencies are presented one at a time rather than simultaneously20. Since the aim of the study was to verify the possibility of note identification based on harmonics without fundamentals, subjects with perfect pitch were excluded from the study.

We also tested some participants who were unable to play an instrument, but for many of them identifying the note on the keyboard caused significant tension and anxiety; the note was identified with considerable difficulty, and the outcome could not be considered reliable. These participants were therefore also excluded from the study group.

Hence, we decided to test people who were able to play a keyboard but did not have perfect pitch. To avoid interference with cochlear function, we included in the study group only participants free from ear pathologies and with a normal audiometric threshold.

Accordingly, the inclusion criteria comprised a negative history of ear disease and an audiometric threshold of 25 dB or less at frequencies between 125 and 8000 Hz.

The sound stimuli were presented by a tone generator that produced a complex waveform made up of two or four pure tones at the frequencies chosen by the authors. The generator is based on an SGTL5000 stereo codec with headphone amplifier (https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf), mounted on an audio adaptor (https://www.pjrc.com/store/teensy3_audio.html) and controlled by a Freescale-based (https://www.nxp.com/docs/en/data-sheet/K20P64M72SF1.pdf) Teensy 3.2 board (https://www.pjrc.com/store/teensy32.html). In our configuration, the SGTL5000 is powered at 3.3 V and drives the analog output to Audio-Technica headphones (model ATH-M50X) at 24-bit resolution with a 44.1 kHz sampling frequency. In this configuration, the 32-Ohm headphones achieved a signal-to-noise ratio (SNR) of 100 dB, a total harmonic distortion plus noise (THD + N) of −88 dB, and a frequency response of ±0.11 dB.

All frequencies started simultaneously and had the same sound pressure level (SPL) of 65 dB.
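For readers who wish to approximate comparable stimuli offline (the study itself used the Teensy-based hardware generator described above; this sketch is our own illustration), a complex waveform of equal-amplitude consecutive harmonics can be synthesized as a sum of sinusoids:

```python
import numpy as np

def harmonic_complex(f0_hz, harmonic_numbers, dur_s=1.0, fs_hz=44100):
    """Equal-amplitude sum of pure tones at the given harmonics of f0 (f0 itself absent)."""
    t = np.arange(int(dur_s * fs_hz)) / fs_hz
    tones = [np.sin(2 * np.pi * f0_hz * k * t) for k in harmonic_numbers]
    return np.sum(tones, axis=0) / len(tones)  # normalize to avoid clipping

# Harmonics 2-3-4-5 of C3 (~130.8 Hz) with the fundamental missing
stimulus = harmonic_complex(130.8, [2, 3, 4, 5])
```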

The test was divided into two sessions. In the first session, harmonics 2–3–4–5 of the notes C3 and G3 were presented to all participants, who were invited to identify the notes they heard on an electronic keyboard without any time limit (Table 1). The same participants were then presented with harmonics 3–4–5–6 of the notes E3 and A3 and asked to identify the notes following the same protocol (Table 2). In the second session, two consecutive harmonics, 3–4, 4–5, 5–6, 6–7, 7–8, 8–9 and 9–10 of the notes A4, D4, E4, F4, A4, E4, D4 (Table 3), were presented to all participants, who were asked to identify the notes following the same protocol as in the first session.

Table 1 Notes utilized for the test, the frequencies of their fundamentals according to the current tuning convention, and the frequencies of harmonics 2–3–4–5 present in the acoustic signal of the test.
Table 2 Notes utilized for the test, the frequencies of their fundamentals according to the current tuning convention, and the frequencies of harmonics 3–4–5–6 present in the acoustic signal.
Table 3 Notes utilized for the test, the frequencies of their fundamentals according to the current tuning convention, the progressive number of the harmonic association presented, and the frequencies of the two harmonics presented.
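The harmonic frequencies listed in these tables follow directly from the integer-multiple relationship. A small sketch reproduces them (our illustration; the fundamentals below are standard equal-tempered values and may differ slightly in rounding from the published tables):

```python
# Standard equal-tempered fundamentals (Hz), rounded to one decimal
FUNDAMENTALS = {"C3": 130.8, "E3": 164.8, "G3": 196.0, "A3": 220.0}

def presented_harmonics(note, harmonic_numbers):
    """Frequencies of the selected harmonics of a note's fundamental."""
    return [round(FUNDAMENTALS[note] * k, 1) for k in harmonic_numbers]

print(presented_harmonics("C3", [2, 3, 4, 5]))  # [261.6, 392.4, 523.2, 654.0]
print(presented_harmonics("A3", [3, 4, 5, 6]))  # [660.0, 880.0, 1100.0, 1320.0]
```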

These notes were chosen because, lying in the middle region of the piano keyboard, they are more pleasing to auditory perception and facilitate the listening to and recognition of the presented harmonics.

Results

Tables 4 and 5 report the rate of correct identification of the notes in relation to the group of harmonics presented. The correct identification rate of the note in the absence of the fundamental is between 88 and 100% when listening to harmonics 2–5 and between 82 and 96% when listening to harmonics 3–6. The correct identification rate is higher in the presence of harmonics nearer to the fundamental and of notes with a higher fundamental frequency. Table 6 shows the correct identification rate of the note in relation to the pairs of harmonics presented, respectively 3–4, 4–5, 5–6, 6–7, 7–8, 8–9 and 9–10. Correct identification ranges from 6 to 76%. Here too, the higher rates of correct identification were obtained when the pair of harmonics presented was nearer to the fundamental and lower in frequency. This pattern is also shown in Fig. 1, where the rate of correct identification is plotted against the frequency of the lowest harmonic of the pair presented to participants. The slope of the line clearly shows that identification of the note is easier when the lowest harmonic presented lies below 1500–2000 Hz; above this range, the success rate falls to lower values.

Table 4 Rate of correct identification of the notes C3 and G3 on the basis of the presentation of harmonics 2–3–4–5.
Table 5 Rate of correct identification of the notes E3 and A3 on the basis of the presentation of harmonics 3–4–5–6.
Table 6 Rate of correct identification of the notes for the different pairs of harmonics presented to the participants.
Fig. 1

Rate of correct identification of the note in relation to the frequency of the lowest harmonic presented. The numbers in brackets refer to the frequency of the lowest harmonic.

Discussion

The results obtained from the study confirm that it is possible to identify a note based on the presence of four harmonics19. Correct note identification was higher when the harmonics presented were nearer to the fundamental (2–5 versus 3–6). If the signal is composed of only two harmonics, identification is still possible, but the rate of correct identification drops to lower values (6% to 76%); even in this case, it is higher when the tones presented are closer to the fundamental.

Identification of the note without perception of the fundamental is possible because two or more frequencies related to each other as consecutive integer multiples of a lower frequency allow the reconstruction of the missing fundamental of the signal and confer the perceived tonal character20. This mode of perception requires the simultaneous presentation of the single frequency components (harmonics); otherwise, they are identified as uncorrelated signals and each is heard as a single, separate note21,22,23.

Previous studies on note pitch discrimination based on listening to harmonics have led to the concept that pitch discrimination abilities and pitch salience decrease dramatically when the harmonics of a complex below the tenth are removed24,25. More recently, Graves et al. demonstrated that functional pitch perception was possible with combinations and mixtures of different harmonics, even when the stimuli were filtered to fall within the same overlapping spectral region26.

The concept of the “critical band” was introduced in 1933 by Harvey Fletcher and defines the frequency range of the so-called “auditory filter” created by the cochlea21. The critical bandwidth is the range of audio frequencies within which a second tone interferes with the perception of a first tone through auditory masking: the detectability of a sound signal is reduced when it coexists with a more intense signal within the same critical band. The implications of masking phenomena are extensive, encompassing a nuanced interplay between loudness and intensity.
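For orientation, one common estimate of the auditory-filter width, the equivalent rectangular bandwidth of Glasberg and Moore (added here as background; it was not used in the study), shows how critical bands widen with centre frequency:

```python
def erb_hz(center_hz):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth of the auditory filter."""
    return 24.7 * (4.37 * center_hz / 1000 + 1)

print(round(erb_hz(500), 1))   # ~78.7 Hz
print(round(erb_hz(2000), 1))  # ~240.6 Hz: filters widen with frequency
```

Because the filters widen with frequency, two harmonics separated by a fixed number of Hz are more likely to fall within a single filter, and thus to mask one another, in the high-frequency region.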

When listening to just two harmonics, the critical point in note recognition lay between 1500 and 2000 Hz. To explain this pattern, it must be remembered that fundamental identification based on at least two harmonics can take place only if both harmonics can be heard. If the two harmonics are too close to each other, they are analyzed at very near points along the basilar membrane; if the distance is less than 1 mm, the masking phenomenon occurs and one of the two tones is not perceived22,27.

The arrangement of the points of maximum cochlear oscillation does not follow a linear relationship with frequency. Instead, it follows a base-2 logarithmic relationship: each doubling of frequency corresponds to a constant distance, about 4 mm, between the points of maximum oscillation21. As a result, the distance between the points of maximum oscillation of the basilar membrane induced by two tones with a constant difference in frequency, as occurs in our experiment, is smaller in the high-frequency region. Therefore, in the high-frequency range, it is easier for one of the two tones to be masked by the other, and it consequently becomes more difficult to identify the fundamental.
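A back-of-the-envelope sketch of this geometry (our illustration, using the ~4 mm-per-octave figure cited above) shows how a fixed frequency separation shrinks in cochlear distance as harmonic number rises:

```python
import math

MM_PER_OCTAVE = 4.0  # approximate basilar-membrane spacing per frequency doubling (see text)

def cochlear_distance_mm(f1_hz, f2_hz):
    """Distance between points of maximal oscillation under a base-2 logarithmic place map."""
    return MM_PER_OCTAVE * abs(math.log2(f2_hz / f1_hz))

# Consecutive harmonics k and k+1 of A4 (440 Hz): the ratio (k+1)/k shrinks as k grows
print(round(cochlear_distance_mm(3 * 440, 4 * 440), 2))   # harmonics 3-4: ~1.66 mm
print(round(cochlear_distance_mm(9 * 440, 10 * 440), 2))  # harmonics 9-10: ~0.61 mm, below the ~1 mm masking limit
```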

In normal musical listening it is always possible to identify the fundamental even when it is not itself perceived, whereas in our study this did not always occur. The low performance reported in our experiment can be explained by the adoption of synthetic tones in which all the harmonics have the same intensity, a situation far from natural listening. Moreover, in real music listening, a normally hearing subject never perceives only two or four harmonics. Since the masking effect is mainly due to background noise, whose acoustic pressure is higher up to about 500 Hz24, all the harmonics above 500 Hz can be clearly distinguished, even for notes whose fundamental is below this frequency (i.e., notes below C5, the 52nd key of the piano keyboard), leading to the identification of the fundamental frequency of the perceived harmonics.

Conclusion

Our study confirms that it is possible to identify a note solely on the basis of harmonics near the fundamental frequency, and identification success is higher when the lowest harmonic presented lies below 1500–2000 Hz. Moreover, we have demonstrated a higher rate of correct identification in the presence of more harmonics.

Our results could have implications for models and computational algorithms for pitch determination. A better understanding of the mechanisms humans use for fundamental note identification should lead to improved computer listening capabilities for the same tasks.