Abstract
The LLaMAC dataset was developed to predict the success of audio-visual media via emotion prediction. It was created using low-cost biosignal sensors, with emotional questionnaires in both continuous (valence, arousal, dominance) and discrete domains (emotion type and intensity: neutral, fun, sadness, anger, fear), and included over 100 participants. Questionnaires on liking and familiarity were also collected. The dataset contains five biosignals—EEG, GSR, PPG, SKT, and RESP—and seven questionnaire measures. Biosignals were validated using statistical metrics and signal-to-noise ratios, while questionnaire responses were assessed with scatter plots and statistical analyses. Emotion classification was performed using a Light Gradient Boosting Machine (LightGBM). The dataset enables biosignal-based prediction of emotions and liking, correlation analysis between continuous and discrete emotions, and investigation of biosignal differences related to familiarity, which can further inform emotion and liking predictions.
Background & Summary
For those who produce audio-visual media, such as movies, dramas, over-the-top (OTT) media services, musical shows, and performances, the emotional state of viewers is an important factor, because it affects the success of the audio-visual media1. Emotion prediction therefore provides significant benefits to producers of audio-visual media2. Accordingly, various studies have attempted to predict viewers’ emotional states through facial expressions3,4, gestures3,4, and voices3. More recently, research on predicting viewers’ emotional states from their biosignals has been actively conducted5. Such research, however, requires a dataset consisting of biosignals and emotional states. As shown in Table 1, many datasets pairing biosignals with the emotional states of people watching audio-visual media have therefore been introduced. Among them, DREAMER2, ASCERTAIN6, AMIGOS7, Emognition8, and EmoWear9 were created using low-cost biosignal sensors; AMIGOS7, Emognition8, MAHNOB-HCI10, FACED11, and Yang et al.12 surveyed emotions in both continuous and discrete domains; FACED11 and G-REx13 were collected from more than 100 subjects; and ASCERTAIN6, AMIGOS7, EmoWear9, FACED11, and DEAP14 included liking and familiarity in their surveys. However, no existing dataset satisfies all four conditions: low-cost biosignal sensors, emotion surveys in both continuous and discrete domains, more than 100 subjects, and surveys of both liking and familiarity. This paper therefore introduces a dataset that satisfies all of these conditions. The importance of each condition is explained below.
First, it is important to create a dataset using low-cost biosignal sensors, because translational value requires that emotion prediction work with the types of devices consumers will actually use. The number of channels, measurement range, and resolution generally vary with the price of a biosignal sensor. When biosignal-based emotion prediction becomes commercialized, users will rely on low-cost, consumer-grade biosignal sensors. The criteria for defining low-cost sensors are as follows: first, their price is significantly lower than that of medical-grade devices, improving accessibility for researchers and developers; second, they have been explicitly described as low-cost or consumer-grade biosignal sensors in prior literature. The low-cost sensors referred to here are, of course, those whose performance is suitable for emotion prediction, such as the Emotiv Epoc-X2. For this reason, previous studies have used low-cost biosignal sensors such as the Emotiv Epoc-X, NeuroSky MindWave, Muse 2, Empatica E4, Shimmer ECG, and Shimmer GSR2,6,7,8,9. Therefore, considering the commercialization of the technology, it is reasonable to create a dataset using low-cost biosignal sensors, which ensures both experimental validity and real-world scalability.
Second, it is important to create a dataset that surveys both continuous and discrete emotions, because translationally relevant models must operate across different theoretical and practical representations of affect. In affective computing, emotional response data can be structured into continuous and discrete domains. The continuous model captures affect along dimensional axes such as valence, arousal, and dominance15, while the discrete model categorizes emotions into basic types such as joy, anger, or sadness. The selection of these categories was grounded in the basic emotions framework16, which identifies a set of universally recognizable emotions with distinct expressive, physiological, and behavioral signatures. Continuous-domain emotion has the advantage of describing the whole emotional space without being affected by differences in how emotions are expressed in language2,10. Discrete-domain emotion, on the other hand, is a long-standing approach that is easier to understand intuitively2. The emotion survey was conducted in both domains to enable biosignal-based emotion prediction in either domain and, additionally, to identify the relationship between continuous- and discrete-domain emotions. Previous studies have also conducted surveys in both domains7,8,10,11,12. It is therefore reasonable to conduct emotion surveys in both domains, enhancing the dataset’s flexibility for real-world applications that may demand either categorical or dimensional outputs.
Third, it is important to create a dataset with more than 100 participants, since translational relevance also depends on robustness across diverse users. Such datasets gain statistical power as the number of participants increases, and previous studies have accordingly conducted experiments with large numbers of participants11,13. It is therefore reasonable to run the experiment with a large number of people, enabling predictive models that generalize across heterogeneous populations rather than being limited to narrow laboratory cohorts.
Finally, it is important to create a dataset that surveys liking and familiarity, since translational value requires accounting for ecological factors that shape real-world emotional responses. Liking asks whether participants liked the audio-visual media they viewed, and familiarity asks whether they had viewed the media before the experiment. As mentioned earlier, this dataset was created to capture the state of people watching audio-visual media. For producers, not only the audience’s emotions but also their liking is an important factor, because liking is a more direct evaluation of the media than emotion. Whether the media has already been seen also matters: participants may not feel the same tension when watching a horror movie they have already seen, and if familiarity produces differences in biosignals, classification should take this difference into account. For this reason, previous studies have included liking and familiarity in their questionnaires6,7,9,11,14. It is therefore reasonable to include liking and familiarity in the survey, anchoring the dataset in ecologically valid user experiences.
Methods
This section describes the ethical procedures for protecting participants, the experimental environment, pre-experimental prohibitions, participant demographics, the audio-visual media used for emotion elicitation, the biosignal sensors worn by participants, the emotion questionnaire, and the experimental protocol and procedures.
Ethics statement
The research background and purpose, research period, participant requirements, recruitment method, consent acquisition method, research method, biosignal sensors, analysis method, safety evaluation criteria and evaluation method, expected adverse effects/cautions and their solutions, experiment pause and dropout criteria, participants’ risks and benefits, and safety and privacy protection methods of this experiment were reviewed by the institutional review board (IRB) of the Korea Institute of Industrial Technology (KITECH) under application number 2024-004-002. The board did not provide a consent waiver.
Experimental environment
As shown in Fig. 1, the experimental space contained three separate desks and chairs to block the surrounding view and create an isolated environment, preventing visual distraction from other participants and the surroundings2. Audio-visual media was played, biosignals were acquired, and surveys were conducted via a laptop installed on each desk. Visual media was presented on a 17.3-inch laptop monitor and audio was presented through wired earphones to prevent auditory disturbance from other participants and the surroundings. With identical, independent desks and individual earphones, three participants could perform the experiment in the same space without disturbing each other. Finally, an air conditioner was installed to limit changes in temperature and humidity, and the set temperature was kept constant at 22 °C, since body temperature and skin conductivity can change with ambient temperature and humidity.
All biosignals in the dataset were recorded through a single master computer, with all devices directly synchronized to its system clock, which also served as the source for all timestamps. This design inherently prevents inter-controller misalignment and temporal drift, ensuring clock-level synchronization across all signals without the need for additional validation.
Affective stimuli
Previous studies used audio-visual media for emotion elicitation, and some used self-generated audio-visual media14,17,18,19,20,21,22. However, most of these materials were in English and were difficult to use with the general Korean public, who are not necessarily fluent in English. The authors of this paper therefore created audio-visual media from Korean trailers and scenes from movies, dramas, and documentaries. We prepared a set of 20 candidate audio-visual media for each target emotion. A pilot study involving 97 participants, none of whom took part in the main experiment, was conducted to independently assess the emotional elicitation effectiveness of these candidates. Participants reported their emotional responses using both continuous ratings and discrete labels. Based on these results, we excluded media with a high rate of cross-emotional misattribution, where participants consistently responded with a different emotion than intended; for example, media designed to induce anger but frequently rated as sadness were removed. The number of remaining candidates varied by emotion depending on the degree of misattribution observed. From these validated candidates, we selected a final set of 10 audio-visual media for each emotion, aiming to preserve within-category diversity while minimizing potential bias from manual selection. This process balanced emotional clarity against representative variation, making the dataset suitable for training robust and generalizable affective models.
As a result, a total of 50 audio-visual media were prepared, 10 for each of the five emotions: neutral, fun, sadness, anger, and fear. Information about each medium is shown in Table 2. Each clip was 60 seconds long, because audio-visual media of 1 to 10 minutes in length can trigger emotions20,23,24, and such short clips prevent multiple emotions from being triggered or from becoming habitual10.
Biosignal sensors
Three biosignal sensors were used to acquire five biosignals from the participants; their specifications are shown in Table 3. The Emotiv Epoc-X is a low-cost electroencephalogram (EEG) sensor suitable for affective computing2 and was used to create the DREAMER2 and AMIGOS7 datasets. EEG sensors such as the Biosemi Active II (used in the MAHNOB-HCI and DEAP datasets10,14) cost over 12,000 EUR, and the DSI-24 (Yang et al.12) costs at least 22,000 USD, whereas the Emotiv Epoc-X, employed in the DREAMER dataset and recognized as a low-cost device, is priced at approximately 1,000 USD. The Empatica E4 is a low-cost galvanic skin response (GSR), photoplethysmograph (PPG), and skin temperature (SKT) sensor, and a practical and valid tool for studying heart rate and heart rate variability in stationary conditions25. It is as effective as the Biopac MP150 in emotion recognition tasks26 and has been validated in such tasks8,27,28,29. While the Biosemi Active II system was used for blood flow, skin conductance, and body temperature measurements in MAHNOB-HCI and DEAP10,14 (≥12,000 EUR), the Empatica E4, applied in Emognition and WESAD8,30 and described as a consumer-grade biosignal sensor, costs about 1,700 USD. For respiration (RESP), the Biosemi Active II (≥12,000 EUR) was replaced with the GDX-RB, which is available for approximately 125 USD.
Self-assessment questionnaire
After watching the prepared audio-visual media, the participants filled out a seven-item questionnaire. It included the continuous domain emotion (valence, arousal, dominance), the discrete domain emotion (type and intensity of emotion: neutral, fun, sadness, anger, and fear), liking for the audio-visual media, and whether the audio-visual media had been seen (familiarity).
First, the continuous domain emotion consists of three items: valence, arousal, and dominance, and the participants answered with a number between 1 and 5. This questionnaire has been used in many previous studies investigating emotions2,7,9,10,12,14,17 since it was proposed by Russell15. Among them, valence refers to the intrinsic positivity or negativity of an emotion. A high-valence emotion is pleasant, while a low-valence emotion is unpleasant. Next, arousal indicates the intensity or activation level of an emotion, ranging from calm (low) to highly excited (high). Finally, dominance describes the degree of control or power an individual feels over a situation or emotion, ranging from submissive (low) to dominant (high).
Second, the discrete domain emotion consists of two items: emotion type and intensity, and participants were asked to express one emotion among neutral, fun, sadness, anger, and fear, and to rate the intensity of that emotion with a number between 1 and 6.
Third, participants expressed their liking for audio-visual media by choosing between disliking, neutral, and liking. As mentioned in Affective stimuli section, the affective stimuli consisted of trailers and scenes from movies, dramas, and documentaries. Participants were asked to select ‘liking’ if they wanted to watch the movie or the entire audio-visual media after watching the prepared 60-second audio-visual media, or ‘disliking’ if they did not want to watch the movie or the entire audio-visual media. This can be used to perform biosignal-based liking prediction for audio-visual media.
Fourth, participants answered whether they had seen or not seen the audio-visual media used in the experiment. Participants were asked to select ‘seen’ if they had seen the edited 60-second audio-visual media, the original audio-visual media, or the movie/drama/documentary of the corresponding audio-visual media, or ‘not seen’ if they had not seen any of it.
Experimental procedure and protocol
Before the experiment, participants were given two prohibitions: not to drink alcohol the day before the experiment and not to consume caffeine on the day of the experiment, because both affect the cardiovascular system. Such prohibitions are common in studies measuring biosignals8,30.
Participants appeared in the experimental space at the appointed time. Depending on the recruitment of participants, as few as one and as many as three participants conducted the experiment together. As shown in Fig. 2, the overall experimental process is structured in three hierarchical levels. The first row represents the pre-experimental explanation, consent form signing, sensor placement, four experimental blocks, and three inter-block breaks. The second row details the structure within each block, consisting of 12 or 13 trials. The third row illustrates the trial sequence, where each trial involved 60 seconds of audio-visual media viewing followed by at least 15 seconds of self-assessment.
After all recruited participants appeared, the contents of the experiment were introduced. This included the research background and purpose, participant requirements, conditions for ineligibility for participation, biosignal sensors to be worn, the experimental task, the experimental procedure, the experimental time, expected adverse effects/cautions and solutions for these, reward, reasons for reward restrictions, voluntary participation and withdrawal from the experiment, and safety and privacy protection methods. The experimental task was to survey emotions after viewing 50 audio-visual media, and it was emphasized that the survey was about elicited emotions, not intended emotions from audio-visual media. The experiment took about 150 minutes, and the reward for participating in the experiment was about $35.
Then, the graphical user interface (GUI) of the experiment shown in Fig. 3 was introduced. First, as shown in the upper part of Fig. 3a, the trial number was presented. This was to provide information about the progress of the experiment. The participant started the experiment by pressing the ‘Begin’ button of Fig. 3a. Then, a 60-second audio-visual media was played. The participant was asked to look only at the monitor while the audio-visual media was played. This is because there may be participants who do not faithfully perform the task of feeling emotions while looking at the audio-visual media.
After the audio-visual media was played, the participants filled out the questionnaire shown in Fig. 3b. As in the DREAMER dataset experiment2, the meaning of each item was explained so that participants could understand the questionnaire, and all items were then illustrated with three specific examples. Participants were asked to spend at least 15 seconds (resting time) filling out the questionnaire, as shown in the upper right part of Fig. 3b, to allow them to return from the elicited emotion to a neutral state. Such survey time limits were common in previous studies6,17,31. While a short 15-second interval was applied between successive trials within the same emotional category, the experiment was structured to minimize strong emotional contrast by presenting stimuli of the same category consecutively (e.g., strong fun followed by mild fun). In addition, when transitioning between different emotional categories, particularly distant ones such as fun and sadness, an extended recovery phase was implemented, consisting of a 5-minute break and the presentation of at least two neutral stimuli, ensuring a minimum interval of 7 minutes and 30 seconds before the next emotion-inducing stimulus. After completing the questionnaire, a ‘Next’ button appeared on the screen, as shown in the lower part of Fig. 3c, allowing participants to start the next trial. In addition, as shown in the lower part of Fig. 3c, the intended emotion of the next trial was presented as the ‘following emotion’. Without this information, participants could have experienced fear while viewing neutral audio-visual media after viewing consecutive fear-inducing clips; it also prevented participants from being startled when watching fear-inducing media.
While viewing 50 audio-visual media, three 5-minute breaks were given. Breaks were intended to keep participants focused on the experiment and prevent physical and mental fatigue. Breaks began after viewing the 13th, 26th, and 38th audio-visual media. Instructions for this break were provided as shown in the lower part of Fig. 3d. During the break, the Emotiv Epoc-X was removed. This was because wearing the Emotiv Epoc-X for a long time could cause pain in the scalp. After the break, the ‘Next’ button appeared in the same way, allowing the next trial to begin.
Then, the participant heard a description of the sleeping time shown in the lower part of Fig. 3a,b,c. This was a value that increased by one if the participant closed their eyes for more than 5 seconds, and the closing of the eyes was identified through Tobii Pro Spark. Participants were dropped out when their sleeping time reached 3. This was done to prevent participants from losing focus on the experiment. The experiment ended after viewing all 50 audio-visual media.
Finally, the participant was told again about the precautions: first, not to doze off or fall asleep; second, not to move their head, non-dominant hand, or body; and last, not to touch any of the biosignal sensors they were wearing.
After hearing this explanation, participants signed a consent form indicating that they wanted to participate in the experiment. This consent also includes consent to share their data in an anonymized form. Then, the participants were asked to turn off all smart devices, including smartphones, smartwatches, and smartbands. This is because notifications generated from smart devices interfere with the participant’s task. Then, his or her body temperature was measured and the biosignal sensors were worn. As shown in Fig. 4, the Empatica E4 was worn on the non-dominant hand, the Emotiv Epoc-X was worn on the head according to the 10-20 system, and the Vernier GDX-RB was worn on the abdomen or chest. Then, the participant was instructed not to move his or her non-dominant hand and head, and then the second temperature measurement was performed. At this time, the two temperature measurements were performed to confirm that the participant had reached a normal body temperature. There was a time gap of approximately 30 minutes between the two temperature measurements. If the participant’s body temperature changed by more than 1 °C, the body temperature was measured again after 5 minutes. Once the body temperature was determined to be normal, the experiment was started.
The order in which the prepared audio-visual media are presented to participants is also important; Fig. 5 shows an example. The experiment consisted of four blocks containing 13, 13, 12, and 12 trials, respectively. Audio-visual media eliciting neutral emotion were presented in the first 3, 3, 2, and 2 trials of each block to remove daily dependencies2, to acquire normal-state biosignals, and to allow recovery from previously elicited emotions31. In the remaining 10 trials of each block, audio-visual media eliciting the same emotion were played sequentially, with the emotion of each block selected randomly. Grouping media of the same emotion into one block spares participants from sudden emotional changes; this block-level protocol is similar to that of previous dataset experiments11,12. The 10 prepared audio-visual media for each emotion were also presented in a fully randomized order, and this fully randomized order likewise applied to the audio-visual media eliciting neutral emotion. In summary, the presentation order of emotions was fully randomized at the block level, and the presentation order of audio-visual media was fully randomized at the trial level.
Participants
A total of 114 people participated in the experiment; their mean age was 27.73 years (standard deviation, SD, 9.08), ranging from 20 to 57 years, and 57 were male and 57 were female. However, the data of six participants were invalid, for the following reasons: 1. One participant consumed caffeine on the day of the experiment. 2. One participant moved continuously during the experiment. 3. One participant withdrew from the experiment due to fear of the fear-eliciting audio-visual media. 4. One participant’s data was missing due to a sensor malfunction. 5. One participant did not understand Korean fluently. All audio-visual media were prepared in Korean, incorporating dialogue, narration, and culturally specific references; the excluded participant reported difficulty in comprehending the content during the post-experiment debriefing, indicating that limited Korean language proficiency hindered their engagement with the stimuli. Because proper comprehension of the media was essential for eliciting reliable emotional responses and maintaining internal validity, this participant’s data was excluded. 6. One participant was found to have a tic disorder, so their data was excluded.
Data Records
The dataset is available at ‘LLaMAC: Low-cost Biosignal Sensor based Large Multimodal Dataset for Affective Computing’32. In this dataset, each folder represents a participant number, and each folder contains the files answer.csv, band_#.csv, eeg_#.csv, and respiration_#.csv, where # represents the trial number that each participant performed in the experiment. Trial numbers range from 1 to 50.
1. answer.csv
The first column of this file represents the trial number, and the second column represents the number of the audio-visual media presented to the participant; these numbers follow the order of Table 2. Columns 3 to 5 show the results of the continuous domain emotion questionnaire on a 5-point scale, measuring valence (1 = negative to 5 = positive), arousal (1 = calm to 5 = highly excited), and dominance (1 = submissive to 5 = dominant). Column 6 shows the results of the liking questionnaire (1: disliking, 2: neutral, 3: liking). Columns 7 and 8 show the results of the discrete domain emotion questionnaire: column 7 records the emotion type (1: neutral, 2: fun, 3: sadness, 4: anger, 5: fear) and column 8 records the intensity of that emotion on a 6-point scale (1 = weakest, 6 = strongest). Finally, column 9 shows the results of the familiarity questionnaire (1: unseen, 2: seen).
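For readers who want to work with answer.csv directly, the following minimal sketch attaches readable column names based on the layout above. It assumes pandas, a headerless CSV, and a hypothetical participant folder name; check both against the actual download.

```python
import pandas as pd

# Minimal sketch: load one participant's answer.csv and attach readable
# column names based on the layout described in the text. Whether the file
# ships with a header row, and the folder naming ("001"), are assumptions.
cols = ["trial", "media_id", "valence", "arousal", "dominance",
        "liking", "emotion_type", "emotion_intensity", "seen"]
answers = pd.read_csv("001/answer.csv", header=None, names=cols)

# Map coded responses to labels for readability (codes taken from the text).
emotion_map = {1: "neutral", 2: "fun", 3: "sadness", 4: "anger", 5: "fear"}
liking_map = {1: "disliking", 2: "neutral", 3: "liking"}
answers["emotion_label"] = answers["emotion_type"].map(emotion_map)
answers["liking_label"] = answers["liking"].map(liking_map)
print(answers.head())
```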
2. band_#.csv
This file contains raw GSR, PPG, and SKT data. Columns 1 and 2 show the raw GSR values (μS) and their acquisition times, columns 3 and 4 show the raw PPG values and their acquisition times, and columns 5 and 6 show the raw SKT values (°C) and their acquisition times. All times are given in UNIX time. Some participants’ GSR signals showed values outside the measurable range of the sensor because the electrodes can lose contact with the skin. We therefore recommend not using the GSR signals of participants 32, 37, 40, 47, 54, 55, 56, 70, and 99, and excluding the specific trials of the participants listed in Table 4 for the same reason.
3. eeg_#.csv
The first column of this file is the EEG signal acquisition time, given in UNIX time. Columns 2 to 15 contain the raw 14-channel EEG signals (μV) for AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4. Because some EEG signals are missing, we recommend excluding certain trials of certain participants, as listed in Table 4.
4. respiration_#.csv
The first column of this file presents the time at which the respiration sensor acquired the signal in the format year-month-day hour:minute:second.microsecond. The second column is the raw signal (N) measured by the belt-type respiration sensor.
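A similar sketch, under the same assumptions about headers and folder naming, loads one trial’s biosignal files for a single participant.

```python
import pandas as pd

participant, trial = "001", 1   # hypothetical folder name and trial number

# band_#.csv: GSR value/time, PPG value/time, SKT value/time (UNIX timestamps),
# following the column layout described above; headerless CSV is assumed.
band = pd.read_csv(f"{participant}/band_{trial}.csv", header=None,
                   names=["gsr", "gsr_t", "ppg", "ppg_t", "skt", "skt_t"])

# eeg_#.csv: UNIX time followed by 14 raw EEG channels (microvolts).
eeg_channels = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
                "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]
eeg = pd.read_csv(f"{participant}/eeg_{trial}.csv", header=None,
                  names=["time"] + eeg_channels)

# respiration_#.csv: timestamp string and raw belt force (N).
resp = pd.read_csv(f"{participant}/respiration_{trial}.csv", header=None,
                   names=["time", "force"], parse_dates=["time"])
```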
Biosignal processing
Heart rate (HR) was estimated from the PPG signal by first centering the raw signal via median subtraction to remove baseline fluctuations. A Butterworth bandpass filter (0.5 - 8 Hz) was applied to isolate cardiac-related frequencies and reduce noise and motion artifacts. Peaks corresponding to heartbeats were detected using an adaptive peak detection algorithm with a minimum peak distance of 0.35 s and a prominence fraction of 0.08, after which inter-beat intervals (IBIs) were calculated and converted to beats per minute (BPM). Heart rate variability (HRV) features, including the mean and standard deviation of IBIs, the root mean square of successive differences (RMSSD), and pNN50 (IBI threshold of 0.05 s), were also derived. In addition, the temporal slope and zero-crossing rate (ZCR) were computed to characterize trend and fluctuation dynamics.
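The HR/HRV pipeline can be reproduced approximately with SciPy. The sketch below is not the authors’ code: it assumes a 64 Hz PPG stream (the E4’s nominal rate, not stated above) and interprets the prominence fraction as 8% of the filtered signal’s peak-to-peak range.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def ppg_to_hr_features(ppg, fs=64.0):
    """Sketch of the HR/HRV pipeline described above; fs and the exact
    prominence definition are assumptions."""
    # Centre the signal by median subtraction to remove baseline drift.
    x = np.asarray(ppg, dtype=float) - np.median(ppg)
    # Butterworth band-pass (0.5-8 Hz) to isolate cardiac frequencies.
    b, a = butter(3, [0.5, 8.0], btype="band", fs=fs)
    x = filtfilt(b, a, x)
    # Peak detection: minimum inter-peak distance 0.35 s; prominence taken
    # here as 8 % of the peak-to-peak range (one reading of the fraction).
    peaks, _ = find_peaks(x, distance=int(0.35 * fs),
                          prominence=0.08 * np.ptp(x))
    if len(peaks) < 3:
        return None                                # too few beats for HRV
    ibi = np.diff(peaks) / fs                      # inter-beat intervals (s)
    bpm = 60.0 / ibi
    diffs = np.diff(ibi)
    return {
        "hr_mean_bpm": bpm.mean(),
        "ibi_mean": ibi.mean(),
        "ibi_std": ibi.std(),
        "rmssd": np.sqrt(np.mean(diffs ** 2)),
        "pnn50": np.mean(np.abs(diffs) > 0.05),    # 0.05 s threshold
    }
```

The same peak-based logic carries over to RESP with a 1.0 s minimum peak distance and breaths per minute instead of BPM.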
RESP signals were processed similarly: an adaptive peak detection algorithm with a minimum peak distance of 1.0 s and a prominence fraction of 0.08 was used to identify breaths, followed by calculation of inter-breath intervals (IBIs) and conversion to breaths per minute (BPM), with additional measures including the mean and standard deviation of IBIs, the temporal slope, and the ZCR.
GSR signals were interpolated to address missing values and analyzed in terms of both tonic and phasic components. Phasic activity was characterized by the number of skin conductance responses (SCRs) per trial, along with their mean and maximum amplitudes. Tonic activity was summarized using robust descriptive statistics (mean, variance, interquartile range, and quantiles) as well as the linear slope. In addition, the ZCR was calculated to capture fluctuation dynamics.
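The paper does not specify how the tonic and phasic components were separated, so the sketch below uses a simple rolling-median baseline as the tonic estimate and counts peaks in the residual as SCRs. The sampling rate (4 Hz, the E4’s nominal EDA rate) and the minimum SCR amplitude are likewise assumptions; treat this as an illustration, not the authors’ method.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

def gsr_features(gsr, fs=4.0, scr_min_amp=0.01):
    """Illustrative GSR feature sketch: rolling-median tonic estimate,
    peak-based SCR detection on the phasic residual, tonic statistics,
    linear slope, and zero-crossing rate. fs and scr_min_amp are assumed."""
    x = pd.Series(gsr, dtype=float).interpolate(limit_direction="both")
    tonic = x.rolling(int(10 * fs), center=True, min_periods=1).median()
    phasic = (x - tonic).to_numpy()
    # Skin conductance responses: peaks in the phasic component.
    peaks, props = find_peaks(phasic, prominence=scr_min_amp)
    amps = props["prominences"]
    t = np.arange(len(x)) / fs
    slope = np.polyfit(t, tonic.to_numpy(), 1)[0]   # linear trend of tonic level
    centred = phasic - phasic.mean()
    zcr = np.mean(np.signbit(centred[:-1]) != np.signbit(centred[1:]))
    return {"scr_count": len(peaks),
            "scr_amp_mean": amps.mean() if len(amps) else 0.0,
            "scr_amp_max": amps.max() if len(amps) else 0.0,
            "tonic_mean": tonic.mean(), "tonic_var": tonic.var(),
            "slope": slope, "zcr": zcr}
```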
SKT signals were represented through descriptive statistics, temporal slope, and interquartile range, providing insights into both baseline levels and gradual thermal drifts over the course of the trials.
EEG signals were recorded using the Emotiv Epoc-X headset, which employs 14 saline-based electrodes (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) referenced to CMS/DRL electrodes that implement active common-mode noise rejection. Device-level preprocessing included a 0.2-45 Hz band-pass filter, 50/60 Hz notch filters, and a 5th-order sinc digital filter. As these filters are applied by the device, no additional software-based band-pass or notch filtering was performed. Independent component analysis (ICA) was not used for the removal of ocular (typically < 4 Hz) or electromyographic (typically > 20 Hz) artifacts. Instead, corrupted or missing trials and channels were excluded following a trial-level quality control procedure (see Table 4). This strategy ensured that subsequent analyses were conducted on minimally processed but quality-controlled EEG signals.
Feature extraction was then applied to the retained EEG data. In the time domain, features included robust descriptive statistics (mean, variance, skewness, kurtosis, interquartile range, quantiles, root mean square (RMS), and energy), Hjorth parameters (activity, mobility, complexity), line length, zero-crossing rate, and temporal slope. In the frequency domain, Welch’s method was used to estimate the power spectral density (PSD), from which absolute and relative band powers were calculated for δ (1-4 Hz), θ (4-7 Hz), α (8-13 Hz), β (13-30 Hz), and γ (30-45 Hz). Additional spectral features included spectral entropy and alpha peak frequency. Trials with insufficient valid samples produced NaN values and were excluded from aggregate analyses.
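A minimal sketch of the Welch-based band powers and Hjorth parameters is shown below; the sampling rate and the Welch segment length are assumptions, not values stated in the paper.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 7), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def eeg_band_powers(channel, fs=128.0):
    """Absolute and relative band powers from Welch's PSD estimate.
    fs and the 2-second segment length are assumptions."""
    f, psd = welch(channel, fs=fs, nperseg=int(2 * fs))
    df = f[1] - f[0]
    total_mask = (f >= 1) & (f <= 45)
    total = psd[total_mask].sum() * df
    feats = {}
    for name, (lo, hi) in BANDS.items():
        m = (f >= lo) & (f <= hi)
        abs_p = psd[m].sum() * df
        feats[f"{name}_abs"] = abs_p
        feats[f"{name}_rel"] = abs_p / total if total > 0 else np.nan
    return feats

def hjorth(channel):
    """Hjorth activity, mobility, and complexity of one EEG channel."""
    d1, d2 = np.diff(channel), np.diff(np.diff(channel))
    activity = np.var(channel)
    mobility = np.sqrt(np.var(d1) / activity)
    complexity = np.sqrt(np.var(d2) / np.var(d1)) / mobility
    return activity, mobility, complexity
```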
Technical Validation
This section validates the generated data, covering the audio-visual media, the biosignals, and the self-assessments.
Validation for audio-visual media
To evaluate the effectiveness of this selection procedure, we compared emotion induction performance between the initial, non-selected pool and the refined set. As shown in Table 5, the refined set achieved higher precision for most emotions, with particularly large improvements for anger (from 0.705 to 0.893) and fear (from 0.887 to 0.936), which are often confounded in affective computing due to their shared high-arousal, low-valence characteristics. Cohen’s κ improved from 0.682 to 0.753, indicating stronger agreement between intended and perceived emotions.
While pilot data validated the initial selection of stimuli, their effectiveness was further assessed in the main experiment with 108 participants, yielding 5,400 self-assessment responses. Precision for each discrete emotion was computed, and overall performance metrics showed strong agreement between intended and reported emotions, with overall accuracy of 0.843 and Cohen’s κ of 0.804 (see Table 6). To evaluate stimulus-level performance, intended labels were compared with participant-reported labels, and a confusion matrix displaying both raw counts and row-normalized percentages was generated (see Fig. 6). The diagonal percentages in the matrix show the elicitation rate of each video, clearly indicating how effectively each stimulus elicited its target emotional response.
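The released notebook computes these agreement metrics (see Usage Notes); the condensed sketch below shows the same computation with scikit-learn. The variable names and the toy inputs are hypothetical; in practice, intended labels come from Table 2 and reported labels from column 7 of answer.csv across all 5,400 assessments.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

LABELS = ["neutral", "fun", "sadness", "anger", "fear"]

def stimulus_agreement(intended, reported, labels=LABELS):
    """Accuracy, Cohen's kappa, and a row-normalised confusion matrix
    between intended and reported emotion labels (strings assumed)."""
    acc = accuracy_score(intended, reported)
    kappa = cohen_kappa_score(intended, reported, labels=labels)
    cm = confusion_matrix(intended, reported, labels=labels)
    cm_pct = cm / cm.sum(axis=1, keepdims=True) * 100
    return acc, kappa, cm_pct

# Toy usage with one response per intended emotion.
acc, kappa, cm_pct = stimulus_agreement(
    ["neutral", "fun", "sadness", "anger", "fear"],
    ["neutral", "fun", "sadness", "fear", "fear"])
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```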
Validation for biosignals
As shown in Tables 7, 8, 9 and 10, statistics and signal-to-noise ratios (SNRs) were computed to validate the quality of the recorded biosignals. SNRs were computed using an autocorrelation-based center fitting technique, with quadratic polynomial interpolation performed over four samples (excluding the center, including two samples on either side). This SNR computation has also been used in previous studies8,9. SNR values were computed for each biosignal, each trial, and each participant. The mean, SD, maximum, Q75, Q50, Q25, Q0.3, and minimum of the SNR values for each biosignal are shown in Tables 9 and 10, where Q75, Q50, Q25, and Q0.3 represent the top 75%, 50%, 25%, and 0.3% SNR values, respectively. As shown in Tables 9 and 10, the mean SNR ranges from 11.42 dB to 43.37 dB and the SD ranges from 0.50 dB to 2.37 dB. Additionally, the proportion of SNR values below 5 dB and 10 dB for each biosignal is shown in Tables 9 and 10.
All biosignals except GSR, HR, and RESP had SNR values greater than 21.47 dB, while 99.98%, 99.98%, and 75.04% of GSR, HR, and RESP values, respectively, exceeded 10 dB.
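One possible reading of the autocorrelation-based center-fitting estimate is sketched below (see also the released ‘compute_snr.py’ script): the zero-lag autocorrelation gives signal-plus-noise power, a quadratic fit through lags ±1 and ±2 (excluding lag 0) interpolates the noise-free value at lag 0, and the difference is attributed to noise. The normalization details are assumptions, not the authors’ implementation.

```python
import numpy as np

def snr_autocorr(x):
    """Assumed interpretation of the autocorrelation-based SNR estimate:
    quadratic interpolation over lags -2, -1, +1, +2 predicts the signal
    power at lag 0; the excess at lag 0 is treated as noise power."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.correlate(x, x, mode="full") / n          # biased autocorrelation
    mid = n - 1                                      # index of lag 0
    lags = np.array([-2, -1, 1, 2])
    coeffs = np.polyfit(lags, r[mid + lags], 2)      # quadratic through 4 samples
    signal_power = np.polyval(coeffs, 0.0)           # interpolated value at lag 0
    noise_power = r[mid] - signal_power
    if noise_power <= 0 or signal_power <= 0:
        return np.nan
    return 10.0 * np.log10(signal_power / noise_power)
```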
Validation for self-assessments
Figure 7 shows the valence and arousal distribution from self-assessment. Figure 7a shows 2D kernel density estimates (KDE) for each emotion with overlaid mean points per video, highlighting skewness, tails, and multimodal patterns to emphasize non-Gaussian features. Figure 7b summarizes each emotion using pooled means and RMS-based circular dispersion bands, which replace error bars and explicitly account for low or variable valence-arousal correlations. Additionally, 95% covariance ellipses were computed to support the main findings. Together, the KDE, circular bands, and covariance ellipses provide a clearer and more accurate visualization of the valence-arousal distributions across all emotions.
Next, statistical analysis was conducted to validate that participants elicited different emotions depending on the intended emotion. One-way analysis of variance (ANOVA) was used for the analysis, with the intended emotion of audio-visual media as the independent variable and valence, arousal, and dominance as the dependent variables. As a result, there were statistically significant differences in valence [F(4, 5395) = 3017.682, p < 0.001***], arousal [F(4, 5395) = 1044.064, p < 0.001***], and dominance [F(4, 5395) = 447.357, p < 0.001***] depending on the intended emotion. Additionally, a post-hoc pairwise comparison with Tukey’s honestly significant difference (HSD) test showed that there was no statistically significant difference between the dominances of fun and anger (p = 0.372). On the other hand, there was a statistically significant difference between all other comparisons (p < 0.001***).
Additionally, a Chi-square test was conducted to assess the congruence between the intended emotions of the audio-visual media and the reported discrete emotions from the 5,400 self-assessments. The analysis revealed a statistically significant association between the two variables [χ2(16) = 14824.090, p < 0.001***]. These results indicate a strong association between participants’ reported discrete emotions and the intended emotion of the audio-visual media, supporting the validity of the emotional classifications in the dataset.
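These tests can be reproduced with SciPy and statsmodels as sketched below. The frame `df` is a hypothetical table with one row per self-assessment (built from answer.csv plus the stimulus list in Table 2), and statsmodels is an additional dependency not listed with the released notebook.

```python
import pandas as pd
from scipy.stats import f_oneway, chi2_contingency
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def self_assessment_tests(df):
    """Sketch of the reported tests. `df` is assumed to have columns
    'intended', 'reported', 'valence', 'arousal', and 'dominance'."""
    anova = {}
    for dv in ["valence", "arousal", "dominance"]:
        groups = [g[dv].to_numpy() for _, g in df.groupby("intended")]
        anova[dv] = f_oneway(*groups)                      # one-way ANOVA
    tukey = pairwise_tukeyhsd(df["valence"], df["intended"])  # post-hoc HSD
    chi2, p, dof, _ = chi2_contingency(
        pd.crosstab(df["intended"], df["reported"]))       # association test
    return anova, tukey, (chi2, p, dof)
```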
Also, the distribution of continuous-domain emotion responses grouped by the participants’ discrete-domain emotion responses is shown in Fig. 7b. This distribution is intended to confirm the correlation between continuous- and discrete-domain emotions. The valence-arousal distribution by audio-visual media in Fig. 7a was similar to the distribution by discrete-domain emotion response in Fig. 7b. The valence-arousal distributions for the discrete responses neutral, fun, sadness, anger, and fear were [high valence / low arousal], [high valence / high arousal], [low valence / low arousal], [low valence / high arousal], and [low valence / high arousal], respectively.
Next, statistical analysis was performed to confirm whether the differences in valence, arousal, and dominance according to each discrete domain emotion response showed statistically significant differences. Discrete domain emotion responses were used as independent variables and valence, arousal, and dominance were used as dependent variables. As a result, statistically significant differences were found in all of valence [F(4, 5395) = 4197.515, p < 0.001***], arousal [F(4, 5395) = 1328.217, p < 0.001***], and dominance [F(4, 5395) = 720.434, p < 0.001***]. Additionally, a post-hoc pairwise comparison with Tukey’s HSD test showed that there was no statistically significant difference between the valences of anger and fear (p = 0.981) and between the arousals of anger and fear (p = 0.542). On the other hand, there was a statistically significant difference between the dominances of anger and fear (p = 0.002**) and all other comparisons (p < 0.001***).
Emotion classification
A baseline emotional classification task was implemented to evaluate whether the biosignals could predict emotional states. A total of 574 features were extracted from EEG, GSR, PPG, SKT, and RESP. These comprised time-domain statistical features (minimum, maximum, mean, variance, standard deviation, median, interquartile range, Q10 and Q90 values, skewness, kurtosis, and coefficient of variation) and amplitude/energy-related features (root mean square and energy). In addition, biosignal-specific features were extracted: EEG frequency-domain characteristics (band power in Delta, Theta, Alpha, Beta, Gamma), GSR (slope and zero-crossing rate), PPG (heart rate and mean inter-beat interval), SKT (slope-based features), and RESP (breaths per minute and mean inter-breath interval).
Using these features, a light gradient boosting machine (LightGBM) classifier was trained to classify five discrete emotional states (neutral, fun, sadness, anger, fear). The model achieved an accuracy of 68.2% and a Cohen’s κ of 0.590. As shown in Fig. 8, high-arousal emotions (fun, anger, fear) exhibited precision values above 0.710, whereas lower-arousal states (neutral, sadness) showed precision values of 0.581 – 0.622. This indicates differential discriminability across emotional categories. These findings suggest that multimodal biosignals, particularly EEG and peripheral features, contain informative patterns for decoding affective states, thereby providing supporting evidence that the stimuli successfully elicit distinguishable physiological responses.
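A minimal baseline of this kind can be sketched with the LightGBM scikit-learn API, as below. The train/test protocol is not stated in this section, so a stratified 80/20 hold-out split is assumed here purely for illustration.

```python
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

def baseline_classification(X, y, seed=0):
    """Illustrative LightGBM baseline: X is the (n_trials, 574) feature
    matrix and y the discrete emotion labels. The 80/20 stratified split
    is an assumption, not the paper's stated evaluation protocol."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = lgb.LGBMClassifier(random_state=seed)   # multiclass inferred from y
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred)
```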
Usage Notes
Figshare repository
The dataset preprocessing and analysis code associated with this paper has been uploaded to https://doi.org/10.6084/m9.figshare.28748696.v632. The codebase consists of one Jupyter Notebook and several standalone Python scripts. The Jupyter Notebook file ‘2025_Kitech_Emotion_Data_Code.ipynb’ contains code for preprocessing and analyzing the dataset introduced in this study. The script iteratively processes data for 108 subjects, calculating statistical metrics to validate emotion classification consistency. Specifically, the code computes Cohen’s κ and overall accuracy by comparing intended versus reported emotions, and visualizes the results as a confusion matrix heatmap. It relies on the following packages and versions: numpy (2.3.2), pandas (2.3.1), scikit-learn (1.7.1), xgboost (3.0.4), lightgbm (4.6.0), matplotlib (3.10.5), and seaborn (0.13.2). Additional scripts include ‘compute_snr.py’ for calculating the SNR, ‘compute_snr_statistics.py’ for computing SNR statistics, and ‘compute_raw_statistics.py’ for generating statistics from the raw data.
Limitations
Firstly, air conditioning was used during the experiments to maintain a stable temperature; however, it was not fully effective at controlling humidity. Variations in humidity can influence physiological signals such as skin conductance, potentially introducing noise or variability in the measurements. Despite this limitation, the presence of air conditioning reduced extreme fluctuations in environmental conditions compared to an uncontrolled setting, ensuring that the resulting dataset remains valuable and suitable for affective computing research. For future data collection, air conditioning may be complemented with a dehumidifier to achieve more consistent control of both temperature and humidity.
Secondly, participants were instructed not to consume caffeine on the day of the experiment and to abstain from alcohol the day before. However, there was no objective way to verify compliance. In addition, no restrictions were imposed on food intake prior to the experiment, even though pre-experimental dietary control would have been desirable. One candidate was, in fact, excluded from participation after self-reporting alcohol consumption during the briefing session. Although such restrictions may not have been strictly enforced, participants were explicitly instructed regarding caffeine and alcohol avoidance, and it is likely that most participants complied. Therefore, the dataset retains its value. For future data collection, stricter protocols will be adopted, following established guidelines—for example, no meals within 2 hours before the experiment, no caffeine within 6 hours, and no alcohol within 24 hours. Furthermore, objective verification such as breath alcohol testing will be implemented to ensure adherence.
Data availability
The dataset is available at https://doi.org/10.6084/m9.figshare.28748696.v6.
Code availability
The code for calculating raw biosignal statistics, computing the SNRs of biosignals, calculating statistics of the SNRs, performing preprocessing, and conducting LightGBM-based emotion classification is available at: https://doi.org/10.6084/m9.figshare.28748696.v6.
References
Evans, R. Catching feelings: Measurement of theatre audience emotional response through performance. University of Central Lancashire (2022).
Katsigiannis, S. & Ramzan, N. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE Journal of Biomedical and Health Informatics 22, 98–107, https://doi.org/10.1109/JBHI.2017.2688239 (2017).
Tao, J. & Tan, T. Affective computing: a review. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction 981–995 (2005).
Leong, S. C. et al. Facial expression and body gesture emotion recognition: a systematic review on the use of visual data in affective computing. Computer Science Review 48, 100545, https://doi.org/10.1016/j.cosrev.2023.100545 (2023).
Maria, E., Matthias, L. & Sten, H. Emotion recognition from physiological signal analysis: a review. Electronic Notes in Theoretical Computer Science 343, 35–55 (2019).
Subramanian, R. et al. ASCERTAIN: emotion and personality recognition using commercial sensors. IEEE Transactions on Affective Computing 9, 147–160 (2016).
Miranda-Correa, J. A. et al. AMIGOS: a dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing 12, 479–493 (2018).
Saganowski, S. et al. Emognition dataset: emotion recognition with self-reports, facial expressions, and physiology using wearables. Sci. Data 9, 158, https://doi.org/10.1038/s41597-022-01262-0 (2022).
Rahmani, M. H. et al. EmoWear: Wearable Physiological and Motion Dataset for Emotion Recognition and Context Awareness. Sci. Data 11, 648, https://doi.org/10.1038/s41597-024-03429-3 (2024).
Soleymani, M. et al. A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing 3, 42–55 (2011).
Chen, J. et al. A Large Finer-grained Affective Computing EEG Dataset. Sci. Data 10, 740, https://doi.org/10.1038/s41597-023-02650-w (2023).
Yang, P. et al. A Multimodal Dataset for Mixed Emotion Recognition. Sci. Data 11, 847, https://doi.org/10.1038/s41597-024-03676-4 (2024).
Bota, P. et al. A real-world dataset of group emotion experiences based on physiological data. Sci. Data 11, 116, https://doi.org/10.1038/s41597-023-02905-6 (2024).
Koelstra, S. et al. DEAP: a database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing 3, 18–31 (2011).
Russell, J. A. A circumplex model of affect. Journal of Personality and Social Psychology 39, 1161–1178 (1980).
Ekman, P. An argument for basic emotions. Cognition and Emotion 6, 169–200 (1992).
Abadi, M. K. et al. DECAF: MEG-based multimodal database for decoding affective physiological responses. IEEE Transactions on Affective Computing 6, 209–222 (2015).
Gross, J. J. & Levenson, R. W. Emotion elicitation using films. Cognition and Emotion 9, 87–108 (1995).
Morris, J. D. Observations: SAM: the self-assessment manikin; an efficient cross-cultural measurement of emotional response. Journal of Advertising Research 35, 63–68 (1995).
Schaefer, A. et al. Assessing the effectiveness of a large database of emotion-eliciting films: a new tool for emotion researchers. Cognition and Emotion 24, 1153–1172 (2010).
Gabert-Quillen, C. A. et al. Ratings for emotion film clips. Behavior Research Methods 47, 773–787 (2015).
Kaczmarek, L. D. et al. Splitting the affective atom: divergence of valence and approach-avoidance motivation during a dynamic emotional experience. Current Psychology 40, 3272–3283 (2021).
Rottenberg, J., Ray, R. D. & Gross, J. J. Emotion elicitation using films. In Handbook of Emotion Elicitation and Assessment (eds Coan, J. A. & Allen, J. J. B.) 9-28 (Oxford Univ. Press, 2007).
Soleymani, M. et al. Analysis of EEG signals and facial expression for continuous emotion detection. IEEE Transactions on Affective Computing 7, 17–28 (2016).
Schuurmans, A. T. et al. Validity of the Empatica E4 wristband to measure heart rate variability (HRV) parameters: a comparison to electrocardiography (ECG). Psychophysiology 44, 1–11 (2020).
Ragot, M. et al. Emotion recognition using physiological signals: laboratory vs. wearable sensors. In International Conference on Advances in Human Factors and Wearable Technologies 15–22 (2018).
Schmidt, P. et al. Multi-target affect detection in the wild: an exploratory study. In Proceedings of the ACM International Symposium on Wearable Computers 211–219 (2019).
Saganowski, S. et al. Consumer wearables and affective computing for wellbeing support. In Proceedings of the International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services 482-487 (2020).
Dzieżyc, M. et al. How to catch them all? Enhanced data collection for emotion recognition in the field. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops 345-351 (2021).
Schmidt, P. et al. Introducing WESAD, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction 400-408 (2018).
Li, T. H. et al. Classification of five emotions from EEG and eye movement signals: discrimination ability and stability over time. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER) 607-610 (2019).
Lee, C.-G. & Kim J. Y. LLaMAC: Low-cost Biosignal Sensor based Large Multimodal Dataset for Affective Computing. figshare https://doi.org/10.6084/m9.figshare.28748696.v6.
Sharma, K. et al. A dataset of continuous affect annotations and physiological signals for emotion analysis. Sci. Data 6, 196, https://doi.org/10.1038/s41597-019-0209-0 (2019).
Acknowledgements
This research was supported by Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2023 (Project Name: Development of realtime feedback visualization and multimodal performance by using performer-audience emotion status, Project Number: RS-2023-00219678, Contribution Rate: 100%).
Author information
Authors and Affiliations
Contributions
C.-G.L. designed the experiment, collected and validated the data, created the experimental software, drafted the main manuscript, and prepared the figures and tables. J.Y.K. validated the data, drafted the main manuscript, and prepared the figures and tables.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lee, CG., Kim, J.Y. LLaMAC: low-cost biosignal sensor based large multimodal dataset for affective computing. Sci Data 12, 1911 (2025). https://doi.org/10.1038/s41597-025-06165-4