Abstract
Facial expression is among the most natural ways for human beings to convey emotional information in daily life. Although the neural mechanisms of facial expression processing have been extensively studied using lab-controlled images and a small number of lab-controlled video stimuli, how the human brain processes natural dynamic facial expression videos remains to be investigated. To our knowledge, fMRI data collected specifically for large-scale natural facial expression videos are currently missing. We describe here the Natural Facial Expressions Dataset (NFED), an fMRI dataset including responses to 1,320 short (3-second) natural facial expression video clips. These video clips are annotated with three types of labels: emotion, gender, and ethnicity, along with accompanying metadata. We validate that the dataset has good quality within and across participants and, notably, can capture temporal and spatial stimulus features. NFED provides researchers with fMRI data for understanding the visual processing of a large number of natural facial expression videos.
Background & Summary
Facial expressions are essential to nonverbal communication1, and how they are represented in the human brain is a core question in affective science, as important affective information is conveyed this way2,3,4. Affective computing is also one of the crucial research areas in contemporary human-computer interaction5.
Research on the neural mechanisms underlying facial expression processing has made significant progress over the past couple of decades. Researchers have systematically investigated these mechanisms using images or a small number of video stimuli. Numerous brain regions have been identified as being involved in processing the emotional information conveyed by observed facial expressions. These regions include the posterior superior temporal sulcus (pSTS)6,7,8,9,10 and the temporoparietal junction (TPJ)11,12,13, which are responsible for processing dynamic faces; the fusiform face area (FFA)14,15,16,17,18,19 and the occipital face area (OFA)20,21,22, which characterize static facial features; and the inferior parietal (IP) region, which supports facial expression processing and face recognition21. Large-scale functional magnetic resonance imaging (fMRI) datasets of facial expression image stimuli have been used to relate brain responses to perceived stimuli through neural encoding and decoding23,24,25. However, these studies of facial expression processing relied on lab-controlled images and a limited number of lab-controlled video stimuli. How the human brain processes the complex characteristics of natural dynamic facial expression videos still needs to be explored.
The use of naturalistic stimuli such as films has enhanced our comprehension of the human brain in cognitive neuroscience26. Experiments using lab-controlled stimuli necessarily probe the brain's responses within a limited range, whereas the sample space of naturalistic stimuli is much broader27,28. Naturalistic stimuli are also better suited for engaging participants and maintaining their attention, thereby evoking rich, real-life neural processes29,30,31,32,33. An fMRI dataset with a continuous movie stimulus has been collected to investigate social cognitive processes and face processing26. However, the lack of appropriate facial expression annotations (e.g., category labels) for these movie stimuli restricts the use of such datasets to test hypotheses regarding facial expressions. Moreover, the extended length of movies brings ambiguous temporal event boundaries and other complexities34,35. Consequently, short naturalistic videos achieve an optimal balance between ecological validity and experimental control36.
Therefore, we present the Natural Facial Expressions Dataset (NFED), which recorded fMRI responses to 1,320 short (3-second) natural facial expression videos using a rapid event-related paradigm. The videos were sampled from DFEW37 and CAER38, collections of video clips extracted from over 1,500 movies and 79 television series, respectively, encompassing various challenging conditions (e.g., pose changes and extreme illumination). These datasets were created for recognizing dynamic facial expressions in uncontrolled real-world environments in the field of computer vision, facilitating integration with state-of-the-art computer vision models. Additionally, to enhance the potential of NFED, we annotated a series of semantic metadata for the natural facial expression videos (Fig. 1c).
Experimental design of the NFED. (a) fMRI main experiment: the sessions of the main experiment are identical, each comprising three field-map runs and six video-presentation runs (four training runs and two testing runs). The 3-second videos were shown against a gray background at an 11.3° visual angle and overlaid with a white fixation cross. Each stimulus presentation was followed by a 3-second rest period showing the white fixation cross against the gray background. Participants performed a one-back task by pressing buttons. (b) Eye-tracking experiment: an EyeLink 1000 Plus system embedded in the scanner was used to monitor the attention states of the participants. (c) Stimuli and metadata: the 1,320 3-second video stimuli were annotated with semantic labels related to expressions and actions and with sentence descriptions.
Below, we validate NFED's quality through within- and across-participant analyses and demonstrate its suitability for modeling approaches. First, we illustrate the hierarchical correspondence between deep convolutional neural networks (DCNNs) and the human brain by constructing encoding models. We then show that NFED can capture temporal and spatial stimulus features through metadata-driven analysis. This dataset provides fMRI data for researchers who are specifically interested in studying the visual processing of natural facial expression videos.
Methods
Participants
The research was approved by the Ethics Committee of the First Affiliated Hospital of Henan University of Chinese Medicine (approval number: AF/SC-08/03.2). Potential participants were recruited through flyers posted on campus, and 8 participants were initially enrolled. All participants provided written informed consent for the sharing of their anonymized data. In the screening session, 3 participants were excluded from the NFED experiment because they failed to maintain attention. The remaining 5 participants (all male, mean age ± SD = 28.6 ± 3.83 years) completed the NFED experiment.
Stimuli
The natural facial expression video stimuli were sampled from the DFEW and CAER datasets, both of which are annotated with expression categories. DFEW and CAER were gathered from over 1,500 movies worldwide and 79 TV shows, respectively, and encompass various challenging conditions (e.g., pose changes and extreme illumination). Both datasets serve as benchmarks for assessing the performance of computer vision models in recognizing dynamic facial expressions in real-world environments. Because our goal was to collect fMRI data for understanding how the brain processes emotional information conveyed by natural dynamic facial expressions, we selected 1,320 video clips from these two datasets as stimuli for the NFED experiment. Each video clip is annotated with three types of labels: emotion, gender, and ethnicity (Table 1). Each video was square-cropped, trimmed to 3 seconds (30 frames per second), resized to 768 × 768 pixels, covered by a circular mask, and placed on a gray backdrop. The 3-second duration is roughly equivalent to the span of human working memory39,40 and is considered optimal for capturing brain responses to basic events36. Videos shorter than three seconds may fail to capture complete actions, whereas longer videos may introduce unwanted composite expressions (e.g., starting with a happy expression and then turning neutral). Compared with lab-controlled images and video stimuli, the naturalistic nature of movies and TV shows ensures that we can evoke and capture the full extent of brain activity13,41,42.
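As an illustration, the stimulus preparation described above can be scripted as in the following sketch, which assumes OpenCV and an MP4 output container (neither specified in the text); the exact mask placement and frame-rate handling are assumptions.

```python
import cv2
import numpy as np

def prepare_clip(in_path, out_path, out_size=768, duration_s=3.0, fps=30, gray=128):
    """Square-crop, trim to 3 s, resize, and apply a circular mask on a gray backdrop."""
    cap = cv2.VideoCapture(in_path)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps,
                             (out_size, out_size))
    # Circular mask centered on the output frame
    yy, xx = np.mgrid[:out_size, :out_size]
    mask = (xx - out_size / 2) ** 2 + (yy - out_size / 2) ** 2 <= (out_size / 2) ** 2
    for _ in range(int(duration_s * fps)):          # keep only the first 3 seconds
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        side = min(h, w)                            # central square crop
        y0, x0 = (h - side) // 2, (w - side) // 2
        frame = cv2.resize(frame[y0:y0 + side, x0:x0 + side], (out_size, out_size))
        out = np.full_like(frame, gray)             # gray backdrop
        out[mask] = frame[mask]                     # keep pixels inside the circle
        writer.write(out)
    cap.release()
    writer.release()
```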
The 1,320 videos were divided into a "testing" set and a "training" set. The testing set included 120 videos, and the training set comprised the remaining 1,200 videos. The two sets differed in the number of times each video was presented to the participants: testing-set videos were repeated 10 times and training-set videos 2 times, supporting analyses that rely on many repetitions as well as analyses that rely on many stimuli. In addition, the testing and training sets were designed to mirror the testing and training sets used in machine learning, indicating their potential application in building and evaluating models.
Experimental design
A total of 11 separate fMRI sessions were completed by each participant. Session 1 included field maps, a population receptive field (pRF)43 experiment defining retinotopic visual areas, a functional localizer (fLoc)44 experiment defining category-selective regions, and structural scans. Sessions 2–11 included field maps and the runs of the main functional experiment (during which participants watched testing- and training-set videos). Across the entire experiment, each testing-set video was presented 10 times and each training-set video 2 times for a given participant, resulting in 1,200 testing-set trials and 2,400 training-set trials. Trials were arranged into experimental runs such that each run comprised exclusively training-set videos (training runs) or testing-set videos (test runs). Each of the 10 sessions after session 1 consisted of 4 training runs and 2 test runs. Field maps were acquired at the beginning, middle, and end of the task runs in sessions 1–11.
Main Experiment
Natural facial expression videos were displayed using a rapid event-related design with a 3-second ON/3-second OFF trial structure. The main experiment consisted of 60 runs for each participant. Each run comprised 30 distinct stimulus trials repeated twice, 4 one-back trials, and 4 blank trials (Fig. 1a), presented in random order. Blank periods of 12 and 8 seconds were added at the beginning and end of each run to establish a robust baseline, so that each run lasted 428 seconds. All stimuli (11.3° × 11.3° visual angle) were displayed at the center of a gray screen and overlaid with a central white fixation cross (0.5° × 0.5° visual angle). Participants were asked to fixate the cross during MRI scans and to press a button during the one-back task. The average one-back accuracy across participants was 98.7%.
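The 428-second run length follows directly from this trial structure; a brief check of the arithmetic, using only the timings given above:

```python
# One main-experiment run: 3-s video ON + 3-s OFF per trial
trial_s = 3 + 3
n_trials = 30 * 2 + 4 + 4              # 30 distinct stimuli x 2 repetitions + 4 one-back + 4 blank
run_s = 12 + n_trials * trial_s + 8    # leading and trailing blank periods
assert run_s == 428                    # matches the reported run duration
```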
Natural facial expression videos were presented on a BOLDscreen 32 monitor (resolution: 1920 × 1080; refresh rate: 120 Hz; size: 89 cm (width) × 50 cm (height); viewing distance: 168 cm) placed at the head of the scanner bed. Participants viewed the monitor through a mirror mounted on the head coil. The visual stimulation program was written in PsychoPy 2022.2.3, and stimulus playback was remotely and synchronously controlled by a computer. A dedicated MRI button box recorded the behavioral data of the participants. In addition, an EyeLink 1000 Plus system was employed to monitor the attention and sleep states of the participants during MRI scans (Fig. 1b).
Population receptive field (pRF) experiment
The retinotopic organization of early visual cortex in each participant was mapped using the pRF experiment from the Human Connectome Project (HCP) 7 T Retinotopy experiment43,45. The stimuli consisted of slowly moving apertures filled with a pink-noise background and dynamic colorful textures. Two types of runs were used: multibar and wedgering. A total of 4 runs (2 multibar and 2 wedgering) were collected for the NFED experiment, each lasting 300 seconds. A fixation dot was shown at the center of the screen throughout stimulus presentation and randomly changed its color to red, black, or white. Participants were asked to attend to the fixation dot and press a button whenever its color turned red.
Functional localizer (fLoc) experiment
We used fLoc (http://vpnl.stanford.edu/fLoc/) to localize category-selective regions44,45. The experiment presented grayscale images from 10 categories: faces (adult and child), characters (word and number), bodies (body and limb), objects (car and instrument), and places (corridor and house). In each trial, eight images from one category were presented in succession, each lasting 0.5 seconds. A total of 4 runs were collected in session 1, each lasting 300 seconds. A small red dot was shown at the center of the screen during stimulus presentation. Participants were asked to attend to the fixation dot and press a button on one-back trials.
MRI acquisition
MRI data were collected on a 3 Tesla Siemens MAGNETOM Prisma scanner with a 64-channel head coil at the Henan Key Laboratory of Imaging and Intelligent Processing, Zhengzhou, China. Task-based fMRI data were gathered during all 11 sessions. At the end of session 1, we collected a T1-weighted (T1w) anatomical image for each participant. Three spin-echo field maps were obtained at the beginning, middle, and end of each fMRI session to correct magnetic field distortion. Earplugs were used to reduce scanner noise, and a sponge cushion stabilized the head to minimize head movement. In all scan runs, an EyeLink 1000 Plus system inside the scanner was used to monitor the participants' sleep and attention states. Specifically, we observed the participant's eyes and the position of a blue dot on the screen (Fig. 1b); the presence of the blue dot within a central box indicated that attention was focused, whereas deviations from this pattern suggested potential distraction or drowsiness.
Functional MRI
Functional data were collected using an interleaved T2*-weighted, single-shot, gradient-echo echo-planar imaging (EPI) sequence: TR = 1000 ms; slices = 56; slice gap = 10%; matrix size = 96 × 96; voxel size = 2 × 2 × 2 mm3; slice thickness = 2 mm; flip angle = 68°; TE = 30 ms; bandwidth = 1748 Hz/Px; echo spacing = 0.68 ms; multi-band acceleration factor = 4; field of view (FOV) = 220 × 220 mm2; phase-encoding direction: anterior to posterior (AP); prescan normalize: off; slice orientation: axial, coronal, sagittal; auto-align: Head > Brain; shim mode: B0 shim mode Standard, B1 shim mode TrueForm; slice acquisition order: interleaved; other acceleration: acceleration factor PE = 2.
Field map
The field maps for correcting EPI spatial distortion were obtained using a 2-D spin-echo sequence: TR = 1000 ms; field of view = 220 × 220 mm2; flip angle = 60°; voxel size = 2 × 2 × 2 mm3; TE1/TE2 = 4.76/7.22 ms; slice thickness = 2 mm; slices = 56; slice gap = 10%; prescan normalize: off; slice orientation: axial, coronal, sagittal; auto-align: Head > Brain; shim mode: B0 shim mode Standard, B1 shim mode TrueForm; slice acquisition order: interleaved.
Structural MRI
We employed a 3D-MPRAGE sequence to acquire structural T1-weighted images: 192 slices; TR = 2300 ms; TE = 2.26 ms; TI = 1100 ms; voxel size = 1 × 1 × 1 mm3; FOV = 256 × 256 mm2; matrix size = 256 × 256; echo spacing = 6.8 ms; slice thickness = 1 mm; flip angle = 8°. Prescan normalization: turned on; Acceleration Factor = 2.
Data analysis
We adopted the data analysis method proposed by Kay et al.46. FreeSurfer (http://surfer.nmr.mgh.harvard.edu) was used to generate each participant's pial and white surfaces from the T1 volume. Furthermore, for each participant, we constructed an intermediate gray matter surface between the pial and white surfaces.
We employed the FSL utility "prelude" to phase-unwrap the field maps obtained during each scan session. To regularize the field maps, we then performed three-dimensional local linear regression using an Epanechnikov kernel, with the magnitude component of the field maps serving as regression weights to enhance the robustness of the field estimates. This regularization removes noise in the field maps and imposes spatial smoothness. Finally, the field maps were linearly interpolated over time, yielding an estimate of the field strength for every functional volume.
We first pre-processed the task-based functional data in volume space by performing one temporal resampling and one spatial resampling. Slice-timing correction was achieved through the temporal resampling, which involved one cubic interpolation of each voxel's time series. The spatial resampling corrected for EPI distortion and head motion: EPI spatial distortion was corrected using the regularized, time-interpolated field maps, and the SPM12 utility "spm_realign" was then used to estimate rigid-body motion parameters from the undistorted EPI volumes. Finally, one cubic interpolation was applied to each slice-time-corrected volume to accomplish the spatial resampling.
The mean of the pre-processed functional volumes from each session was co-registered to the T1 volume. A manually defined three-dimensional ellipse was used to concentrate the cost metric on brain regions unaffected by gross susceptibility effects during estimation of the co-registration alignment. The final outcome of co-registration is a transformation describing how to align the EPI data with each participant's native anatomy.
After completing co-registration to the anatomical data, we re-analyzed the functional data using surface-based pre-processing. This two-stage approach was adopted because volume-based pre-processing is needed to produce high-quality undistorted functional volumes, which are essential for registering the functional data to the anatomical data; surface-based pre-processing can begin only after this registration is obtained.
Surface-based pre-processing follows the same steps as volume-based pre-processing, except that the final spatial interpolation is performed at the vertices of the intermediate gray matter surfaces. The only distinction is therefore whether the data are prepared on an irregular manifold of densely spaced vertices (surface) or on a regular three-dimensional grid (volume). Surface-based pre-processing thus reduces to one spatial resampling (handling EPI distortion, head motion, and registration to anatomy) and one temporal resampling (handling slice acquisition times). Performing only these two simple resampling steps avoids unnecessary interpolation and maximally preserves spatial resolution46,47,48. After this pre-processing stage, time-series data were obtained for every vertex of the cortical surfaces.
GLM analysis of main experiment
We used GLMsingle49 (https://github.com/cvnlab/GLMsingle), an advanced single-trial general linear model (GLM) and denoising approach implemented in MATLAB R2019a, to model the pre-processed fMRI data from the main experiment. GLMsingle provides single-trial estimates of BOLD response magnitudes ("betas"). In this study, three betas were estimated per trial by modeling the BOLD response at 1, 2, and 3 seconds after each video onset. A separate GLMsingle model was fit for each session (consisting of 4 training runs and 2 test runs). In total, 2 (repetitions) × 3 (seconds) betas were obtained for each training-set video and 10 (repetitions) × 3 (seconds) betas for each testing-set video. These repetitions enabled us to acquire video-evoked responses with a high signal-to-noise ratio (SNR).
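For orientation, the sketch below outlines a session-wise GLMsingle fit using the Python port of the toolbox (the paper used the MATLAB implementation). The option and output key names follow the GLMsingle example notebooks and should be verified against the installed version; the design and data inputs are placeholders to be filled with the session's pre-processed surface data.

```python
from glmsingle.glmsingle import GLM_single

# Inputs for one session (to be prepared by the user):
# data   -- list of (vertices x time) arrays, one per run (4 training + 2 test runs)
# design -- list of (time x conditions) binary matrices marking each video onset
stimdur, tr = 3, 1.0   # 3-s videos, TR = 1 s

opt = {
    'wantlibrary': 1,                    # select a per-vertex HRF from the library
    'wantglmdenoise': 1,                 # data-driven nuisance regressors
    'wantfracridge': 1,                  # fractional ridge regularization
    'wantmemoryoutputs': [0, 0, 0, 1],   # keep the final (type-D) model in memory
}

glmsingle = GLM_single(opt)
results = glmsingle.fit(design, data, stimdur, tr, outputdir='glmsingle_ses-02')

# Single-trial betas of the final model; key names assumed from the example notebooks.
betas = results['typed']['betasmd']
```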
GLM analysis of the pRF experiment
We used the Compressive Spatial Summation model50, as implemented in analyzePRF (http://cvnlab.net/analyzePRF/), to model the time-series data acquired from the pRF experiment. The time-series data from the 2 repetitions of each run type (multibar, wedgering) were first averaged. We then estimated pRF parameters for each vertex using analyzePRF and mapped the outcomes to the cortical surface by linearly interpolating onto the vertices and averaging across cortical depth.
GLM analysis of the fLoc experiment
GLMdenoise51,52, a data-driven denoising approach, was applied to model the time-series data from the fLoc experiment. We used a "condition-split" strategy to encode the 10 stimulus categories, dividing the trials of each category into distinct conditions within every run. Six condition-splits were used per category, yielding six response betas per category. To quantify the selectivity for the various categories, we computed t-values from the GLM betas after fitting the GLM. The results were mapped onto the FreeSurfer inflated surface by linear interpolation.
Data Records
The data, structured following the BIDS format53, are available at https://openneuro.org/datasets/ds00504754. The "sub-<subID>" folders store the raw data of each participant (Fig. 2c). The pre-processed volume data are saved in the "derivatives/pre-processed_volume_data/sub-<subID>" folders (Fig. 2d), and the pre-processed surface data are saved in the "derivatives/volumetosurface/sub-<subID>" folders (Fig. 2e).
Folder structure of the NFED. (a) Overall directory structure. (b) Folder structure of the stimuli. (c) Folder structure of the raw data from one participant. (d) Folder structure of the pre-processed volume data. (e) Folder structure of the pre-processed surface data. (f) Folder structure of the cortical-surface reconstruction results for all participants. (g) Folder structure of the GLM analysis data from the main experiment and of the averaged GLM analysis data from the pRF and fLoc experiments. (h) Folder structure of the technical validation.
Stimulus
Distinct folders store the stimuli for the different fMRI experiments: "stimuli/face-video", "stimuli/floc", and "stimuli/prf" (Fig. 2b). The category labels and metadata corresponding to the video stimuli are stored in the "videos-stimuli_category_metadata.tsv" file, and the "videos-stimuli_description.json" file describes the category and metadata information of the video stimuli (Fig. 2b).
Raw MRI data
Each participant's folder comprises 11 session folders: "sub-<subID>/ses-anat" and "sub-<subID>/ses-<sesID>" (Fig. 2c). The "ses-anat" folder contains one folder, named "anat". Each "ses-<sesID>" folder contains "fmap" and "func" folders. A "sub-<subID>_ses-<sesID>_scans.tsv" file stores the scan information for each scan session. The task events of each run are saved as a "sub-<subID>/func/sub-<subID>_ses-<sesID>_task-<taskID>_run-<runID>_events.tsv" file. The parameters in "events.tsv" are described in "task-face_events.json".
Volume data from pre-processing
The pre-processed volume-based fMRI data are in the "pre-processed_volume_data/sub-<subID>/ses-<sesID>" folders (Fig. 2d). A "sub-<subID>_ses-<sesID>_task-<taskID>_space-individual_desc-volume_run-<runID>.nii" file stores the pre-processed volume data of a run. A "sub-<subID>_ses-<sesID>_task-<taskID>_space-individual_desc-volume_mean.nii" file stores the mean of the pre-processed volume data of a session. A "sub-<subID>_ses-<sesID>_task-<taskID>_space-individual_desc-volume_run-<runID>_timeseries.tsv" file stores the head-motion parameters of the participants, which are explained in the corresponding "sub-<subID>_ses-<sesID>_task-<taskID>_space-individual_desc-volume_run-<runID>_timeseries.json" file. The co-registration alignment parameters for a session are stored in the "sub-<subID>_ses-<sesID>_task-<taskID>_alignment" folder. The volume-based pre-processing procedure is described in "pre-processed_volume_data_description.json".
Surface data from pre-processing
The pre-processed surface-based data of each run are stored as "volumetosurface/sub-<subID>/ses-<sesID>/sub-<subID>_ses-<sesID>_task-<taskID>_run-<index>_hemi-lh/rh.surfacedata.gii" files (Fig. 2e). The surface-based pre-processing procedure is described in "volumetosurface_data_description.json".
FreeSurfer recon-all
The cortical-surface reconstruction results are saved in the "recon-all-FreeSurfer/sub-<subID>" folders (Fig. 2f). The recon-all procedure and the FreeSurfer version information are described in "recon-all-freesurfer_data_description.json".
Surface-based GLM analysis data
We ran GLMsingle on the data from the main experiment. For each scanning session, a "sub-<subID>_ses-<sesID>_task-<taskID>_betas_FITHRF_GLMDENOISE_RR.hdf5" file contains the betas; a "sub-<subID>_ses-<sesID>_task-<taskID>_FRACvalue.hdf5" file stores the fractional ridge regression regularization level selected for individual voxels; a "sub-<subID>_ses-<sesID>_task-<taskID>_HRFindex.hdf5" file stores the 1-indexed identity of the best-fitting hemodynamic response function (HRF); a "sub-<subID>_ses-<sesID>_task-<taskID>_HRFindexrun.hdf5" file stores the HRF indices segregated by run; a "sub-<subID>_ses-<sesID>_task-<taskID>_R2.hdf5" file stores the model accuracy; and a "sub-<subID>_ses-<sesID>_task-<taskID>_R2run.hdf5" file stores R2 segregated by run. A "sub-<subID>_ses-<sesID>_task-<taskID>_mean.mat" file stores the mean of the GLM analysis data for the pRF and fLoc experiments (Fig. 2g). The GLM analysis procedure is described in "surface-based_GLM_analysis_data_description.json".
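As a starting point for re-use, the betas can be read directly from the HDF5 files with h5py. The path below is a hypothetical example (subject 01, session 02, task "face"), and the internal dataset names are not specified in the text, so the sketch inspects the keys before loading.

```python
import h5py
import numpy as np

# Hypothetical example path; substitute the participant/session/task of interest.
fn = 'derivatives/sub-01/ses-02/sub-01_ses-02_task-face_betas_FITHRF_GLMDENOISE_RR.hdf5'

with h5py.File(fn, 'r') as f:
    print(list(f.keys()))                 # inspect the stored dataset names first
    first_key = list(f.keys())[0]
    betas = np.asarray(f[first_key])      # single-trial betas for this session
print(betas.shape)
```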
Validation
The technical-validation code is saved in the "derivatives/validation/code" folder, and the results are saved in the "derivatives/validation/results" folder (Fig. 2h). The "README.md" file describes the code and results in detail.
Technical Validation
Basic quality control
Basic quality control suggests that the data are of good quality. To evaluate the quality of the structural data in NFED, we employed four key metrics55 (see Fig. 3a): the coefficient of joint variation (CJV), the contrast-to-noise ratio (CNR), the signal-to-noise ratio in grey matter (SNR_GM), and the signal-to-noise ratio in white matter (SNR_WM). The CJV quantifies the joint variation of white matter (WM) and grey matter (GM) intensities relative to their contrast. The CNR relates the contrast between GM and WM to the noise present in the image. The SNR relates the mean signal to the noise present in the image and is computed separately for GM and WM.
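A minimal sketch of these structural metrics is given below, using simplified definitions computed from a T1w image and tissue masks; the exact formulas behind Fig. 3a (e.g., whether the CNR denominator includes an air-noise term, as in MRIQC) are assumptions.

```python
import numpy as np

def structural_qc(t1, gm_mask, wm_mask):
    """Simplified CJV, CNR, and per-tissue SNR from a T1w image and tissue masks.
    t1: 3-D intensity array; gm_mask/wm_mask: boolean arrays of the same shape."""
    gm, wm = t1[gm_mask], t1[wm_mask]
    mu_gm, mu_wm = gm.mean(), wm.mean()
    sd_gm, sd_wm = gm.std(), wm.std()
    cjv = (sd_wm + sd_gm) / abs(mu_wm - mu_gm)                 # lower is better
    cnr = abs(mu_gm - mu_wm) / np.sqrt(sd_gm**2 + sd_wm**2)    # higher is better
    return {'CJV': cjv, 'CNR': cnr,
            'SNR_GM': mu_gm / sd_gm, 'SNR_WM': mu_wm / sd_wm}  # higher is better
```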
Basic data quality check of the NFED. (a) Scatter plots showing the data quality of the T1-weighted structural scans. Each point represents the value from one participant. A downward arrow indicates that smaller values correspond to better data quality, whereas an upward arrow indicates that larger values correspond to better data quality. μ denotes the mean image brightness and σ the standard deviation. (b) Violin plots of FD for each participant across all runs. The pink bars, black dots, and error bars represent the interquartile range, the mean, and the minimum and maximum values, respectively. Participant motion was low in the NFED, as shown by mean FD well below 0.5 mm for all participants. (c) Violin plots of tSNR values across the brain. The mean tSNR across participants was 75.0 ± 5.79 (minimum participant mean 68.16, maximum 81.90). (d) Whole-brain tSNR maps for the NFED experiment.
For quality control of the NFED’s functional scans, we assessed the amount of head motion for each participant and the temporal signal-to-noise ratio (tSNR) of the time-series data, separately.
We used the framewise displacement (FD) metric56 to measure the head motion of each participant. No participant had any volume with FD exceeding 0.5 mm (Fig. 3b), a threshold commonly used in the literature to identify volumes with large head motion56. The mean FD across participants was 0.051 mm (minimum participant mean 0.038 mm, maximum 0.069 mm; Fig. 3b). These results indicate that participants effectively controlled head motion throughout the scans.
The tSNR is a commonly used measure of the capacity to detect brain activation in fMRI data57. Specifically, the tSNR of each vertex was calculated as the mean of its time course divided by its standard deviation in the pre-processed data from each run. Individual tSNR maps for the main experiment were then generated by averaging across all runs. The mean whole-brain tSNR across participants was 75.0 ± 5.79 (minimum participant mean 68.16, maximum 81.90; Fig. 3c), which is comparable to previous datasets26,58,59. As expected, tSNR varied among brain regions, with dorsal areas exhibiting higher tSNR and the anterior temporal cortex and orbitofrontal cortex showing lower tSNR (Fig. 3d). Overall, these findings indicate that the data exhibit strong tSNR in the main experiment.
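Both quality metrics reduce to short computations on the motion parameters and time series. The sketch below assumes the six rigid-body parameters from realignment (translations in mm, rotations in radians) and the conventional 50 mm rotation radius for FD56, and a (time × vertices) layout for the pre-processed time series.

```python
import numpy as np

def framewise_displacement(motion, radius_mm=50.0):
    """FD (Power et al.) from 6 rigid-body parameters per volume:
    columns 0-2 translations (mm), columns 3-5 rotations (rad)."""
    diffs = np.abs(np.diff(motion, axis=0))
    diffs[:, 3:] *= radius_mm            # convert rotations to arc length on a 50-mm sphere
    return diffs.sum(axis=1)             # one FD value per volume-to-volume transition

def tsnr(timeseries):
    """Temporal SNR per vertex: mean over time divided by SD over time."""
    ts = np.asarray(timeseries, dtype=float)   # shape: (time, vertices)
    return ts.mean(axis=0) / ts.std(axis=0)
```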
The visual cortex exhibits reliable BOLD responses to natural facial expression video stimuli in the main experiment
In the main experiment, 5 participants each completed 10 sessions, each consisting of 4 training runs and 2 test runs, for a total of 60 runs (40 training and 20 test runs). For each run, 30 (videos) × 2 (repetitions) × 3 (seconds) betas were acquired. The raw betas at each voxel were z-scored within each run and then averaged across stimulus repetitions to generate a vector of betas. In total, 90 beta series were obtained for each of 44 runs covering all 1,320 videos (40 training runs and 4 distinct test runs). The test-retest reliability of responses to the 1,320 videos in the main experiment was then evaluated by computing, at each vertex, the Pearson correlation between the 90 beta series obtained from even runs and those obtained from odd runs. The occipital cortex exhibits high test-retest reliability, which extends into temporal and parietal regions (Fig. 4). These results indicate that the fMRI data collected during the main experiment show reasonable test-retest reliability across the natural facial expression videos.
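The split-half computation reduces to a vertex-wise Pearson correlation between two matched halves of the beta series. The sketch below assumes the odd- and even-run betas have already been extracted and aligned (same stimuli in the same order in both halves); the array layout is illustrative.

```python
import numpy as np

def split_half_r(betas_half1, betas_half2):
    """Per-vertex Pearson correlation between two halves of the data.
    Each input: (n_betas, n_vertices), rows matched across halves."""
    a = betas_half1 - betas_half1.mean(axis=0)
    b = betas_half2 - betas_half2.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))
```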
Noise ceiling estimation
To evaluate the multivariate reliability of the NFED, noise ceilings were computed for each vertex across the entire brain using the testing set. The noise ceiling is the highest level of accuracy that a model can achieve given the noise present in the data60,61. In our case, the beta values estimated by the GLM are the data of interest. The raw beta values at each voxel were z-scored for each participant individually and then averaged across the 3 beta values (beta1, beta2, beta3) and across stimulus repetitions to generate an (n_stimuli × 1) vector of betas.
Monte Carlo simulations were conducted to estimate the noise ceiling, following the general framework outlined in prior research60,62. In these simulations, a known signal and noisy measurements of that signal are generated, and the R2 between the measurements and the signal is computed. Specifically, the beta weights and their standard errors were first collected. Because each stimulus was shown only 10 times, the standard errors of individual beta values are comparatively noisy; a combined standard error was therefore computed and taken as the standard deviation of the normal distribution representing the noise. The mean of the beta values was taken as the mean of the normal distribution representing the signal. To estimate the standard deviation of the signal distribution, the variance attributable to noise was subtracted from the total variance in the data, the result was constrained to be non-negative, and the square root was taken. Finally, we generated a signal by drawing random values from the signal distribution, created a noisy measurement of this signal by adding random values drawn from the noise distribution, and computed the R2 between the measurement and the signal. A total of 1,000 simulations (100 signals, 10 measurements per signal) were conducted, and the average R2 was taken as the noise ceiling.
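A compact sketch of this Monte Carlo procedure for a single vertex follows; the inputs (trial-averaged betas and a combined standard error) and the particular R2 convention are assumptions consistent with the description above.

```python
import numpy as np

def noise_ceiling(mean_betas, noise_sd, n_signals=100, n_meas=10, seed=0):
    """Monte Carlo noise ceiling (percent R^2) for one vertex.
    mean_betas: (n_stimuli,) trial-averaged betas; noise_sd: combined standard error."""
    rng = np.random.default_rng(seed)
    # Signal SD: total variance minus noise variance, floored at zero
    signal_sd = np.sqrt(max(mean_betas.var() - noise_sd**2, 0.0))
    r2 = []
    for _ in range(n_signals):
        signal = rng.normal(mean_betas.mean(), signal_sd, mean_betas.size)
        for _ in range(n_meas):
            meas = signal + rng.normal(0.0, noise_sd, mean_betas.size)
            ss_res = np.sum((meas - signal) ** 2)
            ss_tot = np.sum((meas - meas.mean()) ** 2)
            r2.append(100 * (1 - ss_res / ss_tot))   # R^2 convention assumed
    return float(np.mean(r2))
```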
Following the approach described above, we estimated the noise ceiling for all participants. Surface maps of the noise ceiling indicate the areas where responses to the NFED stimuli are reliable. The occipital cortex exhibits high noise ceilings, which extend into temporal and parietal regions (Fig. 5a). We also quantified the average whole-brain noise ceiling for each participant (Fig. 5b); across participants it was 34.75% ± 8.16% (minimum participant mean 25.08%, maximum 47.78%), which is comparable to previous datasets45.
Manually defined regions of interest (ROIs)
The pRF and fLoc experiments successfully define subject-specific functional ROIs. We conducted a pRF experiment and an fLoc experiment during session 1 to facilitate fine-grained, subject-specific analyses. The data from the pRF experiment were used to map the retinotopic representation in each participant's low-level visual regions; the spatial patterns of polar angle obtained from individual-level pRF modeling closely resemble those from the Human Connectome Project 7 T Retinotopy Dataset (Fig. 6a). The fLoc experiment was used to map high-level, category-selective regions; brain areas related to face and face-semantic recognition can be easily identified (Fig. 6b). Overall, these analyses indicate that the NFED functional localizers delineate subject-specific visual regions with good quality.
ROIs provided by the NFED. Various ROIs were defined from the fLoc and pRF experiments; example outcomes for subjects 1, 4, and 5 are displayed. (a) Early visual regions, displayed on the FreeSurfer spherical surface. (b) Face-selective regions, defined by computing t-values for the contrast of faces against all other categories and displayed on the FreeSurfer inflated surface.
The hierarchical correspondences between the DCNN and the human brain
Many studies have revealed hierarchical correspondences between the representations of DCNNs and the human brain63,64,65 using encoding models. In this study, the data from the main, fLoc, and pRF experiments were combined to construct encoding models aimed at replicating this hierarchical correspondence. The encoding models map the artificial representations extracted from different layers of the pre-trained VideoMAEv266 onto the neural responses at each vertex. Specifically, a linear regression model was optimized for each vertex to combine the outputs of units from a single DCNN layer so as to best predict the fMRI response to the training-set videos.
The principal components of the DCNN outputs were identified and used as regressors to explain the signal at each fMRI vertex. Given the training-set videos, a much smaller number of components could largely explain the output of each layer. For the first through fortieth layers, we performed PCA to extract 60 components per layer, with negligible information loss (about 1%). Despite the substantial reduction in dimensionality, the loss of information could be neglected, and the reduced feature dimension greatly alleviated overfitting during training of the encoding models67.
After training an individual encoding model for each vertex, we used these models to predict the fMRI responses to the testing-set videos. For each vertex, prediction accuracy was assessed as the correlation between the measured and predicted fMRI responses. The encoding models achieved reasonably high accuracy in predicting cortical responses across almost the entire visual cortex (Fig. 7a). Beyond the ventral visual stream, the predictable regions extended to the dorsal visual stream and to frontal, temporal, and parietal cortices (Fig. 7a).
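The vertex-wise encoding pipeline (PCA on one layer's activations, linear regression, correlation-based evaluation) can be sketched as follows using scikit-learn; plain least squares is assumed here since the exact regularization scheme is not detailed in the text, and the array layouts are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def fit_layer_encoding(feat_train, feat_test, y_train, y_test, n_components=60):
    """feat_*: (n_videos, n_units) DCNN activations from one layer.
    y_*:      (n_videos, n_vertices) fMRI responses.
    Returns per-vertex prediction accuracy (Pearson r) on the testing set."""
    pca = PCA(n_components=n_components).fit(feat_train)
    x_tr, x_te = pca.transform(feat_train), pca.transform(feat_test)
    model = LinearRegression().fit(x_tr, y_train)      # one weight vector per vertex
    pred = model.predict(x_te)
    p = pred - pred.mean(axis=0)
    y = y_test - y_test.mean(axis=0)
    return (p * y).sum(axis=0) / np.sqrt((p**2).sum(axis=0) * (y**2).sum(axis=0))
```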
Cortical predictability based on encoding models. (a) Model prediction accuracy for cortical responses to facial expression video stimuli, assessed by the Pearson correlation between the measured and predicted responses to the testing videos. (b) Prediction accuracy of encoding models for different ROIs using different layers of the CNN. The accuracy for each ROI was calculated by averaging across the top 100 vertices within the ROI and across participants. The error bars represent the standard error, and the curves represent the average performance.
Example regions at different levels of the visual hierarchy were chosen as ROIs: V1, V2, V3, V4 (ventromedial visual areas), LO (lateral occipital), FFA, OFA, pSTS, TPJ, IPS, MT (middle temporal), V3A, and V3B. Within each ROI, we selected the 100 vertices with the best encoding performance for further analysis. As shown in Fig. 7b, the encoding correlation in low-level visual regions gradually decreases as the VideoMAEv2 layer index increases, whereas the opposite trend is observed in high-level visual regions. In summary, NFED validates an early-to-early and late-to-late correspondence pattern between DCNN layers and the human visual cortex.
Metadata-driven analysis
We leveraged the metadata to reveal that spatial and temporal stimulus characteristics are encoded in the brain. To perform numerical analyses on the text-based metadata, we fed the text into a video-language model (InternVideo)68 to produce model embeddings. We then related the model embeddings to brain responses using representational similarity analysis (RSA)69, which primarily involves correlating the representational dissimilarity matrix (RDM) at each vertex, defined by the brain responses, with an RDM defined by the metadata (Fig. 8a).
Semantic analysis of facial expression videos based on metadata encoding. (a) Metadata-based RSA approach: video metadata are provided as input to a video-language model to generate a vector embedding. Similarly, we extract the brain responses evoked by facial expression videos within a searchlight disk, producing a response vector at each vertex. The pairwise similarities of the video-evoked brain responses and of the metadata embeddings are computed separately to construct n_stimuli × n_stimuli RDMs. We then correlate the metadata RDM with the voxel-wise RDMs generated through the searchlight analysis. (b) Metadata RDMs: the metadata for "expressions" and "actions" were fed into a video-language model to produce vector embeddings; the RDMs were computed using cosine distance and are visualized here. (c) Whole-brain correlation between the RDM at each vertex and the metadata RDMs: the Spearman correlation between the metadata RDM and each vertex's RDM was computed for all participants. (d) ROI-based correlations: the average correlations across participants within each ROI are shown. Permutation tests were conducted to test whether the Pearson correlation of voxel responses in each brain region is significantly higher than the null distribution; voxels with a Pearson correlation higher than 0.0126 are considered accurately predictable, indicating a significant difference from the null hypothesis (p < 0.05). Error bars depict the 95% confidence interval.
To establish the metadata RDMs, the action and expression labels of the 120 testing-set videos were fed into the video-language model (InternVideo)68, generating a vector embedding for each label. A single 120 × 120 RDM for the expression and action metadata, respectively, was produced by computing the pairwise cosine distance between the vector embeddings of each video. Figure 8b shows the RDMs for expression and action.
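Constructing an RDM from the label embeddings is a one-line operation with SciPy; the sketch below assumes the embeddings have already been extracted as a (120, d) array.

```python
from scipy.spatial.distance import pdist, squareform

def metadata_rdm(embeddings):
    """120 x 120 RDM of pairwise cosine distances between video-label embeddings.
    embeddings: (n_videos, d) array of label embeddings."""
    return squareform(pdist(embeddings, metric='cosine'))
```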
A searchlight analysis70 was performed to produce the RDM at each vertex. Specifically, a two-dimensional searchlight disk with a radius of 3 mm was defined around each vertex v. The betas for all testing-set conditions (3 betas averaged over repetitions and z-scored across conditions) were extracted at every vertex within the disk, so that each facial expression video was associated with a beta vector containing one value per vertex in the disk. The Pearson correlation between all pairs of stimulus vectors was then computed to generate an RDM at the central vertex v, and this process was repeated for all vertices in the brain for each participant. Next, we computed the Spearman correlation between the metadata RDM and the RDM at each vertex for all participants. The whole-brain results (Fig. 8c) indicate that the correlations for both types of metadata are predominantly significant in ventral visual regions, dorsal visual regions, and face-selective regions. For the ROI analysis (Fig. 8d), we conducted permutation tests to determine whether the Pearson correlation of voxel responses in each brain region is significantly higher than the null distribution. Specifically, we randomly shuffled the metadata RDM, computed its Pearson correlation with the searchlight-based RDM at each voxel, and repeated this process 10,000 times to obtain the null distribution. Based on this null distribution, voxel responses with a Pearson correlation higher than 0.0126 are significantly higher than the null hypothesis (p < 0.05); in this paper, such voxels are referred to as accurately predictable.
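The vertex-wise RSA and permutation test can be sketched as below. The extraction of each searchlight disk's patterns is assumed to be precomputed; and whereas the text uses Spearman correlation for the maps and Pearson correlation for the permutation test, the sketch uses Spearman throughout for simplicity.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def searchlight_rsa(patterns, meta_rdm_vec):
    """Spearman correlation between one searchlight disk's neural RDM and a metadata RDM.
    patterns: (n_stimuli, n_disk_vertices) betas; meta_rdm_vec: condensed metadata RDM."""
    neural_vec = pdist(patterns, metric='correlation')   # 1 - Pearson r per stimulus pair
    rho, _ = spearmanr(neural_vec, meta_rdm_vec)
    return rho

def permutation_threshold(patterns, meta_rdm, n_perm=10000, alpha=0.05, seed=0):
    """Correlation threshold at p < alpha from a null distribution built by
    shuffling the rows/columns of the metadata RDM."""
    rng = np.random.default_rng(seed)
    neural_vec = pdist(patterns, metric='correlation')
    n = meta_rdm.shape[0]
    iu = np.triu_indices(n, k=1)
    null = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(n)
        rho, _ = spearmanr(neural_vec, meta_rdm[np.ix_(order, order)][iu])
        null[i] = rho
    return float(np.quantile(null, 1 - alpha))
```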
Usage Notes
The NFED offers researchers a unique resource for understanding the visual processing of a large number of short natural facial expression videos. First, given the large number of short natural facial expression videos and the corresponding BOLD responses, NFED enables fine-grained investigation of how the brain represents wide-ranging visual characteristics; in particular, it facilitates fMRI-based neural encoding and decoding studies. Neural encoding aims to predict brain responses to perceived natural facial expression videos, whereas neural decoding aims to classify, identify, or reconstruct perceived natural facial expressions from brain responses. Second, to further enhance its potential, NFED includes annotated semantic metadata for the natural facial expression videos.
Although we consider NFED a distinctive resource for exploring the neural representation of a large number of short natural facial expression videos, its limitations should also be acknowledged. Specifically, the NFED experiment includes data from only five participants, which limits its suitability for group-level fMRI analyses.
Code availability
The entire code for data pre-processing, data organization, GLM analysis, and validation is available at https://zenodo.org/records/13919442 (https://zenodo.org/account/settings/github/repository/chenpanp/NFED_fmri). The entire code, the data required to run the code, and the results are also available at https://openneuro.org/datasets/ds005047 (derivatives/validation/code).
References
Tracy, J. L., Randles, D. & Steckler, C. M. The nonverbal communication of emotions. Curr. Opin. Behav. Sci. 3, 25–30 (2015).
Ekman, P. Facial Expression and Emotion. American Psychologist. 48(4), 384–392 (1993).
Chen, J., Wang, Z., Li, Z., Peng, D. & Fang, Y. Disturbances of affective cognition in mood disorders. Sci. China Life Sci. 64, 938–941 (2021).
Fang, F. & Hu, H. Recent progress on mechanisms of human cognition and brain disorders. Sci. China Life Sci. 64, 843–846 (2021).
Giordano, B. L. et al. The representational dynamics of perceived voice emotions evolve from categories to dimensions. Nat. Hum. Behav. 5, 1203–1213 (2021).
Pitcher, D. & Ungerleider, L. G. Evidence for a Third Visual Pathway Specialized for Social Perception. Trends Cogn. Sci. 25, 100–110 (2021).
Allison, T., Puce, A. & McCarthy, G. Social perception from visual cues: role of the STS region. Trends Cogn. Sci. 4, 267–278 (2000).
Lee, L. C. et al. Neural responses to rigidly moving faces displaying shifts in social attention investigated with fMRI and MEG. Neuropsychologia 48, 477–490 (2010).
Pelphrey, K. A., Singerman, J. D., Allison, T. & McCarthy, G. Brain activation evoked by perception of gaze shifts: the influence of context. Neuropsychologia 41, 156–170 (2003).
Puce, A., Allison, T., Bentin, S., Gore, J. C. & McCarthy, G. Temporal Cortex Activation in Humans Viewing Eye and Mouth Movements. J. Neurosci. 18, 2188–2199 (1998).
Grèzes, J., Pichon, S. & De Gelder, B. Perceiving fear in dynamic body expressions. NeuroImage 35, 959–967 (2007).
Pichon, S., De Gelder, B. & Grèzes, J. Two different faces of threat. Comparing the neural systems for recognizing fear and anger in dynamic body expressions. NeuroImage 47, 1873–1883 (2009).
Kret, M. E., Pichon, S., Grèzes, J. & De Gelder, B. Similarities and differences in perceiving threat from dynamic faces and bodies. An fMRI study. NeuroImage 54, 1755–1762 (2011).
Grill-Spector, K., Knouf, N. & Kanwisher, N. The fusiform face area subserves face perception, not generic within-category identification. Nat. Neurosci. 7, 555–562 (2004).
Yovel, G. & Kanwisher, N. Face Perception: Domain Specific, Not Process Specific. Neuron 44, 889–898 (2004).
Kanwisher, N., McDermott, J. & Chun, M. M. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception. J. Neurosci. 17(11), 4302–4311 (1997).
Jacob, H. et al. Cerebral integration of verbal and nonverbal emotional cues: Impact of individual nonverbal dominance. NeuroImage 61, 738–747 (2012).
Bernstein, M., Erez, Y., Blank, I. & Yovel, G. An Integrated Neural Framework for Dynamic and Static Face Processing. Sci. Rep. 8, 7036 (2018).
McCarthy, G. Face-Specific Processing in the Human Fusiform Gyrus. J. Cogn. Neurosci. 9(5), 605–610 (1997).
Rotshtein, P., Henson, R. N. A., Treves, A., Driver, J. & Dolan, R. J. Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat. Neurosci. 8, 107–113 (2005).
Zhang, Z. et al. Decoding the temporal representation of facial expression in face-selective regions. NeuroImage 283, 120442 (2023).
Tsantani, M. et al. FFA and OFA Encode Distinct Types of Face Identity Information. J. Neurosci. 41, 1952–1969 (2021).
VanRullen, R. & Reddy, L. Reconstructing faces from fMRI patterns using deep generative neural networks. Commun. Biol. 2, 193 (2019).
Güçlütürk, Y., Güçlü, U., Seeliger, K. & Bosch, S. Reconstructing perceived faces from brain activations with deep adversarial neural decoding. NIPS. 12, 4249–4260 (2017).
Du, C., Du, C., Huang, L., Wang, H. & He, H. Structured Neural Decoding With Multitask Transfer Learning of Deep Neural Network Representations. IEEE Trans. Neural Netw. Learn. Syst. 33, 600–614 (2022).
Visconti Di Oleggio Castello, M., Chauhan, V., Jiahui, G. & Gobbini, M. I. An fMRI dataset in response to “The Grand Budapest Hotel”, a socially-rich, naturalistic movie. Sci. Data 7, 383 (2020).
Haxby, J. V., Guntupalli, J. S., Nastase, S. A. & Feilong, M. Hyperalignment: Modeling shared information encoded in idiosyncratic cortical topographies. eLife 9, e56601 (2020).
Wu, M. C.-K., David, S. V. & Gallant, J. L. Complete functional characterization of sensory neurons by system identification. Annu. Rev. Neurosci. 29, 477–505 (2006).
Vanderwal, T., Eilbott, J. & Castellanos, F. X. Movies in the magnet: Naturalistic paradigms in developmental functional neuroimaging. Dev. Cogn. Neurosci. 36, 100600 (2019).
Berkes, P., Orbán, G., Lengyel, M. & Fiser, J. Spontaneous Cortical Activity Reveals Hallmarks of an Optimal Internal Model of the Environment. Science 331, 83–87 (2011).
Olshausen, B. A. & Field, D. J. Natural image statistics and efficient coding. Network: Computation in Neural Systems 7(2), 333–339 (1996).
Olshausen, B. A. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D. & Tolhurst, D. J. The Receptive-Field Organization of Simple Cells in Primary Visual Cortex of Ferrets under Natural Scene Stimulation. J. Neurosci. 23, 4746–4759 (2003).
Hasson, U., Furman, O., Clark, D., Dudai, Y. & Davachi, L. Enhanced Intersubject Correlations during Movie Viewing Correlate with Successful Episodic Encoding. Neuron 57, 452–462 (2008).
Roberts, J. A. Fixational eye movements during viewing of dynamic natural scenes. Front. Psychol. 4, 797 (2013).
Lahner, B. et al. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat. Commun. 15, 6241 (2024).
Jiang, X. et al. DFEW: A large-scale database for recognizing dynamic facial expressions in the wild. In Proceedings of the 28th ACM International Conference on Multimedia, 2881–2889 (2020).
Lee, J., Kim, S., Kim, S., Park, J. & Sohn, K. Context-Aware Emotion Recognition Networks. in 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 10142–10151, https://doi.org/10.1109/ICCV.2019.01024 (IEEE, Seoul, Korea (South), 2019).
Barrouillet, P., Bernardin, S. & Camos, V. Time Constraints and Resource Sharing in Adults’ Working Memory Spans. J. Exp. Psychol. Gen. 133, 83–100 (2004).
Schneider, W. X. Selective visual processing across competition episodes: a theory of task-driven visual attention and working memory. Philos. Trans. R. Soc. B Biol. Sci. 368, 20130060 (2013).
Schultz, J. & Pilz, K. S. Natural facial motion enhances cortical responses to faces. Exp. Brain Res. 194, 465–475 (2009).
Sonkusare, S., Breakspear, M. & Guo, C. Naturalistic Stimuli in Neuroscience: Critically Acclaimed. Trends Cogn. Sci. 23, 699–714 (2019).
Benson, N. C. et al. The Human Connectome Project 7 Tesla retinotopy dataset: Description and population receptive field analysis. J. Vis. 18, 23 (2018).
Stigliani, A., Weiner, K. S. & Grill-Spector, K. Temporal Processing Capacity in High-Level Visual Cortex Is Domain Specific. J. Neurosci. 35, 12412–12424 (2015).
Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
Kay, K. et al. A critical assessment of data quality and venous effects in sub-millimeter fMRI. NeuroImage 189, 847–869 (2019).
Kang, X., Yund, E. W., Herron, T. J. & Woods, D. L. Improving the resolution of functional brain imaging: analyzing functional data in anatomical space. Magn. Reson. Imaging 25, 1070–1078 (2007).
Kay, K. N. & Yeatman, J. D. Bottom-up and top-down computations in word- and face-selective cortex. eLife 6, e22341 (2017).
Prince, J. S. et al. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 11, e77599 (2022).
Kay, K. N., Winawer, J., Mezer, A. & Wandell, B. A. Compressive spatial summation in human visual cortex. J. Neurophysiol. 110, 481–494 (2013).
Charest, I., Kriegeskorte, N. & Kay, K. N. GLMdenoise improves multivariate pattern analysis of fMRI data. NeuroImage 183, 606–616 (2018).
Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F. & Wandell, B. A. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front. Neurosci. 7, (2013).
Gorgolewski, K. J. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. DATA. 3, 160044 (2016).
Pan, C. et al. An fMRI dataset in response to large-scale short natural dynamic facial expression videos. OpenNeuro https://doi.org/10.18112/openneuro.ds005047.v1.0.7 (2024).
Garcia-Dias, R. et al. Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners. NeuroImage 220, 117127 (2020).
Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L. & Petersen, S. E. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage 59, 2142–2154 (2012).
Welvaert, M. & Rosseel, Y. On the Definition of Signal-To-Noise Ratio and Contrast-To-Noise Ratio for fMRI Data. PLoS ONE 8, e77089 (2013).
Nastase, S. A., Halchenko, Y. O., Connolly, A. C., Gobbini, M. I. & Haxby, J. V. Neural Responses to Naturalistic Clips of Behaving Animals in Two Different Task Contexts. Front. Neurosci. 12, 316 (2018).
Sengupta, A. A studyforrest extension, retinotopic mapping and localization of higher visual areas. Sci. DATA. 3, 160093 (2016).
David, S. V. & Gallant, J. L. Predicting neuronal responses during natural vision. Netw. Comput. Neural Syst. 16, 239–260 (2005).
Sahani, M. & Linden, J. F. How Linear are Auditory Cortical Responses? NIPS. 8, 125–132 (2002).
Lage-Castellanos, A., Valente, G., Formisano, E. & De Martino, F. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Comput. Biol. 15, e1006397 (2019).
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. 111, 8619–8624 (2014).
Güçlü, U. & van Gerven, M. A. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).
Lindsay, G. W. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J. Cogn. Neurosci. 33, 2017–2031 (2021).
Wang, L. et al. VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 14549–14560, https://doi.org/10.1109/CVPR52729.2023.01398 (IEEE, Vancouver, BC, Canada, 2023).
Wen, H. et al. Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision. Cereb. Cortex 28, 4136–4160 (2018).
Wang, Y. et al. InternVideo: General Video Foundation Models via Generative and Discriminative Learning. Preprint at http://arxiv.org/abs/2212.03191 (2022).
Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis – connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).
Chen, Y. et al. Cortical surface-based searchlight decoding. NeuroImage 56, 582–592 (2011).
Acknowledgements
This research was supported by two Key Programs of the National Natural Science Foundation of China (62106285, 82071884) and Major Projects of Technological Innovation 2030 of China (2022ZD0208500).
Author information
Contributions
B.Y. conceived of the idea and supervised the research. P.C., C.Z., B.L., L.T., L.W., S.M., L.C. and Z.Y. contributed to the discussion, experimental design, execution, and manuscript structure. P.C. and B.L. visualized the experimental results. P.C., C.Z., B.L. and B.Y. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.