Abstract
We recognize faces every day to help us gauge social situations, facilitate communication, and maintain relationships. A common goal in face perception research is to understand which specific face features are important during such tasks, and how shape information specific to one task (e.g., identity recognition) is influenced by that specific to another (e.g., expression recognition). We addressed these questions using reverse correlation in an anatomically-interpretable face shape space. Participants viewed pairs of randomly sampled faces and were tasked to select the face closest to a target identity or expression. By averaging the chosen noise patterns, we obtained an estimate of the target’s representation, or template; we then conducted tests to recover the face features significantly associated with recognition and compared templates across changes in face context. We find that recognition of identity and expression depends on subtle changes in shape distributed across the face, and that this shape information is flexible and specific to contextual changes. In particular, emotional expression changes our representation of identity, and face identity changes our representation of emotional expression. These results provide insight into the adaptability of the visual system’s use of sensory information.
Introduction
The interaction of facial identity and expression has been intensely studied in both vision science and affective science, with contradictory findings reported in each field. In vision science, it has generally been thought that the two dimensions are processed independently in the brain and behavior1,2,3,4, largely based on the neural theory that two separate face processing streams branch from the occipital face area, each specialized for different facial dimensions (e.g., identity, expression, motion). However, in the affective science literature, context specificity for identity and expression has been found5,6, with identity features (e.g., age) shown to influence the perception of expression7,8. The importance of situational context has roots in social cognition, in which context has been found to aid individuals in effectively processing social information. Here, we use established psychophysical methods to assess precisely whether and how facial context alters face perception. This is the first known study to apply a vision science technique (i.e., reverse correlation) to test the context specificity of face feature representations across identity and emotional expression. The methodology allows us to directly test whether identity features are entangled in emotional expression representations and vice-versa, and to contribute to the ongoing theoretical debate about invariance across facial dimensions.
In psychophysics research, reverse correlation9,10,11 and related techniques (e.g., bubbles12) are particularly useful to determine what stimulus information is used for classification. In reverse correlation (Fig. 1), participants are shown stimuli with randomly manipulated properties and asked to indicate whether each stimulus belongs to a specific category (e.g., “happy face”). Across many trials, the properties of stimuli classified as belonging to a category are averaged, resulting in a vector of property values that represents the template a participant uses to perform the categorization task. These templates reveal which stimulus properties are most important for categorization or recognition. Reverse correlation offers a linear approximation to the information used in a face perception task9 but, without modifications, it is ill-suited to the study of nonlinear effects such as context specificity. This limitation can be overcome by determining how templates change across stable contexts (e.g., how the template for “happy” changes across different identities), thereby providing a measure of context specificity in face representations13,14.
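The core logic of reverse correlation can be illustrated with a minimal simulation; this is a sketch that assumes a simple linear observer acting on an arbitrary 20-dimensional parameter space, so the feature count, decision rule, and internal noise level are illustrative choices rather than the study's actual stimulus model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 20          # illustrative number of shape parameters
n_trials = 5000          # many trials are needed for a stable template
noise_sd = 0.08          # matches the stimulus SD reported in Methods

# Hypothetical "true" internal template of the simulated observer.
true_template = rng.normal(0, 1, n_features)

chosen = np.zeros((n_trials, n_features))
for t in range(n_trials):
    noise = rng.normal(0, noise_sd, n_features)   # random face
    # The pair consists of the face (+noise) and its anti-face (-noise).
    # A noisy linear observer picks whichever stimulus better matches its template.
    evidence = noise @ true_template + rng.normal(0, 1)
    chosen[t] = noise if evidence > 0 else -noise

# The estimated template is simply the mean of the chosen noise vectors.
estimated_template = chosen.mean(axis=0)

# The estimate should correlate strongly with the generating template.
print(np.corrcoef(true_template, estimated_template)[0, 1])
```

Because the chosen noise vectors are averaged, properties irrelevant to the observer's template tend toward zero, while informative properties accumulate in the direction of the template.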
Reverse Correlation Procedure: Stimulus Generation, Experimental Task, and Quantitative Analysis. (A) Stimuli are generated in a multidimensional shape space that models each face feature as an orthogonal dimension. The origin of the space is the average of multiple identities for the identity shape model, and a neutral expression for the expression pose model. Stimuli can be represented by vectors in this space, as denoted by the purple and green arrows. The arrows connect at the midpoint and point in exactly opposite directions, which indicates these faces are anti-expressions of each other. (B) Two stimuli (a random expression and its anti-expression) are presented to the participant, who must use their internal representation of the target dimension (e.g., anger expression) to choose which stimulus has higher sensory evidence favoring the target dimension (e.g., anger expression). 3D face images were rendered using MakeHuman15 v1.2 (www.makehumancommunity.org). (C) Over many trials, the chosen features can be modeled in the same multidimensional shape space in which the stimuli were created. The mean vector (orange line) of chosen features represents the shape parameters that make up the estimated observer template for the target dimension (e.g., anger). 3D face images were rendered using MakeHuman15 v1.2 (www.makehumancommunity.org).
Significant Face Shape Features for Identity. The ten parameters with the lowest p-values for the identity groups, divided by face region: top, middle, and bottom. The left face models depict how the significant parameters change the face mesh when compared to the average male identity. The colors represent differences between the average (i.e., zero model) and the recovered template. Blue represents horizontal changes, green represents vertical changes, and red represents depth changes in shape. The right panels depict which specific face parameters were significant and how they differ from the average model (arrows and boxes). “l/r” means both the left and right sides of that parameter were significant. 3D face images were rendered using a plugin from FaReT16 via MakeHuman15 v1.2 (www.makehumancommunity.org).
Previous face perception studies using reverse correlation typically obtained templates averaged across changes in face context12,13,17,18,19,20,21,22,23,24,25. This approach led to expression templates including mostly information that is invariant across identities, and identity templates including mostly information that is invariant across expressions, providing little information about context specificity.
Early studies using reverse correlation in face perception research primarily manipulated pixel luminance in face images10,26,27. The templates obtained from such procedures are not directly interpretable, because natural changes in face shape (e.g., nose width, lip thickness, or the lifting of the oral commissure to produce a smile) produce coordinated and complex variation across multiple adjacent pixels. As a result, interpretations have relied on identifying clusters of statistically significant pixels and mapping these clusters onto facial regions with semantic meaning (e.g., the mouth or nose). While this approach provides a useful qualitative link between perceptual information and facial areas, it lacks a direct correspondence to the underlying morphological structure of the face.
The increasing adoption of three-dimensional (3D) face modeling has enabled researchers to perform reverse correlation directly within a space of shape features. For example, facial expressions have been modeled using local blend morphs that implement Facial Action Coding System (FACS) action units28,29,30. This approach produces important insights into how anatomically based and semantically meaningful facial movements are represented in both perception and the brain. By linking perceptual templates to specific, well-defined action units, researchers can more precisely interpret which facial movements carry diagnostic information for recognizing expressions.
In contrast, 3D modeling of static facial morphology— the basis of facial identity— has typically relied on global morphable models28,31,32,33,34. These models use blend morphs that simultaneously influence multiple facial features, making interpretation difficult: their variation cannot be easily described in terms of semantically meaningful labels, nor are they grounded in known anatomy. In the biomedical literature, normal and abnormal variation in facial shape is described in terms of local morphological features35 or anatomical landmarks36, which, like FACS action units, require modeling through local blend morphs37. Without this anatomical grounding, it remains unclear which specific face features are most useful during visual processing for identity recognition. Consequently, interpretations can be subjective and are often limited to noting that templates capture general facial areas— such as the eyes and mouth— that are commonly associated with identity classification13,20,21,22,23,24,25,38.
Here, we perform reverse correlation in a space of anatomically-interpretable and blendshape-based face shape parameters for both identity and expression— local morphological features that are semantically meaningful and grounded in anatomical descriptions from the relevant literature (see the Stimuli section in Methods for more information on how the face models are constructed)— and we obtain templates for specific combinations of identities and expressions rather than averaging out context-specific information. We find that, for both identity and expression, (1) shape features distributed across the entire face are informative, rather than being limited to the localized areas identified in previous research, (2) people are highly sensitive to minute changes in face shape, with effects in regions smaller than the typical significant clusters found in traditional studies, and (3) the information used during identity and expression recognition is highly flexible and context specific; that is, information sampled during identity recognition varies with expression, and vice versa. Additionally, the face features used during identity recognition vary depending on the identity presented. These results build on previous findings in vision science13,20,21,22,23,24,25,38 by showing that the use of face information is more precise and flexible than previously reported, and in affective science39,40,41,42 by providing a detailed account of how facial information is used across contexts.
Results
The visual system uses face shape information with high precision during identity and expression recognition
Our first goal was to determine which face features were informative for recognizing a face’s identity or expression. We used stimulus-generation software16 to create three-dimensional face models in which identity and expression were defined by separate parameter spaces. On each trial, we randomly sampled two faces from these spaces and asked participants to choose the one that most resembled a particular category (e.g., “Joe” or “happy”; see Methods). We then averaged the chosen parameter vectors to estimate the observer template (i.e., via reverse correlation; Fig. 1) for either an identity or expression, with separate participant groups assigned to each specific identity or expression. We used permutation testing to determine which parameters were significantly informative for each task.
Results for the identity groups showed that, although many significant parameters were clustered around the eyes and mouth, consistent with previous findings13,20,21,22,23,24,25,38, significant parameters were also distributed across the entire face and reflected smaller, more precise changes in shape. Moreover, the specific features important for recognition varied by identity. Figure 2 illustrates these results and highlights several of the most noticeable significant features (see Supplementary Materials for the set of all significant parameters for each group).
Significant Face Shape Features for Emotional Expression. The ten parameters with the lowest p-values for the expression groups, divided by face region: top, middle, and bottom. The left face models depict the significant parameters when compared to the average neutral expression in FaReT space. The colors represent differences between the average (i.e., zero model) and the recovered template. The right graphic depicts which face features (names defined in FaReT) were significant and how they differ from the average model (arrows). “l/r” means both the left and right sides of that parameter were significant. 3D face images were rendered using a plugin from FaReT16 via MakeHuman15 v1.2 (www.makehumancommunity.org).
We found similar results for the expression templates, except that for many expressions, the significant features were concentrated in the lower half of the face. The anger and sadness templates consistently showed significant features in both the upper and lower face halves, with sadness showing smaller effects in the upper half (Fig. 3). These findings are consistent with previous studies13,20,21,22,23,24,25,38, but our approach also allowed us to isolate informative features across the entire face. It has been proposed that rapid and thorough processing of negative expressions provides an evolutionary advantage by helping to avoid potential threats43,44. In light of this, it is sensible that observers use information distributed across the entire face when detecting anger. Interestingly, although anger did not have the highest total number of significant parameters (see Supplementary Materials), it had the widest distribution of significant features across facial regions, possibly reflecting the visual system’s efficiency in detecting it. Notably, disgust, also an indicator of potential danger45, had the highest total number of significant parameters.
Evidence of context specificity in face recognition: informative identity features vary with expression, and informative expression features vary with identity
Our main goal was to assess whether templates change significantly when an irrelevant facial context varies. For example, when recovering identity templates, do these templates change if the faces presented are happy instead of neutral? To test this possibility, we included additional blocks of trials in which the task remained the same, but the presented faces were changed along the face dimension irrelevant to the main task (e.g., classifying identities shown with a happy expression). We performed pairwise permutation tests to assess significant differences between the recovered templates in the two contexts. We found many significant differences in the templates between facial contexts (Fig. 4; see Figure S3 in the Supplementary Materials for depictions of the significant face features).
Evidence of Context Specificity During Recognition of Identities and Expressions. Rendered “difference” models. The top half represents identity templates (i.e., target was identity) comparing shape information differences when the expression was neutral (i.e., 0) versus one of six expressions. The bottom half represents expression templates (i.e., target was expression) comparing shape information differences when the identity was the male average (i.e., 0) versus one of two identities. All differences are displayed in red. 3D face images were rendered using a plugin from FaReT16 via MakeHuman15 v1.2 (www.makehumancommunity.org).
These results suggest that the visual system is far more flexible in its use of face information than previously recognized. Perceptual decisions rely on information that changes not only with the specific identity or expression, but also with the context in which they appear (e.g., same identity with different expressions, or vice-versa). This context specificity indicates that the visual system adapts its use of facial shape information to different scenarios, an adaptation likely shaped for optimal navigation of our ever-changing social world.
Shape information used for identity recognition is negatively correlated with emotional expression context
We observed that templates used for identity recognition often appeared to display an expression opposite to the expression context shown to participants (see Fig. S2 in the Supplementary Materials). To assess this possibility, we conducted an additional analysis (see Fig. S1 in Supplementary Materials). Because identity and expression were modeled in separate parameter spaces, we compared three-dimensional mesh models by correlating the vertex coordinates of two templates for each group: (1) the mesh recovered using reverse correlation with a change in the irrelevant feature (e.g., an identity template obtained in the context of a happy expression), and (2) the mesh recovered without that change (e.g., an identity template obtained in the context of a neutral expression), but with the expression context superimposed (e.g., adding a previously validated happy expression16). We found that the meshes were significantly and negatively correlated across all groups (r=[-0.102, -0.038], p<0.01; see values in Table S1 in Supplementary Materials). Interestingly, although the identity and expression shape spaces16 were defined by entirely different parameters, the results revealed significantly correlated shape information at the level of the face surface. These results suggest that observers may expect or internally represent neutral faces during identity recognition, leading them to sample opposite expression information to counteract context effects on shape. For completeness, we conducted a similar analysis for the expression groups by correlating expression meshes across identity contexts. Here, we found no consistent pattern: both positive and negative correlations appeared within expressions (i.e., across identities) and across expressions.
Discussion
Summary
We employed reverse correlation in an anatomical face space to estimate the information used for recognizing identities and expressions in specific facial contexts. Rather than averaging the information used for a given category (e.g., happiness) across different contexts (e.g., identities), we estimated it for specific identity-expression combinations. We found that the information used for both identity and expression classification was distributed across the face, highly precise (i.e., reflected in small anatomical shape changes), and dependent on both the specific recognition target (identity or expression) and the irrelevant facial context in which it was shown. This latter finding supports the hypothesis favored in the recent affective science literature5,6,7,8 that face perception is highly nonlinear and context-specific.
Using pixel-based approaches, previous studies13,20,21,22,23,24,25,38,46,47,48 have shown that the areas around the eyes and mouth are particularly informative for recognizing identity and expression. Building on these findings13,20,21,22,23,24,25,38,46,47,48, we demonstrate that the visual system can also detect minute changes in other facial areas (e.g., nose, chin) and use this shape information for decision-making. Moreover, we isolated features that are informative for recognizing some identities or expressions, but not others, indicating that the visual system is highly selective, and that facial representations are precisely fine-tuned during recognition.
These results have important theoretical implications in the face recognition literature, showing that the visual system flexibly adapts to the available facial information, which shifts depending on contextual factors. This work advances the vision science literature— particularly the many reverse correlation studies on face perception— by moving beyond identifying important regions for recognition to examining how changes in one facial dimension (e.g., expression) alter the diagnostic features used to recognize another (e.g., identity). While recent research49 suggests that identity and expression are perceived holistically or through Gestalt processes50, we provide direct evidence of this interaction using template representations.
Research on facial identity and emotional expression processing has evolved through decades of theoretical and empirical work. Bruce and Young2 first proposed that identity and expression are processed independently, a framework later supported by neural evidence4. Subsequent studies have suggested that this independence may be context-dependent, with facial motion acting as a potential modulator of structural representations51. More recently, a revised neural model3 proposed that shape and motion are processed separately, and that emotional expression information can be integrated within either or both streams. The results presented here support this revised model by providing evidence that the processing of identity and static emotional expression is context specific, suggesting that identity and expression representations are processed interactively.
Within the affective science literature, the integration of identity and expression cues has been proposed to enhance social communication and survival52. Our results show that this integration operates at the level of visual shape: identity information can influence how expressions are represented, and expression information can influence how identities are represented, further supporting the social pathway theory of situational processing7,8, in which emotional expression is proposed to be contextually important for identity representation and vice-versa. For example, the same angry expression might appear more intense when shown on Bob’s face than on Joe’s, due to the interaction between identity-specific shape features and the emotional expression. As a result, we might be more cautious around Joe, even if he visually appears less angry than Bob, because identity shape features modulate the perception of expression, and vice-versa.
An assumption underlying our research is that face psychophysics benefits from stimulus-generation models based on an interpretable morphological face space. Psychophysics measures and manipulates physical stimulus features to quantitatively relate them to responses in a perceptual task, which requires carefully selecting those measurable features. Some questions in high-level vision cannot be answered by manipulating low-level features, such as pixel luminance. Consequently, in face research, manipulation of three-dimensional shape via morphable face models53 has become increasingly common28.
However, widely used morphable models (e.g., the Basel face model) typically manipulate global shape features, which are difficult to interpret and hard to link to the true biophysical determinants of face shape. In contrast, using a morphological space that manipulates local facial features addresses these issues: it provides a more interpretable framework for face psychophysics, yields an identity space that more closely resembles the expression space (often defined by local action units), and establishes connections to biomedical and genetic research on facial shape55,56,57,58,59.
Our results open new directions for neuroscience research. Reverse correlation and related techniques have been widely applied to EEG and MEG, revealing the time course of shape information processing in the visual stream21,60,61,62,63. However, relatively little is known about how brain areas use facial information in a context-specific manner, and further neuroimaging studies are needed to build on the findings reported here.
Limitations
Past research has documented a wide range of emotional states and expressions beyond the six basic emotions examined here— expressions that are important for social communication but currently underexplored in vision science. Further work is required to determine whether our findings extend to emotions other than the six basic ones identified by Ekman64.
Previous research has shown that categorization training can make estimated templates more invariant to contextual changes13. To test this possibility, we conducted a control experiment with increased categorization training. Even with additional training that included variation in the irrelevant expression dimension, we still found significant differences between identity templates across expressions (see Supplementary Materials).
Reverse correlation provides a linear approximation of the information used in a face perception task9. In the present study, we examined nonlinear face processing (i.e., context specificity) by obtaining and comparing such linear estimates across facial contexts. A limitation of this approach is that it does not reveal exactly how the two sources of information interact. While our analyses showed that shape information used to recognize identity changes systematically with expression, future research using methods that directly estimate nonlinear psychophysical kernels69 could more precisely characterize how specific identity and expression features interact. Such estimates may also shed light on how representations at different levels of integration within the visual system (e.g., individual features, feature clusters, categories) contribute to face recognition, and whether top-down modulation occurs across these levels (e.g., from identity to eye to eyelid fold).
We opted for experimental control by using three-dimensional computerized face models to construct our stimuli. While these methods offer clear scientific advantages, we recognize the limitations of using computer-generated faces instead of natural human faces. Controlling for non-facial features and precisely manipulating and measuring face shape changes enhanced experimental rigor, though possibly at the cost of reduced generalizability (for a meta-analysis of how computer-generated faces are perceived, see Miller et al. 202366).
Conclusion
Altogether, this research presents an improved methodology to study how shape information influences face perception, opening the door to exciting future investigations. Using this approach, we found that shape information important for identity and expression recognition is precise, flexible, and context-specific. This is the first known study to measure context-specificity in face perception using controlled psychophysics, with interactions explained through interpretable anatomical face shape features. We believe that future work can build on this methodology to address unanswered questions in face perception research.
Methods
Participants
480 students (20 participants per group, detailed below; mean age=23.55, SD=6.34; 389 females, 81 males) at Florida International University were recruited to participate in the study. The study was approved by the Institutional Review Board ethics committee at Florida International University and informed consent was obtained from all participants. All methods were performed in accordance with the relevant guidelines and regulations. Data was collected online via Pavlovia (https://pavlovia.org) and the experiment was programmed in PsychoPy3 (v2020.2.10).
Stimuli
Stimuli were created using FaReT16, an open-source toolbox for three-dimensional modeling of face shape parameters such as identity and expression via MakeHuman15 (www.makehuman.org). FaReT via MakeHuman constructs face models using separate parameter spaces for identity and expression, representing each face as a vector of local morphological features (Fig. 1a) and each expression as a vector of local action poses. MakeHuman represents facial features as anatomically-based linear blendshapes and applies nonlinear parametric smoothing onto the mesh to increase surface realism67. Many of these facial parameters correspond directly to anatomical landmarks. For instance, the software can model both skeletal variations (e.g., head scale horizontal changes the bizygomatic width) and soft-tissue deformations (e.g., scale upperlip volume changes the assumed underlying fat pad distribution).
We developed a new plugin in FaReT16 to render randomly generated faces and anti-faces (see below) by varying either identity or expression, while keeping other facial dimensions constant. Head shape and ears were fixed, as our interest was limited to changes in inner face features (e.g., eyes, mouth). The camera angle was centered on the face (x=0, y=0) with a default zoom of 8.7, excluding any view below the base of the neck. Images were rendered to 400 \(\times\) 300 pixels with anti-aliasing and a lightmap applied. Skin texture was set to “young Caucasian male” to standardize age, race, and sex across stimuli.
Although the forehead is treated as a facial feature in FaReT16, we held it constant because applying the same standard deviation as the other features produced unnatural-looking faces. To maintain equivalent variability across features while preserving facial realism, we fixed the forehead height parameters to the “average male” model values.
Two faces were rendered for each trial. To construct the first random face, a random noise pattern (e.g., a vector of identity values) was added to the base vector (e.g., average male with a neutral expression). This noise pattern was generated by sampling from a Gaussian distribution centered on the average (either average male identity in the FaReT database or zero expression) with a standard deviation of 0.08. All facial features were modulated simultaneously, but with varying magnitudes and directions.
The second, opposite “anti” face was created by multiplying the noise pattern by -1 and adding it to the base vector, producing the exact opposite shape change from that of the first face. Each experimental group was presented with different stimuli generated from different noise patterns. In the identity groups, the neutral and expressive face conditions were generated from separate random noise samples; likewise, in the expression groups, the different identity conditions were generated from separate samples.
Because all faces in FaReT contain all defined facial features (with the average centered at 0), each randomly sampled stimulus contained information from all features. Sampling all features randomly over 500 trials yielded a robust representation of feature combinations. Furthermore, because the stimulus noise was defined by randomizing shape parameters, the resulting faces appeared as realistic random identities or expressions (see Fig. 1 for an illustration of this procedure).
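The construction of each trial pair can be summarized in the short sketch below. It assumes faces are plain parameter vectors centered on a base model and omits the rendering step through the FaReT plugin and MakeHuman; the 30-parameter space and the helper name make_trial_pair are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

NOISE_SD = 0.08  # standard deviation reported in Methods

def make_trial_pair(base_vector, noise_sd=NOISE_SD, rng=rng):
    """Return a (face, anti_face, noise) triple of shape-parameter vectors.

    base_vector is the origin of the sampled space (e.g., the average male
    identity, or the all-zero neutral expression). Rendering the vectors
    into images is assumed to happen elsewhere.
    """
    noise = rng.normal(0.0, noise_sd, size=base_vector.shape)
    face = base_vector + noise          # random face
    anti_face = base_vector - noise     # exact opposite shape change
    return face, anti_face, noise

# Example: a hypothetical 30-parameter identity space centered on zero.
base = np.zeros(30)
face, anti_face, noise = make_trial_pair(base)
assert np.allclose((face + anti_face) / 2, base)  # the pair is symmetric about the base
```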
Groups
Participants were assigned to either an identity group or an expression group. Those in the identity groups selected the face that most resembled a target identity while those in the expression groups selected the face that most resembled a target expression. Data was collected for each group until 20 participants met the inclusion criteria.
Identity groups
When the target to be identified was an identity, expression was held constant within trial blocks (four blocks of 200 trials and two blocks of 100 trials). Across six different groups, expressions varied between neutral and one of the six basic expressions: happiness, anger, sadness, surprise, fear, or disgust. Randomized face feature values (i.e., Gaussian noise; SD=0.08) were generated in the identity space and superimposed onto the average model. For each expression condition, 500 faces were rendered, yielding a total of 7,000 stimuli (500 trials \(\times\) 2 identities \(\times\) 7 expressions).
Each participant in the identity groups completed 500 trials in the neutral condition and 500 trials in one of the six irrelevant-expression conditions. In total, there were 12 identity groups (2 target identities \(\times\) 6 varied expressions).
Emotion groups
When the target to be identified was an expression, identity was held constant within trial blocks (four blocks of 200 trials and two blocks of 100 trials). Identities varied between the average male identity defined in FaReT16 space and one of two male identities. Randomized face feature values (i.e., Gaussian noise; SD=0.08) were generated in the expression space and superimposed onto the neutral expression. 500 faces were rendered, yielding a total of 3,000 generated stimuli (500 trials \(\times\) 2 expressions \(\times\) 3 identities).
Each participant in the expression groups completed 500 trials in the average male condition and 500 trials in one of the two irrelevant identity conditions. In total, there were 12 expression groups (6 target expressions \(\times\) 2 varied identities).
Procedure
Familiarization
Previous research has shown that unfamiliar faces are processed differently from familiar faces. Presenting animated videos of faces from multiple viewpoints not only helps participants learn an identity quickly, but also reduces the likelihood of image-matching. Therefore, the task began with a familiarization phase in which participants viewed the experimental identities as animated videos. The faces rotated 30 degrees horizontally into different viewpoints while their expression changed from neutral to each of the six target expressions. Each video was shown twice, and participants were instructed to memorize the faces.
Training (identity groups only)
Participants in the identity groups completed 132 training trials to learn the identity they would later be tasked to recognize. As in the experimental trials, two faces were presented side-by-side, and participants were instructed to select the target identity (e.g., “Choose Joe”). However, instead of two experimental stimuli, one face was the target identity and the other was one of 11 possible distractor identities. Participants in the expression groups did not complete training, as individuals are generally assumed to be familiar with the target expressions.
Main reverse correlation task
In each trial, participants were shown two faces simultaneously. One was a randomly sampled face (i.e., random expression for expression groups or random identity for identity groups), and the other was its anti-face— the mathematically opposite face (anti-expression for expression groups or anti-identity for identity groups). Participants indicated which face most resembled the target identity or expression by pressing the left or right arrow key.
This forced-choice design, combined with Gaussian noise, naturally controls for response uncertainty. Stimuli can be visualized as a cloud of points around the origin model (e.g., the average identity or neutral expression; see Fig. 1c). Stimuli that closely match the participant’s internal template are chosen consistently and contribute strongly to the final estimate, whereas stimuli that deviate from the template lead to more variable choices. Because opposite stimuli are presented within a trial pair, inconsistent responses across trials cancel each other out in the averaging process, ensuring that only informative trials influence the recovered template.
Each block presented stimuli in a single face context (i.e., neutral or one expression for identity groups; average identity or a specific identity for expression groups). Most blocks contained 200 experimental trials and 4 catch trials (see below), except the final block, which contained 100 experimental trials and 2 catch trials. Face context alternated between blocks, and for the identity groups, 11 refresher training trials were presented between blocks. Importantly, the specific face context was irrelevant to the perceptual task.
Catch trials were included to ensure participants remained attentive throughout the experiment. These trials resembled experimental trials but paired the target with a “distorted” identity for identity groups or the anti-expression for expression groups (see Fig. S4 in the Supplementary Materials for examples). An incorrect response on a catch trial was taken as evidence of inattention, as these trials had a definite correct answer (the target) and a definite incorrect answer (the distractor).
Analysis
Inclusion criteria
Participants were included in the analysis if they achieved at least 85% accuracy on the training trials and if they missed no more than two catch trials (equivalent to 90% accuracy). For each group, the first 20 complete datasets meeting these criteria were analyzed.
Reverse correlation procedure
As shown in Fig. 1, the two stimuli in each trial were generated from the same noise vector, but with opposite signs. Individual templates were estimated by averaging the parameters of the chosen faces across many trials, and group templates were estimated by averaging across all participants in a group.
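The estimator itself reduces to a signed average, as in the hedged sketch below; the array shapes and function names are illustrative, and the sketch assumes each trial's noise vector and the participant's choice (face vs. anti-face) have already been logged.

```python
import numpy as np

def individual_template(noise_vectors, chose_positive):
    """Average the signed noise of the chosen face on each trial.

    noise_vectors: (n_trials, n_features) noise added to the base face.
    chose_positive: boolean array; True if the participant chose the
    +noise face, False if they chose the anti-face (-noise).
    """
    signs = np.where(chose_positive, 1.0, -1.0)[:, None]
    return (signs * noise_vectors).mean(axis=0)

def group_template(individual_templates):
    """Group template: mean over an (n_subjects, n_features) stack of individual templates."""
    return np.asarray(individual_templates).mean(axis=0)
```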
Permutation tests
To identify features significantly associated with participants’ perceptual decisions, we conducted a one-sample nonparametric permutation test to determine which features differed significantly from zero. Permutations were iterated 5,000 times to construct a null distribution. In each iteration, the sign of each parameter (i.e., face feature value) was randomly flipped (i.e., multiplied by ±1), and the test statistic (the standardized group-average template) was computed. The maximum value across all features in each iteration was used to form the empirical distribution function (EDF), thereby controlling for multiple comparisons. The observed test statistic for each feature was then compared to the EDF, with p-values representing the proportion of permuted statistics exceeding the observed statistic.
To assess whether observer templates changed with variations in an irrelevant facial context (i.e., context specificity), individual templates from the two contexts (e.g., neutral or happy expression) were subtracted, and a paired-sample permutation test based on the same algorithm described above was applied, using the average difference array as the test statistic. Permutations were iterated 5,000 times and parameters were again normalized. Significant parameters indicated that participants relied on different facial features depending on the context.
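A hedged sketch of this max-statistic permutation scheme is given below. It follows the description above (random sign flips of the individual template values and a standardized group-average statistic), but exact details such as the normalization may differ from the analysis code used in the study. The same function applies to the paired-sample test when it is given the per-participant difference arrays.

```python
import numpy as np

def max_stat_permutation(data, n_perm=5000, seed=0):
    """Nonparametric permutation test with max-statistic correction.

    data: (n_subjects, n_features) array of individual templates, or of
    per-participant template differences for the paired-sample version.
    Returns one family-wise-error-corrected p-value per feature.
    """
    rng = np.random.default_rng(seed)
    n_subj, _ = data.shape

    def standardized_mean(x):
        # Group-average template, standardized by its standard error.
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_subj))

    observed = np.abs(standardized_mean(data))

    # Null distribution of the maximum statistic across features.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=data.shape)  # random sign flips
        max_null[i] = np.abs(standardized_mean(flips * data)).max()

    # p-value per feature: proportion of permuted max statistics
    # exceeding that feature's observed statistic.
    return (max_null[None, :] >= observed[:, None]).mean(axis=1)
```

Features whose corrected p-values fall below the chosen alpha level would then be reported as significantly informative (or, for the paired version, as significantly context specific).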
Exploratory correlation analysis
To further investigate the interactions between identity and expression representations, we tested whether a significant correlation existed between the face shape of the expressive identity templates (i.e., obtained in non-neutral blocks) and the face shape of the average identity template with the same expression superimposed. Because the stimulus spaces for identity and expression were independent, correlations were computed using the coordinates of the polygon mesh vertices as a common shape space.
To accomplish this, each face was exported from the FaReT16 model space to a common mesh file to extract the three-dimensional coordinates of each vertex. We then isolated the face coordinates, calculated the vector direction from the MakeHuman15 origin mesh by subtracting the origin mesh from the template meshes, and normalized each vector by dividing by its magnitude (\(\sqrt{x^{2}+y^{2}+z^{2}}\)). Pearson correlations were then computed between the coordinate vectors for each pair. Figure S1 in the Supplementary Materials illustrates this procedure. A negative correlation in this space would indicate that participants— faced with recognizing an identity in the context of a facial expression— sampled anti-expression shape information in addition to the target identity.
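The vertex-level comparison can be sketched as follows, assuming the exported meshes have already been loaded as (n_vertices, 3) coordinate arrays; mesh export and face-vertex selection from MakeHuman/FaReT are not shown, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

def normalized_displacement(template_vertices, origin_vertices):
    """Per-vertex displacement from the origin mesh, normalized to unit length.

    Both inputs are (n_vertices, 3) arrays of x, y, z coordinates; loading
    the exported mesh files is assumed to have happened elsewhere.
    """
    disp = template_vertices - origin_vertices
    norms = np.linalg.norm(disp, axis=1, keepdims=True)   # sqrt(x^2 + y^2 + z^2)
    norms[norms == 0] = 1.0                                # avoid division by zero
    return disp / norms

def mesh_correlation(mesh_a, mesh_b, origin):
    """Pearson correlation between the flattened, normalized displacement fields.

    Returns the (r, p) pair from scipy.stats.pearsonr.
    """
    a = normalized_displacement(mesh_a, origin).ravel()
    b = normalized_displacement(mesh_b, origin).ravel()
    return pearsonr(a, b)
```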
Data availability
The datasets generated during and analyzed during the current study are available in the OSF repository, https://doi.org/10.17605/OSF.IO/5ZEXF. This project was pre-registered via OSF, https://doi.org/10.17605/OSF.IO/EBX6S. Any additional information required to reanalyze the data reported in this paper is available from the lead contact (Fabian Soto: fasoto@fiu.edu) upon request.
References
Bernstein, M. & Yovel, G. Two neural pathways of face processing: A critical evaluation of current models. Neurosci. Biobehav. Rev. 55, 536–546. https://doi.org/10.1016/j.neubiorev.2015.06.010 (2015).
Bruce, V. & Young, A. Understanding face recognition. Br. J. Psychol. 77, 305–327. https://doi.org/10.1111/j.2044-8295.1986.tb02199.x (1986).
Duchaine, B. & Yovel, G. A revised neural framework for face processing. Annu. Rev. Vis. Sci. 1, 393–416. https://doi.org/10.1146/annurev-vision-082114-035518 (2015).
Haxby, J. V., Hoffman, E. A. & Gobbini, M. I. The distributed human neural system for face perception. Trends Cogn. Sci. 4, 223–233. https://doi.org/10.1016/s1364-6613(00)01482-0 (2000).
Aviezer, H., Ensenberg, N. & Hassin, R. R. The inherently contextualized nature of facial emotion perception. Curr. Opin. Psychol. 17, 47–54. https://doi.org/10.1016/j.copsyc.2017.06.006 (2017).
Hess, U. & Hareli, S. The Emotion-Based Inferences in Context (EBIC) Model. In The Social Nature of Emotion Expression (eds Hess, U. & Hareli, S.) 1–5 (Springer International Publishing, 2019). https://doi.org/10.1007/978-3-030-32968-6_1.
Albohn, D. N. & Adams, R. B. Jr. Social vision: At the intersection of vision and person perception. In Neuroimaging Personality, Social Cognition, and Character (ed. Albohn, D. N.) 159–186 (Elsevier Academic Press, 2016). https://doi.org/10.1016/B978-0-12-800935-2.00008-7.
Albohn, D. N., Brandenburg, J. C. & Adams, R. B. Perceiving emotion in the neutral face: A powerful mechanism of person perception. In The Social Nature of Emotion Expression (ed. Albohn, D. N.) 25–47 (Springer International Publishing, 2019).
Murray, R. F. Classification images: A review. J. Vis. 11, 2. https://doi.org/10.1167/11.5.2 (2011).
Murray, R. F. Classification images and bubbles images in the generalized linear model. J. Vis. 12, 2. https://doi.org/10.1167/12.7.2 (2012).
Lu, Z. L. & Dosher, B. Visual Psychophysics: From Laboratory to Theory (MIT Press, 2013).
Gosselin, F. & Schyns, P. G. Bubbles: A technique to reveal the use of information in recognition tasks. Vision. Res. 41, 2261–2271. https://doi.org/10.1016/S0042-6989(01)00097-9 (2001).
Soto, F. A. Categorization training changes the visual representation of face identity. Atten. Percept. Psychophys. 81, 1220–1227. https://doi.org/10.3758/s13414-019-01765-w (2019).
Soto, F. A. & Beevers, C. G. Perceptual observer modeling reveals likely mechanisms of face expression recognition deficits in depression. Biol. Psychiatry Cogn. Neurosci. Neuroimag. 9, 597–605. https://doi.org/10.1016/j.bpsc.2024.01.011 (2024).
The MakeHuman Team. MakeHuman (www.makehumancommunity.org).
Hays, J., Wong, C. & Soto, F. A. FaReT: A free and open-source toolkit of three-dimensional models and software to study face perception. Behav. Res. Methods 52, 2604–2622. https://doi.org/10.3758/s13428-020-01421-4 (2020).
Schyns, P. G., Bonnar, L. & Gosselin, F. Show me the features! Understanding recognition from the use of visual information. Psychol. Sci. 13, 402–409. https://doi.org/10.1111/1467-9280.00472 (2002).
Caldara, R. et al. Does prosopagnosia take the eyes out of face representations? Evidence for a defect in representing diagnostic facial information following brain damage. J. Cogn. Neurosci. 17, 1652–1666. https://doi.org/10.1162/089892905774597254 (2005).
Butler, S., Blais, C., Gosselin, F., Bub, D. & Fiset, D. Recognizing famous people. Attent. Percept. Psychophys. 72, 1444–1449. https://doi.org/10.3758/APP.72.6.1444 (2010).
Smith, M. L., Cottrell, G. W., Gosselin, F. & Schyns, P. G. Transmitting and decoding facial expressions. Psychol. Sci. 16, 184–189. https://doi.org/10.1111/j.0956-7976.2005.00801.x (2005).
Schyns, P. G., Petro, L. S. & Smith, M. L. Dynamics of visual information integration in the brain for categorizing facial expressions. Curr. Biol. 17, 1580–1585. https://doi.org/10.1016/j.cub.2007.08.048 (2007).
Smith, F. W. & Schyns, P. G. Smile through your fear and sadness: Transmitting and identifying facial expression signals over a range of viewing distances. Psychol. Sci. 20, 1202–1208. https://doi.org/10.1111/j.1467-9280.2009.02427.x (2009).
Ewing, L., Karmiloff-Smith, A., Farran, E. K. & Smith, M. L. Developmental changes in the critical information used for facial expression processing. Cognition 166, 56–66. https://doi.org/10.1016/j.cognition.2017.05.017 (2017).
Ewing, L., Farran, E. K., Karmiloff-Smith, A. & Smith, M. L. Understanding strategic information use during emotional expression judgments in Williams syndrome. Dev. Neuropsychol. 42, 323–335. https://doi.org/10.1080/87565641.2017.1353995 (2017).
Lee, J., Gosselin, F., Wynn, J. K. & Green, M. F. How do schizophrenia patients use visual information to decode facial emotion?. Schizophr. Bull. 37, 1001–1008. https://doi.org/10.1093/schbul/sbq006 (2011).
Dotsch, R. & Todorov, A. Reverse correlating social face perception. Soc. Psychol. Personal. Sci. 3, 562–571. https://doi.org/10.1177/1948550611430272 (2012).
Liu, M. et al. Facial expressions elicit multiplexed perceptions of emotion categories and dimensions. Curr. Biol. 32, 200-209.e6. https://doi.org/10.1016/j.cub.2021.10.035 (2022).
Yu, H., Garrod, O. & Schyns, P. Perception-driven facial expression synthesis. Comput. Graph. 36, 152–162. https://doi.org/10.1016/j.cag.2011.12.002 (2012).
Jack, R. E. & Schyns, P. G. Toward a social psychophysics of face communication. Annu. Rev. Psychol. 68, 269–297. https://doi.org/10.1146/annurev-psych-010416-044242 (2017).
Snoek, L., Jack, R. E. & Schyns, P. G. Dynamic face imaging: A novel analysis framework for 4D social face perception and expression. In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG) (ed. Snoek, L.) 1–4 (IEEE Press, 2023). https://doi.org/10.1109/FG57933.2023.10042724.
Schyns, P. G., Zhan, J., Jack, R. E. & Ince, R. A. A. Revealing the information contents of memory within the stimulus information representation framework. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190705. https://doi.org/10.1098/rstb.2019.0705 (2020).
Zhan, J., Garrod, O. G. B., van Rijsbergen, N. & Schyns, P. G. Modelling face memory reveals task-generalizable representations. Nat. Hum. Behav. 3, 817–826. https://doi.org/10.1038/s41562-019-0625-3 (2019).
Snoek, L. et al. Testing, explaining, and exploring models of facial expressions of emotions. Sci. Adv. 9, eabq8421. https://doi.org/10.1126/sciadv.abq8421 (2023).
Yu, H., Garrod, O., Jack, R. & Schyns, P. A framework for automatic and perceptually valid facial expression generation. Multimed. Tools Appl. 74, 9427–9447. https://doi.org/10.1007/s11042-014-2125-9 (2015).
Allanson, J. E., Biesecker, L. G., Carey, J. C. & Hennekam, R. C. Elements of morphology: introduction. Am. J. Med. Genet. A 149(1), 2–5 (2009).
Deutsch, C. K., Shell, A. R., Francis, R. W. & Bird, B. D. The Farkas system of craniofacial anthropometry: Methodology and normative databases. In Handbook of Anthropometry: Physical Measures of Human Form in Health and Disease 561–573 (Springer, New York, 2012).
Wisetchat, S., Stevens, K. A. & Frost, S. R. Facial modeling and measurement based upon homologous topographical features. PLoS ONE 19(5), e0304561 (2024).
Schyns, P. G., Petro, L. S. & Smith, M. L. Transmission of facial expressions of emotion co-evolved with their efficient decoding in the brain: Behavioral and brain evidence. PLoS ONE 4, e5625. https://doi.org/10.1371/journal.pone.0005625 (2009).
Ganel, T., Goshen-Gottstein, Y. & Ganel, T. Effects of familiarity on the perceptual integrality of the identity and expression of faces: The parallel-route hypothesis revisited. J. Exp. Psychol. Hum. Percept. Perform. 30, 583–597. https://doi.org/10.1037/0096-1523.30.3.583 (2004).
Wang, Y., Fu, X., Johnston, R., & Yan, Z. Discriminability effect on Garner interference: Evidence from recognition of facial identity and expression. Front. Psychol. 4 (2013).
Fitousi, D. & Wenger, M. J. Variants of independence in the perception of facial identity and expression. J. Exp. Psychol. Hum. Percept. Perform. 39, 133–155. https://doi.org/10.1037/a0028001 (2013).
Stoesz, B., Jakobson, L., & Rigby, S. A sex difference in interference between identity and expression judgments with static but not dynamic faces (2013).
Burra, N., & Kerzel, D. Task demands modulate effects of threatening faces on early perceptual encoding. Front. Psychol. 10 (2019).
Green, M. J., Williams, L. M. & Davidson, D. In the face of danger: Specific viewing strategies for facial expressions of threat?. Cogn. Emot. 17, 779–786. https://doi.org/10.1080/02699930302282 (2003).
Grahlow, M., Rupp, C. I. & Derntl, B. The impact of face masks on emotion recognition performance and perception of threat. PLoS ONE 17, e0262840. https://doi.org/10.1371/journal.pone.0262840 (2022).
Sekuler, A. B., Gaspar, C. M., Gold, J. M. & Bennett, P. J. Inversion leads to quantitative, not qualitative, changes of face processing. Curr. Biol. 14, 391–396. https://doi.org/10.1016/j.cub.2004.02.028 (2004).
Royer, J. et al. Greater reliance on the eye region predicts better face recognition ability. Cognition 181, 12–20. https://doi.org/10.1016/j.cognition.2018.08.004 (2018).
Creighton, S. E., Bennett, P. J. & Sekuler, A. B. Classification images characterize age-related deficits in face discrimination. Vision. Res. 157, 97–104. https://doi.org/10.1016/j.visres.2018.07.002 (2019).
Hosseini, S. S. & Soto, F. A. Multidimensional signal detection modeling reveals Gestalt-like perceptual integration of face emotion and identity. Emotion 24, 1494–1502. https://doi.org/10.1037/emo0001352 (2024).
Townsend, J. T. & Wenger, M. J. On the dynamic perceptual characteristics of Gestalten: Theory-based methods. In The Oxford Handbook of Perceptual Organization (ed. Wagemans, J.) 948–968 (Oxford University Press, 2014).
O’Toole, A. J., Roark, D. A. & Abdi, H. Recognizing moving faces: A psychological and neural synthesis. Trends Cogn. Sci. 6(6), 261–266 (2002).
Adams, R. B., Albohn, D. N. & Kveraga, K. Social vision: Applying a social-functional approach to face and expression perception. Curr. Dir. Psychol. Sci. 26(3), 243–248 (2017).
Egger, B. et al. 3D morphable face models– past, present, and future. ACM Trans. Graph. 39, 157.1-157.38. https://doi.org/10.1145/3395208 (2020).
Ekman, P. & Friesen, W. V. Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues (Prentice-Hall, 1975).
Garrod, O., Yu, H., Breidt, M., Curio, C. & Schyns, P. Reverse correlation in temporal FACS space reveals diagnostic information during dynamic emotional expression classification. J. Vis. 10, 700. https://doi.org/10.1167/10.7.700 (2010).
Naqvi, S. et al. Decoding the human face: Progress and challenges in understanding the genetics of craniofacial morphology. Annu. Rev. Genomics Hum. Genet. 23, 383–412. https://doi.org/10.1146/annurev-genom-120121-102607 (2022).
Palmer, R. L., Helmholz, P. & Baynam, G. Cliniface: Phenotypic visualisation and analysis using non-rigid registration of 3D facial images. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2020, 301–308. https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-301-2020 (2020).
Weinberg, S. M. et al. The 3D facial norms database: Part 1. A web-based craniofacial anthropometric and image repository for the clinical and research community. Cleft Palate Craniofac. J. 53, e185–e197. https://doi.org/10.1597/15-199 (2016).
Deutsch, C. K., Shell, A., Francis, R. W. & Bird, B. D. The Farkas system of craniofacial anthropometry: Methodology and normative databases. In Handbook of Anthropometry: Physical Measures of Human Form in Health and Disease. https://doi.org/10.1007/978-1-4419-1788-1_29 (2012).
Sforza, C., Dellavia, C., De Menezes, M., Rosati, R., & Ferrario, V. Three-Dimensional Facial Morphometry: From Anthropometry to Digital Morphology. In Handbook of Anthropometry: Physical Measures of Human Form in Health and Disease, pp. 611–624 (2012). https://doi.org/10.1007/978-1-4419-1788-1_32.
Yan, Y. et al. The brain computes dynamic facial movements for emotion categorization using a third pathway. bioRxiv https://doi.org/10.1101/2024.05.06.592699 (2024).
Smith, M. L., Fries, P., Gosselin, F., Goebel, R. & Schyns, P. G. Inverse mapping the neuronal substrates of face categorizations. Cereb. Cortex 19, 2428–2438. https://doi.org/10.1093/cercor/bhn257 (2009).
Smith, M. L., Gosselin, F. & Schyns, P. G. Measuring internal representations from behavioral and brain data. Curr. Biol. 22, 191–196. https://doi.org/10.1016/j.cub.2011.11.061 (2012).
Cowen, A. S. & Keltner, D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl. Acad. Sci. 114, E7900–E7909. https://doi.org/10.1073/pnas.1702247114 (2017).
Smith, M. L., Gosselin, F. & Schyns, P. G. From a face to its category via a few information processing states in the brain. Neuroimage 37, 974–984. https://doi.org/10.1016/j.neuroimage.2007.05.030 (2007).
Miller, E. J., Foo, Y. Z., Mewton, P. & Dawel, A. How do people respond to computer-generated versus human faces? A systematic review and meta-analyses. Comput. Hum. Behav. Rep. 10, 100283 (2023).
Bastioni, M., Re, S., & Misra, S. Ideas and methods for modeling 3D human figures: The principal algorithms used by MakeHuman and their implementation in a new approach to parametric modeling. In: Proc. 1st Bangalore Annual Compute Conference, 10, 1-6 (2008).
Burton, A.M., and Jenkins, R. (2011). Unfamiliar face perception. The Oxford Handbook of Face Perception, 287–306.
Neri, P. Estimation of nonlinear psychophysical kernels. J. Vis. 4, 2. https://doi.org/10.1167/4.2.2 (2004).
Funding
This work was supported by the National Science Foundation under grant numbers 2020982 and 2319234 to Fabian A. Soto. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Author information
Contributions
F.A.S. conceived the study. E.R.M. and F.A.S. programmed the experiment, analyzed the data, created figures, wrote and revised the manuscript. E.R.M. collected the data. J.S.H. created software and code used in the experiment and analysis.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Martin, E.R., Hays, J.S. & Soto, F.A. Shape information used for face identity and expression recognition is highly versatile and context specific. Sci Rep 16, 3597 (2026). https://doi.org/10.1038/s41598-025-33545-y