Introduction

In response to an increasingly competitive market, companies are placing heightened emphasis on design and innovation as means to align with evolving consumer demands. As consumers seek furniture that embodies both aesthetic appeal and personal identity, understanding user preferences has become essential for businesses aiming to enhance market relevance. Two basic design elements, form and color, are vital. They stimulate human vision and physiology, affect users’ inner feelings, and help form corresponding color associations and image recognition1,2. Despite the significant role of color and form in furniture design, research on these elements has historically been constrained by limited methodologies and tools. As a result, substantial gaps remain in understanding consumer preferences for furniture designs. To address this knowledge gap, the present study uses chairs as focal stimuli and investigates how color and form affect aesthetic evaluations. It employs multimodal measurement techniques to characterize the interaction between design factors and users’ aesthetic responses, offering implications for furniture design and innovation.

Human–computer interaction (HCI) is important for product development. Recently, user aesthetic experience, a critical factor in evaluating product design quality, has received more attention from researchers3 and plays an important role in purchase decisions4. Preference, the main part of user aesthetic experience, narrowly refers to how much consumers like or dislike product designs. Designers often use several research tools to understand preferences, such as one-on-one interviews, general surveys, and focus group studies5. The data provided by these traditional methods are not sufficiently accurate and are easily affected by the respondents’ surrounding environment, inaccurate perceptions of their own inner state, and so on6. Therefore, the conclusions of such surveys may mislead the design direction7. Moreover, mental processes operating at different levels of conscious awareness can drive people toward different behavioral responses8. Within HCI research, multimodal measurement techniques, such as eye tracking, galvanic skin response (GSR), electrocardiography (EKG), and event-related potentials (ERP), have gained prominence. Eye tracking reveals focal points and preferences, GSR reflects levels of emotional arousal, and EKG provides insights into psychological stress. Additionally, ERP, a neurophysiological measure, captures brain activity in response to specific visual stimuli, allowing researchers to measure users’ subconscious reactions to design elements. Collectively, these methods enable a more objective and scientific examination of aesthetic preferences, supporting the development of more user-centered furniture designs.

Although furniture is a major product category, HCI-related research on furniture aesthetic preference is scarce. Therefore, this study introduces HCI methods drawn from applied psychology (such as eye tracking and physiological measurement techniques) to study consumer preferences for furniture scientifically and objectively, with the aim of providing precise design guidance.

Colors and forms of leisure chairs

Leisure chairs generally come in a variety of colorful, creative shapes and are popular with young people. Using ergonomic theory and following defined scale principles, Han9 designed a new type of leisure chair to relieve the compression and fatigue of the lumbar spine and spinal column during sedentary work and compared it against finite element simulation results10. Research on leisure chairs’ appearance, visual sensibility, and physiological feedback remains limited.

Form and color play crucial roles in the design of leisure chairs, directly influencing people’s perceptions, comfort, and aesthetic experience. In neuroaesthetics and computational aesthetics, sensory features like color and shape are known to significantly affect user preferences. For example, Markus et al.11 found that specific colors can evoke unique emotional responses, while Wang and Hsu12 demonstrated that symmetry and simplicity in form enhance perceptions of beauty and emotional appeal. These findings underscore the importance of visual elements in shaping consumer emotions and preferences, offering valuable insights for design13.

Everyday color names such as rose red and lemon yellow are unsystematic and imprecise. To address this issue, color researchers worldwide have constructed numerical color systems. Currently, three common color systems are used internationally: the Munsell color system, the Ostwald color system, and the PCCS system formulated by the Japan Color Research Institute. The visual perception of color is influenced by various factors, including hue, intensity, brightness14, and lighting conditions15. Even when sharing the same color palette, furniture items such as sofas, chairs, and couches16 can elicit distinct sensory experiences. Furthermore, different countries and regions have diverse cultural backgrounds and religious beliefs, leading to varying cultural perceptions of color17.

Furniture forms fall under the category of artificial forms. The term refers to the shape presented by the outline of the furniture and encompasses both “external shape” and “expression”. Form is the external manifestation of furniture materials and structures and incorporates elements such as points, lines, surfaces, and blocks18. In existing research, furniture forms are primarily classified along distinctions such as curvilinear vs. rectilinear19, open vs. closed, and complex vs. simple. Compared with other furniture design elements, form conveys visual perceptions more quickly and is highly infused with sensory factors, carrying elements of zeitgeist and cultural connotations. In today’s market, furniture forms change more rapidly than functions, reflecting society’s ever-changing desires and pursuits regarding furniture.

Given the complexity and impact of color and form in furniture design, several studies have explored consumer preferences in this context. For instance, Jiang et al.20 studied the impact of color preference on adolescents’ choice of furniture. Ciritcioğlu et al.21 focused on consumer preferences for furniture surface colors. Kaputa and Supin22 reported selected results of research on consumers’ furniture preferences in the Slovak Republic.

Development of the multimodal measurement

Ergonomic studies primarily focus on human fit and on reducing fatigue and discomfort through product design23. Combining ergonomics with HCI technologies, including eye tracking, physiological measurements, and EEG, should yield stronger evidence on the emotional perception of furniture design. By harnessing HCI emotion recognition technology, designers can gain insights into users’ emotional responses to furniture design24.

Visual perception refers to the brain’s process of receiving, distinguishing, and recognizing visual information. It combines the visual reception of information with visual cognition through memory and thought processes. Vision is the first channel used to gather information and can affect consumers’ subjective intentions25. Eye-tracking techniques record eye movements when consumers look at a product and are increasingly applied in research on aesthetic experience26,27,28. Several studies have examined how design elements affect users’ gaze behavior; eye tracking has been widely applied in design fields, including mobile devices, packaging, web design, and advertising, thus providing references and guidance for designers29,30,31,32. It has also been applied to evaluate the perception of sensory properties and quality factors or consumers’ willingness to purchase6,33. For instance, Palacios-Ibáñez et al.34 utilized eye-tracking metrics in virtual environments to analyze consumer gaze behavior and decision-making processes for household products, achieving a 90% accuracy rate in predicting user preferences. Their study highlights the role of eye tracking as a powerful tool for understanding consumer behavior and assessing aesthetic preferences in virtual settings, which informs our approach to evaluating furniture design.

Physiological measurements can generally more objectively report users’ emotions35. Some available metrics include GSR, EKG, skin temperature (SKT), and respiratory pressure (RSP). GSR and EKG have been widely applied in fields such as Medical Science, Psychology, and Biology, particularly for emotion evaluation. The GSR reflects the conductivity of the skin and correlates with emotional arousal, especially in the domain of user aesthetic experience design. EKG refers to the heart’s electrical activity36.

EEG research has found that humans continuously generate brainwaves. The electrical potential changes in the brain caused by the spontaneous activity of groups of brain cells are called spontaneous electrical potentials and form the basis for generating event-related potentials (ERPs). ERPs denote the changes in brain electrical activity that occur when an external stimulus is applied or removed37. Most EEG responses and emotional ERPs are induced by complex visual stimuli38. ERPs are mainly applied in neurological research, such as studies of patients with epilepsy and of the brain mechanisms underlying face recognition39. Their application in the design field is relatively limited, and the sparse existing research focuses mainly on industrial products and clothing design.

As for the multimodal measurement method, Scherer40 pointed out that it is suited to measuring people’s emotions because emotions have complex components. Additionally, Guo, Cao, Ding, Liu, and Zhang41 stated that this method can evaluate users’ emotional experiences. Overall, few studies have measured emotion using multiple techniques, particularly in furniture design. This study aimed to determine the physiological mechanisms evoked by preferences and the relationship between emotional evaluation and design elements. Moreover, this study will help optimize product design, promote the development of the entire material and furniture production industry, and improve the quality of people’s lives.

Aesthetic evaluation

Computational aesthetics is an interdisciplinary field spanning computer science, artificial intelligence, cognitive science, and aesthetics. It aims to study computer-generated art and design, as well as human perception and evaluation of aesthetic experience42. Aesthetic computation is an important branch of computational aesthetics that emphasizes the use of computers and algorithms to understand, simulate, and enhance the human process of perceiving and appreciating beauty43. In aesthetic computation research, designers usually adopt methods based on subjective quantification, i.e., subjective scoring based on personal aesthetic preferences or questionnaires, followed by quantitative assessment of the scoring results through various mathematical methods44. To measure the aesthetic properties of products more accurately, the scientific foundations of computational aesthetics and aesthetic computing need to be further explored and developed. This includes constructing more accurate, scientific models of aesthetics that capture and quantify aspects of beauty more comprehensively, which may require interdisciplinary collaboration across fields such as mathematics, psychology, and computer science to develop more reliable theoretical frameworks and assessment methods. For products with complex or irregular image forms, new methods of measurement and quantification need to be explored. These may involve more advanced image processing techniques, machine learning algorithms, and computer vision methods to analyze and assess the aesthetic characteristics of these products more accurately. Research on the relationship between the objective and the subjective also needs to be strengthened in order to understand the link between individual aesthetic preferences and objective aesthetic features.

Aesthetic experience involves multiple sensory inputs, such as vision, hearing, and touch. Recent research in affective computing and user experience evaluation has highlighted the value of multimodal measurement in assessing aesthetic experiences. For instance, Kim and Lee45 developed a multimodal framework that combines video, audio, text, and physiological data to improve emotion recognition and user satisfaction assessment, demonstrating the value of multimodal integration for robust user experience analysis. Similarly, Wang et al.46 introduced an attention-enhanced model for emotion recognition in complex environments, using Transformers and attention mechanisms to capture subtle emotional cues in multimedia settings, which makes it well suited to understanding aesthetic and emotional responses. Machine learning techniques can effectively process multimodal data, integrate different sensory inputs, and improve a model’s ability to understand and analyze the aesthetic experience comprehensively. Breiman47 proposed the random forest (RF) algorithm, a highly flexible machine learning algorithm used to address classification and regression problems. The core idea of the RF algorithm is ensemble learning, in which a fixed number of decision trees built on different features are combined to achieve higher predictive performance and robustness. The advantages of RF include:

  1. Handling large-scale, high-dimensional feature datasets with powerful adaptability to disturbances in the dataset, exhibiting high robustness in various ways47.

  2. The relative importance of different features can be determined by assessing the contribution of each input variable in nonlinear sensitivity analysis and decision-making processes48.

  3. Adapting to application scenarios with incomplete data, performing classification or regression in situations where the given dataset is incomplete47.

  4. Since the RF algorithm is a non-parametric model, it does not make assumptions about the existence of any a priori relationships in the dataset and can deal with datasets that contain non-linear relationships49.
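The ensemble idea described above can be illustrated with a deliberately small sketch. It is not the model used in this study and it is far simpler than Breiman's algorithm: each "tree" is a one-split decision stump, trained on a bootstrap sample and restricted to one randomly chosen feature, and the ensemble predicts by majority vote. The function names, the toy data, and the "like"/"dislike" labels are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature by minimizing training errors."""
    best = None
    for thr in sorted(set(row[feature] for row in X)):
        left = [lab for row, lab in zip(X, y) if row[feature] <= thr]
        right = [lab for row, lab in zip(X, y) if row[feature] > thr]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    if best is None:
        # The feature is constant in this sample: always predict the majority.
        majority = Counter(y).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    _, thr, left_label, right_label = best
    return (feature, thr, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus random feature choice, the two sources of tree diversity."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap
        feature = rng.randrange(n_features)  # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(
        left_label if row[feature] <= thr else right_label
        for feature, thr, left_label, right_label in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample and a binary preference label.
X_demo = [[1, 1], [2, 2], [3, 1], [8, 9], [9, 8], [10, 9]]
y_demo = ["dislike", "dislike", "dislike", "like", "like", "like"]
forest = fit_forest(X_demo, y_demo)
print(predict(forest, [1, 1]), predict(forest, [9, 9]))
```

A full random forest grows deeper trees and samples a feature subset at every split; the sketch keeps only the bootstrap, random-feature, and voting steps so that the ensemble principle stays visible.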

This study used Nordic-style chairs as an example to explore synchronous HCI measurement methods in the furniture domain and to investigate the impact of form and color on aesthetic preferences for chairs. The study aims to advance the integration of HCI techniques and furniture design, particularly through synchronous measurement of user behavior and emotional assessment. To achieve this, we propose a multimodal measurement method that includes questionnaires, eye tracking, physiological measures, and EEG acquisition to investigate which factors (i.e., color and form of chairs, sex, and participant specialty) significantly affect subjective preference evaluation and physiological feedback. We also established an aesthetic model using the RF algorithm based on the multimodal measurement data. This approach provides a clear and objective perspective on furniture preferences and supports the research and exploration of innovative furniture design with visual, accurate, and scientifically grounded evidence.

Methods

Participants

Forty Chinese college students from Beijing, China, were recruited for this study. The participants comprised 19 men and 21 women aged 18 to 26 years (mean age = 22.83 years, SD = 1.93), half of whom had educational backgrounds in art design. The sample size for the subsequent ERP experiment was calculated using G*Power (effect size: 0.45, significance level: 0.05, statistical power: 0.80, number of groups: 6) according to the previous literature50. All participants were right-handed, had normal or corrected-to-normal visual acuity and no color blindness, had no history of neurological or psychiatric disorders such as epilepsy, had no serious addiction to smoking or alcohol, and did not use psychoactive drugs such as hypnotics or stimulants. Before the experiment, all participants provided informed consent and received a short introduction to the experimental procedures. Each participant received a gift worth approximately $7 as compensation after the experiment, regardless of their performance. This study was approved by the Human Study Ethics Committee of Beijing Forestry University in 2023. We confirm that all experiments were performed in accordance with relevant guidelines and regulations.

Materials and procedures

Chair images of different colors and forms were prepared as stimulus materials (Fig. 1). Six chair colors were selected according to the Practical Color Coordinate System (PCCS): the psychological primary colors of red (RGB: 255, 0, 0), yellow (RGB: 255, 255, 0), blue (RGB: 0, 0, 255), and green (RGB: 0, 255, 0), along with the neutral tones of black (RGB: 0, 0, 0) and white (RGB: 255, 255, 255). These were grouped into warm tones (red and yellow), cool tones (green and blue), and neutral tones (white and black). The corresponding color textures were assigned to the chairs using these RGB values. In addition, six forms that differed markedly in overall shape were chosen from Nordic-style chairs. According to the presence or absence of decorative openwork on the backrest, the forms were divided into two categories: complex (backrest with openwork, forms 1–3) and simple (backrest without openwork and dominated by linear contours, forms 4–6). Lighting, viewing angle, texture, and furniture type were kept consistent to limit the influence of other variables on the measurements. Combining the two design elements (six colors × six forms) yielded 36 chair images, which were created using Autodesk 3ds Max software. All images (1280 × 1024 pixels) had the same viewing angle, were set to equal luminance, and were presented on a gray background.
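As an illustration of the factorial design, the following minimal sketch enumerates the 36 color × form combinations together with the tone and form categories given in the text. The RGB values and groupings are those stated above; the identifier scheme (for example `red_form1`) and the function names are hypothetical.

```python
from itertools import product

# RGB values stated in the text for the six PCCS-based chair colors.
COLORS = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

# Tone grouping from the text: warm, cool, and neutral.
TONE = {
    "red": "warm", "yellow": "warm",
    "green": "cool", "blue": "cool",
    "white": "neutral", "black": "neutral",
}


def form_category(form_id):
    """Forms 1-3 have openwork backrests (complex); forms 4-6 do not (simple)."""
    return "complex" if form_id <= 3 else "simple"


def build_stimuli():
    """Return one record per color x form combination (6 x 6 = 36)."""
    stimuli = []
    for color, form_id in product(COLORS, range(1, 7)):
        stimuli.append({
            "stimulus_id": f"{color}_form{form_id}",  # hypothetical naming scheme
            "color": color,
            "rgb": COLORS[color],
            "tone": TONE[color],
            "form": form_id,
            "form_category": form_category(form_id),
        })
    return stimuli


stimuli = build_stimuli()
print(len(stimuli))  # 36
```

Each record carries both the individual level (color, form) and the category level (tone, complexity), which are the two levels at which the results are later summarized.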

Fig. 1 Experimental stimuli, including chair figures with six forms in six colors.

The experiments were conducted in a quiet, appropriately illuminated, electrically shielded, enclosed laboratory at the authors’ institution (Fig. 2c), and a complete session took approximately 15 min. After the preparations (e.g., washing and blow-drying the hair and fitting the physiological instruments), the participants were comfortably seated approximately 60 cm from the screen (Fig. 2b). The seating height was adjusted so that each participant looked at the center of the screen. Subsequently, a five-point calibration was performed on the screen with an accuracy of 0.4°. Following eye-tracking calibration, the participants relaxed for a 3-minute baseline measurement, after which the test task began (Fig. 3). Each trial started with a fixation cross (+, on the same background color as the stimuli) displayed in the center of a 19-inch screen for 1,000 ms, followed by an image of a chair presented for 10,000 ms. After each stimulus, a preference questionnaire appeared on the screen so that participants could rate the chair they had just seen. Participants rated their preference by clicking on a 9-point semantic differential (SD) scale with the mouse, and the time taken to decide was recorded. After the rating was submitted, the next fixation cross appeared. To keep the eye-tracking and physiological data aligned with the EEG data, the data streams were synchronized using the timestamps available in the ErgoLAB HME Synchronization Platform software (King Far International Inc., Beijing, China).

Fig. 2 (a) Composition of the ErgoLAB HME Synchronization Platform; (b) measuring equipment; (c) photograph of the experimental process.

Fig. 3 Experimental procedure.

Data acquisition and analysis

The ErgoLAB HME Synchronization Platform was employed in this study to collect data for further analysis. The platform consists of two laptops for image presentation and test control, a portable eye-tracking module, a wearable physiological recording module, and an EEG module (Fig. 2a).

Eye Tracking: The portable eye tracker Tobii Pro X3-120 (Tobii, Stockholm, Sweden), placed at the bottom of the screen (5 cm height), recorded the 3D position of each eyeball and the gaze point on the screen at a sampling rate of 120 Hz (Fig. 2b). Physiological systems were used to capture physiological data. To analyze eye movements, the areas of interest (AOIs) were defined as the overall outline of the chairs51 (Fig. 4). Because previous studies indicate that blink data vary between individuals in ways that would affect the accuracy of the results, blink data were not included in the present study. For each AOI, five measures of gaze behavior were calculated: first fixation duration, average pupil diameter, fixation count, saccade count, and total fixation time.
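To make the AOI-based measures concrete, the sketch below computes four of the five measures from a list of fixations. It rests on several assumptions that are not taken from the study: the AOI is approximated by a bounding rectangle rather than the chair outline, fixations are already detected and listed in chronological order, and the field names (`x`, `y`, `duration_ms`, `pupil_mm`) are hypothetical. Saccade count is omitted because it requires saccade detection on the raw gaze samples.

```python
def in_aoi(x, y, aoi):
    """Return True if the point lies inside the rectangle (left, top, right, bottom)."""
    left, top, right, bottom = aoi
    return left <= x <= right and top <= y <= bottom


def aoi_metrics(fixations, aoi):
    """Summarize the fixations that fall inside one AOI.

    `fixations` is a chronologically ordered list of dicts with the
    hypothetical keys x, y (pixels), duration_ms, and pupil_mm.
    """
    hits = [f for f in fixations if in_aoi(f["x"], f["y"], aoi)]
    if not hits:
        return {
            "first_fixation_duration_ms": None,
            "fixation_count": 0,
            "total_fixation_time_ms": 0,
            "average_pupil_diameter_mm": None,
        }
    return {
        "first_fixation_duration_ms": hits[0]["duration_ms"],
        "fixation_count": len(hits),
        "total_fixation_time_ms": sum(f["duration_ms"] for f in hits),
        "average_pupil_diameter_mm": sum(f["pupil_mm"] for f in hits) / len(hits),
    }


# Hypothetical fixations on a 1280 x 1024 stimulus; the first one misses the AOI.
demo_fixations = [
    {"x": 100, "y": 100, "duration_ms": 180, "pupil_mm": 3.4},
    {"x": 640, "y": 500, "duration_ms": 250, "pupil_mm": 3.6},
    {"x": 700, "y": 600, "duration_ms": 310, "pupil_mm": 3.8},
]
demo_aoi = (400, 200, 900, 900)
print(aoi_metrics(demo_fixations, demo_aoi))
```

With an outline-shaped AOI, only `in_aoi` would change, for example to a point-in-polygon test; the aggregation step stays the same.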

GSR and EKG: We measured the GSR and EKG using a multichannel physiological measurement instrument (Mega Electronics Ltd., Kuopio, Northern Savonia, Finland). The GSR was recorded using surface electrodes sewn into Velcro straps placed around two fingers of the same hand. EKG signals were measured with a photoplethysmogram (PPG) sensor, which captures blood flow using infrared light on the skin (at the earlobe in this study) (Fig. 2b). Following previous research52, because the skin conductance data were collected over a long period, the median skin conductance (SC) signal was used as the representative value, since the mean is easily distorted by irrelevant interference. In addition, the average heart rate (AVHR) was selected as the EKG measure. A paired t-test was used to compare the data recorded while viewing the images with the data recorded at rest to determine whether the stimuli evoked clear physiological responses. Moreover, to reduce individual differences, the difference value (D-value) obtained by subtracting the baseline data from the data recorded after stimulation (GSR or EKG) was used for comparison, following Eqs. (1) and (2).

$$\Delta\text{GSR} = G_{1} - G_{0}$$
(1)
$$\Delta\text{EKG} = E_{1} - E_{0}$$
(2)

where $G_{0}$ and $E_{0}$ represent the SC signal and AVHR values in the resting state, and $G_{1}$ and $E_{1}$ represent the SC signal and AVHR values after visual stimulation, respectively.
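Eqs. (1) and (2) can be sketched in a few lines. The example assumes that the SC signal and heart rate are available as plain lists of samples for the rest period and for the stimulus period; the function names and the sample values are hypothetical. Following the text, the median represents the SC signal and the mean represents the heart rate.

```python
from statistics import mean, median


def delta_gsr(sc_rest, sc_stim):
    """Eq. (1): median SC after stimulation (G1) minus median SC at rest (G0)."""
    return median(sc_stim) - median(sc_rest)


def delta_ekg(hr_rest, hr_stim):
    """Eq. (2): average heart rate after stimulation (E1) minus at rest (E0)."""
    return mean(hr_stim) - mean(hr_rest)


# Hypothetical samples; the 9.0 spike mimics irrelevant interference at rest.
sc_rest = [2.0, 2.1, 2.2, 9.0]
sc_stim = [2.5, 2.6, 2.7]
hr_rest = [70, 72, 74]
hr_stim = [75, 77, 79]

print(delta_gsr(sc_rest, sc_stim))  # about 0.45
print(delta_ekg(hr_rest, hr_stim))  # 5
```

The spike in the resting SC samples shows why a median is a reasonable choice here: the resting mean would be pulled up to about 3.8 and the difference would become negative, whereas the median stays near the typical level.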

EEG: EEG signals were recorded using a Neuron EEG/ERP system (Mega Electronics Ltd., Kuopio, Northern Savonia, Finland) with 32 Ag/AgCl electrodes at a sampling rate of 500 Hz (Fig. 2b). Before the electrodes were attached, the electrode sites were cleaned with an alcohol patch, and conductive paste was used to improve contact. All EEG electrode impedances were maintained below 5 kΩ. Raw EEG signals were preprocessed offline using the EEGLAB toolbox (EEGLAB v2023.0)53 in MATLAB R2022b (MathWorks Inc., Natick, MA, USA) to obtain clean ERP data. The key steps of the offline analysis were as follows. The EEG signals were first re-referenced to the average of the mastoids (TP9/TP10) and then filtered with a 0.1–30 Hz band-pass filter and a 48–52 Hz notch filter to optimize independent component analysis (ICA) decomposition for artifact correction54. Each epoch lasted 1000 ms, from 200 ms before stimulus onset to 800 ms after stimulus onset, and was baseline-corrected using the 200 ms pre-stimulus interval. Following previous studies, the P300 component (at C3, C4, Cz, P3, P4, and Pz; 410–560 ms) was analyzed, and its amplitude was calculated as the average amplitude within this time window.
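The epoch arithmetic described above can be checked with a short sketch. It is not the EEGLAB pipeline used in the study: it operates on a single-channel epoch stored as a plain list, uses the stated 500 Hz sampling rate and the −200 to 800 ms epoch, and treats the 410–560 ms window as half-open (start included, end excluded), which is an assumption. All function names are hypothetical.

```python
SFREQ_HZ = 500          # sampling rate stated in the text
EPOCH_START_MS = -200   # epoch begins 200 ms before stimulus onset
EPOCH_END_MS = 800      # epoch ends 800 ms after stimulus onset


def ms_to_index(t_ms):
    """Sample index within the epoch for a latency relative to stimulus onset."""
    return round((t_ms - EPOCH_START_MS) * SFREQ_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the 200 ms pre-stimulus interval from every sample."""
    n_base = ms_to_index(0)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def mean_amplitude(epoch, start_ms=410, end_ms=560):
    """Average amplitude in the P300 window (start inclusive, end exclusive)."""
    window = epoch[ms_to_index(start_ms):ms_to_index(end_ms)]
    return sum(window) / len(window)


# Synthetic epoch: a constant 2.0 offset with a 1.0 step inside the P300 window.
n_samples = ms_to_index(EPOCH_END_MS)  # 500 samples for a 1000 ms epoch
epoch = [2.0] * n_samples
for i in range(ms_to_index(410), ms_to_index(560)):
    epoch[i] = 3.0

print(mean_amplitude(baseline_correct(epoch)))  # 1.0
```

In practice the same computation is applied per electrode (C3, C4, Cz, P3, P4, and Pz) after averaging the artifact-free epochs for each condition.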

Fig. 4 Examples of the AOIs in each stimulus.

Statistical analysis

The normality of the data was assessed using the Shapiro-Wilk test. The p-values for all groups were greater than 0.05, indicating no significant departure from a normal distribution. Therefore, subsequent statistical analyses were conducted using parametric methods. A two-way analysis of variance (ANOVA) was conducted on the experimental data with color and form as factors. A repeated-measures ANOVA was performed with sex (women, men) and specialty (design, non-design) as 2 × 2 between-subjects factors. Greenhouse-Geisser corrections were applied when the assumption of sphericity was violated. SPSS (version 20.0) was used to collect and analyze the experimental data. The F-value (F) is the outcome of the F-test and equals the ratio of the between-group mean square to the within-group mean square. The p-value (p) is the probability of obtaining an F value at least as large under the null hypothesis, and 5% was defined as the significance level (p < 0.05). Partial eta-squared (ηp2), a correlation ratio, is a measure of effect size.
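The definition of F as a ratio of mean squares, and of the effect size as a ratio of sums of squares, can be illustrated with the simplest case. The sketch below is a one-way, between-groups ANOVA in plain Python; it is a simplification for illustration and not the two-way or repeated-measures analysis run in SPSS. For a one-way design, partial eta-squared coincides with eta-squared, SS_between / (SS_between + SS_within). The group data are hypothetical.

```python
def one_way_anova(groups):
    """Return F, its degrees of freedom, and eta-squared for a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variability: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variability: observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "F": ms_between / ms_within,
        "df": (df_between, df_within),
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Hypothetical preference scores for three conditions.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)  # F is about 13 with df (2, 6); eta-squared is about 0.81
```

In this toy example the group means (2, 3, and 6) differ by much more than the spread inside each group, which is what a large F and a large effect size express.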

Experimental results

Behavioral results

Descriptive statistical analyses were conducted on the scores for each color and form using SPSS software (Tables 1 and 2). The average scores differed only slightly between individual colors and forms. The scores were then averaged by category and plotted in Fig. 5. The preference scores were ranked as warm (1.59) > cool (0.30) > neutral (−0.12) for color and complex (1.08) > simple (−0.41) for form. Moreover, the two-way ANOVA showed significant main effects of both the color (F (5, 35) = 11.689, p < 0.001, ηp2 = 0.040) and the form of chairs (F (5, 35) = 51.556, p < 0.001, ηp2 = 0.155), but no significant interaction between them (F (25, 35) = 0.731, p = 0.829 > 0.05, ηp2 = 0.013).

Fig. 5 Average chair scores for different classifications of colors and forms.

Table 1 Scores for each chair form.
Table 2 Scores for each chair color.

The repeated-measures ANOVA for sex and specialty revealed a violation of the sphericity assumption according to Mauchly’s test (p < 0.001). Therefore, the Greenhouse-Geisser correction was applied. The corrected results indicated significant differences among the participants’ subjective evaluations (F (10.62, 39) = 8.405, p < 0.001, ηp2 = 0.185). Furthermore, the participants’ specialty had a significant effect on subjective preference ratings (F (10.62, 39) = 2.596, p = 0.004 < 0.05, ηp2 = 0.066), whereas sex did not (F (10.62, 39) = 0.779, p = 0.657 > 0.05, ηp2 = 0.021).

Eye-tracking outcomes

We selected five eye-tracking parameters based on the measurement results for detailed statistical analysis. Tables 3 and 4 show the descriptive statistics of eye-tracking data evoked by each form or color. The differences were relatively small regarding the magnitudes of the parameter values. The results of recalculating the average values according to categories, as shown in Fig. 6, indicate that the average values for all eye-tracking parameters were consistently higher for warm tones than for cool and neutral tones. Additionally, chairs with complex forms tended to elicit higher eye-tracking parameter values.

Fig. 6 Mean values of eye movement indicators for different classifications of colors and forms: (a) first fixation duration; (b) average pupil diameter; (c) fixation count; (d) saccade count; (e) total fixation time.

Table 3 Eye-tracking data evoked by each chair form.
Table 4 Eye-tracking data evoked by each chair color.

For average pupil diameter, the two-way ANOVA showed a highly significant effect of chair color (F (5, 35) = 8.187, p < 0.001, ηp2 = 0.028), but no main effect of chair form (F (5, 35) = 0.809, p = 0.543 > 0.05, ηp2 = 0.003) and no interaction between color and form (F (5, 35) = 0.290, p = 0.998 > 0.05, ηp2 = 0.005). Moreover, Mauchly’s test indicated that the assumption of sphericity was violated (p < 0.05), so the statistics are reported with the Greenhouse-Geisser correction. The repeated-measures ANOVA revealed significant differences among chairs (F (7.04, 39) = 7.506, p < 0.001, ηp2 = 0.169), but no effect of sex (F (7.04, 39) = 0.457, p = 0.866 > 0.05, ηp2 = 0.012) or specialty (F (7.04, 39) = 0.894, p = 0.512 > 0.05, ηp2 = 0.024).

For first fixation duration, the two-way ANOVA showed no significant effects. The interaction between the color and form of chairs came closest to significance (F (5, 35) = 1.414, p = 0.056 > 0.05, ηp2 = 0.034), whereas the main effects of color (F (5, 35) = 1.185, p = 0.314 > 0.05, ηp2 = 0.004) and form (F (5, 35) = 0.607, p = 0.695 > 0.05, ηp2 = 0.002) were clearly non-significant. According to Mauchly’s test, the assumption of sphericity was violated (p < 0.05). The repeated-measures ANOVA with Greenhouse-Geisser (G-G) correction showed no significant effect of sex (F (11.17, 39) = 0.679, p = 0.761 > 0.05, ηp2 = 0.018), specialty (F (11.17, 39) = 0.699, p = 0.742 > 0.05, ηp2 = 0.019), or stimulus (F (11.17, 39) = 1.019, p = 0.428 > 0.05, ηp2 = 0.027).

Furthermore, the ANOVA results showed no significant effects (p > 0.05) of color, form, sex, or specialty on the other three parameters: fixation count, saccade count, and total fixation time (Table 5).

Table 5 The ANOVA results for fixation count, saccade count, and total fixation time.

Physiological responses

AVHR, the average number of heartbeats per minute (expressed in beats per minute, BPM), was chosen as the analysis indicator for EKG. The SC signal is the skin conductance signal obtained by measuring changes in skin resistance. It is associated with sweat secretion and provides physiological information about emotional and arousal states. The participants’ physiological responses while viewing the images were compared with those at rest (Table 6). The paired t-test indicated that the physiological responses while viewing the chair pictures were significantly higher than those at rest.

Table 6 Comparison of physiological responses between viewing the images and rest.

Here, GSR and EKG denote the increments in skin conductance and heart rate relative to the baseline after the picture stimuli were presented. Figure 7 illustrates the variations in GSR and EKG values for chairs of different colors and forms, while Fig. 8 presents a bar chart of the same data grouped by category. Both GSR and EKG showed a trend of warm colors > cool colors > neutral colors and complex forms > simple forms.

Fig. 7 EKG and GSR mean values of different chairs.

Fig. 8 EKG and GSR mean values for different classifications of colors and forms.

For EKG, the ANOVA results suggested no clear differences for any variable (p = 0.794 > 0.05). The two-way ANOVA revealed no significant effect of color (F (5, 35) = 0.194, p = 0.965 > 0.05, ηp2 = 0.001) or form (F (5, 35) = 0.534, p = 0.751 > 0.05, ηp2 = 0.002) on EKG, and no significant interaction between color and form (F (25, 35) = 0.972, p = 0.503 > 0.05, ηp2 = 0.017). In the repeated-measures ANOVA, the assumption of sphericity was not met (p < 0.001). After G-G correction, the results indicated significant differences among participants (F (7.60, 39) = 4.658, p < 0.001, ηp2 = 0.112). However, the sex and specialty of the participants had no significant effect on EKG (F (7.60, 39) = 1.028, p = 0.414 > 0.05, ηp2 = 0.027; F (7.60, 39) = 1.055, p = 0.394 > 0.05, ηp2 = 0.028).

For GSR, the two-way ANOVA showed significant differences among the chairs (p < 0.001). There was a significant main effect of chair form (F (5, 35) = 2.293, p = 0.043 < 0.05, ηp2 = 0.008) and a significant interaction between form and color (F (25, 35) = 5.199, p < 0.001, ηp2 = 0.085), but no significant main effect of chair color (F (5, 35) = 0.989, p = 0.423 > 0.05, ηp2 = 0.004). The repeated-measures ANOVA with G-G correction revealed a significant effect of chair (F (2.61, 39) = 9.234, p < 0.001, ηp2 = 0.200), but no significant effect of sex (F (2.61, 39) = 0.809, p = 0.477 > 0.05, ηp2 = 0.021) or specialty (F (2.61, 39) = 0.473, p = 0.675 > 0.05, ηp2 = 0.013).
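As a reading aid for the effect sizes reported above, the standard definition of partial eta squared is given here. The symbols are generic notation for the sums of squares of an effect and of its error term; they are not quantities reported in this study.

$$\eta_p^2=\frac{SS_{\text{effect}}}{SS_{\text{effect}}+SS_{\text{error}}},\qquad 0\le\eta_p^2\le 1$$

Under this definition, a value such as ηp2 = 0.085 means that the effect accounts for about 8.5% of the variance that remains after excluding the other effects in the model.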

ERP results

The grand average waves of the P300 amplitude across colors and forms are displayed in Fig. 9. The P300 amplitudes of the red, yellow, green, blue, white, and black chairs were 0.406, 0.317, 0.178, 0.134, 0.054, and 0.083 μV, respectively (Fig. 9a). The P300 amplitudes of the form 1–6 chairs were 0.211, 0.253, 0.364, 0.078, 0.080, and 0.111 μV, respectively (Fig. 9b). Compared with the curves in Fig. 9a, the six curves for the different forms in Fig. 9b almost overlap and differ less. The magnitude of the P300 fluctuation followed this order: warm colors (−0.36 μV) > cool colors (−0.16 μV) > neutral colors (−0.05 μV), and complex forms (−0.39 μV) > simple forms (−0.17 μV) (Fig. 10). The ANOVA results suggested no significant effect on P300 amplitude of either sex (F (3.01, 32) = 2.410, p = 0.126 > 0.05, ηp2 = 0.188) or specialty (F (2.67, 32) = 1.207, p = 0.645 > 0.05, ηp2 = 0.537). Color had a significant effect on P300 amplitude (F (5, 35) = 1.105, p = 0.035 < 0.05, ηp2 = 1.227), whereas form (p = 0.225 > 0.05) and the interaction (p = 0.217 > 0.05) had no significant effect.

Fig. 9

Grand average waves of P300 amplitude. (a) P300 elicited by different color stimuli; (b) P300 elicited by different form stimuli.

Fig. 10

Grand average waves of P300 amplitude for different classifications of colors and forms.

Correlation analysis between behavioral and effective indicator data

Based on the above results, we selected only the metrics that showed significant differences to determine whether eye movement, physiological responses, or EEG signals could reflect furniture aesthetics, and conducted a correlation analysis between the subjective and objective data. Treating the nine-level evaluation scores as ordinal data, we analyzed all data using Spearman’s rank correlation and Kendall’s tau-b correlation. The results suggest weak positive correlations between the subjective evaluation score and both average pupil diameter (Kendall’s tau-b = 0.044, p = 0.020 < 0.05; Spearman’s rho = 0.060, p = 0.023 < 0.05) and first fixation duration (Kendall’s tau-b = 0.045, p = 0.022 < 0.05; Spearman’s rho = 0.060, p = 0.023 < 0.05). However, there was no correlation between the subjective evaluation score and the SC signal (Kendall’s tau-b = 0.016, p = 0.404 > 0.05; Spearman’s rho = 0.022, p = 0.411 > 0.05) or the P300 amplitude (Kendall’s tau-b = 0.038, p = 0.327 > 0.05; Spearman’s rho = 0.057, p = 0.334 > 0.05). The two tests gave consistent results, although the correlation coefficients differed.
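To illustrate how the two rank-based coefficients treat ordinal scores with ties, the following is a minimal, dependency-free sketch of Spearman’s rho (Pearson correlation of average ranks) and Kendall’s tau-b. The function names and the six example values are hypothetical and are not data from this study; significance testing is omitted.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for ties in x and in y."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both terms
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical preference scores and pupil diameters (mm); NOT study data.
scores = [-2, 0, 1, 1, 3, 4]
pupil = [3.1, 3.0, 3.4, 3.3, 3.6, 3.9]

rho = spearman_rho(scores, pupil)
tau = kendall_tau_b(scores, pupil)
print(round(rho, 3), round(tau, 3))
```

On the same data, tau-b is typically smaller in magnitude than rho, which is consistent with the two coefficients differing in value while agreeing in direction and significance.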

Discussion

Behavioral results

Regarding chair color and form, each element significantly affected the subjective evaluation scores; however, their interaction was not significant for the behavioral results. For the overall chair color, the trend in the means of all subjective behavioral data was warm > cool > neutral. These results support the notion that long-wavelength colors (red and yellow) are more arousing than other colors55,56, leading to a linear association between affective tone and wavelength57. In addition, the participants preferred complex to simple forms, which agrees with the literature58,59. Regarding participants’ sex and specialty, only educational major showed a significant difference in subjective evaluation scores, suggesting that whether participants had a design-related education affected their preference for chairs. No significant sex-based differences were found, consistent with previous studies that reported no significant differences between men and women in terms of preferences60,61.

Multimodal measurement results

Visual fixation is active when assessing internal mental representations, and eye movement indicators can be used as measures of choice and preference34. The data analysis showed that, among the five eye movement parameters, only the average pupil diameter and first fixation duration were significant and effective. Chair color significantly affected the average pupil diameter, whereas the interaction between chair color and form significantly affected the first fixation duration. Few previous studies have examined the form and color of furniture; however, relevant findings exist in other design fields. Bradley et al.37 found that pupil sizes evoked by International Affective Picture System images differed significantly. Guo, Ding, Liu, Liu, and Zhang62 reported that first fixation duration and average pupil diameter could reflect the user experience of products when participants browse products freely. It is possible that emotional responses observed through pupil size changes are also influenced by visual characteristics such as color and contrast63. Research suggests that these visual factors can affect physiological responses, including pupil dilation, which may reflect not only emotional or aesthetic reactions but also the inherent sensory impact of the stimuli themselves. Thus, the emotional effects we interpret might be partially mediated by the color and contrast of the stimuli, which engage the visual system and can modulate physiological responses.

Moreover, statistical analysis revealed that the values of these two effective eye movement parameters evoked by chairs with warm colors and complex forms were significantly higher than those evoked by chairs with cool/neutral colors and simple forms, suggesting that warm colors and complex forms are more likely to stimulate visual responses. The correlation analysis showed that the values of these two effective eye movement indices had a weak positive correlation with the subjective behavioral data. This implies that changes in the eye movement parameters are associated with the participants’ preferences for different colors and forms of chairs. Visual attention and stimulation would determine how much people like or dislike chair products, which, in turn, affects their desire to consume64,65. People pay more attention to things they like and give more visual attention to chairs with higher subjective ratings, thus inducing longer first fixation durations and larger pupil diameters. Conversely, less visual attention is given to chairs with lower ratings that they like less, thus inducing shorter first fixation durations and smaller pupil diameters62.

Guo et al.41 reported that saccade and fixation counts are related to users’ emotional experiences when shopping online. Other studies have shown that the type of traditional furniture has a significant effect on total fixation time and average fixation count66. Our results showed no significant effects on total fixation time, fixation count, or saccade count, differing from findings in online and interactive environments, where visual richness tends to increase engagement. Studies indicate that in more dynamic contexts, elements such as color contrast, varied layouts, and interactive content significantly draw users’ attention and influence eye-tracking metrics67. In contrast, our experimental setup was highly controlled, with standardized lighting, angle, and background to reduce extraneous influences. Additionally, traditional furniture typically incorporates intricate patterns and complex textures, which may naturally draw more attention and increase fixation counts, unlike the simpler, modern designs used in this study. The controlled setup likely minimized the attention shifts typically seen in more complex visual environments, suggesting that the lack of significant effects in our study may reflect the simplified, controlled stimuli, for which variations in fixation metrics were less pronounced68.

Furthermore, most studies that measured eye movement alone used flexible, controllable observation times, whereas our study used a fixed observation time of 10 s. Fixing the observation time ensures the integrity of data recording during synchronized measurement of the other physiological indices and avoids confounding preference with decision-making processes. However, it may have prevented participants from developing pronounced gaze or blinking behaviors69, so the time-based eye movement data did not show the significant differences previously reported in the literature.

The t-test results revealed that the EKG and GSR signal values in the resting state differed significantly from those recorded while viewing the stimulus material. The physiological measurements showed that chair form and the interaction between form and color had significant effects on the SC signal of the GSR, but not on the AVHR of the EKG. Higher GSR values were observed for warm-colored chairs in the complex form than for cool/neutral-colored chairs in the simple form, consistent with previous research70,71. Wilson and Angela Sasse72 conducted five studies on the impact of audio and video degradation and found a remarkable increase in SC and AVHR. Mandryk and Atkins73 found evidence of different physiological responses in the body when playing computer games and provided a method for evaluating user experiences with entertainment technology using physiological responses.

Nevertheless, our results suggest a significant effect on GSR (SC signal) but not on EKG (AVHR). The differences in color and form might not be large enough to influence the EKG because viewing static images is a relatively calm activity, less stimulating than watching videos, playing games, or listening to music. However, the t-test results also showed that the visual stimulation of the pictures still had some effect on the EKG relative to the resting state.

The P300 wave is associated with an individual’s attentional and memory processing of stimuli and can, therefore, be used to study several psychological processes, such as cognition and decision-making. The P300 wave has also been associated with an individual’s favor or disfavor of a particular stimulus in contexts involving preferences74. The results revealed that the amplitude of the P300 component was significantly affected only by chair color, which demonstrated that participants focused their attention on stimuli of various colors75. Participants had the highest P300 amplitude when viewing warm-toned chairs, followed by cool-toned chairs, and the lowest for neutral-toned ones, which aligns with the results of previous studies. Higher arousal in the P300 component was observed when viewing complex-formed chairs, suggesting that complex morphology can stimulate more P300 component activity59. Nevertheless, the experimental results showed that morphology did not significantly affect the P300 amplitude.

Regarding the relationship between subjective response and GSR or EEG, the correlation was not significant, which is similar to the findings of other studies41 and may be due to small differences in signal fluctuations.

The participants’ sex had no significant impact on the degree of preference in either the subjective or the objective measurements in our study. This means the degree of preference for the different chairs was similar, regardless of the participant’s sex. Previous studies reached similar conclusions in other design fields. Djamasbi et al.76 examined whether sex preferences could influence the recognition of specific information provided by specific items on a webpage by collecting subjective data and users’ fixation information. Their study did not show any notable differences between the sexes. Additionally, Guo et al.62 similarly concluded that sex did not affect the eye movement indices that reflect a product’s user experience.

Regarding the effect of the participants’ majors (design and non-design) on the preference level, the data analysis showed that the influence of specialty on the subjective evaluation was significantly greater than its influence on the objective indicators, such as EKG, GSR, and EEG. Regardless of specialty, people have similar eye movements and physiological and neural feedback when viewing pictures of chairs. However, their behavioral data (the preference decisions made) vary significantly owing to differences in educational backgrounds that lead to varying knowledge about furniture aesthetics and design77,78.

Aesthetic model construction and validation

Model setup

We defined preference scores from −4 to −1 as “dislike” and scores from 0 to +4 as “like”, and used the RF algorithm to build an aesthetic classification model. Because hyperparameters affect the performance and generalization ability of the RF model, the GridSearchCV tool was used to systematically search for the optimal combination of parameter settings. GridSearchCV used 5-fold cross-validation, with accuracy as the model evaluation metric. The final results of the hyperparameter optimization and the RF parameter setup are shown in Table 7.

Table 7 Hyperparameter setup for RF.
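The setup described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study’s pipeline: the data are synthetic random numbers, the parameter grid is hypothetical (the actual values are those in Table 7), and the names `to_label`, `X`, and `scores` are introduced only for this example. The 70/30 hold-out split follows the evaluation procedure used for the model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def to_label(score):
    """Map a preference score in [-4, +4] to 0 ("dislike") or 1 ("like")."""
    return 1 if score >= 0 else 0


# Synthetic stand-in data (assumption): 1440 rows, 10 features; NOT study data.
rng = np.random.default_rng(0)
n_samples, n_features = 1440, 10
X = rng.normal(size=(n_samples, n_features))
scores = rng.integers(-4, 5, size=n_samples)  # integer scores from -4 to +4
y = np.array([to_label(s) for s in scores])

# 70% training data, 30% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hypothetical search grid; the tuned values reported in Table 7 may differ.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",  # accuracy as the selection metric
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
importances = best_model.feature_importances_  # one score per feature
print(search.best_params_, len(y_test))
```

Because the features here are random noise, the resulting accuracy and importance ranking carry no meaning; the sketch only shows how the labeling rule, the grid search, and the hold-out split fit together.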

Model evaluation

For the RF method, the dataset was split into 70% training data and 30% testing data. Precision, recall, and F1-score were used to evaluate the predictive performance of the model.

Precision represents the proportion of samples predicted as positive by the model that are actually positive, calculated by the formula (3):

$$\text{Precision}=\frac{\text{True positives}}{\text{True positives}+\text{False positives}}$$
(3)

Recall represents the proportion of actual positive samples that are correctly predicted as positive by the model, calculated by the formula (4):

$$\text{Recall}=\frac{\text{True positives}}{\text{True positives}+\text{False negatives}}$$
(4)

F1-Score is the harmonic mean of precision and recall, particularly suitable for imbalanced class situations. The formula (5) for calculation is:

$$\text{F1-Score}=2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$
(5)
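The three metrics defined above can be computed directly from the counts of true positives, false positives, and false negatives, as in the following minimal sketch. The function names and the eight example labels are hypothetical and do not come from the study’s test set; “like” is assumed to be the positive class (1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the positive class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = "like", 0 = "dislike"; NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

The guards against a zero denominator return 0.0 when a class is never predicted or never present, which is one common convention rather than the only possible choice.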

The classification model was evaluated by using test samples. Based on the calculation of three evaluation metrics, the model prediction performance is shown in Table 8. Precision reached 0.934, Recall reached 0.969, and F1-Score reached 0.946. This indicates that the classification prediction model based on RF can accurately predict the classification according to these 10 selected features. Based on the ranking of feature importance (Fig. 11), this study found that the most important features in the aesthetic model were the SC signal and the average pupil diameter, implying that affective and cognitive processing play a key role in the aesthetic experience. Color and Form features had the least influence in the model, but their contribution to aesthetic experience should not be ignored.

Fig. 11

Ranked importance scores of the 10 selected features estimated by the RF model.

Table 8 Evaluation results of RF model by using the 30% test data (432 samples).

Conclusion

As a critical factor in evaluating furniture design quality, participants’ preference plays an important role in purchase decisions and deserves extensive research. This study employed the ErgoLAB HME Synchronization Platform to investigate eye movements and physiological responses during preference-making. The simultaneous multimodal measurement method proposed in this study recorded and analyzed subjective preference evaluations together with eye movement, physiological, and EEG signal data. Based on these data, an aesthetic RF model was constructed, and the precision of this classification model reached 0.934.

This study aimed to advance the understanding of how HCI and synchronous measurement techniques can enhance the study of user behavior and emotions in the context of furniture and to guide future furniture design based on scientific considerations of form and color preferences. Additionally, this approach could be extended to other products and architectural spaces while increasing the complexity of mood measurements. This will allow for the continuous improvement and refinement of this synchronized measurement method in practical applications.