Introduction

Disgust, a primary negative emotion recognized since Darwin1, is elicited by a distaste for food and contact with feces. This emotion plays a crucial role in protecting animals from intoxication and infection by helping them avoid harmful substances and pathogens2. Abnormal manifestations of disgust have been observed in patients receiving chemotherapy for cancer3,4 and have been proposed in the pathogenesis of various intractable psychiatric disorders, including eating disorders and obsessive-compulsive disorder5,6,7. Therefore, elucidating the neural mechanisms underlying disgust is essential for understanding the biological basis of emotion, reducing drug side effects, and effectively managing various psychiatric disorders.

Since disgust is a conscious experience and thus cannot be directly observed8,9,10,11,12, an observable readout is required to elucidate its neural mechanisms. Taste reactivity, consisting of orofacial and somatic behavioral reactions in response to taste stimuli, expresses the hedonic evaluation of taste, but not taste quality13,14,15. Several specific behaviors, such as gape, in taste reactivity have been established as reliable indicators of disgust15 and are referred to as disgust reactions16,17,18,19.

The quantification of disgust reactions is so complex that manual analysis has been carried out and automated analysis methods have not been developed. In previous analyses of disgust reactions, orofacial and somatic behavioral reactions were video-recorded, and the relevant behaviors were subjectively distinguished while the video was played frame-by-frame. Thus, the quantified data of a particular test can be somewhat variable between the people analyzed, and a significant amount of time and effort is required for the analysis20.

To overcome these limitations and reproducibly and quickly assess disgust reactions in mice, we constructed a classifier to automatically count them. To accomplish this, we utilized DeepLabCut as an estimation system for the position of the body part based on transfer learning with deep neural networks that track defined body parts of animals without physical markers21 and random forest as a supervised learning model with an ensemble learning method22.

Methods

Animals

Adult (6–7 weeks-old) male C57BL/6J mice (n = 27) were obtained from Japan SLC Inc. (Shizuoka, Japan). Upon arrival, mice were housed in clear plastic cages (18 × 26 × 13 cm) with wood tips (Soft tip: Japan SLC, Shizuoka, Japan) in groups of 2–4 males per cage and were then housed individually in smaller cages (14 × 21 × 12 cm) immediately before handling. Mice were maintained at 23 ± 1 °C under a 12 h light/dark cycles (lights on at 8:00 am) and given ad libitum access to food (Labo MR stock; Nosan Corp., Kanagawa, Japan) and tap water. The tap water was changed to Milli-Q water 3 days before beginning the reaction experiments. All behavioral experiments were performed during the light cycle. All the animal experiments were approved (No. 0150384 A, 0160057C2, 0170163 C, A2017-194 A, and A2018-138C4) by the Institutional Animal Care and Use Committee of the Institute of Science Tokyo, and was performed according to the ARRIVE and relevant guidelines and regulations.

Surgery for intraoral tubing

For implantation of an intraoral tube, mice were anesthetized by intraperitoneal (i.p.) injection of sodium pentobarbital (40 mg/kg BW; Nembutal; Abbott Laboratories) or a mixture of midazolam (4 mg/kg BW; Astellas Pharma Inc.), butorphanol (5 mg/kg BW; Meiji Seika Pharma Co., Ltd.), and medetomidine (0.3 mg/kg BW; Nippon Zenyaku Kogyo Co., Ltd.)23. The depth of anesthesia was maintained at a sufficient level to prevent paw pinch reflex. An incision was made in the midline of the scalp. A curved needle attached to an intraoral polyethylene tube (SP-10; Natsume Seisakusho Co., Ltd., Tokyo, Japan) was inserted from the incision site and advanced subcutaneously posterior to the eye, to exit at a point lateral to the first maxillary molar on the right side of the mouth. The intraoral end of the tube was heat-flared to an approximate diameter of 1 mm to prevent it from drawing into the oral mucosa. Mice were mounted in a stereotaxic frame (David Kopf Instruments, Tujunga, CA, USA) using ear bars. Two alcohol-sterilized small screws (PN-04; LMS Co., Ltd., Tokyo, Japan) were anchored to the skull and a piece of plastic bar was fixed to the screws with dental cement (Unifast3; GC Corp., Tokyo, Japan). The end of the tube exiting the head incision was fixed to a plastic bar with dental cement. Mice received subcutaneous injections of the antibiotic chloramphenicol sodium succinate (60 mg/kg BW, Chloromycetin Succinate; Daiichi Sankyo Co., Ltd., Tokyo, Japan) for infection prevention and carprofen (5 mg/kg BW, Rimadyl; Zoetis, Tokyo, Japan) for pain relief. Three days after surgery, one end of a delivery tube (SP-10; Natsume Seisakusho Co., Ltd.) was connected to the end of the intraoral tube fixed to the plastic bar on the mouse head using a connector (KN-394, two directions (0.3 + 0.3); Natsume Seisakusho Co., Ltd.), and the other end was connected to a needle (30Gx13mm; ReactSystem Co., Osaka, Japan) attached to a 1-mL syringe, and ~ 20 µL of sterile MilliQ water was introduced into the mouth of the animal to test the patency of the tubes. The intraoral tube was flushed with ~ 20 µL sterile MilliQ water every 2–3 days to prevent occlusion. The mice were allowed 1–3 weeks to recover from surgery before the beginning of the behavioral experiments (Fig. 1).

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Schematic of the automatic quantification process. The left column illustrates “Machine prediction” and the right one illustrates “Human observation” process in the present study.

Taste reactivity test

The taste reactivity test14,15,18 was used to measure affective behavioral reactions in mice. The test chamber was composed of a glass floor and an acrylic cylinder (30 cm height, 10 cm outside diameter, and 3 mm thickness). A digital video camera (HDR-PJ800; Sony, Tokyo, Japan) was placed beneath the glass floor to record the ventral view of the mouse. One end of the delivery tube was connected to the end of the intraoral tube fixed to the plastic bar on the mouse head using a connector, and the other end was connected to a needle attached to a 1-mL syringe. The stimulant solution was filled into a syringe and infused into the mouth of the mice. Orofacial and somatic behaviors during the infusions were video-recorded in a bottom-up view at 30 frames per second (Fig. 2A).

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Experimental setup for taste reactivity test in mice. (A) Solution was infused intraorally through an implanted tube. Orofacial and somatic behaviors during infusions were video-recorded in a bottom-up view. (B) This scheme shows the Experimental timeline. Experiment 1,2 (Innate): On Day 1, mice received intraoral infusions of Quinine solution as habituation. On Day 2 (test day), the mice received the same infusions as those performed on Day 1, and the behaviors of the mice during the infusions were video-recorded. In Experiment 1, infusions were performed twice for a total of 1 min each, with a 1 min interval in between. In Experiment 2, the infusions were continuous for 2 min, without intervals. Experiment 3 (Innate): On Day 1, mice received intraoral infusions of quinine solution, and shortly thereafter received an i.p. injection of physiological saline as a control treatment for the learned groups described below. The same procedure was repeated on Day 4. On Day 6 (test day), the mice received intraoral infusions of quinine solution, and their behaviors during the infusions were video-recorded. In Experiment 3, infusions were performed twice for a total of 1 min each, with a 1 min interval in between. Experiment 4,5 (Learned): On Day 1, mice received intraoral infusions of saccharin solution, and shortly after that, they received an i.p. injection of LiCl solution (1st conditioning). The same procedure was repeated on Day 4 (2nd conditioning). On Day 6 (test day), the mice received intraoral infusions of saccharin solution, and their behaviors during the infusions were video-recorded. In Experiment 4, infusions were performed twice for a total of 1 min each, with a 1 min interval in between. In Experiment 5, the infusions were continuous for 2 min without intervals.

To assess disgust reactions in the innate and learned experimental conditions, the mice were divided into an innate group (n = 13) and a learned group (n = 14) after surgery. As the experiments were conducted while optimizing the experimental conditions, the conditions differed from experiment to experiment. Three different schedules and experiments were performed for the innate groups and two slightly different schedules and experiments were used for the learned groups (Fig. 2B; Table 1). Each mouse was used in a single experiment. The number of mice used in each experiment was determined by unintentional factors such as the number of mice available at the time the experiment was performed. The experimental schedule for each experiment is as follows.

Table 1 Taste reactivity test schedule.

Experiment 1 for the innate group: On Day 1, mice (n = 5) received intraoral infusions of 50 µL of 3 mM quinine solution twice for 1 min each with a 1 min interval in between to adapt to the test procedure. On Day 2 (test day), the mice received the same infusions as those administered on Day 1, and their behavior during the infusions was recorded (Fig. 2B; Table 1).

Experiment 2 for the innate group: On Day 1, mice (n = 2) continuously received infusions of 80 µL of 3 mM quinine solution for 2 min without an interval to adapt to the test procedure. On Day 2 (test day), the mice received the same infusions as those administered on Day 1, and their behaviors during the infusions were video-recorded.

Experiment 3 for the innate group: On Day 1, mice (n = 6) received intraoral infusions of 50 µL of 3 mM quinine solution twice for 1 min each with a 1 min interval in between. Shortly thereafter, the mice received an intraperitoneal (i.p.) injection of physiological saline (10 mL/kg body weight [BW]) as a control treatment in the learned group. The same procedure was repeated on Day 4. On Day 6 (test day), the mice received the same infusions as those administered on Day 1, and their behaviors during the infusions were video-recorded.

Experiment 4 for the learned group: On Day 1, mice (n = 11) received infusions of 50 µL of 5.4 mM saccharin solution twice for 1 min each with a 1 min interval in between. Shortly thereafter, the mice received an i.p. injection of LiCl solution (0.3 M, 10 mL/kg BW) (1st conditioning). The same procedure was repeated on Day 4 (2nd conditioning). On Day 6 (test day), the mice received the same infusions as those administered on Day 1, and their behaviors during the infusions were video-recorded.

Experiment 5 for the learned group: On Day 1, mice (n = 3) continuously received infusions of 80 µL of 5.4 mM saccharin solution for 2 min without an interval. Shortly thereafter, the mice received an i.p. injection of LiCl solution (0.3 M, 10 mL/kg BW) (1st conditioning). The same procedure was repeated on Day 4 (2nd conditioning). On Day 6 (test day), the mice received the same infusions as those administered on Day 1, and their behaviors during the infusions were video-recorded.

Thus, videos obtained from five different experiments were used for subsequent analysis. Among these videos, almost half (n = 14; innate = 7, learned = 7) were obtained in a previous study in which automated tracking was not performed18 (Table 2).

Table 2 Videos used in the present studies.

Automated body part tracking

DeepLabCut2.2rc1 (DLC)23 was used to develop a tracking framework with five mouse-body parts (Fig. 3A). Frame choice and training were performed using the default settings of the DLC. Initially, 20 frames were selected from each video for training purposes. The provisional coordinates of the mouse body part were estimated according to the temporarily created model, and 40 candidate frames for false predictions were selected for each video and manually annotated. This procedure was repeated until the loss was < 0.001 and the error was sufficiently small. After the training sessions were completed, the coordinates of the mouse body part in the other (not manually annotated) frames were estimated (Fig. 3A).

Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Schematic diagram of the machine learning estimation method used. (A) Schematic illustration of the estimation of body part position using DLC. (B) Schematic illustration of the Disgust Classification of each frame using random forest.

Disgust reaction labeling

All 27 videos, including those taken in a previous study18, were (re)analyzed by a blinded observer who was not involved in the previous study18. According to previously established rules15,18, all video frames were labeled manually frame-by-frame (30 frames/sec) to determine whether they constituted disgust reactions and, if so, which of the five types of disgust reactions they were. The five types of disgust reactions were gapes (large opening of the mouth with retraction of the lower lip), headshakes (rapid lateral movement of the head), face washes (wipes over the face with the paws), forelimb flails (rapid waving of both forelimbs), and chin rubs (pushing the chin against the floor of the test chamber).

Random forest classification and model evaluation

We extracted 25 per-frame geometric features. They consisted of five displacements of each body part (i.e., nose, left front paw, right front paw, left rear paw, and right rear paw) from the previous frame, ten distances between each body part (e.g., the distance between the nose and left front paw), and ten amounts of changes of each distance from the previous frame. In addition, for each frame, the sum of all features from five consecutive frames, including the two frames before and after the frame, was considered, so that each frame had 125 features. The features created in this way were verified to determine whether they were appropriate for inclusion in the model in the training process for random forests, which is described later. Specifically, after confirming the correlation between the features, they underwent a feature-selection process based on the feature importance of random forests using the Boruta package24 in R. The feature importance is the Z-score of the mean decrease accuracy measure (the default setting in the Boruta package). The importance of each feature is determined from the degree of degradation in the prediction performance when only that feature is randomly shuffled from the original data. This is based on the hypothesis that if the feature is very useful, the prediction using the data containing the shuffled feature should have a much worse performance than the prediction using the original data that was not shuffled. The feature selection algorithm also computed the feature importance for the original features shuffled by feature (“shadow” features), and the original features that were significantly (p = 0.01, as the default value in the package) higher than the shadow features were finally selected for inclusion in the model.

We then constructed a random forest classifier using the caret25 and Rborist26 packages in R. Before developing the classifier, the data were randomly divided into training (n = 19; innate = 9, learned = 10) and test (n = 8; innate = 4, learned = 4) datasets, with almost equal numbers of innate and learned groups (Table 2). The classifier was constructed by setting the classWeight argument in the train function of the caret package to ‘balance’ in view of the imbalanced presence of frames containing movements of interest, which allowed the classes to be weighted inversely proportional to their frequency of occurrence in the data. In the training sessions, the dataset was leave-one-subject-out (19-fold) cross-validated for each mouse to prevent overfitting (Fig. 3B). In addition, we performed hyperparameter tuning and selected the model with the highest accuracy as the final one using the tuneLength argument in the train function of the caret package. Although the number of ensembled decision trees was fixed at the default value of 500 during the model creation process, we confirmed that the number of ensembled decision trees was sufficient by separate training after the final model creation, with the same training parameters changing only nTree.

After developing the final model, we evaluated the classifier’s frame-by-frame performance in discriminating each type of disgust reaction by calculating the overall accuracy, as well as its positive predictive value (PPV) and sensitivity. PPV, also called precision, is expressed by the following equation:

$$\:\frac{true\:positive}{true\:positive\:+\:false\:positive}$$

Sensitivity, also called recall, is expressed by the following equation:

$$\:\frac{true\:positive}{true\:positive\:+\:false\:negative}$$

Disgust reaction scoring

For the five types of disgust reactions, we analyzed consecutive frames with the same label as one count. To ensure that each component of the disgust reactions contributed equally to the final scores, reactions that occurred in continuous counts were scored in time bins according to previously established rules15,27. Components characterized by long-duration counts, such as chin rubs, were scored in five-second bins (successive repetitions within five seconds scored as one occurrence), and face wash was scored in two-second bins. The other three reactions that could occur as a single behavior were scored as separate occurrences (e.g., one forelimb flail equals one occurrence). The total disgust reaction scores were quantified as the sum of all five binned counts during two minutes of intraoral infusion of the test solutions.

Computer software and hardware

A desktop computer equipped with an Intel Core i7-10700 CPU, an NVIDIA GeForce GTX 1660 SUPER GPU, and 16 GB of RAM was used for mouse posture estimation in DLC and data processing in R.

Results

Manual analysis of disgust reactions

To determine which of the five types of behavior that make up the disgust reactions were included in each frame of the recorded videos used in this study, we manually labeled and quantified 97,200 frames from 27 videos based on established rules15,18 (“Human observation” in Fig. 1). Consistent with a previous study18, disgust reactions were observed in both the innate and learned groups with a similar number of reactions for each type of disgust reaction (Fig. 4A). Regarding the number of counts for each type of disgust reaction, forelimb flails, face washes, and chin rubs were more common than were gapes and head shakes (Fig. 4B, Wilcoxon signed-rank sum test; gape vs. chin rub: p = 0.049, gape vs. face wash: p < 1e-6, gape vs. forelimb flail: p < 1e-7, head shake vs. chin rub: p = 0.031, head shake vs. face wash: p < 1e-7, head shake vs. forelimb flail: p < 1e-7).

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

The number of each type of disgust reaction. (A) The number of frames that included any disgust reaction was confirmed manually. (B) The number of counts for each type of disgust reaction was confirmed manually. Each dot represents a value for a mouse.

This classical disgust reaction quantification method requires significant time and effort for analysis20.

Automated body part tracking

To extract features from recorded videos for machine prediction of the disgust reaction by random forest in the following step, we first semi-automatically annotated the positions of the body parts of mice in the videos using DeepLabCut (DLC) (“Machine prediction” in Fig. 1). We automatically estimated the position of the body part to perform automatic quantification of video-recorded disgust reactions. Because the optimal number of body parts to be labeled for posture estimation is yet to be established, we referred to the occurrence rate of each type of disgust reaction (Fig. 4B), to determine the number of markers. The movements that can be tracked by focusing on the nose and the front and rear paws (i.e., chin rubs, face washes, forelimb flails, and head shakes) comprised 96.23% of the disgust reactions. In contrast, movements that required attention to mouth movements (i.e., gapes) were rarely observed (3.77%). Thus, we defined the five body parts of mice, the nose, and the front and rear paws (Fig. 5A) were sufficient to meet the objective of the present study, which estimated the total scores for disgust reactions.

Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

Annotation of five body parts for estimation of the position of the body part. (A) Definition of the body parts used to estimate the position of the body part in the mouse using DLC. Only those visible in the bottom-up view were annotated. Scale bar, 1 cm. (B) Root mean square error (RMSE) for each body part during the training.

In the DLC training process, the coordinate information of the five body parts was annotated in 2674 frames, which were automatically selected by DLC, from 27 single-trial video recordings through three training sessions with 500,000 iterations each (Fig. 3). The errors in the estimation of the body part position were evaluated after the model training procedure. The results showed that the mean error for each body part was less than 1 mm (Fig. 5B), which was sufficiently small compared to previous studies using DeepLabCut (1–8 mm)28,29 to confirm that the training was complete.

Outlier frame detection and correction

Next, we corrected for mouse posture estimation because our posture estimation often failed in specific postures (e.g., standing up with its paws against a wall) and rapid movements that could not be captured on a recording. We introduced a rule-based correction procedure that refers to the likelihood of preventing incorrect estimations in such cases. Specifically, the following two corrections were made for all frames in the video posture-estimated by DLC when the estimated likelihood of a particular body part in a specific frame was less than a predetermined threshold value. (1) If the likelihood of the body part in the next frame exceeded the threshold value, the position of the body part in the frame was corrected in the middle of the frame before and after. (2) If the likelihood of the body part in the next frame was below the threshold, the body part’s position in the frame was corrected to the same position as that in the previous frame. In the present study, we set the threshold value to 0.6, which is the default value for an outlier correction method inside the DLC.

Random forest classifier

We developed a random forest classifier to automate the image classification task of determining whether each video frame contained the behaviors of interest. Before developing the classifier, the data were randomly and equally divided into training and test datasets between the innate and learned groups. There were no significant differences in the incidence of each type of disgust reaction between the datasets (Fig. 6; Table 3, Wilcoxon rank-sum test; Chin Rub, p = 0.466; Face Wash, p = 0.441; Forelimb Flail, p = 0.212; Gape, p = 0.547; Head Shake, p = 0.413; Others, p = 0.222).

Fig. 6
Fig. 6The alternative text for this image may have been generated using AI.
Full size image

The number of each type of disgust reaction in Train and Test sets. The number of frames that included any disgust reaction was manually confirmed.

Table 3 Observed behaviors.

Next, we checked the possible correlations between the geometrical features. The results showed that the features created from the five consecutive frames of each of the distances between each body part had a high correlation (Pearson’s r = 0.82 ~ 0.99). This correlation was clearly higher than that of the other features (r = -0.30 ~ 0.75). Because the presence of feature pairs with very high correlations increases the complexity of the model and does not contribute to its prediction performance, all features generated as two frames before and after the frame of the distances between each body part (40 features in total) were not included in the model. This resulted in the employment of 85 feature candidates.

We developed three random forest models with different predFixed hyperparameters, that is, the number of randomly selected features when forming each split in a classification tree, of the Rborist package in R. Finally, we adopted a model with a parameter value of 85 and the highest cross-validated accuracy (Fig. 7). To confirm that 500 was sufficient for the number of decision trees ensembled in the final model, separate training was conducted with nTree varying from 10 to 1000 in steps. The results confirmed that there was no significant difference in Accuracy for nTree > 200.

Fig. 7
Fig. 7The alternative text for this image may have been generated using AI.
Full size image

The relationship between the number of randomly selected features and cross-validated accuracy for hyperparameter tuning. The dots represent the mean accuracy of the entire cross-validation and the vertical lines represent the standard deviation.

After the random forest model was completed, the importance of the features was checked using the Boruta package24 to find the features that are likely to contribute strongly to the performance of this model and to remove those that do not contribute sufficiently. During this process, the importance calculation was automatically repeated 18 times until all the features were adopted or rejected. The most important features were the distance between the two front paws and the distance between the nose and right/left front paws (Fig. 8), indicating that the positional relationship between the two front paws and nose of the mouse was particularly important for the prediction of this model. In addition, all 85 original features exceeded the importance of the randomly generated shadow features and none were deemed to be removed (Fig. 8).

Fig. 8
Fig. 8The alternative text for this image may have been generated using AI.
Full size image

The importance of each feature in the final model. The values are the Z-scores of the mean decrease in accuracy measure. The plots show the distribution of the values over 18 iterations. The center line of each box plot corresponds to the median value, the lower and upper hinges correspond to the first and third quartiles, and the lower and upper whiskers extend from the respective hinges to the minimum or maximum value within 1.5 × interquartile range (IQR). Data that fell outside the edge of the whisker are plotted separately as dots. The temporal relationship of the feature used for each frame to the frame itself is shown as t + x or t − x (i.e., when x = 2, the feature is obtained from the two frames after the frame). shadowMin, shadowMean, and shadowMax are the minimum, mean, and maximum values, respectively, of the randomly generated features at each iteration. The original features are shown in red and shadow features are shown in blue. ns = nose, rh = right front paw, lh = left front paw, rf = right rear paw, lf = left rear paw, disp = displacement, dist = distance, disc = change of distance.

In the test set, face washes (sensitivity = 0.65, positive predictive value (PPV) = 0.85) (Fig. 9A), and forelimb flails (sensitivity = 0.55 and PPV = 0.73) (Fig. 9B) were detected frame-by-frame with reasonable accuracy. In contrast, chin rubs (sensitivity = 0.16, PPV = 0.63) (Fig. 9C), gapes (sensitivity = 0), and head shakes (sensitivity = 0) were almost undetectable (Table 4). Others (i.e., movements other than the behaviors of interest) were detected with 0.99 sensitivity and 0.97 PPV. This prediction result was similar to that of the cross-validated prediction in the training set (Table 5).

Fig. 9
Fig. 9The alternative text for this image may have been generated using AI.
Full size image

Typical examples of positive and false classifications by the present random forest classifier. The nose, right front paw, and left front paw of the mouse are highlighted with green, yellow, and blue crosses, while the right and left rear paws are marked with yellow and blue dots, respectively. Successive images were recorded at intervals of 33 milliseconds (ms) (1/30th of a second). Scale bar, 1 cm. (A) Examples of the classifier’s positive and false classifications of face wash. (Top) A series of frames in which the classifier correctly classifies the face wash. (Bottom) In this series, the classifier did not detect the face wash, and there was an estimation error in the position of the body part: at t = 0 and 33 ms, the right front paw positions were correctly estimated (red arrow), but at t = 67 ms, they were incorrectly estimated (red arrowhead). (B) Examples of the classifier’s positive and false classifications of forelimb flails. (Top) A series of frames in which the classifier correctly classified the forelimb flails. (Bottom) A series of frames in which the classifier did not detect forelimb flails and there was no error in the estimation of the body part position. (C) Examples of the classifier’s positive and false classifications of chin rubs. (Top) A series of frames in which the classifier correctly classifies chin rubs. In this series, the left front paw of the mouse was moving so quickly that it could not be accurately recorded on the video; at t = 33 and 67 ms, its position was correctly estimated (red arrow), but at t = 0 ms, its position was incorrectly estimated (red arrowhead). However, the estimation of the position of the body part was successful. (Middle) In this series, the classifier did not detect the chin rub, and there was an error in the estimation of the position of the body part: at t = 0 and 67 ms, the right front paw positions were correctly estimated (red arrow), but at t = 33 ms, they were incorrectly estimated (red arrowhead). (Bottom) The classifier did not detect chin rubs in this series and there was no error in estimating the position of the body part.

Table 4 Observation and prediction of the number of frames that include each type of disgust reaction and other movement in the test set.
Table 5 Observation and prediction of the percentual average counts overall cross validation that include each type of disgust reaction and other movement in the train set.

Disgust reaction scoring

To evaluate the performance of the classifier for the disgust reaction score, predictions of the total number of frames that constitute individual disgust reactions were scored according to certain rules, as mentioned above, and the scored values were compared with the values manually counted and scored by a human. We found a high correlation between the scores calculated from the classifier’s prediction and human count (Pearson’s r = 0.97, Fig. 10, see Supplementary Movie).

Fig. 10
Fig. 10The alternative text for this image may have been generated using AI.
Full size image

The classifier’s predictions were highly correlated with human observations. Comparison of disgust reaction scores for each video between observations and predictions in the test dataset. Pearson’s r = 0.97.

Discussion

We constructed a classifier using DLC and random forest to automatically count innate and learned disgust reactions in the taste reactivity test. The total predicted values of the classifier showed a strong correlation with human observations. The present study is the first to automatically assess disgust reactions in taste reactivity tests.

Our classifier was constructed according to the published taste reactivity test protocols. Therefore, the scores analyzed by our classifier can be compared with manually counted scores. We found a very high correlation in this comparison (r = 0.97; Fig. 10). The accuracy of our classifier is comparable to that of previous automated methods: the pup retrieval behaviors in mice were automatically assessed with 0.51–0.90 correlations using DeepLabCut and machine learning28, and the number of chemically induced scratching in mice was predicted with 0.98 correlations using a convolutional recurrent neural network30.

The present study used DLC for marker-less estimation of the position of the body part because a previous study demonstrated excellent performance31. Nonetheless, alternative open-source software programs such as DeepPoseKit32 and SLEAP33 allow for comparable estimations. Our present classifier may also assess disgust reactions even if these programs are used instead of DLC. The research field of markerless estimation of the position of body parts has made remarkable progress in recent years, and more accurate programs are likely to become available. Accordingly, automatic behavior evaluation procedures that estimate the position of the body part in advance, such as our present classifier, are expected to improve in accuracy because at least some of the failures of our classifier resulted from the failure to estimate the position of the body part (Fig. 9).

A recent study evaluated facial expressions in response to taste stimuli in mice using a machine-learning-based method34. Facial expressions in response to bitter stimuli were automatically distinguishable from those in response to the other stimuli. Bitter-induced facial expressions were also induced by the intake of saccharin solution paired with intraperitoneal LiCl injection and by the intake of NaCl solution at high concentrations. However, it remains unclear what specific brain functions are reflected in the observed facial expressions because the taste stimuli used may evoke multiple brain functions, including disgust and motivation to avoid the stimulants. In addition, the facial expressions detected by this method in mice were challenging to interpret as relevant facial expressions in other species, including humans, making it difficult to interpret the mouse’s facial expressions as a readout of emotion. In contrast, all disgust reactions expressed by mice are shared by several different species, such as rats, monkeys, and human neonates15. This fact supports the idea that a mouse’s disgust reactions and human disgust may share a common neural basis. In support of this, taste reactivity may reflect different brain functions and neural activities from facial expressions because the number of disgust reactions and facial expressions do not always fit in mice35. In addition to facial expressions, intake and licking behaviors have been widely examined in taste avoidance and evaluation studies, but the results from these behaviors do not always match those of taste reactivity20,36. Thus, taste reactivity will continue to be a valuable readout for future research to understand the neural basis of disgust.

Our classifier can be built with minimal resources. In the video of the taste reactivity used in the present study, three of the five types of behavioral reactions that make up the disgust reaction, chin rub, face wash, and forelimb flail, accounted for 93.46% of the total disgust reactions (Fig. 4B). As these three reactions are characterized by the movement of the nose and the front and rear paws, tracking the five body parts of the mice, the nose, and the front and rear paws (Fig. 5A) seemed to be sufficient to track these reactions, and thus the disgust reactions observed in the present study. These five tracking points are generally smaller than those used in other studies using DeepLabCut (8–13 points/mouse)29,31. Although the behaviors detected in other studies are different from those in the present study and cannot be directly compared, the relatively small number of tracking points in the present study achieved automated tracking with a relatively small burden of manual annotation and machine training costs. In addition, an inexpensive and commercially available handheld video camera with low frame rates (30 frames per second), which is equal to or less than that used in previous studies (30–60 frames/sec)30,37, was sufficient to record mouse behavior for subsequent analysis using our classifier. Using videos with low frame rates saves time in estimating the position of the body part and reduces the storage size of the original videos, ultimately improving the model38.

However, our classifier could hardly detect gapes and head shakes, which are the other two reactions constituting the disgust reaction (Tables 3 and 4). This may be because the tracking point for the gapes was not set at a sufficiently high level to detect movement around the oral cavity, which was assumed necessary for detection. For the head shake, the 30 fps record may have been insufficient because the movements were too fast. Although the inability to detect these reactions was not problematic in the present study because the frequency of these reactions was low in mice (Fig. 4), increasing the number of tracking points may be required to predict disgust reactions in rats, where gapes are more frequently observed39.

In the present study, we developed an automated assessment method for mouse disgust reactions. The present method allows for the evaluation of disgust reactions independent of the skill level and bias of the human evaluator. In addition, the present method will facilitate the implementation of taste reactivity tests in large-scale screening and long-term experiments that require counting numerous disgust reactions, which are challenging to perform manually. Thus, the present method may help researchers implement taste reactivity tests in a broader range of studies than previously thought20. The present method is expected to accelerate our understanding of the neural basis of disgust, contribute to understanding the pathophysiology of various psychiatric disorders, and advance the development of preventive and therapeutic interventions.