Introduction

The global skin care sector is of considerable importance and is growing rapidly, driven by factors such as an aging population, environmental pollution, and modern lifestyles, as well as growing consumer awareness of skin health, aesthetics, and the prevention of dermatological conditions. Skin and subcutaneous diseases remain among the leading non-fatal health burdens worldwide, according to recent Global Burden of Disease analyses1. Their high prevalence across diverse age groups and regions underscores the widespread nature of dermatological conditions2. Despite this burden, access to dermatological evaluation remains limited. Traditional diagnostic methods such as visual inspection and invasive procedures often lack precision, can be uncomfortable, and may not reliably provide comprehensive skin assessments3,4. Moreover, the shortage of dermatology specialists in many regions further constrains access to professional assessment, emphasizing the need for more accessible diagnostic alternatives5. This has led to growing interest in personalized, non-invasive skin care technologies that can accurately assess skin conditions with minimal discomfort6,7.

Current imaging techniques, including dermoscopy and hyperspectral imaging, offer valuable insights into skin health by capturing detailed visual characteristics8,9. Similarly, electrical impedance measurement has become prominent in dermatological assessments, effectively evaluating skin hydration levels and barrier function. Skin impedance is directly related to moisture content, making it a reliable indicator for hydration assessment10,11. Accurately monitoring hydration is important, since even mild dehydration can significantly affect human physiology12.

Additionally, image-based methods provide information correlating with skin types, as visual features captured by cameras reflect underlying physiological characteristics13,14. Image processing combined with machine learning has opened new directions for automation. Saiwaeo et al. employed a convolutional neural network (CNN) to classify skin types from facial images by leveraging surface attributes such as brightness, texture, and color. In contrast, Indriyani and Sudarma adopted discrete wavelet transform (DWT) and local binary pattern (LBP) feature extraction, followed by support vector machine (SVM) classification, achieving an accuracy exceeding 91.7%13,15. These approaches demonstrate that both handcrafted feature-based methods and deep-learning techniques hold significant potential for dermatological image analysis. Skin-type classification plays a pivotal role in personalized skin care and clinical dermatological assessment. Accurate identification of one’s skin type is essential for selecting appropriate skincare products and treatment strategies. However, many individuals misclassify their own skin type, and the use of unsuitable products can lead to skin-related problems or unintended complications16,17.

Despite their advantages, these diagnostic methods, when applied individually, exhibit limitations in sensitivity, specificity, or depth of diagnostic information. Although it offers a direct and quantitative measure, the impedance method is technically less robust in uncontrolled measurement environments and is highly sensitive to measurement errors at the electrode-skin interface. Furthermore, impedance provides only bulk physiological data, lacking the spatial resolution required to analyze crucial morphological features such as wrinkles, pores, and pigment distribution, which are key to skin type classification. This trade-off between functional measurement reliability and morphological comprehensiveness necessitates a combined solution. Integrating multiple diagnostic modalities can substantially improve the accuracy, reliability, and user acceptability of skin health assessments. Recent literature highlights that combining image-based and impedance-based approaches significantly enhances diagnostic precision compared to single-modality methods18,19,20.

The primary aim of this research is to develop and evaluate a low-cost multimodal diagnostic system that integrates skin imaging and impedance measurements for comprehensive skin-health assessment. The work encompasses designing a portable monitoring device, developing robust machine-learning algorithms, assessing predictive performance, and ensuring real-time operational feasibility. Methodologically, the study advances the field by introducing innovative integration strategies that enhance diagnostic accuracy and improve applicability in both clinical and consumer settings. Overall, the proposed system has the potential to support personalized skincare, strengthen dermatological diagnostics, and contribute to broader advancements in the cosmetic industry.

Methods

This section describes the design and operational workflow of the system, covering hardware implementation, software development, data acquisition, and algorithmic processing to form a unified framework for multimodal skin assessment. The device integrates a custom-fabricated skin-contact interdigitated capacitor (IDC) sensor and a camera to simultaneously capture electrical and visual information from the skin. The ESP32 microcontroller handles data acquisition, transmission, and management before sending all information to the server for processing. The collected data, including sensor signals and facial images, undergo preprocessing to enhance quality, followed by feature extraction using both handcrafted techniques and machine-learning models. Two main tasks are performed: predicting skin moisture and classifying skin types. Multiple machine-learning models are trained and evaluated to determine the most suitable approach for each task.

The proposed skincare system

Impedance sensor

Fig. 1

The capacitance-based moisture sensor: (a) simplified schematic of the signal-conditioning circuit used to excite the interdigitated capacitor and convert capacitance changes into a voltage signal, (b) photograph of the fabricated sensor mounted on a spring-loaded structure to ensure stable skin contact, and (c) output response of the circuit illustrating the relationship between capacitance variations and the corresponding measured voltage.

Figure 1(a) illustrates the simplified schematic of the signal-conditioning circuit for the capacitance-based moisture sensor. The IDC sensor was fabricated using a custom-designed copper trace in a planar configuration, allowing direct interaction with the skin. This structure is particularly suitable for moisture detection, as variations in skin moisture alter the dielectric constant, thereby changing the sensor’s capacitance21. To ensure consistent and reliable contact with the skin, the IDC sensor is mounted using a spring mechanism.

The sensor interfaces with a signal-conditioning circuit that includes an impedance-to-digital converter and a 74HC14 Schmitt trigger IC (Nexperia, Nijmegen, Netherlands). The 74HC14 performs two primary functions: in the first stage, it generates a square-wave excitation voltage to drive the sensor; in the second stage, it produces oscillation-based pulses corresponding to the sensor’s response. The oscillation frequency is governed by the sensor’s capacitance in combination with the surrounding circuit elements, and can be approximated using the standard expression in (1)22:

$$\begin{aligned} f = \frac{1}{k \times R \times C} \, \end{aligned}$$
(1)

where:

  • f is the oscillation frequency (in Hz).

  • R is the feedback resistance (in \(\Omega\)).

  • C is the input capacitance (in F).

  • k is a constant that depends on the Schmitt trigger thresholds.
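As a quick numerical sanity check, Eq. (1) can be evaluated directly. The constant k depends on the Schmitt trigger’s hysteresis thresholds; the value 0.8 used below is a commonly cited figure for a 74HC14 at a 5 V supply and, like the R and C values, is an illustrative assumption rather than the paper’s actual component choice.

```python
def oscillation_frequency(r_ohms, c_farads, k=0.8):
    """Approximate Schmitt-trigger oscillator frequency from Eq. (1).

    k depends on the 74HC14 hysteresis thresholds; 0.8 is a typical
    value at a 5 V supply and is an assumption here.
    """
    return 1.0 / (k * r_ohms * c_farads)

# Hypothetical values: R = 100 kOhm, C = 100 pF
f = oscillation_frequency(100e3, 100e-12)  # 125 kHz
```

Because f is inversely proportional to C, a moisture-induced increase in the sensor capacitance lowers the oscillation frequency, which the second stage then converts into a measurable duty-cycle change.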

Moisture-induced changes in the IDC sensor’s capacitance directly influence the oscillation behavior of the second Schmitt trigger stage. This stage translates the capacitance variations into a pulse-width–modulated (PWM) signal by exploiting the capacitor’s charge–discharge cycle across associated resistive components (Fig. 1(a)). The resulting PWM waveform is then smoothed by a first-order passive low-pass filter with a cutoff frequency of 15.9 Hz, yielding a corresponding DC voltage. This voltage is subsequently digitized by the microcontroller’s analog-to-digital converter (ADC) for further processing.
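The stated 15.9 Hz cutoff corresponds to an RC product of about 0.01 s via the first-order relation f_c = 1/(2πRC). A minimal sketch with hypothetical component values (R = 10 kΩ, C = 1 µF, not reported in the text) reproduces that cutoff:

```python
import math

def cutoff_frequency(r_ohms, c_farads):
    """First-order RC low-pass cutoff: f_c = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

# Hypothetical component values giving the stated ~15.9 Hz cutoff
fc = cutoff_frequency(10e3, 1e-6)
```

A cutoff this far below the PWM carrier frequency ensures that only the slowly varying DC level, proportional to the duty cycle, reaches the ADC.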

Figure 1(b) presents a photograph of the fully assembled sensor housed within an enclosure and mounted on a spring mechanism, which ensures stable placement and consistent skin contact during measurements. Figure 1(c) illustrates the relationship between variations in capacitance and the corresponding output voltage measured by the microcontroller’s ADC. The results exhibit a strong linear correlation, confirming the stability and reliability of the circuit in response to capacitance changes.

Functional block diagram of the system

Figure 2(a) illustrates the functional block diagram of the proposed skincare monitoring system. The system is managed by an ESP32-CAM module (Espressif Systems, Shanghai, China), which acquires image data from an integrated OV2640 camera sensor (OmniVision, Santa Clara, California, USA) via serial communication. A capacitance-based moisture sensor is connected through a 74HC14 Schmitt trigger IC, with the resulting voltage signal digitized by the microcontroller’s onboard ADC. To account for the influence of ambient humidity and temperature, an SHT3x digital sensor (Sensirion, Zurich, Switzerland) is included, communicating with the ESP32 via the I\(^2\)C protocol. All collected data are wirelessly transmitted to a remote server over Wi-Fi, allowing users to retrieve and analyze the information in real time.

The ESP32-CAM is a compact and cost-effective development board that integrates the ESP32 System-on-Chip (SoC) with an embedded 2.0-megapixel OV2640 camera. The camera supports resolutions up to \(1600 \times 1200\) pixels (UXGA) at frame rates of up to 15 frames per second (fps), with image data transmitted through a Digital Video Port (DVP) interface.

Operational flowchart of the ESP32-CAM

Figure 2(b) illustrates the flowchart describing the operation of the ESP32-CAM microcontroller. The process begins by establishing a Wi-Fi connection and initializing the web server, after which the system remains in a listening state, awaiting client requests. Upon receiving a command, the ESP32-CAM activates the camera module and the impedance sensor. The camera captures a skin image and transmits it to the server via the HTTP protocol. The impedance sensor acquires ten samples within three seconds, computes their average to minimize noise, formats the result as a JSON object, and transmits the data to the server through HTTP.
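The averaging-and-packaging step described above can be sketched in Python (the actual firmware presumably runs as C/C++ on the ESP32; the JSON field names here are illustrative assumptions, not the device’s actual payload schema):

```python
import json
import statistics

def package_sensor_reading(samples):
    """Average repeated ADC samples to suppress noise and format the
    result as a JSON payload, mirroring the firmware behavior described
    in the text. Field names are illustrative assumptions."""
    avg = statistics.mean(samples)
    return json.dumps({"moisture_raw": round(avg, 2),
                       "n_samples": len(samples)})

# e.g. ten ADC readings collected over three seconds
payload = package_sensor_reading(
    [512, 515, 510, 514, 511, 513, 512, 516, 510, 512])
```

Averaging ten readings reduces the standard deviation of uncorrelated noise by roughly a factor of √10 before the value is sent over HTTP.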

Fig. 2

System architecture and operational workflow of the proposed skincare device: (a) functional block diagram of the proposed system, (b) flowchart illustrating the ESP32 microcontroller operation, and (c) graphical user interface of the system.

Graphical user interface

Figure 2(c) presents the graphical user interface (GUI) of the proposed Internet of Things (IoT) system during runtime. The GUI was custom-developed by the authors for this study using the PyQt5 framework in Python. The visual layout was designed with Qt Designer to create a user-friendly dashboard that integrates real-time video streaming, sensor data acquisition, and result classification. The software is deployed on a standard personal computer and communicates with the ESP32-CAM via HTTP requests over a Wi-Fi network to synchronize image capture and impedance measurements. Sensor data from the IDC and SHT31 modules can be retrieved by clicking the “Read Sensor” button, while the “Stream” button allows users to define the region of interest (ROI) for camera-based image acquisition. The “Save” button enables storage of the collected data for future analysis.

Upon initialization, the GUI loads a pre-trained machine learning model for skin type prediction. When the “Stream” button is activated, a dedicated thread initiates continuous image capture, displaying a live video feed within the interface. The “Capture” button freezes the current frame, which is subsequently pre-processed using cropping and Contrast-Limited Adaptive Histogram Equalization (CLAHE). Features are extracted from the processed image and passed into the pre-trained model, which performs inference and displays the predicted skin type within the GUI.

When the “Read Sensor” button is pressed, the system retrieves updated data from the impedance and temperature sensors, applies a regression model to estimate skin moisture, and displays the result in real time. The ambient temperature and humidity are also presented within the interface.

For historical monitoring and diagnostics, clicking the “Save” button stores the current sensor readings, associated skin images, and corresponding timestamps, enabling later review and analysis.

Fig. 3

Device design and experimental setup: (a) experimental setup for skin monitoring, (b) measurement protocol, and (c) 3D model design with the complete custom prototype. Dimensions are shown in millimeters (mm).

Experimental protocol

Measurements were taken at multiple facial locations, including the forehead, left cheek, right cheek, and additional random sites to account for skin variability. Figure 3(a) illustrates the skin diagnostic process, where the proposed system is positioned against the skin surface. Reliable contact between the IDC sensor and the skin is ensured by a spring-loaded mechanism, maintaining consistent pressure during measurements. Simultaneously, the camera sensor is positioned approximately 4 cm from the skin to enable proper focusing and optimal capture of reflected light for high-quality imaging. For each subject, the system simultaneously captured a skin image and measured skin impedance using the IDC sensor. A reference device, the Belulu Skin Checker (Beautiful Angel, Japan), was used to estimate skin moisture percentage. Skin types are classified based on observable surface characteristics: normal skin typically exhibits small pores, smooth texture, uniform tone, and a rosy appearance; oily skin is marked by a shiny surface, enlarged pores, thicker and paler texture, and a higher susceptibility to acne; dry skin is characterized by reduced oil secretion, poor hydration, and a darker tone, often accompanied by roughness, flaking, nearly invisible pores, and fine wrinkles13,15. These characteristics serve as a benchmark for evaluating the performance of the proposed system.

A total of 20 volunteers (8 males and 12 females) participated in this study. The participants were aged between 20 and 40 years. All subjects were healthy individuals with no history of dermatological diseases (e.g., eczema, psoriasis). They were recruited to ensure that the sample was representative of the three target skin types (“Dry”, “Normal”, “Oily”). The study protocol was approved by the Ethics Committee of Ho Chi Minh City University of Technology, Vietnam National University, and all procedures were carried out in compliance with the Declaration of Helsinki. Written informed consent was obtained from all participants prior to enrollment. The data acquisition protocol is illustrated in Fig. 3(b).

The 3D design of the housing and the complete prototype of the wearable system are shown in Figure 3(c). In addition to IoT-based wireless data transmission, the system also incorporates a local OLED display for real-time result visualization.

Fig. 4

Image preprocessing and analysis pipeline: (a) cropping and CLAHE enhancement applied to raw skin images, (b) data augmentation through image rotation, and (c) dual-path signal-processing workflow in which skin images and capacitance sensor data are processed separately. Both modalities are divided into training and testing sets for performance evaluation. Sensor data are used for a regression task (Linear Regression, MLP, and Random Forest models), whereas skin images are used for skin-type classification (CNN and MLP algorithms).

Data processing

Data preprocessing and feature extraction

For the image-based analysis, skin images were captured using the onboard ESP32-CAM at a resolution of \(1024\times 768\) pixels. To eliminate noise and blurred regions near the edges and standardize input dimensions, the images were cropped to \(700\times 700\) pixels. Contrast-Limited Adaptive Histogram Equalization (CLAHE)23 was subsequently applied to enhance local contrast, resulting in higher-quality inputs for feature extraction and model training. An example of the image preprocessing results is presented in Fig. 4(a).

To enrich the dataset and improve model generalization, data augmentation was applied to the training set by rotating images by \(90^\circ\), \(180^\circ\), and \(270^\circ\). This augmentation was performed exclusively on the training data to preserve the integrity of evaluation results. This two-step data enrichment strategy effectively mitigates overfitting and enhances model robustness during real-time inference. An example of the implemented augmentation process is shown in Fig. 4(b), highlighting the comparison between the original and augmented images.
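The rotation-based augmentation amounts to three lossless 90° transforms per training image, which quadruples the training set without interpolation artifacts. A minimal sketch:

```python
import numpy as np

def augment_rotations(img):
    """Return the original image plus its 90/180/270-degree rotations.
    Applied only to training images, as described in the text."""
    return [np.rot90(img, k) for k in range(4)]

# one original patch -> four training samples
patches = augment_rotations(np.zeros((700, 700, 3), dtype=np.uint8))
```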

Figure 4(c) illustrates the complete image processing and analysis pipeline used in this study. The dataset, comprising both skin images and IDC sensor data, was divided into 70% for training and 30% for testing. For image-based classification, the raw images were preprocessed through cropping and CLAHE to enhance contrast and improve feature visibility.

Two types of feature representations were investigated: (i) raw RGB images and (ii) handcrafted features extracted from the preprocessed images. The handcrafted features were designed to capture both color and texture characteristics of the skin24,25. Color-based features quantify the distribution and intensity of pixel values within the image. The extracted color features include the mean, standard deviation, skewness, kurtosis, and entropy of the pixel intensity distribution, which collectively characterize brightness, saturation, and overall color balance. Texture-based features, on the other hand, describe local intensity variations that reflect skin surface characteristics, such as patterns, wrinkles, and pores. To extract these, the Gray-Level Co-occurrence Matrix (GLCM)26 was computed, capturing the spatial relationships between pixel intensities at defined distances and orientations. From the GLCM, five key texture descriptors were derived: contrast, dissimilarity, homogeneity, energy, and correlation.

In parallel, IDC sensor data were utilized to predict skin moisture levels using a regression-based approach. Three algorithms were evaluated for this purpose: Linear Regression, Multilayer Perceptron (MLP), and Random Forest (RF).

Skin moisture prediction

The first objective of this study was to predict skin moisture levels based on impedance measurements obtained from the IDC sensor. During data collection, skin impedance was recorded simultaneously with reference moisture values provided by a commercial skin analysis device, which served as the ground truth. As previously mentioned, three machine learning algorithms, namely Linear Regression, MLP, and RF, were employed to model the relationship between the measured impedance and the reference moisture values. Once trained, these models were used to estimate skin moisture from new impedance inputs acquired by the IDC sensor. The Mean Squared Error (MSE) was adopted as the loss function during training, and the performance of the models was evaluated using two key metrics: MSE and the coefficient of determination (\(R^2\)). In addition, skin moisture was also estimated from skin images using a similar procedure.
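The three-model comparison can be sketched with scikit-learn. The data below are synthetic stand-ins (a noisy linear voltage-to-moisture relation), since the study’s dataset is not public; hyperparameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in: sensor voltage (V) vs. reference moisture (%)
rng = np.random.default_rng(0)
voltage = rng.uniform(0.5, 3.0, (200, 1))
moisture = 20 + 12 * voltage[:, 0] + rng.normal(0, 1.0, 200)

# 70/30 split, matching the paper's protocol
X_tr, X_te, y_tr, y_te = train_test_split(
    voltage, moisture, test_size=0.3, random_state=0)

models = {
    "linear": LinearRegression(),
    "mlp": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    pred = m.predict(X_te)
    print(name, mean_squared_error(y_te, pred), r2_score(y_te, pred))
```

The same MSE and \(R^2\) metrics used here are the ones reported for the real dataset in the Results section.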

Fig. 5

Skin moisture prediction results: (a) Comparison of \(R^2\) value for Linear, MLP, and RF regression models on the sensor (left) and image (right) dataset, (b) moisture prediction on the test set using the RF algorithm for sensor data (left) and image data (right), and (c) corresponding evaluation metrics of the RF model in terms of MSE and \(R^2\).

Skin type classification

The second objective of this study was to classify skin types using facial images. In the first approach, a CNN model was employed that takes preprocessed 2D skin images as input and automatically extracts hierarchical features through convolutional and pooling layers. In parallel, an MLP was used to process a set of handcrafted features derived from the skin images. Ground-truth labels for skin type were obtained from the reference device. All models were trained on the training set and validated on the test set to evaluate their generalization performance.

Fig. 6

Feature visualization and classification performance: (a) distribution of the 20 handcrafted image features visualized using PCA (left) and t-SNE (right), and (b) accuracy curves during training and testing for CNN (left) and MLP (right) models.

CNN structure The CNN architecture consisted of three convolutional blocks with increasing filter sizes of 16, 32, and 64, respectively. Each convolutional layer employed a kernel size of \(3 \times 3\), ReLU activation, and L2 regularization (\(\lambda = 0.001\)), followed by a \(2 \times 2\) max-pooling layer. The feature maps were then flattened and passed through two fully connected layers of 512 and 128 units, respectively, each with ReLU activation and L2 regularization (\(\lambda = 0.001\)). Dropout layers with a rate of 0.3 were applied after each dense layer to mitigate overfitting. Finally, the network outputs three neurons with softmax activation, enabling classification into three skin types: “Oily”, “Dry”, and “Normal”.
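The described architecture maps naturally onto a Keras sketch. The input resolution, optimizer, and loss function below are assumptions not stated in the text; the layer widths, kernel sizes, regularization, and dropout follow the description above:

```python
from tensorflow.keras import layers, models, regularizers

def build_cnn(input_shape=(128, 128, 3)):  # input size is an assumption
    l2 = regularizers.l2(0.001)
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # three conv blocks with 16, 32, 64 filters, 3x3 kernels
        layers.Conv2D(16, (3, 3), activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.3),
        layers.Dense(3, activation="softmax"),  # Oily / Dry / Normal
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
```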

MLP structure In the second approach, handcrafted features extracted from the preprocessed skin images were used to train an MLP model. The architecture begins with a dense input layer comprising 128 neurons, accepting 20 handcrafted features with ReLU activation and L2 regularization (\(\lambda = 0.03\)), followed by a 30% dropout layer to enhance robustness and mitigate overfitting. This is followed by two additional fully connected hidden layers of 64 and 32 neurons, respectively, each with ReLU activation, L2 regularization (\(\lambda = 0.03\)), and a 30% dropout layer. Finally, the output layer consists of three neurons with softmax activation, enabling three-class classification of skin types: “Oily”, “Dry”, and “Normal”.
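The MLP head admits a similar Keras sketch; again, the optimizer and loss are assumptions, while the layer sizes, regularization, and dropout follow the description above:

```python
from tensorflow.keras import layers, models, regularizers

def build_mlp(n_features=20):
    l2 = regularizers.l2(0.03)
    model = models.Sequential([
        layers.Input(shape=(n_features,)),   # 20 handcrafted features
        layers.Dense(128, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.3),
        layers.Dense(32, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.3),
        layers.Dense(3, activation="softmax"),  # Oily / Dry / Normal
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_mlp()
```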

Results and discussion

Data visualization

Figure 6(a) illustrates the visualization of 20 handcrafted features extracted from skin images using Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). PCA reduces the feature dimensionality to two principal components (PC1 and PC2), capturing the most significant variance in the data27. Although some overlap among the three skin types is observed, partially distinct clusters can still be identified.

In comparison, the t-SNE visualization maps the same feature set into a two-dimensional space while preserving local neighborhood relationships28. Both techniques reveal the structure and separability of the data; however, PCA provides a global overview of feature variance and shows partial class overlap, whereas t-SNE focuses on local structure preservation and demonstrates better inter-class separability. These results indicate that the handcrafted features contain discriminative information suitable for skin-type classification.
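The two projections can be reproduced with scikit-learn; the feature matrix below is a random stand-in for the paper’s 20-dimensional handcrafted vectors, and the t-SNE perplexity is an assumed default:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: n samples x 20 handcrafted features
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))

Xs = StandardScaler().fit_transform(X)          # scale before projection
pca_2d = PCA(n_components=2).fit_transform(Xs)  # global variance view
tsne_2d = TSNE(n_components=2, perplexity=30,   # local neighborhood view
               random_state=0).fit_transform(Xs)
```

Standardizing first matters because both PCA and t-SNE are sensitive to feature scale, and GLCM descriptors span very different numeric ranges than intensity statistics.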

Skin moisture prediction

Figure 5(a) compares the performance of the three models for skin moisture prediction using both the sensor and image datasets, evaluated using the \(R^2\) metric. On the sensor dataset, the RF model achieved the highest performance with an \(R^2\) of 0.91 on the test set, followed by the MLP and Linear Regression models with \(R^2\) values of 0.80 and 0.76, respectively. A similar performance trend was observed on the image dataset, although overall prediction accuracy was lower compared to the sensor data. Specifically, the RF model achieved an \(R^2\) of 0.73, while the MLP and Linear Regression models yielded \(R^2\) values of 0.57 and 0.34, respectively. This performance gap is expected, as the IDC sensor directly measures moisture-induced capacitance changes, whereas the camera only captures indirect visual manifestations of skin moisture.

Figure 5(b) illustrates the moisture prediction results on the test set, where the left panel shows predictions based on IDC sensor data and the right panel presents predictions based on image data. Each sample represents a paired measurement consisting of the skin-moisture value predicted by the IDC sensor and the corresponding reference moisture value obtained from the reference device. The two points representing the same sample move closer together as the predicted value approaches the actual reference value. The quantitative comparison in Fig. 5(c) demonstrates that the IDC sensor-based model outperformed the image-based model, achieving a lower MSE (4.45 vs. 5.49) and a higher \(R^2\) (0.73 vs. 0.66), indicating that the impedance-based approach provides more accurate and reliable information for predicting skin moisture levels.

Fig. 7

Real-time predictions of skin moisture and skin type displayed in the GUI, where the system estimated a skin moisture level of 46.4% and classified the skin type as “Oily”.

Skin type prediction

Figure 6(b) shows the training and validation accuracy curves for skin type classification using both the CNN and MLP models. For the CNN, training terminated after 50 iterations, as the validation loss plateaued and showed no further improvement. The CNN achieved an overall classification accuracy of 84% on the test set.

In contrast, the MLP model continued training for 150 iterations and achieved a significantly higher overall classification accuracy of 93% across the three skin categories: “Oily”, “Dry”, and “Normal”. The MLP demonstrated better generalization, achieving higher accuracy and exhibiting less overfitting compared to the CNN. A detailed comparison of classification performance between the two models is provided in Table 1. These results indicate that the CNN, which learns features directly from raw images, performs less effectively on this dataset compared to the MLP, which relies on handcrafted features extracted from the preprocessed images. This suggests that the handcrafted features capture more discriminative and informative representations of skin characteristics, leading to better classification accuracy. Consequently, for this dataset, incorporating domain-specific feature engineering provides a clear advantage over purely end-to-end deep learning approaches.

Table 1 Performance comparison between CNN and MLP models for skin type classification.

Figure 7 illustrates the real-time prediction results for both skin type and skin moisture, leveraging data from the impedance sensor and image sensor. In this representative case, the regression model predicted a skin moisture level of 46.4%, along with an ambient temperature of \(32.57^\circ\)C and relative humidity of 75.47%. Additionally, the system successfully classified the skin type as “Oily”, demonstrating its capability for integrated, multimodal skin analysis in real time.

Conclusions

In this study, we presented the design and development of a portable device for effective and non-invasive skin monitoring. The proposed system integrates impedance measurements and imaging techniques to capture comprehensive information about skin conditions. By leveraging machine learning models, the device accurately predicts skin moisture levels and classifies skin types, providing a foundation for personalized skincare recommendations.

Experimental evaluations demonstrated that the RF algorithm achieved superior performance in moisture prediction, particularly when using impedance-based data, which proved more informative than image-based inputs. For skin type classification, an MLP trained on handcrafted features outperformed a CNN trained on raw images, underscoring the effectiveness of feature-engineered approaches in this context.

This study offers a key advantage through its multimodal integration of bio-impedance sensing and imaging within a low-cost platform, enabling a more comprehensive assessment of skin health. However, the current work is limited by the relatively small sample size, which may affect the reliability and generalizability of deep learning models.

Building on the current findings, the project holds considerable potential to evolve into a more versatile and accurate device. A primary objective is to expand the dataset to enhance model generalization and enable finer-grained classification of additional skin attributes. Beyond moisture estimation and basic skin-type categorization, future versions of the system will incorporate advanced predictive functionalities such as oiliness level, skin-age estimation, skin tone analysis, and preliminary detection of dermatological conditions.

From a technological standpoint, the development roadmap includes implementing a web-based application to ensure flexible access across any Wi-Fi-enabled device, extending usability beyond the current local interface. This aligns with the broader aim of integrating the system into consumer wearable technologies to support continuous skin-health monitoring. Ultimately, these enhancements will strengthen the device’s role in delivering personalized skincare solutions and accessible, user-friendly medical assessment tools.