Introduction

Parkinson’s disease (PD) is a neurodegenerative disorder that is primarily characterized by the degeneration of dopaminergic neurons and the aberrant accumulation of alpha-synuclein1,2. The fundamental approach to managing PD is dopamine replacement therapy2,3. The efficacy of this therapeutic approach is currently evaluated through medical consultations, symptom assessments, and patient diaries based on the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part III, which serves as a quantifiable measure of motor symptoms4,5. However, this assessment methodology is labour-intensive, time-consuming, and confined to hospital visits. Therefore, there is a pressing requirement for a simple, immediate, continuous, and quantitative approach to monitoring the fluctuating symptoms of PD in any setting. Despite the advent of numerous wearable devices, their accessibility for clinical use remains inadequate, and standardization of these devices has yet to be achieved6.

The involuntary act of eye blinking during arousal, commonly referred to as “blink” or “spontaneous eye blink,” is governed by neural circuits that are mediated by the basal ganglia7. The spinal trigeminal complex is a major element in the spontaneous blink generator8. There is evidence that the basal ganglia, via the superior colliculus and nucleus raphe magnus, modulate input to and excitability of the trigeminal complex, thus providing a pathway through which dopamine could affect the trigeminal complex and, in turn, blinking9. Previous studies have indicated that blink rate is affected by age10, emotional state11, dopaminergic treatment12, performance in mental tasks13, and test conditions such as conversation, reading, and watching14. It is well documented that the spontaneous eye blink rate (sEBR) decreases following the administration of dopamine antagonists such as haloperidol and increases following the administration of dopamine agonists such as apomorphine15. Moreover, alterations in sEBR have been observed in diseases associated with dopamine abnormalities. For example, an increase in spontaneous eye blinking has been observed in patients with schizophrenia, which is marked by excessive dopamine16. In contrast, patients with PD collectively exhibit a reduction in sEBR compared to healthy controls14,17. Nevertheless, the administration of dopamine has been observed to elevate the blinking frequency of patients with PD to a level within the normal range17,18,19,20,21. Kaminer et al. suggested that dopamine inhibits the trigeminal complex, via its effects on the nucleus raphe magnus, which results in increased spontaneous blinking7,8. It is noteworthy that some patients with PD do not necessarily demonstrate a lower sEBR than that of healthy controls during OFF periods. Kimper et al. found that PD patients with motor fluctuations fell into two groups according to blink frequency: a low blink rate group, in which blink frequency decreased in the OFF state, and a high blink rate group, in which blink frequency increased in the OFF state. The high blink rate was found in 32% of the patients. In both groups, blink frequency normalized with L-dopa treatment. Clinical symptoms were compared between the two groups, yet there were no differences in age, duration of disease, disease severity, anti-PD drugs, levodopa equivalent daily dose (LEDD), or type of motor fluctuation. They considered that the high blink rate was caused by symptoms similar to blepharospasm as off-dystonia17.

There have been a number of studies on PD blink rate to date, yet they have issues with regard to observation duration and the data collection method10,12,14,19,20. Most of the studies were fixed-point measurement trials in the ON and OFF states that captured blinks over several minutes. On the other hand, Iwaki et al. found that fluctuations in blinking were consistent with those in PD symptoms over several hours in PD patients suffering from the wearing-off phenomenon21. However, they detected blinks using electromyography, which was associated with high noise levels and made accurate detection of blink rates difficult. In a notable contribution to the field, Kimura and colleagues employed a 1 kHz ultra-high-speed camera to examine blink kinetics in patients with PD22. Their findings revealed noteworthy differences from healthy subjects in several aspects, including eye-opening and eye-closing speed and the amplitude of blinks.

In this study, we utilized an eyeglasses-type device as an eye tracker to record the pupils and measure blinks. We hypothesized that blink features collected in this manner could predict scores of PD symptoms, symptom fluctuation, dyskinesia, and plasma L-dopa concentration, primarily when regular doses of L-dopa are administered. Using machine learning techniques, we trained models to predict the variability of PD symptoms after L-dopa administration based on these blink features.

Results

Patient Demographics

Table 1 delineates demographic and clinical attributes of the 20 patients incorporated in this study. The onset age was 63.05 (SD: 9.41) years, relatively youthful for PD. A distinct disparity was observed in the MDS-UPDRS-Part III score between off and on states, with dyskinesia noted in 7 patients (35%). The cognitive function of the patients remained intact.

Table 1 Demographic and clinical attributes of the 20 patients

PD symptoms and Blink information progression post L-dopa administration

The L-dopa/decarboxylase inhibitor (DCI) dosage varied among patients and was administered after the overnight off state. Motor symptom scores were evaluated in the overnight off state, when the patient symptom diary indicated a complete off state (score of 1). L-dopa blood levels rose and Part III scores decreased 30 to 60 min post-administration, when patient symptom diaries transitioned to 3 and 4, signifying the ON state. Accompanying these variations, sEBR also fluctuated noticeably: it increased in 9 patients (hereinafter referred to as the “Increased Group”), decreased in 6 (the “Decreased Group”), and did not change in 3 (the “Unchanged Group”). Categorization was unfeasible for the remaining two patients (hereinafter referred to as the “N/A Group”): ON periods did not occur in one patient, and the other exhibited extremely low pupil confidence, with the result that 75% of data points were discarded during data cleansing. A comparison of the clinical symptoms among those groups revealed no differences in disease duration, LEDD, quality of life scores, or cognitive function. However, the patients in the Unchanged Group were significantly older than those in the other groups. The rate of motor symptom improvement with L-dopa/DCI administration was significantly higher in the Increased Group than in the other two groups. Pharmacokinetic parameters did not differ significantly among the groups (Table 1). Nevertheless, plasma concentrations in the Increased and Unchanged Groups rose steeply to the peak (Supplementary Fig. 1), whereas the Decreased Group demonstrated a gradual change overall. Additionally, the concentrations remained low in the 2 patients of the N/A Group.

Machine learning

The sEBR of PD patients exhibited significant fluctuations accompanied by the administration of L-dopa/DCI. However, the clinical differences characterizing sEBR-pattern-based patient groups were not recognized. Therefore, while it is evident that there is a biological basis indicating the relationship among PD, sEBR, and dopamine, sEBR alone is insufficient to comprehensively describe the state of PD patients. Consequently, we have chosen not to apply the conventional sEBR as a special metric. Instead, we have aimed to utilize more multifaceted blink characteristics obtained from the eye-tracker data such as sEBR for machine learning models, so as to estimate the state of the patients.

We employed eXtreme Gradient Boosted Trees with early stopping as our regressor and classifier models. These models were chosen based on preliminary model evaluations by DataRobot, which demonstrated their consistently superior performance for regression and binary classification tasks. Model hyperparameters were set as follows: learning rate = 0.05, n_estimators = 1000, and max_depth = 5. See the official XGBoost Documentation (https://xgboost.readthedocs.io/en/stable/index.html) for details. Of the 20 selected features for each target variable, most were derivatives of blink confidence, interval, and duration, with a limited contribution from eye blink rate and its derivatives (refer to Fig. 1 for selected features and their contributions).

Fig. 1: Top 30 features ranked by relative importance during the test phase, as measured by mean SHAP values and frequency of selection.

For each target task, 20 blink-related features were selected during training, and their mean SHAP values (bars) and standard deviations are displayed. Bar colours represent the frequency with which each feature was selected across training iterations. Panels display feature importance for A Dyskinesia classification, B ON/OFF classification, C MDS-UPDRS Part III regression, and D Plasma L-dopa concentration regression. Refer to Fig. 4C and Tables 2 and 3 for details on feature categories and extraction processes.

Dyskinesia Classification (Fig. 2A, Supplementary Fig. 2): The performance of models was evaluated using the AUCROC over the entire evaluation period (Supplementary Fig. 2). Models trained using only blink-related features (B) achieved a mean AUCROC of 0.77 (SD: 0.17) during the test phase. In contrast, models relying solely on plasma L-dopa concentration as features performed near chance level, with a mean AUCROC of 0.61 (SD: 0.08). Adding background information features, such as elapsed time post-L-dopa administration and patient age (B + BG), did not improve predictive performance, resulting in a mean AUCROC of 0.77 (SD: 0.16). However, applying a post-processing smoothing technique to the blink-only model (B, Smoothed) significantly improved performance, achieving a mean AUCROC of 0.86 (SD: 0.11).
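As a point of reference for the AUCROC values reported here, the following minimal sketch computes the area under the ROC curve with the rank-based (Mann-Whitney) formulation. It is an illustration only, not the evaluation code used in this study; the function name `auc_roc` and the toy labels and scores are hypothetical.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 ground-truth values (1 = positive class).
    scores: sequence of predicted scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs that are ranked correctly;
    # a tie contributes half a correctly ranked pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))  # 0.75
```

An AUCROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.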

Fig. 2: ROC curves for dyskinesia and ON/OFF classification using different feature sets.

A ROC curves for dyskinesia classification during ON state, evaluated across patient groups including both dyskinetic and non-dyskinetic individuals, using models trained with blink-related features (“B”), blink + background features (“B + BG”), and plasma L-dopa concentration (“L”). B ROC curves for ON/OFF classification evaluated across all test patients using the same feature sets. The upper panels show AUCROC values for raw predictions, while the lower panels show AUCROC values after applying a smoothing technique on the prediction. The blue curves represent the mean ROC, with the shaded area indicating the 95% confidence interval (CI) of all ROC curves. Mean AUCROC values are displayed on each panel, with distinct letters indicating statistically significant differences (paired t-test with Bonferroni correction, α = 0.0033).
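The corrected threshold quoted in this caption is consistent with a Bonferroni adjustment of a nominal α = 0.05 across all pairwise comparisons among six conditions (three feature sets, each with raw and smoothed predictions). This reading is an assumption; the nominal level and the number of comparisons are not stated here:

$$\alpha_{\mathrm{corrected}} = \frac{0.05}{\binom{6}{2}} = \frac{0.05}{15} \approx 0.0033$$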

When dyskinesia predictions were restricted to the ON state (Fig. 2A), performance of the blink-only models remained consistent. Specifically, the mean AUCROC was 0.76 (SD: 0.17) for Model B and 0.85 (SD: 0.18) for the smoothed model prediction (B, Smoothed). For the model B + BG, the mean AUCROC was 0.70 (SD: 0.23), while that of the smoothed model B + BG performance was 0.73 (SD: 0.27). The model trained by using only plasma L-dopa concentration (L) showed the decline with a mean AUCROC of 0.45 (SD: 0.10).

Figure 3A demonstrates the time-series plots of dyskinesia prediction by the model trained with B features in a representative test group, showing a clear distinction in predicted dyskinesia likelihood between patients with and without actual dyskinesia. These results highlight the robustness of blink-related features in dyskinesia classification, particularly when post-processed with smoothing techniques, while suggesting limited utility of L-dopa concentration or background information for improving performance on dyskinesia prediction.
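The smoothing technique is not specified in this section. As a minimal sketch, assuming a centered moving average over consecutive time windows, post-processing of a prediction series could look as follows; the function name `moving_average` and the window length are hypothetical.

```python
def moving_average(values, window=5):
    """Centered moving average over a sequence of predictions.

    window: odd number of points to average; near the edges, only the
    points that exist are averaged (assumed edge handling).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy example: an isolated spike in the raw predictions is damped.
raw = [0.0, 0.0, 1.0, 0.0, 0.0]
print(moving_average(raw, window=3))
```

Averaging neighbouring windows suppresses isolated prediction spikes, which is one plausible reason smoothing could raise AUCROC when the underlying clinical state changes slowly.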

Fig. 3: Time-series predictions of dyskinesia and ON/OFF states for representative patients.

A Dyskinesia predictions using models trained with blink-related features (“B”) are shown for two non-dyskinetic patients (#1, #3) and one dyskinetic patient (#4). Dotted blue lines indicate raw predictions; solid blue lines show smoothed predictions. True ON states and observed dyskinesia periods are marked with red and orange lines, respectively. B ON/OFF predictions using models trained with blink + background features (“B + BG”) are shown for the same patients. Dotted orange lines indicate raw ON predictions; solid orange lines show smoothed predictions. True ON states are marked with red lines.

ON/OFF Classification (Fig. 2B): ON/OFF classification was also evaluated using the AUCROC. In the test phase, the model with B features achieved a mean AUCROC of 0.69 (SD: 0.14), which was not significantly different from that of L (mean AUCROC = 0.73 (SD: 0.12)). Adding background features (B + BG) and post-processing smoothing (B, Smoothed) both significantly improved prediction accuracy, a trend that differed from dyskinesia prediction. The highest mean AUCROC was 0.87 (SD: 0.10) (B + BG, Smoothed), which was significantly higher than in any other condition.

MDS-UPDRS Part III Regression (Supplementary Fig. 3): The plot of the test results for all patients showed a weak Spearman’s correlation ρ between predicted and actual scores in all feature combinations (0.18 ≤ ρ ≤ 0.39; B, B + BG, L) (Supplementary Fig. 3A). Correlation ρ between predicted and true scores of individual patients is shown in Supplementary Fig. 3B. The model trained solely on B features achieved a ρ > 0 with statistical significance in only 30% of the patients (Supplementary Fig. 3C), demonstrating a weak average correlation (mean ρ = 0.19 (SD: 0.22)). The addition of background features and smoothing post-processing both improved the correlation. The highest mean ρ was achieved with the combination of both additives (i.e., B + BG Smoothed) showing a moderate correlation (ρ = 0.54 (SD: 0.24)) but without significant difference from Smoothed L (mean ρ = 0.48 (SD: 0.35)) (Supplementary Fig. 3B). The percentage of patients with ρ > 0 with a statistical significance also improved with these additions (Supplementary Fig. 3C). DDTW analysis indicated that the post-processing smoothing significantly improved the extent to which the predicted data could replicate original trends (Supplementary Fig. 3D). Supplementary Fig. 3E presents the time-restored predicted scores for two representative patients with the best-performing B + BG model. Although the moving average of the prediction deviated from the absolute MDS-UPDRS Part III total scores, it potentially captures their temporal change patterns.

Plasma L-dopa Concentration Prediction (Supplementary Fig. 4A): Both B and B + BG feature combinations showed weak correlation ρ between predicted values and actual values in the plot of all test data (0.11 ≤ ρ ≤ 0.37) (Supplementary Fig. 4A). The correlation ρ between prediction and true scores of individual patients and the percentage of patients with ρ > 0 with a statistical significance both improved with the addition of background features and smoothing post-processing, with the combination showing the highest average correlation (B + BG, smoothed; mean ρ = 0.57 (SD: 0.21)) (Supplementary Fig. 4B, C). DDTW analysis indicated significant improvement with post-processing smoothing (Supplementary Fig. 4D).

Details of the statistical analysis, such as t-statistics and P-values, are summarized in Supplementary Table 1.

Discussion

This study demonstrated that the changes in blink features following oral administration of L-dopa were associated with motor symptoms and motor complications of PD. The analysis of blink rates by traditional methods is limited in scope, prompting our research to explore novel blink features and employ machine learning techniques. We have developed machine learning models that are capable of concurrently estimating the presence of dyskinesia, ON/OFF symptoms, MDS-UPDRS Part III scores, and plasma L-dopa concentration based on eye blink features. Our findings indicate that eye blink represents a promising, straightforward, non-invasive digital biomarker that reflects a patient’s concurrent clinical status.

This study employs a distinctive approach by utilizing a comprehensive range of blink-related features, extending beyond the mere measurement of blink rate, to anticipate motor fluctuations in patients with PD. Previous research, including our own21, has been limited by the inconsistency of findings regarding the change patterns in blink rate in response to L-dopa. The predictive capacity of the models was augmented by the incorporation of a multitude of blink-related features, including confidence, duration and interval, elapsed time following L-dopa administration, and patient age.

In both the prediction of dyskinesia and the ON/OFF state, models based on blink features alone demonstrated high classification accuracy. In particular, the blink-feature model demonstrated superior predictive performance compared to the plasma L-dopa concentration model in predicting dyskinesia. This finding suggests that blinking may serve as a more precise indicator of the central nervous system status in PD patients than peripheral L-dopa levels. The incorporation of background features and post-processing smoothing enhanced the performance of the model for ON/OFF prediction. The regression models for MDS-UPDRS Part III and plasma L-dopa concentration showed moderate positive correlations in a number of patients when background information and smoothing post-processing were applied. Among the background features, elapsed time following L-dopa administration was a major contributor to those models (Fig. 1), which we presume reflects its direct correlation with plasma L-dopa concentration. On the other hand, “patient age” consistently contributed only to the estimation of MDS-UPDRS Part III, which suggests an indirect relationship between age and the severity of motor symptoms23.

For MDS-UPDRS Part III subscores of tremor, rigidity, bradykinesia, and axial symptoms, models based on blink-related features (Models B and B + BG) showed no statistically significant differences in Spearman’s correlation coefficients (ρ) across subscore categories (Supplementary Fig. 5A). For the smoothed model B, the mean correlation ranged from 0.24 to 0.29 (Supplementary Fig. 5A).

The blink confidence-related features often contributed more to the models’ performance than the traditional blink rate. Blink confidence, derived from eye-opening and eye-closing speed, duration, and amplitude, indicates the quality of a blink. It takes a larger value for a blink with rapid eyelid movement and complete covering of the pupil (Figs. 4B and 1).

Fig. 4: Blink data acquisition, analysis and machine learning procedure.

A The Pupil Core eye tracker (Pupil Labs GmbH, Berlin, Germany; the upper left image is cited from the product webpage (https://pupil-labs.com/products/core) with permission), an example of a pupil image from the infrared eye cameras, and the recording setup with the eye tracker wired to a smartphone. B Blink extraction from pupil data and blink feature extraction. Pupil Player extracts eye blinks based on pupil confidence. A differential filter was applied to extract blink onset and offset. Blink interval and duration were calculated from the onset and offset detection. Blink confidence was calculated as the proportion of the shaded area within the region enclosed by the dashed line calculated for each blink. C Data processing flow. The Base Blink Features #1 were transformed into three features through the Baseline Correction: Raw Value (retaining the original, unprocessed value), the Difference from Baseline (baseline: the mean prior to L-dopa administration within the data from the same individual), and the Absolute Value of the Difference from Baseline. These parameters were subsequently normalized in two ways: within the individual and across all patient data. The Base Blink Features #2 were initially converted into multiple statistical features using Statistical Feature Extraction, and then processed in the same manner as Base Blink Features #1, yielding a total of 468 features. These features were then selected for each model in the training phase based on their contribution. Details of features can be found in Tables 2 and 3. D Machine learning procedure. Time-series data were segmented by a time window of 3 min and treated as independent data points. Clinical indicators such as dyskinesia, ON/OFF, MDS-UPDRS Part III total score and plasma L-dopa concentration were set as target variables, while blink features were set as feature variables. Machine learning was conducted using DataRobot, an automated machine learning platform, and custom Python programs. The DataRobot logo is cited from the product webpage (https://www.datarobot.com/) with permission. See text for details.
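The baseline correction described in panel C can be sketched as follows. This is a minimal illustration for a single base feature of one patient; the function names are hypothetical, and z-scoring is an assumed choice because the normalization method is not named in the caption.

```python
from statistics import mean, pstdev


def baseline_features(values, minutes_from_dose):
    """Derive raw, baseline-difference, and absolute-difference features.

    values: one base blink feature per time window for a single patient.
    minutes_from_dose: window times relative to L-dopa administration
        (negative = before administration).
    """
    pre_dose = [v for v, t in zip(values, minutes_from_dose) if t < 0]
    if not pre_dose:
        raise ValueError("no pre-administration windows to define a baseline")
    baseline = mean(pre_dose)  # mean prior to administration, same individual
    diff = [v - baseline for v in values]
    return {
        "raw": list(values),
        "diff": diff,
        "abs_diff": [abs(d) for d in diff],
    }


def z_score(values):
    """Z-score normalization (assumed; the method is not stated in the text)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy example (hypothetical values): two windows before dosing, two after.
feats = baseline_features([10, 12, 15, 8], [-20, -10, 30, 60])
print(feats["diff"], feats["abs_diff"])
```

Applying `z_score` to one patient's values would correspond to within-individual normalization, and applying it to values pooled over all patients to normalization across all patient data.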

These findings indicate the potential of eye blinking, in conjunction with basic background information, to provide an objective assessment of ON/OFF and dyskinesia symptoms in real time. This could be a pivotal advance in the assessment of therapies for motor fluctuations, including anti-dyskinetic drugs. The headset used in this study is lightweight, weighing only 9 g, and no patients reported any discomfort during its use. Today, while most camera-based wearable eye trackers, such as those from Tobii and Pupil Labs, are primarily used in industrial or research settings, eye-tracking technology is becoming increasingly integrated into consumer-grade devices. Additionally, technologies capable of quantifying blinks using smartphone or fixed video cameras have been widely explored24. Combining such technologies could support telemedicine by analyzing eye blinks in remotely recorded video data, enabling objective evaluation of PD motor symptoms without requiring frequent clinic visits. Nevertheless, further research with a larger data set and more sophisticated modelling is required before this approach can be applied in practice to accurately regress MDS-UPDRS Part III scores. It is noteworthy that the present study included patients with diverse blink patterns, although the training did not involve stratification during the machine learning phase. Moreover, the L-dopa dosages utilized in this study, ranging from 100−200 mg, constituted part of each patient’s standard treatment regimen. This suggests that our models are capable of effectively capturing symptom fluctuations in real-life settings, thereby indicating the potential applicability of our blink model in clinical practice.

The following limitations are intrinsic to this study: This study is exploratory in nature and based on a limited sample size of 20 cases from a single institution, with only seven patients developing dyskinesia. Moreover, the study concentrated on advanced-stage PD patients who exhibited discernible responses to L-dopa. The diagnosis of PD was based on clinical criteria and not confirmed pathologically. To provide more comprehensive validation and practical application, future studies should include a larger, multicenter cohort with patients at various stages of PD.

In conclusion, this study demonstrates the potential of blink features, either alone or in combination with other features, as an innovative and non-invasive tool for the real-time monitoring of PD clinical status, both inside and outside of hospital settings.

Methods

Subject recruitment and ethical approval

Patients with PD were recruited from the Department of Neurology at Juntendo University School of Medicine between May and December 2021. All patients provided written informed consent after being fully informed of the purpose and procedures of the study. This clinical trial was approved by the Research Ethics Committee, Faculty of Medicine, Juntendo University (H20-0376) and registered in the University Hospital Medical Information Network Clinical Trials Registry (UMIN000044246).

The study design and patient population are as follows:

This study is an uncontrolled, open-label, and exploratory clinical study. It consists of 1 week of baseline assessment, habituation to the glasses-type device, and training in symptom diary, followed by a 1 day blink evaluation. On the day of the blink evaluation, the patients were administered a single dose of L-dopa/DCI in the fasting state, following a 12 h discontinuation of anti-PD medication. The blink information, fluctuations in Parkinson’s motor symptoms, and L-dopa pharmacokinetics were evaluated for a maximum of 4 h after administration.

The subjects were 20 patients with advanced-stage PD and fluctuating symptoms, who had been hospitalized for device therapy because the wearing-off phenomenon was causing difficulties in their daily lives. Patients were enrolled if they met all of the inclusion criteria and none of the exclusion criteria listed below.

Inclusion criteria: (1) Clinically established or probable PD meeting the MDS clinical diagnosis criteria for PD (2015), (2) Stage ≤ III on the Hoehn and Yahr scale in the ON state, (3) Receiving treatment with L-dopa for ≥ 6 months (26 weeks) and showing its effects, (4) Patients with advanced PD who are to be hospitalized for medical evaluation, drug adjustment, and rehabilitation, (5) Capable of providing voluntary written informed consent based upon sufficient understanding of the research, (6) Patients who regularly use eyeglasses in daily life must be able to join the evaluation without them.

Exclusion criteria: (1) Atypical Parkinsonism syndromes, (2) Dementia or at high risk of it (Mini Mental State Examination (MMSE) score ≤ 20), (3) Contraindicated for concomitant medications (L-dopa/DCI), (4) Hypersensitivity to concomitant medications (L-dopa/DCI) and/or their ingredients, (5) Co-existing psychiatric disease (e.g., depression, bipolar disorder or schizophrenia) and/or clinically significant complications (e.g., cerebrovascular accident, heart disease, chronic respiratory disease, uncontrolled hypertension and diabetes), (6) History of psychiatric disease (e.g., depression, bipolar disorder or schizophrenia) and/or device-aided therapies (i.e., GPi pallidotomy, thalamotomy, and deep brain stimulation), (7) Those whom the principal investigator judges to be inappropriate as research subjects.

Clinical assessments

The subjects underwent a series of assessments at baseline to obtain data regarding their age, gender, body mass index, disease duration, Mini-Mental State Examination (MMSE) scores, levodopa equivalent daily doses, and stage of the disease according to the Hoehn-Yahr scale, both during and outside of episodes. The following instruments were utilized: the EuroQol 5 Dimensions 5-Level (EQ-5D-5L), the Parkinson’s Disease Questionnaire-39 (PDQ-39), the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Parts I, II, III, IV, and the Unified Dyskinesia Rating Scale (UDysRS). To confirm L-dopa levels on the day of blink evaluation, multiple measurements of MDS-UPDRS part III, UDysRS, and bedside score were obtained at the same time as blood collection. The MDS-UPDRS and UDysRS were evaluated by trained experts. Patients were instructed to record bedside scores for their motor symptoms of PD in their diaries, according to a four-point scale: completely off (1), partially off (2), partially on (3), and completely on (4), while discussing the symptoms with their attending physicians.

Pharmacokinetics assessments of levodopa

Following an overnight fast and medication-free period of 12 h, the patients were administered a single dose of L-dopa/DCI, which was the same dose as that for usual regimen. Blood samples were collected for pharmacokinetic analysis of L-dopa at the following time points: before administration and at 15, 30, 45, 60, 90, 120, 150, 180, and 240 min after administration. Plasma L-dopa concentration was determined by the validated method described in Measurement of L-dopa Concentration Section. The maximum L-dopa concentration (Cmax), time to maximum L-dopa concentration (Tmax), and area under the concentration-time curve from time 0 to the last measured time point (AUC0-t) were obtained as parameters of L-dopa pharmacokinetics.
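The three pharmacokinetic parameters can be derived from the sampled concentrations as sketched below. The linear trapezoidal rule for AUC0-t is an assumption, since the integration method is not stated here; the function name `pk_parameters` and the toy concentrations are hypothetical.

```python
def pk_parameters(times_min, concentrations):
    """Return (Cmax, Tmax, AUC0-t) from paired time/concentration samples.

    times_min: sampling times in minutes, in increasing order.
    concentrations: plasma concentration at each sampling time.
    AUC0-t uses the linear trapezoidal rule (assumed method).
    """
    times = list(times_min)
    conc = list(concentrations)
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    cmax = max(conc)
    tmax = times[conc.index(cmax)]  # first time at which the maximum occurs
    auc = sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:])
    )
    return cmax, tmax, auc


# Sampling schedule from the text; concentrations are made-up toy values.
times = [0, 15, 30, 45, 60, 90, 120, 150, 180, 240]
conc = [0.0, 2.0, 6.0, 9.0, 8.0, 6.0, 4.0, 3.0, 2.0, 1.0]
cmax, tmax, auc = pk_parameters(times, conc)
print(cmax, tmax, auc)
```

The result has units of concentration × minutes, so the AUC unit follows directly from the concentration unit used for the input.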

Measurement of L-dopa concentration

The plasma L-dopa concentration was measured by high-performance liquid chromatography (Thermo Scientific™ UltiMate™ 3000 HPLC system). Briefly, 500 μL of plasma was obtained by centrifugation of blood collected in EDTA-2Na tubes (3000 rpm, 10 min, 4 °C), mixed with 50 μL of 60% perchloric acid and centrifuged (12,000 rpm, 40 min, 4 °C). The supernatant was further centrifuged in an ultrafree tube, and the newly obtained purified supernatant (25 µL) was injected into an HPLC system equipped with a WPS-3000 TRS autosampler with a cooling device, an Acclaim™ 120 C18 column (Ф4.6 × 150 mm), and an ECD-3000RS detector. The mobile phase buffer A was prepared by adding 27.6 g sodium phosphate, 680 µL of 0.2 mg/mL nitrilotriacetic acid, and 100 µL tetrahydrofuran to distilled water so that the total volume would be 2 L, and then mixing it with 400 µL 5% SDS, 200 µL ProClin150, and 728 µL phosphoric acid. The mobile phase buffer B was prepared by adding 27.6 g sodium phosphate and 100 µL of 0.2 mg/mL nitrilotriacetic acid to distilled water so that the total volume would be 1 L, and then mixing it with 1060 mL methanol, 4.2 mL 5% SDS, and 6 mL phosphoric acid. The gradient elution was delivered as follows (A:B): 0–2.5 min, 96:4; 2.5–12.5 min, 96:4–62:38; 12.5–21.5 min, 62:38–40:60; 21.5–25.5 min, 40:60–10:90; 25.5–30.5 min, 10:90; 30.5–31.5 min, 10:90–96:4. L-dopa was separated from the buffer solution at a flow rate of 1.0 mL/min and column temperature of 31 °C. Chromatographs were analyzed by Chromeleon 7.2. The limit of detection and quantification were 6.74 pmol/mL and 22.4 pmol/mL, respectively.

Acquisition of pupil data

Pupil data were collected from the eyes of patients using a wearable eye-tracking headset (Fig. 4A) (Pupil Core; Pupil Labs GmbH, Berlin, Germany) for a period of 4 h following L-dopa administration on the day of blink evaluation, with data recorded from 30 min prior to administration until the end of the 4-h period. The headset frame is made of flexible and lightweight plastic (9 g)25. This design did not cause any oppressive or uncomfortable feelings when wearing the device. The headset was equipped with two infrared eye cameras, with one camera dedicated to each eye and a sampling rate of 200 Hz. A USB cable was utilized to establish a connection between the headset and a smartphone (Android OS, version 11) affixed to the patient’s upper arm. The video data of the eyes was recorded with the Pupil Mobile software (version 1.2.3).

Blink data extraction

Pupil and blink data were extracted from recorded videos of eyes with Pupil Player software (v. 3.5.1) (Fig. 4B). Pupil Player detects pupils based on ellipse fitting25 to the infrared video image where a pupil appears as a black ellipse when the eyelid is opened. When the pupil is not detected at all, the value “pupil confidence” based on ellipse fitting was set to 0.0, while the value is 1.0 when the pupil is detected as an ellipse with high accuracy. Pupil confidence for each eye was concatenated by recorded time and a differential filter was applied to extract blink onset and offset. Blinking is detected with a decrease in filtered pupil confidence below the threshold (onset) caused by the concealment of the pupil by the eyelid, and a subsequent increase in filtered pupil confidence above the threshold (offset) caused by eye-opening, occurring at a time less than the filter length. The data were cleaned based on the confidence in the pupil detection to eliminate poor recordings of eye opening, as follows: 1. Data division: We divided the 200 Hz pupil data into 3-min time windows. Each represents an independent data point, focusing on eye blinking characteristics. 2. Criteria for data quality: (1) If pupil confidence is below 0.2, we consider the eyes closed. (2) If the eyes are closed for > 50% of the time window, or if the average pupil confidence when the eyes are open is below 0.5, we exclude this data point. This is because the patient’s eyes might not be fully open due to reasons like drowsiness or camera setup issues.
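The data quality criteria above can be expressed as a small filter applied to the pupil confidence samples of one 3-min window. This is a minimal sketch rather than the cleaning code used in this study; the function name `keep_window` and the constant names are hypothetical, while the thresholds are those given in the text.

```python
from statistics import mean

CLOSED_THRESHOLD = 0.2     # confidence below this: eyes considered closed
MAX_CLOSED_FRACTION = 0.5  # exclude if eyes are closed for more than this share
MIN_OPEN_CONFIDENCE = 0.5  # exclude if mean open-eye confidence is below this


def keep_window(confidences):
    """Return True if a time window passes both data quality criteria.

    confidences: pupil confidence samples (0.0 to 1.0) in one time window.
    """
    if not confidences:
        return False
    open_samples = [c for c in confidences if c >= CLOSED_THRESHOLD]
    closed_fraction = (len(confidences) - len(open_samples)) / len(confidences)
    if closed_fraction > MAX_CLOSED_FRACTION:
        return False
    # At most half of the samples are closed here, so open_samples is non-empty.
    return mean(open_samples) >= MIN_OPEN_CONFIDENCE


# Toy windows (hypothetical confidence values).
print(keep_window([0.9] * 8 + [0.1] * 2))  # mostly open, high confidence
print(keep_window([0.9] * 4 + [0.1] * 6))  # closed for more than half
print(keep_window([0.4] * 10))             # open but poorly detected
```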

Categorization of blink rate changes

Because PD patients have been reported to show either an increase or a decrease in sEBR17,18 in response to L-dopa, we categorized patients into “Increased,” “Decreased,” and “Unchanged” groups according to the change in blink rate from OFF to ON periods observed in our data. To do so, we first defined ON and OFF periods from the bedside score (BS; range 1-4):

$$\mathrm{ON} := \mathrm{BS} > 2$$
$$\mathrm{OFF} := \mathrm{BS} \le 2$$

We then calculated the number of blinks per minute during these periods, denoted sEBR_ON for ON periods and sEBR_OFF for OFF periods. A 15% change in sEBR_ON relative to sEBR_OFF was used as the criterion for assigning patients to the three groups:

$$\mathrm{Increased} := \frac{\mathrm{sEBR}_{\mathrm{ON}}}{\mathrm{sEBR}_{\mathrm{OFF}}} > 1.15$$
$$\mathrm{Decreased} := \frac{\mathrm{sEBR}_{\mathrm{ON}}}{\mathrm{sEBR}_{\mathrm{OFF}}} \le 0.85$$
$$\mathrm{Unchanged} := 0.85 < \frac{\mathrm{sEBR}_{\mathrm{ON}}}{\mathrm{sEBR}_{\mathrm{OFF}}} \le 1.15$$

The 15% threshold was chosen subjectively and retrospectively to best represent our observational results.
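
The ON/OFF definition and the three-way categorization can be written as a short Python sketch. The function names are hypothetical and the handling of a zero OFF-period blink rate is an assumption; the cut-offs (BS > 2, 1.15, and 0.85) follow the definitions above.

```python
def is_on(bedside_score):
    """ON period if the bedside score (1-4) is greater than 2, otherwise OFF."""
    return bedside_score > 2


def categorize_change(sebr_on, sebr_off, upper=1.15, lower=0.85):
    """Categorize a patient by the ratio of ON to OFF blink rate (blinks/min)."""
    if sebr_off <= 0:
        # Assumption: the ratio is undefined without blinks in the OFF period.
        raise ValueError("sEBR_OFF must be positive to form the ratio")
    ratio = sebr_on / sebr_off
    if ratio > upper:
        return "Increased"
    if ratio <= lower:
        return "Decreased"
    return "Unchanged"
```

For example, a patient blinking 24 times per minute during ON periods and 20 times per minute during OFF periods has a ratio of 1.2 and falls into the Increased group, whereas a ratio of exactly 1.15 remains Unchanged because the upper bound is strict.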

Machine learning procedure and statistical analysis

We aimed to estimate concurrent clinical information from the blink features within a given time window using machine learning (Fig. 4D). To this end, we used both the DataRobot platform (versions 7.2.8 and 8.0.6; DataRobot, Inc.), which automates model selection and hyperparameter tuning (automated machine learning), and custom Python programs. Ensemble models were intentionally omitted to simplify interpretation.

Data processing

Time-series blink data were segmented into 3-min windows, each treated as an independent data point. We used 1485 cleaned data points from 20 patients. The target variables were: 1. Presence of dyskinesia (binary classification), 2. Bedside score-based ON/OFF state (binary classification), 3. Total MDS-UPDRS Part III score (regression), 4. Plasma L-dopa concentration (regression).

Because the actual values of the target variables were obtained only every 15–30 min, we used linear interpolation to assign target values to the data points falling within these intervals.
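
As an illustration of this step, the following Python sketch interpolates sparsely measured target values at the time of each 3-min window. It is a sketch under stated assumptions rather than the study code: the function name is hypothetical, each window is represented by a single time point (for example its midpoint), and windows outside the measured range are clamped to the nearest measurement, which the text does not specify.

```python
from bisect import bisect_right


def interpolate_target(window_times, measured_times, measured_values):
    """Linearly interpolate sparse measurements at each window time.

    measured_times must be sorted in ascending order; times are in minutes.
    """
    result = []
    for t in window_times:
        if t <= measured_times[0]:
            result.append(float(measured_values[0]))   # assumption: clamp before first
            continue
        if t >= measured_times[-1]:
            result.append(float(measured_values[-1]))  # assumption: clamp after last
            continue
        j = bisect_right(measured_times, t)            # first measurement after t
        t0, t1 = measured_times[j - 1], measured_times[j]
        v0, v1 = measured_values[j - 1], measured_values[j]
        result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result
```

With scores of 10, 20, and 14 measured at 0, 15, and 30 min, a window centred at 7.5 min receives the value 15 and a window centred at 22.5 min receives the value 17.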

Feature extraction and normalization

We extracted 468 blink-related features from each time window, comprising base blink features, such as blink rate, blink interval, and blink confidence, and their derivatives. The base blink features are defined in Table 2 and Fig. 4B. Derivative features were obtained as follows (see also Fig. 4C and Table 3). First, blink rate and energy, referred to as “base blink features #1,” were obtained as a single value per time window. Baseline correction against the individual’s average before L-dopa administration yielded three types of features: the original unprocessed value, the difference from baseline, and the absolute difference from baseline. These features were then normalized in two ways: within each patient’s data, to capture intra-individual variability, and across patients, to clarify each patient’s relative position. In the training phase, normalization used data from all patients included in training; in the testing phase, the test patient’s features were normalized using data from all patients. Similarly, base blink features #2 (interval, duration, confidence, and depth) were calculated for each blink within the time window, from which statistical parameters such as the mean, median, standard deviation, maximum, and minimum were derived. In addition, the mean and standard deviation over all blinks observed during each individual’s recording were used to assign each blink to one of five categories (LOW, MID-LOW, MID, MID-HIGH, and HIGH) according to the rules detailed in Table 3. The frequency of each category was then counted within each time window; for instance, if three LOW and two MID blinks were observed for blink confidence, the features were recorded as CONFIDENCE_FRQ_LOW = 3 and CONFIDENCE_FRQ_MID = 2.
Furthermore, the relative frequency of each category and its ratio to the most frequently observed category (MID) were computed. In the example above, CONFIDENCE_FRQ_LOW_REL = CONFIDENCE_FRQ_LOW / CONFIDENCE_FRQ_MID = 3 / 2 = 1.5. The same baseline correction and normalization procedures as for base blink features #1 were then applied, resulting in a total of 468 blink-related features.
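
The baseline correction and the category-frequency features can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the per-blink category labels are taken as given because the assignment rules are defined in Table 3, feature names other than those quoted in the text follow the same pattern by assumption, and the ratio is simply omitted when no MID blink occurs in a window.

```python
from collections import Counter
from statistics import mean

CATEGORIES = ("LOW", "MID-LOW", "MID", "MID-HIGH", "HIGH")


def baseline_features(value, pre_dose_values):
    """Return the raw value, its difference from the pre-dose baseline, and the absolute difference."""
    baseline = mean(pre_dose_values)
    diff = value - baseline
    return {"raw": value, "diff": diff, "abs_diff": abs(diff)}


def category_features(name, blink_categories):
    """Count blinks per category in one window and form ratios to the MID category."""
    counts = Counter(blink_categories)
    features = {f"{name}_FRQ_{c}": counts[c] for c in CATEGORIES}
    mid = counts["MID"]
    if mid > 0:  # assumption: ratios are left out when MID is absent from the window
        for c in CATEGORIES:
            if c != "MID":
                features[f"{name}_FRQ_{c}_REL"] = counts[c] / mid
    return features
```

Calling `category_features("CONFIDENCE", ["LOW"] * 3 + ["MID"] * 2)` reproduces the worked example in the text, giving CONFIDENCE_FRQ_LOW = 3, CONFIDENCE_FRQ_MID = 2, and CONFIDENCE_FRQ_LOW_REL = 1.5.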

Table 2 Summary of base blink features, background and reference features
Table 3 Construction of derivative features. Also see the text and Fig. 4C for a detailed feature engineering procedure

In addition to blink features, three non-blink features were included: plasma L-dopa concentration, elapsed time after L-dopa administration, and patient age. Plasma L-dopa concentration requires invasive sampling but reflects PD motor symptoms26, whereas elapsed time after L-dopa administration is easily obtained and expected to correlate with PD motor symptoms. Patient age was included because it is known to relate to both blink dynamics and PD symptoms23,27. To prevent overfitting arising from the predictable data collection intervals and the small sample size, Gaussian noise was added to elapsed time (σ = 30) and to patient age (σ = 3). These features were normalized in two ways: first, within-patient normalization was applied to capture each patient’s relative variability; second, across-patient normalization was performed to clarify relative positioning across the patient group (Fig. 4B, C; Tables 2 and 3).
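
A minimal Python sketch of the noise injection and the two normalizations is given below. Several details are assumptions that the text does not specify: noise is drawn independently for every data point, normalization is implemented as z-scoring, and all function names are hypothetical. Only the σ values come from the text.

```python
import random
from statistics import mean, pstdev


def add_gaussian_noise(values, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def zscore(values):
    """Standardize values; a constant input maps to zeros (assumed convention)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def normalize_two_ways(values, patient_ids):
    """Return (within-patient, across-patient) normalized copies of one feature."""
    across = zscore(values)
    within = [0.0] * len(values)
    for pid in set(patient_ids):
        idx = [i for i, p in enumerate(patient_ids) if p == pid]
        for i, z in zip(idx, zscore([values[k] for k in idx])):
            within[i] = z
    return within, across
```

In this sketch, elapsed time would be perturbed with `add_gaussian_noise(elapsed, sigma=30)` and age with `add_gaussian_noise(age, sigma=3)` before `normalize_two_ways` is applied.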

Model construction and evaluation

We constructed and evaluated models using a leave-one-patient/group-out approach. For ON/OFF state, MDS-UPDRS Part III score, and plasma L-dopa concentration, models were trained on data from 19 patients and evaluated on the remaining patient. For dyskinesia, only seven patients exhibited the symptom, so a leave-one-patient-out evaluation with AUCROC was not feasible (AUCROC is undefined for a held-out patient who never shows dyskinesia). We therefore assigned one or two non-dyskinetic patients to each dyskinetic patient, forming two- or three-patient groups for leave-one-group-out evaluation. Non-dyskinetic patients were selected at random from different blink rate change patterns (i.e., Increased, Decreased, or Unchanged) to ensure diversity. To prevent bias, we prepared 14 groups such that each patient was included in two groups and was never paired with the same patient twice.
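
The splitting scheme can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that, for the group-based evaluation, all data points from the patients in the held-out group form the test set while data points from every other patient form the training set; the text does not state this explicitly.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices) for each patient."""
    for held_out in sorted(set(patient_ids)):
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        yield held_out, train, test


def leave_one_group_out(patient_ids, groups):
    """Yield (group, train_indices, test_indices) for each predefined patient group.

    groups is a list of tuples of patient identifiers; a patient may appear
    in more than one group.
    """
    for group in groups:
        members = set(group)
        test = [i for i, p in enumerate(patient_ids) if p in members]
        train = [i for i, p in enumerate(patient_ids) if p not in members]
        yield group, train, test
```

Each yielded index pair selects the rows (3-min data points) used to fit and to evaluate one model, so no data point from a held-out patient contributes to the model that is tested on that patient.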

During each training phase of the leave-one-patient/group-out procedure, 20 blink-related features were selected for each target variable in two steps. First, the Boruta algorithm was used to exclude features whose contributions were statistically lower than those of random noise28. Next, SHAP (SHapley Additive exPlanations) analysis29 was applied iteratively, removing the lowest-contributing 10% of the remaining features at each iteration until 20 features remained.
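
The second, iterative step can be sketched in Python as below. The Boruta step is not shown, and `importance_fn` is a hypothetical stand-in for refitting a model on the current feature set and summarizing each feature's contribution (for example by its mean absolute SHAP value). Rounding the 10% down, with at least one feature removed per iteration, is also an assumption.

```python
import math


def prune_features(features, importance_fn, target_count=20, drop_rate=0.10):
    """Iteratively drop the least important features until target_count remain.

    importance_fn(selected) must return a dict mapping each selected feature
    to an importance score; it is re-evaluated at every iteration.
    """
    selected = list(features)
    while len(selected) > target_count:
        scores = importance_fn(selected)
        n_drop = max(1, math.floor(len(selected) * drop_rate))
        n_drop = min(n_drop, len(selected) - target_count)  # never overshoot the target
        ranked = sorted(selected, key=lambda f: scores[f])   # least important first
        to_drop = set(ranked[:n_drop])
        selected = [f for f in selected if f not in to_drop]
    return selected
```

Because the importances are recomputed after every removal, a feature whose contribution rises once a correlated feature has been dropped can survive later iterations, which is the main reason for removing features gradually rather than keeping the top 20 in a single pass.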

Evaluation metrics and statistical analysis

Model performance was evaluated according to the task. For the binary classification tasks (ON/OFF state and dyskinesia), models were evaluated with the LogLoss metric, and the area under the receiver operating characteristic (ROC) curve (AUCROC) was calculated for each eligible patient or patient group and compared. For dyskinesia, ROC curves were generated for the 14 groups, both for the ON state only and for the entire evaluation period. For ON/OFF state, ROC curves were generated for 19 patients; one patient who remained in the OFF state throughout and never showed an ON state was omitted. For the regression tasks (MDS-UPDRS Part III score and plasma L-dopa concentration), performance was assessed with the root mean squared error (RMSE). Spearman’s correlation (ρ) between predicted and actual values was then calculated for each patient and compared, together with the proportion of patients whose ρ was positive (ρ > 0) and statistically significant (two-sided test of no correlation, p < 0.05). To examine how well the predictions reproduced the original trends when restored to a time series, we calculated the Derivative Dynamic Time Warping (DDTW) distance for each patient30.
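
For readers unfamiliar with DDTW, the following Python sketch shows one common formulation: each series is replaced by a local derivative estimate, and standard dynamic time warping with a squared-difference cost is then applied to the derivative series. The exact variant used in the study (for example any normalization or warping-window constraint) is not specified in the text, so the details below are assumptions and the function names are hypothetical.

```python
def derivative(series):
    """Estimate the local derivative of a series of at least three points.

    Interior points average the slope to the left neighbour and the slope
    across both neighbours; the end points copy their nearest estimate.
    """
    n = len(series)
    if n < 3:
        raise ValueError("at least three points are required")
    inner = [((series[i] - series[i - 1]) + (series[i + 1] - series[i - 1]) / 2) / 2
             for i in range(1, n - 1)]
    return [inner[0]] + inner + [inner[-1]]


def ddtw_distance(a, b):
    """Accumulated squared-difference DTW cost between the derivatives of a and b."""
    da, db = derivative(a), derivative(b)
    n, m = len(da), len(db)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (da[i - 1] - db[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

Because the comparison is made on derivatives, a predicted series that is offset from the actual series by a constant but follows the same trend has a distance of zero, which matches the aim of assessing whether trends are reproduced rather than absolute values.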

Predictive performance was assessed for three feature combinations: 1. Selected blink features (B), 2. Blink features + background information (elapsed time post-L-dopa and patient age) (B + BG), 3. Plasma L-dopa concentration alone (L). We also evaluated predictive performance after post-processing the raw predictions with a 15-min moving average for practical application (smoothed). All feature combinations (B, B + BG, and L) were tested both with and without smoothing, and the results were compared using paired t-tests with Bonferroni’s correction.
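
The smoothing step can be illustrated with a short Python sketch. The text specifies only the 15-min span; the sketch assumes a trailing average (each prediction is averaged with those from the preceding 15 min, so that only past data are used) and a time-based window so that gaps left by excluded 3-min windows are handled naturally. The function name is hypothetical.

```python
def smooth_predictions(times_min, predictions, span_min=15.0):
    """Trailing moving average of predictions over the last span_min minutes.

    times_min must be sorted in ascending order and aligned with predictions.
    """
    smoothed = []
    for i, t in enumerate(times_min):
        recent = [p for s, p in zip(times_min[:i + 1], predictions[:i + 1])
                  if t - s < span_min]
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

With consecutive 3-min windows, each smoothed value is the mean of up to five raw predictions; early windows, for which fewer than five predictions are available, are averaged over the predictions that exist.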