Introduction

Curve fitting is a fundamental task in statistical modeling and data analysis, with applications spanning diverse fields such as physics, engineering, economics, and biology. Various methods have been developed over the years to address the challenges of fitting curves to observed data, including polynomial regression, splines, and kernel-based methods. B-splines, in particular, have emerged as a popular approach due to their flexibility and ability to provide smooth approximations while maintaining local control1,2,3,4,5,6,7. The use of B-splines in curve fitting has been extensively studied, and numerous advancements have been made to enhance their efficiency and robustness5,6,7,8,9.

Penalized regression is a powerful tool in statistical modeling in which penalty terms are added to the loss function to control the complexity of the fitted model. Depending on the penalty, this prevents overfitting and promotes smoothness or sparsity by shrinking the parameter estimates toward simpler structures. One popular penalty for promoting smoothness in regression is the total variation penalty, which limits the roughness of the estimated function10,11. Penalized regression techniques such as the lasso and ridge regression have also been combined with B-splines to control model complexity2,12. Recent developments have focused on improving the interpretability of fitted curves and extending these methods to high-dimensional data settings3,13,14.

The generalized lasso extends the classical lasso framework by incorporating a flexible penalty structure that can include not only simple sparsity but also penalties based on differences between coefficients, such as the total variation penalty15,16. This flexibility is particularly valuable when dealing with structured data, and it has been shown to be effective in capturing spatial and temporal dependencies in various applications. The penalized B-spline approach discussed in this paper is a specific instance of this broader generalized lasso framework.
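To make the difference-based penalty concrete, the Python sketch below builds a first-difference matrix D and evaluates the generalized lasso penalty \(\lambda {\Vert D\beta \Vert }_1\) for a coefficient vector \(\beta\); with this particular D, the penalty is \(\lambda\) times the total variation of the coefficient sequence. The helper names are hypothetical, and this is the classical scalar case rather than the group penalty proposed later in this paper.

```python
def first_difference_matrix(n):
    """(n-1) x n matrix D whose jth row gives beta_{j+1} - beta_j."""
    D = [[0.0] * n for _ in range(n - 1)]
    for j in range(n - 1):
        D[j][j] = -1.0
        D[j][j + 1] = 1.0
    return D


def matvec(D, beta):
    """Plain matrix-vector product D @ beta."""
    return [sum(d * b for d, b in zip(row, beta)) for row in D]


def generalized_lasso_penalty(beta, D, lam):
    """lam * ||D beta||_1."""
    return lam * sum(abs(v) for v in matvec(D, beta))


beta = [1.0, 1.0, 3.0, 3.0, 2.0]
D = first_difference_matrix(len(beta))
print(matvec(D, beta))                          # [0.0, 2.0, 0.0, -1.0]
print(generalized_lasso_penalty(beta, D, 0.5))  # 1.5 = 0.5 * (0 + 2 + 0 + 1)
```

Only the two nonzero differences contribute, so the penalty favors coefficient sequences that are piecewise constant.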

The Alternating Direction Method of Multipliers (ADMM) is an optimization technique that has gained significant popularity in recent years, especially for problems involving separable objectives and complex constraints. ADMM decomposes a complex optimization problem into simpler subproblems that are solved iteratively17,18. It has been used effectively for a wide range of statistical learning problems, including those involving lasso and generalized lasso penalties15,17,19,20. Previous research has shown that ADMM is particularly well suited to lasso and generalized lasso problems because it handles non-differentiable penalty terms efficiently19,20. These studies demonstrate the robustness and scalability of ADMM in high-dimensional settings, which makes it a natural choice for penalized regression and has contributed to the scalability of these methods in practical applications17,18.

Group lasso is another extension of the lasso, designed for situations in which predictors form natural groups. It encourages entire groups of coefficients to be included in or excluded from the model together, which is particularly useful when predictors exhibit group-wise sparsity21,22. Group lasso is closely related to penalizing differences in B-spline coefficients to ensure smoothness, since in both cases the goal is to control the overall structure of the estimated model. Recent research has also demonstrated the effectiveness of ADMM for group lasso problems. In particular, ref. 23 presents an ADMM-based framework that decomposes the group-wise penalty structure into manageable subproblems while ensuring convergence. This approach is particularly advantageous for high-dimensional data because it is scalable and robust in handling group-wise sparsity.

Existing nonparametric function smoothing techniques, such as smoothing splines24, trend filtering25,26, generalized additive models27, and kernel smoothing28, typically fit p-dimensional response variables by separately applying univariate smoothers to each coordinate before integrating the results. While effective in many cases, these methods lack the ability to capture interdependencies between response coordinates, leading to potential inefficiencies in capturing complex functional relationships. Our proposed penalized B-spline approach, incorporating a group penalty, ensures that all response coordinates share the same set of knots, thereby improving computational efficiency and stabilizing variance across dimensions. This framework provides a principled method for high-dimensional curve estimation, effectively addressing the limitations of traditional smoothing approaches while leveraging the benefits of penalized regression models.

Oceanography has long benefited from advancements in data collection and modeling techniques, enabling a deeper understanding of complex marine processes. The advent of real-time drifter datasets, such as those provided by the Global Drifter Program (GDP), has revolutionized the study of ocean surface currents, sea surface temperatures, and their interactions with climatic phenomena like El Niño and La Niña events (NOAA/AOML, 2024). Drifters, equipped with high-precision sensors, offer an unprecedented view into the spatiotemporal variability of oceanographic features, making them invaluable for validating circulation models and predicting climate anomalies29,30.

In this study, we leverage penalized B-spline curve fitting to reconstruct drifter trajectories with an emphasis on generalizing ocean current patterns observed from similar locations. Due to the inherent variability of oceanographic processes, drifters released from nearly identical positions can follow markedly different paths influenced by a range of factors. Our approach synthesizes these diverse trajectories into a coherent, generalized representation of regional ocean currents, thus offering a robust framework for analyzing complex marine processes and discerning the underlying flow structures. One application of this method is the rapid construction of ocean current schematic diagrams that succinctly summarize the generalized current patterns. Moreover, when combined with additional information such as swell waves, wind waves, and salinity, the integrated interpretation of the generalized currents becomes much more straightforward, enabling a comprehensive understanding of the interplay between various oceanographic phenomena. This methodology is demonstrated in the Niño 4 region, a pivotal area for studying equatorial current dynamics and trade wind interactions, where the generalized current pattern provides deeper insights into both simplified schematic representations and integrated analyses that consider multiple environmental variables31,32.

Ecology, as a field, increasingly relies on advanced analytical techniques to unravel the complexities of species behavior and their interactions with the environment. Bird migration studies, for instance, offer critical insights into ecological dynamics, behavioral adaptations, and the impacts of climate change on biodiversity33,34. The Demoiselle Crane, a species renowned for its extensive seasonal migrations, presents unique challenges for analysis due to irregular GPS tracking intervals and noise in the recorded data35,36. Accurately estimating migration routes while accounting for these characteristics is essential for establishing appropriate protected areas and optimizing habitat management strategies for migratory birds37. In this ecological context, we apply penalized B-spline curve fitting to estimate migration trajectories from GPS tracking data of Demoiselle Cranes. The estimated trajectories closely align with those reported in previous studies, demonstrating the reliability of our approach. Large-scale GPS tracking data frequently contain noise because of the extensive range of movement. A key advantage of our method is that it handles noisy data effectively while still producing results comparable to those derived from noise-free datasets. Previous studies primarily relied on high-precision measurements and excluded data with high levels of noise38, whereas our approach incorporates noisy data without compromising accuracy, enabling a more cost-effective analysis of migration trajectories. Ultimately, our findings suggest that this methodology reduces costs while maintaining performance in the presence of noise, providing a basis for its application to other avian species.

This paper makes three primary contributions. First, we propose a curve fitting approach using B-splines with a total variation penalty, which controls the smoothness of the fitted curve. Second, we introduce a group penalty that ensures all response coordinates share the same set of knots, facilitating both functional smoothness and interpretability. Third, we solve the resulting penalized optimization problem with the ADMM algorithm, which provides an efficient and scalable implementation of the proposed method. Drawing on the generalized lasso, total variation penalties, and related methodologies, the approach balances model fit and smoothness within a flexible yet interpretable curve fitting framework.

Model and estimator

B-spline curve model

Consider a p-dimensional curve model

$$y_i = f(t_i) + \varepsilon _i \quad \text{ for }\quad i = 1, \ldots , n, \tag{1}$$

where \(y_i = (y_{i1}, \ldots , y_{ip}) \in \mathbb {R}^p\) are p-dimensional responses, \(t_i \in [a, b]\) are fixed input points, and \(\varepsilon _i\) are independent errors with mean zero and covariance matrix \(\sigma ^2 I_p\) for some \(\sigma > 0\). The goal of this study is to estimate the underlying curve f from the space-time data \((t_1, y_1), \ldots , (t_n, y_n)\). To estimate \(f = (f_1, \ldots , f_p): [a, b] \rightarrow \mathbb {R}^p\), we consider penalized regression splines with a total variation penalty.

Let \(\mathcal {S}\) be the set of splines of order m defined on [a, b] with an increasing knot sequence \(\xi = \{ \xi _1, \ldots , \xi _K \}\), where \(a< \xi _1< \cdots< \xi _K < b\). Any spline \(s \in \mathcal {S}\) can be expressed as a linear combination of the B-spline basis functions \(B_1, \ldots , B_J\), where the dimension of \(\mathcal {S}\) is \(J = K + m\).
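The basis described here can be evaluated with the Cox–de Boor recursion. The Python sketch below assumes a clamped knot vector, that is, m copies of a and of b are added around the interior knots (a common convention that the text does not state explicitly), and checks that it yields \(J = K + m\) nonnegative basis functions that sum to one. The helper names are hypothetical and this is not the authors' R implementation.

```python
def clamped_knots(interior, a, b, m):
    """Full knot vector: m copies of a, the K interior knots, and m copies of b."""
    return [a] * m + list(interior) + [b] * m


def bspline_basis(t, knots, m):
    """Values B_1(t), ..., B_J(t) of the order-m B-splines via the Cox-de Boor recursion."""
    n = len(knots)
    # Order 1: indicators of the half-open knot intervals [tau_j, tau_{j+1}).
    vals = [1.0 if knots[j] <= t < knots[j + 1] else 0.0 for j in range(n - 1)]
    if t == knots[-1]:
        # Include the right end point b in the last non-empty interval.
        last = max(j for j in range(n - 1) if knots[j] < knots[j + 1])
        vals[last] = 1.0
    for r in range(2, m + 1):
        new_vals = []
        for j in range(n - r):
            left_den = knots[j + r - 1] - knots[j]
            right_den = knots[j + r] - knots[j + 1]
            left = (t - knots[j]) / left_den * vals[j] if left_den > 0 else 0.0
            right = (knots[j + r] - t) / right_den * vals[j + 1] if right_den > 0 else 0.0
            new_vals.append(left + right)
        vals = new_vals  # order-r values: one fewer function than at order r - 1
    return vals


interior = [0.25, 0.5, 0.75]                # K = 3 interior knots on [a, b] = [0, 1]
m = 4                                       # order 4, i.e. cubic B-splines
knots = clamped_knots(interior, 0.0, 1.0, m)
values = bspline_basis(0.3, knots, m)
print(len(values))                          # J = K + m = 7
print(sum(values))                          # partition of unity: 1 up to rounding
```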

Denote the B-spline curve

$$s(\cdot ;\beta ) = (s^1(\cdot ; \beta ^1), \ldots , s^p(\cdot ; \beta ^p)) = \sum _{j = 1}^J \beta _j B_j(\cdot ),$$

where \(\beta ^\ell = (\beta _1^\ell , \ldots , \beta _J^\ell )\), \(\beta _j = (\beta _j^1, \ldots , \beta _j^p) \in \mathbb {R}^p\), and \(\beta = [\beta _j^\ell ] \in \mathbb {R}^{J \times p}\) is the coefficient matrix, for \(j = 1, \ldots , J\) and \(\ell = 1, \ldots , p\); that is, \(\beta _j\) is the jth row and \(\beta ^\ell\) the \(\ell\)th column of \(\beta\). Note that the \(\ell\)th coordinate function \(s^\ell\) of the B-spline curve is determined by \(\beta ^\ell\):

$$s^\ell (\cdot ;\beta ^\ell ) = \sum _{j = 1}^J \beta _j^\ell B_j(\cdot ) \quad \text{ for }\quad \ell = 1, \ldots , p.$$

In the B-spline curve, the rows \(\beta _1, \ldots , \beta _J\) of the coefficient matrix \(\beta\) serve as control points, which determine the shape and position of the curve39,40. Because each \(B_j\) is nonzero on at most m consecutive knot intervals, each control point influences only a local part of the curve, so the shape of the curve can be adjusted locally and precisely by moving individual control points.
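The local influence of the control points can be checked numerically. The Python sketch below, which assumes a clamped knot vector with m repeated boundary knots, evaluates a two-dimensional (p = 2) quadratic B-spline curve \(s(t;\beta ) = \sum _j \beta _j B_j(t)\) and then moves a single control point. The helper names and control point values are made up for illustration.

```python
def bspline_basis(t, knots, m):
    """Order-m B-spline values at t (Cox-de Boor recursion on a clamped knot vector)."""
    n = len(knots)
    vals = [1.0 if knots[j] <= t < knots[j + 1] else 0.0 for j in range(n - 1)]
    if t == knots[-1]:  # close the last non-empty interval at the right end point b
        vals[max(j for j in range(n - 1) if knots[j] < knots[j + 1])] = 1.0
    for r in range(2, m + 1):
        vals = [
            ((t - knots[j]) / (knots[j + r - 1] - knots[j]) * vals[j] if knots[j + r - 1] > knots[j] else 0.0)
            + ((knots[j + r] - t) / (knots[j + r] - knots[j + 1]) * vals[j + 1] if knots[j + r] > knots[j + 1] else 0.0)
            for j in range(n - r)
        ]
    return vals


def curve(t, beta, knots, m):
    """s(t; beta) = sum_j beta_j B_j(t) for a J x p control-point matrix beta (list of rows)."""
    basis = bspline_basis(t, knots, m)
    return [sum(basis[j] * beta[j][l] for j in range(len(beta))) for l in range(len(beta[0]))]


m = 3                                                     # quadratic B-splines
knots = [0.0] * m + [0.2, 0.4, 0.6, 0.8] + [1.0] * m      # K = 4, so J = K + m = 7
beta = [[0.0, 0.0], [1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [4.0, 3.0], [5.0, 2.0], [6.0, 0.0]]
moved = [row[:] for row in beta]
moved[3] = [3.0, 6.0]                                     # move the 4th control point only

for t in (0.1, 0.5, 0.9):
    print(t, curve(t, beta, knots, m), curve(t, moved, knots, m))
# B_4 is supported on [0.2, 0.8], so only the point at t = 0.5 changes.
```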

Penalized curve estimation via group total variation norm

The residual sum of squares is defined as

$$R(\beta ) = \frac{1}{2} \sum _{i = 1}^n {\Vert y_i - s(t_i;\beta ) \Vert }_2^2,$$

where \({\Vert \cdot \Vert }_2\) denotes the Euclidean norm.

As the penalty for the proposed estimator, we adopt the total variation of the \((m - 1)\)th derivative of \(s^\ell\) for \(\ell = 1, \ldots , p\). On each knot interval, this derivative equals \((m - 1)!\) times the coefficient of the highest-degree term of the polynomial piece of \(s^\ell\). It is therefore a piecewise constant function that changes only at the knots, and each jump size measures how much the polynomial pieces change at that knot. Knots with zero jump sizes are thus unnecessary and can be removed. The total variation penalty can be described by the jump matrix \(D \in \mathbb {R}^{(J - m) \times J}\) that represents the jump sizes of the \((m - 1)\)th derivative of the B-splines at the interior knots. For more details on the total variation penalty and the corresponding jump matrix, see ref. 41.
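One way to build the jump matrix D is to differentiate the spline \(m - 1\) times in coefficient space, using the standard B-spline derivative formula, and then take first differences of the resulting piecewise constant coefficients. The NumPy sketch below does this for a clamped knot vector (m repeated boundary knots, an assumption). It is an illustration, not the construction of ref. 41, whose normalization may differ; the function name is hypothetical.

```python
import numpy as np


def jump_matrix(interior, a, b, m):
    """(K x J) matrix D with (D beta)_k = jump of the (m-1)th derivative of
    sum_j beta_j B_j at the kth interior knot, for order-m B-splines (J = K + m)."""
    interior = np.asarray(interior, dtype=float)
    tau = np.concatenate([np.full(m, a), interior, np.full(m, b)])  # clamped knot vector
    M = np.eye(len(tau) - m)   # J x J: coefficients of the current derivative as a function of beta
    order = m
    for _ in range(m - 1):
        # Derivative of an order-r spline: coefficients (r-1)(c_{i+1} - c_i) / (tau_{i+r} - tau_{i+1})
        # on the same knots with one end knot removed on each side.
        rows = M.shape[0] - 1
        L = np.zeros((rows, M.shape[0]))
        for i in range(rows):
            c = (order - 1) / (tau[i + order] - tau[i + 1])
            L[i, i], L[i, i + 1] = -c, c
        M = L @ M
        tau = tau[1:-1]
        order -= 1
    # After m-1 derivatives the spline is constant on each of the K+1 knot intervals;
    # its jumps at the interior knots are first differences of those constants.
    K = len(interior)
    return np.diff(np.eye(K + 1), axis=0) @ M


D = jump_matrix([0.25, 0.5, 0.75], 0.0, 1.0, 4)   # cubic B-splines, K = 3 interior knots
print(D.shape)                                     # (3, 7), i.e. (J - m) x J
print(np.allclose(D @ np.ones(7), 0.0))            # a constant curve has no jumps: True
```

For m = 1 the loop is skipped and D reduces to the first-difference matrix, which matches the piecewise constant case.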

Observe that

$$D\beta = \begin{bmatrix}D\beta ^1&\cdots&D\beta ^p \end{bmatrix}\in \mathbb {R}^{(J - m) \times p}.$$

The \(\ell\)th column \(D\beta ^\ell\) of \(D\beta\) contains the jump sizes at the \(J - m\) interior knots for the coordinate spline \(s^\ell\), and its \(L_1\) norm is the total variation of the \((m - 1)\)th derivative of \(s^\ell\). In this study, we additionally propose a group-type total variation penalty that applies the \(L_2\) norm to each row of \(D\beta\), as a mechanism to ensure that all components select the same set of knots. The proposed penalty function is thus defined by

$$p(\beta ) = \sqrt{p}\sum _{j = 1}^{J - m} {\Vert d^j\beta \Vert }_2,$$

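where \(d^j\) is the jth row of D and \(d^j\beta = (d^j \beta ^1, \ldots , d^j \beta ^p) \in \mathbb {R}^p\) represents the jump sizes of all p components at the jth interior knot. Because the penalty acts on the Euclidean norm of this vector, the jumps of all components at a given knot are shrunk jointly and can be set to zero together, so that each knot is either retained or removed for all components at once. This homogeneity at the knot level enhances both the accuracy and the interpretability of the estimator.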
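The difference between the proposed group penalty and a coordinate-wise total variation penalty is easiest to see on a small example. In the Python sketch below, each row of a hypothetical matrix \(D\beta\) holds the jumps of the p = 2 coordinates at one interior knot; the group penalty charges each knot through the Euclidean norm of its jump vector, so a knot costs nothing only when the jumps of all coordinates vanish. The helper names and numbers are illustrative assumptions.

```python
import math


def group_tv_penalty(jumps):
    """p(beta) = sqrt(p) * sum_j ||d^j beta||_2, where jumps[j] is the row d^j beta."""
    p = len(jumps[0])
    return math.sqrt(p) * sum(math.sqrt(sum(x * x for x in row)) for row in jumps)


def coordinatewise_tv_penalty(jumps):
    """Sum over coordinates of the L1 norms of the columns of D beta (no grouping)."""
    return sum(abs(x) for row in jumps for x in row)


def active_knots(jumps, tol=1e-12):
    """Indices j of interior knots whose jump vector d^j beta is nonzero."""
    return [j for j, row in enumerate(jumps) if any(abs(x) > tol for x in row)]


# Hypothetical D beta: J - m = 4 interior knots (rows) and p = 2 coordinates (columns).
jumps = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(group_tv_penalty(jumps))           # sqrt(2) * (5 + 0 + 1 + 0), about 8.49
print(coordinatewise_tv_penalty(jumps))  # 8.0
print(active_knots(jumps))               # [0, 2]
```

Knot 2 has a jump only in the second coordinate, but under the group penalty it still belongs to the common knot set shared by both coordinates as soon as its jump vector is nonzero.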

The penalized objective function to be minimized is

$$R^\lambda (\beta ) = R(\beta ) + \lambda p(\beta ),$$

where \(\lambda \ge 0\) is a smoothing parameter. The estimator of the coefficient matrix \(\beta\) is defined as

$$\hat{\beta }= \mathop {\textrm{argmin}}\limits _{\beta \in \mathbb {R}^{J \times p}} R^\lambda (\beta )$$

and finally the penalized B-spline curve estimator (PBCE) is given by

$$\hat{f}(\cdot ) = s(\cdot ; \hat{\beta }).$$

Optimization approach with ADMM

We use the Alternating Direction Method of Multipliers (ADMM) to solve the penalized problem after rewriting it as an equivalent constrained convex optimization problem. ADMM has gained significant attention in statistical and machine learning applications because of its efficiency in solving complex optimization problems. By decomposing a large-scale problem into a series of smaller, more manageable subproblems, ADMM enables efficient, parallelizable computation. Its iterations alternate between updates of separate blocks of variables, which allows for distributed optimization and simplifies the handling of complex constraints. Consequently, ADMM has become a powerful tool for optimization in high-dimensional settings, especially where computational resources are limited or distributed systems are needed. For a more in-depth treatment of ADMM, see refs. 42–44.

The minimization problem can be reformulated as

$$\min _{\beta , \alpha } \frac{1}{2} {\Vert Y - B\beta \Vert }_2^2 + \lambda {\Vert \alpha \Vert }_G \quad \text{ subject } \text{ to }\quad D\beta = \alpha ,$$

where \(Y = \begin{bmatrix}y_i^\ell \end{bmatrix}\in \mathbb {R}^{n \times p}\) is the response matrix and \(B = \begin{bmatrix}B_j(t_i) \end{bmatrix}\in \mathbb {R}^{n \times J}\) is the B-spline basis matrix. For a matrix \(A \in \mathbb {R}^{r \times c}\), \({\Vert A \Vert }_2\) denotes the \(L_2\) (Frobenius) norm of the vectorized matrix A, and \({\Vert A \Vert }_G\) denotes the row-wise \(L_2\) group norm

$${\Vert A \Vert }_G = \sqrt{c} \sum _{j = 1}^r {\Vert A^j \Vert }_2,$$

where \(A^j\) is the jth row of A. With \(c = p\), this gives \({\Vert D\beta \Vert }_G = p(\beta )\).

The augmented Lagrangian is defined as

$$L_\eta (\beta , \alpha ) = \frac{1}{2}{\Vert Y - B\beta \Vert }_2^2 + \lambda {\Vert \alpha \Vert }_G + \frac{\eta }{2} {\Vert D\beta - \alpha + u \Vert }_2^2,$$

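where \(u = \begin{bmatrix}u_j^\ell \end{bmatrix}\in \mathbb {R}^{(J - m) \times p}\) is the scaled dual variable and \(\eta > 0\) is the penalty parameter of the augmented Lagrangian. Minimizing \(L_\eta\) over \(\beta\) with \(\alpha\) and u fixed gives the update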

$$\beta \leftarrow \left( B^\top B + \eta D^\top D \right) ^{-1} \left( B^\top Y + \eta D^\top (\alpha - u) \right)$$
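For completeness, the \(\beta\)-update follows from the first-order condition of the augmented Lagrangian, which is a smooth convex quadratic in \(\beta\) when \(\alpha\) and u are held fixed:

$$\nabla _\beta L_\eta (\beta , \alpha ) = -B^\top (Y - B\beta ) + \eta D^\top (D\beta - \alpha + u) = 0 \quad \Longleftrightarrow \quad \left( B^\top B + \eta D^\top D \right) \beta = B^\top Y + \eta D^\top (\alpha - u).$$

Because the matrix \(B^\top B + \eta D^\top D\) does not change across iterations, it can be factorized once (for example, by a Cholesky decomposition, assuming it is positive definite) and reused in every \(\beta\)-update.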

Minimizing \(L_\eta\) over \(\alpha\) with \(\beta\) and u fixed gives the update

$$\begin{aligned} \alpha&\leftarrow \mathop {\textrm{argmin}}\limits _\alpha \left( \frac{\eta }{2}{\Vert D\beta - \alpha + u \Vert }_2^2 + \lambda \sqrt{p} \sum _{j = 1}^{J - m} {\Vert \alpha ^j \Vert }_2 \right) \\&= \mathop {\textrm{argmin}}\limits _\alpha \left( \frac{1}{2} {\Vert D\beta - \alpha + u \Vert }_2^2 + \frac{\lambda \sqrt{p}}{\eta } \sum _{j = 1}^{J - m} {\Vert \alpha ^j \Vert }_2 \right) \\&= \begin{bmatrix} {{\bf {prox}}}_{\frac{\lambda \sqrt{p}}{\eta }}^1 \left( D\beta + u\right) \\ \vdots \\ {{\bf {prox}}}_{\frac{\lambda \sqrt{p}}{\eta }}^{J - m} \left( D\beta + u\right) \end{bmatrix} \triangleq {{\bf {prox}}}_{\frac{\lambda \sqrt{p}}{\eta }} \left( D\beta + u\right) , \end{aligned}$$

where

$${{\bf {prox}}}_\lambda ^j (A) = A^j \cdot \max \left( 1 - \frac{\lambda }{{\Vert A^j \Vert }_2}, 0 \right)$$

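is the proximal operator of the group norm applied to the jth row \(A^j\) of the matrix A45,46, with the convention that it returns the zero vector when \(A^j = 0\). After the \(\alpha\)-update, the scaled dual variable is updated as \(u \leftarrow u + D\beta - \alpha\), and the three updates are repeated until convergence. The procedure for implementing the proposed method is outlined in Algorithm 1. The algorithm was implemented as an R package, which is provided as supplementary material together with a manual describing its main functions and the simulation and data analysis code.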
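The proximal operator above is row-wise group soft-thresholding. A minimal pure-Python sketch follows (the function name is hypothetical); in the \(\alpha\)-update it would be applied to \(D\beta + u\) with threshold \(\lambda \sqrt{p}/\eta\).

```python
import math


def prox_group_rows(A, threshold):
    """Row-wise group soft-thresholding of a matrix A (list of rows).

    Row A^j is multiplied by max(1 - threshold / ||A^j||_2, 0); rows whose norm does not
    exceed the threshold, including zero rows, become exactly zero.
    """
    out = []
    for row in A:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(1.0 - threshold / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


A = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]   # row norms 5, 0.5 and 0
print(prox_group_rows(A, 1.0))             # approximately [[2.4, 3.2], [0.0, 0.0], [0.0, 0.0]]
```

The first row is shrunk by the factor 1 − 1/5 = 0.8 but keeps its direction, while the second row, whose norm is below the threshold, is removed entirely. In the penalized B-spline setting, a zeroed row means that the corresponding knot is dropped for all p coordinates.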

Algorithm 1

Group-wise penalized B-spline curve estimation.
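Algorithm 1 is not reproduced in this text. The loop below is a minimal Python/NumPy sketch of the scaled-form ADMM iterations described above (\(\beta\)-update, row-wise group soft-thresholding for \(\alpha\), and the dual update \(u \leftarrow u + D\beta - \alpha\)); it is not the authors' R implementation. The function names, the stopping rule, the default \(\eta\), and the demo data are assumptions. For brevity, the demo uses order m = 1, for which the B-splines are indicator functions of the knot intervals and D is the first-difference matrix; for other orders, B and D would come from a B-spline basis and the corresponding jump matrix.

```python
import numpy as np


def group_prox(A, thr):
    """Row-wise group soft-thresholding: scale each row A^j by max(1 - thr / ||A^j||_2, 0)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - thr / np.maximum(norms, 1e-300), 0.0)  # zero rows stay zero
    return scale * A


def admm_group_tv(Y, B, D, lam, eta=30.0, n_iter=5000, tol=1e-8):
    """Scaled-form ADMM sketch for 0.5*||Y - B beta||^2 + lam * sqrt(p) * sum_j ||(D beta)^j||_2."""
    p = Y.shape[1]
    beta = np.zeros((B.shape[1], p))
    alpha = np.zeros((D.shape[0], p))
    u = np.zeros_like(alpha)
    lhs = B.T @ B + eta * D.T @ D  # same system matrix in every beta-update
    BtY = B.T @ Y
    for _ in range(n_iter):
        beta = np.linalg.solve(lhs, BtY + eta * D.T @ (alpha - u))  # beta-update
        alpha_old = alpha
        alpha = group_prox(D @ beta + u, lam * np.sqrt(p) / eta)    # alpha-update
        u = u + D @ beta - alpha                                    # scaled dual update
        primal = np.linalg.norm(D @ beta - alpha)
        dual = eta * np.linalg.norm(D.T @ (alpha - alpha_old))
        if primal < tol and dual < tol:
            break
    return beta, alpha


# Demo with hypothetical data: order m = 1, so B_j are interval indicators and D takes first differences.
rng = np.random.default_rng(0)
n, p, K = 200, 2, 19                              # K interior knots, J = K + 1 basis functions
t = np.linspace(0.0, 1.0, n, endpoint=False)
edges = np.linspace(0.0, 1.0, K + 2)              # a, xi_1, ..., xi_K, b
B = np.zeros((n, K + 1))
B[np.arange(n), np.searchsorted(edges, t, side="right") - 1] = 1.0
D = np.diff(np.eye(K + 1), axis=0)                # (K x J) jumps between adjacent intervals
truth = np.where(t[:, None] < 0.5, [0.0, 1.0], [2.0, -1.0])
Y = truth + 0.1 * rng.standard_normal((n, p))

beta_hat, alpha_hat = admm_group_tv(Y, B, D, lam=2.0)
active = np.flatnonzero(np.linalg.norm(alpha_hat, axis=1) > 0)
print(active)  # indices of interior knots kept jointly by both coordinates
```

Precomputing \(B^\top B + \eta D^\top D\) reflects the fact that only the right-hand side of the \(\beta\)-update changes between iterations; a factorization could replace the repeated `np.linalg.solve` calls for larger J.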

Data analysis

In this study, we selected drifter data and crane migration data as case studies for analyzing movement trajectories over time. Both datasets share a common characteristic in that they incorporate temporal and spatial variations, forming curved movement trajectories. Drifter data capture movement influenced by ocean currents, while crane migration data reflect seasonal variations in movement patterns. Thus, both datasets serve as suitable cases for analyzing irregular movement. Accordingly, we applied a penalized B-spline curve fitting technique to model the temporal and spatial changes in both datasets. This approach provides a methodological framework for effectively analyzing various movement patterns and can be utilized to interpret movement trajectories in datasets that incorporate temporal variations. The computational algorithm developed in this study has been implemented as an R package, which, along with a detailed manual, has been uploaded to the authors’ GitHub repository (https://github.com/JaeHwan-Jhong/BsplineCurve). In addition, all data and code used for the analyses and visualizations presented in this paper have been organized into R Markdown and PDF files and made publicly available for unrestricted use.

Real-time drifter data

We present a novel application of penalized B-spline curve fitting to real oceanographic data, focusing on the Niño 4 region in the central equatorial Pacific Ocean. Using data from the Global Drifter Program (GDP), we demonstrate how this method addresses challenges inherent to drifter trajectory analysis, such as noise and irregular sampling intervals.

Data collection

The GDP provides real-time drifter data that are essential for understanding ocean surface currents and sea surface temperature patterns47. Drifters are autonomous, floating instruments deployed across the world’s oceans to collect data for a variety of scientific research, including studies on climate, ocean circulation, and marine ecosystems. Each drifter is equipped with sensors that transmit location and environmental measurements via satellite, offering high-resolution data that are crucial for oceanographic models and climate forecasts. This real-time drifter dataset supports a wide range of research applications, from validating ocean current models to improving our understanding of air-sea interactions, which are vital for assessing climate change impacts. For more information, see NOAA’s Global Drifter Program website47.

We obtained the drifter data from the ERDDAP server of NOAA’s Atlantic Oceanographic and Meteorological Laboratory. The data span April 2 to May 10, 2020, and cover the area from \(5^\circ\)N to \(5^\circ\)S latitude and from \(160^\circ\)W to \(150^\circ\)W longitude, which lies within the Niño 4 region (\(5^\circ\)N–\(5^\circ\)S, \(160^\circ\)E–\(150^\circ\)W) in the central equatorial Pacific Ocean. The dataset includes six-hourly measurements of longitude and latitude for drifters with World Meteorological Organization (WMO) identifiers 5102764, 5102765, and 5102766. These drifters were selected to illustrate a single estimated generalized trajectory based on data starting from the same location. For readability, throughout the following sections we refer to drifter tracks 5102764, 5102765, and 5102766 as track 1, track 2, and track 3, respectively.

The Niño 4 region is pivotal for studying the dynamics of the El Niño–Southern Oscillation (ENSO). Variations in sea surface temperatures and currents in this region influence global weather patterns, marine biodiversity, and socio-economic conditions. Understanding these processes is essential for predicting climate anomalies such as El Niño and La Niña events.

The first subplot of Fig. 1 shows a world map on which the study area in the Niño 4 region, spanning \(5^\circ\)N–\(5^\circ\)S latitude and \(160^\circ\)W–\(150^\circ\)W longitude, is marked by black dots in the central equatorial Pacific Ocean. This subplot provides geographic context for the study area and emphasizes the region’s importance for understanding equatorial oceanic dynamics.

The subsequent subplot provides an enlarged view of the Niño 4 region, where the observed trajectories of three drifters are displayed, color-coded as red (track 1), blue (track 2), and black (track 3). Track 1, shown in red, exhibited complex patterns with pronounced rotations, suggesting localized variability in ocean currents. The trajectory of track 2, represented in blue, followed a smoother path but displayed noticeable curvature, likely caused by specific localized interactions. In contrast, track 3, depicted in black, maintained a largely monotonic trajectory, aligning closely with the dominant regional currents.

Together, these visualizations highlight the diverse oceanic dynamics within the Niño 4 region. The interplay between localized variability and broader regional influences is evident in the distinct movement patterns of the drifters, underlining the significance of detailed trajectory analysis in understanding equatorial current systems.

Previous studies have revealed interactions between ocean currents and other oceanographic elements such as the atmosphere, waves, and swell48,49,50. In particular, swell waves and wind have been shown to significantly affect ocean currents51,52,53,54. Based on these findings, we aimed to enhance the interpretation of ocean current movement by presenting information on wind waves, primary swell waves, and secondary swell waves. These three datasets were obtained from the CMEMS Marine Data Store55.

Figure 2 visualizes oceanographic data from the early, middle, and late periods of the buoy observations, on “April 5, 2020”, “April 21, 2020”, and “May 9, 2020.” In this figure, the spatial distributions of wave height and swell height are shown on the same longitude and latitude coordinates as the buoy trajectories. For each dataset (wind waves, primary swell waves, and secondary swell waves), arrows indicate its direction together with the ocean current direction at that time. This allows an intuitive assessment of how these three elements relate to the movement of ocean currents. These visual cues illustrate the interactions between ocean currents and other oceanographic elements and provide observational evidence for interpreting ocean current movements.

Fig. 1

World map highlighting the Niño 4 region (left) and an enlarged view of observed drifter trajectories in the Niño 4 region (right). Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

Fig. 2

Wind waves (black arrows), primary swell waves (blue arrows), and secondary swell waves (purple arrows) on “April 5, 2020” (first row), “April 21, 2020” (second row), and “May 9, 2020” (third row), along with the ocean current direction (red arrows). Each arrow indicates the direction of the corresponding oceanographic element. The background shows the height of the wind waves or swell waves (unit: m).

Model settings

We employed a penalized B-spline approach to reconstruct smooth drifter trajectories while preserving essential movement patterns. A key feature of this method is the incorporation of group penalties, ensuring that all drifters share a consistent set of knots. This enhances comparability and facilitates the study of collective behaviors within oceanic systems.

To balance flexibility and smoothness, we considered penalty parameters \(\lambda\) ranging from \(10^{-10}\) to 100. Time was normalized to the range [0, 1] to ensure numerical stability and a uniform placement of the B-spline basis functions. These settings were tailored to the dataset size (approximately 400 observations), ensuring computational efficiency and an accurate representation of trends.
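The two preprocessing choices in this paragraph can be sketched as follows. The function names are hypothetical, and the number of candidate values (25) and their logarithmic spacing are assumptions; the text specifies only the range of \(\lambda\).

```python
def normalize_times(times):
    """Affinely map observation times onto [0, 1] (assumes at least two distinct times)."""
    lo, hi = min(times), max(times)
    return [(t - lo) / (hi - lo) for t in times]


def log_lambda_grid(lo_exp, hi_exp, num):
    """num log-spaced candidate penalties from 10**lo_exp to 10**hi_exp (num >= 2)."""
    step = (hi_exp - lo_exp) / (num - 1)
    return [10.0 ** (lo_exp + k * step) for k in range(num)]


hours = [0, 6, 12, 18, 24, 30]            # hypothetical six-hourly time stamps (hours)
print(normalize_times(hours))             # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
lambdas = log_lambda_grid(-10, 2, 25)     # 25 candidates from 1e-10 to 1e2 (the count is an assumption)
print(lambdas[0], lambdas[-1])
```

A logarithmic grid spreads the candidates evenly over the twelve orders of magnitude, which a linear grid over the same range would not.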

To choose the smoothing parameter \(\lambda\), we use the Akaike information criterion (AIC)56 and the Bayesian information criterion (BIC)57. For the kth candidate value \(\lambda _k\), the AIC and BIC are defined as

$$AIC_k = n \log \left\{ \frac{1}{n} \sum _{i = 1}^n {\Vert y_i - \hat{f}^{\lambda _k} (t_i) \Vert }_2^2 \right\} + 2 J_k \quad \text{ and }\quad BIC_k = n \log \left\{ \frac{1}{n} \sum _{i = 1}^n {\Vert y_i - \hat{f}^{\lambda _k} (t_i) \Vert }_2^2 \right\} + J_k \log (n)$$

for \(k = 1, \ldots , N_\lambda \), where \(N_\lambda\) is the number of candidate \(\lambda\) values and \(J_k\) denotes the number of nonzero control points for \(\lambda _k\). The selected \(\lambda\) is the candidate with the smallest AIC (or BIC). We acknowledge that cross-validation (CV) is a viable alternative to AIC and BIC. In our initial experiments with an 8:2 training/validation split, the complexity of the models selected by CV was generally in line with that of the models chosen by AIC, whereas BIC tended to select simpler models. Since AIC and CV did not differ substantially in the complexity of the selected models, we focus on AIC and BIC for model selection in this paper. The same tuning parameter selection method was also used in the Demoiselle Crane data analysis.
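The selection rule can be sketched in a few lines of Python. The function names are hypothetical, and the residual sums of squares and model sizes \(J_k\) below are made-up inputs for illustration, not results from this paper.

```python
import math


def information_criteria(rss, n, df):
    """AIC and BIC from the residual sum of squares, sample size n, and model size df (J_k)."""
    base = n * math.log(rss / n)
    return base + 2 * df, base + df * math.log(n)


def select_lambda(lambdas, rss_values, df_values, n, criterion="aic"):
    """Return the candidate lambda with the smallest AIC (criterion="aic") or BIC (criterion="bic")."""
    idx = 0 if criterion == "aic" else 1
    scores = [information_criteria(r, n, d)[idx] for r, d in zip(rss_values, df_values)]
    return lambdas[min(range(len(scores)), key=scores.__getitem__)]


# Hypothetical summaries of fits over a small lambda grid (not results from this paper).
lambdas = [0.01, 0.1, 1.0, 10.0]
rss = [10.0, 10.5, 12.0, 30.0]
df = [40, 25, 12, 5]
print(select_lambda(lambdas, rss, df, n=400, criterion="aic"))  # 0.1
print(select_lambda(lambdas, rss, df, n=400, criterion="bic"))  # 1.0
```

Because \(\log (400) \approx 6 > 2\), BIC charges each extra control point about three times as much as AIC, so it settles on the sparser fit, in line with the tendency of BIC to select simpler models.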

Results and discussion

Figure 3 illustrates the observed trajectories of three drifters (track 1, track 2, and track 3) within the Niño 4 region and their respective fitted lines. The first subplot (top left) provides an overview of the fitted lines for all three drifters, using the same color scheme as Fig. 1: red for track 1, blue for track 2, and black for track 3. These fitted lines capture both localized variability and broader regional trends in ocean currents.

The subsequent subplots provide a detailed analysis of each drifter’s observed trajectory and its corresponding fitted line. The second subplot (top right) focuses on track 1. Track 1 exhibits a highly intricate path, starting from the lower-right section of the plot and transitioning into sharp turns and pronounced loops near its final segment. These features suggest intense localized variability in ocean currents. The fitted line follows these patterns closely, effectively capturing the sharp changes in direction and rotational movements.

The third subplot (bottom left) examines track 2, whose trajectory is smoother than that of track 1. However, the path bends distinctly toward the southeast, with a noticeable shift in curvature midway through its course. This curvature suggests interactions with localized oceanic systems. The fitted line closely represents the gradual directional changes observed in this trajectory.

Finally, the last subplot (bottom right) focuses on track 3, which follows a predominantly monotonic, streamlined path with minimal deviations. The trajectory starts with a slight curvature but stabilizes into nearly linear movement, suggesting alignment with the dominant regional currents. The fitted line accurately reflects this uniform and stable progression, consistent with large-scale ocean dynamics.

Fig. 3

Observed trajectories and corresponding fitted lines of three drifters in the Niño 4 region, presented in individual subplots for detailed analysis.

Figure 4 compares the fitted lines obtained with AIC and BIC, highlighting differences in their ability to capture localized features. The lower-right section of the plots, representing the initial segment of the trajectories, reveals distinct behavior of the fitted lines under each criterion. The AIC-optimized model fits the observed data points more closely, following minor fluctuations and capturing finer-scale variations in the trajectory. However, this closer fit carries a higher risk of overfitting, as seen in the erratic behavior of the fitted line around localized variability. In contrast, the BIC-optimized model provides a smoother trajectory that generalizes the data, sacrificing finer details in favor of reducing overfitting. This trade-off is evident in the initial segment, where the BIC model does not capture some localized patterns observed in the data. These differences illustrate how the penalized B-spline method balances model complexity and generalization depending on the chosen criterion.

The reconstructed fitted lines underscore the temporal and spatial complexity of ocean currents in the Niño 4 region. The results also demonstrate the method’s capability to enhance the interpretability of drifter data and to reveal key features of ocean surface dynamics, particularly the influence of equatorial currents and trade winds. This evidence, drawn from the detailed analysis of trajectory patterns and the comparison between the AIC- and BIC-optimized models, supports the robustness of the approach in capturing both large-scale trends and small-scale variations.

Fig. 4

Integrated fitted lines of three drifters in the Niño 4 region, optimized using the Akaike Information Criterion (AIC, left) and the Bayesian Information Criterion (BIC, right).

Previous statistical methods for estimating ocean currents have drawn on error analysis, spectral techniques, time series analysis, and wave and tidal modeling58,59,60. Extending these approaches, we propose a new interpretation of ocean current flows obtained by generalizing buoy trajectories through statistical curve fitting. In particular, our method has the advantage of fitting multivariate data jointly. Moreover, recent technological advances have enabled ocean current measurements using both buoys and radar61,62. Both are primary means of observing ocean currents, and radar offers the potential for faster and more accurate interpretation in regions beyond traditional buoy coverage.

Seasonal migration routes of the Demoiselle Crane

We applied penalized B-spline curve fitting to analyze the seasonal migration routes of Demoiselle Cranes (Anthropoides virgo). Migration studies provide critical insights into ecological patterns and behavioral adaptations, and Demoiselle Cranes are an ideal subject because of their extensive migrations. However, GPS tracking data often present challenges such as irregular intervals, missing values, and noise. To address these issues, our approach combines numerical interpolation with penalized curve fitting, with the smoothing parameter selected by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), to estimate smooth migratory trajectories.

Data collection and preprocessing

The analysis used GPS tracking data collected in Mongolia between August 2018 and November 2023 (Movebank’s “1000 Cranes - Mongolia” dataset)63. To investigate seasonal patterns, we selected one year of data, from August 2018 to August 2019. Figure 5 shows a scatter plot of the raw data, illustrating the distribution of the GPS points.

Fig. 5

Scatter plot of all data points based on GPS tracking data from Movebank’s “1000 Cranes - Mongolia” dataset. This study uses migration data from August 9, 2018, covering a 1-year period. Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

To account for the irregular nature of GPS data collection, numerical interpolation was performed to adjust for uneven time gaps. The initial recording time was set to \(t_1 = 0\), with \(t\) increasing by 1 for each hour. Since the dataset covered a full year, \(t\) spanned from 0 to 8760 (24 h \(\times\) 365 days). These values were rescaled to range between 0 and 365 for analysis purposes.
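The time-axis construction described above can be sketched as follows. This is a minimal illustration, assuming linear interpolation (the study states only that numerical interpolation was used) and assuming that each fix is timestamped in hours elapsed since the first fix; the function name `to_hourly_grid` and the toy coordinates are hypothetical.

```python
import numpy as np

def to_hourly_grid(hours, lat, lon, n_hours=8760):
    """Resample irregular GPS fixes onto t = 0, 1, ..., n_hours and rescale to days.

    hours: elapsed time of each fix in hours since the first fix (t_1 = 0),
    sorted in increasing order. Linear interpolation is an assumption of this
    sketch. np.interp holds the first/last value constant outside the observed range.
    """
    hours = np.asarray(hours, dtype=float)
    grid = np.arange(n_hours + 1, dtype=float)   # hourly time index t
    lat_i = np.interp(grid, hours, lat)
    lon_i = np.interp(grid, hours, lon)
    days = grid / 24.0                           # 0..8760 h  ->  0..365 days
    return days, lat_i, lon_i

# Toy example (hypothetical values): three fixes at irregular times (hours 0, 2 and 5)
days, lat, lon = to_hourly_grid([0, 2, 5], [47.0, 47.2, 47.8],
                                [106.0, 106.4, 107.0], n_hours=5)
```

Dividing the hourly index by 24 maps 0 to 8760 onto 0 to 365, so one unit of rescaled time corresponds to one day. Linear interpolation of longitude is adequate only for tracks that do not cross the ±180° meridian.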

Model settings

In estimating migration routes, a penalty parameter \(\lambda\) was varied between \(10^{-5}\) and 10 to balance smoothness and data fidelity. Models were evaluated based on their Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) scores.
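The λ search can be sketched as a grid search that scores each fit by AIC and BIC. Because the effective degrees of freedom of a total variation penalized B-spline have no simple closed form, this sketch swaps in a quadratic second-difference penalty (a P-spline) on a degree-1 B-spline basis, for which the degrees of freedom equal the trace of the hat matrix. The Gaussian forms of AIC and BIC, the 20 basis functions, the synthetic signal, and all function names are assumptions; the study's definitions and basis may differ, and the useful range of λ depends on the basis and on how the data are scaled.

```python
import numpy as np

def hat_basis(x, n_basis):
    """Degree-1 B-spline (hat function) basis on equally spaced knots spanning x."""
    knots = np.linspace(x.min(), x.max(), n_basis)
    eye = np.eye(n_basis)
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(n_basis)])

def penalized_fit(B, y, lam, order=2):
    """Quadratic difference-penalized fit; returns fitted values and effective df."""
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)   # difference matrix
    BtB = B.T @ B
    A = BtB + lam * D.T @ D
    beta = np.linalg.solve(A, B.T @ y)
    df = np.trace(np.linalg.solve(A, BtB))             # trace of the hat matrix
    return B @ beta, df

def select_lambda(x, y, lams, n_basis=20):
    """Grid search over lambda; returns the AIC- and BIC-optimal values."""
    B = hat_basis(x, n_basis)
    n = len(y)
    aic, bic = [], []
    for lam in lams:
        fit, df = penalized_fit(B, y, lam)
        fit_term = n * np.log(np.sum((y - fit) ** 2) / n)   # Gaussian, up to a constant
        aic.append(fit_term + 2 * df)
        bic.append(fit_term + np.log(n) * df)
    return lams[int(np.argmin(aic))], lams[int(np.argmin(bic))]

rng = np.random.default_rng(0)
x = np.linspace(0, 365, 200)                       # time rescaled to days
y = np.sin(2 * np.pi * x / 365) + rng.normal(scale=0.2, size=x.size)
lams = np.logspace(-5, 1, 25)                       # 1e-5 ... 10, as in the text
lam_aic, lam_bic = select_lambda(x, y, lams)
```

Restricting the basis to a modest number of functions keeps the degrees of freedom well below n; with one parameter per observation, the criteria would favor the near-interpolating fit at the smallest λ.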

Fig. 6

The integrated fitted lines of all data points, optimized using the AIC (left) and the BIC (right). The data are GPS tracking data from Movebank’s “1000 Cranes - Mongolia” dataset. This study uses migration data from August 9, 2018, covering a one-year period. Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

Figure 6 shows the movement trajectories obtained with the AIC- and BIC-optimized models. The AIC-optimized trajectory describes the complexity and variability of the movement patterns in more detail, whereas the BIC-optimized trajectory tends to be simpler. In the subsequent analyses, however, the AIC and BIC fits differ only slightly. Therefore, following the principle of model simplicity, only the BIC fitting results are presented and used for analysis.
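For reference, under a Gaussian error model the two criteria are commonly written as follows, where \(\mathrm{RSS}(\lambda)\) is the residual sum of squares and \(\mathrm{df}(\lambda)\) the effective degrees of freedom of the fit at penalty \(\lambda\); the exact definitions used in this study may differ.

```latex
\mathrm{AIC}(\lambda) = n \log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + 2\,\mathrm{df}(\lambda),
\qquad
\mathrm{BIC}(\lambda) = n \log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + \log(n)\,\mathrm{df}(\lambda).
```

The criteria differ only in the price charged per effective degree of freedom: 2 for AIC versus \(\log n\) for BIC. Since \(\log n > 2\) whenever \(n \geq 8\), BIC penalizes complexity more heavily; for instance, with \(n = 18{,}000\), \(\log n \approx 9.80\), so each additional degree of freedom costs roughly five times as much under BIC. Adding the two optimality inequalities shows that, on the same \(\lambda\) grid, the BIC-selected fit never has more effective degrees of freedom than the AIC-selected one, which is why the BIC-optimized trajectory is the simpler of the two.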

Results and discussion

In this study, we aimed to estimate the migration trajectories of Demoiselle Cranes using the penalized B-spline method. Here, stopover refers to a temporary location where migratory birds rest and forage during long-distance migration. The estimated trajectories were then compared with previously identified migration routes and stopover site patterns to evaluate the accuracy and ecological relevance of our approach.

Although we used all available data, including highly noisy observations, our methodology produced migration patterns similar to those observed in previous studies that relied only on a limited set of noise-free observations38,64. Figure 7 shows scatter plots of stopover sites identified in previous studies for individuals whose migration trajectories resemble our fitted line65,66. Many of these stopover sites lie along our estimated migration trajectories.
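One way to quantify whether a previously reported stopover site is included in the estimated trajectory is to compute the great-circle distance from the site to the nearest point of the densely sampled fitted line. The sketch below uses the haversine formula on a spherical Earth (radius 6371 km); the 50 km threshold and the function names are hypothetical choices for illustration, not criteria used in this study.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcastable)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def sites_near_track(site_lat, site_lon, track_lat, track_lon, max_km=50.0):
    """Distance from each site to its nearest fitted point, and a within-threshold flag."""
    d = haversine_km(np.asarray(site_lat, dtype=float)[:, None],
                     np.asarray(site_lon, dtype=float)[:, None],
                     np.asarray(track_lat, dtype=float)[None, :],
                     np.asarray(track_lon, dtype=float)[None, :])
    nearest = d.min(axis=1)          # one distance per stopover site
    return nearest, nearest <= max_km
```

The fitted curve should be evaluated on a fine time grid so that the nearest-point distance approximates the distance to the continuous curve.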

Fig. 7

Fitted line overlaid on previously identified stopover sites (scatter points) from existing studies. Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

These findings suggest that our methodology effectively captures the actual migration routes of the cranes. Building on this, we further analyzed the migration trajectory dynamics and identified two primary migration groups, whose differences in movement patterns appear to be influenced by environmental factors.

Northwest migration group:

  • Birds in this group predominantly traveled along western routes.

Northeast migration group:

  • This group followed eastern routes and eventually returned to Mongolia.

This separation between the groups is believed to have been influenced by temperature and wind speed38.

Additionally, further research is needed to determine whether migration patterns are associated with specific climatic conditions and topographical features67. Figure 8 visualizes the spatial separation and distinct trajectories of these groups, with scatter plots highlighting the divergence in their northward migration routes.

Fig. 8

Scatter plot: the northwest migration group (red) and the northeast migration group (blue). Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

A comparison of the fitted models showed little difference between the AIC- and BIC-optimized results. Because AIC tends to favor more complex models and can overfit, we selected BIC as the more conservative model selection criterion, as noted earlier. Figure 9 presents the migration patterns of the two groups (left: northwest migration group; right: northeast migration group) fitted using the BIC criterion. These fitted trajectories capture both overall trends and localized variations within each group, demonstrating that the penalized B-spline method is useful for distinguishing specific movement behaviors. By integrating penalized curve fitting with statistical modeling, this study provides a comprehensive approach to analyzing migratory behavior.

Fig. 9

Fitted lines: northwest migration group (left) and northeast migration group (right). Maps were generated using RStudio (version 2024.12.0+467), R (version 4.4.1), and the rworldmap package (version 1.3.8) with the getMap function.

Considering its consistency with previous research, the approach used in this study appears to be applicable to the analysis of movement patterns of other animals, and the estimated migration routes, as demonstrated in our study, can serve as a foundation for further research. Based on these findings, the methodology presented in this study can contribute to the precise estimation of migration routes in the fields of migration ecology and conservation science, and it is expected to serve as a valuable tool for habitat management and conservation planning.

Conclusion

In this study, we presented a robust framework for penalized B-spline curve fitting using the Alternating Direction Method of Multipliers (ADMM) algorithm. The proposed methodology effectively balances flexibility and smoothness by incorporating a total variation penalty, making it highly suitable for applications in diverse scientific domains. Through applications to oceanographic drifter data and ecological bird migration data, we demonstrated the method’s ability to handle noise, irregular sampling intervals, and complex patterns in spatiotemporal data.
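The ADMM idea can be illustrated with a minimal sketch for the generalized-lasso form \(\min_\beta \tfrac12\lVert Y - B\beta\rVert_F^2 + \lambda\lVert D\beta\rVert_1\), splitting \(Z = D\beta\). Several choices here are assumptions, not the study's settings: a degree-1 B-spline basis (for which the coefficients are the fitted values at the knots, so the \(\ell_1\) norm of their first differences equals the total variation of the fitted curve), an elementwise penalty that does not couple the \(p\) response columns, a fixed \(\rho\), and the hypothetical function names. The study's basis degree, difference order, multivariate penalty, and stopping rule may differ.

```python
import numpy as np

def hat_basis(x, n_basis):
    """Degree-1 B-spline (hat function) basis on equally spaced knots spanning x."""
    knots = np.linspace(x.min(), x.max(), n_basis)
    eye = np.eye(n_basis)
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(n_basis)])

def soft_threshold(v, kappa):
    """Elementwise proximal operator of kappa * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_tv_spline(x, Y, lam, n_basis=30, rho=1.0, n_iter=500, tol=1e-8):
    """ADMM for min_beta 0.5 * ||Y - B beta||_F^2 + lam * sum(|D beta|).

    Y has shape (n, p), one column per response variable (e.g. latitude, longitude).
    The elementwise l1 penalty shares the basis and lam across columns but does
    not couple them (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(x), -1)
    B = hat_basis(x, n_basis)
    D = np.diff(np.eye(n_basis), axis=0)          # first differences of coefficients
    A = B.T @ B + rho * D.T @ D                   # fixed system matrix for the beta step
    BtY = B.T @ Y
    Z = np.zeros((n_basis - 1, Y.shape[1]))       # split variable Z = D beta
    U = np.zeros_like(Z)                          # scaled dual variable
    for _ in range(n_iter):
        beta = np.linalg.solve(A, BtY + rho * D.T @ (Z - U))   # quadratic step
        Dbeta = D @ beta
        Z_new = soft_threshold(Dbeta + U, lam / rho)           # l1 proximal step
        U += Dbeta - Z_new                                     # dual update
        primal_res = np.linalg.norm(Dbeta - Z_new)
        dual_res = rho * np.linalg.norm(D.T @ (Z_new - Z))
        Z = Z_new
        if primal_res < tol and dual_res < tol:
            break
    return B @ beta, beta

# Synthetic two-dimensional track (p = 2), for illustration only
t = np.linspace(0, 365, 300)
rng = np.random.default_rng(0)
track = np.column_stack([np.sin(t / 60), np.cos(t / 60)]) + rng.normal(scale=0.05, size=(300, 2))
fitted, coef = admm_tv_spline(t, track, lam=0.5)
```

The penalty parameter \(\rho\) affects how quickly ADMM converges, not the minimizer itself; for large \(\lambda\) a fixed \(\rho = 1\) can converge slowly, so a larger `n_iter` or a retuned `rho` may be needed.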

While the methodology was developed for a general \(p\)-dimensional setting, where \(p\) represents the number of response variables, the data analyses in this study were limited to \(p = 2\), representing two-dimensional spatial data (latitude and longitude). A significant avenue for future research lies in extending this framework to higher-dimensional data settings, such as \(p = 3\) or more. This expansion would enable the analysis of datasets where an additional dimension, such as altitude, temperature, or other environmental factors, plays a critical role.

Although our spatial data are two-dimensional (latitude and longitude, \(p = 2\)), for comparison with other well-known methods we also fitted each coordinate separately (\(p = 1\)) and then combined the results. Table 1 compares the MSE, MAE, and MXDV for the Drifter and Crane datasets obtained with the SplineCurve, SplineCurve_Ind, SmoothSpline24, generalized additive model (GAM)27, and kernel smoothing (K-Smooth)28 methods. As mentioned earlier, SplineCurve_Ind applies our method in the same per-coordinate manner as the other methods. For an objective comparison, the SmoothSpline, GAM, and K-Smooth methods used optimal parameter values determined through cross-validation.
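As an illustration of how a smoothing parameter can be chosen by cross-validation for one coordinate (\(p = 1\)), the sketch below selects the bandwidth of a kernel smoother by leave-one-out error. It assumes a Nadaraya–Watson estimator with a Gaussian kernel; the K-Smooth method cited above, and the cross-validation scheme used for Table 1, may differ. The function names and the bandwidth grid are hypothetical.

```python
import numpy as np

def nw_smooth(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def loocv_bandwidth(x, y, bandwidths):
    """Return the bandwidth with the smallest leave-one-out mean squared error."""
    errors = []
    for h in bandwidths:
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)              # exclude each point from its own fit
        denom = w.sum(axis=1)
        if np.any(denom == 0):                # bandwidth too small: weights underflow
            errors.append(np.inf)
            continue
        pred = (w @ y) / denom
        errors.append(np.mean((y - pred) ** 2))
    return bandwidths[int(np.argmin(errors))]

# Synthetic example: one noisy coordinate on a unit time interval
rng = np.random.default_rng(1)
xs = np.linspace(0, 1, 100)
ys = np.sin(2 * np.pi * xs) + rng.normal(scale=0.2, size=xs.size)
h_best = loocv_bandwidth(xs, ys, np.array([0.005, 0.01, 0.05, 0.2, 1.0]))
```

For two-dimensional tracks, the same selection would be run separately for latitude and longitude, matching the per-coordinate setting used for the established methods.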

For the Drifter data, which comprises approximately 400 data points, the performance metrics (MSE, MAE, and MXDV) of all five methods were quite similar. However, for the Crane data, which contains about 18,000 data points, differences in the evaluation metrics became apparent. Specifically, the GAM method exhibited a relatively high MSE of about 15, indicating a decline in performance, while the SplineCurve, SmoothSpline, and K-Smooth methods yielded similar MSE values of around 11. Notably, the SplineCurve_Ind method, which uses individual fits, achieved the best performance with an MSE of about 5.7.

When comparing our SplineCurve method in the \(p = 2\) case with well-known models such as SmoothSpline, GAM, and K-Smooth, we observed comparable performance. In particular, when evaluated under the \(p = 1\) setting used by the established models, our method achieved numerically better performance. These results indicate that our model is highly competitive with other approaches.

Table 1 Performance comparison for drifter and crane datasets using SplineCurve, SplineCurve_Ind, SmoothSpline, GAM and K-Smooth methods. SplineCurve_Ind represents the result of applying the SplineCurve method to univariate data. The mean and standard deviation (values in parentheses) of the MSE, MAE, and MXDV from 50 random sampling trials are presented.
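The quantities summarized in Table 1 can be computed as in the sketch below. Three choices are assumptions rather than the study's stated definitions: MXDV is taken to be the maximum absolute deviation, the errors of all coordinates are pooled before averaging when \(p = 2\), and the standard deviation is the sample standard deviation (ddof = 1). The random sampling scheme behind the 50 trials is not described in this section, so the sketch only aggregates per-trial metrics; the function names are hypothetical.

```python
import numpy as np

def fit_metrics(Y, Y_hat):
    """MSE, MAE and maximum absolute deviation, pooled over all coordinates."""
    err = np.asarray(Y, dtype=float) - np.asarray(Y_hat, dtype=float)
    return {"MSE": np.mean(err ** 2),
            "MAE": np.mean(np.abs(err)),
            "MXDV": np.max(np.abs(err))}   # assumed meaning: maximum absolute deviation

def summarize_trials(metric_list):
    """Mean and sample standard deviation of each metric across repeated trials."""
    keys = metric_list[0].keys()
    return {k: (np.mean([m[k] for m in metric_list]),
                np.std([m[k] for m in metric_list], ddof=1))
            for k in keys}

# Toy example: two observations of a (latitude, longitude) pair
metrics = fit_metrics([[0, 0], [1, 1]], [[0, 1], [1, 3]])
```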

One promising direction for future work is the application of this methodology to three-dimensional trajectory data, where \(p = 3\). For instance, the analysis of bird flight paths, incorporating altitude as a third dimension alongside latitude and longitude, could provide deeper insights into migration dynamics and behavioral adaptations. Similarly, marine studies involving the 3D movement of aquatic animals, such as diving behaviors of whales or sharks, would greatly benefit from this extended approach. These datasets often contain rich, multidimensional information that could further validate and refine the proposed penalized B-spline framework.

Moreover, the ability to handle higher dimensions could open avenues for analyzing complex environmental or biomedical datasets. For example, incorporating additional dimensions such as time-varying physiological metrics in animal studies or multidimensional climate data in oceanographic research would enhance the scope and applicability of this method.

In summary, this study underscores the versatility and robustness of penalized B-spline curve fitting for analyzing spatiotemporal data. The results demonstrate the method’s effectiveness in capturing both localized variations and broader trends, providing valuable insights into ecological and oceanographic processes. By extending the framework to higher-dimensional settings, future research can unlock new possibilities for addressing complex scientific questions, making this methodology a valuable tool for multidimensional data analysis.