Introduction

The global maritime network handles over 80% of international trade by volume, with crude oil and petroleum products alone accounting for nearly 30% of this market1,2. Despite its economic significance and substantial environmental impact3,4,5,6, maritime shipping networks remain understudied compared to other transportation systems, particularly regarding their temporal operational dynamics7. This research gap is especially pronounced for oil tanker networks, which exhibit fundamentally different operational characteristics from the more extensively researched container shipping networks, where fixed schedules and regular route structures have made traditional complex network analysis particularly effective for understanding system-level properties8.

Current maritime network research has predominantly focused on static, time-aggregated analyses, with four main analytical themes: trade and connectivity; hubs and centrality; vulnerability and robustness; and communities and spatial structure9,10,11. These studies established that global shipping is dominated by hub-and-spoke structures, where major ports are connected to numerous smaller ports, while other important ports act as regional gateways controlling the flow of goods between different geographic areas8,12,13,14,15,16. Moreover, such analyses have also been used to compare changes in network connectivity and regional communities at different time periods, often before and after significant geopolitical events17,18,19. These studies are commonly interpreted through the lens of economic and ecological impacts15,20,21,22.

Despite these insights, static approaches that aggregate temporal data obscure critical operational patterns. When networks are analyzed as snapshots, essential sequential information is lost: the timing and order of vessel movements between regions, and the dynamic variation in cargo flows over time. This information loss masks valuable operational insights, as vessels following identical aggregate movement patterns between the same ports may employ fundamentally different sequential strategies that significantly impact their efficiency and market positioning. These different temporal routing patterns create distinct operational outcomes that remain invisible in static networks.

These temporal dynamics are especially critical in oil tanker movements, which are operationally different compared to container liners. Unlike container ships operating on fixed schedules along established routes, tankers operate in charter-based markets where individual vessels compete for cargo, creating highly flexible and responsive movement patterns8. These competitive dynamics generate precisely the type of sequential, temporal dependencies that static aggregation eliminates—as vessels dynamically adjust their routing sequences based on market opportunities and cargo availability. Recent work on oil tanker networks has examined traffic density23, hub-and-spoke structures24, and influential port identification14, yet these studies maintain the static network paradigm and do not capture the temporal operational patterns that may distinguish successful from unsuccessful competitive strategies.

To overcome these analytical limitations and uncover the hidden temporal structure, we demonstrate in this study that sequential motif analysis (which identifies recurring patterns of regional visits) and Dynamic Mode Decomposition (DMD, a method for extracting temporal patterns from time series data) reveal previously undetected patterns spanning both individual vessel operations and global flow dynamics. This dual-scale approach links routing strategies to efficiency outcomes and cargo flows to seasonal cycles. Specifically, we show that ships exhibiting diverse regional exploration patterns—captured through sequential motifs—achieve significantly better laden-ballast ratios (LBRs), a key efficiency measure closely related to the industry-standard Energy Efficiency Operational Indicator. In parallel, DMD applied to regional cargo flow time series reveals that different ship classes operate with distinct regional footprints and specialized seasonal trading patterns. We identify previously uncharacterized synchronous and anti-synchronous relationships between maritime regions that correlate with economic and climate cycles. While these analyses operate at different scales—individual vessel behavior versus collective fleet dynamics—they provide complementary insights into the multi-scale temporal structure of maritime operations.

Results

Vessel data description

We obtained proprietary data from our partner company AlphaOcean, covering laden (loaded with cargo) voyages taken by 3026 medium to large crude-oil and petroleum tankers from 2016 January to 2020 February; see illustration in Fig. 1. This consists of 452 Panamax, 1141 Aframax, 610 Suezmax, and 823 Very Large Crude Carrier (VLCC), listed in order of their size in deadweight tonnage. While the locations provided are accurate up to the port-level, our analysis condenses this to a regional level, grouping 1337 ports into 26 regions. This aligns with industry practices of pricing freight rate indices on a region-to-region basis and reflects the uncertainty that exact discharge ports are often not specified to operators until late in the voyage.

Fig. 1: Schematic representation of voyages in time and space of two ships.
Fig. 1: Schematic representation of voyages in time and space of two ships.The alternative text for this image may have been generated using AI.
Full size image

Ships travel along directed port-to-port routes with journey time indicated by clock markers, alternating between laden and ballast voyages. Individual ports in the dataset are shown, with the regions they belong indicated by their marker shape and color.

Ship routing performance indicators

The tanker ships in our study typically do not travel on fixed routes, nor do they service particular countries or regions exclusively. Instead, they are free to compete for cargo across different markets and regions, with their movements driven by economic considerations such as travel times, fuel and staffing costs, potential earnings, and other market opportunities. This operational flexibility presents a unique challenge for tankers: how do we evaluate the routing performance of ships?

A key challenge in evaluating tanker routing performance lies in the inaccessibility of comprehensive voyage and vessel data. We propose an accessible metric using temporal data—the LBR,

$${{{\rm{LBR}}}}=\frac{{\sum }_{l\in L}\Delta {t}_{l}}{{\sum }_{l\in L}\Delta {t}_{l}+{\sum }_{b\in B}\Delta {t}_{b}},$$
(1)

where Δtl is the at-sea duration of laden leg l L (adjusted for port waiting times; see Methods for details and robustness checks) and Δtb is the duration of ballast leg b B, where L and B are the sets of laden and ballast legs, respectively, of a ship’s trajectory. The ratio above simply measures the proportion of time spent laden at sea out of the total time at sea. The LBR is designed to capture the relative amount of time a ship spends performing useful work (transporting cargo) versus non-revenue generating operations (repositioning to load cargo). By focusing solely on temporal patterns of laden versus ballast operations, the LBR excludes ship-specific technical factors such as fuel consumption patterns or engine specifications. This design allows us to avoid scenarios where a vessel with efficient engines but poor routing outperforms another vessel with less efficient engines but a superior routing strategy. This enables fleet-level comparisons where comprehensive operational data remains unavailable.

The stratification of our analysis by vessel class (Panamax, Aframax, Suezmax, VLCC) controls for substantial inter-class variability in vessel characteristics and operational profiles. Within each class, multivariate analysis controls for vessel age, effectively isolating routing efficiency from vessel-specific factors. Furthermore, as demonstrated in Supplementary Eqs. (1)(5), LBR captures fundamentally similar efficiency patterns to the International Maritime Organization’s Energy Efficiency Operational Indicator (EEOI)25, which measures CO2 emissions per unit of cargo work. While EEOI remains an industry gold standard, its calculation requires proprietary fuel consumption and precise route distance data typically unavailable in fleet-level research datasets. Our LBR measure, though simplified, provides a theoretically grounded and empirically useful measure that reflects revenue-optimizing incentives, enabling large-scale analysis of routing efficiency patterns across global shipping networks.

The LBR distributions in Fig. 2a expose significant performance disparities across each ship class, with the upper quartile (top 25%) displaying substantially higher values compared to the lower quartile (bottom 25%). Comparing median values between these groups reveals that the upper quartile achieves LBRs approximately 50% higher than the lower quartile. This clear difference in time spent carrying versus seeking cargo indicates significant heterogeneity in operational strategies within identical ship classes. Such performance differentials suggest that optimizing routing strategies could substantially improve both economic outcomes and environmental performance.

Fig. 2: Distribution of vessel measure data across classes.
Fig. 2: Distribution of vessel measure data across classes.The alternative text for this image may have been generated using AI.
Full size image

Kernel density estimates for a laden-ballast ratio, b number of unique regions visited, and c year constructed across four tanker classes. Dotted lines mark quartiles; triangle markers denote means. Sample sizes shown in parentheses: Panamax (452), Aframax (1141), Suezmax (610), Very Large Crude Carrier or VLCC (823).

This performance landscape reveals distinct efficiency profiles that align with each vessel class’s operational niche. Panamax, Aframax, and Suezmax vessels show comparable LBRs with overlapping median ranges, while VLCCs achieve marginally higher values due to their specialized operational advantages. As the largest vessel class, VLCCs are often deployed on inter-regional long-haul routes inaccessible to smaller ships, maximizing their at-sea laden time while minimizing time spent waiting in port. These longer voyage durations enable easier fixture planning, reducing inter-voyage delays that would otherwise erode efficiency.

Tanker vessels exhibit remarkable freedom to explore diverse markets rather than operating from fixed regional bases. Analysis of total regional coverage, Fig. 2b, uncovers distinct operational strategies across ship classes that directly reflect their design constraints and market positioning. Panamax and Suezmax vessels demonstrate similar exploration capabilities, visiting a median of 12 regions, indicating comparable operational flexibility despite size differences. Aframax vessels show the greatest variation in regional diversity with broader distribution patterns (median: 9 regions), reflecting their widespread usage, superior versatility for different cargo types, and port accessibility. VLCCs operate under significant geographical constraints, visiting only 7 regions on average with markedly narrow distribution. This restricted mobility stems from their specialized infrastructure requirements—as the largest vessels in the global fleet, VLCCs can only access ports with sufficient depth and handling capacity for their extreme size.

Extracting sequential motifs

Ship routing decisions involve complex factors that create distinctive movement signatures, exposing the strategic thinking behind high-performance operations. Consider two ships visiting identical regions: Ship 1 follows (Middle East, China, Southeast Asia, Middle East, China, Southeast Asia)—systematically cycling through three regions—while Ship 2 alternates (Middle East, China, Middle East, Southeast Asia, China, Southeast Asia)—return journeys between pairs of regions. In this example, time-aggregated networks would show identical connectivity between these three major trading regions, yet their sequential trajectories reveal fundamentally different strategies: systematic regional diversification versus back-and-forth movements between trading partners. To better understand the routing patterns that arise from these decision processes, we analyze short repeating sequences known as sequential motifs.

Previous studies have shown that ship trajectories can be modeled statistically using higher-order Markov chains, where future locations depend not only on the current position but also on recent prior locations26,27. Higher-order networks generalize this correlation structure of individual agent trajectories to a network representation where nodes represent trajectory subsequences rather than individual ports, embedding these sequential dependencies in the network, where the order indicates the correlation length captured28. Utilizing this framework on our tanker dataset, we systematically identified order-2 as the optimal order based on information-theoretic model selection criteria (see “Methods”). Intuitively, this means that routing decisions depend on both the previous location and current location, but extending knowledge further back adds little predictive value while dramatically increasing model complexity. Building on the insights of this framework, we examine ship movements through 2-hop motifs to uncover the relationship between routing patterns and operational routing performance.

Five distinct strategic patterns emerge from routing decisions. These are the five possible 2-hop motifs, which we call AAA, ABA, AAB, ABB, and ABC, illustrated in Fig. 3a. The specific regional labels are not relevant; instead, motifs are identified by the topological relations between elements in the motif. For example, a journey sequence (China, Japan, China), (Japan, China, Japan), and (UK, Mediterranean, UK) all share the same underlying pattern: a ship visits region A, then region B, then returns to region A. This maps onto the motif ABA, regardless of the regions involved. Thus, ABC motifs represent maximally diverse routing (3 unique regions within 2 consecutive legs), while AAA motifs indicate routing focused within a single region.

Fig. 3: Diagrammatic examples of 2-hop motifs and their observed counts over time.
Fig. 3: Diagrammatic examples of 2-hop motifs and their observed counts over time.The alternative text for this image may have been generated using AI.
Full size image

a Diagrams of 2-hop motifs. b Stacked plot of motif counts over time in 1-week windows. Counts indicate the number of motifs that started within each 1-week window, to account for varying motif durations.

We examine the absolute counts of different motifs over time, shown in the stacked time series in Fig. 3b. Despite random fluctuations, autocorrelation analysis reveals no clear temporal patterns in any of the motif counts, suggesting that fleet-level routing behaviors remain stable and consistent over time (see Supplementary Fig. 4 for autocorrelation plots). The motifs ranked based on occurrence frequency consistently reveal ABC as the most common motif, followed by ABA, and then AAA. This allows us to aggregate our motif counts across the whole duration of the data without losing critical information.

To examine the relationship between routing patterns and LBR, we compared the motif prevalence between high-performing (top quartile by LBR) and low-performing (bottom quartile by LBR) ships, stratified by vessel class and cargo status (shown in Fig. 4). Within each performance group, prevalence values are calculated as the weighted average of individual ship prevalence rates, where each ship is weighted by its total number of observed motifs; e.g., the weighted average prevalence of laden-first ABC motif among all Panamax ships is 30%.

Fig. 4: Observed motif prevalence of each ship class based on laden-ballast ratio (LBR) performance rank.
Fig. 4: Observed motif prevalence of each ship class based on laden-ballast ratio (LBR) performance rank.The alternative text for this image may have been generated using AI.
Full size image

Motif prevalence by ship class for three groups: all ships, the top-ranked 25% by LBR performers, and the bottom-ranked 25% by LBR performers. Motifs are categorized into a laden-first (laden → ballast), and b ballast first (ballast → laden). Marker size indicates prevalence (in percentages).

Our analysis reveals a striking relationship: high-performing ships consistently exhibit a greater prevalence of ABC motifs (visiting three different regions consecutively) and a diminished prevalence of AAA motifs (remaining within the same region). Meanwhile, the inverse is true for low-performing ships. This relationship holds across all ship classes, with top 25% performers displaying notably larger ABC prevalence and correspondingly smaller AAA prevalence compared to the bottom 25% performers (seen by the marker sizes in top 25% versus bot 25% columns).

The cargo status analysis reveals prominent usage of intra-regional legs for ballast operations. In laden → ballast motifs, ABB motifs (where the second leg represents intra-regional ballast movement) are more prevalent than AAB motifs overall, and especially in the top 25% group, consistent with the strategy of minimizing ballast travel times by seeking nearby cargo when available. The reverse applies to the ballast → laden motifs, with increased AAB prevalence for high-performance ships optimizing their cargo pickup location.

We strengthen our findings through comprehensive statistical validation. We systematically control for ship age—distribution shown in Fig. 2c—as a potential confounder and establish the statistical significance of observed trends through multiple approaches. Our stratified analysis by vessel class and motif cargo status employs Beta regression with motif prevalence as the dependent variable and LBR and ship age as regressors, isolating the independent contribution of each predictor (detailed in “Methods” and Supplementary Methods). The regression coefficients quantify each regressor’s unique effect on motif prevalence, with p-values confirming whether these effects are statistically significant. Additionally, we capture the magnitude of group separation using Cliff’s delta δ29,30, a robust non-parametric effect size that reveals distributional shifts beyond what simple mean comparisons can detect.

The associations between movement patterns and vessel LBR discussed above are statistically significant, show substantial effect sizes, and remain robust after controlling for vessel age. Effect sizes measured by Cliff’s delta, shown in Fig. 5, reveal large positive effects for ABC motifs across all vessel classes, while AAA motifs display small to large negative effect sizes for Panamax, Aframax, and Suezmax vessels. The laden-first ABB and ballast-first AAB motifs also exhibit large effect sizes across all vessel classes. These patterns achieve statistical significance (p < 0.05 after controlling for false discovery rates31) with the sole exception of AAA motifs in VLCCs. Critically, adjusting for ship age produces negligible changes in effect sizes (mean absolute difference δorig − δadj = 0.056; 20 out of 32 comparisons differ by less than 0.05), confirming that the motif-LBR associations represent genuine operational patterns rather than age-related artifacts. Qualitative interpretations of effect magnitude remain consistent whether or not age is explicitly controlled, with only Aframax’s AAA ballast → laden shifting from large to medium effect size after adjustment (Fig. 5).

Fig. 5: Motif effect size (Cliff’s delta δ) comparing top- and bottom-performers by laden-ballast ratio.
Fig. 5: Motif effect size (Cliff’s delta δ) comparing top- and bottom-performers by laden-ballast ratio.The alternative text for this image may have been generated using AI.
Full size image

Mean effect sizes with bootstrapped 95% confidence intervals (10,000 iterations) comparing top-ranked 25% against bottom-ranked 25%, grouped by vessel type and cargo status. Only statistically significant associations from beta regression are shown (two-sided tests, p < 0.05 after false discovery rate correction). Colors indicate vessel class and shapes distinguish cargo status (▲: laden → ballast; : ballast → laden). Filled markers (▲, ) indicate original delta δorig, while hollow markers (, ) indicate age-adjusted delta δadj. Dotted vertical lines denote effect size thresholds (small:  ± 0.147; medium:  ± 0.33; large:  ± 0.474).

VLCCs present a slightly unique case due to their operational constraint: unlike other ship classes, VLCCs show minimal AAA usage across both performance groups, reflecting their specialized role in long-haul routes with few intra-regional routes available to them. Instead, high-performing VLCCs show a reduced ABA motif prevalence (representing round-trips between two regions), suggesting that even within their operational constraints, high-performing VLCCs benefit from route diversification rather than repeating the same trading route. While reduced ABA motif prevalence of the top 25% is also observed in other ship classes, their effect sizes are notably smaller than in VLCCs, suggesting a weaker reliance on ABA motifs across the general population of ships (see Supplementary Fig. 5 for VLCC leg duration comparisons of ABC and ABA motifs).

Together, these findings indicate that route diversification represents a key component of operational efficiency, with more efficient ships consistently exhibiting varied market exposure while optimizing ballast destinations through intra-regional movements.

Identifying periodicity in shipping activity

While our motif analysis establishes a connection between individual ship efficiency and route diversification, these individual routing decisions aggregate to create system-level phenomena. The collective movements of many ships, driven by opportunities arising from fluctuations in the crude oil market, manifest as non-trivial temporal signals at the regional level. Although some fluctuations are a result of highly unpredictable events (i.e., geopolitical disruptions or accidents), others may be driven by predictable physical or social phenomena, ranging from climate patterns32 to socioeconomic patterns in production, operation, or consumption33,34. We therefore shift our focus from ship-centric trajectories to examine the macro-scale cargo flow dynamics that these aggregated movements generate across regions.

These macro-scale dynamics are captured by time-series y(t) of cargo flows (measured in 10,000 deadweight tonnage) entering and leaving a region for each ship class, yielding 8 time series per region. Each unique combination [region, direction, ship class] is referred to as a regional flow segment, indexed with subscript n. To capture the complex dynamics within this collective time series y(t) = [y1(t), …, yn(t), …], we employ bagging-optimized DMD35,36,37, a powerful and noise-robust tool for extracting spatio-temporal patterns from a vector time series, with regional flow segments serving as spatial coordinates. Detailed methodology on the time series construction and DMD reconstructions is in the “Methods” section.

A rank R DMD yields R complex spatial components ϕn,j, scalar amplitudes Aj, and a temporal frequency component ωj, where n is the regional flow segments index, and j is the index of the R modes. The DMD reconstructed time series for region flow segment n can be expressed in the form

$${\hat{y}}_{n}(t)={\sum}_{j}{A}_{j}| {\phi }_{n,j}| \cos \left({\omega }_{j}t+\arg ({\phi }_{n,j})\right).$$
(2)

Despite the noisy data, the DMD method is able to consistently recover periodic oscillatory patterns; examples for the reconstruction \({\hat{y}}_{n}(t)\) for South East Asia are shown in Fig. 6. The reconstruction errors are measured using the normalized root mean squared error RMSE*38, which lies in the range of [0, 1] where 0 represents a perfect signal reconstruction. We see that for RMSE*≤0.5, the reconstructions visually fit the peaks and troughs, although much of the shorter-scale fluctuations are not captured.

Fig. 6: Time evolution of cargo flow in Southeast Asia by vessel type and flow direction.
Fig. 6: Time evolution of cargo flow in Southeast Asia by vessel type and flow direction.The alternative text for this image may have been generated using AI.
Full size image

Time series showing (ad) outgoing and (eh) incoming cargo flows for different vessel classes: (a, e) Aframax, (b, f) Panamax, (c, g) Suezmax, and (d, h) Very Large Crude Carrier (VLCC). Gray lines represent original cargo flow data; magenta lines show reconstructed signals via dynamic mode decomposition. The normalized root mean squared error (RMSE*) quantifies reconstruction accuracy for each time series; lower RMSE* values indicate a better fit to the data.

The amplitude Aj is interpreted as the overall contribution of mode j to the dynamics; modes with larger Aj are more important. The spatial components ϕn,j represent the individual contributions of the regional flow segments n to mode j, and the complex argument of ϕn,j introduces a phase shift that controls when the peaks in each cycle occur. Thus, the combined term Ajϕn,j in Eq. (2) is interpreted as the amplitude of the regional flow segment n. The temporal frequencies are more easily understood as cosine oscillation periods τj = 2π/ωj. Note that, since DMD modes come in complex conjugate pairs with identical contributions to the reconstruction, we interpret our results in terms of these pairs.

Our decomposition uncovers a dominant annual cycle that governs global tanker movements. The scalar amplitudes and oscillation periods from the decomposition are shown in Fig. 7, where the mode index j is ordered in decreasing period lengths. The dominant mode-pair exhibits a striking 51.5-week oscillation period with 16% peak-to-trough amplitude variations across regional flows. To strengthen the validity of this finding, we applied two alternative periodicity extraction methods: the Fourier transform and Multi-channel Singular Spectrum Analysis (MSSA)39. Both methods independently recovered dominant periods of 48.0 weeks (95% CI: 47.1−49.9 weeks) via Fourier analysis and 49.5 weeks via MSSA, confirming the validity of our findings (see Supplementary Figs. 9, 10, and 11).

Fig. 7: Amplitude Aj against period τj from dynamic mode decomposition.
Fig. 7: Amplitude Aj against period τj from dynamic mode decomposition.The alternative text for this image may have been generated using AI.
Full size image

Twelve modes were chosen for the decomposition, which were merged into six as each mode has a complex conjugate pair.

This yearly seasonal cycle firmly matches our prior expectations, as the natural seasons shape physical and socio-economic forces that drive the maritime oil trade. There are several direct factors that affect the movement of ships. In particular, the economic supply and demand of crude oil is driven primarily by the transportation sector, with secondary influences from the petrochemical industry and the energy sector40,41. Analysis of global crude oil consumption33 also confirms annual seasonality in consumption patterns. Our primary focus lies on interpreting this mode pair; we present the rest of our results in Supplementary Figs. 1216.

Our results reveal that different ship classes operate with distinct regional footprints, creating specialized seasonal trading patterns across the globe. Analysis of regional contributions to annual seasonality in each ship class, visualized in Fig. 8, identifies where each vessel class concentrates its seasonal operations, with incoming and outgoing flow directions revealing the critical import and export hubs driving global trade cycles. The most significant patterns are described below.

Fig. 8: Normalized regional amplitudes by ship class and flow direction.
Fig. 8: Normalized regional amplitudes by ship class and flow direction.The alternative text for this image may have been generated using AI.
Full size image

Amplitudes, indicated by hue intensity, are normalized in each map by dividing actual amplitudes by the maximum observed amplitude within the same map. Maximum observed amplitudes by vessel class: a, e Very Large Crude Carrier or VLCC (291.7 incoming, 238.7 outgoing), b, f Suezmax (102.9 incoming, 99.0 outgoing), c, g Aframax (99.6 incoming, 124.8 outgoing), and d, h Panamax (38.2 incoming, 34.4 outgoing). Amplitude units are in 10,000 deadweight tonnage.

For incoming flows, VLCCs exhibit the most regional concentration in seasonal amplitude variation, seen in Fig. 8a, with China dominating the annual oscillations. This concentration reflects both the dedicated infrastructure required for the largest vessels and their substantial cargo capacity. These factors draw VLCCs to regions with the highest demand—primarily economically developed areas in the northern hemisphere with port facilities capable of accommodating these massive ships.

The geographical distribution of high incoming flow amplitude regions becomes increasingly dispersed for smaller vessels, reflecting their greater operational flexibility and broader port accessibility. Suezmax vessels (Fig. 8b) maintain strong seasonal amplitudes around China but show their most prominent activity in Southeast Asia, with notable contributions from South America. Aframax vessels (Fig. 8c) demonstrate further geographical spread encompassing much of Asia and Central America, while Panamax vessels (Fig. 8d) exhibit the most even distribution of amplitudes across the globe. The prominence of Western European and US ports for Panamax operations may reflect port size limitations in these regions. The amplitude scale factors (VLCC: 292, Suezmax: 103, Aframax: 100, Panamax: 38) reveal that a significant portion of seasonal demand is supplied by VLCCs, particularly toward China, while Aframax and Panamax vessels support most of the remaining global seasonal demand.

The outgoing flow patterns reveal complementary geographical specializations shaped by both resource distribution and maritime infrastructure constraints. VLCC exports (Fig. 8e) show dominant seasonal amplitudes primarily in the Middle East, followed by South America and Southeast Asia. While the Middle East and South America represent major crude oil production centers, Southeast Asia’s prominence reflects its role as a major crude oil trading and storage hub rather than production, serving as a redistribution centers for the Far East market. Suezmax (Fig. 8f) exports are primarily driven by West Africa, the Mediterranean, and the Middle East—a pattern reflecting how Suezmax vessels, designed as the largest ships capable of Suez Canal transit, can efficiently serve seasonal Asian demand.

Aframax (Fig. 8g) exports remain concentrated in the Middle East, while Panamax (Fig. 8h) exports show peak amplitudes in the western United States, followed by Western Europe, West Africa, and China. The western United States dominance in Panamax exports suggests these vessels transited the Panama Canal eastward to access Atlantic regions42. These amplitude distributions demonstrate how size constraints interact with port infrastructure limitations and geographical resource distributions to create distinct seasonal flow patterns across the global shipping network.

Additionally, the DMD analysis on phase shifts uncovers patterns of synchronous and anti-synchronous oscillations between ship classes and between regions, illustrated in Fig. 9. We find that few regions exhibit strong synchronicity across the ship classes, as observed through the clustering (or lack thereof) of markers along rows in Fig. 9. Notably, only China has the phase shift of all ship classes being aligned around a seasonal peak in February/March, or π/2 corresponding to an offset of 3 months, (see Fig. 9a). Other notable regions (Middle East and South East Asia) show moderate synchronicity, where only the VLCC outgoing flows display a distinctly different phase shift, highlighted in Fig. 8b, c.

Fig. 9: Phase-amplitude of cargo flow for each region.
Fig. 9: Phase-amplitude of cargo flow for each region.The alternative text for this image may have been generated using AI.
Full size image

The color and marker size indicate the ship class and amplitude, respectively. The phase can be interpreted as where in time the first peak is by reading the top axis (note the direction of time is right to left), or the angular shift θ in \(\cos (\theta+2\pi /\tau )\) for τ = 51.5 weeks by reading the bottom axis. Dotted outlines identify phase clusters in regions of interest: a China, b Middle East, c South East Asia, and d across multiple regions.

We also observe clustering of multiple regions around the start of the calendar year, i.e., February/March (see Fig. 9d), corresponding to the winter months of the Northern hemisphere. Indeed, these include prominent developed economic northern regions in Europe and Asia, which explains their ability to import in large quantities. Interestingly, South East Asia, while not a northern region, is largely in sync with this particular seasonal cycle, potentially due to their strong trading relationship with their northern Asian partners. Meanwhile, contributors to the outgoing flow (such as the Middle East) experience a peak around June/July or −π/2, anti-synchronous to the winter peaks. American seasonal patterns (USG, USW, and USA) do not appear to be in sync with the rest of the northern regions, potentially explained by their increasing independence in crude oil production40.

Discussion

We demonstrated that moving beyond time-aggregated static network analysis reveals previously undetected patterns in maritime tanker networks at the individual vessel and fleet level. By applying sequential motif analysis, we uncover links between individual routing strategies and operational efficiency, while DMD shows seasonal spatio-temporal patterns in the dynamic global flow of cargo.

At the individual vessel level, our sequential motif analysis reveals that strategic route diversification correlates with superior operational routing performance across all ship classes. Top-performing ships favor ABC motifs (regional exploration) over AAA motifs (single-region focus) and ABA motifs (repeating round trips), even after controlling for ship age and stratifying by ship class, suggesting genuine strategic advantages beyond economies of scale or vessel-specific technical characteristics. While a definitive causal relationship warrants further investigation, we interpret this correlation as reflecting responsiveness to regional price differentials and cargo availability. Unlike container ships with fixed routes, tankers operate in charter-based markets where strategic flexibility enables exploitation of market opportunities across different regions, often manifesting as ABC motifs. Our findings suggest that fleet operators practicing regional specialization may improve performance through greater strategic diversification and market responsiveness. We note that AAA motifs encompass heterogeneous routing behaviors (i.e., from repetitive shuttles between specific port pairs to diverse movements across multiple ports within a region), which our regional aggregation cannot distinguish; future sub-regional analysis could clarify whether diversification benefits persist at finer spatial scales.

At the system level, our DMD analysis identifies pronounced seasonal patterns in regional cargo flows with quantifiable periods and phase relationships between maritime regions. These seasonalities represent strong underlying structural trends; however, they can also be obscured by short-term fluctuations week-to-week due to non-linear drivers—including market volatility, port congestion, weather delays, or disruptive events—which manifest as high-frequency noise. Our findings should therefore inform strategic planning over longer horizons. For fleet operators, this enables strategic positioning towards regions experiencing increasing seasonal supply or demand, opening opportunities for vessels willing to explore markets beyond their traditional operational areas. For port and governmental authorities, these patterns enable proactive planning for quarterly import/export variations, including timing infrastructure upgrades during off-peak periods, allocating seasonal workforce and berth capacity, and coordinating logistical resources to match anticipated cargo.

These findings establish strong ties between routing performance incentives and routing diversification; the preference for exploration over regional concentration among high-performing vessels provides actionable insights for fleet optimization strategies that can outperform competitors in revenue generation versus costs incurred. Furthermore, our new framework closes methodological gaps explicitly identified by Álvarez et al.9, who emphasized the need for “deeper comparison among connected ports in terms of performance indicator” and urged researchers to “check the interactions between economy evolution in different regions and the maritime routes”.

However, several limitations should be acknowledged. In the absence of ship-specific operational data, we constructed a performance indicator using the durations of laden and ballast legs. While these durations are recorded precisely in the data, they include port waiting times stemming from factors outside operator control (e.g., port congestion, administrative delays), which ideally would be isolated to strengthen inferences about routing strategy efficiency. With the availability of additional data—by including route-specific details such as the distance traveled and freight rate realized, ship-specific details such as the average speed and fuel consumption rate, or environmental conditions such as wind speeds or wave heights43—researchers could construct alternative performance indicators or EEOI proxies44, including CO2-based indicators, enabling more direct evaluation of individual vessel routing decisions’ environmental impact alongside economic efficiency. Alternatively, such enriched datasets would enable fleet-level comparative analysis through multivariate approaches that simultaneously control for technical, environmental, and operational factors while isolating routing strategy effects.

While the Bagging-Optimized DMD method incorporates sub-sampling to improve robustness against noise, our approach assumes an additive linear seasonal model45, and deliberately excludes nonlinear drivers such as oil price shocks, policy changes, and geopolitical tensions or conflicts. Nonetheless, our primary objective was to establish whether underlying seasonal components exist and can be recovered despite system noise—and indeed we successfully demonstrate both. Future research could develop more predictive models that explicitly incorporate disruptive events alongside the baseline seasonal patterns we have established.

Our methodological choices, together with the pre-COVID time frame of our dataset (2016 January to 2020 February), limit the direct generalizability of the specific spatial and temporal patterns we report. Nevertheless, our dual-scale framework is fully deployable on post-COVID data, where comparisons to pre-COVID data can reveal whether major disruptions have fundamentally reshaped seasonal trading behavior or whether the system has reverted to its pre-disruption state.

Our methodology extends beyond oil tanker applications. Sequential motif analysis applies directly to dry bulk carriers and other charter-based shipping or transport sectors with flexible routing patterns. The DMD component could be applied to any transportation system with constrained capacity and regional flows, including trucking networks or airline cargo operations, while the general framework of linking individual behavioral patterns to system-wide efficiency outcomes has broad applicability across complex networked systems.

Methods

Data structure

The dataset contains a list of laden legs (single trips where the ship is loaded with cargo), detailing the origin/destination locations and their respective departure/arrival times, as well as an identifier of the ship that completed the leg. This enables the reconstruction of the entire historical journey of an individual ship, illustrated in Fig. 1, typically consisting of alternating laden and ballast legs. In total, we recovered over 192,000 shipping legs (both laden and ballast), split amongst 33,509 Panamax, 94,840 Aframax, 34,401 Suezmax, and 29,418 VLCC legs.

Ballast leg definition

Ballast legs (where a ship is not carrying cargo) are not explicitly recorded in our dataset; instead, we infer their occurrence between two consecutive laden legs of the same ship: given that a particular ship performs two laden legs from regions a → b and then c → d, this implies a ballast leg from b → c. Consequently, our definition of ballast legs may include other activities not typically related to reaching the next load port. Note that, as the only timestamps recorded are the departure and arrival times of each laden leg, the duration of ballast legs is consequently on average, artificially longer than that of laden legs, as the additional amount of time spent on loading and discharging is not known. Nonetheless, including these port waiting times is logical from an economic perspective, as delays of this type incur further costs to operators without adding to revenues, much like ballast legs do.

Port waiting days adjustment

Our dataset records laden voyage departure and arrival times, but arrival timestamps correspond to berthing rather than port arrival, systematically conflating at-sea time with port waiting time. Since port waiting times vary randomly, this introduces a randomly distributed positive bias that disproportionately affects shorter voyages.

We corrected this systematic bias using a reference dataset with complete temporal information (including separate port arrival and berthing times) covering different vessels and time periods. Analysis of this reference data revealed that port waiting times are approximately independent of voyage duration and appear randomly distributed within each vessel class. Empirically, these distributions are well-characterized by lognormal models, which we fitted using maximum likelihood estimation to obtain vessel-class-specific parameters (μσ) in log-space.

To ensure conservative corrections that avoid introducing bias in the opposite direction, we applied lower-bound estimates based on the mode of each lognormal distribution \(\exp (\mu -{\sigma }^{2})\). This approach identifies, with 95% confidence, the minimum port waiting time that vessels spend at the end of each laden leg. The conservative adjusted at-sea laden durations are obtained by subtracting the correction values from observed laden leg durations (see Supplementary Fig. 2).

Validation analysis demonstrates that LBR performance rankings remain stable across correction levels: mean rank changes are ≤2 positions for VLCCs, Suezmax, and Panamax, and ≤5 positions for Aframax vessels under median corrections (see Supplementary Fig. 3), with smaller changes for modal correction. For multi-event voyages (multiple loads or multiple discharges), we applied the same correction methodology at intermediate port visits to the total port duration from arrival to departure.

Comparison of laden-ballast ratio to EEOI

Performance indicators are useful tools for evaluating the efficiency of a ship’s operations. The ideal performance indicator would accurately pick out the most efficient ships (e.g., high revenue yield, low emissions rate, low incurred costs, etc.) and the least efficient ships accurately. The International Maritime Organization (IMO) encourages the use of the Energy Efficiency Operational Indicator (EEOI), which acts as an industry standard indicator. The EEOI is defined as

$${{{\rm{EEOI}}}}=\frac{{\sum }_{i}{C}_{i}}{{\sum }_{i}{T}_{i}{D}_{i}},$$
(3)

where C is the carbon emissions, T is the cargo tonnage, D is the distance traveled, and i is the index of a leg (which can be either laden or ballast). The carbon emissions C are calculated as ∑αFαcα, where Fα is the quantity of fuel consumed, and cα is the CO2 conversion factor for fuel type α. However, such detailed data are often only available to ship or fleet owners, making direct calculations and comparisons of the EEOI challenging on an industry level. Additionally, the EEOI is particularly sensitive to ship-specific parameters and, as such, is not as useful in comparing ship-independent processes.

To tackle this challenge, we proposed an alternative route performance indicator, the LBR. In our data, the key parameters that are available to us are (i) the load departure time. and (ii) discharge arrival times. Given trajectory J = {(tisi)} where ti are port visit times and si {load, discharge}, we define the set of laden legs L and ballast legs B as

$$L=\{({t}_{i},{t}_{i+1}):({s}_{i},{s}_{i+1})=\,{{{\rm{(load,\; discharge)}}}}\},$$
(4)
$$B=\{({t}_{i},{t}_{i+1}):({s}_{i},{s}_{i+1})=({{\rm{discharge}}},\;{{{\rm{load}}}})\}.$$
(5)

Using the definitions from Eqs. (4) and (5), we compute the durations of the laden legs and ballast legs as

$$\,{{{\rm{For}}}}\,l=({t}_{1},{t}_{2})\in L{{{\rm{}}}}:\Delta {t}_{l}={t}_{2}-{t}_{1}-{{{{\rm{PWT}}}}}_{c},$$
(6)
$$\,{{{\rm{For}}}}\,b=({t}_{1},{t}_{2})\in B{{{\rm{}}}}:\Delta {t}_{b}={t}_{2}-{t}_{1},$$
(7)

where PWTc is the lower-bound estimated port waiting time for vessel class c. The LBR for a given ship is then defined as

$${{{\rm{LBR}}}}=\frac{{\sum }_{l\in L}\Delta {t}_{l}}{{\sum }_{l\in L}\Delta {t}_{l}+{\sum }_{b\in B}\Delta {t}_{b}},$$
(8)

where Δtl is the at-sea duration of laden leg l and Δtb is the duration of ballast leg b.

The LBR measures the proportion of time a ship spends carrying cargo during its journey. This measure captures both the revenue-generating aspect of laden durations and the cost-savings associated with shorter ballast durations. By focusing on temporal patterns—time spent laden versus ballast—it isolates the effectiveness of routing decisions independent of vessel-specific factors such as engine efficiency or fuel type. Ships with higher LBR spend more time performing useful work transporting cargo, or waste less time and energy traveling ballast, or both. As the industry increasingly adopts data-driven operations, this measure provides a practical tool for evaluating and comparing route-based performance between ships or fleets across different ownership groups.

Furthermore, the LBR shares similar properties with the EEOI. First, longer laden legs are beneficial. In the EEOI calculations, the average ratio of fuel consumed per unit distance for laden vs. ballast legs is much smaller than the ratio of tonnage carried (ballast legs have 0 useful tonnage carried). Thus, while laden legs emit more CO2, longer laden legs tend be more efficient in the long run due to the tonnage carried being higher. Second, longer ballast legs are penalized. The ballast leg duration is defined as any time between consecutive laden leg durations. Thus, port waiting times are included in our ballast leg durations. As such, our performance indicator penalizes ballast legs more heavily than the EEOI—this approach reflects a financial perspective where time spent waiting at ports represents the same opportunity cost as time spent traveling ballast. See Supplementary Eqs. (1)(5) for further details on EEOI compared to LBR.

Sequential motif mining

Sequential motifs represent short recurring patterns within trajectory sequences that capture higher-order dependencies in movement data46. Unlike traditional network motifs that focus on structural relationships between ports8,13,47,48,49, sequential motifs analyze the ordered trajectory patterns of individual ships navigating through the maritime network. Selecting an appropriate motif length (i.e., the chosen correlation length) is closely linked to identifying the optimal order in higher-order networks26,27,28, where -hop sub-sequences serve as fundamental nodes of the network.

We determined optimal motif length by evaluating higher-order network models at orders 1–4 (beyond which model complexity becomes computationally expensive) using information-theoretic criteria that balance explanatory power against model complexity: Akaike Information Criterion (AIC)50, Bayesian Information Criterion (BIC)51, and likelihood ratio tests52. Each of these criteria applies different penalties, offering complementary perspectives on the optimal order selection. Parameters were estimated via maximum likelihood fitting of transition matrices on the full dataset for each vessel class.

Across all vessel classes, both AIC and likelihood ratio tests consistently selected order-2 as optimal (see Supplementary Table 1). While BIC’s stricter complexity penalty favored order-1, the order-2 BIC values remained substantially closer to order-1 than to order-3, indicating that order-2 provides meaningful improvement without excessive parameterization. Intuitively, this means that routing decisions depend on the previous location and current location, but extending knowledge further back adds little explanatory power at the cost of dramatically increasing model complexity. For order  = 2, the higher-order network edges represent 3-node trajectory subsequences, which our 2-hop motifs directly capture. This convergence of independent model selection criteria establishes that 2-hop motifs represent the minimal sequence length necessary to capture statistically significant sequential dependencies in tanker routing patterns.

Having established the optimal length, we now formally define sequential motifs. A sequential motif M of length is defined as a sequence of alphabets (m1m2, . . . , m+1) where \({m}_{i}\in {{{\mathcal{A}}}}\) for a finite alphabet set \({{{\mathcal{A}}}}=\{A,B,C,...\}\). Denoting regions as vi and the set of regions as V, a regional sequence S = (v1v2, . . . , v+1) constitutes an instance of motif M if there exists a unique mapping function \(f:V\to {{{\mathcal{A}}}}\) such that f(vj) = mj for 1≤j + 1, with the constraint that mi = mj vi = vj. This formulation ensures that motifs capture topological relationships independent of specific regional identities.

At length 2, there are exactly five possible motif types: AAA, ABA, AAB, ABB, and ABC. For example, the sequence (China → Japan → China → SEA → SEA → Japan) contained the following 2-hop motifs: ABA, ABC, ABB, AAB.

The total number of possible motifs of length follows the Bell numbers, representing unique partitions of  + 1 labels—see https://oeis.org/A000110.

Multivariate motif model

To control for potential confounding by vessel age, we employed multivariate Beta regression for each motif variant across all vessel classes. Beta regression is appropriate for modeling proportional data (e.g., motif prevalence [0, 1]) where observed values are close to the boundary53,54. For each motif-class combination, we model motif prevalence ρM using Beta regression with logit link:

$$\,{{{\rm{logit}}}}\,(\mu )={\beta }_{0}+{\beta }_{LBR}\cdot {{{\rm{LBR}}}}+{\beta }_{age}\cdot {{{\rm{age}}}},$$
(9)

where \(\mu={\mathbb{E}}({\rho }_{M}| {{{\rm{LBR}}}},{{{\rm{age}}}})\) is the conditional mean motif prevalence. The regression yields coefficients βLBR and βage quantifying the strength of association between each regressor and motif prevalence ρM. Statistical significance was assessed via hypothesis tests for βLBR ≠ 0 after applying Benjamini-Yekutieli correction across 40 tests (10 motif variants × 4 vessel classes)31.

To isolate age effects from routing performance effects, we computed age-adjusted LBR residuals by regressing LBR against vessel age. These residuals capture the performance variation unexplained by age alone. Complete results on the Beta regression coefficients, p-values, and age-adjusted effect sizes are detailed in Supplementary Methods and Supplementary Table 2.

Motif mining algorithm

For each vessel journey S = (v1v2, . . . , vn+1) representing a sequence of regions visited, we first extract all consecutive 3-region subsequences Si:i+2 = (vivi+1vi+2) for i = 1, . . . , n − 1, and then apply the topological mapping function to classify each subsequence. Letting κM(S) = count of motif in S, for a collection of journeys Si, the prevalence of motif M is calculated as

$${\rho }_{M}=\frac{{\sum }_{i}{\kappa }_{M}({S}_{i})}{{\sum }_{i}(| {S}_{i}| -2)}.$$
(10)

Dynamic mode decomposition for shipping dynamics

To uncover coherent patterns in our regional shipping data, we employ the DMD method35, a powerful data-driven approach to studying dynamics of spatio-temporal data. DMD has been extended from its origins in fluid dynamics to a range of network data analysis—such as disease spreading dynamics and brain dynamics55,56,57—yet remains unexplored in maritime network analysis despite its suitability for spatiotemporal pattern extraction in constrained systems.

DMD is able to treat spatial and temporal modes simultaneously, making it particularly effective for identifying system-wide trends, something that can be leveraged in networks where the underlying topological connections act as a latent space for the spatial modes. We specifically use bagging-optimized DMD36,37—implemented in Python with the PyDMD package58,59—which improves robustness against noisy data through ensemble averaging. We provide a summary of the DMD output next; see refs. 36,37 for full details and theory.

Given a vectorized time series \({{{\bf{Y}}}}=\left[{{{\bf{y}}}}(1),{{{\bf{y}}}}(2),\cdots \,,{{{\bf{y}}}}(m)\right]\) over N regional flow segments and m time steps, the R rank DMD finds a complex-valued reconstruction

$$\hat{{{{\bf{y}}}}}(t)={\sum}_{j=1}^{R}{A}_{j}{{{{\mathbf{\Phi }}}}}_{j}\exp (i{\omega }_{j}t)$$
(11)

where Aj is a scalar amplitude, \({{{{\mathbf{\Phi }}}}}_{j}=({{\upphi }}_{1,j},{\upphi }_{2,j},\ldots,{\upphi }_{N,j})\in {{\mathbb{C}}}^{N}\) is a complex vector, and \({\omega }_{j}\in {\mathbb{R}}\) is the frequency of the jth mode. The Aj, Φj, and ωj terms are parameters to be fitted. In this case, the real part of \(\hat{{{{\bf{y}}}}}\) is the best-fitted reconstruction of the data obtained by minimizing the Frobenius norm37.

We can decompose the terms inside the right-hand side of Eq. (11) into more intuitive forms. In particular, the real component of the nth region of the jth mode can be written as

$$\Re [{A}_{j}{\upphi }_{n,j}\exp (i{\omega }_{j}t)]={A}_{j}| {\upphi }_{n,j}| \cos (2\pi t/{\tau }_{j}+{\theta }_{n,j}),$$
(12)

where θn,j is obtained from the polar form of the complex number \({\upphi }_{n,j}=| {\upphi }_{n,j}| \exp (i{\theta }_{n,j})\), and τj = 2π/ωj is the period.

In order to apply DMD to our study, we must first get our shipping data in the form of y(t). Our data comes in the form of a set of laden legs. For a given vessel class, a laden leg describes a voyage from region u to region v between the times tu and tv with cargo weight w. This can be represented as a temporal edge (uvtutv). We then define a time step δt and rolling window interval ΔW, where ti+1 = ti + δt. A temporal edge (uvtutv) is then considered active in interval t + ΔW if

$$t\le {t}_{u}\le t+\Delta W\,\,{{{\rm{edge}}}}\; {{{\rm{begins}}}}\; {{{\rm{in}}}}\; {{{\rm{window}}}}$$
(13)
$$t\le {t}_{v}\le t+\Delta W\,\,{{{\rm{edge}}}}\; {{{\rm{ends}}}}\; {{{\rm{in}}}}\; {{{\rm{window}}}}$$
(14)
$${t}_{u}\le t\,\;{{{\rm{and}}}}\,\; {t}_{v}\ge t+\Delta W\,\,{{{\rm{edge}}}}\; {{{\rm{overlaps}}}}\; {{{\rm{window}}}}$$
(15)

Denoting the set of edges active in window t + ΔW as \({{{\mathcal{E}}}}(t,\Delta W)\), the time series of incoming or outgoing cargo flow for region i in time window t is the sum of the cargo weights we of all the active voyages \(e\in {{{\mathcal{E}}}}(t,\Delta W)\):

$${y}_{(i,in)}(t)={\sum }_{(u,v,{t}_{u},{t}_{v})=e\in {{{\mathcal{E}}}}(t,\Delta W)}{\delta }_{i,u}{w}_{e}$$
(16)
$${y}_{(i,out)}(t)={\sum }_{(u,v,{t}_{u},{t}_{v})=e\in {{{\mathcal{E}}}}(t,\Delta W)}{\delta }_{i,v}{w}_{e}$$
(17)

where δi,u = 1 if i = u else 0 and we is in units of 10,000 deadweight tonnage (dwt). By definition then, ballast legs are not included. Thus, y(iin)(t) is the total cargo arriving at region i at time t, and y(iout)(t) is the total cargo departing from region i at time t.

An additional pre-processing step is to remove linear trends from y(t), as the DMD method can only recover sinusoidal and exponential terms. While the DMD algorithm may still converge without this step, it tends to fit eigenmodes with period the duration of the data set to approximate a linear term. This can obscure the functionality and interpretability of results, and thus it is better to remove it beforehand. We apply a linear regression in time to each time series, such that

$${\widetilde{y}}_{i}(t)={y}_{i}(t)-{r}_{i}t-{k}_{i}$$
(18)

where ri and ki are gradient and constant terms from a least-squares straight line fit. This brings our work in line with an additive seasonal model45, where the original signal \({{{\bf{y}}}}({{t}})={{{\bf{L}}}}(t)+{{{\bf{I}}}}(t)+\widetilde{{{{\bf{y}}}}}(t)\), where L(t) is a long-term linear trend, and I(t) are irregular terms. The component \(\widetilde{{{{\bf{y}}}}}(t)\) is composed of both the “seasonal” and “cyclical” terms, which in ref. 45 corresponds to periodicities of <1 year and >1 year, respectively. However, in our case, we do not distinguish between the two.

Peak-to-trough relative amplitudes

To quantify the relative significance of an individual mode to each time-series, we need a standardized measure that accounts for the varying scales of cargo flows across different regional flow segments. To do this, we define the peak-to-peak relative amplitude, which normalizes each mode’s peak-to-peak amplitude by the total variability observed in the original time series:

$${{{{\rm{Relative\; Amplitude}}}}}_{n,j}=\frac{2{A}_{j}| {\upphi }_{n,j}| }{\max ({y}_{n})-\min ({y}_{n})}$$
(19)

where Ajϕn,j represents the amplitude of mode j in regional flow segment n, and \(\max ({y}_{n})-\min ({y}_{n})\) is the peak-to-peak range of the original time series for regional flow segment n. The factor of 2 converts the cosine amplitude to peak-to-peak amplitude. An additional factor of 2 is introduced when considering the relative amplitude of a mode pair, since DMD modes come in complex conjugate pairs that both contribute to the reconstructed signal.

A larger value in the relative peak-to-peak amplitude indicates that the mode contributes more significantly to the observed fluctuations in that regional flow segment. The standardized nature of this measure also enables calculation of the average significance of each mode across all regional flow segments.

Choice of parameters

DMD requires the selection of several key parameters: SVD rank, time interval, rolling window duration, and number of delay embeddings. We employed a two-stage selection strategy utilizing error-based optimization (for SVD rank) and stability-based selection (for other temporal parameters) to avoid overfitting to data-specific noise.

For time interval (1 week), rolling window duration (6 weeks), and delay embeddings (22), we ensured that the parameter values chosen were statistically stable under partial variation. We systematically scanned parameter ranges while computing eigenvalue distributions for each combination, then calculated pairwise statistical distances (variational distance and Hellinger distance) between eigenvalue distributions across parameter sweeps60. Regions of low statistical distance correspond to stable parameter regimes where extracted eigenvalues remain consistent across parameter perturbations (see Supplementary Fig. 8). For time intervals and rolling window duration, we selected small parameter values to balance eigenvalue stability with maximizing the number of time points available for decomposition. For delay embeddings, we selected the smallest value within the stable regime that ensures the data matrix is tall (more rows than columns), which is required for stable SVD decomposition.

We determined the SVD rank through a two-step process. First, we examined the explained variance, identifying an “elbow point” where the explained variance plateaus. Then we evaluated the root-mean-squared error (RMSE) around this elbow point, corresponding to candidate ranks 6−20. The final rank was chosen such that it provided the largest average marginal improvement in RMSE across a range of delay-embedding (see Supplementary Fig. 6). SVD rank 12 provided the largest average marginal improvement, while higher ranks showed inconsistent performance and diminishing returns. This approach balances reconstruction performance against model complexity to prevent overfitting to noisy signals.

The final parameter choices produced stable modes and eigenvalues that were well within the expected ranges under neighboring parameter choices. In particular, across all parameter combinations, we recovered the dominant annual mode with period 362 ± 10 days. Similar results were found for the other 5 modes; see Supplementary Fig. 7.

Reconstruction errors

The reconstruction errors for the τ = 51.5 mode was evaluated for each combination of region, direction, and ship class using the normalized root mean squared error RMSE*38, shown in the scatterplot in Fig. 10. The RMSE* lies within the range of [0, 1], with 0 indicating a perfect fit and 1 a maximally erroneous fit. We find that regions with larger amplitudes tend to have lower RMSE*, indicating that our interpretations should be focused on high amplitude regions.

Fig. 10: Scatter plot of reconstruction error against amplitude of the dominant mode-pair.
Fig. 10: Scatter plot of reconstruction error against amplitude of the dominant mode-pair.The alternative text for this image may have been generated using AI.
Full size image

Reconstruction error measured using normalized root mean squared error (RMSE*). Each point represents a cargo flow segment. Color indicates vessel class: Panamax, Aframax, Suezmax, and Very Large Crude Carrier (VLCC). Lower values of RMSE* indicate a better fit to the data.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.