A Bayesian perspective on geographic influences in coach decision-making for athlete identification and selection

Johnston, Kathryn; Wang, Yiru; Trace, Jack; Baker, Joseph; Richard, Veronique; Roberts, Alex

doi:10.1038/s41598-025-30791-y

Article
Open access
Published: 11 February 2026

Scientific Reports volume 16, Article number: 8234 (2026)

Abstract

Decision-making science examines not only what choices people make but how and why they make them. This study investigated whether systematic geographical differences exist in athlete selection within a development program. With 4805 athletes from a state-wide talent search program across 16 sports (19 disciplines) in Australia, selection patterns were analyzed between ‘Regional’ and ‘Metropolitan’ (South East Queensland) athletes. Athletes completed anthropometric, physical, and physiological assessments. A Bayesian hierarchical model revealed that Regional athletes face systematically stricter selection thresholds, needing to perform 1.009 standard deviations higher than Metropolitan athletes to be selected (95% CI [0.656, 1.355], P = 100%). This effect size indicates that Regional athletes must perform at approximately the 84th percentile to achieve the same offer probability as a Metropolitan athlete performing at the 50th percentile. The effect varied substantially by sport, from 0.899 SD (Archery) to − 0.386 SD (Athletics 400 m). Notably, 16 disciplines showed no clear disparities for Regional athletes (effects ranging from − 0.216 to 0.211 SD). These findings suggest geographical location creates systematic disadvantages in talent selection, likely driven by economic realities, as supporting Regional athletes costs several times more through travel, accommodation, and facilities. Consequently, Regional athletes may require superior performance across selection criteria to justify these additional costs.

Introduction

The practice of identifying (i.e., creating opportunities to observe and determine ‘fit’ of an athlete to a team/squad/group¹) and selecting athletes (i.e., the process of choosing which athletes are afforded certain developmental opportunities¹) to formal and structured programs, is common in many competitive sporting environments around the world^2,3,4. These practices are designed to find/recruit athletes and streamline available resources in hopes of increasing the number of potentially high-performance athletes while augmenting the support available along the journey towards sporting excellence.

The identification phase typically involves a series of tests, designed to allow athletes to demonstrate their capabilities relative to their peers^5,6. The focus of these tests can vary, spanning an athlete’s physical/physiological ability, technical/tactical/psychological skills, and/or perceptual cognitive capacities^7,8. For example, Gil and colleagues⁷ assessed 194 soccer athletes during a selection camp using anthropometric (e.g., height, weight, skinfold measures) and physiological tests (e.g., endurance running test, sprinting test, jumping height test), illuminating the range of variables that coaches and scouts consider.

In addition to objective testing data, coaches/recruiters may use their own observations to gain more nuanced information about athletes beyond their testing scores. Jokuschies⁹, for example, interviewed five elite soccer coaches and revealed an athlete’s personality (e.g., willingness to improve), perceptual cognitive skills (e.g., speed of perception), motor abilities (e.g., agility in small spaces) and technique (e.g., technical soccer skills) were just some of the subjective criteria the coaches consider during identification and selection phases. In work by Johnston and Baker¹⁰, elite distance running coaches stated that despite speed being an important variable for selection, if an athlete does not ‘fit’ with the rest of the team, the athlete would not be invited to join the team. Work by Roberts et al.¹¹ also highlighted that in addition to the more ‘objective’ (i.e., testing scores) and subjective indicators (i.e., personality and team cohesion), a coach will use their ‘gut’ instinct to make selection decisions. Putting these pieces of evidence together, it would appear coaches and recruiters use a variety of information sources when trying to determine what an athlete can do currently (i.e., performance) and might be able to do in the future (i.e., potential) (see also^11,12). That said, making judgments about an athlete’s suitability to a team/squad remains one of the most complicated and complex tasks in sport^13,14.

Athlete selection is complex (i.e., highly intricate and interconnected), because decisions about an athlete’s suitability for some future state are typically made when athletes are not fully-formed entities¹⁵. More specifically, athletes are usually still developing most facets of their physical, psychological, emotional, mental, and cognitive makeup when selection occurs, undermining the predictive accuracy of these selections. Making it even more challenging, coaches and selectors also need to predict how an athlete will change and adapt within a dynamic environment, where teammates, competitors, coaches, and sometimes even rules or styles of play will change^15,16. Selection is also complicated in the sense of having many parts because coaches consider many factors when making decisions under uncertainty (i.e., when the outcome is not always knowable) and are operating under various system constraints. These factors may include coaches’ preferences, goals, risk profiles, beliefs, and biases at any given time¹⁷. While studies have explored various decisions athletes make (e.g., when to pass, when to shoot, when to speed up), less is known about the decision-making processes and procedures coaches and selectors use, especially in the context of athlete selection.

In other fields, such as medical science, social science, politics, and economics, examining how decision makers craft judgements and make decisions under uncertainty is a growing area of exploration¹⁸. Primarily, the focus of this research has been rooted in understanding the ‘quirks’ of the mind when tasked with decision problems (often in controlled environments with knowable outcomes), the mechanisms leading to those quirks, and the possible ways in which they can be mitigated. Importantly, it can be difficult to determine what is quirky and what is rational in decision-making situations, as some individuals will value certain pieces of information or possible outcomes over others. In athlete selection, for example, while it can be presumed that coaches and recruiters will select the best performing athletes, in reality, athletes who are not the top performers are sometimes selected for a variety of reasons, such as assumptions about an athlete’s ‘fit’ to the team (as mentioned above), the needs of the team (a specific role or position to be filled), the size and maturation of the athlete, and environmental constraints that some individuals face. Another potential reason relates to where an athlete lives (i.e., Metropolitan vs Regional areas), their access to training and coaching opportunities in different geographical regions, and the cost to integrate that athlete in training and development programs.

Although not widely examined, geographical characteristics have been purported to be a key component of athlete development^19,20,21,22. Primarily, research in this area has focused on birthplace effects; that is, how the population size of the city/town that an athlete was born in affects their likelihood of sporting success. Early work by Curtis and Birch²³ postulated that population regions that are ‘medium-sized’ (i.e., ranging from 1000 to 500,000) were most advantageous for Canadian and United States Olympic ice-hockey players and Canadian National Hockey League (NHL) players. Extending this work, Côté et al.¹⁹ examined the size of the community and the likelihood of athletes reaching elite levels (i.e., NHL, National Basketball Association, Major League Baseball, and Professional Golfers’ Association), where athletes in the smallest rural communities (e.g., fewer than 1000 residents) or the biggest communities (e.g., greater than 500,000 residents) appeared to be disadvantaged¹⁹. In subsequent work, scholars considered not only population size in a community, but also population density^20,24. For instance, work by Smith and Weir²⁴ indicated that while medium-sized communities (i.e., 10,000–249,999 inhabitants) provided the best odds of participation and continued engagement for female soccer players, less densely populated communities (i.e., 50–<400 population/km$^2$) appeared to be ideal for facilitating participation at age 10 years, but not at age 16. This work helps to highlight the distinct implications that community size and population density may have for athletes during different stages of development.

While prior work offers some insight into the geographical influences shaping athlete participation, concerns have been raised about the value of examining where an athlete was born (e.g., birthplace), or where an athlete lived at the time of investigation (e.g., cross-sectional designs), as these measures may be too reductionist to capture the complex relationship between geography and athlete development. Hancock et al.²⁰ recognized that birthplace alone is unlikely to play a critical role in determining the realization of sport expertise, calling birthplace a proxy for describing different types of developmental environments, experiences, and opportunities. To explore this, Rossing et al.²⁵ and Farah²⁶ examined an athlete’s proximity to their nearest elite training facility, whereas Curran^27,28 examined migration patterns of athletes. These are important steps towards uncovering the geo-political-cultural factors shaping an athlete’s development, but there remains much to be uncovered. For instance, little is known about whether and how geographical considerations influence athlete selection decisions.

Current study

To explore the impact of geography on athlete selection, the present study investigated selection patterns in one of the world’s largest and most robust talent/athlete identification programs. Specifically, this work sought to quantify whether selection patterns revealed different requirements based on geographical location. While coaches specified their selection criteria (i.e., desired test performance, anthropometrics, and participation preferences), we investigated whether the data show that Regional athletes (i.e., athletes from communities outside of the South East Queensland [SEQ] Metropolitan areas) must exceed these stated criteria by larger margins than SEQ Metropolitan athletes to receive comparable opportunities.

Our primary research question was: How much better do Regional athletes need to perform compared to Metropolitan (SEQ) athletes to get selected; and does this pattern vary across different sports? Using a hierarchical Bayesian approach, we aimed to: (1) estimate the difference in empirical selection thresholds between Regional and SEQ athletes, using coaches’ stated criteria as priors, (2) quantify our certainty about these threshold differences, and (3) identify which sports showed the strongest evidence of location-based disparities in selection patterns. Based on prior research indicating geographic impacts on athletic development and the economic realities of supporting geographically dispersed athletes, we expected the empirical selection thresholds, which are derived from actual selection patterns rather than stated criteria, would reveal that Regional athletes need to demonstrate superior attributes across multiple dimensions to receive offers.

Methods

Participants

Athletes

The present study was performed in accordance with relevant guidelines and regulations. Ethics approval was granted by the University of Queensland ethics board (2024/HE002237), and informed consent was obtained from all subjects and/or their legal guardian(s). Participants included applicants (n=4805, M_age=14.29±2.04) who attended a testing session for a multi-sport talent identification program in Queensland, Australia. Approximately one-third of athletes (n=1665) lived in Regional areas of Queensland (see Table 1 for a demographic breakdown).

Table 1 Overview of athlete demographics.

Selectors

The selectors in this sample included coaches, pathway managers and/or high-performance managers from 19 Olympic sports/disciplines. The sports/disciplines included in the program had an average of two selectors involved in the selection process per sport. All selectors were representatives of their relevant State or National Sporting Organization. Each sport was supported in their selection process by a dedicated Queensland Academy of Sport (QAS) employee with expertise in athlete development whose role in the selection process was to be a ‘critical friend’ for selectors and to assist with contextualizing results.

QAS employees were in the room during selection and may have been influential in some cases; however, their role was to (1) collect the data; (2) collate the data; (3) present the data back to the selectors; (4) check and challenge the selectors’ selection processes; and (5) record the selection processes and decisions.

YouFor2032

The QAS YouFor2032 program aims to provide an alternative (i.e., ‘talent transfer’) mechanism for athletes to enter Olympic and Paralympic pathways to help fill key gaps within the talent pipelines of partnered Olympic and Paralympic sports. YouFor2032 consists of four talent search initiatives: Olympic sport search (athletes aged 13–23), Paralympic sports (athletes aged 13–30), Daredevils (athletes aged 8–15 for the action sports of BMX Freestyle, diving, and skateboarding), and Thematic search (targeted age ranges and key demographics to fill niche gaps in partner sports).

YouFor2032 was created with seven phases: (1) talent strategy, (2) talent recruitment, (3) talent detection, (4) talent selection, (5) talent confirmation, (6) talent development, and (7) talent integration. The present study focused only on phases 4 to 5 within the Olympic sport search for the first two athlete intakes, looking primarily at the selection behaviour of coaches following athlete testing during talent detection. For more information please see YouFor2032²⁹.

Athlete testing procedure

As part of the first step of the talent detection phase, athletes registered for physical testing sessions through an online portal, which explained the YouFor2032 program. Athletes (and/or their parents, depending on their age at the time of registration) provided consent for participation and data collection, and an athlete profile (including demographics, sport history, and sporting preferences) was collected. Athletes then attended a testing session in which anthropometric and physiological information was collected. These sessions occurred in over 40 locations across Queensland, in both Metropolitan and Regional areas. For more detail on recruitment, athlete profile information, and testing protocols, please see YouFor2032²⁹. For a list of the test names and information both collected and derived, alongside their descriptions and measurement units, please see Table A1.

Prior to selections, selectors (i.e., coaches etc.) provided a list of testing parameters that would indicate potential success in their sport, which was used to guide selection discussions and decisions. For example, aerial skiing indicated that the ideal athlete needed to be lower than the 50th percentile in height, have a vertical jump greater than the 95th percentile, and to have an acrobatic sport background. Selectors would also indicate the approximate number of athlete spots available (ranging from 2–100), which varied depending on the sport’s needs (i.e., certain positions or roles to fill), financial capacity, and/or coaching capacity for that year.
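As an illustration of how stated criteria of this kind can be applied mechanically, the sketch below encodes the aerial skiing example as a simple filter. The `Athlete` fields, the `"acrobatic"` label, and the three candidate records are hypothetical and are not taken from the program's data.

```python
from dataclasses import dataclass


@dataclass
class Athlete:
    height_pct: float         # height percentile (0-100)
    vertical_jump_pct: float  # vertical jump percentile (0-100)
    background: str           # prior sport category (hypothetical field)


def meets_aerial_skiing_criteria(a: Athlete) -> bool:
    """Apply the three example criteria described in the text."""
    return (
        a.height_pct < 50              # below the 50th percentile in height
        and a.vertical_jump_pct > 95   # jump above the 95th percentile
        and a.background == "acrobatic"
    )


# Made-up candidates for illustration only.
candidates = [
    Athlete(42.0, 97.5, "acrobatic"),  # meets all three criteria
    Athlete(42.0, 90.0, "acrobatic"),  # vertical jump too low
    Athlete(60.0, 99.0, "acrobatic"),  # too tall
]
shortlist = [a for a in candidates if meets_aerial_skiing_criteria(a)]
print(len(shortlist))
```

In the program itself, athletes close to the criteria could also be shortlisted, so a strict filter like this is only a starting point for discussion.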

Determining offers to participate in the talent program

A short list of athletes meeting (or close to meeting) the sports’ selection criteria was provided to selectors by the QAS, from which they chose which athletes to invite (i.e., select) into the 12-week talent confirmation phase of the program. During this ‘Confirmation Phase’ (approximately 3 months in duration), athletes were provided the opportunity to trial the new sport(s) they had been selected for, at no cost. Importantly, athletes were able to be selected for multiple sports if they met the sports’ selection criteria.

Determining regionality status

Queensland is the second largest state in Australia (1,727,000 km$^2$), with a population of just over 5.1 million people. Approximately two-thirds of the population (approximately 3.7 million) live in the 35,000 km$^2$ South East Queensland (SEQ) area, which is the major metropolitan center of the state with a population density of 108 people/km$^2$³⁰. Areas outside of SEQ are considered ‘regional Queensland’, with 1.4 million residents across approximately 1.69 million square kilometers.

Athletes who resided outside of SEQ were considered “Regional” athletes. To determine regionality status, participant suburbs collected during the registration process were first categorized into broader local government area groups before being classified as “Metropolitan” (i.e., “Metro” or “SEQ-based”) or “Regional” (i.e., non-SEQ)³⁰. Accordingly, suburbs within 11 local government areas (LGA) around Brisbane were classified as “SEQ” and the remaining were described as “Regional”. One exception was the Toowoomba LGA, of which only part was considered SEQ³⁰. To aid the analysis, all suburbs in the Toowoomba LGA were recorded as “Regional” for the purposes of this paper, given that the area is considered a ‘regional centre’ within other government models³¹.
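The two-step classification described above (suburb to LGA, then LGA to location label) can be sketched as a pair of lookups. The LGA names and the suburb table below are illustrative assumptions only, since the text does not list the 11 SEQ LGAs; a real lookup would come from the registration data and the cited government classification.

```python
# Illustrative subset: the text says 11 LGAs around Brisbane count as SEQ but
# does not name them, so these entries are assumptions for the example.
SEQ_LGAS = {"Brisbane", "Gold Coast", "Ipswich"}

# Hypothetical suburb -> LGA lookup.
SUBURB_TO_LGA = {
    "South Brisbane": "Brisbane",
    "Southport": "Gold Coast",
    "Highfields": "Toowoomba",
    "Mount Isa": "Mount Isa",
}


def regionality(suburb: str) -> str:
    """Return 'Metropolitan' for SEQ suburbs and 'Regional' otherwise."""
    lga = SUBURB_TO_LGA[suburb]
    # Toowoomba is deliberately absent from SEQ_LGAS: the paper records the
    # whole Toowoomba LGA as Regional even though part of it lies in SEQ.
    return "Metropolitan" if lga in SEQ_LGAS else "Regional"


for suburb in SUBURB_TO_LGA:
    print(suburb, regionality(suburb))
```

Keeping the Toowoomba rule as an explicit omission from the SEQ set makes the exception easy to audit or reverse in a sensitivity check.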

Analyses

Distinguishing selection criteria from selection thresholds

It is important to distinguish between selection criteria and selection thresholds in this analysis. Selection criteria refer to the coach-defined requirements that athletes must meet to be considered for selection, encompassing physical performance metrics (e.g., vertical jump > 95th percentile), anthropometric characteristics (e.g., height > 50th percentile), age ranges (e.g., 13–17 years), and sport preference levels (e.g., ‘willing to try’ or higher). These criteria were explicitly stated by sport representatives prior to athlete testing and were applied uniformly to all athletes regardless of location. Selection thresholds, in contrast, represent the actual constellation of characteristics, including performance metrics, anthropometrics, preferences, and demographic factors, that athletes demonstrated to receive offers, as revealed through empirical analysis of selection outcomes. While selection criteria are the stated requirements, selection thresholds reflect the implicit multidimensional benchmarks that emerge from actual selection decisions. Our analysis examined whether these thresholds differed systematically between Regional and SEQ athletes, even when both groups were evaluated against the same stated criteria. For instance, if Regional athletes who received offers consistently display superior characteristics across multiple dimensions (i.e., taller height, stronger preferences, better test scores) compared to SEQ athletes who received offers, this would suggest Regional athletes face stricter selection thresholds despite identical selection criteria.

Athlete sport participation preference

As noted in Table A2, during registration athletes were asked to indicate their willingness to participate in each YouFor2032 sport. Specifically, athletes were asked to provide their willingness to participate in each sport using a scale of “absolutely try”, “willing to try”, “sure”, and “will not try”. The proportions of responses for each option by sport and by location (Regional vs SEQ) are represented. Some sports were only available to girls, including Rugby 7s, Soccer Scorer/Keeper, and Sailing Crew.

Offer rate differences after meeting selection criteria

This analysis examined the offer rates for athletes who met the coach-defined selection criteria, comparing outcomes between Regional and SEQ locations. To calculate offer rates, we first applied the sport-specific selection criteria to all tested athletes within each location. The offer rate was then computed as:

$$\begin{aligned} \text {Offer Rate} = \frac{\text {Number of offers extended}}{\text {Number of athletes meeting criteria}} \end{aligned}$$

This calculation was performed separately for Regional and SEQ athlete populations. To quantify the disparity between locations, we calculated the difference in offer rates as (SEQ offer rate - Regional offer rate), where positive values indicate higher offer rates for SEQ athletes.
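A minimal sketch of this calculation follows. The counts are invented for illustration and do not correspond to any sport in the study; the function and variable names are likewise hypothetical.

```python
def offer_rate(offers: int, eligible: int) -> float:
    """Offers extended divided by athletes meeting the stated criteria."""
    if eligible <= 0:
        raise ValueError("no athletes met the criteria")
    return offers / eligible


# Made-up counts for one sport, split by location.
counts = {
    "SEQ": {"offers": 30, "eligible": 40},
    "Regional": {"offers": 12, "eligible": 24},
}

rates = {loc: offer_rate(c["offers"], c["eligible"]) for loc, c in counts.items()}

# Positive values indicate a higher offer rate for SEQ athletes.
difference = rates["SEQ"] - rates["Regional"]
print(rates, difference)
```

Because both rates are proportions, the difference is best read in percentage points rather than as a relative percentage.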

Regional comparisons by test metrics and anthropometrics

We employed Bayesian Estimation Supersedes the t-test (BEST³²) to understand distributional differences in test metrics and anthropometrics between Regional and SEQ athletes. This approach provides rich posterior distributions for all parameters, enabling nuanced interpretation beyond binary significance decisions. Each metric was analyzed independently using PyMC (version 5.20.1) with robust Student’s t-distributions to handle outliers. Full technical details are provided in the Supplemental File.
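For orientation, the generic structure of a BEST-style two-group comparison can be sketched as follows. This is the general form of the technique, not the authors' specification (which is given in the Supplemental File); the symbols $y$, $\mu_g$, $\sigma_g$, $\nu$, and $\Delta$ are notation introduced here, and the priors are omitted because the text does not state them.

$$\begin{aligned} y_{i,g}&\sim \text{Student-}t(\nu ,\ \mu _g,\ \sigma _g), \qquad g \in \{\text{Regional}, \text{SEQ}\} \\ \Delta&= \mu _{\text{Regional}} - \mu _{\text{SEQ}} \\ P(\text{Regional}>\text{SEQ})&= \Pr (\Delta > 0 \mid y) \end{aligned}$$

The heavy tails of the Student's t likelihood (controlled by $\nu$) are what make the comparison robust to outlying test scores, and the last line is one natural reading of a posterior probability that one group exceeds the other.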

Bayesian hierarchical model for athlete selection threshold differences between locations

We developed a hierarchical Bayesian model to quantify the gap between coaches’ stated selection criteria and the empirical thresholds revealed by actual selection patterns. The model addresses our research questions by: (1) incorporating coaches’ stated criteria as informative priors, (2) estimating location-specific deviations from these criteria, and (3) quantifying uncertainty through full posterior distributions. Simple t-tests could have been used to compare mean performance between geographic locations (e.g., Regional vs SEQ), but they would ignore the nested structure of athletes within sports and the reality that different sports ask for and use different characteristics (e.g., rowing wants bigger athletes whereas skateboarding is looking for smaller ones). Similarly, we could have used standard logistic regression to model selection probability, but this would not have accounted for sport-specific variations or incorporated coaches’ stated criteria. Frequentist mixed-effects models could handle the hierarchical structure but would not naturally incorporate prior information about expected thresholds. The hierarchical Bayesian approach was chosen because it: (1) handles the nested data structure, (2) incorporates coaches’ stated criteria as informative priors, and (3) provides uncertainty quantification that directly addresses our research questions.

The hierarchical structure borrows strength across sports to improve estimates for those with smaller sample sizes while separating global patterns (systematic regional bias across all sports) from sport-specific variations. This directly addresses our aim to identify which sports show the strongest location-based disparities.
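The idea of borrowing strength can be illustrated with the textbook normal-normal shrinkage formula, in which a sport's estimate is a precision-weighted average of its own mean and the global mean. This is a simplified stand-in for partial pooling in general, not the model fitted in the paper; all parameter names and numbers are hypothetical.

```python
def partially_pooled(sport_mean: float, n: int, sigma: float,
                     global_mean: float, tau: float) -> float:
    """Precision-weighted compromise between a sport's mean and the global mean.

    sigma: assumed within-sport standard deviation of observations
    tau:   assumed between-sport standard deviation of true sport effects
    """
    precision_data = n / sigma**2   # information carried by the sport's data
    precision_prior = 1 / tau**2    # information carried by the global pattern
    weight = precision_data / (precision_data + precision_prior)
    return weight * sport_mean + (1 - weight) * global_mean


# Same raw sport mean, different sample sizes (made-up values).
small_sport = partially_pooled(2.0, n=4, sigma=1.0, global_mean=1.0, tau=0.5)
large_sport = partially_pooled(2.0, n=100, sigma=1.0, global_mean=1.0, tau=0.5)
print(small_sport, large_sport)
```

With few observations the estimate is pulled strongly toward the global mean, while a well-sampled sport keeps an estimate close to its own data, which is the behaviour the paragraph above describes.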

Model diagnostics indicated excellent convergence with all parameters achieving ${\hat{R}} < 1.01$ and effective sample sizes exceeding 1600 (400 per chain). The global Regional disadvantage was estimated with high precision (standard errors below 0.2 SD), though precision varied across sports depending on sample size. Sports with fewer than 10 offers in specific region combinations exhibited wider uncertainty bounds, which the hierarchical model addressed through partial pooling. Complete model specification and implementation details are presented in the Supplemental File.
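For readers unfamiliar with the ${\hat{R}}$ diagnostic, the sketch below computes the classic (non-split) Gelman-Rubin statistic from several chains. Modern samplers typically report a rank-normalized split variant, so this is a simplified illustration rather than the computation used in the paper; the chains are synthetic.

```python
import random
from math import sqrt
from statistics import mean, variance


def r_hat(chains: list[list[float]]) -> float:
    """Classic Gelman-Rubin potential scale reduction factor."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


rng = random.Random(1)
# Four synthetic chains of 400 draws each, all targeting the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(400)] for _ in range(4)]
# Same setup, but one chain is stuck in a different region.
stuck = mixed[:3] + [[rng.gauss(3.0, 1.0) for _ in range(400)]]
print(r_hat(mixed), r_hat(stuck))
```

Values close to 1 indicate that the chains agree with one another; a chain exploring a different region inflates the between-chain variance and pushes the statistic well above the 1.01 cut-off mentioned above.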

Sport-metric level analysis

To complement the hierarchical model’s quantitative estimates and to provide an intuitive interpretation of Regional disadvantages, we examined the actual performance distributions of athletes who received offers at the sport-metric level. For each sport, we analyzed both anthropometric and performance metrics alongside preference levels. Anthropometric and performance metrics were visualized using boxplots to show full distributions, while preference levels were visualized using heatmaps showing the proportion of offered athletes at each preference level. These descriptive analyses illustrate the practical meaning of standardized effect sizes from the model and reveal metric-specific patterns that aggregate sport-level estimates may obscure.

Sensitivity analysis

To assess the robustness of our findings, we conducted comprehensive sensitivity analyses examining the stability of the estimated Regional disadvantage parameter ($\mu _\delta$). We tested model performance across multiple random seeds (42, 142, 242), varied prior specifications (skeptical priors with halved standard deviations and diffuse priors with doubled standard deviations), and restricted the analysis to sports with larger sample sizes. Full technical details and convergence diagnostics are provided in the Supplemental File.

Results

Athlete sport participation preference

The top three most selected sports (by ‘most willing to try’) among Regional athletes were Athletics (400 m) (36%), Beach Volleyball (35%), and Athletics (Throws) (29%), whereas the top three among SEQ athletes were Beach Volleyball (39%), Athletics (400 m) (37%), and Triathlon (29%). Overall, the participation interest across sports was similar for Regional and SEQ athletes. For a visual depiction of the proportion of responses for each option by sport and by location (Regional vs SEQ), please refer to the heat map in Fig. 1.

Offer rate differences after meeting selection criteria

Analysis of offer rates among athletes who met coach-defined selection criteria revealed systematic disparities between Regional and SEQ locations (Fig. 2). Among eligible athletes, the mean offer rate was 50.3% for Regional athletes compared to 65.3% for SEQ athletes, a difference of 15.0 percentage points (median difference: 13.9 percentage points). This pattern was remarkably consistent across sports, with 16 of 19 sports/disciplines showing higher offer rates for SEQ athletes and three sports showing equal rates. Notably, no sports demonstrated higher offer rates for Regional athletes.

The disparity was most pronounced in Skateboarding (46.9% difference), Archery (40.3%), and Swimming (27.2%). In Skateboarding, for instance, 88.0% of SEQ athletes who met selection criteria received offers compared to only 41.1% of Regional athletes meeting identical criteria. These offer rate differences help to illuminate that meeting selection criteria alone was insufficient to ensure equal selection probability across geographic locations. The consistency of this pattern, with no sports favoring Regional athletes, suggests systematic rather than sport-specific factors driving selection decisions. This evidence supports the need for the hierarchical Bayesian analysis to quantify the implicit performance thresholds that Regional athletes must exceed to achieve comparable selection probabilities.

Regional comparisons by test metrics and anthropometrics

Bayesian analysis of anthropometric and performance metrics revealed distinct regional differences in athlete profiles. Regional athletes demonstrated substantially higher values in Seated Height (P(Regional>SEQ) = 100%), Sitting Height:Height Ratio (100%), Maturation Age (100%), and Peak Height Velocity Index (100%), where P(Regional>SEQ) represents the probability that Regional athletes exceed SEQ athletes on a given metric.
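A probability of this kind is commonly estimated as the share of posterior draws in which one group's parameter exceeds the other's. The sketch below shows that calculation on synthetic draws; the distributions, the seed, and the function name are assumptions for illustration and are not study results.

```python
import random


def prob_greater(draws_a: list[float], draws_b: list[float]) -> float:
    """Share of paired posterior draws in which a exceeds b."""
    pairs = list(zip(draws_a, draws_b))
    return sum(a > b for a, b in pairs) / len(pairs)


rng = random.Random(0)
# Synthetic stand-ins for posterior draws of two group means.
regional = [rng.gauss(0.30, 0.05) for _ in range(4000)]
seq = [rng.gauss(0.00, 0.05) for _ in range(4000)]

p_regional_gt_seq = prob_greater(regional, seq)
print(f"P(Regional>SEQ) = {p_regional_gt_seq:.1%}")
```

Note that a value near 100% expresses certainty about the direction of a difference, not its size, so it should be read alongside the estimated mean difference.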

In contrast, SEQ athletes exhibited superior performance in endurance and power metrics. Specifically, SEQ athletes achieved better Beep Test results (mean difference = − 0.44 levels, P(Regional>SEQ) < 0.1%), higher predicted VO$_2$ max percentiles (− 3.87 percentile points, P(Regional>SEQ) < 0.1%), greater Vertical Jump percentiles (− 2.98 percentile points, P(Regional>SEQ) = 0.1%), and more Max Inclined Pull-ups repetitions (− 0.71 reps, P(Regional>SEQ) = 0.2%).

No meaningful differences were observed in 20 m Sprint performance (time or percentile) or Arm Length measurements between geographic locations, with overlapping posterior distributions and probabilities near 50%, indicating these metrics are similarly distributed across both populations. These findings are visualized in Fig. 3.

Bayesian hierarchical model for athlete selection threshold differences between locations

A hierarchical Bayesian model was employed to simultaneously address two key questions: (1) whether actual selection thresholds aligned with coaches’ stated criteria, and (2) how much better Regional athletes needed to perform compared to SEQ athletes to achieve comparable selection probabilities. The model incorporated coaches’ stated criteria as informative priors while allowing the data to reveal actual selection patterns. Two primary parameters emerged from this analysis: First, the global threshold adjustment parameter ($\mu _{\text {threshold}}$) quantified the overall deviation between stated selection criteria and empirical thresholds. With a posterior mean of − 0.0003 (95% CI [− 0.94, 1.0]), this parameter indicated that actual selection thresholds align closely with coaches’ stated requirements on average, validating the use of coach criteria as a baseline for comparison (see posterior distributions in Fig. B1). Model diagnostics indicated excellent convergence (Fig. B2) with all parameters achieving ${\hat{R}}<1.01$ and effective sample sizes exceeding 1600 (400 per chain).

Second, and more critically for our research question, the global Regional disadvantage parameter ($\mu _{\delta }$) revealed systematic differences in selection thresholds between locations. Regional athletes faced significantly higher thresholds than their SEQ counterparts, with a mean Regional disadvantage of 1.009 standard deviations (95% CI [0.656, 1.355]) and 100% posterior probability that Regional athletes face stricter selection thresholds. This finding directly answers our primary research question: within this sample, Regional athletes must perform approximately one full standard deviation better than SEQ athletes to achieve comparable selection probabilities. In practical terms, this means a Regional athlete at the 50th percentile of a performance metric would need to reach approximately the 84th percentile to have the same selection probability as an SEQ athlete at the 50th percentile. Figure 4 reveals a critical asymmetry: SEQ athletes’ selection thresholds closely match stated criteria (deviation $\approx$ 0 SD), while Regional athletes must systematically exceed these same requirements by $\mu _\delta$ = 1.009 SD (95% CI [0.656, 1.355]).
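The percentile interpretation follows from the standard normal cumulative distribution function: a shift of about one standard deviation above the median lands near the 84th percentile. The short check below assumes standardized performance is approximately normally distributed, which is an assumption of this illustration; the interval endpoints are simply passed through the same transformation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

mu_delta = 1.009           # posterior mean reported in the text (SD units)
ci_low, ci_high = 0.656, 1.355

# Percentile a Regional athlete would need to reach so that the shifted
# threshold matches an SEQ athlete sitting at the 50th percentile (z = 0).
percentile = 100 * std_normal.cdf(mu_delta)
percentile_low = 100 * std_normal.cdf(ci_low)
percentile_high = 100 * std_normal.cdf(ci_high)

print(f"{percentile:.1f} ({percentile_low:.1f} to {percentile_high:.1f})")
```

Under the same normality assumption, the interval endpoints translate to roughly the mid-70s and the low-90s in percentile terms.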

Sports level differences

While the global analysis revealed a systematic disadvantage for Regional athletes, substantial variation existed across sports (Fig. 5). The hierarchical model’s sport-specific bias adjustments ($\beta _s$) quantified how individual sports deviated from the global pattern. Three distinct patterns emerged: first, several sports imposed additional disadvantage beyond the global mean, with Archery (0.899 SD, 95% HDI [0.341, 1.461]) and Sprint Canoeing (0.530 SD, 95% HDI [0.099, 0.983]) showing the strongest bias against Regional athletes. Second, a subset of sports demonstrated more lenient criteria for Regional athletes, notably Athletics 400 m (− 0.386 SD, 95% HDI [− 1.124, 0.350]) and Skateboarding (− 0.388 SD, 95% HDI [− 1.100, 0.351]). Third, the majority of sports (16 of 19) showed effects between − 0.216 and 0.211 SD, clustering near zero and indicating they largely followed the global pattern without substantial sport-specific deviations. This heterogeneity suggests that while a Regional disadvantage is systematic, its magnitude is likely modulated by sport-specific factors such as resource requirements, coaching availability, or selection philosophies.

Sport-metric level differences

To understand how a Regional disadvantage manifests at the metric level, we examined the actual performance and preference distributions of athletes who received offers (Figs. 6 and 7). These empirical patterns validate the hierarchical model’s estimates by revealing the specific attributes where Regional athletes must excel to achieve selection.

For Athletics 400 m, where the model estimated moderate disadvantage for Regional athletes (sport effect: − 0.150 SD), the metric-level analysis revealed nuanced patterns (Fig. 8). Regional athletes who received offers demonstrated substantially higher standing height percentiles than their SEQ counterparts (median 75th versus 60th percentile) and greater predicted height (difference: − 0.943 SD, 95% CI [− 2.082, 0.240]). However, these same Regional athletes showed inferior Beep Test performance (median level 8 versus 10), suggesting coaches may prioritize different attributes when evaluating Regional versus SEQ athletes.

Skateboarding (Fig. 9), which showed moderate evidence of more lenient Regional criteria in the aggregate model (− 0.337 SD sport effect), presented distinct patterns in the raw data. Among offered athletes, Regional athletes were taller (37th versus 29th height percentile) but lighter (15th versus 19th mass percentile) than SEQ athletes. Interestingly, the posterior distributions for these two metrics appear nearly identical despite these observable differences.

Boxing (Fig. 10) demonstrated pronounced metric-specific threshold variations. Despite no clear overall bias (sport effect: − 0.077 SD, P(stricter) = 38.8%), Regional athletes showed superior vertical jump performance (median 25 cm versus 20 cm) while having lower 20 m sprint percentiles (median 20th versus 40th percentile).

Figure 7 displays preference level distributions among offered athletes, revealing that Regional athletes who received offers had often expressed stronger preferences (“absolutely try” versus “willing try”), providing additional evidence of selection threshold differences beyond anthropometric metrics.

These metric-level patterns demonstrate that Regional disadvantage is not uniform across all performance dimensions within a sport. Rather, Regional athletes must strategically excel in specific attributes that coaches value most highly, while potentially compensating for limited training opportunities reflected in lower endurance metrics.

Robustness of regional bias

The sensitivity analysis confirmed the robustness of our primary finding. Across all specifications tested, the mean Regional disadvantage remained consistently positive and substantial, ranging from 0.596 SD (95% HDI [0.244, 0.968]) under diffuse priors to 0.932 SD (95% HDI [0.516, 1.339]) when restricted to large sports only. The baseline estimate of 0.757 SD was highly stable across different random seeds (range 0.757–0.762 SD), and even skeptical priors yielded a meaningful disadvantage of 0.603 SD (95% HDI [0.335, 0.870]). All models achieved satisfactory convergence with no divergences except under diffuse priors, where 200 divergences suggested the model appropriately resisted unrealistic parameter values.
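As a rough cross-check of the sensitivity results, the reported means and intervals can be converted into an approximate posterior probability that the disadvantage is positive. The sketch below assumes each posterior is approximately normal and treats the 95% HDI as if it were a symmetric 95% interval; both are simplifying assumptions, and the function name `prob_positive` is hypothetical. The exact values would come from the posterior draws themselves.

```python
from statistics import NormalDist

# Sensitivity specifications quoted in the text: (mean, HDI lower, HDI upper), in SD units.
specs = {
    "diffuse priors": (0.596, 0.244, 0.968),
    "skeptical priors": (0.603, 0.335, 0.870),
    "large sports only": (0.932, 0.516, 1.339),
}

z95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def prob_positive(mean: float, lower: float, upper: float) -> float:
    """Approximate P(effect > 0) under a normal approximation to the posterior."""
    approx_sd = (upper - lower) / (2 * z95)
    return 1.0 - NormalDist(mean, approx_sd).cdf(0.0)


approx = {name: prob_positive(*values) for name, values in specs.items()}
for name, p in approx.items():
    print(f"{name}: P(disadvantage > 0) is about {p:.4f}")
```

Under these assumptions, every specification places more than 99% of its posterior mass above zero, which agrees with the description of the effect as consistently positive.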

Discussion

Overall, findings indicate that athletes from Regional areas (i.e., Regional athletes) of Queensland, Australia were less likely to be selected into the YouFor2032 program than athletes residing in Metropolitan areas (i.e., SEQ athletes). Results from two cohorts indicated that Regional athletes who met the coaches’ selection criteria (from a physical testing, sporting background, and demographic information perspective) were underrepresented compared to the proportions of Regional and SEQ athletes, across multiple sports. While the exact mechanisms driving this finding were not examined directly in this study, potential explanations are explored below, which may provide a basis for future work in this under-researched area.

Our finding that regional athletes require approximately one standard deviation higher performance for equal selection probability warrants careful consideration. To contextualize this effect size, an athlete performing at the 84th percentile from a regional area would have comparable selection chances to a metropolitan athlete at the 50th percentile. This substantial gap suggests geographic location functions as a meaningful, albeit unrecognized, factor in talent selection decisions. What makes this finding particularly compelling is its consistency across our sensitivity analyses. When we imposed skeptical priors to guard against overestimation, the effect remained meaningful at 0.60 SD. Alternative random seeds produced virtually identical estimates (0.757–0.762 SD), and when we restricted our analysis to sports with larger sample sizes, where selection processes might be more standardized, the disadvantage actually increased to 0.93 SD. These sensitivity analyses (Table A2) confirm the robustness of our main finding of approximately a 1.0 SD disadvantage (Fig. B2). This robustness suggests we are observing a genuine phenomenon rather than a statistical artifact, a systematic barrier that must be considered when evaluating the potential mechanisms behind regional underrepresentation.

A potential mechanism driving this finding relates to the expected cost of sport participation for Regional athletes compared to SEQ athletes. While the sporting organizations that partner with YouFor2032 are provided with funding to help support the costs of the program and to ensure no-cost participation for at least the 3-month confirmation phase that follows an offer, the relative cost of supporting and developing an athlete in regional areas is greater than in SEQ areas. Broadly, training facilities and appropriate-level coaches are more common in metropolitan areas^19,33. The further an athlete lives from the Metropolitan area, the more difficult it is for them to access facilities, equipment and coaching, and the larger the financial commitment required from the sport program to subsidize and support the athlete. Costs for participation may include facility access and coaching fees, equipment hire or purchase, relevant insurance, and/or travel to/from facilities or metropolitan areas for training/camps for both athletes and coaches. Relatedly, the smaller populations in regional areas mean fewer athletes participating in a given sport, which often requires sports to select multiple athletes in order to support their targeted athlete. In this sense, if a coach wanted to accept a Regional athlete, that coach may want to select three or four other Regional athletes in the same area to allow those athletes to train together. This calls into question the relative value and cost of not only the targeted athlete, but also those surrounding the athlete in development³⁴.

To examine this more closely, geographical influences such as transportation accessibility and access to junior coaching and playing opportunities have been studied in the context of ‘athlete production’. Burke et al.³⁵ performed a multivariate spatial analysis exploring the influence of transportation accessibility and remoteness on the propensity of Australian towns and regions to produce elite professional athletes (as measured by players who were drafted and played at least one game in the senior professional Australian Football League [AFL]). Their findings suggest that regions with high talent ‘yields’ that also have low transport accessibility scores still produce talent despite the barriers. Thus, if the sport feasibility issue mentioned above can be overcome, the selection and support of athletes from these areas may significantly improve Regional athlete development and the talent pool within the sport. This aligns with similar research in the field noting that geographical context can systematically shape physical activity. As evidenced by the work of Janssen et al.³⁶ and Hansen and Chen³⁷, measures of neighborhood/community and school catchment area socioeconomic status (SES) have been associated with physical activity, even after controlling for household SES, individual income, and education. As well, neighborhoods/communities that have elements of being ‘walkable’³⁸, with access to green space and recreation facilities³⁹ and better weather⁴⁰, are more likely to support active lifestyles. More work is needed, however, to determine the relationship between these variables and access to sport participation.

Relatedly, work from the birthplace effect research suggests children from a larger ‘urban centre’ have access to a larger number of resources compared with their counterparts from smaller cities¹⁹. This might mean that Regional athletes are more likely to self-deselect prior to testing, and/or that SEQ athletes are more likely to perform better in a formalized testing environment due to their greater likelihood of exposure to formal training (and likely sport-based testing), compared to Regional athletes.

The next most likely driving mechanism at play is related to logistics. As emphasized in this article, a substantial share of Queensland’s population lives in regional and remote areas, where there are limited facilities and coaching opportunities. Therefore, to support athletes in these areas, coaches from the South-East metropolitan areas would need to create remote coaching opportunities (e.g., through video linking), or travel to these areas (i.e., requiring a minimum of 3 hours of travel in either direction) on a regular basis. Given the limited resources available in developmental sport settings, identifying an athlete from an SEQ area rather than an athlete from a Regional area may reduce the financial risk. Put another way, selectors can support multiple SEQ athletes with the same resource allocation as a single Regional athlete; therefore Regional athletes need to be ‘worth the risk’.

A third potential explanation could be that coaches wanted to select athletes to the YouFor2032 program who were less developmentally mature (as measured by Sitting Height:Standing Height ratio, seated height value, and Peak Height Velocity), as the YouFor2032 program emphasizes the development of athletes for the upcoming 2032 Olympics in Brisbane, Australia, which, at the time of the testing, was roughly 8–10 years away. This could mean coaches wanted relatively younger athletes, particularly in earlier peak-age sports like diving. Since Regional athletes’ testing scores indicated they were relatively older on average (not significant) and were more mature (significant), this could have influenced coaches’ decisions to give offers to SEQ athletes, on top of Regional athletes’ other physical testing disadvantages. These findings align with established work in the domain of sport science with strong evidence for robust and pervasive relative age and maturational effects across many sports⁴¹. For example, in a review by Rubia and colleagues⁴², the authors examined the influence of relative age effects (RAE) on both short-term and long-term performance. They found that across 10 different sports, short-term performance (e.g., within single competitions or regular seasons) was generally influenced by the RAE, whereas long-term performance (e.g., beyond a sport season) showed a reverse RAE in nearly 50 percent of the cases. This finding may shed light on the impact of short-term assessment (such as talent identification batteries) and how relative age effects can shape early impressions for selection.

While these explanations seem plausible, they will take detailed investigations to determine their influence on selection behaviour. What is clear from the present study, however, was that coaches demonstrate sophisticated athlete/talent identification practices. Coaches appeared to have considered anthropometric scores, physical testing scores, location, and athlete preferences for which sports to compete in, indicating multiple information sources were included in their selection decisions. Notably, although Regional athletes demonstrated lower scores on measures like the Beep Test and Pull-Ups, coaches may have recognized these as highly ‘trainable’ attributes. These are areas where coaching and regular training can produce substantial improvements. In contrast, Regional athletes matched SEQ athletes on less trainable measures such as 20m Sprint Speed and anthropometric characteristics. This pattern suggests coaches may have accounted for Regional athletes’ limited access to training facilities and coaching when evaluating their potential, distinguishing between current performance (influenced by training exposure) and inherent athletic capacity. These coaches make a calculated dual investment when selecting Regional athletes with lower fitness scores. They commit to 2–3 times higher support costs while betting that equalizing training opportunities will unlock superior long-term development. For instance, choosing a Regional athlete with a Beep Test of 5 over an SEQ athlete scoring 6 recognizes that this performance gap may reverse once both athletes receive equivalent training access. This holistic approach, weighing latent potential against current performance within context, represents “talent identification expertise” and underscores why Regional athletes deserve equitable opportunities.

From a decision-making perspective, coaches in this sample may have a bias towards SEQ athletes, and/or a bias against Regional athletes. It is also plausible that location information about an athlete is utilized as a heuristic (i.e., a fast and frugal decision-making strategy) when making athlete selections. As noted in other decision-making literature, sometimes these approaches have the potential to help decision makers make time- and energy-efficient decisions⁴³, but often, these strategies are notorious for leading to error-filled judgments^44,45. Whether this regional bias (like other cognitive biases) is helpful or hurtful in the context of athlete development and athlete success (Olympic hopefuls) will be difficult to determine, as athletes who are not selected for the program do not receive the same dedicated coaching, facility access, and education as those who are selected.

Limitations and future directions

While this study advances our understanding of athlete selection processes, it is important to acknowledge several limitations. First, the hierarchical Bayesian modelling approach, while providing robust global estimates, revealed important methodological considerations when examining sport-specific patterns. For instance, the near-identical posterior distributions for Skateboarding’s height and mass metrics, despite clear differences in the raw data, illustrate a characteristic limitation of hierarchical models with sparse data. With only 12 Regional offers in Skateboarding, the model’s hierarchical structure pools information across metrics within the sport, attributing most variation to a strong sport-level effect (− 4.5 SD) rather than distinguishing between individual metrics. This behavior is statistically appropriate as the model correctly identifies that Skateboarding selection strongly favors smaller athletes overall, but can obscure metric-specific patterns when data is sparse^46,47. This pooling effect demonstrates both the strength and limitation of hierarchical modeling: while it provides stable estimates by borrowing strength across related parameters, it may oversimplify when group-specific sample sizes are small. The model essentially determines that the evidence is insufficient to confidently distinguish between height and mass criteria within Skateboarding, defaulting to a simpler explanation of a uniform sport effect.
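The pooling behavior described above can be illustrated with a textbook normal-normal partial-pooling formula. This is a minimal sketch and not the fitted model: the function name `partial_pool` and all numeric inputs except the sample size of 12 are hypothetical values chosen for illustration. The pooled estimate is a weighted average of the group's own mean and the higher-level mean, and the weight on the group's own data grows with sample size.

```python
def partial_pool(group_mean: float, n: int, global_mean: float,
                 within_sd: float, between_sd: float) -> tuple[float, float]:
    """Return (pooled estimate, weight on the group's own mean) for a normal-normal model."""
    sampling_var = within_sd ** 2 / n  # uncertainty in the group's raw mean
    weight = between_sd ** 2 / (between_sd ** 2 + sampling_var)
    pooled = weight * group_mean + (1.0 - weight) * global_mean
    return pooled, weight


# Hypothetical inputs: raw group mean 1.0, higher-level mean 0.0,
# within-group SD 1.0, between-group SD 0.2.
sparse_estimate, sparse_weight = partial_pool(1.0, 12, 0.0, 1.0, 0.2)    # n = 12, as in the text
dense_estimate, dense_weight = partial_pool(1.0, 200, 0.0, 1.0, 0.2)     # hypothetical larger group

print(f"n=12:  weight on own data {sparse_weight:.2f}, estimate {sparse_estimate:.2f}")
print(f"n=200: weight on own data {dense_weight:.2f}, estimate {dense_estimate:.2f}")
```

With these illustrative inputs, a group of 12 keeps only about a third of its raw deviation, whereas a group of 200 keeps most of it. This is the mechanism by which sparse metric-level data can be pulled toward a shared sport-level effect.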

Beyond these analytical considerations, perhaps the most prominent conceptual limitation is the crude and binary nature of ‘Regional’ vs ‘SEQ’ comparisons. This simplistic split does not allow for the nuanced investigation of differences between levels of ‘remoteness’, which such a variable deserves. In reality, there are multiple levels of remoteness, and this binary split may obscure meaningful gradations. However, even within governmental documents there are differing definitions of ‘regional’, ‘rural’, and ‘remote’, depending on the particular model and purpose (i.e., geographical, economic, workforce/education, healthcare access boundaries). The simple binary was selected for this research because selectors functionally conceptualize Queensland as ‘the regions’ and ‘SEQ’, which also aligns with reporting definitions (e.g., 5 percent of Olympic athletes were from the regions). In addition to this conceptual limitation, there are practical limitations to consider. A range of factors influence a sport’s capacity to develop Regional athletes, such as internal sport politics, lack of facilities, and natural geographic limitations (e.g., sailing in non-coastal areas). In addition, some sports had additional minimum selection criteria, such as already being able to ride a bike or skateboard, which may have impacted selection decisions outside of testing scores.

Future work could benefit from exploring other gradients. Roca et al. (2023), for example, reported macroregional differences in cardiorespiratory fitness in Croatian children. Specifically, they stratified participants into three macro-regions of the Republic of Croatia using the Nomenclature of Territorial Units for Statistics (NUTS-2) system. The European Union (EU) uses this system to break countries into regions so that information can be collected and compared fairly across Europe. These regions are big enough to show important economic and social trends, but small enough to reflect local differences. The NUTS-2 level is especially useful because it provides consistent regional statistics, guides how EU regional funding and policies are applied, and helps track how different parts of Europe are developing over time.

Other potential avenues for expanding the classification include population size, population density, and distance (by km) to the nearest ‘city’ centre. Further detailing and layering could be explored such as neighborhood/community conditions (e.g., neighborhood-level income, or amount of green space) or some variation or combination of these variables to create meaningful regions that distinguish communities on a more granular level⁴⁸. To determine meaningful regions, it could be helpful to use spatial analysis amplified with Geographic Information Systems (GIS) programming.
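As an illustration of one such continuous measure, the sketch below computes great-circle distance (in km) from a town to a city centre using the haversine formula. The coordinates for Brisbane and Cairns are approximate values supplied here as assumptions, the function name `haversine_km` is hypothetical, and a mean Earth radius of 6371 km is used. Road distance or travel time would differ and may be more relevant in practice.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates (assumptions for illustration only).
brisbane = (-27.4698, 153.0251)
cairns = (-16.9186, 145.7781)

distance_km = haversine_km(*cairns, *brisbane)
print(f"Cairns to Brisbane: about {distance_km:.0f} km in a straight line")
```

A distance variable of this kind could replace or supplement the binary Regional/SEQ indicator, allowing a disadvantage to be modeled as a gradient rather than a single step.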

Future work could also benefit from layering information onto the present findings to better understand how community size and density relate to resource density and availability. These include considering proximity (by kilometer or by minutes/hours) to the nearest training clubs²⁵, the concentration of clubs in a region (club density)⁴⁹, and proximity to larger cities that house developmental teams/programs²⁶. Various aspects of spatial analyses could be useful for understanding how community features (size, density, weather conditions, etc.) and offerings (training, transportation, proximity to hospitals, school, etc.) influence athlete development outcomes.

As well, integrating interviews with coaches and athlete selectors to triangulate findings to better understand the experiences of those forming judgements and making decisions about ‘fit’ and ‘utility’ for athlete selections would be a worthy investigation. For example, interview questions could explore how system constraints (e.g., funding, number of spots in pathways, regional infrastructure) impact selection decisions and coach evaluation of athletic potential (and the risk/reward trade-off). Closely related, it would be interesting to consider Regional biases for different groups including gender differences, team and individual sport differences, etc.

Finally, future research could examine whether increased funding for regional athletes ends up reducing disadvantages by replicating this design with subsequent cohorts following funding changes. Applying causal inference methods in this context would provide stronger estimates of the relationship between funding and athlete opportunity. While such approaches cannot establish causality with complete certainty, they offer a valuable framework for evaluating the impact of funding on reducing inequities for regional athletes. As well, future research could examine and compare selection when additional resourcing (e.g. facilities, staff) is provided in regional areas, as often found through ‘regional academy’ systems⁵⁰.

Implications

This research project advances our current knowledge of athlete selection in sport. In light of the relatively sparse research on decision-making for talent selection, this project contributes to our understanding of how ‘talent’ is conceptualized and operationalized across multiple sport domains. As a result, the project may have important implications for coach education and future sport policy considerations. While there may be multiple avenues forward to creating more equitable pathways for Regional athletes, perhaps the most impactful would be a four-pronged approach, which could include: (1) establishing substantial and sustainable funding designated for the purpose of developing Regional athletes, (2) implementing supportive programming at multiple points in the pathway, (3) altering policies to reflect these priorities, and (4) monitoring and (re)evaluating the system for progress. Overall, these actions will help to widen the athlete pool by encouraging participation from traditionally under-represented areas by means of reducing barriers (e.g., cultural, geographic, financial) and increasing regional representation to reflect Australia’s changing demographics. For a practical example, the Australian Football League (AFL) used a similar approach to help minimize disparities between Regional and Metropolitan athletes within the AFL system. The Next Generation Academies (NGAs) program was developed jointly by clubs and the league and was designed to support Indigenous, multicultural, and underrepresented athletes (i.e., individuals from non-traditional AFL backgrounds, typically found in regional areas) between 11–18 years of age⁵¹. Clubs were assigned specific geographic zones (e.g., metro areas, rural regions) where they must actively scout, engage, and develop athletes. Doing so provides early exposure and other supports (e.g., foundational skills, “come and try” days), as well as opportunities for leadership education, fitness training, and other elements of coaching to which athletes might not otherwise have had access. At the later stages of the talent pathway, the AFL introduced policy changes designed to make the National Draft more inclusive of athletes from traditionally under-represented communities. In particular, rules were modified to provide clubs with concessions to increase their opportunity to draft their own Academy athletes. In addition, clubs are allowed to list athletes from regional, multicultural, or international backgrounds as “Category B” rookies outside of the primary draft⁵¹.

It has been suggested that one method to become more objective and less prone to biases is to recognize the influence that assumptions and biases play in making predictions and decisions^52,53. Mann and van Ginneken⁵⁴ studied selection bias among elite-level soccer scouts who watched video footage of a soccer match and then ranked each player’s potential for success in soccer. Prior to watching the video, scouts were divided into three groups: (i) scouts who were not given information about the age of the athletes, (ii) those who were given written information on each player’s birthdate, and (iii) a final group that was told that the player’s jersey number corresponded to the player’s age (i.e., jersey number one would correspond with the oldest player on the team). Results revealed a significant selection bias for scouts in the ‘no age’ group (group i). Interestingly, informing the scouts of the players’ birthdates was insufficient to reduce this bias, but presenting information about the corresponding shirt number and age was effective. In this example, simple manipulations of information significantly influenced decision makers. Findings from the present study relating to geography and selection biases could inform coaches and other interest holders about the pervasiveness of biases and caution against the use of rigid testing protocols without critical examination.

We hope this work acts as a catalyst for future research in the field as it is a relatively untapped area with the potential to have important implications on athlete selection. The present research project and future work will hopefully positively contribute to the increased accuracy of talent selection and lead to a decrease in the de-selection of potentially talented performers through inappropriate identification and selection methods⁵⁵.

Beyond revealing the existence of a regional bias, this work demonstrates the importance of examining selection thresholds rather than simply examining the proportions of athletes coming from certain areas. The finding that Regional athletes must outperform their SEQ counterparts to achieve selection highlights how geographical biases operate through implicit performance standards. Future research and policy interventions must therefore address not only who or how many get selected, but what level of performance different groups need to demonstrate for selection.

Conclusion

Athletes from Regional areas appear to need to achieve significantly greater testing scores, relative to the population distribution, to receive offers than athletes from Metropolitan areas. This is thought to be due to the logistical and resource constraints of supporting athletes away from the sports’ developmental resources, which are typically found in lesser quantities in regional and remote areas. It was clear from the present investigation that coaches employ complex and sophisticated decision-making strategies to consider an athlete’s present abilities, financial barriers, and an athlete’s future potential within a development system. These findings help to illuminate the hugely complicated task that coaches have of making athlete selections, and the environmental constraints they must consider.

Data availability

Data is available upon reasonable request from AR.

Code availability

Code is available upon reasonable request from YW.

References

Zhao, J., Xiang, C., Kamalden, T.F.T., Dong, W., Luo, H., Ismail, N. Differences and relationships between talent detection, identification, development and selection in sport: A systematic review. Heliyon (2024)
Ford, P. et al. A survey of talent identification and development processes in the youth academies of professional soccer clubs from around the world. J. Sports Sci. 38(11–12), 1269–1278 (2020) https://doi.org/10.1080/02640414.2020.1752440
Pankhurst, A., Collins, D. Talent identification and development: The need for coherence between research, system, and process. QUEST. 65(1), 83–97. https://doi.org/10.1080/00336297.2012.727374 (2013).
Parra-Martinez, F., Wai, J. Talent identification research: a bibliometric study from multidisciplinary and global perspectives. Front. Psychol. 14 https://doi.org/10.3389/fpsyg.2023.1141159 (2023).
Dodd, K. & Newans, T. Talent identification for soccer: Physiological aspects. J. Sci. Med. Sport 21(10), 1073–1078. https://doi.org/10.1016/j.jsams.2018.01.009 (2018).
Söderström, T., Sandlund, S., Westerlund, R. & Tervo, T. The role of physiological testing for athlete development in sport: The elite athlete perspective. Int. Rev. Sociol. Sport 59(8), 1244–1265 (2024).
Gil, S., Ruiz, F., Irazusta, A., Gil, J. & Irazusta, J. Selection of young soccer players in terms of anthropometric and physiological factors. J. Sports Med. Phys. Fitness 47(1), 25 (2007).
Lemoyne, J., Brunelle, J., Pelletier, V., Glaude-Roy, J., Martini, G. Talent identification in elite adolescent ice hockey players: The discriminant capacity of fitness tests, skating performance and psychological characteristics. Sports. 10(4). https://doi.org/10.3390/sports10040058 (2022).
Jokuschies, N., Gut, V. & Conzelmann, A. Systematizing coaches’ “eye for talent”: Player assessments based on expert coaches’ subjective talent criteria in top-level youth soccer. Int. J. Sports Sci. Coach. 12(5), 565–576. https://doi.org/10.1177/1747954117727646 (2017).
Johnston, K., Baker, J. Sources of information used by elite distance running coaches for selection decisions. PLoS One. 17(8). https://doi.org/10.1371/journal.pone.0268554 (2022).
Roberts, A., Greenwood, D., Stanley, M., Humberstone, C., Iredale, F., Raynor, A. Understanding the “gut instinct” of expert coaches during talent identification. J. Sports Sci. 39(4), 359–367. https://doi.org/10.1080/02640414.2020.1823083 (2021).
Roberts, A., Greenwood, D., Humberstone, C., Raynor, A. Pilot study on the reliability of the coach’s eye: Identifying talent throughout a 4-day cadet judo camp. Front. Sports Active Liv. 2 https://doi.org/10.3389/fspor.2020.596369 (2020).
Capstick, A.L., Trudel, P. Coach communication of non-selection in youth competitive sport. Int. J. Coach. Sci. 4(1) (2010)
Neely, K.C., Dugdale, J.H. Making the Cut: Coaches and the deselection of young athletes. In Routledge Handbook of Coaching Children in Sport, 175–184 (2022).
Den Hartigh, R., Van Dijk, M., Steenbeek, H., Van Geert, P. A dynamic network model to explain the development of excellent human performance. Front. Psychol. 7, 532. https://doi.org/10.3389/fpsyg.2016.00532 (2016).
Den Hartigh, R., Hill, Y., Van Geert, P. The development of talent in sports: A dynamic network approach. Complexity. 2018 https://doi.org/10.1155/2018/9280154 (2018).
Johnston, K., Roberts, A., Baker, J. 10 considerations for athlete selection: A resource and guide for researchers and practitioners. Sports Coach. Rev. 1–19. https://doi.org/10.1080/21640629.2025.2497208 (2025).
Peringa, I.P., Niessen, A.S.M., Meijer, R.R., Hartigh, R.J.R. A uniform approach for advancing athlete assessment: A tutorial on the lens model. Psychol. Sport Exerc. 76, 102732. https://doi.org/10.1016/j.psychsport.2024.102732 (2025).
Côté, J., Macdonald, D. J., Baker, J. & Abernethy, B. When “where” is more important than “when”: Birthplace and birthdate effects on the achievement of sporting expertise. J. Sports Sci. 24(10), 1065–1073. https://doi.org/10.1080/02640410500432490 (2006).
Hancock, D.J., Coutinho, P., Côté, J., Mesquita, I. Influences of population size and density on birthplace effects. J. Sports Sci. 36(1), 33–38. https://doi.org/10.1080/02640414.2016.1276614 (2018).
Maayan, Z., Lidor, R., Arnon, M. The birthplace effect in 14–18-year-old athletes participating in competitive individual and team sports. Sports (2075-4663) 10(4), (2022)
MacDonald, D. J., King, J., Côté, J. & Abernethy, B. Birthplace effects on the development of female athletic talent. J. Sci. Med. Sport 12(1), 234–237. https://doi.org/10.1016/j.jsams.2007.05.015 (2009).
Curtis, J. E. & Birch, J. S. Size of community of origin and recruitment to professional and Olympic hockey in North America. Sociol. Sport J. 4(3), 229–244. https://doi.org/10.1123/ssj.4.3.229 (1987).
Smith, K.L., Weir, P.L. Female youth soccer participation and continued engagement: Associations with community size, community density, and relative age. Front. Sports Active Living. 2, 552597. https://doi.org/10.3389/fspor.2020.552597 (2020).
Rossing, N.N., Nielsen, A.B., Elbe, A., Karbing, D.S. The role of community in the development of elite handball and football players in Denmark. Eur. J. Sport Sci. 16(2), 237–245. https://doi.org/10.1080/17461391.2015.1009492 (2016).
Farah, L. Community size effects in Canadian National Hockey League draftees: Exploring regional variations in community size effects and the influence of population density and proximity to Canadian Hockey League teams. PhD thesis, Ontario Tech University (2017)
Curran, C. The migration of Irish-born players to undertake US soccer scholarships, 1969–2000. Immigrants Minorities 39(1), 101–131. https://doi.org/10.1080/02619288.2021.1982700 (2021).
Curran, C. The migration of Irish-born footballers to England, 1945–2010. Soccer Soc. 16(2–3), 360–376. https://doi.org/10.1080/14660970.2014.961372 (2015).
YouFor2032: About YouFor2032|YouFor2032. https://www.qasport.qld.gov.au/youfor2032/about (accessed 12 Aug 2025).
Commonwealth of Australia. South East Queensland: Population, Housing, Jobs, Connectivity, and Liveability (2022). https://www.infrastructure.gov.au/sites/default/files/documents/bcarr-seq-report.pdf (accessed 27 May 2025).
Queensland Government. https://www.infrastructure.gov.au/sites/default/files/documents/bcarr-seq-report.pdf
Kruschke, J. K. Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. 142(2), 573–603. https://doi.org/10.1037/a0029146 (2013).
Hancock, J.T., Liu, S.X., Luo, M., Mieczkowski, H. Social media and psychological well-being. In The Psychology of Technology: Social Science Research in the Age of Big Data, 195–238 (American Psychological Association, 2022). https://doi.org/10.1037/0000290-007
Baker, J., Wattie, N. Innate talent in sport: Separating myth from reality. Curr. Issues Sport Sci. (CISS) 3, 006. https://doi.org/10.36950/CISS_2018.006 (2018).
Burke, M., Woolcock, G., Poruschi, L. The impacts of transport accessibility and remoteness on elite sports talent development (2012)
Janssen, I., Boyce, W. F., Simpson, K. & Pickett, W. Influence of individual- and area-level measures of socioeconomic status on obesity, unhealthy eating, and physical inactivity in Canadian adolescents. Am. J. Clin. Nutr. 83(1), 139–145. https://doi.org/10.1093/ajcn/83.1.139 (2006).
Hanson, M. D. & Chen, E. Socioeconomic status and health behaviors in adolescence: A review of the literature. J. Behav. Med. 30(3), 263–285. https://doi.org/10.1007/s10865-007-9098-3 (2007).
Frank, L. D., Schmid, T. L., Sallis, J. F., Chapman, J. & Saelens, B. E. Linking objectively measured physical activity with objectively measured urban form: Findings from SMARTRAQ. Am. J. Prev. Med. 28(2 Suppl 2), 117–125. https://doi.org/10.1016/j.amepre.2004.11.001 (2005).
Giles-Corti, B., Donovan, R.J. The relative influence of individual, social and physical environment determinants of physical activity. Soc. Sci. Med. (1982). 54(12), 1793–1812. https://doi.org/10.1016/s0277-9536(01)00150-2 (2002).
Tucker, P. & Gilliland, J. The effect of season and weather on physical activity: A systematic review. Public Health 121(12), 909–922. https://doi.org/10.1016/j.puhe.2007.04.009 (2007).
Čular, D., Granić, I. & Babić, M. Relative age effect presence among swimmers within Youth Olympic Games. Acta Kinesiologica 17(2), 12–16 (2023).
De La Rubia, A., Lorenzo-Calvo, J., Lorenzo, A. Does the relative age effect influence short-term performance and sport career in team sports? A qualitative systematic review. Front. Psychol. 11, 1947. https://doi.org/10.3389/fpsyg.2020.01947 (2020).
Gigerenzer, G. Fast and frugal heuristics: The tools of bounded rationality. Blackwell Handb. Judg. Decis. Making 62, 88 (2004).
Tversky, A., Kahneman, D. Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. Science 185(4157), 1124–1131 (1974).
Tversky, A., Kahneman, D. Rational choice and the framing of decisions. In Multiple Criteria Decision Making and Risk Analysis Using Microcomputers, 81–126 (Springer, 1989).
Gelman, A., Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press, 2007).
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. Hierarchical Models. In Bayesian Data Analysis, 3rd edn, 101–134 (2025)
Rainham, D. G. et al. Spatial classification of youth physical activity patterns. Am. J. Prev. Med. 42(5), 87–96. https://doi.org/10.1016/j.amepre.2012.02.011 (2012).
Faria, L.O., Bredt, S.d.G.T., Ribeiro, A.I., Galatti, L.R., Albuquerque, M.R. Inequality in Brazilian basketball: The birthplace effect. Revista Brasileira de Cineantropometria & Desempenho Humano 23, 76932. https://doi.org/10.1590/1980-0037.2021v23e76932 (2021).
Member Academies
AFL-News, Fixtures, Scores & Results-AFL.Com.Au. https://www.afl.com.au/draft/nga-clubacademies
Silver, N. The Signal and the Noise (Penguin Random House, 2015).
Taleb, N. The Black Swan: The Impact of the Highly Improbable (Random House, 2007).
Mann, D., Ginneken, P. Age-ordered shirt numbering reduces the selection bias associated with the relative age effect. J. Sports Sci. 35(8), 784–790. https://doi.org/10.1080/02640414.2016.1189588 (2017).
Pinder, R.A., Renshaw, I., Davids, K. The role of representative design in talent development: A comment on “Talent identification and promotion programmes of Olympic athletes”. J. Sports Sci. 31(8), 803–806. https://doi.org/10.1080/02640414.2012.718090 (2013).

Acknowledgements

The authors want to acknowledge the QAS Talent Team for their work in collecting this data. In particular, they would like to acknowledge Karlee Quinn, Matt Glossop and Jennifer Hollier for their work on cleaning and collating the data used in this paper.

Funding

Queensland Academy of Sport

Author information

Authors and Affiliations

Tanenbaum Institute for Science in Sport, University of Toronto, 27 King’s College Circle, Toronto, ON, M5S 1A1, Canada
Kathryn Johnston, Yiru Wang & Joseph Baker
Talent and Coaching, Queensland Academy of Sport, 468 Kessels Road, Nathan, QLD, 4111, Australia
Jack Trace & Alex Roberts
School of Human Movement and Nutrition Sciences, University of Queensland, Level 2, Connell Building, St Lucia, QLD, 4072, Australia
Jack Trace & Veronique Richard
Health and Wellbeing Centre for Research Innovation, University of Queensland, St Lucia, QLD, 4072, Australia
Veronique Richard
Sport, Performance and Nutrition Research Group, School of Allied Health, Human Services and Sport, La Trobe University, Plenty Road, Bundoora, VIC, 3086, Australia
Alex Roberts

Authors

Kathryn Johnston
Yiru Wang
Jack Trace
Joseph Baker
Veronique Richard
Alex Roberts

Contributions

All authors contributed to the study conception and design. Material preparation and data collection were performed by J.T. and A.R. Data analysis was conducted by Y.W. The first and subsequent drafts of the manuscript were written by K.J. and Y.W., and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kathryn Johnston.

Ethics declarations

Competing interests

J.T. and A.R. are employed by the Queensland Academy of Sport to deliver the YouFor2032 program. All other authors have no conflict of interest to declare.

Ethics approval and consent to participate

Ethics approval was granted by the University of Queensland and the University of Toronto (2024/HE002237).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Johnston, K., Wang, Y., Trace, J. et al. A Bayesian perspective on geographic influences in coach decision-making for athlete identification and selection. Sci Rep 16, 8234 (2026). https://doi.org/10.1038/s41598-025-30791-y

Received: 26 August 2025
Accepted: 27 November 2025
Published: 11 February 2026
Version of record: 05 March 2026
DOI: https://doi.org/10.1038/s41598-025-30791-y
