Introduction

Child mental health and air pollution

According to the 2019 Child and Adolescent Mental Health report published by the Statistics National Institute of the United Kingdom, mental disorders between the ages of 5 and 15 have increased from 9.7 to 11.2% in the last decade1. The World Health Organization (WHO) states that more than 20% of adolescents worldwide (10 to 19 years old) suffer from mental disorders, and that 50% of these begin before the age of 14. In most cases, however, they are neither detected nor treated. According to the WHO, depression is one of the leading causes of illness and disability among adolescents and suicide is the third leading cause of death between the ages of 15 and 192. Failure to address adolescent mental disorders has consequences that extend into adulthood, affecting both physical and mental health and limiting opportunities for a satisfying adult life3.

Extant research on mental health of children related to pollution is based on subjects living mainly or only in urban areas. To the best of our knowledge, there are no studies focusing on rural areas. It is thus important to conduct a study in a rural area to discern whether pollution and child mental health have a similar relationship as in urban areas. It is well known that pollution behaves differently in rural areas4, and it is an open question whether rural children exhibit similar or different relationships from their urban counterparts.

Regarding the association between exposure to air pollution and the occurrence of mental health problems in urban children, we carried out a bibliographic search in the PubMed database at the end of July 2023, using the key words “mental health”, “child” and “air pollution”. We found three reviews: Volk et al.5, Karrari et al.6 and Zhao et al.7. The last two are systematic reviews that follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)8 reporting guidelines, whereas Volk et al.5 is a review within the guidelines of the Environmental Influences on Child Health Outcomes (ECHO) Program.

Karrari et al., in their systematic review of articles published between 1990 and 2011, showed that lead exposure can have adverse effects on health throughout life such as hearing loss, anaemia, renal failure, and weakened immune system. In the case of children, it is more detrimental to mental development. At low doses, this element is related to motor and cognitive dysfunctions6.

Zhao et al. conducted a review of the literature between 1960 and 2017, to identify the associations between air pollution and mental health and behavioural disorders. Specifically, they studied ozone exposure and the immune and nervous systems. The results from their research did not relate ozone to autism spectrum disorders, impaired cognitive function, dementia, depression or suicide, the immune and nervous systems, mental health, or behavioural disorders7.

Volk et al. provide a review of the literature from 2006 to 2020 on prenatal exposure to pollution and neurodevelopment including autism spectrum disorder, attention deficit hyperactivity disorder (ADHD), general measures of cognition and intelligence, state of mood and image. They concluded that there is a relationship between prenatal exposure and neurodevelopment5.

In exploring the literature not included in the reviews, we looked for problems other than neurodevelopment, as well as exposure to air pollutants other than lead and ozone. Five groups of mental health problems were identified: behavioural and development disorders9,10,11,12,13,14,15,16,17,18; ADHD12,16,19,20,21,22,23,24; anxiety25,26,27,28,29,30,31,32; eating disorders33,34,35,36, and other mental health problems37,38,39,40,41.

Long-standing scientific evidence highlights the relationship between environmental pollution and cognitive development, understood as the processes through which human beings acquire skills that allow them to interpret reality and interact with it. Maternal exposure to air pollution is related to child neurological development during the first 24 months of life through the onset of a neurological delay10. There is an association between traffic pollution and the neurodevelopment of children aged 7 to 12 years, with lower cognitive development as well as behavioural impairments11,12,17,25. There are sex-dependent differences in children aged 4 to 6 years, with increased vulnerability in male children14. Air pollution is also connected to memory, attention, and verbal cognition15. Some studies on traffic pollution show a relationship with a cognitive deficit in children, although it does not reach statistical significance9, while others do not identify any relationship between short-term exposure to traffic-induced pollution and neuropsychological problems13,17.

Air pollution is related to ADHD and behavioural disruptions and development problems in children aged 7 to 11 years12,19. Environmental pollutants such as O3, NO2, PM1, PM2.5 and PM10 are associated with children’s and adolescents’ mental health and the development of ADHD20,21,22,23,24.

When exposure to air pollution is prolonged over time, the risk of anxiety in children and adolescents increases25,26,27,29,30. Ma et al. highlight that hospital admissions for anxiety increase with short-term exposure to NO2 and sulphur dioxide (SO2)32. Hao et al. find a positive association between PM2.5 and anxiety31; however, other studies relating PM2.5 and anxiety are inconclusive28.

To the best of our knowledge, evidence on pollution and eating disorders focuses on obesity and overweight, but not other mental health problems. Limaye and Salvi suggest that air pollution is a risk factor for the development of obesity33. In particular, prenatal exposure to PM2.5 and NO2 is related to an increased risk of being overweight or obese34,35. Shi et al. find that the increased risk not only develops in a prenatal stage, but also later in life36.

Exposure to certain fine particles such as PM2.5 is associated with other psychiatric problems in children and adolescents like bipolar disorder, impulse control disorder, or suicidality37,38,40 although other studies do not obtain significant results39,41.

The studies above focus on mental health of children living mainly or only in urban areas. We have identified no studies focusing on rural areas devoted to children.

However, some studies in rural areas do analyse the relationship between air pollution and mental health of the general population. Exposure to wildfire smoke in rural communities, particularly among vulnerable groups like working-age adults, non-Hispanic whites, and those without a university education, increases the risk of suicide42. In China, concerns about physical and mental health are growing, with air pollution, income inequality, food contamination, and limited green spaces relating to health43. Air pollution is also connected to the mental health of China's aging population, with urban elderly individuals having better psychological well-being compared to their rural counterparts44. These studies emphasize the need to address air pollution and health in various populations, with a focus on rural communities and vulnerable groups.

Methodological issues

The reviewed studies have certain limitations at the spatial and temporal level. Pollutant monitoring stations are mostly located in urban areas, thus offering limited spatial coverage5. This makes it very difficult to estimate the relationship between pollutants and health in rural settings.

The aim of the article is to provide first-time scientific evidence on the association between air pollution and mental health of children and adolescents under 15 in a rural setting, including behavioural and developmental disorders, ADHD, anxiety, and eating disorders beyond obesity and overweight (anorexia nervosa, bulimia nervosa, binge eating disorder, and avoidant and restrictive disorder). The methodological value of this research lies in combining spatio-temporal models, Bayesian inference, and the Compositional Data (CoDa) analysis method, thus providing a new perspective on the relationship between the concentrations of air pollutants and mental health problems. This innovative methodology allows for a better understanding of the relationship between air pollution and mental disorders in children and adolescents in rural areas where pollution monitoring stations are few and far between.

By using spatio-temporal models, both the geographical location and the temporal variability of air pollutants are taken into account, providing better information about exposure. Bayesian inference allows a rigorous assessment of the associations between air pollution and mental disorders, accounting for the inherent uncertainty in the data and the sparse monitoring network.

The use of the CoDa analysis method is particularly relevant in this study, as it addresses the compositional nature of air pollutants. By considering the relative proportions of different components in the pollutant composition, CoDa analysis relates air pollution and mental health taking into account possible trade-offs between pollutants and not only overall pollution levels. This improves on studies focusing only on absolute pollution levels or focusing only on a single pollutant.

Results

Population characteristics

All the variables, except age, number of chronic diseases and pollutant concentrations, were expressed as proportions. Of the population studied in the Alt Empordà region in Catalonia, Spain (24,674 children under 15 years old), 8.82% had behavioural and developmental disorders, 1.74% anxiety, 0.76% ADHD, and 0.37% eating disorders. Girls made up 47.9% and boys 52.1%; 1.1% had glucose intolerance, 7.4% were obese, 1.6% smoked, and 0.2% consumed alcohol. Regarding the pharmaceutical co-payment, which is an indicator of individual socioeconomic level, 10.3% paid a contribution of up to 10%, 68.8% paid 40% and the remaining 20.9% paid 50% or more. Regarding the average income in the census tract, 79.0% were in the first quartile of lowest income census tracts (reference category), 8.8% in Q2, 5.8% in Q3, and 6.4% in Q4. The average age was 11 years, and, on average, the subjects suffered from 0.15 chronic diseases. The average concentrations of each pollutant were 22.52 μg/m3 of PM10, 34.69 μg/m3 of NO2, 44.10 μg/m3 of O3, 0.29 mg/m3 of CO, and 2.20 μg/m3 of SO2 (Table 1).

Table 1 Descriptive analysis in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.

Air pollutant concentrations

The results for the relationship between air pollutants and behavioural and development disorders, anxiety, ADHD, and eating disorders can be seen in Table 2. The \(\upbeta\) coefficients show the compositional behaviour, indicating trade-offs between pollutants: \({\upbeta }_{1}\) refers to PM10, \({\upbeta }_{2}\) to NO2, \({\upbeta }_{3}\) to O3, \({\upbeta }_{4 }\) to CO, and \({\upbeta }_{5}\) to SO2. \(\upgamma\) refers to the geometric mean of all pollutants, which stands for overall pollution and is hereinafter called the total. The total pollution level (γ 95% credible interval (CI), the Bayesian equivalent of a 95% confidence interval, from 17.9264 to 20.2053) was related to the incidence of behavioural and development disorders. In relative terms, when NO2 (β2 CI from 20.1750 to 25.9906) and O3 (β3 CI from 18.0670 to 23.8689) increased, and PM10 (β1 CI from −28.5884 to −20.8427) and CO (β4 CI from −23.6106 to −16.3512) decreased, the incidence also increased. When adding up the β and γ coefficients, behavioural and development disorders appeared to be related to NO2, O3 and SO2.

Table 2 Estimation of pollution effects on mental health problems in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.

In the case of anxiety, total pollution (γ CI from 8.2395 to 11.2685) was also related to incidence. When O3 (β3 CI from 7.8187 to 15.4243) and SO2 (β5 CI from 1.4973 to 9.4917) increased and CO (β4 CI from −21.3632 to −9.4764) decreased, the incidence also increased. When adding up the β and γ coefficients, anxiety appeared to be related to PM10, O3 and SO2. Total pollution (γ) was also related to the incidence of ADHD (CI from 3.2102 to 4.7697) and eating disorders (CI from 3.2065 to 5.1064).

Covariables

Gender, age, number of chronic diseases, glucose intolerance, obesity, alcohol consumption, tobacco consumption and individual socioeconomic level had a significant relationship with the risk of having behavioural and development disorders. Being a girl (CI from 0.5867 to 0.8874) and having a medium (CI from 0.4690 to 0.8499) or high (CI from 0.2229 to 0.4895) individual socioeconomic level acted as protective factors. Age (CI from 1.6661 to 2.2696), number of chronic diseases (CI from 1.4647 to 2.1420), glucose intolerance (CI from 1.5983 to 5.0600), obesity (CI from 1.4125 to 2.5303), and alcohol (CI from 1.6891 to 17.8359) and tobacco (CI from 3.8947 to 10.1989) consumption indicated increased risks of behavioural and development disorders given that their odds-ratio was greater than 1 (Table 3).

Table 3 Estimation of covariable relative risks on behavioural and development disorders in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.

The covariables which were significant with respect to anxiety were gender, age, glucose intolerance, obesity, alcohol and tobacco consumption, individual socioeconomic level, and average income in the census tract. Protective factors were being a girl (CI from 0.0928 to 0.3004), having a high individual socioeconomic level (CI from \(6.6988 \times 10^{-4}\) to \(5.4451 \times 10^{-3}\)), and living in a census tract where the average level of income was medium to high (Q2 (CI from \(6.1181 \times 10^{-3}\) to \(1.1462 \times 10^{-1}\)) and Q3 (CI from \(2.9175 \times 10^{-3}\) to \(5.3412 \times 10^{-2}\))). Age (CI from 4.8610 to 14.0504), glucose intolerance (CI from 1.7828 to 391.4102), obesity (CI from 1.0087 to 6.2730), alcohol consumption (CI from 1.2828 to \(9.3783 \times 10^{3}\)), smoking (CI from 3.6832 to 113.2430) and living in a census tract where the average level of income was very high (Q4 (CI from \(2.0168 \times 10^{4}\) to \(1.5426 \times 10^{7}\))) were related to an increased risk of anxiety (Table 4).

Table 4 Estimation of covariable relative risks on anxiety in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.

Regarding ADHD (Table 5), gender, age, individual socioeconomic level, and average income in the census tract were significant. Age (CI from 2.9929 to 9.7336) and living in a census tract where the average level of income was very high (Q4 (CI from 2.8765 to 77.3954)) were related to an increased risk. Protective factors were being a girl (CI from \(5.7924 \times 10^{-3}\) to \(5.3888 \times 10^{-2}\)), having a high individual socioeconomic level (CI from 0.0134 to 0.1935), and living in a census tract where the average level of income was high (Q3 (CI from 0.0215 to 0.3252)).

Table 5 Estimation of covariable relative risks on ADHD in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.

The significant variables for eating disorders were being a girl (CI from 5.7278 to 45.1946), age (CI from 3.6859 to 12.6260), obesity (CI from 1.2639 to 9.2735), and tobacco consumption (CI from 4.2096 to 350.7618). All of them acted as risk factors (Table 6).

Table 6 Estimation of covariable relative risks on eating disorders in children under 15 years old from the Alt Empordà region in Catalonia, Spain, from 2009 to 2019.
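The reading of the covariable intervals above as protective or risk factors can be summarised in a short sketch. It assumes that every reported interval is a 95% credible interval on the odds-ratio scale, so that 1 is the null value. The function name `classify_odds_ratio` and the third interval are hypothetical and serve only as an illustration; the first two intervals are those reported for behavioural and development disorders.

```python
def classify_odds_ratio(lower, upper):
    """Label a covariable from the 95% credible interval of its odds ratio.

    Assumption: the interval is on the odds-ratio scale, so 1 means no effect.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 1.0:
        return "risk factor"        # whole interval above 1
    if upper < 1.0:
        return "protective factor"  # whole interval below 1
    return "not significant"        # interval contains 1


intervals = {
    "being a girl": (0.5867, 0.8874),             # reported in the text
    "age": (1.6661, 2.2696),                      # reported in the text
    "hypothetical covariable": (0.8000, 1.3000),  # invented for illustration
}
labels = {name: classify_odds_ratio(*ci) for name, ci in intervals.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```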

Discussion

Our results for the first time reveal a significant association between air pollution and child mental health in a rural setting. Exposure to PM10, NO2, O3, and SO2 is linked to increased risks of behavioural and developmental disorders, anxiety, ADHD, and eating disorders. Different pollutants exhibit distinct relationships. NO2, O3, and SO2 are linked to behavioural and developmental disorders, while PM10, O3, and SO2 correlate with anxiety. Overall pollution levels also contribute to higher rates of the disorders under study.

Demographic factors are also linked to disorder risks. Protective factors include being female and having higher socioeconomic status, while risks rise with age, chronic diseases, glucose intolerance, obesity, and alcohol and tobacco consumption.

We compared the findings of our study to other research addressing the associations between air pollution and child mental health in urban populations, as the relationship has not yet been established in rural settings. This comparison revealed relevant similarities but also differences. We found the review by Volk et al.5 particularly useful, as well as some individual studies in topics not covered by these authors.

Regarding behavioural and development disorders, the level of total air pollution (γ) is related to a higher incidence, as is the case when NO2 (β2) and O3 (β3) increase and PM10 (β1) and CO (β4) decrease. By adding the β and γ coefficients, behavioural and developmental disorders seem to be related to NO2, O3 and SO2. This agrees with the research carried out by Sunyer et al. in children aged 7 to 11 in the city of Barcelona, which found a reduction in cognitive development when traffic-related pollutants increased11. Pérez-Crespo et al.18 reached the same conclusion in their study of 9- to 12-year-olds in Rotterdam.

Regarding anxiety, our results show that incidence is also related to the total (γ) and, when adding the β and γ coefficients, to PM10, O3, and SO2. This coincides with the research by Yolton et al.27 in the city of Cincinnati, with a significant association between pollutants present in road traffic and anxiety in 12-year-olds. Another significant relationship between PM10, NO2 and anxiety is reported by Jorcano et al.26 in boys and girls aged 7 to 11 who had been exposed in pre- and post-natal phases.

Total air pollution (γ) is also connected to the incidence of ADHD in this investigation. Volk et al.5 carried out a systematic review determining that both pre- and post-natal exposure to air pollutants is associated with an increased risk of ADHD. Some of the reviewed studies analysed pollutants similar to ours, such as PM2.5, NO2, O3, and PM10. Similar results were also observed by Forns et al.12 in boys and girls aged 7 to 11 living in Barcelona. These children presented an increase in ADHD cases at higher levels of exposure. Maitre et al.16, using data from the "Human Early Life Exposome" (HELIX) project, based on 6 longitudinal birth cohorts in Europe between the ages of 6 and 11, also found a positive relationship. Likewise, Markevych et al.20 identified an increased risk of ADHD in relation to PM10 and NO2 in children under 10. Shim et al.23, in their study with children aged 7 to 12, found a relationship between particulate matter exposure and the prevalence of ADHD. Zhou et al.24 found a significant association between long-term ozone exposure and ADHD in children aged 3 to 12 years residing in seven cities of Liaoning, China.

With regard to eating disorders, our study relates total air pollution with anorexia nervosa, bulimia nervosa, binge eating disorder, and avoidant and restrictive disorder, all of which are absent from the literature relating pollution and eating disorders.

The relation between pollutant concentrations and behavioural and developmental disorders, ADHD, anxiety, and eating disorders has been assessed through the combination of spatio-temporal models, Bayesian inference, and CoDa. This method combination is new in air pollution and health research, but Bayesian inference has been applied by Saez et al.21 to identify the association between environmental factors and ADHD, and gains relevance in a rural setting.

The combination of CoDa with a total pollution level is also new in studies related to air pollution and health and makes it possible to disentangle overall pollution from the relative importance of each pollutant with respect to one another. These relationships have hitherto been presented together, thus making it impossible to assess trade-offs between sets of pollutants.

There are several limitations in our study. First and foremost was the misclassification of exposure to air pollutants. In fact, this is an ecological (or average) exposure in the area in which the subject lives and does not necessarily coincide with the exposure of each subject. In addition, the subjects are not immobile, and so are exposed to air pollution not only in the place where they live, but also at the school they attend, the place where they do extracurricular activities (for example, languages or sports), etc. Fortunately, on the one hand, for the spatial prediction of exposure we considered the census tract where each subject lives45. While this did not eliminate the problem entirely, it did greatly minimize it. On the other hand, the subjects we analysed, boys and girls, do not move very far from their homes, so exposure would not vary very much. Furthermore, it is a non-differential misclassification. That is, all subjects have the same probability of being misclassified.

Second, the covariables used were limited to what was available in the database. In particular, the proxy variable for individual socioeconomic level based on the grades of pharmaceutical co-payment had to be grouped into only three categories because very few cases were present in some of the original nine. Thus, this variable may have been measured with error. A less coarse measure of socioeconomic level would be desirable in future research.

Finally, there could be residual confounding bias. However, we have tried to control for it by including both a large number of observed confounders as covariables and random effects that capture unobserved confounders.

Methods

Population

We used a sub-cohort of a population-based cohort composed of 124,112 individuals from the Alt Empordà, a rural region in Catalonia, Spain, from 2009 to 2019. Specifically, this retrospective population-based cohort study was conducted including data from all residents in the region. Data were obtained from the Public Data Analysis for Health Research and Innovation Program (PADRIS) database, which includes admission data from primary care centres, hospitals (inpatient and outpatient care), extended care facilities, and mental health centres, as well as sociodemographic (sex and age) and socioeconomic (pharmaceutical co-payment) variables.

Alt Empordà, located in the province of Girona, Catalonia, is a predominantly rural area. Its population density of 104 inhabitants per square kilometre contrasts starkly with that of Catalonia as a whole, at 248 inhabitants per square kilometre. The primary sector is correspondingly more important in the region, where it contributes 4% of value added, as opposed to 1.2% in Catalonia. These figures highlight the reliance of Alt Empordà on agriculture, winemaking, fishing, and livestock farming. According to 2022 data from the Institute of Statistics of Catalonia, Alt Empordà had 144,926 inhabitants, of whom 15.2% were 14 years old or younger, compared with 7,899,056 inhabitants in Catalonia, of whom 14.2% were 14 years old or younger45. We selected children and adolescents who were under the age of 15 at some point between 2009 and 2019. The number of subjects was 24,674 children and adolescents, of whom 2,382 had at least one of the following mental disorders: behavioural and development disorders, ADHD, anxiety, and eating disorders.

Outcome variables

The mental disorders listed below were considered as dependent variables. Each is identified by its code in the International Classification of Diseases, 10th Revision (ICD-10):

  • Behavioural and development disorders (F94)

  • Anxiety (F41.9)

  • ADHD (F90)

  • Eating disorders (F50)

Environmental explanatory variables

As explanatory environmental variables we included the pollutants PM10 (µg/m3), NO2 (µg/m3), O3 (µg/m3), carbon monoxide (CO) (mg/m3), and SO2 (µg/m3). We obtained their hourly levels for 2009–2019 from the 143 automatic monitoring stations from the Catalan Network for Pollution Control and Prevention (XVPCA) (open data)46. Instead of the short-term effects of these pollutants on children's health, we were interested in the effects of long-term exposure. For this reason, we used the monthly averages, from January 2009 to December 2019 (further details can be found in Saez and Barceló47 and in Bertran et al.48).
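The aggregation from hourly levels to monthly averages can be sketched as follows. This is only an illustration under stated assumptions: the record layout (station, pollutant, timestamp, value), the station identifier `ST01`, the function name `monthly_means` and the rule of skipping missing hours are all hypothetical, since the text does not describe how the XVPCA files were processed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_means(hourly_readings):
    """Average hourly readings per (station, pollutant, year, month).

    hourly_readings: iterable of (station_id, pollutant, timestamp, value).
    Assumption: a missing hour is stored as None and is simply skipped.
    """
    buckets = defaultdict(list)
    for station_id, pollutant, timestamp, value in hourly_readings:
        if value is None:
            continue
        key = (station_id, pollutant, timestamp.year, timestamp.month)
        buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings, invented for illustration only.
readings = [
    ("ST01", "NO2", datetime(2009, 1, 1, 0), 30.0),
    ("ST01", "NO2", datetime(2009, 1, 1, 1), 40.0),
    ("ST01", "NO2", datetime(2009, 1, 1, 2), None),  # missing hour
    ("ST01", "NO2", datetime(2009, 2, 1, 0), 20.0),
]
result = monthly_means(readings)
for key in sorted(result):
    print(key, result[key])
```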

While it is essential to obtain the pollution data in each census tract of the studied population, there may not always be a monitoring station, and this is even more so in a rural setting. We used a Bayesian hierarchical spatiotemporal model47 to predict exposure to air pollutants in locations or time periods without pollution monitoring sites. The estimation of the (second-order stationary) Gaussian field suffers from the so-called 'big n problem', that is, there are large computational costs of the linear algebra operations required for model fitting and prediction. The costs are larger for large datasets in space and time, as in our case, in which a frequentist approach proved unfeasible.

Even with a Bayesian approach, these computational problems persist when a Markov chain Monte Carlo (MCMC) algorithm is used, since dense matrices must be computed at each iteration. Among the solutions that have been proposed for Bayesian inference under the 'big n problem', we selected the Integrated Nested Laplace Approximation (INLA) by Rue et al.49,50, in the specification suggested by Saez and Barceló47 and used by Mota-Bertran et al.4,48. INLA provides a fast and yet quite accurate approach to fitting complex latent Gaussian models, which comprise many statistical models in a Bayesian context49,50.

Predictions of exposure to pollutants were made at the level of the census tract where the subject lives, for the period between the beginning of the follow-up (January 1, 2009) and the day before the diagnosis of the mental illness.

In the context of pollution studies, the CoDa methodology51,52,53,54,55 identifies patterns of varying relative importance of pollutants48,56,57,58 which can later be associated with potential health risks59. The CoDa methodology makes it possible to study the relative importance of pollutant concentrations, that is, how pollutants behave relative to one another, beyond the information provided by total pollution levels. This is crucial for understanding air pollution. The most common tool within the CoDa methodology is the log-ratio transformation, the simplest of which is the log-ratio between pairs of pollutant concentrations53.

For instance, Mota-Bertran et al. 48 identified three substantial trade-offs among the pollutants defined at the beginning of this section: NO2 versus O3, and SO2 versus NO2 and O3. In other words, the log-ratios with the highest variance were ln(O3/NO2), ln(NO2/SO2), and ln(O3/SO2). The CoDa approach is tailored to studying these forms of joint variability, rather than considering one pollutant individually and rather than considering only overall pollution levels.
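A minimal sketch of how pairwise log-ratios can be ranked by variance is given below. The concentrations are invented for illustration and are not the data of Mota-Bertran et al.; the function name `logratio_variances` is likewise hypothetical. The sketch also shows why log-ratios capture only relative information: rescaling every pollutant in a sample by the same factor leaves each variance unchanged.

```python
import math
from itertools import combinations
from statistics import variance


def logratio_variances(samples):
    """Sample variance of ln(a/b) for every pair of pollutants.

    samples: list of dicts mapping pollutant name to a positive concentration.
    """
    parts = sorted(samples[0])
    result = {}
    for a, b in combinations(parts, 2):
        ratios = [math.log(s[a] / s[b]) for s in samples]
        result[(a, b)] = variance(ratios)
    return result


# Hypothetical monthly concentrations, invented for illustration only.
samples = [
    {"NO2": 30.0, "O3": 40.0, "SO2": 2.0},
    {"NO2": 40.0, "O3": 30.0, "SO2": 2.5},
    {"NO2": 20.0, "O3": 60.0, "SO2": 1.5},
    {"NO2": 35.0, "O3": 45.0, "SO2": 3.0},
]
variances = logratio_variances(samples)

# Multiply each sample by its own common factor: the composition is unchanged.
factors = [2.0, 0.5, 3.0, 10.0]
rescaled = [{p: v * f for p, v in s.items()} for s, f in zip(samples, factors)]
variances_rescaled = logratio_variances(rescaled)

for pair, var in sorted(variances.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(var, 4))
```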

The specification of a composition as an explanatory variable was first developed by Aitchison and Bacon-Shone as the so-called log-contrast model. It indicates which pollutants' increases, at the expense of decreases in which others, are related to the incidence of a disease60. For this purpose, the right-hand side of the model equation must include:

$$\begin{aligned}\beta_{1} \ln \left( {{\text{PM}}_{10} } \right) + \beta_{2} \ln \left( {{\text{NO}}_{2} } \right) + \beta_{3} \ln \left( {{\text{O}}_{3} } \right) + \beta_{4} \ln \left( {{\text{CO}}} \right) + \beta_{5} \ln \left( {{\text{SO}}_{2} } \right) \\{\text{with the constraint}} \ \beta_{1} + \beta_{2} + \beta_{3} + \beta_{4} + \beta_{5} = 0\end{aligned}$$
(1)

The constraint \({\upbeta }_{1}+{\upbeta }_{2}+{\upbeta }_{3}{+\upbeta }_{4}+{\upbeta }_{5}=0\) ensures that total pollution is constant, as required for the interpretation of compositional effects as trade-offs. In other words, the increases of some pollutant concentrations in relative terms are only possible when decreasing some others. For ease of estimation, the so-called additive log-ratio transformation60 can be applied to introduce said constraint by setting log-ratios with a common denominator. These log-ratios can be used as variables in any statistical model:

$${\upbeta }_{1}\text{ln}\left(\frac{{\text{PM}}_{10}}{{\text{SO}}_{2}}\right)+{\upbeta }_{2}\text{ln}\left(\frac{{\text{NO}}_{2}}{{\text{SO}}_{2}}\right)+{\upbeta }_{3}\text{ln}\left(\frac{{\text{O}}_{3}}{{\text{SO}}_{2}}\right)+{\upbeta }_{4}\text{ln}\left(\frac{\text{CO}}{{\text{SO}}_{2}}\right),$$
(2)

where \({\upbeta }_{5}\) is obtained either as \({\upbeta }_{5}={-\upbeta }_{1}-{\upbeta }_{2}-{\upbeta }_{3}-{\upbeta }_{4}\) or by rerunning the model with a different pollutant in the denominator61.
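The equivalence between Eq. 1 and Eq. 2 can be checked numerically with a short sketch. The concentrations are the sample averages reported in the Results, while the coefficient values and the function names are hypothetical and chosen only for illustration. The last check shows the consequence of the zero-sum constraint: multiplying all concentrations by a common factor leaves the log-contrast unchanged.

```python
import math

POLLUTANTS = ["PM10", "NO2", "O3", "CO", "SO2"]


def alr_predictor(conc, betas4, denominator="SO2"):
    """Eq. 2: coefficients applied to log-ratios with a common denominator."""
    return sum(
        betas4[p] * math.log(conc[p] / conc[denominator])
        for p in POLLUTANTS
        if p != denominator
    )


def log_contrast_predictor(conc, betas4, denominator="SO2"):
    """Eq. 1: the remaining coefficient is minus the sum of the other four."""
    betas = dict(betas4)
    betas[denominator] = -sum(betas4.values())
    return sum(betas[p] * math.log(conc[p]) for p in POLLUTANTS)


# Average concentrations reported in the Results.
conc = {"PM10": 22.52, "NO2": 34.69, "O3": 44.10, "CO": 0.29, "SO2": 2.20}
# Hypothetical coefficients, invented for illustration only.
betas4 = {"PM10": -0.5, "NO2": 0.5, "O3": 0.25, "CO": -0.5}

eq2 = alr_predictor(conc, betas4)
eq1 = log_contrast_predictor(conc, betas4)
doubled = {p: 2.0 * v for p, v in conc.items()}
eq1_doubled = log_contrast_predictor(doubled, betas4)
print(eq1, eq2, eq1_doubled)
```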

The CoDa approach does not imply disregarding the absolute pollution levels. Apart from considering the relative importance of air pollutants, some form of total air pollution can be added to the compositional information by means of a \(\mathcal{T}\)-space62,63,64,65, also known as CoDa with a total. The composition extracts the relative importance of air pollutants to each other, while the total T speaks of global levels of air pollution. This total is best defined as the so-called multiplicative total, in other words, the product of air pollutant concentrations48,65,66.

$$T=\text{ln}\left({\text{PM}}_{10}\cdot {\text{NO}}_{2}\cdot {\text{O}}_{3}\cdot \text{CO}\cdot {\text{SO}}_{2}\right)= \left(\text{ln}\left({\text{PM}}_{10}\right)+\text{ln}\left({\text{NO}}_{2}\right)+\text{ln}\left({\text{O}}_{3}\right)+\text{ln}\left(\text{CO}\right)+\text{ln}\left({\text{SO}}_{2}\right)\right).$$
(3)
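As a brief aside, the total in Eq. 3 can be written in terms of the geometric mean of the five concentrations. The symbol \(g\) is notation introduced here for illustration only:

$$T=5\,\text{ln}\left(g\right),\quad \text{where}\quad g={\left({\text{PM}}_{10}\cdot {\text{NO}}_{2}\cdot {\text{O}}_{3}\cdot \text{CO}\cdot {\text{SO}}_{2}\right)}^{1/5}$$

Under this reading, multiplying every concentration by a common factor \(c\) multiplies \(g\) by \(c\) and adds \(5\,\text{ln}\left(c\right)\) to \(T\), while all log-ratios between pollutants stay the same.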

The total refers to increasing all pollutant concentrations in absolute terms by a common factor and can be specified by adding the γ coefficient to the right-hand side of the model equation:

$${\upbeta }_{1}\text{ln}\left(\frac{{\text{PM}}_{10}}{{\text{SO}}_{2}}\right)+{\upbeta }_{2}\text{ln}\left(\frac{{\text{NO}}_{2}}{{\text{SO}}_{2}}\right)+{\upbeta }_{3}\text{ln}\left(\frac{{\text{O}}_{3}}{{\text{SO}}_{2}}\right)+{\upbeta }_{4}\text{ln}\left(\frac{\text{CO}}{{\text{SO}}_{2}}\right)+ \gamma (\text{ln}\left({\text{PM}}_{10}\right)+\text{ln}\left({\text{NO}}_{2}\right)+\text{ln}\left({\text{O}}_{3}\right)+\text{ln}\left(\text{CO}\right)+\text{ln}\left({\text{SO}}_{2}\right) )$$
(4)

For interpretation, the effects of pollutant concentrations can be decomposed into the log-contrast part and the total part as:

$${(\upbeta }_{1}+\upgamma )\text{ln}\left({\text{PM}}_{10}\right)+{(\upbeta }_{2}+\upgamma )\text{ln}\left({\text{NO}}_{2}\right)+{(\upbeta }_{3}+\gamma )\text{ln}\left({\text{O}}_{3}\right)+{(\upbeta }_{4}+\gamma )\text{ln}\left(\text{CO}\right)+{(\upbeta }_{5}+\upgamma )\text{ln}\left({\text{SO}}_{2}\right)$$
(5)

with \({\upbeta }_{1}+{\upbeta }_{2}+{\upbeta }_{3}{+\upbeta }_{4}+{\upbeta }_{5}=0\) . This decomposition provides complementary views on pollution and mental health which are made possible by the \(\mathcal{T}\)-space. Firstly, the \(\beta\) coefficients are interpreted as trade-offs: positive \(\beta\)\(\text{s}\) indicate pollutants whose relative increase is related to a greater incidence of mental disorders when coupled with relative decreases in pollutants with negative \(\beta {\text{s}}\), the total remaining constant. Secondly, the \(\upgamma\) coefficient relates the incidence of mental disorders to increases in the overall pollution level while keeping its composition constant, in other words, when multiplying all pollutant concentrations by a common factor. Thirdly, the sums \(\beta + {\upgamma }\) relate mental disorders to the increase of the absolute concentration of each pollutant while leaving the absolute concentrations of all other pollutants constant, which implies both an increase in the relative importance of the pollutant and an increase in the total pollution. This interpretation is the only one available in standard pollution modelling. The CoDa approach with a total adds to this the decomposition of the sums \(\beta + {\upgamma }\) into \(\beta\) and \(\upgamma\).
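The three interpretations above can be verified numerically with a minimal sketch. The concentrations are the sample averages reported in the Results; the values of the β and γ coefficients, the factor `c` and the function names are hypothetical and chosen only so that the β coefficients add up to zero.

```python
import math

POLLUTANTS = ["PM10", "NO2", "O3", "CO", "SO2"]


def eq4(conc, betas, gamma):
    """Eq. 4: log-ratios against SO2 plus gamma times the total T."""
    alr_part = sum(
        betas[p] * math.log(conc[p] / conc["SO2"])
        for p in POLLUTANTS
        if p != "SO2"
    )
    total = sum(math.log(conc[p]) for p in POLLUTANTS)
    return alr_part + gamma * total


def eq5(conc, betas, gamma):
    """Eq. 5: one coefficient (beta + gamma) per log-concentration."""
    return sum((betas[p] + gamma) * math.log(conc[p]) for p in POLLUTANTS)


def scaled(conc, factors):
    """Copy of conc with the listed pollutants multiplied by their factor."""
    return {p: conc[p] * factors.get(p, 1.0) for p in POLLUTANTS}


conc = {"PM10": 22.52, "NO2": 34.69, "O3": 44.10, "CO": 0.29, "SO2": 2.20}
# Hypothetical coefficients; the betas sum to zero as the constraint requires.
betas = {"PM10": -0.5, "NO2": 0.5, "O3": 0.25, "CO": -0.5, "SO2": 0.25}
gamma = 0.125
c = 1.1

base = eq5(conc, betas, gamma)
# Trade-off: NO2 up and CO down by the same factor, so the total is constant.
change_tradeoff = eq5(scaled(conc, {"NO2": c, "CO": 1 / c}), betas, gamma) - base
# Total: every pollutant multiplied by c, so the composition is constant.
change_total = eq5(scaled(conc, {p: c for p in POLLUTANTS}), betas, gamma) - base
# Single pollutant: only NO2 multiplied by c, the others held constant.
change_single = eq5(scaled(conc, {"NO2": c}), betas, gamma) - base

print(change_tradeoff, (betas["NO2"] - betas["CO"]) * math.log(c))
print(change_total, 5 * gamma * math.log(c))
print(change_single, (betas["NO2"] + gamma) * math.log(c))
```

In this sketch the trade-off change depends only on the β coefficients, the common-factor change depends only on γ, and the single-pollutant change depends on the sum of both, which mirrors the decomposition described in the text.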

Because the numbers of cases of ADHD and eating disorders are small, for these two disorders we simplified Eq. 4 by dropping the log-ratios and retaining only the total pollution term \(\gamma (\text{ln}\left({\text{PM}}_{10}\right)+\text{ln}\left({\text{NO}}_{2}\right)+\text{ln}\left({\text{O}}_{3}\right)+\text{ln}\left(\text{CO}\right)+\text{ln}\left({\text{SO}}_{2}\right) )\). This simplification improved the stability of the estimates.

Covariables

We used the following covariables.

  • Gender: binary variable indicating girls (1) or boys (0).

  • Age: standardized to zero mean and unit standard deviation.

  • Number of chronic diseases: including bronchitis (7.5%), asthma (6.4%), neoplasms (2.3%) and others (chronic obstructive pulmonary disease, hypertension, and ischemic cardiomyopathy).

  • Glucose intolerance: binary variable indicating if the subject has glucose intolerance (1) or not (0).

  • Obesity: binary variable indicating if the subject is obese (BMI ≥ 30 kg/m²) (1) or not (0).

  • Alcohol consumption: binary variable indicating if the subject consumes alcohol (1) or not (0).

  • Tobacco consumption: binary variable indicating if the subject is a smoker or was a former smoker (1) or not (0).

  • A proxy variable for individual socioeconomic level: categorical variable based on the levels of pharmaceutical co-payment in force in Catalonia67. The following categories were defined for the economic contribution: up to 10% (lowest socioeconomic level, reference category); 40% (medium socioeconomic level); and 50% or more (highest socioeconomic level). The first category contains incomes below 18,000 EUR/year, the second between 18,000 and 100,000 EUR/year and the third above 100,000 EUR/year.

  • Average income (EUR). Average from 2015 to 2019 observed at census-tract level68 and transformed into quartiles (lowest quartile as reference category).
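The coding of the covariables listed above can be sketched as follows. This is a minimal standard-library Python illustration and not the code used in the study: the function names, the category labels ("low", "medium", "high"), the use of the sample standard deviation, the quantile method and the example values are all assumptions made for illustration.

```python
import statistics

def standardize(values):
    """Centre to zero mean and scale to unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def quartile_labels(values):
    """Assign each value to quartile 1-4 (1 = lowest, the reference category)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]

def copayment_dummies(level):
    """Dummy-code the co-payment proxy; 'low' (up to 10%) is the reference."""
    return {"medium": int(level == "medium"), "high": int(level == "high")}

# Hypothetical example values (not study data)
ages = [6, 9, 12, 15, 8, 11, 14, 7]
incomes = [21000, 35000, 28000, 52000, 19000, 41000, 30000, 26000]

age_z = standardize(ages)            # standardized age
income_q = quartile_labels(incomes)  # census-tract income quartile, 1 to 4
```

Coding the reference category as all zeros means that each remaining coefficient is read as a contrast against the lowest socioeconomic level or the lowest income quartile.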

Data analysis

We specified a generalized linear mixed model (GLMM) for each health outcome, with binary response from the binomial family:

$$\begin{aligned}\text{ln}\left(\frac{Prob\left({Y}_{it}\right)}{1-Prob\left({Y}_{it}\right)}\right)& = {\beta }_{0}+\sum_{k=1}^{5}{\beta }_{k}{pollutant}_{k,it}+\gamma {T}_{it}+{\beta }_{6}{gender}_{i}+{\beta }_{7}{age}_{it}+{\beta }_{8}{number\_of\_chronic\_diseases}_{it}\\ &\quad{+\beta }_{9}{impaired\_glucose\_tolerance}_{it}+{\beta }_{10}{obesity}_{it}+{\beta }_{11}{alcohol}_{it}\\ &\quad+{\beta }_{12}{tobacco}_{it}+\sum_{k=1}^{2}{\beta }_{13k} {individual\_socioeconomic\_status}_{ik}\\ &\quad+ \sum_{k=2}^{4}{\beta }_{14k} {quartile\_average\_income}_{ik} {+ \eta }_{i}+S\left({census\_tract}_{i}\right),\end{aligned}$$
(6)

where the subindexes \(i\) and \(t\) indicate the subject and the year, respectively; \({Y}_{it}\) indicates the mental health problem of subject \(i\) in year \(t\) (0 absence, 1 presence); \({pollutant}_{k,it}\) denotes the exposure to PM10, NO2, O3, CO, and SO2 in relative terms; \({T}_{it}\) denotes the exposure to overall pollution as in Eq. 3; \({\eta }_{i}\) and \(S\left({census\_tract}_{i}\right)\) denote random effects (explained below); and the \(\beta\)s and \(\gamma\) are the coefficients of the explanatory variables and covariables (with the logit link of Eq. 6, \({e}^{\beta }\) is the odds ratio associated with each covariable, which approximates the relative risk when the outcome is rare).
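To make the scale of Eq. 6 concrete, the following Python sketch converts a linear predictor into a probability and shows the effect of exponentiating a coefficient. The coefficient values, covariate values and function names are hypothetical and serve only as an illustration; the random effects are passed in as given numbers rather than estimated.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(intercept, coefs, covariates,
                     subject_effect=0.0, spatial_effect=0.0):
    """Right-hand side of Eq. 6 for one subject-year."""
    fixed = sum(coefs[name] * covariates[name] for name in coefs)
    return intercept + fixed + subject_effect + spatial_effect

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients (not estimates from the study)
coefs = {"gender": 0.40, "age": 0.25}
boy = {"gender": 0, "age": 1.0}
girl = {"gender": 1, "age": 1.0}

p_boy = inv_logit(linear_predictor(-3.0, coefs, boy))
p_girl = inv_logit(linear_predictor(-3.0, coefs, girl))

# On the logit scale, a one-unit change in a covariable multiplies
# the odds Prob/(1 - Prob) by exp(beta), all else being equal.
odds_multiplier = odds(p_girl) / odds(p_boy)
```

Here `odds_multiplier` equals `math.exp(coefs["gender"])` regardless of the values of the other covariables or of the random effects, as long as they are the same for both subjects.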

We considered two random effects in the models. First, \({\eta }_{i}\) is a random effect indexed on the subject. This random effect is unstructured (independent and identically distributed), and captures individual heterogeneity, i.e., unobserved confounders specific to the subject and invariant in time.

Second, we included the structured random effect \(S\left({census\_tract}_{i}\right)\) to control for spatial dependency: census tracts that are close in space show more similar incidence than those that are far apart.

The spatially structured random effect \(S\left({census\_tract}_{i}\right)\) is normally distributed with zero mean and a Matérn covariance function:

$$Cov\left(S\left({x}_{i}\right),S\left({x}_{{i}^{\prime}}\right)\right)=\frac{{\sigma }^{2}}{{2}^{\nu -1}\Gamma \left(\nu \right)} {\left(\upkappa \Vert {x}_{i}-{x}_{{i}^{\prime}}\Vert \right)}^{\nu } {\text{ K}}_{\nu } \left(\upkappa \Vert {x}_{i}-{x}_{{i}^{\prime}}\Vert \right),$$
(7)

where \({\text{K}}_{\nu }\) is the modified Bessel function of the second kind and order \(\nu\); \(\nu >0\) is a smoothness parameter; \({\sigma }^{2}\) is the variance; and \(\kappa >0\) is related to the range (\(\rho =\sqrt{8 \nu }/\kappa\)), the distance at which the spatial correlation is close to 0.1 (ref. 69).
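Equation 7 can be evaluated directly. The sketch below is a standard-library Python illustration and not the INLA implementation used in the study; the function names, the integration settings and the parameter values (\(\nu = 1\), \(\kappa = 2\), \({\sigma }^{2}=1\)) are assumptions chosen for illustration. \({\text{K}}_{\nu }\) is approximated with the integral representation \({\text{K}}_{\nu }(x)={\int }_{0}^{\infty }{e}^{-x\cosh t}\cosh (\nu t)\,dt\), evaluated with the trapezoidal rule.

```python
import math

def bessel_k(nu, x, upper=20.0, steps=4000):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt (trapezoidal rule).
    """
    h = upper / steps
    total = 0.0
    for j in range(steps + 1):
        t = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def matern_cov(d, sigma2, kappa, nu):
    """Matern covariance of Eq. 7 at distance d = ||x_i - x_i'||."""
    if d == 0.0:
        return sigma2  # limit of Eq. 7 as the distance goes to zero
    x = kappa * d
    scale = sigma2 / (2.0 ** (nu - 1.0) * math.gamma(nu))
    return scale * x ** nu * bessel_k(nu, x)

# Hypothetical parameter values
nu, kappa, sigma2 = 1.0, 2.0, 1.0
rho = math.sqrt(8.0 * nu) / kappa            # range parameter
corr_at_range = matern_cov(rho, sigma2, kappa, nu) / sigma2
```

With these illustrative values the correlation at distance \(\rho\) comes out at roughly 0.14, that is, of the order of 0.1. For \(\nu =1/2\) the function reduces to the exponential covariance \({\sigma }^{2}{e}^{-\kappa d}\), which gives a simple check of the numerical integration.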

As can be seen, the GLMMs include a large number of unobserved (random) components, which makes a frequentist approach impractical. For this reason, inferences were made from a Bayesian perspective using the INLA approach49,50. We used priors that penalize complexity (PC priors). These priors are robust in the sense that they do not have an impact on the results70. All analyses were carried out using the free software R (version 4.2.2)71, through the INLA package49,50,72.
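As an illustration of how a PC prior is specified, consider the standard deviation \(\sigma\) of an unstructured Gaussian random effect such as \({\eta }_{i}\). The PC prior for \(\sigma\) is an exponential distribution whose rate \(\lambda\) is fixed by a tail statement \(P(\sigma >u)=\alpha\), which gives \(\lambda =-\text{ln}(\alpha )/u\). The Python sketch below is an assumption-laden illustration: the values of \(u\) and \(\alpha\) are hypothetical (those used in the study are not reported here), the function names are introduced for convenience, and the joint PC prior for the Matérn range and variance is not covered.

```python
import math

def pc_prior_rate(u, alpha):
    """Rate of the exponential PC prior on sigma such that P(sigma > u) = alpha."""
    return -math.log(alpha) / u

def pc_prior_density(sigma, u, alpha):
    """Density of the PC prior at sigma >= 0."""
    lam = pc_prior_rate(u, alpha)
    return lam * math.exp(-lam * sigma)

def pc_prior_tail(sigma, u, alpha):
    """Prior probability that the standard deviation exceeds sigma."""
    return math.exp(-pc_prior_rate(u, alpha) * sigma)

# Hypothetical tail statement: P(sigma > 1) = 0.01
u, alpha = 1.0, 0.01
rate = pc_prior_rate(u, alpha)
tail_at_u = pc_prior_tail(u, u, alpha)
```

The density is highest at \(\sigma =0\), so the prior favours the simpler model without the random effect unless the data support additional variation.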

Ethical approval and consent to participate

The use of the data included is authorized by the Catalan Health Institute (ICS) and the Data Analysis Program for Health Research and Innovation (PADRIS), which ensure the pseudo-anonymization of the information. When linkage with other public data sources is required, ICS or PADRIS act as a Trusted Third Party (TTP) to execute the linkage and provide the new data set already pseudo-anonymized; otherwise, informed consent of patients is needed to access their personal data, using the same TTP. The data are managed in a secure server following all the current legal requirements of Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation) and of the Spanish Organic Law 3/2018 of 5 December on the protection of personal data and guarantee of digital rights.