Background & Summary

Concomitantly with artificialization1 and densification of (sub)urban areas, an emphasis has been placed on kitchen gardens, which offer many benefits for the environment and the population, such as the reintroduction of nature into cities, the creation of social links and the willingness to produce local food and eat more healthily. In towns and cities, the pressure on land and the dynamic toward urban agriculture is such that shared, community or individual gardens are most often set up on areas where the soil is degraded. In France, this trend was reinforced by the Climate and Resilience Act (2021)2 requiring neutral balance between artificialization and restoration (zero-net artificialization) by 2050. For example, redeployment of brownfield sites could result in increasing the exposure of citizens to contaminants3. Thus, compliance between the soil state and its use in any urban agriculture project needs to be assessed. For this purpose and on the initiative of ADEME (The French Agency for Ecological Transition) and INERIS (French institute of industrial environment and risks), the BAPPOP dataset was created in 2015 and updated in 2023. BAPPOP gathered data on a set of organic molecule concentrations in several kitchen garden crop types and associated soils, in different contexts and origins of contamination. The dataset resulted in a freely available table, located in RechercheDataGouv4, an online repository with shared data, conforming to “FAIR” principles. Another dataset5, previously created in the same context as BAPPOP, concerning metallic trace elements concentrations in both different kitchen garden crop types and in associated soils (BAPPET dataset, BAse de données sur la contamination des Plantes Potagères par les Eléments Traces métalliques, i.e. dataset on contamination of kitchen garden plants by metallic trace elements), was also updated in 2023.

The BAPPOP dataset was initially created for people in charge of environmental analyses to compare site-specific results to literature data in identical or similar contexts. A user survey was conducted from December 2022 to February 2023, revealing that the BAPPOP dataset is used either for prediction of soil and/or vegetable concentrations, or for comparison of data with their own specific site data. A research on Web Of Science (WOS) was performed to discern whether the BAPPOP dataset had already been used for scientific analysis, but to our knowledge, this is not yet the case.

Methods

Selection of most relevant organic molecules

The first step was to select an appropriate search engine for this collection of bibliographic resources. Three search engines were available at the Université de Lorraine: WOS, Science Direct and Ex Libris. Whether at the creation of BAPPOP in 2015, or when it was updated in 2023, WOS was selected as being the tool returning the highest number of references for a test associating one of the five sub-families of organic pollutants (polycyclic aromatic hydrocarbon-PAH, polychlorinated biphenyls-PCB, polychlorinated dibenzodioxins-PCDD/polychlorinated dibenzofurans-PCDF and benzene, toluene, ethylbenzene and xylene-BTEX) to four target keywords (plant/transfer/uptake/soil).

At the time of the BAPPOP dataset creation in 2015 it appeared essential to define a reasonable list of target molecules to be included in the dataset, so that it would both meet the end-users’ expectations and be adapted to existing literature resources. This list was updated in 2023 following the same decision-making methodology. The decision-making process thus associated three major issues regarding the existence of i) regulations and ii) toxicological reference values, and iii) the availability of published data.

In 2015 a preliminary list of 576 organic molecules was established, based on existing European legislation6,7,8, scientific reports by the European Food Safety and Authority (EFSA 20089 and EFSA 201010) and French monitoring and control plan of animal and plant foodstuffs and animal feedstuffs by the French directorate for food 200411 and 201012. To be further considered, a molecule had to be present in at least three of the above cited regulations, thereby leading in 2015 to a reduced list of 169 molecules. To this list, organic molecules considered as priority substances by environmental and health risk agencies were added on expert opinion. Thus, this led to a list of 235 organic molecules of interest.

Given the way the dataset was to be implemented, it seemed pointless to include molecules which had yet to be studied and for which literature resource was poor or non-existent. A bibliographic criterion, based on the potential availability of data was defined. Using the request in WOS [“plant” AND “transfer” OR “uptake” AND “name of the organic molecule”], where “name of the molecule” was replaced by the name of one of the 235 pre-selected molecule, initially applied over the period 1950 to June 19th 2014, the Bibliometric score set was equal to the number of publications returned by the search +1.

A Toxicological score combining hazard and dose scoring was defined as follows. The Hazard score was defined according to the French National Institute of Research and Security (INRS) classification on carcinogenic, mutagenic and reprotoxic (CMR) substances13 and the directive 1272/2008/CEE14 according to the class 1A or 1B, 2 and no CMR classification. If the organic molecule is:

  • Class 1A or 1B, the hazard score is equal to 6;

  • Class 2, the hazard score is equal to 3;

  • Not classified as CMR, the hazard score is equal to 1.

The Dose score is given according to the existence of a toxicity reference value (TRV) in the main toxicological databases, like US EPA15, ATSDR16 and RIVM17. A score equal to 2 is attributed to a substance which presents at least one TRV, and a score equal to 1 is attributed when no TRV exists for the substance. The global toxicological score is then given by:

$${Toxicological\; score}={Hazard\; score}\ast {Dose\; score}$$
(1)

A Global score was then calculated with:

$${Global\; score}={Bibliometric\; score}\ast {Toxicological\; score}$$
(2)

In 2015, this analysis led to the selection of 47 molecules to be considered in the dataset. To represent the quantity of available publications and the number of molecules to consider, the cumulative number of publications as a function of molecule number was represented (Fig. 1). Thus, it makes it possible to assess the gain in publications compared to the number of molecules to be considered. It has been suggested that a threshold of 30 should be set, given that the gain in publishing beyond this is small (Fig. 1). In order not to exclude other non-dioxin like PCB, 7 non-dioxin like PCB were added, resulting in a list of 54 organic molecules of interest.

Fig. 1
figure 1

Cumulative number of publications as a function of the number of molecules to consider.

In 2023, the same methodology was followed, considering the period 2015 to 2022 for the update. Updated legislation was used for the filter application, leading to the addition of 18 new organic molecules to the BAPPOP dataset. Based on the same strategy, 36 molecules that are now prioritised by environmental and risks agencies were added to the BAPPOP dataset update, including PCCD, PCDF and per-and polyfluoroalkyl substances (PFAS), of which 7 PCDD and 10 PCDF that are conventionally assessed in the environment and 19 PFAS based on commonly studied PFAS in scientific literature. Thus, 108 organic molecules of interest are now considered in the BAPPOP dataset.

Plant selection criteria

The publications, of whatever origin, from which the data were registered in BAPPOP were selected following a specific decision-making methodology detailed in the Fig. 2 flowchart. The first selection criterion was to refer to experiments on kitchen garden crops exclusively. The kitchen garden crops not cultivated in temperate climates were not included.

Fig. 2
figure 2

Decision-making flowchart for data implementation in BAPPOP dataset. (BCF stands for bioconcentration factor).

With regards to plant species, the consumed organs of plants were taken into account, to the exclusion of all others. For example, a study concerning the analysis of organic molecule concentration in tomato roots was excluded. Similarly, studies involving model plant species (e.g. Arabidopsis thaliana) or cereals other than grain of sweet corn were excluded. The molecules studied in publications had to be among the 108 selected molecules of interest. In addition, all experiments in cited studies had to be carried out on soils, leading to the elimination of hydroponic experimentations or direct use of substrates (e.g. compost, sewage sludge…) without mixing with soil. Organic molecule concentrations in soils and plants had to be supplied by the publication. Situation in which experimentations were conducted with unrealistically high doses of organic molecules in soils or leading to significant phytotoxic responses of the plant were also excluded. When bioconcentration factors (BCF) were used to express plant uptake, these had to correspond to only one species and one edible part of the plant. Data obtained by averaging the concentration of different plant species or different organs of different plant species or even calculated from models were also excluded.

Data collection

Relevant international scientific published literature was collected with a search request in WOS database, performed on title, abstract and keywords, using the following equation:

$$BAPPOP\,update=(Plant\,\mathrm{AND}\,(Transfer\,OR\,Uptake)\,\mathrm{AND}\, \mbox{``} Name\,of\,the\,organic\,molecule\,of\,interest\mbox{''})$$
(3)

At the time of BAPPOP creation in 2015, 1213 occurrences were identified for the period spanning 1950 to June 19th 2014. When updating the dataset in 2022, 2677 occurrences were identified for the subsequent period from 2015 to August 1st 2022. Following the methods described above, 90 references were recorded in BAPPOP with 87 references collected via search request on WOS and 3 experimental reports from French institutes.

Data Records

Recording of data

The scientific publications and experimental reports are referenced in BAPPOP with DOI when applicable and/or information to make the document findable. A unique source code has been designed to refer to each source document (e.g. scientific publication or experimental report) and corresponds to the 3 first letters of the first author’s surname followed by the last two numbers of the publication year of the document.

BAPPOP structure

The BAPPOP dataset is organized in 1203 experiments (Fig. 3) collected from 90 bibliographic references on 75 organic molecules, 12 families of organic molecules, 12 plant types and 48 plant species. An experiment is defined as one or a set of organic molecules analysed for one single plant species or variety grown in a single treatment/modality of culture. For each experiment at least the following data are recorded: organic molecule concentration per plant and in the soil supporting their culture. For a given experiment, different environmental media (i.e. soil, air, water) may be analyzsd. Thus, in the BAPPOP dataset, there are 6246 recorded observations. The description of the BAPPOP dataset is presented in Table S1.

Fig. 3
figure 3

Data structuration in the BAPPOP dataset. Primary keys are in grey. N° refers to the n lines (one or more) for one experiment in the database. Exp: Experiment; OM: Organic molecule; Env. Medium: Environmental medium.

Experimental conditions

In the BAPPOP dataset, organic molecules have various pollution origins (i.e. agricultural, artificial, industrial, natural, urban or a combination of origins). The environmental context and the experimental types are also detailed when possible, because these parameters can influence the final result.

Organic molecule parameters

Each organic molecule is defined by its French and English names and family type, Chemical Abstract Services number (CAS) and chemical formula (Table S1). The list of all the organic molecules considered in BAPPOP is detailed in the BAPPOP metadata associated with the dataset itself in the data repository. The types of organic molecules with the most observations recorded in the BAPPOP dataset are, in decreasing order, PCDD/PCDF, PAH, PFAS and PCB (Fig. 4). It is noteworthy that (i) of the 14 family types of organic molecules considered in the BAPPOP dataset, only 2 families (i.e. polychlorinated alkanes-PCA and herbicides) present no observations recorded in the dataset (Fig. 4) and (ii) of the 108 organic molecules considered, only 75 were recorded in the dataset because the other compounds lacked data corresponding to our research criteria. The main criteria for rejection were as follows (i) hydroponic experiments, (ii) the plant species was not considered of interest to increment the BAPPOP dataset, (iii) the concentration of the given organic molecule was not available in both the plant and the soil.

Fig. 4
figure 4

Frequency of observations recorded in the BAPPOP dataset as a function of organic molecule family types.

Plant parameters

The BAPPOP dataset describes organic molecule concentration in kitchen garden crops commonly grown in temperate climate conditions. Plant species are classified in 13 family types. It is noteworthy that leafy vegetables are the most represented plant type with 31% of data records, and that no data is available for the “Edible flower” type (Fig. 5, Table S2). Of the 98 plant species considered for inclusion in the BAPPOP dataset (Table S2), only 48 were included due to the lack of research meeting our selection criteria. Carrots and lettuces are the most documented species with over 100 experiments and more than 1000 entries recorded. Other parameters concerning the analyses and the preparation of samples are documented in BAPPOP dataset (Table S1, section “Plant”) whenever available in the source document. No information regarding cultivar and health status (publications reporting phytotoxicity were excluded) of the plant has been implemented.

Fig. 5
figure 5

Number of occurrences recorded in the BAPPOP dataset categorized by species and plant types. The proportion of data recorded for each plant type is detailed in the caption.

Soil parameters

Several parameters known to affect the availability of organic molecules are recorded in the BAPPOP dataset. These parameters concern the soil type which was used as growing support, the soil texture, and classic information about soil chemistry parameters. All details of the 8 parameters considered are given in Table S1 in the “Soil” section.

Organic molecule analyses in the environmental medium

The BAPPOP dataset focuses on studies reporting organic molecules concentration in soils but, when available, the organic molecule content in water and/or air is also included. The BAPPOP dataset also reports the extraction type and the chemical extractant used to analyse organic molecules (Table S1).

Data repository

Data collected can be freely accessed through the Recherche Data Gouv repository4. The data are organized in .csv files, accompanied by metadata. All variables recorded in the dataset are described in Table S1. To ensure the anonymity of the data, no location is provided concerning the origin in the dataset. Only the country of origin of first author is given when data were collected from scientific literature.

Technical Validation

The quality control procedure included two points: (i) quality of the data collected, and (ii) quality of the recorded data.

Concerning data collected from scientific literature, it was assumed that the quality of the data was confirmed by the peer-review process. When available, the detection limit and/or quantification limit were recorded in the BAPPOP dataset. Concerning data quality from experimental reports, the studies were supported by French public institutions (ADEME, INERIS) which build various guidelines to optimize experimentation practices18,19,20. Consequently, the use of these guidelines is assumed.

Concerning the quality of recorded data, the operators in charge of BAPPOP incrementation used a Visual Basic Applications (VBA) form with drop-down menus. For numerical inputs, when possible, data were extracted from .pdf files using the import function of Excel. When needed, data were extracted from graphs, using the online software WebPlotDigitizer21 with an estimated error between 0 and 16% (mean error 1.7% ± 2.1, median error 0.85%, n = 219 values from 14 different papers).

Usage Notes

The BAPPOP dataset is suitable (i) to compare site-specific results to scientific literature data, for example as part of an environmental analysis, (ii) to predict organic molecule’ concentration in kitchen garden crops based on the soil concentration and (iii) to identify parameters that may significantly drive the soil to plant transfer of organic molecules. For example, in 2023, a test was performed on the BAPPOP dataset concerning dieldrin, to predict its concentrations in different types of vegetables, based on soil dieldrin concentrations of a residential area. The objective was to assess the suitability of the soil for use as a kitchen garden by the local residents. This revealed that the use of the soil as a kitchen garden did indeed conform to sanitary guidelines for example for growing leafy or root vegetables.

The update by users of the BAPPOP dataset is not possible. Only those individuals responsible for updating BAPPOP dataset are allowed to carry out this task. Giveng the number of bibliographic references added with the update performed in 2023, an update of the BAPPOP dataset around every 5 years seems reasonable.