Background & Summary

Beach litter is a growing concern worldwide due to its negative impacts on the environment, economy, and marine life1. Collecting and organizing beach litter data is crucial for understanding its sources and distribution2. Integrating meteorological and oceanographic factors such as winds, waves, grain size, beach slope, and precipitation into a unified dataset for the same location and time frame could enhance data analysis and comprehension of beach and marine litter dynamics3,4.

Regarding data collection, meteo-oceanographic data can be retrieved from models (e.g., wind direction and intensity), from meteorological stations, or from field sampling (e.g., grain size). Beach activities and the presence of Marine Protected Areas (MPAs) are also key to contextualizing beach litter data5. This type of data can be collected by observation, following a given checklist (e.g., the BeachLog tool5 or Coastal Scenery Evaluation techniques6). In addition, the interaction between marine litter and biota, such as algae7 and protected species8, is a topic of rising attention and concern among the scientific community and coastal managers. These data can be collected in field campaigns, combined or not with beach litter monitoring or cleanup activities.

Tourism has been identified as a potential source of beach litter in many popular tourist destinations9,10,11. However, there remains a gap in the literature regarding datasets that integrate information from both the tourism sector and beach litter. One possible reason for this gap is a conflict of interest in disclosing data on tourism activities, accommodation occupancy, and revenue, given that tourism is a lucrative and rapidly growing industry12.

Public datasets most often provide a fragmented perspective on the marine litter topic. For instance, Oracle was used to store data from necropsies and stomach content analyses to study biological interactions with marine litter13. Europe has led the way in establishing strong data management practices, exemplified by the pan-European beach litter database. This database supports the EMODnet Chemistry beach format, enabling the integration of datasets from various protocols and reference systems for marine debris monitoring14,15,16. In contrast, countries in the Global South, such as Brazil, still lack comparable frameworks for effective marine litter data management17.

Databases that integrate different types of data would represent progress in marine litter studies. One challenge is that the timeframes of different data types often do not match. In this paper, we collected all data (beach litter, beach use, and meteo-oceanographic factors) within the same timeframe and geographic region. As a result, we compiled a unified dataset that enables a more complete view of the beach litter topic. Each type of data was collected with a different method, but all are comparable in space and time.

As the region of interest for dataset construction, we collected data on Itamaracá Island in northeast Brazil. As shown in Fig. 1(a), beach litter is present along the sand strip and near urban areas, highlighting the importance of integrating different types of data related to beach litter, including beach use. The variety of possible sources, types, and materials of beach litter (Fig. 1b) also presents a challenge for mitigation and management strategies18. In addition, plastic items in beach wrack have been reported for the region (Fig. 1c), adding more complexity to management strategies. Connecting data on beach litter, beach use, and coastal oceanographic conditions can provide a more complete understanding of the problem, especially when the data are collected within the same timeframe or sampling effort.

Fig. 1

Beach litter on Itamaracá Island, Pernambuco, Brazil. (a) Beach litter on the sand strip, close to an urban area; (b) beach litter collected in 25 m²; (c) fishing net found in beach wrack at Sossego beach (Sos) on Itamaracá Island.

By incorporating variables such as meteo-oceanographic data, beach use, and beach litter surveys into a single platform, relational databases enable the identification of correlations and patterns that may not be apparent when datasets are analyzed individually. This provides a more comprehensive understanding of the possible sources of marine litter, the role of society, the impacts, and mitigation strategies. Together with Citizen Science initiatives, such databases can contribute to better use of data that have already been collected and made openly accessible19. Future work could include data from initiatives such as Ocean Travelers (https://serc.si.edu/participatory-science/projects/ocean-travelers) and CoastSnap20.

In summary, the main contribution of this paper is a compiled dataset for Itamaracá Island that integrates different data types focused on beach litter, all collected within the same timeframe.

Methods

Area of interest

Itamaracá Island, situated in the Northeast of Brazil (7.735° S, 34.870° W) (Fig. 2), is a coastal region of significant ecological and socio-economic importance, with a growing tourism sector. The island’s urban development and increasing human activity make it a hot spot for beach litter studies, as these factors contribute to pollution in the coastal environment. Additionally, the island is surrounded by Protected Areas and an important estuarine system21. The interaction between urban occupation, tourism, and adjacent conservation areas presents a unique opportunity to understand beach litter dynamics and its interaction with beach use, the tourism sector, and meteo-oceanographic data.

Fig. 2

Itamaracá Island and neighboring cities: Igarassu (south), Itapissuma (west), and Goiana (north). Sampled beaches are represented as red circles and urban areas as gray lines.

Relational database

We developed a relational database to retrieve and construct a unified dataset focused on beach litter and including other data types. The physical model was implemented in PostgreSQL using the pgAdmin interface and SQL (Structured Query Language). We created the tables, defined their relationships, and input data using a Python script linking the table structure to data organized in a spreadsheet. The database Entity-Relationship Diagram (ERD) is represented in Fig. 3. The Python scripts are available at https://github.com/ramos-bruna/MarineLitter_database.
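The workflow described above (create tables, define relationships, load rows, then query across data types) can be illustrated with a minimal sketch. It is not the authors' script: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, the table names only echo sheets of the compiled dataset, and every column name and value is a hypothetical placeholder.

```python
import sqlite3

# Hypothetical, simplified schema. Table names echo sheets of the compiled
# dataset; all column names are assumptions made for illustration only.
SCHEMA = """
CREATE TABLE beach (
    beach_id      INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE survey_details (
    survey_id     INTEGER PRIMARY KEY,
    beach_id      INTEGER NOT NULL REFERENCES beach(beach_id),
    survey_date   TEXT NOT NULL
);
CREATE TABLE litter_data_beach (
    survey_id     INTEGER NOT NULL REFERENCES survey_details(survey_id),
    category      TEXT NOT NULL,
    item_count    INTEGER NOT NULL
);
CREATE TABLE meteo_oceano (
    survey_id     INTEGER NOT NULL REFERENCES survey_details(survey_id),
    wind_speed_ms REAL
);
"""

# Invented placeholder rows, NOT values from the published dataset.
EXAMPLE_ROWS = {
    "beach": [(1, "Sossego")],
    "survey_details": [(10, 1, "2022-03")],
    "litter_data_beach": [(10, "plastic", 7), (10, "metal", 5)],
    "meteo_oceano": [(10, 4.5)],
}


def build_database(rows):
    """Create the tables in an in-memory database and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the declared relationships
    conn.executescript(SCHEMA)
    # Parent tables are loaded before the tables that reference them.
    conn.executemany("INSERT INTO beach VALUES (?, ?)", rows["beach"])
    conn.executemany("INSERT INTO survey_details VALUES (?, ?, ?)", rows["survey_details"])
    conn.executemany("INSERT INTO litter_data_beach VALUES (?, ?, ?)", rows["litter_data_beach"])
    conn.executemany("INSERT INTO meteo_oceano VALUES (?, ?)", rows["meteo_oceano"])
    conn.commit()
    return conn


def litter_with_wind(conn):
    """Return one row per survey: beach name, date, total litter items, wind speed."""
    query = """
        SELECT b.name, s.survey_date, SUM(l.item_count), m.wind_speed_ms
        FROM survey_details AS s
        JOIN beach AS b             ON b.beach_id  = s.beach_id
        JOIN litter_data_beach AS l ON l.survey_id = s.survey_id
        JOIN meteo_oceano AS m      ON m.survey_id = s.survey_id
        GROUP BY s.survey_id, b.name, s.survey_date, m.wind_speed_ms
        ORDER BY s.survey_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_database(EXAMPLE_ROWS)
    for row in litter_with_wind(connection):
        print(row)
```

Keying every data type to a shared survey identifier is what lets a single join return litter counts and environmental conditions for the same place and date; the same pattern would extend to beach use and accommodation tables.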

Fig. 3

Entity-Relationship Diagram (ERD) for a relational database about beach litter, interaction between litter and beach wrack, accommodation and hotel information, meteo-oceanography, and beach use variables/attributes. Relationships are highlighted in dark blue lines connecting the tables/entities.

The data used to compile the dataset cover five aspects: beach litter sampling, litter and beach wrack interaction, beach use, tourism-focused accommodation, and meteo-oceanographic variables (Table 1). All data were collected on Itamaracá Island in northeast Brazil (Fig. 2). The sampling campaigns occurred during the spring tides of March, June, September, and December 2022.

Table 1 Overview of datasets and data sources.

For beach litter, the team walked along a 25-meter transect, collecting all visible litter found between the low tide level and the high tide range. All collected litter was separated, counted, and categorized based on UNEP guidelines (2009)22 and possible source23. Additional details on marine litter collection are described in ref. 24. The beach litter data used for the compiled dataset are already published at figshare25 https://doi.org/10.6084/m9.figshare.14128610.v2 (Table 1) and were partially used in a previous study24. However, only the tables litter_underwater and brand_audit were used previously. For the compiled dataset presented here, we used the table litter_sand and added the Clean Coast Index (CCI) analysis26. Additionally, a photo repository of the beach litter collected in December 2022 was created27 https://doi.org/10.6084/m9.figshare.28695560.v1.
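As an illustration of the CCI analysis mentioned above, the sketch below assumes the commonly cited form of the index (items per square metre multiplied by a coefficient K = 20) with its five cleanliness classes. The function names, the handling of values that fall exactly on a class boundary, and the choice of which items to count are assumptions; the original index was defined for plastic items, and the exact settings used for this dataset are not stated here.

```python
def clean_coast_index(item_count, area_m2, k=20.0):
    """Clean Coast Index: litter density (items per m^2) scaled by k.

    k = 20 is the coefficient commonly cited for the index (an assumption
    here); area_m2 is the sampled area, i.e. transect length times width.
    """
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    return item_count / area_m2 * k


def classify_cci(cci):
    """Map a CCI value to a cleanliness class.

    Upper bounds are treated as inclusive, which is an assumption of
    this sketch rather than part of the index definition.
    """
    if cci <= 2:
        return "very clean"
    if cci <= 5:
        return "clean"
    if cci <= 10:
        return "moderate"
    if cci <= 20:
        return "dirty"
    return "extremely dirty"


if __name__ == "__main__":
    # Invented example: 5 items found in a 25 m^2 sampled area.
    value = clean_coast_index(item_count=5, area_m2=25)
    print(value, classify_cci(value))
```

Because the index is a density, the sampled area has to be recorded alongside the item counts; the same functions could be applied to litter found in beach wrack if the wrack area is known.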

Beach use data were acquired by applying the BeachLog tool5 on the beaches of Itamaracá Island (Fig. 2) right after the beach litter sampling. The data are available in figshare28 https://doi.org/10.6084/m9.figshare.27246942.v2 (Table 1) and were partially used in a previous publication5. However, the BeachLog_itamaraca dataset contains two more months of data than the previous version, and it was used to build the compiled dataset in this study. We also applied the Coastal Scenery evaluation6 and added the results to the compiled dataset. The Coastal Scenery analysis6 was done for the same beaches (Fig. 2) and time frame as the beach litter collection and the BeachLog.

Tourism-focused accommodation data were retrieved from Google Maps and AirDNA (https://www.airdna.co/) for the four months of data collection. A total of 420 accommodation units were counted for the island; for the hotel list, we subset the hotels located on the three sampled beaches. Variables such as occupancy rate, average length of stay, and daily price were calculated for the four months of data sampling.

The data collection for the interaction between litter and beach wrack was done on the same date and location (Fig. 2), and the data was retrieved from a dataset in Figshare29 https://doi.org/10.6084/m9.figshare.28939709.v1 (Table 1). We applied the CCI analysis for the litter found in the wracks and added this to the compiled dataset.

Regarding meteo-oceanographic variables, tide data were sourced from the Brazilian Navy website (https://www.marinha.mil.br/chm/dados-do-segnav/dados-de-mare-mapa), and the sampling point was Recife Harbor (08° 03′.4 S; 034° 52′.1 W). Wave, wind, and precipitation data were obtained from the Global Forecast System (GFS). Example code for data retrieval is available at https://github.com/ramos-bruna/MarineLitter_database/blob/main/GFS_data_retrieval.py.

Data Records

All data are provided in a compiled dataset available at figshare30 https://doi.org/10.6084/m9.figshare.29128109.v1. The file BeachLitter_compiled_dataset_itamaraca.xlsx contains the sheets Beach, SurveyDetails, LitterData_beach, LitterData_wrack, Meteo_oceano, BeachLog, Accomodation, and Hotel. Additionally, the file data_dictionary_beach_litter_itamaraca.txt describes each sheet of the dataset, as well as its columns and their respective measurement units.

Technical Validation

To validate the representativeness of the beach litter sampling effort, a species accumulation curve was applied to the beach litter samples (Fig. 4). The curve illustrates the relationship between transect width and the number of litter categories recorded. This approach, commonly used in ecology to assess sampling effort31, shows that as the sampled area increases, the number of identified litter categories also rises. Each curve represents a different sampling effort, highlighting variations in accumulation patterns across beaches and transects. From a transect width of 20 m onward, most curves stabilize, with only one remaining unstable at 25 m. This finding supports the validity of our data, indicating that more than 90% of the litter types present on the beach were sampled.

Fig. 4

Species accumulation curve applied to beach litter sampled in Itamaracá Island. Each curve represents one sampling effort.
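The idea behind the accumulation curve can be sketched as a cumulative count of distinct litter categories as the transect width grows. This is a simplified, deterministic version, not the authors' analysis: it assumes each item was recorded with its position along the transect, and it omits the random reordering of samples that ecological accumulation curves often use. All names, the 5 m step, and the example records are hypothetical.

```python
def accumulation_curve(records, step_m=5, max_width_m=25):
    """Cumulative number of distinct litter categories per transect width.

    records: iterable of (position_m, category) pairs, where position_m is
    the assumed distance of the item along the transect.
    Returns a list of (width_m, n_categories) tuples.
    """
    records = list(records)  # allow generators, since we scan several times
    curve = []
    for width in range(step_m, max_width_m + 1, step_m):
        seen = {category for position, category in records if position <= width}
        curve.append((width, len(seen)))
    return curve


def has_stabilized(curve, from_width_m):
    """True if the category count no longer changes from from_width_m onward.

    Requires at least two points at or beyond from_width_m, so that a single
    final point is not reported as a plateau.
    """
    counts = [n for width, n in curve if width >= from_width_m]
    return len(counts) >= 2 and len(set(counts)) == 1


if __name__ == "__main__":
    # Invented records for one sampling effort: (position in metres, category).
    example = [(1, "plastic"), (3, "plastic"), (7, "metal"),
               (12, "glass"), (18, "paper"), (22, "plastic")]
    curve = accumulation_curve(example)
    print(curve)
    print("stable from 20 m:", has_stabilized(curve, 20))
```

A curve that flattens before the full transect width suggests that widening the transect would add few new categories, which is the reasoning used to judge the sampling effort as sufficient.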

The observed differences among curves suggest spatial variability in litter distribution, potentially influenced by factors such as coastal dynamics, human activities, and environmental conditions. These findings highlight the importance of integrating different types of data, such as beach use and meteo-oceanographic variables, to ensure a comprehensive assessment of beach litter composition.

Beach use was validated through expert opinion. Two volunteers with expertise in coastal management tools reviewed the data and agreed with the observations collected in the field, based on the BeachLog criteria and the Coastal Scenery checklist.