A new approach to historical migratory movements based on surnames: the case of Spain

Rodríguez-Díaz, Roberto; Manni, Franz; Blanco-Villegas, María José

doi:10.1057/s41599-024-04065-3


Article
Open access
Published: 14 November 2024


Roberto Rodríguez-Díaz¹,
Franz Manni² &
María José Blanco-Villegas¹

Humanities and Social Sciences Communications volume 11, Article number: 1541 (2024)


Abstract

A data mining technique called Self-Organizing Maps (SOM) was used to select the surnames that carry population information (monophyletic surnames) for the reconstruction of past demographic processes, in order to show the historical migrations that have generated the current structure of the Spanish population. The technique made it possible to identify groups of surnames with the same origin. Once the origin of each surname has been established, we can assume that each surname found outside its origin moved, at some point in the past, out of that original area, which allows historical movements to be studied. The observed movements reveal the existence of two main migratory arcs. Both migrations have moved along the coast: the first following the Mediterranean and the second following the Cantabrian Sea. It appears that these two arcs have provided the backbone of the Spanish population, dividing it into two halves that reach the limits of the areas of influence of these arcs. Several types of movement have taken place (isolation due to distance, short-distance movements, medium-distance movements and long-distance movements). Short- and medium-distance movements have been the most frequent and the most decisive for the current structure.


Introduction

The development of computing techniques has increased the availability of surname data from population datasets, allowing the analysis of large population groups at the country level (Rodríguez-Larralde et al., 1998; Barrai et al., 2000; Rodríguez-Larralde et al., 2000; Rodríguez-Larralde et al., 2003; Barrai et al., 2004; Dipierri et al., 2005; Manni et al., 2005; Dipierri et al., 2011; Rodríguez-Larralde et al., 2011; Longley et al., 2011; Cheshire et al., 2011; Carrieri et al., 2020) and at the continent level (Scapoli et al., 2007; Cheshire et al., 2011). Traditionally, voter records and telephone directories served as databases (Rodríguez-Larralde et al., 2003; Cheshire et al., 2011; Dipierri et al., 2016; Carrieri et al., 2020); more recently, population censuses have been used (Rodríguez-Díaz et al., 2015, 2017; Posch et al., 2024). These represent an excellent source of information for population studies and even for the socioeconomic implications of population structure (Posch et al., 2024). Even street names have been used as a source of information for sociological analysis (Creţan and Matthews, 2016). This type of study has mainly been carried out by anthropologists and geographers for different purposes, but always in the hope that the large sample size will minimize possible deviations in the use of surnames as an estimator (Barrai et al., 2000; Rodríguez-Larralde et al., 2003).

When analyzing populations, many of the conclusions are based on the assumption that surnames arise in a single location and can therefore be used as monophyletic markers (Rodríguez-Larralde et al., 2003; Manni et al., 2005). Since the first doubts were cast (Rogers, 1991), this assumption has remained the most controversial aspect of isonymy; nevertheless, the most recent studies seem to support the reliability of the method (Sykes and Irven, 2000; Gagnon and Heyer, 2001; Esparza et al., 2006; King et al., 2006; Boattini et al., 2007; Lisa et al., 2007; Mateos, 2007; King and Jobling, 2009a; King and Jobling, 2009b; Alvarez et al., 2010; Rodríguez-Díaz and Blanco-Villegas, 2010; Balanovskaia et al., 2011; Longley et al., 2011; Liu et al., 2012; Dipierri et al., 2016; Toledo et al., 2017; Carrieri et al., 2020; Kamel et al., 2023). Thus, isonymy, which currently has ample bibliographic support, turns out to be an excellent, fast, cheap and reliable alternative for the study of human populations, provided that the surnames used are properly selected: even where the capacity to do so exists, analyzing a complete population record (including the totality of the surnames) is not always desirable (Cheshire, 2014).

In this sense, Manni et al. (2005) proposed the application of a data mining technique (Self Organized Maps) to large biodemographic databases. The aim was to be able to analyze these databases without the need to resort to genealogical records, in order to unravel historical population processes because, even today, the scarcity of data and information are major challenges facing research on historical population migrations (Fan et al., 2023). The technique allowed the identification of groups of surnames with the same origin; in other words, it became possible to distinguish the autochthonous names of each zone, also enabling discrimination between monophyletic and polyphyletic surnames. In this way, it was possible to use monophyletic surnames as reliable markers (Manni et al., 2005; Boattini et al., 2010; Rodríguez-Díaz and Blanco-Villegas, 2010; Boattini et al., 2012; Rodríguez-Díaz et al., 2015, 2017; Kamel et al., 2023). The work carried out to date endorses the reliability of the method. Nevertheless, the databases employed had limits that reduced the reach of the conclusions drawn. In some cases, the size of the database was limited owing to the small size of the geographic area addressed (Boattini et al., 2010; Rodríguez-Díaz and Blanco-Villegas, 2010) while in others the surnames were only recently established there (Manni et al., 2005). Validation of the methodology was established only a short time ago using the Italian surnames dataset (Boattini et al., 2012). In that work, the validity of the method was checked by comparing the origin identified for each surname with pre-existing databases, with excellent results (Boattini et al., 2012). 
However, so far, we are not aware of any large-scale attempt to apply the results obtained from individual surname studies to the demographic analysis of broad regional geographies. Such an application would allow polyphyletic surnames to be excluded from the analyses, which, as Cheshire (2014) indicates, could be of great value. Therefore, to complete the methodology, it only remains to apply it to a specific population in order to demonstrate its potential for revealing the historical population movements that have taken place within that population and that have ultimately determined its current structure.

The population chosen here for such purposes was the Spanish one, which is ideal for this type of study. For centuries, the influence of Spain on other countries has been very important, both in Europe and in South America (Mateos and Tucker, 2008). This influence is mainly reflected in the widespread presence of the Spanish surname system in Latin America (Rodríguez-Larralde et al., 2000; Dipierri et al., 2005, 2011; Carrieri et al., 2020), although not only in those countries (Scapoli et al., 2007; Cheshire et al., 2011). This widespread presence of Spanish surnames, together with an inherited surname system that has been assimilated into the naming systems of native populations, makes the methodological and population conclusions derived from this study broadly applicable to other populations.

In addition, the Spanish population has been subject to certain very special conditions. Geographically, the country is located at the southern end of Europe and is isolated from it by the Pyrenees. So much so that until well into the twentieth century (1986), external migration was a constant in the history of Spain (Valero-Matas et al., 2014). This isolation has been further exacerbated by prominent orographic contrasts (Bycroft et al., 2019), Spain having a particularly complicated geophysical relief compared with other European countries. This high diversity is also seen in the linguistic field: within the Spanish population there are also different official regional languages and their variants (García, 2007; Goebl, 2010). All these conditions have placed the Spanish population under constant pressure from a variety of factors that have culminated in a particularly conserved structure (Adams et al., 2008; Rodríguez-Díaz et al., 2017). The current genetic diversity is the result of events and linguistic structures deeply rooted in the historical past of the Iberian Peninsula (Bycroft et al., 2019).

Both characteristics of the Spanish population—an inherited surname system widely represented around the world and a particularly well-preserved population structure—make the Spanish population a subject of special interest for a study such as ours.

For all these reasons, and given that the literature on internal migration in Spain is not very conclusive, other authors (Maza et al., 2019) have used the Spanish case as a kind of experimental laboratory for analyzing internal migration in very recent times. This supports the decision to apply this novel methodology to the analysis of historical migrations.

We aim to address the question of whether the methodology proposed by Manni (Manni et al., 2005) for identifying the origin of surnames can be useful for analyzing large-scale internal movements within populations and gaining an in-depth understanding of their history and population structure. If satisfactory results are obtained through the application of this methodology to the Spanish population, it would validate a new approach for studying migratory movements in populations where genealogical information is nonexistent, inaccessible, or unmanageable. Additionally, it would establish a study protocol applicable to a surname system widely distributed worldwide. Both of these factors would make the potential results broadly generalizable.

Materials and methods

Study area

Spain is located on the Iberian Peninsula at the extreme south-west end of Europe. It has a surface area of 504,645 km² and is surrounded by the sea to the north, south and east. It borders Portugal in the west and France in the north.

Geophysically, Spain is located at a considerable altitude above sea level, with a mean of 660 m. The territory can be considered mountainous in comparison with other European countries.

The population of Spain is currently 47 million people, distributed unevenly across the territory and mainly on the coastal areas, leaving the interior with a low population density (with the exception of Madrid, the administrative capital).

From the administrative point of view Spain is organized in 15 Regional Autonomies and 47 Provinces. The official language is Spanish but other co-official languages are also spoken in some areas (Catalonia, Galicia and the Basque country).

In populations with such cultural and demographic complexity (Fig. 1), working with surnames offers a clear advantage. Surnames are passed down from one generation to the next within families through a process of inheritance, allowing them to be used as markers for population and lineage. At the same time, however, a surname is ultimately just a word. This means that in diverse populations like Spain, where cultural diversity is well-preserved in terms of language, surnames can directly reflect this linguistic and cultural diversity.

Databases

In the Spanish system of surname transmission, individuals inherit two surnames (Mateos and Tucker, 2008). Everybody inherits the father’s first surname (which becomes their first surname) and the first surname of the mother (which is the individual’s second surname; e.g., Nicolás Fernández García, where the father’s surname is Fernández and the mother’s is García). Since both surnames of each individual were available, the database was constructed using both the first and second surnames, a choice that according to some authors (Pettener et al., 1998; Colantonio et al., 2003; Dipierri et al., 2011; Rodríguez-Larralde et al., 2011; Barrai et al., 2012; Carrieri et al., 2020) doubles the amount of information and contributes to the robustness of the analysis, given that the differences observed on various occasions (as in the present study) between the distributions of the first and second surnames have always been negligible. This is entirely expected if we consider that the surname inherited through the maternal line is actually the surname inherited through the paternal line of the previous generation.
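The transmission rule can be written out explicitly. The following Python sketch is purely illustrative: the `Person` type, the `child_of` helper and every name other than Nicolás Fernández García are hypothetical and do not come from the study.

```python
from typing import NamedTuple


class Person(NamedTuple):
    given_name: str
    first_surname: str   # inherited from the father
    second_surname: str  # inherited from the mother


def child_of(given_name: str, father: Person, mother: Person) -> Person:
    """Apply the rule described in the text: the child takes the first
    surname of the father followed by the first surname of the mother."""
    return Person(given_name, father.first_surname, mother.first_surname)


# Hypothetical parents; only the child's full name appears in the text.
father = Person("Luis", "Fernández", "Ruiz")
mother = Person("Ana", "García", "López")
child = child_of("Nicolás", father, mother)
full_name = " ".join(child)
```

In this sketch the child's second surname is the mother's first surname, which she in turn received from her own father. This is why the second surname behaves as a paternal-line marker shifted back by one generation.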

The data on surnames were provided by the Spanish National Statistics Institute (INE) and came from the 2008 census. The database included all the surnames of each municipality as long as they appeared a minimum of five times. The initial database included 56,976,706 entries, corresponding to 87,148 different surnames. The INE database contains a vast amount of information corresponding to the entire Spanish population in 2008, but it has certain limitations. Firstly, it does not include all Spanish surnames; those that are not repeated at least five times in a single municipality are excluded, even though they represent a small portion of the information. Secondly, the database provides a snapshot of the population at a specific point in time, meaning it includes the entire population at that moment without providing information about the bearers of each surname. This characteristic presents certain problems; for instance, we have no way of knowing if all the bearers of a surname are adults or if they represent a single individual and their descendants, indicating a single lineage.

Data correction and treatment

The initial database was revised meticulously. We found repeated surnames, different spellings of the same surname, spaces between words, spelling errors and compound forms. All these faults pose a serious problem for statistical processing. To avoid such drawbacks, all surnames were checked against bibliographic (Faure et al., 2001; Solís, 2002) and cartographic sources and corrected as many times as necessary.

In the next step, we removed all the surnames that did not appear a minimum of 20 times in the database in order to avoid excessive noise in the statistical procedures (Manni et al., 2005; Boattini et al., 2012). In all, once the data had been treated there remained 51,419,788 entries (33,753 different surnames). By statistical noise we refer, for example, to the potential influence of surnames that have appeared in Spain through recent immigration. Two characteristics allow such surnames to be identified. On the one hand, surnames historically established in Spain often show Castilianized spellings, which differ from the spellings used in other countries; the latter are more characteristic of recent immigration. On the other hand, the geographical distribution of historically established surnames tends to follow clear dispersion patterns, whereas surnames from more recent immigration appear in scattered populations and at low frequencies. In this way, we ensure that we are working exclusively with surnames that are historically established and representative of the Spanish population.
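The frequency threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run on the INE data: the record layout `(surname, municipality, count)`, the function name `filter_rare_surnames` and the toy counts are all assumptions.

```python
from collections import Counter

MIN_TOTAL = 20  # national threshold stated in the text


def filter_rare_surnames(records, min_total=MIN_TOTAL):
    """Keep only records whose surname reaches `min_total` occurrences
    when summed over all municipalities.

    `records` is a list of (surname, municipality, count) tuples;
    this layout is hypothetical.
    """
    totals = Counter()
    for surname, _municipality, count in records:
        totals[surname] += count
    return [rec for rec in records if totals[rec[0]] >= min_total]


# Toy records (made up for illustration).
records = [
    ("GARCIA", "Madrid", 15),
    ("GARCIA", "Sevilla", 9),
    ("ZUBELDIA", "Tolosa", 12),
    ("ZUBELDIA", "Donostia", 7),
]
kept = filter_rare_surnames(records)
```

In the toy input, the first surname totals 24 occurrences and is kept, whereas the second totals 19 and is dropped.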

Data processing

After data treatment, a double-entry matrix was created in which the rows (i) corresponded to the surnames and the columns (j) to the provinces. Accordingly, each cell (ij) contained the frequency of surname i in the total population of province j.

We then performed a transformation of the frequencies in two steps (Boattini et al., 2012; Rodríguez-Díaz et al., 2015, 2017):

In the first step, we attempted to prevent the smallest populations from having excessive weight. To accomplish this, we used the expression:

$${f}_{i}=\frac{{{fabs}}_{{ij}}}{\log ({{pop}}_{j})}$$

where ${{fabs}}_{{ij}}$ is the absolute frequency of surname “i” in province “j”, and ${{pop}}_{j}$ is the total population of province “j”.

In the second step, we tried to avoid surname grouping as a function of how numerous they were, using the expression:

$${{wf}}_{i}=\frac{{f}_{i}}{\Sigma {f}_{i}}$$

where ${f}_{i}$ is the result of the previous expression.
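The two-step transformation can be illustrated with NumPy. This is a sketch under stated assumptions rather than the authors' code: the text does not specify the base of the logarithm (the natural logarithm is assumed here), and the sum in the second expression is assumed to run over the provinces for each surname, so that every surname profile sums to one. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def transform_frequencies(fabs, pop):
    """Two-step weighting of a surname-by-province count matrix.

    fabs : (n_surnames, n_provinces) absolute frequencies
    pop  : (n_provinces,) total population of each province
    """
    fabs = np.asarray(fabs, dtype=float)
    pop = np.asarray(pop, dtype=float)
    # Step 1: divide each column by the log of the province population
    # (natural log assumed).
    f = fabs / np.log(pop)
    # Step 2: divide each row by its own sum (assumed normalisation axis),
    # so that common and rare surnames become comparable profiles.
    return f / f.sum(axis=1, keepdims=True)


# Toy data: a very common surname and a rare one with a similar profile.
fabs = [[900, 50, 50], [9, 1, 0]]
pop = [1_000_000, 200_000, 50_000]
wf = transform_frequencies(fabs, pop)
```

After the second step, each row of `wf` describes only the shape of the geographic distribution, which is the information the grouping procedure needs.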

Grouping of surnames

The surnames were grouped as a function of their geographic distribution using a Cluster-type data mining procedure, the self-organizing maps of Kohonen, or SOM (Kohonen, 1982; 1984).

SOM are unsupervised neural learning networks that allow the statistical recognition of patterns. Here they were used to recognize the patterns in the geographic distribution of the surnames. This procedure allows the surnames to be grouped as a function of their distribution and permits their origins to be identified. Its application in the field of biodemography was developed by Manni et al. (2005), and the methodology has since been tested (Boattini et al., 2012) and found to afford good results (Boattini et al., 2010; Rodríguez-Díaz and Blanco-Villegas, 2010; Rodríguez-Díaz et al., 2015, 2017).

In our case, the software used was the “kohonen” package for R (Wehrens and Buydens, 2007). With this software we classified the surnames in a rectangular matrix whose size had to be decided. The size should be neither so large that it prevents interpretation nor so small that the results are not representative. After testing different sizes, the criterion adopted here was to use the smallest matrix in which an empty cell appears (Boattini et al., 2011). Empty cells begin to appear once all the groups are already represented, so this is the smallest size for which the grouping is representative; in the present case it was 17 cells wide (17 × 17 = 289 cells), with 1000 repetitions.

In sum, the SOM consisted of an input layer of 33,753 vectors (one vector per surname) and a layer of 289 cells (groups of surnames with a similar geographic distribution).
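To make the grouping step concrete, the following is a from-scratch NumPy sketch of a small self-organizing map. It is not the “Kohonen” R package used in the study and does not reproduce its defaults: the learning-rate schedule, the Gaussian neighbourhood, the function names and the random toy data are all assumptions, and `n_iter` here counts single-sample updates, which is not necessarily what the 1000 repetitions in the text refer to.

```python
import numpy as np


def train_som(data, side=17, n_iter=1000, seed=0):
    """Train a square SOM and return its codebook (side*side, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Initialise the codebook with randomly chosen input vectors.
    codebook = data[rng.integers(0, n_samples, side * side)].astype(float)
    # Grid coordinates of every cell, used for neighbourhood distances.
    grid = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.05 * (1 - frac) + 0.01 * frac          # decaying learning rate
        radius = max(side / 2 * (1 - frac), 1.0)      # shrinking neighbourhood
        x = data[rng.integers(0, n_samples)]          # one random surname profile
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching cell
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2 * radius ** 2))      # Gaussian neighbourhood weight
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def assign_cells(data, codebook):
    """Return, for each input vector, the index of its closest cell."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Toy input: 200 random "surname profiles" over 5 provinces, on a 4 x 4 map.
toy = np.random.default_rng(1).dirichlet(np.ones(5), size=200)
codebook = train_som(toy, side=4, n_iter=1000)
cells = assign_cells(toy, codebook)
```

With `side=17` the map has the 289 cells described in the text. Cells that receive no surname correspond to the empty cells used as the size criterion.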

Origin of surnames

Finally, each group of surnames was represented graphically by gradient maps using the ArcGIS 10.0 software, which allows their geographic distribution to be observed.

On looking at these maps, it is possible to identify the origin of each population group as a function of the surname in question. The method is based on the assumption that the closer we approach its origin, the more numerous a population group will be (Manni et al., 2005; Longley et al., 2011; Boattini et al., 2012; Cheshire, 2014). Thus, observing the gradient map of the distribution of each surname (Fig. 2) it may be assumed that the population group bearing that surname will have its origin at the place where the occurrences of that surname are most frequent (Fig. 1). When identifying the origin of each surname, we must consider that we are working with geographic information. This means we only obtain information about the geographic origin of each surname, not the historical moment when it originated or when the movements occurred. The authors of the method (Manni et al., 2005) emphasize this point and question whether the dispersion pattern itself could be used to infer the historical depth of the origin and dispersion of each surname. This could be of particular interest in future developments of the method.
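The assumption that a group originates where it is most frequent can be expressed as a simple maximum over provinces. In the study the origins were read from gradient maps; the sketch below replaces that visual step with an `argmax` over the mean profile of each group, which is a simplification. The function name, the cell labels and the toy profiles are hypothetical.

```python
import numpy as np


def group_origins(wf, cells, provinces):
    """Assign each SOM cell the province where its mean profile peaks.

    wf        : (n_surnames, n_provinces) weighted frequencies
    cells     : (n_surnames,) cell index of each surname
    provinces : list of province names, in column order
    """
    origins = {}
    for cell in np.unique(cells):
        profile = wf[cells == cell].mean(axis=0)  # average profile of the group
        origins[int(cell)] = provinces[int(profile.argmax())]
    return origins


# Toy example: two surnames grouped in cell 0, one surname in cell 5.
wf = np.array([[0.8, 0.1, 0.1],
               [0.7, 0.2, 0.1],
               [0.1, 0.1, 0.8]])
cells = np.array([0, 0, 5])
provinces = ["Asturias", "León", "Murcia"]
origins = group_origins(wf, cells, provinces)
```

A plain maximum cannot tell a clear peak from a flat or ambiguous profile, so polyphyletic and ambiguous groups would still need to be screened out separately.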

Migration matrices

Once the origin of each surname had been established, the second step was to assume that each surname found outside its origin would, at some time in the past, have moved out of that area (Boattini et al., 2012). Starting out from this, we constructed a migration matrix (Bodmer and Cavalli-Sforza, 1968), with as many rows (“i”) and columns (“j”) as provinces included in the study (47 × 47). Accordingly, each cell (“ij”) reflected the number of times that a surname with an origin in population “i” appeared in population “j”.
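The construction of the migration matrix can be sketched as an accumulation of surname counts by province of origin. The following NumPy example is illustrative only; the function name, the three-province layout and the counts are assumptions.

```python
import numpy as np


def migration_matrix(fabs, origin_idx):
    """Build an origin-by-destination matrix.

    fabs       : (n_surnames, n_provinces) occurrences of each surname
    origin_idx : (n_surnames,) column index of each surname's origin province
    Cell (i, j) counts occurrences in province j of surnames originating in i.
    """
    fabs = np.asarray(fabs, dtype=float)
    origin_idx = np.asarray(origin_idx, dtype=int)
    n_prov = fabs.shape[1]
    m = np.zeros((n_prov, n_prov))
    # Add each surname's row of counts to the row of its origin province;
    # np.add.at accumulates correctly when origins repeat.
    np.add.at(m, origin_idx, fabs)
    return m


# Toy data: two surnames from province 0, one from province 2.
fabs = [[80, 15, 5],
        [40, 10, 0],
        [2, 3, 60]]
origin_idx = [0, 0, 2]
m = migration_matrix(fabs, origin_idx)
```

The diagonal of `m` holds the occurrences that remain in the province of origin, and the off-diagonal cells hold the inferred movements.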

This methodology allows the study of the historical population movements that have taken place within a population and what has been the contribution of each of the subpopulations to the structure of the entire population (Boattini et al., 2012).

Historical censuses

The results obtained by isonymy were compared with the historical population. In particular, we performed regressions between these results and the historical censuses of the National Institute of Statistics (www.ine.es). To go even further back in time, we consulted the 1787 Floridablanca census at the digital Library of the Royal Academy of History (www.biobliotecadigital.rah.es).

Results

SOM

To analyze the internal movements of the Spanish population, a first and crucial aspect is to identify the origin of each surname and its pattern of dispersion.

These origins were identified by organizing the surnames as a function of their distribution pattern using neural networks. In this way, it was not necessary to identify the origin of each surname individually and study its movements; instead, we were able to study the origin of each group. The Spanish surnames (Fig. 2, Table 1) were organized in 289 groups, of which 27 were identified as groups of polyphyletic surnames, 4 remained blank, and the origin of the remaining 258 groups was identified. In other words, thanks to the use of SOM we were able to determine the origin of 31,752 of the 33,753 surnames (29,289,329 entries). Each of the provinces was seen to have at least one group of surnames with their origin in it (all the provinces were represented in the sample of surnames of known origin).

Table 1 Summary table of the SOM. The table shows the number of surnames and the number of data grouped in each cell, together with the origin of each grouping of surnames.


In comparative terms (Table 2), if the size of the population is taken into account Spain has relatively few surnames, which means that a few polyphyletic surnames will represent a greater part of the population.

Table 2 Comparative table showing the main results obtained in Spain (present work), the Netherlands (Manni et al., 2005) and Italy (Boattini et al., 2012).


Characterization of movement

With the origin of each surname identified, we were able to build migration matrices. The analysis of these allowed the migratory processes of each province to be characterized. Thus, (Fig. 3) we determined that the western and southern zones of Spain are the ones out of which proportionally more people have emigrated over time. By contrast, in the north and east of the Peninsula expansion away from the origin has been much less pronounced.

Additionally, it would appear that four areas stand out as favored destinations of population movements. These are Vizcaya, in the north, and Madrid, in the center (two of the main economic centers of the country); Valencia-Alicante, in the east; and Seville-Malaga, in the south. These would be the receivers of population movements. It therefore seems possible to regard the southern and western zones as a source of population and the north-east as the sink.

Migration distances

With knowledge of the general characteristics of the internal movements of the Spanish population, we analyzed the distance covered by means of PCA. In this, we analyzed all the internal movements on the basis of the distance separating the origin of the surname from the destination. The first aspect revealed by this analysis is that the isolation model is not homogeneous for the whole of the Spanish population.

Some trends have deformed this (Fig. 4); in particular, four different trends. The first corresponds perfectly to what would be expected from a model of isolation due to distance, and includes populations located in the north-west, north, and north-east.

**Fig. 4: Principal component analysis.**

In the second group, which is mainly localized around the center of the Peninsula, the model of isolation due to distance is deformed by the high frequency of short-distance movements (1–200 km).

In the third group, formed by populations in the south and west of the Peninsula, it is the medium-distance movements (200–600 km) that deform the model of isolation due to distance.

Finally, the fourth group, corresponding to the periphery of the Peninsula, is characterized by long-distance movements (more than 600 km).
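The distance bands used above can be summarized as a small classification function. This is a sketch: the text gives the bands as 1–200, 200–600 and more than 600 km without saying to which band the boundary values belong, so assigning exactly 200 km and 600 km to the lower band is an assumption, as are the function name and the labels.

```python
def distance_band(km: float) -> str:
    """Classify a movement by the origin-to-destination distance in km."""
    if km <= 0:
        return "stayed"   # surname found in its province of origin
    if km <= 200:
        return "short"    # short-distance movement
    if km <= 600:
        return "medium"   # medium-distance movement
    return "long"         # long-distance movement


bands = [distance_band(d) for d in (0, 150, 450, 750)]
```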

Direction and sense of the movements

With knowledge of the different types of movement (Fig. 4) as a function of distance, we used PCA (Fig. 5) to analyze them separately. In this, we separated two populations (Fig. 5, circles) as a function of the destinations (Fig. 5, triangles) towards which the population has migrated.

**Fig. 5: Direction and sense of the migrational movements.**

First (Fig. 5A), we studied the short-distance migratory movements (less than 200 km; these represent 18.67% of all movements). Three destinations were apparently the most characteristic of these movements (Fig. 1 and Fig. 5A; 7, 23 and 28), all of them located in the north-western half of the Peninsula.

Then (Fig. 5B), we analyzed the medium-distance movements (200–600 km), representing 23.65% of the total. Here there seemed to be 7 important destinations, although they can be grouped in two: 4 corresponding to destinations in the south and east of the Peninsula and 3 centered in the middle and north. Each group of destinations mainly received populations from the half of the Peninsula in which it was located (the south-eastern centers received population from that part of Spain; those in the north-west received immigrants from the same area).

Finally (Fig. 5C), we explored long-distance movements (more than 600 km, representing 13.6% of the total). Here it is important to note that the provinces in the center of the country have few destinations at distances of more than 600 km. Accordingly, the migratory movements mainly occurred at the periphery of the country. Two groups of destinations stand out: those situated in the north-east of Spain and those located in the south-west.

The remaining 44% correspond to surnames that have remained in the province of origin.
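As a check on the arithmetic, the three movement shares reported above add up as follows, which leaves the stated remainder after rounding:

$$18.67\%+23.65\%+13.6\%=55.92\%,\qquad 100\%-55.92\%=44.08\%\approx 44\%$$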

In the case of long-distance movements, there has been a transfer of the population between the north-west of Spain and the south-east part.

In the three PCAs performed, one group of destinations located in the opposite direction to the other destinations and to all the origins emerged (Fig. 5A). In the short- and medium-distance movements this group is formed precisely by the populations surrounding the three most important destinations in the category of movements. By contrast, in long-distance movements it was the populations located in the west. In all three cases, this group of populations represents the populations that have received the least migratory movements.

Receiving centers

Attending to the movements received (Fig. 5D), the receiving centers can be classified in three categories:

Centers of national importance. These have received at least two different types of migratory movements. This means that the reach of their “gravitational field” is distributed across the whole country. Madrid is the only one of these destinations that loses importance in long-distance movements. This is reasonable if it is considered that it is located in the center of the Peninsula.
Regional centers. These are destinations whose reach is regional and whose importance is seen only at short and medium distance. Two of these centers are located in the north-west and the migrant population received by them is mainly from that half of the Spanish territory. The third is on the south-east coast, and the movements received are precisely from that region.
Long-distance centers. Located on the SE coast, these have only received long-distance migrations, of less relevance than the previous ones.

Main migratory movements

The main movements within the Spanish population are represented; specifically the two major migratory movements in each province (Fig. 6) and the two major immigrant movements (Fig. 7).

The analysis (Fig. 6) detects the same relevant destinations as the analysis of movements by distance (Fig. 5), and these destinations have the same fields of attraction. Likewise (Fig. 7), two main emitting sources can be seen: one is the north-west zone of Spain and the other is the south-east. It is also seen that the movements originating in each of these foci remain in their own half of the country.

These representations also provide information about major “streams” (Fig. 6). The north-western half of the country mainly moved towards the north or the center of the Peninsula, and the south-east moved mainly around the coast and, to a lesser extent, towards the center. Moreover, it seems that these movements followed what might be termed “population corridors”. The two main ones are coastal, following the Cantabrian coast in the north and the Mediterranean coast in the south-east (Figs. 6 and 7). Although less evident and probably less relevant, another corridor can be seen in the west of the country.

Autochthony

The movements of populations alter their composition and hence one of the most interesting parameters to analyze is the autochthony of the populations, or the proportion of surnames present in a given population that have their origin in it (Fig. 8), and its relationship with the movements of the population.
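Given an origin-by-destination migration matrix, autochthony as defined above can be sketched as the diagonal divided by the column totals. This is an illustration under assumptions: the denominator here contains only surnames of known origin, which may differ from the denominator used for Fig. 8, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def autochthony(m):
    """Share of each province's surname occurrences that originate in it.

    m : (n_provinces, n_provinces) matrix where cell (i, j) counts
        occurrences in province j of surnames originating in province i.
    """
    m = np.asarray(m, dtype=float)
    present = m.sum(axis=0)  # all known-origin occurrences found in each province
    local = np.diag(m)       # occurrences that originate in the same province
    # Provinces with no occurrences get 0 instead of a division error.
    return np.divide(local, present, out=np.zeros_like(present), where=present > 0)


# Toy matrix (made up): rows are origins, columns are destinations.
m = [[120, 25, 5],
     [0, 0, 0],
     [2, 3, 60]]
share = autochthony(m)
```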

In Spain, there are two zones in which the proportion of autochthonous surnames is especially high. Both zones are located on the coast. The most autochthonous zone in Spain is the Cantabrian coast in the north (where the proportion of autochthonous surnames in some cases surpasses 60%). The second one is on the Mediterranean coast, in the south-east.

Above, three corridors were identified in which a large part of the movements can be seen. The two most autochthonous zones of the country correspond precisely to the two coastal corridors. By contrast, the west corridor crosses a much less autochthonous zone.

The rest of Spain shows values ranging from 20 to 40%, with the single exception of Madrid and its surroundings in the center, and Barcelona in the north-east, both showing values that do not surpass 20%.

Historical background

Once we had obtained the data describing the structure of the Spanish population, their relations and internal movements, we were interested in addressing the issue of what kind of historical meaning could be extrapolated from these results. To accomplish this (Fig. 9) we compared the autochthonous population of each province with the historical size of these populations.

**Fig. 9: Left: Plot of the variation of the population of each province around the population mean (1.0) for the whole period (1787–2000).**

On one hand, the size of the provincial populations increased slightly but steadily up to 1950, after which it underwent sharp changes. On the other hand, in the correlations between the autochthonous population and the historical population size two observations are important. The first is that the significance rises with the antiquity of the census and the second is that it was precisely from 1950 that this correlation ceased being significant.

Discussion

SOM

The first part of the study’s objective, testing the methodology for analyzing population movements, requires its use to clarify the origin of each surname. The methodological basis of SOMs is simple: SOMs group surnames according to their geographic distribution in such a way that this can be studied in groups. Each surname group becomes more and more frequent the closer it gets to its origin (Cheshire and Longley, 2012). Accordingly, it is possible to distinguish three basic types of surnames (Manni et al., 2005).

a. Surnames whose distribution extends throughout the area studied without obeying any apparent pattern. These are considered polyphyletic surnames and they tend to identify many individuals.
b. Surnames whose distribution shows an ambiguous pattern that does not allow a clear origin to be established. These are ambiguous surnames that, for the rest of the procedure, cannot be considered monophyletic.
c. Surnames whose distribution follows a clear pattern and whose origin can be established. These are monophyletic surnames, and contain valuable information for the population study.

The first two types of surname do not contain information relating to the origin of their bearers and hence could not be used in the study. Only the third type of surnames (monophyletic), for which we have been able to establish a unique geographic origin, can be used to analyze population movements.

Regarding certain population aspects, it was interesting to compare the raw findings for the three populations in which this methodology has been used (Table 1): Spain (present study), The Netherlands (Manni et al., 2005) and Italy (Boattini et al., 2012). First, in comparative terms the low diversity of surnames in the Spanish population is noteworthy (Spain: 0.656 surnames/1000 inhabitants; The Netherlands: 6.046/1000; Italy: 6.406/1000). This lower diversity in Spain has been reported in previous works (Rodríguez-Larralde et al., 2003; Scapoli et al., 2007; Adams et al., 2008; Cheshire et al., 2011; Rodríguez-Díaz et al., 2015; 2017) and would be expected in view of its isolated geographical situation (i.e., it is a peninsula at the extreme south-western end of the continent, separated from it by the Pyrenees) and the orographic features that have led it to become an amalgam of isolated parts. At this point, it seems pertinent to recall the socioeconomic repercussions that have been observed in relation to population diversity, and more specifically, those associated with low surname diversity (Posch et al., 2024).

Continuing with the comparison, the proportion of polyphyletic surnames in Spain (2.62%) is similar to that of The Netherlands, where 1.46% of the surnames are polyphyletic, and differs from that of Italy, where 21.05% are polyphyletic. The fact that there are so few polyphyletic surnames in Spain points to the notion of a highly settled and regionalized population, or at least one with a well-marked structure (Adams et al., 2008).

However, these few polyphyletic surnames represent a huge proportion of the Spanish population (63.12%). By contrast in The Netherlands a similar percentage of surnames represents a considerably smaller portion of the population (24.29%) and in Italy many more surnames represent a percentage of the population similar to the Spanish case. The comparison shows that the Spanish population is less diverse than the Dutch one.

Although it is true that this phenomenon could be due to the difference in the date of origin of the surnames, which in the cases of Italy and Spain is very old (13th century in Spain and 14th century in Italy), whereas in the case of The Netherlands it is much younger (19th century), it is also true that, on this first impression, the Spanish population shows a low diversity. This would be a direct consequence of several factors deriving both from its historical outflow of population to other countries (Encarnación, 2004) and from its great isolation and regionalization.

After identifying the origin of each surname, we can assume that individuals carrying a surname found outside this origin descend from a lineage that left the original population at some point in history. Based on this assumption, we can observe the historical internal movements of the Spanish population, the distances traveled, the direction of the flows, and how these have contributed to the current structure.
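The assumption above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the record layout and the toy figures are all hypothetical, and autochthony is taken here to mean the share of a province's bearers of monophyletic surnames whose surname originated in that same province, which is an assumption about the measure.

```python
from collections import defaultdict

def migration_matrix(origin_of, bearers):
    """Count bearers by (origin province, province of residence).

    origin_of: dict mapping a monophyletic surname to its inferred origin.
    bearers:   iterable of (surname, province_of_residence, count) records.
    Surnames missing from origin_of (polyphyletic or ambiguous) are skipped.
    """
    flows = defaultdict(int)
    for surname, residence, count in bearers:
        origin = origin_of.get(surname)
        if origin is None:
            continue  # no established origin, so no movement can be inferred
        flows[(origin, residence)] += count
    return dict(flows)

def autochthony(flows, province):
    """Share of a province's counted residents whose surname originated there."""
    residents = sum(n for (_, res), n in flows.items() if res == province)
    if residents == 0:
        return 0.0
    return flows.get((province, province), 0) / residents

# Hypothetical toy data (not taken from the study).
origin_of = {"Surname_A": "Salamanca", "Surname_B": "Madrid"}
bearers = [
    ("Surname_A", "Salamanca", 80),
    ("Surname_A", "Madrid", 20),    # lineage found outside its origin
    ("Surname_B", "Madrid", 30),
    ("Surname_C", "Madrid", 500),   # origin not established: ignored
]
flows = migration_matrix(origin_of, bearers)
madrid_share = autochthony(flows, "Madrid")
```

Each off-diagonal cell of `flows` counts bearers living outside the origin of their surname, which is the quantity the movement analysis relies on.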

Migratory movements

An initial look shows that migratory movements are not a homogeneous phenomenon in the Spanish population (Fig. 3). There are two large zones showing very different kinds of behavior: a zone in which the majority of the original population (more than 60%) has left and has relocated across the whole of the western and southern zones, and the other, in which there is less of the original population (less than 50%) than the outsider one. This is located in the center and north-east.

This geographic distribution of migration shows that population movements have by no means been homogeneous and have been geographically asymmetrical (Santiago-Caballero, 2021). The genetic differences between populations are not random but are influenced by the physical characteristics and natural barriers of the terrain, such as mountains and rivers, which have historically limited the movement of people (Bycroft et al., 2019). It seems a priori that part of Spain has emitted population (Emitter) that could have colonized other parts of the country (Receiver). This movement has generally been from more rural populations towards more industrial and affluent areas (Bover and Velilla, 2019). The availability of economic resources often appears to be one of the most important reasons behind migratory movements (O’Brien et al., 2022). The nature of these movements would therefore have governed the structure of the Spanish population and merits a detailed analysis.

Distance, sense and direction of migratory movements

In general, it is considered that populations obey a model of isolation due to distance (Malecot, 1955). In fact, along general lines it is known that this is what has happened in the case of Spain (Rodríguez-Larralde et al., 2003; Rodríguez-Díaz et al., 2017). Nevertheless, in Spain, as is the case of Italy (Boattini et al., 2012), the movements are far from being homogeneous; neither are they reduced homogeneously as distance increases and neither do they obey a single model for the whole of the geography of the country (Fig. 4). Indeed, quite the opposite: the movements can be classified in four groups that can be analyzed individually to see how much and at what level they have contributed to the formation of the structure of the Spanish population.

Isolation due to distance. It is remarkable (Fig. 4) that the populations that best fit the model of isolation due to distance coincide with those in which a language other than Spanish is spoken (Fig. 1). It would appear that although languages have not played a relevant role in the global structure of the Spanish population (Rodríguez-Díaz et al., 2017), they could have played a secondary role at a lower geographical level in the same way as has been seen for other populations studied (Manni et al., 2004; Boattini et al., 2011).
Short-distance movements. These account for a low share of the movements (18.87%). This, together with the fact that they are somewhat uninfluential movements, means that they have been of less importance in the structure of the population. They are better represented in populations located around important centers (Fig. 5, A and D) in the northwest of the country. The attraction of these centers is so important that it has altered isolation due to distance.
Medium-distance movements. These are the most important (they represent 23.65% of the movements). Most Spanish provinces are separated by this distance range, such that this group is the most representative of interpopulation relations. Spain is seen to be divided into two halves: the northwestern half and the southeastern one are not related to each other, and movements occur from the provinces to centers located in the same half as these.
Long-distance movements. These are the least representative movements (13.36%) and have occurred at the periphery because the peripheral populations are the only ones separated by such large distances.
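As an illustration of how shares such as those above can be obtained from a migration matrix, the following sketch assigns each movement to a distance band and reports the fraction of movers per band. The band limits (150 km and 400 km), the function name and the toy data are hypothetical; this section does not state the distance thresholds used in the study.

```python
def share_by_distance_band(flows, distance_km, bands):
    """Return the share of all movers that falls in each distance band.

    flows:       {(origin, destination): number of bearers}
    distance_km: {(origin, destination): distance in km}
    bands:       list of (label, upper_limit_km), sorted by upper limit;
                 the last limit should be infinite so every move is covered.
    """
    totals = {label: 0 for label, _ in bands}
    movers = 0
    for (origin, dest), count in flows.items():
        if origin == dest:
            continue  # bearers still at their surname's origin are not movers
        km = distance_km[(origin, dest)]
        for label, upper in bands:
            if km <= upper:
                totals[label] += count
                break
        movers += count
    if movers == 0:
        return {label: 0.0 for label in totals}
    return {label: n / movers for label, n in totals.items()}

# Hypothetical band limits and toy data (not taken from the study).
BANDS = [("short", 150.0), ("medium", 400.0), ("long", float("inf"))]
flows = {("A", "A"): 500, ("A", "B"): 60, ("A", "C"): 30, ("B", "C"): 10}
distance_km = {("A", "B"): 100.0, ("A", "C"): 300.0, ("B", "C"): 700.0}
shares = share_by_distance_band(flows, distance_km, BANDS)
```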

The nature and distance of these population movements seem to depend on the chronological period. In recent times (20th and 21st centuries), these distances are associated with the employment opportunities offered by each territory (Bover and Velilla, 2019).

Main movements

Detailed analysis of the main movements allowed us to observe the relationship between the Spanish populations, representing the two main destinations (Fig. 6) and the two main origins (Fig. 7) of the movements of each province.
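The selection of the two main destinations and the two main origins of each province can be sketched as a ranking over the cells of a migration matrix. The function name, the tie-breaking rule (alphabetical) and the toy matrix are assumptions made for illustration only.

```python
from collections import defaultdict

def top_partners(flows, k=2):
    """Return (destinations, origins): the k largest partners of each province.

    flows: {(origin, destination): number of bearers}
    destinations[p] lists where lineages originating in p mostly live now;
    origins[p] lists where the outsiders living in p mostly come from.
    """
    out_moves = defaultdict(list)
    in_moves = defaultdict(list)
    for (origin, dest), count in flows.items():
        if origin == dest:
            continue  # bearers still at their origin are not a movement
        out_moves[origin].append((count, dest))
        in_moves[dest].append((count, origin))

    def rank(pairs):
        # Largest count first; ties broken alphabetically (an arbitrary choice).
        ordered = sorted(pairs, key=lambda p: (-p[0], p[1]))
        return [name for _, name in ordered[:k]]

    destinations = {p: rank(v) for p, v in out_moves.items()}
    origins = {p: rank(v) for p, v in in_moves.items()}
    return destinations, origins

# Hypothetical toy matrix (not taken from the study).
flows = {("A", "A"): 200, ("A", "B"): 50, ("A", "C"): 30,
         ("A", "D"): 10, ("B", "A"): 5}
destinations, origins = top_partners(flows)
```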

Both the destinations (Fig. 6) and the origins (Fig. 7) reveal the existence of two main migratory arcs. Both migrations have moved along the coast; the first following the Mediterranean and the second following the Cantabrian Sea. It appears that these two arcs are those that have provided the backbone of the Spanish population, dividing it into two halves that would reach the limits of areas of influence of these arcs. On a second plane, it is possible to observe the existence of a third arc (less relevant) in the west (Figs. 6 and 7), which matches the “Ruta de la Plata” perfectly. This is an ancient communication route that was taken up by the Romans and was later used as a transhumance route (Martínez, 2003); it has served as a population exchange route between all the towns along it and is still maintained as an important corridor that runs through the peninsula from north to south. The role of transhumance routes as population itineraries has already been evidenced in other environments (Orrù et al., 2018) and, in fact, there are indications that, precisely in this area of the western peninsula, there is a mixing of populations with different origins (Adams et al., 2008).

It seems that the presence of these large population movements is what has led to the Spanish population structure already described (Rodríguez-Díaz et al., 2017) and observed again in the present work.

Autochthony

The degree of autochthony varies considerably from one zone to another as a result of the influence of geographic or historical factors that have given rise to different migration patterns (Manni et al., 2005). In the case of Spain, it seems that autochthony is concentrated on the coasts, coinciding with the Mediterranean and Cantabrian arcs, regions that have traditionally experienced lower historical emigration (Valero-Matas et al., 2014).

It also appears that there are two types of corridor. Along the coastal corridors autochthony is very high, while the western corridor is not very autochthonous. A feasible explanation for this phenomenon is that each coastal corridor would have “articulated” its own half of the Spanish population (Rodríguez-Díaz et al., 2017). The Spanish population is divided into two halves and each extends out of a coastal arc. Through such arcs movements within a single population have occurred (northwestern half/southeastern half) while the western corridor would involve a route of exchange between these two differentiated populations and therefore has more allochthonous population. Recent studies carried out on internal migration in the Spanish population have shown the influence of climatic aspects as factors determining population mobility (Maza et al., 2019), so that the displacement within two arcs, as described, can be framed in this line.

Historical background

To validate our results, we have compared them with what is known about the historical Spanish population from available historical records and genetic studies.

With insight into the structure of the population and how it has been formed, the most pressing question is when this process actually occurred. First, the results reported here are consistent with those described by the National Geographic Institute and the National Institute of Statistics (www.ign.es, www.ine.es), and they become even more consistent as the records of the migratory movements become older (the oldest correspond to the decade between 1960 and 1970). This was to be expected from the isonymy methodology used, which reflects the results of a historical process.

A similar situation is found for the comparison between autochthonous surnames and the historical censuses of the provinces (Fig. 8). The fact that the number of bearers of autochthonous surnames correlates better with the population size as the age of the census used increases confirms the notion that autochthonous surnames are a faithful reflection of the original population of each province and is an indication of the precision underlying the identification of the origin of each surname.
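The comparison described above amounts to one correlation per census year between the number of bearers of autochthonous surnames and the census population of each province. The sketch below uses a plain Pearson coefficient; whether the study used Pearson or another statistic, and how significance was tested, is not stated in this section, so both the choice of coefficient and the toy figures are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: one value per province, in the same province order.
autochthonous_bearers = [10, 20, 30, 40]
census_population = {
    1787: [1, 2, 3, 4],   # oldest census in the toy example
    2000: [4, 1, 3, 2],   # most recent census in the toy example
}
r_by_year = {year: pearson_r(autochthonous_bearers, sizes)
             for year, sizes in census_population.items()}
```

Reading `r_by_year` from the oldest to the most recent census shows whether the correlation weakens over time, which is the pattern the text reports.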

Until 1950 (Fig. 9), this correlation is significant. Sometime around then, rural emigration began in Spain and the population lost stability (Fuster and Colantonio, 2002). In fact, on observing the evolution of the Spanish population in each province (Fig. 9), it is clearly seen that all the provinces maintained a stable population subject to gentle growth up to 1950. Then, after that year some provinces suddenly began to lose part of their populations in favor of others and the population sizes changed sharply: population stability had disappeared.

This phenomenon again suggests that the long-distance migratory movements have been a recent phenomenon and is coherent with the notion that the Spanish population is highly conserved (Adams et al., 2008; Rodríguez-Díaz et al., 2017). It traces back to the historical events of the Muslim era and the Reconquista, which can be placed between the 9th and 11th centuries (Bycroft et al., 2019), and therefore reflects a structure that predates the established surname system. In this scenario, it appears the Spanish interpopulation relations have been of limited reach, both as regards intensity and distance, and that they have persisted until very recent dates within zones clearly delimited by geographic determinants, which would explain why the Spanish population is clearly divided into two differentiated parts (Rodríguez-Larralde et al., 2003; Cheshire et al., 2011; Rodríguez-Díaz et al., 2017). Traditionally, population movements have been observed inside these zones and there has been little exchange between them until very recent times.

Conclusions

The results obtained demonstrate a reliable methodology that can be used (though not exclusively) in populations with a surname system similar to that of Spain. Additionally, the findings regarding the internal structure of the Spanish population and the origins of Spanish surnames may also be of interest to populations that share surnames of Spanish origin.

The application of this new methodology has allowed us to distinguish the surnames with a clear origin (monophyletic) of the Spanish population, to be used as geographic markers, in such a way that we have been able to know the origin of each group of surnames and to highlight their mobility. The coherence with previously reported results, with analyses carried out prior to the migratory movements, and the correlation between autochthony and the oldest censuses point to the precision (bearing in mind the geographical level chosen) of the technique when attempting to identify the origins of the surnames.

Within the Spanish population several types of movement have taken place. Those of short-medium distance have been the most frequent and most determinant in the current structure. This mobility has been confirmed mainly within two geographically differentiated regions. In the northwest the movements have occurred along the Cantabrian arc up to where its influence reached. Symmetrically, in the southeast the population followed the Mediterranean arc, also arriving as far as the reach of its influence. The exchange between both areas has been relatively scarce and has mainly been seen in relation to the west, following the ancient “Ruta de la Plata” corridor.

In light of the stability of the population until relatively recently (1950) and the reduced importance of long-distance movements, these seem to have been a more recent phenomenon, with a less marked contribution to the population structure.

Data availability

The data used in this study are publicly available under appropriate conditions from the National Institute of Statistics (INE). Researchers may access the data through their official platform, subject to compliance with their access guidelines. Additionally, the datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

References

Adams SM, Bosch E, Balaresque PL, Ballereau SJ, Lee AC, Arroyo E et al. (2008) The genetic legacy of religious diversity and intolerance: paternal lineages of Christians, Jews, and Muslims in the Iberian Peninsula. Am J Hum Genet 83:725–736
Alvarez L, Santos C, Ramos A, Pratdesaba R, Francalacci P, Aluja MP (2010) Mitochondrial DNA patterns in the Iberian Northern plateau: population dynamics and substructure of the Zamora province. Am J Phys Anthropol 142:531–539
Balanovskaia EV, Romanov AG, Balanovskii OP (2011) Namesakes or relatives? Approaches to investigating the relationship between Y chromosomal haplogroups and surnames. Mol Biol 45:473–485
Barrai I, Rodríguez-Larralde A, Mamolini E, Manni F, Scapoli C (2000) Elements of the surname structure of Austria. Ann Hum Biol 27:607–622
Barrai I, Rodríguez-Larralde A, Manni F, Ruggiero V, Tartari D, Scapoli C (2004) Isolation by language and distance in Belgium. Ann Hum Genet 68:1–16
Barrai I, Rodriguez‐Larralde A, Dipierri J, Alfaro E, Acevedo N, Mamolini E, Scapoli C (2012) Surnames in Chile: a study of the population of Chile through isonymy. Am J Phys Anthropol 147(3):380–388
Boattini A, Blanco-Villegas MJ, Pettener D (2007) Genetic structure of La Cabrera, Spain, from surnames and migration matrices. Hum Biol 79:649–666
Boattini A, Griso C, Pettener D (2011) Are ethnic minorities synonymous for genetic isolates? Comparing Walser and Romance populations in the Upper Lys Valley (Western Alps). J Anthropol Sci 89:161–173
Boattini A, Lisa A, Fiorani O, Zei G, Pettener D, Manni F (2012) General method to unravel ancient population structures through surnames: final validation on Italian data. Hum Biol 84:235–270
Boattini A, Pedrosi ME, Luiselli D, Pettener D (2010) Dissecting a human isolate: novel sampling criteria for analysis of the genetic structure of the Val di Scalve (Italian Pre-Alps). Ann Hum Biol 37:604–609
Bodmer WF, Cavalli-Sforza LL (1968) A migration matrix model for the study of random genetic drift. Genetics 59:565–592
Bover O, Velilla P (2019) Migrations in Spain: Historical Background and Current Trends. Banco de España - Servicio de Estudios Documento de Trabajo, n° 9909, páginas 10–23
Bycroft C, Fernandez-Rozadilla C, Ruiz-Ponte C et al. (2019) Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula. Nat Commun 10:551. https://doi.org/10.1038/s41467-018-08272-w
Carrieri A, Sans M, Dipierri JE, Alfaro E, Mamolini E, Sandri M et al. (2020) The structure and migration patterns of the population of Uruguay through isonymy. J Biosoc Sci 52(2):300–314
Cheshire J, Mateos P, Longley PA (2011) Delineating Europe’s cultural regions: population structure and surname clustering. Hum Biol 83:573–598
Cheshire JA, Longley PA (2012) Identifying spatial concentrations of surnames. Int J Geogr Inf Sci 26(2):309–325
Cheshire J (2014) Analysing surnames as geographic data. J Anthropol Sci 92:99–117
Colantonio SE, Lasker GW, Kaplan BA, Fuster V (2003) Use of surname models in human population biology: a review of recent developments. Hum Biol 75:785–807
Creţan R, Matthews PW (2016) Popular responses to city‐text changes: Street naming and the politics of practicality in a post‐socialist martyr city. Area 48(1):92–102. https://doi.org/10.1111/area.12241
Dipierri JE, Alfaro EL, Scapoli C, Mamolini E, Rodríguez-Larralde A, Barrai I (2005) Surnames in Argentina: a population study through isonymy. Am J Phys Anthropol 128:199–209
Dipierri J, Rodríguez-Larralde A, Alfaro E, Scapoli C, Mamolini E, Salvatorelli G et al. (2011) A study of the population of Paraguay through Isonymy. Ann Hum Genet 75:678–687
Dipierri JE, Ela G, Rodriguez-Larralde A, Ramallo V (2016) Isonymic relations in the Bolivia-Argentina border region. Hum Biol 88:191–200
Encarnación OG (2004) The politics of immigration: why Spain is different. Mediterr. Q 15(4):167–185
Esparza M, García-Moro C, Hernández M (2006) Genetic relationships between parishes in the Ebro delta region (Spain) as estimated by migration matrix and surnames. Hum Biol 78:647–662
Fan X, Liu Y, Yuan Y, Chen J, Chen L (2023) A surname-based index of migration intensity and its application in China. Phys A Stat Mech Appl 626:129034
Faure R, Ribes MA, García A (2001) Diccionario de apellidos españoles. Madrid: Espasa-Calpe
Fuster V, Colantonio SE (2002) Consanguinity in Spain: socioeconomic, demographic, and geographic influences. Hum Biol 74:301–315
Gagnon A, Heyer E (2001) Intergenerational correlation of effective family size in early Quebec (Canada). Am J Hum Biol 13:645–659
García P (2007) Lenguas y dialectos de España. Madrid: Arco Libros
Goebl H (2010) La dialectometrización del ALPI: Rápida presentación de los resultados. 26th CILFR. Valencia
Kamel C, Saliba-Serre B, Lizee MH, Signoli M, Costedoat C (2023) Surnames in south-eastern France: structure of the rural population during the 19th century through isonymy. J Biosoc Sci 55(1):174–189
King TE, Ballereau SJ, Schurer KE, Jobling MA (2006) Genetic signatures of coancestry within surnames. Curr Biol 16:384–388
King TE, Jobling MA (2009a) Founders, drift, and infidelity: the relationship between Y chromosome diversity and patrilineal surnames. Mol Biol Evol 26:1093–1102
King TE, Jobling MA (2009b) What’s in a name? Y chromosomes, surnames, and the genetic genealogy revolution. Trends Genet 25:351–360
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43:59–69
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin
Lisa A, De Silvestri A, Mascaretti L, Degiuli A, Guglielmino CR (2007) HLA genes and surnames show a similar genetic structure in Lombardy: does this reflect part of the history of the region? Am J Hum Biol 19:311–318
Liu Y, Chen L, Yuan Y, Chen J (2012) A study of surnames in China through isonymy. Am J Phys Anthropol 148(3):341–350
Longley PA, Cheshire JA, Mateos P (2011) Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum 42(4):506–516
Malecot G (1955) Decrease of relationship with distance. Cold Spring Harb Symp Quant Biol 20:52–53
Manni F, Guerard E, Heyer E (2004) Geographic patterns of (genetic, morphologic, linguistic) variation: how barriers can be detected by using Monmonier’s algorithm. Hum Biol 76:173–190
Manni F, Toupance B, Sabbagh A, Heyer E (2005) New method for surname studies of ancient patrilineal population structures and possible application to improvement of Y-chromosome sampling. Am J Phys Anthropol 126:214–228
Martínez E (2003) Atlas histórico de España. Istmo, Madrid
Mateos P (2007) A review of name‐based ethnicity classification methods and their potential in population studies. Popul, Space Place 13(4):243–263
Mateos P, Tucker DK (2008) Forenames and Surnames in Spain in 2004. Names: A J Onomast 56(3):165–184
Maza A, Gutiérrez‐Portilla M, Hierro M, Villaverde J (2019) Internal migration in Spain: dealing with multilateral resistance and nonlinearities. Int Migr 57(1):75–93
O’Brien T, Cretan R, Covaci R, Jucu IS (2022) Internal migration and stigmatization in the rural Banat region of Romania. Identities: Global Studies in Culture and Power. https://doi.org/10.1080/1070289X.2022.2109276
Orrù A, De Iasio S, Frederic P, Girotti M, Boano R, Sanna E (2018) Spatial diffusion of surnames by long transhumance routes between highland and lowland: A study in Sardinia. Homo 69(3):127–138
Posch M, Schulz J, Henrich J (2024) How social structure drives innovation: Surname diversity and patents in U.S. history. SSRN Electron J. https://doi.org/10.2139/ssrn.4531209
Pettener D, Pastor S, Tarazona-Santos E (1998) Surnames and genetic structure of a high-altitude Quechua community from the Ichu River Valley, Peruvian Central Andes, 1825-1914. Hum Biol 70:865–887
Rodríguez-Díaz R, Blanco-Villegas MJ (2010) Genetic structure of a rural region in Spain: distribution of surnames and gene flow. Hum Biol 82:301–314
Rodríguez-Díaz R, Manni F, Blanco-Villegas MJ (2015) Footprints of Middle Ages kingdoms are still visible in the contemporary surname structure of Spain. Plos One 10(4):e0121472
Rodríguez-Díaz R, Blanco-Villegas MJ, Manni F (2017) From surnames to linguistic and genetic diversity: five centuries of internal migrations in Spain. J Anthropol Sci 95:249–267
Rodríguez-Larralde A, Dipierri J, Alfaro E, Scapoli C, Mamolini E, Salvatorelli G et al. (2011) Surnames in Bolivia: a study of the population of Bolivia through isonymy. Am J Phys Anthropol 144:177–184
Rodríguez-Larralde A, Gonzales-Martin A, Scapoli C, Barrai I (2003) The names of Spain: a study of the isonymy structure of Spain. Am J Phys Anthropol 121:280–292
Rodríguez-Larralde A, Morales J, Barrai I (2000) Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12:352–362
Rodríguez-Larralde A, Scapoli C, Beretta M, Nesti C, Mamolini E, Barrai I (1998) Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25:533–540
Rogers KB (1991) The relationship of grouping practices to the education of the gifted and talented learner (RBDM 9102). Storrs, CT: The National Research Center on the Gifted and Talented, University of Connecticut
Santiago-Caballero C (2021) Domestic migrations in Spain during its first industrialisation, 1840s–1870s. Cliometrica 15(3):535–563
Scapoli C, Mamolini E, Carrieri A, Rodríguez-Larralde A, Barrai I (2007) Surnames in Western Europe: a comparison of the subcontinental populations through isonymy. Theor Popul Biol 71:37–48
Solís JA (2002) El gran libro de los apellidos. La Coruña: El arca de papel
Sykes B, Irven C (2000) Surnames and the Y Chromosome. Am J Hum Genet 66:1417–1419
Toledo A, Pámpanas L, García D, Pettener D, González-Martin A (2017) Changes in the genetic structure of a valley in the Pyrenees (Catalonia, Spain). J Biosoc Sci 49(1):69–82
Valero-Matas JA, Coca JR, Valero-Otero I (2014) Análisis de la inmigración en España y la crisis económica. Pap Poblac 20(80):9–45. ISSN 2448-7147
Wehrens R, Buydens LMC (2007) Self- and super-organising maps in R: the Kohonen package. J Stat Softw 21:1–19


Author information

Authors and Affiliations

Área de Antropología Física, Departamento de Biología Animal, Facultad de Biología, Universidad de Salamanca, Salamanca, Spain
Roberto Rodríguez-Díaz & María José Blanco-Villegas
National Museum of Natural History—Musée de l’Homme, Paris, France
Franz Manni

Authors

Roberto Rodríguez-Díaz
Franz Manni
María José Blanco-Villegas

Contributions

RRD was responsible for data analysis and preparation. The study design was carried out by RRD, MJBV, and FM. Data interpretation was performed by RRD and MJBV. RRD and MJBV wrote the manuscript and handled revisions. Additionally, RRD, MJBV, and FM were in charge of the overall coordination of the project.

Corresponding author

Correspondence to Roberto Rodríguez-Díaz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This study did not involve human participants or their data and therefore, ethical approval was not required.

Informed consent

The data utilized in this research are fully anonymized and do not contain any personally identifiable information. As such, no consent was necessary. The study adheres to ethical guidelines and complies with all relevant data protection and privacy regulations.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Rodríguez-Díaz, R., Manni, F. & Blanco-Villegas, M.J. A new approach to historical migratory movements based on surnames: the case of Spain. Humanit Soc Sci Commun 11, 1541 (2024). https://doi.org/10.1057/s41599-024-04065-3


Received: 06 May 2024
Accepted: 28 October 2024
Published: 14 November 2024
Version of record: 14 November 2024
DOI: https://doi.org/10.1057/s41599-024-04065-3
