Introduction

Due to the increasing popularity of location services and Internet of Things (IoT) applications, location big data (LBD) has become a crucial strategic resource. It is currently used to understand the patterns of human community activities, analyse geographic conditions, and develop smart cities1,2,3,4. Acquiring and researching spatio-temporal location big data can transcend traditional spatial boundaries. Some researchers have studied the mobile trajectories and surrounding environment characteristics of social, work, daily-life and other trips. From these data, it is possible to extract the patterns of social activities of individuals or groups, predict their travel behaviour, and understand the laws of spatio-temporal distribution5,6,7. One study8 analysed unreasonable areas in urban traffic based on trajectory data from over 30,000 taxis in Beijing over three months. Other work9,10 analysed 4.4 million microblogs from 630,000 microbloggers in New York using Global Positioning System (GPS) data and created ‘hotspot’ maps of the locations of unwell users, demonstrating the spread of influenza in New York. Using the hotspot maps and videos, the onset of influenza symptoms in an individual could be predicted as early as 8 days in advance with 90% accuracy. Further studies11,12,13,14 have employed geographic information data generated from social check-ins in Beijing, Wuhan, Shenzhen, and other large and medium-sized cities to classify and study hotspot business districts and typical landmarks in these cities. Other studies15,16 have analysed the spatio-temporal distribution characteristics of real-time kinematic (RTK) location big data and its correlation with economic and social development.

The satellite navigation and positioning reference station network is a crucial infrastructure in modern society. It provides positioning services with accuracies ranging from metres to centimetres and plays an indispensable role in the development of the national economy. In 2011, the first phase of the Hunan Continuously Operating Reference Stations (HNCORS) project was completed17. The network now offers RTK positioning services to the public. RTK location data is a collection of data that accurately reflects the user’s position. It contains a series of information, including the user’s precise position, the number of satellites, and the solution status. To use the RTK service provided by HNCORS, users must continuously transmit their position to the data centre. The data centre then generates the corresponding virtual observation value at the user’s location and returns it to the user, thereby enabling high-precision positioning. HNCORS is currently employed in a multitude of fields, including surveying and mapping geographic information, natural resources, and smart cities. The data centre stores user location data from 2016 to the present, forming RTK location big data that provides a foundation for location big data research. The data centre is accessed on average 1.2 million times per day.

The current approach to updating geographic information data relies on a regular and uniform updating process, with updates occurring at a consistent time interval. This method has several limitations. Firstly, it does not make efficient use of resources, as updates are applied uniformly without considering regional variability in change dynamics. Secondly, it is poorly targeted, because it does not focus on the areas where change actually occurs. Thirdly, it cannot raise the updating frequency for regions whose geographic information changes rapidly. Existing technologies lack the capability to accurately capture real-time geographic changes, which makes it difficult to assess whether the geographic information of a region needs to be updated. RTK location big data is characterised by high precision, high frequency and rich information content. The data is derived primarily from the mapping of geographic information and other activities requiring high-precision positioning. It is closely related to human production and construction activities, and mapping activities are frequent in areas with a high concentration of urban construction projects. Mining and analysing RTK location big data therefore allows information related to urban construction to be extracted, which makes it possible to identify potential areas of geographic information change and provides a reference basis for updating geographic information data.

Methodology

This paper pre-processes RTK location big data and constructs a user spatial activity index from it. The correlation between the user spatial activity index and remote sensing change patches is analysed using remote sensing image change patch data, in order to verify the relationship between RTK location big data and changes in geographic information. The RTK location data is then used to create the geographic information update index, which serves as a reference for updating geographic information. The flow chart is shown in Fig. 1.

Fig. 1

Flowchart of the geographic information update index of RTK location big data.

The network RTK location big data from HNCORS is first collected, cleaned, thinned and merged so that it meets the requirements for further processing. The study area is then divided into multiple grids of a given size, and the spatial activity index of each grid is calculated. The correlation between the user spatial activity index and the remote sensing image change patches is analysed at the county administrative scale, in order to characterise the remote sensing image change patch data and to study its relationship with the network RTK location big data. Finally, the user spatial activity index is used to construct the geographic information update index for county-level administrative areas and 1:10,000 standard map sheets, thereby providing a foundation for decision-making on geographic information updating.

Correlation analysis of RTK location big data and remote sensing image change patches

Data collection and preprocessing

The objective of data collection and preprocessing is to provide research data for data mining. This is achieved by collecting the data needed for location big data analysis and then preprocessing it. Two data sets were collected for this paper: (1) HNCORS RTK location big data (approximately 6.75 billion items) from 2016 to 2018, and (2) remote sensing image change patch data of Hunan Province from 2016 to 2018. The RTK data adhere to the NMEA 0183 protocol, under which users transmit GGA messages. The message format is: $GPGGA,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>*<15>, and the meaning of each numbered field is given in Table 1.

Table 1 NMEA 0183 protocol GGA message format description.

The remote sensing image change patch data for Hunan Province are derived from remote sensing imagery used in conjunction with the regular updating of geographic information within the province. Because the imagery covers the whole province, the data comprehensively delineate geographic information changes and can be regarded as an accurate record of them. They therefore serve as a reliable basis for validating the results presented in this paper.

As the HNCORS RTK location big data is too large, pre-processing of the data is required before modelling for data analysis. Preprocessing includes 3 main steps: data cleaning, thinning and merging.

The primary objective of data cleaning is to eliminate records with incomplete structures, obvious errors, or non-compliant formats. The cleaning process in this paper removes user location data that do not conform to the NMEA 0183 protocol, as well as erroneous records in which fields that should be numeric, such as geographic coordinates and network latency, are recorded as text.
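The cleaning rules above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `nmea_checksum` and `parse_gga` are hypothetical, and the sketch assumes the standard GGA layout in which field <2>/<3> hold latitude and hemisphere, <4>/<5> longitude and hemisphere, <6> the fix quality, <7> the satellite count, and <15> an XOR checksum of the characters between `$` and `*`. A record is dropped (returns `None`) when the checksum fails, the field count is wrong, or a numeric field contains text.

```python
from typing import Optional


def nmea_checksum(body: str) -> int:
    """XOR of all characters between '$' and '*' (NMEA 0183 checksum)."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return value


def parse_gga(sentence: str) -> Optional[dict]:
    """Return cleaned GGA fields, or None if the record should be dropped."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, given = sentence[1:].partition("*")
    try:
        if nmea_checksum(body) != int(given[:2], 16):
            return None
    except ValueError:  # checksum is missing or not hexadecimal
        return None

    fields = body.split(",")
    # Sentence identifier plus the 14 data fields <1>..<14>
    if len(fields) != 15 or not fields[0].endswith("GGA"):
        return None
    if fields[3] not in ("N", "S") or fields[5] not in ("E", "W"):
        return None
    try:
        # ddmm.mmmm and dddmm.mmmm -> decimal degrees
        lat = float(fields[2][:2]) + float(fields[2][2:]) / 60.0
        lon = float(fields[4][:3]) + float(fields[4][3:]) / 60.0
        quality = int(fields[6])
        satellites = int(fields[7])
    except ValueError:  # text or empty value where a number is required
        return None

    if fields[3] == "S":
        lat = -lat
    if fields[5] == "W":
        lon = -lon
    return {"utc": fields[1], "lat": lat, "lon": lon,
            "quality": quality, "satellites": satellites}


if __name__ == "__main__":
    # Illustrative sentence; the checksum is computed rather than hard-coded.
    body = "GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,"
    print(parse_gga(f"${body}*{nmea_checksum(body):02X}"))
    print(parse_gga("$GPGGA,corrupted*00"))
```

Returning `None` rather than raising keeps the cleaning step a simple filter over a very large stream of records.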

Thinning aims to reduce the number of high-frequency records sent by the same user. When users send location data to the data centre, they do so at different time intervals due to differences in instrument models, the lowest frequency being once every 60 s. To ensure the analysis results are not affected by these differences, this paper thins out the user location data to a time interval of 60 s. The cleaned and thinned data is then merged and stored in a single file for subsequent model analysis.
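The thinning step can be sketched as follows. The paper does not state the exact rule, so this example assumes one plausible choice: for each user, a record is kept only if at least 60 s have elapsed since that user's previously kept record. The record layout `(user_id, unix_time_s, lat, lon)` and the function name `thin_records` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (user_id, unix_time_s, lat, lon)
Record = Tuple[str, float, float, float]


def thin_records(records: Iterable[Record],
                 interval_s: float = 60.0) -> List[Record]:
    """Keep, per user, only records spaced at least `interval_s` apart."""
    last_kept: Dict[str, float] = {}
    kept: List[Record] = []
    # Sort by user, then time, so each user's records are seen in order.
    for rec in sorted(records, key=lambda r: (r[0], r[1])):
        user, t = rec[0], rec[1]
        if user not in last_kept or t - last_kept[user] >= interval_s:
            last_kept[user] = t
            kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [("a", 0, 28.2, 112.9), ("a", 10, 28.2, 112.9),
              ("a", 60, 28.2, 112.9), ("b", 5, 27.8, 113.1)]
    for rec in thin_records(sample):
        print(rec)
```

Sorting the full data set in memory is only practical for a sample; at the scale described here the same rule would more likely be applied per user or per daily file.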

Model construction of user spatial activity index

The location data sent by HNCORS users to the data centre is single-point data. In order to analyse user activity in different regions and evaluate the level of user activity within a given area, this paper constructs a user spatial activity index model. Hunan Province is divided into grids of a specific size, the number of location data items within each grid is counted, and the spatial activity index of each grid is then calculated using formula (1) for subsequent analysis.

$$USI = n \cdot g / m$$
(1)

In formula (1), USI is the user spatial activity index, n is the number of users in the grid, g is the total number of grids, and m is the total number of users in Hunan Province. Equivalently, USI is the ratio of the count n in a grid to the average count per grid, m/g.
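A minimal sketch of the grid counting and of formula (1), USI = n·g/m, is given below. It assumes that n and m are counts of (thinned) location records, that grid cells are identified by flooring latitude and longitude to the cell size, and that the total number of grids g is supplied by the caller; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, size_deg: float = 0.01) -> Cell:
    """Index of the size_deg x size_deg cell that contains (lat, lon)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def user_spatial_activity_index(points: Iterable[Tuple[float, float]],
                                total_grids: int,
                                size_deg: float = 0.01) -> Dict[Cell, float]:
    """USI = n * g / m for every non-empty cell (formula 1)."""
    counts = Counter(grid_cell(lat, lon, size_deg) for lat, lon in points)
    m = sum(counts.values())  # total number of records
    if m == 0:
        return {}
    return {cell: n * total_grids / m for cell, n in counts.items()}


if __name__ == "__main__":
    pts = [(28.205, 112.935), (28.206, 112.936), (28.215, 112.935)]
    print(user_spatial_activity_index(pts, total_grids=4))
```

With this definition a cell holding exactly the average number of records has USI = 1, so values above 1 mark cells that are more active than average.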

The area of the remote sensing image change patches is the measured data used to validate the spatial activity index of RTK users. The change patch area is aggregated by district and county and must then be normalised. Prior to analysis, the change patch area of each district and county is first divided by the total area of that district or county, and the resulting ratio is then normalised using Eq. 2.

$$X^{\prime}=(X-X_{min})/(X_{max}-X_{min})$$
(2)
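As a brief illustration of the two steps above, the following sketch divides each county's change patch area by the county area and then applies the min-max normalisation of Eq. 2. The function names are hypothetical, and the handling of the degenerate case in which all values are equal (returning zeros) is an assumption made here, not something the paper specifies.

```python
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """X' = (X - X_min) / (X_max - X_min), mapping values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumption: constant input maps to all zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalised_change_ratio(change_area: Sequence[float],
                            county_area: Sequence[float]) -> List[float]:
    """Change patch area per unit county area, then min-max normalised."""
    ratios = [c / a for c, a in zip(change_area, county_area)]
    return min_max_normalise(ratios)


if __name__ == "__main__":
    print(normalised_change_ratio([1.0, 2.0, 4.0], [2.0, 2.0, 2.0]))
```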

In order to identify an appropriate grid division method, we conducted an analysis to determine the correlation between the user spatial activity index, calculated using the 0.05° × 0.05°, 0.01° × 0.01° and 0.025° × 0.025° grids, and the data on remote sensing patch changes in 2016. The results of this analysis are presented in Fig. 2.

Fig. 2

Comparison of RTK spatial activity indices calculated for different grid sizes with remote sensing change patches in 2016.

As illustrated in Fig. 2, when the grid size is 0.01°×0.01°, the correlation coefficient between the RTK user spatial activity index and the 2016 remote sensing change patch data reaches 0.9, which is demonstrably superior to the other grid sizes. The principal reason is that the operational range of RTK users is generally about 1 km, which is roughly equivalent to 0.01°. Consequently, this paper adopts a grid size of 0.01°×0.01° as the foundation for the subsequent research.

Correlation analysis of user spatial activity indices with remote sensing image change patches

The remote sensing image change patch data of Hunan Province comprises the patch area data of each district and county where geographic information has changed in Hunan Province, as monitored by remote sensing. This data effectively reflects the changes in geographic information in each district and county of Hunan Province on an annual basis. To study the impact of HNCORS RTK locations on mapping geographic information, we analysed the correlation between remote sensing image change patch data and RTK user spatial activity index in Hunan Province at the county scale.

In this paper, we use the location big data generated by HNCORS from 2016 to 2018 to calculate the user spatial activity index of each grid at a 0.01°×0.01° grid size, and then take the average of the grid indices within each district and county as the user spatial activity index of that district or county. Finally, a correlation analysis was carried out between these indices and the remote sensing image change data of the 122 districts and counties in Hunan Province from 2016 to 2018.
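The county-level aggregation and the correlation step can be sketched as follows. The paper does not name the correlation coefficient it uses; this example assumes the Pearson coefficient. The mapping from grid cell to county (`cell_county`) is a hypothetical input that would in practice come from a spatial join with the administrative boundaries.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def county_mean_usi(grid_usi: Dict[Hashable, float],
                    cell_county: Dict[Hashable, str]) -> Dict[str, float]:
    """Average the grid-level USI over the cells assigned to each county."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for cell, usi in grid_usi.items():
        county = cell_county[cell]
        sums[county] += usi
        counts[county] += 1
    return {c: sums[c] / counts[c] for c in sums}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed here; not named in the paper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    usi = county_mean_usi({1: 2.0, 2: 4.0, 3: 1.0}, {1: "A", 2: "A", 3: "B"})
    print(usi)
    print(pearson_r([0.1, 0.4, 0.9], [0.2, 0.5, 0.8]))
```

The averaging here covers only the cells present in `grid_usi`; whether empty cells should count as zeros in the county mean is a choice the text leaves open.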

Figure 3 depicts the 122 districts and counties in Hunan Province, with the horizontal coordinates representing the user spatial activity index or remote sensing change area after normalisation. The user activity index and remote sensing change area of each district and county are in the range of 0–1. Figure 3 illustrates a strong positive correlation between the user spatial activity index of RTK location big data and remote sensing image change patch area across the 122 districts and counties in Hunan Province. As the user spatial activity index increases, the remote sensing patch change area also rises.

Fig. 3

HNCORS RTK position big data and remote sensing patch variation curve.

Figure 4 depicts the correlation between the user spatial activity index of RTK location big data and the remote sensing image change patch area from 2016 to 2018. The correlation coefficient was 0.9 in 2016, decreased slightly to 0.89 in 2017, and declined further to 0.8 in 2018. It therefore lies between 0.8 and 0.9 throughout, indicating a strong correlation between the two, consistent with the curves in Fig. 3. RTK location big data is thus capable of reflecting changes in geographic information. The decline over time may be attributed to users’ increasing use of network RTK services provided by companies.

Fig. 4

Correlation analysis between HNCORS RTK location big data and remote sensing patches.

The distribution of the user spatial activity index and the area of remote sensing patch changes in the province for each district and county were counted using network RTK big data from 2016 to 2018, as illustrated in Fig. 5.

Fig. 5

Spatial distribution of HNCORS RTK big data and remote sensing change patches. Note: The map was generated using ArcGIS Geographic Information Systems software version 10.2, Environmental Systems Research Institute Inc, Redlands, Calif. URL:https://www.esri.com/zh-cn/home.

Figure 5 shows a significant correlation between the user spatial activity index and the remote sensing change area in most districts and counties. Higher user activity indices tend to correspond to larger change patch areas. In some districts and counties, however, the number of RTK operations does not match the remote sensing change area, for instance Furong District and Wangcheng District in Changsha City and Wuling District in Changde City. A possible explanation is the presence of self-built CORS, as in Changning City in Hengyang City and Beita District in Shaoyang City, where a significant amount of surveying and mapping work relied on city-level CORS systems rather than HNCORS.

Construction of an index for updating geographic information

Correlation analysis revealed a significant correlation between HNCORS RTK location big data and remote sensing patch changes. To improve the efficiency of geographic information updating, this paper constructs the HNCORS geographic information update index model using HNCORS RTK location big data. The model serves as a guide for geographic information updating. The frequency distributions of the user spatial activity index and of the remote sensing change area were calculated over a series of interval ranges, as illustrated in Fig. 6.

Fig. 6

Distribution frequency of the user spatial activity index and the area distribution of changes in remote sensing patches.

The frequency distributions of the user spatial activity index and of the remote sensing image change area are similar. Most districts and counties fall within the 0–0.2 interval. The geographic information update index model is constructed from the geometric intervals of the frequency distribution histogram of the HNCORS RTK location big data. The update index is divided into five levels, from level 1 to level 5, and is computed both for counties and for 1:10,000 standard map sheets. This provides a decision-making basis for updating geographic information in Hunan Province.
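The grading step can be illustrated with a simplified sketch. It is only a stand-in: the geometric interval classification used in GIS software optimises its class breaks, whereas this example simply makes the class widths grow by a fixed ratio (the `ratio` parameter is an assumption). It also assumes that level 1 corresponds to the highest activity, in line with level 1 areas being described as urban areas of economically developed cities. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def geometric_breaks(lo: float, hi: float,
                     classes: int = 5, ratio: float = 2.0) -> List[float]:
    """Inner class breaks whose class widths form a geometric series."""
    total = sum(ratio ** i for i in range(classes))
    width = (hi - lo) / total  # width of the first (narrowest) class
    breaks, edge = [], lo
    for i in range(classes - 1):
        edge += width * ratio ** i
        breaks.append(edge)
    return breaks


def update_level(value: float, breaks: List[float]) -> int:
    """Map a normalised index to an update level; level 1 = highest activity."""
    cls = bisect_left(breaks, value)  # 0 = lowest-value class
    return len(breaks) + 1 - cls


if __name__ == "__main__":
    b = geometric_breaks(0.0, 1.0)
    for v in (0.0, 0.05, 0.2, 0.4, 1.0):
        print(v, update_level(v, b))
```

Narrow classes at the low end suit a distribution in which most districts and counties fall in the 0–0.2 interval, since they keep those units from collapsing into a single level.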

Development of a Geographic Information Update Index

A geographic information update index of Hunan Province based on administrative divisions has been constructed using the operational index of HNCORS RTK location big data in the different districts and counties. The index is divided into five levels, as shown in Fig. 7. There are 20 first-level areas, located in urban areas of economically developed cities and prefectures; 23 second-level areas, located in urban areas of economically developed cities and prefectures as well as second-tier cities; 36 third-level areas, located in urban areas of western cities and prefectures as well as the central-southern region; 15 fourth-level areas, located mainly in the suburbs of second-tier cities and prefectures; and 28 fifth-level areas, primarily in economically disadvantaged counties in the western region. This division allows geographic information data to be updated in a more scientific and region-specific way, while also saving basic mapping funds. It meets the varying needs for geographic information data in different regions for social development purposes.

Fig. 7

Distribution of geographic information update index in Hunan Province based on administrative divisions. Note: The map was generated using ArcGIS Geographic Information Systems software version 10.2, Environmental Systems Research Institute Inc, Redlands, Calif. URL:https://www.esri.com/zh-cn/home.

Provincial geographic information updating mainly concerns 1:10,000 geographic information data. We therefore analysed the operational index of HNCORS RTK location big data within the extent of each 1:10,000 standard map sheet and assigned an update level to each sheet, as shown in Fig. 8.

Fig. 8

Distribution of updated levels of geographic information based on 1:10,000 standard map sheets. Note: The map was generated using ArcGIS Geographic Information Systems software version 10.2, Environmental Systems Research Institute Inc, Redlands, Calif. URL:https://www.esri.com/zh-cn/home.

Figure 8 shows 1,403 Level 1 update map sheets, primarily in the urban areas of economically developed cities and prefectures, and 2,908 Level 2 update map sheets, mainly in the urban areas of economically developed cities and prefectures and second-tier cities. A further 2,567 Level 3 update map sheets are mainly located in urban areas of western cities and prefectures, as well as the central-southern region. Additionally, there are 602 Level 4 update map sheets, primarily in the suburbs of second-tier cities and prefectures, and 149 Level 5 update map sheets, mainly in economically less developed counties and mountainous regions in the western part of Hunan Province.

The strategy of zoning and grading for basic mapping update allows for a more reasonable distribution of costs, quicker updates in rapidly changing areas, reasonable updates in slower changing areas, improved updating frequency in economically developed areas, increased efficiency in the use of financial funds for basic mapping, and satisfaction of the demand for geographic information updating in developed areas.

Conclusion

This paper presents a user spatial activity index model built from RTK location big data generated by HNCORS users. The correlation between the user spatial activity index and the area of remote sensing change is analysed, and the results show a strong correlation between the two. The user spatial activity index can reflect changes in geographic information in real time. Based on this index, the paper constructs the geographic information update index of Hunan Province for both administrative divisions and standard map sheets. The index can provide a reference for updating geographic information throughout the province. In addition, HNCORS RTK location big data can be combined with data from other industries, such as nature reserve boundaries and engineering project progress data, to establish numerical models or indices through joint analysis. This can provide data support for nature reserve management and key project supervision.