Introduction

Rural information services are of importance to the development of all aspects of rural areas. They can assist in agricultural production decision-making, promote the popularization of agricultural technologies, expand the sales channels of agricultural products, enhance the farmers’ living standards, and promote rural social governance. They have become a key factor in promoting the all-round development of rural areas1,2. The content of rural information services is also very extensive, mainly including agricultural production, rural economy, social life, and public services3. These information services are provided through various channels to meet farmers’ demands, and ultimately achieve the goals of improving rural social welfare.

To enhance the level of rural information services, many countries have carried out specific practices and exploratory research, achieving remarkable accomplishments and accumulating a considerable amount of experience. Through the literature review, we also find that, in the process of rural information services, there are still some problems. The most prominent one is the mismatch between the supply and demand of rural information services4. Specifically, the supply of information in rural areas is not what the farmers need, while the information that farmers truly need is lacking in supply, which has greatly affected the efficiency and effectiveness of rural information services.

The demand for rural information services refers to farmers’ needs for all kinds of information in the course of production, daily life, and social development. These demands are characterized by diversity, timeliness, practicality, and easy accessibility3,5. Meeting farmers’ information needs is the fundamental purpose of rural information services. Rural information service supply refers to the sum of all information services provided by information providers, such as the government, enterprises, social organizations, and individuals. It covers all the links of information production, dissemination, distribution, and application.

The reasons for the mismatch between the supply and demand of rural information services are multi-faceted. Among them, the lack of an accurate rural information service model is generally regarded as a crucial one. The rural information service model is a comprehensive service system in which information providers deliver information services based on farmers’ needs. It serves to meet farmers’ information needs in multiple fields, thereby promoting the all-round development of rural areas. Therefore, given the mismatch between the supply and demand of rural information services, it is urgent to develop an accurate rural information service model that can achieve the balance of “supply = demand” for rural information services, that is, provide farmers with the information they truly need. At present, there is no unified definition of accurate information service in academia. However, many scholars hold that an accurate information service system is an information service model that uses big data analysis technology to build user profiles and, based on users’ different needs, delivers personalized information to them through a modern network.

Previously, scholars from various countries have also researched the rural information services model4,6,7,8,9,10,11, and put forward some ideas dedicated to resolving the mismatch between the supply and demand of rural information services. However, most of them still take “self” as the center, only considering what the service provider “can offer” rather than “what the farmers need”. The supply of information services is still mainly a one-way push, which fails to meet farmers’ personalized needs and does not take into account the interactivity between information supply and demand. In reality, farmers’ demand for information services is constantly increasing, and the trend of personalized demands is becoming more and more obvious. Farmers are also more cautious about accepting information.

In particular, literature that is truly oriented towards the farmers’ information service demands, dedicated to addressing the imbalance between the supply and demand of rural information services, and utilizes modern information technology to study the accurate information service model in rural areas remains sparse. To fill in the gap, this paper utilizes big data mining and analysis techniques to propose a rural accurate information service model that balances the information supply and demand in rural areas. It analyzes and mines the personalized information demands and behavioral tendencies of farmers, and provides accurate information services to them. On the whole, the effectiveness of information service supply in rural areas has been enhanced.

The accurate information service model proposed in this paper is an innovative model that, under the collaboration of the government, enterprises, and third parties, utilizes big data mining technology to analyze farmers’ personalized needs, integrates the collected agricultural technology information, agricultural production information, market information, etc., and precisely delivers this information to farmers through a combination of online and offline channels. This model consists of three steps: the first is to collect the relevant agriculture-related data; the second is to process and analyze the collected data; the third is to predict farmers’ information needs and provide them with accurate and personalized rural information services.

The remainder of the paper proceeds as follows. We review the related literature in Section “Literature review”, and then we provide the information organization and communication model design in Section “Information organization and communication mode design”. In Section “Construction of rural accurate information service model”, we construct the rural accurate information service model. In Section “Conclusion” we conclude the paper.

Literature review

There are three streams of literature related to the research in this paper, and we review them separately. The first stream is concerned with the rural information service model. The rural information service model is the external form of rural information service supply, reflecting the interactive relationship between the information provider and farmers. Through the reform and innovation of the rural information service model, the enthusiasm of all parties involved can be mobilized, and the mismatch between the supply and demand of rural information services can be reduced. It is precisely because of this significance that many scholars are dedicated to researching rural information service models. For example, Liu et al.8 proposed a susceptible-exposed-infected-recovered (SEIR) information dissemination model in the era of artificial intelligence and showed that this model is more effective than traditional ones. Zhang et al.11 reviewed and analyzed seven agricultural information dissemination models based on Information and Communication Technologies (ICTs) in China, and provided some typical successful cases. Harbiankova and Gertsberg6 developed an information service model for the rural settlement system that considers five influencing factors (society, technology, economy, environment, and politics). After being tested for a period of time, it was affirmed by the certification department. Jin et al.4 developed a probit discriminant model of the factors influencing rural information demand using big data technology. Kukar et al.7 developed an information management and decision support model applied to rural farms. This model can predict the simulated scenarios of users and can better understand the dependencies among models. Muangprathub et al.9 predicted the soil fertility level through the analysis of soil information in rural areas of India. This study provides a reference for the rural information service supply model from the perspective of technical implementation. Based on data mining and the K-means algorithm, Shang and Shang10 analyzed agricultural fertilization behavior and its influencing factors. Wang and Wang12 analyzed an uncertainty model in the integration path of rural tourism information service. Wang13 built an image evaluation and optimization model utilizing deep learning technology to enhance the information service of rural tourism. Li and Zhou14 developed an intelligent monitoring and analysis model based on federated learning to optimize the information services regarding rural sports. Ifty15 offered a federated learning-driven agricultural innovations model to improve the accuracy of rural information services.

These studies offer guidance for agricultural production and also provide ideas for the rural information service supply model. From the literature above, it can be seen that current rural information service models are diverse in type, while accurate services that can truly meet the diverse needs of farmers are rarely studied. Most models remain confined to the information supply side, lack in-depth investigation of the information demand side, and do not even attempt to address the mismatch between the supply and demand of rural information services. In reality, the mismatch between information supply and demand in rural areas is still quite prominent. Therefore, it is necessary to study a rural information service model that can better match the supply and demand of rural information services.

The second stream is on farmers’ demand for rural information services. Research on farmers’ information demands began with how to effectively deliver information such as agricultural techniques and pest control to them. With the continuous expansion of information demands in rural areas, these types of information have become unable to meet farmers’ actual needs. Researchers have therefore expanded the scope of the information they study, from the initial information on agricultural technology and pest control to information on agricultural production materials, and then to market information on agricultural products. For example, Bopape et al.16 found that the information required by users of public libraries and information services in rural areas is centered around specific categories, and suggested improving the understanding of other services provided by public libraries and conducting in-depth research on user needs from time to time, so as to enhance the effectiveness of information supply and meet the constantly changing needs of users. Folitse et al.17 evaluated the information needs and sources in rural areas and concluded that the main constraints farmers face when obtaining information include a lack of skills to obtain information, insufficient information resources, the absence of information centers, and the inappropriate timing of agricultural programs broadcast on radio stations. Chen and Lu5 adopted the correlation analysis method to explore the relationship between farmers’ information demands and channel preferences. The results show that individual characteristic factors, social factors, and family factors have different degrees of influence on farmers’ information needs and access channels. Nikam et al.2 examined farmers’ information demands in two regions, one relying mainly on rain-fed crops and the other on irrigated crops, as well as the ways of obtaining information from different sources, and concluded that farmers’ information demands in the two regions are slightly different. Cui et al.18 applied the Urban Network Analysis (UNA) toolbox and demand intensity measurement methods to establish a theoretical framework for examining the supply of and demand for rural public service facilities and information services against the background of China’s rural revitalization strategy. Shukla et al.3 studied the differences in information demands among farmers with different population structures and regions by interviewing 90 respondents from the Sitapur region of Uttar Pradesh, India, in order to formulate targeted information dissemination strategies. With the advancement of rural informatization, the information provided in rural areas is no longer sufficient to meet the basic needs of production and living. Because of this, researchers have shifted their attention to the fields of culture and entertainment, and even to farmers’ information awareness and behavior.

The third stream is about the supply of rural information services. Improving the supply quality of rural information services is an important way to solve the mismatch between the supply and demand of rural information services. Scholars have conducted in-depth research on aspects such as the technology of and approaches to information services, attempting to find methods to improve the quality of rural information services. There is much literature on this aspect, and this article lists only some representative studies. Amir et al.1 took the Lodland area of South Punjab Province, Pakistan, as an example to study the interpersonal service information dissemination channels preferred by farmers. The research indicates that farmers prefer face-to-face communication with sales representatives of marketing companies regarding agricultural product service information. Chen et al.19 took the information and communication technology (ICT) platform as an example to study the role of rural information service supply in promoting rural e-government. The results show that information asymmetry and persistent tensions between the government and villagers have hindered the effectiveness of rural governance to varying degrees and run through all stages of e-government. Saridewi and Annisa20 studied the impact of the popularity of the Internet on the supply of information services in rural areas and pointed out that smartphones are one of the most extensive and effective channels for the supply of information services. Rabiu21 studied the potential of Selective Dissemination of Information (SDI) as a strategic information supply tool for meeting the information needs of rural poor communities. The conclusion shows that customized and accessible SDI initiatives have a positive impact on community development. Nowfal et al.22 studied the impact of agricultural cooperatives, as the main providers of service information, on strengthening farmers’ access to market information. The study shows that agricultural cooperatives play an important role in providing service information.

From the literature above, it can be seen that the rural information service supply system refers to an entity composed of the government, information enterprises, farmers, information products, and information communication media. Current research on the supply of rural information services mainly focuses on issues such as service providers, government roles, information supply channels, and supply effectiveness, but most of it remains at a micro level. Some scholars have improved and expanded the current government-led rural information service supply model in order to address the existing problems. However, the suggestions put forward within their theoretical frameworks are not highly operational.

Information organization and communication mode design

Organization of rural information

The rural accurate information service model constructed in this paper adopts a collaborative organization system that is led by the government, joined by the market, and followed by third parties, with farmers as the object of the information service. It can be divided into two modes: collaborative subject + service platform + farmer, and collaborative subject + service platform + technician + farmer. Which mode applies depends on the communication channel. If an online channel is chosen, the mode of collaborative subject + service platform + farmer is used. If an offline channel is chosen, the mode of collaborative subject + service platform + technician + farmer is used, as shown in Fig. 1.

Fig. 1 Accurate information service organization model.

Collaborative subject + service platform + farmer. This mode is coordinated by the government, the market, and third parties, and uses information service platforms to send rural information to farmers online. From the perspective of the subjects, the government still plays a leading role among the collaborative subjects under this mode and exercises its supervisory function; the participation of market subjects is conducive to the market-oriented operation of information services and helps maximize the interests of all parties; and the follow-up of third-party entities is conducive to improving service levels. Therefore, this organizational mode has greater influence and stronger execution. At the same time, the service platform supports accurate information services for farmers and analyzes their individual needs through data mining, which ensures the effectiveness of this organizational mode.

Collaborative subject + service platform + technician + farmer. This mode is similar to the one above and has the same advantages and features. The only difference is the participation of technical personnel in the offline communication channel. By working through offline channels, technical personnel ensure the effective delivery of information services. The quality and ability of the technical personnel will also have a certain impact on the efficiency of this organizational mode.

Rural information dissemination model

The information dissemination mode in the rural accurate information service model directly determines the communication between service providers and farmers, but it is not directly related to the information service providers themselves. The front-end information collection and processing, the information service platform, and the data mining function within the platform provide an important basis for selecting the dissemination mode of information services, as shown in Fig. 2.

Fig. 2 Information dissemination mode of precise information service.

Accurate push considering the differences in rural information environment. Due to the different advantages and disadvantages of the rural information environment, the dissemination of information services needs to consider regional differences, and selectively push different types of information to farmers in different regions.

Accurate push considering farmers’ individual needs. When pushing information services, the impact of different factors must be fully considered. For example, different content should be pushed to farmers of different ages; the form and quality of rural information services should take into account farmers’ educational level so that matching services are pushed; and because many farmers hold dual occupations, the information services they care about differ considerably, so pushes should be differentiated accordingly.

Dynamic and accurate push considering farmers’ information feedback. From the perspective of time, farmers’ information needs are clearly affected by seasonal factors. At the same time, as the information environment changes and farmers’ age and experience increase, their information needs will continue to change. From the perspective of the effects of information service application, farmers need to provide timely feedback on service results and satisfaction. Therefore, the service subject should adjust the information service content in a timely manner according to farmers’ needs and carry out dynamic and accurate push.

Construction of rural accurate information service model

Our review shows that, although current rural information service models at home and abroad are diverse, they still cannot meet farmers’ individualized needs. From the perspective of the service provider, services remain self-centered and consider “what can be provided” rather than “what the farmer needs”. From the perspective of the demand side, farmers’ demand for information exists objectively, the trend toward personalized demand is becoming more and more obvious, and farmers are taking a more cautious attitude towards information. From the perspective of rural information dissemination channels, operators provide basic network information platforms whose functions need to be further developed and improved. From the perspective of government policy, although the trend of rural informatization is irreversible under the rural revitalization strategy, targeted policies that guide the construction of digital villages and smart villages are still lacking. In view of these problems, and considering farmers’ individual information needs and their willingness to adopt information, we propose an innovative rural information service model based on data mining23,24.

Model building

The rural accurate information service model refers to the use of data mining technology, under the collaboration of the government, enterprises, and third parties, to analyze farmers’ personalized needs and accurately deliver the collected agricultural technology information, agricultural production information, market information, etc. to farmers. The model first collects agriculture-related data, then processes and analyzes it, and finally, guided by farmers’ needs, predicts and provides personalized rural information services. The collection and processing of data is the basis of the work and its most difficult part, because different types of data from multiple sources need to be considered. Information processing and analysis mainly use data mining tools to identify and predict farmers’ different needs and establish different service categories25. The information service provider then delivers accurate and effective personalized services to farmers and obtains their feedback in a timely manner, as shown in Fig. 3.

Fig. 3 Rural precision information service model based on data mining.

From the perspective of the effectiveness of the service model, accuracy is the biggest advantage of the model, and the design of data mining algorithms is the greatest technical guarantee to achieve the accuracy of rural information service supply. In the following, the technical support guarantee under this mode will be discussed from the process of rural information data collection, data preprocessing, algorithm design, and data mining.

Data collection and preprocessing

Data collection of farmers’ information needs

The study’s respondents are villagers from administrative villages. The data mainly come from two sources: first, county-level statistical yearbooks; second, survey data, which primarily include interviews with relevant personnel and questionnaire data collected from the sample. To facilitate data collection and ensure a uniform sample distribution, while also covering three types of terrain (plains, hilly areas, and mountains), this study selected 15 towns in a province in central China and 30 administrative villages within them, with a total of 1,543 respondents as the study sample.

Based on the research on farmers’ information needs and farmers’ willingness to accept information, the data table is designed as shown in Table 1. Data attributes are mainly divided into two categories: first, the basic information attributes of farmers, such as name, gender, and age; second, the types of farmers’ information needs, such as life, production, market, education, and entertainment.

Table 1 Contents of data collection.

Each row represents the types of information needs of one farmer and corresponds to a transaction, and each column corresponds to an item. Let \(I = \{ i_{1}, i_{2}, \cdots, i_{d} \}\) be the set of all items in the farmers’ demand information, and let \(T = \{ t_{1}, t_{2}, \cdots, t_{N} \}\) be the set of all transactions. Each transaction \(t_{i}\) is a subset of \(I\).
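The mapping from the binary data table to transactions can be sketched as follows. This is a minimal illustration only: the item names in `ITEMS` and the sample rows are hypothetical and are not taken from Table 1 or from the survey data.

```python
# Hypothetical item set I (columns of the binary data table); illustrative only.
ITEMS = ["life", "production", "market", "education", "entertainment"]

# Each row is one farmer: 1 means the farmer reported that type of information need.
binary_rows = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
]

def rows_to_transactions(rows, items):
    """Convert each binary row into a transaction t_i, which is a subset of I."""
    return [frozenset(item for item, flag in zip(items, row) if flag) for row in rows]

transactions = rows_to_transactions(binary_rows, ITEMS)
for t in transactions:
    print(sorted(t))
```

Representing each transaction as a set makes the later subset tests used for support counting a single comparison.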

Data preprocessing

Information collection is a very important task. After data collection, we introduce the concept of association rules and then, through association analysis, construct the FP-tree algorithm to preprocess the information data.

Association rules reflect the interdependence and association between one thing and other things, and they are one of the main technologies of data mining. In essence, association rules are used to find frequent patterns, associations, correlations, or causal structures between itemsets or object sets in transaction data or other information carriers. Shopping basket analysis is a typical application of association rules, for example in promotion classification and cross marketing. These applications discover association rules from data, identify the relationships between certain human behaviors and items, and finally describe specific local patterns. Association rules are also applicable to other fields, such as bioinformatics, medical diagnosis, web mining, and scientific data analysis26.

Association rule mining is defined as follows. For a binary data table, \(I = \{ i_{1}, i_{2}, \cdots, i_{d} \}\) is the set of all items and \(T = \{ t_{1}, t_{2}, \cdots, t_{N} \}\) is the set of all transactions, where each transaction \(t_{i}\) is a subset of \(I\). In association analysis, a set containing zero or more items is called an itemset. An association rule is an expression of the form X → Y, where X and Y are disjoint itemsets; X is called the antecedent and Y the consequent of the rule. Two measures, support and confidence, are usually used to describe an association rule. Support determines how often the rule applies to a given data set, and confidence determines how frequently Y occurs in transactions that contain X. With \(\sigma(\cdot)\) denoting the support count of an itemset, that is, the number of transactions that contain it, support (s) and confidence (c) are defined as follows:

$$s(X \to Y) = \frac{\sigma (X \cup Y)}{N},c(X \to Y) = \frac{\sigma (X \cup Y)}{{\sigma (X)}}$$
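As a small worked example of these two measures, the sketch below computes them directly from the definitions. The four transactions are made-up illustrative data, and the function names are our own.

```python
def support_count(itemset, transactions):
    """sigma(X): the number of transactions that contain every item of X."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    """s(X -> Y) = sigma(X u Y) / N."""
    return support_count(set(X) | set(Y), transactions) / len(transactions)

def confidence(X, Y, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X); assumes X occurs at least once."""
    return support_count(set(X) | set(Y), transactions) / support_count(X, transactions)

# Illustrative transactions (not survey data).
transactions = [
    frozenset({"life", "production"}),
    frozenset({"production", "market"}),
    frozenset({"life", "production", "market"}),
    frozenset({"market"}),
]

s = support({"production"}, {"market"}, transactions)     # 2 of 4 transactions
c = confidence({"production"}, {"market"}, transactions)  # 2 of the 3 containing "production"
print(s, c)
```

Here the rule production → market holds in two of the four transactions, and in two of the three transactions that contain "production".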

In the process of mining association rules, if support and confidence thresholds are not considered, an enormous number of association rules can be found in the database. In practice, however, we need association rules of practical significance that reflect the implicit regularities in the data, so minimum values must be set, namely the minimum support and the minimum confidence. Association rule mining can be divided into two stages: the first stage finds all frequent itemsets in the data set, and the second stage generates strong association rules from these frequent itemsets27.

The key to association mining is the mining of frequent itemsets, whose purpose is to find all frequent itemsets. An itemset is called frequent when its support is not less than the minimum support. After the frequent itemsets are identified, the corresponding association rules can be derived directly. Compared with the first stage, the second stage is easier: it uses the frequent itemsets obtained in the previous step to generate rules, and if the confidence of a rule meets the minimum confidence, the rule is called a strong association rule. There are many algorithms that can extract frequent itemsets28. These algorithms adopt two typical strategies: one is to reduce the search space of candidate set combinations through an effective pruning strategy (the Apriori technique), and the other is to use a compressed data representation to speed up the core computation (the FP-tree technique).
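The two stages can be illustrated with the following sketch. Stage 1 is done here by brute-force enumeration purely for readability; it is not the Apriori or FP-tree strategy discussed in this paper, and the thresholds and transactions are illustrative assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Stage 1 (brute force, illustration only): itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                result[frozenset(combo)] = count
    return result

def strong_rules(freq, min_confidence):
    """Stage 2: split each frequent itemset into X -> Y and keep the strong rules."""
    rules = []
    for itemset, count in freq.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                X = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so freq[X] exists.
                conf = count / freq[X]
                if conf >= min_confidence:
                    rules.append((X, itemset - X, conf))
    return rules

# Illustrative transactions (not survey data).
transactions = [
    {"life", "production"},
    {"production", "market"},
    {"life", "production", "market"},
    {"market"},
]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = strong_rules(freq, min_confidence=0.6)
for X, Y, conf in rules:
    print(sorted(X), "->", sorted(Y), round(conf, 2))
```

The second stage needs no further database scan, because every count it uses was already collected in the first stage.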

The FP-tree is a prefix tree composed of a frequent-item header table and an item prefix tree. Items are arranged in descending order of support: the higher the support, the closer a frequent item is to the root node, so that more frequent items can share prefixes. The FP-growth algorithm compresses the information in the transaction database by building an FP-tree (Frequent Pattern tree), thereby generating frequent itemsets more efficiently.

The root node of the initial FP-tree is null and carries no item. The transaction database is then scanned to build the tree. For each transaction, the items whose support is not less than the support threshold are inserted as nodes along a path from the root; when a node for the same item already exists on that path, its support count is incremented by one. After the FP-tree is constructed in this way, frequent itemsets can be mined by building conditional pattern bases. (The authors declare that all experiments in this paper were approved by the E-commerce Vocational Education Group of Hebei Province, China, and the experimental methods applied conformed to the regulations of the group on scientific experimental methods).
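A minimal sketch of this construction is given below, following the standard two-scan FP-tree build. The class and function names, the alphabetical tie-breaking between equally frequent items, and the sample transactions are assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item identifier (None for the root)
        self.count = 0          # number of transactions passing through this node
        self.parent = parent    # parent node pointer
        self.children = {}      # item -> child FPNode
        self.link = None        # next node carrying the same item (item chain)

def build_fp_tree(transactions, min_count):
    # First scan: count items and keep those that meet the minimum support count.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {}  # header table: item -> first node of its item chain
    # Second scan: insert each transaction with items in descending support order.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                # Append the new node to the end of the item's chain.
                if item not in header:
                    header[item] = child
                else:
                    last = header[item]
                    while last.link is not None:
                        last = last.link
                    last.link = child
            child.count += 1  # shared prefix: only the count is incremented
            node = child
    return root, header

# Illustrative transactions (not survey data).
transactions = [
    ["production", "market", "life"],
    ["production", "market"],
    ["production", "education"],
    ["market", "life"],
]
root, header = build_fp_tree(transactions, min_count=2)
print({item: node.count for item, node in root.children.items()})
```

With a minimum support count of 2, "education" is dropped in the first scan, and transactions that share a prefix reuse the same nodes.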

Analysis of improved algorithm based on FP-tree

Algorithm improvement

In order to increase the mining speed of the Apriori algorithm based on the FP-tree, the improvement in this paper aims to further reduce the amount of scanned data. The following methods are used. First, the partitioning method of the algorithm is changed to tail-item partitioning, which yields smaller sub-data sets. Second, based on the nature of the Apriori algorithm, transactions in a sub-data set whose length is smaller than the itemset size of the current iteration are redundant and are deleted dynamically, which further reduces the data in the sub-data set12. Finally, the sub-data set is scanned to quickly count the support of each candidate set and determine whether it is a frequent itemset, so as to achieve rapid mining.

The original Apriori algorithm uses the iterative idea of layer-by-layer search to mine the frequent itemsets needed for association rules. Its operation is mainly divided into the following three steps:

Generate candidate sets. Take any two frequent K-itemsets, in each of which the items are arranged in order, and join them to generate a candidate (K + 1)-itemset;

Candidate set pruning. The anti-monotonicity of frequent itemsets is used: if a candidate (K + 1)-itemset contains any infrequent item or itemset, it can be directly determined to be infrequent; otherwise, the database must be traversed to count its support and verify whether it is frequent;

Support statistics. Scan the data set and accumulate the number of occurrences of each candidate itemset in the data set. Finally, the frequent (K + 1)-itemsets are generated according to the given minimum support count threshold.
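The three steps above can be sketched for a single iteration as follows. The join condition used here (two sorted K-itemsets are joined only when they share their first K − 1 items) is the standard Apriori join and is our assumption; the function names and the data are illustrative.

```python
from itertools import combinations

def generate_candidates(frequent_k):
    """Step 1: join two sorted frequent K-itemsets that share their first K-1 items."""
    prev = sorted(frequent_k)
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:
            candidates.append(a + (b[-1],))
    return candidates

def prune(candidates, frequent_k):
    """Step 2: drop any candidate that has an infrequent K-subset (anti-monotonicity)."""
    known = set(frequent_k)
    return [c for c in candidates
            if all(s in known for s in combinations(c, len(c) - 1))]

def count_support(candidates, transactions, min_count):
    """Step 3: scan the data set and keep candidates that meet the minimum support count."""
    result = {}
    for c in candidates:
        n = sum(1 for t in transactions if set(c) <= t)
        if n >= min_count:
            result[c] = n
    return result

# Illustrative transactions and frequent 2-itemsets (itemsets are sorted tuples).
transactions = [
    {"life", "market", "production"},
    {"life", "production"},
    {"market", "production"},
    {"life", "market", "production"},
]
frequent_2 = [("life", "market"), ("life", "production"), ("market", "production")]
candidates_3 = prune(generate_candidates(frequent_2), frequent_2)
frequent_3 = count_support(candidates_3, transactions, min_count=2)
print(frequent_3)
```

Only candidates that survive pruning reach the scan in step 3, which is where most of the cost of the original algorithm lies.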

The support number calculation follows a theorem when determining frequent itemsets: If the length of a transaction in the database is K, then this transaction cannot contain any frequent itemsets with a number of items greater than K.

Corollary: If the length of a transaction is less than K + 1, the transaction can be ignored in the support count statistics of the candidate (K + 1)-itemsets.
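The corollary translates into a simple filter that can be applied before each counting pass. The sketch below is illustrative; the function names and sample transactions are our own.

```python
def reduce_transactions(transactions, k):
    """Keep only transactions long enough to contain a candidate (K+1)-itemset."""
    return [t for t in transactions if len(t) >= k + 1]

def support_count(candidate, transactions):
    """Number of transactions that contain every item of the candidate."""
    return sum(1 for t in transactions if set(candidate) <= t)

# Illustrative transactions (not survey data).
transactions = [
    {"production"},
    {"life", "production"},
    {"life", "market", "production"},
    {"education", "life", "market", "production"},
]
candidate = ("life", "market", "production")   # a candidate 3-itemset, so K = 2
kept = reduce_transactions(transactions, k=2)

# The support count is unchanged, but fewer transactions are scanned.
print(len(transactions), len(kept))
print(support_count(candidate, transactions), support_count(candidate, kept))
```

Because the dropped transactions are too short to contain any (K + 1)-itemset, removing them cannot change any support count at this or any later iteration.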

The specific improvement methods based on FP-tree structure are as follows:

The FP-tree is a tree data structure that compresses data. Each node corresponds to an item and consists of four fields: the item identifier, the support count of the transactions passing through the node, the item chain, and the parent node pointer. In addition, to facilitate operations on the tree, a header table is needed that records two fields, the item identifier and the head of the item chain. In the FP-tree diagram, the solid lines are parent node pointers and the dotted lines are item chains.

Tail element partitioning: The FP-tree is searched using the tail element of the candidate set, and the original data set is divided into several sub-data sets, thereby reducing the amount of data. Only one copy of each sub-data set is kept in memory, and once generated it does not need to be generated again. The specific process of tail element partitioning is shown in Fig. 4.

Fig. 4

Schematic diagram of FP-tree structure.

If the candidate set has tail element \(t_{k}\), the sub-data set of \(t_{k}\) is generated as follows. Look up \(t_{k}\) in the header table to obtain the head of its item chain (node-head), and use the node-head to find the first node \(t_{k1}\) in the FP-tree whose item identifier (item-name) is \(t_{k}\). Assuming that the prefix branch of the FP-tree on which node \(t_{k1}\) lies has the general form shown in Fig. 5, the information of all nodes on the branch can be obtained by traversing it, and the item-names of the nodes on the branch are collected into a vector. The general form of the branch vector is: \(k = [t_{1} ,t_{2} \cdots t_{k} ]\).

Fig. 5

FP-tree branch.

The branch vector of \(t_{k1}\) is obtained as \(k_{1}\), and the support count of node \(t_{k1}\) is recorded as \(S_{1}\). The next node \(t_{k2}\) for item \(t_{k}\) is then found through the node-link of \(t_{k1}\); the vector \(k_{2}\) of the branch on which \(t_{k2}\) lies is generated in the same way, and its support count is recorded as \(S_{2}\). Traversal continues along the node-link of the current node until the last node \(t_{ki}\) is reached, yielding branch vector \(k_{i}\) and support count \(S_{i}\). Collecting all the branch vectors gives the sub-data set of element \(t_{k}\): \(M_{k} = \{ k_{1} ,k_{2} , \cdots k_{i} \}\). The corresponding support data set is \(S_{k} = \{ S_{1} ,S_{2} , \cdots S_{i} \}\). \(S_{k}\) records the support count of each branch in the sub-data set and is used in the fast support counting described below.
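The traversal that produces \(M_{k}\) and \(S_{k}\) can be sketched as follows. The sketch is self-contained, so it includes a compact tree builder; the names `Node`, `build_tree`, and `tail_partition`, the toy transactions, and the alphabetical tie-break in the item ordering are assumptions for illustration only.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.node_link, self.children = 0, None, {}


def build_tree(transactions, min_count):
    """Build an FP-tree and return its header table (item -> chain head)."""
    freq = Counter(i for t in transactions for i in set(t))
    kept = sorted((i for i in freq if freq[i] >= min_count),
                  key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(kept)}
    root, header, last = Node(None, None), {}, {}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                child = node.children[item] = Node(item, node)
                if item in last:
                    last[item].node_link = child   # extend the item chain
                else:
                    header[item] = child           # head of the item chain
                last[item] = child
            node = node.children[item]
            node.count += 1
    return header


def tail_partition(header, tail_item):
    """Return (M_k, S_k): branch vectors ending in tail_item and their counts."""
    branches, supports = [], []
    node = header.get(tail_item)
    while node is not None:
        path, cur = [], node
        while cur.item is not None:     # climb parent pointers up to the root
            path.append(cur.item)
            cur = cur.parent
        branches.append(path[::-1])     # root-to-tail order: [t_1, ..., t_k]
        supports.append(node.count)     # S_i of this branch
        node = node.node_link           # next node with the same item
    return branches, supports


header = build_tree([["f", "a", "b"], ["f", "a"], ["f", "b"], ["a", "b"]], 2)
M_f, S_f = tail_partition(header, "f")
```

With these hypothetical transactions the items are ordered a, b, f, so the sub-data set for tail element f contains the branches [a, b, f], [a, f], and [b, f], each with support count 1.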

Through tail element partitioning, the data that must be scanned for frequent itemsets with the same tail element are grouped into one sub-data set, which avoids scanning irrelevant data. When there are many frequent itemsets, the amount of data is reduced more, and the mining speed of the algorithm improves accordingly.

Dynamic data reduction: Candidate sets are generated iteratively from low to high dimensions. When the length of a transaction (its number of items) is less than the dimension of the current iteration, the transaction cannot contain any frequent itemset of that dimension. It can therefore be ignored in the support statistics and is deleted from the sub-data set, which reduces the amount of data scanned. The dynamic reduction of the sub-data set is described as follows:

Given that the sub-data sets have been generated, to count the support of a candidate set \(C_k = [t_{1}\quad t_{2} \cdots t_{k} ]\), select the last item \(t_{k}\) of \(C_{k}\) and find the sub-data set corresponding to \(t_{k}\), such as the sub-data set \(M_{k}\) in Eq. (2). If the dimension of the current iteration is j, delete from \(M_{k}\) every transaction with fewer than j items to obtain a new \(M_{k}\) with a smaller amount of data.

Dynamic data reduction operates on the sub-data set already reduced in the previous layer. The higher the iteration count, the more redundant data can be deleted, and the more pronounced the improvement in mining speed.
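As a rough sketch of this reduction, the function below drops every branch of a sub-data set that has fewer than j items and keeps the support list aligned with the surviving branches. The name `reduce_sub_dataset` and the toy values are hypothetical.

```python
def reduce_sub_dataset(branches, supports, j):
    """Drop branches with fewer than j items; keep supports aligned."""
    kept = [(b, s) for b, s in zip(branches, supports) if len(b) >= j]
    return [b for b, _ in kept], [s for _, s in kept]


# Hypothetical sub-data set M_k with its support data set S_k.
M_k = [["a", "b", "f"], ["a", "f"], ["f"]]
S_k = [1, 1, 2]

# Each layer works on the result of the previous layer, so data only shrinks.
M_k, S_k = reduce_sub_dataset(M_k, S_k, 2)   # iteration dimension j = 2
M_k, S_k = reduce_sub_dataset(M_k, S_k, 3)   # iteration dimension j = 3
```

After the j = 2 pass the single-item branch [f] is gone, and after the j = 3 pass only [a, b, f] remains, so later iterations scan progressively less data.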

Fast support counting: Traverse the new \(M_{k}\) and compare each branch vector with \(C_{k}\). If \(k_{i}\) contains \(C_{k}\), the branch is considered to contain the candidate set \(C_{k}\), and the support count in \(S_{k}\) corresponding to this branch is accumulated. If n branches contain the candidate set \(C_{k}\), the support count of \(C_{k}\) is given by Eq. (3): \(S = \sum\limits_{i = 1}^{n} {S_{i} }\). If \(S\) is greater than the minimum support number, the candidate set \(C_{k}\) is a frequent item set.
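The summation over matching branches can be sketched in a few lines. The function name `fast_support` and the toy sub-data set are assumptions for illustration; the sketch returns only the support count and leaves the comparison with the minimum support number to the caller.

```python
def fast_support(candidate, branches, supports):
    """Sum S_i over every branch k_i that contains all items of the candidate."""
    cand = set(candidate)
    return sum(s for b, s in zip(branches, supports) if cand.issubset(b))


# Hypothetical sub-data set for tail element f and its support counts.
M_f = [["a", "b", "f"], ["a", "f"], ["b", "f"]]
S_f = [1, 1, 1]

support_af = fast_support(["a", "f"], M_f, S_f)        # branches 1 and 2 match
support_abf = fast_support(["a", "b", "f"], M_f, S_f)  # only branch 1 matches
```

Because each branch already carries its accumulated count, the support of a candidate is obtained from the small sub-data set alone, without rescanning the original transactions.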

Through the above ideas and methods, the improved algorithm proposed in this paper increases the mining speed. In particular, for data sets containing a large number of high-dimensional frequent itemsets, the amount of scanned data is greatly reduced and the mining speed is significantly improved29.

Improved algorithm experiment simulation analysis

In this section, the self-built database, collected and processed using questionnaires, is analyzed with both the original Apriori algorithm and the improved Apriori algorithm based on FP-Tree. The two algorithms are compared in two respects: their running time on data of different scales under the same conditions, and their running time under different minimum support thresholds (tolerances). To reduce experimental error, the results of multiple runs are averaged for comparative analysis.

Comparison of running time at different data scales. With the support threshold and the number of cluster nodes kept unchanged, the data set is divided into six subsets of different sizes, and each subset is tested separately to measure the running time, as shown in Fig. 6.

Fig. 6

Comparison of running time of the two algorithms under different data scales.

It can be seen from Fig. 6 that when the amount of data is small, most of the running time is spent on parallel job scheduling. The running times of the Apriori algorithm and the FP-Tree algorithm are therefore basically the same, and neither algorithm has an obvious advantage. As the amount of data gradually increases, the difference between the two becomes apparent: the running time of the FP-Tree algorithm is much shorter than that of the Apriori algorithm. As the amount of data continues to increase, the Apriori algorithm runs into problems such as insufficient memory before the FP-Tree algorithm does, which causes it to fail. With a fixed amount of memory, the FP-Tree algorithm can process several times more data than the Apriori algorithm. The experimental results are therefore basically consistent with the expected advantages, indicating that the improved algorithm is significantly better than the original algorithm.

Comparison of running time under different tolerances. This experiment changes only the preset minimum support of the algorithm and explores the efficiency of the two algorithms under different support thresholds. The data size is kept the same, and both algorithms are executed in the same distributed environment with 3 cluster nodes. The experimental results are shown in Fig. 7.

Fig. 7

Comparison of the running time of the two algorithms under different tolerances.

It can be seen from Fig. 7 that as the minimum support increases, the overall running time of the algorithms decreases continuously, but the rate of decrease gradually levels off. This is because a higher minimum support yields fewer frequent itemsets, so the computation time also decreases. Overall, when the data set is kept at a fixed size, the running time of the FP-Tree algorithm is always less than that of the Apriori algorithm regardless of the support threshold, which also verifies the superiority of the improved algorithm.

Simulation experiment

The database used in this study is a self-constructed database composed of county annual data and survey data. The experiments use a high-performance hardware configuration and a stable operating system environment to ensure accuracy and repeatability. The computer used for the experiments is equipped with an Intel Core i7-14700KF processor, 32 GB of DDR5 6000 MHz memory, and a 1 TB NVMe M.2 solid-state drive. The operating system is 64-bit Windows 11, and the software environment used during the experiments is IBM SPSS Modeler 18.0. This setup supports objective and accurate testing of algorithm efficiency.

For the types of farmers’ information needs, the improved Apriori algorithm is used to mine association rules between different types of information, and the validity of each rule is judged by its support and confidence.

According to the principle of the algorithm, the data records in the constructed farmers’ information demand data set are searched for frequent itemsets. Table 2 shows the types of information required by different farmers.

Table 2 Farmers’ demand for different types of information.

Table 2 shows the transactional database of farmers’ information needs, where a, b, …, m represent the types of information required by farmers. First, the transactional database is scanned to calculate the support of each information type across the records. The items are then arranged in descending order of support; only frequent items are retained, and items below the support threshold are removed. The support threshold in this paper is taken as 3, which gives <(f:4), (a:4), (b:3), (c:3), (m:3), (k:3)>. The third column of Table 2 shows the sorted result.
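The counting, filtering, and reordering step described above can be sketched as follows. The transactions here are hypothetical toy data, not the contents of Table 2, and the function names `frequent_items_sorted` and `reorder`, the alphabetical tie-break, and the use of `>=` against the threshold are assumptions for illustration.

```python
from collections import Counter


def frequent_items_sorted(transactions, min_count):
    """Count each item once per record, keep frequent ones, sort by support."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(i, c) for i, c in counts.items() if c >= min_count]
    # Descending support; alphabetical order breaks ties deterministically.
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))


def reorder(transaction, ranked):
    """Drop infrequent items from a record and sort the rest by support rank."""
    rank = {item: r for r, (item, _) in enumerate(ranked)}
    return sorted((i for i in set(transaction) if i in rank), key=rank.get)


# Hypothetical records; each letter stands for one type of information.
records = [["a", "b", "d"], ["a", "b"], ["a", "c"], ["b", "c", "e"]]
ranked = frequent_items_sorted(records, min_count=2)
sorted_records = [reorder(r, ranked) for r in records]
```

With a threshold of 2, the items d and e are removed from these toy records and every remaining record lists its items in the same global order, which is the form needed before the records are inserted into the FP-tree.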

For the frequent itemset {f,a,b} in the results of the association rule algorithm, the non-empty proper subsets are {f}, {a}, {b}, {f,a}, {a,b}, and {f,b}. Six association rules are therefore obtained, and the corresponding confidence and support are calculated. The specific rules are shown in Table 3.
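The enumeration of rules from one frequent itemset can be sketched as follows. Each non-empty proper subset is used as an antecedent and the remaining items form the consequent; the confidence is the support count of the whole itemset divided by that of the antecedent. The transactions are hypothetical toy data rather than the data behind Table 3, the function names are assumptions, and the sketch assumes the itemset occurs in at least one transaction.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t))


def rules_from_itemset(itemset, transactions):
    """Return (antecedent, consequent, support, confidence) for each rule."""
    items = sorted(itemset)
    total = support_count(items, transactions)
    rules = []
    for r in range(1, len(items)):              # sizes of proper subsets
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            confidence = total / support_count(antecedent, transactions)
            support = total / len(transactions)
            rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical transactions used only to exercise the sketch.
toy = [["f", "a", "b"], ["f", "a", "b"], ["f", "a"], ["a", "b"], ["f"]]
rules = rules_from_itemset({"f", "a", "b"}, toy)
```

For a three-item itemset the loop produces six rules, matching the six non-empty proper subsets listed above; a rule would then be kept or discarded by comparing its support and confidence with the chosen thresholds.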

Table 3 Association rules between types of information.

Therefore, applying the improved FP-tree-based Apriori algorithm to the constructed rural information demand data can effectively mine the associations between different types of information, and each rule is adopted or rejected according to its confidence and support. At the same time, the types of information that farmers may need in the future can be predicted from the types they currently need, so that accurate information services are realized based on the data mining results.

Result analysis

The improved FP-tree-based Apriori algorithm establishes the basis for identifying farmers’ personalized information needs. According to farmers’ actual demand for information, association rule analysis is used to find the associations among all agriculture-related information categories and to establish association rules based on confidence.

The establishment of the above rules provides technical support for the precise model of rural information services, greatly improves the accuracy of the information delivered to farmers, and realizes the provision of information services on demand and according to the information environment. At the same time, because the time complexity of the original association rule algorithm is high, which may reduce data mining efficiency on large amounts of data, this paper optimizes and improves Apriori in advance to avoid this problem.

The accuracy of the results reached over 90% when compared with the interview data from the respondents. This not only significantly improved the efficiency of matching information supply and demand, but also ensured the precision of the information provided.

Conclusion

Firstly, the types of foreign rural information service models are analyzed and compared. It is found that the information service models of different countries differ considerably, and this diversity provides a reference for China’s rural information services. Secondly, the domestic rural information service models are sorted out and summarized, and the problems in China’s rural information services are identified, mainly including the lack of targeted rural information services, farmers’ own limitations, the high cost of rural information services, and the lack of a long-term mechanism for rural information services. This part both supplements the account of the current supply of rural information services and continues that study. Finally, according to farmers’ information needs and willingness to adopt information, an accurate information service model with a feedback mechanism based on data mining is proposed. It mainly uses data mining methods to predict the needs of different farmers and provide differentiated information based on those needs. At the same time, when farmers receive the information, they can provide feedback, which realizes two-way communication and allows information providers to dynamically adjust the types and methods of information services. Through the accurate information service model, the value of rural data can be brought into full play to realize the effective use of information.