Residential real estate price prediction based on adaptive loss function and feature embedding optimization

Zhang, Hongqin

doi:10.1057/s41599-025-05217-9


Article
Open access
Published: 16 June 2025


Humanities and Social Sciences Communications volume 12, Article number: 832 (2025) Cite this article


This article has been updated

Abstract

Real estate is a crucial sector underpinning national economic growth and social stability, so accurate prediction of real estate prices is important for directing regional development planning and refining resource allocation. To align with the dynamic nature of the real estate market, we propose a real estate price prediction model that leverages an adaptive loss function and optimized feature embedding. First, we use diverse real estate factors to develop a representation method for real estate prices, rooted in feature embedding optimization, that examines the interconnections among these factors. Second, we introduce a reinforcement learning approach incorporating an adaptive loss function to emphasize the significance of each factor and facilitate accurate price predictions. Experimental results demonstrate that our method achieves state-of-the-art performance, providing a robust data foundation for the real estate market and enhancing price forecasting accuracy, which benefits investors, developers, and policymakers by improving market analysis and investment decision-making. This study advances the field of real estate price prediction by offering a novel approach to dynamic factor weighting. Our findings can support urban planners and government agencies in formulating more effective housing policies and resource allocation strategies.


Introduction

Real estate is a pivotal industry underpinning the national economy, and its price fluctuations and stable development directly affect the overall operation of that economy. The relationship between demand and supply in the real estate market has become increasingly complex, and price changes have become more frequent and harder to predict. Accurate prediction of real estate price (REP) can help investors judge when to enter the market, reduce investment risks, and improve investment returns. It can also help developers grasp market dynamics, formulate reasonable development strategies and sales plans, and improve the profitability and market competitiveness of their projects (Sharma et al., 2024; Song and Ma, 2024). Studying REP prediction models is therefore of great practical importance and helps to promote the housing market.

Research on REP forecasting currently faces several technical difficulties. The first lies in the many factors that influence REP, which cover macroeconomic conditions, fluctuations in the policy environment, the delicate balance between market demand and supply, the uniqueness of geographical location, and the quality of surrounding supporting facilities (Møller et al., 2024). These factors interact intricately (Wu, 2024), and some of them are difficult to quantify directly because of their abstract or dynamic nature, which is an obstacle to building accurate prediction models. Second, the relationships between REP and these influencing factors are nonlinear rather than linear (Nirmala et al., 2024; Millar and White, 2024). The prediction model therefore needs enough flexibility and learning capacity to capture nonlinear patterns in the data, which poses a severe challenge to traditional prediction methods (Rey-Blanco et al., 2024). Finally, acquiring and processing real estate market data is also a major challenge. Data incompleteness and lag are particularly prominent, for example the absence or delay of key data on land and housing transactions, which makes it difficult for the model to reflect market dynamics comprehensively and in a timely manner (Li, 2023; Mao, 2023). Such defects in data quality limit the prediction model and affect the accuracy and timeliness of its results (Liu and Ma, 2024; Yang et al. 2024; Yang et al. 2024).

To solve the above problems, many real estate price prediction models have been proposed. Wheeler et al. (2014) discussed the differences between the functional forms used in the characteristic price model for evaluating housing prices, and found that the Bayesian method achieves better results. Wei et al. (2022) employed big data to improve real estate evaluation in the characteristic model, drawing on 124 studies. Guirguis et al. (2024) proposed an autoregressive model to predict the house price index, and their model surpassed the autoregressive moving average model and the GARCH model in out-of-sample empirical prediction. Zhao et al. (2019) established an autoregressive integrated moving average (ARIMA) model based on the training and validation method, and used it to forecast house prices in New Zealand. The results demonstrate that the performance of ARIMA often surpasses that of multiple linear regression models (Soltani and Lee, 2024). Zulkifley et al. (2020) employed SVM to build a price prediction model and used a Genetic Algorithm (GA) to optimize the parameter selection of the SVM model. The empirical analysis showed that the model predicted well. Fang (2022) used a BP network for the price prediction of auction houses (Zhu and Li, 2021) and used GA to optimize the model. Alfaro-Navarro et al. (2020) applied a variety of ensemble algorithms to predict housing prices in Spain. Wang et al. (2021) introduced a novel network based on the Bagging ensemble method and macroeconomic data, and predicted the housing price index of the four municipalities directly under the central government of China. These methodologies, including deep learning approaches, have made significant improvements in model structure and provide directions for feature processing enhancements.

In terms of model structure, deep learning methods such as convolutional neural networks (CNNs), recurrent neural networks, and their variants have been explored to improve the predictive capabilities of housing price models. These architectures allow for the extraction of hierarchical features from the input data, enabling the models to learn more complex representations of the housing market dynamics.

Although current REP prediction models have made notable progress in exploration and practice, they continue to encounter significant challenges (Liu, 2022; Khrais and Shidwan, 2023; Rampini and Re Cecconi, 2022), particularly in predicting prices accurately under the combined influence of numerous factors. These factors are complex and intertwined, so a traditional deep learning model cannot express their relationships fully and accurately. To handle these diverse influencing factors more effectively, we propose a REP prediction model based on the adaptive loss function (ALF) (Baik et al., 2021) and feature embedding optimization. First, because the influence of different factors on REP may change dynamically with time, market conditions, and other circumstances, we design a loss function that automatically adjusts its weights. This adaptive mechanism enables the model to respond flexibly to changes in the importance of various factors during training. Second, to process and integrate the multi-dimensional features in REP prediction, we transform high-dimensional, sparse original features into low-dimensional, dense vector representations by feature embedding, while retaining key information. We further optimize the feature embedding process so that the model can mine and use the potential relationships between these features to improve prediction accuracy.

Related works

REP prediction is a hot research topic now, and many excellent results have been achieved. Numerous studies have been conducted to investigate the various influencing factors of REP. Ganioğlu and Seven (2021) took the developing country Turkey as an analysis sample and found that the price in Turkey was influenced by the inflow of income, population, education, unemployment, and refugees, and the housing price in Turkey showed long-term convergence. Churchill et al. (2018) delved into the convergence patterns exhibited by residential house prices across the capital cities of Australian states and subsequently constructed a nonlinear model to capture the intricacies of house price dynamics. The results showed that the house prices in Australian states did not converge. Duca et al. (2021), on the other hand, established a connection between the housing market and various aspects such as credit markets, broader economic phenomena and so on (Turnbull and Zheng, 2021; Turnbull et al., 2018; Mathur, 2017), illustrating the interconnectedness and far-reaching implications of housing market dynamics.

For the prediction of REP, Peng et al. (2023) used GA as the entry point to improve the model. In the actual data analysis and application, it was empirically found that the improved model has better valuation accuracy. Zhao et al. (2024) improved the BP network upon the fruit fly and frog-leaping algorithm, and the valuation effect of the model was greatly improved. Gabauer et al. (2024) employed the high-dimensional sparse vector autoregressive model to predict the REP of 35 cities, which could better mine the key explanatory variables and economic information, and had a better prediction effect. Jiang et al. (2023) found that the use of web crawler technology can identify key factors that can significantly impact housing prices, and the prediction accuracy of housing prices can be improved by combining Internet data with a VAR model. Liang (2023) proved through research that ARIMA model can make continuous predictions for the price of the Chinese second-hand housing market, which provides a certain reference basis for buyers and sellers in China’s real estate market. Lorenz et al. (2023) constructed random forest, boosting, and Bagging models based on the advanced network search data to predict the REP. Comparative analysis showed that the random forest model combined with network search data had the best prediction effect. Du et al. (2014) combined an SVR model with linear regression to predict the REP, and proved that the fusion model had higher prediction accuracy and better fitting effect than a single model. Peng et al. (2023) introduced the Barbara method into SVR, so that it could adjust the three parameters to the maximum extent, so as to establish the BA-SVR&WSD prediction model.

Methodology

We propose a REP prediction model based on the ALF and feature embedding optimization for the realistic situation in which REP is affected by many complex and intertwined factors. The core of this model lies in two innovations. First, an optimized feature embedding framework (FEF) is constructed to analyze and characterize the multi-dimensional factors affecting housing prices. Second, a reinforcement learning strategy with the ALF is introduced to dynamically adjust the optimization direction of the model during training and make the prediction results more robust.

First, through mining and analysis of extensive real estate market data, we employ feature embedding techniques to transform the many factors influencing housing prices (such as geographical location, surrounding environment, building quality, and policy regulations) into quantifiable dense feature vectors in a lower-dimensional embedding space. This achieves efficient representation and dimensionality reduction of the factors affecting housing prices. This process not only reduces the complexity of the data but also improves the ability of the model to capture how housing prices change. Second, to increase accuracy, we introduce the ALF. Unlike a traditional fixed loss function, the adaptive mechanism dynamically adjusts the parameters of the loss function according to the current performance and prediction error, so that the model can adjust its optimization strategy when facing different types of house price fluctuations, reduce the prediction bias, and achieve more accurate house price prediction.
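The feature embedding step described above can be illustrated with a minimal sketch. It assumes a single categorical factor with 50 possible values (a hypothetical neighbourhood identifier) and a 4-dimensional embedding; the table is randomly initialised here, whereas in a trained model its entries would be learned. It is not the paper's implementation, only a demonstration that looking up a row of the table is equivalent to multiplying a sparse one-hot vector by the table.

```python
import random

def one_hot(index, size):
    """Sparse indicator vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def make_embedding(num_categories, dim, seed=0):
    """Randomly initialised embedding table (learned during training in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_categories)]

def embed(index, table):
    """Dense vector for a category: row `index` of the table."""
    return list(table[index])

def one_hot_times_table(vec, table):
    """The same vector computed the long way, as one-hot vector times table."""
    dim = len(table[0])
    return [sum(vec[r] * table[r][c] for r in range(len(table)))
            for c in range(dim)]

NUM_NEIGHBOURHOODS = 50   # hypothetical number of categories
EMBED_DIM = 4             # hypothetical embedding size

table = make_embedding(NUM_NEIGHBOURHOODS, EMBED_DIM)
dense = embed(7, table)                                   # 4 numbers instead of 50
same = one_hot_times_table(one_hot(7, NUM_NEIGHBOURHOODS), table)
```

Here a 50-dimensional sparse input becomes a 4-dimensional dense vector, which is the kind of compression the embedding step relies on.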

In summary, by integrating the dual advantages of feature embedding optimization and ALF reinforcement learning, the proposed model not only overcomes the limitations of traditional REP prediction methods in dealing with complex factors, but also significantly improves the accuracy and practicability of the prediction, which provides strong support for real estate market analysis, investment decision-making and policy-making.

Fused feature embedding

To extensively integrate various conditions that affect the real estate and understand the internal factors of REP changes, we propose a REP representation method with feature embedding optimization to achieve a deep understanding of REP characteristics.

Because so many factors affect the REP, the model suffers from vanishing gradients: it captures only the local relationships among factors, cannot effectively learn long-range dependencies, and retains little memory of the elements at the front of the sequence. Therefore, we redesign the bidirectional GRU (B-GRU) network (RoselinKiruba et al., 2024) to extract the sequence features composed of multiple factors. The network is shown in Fig. 1. It contains a forward part and a reverse part, each of which is a separate GRU. Extracting features from the input in both directions lets the model learn the context of each factor more fully. The state at time t consists of two parts, the forward hidden state h′_n and the reverse hidden state h″_{t-n}, which can be represented as follows:

$${h}_{n}^{{\prime} }={\rm{GRU}}({F}_{t},{h}_{t-1})$$

(1)

$${h}_{t-n}^{{\prime} {\prime} }={\rm{GRU}}(h\,\cap \sim {h}_{n}^{{\prime} },{h}_{t-n+1})$$

(2)

$${h}_{t}={w}_{t}{F}_{t}+{\nu }_{t}{h}_{t-n}^{{\prime} {\prime} }+{b}_{t}$$

(3)

where w_t and ν_t are weight parameters applied to the input F_t and to the reverse hidden state, respectively, and b_t is a bias parameter that adjusts the baseline level of the output. These parameters work together to enable the model to learn and capture context from the input data to generate the final output h_t. In addition, regularization is introduced during training, in particular dropout, which reduces co-adaptation between neurons by temporarily discarding a random subset of the neurons in the network during training.
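To make the bidirectional idea concrete, the following is a minimal scalar sketch of a standard bidirectional GRU. It follows the textbook GRU update (update gate, reset gate, candidate state) and a weighted sum of the two directions; it may differ in detail from Eqs. (1) to (3), and the parameter names (`wz`, `uz`, and so on) and their values are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update; `p` holds the (hypothetical) gate weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)            # reset gate
    cand = math.tanh(p["wh"] * x + p["uh"] * r * h)   # candidate state
    return (1.0 - z) * h + z * cand

def run_gru(xs, p):
    """Hidden state after each input, starting from h = 0."""
    h, states = 0.0, []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def bidirectional_gru(xs, p_fwd, p_bwd, w=0.5, v=0.5, b=0.0):
    """Combine forward and reverse passes position by position."""
    fwd = run_gru(xs, p_fwd)
    bwd = run_gru(xs[::-1], p_bwd)[::-1]   # re-align to the original order
    return [w * f + v * g + b for f, g in zip(fwd, bwd)]

# Illustrative parameters and a short sequence of scaled factor values.
params = {"wz": 0.5, "uz": 0.1, "wr": 0.3, "ur": 0.2, "wh": 0.8, "uh": 0.4}
states = bidirectional_gru([0.2, 0.5, 0.2], params, params)
```

Each output position sees the factors before it through the forward pass and the factors after it through the reverse pass, which is what lets the network relate every factor to its full context.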

B-GRU is good at capturing the complex relationships through which house prices are affected by many factors. To mine the deeper semantic features of each single factor's impact on house prices, we integrate a CNN. It efficiently extracts local correlation information in the data, which is particularly important for analyzing feature patterns in house price data. We use convolution kernels of several different sizes to model the influence of a single factor on house prices over different ranges and intensities (M-CNN), as shown in Fig. 2. This design allows the model to understand the data at multiple scales, which enhances its flexibility and adaptability. By applying filters of different sizes, we obtain both broader and more refined views, and thus an effective and comprehensive set of features describing the factors that influence housing prices.
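The multi-kernel idea can be sketched as follows. This is a minimal illustration, assuming a one-dimensional sequence of factor values, three hand-picked kernels of widths 2, 3, and 4, a ReLU activation, and global max pooling; the actual kernel sizes and layer layout of M-CNN are not specified here, so all of these choices are assumptions.

```python
def conv1d(xs, kernel):
    """Valid 1D cross-correlation of a sequence with one kernel."""
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

def multi_kernel_features(xs, kernels):
    """One max-pooled feature per kernel.

    Assumes the sequence is at least as long as the widest kernel.
    """
    return [max(relu(conv1d(xs, kernel))) for kernel in kernels]

# Illustrative kernels: a width-2 difference, a width-3 sum, a width-4 average.
kernels = [
    [1.0, -1.0],
    [1.0, 1.0, 1.0],
    [0.25, 0.25, 0.25, 0.25],
]
sequence = [1.0, 3.0, 2.0, 5.0, 4.0]
features = multi_kernel_features(sequence, kernels)
```

Narrow kernels react to sharp local changes, while wider kernels summarise a longer stretch of the sequence, so concatenating their outputs gives the model several views of the same factor.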

The integrated model structure combining B-GRU and CNN not only fully leverages the advantages of GRU in processing sequential data and capturing long-term dependencies, but also leverages the powerful feature extraction capabilities of CNN to deeply uncover the intrinsic relationships and underlying patterns among various factors in housing price data. This comprehensive model demonstrates significant performance improvements and higher prediction accuracy in applications such as housing price prediction. By incorporating the bidirectional nature of GRU, the model can effectively capture information from both past and future contexts, enabling it to more accurately predict housing prices based on a comprehensive understanding of market trends and historical data. Additionally, the CNN component enhances the model’s ability to extract intricate patterns and relationships within the data, which further refines the prediction process and boosts the overall accuracy of the housing price forecasts. Together, these capabilities make the B-GRU-CNN hybrid model a powerful tool for real estate market analysis and price prediction.

Reinforcement learning with adaptive loss functions

To make full use of the many factors affecting REP in the prediction model, this paper proposes a reinforcement learning method with an ALF; the framework is shown in Fig. 3. The framework aims to capture the differing impact of individual factors on REP prediction accuracy by dynamically adjusting the loss function, and thereby to optimize prediction performance. Through reinforcement learning, the model automatically identifies and emphasizes the factors that bear most strongly on the results during learning, while weakening or ignoring the interference caused by secondary factors, and finally achieves more accurate and robust REP prediction.

We use reinforcement learning as the decision-making unit in the framework to adjust the learning policy and parameters based on performance feedback from the prediction model. Through trial and error, we optimize the ALF and the parameter settings of B-GRU and CNN to maximize prediction accuracy. By training in a simulated environment, the reinforcement learning agent can learn how to configure the prediction model most effectively in different market environments, thereby improving its generalization ability and robustness in practical applications.
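The trial-and-error loop can be illustrated with a deliberately simple stand-in: an epsilon-greedy bandit that chooses among a few candidate configurations and is rewarded with the negative validation error. This is an assumption made for illustration, not the agent used in the paper, and the candidate loss-weight settings and their errors below are made-up numbers.

```python
import random

def epsilon_greedy_search(configs, evaluate, steps=50, epsilon=0.2, seed=0):
    """Return the configuration with the best average reward found by trial and error."""
    rng = random.Random(seed)
    totals = [0.0] * len(configs)
    counts = [0] * len(configs)

    def pull(i):
        totals[i] += evaluate(configs[i])
        counts[i] += 1

    def best_index():
        return max(range(len(configs)), key=lambda j: totals[j] / counts[j])

    for i in range(len(configs)):        # try every configuration once
        pull(i)
    for _ in range(steps):
        if rng.random() < epsilon:       # explore a random configuration
            i = rng.randrange(len(configs))
        else:                            # exploit the current best estimate
            i = best_index()
        pull(i)
    return configs[best_index()]

# Hypothetical validation errors for three candidate loss-weight settings.
toy_errors = {0.2: 0.031, 0.5: 0.012, 0.8: 0.024}

def toy_reward(mse_weight):
    """Lower validation error means higher reward."""
    return -toy_errors[mse_weight]

best_weight = epsilon_greedy_search(list(toy_errors), toy_reward)
```

In a real setting `evaluate` would train or fine-tune the prediction model under the chosen configuration and return a noisy score, which is why repeated trials and averaging are needed.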

We derive our loss from the Dice loss (Abraham and Khan, 2019). First, the Dice loss is:

$${DLoss}({y},{y}^{\prime} )=1-\frac{2\sum {y}_{i}{y}^{\prime}_{i}+\epsilon}{\sum{y}_{i}+\sum {y}^{\prime}_{i}+\epsilon}$$

(4)

where y refers to the predicted results, y’ denotes the truth of the sample, and ε is set to 0.0001. Then, to compensate for Dice’s inability to handle multiple real estate factors, we redesign the loss function as follows:

$${MDL}{oss}=1-2\times \frac{\sum {y}_{i}{y^{\prime} }_{i}+\epsilon }{\sum {w}_{i}\sum {y}_{i}+\sum {w^{\prime} }_{i}\sum {y^{\prime} }_{i}+\epsilon }$$

(5)

where w_i and w’_i are trainable matrices. To avoid a zero denominator, the proportion of each factor is corrected by its inverse, which also reduces the correlation between the factors in the improved loss. The MSE between the predicted results and the true values is backpropagated during model training to realize the regression of prices. The MSE is given as follows:

$${MSE}(y,{y}^{{\prime} })=\frac{\mathop{\sum }\limits_{i=1}^{n}{\left({y}_{i}-{y^{\prime} }_{i}\right)}^{2}}{n}$$

(6)
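Equations (4) to (6) can be transcribed directly into a short sketch. It assumes predictions and targets are lists of values scaled to [0, 1], and it treats the trainable quantities w_i and w’_i as plain lists of scalars, which is a simplification of the trainable matrices described in the text.

```python
EPS = 1e-4  # the epsilon of Eq. (4)

def dice_loss(pred, target, eps=EPS):
    """Eq. (4): 1 - (2*sum(y*y') + eps) / (sum(y) + sum(y') + eps)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def weighted_dice_loss(pred, target, w, w_prime, eps=EPS):
    """Eq. (5): the two sums in the denominator are scaled by sum(w) and sum(w')."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(w) * sum(pred) + sum(w_prime) * sum(target) + eps
    return 1.0 - 2.0 * (inter + eps) / denom

def mse(pred, target):
    """Eq. (6): mean of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

perfect = dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])       # exactly 0.0
weighted = weighted_dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0],
                              w=[1.0], w_prime=[1.0])
error = mse([1.0, 2.0], [1.0, 4.0])                         # (0 + 4) / 2
```

With unit weights, Eq. (5) as written places ε inside the factor of 2, so a perfect prediction yields a value slightly below zero (about -ε/4 in this example) rather than exactly zero as in Eq. (4).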

Finally, to avoid the limitation of assigning fixed weights to multiple losses, we adopt adaptive weights, so that the model assigns weights according to the loss values at different stages of training, as shown in the following equation:

$$L{oss}={w}_{1}{MSE}+{w}_{2}{MDL}{oss}$$

(7)

where w₁ and w₂ are trainable matrices. By dynamically adjusting the loss function, intelligently screening the key features and optimizing the learning strategy, the complexity and uncertainty problems faced in the REP prediction are effectively solved, and a more accurate and reliable prediction tool is provided for the real estate industry.
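One simple way to let the weights in Eq. (7) follow the loss values is sketched below. The rule used here, where each weight is that loss's current share of the total, is an illustrative assumption and not necessarily how w₁ and w₂ are learned in the paper; it also assumes both loss values are non-negative.

```python
def adaptive_weights(loss_values):
    """Weights proportional to each loss's share of the current total."""
    total = sum(loss_values)
    if total == 0:
        # Nothing to rebalance: fall back to equal weights.
        return [1.0 / len(loss_values)] * len(loss_values)
    return [value / total for value in loss_values]

def combined_loss(mse_value, mdl_value):
    """Eq. (7) with weights recomputed from the current loss values."""
    w1, w2 = adaptive_weights([mse_value, mdl_value])
    return w1 * mse_value + w2 * mdl_value

early = adaptive_weights([0.30, 0.10])   # MSE dominates early in training
late = adaptive_weights([0.02, 0.06])    # the Dice-style term dominates later
total = combined_loss(0.30, 0.10)
```

Under this rule the term that is currently larger receives more weight, so the emphasis shifts between the two objectives as training progresses. In a gradient-based framework the weights would usually be treated as constants within each step so that gradients flow only through the loss terms.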

Experiments

Dataset and implementation settings

We use the House sale Dataset (https://zenodo.org/records/6423459, https://doi.org/10.5281/zenodo.6423459) to test the REP prediction model. This dataset contains information gathered from the Fotocasa and Idealista websites between April 4th and April 7th, 2022. Each entry describes a house listed for sale in the Salamanca or Villaverde district of Madrid, with the following attributes: a title, location details, the price, the area in square meters, the number of rooms, the floor level, the number of photos, the availability of floor plans, 3D views, and videos, the home staging status, and the full description. The dataset can be used for various research purposes, such as REP prediction, market analysis, and consumer behavior studies. By analyzing these data, researchers can study the relationship between housing prices and housing characteristics, predict future price trends, evaluate market trends, and gain insight into consumers' preferences and demands for different housing features. The dataset is also a valuable resource for market participants and policymakers, who can use it to better understand market demand and the competitive landscape, and to formulate more reasonable pricing strategies, investment plans, and policy measures.
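A listing with the attributes described above could be held in a small record type and turned into numeric model inputs, as in the sketch below. The field names, the sample values, and the value ranges used for scaling are all hypothetical; they are not the dataset's actual column names or statistics.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    """One house for sale; field names are illustrative, not the dataset's."""
    district: str
    price_eur: float
    area_m2: float
    rooms: int
    floor: int
    photos: int
    has_floor_plan: bool

DISTRICTS = ["Salamanca", "Villaverde"]

def min_max(value, lo, hi):
    """Scale `value` to [0, 1] given the observed range [lo, hi]."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def to_features(item, area_range, rooms_range):
    """One-hot district followed by scaled numeric and binary attributes."""
    district_onehot = [1.0 if item.district == d else 0.0 for d in DISTRICTS]
    return district_onehot + [
        min_max(item.area_m2, *area_range),
        min_max(item.rooms, *rooms_range),
        1.0 if item.has_floor_plan else 0.0,
    ]

# Made-up sample listing and ranges, for illustration only.
sample = Listing("Villaverde", 150000.0, 80.0, 3, 2, 12, True)
features = to_features(sample, area_range=(40.0, 120.0), rooms_range=(1, 5))
```

The price is kept out of the feature vector because it is the prediction target.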

During the training phase, we used a Ryzen 7600x processor alongside six Nvidia RTX 3070 GPUs to improve computational efficiency. We chose PyTorch as our framework and configured it according to the training parameters outlined in Table 1.

Table 1 Parameter settings.


To evaluate our method fully, we use the mean squared error (MSE), explained variance score (EVS), mean absolute error (MAE), and coefficient of determination R² to analyze the accuracy of each model. The higher the R² and EVS values, the better the performance, while the opposite is true for MSE and MAE. Before training, the real estate data are cleaned using deletion, imputation, or interpolation to handle missing values, outliers, and duplicate values. Feature scaling is then applied, and categorical attributes are quantized through one-hot encoding.
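The four metrics can be written out in a few lines using their standard definitions. This sketch is included to make the difference between R² and EVS explicit; it is not the evaluation code used for the experiments.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def mse(y_true, y_pred):
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mae(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evs(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); ignores a constant offset."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)

# Predictions that are always exactly 1.0 too high.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]
scores = (mse(y_true, y_pred), mae(y_true, y_pred),
          r2(y_true, y_pred), evs(y_true, y_pred))
```

In this example the predictions are shifted by a constant, so EVS is 1.0 while R² is -0.5: EVS measures how well the variation is explained, whereas R² also penalises systematic bias. Both functions assume the true values are not all identical.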

Ablation experiments

To enhance the predictive precision of REP, we introduce two distinct modules: FEF and ALF. The performance of FEF and ALF is verified by conducting ablation experiments. A common deep CNN was used as a Baseline in the experiment.

We conducted a quantitative analysis of these two independent modules and present the results in Fig. 4. When we add the FEF module to the baseline, performance improves notably: the R² increases from 0.854 to 0.956, while the MSE decreases from 0.0425 to 0.0123. This result shows that the FEF module has a significant influence on improving the model. We also observe a positive change in performance when we introduce ALF alone into the baseline model, with a decrease of 0.005 in MAE and an increase of 0.038 in EVS. This demonstrates the effectiveness of the ALF module in improving prediction accuracy and interpretability. Furthermore, to explore how the two modules work together, we apply FEF and ALF to the baseline model simultaneously. This combination gives the best performance: the MSE is reduced to 0.0059, the MAE is only 0.0099, the R² reaches 0.975, and the EVS is 0.951. These indicators show the prediction accuracy of the model and reflect its explanatory power and generalization ability. In summary, the joint application of the FEF and ALF modules brings a comprehensive performance improvement to the baseline model.

In order to clearly show the specific role and advantages of these two modules in REP prediction, we designed and executed a qualitative ablation experiment. In the experiment, we specially select the housing price data of two representative areas in the data set within a year as samples, and evaluate the performance of different modules through comparative analysis. The results are in Fig. 5, where the real price trend of the housing market is clearly depicted by the red solid line. First, we focus on the samples where housing prices show an upward trend. In this case, the comparison results show that both ALF and FEF modules can more closely track and predict the real change of house prices than the baseline, that is, their prediction curve is closer to the real price represented by the red solid line. What is more remarkable is that when the ALF and FEF modules are used together, the prediction accuracy is extremely high. Subsequently, we extend our analysis to the sample where house prices remain relatively stable. Through this comparison, we again verify the previous conclusion that the use of ALF and FEF modules alone or in combination can effectively increase the prediction accuracy and make the prediction results more aligned with the actual housing price. This finding not only strengthens the positive role of ALF and FEF modules in house price forecasting but also further proves their wide applicability and stability under different market conditions.

In addition, analysis of these experimental results supports the following conclusions. The ALF and FEF modules are complementary in the prediction process. The ALF module improves the sensitivity of the model to complex market dynamics by focusing on key information points in the data, while the FEF module enhances the quality of the input features to provide richer and more valuable information for the model. Combined, the two make the model more comprehensive and accurate in capturing market trends and predicting price changes. The experimental results also show that both modules maintain stable performance in different market environments, which reflects their adaptability and robustness. This is particularly important for the highly uncertain and complex task of REP prediction, because market conditions often change rapidly and a reliable forecasting model needs to maintain accuracy in a variety of scenarios.

Comparison with other methods

We conduct an exhaustive and comprehensive performance evaluation of the newly proposed REP prediction model, aiming to verify its effectiveness and superiority in practical applications. For this purpose, we select Li et al. (2017), Li (2023), Liu and Ma (2024), Demirhan and Baser (2024), Zhao et al. (2024), Sharma et al. (2024), Ozalp and Akıncı (2024), and Samadadiya et al. (2024) as the comparison benchmarks, which represent the representative and advanced research results in this field. Meanwhile, we choose a plain CNN as our baseline. The CNN extracts local features from the input data through convolutional operations, and reduces the dimensionality and the number of parameters of the data through pooling operations, ultimately performing classification or regression through fully connected layers.

The evaluation results in Table 2 and Fig. 6 show that our property price prediction model performs well on the key evaluation indicators. The MSE of the model reaches 0.0059, the MAE is 0.795, the R² is 0.824, and the EVS is 0.951, which underlines the ability of the model to capture data variability. In the detailed comparison with the benchmarks, our model shows clear advantages. Compared with Liu and Ma (2024), our MSE is lower by 0.0064 and our EVS is higher by 0.062, reflecting gains in prediction accuracy and explanatory ability. Compared with Demirhan and Baser (2024) and Zhao et al. (2024), our MAE is better (lower) by 0.0168 and 0.0133, respectively, which shows that our predictions are closer to the real values and our error control is better. At the same time, our R² values are 7.9% and 8.9% ahead of those of Demirhan and Baser (2024) and Zhao et al. (2024), which again confirms our model fit and prediction accuracy. Sharma et al. (2024) and Ozalp and Akıncı (2024) perform worse than our method on all evaluation metrics. Although Samadadiya et al. (2024) perform similarly to our method, they still lag behind our model on all key evaluation indicators.

Table 2 Comparison experiments for our method.


Our REP prediction model shows excellent performance in the performance evaluation, which is not only significantly better than many comparison methods in various evaluation indicators, but also provides a new solution for accurate prediction and decision support of the real estate market.

Real sample testing

To evaluate the efficiency of our model in REP prediction, we visualize the price prediction results on real samples and show the comparison in Figs. 7 and 8. We use the real samples from the ablation experiment, Location 1 and Location 2, for testing. Comparing the house price predictions for Location 1, we find that our method is closer to the real prices than those of Ozalp and Akıncı (2024) and Samadadiya et al. (2024). We also report the prediction time of our model for different numbers of houses in Location 1 over 12 months, which indicates that the model scales well with sample size. Fig. 8 leads to the same conclusion for Location 2.

To evaluate the efficiency and performance of our method in REP prediction comprehensively, we take a series of rigorous steps and visualize the key results on real samples. Figures 7 and 8 compare our predictions in different scenarios, which not only verifies the performance of our model but also reveals its particular advantages.

In Fig. 7, we focus on Location 1 and show significant improvement by comparing the prediction results of our method with those of existing literatures. It can demonstrate that our prediction curve is closer to the real housing price trend, which is not only consistent in the overall trend, but also shows higher accuracy in local fluctuations. This result fully proves the superiority of our method in capturing market dynamics and predicting future house prices. In addition, we also pay special attention to the prediction time of Location 1 for different numbers of listings within twelve months. The experimental results show that no matter how the number of listings changes, our model can complete the prediction in a reasonable time, and the prediction accuracy remains stable. This finding not only demonstrates the efficiency of our model but also provides strong support for its wide applicability in practical applications.

To further consolidate our conclusions, we show the comparison results of house price prediction for Location 2 in Fig. 8. As with Location 1, our method shows better prediction performance than the compared methods. This cross-location validation supports the generality of our method across different market environments and strengthens its reliability as a REP prediction tool.

From the above comparative experiments and visualizations, we draw the following conclusions. First, our method achieves higher accuracy in house price forecasting and captures market dynamics and price trends more precisely. Second, the model scales well with the number of listings and adapts to datasets of different sizes while maintaining prediction accuracy. Third, the cross-location validation shows that our method maintains stable prediction performance in different market environments, which indicates its generality and application value.

Discussion

Through the experiments, we validated the REP representation method based on feature embedding optimization, combined with a reinforcement learning framework based on an ALF. The experimental results show that the proposed model is effective and practical for REP prediction.

Through the refined feature embedding optimization strategy, the model improves the richness and accuracy of the data representation, so that it can better capture market dynamics and the underlying patterns of price changes. Meanwhile, the ALF enables the model to adjust the optimization direction during training and to weight different types of prediction errors differently, which further improves prediction accuracy. The experiments show that the proposed model has clear advantages in prediction accuracy, stability, and generalization ability, which supports the rationality and novelty of the model design.

Feature embedding optimization converts raw data into a more compact and meaningful representation that captures the underlying patterns and relationships within the data. This representation allows the model to learn more effectively, improving its ability to generalize and make accurate predictions on unseen data. Because REPs are influenced by many complex and often intertwined factors, feature embedding helps extract and represent these factors in a form that is easier for the model to learn from.

The ALF dynamically adjusts to the characteristics of the data and focuses on the errors that need the most attention. This can improve prediction accuracy, especially in complex scenarios where REPs are influenced by various factors. By emphasizing larger errors and adjusting the model accordingly, the ALF helps reduce the overall prediction error.
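The error-emphasis idea behind the ALF discussed above can be illustrated with a small sketch. It is not the loss function used in our model: the weighting rule (weights proportional to the absolute error raised to a power `gamma`), the function name `adaptive_weighted_loss`, and the sample prices are all assumptions chosen for illustration.

```python
def adaptive_weighted_loss(y_true, y_pred, gamma=1.0, eps=1e-12):
    """Weighted squared error whose per-sample weights grow with the absolute error.

    gamma = 0 gives equal weights (plain mean squared error);
    larger gamma shifts more of the loss onto the largest errors.
    This weighting rule is an illustrative assumption, not the paper's ALF.
    """
    errors = [t - p for t, p in zip(y_true, y_pred)]
    raw = [abs(e) ** gamma + eps for e in errors]   # eps avoids an all-zero total
    total = sum(raw)
    weights = [r / total for r in raw]              # weights sum to 1
    loss = sum(w * e * e for w, e in zip(weights, errors))
    return loss, weights

# Made-up prices: three small errors and one large one.
y_true = [100.0, 100.0, 100.0, 100.0]
y_pred = [99.0, 101.0, 98.0, 110.0]

mse_like, equal_weights = adaptive_weighted_loss(y_true, y_pred, gamma=0.0)
focused, focused_weights = adaptive_weighted_loss(y_true, y_pred, gamma=1.0)
print(mse_like, focused)
print(focused_weights)
```

With `gamma=0.0` every sample contributes equally, while with `gamma=1.0` the sample with the largest error receives the largest weight, so a gradient step on this loss would be dominated by that sample.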

This study makes significant contributions across academic, practical, and policy domains. Academically, it advances predictive modeling by integrating reinforcement learning with an ALF, enabling dynamic feature weighting in response to market fluctuations. Additionally, optimizing the feature embedding process enhances the model’s ability to process FEF, improving interpretability and robustness in deep learning applications. The study also bridges theory and practice by providing a replicable hybrid modeling framework for complex forecasting tasks in financial and real estate analytics. Practically, the model enhances forecasting accuracy, aiding investors and developers in making informed decisions, while its improved predictive reliability helps stakeholders mitigate financial risks in volatile real estate markets. Furthermore, its scalability and adaptability allow application across diverse regional markets and varying data conditions. From a policy perspective, the model supports urban planning by offering insights into predictive price trends, facilitates data-driven policy formulation by leveraging machine learning analytics, and enhances resource allocation by optimizing housing supply, subsidy distribution, and infrastructure planning to promote sustainable urban development.

Limitations

While the proposed model demonstrates notable predictive performance, it introduces a certain level of computational complexity, primarily due to the integration of reinforcement learning and the adaptive loss optimization framework. The reinforcement learning module requires iterative environment interaction and policy updates, which can result in increased training time and memory consumption, especially when applied to large-scale or real-time datasets. Similarly, the dynamic adjustment of the loss function necessitates continuous gradient recalculations, potentially imposing additional computational overhead.

From an implementation perspective, deploying such a model in production environments may demand high-performance computing infrastructure and technical expertise, which could be a barrier for small-scale enterprises or government agencies with limited resources. To mitigate these challenges, future work may explore model compression techniques such as knowledge distillation or the development of lightweight surrogate models that retain predictive performance while reducing complexity. Additionally, hybrid deployment strategies—wherein offline training is combined with simplified online inference—may offer practical trade-offs between accuracy and efficiency.
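The combination of offline training with a lightweight surrogate for online inference can be sketched as below. This is a toy illustration under strong assumptions: the `teacher` function is a hypothetical stand-in for the full trained model, the student is a one-variable least-squares line, and the floor areas are invented values.

```python
from statistics import fmean

def fit_linear_student(xs, teacher_outputs):
    """Ordinary least squares for y = slope * x + intercept, fitted to teacher outputs."""
    mean_x, mean_y = fmean(xs), fmean(teacher_outputs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, teacher_outputs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical "teacher": stands in for the expensive trained model.
def teacher(area_m2):
    return 2000.0 * area_m2 + 0.5 * area_m2 ** 2

# Offline step: query the teacher on representative inputs and fit the student.
areas = [float(a) for a in range(50, 151, 10)]
slope, intercept = fit_linear_student(areas, [teacher(a) for a in areas])

# Online step: inference needs only one multiplication and one addition.
def student(area_m2):
    return slope * area_m2 + intercept

worst = max(abs(student(a) - teacher(a)) / teacher(a) for a in areas)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, worst relative gap={worst:.4f}")
```

The student only matches the teacher inside the range it was fitted on, which reflects the accuracy versus efficiency trade-off mentioned above.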

Additionally, the model has been evaluated on a specific regional dataset, which may limit its generalizability to other real estate markets with different socio-economic structures or regulatory environments. Although the proposed architecture is theoretically adaptable, cross-market testing is necessary to verify robustness under varying conditions. The performance of data-driven models heavily depends on the availability and quality of input data. In real-world applications, inconsistencies, biases, or incomplete records in real estate datasets may negatively impact prediction accuracy. While not the focus of this study, attention to responsible data handling and transparent model interpretation will be essential in future work to ensure reliability and stakeholder trust in practical deployments.

Conclusion

To deal effectively with the dynamic real estate market, we propose a REP prediction method that combines an ALF with feature embedding optimization. The method builds a more refined REP representation by mining the internal relationships among the FEF that affect the REP. Furthermore, a reinforcement learning mechanism based on the ALF is introduced, which improves the adaptability of the model to a complex market environment and gives greater weight to key influencing factors in price prediction. The experiments show that the prediction model achieves a high level of accuracy, with an R-squared value of 0.975 and an EVS value of 0.951, which demonstrates the advantage of the model in REP prediction and provides data support and a reliable basis for decision-making. In the future, we will explore more effective feature extraction and embedding methods for a wider range of real estate markets, to enhance the model’s ability to capture and process critical information and make it applicable to more market environments and conditions.
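For reference, the two scores named above can be computed as in the following sketch. It assumes the standard definitions of R-squared and the explained variance score (EVS), as used for example in scikit-learn; the exact formulas used in the experiments are not restated here, and the price lists are made-up values.

```python
from statistics import fmean

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    """EVS: 1 - Var(y_true - y_pred) / Var(y_true), with population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_res = fmean(residuals)
    mean_true = fmean(y_true)
    var_res = fmean([(r - mean_res) ** 2 for r in residuals])
    var_true = fmean([(t - mean_true) ** 2 for t in y_true])
    return 1.0 - var_res / var_true

# Made-up prices: every prediction is too high by the same amount.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 210.0, 310.0, 410.0]

r2 = r_squared(y_true, y_pred)
evs = explained_variance(y_true, y_pred)
print(r2, evs)
```

In this example the constant offset lowers R-squared but leaves EVS at 1, because EVS subtracts the mean residual before measuring its spread. Under these definitions the two scores coincide only when the mean residual is zero, and otherwise EVS is the larger of the two.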

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Change history

17 September 2025
The Acknowledgements section was missing from this article and should have read “Research Project Supported by Shanxi Scholarship Council of China: Study on the Influence of Healthy Aging on Multi-dimensional Spatiotemporal Differentiation of Housing Market. The fund number is 2022-134.” The original article has been corrected.

References

Abraham N, Khan NM (2019) A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In: Proceedings of IEEE 16th international symposium on biomedical imaging (ISBI). IEEE, p 683–687
Alfaro-Navarro JL, Cano EL, Alfaro-Cortes E, García N, Gámez M, Larraz B (2020) A fully automated adjustment of ensemble methods in machine learning for modeling complex real estate systems. Complexity 1:5287263
Baik S, Choi J, Kim H, Cho D, Min J, Lee MK (2021) Meta-learning with task-adaptive loss function for few-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision. IEEE, p 9465–9474
Churchill SA, Inekwe J, Ivanovski K (2018) House price convergence: evidence from Australian cities. Econ Lett 170:88–90
Demirhan H, Baser F (2024) Hierarchical fuzzy regression functions for mixed predictors and an application to real estate price prediction. Neural Comput Appl 36:1–17
Du D, Li A, Zhang L (2014) Survey on the applications of Big Data in Chinese real estate enterprise. Procedia Comput Sci 30:24–33
Duca JV, Muellbauer J, Murphy A (2021) What drives house price cycles? International experience and policy issues. J Econ Lit 59(3):773–864
Fang YC (2022) Forecast of foreclosure property market trends during the epidemic based on GA‐BP neural network. Sci Program 1:3220986
Gabauer D, Gupta R, Marfatia HA, Miller SM (2024) Estimating US housing price network connectedness: evidence from dynamic elastic net, lasso, and ridge vector autoregressive models. Int Rev Econ Financ 89:349–362
Ganioğlu A, Seven U (2021) Do regional house prices converge? Evidence from a major developing economy. Cent Bank Rev 21(1):17–24
Guirguis H, Mueller G, Dutra V, Jafek R (2024) Advances in forecasting home prices. Comput Econ 65:3633–3650
Jiang Z, Rai A, Sun H, Nie C, Hu Y (2023) How does online information influence offline transactions? Insights from digital real estate platforms. Inf Syst Res 35(3):917–1506
Khrais LT, Shidwan OS (2023) The role of neural network for estimating real estate prices value in post COVID-19: a case of the middle east market. Int J Electr Comput Eng 13(4):4516
Li RYM, Fong S, Chong KWS (2017) Forecasting the REITs and stock indices: group method of data handling neural network approach. Pac Rim Prop Res J 23(2):123–160
Li X (2023) Comparing linear regression and decision trees for housing price prediction. In: Proceedings of international conference on data science, advanced algorithm and intelligent computing (DAI 2023). Atlantis Press, p 77–84
Liang X (2023) Prediction and analysis of commodity house price based on ARIMA model. In: Proceedings of the international conference on business and policy studies. Springer Nature Singapore, p 918–929
Liu G (2022) Research on prediction and analysis of the real estate market based on the multiple linear regression model. Sci Program 1:5750354
Liu J, Ma Z (2024) Forecasting housing price using GRU, LSTM and Bi-LSTM for California. In: Proceedings of IEEE 2nd international conference on control, electronics and computer technology (ICCECT). IEEE, p 1033–1037
Lorenz F, Willwersch J, Cajias M, Fuerst F (2023) Interpretable machine learning for real estate market analysis. J Real Estate Econ 51(5):1178–1208
Mao M (2023) A comparative study of random forest regression for predicting house prices using. In: Proceedings of the international conference on data science, advanced algorithms and intelligent computing (DAI 2023). Atlantis Press, p 619–626
Mathur S (2017) The myth of “free” public education: impact of school quality on house prices in the Fremont Unified School District, California. J Plan Educ Res 37(2):176–194
Millar MI, White RM (2024) Do residential property assessed clean energy (PACE) financing programs affect local house price growth? J Environ Econ Manag 124:102936
Møller SV, Pedersen T, Montes Schutte EC, Timmermann A (2024) Search and predictability of prices in the housing market. Manag Sci 70(1):415–438
Nirmala JS, Sravya SG, Tharun K (2024) Housing market intelligence: data science for rental price forecasting. In: Proceedings of the 3rd international conference for innovation in technology (INOCON). IEEE, p 1–9
Ozalp AY, Akıncı H (2024) Comparison of tree-based machine learning algorithms in price prediction of residential real estate. Gumuşhane Univ Fen Bilim Derg 14(1):116–130
Peng C, Xiao H, Ou K (2023) Transaction price prediction of second-hand houses in Wuhan based on GA-BP model. Highlights Sci Eng Technol 31:153–160
Rampini L, Re Cecconi F (2022) Artificial intelligence algorithms to predict Italian real estate market prices. J Prop Investig Financ 40(6):588–611
Rey-Blanco D, Zofio JL, Gonzalez-Arias J (2024) Improving hedonic housing price models by integrating optimal accessibility indices into regression and random forest analyses. Expert Syst Appl 235:121059
RoselinKiruba R, Sowmyayani S, Anitha S, Kavitha J, Preethi R, Saranya Jothi C (2024) Text summarization based on feature extraction using GloVe and B-GRU. In: Proceedings of the 2nd international conference on sustainable computing and smart systems (ICSCSS). p 517–522
Samadadiya K, Das S, Kumari R (2024) Predictive paradigm: AI-driven social media analysis for real estate sales forecasts. In: Senjyu T, So-In C, Joshi A (eds) Smart trends in computing and communications. SmartCom 2024. Lecture Notes in Networks and Systems, vol 946. Springer, Singapore. https://doi.org/10.1007/978-981-97-1323-3_16
Sharma M, Chauhan R, Devliyal S, Chythanya KR (2024) House price prediction using linear and lasso regression. In: Proceedings of the 3rd international conference for innovation in technology (INOCON). IEEE, p 1–5
Soltani A, Lee CL (2024) The non-linear dynamics of South Australian regional housing markets: a machine learning approach. Appl Geogr 166:103248
Song Y, Ma X (2024) Exploration of intelligent housing price forecasting based on the anchoring effect. Neural Comput Appl 36(5):2201–2214
Turnbull GK, Zheng M (2021) A meta‐analysis of school quality capitalization in US house prices. Real Estate Econ 49(4):1120–1171
Turnbull GK, Zahirovic-Herbert V, Zheng M (2018) Uncertain school quality and house prices: theory and empirical evidence. J Real Estate Financ Econ 57:167–191
Wang X, Gao S, Zhou S, Guo Y, Duan Y, Wu D (2021) Prediction of house price index based on bagging integrated WOA‐SVR Model. Math Probl Eng 1:3744320
Wei C, Fu M, Wang L, Yang H, Tang F, Xiong Y (2022) The research development of hedonic price model-based real estate appraisal in the era of big data. Land 11(3):334
Wheeler DC, Paez A, Spinney J, Waller LA (2014) A Bayesian approach to hedonic price analysis. Pap Reg Sci 93(3):663–684
Wu J (2024) Multiple machine learning models in house price prediction: performance evaluation and comparison. Highlights Bus Econ Manag 40:364–371
Yang X, Che H, Leung MF, Liu C, Wen S (2024) Auto-weighted multi-view deep non-negative matrix factorization with multi-kernel learning. IEEE Trans Signal Info Process Netw 11:23–34
Yang X, Che H, Leung MF, Wen S (2024) Self-paced regularized adaptive multi-view unsupervised feature selection. Neural Netw 175:106295
Zhao L, Mbachu J, Liu Z (2019) Exploring the trend of New Zealand housing prices to support sustainable development. Sustainability 11(9):2482
Zhao Y, Zhao J, Lam EY (2024) House price prediction: a multi-source data fusion perspective. Big Data Min Anal 7(3):603–620
Zhu H, Li H (2021) Predict prices of second‐hand house using GBDT algorithm and PSO Algorithm. Front Econ Manag 2(11):513–524
Zulkifley NH, Rahman SA, Ubaidullah NH (2020) House price prediction using a machine learning model: a survey of literature. Int J Mod Educ Comput Sci 12(6):46–54

Acknowledgements

Research Project Supported by Shanxi Scholarship Council of China: Study on the Influence of Healthy Aging on Multi-dimensional Spatiotemporal Differentiation of Housing Market. The fund number is 2022-134.

Author information

Authors and Affiliations

Shanxi University of Finance and Economics, Taiyuan, China
Hongqin Zhang

Authors

Hongqin Zhang

Contributions

This article is written by the author alone.

Corresponding author

Correspondence to Hongqin Zhang.

Ethics declarations

Competing interests

The author declares no competing interests.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, H. Residential real estate price prediction based on adaptive loss function and feature embedding optimization. Humanit Soc Sci Commun 12, 832 (2025). https://doi.org/10.1057/s41599-025-05217-9

Received: 31 October 2024
Accepted: 05 June 2025
Published: 16 June 2025
Version of record: 16 June 2025
DOI: https://doi.org/10.1057/s41599-025-05217-9