Introduction

Green development has become a global consensus. As direct contributors to ecological degradation, firms face increasing social attention and legitimacy pressure regarding their environmental protection actions (Delmas and Toffel, 2004). Environmental information disclosure is crucial for firms to effectively communicate these actions to stakeholders (Kuo and Chen, 2013). However, greenwashing has become a strategic practice for some firms, who utilize their information advantages to selectively disclose favorable environmental information or use misleading language to conceal insubstantial environmental protection efforts (Delmas and Burbano, 2011; Seele and Gatti, 2017; Zhang, 2023). As the global digital economy grows, big data drives a new wave of technological revolution. At its core, big data involves the generation, integration, analysis, and application of massive data, and its importance in fighting environmental challenges is becoming increasingly prominent (Wu et al., 2016; Gao et al., 2023). Does big data affect firm greenwashing? If so, what is the underlying mechanism? Investigating these questions is crucial for explaining how the digital factor drives green development at the firm level.

The literature relevant to this paper falls into two main streams. The first examines the determinants of firm greenwashing. Existing literature indicates that firms embellish their environmental performance to respond to environmental regulations (Hu et al., 2023; Zhou et al., 2024), to seek financial resources (Zhang, 2023), and to cater to the green preferences of investors and consumers (Delmas and Burbano, 2011). Other studies identify factors that inhibit corporate greenwashing, such as social media (Fernando et al., 2014), independent directors and institutional investors (Yu et al., 2020), and retail investor activism (Zhang, 2024). Even so, the practice remains covert and complex, posing challenges for effective regulation and governance. With the rise of the digital economy, existing literature finds that fintech (Xie et al., 2023), digital finance (Yin and Yang, 2024), government digitalization (Xu et al., 2024), corporate digital transformation (Wang et al., 2024), and artificial intelligence (Zhang, 2024) are effective tools for governing corporate greenwashing. The second stream probes how big data affects business operations and development. Existing literature explores the positive effects of big data on innovation from various perspectives, including business model innovation (Ciampi et al., 2021), green technology innovation (Gao et al., 2023), process innovation, innovation by diverse recombination (Wu et al., 2020), and organizational service innovation (Troilo et al., 2017). Furthermore, some studies demonstrate that big data reshapes traditional decision-making paradigms (Yaqoob et al., 2016), makes product sales forecasting more accurate, and enhances firm supply chain agility and adaptability (Wamba et al., 2020). Additionally, big data increases firm productivity and corporate performance (Müller et al., 2018; Vitari and Raguseo, 2020).
The literature also examines the feasibility of big data in green development through qualitative analysis (Wu et al., 2016). Overall, existing literature has begun to examine how the digital technologies of various organizations, such as governments (Xu et al., 2024) and enterprises (Yin and Yang, 2024; Zhang, 2024), affect corporate greenwashing. Although big data provides abundant data resources for digital technologies, there are notable differences in the attributes between big data and digital technologies. Big data is the deep integration of data sets and technological systems, which allows data mining and efficient application through high-speed access, integration, and analysis of large and diverse data resources (Chen et al., 2012). The core characteristics of big data include high volume, high velocity, wide variety, significant veracity, and low value density (McAfee and Brynjolfsson, 2012; Wamba et al., 2015). In contrast, digital technologies rely on exact processing and efficient transmission of information via binary encoding, discretization, and algorithmic implementation (Fitzgerald et al., 2014). Consequently, the effect of big data on corporate greenwashing and its underlying mechanism needs to be further clarified.

To achieve this, we use the launch of China’s national big data comprehensive pilot zone in 2016 as a policy shock and employ the difference-in-differences specification to examine its effect on corporate greenwashing. Difference-in-differences is a causal inference method used to estimate the net effect of policy interventions by comparing changes before and after the policy between the treatment group (the group affected by the policy) and the control group (the group not affected by the policy). We define enterprises located within the pilot zones as the treatment group and those outside the pilot zones as the control group. Then, we estimate the net effect of big data policy on firm greenwashing by comparing the changes in firm greenwashing of the two groups before and after the establishment of pilot zones.
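As a minimal illustration of the comparison described above, the following sketch computes a two-group, two-period difference-in-differences estimate from group means. The records, the `POLICY_YEAR` constant, and the function names are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
from statistics import mean

# Hypothetical firm-year records: (in_pilot_zone, year, greenwashing score).
# All values are invented for illustration only.
records = [
    (True, 2015, 0.40), (True, 2015, 0.60),
    (True, 2017, 0.10), (True, 2017, 0.30),
    (False, 2015, 0.20), (False, 2015, 0.40),
    (False, 2017, 0.30), (False, 2017, 0.50),
]
POLICY_YEAR = 2016


def group_mean(treated, post):
    """Average outcome for one of the four treatment-by-period cells."""
    return mean(y for t, yr, y in records
                if t == treated and (yr >= POLICY_YEAR) == post)


def did_estimate():
    """(Change in treated group) minus (change in control group)."""
    treated_change = group_mean(True, True) - group_mean(True, False)
    control_change = group_mean(False, True) - group_mean(False, False)
    return treated_change - control_change


print(round(did_estimate(), 4))
```

In this toy data the treated group's average falls by 0.3 while the control group's rises by 0.1, so the estimate is approximately −0.4.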

This study provides three marginal contributions as follows. First, this article contributes to the existing literature by examining how policy-driven data technology development at the regional level affects corporate greenwashing. Although existing literature has begun to explore the effect of digital factors, such as fintech (Xie et al., 2023), digital finance (Yin and Yang, 2024), government digitalization (Xu et al., 2024), corporate digital transformation (Wang et al., 2024), and artificial intelligence (Zhang, 2024), on firm greenwashing, it is primarily limited to the digitalization or digital technology applications within different organizations, including government agencies and enterprises. China’s national big data comprehensive pilot zones are a regional pilot policy aimed at promoting the regional development of big data techniques, and they provide a more objective indicator of regional big data development. We therefore examine, in essence, how the policy-driven regional development of big data techniques affects corporate greenwashing. The empirical finding extends our understanding of how the digital factor governs corporate greenwashing. Second, it contributes to the studies on the effect of big data on green development. The existing literature indicates that big data enhances green technology innovation (Gao et al., 2023) and qualitatively discusses its role in driving green changes (Wu et al., 2016). According to our research, big data significantly inhibits firm greenwashing, adding to the knowledge of how big data affects green development from the perspective of green speculation. Finally, our study clarifies the underlying mechanism through which big data affects firm greenwashing. We find that big data inhibits corporate greenwashing by enhancing environmental performance and alleviating information asymmetry, offering deeper insights into the effect of big data on greenwashing.

The remainder of this paper is organized as follows. The section “Institutional background and hypothesis development” covers the institutional background of China’s national big data comprehensive pilot zones and develops the hypotheses. “Data, variable construction, and descriptive statistics” describes the data, variable construction, and descriptive statistics. “Empirical specification and results” presents the empirical specification and results. “Further analysis” provides additional analysis, and “Conclusion” concludes.

Institutional background and hypothesis development

Institutional background

China’s central government recognized big data technologies as a disruptive force for the future. The Action Outline for Promoting Big Data Development, issued in 2015, made it evident that the Chinese government intended to build the foundations for big data technologies and their applications. Despite the initial top-down push from the government, the development of big data technologies was still subject to some constraints, such as lagging laws and regulations, integrated planning, openness, and data sharing. As a result, the push for big data development was almost ineffective. To ease these constraints, in 2016 the central government switched to selecting regions in a balanced manner across the country to pilot big data technologies and their applications, and the national big data comprehensive pilot zones were officially launched. The local governments in these regions were given special administrative authority to ease the constraints on developing big data technologies and their applications. The eight regions were the Pearl River Delta, Beijing-Tianjin-Hebei, Shanghai, Chongqing, Henan, Inner Mongolia, Guizhou, and Shenyang. These eight regions launched pilots in several major areas, such as big data systematic innovation, the opening and sharing of public data, the application and advancement of big data, the circulation of data factors, the integration and utilization of data centers, and international big data collaboration, aiming to promote regional development of big data technologies. Compared with other regions, these regions have established comprehensive development systems for big data technologies through pilot policies, which give them significant advantages in data acquisition, integration, analysis, application, circulation, and collaboration. As a result, they make data mining more effective and promote the deeper and broader application of data.

Hypothesis development

The two most basic questions about greenwashing are: Why do companies greenwash? And what is the nature of greenwashing? When the accounting benefits of corporate environmental governance are insufficient to compensate for the accounting costs, corporate management is under pressure for short-term financial performance, and they are strongly motivated to greenwash. Thus, environmental performance declines (Clatworthy and Jones, 2003; Hu et al., 2023). If corporate environmental governance becomes more accessible and affordable, firms are more likely to improve their environmental governance performance, and the incentive to greenwash is reduced. In essence, greenwashing is the distortion, exaggeration, or concealment of a company’s environmental information (Seele and Gatti, 2017). When it becomes difficult to falsify, exaggerate, or conceal information, corporate greenwashing becomes more difficult to achieve. Therefore, the impact of the regional development of big data technologies on corporate greenwashing depends on whether big data enhances corporate environmental performance and whether it mitigates environmental information asymmetry between firms and stakeholders. We develop a null hypothesis against an alternative hypothesis as follows.

H10: Big data significantly inhibits firm greenwashing vs. H11: Big data significantly drives firm greenwashing.

In traditional production models, corporate environmental governance activities are usually characterized by high uncertainty, high expenditures, and long cycles. These characteristics make it difficult for the returns on environmental governance activities to cover investment costs, thereby weakening the effort for companies to fulfill their environmental responsibilities (Gray, 1987; Palmer et al., 1995). In sustainable production models, however, businesses with poor environmental performance face legitimacy pressures such as negative public perception and social ostracism (Lin et al., 2016), substantial fines from regulatory authorities (Shimshack and Ward, 2005), and more stringent credit requirements from financial institutions (Chava, 2014). As a result, these businesses are highly motivated to adopt low-cost greenwashing strategies, attempting to embellish their poor environmental performance and portray a false image of environmental friendliness (Meng et al., 2014).

The development of big data techniques makes it easier and cheaper for companies to improve environmental performance, thus weakening the incentive to greenwash. On the one hand, big data makes it easier and lower-cost for companies to adopt a data-driven model to reduce pollutant emissions in their production processes. Big data offers real-time dynamic monitoring of resource inputs and pollutant emissions. Companies use environmental data analysis and frequent feedback to adjust their production strategies on time, optimizing the balance between resource availability, demand, and pollutant emissions. This approach reduces resource waste and pollutant emissions (Wu et al., 2016). Data sharing empowers enterprises and their supply chain partners to establish close alliances dedicated to environmental governance, which facilitates the accurate measurement and dynamic control of energy consumption and carbon emissions across the supply chain from procurement to warehousing to logistics. This data-driven green collaborative control model enhances the enterprises’ efficiency of environmental governance and also promotes the optimization of environmental performance across the entire supply chain network (Raut et al., 2019). Moreover, big data integrates environmental pollution data from various sources, which clarifies the origins and dispersion channels of pollution. It provides a timely warning of pollution incidents and motivates companies to develop cooperative prevention and control of environmental pollution (Wu et al., 2016). On the other hand, big data enhances corporate green innovation. Big data makes it easier and less expensive to access and analyze a large amount of consumer data, which enables companies to more accurately predict consumers’ green preferences so that the direction of R&D can be more in line with consumer preferences and reduce the trial-and-error costs of green innovation (Troilo et al., 2017; Yaqoob et al., 2016). 
Big data also transcends traditional spatial boundaries and enables enterprises to develop closer connections with consumers, suppliers, universities, and academic institutions. Big data offers an efficient platform for knowledge and information exchange, resource sharing, and R&D collaboration. It accelerates the green R&D process and, at the same time, promotes the dissemination and use of green technologies (Chen et al., 2024). Based on the above analysis, we develop the second hypothesis.

H2: Big data significantly improves firm environmental performance, thereby inhibiting corporate greenwashing.

In a traditional business environment, corporate environmental governance is a secondary function, and the environmental information is often hidden in a large volume of primary business information. This information is often discontinuous, confusing, and diverse. Meanwhile, the lack of standardized environmental information disclosure norms and regulations grants management considerable discretion in determining what and how much information to disclose (Yu et al., 2020). Stakeholders face multiple challenges when collecting environmental information, such as narrow search scopes, ambiguous results, and high costs. These challenges make it difficult to accurately assess whether companies are fulfilling their environmental obligations. Consequently, this information asymmetry creates fertile ground for corporate greenwashing (Li et al., 2023; Wu et al., 2020).

The development of big data techniques mitigates information asymmetry between the firm and stakeholders, thus enhancing the completeness and reliability of environmental information disclosure. First, big data improves firms’ ability to process environmental information. Big data captures massive information throughout the production-to-sales process. It then standardizes and processes huge amounts of information using technologies such as data aggregation, hierarchical classification, and multi-dimensional analysis, enabling enterprises to integrate and use diverse forms of data (Oussous et al., 2018). This changes the long-standing practice of relying on human experience and subjective judgment to process environmental information, and it transforms that information from fragmented and ambiguous to comprehensive and verifiable. Second, big data increases the flow of information and communication between the firm’s departments, decreasing the possibility of management manipulating environmental information. The data network enables the board of directors, employees, and other internal stakeholders to access diverse environmental data. It is therefore more likely to stop the distortion and concealment of environmental information within the company and to reduce management discretion in environmental decision-making. Finally, in the big data environment, the volume of environmental information released by enterprises to stakeholders outside the company has surged. Environmental information disclosure has become more diversified in content and format as well as in communication channels and tools (Blazquez and Domenech, 2018). These stakeholders access firm environmental information more easily through lower-cost and diversified channels. By assessing and verifying information through big data, stakeholders can make precise and effective judgments about whether firms are fulfilling their environmental obligations.
For example, big data enables verification of consistency between environmental indicators voluntarily disclosed by enterprises and those reported by government regulatory agencies. It also facilitates a comparative analysis of a company’s environmental performance against industry benchmarks, thereby identifying significant deviations that may indicate anomalies or discrepancies in environmental practices (Chen et al., 2012). Moreover, big data facilitates the collection and integration of social media reports and consumer feedback, providing real-time insights into diverse public perceptions of corporate environmental behavior (Chen et al., 2012). Therefore, big data enables the company’s outside stakeholders to identify potential corporate greenwashing. Based on the above analysis, we develop the third hypothesis.

H3: Big data significantly alleviates information asymmetry between enterprises and stakeholders, thereby inhibiting corporate greenwashing.

Data, variable construction, and descriptive statistics

Data

We choose A-share firms listed on China’s Shanghai and Shenzhen Stock Exchanges from 2012 to 2022 as the initial research sample. We remove outliers from the original sample following the approach in existing literature (for example, He et al., 2025). We exclude financial institutions because they are highly regulated. Because ST firms exhibit financial anomalies, such as consecutive years of losses or insolvency, we also exclude them from the original sample. We further exclude enterprises with significant missing data to ensure that all variables can be measured. The final sample contains 1,293 A-share listed enterprises and 10,509 firm-year observations. The data used to measure the greenwashing indicator are sourced from the Bloomberg and WIND databases. Other data are sourced from the China Stock Market and Accounting Research (CSMAR) database. To limit the influence of extreme observations, all continuous variables are winsorized at the 1st and 99th percentiles.
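The winsorizing step can be sketched as follows. This is a minimal illustration that assumes pooled winsorizing over the whole sample with NumPy's default linear percentile interpolation; the text does not state whether the cut-offs are computed on the pooled sample or year by year, and the function name and data are hypothetical.

```python
import numpy as np


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clip values below/above the given percentiles to those percentiles."""
    arr = np.asarray(values, dtype=float)
    lo, hi = np.percentile(arr, [lower_pct, upper_pct])
    return np.clip(arr, lo, hi)


# Hypothetical variable with one extreme observation at each end.
raw = np.concatenate([np.arange(1.0, 199.0), [-1000.0, 1000.0]])
clean = winsorize(raw)
print(clean.min(), clean.max())
```

Unlike trimming, this keeps every observation and only pulls the tails in to the percentile cut-offs, so the sample size is unchanged.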

Variable construction

The dependent variable (GW) in this paper is firm greenwashing. Existing literature (Hu et al., 2023; Zhang, 2023; Yu et al., 2020) measures ESG (environmental, social, and governance dimensions) greenwashing by calculating the gap between a company’s standardized ESG disclosure score and its standardized ESG real performance, both of which are standardized relative to peer enterprises (see Footnote 1). However, this paper explores corporate greenwashing from the environmental perspective. Since the environmental dimension of ESG directly reflects a company’s environmental practices, following the approach of Hu et al. (2023), we measure greenwashing as a firm’s standardized environmental disclosure score minus its standardized actual environmental performance. The measure is constructed as follows.

$${{\rm{GW}}}_{{\rm{i}},{\rm{t}}}=({{\rm{E}}}_{{\rm{dis}}\; {\rm{i}},{\rm{t}}}-\overline{{{\rm{E}}}_{{\rm{dis}}}})/{{\rm{\sigma }}}_{{\rm{dis}}}-({{\rm{E}}}_{{\rm{per}}\; {\rm{i}},{\rm{t}}}-\overline{{{\rm{E}}}_{{\rm{per}}}})/{{\rm{\sigma }}}_{{\rm{per}}}$$
(1)

Where i denotes the firm, t refers to the year, and GW is firm greenwashing. \({{\rm{E}}}_{{\rm{dis}}}\) is a company’s environmental disclosure score, which is indicated by the Bloomberg environmental dimension score (see Footnote 2). \(\overline{{{\rm{E}}}_{{\rm{dis}}}}\) represents the average environmental disclosure score of peer companies. \({{\rm{\sigma }}}_{{\rm{dis}}}\) denotes the standard deviation of environmental disclosure scores among peer companies. \({{\rm{E}}}_{{\rm{per}}}\) represents a firm’s real environmental performance, as indicated by the Huazheng environmental dimension rating (see Footnote 3). \(\overline{{{\rm{E}}}_{{\rm{per}}}}\) is the average of the real environmental performance of peer enterprises. \({{\rm{\sigma }}}_{{\rm{per}}}\) denotes the standard deviation of real environmental performance among peer enterprises.
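Equation (1) can be illustrated with a small sketch that standardizes both scores within one peer group and takes the difference. The peer-group definition, the use of the population standard deviation, and all numbers below are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values against their own peer-group mean and std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def greenwashing_scores(disclosure, performance):
    """GW = standardized disclosure score minus standardized performance."""
    z_dis, z_per = zscores(disclosure), zscores(performance)
    return [d - p for d, p in zip(z_dis, z_per)]


# Hypothetical peer group of four firms in one year.
e_dis = [60.0, 40.0, 50.0, 50.0]   # disclosure scores
e_per = [4.0, 6.0, 5.0, 5.0]       # performance ratings
gw = greenwashing_scores(e_dis, e_per)
print([round(g, 3) for g in gw])
```

The first firm discloses more than its peers but performs worse, so its GW value is positive. The second firm shows the opposite pattern and gets a negative value. Firms at the peer average on both dimensions score zero.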

We construct the primary explanatory variable (Post×Treat) using the launch of China’s national big data pilot zone in 2016, which is the interaction between the dummy variable (Post) representing the launch time of the pilot zone and the dummy variable (Treat) indicating the firm situated in the pilot zone. Post is equal to 1 if the year is 2016 or later and 0 otherwise. Treat is equal to 1 if the firm is located in the pilot zone, which includes Pearl River Delta, Beijing, Tianjin, Hebei, Shanghai, Chongqing, Henan, Inner Mongolia, Guizhou, and Shenyang, indicating the treatment group; otherwise, Treat is 0, indicating the control group.
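The construction of the interaction term can be sketched as below. The region labels are taken from the list in the text, while the matching of a firm to a region by a plain string label is a simplifying assumption, and the function name is hypothetical.

```python
# Pilot-zone labels as listed in the text; matching firms to zones by a
# simple string label is an assumption made for this illustration.
PILOT_REGIONS = {
    "Pearl River Delta", "Beijing", "Tianjin", "Hebei", "Shanghai",
    "Chongqing", "Henan", "Inner Mongolia", "Guizhou", "Shenyang",
}
POLICY_YEAR = 2016


def did_term(region, year):
    """Return (Post, Treat, Post x Treat) for one firm-year."""
    post = int(year >= POLICY_YEAR)
    treat = int(region in PILOT_REGIONS)
    return post, treat, post * treat


print(did_term("Guizhou", 2016))
print(did_term("Guizhou", 2015))
```

Only firm-years that are both inside a pilot zone and in 2016 or later receive a value of 1 for the interaction.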

In accordance with the existing literature (Hu et al., 2023; Zhang, 2023), we control for several variables, including firm size (Size), asset-liability ratio (Leverage), the growth rate of operating income (Growth), return on common stockholders’ equity (ROE), operating cashflow (Cashflow), current ratio (Liquidity), fixed asset ratio (PPE), firm age (AGE), the proportion of independent directors (Director), the proportion of executives’ holdings (Holdings), and the proportion of the largest shareholder’s holdings (First). Table 1 presents the variable definitions.

Table 1 Variable definitions.

Descriptive statistics

Table 2 reports the descriptive statistics of the main variables. For GW, the mean is −0.0177, which indicates that sample firms engage in low-level greenwashing on average. The values range from a minimum of −3.6927 to a maximum of 3.4330, with a standard deviation of 1.2000, which suggests that greenwashing varies widely among the sample firms. The mean of Post×Treat is 0.3058, which shows that 30.58% of the firm-year observations are affected by the establishment of the pilot zones. Additionally, the mean, standard deviation, minimum, median, and maximum of the control variables are generally consistent with those reported in the existing literature, which supports the reliability of our sample.

Table 2 Descriptive statistics.

Empirical specification and results

Baseline specification

Utilizing the 2016 establishment of China’s national big data pilot zone as a policy shock, we use the difference-in-differences method to examine how big data affects firm greenwashing. Compared to the classic difference-in-differences model, the difference-in-differences model with firm and year fixed effects is more precise in estimation and inference, so we set firm and year fixed effects. The model is specified as follows.

$${{\rm{GW}}}_{{\rm{i}},{\rm{t}}}={{\rm{\beta }}}_{0}+{{\rm{\beta }}}_{1}{{\rm{Post}}}_{{\rm{t}}}\times {{\rm{Treat}}}_{{\rm{i}}}+\sum {{\rm{\beta }}}_{{\rm{i}}}{{\rm{Controls}}}_{{\rm{i}},{\rm{t}}}+{{\rm{\mu }}}_{{\rm{i}}}+{{\rm{\tau }}}_{{\rm{t}}}+{{\rm{\varepsilon }}}_{{\rm{i}},{\rm{t}}}$$
(2)

Where i denotes firm, t refers to year. GW is the dependent variable that represents firm greenwashing. Post is the dummy variable for the establishment time of the pilot zone. Treat is the dummy variable for a firm located in the pilot zone. We are most interested in the variable Post×Treat, as the sign, magnitude, and statistical significance of the coefficient β1 capture the effect of big data on firm greenwashing. Controls represent a vector of control variables including Size, Leverage, Growth, ROE, Cashflow, Liquidity, PPE, AGE, Director, Holdings, and First. \({{\rm{\mu }}}_{{\rm{i}}}\) and \({{\rm{\tau }}}_{{\rm{t}}}\) represent firm-fixed effects and year-fixed effects, respectively. To control for the time-varying characteristics of different industries and regions, we also include the interaction fixed effects between industry and year (Industry×Year), and the interaction fixed effects between province and year (Province×Year).
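A stripped-down version of model (2) can be sketched on simulated data. The sketch assumes a balanced panel, omits the control variables and the Industry×Year and Province×Year effects, and removes firm and year fixed effects by double demeaning. The function name and all simulated numbers are hypothetical.

```python
import numpy as np


def twfe_did(y, post_treat):
    """Two-way fixed-effects DID slope on a balanced firm-by-year panel.

    y and post_treat are (n_firms, n_years) arrays. Firm and year fixed
    effects are removed by double demeaning, which is exact only when the
    panel is balanced; the slope is then a one-regressor OLS coefficient.
    """
    def demean(a):
        return (a - a.mean(axis=1, keepdims=True)
                - a.mean(axis=0, keepdims=True) + a.mean())

    y_w, d_w = demean(y), demean(post_treat)
    return float((d_w * y_w).sum() / (d_w ** 2).sum())


# Simulated panel (all numbers invented): 40 firms observed 2012-2022.
rng = np.random.default_rng(0)
years = np.arange(2012, 2023)
treat = (np.arange(40) < 15).astype(float)      # first 15 firms treated
post = (years >= 2016).astype(float)
post_treat = np.outer(treat, post)
firm_fe = rng.normal(size=(40, 1))
year_fe = rng.normal(size=(1, years.size))
true_effect = -0.8
noise = 0.01 * rng.normal(size=post_treat.shape)
y = firm_fe + year_fe + true_effect * post_treat + noise
beta_hat = twfe_did(y, post_treat)
print(round(beta_hat, 3))
```

With an unbalanced panel or additional interacted fixed effects, as in the paper's full specification, a dedicated panel regression routine would be the more appropriate tool than this hand-rolled within transformation.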

The results of the baseline regression

Table 3 reports the results of the baseline regression. Column (1) presents the result controlling only for fixed effects for Firm, Year, Industry×Year, and Province×Year. The coefficient on Post×Treat is −0.8214 and is significant at the 1% level. Column (2) shows that after adding controls for firm-level characteristics, the coefficient on Post×Treat is −0.8130 and significant at the 1% level. Column (3) shows that after further adding controls for corporate governance characteristics, the coefficient on Post×Treat is −0.8187 and significant at the 1% level. These empirical results indicate that big data significantly inhibits firm greenwashing, which supports H10 at conventional significance levels.

Table 3 The results of the baseline regression.

Robustness checks

We perform a series of checks to validate the robustness of the baseline regression results.

Parallel trend test

A fundamental assumption of the difference-in-differences design is that the treatment and the control group should exhibit the same trend before policy implementation. Following the approach of Liu and Qiu (2016), we specify model (3) to test the parallel trend.

$$\begin{array}{l}{{\rm{GW}}}_{{\rm{i}},{\rm{t}}}={{\rm{\beta }}}_{0}+\mathop{\sum }\limits_{{\rm{t}}=2013}^{2019}{{\rm{\beta }}}_{{\rm{t}}}{{\rm{Year}}}_{{\rm{t}}}\times {{\rm{Treat}}}_{{\rm{i}}}+{{\rm{\beta }}}_{1}{\rm{Controls}}\\\qquad\qquad+\,{{\rm{\mu }}}_{{\rm{i}}}+{{\rm{\tau }}}_{{\rm{t}}}+{{\rm{\varepsilon }}}_{{\rm{i}},{\rm{t}}}\end{array}$$
(3)

Here, we introduce interactions between each year dummy and Treat, using 2012 as the base period. The βt denote the estimated coefficients on Year×Treat from 2013 to 2019. The definitions of all other variables remain consistent with those in model (2). As shown in Fig. 1, the βt from 2013 to 2015 are not significant at conventional levels, which indicates that corporate greenwashing did not differ significantly between the treatment and control groups before the policy. This finding supports the parallel trend assumption before the establishment of pilot zones. In addition, the βt from 2016 to 2019 are significant at least at the 10% level, which demonstrates that the establishment of pilot zones has a significant inhibitory effect on firm greenwashing.
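The logic of the parallel trend test can be sketched on a noise-free simulated panel. On a balanced panel with firm and year fixed effects and no controls, the coefficient on each Year×Treat term equals the treated-minus-control gap in that year minus the same gap in the base year, which is what the hypothetical function below computes. The sketch omits the control variables and does not produce standard errors or confidence intervals.

```python
import numpy as np


def event_study_gaps(y, treat, years, base_year):
    """Treated-minus-control mean gap in each year, relative to base_year.

    On a balanced panel with firm and year fixed effects and no controls,
    these gaps coincide with the OLS coefficients on Year x Treat.
    """
    treated, control = y[treat == 1], y[treat == 0]
    gap = treated.mean(axis=0) - control.mean(axis=0)   # one gap per year
    base = gap[list(years).index(base_year)]
    return {int(yr): float(g - base)
            for yr, g in zip(years, gap) if yr != base_year}


# Simulated noise-free panel (invented numbers): effect of -0.5 from 2016.
years = np.arange(2012, 2020)
treat = np.array([1, 1, 1, 0, 0, 0, 0, 0])
firm_fe = np.linspace(-1.0, 1.0, treat.size)[:, None]
year_fe = 0.1 * np.arange(years.size)[None, :]
effect = -0.5 * np.outer(treat, years >= 2016)
y = firm_fe + year_fe + effect
betas = event_study_gaps(y, treat, years, base_year=2012)
for yr, b in betas.items():
    print(yr, round(b, 3))
```

In this simulation the pre-policy coefficients are zero up to floating-point error and the post-policy coefficients equal the simulated effect, which is the pattern a parallel trend plot is meant to reveal.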

Fig. 1 Parallel trend test.

Placebo test

To examine whether the baseline finding is affected by unobserved factors, we randomly construct a false treatment group and repeat the procedure 1000 times. Figure 2 presents a distribution plot of false β1 and their corresponding p-values. The false β1 displays a normal distribution centered around 0, and the majority of p-values exceed 0.1, which indicates statistical insignificance at the 10% level. Therefore, other unobserved factors have no significant effect on the finding that big data inhibits firm greenwashing.
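The placebo procedure can be sketched as a permutation exercise on simulated data. The sketch assumes a balanced panel, uses a simple means-based DID estimator instead of the full regression, and keeps the number of falsely treated firms equal to the true number; none of these details are stated in the text. The p-values of the placebo coefficients are not computed here.

```python
import numpy as np


def did_means(y, treat, post):
    """2x2 DID from group means on a balanced firm-by-year panel."""
    change = y[:, post == 1].mean(axis=1) - y[:, post == 0].mean(axis=1)
    return float(change[treat == 1].mean() - change[treat == 0].mean())


def placebo_estimates(y, treat, post, n_draws=1000, seed=0):
    """Re-estimate DID after randomly shuffling which firms are 'treated'."""
    rng = np.random.default_rng(seed)
    return np.array([did_means(y, rng.permutation(treat), post)
                     for _ in range(n_draws)])


# Simulated panel (invented numbers): 60 firms, 8 years, true effect -0.8.
rng = np.random.default_rng(1)
treat = (np.arange(60) < 20).astype(int)
post = (np.arange(8) >= 4).astype(int)
y = rng.normal(scale=0.3, size=(60, 8)) - 0.8 * np.outer(treat, post)
actual = did_means(y, treat, post)
fake = placebo_estimates(y, treat, post)
share_as_extreme = float((np.abs(fake) >= abs(actual)).mean())
print(round(actual, 3), round(float(fake.mean()), 3), share_as_extreme)
```

If the placebo estimates cluster around zero and very few are as large in magnitude as the actual estimate, the actual estimate is unlikely to be driven by chance assignment.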

Fig. 2 Placebo test.

Additional robustness checks

To address sample selection bias arising from observable differences between the treatment and control groups, we apply the propensity score matching (PSM) method. Specifically, we use the control variables from the previous regression as covariates and perform the 1:2 nearest-neighbor procedure to match the two groups. We also perform the robustness test using the entropy balancing matching (EBM) method. As shown in columns (1) and (2) of Table 4, the coefficients of Post×Treat are −0.8715 and −0.8212, both significant at the 1% level. Moreover, we remeasure firm greenwashing based on Eq. (1) using ESG disclosure scores and actual ESG performance. Column (3) of Table 4 shows that the coefficient of Post×Treat is −0.7182 and significant at the 5% level. Finally, because computer information technology firms are directly related to big data industries, they might influence the baseline regression result. To address this concern, we conduct the robustness test by excluding these sample companies (see Footnote 4). The result in column (4) of Table 4 suggests that the coefficient of Post×Treat remains significantly negative at the 1% level. Overall, the robustness checks consistently show that big data has a statistically significant inhibitory effect on corporate greenwashing.
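The 1:2 nearest-neighbor step of PSM can be sketched as follows. The sketch takes the propensity scores as given, matches with replacement, and applies no caliper; these choices, the function name, and the scores are assumptions, since the text does not describe them.

```python
def nearest_neighbor_match(treated_scores, control_scores, k=2):
    """Match each treated unit to its k closest controls by propensity score.

    Inputs map unit id -> estimated propensity score. Matching is done with
    replacement, so one control may serve several treated units.
    """
    matches = {}
    for t_id, t_score in treated_scores.items():
        ranked = sorted(control_scores,
                        key=lambda c_id: abs(control_scores[c_id] - t_score))
        matches[t_id] = ranked[:k]
    return matches


# Hypothetical propensity scores (assumed to be estimated beforehand,
# for instance from a logit of Treat on the control variables).
treated = {"T1": 0.62, "T2": 0.35}
controls = {"C1": 0.60, "C2": 0.30, "C3": 0.66, "C4": 0.10, "C5": 0.38}
pairs = nearest_neighbor_match(treated, controls)
print(pairs)
```

The matched controls, together with the treated firms, would then form the sample on which the baseline regression is re-estimated.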

Table 4 The results of additional robustness checks.

Further analysis

Mechanism analysis

As mentioned above, the inhibitory effect of big data on firm greenwashing depends on whether or not big data effectively enhances corporate environmental performance and reduces information asymmetry between the firm and stakeholders. We empirically investigate the effect of big data on corporate environmental performance and information asymmetry.

Environmental performance

To examine the causal relationship between big data and corporate environmental performance, we use environmental tax (ET) and green innovation (GI) as proxies for firm environmental performance (see Footnote 5). Columns (1) and (2) of Table 5 present the empirical results. The coefficients of Post×Treat are −0.1090 and 0.3432, respectively, and both are significant at least at the 10% level. These results suggest that big data significantly reduces environmental taxes and improves green innovation. This is consistent with big data inhibiting enterprise greenwashing by improving environmental performance. Thus, H2 is supported.

Table 5 The results of the mechanism analysis.

Information asymmetry

We follow the literature (Dechow et al., 1995) and measure information asymmetry between the firm and stakeholders using the discretionary accruals (DA) estimated by the modified Jones model (see Footnote 6). Column (3) of Table 5 reports the effect of big data on discretionary accruals. The coefficient of Post×Treat is −0.0255 and significant at the 10% level. Referring to existing literature (Bhattacharya et al., 2003; Francis et al., 2004), we also use earnings smoothing (ES) as a proxy for information asymmetry (see Footnote 7). The coefficient of Post×Treat in column (4) is −11.9693 and significant at the 1% level. These empirical results show that big data significantly reduces corporate discretionary accruals and earnings smoothing. The results support the mitigation of information asymmetry as a channel through which big data inhibits corporate greenwashing, thus supporting H3.

Heterogeneity analysis

The above sections explore the effect of big data on corporate greenwashing and its underlying mechanism. We proceed to examine whether this causal effect is heterogeneous across ownership structures and levels of market competition.

Ownership structure

Compared to non-state-owned enterprises, state-owned enterprises reflect the government's will and are subject to stricter regulatory oversight. Thus, state-owned enterprises pay more attention to their environmental legitimacy. When environmental performance is poor, state-owned enterprises are more likely to engage in greenwashing to present an environmentally friendly image (Cheng et al., 2017). Big data enables the government to monitor and assess the environmental performance of state-owned enterprises more accurately, thereby driving these enterprises to genuinely carry out environmental governance activities. Therefore, we predict that big data significantly inhibits greenwashing by state-owned enterprises. To test this, we follow the literature (Cheng et al., 2025) and divide the sample into state-owned and non-state-owned enterprise groups according to ownership structure. Column (1) of Table 6 shows that the coefficient of Post×Treat for the state-owned enterprise group is −1.1536 and significant at the 1% level. The coefficient of Post×Treat in column (2) for the non-state-owned enterprise group is −0.9314, but it is not statistically significant at conventional levels. These results suggest that the inhibitory effect of big data on greenwashing is more prominent in the state-owned enterprise group.

Table 6 The results of heterogeneity analysis.

Market competition

When faced with consumer demand for green products and pressure from competitors, firms are more motivated to claim environmental sustainability (Delmas and Burbano, 2011). Big data increases the risk that greenwashing is exposed, which would erode firms’ competitive advantage, thereby driving them to improve their environmental performance. We therefore anticipate that big data inhibits greenwashing more strongly among firms facing high market competition than among firms facing low market competition. Following the approach of Aghion et al. (2005), we use the Lerner index to measure market competition and categorize firms into high and low market competition groups according to whether their Lerner index exceeds the industry median (see Footnote 8). The regression results in columns (3) and (4) of Table 6 indicate that the coefficients of Post×Treat are −1.2087 and −0.4360, respectively, but only the coefficient for the high market competition group is statistically significant (at the 1% level). This implies that the inhibitory effect is more pronounced in firms facing high market competition.
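The sample split on the industry median can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes a simple price-cost margin proxy for the Lerner index, (revenue − cost) / revenue; the definition actually used is the one given in Footnote 8. The function names, the dictionary keys, and the example firms are hypothetical.

```python
from collections import defaultdict
from statistics import median


def lerner_index(revenue, cost):
    """Price-cost margin proxy (an assumption): (revenue - cost) / revenue."""
    return (revenue - cost) / revenue


def above_industry_median(firms):
    """Flag each firm whose Lerner index exceeds its industry median.

    firms: list of dicts with hypothetical keys 'id', 'industry', 'revenue', 'cost'.
    Returns a dict mapping firm id to True/False.
    """
    by_industry = defaultdict(list)
    for f in firms:
        by_industry[f["industry"]].append(lerner_index(f["revenue"], f["cost"]))
    medians = {ind: median(vals) for ind, vals in by_industry.items()}
    return {
        f["id"]: lerner_index(f["revenue"], f["cost"]) > medians[f["industry"]]
        for f in firms
    }


# Made-up firms for illustration only.
firms = [
    {"id": "A", "industry": "X", "revenue": 100.0, "cost": 60.0},
    {"id": "B", "industry": "X", "revenue": 100.0, "cost": 80.0},
    {"id": "C", "industry": "X", "revenue": 100.0, "cost": 70.0},
    {"id": "D", "industry": "Y", "revenue": 200.0, "cost": 100.0},
    {"id": "E", "industry": "Y", "revenue": 200.0, "cost": 180.0},
]
print(above_industry_median(firms))
```

A higher Lerner index conventionally indicates greater pricing power. The sketch only produces the above-median flag; how that flag maps onto the "high" and "low" market competition groups follows the paper's own definition and is not assumed here.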

Conclusions

In the era of the digital economy, it is crucial to examine the effect of big data on firm greenwashing and its underlying mechanism. We use the launch of China’s national big data comprehensive pilot zone in 2016 as a policy shock to investigate its effect on corporate greenwashing. The empirical findings indicate that big data has a significant inhibitory effect on firm greenwashing, and this result holds after parallel trend tests, placebo tests, and additional robustness checks. The mechanism analysis demonstrates that improving environmental performance and reducing information asymmetry between the firm and stakeholders are the channels through which big data inhibits corporate greenwashing. The heterogeneity analysis shows that the inhibitory effect is more pronounced for state-owned enterprises and for firms facing high market competition.

Our study is based on the Chinese context, where the design and implementation of the pilot zone policy are geographically constrained owing to differing market environments and regulatory policies; nevertheless, the firms’ responses to this policy are genuine and representative of firm behavior. These findings suggest the following implications. First, the pursuit of green development goals should take into account the inhibitory effect of big data on corporate greenwashing. The government should further update policy designs to support the extensive application of big data in green governance. This includes using big data to improve corporate environmental performance and break down environmental information barriers, thus effectively addressing the green development challenges posed by greenwashing. Second, big data development has created a favorable technological environment for firms’ green governance. Firms should actively integrate big data into environmental protection efforts, environmental information processing, and disclosure, thereby promoting the coordinated development of digitalization and green transformation.