Introduction

The Securities and Exchange Commission (SEC) required listed companies to discuss “the most significant factors that make the company speculative or risky” in “risk factor” section of their Form 10-K annual report since 2005. Unlike other corporate disclosures that provide users with quantitative financial information, risk factors disclosed by listed companies present investors with a summary of their expectations of adverse risk events in the form of qualitative text (Nelson and Pritchard, 2016; Dyer et al. 2017). These unstructured textual risk disclosures are essential for stockholders and investors to know about companies’ business environment and potential risks (Azmi et al. 2020; Calvin and Holt, 2023). However, whether the risk factors in Form 10-K reports are informative has always been controversial (Kravet and Muslu, 2013; Hope et al. 2016; Nelson and Pritchard, 2016; Beatty et al. 2018).

Critics have long questioned that risk factors in Form 10-K are generic and boilerplate, and lack real and significant information (Hope et al. 2016; Beatty et al. 2018; Lukas and Martin, 2023). There are two main theories for this opposition. From one perspective, the “litigation shield” hypothesis (Skinner, 1994; 1997) suggests that preemptively alerting investors to potential negative outcomes can lower the anticipated expenses associated with securities litigation. In 1995, the Private Securities Litigation Reform Act established a “safe harbor” provision that protects companies from legal repercussions when they share forward-looking statements, provided these statements are accompanied by warnings about the inherent uncertainty of the predictions. To protect against securities litigation, managers of companies tend to disclose every conceivable risk factor rather than focus on those with a high probability of occurrence (Beatty et al. 2018, Singh et al. 2022). There are also studies to verify this theory empirically. Nelson and Pritchard (2016) provide evidence that only firms with high litigation risk have incentives to disclose meaningful future risk warnings. On the other hand, “signaling theory” posits that managers are inclined to voluntarily disclose positive information while potentially delaying or obscuring negative news (Azmi et al. 2020; Whitehead and Belghitar 2022). Kothari et al. (2009) also find that managers postpone releasing awful news to investors by designing empirical study. Azmi et al. (2020) also state that managers may have incentives to hide their operations’ actual risks. Other countries’ risk disclosures in financial reports also provide evidence that these textual disclosures may generic and boilerplate. For example, Andreas (2024) find that the risk information disclosed in UK’s companies’ annual reports are also increased in similarity and boilerplate over time.

The supporters argue that the authenticity of risk factors disclosures in Form 10-K can be assured to some extent as regulations tighten (Lambert et al. 2007; Kravet and Muslu, 2013; Chanie et al. 2024). Having realized that transparent and decision-useful disclosures facilitate well-informed investment decisions by investors, the SEC warned firms in 2010 to “avoid generic risk factor disclosure that could apply to any company”. In 2013, the SEC conduct a review of identifying excessive, unduly complex, and redundant disclosures of Form 10-K again (SEC, 2013). The SEC once again warned companies that they could be sued by investors or face hefty compensation if they didn’t disclose risks promptly (SEC, 2016). Basu et al. (2022) find that the risk factors in Form 10-K can improve the predictive power of firms’ future investments due to their multidimensional nature, which shows the informativeness of these risk information. Huang et al. (2023) empirically find that categorizing the risk factors in Form 10-K based on risk headings can enhance its readability, thereby facilitating investors’ risk perception.

Several studies have demonstrated that these required risk factors are of informativeness, as they can increase the investors’ perceptions of the potential variability in a firm’s expected future performance (Hope et al. 2016; Campbell et al. 2019), which mainly focus on the impact of risk disclosures on general stock market measures and demonstrate that they are informative to investors and can influence their decisions. However, few studies focus on the specific content of disclosure and examine whether they reflect the actual risk profiles.

To bridge this gap, this study innovatively uses the typical risk event of the U.S. subprime crisis that started in 2007 to test the informativeness and effectiveness of the risk factor section. When the financial crisis coming, the financial institutions should be the first to realize the systemic problems (Azmi et al. 2020) and disclose more risk factors related to the causes, evolution, specific risk events of the impending and ongoing crisis. The expression ways, such as the sentiment and the level of details when disclosing the risks should also show different states as the crisis evolves (Dyer et al. 2017; Nelson and Pritchard, 2016). We introduce the text mining method to capture the textual attributes and specific contents disclosed by financial institutions as the subprime crisis developed and study whether and how they reflect the real risk profiles at that time. Several text indicators are used to comprehensively capture the expression ways of risk disclosures during the progresses of the subprime crisis. Sentence Latent Dirichlet Allocation (Sent-LDA), one of the topic models, is employed to identify the risk topics from textual data and then we can qualitatively analyze whether the risk topics have reflected the crisis or not. We further construct a dictionary of the subprime crisis and propose a matching score indicator to quantitatively analyze the consistency between the disclosed risks and the subprime crisis dictionary. Finally, a logic regression is used to study the different ways of risk disclosing for firms that survived and failed in the crisis.

Overall, this study provides new evidence to the ongoing debate over the informativeness of risk factors in financial reports. The innovations and contributions of this paper are three-fold in terms of theory, methods, and results. Firstly, this is one of the earliest studies to examine the informativeness of risk factors from the perspective of the specific content of disclosure. Unlike prior studies that focus on the investors’ market reactions, this paper uses the real risk event of the subprime crisis in the U.S. to directly examine whether and how the expression ways and the specific contents reflect the real risk profiles in the crisis.

Secondly, this paper develops a comprehensive framework that measures textual information from multiple dimensions, providing a more systematic perspective for capturing the characteristics of risk disclosures in Form 10-K. While existing research has utilized some measures to quantify textual information, such as sentiment index, this study simultaneously mines both the attributes and content of the text, thereby achieving a thorough quantification of the information in Form 10-K. Additionally, the paper employs text-mining techniques to examine the consistency between the company’s self-disclosed risk information and external risk evaluations.

Thirdly, the results support that on the whole the risk factors disclosed by financial institutions are informative and effective, which has important implications for scholars, investors, and regulators. Besides traditional quantitative data, the forward-looking unstructured textual risk disclosure is also a reliable new data source to identify, measure, and analyze the potentially significant risks at present and in the future. This data is valuable and unique because it offers the risk information from the view of the institutions themselves.

The structure of this paper is as follows. Section “Literature review” reviews the related literature. Section “Methods” designs the research methods used in this study. Section “Data description” introduces the empirical data, and Section “Results” is the empirical results. Finally, Section 6 concludes the paper.

Literature review

Since the listed companies in the U.S. are required by SEC to disclose their risk factors in Item 1A of financial reports, this part of textual information has attracted the attention of many studies. This section reviews the literature related to the informativeness and effectiveness of risk factors disclosed in Form 10-K.

Most existing studies analyze the informativeness of risk disclosure by examining the risk disclosures’ effects on market volatility, and usually conclude that these textual risk data indeed contain useful information that can impact the fluctuations of the capital market. For example, Kravet and Muslu (2013) examine the effectiveness of risk factors and find that investors react correspondingly to annual increases in risk disclosures. Campbell et al. (2014) focus on the total number of words in the risk factor section, as well as the risk types including idiosyncratic, systematic, tax, financial, and legal risk, and find that investors have market reactions on these textual measures. Hope et al. (2016) find stronger market reactions to more specific risk disclosures, so the risk factors with higher specificity are of informativeness. Beatty et al. (2018) examine market participants’ responses to the new, discontinued, and repeated disclosures, and find that the documented changes ineffectiveness is driven by new disclosures initiated in the current year. They also find that the risk factors are informative in the preceding period. Campbell et al. (2019) also find that the risk disclosures are informative because the investors incorporate these information into stock price fluctuations.

However, few studies try to examine whether companies have been authentically disclosed the risks they face, while this is the question that regulators and investors have attached great importance to. The most relevant studies with our paper are the studies that examine the changes in disclosures before and after the financial crisis .

Daniel and Meriem (2016) use the content analysis method to manually codify the risk factors into predetermined topics or categories and find that the U.S. subprime crisis has positive effects on the length of risk disclosures. Differently, they use a manual method to extract the critical risks from a large corpus of unstructured text data. Besides, they have predetermined the categories of risk factors in the analysis, such as financial, operational, and strategic risk. This type of supervised analysis may restrict the information captured to some extent.

There is another study relevant to our research. Azmi et al. (2020) focus on whether banks warned investors the potential risks prior to the financial crisis, or they disclosed the dangers after the crisis had occurred. The results reveal a significant increase in the average number of risks disclosed by banks, and their tone also became more negative over from 2008 to 2009 (Azmi et al. 2020). Thus, banks failed to disclose the crisis business risk before the financial crisis. Compared with our research, Azmi et al. (2020) focus on Form 10-K reports’ timeliness rather than their credibility. Besides, the textual information they mainly focused on is the textual attributes of number and tone. In our study, multiple methods are adopted to capture both the expression ways and the contents of risk factors.

Overall, the existing studies primarily focus on the impact of risk disclosures on the general stock market. Their findings reveal that those risk disclosures are informative, because they increase investors’ perceptions of the firm’s expected future performance. A few studies try to test the quality of risk factors using manual methods, but the manual collection method is time-consuming and labor-intensive. It might be somewhat subjective because it highly relies on the expert’s experience. This paper uses the real risk event of the subprime crisis as a test criterion and adopts the text mining method to analyze whether the risk disclosures are in line with the actual risk profiles before, during, and after the crisis, which expands the existing studies theoretically and methodically.

Methods

An overview of the research design

This study uses the real risk event of the U.S. subprime crisis to examine the informativeness of the textual risk disclosures in Form 10-K. Theoretically, if the risk factors that the companies disclose in their financial reports are reliable, these disclosures should change as the companies’ risk environment changes (Nelson and Pritchard, 2016). The U.S. court ruled that a firm’s “cautionary language” is insufficient to warrant safe harbor protection if it remains unchanged while the risks evolve. Market financial and economic crises can be viewed as external shocks impacting a firm’s financial performance and profitability (Levy et al. 2022). When a financial crisis coming, the financial institutions should have been the initial entities to detect systemic issues (Azmi et al. 2020). If they do not hide and decide to frankly raise the concerns to the investors, then they will disclose more risk factors related to the upcoming and ongoing crisis. The expression ways such as the sentiment and the level of details when disclosing the risks should also show different states as the crisis evolves (Huang et al. 2014). For example, when a financial institution faced the subprime crisis, its disclosures’ sentiment may be more negative. It may be more inclined to disclose risk factors related to loans, mortgages, etc.

We examine the changes of risk disclosures in Form 10-K from the perspectives of both expression ways and specific content. By analyzing the textual attributes and contents of the disclosed risk factors as the subprime crisis evolves and study whether and how they reflect the real risk profiles at that time, we can judge whether the financial institution has credibly disclosed their risk factors or not. Five indicators, including length, specificity, stickiness, boilerplate, and sentiment, are adopted to measure risk factors’ textual attributes and examine how the expression ways of the textual risk data changes in a crisis. The textual risk factors disclosed by financial institutions are too extensive for humans to review and summarize manually (Wei et al. 2019). So we introduced a text mining method of Sent-LDA to extract the risk topics, i.e. the core risk information, disclosed in Form 10-K, and qualitatively analyze whether these risk topics have reflected the actual risk profiles during that time. We also construct a dictionary of the subprime crisis and propose a matching score to quantify the crisis-related risk information included in the textual risk disclosures. The authenticity of the risk factor section in U.S. financial reports can then be verified, to a certain extent, based on these qualitative and quantitative analyses.

The following sub-sections present the methods used in this paper. Section “Examine the informativeness of risk disclosures from the ways of expressions” introduces the textual attributes that reflect the ways of disclosing risk factors. Section “Examine the informativeness of risk disclosures from the detailed contents” introduces the Sent-LDA model used to extract the specific contents of risk factors disclosed in Form 10-K and methods to quantify the consistency between risk factors disclosed by companies and the subprime crisis dictionary.

Examine the informativeness of risk disclosures from the ways of expressions

The financial institutions in US are required to disclose “the most significant factors” in the Form 10-K reports. These risk disclosures have no specific form and are unstructured textual data. They have different ways of expressions, such as the different number of words, the different level of detail, different tone, etc., which are called textual attributes in the field of text mining. The textual attributes can be measured by interpretable and quantizable indicators. When the company’s risk profile changes, these textual attributes of risk factors may change as well.

In this paper, we select five textual attributes than can reflect the informativeness of disclosure, including length, specificity, stickiness, boilerplate, and sentiment. These textual attributes can capture the text characteristics from different aspects. Table 1 presents the description,quantification and references of the textual attributes used in this paper.

Table 1 The description and quantification of textual attributes.

Length represents the number of words used in risk factor disclosures, which is the most straightforward and most widely used textual attribute to quantify the textual data. More risk factor disclosures mean that the company tends to convey more information to investors. Thus, the length is selected as an essential measure to examine whether the level of volume of risk factor disclosure changes with the subprime crisis’s evolution.

Specificity, stickiness, and boilerplate three attributes are usually used to describe the quality of risk disclosures. Specificity established by Hope et al. (2016) is adaptable to quantify the level of specification of firms’ qualitative disclosures. It is measured by the total number of entities such as people, locations, monetary amounts, organizations, percentages, dates, and times identified in the text using the Stanford Named Entity Recognizer tool. This metric indicates that text containing a higher number of entities is more precise compared to more general descriptions. (Dyer et al. 2017). For example, a statement like “Our company may not be able to implement its growth strategy” may help companies comply with regulations. However, it is unlikely to be very informative to users in perceiving the actual risks of the corporate (Kravet and Muslu, 2013), because it can be applied to many other firms. In general, the higher specificity usually means that the company discloses more risk factors related to itself, and the disclosures are more credible.

Stickiness and boilerplate two measures are used to describe the repeatability of the risk disclosures from two different perspectives. Stickiness is used to measure the re-use of the same disclosures from a prior period (Brown and Tucker, 2011). It is calculated as the percent of sticky words to the total words, while the sticky words are those found in sentences containing at least one identical 8-word phrase used in the previous year’s Form 10-K (Dyer et al. 2017). The SEC requires companies to disclose more risk factors they face and to avoid “copying and pasting” from earlier reports (Johnson 2010). Thus, in general, the lower level of stickiness implies a more probability of reliable risk disclosures.

Boilerplate means the percent of boilerplate words within a specific text. In the context of Form 10-K, boilerplate words are those that contain at least one 4-word phrase common to at least 75% of all firms during a particular fiscal year (Lang and Stice-Lawrence, 2015). The SEC notes that companies should avoid “boilerplate” discussion in risk factor disclosure that could affect any issuer (Nelson and Pritchard, 2016). A high level of boilerplate signifies a more generic and standardized disclosure, and the annual reports containing more boilerplate are less informative (Dyer et al. 2017). Thus, in general, the lower level of boilerplate is a sign of a less probability of reliable risk disclosures because disclosure that indiscriminately duplicates many other firms’ disclosure is unlikely to contain important firm-specific information.

Sentiment analysis of textual risk disclosures is also adopted, which is essential and common in text analysis (Agarwal et al. 2016). The financial sentiment dictionary designed for Form 10-K reports’ textual data by Loughran and Mcdonald (2014) is used to study the tone of risk disclosures. When a company is in crisis, managers may use a more negative tone to disclose the risks they face (Kravet and Muslu 2013). Thus, if the risk factors are reliable, their sentiment should change as the risk event develops.

These five textual attributes are used to describe the expression ways of risk factors disclosed in financial reports. By examining the changes of textual attributes when disclosing risk factors as the crisis develops, an initial probe into whether these disclosures are informative and effective can be made.

Examine the informativeness of risk disclosures from the detailed contents

This section further explores whether and how the specific contents disclosed by financial institutions reflect the occurrence of risk events. We adopt the text mining method of Sent-LDA model to extract the risk topics latent in the textual risk disclosure, then design an experiment to quantify the consistency between the risks disclosed in financial reports and subprime crisis dictionary.

Sent-LDA model used to extract risk topics from textual risk disclosures

Public companies in US are required to disclose key risk factors derived from the experiences of senior management under real operating environment. Typically, each risk factor includes a heading and a thorough description. The heading provides an accurate summary of the associated risk. For example, “a downgrade or a potential downgrade in our credit ratings could result in a loss of business and adversely affect our results of operations and financial condition” is one of the risk headings disclosed by ACE Limited in 2018, which clearly outline that the rating downgrade as an essential risk point. By collecting all the risk headings and identifying the risk points that financial institutions attached importance to before, during, and after the financial crisis, it can be found that whether and how the risk factors reflect the real risk environment.

One way of extracting all the risk factors and evaluating their authenticity in different risk profiles is to read through all the fillings to identify the risk points and determine the date of their appearance. However, this approach demands a lot of human effort and is limited by professional ability. Thus, a topic modeling method, named the Sent-LDA model, is introduced to identify the significant risk topics disclosed by all financial institutions.

The topic model is an unsupervised machine-learning technique used to discover the topics in a corpus of documents (Huang et al. 2018; Brooks and Schopohl 2018). The Latent Dirichlet Allocation (LDA), proposed by Blei et al. (2003), is one of the most popular topic modes. This model functions on the premise that each word in a document is generated by initially selecting a topic with a specific probability, followed by choosing a word from that topic according to another probability. It utilizes a three-layer Bayesian probability framework to represent the “document-topic-word” structure (Loughran and McDonald, 2016; Hu et al. 2024). Utilizing a substantial corpus of textual data, the LDA model can autonomously group words that originate from the same topic (Brown et al. 2019; Fei et al. 2022; Li et al. 2022; Zhu et al. 2024). Therefore, the topics latent in a large amount of textual data can be extracted by analyzing the keywords under each topic.

The LDA model is also applicable for identifying the risk topics in Form 10-K. A single risk heading typically reflect one specific risk factor, implying that words within a given sentence are likely derived from one topic (Zhu et al. 2022). However, the traditional LDA model operates on the bag-of-words assumption, and it might lead to each word in a sentence being from different topics. To better fit the characteristics of textual data in Form 10-K, this paper employs the Sent-LDA model introduced by Bao and Datta (2014) for short texts. The Sent-LDA model builds upon LDA by assuming “one topic per sentence,” emphasizing that each sentence within a document pertains to a single topic Fig. 1 illustrates the graphical model of Sent-LDA, and Table 2 explains the meanings of the associated parameters. The generative process of Sent-LDA is detailed below:

  1. (1)

    For each topic \(k\in \{1,\ldots,K\}\), generate a distribution of vocabulary words

    $${\beta }_{k} \sim Dirichlet(\eta ).$$
  2. (2)

    For each document d,

    1. a.

      draw a vector of topic proportions \({\theta }_{d} \sim Dirichlet(\alpha )\).

    2. b.

      for each sentence s in the document d,

      1. i.

        draw a topic assignment \({z}_{d,s} \sim Multinomial({\theta }_{d})\).

      2. ii.

        for each word \({w}_{d,s,n}\) in a sentence s, draw a word \({w}_{d,s,n} \sim Multinomial({\beta }_{{z}_{d,s}})\)

Fig. 1
figure 1

Graphical model of Sent-LDA.

Table 2 The parameters and meanings of Sent-LDA model.

The risk factors section of a single Form 10-K filling is treated as one document and all the documents are merged into the document set. By repeating the generative process steps, the sentences reflecting the same risk are clustered into one topic. Then, by analyzing the high-frequency words under each topic, the financial institutions’ risks in different periods can be obtained.

For the parameters setting, following Griffiths and Steyvers (2004), the hyperparameter α is set to \(50/k\). The topic number k is usually determined by the indicator of perplexity in previous studies. Perplexity is commonly employed to indicate the accuracy of clustering across various topic numbers. Denoting M, \({D}_{test}\), \({w}_{d}\) and \({N}_{d}\) as document number, a test set of documents, word w in \(d{\rm{th}}\) document and total words number of \(d{\rm{th}}\) document respectively, the per-word perplexity is defined as

$$perplexity({D}_{test})=\exp \left(-\mathop{\sum }\limits_{d=1}^{M}\log p({w}_{d})/\mathop{\sum }\limits_{d=1}^{M}{N}_{d}\right)$$
(1)

Lower perplexity usually suggests improved classification performance.

For the training algorithm, Variational Expectation-Maximization (VEM) algorithm is chosen to train the Sent-LDA model following Bao and Datta (2014), as it has demonstrated superior performance. Some parameters associated with the VEM training algorithm must be predetermined. The convergence threshold for variational inference in this study is set to 10−8, the maximum number of VEM iterations is capped at 1500, and the VEM convergence threshold is also set to 10−5.

Quantify the importance of risk topics

The Sent-LDA model is capable of extracting topics, thereby refining the content of risk factor disclosures. This process facilitate the econometric analysis by automatically transforming textual data into numeric variables. After identifying the risk topics, the importance of each risk topic can also be obtained by quantifying each risk topic’s Importance, to analyze the evolution of significant risks disclosed by financial institutions. To be specific, the Sent-LDA model can quantify each topic’s importance by output the variable θd, which represents the proportion of the number of sentences clustered in this topic to the total number of sentences in the document d (Wei et al. 2019). The importance of the topic t is defined as

$$Importanc{e}_{t}=\mathop{\sum }\limits_{d=1}^{D}{\theta }_{d}^{t}$$
(2)

The greater the importance of each topic, the more times this risk topic is disclosed, and it can be considered more critical by financial institutions. We thereby can analyze the evolution of risks that companies disclosed before, during, and after the subprime crisis and check their consistency with the real risk profiles.

Quantify the consistency between risk disclosures and the subprime crisis dictionary

An experiment is further designed to quantitively examine whether the financial institutions have disclosed the risks related to the subprime crisis or not during that period. A dictionary containing the keywords about the subprime crisis is built based on the newspaper articles during the crisis. Then the subprime crisis dictionary is compared with the disclosed risks from financial reports to measure their consistency. If the risk disclosures during or before the subprime crisis highly match with the subprime crisis dictionary, it indicates that financial institutions indeed have noticed the crisis and truthfully write the related risks in their financial reports.

There are several ways to build a dictionary based on a corpus. The most straightforward method is based on the frequency of words. However, the repeated words with high frequency cannot fully reflect the uniqueness of this corpus (Egbert and Biber, 2019). In this paper, we calculate the word’s keyness for identifying the distinctive keywords in a corpus. The key words with high keyness are not just the most frequent words. Keyness enables the analyst to see which words are used significantly more frequently in the specialized corpus than in the reference corpus, reflecting what the text is really about and avoiding trivia and insignificant detail (Scott and Christopher, 2006).

The selection of words for the subprime crisis dictionary is based on their frequency within the word list of the targeted corpus (news from the subprime crisis period), compared to their frequency in the reference corpus (news from periods before and after the subprime crisis). The “keyness” is defined to measure the importance of each word using the log-likelihood test, which is more appropriate than the chi-square test, especially when tackling long texts or a whole genre against the reference corpus. The log-likelihood G2, representing the keyness, is calculated by Eq. (3).

$${G}^{2}=2\mathop{\sum}\limits _{i}{O}_{i}\,\mathrm{ln}\left(\frac{{O}_{i}}{{E}_{i}}\right)$$
(3)

where i equals 1 for the targeted corpus and equals 2 for the reference corpus. The Oi and Ei are the observed frequency and expected value where the word occurs in the corpus. Let Ni denote the total number of words in the corpus. The expected value Ei is calculated by Eq. (4).

$${E}_{i}=\frac{{N}_{i}{\sum }_{i}{O}_{i}}{{\sum }_{i}{N}_{i}}$$
(4)

A word is considered highly significant if it possesses a high keyness value (Egbert and Biber, 2019). For instance, let c represent the word count in the target corpus, and let d indicate the word count in the reference corpus. The frequency of a particular word is denoted as a in the targeted corpus and b in the reference corpus. For a specific word, \({E}_{1}=c\times (a+b)/(c+d)\), and \({E}_{2}=d\times (a+b)/(c+d)\). Then, the keyness of the word is calculated as \({G}^{2}=2\times ((a\times \,{\mathrm{ln}}(a/{E}_{1}))+(b\times \,{\mathrm{ln}}(b/{E}_{2})))\). The higher the G2 value, the more significant is the difference between two frequency scores from the two corpora. Therefore, the more the word is associated with the subprime crisis. Words are considered as the key words of subprime crisis if they pass the log-likelihood statistical tests (\(p\,value < 1\times {10}^{-6}\)).

Assume n words are selected to the subprime crisis dictionary D. The \({j}_{th}\) word in D is denoted as \(wor{d}_{j}\) and its keyness is \(keynes{s}_{j}\). The frequency of \(wor{d}_{j}\) in the textual risk disclosures of financial reports is \(frequenc{y}_{j}\). Then we define “matching score” to measure the consistency between the subprime crisis dictionary and the corpus of risk factors R as Eq. (5).

$$Matching\_scor{e}_{D,R}=\mathop{\sum }\limits_{j=1}^{n}keynes{s}_{j}\times frequenc{y}_{j}$$
(5)

The more consistent the textual risk disclosure with the subprime crisis dictionary, the greater the value of their matching score. By this quantitative measure, we can intuitively judge when financial institutions disclosed more information about the subprime crisis.

Furthermore, we can also calculate the matching score between the subprime dictionary and the specific risk topic identified by the Sent-LDA model. Assume T topics are identified from the risk factors, and the importance of tth topic is \(Importanc{e}_{t}\), which is calculated by Eq. (2). The frequency of \(wor{d}_{j}\) in the tth topic is \(frequenc{y}_{t,j}\). Another indicator \(Matching\_scor{e}_{D,topi{c}_{t}}\) to measure the matching degree between risk topic t and the subprime crisis dictionary is shown in Eq. (6).

$$Matching\_scor{e}_{D,topi{c}_{t}}=Importanc{e}_{t}\times \mathop{\sum }\limits_{j=1}^{n}keynes{s}_{j}\times frequenc{y}_{t,j}$$
(6)

The larger the matching score, the greater the consistency between the risk topic and subprime crisis dictionary is. This indicator can help understand whether critical risk points that financial institutions disclosed accord with the risks of subprime crisis or not from a more micro perspective.

Data description

The empirical data is based on the textual risk disclosures in Form 10-K of listed financial institutions in US, which are obtained from Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. These Form 10-K reports can be downloaded by entering their identifier Central Index Key (CIK) code. According to the Global Industry Classification Standard (GICS), financial institutions that categorized into four sub-sectors of “banks,” “diversified financials,” “insurance,” and “real estate” are determined. There CIK codes can be obtained from the Compustat databases according to the GIC codes. The SEC started to require companies to disclose their risk factors in 2005, and it has been relatively standardized until 2006 (Wei et al. 2019). Thus, using the web crawling technique, a total of 14089 Form 10-K filings of financial institutions from 2006 to 2022 are obtained.

The SEC requires companies to disclose each risk factor using a risk heading and a detailed explanation. These two parts of information are separately extracted. For these financial statements in HTML format, a set of rules is created, including the extraction of bold, italic, underlined words using HTML tag to distinguish the headings and the explanation paragraphs. For TXT files, the capitalization of words in the sentence, the position of the sentence, etc are used to separate these two parts. By applying these criteria, the risk factors of 11,553 fillings are successfully extracted. Then we manually check the causes of 1732 failures. Although this is a time-consuming process, it can ensure that the data is comprehensive and reliable. Risk factors of 368 fillings are manually obtained during this process. The remaining 1364 financial statements are still unsuccessful for three reasons: 1) Small firms are not required to disclose risk factors. There are 975 firms in this case, accounting for 71.5% of the total; 2) The disclosure format of 345 firms is very irregular. The headings and content of the risk factors cannot be distinguished; 3) The remaining 44 firms do not disclose risk factors in the specified position.

The final samples of this paper include 3,344,335 risk factors from 14089 Form 10-K statements of 1685 financial institutions from 2006 to 2022, and the numbers of fillings and risk factors for each year are presented in Table 3. The empirical samples in this paper have been significantly expanded compared to prior relevant studies that examine the quality of Form 10-K, which consists of 59 banks’ 236 Form 10-K filings throughout 2006–2009 (Daniel and Meriem, 2016), and 134 banks’ 2627 Form 10-K filings over 2006–2012 (Azmi et al. 2020).

Table 3 Samples used in empirical studies.

To examine whether these risk factors embody the risk evolution of subprime crisis event or not, the samples are further divided into three parts, i.e. before, during, and after the subprime crisis. By referring to Zhu et al. (2015), the subprime crisis period is set to be 2008 to 2010 in this paper. The samples before the crisis (2006–2007) and during the crisis (2008–2010) consist of 1841 and 2760 Form 10-K filings, respectively. To ensure the balance of data volume at each stage, risk factors disclosed from 2011 to 2013 are selected as the post-crisis samples, consisting of 2609 Form 10-K filings.

Results

In this section, first, we analyze how the textual attributes of risk factor disclosures change over time. Then, whether the disclosed risks are consistent with the financial crisis is examined. Furthermore, the differences in the risk disclosures between companies that survived and failed during the crisis are analyzed.

Changes in textual attributes with the evolution of the subprime crisis

In this section, five textual attributes, including length, specificity, stickiness, boilerplate, and sentiment, are used to make the ways of disclosing risk factors quantizable. The trends of the disclosure ways of the risk factors over time are analyzed.

Figure 2 provides the mean, median, and quartile values and growth rate of the five textual attribute values from 2006 to 2022. The primary textual attribute of length demonstrates a clear and consistent increasing trend over time, which indicates that financial institutions are using longer and longer text to describe the risk factors. Considering the growth rate, the most significant increases in length are in 2009, which means that the financial crisis saw a substantial increase in the size and volume of risk factor disclosures in that year.

Fig. 2
figure 2

Trends of expression ways of the risk factor over time.

The textual attributes of specificity, stickiness, and boilerplate are measured to examine the quality and informativeness of the risk factor disclosures. Figure 2 shows that the growth rate of specificity has increased significantly in 2009, while the change rate of stickiness and boilerplate decreases substantially during the financial crisis, especially in 2009. The increase of specificity between 2008–2009 reveals that the financial crisis prompted firms to disclose more risks related to themselves. Besides, despite observations from the 2011 Klynveld Peat Marwick Goerdeler (KPMG) survey and regulatory authorities that risk factor disclosures, once included in a company’s Form 10-K filing, are rarely removed from future filings (Beatty et al. 2018), the significant decrease of stickiness means that firms reduce the re-use of the same disclosures from a prior period. It further reveals that they tend to disclose risks they were facing during the financial crisis, not just adopt muddle through their work using previous disclosures.

As for boilerplate, there is a significant decrease in the change rate in 2008. Hans Hoogervorst, the chairman of the IASB, has identified the use of boilerplate language as a primary concern, and points out that such language is unhelpful when it merely serves to minimize legal or reputational risks without providing meaningful disclosure (Lang and Stice-Lawrence 2015). Although we cannot rule out that the publication of regulatory documents may reduce the boilerplate, the findings in this paper also show that when the risk situation changes, the financial institutions tend to reduce the template and disclose more risk information.

Finally, the sentiment reflecting the tone of risk disclosures is measured. As shown in Fig. 2, the sentiment significantly decreased in 2009 and remain stable during the financial crisis. The findings reveal that when the risk environment changes, financial institutions will use more negative words to inform investors of their risks.

Overall, financial institutions tend to disclose their risks in the more specific and negative language during the financial crisis and significantly reduces stickiness and boilerplate. This reflects that the ways of disclosing the facing risks can reflect the significant changes in the external risk environment. Our first empirical research provides initial evidence that the risk factor disclosures in Form 10-K are not invariable documents. They reasonably changed the expression ways to convey more risk information when the subprime crisis erupted.

Changes in risk topics with the evolution of the subprime crisis

Section “ Changes in textual attributes with the evolution of the subprime crisis” analyzes the change of textual attributes of risk factors when the financial crisis occurs. This section further explores whether and how the detailed contents disclosed by financial institutions reflect the occurrence of risk events. The risk topics are discovered from the textual risk disclosures and their changes in importance are qualitatively analyzed as the subprime crisis evolved. Then the consistency between the risk disclosures and subprime crisis dictionary is further measured quantitatively.

Risk topics identified from the risk disclosures in Form 10-K

Identifying the significant risks disclosed by financial institutions is the first and essential step to extract the detailed contents of risk disclosures in Form 10-K. The Sent-LDA model is adopted to identify the risk topics disclosed in Form 10-K before, during, and after the subprime crisis. The samples in each stage are, respectively, input into the Sent-LDA model.

Firstly, the range of the optimal number of topics needs to be determined by calculating the perplexity based on Eq. (1). Following Blei and Lafferty (2007) and Bao and Datta (2014), this paper adopts tenfold cross-validation to compute held-out per-word perplexity. Perplexity consistently decreases as the number of topics increases, making it a monotone decreasing function (Bao and Datta, 2014). To identify the optimal number of topics, it is suitable to select a number of topics that is at or beyond the point where perplexity starts to converge.

Figure 3 shows the perplexity values under the different number of topics. The values of perplexities begin to converge at around 140 and tend to be steady. Thus, an appropriate number of topics is approximately 140. We further check the clustering results for topic numbers 120, 125, 130, 135, 140, 145, and 150. Our findings indicate that clustering performs optimally when the number of topics is 140. Hence, 140 is set as the number of topics for each of the three periods.

Fig. 3
figure 3

The perplexities with different number of topics.

By applying the Sent-LDA model to the samples over three periods, 140 topics and corresponding keyword lists are obtained, respectively. Then, we need to identify what risk point that each topic represents by naming and labeling each topic according to the keyword list. There are some automatic methods to mark the topic. However, in most works, the application of the topic model primarily concentrates on a particular domain that necessitates specialized expertise, and manual labeling techniques have been demonstrated to achieve greater accuracy (Bao and Datta, 2014). Thus, we label the topics manually by analyzing the high-frequency keywords and determine the specific names of topics by reference to the relevant keywords in Campbell et al. (2014). For the 140 topics identified in each period, it is important to note that while some topics have differing keyword lists, they share similar high-frequency words and represent the same type of risk. These risk categories are essentially the same topic but are classified into different groups during the model’s execution, and such topics can be labeled the same and merged into one topic. Additionally, some topics are difficult to identify with a clear meaning, which are categorized as “others”.

Finally, 28, 28, and 30 nonredundant risk topics are identified in each stage. Among them, 24 risk topics (excluding others) are identified in all three different sets of samples, while 2 (tenant bankruptcy and hedging transaction), 2 (debt risk and deposit risk), 3 (goodwill impairment, third party risk, and foreign exchange risk) risk topics appearing only in the pre-crisis, mid-crisis and post-crisis stages respectively. The risk topic of “reputational risk” appears in the period before and after the subprime and the “tax risk” appears in the period during and after the subprime crisis. As a result, a total of 32 different risk topics are identified from the risk factor text of the three stages.

To present the identified topics more intuitively, word clouds are employed to illustrate the high-frequency keywords associated with each topic. Each word cloud includes the top 25 most frequent words, with larger font sizes representing higher probabilities of occurrence within the topic. Figure 4 shows the word clouds for the 32 non-repetitive risk topics identified by the Sent-LDA model. For example, the fourth-word cloud in the first line shows that the words “common”, “stock”, and “price” appear the most frequently, which means that many risk factors expressing risks associated with stock price fluctuations. Thus, this risk topic is labeled as “common stock fluctuations”. To ensure that the risk topic is reasonably labeled, we have checked the risk headings assigned to each topic and find that all these topic labels can reflect the risk described in the headings well.

Fig. 4
figure 4

Word clouds of risk topics identified from the textual risk disclosures in Form 10-K.

The changes in risk factors with the evolution of the subprime crisis

After identifying the risk topics disclosed by the financial institutions, the importance of each risk topic can be quantified, which makes it possible to analyze whether the risk perception and assessment of financial institutions are along with the evolution of the financial crisis and reflect the real risk environment at that time.

The Sent-LDA model can identify potential topics in large amounts of textual data and trace back to which topic each sentence was assigned to. Therefore, the Sent-LDA model is capable of quantifying the importance of each risk topic according to Eq. (2). This equation calculates the proportion of risk headings grouped under a specific risk topic relative to the overall number of risk headings. A higher ratio indicates that this risk is disclosed more frequently, and is given greater importance by financial institutions. The importance of risk topics disclosed by financial institutions before, during, and after the financial crisis are presented in Table 4.

Table 4 Proportions of risk topics in three period.

The subprime crisis has passed for some years. The scholars, practitioners, and regulators have made a great effort to investigate its origins, development, and end. Now the whole risk event is clear. We can analyze whether the identified risk topics from textual risk disclosures are consistent with the subprime crisis’s accepted risk perception. Figure 5 shows the dynamic transformation in each risk topic’s importance to display the changes of risk topics over time.

Fig. 5
figure 5

Changes in risk topics before, during and after the crisis.

The findings in Fig. 5 provide evidence that the changes in risk topics identified from Form 10-K can reflect the financial crisis’s evolution in general. The seeds for the crisis were sown prior to 2008. Before 2006, the U.S. subprime mortgage market experienced rapid growth, driven by the ongoing boom in the housing sector and historically low interest rates. However, with the cooling of the housing market and short-term interest rates increased significantly, the repayment rates of subprime mortgages also rose sharply. Consequently, the burden of repayment of home buyers increased considerably. Simultaneously, the ongoing cooling of the housing market further complicated efforts for homebuyers to sell their properties or refinance through mortgage options. This scenario directly resulted in borrowers of large-scale mortgages being unable to repay their loans punctually, ultimately contributing to the subprime crisis.

Reflected in the risk topics identified before the financial crisis over 2006–2007, the loan loss is the most frequent risk topic disclosed. Loans originated in 2006 and 2007 exhibit a notably higher delinquency rate at every mortgage loan age compared to those originated in prior years at the same stages (Yuliya and Otto, 2009). This means that during the relatively prosperous period of mortgage lending before the financial crisis broke out, financial institutions realized that there might be a hidden risk underlying a large number of mortgage loans. Another risk topic frequently mentioned before the financial crisis is interest rate fluctuations, which are ranked third and are consistent with the continued raising of interest rates. This had revealed to some extent that financial institutions had recognized that the continual fluctuation of interest rates would bring potential risks before the financial crisis broke out and disclosed it as an essential risk factor in Form 10-K.

Besides, when the economy was relatively prosperous, another notable phenomenon of the U.S. market was excessive financial innovation. Financial institutions created many financial derivatives, such as asset securitization, and sought more profits through continuous leverage (Bessler and Kurmann, 2014; Hasan and Manfredonia, 2022). This is also reflected in the identified risk topics. Two risk topics labeled product and service problems, and competition risk are disclosed before the financial crisis ranked 9 and 10, but show a significant downward trend during and after the financial crisis. The other frequently declared risk topics, such as real estate risk, macroeconomic risk, and funding risk, are also typical risk factors on the eve of the crisis.

During the period of subprime crisis, the financial market was turbulent, which is also directly reflected in the frequently mentioned risk topics. Financial institutions repeatedly alerted investors that they were encountering market risks. The macroeconomic risk rises significantly in importance from sixth to second place. Moreover, investors pulled their money out of the markets, and credit institutions drastically decreased lending to limit losses, which resulted in a capital shortage (Andersen et al. 2012, Marshall et al. 2019, Carlson and Macchiavelli, 2020). Fahlenbrach et al. (2012) also find that banks heavily dependent on short-term funding sources, such as deposits, fared poorly during the crisis. Reflected in risk factor disclosures, the risk topic of funding risk shows a significant upward trend. Besides, the sharp decline in market liquidity had made banks reluctant to lend, so that business operations in the real economy faced difficulties. The shrinking of the real economy, in turn, led to a decline in demand for loans (Singh et al. 2022). This detail also demonstrats that the risk factors related to the loan loss are the most frequently disclosed during the financial crisis.

The severe consequences made regulators aware of the urgency of strengthening financial institutions’ supervision. A series of laws and regulations were promulgated to constrain firms’ misconduct. As shown in Fig. 5, we observe elevated risks for compliance risk and litigation risk compared with the period before the financial crisis. Furthermore, downgrade risk, disclosed in great frequency during the financial crisis, also directly reflects the importance that financial institutions are attached to. A major issue revealed by the financial crisis is the global financial system’s heavy dependence on external credit ratings for investment decisions and risk management. This excessive reliance on credit ratings fosters significant moral hazard (Mullard 2012). After the financial crisis, stricter regulation has gradually improved the credit rating system. Thus, the change curve of downgrade risk shows a continuous upward trend, which indicative of worries by some financial institutions regarding the downgrade risk after the crisis.

In the period from 2011 to 2013, the most frequently disclosed risk topic is compliance risk. The subprime crisis exposed the financial supervision system’s defects after the crisis; the U.S. undertook the most significant overhaul of its legal and regulatory regime. The need for substantial support from the government and central banks ignited a vigorous debate about the adequacy of current bank risk management standards and the ideal approach to bank regulation (Bessler and Kurmann, 2014). Thus, financial institutions are increasingly disclosing risk factors related to regulatory and legal risk against a backdrop of increased regulation. Besides, financial institutions’ internal control is not valued until after the financial crisis, which jumped from the original 17th to 11th place. In the aftermath of crisis, the academics, industry, and regulators have all reflected upon its root causes. One of the factors identified is that the crisis stemmed from a widespread inability within the industry to manage risk effectively, particularly operational risk (Andersen et al. 2012). The topic of operational risk identified from the Form 10-K may capture concerns about the bank’s ability to manage increased risk exposure. This paper’s findings suggest that financial institutions pay more attention to operational risk, mainly focusing on their internal controls, and honestly disclose this risk factor. Another risk topic worthy of attention is information technology risk, which has risen from 15th in the pre-crisis period to 18th and is closely related to the continuous advancement of technology.

Finally, we focus on the particular risk topics identified in samples of different stages. The tenant bankruptcy and hedging transaction are unique to the pre-crisis phase. The risk topic of tenant bankruptcy shows high elevation before the financial crisis. This topic has component words such as “penalties” and “tenant”, which capture borrowers’ propensity to refinance existing loans. This could be attributed to worries regarding mortgage-backed securities and the liquidity of various short-term assets (Hanley and Hoberg, 2019). Besides, the topic of “hedging transaction” is specific before the subprime crisis, which is related to the two Bear Stearns’ hedge funds’ bankruptcy filing on July 31, 2007. The debt risk and deposit risk are unique to the period from 2008 to 2010. The third-party risk, goodwill impairment, and foreign exchange currency are emerging threats in the stage of 2011 to 2013. It may still be relevant for the U.S. due to globalization and increased competition among banks, and heightened cross-border activities (Bessler and Kurmann, 2014).

To sum up, by analyzing the changes of risk topics extracted in Form 10-K over the periods before, during, and after the subprime crisis, our findings show that, in general, the changes of specific risk topics are consistent with the risk evolution of the subprime crisis. It provides some evidence that the risk factors in Form 10-K are not meaningless and boilerplate words but can reflect the critical risks in a particular risk environment. The financial institutions have credibly disclosed their risk factors at a macro level.

The consistency between risk disclosures and the subprime crisis dictionary

The above section is the qualitative analysis of risk factors’ evolution. We also design another experiment to quantitively examine the informativeness of risk factors. By creating a dictionary of the subprime crisis based on news data, the matching scores between these keywords related to the subprime crisis and the risk factors disclosed in their financial reports are measured.

The subprime crisis dictionary is created based on news coverage published in the U.S. A total of 2515 daily news articles are downloaded from the finance, economics, and risk management section of the Wall Street Journal, the New York Times, USA Today, and the Washington Post from the Factiva from January 1, 2008, to December 31, 2010. Also, 1500 articles from the same sections in other years (2006–2007, 2011–2013) are downloaded as the reference corpus. Then, the keywords with different keyness values can be obtained using the method introduced in section “Quantify the consistency between risk disclosures and the subprime crisis dictionary”. Table 5 presents 30 typical words and their keyness in the subprime crisis dictionary.

Table 5 Examples of words and their keyness values in subprime crisis dictionary.

The matching score between the risk disclosures in different periods and the subprime crisis dictionary is calculated based on Eq. (5), and the results with different numbers of words in the dictionary are presented in Fig. 6. It shows that as the number of the top keyness words increase, all the scores become larger. However, matching scores between the textual risk disclosure samples during the subprime crisis and the dictionary is always the highest. This indicates that companies’ risk disclosure is more consistent with the dictionary during the subprime crisis compared with the period before and after the crisis. This is quite reasonable and indicates that the financial institutions indeed have disclosed many factors about the subprime crisis during that period. Besides, the matching scores before the subprime crisis with different numbers of keywords are always larger than that after the subprime crisis, which demonstrates that the risk factors disclosed before the crisis are closer to the subprime crisis dictionary and further provides evidence that financial institutions have realized some risks related to the upcoming subprime crisis.

Fig. 6
figure 6

The matching score of risk factors disclosures and the subprime crisis dictionary.

The significance of differences in matching scores is further statistically verified by Student’s t-test. The null hypothesis of the t-test is the mean of two lists of numbers is the same. If the P value is larger than the threshold (usually 5% or 1%), then the null hypothesis can be accepted, otherwise, the null hypothesis is rejected. The mean, median, and variance of the matching scores are in Fig. 6 and the t-test results are shown in Table 6. The p-values of the t-test for the scores before and during the crisis is 0.00006, which is much smaller than the threshold. So it is verified that the matching score before the subprime is significantly smaller than the scores during the crisis. Similarly, the scores during the crisis are statistically significantly larger than the scores after the crisis.

Table 6 The statistical description of matching scores and results of the t-test.

The consistency between the subprime dictionary and the specific risk topic identified by the Sent-LDA model is also calculated. The number of words in the dictionary is set to 240, where the matching score is highest. In section “ The changes in risk factors with the evolution of the subprime crisi”, the \(Importanc{e}_{t}\) of each topic in three periods is calculated based on Eq. (2) and presented in Table 4. Then the matching scores between each risk topic and the dictionary in the different periods are calculated based on Eq.(6). To show the consistency between risk disclosures and the subprime crisis dictionary more visually, the thermodynamic diagram is used to display the matching score between each risk topic and the subprime crisis dictionary in different periods. The results are shown in Fig. 7. Different color blocks represent different values of matching scores, and the darker the red is, the higher the consistency between this risk topic and dictionary is.

Fig. 7
figure 7

The matching score between risk topic and the subprime crisis dictionary.

Before the subprime crisis, the matching score of common stock fluctuations, loan loss, interest rate fluctuations, product and service problems, real estate risk, etc. are much larger, meaning that financial institutions are perceived these risks related to the subprime crisis. During the subprime crisis, the matching score between risk topics, including common stock fluctuations, litigation risk, compliance risk, macroeconomic risk, funding risk, downgrade risk, and the subprime crisis dictionary are larger, which are consistent with the evolution analysis in subsection of “ The changes in risk factors with the evolution of the subprime crisi”. When the subprime crisis subsides, the matching scores are lower than the prior period generally, meaning that the financial institutions are less likely to disclose risk factors related to the subprime crisis than they were at the previous stage.

In general, we find that the matching score between the disclosed risk factors during the subprime crisis and the subprime crisis dictionary is the largest, the score before the crisis is the second, and the score after the crisis is the lowest. This finding provides quantitative evidence that, before and during the subprime crisis, financial institutions perceived some risks related to the subprime crisis thus disclosed them in Form 10-K. The risk factors in Form 10-K are indeed related to the true risk profiles and are informative when a crisis is approaching or occurred.

The differences in risk factors between the failed and survived firms

The above analysis shows that both the expression ways and the contents of risk factors can reflect the financial crisis’s evolution well. This section further studies whether there is a difference in the disclosure between the survived and failed companies during the subprime mortgage crisis. The firms that still listed after the subprime crisis are named “survived companies” that sustain listing by gaining more acceptance by investors and generate better benefits. One of the worst consequences for both companies and their shareholders is involuntary delisting, which is a forced exist from the listing exchange (Charitou et al. 2007). The involuntary delisting companies in crisis are regarded as the failed companies. Based on the textual attributes and risk topics identified above, the logit regression model is employed to examine which type of companies tend to disclose the risk factors related to the subprime crisis.

The \(F-SC\) denotes the dependent variable. It is a dummy variable that takes the value 0 if the financial institution survived the subprime crisis, and takes the value 1 if the financial institution failed in the subprime crisis. Then the logit regression model is explained by Eq. (7).

$${\mathrm{ln}}\left(\frac{p(F-SC=1)}{1-p(F-SC=1)}\right)={b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\ldots+{b}_{n}{x}_{n}+c$$
(7)

where \(p\) is the failure probability of a financial institution and \(1-p\) is the survival probability, \(p/1-p\) is the odds ratio representing the probability of failure over survival, and \({b}_{0},\mathrm{..}.,{b}_{n}\) denotes the estimated regression parameters of the selected variables. The coefficients \({x}_{1},\mathrm{..}.,{x}_{n}\) indicate the independent variables (Heidinger and Gatzert, 2018).

The independent variables incorporate both the risk information from textual disclosure (5 textual attributes and 32 risk topics) and the control variables (4 financial ratios). The financial ratios are size, cash, leverage, and RoA. They are selected by reference to Lu and Whidbee (2013), which exmaines the bank structure and failure during the subprime crisis. The size is the natural log of total assets. The cash measures the availability of cash to pay the debt of the firm and is the sum of cash and securities, scaled by beginning total assets (Che et al. 2020). The leverage means a firm’s ability to repay long-term debt by measuring the level of loans, represented as total liabilities divided by total assets. The RoA is a profitability metric that assesses how efficiently a company uses its assets and controls its expenses to generate a satisfactory return, calculated as net income divided by total assets. The data of financial ratios for each firm are obtained from CRSP and Compustat database. The textual attributes are length, specificity, stickiness, boilerplate, and sentiment from section “Changes in textual attributes with the evolution of the subprime crisis”. The 32 risk topics identified in section “Risk topics identified from the risk disclosures in Form 10-K” are regarded as 32 variables. When the firm discloses a specific risk topic, the corresponding variable is recorded as 1; otherwise, it is recorded as 0. Finally, the binary logistic regression is given by Eq. (8).

$$\begin{array}{l}{\mathrm{ln}}(\frac{p(F-SC=1)}{1-p(F-SC=1)})={b}_{1}Length+{b}_{2}Specificity+{b}_{3}Stickiness\\\qquad\qquad\qquad\qquad+\,{b}_{4}Boilerplate+\,{b}_{5}Sentiment+{b}_{6-34}Risk\_topic\\\qquad\qquad\qquad\qquad+\,{b}_{35}Size+{b}_{36}Cash+{b}_{37}Leverage+{b}_{38}RoA+c.\end{array}$$
(8)

The failed companies are selected based on the CRSP share codes in 2011. Firms of involuntary delisting are with delisting codes ≥400 (excluding 570 and 573), and the firms still active after the subprime crisis are with delisting codes = 100 (Chaplinsky and Ramchand, 2012). After preparing all the samples, we first, conduct a multicollinearity test. All the variance inflation factor (VIF) values of variables are less than 10, and pass the multicollinearity test. Then the binary logistic regression is conducted and Table 7 shows the results.

Table 7 The estimation results of logistic regression.

Form Table 7, the preliminary test result of the factors motivating the dropped delisting companies are presented. A total of 18 variables are statistically significant. The regression results show that firms with a larger size, higher RoA, and lower leverage are significantly more likely to survive the subprime crisis. From the textual attributes, firms that failed in the financial crisis tend to disclose their risk factors using less boilerplate words and more negative words. Besides, the sentiment has the most substantial effect (negative), followed by the leverage (positive), boilerplate (negative), RoA (negative), cash (positive), and size (negative). Chen et al. (2021) also discovered that higher liquidity risk reduced a bank’s chances of survival, ROA, and net interest margin during the subprime crisis from 2007 to 2009, while simultaneously increasing its loan-loss-provision expenses. This negative impact was especially evident in banks with higher credit risk and lower capital ratios.

The regression coefficients of the risk topics indicating that the failed intuitions tend to disclose their problems more frequently regarding loan losses, real estate risk, product and service problems, credit risk, hedging transaction, and so on, which are considered the high relevant risks to the subprime crisis. The findings align with Hanley and Hoberg (2019), who find that banks with greater risk exposure before the crisis are more likely to fail afterward. Avinoam and Alon (2023) also conduct textual analysis, revealing that during the crisis, banks have already shifted their report focus from market risk to credit and liquidity risks. Elsayed and Elshandidy (2020) develop a lexicon related to corporate failure and show that narrative disclosures provide additional explanatory power in predicting corporate collapse. Their research also indicates that narrative disclosures about corporate failure can significantly forecast a firm’s downfall up to two years in advance. It is natural for the delisting financial institutions facing more subprime crisis-related risks because they failed because of the crisis. The survived institutions face fewer crisis-related risks, so their textual risk disclosures contain less information about the crisis event. Both the failed intuitions and the survived institutions have truthfully disclosed their risk factors in financial reports; therefore, we can reach the regression results that involuntary delisted firms tend to disclose more crisis-related risks than the survived firms.

The results of logistic regression show that failed firms have reported more risk factors related to the subprime crisis using more negative words. It also provides evidence that the risk factors in Form 10-K are not meaningless but can genuinely reflect the bad situation faced by some companies in crisis.

Risk topics identification in recent years

The global financial crisis reminds regulators to comprehend the sources of emerging risks and to take preemptive actions to prevent these risks from causing economic instability (Hanley and Hoberg, 2019). However, assessing and predicting the significant risks in the future is a great challenge. Compared with other accounting disclosures that mainly describe companies’ past performance or events, the risk factor section in Form 10-K is required to be forward-looking by the SEC. The experiments in Sections “ Changes in textual attributes with the evolution of the subprime crisis” to “The differences in risk factors between the failed and survived firms” provide evidence that the risk disclosures in Form 10-K are credible and convey valuable risk information of the listed firms to the market, so these risk disclosures can shed new light on the financial industry’s risk identification and monitor.

This section further utilizes these forward-looking risk disclosures to assess the significant and emerging risks of the U.S. in recent years. Hanley and Hoberg (2019) also make a pioneering contribution to develop an interpretable methodology to detect emerging risks. Based on the risk factors disclosed in recent years (2020–2022), the major and emerging risks are identified using the Sent-LDA model, which can aggregate perceptions of all senior managers to provide a reference for the regulators and companies.

Based on the risk disclosures in Form 10-K from 2020 to 2022, 33 risk topics are identified. We restrict attention to prominent and emerging risk topics. The top 5 risk topics with the highest importance and the 6 emerging risk topics that haven’t been identified in previous textual risk disclosure are displayed by word cloud in Fig. 8. Different from other periods, the investment risk is the most frequently disclosed risk topic in recent years, and its proportion is significantly higher than other risk topics. This means that companies may face higher investment risks in recent years’ uncertain environment. It is worth noting that the loan loss and compliance risk are still frequently disclosed risk topics. In addition, common stock fluctuations becomes the fourth risk topic in importance, and the information technology risk is the fifth risk topic.

Fig. 8
figure 8

The top 5 risks in importance and emerging risks in recent years.

Some new risks are emerging from 2020 to 2022, including liquidity risk, LIBOR changes, instituition soundness risk, accounting estimates risk, and transaction mode changes. Particularly, the pandemic risk is identified, which has not been identified before 2020. Different from the previous stages, which focused on the risk topic of capital requirement, liquidity risk is identified based on the risk disclosures for 2020–2022. In particular, the proportion of liquidity risk accounts for 2.34%, meaning that more financial institutions are exposed to liquidity risk. For the risk topic of LIBOR changes, at the beginning of 2017, the eight-year-long global monetary easing cycle came to an end, and the post-era monetary easing environment of the financial crisis underwent a historical change, with the major world economies led by the United States gradually stepping into the rhythm of interest rate hikes. In the context of the world economy’s rising interest rates, the LIBOR rate change is bound to be affected (Hanley and Hoberg, 2019). Thus, the LIBOR changes are identified for the first time.

In the these three years, especially in 2020, the world has been affected by the COVID-19 pandemic. Loughran and McDonald (2023) analyze the risk factors disclosed from 2018 to 2019, before the COVID-19 outbreak, and find that less than 21% of them mention pandemic-related terms. However, it is worth noting that pandemic risk is clearly identified from the risk factors disclosed in 2020 to 2022, and the importance of pandemic risk accounts for 1.83%. These findings reveal that many companies have indeed disclosed the pandemic risk as an important risk factor to investors, and acknowledged its negative impact on the company’s business. Furthermore, affected by the COVID-19 pandemic, many companies are facing varying degrees of difficulties, so it is the first time to identify the risk topic of institution soundness risk. The transaction mode changes may be also related to the shift of many businesses from offline to online under the impact of the COVID-19 pandemic (Zhu et al. 2021; Wang et al. 2022). These findings also reflects that the company’s risk factors disclosed in the annual report is informative and timely to some extent. Overall, the results provide a new perspective for assessing current significant risks and facilitate regulators to understand the emerging risks in recent years.

Conclusion

This paper offers new evidence regarding the considerable controversy about the informativeness of risk factors in Form 10-K as required by the SEC. Based on the real risk event of the subprime crisis, this study innovatively constructs a text mining framework to examine whether the risk factors disclosed in financial reports of U.S. financial institutions are consistent with the actual risk profile before, during, and after the subprime crisis. The empirical study is based on 14089 Form 10-K reports from 1685 U.S. financial institutions, and the major findings are as follows.

Firstly, we find that financial institutions tend to disclose their risks in more specific and negative language and significantly reduce stickiness and boilerplate during the subprime crisis. This reflects that the expression ways of disclosing the facing risks can reflect the significant changes of the risk environment. Secondly, for the detailed contents of the risk disclosures in Form 10-K, both qualitative analysis and quantitative analysis based on our developed subprime crisis dictionary find that the changes of the disclosed risks are consistent with the risk evolution of the subprime crisis. This shows that when a risk event approaches or occurs, the financial institutions have indeed disclosed the risk information in their financial reports. Thirdly, we find that the failed institutions due to the subprime crisis have reported more risk factors related to the crisis and used more negative words. This reflects that the failed ones didn’t hide their risks and prevent managers from disclosing them. The risk factors in Form 10-K are not meaningless but can genuinely reflect the bad situation faced by some companies in trouble.

All the findings provide evidence from different perspectives that the risk factors in Form 10-K are not invariable boilerplate and can reflect the actual risk profiles. Therefore, we conclude that on the whole, the risk factors disclosed by financial institutions are informative. This carries important implications for researcheres, investors, and regulators. Nowadays most of the studies reply on using qualitative data to identify, measure and analyze the potential risks. The unstructured text data can also be used as an important supplement to study risk because they contain abundant risk information. Especially, the textual risk disclosures in financial reports may be reliable data sources to assess and predict the potential risks at present and in the future. Other countries can refer to the U.S. and require the companies to disclose their risks in financial reports, to ease the information asymmetry problem in the financial market.

This study is not without limitations. For example, this paper focuses on studying whether the risk factors can reflect the main risks faced by the companies from a perspective of the whole financial industry, which is a relatively macro perspective, rather than examining whether a company has truly and accurately disclosed its risk factors from the individual level. This more refined text analysis focusing on the individuals may be a research direction in the future. In the future, more detailed and dynamic risk identification and assessment can be carried out based on this valuable and forward-looking textual risk data source in Form 10-K.