Introduction

Historic gardens and interdisciplinary learning

In its first article, the International Charter of Florence defines a historic garden as "an architectural and horticultural composition of interest to the public from the historical or artistic point of view" (ICOMOS 1982). This brief definition focuses primarily on the heritage aspect. Historic green spaces such as landscape parks or baroque gardens are seen as significant elements of cultural tradition. Consequently, the charter insists that historic greenery be conserved just like built monuments. According to the charter, the general architectural structure of the respective facilities ought to be maintained; the botanical composition itself is seen as a historic good (ibid.). Garden heritages are characterised by typical design features such as planned groups of trees, ornamental beds or historic plant collections. The composition of these elements is regarded as a historically grown artwork. Furthermore, it has been argued that gardens should not only be seen as elements of culture, but also as natural spaces. The close connection of human care and natural influence is considered a crucial characteristic of (historic) garden areas (Cooper 2006). Against this background, it has been argued that historic gardens may offer habitat structures and fulfil valuable ecological functions (Carrari et al. 2022). The biodiversity of horticultural heritages may even surpass that of "usual" cultural landscape sites (Lõhmus and Liira 2013). Such green spaces mitigate hot summer temperatures in urban areas. Moreover, they can promote residents' health and well-being (Fineschi and Loreto 2020; Carrus et al. 2015; Donovan 2017). Consequently, nature protection and heritage cultivation may both be seen as crucial objectives of garden conservation. However, conflicts between the two principles may also become an item of public discourse (Salwa 2014).

In light of this dynamic interaction of ecology, aesthetics and historicity, old garden areas can potentially serve as interdisciplinary learning places, where environmental education may be connected with historical or aesthetic content. However, the didactic possibilities associated with historic gardens have hardly been investigated. From a practical point of view, it would be crucial to design adaptive learning concepts for this purpose. From the perspective of educational sciences, it would be of particular interest to investigate these programs' impact on psychological constructs such as attitudes (American Educational Research Association 2014; Cronbach and Meehl 1955). Research has shown that educational offers in natural settings such as botanical gardens may have a positive influence on participants' attitudes and behavioural intentions (Sellmann and Bogner 2013; Zelenika et al. 2018). It remains to be shown to what extent these findings may be transferred to historic gardens. Furthermore, the efficacy of different educational programs in garden heritages (e.g. guided tours or self-guided exploration) could be analysed and compared systematically.

It has been shown that young people in particular seem to prefer managed park landscapes over "naturalistic" areas (Lückmann et al. 2013). Consequently, it may be hypothesised that historic garden areas are appropriate spaces for early nature contact and environmental education. To confirm this assumption, evaluative studies are required. However, the development of all this research highly depends on the availability of suitable psychometric measurement tools. The current study addresses this need by developing a scale measuring attitudes towards historic gardens.

Theoretical explanation and evaluation of attitudes

In social psychology, attitudes are generally defined as evaluative convictions (Ajzen 2001; Bohner and Dickel 2011). These evaluations are seen as personal characteristics that potentially influence people's actions. Furthermore, attitudes are understood as latent constructs, not directly observable but implicitly efficacious (Hogg and Vaughan 2022). They generally refer to specific attitude objects, i.e. definable parts of a person's environment. Such objects can include concrete items as well as abstract terms, single persons, social groups or inanimate entities (Bohner and Wänke 2002). According to an influential definition, attitudes may be divided into three decisive components: affective aspects (evaluative emotions concerning the attitude object), cognitive considerations (evaluative beliefs) and behavioural tendencies with regard to the object (Maio et al. 2019). This definition of the construct refers to a philosophical tradition that distinguishes between thought, feeling and action in describing the fundamental character of human experience (McGuire 1989).

The three-component model can be illustrated with a short example: People's attitudes towards specific landscape sites may, on the one hand, include feelings or emotions regarding this part of the environment (for example, a sense of joy one might feel when observing a characteristic landscape). On the other hand, they may include judgments (for example, the opinion that particular landscape sites are valuable parts of nature because they offer important habitat structures). Finally, they may encompass the willingness to behave in a characteristic manner (e.g. donating for landscape protection). The construct's tripartite structure has been validated in previous research (Breckler 1984). However, the multicomponent model is also disputed. Alternative approaches claim only one or two construct dimensions (Chaiken and Stangor 1987). For example, theories on environmental attitudes distinguish between preservative and utilitarian perspectives (Milfont and Duckitt 2004; Bogner and Wiseman 2002). The relationship between manifest behaviour and attitude has been questioned as well (Schwarz 2008; Ajzen and Cote 2008). Apparently weak correlations between latent evaluations and behavioural patterns have long been discussed in social psychology. For example, it has been stated that weakly pronounced attitudes are unlikely to have an impact on people's actions; in these cases, different factors such as social conventions could serve as behavioural predictors (Smith and Louis 2009). Nevertheless, recent studies have adhered to the tripartite construct model (Gonulal 2019; Kaiser and Wilson 2019). Furthermore, theoretical approaches have been developed in order to explain under which conditions latent evaluations may lead to manifest action (Ajzen and Fishbein 1980). Several crucial factors have been identified, such as an attitude's accessibility or its stability over time (Hogg and Vaughan 2022; Doll and Ajzen 1992; Glasman and Albarracín 2006).
Well-established theories on the relationship between behaviour and attitude posit an indirect interaction, treating intentions as mediating factors between latent evaluations and concrete action (Ajzen 2001, 1985).

Attitudes are considered relatively stable constructs, at least partially independent of situational contexts. Single evaluations that appear spontaneously and/or situationally are not necessarily seen as expressions of an underlying attitude (Hogg and Vaughan 2022; Himmelfarb and Eagly 1974; Krech and Crutchfield 1948). However, theoretical approaches have also aimed to explain under which conditions attitudes may be changed or shaped by situational factors. External influences such as cognitive dissonance have been identified as potential causes of attitude change (Harmon-Jones and Mills 2019). The possibility of influencing attitudes makes the construct relevant for educational purposes. The development and change of one's attitude structure, for instance, is crucial in the case of environmental convictions. In times of biodiversity loss and climate crisis, environmental attitudes are seen as decisive objectives of conservation psychology (Cruz and Manata 2020). It is generally hoped that a positive evaluation of the environment may lead to pro-environmental behaviour. Furthermore, it is expected that the development of such attitudes can be supported by pedagogic interventions (Janakiraman et al. 2018). Consequently, the construct of attitudes may also be applicable to cultural heritage. As indicated above, heritage cultivation is connected to value judgements as well (Salwa 2014). The willingness to conserve such facilities can be explained by latent evaluations, be they affective or cognitive. For that reason, we put the attitude construct at the core of our validation study.

State of empirical research

Multiple scales have been developed in order to measure environmental attitudes (Kaiser et al. 2007; Schultz 2001). These tools mainly address ecological aspects; for that reason, they are only partially adaptable to the context of cultural heritage sites such as historic gardens. Only few studies focus on public attitudes towards historic greenery. For instance, Hristov et al. (2018) analysed park visitors' views on a particular garden area. They used a qualitative approach (semi-structured interviews) in order to explore visitors' experiences and interpretation strategies. Other authors investigated general assessments with regard to characteristic landscapes (Shahamati 2020; Lückmann et al. 2013). However, these studies do not work with the attitude theory outlined above. Furthermore, they rely on qualitative methods. Appropriate scales that enable quantitative measurement are generally lacking. The current study aimed to close this research gap by introducing the Garden Heritage Scale (GHS), a novel psychometric tool covering affective, cognitive and behavioural aspects. By taking these three components into account, the GHS may allow a differentiated evaluation of educational offers in historic gardens.

Scale development and evaluation methodology used in this paper

The process of developing and evaluating psychometric instruments includes several methodical stages. First, items need to be generated. Afterwards, the scale development may be performed (including item reduction and factor extraction). Based on this, a scale evaluation shall be conducted; this contains analyses of dimensionality, reliability and validity (Boateng et al. 2018; Crocker and Algina 2008). Validity is generally considered the coherence between test score interpretations and theoretical as well as empirical evidence. Psychometric tools are meant to measure certain constructs; validation means evaluating whether a test is capable of measuring what it is meant to measure. Reliability refers to the precision that can be achieved with the respective tool. This criterion indicates the degree to which testing results are replicable and robust against confounding variables (American Educational Research Association 2014; Reynolds et al. 2021).

The current study was based on clear theoretical assumptions and partially aligned with an existing scale (Sieg et al. 2018). For that reason, we adapted the conventional research procedure and put an emphasis on item reduction, factor extraction and scale evaluation. In the first step, a pool of items was collected, based on the three components of attitudes (Maio et al. 2019). Afterwards, factorability and item quality were investigated with an exploratory factor analysis. On the basis of data from another survey, the dimensionality was additionally examined with a confirmatory factor analysis (Worthington and Whittaker 2006; Carpenter 2018). To assess external validity, the scale was correlated with convergent and discriminant references. In the final methodical step, reliability was analysed: on the one hand by a calculation of internal consistency, on the other hand by an investigation of test-retest reliability.

Analysis of validity

Factor extraction and item selection

Short Introduction

In the first step, a total of 18 items was collected. Six items each were meant to measure one of the theoretically proposed construct dimensions (Table 1). Seven items were aligned with corresponding items of Sieg et al. (2018). An exploratory factor analysis (EFA) was performed to examine the items' suitability. This calculation procedure is generally used to investigate latent factors that explain variances within given data. Statistical relations between manifest variables are analysed and associated items are aggregated (Beavers et al. 2013; Goretzko et al. 2021). The EFA may serve to investigate the dimensionality of a scale and enables the exclusion of inappropriate items (Boateng et al. 2018).

Table 1 Initial item-collection. In the survey, items were rated on a five-point Likert-scale.

Methods

Participants

To perform an EFA, a sufficient sample size is generally required. It is common to include at least 10 cases per tested item in the sample (Costello and Osborne 2005). In the current study, a total of 233 persons participated; 143 of these were adults belonging to the general population, 90 were high-school students. The data from the general public was partially collected with an online survey, while the high-school students were questioned via paper and pencil only. 140 persons identified themselves as female (60.09%), 91 as male (39.06%); two participants did not indicate a gender identity (0.86%). The age structure of the total sample can be found in Table 2.

Table 2 Age structure of the survey.

Data analysis

All calculations were performed with SPSS 29 and Excel 2016. In the first step, it was necessary to examine whether the collected data set was appropriate for a factor analysis. To this end, a Kaiser-Meyer-Olkin (KMO) test was performed (Kaiser and Rice 1974; Kaiser 1970). Afterwards, an initial extraction was conducted using principal axis factoring with oblique promax rotation. Since a tripartite structure had been proposed, the extraction was forced to three factors. In the following step, an initial item selection was performed: variables with cross-loading differences lower than 0.20 were eliminated, as were those whose loadings contradicted the initial classification (Table 1). In a second step, the scale was shortened to guarantee balanced subscales. To test the factorial structure of the final item set, the analysis was repeated using the Kaiser criterion (eigenvalues ≄ 1.0) (Fabrigar et al. 1999).
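The two elimination criteria can be illustrated with a short sketch. The pattern matrix below contains purely illustrative loadings, not the study's actual values:

```python
def select_items(loadings, expected):
    """Apply the two elimination criteria to a pattern matrix.

    loadings: dict mapping item name -> list of factor loadings
    expected: dict mapping item name -> index of the hypothesised factor
    Returns the list of retained item names.
    """
    retained = []
    for item, row in loadings.items():
        ranked = sorted((abs(l) for l in row), reverse=True)
        # Criterion 1: the difference between the two highest absolute
        # loadings must be at least 0.20
        if ranked[0] - ranked[1] < 0.20:
            continue
        # Criterion 2: the highest loading must fall on the factor the
        # item was theoretically assigned to
        main = max(range(len(row)), key=lambda i: abs(row[i]))
        if main != expected[item]:
            continue
        retained.append(item)
    return retained

# Illustrative loadings only (three extracted factors per item)
pattern = {
    "A1": [0.81, 0.10, 0.05],  # clean affective item -> keep
    "C1": [0.12, 0.75, 0.08],  # clean cognitive item -> keep
    "B1": [0.45, 0.05, 0.40],  # cross-loading difference 0.05 -> drop
    "C2": [0.70, 0.15, 0.02],  # loads on the wrong factor -> drop
}
expected = {"A1": 0, "C1": 1, "B1": 2, "C2": 1}
print(select_items(pattern, expected))  # -> ['A1', 'C1']
```

The sketch mirrors only the rule-based part of the selection; the subsequent shortening for balanced subscales was a substantive judgment and is not reproduced here.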

Results

The KMO test led to a coefficient of 0.904. The results of the initial EFA can be found in the Supplementary Appendix (A). Table 3 shows all items that needed to be excluded. The cognitive and behavioural subscales were reduced due to low cross-loading differences as well as logical considerations. In contrast, all items assigned to the affective dimension loaded on the same factor. In order to shorten the scale and ensure balance between the components, the three affective items with the lowest loadings were excluded as well. The reduced solution is shown in Table 4. All loadings reach relatively high values (>0.590, with a rounded average of 0.779). However, the final EFA, based on eigenvalues, led to a two-component model: cognitive and affective items were attributed to the same component. As the scree plot (Fig. 1) shows, the eigenvalue of the third factor slightly misses the Kaiser criterion. Although the graph does not show a clear elbow, a flattening becomes manifest especially after factor four.

Table 3 Overview of excluded items: "Logical considerations" refers to loadings that contradicted the initial item classification (see Table 1).
Table 4 Pattern matrix of the EFA after the item selection.
Fig. 1: Scree plot of the final EFA.

The graph shows the eigenvalue of each factor. The figure has been created with Microsoft Excel and PowerPoint 2016.

Discussion

To guarantee a data set's suitability for an EFA, the Kaiser-Meyer-Olkin test should yield a coefficient of at least 0.60 (Kaiser and Rice 1974). Since this requirement was fulfilled, the current data proved to be appropriate. The analysis was meant to test internal validity and provide evidence for an initial item selection. After a criteria-led reduction of the item pool, the expected factorial structure could be extracted. Though one factor's eigenvalue falls slightly below the Kaiser criterion, the tripartite model reaches relatively high factor loadings and no cross-loadings above 0.30. The flattening of the scree plot after factor four also supports a tripartite solution. For that reason, we hypothesised three factors in the following analyses.

Test of dimensionality

Short Introduction

In the next methodical step, the dimensionality of the scale was tested with an additional calculation procedure. To this end, data from a second survey was used to conduct a confirmatory factor analysis (CFA). This statistical operation also serves to discover latent factors. However, the CFA does not simply explore the data as the EFA does; instead, hypothetical factor models must be defined from the outset, and their fit to the data is then calculated (Cole 1987; Mueller 1996). The factorial structure of a newly developed scale may be confirmed this way (Worthington and Whittaker 2006).

Methods

Participants

A total of 183 participants took part in the inquiry. It is recommended that the ratio between participants and freely estimated parameters be at least 10 to 1 (Bentler and Chou 1987). Consequently, the sample size may be seen as sufficient. The sample consisted of adults belonging to the general population (mainly students of Goethe University Frankfurt). 130 participants identified themselves as female (71.04%), 51 as male (27.87%); two persons did not indicate a gender identity (1.09%). The age structure of the sample may be found in Table 5.

Table 5 Age structure of the survey.

Data analysis

In the first step, SPSS 29 was used to replace missing values by series means. The CFA was then performed with SPSS Amos 29. In order to examine the model fit of the proposed factor structure, a maximum likelihood estimation was conducted, using the relevant fit indices: Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), Standardised Root Mean Squared Residual (SRMR) and the chi-square/degrees-of-freedom ratio (χ2/df) (West et al. 2012). Three models were compared:

  • Model 1: The tripartite solution, extracted during the EFA.

  • Model 2: The two-factor solution, extracted during the EFA, using the Kaiser criterion.

  • Model 3: A hypothetical one-factor solution, attributing all variables to a single factor.

We found it reasonable to compare Models 1 and 2, since the outcome of the initial EFA did not support the tripartite structure beyond any doubt. Though the scree plot rather spoke for a differentiation between the affective and cognitive dimensions, the eigenvalue of the third factor fell slightly below 1.0. This supported a two-factor structure, differentiating only between "Behaviour" and "Affect-Cognition". We also included a one-factor solution (Model 3) because the correlations between the factors turned out to be relatively high (see Fig. 2).

Fig. 2: Path diagram of the confirmatory factor analysis of Model 1.

The figure shows the factor loadings of each variable as well as the correlation coefficients between the factors. The figure has been created with Microsoft PowerPoint 2016.

In order to compare the different models, a likelihood-ratio test was conducted. This calculation serves to evaluate whether the fit levels of the compared models differ significantly. To this end, \(\Delta \chi^2\) and \(\Delta df\) are calculated as indicated below (Cheung and Rensvold 2002). Significance may then be reviewed with a chi-square table (Pandis 2016).

$$\Delta \chi^2=\chi^2_{c}-\chi^2_{uc} \qquad \Delta {df}={df}_{c}-{df}_{uc}$$

\(\chi^2_{c}\) and \({df}_{c}\) are the values of the "constrained" model with more degrees of freedom. The opposite values stem from the "unconstrained" model with less restricted parameters.
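The test logic can be sketched in a few lines. The critical values below are taken from a standard chi-square table; the example input uses hypothetical χ2 values chosen only to illustrate the calculation:

```python
# Critical chi-square values for common values of delta-df,
# taken from a standard chi-square table
CRITICAL = {  # delta-df: (cut-off for p = 0.05, cut-off for p = 0.01)
    1: (3.841, 6.635),
    2: (5.991, 9.210),
    3: (7.815, 11.345),
}

def likelihood_ratio_test(chi2_c, df_c, chi2_uc, df_uc):
    """Compare a constrained model with an unconstrained one.

    Returns (delta_chi2, delta_df, verdict), where the verdict names the
    smallest tabulated significance level that is reached.
    """
    delta_chi2 = chi2_c - chi2_uc
    delta_df = df_c - df_uc
    cut_05, cut_01 = CRITICAL[delta_df]
    if delta_chi2 > cut_01:
        verdict = "p < 0.01"
    elif delta_chi2 > cut_05:
        verdict = "p < 0.05"
    else:
        verdict = "n.s."
    return delta_chi2, delta_df, verdict

# Hypothetical chi-square values whose difference equals 59.5 at delta-df = 2
print(likelihood_ratio_test(100.0, 53, 40.5, 51))  # -> (59.5, 2, 'p < 0.01')
```

With a difference of 59.5 at two degrees of freedom, the cut-off of 9.210 for p = 0.01 is clearly exceeded, mirroring the comparison reported below.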

Results

The compared fit indices may be found in Table 6. While the p-values of Models 2 and 3 fell below the significance level (<0.05), Model 1 reached a clearly non-significant value (p = 0.139). Also in the case of the CFI, Model 1 reached the highest level (0.988), followed by Model 2 (0.879) and Model 3 (0.714). The same ranking was found for the TLI (0.980/0.821/0.600). On the contrary, the opposite order emerged for SRMR (0.0392/0.0981/0.1200), RMSEA (0.045/0.136/0.203) and the χ2/df ratio (1.37/4.98/8.5). Figure 2 shows a path diagram of Model 1, containing factor loadings and correlations between the factors. The regression weights (factor loadings) reach moderate to high scores (>0.60); their average amounts to 0.74. The results of the likelihood-ratio test may be found in Table 7. Δχ2 and Δdf are shown for the juxtaposition of Models 1 and 2 as well as Models 1 and 3. The columns on the right side show the relevant cut-off levels for p-values of 0.05 and 0.01, extracted from the standard chi-square table. In both cases, the outcome indicates a highly significant difference between the models, since both values of Δχ2 (59.5/146.7) exceed the respective cut-offs for a significance level of p = 0.01 (9.210 and 11.345).

Table 6 Fit indices of the confirmatory factor analyses.
Table 7 Relevant indicators for the likelihood-ratio test with Model 1 and 2 as well as Model 1 and 3.

Discussion

Both CFI and TLI are based on a comparison of the hypothesised factor structure with alternative solutions. High values generally imply a better fit of the proposed model; CFI and TLI ≄0.95 are recommended as cut-off levels (Boateng et al. 2018; Hu and Bentler 1999). Only Model 1 meets this criterion. SRMR and RMSEA are based on a comparison of model-implied predictions with the given data. Smaller values indicate a better model fit; SRMR ≤0.08 and RMSEA ≤0.06 are seen as cut-off criteria (Boateng et al. 2018; Hu and Bentler 1999). In these cases as well, Model 1 leads to the lowest values, indicating the best fit to the data. The χ2/df ratio shall be smaller than 5 (Wheaton et al. 1977). While Models 1 and 2 fulfil this requirement, the third solution clearly misses the cut-off level. All these findings support the tripartite factor structure of Model 1; the other options prove to be inadequate. The path diagram of Model 1 (Fig. 2) also supports this solution. The high average regression weights confirm a reasonable model fit. The calculated correlations between the factors are neither extremely high nor extremely low. The first case would speak against a differentiation between the respective factors, while the second would raise the question of whether the instrument's subscales measure one and the same construct. As the likelihood-ratio test indicates, the superior fit of Model 1 turns out to be highly significant (p < 0.01) when compared to the fit levels of Models 2 and 3.
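The comparison of the reported indices against these conventional cut-offs can be reproduced in a few lines; the index values are those quoted from Table 6 in the results above:

```python
# Conventional cut-off rules cited in the text
# (Hu and Bentler 1999; Wheaton et al. 1977)
CUTOFFS = {
    "CFI":     lambda v: v >= 0.95,
    "TLI":     lambda v: v >= 0.95,
    "SRMR":    lambda v: v <= 0.08,
    "RMSEA":   lambda v: v <= 0.06,
    "chi2/df": lambda v: v < 5,
}

# Fit indices as reported in Table 6
MODELS = {
    "Model 1": {"CFI": 0.988, "TLI": 0.980, "SRMR": 0.0392,
                "RMSEA": 0.045, "chi2/df": 1.37},
    "Model 2": {"CFI": 0.879, "TLI": 0.821, "SRMR": 0.0981,
                "RMSEA": 0.136, "chi2/df": 4.98},
    "Model 3": {"CFI": 0.714, "TLI": 0.600, "SRMR": 0.1200,
                "RMSEA": 0.203, "chi2/df": 8.5},
}

def passed(model):
    """Return the list of criteria a model fulfils."""
    return [name for name, ok in CUTOFFS.items() if ok(MODELS[model][name])]

for m in MODELS:
    print(m, passed(m))
# Model 1 fulfils all five criteria, Model 2 only the chi2/df
# criterion, and Model 3 none of them.
```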

Criterion-related validity

Short Introduction

In the final step, criterion-related validity was tested with external references. For that purpose, the Garden Heritage Scale was correlated with a discriminant and a convergent instrument. In the first case, the external reference is expected to measure something different; consequently, the correlations shall be low. In the second case, both instruments shall measure similar constructs; for that reason, higher correlations are expected (Campbell and Fiske 1959; Rönkkö and Cho 2022; American Educational Research Association 2014). A direct comparison of the correlation coefficients may clarify whether the instrument can be used for its purpose.

Methods

Participants and data collection procedure

For the calculation of criterion-related validity, the data of the CFA study was used. In addition, the sample included the pre-test data of the test-retest inquiry (see the following chapter). This led to a total of 224 participants. 162 of these identified themselves as female (72.32%) and 58 as male (25.89%); 4 persons did not indicate a gender identity (1.79%). The survey contained an adapted version of the Nature Interest Scale (NIS) by Kleespies et al. (2021). This instrument is originally meant to measure individual interest in nature. In the current study, the word "nature" was replaced by the term "historic gardens" in each item. This modified scale should serve to disclose interest in garden heritages. It was used as a convergent reference; consequently, moderate to high correlations with the GHS were expected. Furthermore, an adapted short scale of Zoderer and Tasser (2021) was used as a discriminant reference. This instrument surveys the potential risks and opportunities people associate with the term "wilderness". The scale has been shown to be suitable for extrapolating people's attitudes towards wilderness. Since managed park areas can clearly be distinguished from such "natural" spaces, it was hypothesised that the correlations with the GHS would be low. In the current survey, the items were translated into German. Furthermore, the general question underlying the scale was adapted: in the original paper, the items refer to an increase of wilderness in the Italian region of South Tyrol, whereas in the current study, potential consequences of an increase of wilderness in Germany were inquired about. In accordance with this supra-regional focus, the term "local" in the item "Decrease of local economy and tourism" was eliminated. Furthermore, a brief definition of the term wilderness was added. Both the discriminant and the convergent instrument may be found in the Supplementary Appendix (B.1; B.2).

Analysis

All analyses were performed with SPSS 29. In the first step, the negative items of the discriminant instrument, indicating perceived risks, were reversed. Afterwards, the internal consistency of both scales was calculated via Cronbach's alpha (Cronbach 1951). This coefficient generally indicates the degree to which the chosen items show consistency; it can take values from 0 to 1. Its calculation is based on the total number of items (N), the test variance (\({V}_{t}\)) and the sum of the single items' variances (\(\sum _{i}{V}_{i}\)). If \({V}_{t}\) turns out relatively high in relation to \(\sum _{i}{V}_{i}\), the coefficient turns out rather high and vice versa (Crocker and Algina 2008).

$$\alpha =\frac{N}{(N-1)}\left(1-\frac{\sum _{i}{V}_{i}}{{V}_{t}}\right)$$
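A minimal implementation of this formula (an illustrative sketch, not the SPSS routine used in the study) may look as follows:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix.

    items: list of rows, one per respondent, each containing the
    scores of all N items.
    """
    n = len(items[0])                             # number of items N
    cols = list(zip(*items))                      # item-wise columns
    item_vars = sum(variance(c) for c in cols)    # sum_i V_i
    total_var = variance([sum(r) for r in items]) # test variance V_t
    return n / (n - 1) * (1 - item_vars / total_var)

# Two perfectly parallel items yield the maximum value of 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]]))  # -> 1.0
```

When the items co-vary strongly, the total-score variance \(V_t\) grows well beyond the sum of the single item variances, which drives the coefficient towards 1; with uncorrelated items, \(V_t\) equals that sum and the coefficient drops to 0.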

Both external references were summarised using mean values. Finally, they were correlated with the three subscales of the GHS, using Pearson product-moment correlations (r). This calculation may be used to disclose the general linear relationship between different data sets. It is based on the total sample size (N), the standard deviations of the corresponding variables (\({s}_{x},\,{s}_{y}\)), their means (\(\bar{x},\,\bar{y}\)) and the single values (\({x}_{i}\), \({y}_{i}\)) (Rodgers and Nicewander 1988; Field 2018). The Pearson coefficient is a "coefficient of equivalence": the higher it scores, the more closely both measures match mathematically (Crocker and Algina 2008).

$${r}=\frac{\mathop{\sum }\limits_{i=1}^{N}{(x}_{i}-\bar{x}){(y}_{i}-\bar{y})}{(N-1){s}_{x}{s}_{y}}$$
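The formula can likewise be sketched directly (again an illustrative implementation, not the SPSS procedure):

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sum of cross-products of the deviations from the means
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Normalised by (N - 1) and both sample standard deviations
    return cov / ((n - 1) * stdev(x) * stdev(y))

# A perfect linear relationship yields r = 1, a perfect inverse one r = -1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # -> 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -> -1.0
```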

Results

Cronbach's alpha of the convergent instrument amounted to 0.915; for the discriminant scale, it reached 0.691. The results of the correlations may be found in Table 8. For each component, the correlation with the convergent instrument turned out notably higher. The convergent correlations reach an average value of 0.494, whereas the discriminant correlations average 0.114.

Table 8 Correlations of the subscales of the GHS with the convergent and discriminant reference.

Discussion

The internal consistency of both scales turned out acceptable; the scores lay above the minimum of 0.6 (Robinson et al. 1991). Correlation coefficients close to 0.10 speak for small effects, values close to 0.30 are considered moderate, and results >0.5 are seen as indicators of large effects (Cohen 1988). As expected, each subscale's correlation with the discriminant instrument is relatively low. This indicates that the two scales measure distinct constructs. On the other hand, the moderate to high correlations with the convergent instrument meet the assumption that these tools measure something similar. However, the respective coefficients do not reach notably high levels (>0.60). This outcome is in accordance with theoretical expectations as well, as attitudes and interests are to be seen as similar but not identical constructs. Individual interest implies a general evaluation of the interest object, just like attitudes do (Krapp 1992). Beyond this, however, interest also includes components that correspond to the attitude construct only indirectly (Prenzel et al. 1986).

Analysis of reliability

Short Introduction

The term "reliability" refers to a scale's suitability for precise and replicable measuring (American Educational Research Association 2014; Reynolds et al. 2021). In order to evaluate this quality criterion, Cronbach's alpha values were calculated. As explained previously, this coefficient generally indicates the degree to which the chosen variables show mathematical consistency (Tavakol and Dennick 2011). Additionally, test-retest reliability was determined. For that purpose, the measurement tool is used twice with the same group of participants, and the results of both measurements are then correlated. For scales that are meant to measure stable personality traits, high correlations between the surveys are expected; in such cases, differences are attributed to measurement error (Reynolds et al. 2021; Raykov and Marcoulides 2011). The time interval between the two inquiries should be long enough to minimise memory effects. On the other hand, extremely long intervals should also be avoided to counteract the risk that the construct may be impacted by external influences in the meantime (Reynolds et al. 2021).

Methods

Participants and data collection procedure

Cronbach's alpha was calculated with the data sets of both factor analyses (n = 416). An a priori power analysis with G*Power was conducted to determine the required sample size for the calculation of test-retest reliability (correlation with a bivariate normal model) (Faul et al. 2009; Kang 2021). We found that for a large effect (r = 0.7) with a power of 0.95 (alpha = 0.05), a two-tailed test would require a minimum of 20 participants. To determine test-retest reliability, a sample of biology students of Goethe University Frankfurt was questioned in a panel survey design. 41 persons participated in the first inquiry, 31 of whom partook in the second. A time interval of eight weeks lay between the two dates of testing. The sample consisted of 6 male participants (19.35%) and 24 persons who identified themselves as female (77.42%). One person (3.23%) did not indicate a gender identity. Participants' exact age was not surveyed due to anonymisation. However, the study was conducted within a course mostly frequented by students aged 20–30.
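The required sample size can also be approximated analytically via the Fisher z-transformation. This is a sketch of the underlying logic, not G*Power's exact algorithm, so the result may deviate slightly from the reported minimum of 20:

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.95):
    """Approximate sample size for detecting a correlation r
    (two-tailed test) via the Fisher z-transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed quantile
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    # Standard approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# Yields 21, i.e. within one participant of the G*Power result of 20
print(n_for_correlation(0.7))  # -> 21
```

Smaller expected effects require considerably larger samples; for r = 0.5, for example, the same approximation already demands more than twice as many participants.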

Data analysis

All calculations were performed with SPSS 29. Cronbach's alpha was determined for the three subscales as well as the total scale. In order to evaluate test-retest reliability, the results of both surveys were summarised using mean values. Afterwards, a Pearson product-moment correlation was determined. The formulas for both calculations were described in the previous chapter.

Results

As shown in Table 9, all values of the reliability analysis reach relatively high levels. The alpha scores lie between 0.735 and 0.871. The correlation between the two retest surveys also reaches a high coefficient of 0.782, indicating a strong positive connection between both data sets.

Table 9 Results of reliability-analysis.

Discussion

The analysis of internal consistency led to sufficient results. Items or (sub-)scales that are meant to measure one and the same construct are expected to reach relatively high scores (at least > 0.70) (Bland and Altman 1997). However, it has also been stated that the coefficients should not lie above 0.90, since such a level would merely indicate redundancies between the items (Streiner 2003). The current results fulfil both requirements. The calculation of test-retest reliability also showed a relatively strong mathematical connection between the two surveys. In the case of attitude scales, coefficients in the 0.70s may be considered acceptable (Crocker and Algina 2008). Consequently, this outcome speaks for sufficient precision of the measurement tool and robustness against measurement errors.

Final discussion and conclusions

The current study aimed to introduce and evaluate a novel psychometric instrument suitable for the measurement of attitudes towards historic gardens. Two factor analyses and correlations with convergent and discriminant references indicated sufficient validity. The analyses of test-retest reliability and internal consistency led to adequate results as well. The study's outcome is in accordance with the three-component model of attitudes (Maio et al. 2019). Even though the EFA had to be forced onto three factors, the confirmatory factor analysis indicated a sufficient model fit of the tripartite solution. In this respect, the current results support the integration of the behavioural component into the attitude construct. Looking at the results of the CFA, it is especially striking that the correlation between the behavioural and the affective component is notably lower than the other correlations (Fig. 2). In the case of the current data, the general evaluation of the garden's usefulness seems to be more strongly connected with behaviour than affective considerations are. However, it must be emphasised that the current scale does not measure manifest action: the items only register self-reported behavioural intentions. How relevant such an intentional disposition is for concrete behaviour is disputable per se.

Finally, it can be stated that the Garden Heritage Scale represents a suitable measurement tool, directly anchored in well-established attitude theory. The scale will enable the systematic evaluation of educational programs in historic greenery, such as guided tours or exhibitions. The instrument may contribute to an improvement of such educational offers. At the same time, it can support the development of a new empirical research field that addresses environmental psychology as well as heritage studies.

Limitations

Though the current study has been conducted with care, some limitations need to be mentioned: The data used for the initial EFA and the calculation of internal consistency was collected with questionnaires that were designed slightly differently, and some context items differed as well. However, in these cases no groups were compared; only statistical interrelations between the items were analysed. The design of the tools used for the determination of criterion-related validity also differed slightly, but these variations only concerned the collection of sociodemographic data at the beginning of the questionnaire. Furthermore, it needs to be mentioned that the questionnaires used during the advanced stages of the study contained the whole initial scale of 18 items; however, only the appropriate items identified during the initial EFA were analysed statistically. Another important limitation is that the current study worked with relatively small samples, mostly characterised by an overrepresentation of young female adults. Consequently, representativeness for the general population cannot be claimed, and applicability in different target groups needs to be evaluated with caution.