Introduction

This article describes a replication study in the humanities, more specifically a replication of an art historical attribution study. Replication studies are repetitions of earlier studies. So far, they have been carried out mostly in the biomedical, natural, and social sciences with the aim of assessing whether the initial findings can be corroborated. A recent scoping review showed that the replication success rate in the fields of economics, education, psychology, health sciences, and biomedicine is little more than 50% (Cobey et al., 2023). In other words, when studies are repeated, they often yield different outcomes. Against the background of this ‘replication crisis’, Peels and Bouter (2018a) asked whether and how replication could be useful in the humanities. A lively debate followed, in which some authors argued that replication has no relevance for the humanities (De Rijcke and Penders, 2018; Holbrook et al., 2019). Peels and Bouter were not convinced by the critique and decided, in addition to providing more theoretical responses to these objections, to test matters empirically by performing two replication studies in the humanities (Peels and Bouter, 2018b; Rulkens et al., 2022b). This replication of an art historical attribution study is one of them.Footnote 1

The study replicated concerns the attribution of two painted portraits of Rembrandt. They are part of the collections of the Mauritshuis in The Hague, The Netherlands (Mh) and the Germanisches Nationalmuseum in Nuremberg, Germany (GN) (see Figs. 1 and 2). The debate surrounding the attribution and comparisons of these two versions has a long historiographical tradition, dating back to the 19th century.Footnote 2 The initial research replicated in this study, however, took place in 1998–99. Prior to this research, the Mauritshuis version (Mh version) was generally considered to have been painted by Rembrandt himself, and the Germanisches Nationalmuseum version (GN version) to be a studio copy. These attributions were reversed during the initial study, which followed the unexpected discovery of an underdrawing in the Mh version during a technical examination of the painting in the conservation studio of the Mauritshuis in 1998. Extensive underdrawings were not known to be present in other paintings by Rembrandt, and the underdrawing was therefore a surprising finding that supported the reversal of the attribution.Footnote 3 Additionally, this underdrawing followed the outlines of the finished GN version, indicating that the GN painting was made first. The reversed attribution was further supported by comparison of the painting methods used in both paintings. As part of the research process, the two paintings were brought together in Nuremberg, where they were examined and compared closely by a group of experts.

Fig. 1

Rembrandt van Rijn (studio copy), Portrait of Rembrandt (1606–1669) with a Gorget, c. 1629. Panel, 37.9 × 28.9 cm. The Hague, Mauritshuis (inv. no. 148).

Fig. 2

Rembrandt van Rijn, Self-portrait with gorget, c. 1629. Panel, 38.2 × 31 cm. Nuremberg, Germanisches Nationalmuseum, on loan from the Kunstsammlungen der Stadt Nürnberg (inv. no. 391).

The initial research was published by different authors and for different audiences. The publications of Buijsen (1999) and Wadum (2000) were selected as the main sources for our replication study because they describe the methodologies, argumentation, and conclusions of the initial research for a scholarly audience in most detail.Footnote 4 These publications did not end the debate surrounding the attribution and comparison of the two versions. Sluijter (2000), an expert independent of the museum, responded to Buijsen and Wadum with a publication expressing a divergent opinion. He posed that the Mh version may well be a second version by Rembrandt himself, made to demonstrate the artist’s ability to paint both in his well-known rough manner and in the fine manner appreciated by others.

The choice to replicate this particular study was made primarily because attribution studies are frequent in art history. Findings could therefore be relevant for a wider variety of attribution issues. Second, the initial research presents a well-defined and widely discussed example of an important attribution study. Third, the initial study is based on a combination of different categories of evidence – historical, technical and stylistic – which allows for triangulation. Finally, the replication researchers would have access to the case-study paintings, the initial researchers involved, and the initial raw data, when preserved. The fact that the Mauritshuis was planning a conservation treatment of their painting increased interest, as the current research project could provide insights relevant to that treatment.

Methods

The design of this study was extensively described in its preregistration and addenda, which are published on the Open Science Framework (OSF) (Rulkens et al., 2022a; Rulkens et al., 2023b; Rulkens et al., 2023a; Rulkens et al., 2024a; for the overall project page, see Rulkens et al., 2025a). The aim of preregistration is to increase transparency in research by making a study design available in advance and publishing revisions to this design along the way. Preregistration is new to art history, but not yet mainstream in the sciences either (Sofi-Mahmudi et al., 2024). All publicly accessible supplementary material referred to in the current publication has been stored on OSF as well. A summary of the study design is provided below.

Categorisation of replication methodology

Different ways have been suggested to categorise replication studies (see e.g., Penders et al., 2019; Peels, 2019). In the current study, the categorisation put forward by Peels and Bouter (2018b) and Peels (2019) was adopted: they define a replication study as ‘an independent repetition of an earlier study, answering the same study question by using the same or similar methods under the same or similar circumstances’, and distinguish between three categories:

  • A reproduction: the reanalysis of the data from the initial study using the initial study protocol;

  • A direct replication: the collection of new data using the initial study protocol;

  • A conceptual replication: the collection of new data using a modified study protocol.

Research question

As is often the case in art historical studies, the research questions of Buijsen (1999) and Wadum (2000) remain implicit in their publications. Therefore, the research questions were distilled from the main conclusions in these publications. Both authors conclude that (1) the painting technique in the GN version, combined with the observation that the underdrawing of the Mh version repeats details observed in the final image of the GN version, shows that the GN version is the first version (principal) on which the Mh version is directly based, and that (2) the underdrawing in the Mh version and the painting technique employed in both versions shows that the GN version is by Rembrandt himself, whilst the Mh version is a studio copy – i.e., a copy made by one of his students in his studio. Based on these conclusions of the initial study, three research questions were formulated for our replication study:

  • Is the Mh version painted by Rembrandt or not?

  • Is the GN version painted by Rembrandt or not?

  • If any, which of the paintings is the prime version (principal)?

In addition to conclusions one and two, Buijsen (1999) also discusses who could have painted the Mh version, if not Rembrandt. A hypothesis was presented that one of the early students of Rembrandt, Gerard Dou, is the best candidate for further investigation. This is not discussed by Wadum. In addition to conclusions one and two, Wadum (2000) presents the hypothesis that the blotchiness of the IRR images is a result of an undermodelling/dead colouring technique unique to Rembrandt and therefore a (potentially more generally applicable) indication that a painting is by Rembrandt himself. The current replication focused on the main conclusions and did not address these two additional hypotheses.

Reconstruction of methodology and protocol of the initial study

The initial study was not conducted according to a formalised, predetermined protocol, nor did the initial publications contain methodology sections, as is common in papers within the sciences. The course of events of the initial study was described briefly in the footnotes of the publications, and research methods were mentioned in the main text whenever research outcomes played a role in the argumentation. Therefore, a research protocol for the replication study was distilled through close reading of the initial publications. The methods referred to are: naked-eye observation, microscopy,Footnote 5 literature study, Infrared Reflectography (IRR)Footnote 6, X-Radiography (X-Ray)Footnote 7, dendrochronologyFootnote 8 and comparison with a case-study group of comparative paintings. Further information on the methods employed in the initial study was gathered from the archives and documentation files of the Mauritshuis and of the Germanisches Nationalmuseum, where raw technical data were kept. In addition, Buijsen and Wadum generously made available documents from their personal archives. The information distilled from these sources was further supplemented by interviews with the initial researchers. Supplement 1 contains the reconstruction of the initial study, which was based on a combination of all these resources.

Combined approach: reproduction and conceptual replication

An assessment was made of the practical and theoretical feasibility of the three categories of replication described earlier: reproduction, direct replication and conceptual replication (see Rulkens et al., 2022a, Table 2). The choice was made to carry out both a reproduction and a conceptual replication, as this would allow a comparison of these categories and thus provide further insight into their relevance. It would also enable the inclusion of new techniques and methods as part of the conceptual replication, and thus provide the opportunity to explore the potential for improvement of scientific rigour through replication. A direct replication was considered less promising, because it was thought to be unlikely that the production of new IRR and X-Ray images with the old equipment would result in significantly different data than the raw data saved from the initial study. The reproduction and conceptual replication were carried out one after the other, to prevent bias through data contamination (see Rulkens et al., 2023a, para. 4.2). Figure 3 presents an overview of the study design.

Fig. 3

Design of the replication study (methods used in reproduction and in conceptual replication).

Methods of the reproduction

Only those parts of the initial study that were feasible to reproduce were selected for the reproduction: naked-eye observationFootnote 9; microscopy; reanalysis of the old IRR image data; reanalysis of the old X-Ray imagesFootnote 10; and reanalysis of the old dendrochronology data of the Mh and GN versions.Footnote 11 Supplement 2 contains the data assembled for the reproduction. Reanalysis of the old data of the comparative case study paintings that were part of the initial study was practically unfeasible, because Buijsen and Wadum used different comparative case study paintings and the group of comparative case study paintings discussed by the latter was not fully reconstructible. Additionally, not all raw IRR data of these comparative paintings were preserved, because the technology available to the researchers at the time and the access to these paintings did not always allow this. Reanalysis of the data of the initial expert meeting was considered unfeasible because the interaction and the joint process of looking at, discussing and interpreting the paintings, as well as the decision-making process, were not documented or only partially documented (see Rulkens et al., 2022a, Table 2). The reproduction was carried out by authors SM and CR, and the analysis of dendrochronological data was performed by Dr. Marta Domínguez-Delmás (also see Domínguez-Delmás, 2025, forthcoming).

Methods of the conceptual replication

For the conceptual replication, the initial methodology was revised in two ways. The first revision was the inclusion of examination techniques that have been developed or significantly improved since the initial study was carried out. This decision allowed us to assess the impact of technical innovations on answers to the research questions. Techniques modified or added were: IRR with higher resolution and image quality; improved image-quality X-Ray; dendrochronology with a larger dataset and improved technique; higher-quality digital photography, including the visible and UV ranges; and new analytical techniques – Hirox digital 3D-microscopyFootnote 12 and Ma-XRFFootnote 13. Furthermore, naked-eye observations and stereo microscopy were included, and the cross-sections of the initial study were re-examined.Footnote 14 The collected data were gathered in a data file (Supplement 3) and shared with the experts before the expert meeting described below.Footnote 15

The second revision to the initial methodology was the formalisation of the expert meeting, to explore whether it is possible to improve the value of the expert meeting as a research tool. This new formalised set-up was called A-ECM – Attribution Expert Consensus Meeting.Footnote 16 The A-ECM was designed in response to a crucial lack of information about the lines of argumentation of the five experts involved in the expert meeting of the initial study and their decision-making process – it was not possible to know exactly what happened or how the expert meeting influenced opinions about the paintings. Therefore, a more formalised format was developed, with the aims of enhancing its transparency and future replicability; of minimising biases and influence due to group dynamics; and of gaining more insight into the lines of argumentation of the experts when (de)attributing the paintings.

Attribution expert consensus meeting

The procedure of the A-ECM is described in detail in the first addendum to the preregistration of this study (Rulkens et al., 2023b) and can be summarised in three main phases:

  1. Preparatory phase:

    a. The experts and chair were selected according to predefined criteria (see Rulkens et al., 2023b, para. 4.3.E). The initial researchers and researchers with ties to the Mauritshuis and Germanisches Nationalmuseum were excluded to minimise potential bias.

    b. During a preparatory meeting, it was ensured that the experts and chair understood the procedure, and they were given the opportunity to suggest changes. A consensus threshold was agreed upon: consensus was reached when three out of four experts agreed that a painting was entirely, partially, or not by Rembrandt. All participants formally agreed to maintain confidentiality.

    c. The experts and chair were provided with the above-mentioned data file (Supplement 3) two weeks prior to the A-ECM, to allow them to study the materials prior to their visit. This file included both the raw data and a description of the technical data – presented as objectively as possible – intended to assist experts less familiar with some of the technical methods.

  2. On-site attribution procedure:

    a. Individual assessment of the paintings (20 minutes): The aim was to allow the experts to form their opinion about the paintings in combination with the provided data, prior to potential influence from group dynamics.

    b. Completing form 1 (20 minutes): The form consisted of multiple-choice questions about the attribution of the paintings, and whether they were the principal. It also included a five-point Likert scale on which experts could indicate how certain they were about their answers (0–25–50–75–100%, with 100% representing absolute certainty), and free-text fields for argumentation (for the forms, see Rulkens et al., 2023b, Appendix I).

    c. Individual interview 1 (20 minutes): The experts were asked which part of the evidence provided in the data file was most important to them when completing form 1 (for the interviews, see Rulkens et al., 2023b, Appendix II).

    d. Group discussion (120 minutes): Led by a chair in front of the paintings.

    e. Completing form 2 (20 minutes): This form was similar to form 1. The aim was to investigate the influence of the group discussion on individual opinions.

    f. Individual interview 2 (20 minutes): The aim was to investigate whether the experts’ opinions had changed based on the group discussion, and if so, for what reasons.

  3. Semi-structured open group interview with all participants (debriefing) (40 minutes): The aim was to explore participants’ experiences of the A-ECM and to stimulate discussion.
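
The consensus threshold agreed in the preparatory phase (three out of four experts choosing the same answer category) can be illustrated with a minimal Python sketch. The function name, the category labels and the example votes are hypothetical and serve only to show the rule; they are not taken from the study's forms or data.

```python
from collections import Counter

def consensus(answers, threshold=3):
    """Return the answer category chosen by at least `threshold` experts.

    `answers` holds one category label per expert. The labels used here
    ("entirely", "partially", "not") are illustrative assumptions.
    Returns None when no category reaches the threshold.
    """
    if not answers:
        return None
    category, votes = Counter(answers).most_common(1)[0]
    return category if votes >= threshold else None

# Hypothetical votes of four experts on a single painting.
agreed = consensus(["not", "not", "not", "partially"])
split = consensus(["not", "not", "entirely", "partially"])
print(agreed, split)
```

With four experts and a threshold of three, at most one category can reach consensus, so returning only the most frequent category is sufficient.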

The data analysis process of the forms and the interviews was described in the third addendum to the preregistration (Rulkens et al., 2024a). The interviews were analysed using qualitative thematic deductive-inductive analysis (Fereday and Muir-Cochrane, 2006).

Findings

Both the reproduction and the conceptual replication corroborated the attributions made in the initial study, including the principality of the GN painting. The argumentations supporting both conclusions partly overlapped, but also showed differences. The findings relating to the different aspects of the reproduction and conceptual replication are discussed below, and both categories are compared.

Findings of the reproduction

The reassessment of the dendrochronological data did not fully corroborate the results reported in 1995 (earliest production date of the Mh painting being 1619 and of the GN 1623; see Supplement 2.1.1 and 2.1.3) (Domínguez-Delmás, 2025, forthcoming). In her report on the reproduction of the dendrochronological research, Domínguez-Delmás commented on the statistical value of the initial analysis and identified an error in the initial measurements, probably caused by the relatively low precision of the measurement method used at the time (hand-loupe).Footnote 17 Although the same data set was analysed, a wider and more dendrochronologically and statistically accurate way of reporting the results was chosen, because of the limited availability of sapwood in the samples. For the Mh version this resulted in an earliest production date of 1612, and for the GN version in an interval for the earliest production date of 1617–1632 (Supplement 2.1.7). These revisions did not affect the attributions of the paintings or the question of which of them is the principal.

Furthermore, the reproduction confirmed similarities and differences between the execution of the paintings. They both have a first ground layer in chalk, and a second ground layer which is light in colour and consists mainly of lead white with the addition of a few earth pigments and perhaps some black. Also, in both paintings, thin, semi-transparent reddish-brown undermodelling is evident in various areas. The size of the paintings differs slightly, and both panels may have been altered in size. The bevelling in the Mh version is from a later date and gives no indication of how much of the panel could be missing, but it may have been altered on all sides. In the GN version, the shortened bevelling at the left and the absence of bevelling at the lower edge indicate that the panel was made smaller at the right and lower sides. Pentimenti in the lower part of the GN version may be remains from an earlier, unfinished composition that was part of the larger format panel.

The most striking difference is the presence of the underdrawing in the Mh version, visible in IRR and consisting of two phases. The first phase of underdrawing is less evident and seems to be partly wiped away. The second phase is more clearly visible. It is positioned slightly higher and to the left of the first phase. Overlaying the IRR image from the Mh version on the GN showed that both phases of the underdrawing correspond exactly to the contours of the GN version, strongly suggesting that the underdrawing was made from a tracing of the GN painting.

The second underdrawing was closely followed in the painting of the Mh version. The drawing varies in execution, which may indicate different hands or adjustments during execution by the same hand. The hair is underdrawn in free, fluent and curly lines, which may have been applied in a wet underdrawing material. The lips and nostrils consist of short, fast and scribbly lines, whilst the eyes are indicated more stiffly. In the costume and the gorget, more repetitive, scratchy sketch lines have been applied in this preparatory phase, as if the artist was reassuring themselves of the right form.

The paint layers in the GN version have generally been executed more openly and loosely than those of the Mh version, and more of a grey underpaint is shining through. The Mh version is much more opaque and has a higher degree of finish; its individual brushstrokes are harder to distinguish. A good example of the difference is the hair, which in the Mh version is rather stiff, less curly and more wavy, in comparison to the loose curls of the GN version. The openness of execution in the GN painting is also evident in the X-Ray and IRR images.Footnote 18 The proper left shadow side of the face of the GN version is hard to read with the naked eye and in the IRR, in which it appears patchy.Footnote 19 This effect is increased as the edge of the face at Rembrandt’s proper left side and at his chin are discoloured and have become much lighter, making the face appear much rounder than initially intended. There are very dark passages in illogical places, which do not look like brushstrokes. Paint degradation in this area of the face may be the cause of some grey areas. A signature in the form of a monogram is only present in the GN version and appears to be part of the initial paint layers. A notable exception to the differences are the collars, which have been painted very similarly in both versions. Although additional observations were made during the reproduction in relation to the initial publications, they did not lead to different conclusions (cf. Buijsen, 1999; Wadum, 2000). The reproduction corroborated the initial findings that the GN is the principal version on which the Mh version was directly based, and confirmed the initial conclusion that the GN was painted by Rembrandt, identifying the Mh version as a contemporary studio copy by another hand.Footnote 20

Findings of the conceptual replication

Forms 1 and 2: Main answers and certainty levels

In the forms, the experts expressed their opinion on the main research questions before (form 1) and after the group discussion (form 2). Supplement 4.1 provides a table with the outcomes. The certainty percentages provided by Expert 4 in form 1 are not included in the analysis, because after completing form 1 this expert explained that they wrongly interpreted the question.Footnote 21

The outcomes show not only that the preset minimum level of consensus (three out of four) was met, but also that the four experts were unanimous in attributing the GN version to Rembrandt (with a final average certainty of 94%) and in de-attributing the Mh version (with a final average certainty of 94%). All experts furthermore agreed that the GN version was the principal (with a final average certainty of 94%).

Comparison of the two forms allowed for an investigation of the impact of an expert group discussion on expert opinions. Interestingly, the group discussion increased the average certainty about the de-attribution of the Mh version, but lowered their average certainty about the question of whether the GN painting was the direct model for the Mh version. Comments about the procedure helped to explain these differences. While Expert 3 was initially 100% certain of the relation between principal and Mh painting, they lowered their score to 75% while raising their certainty score about the Mh painting. This seems indicative of their general view on attribution questions, as Expert 3 explains in form 2: ‘I have learned that attribution questions can never be resolved 100%. The case of the GN and Mh paintings has shown that attributions can change if new evidence becomes available. It is therefore very important to keep an open mind in attribution questions.’ This attitude is reflected in the modifications of the ratings of this expert.
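
As a minimal sketch of how such average certainty levels can be compared between the two forms, the Python example below averages Likert scores per form and skips excluded answers. All scores shown are hypothetical placeholders, not the values reported in Supplement 4.1; the function name and the use of None for an excluded answer are likewise assumptions made for illustration.

```python
from statistics import mean

# The five steps of the certainty scale described for the forms.
LIKERT_STEPS = {0, 25, 50, 75, 100}

def average_certainty(scores):
    """Average the certainty scores of the experts for one question.

    `scores` holds one percentage per expert; None marks an answer that
    is excluded from the analysis. Returns None if nothing remains.
    """
    valid = [s for s in scores if s is not None]
    for s in valid:
        if s not in LIKERT_STEPS:
            raise ValueError(f"{s} is not a step on the certainty scale")
    return mean(valid) if valid else None

# Hypothetical scores for one question, in expert order 1-4.
form1 = [100, 75, 100, None]   # fourth answer excluded from analysis
form2 = [100, 75, 75, 100]

change = average_certainty(form2) - average_certainty(form1)
print(f"change in average certainty: {change:+.1f} percentage points")
```

Because an excluded answer changes the number of experts contributing to the average, a difference between forms computed this way reflects both changed opinions and the changed group of respondents.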

Although the certainty levels of Experts 1 and 2 remained unchanged in form 2 compared to form 1, they did indicate that their thoughts about the attributions changed because of the group discussion. Expert 1 explains in form 2: ‘I came into the meeting knowing that Dou has been named as the possible author of the Mh painting. Colleagues’ reminders of how little we know of who was working in the studio and just what they were doing, was a healthy corrective. Without further detailed study of Dou’s painting processes, especially in his very early career, it might be more productive to attribute the Mh painting to an anonymous associate of Rembrandt.’ And Expert 2 writes in form 2: ‘[Expert 1’s] conviction that the two were painted side-by-side pushed me to reconsider my idea that the artist of the Mauritshuis painting only started with Rembrandt’s work. It encouraged me to look again, and I am now open to the idea that the GN painting may have remained available, and that the artist of the Mh painting followed his own convictions in diverging from it in parts.’

Forms 1 and 2: Argumentation

The open fields of forms 1 and 2 gave experts the opportunity to briefly verbalise the underlying argumentation in support of their answers to the multiple-choice questions. Supplement 4.2 provides the co-occurrence of the arguments in relation to the statements, which is visualised in Fig. 4.Footnote 22 Statement 1, The painting in the Germanisches Nationalmuseum is entirely made by Rembrandt, was most often (seven times) supported by the argument that the painting process and build-up of the GN version are consistent with what the expert knows about the process and build-up applied by Rembrandt (Supplement 4.2, argument 13). For instance, as explained by Expert 2 when summarising their motives in form 1: ‘Overall, economy [of means], decisiveness, buildup in different layers, variation of strokes and colour and impasto.’ And Expert 4 in form 2: ‘Typical and accomplished is the differentiated structure of the left eye cavity and the eye, which shows the structure of the different colour layers, but also the confident design of the eye - clever details, such as the curl falling into the forehead, painted wet-on-wet.’

Fig. 4: Sankey diagram of arguments provided in the forms to support statements 2, 3 and 1.

The order of the statements 1–3 is mixed to improve the clarity of the Sankey diagram. The indications E 1–4 represent the experts that provided the argument, followed by the number of times this argument was coded (in parentheses).

In addition, the argument that a brown-painted sketch and/or all stages of the painting process play a role in the final image of the GN version was mentioned five times with regard to statement 1 (Supplement 4.2, argument 5). For example, by Expert 1 in form 1: ‘In fact, all stages of the painting process play a role, including strokes that originated in an underpainting stage (see all stages, for example the proper right eye). In the gorget, the sketch and ground was left exposed strategically to echo the structure, for example, around the rivet and along the join of the flat of the gorget to its raised collar.’ And by Expert 2 in form 2: ‘My opinion was supported by [Expert 3’s] and [Expert 1’s] observations of the buildup in layers, the characteristic use of brown umber paint in the underlying sketch.’

In support of statement 2, The painting in the Mauritshuis is not made by Rembrandt, the argument that the painting process and build-up in the Mh version are not consistent with what the expert knows about Rembrandt’s process and build-up was raised most frequently (eight times) (Supplement 4.2, argument 27). As Expert 4 explains in form 1: ‘Painting technique very fine, dense, disciplined and well-worked, not very lively, unusually well-worked, smooth for a Rembrandt.’ And Expert 2 in form 2: ‘Not a single brushstroke aligns with Rembrandt’s handling of the brush: agile, attenuated, often with a tendency for roundish and irregular strokes.’

Also, the argument that the painter of the Mh version seems to reproduce visual qualities of Rembrandt’s technique and/or compositions with methods other than those employed by Rembrandt is mentioned five times (Supplement 4.2, argument 30). For example, by Expert 1 in form 1: ‘This painting seems to consciously reproduce the visual qualities of Rembrandt’s painting technique as well as the composition, but uses somewhat different methods to reach a superficially similar appearance.’ And by Expert 3 in form 2: ‘It imitates the effects of Rembrandts prototype with different means.’

In support of statement 3, The version in the Germanisches Nationalmuseum is the principal, the most frequently coded (four times) arguments concern the improvisation experts recognise in the painting process for the GN version (Supplement 4.2, argument 10). For instance, Expert 1 in form 1: ‘The GN version is freely, inventively painted, developing the composition on the fly and incorporating all stages into the final image.’ Argument 13, explained above, was also mentioned in support of statement 3 (three times), for example by Expert 1 in form 2: ‘The handling of the GN painting, to my eye is so familiar – entirely consistent with other paintings I have examined from early in Rembrandt’s career - that I don’t see the need of a lost painting that is ‘more Rembrandtesque’ to explain the shared qualities.’

Interviews 1 and 2

In the interviews, the experts were asked to explain which part of the evidence was most important to them when completing forms 1 and 2. To code the answers given in the interviews, the main code Technical evidence was developed deductively. Its subcodes consisted of the pieces of information in the data file that the experts received before the start of the A-ECM. In response to the information experts provided in the interviews, five additional distinct themes were inductively developed: Artistic vocabulary/painting technique; Studio practice; Condition; Literature; and Reference paintings. The description of all the themes can be found in Supplement 5.1, alongside an overview of all main themes, codes, subcodes, their descriptions and examples. The co-document table, representing the number of times an expert mentioned the evidence in interviews 1 and 2, is presented in Supplement 5.2.
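
A co-document table of this kind is, in essence, a count of coded interview segments per code and per interview. The Python sketch below shows one way such a table could be tallied. The tuple layout, the function name and the example segments are hypothetical; the actual coding was carried out as described in the preregistration addendum, and the real counts are those in Supplement 5.2.

```python
from collections import Counter, defaultdict

def co_document_table(coded_segments):
    """Count how often each code occurs in each document.

    `coded_segments` is an iterable of (expert, interview, code) tuples,
    one per coded passage. A document is one interview with one expert.
    Returns a mapping: code -> Counter of (expert, interview) -> count.
    """
    table = defaultdict(Counter)
    for expert, interview, code in coded_segments:
        table[code][(expert, interview)] += 1
    return table

# Hypothetical coded passages, for illustration only.
segments = [
    ("Expert 1", 1, "Technical evidence"),
    ("Expert 1", 1, "Studio practice"),
    ("Expert 2", 1, "Technical evidence"),
    ("Expert 2", 2, "Technical evidence"),
    ("Expert 2", 2, "Technical evidence"),
]

table = co_document_table(segments)
for code, per_document in table.items():
    print(code, sum(per_document.values()), dict(per_document))
```

Summing a row gives the total number of times a code was applied, which is the kind of figure reported per theme in the text.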

The inductively developed theme Artistic vocabulary/painting technique covers the set of skills and techniques a painter masters and employs to reach their artistic goals, which can be wide or limited and can be particular to an individual artist. It includes the subcodes concerning colour and light, the use of materials, as well as paint handling, such as the perceived speed of painting, and the thickness and broadness of strokes (e.g., impasto). This is the theme mentioned most by experts (121 times), who used such properties to characterise the painting and, in some cases, for statements about whether they are in line with Rembrandt’s painting style or not. Within this main theme, the subtheme Layering/build-up was coded most frequently (24 times). For example, Expert 3 said in interview 1 that there is a: ‘difference in where you see underlying layers, especially the first paint layer, probably in the Nuremberg painting which seems to me on a logical basis where to expect them in paintings by Rembrandt.’

The emphasis on the build-up of the paintings is supported by the fact that within the main code Technical evidence (mentioned 44 times), the most advanced imaging techniques to analyse build-up, Ma-XRF and IRR, were mentioned most (both 13 times). As Expert 1 explains in interview 1, for example: ‘I would say of the technical evidence for me, the most important was the Ma-XRF, which was new to me and which gave a more varied picture.’ And Expert 2 in interview 1, when asked to indicate the importance of different techniques for their opinion: ‘Well, definitely the infrared [IRR]. I think the revelation of the ground layer, the revelation of change that had taken place underway.’ The explicit mention of other subcodes of Technical evidence was incidental, and some techniques, such as Hirox digital microscopy and dendrochronology, were not mentioned at all. In the case of dendrochronology, this may be because the outcome of this part of the technical research was convincing and did not require further discussion, but also because the felling date of the tree does not provide definitive answers about the (de-)attribution of (one of) the painting(s), or about whether either is a principal.

The theme Studio practice (61 times) concerns the role of all aspects of the studio in the practice of the artist, including its students. Subcodes covered comments related to opinions about whether a painting is considered to be imitated from another work of art, whether it was invented ‘on the fly’ by its painter, whether it was painted from life, or whether adjustments were made. The subcode Imitation was coded most (11 times), suggesting an important role, which is not surprising considering the nature of the issue at hand. For example, Expert 2 stated in interview 1: ‘The difference in the squatter-turned brush stroke at the tip, lends a certain roughness that is imitated actually in the Mauritshuis painting with little dabs, but that’s not efficient. That’s not Rembrandt’s way of doing it, but it’s an imitation of it.’

The theme Condition (12 times) covers the general condition of the painting, as well as more specified mentions of condition issues such as abrasion, paint discoloration and too much varnish. While Experts 1 and 2 did not mention the condition of the paintings at all, Expert 3 mentioned condition once in interview 1, and Expert 4 covered this topic more in-depth in interview 1 (ten times). For instance, as Expert 4 stated in interview 1: ‘it is because the Nuremberg painting is very heavily cleaned too. You have a lot of areas where it’s over-cleaned. You see this on the highest of the grain of the wood, the highest point of the wood. There you very often see this where the paint is gone, so this must have been cleaned too heavily.’

The frequent mentioning of Reference paintings (18 times) may be caused by the set-up of the interview, in which experts were asked explicitly about reference paintings. Unrelated to this question, the experts refer to Rembrandt’s painting technique more broadly, without indicating specific paintings. Lastly, the main code Literature was coded the least (3 times). This does not necessarily indicate that the experts do not use knowledge from literature in their argumentation. They might have internalised this knowledge as part of their own knowledge base, as it is not common to make oral references to published literature.

Debriefing

At the end of the A-ECM, the chair and experts were asked to participate in a debriefing in the form of a semi-structured open group interview. The experts and chair discussed various aspects of the set-up of the expert meeting. They were very positive about the decision to include an individual assessment prior to the group discussion. They felt that being able to compose one’s thoughts without being influenced by other opinions strengthened their individual contributions, increased the productivity of the overall meeting, and provided a valuable ‘launching pad’ for the group discussion, thus increasing its quality. One expert explained how the individual assessment and forms led to independent confirmation of their views, which served as a reassurance of their expertise. Another expert mentioned that the interviews in between supported progress in their thought process. A potential point of improvement mentioned by the experts was to reserve more time to look at the paintings individually and to complete form 1. One expert suggested spreading the programme over several days to enable experts to reflect on the case before participating in a group discussion. Furthermore, the advantage of including more experts was discussed. However, the risk of undermining the productivity of a discussion with a larger number of participants was raised as a counterargument. Some participants thought that including experts from the institutions owning the paintings could add to the quality of the assessment. They did not expect a risk of institutional experts arguing pro domo.

The experts reflected on the rigidity of the forms and felt they often had to repeat their arguments in the forms. Furthermore, they discussed the interpretation of the consensus rates. While one expert started the conversation by noting that giving a rate of 100% may lead to a closed-minded attitude in the case of new evidence coming up, another expert stated that a 100% rate may indicate that one is ‘comfortable’ with believing a certain hypothesis or theory at this moment in time. Interestingly, these reflections on the interpretation of the Likert scale only arose after the participants had used it during the A-ECM, not when the scale was discussed during the preparatory meeting before the A-ECM.

The participants furthermore exchanged thoughts on how the approach of the A-ECM would work with a more controversial attribution case. The confidentiality aspect which was part of the preparation was mentioned as a good starting point. The participants did think that in more controversial cases, experts might be protective of findings, first seeking to publish independently before participating in a group discussion such as the A-ECM. Therefore, in such cases, it might be difficult to find a group that has no stake in the discussion.Footnote 23

Discussion

This first attempt to replicate an art historical attribution study revealed several strengths, limitations, and implications for future development. First of all, a novelty for art history was not only the replication itself but also the preregistration on the Open Science Framework (Rulkens et al., 2022a; Rulkens et al., 2023b; Rulkens et al., 2023a; Rulkens et al., 2024a; for overall project page, see Rulkens et al., 2025a). This allowed for a more detailed documentation of (the evolution of) the study design and prompted researchers to thoroughly and continuously contemplate and fine-tune their approach. This was especially beneficial because this study concerns a replication, inherently putting additional emphasis on applied methodologies. Further case studies could be valuable to assess the advantages of preregistration for various other (less technical) kinds of art historical studies, as well as for the exploration of adjustments to existing preregistration templates to make them more suitable for qualitative research (cf. Haven et al., 2020).

Second, the combined approach of carrying out a reproduction and a conceptual replication allowed for the demonstration of the impact of new and improved methodologies on the initial main research questions. As expected, the new IRR and X-ray images, which were part of the conceptual replication, were of a higher quality than those used in the initial study and reproduction. In the current case, the increased level of detail, for example of the underdrawing in the Mh version, did not lead to other conclusions. However, the differences may be decisive in other cases. The outcomes of the reproduction of the dendrochronology were further refined in the conceptual replication, with the correction of measuring errors which had been identified in the reproduction. The conceptual replication of the dendrochronology demonstrated the impact of improved practices such as the precise measurement of tree rings, more rigid statistical analysis and the extensive reporting of findings compared to the initial study, when this methodology was still in an earlier stage of development (Supplement 3.1). Carrying out both a reproduction and a conceptual replication did require repetitive analysis of some of the technical research methods to prevent bias through data contamination (see Rulkens et al., 2023a, para. 4.2), which in some instances had the disadvantage of being counter-intuitive and inefficient.

Third, the detailed reconstruction of the initial study, which was a necessity in particular for the design of the reproduction, contributed to the historiography of the replicated study. Moreover, it revealed important gaps in our knowledge about the course of events of the initial study. Together with practical and theoretical infeasibilities, these gaps limited the possibility of carrying out a complete reproduction and as a result, only a ‘partial reproduction’ was carried out. Although its findings corroborated the conclusions of the initial study, due to the concessions made to its design it remains unclear whether a complete reproduction would have led to the same overall outcome. Additionally, it raises the question of whether a partial reproduction can still be considered to be a reproduction, since inherently, parts of the protocol are changed – or better in this case, omitted.

The knowledge gaps which were identified by the efforts to reconstruct the initial study were invaluable in informing the improvements of the study design of the conceptual replication with regard to documentation, transparency and future replicability. These strengths and limitations reveal that replication researchers should carefully consider whether the extra effort of applying a combined approach is outweighed by its epistemological benefits, especially when concessions have to be made in the design of the reproduction. Based on our experiences in this study, it is conceivable that a detailed reconstruction of the initial study might be sufficient to inform the design of a conceptual replication, and that the step of the actual conduct of the reproduction is not a necessity to benefit from its lessons.

Fourth, the conceptual replication and introduction of the A-ECM highlight the potential of formalised consensus methods as a means to increase transparency, efficiency, quality and future replicability of attribution processes. The preset protocol with the introduction of the forms and consensus threshold allows for formal ‘consensus’ on the attributions and principality of the paintings. It provides experts with the opportunity to both individually and collectively assess the data and paintings, whilst mitigating biases and documenting lines of argumentation. This method not only provides additional insight into the process of scholarship, but potentially prevents the ‘wasting’ of knowledge generated during expert meetings, and improves the argumentation and judgement of the experts involved. The introduction of the A-ECM is therefore a major innovation.Footnote 24

The analysis of the forms revealed how answers to multiple choice questions and certainty percentages related to the main research questions may remain the same, whilst insights into the paintings and their creation change, as expressed in the free-text fields. Analysis of the textual arguments furthermore shows that even when there is a high level of agreement on the attribution of the paintings, including the views of multiple experts has the potential of broadening the spectrum of arguments supporting the attributions. Every additional expert brings along a different set of skills and knowledge, leading to a more complete assessment of the paintings and, when consensus is reached, the enhancement of the trustworthiness of findings. This is highly relevant for a field that relies strongly on the individual expertise and opinions of experts. Despite the potential of the A-ECM to contribute to the increase of trustworthiness of findings, it is evident that, as is often the case in (art) historical research, the conclusions of the A-ECM can never be considered as definitive. Other relevant perspectives may exist, and certain evidence may not have been available to the experts or may emerge in the future.

The A-ECM has the potential to efficiently, fairly and adequately include and document a larger number of experts’ opinions and helps create a systematic overview of their argumentation. However, as the method does require some level of generalisation of arguments and reduction of the complexity of questions, it might be best considered an addition to, and not a substitute for, academic publications substantiating (de)attribution in detailed and multilayered arguments. Combining these two approaches would do justice to the complex and nuanced knowledge at play in attribution cases.

Conclusion

Looking at the future of replication research in art history, and more broadly in the humanities, it is important to realise that this study’s perspective rests on one particular case in one particular area of research, namely attribution. Furthermore, this study employed both methods from the natural sciences and those belonging to more ‘traditional’ art history, such as expert opinions. This makes it an interdisciplinary study connecting the sciences with the humanities, rather than a ‘typical’ humanities study. Yet, the results obtained do carry wider relevance, and interestingly, one major innovation concerns specifically the humanities side. We demonstrated that a replication study can strengthen the scientific value of expert meetings by increasing the transparency of expert judgements. Replication as a research method furthermore allows for the (partial) reconstruction of past research practices. Protocols developed to increase replicability can help prevent bias, contribute to Open Science, and have been shown to accommodate both instrumental data and connoisseurial expertise. To further develop the A-ECM method and test its strengths and limitations in various attribution contexts, the method is currently being applied to a second, Rubens case study, of which the results will shortly be published (Rulkens et al., 2024b, forthcoming). A worthwhile next step would be to further refine the A-ECM methodology based on those two case studies, and apply it to another, more controversial attribution case. This would allow for an assessment of whether this method is effective in a setting where financial, institutional or prestige-related implications are even more significant.

This paper has already revealed how the merits of carrying out a replication may not be limited to the corroboration of findings.Footnote 25 Striving for future replicability in itself leads to a richer picture of the variation of knowledge, arguments and scholarly processes leading up to the (de)attributions. Replication has furthermore proven to be a valuable tool to identify opportunities to improve attribution methodologies. These opportunities entail the aforementioned increase of scholarly transparency, facilitation of equitable knowledge exchange, mitigation of biases and by extension, the strengthening of expert argumentation and attribution judgements. The future of replication in art historical attribution studies therefore might not be limited to practically carrying out replications. It may also motivate researchers to develop methodologies that maximise the possibility for future replicability, in line with recent developments in Open Science, and to thus increase epistemic impact and progress in the field.