Transcribing historical Canadian weather data

Slonosky, Victoria; Black, Rachel; Podolsky, Lori; Wang, Xiaolan; Cheng, Vincent

doi:10.1038/s41597-025-06036-y


Data Descriptor
Open access
Published: 29 April 2026


Scientific Data volume 13, Article number: 678 (2026)


Abstract

Historical weather journals from across Canada, spanning 1768–1884, have been transcribed from handwritten records into machine-readable formats. The NORTHERN (Nineteenth-century Overseas Records Transcribed for Historical Environmental Reconstruction in the North) project transcribed nearly 2 million weather observations from 46 locations. The original documents are held in archives outside Canada. The two principal archives investigated for historical Canadian weather are the United States’ National Archives and Records Administration (NARA) and the United Kingdom’s Meteorological Office (UKMO) Library and Archives. Some observations were also located in the United States’ National Centers for Environmental Information (NCEI) “Forts” dataset. Observers recorded from three to twenty weather variables, in most cases two or three times daily. Following validation, the data are exported both in their original format and in modern units. Observations of pressure, temperature, precipitation, snow depth, cloud cover, cloud type, wind direction and wind force are transcribed along with detailed descriptions of events including fires, floods, ice formation and break-up, storms and other weather phenomena. The value of these data lies in their detailed observations of sub-daily weather together with descriptive observations of disruptive or extreme weather events. These data will be used to expand knowledge of Canada’s climate variability and extreme values for three centuries and to improve global reanalysis data products.


Background & Summary

Millions of historical weather observations from Canadian locations exist in paper format in archives in Canada and around the world¹. These historical records are of immense importance for understanding long-term climate change^2,3,4, climatic extremes and high-impact weather events^5,6, and for constraining and validating climate models⁷. Historical observations such as those presented in this paper are collected into international data collections^8,9 and are used as input data for the numerical weather prediction models that produce reanalysis datasets, such as the Twentieth Century Reanalysis Project (20CR)^10,11. High-quality, multi-variable, sub-daily observations and the reanalyses they enable are particularly vital for furthering our understanding of rare, disruptive or high-impact events that caused local, regional and global social disruption, such as the Year without a Summer of 1816, following the Tambora eruption of 1815¹².

Here we present a collection of digitally transcribed (machine-readable) and quality-controlled historical weather observations from the 18th and 19th centuries in Canada, concentrating on the period before the founding of the Meteorological Service of Canada (MSC) in 1873. These observations are housed in archives outside present-day Canadian territory. We note that the territory that is now Canadian evolved considerably over the past three centuries. In the early nineteenth century, Canada consisted of British colonies, known as “British North America”, in southern Ontario, Quebec, New Brunswick, Nova Scotia, Prince Edward Island and Newfoundland. The Hudson’s Bay Company had been granted the land covered by the drainage basin of Hudson Bay, a territory then known as Rupert’s Land. At Confederation in 1867, the colonies of Ontario, Quebec, New Brunswick and Nova Scotia formed the country of Canada. Rupert’s Land became attached to Canada in 1870, along with the area near the Red River Settlement (Winnipeg). British Columbia joined Canada in 1871, Prince Edward Island in 1873, and Newfoundland in 1949. Thus, many of the locations now in Canada were administered by different entities during the period of interest here, and many meteorological records can be found in non-Canadian archives.

The main source of these observations, with 32 records, is the United States National Archives and Records Administration (NARA), specifically the NARA M1958 collection, which consists of eight rolls of microfilm of weather records from 19th-century sources outside the contiguous United States¹³. The microfilms were produced by NARA in 2005. We designate this source “NARA M1958”. Two additional records were found in the US National Centers for Environmental Information (NCEI) “Forts” dataset^14,15, designated here “NCEI-Forts”.

Locations where existing information is sparse, particularly in the Canadian north and north-west^1,5, were preferentially selected. The two selection criteria, observational scarcity and record longevity, were often mutually exclusive: longer, more stable observing records tended to come from more southern areas, with some exceptions (Fig. 1). Considerable effort was expended on the short, fragmentary and often difficult-to-decipher records of the Canadian North-West. Image files of the records were uploaded into an online app, where values were transcribed directly into a relational database^16,17. Efforts are made to transcribe as much as possible of the original documents, and all of the information for each site, in order to conserve our scientific heritage and to support meteorological and climatological intra-variable coherence, data validation and data traceability. All the information from Canadian stations in the NARA¹³ and Forts¹⁴ archives was extracted. Some of the UKMO Royal Engineers and Army Medical Department records were transcribed in an earlier project (see below). We believe we have extracted the main sites and information for Canadian stations from the official British Army archives held by the UKMO; however, further Canadian observations may remain in other UKMO collections.

Earlier projects in 2017 and 2019 located the records and identified the observational data and structure in the NARA and NCEI archives. A pilot project (Transcribing Historical Canadian Climate Records: the Red River Settlement and York Factory Records) ran from January to March 2020, during which observations from York Factory¹³ (1874–1884; 204,028 data points) and the Red River Settlement¹³ (1844, 1855–1861; 31,798 data points) were digitized by 16 transcribers. Following the success of this pilot, a more substantial project (Transcribing Historical Canadian Weather Data: The Smithsonian Records), renewed twice in 2022 and 2023, was organized to transcribe observations from the remaining Canadian stations in the NARA¹³ and Forts¹⁴ datasets. This amounted to 496,578 data points transcribed by 12 transcribers in the first year (2021–2022) and 744,523 data entries by 11 transcribers in the second year. As the project entered its third year, most of the observations from the NARA and NCEI sources had been transcribed, so meteorological records held in the United Kingdom were investigated. Twelve records were found in the UKMO archives. Seven transcribers entered 482,829 data points in the final year of the project.
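As a quick consistency check, the per-phase counts given in this section can be summed and compared with the abstract's figure of nearly 2 million observations from 46 locations. The sketch below only adds up the numbers stated in the text; the dictionary labels are our own shorthand.

```python
# Data points transcribed in each project phase, as reported in the text.
phases = {
    "Pilot 2020: York Factory": 204_028,
    "Pilot 2020: Red River Settlement": 31_798,
    "Smithsonian records, year 1 (2021-2022)": 496_578,
    "Smithsonian records, year 2": 744_523,
    "UKMO records, final year": 482_829,
}

# Station records per source: NARA M1958, NCEI-Forts and UKMO.
records = {"NARA M1958": 32, "NCEI-Forts": 2, "UKMO": 12}

total_points = sum(phases.values())
total_records = sum(records.values())
print(f"{total_points:,} data points from {total_records} records")
# prints: 1,959,756 data points from 46 records
```

The total of about 1.96 million data points and 46 records is consistent with the figures quoted in the abstract.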

The transcribed values are verified, validated, transformed into modern international units and exported into machine-readable text files. Here we use “verify” to denote checking that the transcribed values are faithful to the original observations recorded by the historical observers, and “validate” to denote checking that the transcribed values fall within the normal ranges for the meteorological variable observed. The two processes are sometimes in conflict, as when the original observer transposed digits. In these cases, the original observation is flagged and, where possible (e.g. the observer wrote “15” instead of “51” for a summertime temperature in degrees Fahrenheit), the transformed value is corrected at the validation stage. All changes are recorded automatically. The initial export is a csv file that reproduces the layout of the original register pages as faithfully as possible. The other export formats are the Station Exchange Format (SEF)^18,19,20 and the NCEI recommended csv format²¹.
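The distinction between verification and validation can be illustrated with the transposed-digit case above. This is a minimal sketch under stated assumptions, not the project's code: the function and class names are hypothetical, the plausible range is supplied by the caller, and only two-digit readings are tested for reversal. The verified original string is always kept unchanged; only the validated value may differ.

```python
from dataclasses import dataclass


@dataclass
class CheckedValue:
    original: str   # verified transcription, never modified
    value: float    # validated value used for the transformed export
    flag: str       # "ok", "corrected" or "out_of_range"


def validate_fahrenheit(original: str, low: float, high: float) -> CheckedValue:
    """Check a Fahrenheit reading against a plausible range [low, high]."""
    reading = float(original)
    if low <= reading <= high:
        return CheckedValue(original, reading, "ok")
    digits = original.strip()
    # A two-digit reading whose reversed digits are plausible is treated as
    # a transposition by the observer (e.g. "15" written for "51").
    if len(digits) == 2 and digits.isdigit():
        swapped = float(digits[::-1])
        if low <= swapped <= high:
            return CheckedValue(original, swapped, "corrected")
    # Otherwise keep the reading but flag it for manual review.
    return CheckedValue(original, reading, "out_of_range")


# Hypothetical plausible range for a summer month at a southern station.
print(validate_fahrenheit("15", low=40.0, high=100.0))
```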

Many of the historical observations transcribed are weather records kept by volunteer observers and transmitted to the Smithsonian Institution as part of the volunteer Smithsonian Meteorological Project organized and maintained by Joseph Henry^22,23,24. Although the project originated in the United States, one of Henry’s goals was the understanding and prediction of storms, and observations were therefore collected from across North America. The project began in 1849. The U.S. Civil War of 1861–1865 severely disrupted the observation network in the United States²⁴, although it was a peak period of observations in Canada (Fig. 1b). After the Civil War, responsibility for communications was taken up by the US Signal Service.

Printed forms with a detailed set of instructions on how to observe and record weather and meteorological phenomena were sent to observers to be filled out and returned at the end of each month (see Fig. 2). Many of the volunteer weather observers had already been engaged in recording the weather and in some cases other copies of their observations exist in local or other international collections. The value of the Smithsonian collection is that the participants were requested to observe a standard set of variables at specific observing times, usually 7AM, 2PM and 9PM local time, at stations across the continent. This gives the set of observations standard observing practices, a set of commonly observed variables and regularized register forms. These aspects increase confidence in the accuracy of the observations and make it possible to design a web-based transcription process.
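The standard observing hours were kept in local time, whereas the SEF exports described in this paper report dates and times transformed to UTC. The sketch below shows one way to do that conversion, assuming the observer kept local mean solar time, so that the offset from UTC is longitude / 15 hours. If an observer followed some other local time convention, the offset would differ. The function name and the example longitude are hypothetical.

```python
from datetime import datetime, timedelta


def local_mean_time_to_utc(local: datetime, longitude_east: float) -> datetime:
    """Convert a naive local-mean-solar-time datetime to naive UTC.

    Longitude is in degrees east (negative west of Greenwich). Local mean
    time runs ahead of UTC by 4 minutes per degree east, i.e. longitude/15 h.
    """
    return local - timedelta(hours=longitude_east / 15.0)


# Smithsonian observing hours at a hypothetical station at 75 degrees west:
# 07:00 and 14:00 local become 12:00 and 19:00 UTC, and 21:00 local
# becomes 02:00 UTC on the following day.
for hour in (7, 14, 21):
    local = datetime(1860, 1, 15, hour)
    print(local.time(), "->", local_mean_time_to_utc(local, -75.0))
```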

The consideration of how to gauge the trustworthiness of historical observations is complex²⁵. Instructions to observers include both specific instructions of daily readings (Fig. 3) and more general remarks on the placement of instruments. Barometers “may be conveniently placed within doors, in a room not subjected to sudden changes of temperature, in a good light but shaded from the direct rays of the sun”²⁶. Thermometers “should not be placed in contact with the side of a house. The best position for the thermometer is in the middle of a projection from a window on the north side of the house, so as to be entirely in the shade”²⁶.

Other records in these archival collections include documents from individual observers and from organized weather collection efforts by the United States Surgeon General’s (USSG) Office. After the Smithsonian ended direct involvement in the volunteer weather project in 1870, the United States Signal Service (USSS) continued to collect weather observations from volunteer observers¹³. Some records from present-day Canadian territory can also be found in the “Forts” data collection¹⁴ of the United States’ National Centers for Environmental Information (NCEI).

Weather registers and journals are also found in the archives of the UKMO^27,28. Among these are the observations kept by the Royal Engineers and later by the Army Medical Department^28,29,30,31,32,33,34,35,36,37. The Royal Engineers (RE) also observed the weather at specified local times, according to systematic instructions and on pre-printed register forms³⁸.

Methods

The procedures used to recover the historical weather and climate data can be divided into three main parts. The first is the pre-transcription processing, which includes locating the observations, obtaining digital image files, processing the image files and configuring the app for data transcription (Fig. 4a). The second part is the actual transcription process. The third part is the post-transcription processing of the now machine-readable weather observations (Fig. 4b).

A traceable transcription and validation process is critical to maintaining transparency in data records. Here, traceability starts with maintaining a connection to the original archival source through the digital image file of the original meteorological observations. The code for the transcription app can be found on the GitHub platform³⁹.

The image files

The image files obtained from the NARA M1958 repository¹³ were organized by station location and renamed according to register type, page type and date. Three considerations went into naming the image files. First, we wished to embed metadata of interest, such as the location and the time period covered, in the file name, so that an image file could easily be located later in the project when the weather data transcribed from it were examined. Second, we wished to create a unique identifier for each image file. Finally, we wanted to maintain traceability of data from the original archive identifier to the data export in flat files. To accomplish these three goals, the elements listed in Fig. 5 were combined to form each image file name.

Each image file was given a unique identifier composed of the station location, the observer’s name if necessary for disambiguation, the register type, the originating archive identifier, the date of the observations and the page type. A typical file name is YorkFactory_USSS-316_M1958_1883-06-01_OBS-1.jpg. The image files were then uploaded to the web app, and appropriate transcription environments created to replicate each register type.
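The naming scheme can be made concrete with a small build-and-parse helper. This is a sketch under assumptions rather than the project's code: it assumes the elements are joined with underscores in the order given above, that the optional observer name sits between station and register type, and that no single element contains an underscore itself (diary-style register codes such as Amherstburg_AM-1 would need different handling). All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ImageName:
    station: str
    register: str                   # e.g. "USSS-316"
    archive: str                    # originating archive identifier, e.g. "M1958"
    obs_date: date
    page: str                       # e.g. "OBS-1"
    observer: Optional[str] = None  # only when needed for disambiguation

    def filename(self, ext: str = "jpg") -> str:
        parts = [self.station]
        if self.observer:
            parts.append(self.observer)
        parts += [self.register, self.archive, self.obs_date.isoformat(), self.page]
        return "_".join(parts) + "." + ext


def parse_image_name(name: str) -> ImageName:
    stem, _, _ext = name.rpartition(".")
    # The last four elements are always register, archive, date and page type.
    *head, register, archive, iso_date, page = stem.split("_")
    if len(head) == 1:
        station, observer = head[0], None
    elif len(head) == 2:
        station, observer = head
    else:
        raise ValueError(f"unexpected file name layout: {name}")
    return ImageName(station, register, archive,
                     date.fromisoformat(iso_date), page, observer)


info = parse_image_name("YorkFactory_USSS-316_M1958_1883-06-01_OBS-1.jpg")
print(info)
```

Because the parsed fields rebuild the same file name, the name can serve both as a unique identifier and as a traceable link back to the archive identifier.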

Image files were examined for quality, and notes were made on quality issues. Notes include comments on the condition of the image or of the original pages, such as “major ink smear”, “badly scanned, but mostly legible”, “pasted in values”, or “heavy bleedthrough”. The image files from NARA and NCEI were obtained from microfilm images of the original documents, and so are at several removes from the originals (Fig. 2). In some cases, the original documents were not accessible to us, either because of their fragility or because they are held in an overseas archive. Problems with the image files originated from two main sources. The first, inherent to the original documents, was their quality and the conditions under which the weather observations were recorded and transmitted to the Smithsonian Institution. Tears in the pages were common in older documents and in pages from trading posts in Canada’s interior such as Fort Simpson or Michipicoten, which presumably had long postal journeys over rough terrain. A substantial number of pages had ink blots or bleedthrough obscuring sections of the page. At Wolfville, two months of humidity observations had recalculated values pasted over the originally recorded values. Some of these issues, particularly the bleedthrough and pasted sections, might have been easier to resolve had the images been available in colour rather than as black-and-white microfilm. The second major source of problems arose when part of the document is obscured by what appears to be tape or bindings on the document itself, or by possible microfilming issues. These problems can leave portions of the observations on a page irrecoverable.

The images from the UKMO were taken directly from the original sources, either by photograph or by high-quality scan. The photographs were affected by the binding of the original volumes: values towards the bound edges of the pages were distorted and difficult to read. The images scanned professionally by the UKMO library and archives staff were of the highest quality, with few issues.

The register types

As most of the records were inspired by the Smithsonian volunteer weather observing network, the observers recorded their observations on pre-printed forms distributed first by the Smithsonian Institution (Fig. 2) and later by its various successors such as the US Signal Service. Similarly, the observations from the UKMO archives were largely taken by military observers using standard printed forms. Although printed forms have the advantage of providing uniformity across stations and time, there is nonetheless some variety in the forms. Formats changed over time as observing variables were added or removed, or as observing instructions were updated with evolving needs and improving instrumentation. Different forms were sent to different observers depending on the types of observations made. The forms were catalogued and given a code based on the number of pages in the form: the “100” code family denotes a one-page form, “200” a two-page form, and so on (Table 1). A new register type was coded if the variables recorded or the layout of the printed form changed. A subtype was noted if the observer added handwritten modifications or additional observations. This structure is designed to be flexible, as new register types continue to be identified in new source material.
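The page-count convention in the register codes can be read back mechanically. A minimal sketch, assuming codes of the form PREFIX-NNN (as in USSS-316) where the hundreds digit gives the code family; the function name is hypothetical, and diary-style codes or any subtype suffix simply return None here because their notation is not specified in this section.

```python
import re
from typing import Optional


def form_pages(register_code: str) -> Optional[int]:
    """Return the page count implied by a numeric register code.

    "USSS-316" belongs to the "300" family, i.e. a three-page form.
    Codes without a three-digit family number return None.
    """
    match = re.fullmatch(r"[A-Za-z]+-(\d)\d\d", register_code)
    return int(match.group(1)) if match else None


for code in ("USSS-316", "Amherstburg_AM-1"):
    print(code, form_pages(code))
```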

Table 1 Register and Page Types.


Some of the observations are recorded in personal diaries or in handwritten tables. These are unique, and as such do not conform to register page type cataloguing. These are given register types with the abbreviation of their location followed by a numerical designation for each change in the variables or layout of the diary or table. For example, the Amherstburg register¹⁴ changed formats several times, and the designations assigned to the register types are Amherstburg_AM-1, Amherstburg_AM-2, and Amherstburg_AM-3.

The page type

Each register type can have one or more page types. A page type has a specific organization of information, covering both meteorological observations and metadata such as station location, observer, date and variables observed. Not all observers had sufficient time or the necessary instruments to record every variable listed on the forms, so not all variables listed for a register type were necessarily recorded. The Smithsonian and later US Army Signal Service observation forms were sheets designed to be folded and sent by mail¹³. Their format changed over time: at times the form consisted of one sheet folded in two, with instructions printed on the reverse side. Later, the form consisted of four pages: a page of instructions, two inside pages for observations, and a page for recording remarks and casual phenomena. The page types are divided into observation pages (OBS; Fig. 2a,b), casual phenomena pages (CP; Fig. 2c) and instruction pages (INS; Fig. 2d). On some forms the casual phenomena and instructions appear on the same page (CP-I).

Each combination of register type and page type has a specific set of observing times and meteorological variables. An example of the meteorological variables and the original measurement units for register type USSS-316 is shown in Table 2, along with modern equivalents and units where possible.

Table 2 Example of meteorological variables for register type USSS-316 with units and abbreviations: barometer, thermometer, cloud and wind, precipitation, humidity and weather remarks.


Because of the microfilming process, some original document pages were photographed as one image, while others were split into two separate images: the left-hand side of one original document page and the right-hand side of another. To capture this diversity of formats, the observation pages are subdivided into full observation pages (OBS-F), left-side pages (OBS-L) and right-side pages (OBS-R). The register types for US Signal Service 314 and 316 had distinct pages, so these were named 1 and 2 rather than left and right (Fig. 2a,b). Each register type has one or more page types associated with it (Table 1).

The transcription process

As observations are transcribed into the web app, they are saved directly into a database. Both our transcription environment and our data output are designed to resemble the original observations as closely as possible, for reasons of error reduction and conservation of scientific heritage.

The transcription interface for each register type is therefore built to reflect the observation groupings in the original register pages. For example, within the user interface (UI) on the administrator pages, a field group named “Clouds” is created, with fields named “Cloud direction”, “Cloud amount” and “Cloud kind” (Fig. 6). Field values are then created for the field “Cloud kind”, including drop-down menu options such as Cumulus and Nimbus. These field values are linked with the field “Cloud kind” in the UI (Fig. 7), the field “Cloud kind” is linked to the field group “Clouds”, and the field group “Clouds” is linked to a register schema, such as USSS-316.

All linkages made in the UI are reflected in the back-end database. Field values, fields and field groups can be used in more than one register schema. Variables with a limited set of possible values, such as cloud type or wind direction, are entered through a drop-down menu to limit transcription or interpretation errors (Fig. 7). Fields that are not constrained to a limited set of options, such as barometer observations, have a free-text entry field.
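The hierarchy of field values, fields, field groups and register schemas can be sketched with a few plain data classes. This is an illustration only, not the transcription app's data model (its code is on GitHub³⁹): the class names are hypothetical, the option lists are truncated examples, and a field with no option list stands for free-text entry. Reusing the same `Field` object in several schemas mirrors the reuse described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    name: str
    options: Optional[list[str]] = None  # drop-down values; None means free text

    def accepts(self, entry: str) -> bool:
        return self.options is None or entry in self.options


@dataclass
class FieldGroup:
    name: str
    fields: list[Field] = field(default_factory=list)


@dataclass
class RegisterSchema:
    code: str
    groups: list[FieldGroup] = field(default_factory=list)

    def find_field(self, group: str, name: str) -> Field:
        for g in self.groups:
            if g.name == group:
                for f in g.fields:
                    if f.name == name:
                        return f
        raise KeyError(f"{group}/{name} not in {self.code}")


# Illustrative, truncated option lists.
cloud_kind = Field("Cloud kind", ["Cumulus", "Nimbus"])
clouds = FieldGroup("Clouds", [Field("Cloud direction", ["N", "E", "S", "W"]),
                               Field("Cloud amount"), cloud_kind])
barometer = FieldGroup("Barometer", [Field("Reading")])
usss_316 = RegisterSchema("USSS-316", [barometer, clouds])

print(usss_316.find_field("Clouds", "Cloud kind").accepts("Cumulous"))  # False
```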

Occasionally, modifications by the original observers necessitated the addition of new fields, and sometimes even the creation of new register types, during the transcription process. The observer at York Factory, for example, added minimum and maximum thermometer and supplementary barometer observations to the register. These additional observations, as well as additions to the printed observation forms, account for the differences between register types USSS-314 and USSS-316.

Data transformation to modern standards

Table 2a–f gives an overview of the historical variables, their modern equivalents, historical and modern units, and the internationally agreed abbreviations for these variables where they appear in standardized datasets and file names. Information on the specific observations, instruments and variables, such as the wind scale, is found in historical technical documents such as Instructions to Observers pamphlets or articles^38,40.

Not all historical variables have yet been given recognized modern equivalents. Most historical observations from the 19th century are not recorded in the standard SI units accepted in internationally exchanged data files (Table 2a–f). The information needs to be transformed into modern units as designated by, for example, the World Meteorological Organization (WMO) standards. Conversions for pressure (Table 2a), temperature (Table 2b), precipitation (Table 2d) and humidity (Table 2e) are well known. Wind and cloud directions are transformed from cardinal directions to degrees (Tables 2c, 3). Variables recorded on ordinal scales, such as wind force (Tables 2c, 4) or cloud velocity (Table 2c), are more difficult to transform into modern equivalents. The Smithsonian Institution developed a wind scale that was contemporaneous with, but not completely equivalent to, the Beaufort wind scale.
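The well-known conversions and the direction-to-degrees step can be sketched in a few lines. The paper applied its conversions with the ICOADS lmrlib.py functions; the sketch below uses plain arithmetic instead, so the function names are our own and are not lmrlib's API. It assumes 1 inHg = 33.8639 hPa, 1 in = 25.4 mm and a 16-point compass at 22.5° spacing (Table 3 may code calms or other entries differently), and it applies no instrument corrections, such as reduction of barometer readings to a standard temperature.

```python
from typing import Optional

INHG_TO_HPA = 33.8639  # conventional value of 1 inch of mercury in hPa
INCH_TO_MM = 25.4

COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def fahrenheit_to_celsius(t_f: float) -> float:
    return (t_f - 32.0) * 5.0 / 9.0


def inhg_to_hpa(p_inhg: float) -> float:
    return p_inhg * INHG_TO_HPA


def inches_to_mm(r_in: float) -> float:
    return r_in * INCH_TO_MM


def direction_to_degrees(direction: str) -> Optional[float]:
    """Map a 16-point compass direction to degrees; None if unrecognized."""
    d = direction.strip().upper()
    if d not in COMPASS_16:
        return None
    return COMPASS_16.index(d) * 22.5


print(round(fahrenheit_to_celsius(-20), 1))  # -28.9
print(round(inhg_to_hpa(29.92), 1))
print(direction_to_degrees("SW"))            # 225.0
```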

Table 3 Wind and cloud direction conversions from cardinal and intercardinal directions to degrees.


Table 4 Smithsonian and USSG Wind Scale conversions.


The Royal Engineers were requested to measure the wind force in pounds per square foot. This was sometimes recorded in pounds and ounces, as at Halifax (e.g. 3 15; Fig. 8a), and at other stations in decimal pounds, as at Kingston (e.g. 3.8; Fig. 8b). At still other stations, such as New Westminster, the engineers recorded the wind force on the Beaufort scale. Some wind force observations are further complicated by observers changing their method of recording partway through the record (e.g. from Beaufort to miles per hour; Fig. 8c). With up to five different methods of recording, and with the SEF standard requiring wind speed in m/s rather than wind force scales, wind force was one of the most difficult variables to interpret and convert.

Most conversions were applied using the standard functions contained in the lmrlib.py routine produced by the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) from NOAA⁴¹. The formula used for converting wind force in pounds per square foot to metres per second is given by Equation 1:

Equation 1. Conversion from wind force measured in pounds per square foot to metres per second

$$ws = 0.44704\sqrt{\frac{wf}{0.00256}}$$

where ws is wind speed in m/s and wf is wind force in lb/ft². The factor 0.00256 relates wind pressure in lb/ft² to the square of wind speed in miles per hour, and 0.44704 converts miles per hour to m/s.
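Equation 1 can be combined with parsing of the two pound-based recording styles described above. A minimal sketch, assuming that a space-separated pair such as "3 15" means pounds and ounces (as in the Halifax example) and that a single number is decimal pounds (as at Kingston); Beaufort or miles-per-hour entries are not handled. The function names are hypothetical.

```python
import math
import re


def parse_pounds(entry: str) -> float:
    """Parse a wind-force entry in pounds (per square foot).

    "3 15" is read as 3 lb 15 oz; "3.8" as decimal pounds.
    """
    m = re.fullmatch(r"\s*(\d+)\s+(\d+)\s*", entry)
    if m:
        pounds, ounces = int(m.group(1)), int(m.group(2))
        if ounces >= 16:
            raise ValueError(f"ounces out of range in {entry!r}")
        return pounds + ounces / 16.0
    return float(entry)


def wind_force_to_speed(wf_lb_per_ft2: float) -> float:
    """Equation 1: wind speed in m/s from wind force in lb/ft^2."""
    return 0.44704 * math.sqrt(wf_lb_per_ft2 / 0.00256)


for entry in ("3 15", "3.8"):
    wf = parse_pounds(entry)
    print(entry, wf, round(wind_force_to_speed(wf), 1))
```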

Cloud velocities are also recorded on a scale of 1 to 10 (Table 2c). Cloud amounts, or cloud cover, were commonly recorded in tenths in the 19th century, whereas the units prescribed by the SEF standard are oktas (eighths). Cloud types used in the Smithsonian and other records are listed in Table 5, along with equivalents from the International Cloud Atlas⁴².
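The tenths-to-oktas step can be sketched as a proportional conversion. This is one plausible rule, not necessarily the one used for the dataset: it rounds half up, and it reserves 0 and 8 oktas for completely clear and completely overcast skies, in the spirit of WMO code table 2700. The function name is hypothetical.

```python
import math


def tenths_to_oktas(tenths: float) -> int:
    """Convert cloud cover in tenths (0-10) to oktas (0-8)."""
    if not 0 <= tenths <= 10:
        raise ValueError(f"cloud amount out of range: {tenths}")
    if tenths == 0:
        return 0
    if tenths == 10:
        return 8
    # Proportional conversion, rounded half up, kept within 1..7 so that
    # partial cover never reports as clear or fully overcast.
    oktas = math.floor(tenths * 8 / 10 + 0.5)
    return min(max(oktas, 1), 7)


print([tenths_to_oktas(t) for t in range(11)])
```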

Table 5 Cloud types used in the Smithsonian registers and International Cloud Atlas equivalents.


Historical weather remarks are more difficult to translate into modern synoptic weather codes (Tables 2f, 6). The weather conditions described in the historical registers are not fully equivalent to the modern Canadian synoptic codes: some conditions described in the historical documents have no parallel in the synoptic codes, and some synoptic codes have no exact equivalent in the historical wording of past weather conditions.

Table 6 Historical Weather remarks and equivalent Canadian Synoptic Weather Codes⁴⁶.


Data Record

The dataset is available at the US NOAA National Centers for Environmental Information (NCEI)⁴³. The dataset is titled AIR TEMPERATURE, Surface pressure, and others collected from FIXED STATIONS OF CANADA in Canada from 17680911 to 18840229, with the NCEI Accession Number 0304217. The data can be found at https://doi.org/10.25921/g637-9093.

Metadata

The metadata standard used here builds on the ISO 19115 standard for geographic information, the WIGOS (World Meteorological Organization Integrated Global Observing System) recommendations⁴⁴ and the extensions and recommendations of the Copernicus Working Group Best Practice Guidelines⁴⁵. Further modifications have been made to adapt to the contingencies of historical climate information. We include historical location designators, such as historical latitude, historical longitude and additional historical place names, to reflect the fluidity of Canadian territorial designations and nomenclature.

Data Export files 1: CSV files

After the technical validation (see next section), csv files are produced whose aim is to replicate the original observations as closely as possible in the original units. This is to provide a machine-readable reproduction of the historical records. The files are produced by register type, as each register type reflects differences in the times of observation or variables observed. The csv files for York Factory are thus YorkFactory_USSI-412_1874-10_1876-09.csv, YorkFactory_USSI-314_1876-12_1881-12.csv and YorkFactory_USSI-316_1882-01_1884-02.csv.

Data Export Files 2: SEF files

The export format of the data files is based on the station exchange format (SEF) developed by the Copernicus Working Group. The filename is constructed using the elements of the originating data project source or repository, the station name, the start date, the end date, and the variable abbreviation.

The first eleven lines of the SEF file contain standardized metadata: the SEF version, the station ID, the station name, the latitude, the longitude (degrees east), the station altitude, the source, the data link, the variable abbreviation, the temporal statistic and the measurement unit. The twelfth line describes the structure of any metadata included in the subsequent data lines, as well as any metadata that apply to the overall data series (for example, “UTCOffset = YES” specifies that the dates and times in the file have been transformed to Coordinated Universal Time, UTC).
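The twelve header lines and the file name described above can be generated with a small helper. A sketch under assumptions: it assumes the tab-separated "key, value" header layout of the Copernicus SEF guidelines with the keys SEF, ID, Name, Lat, Lon, Alt, Source, Link, Vbl, Stat, Units and Meta, and a YYYY-MM date style and .tsv extension in the file name; these details, the helper names and all example values are hypothetical and should be checked against the SEF specification.

```python
from datetime import date


def sef_filename(source: str, station: str, start: date, end: date, vbl: str) -> str:
    """Hypothetical SEF file name: source, station, start, end, variable."""
    return f"{source}_{station}_{start:%Y-%m}_{end:%Y-%m}_{vbl}.tsv"


def sef_header(station_id, name, lat, lon, alt, source, link,
               vbl, stat, units, meta, version="1.0.0"):
    """Return the twelve tab-separated SEF header lines."""
    rows = [
        ("SEF", version), ("ID", station_id), ("Name", name),
        ("Lat", lat), ("Lon", lon), ("Alt", alt),          # Lon in degrees east
        ("Source", source), ("Link", link), ("Vbl", vbl),
        ("Stat", stat), ("Units", units), ("Meta", meta),
    ]
    return ["\t".join((key, str(value))) for key, value in rows]


# All values below are placeholders for a hypothetical station.
header = sef_header("XXXX", "Station X", 50.0, -90.0, 100, "NORTHERN",
                    "https://doi.org/10.25921/g637-9093", "ta", "point", "C",
                    "UTCOffset = YES")
print("\n".join(header))
print(sef_filename("NORTHERN", "StationX", date(1874, 10, 1), date(1876, 9, 30), "ta"))
```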

Data Export Files 3: Final SEF and NCEI format csv

The final set of SEF files is compiled once post-processing is complete. Once the database has been updated with a final ISO-standard table, csv files are also produced according to NCEI specifications, with the station metadata, observation metadata and observations for all variables of each station in a single csv file.

Technical Validation

Step 1: Page checks

The first validation step is a visual check of the transcribed data by quality-control specialists through the transcription app. The data table is compared with the uploaded image file. This initial check prioritized visual inspection for missing rows of data (indicating mis-entered dates), repeated rows of data and other common transcription errors that are difficult or time-consuming to verify without recourse to the original document. These include common historical meteorological shortcuts, such as the use of ditto marks to indicate repeated values or the omission of leading digits before the decimal point. A first check of illegible values, whether due to document degradation, microfilming or imaging issues, or handwriting, is also performed at this stage.

All issues are noted in a log file, though not all logged issues are errors. Many are notes made by transcribers on comments, annotations or inconsistencies, such as changes in observation times made by the original observers. Common transcription errors that are difficult to correct automatically included writing 57 for 51, by mistaking the bar in a handwritten five for the bar of a seven; misplacing a decimal point, such as typing 1.01 instead of 10.1; and forgetting to enter the correct date.
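In the paper this step is a visual check, but two of the problems it targets, missing or repeated day rows and ditto marks, can also be screened mechanically. The sketch below is a hypothetical aid, not part of the described workflow: it assumes one row per day on a monthly register page, and the set of ditto symbols is an illustrative assumption.

```python
from calendar import monthrange
from collections import Counter

DITTO_MARKS = {'"', "do", "〃"}  # hypothetical set of ditto symbols


def check_page_days(year: int, month: int, days: list[int]) -> tuple[list[int], list[int]]:
    """Return (missing days, repeated days) for one monthly register page."""
    counts = Counter(days)
    n_days = monthrange(year, month)[1]
    missing = [d for d in range(1, n_days + 1) if counts[d] == 0]
    repeated = sorted(d for d, c in counts.items() if c > 1)
    return missing, repeated


def expand_dittos(column: list[str]) -> list[str]:
    """Replace ditto marks with the value they repeat."""
    out: list[str] = []
    for entry in column:
        if entry.strip() in DITTO_MARKS:
            if not out:
                raise ValueError("ditto mark with no preceding value")
            out.append(out[-1])
        else:
            out.append(entry)
    return out


# A February 1883 page where day 14 was skipped and day 15 entered twice.
days = [d for d in range(1, 29) if d != 14] + [15]
print(check_page_days(1883, 2, days))
print(expand_dittos(["Cumulus", '"', '"', "Nimbus"]))
```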

Step 2

The second validation step is performed in a program external to the app. Each entry for a given station is extracted from the database. If more than one transcription exists for a station ID, variable and date, the most recently updated value is selected.
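The "most recently updated value wins" rule can be sketched as a single pass over the extracted entries. The record layout and names below are hypothetical stand-ins for whatever the database export contains; only the selection rule comes from the text.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    station_id: str
    variable: str
    obs_time: str        # observation date and time as transcribed
    value: str
    updated_at: datetime


def latest_entries(entries: list[Entry]) -> dict[tuple[str, str, str], Entry]:
    """Keep only the most recently updated entry per (station, variable, time)."""
    latest: dict[tuple[str, str, str], Entry] = {}
    for e in entries:
        key = (e.station_id, e.variable, e.obs_time)
        if key not in latest or e.updated_at > latest[key].updated_at:
            latest[key] = e
    return latest


entries = [
    Entry("S1", "ta", "1883-06-01T07:00", "15", datetime(2021, 5, 1)),
    Entry("S1", "ta", "1883-06-01T07:00", "51", datetime(2022, 1, 10)),
]
print(latest_entries(entries)[("S1", "ta", "1883-06-01T07:00")].value)  # 51
```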

Entries are then scanned for values which equate to no data, including “none”, “missing”, “retracted”, “empty”, and other transcription equivalents such as dashes, spaces, and null values. These are coded to −999 and the quality control flag is set to missing. Values entered as “illegible” or with other error codes are set to −999 with appropriate flags (Table 7). Many transcription errors are noted automatically at this stage.
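The coding of no-data entries can be sketched as a small normalization function. The token lists follow the examples in the text, but the flag strings are hypothetical placeholders for the validity codes of Table 7, which are not reproduced here.

```python
from typing import Optional, Union

MISSING_VALUE = -999
NO_DATA_TOKENS = {"none", "missing", "retracted", "empty"}
ILLEGIBLE_TOKENS = {"illegible"}


def code_missing(raw: Optional[str]) -> tuple[Union[str, int], Optional[str]]:
    """Return (value, flag); no-data entries become (-999, flag).

    Null values, blanks, runs of dashes and the listed words count as no
    data. Other entries are returned stripped, with no flag, for later checks.
    """
    if raw is None:
        return MISSING_VALUE, "missing"
    token = raw.strip().lower()
    if token in NO_DATA_TOKENS or token.strip("-") == "":
        return MISSING_VALUE, "missing"
    if token in ILLEGIBLE_TOKENS:
        return MISSING_VALUE, "illegible"
    return raw.strip(), None


for raw in ("Missing", "--", "   ", None, "illegible", " 29.85 "):
    print(repr(raw), code_missing(raw))
```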

Table 7 Validity codes.


Validation then proceeds by variable type, with different procedures depending on the assigned units. For all variables with units of inches of mercury, degrees Fahrenheit or percentage, values are checked for common transcription issues such as double decimal points, commas instead of periods as decimal separators, or spaces within the number. These are corrected automatically in the code, and an error message with a Structured Query Language (SQL) correction statement is written to a log file for each variable. Values for fields such as temperature, pressure or precipitation should all be decimal numbers, so they are converted to floats. Observers sometimes recorded values as fractions, which must then be converted to decimal values.
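The automatic clean-up of numeric entries can be sketched as follows. This is a hypothetical helper, not the project's validation code: it returns human-readable notes rather than the SQL correction statements written to the log, it treats consecutive dots as a doubled decimal point, and it recognizes simple and mixed fractions such as "1/4" or "3 1/2" before removing stray spaces.

```python
import re
from fractions import Fraction


def clean_number(raw: str) -> tuple[float, list[str]]:
    """Convert a transcribed numeric entry to float, noting each fix applied."""
    notes: list[str] = []
    s = raw.strip()
    # Simple or mixed fractions, e.g. "1/4" or "3 1/2".
    m = re.fullmatch(r"(?:(\d+)\s+)?(\d+)/(\d+)", s)
    if m:
        value = int(m.group(1) or 0) + Fraction(int(m.group(2)), int(m.group(3)))
        notes.append(f"fraction {raw!r} converted to {float(value)}")
        return float(value), notes
    if "," in s:
        s = s.replace(",", ".")
        notes.append("comma used as decimal separator")
    if re.search(r"\s", s):
        s = re.sub(r"\s+", "", s)
        notes.append("space inside number")
    if ".." in s:
        s = re.sub(r"\.{2,}", ".", s)
        notes.append("repeated decimal point")
    return float(s), notes


for raw in ("29,85", "29..85", "29 .85", "3 1/2", "29.85"):
    print(repr(raw), clean_number(raw))
```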

The observers often omitted the leading zero and decimal point when recording vapour pressure. Vapour pressure values are therefore tested for range and divided by 100 if no decimal point is detected and the value is greater than 1. Similarly, some observers did not record the leading digits before the decimal point for barometric pressure. In most cases either the transcribers added the leading digits based on previous barometric pressure values, or they were added at the first validation stage; however, these completed values needed correction at the third validation stage more often than readings the observers had written out in full. Common issues and solutions are listed in Table 8.
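Both repairs can be sketched as below. The vapour pressure rule follows the text directly; the barometer helper is a hypothetical illustration of "adding leading digits based on previous values", assuming the written decimal part is available as a fraction and that plausible whole-inch values lie between 27 and 31.

```python
def fix_vapour_pressure(value, raw):
    """Divide by 100 when no decimal point was written and the value exceeds 1,
    assuming the observer omitted the leading '0.' (e.g. '25' for 0.25 inHg).
    Results still above 1 should then fail the subsequent range check."""
    if "." not in raw and value > 1:
        return value / 100
    return value


def restore_barometer(fraction, previous, whole_inches=range(27, 32)):
    """Rebuild a barometer reading recorded without its leading inches.

    `fraction` is the written decimal part (e.g. 0.91); the whole-inch value
    is chosen so that the result is closest to the previous full reading.
    """
    candidates = (w + fraction for w in whole_inches)
    return round(min(candidates, key=lambda c: abs(c - previous)), 3)
```

Choosing the nearest candidate also handles readings that cross a whole inch, for example a decimal part of .05 following 29.98 becomes 30.05 rather than 29.05.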

Table 8 Transcription validation issues.


Variable-specific validations are noted in Table 9. Observers often used specific symbols for recurring, site-specific situations. For example, at York Factory it was difficult to record humidity when the air temperature was below −20 °F (−28.9 °C), as the tables used for calculating humidity did not extend below −20 °F. There may also have been problems with the reliability of the dry and wet bulb thermometer calibrations below this temperature. The observer recorded these cases with a modified asterisk symbol.
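A sketch of how the York Factory case could be flagged. The −20 °F limit comes from the text; the use of a plain `*` to stand for the observer's modified asterisk and the flag names are hypothetical.

```python
# Lower limit of the humidity tables used at York Factory (per the text).
HUMIDITY_TABLE_LIMIT_F = -20.0


def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def humidity_flag(dry_bulb_f, raw_humidity):
    """Return a flag for humidity entries that cannot be trusted at York Factory.

    Flag names and the '*' marker are hypothetical stand-ins for the
    observer's modified-asterisk symbol and the project's Table 9 flags.
    """
    if "*" in raw_humidity:
        return "observer_symbol"
    if dry_bulb_f < HUMIDITY_TABLE_LIMIT_F:
        return "below_table_limit"
    return None
```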

Table 9 Out of range flags.


There is the potential for bias in values such as cloud cover, wind or precipitation, because observers did not always record “zero” when there was no cloud, wind or precipitation but instead left the entry blank. This makes it difficult to distinguish observations that were not made from times when the phenomenon (wind, cloud or precipitation) was absent. Users should keep in mind this occasional ambiguity between “no phenomenon to record” and “no observation made”. We advise caution: users should consider the observation as a whole, for example whether other variables were recorded at the same observation time, and whether those values or the weather remarks support an interpretation of “no phenomenon observed”.
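One way a user might apply this advice is sketched below. It is a hypothetical heuristic, not part of the published dataset: a blank field is only labelled "possibly none observed" when other fields were filled in at the same observation time, and the final interpretation is still left to the user.

```python
def _present(value):
    return value is not None and str(value).strip() != ""


def interpret_blank(observation, field):
    """Classify one field of an observation (a dict of field -> raw entry).

    Returns "recorded", "possibly_none_observed" (blank, but other fields were
    filled in at the same time), or "not_observed" (the whole observation is blank).
    """
    if _present(observation.get(field)):
        return "recorded"
    others_present = any(
        _present(v) for k, v in observation.items() if k != field
    )
    return "possibly_none_observed" if others_present else "not_observed"
```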

Range checks

Once the transcribed values were in a form appropriate for the variable they represented (float, integer, text, etc.), the following value checks could be performed. Given the wide range and variability of Canadian weather and climate, at this validation stage the checks only target values at the extreme edges of the expected range for each climate zone. Values that are not determined to be errors are flagged as “out-of-range” and left in the database. The range checks were performed on the original values, before transformation into SI units. The ranges and flags for numerical values are listed in Table 9.

Values which are noted as out of range are checked to ensure there are no transcription errors such as a misplaced decimal.
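The range check and the follow-up search for a misplaced decimal can be sketched together. The limits below are illustrative values in original units, not the ranges in Table 9; out-of-range values are kept and flagged, and decimal-shifted alternatives are only offered as suggestions for a person to confirm.

```python
# Illustrative limits in original units; Table 9 gives the ranges actually used.
RANGES = {
    "pressure_inHg": (27.0, 31.5),
    "temperature_F": (-70.0, 110.0),
}


def range_check(variable, value):
    """Return (value, flag, suggestions). Out-of-range values are kept, not deleted;
    suggestions are decimal-shifted alternatives that fall inside the range."""
    low, high = RANGES[variable]
    if low <= value <= high:
        return value, None, []
    shifted = (value * 10, value / 10)
    suggestions = [round(v, 3) for v in shifted if low <= v <= high]
    return value, "out_of_range", suggestions
```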

Some of these concerns, such as multiple entries in the original handwritten records, original entries containing question marks, asterisks or other non-standard characters, and out-of-range values such as “11” for scales with a maximum value of ten, are conserved in the original database and are only modified in post-processed files. On the other hand, when an observer made an error that is easily verified and corrected, such as writing “39.91” rather than “29.91” for a barometer entry when the sequence of barometric pressure shows a clearly falling trend from 30 inches to 29 inches, the values are updated directly in the database. A data audit feature in the app tracks all changes made to the data post-transcription.
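The “39.91 for 29.91” case can be detected by comparing a reading with its neighbours. The sketch below only proposes a candidate; in the project such corrections are made by a person and tracked by the app's audit feature. The 0.5 inHg tolerance is an assumption chosen for illustration.

```python
def fix_leading_digit(series, i, tolerance=0.5):
    """Suggest a correction for series[i] when shifting it by +/-10 inches
    (a miswritten leading digit, e.g. 39.91 for 29.91) puts it within
    `tolerance` of both neighbouring readings. Returns None otherwise.
    `i` must be an interior index."""
    prev, cur, nxt = series[i - 1], series[i], series[i + 1]
    for delta in (-10, 10):
        candidate = round(cur + delta, 3)
        if abs(candidate - prev) <= tolerance and abs(candidate - nxt) <= tolerance:
            return candidate
    return None
```

Requiring agreement with both neighbours avoids "correcting" a reading that sits next to another error.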

The final action in step 2 is the transformation to SI units and the production of the SEF files.
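The unit transformation can be sketched as a lookup of converters. This covers only unit conversion (inches of mercury to hectopascals, the usual meteorological pressure unit; °F to °C; inches of precipitation to millimetres); whether barometer readings also receive temperature or gravity corrections is not shown here. The unit keys are hypothetical labels.

```python
MISSING = -999
INHG_TO_HPA = 33.8639  # hPa per inch of mercury (conventional value at 0 °C)
INCH_TO_MM = 25.4

CONVERTERS = {
    "inHg": lambda v: v * INHG_TO_HPA,       # pressure -> hPa
    "F": lambda v: (v - 32.0) * 5.0 / 9.0,   # temperature -> °C
    "in": lambda v: v * INCH_TO_MM,          # precipitation -> mm
}


def to_si(units, value):
    """Convert a value from its original units; the -999 sentinel passes through."""
    if value == MISSING:
        return MISSING
    return CONVERTERS[units](value)
```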

Step 3

The third validation check is to verify the data produced in the SEF files. Values are verified by station in groups of similar variables: for example, temperature files are verified together, pressure files are verified together, and so on. Large deviations are investigated for transcription errors such as transcribing 37 instead of 73. Because all the observations were transcribed, the relationships between variables can be used for verification: for example, the open-air temperature and the dry bulb temperature should be very similar; the recorded mean daily humidity should be arithmetically close to the mean of the individual humidity observations; corrected pressure should usually be greater than station pressure; and so on. Any large deviations from these expected relationships can be quickly investigated.
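The inter-variable comparisons listed above can be sketched as a function returning messages for a reviewer. The dictionary keys and tolerances are hypothetical, and "corrected pressure" is assumed here to mean pressure reduced to sea level.

```python
from statistics import mean


def consistency_issues(day, temp_tol=2.0, rh_tol=5.0):
    """Return messages for inter-variable checks on one day of observations.

    `day` holds parallel lists per observation time ("open_air", "dry_bulb",
    "station_p", "msl_p"), the individual relative humidities "rh", and the
    observer's recorded daily mean "rh_mean". Keys and tolerances are illustrative.
    """
    issues = []
    for t, (air, dry) in enumerate(zip(day["open_air"], day["dry_bulb"])):
        if abs(air - dry) > temp_tol:
            issues.append(f"obs {t}: open-air {air} vs dry bulb {dry}")
    computed = mean(day["rh"])
    if abs(day["rh_mean"] - computed) > rh_tol:
        issues.append(f"recorded daily mean RH {day['rh_mean']} vs computed {computed:.1f}")
    for t, (stn, msl) in enumerate(zip(day["station_p"], day["msl_p"])):
        if msl < stn:
            issues.append(f"obs {t}: sea-level pressure {msl} below station pressure {stn}")
    return issues
```

Because the checks only emit messages, a failing comparison prompts an inspection of the image rather than an automatic change.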

Other transcription issues include the inclusion of additional observations, observations that were not recorded in the specified column, and systematic non-standard recording of observations. These required amending the transcription platform and establishing new protocols to capture the meteorological information in a systematic and complete manner. One example was the recording of minimum temperatures in the force of vapour fields. Another was the systematic recording of precipitation events in the cloud column by several observers, who noted the cloud cover as 10/10. While these were not transcription errors, such deviations from standard practice by the historical observers led to a modified protocol: the precipitation and weather notes were added to a newly created “Remarks” field, the type of precipitation was entered in the precipitation type field, and the cloud type was marked as “overcast”. This necessitated substantial re-transcription of numerous pages. Overall, approximately 5400 annotations (field groups) in the database were changed during validation, giving a validation change rate of 2.4%.

Final dataset

Figure 9a shows the overall number of observations in each weather category for the dataset.

As can be seen in Fig. 9a, the observations that can be judged by eye, without the need for an instrument, are the most common: wind direction and force, and cloud type, direction of movement and velocity, form a large part of the dataset. Thermometer observations are the next most common, as the thermometer is a reliable and robust instrument. Barometric pressure observations are nearly as plentiful as thermometer observations, while humidity measurements, mainly derived from dry and wet bulb thermometer observations, are also widely reported. Precipitation observations are less common, partly because wind, cloud (or their absence) and temperature can be observed continually, whereas precipitation can only be observed when it occurs, which at most stations is typically about 10 days per month. Even so, surprisingly few observers kept regular quantitative measurements of precipitation. This may be due to the prevalence of snowpack across the country and the relative difficulty of measuring snow accurately. Weather remarks were not kept regularly by all observers; some provided detailed accounts of the weather and others none.

Observations become sparser to the west and north, as can be seen in Fig. 9b. A large proportion of the observations are from Newfoundland, Labrador and the Maritimes, the provinces bordering the Atlantic Ocean. Thanks to the efforts of observers connected to the Hudson's Bay Company, there are also observations from the central and north-central parts of the country. There are fewer observations from the Northwest and Pacific Ocean regions.

Four separate measurements of daily temperature are shown in Fig. 10. The internal coherence of the data as a whole, and the use of inter-variable comparisons as data verification, are both expressed in this figure. The lowest thermometer readings are, as expected, the grass minimum thermometers (Fig. 10, green dots), as this measurement is designed to capture the outgoing longwave radiation emitted from the ground surface. The minimum air thermometer readings (blue) are the next lowest. Maximum air temperatures (red) are the second highest, with the blackbulb thermometer in the sun readings (yellow), designed to indicate the amount of incoming shortwave radiation, showing the highest values. If any value is unexpected relative to any others (for example, if the maximum air temperature is less than the minimum, or the blackbulb less than the air maximum), it can be investigated for errors.
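The ordering in Fig. 10 translates directly into an automated check. In this sketch the key names are hypothetical; only adjacent pairs among the readings present on a given day are compared, which is enough because the expected ordering is transitive.

```python
# Expected ordering from lowest to highest (Fig. 10).
ORDER = ("grass_min", "air_min", "air_max", "black_bulb")


def ordering_violations(day):
    """Return (lower, higher) pairs of readings that appear in the wrong order.
    Missing readings (absent or None) are skipped."""
    present = [(k, day[k]) for k in ORDER if day.get(k) is not None]
    return [
        (a, b)
        for (a, va), (b, vb) in zip(present, present[1:])
        if va > vb
    ]
```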

We can further examine specific weather revealed in this dataset by looking at cases of extreme temperature events: the heatwaves of July 1857 and the cold spell of January 1859 (Fig. 11).

At least two distinct heatwave events occurred in Canada in July 1857. The first, from July 6 to July 15, is shown in Fig. 11a. High temperatures first occurred in the Red River Settlement (Winnipeg) to the west over July 6 to July 8, while temperatures remained cool in the east. Warmer conditions started to develop over central Canada on July 10, while the west started to cool on July 11. Conditions continued warm in southern Ontario and Quebec until July 15. A second heatwave (not shown) again spread across the country from July 22 to July 28, this time with temperatures reaching the high 20s to low 30s °C in the Atlantic provinces.

Eighteen months later, in January 1859, some of the lowest temperatures ever recorded were experienced in southern Canada. Once again starting in the west, temperatures were below −33 °C at the Red River Settlement (Winnipeg) on January 9. Temperatures below −34 °C were recorded in Kingston on the 10th, −34 °C in Montreal on the 10th and 11th, and −39 °C in Quebec City and −38 °C in Stanbridge on the 10th. By January 10th Red River had warmed to −17 °C, and to −10 °C on the 11th, as the cold wave swept eastward. The cold also moderated as it reached the Atlantic provinces, with a minimum of only −21 °C in Halifax on January 12 and −15 °C in St. John's on January 13.

Usage Notes

The data here has not been homogenized or corrected from the original values beyond that described in the validation section.

The SEF conventions require times to be converted to Coordinated Universal Time (UTC). Given the uncertainties in the longitude estimates in many of the historical documents, we have used modern time zone approximations rather than the recorded longitudes for time conversions. For fields such as “time precipitation began” and “time precipitation ended”, observers sometimes wrote vague indicators such as “afternoon”, “since this morning” or “overnight” rather than precise times. Precipitation totals were also sometimes measured over events lasting several days, so some precipitation amounts are higher than expected for single-day measurements.
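A minimal sketch of the time conversion, assuming fixed modern standard-time offsets with no daylight saving (which did not exist in this period). The station-to-offset mapping is hypothetical. Fixed offsets are used deliberately: IANA time-zone rules for the 1850s generally encode local mean time rather than the modern zone offset, so a named zone would not give a "modern time zone approximation".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical station -> modern standard-time offset in hours (no DST).
STANDARD_OFFSETS = {
    "Halifax": -4.0,          # Atlantic
    "St. John's": -3.5,       # Newfoundland
    "Kingston": -5.0,         # Eastern
    "Red River": -6.0,        # Central
    "New Westminster": -8.0,  # Pacific
}


def local_to_utc(station, local):
    """Attach the station's fixed standard offset to a naive local time
    and convert it to UTC."""
    tz = timezone(timedelta(hours=STANDARD_OFFSETS[station]))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)
```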

Observers also regularly noted problems with instruments due to external factors. At York Factory, humidity could occasionally not be recorded because temperatures were too low for readings to be taken. The rain gauge on the campus of Acadia College at Wolfville was reported broken on several occasions.

The instruments used to measure humidity are of unknown quality and work remains to be done to investigate historical humidity measurements.

Other known issues include pressure values for Winnipeg that are too low to be credible from January to July 1869. Wolfville pressure is also lower between September 1855 and March 1856 than for the remainder of the record. The relative humidity values for Wolfville between January and May 1856 are also suspect: most of them are too high if taken at face value but too low if divided by 1000, assuming the observers omitted the decimal point. The mean sea level pressure values for Mount Forest are sometimes out of range, being higher than could reasonably be expected. The pressure values for Halifax Royal Engineers (RE) suggest two different sites: one from August 1852 to December 1856, and another from September 1858 to March 1862. Pressure values for Halifax Dockyards (DY) are higher than physically possible for July and August 1860. These values have been removed from the archived dataset version 1.0 pending further investigation. The original values are available from the authors.

Code availability

The code used to run the transcription website is archived at https://github.com/open-data-rescue/climate-data-rescue.

The code used to perform data validation checks and transform the values from the database of transcribed values to SI units and SEF standards is archived on GitHub at https://github.com/open-data-rescue/ODR-weather-data-files/tree/main/Canadian_stations/programs. The main source code is sef_generator_global.py. It has two execution modes: one runs a single station using a json file with the station particulars; the other reads all the relevant metadata from the database and runs all the stations in a loop. The code is designed so that parameters such as the UTC offset or the wind and cloud SI conversions can be switched on or off, as full compliance with these standards can make validation more difficult at different stages of data production. The json files and metadata tables also have parameters, adjustable by variable, to account for non-standard observation times or changes in observing practice in the original observations.
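The two execution modes and the on/off switches can be pictured with the hypothetical skeleton below. It does not reproduce sef_generator_global.py; the `StationConfig` fields and function names are invented for illustration, showing only how a per-station JSON record could carry a switch such as the UTC conversion and how one processing function could serve both single-station and loop modes.

```python
import json
from dataclasses import dataclass


@dataclass
class StationConfig:
    # Hypothetical fields; the real station json files may be organised differently.
    station_id: str
    utc_offset_hours: float
    apply_utc: bool = True
    convert_wind_cloud: bool = True


def output_hour(cfg, local_hour):
    """Report hours in UTC only when the UTC switch is on (wraps past midnight).
    A full implementation would also carry the date forward."""
    if not cfg.apply_utc:
        return local_hour
    return (local_hour - cfg.utc_offset_hours) % 24


def run_all(configs, process):
    """Loop mode: apply one processing function to every station's config.
    Single-station mode is the same call with a one-element list."""
    return [process(cfg) for cfg in configs]
```

Keeping the switch in the configuration lets validation run in local time, where observation times match the register pages, before the UTC conversion is enabled for the final files.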

The code is updated continually as further non-standard or observer-specific practices are discovered in the historical observations.

Code to read the data files can be found at https://github.com/open-data-rescue/ODR-weather-data-files/tree/main/Canadian_stations/programs.
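For readers who prefer to write their own loader, the sketch below parses a SEF file under the assumption that it follows the SEF 1.0 layout as we understand it: tab-separated key/value header lines (SEF, ID, Name, Lat, Lon, Alt, Source, Link, Vbl, Stat, Units, Meta) followed by a column row beginning with `Year`. Treating `NA`, empty and −999 values as missing is also an assumption; the reader in the repository linked above remains the reference.

```python
import csv


def read_sef(text):
    """Parse SEF text into (header dict, list of row dicts).
    Missing values ("NA", "", "-999") become None; others become floats."""
    lines = text.splitlines()
    header = {}
    for n, line in enumerate(lines):
        fields = line.split("\t")
        if fields[0] == "Year":
            break
        header[fields[0]] = fields[1] if len(fields) > 1 else ""
    else:
        raise ValueError("no 'Year' column header found")
    rows = []
    for rec in csv.DictReader(lines[n:], delimiter="\t"):
        value = rec["Value"]
        rec["Value"] = None if value in ("NA", "", "-999") else float(value)
        rows.append(rec)
    return header, rows
```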

References

Brönnimann, S. et al. Unlocking Pre-1850 Instrumental Meteorological Records: A Global Inventory. BAMS 100, ES389–ES413, https://doi.org/10.1175/BAMS-D-19-0040.1 (2019).
Allan, R. et al. Toward Integrated Historical Climate Research: The Example of Atmospheric Circulation Reconstructions over the Earth: Toward Integrated Historical Climate Research. Wiley Interdisciplinary Reviews: Climate Change 7, 164–74, https://doi.org/10.1002/wcc.379 (2016).
Camuffo, D. et al. 500-Year Temperature Reconstruction in the Mediterranean Basin by Means of Documentary Data and Instrumental Observations. Climatic Change 101, 169–199, https://doi.org/10.1007/s10584-010-9815-8 (2010).
Murphy, C. et al. A 305-year continuous monthly rainfall series for the island of Ireland (1711–2016). Clim. Past 14, 413–440 (2018).
Vincent, L. A. & Mekis, É. Changes in Daily and Extreme Temperature and Precipitation Indices for Canada over the Twentieth Century. Atmosphere-Ocean 44, 177–93 (2006).
Ashcroft, L., Karoly, D. J. & Dowdy, A. J. Historical extreme rainfall events in southeastern Australia. Weather and Climate Extremes 25, 100210 (2019).
Ribes, A., Qasmi, S. & Gillett, N. P. Making Climate Projections Conditional on Historical Observations. Science Advances 7, eabc0671, https://doi.org/10.1126/sciadv.abc0671 (2021).
Thorne, P. W. et al. Guiding the Creation of a Comprehensive Surface Temperature Resource for Twenty-First-Century Climate Science. BAMS 92, ES40 (2011).
Cram, T. A. et al. The international surface pressure databank version 2. Geosci. Data J. 2, 31–46 (2015).
Compo, G. P. et al. The International Surface Pressure Databank version 4. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. http://rda.ucar.edu/datasets/ds132.2/ (2019).
Slivinski, L. C. et al. Towards a more reliable historical reanalysis: Improvements for version 3 of the Twentieth Century Reanalysis system. Quart. J. Roy. Met. Soc. 145, 2876–2908 (2019).
Brugnara, Y. et al. A collection of sub-daily pressure and temperature observations for the early instrumental period with a focus on the “year without a summer” 1816. Clim. Past 11, 1027–1047 (2015).
Ciarlante, M. & Reytar, A. Climatological Observations made outside the United States, 1821–1892. M1958. National Archives and Records Administration (2005).
Dupigny-Giroux, L. A. et al. NOAA’s Climate Database Modernization Program: Rescuing, Archiving, and Digitizing history. BAMS 88, 1015–1017 (2007).
Hopkins, E. J. & Moran, J. M. in Historical Climate Variability and Impacts in North America (eds Dupigny-Giroux, L.-A. & Mock, C. J.) Ch. 11 (Springer, 2009).
Slonosky, V. C. et al. From Books to Bytes: A New Data Rescue Tool. Geosci. Data J. 6, 58–73, https://doi.org/10.1002/gdj3.62 (2019).
Sieber, R. & Slonosky, V. C. Developing a Flexible Platform for Crowdsourcing Historical Weather Records. Historical Methods 52, 164–77, https://doi.org/10.1080/01615440.2018.1558138 (2019).
Brunet, M. et al. Best Practice Guidelines for Climate Data and Metadata Formatting, Quality Control and Submission of the Copernicus Climate Change Service Data Rescue Service https://doi.org/10.24381/kctk-8j22 (2020).
Noone, S. et al. Progress towards a holistic land and marine surface meteorological database and a call for additional contributions. Geosci. Data J. 103–120, https://doi.org/10.1002/gdj3.109 (2021).
Lundstad, E. et al. The global historical climate database HCLIM. Sci. Data 10, 44, https://doi.org/10.1038/s41597-022-01919-w (2023).
National Centers for Environmental Information. NOAA Archive https://www.ncei.noaa.gov/archive (2025).
Fleming, J. R. Meteorology in America, 1800–1870 (Johns Hopkins University Press 1990).
Belton, T. From Meteorological Registers to Climate Data: Information Gathering in the Early Years of the Meteorological Service of Canada. Archivaria 84, 127–149 (2017).
Silverman, S. M. Joseph Henry and John Henry Lefroy: A Common 19th Century Vision of Auroral Research. Eos, Transactions American Geophysical Union 70, 227–40, https://doi.org/10.1029/89EO00118 (1989).
Sieber, R., Slonosky, V., Ashcroft, L. & Pudmenzky, C. Formalizing Trust in Historical Weather Data. Weather, Climate, and Society 14, 993–1007, https://doi.org/10.1175/WCAS-D-21-0077.1 (2022).
Guyot, A. Directions for Meteorological Observations, Intended for the First Class of Observers. Smithsonian Institution (1850).
United Kingdom Meteorological Office. Meteorological Observations at the Foreign and Colonial Stations of the Royal Engineers and the Army Medical Department, 1852-1886 - M.O. 83. MET/2/1/3/57. Exeter: United Kingdom Met Office National Meteorological Library and Archive (1890).
Robertson, S. Private Weather Diary for Thunder Bay, Lake Superior, Canada. MET/2/1/2/3/534. Exeter: United Kingdom Met Office National Meteorological Library and Archive (1870).
United Kingdom Meteorological Office. Climatological Returns for Newfoundland, Canada, North America 1852–1870 (DCnn: 9NFL). Exeter: United Kingdom Met Office National Meteorological Library and Archive (1871).
United Kingdom Meteorological Office. Climatological Returns for Halifax, Nova Scotia, Canada, North America 1852–1875 (DCnn: 9HFX). Exeter: United Kingdom Met Office National Meteorological Library and Archive (1876).
United Kingdom Meteorological Office. Climatological Returns for Halifax, Citadel Hill, Canada, North America 1854–1865. Exeter: United Kingdom Met Office National Meteorological Library and Archive (1866).
United Kingdom Meteorological Office. Climatological Returns for Quebec, Canada, North America (DCnn: 9QBC), Exeter: United Kingdom Met Office National Meteorological Library and Archive (1871).
United Kingdom Meteorological Office. Climatological Returns for Kingston, Canada, North America (DCnn: 9KIS). Exeter: United Kingdom Met Office National Meteorological Library and Archive (1862).
United Kingdom Meteorological Office. Climatological Returns for Manitoba, St John's College, Canada, North America 1873–1879. (DCnn: 9MAT). United Kingdom Met Office National Meteorological Library and Archive (1880).
United Kingdom Meteorological Office. Climatological Returns for New Westminster - British Columbia, Canada, North America (DCnn: 9NEW). ARCHIVE Z18.K1-Z17.C3. Exeter: Meteorological Office Archives (1866).
Royal Engineers. Meteorological Observations for Kingston. Canadian Journal 1855 (1855).
Royal Engineers and the Army Medical Department. Meteorological Observations at the Foreign and Colonial Stations, 1852–1886. 83 (1890).
James, W. On Meteorological Observations. Papers on Subjects Connected with the Duties of the Corps of Royal Engineers (new series) 4, 75–77 (1855).
Open Data Rescue. Open-data-rescue/climate-data-rescue. https://github.com/open-data-rescue/climate-data-rescue (2025).
Guyot, A. Directions for Meteorological Observations, Intended for the First Class of Observers. Washington DC: Smithsonian Institution, 1850.
ICOADS. International Comprehensive Ocean-Atmosphere Data Set. U.S. National Oceanic and Atmospheric Administration, https://icoads.noaa.gov/ (2025).
World Meteorological Organization. International Cloud Atlas. https://cloudatlas.wmo.int/en/home.html (2017).
Slonosky, V., Black, R. & Podolsky, L. AIR TEMPERATURE, Surface pressure, and others collected from FIXED STATIONS OF CANADA in Canada from 1768-09-11 to 1884-02-29 (NCEI Accession 0304217). NOAA National Centers for Environmental Information. Dataset. https://doi.org/10.25921/g637-9093 (2025).
WMO (World Meteorological Organization) WIGOS Metadata Standard. WMO-No.1192. Geneva, 51pp. ISBN 978-92-63-11192-0 (2019).
Valente, M. A. et al. Guidelines for Inventory Metadata Standards and Formats. Copernicus Climate Change Services, C3S_DC3S311a_Lot1.2.1_2018_Guidelines for inventory metadata_v1.docx (2018).
Monitoring and Data Services Directorate. MANOBS Manual of Surface Weather Observations. Environment and Climate Change Canada Meteorological Service of Canada, https://www.canada.ca/content/dam/eccc/migration/main/manobs/73bc3152-e142-4aee-ac7d-cf30daff9f70/manobs_7e-a19_eng_web.pdf (2019).


Acknowledgements

The authors would like to acknowledge Rob Smith, who wrote the original code for the transcription platform for the DRAW project (https://citsci.geog.mcgill.ca/). The authors also thank Aurora Feletti, Lucy Wilkins, Thompson Yu, Jennifer Dowker, Sam Heinrichs, Brittany Brammer, Andriana Giarliris, Haven Poole, Jean-Paul Hacot, Anthi Tsobou, Fintan Neylan, Ateeque Siddique, Fintan Neyland, Willem Norland, Lucas Ferrazza, Ruoqian Liu, Alison Thiel, Brittany Nolan, and Oscar Hahnel for their dedication and accuracy. We would also like to thank Jason Cooper from NOAA for his help in locating the Canadian weather records from the NCEI data centres and Mark Beswick from the UK Met Office Library and Archives for providing scanned images of the records from the UKMO archives. Hervé Hacot, Henry Balen, David McIver, Antoine Rehberg and Christopher Holland provided technical support and advice. Two anonymous reviewers provided excellent comments which greatly improved this paper. This work was funded by ECCC grants/contracts 3000705251 and 3000725269.

Author information

Authors and Affiliations

Open Data Rescue, Saint-Lambert, Canada
Victoria Slonosky & Rachel Black
Affiliate Researcher, Tomlinson Lab, McGill University, Montreal, Canada
Victoria Slonosky
Nova Scotia Provincial Archives, Halifax, Canada
Rachel Black
Texas A&M Corpus Christi, Corpus Christi, USA
Lori Podolsky
Climate Research Division, Environment and Climate Change Canada, Toronto, Canada
Xiaolan Wang & Vincent Cheng
Environment and Climate Change Canada, Suite 200, 2474 Arbutus Rd, Victoria, BC, V8N 1V8, Canada
Xiaolan Wang


Contributions

Victoria Slonosky organized the project, designed the data transformation and validation concepts and the register and page types, oversaw the transcription work and wrote the paper. Lori Podolsky designed the register and page types and the metadata schema, oversaw the transcription work and validated data transcriptions. Rachel Black performed image quality control and oversaw the transcription work, data validation and project management. Xiaolan Wang conceived the idea for this data descriptor paper, wrote the statement of work for and served as the scientific authority of contract 3000705251, and reviewed earlier versions of this paper. Vincent Cheng served as the scientific authority of contract 3000725269, reviewed earlier versions of this paper, and is responsible for publishing this dataset in the Canadian Government Open Data Portal.

Corresponding author

Correspondence to Victoria Slonosky.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Slonosky, V., Black, R., Podolsky, L. et al. Transcribing historical Canadian weather data. Sci Data 13, 678 (2026). https://doi.org/10.1038/s41597-025-06036-y


Received: 28 May 2025
Accepted: 24 September 2025
Published: 29 April 2026
Version of record: 29 April 2026
DOI: https://doi.org/10.1038/s41597-025-06036-y
