Abstract
We present the Manually Annotated GONG Filaments in H-alpha Observations (MAGFiLO v1.0) dataset. This dataset contains 10,244 annotated filaments from 1,593 observations captured by the Global Oscillation Network Group (GONG), spanning the years 2011 through 2022. Each annotation details one filament’s segmentation, minimum bounding box, spine, and magnetic field chirality. With a total of over one thousand person-hours of annotation, and a double-blind review process, we ensured high-quality ground-truth data. Our inter-annotator agreement reaches a Kappa score of 0.66. We also verified that the hemispheric preference of filaments as annotated in MAGFiLO aligns with the findings from similar datasets of much smaller sample sizes. MAGFiLO is the first dataset of its size, enabling advanced deep learning models to identify filaments and their features with unprecedented precision. It also provides a testbed for solar physicists interested in large-scale analysis of filaments. In this report, we document the details of the annotation and the post-processing phases that were applied.
Background & Summary
Chromospheric filaments are dense “clouds” of solar material suspended by magnetic field lines above photospheric neutral lines. They form above magnetic Polarity Inversion Lines (PILs), imaginary field-free lines that separate the positive and negative polarities of the Sun’s magnetic flux. The chirality of a filament refers to the handedness or orientation of the magnetic field lines that make up the filament. Active region filaments, which are smaller and shorter lived, develop above a PIL situated between two magnetic polarities forming the same active region (type A) or between two close active regions (type B). Quiescent filaments, which are larger and longer lived, are associated with weaker magnetic fields outside active regions. Polar crown filaments are a special class of quiescent filaments that form above the PIL separating polar magnetic fields from the magnetic field of decaying active regions, which gradually moves towards the solar poles. Thus, polar crown filaments serve as a visual trace of polar field reversal, one of the key aspects of the solar magnetic cycle. For a review of the observational properties of filaments and prominences, the reader is referred to Engvold1. Filaments appear in the solar corona, visible in absorption in the H-alpha line on the disk and in emission off the solar limb (where they are historically called prominences). We refer to such observations with H-alpha filters as H-alpha images.
The filament material can accumulate near the top of a magnetic arcade (sheared arcade model) or in the lower part of a nearly horizontal twisted flux tube (flux tube model)2. In the latter case, one can visualize the magnetic field lines in a filament as a slinky, but one with only a few windings. Note that the main component of a filament’s magnetic field before eruption is always along the axis. In the sheared arcade model, a flux rope forms later, prior to filament eruption, via magnetic reconnection that joins separate arcade field lines into a coherent slinky-like rope. Relatively cool plasma (≈10⁴ K) can accumulate in the troughs of the magnetic field lines, where it remains suspended against gravity because its ionized plasma is “frozen in” with the magnetic field3. At this temperature, this “cool” material absorbs the H-alpha line emission from the underlying photosphere, so that the filament appears dark against the disk. When viewed against the dark sky (off limb), the same filament that appears as a dark feature on the disk appears as a bright emission feature called a prominence. The surrounding solar corona has a temperature of ≈10⁶ K or above.
Filaments matter for space weather because they are closely associated with solar eruptive events (Coronal Mass Ejections, CMEs4), solar flares5, and Solar Energetic Particle (SEP6) storms. An Earth-directed CME can cause enormous damage to our electric power supply grid, offset the GPS system, create radiation hazards for passengers and crew on polar flights, and can be lethal for astronauts as they travel outside the protective bubble provided by the Earth’s magnetosphere. A 2008 report by the Space Studies Board of the National Research Council concludes that a solar superstorm, similar to the once-in-a-century 1859 Carrington event7,8, could cripple the entire US power grid for months and lead to economic damage of $1-2 trillion9. Large CMEs and high-intensity solar flares are almost always associated with the eruption of chromospheric filaments. The severity of a geomagnetic storm caused by a CME is determined by the orientation of the magnetic field in the CME. In general, if the magnetic field in the leading part of the CME is directed opposite to the magnetic field of Earth’s magnetosphere, a stronger geomagnetic storm is expected. The orientation of the filament body on the Sun, combined with its chirality, largely defines the orientation of the magnetic field in a CME when it arrives at Earth’s orbit. For this reason, a global network (GONG) monitors the solar disk for filaments 24/7 to catch eruptions as they happen. From there, it takes a CME one to several days to reach the Earth and have its full impact.
Since our dataset revolves around filaments’ chirality, let us explain this concept. The direction of the axial field of filaments on the Sun can be determined from full-disk H-alpha observations and magnetograms, both of which are acquired 24/7 through the Global Oscillation Network Group (GONG10,11). The orientation of a filament determines the direction of its axial field except for its sign (the magnetic field vector can point in either of two directions along the axis). Martin, in a series of groundbreaking papers in the 1990s, reviewed in Martin12, introduced a method for determining the sign of the axial magnetic field in filaments from H-alpha observations and magnetograms. H-alpha observations were introduced above. Magnetograms are line-of-sight magnetic field maps of the solar photosphere, derived by measuring line-splitting through the Zeeman effect in photospheric spectral lines. Filaments always overlie PILs, the lines that form the boundary between regions of positive and negative magnetic polarities, where the line-of-sight magnetic field is zero. Now imagine standing on the photospheric surface in a positive magnetic polarity looking at a PIL with a filament suspended above it. If the axial magnetic field of the filament points to the right as observed from that vantage point, the filament is said to have Dextral (or right-hand) chirality; if the axial field points to the left, it has Sinistral (or left-hand) chirality. While the chirality of a filament can be determined from H-alpha images alone with Martin’s method, determining the direction of the axial magnetic field (which our dataset can help identify) requires a simultaneous magnetogram, as is evident from the definition above: one needs to know on which side of the PIL the positive magnetic flux lies.
With respect to statistical studies, the community has largely relied on the Advanced Automatic Filament Detection and Characterization Code (hereafter, AAFDCC), which automatically detects filaments and identifies their geometrical properties13. Compared to its predecessors14,15,16,17, it was the first to go beyond the localization of filaments (not counting earlier manual surveys18). It determines filaments’ chain code outlining their shapes, spines, and barbs (see the definition of these features in Definitions of Filaments’ Labels and Features). Based on the obtained information and inspired by the theory of filament formation by Martin12, the algorithm identifies filaments’ magnetic chirality. In March 2010, AAFDCC was deployed for daily reporting of the detected filaments and their characteristics on the H-alpha images captured by the Big Bear Solar Observatory (BBSO)19,20. The module’s outputs were then posted (twice a day) to the Heliophysics Event Knowledgebase (HEK)21 to feed analytical experiments on filaments, their characteristics, and their impacts. Unfortunately, since mid-2017, AAFDCC has no longer been reporting to HEK and no alternatives have been provided. Although, unlike our effort, AAFDCC detects filaments automatically, our manual annotation is largely inspired by its strengths and weaknesses22.
Several other studies were carried out to develop automatic filament detection. To list just a few, Shih et al.15 utilized thresholding and morphological operations for detection of filaments from H-alpha observations15. Qu et al.16 similarly stacked layers of image-processing algorithms for detection of filaments, and used the Support Vector Machine (SVM) algorithm for distinguishing filaments from sunspots16. Scholl et al.23 detected filaments and coronal holes from Extreme Ultraviolet (EUV) observations and magnetograms23. They used histogram-based and edge-detection algorithms. Joshi et al.24 also developed an algorithm for automatic detection of filaments from H-alpha observations using classical image-processing operations24. Yuan et al. (2011) identified filaments using similar operations, but they further converted each filament to a graph representing its skeleton, to be used for identification of its chirality25. Karachik et al.26 applied thresholding on both H-alpha observations and magnetograms for detection of filaments26. Pötzi et al.27 developed near real-time filament detection using feature extraction for an image-segmentation algorithm, tailored for H-alpha observations captured by Kanzelhöhe Observatory27. Ahmadzadeh et al. (2018) used Mask-RCNN, a state-of-the-art object detection algorithm, for detection of filaments from the observations made by the Big Bear Solar Observatory. Most recently, Lin et al. (2023) created an impressively large collection of detected filaments from 100 years of observations28, with a focus on spatiotemporal parameters of filaments across multiple solar cycles, such as latitude, longitude, tilt angle, etc.
Methods
Our Data Source of Solar Filaments
We distinguish between a data source, a data pool, and a data feed pipeline. This distinction is illustrated in Fig. 1. A data source is where raw (or minimally processed) data are stored following a set of data collection strategies. We consider the H-alpha image archive of GONG as our data source. We provide more information about GONG shortly. A data pool contains data that are sampled from a data source following a set of sampling strategies. Data in our data pool are preprocessed and ready for manual annotation although they may not all be used. Our data feed pipeline contains all data that are to be distributed among annotators, following a set of data distribution strategies. Unlike the other two, a data feed pipeline—as the name suggests—is essentially a dynamic process following a set of data distribution strategies. Details of these strategies are given in relevant sections.
The Global Oscillation Network Group (GONG)10 is a worldwide network of identical robotic telescopes strategically placed at six locations around the world29. The six locations are the Big Bear Solar Observatory (CA, USA), Mauna Loa Observatory (HI, USA), Learmonth Solar Observatory (Australia), Udaipur Solar Observatory (India), El Teide Observatory (Canary Islands, Spain), and Cerro Tololo Inter-American Observatory (Chile). These sites were identified to form a network of instruments which together optimally reduce the observed diurnal cycle and mitigate the impact of local weather. Towards achieving continuous observations, this network allows a 24/7 observation of the Sun year round30,31. Reported in 2021, the GONG network of instruments achieved the mean duty cycle of 93%, which indicates the (average) fraction of the 24-hour period in which observations are available. This continuity has been maintained over the last 18 years of the GONG’s 25-year ongoing operation32. H-alpha instruments were added to all GONG stations in 2010. At each GONG site, H-alpha observations are taken at 60 second cadence, with the acquisition time for the adjacent stations intentionally shifted by ± 30 seconds, effectively enabling a 30-second cadence for the network observations. The data are processed on each GONG site, transferred to the Data Center in Boulder, Colorado, and made public within one minute after their acquisition. Additional details about GONG instrumentation for taking H-alpha observations can be found in Diercke et al.33. We use the GONG’s archive of 2k × 2k H-alpha observations of the Sun, in FITS format (short for Flexible Image Transport System), as our data source.
Data Preparation for Annotation
To prepare the GONG H-alpha observations for manual annotation, we took a few steps the details of which are discussed in this section. An overview of those steps is as follows: first, we sampled observations in which a significant presence of filaments is identified. Then, we converted the observations, originally in FITS format, to JPEG format. Lastly, we identified anomalous observations and replaced them with normal observations. At this stage, the images were ready to be distributed among the annotators.
Data Sampling
Compared to the observation cadence, filaments are not prevalent events. Therefore, a uniform sampling strategy would result in a small set of filaments in a disproportionately large set of observations. This would be particularly problematic during the period of low level of solar magnetic activity. Such an imbalance would burden the annotators with extra browsing time searching for filaments. Moreover, the final manually-annotated dataset would depend on a much larger image data to be downloaded, maintained, and fed into machine learning models for training. To bypass this issue, in our sampling strategy, we used the filaments data reported to the Heliophysics Events Knowledgebase (HEK)21 as a proxy for identifying the time intervals during which observations are more likely to contain filaments. The exact steps of our data sampling strategy are as follows:
Step 1. Retrieve filaments’ spatiotemporal data from HEK: Using the HEK module of the SunPy v4.1.0 package34, we queried all the filament metadata available in HEK, reported by the AAFDCC algorithm, between Jan 1, 2011 and Dec 31, 2017. We limited our query to this interval because, unfortunately, AAFDCC stopped reporting to HEK in mid-2017, without any replacement. This resulted in 42,045 identified filaments, detected from 2,626 unique observations. The breakdown of these numbers is shown in Fig. 2.
The plot shows the total filament area (in pixels) per observation, between the years 2011 and 2017, as reported to HEK by the AAFDCC algorithm13. Yearly data are colored differently (alternating blue and gray). Three pieces of information corresponding to each period are, from top to bottom: the year number, the total number of observations per year, and the total number of identified filaments per year. The dotted red line indicates the 10th percentile of filaments’ area at 4,334.5 pixels.
Step 2. Form high-activity time intervals: Using the spatiotemporal data collected in Step 1, we computed the total filament area per observation. For this, we considered each filament’s segmentation area, i.e., the values paired with the keyword ‘fi_n_pixels’ in the HEK reports. Next, we defined an observation to have a significant presence of filaments if the total area of identified filaments in that observation is above the 10th percentile of per-observation area (at 4,334.5 pixels), shown by the red dotted line in Fig. 2. Dropping all observations with an insignificant presence of filaments left us with a set of observation timestamps, each corresponding to an accepted filament. We then composed time intervals from those observations as follows: we used a five-day gap threshold, meaning that any two timestamps t1 and t2 less than five days apart formed a time interval, [t1, t2]. Further, any two time intervals [t1, t2] and [t3, t4] with a gap of less than five days between them were merged to form a longer time interval, [t1, t4]. As a result, we obtained 67 intervals ranging from one day to over 800 days in length, as illustrated in Fig. 3.
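The thresholding and merging logic of Step 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the linear-interpolation percentile, and the choice to keep an isolated timestamp as a zero-length interval are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def high_activity_intervals(area_by_time, q=10, max_gap=timedelta(days=5)):
    """Drop observations at or below the q-th percentile of total filament
    area, then chain the remaining timestamps into intervals wherever
    consecutive timestamps are less than max_gap apart."""
    threshold = percentile(list(area_by_time.values()), q)
    kept = sorted(t for t, area in area_by_time.items() if area > threshold)
    intervals = []
    for t in kept:
        if intervals and t - intervals[-1][1] < max_gap:
            intervals[-1][1] = t        # extend the current interval
        else:
            intervals.append([t, t])    # start a new (possibly zero-length) interval
    return [tuple(interval) for interval in intervals]
```

Here `area_by_time` is a hypothetical dictionary mapping each observation's `datetime` to its total filament area in pixels. Chaining sorted timestamps in a single pass yields the same result as forming pairwise intervals and then merging those that lie less than five days apart.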
Step 3. Sample H-alpha observations from high-activity time intervals: Using the data search and retrieval tool provided by the SunPy package, namely “Fido”, we developed a sampling strategy which collected GONG H-alpha observations captured within each high-activity time interval obtained from Step 2. We set the sampling cadence to six hours in order to prevent extremely similar observations from entering our data pool. As a result, we sampled 16,828 observations in FITS format, roughly 3,000 per instrument. As mentioned earlier, this large collection of observations serves as our data pool, and it contains many more observations than we planned to annotate.
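The effect of a six-hour sampling cadence can be illustrated with a small thinning routine. This sketch does not reproduce the Fido query itself. It only shows, under the assumption that candidate observation times are already known, how a minimum spacing keeps near-duplicate observations out of the pool; `sample_at_cadence` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def sample_at_cadence(timestamps, cadence=timedelta(hours=6)):
    """Keep the earliest timestamp, then every later one that is at least
    `cadence` after the most recently kept timestamp."""
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= cadence:
            kept.append(t)
    return kept
```

For example, thinning one day of hourly observations with the default cadence leaves four timestamps, six hours apart.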
Data Conversion
The sampled images in our data pool are in FITS format. Since pixel values in these FITS files, unlike those of typical 8-bit images, range from 0 to 2¹⁴, different transformation functions, such as log or squared, are used for viewing those images. To provide the annotators with a unified view, we converted the FITS files to JPEG format. For this conversion, we relied on the exact same procedure adopted by NSO, as follows: At each GONG site, the observations are taken with a variable exposure time to normalize the image intensity to the solar disk center. The size of the images is normalized to compensate for a slight (3%) change in the diameter of the solar disk due to variations in the Sun-Earth distance throughout the year. To improve visual identification of filaments and other solar features (solar limb, sunspots, plages, prominences), large-scale variations of intensity (e.g., solar limb darkening) are removed by applying a digital filter based on a Fourier Transform. Originally, this so-called “image sharpening” was done to enable a quick visual inspection of solar activity. Data in FITS format are the original observations.
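To illustrate why the choice of transformation matters when mapping a 14-bit range onto 8 bits, the following sketch rescales a single pixel value under three transforms. It is not the fits2jpeg or NSO procedure. The function name, the clipping behavior, and the exact form of each transform are assumptions made for the example.

```python
import math

MAX_IN = 2**14  # upper end of the input range, as stated in the text

def to_8bit(value, transform="linear", max_in=MAX_IN):
    """Map one pixel value in [0, max_in] to an integer in [0, 255]."""
    x = min(max(value, 0), max_in) / max_in   # clip, then normalise to [0, 1]
    if transform == "log":
        # log1p keeps 0 mapped to 0 and max_in mapped to 1
        x = math.log1p(x * max_in) / math.log1p(max_in)
    elif transform == "squared":
        x = x * x
    elif transform != "linear":
        raise ValueError(f"unknown transform: {transform}")
    return round(x * 255)
```

All three transforms agree at the ends of the range. In between, the log transform brightens faint values while the squared transform darkens them, so the same FITS file can look quite different depending on the viewer, which is the motivation for a single shared conversion.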
We used the fits2jpeg software implemented by William Cotton35, which takes advantage of the NASA’s CFITSIO library36. A Python wrapper of fits2jpeg, created by our team, is also available37. As a result, the annotators worked on the exact same JPEG images as the GONG archive provides, but without the watermarked metadata.
Anomaly Detection and Removal
When GONG H-alpha images were introduced to the observing program, they were intended as a quick reference for space weather forecasters. Thus, no cloud or other anomaly detection was implemented as part of regular observations. The anomalies may include clouds, airplanes, birds, tree branches, test targets, and, in rare cases, man-made structures low on the horizon. Although some of the anomalous images are removed when noticed by the operator, some may still be present in the data set. For the purpose of this project, we conceptually defined an anomalous H-alpha observation to be an image exhibiting noise in some regions, or across the entire observation, to the extent that features in those regions are no longer discernible. This definition encompasses only observations with significant issues, leaving the rest, such as mildly blurred observations, to be considered normal. Some of the typical issues which may render an observation anomalous, by our definition, are as follows: (1) the solar disk is un-centered, (2) some regions of the solar disk are over-saturated, (3) parts of an observation are obscured by clouds, (4) a placeholder image is used instead of an actual observation, (5) an observation is occluded by shadows of objects such as airplanes and transmission towers, (6) an observation is significantly blurry. By our estimate, 3–5% of GONG’s H-alpha observations fall under this definition. To minimize the number of such observations in our data feed pipeline, we implemented an anomaly detection algorithm to ensure that the annotators receive as few anomalous images as possible in their image batches.
Our anomaly detection algorithm partitions each observation into 8 × 8 equal-size cells and compares the average pixel intensity of each cell against the same regions of historical observations. Any cell with an average pixel value outside the ‘acceptable range’ is deemed anomalous. An image containing at least one anomalous cell is classified as an anomalous image. To find the ‘acceptable range’ per cell, we used the One-Way ANOVA F-test. We first introduced the S-statistic $S_{i,j,k} = |U_{i,j} - I_{i,j,k}| + |L_{i,j} - I_{i,j,k}|$, where $U_{i,j}$ and $L_{i,j}$ are the candidate upper-range and lower-range pixel values for cell $(i, j)$, and $I_{i,j,k}$ is the average pixel value for cell $(i, j)$ of image $k$. Using the F-test, we computed the best values of $U$ and $L$ based on a collection of 400 observations containing 200 anomalous and 200 normal observations. These images were manually labeled. The algorithm performed relatively well, with F1-score of 0.88 (0.2 false-positive rate, and 0.05 false-negative rate).
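The cell-wise range check can be sketched as below. This is a simplified illustration under stated assumptions: images are plain nested lists with a side length divisible by the grid size, and the per-cell bounds `lower` and `upper` are taken as given rather than fitted with the F-test. All function names are hypothetical.

```python
def cell_means(image, grid=8):
    """Average pixel value of each cell in a grid x grid partition.
    `image` is a square list of rows whose side is divisible by `grid`."""
    side = len(image) // grid
    means = [[0.0] * grid for _ in range(grid)]
    for i in range(grid):
        for j in range(grid):
            total = sum(image[r][c]
                        for r in range(i * side, (i + 1) * side)
                        for c in range(j * side, (j + 1) * side))
            means[i][j] = total / (side * side)
    return means

def s_statistic(upper, lower, mean):
    """S = |U - I| + |L - I| for one cell of one image."""
    return abs(upper - mean) + abs(lower - mean)

def is_anomalous(image, lower, upper, grid=8):
    """True if at least one cell mean falls outside its [lower, upper] range."""
    means = cell_means(image, grid)
    return any(not (lower[i][j] <= means[i][j] <= upper[i][j])
               for i in range(grid) for j in range(grid))
```

Note that S takes its minimum value, U − L, exactly when the cell mean lies within [L, U], and grows by twice the distance to the nearest bound otherwise. As a consistency check on the reported scores, and assuming the rates were measured on the balanced set of 200 anomalous and 200 normal images, they correspond to 190 true positives, 10 false negatives, and 40 false positives, so F1 = 2·190 / (2·190 + 40 + 10) = 380/430 ≈ 0.88.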
For the missed anomalous observations, we manually identified and replaced them with non-anomalous observations before sending them to our data feed pipeline. In addition to these two layers of filtering mechanism, we trained the annotators to be able to distinguish and report anomalous observations in case they came across any during annotation. The reported anomalous images, if confirmed by the reviewers, were removed from the data feed pipeline.
Manual Annotation
We prepared thousands of GONG H-alpha observations to be distributed among the annotators. In this section, the core of our project, i.e., manual annotation of solar filaments, is discussed. Our goal is to ensure that the users of this dataset are fully aware of the annotation process, and the intrinsic weaknesses and strengths. It is important to note that it is not possible to annotate complex phenomena such as solar filaments without significant simplifications and compromises. Our manual annotation efforts, despite employing best practices, should be seen in that light.
Onboarding and Management Routine
In a nutshell, the following steps were taken for each group of newly recruited annotators throughout their contribution period: (1) An account would be created for each annotator to access the annotation platform and appropriate roles and permissions would be granted to them (see Annotation Platform). Then, (2) a 90-minute training session (online, synchronous) would take place during which the annotators would learn what exactly is expected from them, how to efficiently use the annotation platform, and how to reach out to the team administrators to get help whenever needed (see Training Human Annotators). Following the training session, (3) each annotator would be given a time frame to complete the annotation of a small batch of ten images, referred to as the evaluation batch. Those who completed the evaluation batch with a reasonable quality would be encouraged to continue. (4) The annotations would then be reviewed and feedback would be provided (see Annotation Review Process). Lastly, they would be given batches of 60 images each, with more flexible deadlines. It would typically take two to six weeks for one batch to be completed. The annotators could continue this work for as long as they desired. As a result, we regularly onboarded new annotators to fill in vacancies.
Annotation Platform
To maximize the quality and quantity (in that order) of the annotations, we adopted an online image-labeling service named V7 Darwin (V7, hereafter). Even though V7 offers automatic annotation using ML, we refrained from using it, as it would have defeated our purpose of manual annotation. Instead, we relied on (1) its rich manual toolbox containing tools for pixel-precise annotation of complex shapes, (2) its flexible workflows which made a systematic review and feedback loop possible, (3) its utilities for monitoring the time and quality of the work throughout the annotation period, and (4) its Software Development Kit (SDK) for automation of image distribution (see Data Distribution) and retrieval of the annotations. It is worth noting that annotators were allowed to use the Auto-annotate tool, which is not ML-supported; it runs a simple clustering-based image-processing algorithm to capture a rough estimate of an object’s region. That said, the annotators could use that tool only for creating the first draft of segmentations to speed up the annotation process. In a first draft created using the Auto-annotate tool, the barbs would almost always be ignored. The annotators would then use the Brush and Eraser tools to carefully refine each segmentation.
Definitions of Filaments’ Labels and Features
Let us borrow from Sara Martin’s extensive work on solar filaments12, the definition of the two key concepts which we used for the annotation of filaments, namely the spines and the barbs of filaments: “The spine [of a filament] is synonymous with the horizontal fine structure along the axis and is the highest part of the filament. […] The barbs (Kiepenheuer, 1953) are appendages along the sides of a filament which extend from the spine.” Both of these concepts are visualized in Fig. 4.
The graphic shows a few screenshots of the V7 annotation platform capturing the main steps of manual annotation of a filament. First, the chirality of the filament is visually identified. Second, the Auto-annotate or Brush tool is used to create a segmentation mask for the filament. Third, the Brush tool is used to refine the segmentation so that small details are captured, focusing on the barbs and their orientation. Fourth, the Polyline tool is used to capture the spine of the filament.
The annotators were tasked to categorize each solar filament into one of the four given labels: left chirality, right chirality, ambiguous chirality, and unidentifiable chirality. Additionally, they were tasked to capture each filament’s features by identifying their barbs and spines. Below, we explain these concepts and the constraints we enforced on them.
Labels
Filaments with discernible chirality were labeled as either “left chirality” or “right chirality”, depending on the orientation of the barbs. We already explained how a filament’s chirality can be determined from H-alpha observations (see Background & Summary). If a filament’s chirality was not discernible because its barbs were not visible, it was labeled as “unidentifiable”. On rare occasions, where contradicting orientations of barbs were identified, the filament was labeled as “ambiguous”. Filaments identified as ambiguous were later analyzed more thoroughly by the reviewers and relabeled as one of the other three labels. In doing so, their segmentations were also modified to satisfy the constraints discussed below. In the final dataset, no filaments are labeled as ambiguous.
Features
For each filament, regardless of its label, two key features were to be carefully captured: the shape of the filament, including its barbs (if any), and the spine of the filament. Filaments’ shapes were captured using segmentation masks, and their spines were captured using a polyline, as follows.
A segmentation mask (hereafter, segmentation) is a binary mask that is created to capture a region in an image. To create instance segmentations, the annotators used the V7’s Brush and the Eraser tools with an adjustable tip size (see Fig. 4). Note that although the annotators used these tools to create pixel-level segmentation for each filament, the annotations were stored as high-granular polygons. This approach noticeably reduces the size of the annotation data since the number of corner points forming a polygon is usually significantly less than the size of the matrix representing a segmentation mask. For a detailed discussion about such statistics see Analysis of Annotated Features.
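To make the polygon representation concrete, the sketch below computes the enclosed area and a bounding box directly from a polygon's corner points, without rasterizing a mask. The helper names are hypothetical, and the box shown is axis-aligned, which is an assumption about what a bounding box means here.

```python
def polygon_area(points):
    """Area enclosed by a simple polygon, via the shoelace formula."""
    n = len(points)
    twice_area = sum(points[k][0] * points[(k + 1) % n][1]
                     - points[(k + 1) % n][0] * points[k][1]
                     for k in range(n))
    return abs(twice_area) / 2.0

def bounding_box(points):
    """Axis-aligned bounding box as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)
```

For scale, a full binary mask of a 2k × 2k observation holds roughly four million entries per filament, whereas a polygon stores only its list of corner points.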
In this project, an acceptable segmentation is required to have minimal false-positive and false-negative rates. Practically, a segmentation is considered acceptable if it satisfies all of the following constraints:
1. all visible (if any) prominent barbs of the filament are captured,
2. all captured barbs (if any) agree with the identified chirality of the filament,
3. the segmentation is in one piece (no islands are allowed) and it contains no holes, and
4. the segmentation tightly fits the body of the filament.
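Of the constraints above, the third (one piece, no holes) lends itself to an automated check. The sketch below tests it on a binary mask given as a nested list of 0s and 1s. It is an illustrative check rather than part of the described workflow, and the connectivity conventions (8-connected foreground, 4-connected background) are assumptions.

```python
from collections import deque

def _count_components(mask, target, connectivity):
    """Count connected components of cells equal to `target`."""
    rows, cols = len(mask), len(mask[0])
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if connectivity == 8:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def is_one_piece_without_holes(mask):
    """True if the foreground (1s) forms a single region and the background
    (0s), after padding with a border of 0s, also forms a single region,
    meaning no background pocket is enclosed by the foreground."""
    cols = len(mask[0])
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in mask]
              + [[0] * (cols + 2)])
    return (_count_components(padded, 1, 8) == 1
            and _count_components(padded, 0, 4) == 1)
```

The remaining constraints depend on visual judgment of the barbs and the filament body, which is why they were enforced through review rather than by a rule like this.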
To minimize capturing background noise in H-alpha observations as barbs, the annotators were asked to first identify the chirality of a given filament by visually analyzing its textural patterns, and only then focus on pixel-level details for drawing a segmentation. Following this order, as illustrated in Fig. 4, when a filament is labeled as unidentifiable (therefore, not exhibiting any barbs), annotators are generally less likely to confuse background noise with barbs. Annotators were specifically asked to minimize the details in their segmentations of filaments labeled as unidentifiable.
In the review process, as a rule of thumb, reviewers rejected segmentations of left/right-chiral filaments if they could not correctly identify their chirality solely by looking at the given segmentation structure (instead of the filament in the corresponding observation). Further, they rejected segmentations of unidentifiable filaments if any barb was captured in the segmentation.
A polyline is a sequence of points which together form a discrete curve (i.e., an undirected path). Unlike in AAFDCC, where the identification of barbs depends on the identification of spines, the spines and barbs in this project were created independently of one another. Moreover, reviewers expected that, regardless of a filament’s chirality, its spine be annotated using a polyline. Spines were used for the identification of the curvature and length of each filament. A spine was considered acceptable if it satisfied all of the following constraints:
1. the polyline is in one piece (not disconnected),
2. the polyline stretches from one very end of the filament to the other very end,
3. the polyline does not leave the segmentation area (remains in the middle of the narrow body of the filament), and
4. the polyline follows the curvature of the filament smoothly (by creating more points at places with a greater curvature).
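Two of these properties can be computed from the stored geometry: the length of a spine, and whether it stays within the segmentation polygon (the third constraint). The sketch below is a minimal illustration with hypothetical helper names. The containment test samples points along each segment rather than testing exact intersection, and points lying exactly on the polygon boundary are not treated specially.

```python
import math

def polyline_length(points):
    """Sum of Euclidean distances between consecutive polyline points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def point_in_polygon(point, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):               # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spine_stays_inside(spine, polygon, samples_per_segment=10):
    """Check evenly spaced sample points along every spine segment."""
    for (ax, ay), (bx, by) in zip(spine, spine[1:]):
        for s in range(samples_per_segment + 1):
            t = s / samples_per_segment
            sample = (ax + t * (bx - ax), ay + t * (by - ay))
            if not point_in_polygon(sample, polygon):
                return False
    return True
```

Whether a spine reaches the very ends of a filament or follows its curvature smoothly still requires looking at the image, so those constraints remain a matter for the reviewers.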
Data Distribution
As illustrated in Fig. 5, we formed 24 groups of potential annotators, each working on three copies of one batch of 60 H-alpha observations. The inclusion of duplicates permits cross verification of annotations. The annotation was a double-blind procedure: annotators in each group worked independently so that they would not be impacted by the decisions made by the other annotators in the same group (working on the same set of images). Moreover, each annotation was independently reviewed and verified. That is, a reviewer did not know the decisions made by any other annotators working on the same batch. No annotator was assigned more than one batch from the same group. This ensured that no duplicate observations were annotated by a single annotator. See Analysis of Annotated Features for the exact number of images annotated once, twice, or thrice.
The graphic shows the distribution of images among the annotators. Each annotator in each group of three works independently on the same batch of images. Each blue ellipse represents a batch of images assigned to each annotator. The 4-digit code XXYY on each ellipse stores the group number XX and the annotator ID YY within that group. The triplicators, shown as red disks, triplicate each observation so that every annotator in one group has access to a unique copy of observations assigned to that group.
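The batch-naming scheme and the one-batch-per-group rule can be sketched as follows. The zero-padded, one-based numbering is an assumption about how the XXYY codes are written, and all function names are hypothetical.

```python
def batch_codes(n_groups=24, copies=3):
    """Return an 'XXYY' code for every (group, annotator slot) pair."""
    return [f"{group:02d}{slot:02d}"
            for group in range(1, n_groups + 1)
            for slot in range(1, copies + 1)]

def parse_batch_code(code):
    """Split 'XXYY' into (group number, annotator ID within the group)."""
    return int(code[:2]), int(code[2:])

def annotators_with_duplicate_groups(assignments):
    """`assignments` maps an annotator name to the batch codes they hold.
    Returns the annotators holding two or more batches of the same group,
    which would let one person annotate the same observation twice."""
    offenders = []
    for annotator, codes in assignments.items():
        groups = [parse_batch_code(code)[0] for code in codes]
        if len(groups) != len(set(groups)):
            offenders.append(annotator)
    return offenders
```

With 24 groups and three copies each, this scheme yields 72 batch codes.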
A Supplementary Source of Information for Annotators
Identification of filaments’ chirality on still images can become challenging at times. We anticipated that without any extra piece of information, a majority of filaments would be labeled as unidentifiable, resulting in an extremely imbalanced dataset. To help the annotators in the process, we developed a web application, named the GONG H-alpha Viewer, for searching through the GONG archive of H-alpha observations. This application has been made publicly available for the community as well. Using this web app, an annotator would simply copy an observation’s file name (e.g., 20240617225902Bh) into the text box in the app to see the corresponding observation. They could then easily monitor the evolution of each filament by going over the next/previous observations. The app also makes it possible to make jumps, i.e., to retrieve every n-th observation, or observations at an arbitrary cadence (e.g., 6 hours). The annotators could also filter observations by any subset of GONG’s six observatories, choosing whichever captured better images for the time of interest. It is noteworthy that we do not assume that a filament’s chirality remains unchanged as it evolves in time, a notion for which the community has not yet found very strong evidence for or against. If contradictory indications were observed, the annotators were instructed to label the filament as unidentifiable.
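The example file name above appears to encode a timestamp followed by two letters. The sketch below parses it under that reading. The interpretation of the first letter as a site code, and the site-letter table itself, are assumptions not stated in the text, and `parse_gong_name` is a hypothetical helper.

```python
from datetime import datetime

# Assumed mapping from the first trailing letter to a GONG site.
ASSUMED_SITES = {
    "B": "Big Bear", "C": "Cerro Tololo", "L": "Learmonth",
    "M": "Mauna Loa", "T": "El Teide", "U": "Udaipur",
}

def parse_gong_name(name):
    """Split a name like '20240617225902Bh' into
    (timestamp, assumed site name, remaining suffix)."""
    timestamp = datetime.strptime(name[:14], "%Y%m%d%H%M%S")
    site = ASSUMED_SITES.get(name[14], "unknown")
    return timestamp, site, name[15:]
```

Under these assumptions, the example name corresponds to an observation taken on 17 June 2024 at 22:59:02, which is the kind of lookup needed to step to the next or previous observation of the same filament.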
Although looking up an observation in the GONG H-alpha Viewer may only take a few minutes, the extra time quickly adds up when tens of thousands of filaments need to be annotated. See the statistics reported in Analysis of Contributions to get a better picture of the invested time. Thus we asked the annotators to use this supplementary source of information only for discerning the chirality of relatively larger filaments.
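The file-name lookup and the cadence-based jumps described above can be illustrated with a short sketch. It is not the code of the GONG H-alpha Viewer; it only assumes that the first 14 characters of a file name encode the capture time as YYYYMMDDhhmmss (consistent with the example pair “20210607142350Bh.jpeg” and “2021-06-07 14:23:50” given in Data Records). The function names and the sample list are hypothetical, and the trailing code (e.g., “Bh”) is returned without interpretation.

```python
from datetime import datetime, timedelta

def parse_gong_name(name):
    """Split a GONG H-alpha file name into its capture time and trailing code."""
    stem = name.split(".")[0]                      # drop an extension such as ".jpeg"
    captured = datetime.strptime(stem[:14], "%Y%m%d%H%M%S")
    return captured, stem[14:]                     # trailing code, e.g. "Bh", left uninterpreted

def thin_by_cadence(names, cadence):
    """Keep the earliest observation, then each one at least `cadence` after the last kept."""
    kept, last = [], None
    for name in sorted(names):                     # fixed-width timestamps sort chronologically
        captured, _ = parse_gong_name(name)
        if last is None or captured - last >= cadence:
            kept.append(name)
            last = captured
    return kept

# Hypothetical file names, used only to exercise the functions.
names = ["20240617225902Bh", "20240617235902Bh",
         "20240618050002Lh", "20240618110002Lh"]
six_hourly = thin_by_cadence(names, timedelta(hours=6))
```

With the sample list, the second file is skipped because it follows the first by only one hour, while the remaining two are each at least six hours after the previously kept observation.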
Annotation Review Process
Each annotation went through two rounds of review, as illustrated in Fig. 6. Initially, each batch of images was at the Annotate stage, where only the one assigned annotator could work on those images. As soon as an annotator completed a batch (i.e., all images were marked as ‘ready for initial review’), the images were moved to the Review & Modify stage, where a reviewer could start the review process. When the review process was completed, the accepted images were moved to the Final Review stage and marked as ‘ready for final review’. The rejected images were sent back to the original annotator, marked as ‘being annotated’. The annotator would then be able to modify their annotations based on the feedback they received from the reviewers. The images in the Final Review stage were reviewed by domain experts. If they were rejected, they would be sent back to the Review & Modify stage, letting the reviewers decide whether to correct the mistakes themselves and resubmit the images, or to send them back to the original annotator. At any stage, when an annotator decided to leave the team, any incomplete annotations were sent back to the Annotate stage, pending assignment to a new annotator. The new annotator would pick up the annotation where the last annotator left off, although they would be responsible for all filaments in that batch.
The graphic shows our two-stage review process in the annotation workflow. In the first stage (Review & Modify), in addition to providing review and feedback, reviewers made minor modifications whenever needed to reduce lag time. In the second stage (Final Review), the final review was conducted by the domain experts. The rejection and acceptance paths are also depicted in the graphic.
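The acceptance and rejection paths described above amount to a small state machine. The following sketch is one way to write it down; it is not the logic of the annotation platform. The stage names follow the text, while the terminal “Accepted” state and the event names (“complete”, “accept”, “reject”, “annotator_left”) are hypothetical labels introduced for illustration.

```python
from enum import Enum

class Stage(Enum):
    ANNOTATE = "Annotate"
    REVIEW_MODIFY = "Review & Modify"
    FINAL_REVIEW = "Final Review"
    ACCEPTED = "Accepted"          # hypothetical name for the terminal state

# (current stage, event) -> next stage, following the paths in the text.
TRANSITIONS = {
    (Stage.ANNOTATE, "complete"): Stage.REVIEW_MODIFY,
    (Stage.REVIEW_MODIFY, "accept"): Stage.FINAL_REVIEW,
    (Stage.REVIEW_MODIFY, "reject"): Stage.ANNOTATE,       # back to the original annotator
    (Stage.FINAL_REVIEW, "accept"): Stage.ACCEPTED,
    (Stage.FINAL_REVIEW, "reject"): Stage.REVIEW_MODIFY,   # reviewers decide how to proceed
}

def advance(stage, event):
    """Return the next stage, or raise if the event is not allowed in this stage."""
    if event == "annotator_left" and stage is not Stage.ACCEPTED:
        return Stage.ANNOTATE      # incomplete work awaits a new annotator
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in stage {stage.value!r}") from None

def run(events, stage=Stage.ANNOTATE):
    """Apply a sequence of events starting from the Annotate stage."""
    for event in events:
        stage = advance(stage, event)
    return stage
```

For instance, `run(["complete", "accept", "reject"])` ends in the Review & Modify stage, which mirrors a rejection by the domain experts.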
With this two-stage review process we gained a significant speedup without sacrificing the quality. This is because, first, unlike domain experts, the experienced annotators could be trained quickly. Second, when the review process is broken into two stages, the second round, carried out by a few domain experts, would be significantly less time consuming than the first round where annotations are more likely to need improvements. Furthermore, by letting the reviewers modify the annotations whenever quick modifications could fix an issue, we reduced the lag time, i.e., the time between when an image is rejected and when the assigned annotator starts working on it.
Training Human Annotators
Our training process was designed to be sufficient and multi-faceted. Each annotator went through a 90-minute interactive training session. Additionally, they were continuously provided with feedback, either as a short comment next to a rejected annotation, or as a more general comment through our dedicated Slack workspace. We believe our case-specific feedback significantly improved the quality of the annotations, something that is not possible when crowdsourcing technologies are utilized. Also, the annotators frequently posted screenshots of challenging cases, for which the reviewers would provide guidance. Furthermore, a dedicated website put educational material at the annotators’ fingertips, such as video tutorials, a catalog of observations to be ignored, a catalog of good and bad practices, and efficiency tips provided by the more-experienced annotators.
Each training session started with a brief introduction about solar filaments, their importance, and the end goals of our annotation project. It was followed by a hands-on training on identification of filaments and their chirality using numerous examples and live and interactive annotation. A visual mnemonic device was provided to help annotators remember the name of each label, as illustrated in Fig. 7. The annotators were also briefly taught what sunspots are—the other solar event visible in H-alpha observations—so that they do not confuse them with filaments.
The graphic shows a visual mnemonic device used to help annotators not to confuse the Left and Right chirality of filaments during annotation. The filaments’ examples are borrowed from Pevtsov et al.18.
For handling non-trivial cases, the annotators were given specific instructions. A summary of those instructions is given below:
1. Filaments to ignore: (1) filaments which are too small or too faint to exhibit any feature; (2) small, close-by filaments which form groups; and (3) filaments which are close to the limb of the solar disk (roughly beyond ±70° from the central meridian).
2. Filaments’ pieces: occasionally, a large filament may appear in multiple pieces, raising the question of whether it is one filament or multiple close ones. In such cases, the annotators were instructed to look up the filament in the GONG H-alpha Viewer (introduced earlier in A Supplementary Source of Information for Annotators) and judge based on the evolution of that filament (forward and backward in time). A given rule of thumb, as subjective as it may be, was that if the chiralities of the pieces clearly contradict one another, it is likely that the pieces belong to more than one filament. That said, the final dataset contains instances of filaments which are identified as one large filament by one annotator, and as multiple filaments by another annotator. This is an important point to consider for evaluation of agreements among the annotators (see Analysis of Annotators’ Agreements).
3. Thick filaments: occasionally filaments may appear as almost-circular shapes, making it impossible to identify their tails, and consequently to identify the orientation of their spines. In such cases, the reviewers were instructed to trust the annotators’ (possibly arbitrary) judgments.
Post-processing of Data
When the annotation phase was completed, we obtained our v0.1 dataset. At this stage, every item of the dataset had already been manually reviewed. In this section, we explain our systematic review and iterative post-processing efforts which resulted in the final dataset v1.0.
To quantify the overall quality of the assigned labels, we ran a cross-comparison analysis of the annotations, which showed a substantial agreement, i.e. κ = 0.66, among the annotators. In Technical Validation we elaborate on how we computed κ, and moreover, we explain why we refrained from relabeling the filaments for achieving a higher Kappa score.
For systematic evaluation of the captured filaments’ features (segmentations and spines), we examined the quality of the annotations using PostgreSQL (a spatiotemporal object-relational database) extended with the PostGIS software program, and Python’s Shapely package38. Both of these products rely on the GEOS library39, a C/C++ library for computational geometry, which guarantees their full compatibility and, therefore, consistency in the obtained analytics. The following items summarize our investigation, the issues, and their corresponding treatments.
1. Segmentations with multiple (disconnected) pieces covering a seemingly multi-region filament. Although the annotators were instructed to capture each filament by a single shape, occasionally the instruction was not followed. Treatment: Such cases were identified using GEOS functions, scrutinized using the GONG H-alpha Viewer, and modified in V7. If multiple pieces seemingly belong to one filament, the missing parts were manually extrapolated. Otherwise, they were broken down into separate segmentations.
2. Segmentations with tiny “islands” around them, or tiny “holes” within them. Such patterns are the result of imprecise use of the Brush/Erase tool in V7 during refinement of segmentations. Treatment: Following the treatment in the previous item, if the area of a piece was greater than 5% of the total area, the segmentation was manually verified. Otherwise, the pieces were dropped as erroneous annotations. It might be helpful to mention that segmentations in the COCO-style data format are represented as 2D lists (i.e., lists of lists). Each internal list is a segmentation piece that is considered either an island, if it does not intersect any other piece, or a hole, if it does intersect another piece. This makes the removal of islands/holes convenient, by dropping statistically insignificant pieces.
3. Segmentations with missing spines, and vice versa. Treatment: Among all annotations corresponding to one observation, annotated by one annotator, segmentations which do not intersect any spine were identified and their corresponding spines were manually created in V7. Conversely, segmentations were provided for any spine without a corresponding segmentation. In MAGFiLO v1.0 each segmentation is coupled with exactly one spine, and vice versa.
4. Spines with a small curve or loop at one end. This pattern emerges because, to end a polyline’s extension during manual annotation, users must perform a double-click. Consequently, any slight movement of the cursor during this action could create unusual shapes. Treatment: We identified self-intersecting spines, and corrected them manually in V7. The identification of spines with small curves but without self-intersection would have required us to define what constitutes a ‘small curve’, which we refrained from due to the subjective nature of the task and its minimal impact.
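The 5% rule for islands and holes can be sketched on a COCO-style segmentation as follows. This is a dependency-free illustration, not the GEOS-based procedure used in the project: it assumes that each piece is a flat list [x1, y1, x2, y2, ...], that the “total area” is the sum of the pieces’ areas, and that a segmentation still holding more than one piece after filtering is flagged for manual verification. The function names are hypothetical.

```python
def polygon_area(flat):
    """Area of one COCO-style piece [x1, y1, x2, y2, ...] via the shoelace formula."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice) / 2.0

def drop_small_pieces(segmentation, min_fraction=0.05):
    """Drop pieces whose share of the summed area is at most `min_fraction`.

    Returns the kept pieces and a flag telling whether more than one piece
    survived, i.e. whether the segmentation still needs a manual look.
    """
    areas = [polygon_area(piece) for piece in segmentation]
    total = sum(areas)             # assumption: "total area" = sum over all pieces
    kept = [piece for piece, area in zip(segmentation, areas)
            if total > 0 and area / total > min_fraction]
    return kept, len(kept) > 1

# Illustrative shapes only: a 100 x 20 body, a 2 x 2 brush artifact, a 30 x 10 second piece.
body = [0, 0, 100, 0, 100, 20, 0, 20]
speck = [120, 5, 122, 5, 122, 7, 120, 7]
second = [130, 0, 160, 0, 160, 10, 130, 10]

cleaned, needs_review = drop_small_pieces([body, speck])
cleaned2, needs_review2 = drop_small_pieces([body, speck, second])
```

In the first call the speck accounts for about 0.2% of the area and is dropped; in the second call the additional piece holds roughly 13% of the area, so it is kept and the segmentation is flagged for review.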
Data Records
The MAGFiLO v1.0 dataset is made publicly available at Harvard Dataverse repository40. The repository contains the main dataset, named ‘magfilo_2024_v1.0.json’, as well as metadata files whose name starts with the keyword ‘[metadata]_’. These files contain the time windows from which GONG observations were sampled, as well as the URLs for downloading each observation.
The annotations of filaments, examples of which are illustrated in Fig. 8, are compiled into a single JSON file following the COCO-style data format. The main dictionary (pairs of keys and values) in the dataset contains five collections, mapped to the keys ‘info’, ‘licenses’, ‘categories’, ‘images’, and ‘annotations’. The first two are dictionaries carrying information about the dataset itself, such as the release date, dataset version, image license, etc. In the ‘categories’ collection, which is a list of dictionaries, the labels (i.e., filaments’ identified chirality) can be found with unique IDs. Those labels are “Left”, “Right”, and “Unidentifiable”, mapped respectively to IDs 1, 2, and 3. The ‘images’ collection, which is also a list of dictionaries, contains information about the annotated observations, such as the width and height of the images in pixels, the file name as it appears in GONG’s archive (e.g., “20210607142350Bh.jpeg”), the capture date (e.g., “2021-06-07 14:23:50”), as well as a unique ID (e.g., “010103-20210607142350Bh”) that is used to map each observation to all of its corresponding annotations. Those annotations are listed in the ‘annotations’ collection, another list of dictionaries. Each annotation dictionary describes a single filament’s annotation, with pieces of information such as the image ID it corresponds to, and the segmentation, the spine, and the minimum bounding box (1D list) of that filament. Examples of each item can be found on the dataset’s dedicated web page at https://www.mlecofi.net/magfilo.
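A minimal sketch of how this structure can be traversed is given below. It uses a tiny in-memory stand-in instead of the released file so that it runs on its own. The five top-level keys, the label names, and the example image ID come from the description above; the inner key names (‘id’, ‘name’, ‘file_name’, ‘image_id’, ‘category_id’, ‘segmentation’, ‘bbox’) follow common COCO conventions and are assumptions here, as is the key ‘spine’. All coordinate values are placeholders.

```python
from collections import Counter, defaultdict

# Stand-in for the real file; load the real one with json.load(open("magfilo_2024_v1.0.json")).
sample = {
    "info": {"version": "1.0"},
    "licenses": {},
    "categories": [
        {"id": 1, "name": "Left"},
        {"id": 2, "name": "Right"},
        {"id": 3, "name": "Unidentifiable"},
    ],
    "images": [
        {"id": "010103-20210607142350Bh", "file_name": "20210607142350Bh.jpeg"},
    ],
    "annotations": [
        {"image_id": "010103-20210607142350Bh", "category_id": 1,
         "segmentation": [[10, 10, 60, 10, 60, 20, 10, 20]],
         "bbox": [10, 10, 50, 10],
         "spine": [10, 15, 60, 15]},           # 'spine' is an assumed key name
        {"image_id": "010103-20210607142350Bh", "category_id": 3,
         "segmentation": [[80, 40, 90, 40, 90, 45, 80, 45]],
         "bbox": [80, 40, 10, 5],
         "spine": [80, 42, 90, 42]},
    ],
}

def labels_per_image(dataset):
    """Count chirality labels for every image ID that has annotations."""
    id_to_label = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    counts = defaultdict(Counter)
    for ann in dataset["annotations"]:
        counts[ann["image_id"]][id_to_label[ann["category_id"]]] += 1
    return {image_id: dict(counter) for image_id, counter in counts.items()}

summary = labels_per_image(sample)
```

Before relying on any of the assumed key names, it is worth printing the keys of one image and one annotation dictionary from the released file.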
Technical Validation
In this section, we present information describing the quality of data in MAGFiLO v1.0. We start by presenting some statistics about the utilized GONG H-alpha observations, and the distributions of the annotations’ sizes. We further discuss the annotation agreement and what disagreement actually means. Lastly, for a high-level, physics-based comparison, we compare the hemispheric preference of left-chiral and right-chiral filaments as annotated in our dataset versus those of some other datasets.
Analysis of Annotated Features
The final dataset contains 10,244 annotated filaments from 1,593 full-disk, H-alpha observations captured by the GONG network. For each annotated filament its polygon, spine, minimum bounding box, and chirality are identified. Recall that our designed annotation pipeline permitted an observation to be annotated by up to three independent annotators (see Data Distribution). In the released dataset v1.0, a total of 958 unique observations are annotated, of which 548 (57.20%) are annotated once, 185 (19.31%) are annotated twice, and 225 (23.49%) are annotated thrice. These observations are sampled from the years 2011 through 2022 (shown in Fig. 9a). The annotations are made of 3,128 (30.53%) left-chiral, 3,273 (31.95%) right-chiral, and 3,843 (37.51%) unidentifiable filaments. As mentioned in Definitions of Filaments’ Labels and Features, filaments temporarily labeled as ‘ambiguous’ were reviewed and re-labeled as one of the other three labels.
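The relation between the number of unique observations and the total number of annotated observations can be checked with a few lines of arithmetic. The snippet only restates the counts reported above; the variable names are ours.

```python
# Unique observations, keyed by how many independent annotators worked on them.
times_annotated = {1: 548, 2: 185, 3: 225}

unique_observations = sum(times_annotated.values())
annotated_copies = sum(k * count for k, count in times_annotated.items())

# Per-class filament counts as reported in the text.
labels = {"Left": 3128, "Right": 3273, "Unidentifiable": 3843}
total_filaments = sum(labels.values())
label_shares = {name: round(100 * count / total_filaments, 2)
                for name, count in labels.items()}

print(unique_observations, annotated_copies, total_filaments)  # 958 1593 10244
```

Counting each unique observation once per annotator (548 + 2 × 185 + 3 × 225) recovers the total number of annotated observations.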
The plots show (a) the number of unique observations annotated, (b) per-class distribution of filaments’ minimum bounding-box area, (c) per-class distribution of filaments’ segmentation area, (d) per-class distribution of filaments’ spine length, (e) per-class distribution of filaments’ segmentation dimension (i.e., number of corner points forming each polygon), and (f) per-class distribution of filaments’ spine dimension (i.e., the number of points forming each spine).
The distribution of the annotated filaments’ size, in terms of their segmentation area, minimum bounding-box area, and spine length, is shown in Fig. 9(b–d). All three histograms, with log-scaled y axes, confirm the expected inverse correlation between the filaments’ size and their counts. The one order of magnitude difference between the bounding-box area and segmentation area comes from the fact that filaments are, by definition, thin structures, occupying significantly smaller regions than their bounding boxes. In plot (b), one observes that the overall bounding-box area corresponding to the filaments labeled as Unidentifiable is less than the area of those identified as Left- or Right-chiral; the area averages for the Left, Right, and Unidentifiable filaments are 15,714, 17,523, and 6,709 pixels, respectively. This significant difference, however, shrinks when one looks at the segmentation area, as shown in plot (c); the averages go down to 2,253, 2,416, and 1,683 pixels, for the same order of classes. This could be considered as evidence for the claim that thinner filaments are more likely to exhibit clear features determining their magnetic chirality.
Plots (e) and (f) show the dimension of segmentations and spines. The term dimension here refers to the number of points forming each object. As shown in (e), each segmentation is made of hundreds of (corner) points; the averages are 169, 176, and 104 points for the Left, Right, and Unidentifiable filaments, respectively. This high precision level is achieved because the annotators created masks (instead of polygons) using the Brush and Eraser tools, which were stored as polygons. Regarding the spines whose points were made by single clicks, the distribution plot in (f) shows that most of them are made with high granularity, and mostly of a few tens of points; the averages are 29, 30, and 17 points for the same order of classes.
Analysis of Annotators’ Agreements
We used Cohen’s kappa score, a coefficient of inter-annotator agreement (or “interjudge agreement”, as it appears in the original text), to measure the agreement among the annotators41. Kappa is calculated as \(\kappa = \frac{p_o - p_e}{1 - p_e}\), where \(p_o\) is the observed probability of agreement and \(p_e\) is the probability of agreement if labels are assigned randomly (see scikit-learn’s implementation for details). A Kappa score of 0 indicates a chance agreement (i.e., agreement no better than that expected by chance), and scores closer to 1 indicate a stronger agreement beyond chance. Small negative values are possible, but the lower range of Kappa depends on the number of annotators42. Our final dataset achieves κ = 0.47 for three annotators choosing labels from three possible cases for each filament. Since actual disagreement occurs only in cases where both the left and right chirality are assigned to a single filament, we further excluded the disagreement penalty for unidentifiable labels. With this consideration, the dataset reached the overall agreement of κ = 0.66. According to the agreement scale proposed by Landis and Koch43, values of κ within the interval [0.41, 0.60] represent a ‘moderate’ agreement and those within the interval [0.61, 0.80] indicate a ‘substantial’ agreement.
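To make the formula concrete, the sketch below computes kappa for two annotators in plain Python and then repeats the computation after discarding every pair in which either annotator chose Unidentifiable. The second step is only one possible reading of “excluding the disagreement penalty for unidentifiable labels”; the exact procedure behind the reported values, including how three annotators were combined, is documented in the project’s analytics code rather than here. The toy labels are invented for illustration.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n              # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k]                         # chance agreement
              for k in count_a.keys() | count_b.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)   # undefined when p_e == 1 (a single label used throughout)

def drop_unidentifiable(a, b, label="U"):
    """Assumed variant: keep only pairs where both annotators committed to a chirality."""
    pairs = [(x, y) for x, y in zip(a, b) if x != label and y != label]
    return [x for x, _ in pairs], [y for _, y in pairs]

# Toy labels for eight filaments: L = Left, R = Right, U = Unidentifiable.
ann_1 = ["L", "L", "R", "R", "U", "U", "L", "R"]
ann_2 = ["L", "U", "R", "U", "U", "L", "L", "R"]

kappa_all = cohen_kappa(ann_1, ann_2)
kappa_relaxed = cohen_kappa(*drop_unidentifiable(ann_1, ann_2))
```

In this toy case every disagreement involves an Unidentifiable label, so the relaxed score is higher than the strict one, which is the same direction as the change from 0.47 to 0.66 reported above.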
It is noteworthy that the term disagreement here should be taken in its most abstract form, as it obscures many nuances. In other words, what counts as one case of disagreement may not be perceived as a real disagreement when the exact annotations are examined. One example of such a case is when the annotators do not agree whether they are looking at a single filament or multiple smaller ones. When looking at one filament, small features may be disregarded in favor of the more prominent ones. In contrast, when looking at multiple small filaments, the prominent features may appear only in some of them, giving a greater weight to the otherwise insignificant features. Therefore, in this one-to-many comparison, although the annotators may agree on the orientation of the prominent barbs, they may yet disagree on the chirality of all (pieces of) filaments.
The decision as to whether the existing discrepancies should be rectified is worth some clarification. One should note that the disagreements we notice here were exhibited not only among the annotators but also among the reviewers, since each annotation passed the review process. Moreover, a reviewer may have reviewed two or three identical batches of images annotated by different annotators. Yet, they occasionally contradicted themselves by accepting nonmatching labels, or even by modifying the proposed labels while contradicting their previous judgements on the same filaments. Therefore, because there is only so much that can be seen in 2D images capturing complex 3D structure of solar filaments, the labeling task is, to a degree, subjective. The elimination of such disagreements can only take place by valuing another person’s judgement higher than the annotators’ and the reviewers’. Thus, we decided to leave the natural discrepancies in the annotations in the dataset. In the future, using other sources of information such as magnetograms, we might be able to improve the quality of the labels.
Validation through Hemispheric Preference of Filaments’ Chirality
One interesting method for validation of the overall quality of labeling is to compare the hemispheric preference of the annotated filaments’ chirality with those of some other datasets. The hemispheric helicity rule is an observed trend of solar magnetic fields: in the northern hemisphere of the Sun, magnetic fields show primarily negative chirality, while those in the southern hemisphere show primarily positive chirality44. For this comparison, we chose two manual investigations of chirality carried out by Pevtsov et al.18 and Hazra et al.45. We also include the hemispheric preference of filaments automatically identified by the AAFDCC algorithm, developed by Bernasconi et al.13, as reported in Hazra et al.45. It is worth noting a few points: (1) Pevtsov et al. reported a break-down of the hemispheric preference of filaments in terms of Quiescent and Active-Region filaments, separately. For a fair comparison, we used the averages of the two groups. (2) The reported hemispheric preference associated with the AAFDCC algorithm is derived from the same filaments identified in Hazra et al.45. Therefore, without proper investigation, one cannot assume that the statistics remain unchanged if all filaments identified by AAFDCC were incorporated. (3) Hemispheric preference can change during one solar cycle, or from cycle to cycle. Therefore, only investigations corresponding to similar windows of observations are comparable. (4) To quantify the hemispheric preference in our dataset, we considered a filament to be part of the northern (or southern) hemisphere if more than half of its segmentation area intersects that hemisphere.
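The area-based hemisphere rule stated above can be sketched without external libraries. The sketch makes simplifying assumptions that are not taken from the paper: image coordinates with y increasing downward, solar north pointing up, and the equator approximated by the horizontal line through the disk center (ignoring the tilt of the solar axis). The function names and the example coordinates are hypothetical.

```python
def shoelace_area(points):
    """Area of a polygon given as [(x, y), ...]; returns 0.0 for an empty list."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2.0

def clip_north(points, y_equator):
    """Clip a polygon to the half-plane y <= y_equator (one Sutherland-Hodgman pass)."""
    clipped = []
    n = len(points)
    for i in range(n):
        (x1, y1), (x2, y2) = points[i], points[(i + 1) % n]
        inside1, inside2 = y1 <= y_equator, y2 <= y_equator
        if inside1:
            clipped.append((x1, y1))
        if inside1 != inside2:                     # the edge crosses the equator line
            t = (y_equator - y1) / (y2 - y1)
            clipped.append((x1 + t * (x2 - x1), y_equator))
    return clipped

def hemisphere(points, y_equator):
    """'north' or 'south' if more than half of the area lies there, else 'undetermined'."""
    total = shoelace_area(points)
    if total == 0:
        return "undetermined"
    north_share = shoelace_area(clip_north(points, y_equator)) / total
    if north_share > 0.5:
        return "north"
    if north_share < 0.5:
        return "south"
    return "undetermined"

# Illustrative rectangle straddling an assumed equator at y = 1024 (62% above it).
filament = [(100, 900), (300, 900), (300, 1100), (100, 1100)]
side = hemisphere(filament, 1024)
```

A production version would need the true disk center and orientation of each image, and could use Shapely’s polygon intersection instead of the manual clipping shown here.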
The hemispheric preference comparison is shown in Fig. 10. A strong hemispheric preference was reported in Pevtsov et al.18, which focused on one year of data (2000-2001) containing 2,310 manually-annotated filaments. A key difference between their effort and ours is that they used the full-disk line-of-sight magnetograms in addition to the H-alpha observations. A stronger hemispheric preference was reported in Hazra et al.45. They examined 3,480 filaments from the BBSO observations captured during the month of August of 16 consecutive years (2000 through 2016), one observation every other day. The team managed to determine the chirality of only 22% of the identified filaments, which constituted their final report. The continuity in their approach made it possible for them to examine the chirality of long-lasting filaments in multiple observations. The weakest hemispheric preference is exhibited in the AAFDCC’s filaments, as reported in Hazra et al.45. Our dataset, MAGFiLO v1.0, largely agrees with both of the manual investigations despite the fact that it contains a significantly larger set of manually annotated filaments (i.e., 10,244). It reflects a weaker hemispheric preference than that in the other two manual investigations, and a stronger hemispheric preference than the AAFDCC code.
Analysis of Contributions
The final dataset is the result of 1,066 person-hours of annotation and review. Throughout the project, out of more than 100 recruited and trained annotators, the contributions of 41 annotators directly made it into the final dataset v1.0. The annotators and reviewers collectively created 46,166 segmentations and spines (for over 23k filaments), working on 6,944 observations (not necessarily unique; see Data Distribution). The top-three contributors (as annotators and reviewers) worked on 27,828, 8,659, and 5,358 instances. We estimated that our acceptance rate was below 30%, indicating that the majority of the observations were processed at least twice before they were accepted. With over 12k annotated filaments in our final dataset, the team spent, on average, 5.33 minutes annotating each filament.
For the record, the utilized annotation platform calculated our average acceptance rate to be 75.3%. However, this is unrealistic for two main reasons: (1) This number is averaged over multiple rounds of review. Naturally, the chance that an annotation is accepted the second time is much higher than the chance of it being accepted the first time. (2) We incorporated a two-stage review process, in which the second stage, where annotations are already verified once, has a much higher acceptance rate than the first stage. We managed to estimate a more realistic acceptance rate based on our own collected metadata.
Usage Notes
MAGFiLO v1.0 is released in the COCO-style data format. This format is still the most popular format for image annotation, utilized by popular machine learning libraries such as PyTorch and TensorFlow, as well as many object-detection competitions. For basic tasks such as loading, parsing, and visualization, the COCO API can be used, which supports the Python, Matlab, and Lua programming languages. Its Python package, named ‘pycocotools’, can be easily installed on any operating system. For more advanced interactions, such as geometrical analysis and curation of data, the C/C++ GEOS library can be employed, which can be accessed from many languages, such as Python (through the ‘Shapely’ package).
Code availability
The data collection and analytics code for this project is made publicly available via our Bitbucket workspace at https://bitbucket.org/dataresearchlab/mleco-magfilodatacode/. In this code repository, a Jupyter notebook file, named ‘data_collection.ipynb’, shows how the data collection process can be reproduced. Another notebook file, named ‘data_analytics.ipynb’, facilitates an interactive analysis of the dataset.
References
Engvold, O. Description and Classification of Prominences. In Vial, J.-C. & Engvold, O. (eds.) Solar Prominences, vol. 415 of Astrophysics and Space Science Library, 31, https://doi.org/10.1007/978-3-319-10416-4_2 (2015).
Gibson, S. E. Solar Prominences: Theory and Models. Fleshing Out the Magnetic Skeleton. Living Reviews in Solar Physics 15, 7, https://doi.org/10.1007/s41116-018-0016-2 (2018).
Alfvén, H. Existence of Electromagnetic-Hydrodynamic Waves. Nature 150, 405–406, https://doi.org/10.1038/150405d0 (1942).
Webb, D. F. & Howard, T. A. Coronal mass ejections: Observations. Living Reviews in Solar Physics 9, 1–83, https://doi.org/10.12942/lrsp-2012-3 (2012).
Bruzek, A. & Durrant, C. (eds.). Illustrated Glossary for Solar and Solar Terrestrial Physics, https://doi.org/10.1007/978-94-010-1245-4 (D. Reidel, Dordrecht, 1977).
Reames, D. V. Solar Energetic Particles. A Modern Primer on Understanding Sources, Acceleration and Propagation, vol. 978 of Lecture Notes in Physics, 2nd edn., https://doi.org/10.1007/978-3-030-66402-2 (Springer, Cham, 2021).
Carrington, R. C. Description of a Singular Appearance seen in the Sun on September 1, 1859. MNRAS 20, 13–15, https://doi.org/10.1093/mnras/20.1.13 (1859).
Stewart, B. On the Great Magnetic Disturbance Which Extended from August 28 to September 7, 1859, as Recorded by Photography at the Kew Observatory. Philosophical Transactions of the Royal Society of London Series I 151, 423–430, https://doi.org/10.1098/rstl.1861.0023 (1861).
National Research Council. Severe Space Weather Events–Understanding Societal and Economic Impacts: A Workshop Report (The National Academies Press, Washington, DC, 2008).
Harvey, J. W. et al. The Global Oscillation Network Group (GONG) Project. Science 272, 1284–1286, https://doi.org/10.1126/science.272.5266.1284 (1996).
Hill, F. The Global Oscillation Network Group Facility—An Example of Research to Operations in Space Weather. Space Weather 16, 1488–1497, https://doi.org/10.1029/2018SW002001 (2018).
Martin, S. F. Conditions for the Formation and Maintenance of Filaments (Invited Review). Solar Physics 182, 107–137, https://doi.org/10.1023/A:1005026814076 (1998).
Bernasconi, P. N., Rust, D. M. & Hakim, D. Advanced automated solar filament detection and characterization code: Description, performance, and results. Solar Physics 228, 97–117, https://doi.org/10.1007/s11207-005-2766-y (2005).
Gao, J., Wang, H. & Zhou, M. Development of an automatic filament disappearance detection system. Solar Physics 205, 93–103, https://doi.org/10.1023/A:1013851808367 (2002).
Shih, F. Y. & Kowalski, A. J. Automatic extraction of filaments in hα solar images. Solar Physics 218, 99–122, https://doi.org/10.1023/B:SOLA.0000013052.34180.58 (2003).
Qu, M., Shih, F. Y., Jing, J. & Wang, H. Automatic solar filament detection using image processing techniques. Solar Physics 228, 119–135, https://doi.org/10.1007/s11207-005-5780-1 (2005).
Fuller, N., Aboudarham, J. & Bentley, R. D. Filament recognition and image cleaning on meudon hα spectroheliograms. Solar Physics 227, 61–73, https://doi.org/10.1007/s11207-005-8364-1 (2005).
Pevtsov, A. A., Balasubramaniam, K. S. & Rogers, J. W. Chirality of Chromospheric Filaments. Astrophysical Journal 595, 500–505, https://doi.org/10.1086/377339 (2003).
Zirin, H. The Big Bear Solar Observatory. Sky and Telescope 39, 215 (1970).
Denker, C. et al. Synoptic hα full-disk observations of the sun from big bear solar observatory–i. instrumentation, image processing, data products, and first results. Solar Physics 184, 87–102, https://doi.org/10.1023/A:1005047906097 (1999).
Hurlburt, N. et al. Heliophysics event knowledgebase for the solar dynamics observatory (sdo) and beyond. Solar Physics 275, 67–78, https://doi.org/10.1007/s11207-010-9624-2 (2012).
Ahmadzadeh, A., Mahajan, S. S., Kempton, D. J., Angryk, R. A. & Ji, S. Toward filament segmentation using deep neural networks. In 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, December 9-12, 2019, 4932–4941, https://doi.org/10.1109/BigData47090.2019.9006340 (IEEE, 2019).
Scholl, I. F. & Habbal, S. R. Automatic Detection and Classification of Coronal Holes and Filaments Based on EUV and Magnetogram Observations of the Solar Disk. Solar Physics 248, 425–439, https://doi.org/10.1007/s11207-007-9075-6 (2008).
Joshi, A. D., Srivastava, N. & Mathew, S. K. Automated Detection of Filaments and Their Disappearance Using Full-Disc Hα Images. Solar Physics 262, 425–436, https://doi.org/10.1007/s11207-010-9528-1 (2010).
Yuan, Y., Shih, F. Y., Jing, J., Wang, H. & Chae, J. Automatic solar filament segmentation and characterization. Solar Physics 272, 101–117, https://doi.org/10.1007/s11207-011-9798-2 (2011).
Karachik, N. V. & Pevtsov, A. A. Properties of Magnetic Neutral Line Gradients and Formation of Filaments. Solar Physics 289, 821–830, https://doi.org/10.1007/s11207-013-0362-0 (2014).
Pötzi, W. et al. Real-time flare detection in ground-based hα imaging at kanzelhöhe observatory. Solar Physics 290, 951–977, https://doi.org/10.1007/s11207-014-0640-5 (2015).
Lin, G. et al. A new comprehensive data set of solar filaments of 100 yr interval. i. The Astrophysical Journal Supplement Series 249, 11, https://doi.org/10.3847/1538-4365/ab92a5 (2020).
Leibacher, J. & GONG Project Team. The global oscillation network group (gong). In AAS/Solar Physics Division Meeting #28, vol. 28 of AAS/Solar Physics Division Meeting, 02.11 (1997).
Hill, F. et al. The global oscillation network group site survey - part one. Solar Physics 152, 321–349, https://doi.org/10.1007/BF00680443 (1994).
Hill, F. et al. The global oscillation network group site survey - part two. Solar Physics 152, 351–379, https://doi.org/10.1007/BF00680444 (1994).
Jain, K., Tripathy, S. C., Hill, F. & Pevtsov, A. A. Continuous solar observations from the ground-assessing duty cycle from gong observations. Publications of the Astronomical Society of the Pacific 133, 105001, https://doi.org/10.1088/1538-3873/ac24d5 (2021).
Diercke, A. et al. A universal method for solar filament detection from Hα observations using semi-supervised deep learning. Astron. Astrophys. 686, A213, https://doi.org/10.1051/0004-6361/202348314 (2024).
The SunPy Community et al. The sunpy project: Open source development and status of the version 1.0 core package. The Astrophysical Journal 890, 68, https://doi.org/10.3847/1538-4357/ab4f7a (2020).
Cotton, W. Fits2jpeg. https://gitlab.nrao.edu/nvss/fits2jpeg (1996). [Online; accessed 13-June-2024].
Pence, W. CFITSIO, v2.0: A New Full-Featured Data Interface. In Mehringer, D. M., Plante, R. L. & Roberts, D. A. (eds.) Astronomical Data Analysis Software and Systems VIII, vol. 172 of Astronomical Society of the Pacific Conference Series, 487 (1999).
Ahmadzadeh, A., Bawa, A. & Chaurasiya, K. Fits2jpeg. https://bitbucket.org/dataresearchlab/mleco-fits2jpegconvertor/ (2023). [Online; accessed 13-June-2024].
Gillies, S. et al. Shapely: manipulation and analysis of geometric objects, https://doi.org/10.5281/zenodo.5597138 (2024-07-13).
GEOS contributors. GEOS computational geometry library. Open Source Geospatial Foundation, https://doi.org/10.5281/zenodo.11396894 (2021).
Ahmadzadeh, A. et al. MAGFILO: Manually Annotated GONG Filaments in H-Alpha Observations, https://doi.org/10.7910/DVN/J6JNVK (2024).
Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46, https://doi.org/10.1177/001316446002000104 (1960).
Fleiss, J. L., Nee, J. C. & Landis, J. R. Large sample variance of kappa in the case of different sets of raters. Psychological bulletin 86, 974, https://doi.org/10.1037/0033-2909.86.5.974 (1979).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174, https://doi.org/10.2307/2529310 (1977).
Pevtsov, A. A., Berger, M. A., Nindos, A., Norton, A. A. & van Driel-Gesztelyi, L. Magnetic Helicity, Tilt, and Twist. Space Science Reviews 186, 285–324, https://doi.org/10.1007/s11214-014-0082-2 (2014).
Hazra, S., Mahajan, S. S., Douglas, W. K. & Martens, P. C. H. Hemispheric Preference and Cyclic Variation of Solar Filament Chirality from 2000 to 2016. The Astrophysical Journal 865, 108, https://doi.org/10.3847/1538-4357/aadb42 (2018).
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant Nos. 2209912 and 2433781, Directorate for Computer and Information Science and Engineering (CISE), Office of Advanced Cyberinfrastructure (OAC). This work utilizes GONG data obtained by the NSO Integrated Synoptic Program, managed by the National Solar Observatory, which is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the National Science Foundation and with contribution from the National Oceanic and Atmospheric Administration. The GONG network of instruments is hosted by the Big Bear Solar Observatory, High Altitude Observatory, Learmonth Solar Observatory, Udaipur Solar Observatory, Instituto de Astrofísica de Canarias, and Cerro Tololo Interamerican Observatory. We also gratefully acknowledge John Britanik (NSO), Niles Oien (NSO), and Sara F. Martin (Helio Research) for their invaluable insight and feedback.
Author information
Contributions
A.A. led the project and oversaw its data processing and software development. R.A. developed software for data collection, pre-processing, and post-processing. K.C. created code for image conversion and set up the annotation platform. L.A.N. developed the GONG H-alpha Viewer web application. V.A. trained the annotators and reviewed their work. P.C.M. offered insights into the physics of filaments, while A.P. (Alexei), L.B., and A.P. (Alexander) provided insights into GONG’s six observatories and the H-alpha images they produce. N.D. assisted with manual annotation and the training of newly recruited annotators. S.M., A.B., and S.H.S. contributed to the data collection and pre-processing of H-alpha images. E.K., R.W., and A.A. (Aya) were involved in the manual annotation of filaments and provided feedback to the annotators. D.J.K. assisted in the deployment of software, including the GONG H-alpha Viewer. P.M.C. contributed to the development of our H-alpha anomaly-detection software.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ahmadzadeh, A., Adhyapak, R., Chaurasiya, K. et al. A dataset of manually annotated filaments from H-alpha observations. Sci Data 11, 1031 (2024). https://doi.org/10.1038/s41597-024-03876-y
DOI: https://doi.org/10.1038/s41597-024-03876-y