Introduction

“Seeing is believing”; thus, imaging is one of the most powerful tools for scientific discovery. While human memory has a tremendous capability to recall images, we cannot associate the images with metadata (e.g., date of data collection, sample properties, or sample processing and provenance). Instruments of scientific discovery, from optical to scanning probe to electron microscopes and spectrometers, generate enormous volumes of images containing information about the properties and structure of materials. While advances in optics and electronics have improved the resolution and expanded the length and timescales of imaging, the downstream analysis tools have not kept pace. Recently, it has been purported that machine learning can extract and represent underlying physics from large datasets1.

Machine learning, however, is only as good as the objective with which it is trained and thus struggles with unstructured exploratory tasks2. There has been an emergence of machine learning tools to accelerate discoveries from imaging sources3. Most commonly, researchers have trained machine learning models on labeled datasets to identify pre-determined features of significance. For example, convolutional neural networks (CNNs) can be trained to identify imaging modes in electron microscopy [e.g., transmission electron microscopy (TEM) vs. scanning TEM (STEM)]4. However, classification is limited by the requirement that researchers know a priori what features they are looking to discover and have at least a small labeled dataset. Research tasks tend to emphasize unknown discoveries, and thus classification is ill-posed. To extract more information from images, CNNs have been extended using a U-Net architecture to segment phases5 and nanoparticles6. Alternatively, autoencoder structures have been leveraged to unmix statistical information from hyperspectral piezoresponse force microscopy7 and current-voltage spectroscopy8. This concept has been extended via the inclusion of rotational invariance to extract the orientation of ferroelectric variants from atomically resolved images9. While autoencoders are powerful in extracting information from imaging data that lies on a narrow distribution, they provide significantly less insight when the data is diverse10.

CNNs are typically constructed of convolutional layers that convolve filters across an image and pooling layers that combine spatial information. During optimization, the weights of the filters are adjusted towards an objective. For classification, the output of the 2-dimensional filters is flattened into a single dimension that eventually predicts the probability that an observed image belongs to a class. When training CNNs as classifiers, the learned filters resemble standard computer vision filters (e.g., edges, lines, etc.)11. CNNs have achieved incredible performance on image classification tasks. For example, it is common for models trained on ImageNet to achieve top-5 accuracy of >98%12. One of the core challenges with materials science datasets is that there are no large, labeled datasets for training. To circumvent the requirement for large databases, a method called transfer learning13 reuses filters learned on large, labeled datasets (e.g., ImageNet) for a new objective where data might be limited. Transfer learning has been used to classify microstructures, predict structure-property relationships, and much more14,15,16. Transfer learning, however, requires a discrete objective for optimization, which is undefined in exploratory data analysis.
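As a minimal sketch of this idea (the class count and training details here are hypothetical, not those of any cited study), a pretrained backbone can be frozen and only a small, new classification head optimized on the limited domain data:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a VGG-16 pretrained on ImageNet and freeze its learned filters
model = models.vgg16(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet head with a new layer for a small,
# domain-specific task (a hypothetical 5-class microstructure problem)
model.classifier[6] = nn.Linear(4096, 5)

# Only the new head's weights are updated during training
optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
```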

In materials imaging, one of the most pervasive concepts is symmetry, periodicity, and long-range order. The mathematics of CNNs are unable to learn this concept parsimoniously17. The process of convolving filters across an image imposes translational equivariance – meaning that the network can detect if a feature exists and where it is located. In practice, to make CNNs computationally tractable, it is common to add pooling layers that use a region-based summary statistic to reduce the dimensionality of the data. As a result of pooling layers, CNNs are commonly translationally invariant, meaning they can detect if a feature exists in an image but cannot determine its precise location. Similarly, if an image were rotated even slightly, the outputs of the filters would change, resulting in a unique trajectory through the network. To combat this, researchers have developed 2D-rotationally equivariant18 and 3D-Euclidean neural networks19; however, it is still an open question how to learn symmetry parsimoniously.
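This distinction can be made concrete with a toy example (a sketch using a single synthetic feature and an all-ones filter): convolution moves the response with the feature, while coarse max-pooling erases the shift entirely.

```python
import torch
import torch.nn.functional as F

# A single-pixel "feature" placed at two nearby positions in an 8x8 image
x1 = torch.zeros(1, 1, 8, 8); x1[0, 0, 3, 3] = 1.0
x2 = torch.zeros(1, 1, 8, 8); x2[0, 0, 3, 4] = 1.0  # shifted one pixel right

# Convolution is translationally equivariant: the response moves with the feature
w = torch.ones(1, 1, 3, 3)
y1 = F.conv2d(x1, w, padding=1)
y2 = F.conv2d(x2, w, padding=1)
print(torch.equal(y1, y2))  # False - the feature maps differ by a shift

# Coarse max-pooling discards position: the pooled outputs become identical
print(torch.equal(F.max_pool2d(y1, 4), F.max_pool2d(y2, 4)))  # True
```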

Despite these inherent limitations, all is not lost. Using over-parameterized models, large datasets, and data augmentation, it is possible to at least approximate symmetry. This means that given enough data, neural networks can be trained to accurately predict symmetry (within the bounds of a dataset) despite having no proper understanding of the concept of symmetry. This is achieved by learning a data-driven discrepancy model that leads to correct narrow-bounded conclusions without any requirement for parsimony. The challenge with learning in this fashion is that if a new example lies on the periphery of the training data distribution, the predictions might be nonsensical. Until neural networks can learn concepts like symmetry parsimoniously, their use in materials microstructures should be restricted to narrow-bounded problems (e.g., binary classification) where subsequent physics-based validation is possible or there is an expert-in-the-loop. Even when it is impossible to learn underlying mathematical concepts, there can be utility in approximations when the task is intractable for human analysis and independent validation is possible. For example, neural-network-based suggestive interactive frameworks provide an unprecedented capability to discover complex correlations in large datasets that are impossible to extract without flexible data-driven models.

Imaging and microscopy are ubiquitous in materials science. For example, the study of ferroelectrics relies on understanding highly ordered nanoscale domain structures. Piezoresponse force microscopy (PFM) is a high-throughput technique used to image domain structures in ferroelectrics. In PFM, a conductive cantilever is brought into contact with a surface, and an AC electrical stimulus is used to measure the local piezoresponse in the lateral and vertical directions. To enhance the signal-to-noise ratio, it is common to measure the response near the cantilever resonance frequency. To mitigate resonant shifts associated with changing tip-surface interactions, dual-frequency resonance tracking (DART) has been implemented by Asylum Research20. This technique uses a dual-frequency excitation with two lock-in amplifiers coupled to a phase-locked loop to measure the piezoresponse at two frequencies simultaneously. It is important to characterize the structure, order, and periodicity in these images to understand structure-property relations.

Here, using a database of 25,133 DART-PFM images, we demonstrate how an interactive, suggestive neural-network framework can be used to discover correlations in scientific images. To do this, we develop two neural-network-based image feature extractors. The first is based on transfer learning from image classifiers pretrained on ImageNet. A second neural network provides symmetry-aware features. We achieve this by generating 1,000,000 images that belong to the 17 wallpaper group symmetries21. We trained a classifier that achieved >99% testing accuracy with a 60%-training, 20%-validation, 20%-testing split. Following feature extraction, we use manifold learning techniques to create two-dimensional projections that correlate images based on their composition and structure irrespective of the length scale of the images. Correlating the projections with metadata (e.g., filenames) demonstrates that trends in experimentalist, experimental timeline, and experimental methods are preserved. Through recursive exploration, more nuanced details of the image similarity are revealed. This provides a method for researchers to identify correlations in large unstructured image databases to accelerate scientific observations and discoveries. If coupled with a structured database containing metadata, the combination of filtering and projections would provide insights into synthesis-structure-property relationships. We democratize this tool by releasing all source code under a 3-clause BSD License, a non-restrictive open-source license.

Results and discussion

Data collection, collation, and feature extraction

We extracted 25,133 DART-PFM images acquired from 2013–2019 within the Lane Martin and Ramamoorthy Ramesh Laboratories at the University of California, Berkeley7,22–316. Within this data, there are PFM images of PbZr1−xTixO3, BiFeO3, PbTiO3/SrTiO3 superlattices, and much more. These films were grown under various growth conditions, epitaxial strains, elastic and electrostatic boundary conditions, and substrate orientations. Additionally, within this dataset, there were various stroboscopic switching studies. Associated with these images, we were provided access to the file and sample names and the sample provenance in the form of handwritten notes.

DART-PFM images contain multichannel information, including the topography, two amplitude and two phase channels, and an estimate of the resonance frequency. CNNs for natural-image classification are designed for red-green-blue (RGB) images. To make PFM images compatible with pretrained neural networks, we stacked the topography with one amplitude and one phase channel to form RGB images. Then, using a pretrained VGG-16 convolutional neural network317 (Fig. 1a) trained on ImageNet318 (~14 million images and 1,000 object categories), we extracted the 4096 latent features from the layer immediately preceding the classification layer for all stacked PFM images. While these features are tailored to classify natural images, mapping images from other domains to this latent space can produce a manifold where image similarity is preserved.
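A minimal sketch of this extraction step (random arrays stand in for real channels, and the ImageNet mean/std normalization used in practice is omitted; the released code may differ in detail):

```python
import numpy as np
import torch
from torchvision import models

# Stack three DART-PFM channels (topography, amplitude, phase) into an
# RGB-like tensor; each channel is min-max scaled to [0, 1]
def to_rgb(topo, amp, phase):
    chans = [(c - c.min()) / (c.max() - c.min() + 1e-12) for c in (topo, amp, phase)]
    return torch.tensor(np.stack(chans), dtype=torch.float32).unsqueeze(0)

vgg = models.vgg16(pretrained=True).eval()

# Keep everything up to (but not including) the final 1000-way classifier,
# yielding the 4096-dimensional penultimate features
head = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

with torch.no_grad():
    x = to_rgb(*(np.random.rand(224, 224) for _ in range(3)))  # placeholder channels
    feats = head(torch.flatten(vgg.avgpool(vgg.features(x)), 1))
print(feats.shape)  # torch.Size([1, 4096])
```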

Fig. 1: Schematic illustration of image similarity computational structure.

a VGG-16 architecture pretrained on the ImageNet database. The gray area represents the pretrained VGG-16 model. FC3 was determined by training an autoencoder to reduce the dimensionality of the feature vector from 4096 to 512. The gear icon indicates that this is the only layer optimized in the VGG-16 model. b ResNet-34 architecture made symmetry-aware by training it to classify wallpaper group symmetries.

To enhance the latent features, we embedded physics awareness into the latent manifold. In microstructural images such as ferroelectric domain structures, long-range order and symmetry are pervasive concepts. To include this symmetry awareness, we built a dataset of 1,000,000 images equally distributed amongst the 17 wallpaper group symmetries. To do this, we sampled various-sized unit cells from the ImageNet dataset and conducted symmetry operations to create images (Supplementary Fig. 1). We make this dataset openly available (see data availability). We then trained a ResNet-34 using an ADAM optimizer to predict the wallpaper group symmetry (Fig. 1b). We refer to this model as ResNet-34-Symm-Aware. Following training, our model achieved >99% testing accuracy (Supplementary Fig. 2).

It is important to note that despite this model achieving >99% test accuracy on predicting symmetry, it does not actually learn the concept of symmetry. Learning or detecting symmetry parsimoniously from images is not currently possible with machine learning or analytical methods. We do the next best thing: we provide symmetry awareness by training a model to approximate symmetry detection. This symmetry awareness improves the model’s performance but should not be relied upon without validation.

We, once again, extracted the 512 latent features from the layer immediately preceding the classification layer. While these latent features cannot accurately determine the symmetry of PFM images, they are sensitive to symmetry in images. Since the latent dimensions of these two feature extractors are of different sizes, we reduced the dimensionality of the natural-image features from 4096 to 512 using a dense autoencoder with a single latent layer (see methods for details). Following training, the latent layers were stacked into a 1024-dimensional feature vector. By including an equal number of natural-image and symmetry-aware features, we consider both symmetry and non-symmetry-related information approximately equally.
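A sketch of this balancing step (random tensors stand in for the extracted features; the layer sizes follow the text, other details are assumptions):

```python
import torch
import torch.nn as nn

# Single-latent-layer dense autoencoder: compress 4096-d VGG-16 features
# to 512-d so they can be balanced against the 512-d symmetry-aware features
class DenseAE(nn.Module):
    def __init__(self, in_dim=4096, latent_dim=512):
        super().__init__()
        self.encoder = nn.Linear(in_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z

ae = DenseAE()
vgg_feats = torch.randn(8, 4096)   # placeholder VGG-16 features
symm_feats = torch.randn(8, 512)   # placeholder symmetry-aware features

_, z = ae(vgg_feats)
combined = torch.cat([z, symm_feats], dim=1)  # 1024-d stacked feature vector
print(combined.shape)  # torch.Size([8, 1024])
```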

Manifold learning

To further reduce the dimensionality of the latent space to a tenable two dimensions, we used Uniform Manifold Approximation and Projection (UMAP)319,320. This technique belongs to the class of neighbor-graph manifold learning techniques, of which t-distributed stochastic neighbor embedding (t-SNE)321 is the most common. UMAP works by building complexes of simplices that represent the topological space. Based on the Nerve theorem, it can be shown that all the vital topology of a data manifold can be preserved. For the mathematics to work, the data must be uniformly distributed on a manifold – a condition absent in real data. This can be corrected by defining a Riemannian metric on the manifold that makes this true. Essentially, this takes regions of the manifold and distorts the distance metric such that the data maps to a uniformly distributed Euclidean space. Using fuzzy topology and under the assumption that the manifold is locally connected, it is possible to avoid the curse of dimensionality by making the local connectivity distributions independent of the dimensionality. These different representations of the manifold space are combined using a probabilistic fuzzy union of weighted edges \(f(\alpha, \beta) = \alpha + \beta - \alpha \cdot \beta\). This graph is then translated to a low-dimensional manifold by minimizing the cross-entropy \(\sum_{\alpha \in A} \mu(\alpha)\log\left(\frac{\mu(\alpha)}{\nu(\alpha)}\right) + \left(1 - \mu(\alpha)\right)\log\left(\frac{1 - \mu(\alpha)}{1 - \nu(\alpha)}\right)\) between the two graphs, where \(\mu\) and \(\nu\) are the edge weights of the high- and low-dimensional graphs, respectively. In this equation, the first term tries to correctly group local connectivity, while the second term tries to ensure the distance between groups is represented accurately. This can be optimized using stochastic gradient descent with negative sampling. UMAP has significant advantages over t-SNE in that it creates more uniform projections, scales significantly better, and has an adaptable mathematical framework. For this work, we used the original open-source implementation319.
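In practice, this machinery is exposed through a simple API; a minimal sketch using the umap-learn implementation319 with the hyperparameters used throughout this work (random features stand in for the real 1024-dimensional vectors):

```python
import numpy as np
import umap

# Project stacked 1024-d features to 2D with the hyperparameters used in
# this work; the feature matrix here is a random placeholder
features = np.random.rand(1000, 1024)
reducer = umap.UMAP(n_neighbors=5, min_dist=0.3, n_components=2,
                    metric="correlation")
embedding = reducer.fit_transform(features)
print(embedding.shape)  # (1000, 2)
```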

Image similarity search in natural images

To demonstrate this concept, we conduct a toy demonstration using the Caltech 101 image dataset322. This dataset contains images from 101 distinct categories, each of which contains roughly 40–800 images. Since these images are not symmetry-sensitive, only the VGG-16 network trained on the ImageNet database was used as a feature extractor. We note that the symmetry-aware model resulted in significantly worse results, as symmetry is not a good descriptor of natural images (Supplementary Figs. 3–4). Following feature extraction, UMAP was used to project the feature manifold into a 2-dimensional space. We show the resulting UMAP image similarity manifold (Fig. 2; closeup images are provided in Supplementary Figs. 5–7). Looking at this projection, much of the important similarity is preserved. For example, image categories such as people, airplanes, motorcycles, soccer balls, and stop signs are tightly clustered. More interestingly, similar categories, for example, vehicles (bottom—circle, Fig. 2), animals (top—star, Fig. 2), and household furnishings (center left—square, Fig. 2), are grouped. It is important to note that this occurs without any categorical association between the classes, only similarities in the images. This demonstrates that UMAP projections of neural-network features preserve local similarity between images of the same class and more global similarity associated with commonalities amongst classes. The ability of this methodology to preserve global similarity means it is possible to identify similarities between uncorrelated images.

Fig. 2: UMAP representation of Caltech 101 dataset.

The hyperparameters for the UMAP were n_neighbors = 5, min_dist = 0.3, n_components = 2, metric = “correlation”.

Image similarity search in piezoresponse force microscopy images

Having demonstrated the capability of this approach on natural images, we applied this concept to our database of 25,133 PFM images. We extracted features using both the VGG-16 natural-image and ResNet-34 symmetry-aware models. We show the 2D UMAP projection (Fig. 3) with n_neighbors = 5, min_dist = 0.3, n_components = 2, metric = “correlation”. The projections were relatively insensitive to reasonable hyperparameter choices (Supplementary Figs. 8–10). All 25,133 PFM images are included in this projection (black dots); however, we show only 1,000 random images (Fig. 3; additional closeups are provided in Supplementary Figs. 11–15). Further details regarding the role of feature extractors on PFM images are discussed in Supplementary Fig. 16.

Fig. 3: UMAP representation of 25,133 piezoresponse force microscopy images.

The hyperparameters for the UMAP were n_neighbors = 5, min_dist = 0.3, n_components = 2, metric = “correlation”. Only the amplitude channel is shown.

Similar to the natural images, this projection preserves similarity between the images. For example, there are clusters of (001)-oriented PbZr1−xTixO3 with c/a/c/a domain structures (Fig. 3, green triangle). These are adjacent to extended clusters of striped-phase BiFeO3 (Fig. 3, blue square) and (111)-oriented PbZr1−xTixO3 (Fig. 3, red circle). The large section on the top right-hand side of the image shows experiments with tip-induced 180-degree switching (Fig. 3, purple star), and the bottom shows regions where there was minimal PFM contrast (Fig. 3, yellow diamond). Such results indicate that the neural-network feature extractor combined with UMAP projections can conduct fuzzy classification and association of microstructures based solely on the images. Furthermore, this method preserves semantic relationships between microstructures despite differences in imaging conditions, magnification, tip, date, acquisition time, etc.

To explore how this approach preserves image similarity, we conducted studies where a single image was selected (shown with a dashed box), and its 9 nearest neighbors in the UMAP Euclidean space were identified (Fig. 4). In the example shown in the main text (additional examples in Supplementary Figs. 17–19), an image of a ~400-nm-thick PbZr0.2Ti0.8O3/BaSrRuO3/SmScO3 film with dense c/a/c/a domain structures was selected. Using the VGG-16 model without symmetry awareness as a feature extractor resulted in 9 nearest neighbors that at first glance appear to have similar structures. Upon closer inspection, only 2 of the nearest neighbors are PbZr1−xTixO3 with c/a/c/a domain structures (Fig. 4, green star), with the rest having a striped BiFeO3 domain structure (Fig. 4, red x). Conversely, when using the combined VGG-16 and ResNet-34-Symm-Aware feature extractor, all but one (Fig. 4, red x) of the 9 nearest neighbors had a similar PbZr1−xTixO3 c/a/c/a domain structure (Fig. 4, green star). Furthermore, nearest neighbors 1 and 4 (Fig. 4, diamond) are images obtained from a similarly synthesized film by a different grower. This and other examples highlight the importance of including symmetry awareness to improve image similarity projections of PFM images of ferroelectric domain structures.
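Retrieving such neighbors is straightforward once the projection exists; a sketch using scikit-learn (the coordinates and query index here are placeholders):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# embedding: (n_images, 2) UMAP coordinates; pick a query image and return
# its 9 nearest neighbors in the projected Euclidean space
embedding = np.random.rand(25133, 2)  # placeholder coordinates
query_idx = 42                        # hypothetical selected image

nn = NearestNeighbors(n_neighbors=10).fit(embedding)  # 10 = query + 9 neighbors
dists, idxs = nn.kneighbors(embedding[query_idx:query_idx + 1])
neighbors = idxs[0][1:]  # drop the query itself; sorted by increasing distance
print(neighbors)
```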

Fig. 4: Comparison to UMAP-projections using natural image and symmetry-aware features.

Nine (9) nearest neighbors in the UMAP-projected space near a randomly selected piezoresponse force microscopy image for the (a) VGG-16 and (b) ResNet-34-Symm-Aware models. The images determined to be semantically similar are marked with a green star, and those deemed different are marked with a red x. These images were extracted by finding the nearest neighbors in Euclidean space from a selected image in the UMAP projection. The selected image is marked as the original. The nearest neighbors are sorted by increasing Euclidean distance from the original randomly selected image.

This technique becomes even more powerful when metadata filters are applied. Here, we filter the data to only those samples grown by Dr. Ruijuan Xu. Owing to the narrower distribution of the data, latent space projections automatically cluster similar samples and experiments. For instance, ordered (Fig. 5, green square) and frustrated (Fig. 5, orange circle) (111)-oriented, and (001)-oriented (Fig. 5, blue triangle) PbZr1−xTixO3 are clustered close together. Furthermore, experiments such as box-in-a-box (Fig. 5, pink plus), single-point (Fig. 5, yellow diamond), and patterned ferroelastic switching (Fig. 5, green x) are all identified. We once again note that these similarities are identified regardless of image magnification, rotation angle, etc. Furthermore, it is somewhat uncanny how associations between similar yet different classes are preserved. For instance, the ordered, frustrated, and switching studies on (111)-PbZr1−xTixO3 are neighbors. Similarly, all switching-type studies are grouped. This demonstrates that sequential searching and projection of images based on metadata filters provides a pathway to discovering correlations in unstructured image databases.

Fig. 5: UMAP projection of samples made by Dr. Ruijuan Xu.

Regions of semantic similarity are identified. The hyperparameters for the UMAP were n_neighbors = 5, min_dist = 0.3, n_components = 2, metric = “correlation”.

Since image similarity is, by definition, a fuzzy metric, the utility of this approach hinges on interactive human-in-the-loop exploration toward an evolving objective. To provide this interaction, we have created a platform for recursive exploration of microstructural images. We built a Bokeh dashboard to conduct recursive searching of symmetry-aware projections. The front-facing dashboard has three panels: an initial projection, a recursive projection, and a metadata viewer (Fig. 6a). Following neural-network-based feature extraction, UMAP projections of an entire database or a subset of filtered images are visualized. Using the Bokeh server, it is possible to pan and zoom to explore this projection.

Fig. 6: Demo of interactive graphical user interface of the image similarity latent manifold using the Bokeh Visualization Library.

a Graphical user interface as rendered in the web browser. b Initial UMAP projection. c Area-of-interest selection. d Recursive UMAP projection. e Zoomed-in view of the recursive UMAP projection. f Exploration of the metadata associated with the images.

Furthermore, it is possible to select specific images individually or within a region of interest (Fig. 6b). Once selected, the software automatically computes a recursive UMAP projection of the selection. In the example shown, we selected an image from a set of studies where patterns of ferroelectric domains were written (Fig. 6c). The application selects all images within the selection and recomputes the UMAP projection. By narrowing the distribution in the feature space, subsequent two-dimensional UMAP projections refine the similarities identified. In this example (Fig. 6d–e), we show that the new projection captures detailed semantic similarities; in this case, all images with similar focused-ion-beam-milled regions. We provide additional examples of recursive searching (Supplementary Figs. 20–22). In this tool, recursive projections can also be explored interactively. A tooltip can be configured within this module to display metadata associated with an image object (Fig. 6f). We provide a video (Supplementary Video 1) highlighting some of the interactive features of this software package. We have released the software package Recursive_Symmetry_Aware_Materials_Microstructure_Explorer under the open BSD-3-Clause License to democratize this tool for widespread use. In this package, we include the trained VGG-16 and ResNet-34-Symm-Aware models. A user can supply their data in RGB format and use the interactive visualization toolkit. We provide an example of this using images scraped from Google image search. More information is available at https://recursive-symmetry-aware-materials-microstructure-explorer.readthedocs.io/en/latest/.
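A minimal static sketch of the dashboard’s projection panel using Bokeh (the released application runs as a Bokeh server with recursive re-projection; the data and metadata here are placeholders):

```python
import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool, LassoSelectTool

# Static stand-in for the projection panel: a UMAP scatter with lasso
# selection and a metadata tooltip (random coordinates, hypothetical filenames)
source = ColumnDataSource(data=dict(
    x=np.random.rand(500), y=np.random.rand(500),
    filename=[f"image_{i}.ibw" for i in range(500)],
))

p = figure(title="UMAP projection", tools="pan,wheel_zoom,reset")
p.add_tools(LassoSelectTool())
p.scatter("x", "y", source=source, size=5)
p.add_tools(HoverTool(tooltips=[("file", "@filename")]))
show(p)
```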

Benchmarking

This interactive, recursive searching and exploration has to be fast and scalable on conventional workstations to be viable. We conducted benchmarks on a Lambda Labs workstation with 16 cores, 64 GB of RAM, and two NVIDIA 2080 RTX GPUs. Feature extraction from our entire dataset using both models on a single GPU progressed at ~150 iterations/second. Since feature extraction is only required once and deep learning inference is embarrassingly parallel, only ~3 min were needed to process the 25,133 DART-PFM images. We tested UMAP speed by computing projections of \(2^n\) points for n = 4–19. For \(2^{19}\) points, the UMAP projection took less than 300 s, and \(2^{15}\) points took less than 30 s (Supplementary Fig. 23). We note that it would be rare that more than 500,000 images would be searched without applying metadata filters that reduce the number of images being compared.
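A sketch of how such a scaling benchmark can be reproduced (random features stand in for real ones, and absolute timings depend on hardware):

```python
import time
import numpy as np
import umap

# Time UMAP projections for dataset sizes 2^n, n = 4..19, using the same
# hyperparameters as the main analysis; 1024-d random features as input
for n in range(4, 20):
    data = np.random.rand(2 ** n, 1024).astype(np.float32)
    start = time.time()
    umap.UMAP(n_neighbors=5, min_dist=0.3, n_components=2,
              metric="correlation").fit_transform(data)
    print(f"2^{n} = {2 ** n:>7d} points: {time.time() - start:.1f} s")
```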

In summary, we demonstrate, develop, and democratize a method to interactively and recursively search large unstructured databases of microstructural images. We achieve this by generating a wallpaper group symmetry dataset and training a ResNet-34 model to predict the symmetry. Using this network, combined with a pretrained VGG-16 model, we extract features and create UMAP projections of 25,133 DART-PFM images. We demonstrate that we can enhance the semantic projections of image similarity by including symmetry awareness in the model. We show how recursive searching with the help of metadata filters can elucidate hidden correlations in large unstructured image databases. This toolkit is made openly available under non-restrictive open-source licenses. This work motivates increased sharing of open data and emphasis on data and metadata curation as a pathway to increase the scientific impact of costly experiments and accelerate scientific progress.

Methods

Wallpaper group symmetry dataset

The wallpaper group symmetry dataset was generated using a custom Python script. First, this script scraped a random image from the ImageNet dataset. Then, a random swatch of variable size in the shape of the primitive unit cell was selected. This unit cell was rotated in Euclidean space, and then the symmetry operators were applied to generate an image, which was cropped to 256 × 256 pixels.
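As a simplified sketch of this generation step for a single group (pmm, generated by mirrors along both axes; the actual script handles all 17 groups and includes the random rotation, omitted here), the symmetry operators amount to flips and tiling of the sampled unit cell:

```python
import numpy as np
from PIL import Image

def pmm_image(image, cell_size=32, out_size=256):
    """Generate a pmm wallpaper-group image from a random unit cell."""
    arr = np.asarray(image)
    # Sample a random square unit cell from the source image
    y = np.random.randint(0, arr.shape[0] - cell_size)
    x = np.random.randint(0, arr.shape[1] - cell_size)
    cell = arr[y:y + cell_size, x:x + cell_size]

    # Apply the pmm mirror operators to build a 2x2 fundamental block
    top = np.concatenate([cell, np.fliplr(cell)], axis=1)
    block = np.concatenate([top, np.flipud(top)], axis=0)

    # Tile the block across the plane and crop to the target size
    reps = -(-out_size // block.shape[0])  # ceiling division
    tiled = np.tile(block, (reps, reps, 1))[:out_size, :out_size]
    return Image.fromarray(tiled)

# Usage with a hypothetical source file:
# pmm_image(Image.open("source.jpg")).save("pmm.png")
```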

Training symmetry-aware feature extractor

A ResNet-34 model from PyTorch was downloaded, and a classification layer of size 17 was added to the model. We trained the model on the 1,000,000 generated wallpaper group symmetry images with a 60%-training/20%-validation/20%-testing split to predict the wallpaper group symmetry of each image. The model was trained for six (6) epochs on an NVIDIA Titan RTX GPU. The model was optimized using an ADAM optimizer with a learning rate of 3 × 10−5. The learning rate was scheduled using the fit-one-cycle policy. Further details are provided in the Supplemental Materials.
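A sketch of this setup in PyTorch (pretrained weights and the batch size, and hence steps per epoch, are assumptions; the training loop is abbreviated):

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-34 with a 17-way head for the wallpaper group classes
model = models.resnet34(pretrained=True)  # pretrained weights assumed
model.fc = nn.Linear(model.fc.in_features, 17)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()

# One-cycle learning-rate schedule ("fit-one-cycle"); steps_per_epoch assumes
# 600,000 training images at a hypothetical batch size of 64
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-5, epochs=6, steps_per_epoch=600_000 // 64)

# Abbreviated per-batch update (train_loader construction not shown)
# for images, labels in train_loader:
#     loss = criterion(model(images), labels)
#     optimizer.zero_grad(); loss.backward()
#     optimizer.step(); scheduler.step()
```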

Feature extraction of PFM images

PFM images were constructed into red-green-blue images by stacking the height, first-amplitude, and phase channels from the DART-PFM images. All images were normalized using a min-max scaler applied to each channel. Features were extracted by computing the latent vector immediately preceding the classification layer in both the VGG-16 and ResNet-34-Symm-Aware models. Following feature extraction, a single-layer dense autoencoder was used to reduce the VGG-16 features from 4096 to 512 dimensions such that they are balanced with the symmetry-aware features. We trained the autoencoder for 50 epochs using ADAM, achieving a final loss of 0.07. The manifold of these features was then learned using UMAP as described in the main text. The hyperparameters for UMAP were n_neighbors = 5, min_dist = 0.3, n_components = 2, metric = “correlation”. Further details regarding the computational pipeline and visualization methods are available in the openly released, reproducible open-source code.