Introduction

Image segmentation is a fundamental technique with important applications across various disciplines. For instance, many CT scans and X-rays rely on segmentation to detect tumors or locate lesions within organs. In the case of breast cancer, it was estimated that approximately 15 Canadian women would succumb to the disease daily in 20241. Detecting lymph nodes plays a crucial role in both the staging and treatment of breast cancer2. Beyond medical imaging, rapid and accurate wildfire detection from satellite imagery has become increasingly urgent. In 2023, Canadian wildfires burned nearly 7.8 million hectares of forest, representing over a quarter of global tree cover loss3. Image segmentation is also crucial in biology. For instance, Gooh et al.4 used optical manipulation to collect 400 images of Arabidopsis early embryogenesis at regular intervals to observe cell division. Accurate cell counting was a prerequisite for this analysis, necessitating effective image segmentation methods.

It is essential to identify or develop robust image segmentation methods to address these issues. A widely known algorithm used in image segmentation is Otsu's method5. It has been cited as an effective technique6,7, but the Otsu threshold deviates significantly from the intersection point of the two classes, skewing towards the center of the class with the larger variance. This causes it to fail when handling peaks with differing variances8. Besides Otsu's method, several other methods use thresholds to segment images. For example, Kapur et al.9 introduced an entropy-based technique for image segmentation, where entropy quantifies the amount of image information; the maximum entropy algorithm determines the optimal threshold by maximizing the combined entropy of the background and foreground segments. Kittler and Illingworth10 developed a minimum error thresholding technique. More recent methodologies have explored semi-automatic algorithms for histogram thresholding11. Zhang and Hu12 developed image segmentation based on the 2D Otsu method with histogram analysis and demonstrated that the two-dimensional Otsu method performs better than the one-dimensional method in segmenting images with a low signal-to-noise ratio. However, these methods lack rigorous mathematical or statistical validation. Our approach applies a statistical method for change point detection based on the Energy-Based Model (EBM), extending single-threshold segmentation to multiple-threshold segmentation. Building on the traditional approach, we developed the Energy-Based Model segmentation method as an enhancement. The EBM can represent any statistical distribution and effectively model the gray-level histograms of images. This capability suggests that EBM can be applied to image segmentation tasks based on histogram data.

The objective of EBM is to derive an energy function that accurately represents the probability distribution of data. Originating from physics, an energy function often describes the potential energy of a system. The model is inspired by the Boltzmann distribution, in which the probability of a system occupying a given state at a fixed temperature decreases exponentially with the state's energy13. Over time, EBM has been extensively applied within the field of computing. The EBM approach offers a unified theoretical framework applicable to a wide range of learning models, encompassing traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and various manifold learning techniques14. Furthermore, the exponential tilting density has notable connections with EBM and can represent any statistical distribution. Compared to probabilistic methods, EBM can be interpreted as a non-probabilistic factor graph, offering significantly greater flexibility in designing architectures and establishing training criteria. Thus, EBM has considerable potential for wide application in statistics.

Our primary objective is to detect appropriate thresholds in the gray-level histograms of images such as CT scans, X-rays, cell images, and satellite wildfire images. The localization of the threshold is crucial for this task: unlike everyday imaging, even small errors can seriously affect the results of professional image segmentation. As shown in Fig. 1, we converted the colour image to a grayscale image and plotted its grayscale histogram; based on this histogram, we could then identify the change point (a specific grayscale value) in a bimodal histogram. This grayscale value was then used as the threshold, converting pixels with values above it to black and those below it to white. However, for multiple change points, binary segmentation is insufficient. As shown in Fig. 2, since there are five change points, we need to divide the image into six segments, corresponding to six distinct colours.

Fig. 1. The process of a change point segmentation.

Fig. 2. The process of multiple change point segmentation.

Determining thresholds is essential for effective image segmentation. To achieve this, we applied EBM to estimate the image histogram and used a change point detection method to facilitate segmentation.

Modelling

EBM with non-Gaussian noise

Based on the Central Limit Theorem, the normalized grayscale histograms of many images can be viewed as one-dimensional Gaussian distributions or Gaussian mixtures15. There exists a baseline density, such as the one-dimensional Gaussian density defined as

$$\begin{aligned} f_b(x) = \frac{1}{\sqrt{2 \pi \sigma ^2}} \exp \left( -\frac{(x - \mu )^2}{2 \sigma ^2}\right) , \end{aligned}$$
(1)

where \(\mu \) represents the mean and \(\sigma \) represents the standard deviation. Since the Gaussian distribution belongs to the exponential family, (1) can also be written as

$$\begin{aligned} f_b(x)=\exp \{-(k_0+k_1x+k_2x^2)\}, \end{aligned}$$
(2)

where \(k_0=\log {\sqrt{2 \pi \sigma ^2}}+\frac{\mu ^2}{2\sigma ^2}\), \(k_1=-\frac{\mu }{\sigma ^2}\), and \(k_2=\frac{1}{2\sigma ^2}\). It is worth noting that some normalized grayscale image histograms cannot be modeled as Gaussian distributions or Gaussian mixtures16. To tackle this problem, we developed an extended probabilistic model covering both Gaussian and non-Gaussian distributions. Given a baseline density \(f_b(x)\), we can obtain the optimal extension according to the Kullback–Leibler (KL) divergence, defined as

$$\begin{aligned} KL(g || f_b) = \int g(x) \log \frac{g(x)}{f_b(x)} \, dx, \end{aligned}$$
(3)

where a smaller KL divergence value indicates that the distribution \( g(x) \) is closer to \( f_b(x) \). Therefore, minimizing the KL divergence can be interpreted as a way of finding the \( g(x) \) that best approximates \( f_b(x) \).
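As a quick numerical illustration of (3), the KL divergence can be approximated on a grid. The following sketch uses two illustrative Gaussian densities whose parameters are chosen purely for demonstration:

```python
import numpy as np

def gauss(x, mu, sigma):
    # One-dimensional Gaussian density, as in (1).
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Numerical KL(g || f_b) of (3) on a fine grid; illustrative parameters only.
x = np.linspace(-6, 6, 4001)
g, fb = gauss(x, 0.3, 1.1), gauss(x, 0.0, 1.0)
kl = np.trapz(g * np.log(g / fb), x)  # shrinks toward 0 as g approaches f_b
```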

Next, we aim for \( g^*(x) \) to closely approximate the reference density \( f_b(x) \) in an information-theoretic sense. To encompass a broader range of potential expressions, following Shi et al.17, we let \( d\mu (x) = f_b(x)\,dx \) be a positive measure on \( \mathbb {R} \) and define \( \mathscr {F}_{\mu } \) as the collection of positive probability densities \( f \) on \( \mathbb {R} \) such that

$$\begin{aligned} \int h_l(x) f(x) f_b(x)dx = m_l, \quad l = 0, 1, \dots , p, \end{aligned}$$
(4)

where \( h_0(x) \equiv 1 \), \( m_0 = 1 \), \( h_l(x)=-x^{l+2} \) for \( l = 1, \dots , p \), and \( m_1, \dots , m_p \) are real numbers. In particular, \( \mathscr {F} \) represents the special case of \( \mathscr {F}_{\mu } \) with \( d\mu (x) = dx \). If \( f \in \mathscr {F} \) and \( f_b(x) > 0 \), then \( \frac{f}{f_b} \in \mathscr {F}_{\mu } \). Let \( c_0, \dots , c_p \) be real constants such that

$$\begin{aligned} f^*(x)&:= \exp \left\{ \sum _{l=0}^p c_l h_l(x) \right\} \in \mathscr {F}_{\mu }. \end{aligned}$$
(5)

In light of Shi et al.17, the optimal choice in terms of image information is the one that minimizes the KL divergence:

$$\begin{aligned} g^*(x)=\arg \min _{g(x) \in \mathscr {F}} KL\{g(x) || f_b\}= f^*(x)f_b(x), \end{aligned}$$
(6)

which can also be represented by

$$\begin{aligned} \left\{ \begin{aligned}&g^*(x)= \exp \{-X^{\top }\varvec{\gamma }\}, \\&\int g^*(x) \, dx = 1. \end{aligned} \right. \end{aligned}$$
(7)

This represents a specific form of EBM, where \( x \in \mathbb {R} \), \( X = (1, x, x^2, \dots , x^p)^{\top } \) is a \((p+1)\)-dimensional vector of predictors, and \( \varvec{\gamma } = (\gamma _0, \gamma _1, \gamma _2, \dots , \gamma _p)^{\top }=(c_0+k_0, k_1, k_2, c_1, c_2, \dots )^{\top } \) collects the corresponding coefficients. Note that this model remains feasible even for large \(p\), indicating its ability to handle images with non-Gaussian noise.
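To make (7) concrete, the following minimal sketch evaluates an EBM density on a grid and enforces the normalization constraint numerically; the helper name and the coefficient values are our own illustrative choices, not part of the method's specification.

```python
import numpy as np

def ebm_density(gamma, x):
    # Evaluate exp(-X^T gamma) on the grid x and normalize it numerically
    # so that it integrates to one, matching the constraint in (7).
    p = len(gamma) - 1
    X = np.vander(x, p + 1, increasing=True)  # columns: 1, x, x^2, ..., x^p
    g = np.exp(-X @ gamma)
    return g / np.trapz(g, x)

# The standard normal is recovered with gamma = (log sqrt(2*pi), 0, 1/2),
# i.e., the exponential-family coefficients k_0, k_1, k_2 of (2).
x = np.linspace(-5.0, 5.0, 1001)
g = ebm_density(np.array([0.5 * np.log(2 * np.pi), 0.0, 0.5]), x)
```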

EBM with single change point

By the definition of the energy-based model in (7), and following the change point analysis of Jin et al.18, we develop a change point detection method based on the energy model. To facilitate image segmentation, we extend the EBM \(g^*(x)\) to an EBM with a change point, \(G(x)\), defined below:

$$\begin{aligned} \left\{ \begin{aligned}&G(x):=\exp \{-X^{\top }\varvec{\gamma _{0}}I\left( x \le \tau \right) -X^{\top }\varvec{\gamma _{1}}I\left( x > \tau \right) \},\\&\int G(x) dx=1, \end{aligned} \right. \end{aligned}$$
(8)

where \(X=\left( 1, x,x^{2},\ldots , x^{p}\right) ^{\top }\) is a \((p+1)\)-dimensional vector of predictors, \(\varvec{\gamma _{0}}=\left( \gamma _{0,0}, \gamma _{1,0},\ldots , \gamma _{p,0}\right) ^{\top }\) and \(\varvec{\gamma _{1}}=\left( \gamma _{0,1},\gamma _{1,1}, \gamma _{2,1},\ldots , \gamma _{p,1}\right) ^{\top }\) are unknown \((p+1)\)-dimensional vectors of regression coefficients, and \(\tau \) is an unknown change point location.

Since the normalized grayscale histogram of an image is bounded and discrete in practical applications, we define \( G_i \) as the vertical coordinate of the histogram and \( x_i \) as the horizontal coordinate, introducing the error term \( \varepsilon _{i,n} \) and imposing normalization:

$$\begin{aligned} \begin{aligned} G_{i}:=\exp \{-(X_{i}^{\top }\varvec{\beta _{0}}+\varepsilon _{i,n})I\left( a \le i \le \tau \right) -(X_{i}^{\top }\varvec{\beta _{1}}+\varepsilon _{i,n})I\left( \tau < i \le b \right) \}, \end{aligned} \end{aligned}$$
(9)

where \(\sum _{i = a}^{b} G_i=1\), \(X_{i}=\left( 1, x_{i},x_{i}^{2},\ldots , x_{i}^{p}\right) ^{\top }\) is a \((p+1)\)-dimensional vector of predictors, \(\varvec{\beta _{0}}=\left( \beta _{0,0}, \beta _{1,0},\ldots , \beta _{p,0}\right) ^{\top }\) and \(\varvec{\beta _{1}}=\left( \beta _{0,1},\beta _{1,1}, \beta _{2,1},\ldots , \beta _{p,1}\right) ^{\top }\) are unknown regression coefficients, \(a\) and \(b\) represent the endpoints of the pixel value range with \(a<b \in \mathbb {Z}_{\ge 0}\), and \(n=b-a+1\) is the total number of possible pixel values (e.g., \(n=2^8\) for an 8-bit image).

Change point detection in grayscale histogram

In light of Csörgő & Horvath19 and Jin et al.18, whose methods for detecting change points in linear regression have been rigorously validated through mathematical proofs, we use least squares regression to estimate the unknown change point \(\tau \), since polynomial regression is a special case of linear regression. We divide the data sequence into two segments using \(\theta \) as the dividing point and employ least squares polynomial regression within each segment, resulting in the following least squares estimates of the coefficients:

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{\hat{\beta }_{0,\theta }}= \left( \varvec{X_{1,\theta }}^{\top } \varvec{X_{1,\theta }}\right) ^{-1} \varvec{X_{1,\theta }}^{\top } \log \varvec{G_{1,\theta }} \\ \varvec{\hat{\beta }_{1,\theta }}= \left( \varvec{X_{2,\theta }}^{\top } \varvec{X_{2,\theta }}\right) ^{-1} \varvec{X_{2,\theta }}^{\top } \log \varvec{G_{2,\theta }}, \end{array}\right. \end{aligned}$$
(10)

where \(\varvec{G_{1,\theta }}=\left( G_{a}, G_{a+1},\ldots , G_{\theta }\right) ^{\top }\), \(\varvec{G_{2,\theta }}=\left( G_{\theta +1}, G_{\theta +2},\ldots , G_{b}\right) ^{\top }\), \(\varvec{X_{1,\theta }}=\left( X_{a}, X_{a+1},\ldots , X_{\theta }\right) ^{\top }\), and \(\varvec{X_{2,\theta }}=\left( X_{\theta +1}, X_{\theta +2},\ldots , X_{b}\right) ^{\top }\). Then, we apply these coefficients to estimate \(G_{i}\),

$$\begin{aligned} \left\{ \begin{array}{l} \log \hat{G}_{i}(\theta )=X_{i}^{\top } \varvec{\hat{\beta }_{0,\theta }},~~ a\le i \le \theta ,\\ \log \hat{G}_{i}(\theta )=X_{i}^{\top }\varvec{\hat{\beta }_{1,\theta }},~~ \theta < i \le b, \end{array}\right. \end{aligned}$$
(11)

Following Csörgő & Horvath19, we specify the following least squares criterion

$$\begin{aligned} M(\theta )=\sum _{ a+p+1 \le i \le \theta }{\left| \log G_{i}-\log \hat{G}_{i}(\theta )\right| ^{2}}+ \sum _{\theta < i \le b-p-1}{\left| \log G_{i}-\log \hat{G}_{i}(\theta )\right| ^{2}}. \end{aligned}$$
(12)

The least squares estimator of \(\tau \) is

$$\begin{aligned} \hat{\theta }_{n}=\underset{\theta }{\operatorname {arg min}} ~M(\theta ). \end{aligned}$$
(13)
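A minimal sketch of the estimator (10)–(13) is given below; it fits a degree-\(p\) polynomial to \(\log G_i\) on each side of every candidate \(\theta \) and returns the minimizer of \(M(\theta )\). This is an illustrative helper written for this exposition (the function name is our own), not the authors' released code, and it assumes all \(G_i\) are strictly positive so that the logarithm is defined.

```python
import numpy as np

def estimate_change_point(G, x, p=2):
    # Least squares change point search of (10)-(13): for each candidate
    # theta, fit one degree-p polynomial to log G on each segment and keep
    # the theta minimizing the summed squared residuals M(theta).
    logG = np.log(G)
    n = len(G)
    best_theta, best_M = None, np.inf
    for theta in range(p + 1, n - p - 2):  # leave room to fit p+1 coefficients
        M = 0.0
        for seg in (slice(0, theta + 1), slice(theta + 1, n)):
            X = np.vander(x[seg], p + 1, increasing=True)
            beta, *_ = np.linalg.lstsq(X, logG[seg], rcond=None)
            M += float(np.sum((logG[seg] - X @ beta) ** 2))
        if M < best_M:
            best_theta, best_M = theta, M
    return best_theta, best_M
```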

Subsequently, we employ the likelihood ratio test to validate the existence of a change point at \(\hat{\theta }_n\). Following Csörgő & Horvath19, under a set of assumptions C.1–C.5 (see Supplementary Equations S1 online), the test statistic is given by

$$\begin{aligned} T=n\left( \hat{\sigma }^2-M(\hat{\theta }_{n})\right) /\hat{\sigma }^2, \end{aligned}$$
(14)

where \( \hat{\sigma }^2=\sum _{ a \le i \le b}{\left| \log G_{i}-\log \hat{G}_{i}\right| ^{2}}\) is the residual sum of squares of a single polynomial regression fitted over the whole interval (i.e., under the null hypothesis of no change point). We claim that there exists a change point at \(\hat{\theta }_{n}\) if

$$\begin{aligned} T > T^\prime = c + 2d \log (-2 / \log (1 - \alpha )), \end{aligned}$$
(15)

where \(c = \left( 2 \log \log n + (p+1)(\log \log \log n)/2-\log \Gamma ((p+1)/2)\right) ^2/(2\log \log n)\), \(d=\left( c/(2\log \log n)\right) ^{1/2}\), \(\alpha \) is the significance level, and \(\Gamma (\cdot )\) denotes the Gamma function.
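The critical value \(T^\prime \) of (15) is straightforward to compute; a small sketch (the function name is ours) is given below. As a sanity check, with \(n=256\), \(p=2\), and \(\alpha =0.001\) it reproduces the value \(T^\prime \approx 24.85\) used in Example 1 below.

```python
import math

def critical_value(n, p, alpha=0.05):
    # Critical value T' of (15), with c and d as defined in the text.
    lln = math.log(math.log(n))  # log log n
    c = (2 * lln + (p + 1) * math.log(lln) / 2
         - math.lgamma((p + 1) / 2)) ** 2 / (2 * lln)
    d = math.sqrt(c / (2 * lln))
    return c + 2 * d * math.log(-2 / math.log(1 - alpha))

print(round(critical_value(256, 2, 0.001), 2))  # 24.85
```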

Next, the asymptotic distribution of the statistic is presented. Let

$$\begin{aligned} A(x)&= \left( 2 \log x \right) ^{1/2}, \end{aligned}$$
(16)
$$\begin{aligned} D_p(x)&= 2 \log x + \frac{p}{2} \log \log x - \log \Gamma \left( \frac{p}{2} \right) , \end{aligned}$$
(17)
$$\begin{aligned} Z_n&= \max _{a+p+1 \le \theta \le b-p-1} \left( -n\log \frac{M(\theta )}{\hat{\sigma }^2} \right) . \end{aligned}$$
(18)

From (16)–(18), we have

$$\begin{aligned} \lim _{n \rightarrow \infty } P\left\{ A(\log n)Z_n^{1/2}-D_p(\log n) \le t \right\} = \exp \left( -2e^{-t} \right) . \end{aligned}$$
(19)

To verify this asymptotic result, we conducted a Monte Carlo simulation. We selected different sample sizes, \(n = 256, 512, 1024\), and generated random variable sequences \(\{Z_n\}\) that satisfy the required conditions for each sample size, thereby obtaining the sample distribution of the statistic \(A(\log n)Z_n^{1/2} - D_p(\log n)\). Next, we estimated the critical value \(t_{0.05}\) at the significance level \(\alpha = 0.05\), which satisfies

$$ P\left\{ A(\log n)Z_n^{1/2} - D_p(\log n) \le t_{\alpha } \right\} \approx 1 - \alpha . $$

Solving \(\exp (-2e^{-t_{\alpha }})=1-\alpha \) with \(\alpha =0.05\) gives \(t_{0.05}=3.663\). We performed 10,000 simulations and obtained 95th percentiles of \(A(\log n)Z_n^{1/2} - D_p(\log n)\) of 3.264, 3.302, and 3.323 for \(n = 256, 512\), and 1024, respectively. The 95th percentile increases toward the asymptotic value of 3.663 as \(n\) grows, which supports the validity of the asymptotic distribution in (19).
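The limiting quantile is available in closed form by inverting the Gumbel-type distribution in (19); the one-liner below reproduces the critical value quoted above.

```python
import math

alpha = 0.05
t_alpha = -math.log(-math.log(1 - alpha) / 2)  # solves exp(-2 e^{-t}) = 1 - alpha
print(round(t_alpha, 3))  # 3.663
```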

To further illustrate and evaluate the performance of the proposed procedure, we present an example comparing it with the traditional Otsu method through a Monte Carlo experiment.

Example 1

Let \(p=2\), \(\tau =60\), and \(n=256\) in (9). We defined

$$\begin{aligned} G_i \propto \exp \left\{ \left( -\frac{(x_{i}/255 - 0.2)^2 + 0.01}{0.05}+\varepsilon _{i,n}\right) \cdot \varvec{I}(i < \tau )- \left( \frac{(x_{i}/255 - 0.7)^2}{0.06}+\varepsilon _{i,n}\right) \cdot \varvec{I}(i \ge \tau ) \right\} , \end{aligned}$$

where \( \varepsilon _{i,n} \sim N(1,5)\). The density histogram of \(G_i\) is shown in Fig. 3.

Fig. 3. The density histogram of \(G_i\).

We divided the data sequence into two segments using \(\theta \) as the dividing point. According to (12), we obtain

$$\begin{aligned} M(\theta )=\sum _{ 3 \le i \le \theta }{\left| \log G_{i}-\log \hat{G}_{i}(\theta )\right| ^{2}}+ \sum _{\theta < i \le 252}{\left| \log G_{i}-\log \hat{G}_{i}(\theta )\right| ^{2}}. \end{aligned}$$

Next, we employed the Energy-Based Model Segmentation (EBS) change point detection method in (13) to identify the change point. The resulting values of the statistic are shown in panel (a) of Fig. 4, and the estimate is

$$\begin{aligned} \hat{\tau }_{1}=\underset{\theta }{\operatorname {argmin}}M(\theta ). \end{aligned}$$

We then applied (15) to evaluate the significance of the change point. With \(\alpha =0.001\), the observed test statistic is

$$\begin{aligned} T=62.07> T^\prime =24.85. \end{aligned}$$

Therefore, \(\hat{\tau }_{1}=60\) is a significant change point estimate.

For comparison, Otsu's detection method is introduced below. We found that this well-known algorithm is, in fact, equivalent to detecting a change point within the grayscale image histogram. Therefore, we provide a rigorous mathematical definition for it and reformulate it within the context of change point detection.

Let the pixels of a given picture be represented in \(L+1\) gray levels \([0,1,2, \ldots , L]\). The number of pixels at level \(i\) is denoted by \(n_i\) and the total number of pixels by \(N=n_0+n_1+\cdots +n_L\). To simplify the discussion, the gray-level histogram is normalized and regarded as a probability distribution:

$$ p_i=n_i / N,\quad p_i \ge 0, \quad \sum _{i=0}^L p_i=1. $$

Suppose that we divide the pixels into two classes (background and foreground) using \(\theta \) as the dividing point, \( 0<\theta <L \). Pixels with levels \([0, \cdots , \theta ]\) belong to one class, while pixels with levels \([\theta +1, \cdots , L]\) belong to the other class. In the context of change point detection, Otsu’s method is equivalent to finding the optimal change point by maximizing the between-class variance:

$$\begin{aligned} Y(\theta )=\left( \sum _{i=0}^\theta p_i\right) \left( \frac{\sum _{i=0}^\theta i p_i}{\sum _{i=0}^\theta p_i} - \sum _{i=0}^L i p_i \right) ^2 + \left( \sum _{i=\theta +1}^L p_i \right) \left( \frac{\sum _{i=\theta +1}^L i p_i}{\sum _{i=\theta +1}^L p_i} - \sum _{i=0}^L i p_i \right) ^2, \end{aligned}$$
(20)

and

$$\begin{aligned} \hat{\tau }_{2}&=\underset{\theta }{\operatorname {argmax}} Y(\theta ). \end{aligned}$$
(21)

The details of (20) can be found in Supplementary Methods S2 online.
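Under this formulation, Otsu's method reduces to a one-dimensional search; a minimal sketch (function name ours) implementing (20)–(21) on a normalized histogram follows.

```python
import numpy as np

def otsu_change_point(p):
    # Otsu's threshold as a change point, per (20)-(21): maximize the
    # between-class variance Y(theta) over a normalized histogram p.
    L = len(p) - 1
    levels = np.arange(L + 1)
    mu_T = float(np.sum(levels * p))  # total mean of the histogram
    best_theta, best_Y = 0, -np.inf
    for theta in range(L):  # 0 <= theta < L, keep both classes nonempty
        w0 = float(p[: theta + 1].sum())
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue
        mu0 = float(np.sum(levels[: theta + 1] * p[: theta + 1])) / w0
        mu1 = (mu_T - w0 * mu0) / w1  # class means decompose the total mean
        Y = w0 * (mu0 - mu_T) ** 2 + w1 * (mu1 - mu_T) ** 2
        if Y > best_Y:
            best_theta, best_Y = theta, Y
    return best_theta
```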

Fig. 4. Comparison of EBM and Otsu's method.

As illustrated in Fig. 4, the EBM method identified the change point accurately, with \(\hat{\tau }_{1}=60=\tau \), whereas Otsu's method failed to do so. This can be explained by the fact that the Otsu threshold tends to deviate from the intersection point of the grayscale histogram, skewing towards the center of the class with the larger variance8.

Example 2

A dataset of \(N = 1000\) grayscale images was synthetically generated to evaluate the performance of various image segmentation algorithms under challenging conditions. Each image has dimensions of SZ \(\times \) SZ pixels, where SZ denotes the image size and is set to 100. Initially, we tested the algorithms on a simpler set of images, where all methods performed well. To better assess their robustness in complex scenarios, we increased the difficulty of the test set. The generation process for each image, denoted as I, is described in the Supplementary Method S3 online. The resulting 1000 images are highly noisy and challenging to segment. Representative examples are shown in Fig. 5.

Fig. 5. Representative samples from the synthetic dataset, illustrating diversity in geometric shapes, grayscale values, sizes, and positions.

The performance of the segmentation methods is evaluated using the F score, specifically the \(F_2\) score20, which emphasizes recall over precision. Let GT be the ground truth binary mask (where \(GT(i,j)=1\) for the foreground, \(GT(i,j)=0\) for the background), derived directly from the generated mask M. Let BS be the binary segmentation mask produced by a given algorithm for the image I. We define the following quantities based on pixel-wise comparison between BS and GT:

$$\begin{aligned} TP&= \sum _{i=1}^{SZ} \sum _{j=1}^{SZ} \mathbb {I}(BS(i,j) = 1 \wedge GT(i,j) = 1),\\ FP&= \sum _{i=1}^{SZ} \sum _{j=1}^{SZ} \mathbb {I}(BS(i,j) = 1 \wedge GT(i,j) = 0),\\ FN&= \sum _{i=1}^{SZ} \sum _{j=1}^{SZ} \mathbb {I}(BS(i,j) = 0 \wedge GT(i,j) = 1) , \end{aligned}$$

where TP is the number of pixels correctly identified as foreground, FP is the number of pixels incorrectly identified as foreground (background pixels classified as foreground), FN is the number of pixels incorrectly identified as background (foreground pixels classified as background), \(\mathbb {I}(\cdot )\) is the indicator function. From these quantities, we calculate Precision (P) and Recall (R) as follows:

$$\begin{aligned} { P = \frac{TP}{TP + FP}, ~R = \frac{TP}{TP + FN}, } \end{aligned}$$

where Precision (P) represents the proportion of pixels identified as foreground that are actually foreground, and Recall (R) denotes the proportion of actual foreground pixels correctly identified. The general F score (\(F_\beta \)) is the harmonic mean of Precision and Recall, weighted by a factor \(\beta \):

$$ F_\beta = (1 + \beta ^2) \frac{P \cdot R}{(\beta ^2 \cdot P) + R}. $$

In biomedical image analysis, the \(F_2\) score is a commonly used evaluation metric. This choice gives recall twice the importance of precision:

$$ F_2 = (1 + 2^2) \frac{P \cdot R}{(2^2 \cdot P) + R} = \frac{5 P R}{4P + R}. $$
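The pixel-wise computation is direct; a short sketch (function name ours) for binary masks stored as NumPy arrays is shown below.

```python
import numpy as np

def f2_score(BS, GT):
    # Pixel-wise F2 score between a binary segmentation BS and the
    # ground truth GT, following the TP/FP/FN definitions above.
    TP = np.sum((BS == 1) & (GT == 1))
    FP = np.sum((BS == 1) & (GT == 0))
    FN = np.sum((BS == 0) & (GT == 1))
    P = TP / (TP + FP)
    R = TP / (TP + FN)
    return 5 * P * R / (4 * P + R)
```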

We applied several methods for image segmentation: the Otsu method; the recent change point detection methods Inspect21, Wild Binary Segmentation (WBS)22, Standard Binary Segmentation (SBS)22, and Graph-based change point (GCP)23; and the non-threshold segmentation method Bayesian Gaussian Mixture (BGM)24. Table 1 shows that EBS achieves the highest average \(F_2\) score, demonstrating strong overall performance.

Table 1 Average \(F_2\) scores for different segmentation methods on the synthetic dataset.

Algorithm

This section outlines the steps of the EBS and Adaptive Multiple change point Energy-based Model Segmentation (MEBS) algorithms.

EBS

Building on the theory of the EBM method, the procedural steps of energy-based image segmentation are as follows:

Algorithm 1. Algorithm for image processing.

In Algorithm 1, the CalculateTauHat function uses parameters a and b to define the grayscale value interval for histogram analysis, defaulting to the full range (e.g., 0–255). The trim parameter (default False) controls whether this interval is automatically adjusted to exclude histogram tail-end data, typically based on cumulative distribution thresholds (e.g., removing the lowest and highest \(5\%\)). This focuses the analysis on the main grayscale distribution. The parameter p (default 2) sets the polynomial degree used to fit the histogram within the (possibly trimmed) interval. Together, these parameters guide the calculation of the optimal segmentation threshold \(\hat{\tau }\). The algorithm processes an image with N pixels and a gray range M. Initially, the algorithm computes the histogram H by examining each of the N pixels, an operation with a time complexity of O(N). It then calls the CalculateTauHat function, which performs change point detection using polynomial fits on the histogram H. Given that the histogram consists of M bins corresponding to the gray range, this step has a computational complexity of \(O(M^2)\). Therefore, the total computational complexity of the algorithm is \(O(N + M^2)\).
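An end-to-end sketch of this pipeline is given below, reusing the hypothetical estimate_change_point helper introduced earlier; the trim logic is omitted for brevity, the function name is again our own, and an integer-valued image array is assumed.

```python
import numpy as np

def ebs_segment(image, a=0, b=255, p=2):
    # Minimal EBS pipeline sketch: build the normalized histogram (O(N)),
    # run the change point search on [a, b] (O(M^2)), and binarize at tau_hat.
    hist = np.bincount(image.ravel(), minlength=b + 1)[a : b + 1]
    G = hist / hist.sum()
    x = np.arange(a, b + 1)
    keep = G > 0  # log G is undefined on empty bins
    theta, _ = estimate_change_point(G[keep], x[keep], p=p)
    tau_hat = x[keep][theta]
    return np.where(image > tau_hat, 255, 0).astype(np.uint8)
```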

It is worth noting that EBS can only handle images with bimodal grayscale histograms, making it challenging to process images whose histograms contain more than two peaks. The MEBS method extends EBS by detecting multiple change points through a bisection approach, enabling it to process complex, multi-modal images beyond the bimodal limitation of EBS. Specifically, EBS serves as the basis for MEBS, and in the special case of \(p=2\), EBS can be considered the initial step within the MEBS framework.

MEBS

The limit theorem in change point analysis typically requires a large sample, conventionally defined as 30 or more observations25. Since we need two regressions within an interval to find a change point, the minimum interval length is 60 observations. Therefore, according to (12), we require

$$\begin{aligned} (b-p-1)-(a+p+1) \ge \xi . \end{aligned}$$
(22)

Accordingly, we set \(\xi =60\). Based on Algorithm 1 of EBS, the algorithmic flow chart of MEBS is shown in Fig. 6:

Fig. 6. MEBS flowchart.

The definitions of \(a, b, \hat{\theta }_{n}, T\), and \(T^\prime \) can be found in (9), (13), (14), and (15). List.append(x) means adding x to the List. The computational complexity of the MEBS algorithm is \(O(N+L \cdot M^2)\), where L is the number of significant thresholds found. Fig. 6 shows that if the CalculateTauHat function, applied to a given interval with polynomial degree p, fails to detect a significant threshold, the algorithm does not immediately discard the interval. Instead, it increases the polynomial degree from p to \(p + 1\) and refits the histogram data for segmentation. Automatically increasing p enhances the robustness of the algorithm in detecting thresholds in histograms with complex or subtle patterns. The recursion terminates only when either a significant threshold cannot be identified after incrementing the polynomial degree, or the interval length falls below a predefined minimum.
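The recursion can be sketched compactly as follows, reusing the hypothetical estimate_change_point and critical_value helpers from the earlier sketches. The cap p_max on the polynomial degree and the exact wiring of the significance test are our own illustrative assumptions, not details taken from the original algorithm.

```python
import numpy as np

def is_significant(M, Gseg, xseg, p, alpha):
    # Likelihood ratio test of (14)-(15): compare the no-change residual
    # sum of squares with M(theta_hat), using critical_value from above.
    logG = np.log(Gseg)
    X = np.vander(xseg, p + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, logG, rcond=None)
    sigma2 = float(np.sum((logG - X @ beta) ** 2))
    T = len(Gseg) * (sigma2 - M) / sigma2
    return T > critical_value(len(Gseg), p, alpha)

def mebs_thresholds(G, x, a, b, p=2, p_max=4, xi=60, alpha=0.05):
    # Bisection sketch of MEBS: find a significant change point on [a, b],
    # recurse on both halves, and raise the degree when nothing is significant.
    if (b - p - 1) - (a + p + 1) < xi:  # interval too short, see (22)
        return []
    theta, M = estimate_change_point(G[a : b + 1], x[a : b + 1], p=p)
    if theta is None or not is_significant(M, G[a : b + 1], x[a : b + 1], p, alpha):
        if p + 1 <= p_max:  # retry the same interval with a higher degree
            return mebs_thresholds(G, x, a, b, p + 1, p_max, xi, alpha)
        return []
    tau = a + theta
    return (mebs_thresholds(G, x, a, tau, p, p_max, xi, alpha)
            + [x[tau]]
            + mebs_thresholds(G, x, tau + 1, b, p, p_max, xi, alpha))
```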

Empirical applications

Based on the algorithms presented above, we now illustrate their applications through several empirical examples. All applications were implemented on a personal computer equipped with a 3.8 GHz CPU, using Python 3.11.

EBS applications

Lung fluids segmentation

Fig. 7. Comparison of EBS, Otsu, Modified Otsu, and Hommo-Otsu's methods for lung fluids segmentation.

We acknowledge the use of this \(600\times 459\) original image from Nouri, S. et al.26. The image is licensed under CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/). It shows bilateral pleural effusion (indicated by blue arrows) in a 67-year-old female patient with acute myeloid leukemia (AML), suggesting pulmonary edema. Fluids within the lungs are particularly insidious, rendering their isolation challenging. As illustrated in Fig. 7, the EBS method appears to segment the bilateral pleural effusions clearly and effectively. In contrast, due to its lack of sensitivity to finer details, Otsu's method failed to discern the bilateral pleural effusions. Although Modified Otsu27 aligns more closely with EBS than Otsu does, it still inherits Otsu's limitations in single-threshold segmentation. Hommo-Otsu28 integrates a homomorphic transform with Otsu segmentation to enhance contrast between detected regions and normal tissue, but its effectiveness in pleural effusion segmentation remains limited.

Brain tumor segmentation

Medical image analysis techniques using brain X-rays are used for early diagnosis and treatment planning, assisting radiologists in identifying and evaluating brain tumors. We acknowledge the use of this \(279\times 344\) original image from the Brain Tumor Dataset29. The contents of this dataset are licensed under DbCL v1.0 (https://opendatacommons.org/licenses/dbcl/1-0/). It shows the brain X-ray of a tumor patient, with the tumor's location indicated by the red arrow. As shown in Fig. 8, the EBS method effectively highlights the tumor contour, though some white noise is still present in surrounding areas. In contrast, the other methods struggle to segment the tumor, capturing mainly the overall brain contour.

Fig. 8. Comparison of EBS, Otsu, Kapur, and efficient brain tumor methods for brain tumor segmentation.

Applications of MEBS

Detailed descriptions of the MEBS operations on individual images can be found in Supplementary Discussion S4 online.

Breast tumor detection in mammograms


Fig. 9. Comparison of MEBS and other methods for breast tumor detection (\(k=6\)).

Table 2 \(F_2\) scores in breast tumor detection.

The \(868\times 806\) images in Fig. 9 depict breast tumors in mammograms from Alsolami et al.30. We acknowledge their work, which is licensed under CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/). These X-ray images were annotated and validated by three different radiologists, Dr. Sawsan Ashoor, Dr. Samia Alamoud, and Dr. Gawaher Al Ahadi, and the breast cancer images were segmented by hand-drawing on the suspicious areas. The structure of this particular tumor closely resembles the complex architecture of a healthy breast, making it challenging to identify any features that distinguish it from normal breast tissue. Due to the multiple peaks in the grayscale histograms of these images, simple threshold detection is insufficient. Therefore, we apply the Adaptive Multiple change point Energy-based Model Segmentation (MEBS). We compare its performance to several methods, including Subtractive Clustering K-Means with Filtering (SCKF)31, Sparse Graph Spectral Clustering (SGSC)32, the Gaussian mixture model based on a Markov random field (MRF-GMM)33, and Adaptive Thresholding34. MEBS identified the optimal number of clusters as \(k=6\). Since the other methods require manual selection of the cluster number, we fixed this value for all methods to facilitate a direct comparison. The results show that tumor regions seem more accurately detected using MEBS, particularly in Fig. 9(\(B_{1}\)) and Fig. 9(\(D_{1}\)). MEBS appears to segment nearly the entire tumor, whereas the other methods fail to achieve this level of precision. Tumor segmentation precision was quantitatively evaluated using the \(F_2\) score, as presented in Table 2. Details of the evaluation procedure are provided in Supplementary Method S5 online. The results indicate that MEBS consistently achieves the highest or near-highest \(F_2\) scores, significantly outperforming SCKF and MRF-GMM. Compared to the SGSC method, MEBS performs comparably overall.

Cell counting analysis

The \(512\times 512\) pixel cell image is from Gooh et al.4 and is used with the authors' permission. They used optical manipulation to observe the early embryogenesis of Arabidopsis, capturing an image at regular time intervals. Counting the cells at each time point can provide insights into the growth trends of plants. In addition to cell count, other characteristics such as morphological changes, size, density, and position of cells can also be considered. By analyzing variations in these features, it is possible to gain a more comprehensive understanding of the developmental stages of plant embryos, which may be important for future agricultural applications. Image segmentation is crucial before cell counting can be conducted. Due to interference from the cellular matrix illustrated in Fig. 10, it is challenging for image segmentation techniques to produce precise outcomes. We selected the 1st, 101st, 201st, and 301st images to apply different segmentation methods and evaluate their effectiveness. According to Fig. 11, MEBS, Adaptive Mean Threshold, and Adaptive Gaussian Threshold appear to perform better than the other methods. In Fig. 11A, MEBS demonstrated notable effectiveness in segmenting individual cells. The other clustering methods were ineffective because they could not eliminate the noise produced by the original epidermal layer and matrix of the embryo, resulting in suboptimal cell segmentation. However, although the Adaptive Mean Threshold and Adaptive Gaussian Threshold performed segmentation as effectively as MEBS, they can only binarize the images to black and white, resulting in a substantial loss of information. MEBS, on the other hand, not only segments the cells but also classifies them into different colors. These results could have potential applications in identifying differences in the cell division process. Table 3 further highlights the overall superior performance of MEBS compared to the other methods, based on the \(F_2\) score.

Fig. 10. Embryonic epidermal layer and matrix.

Fig. 11. Comparison of MEBS and other methods for cell counting analysis.

Table 3 \(F_2\) scores in cell counting analysis.

Wildfire spark detection

These \(668 \times 478\) pixel images, originally acquired by the MODIS instrument aboard the Terra Earth-orbiting satellite, were obtained from NASA's LAADS DAAC35. Hyperspectral images taken from satellites are well suited to analyzing wildfires. Hyperspectral images are characterized by many discrete layers, or bands, over a range of wavelengths. Specific bands are sensitive to different features; for example, some bands capture clouds, snow, or, for our analysis, fire well36. Consequently, the image is plagued with substantial noise, rendering conventional image segmentation methods ineffective in this scenario. For this kind of image, we do not compare the methods quantitatively because the wildfire sparks occupy only a few pixels, and it is straightforward to see whether they are segmented or not. As illustrated in Fig. 12, the tendency of the other methods toward excessive generalization impairs their ability to detect spark points accurately, significantly compromising their segmentation results. In contrast, MEBS and SGSC have a strong ability to recognize images and detect sparks effectively. It is interesting to note that both MEBS and SGSC successfully identified two red sparks, as indicated by the arrows in Fig. 12C. This robustness is particularly important when dealing with satellite remote sensing data, which are often subject to noise and uncertainty. This demonstrates the potential of MEBS for future wildfire spark detection applications, providing a promising tool for effective wildfire spark prevention and control strategies.

Fig. 12. Comparison of MEBS and other methods for wildfire spark detection.

Conclusions

This paper introduces an energy-based image segmentation method inspired by energy functions in physics and incorporates change point detection within a stochastic process framework. The advantage of this approach is justified through the use of KL divergence. Building on this foundation, we propose an innovative and effective method for applying change point detection to image segmentation. The EBS method exhibits refined segmentation capabilities, particularly in capturing complex details that are often imperceptible to the human eye. In empirical studies, we constructed a highly challenging test dataset, and comparative analyses show that EBS achieves clear advantages over competing methods. In practical applications, EBS demonstrates high accuracy in segmenting medical images. Furthermore, we extend EBS to the MEBS framework, which automatically determines the optimal number of categories and performs well on multi-modal images. MEBS achieves high precision in detecting sparks in wildfire imagery and, based on the \(F_2\) score evaluation, outperforms other methods in segmenting diverse types of images.

One limitation of the current study is its focus on grayscale images. For color images, the intensities of the different channels can be converted into a single grayscale intensity using weighted-sum methods37,38. In light of the findings reported by Shi et al.39, our method can be readily extended to handle color image segmentation.