Introduction

Diffuse Optical Tomography (DOT) is a non-invasive and non-destructive medical imaging technique in which the optical parameters of biological media are reconstructed from knowledge of the excitation light and the detected outgoing light. The propagation of light within biological media can be modeled by a radiative transport system in a domain \(X\subset {\mathbb {R}^{N}}(N=2,3)\). Assuming the transport process is in the stationary state, the density of particles \(u(\varvec{x},\varvec{\omega })\) at a point \(\varvec{x}\in {X}\) moving in direction \(\varvec{\omega }\in {\Omega }:=\mathbb {S}^{N-1}\) can be modeled by the following boundary value problem:

$$\begin{aligned} \varvec{\omega }\cdot \nabla u(\varvec{x},\varvec{\omega }) + (\mu _a(\varvec{x}) +\mu _s(\varvec{x})) u(\varvec{x},\varvec{\omega }) = \mu _s(\varvec{x})\int _{\Omega }k(\varvec{\omega }\cdot \hat{\varvec{\omega }})u(\varvec{x},\hat{\varvec{\omega }})d\sigma (\hat{\varvec{\omega }}), \;\text {in}\; X\times \Omega \end{aligned}$$
(1)

Here, \(d\sigma (\varvec{\omega })\) is the infinitesimal area element on the unit sphere \(\Omega\), \(\varvec{\omega }\cdot \nabla u(\varvec{x},\varvec{\omega })\) denotes the directional derivative, and \(k(\varvec{\omega }\cdot \hat{\varvec{\omega }})\) is the nonnegative normalized phase function, whose specific definition can be found in1. The medium is characterized by the absorption and scattering rates \(\mu _a(\varvec{x})\) and \(\mu _s(\varvec{x})\). The incident excitation light over the boundary is modeled by

$$\begin{aligned} u(\varvec{x},\varvec{\omega }) = u_{in}(\varvec{x},\varvec{\omega }), \; \varvec{x}\in {\partial X}, \;\varvec{\omega }\cdot \nu (\varvec{x})<0, \end{aligned}$$
(2)

where \(\nu (\varvec{x})\) denotes the unit outward normal at \(\varvec{x}\in {\partial X}\). In view of the physical background, the optical coefficients are assumed to be uniformly positive and bounded, i.e., the domain of coefficients D can be defined as \(D=D_1\times D_2\), where \(D_1 = \{\mu _a(\varvec{x})\,|\,\mu _a(\varvec{x})\in {L^{\infty }(X)}, \text { there exist two constants } \mu _a^1 \text { and } \mu _a^2 \text { such that } 0<\mu _a^1\le \mu _a(\varvec{x})\le \mu _a^2<\infty \}\) and \(D_2 = \{\mu _s(\varvec{x})\,|\,\mu _s(\varvec{x})\in {L^{\infty }(X)}, \text { there exist two constants } \mu _s^1 \text { and } \mu _s^2 \text { such that } 0<\mu _s^1\le \mu _s(\varvec{x})\le \mu _s^2<\infty \}\). In this paper, we assume the outgoing light is measured in an angularly averaged fashion, so that the measurement data \(M(\varvec{x})\in {L^2(\partial X)}\) can be written as \(M(\varvec{x})=\int _{\varvec{\omega }\cdot \nu (\varvec{x})>0}\varvec{\omega }\cdot \nu (\varvec{x})\, u(\varvec{x},\varvec{\omega })d\sigma (\varvec{\omega })\). With the above notation, the forward problem of DOT can be described by the following nonlinear operator equation

$$\begin{aligned} F:D\rightarrow L^2(\partial X) \quad (\mu _a(\varvec{x}),\mu _s(\varvec{x}))\mapsto M(\varvec{x}). \end{aligned}$$
(3)
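Concretely, once the angular variable is discretized, the angularly averaged measurement is just a quadrature over the outgoing half of the direction set. A minimal MATLAB sketch of this step follows; it is our illustration, and the arrays u_b, omega, nu, and w are assumed inputs, not tied to a specific package.

```matlab
% Sketch: discrete angularly averaged measurement M(x_b) (illustrative only).
% u_b   : Nb-by-Nw values u(x_b, omega_j) at Nb boundary points, Nw directions
% omega : Nw-by-2 unit direction vectors; nu : Nb-by-2 outward unit normals
% w     : Nw-by-1 angular quadrature weights (e.g. 2*pi/Nw for uniform angles)
dp = nu * omega.';             % Nb-by-Nw matrix of omega_j . nu(x_b)
M  = (max(dp, 0) .* u_b) * w;  % keep outgoing directions and integrate
```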

The corresponding inverse problem of DOT is to find optical parameters \((\mu _a(\varvec{x}),\mu _s(\varvec{x}))\) from \(M(\varvec{x})\) such that they satisfy (3). It is well known that this inverse problem is ill-posed. Hence, multiple excitations with finite observation schemes and a regularization penalty strategy are often used. Assuming there are s incident sources and d detectors located on the boundary of the domain X, the forward operator and measurements for the i-th (\(0<i\le s\)) excitation are \(F_i\) and \(M_i(\varvec{x})\). We denote the regularization penalty term on the optical coefficients by \(\mathcal {R}(\mu )\) and introduce the regularized functional

$$\begin{aligned} J_{\beta }(\mu ):=\dfrac{1}{2}\sum _{i=1}^s\Vert F_i(\mu )-M_i(\varvec{x})\Vert _{L^2(\partial X)}^2 + \beta \mathcal {R}(\mu ). \end{aligned}$$
(4)

Then, the inverse problem can be abstracted as

$$\begin{aligned} (P)\quad \text {Find} \;\mu \in {Q} \;\text {such that}\; J_{\beta }(\mu ) \; \text {is minimized over}\; Q, \end{aligned}$$

with Q the admissible set for \(\mu\), usually chosen as the intersection of D with a Banach space possessing a weak compactness property, e.g., the \(L^2\) space or the space of functions of bounded variation. Due to the box constraint of D and the weak compactness of the Banach space, together with the additional assumption that the regularization penalty term \(\mathcal {R}(\mu )\) is lower semicontinuous, the minimization problem (P) admits at least one minimizer. In this study, we assume that the scattering coefficient is known. This assumption is a common simplification in the DOT literature: the joint reconstruction of both absorption and scattering coefficients is a significantly more ill-posed problem and often leads to increased numerical instability. By focusing on the reconstruction of the absorption coefficient, we are able to evaluate the effectiveness and convergence of the proposed nonsmooth and nonconvex regularization method.

There are a variety of possible choices for \(\mathcal {R}(\mu )\); for example, it can be chosen as \(\mathcal {R}(\mu )=\Vert \mu \Vert _p^p\), as \(\mathcal {R}(\mu )=TV_c^{\psi _{q}}(\mu ):=\int _{X}\psi _q(|\nabla \mu (\varvec{x})|)d\varvec{x}\) for continuous \(\mu\), or as \(\mathcal {R}(\mu )=TV_d^{\psi _{q}}(\mu ):=\int _{S_{\mu }}\psi _{q}(|\mu ^+(\varvec{x})-\mu ^-(\varvec{x})|)d\mathcal {H}^{N-1}\) for piecewise constant \(\mu\). Here, \(S_{\mu }\) is the jump set of \(\mu (\varvec{x})\), where the one-sided traces \(\mu ^{\pm }\) from different sides of \(S_{\mu }\) differ, and \(\psi _{q}(\cdot )\) is the energy function parameterized by q. In particular, when \(\psi (t)\) is taken as \(\psi _{0-1}(t)\), i.e., \(\psi _{0-1}(0)=0\) and \(\psi _{0-1}(t)=1\) when \(t\ne 0\), the penalty term reduces to the weak version of the Mumford–Shah functional, which is known to quantify well both the edges and the values of a nonsmooth piecewise constant parameter. In our previous work1, we proposed to approximate \(\mathcal {R}(\mu )=TV_d^{\psi _{0-1}}(\mu )\) by a family of TV-related graduated nonconvex energy functions \(\mathcal {R}(\mu ;\epsilon ,\psi _{q}):=(1-\epsilon )TV(\mu )+\epsilon TV_d^{\psi _{q}}(\mu )\), where \(\psi _{q}(t)\) is a family of concave energy functions parameterized by q. The analysis and simulations exhibited the power of this graduated nonconvex scheme in preserving edges and reducing background artifacts.

Since the TV term in \(\mathcal {R}(\mu ;\epsilon ,\psi _q)\) is the \(L^1\) norm of the gradient in the piecewise constant sense, we adopt variable splitting and alternating direction strategies so that the minimization steps relevant to this convex nonsmooth term can be solved with a shrinkage operation. To this end, we transfer the primal minimization problem to a primal-dual problem by variable splitting. It is natural to solve this primal-dual problem using the Augmented Lagrangian Method (ALM). However, because there may be a nonzero duality gap between the primal and dual problems, the Alternating Direction Method of Multipliers (ADMM), a sequential approximation of ALM, is applied instead. To ensure convergence of the proposed iterative method, we linearize the data-fitting term of the objective function; this linearization, as part of our ADMM-based approach, allows us to maintain convergence guarantees while solving each subproblem iteratively.

More specifically, in this paper, we make the following three contributions: (1) we theoretically analyze the existence of solutions for the minimization problem with the nonconvex total variation energy function \(\mathcal {R}(\mu ;\epsilon ,\psi _q)\) over the piecewise constant finite element space and a constrained finite dimensional subset; (2) we theoretically analyze the convergence of the iterative sequence generated by ADMM for the primal nonconvex minimization problem with the introduced splitting variable; (3) we develop an efficient double graduated nonconvex algorithm based on ADMM to minimize the weak version of the Mumford–Shah functional. The proposed GNC-ADMM has two highlights: (1) we construct a weighted combination of total variation and an adjustable nonconvex potential function, and then approximate the weak Mumford–Shah functional by tuning the weight coefficient and the nonconvexity parameter simultaneously; (2) we treat the energy function at the first stage as two separate total variation terms by decomposing it, with weights, into two parts, and then solve one with the shrinkage operator and the other with a lagged diffusive fixed point strategy. Simulation results indicate the effectiveness of the proposed algorithm.

In parallel to our model-based optimization approach, recent studies have adopted deep learning strategies to improve image quality and stability in DOT. These include model-driven CNNs and graph convolutional networks that incorporate physics priors and spatial constraints2,3, as well as multitask architectures for joint reconstruction and lesion localization4. Although these deep learning-based methods differ fundamentally from the variational regularization framework, they address similar goals of edge preservation, artifact reduction, and computational efficiency in DOT. Our proposed GNC-ADMM offers an interpretable and optimization-theoretic alternative that does not require training data, and thus complements these advances from a model-based perspective.

This manuscript is organized as follows. In the next section, we analyze the existence of a minimizer for the nonconvex regularization functional under finite element numerical discretization and then transfer the primal minimization problem to a primal-dual problem by variable splitting. We then describe the proposed Graduated NonConvex Alternating Direction Method of Multipliers (GNC-ADMM) algorithm and present simulation results. Finally, we close with a short conclusion and some remarks on further research.

Minimization problems

The regularized minimization problem for DOT can be described generally by problem (P). In this section, we focus on the minimization problem based on the regularized functional \(\mathcal {R}(\mu ;\epsilon ,\psi _{q})\) under the situation of numerical discretization.

For a numerical approximation of DOT problems in a finite element approach, the polyhedral domain X needs to be triangulated with a regular triangulation mesh \(\mathcal {T}_h\) of simplicial elements, namely intervals in one dimension, triangles in two dimensions, and tetrahedra in three dimensions. Here, we assume that the spatial triangular mesh for the optical density \(u(\varvec{x},\varvec{\omega })\) is the same as that for the optical coefficient \(\mu (\varvec{x})\). Associated with the triangulation \(\mathcal {T}_h\), together with a specific discretization method for the angular variable \(\varvec{\omega }\), the finite element numerical approximation can be implemented for the boundary value problem of the Radiative Transfer Equation (RTE). In this paper, we do not expand on the details of the discretization of this boundary value problem. For mathematical analysis and specific technical details of the popular discontinuous Galerkin finite element method for the forward problem of DOT, please refer to5 and6.

We define the finite element space \(V_h\) for optical coefficients to be the following piecewise constant space

$$\begin{aligned} V_h=\{v\in {L^2(X)};\forall K\in {\mathcal {T}_h},\exists \alpha _K\in {\mathbb {R}}:v_K=\alpha _K\} \end{aligned}$$
(5)

The discrete constrained admissible set can then be defined as \(Q_{ad}^{h}:= D\cap V_h\), and the numerical scheme for the discretized problem based on graduated nonconvex total variation minimization is

$$\begin{aligned} \left( P_{TV}^{h,\psi _{q}}\right) \quad Find\; \mu ^h\in {Q_{ad}^h}\; such \; that \; J_{\beta }(\mu ^h)\; is\; minimized \; over\; Q_{ad}^h \end{aligned}$$

where \(J_{\beta }(\mu ^h) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _{L^2(\partial X)}^2 + \beta \mathcal {R}(\mu ^h;\epsilon ,\psi _{q})\). Here, \(F_i^h(\mu ^h)\) is the numerical forward operator corresponding to the i-th excitation, defined as in formula (3) with \(\mu\) replaced by \(\mu ^h\) and D replaced by \(Q_{ad}^h\).

Assume the energy function \(\psi _{q}(t)\) satisfies the following four conditions: a) it is symmetric on \((-\infty ,\infty )\) and \(\mathcal {C}^2\) on \((0,\infty )\); b) \(\psi ^{\prime }(0^+)>0\) and \(\psi ^{\prime }(t)\ge 0\) for all \(t>0\); c) \(\psi (0)=0\) and \(\psi _0 =\infty\), where \(\psi _0:=\lim \limits _{t\rightarrow 0^+}\psi (t)/t\); d) \(\psi ^{\prime \prime }(t)\) is increasing on \((0,\infty )\) with \(\lim \limits _{t\rightarrow 0^+}\psi ^{\prime \prime }(t)<0\) and \(\lim \limits _{t\rightarrow \infty }\psi ^{\prime \prime }(t)=0\).
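For instance, the choice \(\psi _{q}(t)=|t|^q\) with \(0<q<1\), adopted later in our simulations, satisfies all four conditions, since a direct computation gives

$$\begin{aligned} \psi _{q}^{\prime }(t)=qt^{q-1}>0,\quad \psi _0=\lim \limits _{t\rightarrow 0^+}t^{q-1}=\infty ,\quad \psi _{q}^{\prime \prime }(t)=q(q-1)t^{q-2}<0, \end{aligned}$$

and \(\psi _{q}^{\prime \prime }(t)\) is increasing on \((0,\infty )\) with limit 0 as \(t\rightarrow \infty\).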

By analogous reasoning for the weak continuity of the forward operator F with respect to the topologies of \(L^p(X)(p>1)\) and \(L^2(\partial X)\) as in7, Theorem 3.2, and the continuous embedding of BV(X) in \(L^q(X)\) for \(1\le q< 1^*\), with \(1^*=\frac{N}{N-1}\) if \(N>1\) and \(1^*=\infty\) if \(N=1\) (8, Corollary 3.49), the weak continuity of the numerical forward operator \(F_i^h\) with respect to \(Q_{ad}^h\) and \(L^2(\partial X)\) can be proved.

By the lower semicontinuity of \(\psi _{q}(t)\) satisfying the above four assumptions in the piecewise constant space, as well as the weak continuity of \(F_i^h\), we have the following lemma, which gives the existence of minimizers to problem \((P_{TV}^{h,\psi _{q}})\) in \(Q_{ad}^h\); its proof is similar to that of9, Theorem 4.1.

Lemma 1

Under the above assumptions on the admissible set \(Q_{ad}^h\) and the energy function \(\psi _{q}(t)\), for any \(\beta > 0\) and q, there exists at least one minimizer to problem \((P_{TV}^{h,\psi _{q}})\) in \(Q_{ad}^h\).

Since the optical coefficients are approximated as piecewise constants, the discretized coefficient can be expressed as \(\mu ^h(\varvec{x})=\sum _{k=1}^{N_t}\mu _k\chi _k(\varvec{x})\), with \(N_t\) the (finite) number of elements, and \(\chi _k(\varvec{x})\) and \(\mu _k\) the characteristic function and the value of the optical coefficient corresponding to the k-th element. In fact, due to the finiteness of \(N_t\), the piecewise constant finite element space \(V_h\) is isomorphic to the finite dimensional Euclidean space \(\mathbb {R}^{N_t}\). Accordingly, the admissible set \(Q_{ad}^h\) is isomorphic to the constrained subset

$$\begin{aligned} K_h:=\{\mu ^h\in {\mathbb {R}^{N_t}}|\;\mu ^1\le \mu _i^h\le \mu ^2,1\le i\le N_t\} \end{aligned}$$

where, under this isomorphism, the value of \(\mu ^h\) restricted to the triangle \(\tau\) is \(\mu ^h_{\tau }\), sometimes written as \(\mu ^h|_{\tau }\).

For any \(\mu ^h\in {K_h}\), we define the jump of \(\mu ^h\) over an edge e as

$$\begin{aligned} \lfloor \mu ^h\rfloor _{e}:= \left\{ \begin{aligned} \sum \limits _{e\prec \tau }\mu ^h|_{\tau }\;\text {sgn}(e,\tau ),\quad e\not \subset \partial X \\ \mu ^h|_{\tau }\;\text {sgn}(e,\tau ),\quad e\subset \partial X \end{aligned} \right. \end{aligned}$$
(6)

where \(e\prec \tau\) denotes that e is an edge of \(\tau\), and \(\text {sgn}(e,\tau ) = 1\) when the orientation of e is consistent with that of \(\tau\), while \(\text {sgn}(e,\tau ) = -1\) otherwise. The gradient operator \(\nabla\) can then be defined as \(\nabla \mu ^h|_{e}=\lfloor \mu ^h\rfloor _e\), for any \(\mu ^h\in {K_h}\) and e. Given \(\mu ^h\in {K_h}\), according to the Radon-Nikodym decomposition for piecewise constant functions1,8, the discretized total variation of \(\mu ^h\) is thus

$$\begin{aligned} TV(\mu ^h)=\sum \limits _{e}l_e\left| \nabla \mu ^h|_e\right| , \end{aligned}$$
(7)

with \(l_e\) the edge length of e. Furthermore, in this vector definition of total variation, since \(l_e\) is always positive, \(l_e\left| \nabla \mu ^h|_e\right|\) can be written as \(\left| l_e\nabla \mu ^h|_e\right|\), which means that the total variation of \(\mu ^h\) can be represented as the \(L^1\) norm of a matrix-vector product, i.e., \(TV(\mu ^h):=\Vert L\mu ^h\Vert _1\). Here, L is a matrix whose e-th row is a sparse vector with at most two non-zero elements. Specifically, if \(e\not \subset \partial X\), the column indices of the nonzero elements correspond to the two triangles sharing edge e, and the nonzero values are \(l_e\) and \(-l_e\) for the left and right triangle elements of e, respectively. On the other hand, if \(e\subset \partial X\), the column index of the nonzero element corresponds to the triangle containing e, and the value of the non-zero element is \(l_e\text {sgn}(e,\tau )\). Then, the formula for the discretized nonconvex total variation functional is

$$\begin{aligned} TV_d^{\psi _{q}}(\mu ^h)=\sum \limits _{e}\psi _{q}\left( l_e\left| \nabla \mu ^h|_e\right| \right) \end{aligned}$$
(8)
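As a concrete illustration, the matrix L and the penalties (7)-(8) can be assembled in MATLAB as below. This is a sketch under assumed mesh arrays, not tied to a particular package: for edge i, tri(i,1:2) holds the indices of the one or two triangles meeting it (with tri(i,2) = 0 for a boundary edge), le(i) is its length, and the coefficient vector mu, the exponent q, and the weight ew (standing for \(\epsilon\)) are in scope.

```matlab
% Sketch: assemble the edge-difference matrix L and evaluate (7)-(8).
Ne = numel(le);  Nt = max(tri(:));
rows = zeros(2*Ne,1); cols = zeros(2*Ne,1); vals = zeros(2*Ne,1); n = 0;
for i = 1:Ne
    n = n + 1; rows(n) = i; cols(n) = tri(i,1); vals(n) = le(i);
    if tri(i,2) > 0    % interior edge: signed jump across the two triangles
        n = n + 1; rows(n) = i; cols(n) = tri(i,2); vals(n) = -le(i);
    end                % boundary edge: single trace, sign convention assumed
end
L = sparse(rows(1:n), cols(1:n), vals(1:n), Ne, Nt);
TV  = norm(L*mu, 1);          % formula (7)
TVq = sum(abs(L*mu).^q);      % formula (8) with psi_q(t) = |t|^q
R   = (1-ew)*TV + ew*TVq;     % graduated nonconvex penalty R(mu; epsilon, psi_q)
```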

Under the above notations and analysis, we redefine problem \((P_{TV}^{h,\psi _{q}})\) as the following minimization problem:

$$\begin{aligned} (P_0) \quad Find\; \mu ^h\in {K_h}\; such \; that \; J_{\beta }^d(\mu ^h)\; is\; minimized \; over\; K_h, \end{aligned}$$

where \(J_{\beta }^d(\mu ^h)\) is defined almost the same as \(J_{\beta }(\mu ^h)\), except that \(TV(\mu ^h)\) and \(TV_d^{\psi _{q}}(\mu ^h)\) are defined by formulas (7)-(8), i.e.,

$$\begin{aligned} J_{\beta }^d(\mu ^h) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _{L^2(\partial X)}^2 + \beta [(1-\epsilon )\Vert L\mu ^h\Vert _1+\epsilon TV_d^{\psi _{q}}(\mu ^h)] \end{aligned}$$

Analogously to Lemma 1, by the compactness of \(K_h\) and its isomorphism to \(Q_{ad}^h\), problem \((P_0)\) also has a solution over \(K_h\) for any \(\beta \ge 0\). We thus state the following lemma and omit the proof.

Lemma 2

Under the definitions on \(K_h\) and the assumptions on the energy function \(\psi _{q}(t)\), for any \(\beta >0\) and q, there is at least one minimizer for problem \((P_0)\) over \(K_h\).

To solve problem \((P_0)\), we apply the augmented Lagrangian multiplier method. We first introduce an auxiliary splitting vector variable \(v\in {S_h}\subset \mathbb {R}^{N_e}\), where \(N_e\) is the number of edges, and then transform problem \((P_0)\) into the following constrained optimization problem \((P_1)\):

$$\begin{aligned} (P_1)\quad \inf _{\mu ^h,v} \mathcal {F}(\mu ^h,v)\quad \text {s.t.} \quad (\mu ^h,v)\in {K_h\times S_h},\quad \mathcal {G}(\mu ^h,v)=0 \end{aligned}$$

where

$$\begin{aligned} \mathcal {F}(\mu ^h,v) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _2^2 + \beta [(1-\epsilon )\Vert v\Vert _1+\epsilon TV_{d}^{\psi _{q}}(\mu ^h)],\quad \quad \mathcal {G}(\mu ^h,v) = v-L\mu ^h \end{aligned}$$

Problem \((P_1)\) is equivalent to problem \((P_0)\). By the general definition of the Lagrangian, the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r):K_h\times S_h\rightarrow \mathbb {R}\) associated with problem \((P_1)\) is defined as

$$\begin{aligned} \mathcal {L}_r(\mu ^h,v;\lambda ,r)=\mathcal {F}(\mu ^h,v)-<\lambda ,\mathcal {G}(\mu ^h,v)> +\frac{r}{2}\Vert \mathcal {G}(\mu ^h,v)\Vert _2^2 \end{aligned}$$
(9)

with \(\lambda\) the augmented Lagrangian multiplier and r a positive penalty parameter. By virtue of (9), problem \((P_1)\) can be formulated as the following min-max problem:

$$\begin{aligned} (P_1)\quad \inf _{(\mu ^h,v)\in {K_h\times S_h}}\sup _{(\lambda ,r)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}}\mathcal {L}_r(\mu ^h,v;\lambda ,r) \end{aligned}$$

Its dual problem is

$$\begin{aligned} (P_1^*) \quad \sup _{(\lambda ,r)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}}\inf _{(\mu ^h,v)\in {K_h\times S_h}}\mathcal {L}_r(\mu ^h,v;\lambda ,r) \end{aligned}$$

Denote by \(\inf P_1\) the infimum of problem \((P_1)\) and by \(\sup P_1^*\) the supremum of problem \((P_1^*)\). It is apparent that \(\inf P_1\ge \sup P_1^*\). Moreover, if a pair \((\mu ^{h,*},v^*)\in {K_h\times S_h}\) and \((\lambda ^*,r^*)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}\) furnishes a saddle point of the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r)\), then \((\mu ^{h,*},v^*)\) is a solution to the primal problem \((P_1)\). The converse is not true, since the second order sufficient conditions of problem \((P_1)\) cannot be guaranteed10, Corollary 5.2. This means that the augmented Lagrangian algorithm may diverge, because it cannot be ensured that the functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r)\) has a saddle point, and there may be a non-zero duality gap between problems \((P_1)\) and \((P_1^*)\). However, as an approximation to the augmented Lagrangian method that sequentially updates each of the primal variables, the alternating direction method of multipliers may be unaffected by the non-zero duality gap; see the specific example in11, Proposition 1.

ADMM algorithm

To apply ADMM and ensure convergence of the iterative sequence, we first replace the nonlinear term \(F_i^h(\mu ^h)\) in problem \((P_1)\) with its linearization, obtained by the first order Taylor expansion at the k-th iterate \(\mu ^{h,k}\), i.e., \(F_i^h(\mu ^h) \approx F_i(\mu ^{h,k}) + F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})\). We denote the new optimization problem by \((P_1^{\prime })\), i.e.

$$\begin{aligned} (P_1^{\prime })\quad&\inf _{\mu ^h,v} \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta [(1-\epsilon )\Vert v\Vert _1+\epsilon TV_{d}^{\psi _{q}}(\mu ^h)]\\&\text {s.t.} \quad (\mu ^h,v)\in {K_h\times S_h},\quad \mathcal {G}(\mu ^h,v)=0 \end{aligned}$$

Then, by scaling \(\lambda\) with a factor of 1/r and completing the square in the last two terms of (9), the scaled augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;g,r)\) can be written as

$$\begin{aligned} \mathcal {L}_r(\mu ^h,v;g,r)&:= \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2\\&\quad + \beta [(1-\epsilon )\Vert v\Vert _1 + \epsilon TV_d^{\psi _{q}}(\mu ^h)] + \dfrac{r}{2}\Vert v - L\mu ^h - g\Vert _2^2 - \dfrac{r}{2}\Vert g\Vert _2^2, \end{aligned}$$

where \(g=\lambda /r\) is the scaled Lagrangian multiplier. To solve this optimization problem, the ADMM technique, which alternately minimizes over one variable with the others fixed, can be utilized. Specifically, the minimization of \(\mathcal {L}_r(\mu ^h,v;g,r)\) over each variable can be carried out via the following subproblems.

Subproblem-1: Firstly, for the variable \(\mu ^h\), the iterate at step \(k+1\) is obtained by minimizing \(\mathcal {L}_r(\mu ^h,v;g,r)\) with respect to \(\mu ^h\)

$$\begin{aligned} \mu ^{h,k+1}&= \mathop {\arg \min }\limits _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)\\&= \mathop {\arg \min }\limits _{\mu ^h} \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta \epsilon TV_d^{\psi _{q}}(\mu ^h) + \dfrac{r}{2}\Vert v^k - L\mu ^h - g^k\Vert _2^2 \end{aligned}$$

Subproblem-2: Secondly, for the variable v, the \((k+1)\)-th iterate is obtained by minimizing \(\mathcal {L}_r(\mu ^h,v;g,r)\) with respect to v

$$\begin{aligned} v^{k+1} = \mathop {\arg \min }\limits _{v}\mathcal {L}_r(\mu ^{h,k+1},v;g^k,r) = \mathop {\arg \min }\limits _{v} \beta (1-\epsilon )\Vert v\Vert _1 + \dfrac{r}{2}\Vert v - L\mu ^{h,k+1} - g^k\Vert _2^2 \end{aligned}$$

Subproblem-3: For the scaled Lagrangian multiplier g, the update is

$$\begin{aligned} g^{k+1} = g^k + L\mu ^{h,k+1} - v^{k+1} \end{aligned}$$
(10)

For Subproblem-1, since the functional \(TV_d^{\psi _{q}}(\mu ^h)\) is not differentiable, a common technique is to give the jump term a slight perturbation. In this paper, we use \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) to replace \(\left| \nabla \mu ^h|_e\right|\), where \(\varpi\) is a small positive parameter. To avoid confusion in the notation, we denote \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) by \(\left| \nabla \mu ^h|_e\right| _{\varpi }\) for short. It is worth mentioning that this replacement does not change the nonconvexity of \(TV_d^{\psi _{q}}(\mu ^h)\), and as \(\varpi\) tends to zero, the nonsmooth behavior is also retained.

The perturbed \(TV_d^{\psi _{q}}(\mu ^h)\) does not affect the existence of a solution to Subproblem-1, but owing to its nonconvexity, a unique global solution to Subproblem-1 cannot be guaranteed, which means we may only obtain a local minimizer. A natural way to find a stationary point of Subproblem-1 is to solve the equation \(\nabla _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)=0\), in which the gradient of \(\mathcal {L}_r(\mu ^h,v^k;g^k,r)\) with respect to the variable \(\mu ^h\) reads

$$\begin{aligned} \nabla _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)&= \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*\left( F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\right) \\&\quad +\beta \epsilon \mathcal {L}(\mu ^h)\mu ^{h}-rL^*\left( v^k-L\mu ^h-g^k\right) . \end{aligned}$$

Here, \(\mathcal {L}(\mu ^h)\mu ^h\) represents the gradient of \(TV_d^{\psi _{q}}(\mu ^h)\) with respect to \(\mu ^h\), and is computed as

$$\begin{aligned} \mathcal {L}(\mu ^h)\mu ^h = L^T\mathcal {K}\left( \left| \nabla \mu ^h|_e\right| _{\varpi }\right) L\;\mu ^h, \end{aligned}$$

where \(\mathcal {K}(\left| \nabla \mu ^h|_e\right| _{\varpi })\) is a diagonal matrix of size \(N_e\times N_e\), whose i-th entry on the diagonal is \(\dfrac{\psi _{q}^{\prime }(\left| L(i,:)\,\mu ^h\right| _{\varpi })}{\left| L(i,:)\,\mu ^h\right| _{\varpi }}\).
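In MATLAB form, with \(\psi _q(t)=|t|^q\) (so \(\psi _q^{\prime }(t)=qt^{q-1}\)) and \(\varpi\) written as vpi, these weights can be sketched as follows; the matrix L and vector mu are as in the earlier sketch:

```matlab
% Sketch: smoothed lagged-diffusivity weights for the TV_d^{psi_q} gradient.
s   = sqrt((L*mu).^2 + vpi);           % |grad mu|_varpi, one entry per edge
Kd  = spdiags(q*s.^(q-2), 0, Ne, Ne);  % psi_q'(s)./s for psi_q(t) = |t|^q
Lop = L' * Kd * L;                     % the operator L(mu); L(mu)*mu = Lop*mu
```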

With the above analysis and notations, the fixed point iteration is then

$$\begin{aligned} \mu ^{h,k+1} = \left( \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*F_i^{\prime }(\mu ^{h,k})+\beta \epsilon \mathcal {L}(\mu ^{h,k})+rL^*L\right) ^{-1}\left( \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*\left( F_i^{\prime }(\mu ^{h,k})\mu ^{h,k}-F_i(\mu ^{h,k})+M_i\right) +rL^*(v^k-g^k)\right) \end{aligned}$$

This fixed point iteration formula is equivalent to a quasi-Newton step in which the possibly negative definite second order term is dropped to ensure a descent direction12. From this point of view, \(\mu ^{h,k+1}\) can be updated by

$$\begin{aligned} \mu ^{h,k+1}= \mu ^{h,k} + \Delta \mu ^{h,k}, \end{aligned}$$
(11)

where \(\Delta \mu ^{h,k}=\nabla _2^{-1}\nabla _1\), with

$$\begin{aligned} \nabla _1&= \sum _{i=1}^{N_s} F_i^{\prime }(\mu ^{h,k})^*(M_i-F_i(\mu ^{h,k})) -\beta \epsilon \mathcal {L}(\mu ^{h,k})\mu ^{h,k} + rL^*(v^k - L\mu ^{h,k} - g^k) \\ \nabla _2&= \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*F_i^{\prime }(\mu ^{h,k}) +\beta \epsilon \mathcal {L}(\mu ^{h,k})+rL^*L. \end{aligned}$$
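In MATLAB form, with J the stacked Jacobian \([F_1^{\prime }(\mu ^{h,k});\dots ;F_{N_s}^{\prime }(\mu ^{h,k})]\), Fmu and Mvec the correspondingly stacked predictions and measurements, and Lop from the previous sketch, the update (11) can be sketched as below; the final box projection is one simple way to enforce \(\mu ^h\in K_h\) and is our addition:

```matlab
% Sketch: quasi-Newton update (11) for Subproblem-1.
g1 = J'*(Mvec - Fmu) - beta*ew*(Lop*mu) + r*L'*(v - L*mu - g);
g2 = J'*J + beta*ew*Lop + r*(L'*L);
mu = mu + g2 \ g1;                 % Delta mu = nabla_2^{-1} nabla_1
mu = min(max(mu, mu_lo), mu_hi);   % projection onto the box K_h (assumed)
```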

For Subproblem-2, the closed-form minimizer can be obtained by the shrinkage operator, and its component formula is as follows:

$$\begin{aligned} v_i^{k+1} =&\text {shrinkage} \left( g_i^k + L(i,:)\mu ^{h,k+1},\dfrac{\beta (1-\epsilon )}{r}\right) ,\quad \forall i\in {\{1,2,\cdots ,N_e\}}\nonumber \\&= \text {sign}\left( g_i^k+L(i,:)\mu ^{h,k+1}\right) \max \left( \left| g_i^k+L(i,:) \mu ^{h,k+1}\right| -\dfrac{\beta (1-\epsilon )}{r},0\right) \nonumber \\&= \left\{ \begin{array}{lr} g_i^k+L(i,:)\mu ^{h,k+1}-\dfrac{\beta (1-\epsilon )}{r}, & g_i^k+L(i,:)\mu ^{h,k+1}\ge \dfrac{\beta (1-\epsilon )}{r},\\ 0 , & |g_i^k+L(i,:)\mu ^{h,k+1}|<\dfrac{\beta (1-\epsilon )}{r}\\ g_i^k+L(i,:)\mu ^{h,k+1}+\dfrac{\beta (1-\epsilon )}{r}, & g_i^k+L(i,:)\mu ^{h,k+1}\le -\dfrac{\beta (1-\epsilon )}{r} \end{array} \right. \end{aligned}$$
(12)
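In MATLAB, the whole of Subproblem-2 therefore reduces to a one-line soft-shrinkage; a sketch with the same variable names as above:

```matlab
% Sketch: closed-form Subproblem-2 via componentwise soft-shrinkage, cf. (12).
shrink = @(t, kappa) sign(t) .* max(abs(t) - kappa, 0);
v = shrink(g + L*mu, beta*(1-ew)/r);
```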

The stopping criterion used in this paper is the relative residual error \(\mathcal {E}_{resi}\) between the true measurement matrix \(M^{true}\) and the synthetic measurement matrix \(M^{syn}\), which is defined as

$$\begin{aligned} \mathcal {E}_{resi}:= \dfrac{\Vert M^{true} - M^{syn}\Vert _{F}}{\Vert M^{true}\Vert _F} \end{aligned}$$

where \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm of a matrix, and \(M^{true}\) and \(M^{syn}\) are both matrices of size \(\textit{ns} \times \textit{nd}\), where \(\textit{ns}\) denotes the number of sources and \(\textit{nd}\) the number of detectors. The iteration is terminated when \(\mathcal {E}_{resi}\le \tau\), with \(\tau\) a small positive constant. The procedure of the proposed ADMM algorithm is summarized in the following Algorithm 1.
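For completeness, the stopping test amounts to the following two-line MATLAB sketch, where Mtrue and Msyn are the \(ns\times nd\) matrices above:

```matlab
% Sketch: relative residual stopping rule of Algorithm 1.
E_resi = norm(Mtrue - Msyn, 'fro') / norm(Mtrue, 'fro');
stop   = (E_resi <= tau);
```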

Algorithm 1
figure a

ADMM algorithm for problem \((P_1^{\prime })\).

Now, we discuss the convergence of Algorithm 1. Our problem \((P_1^{\prime })\) is a special case of Problem (3.1) in14, for which the matrices (or vectors) Z and D therein are zero in our problem, and the matrices \(\mathcal {B}\) and \(\mathcal {C}\) therein correspond to the identity matrix I and the matrix L in our problem, respectively. On the other hand, the possibly nonconvex, nonsmooth and non-Lipschitz function \(\Phi\) therein corresponds to \(\dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta \epsilon TV_d^{\psi _{q}}(\mu ^h)\), while the convex function \(\Psi\) corresponds to \(\beta (1-\epsilon )\Vert v\Vert _1\). With the above analysis, by a proof analogous to that in14, we obtain the following conclusion, which characterizes the cluster points of the iterative sequence.

Theorem 1

For any fixed nonconvexity parameter q and regularization parameter \(\beta\), suppose the Lagrangian penalty parameter r is chosen above a computable threshold, and that \(\{(\mu ^{h,k},v^k,g^k)\}\) is the iterative sequence generated by Algorithm 1. Then any cluster point \((\mu ^{h,*},v^*,g^*)\) of the sequence is a stationary point of problem \((P_1^{\prime })\).

Moreover, note that in \(\mathcal {L}_r(\mu ^h,v;g,r)\), the term \(\Vert v\Vert _1\) is a semialgebraic function, and the other terms are all real analytic functions. By the definition of the Kurdyka-Łojasiewicz (KL) inequality, the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;g,r)\), as a sum of real analytic and semialgebraic functions, is a KL function15 and satisfies the uniformized KL property16. With this, the whole iterative sequence generated by Algorithm 1 is convergent.

Remark

In Subproblem-1, we use a smooth approximation to the non-differentiable regularization term \(TV_d^{\psi _{q}}(\mu ^h)\) in order to facilitate the solution process. This approximation is necessary to handle the non-differentiable terms efficiently within the ADMM framework. However, two natural questions arise: 1) Does the smoothed approximation used for Subproblem-1 still satisfy the convergence conditions in14? 2) Can the solution based on the smoothed approximation converge to the solution of the problem in its original form? For the first concern, the approximation term \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) is nonconvex but smooth for fixed \(\varpi > 0\) and still satisfies the assumption in14, which ensures the convergence of the corresponding ADMM method. For the second concern, as \(\varpi\) approaches 0, the smoothed approximation will converge to the original non-differentiable regularization term \(TV_d^{\psi _{q}}(\mu ^h)\). Then, following a proof process similar to that in1, Theorem 3.17, it can be shown that the solution to the regularized problem with the smoothed approximation will also converge to the solution of the problem with the original non-differentiable regularization term.

Theorem 2

Suppose that \(\{(\mu ^{h,k},v^k,g^k)\}\) is generated by Algorithm 1. Then the iterative sequence converges to a stationary point of problem \((P_1^{\prime })\).

By tuning \(\epsilon\) and q in \(\mathcal {R}(\mu ;\epsilon ,\psi _{q})\) as in1, we can obtain a solution that approaches the one obtained with \(\mathcal {R}(\mu ^h)=TV_d^{\psi _{0-1}}(\mu ^h)\) regularization. Thus, we embed the ADMM algorithm into a graduated nonconvex strategy. The complete graduated nonconvex ADMM algorithm is summarized in Algorithm 2. In this procedure, the purpose of the graduated nonconvex strategy is to use the solution of a weakly nonconvex problem as the initial value for a more strongly nonconvex one.

Algorithm 2
figure b

Graduated nonconvex alternating direction method of multipliers (GNC-ADMM).

Through many experiments, we notice that even though the residual error does not decrease with the iteration steps in the first stage of the GNC strategy, i.e., the outer loop at \(\epsilon _0\), the GNC algorithm can still converge. On the other hand, if \(\tau\) were the same in every stage, the GNC algorithm might terminate already at the first stage. Hence, in Algorithm 2, the parameter \(\tau\) is updated with a rate \(\gamma\) in every stage.
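Since Algorithm 2 appears only as a figure, we also give a minimal MATLAB sketch of the outer loop. Here admm_stage is an assumed handle running Algorithm 1 for the current \((\epsilon ,q)\) until \(\mathcal {E}_{resi}\le \tau\) or the step budget is reached, the iterates mu, v, g are assumed initialized, and the \(\tau\)-update shown is one plausible reading of the description above:

```matlab
% Sketch of the graduated nonconvex outer loop (Algorithm 2).
ew = 0.2;  q = 1;  tau = tau0;       % initial stage: epsilon_0, q_0
while ew <= 1
    [mu, v, g, E_resi] = admm_stage(mu, v, g, ew, q, tau);
    ew  = ew + 0.2;                  % common difference: Delta epsilon
    q   = 0.1 * q;                   % common ratio: eta
    tau = gamma * E_resi;            % relax tolerance with rate gamma
end
```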

Simulations

General settings. In this section, to test the effectiveness of our proposed GNC-ADMM algorithm in edge preservation and artifact removal, we present numerical results on a 2D circular domain which contains one polygonal inclusion. The radius of the circle is 5 cm. The measurements are synthetically generated by solving the discretized forward problem on triangular meshes with the package RTE-2D-MATLAB developed by Gao6. The same number of sources and detectors are interleaved on the boundary of the domain with equal spacing. The angular space is discretized into 32 directions, which equally divide \([0,2\pi ]\), and the anisotropy factor is set to \(g=0.9\).

In all simulations, the background scattering and absorption coefficients are set to \(5\,cm^{-1}\) and \(0.1\,cm^{-1}\), respectively, and the corresponding coefficient values for the anomaly are twice the background. In this paper, the scattering coefficient is assumed to be known, and only the absorption coefficient needs to be reconstructed. The quality of the reconstruction is measured by three metrics: the relative \(l^2\) norm error \(\mathcal {E}_{l_2}\) and the relative total variation error \(\mathcal {E}_{tv}\), both between the true distribution and the reconstructed quantity, and the relative residual error \(\mathcal {E}_{resi}\) between the true measurements and the synthetic measurements corresponding to the reconstructed coefficients. The nonconvex energy function, satisfying the four conditions above, is chosen as \(\psi _{q}(t) = |t|^q\) with \(0<q<1\). In all simulations, the tiny perturbation parameter \(\varpi\) is set to \(10^{-10}\).
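In MATLAB, the first two metrics can be computed as below (\(\mathcal {E}_{resi}\) was sketched earlier); the precise form of \(\mathcal {E}_{tv}\) shown here is our assumption, one plausible reading of its definition using L from formula (7):

```matlab
% Sketch: reconstruction metrics (the E_tv form is our assumption).
E_l2 = norm(mu_rec - mu_true) / norm(mu_true);
E_tv = norm(L*(mu_rec - mu_true), 1) / norm(L*mu_true, 1);
```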

Based on the above settings, we carry out several simulations to validate the proposed GNC-ADMM algorithm. All the simulations are performed on a PC equipped with an Intel Core i7-8700 CPU at 3.2 GHz, 16.0 GB RAM, and Windows 10, and programmed in the Matlab R2018a environment.

Simulation 1: Reconstruction with different energy functions. It is interesting to compare the reconstruction results with those of our previous work1. In this simulation, we explore the ability of the proposed GNC-ADMM algorithm to preserve edges, by comparing the reconstructions with the convex TV energy function and the nonconvex \(TV^q\) energy function, as well as its ability to reconstruct values, by comparing with the double graduated nonconvex Gauss-Newton algorithm (GNC-GN) under the same graduated nonconvex strategy. The optimization of the TV and \(TV^q\) problems is solved with the Gauss-Newton iterative algorithm, denoted TVGN and TVqGN for short. Readers can refer to1 for more details about these two algorithms. For a better comparison, two single polygonal anomalous inclusions are designed: a hexagon and a square. For the hexagon case, the unstructured triangular mesh is composed of 1169 nodes and 2240 elements; for the square case, the numbers of nodes and elements are 2261 and 4384, respectively. 12 sources and 12 detectors are placed on the boundary of the circular area, yielding 144 measurements.

In the TVqGN and TVGN algorithms, the iteration is stopped when \(\mathcal {E}_{resi}<\tau\), where \(\tau\) is set to \(10^{-2}\) in this simulation. In the proposed GNC-ADMM algorithm and the GNC-GN algorithm, by contrast, \(\tau\) is not fixed. Since the residual error may not decrease in the first stage, we let the initial \(\tau\) be a small number, iterate some steps at the first stage, and keep the residual error of the last iteration as the new \(\tau\) for the next stage. The regularization parameter \(\beta\) plays an important role in the reconstruction. It is set to \(5\times 10^{-7}\) in TVGN and TVqGN, \(5\times 10^{-6}\) in GNC-ADMM, and \(5\times 10^{-7}\) and \(5\times 10^{-6}\) for the hexagon and square cases in GNC-GN, respectively. The nonconvex parameter q in TVqGN is \(1\times 10^{-4}\) in all cases. In this simulation, the first stage starts from \(\epsilon _0 = 0.2\) in GNC-ADMM and \(\epsilon _0=0\) in GNC-GN. The common difference is \(\Delta \epsilon =0.2\). The initial value of the nonconvex parameter is \(q_0=1\), and the common ratio is \(\eta = 0.1\). We let the iteration run 15 and 10 steps at the first stage for the hexagon and square cases, respectively. Since the edges and values are already reconstructed in the earlier stages, it is not necessary to iterate many steps in the last stage. Hence, the ratio \(\gamma\) is set to 0.9 at the last stage and 0.7 at the other stages in both the hexagon and square cases.

Reconstructed images of the absorption coefficient are exhibited in Fig. 1, together with the true distribution. The results show that GNC-ADMM keeps the shape and edges of the anomaly, with only slight artifacts in the upper right corner of the anomaly. Furthermore, under the same color bar range, GNC-ADMM also reconstructs the values of the anomaly well. The reconstructions using TVGN reveal visible blocky artifacts, although edges and values of the anomaly are preserved to some extent. In the reconstruction with TVqGN, the visible blocky artifacts are effectively removed, but some slight artifacts remain around the anomaly, and the shapes are not as well kept as with GNC-ADMM. The results with TVqGN indicate the advantages of the nonconvex energy function over the usual total variation energy function in artifact removal and edge preservation. For the low resolution hexagon case, the reconstruction using GNC-GN is comparable to that using GNC-ADMM, while for the high resolution square case, the reconstructed edges and values of the anomaly are not as accurate as those with GNC-ADMM.

Fig. 1
figure 1

Reconstruction images with different algorithms: (a) and (f) give the true distributions, together with 1D cross sections. (b) and (g): reconstruction with GNC-ADMM. (c) and (h) represent the results with GNC-GN. (d) and (i): reconstruction with TVqGN. (e) and (j) give the reconstruction with TVGN.

Similar findings are observed in the corresponding 1D cross sections in Fig. 2. TVGN produces staircase phenomena and presents offsets from the true location of the anomaly. Compared with the reconstruction using TVGN, TVqGN better preserves the piecewise constant effect, but it reveals some apparent oscillations on the upper and lower sides of the anomaly. These results are consistent with those in Fig. 1d and i.

Table 1 Metrics for different algorithms.
Fig. 2
figure 2

1D cross section of images recovered in Fig. 1. Left for hexagon, and right for square.

Quantitative metrics, including \(\mathcal {E}_{l_2}\), \(\mathcal {E}_{tv}\), \(\mathcal {E}_{resi}\), the number of iterations, and the CPU time in minutes when the stopping rule is satisfied, are shown in Table 1. The results show that GNC-ADMM and GNC-GN perform better than TVGN and TVqGN in terms of relative errors, with GNC-ADMM performing best. These results also support those in Figs. 1 and 2. On the other hand, since the nonconvex TVqGN algorithm can quickly recover both edges and values of the anomaly, TVqGN needs the fewest iteration steps. As a graduated nonconvex strategy, GNC-ADMM performs better than GNC-GN. Under the same setting of \(\gamma\), because the first stage of GNC-ADMM starts from a doubly convex structure, i.e., one convex total variation term solved with ADMM and another solved with GN, GNC-ADMM also needs fewer iterations than GNC-GN. The CPU time is calculated by multiplying the number of iteration steps by the time in seconds required for each iteration step. Through experiments, we observed that the most time-consuming steps are solving the forward problem and computing the Jacobian matrix. Apart from these, the computation time for the other steps is almost the same regardless of the algorithm employed. The time consumption of the forward solve and the Jacobian computation primarily depends on the resolution of the grid discretization and the number of source-detector pairs. In this simulation, the CPU time for the forward solve and Jacobian computation is 20 seconds per iteration for the hexagon case and 40 seconds per iteration for the square case.

Simulation 2: Comparison of reconstruction with GNC-ADMM and GNC-GN. In this simulation, we use the same hexagon and square cases as in Simulation 1, but with 8 sources and 8 detectors placed on the boundary, which makes the total number of measurements only 64, less than half of the 144 measurements in Simulation 1. We compare the reconstructions with GNC-ADMM and GNC-GN under the same stage parameter settings and fewer measurements.

The results of this simulation reveal that both GNC-ADMM and GNC-GN can recover edges and values of the anomaly well. More significantly, GNC-ADMM performs better than GNC-GN in relative errors and iteration counts. In fact, in the first stage of GNC-GN, the initial stage parameter \(\epsilon _0\) is set to zero, which means that total variation is the only energy function, and the corresponding optimization problem is solved with the Gauss-Newton iterative method. Compared with GNC-GN, the first stage of GNC-ADMM starts from nonzero \(\epsilon _0=0.2\) and \(q_0=1\), which means the total variation energy function is divided into two parts with weights 0.8 and 0.2: the former is treated as the \(l_1\) norm of the gradient and solved with a soft-shrinkage operator, while the latter is solved using lagged diffusive fixed point iteration. This difference allows GNC-ADMM to satisfy the stopping rule earlier, as Table 1 exhibits.

Here, it is necessary to explain why we set different initial stage parameters \(\epsilon _0\) in GNC-ADMM and GNC-GN (i.e., \(\epsilon _0 = 0.2\) in GNC-ADMM and \(\epsilon _0 = 0\) in GNC-GN). Our extensive experimental results indicate that reconstruction using ADMM alone on the gradient-based sparse regularization usually fails to converge, and this is exactly the case \(\epsilon _0 = 0\) in the first stage of the GNC-ADMM method. Therefore, by setting \(\epsilon _0 = 0.2\) and \(q_0 = 1\), even in the first stage of GNC-ADMM, the non-convergence caused by applying ADMM solely to the sparse regularization of the gradient can be mitigated by the Gauss-Newton method used for the total variation regularization in Subproblem-1.

In GNC-ADMM and GNC-GN, the common difference of \(\epsilon\) is set to 0.2 and the common ratio of q to 0.1 in both algorithms. Under this setting, for GNC-ADMM, the energy function of the first stage is the weighted decomposition of total variation, and the energy function of the next stage is the weighted combination of total variation and nonconvex total variation. To avoid excessive reconstruction at the first stage, we let the iteration run 10 steps at the first stage in each experiment of this simulation. It is worth mentioning that the number of steps at the first stage is chosen empirically from the reconstruction results of total variation regularization. The regularization parameter \(\beta\) is also selected empirically; it is set to \(5\times 10^{-7}\) in each experiment of this simulation. The stopping rule and the setting of the tolerance ratio \(\gamma\) are the same as in Simulation 1, i.e., 0.9 at the last stage and 0.7 at the other stages.

Fig. 3
figure 3

Comparison of reconstructions using GNC-ADMM and GNC-GN with 8 sources and 8 detectors: (a) and (f) give the reconstruction with GNC-ADMM. (b) and (g) give the reconstruction with GNC-GN. (c) and (h) represent the comparison of 1D cross section images, where the cross line is the same as in Simulation 1. (d) and (i) give the evolution of \(\mathcal {E}_{l_2}\). (e) and (j) give the evolution of \(\mathcal {E}_{tv}\).

Reconstruction results are presented in Fig. 3. Compared with the true distributions shown in Fig. 1a, we can observe that GNC-ADMM performs better than GNC-GN in both edge preservation and value reconstruction, even when the measurements are greatly reduced. Specifically, from Fig. 3b and g, the reconstructed values of the anomaly with GNC-GN are smaller than the exact value, and the contour line of the recovered edges is a little larger than the exact contour line.

These results are also supported by Fig. 3c and h, where the comparison of the 1D cross sections of the images recovered with GNC-ADMM and GNC-GN is presented. In this comparison, the reconstruction with GNC-ADMM and the true distribution on the cross section are almost exactly the same in both values and shapes. However, as the red lines show, the height of the bulge in the 1D cross-section reconstruction with GNC-GN is lower than both the true distribution and that of GNC-ADMM, and the width of the bulge is a little wider than it really is. Fig. 3c and h reflect that GNC-ADMM reconstructs more accurately than GNC-GN under the same parameter settings.

On the other hand, from the evolution of \(\mathcal {E}_{l_2}\) and \(\mathcal {E}_{tv}\) versus iterations for the different algorithms in Fig. 3d,i and Fig. 3e,j, both GNC-ADMM and GNC-GN converge, but the convergence speed of GNC-ADMM is faster and its relative errors are smaller.

Simulation 3: Reconstruction with GNC-ADMM under different noise levels. The results of the above two simulation experiments demonstrate the effectiveness of our proposed GNC-ADMM method in preserving the edges of the anomaly. At the same time, we noticed that under the same graduated nonconvex strategy, the computational efficiency of the ADMM-based optimization algorithm is superior to that of the Gauss-Newton based one. In this simulation, we present the reconstructions with GNC-ADMM under different noise levels.

The true solution and the reconstructions are presented in Fig. 4. The unstructured triangular mesh is composed of 1305 nodes and 2496 elements. The regularization parameters under the different noise levels are all taken as \(5\times 10^{-5}\), the initial values \(\epsilon _0\) and \(q_0\) are taken as 0.2 and 1, and the stopping rule and the iteration step setting in the first stage are the same as those in Simulation 1. Figure 4a presents the true solution. Figure 4b–e exhibit the reconstructions under noise levels 0%, 0.01%, 0.1%, and 1%, respectively. The relative \(l_2\) norm errors corresponding to each noise level are 0.1497, 0.1498, 0.1515 and 0.2407, respectively, and the iteration counts are 36, 36, 35, and 31, respectively. Since the four noise cases share the same inclusion model and mesh, the CPU time per iteration step is almost the same, about 27 seconds. From Fig. 4, as the noise level decreases, the reconstruction converges to the true solution; when the noise level reaches 1%, the reconstruction results are severely distorted.

Fig. 4
figure 4

Reconstructions using GNC-ADMM with 12 sources and 12 detectors under different noise levels: (a) true solution; (b) 0% noise; (c) 0.01% noise level; (d) 0.1% noise level; (e) 1% noise level.

Conclusions

In this paper, we present a graduated nonconvex alternating direction method of multipliers (GNC-ADMM) for optical coefficient reconstruction in DOT with a nonconvex and nonsmooth penalty function. Our theoretical analysis shows the convergence of the sequence generated by ADMM to a stationary point of the minimization problem \((P_1^{\prime })\). Numerical experiments illustrate that, compared to GNC-GN, TVqGN, and TVGN, GNC-ADMM effectively removes artifacts in the background and outperforms GNC-GN in keeping edges and values of the anomaly, especially with fewer iteration steps and fewer measurements. Some parameters, such as the tolerance ratio and regularization parameter, are selected empirically; the lack of theoretical quantitative analysis for the parameter selection in GNC-ADMM is a major challenge for future work. We also aim to extend the framework toward simultaneous reconstruction of both absorption and scattering coefficients, potentially incorporating multi-wavelength data or structural priors to better stabilize the inversion.