Adjoint state method explained
The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]
The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]
Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.
The adjoint method formulates the gradient of a function towards its parameters in a constraint optimization form. By using the dual form of this constraint optimization problem, it can be used to calculate the gradient very fast. A nice property is that the number of computations is independent of the number of parameters for which you want the gradient.The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]
is used.
When the initial problem consists of calculating the product
and
must satisfy
, the dual problem can be realized as calculating the product, where
must satisfy
. And
is called the adjoint state vector.
General case
The original adjoint calculation method goes back to Jean Cea,[6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.
For a state variable
, an optimization variable
, an objective functional
is defined. The state variable
is often implicitly dependent on
through the (direct) state equation
(usually the
weak form of a
partial differential equation), thus the considered objective is
. Usually, one would be interested in calculating
using the
chain rule:
\nablaj(v)=\nablavJ(uv,v)+\nablauJ(uv)\nablavuv.
Unfortunately, the term
is often very hard to differentiate analytically since the dependance is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of
, the problem
has an associate Lagrangian functional
l{L}:l{U} x l{V} x l{U}\toR
defined by
l{L}(u,v,λ)=J(u,v)+\langleDv(u),λ\rangle,
where
is a
Lagrange multiplier or
adjoint state variable and
is an
inner product on
. The method of Lagrange multipliers states that a solution to the problem has to be a
stationary point of the lagrangian, namely
\begin{cases}
dul{L}(u,v,λ;\deltau)=duJ(u,v;\deltau)+\langle\deltau,D
=0&\forall\deltau\inl{U},\\
dvl{L}(u,v,λ;\deltav)=dvJ(u,v;\deltav)+\langledvDv(u;\deltav),λ\rangle=0&\forall\deltav\inl{V},\\
dλl{L}(u,v,λ;\deltaλ)=\langleDv(u),\deltaλ\rangle=0 &\forall\deltaλ\inl{U},
\end{cases}
where
is the
Gateaux derivative of
with respect to
in the direction
. The last equation is equivalent to
, the state equation, to which the solution is
. The first equation is the so-called adjoint state equation,
\langle\deltau,D
=-duJ(uv,v;\deltau) \forall\deltau\inl{U},
because the operator involved is the adjoint operator of
,
. Resolving this equation yields the adjoint state
.The gradient of the quantity of interest
with respect to
is
\langle\nablaj(v),\deltav\rangle=dvj(v;\deltav)=dvl{L}(uv,v,λv;\deltav)
(the second equation with
and
), thus it can be easily identified by subsequently resolving the direct and adjoint state equations. The process is even simpler when the operator
is
self-adjoint or symmetric since the direct and adjoint state equations differ only by their right-hand side.
Example: Linear case
In a real finite dimensional linear programming context, the objective function could be
J(u,v)=\langleAu,v\rangle
, for
,
and
, and let the state equation be
, with
and
.
The lagrangian function of the problem is
l{L}(u,v,λ)=\langleAu,v\rangle+\langleBvu-b,λ\rangle
, where
.
The derivative of
with respect to
yields the state equation as shown before, and the state variable is
. The derivative of
with respect to
is equivalent to the adjoint equation, which is, for every
,
du[\langleBv ⋅ -b,λ\rangle](u;\deltau)=-\langleA\topv,\deltau\rangle\iff\langleBv\deltau,λ\rangle=-\langleA\topv,\deltau\rangle\iff\langle
λ+A\topv,\deltau\rangle=0\iff
λ=-A\topv.
Thus, we can write symbolically
. The gradient would be
\langle\nablaj(v),\deltav\rangle=\langleAuv,\deltav\rangle+\langle\nablavBv:λv ⊗ uv,\deltav\rangle,
where
is a third order
tensor,
is the
dyadic product between the direct and adjoint states and
denotes a double
tensor contraction. It is assumed that
has a known analytic expression that can be differentiated easily.
Numerical consideration for the self-adjoint case
If the operator
was self-adjoint,
, the direct state equation and the adjoint state equation would have the same left-hand side. In the goal of never inverting a matrix, which is a very slow process numerically, a
LU decomposition can be used instead to solve the state equation, in
operations for the decomposition and
operations for the resolution. That same decomposition can then be used to solve the adjoint state equation in only
operations since the matrices are the same.
See also
References
- Pollini. Nicolò. Lavan. Oren. Amir. Oded. 2018-06-01. Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers. Structural and Multidisciplinary Optimization. en. 57. 6. 2273–2289. 10.1007/s00158-017-1858-2. 125712091. 1615-1488.
- Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud Neural Ordinary Differential Equations Available online
- Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications." Geophysical Journal International, 2006, 167(2): 495-503. free access on GJI website
- McNamara . Antoine . Treuille . Adrien . Popović . Zoran . Stam . Jos . Fluid control using the adjoint method . 28 October 2022 . https://web.archive.org/web/20220129011505/https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf . 29 January 2022 . ACM Transactions on Graphics . 23 . 3. 449–456 . 10.1145/1015706.1015744 . August 2004 . live.
- Web site: Lundvall . Johan . Data Assimilation in Fluid Dynamics using Adjoint Optimization . . 28 October 2022 . https://web.archive.org/web/20221009083111/http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf . 9 October 2022 . Sweden . en . 2007 . live.
- Cea. Jean. 1986. Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique. fr. 20. 3. 371–402. 10.1051/m2an/1986200303711 . free.
External links