Contents
- Contents
- Barzilai Borwein Method
- Gradient Descent with Fixed Step Size
- Lipschitz Approximation (Malitsky & Mishchenko)
- Weighted Norm Damping Gradient Method (WNGrad)
- Nesterov's Accelerated Gradient Descent
- Gradient Descent with Diminishing Step Size
- Gradient Descent with Backtracking
- Gradient Descent with Non-monotone Line Search
- Gradient Descent with Non-sequential Armijo Line Search
- Line search Helper Functions
- Index
Barzilai Borwein Method
OptimizationMethods.barzilai_borwein_gd — Functionbarzilai_borwein_gd(optData::BarzilaiBorweinGD{T}, progData::P
where P <: AbstractNLPModel{T, S}) where {T,S}Implements gradient descent with Barzilai-Borwein step size and applies the method to the optimization problem specified by progData.
Reference(s)
Barzilai and Borwein. "Two-Point Step Size Gradient Methods". IMA Journal of Numerical Analysis.
Method
Given iterates $\lbrace x_0,\ldots,x_k\rbrace$, the iterate $x_{k+1}$ is equal to $x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k$ is one of two versions.
Long Step Size Version (if optData.long_stepsize==true)
If $k=0$, then $\alpha_0$ is set to optData.init_stepsize. For $k>0$,
\[\alpha_k = \frac{ \Vert x_k - x_{k-1} \Vert_2^2}{(x_k - x_{k-1})^\intercal (\nabla f(x_k) - \nabla f(x_{k-1}))}.\]
Short Step Size Version (if optData.long_stepsize==false)
If $k=0$, then $\alpha_0$ is set to optData.init_stepsize. For $k>0$,
\[\alpha_k = \frac{(x_k - x_{k-1})^\intercal (\nabla f(x_k) - \nabla f(x_{k-1}))}{\Vert \nabla f(x_k) - \nabla f(x_{k-1})\Vert_2^2}.\]
Arguments
optData::BarzilaiBorweinGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.BarzilaiBorweinGD — TypeBarzilaiBorweinGD{T} <: AbstractOptimizerData{T}A structure for storing data about gradient descent using the Barzilai-Borwein step size, and the progress of its application on an optimization problem.
Fields
name:String, name of the solver for reference.init_stepsize::T, initial step size to start the method.long_stepsize::Bool, flag for step size; if true, use the long version of Barzilai-Borwein. If false, use the short version.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.iter_diff::Vector{T}, a buffer for storing differences between subsequent iterate values that are used for computing the step sizegrad_diff::Vector{T}, a buffer for storing differences between gradient values at adjacent iterates, which is used to compute the step sizeiter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry correpsonds to the gradient norm at iterationk.stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved atiter_hist[stop_iteration+1].
Constructors
BarzilaiBorweinGD(::Type{T}; x0::Vector{T}, init_stepsize::T,
long_stepsize::Bool, threshold::T, max_iterations::Int) where {T}Constructs the struct for the Barzilai-Borwein optimization method
Arguments
T::DataType, specific data type used for calculations.
Keyword Arguments
x0::Vector{T}, initial point to start the solver at.init_stepsize::T, initial step size used for the first iteration.long_stepsize::Bool, flag for step size; if true, use the long version of Barzilai-Borwein, if false, use the short version.threshold::T, gradient threshold. If the norm gradient is below this, then iteration is terminated.max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Fixed Step Size
OptimizationMethods.fixed_step_gd — Functionfixed_step_gd(optData::FixedStepGD{T}, progData<:AbstractNLPModel{T,S})
where {T,S}Implements fixed step-size gradient descent for the desired optimization problem specified by progData.
Method
The iterates are updated according to the procedure
\[x_{k+1} = x_k - \alpha ∇f(x_k),\]
where $\alpha$ is the step size, $f$ is the objective function, and $∇f$ is the gradient function of $f$.
Arguments
optData::FixedStepGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.FixedStepGD — TypeFixedStepGD{T} <: AbstractOptimizerData{T}A structure for storing data about fixed step-size gradient descent, and the progress of its application on an optimization problem.
Fields
name::String, name of the solver for referencestep_size::T, the step-size selection for the optimization procedurethreshold::T, the threshold on the norm of the gradient to induce stoppingmax_iterations::Int, the maximum allowed iterationsiter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry correpsonds to the gradient norm at iterationkstop_iteration, iteration number that the algorithm stopped at. Iterate numberstop_iterationis produced.
Constructors
FixedStepGD(::Type{T}; x0::Vector{T}, step_size::T, threshold::T,
max_iterations::Int) where {T}Constructs the struct for the optimizer.
Arguments
T::DataType, specific data type for the calculations
Keyword Arguments
x0::Vector{T}, the initial iterate for the optimizersstep_size::T, the step size of the optimizerthreshold::T, the threshold on the norm of the gradient to induce stoppingmax_iterations::Int, the maximum number of iterations allowed
Lipschitz Approximation (Malitsky & Mishchenko)
OptimizationMethods.lipschitz_approximation_gd — Functionlipschitz_approximation_gd(optData::FixedStepGD{T}, progData::P where P
<: AbstractNLPModel{T, S}) where {T, S}Implements gradient descent with adaptive step sizes formed through a lipschitz approximation for the desired optimization problem specified by progData.
This method is designed for convex optimization problems.
References(s)
Method
The iterates are updated according to the procedure,
\[x_{k+1} = x_{k} - \alpha_k \nabla f(x_{k}),\]
where $\alpha_k$ is the step size and $\nabla f$ is the gradient function of the objective function $f$.
The step size is computed depending on $k$. When $k = 0$, the step size is set to optData.init_stepsize. When $k > 0$,
\[\alpha_k = \min\left\lbrace \sqrt{1 + \theta_{k-1}}\alpha_{k-1}, \frac{\Vert x_k - x_{k-1} \Vert}{\Vert \nabla f(x_k) - \nabla f(x_{k-1})\Vert} \right\rbrace,\]
where $\theta_0 = \inf$ and $\theta_k = \alpha_k / \alpha_{k-1}$.
Arguments
optData::LipschitzApproxGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.LipschitzApproxGD — TypeLipschitzApproxGD{T} <: AbstractOptimizerData{T}A structure for storing data about adaptive gradient descent using a Lipschitz Approximation scheme (AdGD), and the progress of its application on an optimization problem.
Fields
name::String, name of the solver for referenceinit_stepsize::T, the initial step size for the methodprev_stepsize::T, step size used atiter - 1wheniter > 1.theta::T, element used in the computation of the step size. See the referenced paper for more information.lipschitz_approximation::T, help the lipschitz approximation used in the computation of the step size. See the referenced paper for more information.threshold::T, the threshold on the norm of the gradient to induce stoppingmax_iterations::Int64, the maximum allowed iterationsiter_diff::Vector{T}, a buffer for storing differences between subsequent iterate values that are used for computing the step sizegrad_diff::Vector{T}, a buffer for storing differences between gradient values at adjacent iterates, which is used to compute the step sizeiter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry corresponds to the gradient norm at iterationkstop_iteration::Int64, iteration number that the algorithm stopped at. Iterate numberstop_iterationis produced.
Constructors
LipschitzApproxGD(::Type{T}; x0::Vector{T}, init_stepsize::T, threshold::T,
max_iterations::Int) where {T}Constructs the struct for the optimizer.
Arguments
T::DataType, specific data type for the calculations
Keyword Arguments
x0::Vector{T}, the initial iterate for the optimizersinit_stepsize::T, the initial step size for the methodthreshold::T, the threshold on the norm of the gradient to induce stoppingmax_iterations::Int, the maximum number of iterations allowed
Weighted Norm Damping Gradient Method (WNGrad)
OptimizationMethods.weighted_norm_damping_gd — Functionweighted_norm_damping_gd(optData::WeightedNormDampingGD{T},
progData::P where P <: AbstractNLPModel{T, S}) where {T, S}Method that implements gradient descent with weighted norm damping step size using the specifications in optData on the problem specified by progData.
Reference
Wu, Xiaoxia et. al. "WNGrad: Learn the Learning Rate in Gradient Descent". arxiv, https://arxiv.org/abs/1803.02865
Method
Let $\theta_k$ be the $k^{th}$ iterate, and $\alpha_k$ be the $k^{th}$ step size. The optimization method generate iterates following
\[\theta_{k + 1} = \theta_{k} - \alpha_k \nabla f(\theta_k),\]
where $\nabla f$ is the gradient of the objective function $f$.
The step size depends on the iteration number $k$. For $k = 0$, the step size $\alpha_0$ is the reciprocal of optData.init_norm_damping_factor. For $k > 0$, the step size is iteratively updated as
\[\alpha_k = \left[ \frac{1}{\alpha_{k-1}} + \Vert\dot F(\theta_k)\Vert_2^2 \alpha_{k-1} \right]^{-1}.\]
When $\alpha_0 < \Vert\dot F(\theta_0)\Vert_2^{-1}$ and a globally Lipschitz smooth objective function is used, then the method is guaranteed to find an $\epsilon$-stationary point. It is recommended then that optData.init_norm_damping_factor exceed $\Vert\dot F(\theta_0)\Vert_2$.
Arguments
optData::WeightedNormDampingGD{T}, specification for the optimization algorithm.progData::P where P <: AbstractNLPModel{T, S}, specification for the problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.WeightedNormDampingGD — TypeWeightedNormDampingGD{T} <: AbstractOptimizerData{T}A mutable struct that represents gradient descent using the weighted-norm damping step size. It stores the specification for the method and records values during iteration.
Fields
name::String, name of the optimizer for recording purposesinit_norm_damping_factor::T, initial damping factor. This value's reciprocal will be the initial step size.threshold::T, norm gradient tolerance condition. Induces stopping when norm is at mostthreshold.max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.iter_hist::Vector{Vector{T}}, store the iterate sequence as the algorithm progresses. The initial iterate is stored in the first position.grad_val_hist::Vector{T}, stores the norm gradient values at each iterate. The norm of the gradient evaluated at the initial iterate is stored in the first position.stop_iteration::Int64, the iteration number the algorithm stopped on. The iterate that induced stopping is saved atiter_hist[stop_iteration + 1].
Constructors
WeightedNormDampingGD(::Type{T}; x0::Vector{T}, init_norm_damping_factor::T,
threshold::T, max_iterations::Int64) where {T}Constructs an instance of type WeightedNormDampingGD{T}.
Arguments
T::DataType, type for data and computation
Keyword Arguments
x0::Vector{T}, initial point to start the optimization routine. Saved initer_hist[1].init_norm_damping_factor::T, initial damping factor, which will correspond to the reciprocoal of the initial step size.threshold::T, norm gradient tolerance condition. Induces stopping when norm at mostthreshold.max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
Nesterov's Accelerated Gradient Descent
OptimizationMethods.nesterov_accelerated_gd — Functionnesterov_accelerated_gd(optData::NesterovAcceleratedGD{T}, progData::P
where P <: AbstractNLPModel{T, S}) where {T, S}Implements Nesterov's Accelerated Gradient Descent as specified by optData on the problem specified by progData.
This algorithm is designed for convex problems.
Reference(s)
- See Algorithm 1 of Li et. al. "Convex and Non-convex Optimization Under Generalized Smoothness". arxiv, https://arxiv.org/abs/2306.01264.
- See line-search based approach in Nesterov, Yurii. 1983. “A Method for Solving the Convex Programming Problem with Convergence Rate O(1/K^2).” Proceedings of the USSR Academy of Sciences 269:543–47.
Method
Let the objective function be denoted by $F(\theta)$ and $\nabla F( \theta)$ denote its gradient function. Given $\theta_0$ and a step size $\alpha$ (equal to optData.step_size), the method produces five sequences. At $k=0$,
\[\begin{cases} B_0 &= 0 \\ z_0 &= \theta_0 \\ \Delta_0 & = 1 \\ y_0 &= \theta_0; \end{cases}\]
and, for $k\in\mathbb{N}$,
\[\begin{cases} B_{k} &= B_{k-1} + \Delta_{k-1} \\ \theta_k &= y_{k-1} - \alpha \nabla F(y_{k-1}) \\ z_k &= z_{k-1} - \alpha\Delta_{k-1}\nabla F(y_{k-1}) \\ \Delta_k &= \frac{1}{2}\left( 1 + \sqrt{4 B_{k} + 1} \right) \\ y_k &= \theta_k + \frac{\Delta_{k}}{B_k + \Delta_k + \alpha^{-1}} (z_k - \theta_k). \end{cases}\]
The iterate sequence of interest is $\lbrace \theta_k \rbrace$.
Arguments
optData::NesterovAcceleratedGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.NesterovAcceleratedGD — TypeNesterovAcceleratedGD{T} <: AbstractOptimizerDataA structure that represents Nesterov Accelerated Gradient Descent. Stores variables related to the method, and tracks quantities as the algorithm progresses.
Fields
name:String, name of the solver for reference.step_size::T, step size used in the method.z::Vector{T}, buffer array for auxiliary iterate sequencey::Vector{T}, buffer array for convex combination of iterate and auxiliary sequenceB::T, auxiliary quadratic scaling term for computing acceleration weights and step size.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry corresponds to the gradient norm at iterationk.stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved atiter_hist[stop_iteration+1].
Constructors
NesterovAcceleratedGD(::Type{T}; x0::Vector{T}, step_size::T,
threshold::T, max_iterations::Int64) where {T}Arguments
T::DataType, specific data type used for calculations.
Keyword Arguments
x0::Vector{T}, initial point to start the solver at.step_size::T, step size used in the method.threshold::T, gradient threshold. If the norm gradient is below this, then iteration is terminated.max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Diminishing Step Size
OptimizationMethods.DiminishingStepGD — TypeDiminishingStepGD{T} <: AbstractOptimizerData{T}A structure for storing data about gradient descent using diminishing step sizes, and recording the progress of its application on an optimization problem.
Fields
name::String, name of the solver for reference.step_size_function::Function, step size function. Should take the iteration number and return the step size for that iteration.step_size_scaling::T, factor that is multipled with the amount of the step size function.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry correpsonds to the gradient norm at iterationk.stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved atiter_hist[stop_iteration+1].
step_size_function should take in two arguments, the data type for computation T and the iteration number. For example, calling step_size_function(Float64, 1) should return the step size as a Float64 for the iteration 1.
Constructors
DiminishingStepGD(::Type{T}; x0::Vector{T}, step_size_function::Function,
step_size_scaling::T, threshold::T, max_iterations::Int)Constructs the struct for the diminishing step size gradient descent method.
Arguments
T::DataType, specific data type used for calculations.
Keyword Arguments
x0::Vector{T}, initial point to start the solver at.step_size_function::Function, step size function. Should take the iteration number and return the step size for that iteration.step_size_scaling::T, factor that is multipled with the amount of the step size function.threshold::T, gradient threshold. If the norm gradient is below this, then iteration is terminated.max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
OptimizationMethods.diminishing_step_gd — Functiondiminishing_step_gd(optData::DiminishingStepGD{T}, progData::P
where P <: AbstractNLPModel{T, S}) where {T, S}Implements gradient descent with diminishing step sizes and applies the method to the optimization problem specified by progData.
Reference(s)
Bertsekas, Dimitri. "Nonlinear Programming". 3rd Edition, Athena Scientific, Chapter 1.
Method
Given iterates $\lbrace x_0,\ldots,x_k\rbrace$, the iterate $x_{k + 1}$ is equal to $x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k$ is equal to optData.step_size_scaling * optData.step_size_function(T, k).
The step size function should satisfy several conditions. First, $\alpha_k > 0$ for all $k$. Second, $\lim_{k \to \infty} \alpha_k = 0.$ Finally, $\sum_{k=0}^{\infty} \alpha_k = \infty.$ See Patel and Berahas (2024) for details.
Arguments
optData::DiminishingStepGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
Below is a list of step size functions that are in the library. The step size sequence generated by these functions, $\lbrace \alpha_k \rbrace$ satisfies $\alpha_k > 0$, $\lim_{k \to\infty} \alpha_k = 0$ and $\sum_{k=0}^\infty \alpha_k = \infty$.
OptimizationMethods.inverse_k_step_size — Functioninverse_k_step_size(::Type{T}, k::Int64)Return the inverse of k.
Method
The step size sequence generated for $k \in \mathbb{N}$ is
\[ \alpha_k = \frac{1}{k+1},\]
when using this method.
Arguments
T::Type, data type that the computations are done in.k::Int64, index of the step size needed.
Returns
Returns a number of type T.
OptimizationMethods.inverse_log2k_step_size — Functioninverse_log2k_step_size(::Type{T}, k::Int64) where {T}Method
The step size sequence generated when using this method is
\[ \alpha_k = \frac{1}{\lfloor \log_2(k+1) + 1 \rfloor}\]
for $k \in \mathbb{N}$.
Arguments
T::Type, data type that the computation are done in.k::Int64, index of the step size needed.
Returns
Returns a number of type T.
OptimizationMethods.stepdown_100_step_size — Functionstepdown_100_step_size(::Type{T}, k::Int64) where {T}Method
The step size sequence generated is
\[ \alpha_k = \frac{1}{2^i}\]
for $k \in [ (2^i-1)100, (2^{i+1}-1)100)$.
Arguments
T::Type, data type that the computation are done in.k::Int64, index of the step size needed.
Returns
Returns a number of type T.
Gradient Descent with Backtracking
OptimizationMethods.backtracking_gd — Functionbacktracking_gd(optData::BacktrackingGD{T},
progData::P where P <: AbstractNLPModel{T, S}) where {T, S}Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. The $k^{th}$ iterate is generated as $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq F(\theta_{k-1}) - \rho\delta^t\alpha ||\dot F(\theta_{k-1})||_2^2,\]
where $||\cdot||_2$ is the L2-norm.
Theoretically, there exists such a $t$, but it can be made arbitrarily large. Therefore, the line search procedure stops searching after optData.line_search_max_iteration. The current implementation terminates the procedure if the backtracking condition is not satisfied, and returns the previous iterate.
Arguments
optData::BarzilaiBorweinGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.BacktrackingGD — TypeBacktrackingGD{T} <: AbstractOptimizerData{T}Mutable sturcture representing gradient descent using backtracking. It also stores and keeps track of values during the optimization routine.
Fields
name:String, name of the solver for reference.α::T, initial step size used for backtracking.δ::T, backtracking decreasing factor applied toαwhen line search criterion is not satisfied.ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.line_search_max_iteration::Int64, maximum allowable iterations for the backtracking procedure.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry correpsonds to the gradient norm at iterationk.stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved atiter_hist[stop_iteration+1].
Constructors
BacktrackingGD(::Type{T}; x0::Vector{T}, α::T, δ::T, ρ::T,
line_search_max_iteration::Int64, threshold::T, max_iteration::Int64)
where {T}Returns a struct of type BacktrackingGD{T} with all the field initialized to either initialized to the values given, or their default values.
Arguments
T::DataType, specific data type used for calculations.
Keyword Arguments
x0::Vector{T}, initial point to start the solver at.α::T, initial step size used for backtracking.δ::T, backtracking decreasing factor applied toαwhen line search criterion not satisfied.ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.line_search_max_iteration::Int64, maximum allowable iterations for the backtracking procedure.threshold::T, gradient threshold. If the norm gradient is below this, then iteration is terminated.max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Non-monotone Line Search
Fixed Step
OptimizationMethods.fixed_step_nls_maxval_gd — Functionfixed_step_nls_maxval_gd(optData::FixedStepNonmonLSMaxValG{T},
progData::P where P <: AbstractNLPModel{T, S}) where {T, S}Implementation of gradient descent with non-monotone line search using the maximum value of a fixed number of previous objective values on an optimization problem specified by progData. This implementation initializes the line search procedure with optData.α every iteration.
Reference(s)
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. The $k^{th}$ iterate is generated as $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq \max_{\max(0, k-M) \leq j < k} F(\theta_{k-1}) - \rho\delta^t\alpha||\dot F(\theta_{k-1})||_2^2,\]
where $||\cdot||_2$ is the L2-norm, and $M \in \mathbb{N}_{>0}$.
Arguments
optData::FixedStepNonmonLSMaxValG{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.FixedStepNonmonLSMaxValG — TypeFixedStepNonmonLSMaxValGD{T} <: AbstractOptimizerData{T}Mutable struct that represents gradient descent using non-monotone line search where the initial step size for line search is fixed. This struct also keeps track of values during the optimization procedure implemented in fixed_step_nls_maxval_gd.
Fields
name::String, name of the solver for reference.α::T, the initial step size for line search.δ::T, backtracking decreasing factor applied toαwhen the line search criterion is not satisfiedρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.window_size::Int64, number of previous objective values that are used to construct the reference objective value for the line search criterion.line_search_max_iteration::Int64, maximum number of iterations for line search.objective_hist::CircularVector{T, Vector{T}}, buffer array of sizewindow_sizethat storeswindow_sizeprevious objective values.max_value::T, maximum value ofobjective_hist. This is the reference objective value used in the line search procedure.max_index::Int64, index of the maximum value that corresponds to the reference objective value.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration0). Thek+1entry corresponds to the iterate at iterationk.grad_val_hist::Vector{T}, a vector for storingmax_iterations+1gradient norm values. The first entry corresponds to iteration0. Thek+1entry correpsonds to the gradient norm at iterationk.stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved atiter_hist[stop_iteration+1].
Constructors
FixedStepNonmonLSMaxValG(::Type{T}; x0::Vector{T}, α::T, δ::T, ρ::T,
window_size::Int64, line_search_max_iteration::Int64,
threshold::T, max_iterations::Int64) where {T}Arguments
T::DataType, specific data type used for calculations.
Keyword Arguments
α::T, the initial step size for line search.δ::T, backtracking decreasing factor applied toαwhen the line search criterion is not satisfiedρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correpsond to stricter descetn conditions, and smaller values correspond to looser descent conditions.window_size::Int64, number of previous objective values that are used to construct the reference value for the line search criterion.line_search_max_iteration::Int64, maximum number of iterations for line search.threshold::T, gradient threshold. If the norm gradient is below this, then iteration stops.max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Non-sequential Armijo Line Search
OptimizationMethods.NonsequentialArmijoGD — TypeNonsequentialArmijoGD{T} <: AbstractOptimizerData{T}A mutable struct that represents gradient descent with non-sequential armijo line search and triggering events. It stores the specification for the method and records values during iteration.
Fields
name::String, name of the optimizer for reference.∇F_θk::Vector{T}, buffer array for the gradient of the initial inner loop iterate.norm_∇F_ψ::T, norm of the gradient of the current inner loop iterate.prev_∇F_ψ::Vector{T}, buffer array for the previous gradient in the inner loop. Necessary for updating the local Lipschitz approximation.prev_norm_step::T, norm of the step between inner loop iterates. Used for updating the local Lipschitz approximation.α0k::T, first step size used in the inner loop.δk::T, scaling factor used to condition the step size.δ_upper::T, upper limit imposed on the scaling factor when updating.ρ::T, parameter used in the non-sequential Armijo condition. Larger numbers indicate stricter descent conditions. Smaller numbers indicate less strict descent conditions.τ_lower::T, lower bound on the gradient interval triggering event.τ_upper::T, upper bound on the gradient interval triggering event.local_lipschitz_estimate::T, local Lipshitz approximation.threshold::T, norm gradient tolerance condition. Induces stopping when norm is at mostthreshold.max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.iter_hist::Vector{Vector{T}}, store the iterate sequence as the algorithm progresses. The initial iterate is stored in the first position.grad_val_hist::Vector{T}, stores the norm gradient values at each iterate. The norm of the gradient evaluated at the initial iterate is stored in the first position.stop_iteration::Int64, the iteration number the algorithm stopped on. The iterate that induced stopping is saved atiter_hist[stop_iteration + 1].
Constructors
NonsequentialArmijoGD(::Type{T}; x0::Vector{T}, δ0::T, δ_upper::T, ρ::T,
threshold::T, max_iterations::Int64) where {T}Constructs an instance of type NonsequentialArmijoGD{T}.
Arguments
T::DataType, type for data and computation.
Keyword Arguments
x0::Vector{T}, initial point to start the optimization routine. Saved initer_hist[1].δ0::T, starting scaling factor.δ_upper::T, upper limit imposed on the scaling factor when updating.ρ::T, parameter used in the non-sequential Armijo condition. Larger numbers indicate stricter descent conditions. Smaller numbers indicate less strict descent conditions.threshold::T, norm gradient tolerance condition. Induces stopping when norm at mostthreshold.max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
OptimizationMethods.nonsequential_armijo_gd — Functionnonsequential_armijo_gd(optData::NonsequentialArmijoGD{T},
progData::P where P <: AbstractNLPModel{T, S}) where {T, S}Implementation of gradient descent with non-sequential armijo and triggering events. The optimization algorithm is specified through optData, and applied to the problem progData.
Reference(s)
Method
In what follows, we let $||\cdot||_2$ denote the L2-norm. Let $\theta_{k}$ for $k + 1 \in\mathbb{N}$ be the $k^{th}$ iterate of the optimization algorithm. Let $\delta_{k}, \tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k$ be the $k^{th}$ parameters for the optimization method. The $k+1^{th}$ iterate and parameters are produced by the following procedure.
Let $\psi_0^k = \theta_k$, and recursively define
\[ \psi_{j}^k = \psi_0^k - \sum_{i = 0}^{j-1} \delta_k \alpha_i^k \dot F(\psi_i^k).\]
Let $j_k \in \mathbb{N}$ be the smallest iteration for which at least one of the conditions are satisfied:
- $||\psi_{j_k}^k - \theta_k||_2 > 10$,
- $||\dot F(\psi_{j_k}^k)||_2 \not\in (\tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k)$,
- $j_k == 100$.
The next iterate and algorithmic parameters in optData are updated based on the result of the non-sequential Armijo condition
\[ F(\psi_{j_k}^k) \leq F(\theta_k) - \rho\delta_k\alpha_0^k||\dot F(\theta_k)||_2.\]
When this condition is not satisfied, the following quantities are updated.
- The iterate $\psi_{j_k}^k$ is rejected, and $\theta_{k+1} = \theta_k$
- The scaling factor $\delta_{k+1} = .5\delta_k$
When this condition is satisfied, the following quantities are updated.
- The iterate $\psi_{j_k}^k$ is accepted, and $\theta_{k+1} = \psi_{j_k}^k$.
- The scaling factor is updated as $\delta_{k+1} = \min(1.5*\delta_k, \bar\delta)$ when $||\dot F(\psi_{j_k}^k)||_2 > \tau_{\mathrm{grad},\mathrm{lower}}^k$, otherwise $\delta_{k+1} = \delta_k$.
- If $||\dot F(\psi_{j_k}^k)||_2 \not\in (\tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k)$, then $\tau_{\mathrm{grad},\mathrm{lower}}^{k+1} = ||\dot F(\psi_{j_k}^k)||_2/\sqrt{2}$ and $\tau_{\mathrm{grad},\mathrm{upper}}^{k+1} = \sqrt{10}||\dot F(\psi_{j_k}^k)||_2$. Otherwise, this parameters are held constant.
Arguments
optData::NesterovAcceleratedGD{T}, the specification for the optimization method.progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
Return
x::S, final iterate of the optimization algorithm.
The method above requires several utility functions. These are listed below.
OptimizationMethods.update_local_lipschitz_approximation — Functionupdate_local_lipschitz_approximation(j::Int64, k::Int64, djk::S, curr_grad::S,
prev_grad::S, prev_approximation::T, prev_acceptance::Bool) where {T, S}Given the previous approximation of the local Lipschitz constant, prev_approximation::T, update the current estimate. That is, return the Lipschitz approximation for inner loop iteration j and outer loop iteration k.
Method
The local Lipschitz approximation method is conducted as follows. Let $j$ be the current inner loop iteration counter, and let $k$ be the current outer loop iteration counter. Let $\psi_j^k$ be the $j^{th}$ iterate of the $k^{th}$ outer loop, and let $\hat L_{j}^k$ be the $j^th$ estimate of the $k^{th}$ outer loop of the local Lipschitz constant. The local estimate is updated according the following five cases.
When $j == 1$ and $k == 1$, this is the first iteration of the first inner loop, and as there is no information available we set it to
1.0.When $j == 1$ and $k > 1$, this is the first iteration of the $k^{th}$ inner loop, and we return $L_{j_{k-1}}^{k-1}$ which is the local Lipschitz estimates formed using information at the terminal iteration of the $k-1^{th}$ inner loop (i.e., this is the latest estimate).
When $j > 1$ and $k == 1$, this is an inner loop iteration where we have possible taken multiple steps, so we return the most 'local' estimate of the local Lipschitz constant which is $\frac{ ||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}.$
When $j > 1$ and $k > 1$ and $\psi_{j_{k-1}}^{k-1}$ satisfied the descent condition, then we return $\frac{ ||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}.$
When $j > 1$ and $k > 1$ and $\psi_{j_{k-1}}^{k-1}$ did not satisfy the descent condition, then we return $\max \left( \frac{||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}, \hat L_{j-1}^k \right).$
Arguments
j::Int64, inner loop iteration.k::Int6, outer loop iteration.norm_djk::T, norm of difference between\psi_j^kand $\psi_{j-1}^k$. On the first iteration of the inner loop this will not matter.curr_grad::S, gradient at $\psi_j^k$ (i.e., the current iterate).prev_grad::S, gradient at $\psi_{j-1}^k$ (i.e., the previous iterate).prev_approximation::T, the local Lipschitz approximation from the previous iterationprev_acceptance::Bool, whether or not the previous inner loop resulted in an accepted iterate.
Return
estimate::T, estimate of the local Lipschitz constant.
OptimizationMethods.compute_step_size — Functioncompute_step_size(τ_lower::T, norm_grad::T, local_lipschitz_estimate::T
) where {T}Computes the step size for the inner loop iterates.
Method
The inner loop iterates are generated according to the formula
\[ \psi_{j+1}^k = \psi_j^k - \delta_k\alpha_j^k \dot F(\psi_j^k).\]
The step size, $\alpha_j^k$ is computed as
\[ \alpha_j^k = \min \left( (\tau_{\mathrm{grad}, \mathrm{lower}}^k)^2 / C_{j,1}^k, 1 / C_{j,2}^k \right)\]
where
\[ C_{j,1}^k = ||\dot F(\psi_j^k)||_2^3 + .5 \hat{L}_j^k ||\dot F(\psi_j^k)||_2^2 + 1e-16\]
and
\[ C_{j,2}^k = ||\dot F(\psi_j^k)||_2 + .5 \hat{L}_j^k + 1e-16.\]
Arguments
τ_lower::T, lower bound on the gradient.norm_grad::T, norm of the gradient at $\psi_j^k$.local_lipschitz_estimate::T, local lipschitz approximation at inner loop iterationjand outer loop iterationk.
Return
αjk::T, the step size.
OptimizationMethods.inner_loop! — Functioninner_loop!(ψjk::S, θk::S, optData::NonsequentialArmijoGD{T},
progData::P1 where P1 <: AbstractNLPModel{T, S},
precomp::P2 where P2 <: AbstractPrecompute{T},
store::P3 where P3 <: AbstractProblemAllocate{T},
past_acceptance::Bool, k::Int64; max_iteration = 100) where {T, S}Conduct the inner loop iteration, modifying ψjk, optData, and store in place. ψjk gets updated to be the terminal iterate of the inner loop; the fields local_lipschitz_estimate, norm_∇F_ψ, and prev_∇F_ψ are updated in optData; the fields grad in store gets updated to be the gradient at ψjk.
Arguments
ψjk::S, buffer array for the inner loop iterates.θk::S, starting iterate.optData::NonsequentialArmijoGD{T},structthat specifies the optimization algorithm. Fields are modified during the inner loop.progData::P1 where P1 <: AbstractNLPModel{T, S},structthat specifies the optimization problem. Fields are modified during the inner loop.precomp::P2 where P2 <: AbstractPrecompute{T},structthat has precomputed values. Required to take advantage of this during the gradient computation.store::P3 where P3 <: AbstractProblemAllocate{T},structthat contains buffer arrays for computation.past_acceptance::Bool, flag indicating if the previous inner loop resulted in a success (i.e., $F(\theta_k) < F(\theta_{k-1})$).k::Int64, outer loop iteration for computation of the local Lipschitz approximation scheme.
Optional Keyword Arguments
max_iteration = 100, maximum number of allowable iteration of the inner loop. Should be kept at100as that is what is specified in the paper, but is useful to change for testing.
Returns
j::Int64, the iteration for which a triggering event evaluated to true.
OptimizationMethods.update_algorithm_parameters! — Functionupdate_algorithm_parameters!(θkp1::S, optData::NonsequentialArmijoGD{T},
achieved_descent::Bool, iter::Int64) where {T, S}Given that the non-sequential Armijo condition is checked, update the parameters the optimization method. The method updates the following variables in place.
θkp1is updated to be the next outer loop iterate.optDatahas (potentially) the following fields updated:δk,τ_lower,τ_upper.
Arguments
θkp1::S, buffer array for the storage of the next iterate.optData::NonsequentialArmijoGD{T},structthat specifies the optimization algorithm.achieved_descent::Bool, boolean flag indicating whether or not the descent condition was achieved.iter::Int64, the current iteration of the method. The outer loop iteration. This is requried as it is used to overwriteθkp1with the previous iterate.
Returns
- A boolean flag equal to
achieved_descentto indicate whetherθkp1is modified in-place.
Line search Helper Functions
Backtracking
OptimizationMethods.backtracking! — Functionbacktracking!(θk::S, θkm1::S, F::Function, gkm1::S, step_direction::S,
reference_value::T, α::T, δ::T, ρ::T; max_iteration::Int64 = 100
) where {T, S}Implementation of backtracking which modifies θk in place. This method should be used for general step directions. If gkm1 is the step direction use the other backtracking!(...) method.
Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. Let $d_k$ be the step direction, then $\theta_k = \theta_{k-1} - \delta^t\alpha d_k$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq O_{k-1} - \rho\delta^t\alpha \dot F(\theta_{k-1})^\intercal d_k,\]
where $O_{k-1}$ is some reference value.
Arguments
θk::S, buffer array for the next iterate.θkm1::S, current iterate the optimization algorithm.F::Function, objective function. Should take in a single argument and return the value of the objective at the input value.gkm1::S, gradient value atθkm1.step_direction::S, direction to moveθkm1to formθk.reference_value::T, value to check the objective function atθkagainst.α::T, initial step size.δ::T, backtracking decrease factor.ρ::T, factor involved in the acceptance criterion inθk. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions onθk.
Optional Keyword Arguments
max_iteration::Int64 = 100, the maximum allowable iterations for the line search procedure. In the language above, when $t$ is equal tomax_iterationthe algorithm will terminate.
Return
backtracking_condition_satisfied::Bool, whether the backtracking condition is satisfied before the max iteration limit.
backtracking!(θk::S, θkm1::S, F::Function, gkm1::S, norm_gkm1_squared::T,
reference_value::T, α::T, δ::T, ρ::T; max_iteration::Int64 = 100)
where {T, S}Implementation of backtracking which modifies θk in place. This method assumes that the step direction is -gkm1.
Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. Let $\dot F(\theta_{k-1})$ be the step direction, then $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq O_{k-1} - \rho\delta^t\alpha ||\dot F(\theta_{k-1})||_2^2,\]
where $O_{k-1}$ is some reference value, and $||\cdot||_2$ is the L2-norm.
Arguments
θk::S, buffer array for the next iterate.θkm1::S, current iterate the optimization algorithm.F::Function, objective function. Should take in a single argument and return the value of the objective at the input value.gkm1::S, gradient value atθkm1.norm_gkm1_squared::T, norm squared for the value ofgkm1.reference_value::T, value to check the objective function atθkagainst.α::T, initial step size.δ::T, backtracking decrease factor.ρ::T, factor involved in the acceptance criterion inθk. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions onθk.
Optional Keyword Arguments
max_iteration::Int64 = 100, the maximum allowable iterations for the line search procedure. In the language above, when $t$ is equal tomax_iterationthe algorithm will terminate.
Return
backtracking_condition_satisfied::Bool, whether the backtracking condition is satisfied before the max iteration limit.
Non-sequential Armijo
OptimizationMethods.non_sequential_armijo_condition — Functionnon_sequential_armijo_condition(F_ψjk::T, reference_value::T,
norm_grad_θk::T, ρ::T, δk::T, α0k::T) where {T}Check if F_ψjk satisfies the non-sequential armijo condition with respect to reference_value and the remaining parameters. Returns a boolean value indicating if the descent condition is satisfied or not.
Method
Let $F(\theta) : \mathbb{R}^n \to \mathbb{R}$ be the objective function. Suppose that $\theta_k \in \mathbb{R}^n$ is an iterate of an optimization routine. Let $\psi_0^k = \theta_k$ and for $j \in \lbrace 1,...,T \rbrace$ for some $T \in \mathbb{N}$, recursively define
\[ \psi_j^k = \psi_0^k - \sum_{t = 0}^{j-1} \delta_k \alpha_t^k \dot F(\psi_t^k).\]
Then, the (monotone) non-sequential armijo condition requires that
\[ F(\psi_j^k) < F(\psi_0^k) - \rho \delta_k \alpha_0^k ||\dot F(\psi_0^k)||_2^2,\]
where $||\cdot||_2$ is the L2-norm and $\rho \in (0, 1)$.
This function implements checking the inequality, where F_ψjk corresponds to $F(\psi_j^k)$, reference_value corresponds to $F(\psi_0^k)$, norm_grad_θk to $||\dot F(\psi_0^k)||_2$, ρ to $\rho$, δk to $\delta_k$, and α0k to $\alpha_0^k$. To see more about how this method is used read the documentation for gradient descent with non-sequential armijo
Arguments
F_ψjk::T, numeric value on the LHS of the inequality. In optimization context, this is the objective value of a trial iterate to check if sufficient descent is achieved.reference_value::T, numeric value on the RHS of the inequality. In the optimization context, the value of the current iterateF_ψjkmust be smaller than this to guarantee a sufficient descent criterion.norm_grad_θk::T, numeric value forming the amount of descent that needs to be achieved. This value is usually the norm of the gradient of a previous iterate.ρ::T, parameter in the line search criterion dictating how much descent should be required. Should be positive. Larger values indicate stricter conditions, and lower value indicate looser conditions.δk::T, numeric value that corresponds to a scaling factor for the step size.α0k::T, numeric value. In the context of non-sequential armijo gradient descent this is the first step size used in an inner loop.
Return
flag::Bool,trueif the descent condition is satisfied, andfalseif the descent condition is not satisfied.
Index
OptimizationMethods.BacktrackingGDOptimizationMethods.BarzilaiBorweinGDOptimizationMethods.DiminishingStepGDOptimizationMethods.FixedStepGDOptimizationMethods.FixedStepNonmonLSMaxValGOptimizationMethods.LeastSquaresOptimizationMethods.LipschitzApproxGDOptimizationMethods.LogisticRegressionOptimizationMethods.NesterovAcceleratedGDOptimizationMethods.NonsequentialArmijoGDOptimizationMethods.PoissonRegressionOptimizationMethods.QLLogisticCenteredExpOptimizationMethods.QLLogisticCenteredLogOptimizationMethods.QLLogisticMonomialOptimizationMethods.QLLogisticSinOptimizationMethods.WeightedNormDampingGDOptimizationMethods.backtracking!OptimizationMethods.backtracking_gdOptimizationMethods.barzilai_borwein_gdOptimizationMethods.centered_expOptimizationMethods.centered_shifted_logOptimizationMethods.compute_step_sizeOptimizationMethods.dcentered_expOptimizationMethods.dcentered_shifted_logOptimizationMethods.ddlogisticOptimizationMethods.diminishing_step_gdOptimizationMethods.dlinear_plus_sinOptimizationMethods.dlogisticOptimizationMethods.dmonomial_plus_constantOptimizationMethods.fixed_step_gdOptimizationMethods.fixed_step_nls_maxval_gdOptimizationMethods.inner_loop!OptimizationMethods.inverse_complimentary_log_logOptimizationMethods.inverse_k_step_sizeOptimizationMethods.inverse_log2k_step_sizeOptimizationMethods.inverse_probitOptimizationMethods.linear_plus_sinOptimizationMethods.lipschitz_approximation_gdOptimizationMethods.logisticOptimizationMethods.monomial_plus_constantOptimizationMethods.nesterov_accelerated_gdOptimizationMethods.non_sequential_armijo_conditionOptimizationMethods.nonsequential_armijo_gdOptimizationMethods.stepdown_100_step_sizeOptimizationMethods.update_algorithm_parameters!OptimizationMethods.update_local_lipschitz_approximationOptimizationMethods.weighted_norm_damping_gd