Contents
- Contents
- Barzilai Borwein Method
- Gradient Descent with Fixed Step Size
- Lipschitz Approximation (Malitsky & Mishchenko)
- Weighted Norm Damping Gradient Method (WNGrad)
- Nesterov's Accelerated Gradient Descent
- Gradient Descent with Diminishing Step Size
- Gradient Descent with Backtracking
- Gradient Descent with Non-monotone Line Search
- Gradient Descent with Non-sequential Armijo Line Search
- Line search Helper Functions
- Index
Barzilai Borwein Method
OptimizationMethods.barzilai_borwein_gd — Function
barzilai_borwein_gd(optData::BarzilaiBorweinGD{T}, progData::P
    where P <: AbstractNLPModel{T, S}) where {T,S}
Implements gradient descent with the Barzilai-Borwein step size and applies the method to the optimization problem specified by progData.
Reference(s)
Barzilai and Borwein. "Two-Point Step Size Gradient Methods". IMA Journal of Numerical Analysis.
Method
Given iterates $\lbrace x_0,\ldots,x_k\rbrace$, the iterate $x_{k+1}$ is equal to $x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k$ is one of two versions.
Long Step Size Version (if optData.long_stepsize == true)
If $k=0$, then $\alpha_0$ is set to optData.init_stepsize. For $k>0$,
\[\alpha_k = \frac{ \Vert x_k - x_{k-1} \Vert_2^2}{(x_k - x_{k-1})^\intercal (\nabla f(x_k) - \nabla f(x_{k-1}))}.\]
Short Step Size Version (if optData.long_stepsize == false)
If $k=0$, then $\alpha_0$ is set to optData.init_stepsize. For $k>0$,
\[\alpha_k = \frac{(x_k - x_{k-1})^\intercal (\nabla f(x_k) - \nabla f(x_{k-1}))}{\Vert \nabla f(x_k) - \nabla f(x_{k-1})\Vert_2^2}.\]
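Both step sizes can be computed directly from the buffered iterate and gradient differences. Below is a minimal sketch in Julia, not the package implementation; the helper name bb_stepsize and its inputs are illustrative only.

```julia
# Hypothetical helper, not part of OptimizationMethods: compute a
# Barzilai-Borwein step size from the iterate difference s = x_k - x_{k-1}
# and the gradient difference y = ∇f(x_k) - ∇f(x_{k-1}).
using LinearAlgebra

function bb_stepsize(s::AbstractVector, y::AbstractVector; long::Bool = true)
    if long
        return dot(s, s) / dot(s, y)    # long version
    else
        return dot(s, y) / dot(y, y)    # short version
    end
end

# On f(x) = 0.5‖x‖², where ∇f(x) = x, both versions recover the exact
# inverse curvature of 1.
s = [0.5, 1.0] .- [1.0, 2.0]
bb_stepsize(s, s; long = true)    # 1.0
```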
Arguments
- optData::BarzilaiBorweinGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.BarzilaiBorweinGD — Type
BarzilaiBorweinGD{T} <: AbstractOptimizerData{T}
A structure for storing data about gradient descent using the Barzilai-Borwein step size, and the progress of its application on an optimization problem.
Fields
- name::String, name of the solver for reference.
- init_stepsize::T, initial step size to start the method.
- long_stepsize::Bool, flag for step size; if true, use the long version of Barzilai-Borwein. If false, use the short version.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
- iter_diff::Vector{T}, a buffer for storing differences between subsequent iterate values that are used for computing the step size.
- grad_diff::Vector{T}, a buffer for storing differences between gradient values at adjacent iterates, which is used to compute the step size.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved at iter_hist[stop_iteration+1].
Constructors
BarzilaiBorweinGD(::Type{T}; x0::Vector{T}, init_stepsize::T,
long_stepsize::Bool, threshold::T, max_iterations::Int) where {T}
Constructs the struct for the Barzilai-Borwein optimization method.
Arguments
- T::DataType, specific data type used for calculations.
Keyword Arguments
- x0::Vector{T}, initial point to start the solver at.
- init_stepsize::T, initial step size used for the first iteration.
- long_stepsize::Bool, flag for step size; if true, use the long version of Barzilai-Borwein, if false, use the short version.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration is terminated.
- max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
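A hedged usage sketch based on the constructor signature documented above; it assumes the package is installed, and the problem setup (progData) is omitted since it must come from one of the package's problem types.

```julia
# Keyword names follow the constructor documented above.
using OptimizationMethods

optData = OptimizationMethods.BarzilaiBorweinGD(
    Float64;
    x0 = zeros(10),
    init_stepsize = 1e-2,
    long_stepsize = true,
    threshold = 1e-8,
    max_iterations = 500,
)

# x = OptimizationMethods.barzilai_borwein_gd(optData, progData)
# where progData is a problem satisfying the requirements above.
```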
Gradient Descent with Fixed Step Size
OptimizationMethods.fixed_step_gd — Function
fixed_step_gd(optData::FixedStepGD{T}, progData<:AbstractNLPModel{T,S})
    where {T,S}
Implements fixed step-size gradient descent for the desired optimization problem specified by progData.
Method
The iterates are updated according to the procedure
\[x_{k+1} = x_k - \alpha ∇f(x_k),\]
where $\alpha$ is the step size, $f$ is the objective function, and $∇f$ is the gradient function of $f$.
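A minimal sketch of this update rule on a toy quadratic, not the package routine (fixed_step_gd itself operates on an AbstractNLPModel); the helper name is illustrative.

```julia
# Plain fixed step-size gradient descent on f(x) = 0.5 * x' * A * x.
using LinearAlgebra

function fixed_step_gd_sketch(gradf, x0, α, iterations)
    x = copy(x0)
    for _ in 1:iterations
        x -= α * gradf(x)        # x_{k+1} = x_k - α ∇f(x_k)
    end
    return x
end

A = [2.0 0.0; 0.0 4.0]
x = fixed_step_gd_sketch(x -> A * x, [1.0, 1.0], 0.2, 100)
norm(A * x)                      # gradient norm at the final iterate, near zero
```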
Arguments
- optData::FixedStepGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.FixedStepGD — Type
FixedStepGD{T} <: AbstractOptimizerData{T}
A structure for storing data about fixed step-size gradient descent, and the progress of its application on an optimization problem.
Fields
- name::String, name of the solver for reference.
- step_size::T, the step-size selection for the optimization procedure.
- threshold::T, the threshold on the norm of the gradient to induce stopping.
- max_iterations::Int, the maximum allowed iterations.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration, iteration number that the algorithm stopped at. Iterate number stop_iteration is produced.
Constructors
FixedStepGD(::Type{T}; x0::Vector{T}, step_size::T, threshold::T,
max_iterations::Int) where {T}
Constructs the struct for the optimizer.
Arguments
- T::DataType, specific data type for the calculations.
Keyword Arguments
- x0::Vector{T}, the initial iterate for the optimizer.
- step_size::T, the step size of the optimizer.
- threshold::T, the threshold on the norm of the gradient to induce stopping.
- max_iterations::Int, the maximum number of iterations allowed.
Lipschitz Approximation (Malitsky & Mishchenko)
OptimizationMethods.lipschitz_approximation_gd — Function
lipschitz_approximation_gd(optData::LipschitzApproxGD{T}, progData::P where P
    <: AbstractNLPModel{T, S}) where {T, S}
Implements gradient descent with adaptive step sizes formed through a Lipschitz approximation for the desired optimization problem specified by progData.
This method is designed for convex optimization problems.
Reference(s)
Malitsky and Mishchenko. "Adaptive Gradient Descent without Descent".
Method
The iterates are updated according to the procedure,
\[x_{k+1} = x_{k} - \alpha_k \nabla f(x_{k}),\]
where $\alpha_k$ is the step size and $\nabla f$ is the gradient function of the objective function $f$.
The step size is computed depending on $k$. When $k = 0$, the step size is set to optData.init_stepsize. When $k > 0$,
\[\alpha_k = \min\left\lbrace \sqrt{1 + \theta_{k-1}}\alpha_{k-1}, \frac{\Vert x_k - x_{k-1} \Vert}{\Vert \nabla f(x_k) - \nabla f(x_{k-1})\Vert} \right\rbrace,\]
where $\theta_0 = \infty$ and $\theta_k = \alpha_k / \alpha_{k-1}$.
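A minimal sketch of this step-size rule, not the package routine; it assumes the previous step size, the previous ratio $\theta_{k-1}$, and the buffered iterate and gradient differences are available.

```julia
using LinearAlgebra

# Returns the new step size and the new ratio θ_k = α_k / α_{k-1}.
function adgd_stepsize(α_prev, θ_prev, iter_diff, grad_diff)
    # ‖x_k - x_{k-1}‖ / ‖∇f(x_k) - ∇f(x_{k-1})‖, an inverse local Lipschitz estimate
    inv_lipschitz = norm(iter_diff) / norm(grad_diff)
    α = min(sqrt(1 + θ_prev) * α_prev, inv_lipschitz)
    return α, α / α_prev
end
```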
Arguments
- optData::LipschitzApproxGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.LipschitzApproxGD — Type
LipschitzApproxGD{T} <: AbstractOptimizerData{T}
A structure for storing data about adaptive gradient descent using a Lipschitz Approximation scheme (AdGD), and the progress of its application on an optimization problem.
Fields
- name::String, name of the solver for reference.
- init_stepsize::T, the initial step size for the method.
- prev_stepsize::T, step size used at iter - 1 when iter > 1.
- theta::T, element used in the computation of the step size. See the referenced paper for more information.
- lipschitz_approximation::T, the Lipschitz approximation used in the computation of the step size. See the referenced paper for more information.
- threshold::T, the threshold on the norm of the gradient to induce stopping.
- max_iterations::Int64, the maximum allowed iterations.
- iter_diff::Vector{T}, a buffer for storing differences between subsequent iterate values that are used for computing the step size.
- grad_diff::Vector{T}, a buffer for storing differences between gradient values at adjacent iterates, which is used to compute the step size.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, iteration number that the algorithm stopped at. Iterate number stop_iteration is produced.
Constructors
LipschitzApproxGD(::Type{T}; x0::Vector{T}, init_stepsize::T, threshold::T,
max_iterations::Int) where {T}
Constructs the struct for the optimizer.
Arguments
- T::DataType, specific data type for the calculations.
Keyword Arguments
- x0::Vector{T}, the initial iterate for the optimizer.
- init_stepsize::T, the initial step size for the method.
- threshold::T, the threshold on the norm of the gradient to induce stopping.
- max_iterations::Int, the maximum number of iterations allowed.
Weighted Norm Damping Gradient Method (WNGrad)
OptimizationMethods.weighted_norm_damping_gd — Function
weighted_norm_damping_gd(optData::WeightedNormDampingGD{T},
    progData::P where P <: AbstractNLPModel{T, S}) where {T, S}
Method that implements gradient descent with the weighted norm damping step size using the specifications in optData on the problem specified by progData.
Reference
Wu, Xiaoxia et al. "WNGrad: Learn the Learning Rate in Gradient Descent". arXiv, https://arxiv.org/abs/1803.02865.
Method
Let $\theta_k$ be the $k^{th}$ iterate, and $\alpha_k$ be the $k^{th}$ step size. The optimization method generates iterates following
\[\theta_{k + 1} = \theta_{k} - \alpha_k \nabla f(\theta_k),\]
where $\nabla f$ is the gradient of the objective function $f$.
The step size depends on the iteration number $k$. For $k = 0$, the step size $\alpha_0$ is the reciprocal of optData.init_norm_damping_factor. For $k > 0$, the step size is iteratively updated as
\[\alpha_k = \left[ \frac{1}{\alpha_{k-1}} + \Vert\dot F(\theta_k)\Vert_2^2 \alpha_{k-1} \right]^{-1}.\]
When $\alpha_0 < \Vert\dot F(\theta_0)\Vert_2^{-1}$ and a globally Lipschitz smooth objective function is used, then the method is guaranteed to find an $\epsilon$-stationary point. It is therefore recommended that optData.init_norm_damping_factor exceed $\Vert\dot F(\theta_0)\Vert_2$.
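A minimal sketch of this recursion, not the package routine; the helper name is illustrative.

```julia
# One update of the WNGrad step size:
# α_k = 1 / (1/α_{k-1} + ‖∇f(θ_k)‖² α_{k-1}).
wngrad_stepsize(α_prev, grad_norm) = 1 / (1 / α_prev + grad_norm^2 * α_prev)

# Starting from α_0 = 1 / b_0 with b_0 > ‖∇f(θ_0)‖, as recommended above.
α = 1 / 10.0                  # b_0 = 10
α = wngrad_stepsize(α, 2.0)   # ≈ 0.0962
```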
Arguments
- optData::WeightedNormDampingGD{T}, specification for the optimization algorithm.
- progData::P where P <: AbstractNLPModel{T, S}, specification for the problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.WeightedNormDampingGD — Type
WeightedNormDampingGD{T} <: AbstractOptimizerData{T}
A mutable struct that represents gradient descent using the weighted-norm damping step size. It stores the specification for the method and records values during iteration.
Fields
- name::String, name of the optimizer for recording purposes.
- init_norm_damping_factor::T, initial damping factor. This value's reciprocal will be the initial step size.
- threshold::T, norm gradient tolerance condition. Induces stopping when the norm is at most threshold.
- max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
- iter_hist::Vector{Vector{T}}, stores the iterate sequence as the algorithm progresses. The initial iterate is stored in the first position.
- grad_val_hist::Vector{T}, stores the norm gradient values at each iterate. The norm of the gradient evaluated at the initial iterate is stored in the first position.
- stop_iteration::Int64, the iteration number the algorithm stopped on. The iterate that induced stopping is saved at iter_hist[stop_iteration + 1].
Constructors
WeightedNormDampingGD(::Type{T}; x0::Vector{T}, init_norm_damping_factor::T,
threshold::T, max_iterations::Int64) where {T}
Constructs an instance of type WeightedNormDampingGD{T}.
Arguments
- T::DataType, type for data and computation.
Keyword Arguments
- x0::Vector{T}, initial point to start the optimization routine. Saved in iter_hist[1].
- init_norm_damping_factor::T, initial damping factor, which will correspond to the reciprocal of the initial step size.
- threshold::T, norm gradient tolerance condition. Induces stopping when the norm is at most threshold.
- max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
Nesterov's Accelerated Gradient Descent
OptimizationMethods.nesterov_accelerated_gd — Function
nesterov_accelerated_gd(optData::NesterovAcceleratedGD{T}, progData::P
    where P <: AbstractNLPModel{T, S}) where {T, S}
Implements Nesterov's Accelerated Gradient Descent as specified by optData on the problem specified by progData.
This algorithm is designed for convex problems.
Reference(s)
- See Algorithm 1 of Li et al. "Convex and Non-convex Optimization Under Generalized Smoothness". arXiv, https://arxiv.org/abs/2306.01264.
- See line-search based approach in Nesterov, Yurii. 1983. “A Method for Solving the Convex Programming Problem with Convergence Rate O(1/K^2).” Proceedings of the USSR Academy of Sciences 269:543–47.
Method
Let the objective function be denoted by $F(\theta)$ and $\nabla F(\theta)$ denote its gradient function. Given $\theta_0$ and a step size $\alpha$ (equal to optData.step_size), the method produces five sequences. At $k=0$,
\[\begin{cases} B_0 &= 0 \\ z_0 &= \theta_0 \\ \Delta_0 & = 1 \\ y_0 &= \theta_0; \end{cases}\]
and, for $k\in\mathbb{N}$,
\[\begin{cases} B_{k} &= B_{k-1} + \Delta_{k-1} \\ \theta_k &= y_{k-1} - \alpha \nabla F(y_{k-1}) \\ z_k &= z_{k-1} - \alpha\Delta_{k-1}\nabla F(y_{k-1}) \\ \Delta_k &= \frac{1}{2}\left( 1 + \sqrt{4 B_{k} + 1} \right) \\ y_k &= \theta_k + \frac{\Delta_{k}}{B_k + \Delta_k + \alpha^{-1}} (z_k - \theta_k). \end{cases}\]
The iterate sequence of interest is $\lbrace \theta_k \rbrace$.
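A minimal sketch of one pass through this recursion, not the package routine; the state tuple and helper name are illustrative.

```julia
# One iteration of the five-sequence recursion above, given the gradient
# function ∇F and fixed step size α. Returns the updated (B, θ, z, Δ, y).
function nesterov_step(B, θ, z, Δ, y, ∇F, α)
    g = ∇F(y)
    B_new = B + Δ
    θ_new = y .- α .* g
    z_new = z .- (α * Δ) .* g
    Δ_new = 0.5 * (1 + sqrt(4 * B_new + 1))
    y_new = θ_new .+ (Δ_new / (B_new + Δ_new + 1 / α)) .* (z_new .- θ_new)
    return B_new, θ_new, z_new, Δ_new, y_new
end

# Initialization at k = 0: B₀ = 0, z₀ = θ₀, Δ₀ = 1, y₀ = θ₀.
θ0 = [1.0, -2.0]
state = (0.0, θ0, θ0, 1.0, θ0)
state = nesterov_step(state..., x -> x, 0.1)   # ∇F(x) = x for f(x) = 0.5‖x‖²
```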
Arguments
- optData::NesterovAcceleratedGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.NesterovAcceleratedGD — Type
NesterovAcceleratedGD{T} <: AbstractOptimizerData
A structure that represents Nesterov Accelerated Gradient Descent. Stores variables related to the method, and tracks quantities as the algorithm progresses.
Fields
- name::String, name of the solver for reference.
- step_size::T, step size used in the method.
- z::Vector{T}, buffer array for the auxiliary iterate sequence.
- y::Vector{T}, buffer array for the convex combination of the iterate and auxiliary sequences.
- B::T, auxiliary quadratic scaling term for computing acceleration weights and step size.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved at iter_hist[stop_iteration+1].
Constructors
NesterovAcceleratedGD(::Type{T}; x0::Vector{T}, step_size::T,
threshold::T, max_iterations::Int64) where {T}
Arguments
- T::DataType, specific data type used for calculations.
Keyword Arguments
- x0::Vector{T}, initial point to start the solver at.
- step_size::T, step size used in the method.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration is terminated.
- max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Diminishing Step Size
OptimizationMethods.DiminishingStepGD — Type
DiminishingStepGD{T} <: AbstractOptimizerData{T}
A structure for storing data about gradient descent using diminishing step sizes, and recording the progress of its application on an optimization problem.
Fields
- name::String, name of the solver for reference.
- step_size_function::Function, step size function. Should take the iteration number and return the step size for that iteration.
- step_size_scaling::T, factor that is multiplied with the output of the step size function.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved at iter_hist[stop_iteration+1].
step_size_function should take in two arguments, the data type for computation T and the iteration number. For example, calling step_size_function(Float64, 1) should return the step size as a Float64 for iteration 1.
Constructors
DiminishingStepGD(::Type{T}; x0::Vector{T}, step_size_function::Function,
step_size_scaling::T, threshold::T, max_iterations::Int)
Constructs the struct for the diminishing step size gradient descent method.
Arguments
- T::DataType, specific data type used for calculations.
Keyword Arguments
- x0::Vector{T}, initial point to start the solver at.
- step_size_function::Function, step size function. Should take the iteration number and return the step size for that iteration.
- step_size_scaling::T, factor that is multiplied with the output of the step size function.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration is terminated.
- max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
OptimizationMethods.diminishing_step_gd — Function
diminishing_step_gd(optData::DiminishingStepGD{T}, progData::P
    where P <: AbstractNLPModel{T, S}) where {T, S}
Implements gradient descent with diminishing step sizes and applies the method to the optimization problem specified by progData.
Reference(s)
Bertsekas, Dimitri. "Nonlinear Programming". 3rd Edition, Athena Scientific, Chapter 1.
Method
Given iterates $\lbrace x_0,\ldots,x_k\rbrace$, the iterate $x_{k + 1}$ is equal to $x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k$ is equal to optData.step_size_scaling * optData.step_size_function(T, k).
The step size function should satisfy several conditions. First, $\alpha_k > 0$ for all $k$. Second, $\lim_{k \to \infty} \alpha_k = 0.$ Finally, $\sum_{k=0}^{\infty} \alpha_k = \infty.$ See Patel and Berahas (2024) for details.
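A hedged sketch of a user-supplied step size function with the expected two-argument signature; inv_sqrt_step is hypothetical and not part of the library (the library's own functions are listed below).

```julia
# Takes the computation type and the iteration number, like the library's
# step size functions; 1/sqrt(k+1) is positive, vanishes, and sums to infinity.
inv_sqrt_step(::Type{T}, k::Int64) where {T} = T(1) / sqrt(T(k + 1))

# With step_size_scaling = 0.5, iteration k = 3 would use
# 0.5 * inv_sqrt_step(Float64, 3) == 0.25.
inv_sqrt_step(Float64, 3)   # 0.5
```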
Arguments
- optData::DiminishingStepGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
Below is a list of step size functions that are in the library. The step size sequence generated by these functions, $\lbrace \alpha_k \rbrace$ satisfies $\alpha_k > 0$, $\lim_{k \to\infty} \alpha_k = 0$ and $\sum_{k=0}^\infty \alpha_k = \infty$.
OptimizationMethods.inverse_k_step_size — Function
inverse_k_step_size(::Type{T}, k::Int64)
Return the inverse of k.
Method
The step size sequence generated for $k \in \mathbb{N}$ is
\[ \alpha_k = \frac{1}{k+1},\]
when using this method.
Arguments
- T::Type, data type that the computations are done in.
- k::Int64, index of the step size needed.
Returns
Returns a number of type T.
OptimizationMethods.inverse_log2k_step_size — Function
inverse_log2k_step_size(::Type{T}, k::Int64) where {T}
Method
The step size sequence generated when using this method is
\[ \alpha_k = \frac{1}{\lfloor \log_2(k+1) + 1 \rfloor}\]
for $k \in \mathbb{N}$.
Arguments
- T::Type, data type that the computations are done in.
- k::Int64, index of the step size needed.
Returns
Returns a number of type T.
OptimizationMethods.stepdown_100_step_size — Function
stepdown_100_step_size(::Type{T}, k::Int64) where {T}
Method
The step size sequence generated is
\[ \alpha_k = \frac{1}{2^i}\]
for $k \in [ (2^i-1)100, (2^{i+1}-1)100)$.
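A hedged sketch of this piecewise-constant rule, not the library function; it recovers the index $i$ from $k$ directly.

```julia
# The step size halves every time k crosses (2^i - 1) * 100.
function stepdown_100_sketch(::Type{T}, k::Int64) where {T}
    i = floor(Int, log2(k / 100 + 1))
    return one(T) / T(2)^i
end

stepdown_100_sketch(Float64, 0)     # 1.0 for k in [0, 100)
stepdown_100_sketch(Float64, 100)   # 0.5 for k in [100, 300)
stepdown_100_sketch(Float64, 300)   # 0.25 for k in [300, 700)
```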
Arguments
- T::Type, data type that the computations are done in.
- k::Int64, index of the step size needed.
Returns
Returns a number of type T.
Gradient Descent with Backtracking
OptimizationMethods.backtracking_gd — Function
backtracking_gd(optData::BacktrackingGD{T},
    progData::P where P <: AbstractNLPModel{T, S}) where {T, S}
Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. The $k^{th}$ iterate is generated as $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq F(\theta_{k-1}) - \rho\delta^t\alpha ||\dot F(\theta_{k-1})||_2^2,\]
where $||\cdot||_2$ is the L2-norm.
Theoretically, such a $t$ exists, but it can be arbitrarily large. Therefore, the line search procedure stops searching after optData.line_search_max_iteration iterations. The current implementation terminates the procedure if the backtracking condition is not satisfied within this limit, and returns the previous iterate.
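A minimal sketch of this backtracking rule, not the package's backtracking! helper documented later; the helper name and return convention are illustrative.

```julia
using LinearAlgebra

function backtracking_sketch(F, θ, g, α, δ, ρ; max_iteration = 100)
    Fθ = F(θ)
    g_norm2 = dot(g, g)
    step = α
    for _ in 1:max_iteration
        θ_trial = θ .- step .* g
        # Sufficient decrease: F(θ_trial) ≤ F(θ) - ρ * step * ‖∇F(θ)‖²
        if F(θ_trial) <= Fθ - ρ * step * g_norm2
            return θ_trial, true
        end
        step *= δ
    end
    return θ, false    # condition never met: keep the previous iterate
end

# Example on f(θ) = 0.5‖θ‖², where ∇f(θ) = θ.
θ = [2.0, -1.0]
backtracking_sketch(x -> 0.5 * dot(x, x), θ, θ, 1.0, 0.5, 1e-4)
```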
Arguments
- optData::BacktrackingGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.BacktrackingGD — Type
BacktrackingGD{T} <: AbstractOptimizerData{T}
Mutable structure representing gradient descent using backtracking. It also stores and keeps track of values during the optimization routine.
Fields
- name::String, name of the solver for reference.
- α::T, initial step size used for backtracking.
- δ::T, backtracking decreasing factor applied to α when the line search criterion is not satisfied.
- ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.
- line_search_max_iteration::Int64, maximum allowable iterations for the backtracking procedure.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved at iter_hist[stop_iteration+1].
Constructors
BacktrackingGD(::Type{T}; x0::Vector{T}, α::T, δ::T, ρ::T,
line_search_max_iteration::Int64, threshold::T, max_iteration::Int64)
where {T}
Returns a struct of type BacktrackingGD{T} with all the fields initialized either to the values given or to their default values.
Arguments
- T::DataType, specific data type used for calculations.
Keyword Arguments
- x0::Vector{T}, initial point to start the solver at.
- α::T, initial step size used for backtracking.
- δ::T, backtracking decreasing factor applied to α when the line search criterion is not satisfied.
- ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.
- line_search_max_iteration::Int64, maximum allowable iterations for the backtracking procedure.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration is terminated.
- max_iterations::Int, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Non-monotone Line Search
Fixed Step
OptimizationMethods.fixed_step_nls_maxval_gd — Function
fixed_step_nls_maxval_gd(optData::FixedStepNonmonLSMaxValG{T},
    progData::P where P <: AbstractNLPModel{T, S}) where {T, S}
Implementation of gradient descent with non-monotone line search using the maximum value of a fixed number of previous objective values on an optimization problem specified by progData. This implementation initializes the line search procedure with optData.α every iteration.
Reference(s)
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. The $k^{th}$ iterate is generated as $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq \max_{\max(0, k-M) \leq j < k} F(\theta_{j}) - \rho\delta^t\alpha||\dot F(\theta_{k-1})||_2^2,\]
where $||\cdot||_2$ is the L2-norm, and $M \in \mathbb{N}_{>0}$.
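A minimal sketch of this acceptance test, not the package routine (the package stores the window in a CircularVector and tracks its maximum incrementally); the reference value here is simply the maximum over a plain vector of previous objective values.

```julia
# F_trial: objective at the trial iterate; objective_hist: the last M
# objective values; step = δ^t * α; grad_norm = ‖∇F(θ_{k-1})‖₂.
function nonmonotone_maxval_accept(F_trial, objective_hist, step, ρ, grad_norm)
    reference_value = maximum(objective_hist)
    return F_trial <= reference_value - ρ * step * grad_norm^2
end

hist = [1.0, 0.8, 1.2]                               # window of M = 3 values
nonmonotone_maxval_accept(1.1, hist, 0.1, 1e-4, 2.0) # true: 1.1 ≤ 1.2 - 4e-5
```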
Arguments
- optData::FixedStepNonmonLSMaxValG{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
OptimizationMethods.FixedStepNonmonLSMaxValG — Type
FixedStepNonmonLSMaxValG{T} <: AbstractOptimizerData{T}
Mutable struct that represents gradient descent using non-monotone line search where the initial step size for line search is fixed. This struct also keeps track of values during the optimization procedure implemented in fixed_step_nls_maxval_gd.
Fields
- name::String, name of the solver for reference.
- α::T, the initial step size for line search.
- δ::T, backtracking decreasing factor applied to α when the line search criterion is not satisfied.
- ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.
- window_size::Int64, number of previous objective values that are used to construct the reference objective value for the line search criterion.
- line_search_max_iteration::Int64, maximum number of iterations for line search.
- objective_hist::CircularVector{T, Vector{T}}, buffer array of size window_size that stores window_size previous objective values.
- max_value::T, maximum value of objective_hist. This is the reference objective value used in the line search procedure.
- max_index::Int64, index of the maximum value that corresponds to the reference objective value.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
- iter_hist::Vector{Vector{T}}, a history of the iterates. The first entry corresponds to the initial iterate (i.e., at iteration 0). The k+1 entry corresponds to the iterate at iteration k.
- grad_val_hist::Vector{T}, a vector for storing max_iterations+1 gradient norm values. The first entry corresponds to iteration 0. The k+1 entry corresponds to the gradient norm at iteration k.
- stop_iteration::Int64, the iteration number that the solver stopped on. The terminal iterate is saved at iter_hist[stop_iteration+1].
Constructors
FixedStepNonmonLSMaxValG(::Type{T}; x0::Vector{T}, α::T, δ::T, ρ::T,
window_size::Int64, line_search_max_iteration::Int64,
threshold::T, max_iterations::Int64) where {T}
Arguments
- T::DataType, specific data type used for calculations.
Keyword Arguments
- x0::Vector{T}, initial point to start the solver at.
- α::T, the initial step size for line search.
- δ::T, backtracking decreasing factor applied to α when the line search criterion is not satisfied.
- ρ::T, factor involved in the acceptance criterion in the line search procedure. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions.
- window_size::Int64, number of previous objective values that are used to construct the reference value for the line search criterion.
- line_search_max_iteration::Int64, maximum number of iterations for line search.
- threshold::T, gradient threshold. If the norm of the gradient is below this, then iteration stops.
- max_iterations::Int64, max number of iterations (gradient steps) taken by the solver.
Gradient Descent with Non-sequential Armijo Line Search
OptimizationMethods.NonsequentialArmijoGD — Type
NonsequentialArmijoGD{T} <: AbstractOptimizerData{T}
A mutable struct that represents gradient descent with non-sequential Armijo line search and triggering events. It stores the specification for the method and records values during iteration.
Fields
- name::String, name of the optimizer for reference.
- ∇F_θk::Vector{T}, buffer array for the gradient of the initial inner loop iterate.
- norm_∇F_ψ::T, norm of the gradient of the current inner loop iterate.
- prev_∇F_ψ::Vector{T}, buffer array for the previous gradient in the inner loop. Necessary for updating the local Lipschitz approximation.
- prev_norm_step::T, norm of the step between inner loop iterates. Used for updating the local Lipschitz approximation.
- α0k::T, first step size used in the inner loop.
- δk::T, scaling factor used to condition the step size.
- δ_upper::T, upper limit imposed on the scaling factor when updating.
- ρ::T, parameter used in the non-sequential Armijo condition. Larger numbers indicate stricter descent conditions. Smaller numbers indicate less strict descent conditions.
- τ_lower::T, lower bound on the gradient interval triggering event.
- τ_upper::T, upper bound on the gradient interval triggering event.
- local_lipschitz_estimate::T, local Lipschitz approximation.
- threshold::T, norm gradient tolerance condition. Induces stopping when the norm is at most threshold.
- max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
- iter_hist::Vector{Vector{T}}, stores the iterate sequence as the algorithm progresses. The initial iterate is stored in the first position.
- grad_val_hist::Vector{T}, stores the norm gradient values at each iterate. The norm of the gradient evaluated at the initial iterate is stored in the first position.
- stop_iteration::Int64, the iteration number the algorithm stopped on. The iterate that induced stopping is saved at iter_hist[stop_iteration + 1].
Constructors
NonsequentialArmijoGD(::Type{T}; x0::Vector{T}, δ0::T, δ_upper::T, ρ::T,
threshold::T, max_iterations::Int64) where {T}
Constructs an instance of type NonsequentialArmijoGD{T}.
Arguments
- T::DataType, type for data and computation.
Keyword Arguments
- x0::Vector{T}, initial point to start the optimization routine. Saved in iter_hist[1].
- δ0::T, starting scaling factor.
- δ_upper::T, upper limit imposed on the scaling factor when updating.
- ρ::T, parameter used in the non-sequential Armijo condition. Larger numbers indicate stricter descent conditions. Smaller numbers indicate less strict descent conditions.
- threshold::T, norm gradient tolerance condition. Induces stopping when the norm is at most threshold.
- max_iterations::Int64, max number of iterates that are produced, not including the initial iterate.
OptimizationMethods.nonsequential_armijo_gd — Function
nonsequential_armijo_gd(optData::NonsequentialArmijoGD{T},
    progData::P where P <: AbstractNLPModel{T, S}) where {T, S}
Implementation of gradient descent with non-sequential Armijo line search and triggering events. The optimization algorithm is specified through optData, and applied to the problem progData.
Reference(s)
Method
In what follows, we let $||\cdot||_2$ denote the L2-norm. Let $\theta_{k}$ for $k + 1 \in\mathbb{N}$ be the $k^{th}$ iterate of the optimization algorithm. Let $\delta_{k}, \tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k$ be the $k^{th}$ parameters for the optimization method. The $k+1^{th}$ iterate and parameters are produced by the following procedure.
Let $\psi_0^k = \theta_k$, and recursively define
\[ \psi_{j}^k = \psi_0^k - \sum_{i = 0}^{j-1} \delta_k \alpha_i^k \dot F(\psi_i^k).\]
Let $j_k \in \mathbb{N}$ be the smallest iteration for which at least one of the following conditions is satisfied:
- $||\psi_{j_k}^k - \theta_k||_2 > 10$,
- $||\dot F(\psi_{j_k}^k)||_2 \not\in (\tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k)$,
- $j_k == 100$.
The next iterate and algorithmic parameters in optData
are updated based on the result of the non-sequential Armijo condition
\[ F(\psi_{j_k}^k) \leq F(\theta_k) - \rho\delta_k\alpha_0^k||\dot F(\theta_k)||_2.\]
When this condition is not satisfied, the following quantities are updated.
- The iterate $\psi_{j_k}^k$ is rejected, and $\theta_{k+1} = \theta_k$
- The scaling factor $\delta_{k+1} = .5\delta_k$
When this condition is satisfied, the following quantities are updated.
- The iterate $\psi_{j_k}^k$ is accepted, and $\theta_{k+1} = \psi_{j_k}^k$.
- The scaling factor is updated as $\delta_{k+1} = \min(1.5*\delta_k, \bar\delta)$ when $||\dot F(\psi_{j_k}^k)||_2 > \tau_{\mathrm{grad},\mathrm{lower}}^k$, otherwise $\delta_{k+1} = \delta_k$.
- If $||\dot F(\psi_{j_k}^k)||_2 \not\in (\tau_{\mathrm{grad},\mathrm{lower}}^k, \tau_{\mathrm{grad},\mathrm{upper}}^k)$, then $\tau_{\mathrm{grad},\mathrm{lower}}^{k+1} = ||\dot F(\psi_{j_k}^k)||_2/\sqrt{2}$ and $\tau_{\mathrm{grad},\mathrm{upper}}^{k+1} = \sqrt{10}||\dot F(\psi_{j_k}^k)||_2$. Otherwise, these parameters are held constant. A code sketch of this update follows the list.
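A hedged sketch of the acceptance and rejection logic above, not the package's update_algorithm_parameters! (whose interface is documented later); names and the return convention are illustrative.

```julia
# Returns the next iterate and updated (δ, τ_lower, τ_upper) given whether
# the non-sequential Armijo condition held and the gradient norm at ψ.
function update_params_sketch(θk, ψjk, δk, δ_upper, τ_lower, τ_upper,
                              grad_norm_ψ, achieved_descent)
    if !achieved_descent
        return θk, 0.5 * δk, τ_lower, τ_upper          # reject ψ, halve δ
    end
    δ_next = grad_norm_ψ > τ_lower ? min(1.5 * δk, δ_upper) : δk
    if !(τ_lower < grad_norm_ψ < τ_upper)              # triggering event fired
        τ_lower, τ_upper = grad_norm_ψ / sqrt(2), sqrt(10) * grad_norm_ψ
    end
    return ψjk, δ_next, τ_lower, τ_upper               # accept ψ
end
```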
Arguments
- optData::NonsequentialArmijoGD{T}, the specification for the optimization method.
- progData<:AbstractNLPModel{T,S}, the specification for the optimization problem.
progData must have an initialize function that returns subtypes of AbstractPrecompute and AbstractProblemAllocate, where the latter has a grad argument.
Return
- x::S, final iterate of the optimization algorithm.
The method above requires several utility functions. These are listed below.
OptimizationMethods.update_local_lipschitz_approximation — Function
update_local_lipschitz_approximation(j::Int64, k::Int64, djk::S, curr_grad::S,
    prev_grad::S, prev_approximation::T, prev_acceptance::Bool) where {T, S}
Given the previous approximation of the local Lipschitz constant, prev_approximation::T, update the current estimate. That is, return the Lipschitz approximation for inner loop iteration j and outer loop iteration k.
Method
The local Lipschitz approximation method is conducted as follows. Let $j$ be the current inner loop iteration counter, and let $k$ be the current outer loop iteration counter. Let $\psi_j^k$ be the $j^{th}$ iterate of the $k^{th}$ outer loop, and let $\hat L_{j}^k$ be the $j^{th}$ estimate of the $k^{th}$ outer loop of the local Lipschitz constant. The local estimate is updated according to the following five cases (a code sketch follows the list).
- When $j == 1$ and $k == 1$, this is the first iteration of the first inner loop, and as there is no information available we set the estimate to 1.0.
- When $j == 1$ and $k > 1$, this is the first iteration of the $k^{th}$ inner loop, and we return $\hat L_{j_{k-1}}^{k-1}$, which is the local Lipschitz estimate formed using information at the terminal iteration of the $(k-1)^{th}$ inner loop (i.e., this is the latest estimate).
- When $j > 1$ and $k == 1$, this is an inner loop iteration where we have possibly taken multiple steps, so we return the most 'local' estimate of the local Lipschitz constant, which is $\frac{ ||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}.$
- When $j > 1$ and $k > 1$ and $\psi_{j_{k-1}}^{k-1}$ satisfied the descent condition, then we return $\frac{ ||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}.$
- When $j > 1$ and $k > 1$ and $\psi_{j_{k-1}}^{k-1}$ did not satisfy the descent condition, then we return $\max \left( \frac{||\dot F(\psi_{j}^k) - \dot F(\psi_{j-1}^k)||_2}{ ||\psi_{j}^k - \psi_{j-1}^k||_2}, \hat L_{j-1}^k \right).$
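A hedged sketch of these five cases, not the package function; note that the documented function takes the norm of the iterate difference rather than the iterates themselves, which the sketch mirrors.

```julia
using LinearAlgebra

# Secant-style estimate ‖∇F(ψ_j) - ∇F(ψ_{j-1})‖ / ‖ψ_j - ψ_{j-1}‖ combined
# with the previous estimate according to the five cases above.
function local_lipschitz_sketch(j, k, norm_djk, curr_grad, prev_grad,
                                prev_estimate, prev_acceptance)
    if j == 1 && k == 1
        return 1.0                           # no information yet
    elseif j == 1 && k > 1
        return prev_estimate                 # carry over the latest estimate
    end
    secant = norm(curr_grad - prev_grad) / norm_djk
    if k == 1 || prev_acceptance
        return secant
    else
        return max(secant, prev_estimate)    # previous inner loop was rejected
    end
end
```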
Arguments
- j::Int64, inner loop iteration.
- k::Int64, outer loop iteration.
- norm_djk::T, norm of the difference between $\psi_j^k$ and $\psi_{j-1}^k$. On the first iteration of the inner loop this will not matter.
- curr_grad::S, gradient at $\psi_j^k$ (i.e., the current iterate).
- prev_grad::S, gradient at $\psi_{j-1}^k$ (i.e., the previous iterate).
- prev_approximation::T, the local Lipschitz approximation from the previous iteration.
- prev_acceptance::Bool, whether or not the previous inner loop resulted in an accepted iterate.
Return
- estimate::T, estimate of the local Lipschitz constant.
OptimizationMethods.compute_step_size — Function
compute_step_size(τ_lower::T, norm_grad::T, local_lipschitz_estimate::T
    ) where {T}
Computes the step size for the inner loop iterates.
Method
The inner loop iterates are generated according to the formula
\[ \psi_{j+1}^k = \psi_j^k - \delta_k\alpha_j^k \dot F(\psi_j^k).\]
The step size, $\alpha_j^k$ is computed as
\[ \alpha_j^k = \min \left( (\tau_{\mathrm{grad}, \mathrm{lower}}^k)^2 / C_{j,1}^k, 1 / C_{j,2}^k \right)\]
where
\[ C_{j,1}^k = ||\dot F(\psi_j^k)||_2^3 + .5 \hat{L}_j^k ||\dot F(\psi_j^k)||_2^2 + 1e-16\]
and
\[ C_{j,2}^k = ||\dot F(\psi_j^k)||_2 + .5 \hat{L}_j^k + 1e-16.\]
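A minimal sketch of this formula, not the package function; the 1e-16 terms guard against division by zero exactly as in the displays above.

```julia
function compute_step_size_sketch(τ_lower, norm_grad, L_hat)
    c1 = norm_grad^3 + 0.5 * L_hat * norm_grad^2 + 1e-16
    c2 = norm_grad + 0.5 * L_hat + 1e-16
    return min(τ_lower^2 / c1, 1 / c2)
end

compute_step_size_sketch(0.1, 1.0, 2.0)   # min(0.01 / 2.0, 1 / 2.0) = 0.005
```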
Arguments
- τ_lower::T, lower bound on the gradient.
- norm_grad::T, norm of the gradient at $\psi_j^k$.
- local_lipschitz_estimate::T, local Lipschitz approximation at inner loop iteration j and outer loop iteration k.
Return
- αjk::T, the step size.
OptimizationMethods.inner_loop! — Function
inner_loop!(ψjk::S, θk::S, optData::NonsequentialArmijoGD{T},
    progData::P1 where P1 <: AbstractNLPModel{T, S},
    precomp::P2 where P2 <: AbstractPrecompute{T},
    store::P3 where P3 <: AbstractProblemAllocate{T},
    past_acceptance::Bool, k::Int64; max_iteration = 100) where {T, S}
Conducts the inner loop iteration, modifying ψjk, optData, and store in place. ψjk gets updated to be the terminal iterate of the inner loop; the fields local_lipschitz_estimate, norm_∇F_ψ, and prev_∇F_ψ are updated in optData; the field grad in store gets updated to be the gradient at ψjk.
Arguments
- ψjk::S, buffer array for the inner loop iterates.
- θk::S, starting iterate.
- optData::NonsequentialArmijoGD{T}, struct that specifies the optimization algorithm. Fields are modified during the inner loop.
- progData::P1 where P1 <: AbstractNLPModel{T, S}, struct that specifies the optimization problem. Fields are modified during the inner loop.
- precomp::P2 where P2 <: AbstractPrecompute{T}, struct that has precomputed values. Required to take advantage of these during the gradient computation.
- store::P3 where P3 <: AbstractProblemAllocate{T}, struct that contains buffer arrays for computation.
- past_acceptance::Bool, flag indicating if the previous inner loop resulted in a success (i.e., $F(\theta_k) < F(\theta_{k-1})$).
- k::Int64, outer loop iteration for computation of the local Lipschitz approximation scheme.
Optional Keyword Arguments
- max_iteration = 100, maximum number of allowable iterations of the inner loop. Should be kept at 100 as that is what is specified in the paper, but is useful to change for testing.
Returns
- j::Int64, the iteration for which a triggering event evaluated to true.
OptimizationMethods.update_algorithm_parameters! — Function
update_algorithm_parameters!(θkp1::S, optData::NonsequentialArmijoGD{T},
    achieved_descent::Bool, iter::Int64) where {T, S}
Given that the non-sequential Armijo condition has been checked, update the parameters of the optimization method. The method updates the following variables in place.
- θkp1 is updated to be the next outer loop iterate.
- optData has (potentially) the following fields updated: δk, τ_lower, τ_upper.
Arguments
- θkp1::S, buffer array for the storage of the next iterate.
- optData::NonsequentialArmijoGD{T}, struct that specifies the optimization algorithm.
- achieved_descent::Bool, boolean flag indicating whether or not the descent condition was achieved.
- iter::Int64, the current iteration of the method (the outer loop iteration). This is required as it is used to overwrite θkp1 with the previous iterate.
Returns
- A boolean flag equal to achieved_descent to indicate whether θkp1 is modified in-place.
Line search Helper Functions
Backtracking
OptimizationMethods.backtracking! — Function
backtracking!(θk::S, θkm1::S, F::Function, gkm1::S, step_direction::S,
    reference_value::T, α::T, δ::T, ρ::T; max_iteration::Int64 = 100
    ) where {T, S}
Implementation of backtracking which modifies θk in place. This method should be used for general step directions. If gkm1 is the step direction, use the other backtracking!(...) method.
Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. Let $d_k$ be the step direction, then $\theta_k = \theta_{k-1} - \delta^t\alpha d_k$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq O_{k-1} - \rho\delta^t\alpha \dot F(\theta_{k-1})^\intercal d_k,\]
where $O_{k-1}$ is some reference value.
Arguments
- θk::S, buffer array for the next iterate.
- θkm1::S, current iterate of the optimization algorithm.
- F::Function, objective function. Should take in a single argument and return the value of the objective at the input value.
- gkm1::S, gradient value at θkm1.
- step_direction::S, direction to move θkm1 to form θk.
- reference_value::T, value to check the objective function at θk against.
- α::T, initial step size.
- δ::T, backtracking decrease factor.
- ρ::T, factor involved in the acceptance criterion for θk. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions on θk.
Optional Keyword Arguments
- max_iteration::Int64 = 100, the maximum allowable iterations for the line search procedure. In the language above, when $t$ is equal to max_iteration the algorithm will terminate.
Return
- backtracking_condition_satisfied::Bool, whether the backtracking condition is satisfied before the max iteration limit.
backtracking!(θk::S, θkm1::S, F::Function, gkm1::S, norm_gkm1_squared::T,
reference_value::T, α::T, δ::T, ρ::T; max_iteration::Int64 = 100)
where {T, S}
Implementation of backtracking which modifies θk in place. This method assumes that the step direction is -gkm1.
Reference(s)
Nocedal and Wright. "Numerical Optimization". Springer New York, NY.
Method
Let $\theta_{k-1}$ be the current iterate, and let $\alpha \in \mathbb{R}_{>0}$, $\delta \in (0, 1)$, and $\rho \in (0, 1)$. Let $\dot F(\theta_{k-1})$ be the step direction, then $\theta_k = \theta_{k-1} - \delta^t\alpha \dot F(\theta_{k-1})$ where $t + 1 \in \mathbb{N}$ is the smallest such number satisfying
\[ F(\theta_k) \leq O_{k-1} - \rho\delta^t\alpha ||\dot F(\theta_{k-1})||_2^2,\]
where $O_{k-1}$ is some reference value, and $||\cdot||_2$ is the L2-norm.
Arguments
- θk::S, buffer array for the next iterate.
- θkm1::S, current iterate of the optimization algorithm.
- F::Function, objective function. Should take in a single argument and return the value of the objective at the input value.
- gkm1::S, gradient value at θkm1.
- norm_gkm1_squared::T, norm squared for the value of gkm1.
- reference_value::T, value to check the objective function at θk against.
- α::T, initial step size.
- δ::T, backtracking decrease factor.
- ρ::T, factor involved in the acceptance criterion for θk. Larger values correspond to stricter descent conditions, and smaller values correspond to looser descent conditions on θk.
Optional Keyword Arguments
- max_iteration::Int64 = 100, the maximum allowable iterations for the line search procedure. In the language above, when $t$ is equal to max_iteration the algorithm will terminate.
Return
- backtracking_condition_satisfied::Bool, whether the backtracking condition is satisfied before the max iteration limit.
Non-sequential Armijo
OptimizationMethods.non_sequential_armijo_condition — Function
non_sequential_armijo_condition(F_ψjk::T, reference_value::T,
    norm_grad_θk::T, ρ::T, δk::T, α0k::T) where {T}
Check if F_ψjk satisfies the non-sequential Armijo condition with respect to reference_value and the remaining parameters. Returns a boolean value indicating if the descent condition is satisfied or not.
Method
Let $F(\theta) : \mathbb{R}^n \to \mathbb{R}$ be the objective function. Suppose that $\theta_k \in \mathbb{R}^n$ is an iterate of an optimization routine. Let $\psi_0^k = \theta_k$ and for $j \in \lbrace 1,...,T \rbrace$ for some $T \in \mathbb{N}$, recursively define
\[ \psi_j^k = \psi_0^k - \sum_{t = 0}^{j-1} \delta_k \alpha_t^k \dot F(\psi_t^k).\]
Then, the (monotone) non-sequential Armijo condition requires that
\[ F(\psi_j^k) < F(\psi_0^k) - \rho \delta_k \alpha_0^k ||\dot F(\psi_0^k)||_2^2,\]
where $||\cdot||_2$ is the L2-norm and $\rho \in (0, 1)$.
This function implements checking the inequality, where F_ψjk corresponds to $F(\psi_j^k)$, reference_value corresponds to $F(\psi_0^k)$, norm_grad_θk to $||\dot F(\psi_0^k)||_2$, ρ to $\rho$, δk to $\delta_k$, and α0k to $\alpha_0^k$. To see more about how this method is used, read the documentation for gradient descent with non-sequential Armijo line search.
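A minimal sketch of this check, not the package function, following the inequality displayed above.

```julia
function nsa_condition_sketch(F_ψjk, reference_value, norm_grad_θk, ρ, δk, α0k)
    return F_ψjk < reference_value - ρ * δk * α0k * norm_grad_θk^2
end

nsa_condition_sketch(0.9, 1.0, 1.0, 0.1, 1.0, 0.5)   # true: 0.9 < 1.0 - 0.05
```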
Arguments
- F_ψjk::T, numeric value on the LHS of the inequality. In the optimization context, this is the objective value of a trial iterate to check if sufficient descent is achieved.
- reference_value::T, numeric value on the RHS of the inequality. In the optimization context, the value of the current iterate F_ψjk must be smaller than this to guarantee a sufficient descent criterion.
- norm_grad_θk::T, numeric value forming the amount of descent that needs to be achieved. This value is usually the norm of the gradient of a previous iterate.
- ρ::T, parameter in the line search criterion dictating how much descent should be required. Should be positive. Larger values indicate stricter conditions, and lower values indicate looser conditions.
- δk::T, numeric value that corresponds to a scaling factor for the step size.
- α0k::T, numeric value. In the context of non-sequential Armijo gradient descent this is the first step size used in an inner loop.
Return
- flag::Bool, true if the descent condition is satisfied, and false if the descent condition is not satisfied.
Index
OptimizationMethods.BacktrackingGD
OptimizationMethods.BarzilaiBorweinGD
OptimizationMethods.DiminishingStepGD
OptimizationMethods.FixedStepGD
OptimizationMethods.FixedStepNonmonLSMaxValG
OptimizationMethods.LeastSquares
OptimizationMethods.LipschitzApproxGD
OptimizationMethods.LogisticRegression
OptimizationMethods.NesterovAcceleratedGD
OptimizationMethods.NonsequentialArmijoGD
OptimizationMethods.PoissonRegression
OptimizationMethods.QLLogisticCenteredExp
OptimizationMethods.QLLogisticCenteredLog
OptimizationMethods.QLLogisticMonomial
OptimizationMethods.QLLogisticSin
OptimizationMethods.WeightedNormDampingGD
OptimizationMethods.backtracking!
OptimizationMethods.backtracking_gd
OptimizationMethods.barzilai_borwein_gd
OptimizationMethods.centered_exp
OptimizationMethods.centered_shifted_log
OptimizationMethods.compute_step_size
OptimizationMethods.dcentered_exp
OptimizationMethods.dcentered_shifted_log
OptimizationMethods.ddlogistic
OptimizationMethods.diminishing_step_gd
OptimizationMethods.dlinear_plus_sin
OptimizationMethods.dlogistic
OptimizationMethods.dmonomial_plus_constant
OptimizationMethods.fixed_step_gd
OptimizationMethods.fixed_step_nls_maxval_gd
OptimizationMethods.inner_loop!
OptimizationMethods.inverse_complimentary_log_log
OptimizationMethods.inverse_k_step_size
OptimizationMethods.inverse_log2k_step_size
OptimizationMethods.inverse_probit
OptimizationMethods.linear_plus_sin
OptimizationMethods.lipschitz_approximation_gd
OptimizationMethods.logistic
OptimizationMethods.monomial_plus_constant
OptimizationMethods.nesterov_accelerated_gd
OptimizationMethods.non_sequential_armijo_condition
OptimizationMethods.nonsequential_armijo_gd
OptimizationMethods.stepdown_100_step_size
OptimizationMethods.update_algorithm_parameters!
OptimizationMethods.update_local_lipschitz_approximation
OptimizationMethods.weighted_norm_damping_gd