References
- Armijo, L. (1966). Minimization of Functions Having Lipschitz Continuous First Partial Derivatives. Pacific Journal of Mathematics 16, 1–3.
- Barzilai, J. and Borwein, J. M. (1988). Two-Point Step Size Gradient Methods. IMA Journal of Numerical Analysis 8, 141–148.
- Bertsekas, D. (2016). Nonlinear Programming. 3rd Edition (Athena Scientific).
- Grippo, L.; Lampariello, F. and Lucidi, S. (1986). A Nonmonotone Line Search Technique for Newton's Method. SIAM Journal on Numerical Analysis 23, 707–716.
- Lanteri, A.; Leorato, S.; Lopez-Fidalgo, J. and Tommasi, C. (2023). Designing to Detect Heteroscedasticity in a Regression Model. Journal of the Royal Statistical Society Series B 85, 315–326.
- Li, H.; Qian, J.; Tian, Y.; Rakhlin, A. and Jadbabaie, A. (2023). Convex and Non-convex Optimization Under Generalized Smoothness. Advances in Neural Information Processing Systems 36.
- Malitsky, Y. and Mishchenko, K. (2020). Adaptive Gradient Descent without Descent. In: Proceedings of the 37th International Conference on Machine Learning (PMLR); pp. 6702–6712.
- Nesterov, Y. (1983). A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k²). Proceedings of the USSR Academy of Sciences 269, 543–547.
- Nocedal, J. and Wright, S. (2006). Numerical Optimization. 2nd Edition, Springer Series in Operations Research and Financial Engineering (Springer New York, NY).
- Patel, V. and Berahas, A. (2024). Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients. SIAM Journal on Mathematics of Data Science 6, 579–846.
- Wedderburn, R. W. (1974). Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss–Newton Method. Biometrika 61, 439–447.