References
- Armijo, L. (1966). Minimization of Functions Having Lipschitz Continuous First Partial Derivatives. Pacific Journal of Mathematics 16, 1–3.
- Barzilai, J. and Borwein, J. M. (1988). Two-Point Step Size Gradient Methods. IMA Journal of Numerical Analysis 8, 141–148.
- Bertsekas, D. (2016). Nonlinear Programming. 3rd Edition (Athena Scientific).
- Grippo, L.; Lampariello, F. and Lucidi, S. (1986). A Nonmonotone Line Search Technique for Newton's Method. SIAM Journal on Numerical Analysis 23, 707–716.
- Lanteri, A.; Leorato, S.; Lopez-Fidalgo, J. and Tommasi, C. (2023). Designing to Detect Heteroscedasticity in a Regression Model. Journal of the Royal Statistical Society Series B 85, 315–326.
- Li, H.; Qian, J.; Tian, Y.; Rakhlin, A. and Jadbabaie, A. (2023). Convex and Non-convex Optimization Under Generalized Smoothness. Advances in Neural Information Processing Systems 36.
- Malitsky, Y. and Mishchenko, K. (2020). Adaptive Gradient Descent without Descent. In: Proceedings of the 37th International Conference on Machine Learning (PMLR); pp. 6702–6712.
- Nesterov, Y. (1983). A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k²). Proceedings of the USSR Academy of Sciences 269, 543–547.
- Nocedal, J. and Wright, S. (2006). Numerical Optimization. 2nd Edition, Springer Series in Operations Research and Financial Engineering (Springer New York, NY).
- Patel, V. and Berahas, A. (2024). Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients. SIAM Journal on Mathematics of Data Science 6, 579–846.
- Wedderburn, R. W. (1974). Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss–Newton Method. Biometrika 61, 439–447.