Cosine Annealing

Introduced by Loshchilov et al. in SGDR: Stochastic Gradient Descent with Warm Restarts

Cosine Annealing is a type of learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then rapidly increases it again. Resetting the learning rate acts like a simulated restart of the learning process. Re-using good weights as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.

$$\eta_{t} = \eta_{min}^{i} + \frac{1}{2}\left(\eta_{max}^{i}-\eta_{min}^{i}\right)\left(1+\cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right) $$

where $\eta_{min}^{i}$ and $\eta_{max}^{i}$ are the lower and upper bounds of the learning rate during the $i$-th run, $T_{cur}$ is the number of epochs performed since the last restart, and $T_{i}$ is the total number of epochs in the $i$-th run.
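The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the paper's reference implementation: the function names `cosine_annealing_lr` and `sgdr_schedule`, the default values, and the `t_mult` factor for lengthening each run are assumptions made for the example, and the learning rate is updated once per epoch for simplicity.

```python
import math


def cosine_annealing_lr(t_cur, t_i, eta_min, eta_max):
    """Learning rate after t_cur epochs of a run that lasts t_i epochs."""
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + cosine)


def sgdr_schedule(num_epochs, t_0, t_mult=1, eta_min=0.0, eta_max=0.1):
    """Return one learning rate per epoch, restarting whenever a run ends.

    t_0 is the length of the first run; each later run is t_mult times
    longer than the previous one (hypothetical parameter names).
    """
    t_i, t_cur = t_0, 0
    rates = []
    for _ in range(num_epochs):
        rates.append(cosine_annealing_lr(t_cur, t_i, eta_min, eta_max))
        t_cur += 1
        if t_cur >= t_i:
            # Warm restart: the learning rate jumps back to eta_max,
            # while the model weights would be kept as they are.
            t_cur = 0
            t_i *= t_mult
    return rates


if __name__ == "__main__":
    for epoch, lr in enumerate(sgdr_schedule(num_epochs=6, t_0=3)):
        print(f"epoch {epoch}: lr = {lr:.4f}")
```

With a per-epoch update, $T_{cur}$ never reaches $T_{i}$ inside a run, so the schedule restarts just before hitting $\eta_{min}^{i}$ exactly. Passing a fractional `t_cur` (for example, updated after every batch) gives a smoother curve that gets closer to the minimum.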

Text Source: Jason Brownlee

Image Source: Gao Huang

Source: SGDR: Stochastic Gradient Descent with Warm Restarts

Tasks

Task	Papers	Share
Language Modelling	73	9.77%
Large Language Model	46	6.16%
Question Answering	34	4.55%
Retrieval	31	4.15%
In-Context Learning	25	3.35%
Text Generation	24	3.21%
Sentence	23	3.08%
Code Generation	22	2.95%
Prompt Engineering	18	2.41%

Categories

Learning Rate Schedules