Nesterov Accelerated Gradient is a momentum-based SGD optimizer that "looks ahead": the gradient is evaluated at the approximate future position of the parameters, after the momentum step, rather than at their current position:
$$ v_{t} = \gamma v_{t-1} + \eta\nabla_{\theta}J\left(\theta_{t-1}-\gamma v_{t-1}\right) $$ $$\theta_{t} = \theta_{t-1} - v_{t}$$
As with SGD with momentum, $\gamma$ is usually set to $0.9$.
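As a concrete illustration, here is a minimal NumPy sketch of one NAG update. The function `nag_step`, the gradient callback `grad`, and the hyperparameter values are illustrative assumptions for this sketch, not any library's API:

```python
import numpy as np

def nag_step(theta, v, grad, lr=0.01, gamma=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch).

    theta : current parameters
    v     : velocity accumulated from previous steps
    grad  : callable returning the gradient of J at a given point
    """
    lookahead = theta - gamma * v         # where momentum alone would take us
    v = gamma * v + lr * grad(lookahead)  # gradient measured at the lookahead point
    return theta - v, v                   # descend using the corrected velocity

# Example: minimise J(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta, v = np.ones(3), np.zeros(3)
for _ in range(100):
    theta, v = nag_step(theta, v, grad=lambda t: t)
```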
The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it; the sketch below makes the difference explicit.
*Image source: Geoff Hinton lecture notes*
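In code, the contrast with classical momentum is a one-term change. A hedged sketch, reusing the illustrative names from above:

```python
def momentum_step(theta, v, grad, lr=0.01, gamma=0.9):
    # Classical momentum: measure the gradient at the current position,
    # then jump in the direction of the updated accumulated velocity.
    v = gamma * v + lr * grad(theta)
    return theta - v, v

# Nesterov momentum differs only in where the gradient is evaluated:
# grad(theta - gamma * v) instead of grad(theta), i.e. the gradient is
# measured after the momentum jump and applied as a correction
# (see nag_step above).
```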
Task | Papers | Share
---|---|---
Image Classification | 8 | 19.05%
General Classification | 3 | 7.14%
Object Recognition | 3 | 7.14%
Semantic Segmentation | 2 | 4.76%
Open-Ended Question Answering | 1 | 2.38%
Question Answering | 1 | 2.38%
Denoising | 1 | 2.38%
Image Denoising | 1 | 2.38%
Sparse Learning | 1 | 2.38%