GPT

Introduced by Radford et al. in Improving Language Understanding by Generative Pre-Training

GPT is a Transformer-based architecture and training procedure for natural language processing tasks. Training has two stages. First, a language modeling objective is applied to unlabeled text to learn the initial parameters of a neural network model. These parameters are then adapted to a target task using the corresponding supervised objective.
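The two objectives can be sketched in NumPy, assuming the model has already produced per-position vocabulary logits (stage 1) or task-class logits (stage 2). The function names (`lm_loss`, `supervised_loss`, `fine_tune_objective`) and the parameter `aux_weight` are hypothetical, and toy all-zero logits stand in for a real Transformer. The GPT paper also describes keeping the language modeling loss as a weighted auxiliary term during fine-tuning; `fine_tune_objective` illustrates that combination.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def lm_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Stage 1 (pre-training): mean next-token negative log-likelihood.

    logits: (T, V) scores the model produced at positions 0..T-1.
    tokens: (T,) token ids; the output at position t is scored against tokens[t + 1].
    """
    logp = log_softmax(logits[:-1])  # the last position has no next token to predict
    targets = tokens[1:]
    return float(-logp[np.arange(len(targets)), targets].mean())


def supervised_loss(class_logits: np.ndarray, label: int) -> float:
    """Stage 2 (fine-tuning): negative log-likelihood of the correct label."""
    return float(-log_softmax(class_logits)[label])


def fine_tune_objective(task_loss: float, lm_aux_loss: float, aux_weight: float) -> float:
    # Task loss plus a weighted auxiliary LM loss; aux_weight is a hypothetical name.
    return task_loss + aux_weight * lm_aux_loss


# Toy usage: uniform (all-zero) logits give losses of log(V) and log(num_classes).
tokens = np.array([0, 3, 1, 2])
print(lm_loss(np.zeros((4, 5)), tokens))      # about 1.609, i.e. log(5)
print(supervised_loss(np.zeros(3), label=1))  # about 1.099, i.e. log(3)
```

In a real setup both losses would be computed from the same pre-trained network, with a task-specific output layer added for stage 2.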

Source: Improving Language Understanding by Generative Pre-Training



Tasks

Task	Papers	Share of papers
Language Modelling	89	11.10%
Large Language Model	54	6.73%
Question Answering	35	4.36%
Prompt Engineering	26	3.24%
Retrieval	23	2.87%
Text Generation	22	2.74%
In-Context Learning	21	2.62%
Sentence	19	2.37%
Decision Making	18	2.24%


Components

Component	Type
Adam	Stochastic Optimization
Attention Dropout	Regularization
BPE	Subword Segmentation
Dense Connections	Feedforward Networks
Discriminative Fine-Tuning	Fine-Tuning
Dropout	Regularization
GELU	Activation Functions
Layer Normalization	Normalization
Linear Warmup With Cosine Annealing	Learning Rate Schedules
Multi-Head Attention	Attention Modules
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
Softmax	Output Functions
Weight Decay	Regularization
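As a rough illustration of three of the listed components (scaled dot-product attention with the causal mask an autoregressive model needs, multi-head attention, and linear warmup with cosine annealing), here is a NumPy sketch. The function names, the unbatched single-sequence shapes, and the square projection matrices are assumptions for illustration. It omits the dropout, layer normalization, residual connections, and GELU feed-forward layers that a full decoder block would include, and the hyperparameter values in the usage lines are arbitrary.

```python
import math

import numpy as np


def causal_attention(q, k, v):
    """Scaled dot-product attention where position t attends only to positions <= t."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                           # (T, T) scaled similarities
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)                # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v, weights


def multi_head_causal_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (T, d_model); each w_*: (d_model, d_model); d_model must be divisible by n_heads."""
    T, d_model = x.shape
    d_head = d_model // n_heads

    def split(h):  # (T, d_model) -> (n_heads, T, d_head)
        return h.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    heads = [causal_attention(q[i], k[i], v[i])[0] for i in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ w_o  # concatenate heads, project back to (T, d_model)


def warmup_cosine_lr(step, max_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to max_lr, then cosine annealing down to 0 at total_steps."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))


# Toy usage with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_causal_self_attention(x, w_q, w_k, w_v, w_o, n_heads=2)
print(out.shape)  # (5, 8)
print([round(warmup_cosine_lr(s, 1e-3, 10, 110), 6) for s in (0, 5, 10, 60, 110)])
```

Because of the mask, changing a later token never changes the outputs at earlier positions, which is what lets the model be trained on next-token prediction over a whole sequence at once.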

Categories


Transformers

Autoregressive Transformers
