CodeBERT is a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search and code documentation generation. CodeBERT is built on a Transformer-based neural architecture and is trained with a hybrid objective function that incorporates the pre-training task of replaced token detection, in which the model learns to detect plausible alternative tokens sampled from generators. This enables the utilization of both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators.
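To make the replaced token detection (RTD) objective concrete, here is a minimal sketch of how a training example might be constructed. This is an illustration, not CodeBERT's implementation: the real model uses learned bimodal generators to propose replacements, whereas this sketch stands in a uniform draw from a toy vocabulary. The function name `make_rtd_example` and all parameters are hypothetical.

```python
import random

def make_rtd_example(tokens, vocab, replace_prob=0.15, seed=0):
    """Build one replaced-token-detection training pair.

    Each token is replaced with probability `replace_prob` by an
    alternative drawn from `vocab` (a uniform stand-in for CodeBERT's
    learned generators). Labels mark replaced positions (1) vs. kept
    positions (0); the discriminator is trained to predict them.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice([v for v in vocab if v != tok]))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

# Toy PL input: a tokenized Python snippet.
tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
vocab = sorted(set(tokens)) + ["sub", "-", "*"]
corrupted, labels = make_rtd_example(tokens, vocab)
```

The discriminator sees only `corrupted` and must recover `labels`, which is why RTD can exploit unlabeled unimodal code: the supervision signal is manufactured by the corruption itself.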
Source: CodeBERT: A Pre-Trained Model for Programming and Natural Languages
| Task | Papers | Share |
|---|---|---|
| Code Search | 9 | 12.50% |
| Language Modelling | 6 | 8.33% |
| Vulnerability Detection | 5 | 6.94% |
| Code Generation | 5 | 6.94% |
| Retrieval | 3 | 4.17% |
| Code Documentation Generation | 3 | 4.17% |
| Graph Neural Network | 2 | 2.78% |
| Large Language Model | 2 | 2.78% |
| Code Classification | 2 | 2.78% |