CodeBERT is a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search and code documentation generation. CodeBERT is built on a Transformer-based neural architecture and is trained with a hybrid objective function that incorporates the pre-training task of replaced token detection, in which the model learns to detect plausible alternative tokens sampled from generators. This enables the utilization of both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators.
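To make the replaced token detection (RTD) objective concrete, here is a minimal sketch of how a training example might be constructed. This is an illustration, not CodeBERT's implementation: the real model uses learned bimodal generators to propose replacements, whereas this sketch stands in a uniform draw from a toy vocabulary. The function name `make_rtd_example` and all parameters are hypothetical.

```python
import random

def make_rtd_example(tokens, vocab, replace_prob=0.15, seed=0):
    """Build one replaced-token-detection training pair.

    Each token is replaced with probability `replace_prob` by an
    alternative drawn from `vocab` (a uniform stand-in for CodeBERT's
    learned generators). Labels mark replaced positions (1) vs. kept
    positions (0); the discriminator is trained to predict them.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice([v for v in vocab if v != tok]))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

# Toy PL input: a tokenized Python snippet.
tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
vocab = sorted(set(tokens)) + ["sub", "-", "*"]
corrupted, labels = make_rtd_example(tokens, vocab)
```

The discriminator sees only `corrupted` and must recover `labels`, which is why RTD can exploit unlabeled unimodal code: the supervision signal is manufactured by the corruption itself.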
Source: CodeBERT: A Pre-Trained Model for Programming and Natural Languages
| Task | Papers | Share |
|---|---|---|
| Code Search | 9 | 12.50% |
| Language Modelling | 6 | 8.33% |
| Vulnerability Detection | 5 | 6.94% |
| Code Generation | 5 | 6.94% |
| Retrieval | 3 | 4.17% |
| Code Documentation Generation | 3 | 4.17% |
| Graph Neural Network | 2 | 2.78% |
| Large Language Model | 2 | 2.78% |
| Code Classification | 2 | 2.78% |