Code Generation

351 papers with code • 16 benchmarks • 43 datasets

Code Generation is an important field to predict explicit code or program structure from multimodal data sources such as incomplete code, programs in another programming language, natural language descriptions or execution examples. Code Generation tools can assist the development of automatic programming tools to improve programming productivity.

Source: Deep Learning for Source Code Modeling and Generation

Image source: Measuring Coding Challenge Competence With APPS

Libraries

Use these libraries to find Code Generation models and implementations
4 papers
65
2 papers
7,842
See all 15 libraries.

Most implemented papers

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

salesforce/codet5 EMNLP 2021

We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

salesforce/CodeGen 25 Mar 2022

To democratize this, we train and release a family of large language models up to 16. 1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER.

SantaCoder: don't reach for the stars!

bigcode-project/starcoder 9 Jan 2023

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code.

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

gair-nlp/factool 25 Jul 2023

With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e. g., ChatGPT).

TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation

pcyin/tranX EMNLP 2018

We present TRANX, a transition-based neural semantic parser that maps natural language (NL) utterances into formal meaning representations (MRs).

Content Enhanced BERT-based Text-to-SQL Generation

guotong1988/NL2SQL-BERT 16 Oct 2019

We present a simple methods to leverage the table content for the BERT-based model to solve the text-to-SQL problem.

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

StarCoder: may the source be with you!

bigcode-project/starcoder 9 May 2023

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15. 5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention.

How is ChatGPT's behavior changing over time?

lchen001/llmdrift 18 Jul 2023

We find that the performance and behavior of both GPT-3. 5 and GPT-4 can vary greatly over time.

Mistral 7B

mistralai/mistral-src 10 Oct 2023

We introduce Mistral 7B v0. 1, a 7-billion-parameter language model engineered for superior performance and efficiency.