TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Code Generation	HumanEval	AgentCoder (GPT-4)	Pass@1	96.3	# 2
Code Generation	HumanEval	AgentCoder (ChatGPT)	Pass@1	79.9	# 12
Code Generation	MBPP	GPT-4 + AgentCoder	Accuracy	91.8	# 1
Code Generation	MBPP	GPT-3.5 Turbo (ChatGPT) + AgentCoder	Accuracy	89.9	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/agentcoder-multi-agent-based-code-generation/code-generation-on-mbpp)](https://paperswithcode.com/sota/code-generation-on-mbpp?p=agentcoder-multi-agent-based-code-generation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/agentcoder-multi-agent-based-code-generation/code-generation-on-humaneval)](https://paperswithcode.com/sota/code-generation-on-humaneval?p=agentcoder-multi-agent-based-code-generation)`

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

20 Dec 2023 · Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao QING, Heming Cui

The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly code generation, helping developers create software more efficiently. Despite these advances, balancing code snippet generation with effective test case generation and execution remains a challenge. To address this, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a multi-agent framework with three specialized agents: the programmer agent, the test designer agent, and the test executor agent. During coding, the programmer agent generates code and refines it based on the test executor agent's feedback. The test designer agent generates test cases for the generated code, and the test executor agent runs the code against those test cases and returns feedback to the programmer agent. This collaborative system is designed to make code generation more robust than single-agent models and traditional methodologies allow. Our extensive experiments on 9 code generation models and 12 enhancement approaches show that AgentCoder outperforms existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 on the HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, respectively, while the state of the art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.
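
The three-agent loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`run_tests`, `agent_loop`, `stub_programmer`, `stub_test_designer`), the `max_rounds` limit, and the choice to generate tests once from the task description are all assumptions made for illustration, and the two LLM-backed agents are replaced by hard-coded stubs so the example runs without any model access.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_tests(code: str, tests: str) -> Optional[str]:
    """Test executor (hypothetical): run the code, then the tests.

    Returns None when every test passes, otherwise a short error report
    that is fed back to the programmer agent.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate solution
        exec(tests, namespace)  # run assert-style test cases against it
    except Exception:
        return traceback.format_exc(limit=1)
    return None


def agent_loop(
    task: str,
    programmer: Callable[[str, Optional[str]], str],
    test_designer: Callable[[str], str],
    max_rounds: int = 5,  # assumed iteration budget, not taken from the paper
) -> Tuple[str, bool, int]:
    """Iterate: generate code, execute tests, refine on feedback."""
    tests = test_designer(task)  # assumption: tests are generated once per task
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = programmer(task, feedback)
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, True, round_no
    return code, False, max_rounds


# Stub agents standing in for LLM calls (purely illustrative).
def stub_programmer(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b\n"      # "refined" after feedback


def stub_test_designer(task: str) -> str:
    return "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"


if __name__ == "__main__":
    code, passed, rounds = agent_loop(
        "add two numbers", stub_programmer, stub_test_designer
    )
    print(passed, rounds)
```

In this sketch the first draft fails the designed tests, the executor's error report is passed back, and the second draft passes. A real system would replace the stubs with model calls and run untrusted code in a sandbox rather than with `exec`.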

Code

huangd1999/AgentCoder official

Tasks

Code Generation

Prompt Engineering

Datasets

HumanEval MBPP

Results from the Paper

Ranked #1 on Code Generation on MBPP

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • Focus • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay
