HaluEval is a large-scale hallucination evaluation benchmark designed for Large Language Models (LLMs). It provides a comprehensive collection of generated and human-annotated hallucinated samples for evaluating how well LLMs recognize hallucinations [1][2].
Here are the key details about the HaluEval dataset:
Data Sources:
- QA: questions and ground-truth answers drawn from HotpotQA.
- Dialogue: knowledge-grounded conversations from OpenDialKG.
- Summarization: documents and reference summaries from CNN/Daily Mail.

Data Composition:
- 35,000 samples in total: 30,000 task-specific examples plus 5,000 general user queries paired with human-annotated ChatGPT responses.

Task-Specific Examples:
- 10,000 examples each for question answering, knowledge-grounded dialogue, and text summarization, each paired with a ChatGPT-generated hallucinated counterpart.

Data Release:
- `qa_data.json`: Hallucinated QA samples.
- `dialogue_data.json`: Hallucinated dialogue samples.
- `summarization_data.json`: Hallucinated summarization samples.
- `general_data.json`: Human-annotated ChatGPT responses to general user queries.

Source: Conversation with Bing, 3/17/2024
(1) HaluEval: A Hallucination Evaluation Benchmark for LLMs. https://github.com/RUCAIBox/HaluEval
(2) jzjiao/halueval-sft · Datasets at Hugging Face. https://huggingface.co/datasets/jzjiao/halueval-sft
(3) HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. https://aclanthology.org/2023.emnlp-main.397/
(4) HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. https://arxiv.org/abs/2305.11747
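The released files can be read with a few lines of Python. This is a minimal sketch assuming the files are in JSON Lines format (one record per line); the sample record below uses hypothetical field names (`question`, `right_answer`, `hallucinated_answer`) that should be checked against the repository before use.

```python
import json

def load_halueval(path):
    """Load a HaluEval data file, assuming one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical record mimicking a qa_data.json entry; real field names
# may differ from these assumptions.
sample = {
    "question": "Which magazine was started first?",
    "right_answer": "Arthur's Magazine",
    "hallucinated_answer": "First for Women was started first.",
}

# Write a one-record file so the loader can be demonstrated end to end.
with open("qa_sample.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")

records = load_halueval("qa_sample.json")
print(len(records))  # 1
```

Reading line by line (rather than `json.load` on the whole file) keeps memory use flat even for the larger task-specific files.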