RewardBench

Introduced by Lambert et al. in RewardBench: Evaluating Reward Models for Language Modeling

RewardBench is a benchmark designed to evaluate the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It is the first dedicated evaluation tool for reward models and is intended to give a clearer picture of their performance and reliability¹.

Here are the key components of RewardBench:

  1. Common Inference Code: The repository provides shared inference code for a range of reward models, such as Starling, PairRM, and OpenAssistant, so they can all be evaluated with the same tooling¹.

  2. Dataset and Evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It benchmarks reward models on challenging, structured, and out-of-distribution queries, with the goal of improving scientific understanding of reward models and how they behave². A minimal loading sketch appears after this list.

  3. Scripts for Evaluation:

    • scripts/run_rm.py: Used to evaluate individual reward models (a conceptual sketch of this pairwise scoring step appears after this list).
    • scripts/run_dpo.py: Used to evaluate direct preference optimization (DPO) models.
    • scripts/train_rm.py: A basic reward model training script built on TRL (Transformer Reinforcement Learning)¹.
  4. Installation and Usage:

    • Install PyTorch on your system.
    • Install the required dependencies by running pip install -e . from the repository root.
    • Set the HF_TOKEN environment variable to your Hugging Face access token.
    • To contribute your model to the leaderboard, open an issue on Hugging Face with the model name; to evaluate a model locally, follow the instructions in the repository¹.
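
The dataset from item 2 can be inspected directly. Below is a minimal sketch, assuming the benchmark is published on the Hugging Face Hub as allenai/reward-bench and that each record carries prompt, chosen, rejected, and subset fields; the exact split and column names should be checked against the dataset card.

```python
from datasets import load_dataset

# Assumption: the benchmark is hosted on the Hugging Face Hub as
# "allenai/reward-bench"; split and column names are taken from the dataset
# card and should be verified before use.
ds = load_dataset("allenai/reward-bench")
print(ds)  # shows the available splits and their sizes

split = list(ds.keys())[0]          # pick the first available split
example = ds[split][0]              # one prompt-win-lose trio
print(example["subset"])            # e.g. a chat, reasoning, or safety subset
print(example["prompt"])            # the shared prompt
print(example["chosen"])            # the preferred ("win") completion
print(example["rejected"])          # the dispreferred ("lose") completion
```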
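
For intuition about what scripts/run_rm.py measures, the sketch below scores a single chosen/rejected pair with a classifier-style reward model and checks whether the chosen response wins. This is not the repository's implementation (which handles chat templates, batching, and many model architectures); the model name is only one public example of a sequence-classification reward model, and the sketch assumes the installation steps above (dependencies installed, HF_TOKEN set) have been completed.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative only: any sequence-classification reward model that emits a
# single scalar logit will do; this OpenAssistant model is one public example.
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def score(prompt: str, response: str) -> float:
    """Return the scalar reward the model assigns to (prompt, response)."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# A trio counts as correct when the chosen completion outscores the rejected
# one; benchmark accuracy is the fraction of trios scored correctly.
prompt = "Explain why the sky appears blue."
chosen = "Air molecules scatter short blue wavelengths of sunlight more strongly."
rejected = "The sky is blue because it reflects the color of the ocean."
print("chosen preferred:", score(prompt, chosen) > score(prompt, rejected))
```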

RewardBench thus provides a standardized way to assess reward models, supporting transparency and comparability across different approaches.

(1) GitHub: allenai/reward-bench. https://github.com/allenai/reward-bench
(2) Lambert et al. RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787
(3) RewardBench: Evaluating Reward Models for Language Modeling. Papers with Code. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for
