RL Unplugged is suite of benchmarks for offline reinforcement learning. The RL Unplugged is designed around the following considerations: to facilitate ease of use, the datasets are provided with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.
7 PAPERS • NO BENCHMARKS YET
CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments. It's designed to test your agent's generalization capabilities in all scenarios where intra-task generalization is important.
6 PAPERS • NO BENCHMARKS YET
Random sampled instances of the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) for 20, 50 and 100 customer nodes.
Memory Maze is a 3D domain of randomized mazes designed for evaluating the long-term memory abilities of RL agents. Memory Maze isolates long-term memory from confounding challenges, such as exploration, and requires remembering several pieces of information: the positions of objects, the wall layout, and keeping track of agent’s own position.
The task of Room Rearrangement consists on an agent exploring a room and recording objects' initial configurations. The agent is removed and the poses and states (e.g., open/closed) of some objects in the room are changed. The agent must restore the initial configurations of all objects in the room.
SPACE is a simulator for physical Interactions and causal learning in 3D environments. The SPACE simulator is used to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact.
6 PAPERS • 1 BENCHMARK
Collected in the snow belt region of Michigan's Upper Peninsula, WADS is the first multi-modal dataset featuring dense point-wise labeled sequential LiDAR scans collected in severe winter weather.
safe-control-gym is an open-source benchmark suite that extends OpenAI's Gym API with (i) the ability to specify (and query) symbolic models and constraints and (ii) introduce simulated disturbances in the control inputs, measurements, and inertial properties. We provide implementations for three dynamic systems -- the cart-pole, 1D, and 2D quadrotor -- and two control tasks -- stabilization and trajectory tracking.
The DeepMind Alchemy environment is a meta-reinforcement learning benchmark that presents tasks sampled from a task distribution with deep underlying structure. It was created to test for the ability of agents to reason and plan via latent state inference, as well as useful exploration and experimentation.
5 PAPERS • NO BENCHMARKS YET
To collect the 3D Vehicle Tracking Simulation Dataset, a driving simulation is used to obtain accurate 3D bounding box annotations at no cost of human efforts. The data collection and annotation pipeline extend the previous works like VIPER and FSV, especially in terms of linking identities across frames. The simulation is based on Grand Theft Auto V, a modern game that simulates a functioning city and its surroundings in a photo-realistic three-dimensional world. Note that the pipeline is real-time, providing the potential of largescale data collection, while VIPER requires expensive offline processings.
4 PAPERS • NO BENCHMARKS YET
Numerical simulations of Earth's weather and climate require substantial amounts of computation. This has led to a growing interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive. This has made them a popular target for neural network-based emulators. However, prior work is hard to compare due to the lack of a comprehensive dataset and standardized best practices for ML benchmarking. To fill this gap, we build a large dataset, ClimART, with more than \emph{10 million samples from present, pre-industrial, and future climate conditions}, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed.
LemgoRL is an open-source benchmark tool for traffic signal control designed to train reinforcement learning agents in a highly realistic simulation scenario with the aim to reduce Sim2Real gap. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI gym toolkit to enable easy deployment in existing research work.
Phy-Q is a benchmark that requires an agent to reason about physical scenarios and take an action accordingly. Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, the authors identify 15 essential physical scenarios. For each scenario, a wide variety of distinct task templates are created, and all the task templates within the same scenario can be solved by using one specific physical rule.
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
3 PAPERS • NO BENCHMARKS YET
The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions fo
3 PAPERS • 1 BENCHMARK
FluidLab is a simulation environment with a diverse set of manipulation tasks involving complex fluid dynamics. These tasks address interactions between solid and fluid as well as among multiple fluids.
The MUAD dataset (Multiple Uncertainties for Autonomous Driving), consisting of 10,413 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, object, and instance detection. Predictive uncertainty estimation is essential for the safe deployment of Deep Neural Networks in real-world autonomous systems and MUAD allows to a better assess the impact of different sources of uncertainty on model performance.
MineRLis an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks which highlights many of the hardest problems in modern-day Reinforcement Learning: sparse rewards and hierarchical policies.
Symbolic Interactive Language Grounding (SILG) is a multi-environment benchmark which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity.
ThreeDWorld Transport Challenge is a visually-guided and physics-driven task-and-motion planning benchmark. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desired final location. Several containers are positioned around the house that can be used as tools to assist with transporting objects efficiently. To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints.
rSoccer is an open-source simulator for the IEEE Very Small Size Soccer and the Small Size League optimized for reinforcement learning experiments.
Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding and interaction with our built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited and detailed annotations on architectural exteriors due to privacy concerns and the expensive costs of data acquisition and annotation. To overcome this shortfall, this paper introduces a semantically-enriched, photo-realistic 3D architectural models dataset and benchmark for semantic segmentation. It features 4 different building purposes of real-world buildings as well as an open architectural landscape in Hong Kong. Each point cloud is annotated into one of 14 semantic classes.
2 PAPERS • NO BENCHMARKS YET
The temporal variability in calving front positions of marine-terminating glaciers permits inference on the frontal ablation. Frontal ablation, the sum of the calving rate and the melt rate at the terminus, significantly contributes to the mass balance of glaciers. Therefore, the glacier area has been declared as an Essential Climate Variable product by the World Meteorological Organization. The presented dataset provides the necessary information for training deep learning techniques to automate the process of calving front delineation. The dataset includes Synthetic Aperture Radar (SAR) images of seven glaciers distributed around the globe. Five of them are located in Antarctica: Crane, Dinsmoore-Bombardier-Edgeworth, Mapple, Jorum and the Sjörgen-Inlet Glacier. The remaining glaciers are the Jakobshavn Isbrae Glacier in Greenland and the Columbia Glacier in Alaska. Several images were taken for each glacier, forming a time series. The time series lie in the time span between 1995 an
2 PAPERS • 2 BENCHMARKS
CinemAirSim is an extension of the well-known drone simulator, AirSim, with a cinematic camera as well as extended its API to control all of its parameters in real time, including various filming lenses and common cinematographic properties.
Archive of Global Tropical Cyclone Tracks Tracks from 1980 to May 2019.
A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments.
MRPB 1.0 is a mobile robot local planning benchmark. The benchmark facilitates both motion planning researchers who want to compare the performance of a new local planner relative to many other state-of-the-art approaches as well as end users in the mobile robotics industry who want to select a local planner that performs best on some problems of interest.
MSC is a dataset for Macro-Management in StarCraft 2 based on the platfrom SC2LE. It consists of well-designed feature vectors, pre-defined high-level actions and final result of each match. It contains 36,619 high quality replays, which are unbroken and played by relatively professional players.
Multirotor gym environment for learning control policies for various unmanned aerial vehicles.
POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines. The environments are all Partially Observable Markov Decision Process (POMDP) environments following the Openai Gym interface. Our environments follow a few basic tenets:
2 PAPERS • 1 BENCHMARK
RL Unplugged is suite of benchmarks for offline reinforcement learning. The RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.
An environment for RNA design given structure constraints with structures from different datasets to choose from.
Situated Dialogue Navigation (SDN) is a navigation benchmark of 183 trials with a total of 8415 utterances, around 18.7 hours of control streams, and 2.9 hours of trimmed audio. SDN is developed to evaluate the agent's ability to predict dialogue moves from humans as well as generate its own dialogue moves and physical navigation actions.
Large-scale shadows from buildings in a city play an important role in determining the environmental quality of public spaces. They can be both beneficial, such as for pedestrians during summer, and detrimental, by impacting vegetation and by blocking direct sunlight. Determining the effects of shadows requires the accumulation of shadows over time across different periods in a year. In our paper Shadow Accrual Maps: Efficient Accumulation of City-Scale Shadows over Time, we present a simple yet efficient class of approach that uses the properties of sun movement to track the changing position of shadows within a fixed time interval. This repository presents the computed shadow information for New York City, Chicago, Los Angeles, Boston and Washington DC.
SuperCaustics is a simulation tool made in Unreal Engine for generating massive computer vision datasets that include transparent objects.
Thumb Index 1000 (TI1K) is a dataset of 1000 hand images with the hand bounding box, and thumb and index fingertip positions. The dataset includes the natural movement of the thumb and index fingers making it suitable for mixed reality (MR) applications.
A procedurally generated jump'n'run game with control over level similarity.
TeachMyAgent (TA) is a benchmark for Automatic Curriculum Learning (ACL) algorithms leveraging procedural task generation. It includes 1) challenge-specific unit-tests using variants of a procedural Box2D bipedal walker environment, and 2) a new procedural Parkour environment combining most ACL challenges, making it ideal for global performance assessment.
The 2048 game task involves training an agent to achieve high scores in the game 2048 (Wikipedia)
Wastewater catchment area data are essential for wastewater treatment capacity planning and have recently become critical for operationalising wastewater-based epidemiology (WBE) for COVID-19. Owing to the privatised nature of the water industry in the United Kingdom, the required catchment area datasets are not readily available to researchers. Here, we present a consolidated dataset of 7,537 catchment areas from ten sewerage service providers in the Great Britain, covering more than 96% of the population of England and Wales.
The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.
This package provides utilities for generation, filtering, solving, visualizing, and processing of mazes for training ML systems. Primarily built for the maze-transformer interpretability project. You can find our paper on it here: http://arxiv.org/abs/2309.10498
Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems---known as ``dark vessels''---is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate detection of dark vessels day or night, under all-weather conditions. SAR images, however, require a domain-specific treatment and are not widely accessible to the ML community. Maritime objects (vessels and offshore infrastructure) are relatively small and sparse, challenging traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels and ocean structures in SAR imagery. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each.
TDW is a 3D virtual world simulation platform, utilizing state-of-the-art video game engine technology. A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis and physics simulations; and b) the Controller, an external Python interface to communicate with the build.
1 PAPER • NO BENCHMARKS YET
This dataset is meant to be used to develop models for next-day fire hazard forecasting in Greece. It contains data from 2009 to 2020 at a 1km x 1km x 1 daily grid.
A dataset for flying honeybee detection introduced in "A Method for Detection of Small Moving Objects in UAV Videos".
1 PAPER • 1 BENCHMARK
BrazilDAM is a multi sensor and multitemporal dataset that consists of multispectral images of ore tailings dams throughout Brazil. Landsat 8 and Sentinel 2 satellites that capture multispectral images over the years 2016, 2017, 2018 and 2019 were used. The dataset contains samples collected in different regions, which increases the diversity and representativeness of the characteristics of the dams.