no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.
1 code implementation • 19 Feb 2024 • Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster
In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation.
no code implementations • 24 Jan 2024 • Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel
In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial.
1 code implementation • 17 Dec 2023 • Dominik Schmidt, Minqi Jiang
LAPO takes a first step towards pre-training powerful, generalist policies and world models on the vast amounts of videos readily available on the web.
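The mechanism can be sketched as follows: an inverse dynamics model infers a latent action linking consecutive frames, and a forward dynamics model must reconstruct the next frame from the current frame plus that latent action, so the latent comes to encode action-like information without any action labels. A minimal PyTorch sketch of this idea (module shapes and names are illustrative, not the paper's implementation, which differs in important ways such as the form of the latent bottleneck):

```python
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    """Toy latent-action model: an inverse dynamics model (IDM) infers a
    latent action from (o_t, o_t1), and a forward dynamics model (FDM)
    must reconstruct o_t1 from (o_t, latent action)."""

    def __init__(self, obs_dim: int = 64, latent_dim: int = 8):
        super().__init__()
        self.idm = nn.Sequential(  # (o_t, o_t1) -> latent action z
            nn.Linear(2 * obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        self.fdm = nn.Sequential(  # (o_t, z) -> predicted o_t1
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim)
        )

    def loss(self, o_t: torch.Tensor, o_t1: torch.Tensor) -> torch.Tensor:
        z = self.idm(torch.cat([o_t, o_t1], dim=-1))
        o_t1_pred = self.fdm(torch.cat([o_t, z], dim=-1))
        # The narrow bottleneck on z forces it to encode only the
        # action-like information that distinguishes o_t1 given o_t.
        return ((o_t1_pred - o_t1) ** 2).mean()

model = LatentActionModel()
o_t, o_t1 = torch.randn(32, 64), torch.randn(32, 64)
print(model.loss(o_t, o_t1))
```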
1 code implementation • 10 Dec 2023 • Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu
Our experiments reveal that existing offline learning algorithms struggle to match the performance of online RL on both train and test environments.
1 code implementation • 3 Dec 2023 • Minqi Jiang
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents.
1 code implementation • 21 Nov 2023 • Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel
This compute requirement is a major obstacle to rapid innovation for the field.
2 code implementations • 16 Nov 2023 • Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktäschel, Chris Lu, Jakob Nicolaus Foerster
This not only enables GPU acceleration, but also provides more flexible MARL environments, unlocking the potential for self-play, meta-learning, and other future applications in MARL.
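The practical payoff is that entire rollouts become pure JAX functions, so thousands of environments can be batched with vmap and jit on a single accelerator. A self-contained sketch of that pattern, with a toy environment standing in for a JAX-native MARL env (all names here are illustrative, not the library's actual API):

```python
import jax
import jax.numpy as jnp

# Toy stateless environment step (an illustrative stand-in for a
# JAX-native MARL env): state is a vector, two agents contribute actions.
def env_step(state, actions):
    next_state = state + actions.sum() * 0.1
    reward = -jnp.abs(next_state).sum()
    return next_state, reward

def rollout(key, steps=8):
    state = jnp.zeros(4)
    def body(state, step_key):
        actions = jax.random.normal(step_key, (2,))  # two agents' actions
        state, reward = env_step(state, actions)
        return state, reward
    keys = jax.random.split(key, steps)
    _, rewards = jax.lax.scan(body, state, keys)
    return rewards.sum()

# Because everything is pure JAX, thousands of environments run in
# parallel on one GPU with a single vmap + jit.
batched = jax.jit(jax.vmap(rollout))
returns = batched(jax.random.split(jax.random.PRNGKey(0), 1024))
print(returns.shape)  # (1024,)
```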
1 code implementation • NeurIPS 2023 • Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
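A deliberately tiny illustration of the outer/inner structure involved (my simplification, not the paper's method): here the "update rule" is reduced to a single learned step size, and the outer loop is plain search across a distribution of tasks; real methods learn far richer parameterizations.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_train(meta_lr, task_target, steps=20):
    """Inner loop: train a scalar parameter on one task using the
    meta-learned update rule (here, just a learned step size)."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - task_target)   # gradient of (w - target)^2
        w -= meta_lr * grad
    return (w - task_target) ** 2      # final task loss

def meta_objective(meta_lr, n_tasks=32):
    targets = rng.normal(size=n_tasks)  # a distribution of tasks
    return np.mean([inner_train(meta_lr, t) for t in targets])

# Outer loop: search over the update-rule parameter across many tasks,
# selecting the rule that performs well on the whole distribution.
candidates = np.linspace(0.01, 0.6, 30)
best = min(candidates, key=meta_objective)
print(f"meta-learned step size: {best:.3f}")
```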
1 code implementation • NeurIPS 2023 • Minqi Jiang, Chaochuan Hou, Ao Zheng, Songqiao Han, Hailiang Huang, Qingsong Wen, Xiyang Hu, Yue Zhao
Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing.
1 code implementation • 21 Aug 2023 • Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel
As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment.
1 code implementation • 26 Jun 2023 • Minqi Jiang, Songqiao Han, Hailiang Huang
In this paper, we propose to optimize the anomaly scoring function from the perspective of its score distribution, which better preserves the diversity and fine-grained information of the input data, especially when the unlabeled data contains anomalous noise, as in more practical AD scenarios.
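As an illustration of the general idea only (a hedged sketch with invented names, not the paper's actual loss), one can regularise the distribution of anomaly scores on unlabeled data toward a reference distribution while pushing known anomalies into its tail:

```python
import torch

def score_distribution_loss(scores_unlabeled, scores_anomaly, margin=4.0):
    """Illustrative distribution-level objective for anomaly scoring.

    Unlabeled scores are pulled toward a standard normal reference via
    their first two moments, which preserves score diversity instead of
    collapsing all normal samples to a single target value; labeled
    anomalies are pushed beyond a margin in the right tail."""
    mean, std = scores_unlabeled.mean(), scores_unlabeled.std()
    moment_match = mean ** 2 + (std - 1.0) ** 2   # match N(0, 1) moments
    tail_push = torch.relu(margin - scores_anomaly).mean()
    return moment_match + tail_push

scores_u = torch.randn(256)        # scores on unlabeled (mostly normal) data
scores_a = torch.randn(16) + 2.0   # scores on labeled anomalies
print(score_distribution_loss(scores_u, scores_a))
```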
1 code implementation • 15 Jun 2023 • Marc Rigter, Minqi Jiang, Ingmar Posner
We consider robustness in terms of minimax regret over all environment instantiations and show that the minimax regret can be connected to minimising the maximum error in the world model across environment instances.
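For reference, the minimax regret objective over environment instantiations can be written as follows (the standard formulation; notation is mine, not the paper's):

```latex
% Standard minimax regret objective over environment instantiations \theta
% (notation mine): the policy minimises its worst-case suboptimality,
% where V^{\pi}_{\theta} is the value of policy \pi in instantiation \theta.
\pi^{\star} \in \arg\min_{\pi} \; \max_{\theta \in \Theta}
  \Big( \max_{\pi'} V^{\pi'}_{\theta} \; - \; V^{\pi}_{\theta} \Big)
```

The paper's observation, as stated above, is that controlling the world model's worst-case error across instantiations controls this worst-case regret.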
2 code implementations • 5 Jun 2023 • Mikael Henaff, Minqi Jiang, Roberta Raileanu
This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma's Revenge.
no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel
Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.
2 code implementations • 9 Feb 2023 • Minqi Jiang, Chaochuan Hou, Ao Zheng, Xiyang Hu, Songqiao Han, Hailiang Huang, Xiangnan He, Philip S. Yu, Yue Zhao
Anomaly detection (AD) is a crucial task in machine learning with various applications, such as detecting emerging diseases, identifying financial fraud, and spotting fake news.
1 code implementation • 18 Jan 2023 • Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
We call the collected dataset the Human ChatGPT Comparison Corpus (HC3).
no code implementations • 15 Nov 2022 • Minqi Jiang, Tim Rocktäschel, Edward Grefenstette
We are at the cusp of a transition from "learning from data" to "learning what data to learn from" as a central focus of artificial intelligence (AI) research.
2 code implementations • 11 Oct 2022 • Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel
In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes.
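A representative mechanism from this line of work is an elliptical episodic bonus: within each episode, a state's bonus is the Mahalanobis-style novelty of its embedding relative to the states visited so far in that episode. A minimal numpy sketch (the embedding is stubbed with random vectors; treat the hyperparameters as placeholders):

```python
import numpy as np

class EllipticalEpisodicBonus:
    """Sketch of an elliptical episodic exploration bonus: within an
    episode, b(s) = phi(s)^T C^{-1} phi(s), where C accumulates outer
    products of visited state embeddings plus lambda * I. States unlike
    anything seen this episode receive large bonuses."""

    def __init__(self, dim: int, lam: float = 0.1):
        self.dim, self.lam = dim, lam
        self.reset()

    def reset(self):  # call at every episode boundary
        self.cov_inv = np.eye(self.dim) / self.lam

    def bonus(self, phi: np.ndarray) -> float:
        b = float(phi @ self.cov_inv @ phi)
        # Sherman-Morrison rank-1 update of C^{-1} after adding phi phi^T.
        v = self.cov_inv @ phi
        self.cov_inv -= np.outer(v, v) / (1.0 + phi @ v)
        return b

rng = np.random.default_rng(0)
eb = EllipticalEpisodicBonus(dim=16)
for t in range(5):
    print(eb.bonus(rng.normal(size=16)))  # bonuses shrink for similar states
```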
no code implementations • 13 Jul 2022 • Christopher Bamford, Minqi Jiang, Mikayel Samvelyan, Tim Rocktäschel
Progress in reinforcement learning (RL) research is often driven by the design of new, challenging environments -- a costly undertaking requiring skills orthogonal to those of a typical machine learning researcher.
1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.
1 code implementation • 22 Mar 2022 • Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, DaeJin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.
3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
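Schematically, the loop alternates between evaluating fresh random levels, training on high-regret replayed levels, and mutating those levels to seed the next frontier. The sketch below conveys this structure only; the scoring, editing, and agent-update functions are placeholders rather than the paper's implementations:

```python
import random

def accel_loop(agent, level_buffer, make_random_level, edit_level,
               regret_estimate, steps=1000, replay_prob=0.8):
    """Schematic ACCEL-style loop: high-regret levels are replayed for
    training and then mutated, so the curriculum compounds complexity
    starting from simple randomly generated levels. Buffer pruning and
    level evaluation details are omitted."""
    for _ in range(steps):
        if level_buffer and random.random() < replay_prob:
            level = max(level_buffer, key=regret_estimate)
            agent.train_on(level)                 # gradient update on replay
            level_buffer.append(edit_level(level))  # mutation seeds the frontier
        else:
            level_buffer.append(make_random_level())  # evaluated, not trained on

class StubAgent:
    def train_on(self, level): pass

accel_loop(StubAgent(), [], make_random_level=lambda: random.random(),
           edit_level=lambda lvl: lvl + random.uniform(-0.1, 0.1),
           regret_estimate=lambda lvl: lvl, steps=100)
```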
1 code implementation • 17 Feb 2022 • Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse.
4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.
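Concretely, the modification amounts to a stop-gradient on uncurated levels: new levels are rolled out only to estimate their learning potential, and policy updates happen solely on levels replayed from the curated buffer. A minimal sketch (all names illustrative):

```python
import random

def robust_plr_step(agent, buffer, sample_new_level, replay_prob=0.5):
    """One step of a Robust-PLR-style loop (illustrative sketch): new
    levels are only *evaluated* to score their learning potential, never
    trained on; gradient updates happen solely on curated replays."""
    if buffer and random.random() < replay_prob:
        level, _ = max(buffer, key=lambda x: x[1])  # highest-scored level
        agent.update_policy(level)                  # gradient update (replay only)
    else:
        level = sample_new_level()
        score = agent.evaluate_regret(level)        # rollout, no policy update
        buffer.append((level, score))

class StubAgent:
    def update_policy(self, level): pass
    def evaluate_regret(self, level): return random.random()

buf = []
for _ in range(100):
    robust_plr_step(StubAgent(), buf, sample_new_level=lambda: object())
```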
no code implementations • 29 Sep 2021 • Jack Parker-Holder, Minqi Jiang, Michael D Dennis, Mikayel Samvelyan, Jakob Nicolaus Foerster, Edward Grefenstette, Tim Rocktäschel
Deep Reinforcement Learning (RL) has recently produced impressive results in a series of settings such as games and robotics.
1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel
By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.
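Getting started follows the standard Gym pattern, along the lines of the project's README (this sketch assumes the pre-0.26 Gym step API that MiniHack originally targeted):

```python
import gym
import minihack  # noqa: F401  (importing registers the MiniHack-* envs)

# A registered MiniHack task; random-action rollout as a smoke test.
env = gym.make("MiniHack-River-v0")
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```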
no code implementations • NeurIPS Workshop ICBINB 2021 • Iryna Korshunova, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel, Edward Grefenstette
Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels.
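The scoring rule behind PLR's curricula is simple: a level's replay priority is an estimate of its learning potential, commonly the mean magnitude of the generalized advantage estimate from the most recent rollout on that level. A minimal numpy sketch of that estimator (one common instantiation, simplified here):

```python
import numpy as np

def gae_learning_potential(rewards, values, gamma=0.99, lam=0.95):
    """Score a level by mean |GAE| over the last episode on it: large
    advantage magnitudes indicate value-prediction errors, i.e. the level
    still has learning potential and should be replayed sooner."""
    T = len(rewards)
    values = np.append(values, 0.0)          # bootstrap terminal value = 0
    gae, advantages = 0.0, np.zeros(T)
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return np.abs(advantages).mean()

print(gae_learning_potential(np.random.rand(64), np.random.rand(64)))
```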
no code implementations • ICLR Workshop SSL-RL 2021 • Clare Lyle, Amy Zhang, Minqi Jiang, Joelle Pineau, Yarin Gal
To address this, we present a robust exploration strategy which enables causal hypothesis-testing by interaction with the environment.
1 code implementation • 8 Feb 2021 • Zhengyao Jiang, Pasquale Minervini, Minqi Jiang, Tim Rocktäschel
In this work, we show that we can incorporate relational inductive biases, encoded in the form of relational graphs, into agents.
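A relational inductive bias of this kind is typically realised as message passing over a graph whose edges carry relation types. A minimal sketch of one relational message-passing layer (an R-GCN-style update, illustrative rather than the paper's architecture):

```python
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    """One R-GCN-style message-passing layer: each relation type r gets
    its own weight matrix W_r, and a node aggregates messages from its
    neighbours per relation before a shared nonlinearity."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.1)
        self.self_weight = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) node features; adj: (R, N, N), one adjacency per relation.
        messages = torch.einsum("rij,jd,rde->ie", adj, h, self.rel_weights)
        return torch.relu(self.self_weight(h) + messages)

h = torch.randn(5, 16)                        # 5 entities
adj = torch.randint(0, 2, (3, 5, 5)).float()  # 3 relation types
print(RelationalLayer(16, 3)(h, adj).shape)   # torch.Size([5, 16])
```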
4 code implementations • 8 Oct 2020 • Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning.
1 code implementation • ICML Workshop LaReL 2020 • Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel
This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.