no code implementations • NAACL 2021 • Yury Zemlyanskiy, Joshua Ainslie, Michiel de Jong, Philip Pham, Ilya Eckstein, Fei Sha
Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections.
1 code implementation • 1 Mar 2021 • Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler
In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network (a minimal sketch follows below).
Ranked #1 on Machine Translation on WMT2017 Russian-English
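The omnidirectional attention named above can be sketched in a few lines: rather than attending only within the current layer, a query scores against token representations from every layer of the network. The snippet below is an illustrative NumPy simplification under assumed shapes (queries taken from the top layer only), not the paper's implementation, which relies on efficient attention variants to keep this tractable.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def omnidirectional_attention(layer_states, d_k=None):
    """Toy omnidirectional attention: queries attend over token
    representations from *all* layers, not just the current one.

    layer_states: list of arrays, each of shape (seq_len, d_model),
                  one per Transformer layer (assumed to already exist).
    """
    q = layer_states[-1]                       # queries: top-layer states (seq_len, d_model)
    kv = np.concatenate(layer_states, axis=0)  # keys/values: all layers stacked (L*seq_len, d_model)
    d_k = d_k or q.shape[-1]
    scores = q @ kv.T / np.sqrt(d_k)           # (seq_len, L*seq_len): receptive field spans the whole network
    weights = softmax(scores, axis=-1)
    return weights @ kv                        # (seq_len, d_model)

# Tiny usage example with 3 layers of 4 tokens each.
rng = np.random.default_rng(0)
states = [rng.normal(size=(4, 8)) for _ in range(3)]
print(omnidirectional_attention(states).shape)  # (4, 8)
```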
no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.
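The quadratic cost is visible directly in the attention score matrix: each of the n queries scores against each of the n keys, so memory and compute grow as n². A minimal illustrative sketch (NumPy only, shapes assumed):

```python
import numpy as np

def attention_score_matrix(x):
    """Full self-attention scores: every token against every other token.
    For a sequence of length n this materializes an (n, n) matrix,
    which is the source of the quadratic memory/compute cost."""
    n, d = x.shape
    return x @ x.T / np.sqrt(d)  # (n, n): n^2 entries

for n in (128, 1024, 8192):
    x = np.zeros((n, 16), dtype=np.float32)
    s = attention_score_matrix(x)
    print(n, s.shape, f"{s.nbytes / 1e6:.1f} MB")
# Doubling n quadruples the score matrix; at 8192 tokens a single head's
# float32 scores already take ~268 MB.
```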
5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming model quality superior or comparable to vanilla Transformer models.
Ranked #18 on Long-range modeling on LRA (Pathfinder metric)
11 code implementations • NeurIPS 2020 • Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
To remedy this, we propose BigBird, a sparse attention mechanism that reduces the quadratic dependency of full attention on sequence length to linear (a sketch of the sparse pattern follows below).
Ranked #1 on Text Classification on Arxiv HEP-TH citation graph
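The linear cost comes from restricting each query to a fixed-size set of keys: a sliding local window, a few global tokens, and a few random connections. The sketch below builds such a mask densely for illustration; the window, global, and random sizes are arbitrary, and the actual BigBird implementation uses block-sparse computation rather than a dense boolean mask.

```python
import numpy as np

def bigbird_style_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Boolean attention mask combining the three BigBird-style components:
    a sliding window, global tokens, and random connections. Each query row
    keeps O(window + num_global + num_random) keys, so total attention work
    grows linearly in seq_len rather than quadratically."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Sliding window: each token attends to its neighbours.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2) Global tokens: the first few tokens attend everywhere
    #    and are attended to by every token.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 3) Random keys: each token also attends to a few random positions.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True
    return mask

m = bigbird_style_mask(16)
print(m.sum(axis=1))  # per-row key counts stay small and roughly constant as seq_len grows
```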
no code implementations • NeurIPS 2020 • Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang
As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair.
2 code implementations • EMNLP 2020 • Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks.
Ranked #3 on Question Answering on ConditionalQA