no code implementations • 25 Feb 2024 • Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
This paper explores potential communication patterns within dog vocalizations, moving beyond traditional linguistic analyses, which rely heavily on human a priori knowledge and limited datasets to find sound units in dog vocalization.
1 code implementation • 16 Feb 2024 • Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long
Leveraging users' long engagement histories is essential for personalized content recommendations.
2 code implementations • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
1 code implementation • 30 Aug 2023 • Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang
As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts such as encoding scientific articles, code repositories, or long dialogues.
no code implementations • 22 May 2023 • Kuan-Hao Huang, Liang Tan, Rui Hou, Sinong Wang, Amjad Almahairi, Ruty Rinott
Fine-tuning a large pre-trained language model for each downstream task creates a computational burden at inference time, because multiple forward passes are required.
1 code implementation • CVPR 2023 • Ajinkya Tejankar, Maziar Sanjabi, Qifan Wang, Sinong Wang, Hamed Firooz, Hamed Pirsiavash, Liang Tan
It was shown that an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit.
1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens.
no code implementations • 4 Nov 2022 • Yifang Chen, Karthik Sankararaman, Alessandro Lazaric, Matteo Pirotta, Dmytro Karamshuk, Qifan Wang, Karishma Mandyam, Sinong Wang, Han Fang
We design a novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to robustly leverage the lower quality weak labelers to reduce the query complexity while retaining the desired level of accuracy.
no code implementations • 2 Jun 2022 • Karthik Abinav Sankararaman, Sinong Wang, Han Fang
The Transformer has become ubiquitous due to its dominant performance in various NLP and image processing tasks.
no code implementations • Findings (ACL) 2022 • Khalil Mrini, Shaoliang Nie, Jiatao Gu, Sinong Wang, Maziar Sanjabi, Hamed Firooz
Without the use of a knowledge base or candidate sets, our model sets a new state of the art in two benchmark datasets of entity linking: COMETA in the biomedical domain, and AIDA-CoNLL in the news domain.
no code implementations • NAACL 2022 • Zhuofeng Wu, Sinong Wang, Jiatao Gu, Rui Hou, Yuxiao Dong, V. G. Vinod Vydiswaran, Hao Ma
Prompt tuning is a new, efficient NLP transfer learning paradigm that adds a task-specific prompt to each input instance during the model training stage.
no code implementations • 7 Dec 2021 • Darsh J Shah, Sinong Wang, Han Fang, Hao Ma, Luke Zettlemoyer
The ubiquity of offensive and hateful content on online fora necessitates automatic solutions that detect such content competently across target groups.
1 code implementation • NAACL 2022 • Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time.
2 code implementations • NeurIPS 2021 • Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer
Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length.
3 code implementations • 29 Apr 2021 • Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
Ranked #1 on Topic Classification on OS
no code implementations • EMNLP 2021 • Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
Current NLP models are predominantly trained through a two-stage "pre-train then fine-tune" pipeline.
1 code implementation • NAACL 2021 • Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa
In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup.
no code implementations • 31 Dec 2020 • Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
Thus, our policy packs task-relevant knowledge into the parameters of a language model.
no code implementations • 31 Dec 2020 • Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, Hao Ma
Pre-trained language models have proven their unique powers in capturing implicit language features.
Ranked #5 on Question Answering on Quora Question Pairs
no code implementations • ACL 2020 • Sinong Wang, Madian Khabsa, Hao Ma
Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.
no code implementations • 15 Jun 2020 • Sinong Wang, Madian Khabsa, Hao Ma
Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.
15 code implementations • 8 Jun 2020 • Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.
no code implementations • WS 2020 • Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa
Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang
We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies.
no code implementations • 16 Apr 2018 • Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff
We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round as well as a regret guarantee that is $1/e$-close to that of the kl-UCB algorithm.
no code implementations • NeurIPS 2017 • Sinong Wang, Ness Shroff
It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers (ADMM) converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$.