no code implementations • 24 Apr 2024 • Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu
In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3.
1 code implementation • 15 Nov 2023 • Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, PengFei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan
Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) no LLM-based evaluation method achieves a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation.
no code implementations • 15 Nov 2023 • Prafulla Kumar Choubey, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote.
1 code implementation • 17 Sep 2023 • Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu
In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
1 code implementation • 28 May 2023 • Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad
Similarly, on 1k samples from CNN/DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points.
1 code implementation • 23 May 2023 • Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu
To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits.
1 code implementation • 23 May 2023 • Yixin Liu, Kejian Shi, Katherine S He, Longtian Ye, Alexander R. Fabbri, PengFei Liu, Dragomir Radev, Arman Cohan
Meanwhile, we perform a meta-analysis on this new learning setting that reveals a discrepancy between human and LLM-based evaluation, highlighting the benefits and risks of this LLM-as-reference setting we investigated.
1 code implementation • 7 Mar 2023 • Yixin Liu, Alexander R. Fabbri, Yilun Zhao, PengFei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics.
1 code implementation • 20 Dec 2022 • Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryściński, Chien-Sheng Wu
In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries.
2 code implementations • 15 Dec 2022 • Yixin Liu, Alexander R. Fabbri, PengFei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev
Human evaluation is the foundation upon which the evaluation of both summarization systems and automatic metrics rests.
1 code implementation • 29 Nov 2022 • Adithya Bhaskar, Alexander R. Fabbri, Greg Durrett
Large language models have shown impressive performance across a wide variety of tasks, including text summarization.
1 code implementation • 11 Nov 2022 • Alexander R. Fabbri, Prafulla Kumar Choubey, Jesse Vig, Chien-Sheng Wu, Caiming Xiong
We propose to use sentence-compression data to train the post-editing model to take a summary with extrinsic entity errors marked with special tokens and output a compressed, well-formed summary with those errors removed.
no code implementations • COLING (CreativeSumm) 2022 • Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman
We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions.
1 code implementation • 2 Sep 2022 • Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, Wojciech Kryscinski, Semih Yavuz, Ye Liu, Xi Victoria Lin, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Rex Ying, Arman Cohan, Dragomir Radev
We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations.
1 code implementation • 25 May 2022 • Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
1 code implementation • NAACL 2022 • Alexander R. Fabbri, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong
Factual consistency is an essential quality of text summarization models in practical settings.
1 code implementation • Findings (NAACL) 2022 • Jesse Vig, Alexander R. Fabbri, Wojciech Kryściński, Chien-Sheng Wu, Wenhao Liu
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization.
2 code implementations • NAACL 2022 • Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
We therefore propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously tracks progress in language generation models and metrics for their evaluation.
1 code implementation • NAACL 2022 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
no code implementations • 14 Oct 2021 • Prafulla Kumar Choubey, Alexander R. Fabbri, Jesse Vig, Chien-Sheng Wu, Wenhao Liu, Nazneen Fatema Rajani
Then, we fine-tune a base summarization model, which is trained on all training samples, on the clean (noisy) subset to obtain an expert (anti-expert) model.
1 code implementation • ACL 2021 • Alexander R. Fabbri, Faiaz Rahman, Imad Rizvi, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev
While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles.
no code implementations • 17 Apr 2021 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab
A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries.
no code implementations • NAACL 2021 • Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
5 code implementations • 24 Jul 2020 • Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev
The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress.
1 code implementation • ACL 2020 • Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, Bing Xiang
Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14%, and 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.
1 code implementation • 4 Sep 2019 • Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impact on the research community.
Ranked #1 on Scientific Document Summarization on CL-SciSumm
2 code implementations • 26 Jun 2019 • Youngnam Lee, Youngduck Choi, Junghyun Cho, Alexander R. Fabbri, HyunBin Loh, Chanyou Hwang, Yongku Lee, Sang-Wook Kim, Dragomir Radev
Our model outperforms existing approaches over several metrics in predicting user response correctness, notably outperforming other methods on new users without large question-response histories.
1 code implementation • ACL 2019 • Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, Dragomir R. Radev
Automatic generation of summaries from multiple news articles is a valuable tool as the number of online publications grows rapidly.
Ranked #5 on Multi-Document Summarization on Multi-News
no code implementations • 26 Nov 2018 • Irene Li, Alexander R. Fabbri, Robert R. Tung, Dragomir R. Radev
The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation.
no code implementations • CL 2018 • Debanjan Ghosh, Alexander R. Fabbri, Smaranda Muresan
To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn.
no code implementations • ACL 2018 • Alexander R. Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Wei Tai Ting, Robert Tung, Caitlin Westerfield, Dragomir R. Radev
The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources.