Search Results for author: Niloofar Mireshghallah

Found 7 papers, 4 papers with code

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

1 code implementation • 5 Mar 2024 • Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana

In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent than are revealed by prompting the target model with the training data directly, which is the dominant approach to quantifying memorization in LLMs.

Memorization

Do Membership Inference Attacks Work on Large Language Models?

1 code implementation • 12 Feb 2024 • Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi

Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.

Membership Inference Attack

A Roadmap to Pluralistic Alignment

1 code implementation • 7 Feb 2024 • Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi

We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution.

A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation

no code implementations • 7 Dec 2023 • Jarad Forristal, Niloofar Mireshghallah, Greg Durrett, Taylor Berg-Kirkpatrick

Recent work has shown that energy-based language modeling is an effective framework for controllable text generation because it enables flexible integration of arbitrary discriminators.

Language Modelling • Large Language Model

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

no code implementations • 27 Oct 2023 • Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, Yejin Choi

The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.)

Privacy Preserving

Misusing Tools in Large Language Models With Visual Adversarial Examples

1 code implementation • 4 Oct 2023 • Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes

Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities.

SSIM

Smaller Language Models are Better Black-box Machine-Generated Text Detectors

no code implementations • 17 May 2023 • Niloofar Mireshghallah, Justus Mattern, Sicun Gao, Reza Shokri, Taylor Berg-Kirkpatrick

With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written becomes more challenging and more important, as such models could be used to spread misinformation, fake news, fake reviews and to mimic certain authors and figures.

Misinformation
