
All Times EDT

Abstract Details

Activity Number: 393 - NLP and Text Analysis
Type: Contributed
Date/Time: Wednesday, August 10, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323428
Title: Model Editing in Language Models Using Influence Functions
Author(s): Jillian Fisher* and Liwei Jiang and Krishna Pillutla and Swabha Swayamdipta and Yejin Choi and Zaid Harchaoui
Companies: University of Washington and University of Washington, Allen Institute for Artificial Intelligence and University of Washington and Allen Institute for Artificial Intelligence and University of Washington, Allen Institute for Artificial Intelligence and University of Washington
Keywords: Influence Functions; Language Models; Model Editing; Machine Learning
Abstract:

Despite the successes of large pretrained language models, they depend on large training corpora that contain social biases and toxicity, which adversely affect model behavior. We propose to use influence functions, a classical concept from robust statistics, to design a cost-effective post-hoc model editing technique that removes unwanted behaviors from trained language models. Influence functions quantify how much an estimator depends on any one data point in a given sample. We use influence functions to approximate the parameters of transformer language models fit to a subset of the training data without re-training the model, using only the gradient and Hessian-vector product oracles of the model. To implement this method, we use an efficient numerical technique based on matrix sketching to calculate the influence of a data point. We demonstrate the technique on a language modeling task. Our preliminary results show a promising increase in forgetting of toxic or unwanted behavior while retaining wanted behaviors.
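To illustrate the core idea behind the abstract (not the authors' implementation), the following minimal NumPy sketch applies the classical influence-function approximation to ridge regression: the parameters after removing a training point are approximated from the fitted parameters using only the gradient at that point and the Hessian of the objective, with no refitting. The model, data, and regularization strength here are illustrative assumptions; the abstract applies the same first-order idea to transformer language models with Hessian-vector products and matrix sketching in place of the explicit Hessian solve.

```python
import numpy as np

# Illustrative sketch of influence-function model editing on ridge
# regression. Objective: J(theta) = (1/n)||X theta - y||^2 + lam ||theta||^2.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 1e-2

# Hessian of the objective and the fitted parameters theta_hat.
H = 2.0 * (X.T @ X) / n + 2.0 * lam * np.eye(d)
theta_hat = np.linalg.solve(H, 2.0 * X.T @ y / n)

# Influence-function estimate of the parameters with point i removed:
#   theta_{-i} ~= theta_hat + (1/n) H^{-1} grad_i,
# where grad_i is the gradient of the per-example loss at theta_hat.
# In large models, H^{-1} grad_i is computed with Hessian-vector
# product oracles rather than an explicit solve.
i = 0
grad_i = 2.0 * X[i] * (X[i] @ theta_hat - y[i])
theta_approx = theta_hat + np.linalg.solve(H, grad_i) / n

# Exact leave-one-out refit, for comparison with the approximation.
Xm, ym = np.delete(X, i, axis=0), np.delete(y, i)
Hm = 2.0 * (Xm.T @ Xm) / (n - 1) + 2.0 * lam * np.eye(d)
theta_exact = np.linalg.solve(Hm, 2.0 * Xm.T @ ym / (n - 1))

print(np.linalg.norm(theta_approx - theta_exact))
```

The point of the sketch is that "editing out" a data point costs one gradient evaluation and one linear solve against the Hessian, rather than a full retrain; the approximation error is second order in the weight of the removed point.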


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program