Challenges in Augmenting Large Language Models with Private Data
Ashwinee Panda
Time:
10.09.2023, 14:30 to 15:30
Location:
IRB 4105
We consider the emerging problem of preventing LLMs from exhibiting "misaligned" behavior through the lens of robustness, by way of algorithmic stability. We present a set of challenging problems that arise from the fundamentally opaque nature of training datasets (e.g., because they are proprietary or private) and are amplified by autoregressive generation and massive model capacity. We then consider inference-time mitigation strategies that can provide provable guarantees in production systems. While these strategies apply to a wide range of problems in trustworthiness, we focus on concrete concerns such as private data leakage, jailbreaks, prompt leaking, and the generation of offensive text.