Powerful Open-Weight AI Models: Wonderful, Terrible, and Inevitable — How Can We Make Them Safer?
Frontier AI models with openly available weights are steadily becoming more powerful and more widely adopted. Compared with proprietary models, however, open-weight models pose distinct challenges for effective risk management, and there is relatively little research on safety tooling designed specifically for them. Closing these gaps will be key to realizing the benefits of open-weight models while mitigating their harms. This talk will focus on 16 open technical challenges for open-weight model safety, spanning pretraining, fine-tuning, evaluations, deployment, and ecosystem monitoring. It will also discuss the nascent state of the field, emphasizing that openness about research, methods, and evaluations (not just weights) will be key to building a rigorous science of open-weight model risk management.