Building More Reliable and Scalable AI Systems with Language Model Programming
It is now easy to build impressive demos with language models (LMs), but turning these into reliable systems currently requires brittle combinations of prompting, chaining, and finetuning LMs. In this talk, I present LM programming, a systematic way to address this by defining and improving four layers of the LM stack. I start with how to adapt LMs to search for information most effectively (ColBERT, ColBERTv2, UDAPDR) and how to scale that to billions of tokens (PLAID). I then discuss the right architectures and supervision strategies (ColBERT-QA, Baleen, Hindsight) for allowing LMs to search for and cite verifiable sources in their responses. This leads to DSPy, a programming model that replaces ad-hoc LM prompting techniques with composable modules and with optimizers that can supervise complex LM programs. Even simple AI systems expressed in DSPy routinely outperform standard hand-crafted prompt pipelines, in some cases while using small LMs. I highlight how ColBERT and DSPy have sparked applications at dozens of leading tech companies, open-source communities, and research labs. I conclude by discussing how DSPy enables a new degree of research modularity, one that stands to allow open research to again lead the development of AI systems.
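To make the "composable modules and optimizers" idea concrete, here is a minimal sketch in the style of the DSPy library. It is illustrative only: the model name is a placeholder, and the exact API (e.g., dspy.OpenAI, dspy.settings.configure, BootstrapFewShot) follows one published version of DSPy and may differ in other releases.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure an underlying LM (model name is a placeholder assumption).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

# A composable module: declare *what* the step should do via a signature
# ("question -> answer") rather than a hand-crafted prompt string.
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate_answer(question=question)

# A tiny training set and metric; an optimizer ("teleprompter") compiles
# the program, bootstrapping demonstrations instead of manual prompting.
trainset = [dspy.Example(question="What is 2 + 2?",
                         answer="4").with_inputs("question")]
metric = lambda ex, pred, trace=None: ex.answer.lower() in pred.answer.lower()

compiled_qa = BootstrapFewShot(metric=metric).compile(SimpleQA(),
                                                      trainset=trainset)
print(compiled_qa(question="What is the capital of France?").answer)
```

The point of the sketch is the separation of concerns the abstract describes: the module declares the task, and the optimizer supervises and tunes the resulting LM program automatically.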