DSPy is the framework for programming—rather than prompting—language models. It allows you to iterate fast on building modular AI systems and offers algorithms for optimizing their prompts and weights, whether you're building simple classifiers, sophisticated RAG pipelines, or Agent loops.
DSPy stands for Declarative Self-improving Python. Instead of brittle prompts, you write compositional Python code and use DSPy to teach your LM to deliver high-quality outputs.
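As a concrete starting point, here is a minimal sketch of what a DSPy program looks like. The model name and the `sentence -> sentiment` signature are illustrative choices, not something prescribed by the excerpt below.

```python
# Minimal DSPy sketch: declare behavior with a signature, let DSPy handle the prompting.
import dspy

# Configure the language model that DSPy modules will call under the hood.
# The model name here is an assumption; use whatever provider/model you have access to.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature declares a module's inputs and outputs in natural language.
classify = dspy.Predict("sentence -> sentiment")

result = classify(sentence="The screen is gorgeous but the battery dies by noon.")
print(result.sentiment)
```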
This lecture excerpt by Omar Khattab introduces Compound AI Systems, a modular approach to building AI systems using Large Language Models (LLMs) as components. The presentation highlights the limitations of monolithic LLMs, emphasizing the advantages of Compound AI Systems in terms of improved reliability, controllability, transparency, and efficiency. Khattab then details DSPy, a framework for creating these modular systems by expressing them as programs with natural-language-typed modules, and discusses methods for optimizing these programs, including instruction and demonstration optimization techniques. Finally, the presentation showcases experimental results and concludes that modularity, rather than simply larger LLMs, is key to advancing AI.
DSPy is superior to traditional prompt engineering for several reasons. Traditional prompt engineering treats the language model as a monolithic black box: the user inputs a prompt and hopes the model produces the desired response. DSPy, on the other hand, uses a modular approach in which language models act as components that specialize in particular roles within a larger program. This modular approach has several advantages over traditional prompting:
- Quality: DSPy offers a more reliable composition of better-scoped language model capabilities. By breaking down a task into smaller modules, language models can focus on specific aspects of the task, leading to improved overall performance. For example, in a multi-hop retrieval-augmented generation task, one module could focus on generating search queries, while another focuses on synthesizing information from retrieved contexts.
- Control: DSPy allows for iterative improvements and grounding through external tools. Developers have greater control over the system's architecture, enabling faster iteration and the ability to build systems that individual language models cannot achieve. For instance, a developer could add a fact-checking module to a pipeline, enhancing the system's reliability without needing to retrain the entire model.
- Transparency: DSPy provides debugging capabilities and user-facing attribution. By inspecting the trace of system behavior, developers can understand why a system generated specific information, facilitating debugging and improvement (see the sketch after this list). If a system makes a mistake, it is easier to pinpoint the source of the error and rectify it.
- Efficiency: DSPy enables the use of smaller language models by offloading knowledge and control flow. Instead of requiring a monolithic language model to possess knowledge on all topics, DSPy distributes the workload across specialized modules. This leads to more efficient systems, as smaller, more focused language models can be used effectively.
- Inference-Time Scaling: DSPy enables systematic searches for better outputs at inference time. By using strategies like reflection and solution ranking, DSPy can spend more compute at test time to explore various solution paths and improve performance. This is exemplified by systems like AlphaCodium, which iteratively generates, ranks, and refines code solutions.
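To make the transparency point above concrete, here is a small sketch of inspecting the trace behind a DSPy program. The two-module pipeline and its field names are illustrative, and it assumes an LM has been configured as in the earlier snippet.

```python
# Trace inspection sketch: run a small pipeline, then look at the actual LM calls.
import dspy

# A toy two-step pipeline: draft an answer, then check it against the question.
draft = dspy.ChainOfThought("question -> answer")
critique = dspy.Predict("question, answer -> is_supported")

question = "Which year did the first Moon landing happen?"
answer = draft(question=question).answer
verdict = critique(question=question, answer=answer)

# Print the most recent LM calls: the exact prompts and completions behind each module.
# This is what makes attribution and debugging concrete rather than guesswork.
dspy.inspect_history(n=2)
```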
In contrast, traditional prompting methods suffer from several limitations.
- First, they lack modularity, making it difficult to control, debug, and improve the resulting systems.
- Second, they heavily rely on trial-and-error to coax language models into desired behaviors, often leading to lengthy and non-generalizable prompts.
- Third, traditional prompts implicitly couple several distinct roles in a single string, such as describing the task, formatting inputs and outputs, and supplying examples, which hinders portability across different tasks, language models, and objectives.
DSPy addresses these limitations by offering a more structured, modular, and transparent approach to building compound AI systems. By abstracting away prompting details and enabling optimization through programming paradigms, DSPy empowers developers to create more robust, efficient, and adaptable AI systems.
A canonical example that demonstrates how DSPy works is multi-hop retrieval-augmented generation (RAG). In this example, the goal is to answer a complex question that requires retrieving information from multiple sources.
Traditional approaches would input the entire question into a monolithic language model, hoping it could synthesize an answer from its internal knowledge. This method often fails because it requires the model to retain vast amounts of knowledge and understand complex reasoning patterns.
DSPy provides a more structured solution. Here's how a multi-hop RAG system is built using DSPy:
1. Define Modules: The system is broken down into smaller, manageable modules. For multi-hop RAG, two key modules are:
   - `generate_query`: takes the question and any existing context as input and outputs a search query.
   - `generate_answer`: takes the retrieved contexts and the original question as input and produces a final answer.
2. Specify Program Logic: The `forward` method of a DSPy module defines the program's execution flow. In this case, for each hop the forward method calls `generate_query` to get a search query and adds the retrieved passages to the accumulated context; it then calls `generate_answer` with the accumulated context and the original question to produce the final answer (sketched in code after this list).
3. Optimize the System: The initial modules, defined using natural-language signatures, need to be translated into effective prompts for the underlying language model. DSPy offers various optimization strategies, such as instruction optimization and demonstration (few-shot example) optimization.
4. Evaluate and Iterate: The optimized DSPy program is then evaluated on a test set. The modular nature of DSPy allows developers to easily modify or add modules to improve the system's performance. For example, a fact-checking module could be incorporated to verify the generated answer against retrieved sources, further enhancing the system's reliability.
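Steps 1 and 2 can be sketched directly in DSPy code. The hop count, the field names in the signatures, and the `search` helper below are assumptions for illustration, not the exact program from the lecture.

```python
# A minimal multi-hop RAG sketch in DSPy.
import dspy

def search(query: str, k: int = 3) -> list[str]:
    """Hypothetical retrieval helper; swap in a real search index or a DSPy retriever."""
    return [f"(passage retrieved for: {query})"] * k

class MultiHopRAG(dspy.Module):
    def __init__(self, num_hops: int = 2):
        super().__init__()
        self.num_hops = num_hops
        # Natural-language-typed modules: each signature names inputs -> outputs.
        self.generate_query = dspy.ChainOfThought("context, question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question: str):
        context: list[str] = []
        for _ in range(self.num_hops):
            # Ask the LM for the next search query, conditioned on what we have so far.
            query = self.generate_query(context=context, question=question).search_query
            context.extend(search(query))
        # Synthesize a final answer from the accumulated context.
        return self.generate_answer(context=context, question=question)

rag = MultiHopRAG()
print(rag(question="Which award did the author of 'Ubik' win in 1963?").answer)
```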
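Steps 3 and 4 can likewise be sketched. The toy data, the exact-match metric, and the choice of `BootstrapFewShot` are illustrative assumptions; instruction-optimizing teleprompters such as MIPROv2 follow the same `compile()` pattern.

```python
# Optimization and evaluation sketch for the `rag` program defined above.
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot

# Tiny placeholder datasets; real use would load many labeled question/answer pairs.
trainset = [
    dspy.Example(question="Who wrote the novel that inspired Blade Runner?",
                 answer="Philip K. Dick").with_inputs("question"),
]
devset = list(trainset)  # placeholder; use a held-out split in practice

def exact_match(example, prediction, trace=None):
    # Simple answer-level metric; multi-hop benchmarks often use EM/F1 instead.
    return example.answer.lower() in prediction.answer.lower()

# Bootstrap few-shot demonstrations for each module by running the program on trainset.
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
optimized_rag = optimizer.compile(rag, trainset=trainset)

# Score the optimized program, then iterate: add or adjust modules (e.g., a fact checker)
# and re-compile as needed.
evaluate = Evaluate(devset=devset, metric=exact_match, num_threads=4, display_progress=True)
evaluate(optimized_rag)
```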
In essence, DSPy transforms the problem of prompt engineering into a programming task. Developers can define a modular system, specify its behavior using natural language, and rely on optimization techniques to generate effective prompts for the language models. This approach offers greater control, transparency, and efficiency compared to traditional monolithic prompting methods.
The multi-hop RAG example highlights the benefits of DSPy: it demonstrates the decomposition of a complex task into simpler modules, the use of natural language signatures to define module behavior, and the application of optimization techniques to automatically improve prompt quality.