Prompting with DSPy: A New Approach

In this article, we will explore DSPy, a framework created by the Stanford NLP group for algorithmically optimizing LM prompts and weights, which reduces manual prompting and leads to higher overall scores.

10 days ago   •   15 min read

By Shaoni Mukherjee

We are constantly looking for better ways to use and combine language models (LMs). LM pipelines usually rely on fixed "prompt templates" found by trial and error. DSPy is a new approach that simplifies this by expressing LM pipelines as easy-to-manage text transformation graphs. These graphs are built from modules that can learn and improve how they prompt, fine-tune, and reason.

DSPy includes a tool that optimizes these pipelines for better performance. Studies show that DSPy can quickly create effective LM pipelines, improving performance significantly over traditional methods. It also makes smaller, open models competitive with expert-designed prompts for advanced models like GPT-3.5.

What is DSPy?

DSPy is a framework that makes optimizing language model (LM) prompts and weights easier, especially when using LMs multiple times. Without DSPy, building complex systems with LMs involves many manual steps: breaking down problems, fine-tuning prompts, tweaking steps, generating synthetic examples, and fine-tuning smaller LMs, which can be cumbersome and messy.

DSPy streamlines this by separating the program's flow from the parameters (prompts and weights) and introducing new optimizers that adjust these parameters based on desired outcomes. This makes both powerful models like GPT-4 and smaller local models like T5-base more reliable and effective. Instead of manual prompt adjustments, DSPy uses algorithms to update the parameters, allowing you to recompile your program to fit any changes in code, data, or metrics.

Think of it like using frameworks like PyTorch for neural networks: we don’t manually tune every detail but instead use layers and optimizers to learn the best parameters. Similarly, DSPy provides modules and optimizers that automate and enhance working with LMs, making it less about manual tweaking and more about systematic improvement and higher performance.

What does DSPy stand for? 

DSPy is now treated as a backronym for "Declarative Self-improving Language Programs." The framework was created by the Stanford NLP group.

DSPy streamlines the complex process of optimizing language model (LM) prompts and weights, especially for multi-step pipelines. Traditionally, you'd have to break down the problem, refine prompts, tweak steps, generate synthetic examples, and fine-tune smaller models. This is messy and time-consuming, as any change requires reworking prompts and finetuning.

By separating program flow from LM parameters and introducing optimizers, DSPy improves the reliability of models like GPT-3.5, GPT-4, T5-base, or Llama2-13b, making them more effective and less error-prone.

Why do we need DSPy?

"Prompt templates" are predefined instructions or demonstrations provided to the LM to guide its response to a given task.

  • Prompt templates are often created through trial and error. This means they may work well for specific tasks or scenarios but fail or produce irrelevant results in different contexts. Since these templates are hardcoded, they lack adaptability and may not effectively handle variations in input data, task requirements, or even other language models.
  • A given prompt template might work effectively for a particular LM pipeline or framework. Still, it may not generalize well to other pipelines, different LMs, varied data domains, or even different types of inputs. This lack of generalization limits the flexibility and applicability of the LM across diverse use cases.
  • Manually crafting and fine-tuning prompt templates for different tasks or LMs can be time-consuming and labor-intensive. As the complexity and diversity of tasks increase, maintaining and updating these templates becomes increasingly challenging and inefficient.

Hardcoded prompt templates also cause problems at generation time. Using them in language model (LM) pipelines and frameworks often leads to responses that lack context and relevance, are inconsistent, or are simply inaccurate. These challenges stem from the limited flexibility and scalability of prompt templates, which are manually crafted and may not generalize across different LMs, data domains, or input variations.

So why DSPy?

  • DSPy moves the construction of new language model pipelines away from manipulating unstructured text inputs and toward programming.
  • DSPy modules are task-adaptive components similar to neural network layers, abstracting text transformations like question answering or summarization.
  • The DSPy compiler optimizes program quality or cost, utilizing training inputs and validation metrics.
  • The DSPy compiler simulates versions of the program, bootstrapping example traces for self-improvement and effective prompt generation.
  • Optimization in DSPy is modular, conducted by teleprompters, which determine how modules learn from data.
  • DSPy can map declarative modules to high-quality compositions of prompting, finetuning, reasoning, and augmentation.
  • DSPy programming models focus on reducing the role of expert-crafted prompts.
  • Compositions of DSPy modules can significantly raise the quality of simple programs within minutes to tens of minutes of compiling.

Major Components in DSPy

Before we dive deeper, let us understand the three major components of DSPy:

  • Signatures
  • Modules
  • Teleprompters or Optimizers

A DSPy signature is a declaration of a function: a concise specification of what a text transformation needs to do, rather than how a specific language model should be prompted to do it. A DSPy signature is a tuple of input and output fields with an optional instruction. Each field has a field name and optional metadata.

A signature focuses on the type of system we are building, for example: question -> answer, english document -> french translation, or content -> summary.

qa = dspy.Predict("question -> answer")
qa(question="Where is Guaraní spoken?")
# Out: Prediction(answer='Guaraní is spoken mainly in South America.')
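
To make the "tuple of input and output fields" idea concrete, here is a minimal sketch of how a shorthand string such as "question -> answer" could be split into field names. The `parse_signature` helper and `ParsedSignature` type are hypothetical names used only for illustration; this is not DSPy's own parser.

```python
from typing import NamedTuple


class ParsedSignature(NamedTuple):
    """Hypothetical container for the field names of a signature string."""
    inputs: list[str]
    outputs: list[str]


def parse_signature(spec: str) -> ParsedSignature:
    """Split 'a, b -> c' into input and output field names (illustrative only)."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return ParsedSignature(inputs, outputs)


sig = parse_signature("context, question -> answer")
print(sig.inputs, sig.outputs)
```

The point of the sketch is that the string only names what goes in and what comes out; how the LM is prompted is left to the module that consumes the signature.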

A DSPy module is a core component for creating programs that utilize language models. Each module encloses a specific prompting technique, such as chain of thought or ReAct, and is designed to be versatile enough to work with any DSPy Signature.

These modules have adjustable parameters, including prompt and language model weights elements, and can be called to process inputs and produce outputs. Moreover, multiple DSPy modules can be combined to form larger, more complex programs. Inspired by neural network modules in PyTorch, DSPy modules bring similar functionality to language model programming.

For example:-

dspy.Predict is the fundamental module, and all other DSPy modules are built on top of it.

To use a module, we start by declaring it with a specific signature. Next, we call the module with the input arguments and extract the output fields.

sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.

# 1) Declare with a signature.
classify = dspy.Predict('sentence -> sentiment')

# 2) Call with input argument(s). 
response = classify(sentence=sentence)

# 3) Access the output.
print(response.sentiment)

Output:-

Positive

There are a few other DSPy modules we can use:-

  • dspy.ChainOfThought
  • dspy.ReAct
  • dspy.MultiChainComparison
  • dspy.ProgramOfThought

and more.

A DSPy teleprompter is used for optimization in DSPy. It is very flexible and modular. The optimization is carried out by teleprompters, which are versatile strategies guiding how the modules should learn from data.

A DSPy optimizer is an algorithm designed to fine-tune the parameters of a DSPy program, such as the prompts and language model weights, to maximize specified metrics like accuracy. DSPy offers a variety of built-in optimizers, each employing different strategies. Typically, a DSPy optimizer requires three things: your DSPy program (which could be a single module or a complex multi-module setup), a metric function to evaluate and score your program’s output (with higher scores indicating better performance), and a few training inputs (sometimes as few as 5 or 10 examples, even if they lack labels). While having a lot of data can be beneficial, DSPy is designed to deliver strong results even with minimal input.

How Do the Optimizers Enhance Performance?

Traditional deep neural networks (DNNs) are optimized using gradient descent with a loss function and training data. In contrast, DSPy programs consist of multiple calls to language models (LMs), composed as DSPy modules. Each module has three internal parameters: LM weights, instructions, and demonstrations of input/output behavior.

DSPy can optimize all three using multi-stage optimization algorithms, combining gradient descent for LM weights with LM-driven optimization for refining instructions and demonstrations. Unlike typical few-shot examples, DSPy demonstrations are more robust and can be generated and optimized from scratch based on your program. This compilation often produces better prompts than a human would write, not because DSPy optimizers are inherently more creative but because they can systematically explore more options and tune directly against the metric.
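
The idea of generating demonstrations "from scratch" can be sketched in plain Python: run the program on training inputs and keep only the outputs that the metric accepts. This is a conceptual toy under stated assumptions, not the implementation of any DSPy optimizer; `bootstrap_demos`, `toy_program`, and the lookup table standing in for an LM are all hypothetical.

```python
from typing import Callable

Example = dict[str, str]


def bootstrap_demos(
    program: Callable[[str], str],
    trainset: list[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 2,
) -> list[Example]:
    """Run the program on training inputs and keep outputs the metric accepts."""
    demos: list[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Hypothetical stand-in for an LM-backed program: a lookup table with one wrong entry.
_FAKE_LM = {"2 + 2?": "4", "Capital of France?": "Lyon", "Opposite of hot?": "cold"}


def toy_program(question: str) -> str:
    return _FAKE_LM.get(question, "unknown")


def exact_match(example: Example, prediction: str) -> bool:
    return example["answer"].strip().lower() == prediction.strip().lower()


toy_trainset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Opposite of hot?", "answer": "cold"},
]

demos = bootstrap_demos(toy_program, toy_trainset, exact_match)
print(demos)
```

Only the traces that pass the metric survive, so the wrong "Lyon" prediction never becomes a demonstration. A real optimizer would then place the surviving demonstrations into the prompt of each module.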

A few DSPy optimizers are listed below:-

  • LabeledFewShot
  • BootstrapFewShot
  • BootstrapFewShotWithRandomSearch
  • BootstrapFewShotWithOptuna
  • KNNFewShot

and the list goes on.

We highly recommend the DSPy documentation for further information regarding the different kinds of optimizers.

Comparison with Langchain and Llamaindex

LangChain and LlamaIndex Overview:

  • Both LangChain and LlamaIndex are popular libraries in the field of prompting LMs.
  • Both of the libraries focus on providing pre-packaged components and chains for application developers. Further, they offer implementations of reusable pipelines (e.g., agents, retrieval pipelines) and tools (e.g., database connections, memory implementations).

DSPy's Overview:

  • DSPy aims to tackle fundamental challenges of prompt engineering and builds new LM computational graphs without manual prompt engineering.
  • Additionally, it introduces core composable operators, signatures (for abstract prompts), modules (for abstract prompting techniques), and teleprompters as optimizers.
  • DSPy facilitates quick construction of new LM pipelines and high-quality results through automatic compilation and self-improvement.

Significant differences between DSPy and LangChain/LlamaIndex:

  • LangChain and LlamaIndex rely on manual prompt engineering, which DSPy aims to resolve.
  • DSPy provides a structured framework that automatically bootstraps prompts, eliminating the need for hand-written prompt demonstrations.
  • In September 2023, LangChain's codebase contained 50 strings exceeding 1000 characters and numerous files dedicated to prompt engineering (12 prompts.py and 42 prompt.py files). In contrast, DSPy contains no hand-written prompts yet achieves high quality with various LMs.
  • DSPy proves to be more modular and powerful than hard-coded prompts.

Getting started with DSPy

Let us start with installing the packages:

!pip install dspy-ai 
#or
!pip install git+https://github.com/stanfordnlp/dspy.git

By default, DSPy installs the latest openai from pip.

Import the necessary packages:

import sys
import os
import dspy
from dspy.datasets import HotPotQA
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate.evaluate import Evaluate
from dsp.utils import deduplicate

Getting started and loading the data

turbo = dspy.OpenAI(model='gpt-3.5-turbo') #model name 'gpt-3.5-turbo'
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts') #the retriever ColBERTv2

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

#load the data
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

HotpotQA is a question-answering dataset sourced from English Wikipedia, which comprises around 113,000 crowd-sourced questions.

Using this information, we will create a question-answering system. For this purpose, we will use 20 data points for training and 50 data points for the development or validation set.

# get the train and validation set.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

Output:-

(20, 50)

Next, we will take a look at some examples.

train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Answer: {train_example.answer}")

Output:-

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

dev_example = devset[18]
print(f"Question: {dev_example.question}")
print(f"Answer: {dev_example.answer}")
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")

Output:-

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: English
Relevant Wikipedia Titles: {'Robert Irvine', 'Restaurant: Impossible'}

Creating a chatbot

We start with a signature called BasicQA for questions that require short, factoid answers. Each question has one answer, usually between one and five words.

This signature defines our goal: to develop a question-answering chatbot.

class BasicQA(dspy.Signature): #Signature
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Next, we pass the BasicQA signature to dspy.Predict to create a predictor named generate_answer and call it with our example question. Finally, we print the output to check whether our question-answering chatbot responds correctly.

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

Output:-

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Predicted Answer: American

Here, the answer is incorrect, and we need to correct it. Let us inspect how this output was generated.

turbo.inspect_history(n=1)

Output:-

Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Answer: often between 1 and 5 words

---

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: American

The expected answer is "English." The chef has both British and American ties, but we cannot tell whether the model simply guessed "American" because it is a common answer.
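
The trace above suggests how a signature is turned into text: the docstring becomes the instruction, each field becomes a labeled line, and the output field is left open for the LM to complete. Here is a minimal sketch that assembles a prompt of the same shape. The `render_prompt` function is a hypothetical helper written for illustration and is not DSPy's internal template code.

```python
def render_prompt(
    instructions: str,
    inputs: dict[str, str],
    outputs: dict[str, str],
) -> str:
    """Assemble a prompt shaped like the trace above (illustrative only).

    `inputs` maps input field names to their values; `outputs` maps output
    field names to their descriptions.
    """
    # Format section: inputs show a placeholder, outputs show their description.
    format_lines = [f"{name.capitalize()}: ${{{name}}}" for name in inputs]
    format_lines += [f"{name.capitalize()}: {desc}" for name, desc in outputs.items()]

    # Query section: fill in the inputs and leave the first output open.
    query_lines = [f"{name.capitalize()}: {value}" for name, value in inputs.items()]
    query_lines.append(f"{next(iter(outputs)).capitalize()}:")

    sections = [
        instructions,
        "Follow the following format.\n\n" + "\n".join(format_lines),
        "\n".join(query_lines),
    ]
    return "\n\n---\n\n".join(sections)


prompt = render_prompt(
    "Answer questions with short factoid answers.",
    inputs={"question": "Where is Guaraní spoken?"},
    outputs={"answer": "often between 1 and 5 words"},
)
print(prompt)
```

Because the prompt is derived from the signature rather than written by hand, changing a field description or adding demonstrations does not require editing any template string.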

Let us introduce the 'chain of thought.'

Creating a chatbot using Chain of Thought

Chain-of-thought prompting elicits a series of intermediate reasoning steps, which significantly improves the ability of large language models to perform complex reasoning.

generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)

# Call the predictor on the same input.
pred = generate_answer_with_chain_of_thought(question=dev_example.question)

# Print the input, the chain of thought, and the prediction.
print(f"Question: {dev_example.question}")
print(f"Thought: {pred.rationale.split('.', 1)[1].strip()}")
print(f"Predicted Answer: {pred.answer}")

Output:-

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Thought: We know that the chef and restaurateur featured in Restaurant: Impossible is Robert Irvine.
Predicted Answer: British

Here, "British" is closer to the expected answer ("English") than the response we received earlier. These predictors (dspy.Predict and dspy.ChainOfThought) can be applied to any signature.

Feel free to run the code below and check the reasoning and how this response is generated.

turbo.inspect_history(n=1)

Creating a RAG Application

We'll build a retrieval-augmented pipeline for answer generation. First, we will create a signature called GenerateAnswer, then wrap it in a module, set up an optimizer to refine it, and finally execute the RAG pipeline.

RAG Signature

Define the signature: context, question --> answer.

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

RAG Module

The RAG class acts as a module. In its __init__ method, we declare the sub-modules it needs: 'Retrieve' and 'GenerateAnswer.' In the forward method, 'Retrieve' gathers relevant passages as context, and then a 'ChainOfThought' predictor built on 'GenerateAnswer' produces a prediction from that context and the user's question.

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

RAG Optimizer

Next, we compile the RAG program. This involves providing a training set, defining a validation metric, and selecting a teleprompter to optimize the program. Teleprompters are powerful optimizers that select effective prompts for modules. We'll use BootstrapFewShot as a simple default teleprompter, much as one would choose an optimizer such as SGD, Adam, or RMSProp in a traditional supervised learning setup.

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
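
To see what the two checks in the validation metric amount to, here is a plain-Python approximation: an exact-match test on normalized strings and a test that some retrieved passage contains the gold answer. The normalization below (lowercasing, stripping punctuation and English articles) is a simplifying assumption and may differ from what dspy.evaluate does internally; all function names are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True when the prediction equals the gold answer after normalization."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold: str, passages: list[str]) -> bool:
    """True when at least one passage contains the normalized gold answer."""
    return any(normalize(gold) in normalize(p) for p in passages)


def validate(gold: str, predicted: str, passages: list[str]) -> bool:
    """Both conditions must hold, mirroring the 'and' in the metric above."""
    return exact_match(gold, predicted) and passage_match(gold, passages)


passages = ["David Gregory (physician) | He inherited Kinnairdy Castle in 1664."]
print(validate("Kinnairdy Castle", "kinnairdy castle.", passages))
print(validate("Kinnairdy Castle", "Kinnairdy Castle", ["An unrelated passage."]))
```

Requiring both conditions means a demonstration is kept only when the answer is right and the retrieved context actually supports it, which filters out lucky guesses during bootstrapping.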

Now, let's try executing this pipeline.

# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

Output:-

Question: What castle did David Gregory inherit?
Predicted Answer: Kinnairdy Castle
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']

Let us inspect the history.

turbo.inspect_history(n=1)

Output:-

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University of Edinburgh, Savilian Professor of Astronomy at the University of Oxford, and a commentator on Isaac Newton's "Principia".»

Question: What castle did David Gregory inherit?

Reasoning: Let's think step by step in order to produce the answer. We know that David Gregory inherited a castle. The name of the castle is Kinnairdy Castle.

Answer: Kinnairdy Castle

Evaluate

The final step is evaluation. We will evaluate the basic RAG, the uncompiled Baleen pipeline (without optimizer), and the compiled Baleen pipeline (with optimizer), and then compare the retrieval scores obtained from these evaluations.

Basic RAG

def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)
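
The retrieval metric relies on each passage having the form "Title | text" and checks that every gold title appears among the retrieved titles. A self-contained sketch of that subset test follows; the gold titles and passages are toy data chosen for illustration, and simple lowercasing stands in for dspy.evaluate.normalize_text.

```python
def titles_retrieved(gold_titles: set[str], passages: list[str]) -> bool:
    """True when every gold title is the title of some retrieved passage."""
    found = {p.split(" | ")[0].strip().lower() for p in passages}
    wanted = {t.strip().lower() for t in gold_titles}
    return wanted.issubset(found)


toy_passages = [
    "David Gregory (physician) | He inherited Kinnairdy Castle in 1664.",
    "Kinnairdy Castle | Kinnairdy Castle is a tower house.",
]

# Both gold titles were retrieved.
print(titles_retrieved({"David Gregory (physician)", "Kinnairdy Castle"}, toy_passages))

# One gold title is missing, so the example counts as a miss.
print(titles_retrieved({"Kinnairdy Castle", "Kinnaird Head"}, toy_passages))
```

Because the check is all-or-nothing per example, a pipeline that finds only one of two required pages scores zero on that question, which is why multi-hop questions are hard for single-query retrieval.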

Uncompiled Baleen RAG (Without Optimizer)

Exploring challenging questions in the training/dev sets reveals that a single search query is often not enough, for example when more details are needed. To address this, retrieval-augmented NLP literature proposes multi-hop search systems like GoldEn and Baleen, which generate additional queries to gather further information.

With DSPy, we can easily simulate such systems using the GenerateAnswer signature from the RAG implementation and a signature for the "hop" behavior: generating search queries to find missing information based on partial context and a question.

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

Next, create the module.

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)

Baleen's primary purpose here is to break a complex question into a sequence of simpler search queries. At each hop, it generates a query from the question and the context gathered so far, retrieves passages for that query, and adds them to the context variable, which helps generate more accurate answers.
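
The hop loop can be traced without an LM or a retrieval server. In the toy sketch below, a three-passage corpus, a word-overlap retriever, and a hard-coded `generate_query` stub stand in for ColBERTv2 and the LM; all of these are assumptions made for illustration, and `deduplicate` is reimplemented locally rather than imported.

```python
import re

# Toy corpus in the same "Title | text" shape as the retrieved passages.
CORPUS = [
    "David Gregory (physician) | He inherited Kinnairdy Castle in 1664.",
    "Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret.",
    "Kinnaird Head | Kinnaird Head is a headland projecting into the North Sea.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query (a stand-in retriever)."""
    scored = [(len(tokens(query) & tokens(p)), p) for p in CORPUS]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [p for score, p in ranked if score > 0][:k]


def generate_query(context: list[str], question: str) -> str:
    """Hypothetical stub for the LM: the query depends on what was already found."""
    if any("Kinnairdy Castle" in p for p in context):
        return "Kinnairdy Castle storeys"
    return "David Gregory inherited castle"


def deduplicate(items: list[str]) -> list[str]:
    """Drop repeated passages while preserving their first-seen order."""
    seen: set[str] = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def multi_hop(question: str, max_hops: int = 2) -> list[str]:
    context: list[str] = []
    for _ in range(max_hops):
        query = generate_query(context, question)
        context = deduplicate(context + retrieve(query))
    return context


context = multi_hop("How many storeys are in the castle that David Gregory inherited?")
for passage in context:
    print(passage)
```

The first hop finds the page naming the castle, and only then can the second hop ask about the castle itself. A single query built from the original question would not mention "Kinnairdy" at all.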

Inspect the zero-shot version of the Baleen program.

Using a program in a zero-shot (uncompiled) setting relies on the underlying language model's ability to understand sub-tasks with minimal instructions. This works well with powerful models (e.g., GPT-4) on simple, common tasks. However, zero-shot approaches are less practical for specialized tasks, novel domains, and smaller, more efficient, or open models. DSPy can enhance performance in these situations.

# Ask any question you like to this simple RAG program.
my_question = "How many storeys are in the castle that David Gregory inherited?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()  # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

Output:-

Question: How many storeys are in the castle that David Gregory inherited?
Predicted Answer: five
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," an...', 'Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daugh...', 'Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy....', 'Kinnaird Head | Kinnaird Head (Scottish Gaelic: "An Ceann Àrd" , "high headland") is a headland projecting into the North Sea, within the town of Fraserburgh, Aberdeenshire on the east coast of Scotla...', 'Kinnaird Castle, Brechin | Kinnaird Castle is a 15th-century castle in Angus, Scotland. The castle has been home to the Carnegie family, the Earl of Southesk, for more than 600 years....']

Compiled Baleen RAG (with Optimizer)

First, we'll define our validation logic, which will ensure that:

  1. The predicted answer matches the correct answer.
  2. The retrieved context includes the correct answer.
  3. None of the generated queries are too long (i.e., none exceed 100 characters).
  4. None of the generated queries are repetitive (i.e., none has an F1 score of 0.8 or higher against an earlier query).

def validate_context_and_answer_and_hops(example, pred, trace=None):
    if not dspy.evaluate.answer_exact_match(example, pred): return False
    if not dspy.evaluate.answer_passage_match(example, pred): return False

    hops = [example.question] + [outputs.query for *_, outputs in trace if 'query' in outputs]

    if max([len(h) for h in hops]) > 100: return False
    if any(dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))): return False

    return True
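
The repetition check in condition 4 compares queries by token overlap. Here is a minimal sketch of a token-level F1 score and a repetition test built on it. The helper names are hypothetical, whitespace tokenization is a simplifying assumption, and unlike the metric above this sketch compares every query with every earlier one.

```python
from collections import Counter


def token_f1(a: str, b: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    a_tokens, b_tokens = a.lower().split(), b.lower().split()
    overlap = sum((Counter(a_tokens) & Counter(b_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(a_tokens)
    recall = overlap / len(b_tokens)
    return 2 * precision * recall / (precision + recall)


def is_repetitive(queries: list[str], threshold: float = 0.8) -> bool:
    """True when any query scores at or above the threshold against an earlier one."""
    return any(
        token_f1(queries[i], earlier) >= threshold
        for i in range(1, len(queries))
        for earlier in queries[:i]
    )


print(is_repetitive(["David Gregory castle", "Kinnairdy Castle storeys"]))
print(is_repetitive(["David Gregory castle", "david gregory castle name"]))
```

Rejecting near-duplicate queries pushes the bootstrapped demonstrations toward hops that each contribute new information, rather than rephrasing the same search.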

Next, we will use one of the most basic teleprompters in DSPy, namely BootstrapFewShot.

teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)

Finally, we will compile the program with the optimizer and evaluate the retrieval quality of the compiled and uncompiled Baleen pipelines.

compiled_baleen = teleprompter.compile(SimplifiedBaleen(), teacher=SimplifiedBaleen(passages_per_hop=2), trainset=trainset)

uncompiled_baleen_retrieval_score = evaluate_on_hotpotqa(uncompiled_baleen, metric=gold_passages_retrieved)

compiled_baleen_retrieval_score = evaluate_on_hotpotqa(compiled_baleen, metric=gold_passages_retrieved)

Let us print the scores for comparison now.

print(f"## Retrieval Score for RAG: {compiled_rag_retrieval_score}")  # note that for RAG, compilation has no effect on the retrieval step
print(f"## Retrieval Score for uncompiled Baleen: {uncompiled_baleen_retrieval_score}")
print(f"## Retrieval Score for compiled Baleen: {compiled_baleen_retrieval_score}")

Output:-

## Retrieval Score for RAG: 26.0
## Retrieval Score for uncompiled Baleen: 48.0
## Retrieval Score for compiled Baleen: 60.0

Hence, the compiled Baleen pipeline retrieves the gold passages more often than the basic RAG application, which in turn supports more accurate answers. Compiled Baleen breaks the question into several simpler search queries, retrieves context for each, and uses the combined context to produce a more precise answer.
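
To put these numbers in perspective, the short calculation below converts each score back into a count of development questions. It assumes the reported scores are percentages over the 50 development examples, which is an assumption about how the evaluator reports results rather than something stated in the output above.

```python
DEV_SIZE = 50  # number of dev examples loaded earlier

# Retrieval scores copied from the output above, assumed to be percentages.
scores = {
    "RAG": 26.0,
    "uncompiled Baleen": 48.0,
    "compiled Baleen": 60.0,
}

# Convert each percentage into the number of questions with all gold passages found.
hits = {name: round(pct / 100 * DEV_SIZE) for name, pct in scores.items()}

for name, count in hits.items():
    print(f"{name}: {count} of {DEV_SIZE} questions")

gain = hits["compiled Baleen"] - hits["uncompiled Baleen"]
print(f"Compilation helped on {gain} additional questions")
```

With a development set this small, each question is worth two percentage points, so the differences between pipelines should be read as indicative rather than precise.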

compiled_baleen("How many storeys are in the castle that David Gregory inherited?")
turbo.inspect_history(n=3)

Conclusion

This article introduces DSPy, a new programming model for designing AI systems using pipelines of pre-trained language models (LMs) and other tools. We presented three key concepts: DSPy signatures, modules, and teleprompters. Further, we explored the framework by creating simple question-answering chatbots and RAG applications. Through these experiments, we demonstrated that DSPy enables the rapid development of effective systems using relatively small LMs.

We hope you enjoyed the article!
