A Simple Introduction to DSPy
2025-06-03
DSPy is simple and powerful. It is the best way to build LLM software right now. Despite that, lots of people keep putting off learning it. I know I did—for a whole year! I was excited about DSPy, but I thought I would need a substantial time investment before I could “get it.” That’s not the case! It took me one hour. If you know Python, in an hour you’ll either have built several LLM programs, or you’ll have built one, benchmarked it, and optimized it!
In this article, we’ll go through the entire cycle: building a program, creating a gold set (synthetically, with AI—and yes, it’s actually useful, not just contrived!), and evaluating the results.
For this article, our task will be to build a program that can count the mentions of “Artificial Intelligence,” “AI,” or any other way of referring to AI.
Overview
We’ll:
- Define a DSPy signature for counting AI mentions
- Fetch data from Wikipedia
- Create a training dataset using a stronger model (Claude Sonnet 4)
- Optimize a weaker model (Gemini Flash-lite 2.0) to match the stronger model’s performance
Step 1: Define the AI Task Signature
In DSPy, we define the task using a Signature class instead of writing prompts manually. DSPy provides two ways for you to specify your program. This is the shortest method; the one-liner appears right after this list. It has four parts:
- dspy.Predict: This could have been dspy.ChainOfThought; it lets you specify the “strategy” the LLM should use. Predict is the vanilla option: no special strategy is mentioned in the prompt that DSPy sends to the LLM.
- Input (“paragraph”): This tells the LLM that it will receive a “paragraph” as input.
- Output (“ai_occurrences_count”): This tells the LLM that it will have to output the “AI occurrences count.”
- Output Type (“float”): This specifies that the output should be a float, nothing else.
ai_counter = dspy.Predict("paragraph -> ai_occurrences_count: float")
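To call it, pass the input field as a keyword argument and read the output field off the result. A minimal sketch with a made-up paragraph (the exact count depends on the model):

# Hypothetical usage: input goes in as a keyword argument, output comes back as an attribute
result = ai_counter(paragraph="AI is everywhere, and Artificial Intelligence is here to stay.")
print(result.ai_occurrences_count)  # e.g. 2.0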
You can specify more. To fully define your program, you would use the class syntax (see the chunk below). With it, you can add general instructions and descriptions to the fields (inputs and/or outputs).
import dspy

# Set up the LLM
dspy.configure(lm=dspy.LM('gemini/gemini-2.0-flash-lite', temperature=1.0, max_tokens=6000))

# This defines the signature of the AI function. It replaces prompts.
class count_ai_occurrences(dspy.Signature):
    """Count the number of times the word 'Artificial Intelligence'
    or 'AI' or any other reference to AI or AI-related terms appears in the paragraph"""
    paragraph: str = dspy.InputField(desc="The paragraph to count the AI mentions in")
    ai_occurrences_count: int = dspy.OutputField(desc="The number of times the word 'Artificial Intelligence' or 'AI' appears in the paragraph")

dspy_module = dspy.Predict(count_ai_occurrences)
This signature will be turned into the following prompt by DSPy:
[
{
"role": "system",
"content": """Your input fields are:
1. `paragraph` (str): The paragraph to count the AI mentions in
Your output fields are:
1. `ai_occurrences_count` (int): Number of times 'Artificial Intelligence'
or 'AI' appears in the paragraph
Format all interactions like this, filling in the values:
[[ ## paragraph ## ]]
{paragraph}
[[ ## ai_occurrences_count ## ]]
{ai_occurrences_count} # must be a single int value
[[ ## completed ## ]]
Objective:
Count the number of times the word 'Artificial Intelligence'
or 'AI' or any other reference to AI or AI-related terms appears in the paragraph."""
},
{
"role": "user",
"content": """[[ ## paragraph ## ]]
This is a paragraph mentioning AI once.
Respond with the corresponding output fields, starting with
[[ ## ai_occurrences_count ## ]] (must be a valid Python int),
then end with [[ ## completed ## ]].
"""
}
]
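You don’t have to take my word for it: after you’ve made at least one call, dspy.inspect_history prints the most recent LLM interaction, so you can see the exact prompt DSPy sent for your own program (the paragraph here is a throwaway example):

# Make one call, then inspect the last prompt/response pair
dspy_module(paragraph="A throwaway paragraph mentioning AI once.")
dspy.inspect_history(n=1)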
Ok, so our program is defined! That’s it.
There’s one small thing I like to do—it’s entirely optional. I do it because I want to use my DSPy program more like a regular function. So, before I go ahead, I wrap it in a function:
def count_ai_occurrences_f(paragraph):
return dspy_module(paragraph=paragraph).ai_occurrences_count
The DSPy module requires keyword arguments and returns its output wrapped in an object. Instead of repeatedly specifying the keyword argument and extracting the single output I want, I bake that in here. This also has the added benefit that my function now composes well with my data analytics tools, which expect plain functions that take a value and return a value.
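Concretely, the difference looks like this (hypothetical paragraph):

# Without the wrapper: keyword argument in, result object out
dspy_module(paragraph="One mention of AI.").ai_occurrences_count

# With the wrapper: plain value in, plain value out
count_ai_occurrences_f("One mention of AI.")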
Step 2: Fetch Data
This section has nothing to do with LLMs. We are simply fetching content from the Wikipedia AI page and storing it in a dataframe. We use the Attachments library to easily fetch and split paragraphs from Wikipedia.
from attachments import Attachments

attachments_dsl = "[images: false][select: p,title,h1,h2,h3,h4,h5,h6][split: paragraphs]"
a = Attachments("https://en.wikipedia.org/wiki/Artificial_intelligence" + attachments_dsl)
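Before moving on, it’s worth peeking at what came back. Each chunk exposes a .text attribute, which is all we use below (output depends on the live page; indexing is assumed to work the same way as the slicing in the next chunk):

# Illustrative peek at the first fetched chunk
print(a[0].text[:80])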
We then use Datar as our data manipulation tool. I come from R and I love dplyr. Datar is an effort to provide a similar data manipulation experience here in Python.
from datar import f
import datar.base as b
from datar.tibble import tibble
from datar.dplyr import mutate, summarise, n
df = tibble(paragraphs=[p.text for p in a[:10]])
The resulting tibble dataframe contains only one column (paragraphs) with the text from Wikipedia.
Step 3: Apply the AI to Our Paragraphs
Now we are starting to use large language models. Below, we apply our function to every row in our dataframe. In other words, we loop through each paragraph and send it to the LLM. The LLM returns the number of times it thinks AI was mentioned in the paragraph, and our wrapper extracts that count as a plain number. We store this in a new column of our dataframe, which we name flash_response.
df = mutate(df, flash_response=f.paragraphs.apply(count_ai_occurrences_f))
This column is now our baseline. This shows how Flash-lite performs with the base prompt from DSPy. Now, we want to optimize that prompt! For this, we need a gold set.
I like to create gold sets with state-of-the-art (SOTA) models and then optimize the prompt to approximate the responses I would get from a SOTA model, but using a much smaller, faster, and cheaper model. In other words, we’ll provide a sample of our paragraphs to Sonnet 4 and then automatically “find a way” to prompt Flash-lite into responding like Sonnet would. This is extremely useful when you don’t know the answer yourself but know that SOTA models do—or at least they get it “right enough” for you to gain valuable insights.
Ok, so now we want to add a column with Sonnet’s answers.
with dspy.context(lm=dspy.LM('anthropic/claude-sonnet-4-20250514')):
    df_with_goldset_col = mutate(df, resp_sonnet=f.paragraphs.apply(count_ai_occurrences_f))
That’s it. Let’s break down those two lines. First, DSPy recommends using either dspy.context or dspy.configure to set the LLM. Both ways are fine and both are thread-safe. On the second line, we take our current dataframe, which now has two columns (paragraphs and flash_response), and loop through every value in paragraphs, passing each one to our AI program. We then save all of that in a new column called resp_sonnet, and the entire dataframe is stored as df_with_goldset_col.
Using a SOTA model to create gold sets is a practical approach when you don’t have manually labeled data but trust that advanced models will perform well enough for your use case.
Evaluation
Next, we need a metric! In this case, we’ll keep it simple—we’ll require an exact match. Let’s add a column for exact_match (true/false).
df_with_goldset_col = mutate(df_with_goldset_col, exact_match=f.resp_sonnet == f.flash_response)
Let’s quickly calculate our current precision. Here, we are purely in dataframe manipulation mode with Datar. Using the >> operator, we pipe the dataframe you see above (as it comes out of mutate) into the summarise function, which sums all the True values (1s) and divides by the number of rows.
baseline_metrics = (mutate(df_with_goldset_col, exact_match=f.resp_sonnet == f.flash_response) >>
                    summarise(baseline_precision=b.sum(f.exact_match)/n() * 100))
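Printing baseline_metrics shows a one-row dataframe with the result:

print(baseline_metrics)  # one row: baseline_precision = 65.0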
This tells us that we have 65% baseline precision with Flash-lite and this prompt.
Preparing for the optimizer
So now we have all the conceptual pieces needed to run the optimizer.
optimizer = dspy.MIPROv2(metric=exact_match)
optimized_dspy_module = optimizer.compile(dspy_module, trainset=trainset)
But notice how I said “conceptual”—now we need to do a bit of data wrangling to get our dataframe into an object that compile knows how to work with. The same goes for the metric.
Here’s how to reshape the data:
trainset = []
for r in df_with_goldset_col.to_dict(orient='records'):
    trainset.append(dspy.Example(
        paragraph=r['paragraphs'],             # this is the input
        ai_occurrences_count=r["resp_sonnet"]  # this is the target
    ).with_inputs('paragraph'))                # marks 'paragraph' as the input; remaining fields are treated as labels
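Each element is a dspy.Example with its input key marked. Printing one shows roughly this shape (text elided, and the exact repr may differ slightly across DSPy versions):

print(trainset[0])
# Example({'paragraph': '...', 'ai_occurrences_count': ...}) (input_keys={'paragraph'})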
This is how to prepare the metric: it has to use .[output_name] (here, .ai_occurrences_count) to access the value of x (the gold example) and y (the model’s prediction).
def exact_match(x, y, trace=None):
return x.ai_occurrences_count == y.ai_occurrences_count
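To sanity-check the metric, you can exercise it on hand-built objects; a minimal sketch with made-up values:

# Both the gold Example and the Prediction expose the output field as an attribute
gold = dspy.Example(paragraph="AI, then AI again.", ai_occurrences_count=2).with_inputs('paragraph')
pred = dspy.Prediction(ai_occurrences_count=2)
print(exact_match(gold, pred))  # True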
With these two chunks of code, the optimizer will run! In this case, if we were to keep it as is, we would be using Flash-lite to compose the prompts (whenever the optimizer we choose does that). I prefer to use a SOTA model for that, so we will set a teacher model. To set a teacher model on MIPROv2, use the teacher_settings keyword. Be careful: different optimizers set the teacher in different ways.
Automatic prompt optimization
optimizer = dspy.MIPROv2(metric=exact_match,
                         teacher_settings=dict(lm=dspy.LM('anthropic/claude-sonnet-4-20250514')))
optimized_dspy_module = optimizer.compile(dspy_module, trainset=trainset)
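As an aside, DSPy also ships an evaluation helper. The dataframe pipeline below gives us more control, but the built-in route would look roughly like this (reusing our trainset as a dev set purely for illustration):

# Score the optimized module against the gold set with the same metric
evaluator = dspy.Evaluate(devset=trainset, metric=exact_match, display_progress=True)
evaluator(optimized_dspy_module)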
We’ll wrap it in a function again so we can use it with our data analytics tools.
def count_ai_occurrences_opt(paragraph):
return optimized_dspy_module(paragraph=paragraph).ai_occurrences_count
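If you want to reuse the optimized prompt later without re-running the optimizer, DSPy modules can be saved and loaded (the filename here is arbitrary):

# Persist the optimized program to disk
optimized_dspy_module.save("optimized_ai_counter.json")

# Later: rebuild a module with the same signature and load the optimized state
reloaded = dspy.Predict(count_ai_occurrences)
reloaded.load("optimized_ai_counter.json")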
And we’ve built a complete one-shot pipeline to apply the optimized program, add it as a new column, and summarize the dataframe into performance metrics. Apart from count_ai_occurrences_opt, this has nothing to do with DSPy.
final_performance = (df_with_goldset_col >>
    mutate(
        # Apply Flash-lite with the optimized prompt to every row
        resp_flash_opt=f.paragraphs.apply(count_ai_occurrences_opt)) >>
    mutate(
        # Add two True/False columns: does each Flash response equal the Sonnet response?
        flash_eq_sonnet=f.resp_sonnet == f.flash_response,     # compare baseline Flash with Sonnet
        flash_opt_eq_sonnet=f.resp_flash_opt == f.resp_sonnet  # compare optimized Flash with Sonnet
    ) >>
    summarise(
        # Percentage of rows where each response equals the Sonnet response; n() is the number of rows
        flashlight_before_opt=b.sum(f.flash_eq_sonnet)/n() * 100,
        flashlight_after_opt=b.sum(f.flash_opt_eq_sonnet)/n() * 100
    ) >>
    mutate(precision_increase=f.flashlight_after_opt - f.flashlight_before_opt)
)
Results
Flash-lite improved by 20 percentage points, from 65% to 85% exact match with Sonnet. Not bad!
Here is the optimized prompt:
[
{
"role": "system",
"content": """Your input fields are:
1. `paragraph` (str): The paragraph to count the AI mentions in
Your output fields are:
1. `ai_occurrences_count` (int): The number of times the word 'Artificial Intelligence'
or 'AI' appears in the paragraph
All interactions will be structured in the following way, with the appropriate values filled in:
[[ ## paragraph ## ]]
{paragraph}
[[ ## ai_occurrences_count ## ]]
{ai_occurrences_count} # note: the value you produce must be a single int value
[[ ## completed ## ]]
Objective:
Analyze the provided paragraph and determine the frequency of mentions related to
"Artificial Intelligence" (AI). This includes direct references to "AI",
"Artificial Intelligence", as well as any related concepts, technologies, or subfields
associated with AI. Provide a count representing the total number of AI-related mentions.
"""
},
{
"role": "user",
"content": """[[ ## paragraph ## ]]
In classical planning, the agent knows exactly what the effect of any action
will be.[35] In most real-world problems, however, the agent may not be certain
about the situation they are in (it is "unknown" or "unobservable") and it may
not know for certain what will happen after each possible action (it is not
"deterministic"). It must choose an action by making a probabilistic guess and
then reassess the situation to see if the action worked.[36]
Respond with the corresponding output fields, starting with the field
[[ ## ai_occurrences_count ## ]] (must be formatted as a valid Python int), and
then ending with the marker for [[ ## completed ## ]].
"""
}
]
Conclusion
In about 50 lines, we:
- Fetched paragraphs from Wikipedia
- Created a gold set
- Tuned Flash-lite
- Improved its precision by 20 percentage points
No prompt spaghetti.
The Complete Script
import dspy
from attachments import Attachments
from datar import f
import datar.base as b
from datar.tibble import tibble
from datar.dplyr import mutate, summarise, n

# Set up the LLM
dspy.configure(lm=dspy.LM('gemini/gemini-2.0-flash-lite', temperature=1.0, max_tokens=6000))

# Define the signature
class count_ai_occurrences(dspy.Signature):
    """Count the number of times the word 'Artificial Intelligence'
    or 'AI' or any other reference to AI or AI-related terms appears in the paragraph"""
    paragraph: str = dspy.InputField(desc="The paragraph to count the AI mentions in")
    ai_occurrences_count: int = dspy.OutputField(desc="The number of times the word 'Artificial Intelligence' or 'AI' appears in the paragraph")

# Create the DSPy module
dspy_module = dspy.Predict(count_ai_occurrences)

# Wrap in a function
def count_ai_occurrences_f(paragraph):
    return dspy_module(paragraph=paragraph).ai_occurrences_count

# Fetch data
attachments_dsl = "[images: false][select: p,title,h1,h2,h3,h4,h5,h6][split: paragraphs]"
a = Attachments("https://en.wikipedia.org/wiki/Artificial_intelligence" + attachments_dsl)

# Create dataframe
df = tibble(paragraphs=[p.text for p in a[:10]])

# Apply baseline model
df = mutate(df, flash_response=f.paragraphs.apply(count_ai_occurrences_f))

# Create gold set with Sonnet
with dspy.context(lm=dspy.LM('anthropic/claude-sonnet-4-20250514')):
    df_with_goldset_col = mutate(df, resp_sonnet=f.paragraphs.apply(count_ai_occurrences_f))

# Calculate baseline precision
baseline_metrics = (mutate(df_with_goldset_col, exact_match=f.resp_sonnet == f.flash_response) >>
                    summarise(baseline_precision=b.sum(f.exact_match)/n() * 100))

# Prepare training set
trainset = []
for r in df_with_goldset_col.to_dict(orient='records'):
    trainset.append(dspy.Example(
        paragraph=r['paragraphs'],
        ai_occurrences_count=r["resp_sonnet"]
    ).with_inputs('paragraph'))

# Define metric
def exact_match(x, y, trace=None):
    return x.ai_occurrences_count == y.ai_occurrences_count

# Optimize
optimizer = dspy.MIPROv2(metric=exact_match,
                         teacher_settings=dict(lm=dspy.LM('anthropic/claude-sonnet-4-20250514')))
optimized_dspy_module = optimizer.compile(dspy_module, trainset=trainset)

# Wrap optimized module
def count_ai_occurrences_opt(paragraph):
    return optimized_dspy_module(paragraph=paragraph).ai_occurrences_count

# Calculate final performance
final_performance = (df_with_goldset_col >>
    mutate(resp_flash_opt=f.paragraphs.apply(count_ai_occurrences_opt)) >>
    mutate(
        flash_eq_sonnet=f.resp_sonnet == f.flash_response,
        flash_opt_eq_sonnet=f.resp_flash_opt == f.resp_sonnet
    ) >>
    summarise(
        flashlight_before_opt=b.sum(f.flash_eq_sonnet)/n() * 100,
        flashlight_after_opt=b.sum(f.flash_opt_eq_sonnet)/n() * 100
    ) >>
    mutate(precision_increase=f.flashlight_after_opt - f.flashlight_before_opt)
)