Anthropic 04 Call Summarizer

04 Call Summarizer

real_world_promptinganthropic-courses

alph-notebooks/anthropic-courses / 04_call_summarizer.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

Lesson 4: Call transcript summarizer

In this lesson, we're going to write a complex prompt for a common customer use-case: summarizing. Specifically, we'll summarize long customer service call transcripts. Our goal is to summarize customer service calls for customer support metrics. We want summaries of complete customer service calls to evaluate the efficacy of our customer support team. This means we'll exclude calls that have connection issues, language barriers, and other issues that hinder effective summarization.

Let's imagine we work for Acme Corporation, a company that sells smart home devices. The company handles hundreds of customer service calls daily and needs a way to quickly turn these conversations into useful, structured data.

Some important considerations include:

Calls can be short and sweet or long and complicated.
Customers might be calling about anything from a simple Wi-Fi connection issue to a complex system malfunction.
We need our summaries in a specific format so they're easy to analyze later.
We have to be careful not to include any personal customer information in our summaries.

To help us out, we'll follow the best practices we described previously:

Use a system prompt to set the stage.
Structure the prompt for optimal performance.
Give clear instructions and define your desired output.
Use XML tags to organize information.
Handle special cases and edge scenarios.
Provide examples to guide the model.

Understanding the data

Now that we understand our task, let's take a look at the data we'll be working with. In this lesson, we'll use a variety of simulated customer service call transcripts from Acme Corporation's smart home device support team. These transcripts will help us create a robust prompt that can handle different scenarios.

Let's examine some of the types of call transcripts we might encounter:

A short and simple transcript:

[42]

A medium-length transcript with an eventual resolution:

[43]

A longer call with no resolution:

[44]

These examples showcase the variety of calls and considerations we need to handle:

Calls have wildly different lengths.
Calls feature various support issues (simple fixes, device malfunctions, complex problems).
Some calls end with a resolution and others remain unresolved cases.
Some calls require follow-up.

As we build our prompt, we'll need to ensure it can effectively summarize all these types of calls, extracting the key information and presenting it in a consistent, structured format. In the next section, we'll start building our prompt, step by step, to handle this diverse range of call transcripts.

A simple version of the prompt

Now that we understand our task and the kind of data we're working with, let's start building our prompt. We'll begin with a basic version and gradually refine it to handle the complexities of our call summarization task.

Let's begin with this very simple prompt that outlines the basic task:

[45]

This basic prompt gives Claude a general idea of what we want, but it has several limitations:

It doesn't specify the desired output format, which could lead to inconsistent summaries.
It doesn't provide guidance on how to handle different scenarios (like unresolved issues or insufficient information).
It doesn't set any constraints on length or content, potentially resulting in overly long or detailed summaries.
It doesn't instruct Claude to omit personal information, which could lead to privacy issues.

With that said, let's test it out to get a sense of how it performs:

[46]

[47]

Here is a summary of the customer service call transcript:

Main Issue:
The customer was unable to turn on their Acme smart light bulb.

Resolution:
The service agent instructed the customer to reset the bulb by turning the power off for 5 seconds and then back on. This should reset the bulb and allow it to turn on.

Follow-Up:
The agent told the customer to call back if they continued to have issues after trying the reset procedure. No other follow-up was mentioned.

[48]

Summary:

Main Issue: The customer's Acme SmartTherm thermostat was not maintaining the set temperature of 72°F, and the house was much warmer.

Resolution: The agent guided the customer through the process of recalibrating the SmartTherm thermostat. This involved accessing the "Calibration" menu, adjusting the temperature to match the customer's room thermometer (79°F in this case), and confirming the new setting. The recalibration process may take a few minutes to complete.

Follow-up Required: The customer was advised to check the thermostat in an hour to see if the issue was resolved after the recalibration process completed.

[49]

Here is a summary of the customer service call transcript:

Main Issue:
The customer was having an issue with their Acme SecureHome alarm system going off randomly in the middle of the night, even though all doors and windows were closed properly.

How It Was Resolved:
The customer service agent first had the customer check for any error messages on the control panel and confirm that the battery was not low. When those basic troubleshooting steps did not reveal the issue, the agent determined that one of the sensors may be malfunctioning and needed to transfer the customer to the technical support team for a full system diagnostic.

Required Follow-Up:
The technical support team needs to run a diagnostic on the customer's SecureHome system to identify which sensor(s) may be causing the false alarms and then repair or replace those components. The customer should be contacted again once the diagnostic is complete and the repair/replacement has been performed to ensure the random alarms have been resolved.

As you can see, while Claude does provide a summary, it's not in a format that would be easy to analyze systematically. The summary might be too long or too short, and it might not consistently cover all the points we're interested in.

In the next steps, we'll start adding more structure and guidance to our prompt to address these limitations. We'll see how each addition improves the quality and consistency of Claude's summaries.

Remember, prompt engineering is an iterative process. We start simple and gradually refine our prompt.

Adding a system prompt

The easiest place to start is with a system prompt that sets the overall context and role for Claude, helping to guide its behavior throughout the interaction.

Let's start with this system prompt:

[50]

Structuring our main prompt

Next, we're going to start writing the main prompt. We'll rely on some of these prompting tips:

Put long documents (our transcripts) at the top.
Add detailed instructions and output format requirements.
Introduce XML tags for structuring the prompt and output.
Give Claude space "to think out loud".

Because this prompt may get quite long, we'll write individual pieces in isolation and then combine them together.

The input data

When working with large language models like Claude, it's crucial to put long documents, like our call transcripts, at the beginning of the prompt. This ensures that Claude has all the necessary context before receiving specific instructions. We should also use XML tags to identify the transcript in the prompt:

[51]

Instructions and output format

Before we go any further, let's think clearly about what a good structured output format might look like. To make our life easier when parsing the results, it's often easiest to ask Claude for a JSON response. What should a good JSON look like in this case?

At a minimum, our JSON output should include the following:

A status as to whether Claude had enough information to generate a summary. We'll come back to this. For now, we'll assume that all summaries have a status of "COMPLETE" meaning that Claude could generate a summary.
A summary of the customer issue
If the call requires additional follow up
Details on any follow up actions, if required (call the customer back, etc.)
How the issue was resolved
A list of ambiguities or vague points in the conversation

Here's a proposed sample JSON structure:

{
  "summary": {
    "customerIssue": "Brief description of the main problem or reason for the call",
    "resolution": "How the issue was addressed or resolved, if applicable",
    "followUpRequired": true/false,
    "followUpDetails": "Description of any necessary follow-up actions, or null if none required"
  },
  "status": "COMPLETE",
  "ambiguities": ["List of any unclear or vague points in the conversation, or an empty array if none"]
}

Let's create a new piece of our prompt that includes specific instructions, including:

Create a summary focusing on the main issue, resolution, and any follow-up actions required.
Generate a JSON output following our specific, standardized format.
Omit specific customer information in the summaries.
Keep each piece of the summary short.

Here's an attempt at providing the output instructions, including our specific output JSON format:

[52]

Using XML tags and giving Claude room to think

Next, we'll employ two more prompting strategies: giving Claude room to think and using XML tags.

We'll ask Claude to start by outputting <thinking> tags that contain its analysis.
Then, we'll ask Claude to output its JSON output inside of <json>.

Here's the final piece of our first draft prompt:

[53]

By asking Claude to put its analysis within <thinking> tags, we're prompting it to break down its thought process before formulating the final JSON output. This encourages a more thorough and structured approach to analyzing the transcript. The <thinking> section allows us (and potentially other reviewers or systems) to see Claude's reasoning process. This transparency can be crucial for debugging and quality assurance purposes.

By separating the analysis (<thinking>) from the structured output (<json>), we create a clear distinction between Claude's interpretation of the transcript and its formatted summary. This can be helpful in cases where we might want to review the analysis separately from the JSON output, but also by isolating the JSON content inside of <json> tags, we make it easy to parse the final response and capture the JSON we want to work with.

Testing our updated prompt

Here's the complete version of the prompt, constructed by combining the individual prompt pieces we've written so far:

[56]

Here's a function we can use to test our prompt:

[57]

Let's test out the prompt using some of the call transcripts we previously defined:

[58]

<thinking>
From the transcript, the main issue appears to be that the customer could not turn on their smart light bulb. The resolution provided by the agent was to reset the bulb by turning the power off for 5 seconds and then back on.

The agent did offer for the customer to call back if they needed further assistance, indicating potential follow-up may be required if the reset did not resolve the issue. However, no specific follow-up details were provided.

There do not seem to be any significant ambiguities in the conversation.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Unable to turn on smart light bulb",
    "resolution": "Agent instructed customer to reset the bulb by turning power off for 5 seconds, then back on",
    "followUpRequired": true,
    "followUpDetails": "Customer was advised to call back if the reset did not resolve the issue"
  },
  "status": "COMPLETE",
  "ambiguities": []
}
</json>

[59]

<thinking>
Main issue: The customer's Acme SmartTherm thermostat is not maintaining the set temperature of 72°F, and the house is much warmer.

Resolution: The agent guided the customer through recalibrating the SmartTherm thermostat by:
1. Having the customer press and hold the menu button for 5 seconds.
2. Navigating to the "Calibration" menu and selecting it.
3. Adjusting the temperature to match the customer's room thermometer reading of 79°F.
4. Confirming the new calibration setting.

Follow-up required: Yes, the agent instructed the customer to check back in an hour to see if the recalibration resolved the temperature issue.

Ambiguities: None
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Thermostat not maintaining set temperature, causing house to be much warmer.",
    "resolution": "Agent guided customer through recalibrating the thermostat to match room temperature.",
    "followUpRequired": true,
    "followUpDetails": "Customer to check back in an hour to see if recalibration resolved the temperature issue."
  },
  "status": "COMPLETE",
  "ambiguities": []
}
</json>

[60]

<thinking>
Main issue: The customer's Acme SecureHome system alarm is going off randomly in the middle of the night, even though doors and windows are closed properly.

Resolution: The agent suggests running a diagnostic on the system to identify potential sensor malfunctions. The customer is transferred to the technical team to perform the diagnostic and resolve the issue.

Follow-up required: Yes, the technical team needs to follow up with the customer to diagnose and fix the alarm system problem.

Ambiguities: None identified in the conversation.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Customer's home security alarm system is going off randomly at night without apparent cause.",
    "resolution": "Agent suggests running a diagnostic to check for sensor malfunction and transfers customer to technical team.",
    "followUpRequired": true,
    "followUpDetails": "Technical team to diagnose and resolve the issue with the customer's alarm system."
  },
  "status": "COMPLETE",
  "ambiguities": []
}
</json>

Those responses all look great! Let's try another call transcript that has a bit of ambiguity to it to see if the JSON result includes those ambiguities:

[64]

<thinking>
Main Issue: The customer is experiencing issues with their Acme SmartLock not consistently locking automatically or manually through the app.

Resolution: The agent attempted to troubleshoot by asking for the specific SmartLock model and suggesting a reset, but the customer had to leave before completing the troubleshooting process.

Follow-Up Required: Yes, the customer needs to call back to complete a full diagnostic and troubleshooting session with the technical team.

Ambiguities:
- The customer was unsure if the issue was related to their phone or not, suggesting a potential connectivity problem.
- The customer mentioned having issues with another Acme product (SmartTherm), but it's unclear if those issues are related to the SmartLock problem.
- The customer's contact number was not clearly provided, which could make follow-up more difficult.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "SmartLock not consistently locking automatically or manually through the app.",
    "resolution": "Attempted troubleshooting but customer had to leave before completing the process.",
    "followUpRequired": true,
    "followUpDetails": "Customer needs to call back for full diagnostic and troubleshooting session with technical team."
  },
  "status": "COMPLETE",
  "ambiguities": [
    "Potential connectivity issue with customer's phone",
    "Unclear if issues with other Acme product are related",
    "Customer's contact number not clearly provided"
  ]
}
</json>

Great! Everything seems to be working as intended

Edge cases

So far, all of the call transcripts we've tried have been relatively straightforward customer service calls. In the real world, we would expect to also encounter transcripts that perhaps we don't want to summarize, including:

Calls with connection issues
Calls with language barriers
Calls with garbled transcripts
Calls with irrational or upset customers

Remember, our goal is to summarize these calls to help gauge the effectiveness of the customer service we offer. If we include these edge-case calls in the summaries, we'll likely get skewed results.

Let's see what happens with some of these edge cases with our current prompt. Below we've defined some new call transcripts:

[65]

Let's run these edge-case transcripts through our prompt and see what sort of results we get:

[66]

<thinking>
Issue: The customer appears to have dialed the wrong number for technical support.
Resolution: Since it was a wrong number, there was no issue to resolve. The agent politely concluded the call.
Follow-up: No follow-up is required since it was a misdialed call.
Ambiguities: There are no apparent ambiguities in this brief conversation.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "The customer dialed the wrong number for technical support",
    "resolution": "The agent concluded the call politely since it was a misdialed number",
    "followUpRequired": false,
    "followUpDetails": null
  },
  "status": "COMPLETE",
  "ambiguities": []
}
</json>

[67]

<thinking>
In this transcript, the main issue is unclear as the customer does not provide sufficient details about the device they are having trouble with. The agent attempts to clarify the issue, but the customer disconnects the call before providing more information.

There is no resolution since the issue is not clearly identified. However, follow-up is required to gather more details from the customer about the specific device and the problem they are experiencing.

The key ambiguity is the lack of clarity about the device and the nature of the problem. The customer's vague statements ("the thing isn't working" and "the usual one") do not provide enough information for the agent to diagnose or resolve the issue.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Customer reported an unspecified device was not working but did not provide further details.",
    "resolution": "No resolution was possible due to lack of information from the customer.",
    "followUpRequired": true,
    "followUpDetails": "Agent needs to contact the customer again to gather details about the specific device and issue."
  },
  "status": "COMPLETE",
  "ambiguities": ["The device the customer was referring to", "The nature of the problem with the device"]
}
</json>

[68]

<thinking>
From the transcript, it appears the customer called Acme Smart Home Support, but their voice was garbled, and the agent could not understand them. With this limited information, it is unclear what the main issue or reason for the call was. There was no resolution provided, and it is ambiguous whether follow-up is required since the issue itself is unknown. The key ambiguity is the lack of clear communication from the customer, preventing the agent from understanding the problem.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Unclear due to garbled voice from the customer",
    "resolution": "No resolution provided since the issue could not be understood",
    "followUpRequired": true,
    "followUpDetails": "Agent should try to reconnect with the customer for clearer communication"
  },
  "status": "COMPLETE",
  "ambiguities": ["The customer's voice was garbled, preventing understanding of the issue"]
}
</json>

[69]

<thinking>
Main issue: The customer called and spoke in Spanish, but the agent could not understand Spanish.
Resolution: The agent tried to get a translator on the line to resolve the language barrier.
Follow-up required: Yes, the agent needs to connect with a Spanish translator to assist the customer.
Ambiguities: It is unclear why the customer called, as the reason for their call is not stated in the transcript.
</thinking>

<json>
{
  "summary": {
    "customerIssue": "Customer spoke in a language the agent did not understand (Spanish).",
    "resolution": "Agent attempted to get a translator to resolve the language barrier.",
    "followUpRequired": true,
    "followUpDetails": "Agent needs to connect the customer with a Spanish translator."
  },
  "status": "COMPLETE",
  "ambiguities": ["Reason for the customer's call is not stated in the transcript."]
}
</json>

Unfortunately, we're getting full summaries for these edge-case transcripts. Here are some problematic parts of the responses:

"customerIssue": "Customer spoke in a language the agent did not understand (Spanish)."

"customerIssue": "Unclear due to garbled voice from the customer"

"customerIssue": "The customer dialed the wrong number for technical support"

Remember that our goal is to summarize our customer service calls to get some insight into how effective our customer service team is. These edge-case transcripts are resulting in complete summaries that will cause problems when analyzing all the summaries. We'll need to decide on a strategy for handling these calls.

Further prompt improvements

As we previously saw, our prompt is currently generating full summaries for edge-case transcripts. We want to change this behavior. We have a couple of options for how we handle these edge-cases:

Flag them in some way to indicate they are not summarizable, allowing for later human-review.
Categorize them separately (e.g., "technical difficulty," "language barrier," etc.).

For simplicity's sake, we'll opt to flag these edge-case calls by asking the model to output JSON that looks like this:

{
  "status": "INSUFFICIENT_DATA"
}

In order to make this work, we'll need to update our prompt in the following ways:

Add instructions explaining the desired "INSUFFICIENT_DATA" output
Add examples to show summarizable and non-summarizable transcripts along with their corresponding JSON outputs.

Updating our instructions

Let's write a new part of the instructions portion of the prompt to explain when the model should output our "INSUFFICIENT_DATA" JSON.

[70]

Adding examples

As we discussed previously in this course, it's almost always a good idea to add examples to a prompt. In this specific use case, examples will help Claude generally understand the types of summaries we want for both summarizable and non-summarizable call transcripts.

Here's a set of examples we could include in our prompt:

[71]

Note that the examples cover three different situations:

A complete interaction that does not require follow up
A complete interaction that does require follow up and contains ambiguities
A non-summarizable interaction that contains insufficient data

When providing examples to Claude, it's important to cover a variety of input/output pairs.

Our final prompt

Let's combine our initial prompt with the additions we made in the previous section:

the instructions on handling calls with insufficient data
the set of example inputs and outputs

This is the new complete prompt:

[75]

The above prompt is quite long, but here is the general structure:

The system prompt sets the context, role, and tone for the model.
The main prompt includes the following:
- the call transcript
- a set of instructions containing:
  - general instructions
  - guidelines
  - output format requirements
  - details on handling edge-case calls
  - examples
- details on the XML tags to use in the output

Here's a summary to help visualize the flow of the prompt:

Analyze the following customer service call transcript and generate a JSON summary of the interaction:

<transcript>
[INSERT CALL TRANSCRIPT HERE]
</transcript>

<instructions>
- General instructions and guidelines
- Output JSON format description
- Insufficient data (edge-case) criteria
<examples>
varied example inputs and outputs
</examples>
</instructions>

Before generating the JSON, please analyze the transcript in <thinking> tags. 
Include your identification of the main issue, resolution, follow-up requirements, and any ambiguities. 
Then, provide your JSON output in <json> tags.

Let's test the final prompt with a new function. Note that this function extracts the JSON summary content inside the <json> tags:

[76]

Let's test it out with a bunch of our existing call variables:

[77]

{
  "summary": {
    "customerIssue": "Unable to turn on smart light bulb",
    "resolution": "Agent guided customer to reset the bulb by cycling power off and on",
    "followUpRequired": false,
    "followUpDetails": null
  },
  "status": "COMPLETE",
  "ambiguities": []
}

[78]

{
  "summary": {
    "customerIssue": "Acme SecureHome alarm system going off randomly multiple times at night without apparent cause",
    "resolution": "Initial troubleshooting steps taken, but issue unresolved. Customer transferred to technical team for diagnostics",
    "followUpRequired": true,
    "followUpDetails": "Technical team to diagnose and resolve issue with alarm system"
  },
  "status": "COMPLETE",
  "ambiguities": []
}

Let's try our call transcript that should result in a summary with a non-empty ambiguities array:

[79]

{
  "summary": {
    "customerIssue": "SmartLock not reliably locking automatically or through app, behavior is inconsistent",
    "resolution": "Troubleshooting attempted but incomplete due to lack of model details, customer had to leave",
    "followUpRequired": true,
    "followUpDetails": "Customer to call back for further troubleshooting of SmartLock issue when available"
  },
  "status": "COMPLETE",
  "ambiguities": [
    "Unclear if related SmartTherm issue mentioned",
    "SmartLock model not identified",
    "Customer's contact number not confirmed"
  ]
}

Now let's try some of our edge case prompts that we do not want summarized:

[80]

{
  "status": "INSUFFICIENT_DATA"
}

[82]

{
  "status": "INSUFFICIENT_DATA"
}

[83]

{
  "status": "INSUFFICIENT_DATA"
}

Great! We're getting the exact outputs we want! Let's try pushing it even further:

[84]

{
  "status": "INSUFFICIENT_DATA"
}

[85]

{
  "status": "INSUFFICIENT_DATA"
}

Excellent, the prompt is handling all of our edge cases!

Wrap up

In this lesson, we walked through the process of developing a complex prompt for summarizing customer service call transcripts. Let's recap the prompting techniques we employed:

System Prompt: We used a system prompt to set the overall context and role for Claude.
Structured Input: We placed the call transcript at the beginning of the prompt using XML tags.
Clear Instructions: We provided detailed guidelines on what to focus on and how to structure the output.
Output Formatting: We specified a JSON structure for the summary, ensuring consistent and easily parseable results.
Handling Edge Cases: We added criteria for identifying calls with insufficient data.
Examples: We included diverse examples to illustrate desired outputs for different scenarios.
Thinking Aloud: We asked Claude to show its analysis in tags before providing the final JSON output.

By employing these techniques, we created a robust prompt capable of generating structured summaries for a wide range of customer service call transcripts, while appropriately handling edge cases. This approach can be adapted to many other complex prompting scenarios beyond call summarization.

Important Note: While we've developed a sophisticated prompt that appears to handle our test cases well, it's crucial to understand that this prompt is not yet production-ready. What we've created is a promising starting point, but it requires extensive testing and evaluation before it can be reliably used in a real-world setting. Our current eye-ball test evaluation has been based on a small set of examples. This is not representative of the diverse and often unpredictable nature of real customer service calls. To ensure the prompt's effectiveness and reliability, we need to implement a comprehensive evaluation process that includes quantitative metrics. Robust, data-driven evaluations are the key to bridging the gap between a promising prototype and a reliable, production-grade solution.