Hugging Face Building Good Agents

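Building good agents

There is a world of difference between an agent that works well and one that doesn't. How can we build agents that fall into the first category? In this guide, we look at best practices for building agents.

[!TIP] If you're new to building agents, make sure to first read the introduction to agents and the guided tour of smolagents.

The best agentic systems are the simplest: simplify the workflow as much as you can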

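Giving an LLM some agency in your workflow introduces some risk of errors.

Well-programmed agentic systems usually have good error logging and retry mechanisms, so the LLM engine has a chance to correct its own mistakes. But to minimize the risk of LLM errors, you should simplify your workflow!

Let's revisit the example from the introduction to agents: a bot that answers user inquiries for a surf trip company. Instead of letting the agent make two separate calls, one to a "travel distance API" and one to a "weather API", every time it is asked about a new surf spot, you could create a single unified tool, "return_spot_information": a function that calls both APIs and returns their concatenated output.

This reduces cost, latency, and the risk of errors!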
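
A minimal sketch of such a unified tool, assuming two hypothetical API wrappers (`get_travel_distance_km` and `get_weather`) that here just return fixed dummy values; the parameter names `origin` and `spot` are also assumptions. In smolagents you would typically turn the function into a tool with the `@tool` decorator, which builds the tool description from the type hints and the `Args:` section of the docstring; the decorator is left out so the snippet runs without the library.

```python
def get_travel_distance_km(origin: str, spot: str) -> float:
    # Hypothetical stub for the "travel distance API": returns a fixed dummy value.
    return 42.0


def get_weather(spot: str) -> str:
    # Hypothetical stub for the "weather API": returns a fixed dummy forecast.
    return "sunny, 24°C, waves 1.2 m"


def return_spot_information(origin: str, spot: str) -> str:
    """
    Returns travel distance and weather for a surf spot in a single call.

    Args:
        origin: where the user is travelling from, e.g. "Lisbon".
        spot: the surf spot to look up, e.g. "Ericeira".
    """
    distance = get_travel_distance_km(origin, spot)
    weather = get_weather(spot)
    # Concatenate both API outputs so the LLM gets everything from one tool call.
    return f"{spot}: {distance:.0f} km from {origin}. Weather: {weather}."


print(return_spot_information("Lisbon", "Ericeira"))
```

The agent now needs one tool call per spot instead of two, and never has to stitch the two results together itself.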

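The main guideline is: reduce the number of LLM calls as much as you can.

This leads to a few takeaways:

Whenever possible, merge two tools into one, as in our two-API example.
Whenever possible, implement logic with deterministic functions rather than agentic decisions.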
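
As a toy illustration of the second point: once travel times are known, picking the nearest spot is a plain computation, so a function can do it instead of asking the LLM to compare numbers. The spot names and travel times below are made up for the example.

```python
# Made-up travel times in minutes, as a tool might already have fetched them.
travel_minutes = {"Ericeira": 45, "Peniche": 80, "Nazaré": 110}


def closest_spot(times: dict[str, int]) -> str:
    """Deterministically pick the spot with the shortest travel time."""
    return min(times, key=times.get)


print(closest_spot(travel_minutes))  # Ericeira
```

A function like this can run inside the tool itself, so the LLM receives only the result and has one less decision to get wrong.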

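Improve the information flow to the LLM engine

Remember that your LLM engine is like an intelligent robot trapped in a room, whose only way of communicating with the outside world is through notes passed under the door.

It won't know about anything that happened unless you explicitly put it into its prompt.

So start by making your task very clear! Since an agent is powered by an LLM, small variations in how you phrase the task can produce completely different results.

Then, improve the flow of information to your agent when it uses tools.

Specific guidelines to follow:

Each tool should log everything that could be useful to the LLM engine (simply use print statements inside the tool's forward method).
- In particular, logging details about tool execution errors helps a lot!

For example, here is a tool that retrieves weather data based on a location and a date-time.

First, here is a poor version of it: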
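
Here is a sketch of what such a poor version might look like, assuming two hypothetical helper functions that return dummy values; `get_weather_api` is an illustrative name. The smolagents `@tool` decorator is omitted so the snippet runs standalone.

```python
from datetime import datetime


def get_weather_report_at_coordinates(coordinates, date_time):
    # Hypothetical stub: [temperature in °C, risk of rain (0-1), wave height in m]
    return [28.0, 0.35, 0.85]


def convert_location_to_coordinates(location):
    # Hypothetical stub: always returns the same dummy (lon, lat) pair
    return [3.3, -42.0]


def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for.
        date_time: the date and time for which you want the report.
    """
    lon, lat = convert_location_to_coordinates(location)
    # The expected format is buried in the code; the docstring never mentions it.
    date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    # Raw list output: the caller has to guess what each number means.
    return str(get_weather_report_at_coordinates((lon, lat), date_time))
```

With this version, `get_weather_api("Taghazout", "02/15/25 14:00:00")` returns the bare string `"[28.0, 0.35, 0.85]"`, and because the docstring never states the date format, the LLM typically discovers it only from the trace of a failed call.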

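Why is it bad?

It doesn't specify the format to use for date_time.
It doesn't specify how the location should be given.
There is no logging mechanism that makes failure cases explicit, such as the location or date_time not being in a proper format.
The output format is hard to understand.

If the tool call fails, the error trace logged in memory can help the LLM reverse-engineer the tool to fix the error. But why leave it so much heavy lifting to do?

A better way to build this tool would be the following: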
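
A sketch of an improved version, using the same hypothetical stubs: the docstring states the expected location and date_time formats, a malformed date_time raises an error that explains how to fix it, the tool prints the resolved coordinates (following the logging advice above), and the output is a readable sentence. The exact format string and message wording are illustrative choices.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%y %H:%M:%S"


def get_weather_report_at_coordinates(coordinates, date_time):
    # Hypothetical stub: [temperature in °C, risk of rain (0-1), wave height in m]
    return [28.0, 0.35, 0.85]


def convert_location_to_coordinates(location):
    # Hypothetical stub: always returns the same dummy (lon, lat) pair
    return [3.3, -42.0]


def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for. Should be a place name,
            optionally followed by a city name, then a country, like "Anchor Point, Taghazout, Morocco".
        date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'.
    """
    lon, lat = convert_location_to_coordinates(location)
    # Log intermediate information the LLM may find useful.
    print(f"Resolved {location!r} to coordinates ({lon}, {lat})")
    try:
        parsed = datetime.strptime(date_time, DATE_FORMAT)
    except ValueError as e:
        # Explicit, actionable error message that ends up in the agent's memory.
        raise ValueError(
            "Conversion of `date_time` to datetime format failed, make sure to provide "
            f"a string in format '{DATE_FORMAT}'. Full trace: {e}"
        ) from e
    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), parsed)
    return (
        f"Weather report for {location}, {parsed}: Temperature will be {temperature_celsius}°C, "
        f"risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m."
    )
```

A failed call now tells the LLM exactly which format to use, so it can usually fix its next attempt without guessing.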

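In general, to ease the load on your LLM, a good question to ask yourself is: "If I were clueless and using this tool for the first time, how easy would it be to program with it and correct my own mistakes?"

Give more arguments to the agent

Beyond the simple string describing the task, you can also pass objects of any type to your agent through the additional_args argument.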

For example, you can use additional_args to pass images or strings that you want your agent to make use of.
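
Roughly speaking, smolagents adds these objects to the agent's state so the generated code can use them as variables, and appends a note listing them to the task. The toy below only mimics that idea with Python's `exec`; `toy_run`, the task and the audio-like data are hypothetical, and this is not the library's implementation. For the real call, pass a dict such as `additional_args={"image": my_image}` (where `my_image` is a placeholder for your own object) to the agent's `run` method.

```python
from typing import Any, Dict, Optional, Tuple


def toy_run(
    task: str,
    generated_code: str,
    additional_args: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Toy illustration (not smolagents code) of how extra objects can reach a code agent."""
    state: Dict[str, Any] = {}
    if additional_args:
        # Expose each extra object as a variable that the generated code can read.
        state.update(additional_args)
        # Tell the model which variable names are available.
        task += f"\nYou can use these variables in your code: {list(additional_args)}"
    # Stand-in for executing the LLM's code: run it against the shared state.
    exec(generated_code, {}, state)
    return task, state


task, state = toy_run(
    "How long is the recording, in seconds?",
    "duration = len(samples) / sample_rate",
    additional_args={"samples": [0.0] * 16000, "sample_rate": 8000},
)
print(task)
print(state["duration"])  # 2.0
```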

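How to debug your agent

1. Use a stronger LLM

In an agentic workflow, some errors are actual errors, while others are the result of your LLM engine not reasoning properly. For example, consider this trace of a CodeAgent that I asked to create a car picture: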

==================================================================================================== New task ====================================================================================================
Make me a cool car picture
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ─────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
image_generator(prompt="A cool, futuristic sports car with LED headlights, aerodynamic design, and vibrant color, high-res, photorealistic")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Step 1:

- Time taken: 16.35 seconds
- Input tokens: 1,383
- Output tokens: 77
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ─────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
final_answer("/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Print outputs:

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Final answer:
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png

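What the user gets back is a path rather than an image. It may look like a bug in the system, but the agentic system did not actually cause the error: the LLM brain simply made the mistake of not saving the image output into a variable. As a result, it could no longer access the image except through the path that was logged when the image was saved, so it returned the path instead of the image.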
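
The difference comes down to one line of generated code. A toy reproduction with stub versions of `image_generator` and `final_answer` (the stubs, the `FakeImage` class and the path are hypothetical):

```python
from typing import Any


class FakeImage:
    """Hypothetical stand-in for the image object an image-generation tool returns."""

    def __init__(self, path: str):
        self.path = path


def image_generator(prompt: str) -> FakeImage:
    # Hypothetical stub: a real tool would run a text-to-image model and save the file.
    return FakeImage("/tmp/generated_car.png")


def final_answer(answer: Any) -> Any:
    # Stub: in the agent, whatever is passed here is what the user receives.
    return answer


# Weaker model: the tool output is not stored, so in the next step only the
# logged path is available, and that string is what gets returned.
image_generator(prompt="A cool, futuristic sports car")
bad = final_answer("/tmp/generated_car.png")

# Stronger model: keep the tool output in a variable and return the object itself.
image = image_generator(prompt="A cool, futuristic sports car")
good = final_answer(image)

print(type(bad).__name__, type(good).__name__)  # str FakeImage
```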

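The first step in debugging your agent is therefore to "use a stronger LLM". Alternatives such as Qwen2.5-72B-Instruct would not have made this mistake.

2. Provide more guidance / more information

You can also use less powerful models, provided you guide them more effectively.

Put yourself in the model's shoes: if you were the model solving the task, would you struggle with the information available to you (the system prompt + the task formulation + the tool descriptions)?

Would you need some additional clarification?

To provide extra information, we do not recommend changing the system prompt right away: the default system prompt contains many carefully tuned details, and unless you understand the prompt very well, you can easily break it. Better ways to guide your LLM engine are:

If it's about the task to solve: add all these details to the task. The task can be hundreds of pages long.
If it's about how to use tools: the description attribute of your tools.

3. Change the system prompt (generally not recommended)

If the above clarifications are not enough, you can change the system prompt.

Let's see how it works. For example, let's inspect the default system prompt of the CodeAgent (the version below is shortened by skipping the examples).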

You get:

You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.

At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.

Here are a few examples using notional tools:
---
Task: "Generate an image of the oldest person in this document."

Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
Code:

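As you can see, there are placeholders such as "{{ tool.description }}": when the agent is initialized, these are used to insert automatically generated descriptions of the tools or managed agents.

So although you can override this system prompt template by passing your custom prompt to the system_prompt parameter, your new system prompt must contain the following placeholders:

For inserting the tool descriptions: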

{%- for tool in tools.values() %}
- {{ tool.to_tool_calling_prompt() }}
{%- endfor %}

For inserting the descriptions of managed agents, if there are any:

{%- if managed_agents and managed_agents.values() | list %}
You can also give tasks to team members.
Calling a team member works similarly to calling a tool: provide the task description as the 'task' argument. Since this team member is a real human, be as detailed and verbose as necessary in your task description.
You can also include any relevant variables or context using the 'additional_args' argument.
Here is a list of the team members that you can call:
{%- for agent in managed_agents.values() %}
- {{ agent.name }}: {{ agent.description }}
{%- endfor %}
{%- endif %}

For CodeAgent only: "{{authorized_imports}}", for inserting the list of authorized imports.

With these placeholders kept, you can then modify the system prompt.
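
Before handing a modified template to the agent, it can help to check that the required placeholders survived the edit. A sketch with a hypothetical helper, `missing_placeholders` (not part of smolagents), that uses loose regexes so whitespace differences inside the Jinja tags don't matter; `base_prompt` is an abbreviated stand-in assembled from the fragments quoted above. How the final string is passed to the agent depends on your smolagents version: the text above describes a `system_prompt` parameter, while recent releases expose the templates through `agent.prompt_templates`, so check the documentation of the version you use.

```python
import re

# Jinja fragments that the guide says a custom system prompt must keep,
# matched loosely so that whitespace differences inside the tags don't matter.
REQUIRED = {
    "tool descriptions": r"for\s+tool\s+in\s+tools\.values\(\)",
    "managed agents": r"for\s+agent\s+in\s+managed_agents\.values\(\)",
}
CODE_AGENT_ONLY = {
    "authorized imports": r"\{\{\s*authorized_imports\s*\}\}",
}


def missing_placeholders(template: str, code_agent: bool = True) -> list[str]:
    """Hypothetical helper: list the required placeholders absent from `template`."""
    checks = dict(REQUIRED)
    if code_agent:
        checks.update(CODE_AGENT_ONLY)
    return [name for name, pattern in checks.items() if not re.search(pattern, template)]


# Abbreviated stand-in for a default template; the wording of its last line is made up.
base_prompt = """You are an expert assistant who can solve any task using code blobs.
{%- for tool in tools.values() %}
- {{ tool.to_tool_calling_prompt() }}
{%- endfor %}
{%- if managed_agents and managed_agents.values() | list %}
{%- for agent in managed_agents.values() %}
- {{ agent.name }}: {{ agent.description }}
{%- endfor %}
{%- endif %}
Authorized imports: {{authorized_imports}}"""

# Modify the prompt by appending an instruction, then verify nothing required was lost.
custom_prompt = base_prompt + "\nAlways store tool outputs in variables before calling final_answer."
print(missing_placeholders(custom_prompt))  # []
print(missing_placeholders("You are a helpful agent."))
```

Appending to the default template, rather than rewriting it from scratch, keeps the tuned parts of the prompt intact, which is in line with the advice above.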

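This also works with the ToolCallingAgent.

4. Extra planning

We provide a model for a supplementary planning step, which an agent can run regularly in between normal action steps. In this step there is no tool call: the LLM is simply asked to update the list of facts it knows and, based on those facts, to work out which steps it should take next.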
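
In smolagents, this is switched on through the agent's `planning_interval` argument, for example `CodeAgent(tools=[...], model=model, planning_interval=3)`. The sketch below only illustrates which steps would include a planning pass, under the assumption that planning runs on the first step and then every `planning_interval` steps; `is_planning_step` is a hypothetical helper, not library code.

```python
from typing import Optional


def is_planning_step(step_number: int, planning_interval: Optional[int]) -> bool:
    """Assumed schedule: plan on step 1, then every `planning_interval` steps."""
    if planning_interval is None:
        return False  # planning disabled
    return (step_number - 1) % planning_interval == 0


# Which of the first ten action steps would be preceded by a planning pass.
schedule = [step for step in range(1, 11) if is_planning_step(step, 3)]
print(schedule)  # [1, 4, 7, 10]
```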
