Notebooks
d
deepset
Chat With SQL 3 Ways

Chat With SQL 3 Ways

agentic-aiagenticagentsgenaiAIhaystack-cookbookgenai-usecaseshaystack-ainotebooksPythonragai-tools

Chat With Your SQL Database

by Tuana Celik (X, LI)

In this example, we are querying a SQL Database!

Resources:

King (14).png

Install dependencies

For this demo, we're using SQLite.

The first few code cells in this section fetchers a CSV file on 'Absenteeism' and creates a SQL table from it

[ ]
[27]
Extracting the Absenteeism at work dataset...
Cleaning up the Absenteeism at work dataset...
[97]
'ID, Reason_for_absence, Month_of_absence, Day_of_the_week, Seasons, Transportation_expense, Distance_from_Residence_to_Work, Service_time, Age, Work_load_Average_day_, Hit_target, Disciplinary_failure, Education, Son, Social_drinker, Social_smoker, Pet, Weight, Height, Body_mass_index, Absenteeism_time_in_hours'
[38]
Opened database successfully
[40]
740
[41]

Create a SQL Query Component

Here, we're creating a custom component called SQLQuery, this way, we can use it in our Haystack pipeline like any other component (like a retriever, generator etc). This component does just one thing:

  • Accepts queries which are SQL queries
  • Queries the database with those SQL queries and returns the result from the database.
[123]

Try the SQLQuery Component

[124]
[125]
   Age  Total_Absenteeism_Hours
0   28                      651
1   33                      538
2   38                      482

Query A SQL Database with Natural Language

In this section, we're building a simple pipeline that can:

  • Accept natural language questions
  • Translates those questions into a SQL Query
  • Queries our database using the SQLQuery component

Shortcoming: This pipeline will still run if you ask a completely unrelated question that cannot be answered with the database we have at hand. Observe how the SQLQuery component throws an error in these cases.

[67]
OpenAI API Key: ··········
[126]
Output
[127]
   Day_of_the_week
0                2
1                3
2                4
3                5
4                6

Skip for Unrelated Questions: Add a Condition

Now, let's create another pipeline, to avoid having to query the database if the question is unrelated to the information present in the database. For this, we do a few things:

  • We modify the prompt to answer with no_answer if the question cannot be answered given the database and its columns
  • We add a conditional router that routes the query to the SQLQuery component only if the question was evaluated to be answerable
  • We add a fallback_prompt and fallback_llm to return a statement about the fact that the question cannot be answered, along with reasons. This branch of the pipeline runs only if the question cannot be answered.
[135]
Output
[137]
The query cannot be answered as the provided table does not contain information regarding the user's personal data such as birthdays. The table primarily focuses on absence-related data for presumably work or similar situations. Please provide the relevant data to get the accurate answer.

Function Calling to Query a SQL Database

Now let's try something a bit more fun. Instead of a component, we are going to provide sql querying as a function. Since we already built it, we can simply wrap our SQLQuery component into a function 👇

[128]

Define Tools

Now, let's provide this function as a tool. Below, we are using OpenAI for demonstration purposes so we abide by their function definition schema 👇

[129]

Try The Tool

[130]
{'replies': [ChatMessage(content='[{"index": 0, "id": "call_fRYwYg6iAqroHwYzPD6UxOVg", "function": {"arguments": "{\\n  \\"queries\\": [\\"SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) AS Average_Absenteeism_Hours FROM absenteeism GROUP BY Day_of_the_week HAVING AVG(Absenteeism_time_in_hours) > 4\\"]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {}})]}
[131]
Function Name: sql_query_func
Function Arguments: {'queries': ['SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) AS Average_Absenteeism_Hours FROM absenteeism GROUP BY Day_of_the_week HAVING AVG(Absenteeism_time_in_hours) > 4']}
Function Response: {'reply': '   Day_of_the_week  Average_Absenteeism_Hours\n0                2                   9.248447\n1                3                   7.980519\n2                4                   7.147436\n3                5                   4.424000\n4                6                   5.125000'}

Build a Chat with SQL App

First, let's install Gradio, we will use that for our mini app

[ ]
[132]
[133]
[134]
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://83eb0414c1916d8ee7.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
ChatMessage(content='[{"id": "call_Uu8QXlIsJfYCULD4Q0bEcAtP", "function": {"arguments": "{\\n  \\"queries\\": [\\n    \\"SELECT Age, SUM(Absenteeism_time_in_hours) as Total_Absenteeism_Hours FROM absenteeism WHERE Disciplinary_failure = 0 GROUP BY Age ORDER BY Total_Absenteeism_Hours DESC LIMIT 3\\"\\n  ]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {'completion_tokens': 68, 'prompt_tokens': 207, 'total_tokens': 275}})
   Age  Total_Absenteeism_Hours
0   28                      651
1   33                      538
2   38                      482
ChatMessage(content='[{"id": "call_t8bjUHMvHHrXReB2qm3iVkNF", "function": {"arguments": "{\\n\\"queries\\": [\\"SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) as average_absenteeism_time FROM absenteeism GROUP BY Day_of_the_week HAVING average_absenteeism_time > 4\\"]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {'completion_tokens': 57, 'prompt_tokens': 320, 'total_tokens': 377}})
   Day_of_the_week  average_absenteeism_time
0                2                  9.248447
1                3                  7.980519
2                4                  7.147436
3                5                  4.424000
4                6                  5.125000
Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://83eb0414c1916d8ee7.gradio.live