Chat With SQL 3 Ways
Chat With Your SQL Database
In this example, we are querying a SQL Database!
Resources:
Install dependencies
For this demo, we're using SQLite.
The first few code cells in this section fetchers a CSV file on 'Absenteeism' and creates a SQL table from it
Extracting the Absenteeism at work dataset... Cleaning up the Absenteeism at work dataset...
'ID, Reason_for_absence, Month_of_absence, Day_of_the_week, Seasons, Transportation_expense, Distance_from_Residence_to_Work, Service_time, Age, Work_load_Average_day_, Hit_target, Disciplinary_failure, Education, Son, Social_drinker, Social_smoker, Pet, Weight, Height, Body_mass_index, Absenteeism_time_in_hours'
Opened database successfully
740
Create a SQL Query Component
Here, we're creating a custom component called SQLQuery, this way, we can use it in our Haystack pipeline like any other component (like a retriever, generator etc). This component does just one thing:
- Accepts
querieswhich are SQL queries - Queries the database with those SQL queries and returns the result from the database.
Try the SQLQuery Component
Age Total_Absenteeism_Hours 0 28 651 1 33 538 2 38 482
Query A SQL Database with Natural Language
In this section, we're building a simple pipeline that can:
- Accept natural language questions
- Translates those questions into a SQL Query
- Queries our database using the
SQLQuerycomponent
Shortcoming: This pipeline will still run if you ask a completely unrelated question that cannot be answered with the database we have at hand. Observe how the SQLQuery component throws an error in these cases.
OpenAI API Key: ··········
Day_of_the_week 0 2 1 3 2 4 3 5 4 6
Skip for Unrelated Questions: Add a Condition
Now, let's create another pipeline, to avoid having to query the database if the question is unrelated to the information present in the database. For this, we do a few things:
- We modify the prompt to answer with
no_answerif the question cannot be answered given the database and itscolumns - We add a conditional router that routes the query to the
SQLQuerycomponent only if the question was evaluated to be answerable - We add a
fallback_promptandfallback_llmto return a statement about the fact that the question cannot be answered, along with reasons. This branch of the pipeline runs only if the question cannot be answered.
The query cannot be answered as the provided table does not contain information regarding the user's personal data such as birthdays. The table primarily focuses on absence-related data for presumably work or similar situations. Please provide the relevant data to get the accurate answer.
Function Calling to Query a SQL Database
Now let's try something a bit more fun. Instead of a component, we are going to provide sql querying as a function. Since we already built it, we can simply wrap our SQLQuery component into a function 👇
Define Tools
Now, let's provide this function as a tool. Below, we are using OpenAI for demonstration purposes so we abide by their function definition schema 👇
Try The Tool
{'replies': [ChatMessage(content='[{"index": 0, "id": "call_fRYwYg6iAqroHwYzPD6UxOVg", "function": {"arguments": "{\\n \\"queries\\": [\\"SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) AS Average_Absenteeism_Hours FROM absenteeism GROUP BY Day_of_the_week HAVING AVG(Absenteeism_time_in_hours) > 4\\"]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {}})]}
Function Name: sql_query_func
Function Arguments: {'queries': ['SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) AS Average_Absenteeism_Hours FROM absenteeism GROUP BY Day_of_the_week HAVING AVG(Absenteeism_time_in_hours) > 4']}
Function Response: {'reply': ' Day_of_the_week Average_Absenteeism_Hours\n0 2 9.248447\n1 3 7.980519\n2 4 7.147436\n3 5 4.424000\n4 6 5.125000'}
Build a Chat with SQL App
First, let's install Gradio, we will use that for our mini app
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly). Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch(). Running on public URL: https://83eb0414c1916d8ee7.gradio.live This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
ChatMessage(content='[{"id": "call_Uu8QXlIsJfYCULD4Q0bEcAtP", "function": {"arguments": "{\\n \\"queries\\": [\\n \\"SELECT Age, SUM(Absenteeism_time_in_hours) as Total_Absenteeism_Hours FROM absenteeism WHERE Disciplinary_failure = 0 GROUP BY Age ORDER BY Total_Absenteeism_Hours DESC LIMIT 3\\"\\n ]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {'completion_tokens': 68, 'prompt_tokens': 207, 'total_tokens': 275}})
Age Total_Absenteeism_Hours
0 28 651
1 33 538
2 38 482
ChatMessage(content='[{"id": "call_t8bjUHMvHHrXReB2qm3iVkNF", "function": {"arguments": "{\\n\\"queries\\": [\\"SELECT Day_of_the_week, AVG(Absenteeism_time_in_hours) as average_absenteeism_time FROM absenteeism GROUP BY Day_of_the_week HAVING average_absenteeism_time > 4\\"]\\n}", "name": "sql_query_func"}, "type": "function"}]', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4-0613', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {'completion_tokens': 57, 'prompt_tokens': 320, 'total_tokens': 377}})
Day_of_the_week average_absenteeism_time
0 2 9.248447
1 3 7.980519
2 4 7.147436
3 5 4.424000
4 6 5.125000
Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://83eb0414c1916d8ee7.gradio.live