Topic Classifier
Building a Topic Classifier using Azure OpenAI GPT
Problem description
Build a topic classifier to route user questions to different topics.
In the workflow, this topic classifier sits in front of several ChatGPT chatbots, each of which addresses questions on its own topic.
Background
The chatbot is for an insurance company called Contoso, Ltd. The company provides two services: auto insurance and home flood insurance.
The six topics are:
1 'auto insurance premium'
2 'home flood insurance'
3 'irrelevant of insurance'
4 'chit chat'
5 'end conversation'
6 'continued conversation'
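The numbering above is what the prompts later in this notebook ask the model to return, so a lookup in both directions is handy. A minimal sketch; the names `TOPICS`, `TOPIC_TO_CHOICE` and `CHOICE_TO_TOPIC` are illustrative and not taken from the original code:

```python
# Topic labels in the order listed above; the position (starting at 1)
# is the choice number the classifier is asked to return.
TOPICS = [
    "auto insurance premium",
    "home flood insurance",
    "irrelevant of insurance",
    "chit chat",
    "end conversation",
    "continued conversation",
]

# Hypothetical helper mappings (names are illustrative).
TOPIC_TO_CHOICE = {topic: i for i, topic in enumerate(TOPICS, start=1)}
CHOICE_TO_TOPIC = {i: topic for topic, i in TOPIC_TO_CHOICE.items()}

print(TOPIC_TO_CHOICE["chit chat"])  # 4
```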
Data
The data was generated manually through multiple GPT API calls and is purely synthetic.
There are 66 rows in total, covering the 6 topics listed above. Each row contains one round of QnA between a customer and an agent, with the topic labeled.
Approach
Understand the data: exploratory data analysis (EDA).
Build models: 4 in total, combining zero-shot and few-shot prompting with two models, text-davinci-003 and text-curie-001:
zero-shot
text-davinci-003
text-curie-001
few-shot
text-davinci-003
text-curie-001
Compare performance metrics across the 4 models and pick the best one.
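The four experiments are simply every combination of prompt style and model. A small sketch of that grid; the variable names are illustrative:

```python
from itertools import product

PROMPT_STYLES = ["zero-shot", "few-shot"]
MODELS = ["text-davinci-003", "text-curie-001"]

# Every (prompt style, model) pair is one experiment: 2 x 2 = 4.
EXPERIMENTS = list(product(PROMPT_STYLES, MODELS))

for style, model in EXPERIMENTS:
    print(f"{style:9s} {model}")
```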
Result
The topic classifier works as expected, and evaluation metrics are reported on the 66-row synthetic dataset.
In the runs reported below, text-davinci-003 clearly outperforms text-curie-001. With text-davinci-003, few-shot prompting is slightly ahead of zero-shot: 0.98 versus 0.95 accuracy, with weighted precision/recall of 0.99/0.98 versus 0.96/0.95. text-curie-001 performs poorly in both settings: few-shot prompting raises its weighted F1 from 0.14 to 0.27, but its accuracy stays low (0.24, versus 0.30 for zero-shot).
When applying this to a real business classification problem, evaluate on a production dataset and pick the best model based on both performance and cost.
set up
load data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 QnA 66 non-null object
1 topic 66 non-null object
2 customer_question 66 non-null object
3 agent_reply 66 non-null object
dtypes: object(4)
memory usage: 2.2+ KB
Data Cleaning and EDA
{'auto insurance premium',
 'chit chat',
 'continued conversation',
 'end conversation',
 'home flood insurance',
 'irrelevant of insurance'}
pre-processing
1 'auto insurance premium'
2 'home flood insurance'
3 'irrelevant of insurance'
4 'chit chat'
5 'end conversation'
6 'continued conversation'
batch prediction
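The run logs further down report a number of rounds and a count of failed cases, which suggests that failed requests are retried in later rounds. The original code is not shown, so the following is only a sketch of that idea. `batch_predict`, `call_model` and `max_rounds` are hypothetical names, and `call_model` stands in for whatever function sends one prompt to the deployed model and returns its completion.

```python
from typing import Callable, Dict, List, Optional


def batch_predict(
    prompts: List[str],
    call_model: Callable[[str], str],
    max_rounds: int = 5,
) -> Dict[int, Optional[str]]:
    """Return {prompt index: completion}; failed prompts are retried each round.

    Prompts that still fail after `max_rounds` rounds keep the value None.
    """
    results: Dict[int, Optional[str]] = {i: None for i in range(len(prompts))}
    pending = list(range(len(prompts)))
    for _ in range(max_rounds):
        if not pending:
            break
        still_failing = []
        for i in pending:
            try:
                results[i] = call_model(prompts[i])
            except Exception:
                # Assumption: any exception (timeout, rate limit, ...) is
                # treated as a transient failure and retried next round.
                still_failing.append(i)
        pending = still_failing
    return results
```

Passing the model call in as a function keeps the retry logic independent of any particular client library, and makes it easy to exercise with a stub.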
metric
confusion matrix
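A confusion matrix counts, for each true topic, how often each topic was predicted. The notebook's own code is not shown here; the following is a dependency-free sketch with a hypothetical `confusion_matrix` helper (rows are true classes, columns are predicted classes), run on made-up labels rather than on the notebook's data.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def confusion_matrix(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> List[List[int]]:
    """matrix[i][j] = number of items with true labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Illustrative labels only, not results from this notebook.
toy_true = [1, 1, 2, 2, 3]
toy_pred = [1, 2, 2, 2, 3]
for row in confusion_matrix(toy_true, toy_pred, labels=[1, 2, 3]):
    print(row)
```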
result display
compare results
zero-shot
text-davinci-003
text-curie-001
few-shot
text-davinci-003
text-curie-001
zero-shot prompt engineering and results
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 QnA 66 non-null object
1 topic 66 non-null object
2 customer_question 66 non-null object
3 agent_reply 66 non-null object
4 choice 66 non-null int64
5 completion 66 non-null object
6 prompt_original 66 non-null object
7 prompt 66 non-null object
dtypes: int64(1), object(7)
memory usage: 4.2+ KB
None
data saved at prompt_zero_shot.jsonl
----show a sample prompt for visual check----
Insurer's question:
How are auto insurance premiums calculated?
Based on the customer question above, choose one choice below that best describe the question content. Classify between category 1 to 6.
Make choices based on provided facts, don't imply nor make up stuff.
Detailed Guidelines for how to choose:
choose 1 if the question is about auto insurance premium.
choose 2 if the question is about home flood insurance.
choose 3 if the question is irrelevant of insurance.
choose 4 if the question is chit chat.
choose 5 if the question is about ending conversation.
choose 6 if the question is just a continuation of previous question.
Choose one correct number:
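A prompt like the sample above can be assembled from the customer question and a fixed instruction block. The sketch below reuses the wording of the sample verbatim; the function name `build_zero_shot_prompt` and the exact line breaks are assumptions, since the original prompt-building code is not shown.

```python
# Guideline lines copied from the sample prompt above.
GUIDELINES = [
    "choose 1 if the question is about auto insurance premium.",
    "choose 2 if the question is about home flood insurance.",
    "choose 3 if the question is irrelevant of insurance.",
    "choose 4 if the question is chit chat.",
    "choose 5 if the question is about ending conversation.",
    "choose 6 if the question is just a continuation of previous question.",
]


def build_zero_shot_prompt(question: str) -> str:
    """Hypothetical helper: wrap one customer question in the zero-shot template."""
    lines = [
        "Insurer's question:",
        question.strip(),
        "Based on the customer question above, choose one choice below that "
        "best describe the question content. Classify between category 1 to 6.",
        "Make choices based on provided facts, don't imply nor make up stuff.",
        "Detailed Guidelines for how to choose:",
        *GUIDELINES,
        "Choose one correct number:",
    ]
    return "\n".join(lines)


print(build_zero_shot_prompt("How are auto insurance premiums calculated?"))
```

Ending the prompt with "Choose one correct number:" nudges a completion model to answer with the class number as its first token.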
text-davinci-003 prompt_zero_shot.jsonl
total number of rounds: 5
current run 0 1 2 3 4
failed number of cases 0
1    20
2    10
3    10
4     8
5     9
6     9
Name: pred_class, dtype: int64
1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64
             precision    recall  f1-score   support
           1      1.00      1.00      1.00        20
           2      1.00      1.00      1.00        10
           3      1.00      1.00      1.00        10
           4      0.75      1.00      0.86         6
           5      0.89      0.80      0.84        10
           6      1.00      0.90      0.95        10
    accuracy                          0.95        66
   macro avg      0.94      0.95      0.94        66
weighted avg      0.96      0.95      0.96        66
1    20
2    10
3    10
5     9
6     9
4     8
Name: pred, dtype: int64
text-curie-001 prompt_zero_shot.jsonl
total number of rounds: 5
current run 0 1 2 3 4
failed number of cases 0
1    66
Name: pred_class, dtype: int64
1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64
             precision    recall  f1-score   support
           1      0.30      1.00      0.47        20
           2      1.00      0.00      0.00        10
           3      1.00      0.00      0.00        10
           4      1.00      0.00      0.00         6
           5      1.00      0.00      0.00        10
           6      1.00      0.00      0.00        10
    accuracy                          0.30        66
   macro avg      0.88      0.17      0.08        66
weighted avg      0.79      0.30      0.14        66
1    66
Name: pred, dtype: int64
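In the report above, classes 2 to 6 show a precision of 1.00 even though the model never predicted them. That is what one gets if undefined ratios (0/0) are reported as 1, for example with scikit-learn's `zero_division=1` option; the original code is not shown, so this is an inference from the numbers. The pure-Python sketch below (helper names are hypothetical) reproduces the weighted precision and recall under that assumption, which is why the weighted precision of 0.79 should not be read as a sign of a good model here.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, zero_division=1.0):
    """Return {label: (precision, recall, support)}; 0/0 becomes `zero_division`."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = {}
    for label in set(y_true) | set(y_pred):
        precision = tp[label] / n_pred[label] if n_pred[label] else zero_division
        recall = tp[label] / n_true[label] if n_true[label] else zero_division
        scores[label] = (precision, recall, n_true[label])
    return scores


def weighted_average(scores):
    """Support-weighted mean of precision and recall."""
    total = sum(support for _, _, support in scores.values())
    wp = sum(p * s for p, _, s in scores.values()) / total
    wr = sum(r * s for _, r, s in scores.values()) / total
    return wp, wr


# Class sizes from the ground truth above; every prediction is class 1.
y_true = [1] * 20 + [2] * 10 + [3] * 10 + [4] * 6 + [5] * 10 + [6] * 10
y_pred = [1] * 66

scores = per_class_scores(y_true, y_pred)
wp, wr = weighted_average(scores)
print(round(wp, 2), round(wr, 2))  # 0.79 0.3
```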
few-shot prompt engineering and results
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 QnA 66 non-null object
1 topic 66 non-null object
2 customer_question 66 non-null object
3 agent_reply 66 non-null object
4 choice 66 non-null int64
5 completion 66 non-null object
6 prompt_original 66 non-null object
7 prompt 66 non-null object
dtypes: int64(1), object(7)
memory usage: 4.2+ KB
None
data saved at prompt_few_shot.jsonl
----show a sample prompt for visual check----
Classify customer's question. Classify between category 1 to 6.
Detailed guidelines for how to choose:
choose 1 if the question is about auto insurance premium.
choose 2 if the question is about home flood insurance.
choose 3 if the question is irrelevant of insurance.
choose 4 if the question is chit chat.
choose 5 if the question is about ending conversation.
choose 6 if the question is just a continuation of previous question.
Customer question: Hi there, do you know how to choose flood insurance?
Classified topic:2
Customer question: Hi there, I have a question on my auto insurance.
Classified topic:1
Customer question: Hi there, do you know how to apply for financial aid?
Classified topic:3
Customer question: How is your day?
Classified topic:4
Customer question: Thanks, I got all I need.
Classified topic:5
Customer question: Can you tell me more about it?
Classified topic:6
Customer question: How are auto insurance premiums calculated?
Classified topic:
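The few-shot prompt above follows a regular pattern: instructions, one labeled example per topic, then the new question with an empty label for the model to complete. A sketch of how such a prompt could be assembled, reusing the wording and examples of the sample; `build_few_shot_prompt` and the exact line breaks are assumptions, since the original code is not shown.

```python
GUIDELINES = [
    "choose 1 if the question is about auto insurance premium.",
    "choose 2 if the question is about home flood insurance.",
    "choose 3 if the question is irrelevant of insurance.",
    "choose 4 if the question is chit chat.",
    "choose 5 if the question is about ending conversation.",
    "choose 6 if the question is just a continuation of previous question.",
]

# (question, class number) pairs copied from the sample prompt above.
FEW_SHOT_EXAMPLES = [
    ("Hi there, do you know how to choose flood insurance?", 2),
    ("Hi there, I have a question on my auto insurance.", 1),
    ("Hi there, do you know how to apply for financial aid?", 3),
    ("How is your day?", 4),
    ("Thanks, I got all I need.", 5),
    ("Can you tell me more about it?", 6),
]


def build_few_shot_prompt(question: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Hypothetical helper: instructions, labeled examples, then the open question."""
    lines = [
        "Classify customer's question. Classify between category 1 to 6.",
        "Detailed guidelines for how to choose:",
        *GUIDELINES,
    ]
    for example_question, label in examples:
        lines.append(f"Customer question: {example_question}")
        lines.append(f"Classified topic:{label}")
    lines.append(f"Customer question: {question.strip()}")
    lines.append("Classified topic:")
    return "\n".join(lines)


print(build_few_shot_prompt("How are auto insurance premiums calculated?"))
```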
text-davinci-003 prompt_few_shot.jsonl
total number of rounds: 5
current run 0 1 2 3 4
failed number of cases 0
1    20
2    10
3    10
4     6
5    11
6     9
Name: pred_class, dtype: int64
1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64
             precision    recall  f1-score   support
           1      1.00      1.00      1.00        20
           2      1.00      1.00      1.00        10
           3      1.00      1.00      1.00        10
           4      1.00      1.00      1.00         6
           5      0.91      1.00      0.95        10
           6      1.00      0.90      0.95        10
    accuracy                          0.98        66
   macro avg      0.98      0.98      0.98        66
weighted avg      0.99      0.98      0.98        66
1    20
5    11
2    10
3    10
6     9
4     6
Name: pred, dtype: int64
text-curie-001 prompt_few_shot.jsonl
total number of rounds: 5
current run 0 1 2 3 4
failed number of cases 0
             3
1            5
2           14
3           12
Customer    14
This        18
Name: pred_class, dtype: int64
1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64
             precision    recall  f1-score   support
                  0.00      1.00      0.00         0
           1      1.00      0.25      0.40        20
           2      0.29      0.40      0.33        10
           3      0.58      0.70      0.64        10
           4      1.00      0.00      0.00         6
           5      1.00      0.00      0.00        10
           6      1.00      0.00      0.00        10
    Customer      0.00      1.00      0.00         0
        This      0.00      1.00      0.00         0
    accuracy                          0.24        66
   macro avg      0.54      0.48      0.15        66
weighted avg      0.83      0.24      0.27        66
This        18
2           14
Customer    14
3           12
1            5
             3
Name: pred, dtype: int64
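The few-shot text-curie-001 run above returns strings such as `Customer` and `This`, and an empty value, in addition to class numbers, so raw completions cannot be used as labels directly. One way to handle this, sketched here as an assumption rather than as the notebook's method, is to accept a completion only if it starts with a single digit from 1 to 6 and map everything else to a fallback. `parse_choice` and its `fallback` parameter are hypothetical names.

```python
import re
from typing import Optional

# Optional leading whitespace, then one digit 1-6 that is not followed
# by another digit (so "12" is rejected rather than read as 1).
_CHOICE = re.compile(r"\s*([1-6])(?!\d)")


def parse_choice(completion: str, fallback: Optional[int] = None) -> Optional[int]:
    """Return the class number at the start of `completion`, else `fallback`."""
    match = _CHOICE.match(completion)
    return int(match.group(1)) if match else fallback


for raw in [" 2", "3.", "Customer question: ...", "This", ""]:
    print(repr(raw), "->", parse_choice(raw))
```

Counting how many completions hit the fallback gives a quick signal of how well a model follows the output format, separately from how well it classifies.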