Azure Topic Classifier

alph-notebooks/azure-openai-samples / topic_classifier.ipynb

Building a Topic Classifier using Azure OpenAI GPT

Problem description

Build a topic classifier that routes user questions to different topics.

In the workflow, this topic classifier sits in front of several ChatGPT chatbots, each of which answers questions on its own topic.

Background

The chatbot is for an insurance company called Contoso, Ltd. The company provides two services: auto insurance and home flood insurance.

The six topics are:
    1 'auto insurance premium'
    2 'home flood insurance'
    3 'irrelevant of insurance'
    4 'chit chat'
    5 'end conversation'
    6 'continued conversation'

Data

The data is purely synthetic: it was generated manually through multiple GPT API calls.
There are 66 rows in total, covering the 6 topics listed above. Each row contains one round of QnA between a customer and an agent, along with its topic label.

Approach

Understand the data: exploratory data analysis (EDA).
Build models: 4 models in total, combining zero-shot and few-shot prompting with two base models, text-davinci-003 and text-curie-001:
        zero-shot
            text-davinci-003
            text-curie-001
        few-shot
            text-davinci-003
            text-curie-001
Compare performance metrics across the 4 models and pick the best one.

Result

The topic classifier works as expected, and evaluation metrics on the 66-row synthetic dataset are reported.
Overall, text-davinci-003 clearly outperforms text-curie-001. In the outputs recorded in this notebook, text-davinci-003 reaches 0.95 accuracy with the zero-shot prompt and 0.98 with the few-shot prompt. For text-curie-001, few-shot prompting raises the weighted F1 score from 0.14 to 0.27, although accuracy stays low (0.30 zero-shot, 0.24 few-shot).
When applying this to real business classification problems, evaluate on a production dataset and pick the best model based on performance and cost.

set up

[1]

[2]

load data

[3]

[4]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   QnA                66 non-null     object
 1   topic              66 non-null     object
 2   customer_question  66 non-null     object
 3   agent_reply        66 non-null     object
dtypes: object(4)
memory usage: 2.2+ KB

[5]

Data Cleaning and EDA

[6]

{'auto insurance premium',
 'chit chat',
 'continued conversation',
 'end conversation',
 'home flood insurance',
 'irrelevant of insurance'}
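
A class-balance check fits naturally into this EDA step. The sketch below is a minimal illustration rather than the notebook's own cell: it rebuilds the label column from the per-class ground-truth counts reported later in this notebook and tallies it with `collections.Counter`. The names `TOPIC_COUNTS` and `class_shares` are hypothetical.

```python
from collections import Counter

# Per-topic row counts, taken from the ground_truth tallies reported later
# in this notebook (choices 1 to 6, in order).
TOPIC_COUNTS = {
    "auto insurance premium": 20,
    "home flood insurance": 10,
    "irrelevant of insurance": 10,
    "chit chat": 6,
    "end conversation": 10,
    "continued conversation": 10,
}


def class_shares(labels):
    """Return {label: fraction of rows} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Rebuild a stand-in for the topic column so the example is self-contained.
labels = [topic for topic, n in TOPIC_COUNTS.items() for _ in range(n)]
shares = class_shares(labels)
for topic, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{topic:<26} {share:.1%}")
```

With real data, the same function could be applied directly to the values of the `topic` column. The auto insurance class is roughly twice the size of the others, which is worth keeping in mind when reading accuracy figures.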

[7]

[8]

[9]

[10]

pre-processing

[11]

[12]

1 'auto insurance premium'
2 'home flood insurance'
3 'irrelevant of insurance'
4 'chit chat'
5 'end conversation'
6 'continued conversation'
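
The numbering above can be captured as a lookup table so that topic labels become the integer `choice` values the prompts ask for. This is a sketch under assumptions: the notebook's own pre-processing cell is not shown, and `TOPIC_TO_CHOICE`, `CHOICE_TO_TOPIC`, and `encode_topic` are hypothetical names.

```python
# Topic label -> numeric choice, in the order listed above.
TOPIC_TO_CHOICE = {
    "auto insurance premium": 1,
    "home flood insurance": 2,
    "irrelevant of insurance": 3,
    "chit chat": 4,
    "end conversation": 5,
    "continued conversation": 6,
}

# Reverse lookup, useful when turning a predicted number back into a topic.
CHOICE_TO_TOPIC = {choice: topic for topic, choice in TOPIC_TO_CHOICE.items()}


def encode_topic(topic: str) -> int:
    """Map a topic label to its choice number, ignoring case and outer whitespace."""
    return TOPIC_TO_CHOICE[topic.strip().lower()]


print(encode_topic("home flood insurance"))
print(CHOICE_TO_TOPIC[5])
```

An unknown label raises `KeyError`, which surfaces labeling mistakes early instead of silently producing a missing value.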

batch prediction

[13]

[14]

[15]
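
The run logs later in this notebook report a total number of rounds and a count of failed cases, which suggests the prompts are sent in batches. The following is a minimal sketch of that pattern, not the notebook's implementation. The batch size of 15 is an assumption (one of several values that yields 5 rounds for 66 rows), and `complete_batch` is a hypothetical caller-supplied function standing in for the actual model call.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def predict_in_batches(
    prompts: Sequence[str],
    complete_batch: Callable[[List[str]], List[str]],
    batch_size: int = 15,  # assumed value, not taken from the notebook
) -> Tuple[List[Optional[str]], int]:
    """Send prompts in rounds; return (completions, number of failed prompts).

    Every prompt in a round that raises gets None as its completion.
    """
    rounds = math.ceil(len(prompts) / batch_size)
    print("total number of rounds:", rounds)
    completions: List[Optional[str]] = []
    failed = 0
    for r in range(rounds):
        batch = list(prompts[r * batch_size:(r + 1) * batch_size])
        try:
            outputs = complete_batch(batch)
            if len(outputs) != len(batch):
                raise ValueError("completion count does not match batch size")
            completions.extend(outputs)
        except Exception:
            # Keep row alignment so results can be joined back to the data.
            completions.extend([None] * len(batch))
            failed += len(batch)
    print("failed number of cases", failed)
    return completions, failed


# Stand-in for the model call: always answers "1".
def fake_complete(batch: List[str]) -> List[str]:
    return ["1" for _ in batch]


completions, failed = predict_in_batches([f"prompt {i}" for i in range(66)], fake_complete)
```

Passing the model call in as a function keeps the batching logic testable without network access or credentials.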

metric

confusion matrix

[16]
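
For reference, a confusion matrix is a table of counts with one row per true class and one column per predicted class. The sketch below builds one with the standard library only. It is an illustration on made-up labels, not the notebook's cell or its data, and `confusion_matrix` here is a hypothetical helper.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


labels = [1, 2, 3, 4, 5, 6]
# Tiny made-up example (not notebook data): one "end conversation" row (5)
# is misread as "chit chat" (4).
y_true = [1, 1, 2, 3, 4, 5, 5, 6]
y_pred = [1, 1, 2, 3, 4, 4, 5, 6]
matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Predictions that are not in `labels` (for example free text returned by the model) are left out of this matrix, so the row sums can fall below the class support when the model goes off-format.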

result display

[17]

compare results

zero-shot
    text-davinci-003
    text-curie-001

few-shot
    text-davinci-003
    text-curie-001

zero-shot prompt engineering and results

[18]

[19]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   QnA                66 non-null     object
 1   topic              66 non-null     object
 2   customer_question  66 non-null     object
 3   agent_reply        66 non-null     object
 4   choice             66 non-null     int64 
 5   completion         66 non-null     object
 6   prompt_original    66 non-null     object
 7   prompt             66 non-null     object
dtypes: int64(1), object(7)
memory usage: 4.2+ KB
None
data saved at prompt_zero_shot.jsonl
----show a sample prompt for visual check----
Insurer's question:
How are auto insurance premiums calculated?


Based on the customer question above, choose one choice below that best describe the question content. Classify between category 1 to 6.
Make choices based on provided facts, don't imply nor make up stuff.

Detailed Guidelines for how to choose:
    choose 1 if the question is about auto insurance premium.
    choose 2 if the question is about home flood insurance.
    choose 3 if the question is irrelevant of insurance.
    choose 4 if the question is chit chat.
    choose 5 if the question is about ending conversation.
    choose 6 if the question is just a continuation of previous question.
Choose one correct number:
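
The sample prompt above can be produced by filling a fixed template with each customer question. The sketch below copies the template text verbatim from that sample. The function names are hypothetical, and storing the label as a string in the `completion` field is an assumption, since the notebook's saving cell is not shown.

```python
import json

ZERO_SHOT_TEMPLATE = """Insurer's question:
{question}


Based on the customer question above, choose one choice below that best describe the question content. Classify between category 1 to 6.
Make choices based on provided facts, don't imply nor make up stuff.

Detailed Guidelines for how to choose:
    choose 1 if the question is about auto insurance premium.
    choose 2 if the question is about home flood insurance.
    choose 3 if the question is irrelevant of insurance.
    choose 4 if the question is chit chat.
    choose 5 if the question is about ending conversation.
    choose 6 if the question is just a continuation of previous question.
Choose one correct number:"""


def build_zero_shot_prompt(question: str) -> str:
    """Fill the customer question into the zero-shot template."""
    return ZERO_SHOT_TEMPLATE.format(question=question.strip())


def to_jsonl_line(question: str, choice: int) -> str:
    """One JSONL record; keeping the label as a string is an assumption."""
    record = {"prompt": build_zero_shot_prompt(question), "completion": str(choice)}
    return json.dumps(record)


line = to_jsonl_line("How are auto insurance premiums calculated?", 1)
print(json.loads(line)["prompt"])
```

Because `json.dumps` escapes newlines, each record stays on a single line, which is what the JSONL format requires.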

[20]

text-davinci-003 prompt_zero_shot.jsonl
total number of rounds: 5
current run
0
1
2
3
4
failed number of cases 0

1    20
2    10
3    10
4     8
5     9
6     9
Name: pred_class, dtype: int64

1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64

              precision    recall  f1-score   support

           1       1.00      1.00      1.00        20
           2       1.00      1.00      1.00        10
           3       1.00      1.00      1.00        10
           4       0.75      1.00      0.86         6
           5       0.89      0.80      0.84        10
           6       1.00      0.90      0.95        10

    accuracy                           0.95        66
   macro avg       0.94      0.95      0.94        66
weighted avg       0.96      0.95      0.96        66

1    20
2    10
3    10
5     9
6     9
4     8
Name: pred, dtype: int64

[21]

text-curie-001 prompt_zero_shot.jsonl
total number of rounds: 5
current run
0
1
2
3
4
failed number of cases 0

1    66
Name: pred_class, dtype: int64

1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64

              precision    recall  f1-score   support

           1       0.30      1.00      0.47        20
           2       1.00      0.00      0.00        10
           3       1.00      0.00      0.00        10
           4       1.00      0.00      0.00         6
           5       1.00      0.00      0.00        10
           6       1.00      0.00      0.00        10

    accuracy                           0.30        66
   macro avg       0.88      0.17      0.08        66
weighted avg       0.79      0.30      0.14        66

1    66
Name: pred, dtype: int64
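
The report above shows precision 1.00 with recall 0.00 for classes 2 to 6, which can look contradictory. It is what results when a class is never predicted and the undefined precision (0/0) falls back to 1. In scikit-learn this behavior corresponds to `zero_division=1`, though the notebook's metric cell is not shown, so that setting is an inference. The sketch below reproduces the per-class figures with plain Python; `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_true, y_pred, labels, zero_division=1.0):
    """Per-class (precision, recall); `zero_division` is used when a denominator is 0."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else zero_division
        recall = tp / actual if actual else zero_division
        scores[label] = (precision, recall)
    return scores


# Ground-truth counts from the report above; the model predicted 1 for every row.
support = {1: 20, 2: 10, 3: 10, 4: 6, 5: 10, 6: 10}
y_true = [label for label, n in support.items() for _ in range(n)]
y_pred = [1] * len(y_true)
scores = precision_recall(y_true, y_pred, labels=list(support))
for label, (p, r) in scores.items():
    print(f"{label}  precision={p:.2f}  recall={r:.2f}")
```

Read this way, the high macro-averaged precision in the report is an artifact of the fallback value; recall and F1 are the figures that reflect the model predicting a single class.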

few-shot prompt engineering and results

[22]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   QnA                66 non-null     object
 1   topic              66 non-null     object
 2   customer_question  66 non-null     object
 3   agent_reply        66 non-null     object
 4   choice             66 non-null     int64 
 5   completion         66 non-null     object
 6   prompt_original    66 non-null     object
 7   prompt             66 non-null     object
dtypes: int64(1), object(7)
memory usage: 4.2+ KB
None
data saved at prompt_few_shot.jsonl
----show a sample prompt for visual check----

Classify customer's question. Classify between category 1 to 6.

Detailed guidelines for how to choose:
    choose 1 if the question is about auto insurance premium.
    choose 2 if the question is about home flood insurance.
    choose 3 if the question is irrelevant of insurance.
    choose 4 if the question is chit chat.
    choose 5 if the question is about ending conversation.
    choose 6 if the question is just a continuation of previous question.
    
Customer question: Hi there, do you know how to choose flood insurance?
Classified topic:2

Customer question: Hi there, I have a question on my auto insurance.
Classified topic:1

Customer question: Hi there, do you know how to apply for financial aid?
Classified topic:3

Customer question: How is your day?
Classified topic:4

Customer question: Thanks, I got all I need.
Classified topic:5

Customer question: Can you tell me more about it?
Classified topic:6


Customer question: How are auto insurance premiums calculated?
Classified topic:
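
The few-shot prompt above follows a regular pattern: instructions, one labeled example per topic, then the new question with an empty label slot. A minimal sketch of assembling it is shown below. The example questions and instruction text are copied from the sample; the function and constant names are hypothetical, and the exact blank-line spacing is approximated.

```python
INSTRUCTIONS = """Classify customer's question. Classify between category 1 to 6.

Detailed guidelines for how to choose:
    choose 1 if the question is about auto insurance premium.
    choose 2 if the question is about home flood insurance.
    choose 3 if the question is irrelevant of insurance.
    choose 4 if the question is chit chat.
    choose 5 if the question is about ending conversation.
    choose 6 if the question is just a continuation of previous question."""

# One labeled example per topic, as in the sample prompt.
FEW_SHOT_EXAMPLES = [
    ("Hi there, do you know how to choose flood insurance?", 2),
    ("Hi there, I have a question on my auto insurance.", 1),
    ("Hi there, do you know how to apply for financial aid?", 3),
    ("How is your day?", 4),
    ("Thanks, I got all I need.", 5),
    ("Can you tell me more about it?", 6),
]


def build_few_shot_prompt(question: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Instructions, labeled examples, then the new question with an empty label."""
    shots = "\n\n".join(
        f"Customer question: {q}\nClassified topic:{choice}" for q, choice in examples
    )
    return (
        f"{INSTRUCTIONS}\n\n{shots}\n\n\n"
        f"Customer question: {question.strip()}\nClassified topic:"
    )


prompt = build_few_shot_prompt("How are auto insurance premiums calculated?")
print(prompt)
```

Ending the prompt with the same `Classified topic:` prefix used in the examples nudges the model to continue with just a number.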

[23]

text-davinci-003 prompt_few_shot.jsonl
total number of rounds: 5
current run
0
1
2
3
4
failed number of cases 0

1    20
2    10
3    10
4     6
5    11
6     9
Name: pred_class, dtype: int64

1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64

              precision    recall  f1-score   support

           1       1.00      1.00      1.00        20
           2       1.00      1.00      1.00        10
           3       1.00      1.00      1.00        10
           4       1.00      1.00      1.00         6
           5       0.91      1.00      0.95        10
           6       1.00      0.90      0.95        10

    accuracy                           0.98        66
   macro avg       0.98      0.98      0.98        66
weighted avg       0.99      0.98      0.98        66

1    20
5    11
2    10
3    10
6     9
4     6
Name: pred, dtype: int64

[24]

text-curie-001 prompt_few_shot.jsonl
total number of rounds: 5
current run
0
1
2
3
4
failed number of cases 0

             3
1            5
2           14
3           12
Customer    14
This        18
Name: pred_class, dtype: int64

1    20
2    10
3    10
4     6
5    10
6    10
Name: ground_truth, dtype: int64

              precision    recall  f1-score   support

                   0.00      1.00      0.00         0
           1       1.00      0.25      0.40        20
           2       0.29      0.40      0.33        10
           3       0.58      0.70      0.64        10
           4       1.00      0.00      0.00         6
           5       1.00      0.00      0.00        10
           6       1.00      0.00      0.00        10
    Customer       0.00      1.00      0.00         0
        This       0.00      1.00      0.00         0

    accuracy                           0.24        66
   macro avg       0.54      0.48      0.15        66
weighted avg       0.83      0.24      0.27        66

This        18
2           14
Customer    14
3           12
1            5
             3
Name: pred, dtype: int64
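
The predicted classes `Customer`, `This`, and the empty string show that text-curie-001 often returns free text instead of a number. How the notebook extracts the class from a completion is not shown. As one possible safeguard, the sketch below pulls the first standalone digit from 1 to 6 out of a completion and returns `None` otherwise, so off-format answers can be counted separately rather than treated as extra classes. `parse_choice` is a hypothetical helper.

```python
import re
from typing import Optional


def parse_choice(completion: str) -> Optional[int]:
    """Return the first standalone digit 1-6 in the completion, or None."""
    match = re.search(r"(?<!\d)([1-6])(?!\d)", completion)
    return int(match.group(1)) if match else None


# Made-up completions illustrating the cases seen in the tallies above.
samples = [" 2", "Classified topic: 5\n", "This question is about insurance", "Customer question:", ""]
for text in samples:
    print(repr(text), "->", parse_choice(text))
```

Rows that parse to `None` could then be reported as a separate "invalid output" count alongside the per-class metrics.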