Completions With Dynamic Prompt

Dynamic Prompting for task completion


Recent papers such as Do Prompt-Based Models Really Understand the Meaning of their Prompts? and What Makes Good In-Context Examples for GPT-3? have shown that using a dynamic set of examples, instead of a fixed set, helps GPT-3 perform the task with higher accuracy.

[1]
[2]
True

Dataset Summary

The Text REtrieval Conference (TREC) Question Classification dataset is a dataset for question classification consisting of open-domain, fact-based questions divided into broad semantic categories. It contains 5,500 labeled questions in the training set and another 500 in the test set.

The dataset has 6 coarse class labels and 50 fine class labels. The average sentence length is 10 words, and the vocabulary size is 8,700.

[3]
[4]
DatasetDict({
    train: Dataset({
        features: ['text', 'coarse_label', 'fine_label'],
        num_rows: 5452
    })
    test: Dataset({
        features: ['text', 'coarse_label', 'fine_label'],
        num_rows: 500
    })
})
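The loading cell itself is not shown in this export. A minimal sketch of recovering the coarse class names from their integer ids, assuming the label order matches the class list used in the prompts below (the `load_dataset` call is commented out because it needs the `datasets` package and network access):

```python
from typing import List

# Coarse TREC classes in label-id order (0-5); this order is an assumption
# based on the class list printed in the prompts later in this notebook.
COARSE_LABELS: List[str] = ["ABBR", "ENTY", "DESC", "HUM", "LOC", "NUM"]

def label_name(label_id: int) -> str:
    """Map an integer coarse label to its class name."""
    return COARSE_LABELS[label_id]

# To load the dataset itself (requires the `datasets` package):
#   from datasets import load_dataset
#   trec = load_dataset("trec")   # splits: train (5452 rows), test (500 rows)
```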
[5]

Task Prompt

[6]
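The task-prompt cell likely defines the shared instruction header used by every prompt in this notebook; a reconstruction from the prompts printed below:

```python
# Shared instruction header, reconstructed from the printed prompts below.
TASK_PROMPT = (
    "As a Question Answering agent, your goal is to categorize questions "
    "into different semantic classes that impose constraints on potential "
    "answers, so that they can be utilized in later stages of the question "
    "answering process.\n"
    "Following are the semantic classes: [ABBR, ENTY, DESC, HUM, LOC, NUM]"
)
```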

Setup OpenAI API

[7]
[8]
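A hedged sketch of the client setup, using the `AzureOpenAI` client from `openai>=1.0`; the environment-variable names, API version, and deployment name are illustrative assumptions, not values from this notebook:

```python
import os
from openai import AzureOpenAI  # requires openai>=1.0

# All names below are assumptions; substitute your own resource values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)
DEPLOYMENT = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-35-turbo-instruct")
```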

Zero-shot Prompt

Example of the zero-shot prompt

[9]
As a Question Answering agent, your goal is to categorize questions into different semantic classes that impose constraints on potential answers, so that they can be utilized in later stages of the question answering process.
Following are the semantic classes: [ABBR, ENTY, DESC, HUM, LOC, NUM]
Classify the following question into one of the above classes. Please answer in a single word.
question: How far is it from Denver to Aspen ?
output: 
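The cell that produced this prompt is not included in the export; a reconstruction of the zero-shot prompt builder, with the completion call sketched in a comment (the deployment/client names are assumptions from the setup sketch above):

```python
def build_zero_shot_prompt(question: str) -> str:
    """Reconstructed from the prompt printed above: instruction header,
    then the query question with a trailing 'output: ' slot."""
    return (
        "As a Question Answering agent, your goal is to categorize questions "
        "into different semantic classes that impose constraints on potential "
        "answers, so that they can be utilized in later stages of the question "
        "answering process.\n"
        "Following are the semantic classes: [ABBR, ENTY, DESC, HUM, LOC, NUM]\n"
        "Classify the following question into one of the above classes. "
        "Please answer in a single word.\n"
        f"question: {question}\n"
        "output: "
    )

# Hedged completion call (names are assumptions):
#   resp = client.completions.create(model=DEPLOYMENT,
#                                    prompt=build_zero_shot_prompt(q),
#                                    max_tokens=1, temperature=0)
```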
[10]
[11]
              precision    recall  f1-score   support

           0       0.69      1.00      0.82         9
           1       0.30      0.69      0.42        94
           2       0.70      0.15      0.25       138
           3       0.88      0.35      0.51        65
           4       0.70      0.90      0.78        81
           5       0.84      0.80      0.82       113

    accuracy                           0.56       500
   macro avg       0.69      0.65      0.60       500
weighted avg       0.68      0.56      0.54       500

Few-shot Prompt

[12]

Example of the few-shot prompt

[13]
As a Question Answering agent, your goal is to categorize questions into different semantic classes that impose constraints on potential answers, so that they can be utilized in later stages of the question answering process.
Following are the semantic classes: [ABBR, ENTY, DESC, HUM, LOC, NUM]
Following are some examples.
question: What is the full form of .com ?
output: ABBR
question: What films featured the character Popeye Doyle ?
output: ENTY
question: How did serfdom develop in and then leave Russia ?
output: DESC
question: What contemptible scoundrel stole the cork from my lunch ?
output: HUM
question: What sprawling U.S. state boasts the most airports ?
output: LOC
question: When was Ozzy Osbourne born ?
output: NUM
Classify the following question into one of the above classes. Please answer in a single word.
question: How far is it from Denver to Aspen ?
output: 
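The few-shot builder can be reconstructed from the printed prompt above: the same instruction header, one fixed demonstration per class, then the query. A sketch:

```python
# One fixed demonstration per coarse class, copied from the printed prompt.
FIXED_EXAMPLES = [
    ("What is the full form of .com ?", "ABBR"),
    ("What films featured the character Popeye Doyle ?", "ENTY"),
    ("How did serfdom develop in and then leave Russia ?", "DESC"),
    ("What contemptible scoundrel stole the cork from my lunch ?", "HUM"),
    ("What sprawling U.S. state boasts the most airports ?", "LOC"),
    ("When was Ozzy Osbourne born ?", "NUM"),
]

def build_few_shot_prompt(question, examples=FIXED_EXAMPLES):
    """Instruction header, (question, label) demonstrations, then the query."""
    lines = [
        "As a Question Answering agent, your goal is to categorize questions "
        "into different semantic classes that impose constraints on potential "
        "answers, so that they can be utilized in later stages of the question "
        "answering process.",
        "Following are the semantic classes: [ABBR, ENTY, DESC, HUM, LOC, NUM]",
        "Following are some examples.",
    ]
    for q, label in examples:
        lines.append(f"question: {q}")
        lines.append(f"output: {label}")
    lines.append("Classify the following question into one of the above classes. "
                 "Please answer in a single word.")
    lines.append(f"question: {question}")
    lines.append("output: ")
    return "\n".join(lines)
```

Swapping `examples` for a per-question selection is what turns this into the dynamic prompt in the next section.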
[14]
[15]
              precision    recall  f1-score   support

           0       0.75      1.00      0.86         9
           1       0.41      0.57      0.48        94
           2       0.80      0.51      0.62       138
           3       0.96      0.69      0.80        65
           4       0.93      0.93      0.93        81
           5       0.80      1.00      0.89       113

    accuracy                           0.73       500
   macro avg       0.77      0.78      0.76       500
weighted avg       0.77      0.73      0.73       500

Extract Embeddings for Training dataset

[16]
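This step embeds every training question so that semantically similar examples can be retrieved later. The API call is sketched in a comment (the embedding model name is an assumption); the batching helper that keeps requests under API input limits is shown in full:

```python
def batched(items, size):
    """Yield successive chunks of `items` so each embeddings request
    stays under the API's input-size limits."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hedged sketch of the extraction loop (model name is an assumption):
#   train_embeddings = []
#   for chunk in batched(train_questions, 16):
#       resp = client.embeddings.create(model="text-embedding-ada-002",
#                                       input=chunk)
#       train_embeddings.extend(d.embedding for d in resp.data)
```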

Dynamic Few-shot Prompt

[17]
[18]
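The dynamic prompt replaces the fixed demonstrations with the training questions most similar to the query. A minimal sketch of that selection step, assuming the query and training embeddings have already been computed (pure Python, cosine similarity):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k_similar(query_emb, train_embs, k=6):
    """Indices of the k training embeddings most similar to the query."""
    sims = [cosine(query_emb, e) for e in train_embs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
```

The selected indices pick out (question, label) pairs from the training set, which are then fed into the same few-shot template in place of the fixed examples.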

Example of the dynamic prompt

[ ]
[20]
[21]
              precision    recall  f1-score   support

           0       0.69      1.00      0.82         9
           1       0.66      0.76      0.70        94
           2       0.88      0.71      0.79       138
           3       0.95      0.91      0.93        65
           4       0.94      0.90      0.92        81
           5       0.88      0.99      0.93       113

    accuracy                           0.84       500
   macro avg       0.83      0.88      0.85       500
weighted avg       0.85      0.84      0.84       500