[1]
[2]
/opt/anaconda3/envs/hover-benchmark/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[5]
[4]
[ ]
[7]
llm_generate |█████████▉| 149/150 (99.3%) | ⏳ 00:36<00:00 | 14.61it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
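The "Exception in worker on attempt 1 ... Requeuing..." lines suggest that a request that fails is put back on the work queue and retried instead of dropped. Failures seen here include an `APITimeoutError`, and later a Cloudflare 502 `InternalServerError`. The sketch below shows that requeue-on-failure pattern in a single thread. It is an illustration under stated assumptions, not the pipeline's code: the name `run_with_requeue`, the `max_attempts` cap and the sequential loop are all hypothetical, and the real run evidently uses concurrent workers.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Hashable, Tuple


def run_with_requeue(
    tasks: Dict[Hashable, Any],
    worker: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Tuple[Dict[Hashable, Any], Dict[Hashable, BaseException]]:
    """Run `worker` on every task, pushing failed tasks to the back of the queue.

    Hypothetical helper: a task is retried until it succeeds or has been
    attempted `max_attempts` times; the last exception is then recorded.
    """
    queue: Deque[Tuple[Hashable, Any, int]] = deque(
        (key, payload, 1) for key, payload in tasks.items()
    )
    results: Dict[Hashable, Any] = {}
    failures: Dict[Hashable, BaseException] = {}
    while queue:
        key, payload, attempt = queue.popleft()
        try:
            results[key] = worker(payload)
        except Exception as exc:  # e.g. a request timeout or a 502 from a proxy
            print(f"Exception in worker on attempt {attempt}: raised {exc!r}")
            if attempt < max_attempts:
                print("Requeuing...")
                # Re-enqueue at the back so the other tasks keep making progress.
                queue.append((key, payload, attempt + 1))
            else:
                failures[key] = exc
    return results, failures
```

A requeued task goes to the back of the queue, so it finishes only after the remaining work. That is one plausible reason a bar can sit at 149/150 until the retried request returns.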
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:20<00:00 |  7.40it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:19<00:00 |  7.52it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:18<00:00 |  8.11it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.46it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:18<00:00 |  8.09it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.69it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 03:31<00:00 |  2.64s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
0.6666666666666666
['claim', 'uid', 'ground_truth_label', 'ground_truth_wikipedia_titles', 'query_1', 'passages_1', 'summary_1', 'query_2', 'passages_2', 'summary_2', 'query_3', 'passages_3', 'summary_3', 'final_answer', 'correctness', 'evaluation']

🔧 Creating batches with 128,000 token limit
📊 Processing 150 examples in 4 batches
   ✅ Batch 1/4: Optimized
   ✅ Batch 2/4: Optimized
   ✅ Batch 3/4: Optimized
   ✅ Batch 4/4: Optimized
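The batching log above reports 150 examples packed into 4 batches under a 128,000-token limit; later iterations report 5 batches. The sketch below shows one way to do that, greedy in-order packing. The function name `make_batches` and the `count_tokens` callable are hypothetical stand-ins. The notebook's actual batching strategy and token counter are not shown here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(
    examples: Sequence[T],
    count_tokens: Callable[[T], int],
    token_limit: int = 128_000,
) -> List[List[T]]:
    """Greedily pack examples, in order, into batches under a token budget.

    Hypothetical helper: `count_tokens` stands in for whatever tokenizer
    the real pipeline uses to size each example.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    used = 0
    for ex in examples:
        n = count_tokens(ex)
        if n > token_limit:
            raise ValueError(f"single example needs {n} tokens > limit {token_limit}")
        # Close the current batch when this example would push it over budget.
        if current and used + n > token_limit:
            batches.append(current)
            current, used = [], 0
        current.append(ex)
        used += n
    if current:
        batches.append(current)
    return batches
```

For instance, 150 examples of a uniform, made-up size of 3,000 tokens each pack into four batches of 42, 42, 42 and 24. Greedy packing keeps example order stable. It can use one more batch than an optimal bin-packing would when example sizes vary a lot.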
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:34<00:00 |  8.75it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:44<00:00 |  6.73it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 16:03<00:00 |  2.64s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 16:07<00:00 |  6.45s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 13:36<00:00 |  5.44s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:52<00:00 |  5.71it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:56<00:00 |  5.29it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:27<00:00 | 10.79it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:56<00:00 |  5.31it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:32<00:00 |  9.34it/s
0.47
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:17<00:00 |  8.37it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:27<00:00 |  5.47it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.56it/s
llm_generate |██▍       | 37/150 (24.7%) | ⏳ 00:11<00:19 |  5.81it/s 
Exception in worker on attempt 1: raised InternalServerError('<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>')
Requeuing...
llm_generate |█████████▉| 149/150 (99.3%) | ⏳ 00:46<00:00 |  2.53it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.59it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 01:13<00:00 |  2.04it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:39<00:00 |  3.83it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.54it/s
llm_generate |███████▎  | 110/150 (73.3%) | ⏳ 01:19<00:16 |  2.48it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |███████▊  | 118/150 (78.7%) | ⏳ 01:22<00:13 |  2.43it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 01:54<00:00 |  2.25s/it
0.5133333333333333
['claim', 'uid', 'ground_truth_label', 'ground_truth_wikipedia_titles', 'query_1', 'passages_1', 'summary_1', 'query_2', 'passages_2', 'summary_2', 'query_3', 'passages_3', 'summary_3', 'final_answer', 'correctness', 'evaluation']

🔧 Creating batches with 128,000 token limit
📊 Processing 150 examples in 5 batches
   ✅ Batch 1/5: Optimized
   ✅ Batch 2/5: Optimized
   ✅ Batch 3/5: Optimized
   ✅ Batch 4/5: Optimized
   ✅ Batch 5/5: Optimized

llm_generate |██████████| 150/150 (100.0%) | ⏳ 17:48<00:00 |  2.25s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 18:32<00:00 |  2.25s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.70it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:12<00:00 |  4.13it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.49it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:09<00:00 |  4.31it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 22:31<00:00 |  9.01s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 05:36<00:00 |  1.12s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 04:39<00:00 |  1.07it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:27<00:00 | 10.82it/s
0.5933333333333334
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:17<00:00 |  8.79it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:36<00:00 |  4.13it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.49it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:37<00:00 |  4.05it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:16<00:00 |  8.92it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:37<00:00 |  4.05it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:16<00:00 |  9.27it/s
llm_generate |█████████ | 136/150 (90.7%) | ⏳ 01:51<00:11 |  1.17it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |█████████▊| 147/150 (98.0%) | ⏳ 02:06<00:05 |  1.76s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 02:32<00:00 |  6.66s/it
0.58
['claim', 'uid', 'ground_truth_label', 'ground_truth_wikipedia_titles', 'query_1', 'passages_1', 'summary_1', 'query_2', 'passages_2', 'summary_2', 'query_3', 'passages_3', 'summary_3', 'final_answer', 'correctness', 'evaluation']

🔧 Creating batches with 128,000 token limit
📊 Processing 150 examples in 5 batches
   ✅ Batch 1/5: Optimized
   ✅ Batch 2/5: Optimized
   ✅ Batch 3/5: Optimized
   ✅ Batch 4/5: Optimized
   ✅ Batch 5/5: Optimized
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.69it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:14<00:00 |  4.01it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.47it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 15:53<00:00 |  6.35s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.50it/s
llm_generate |████████▋ | 260/300 (86.7%) | ⏳ 00:57<00:06 |  6.26it/s  
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:14<00:00 |  1.92s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:34<00:00 |  8.79it/s
0.54
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:16<00:00 |  9.24it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 03:02<00:00 |  1.92s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:19<00:00 |  7.89it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 04:01<00:00 |  1.92s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.63it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:33<00:00 |  4.43it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.69it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 07:27<00:00 |  1.92s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
0.5466666666666666
['claim', 'uid', 'ground_truth_label', 'ground_truth_wikipedia_titles', 'query_1', 'passages_1', 'summary_1', 'query_2', 'passages_2', 'summary_2', 'query_3', 'passages_3', 'summary_3', 'final_answer', 'correctness', 'evaluation']

🔧 Creating batches with 128,000 token limit
📊 Processing 150 examples in 5 batches
   ✅ Batch 1/5: Optimized
   ✅ Batch 2/5: Optimized
   ✅ Batch 3/5: Optimized
   ✅ Batch 4/5: Optimized
   ✅ Batch 5/5: Optimized
llm_generate |██████████| 300/300 (100.0%) | ⏳ 23:33<00:00 |  4.71s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 21:37<00:00 |  4.33s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 19:30<00:00 |  7.81s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 18:07<00:00 |  7.25s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 16:20<00:00 |  6.54s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.66it/s
llm_generate |█████████▉| 299/300 (99.7%) | ⏳ 01:28<00:00 |  3.08it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.69it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:06<00:00 |  4.54it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.61it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:10<00:00 |  4.26it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.70it/s
0.55

llm_generate |██████████| 300/300 (100.0%) | ⏳ 06:03<00:00 |  4.59s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 06:41<00:00 |  1.34s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 01:15<00:00 |  2.00it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:51<00:00 |  2.90it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.55it/s
llm_generate |█████▋    | 85/150 (56.7%) | ⏳ 00:16<00:11 |  5.85it/s 
Exception in worker on attempt 1: raised InternalServerError('<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>')
Requeuing...
llm_generate |█████████▊| 148/150 (98.7%) | ⏳ 00:35<00:01 |  1.61it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |█████████▊| 148/150 (98.7%) | ⏳ 00:38<00:01 |  1.61it/s 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.61it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:40<00:00 |  3.70it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 00:15<00:00 |  9.71it/s

llm_generate |██████████| 150/150 (100.0%) | ⏳ 02:32<00:00 |  4.43s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...

llm_generate |██████████| 150/150 (100.0%) | ⏳ 02:33<00:00 |  4.43s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...

llm_generate |██████████| 150/150 (100.0%) | ⏳ 03:57<00:00 |  4.43s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...

llm_generate |██████████| 150/150 (100.0%) | ⏳ 04:00<00:00 |  4.43s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...

llm_generate |██████████| 150/150 (100.0%) | ⏳ 04:21<00:00 |  4.43s/it 
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
0.6133333333333333
['claim', 'uid', 'ground_truth_label', 'ground_truth_wikipedia_titles', 'query_1', 'passages_1', 'summary_1', 'query_2', 'passages_2', 'summary_2', 'query_3', 'passages_3', 'summary_3', 'final_answer', 'correctness', 'evaluation']

🔧 Creating batches with 128,000 token limit
📊 Processing 150 examples in 5 batches
   ✅ Batch 1/5: Optimized
   ✅ Batch 2/5: Optimized
   ✅ Batch 3/5: Optimized
   ✅ Batch 4/5: Optimized
   ✅ Batch 5/5: Optimized
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:30<00:00 |  9.72it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 20:17<00:00 |  4.43s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:29<00:00 | 10.20it/s
llm_generate |██████████| 150/150 (100.0%) | ⏳ 22:17<00:00 |  4.43s/it
Exception in worker on attempt 1: raised APITimeoutError('Request timed out.')
Requeuing...
llm_generate |██████████| 150/150 (100.0%) | ⏳ 22:32<00:00 |  9.02s/it
llm_generate |██████████| 150/150 (100.0%) | ⏳ 20:30<00:00 |  8.20s/it
llm_generate |██████████| 300/300 (100.0%) | ⏳ 03:49<00:00 |  1.31it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:34<00:00 |  3.17it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:29<00:00 | 10.17it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 01:32<00:00 |  3.25it/s
llm_generate |██████████| 300/300 (100.0%) | ⏳ 00:28<00:00 | 10.58it/s
0.5633333333333334
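The seed `final_answer_prompt` printed in cell [8] below asks for "either 'SUPPORTED' or 'NOT_SUPPORTED'". The optimized version also permits a third token, NOT ENOUGH INFO. When scoring against binary labels, such an answer has to be mapped to one of the two classes. The sketch below assumes it counts as NOT_SUPPORTED. `normalize_label` and `accuracy` are hypothetical helpers. Whether the notebook's `correctness` column is computed this way is not shown.

```python
from typing import Sequence

BINARY_LABELS = ("SUPPORTED", "NOT_SUPPORTED")


def normalize_label(raw: str) -> str:
    """Collapse a model answer onto the binary label set.

    Assumption: after trimming whitespace, surrounding quotes and a trailing
    period, and upper-casing, anything other than exactly SUPPORTED is scored
    as NOT_SUPPORTED (so 'NOT ENOUGH INFO' becomes NOT_SUPPORTED).
    """
    cleaned = raw.strip().strip("'\"").rstrip(".").strip().upper()
    return "SUPPORTED" if cleaned == "SUPPORTED" else "NOT_SUPPORTED"


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of normalized predictions that match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    for g in gold:
        if g not in BINARY_LABELS:
            raise ValueError(f"unexpected gold label: {g!r}")
    correct = sum(normalize_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

Mapping the abstention to NOT_SUPPORTED is only one choice. Scoring it as always wrong would penalize the optimized prompt more heavily.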

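Cell [8] below prints one record per optimization iteration, each with `iteration`, `train_accuracy` and `dev_accuracy` keys (the accuracies appear as `np.float64`). The small helpers below tabulate such a list and pick the iteration with the best dev accuracy. This is a minimal sketch assuming the list is bound to a variable. The name `history` and both helper functions are hypothetical.

```python
from typing import Iterable, Mapping


def summarize_history(history: Iterable[Mapping]) -> str:
    """Render iteration / train / dev accuracy as a fixed-width text table."""
    lines = [f"{'iter':>4}  {'train':>6}  {'dev':>6}"]
    for rec in history:
        # float() also accepts np.float64 values as printed in the records.
        lines.append(
            f"{rec['iteration']:>4}  {float(rec['train_accuracy']):>6.3f}  "
            f"{float(rec['dev_accuracy']):>6.3f}"
        )
    return "\n".join(lines)


def best_iteration(history: Iterable[Mapping]) -> int:
    """Return the iteration with the highest dev accuracy (earliest on ties)."""
    best = max(history, key=lambda rec: float(rec["dev_accuracy"]))
    return best["iteration"]
```

For the first record shown below (train 0.6666666666666666, dev 0.47), the table row reads `   0   0.667   0.470`. Selecting on dev rather than train accuracy keeps the choice of prompt set independent of the examples the optimizer was fitted on.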
[8]
[{'iteration': 0,
  'train_accuracy': np.float64(0.6666666666666666),
  'dev_accuracy': np.float64(0.47),
  'prompts': {'create_query_1_prompt': "Given the fields {claim}, produce the field 'query_1'.",
   'summarize_1_prompt': "Given the fields {claim}, {passages_1}, produce the field 'summary_1'.",
   'create_query_2_prompt': "Given the fields {claim}, {summary_1}, produce the field 'query_2'.",
   'summarize_2_prompt': "Given the fields {claim}, {summary_1}, {passages_2}, produce the field 'summary_2'.",
   'create_query_3_prompt': "Given the fields {claim}, {summary_1}, {summary_2}, produce the field 'query_3'.",
   'summarize_3_prompt': "Given the fields {claim}, {summary_1}, {summary_2}, {passages_3}, produce the field 'summary_3'.",
   'final_answer_prompt': "Given the fields {claim}, {summary_1}, {summary_2}, {summary_3}, return either 'SUPPORTED' or 'NOT_SUPPORTED'."},
  'optimized_prompts': {'create_query_1_prompt': 'Task: Using {claim}, produce the field \'query_1\' as a single, high-recall Wikipedia search query.\nWrite one compact query that maximizes retrieval of the most relevant Wikipedia pages for verifying the claim.\nGuidelines:\n- Decompose the claim into atomic subclaims: list all named entities (people/works/organizations/places), the relation(s) between them, key attributes (occupation, nationality, director/author, label/league, birthplace/HQ, release year/number), and qualifiers (negation, comparison, \'both/either/neither\', timeline \'in 1943\' vs \'currently\', location granularity/district).\n- Include all essential entities (on both sides of each relation) plus the most discriminative attribute terms. Prefer exact/canonical Wikipedia titles in quotes with disambiguators like (film)/(song)/(album)/(TV series)/(novel); for people add role terms (film director/singer/astronaut/footballer) and nationality when relevant.\n- For negation/comparatives/conjunctive claims, include both entities and the contested property (e.g., \'inducted Rock and Roll Hall of Fame\', \'nationality German\', \'mascot\', \'headquarters district\', \'release year 2003\'). For \'not both\' or \'either/or\' structures, keep both entities and the property terms.\n- If the claim hinges on a specific value/year/metric or a geographic qualifier, include it explicitly (e.g., \'as of 2014\', \'born 1961\', \'headquarters northern Geneva\', \'area km2\'). For historical→current chains, include both time markers (e.g., \'1943 footage\' AND \'now 22nd Air Base\').\n- Disambiguate ambiguous names with precise type terms and aliases using OR (e.g., "Patty Jenkins" film director NOT Patti Scialfa; use OR for known aliases). Use AND between core entities/attributes.\nSmall examples:\n- Claim: \'Ulrich Walter and Léopold Eyharts both were not from Germany.\'\n  Query: "Ulrich Walter" astronaut nationality German OR Germany AND "Léopold Eyharts" ESA astronaut nationality French OR France\n- Claim: \'Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.\'\n  Query: "Prince (musician)" Rock and Roll Hall of Fame induction AND "Patty Jenkins" film director NOT Patti Scialfa induction\n- Claim: \'Some of the 1943 footage for Target for Today was filmed at Marienburg; the base now hosts MiG-29 jets developed by Mikoyan.\'\n  Query: "Target for Today" 1943 Marienburg AND "22nd Air Base" Malbork "41st Tactical Squadron" MiG-29 Mikoyan design bureau\nNow produce only the query_1 string.',
   'summarize_1_prompt': 'Task: Using {claim} and {passages_1}, produce the field \'summary_1\' as a concise, evidence-focused synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add outside knowledge. If a needed detail is absent, write \'not stated\' or \'not found\'. Quote exact titles/values/years where present and preserve temporal qualifiers (\'as of 2014\').\n- Use exact Wikipedia titles when citing; put short quoted phrases for critical qualifiers (\'not both\', \'as of\', district names).\nKeep it concise (5–8 bullets) and include only what helps verify the claim or plan retrieval.\nStructure:\n- Claim decomposition: enumerate atomic subclaims with IDs S1, S2, ... (Entity/attribute A, Entity/attribute B, relation, specific value/timeline/geo/negation qualifiers).\n- Evidence bullets: Each bullet \'[Title] — key fact(s) with exact numbers/dates/names/roles\'. If ambiguous or off-topic, label it.\n- Coverage status: for each subclaim ID from decomposition, mark Supported / Refuted / Not found yet. Track temporal alignment (past vs current) explicitly.\n- Discrepancies: flag contradictions (different date/number/location/genre/role) and likely false premises or name confusions (e.g., Patty vs Patti; series vs song title).\n- Gaps/Next-hop targets: list the highest-yield missing or ambiguous items needed next (e.g., \'need: "Ulrich Walter" nationality line\', \'need: Patty Jenkins Hall of Fame status\', \'need: base renaming link Marienburg→22nd Air Base\').\n- Suggested query terms: 1 short line with high-yield titles/terms for the next hop (prefer exact page titles + the missing attribute/value; include type disambiguators like film/TV series/city/astronaut).\nSmall example format:\n- Claim decomposition: S1=Ulrich Walter nationality not German; S2=Léopold Eyharts nationality not German. Qualifier: both.\n- [European Astronaut Corps] — mentions ESA astronauts; \'Ulrich Walter\'/\'Léopold Eyharts\' not stated.\nCoverage: S1 Not found; S2 Not found.\nDiscrepancies: none yet; need person pages.\nGaps/Next-hop targets: need: \'Ulrich Walter\' page nationality; \'Léopold Eyharts\' page nationality.\nSuggested query terms: "Ulrich Walter" astronaut nationality AND "Léopold Eyharts" nationality ESA\nNow produce only the summary_1 string.',
   'create_query_2_prompt': 'Task: Using {claim} and {summary_1}, produce the field \'query_2\'.\nGoal: Target the single highest-value missing or contested element identified in summary_1 to advance verification.\nGuidelines:\n- Read the \'Coverage status\', \'Discrepancies\', and \'Gaps/Next-hop targets\'. Focus your query on the least-supported/most critical component that unlocks a verdict (e.g., a person\'s nationality, an induction status, the specific director/label, the base renaming link, the exact year/number).\n- Include exact page titles already observed plus the complementary attribute (e.g., person + \'nationality/birthplace/astronaut\', work + \'director/cast/label/release year\', organization + \'headquarters address/district\').\n- For names with near-miss confusions, add type terms and exclude common confounders (e.g., "Patty Jenkins" film director NOT Patti Scialfa; \'Hank Osasuna\' ITV2 series).\n- For dates/numbers/metrics, include exact value/units if the claim specifies them.\n- For conjunctive claims (\'both/and\'), query the missing conjunct directly. For \'not both\' claims, querying either entity\'s property can suffice if it discriminates (e.g., show one is not inducted).\n- Keep compact and specific; use AND between core entities; use OR for aliases.\nProduce only the query_2 string.',
   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2} and integrate with prior evidence from summary_1. No outside inferences.\n- Keep 5–8 bullets total; cite titles; quote critical numbers/years/phrases; mark temporal qualifiers.\n- Update coverage explicitly for each subclaim ID from summary_1: Supported/Refuted/Not found. Note temporal alignment (e.g., '1943' vs 'current').\n- Flag discrepancies (timeline, entity mismatch, genre/role mix-ups, different numeric/date/location). If evidence gives a different specific value than the claim, call it out as likely refutation.\n- End with: Gaps/Next-hop targets (the last unresolved pieces) and a concise Suggested query for hop 3 that directly tests the remaining weakest link or suspected false premise.\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components.\n- Coverage update: list subclaim IDs (S1, S2, ...) with status after hops 1–2.\n- Discrepancies/notes: contradictions; name/type confusions (e.g., Patty vs Patti; series title vs song title); temporal misreads.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the last missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target unresolved components listed under 'Gaps/Next-hop targets' in summary_2. Choose the single query most likely to close the case.\n- For conjunctive 'both' claims, verify any remaining conjunct explicitly (e.g., confirm the second person's nationality or induction status). For 'not both' claims, a single decisive refutation (e.g., evidence one entity is not inducted) is sufficient if directly supported.\n- For numeric/date claims, query explicitly for the exact metric with units/value (e.g., 'released 1984', 'rank third highest', 'area km2').\n- For credits/authorship/cast, query the exact work title plus 'director/cast/composer/writer/producer/credits'.\n- For geo qualifiers within cities, query for district/arrondissement/address/coordinates tied to the entity (e.g., 'headquarters northern Geneva district').\n- If a false premise is suspected (wrong identity/role/time), query to confirm/refute that premise directly (e.g., 'Patty Jenkins Rock and Roll Hall of Fame inductee', 'Hank Osasuna ITV2 series title', 'Marienburg now 22nd Air Base').\nProduce only the query_3 string.",
   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, map it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text; cite titles; quote critical values/years/phrases; retain temporal qualifiers.\n- Provide a component-wise support matrix and final evidence sufficiency assessment.\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps or suspected contradictions.\n- Claim mapping:\n  - Entity/attribute A (S1): Supported/Refuted/Not found (by [Title])\n  - Entity/attribute B (S2): Supported/Refuted/Not found (by [Title])\n  - Relation/timeline/number/geo qualifier (S3...): Supported/Refuted/Not found (by [Title])\n- Discrepancies/notes: list contradictions, timeline/number mismatches, wrong entity types/genres, misidentified persons (e.g., Patty vs Patti), or missing links. Treat differing specific values/dates as contradictions.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (state the missing specific component).\nProduce only the summary_3 string.",
   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Return SUPPORTED only if every essential component of the claim (all named entities, the specific relation/attribute, and all qualifiers such as date/number/geo/genre/superlative/negation) is directly and unambiguously supported by the cited evidence with no unresolved contradictions. For 'not both' claims, it is sufficient to have direct evidence that at least one of the two entities does not satisfy the property (or direct evidence that exactly one does and the other does not).\n- Return NOT_SUPPORTED if any essential component is directly contradicted (e.g., different location/date/number/ranking/genre/role) or a false premise is confirmed (e.g., wrong identity/title). If the claim specifies a particular value/qualifier and evidence provides a different specific value, treat as contradiction (NOT_SUPPORTED). Near-misses (e.g., 1942 vs 1941) are contradictions.\n- Return NOT ENOUGH INFO if any essential component remains unverified or only partially/indirectly supported, or if evidence is silent on a required detail (e.g., missing second conjunct in a 'both' claim, missing release year, missing mascot/credit, missing original artist/genre). For claims linking different timeframes (historical→current), ensure each linked component is evidenced; if any link is missing, return NOT ENOUGH INFO.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."}},
 {'iteration': 1,
  'train_accuracy': np.float64(0.5133333333333333),
  'dev_accuracy': np.float64(0.5933333333333334),
  'prompts': {'create_query_1_prompt': 'Task: Using {claim}, produce the field \'query_1\' as a single, high-recall Wikipedia search query.\nWrite one compact query that maximizes retrieval of the most relevant Wikipedia pages for verifying the claim.\nGuidelines:\n- Decompose the claim into atomic subclaims: list all named entities (people/works/organizations/places), the relation(s) between them, key attributes (occupation, nationality, director/author, label/league, birthplace/HQ, release year/number), and qualifiers (negation, comparison, \'both/either/neither\', timeline \'in 1943\' vs \'currently\', location granularity/district).\n- Include all essential entities (on both sides of each relation) plus the most discriminative attribute terms. Prefer exact/canonical Wikipedia titles in quotes with disambiguators like (film)/(song)/(album)/(TV series)/(novel); for people add role terms (film director/singer/astronaut/footballer) and nationality when relevant.\n- For negation/comparatives/conjunctive claims, include both entities and the contested property (e.g., \'inducted Rock and Roll Hall of Fame\', \'nationality German\', \'mascot\', \'headquarters district\', \'release year 2003\'). For \'not both\' or \'either/or\' structures, keep both entities and the property terms.\n- If the claim hinges on a specific value/year/metric or a geographic qualifier, include it explicitly (e.g., \'as of 2014\', \'born 1961\', \'headquarters northern Geneva\', \'area km2\'). For historical→current chains, include both time markers (e.g., \'1943 footage\' AND \'now 22nd Air Base\').\n- Disambiguate ambiguous names with precise type terms and aliases using OR (e.g., "Patty Jenkins" film director NOT Patti Scialfa; use OR for known aliases). Use AND between core entities/attributes.\nSmall examples:\n- Claim: \'Ulrich Walter and Léopold Eyharts both were not from Germany.\'\n  Query: "Ulrich Walter" astronaut nationality German OR Germany AND "Léopold Eyharts" ESA astronaut nationality French OR France\n- Claim: \'Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.\'\n  Query: "Prince (musician)" Rock and Roll Hall of Fame induction AND "Patty Jenkins" film director NOT Patti Scialfa induction\n- Claim: \'Some of the 1943 footage for Target for Today was filmed at Marienburg; the base now hosts MiG-29 jets developed by Mikoyan.\'\n  Query: "Target for Today" 1943 Marienburg AND "22nd Air Base" Malbork "41st Tactical Squadron" MiG-29 Mikoyan design bureau\nNow produce only the query_1 string.',
   'summarize_1_prompt': 'Task: Using {claim} and {passages_1}, produce the field \'summary_1\' as a concise, evidence-focused synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add outside knowledge. If a needed detail is absent, write \'not stated\' or \'not found\'. Quote exact titles/values/years where present and preserve temporal qualifiers (\'as of 2014\').\n- Use exact Wikipedia titles when citing; put short quoted phrases for critical qualifiers (\'not both\', \'as of\', district names).\nKeep it concise (5–8 bullets) and include only what helps verify the claim or plan retrieval.\nStructure:\n- Claim decomposition: enumerate atomic subclaims with IDs S1, S2, ... (Entity/attribute A, Entity/attribute B, relation, specific value/timeline/geo/negation qualifiers).\n- Evidence bullets: Each bullet \'[Title] — key fact(s) with exact numbers/dates/names/roles\'. If ambiguous or off-topic, label it.\n- Coverage status: for each subclaim ID from decomposition, mark Supported / Refuted / Not found yet. Track temporal alignment (past vs current) explicitly.\n- Discrepancies: flag contradictions (different date/number/location/genre/role) and likely false premises or name confusions (e.g., Patty vs Patti; series vs song title).\n- Gaps/Next-hop targets: list the highest-yield missing or ambiguous items needed next (e.g., \'need: "Ulrich Walter" nationality line\', \'need: Patty Jenkins Hall of Fame status\', \'need: base renaming link Marienburg→22nd Air Base\').\n- Suggested query terms: 1 short line with high-yield titles/terms for the next hop (prefer exact page titles + the missing attribute/value; include type disambiguators like film/TV series/city/astronaut).\nSmall example format:\n- Claim decomposition: S1=Ulrich Walter nationality not German; S2=Léopold Eyharts nationality not German. Qualifier: both.\n- [European Astronaut Corps] — mentions ESA astronauts; \'Ulrich Walter\'/\'Léopold Eyharts\' not stated.\nCoverage: S1 Not found; S2 Not found.\nDiscrepancies: none yet; need person pages.\nGaps/Next-hop targets: need: \'Ulrich Walter\' page nationality; \'Léopold Eyharts\' page nationality.\nSuggested query terms: "Ulrich Walter" astronaut nationality AND "Léopold Eyharts" nationality ESA\nNow produce only the summary_1 string.',
   'create_query_2_prompt': 'Task: Using {claim} and {summary_1}, produce the field \'query_2\'.\nGoal: Target the single highest-value missing or contested element identified in summary_1 to advance verification.\nGuidelines:\n- Read the \'Coverage status\', \'Discrepancies\', and \'Gaps/Next-hop targets\'. Focus your query on the least-supported/most critical component that unlocks a verdict (e.g., a person\'s nationality, an induction status, the specific director/label, the base renaming link, the exact year/number).\n- Include exact page titles already observed plus the complementary attribute (e.g., person + \'nationality/birthplace/astronaut\', work + \'director/cast/label/release year\', organization + \'headquarters address/district\').\n- For names with near-miss confusions, add type terms and exclude common confounders (e.g., "Patty Jenkins" film director NOT Patti Scialfa; \'Hank Osasuna\' ITV2 series).\n- For dates/numbers/metrics, include exact value/units if the claim specifies them.\n- For conjunctive claims (\'both/and\'), query the missing conjunct directly. For \'not both\' claims, querying either entity\'s property can suffice if it discriminates (e.g., show one is not inducted).\n- Keep compact and specific; use AND between core entities; use OR for aliases.\nProduce only the query_2 string.',
   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2} and integrate with prior evidence from summary_1. No outside inferences.\n- Keep 5–8 bullets total; cite titles; quote critical numbers/years/phrases; mark temporal qualifiers.\n- Update coverage explicitly for each subclaim ID from summary_1: Supported/Refuted/Not found. Note temporal alignment (e.g., '1943' vs 'current').\n- Flag discrepancies (timeline, entity mismatch, genre/role mix-ups, different numeric/date/location). If evidence gives a different specific value than the claim, call it out as likely refutation.\n- End with: Gaps/Next-hop targets (the last unresolved pieces) and a concise Suggested query for hop 3 that directly tests the remaining weakest link or suspected false premise.\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components.\n- Coverage update: list subclaim IDs (S1, S2, ...) with status after hops 1–2.\n- Discrepancies/notes: contradictions; name/type confusions (e.g., Patty vs Patti; series title vs song title); temporal misreads.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the last missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target unresolved components listed under 'Gaps/Next-hop targets' in summary_2. Choose the single query most likely to close the case.\n- For conjunctive 'both' claims, verify any remaining conjunct explicitly (e.g., confirm the second person's nationality or induction status). For 'not both' claims, a single decisive refutation (e.g., evidence one entity is not inducted) is sufficient if directly supported.\n- For numeric/date claims, query explicitly for the exact metric with units/value (e.g., 'released 1984', 'rank third highest', 'area km2').\n- For credits/authorship/cast, query the exact work title plus 'director/cast/composer/writer/producer/credits'.\n- For geo qualifiers within cities, query for district/arrondissement/address/coordinates tied to the entity (e.g., 'headquarters northern Geneva district').\n- If a false premise is suspected (wrong identity/role/time), query to confirm/refute that premise directly (e.g., 'Patty Jenkins Rock and Roll Hall of Fame inductee', 'Hank Osasuna ITV2 series title', 'Marienburg now 22nd Air Base').\nProduce only the query_3 string.",
   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, map it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text; cite titles; quote critical values/years/phrases; retain temporal qualifiers.\n- Provide a component-wise support matrix and final evidence sufficiency assessment.\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps or suspected contradictions.\n- Claim mapping:\n  - Entity/attribute A (S1): Supported/Refuted/Not found (by [Title])\n  - Entity/attribute B (S2): Supported/Refuted/Not found (by [Title])\n  - Relation/timeline/number/geo qualifier (S3...): Supported/Refuted/Not found (by [Title])\n- Discrepancies/notes: list contradictions, timeline/number mismatches, wrong entity types/genres, misidentified persons (e.g., Patty vs Patti), or missing links. Treat differing specific values/dates as contradictions.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (state the missing specific component).\nProduce only the summary_3 string.",
   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Return SUPPORTED only if every essential component of the claim (all named entities, the specific relation/attribute, and all qualifiers such as date/number/geo/genre/superlative/negation) is directly and unambiguously supported by the cited evidence with no unresolved contradictions. For 'not both' claims, it is sufficient to have direct evidence that at least one of the two entities does not satisfy the property (or direct evidence that exactly one does and the other does not).\n- Return NOT_SUPPORTED if any essential component is directly contradicted (e.g., different location/date/number/ranking/genre/role) or a false premise is confirmed (e.g., wrong identity/title). If the claim specifies a particular value/qualifier and evidence provides a different specific value, treat as contradiction (NOT_SUPPORTED). Near-misses (e.g., 1942 vs 1941) are contradictions.\n- Return NOT ENOUGH INFO if any essential component remains unverified or only partially/indirectly supported, or if evidence is silent on a required detail (e.g., missing second conjunct in a 'both' claim, missing release year, missing mascot/credit, missing original artist/genre). For claims linking different timeframes (historical→current), ensure each linked component is evidenced; if any link is missing, return NOT ENOUGH INFO.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."},
  'optimized_prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as a single, high-recall Wikipedia search query for the wiki abstracts corpus.\nWrite one compact query that maximizes retrieval of the most relevant Wikipedia pages needed to verify the claim in multi-hop settings.\nMethod (Entity–Slot–Predicate with Linking):\n- Entities: enumerate all named/proxy entities (people, works, orgs, places, events, characters, albums/songs/episodes). Resolve implicit identities where possible by including the referenced work/event and its role term (e.g., “author of X” → X AND author). For ambiguous names, add Wikipedia-style disambiguators (film/TV series/album/song/politician/novel) or key roles.\n- Slots (attributes): extract the exact properties to verify (e.g., cast/starring/voice cast; director/creator/writer/composer/producer; party/nationality; release year/season/month; birthplace/headquarters/district; award/induction; chart peak/roster/track listing/membership; numbers like “nine wins”).\n- Predicate/Relation and Logic: encode relations between entities and slots, plus qualifiers (both/also/only/all/none/neither/not both; exact numbers/dates/geo scopes). For multi-part claims, include all core entities in one query to surface pages that connect them.\nGuidelines for query_1:\n- Use Wikipedia-style titles and parentheticals (film/TV series/album/song/novel/politician/band). Include aliases with OR if helpful; connect core entities and the decisive slot with AND. Keep the query under ~20 content tokens and front-load distinctive titles.\n- Include explicit role keywords for credits/affiliations and list-type terms when relevant: cast OR starring; director OR creator; writer OR composer; track listing; political party OR affiliation; inducted; winners; roster; discography.\n- Include numeric/date/geo markers exactly as claimed: “summer 2008”, “2008–09 season”, “born 1985”, “8th arrondissement”, “nine wins”. Include British/American spelling variants with OR when salient (honour/honor; theatre/theater).\n- Anticipate common confusions and disambiguate with role/type words (e.g., Mercury (planet) vs Mercury (element); Thésée (opera)). You may add one high-value NOT exclusion if it clearly reduces ambiguity.\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “Step Up 2: The Streets is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets AND sequel AND Step Up (film) AND Anne Fletcher director\n- Claim: “The teammate who won the Nation’s Cup with Nico Hülkenberg in Bushy Park also won Le Mans nine times.”\n  Query: Race of Champions 2014 Bushy Park Nation’s Cup winners AND Nico Hülkenberg teammate AND 24 Hours of Le Mans nine wins\n- Claim: “The opera Thésée was composed by an Italian-born British composer as a tragédie en musique.”\n  Query: Thésée (opera) AND composer AND Tragédie en musique AND Jean-Baptiste Lully composer OR Italian-born British composer\nNow produce only the query_1 string.",
   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-focused synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add outside knowledge. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark it as 'contradiction'. If canonical/primary pages lack an expected fact, mark it as 'strong absence on [Title]'.\n- Preserve exact titles/years/values/roles and qualifiers (as of, both, only, first, numbers, month/season markers, districts/arrondissements). Maintain clear entity–role typing (film vs episode; director vs writer; team vs event).\n- Keep logical links explicit: if the claim requires a single shared subject across multiple properties (both/also/and), track that the same entity must satisfy all parts.\nKeep it concise (5–8 bullets) and include only what helps verify the claim or plan retrieval. Avoid irrelevant tangents (e.g., capacities/dimensions) unless required by the claim.\nStructure:\n- Claim decomposition: enumerate atomic subclaims with IDs S1, S2, ... and note whether a shared subject is required.\n- Entity/role map: list resolved titles/persons; note unresolved identities and candidate(s) if present.\n- Evidence bullets: Each bullet '[Title] — key fact(s) with exact names/dates/roles/numbers/phrases'. Mark off-topic/ambiguous items. Note strong absence on canonical pages when relevant.\n- Coverage status: for each S#, mark Supported / Refuted / Not found yet. If a shared subject is required, state Shared-subject status: satisfied / not satisfied / unresolved.\n- Discrepancies: contradictions; wrong entity/role; series-region/film-episode confusions; identity mismatch; strong absence on canonical pages.\n- Gaps/Next-hop targets: name the single most decisive unresolved item(s) to unlock a verdict (identity, specific credit, date/number, membership/induction, track inclusion, party/parent). Prioritize what directly settles the claim; do not propose side details.\n- Suggested query terms: one short line with exact titles + the missing attribute/value/role (add type disambiguators like film/TV series/album/song/opera/politician; include OR aliases if helpful).\nSmall example format:\n- Claim decomposition: S1=Is Step Up 2 a sequel to a film directed by Anne Fletcher? (no shared subject needed); S2=Who directed Step Up 2?\n- Entity/role map: Step Up (film) — director Anne Fletcher; Step Up 2: The Streets — director unresolved.\n- [Step Up (film)] — directed by Anne Fletcher (2006). [Step Up 2: The Streets] — 2008 sequel; director not stated.\nCoverage: S1 Partially supported; S2 Not found yet. Shared-subject: not required.\nDiscrepancies: none.\nGaps/Next-hop targets: confirm director of Step Up 2.\nSuggested query terms: Step Up 2: The Streets director AND sequel to Step Up (film)",
   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value missing or contested element identified in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Discrepancies', and 'Gaps/Next-hop targets'. Choose the one decisive item that, if answered, would most directly resolve the claim (identity resolution; exact cast/credit; release date/season; party/parent/induction; track listing inclusion; exact number/ranking; canonical list confirming/refuting a universal/exclusive like both/only/all/none).\n- Reuse exact page titles and role terms from summary_1. If a strong absence was noted on a canonical page, query that title plus the contested attribute or retrieve an authoritative list page (e.g., List of winners; inductees; roster; discography; episode list) to confirm/refute.\n- For conjunctive claims, ensure the remaining conjunct is verified for the same subject; if identity is unresolved, resolve it first by linking the distinctive event/work with the decisive attribute in one query.\n- Include precise numeric/date/geo qualifiers when relevant (e.g., “nine wins”, “summer 2008”, “8th arrondissement”). Keep the query compact; use AND between core entities and the attribute; use OR for plausible aliases.\nProduce only the query_2 string.",
   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2} and integrate with prior evidence from summary_1. No outside inferences. Keep 5–8 bullets; cite titles; quote critical numbers/years/phrases; retain temporal/geo qualifiers ('as of', 'first', 'only', seasons/months, districts).\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Track whether a single shared subject satisfies all required parts when the claim is conjunctive; add a 'Shared-subject status: satisfied / not satisfied / unresolved' line when applicable.\n- Call out counterexamples and strong absences (canonical/list pages showing the opposite; main pages lacking a claimed credit/affiliation).\n- Treat temporal precision carefully: if the claim asserts a specific month/season and evidence provides only a broader season/year or mismatched span, mark Not found yet and note the mismatch.\n- End with: Gaps/Next-hop targets (the last unresolved piece that would close the case) and a concise Suggested query for hop 3 that directly tests the remaining weakest link, false premise, specific number/date, or needed counterexample.\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components; quote key values.\n- Coverage update: list subclaim IDs (S1, S2, ...) with status after hops 1–2; add Shared-subject status when applicable.\n- Discrepancies/notes: contradictions; role/type/name/series-region mix-ups; temporal/geo mismatches; strong absences.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the last missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component listed under 'Gaps/Next-hop targets' that will settle the claim (e.g., the exact cast/credit, the track listing inclusion, the company parent/customer base, the political party, the precise release month/season/date, the original recording artist, the Hall of Fame induction status, the event winners/roster).\n- For conjunctive 'both/and' or exclusives ('only', 'neither'), verify any remaining conjunct explicitly for the same subject, or retrieve a canonical list/definition page that clearly contradicts the claim.\n- For numeric/date/geo claims, query explicitly for the exact metric/value or fine-grained timeframe (e.g., 'nine wins', 'summer 2008', 'headquarters 8th arrondissement'). Avoid accepting broader spans that do not directly match the claim.\n- For credits/authorship/cast/creator/host/producer, query the exact work title plus the specific credit term ('cast/track listing/director/creator/writer/composer/producer/credits'). For awards/inductions, query the person AND 'inducted' AND the award title; for competitions/rosters, query the event/team page AND 'winners' OR 'roster'.\n- Reuse exact titles and role terms established in earlier summaries to keep the subject aligned.\nProduce only the query_3 string.",
   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, map it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text; cite titles; quote critical values/years/venues/phrases; retain temporal/geo and logical qualifiers. Maintain correct attachment of properties in relational chains (A→B, B→C) and track shared-subject requirements.\n- Provide a component-wise support matrix and final evidence sufficiency assessment. Treat negative/absence evidence explicitly: if canonical or authoritative list pages lack the claimed fact while the counterpart is evidenced, note this as strong counterevidence. One clear counterexample or authoritative list can refute universals/exclusives ('both', 'neither', 'only', 'all').\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps or contradictions.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims and qualifiers)\n  - Shared-subject status (if applicable): satisfied / not satisfied / unresolved\n- Scope/implication check: note if the claim implies exclusivity/causality/universality not borne out; treat mismatched specific values/dates/roles as contradictions; avoid overconstraining links not stated in the claim.\n- Discrepancies/notes: contradictions; timeline/number/venue/geo mismatches (e.g., 'summer 2008' vs '2008–09 season'); wrong entity types/genres; misidentified persons; strong absences on canonical pages.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (state the missing specific component or unresolved identity/date/credit).\nProduce only the summary_3 string.",
   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Return SUPPORTED only if every essential component of the claim (all named/resolved entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/negation/comparison) is directly and unambiguously supported by the cited evidence with no unresolved contradictions, and if any required shared subject across multiple properties is satisfied by the same entity.\n- Return NOT_SUPPORTED if any essential component is directly contradicted (different person/title/role/creator-vs-writer; different location/date/season/venue/number/ranking; misattributed identity; canonical list/roster/credits/inductees pages show the opposite; or if a universal/exclusive/negation claim ('both', 'neither', 'only', 'all', 'both not') is shown false by one clear counterexample). Treat strong absence on canonical pages as counterevidence when the counterpart side is evidenced.\n- Return NOT ENOUGH INFO if any essential component remains unverified or only partially supported, or if evidence is silent on a required detail (e.g., missing the second conjunct in a 'both' claim; missing exact number/venue/district; unresolved identity like the singer/host; timeline precision not met when the claim specifies a month/season but sources only give a broader season/year without explicit alignment).\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."}},
 {'iteration': 2,
  'train_accuracy': np.float64(0.58),
  'dev_accuracy': np.float64(0.54),
,  'prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as a single, high-recall Wikipedia search query for the wiki abstracts corpus.\nWrite one compact query that maximizes retrieval of the most relevant Wikipedia pages needed to verify the claim in multi-hop settings.\nMethod (Entity–Slot–Predicate with Linking):\n- Entities: enumerate all named/proxy entities (people, works, orgs, places, events, characters, albums/songs/episodes). Resolve implicit identities where possible by including the referenced work/event and its role term (e.g., “author of X” → X AND author). For ambiguous names, add Wikipedia-style disambiguators (film/TV series/album/song/politician/novel) or key roles.\n- Slots (attributes): extract the exact properties to verify (e.g., cast/starring/voice cast; director/creator/writer/composer/producer; party/nationality; release year/season/month; birthplace/headquarters/district; award/induction; chart peak/roster/track listing/membership; numbers like “nine wins”).\n- Predicate/Relation and Logic: encode relations between entities and slots, plus qualifiers (both/also/only/all/none/neither/not both; exact numbers/dates/geo scopes). For multi-part claims, include all core entities in one query to surface pages that connect them.\nGuidelines for query_1:\n- Use Wikipedia-style titles and parentheticals (film/TV series/album/song/novel/politician/band). Include aliases with OR if helpful; connect core entities and the decisive slot with AND. Keep the query under ~20 content tokens and front-load distinctive titles.\n- Include explicit role keywords for credits/affiliations and list-type terms when relevant: cast OR starring; director OR creator; writer OR composer; track listing; political party OR affiliation; inducted; winners; roster; discography.\n- Include numeric/date/geo markers exactly as claimed: “summer 2008”, “2008–09 season”, “born 1985”, “8th arrondissement”, “nine wins”. 
Include British/American spelling variants with OR when salient (honour/honor; theatre/theater).\n- Anticipate common confusions and disambiguate with role/type words (e.g., Mercury (planet) vs Mercury (element); Thésée (opera)). You may add one high-value NOT exclusion if it clearly reduces ambiguity.\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “Step Up 2: The Streets is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets AND sequel AND Step Up (film) AND Anne Fletcher director\n- Claim: “The teammate who won the Nation’s Cup with Nico Hülkenberg in Bushy Park also won Le Mans nine times.”\n  Query: Race of Champions 2014 Bushy Park Nation’s Cup winners AND Nico Hülkenberg teammate AND 24 Hours of Le Mans nine wins\n- Claim: “The opera Thésée was composed by an Italian-born British composer as a tragédie en musique.”\n  Query: Thésée (opera) AND composer AND Tragédie en musique AND Jean-Baptiste Lully composer OR Italian-born British composer\nNow produce only the query_1 string.",
,   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-focused synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add outside knowledge. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark it as 'contradiction'. If canonical/primary pages lack an expected fact, mark it as 'strong absence on [Title]'.\n- Preserve exact titles/years/values/roles and qualifiers (as of, both, only, first, numbers, month/season markers, districts/arrondissements). Maintain clear entity–role typing (film vs episode; director vs writer; team vs event).\n- Keep logical links explicit: if the claim requires a single shared subject across multiple properties (both/also/and), track that the same entity must satisfy all parts.\nKeep it concise (5–8 bullets) and include only what helps verify the claim or plan retrieval. Avoid irrelevant tangents (e.g., capacities/dimensions) unless required by the claim.\nStructure:\n- Claim decomposition: enumerate atomic subclaims with IDs S1, S2, ... and note whether a shared subject is required.\n- Entity/role map: list resolved titles/persons; note unresolved identities and candidate(s) if present.\n- Evidence bullets: Each bullet '[Title] — key fact(s) with exact names/dates/roles/numbers/phrases'. Mark off-topic/ambiguous items. Note strong absence on canonical pages when relevant.\n- Coverage status: for each S#, mark Supported / Refuted / Not found yet. If a shared subject is required, state Shared-subject status: satisfied / not satisfied / unresolved.\n- Discrepancies: contradictions; wrong entity/role; series-region/film-episode confusions; identity mismatch; strong absence on canonical pages.\n- Gaps/Next-hop targets: name the single most decisive unresolved item(s) to unlock a verdict (identity, specific credit, date/number, membership/induction, track inclusion, party/parent). 
Prioritize what directly settles the claim; do not propose side details.\n- Suggested query terms: one short line with exact titles + the missing attribute/value/role (add type disambiguators like film/TV series/album/song/opera/politician; include OR aliases if helpful).\nSmall example format:\n- Claim decomposition: S1=Is Step Up 2 a sequel to a film directed by Anne Fletcher? (no shared subject needed); S2=Who directed Step Up 2?\n- Entity/role map: Step Up (film) — director Anne Fletcher; Step Up 2: The Streets — director unresolved.\n- [Step Up (film)] — directed by Anne Fletcher (2006). [Step Up 2: The Streets] — 2008 sequel; director not stated.\nCoverage: S1 Partially supported; S2 Not found yet. Shared-subject: not required.\nDiscrepancies: none.\nGaps/Next-hop targets: confirm director of Step Up 2.\nSuggested query terms: Step Up 2: The Streets director AND sequel to Step Up (film)",
   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value missing or contested element identified in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Discrepancies', and 'Gaps/Next-hop targets'. Choose the one decisive item that, if answered, would most directly resolve the claim (identity resolution; exact cast/credit; release date/season; party/parent/induction; track listing inclusion; exact number/ranking; canonical list confirming/refuting a universal/exclusive like both/only/all/none).\n- Reuse exact page titles and role terms from summary_1. If a strong absence was noted on a canonical page, query that title plus the contested attribute or retrieve an authoritative list page (e.g., List of winners; inductees; roster; discography; episode list) to confirm/refute.\n- For conjunctive claims, ensure the remaining conjunct is verified for the same subject; if identity is unresolved, resolve it first by linking the distinctive event/work with the decisive attribute in one query.\n- Include precise numeric/date/geo qualifiers when relevant (e.g., “nine wins”, “summer 2008”, “8th arrondissement”). Keep the query compact; use AND between core entities and the attribute; use OR for plausible aliases.\nProduce only the query_2 string.",
   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2} and integrate with prior evidence from summary_1. No outside inferences. Keep 5–8 bullets; cite titles; quote critical numbers/years/phrases; retain temporal/geo qualifiers ('as of', 'first', 'only', seasons/months, districts).\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Track whether a single shared subject satisfies all required parts when the claim is conjunctive; add a 'Shared-subject status: satisfied / not satisfied / unresolved' line when applicable.\n- Call out counterexamples and strong absences (canonical/list pages showing the opposite; main pages lacking a claimed credit/affiliation).\n- Treat temporal precision carefully: if the claim asserts a specific month/season and evidence provides only a broader season/year or mismatched span, mark Not found yet and note the mismatch.\n- End with: Gaps/Next-hop targets (the last unresolved piece that would close the case) and a concise Suggested query for hop 3 that directly tests the remaining weakest link, false premise, specific number/date, or needed counterexample.\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components; quote key values.\n- Coverage update: list subclaim IDs (S1, S2, ...) with status after hops 1–2; add Shared-subject status when applicable.\n- Discrepancies/notes: contradictions; role/type/name/series-region mix-ups; temporal/geo mismatches; strong absences.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the last missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component listed under 'Gaps/Next-hop targets' that will settle the claim (e.g., the exact cast/credit, the track listing inclusion, the company parent/customer base, the political party, the precise release month/season/date, the original recording artist, the Hall of Fame induction status, the event winners/roster).\n- For conjunctive 'both/and' or exclusives ('only', 'neither'), verify any remaining conjunct explicitly for the same subject, or retrieve a canonical list/definition page that clearly contradicts the claim.\n- For numeric/date/geo claims, query explicitly for the exact metric/value or fine-grained timeframe (e.g., 'nine wins', 'summer 2008', 'headquarters 8th arrondissement'). Avoid accepting broader spans that do not directly match the claim.\n- For credits/authorship/cast/creator/host/producer, query the exact work title plus the specific credit term ('cast/track listing/director/creator/writer/composer/producer/credits'). For awards/inductions, query the person AND 'inducted' AND the award title; for competitions/rosters, query the event/team page AND 'winners' OR 'roster'.\n- Reuse exact titles and role terms established in earlier summaries to keep the subject aligned.\nProduce only the query_3 string.",
   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, map it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text; cite titles; quote critical values/years/venues/phrases; retain temporal/geo and logical qualifiers. Maintain correct attachment of properties in relational chains (A→B, B→C) and track shared-subject requirements.\n- Provide a component-wise support matrix and final evidence sufficiency assessment. Treat negative/absence evidence explicitly: if canonical or authoritative list pages lack the claimed fact while the counterpart is evidenced, note this as strong counterevidence. One clear counterexample or authoritative list can refute universals/exclusives ('both', 'neither', 'only', 'all').\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps or contradictions.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims and qualifiers)\n  - Shared-subject status (if applicable): satisfied / not satisfied / unresolved\n- Scope/implication check: note if the claim implies exclusivity/causality/universality not borne out; treat mismatched specific values/dates/roles as contradictions; avoid overconstraining links not stated in the claim.\n- Discrepancies/notes: contradictions; timeline/number/venue/geo mismatches (e.g., 'summer 2008' vs '2008–09 season'); wrong entity types/genres; misidentified persons; strong absences on canonical pages.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (state the missing specific component or unresolved identity/date/credit).\nProduce only the summary_3 string.",
   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Return SUPPORTED only if every essential component of the claim (all named/resolved entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/negation/comparison) is directly and unambiguously supported by the cited evidence with no unresolved contradictions, and if any required shared subject across multiple properties is satisfied by the same entity.\n- Return NOT_SUPPORTED if any essential component is directly contradicted (different person/title/role/creator-vs-writer; different location/date/season/venue/number/ranking; misattributed identity; canonical list/roster/credits/inductees pages show the opposite; or if a universal/exclusive/negation claim ('both', 'neither', 'only', 'all', 'both not') is shown false by one clear counterexample). Treat strong absence on canonical pages as counterevidence when the counterpart side is evidenced.\n- Return NOT ENOUGH INFO if any essential component remains unverified or only partially supported, or if evidence is silent on a required detail (e.g., missing the second conjunct in a 'both' claim; missing exact number/venue/district; unresolved identity like the singer/host; timeline precision not met when the claim specifies a month/season but sources only give a broader season/year without explicit alignment).\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."},
  'optimized_prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as one high-recall, disambiguated Wikipedia abstracts search query.\nGoal: Retrieve the single most authoritative page(s) that unlock the core entity(ies), definition(s), or relation(s) needed to verify the claim, especially for multi-hop chains.\nMethod (Parse → Pivot → Disambiguate → Decisive Slot Terms → Authoritative Targets):\n- Parse and type the claim into atomic subclaims and detect structure:\n  • Identity/indirection (e.g., 'writer of [song]', 'author of X’s Y', 'singer of [track]')\n  • Role/attribute (party affiliation, birthplace, cast/credits, members/lineup, inductee status)\n  • Temporal/quantified/comparative (date/month/season, number/ranking, earlier/later)\n  • Conjunction vs simultaneity: If a claim links facts about the same subject across different times (A happened in 1943; B operates there now), do not enforce simultaneity unless explicitly stated.\n- Choose a decisive pivot that resolves the largest uncertainty first:\n  • If the subject is described via a work/role, target the work’s canonical page plus exact credit/list terms (writer/lyricist/cast/track listing/guest appearances) to identify the person. 
Bridge to bios later.\n  • For membership/inductee/roster claims, target canonical list/roster/credits pages ('List of ...', 'Track listing', 'Cast', 'Members', 'Inductees').\n  • For ambiguous names, add profession/type and parenthetical (film/TV series/album/song/politician/band/city/company; U.S./UK; release year).\n- Encode decisive slots that live in leads/infoboxes/lists:\n  • Exact role terms (writer/lyricist/creator/producer/host/members/lineup; 'inducted', 'political party/affiliation')\n  • Precise temporal/metric terms (year/month/season, number/ranking, area km2)\n- Authority prioritization:\n  • Prefer canonical subject pages and authoritative lists/credits/definition pages ('List of ...', 'Inductees', 'Cast', 'Track listing', 'Members', 'Discography/Filmography', 'Awards and nominations').\n- Anti-hallucination/precision guards:\n  • Do not assert new facts; use claim details only as query terms.\n  • When identity is unresolved, do not bias with a guessed name; query the work + role terms first.\n  • For potential name collisions, include profession and/or parenthetical disambiguators.\nGuidelines for query_1:\n- Keep under ~22 content tokens; front-load distinctive titles/entities.\n- Use AND between core entities/slots; OR only for key aliases.\n- Include type/year parentheticals and decisive role/metric terms.\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “The writer of 'Eyes of the Insane' and Marcus Mumford were both artists.”\n  Query: Eyes of the Insane (song) writer lyricist credits AND song\n- Claim: “Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.”\n  Query: Prince (musician) Rock and Roll Hall of Fame induction AND Patty Jenkins (film director) Rock and Roll Hall of Fame inductees list\n- Claim: “Step Up 2 is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets sequel AND Step Up (film) director Anne 
Fletcher\n- Claim: “James Clark McReynolds was a Republican who served as U.S. Attorney General under Wilson.”\n  Query: James Clark McReynolds AND political party affiliation AND United States Attorney General Woodrow Wilson",
   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-grounded synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add knowledge or echo claim details unless present in the passages. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark 'contradiction'. If canonical pages lack an expected fact, mark 'strong absence on [Title]'. If only a single obscure page asserts the key fact, mark 'single-source risk'.\n- Clause/Referent mapping and subject lock:\n  • Decompose the claim into S1, S2, ... and map relative clauses to the correct antecedent (e.g., 'the sequel to the dance film directed by Anne Fletcher' → the original Step Up, not Step Up 2).\n  • State whether the same subject must satisfy multiple parts ('shared-subject requirement').\n  • Confirm that evidence refers to the correct entity (name collisions like Patty Jenkins vs Patti Scialfa; Andrea vs Pierre Casiraghi). If mismatch, mark 'subject mismatch'.\n- Temporal reasoning:\n  • Note whether parts of the claim require simultaneity. If not explicit, treat facts at different times as acceptable; record time qualifiers (year/season/month).\n- Set/authority handling:\n  • For existential set claims (guest/member/inductee), prefer authoritative sets (full cast/track listing/inductees). 
If set not enumerated, mark 'set not enumerated'.\n- Keep it concise (5–8 bullets) with only what helps verify the claim or plan the next retrieval.\nStructure:\n- Claim decomposition: S1, S2, ...; mark existential/universal/exclusive; note simultaneity requirement if any.\n- Entity/role & coreference map: resolved titles/persons/places with type/years; note unresolved or subject mismatch.\n- Evidence bullets: [Title — SourceType] key fact(s) with exact names/dates/roles/numbers/phrases; include strong absences or single-source risk.\n- Coverage status: S# = Supported / Refuted / Not found yet. Shared-subject status: satisfied / not satisfied / unresolved.\n- Discrepancies: contradictions; temporal/geo mismatches; role/type/category confusion; identity collisions.\n- Gaps/Next-hop targets: the single most decisive missing item (identity, exact credit/date/number/definition; authoritative list/credits page; corroboration for single-source risk) to settle the claim.\n- Suggested query terms: one short line with exact titles + the missing attribute/role/value (add type disambiguators; include OR aliases if helpful).",
   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value unresolved or contested element identified in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Discrepancies', and 'Gaps/Next-hop targets'. Pick the one decisive item that would directly settle the claim: identity resolution, exact cast/host/writer credit, list/roster inclusion, release date/month/season, party/affiliation, parent company, founding data, track listing membership, exact number/ranking, comparative metric (area/population/distance), definitional distinction, or authoritative inductee/membership list.\n- Existential set-checks: if the claim is about 'a/the guest/member/principal actor/inductee', retrieve the complete authoritative set (full track listing, principal cast, lineup/members, inductees) rather than testing a single candidate.\n- If identity is unresolved, resolve it first (work title + role). If identity is known, query the authoritative page asserting the needed attribute (subject page + party/affiliation/origin/actor status; or the canonical 'List of ...').\n- Conjunctions across time: If summary_1 indicates the claim links independent facts about the same place/entity at different times, target the unresolved half; do not force simultaneity unless the claim states it.\n- If summary_1 noted a strong absence on a canonical page, query either that page plus the contested attribute as a negative check or the nearest authoritative list to corroborate or contradict it.\n- Encode numeric/date/geo qualifiers exactly ('2008-09 season', 'summer 2008', 'April 2018', 'area km2').\n- Keep the query compact; use AND for core entities and attribute; OR for plausible aliases; include type parentheticals for disambiguation.\nProduce only the query_2 string.",
   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2}; integrate with prior evidence from summary_1 without inventing facts. Keep 5–8 bullets; cite titles; quote key numbers/dates/phrases; maintain temporal/geo qualifiers.\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Track whether a single shared subject must satisfy all parts; add 'Shared-subject status: ...'.\n- Coreference discipline and subject lock: map pronouns/definites to named entities; confirm the evidence refers to the right entity (e.g., Patty vs Patti; Andrea vs Pierre). If unclear, keep 'identity unresolved' or 'subject mismatch'.\n- Enumerate authoritative sets when relevant (full cast/track listing/inductees/members) and intersect with needed attributes (e.g., 'from Cleveland' AND 'actor').\n- Call out counterexamples and strong absences on canonical/list pages. Distinguish role/type (film vs TV; soundtrack vs film; city vs county; director vs producer). Flag 'single-source risk' when applicable.\n- Precision: if the claim specifies month/season/date/number/role/title and evidence differs or is broader, mark Not found yet with 'temporal/metric/role mismatch'.\n- End with: Gaps/Next-hop targets (the last unresolved piece to close the case or corroborate a single-source risk) and a concise Suggested query for hop 3 that directly tests the remaining weakest link (missing identity/date/credit/metric/definition or authoritative counterexample/list).\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components; quote values exactly.\n- Coverage update: S1, S2, ... 
with status after hops 1–2; Shared-subject status when applicable.\n- Discrepancies/notes: contradictions; role/type/name/series-region mix-ups; temporal/geo mismatches; strong absences; non-existent entity cues; single-source risk.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the final missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component under 'Gaps/Next-hop targets' that will conclusively settle the claim: exact cast/host/writer/producer credit, track listing, inductee/membership list, political party/affiliation, precise release month/season/date, original artist/first recording, Hall of Fame induction, event winners/roster, city area km2, distance X–Y, or definitional distinction.\n- For existential claims ('a principal actor/member/guest/inductee'), query the authoritative set page (full principal cast/members/lineup/guest appearances/track listing/inductees) and/or the candidate’s bio with the needed attribute to intersect the sets.\n- For identity/indirection/possessives ('author of X’s Y', 'singer of [song]'), query the work’s page with the exact role term (writer/lyricist/creator) or the author/artist page constrained by year/genre/type.\n- For numeric/date/geo claims, use the exact metric/timeframe ('2008-09 season', 'summer 2008', 'born 1956', 'area km2'). For role/title assertions, include the exact role term and 'never' if testing a negative on an authoritative list.\n- If prior steps suggest misattribution or a non-existent entity, query the nearest authoritative page (disambiguation + type term; main subject + contested attribute; or a relevant 'List of ...') to surface the contradiction. If there is a 'single-source risk', query an independent corroborating authoritative page.\n- Reuse exact titles/role terms established earlier to keep subjects aligned. Keep the query compact and disambiguated.\nProduce only the query_3 string.",
   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, align it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text. Cite titles; quote salient values/years/venues/phrases; preserve temporal/geo/logical qualifiers. Keep property attachment correct across chains (A→B, B→C). Track any shared-subject requirement and confirm subject lock.\n- Provide a component-wise support matrix and a final sufficiency assessment. Treat absence explicitly: when canonical/list pages lack a claimed fact while the counterpart is evidenced, note 'strong counterevidence'. One authoritative counterexample or list can refute universals/exclusives ('both', 'neither', 'only', 'all').\n- Temporal/scope interpretation: Only require simultaneity if the claim explicitly demands it; otherwise allow different timeframes for different conjuncts about the same entity/location.\n- Precision and ambiguity: mismatched specific values/dates/seasons/roles/titles are contradictions for precise claims; distinguish film vs TV vs soundtrack; city vs county vs district; resolve pronouns/possessives/definites to named entities; flag subject mismatches.\n- Reliability: if a pivotal fact relies on a single obscure page, mark 'single-source risk — corroboration missing'.\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps/contradictions.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims/qualifiers)\n  - Shared-subject status: satisfied / not satisfied / unresolved; Subject lock: passed/failed\n- Scope/implication check: simultaneity needed? 
yes/no; note exclusivity/causality/universality implications; ensure referents of relative clauses are correct.\n- Discrepancies/notes: contradictions; timeline/number/venue/geo/role mismatches; wrong entity types/genres; misidentified persons; strong absences on canonical pages; indicators of non-existent entities; single-source risk and whether it was mitigated.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (name the missing specific component or unresolved identity/date/credit/metric/definition, or unmitigated single-source risk).\nProduce only the summary_3 string.",
   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Parse the claim into essential components and resolve clause referents (e.g., 'the sequel to the dance film directed by Anne Fletcher' refers to the original Step Up). Enforce subject lock: evidence must pertain to the exact named/resolved entities.\n- Simultaneity: Only require that multiple facts hold at the same time if the claim explicitly states this (e.g., 'at the time', 'while', 'during'). Otherwise, conjunctive facts about the same entity/location may come from different timeframes.\n- Return SUPPORTED only if every essential component of the claim (all named/resolved entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/negation/comparison/definition) is directly and unambiguously supported by the cited evidence, with no unresolved contradictions, and any required shared subject is satisfied by the same entity.\n- Return NOT_SUPPORTED if any essential component is directly contradicted (different person/title/role; different location/date/season/venue/number/ranking; definitional mismatch; misattributed identity; canonical list/roster/credits/inductees/definition pages show the opposite; a required named entity is shown to be non-existent; or a universal/exclusive/negation claim is shown false by a counterexample). Treat strong absence on canonical/list pages as counterevidence when the claim asserts a specific, checkable fact (e.g., credited host/cast/inductee/captain; explicit release year; membership/induction status) and a relevant authoritative list lacks the item.\n- Return NOT_ENOUGH_INFO if any essential component remains unverified, identity/pronoun/possessive reference is unresolved, evidence is off-subject, or pivotal facts rely on an uncorroborated single obscure page (single-source risk). 
Also return NOT_ENOUGH_INFO when precision demanded by the claim (exact number/date/season/month/venue/district/role/title/definition) is not explicitly available in evidence, even if broader context is present. For definitional/equivalence claims, if the direction of implication is not established, return NOT_ENOUGH_INFO unless the precise classification is explicitly stated.\n- Limited compositional inference is acceptable only when it follows directly from explicit evidence (e.g., composer X scored game Y; page for Y explicitly states it contains element Z → infer X composed for a game containing Z). Do not infer roles/titles/affiliations or months/seasons without explicit statements.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."}},
 {'iteration': 3,
  'train_accuracy': np.float64(0.5466666666666666),
  'dev_accuracy': np.float64(0.55),
  'prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as one high-recall, disambiguated Wikipedia abstracts search query.\nGoal: Retrieve the single most authoritative page(s) that unlock the core entity(ies), definition(s), or relation(s) needed to verify the claim, especially for multi-hop chains.\nMethod (Parse → Pivot → Disambiguate → Decisive Slot Terms → Authoritative Targets):\n- Parse and type the claim into atomic subclaims and detect structure:\n  • Identity/indirection (e.g., 'writer of [song]', 'author of X’s Y', 'singer of [track]')\n  • Role/attribute (party affiliation, birthplace, cast/credits, members/lineup, inductee status)\n  • Temporal/quantified/comparative (date/month/season, number/ranking, earlier/later)\n  • Conjunction vs simultaneity: If a claim links facts about the same subject across different times (A happened in 1943; B operates there now), do not enforce simultaneity unless explicitly stated.\n- Choose a decisive pivot that resolves the largest uncertainty first:\n  • If the subject is described via a work/role, target the work’s canonical page plus exact credit/list terms (writer/lyricist/cast/track listing/guest appearances) to identify the person. 
Bridge to bios later.\n  • For membership/inductee/roster claims, target canonical list/roster/credits pages ('List of ...', 'Track listing', 'Cast', 'Members', 'Inductees').\n  • For ambiguous names, add profession/type and parenthetical (film/TV series/album/song/politician/band/city/company; U.S./UK; release year).\n- Encode decisive slots that live in leads/infoboxes/lists:\n  • Exact role terms (writer/lyricist/creator/producer/host/members/lineup; 'inducted', 'political party/affiliation')\n  • Precise temporal/metric terms (year/month/season, number/ranking, area km2)\n- Authority prioritization:\n  • Prefer canonical subject pages and authoritative lists/credits/definition pages ('List of ...', 'Inductees', 'Cast', 'Track listing', 'Members', 'Discography/Filmography', 'Awards and nominations').\n- Anti-hallucination/precision guards:\n  • Do not assert new facts; use claim details only as query terms.\n  • When identity is unresolved, do not bias with a guessed name; query the work + role terms first.\n  • For potential name collisions, include profession and/or parenthetical disambiguators.\nGuidelines for query_1:\n- Keep under ~22 content tokens; front-load distinctive titles/entities.\n- Use AND between core entities/slots; OR only for key aliases.\n- Include type/year parentheticals and decisive role/metric terms.\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “The writer of 'Eyes of the Insane' and Marcus Mumford were both artists.”\n  Query: Eyes of the Insane (song) writer lyricist credits AND song\n- Claim: “Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.”\n  Query: Prince (musician) Rock and Roll Hall of Fame induction AND Patty Jenkins (film director) Rock and Roll Hall of Fame inductees list\n- Claim: “Step Up 2 is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets sequel AND Step Up (film) director Anne 
Fletcher\n- Claim: “James Clark McReynolds was a Republican who served as U.S. Attorney General under Wilson.”\n  Query: James Clark McReynolds AND political party affiliation AND United States Attorney General Woodrow Wilson",
   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-grounded synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add knowledge or echo claim details unless present in the passages. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark 'contradiction'. If canonical pages lack an expected fact, mark 'strong absence on [Title]'. If only a single obscure page asserts the key fact, mark 'single-source risk'.\n- Clause/Referent mapping and subject lock:\n  • Decompose the claim into S1, S2, ... and map relative clauses to the correct antecedent (e.g., 'the sequel to the dance film directed by Anne Fletcher' → the original Step Up, not Step Up 2).\n  • State whether the same subject must satisfy multiple parts ('shared-subject requirement').\n  • Confirm that evidence refers to the correct entity (name collisions like Patty Jenkins vs Patti Scialfa; Andrea vs Pierre Casiraghi). If mismatch, mark 'subject mismatch'.\n- Temporal reasoning:\n  • Note whether parts of the claim require simultaneity. If not explicit, treat facts at different times as acceptable; record time qualifiers (year/season/month).\n- Set/authority handling:\n  • For existential set claims (guest/member/inductee), prefer authoritative sets (full cast/track listing/inductees). 
If set not enumerated, mark 'set not enumerated'.\n- Keep it concise (5–8 bullets) with only what helps verify the claim or plan the next retrieval.\nStructure:\n- Claim decomposition: S1, S2, ...; mark existential/universal/exclusive; note simultaneity requirement if any.\n- Entity/role & coreference map: resolved titles/persons/places with type/years; note unresolved or subject mismatch.\n- Evidence bullets: [Title — SourceType] key fact(s) with exact names/dates/roles/numbers/phrases; include strong absences or single-source risk.\n- Coverage status: S# = Supported / Refuted / Not found yet. Shared-subject status: satisfied / not satisfied / unresolved.\n- Discrepancies: contradictions; temporal/geo mismatches; role/type/category confusion; identity collisions.\n- Gaps/Next-hop targets: the single most decisive missing item (identity, exact credit/date/number/definition; authoritative list/credits page; corroboration for single-source risk) to settle the claim.\n- Suggested query terms: one short line with exact titles + the missing attribute/role/value (add type disambiguators; include OR aliases if helpful).",
,   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value unresolved or contested element identified in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Discrepancies', and 'Gaps/Next-hop targets'. Pick the one decisive item that would directly settle the claim: identity resolution, exact cast/host/writer credit, list/roster inclusion, release date/month/season, party/affiliation, parent company, founding data, track listing membership, exact number/ranking, comparative metric (area/population/distance), definitional distinction, or authoritative inductee/membership list.\n- Existential set-checks: if the claim is about 'a/the guest/member/principal actor/inductee', retrieve the complete authoritative set (full track listing, principal cast, lineup/members, inductees) rather than testing a single candidate.\n- If identity is unresolved, resolve it first (work title + role). If identity is known, query the authoritative page asserting the needed attribute (subject page + party/affiliation/origin/actor status; or the canonical 'List of ...').\n- Conjunctions across time: If summary_1 indicates the claim links independent facts about the same place/entity at different times, target the unresolved half; do not force simultaneity unless the claim states it.\n- If summary_1 noted a strong absence on a canonical page, query either that page plus the contested attribute as a negative check or the nearest authoritative list to corroborate or contradict it.\n- Encode numeric/date/geo qualifiers exactly ('2008-09 season', 'summer 2008', 'April 2018', 'area km2').\n- Keep the query compact; use AND for core entities and attribute; OR for plausible aliases; include type parentheticals for disambiguation.\nProduce only the query_2 string.",
,   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements:\n- Synthesize only what is present in {passages_2}; integrate with prior evidence from summary_1 without inventing facts. Keep 5–8 bullets; cite titles; quote key numbers/dates/phrases; maintain temporal/geo qualifiers.\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Track whether a single shared subject must satisfy all parts; add 'Shared-subject status: ...'.\n- Coreference discipline and subject lock: map pronouns/definites to named entities; confirm the evidence refers to the right entity (e.g., Patty vs Patti; Andrea vs Pierre). If unclear, keep 'identity unresolved' or 'subject mismatch'.\n- Enumerate authoritative sets when relevant (full cast/track listing/inductees/members) and intersect with needed attributes (e.g., 'from Cleveland' AND 'actor').\n- Call out counterexamples and strong absences on canonical/list pages. Distinguish role/type (film vs TV; soundtrack vs film; city vs county; director vs producer). Flag 'single-source risk' when applicable.\n- Precision: if the claim specifies month/season/date/number/role/title and evidence differs or is broader, mark Not found yet with 'temporal/metric/role mismatch'.\n- End with: Gaps/Next-hop targets (the last unresolved piece to close the case or corroborate a single-source risk) and a concise Suggested query for hop 3 that directly tests the remaining weakest link (missing identity/date/credit/metric/definition or authoritative counterexample/list).\nStructure:\n- Evidence bullets: [Title] — key fact(s) relevant to unresolved components; quote values exactly.\n- Coverage update: S1, S2, ... 
with status after hops 1–2; Shared-subject status when applicable.\n- Discrepancies/notes: contradictions; role/type/name/series-region mix-ups; temporal/geo mismatches; strong absences; non-existent entity cues; single-source risk.\n- Gaps/Next-hop targets\n- Suggested query terms\nProduce only the summary_2 string.",
,   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the final missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component under 'Gaps/Next-hop targets' that will conclusively settle the claim: exact cast/host/writer/producer credit, track listing, inductee/membership list, political party/affiliation, precise release month/season/date, original artist/first recording, Hall of Fame induction, event winners/roster, city area km2, distance X–Y, or definitional distinction.\n- For existential claims ('a principal actor/member/guest/inductee'), query the authoritative set page (full principal cast/members/lineup/guest appearances/track listing/inductees) and/or the candidate’s bio with the needed attribute to intersect the sets.\n- For identity/indirection/possessives ('author of X’s Y', 'singer of [song]'), query the work’s page with the exact role term (writer/lyricist/creator) or the author/artist page constrained by year/genre/type.\n- For numeric/date/geo claims, use the exact metric/timeframe ('2008-09 season', 'summer 2008', 'born 1956', 'area km2'). For role/title assertions, include the exact role term and 'never' if testing a negative on an authoritative list.\n- If prior steps suggest misattribution or a non-existent entity, query the nearest authoritative page (disambiguation + type term; main subject + contested attribute; or a relevant 'List of ...') to surface the contradiction. If there is a 'single-source risk', query an independent corroborating authoritative page.\n- Reuse exact titles/role terms established earlier to keep subjects aligned. Keep the query compact and disambiguated.\nProduce only the query_3 string.",
,   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, align it to each part of the claim, and assess sufficiency for a verdict.\nRequirements:\n- Be concise (6–10 bullets) and strictly grounded in the retrieved text. Cite titles; quote salient values/years/venues/phrases; preserve temporal/geo/logical qualifiers. Keep property attachment correct across chains (A→B, B→C). Track any shared-subject requirement and confirm subject lock.\n- Provide a component-wise support matrix and a final sufficiency assessment. Treat absence explicitly: when canonical/list pages lack a claimed fact while the counterpart is evidenced, note 'strong counterevidence'. One authoritative counterexample or list can refute universals/exclusives ('both', 'neither', 'only', 'all').\n- Temporal/scope interpretation: Only require simultaneity if the claim explicitly demands it; otherwise allow different timeframes for different conjuncts about the same entity/location.\n- Precision and ambiguity: mismatched specific values/dates/seasons/roles/titles are contradictions for precise claims; distinguish film vs TV vs soundtrack; city vs county vs district; resolve pronouns/possessives/definites to named entities; flag subject mismatches.\n- Reliability: if a pivotal fact relies on a single obscure page, mark 'single-source risk — corroboration missing'.\nStructure:\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps/contradictions.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims/qualifiers)\n  - Shared-subject status: satisfied / not satisfied / unresolved; Subject lock: passed/failed\n- Scope/implication check: simultaneity needed? 
yes/no; note exclusivity/causality/universality implications; ensure referents of relative clauses are correct.\n- Discrepancies/notes: contradictions; timeline/number/venue/geo/role mismatches; wrong entity types/genres; misidentified persons; strong absences on canonical pages; indicators of non-existent entities; single-source risk and whether it was mitigated.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (name the missing specific component or unresolved identity/date/credit/metric/definition, or unmitigated single-source risk).\nProduce only the summary_3 string.",
,   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Parse the claim into essential components and resolve clause referents (e.g., 'the sequel to the dance film directed by Anne Fletcher' refers to the original Step Up). Enforce subject lock: evidence must pertain to the exact named/resolved entities.\n- Simultaneity: Only require that multiple facts hold at the same time if the claim explicitly states this (e.g., 'at the time', 'while', 'during'). Otherwise, conjunctive facts about the same entity/location may come from different timeframes.\n- Return SUPPORTED only if every essential component of the claim (all named/resolved entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/negation/comparison/definition) is directly and unambiguously supported by the cited evidence, with no unresolved contradictions, and any required shared subject is satisfied by the same entity.\n- Return NOT_SUPPORTED if any essential component is directly contradicted (different person/title/role; different location/date/season/venue/number/ranking; definitional mismatch; misattributed identity; canonical list/roster/credits/inductees/definition pages show the opposite; a required named entity is shown to be non-existent; or a universal/exclusive/negation claim is shown false by a counterexample). Treat strong absence on canonical/list pages as counterevidence when the claim asserts a specific, checkable fact (e.g., credited host/cast/inductee/captain; explicit release year; membership/induction status) and a relevant authoritative list lacks the item.\n- Return NOT_ENOUGH_INFO if any essential component remains unverified, identity/pronoun/possessive reference is unresolved, evidence is off-subject, or pivotal facts rely on an uncorroborated single obscure page (single-source risk). 
Also return NOT_ENOUGH_INFO when precision demanded by the claim (exact number/date/season/month/venue/district/role/title/definition) is not explicitly available in evidence, even if broader context is present. For definitional/equivalence claims, if the direction of implication is not established, return NOT_ENOUGH_INFO unless the precise classification is explicitly stated.\n- Limited compositional inference is acceptable only when it follows directly from explicit evidence (e.g., composer X scored game Y; page for Y explicitly states it contains element Z → infer X composed for a game containing Z). Do not infer roles/titles/affiliations or months/seasons without explicit statements.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."},
,  'optimized_prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as one high-recall, disambiguated Wikipedia abstracts search query.\nGoal: Retrieve the most authoritative page(s) that unlock the core entity(ies), relation(s), list(s), chronology, or definition(s) needed to verify the claim, with special care for multi-hop chains and clause referents.\nMethod (Decompose → Parse clauses → Lock entities → Pick anchor → Disambiguate → Pin qualifiers → Target authority → Anticipate pitfalls):\n- Decompose the claim into atomic slots (S1, S2, ...): subjects, predicates, and required qualifiers (type, year/season/date, role/title/credit, debut/first/original/last/only/both, list membership, location/affiliation/number), and any shared-subject link.\n- Parse relative/possessive clauses precisely (e.g., “the sequel to the dance film directed by Anne Fletcher” → sequel = Step Up 2; referenced film directed by Anne Fletcher = Step Up (2006)).\n- Lock entities with exact types and canonical titles (film vs TV vs album/song/game; person vs character; award vs production; city vs county; club vs national team). 
If a claimed title may be a concept/award, prefer the definition/award page.\n- Choose the decisive first anchor:\n  • For works/entities named in the claim: use the work’s canonical page (album/film/episode/award/event/venue) or the person’s bio if the claim is about their attributes/roles.\n  • For membership/credits/rosters/inductees: target canonical list/credits pages ('List of ...', 'Track listing', 'Cast', 'Members', 'Inductees', 'Filmography/Discography').\n  • For ordinal/universal claims ('first/last/only/both'): target authoritative result tables, standings, or enumerated lists.\n- Pin qualifiers into the query (time/role/geo/numeric): exact year/season/month/date, role terms (director/host/writer/member/cast/original recording/feature film debut), location or team level (club vs national), numbers/rankings.\n- Authority targeting: prefer canonical subject pages, infobox-bearing bios, official list/credits/definition pages. When negative/exclusive claims are made, favor enumerations or credits that can show inclusion/exclusion or a counterexample.\n- Safety & precision: do not assert unknown facts; only use claim terms and safe disambiguators. 
Avoid drifting to similarly titled entities or wrong media/years.\n- Ambiguity guardrails:\n  • Sports/geography reused names → include state/country or competition level.\n  • Multi-era/place evolution (e.g., bases, franchises) → include timeframe if required; otherwise retrieve the entity’s history page.\n  • Style/definition chains → retrieve both the subject’s broader style and the definition page of the named subset.\nGuidelines for query_1:\n- Keep under ~22 content tokens; front-load exact titles/names; include type/years; use AND between core terms; OR only for high-value aliases; include parentheticals for disambiguation (e.g., '(rapper)').\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “Playing God was released on Brand New Eyes by Paramore.”\n  Query: Playing God (song) AND Brand New Eyes (album) track listing\n- Claim: “Excuse My French features a guest who is an American rapper and actor from Cleveland, Ohio.”\n  Query: Excuse My French (album) AND guest appearances AND Machine Gun Kelly (rapper)\n- Claim: “Step Up 2: The Streets is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets AND Step Up (film) director Anne Fletcher\n- Claim: “Robert O’Brien came last in the men’s K-1 10000 m at the 1956 Olympics.”\n  Query: Canoeing at the 1956 Summer Olympics – Men's K-1 10000 metres final results AND Robert O'Brien (canoeist)\n- Claim: “Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.”\n  Query: List of Rock and Roll Hall of Fame inductees AND Prince (musician) AND Patty Jenkins",
,   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-grounded synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add knowledge beyond the text. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark 'contradiction'. If canonical pages lack an expected fact, mark 'strong absence on [Title]'. If only a single obscure page asserts the key fact, mark 'single-source risk'. If evidence seems about a different entity than the claim, mark 'off-subject evidence'.\nCore structure (tight and actionable; 5–8 bullets):\n- Claim decomposition: list S1, S2, ... (note existential/universal/negation/ordinal); indicate any shared-subject requirement.\n- Clause/subject resolution: map relative/possessive clauses to exact entities (e.g., sequel A → original B directed by C); identify time-scope (simultaneity required? yes/no).\n- Entity/role map (subject lock): enumerate resolved titles/persons/places with type/year and role terms (director vs writer; singer vs original recording; club vs national team). Note unresolved items or subject mismatch.\n- Evidence bullets: [Title — SourceType] quote exact names/dates/roles/numbers/phrases supporting or refuting each S#; include strong absences/contradictions/single-source risk; for lists/tables, note whether enumeration is present.\n- Temporal/metric precision: extract exact dates/seasons/ordinals/capacity/area. For ordinals ('first/last/only') or position-specific claims, highlight any known ordinal vs claim ('temporal/ordinal mismatch') and whether total set size is stated.\n- Sets/authority: for list/membership/credits/result-table claims, state if the set is explicitly enumerated; otherwise, 'set not enumerated'. 
For 'original/first/debut', capture 'original recording'/'first released'/'feature film debut' phrases.\n- Coverage status: S# = Supported / Refuted / Not found yet; Shared-subject: satisfied / not satisfied / unresolved.\n- Gaps/Next-hop target: ONE decisive missing item (identity, exact credit/date/number/definition; authoritative list/credits/results page; corroboration for single-source risk; 'original/first/debut' confirmation; entity existence check) to settle the claim.\n- Suggested query terms: One short line with exact titles + the missing attribute/role/value (add type disambiguators; include OR aliases if helpful).",
,   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value unresolved or contested element in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Clause/subject resolution', and 'Gaps/Next-hop target'. Pick the ONE decisive item that would settle the claim: identity resolution (work → person/place), exact cast/host/writer credit, list/roster/inductee inclusion, release date/month/season, party/affiliation, parent company/ownership ('acquired'/'sold'/'as of'), founding/birth data, track listing membership, original/first/debut evidence, precise number/ranking/area/capacity/distance, definitional classification, or authoritative results table.\n- Ordinal/universal/negative claims ('first/last/only/both/neither/not/never') → query authoritative enumerations: official results tables, inductee lists, full rosters/credits, track listings; include the exact event/season/year.\n- Indirect references: if unresolved (e.g., album guest needs bio attributes), pivot from the work’s page to the person’s bio with precise attributes ('actor', 'from Cleveland, Ohio').\n- Temporal precision: encode exact qualifiers ('current/as of YEAR', event date, '2008–09 season', 'feature film debut 1935').\n- If summary_1 noted strong absence on a canonical page, pivot to the nearest authoritative list/definition/results or corroborating canonical page.\n- Keep compact; use AND between core entities/attributes; include parentheticals for disambiguation; OR only for critical aliases.\nProduce only the query_2 string.",
,   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements (5–8 bullets, evidence-only):\n- Integrate new facts from {passages_2} with prior evidence. Quote key numbers/dates/roles/phrases; maintain temporal/geo/role qualifiers. Do not invent.\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Note if a single shared subject must satisfy multiple parts; set 'Shared-subject status: ...'.\n- Clause/subject and identity lock: confirm evidence refers to the exact entity type (film vs TV vs soundtrack; city vs county; club vs national vs all-star; award vs production). If unclear, keep 'identity unresolved' or 'subject mismatch' and propose a clarifying target.\n- Enumerate authoritative sets when relevant (principal cast/track listing/rosters/inductees/results) and intersect with required attributes. For 'original/first/debut/last', extract explicit 'original recording/first released/feature film debut/last place' evidence with dates and, if needed, total set size.\n- Call out counterexamples and strong absences on canonical/list pages. Distinguish role/type (director vs writer; singer vs songwriter). Flag 'single-source risk' if pivotal.\n- Precision: if the claim specifies a month/season/date/number/role/title/ordinal and evidence differs or is broader, mark Not found yet with 'temporal/metric/role/ordinal mismatch'. For multi-era/location claims, state whether simultaneity is required; if not, allow facts from different eras at the same entity when appropriate.\n- End with: Gaps/Next-hop target (the last unresolved piece to close the case or corroborate a single-source risk) and a concise Suggested query for hop 3 directly testing the remaining weakest link (e.g., missing identity/date/credit/metric/definition; ownership as-of; authoritative list/results; 'first/original/debut/last' confirmation; entity existence check).\nProduce only the summary_2 string.",
,   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the final missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component under 'Gaps/Next-hop target' that conclusively settles the claim: exact cast/host/writer/producer credit, full track listing, inductee/membership list, political party/affiliation, precise release month/season/date, original artist/first recording, feature film debut, event winners/roster, numeric metric (capacity/area/distance), definitional classification, or 'acquired/sold/as of' ownership change.\n- For ordinal/universal/exclusive claims ('first/last/only/both/neither/not/never'), issue a results/list query that enumerates the full set for the exact event/season/timeframe and the subject’s position; include total competitors if 'last' is asserted.\n- For ambiguous or possibly non-existent titles, issue an existence/definition query (disambiguation or award page) or a counterexample query from authoritative lists/credits.\n- Indirection/possessives: use the canonical work/organization page with the exact role/location term OR the resolved entity’s bio with the precise attribute ('member of', 'actor', 'from Cleveland, Ohio', 'feature film debut').\n- For multi-era/location claims that do not require simultaneity, query the entity’s history/unit pages to link eras (e.g., base later operated Mikoyan jets; original footage year elsewhere).\n- Reuse exact titles/role terms established earlier; keep compact and disambiguated; use AND between core terms; OR only for critical aliases.\nProduce only the query_3 string.",
,   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, align it to each part of the claim, and assess sufficiency for a verdict.\nRequirements (6–10 bullets, strictly grounded):\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps/contradictions. Quote key values/years/roles/phrases. Keep property attachment correct across chains (A→B, B→C). Include definitional links when relevant (broader style vs named subset) and results/list enumerations with totals when ordinals matter.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims/qualifiers)\n  - Shared-subject status: satisfied / not satisfied / unresolved; Subject lock: passed/failed\n- Temporal/scope interpretation:\n  • Require simultaneity only if explicit ('at the time', 'during', 'current/as of').\n  • Allow multi-era facts at the same entity if the claim doesn’t require simultaneity (e.g., WWII footage at base; later jets at same base).\n  • Treat season labels (e.g., 2008–09) as not equivalent to calendar months unless explicitly tied; mismatches are contradictions when the claim is specific.\n- Ordinals and universals:\n  • For 'first/last/only/both/neither' or position claims, rely on authoritative enumerations/results. 
If evidence shows a different ordinal or gives a placement (e.g., 11th) without 'last' confirmation and the official list lacks 'last' phrasing for the subject, mark as contradiction or strong absence as appropriate.\n  • A single authoritative list/credits/results page can confirm inclusion/exclusion or refute universals/exclusives.\n- Reliability & ambiguity:\n  • Flag 'single-source risk — corroboration missing' if pivotal facts hinge on one obscure page.\n  • Resolve similarly named entities (film vs soundtrack; city vs county; award vs production). Flag subject mismatches and off-subject evidence.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (name the missing specific component: unresolved identity/date/credit/metric/definition, total competitors for 'last', or unmitigated single-source risk).\nProduce only the summary_3 string.",
,   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Parse the claim into essential components and resolve clause referents precisely. For relative clauses, attach properties to the correct referent (e.g., 'the sequel to the film directed by Anne Fletcher' refers to Step Up 2 being the sequel to Step Up (directed by Anne Fletcher), not that Anne Fletcher directed Step Up 2).\n- Enforce subject lock: evidence must pertain to the exact named/resolved entities (correct type/country/year/series vs soundtrack; club vs national vs all-star team; award vs production; parent vs grandparent). Misattributed relationships (e.g., wrong parent) are NOT_SUPPORTED.\n- Universals/negations/exclusives and ordinals:\n  • 'both', 'neither', 'only', 'all', 'not/never' require authoritative lists/credits/definitions that establish inclusion/exclusion. One credible counterexample refutes a universal/exclusive.\n  • 'first/last/position' requires explicit ordinal evidence from authoritative results/enumerations; if evidence shows a different ordinal (e.g., 11th) or lacks 'last' confirmation while listing placements, treat as NOT_SUPPORTED (strong absence or contradiction on official/results pages).\n- 'Original/first/debut' claims: Require explicit 'original recording'/'first released'/'feature film debut' with dates. Distinguish songwriter vs singer, recording vs release, debut vs early roles. If missing or ambiguous, NOT ENOUGH INFO.\n- Chronology and temporal precision: Do not equate academic/athletic season labels (e.g., 2008–09) with calendar months unless explicit. 
If the claim specifies a particular month/season/year and evidence differs, treat as NOT_SUPPORTED.\n- Style/definition chains: If the claim asserts that a subject was created in a broader style that, under a material variation, is called something else (e.g., 'stop-motion' that when using plasticine is 'clay animation'), SUPPORT only if evidence establishes both the subject’s broader style and the definitional relation; otherwise NOT ENOUGH INFO.\n- Existence/identity checks: If the claim hinges on a possibly non-existent entity-as-title (e.g., an award as a TV show), and authoritative evidence shows it is an award (not a production) or no such titled work exists, return NOT_SUPPORTED.\n- Famous-person attributes and multi-hop patterns: When the claim ties a work’s guest/author/artist to an occupation/attribute (e.g., 'guest is an American rapper and actor from Cleveland'), combine the work’s credits with the person’s bio. If either link is missing, NOT ENOUGH INFO.\n- Multi-era/location claims: If simultaneity is not required, facts from different eras at the same entity can satisfy the claim (e.g., footage at base in 1943; later jets operated there). If simultaneity is required but not met, NOT_SUPPORTED.\n- Conjunctions: Return SUPPORTED only if every essential component (all entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/ordinal/comparison/definition/original/first/debut) is directly and unambiguously supported, with no unresolved contradictions, and any required shared subject is satisfied by the same entity.\n- Disjunctions/partials: Return NOT_SUPPORTED if any essential component is directly contradicted (wrong person/title/role; wrong location/date; definitional mismatch; canonical list/credits shows the opposite; non-existent entity). 
Treat strong absence on canonical/list/results pages as counterevidence for specific, checkable facts (credited host/cast/inductee/captain; explicit release month/season; membership/induction; track listing; feature film debut; original recording; ordinal/last place) when surrounding context is credible.\n- Otherwise, if any essential component remains unverified, identity/pronoun/possessive reference is unresolved, temporal/role/ordinal qualifiers are not explicit, or pivotal facts rely on a single obscure source (single-source risk), return NOT ENOUGH INFO.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."}},
 {'iteration': 4,
  'train_accuracy': np.float64(0.6133333333333333),
  'dev_accuracy': np.float64(0.5633333333333334),
,  'prompts': {'create_query_1_prompt': "Task: Using {claim}, produce the field 'query_1' as one high-recall, disambiguated Wikipedia abstracts search query.\nGoal: Retrieve the most authoritative page(s) that unlock the core entity(ies), relation(s), list(s), chronology, or definition(s) needed to verify the claim, with special care for multi-hop chains and clause referents.\nMethod (Decompose → Parse clauses → Lock entities → Pick anchor → Disambiguate → Pin qualifiers → Target authority → Anticipate pitfalls):\n- Decompose the claim into atomic slots (S1, S2, ...): subjects, predicates, and required qualifiers (type, year/season/date, role/title/credit, debut/first/original/last/only/both, list membership, location/affiliation/number), and any shared-subject link.\n- Parse relative/possessive clauses precisely (e.g., “the sequel to the dance film directed by Anne Fletcher” → sequel = Step Up 2; referenced film directed by Anne Fletcher = Step Up (2006)).\n- Lock entities with exact types and canonical titles (film vs TV vs album/song/game; person vs character; award vs production; city vs county; club vs national team). 
If a claimed title may be a concept/award, prefer the definition/award page.\n- Choose the decisive first anchor:\n  • For works/entities named in the claim: use the work’s canonical page (album/film/episode/award/event/venue) or the person’s bio if the claim is about their attributes/roles.\n  • For membership/credits/rosters/inductees: target canonical list/credits pages ('List of ...', 'Track listing', 'Cast', 'Members', 'Inductees', 'Filmography/Discography').\n  • For ordinal/universal claims ('first/last/only/both'): target authoritative result tables, standings, or enumerated lists.\n- Pin qualifiers into the query (time/role/geo/numeric): exact year/season/month/date, role terms (director/host/writer/member/cast/original recording/feature film debut), location or team level (club vs national), numbers/rankings.\n- Authority targeting: prefer canonical subject pages, infobox-bearing bios, official list/credits/definition pages. When negative/exclusive claims are made, favor enumerations or credits that can show inclusion/exclusion or a counterexample.\n- Safety & precision: do not assert unknown facts; only use claim terms and safe disambiguators. 
Avoid drifting to similarly titled entities or wrong media/years.\n- Ambiguity guardrails:\n  • Sports/geography reused names → include state/country or competition level.\n  • Multi-era/place evolution (e.g., bases, franchises) → include timeframe if required; otherwise retrieve the entity’s history page.\n  • Style/definition chains → retrieve both the subject’s broader style and the definition page of the named subset.\nGuidelines for query_1:\n- Keep under ~22 content tokens; front-load exact titles/names; include type/years; use AND between core terms; OR only for high-value aliases; include parentheticals for disambiguation (e.g., '(rapper)').\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “Playing God was released on Brand New Eyes by Paramore.”\n  Query: Playing God (song) AND Brand New Eyes (album) track listing\n- Claim: “Excuse My French features a guest who is an American rapper and actor from Cleveland, Ohio.”\n  Query: Excuse My French (album) AND guest appearances AND Machine Gun Kelly (rapper)\n- Claim: “Step Up 2: The Streets is the sequel to the dance film directed by Anne Fletcher.”\n  Query: Step Up 2: The Streets AND Step Up (film) director Anne Fletcher\n- Claim: “Robert O’Brien came last in the men’s K-1 10000 m at the 1956 Olympics.”\n  Query: Canoeing at the 1956 Summer Olympics – Men's K-1 10000 metres final results AND Robert O'Brien (canoeist)\n- Claim: “Prince and Patty Jenkins have not both been inducted into the Rock and Roll Hall of Fame.”\n  Query: List of Rock and Roll Hall of Fame inductees AND Prince (musician) AND Patty Jenkins",
,   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-grounded synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add knowledge beyond the text. If a needed detail is absent, write 'not stated' or 'no mention'. If a passage contradicts a detail, mark 'contradiction'. If canonical pages lack an expected fact, mark 'strong absence on [Title]'. If only a single obscure page asserts the key fact, mark 'single-source risk'. If evidence seems about a different entity than the claim, mark 'off-subject evidence'.\nCore structure (tight and actionable; 5–8 bullets):\n- Claim decomposition: list S1, S2, ... (note existential/universal/negation/ordinal); indicate any shared-subject requirement.\n- Clause/subject resolution: map relative/possessive clauses to exact entities (e.g., sequel A → original B directed by C); identify time-scope (simultaneity required? yes/no).\n- Entity/role map (subject lock): enumerate resolved titles/persons/places with type/year and role terms (director vs writer; singer vs original recording; club vs national team). Note unresolved items or subject mismatch.\n- Evidence bullets: [Title — SourceType] quote exact names/dates/roles/numbers/phrases supporting or refuting each S#; include strong absences/contradictions/single-source risk; for lists/tables, note whether enumeration is present.\n- Temporal/metric precision: extract exact dates/seasons/ordinals/capacity/area. For ordinals ('first/last/only') or position-specific claims, highlight any known ordinal vs claim ('temporal/ordinal mismatch') and whether total set size is stated.\n- Sets/authority: for list/membership/credits/result-table claims, state if the set is explicitly enumerated; otherwise, 'set not enumerated'. 
For 'original/first/debut', capture 'original recording'/'first released'/'feature film debut' phrases.\n- Coverage status: S# = Supported / Refuted / Not found yet; Shared-subject: satisfied / not satisfied / unresolved.\n- Gaps/Next-hop target: ONE decisive missing item (identity, exact credit/date/number/definition; authoritative list/credits/results page; corroboration for single-source risk; 'original/first/debut' confirmation; entity existence check) to settle the claim.\n- Suggested query terms: One short line with exact titles + the missing attribute/role/value (add type disambiguators; include OR aliases if helpful).",
,   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value unresolved or contested element in summary_1 to advance verification or refutation.\nGuidelines:\n- Read 'Coverage status', 'Clause/subject resolution', and 'Gaps/Next-hop target'. Pick the ONE decisive item that would settle the claim: identity resolution (work → person/place), exact cast/host/writer credit, list/roster/inductee inclusion, release date/month/season, party/affiliation, parent company/ownership ('acquired'/'sold'/'as of'), founding/birth data, track listing membership, original/first/debut evidence, precise number/ranking/area/capacity/distance, definitional classification, or authoritative results table.\n- Ordinal/universal/negative claims ('first/last/only/both/neither/not/never') → query authoritative enumerations: official results tables, inductee lists, full rosters/credits, track listings; include the exact event/season/year.\n- Indirect references: if unresolved (e.g., album guest needs bio attributes), pivot from the work’s page to the person’s bio with precise attributes ('actor', 'from Cleveland, Ohio').\n- Temporal precision: encode exact qualifiers ('current/as of YEAR', event date, '2008–09 season', 'feature film debut 1935').\n- If summary_1 noted strong absence on a canonical page, pivot to the nearest authoritative list/definition/results or corroborating canonical page.\n- Keep compact; use AND between core entities/attributes; include parentheticals for disambiguation; OR only for critical aliases.\nProduce only the query_2 string.",
,   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements (5–8 bullets, evidence-only):\n- Integrate new facts from {passages_2} with prior evidence. Quote key numbers/dates/roles/phrases; maintain temporal/geo/role qualifiers. Do not invent.\n- Update coverage explicitly for each S# from summary_1: Supported/Refuted/Not found. Note if a single shared subject must satisfy multiple parts; set 'Shared-subject status: ...'.\n- Clause/subject and identity lock: confirm evidence refers to the exact entity type (film vs TV vs soundtrack; city vs county; club vs national vs all-star; award vs production). If unclear, keep 'identity unresolved' or 'subject mismatch' and propose a clarifying target.\n- Enumerate authoritative sets when relevant (principal cast/track listing/rosters/inductees/results) and intersect with required attributes. For 'original/first/debut/last', extract explicit 'original recording/first released/feature film debut/last place' evidence with dates and, if needed, total set size.\n- Call out counterexamples and strong absences on canonical/list pages. Distinguish role/type (director vs writer; singer vs songwriter). Flag 'single-source risk' if pivotal.\n- Precision: if the claim specifies a month/season/date/number/role/title/ordinal and evidence differs or is broader, mark Not found yet with 'temporal/metric/role/ordinal mismatch'. For multi-era/location claims, state whether simultaneity is required; if not, allow facts from different eras at the same entity when appropriate.\n- End with: Gaps/Next-hop target (the last unresolved piece to close the case or corroborate a single-source risk) and a concise Suggested query for hop 3 directly testing the remaining weakest link (e.g., missing identity/date/credit/metric/definition; ownership as-of; authoritative list/results; 'first/original/debut/last' confirmation; entity existence check).\nProduce only the summary_2 string.",
,   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the final missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component under 'Gaps/Next-hop target' that conclusively settles the claim: exact cast/host/writer/producer credit, full track listing, inductee/membership list, political party/affiliation, precise release month/season/date, original artist/first recording, feature film debut, event winners/roster, numeric metric (capacity/area/distance), definitional classification, or 'acquired/sold/as of' ownership change.\n- For ordinal/universal/exclusive claims ('first/last/only/both/neither/not/never'), issue a results/list query that enumerates the full set for the exact event/season/timeframe and the subject’s position; include total competitors if 'last' is asserted.\n- For ambiguous or possibly non-existent titles, issue an existence/definition query (disambiguation or award page) or a counterexample query from authoritative lists/credits.\n- Indirection/possessives: use the canonical work/organization page with the exact role/location term OR the resolved entity’s bio with the precise attribute ('member of', 'actor', 'from Cleveland, Ohio', 'feature film debut').\n- For multi-era/location claims that do not require simultaneity, query the entity’s history/unit pages to link eras (e.g., base later operated Mikoyan jets; original footage year elsewhere).\n- Reuse exact titles/role terms established earlier; keep compact and disambiguated; use AND between core terms; OR only for critical aliases.\nProduce only the query_3 string.",
,   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, align it to each part of the claim, and assess sufficiency for a verdict.\nRequirements (6–10 bullets, strictly grounded):\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing the last gaps/contradictions. Quote key values/years/roles/phrases. Keep property attachment correct across chains (A→B, B→C). Include definitional links when relevant (broader style vs named subset) and results/list enumerations with totals when ordinals matter.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... (continue for all subclaims/qualifiers)\n  - Shared-subject status: satisfied / not satisfied / unresolved; Subject lock: passed/failed\n- Temporal/scope interpretation:\n  • Require simultaneity only if explicit ('at the time', 'during', 'current/as of').\n  • Allow multi-era facts at the same entity if the claim doesn’t require simultaneity (e.g., WWII footage at base; later jets at same base).\n  • Treat season labels (e.g., 2008–09) as not equivalent to calendar months unless explicitly tied; mismatches are contradictions when the claim is specific.\n- Ordinals and universals:\n  • For 'first/last/only/both/neither' or position claims, rely on authoritative enumerations/results. 
If evidence shows a different ordinal or gives a placement (e.g., 11th) without 'last' confirmation and the official list lacks 'last' phrasing for the subject, mark as contradiction or strong absence as appropriate.\n  • A single authoritative list/credits/results page can confirm inclusion/exclusion or refute universals/exclusives.\n- Reliability & ambiguity:\n  • Flag 'single-source risk — corroboration missing' if pivotal facts hinge on one obscure page.\n  • Resolve similarly named entities (film vs soundtrack; city vs county; award vs production). Flag subject mismatches and off-subject evidence.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (name the missing specific component: unresolved identity/date/credit/metric/definition, total competitors for 'last', or unmitigated single-source risk).\nProduce only the summary_3 string.",
,   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Parse the claim into essential components and resolve clause referents precisely. For relative clauses, attach properties to the correct referent (e.g., 'the sequel to the film directed by Anne Fletcher' refers to Step Up 2 being the sequel to Step Up (directed by Anne Fletcher), not that Anne Fletcher directed Step Up 2).\n- Enforce subject lock: evidence must pertain to the exact named/resolved entities (correct type/country/year/series vs soundtrack; club vs national vs all-star team; award vs production; parent vs grandparent). Misattributed relationships (e.g., wrong parent) are NOT_SUPPORTED.\n- Universals/negations/exclusives and ordinals:\n  • 'both', 'neither', 'only', 'all', 'not/never' require authoritative lists/credits/definitions that establish inclusion/exclusion. One credible counterexample refutes a universal/exclusive.\n  • 'first/last/position' requires explicit ordinal evidence from authoritative results/enumerations; if evidence shows a different ordinal (e.g., 11th) or lacks 'last' confirmation while listing placements, treat as NOT_SUPPORTED (strong absence or contradiction on official/results pages).\n- 'Original/first/debut' claims: Require explicit 'original recording'/'first released'/'feature film debut' with dates. Distinguish songwriter vs singer, recording vs release, debut vs early roles. If missing or ambiguous, NOT ENOUGH INFO.\n- Chronology and temporal precision: Do not equate academic/athletic season labels (e.g., 2008–09) with calendar months unless explicit. 
If the claim specifies a particular month/season/year and evidence differs, treat as NOT_SUPPORTED.\n- Style/definition chains: If the claim asserts that a subject was created in a broader style that, under a material variation, is called something else (e.g., 'stop-motion' that when using plasticine is 'clay animation'), SUPPORT only if evidence establishes both the subject’s broader style and the definitional relation; otherwise NOT ENOUGH INFO.\n- Existence/identity checks: If the claim hinges on a possibly non-existent entity-as-title (e.g., an award as a TV show), and authoritative evidence shows it is an award (not a production) or no such titled work exists, return NOT_SUPPORTED.\n- Famous-person attributes and multi-hop patterns: When the claim ties a work’s guest/author/artist to an occupation/attribute (e.g., 'guest is an American rapper and actor from Cleveland'), combine the work’s credits with the person’s bio. If either link is missing, NOT ENOUGH INFO.\n- Multi-era/location claims: If simultaneity is not required, facts from different eras at the same entity can satisfy the claim (e.g., footage at base in 1943; later jets operated there). If simultaneity is required but not met, NOT_SUPPORTED.\n- Conjunctions: Return SUPPORTED only if every essential component (all entities, the specific relation/attribute, and all qualifiers such as date/season/month/number/venue/geo/genre/role/superlative/universality/exclusivity/ordinal/comparison/definition/original/first/debut) is directly and unambiguously supported, with no unresolved contradictions, and any required shared subject is satisfied by the same entity.\n- Disjunctions/partials: Return NOT_SUPPORTED if any essential component is directly contradicted (wrong person/title/role; wrong location/date; definitional mismatch; canonical list/credits shows the opposite; non-existent entity). 
Treat strong absence on canonical/list/results pages as counterevidence for specific, checkable facts (credited host/cast/inductee/captain; explicit release month/season; membership/induction; track listing; feature film debut; original recording; ordinal/last place) when surrounding context is credible.\n- Otherwise, if any essential component remains unverified, identity/pronoun/possessive reference is unresolved, temporal/role/ordinal qualifiers are not explicit, or pivotal facts rely on a single obscure source (single-source risk), return NOT ENOUGH INFO.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."},
,  'optimized_prompts': {'create_query_1_prompt': 'Task: Using {claim}, produce the field \'query_1\' as one high-recall, disambiguated Wikipedia abstracts search query.\nGoal: Retrieve the most authoritative page(s) that unlock the exact entities, relations, lists, ordinals, definitions, or metrics needed to verify the claim, with special care for multi-hop links, entity ambiguity, quantifiers (existential vs universal), ordinals/comparatives, dates/eras, soundtrack/roster/list inclusions, parent-vs-subsidiary distinctions, character-vs-actor vs work-vs-soundtrack, and location/metonymy pitfalls (infrastructure vs place).\nMethod (Decompose → Resolve entities/aliases → Identify quantifier/logic → Choose evidence form → Encode qualifiers → Disambiguate → Draft one query):\n- Decompose the claim into atomic slots S1, S2, ...: entities (people/works/teams/places), relations (acted in/first recorded by/member of/owned by/located in), qualifiers (date/season/era, location, membership/list inclusion, number/ranking/ordinal, metric, party/affiliation), and shared-subject or chain links (A→B→C requirements). 
Explicitly list all conjuncts that must be jointly true.\n- Resolve entities & aliases precisely:\n  • Distinguish type and naming traps (film vs TV series/episode/soundtrack; person vs character; venue vs event; club vs national team; parent vs subsidiary; city vs county vs island; brand vs product; infrastructure vs geographic area).\n  • For kinship or indirection (e.g., "son of X"), resolve to canonical names and targeted bios; include alias or variants if uncertain.\n  • For ambiguous \'from\' or nationality/origin, include both nationality and birthplace keywords when needed.\n- Identify quantifier/logic type: existential (\'a/an/any\'), conjunctive (AND across S#), negation/universal (\'both/neither/only/all/not/never\'), ordinal (\'first/last/only/position\'), comparative (greater/less/older/area/capacity), definitional/classificatory, temporal scope (as of/era consistency).\n- Choose target evidence form:\n  • Credits/rosters/track listings/inductees/soundtracks → authoritative list on the work/team/event page or specific list pages.\n  • Ordinals/results → pages with full enumerations/tables for the exact event/year/season.\n  • Comparatives → pages giving explicit numeric values for BOTH comparands + the metric keyword (area, distance, capacity, population, runtime, box office, height, MW).\n  • Negations/universals → authoritative lists to establish inclusion/exclusion or a counterexample.\n  • Biographical attributes (party affiliation, activism, birthdate, nationality/place of birth) → the person’s biography page.\n  • Anachronism checks → era-correct organizational name (e.g., United States Army Air Forces vs United States Air Force during WWII).\n- Encode critical qualifiers and scope (year/season/date, venue/location, exact role/credit such as host/writer/founder/owner/feature/cast list, soundtrack/track listing, list of foreign/import players, sibling lists, official results pages; for organizations, parent vs subsidiary). 
Use exact title parentheticals when relevant.\n- Disambiguate & alias: include parentheticals (e.g., (film), (U.S. TV series), (basketball), (1999 film)) and high-value aliases/OR variants; resolve pronouns.\n- Anticipate traps: similarly named but wrong entities; wrong medium; wrong era naming; list vs item; actor vs character; solo vs band; company vs brand; infrastructure vs place (e.g., cable car vs the highland it serves).\nGuidelines for query_1:\n- Keep under ~22 content tokens; front-load exact titles/names; use AND between core terms; OR only for critical aliases; include disambiguating parentheticals and needed year/metric/role keywords. Prefer canonical page titles.\n- Output discipline: produce only the query_1 string; no explanations.\nMicro examples:\n- Claim: “A Kathy Mattea #1 song was first recorded by a country singer.”\n  Query: Come from the Heart AND Don Williams original recording AND country singer\n- Claim: “Tim Whelan directed a 1935 film which was the feature debut of the actor born May 20, 1908.”\n  Query: The Murder Man (1935 film) AND James Stewart birthdate May 20 1908\n- Claim: “Wolf Parade is a Christian rock band and Brent Liles was their bassist.”\n  Query: Wolf Parade (band) members list AND genre AND Brent Liles\n- Claim: “Greatest Hits Volume 1 (Beatles) was released in 1984 in Australia/NZ.”\n  Query: Greatest Hits Volume 1 (Beatles album) release year 1984 Australia New Zealand\n- Claim: “James C. 
McReynolds was a Republican and served on the Court under Coolidge.”\n  Query: James Clark McReynolds party affiliation AND Supreme Court tenure Coolidge\n- Claim: “The Seattle Fighter Wing was a United States Air Force unit during 1942–45.”\n  Query: Seattle Fighter Wing WWII designation AND United States Army Air Forces\n- Claim: “A sequel to a German romantic fantasy film featuring Reinhard Hauff and Solveig Dommartin is Faraway, So Close!”\n  Query: Wings of Desire cast list AND Reinhard Hauff AND Solveig Dommartin AND Faraway, So Close! sequel\n- Claim: “Andrea Casiraghi is seventh in line to Monaco’s throne.”\n  Query: Andrea Casiraghi line of succession Monaco seventh\n- Claim: “Ulrich Walter and Léopold Eyharts both were not from Germany.”\n  Query: Ulrich Walter nationality AND Léopold Eyharts nationality\n- Claim: “Robert O’Brien came last in the K-1 10000 m at the 1956 Olympics.”\n  Query: 1956 Summer Olympics canoeing K-1 10000 m full results list last place',
,   'summarize_1_prompt': "Task: Using {claim} and {passages_1}, produce the field 'summary_1' as a concise, evidence-grounded synthesis to guide the next hop.\nOutput discipline:\n- Be strictly grounded in {passages_1}. Do not add knowledge beyond the text. If a needed detail is absent, write 'not stated'. If a passage contradicts a detail, mark 'contradiction'. If canonical pages lack an expected fact, mark 'strong absence on [Title]'. If evidence seems about a different entity, mark 'off-subject evidence'. If phrasing in the claim seems ill-formed (e.g., 'director of an Emmy Award', 'rule' vs 'role'), mark 'ill-formed phrase'. If an era-specific organizational name appears wrong (e.g., USAF for WWII), mark 'anachronism risk'. If a term may be metonymic (infrastructure vs place), flag 'metonymy risk'.\nCore structure (6–9 bullets, tight/actionable):\n- Claim logic & decomposition: state quantifier/logic type (existential/conjunctive/negation/universal/ordinal/comparative/definitional) and list S1, S2, ...; name any shared-subject links or chains (A→B→C).\n- Subject lock and entity/type match check: confirm retrieved pages match the claim’s exact subject(s) and types (film vs TV/episode/soundtrack; person vs character; venue vs event; club vs national; era-correct organization; location vs infrastructure). Note exact mismatches and needed corrections.\n- Entity coverage checklist: list each essential entity/title and mark Covered / Missing. 
Example: A (Covered), B (Missing), Year/Season (Missing), Metric (Missing), Party/Affiliation (Missing), List/Track/Roster (Missing).\n- Evidence bullets: [Title — SourceType] quote exact names/dates/roles/numbers/phrases tied to S#; include strong absences/contradictions/identity risks; keep property attachment correct (A→B, B→C; do not infer A→C unless stated).\n- Ordinal/universal/comparative precision: extract explicit ordinals/totals or BOTH metric values when relevant; mark 'metric missing' if either side is absent.\n- Coverage status: S# = Supported / Refuted / Not found yet; Shared-subject = satisfied / unresolved; Subject lock: passed/failed.\n- Gaps/Next-hop target: ONE decisive missing or contested item (identity resolution; second conjunct; exact credit/host/writer/founder; list/roster/inductee; soundtrack track listing; event winner/date; numeric metric for both sides; party/affiliation; definition/existence; broader context; birthplace vs nationality).\n- Suggested query terms: One short line with exact titles + the missing attribute/role/value (include OR aliases if helpful).",
,   'create_query_2_prompt': "Task: Using {claim} and {summary_1}, produce the field 'query_2'.\nGoal: Target the single highest-value unresolved or contested element in summary_1 to advance verification/refutation.\nGuidelines:\n- Read 'Subject lock and entity/type match check', 'Entity coverage checklist', 'Coverage status', and 'Gaps/Next-hop target'. If the subject is mismatched (wrong work/show/person/event, character vs actor, parent vs subsidiary, wrong era naming, infrastructure vs place), correct it with precise disambiguation (e.g., (film), (U.S. TV series), (basketball), 'United States Army Air Forces', 'Ngong Ping (highland)' vs 'Ngong Ping 360').\n- Choose ONE decisive item:\n  • Identity/alias resolution; the missing conjunct entity; exact credit/host/writer/founder; list/roster/inductee; soundtrack/track listing with exact song; event winner/placement with year/venue; numeric metric for BOTH comparands; party affiliation/activism; definitional existence when claim seems ill-formed; birthplace vs nationality disambiguation.\n- For negations/universals/ordinals ('both/neither/only/all/not/never/first/last/only/position'), query authoritative lists/results for the exact timeframe.\n- For comparisons/classifications, seek explicit metric or definition pages; include the metric/category keywords for both items.\n- If an essential entity page is Missing, pivot to that entity’s canonical page or authoritative list (e.g., 'cast list', 'discography', 'roster', 'results by year', 'list of inductees').\n- Keep compact; use AND between core entities/attributes; include disambiguating parentheticals and year/venue; OR only for critical aliases.\nProduce only the query_2 string.",
,   'summarize_2_prompt': "Task: Using {claim}, {summary_1}, and {passages_2}, produce the field 'summary_2'.\nRequirements (6–9 bullets, evidence-only):\n- Integrate new facts from {passages_2} with prior evidence. Quote key numbers/dates/roles/phrases; retain temporal/geo/role qualifiers; do not invent.\n- Update the Subject lock and Entity/type/Era/Location identity: confirm exact entity type (film vs TV/episode/soundtrack; city vs county vs island; club vs national; parent vs subsidiary; character vs actor; infrastructure vs place; era-correct organization like USAAF vs USAF). If unclear, mark 'identity unresolved' or 'subject mismatch' and propose a clarifying target.\n- Update the Entity coverage checklist explicitly: mark newly Covered vs still Missing (entity titles, lists/rosters/track listings, year/season, metric values, party/affiliation, definition pages).\n- Update coverage for each S# from summary_1: Supported/Refuted/Not found. State Shared-subject status; if a subject mismatch from hop 1 is now corrected, say 'subject corrected to [Title]' and update Subject lock: passed/failed. Call out 'anachronism resolved' or 'metonymy resolved' if fixed.\n- Enumerations/sets: when relevant (cast/credits/rosters/inductees/results/soundtracks), note whether a full list is present and intersect it with required attributes. For 'first/last/only', extract explicit ordinals and totals.\n- Comparatives/classifications: extract explicit metric values for BOTH comparands or state definition criteria; if a metric for either side is missing, mark 'metric missing — cannot compare yet'.\n- Counterevidence & strong absences: call out counterexamples and strong absences on canonical/list pages. 
Flag 'single-source risk' if pivotal.\n- End with: Gaps/Next-hop target (the last unresolved piece or corroboration), plus a concise Suggested query for hop 3 that directly tests the remaining weakest link (identity/date/credit/metric/definition; broader-context confirmation; authoritative list/results; alias/kinship list; activism/party; imports/foreign-player roster; birthplace vs nationality).\nProduce only the summary_2 string.",
,   'create_query_3_prompt': "Task: Using {claim}, {summary_1}, and {summary_2}, produce the field 'query_3'.\nGoal: Retrieve the final missing piece(s) needed for a verdict or directly confirm/refute a suspected false premise.\nGuidelines:\n- Target the single unresolved component under 'Gaps/Next-hop target' that conclusively settles the claim: missing conjunct entity, exact credit/roster/inductee, soundtrack track listing, precise date/month/season, original artist/first recording, feature film debut, official results/placements with totals, numeric metric for BOTH comparands, definitional classification, political party/activism, ownership/rename/legal-form change, or birthplace vs nationality.\n- For negations/universals/ordinals ('both/neither/only/all/not/never/first/last/only/position'), issue a list/results query that enumerates the full set for the exact event/timeframe; include totals if 'first/last' is asserted.\n- For scope-narrowing or metonymy risks, query the authoritative broader context page (overall series/event/organization/place vs infrastructure) to ensure the claim does not misrepresent scope.\n- For ambiguous or possibly non-existent titles/phrases, issue a definition/existence or disambiguation query, or pivot to the authoritative entity bio/company page. For kinship/indirect references, query '[X] siblings' or the sibling’s bio directly. For era naming risks, query the era-correct organizational term.\n- Reuse exact titles/role terms established earlier; keep compact, disambiguated; use AND between core terms; OR only for critical aliases.\nProduce only the query_3 string.",
,   'summarize_3_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {passages_3}, produce the field 'summary_3'.\nGoal: Consolidate all evidence, align it to each part of the claim, and assess sufficiency for a verdict.\nRequirements (7–10 bullets, strictly grounded):\n- Evidence bullets: [Title] — salient fact(s) from hop 3 (plus crucial confirmations from earlier hops) addressing remaining gaps/contradictions. Quote key values/years/roles/phrases. Keep property attachment correct across chains (A→B, B→C). Include definitional links, ownership/legal-form changes with dates, official results/enumerations, and soundtrack/track listings when ordinals/universals/list inclusions matter. Note 'anachronism confirmed/resolved' or 'metonymy confirmed/resolved' where relevant.\n- Entity coverage checklist (final): mark all essential entities/lists/metrics/attributes (e.g., party/activism; birthplace vs nationality) as Covered or Missing. Note any 'strong absence on [Title]' that bears on the claim.\n- Claim mapping:\n  - S1: Supported/Refuted/Not found (by [Title])\n  - S2: Supported/Refuted/Not found (by [Title])\n  - S3... 
(continue)\n  - Shared-subject status: satisfied / not satisfied / unresolved; Subject lock: passed/failed (explicitly note parent vs subsidiary, character vs actor, infrastructure vs place, and era-correct organization if relevant)\n- Temporal/scope interpretation:\n  • Require simultaneity only if explicit ('at the time', 'during', 'current/as of'); otherwise, sequential membership at the same entity can satisfy the claim.\n  • Treat season labels vs calendar months carefully; mismatches are contradictions when the claim is specific.\n  • Era naming must be correct (e.g., WWII units under USAAF, not USAF); treat mismatches as material omissions if the claim asserts the wrong organization.\n  • Resolve location/metonymy: if a claim references an infrastructure name but intends the place it serves (or vice versa), state the interpretation adopted and whether the evidence supports it.\n- Ordinals/universals/comparatives/classifications:\n  • For 'first/last/only/both/neither/all/not/never' or position claims, rely on authoritative enumerations/results. Strong absence on canonical pages counts against the claim.\n  • For comparisons, require explicit numeric values for both sides; if a required metric is missing, mark 'insufficient metric evidence'.\n  • For classification (e.g., hydrogen-only vs bi-fuel vs flex-fuel), apply definition criteria explicitly.\n- Reliability & ambiguity:\n  • Flag 'single-source risk — corroboration missing' if pivotal facts hinge on one obscure page.\n  • Resolve similarly named entities (series vs soundtrack; person vs character; venue vs event; sequel vs original) and parent vs subsidiary ownership. 
Flag subject mismatches and off-subject evidence.\n- Misleading/omission check:\n  • If the claim narrows scope in a way that materially misrepresents a broader, well-established scope (e.g., attributing an event outcome to a venue; crediting a parent for a subsidiary; wrong organizational era; infrastructure vs place confusion), note 'material omission' and treat as contradiction where appropriate.\n- Evidence sufficiency: Sufficient for verdict / Insufficient (name the missing specific component: unresolved identity/date/credit/metric/definition, totals for 'last/first', party-vs-activism verification, parent-vs-subsidiary or character-vs-actor ambiguity, location/metonymy unresolved, or unmitigated single-source risk).\nProduce only the summary_3 string.",
,   'final_answer_prompt': "Task: Using {claim}, {summary_1}, {summary_2}, and {summary_3}, return the final label only.\nDecision rules (apply strictly):\n- Parse the claim into essential components; resolve clause referents and pronouns precisely. Attach properties to the correct referent (e.g., sequel vs original; daughter-of; 'the man X prosecuted'; venue vs event; parent vs subsidiary/local channel; character vs actor; soundtrack vs film/series; hydrogen-only vs bi-fuel vs flex-fuel; infrastructure vs place; era-correct organization name).\n- Subject lock: evidence must pertain to the exact named/resolved entities (correct type/year/country/series vs soundtrack; club vs national vs all-star; award vs production; parent vs subsidiary; resolved kinship; era-appropriate naming; location vs infrastructure). Misattributed relationships or wrong-era organizations are NOT_SUPPORTED.\n- Conjunctions: SUPPORTED only if every essential component (entities + relations + qualifiers like date/season/month/number/venue/geo/genre/role/superlative/universality/ordinal/comparison/definition/original/first/debut/party/activism/same nationality/era/birthplace-vs-nationality interpretation) is directly and unambiguously supported. If any essential part is Refuted, return NOT_SUPPORTED; if any essential part remains unverified or identity is unresolved, return NOT ENOUGH INFO.\n- Universals/negations/ordinals:\n  • 'both', 'neither', 'only', 'all', 'not/never', 'first/last/only/position' require authoritative lists/credits/results/definitions to establish inclusion/exclusion or exact order. One credible counterexample refutes a universal/exclusive.\n- Comparatives and metrics: Require explicit, numeric metric evidence for both comparands from authoritative pages (e.g., area, distance, capacity, height, MW). If a needed metric is missing, return NOT ENOUGH INFO. 
If metrics contradict the claim, return NOT_SUPPORTED.\n- Lists/rosters/soundtracks: Require explicit inclusion on an authoritative list (e.g., track listing/roster/credits/imports). Strong absence on the canonical page counts against the claim.\n- Classification/definition checks: Apply definitional criteria (e.g., FFV vs bi-fuel vs hydrogen-only). If the claim overgeneralizes or misclassifies, and definitions contradict, return NOT_SUPPORTED; if classification cannot be resolved, NOT ENOUGH INFO.\n- Existence/identity checks and kinship resolution: If the claim hinges on a possibly non-existent or mis-typed entity, or an ill-formed phrase whose intended entity cannot be resolved, and authoritative evidence shows a different type or no such title, return NOT_SUPPORTED. If identity (e.g., 'the man born on [date]') remains unresolved, return NOT ENOUGH INFO.\n- Multi-era/location claims: If simultaneity is not required by wording, sequential facts at the same entity can suffice; if simultaneity is required but not met, NOT_SUPPORTED. 
For era naming, if the claim asserts the wrong organization for the timeframe (e.g., USAF in WWII), treat as NOT_SUPPORTED (material omission).\n- Ambiguity resolution:\n  • If wording like 'from' could mean nationality vs birthplace, use the interpretation supported by the evidence chain; if unresolved, NOT ENOUGH INFO.\n  • If a term plausibly refers to a place via an infrastructure name (metonymy), adopt the interpretation best grounded in evidence; if the claim’s literal reading is contradicted but a location-based reading is supported, treat the literal as NOT_SUPPORTED unless the claim clearly intends the location.\n- Misleading/omission rule: If authoritative evidence shows the true scope is materially broader/different than the claim’s narrowed framing (e.g., attributing an event outcome to a venue; crediting a parent for a subsidiary; mixing character with actor; wrong sequel/original or work-vs-soundtrack), treat as NOT_SUPPORTED (material omission) rather than SUPPORTED.\n- Strong absence: Treat strong absence on canonical/list/results pages as counterevidence for specific, checkable facts (credited cast/host/inductee/captain; explicit release year/month; membership/induction; track listing; feature film debut; original recording; ordinal/last place; birthplaces/nationalities), when surrounding context is credible.\nOutput: One token exactly among SUPPORTED, NOT_SUPPORTED, NOT ENOUGH INFO. Do not add any explanation or punctuation."}}]
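The `final_answer_prompt` above states its decision rules in prose. As a rough illustration only, three of those rules could be mirrored in plain Python, for example to sanity-check or post-process model outputs:

- the conjunction rule (any refuted component gives NOT_SUPPORTED; otherwise any unverified component gives NOT ENOUGH INFO),
- the comparative-metric rule (both numeric metrics are required),
- the one-token output rule.

This is a minimal sketch under assumptions. The names `Component`, `aggregate`, `compare_metric`, and `parse_label` are hypothetical and are not part of the original notebook. Splitting a claim into per-component verdicts is assumed to happen upstream, for example in the summary steps.

```python
from enum import Enum
from typing import Iterable, Optional

# The three allowed labels, copied verbatim from the prompt's "Output" line.
# Note the asymmetry: NOT_SUPPORTED uses an underscore, NOT ENOUGH INFO uses spaces.
LABELS = ("SUPPORTED", "NOT_SUPPORTED", "NOT ENOUGH INFO")


class Component(Enum):
    """Hypothetical per-component verdict (not defined in the original notebook)."""
    SUPPORTED = "supported"
    REFUTED = "refuted"
    UNVERIFIED = "unverified"


def aggregate(components: Iterable[Component]) -> str:
    """Conjunction rule: any refuted part -> NOT_SUPPORTED,
    else any unverified part -> NOT ENOUGH INFO, else SUPPORTED."""
    parts = list(components)
    if not parts:
        # Assumption: a claim with no parsed components counts as unverified.
        return "NOT ENOUGH INFO"
    if any(p is Component.REFUTED for p in parts):
        return "NOT_SUPPORTED"
    if any(p is Component.UNVERIFIED for p in parts):
        return "NOT ENOUGH INFO"
    return "SUPPORTED"


def compare_metric(a: Optional[float], b: Optional[float], claim_a_greater: bool) -> str:
    """Comparative rule: both numeric metrics must be present; otherwise NOT ENOUGH INFO.
    If the metrics contradict the claimed ordering, NOT_SUPPORTED."""
    if a is None or b is None:
        return "NOT ENOUGH INFO"
    return "SUPPORTED" if (a > b) == claim_a_greater else "NOT_SUPPORTED"


def parse_label(raw: str) -> Optional[str]:
    """Accept the model output only if, after trimming whitespace,
    it is exactly one allowed label (no explanation or punctuation)."""
    text = raw.strip()
    return text if text in LABELS else None


if __name__ == "__main__":
    print(aggregate([Component.SUPPORTED, Component.UNVERIFIED]))
    print(compare_metric(120.0, 95.5, claim_a_greater=True))
    print(parse_label("SUPPORTED."))
```

Checking REFUTED before UNVERIFIED matches the prompt's ordering: a single refuted component decides the label even when other parts remain unverified. `parse_label` returns `None` for anything that is not an exact label, so malformed outputs can be counted separately rather than silently mapped to a class.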
[ ]