Basic Unstructured RAG

Unstructured RAG

Unstructured (or semi-structured) RAG is a method for handling documents that combine text, tables, and images. It addresses two problems: tables broken apart by text splitting, and the difficulty of embedding tables for semantic search.
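
To make the broken-table problem concrete, here is a minimal sketch that contrasts fixed-size character splitting with element-aware chunking. The toy document, the `naive_split` and `element_aware_chunks` helpers, and the chunk size of 60 are all illustrative assumptions, not part of the original notebook.

```python
# Toy document: two paragraphs around a small table (content is made up).
ELEMENTS = [
    ("text", "Journey learning is compared with shortcut learning."),
    ("table", "model | phase | score\nA | 1 | 0.3\nA | 2 | 0.4"),
    ("text", "The table above summarizes the results."),
]


def naive_split(text, size):
    """Fixed-size character splitting that ignores document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def element_aware_chunks(elements):
    """Keep each table as its own chunk; merge adjacent text elements."""
    chunks = []
    for kind, content in elements:
        if kind == "text" and chunks and chunks[-1][0] == "text":
            chunks[-1] = ("text", chunks[-1][1] + "\n" + content)
        else:
            chunks.append((kind, content))
    return chunks


flat = "\n".join(content for _, content in ELEMENTS)
naive = naive_split(flat, 60)
aware = element_aware_chunks(ELEMENTS)

table = ELEMENTS[1][1]
# True only if some chunk still contains the complete table.
table_intact_naive = any(table in chunk for chunk in naive)
table_intact_aware = any(kind == "table" and content == table for kind, content in aware)
print(table_intact_naive, table_intact_aware)
```

In this toy case the fixed-size splitter cuts the table across two chunks, while the element-aware version keeps it whole, which is the property a parser that separates tables from text is meant to preserve.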

Here we use unstructured.io to parse the document and separate its text, tables, and images.
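
A minimal sketch of what the parsing step might look like with the `unstructured` library's `partition_pdf` function. The option values and the `partition_options` and `parse_pdf` helper names are assumptions chosen for illustration; the original cell is not available, so this is not necessarily the configuration the notebook used.

```python
def partition_options():
    """Keyword arguments for unstructured's partition_pdf (values are illustrative)."""
    return {
        "strategy": "hi_res",              # layout-aware parsing, needed for table structure
        "infer_table_structure": True,     # keep tables as structured elements
        "chunking_strategy": "by_title",   # group text under its section title
        "max_characters": 4000,            # hard cap on chunk size
        "new_after_n_chars": 3800,         # soft cap: start a new chunk after this length
        "combine_text_under_n_chars": 2000,  # merge very small text pieces
    }


def parse_pdf(path):
    """Partition a PDF into elements; `path` is a hypothetical local file."""
    # Imported lazily so this sketch loads without the optional dependency.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(filename=path, **partition_options())
```

Calling `parse_pdf("paper.pdf")` (a placeholder file name) would return a list of elements that the later steps count and separate by type.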

Tool Reference: Unstructured

Initial Setup

Indexing

[7]
Counter({"<class 'unstructured.documents.elements.CompositeElement'>": 14,
         "<class 'unstructured.documents.elements.TableChunk'>": 2})
[8]
{'CompositeElement', 'Table'}
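
The two outputs above are consistent with counting elements by class and then collecting their categories. The sketch below reproduces that pattern with stand-in classes so it runs without the library; the stand-ins, including the assumption that a `TableChunk` reports the category `Table`, are illustrative rather than copied from `unstructured`.

```python
from collections import Counter


# Stand-ins for unstructured's element classes (assumed shape: .text and .category).
class CompositeElement:
    category = "CompositeElement"

    def __init__(self, text):
        self.text = text


class Table:
    category = "Table"

    def __init__(self, text):
        self.text = text


class TableChunk(Table):
    """A chunk of a table; inherits the Table category."""


elements = [CompositeElement("intro"), TableChunk("a | b"), CompositeElement("body")]

# Count by concrete class, then collect the coarser category labels.
type_counts = Counter(type(el).__name__ for el in elements)
categories = {el.category for el in elements}

# Separate text from tables so each can be indexed appropriately.
texts = [el.text for el in elements if el.category == "CompositeElement"]
tables = [el.text for el in elements if el.category == "Table"]
```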

Vector Store
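
As a rough illustration of what a vector store does, the sketch below embeds each chunk, stores the vectors, and ranks them by cosine similarity against a query. The bag-of-words `embed` function and the `ToyVectorStore` class are hypothetical stand-ins; a real setup would use an embedding model and a store such as the ones named in this cookbook series.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self):
        self._items = []  # list of (vector, original text)

    def add(self, texts):
        self._items.extend((embed(t), t) for t in texts)

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([
    "table of training results on the math test set",
    "introduction to journey learning",
    "acknowledgements and funding",
])
top = store.search("training results on the math test set", k=1)
```

The `search` method plays the role of the retriever in the next step: given a question, it returns the most relevant stored chunks.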

Retriever

RAG Chain
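
A RAG chain typically retrieves context for a question, inserts it into a prompt, and passes the prompt to a language model. The sketch below shows that composition with plain functions; `make_rag_chain`, the prompt wording, and the stub retriever and model are assumptions for illustration, not the chain defined in the original cells.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def format_docs(docs):
    """Join retrieved chunks into one context string."""
    return "\n\n".join(docs)


def make_rag_chain(retrieve, llm):
    """Compose retrieve -> prompt -> llm into a single callable."""
    def chain(question):
        context = format_docs(retrieve(question))
        prompt = PROMPT_TEMPLATE.format(context=context, question=question)
        return llm(prompt)
    return chain


# Stubs so the sketch runs offline; swap in a real retriever and model client.
def fake_retrieve(question):
    return ["first retrieved chunk", "second retrieved chunk"]


def fake_llm(prompt):
    return f"[stub answer based on {len(prompt)} prompt characters]"


chain = make_rag_chain(fake_retrieve, fake_llm)
answer = chain("Compare all the training results on the MATH test set")
```

Keeping the retriever and the model as injected callables makes it easy to test the prompt assembly without network access.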

[15]
'To compare all the Training Results on the MATH Test Set, we can look at the results from Table 6 in the provided context. The results are as follows:\n\n- deepseek-sft-abel:\n   - SFT-phase1: 0.372\n   - SFT-phase2-shortcutLearning: 0.386\n   - SFT-phase2-journeyLearining: 0.470\n   - DPO: 0.472\n\n- deepseek-sft-prm800k:\n   - SFT-phase1: 0.290\n   - SFT-phase2-shortcutLearning: 0.348\n   - SFT-phase2-journeyLearining: 0.428\n   - DPO: 0.440\n\nBased on these results, we can see that Journey Learning led to significant improvements compared to Shortcut Learning on both models, with gains of +8.4 and +8.0 on deepseek-sft-abel and deepseek-sft-prm800k, respectively. The DPO results were also provided for comparison.'

Preparing Data for Evaluation

[16]
/usr/local/lib/python3.10/dist-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `BaseRetriever.get_relevant_documents` was deprecated in langchain-core 0.1.46 and will be removed in 0.3.0. Use invoke instead.
  warn_deprecated(
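
The warning above recommends `invoke` in place of the deprecated `get_relevant_documents`. Independently of that call, the data preparation step usually amounts to collecting, for each question, the retrieved context and the generated response. The sketch below assumes field names `query`, `context`, and `response`, and uses hypothetical `retrieve` and `answer` callables; with a LangChain retriever, `retrieve` could wrap `retriever.invoke`.

```python
def build_eval_rows(questions, retrieve, answer):
    """Build one evaluation record per question."""
    rows = []
    for question in questions:
        rows.append({
            "query": question,
            "context": list(retrieve(question)),  # retrieved chunks as a list of strings
            "response": answer(question),         # output of the RAG chain
        })
    return rows


# Offline stubs standing in for the real retriever and chain.
rows = build_eval_rows(
    ["What does Table 6 report?"],
    retrieve=lambda q: ["chunk one", "chunk two"],
    answer=lambda q: "stub response",
)
```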

Evaluation in Athina AI

We will use the Does Response Answer Query eval here. It checks whether the response answers the user's query. To learn more, please refer to our documentation.
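
Conceptually, an eval of this kind asks a judge model whether each response addresses its query and aggregates the verdicts. The sketch below illustrates that idea only; it is not Athina's implementation, and the `does_response_answer_query` and `pass_rate` helpers, the prompt wording, and the stub judge are all assumptions.

```python
def does_response_answer_query(query, response, judge):
    """Ask a judge callable for a PASS/FAIL verdict and normalize it to a boolean."""
    prompt = (
        "Does the response answer the user's query? Reply PASS or FAIL.\n"
        f"Query: {query}\nResponse: {response}"
    )
    verdict = judge(prompt).strip().upper()
    return verdict.startswith("PASS")


def pass_rate(rows, judge):
    """Fraction of rows whose response is judged to answer the query."""
    results = [does_response_answer_query(r["query"], r["response"], judge) for r in rows]
    return sum(results) / len(results) if results else 0.0


# Stub judge: fails only when the response is empty. A real judge would call an LLM.
def stub_judge(prompt):
    return "FAIL" if prompt.rstrip().endswith("Response:") else "PASS"


rows = [
    {"query": "What is compared?", "response": "Journey and shortcut learning."},
    {"query": "What is compared?", "response": ""},
]
rate = pass_rate(rows, stub_judge)
```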

[21]
/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py:547: UserWarning: <built-in function any> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(
[23]
You can view your dataset at: https://app.athina.ai/develop/e5dec38c-c58c-412d-b910-588d97ccd090