Property Graph
PropertyGraph Index with Mistral AI and LlamaIndex
In this notebook, we demonstrate the basic usage of the PropertyGraphIndex in LlamaIndex.
The property graph index processes unstructured documents, extracts a property graph from them, and offers several ways to query that graph.
Setup
Download Data
--2024-07-05 07:22:24-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8002::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’
data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.06s
2024-07-05 07:22:24 (1.27 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
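The download above is a shell wget call. The sketch below is a dependency-free Python equivalent. It assumes you want to skip the download when the file is already on disk. ensure_file is a hypothetical helper and is not part of LlamaIndex.

```python
import os
import urllib.request

# Source URL and destination path as they appear in the wget log above.
ESSAY_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
)


def ensure_file(url: str, dest: str) -> str:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if os.path.exists(dest):
        # File is already on disk: skip the network request entirely.
        return dest
    # Create the parent directory (like `mkdir -p`) before writing.
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling ensure_file(ESSAY_URL, "data/paul_graham/paul_graham_essay.txt") writes the essay to the same path shown in the log. Re-running it afterwards is a no-op.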
Load Data
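Loading means reading the downloaded file into document objects. In LlamaIndex this is typically done with SimpleDirectoryReader("./data/paul_graham/").load_data(). The standard-library sketch below only mimics the shape of that result: one object per file, holding its text plus the file name as metadata. TextDocument and load_text_documents are hypothetical names, not LlamaIndex classes.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TextDocument:
    """Minimal stand-in for a loaded document: raw text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_text_documents(input_dir: str) -> list:
    """Read every .txt file directly inside `input_dir`, in file-name order."""
    documents = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        documents.append(
            TextDocument(
                text=path.read_text(encoding="utf-8"),
                metadata={"file_name": path.name},
            )
        )
    return documents
```

For this notebook's data directory, the sketch would return a single document. That matches the 1/1 shown later on the "Parsing nodes" progress bar.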
Create PropertyGraphIndex
The following steps occur during the creation of a PropertyGraph:
- PropertyGraphIndex.from_documents(): Documents are loaded into the index.
- Parsing Nodes: The index parses the documents into nodes.
- Extracting Paths from Text: The nodes are passed to an LLM, which is prompted to generate knowledge graph triples (i.e., paths).
- Extracting Implicit Paths: The node.relationships property is used to infer implicit paths.
- Generating Embeddings: Embeddings are generated for each text node and for each graph node, so embedding runs in two passes.
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 27.96it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:24<00:00, 1.13s/it]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 11592.30it/s]
Generating embeddings: 100%|██████████| 44/44 [00:13<00:00, 3.18it/s]
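The creation steps listed above can be sketched end to end with toy stand-ins. This illustrates the data flow only and rests on several assumptions:
- Chunking is by fixed character count. LlamaIndex's own node parser splits text differently.
- The LLM is any callable that returns "subject | relation | object" lines.
- The implicit relation labels NEXT and PREVIOUS are made up.
- embed is any text-to-vector callable.

None of the names below are LlamaIndex APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class TextNode:
    id: str
    text: str
    # Analogue of node.relationships: links to neighbouring chunks.
    relationships: Dict[str, str] = field(default_factory=dict)


def parse_nodes(text: str, chunk_size: int) -> List[TextNode]:
    """Parsing nodes: fixed-size character chunks, each linked to its neighbours."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    nodes = [TextNode(id=f"n{i}", text=c) for i, c in enumerate(chunks)]
    for prev, nxt in zip(nodes, nodes[1:]):
        prev.relationships["next"] = nxt.id
        nxt.relationships["previous"] = prev.id
    return nodes


def extract_paths(node: TextNode, llm: Callable[[str], str]) -> List[Triple]:
    """Extracting paths from text: keep LLM output lines shaped 'subj | rel | obj'."""
    triples = []
    for line in llm(node.text).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def implicit_paths(nodes: List[TextNode]) -> List[Triple]:
    """Extracting implicit paths: reuse existing relationships, no LLM call."""
    return [
        (node.id, label.upper(), target)
        for node in nodes
        for label, target in node.relationships.items()
    ]


def build_graph(text: str, llm, embed, chunk_size: int = 200):
    nodes = parse_nodes(text, chunk_size)
    llm_paths = [t for node in nodes for t in extract_paths(node, llm)]
    paths = llm_paths + implicit_paths(nodes)
    entities = sorted({s for s, _, _ in llm_paths} | {o for _, _, o in llm_paths})
    # Generating embeddings: one pass over text nodes, a second over graph nodes.
    text_vectors = {node.id: embed(node.text) for node in nodes}
    entity_vectors = {e: embed(e) for e in entities}
    return nodes, paths, text_vectors, entity_vectors
```

The two embedding dictionaries correspond to the two embedding passes described in the last step.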
For debugging, the default SimplePropertyGraphStore includes a helper that saves a networkx representation of the graph to an HTML file.
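In LlamaIndex the helper is typically called as index.property_graph_store.save_networkx_graph(name="./kg.html"). If you only want a quick, dependency-free look at the extracted triples, a plain HTML table is a simpler substitute. The sketch below uses that swapped-in technique: it renders a table, not a graph drawing. triples_to_html is a hypothetical helper.

```python
import html


def triples_to_html(triples, path: str) -> None:
    """Write (subject, relation, object) triples as a simple HTML table."""
    rows = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(part)}</td>" for part in triple) + "</tr>"
        for triple in triples
    )
    page = (
        "<!DOCTYPE html>\n<html><body><table>\n"
        "<tr><th>Subject</th><th>Relation</th><th>Object</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```

Values are escaped, so entity text containing characters such as "<" or "&" cannot break the page.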
Querying
Querying a property graph index typically involves using one or more sub-retrievers and combining their results. The process of graph retrieval includes:
- Selecting Nodes: Identifying the initial nodes of interest within the graph.
- Traversing: Moving from the selected nodes to explore connected elements.
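The two retrieval phases (selecting nodes, then traversing) can be illustrated on a small in-memory list of triples. In this sketch, selection is a plain word match between the query and entity names, and traversal is a breadth-first walk limited to depth hops. Both are simplifying assumptions, not how LlamaIndex's sub-retrievers are implemented. The sample triples in the usage come from this notebook's retrieval output.

```python
import re
from collections import defaultdict, deque


def build_adjacency(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))
    return adjacency


def select_nodes(query, entities):
    """Selecting nodes: keep entities whose words all appear in the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    selected = []
    for entity in entities:
        entity_words = set(re.findall(r"\w+", entity.lower()))
        if entity_words and entity_words <= query_words:
            selected.append(entity)
    return sorted(selected)


def traverse(adjacency, start_nodes, depth=1):
    """Traversing: breadth-first walk up to `depth` hops, returning the triples seen."""
    seen = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    paths = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue
        for relation, obj in adjacency.get(node, []):
            paths.append((node, relation, obj))
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, hops + 1))
    return paths
```

With depth=1, the walk returns only the edges leaving the selected nodes. Raising depth pulls in neighbours of neighbours, at the cost of more, and less relevant, paths.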
By default, two primary types of retrieval are employed simultaneously:
• Synonym/Keyword Expansion: Using an LLM to generate synonyms and keywords from the query.
• Vector Retrieval: Using embeddings to find nodes in your graph.
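The two default selection strategies can be mimicked as shown below. All of the following are assumptions for illustration:
- A fixed synonym table stands in for the LLM call.
- The vectors are hand-made toy embeddings.
- Keyword matching is exact on single-word entity names.
- The two result sets are combined by set union.

```python
import math
import re
from typing import Dict, List, Set


def expand_keywords(query: str, synonyms: Dict[str, List[str]]) -> Set[str]:
    """Stand-in for LLM keyword/synonym generation: a fixed lookup table."""
    words = re.findall(r"\w+", query.lower())
    expanded = set(words)
    for word in words:
        expanded.update(s.lower() for s in synonyms.get(word, []))
    return expanded


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_nodes(query, query_vector, synonyms, entity_vectors, top_k=2):
    """Run keyword expansion and vector retrieval, then union their selections."""
    keywords = expand_keywords(query, synonyms)
    # Keyword side: exact match on (single-word) entity names.
    by_keyword = {e for e in entity_vectors if e.lower() in keywords}
    # Vector side: top_k entities by cosine similarity to the query vector.
    ranked = sorted(
        entity_vectors,
        key=lambda e: cosine(query_vector, entity_vectors[e]),
        reverse=True,
    )
    by_vector = set(ranked[:top_k])
    return by_keyword | by_vector
```

The two strategies complement each other. Keyword expansion catches entities named in different words from the query, while vector retrieval catches semantically close entities that share no words with it.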
Once nodes are identified, you can choose to:
• Return Paths: Provide the paths adjacent to the selected nodes, typically in the form of triples.
• Return Paths and Source Text: Provide both the paths and the original source text of the chunk, if available.
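The two return modes differ in what is handed to the caller or to the LLM. As far as we know, LlamaIndex controls this with the include_text argument; for example, index.as_retriever(include_text=False) returns paths only. The sketch below reproduces the idea with a hypothetical format_results function. It assumes each retrieved path remembers the id of the chunk it was extracted from.

```python
from typing import Dict, List, Tuple

# (subject, relation, object, source_chunk_id)
Path4 = Tuple[str, str, str, str]


def format_results(paths: List[Path4], chunks: Dict[str, str], include_text: bool = False) -> str:
    """Render paths as 'subj -> rel -> obj' lines, optionally followed by source chunks."""
    lines = [f"{s} -> {r} -> {o}" for s, r, o, _ in paths]
    if include_text:
        # Attach each distinct source chunk once, in first-seen order, if available.
        seen: List[str] = []
        for *_, chunk_id in paths:
            if chunk_id in chunks and chunk_id not in seen:
                seen.append(chunk_id)
        lines += [f"[source {cid}] {chunks[cid]}" for cid in seen]
    return "\n".join(lines)
```

Returning only paths keeps the context short. Adding source text gives the LLM the surrounding wording, at the cost of a longer prompt.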
Retrieval
Viaweb -> Launch date -> January 1996
Viaweb -> Growth rate -> 7x a year
Viaweb -> Received seed funding from -> Julian
Viaweb -> Is -> Online store builder
Viaweb -> User growth -> 70 stores at the end of 1996 and about 500 at the end of 1997
Viaweb -> Pricing -> $100 a month for a small store and $300 a month for a big one
Viaweb -> Acquisition -> Bought by yahoo in the summer of 1998
Viaweb -> Has -> Code editor
Viaweb -> Strategy -> Doing things that don't scale
Viaweb -> Developed by -> Robert and trevor
Viaweb -> Founded by -> Paul graham and robert
Viaweb -> Reached breakeven -> Summer of 1998
Viaweb -> Was -> One of the best general-purpose site builders
Viaweb -> Started by -> I
Viaweb -> Service -> Building stores for users
Viaweb -> Investors -> Had significant influence on company decisions
Viaweb -> Software -> Works via the web
Viaweb -> Was founded by -> I and robert morris
Viaweb -> Bought by -> Yahoo
Viaweb -> Initial product -> Wysiwyg site builder
Viaweb -> Had -> Handful of employees
Viaweb -> Started for -> Needing money
Viaweb -> Hosts -> Stores
Viaweb -> Status before acquisition -> Not profitable
I -> Got a job at -> Interleaf
Interleaf -> Made software for -> Creating documents
Interleaf -> Added a scripting language -> Lisp
I -> Arranged to do freelance work for -> Interleaf
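Each retrieved path above has the form "subject -> relation -> object". To post-process retriever output, a small parser like the hypothetical one below turns those lines back into tuples and groups them by subject. Grouping makes it easy to see that most paths here hang off the Viaweb node. The sketch assumes the separator " -> " never appears inside an entity or relation name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_paths(output: str) -> List[Tuple[str, str, str]]:
    """Parse 'subject -> relation -> object' lines; skip anything else."""
    triples = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(" -> ")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def group_by_subject(triples) -> Dict[str, List[Tuple[str, str]]]:
    """Collect (relation, object) pairs under each subject, keeping first-seen order."""
    grouped = defaultdict(list)
    for subject, relation, obj in triples:
        grouped[subject].append((relation, obj))
    return dict(grouped)
```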
QueryEngine
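A query engine wraps retrieval with answer synthesis. Retrieved paths (and optionally source text) are placed into a prompt, and the LLM writes the final answer. In LlamaIndex this is usually created with index.as_query_engine(include_text=True) and called with query_engine.query(...). The sketch below shows the same composition with plain callables. The prompt wording and make_query_engine are hypothetical.

```python
from typing import Callable, List


def make_query_engine(retrieve: Callable[[str], List[str]], llm: Callable[[str], str]):
    """Compose a retriever and an LLM: retrieve context, build a prompt, synthesize."""

    def query(question: str) -> str:
        context = "\n".join(retrieve(question))
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )
        return llm(prompt)

    return query
```

Keeping retrieval and synthesis as separate callables means either one can be swapped (a different retriever, a different model) without touching the other.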
Storage
By default, storage uses simple in-memory abstractions: SimpleVectorStore for embeddings and SimplePropertyGraphStore for the property graph.
We can save and load these structures to and from disk.
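With the default stores, saving is typically index.storage_context.persist(persist_dir="./storage"). Loading goes through load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage")). To show what a save/load round trip involves, the sketch below writes triples and vectors to two JSON files. The file names and functions are hypothetical and do not match LlamaIndex's on-disk format.

```python
import json
from pathlib import Path


def persist(persist_dir, triples, vectors):
    """Save graph triples and entity vectors as two JSON files."""
    directory = Path(persist_dir)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "graph.json").write_text(
        json.dumps([list(t) for t in triples]), encoding="utf-8"
    )
    (directory / "vectors.json").write_text(json.dumps(vectors), encoding="utf-8")


def load(persist_dir):
    """Load what `persist` wrote, restoring triples as tuples."""
    directory = Path(persist_dir)
    raw_triples = json.loads((directory / "graph.json").read_text(encoding="utf-8"))
    vectors = json.loads((directory / "vectors.json").read_text(encoding="utf-8"))
    return [tuple(t) for t in raw_triples], vectors
```

Because JSON has no tuple type, triples are stored as lists and converted back to tuples on load.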
Vector Stores
Some graph databases, such as Neo4j, support vectors natively. If your graph store does not, or if you want to override the default, you can specify which vector store to use alongside the graph.
Below, we will demonstrate how to combine ChromaVectorStore with the default SimplePropertyGraphStore.
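A pluggable vector store works because the graph index depends only on a small add/query interface. Any backend (Chroma, or the in-memory default) can then be swapped in. The sketch below models this with a typing.Protocol. VectorStore, InMemoryVectorStore and GraphIndex are hypothetical classes, not the LlamaIndex or Chroma APIs, and similarity is plain cosine.

```python
import math
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class VectorStore(Protocol):
    """The only operations the index needs from a vector backend."""

    def add(self, key: str, vector: List[float]) -> None: ...

    def query(self, vector: List[float], top_k: int) -> List[str]: ...


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class InMemoryVectorStore:
    """Default backend: keeps vectors in a dict and ranks by cosine similarity."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, key: str, vector: List[float]) -> None:
        self._vectors[key] = vector

    def query(self, vector: List[float], top_k: int) -> List[str]:
        ranked = sorted(
            self._vectors, key=lambda k: _cosine(vector, self._vectors[k]), reverse=True
        )
        return ranked[:top_k]


class GraphIndex:
    """Keeps triples itself and delegates entity embeddings to a vector store."""

    def __init__(
        self,
        triples: List[Tuple[str, str, str]],
        embed: Callable[[str], List[float]],
        vector_store: Optional[VectorStore] = None,
    ) -> None:
        self.triples = list(triples)
        # Use the caller's backend if given, otherwise the in-memory default.
        self.vector_store = vector_store if vector_store is not None else InMemoryVectorStore()
        entities = sorted({s for s, _, _ in self.triples} | {o for _, _, o in self.triples})
        for entity in entities:
            self.vector_store.add(entity, embed(entity))

    def nearest_entities(self, query_vector: List[float], top_k: int = 1) -> List[str]:
        return self.vector_store.query(query_vector, top_k)
```

In the notebook itself, the real equivalent passes a ChromaVectorStore as the vector_store argument when building the PropertyGraphIndex. The graph part stays in the default SimplePropertyGraphStore.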
Build and Save Index
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 28.20it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:24<00:00, 1.13s/it]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 7562.26it/s]
Generating embeddings: 100%|██████████| 45/45 [00:14<00:00, 3.13it/s]