PropertyGraph Index with Mistral AI and LlamaIndex

In this notebook, we demonstrate the basic usage of the PropertyGraphIndex in LlamaIndex.

The property graph index will process unstructured documents, extract a property graph from them, and offer various methods for querying this graph.

Setup

[ ]
[1]
[2]
[3]

Download Data

[4]
--2024-07-05 07:22:24--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8002::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.06s   

2024-07-05 07:22:24 (1.27 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]

Load Data

[5]

Create PropertyGraphIndex

The following steps occur when a PropertyGraphIndex is created:

  1. PropertyGraphIndex.from_documents(): We load documents into an index.

  2. Parsing Nodes: The index parses the documents into nodes.

  3. Extracting Paths from Text: The nodes are passed to an LLM, which is prompted to generate knowledge graph triples (i.e., paths).

  4. Extracting Implicit Paths: The node.relationships property is used to infer implicit paths.

  5. Generating Embeddings: Embeddings are generated for each text node and each graph node, so the embedding step runs twice during index construction.
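
To make steps 2 to 4 concrete, here is a minimal plain-Python sketch. It is not LlamaIndex's implementation: the `Chunk` class, the helper names, and the regex that stands in for the LLM call are all assumptions made for illustration, and the embedding step is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A toy stand-in for a parsed text node (hypothetical, not a LlamaIndex class)."""
    chunk_id: str
    text: str
    # Links to neighbouring chunks, loosely analogous to node.relationships.
    relationships: dict = field(default_factory=dict)


def parse_nodes(document, sentences_per_chunk=2):
    """Step 2: split a document into chunks and record previous/next links."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        text = " ".join(sentences[i:i + sentences_per_chunk])
        chunks.append(Chunk(chunk_id=f"chunk-{len(chunks)}", text=text))
    for prev, nxt in zip(chunks, chunks[1:]):
        prev.relationships["next"] = nxt.chunk_id
        nxt.relationships["previous"] = prev.chunk_id
    return chunks


def extract_paths(chunk):
    """Step 3: a regex stands in for the LLM that proposes triples."""
    pattern = re.compile(r"(\w+) was bought by (\w+)")
    return [(s, "Bought by", o) for s, o in pattern.findall(chunk.text)]


def extract_implicit_paths(chunk):
    """Step 4: turn the recorded relationships into triples."""
    return [(chunk.chunk_id, rel.upper(), target)
            for rel, target in sorted(chunk.relationships.items())]


document = (
    "Viaweb was an online store builder. Viaweb was bought by Yahoo. "
    "Interleaf made software for creating documents."
)
chunks = parse_nodes(document)
explicit = [t for c in chunks for t in extract_paths(c)]
implicit = [t for c in chunks for t in extract_implicit_paths(c)]
print(explicit)
print(implicit)
```

The explicit triples depend on what the extractor finds in the text, while the implicit ones come only from how the chunks are linked, which is why the implicit step needs no model call.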

[6]
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 27.96it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:24<00:00,  1.13s/it]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 11592.30it/s]
Generating embeddings: 100%|██████████| 44/44 [00:13<00:00,  3.18it/s]

For debugging purposes, the default SimplePropertyGraphStore includes a helper that saves a networkx representation of the graph to an HTML file.

[7]
[8]

Querying

Querying a property graph index typically involves using one or more sub-retrievers and combining their results. The process of graph retrieval includes:

  1. Selecting Nodes: Identifying the initial nodes of interest within the graph.
  2. Traversing: Moving from the selected nodes to explore connected elements.

By default, two types of retrieval are used together:

• Synonym/Keyword Expansion: Utilizing an LLM to generate synonyms and keywords derived from the query.

• Vector Retrieval: Employing embeddings to locate nodes within your graph.
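
The two selection strategies above, and the merging of their results, can be sketched in plain Python. This is an illustration only, not the LlamaIndex retriever code: the three-dimensional vectors, the keyword list, and all function names are hypothetical stand-ins for real embeddings and LLM output.

```python
import math

# Hand-written vectors stand in for real embeddings (hypothetical values).
NODE_VECTORS = {
    "Viaweb": [0.9, 0.1, 0.0],
    "Yahoo": [0.7, 0.3, 0.1],
    "Interleaf": [0.1, 0.9, 0.2],
    "Lisp": [0.0, 0.6, 0.8],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_retrieve(keywords, nodes):
    """Keep graph nodes whose name matches one of the expanded keywords."""
    wanted = {k.lower() for k in keywords}
    return [n for n in nodes if n.lower() in wanted]


def vector_retrieve(query_vector, node_vectors, top_k=2):
    """Rank nodes by similarity to the query vector and keep the best top_k."""
    ranked = sorted(
        node_vectors,
        key=lambda n: cosine(query_vector, node_vectors[n]),
        reverse=True,
    )
    return ranked[:top_k]


def combine(*result_lists):
    """Merge sub-retriever results, dropping duplicates but keeping order."""
    merged = []
    for results in result_lists:
        for node in results:
            if node not in merged:
                merged.append(node)
    return merged


# Hypothetical inputs: keywords an LLM might propose, and a query embedding.
keywords = ["viaweb", "startup"]
query_vector = [0.8, 0.2, 0.0]

selected = combine(
    keyword_retrieve(keywords, NODE_VECTORS),
    vector_retrieve(query_vector, NODE_VECTORS),
)
print(selected)
```

Keyword matching only finds nodes whose names appear among the generated terms, while the vector search can surface related nodes that share no wording with the query, so the two complement each other.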

Once nodes are identified, you can choose to:

• Return Paths: Provide the paths adjacent to the selected nodes, typically in the form of triples.

• Return Paths and Source Text: Provide both the paths and the original source text of the chunk, if available.
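
As a rough illustration of these two return modes, the sketch below collects the paths adjacent to a selected node and formats them with or without source text. The triples come from the retrieval output in this notebook; the chunk ids, the placeholder chunk text, and the function names are assumptions, not part of LlamaIndex.

```python
# (subject, relation, object, id of the chunk the triple came from).
# The chunk ids are hypothetical labels added for this sketch.
TRIPLES = [
    ("Viaweb", "Bought by", "Yahoo", "chunk-a"),
    ("Viaweb", "Is", "Online store builder", "chunk-a"),
    ("I", "Got a job at", "Interleaf", "chunk-b"),
    ("Interleaf", "Made software for", "Creating documents", "chunk-b"),
    ("Interleaf", "Added a scripting language", "Lisp", "chunk-b"),
]

# Placeholder strings; a real store would hold the original chunk text here.
CHUNK_TEXT = {
    "chunk-a": "<source text of chunk-a>",
    "chunk-b": "<source text of chunk-b>",
}


def adjacent_paths(selected_nodes, triples):
    """Return every triple that touches one of the selected nodes."""
    selected = set(selected_nodes)
    return [t for t in triples if t[0] in selected or t[2] in selected]


def format_results(paths, chunk_text=None):
    """Render paths as text, appending the source chunk when it is available."""
    results = []
    for subject, relation, obj, chunk_id in paths:
        line = f"{subject} -> {relation} -> {obj}"
        if chunk_text is not None and chunk_id in chunk_text:
            line += f"\n    source: {chunk_text[chunk_id]}"
        results.append(line)
    return results


paths = adjacent_paths(["Interleaf"], TRIPLES)

# Mode 1: paths only.
print("\n".join(format_results(paths)))

# Mode 2: paths plus the source text of the chunk they came from.
print("\n".join(format_results(paths, CHUNK_TEXT)))
```

Returning paths alone keeps the context short, whereas attaching source text gives a downstream LLM the original wording at the cost of more tokens.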

Retrieval

[9]
Viaweb -> Launch date -> January 1996
Viaweb -> Growth rate -> 7x a year
Viaweb -> Received seed funding from -> Julian
Viaweb -> Is -> Online store builder
Viaweb -> User growth -> 70 stores at the end of 1996 and about 500 at the end of 1997
Viaweb -> Pricing -> $100 a month for a small store and $300 a month for a big one
Viaweb -> Acquisition -> Bought by yahoo in the summer of 1998
Viaweb -> Has -> Code editor
Viaweb -> Strategy -> Doing things that don't scale
Viaweb -> Developed by -> Robert and trevor
Viaweb -> Founded by -> Paul graham and robert
Viaweb -> Reached breakeven -> Summer of 1998
Viaweb -> Was -> One of the best general-purpose site builders
Viaweb -> Started by -> I
Viaweb -> Service -> Building stores for users
Viaweb -> Investors -> Had significant influence on company decisions
Viaweb -> Software -> Works via the web
Viaweb -> Was founded by -> I and robert morris
Viaweb -> Bought by -> Yahoo
Viaweb -> Initial product -> Wysiwyg site builder
Viaweb -> Had -> Handful of employees
Viaweb -> Started for -> Needing money
Viaweb -> Hosts -> Stores
Viaweb -> Status before acquisition -> Not profitable
I -> Got a job at -> Interleaf
Interleaf -> Made software for -> Creating documents
Interleaf -> Added a scripting language -> Lisp
I -> Arranged to do freelance work for -> Interleaf
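
Each line of the output above has the form `subject -> relation -> object`. If you want to post-process such lines, a small parser like the following sketch could help. It assumes the separator ` -> ` appears exactly twice before the object; `parse_path` is a hypothetical helper, not a LlamaIndex function.

```python
from collections import defaultdict


def parse_path(line):
    """Split a 'subject -> relation -> object' line into a 3-tuple."""
    # maxsplit=2 keeps any later ' -> ' inside the object field.
    parts = [part.strip() for part in line.split(" -> ", 2)]
    if len(parts) != 3:
        raise ValueError(f"not a triple: {line!r}")
    return tuple(parts)


# A few lines copied from the retrieval output above.
lines = [
    "Viaweb -> Bought by -> Yahoo",
    "Viaweb -> Pricing -> $100 a month for a small store and $300 a month for a big one",
    "I -> Got a job at -> Interleaf",
    "Interleaf -> Added a scripting language -> Lisp",
]

# Group the (relation, object) pairs under their subject.
by_subject = defaultdict(list)
for line in lines:
    subject, relation, obj = parse_path(line)
    by_subject[subject].append((relation, obj))

for subject, facts in by_subject.items():
    print(subject, facts)
```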

QueryEngine

[10]

Storage

By default, storage uses simple in-memory abstractions: SimpleVectorStore for embeddings and SimplePropertyGraphStore for the property graph.

We can save and load these structures to and from disk.
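
The idea of an in-memory graph that round-trips through disk can be shown with a small standard-library sketch. The JSON layout and the `save_graph` / `load_graph` helpers are assumptions for illustration only; they do not reflect the on-disk format LlamaIndex uses.

```python
import json
import tempfile
from pathlib import Path


def save_graph(triples, path):
    """Write the triples to a JSON file, one object per triple."""
    payload = [{"subject": s, "relation": r, "object": o} for s, r, o in triples]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_graph(path):
    """Read the JSON file back into a list of (subject, relation, object) tuples."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return [(row["subject"], row["relation"], row["object"]) for row in payload]


# Triples taken from the retrieval output in this notebook.
triples = [
    ("Viaweb", "Bought by", "Yahoo"),
    ("Interleaf", "Added a scripting language", "Lisp"),
]

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "property_graph.json"
    save_graph(triples, path)
    restored = load_graph(path)

print(restored == triples)
```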

[11]

Vector Stores

Some graph databases, such as Neo4j, support vectors natively. When your graph store does not, or when you want to override the default, you can specify which vector store to use alongside the graph.

Below, we will demonstrate how to combine ChromaVectorStore with the default SimplePropertyGraphStore.

[ ]

Build and Save Index

[12]
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 28.20it/s]
Extracting paths from text: 100%|██████████| 22/22 [00:24<00:00,  1.13s/it]
Extracting implicit paths: 100%|██████████| 22/22 [00:00<00:00, 7562.26it/s]
Generating embeddings: 100%|██████████| 45/45 [00:14<00:00,  3.13it/s]

Load Index

[13]