LanceDB Retrieval with Alph

Use LanceDB to retrieve snippets from Alph's llms.txt

Tags: vector-database, Alph

Index & Search llms.txt with LanceDB Cloud

Build a semantic search engine over any website's llms.txt file, powered by LanceDB Cloud and Alph.

What is llms.txt?

llms.txt is an emerging standard: a Markdown file at the root of a website that provides structured context for LLMs. Think of it as robots.txt for AI: it tells language models what a site is about, links to docs, and surfaces key facts.
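
To make the format concrete, here is a minimal sketch that pulls out the three pieces described above: what the site is (title and summary) and its links to docs. It assumes the common llms.txt layout of an H1 title, a blockquote summary, and Markdown link lists. The function name `parse_llms_txt` and the sample text are illustrative and not part of the notebook.

```python
import re

# Matches list items such as "- [Title](https://example.com/page): Description".
LINK_RE = re.compile(
    r"^\s*-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Return the title, summary, and link entries found in an llms.txt string."""
    title, summary, links = None, None, []
    for line in text.splitlines():
        stripped = line.strip()
        if title is None and stripped.startswith("# "):
            title = stripped[2:].strip()
        elif summary is None and stripped.startswith("> "):
            summary = stripped[2:].strip()
        else:
            match = LINK_RE.match(line)
            if match:
                links.append(
                    (match["title"], match["url"], (match["desc"] or "").strip())
                )
    return {"title": title, "summary": summary, "links": links}


# Made-up sample in the assumed layout.
sample = """# Example Site

> One-sentence summary of the site.

## Docs

- [Quickstart](https://example.com/quickstart): Install and run the first example
- [API](https://example.com/api)
"""

parsed = parse_llms_txt(sample)
print(parsed["title"], "|", parsed["summary"], "|", len(parsed["links"]), "links")
```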

What we'll do

| Step | Description |
|------|-------------|
| 1 | 📦 Install dependencies |
| 2 | 🔗 Connect to LanceDB Cloud |
| 3 | 📥 Fetch & parse llms.txt from runalph.ai |
| 4 | 🧠 Chunk, embed with a local sentence-transformer & ingest into a LanceDB table |
| 5 | 🔍 Interactive semantic search with @param widgets |
| 6 | 🧹 Cleanup |

Step 1 · 📦 Install Dependencies

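
The install cell is not included in this export. As a rough sketch, the snippet below checks which packages are importable and prints the matching pip command. The package list is an assumption inferred from the later steps (the LanceDB client, sentence-transformers, and python-dotenv for the .env file); adjust it to your environment.

```python
import importlib.util

# Assumed package set: PyPI name -> importable module name.
REQUIRED = {
    "lancedb": "lancedb",
    "sentence-transformers": "sentence_transformers",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names whose modules cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def pip_command(packages):
    """Build a pip command for the given packages, or an empty string if none."""
    if not packages:
        return ""
    return "pip install -U " + " ".join(packages)


command = pip_command(missing_packages())
print(command or "All assumed dependencies are already installed.")
```

In a notebook, the printed command can be run in its own cell with a leading `%`.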

Step 2 · 🔗 Connect to LanceDB Cloud

Credentials are loaded from the .env file in your project root.

✅ URI:    db://default-jhihj6
✅ Region: us-east-1
✅ API Key: sk_IQYKP••••••••••••••••••••
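
The connection cell itself is not shown in this export. A minimal sketch of how such a cell might look follows. The environment variable names (`LANCEDB_URI`, `LANCEDB_API_KEY`, `LANCEDB_REGION`) and the helper names are assumptions; `lancedb.connect` is called with the `uri`, `api_key`, and `region` arguments.

```python
import os


def load_settings(env=None):
    """Collect LanceDB Cloud settings from environment variables (assumed names)."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("LANCEDB_URI", ""),
        "api_key": env.get("LANCEDB_API_KEY", ""),
        "region": env.get("LANCEDB_REGION", "us-east-1"),
    }


def mask(secret, visible=8):
    """Keep the first `visible` characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


def connect(settings):
    """Open a LanceDB Cloud connection; lancedb is imported lazily."""
    import lancedb  # third-party: pip install lancedb

    return lancedb.connect(
        uri=settings["uri"],
        api_key=settings["api_key"],
        region=settings["region"],
    )


# Example (requires the third-party packages and valid credentials):
# from dotenv import load_dotenv
# load_dotenv()                      # copies variables from .env into os.environ
# settings = load_settings()
# print("URI:", settings["uri"], "| API key:", mask(settings["api_key"]))
# db = connect(settings)
```

Masking the key before printing keeps the secret out of saved notebook output.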

Step 3 · 📥 Fetch & Parse llms.txt

We pull the raw Markdown from runalph.ai/llms.txt and split it into meaningful chunks, one per section heading. Each chunk becomes a searchable document in our vector database.

llms_txt_url: https://runalph.ai/llms.txt
📄 Fetched 6,994 characters from https://runalph.ai/llms.txt
📝 Preview:
────────────────────────────────────────────────────────────
# Alph

> Alph is a cloud platform where Jupyter notebooks are first-class citizens. Standard .ipynb files, cloud compute via JupyterLab projects, built-in AI assistance (Claude, GPT, Gemini), GitHub sync, notebook publishing, embeddable notebooks, scheduled automations, and web app hosting - all in a multi-tenant organization workspace.

- Website: https://runalph.ai
- Documentation: https://docs.runalph.ai
- GitHub: https://github.com/alph-ai
- CLI: `pip install alphai` (PyPI: https://pypi.org
…
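
The fetch cell is not part of this export. One way to reproduce the step with only the standard library is sketched below. The helper names, the User-Agent string, and the preview length are assumptions; `llms_txt_url` is the parameter name shown above.

```python
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10.0):
    """Download a text resource and decode it using the declared charset."""
    request = Request(url, headers={"User-Agent": "llms-txt-indexer/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def preview(text, limit=500):
    """Return the text unchanged if short, otherwise its first `limit` characters."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "\n..."


# Example (requires network access):
# llms_txt_url = "https://runalph.ai/llms.txt"
# raw = fetch_text(llms_txt_url)
# print(f"Fetched {len(raw):,} characters from {llms_txt_url}")
# print(preview(raw))
```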

Step 4 · 🧠 Chunk, Embed & Ingest

We split by Markdown headings, generate embeddings with a local sentence-transformer, and push everything into LanceDB Cloud in one go.
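
Splitting by Markdown headings can be sketched as follows. This is an illustrative implementation, not the notebook's own: the function name and the `heading`/`text` keys are assumptions, text before the first heading is dropped, headings with no body are skipped, and `#` lines inside fenced code blocks are not treated specially.

```python
import re

# One to six '#' characters followed by whitespace and the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown):
    """Split Markdown into one chunk per heading, keeping non-empty bodies only."""
    chunks = []
    heading = None
    body_lines = []

    def flush(current_heading, lines):
        body = "\n".join(lines).strip()
        if current_heading is not None and body:
            chunks.append({"heading": current_heading, "text": body})

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush(heading, body_lines)
            heading, body_lines = match.group(2).strip(), []
        else:
            body_lines.append(line)
    flush(heading, body_lines)
    return chunks


doc = (
    "# Alph\n\n> Summary line.\n\n"
    "## Docs\n\n- [Welcome](https://docs.runalph.ai): Overview\n\n"
    "## Empty\n"
)
for chunk in chunk_by_headings(doc):
    print(chunk["heading"], "->", len(chunk["text"]), "characters")
```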

Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
📦 Parsed 6 chunks
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status
------------------------+-----------
embeddings.position_ids | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
✅ Generated 6 embeddings (dim=384)
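
The embedding and ingest code is not included in this export. The sketch below shows one plausible shape for it, using the model named in the output above (sentence-transformers/all-MiniLM-L6-v2). The table name `llms_txt_chunks`, the column names, and the helper functions are hypothetical, and `mode="overwrite"` is assumed to be supported by the installed LanceDB client.

```python
def embed_texts(texts, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Encode texts into vectors with a local sentence-transformer model."""
    from sentence_transformers import SentenceTransformer  # third-party, lazy import

    model = SentenceTransformer(model_name)
    return model.encode(texts).tolist()


def build_rows(chunks, vectors):
    """Pair each chunk with its vector to form rows ready for ingestion."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    return [
        {
            "id": index,
            "heading": chunk["heading"],
            "text": chunk["text"],
            "vector": list(vector),
        }
        for index, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]


def ingest(db, table_name, rows):
    """Create (or replace) a table from the rows and return the table handle."""
    return db.create_table(table_name, data=rows, mode="overwrite")


# Example (requires sentence-transformers, lancedb, and an open connection `db`):
# vectors = embed_texts([chunk["text"] for chunk in chunks])
# table = ingest(db, "llms_txt_chunks", build_rows(chunks, vectors))
```

Keeping `heading` and `text` next to the vector means search results can be displayed without a second lookup.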

Step 5 · 🔍 Semantic Search

Search across the indexed llms.txt content. Change the query and re-run!

search_query: How does AI work in Alph notebooks?
num_results: 5
🔎 Top 5 results for: "How does AI work in Alph notebooks?"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #1  📌 Guides  (distance: 1.0353)
  - [Automate Notebooks](https://docs.runalph.ai/guides/automate-notebooks): Schedule notebook cells to run on a cron - ETL pipelines, reports, model retraining, monitoring; configure cell ranges, view…

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #2  📌 Core Concepts  (distance: 1.0404)
  - [Notebooks](https://docs.runalph.ai/concepts/notebooks): Creating, editing, and executing standard .ipynb notebooks; keyboard shortcuts; AI cell generation; publishing and sharing; semantic search;…

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #3  📌 Docs  (distance: 1.0682)
  - [Welcome to Alph](https://docs.runalph.ai): Platform overview - what Alph is, who it's for (AI/ML engineers, data scientists, data teams), and how to get started -…

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #4  📌 Alph  (distance: 1.0705)
  > Alph is a cloud platform where Jupyter notebooks are first-class citizens. Standard .ipynb files, cloud compute via JupyterLab projects, built-in AI assistance (Claude, GPT, Gemini), GitHub sync,…

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  #5  📌 Optional  (distance: 1.1855)
  - [Anthropic Notebooks](https://docs.runalph.ai/examples/anthropic): Run Anthropic SDK examples live in Alph - [OpenAI Notebooks](https://docs.runalph.ai/examples/openai): Run OpenAI SDK examples…
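
The search cell is not shown in this export. A minimal sketch of a query that would yield results of this shape follows. It assumes rows with `heading` and `text` columns (hypothetical names), an `encode` callable such as a sentence-transformer's `encode` method, and the `_distance` field that LanceDB adds to vector search results. Lower distance means a closer match.

```python
def semantic_search(table, encode, query, k=5):
    """Embed the query and return the k nearest rows as a list of dicts."""
    query_vector = encode([query])[0]
    return table.search(query_vector).limit(k).to_list()


def format_hit(rank, row, width=200):
    """Render one result as a heading line plus a truncated one-line snippet."""
    snippet = " ".join(row["text"].split())  # collapse newlines and repeated spaces
    if len(snippet) > width:
        snippet = snippet[:width] + "..."
    return (
        f"#{rank}  {row['heading']}  (distance: {row['_distance']:.4f})\n"
        f"  {snippet}"
    )


# Example (requires an ingested table and a loaded model):
# hits = semantic_search(table, model.encode, "How does AI work in Alph notebooks?", k=5)
# for rank, row in enumerate(hits, start=1):
#     print(format_hit(rank, row))
```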


Step 6 · 🧹 Cleanup

Uncomment and run to drop the table when you're done.

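
The cleanup cell is empty in this export. A small sketch of a guarded drop is shown below; the `confirm` flag plays the role of the commented-out line, and the table name `llms_txt_chunks` is hypothetical. `drop_table` is the LanceDB connection method for deleting a table.

```python
def cleanup(db, table_name, confirm=False):
    """Drop the table only when confirm=True; return whether a drop was issued."""
    if not confirm:
        return False
    db.drop_table(table_name)
    return True


# Example (destructive; requires an open connection `db`):
# cleanup(db, "llms_txt_chunks", confirm=True)
```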

🎉 That's it! You fetched a site's llms.txt, chunked it, embedded it, stored it in LanceDB Cloud, and ran semantic search, all in one notebook.

Next steps: Swap the URL param to index any site's llms.txt. Try https://docs.anthropic.com/llms.txt or https://docs.stripe.com/llms.txt!
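
When indexing several sites, it may help to give each one its own table. The helper below is a hypothetical naming scheme, not something the notebook defines: it derives a table name from the URL's host name.

```python
import re
from urllib.parse import urlparse


def table_name_for(url):
    """Derive a table name from the host part of an llms.txt URL."""
    host = urlparse(url).netloc.lower()
    slug = re.sub(r"[^a-z0-9]+", "_", host).strip("_")
    return f"llms_txt_{slug}"


for url in ("https://docs.anthropic.com/llms.txt", "https://docs.stripe.com/llms.txt"):
    print(url, "->", table_name_for(url))
```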

Built with ❤️ on Alph + LanceDB Cloud