LanceDB Retrieval with Alph
Use LanceDB to retrieve snippets from Alph's llms.txt
Index & Search llms.txt with LanceDB Cloud
Build a semantic search engine over any website's llms.txt file, powered by LanceDB Cloud and Alph.
What is llms.txt?
llms.txt is an emerging standard: a Markdown file at the root of a website that provides structured context for LLMs. Think of it as robots.txt for AI. It tells language models what a site is about, links to docs, and surfaces key facts.
What we'll do
| Step | Description |
|---|---|
| 1 | Install dependencies |
| 2 | Connect to LanceDB Cloud |
| 3 | Fetch & parse llms.txt from runalph.ai |
| 4 | Chunk, generate embeddings with a local sentence-transformer & ingest into a LanceDB table |
| 5 | Interactive semantic search with @param widgets |
| 6 | Cleanup |
Step 1 · Install Dependencies
Step 2 · Connect to LanceDB Cloud
Credentials are loaded from the .env file in your project root.
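The notebook's connection cell is not reproduced on this page, so the following is only a minimal sketch of what it might look like. The environment variable names (`LANCEDB_URI`, `LANCEDB_API_KEY`, `LANCEDB_REGION`) and the helper names are assumptions for illustration; `lancedb.connect` is the real entry point of the `lancedb` package. The sketch reads from `os.environ`, so the `.env` file would need to be loaded first, for example with `load_dotenv()` from the `python-dotenv` package.

```python
import os


def load_lancedb_config(env=None):
    """Read LanceDB Cloud settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # NOTE: these variable names are assumptions, not taken from the notebook.
    config = {
        "uri": env.get("LANCEDB_URI", ""),
        "api_key": env.get("LANCEDB_API_KEY", ""),
        "region": env.get("LANCEDB_REGION", "us-east-1"),
    }
    missing = [key for key in ("uri", "api_key") if not config[key]]
    if missing:
        raise ValueError(f"Missing LanceDB settings: {', '.join(missing)}")
    return config


def mask_secret(secret, visible=8, masked=20):
    """Keep the first `visible` characters and replace the rest with bullets."""
    return secret[:visible] + "•" * masked


def connect(config):
    """Open a LanceDB Cloud connection (requires `pip install lancedb`)."""
    import lancedb  # imported lazily so the helpers above work without it

    return lancedb.connect(
        uri=config["uri"], api_key=config["api_key"], region=config["region"]
    )
```

Masking the key before printing it keeps the cell output safe to share while still confirming which key was loaded.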
URI: db://default-jhihj6
Region: us-east-1
API Key: sk_IQYKP••••••••••••••••••••
Step 3 · Fetch & Parse llms.txt
We pull the raw Markdown from runalph.ai/llms.txt and split it into meaningful chunks, one per section heading. Each chunk becomes a searchable document in our vector database.
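Fetching and heading-based chunking can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's actual cell: the function names are hypothetical, and the splitter treats every `#` to `######` line as a chunk boundary without checking for fenced code blocks.

```python
import re
import urllib.request

# A Markdown ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def fetch_text(url, timeout=10):
    """Download a URL and decode the body as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def split_by_headings(markdown):
    """Split Markdown into one chunk per heading: {"title", "text"}."""
    matches = list(HEADING.finditer(markdown))
    chunks = []
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[start:end].strip()
        if body:  # skip headings that have no content under them
            chunks.append({"title": match.group(2), "text": body})
    return chunks
```

A call such as `split_by_headings(fetch_text("https://runalph.ai/llms.txt"))` would then yield the chunk list. Any text that appears before the first heading is dropped by this sketch.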
Fetched 6,994 characters from https://runalph.ai/llms.txt
Preview:
    # Alph
    > Alph is a cloud platform where Jupyter notebooks are first-class citizens. Standard .ipynb files, cloud compute via JupyterLab projects, built-in AI assistance (Claude, GPT, Gemini), GitHub sync, notebook publishing, embeddable notebooks, scheduled automations, and web app hosting - all in a multi-tenant organization workspace.
    - Website: https://runalph.ai
    - Documentation: https://docs.runalph.ai
    - GitHub: https://github.com/alph-ai
    - CLI: `pip install alphai` (PyPI: https://pypi.org …
Step 4 · Chunk, Embed & Ingest
We split by Markdown headings, generate embeddings with a local sentence-transformer, and push everything into LanceDB Cloud in one go.
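As a rough sketch of the embed-and-ingest step, assuming chunks shaped like `{"title": ..., "text": ...}`: the model name is the one shown in the load report in this cell's output, while the table name `llms_txt_chunks`, the row layout, and the function names are hypothetical. `SentenceTransformer.encode` and `create_table(..., mode="overwrite")` are real calls in `sentence-transformers` and `lancedb`.

```python
def build_rows(chunks, vectors):
    """Pair each chunk with its embedding as a LanceDB-ready row."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    return [
        {
            "id": i,
            "title": chunk["title"],
            "text": chunk["text"],
            # Plain floats keep the row independent of the array library.
            "vector": [float(x) for x in vector],
        }
        for i, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]


def embed_and_ingest(db, chunks, table_name="llms_txt_chunks"):
    """Embed chunk texts and write them to a LanceDB table."""
    # Lazy import: requires `pip install sentence-transformers`.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    vectors = model.encode([chunk["text"] for chunk in chunks])
    rows = build_rows(chunks, vectors)
    # mode="overwrite" replaces any existing table with the same name.
    return db.create_table(table_name, data=rows, mode="overwrite")
```

Overwriting on each run keeps the notebook re-runnable; appending to an existing table would be the alternative if several sites were indexed into the same table.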
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Parsed 6 chunks
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
| Key | Status |
|---|---|
| embeddings.position_ids | UNEXPECTED |
Notes: UNEXPECTED can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Generated 6 embeddings (dim=384)
Step 5 · Semantic Search
Search across the indexed llms.txt content. Change the query and re-run!
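A search cell along these lines would embed the query with the same model and ask LanceDB for the nearest rows. This is a sketch under assumptions: `table` is an open LanceDB table whose rows have `title` and `text` fields, `model` is the sentence-transformer used at ingest time, and both function names are hypothetical. The `search(...).limit(...).to_list()` chain and the `_distance` result field come from the LanceDB Python API.

```python
def search_chunks(table, model, query, k=5):
    """Embed the query and return the k nearest rows from the table."""
    query_vector = model.encode(query)
    # LanceDB adds a `_distance` field to each vector search result.
    return table.search(query_vector).limit(k).to_list()


def format_hit(rank, row, width=80):
    """Render one result as a short, single-line summary."""
    snippet = " ".join(row["text"].split())  # collapse runs of whitespace
    if len(snippet) > width:
        snippet = snippet[:width] + "…"
    return f"#{rank} {row['title']} (distance: {row['_distance']:.4f}) {snippet}"
```

Lower distances mean closer matches, so results come back ordered from most to least similar.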
Top 5 results for: "How does AI work in Alph notebooks?"
#1 Guides (distance: 1.0353)
    - [Automate Notebooks](https://docs.runalph.ai/guides/automate-notebooks): Schedule notebook cells to run on a cron - ETL pipelines, reports, model retraining, monitoring; configure cell ranges, view…
#2 Core Concepts (distance: 1.0404)
    - [Notebooks](https://docs.runalph.ai/concepts/notebooks): Creating, editing, and executing standard .ipynb notebooks; keyboard shortcuts; AI cell generation; publishing and sharing; semantic search;…
#3 Docs (distance: 1.0682)
    - [Welcome to Alph](https://docs.runalph.ai): Platform overview - what Alph is, who it's for (AI/ML engineers, data scientists, data teams), and how to get started -…
#4 Alph (distance: 1.0705)
    > Alph is a cloud platform where Jupyter notebooks are first-class citizens. Standard .ipynb files, cloud compute via JupyterLab projects, built-in AI assistance (Claude, GPT, Gemini), GitHub sync,…
#5 Optional (distance: 1.1855)
    - [Anthropic Notebooks](https://docs.runalph.ai/examples/anthropic): Run Anthropic SDK examples live in Alph - [OpenAI Notebooks](https://docs.runalph.ai/examples/openai): Run OpenAI SDK examples…
Step 6 · Cleanup
Uncomment and run to drop the table when you're done.
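The cleanup cell itself is not shown here. A minimal sketch, assuming `db` is the LanceDB connection from Step 2 and using a hypothetical helper name, could guard the drop so that re-running the cell does not fail; `table_names()` and `drop_table()` are methods on a LanceDB connection.

```python
def drop_table_if_exists(db, table_name):
    """Drop `table_name` when present; return True if it was dropped."""
    if table_name in db.table_names():
        db.drop_table(table_name)
        return True
    return False
```

Calling it with the table name used at ingest time removes the indexed chunks; the function returns `False` when there is nothing to drop.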
That's it! You fetched a site's llms.txt, chunked it, embedded it, stored it in LanceDB Cloud, and ran semantic search, all in one notebook.
Next steps: Swap the URL param to index any site's llms.txt. Try https://docs.anthropic.com/llms.txt or https://docs.stripe.com/llms.txt!
Built with ❤️ on Alph + LanceDB Cloud