Subreddit Summarization Querying
Demo: Summarize and search through long Reddit posts using dlt, Notion, and LanceDB
If you have the attention span to read those extra-long Reddit posts, you deserve respect. If you don't, then you deserve this demo.
By the end of this 100% free demo, you'll have something like this, without needing to be a Python pro (well, not the happiest example...):

So what exactly is this Colab for?
TL;DR: You'll learn how to automatically load AI-summarized content from a specific subreddit into Notion, making content management and review more efficient for creators.

The full scoop:
- This notebook is your testament to the fact that YES, you can indeed automate the summary of those never-ending Reddit posts and park them neatly into Notion, all without spending a dime.
- Consider this a one-stop-shop template to breeze through content from any subreddit, because, let's face it, nobody has the time to read that much anymore.
- If you fancy a bit of coding, customize your data source and tweak this setup to do anything else AI might handle, like:
- Bulk loading comments for sentiment analysis.
- Automating translations across any language.
The coding corner
1. Install and import necessary libraries:
Successfully installed h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 notion_client-2.2.1 praw-7.7.1 prawcore-2.4.0 update-checker-0.18.0
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
2. Initialize the PRAW (Python Reddit API Wrapper) client and the summarizer using Facebook's BART model:
3. Define helper functions:
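The run log in step 6 shows a summarization failing with "index out of range in self" and then "Splitting the text", which suggests the helpers fall back to summarizing a long post in pieces. Here is a minimal sketch of that idea, not the notebook's actual helper code: `split_text`, `summarize_with_fallback`, and the toy summarizer are hypothetical names, and the word-based chunking is an assumption.

```python
from typing import Callable, List


def split_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_with_fallback(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Summarize text; if the summarizer raises, summarize it chunk by chunk."""
    try:
        return summarize(text)
    except Exception as exc:  # e.g. an IndexError when the input is too long
        print(f"Summarization failed: {exc}. Splitting the text.")
        chunks = split_text(text, max_words)
        # A chunk that still fails will raise here; this sketch does not recurse.
        return " ".join(summarize(chunk) for chunk in chunks)


def toy_summarizer(text: str) -> str:
    """Stand-in for the real model: rejects long inputs, keeps the first 5 words."""
    if len(text.split()) > 10:
        raise IndexError("index out of range in self")
    return " ".join(text.split()[:5])


long_post = " ".join(f"word{i}" for i in range(25))
summary = summarize_with_fallback(long_post, toy_summarizer, max_words=10)
print(summary)
```

In the real notebook, the `summarize` argument would wrap the BART summarizer from step 2; the chunk size would need to stay under that model's input limit.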
4. Define your custom dlt resource:
In the function below, we are using dlt.sources.incremental to perform incremental loading. It is used to track a specific field in the data source, in this case, the Created_utc field, which represents the time when a post was created.
The initial_value parameter is set to "1970-01-01T00:00:00Z", which is the start of the Unix epoch time. This means that on the first run of the pipeline, it will load all posts since this time.
On subsequent runs, dlt.sources.incremental will keep track of the maximum Created_utc value that it has seen, and only load posts that have a Created_utc value greater than this. This is how it achieves incremental loading: by only loading new data that has been created since the last run.
Without using this functionality of dlt, you would have to manually keep track of the last Created_utc value that you have seen, and manually filter the posts to only include those that are newer. This would involve more complex code and potentially error-prone manual tracking.
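To make that last point concrete, here is a rough sketch of the manual bookkeeping that dlt.sources.incremental saves you from. It is plain Python, not dlt code: `load_new_posts` and the sample posts are hypothetical, and it assumes every Created_utc value is a UTC ISO 8601 string in the same format, so that string comparison matches chronological order.

```python
from typing import Dict, List, Tuple

Post = Dict[str, str]


def load_new_posts(posts: List[Post], last_seen: str) -> Tuple[List[Post], str]:
    """Return posts newer than last_seen, plus the updated high-water mark."""
    new_posts = [p for p in posts if p["Created_utc"] > last_seen]
    # The new cursor is the largest timestamp seen so far.
    new_last_seen = max([last_seen] + [p["Created_utc"] for p in new_posts])
    return new_posts, new_last_seen


posts = [
    {"Title": "first", "Created_utc": "2024-07-01T10:00:00Z"},
    {"Title": "second", "Created_utc": "2024-07-02T10:00:00Z"},
]

# First run: start from the Unix epoch, so everything is loaded.
cursor = "1970-01-01T00:00:00Z"
first_run, cursor = load_new_posts(posts, cursor)

# A new post appears; the second run only picks up that one.
posts.append({"Title": "third", "Created_utc": "2024-07-03T10:00:00Z"})
second_run, cursor = load_new_posts(posts, cursor)

print([p["Title"] for p in first_run])
print([p["Title"] for p in second_run])
```

Note that this sketch keeps the cursor in a variable, so it would be lost between sessions. You would also have to persist it somewhere yourself, which is part of the bookkeeping dlt takes off your hands.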
5. Define Notion as a custom dlt destination:
While dlt supports a variety of regularly tested destinations, Notion is not one of them: within dlt it is typically used as a data source. For guidance on using Notion as a source, refer to the official documentation. Because dlt lets you define your own custom destinations, setting up Notion as one is a good way to learn how that mechanism works.
It's important to note that if you have configured a dlt resource with incremental loading, you must also define your destination as a dlt destination to ensure the incremental loading functions correctly.
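Whatever the destination function looks like, at some point each row has to be turned into the property payload that the Notion API expects when creating a page. Below is a small sketch of that mapping only. The column names (Title, Summary, URL, Created_utc) and the helper name are assumptions for illustration; they must match the properties of your own Notion database.

```python
from typing import Any, Dict

# Notion rejects rich-text content strings longer than 2000 characters.
NOTION_TEXT_LIMIT = 2000


def to_notion_properties(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one pipeline row to a Notion `properties` payload.

    The property names used here are hypothetical; the value shapes
    (title, rich_text, url, date) follow the Notion API property formats.
    """
    return {
        "Title": {"title": [{"text": {"content": row["Title"][:NOTION_TEXT_LIMIT]}}]},
        "Summary": {"rich_text": [{"text": {"content": row["Summary"][:NOTION_TEXT_LIMIT]}}]},
        "URL": {"url": row["URL"]},
        "Created_utc": {"date": {"start": row["Created_utc"]}},
    }


example_row = {
    "Title": "A very long story",
    "Summary": "x" * 2500,  # longer than the limit on purpose
    "URL": "https://www.reddit.com/r/example/",
    "Created_utc": "2024-07-16T12:00:00Z",
}
properties = to_notion_properties(example_row)
print(len(properties["Summary"]["rich_text"][0]["text"]["content"]))
```

Inside a custom destination, a payload like this would typically be sent with the notion_client call `pages.create(parent={"database_id": ...}, properties=properties)`. Truncating is the simplest way to respect the length limit; splitting a long summary across several rich-text objects would keep the full text.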
6. Create and run your dlt pipeline:
Upon executing the code snippet below, your Notion database will be populated with basic information and summaries of subreddit posts. Utilizing incremental loading ensures that subsequent executions do not create duplicate entries.
To explore different content, simply change the subreddit_name argument of the subreddit_posts function in your dlt pipeline.
WARNING:praw:It appears that you are using PRAW in an asynchronous environment. It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io. See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.
Summarization successful!
Summarization successful!
Summarization successful!
Summarization successful!
Summarization successful!
Summarization successful!
Summarization successful!
Summarization failed: index out of range in self. Splitting the text.
Summarization successful!
Summarization failed: index out of range in self. Splitting the text.
Summarization successful!
Summarization successful!
Pipeline reddit_notion_pipeline load step completed in 2.26 seconds
1 load package(s) were loaded to destination Notion and into dataset None
The Notion destination used <dlt.common.configuration.specs.base_configuration.CredentialsConfiguration object at 0x7a56f1d35270> location to store data
Load package 1721134597.1803734 is LOADED and contains no failed jobs
Good Things Come to Those who Finish Code Demos...
Congrats on having a reasonably long attention span!
In this part, you'll do some additional cool stuff with the same Reddit data using LanceDB.
What cool stuff?
TL;DR: You'll basically have your own mini search engine for querying subreddit post summaries.

The full scoop:
- If you've never had the chance to work with vector databases, this is your calling.
- Otherwise, this is a template to streamline your vector data pipelines with dlt and LanceDB, both open-source!
- If you're up for more advanced Machine Learning tasks, this is a great starting point where you don't need to worry about the data loading part.
The coding corner
1. Install and import necessary libraries:
Successfully installed deprecation-2.1.0 dlt-0.5.1 gitdb-4.0.11 gitpython-3.1.43 giturlparse-0.12.0 hexbytes-1.2.1 jsonpath-ng-1.6.1 lancedb-0.10.1 makefun-1.15.4 orjson-3.10.6 overrides-7.7.0 pathvalidate-3.2.0 pendulum-3.0.0 ply-3.11 py-1.11.0 pylance-0.14.1 ratelimiter-1.2.0.post0 retry-0.9.2 semver-3.0.2 simplejson-3.19.2 smmap-5.0.1 time-machine-2.14.2 tomlkit-0.13.0
2. Initialize Notion verified source:
This command sets up a pipeline that extracts data from the Notion verified source and loads it into a LanceDB destination. You can check what it has loaded in Files.
Looking up the init scripts in https://github.com/dlt-hub/verified-sources.git...
No files to update, exiting
3. Import the dlt.source that fetches databases from Notion:
Note that we're also defining a dlt.transformer function that allows you to manipulate data from a dlt.resource. The reason is to pass clean table data to the LanceDB adapter later, without any metadata that notion_databases yields.
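As a rough illustration of what such a cleaning step might do, the sketch below flattens Notion page objects into plain rows. It is written as an ordinary generator so it runs on its own; in the notebook, a function like this would be decorated with dlt.transformer and fed by the notion_databases source. The function names and the sample page are hypothetical, and the exact shape of what notion_databases yields is an assumption based on the Notion API page format.

```python
from typing import Any, Dict, Iterable, Iterator


def flatten_page(page: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the property values of a Notion page, dropping API metadata."""
    row: Dict[str, Any] = {}
    for name, prop in page.get("properties", {}).items():
        kind = prop.get("type")
        value = prop.get(kind)
        if kind in ("title", "rich_text"):
            # Text properties are lists of fragments; join their plain text.
            row[name] = "".join(part.get("plain_text", "") for part in value or [])
        elif kind == "date":
            row[name] = value["start"] if value else None
        else:
            # Other property types (url, number, ...) are passed through as-is.
            row[name] = value
    return row


def clean_rows(pages: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per Notion page."""
    for page in pages:
        yield flatten_page(page)


sample_page = {
    "object": "page",
    "id": "hypothetical-page-id",
    "properties": {
        "Title": {"type": "title", "title": [{"plain_text": "A long post"}]},
        "Summary": {
            "type": "rich_text",
            "rich_text": [{"plain_text": "Part one. "}, {"plain_text": "Part two."}],
        },
        "URL": {"type": "url", "url": "https://www.reddit.com/r/example/"},
    },
}
rows = list(clean_rows([sample_page]))
print(rows[0])
```

The result is a flat dictionary with one key per Notion property, which is the kind of clean table data the LanceDB adapter can work with.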
4. Create and run your dlt pipeline with LanceDB as destination:
LanceDB has an integration with dlt. All you need to do is pass your data, along with the column you want to embed, to the adapter and run the pipeline.
Pipeline reddit_lancedb_pipeline load step completed in 5.95 seconds
1 load package(s) were loaded to destination LanceDB and into dataset reddit_top_posts
The LanceDB destination used <dlt.destinations.impl.lancedb.configuration.LanceDBCredentials object at 0x78e4b4584670> location to store data
Load package 1721294703.4760716 is LOADED and contains no failed jobs
5. Query your data:
This script connects to a LanceDB database, retrieves data from a specific table, searches for a query within the table, and converts the search results to a pandas DataFrame.
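The steps just described can be sketched as a small function. This is an approximation, not the notebook's original script: `search_summaries` is a hypothetical helper, the default database path is an assumption, and it presumes the table was created with an embedding function so that a plain-text query can be embedded. The `connect` parameter exists only so the flow can be tried out without LanceDB installed.

```python
from typing import Any, Callable, Optional


def search_summaries(
    query: str,
    table_name: str,
    db_uri: str = ".lancedb",
    limit: int = 3,
    connect: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Search a LanceDB table and return the top matches as a pandas DataFrame."""
    if connect is None:
        # Imported lazily so the function can be defined without lancedb present.
        import lancedb

        connect = lancedb.connect
    db = connect(db_uri)               # connect to the database
    table = db.open_table(table_name)  # retrieve the table
    # Run the search, cap the number of hits, convert to a DataFrame.
    return table.search(query).limit(limit).to_pandas()


# Example call (the table name is hypothetical; check your database for the real one):
# df = search_summaries("family drama", "reddit_top_posts___notion_pages")
```

The search returns the rows whose embedded column is closest to the query, so the wording of the query does not need to match the summaries exactly.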
I was 21 when my fiance asked me to marry him. We were only engaged for 6 months before the inncident. My middle oldest sister, lets call her Nicky, was a very cold person. She only ever opened up to my fiance as she said she saw him as a brother. She and I never saw eye to eye, I loved her dearly because she was my sister but didn't like her as a person. The night was going smoothly until Nicky spotted a guy across the room whom she claimed she wanted to "climb like a tree" She walked over to him and within a few minutes she was back and she had a sour expression on her face. She then told me the guy didn't want her number but he wanted mine instead. I don't remember what happened next as I blacked out and the next morning I woke up on a hard sofa, my head pounding. When I came to, I realised I was in Nicky's friends house and my phone was sitting on the glass table in front of me, but it was flat. I tried to explain that my phone went flat but he then went on screaming about how could I cheat on him. Nicky told her ex fiance that she had slept with him multiple times. When he found out he left and never returned any of her calls or texts. Nicky's mother kicked her out and threw her things out. She was homeless and single in less than a day and a half. She tried everything to get her fiance back and her family back. But they all chose Nicky and her side and left her to fend for herself. She has not spoken to any of them in two years and doesn't know if she can ever forgive them. If you would like to talk to her about her story, please contact her on 020 3615 9090. For confidential support call the Samaritans in the UK on 08457 90 90 90, visit a local Samaritans branch or click here for details. In the U.S. call the National Suicide Prevention Line on 1-800-273-8255. "I never wanted kids, but was never adamantly against having one" "My wife's sister died. All of a sudden "family" is SUPER important to my wife" "I've read every book. 
I've worked shifts 6 days a week for a decade to pay for tens of thousands (probably 100,000's) of therapy, behaviorists, counseling, classes" "Guess what? Grandma and grandpa say the kid is "too much". They haven't helped for more than a day a month in almost 7 years. And here I am - on reddit on my laptop, tethered to my phone in a park after dark"