Weaviate Ragas Demo

vector-search, vector-database, retrieval-augmented-generation, ragas, llm-frameworks, operations, function-calling, weaviate-recipes, integrations, Python, generative-ai

alph-notebooks/weaviate-recipes / ragas-demo.ipynb


[1]

[2]

[{'question': 'Why would I use Weaviate as my vector database?',
, 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'},
, {'question': 'What is the difference between Weaviate and for example Elasticsearch?',
, 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'},
, {'question': 'Do I need to know about Docker (Compose) to use Weaviate?',
, 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'},
, {'question': 'What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?',
, 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'},
, {'question': "Are there any 'best practices' or guidelines to consider when designing a schema?",
, 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects lead to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768 x float32 = 3 KB. This can easily make a difference if you have millions of vectors.) As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-reference takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 individually. This cost, however, will probably only matter at really large scale."},
, {'question': 'Is it possible to create one-to-many relationships in the schema?',
, 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'},
, {'question': 'Do Weaviate classes have namespaces?',
, 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'},
, {'question': 'Are there restrictions on UUID formatting? Do I have to adhere to any standards?',
, 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."},
, {'question': 'If I do not specify a UUID during adding data objects, will Weaviate create one automatically?',
, 'answer': 'Yes, a UUID will be created if not specified.'},
, {'question': 'Can I use Weaviate to create a traditional knowledge graph?',
, 'answer': 'Yes, you can! Weaviate supports ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'},
, {'question': 'Why does Weaviate have a schema and not an ontology?',
, 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."},
, {'question': 'How can I retrieve the total object count in a class?',
, 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'},
, {'question': "How do I get the cosine similarity from Weaviate's certainty?",
, 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"},
, {'question': 'What is the best way to iterate through objects? Can I do paginated API calls?',
, 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'},
, {'question': "How does Weaviate's vector and scalar filtering work?",
, 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."},
, {'question': 'Can I request a feature in Weaviate?',
, 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}]
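
Two of the FAQ answers above reduce to a line or two of code: the conversion cosine_sim = 2*certainty - 1, and deterministic UUIDs (v5) derived from object fields. The following is a minimal sketch using only Python's standard library. The function names, the choice of `uuid.NAMESPACE_DNS` as namespace, the `|` separator, and the sample values are all assumptions made for illustration; none of them come from the Weaviate client.

```python
import uuid


def certainty_to_cosine(certainty: float) -> float:
    """Apply the formula quoted in the FAQ: cosine_sim = 2 * certainty - 1."""
    return 2 * certainty - 1


def cosine_to_certainty(cosine_sim: float) -> float:
    """Inverse of the formula above, mapping [-1, 1] back to [0, 1]."""
    return (cosine_sim + 1) / 2


def deterministic_uuid(*fields: str) -> str:
    """Derive a v5 UUID from object fields so the same fields give the same ID.

    The namespace and the "|" separator are arbitrary choices for this sketch.
    """
    name = "|".join(fields)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


if __name__ == "__main__":
    print(certainty_to_cosine(0.75))
    # Hypothetical class name and property value, used only as input fields.
    print(deterministic_uuid("JeopardyQuestion", "some question text"))
```

`str(uuid.UUID)` yields the canonical 8-4-4-4-12 textual form, which is the representation the FAQ answer asks for.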

[15]
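
The output that follows is columnar: one list per key (`question`, `answer`, `contexts`), with the three lists aligned by position. The code that produced it is not part of this export, so the block below is only a sketch of how row-style records could be pivoted into that shape. `to_columns`, its parameters, and the sample strings are hypothetical, and the generated answers and retrieved contexts are assumed to arrive as plain Python lists.

```python
from typing import Dict, List, Sequence


def to_columns(
    records: Sequence[Dict[str, str]],
    generated_answers: Sequence[str],
    retrieved_contexts: Sequence[Sequence[str]],
) -> Dict[str, list]:
    """Pivot per-question rows into one list per column.

    All three inputs must be aligned: item i of each belongs to question i.
    """
    if not (len(records) == len(generated_answers) == len(retrieved_contexts)):
        raise ValueError("records, answers and contexts must have equal length")
    return {
        "question": [record["question"] for record in records],
        "answer": list(generated_answers),
        # Each question keeps its own list of retrieved context chunks.
        "contexts": [list(chunks) for chunks in retrieved_contexts],
    }


if __name__ == "__main__":
    # Illustrative inputs only; not the notebook's actual data.
    faq = [{"question": "Do Weaviate classes have namespaces?", "answer": "Yes."}]
    columns = to_columns(
        faq,
        ["Yes, Weaviate classes act as namespaces."],
        [["in Weaviate classes act as namespaces"]],
    )
    print(columns["question"])
```

Raising on mismatched lengths is a deliberate choice here: a silent misalignment would pair answers with the wrong questions and distort any downstream evaluation.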

[16]

{'question': ['Why would I use Weaviate as my vector database?',
,  'What is the difference between Weaviate and for example Elasticsearch?',
,  'Do I need to know about Docker (Compose) to use Weaviate?',
,  'What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?',
,  "Are there any 'best practices' or guidelines to consider when designing a schema?",
,  'Is it possible to create one-to-many relationships in the schema?',
,  'Do Weaviate classes have namespaces?',
,  'Are there restrictions on UUID formatting? Do I have to adhere to any standards?',
,  'If I do not specify a UUID during adding data objects, will Weaviate create one automatically?',
,  'Can I use Weaviate to create a traditional knowledge graph?',
,  'Why does Weaviate have a schema and not an ontology?',
,  'How can I retrieve the total object count in a class?',
,  "How do I get the cosine similarity from Weaviate's certainty?",
,  'What is the best way to iterate through objects? Can I do paginated API calls?',
,  "How does Weaviate's vector and scalar filtering work?",
,  'Can I request a feature in Weaviate?'],
, 'answer': ["You might consider using Weaviate as your vector database for several compelling reasons based on the search results provided:\n\n1. **Full Vector Database Capabilities**: Weaviate offers comprehensive vector database functionalities, even to customers using the free service. This means you can leverage all of its features without incurring additional costs.\n\n2. **Third Wave Database Technology**: Weaviate represents a new generation of database technology where data is processed first by a machine learning model. AI models assist in the processing, storing, and retrieval of data, making it a smart choice for AI-driven applications.\n\n3. **Cloud-Native and Modular**: As a cloud-native solution, Weaviate is designed to work seamlessly in cloud environments, ensuring scalability and flexibility. Its modular architecture allows it to cover a wide range of use cases, making it adaptable to various needs.\n\n4. **Real-Time Vector Database**: Weaviate is built to handle real-time queries, which is essential for applications that require immediate insights from their data.\n\n5. **Open-Source**: Being open-source, Weaviate allows for transparency, community contributions, and the possibility to modify the software to suit your specific requirements.\n\n6. **Ease of Use**: The goal of Weaviate is to simplify the creation of semantic systems or vector search engines. Its APIs are based on GraphQL, which is known for its ease of use and flexibility.\n\n7. **Strong Focus on Performance**: Weaviate is designed with performance in mind, ensuring that it can handle the demands of vector search and machine learning workloads efficiently.\n\n8. **Combination of ANN and Database Features**: Weaviate combines the speed of Approximate Nearest Neighbor (ANN) algorithms with traditional database features like backups, persistence, and replication. This hybrid approach offers the best of both worlds.\n\n9. 
**Multiple Access Methods**: You can interact with Weaviate using GraphQL, REST, and client libraries in multiple programming languages, providing flexibility in how you integrate and work with the database.\n\n10. **Scalability for Machine Learning Models**: Weaviate is built to scale your machine learning models, which is crucial for applications that grow over time and require a database that can keep up with increasing data volumes and complexity.\n\nIn summary, Weaviate's combination of AI-driven processing, cloud-native architecture, real-time capabilities, and open-source modularity make it a strong candidate for anyone looking",
,  'The difference between Weaviate and Elasticsearch primarily lies in the way they handle and search through data. Both systems use inverted indices to store data and values, which is a common technique that enables fast search operations.\n\nElasticsearch is a well-known search engine that relies heavily on inverted indices for its search capabilities. It is designed to handle a variety of use cases, including text search, analytics, and aggregation of large volumes of data. Elasticsearch is known for its full-text search capabilities, scalability, and real-time indexing.\n\nWeaviate, on the other hand, not only uses inverted indices but also incorporates vector search capabilities. This means that Weaviate processes and organizes data to allow objects to be retrieved based on their similarity to a query. It quantifies similarity and indexes vector data, which traditional databases and search engines like Elasticsearch do not typically do. This vector indexing allows for semantic search, where the meaning of the query is taken into account, rather than just keyword matching.\n\nAdditionally, Weaviate offers graph functionalities on top of its vector-search focus. While it is primarily focused on searching through data objects using inverted indices and vector indices, it also supports operations that are common in graph databases, albeit more efficiently. In traditional graph databases, certain operations like listing data can be quite expensive, but Weaviate is designed to handle these operations more effectively.\n\nIn summary, while both Weaviate and Elasticsearch use inverted indices for fast search, Weaviate distinguishes itself by also incorporating vector search for semantic understanding and offering graph database functionalities, which can enhance the search experience by understanding the context and relationships within the data.',
,  'Based on the search results provided, it is not strictly necessary to know about Docker (Compose) to use Weaviate, but it is highly recommended if you plan to deploy Weaviate using Docker. The documentation indicates that Weaviate supports deployment with Docker Compose, which is a tool for defining and running multi-container Docker applications. With Docker Compose, you can use a `docker-compose.yml` file to configure your Weaviate service.\n\nThe search results suggest that Docker Compose is a convenient option for trying out or evaluating Weaviate, as well as for development purposes. It is quick and easy to set up, making it suitable for these scenarios. However, for production environments or more complex deployments, other methods such as Kubernetes with Helm support might be more appropriate.\n\nIn summary, while you can use Weaviate without deep knowledge of Docker Compose, understanding how to use Docker and Docker Compose will greatly facilitate the process of deploying and managing Weaviate instances, especially for initial evaluation or development purposes.',
,  "When the Weaviate Docker container restarts, whether your data in the Weaviate database is lost depends on the configuration of the Docker container, specifically whether you have configured a volume for data persistence.\n\nBased on the search results provided:\n\n1. If you have no volume configured (which is the default in the Docker Compose files provided by Weaviate), your data is kept even if the container restarts. This could be due to a crash or because of `docker stop/start` commands. The data is stored inside the container's writable layer, so as long as the container itself is not removed, the data should persist across restarts.\n\n2. The search results also mention a Docker Compose configuration snippet that includes a `volumes` section. This section maps a local directory (`./data-node-3`) to the `/var/lib/weaviate` directory inside the container. This means that the data is stored outside of the container's writable layer and on the host machine's filesystem. With this configuration, the data will persist even if the container is removed and recreated, as long as the volume mapping is maintained.\n\nIn summary, if you have not configured a volume, your data should still persist across container restarts, but it is at risk if the container is removed. To ensure data persistence beyond the container's lifecycle, you should configure a volume as shown in the Docker Compose configuration snippet, mapping a host directory to the appropriate directory inside the container where Weaviate stores its data (`/var/lib/weaviate`).",
,  "When designing a schema, there are indeed several best practices and guidelines to consider. These best practices help ensure that the schema is well-structured, scalable, and efficient for the operations you intend to perform on the data. Here are some key points to keep in mind based on the provided search results and general knowledge:\n\n1. **Understand Your Data and Use Case**: Before designing your schema, have a clear understanding of the data you will be storing and the queries you will be performing. For example, if you are performing a semantic search over the content of a book, you might want to structure your schema to include chapters and paragraphs as separate entities or properties to facilitate more granular searches.\n\n2. **Define Clear Metadata**: Start by defining the metadata for your schema, such as its name (often referred to as `class` in some systems). This helps in identifying and organizing the data within your database.\n\n3. **Data Properties**: Clearly define the data properties, which are the attributes or fields that will hold the data within your schema. Ensure that each property is appropriately named and typed according to the data it will store.\n\n4. **Group Related Objects**: When using systems like Weaviate, it's important to group objects that you want to search together in the same class. This is because vector searches, which are used for semantic searches, can only be performed within a single vector space.\n\n5. **Normalization vs. Denormalization**: Depending on the database system you are using (relational, NoSQL, etc.), consider whether to normalize or denormalize your data. Normalization reduces redundancy and improves data integrity, while denormalization can improve read performance at the cost of potential data duplication.\n\n6. **Scalability**: Design your schema with scalability in mind. 
Anticipate future changes and growth in data volume, and ensure that your schema can accommodate these changes without significant rework.\n\n7. **Consistency**: Maintain consistency in naming conventions, data types, and structures throughout your schema. This makes it easier to understand and maintain the schema over time.\n\n8. **Indexing**: Determine which fields will be queried frequently and consider indexing them to improve query performance.\n\n9. **Documentation**: Document your schema design decisions, including the rationale behind the structure, properties, and any constraints or indexes applied. This documentation is invaluable for future maintenance and for new team members to understand the schema.\n\n10. **Validation**:",
,  'Yes, it is possible to create one-to-many relationships in the schema. According to the provided search results, you can reference one or more objects (Class -> one or more Classes) through cross-references in the schema. This means that you can establish relationships where one instance of a class can be associated with multiple instances of another class, which is the essence of a one-to-many relationship.\n\nThe search results also mention that referring to lists or arrays of primitives will be available soon, indicating that the capability to handle one-to-many relationships may be further enhanced in the future.\n\nAdditionally, the search results suggest that if you have classes with multiple links, it can be useful to resolve these connections in a single query. For cases where there is a single (bi-directional) reference, you could denormalize the links (e.g., with an ID field) and resolve them during the search, which is another way to handle relationships between data entities in a schema.',
,  'Yes, Weaviate classes act as namespaces. This is evident from the search results, particularly from the documentation titled "blog-weaviate-1-14-release.json," which states, "IDs don\'t have to be globally unique, because in Weaviate classes act as namespaces." This means that each class in Weaviate has its own namespace, allowing for the same ID to be used in different classes without causing conflicts. Each class also has a different Hierarchical Navigable Small World (HNSW) index and is isolated on disk, further emphasizing the separation and namespace-like behavior of classes within Weaviate.',
,  'Yes, there are restrictions on UUID formatting, and you do have to adhere to certain standards. According to the search results, specifically from the documentation related to Weaviate (an open-source vector search engine), the UUID must be presented as a string matching the Canonical Textual representation. This means that when you provide a UUID, it should follow the standard format, which typically looks like this: `123e4567-e89b-12d3-a456-426614174000`.\n\nThe documentation also mentions that if you do not specify a UUID when working with Weaviate, it will generate a `v4` UUID for you, which is a randomly generated UUID. The `v4` UUIDs are one of the several versions of UUIDs defined by the standards, and they are generated based on random or pseudo-random numbers.\n\nThe standard for UUIDs is defined in RFC 4122, which outlines the format consisting of 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form of 8-4-4-4-12 for a total of 36 characters (including the hyphens). The version of the UUID is indicated by the first character of the third group, and the variant is indicated by the first one to three bits of the first character of the fourth group.\n\nIn summary, when generating or using UUIDs, you should adhere to the standard format and conventions as outlined in RFC 4122 and ensure that they are represented in the Canonical Textual representation, especially when working with systems like Weaviate that enforce this requirement.',
,  'Yes, if you do not specify a UUID when adding data objects to Weaviate, it will create one automatically. The documentation confirms that a UUID will be generated if it is not provided by the user. Additionally, the Weaviate Python client offers a function to create a deterministic UUID based on an object, which can be used if you want to specify the UUID at import time. However, if you choose not to provide a UUID, Weaviate will handle the generation of a random UUID for you.',
,  'Yes, you can use Weaviate to create a traditional knowledge graph. Weaviate supports ontology and RDF-like definitions in its schema, which allows you to define the structure of your knowledge graph in a way that is similar to traditional RDF-based systems. It is designed to be scalable and provides a GraphQL API, which makes it easy to query and interact with your knowledge graph.\n\nAlthough Weaviate is not strictly based on RDF or schema.org, it is inspired by these standards. The use of GraphQL, which has gained popularity since being open-sourced by Facebook, is a key feature of Weaviate that enables the representation and querying of data within the knowledge graph.\n\nIn summary, Weaviate offers the necessary features and capabilities to create and manage a traditional knowledge graph, with the added benefits of modern technologies like GraphQL for efficient data handling.',
,  "Weaviate has a schema rather than an ontology because it focuses on the representation of data, particularly within the context of its GraphQL API. The schema in Weaviate is designed to express the structure and organization of the data that users will interact with through the API. This approach is aligned with the practical needs of developers who are building applications and services that query and manipulate data within Weaviate's knowledge graph.\n\nHowever, it's important to note that while Weaviate uses a schema for data representation, it does not preclude the use of ontological concepts. In fact, one of Weaviate's core features is its ability to semantically interpret the schema, which allows users to express an ontology within that schema. This semantic interpretation enables users to perform searches for concepts rather than being limited to formally defined entities. This means that Weaviate can understand and leverage the relationships and meanings within the data, providing a more flexible and powerful way to work with knowledge graphs.\n\nThe choice to use a schema also aligns with Weaviate's decision to utilize GraphQL, a graph query language open-sourced by Facebook, which has become popular for its efficiency and ease of use in representing and querying data. Weaviate's approach, while inspired by RDF (Resource Description Framework) and schema.org, is tailored to provide a modern, developer-friendly experience that leverages the strengths of GraphQL in the context of a knowledge graph.",
,  'To retrieve the total object count in a class, you can use an `Aggregate` query. The search results indicate that this type of query is used to aggregate objects in a specific class to obtain the total count. For example, the query mentioned in the search results aggregates the objects in the `JeopardyQuestion` class without any restrictions, thus returning the total number of objects in that class, which is 10,000.\n\nWhile the exact syntax of the `Aggregate` query is not provided in the search results, it typically involves using a database query language or an API provided by the database system or ORM (Object-Relational Mapping) tool you are using. If you are using a SQL-based database, for instance, the query might look something like this:\n\n```sql\nSELECT COUNT(*) FROM JeopardyQuestion;\n```\n\nIf you are using an ORM or a database with a different query language, you would need to consult the specific documentation for that system to construct the appropriate `Aggregate` query to retrieve the total object count in a class.',
,  "To obtain the cosine similarity from Weaviate's `certainty`, you can use the following formula:\n\n```python\ncosine_sim = 2 * certainty - 1\n```\n\nThis formula converts the `certainty` value, which is a number between 0 and 1, into a cosine similarity score. The cosine similarity score ranges from -1 to 1, where -1 indicates completely dissimilar vectors, 0 indicates orthogonality (no similarity or dissimilarity), and 1 indicates identical vectors. The `certainty` value in Weaviate represents the normalized cosine similarity, and the formula adjusts it to the standard range of cosine similarity.",
,  "The best way to iterate through objects, according to the provided search results from Weaviate documentation, is to use cursor-based iteration or pagination through a result set. You can achieve this by using the `after` operator for cursor-based iteration and the `offset` and `limit` operators for pagination.\n\nFor cursor-based iteration, which is supported by both REST and GraphQL interfaces in Weaviate, you would typically make an API call to retrieve the first set of results and then use the `after` operator to fetch subsequent sets of objects. This method is efficient for iterating through large datasets because it allows you to continue from the last retrieved object without re-fetching the previous results.\n\nFor paginated API calls, you would use the `offset` and `limit` operators. The `limit` operator specifies the number of objects to return in a single call, while the `offset` operator indicates the starting point in the collection of objects. This method is useful for retrieving specific subsets of data or when you need to display data in a paginated format, such as in a web application.\n\nThe search results also mention that prior to certain updates, users were limited to retrieving 10 objects per query, which required crafting artificial queries to access more data. However, with the current capabilities, you can access all objects in a class more efficiently.\n\nIn summary, to iterate through objects in Weaviate, you can use:\n\n- Cursor-based iteration with the `after` operator for continuous retrieval of objects.\n- Pagination with the `offset` and `limit` operators to fetch specific subsets of data.\n\nBoth methods are supported by Weaviate's REST and GraphQL APIs, and you can choose the one that best fits your use case.",
,  'Weaviate is an open-source vector database that allows for the storage and retrieval of data objects based on their semantic properties, which are indexed with vectors. The database supports both vector and scalar filtering, enabling users to perform complex searches that combine the semantic understanding of the data with traditional database querying capabilities.\n\nVector filtering in Weaviate involves searching for data objects whose vector representations are semantically similar to a given query vector. This is achieved through vector search algorithms that can quickly identify objects with vectors close to the query vector in the vector space. This type of search is particularly useful for finding objects with similar meanings or contexts, even if they do not share exact keywords.\n\nScalar filtering, on the other hand, refers to the traditional database querying methods that filter data based on scalar fields such as numbers, strings, booleans, dates, etc. Scalar filtering allows users to perform exact matches or range queries on these fields, which can be combined with vector searches to refine the results further.\n\nWeaviate\'s ability to perform fast vector searches along with scalar filtering provides a powerful tool for users to retrieve highly relevant data. Users can specify criteria for both the semantic content of the objects (using vector search) and their structured data properties (using scalar filtering). This dual approach enables more precise and context-aware search results.\n\nFor example, a user could search for articles that are semantically related to a topic like "machine learning" by providing a query vector representing that concept and then further filter the results to only include articles published within a certain date range using scalar filtering.\n\nWeaviate is designed to be flexible with how vectors are generated. 
Teams with experience in data science and machine learning can use their own models to create vectors and import them into Weaviate. Additionally, Weaviate offers built-in options for generating vectors using pre-trained machine learning models, making it accessible for users who may not have custom models.\n\nIn summary, Weaviate\'s vector and scalar filtering work together to provide a comprehensive search experience that leverages both the semantic meaning of data and its structured attributes, allowing for nuanced and contextually relevant data retrieval.',
,  "Yes, you can request a feature in Weaviate. According to the information provided, Weaviate's development team encourages feedback and feature requests from the community. They track feature requests and discussions on GitHub pages, where you can upvote features or engage in discussions about them. Additionally, you can join their forum to discuss the roadmap and feature requests in more detail.\n\nTo request a feature, you would typically follow these steps:\n\n1. Visit the Weaviate GitHub repository.\n2. Look for the Issues section of the repository.\n3. Check if the feature you want to request has already been suggested by someone else. If it has, you can add your thoughts to the existing discussion and upvote the feature.\n4. If the feature has not been requested yet, you can create a new issue. Provide a clear and detailed description of the feature you would like to see, explaining why it would be beneficial for Weaviate and its users.\n5. Engage with the community and the developers by responding to any questions or comments regarding your feature request.\n\nRemember that the Weaviate team uses community feedback to plan future releases, so your input could help shape the direction of the project."],
, 'contexts': [['ware, customers using the free service will always be able to access all of the Weaviate\'s vector database capabilities.\n\nWeaviate vector database is an example of a "third wave" database technology. Data is processed by a machine learning model first, and AI models help in processing, storing, and ',
,   "m existing ANN solutions, let's quickly take a look at what Weaviate is. Weaviate is a cloud-native, modular, real-time vector database built to scale your machine learning models. Oh, it's also open-source, by the way. Because of its modularity, Weaviate can cover a wide variety of bases. By defaul",
,   '\n\n## General\n\n#### Q: Why would I use Weaviate as my vector database?\n\n\n  Answer\n\n> Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus',
,   '\n\n## General\n\n#### Q: Why would I use Weaviate as my vector database?\n\n\n  Answer\n\n> Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus',
,   't use-cases.\n\nWeaviate was built to combine the speed and capabilities of ANN algorithms with the features of a database such as backups, real-time queries, persistence, and replication (part of the v1.17 release). Weaviate can be accessed through GraphQL, REST, and client libraries in multiple prog'],
,  ['reason why Weaviate comes containerized.\n\n\n\n#### Q: What is the difference between Weaviate and for example Elasticsearch?\n\n\n  Answer\n\n> Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. Bu',
,   'reason why Weaviate comes containerized.\n\n\n\n#### Q: What is the difference between Weaviate and for example Elasticsearch?\n\n\n  Answer\n\n> Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. Bu',
,   'operations you can do with Weaviate is listing data. In a traditional graph database that is quite expensive. Weaviate does however have graph functionalities on top of the vector-search focus. So although its primary focus is on searching through data objects with the inverted index and/or vector i',
,   'ing models.\n\nIn plain terms, Weaviate processes and organizes your data in such a way that objects can be retrieved based on their similarity to a query. To perform these tasks at speed, Weaviate does two things that traditional databases do not. Weaviate:\n\n- Quantifies similarity\n- Indexes vector d',
,   'operations you can do with Weaviate is listing data. In a traditional graph database that is quite expensive. Weaviate does however have graph functionalities on top of the vector-search focus. So although its primary focus is on searching through data objects with the inverted index and/or vector i'],
,  ['\n\n## Overview\n\nWeaviate supports deployment with Docker Compose, which allows you to run Weaviate on any OS supported by Docker.\n\nTo start Weaviate with Docker, you can use a Docker Compose file, typically called `docker-compose.yml`. You can:\n* use the Starter Docker Compose file,\n* generate one wi',
,   '\n\n## Overview\n\nWeaviate supports deployment with Docker Compose, which allows you to run Weaviate on any OS supported by Docker.\n\nTo start Weaviate with Docker, you can use a Docker Compose file, typically called `docker-compose.yml`. You can:\n* use the Starter Docker Compose file,\n* generate one wi',
,   '\n\n## Overview\n\nWeaviate supports deployment with Docker Compose, which allows you to run Weaviate on any OS supported by Docker.\n\nTo start Weaviate with Docker, you can use a Docker Compose file, typically called `docker-compose.yml`. You can:\n* use the Starter Docker Compose file,\n* generate one wi',
,   "un on Kubernetes? Is Helm supported?\nYes, see next step.\n\n## When should or shouldn't I use Docker Compose?\nDocker Compose is quick, easy and convenient, but there are situations that it isn't suited for. We recommend to use a Docker Compose setup for trying out or evaluating Weaviate and when devel",
,   "un on Kubernetes? Is Helm supported?\nYes, see next step.\n\n## When should or shouldn't I use Docker Compose?\nDocker Compose is quick, easy and convenient, but there are situations that it isn't suited for. We recommend to use a Docker Compose setup for trying out or evaluating Weaviate and when devel"],
,  ['arts? Is my data in the Weaviate database lost?\n\n\n  Answer\n\n> There are three levels:\n> 1. You have no volume configured (the default in our `Docker Compose` files), if the container restarts (e.g. due to a crash, or because of `docker stop/start`) your data is kept\n> 2. You have no volume configure',
,   'arts? Is my data in the Weaviate database lost?\n\n\n  Answer\n\n> There are three levels:\n> 1. You have no volume configured (the default in our `Docker Compose` files), if the container restarts (e.g. due to a crash, or because of `docker stop/start`) your data is kept\n> 2. You have no volume configure',
,   "   restart: on-failure:0\n    volumes:\n      - ./data-node-3:/var/lib/weaviate\n    environment:\n      LOG_LEVEL: 'debug'\n      QUERY_DEFAULTS_LIMIT: 25\n      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'\n      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'\n      ENABLE_MODULES: 'text2vec-openai,text2ve",
,   "   restart: on-failure:0\n    volumes:\n      - ./data-node-3:/var/lib/weaviate\n    environment:\n      LOG_LEVEL: 'debug'\n      QUERY_DEFAULTS_LIMIT: 25\n      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'\n      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'\n      ENABLE_MODULES: 'text2vec-openai,text2ve",
,   "   restart: on-failure:0\n    volumes:\n      - ./data-node-3:/var/lib/weaviate\n    environment:\n      LOG_LEVEL: 'debug'\n      QUERY_DEFAULTS_LIMIT: 25\n      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'\n      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'\n      ENABLE_MODULES: 'text2vec-openai,text2ve"],
,  [" start up with a volume, all your data will be there\n\n\n\n## Schema and data structure\n\n#### Q: Are there any 'best practices' or guidelines to consider when designing a schema?\n\n*(E.g. if I was looking to perform a semantic search over a the content of a Book would I look to have Chapter and Paragrap",
,   " start up with a volume, all your data will be there\n\n\n\n## Schema and data structure\n\n#### Q: Are there any 'best practices' or guidelines to consider when designing a schema?\n\n*(E.g. if I was looking to perform a semantic search over a the content of a Book would I look to have Chapter and Paragrap",
,   "t into some key considerations while doing so.\n\n## &nbsp;&nbsp;How to define a schema\n\nAs you learned earlier, a schema definition includes a great deal of information. Let's cover a few of those properties in this section, starting with:\n- The metadata such as its name (`class`),\n- Its data `proper",
,   "t into some key considerations while doing so.\n\n## &nbsp;&nbsp;How to define a schema\n\nAs you learned earlier, a schema definition includes a great deal of information. Let's cover a few of those properties in this section, starting with:\n- The metadata such as its name (`class`),\n- Its data `proper",
,   "ing your data schema in waviate it's important to group objects that you want to search together in the same class since Vector searches can only be performed within a single Vector space finally let's touch on schemas a schema in waviate is the blueprint that defines all the things we've talked abo"],
,  ['.\n\n\n\n#### Q: Is it possible to create one-to-many relationships in the schema?\n\n\n  Answer\n\n> Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.\n\n\n\n#### Q: What is th',
,   '.\n\n\n\n#### Q: Is it possible to create one-to-many relationships in the schema?\n\n\n  Answer\n\n> Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.\n\n\n\n#### Q: What is th',
,   'u have classes with multiple links, it could definitely be helpful to resolve some of those connections in a single query. On the other hand, if you have a single (bi-directional) reference in your data, you could also just denormalize the links (e.g. with an ID field) and resolve them during search',
,   'u have classes with multiple links, it could definitely be helpful to resolve some of those connections in a single query. On the other hand, if you have a single (bi-directional) reference in your data, you could also just denormalize the links (e.g. with an ID field) and resolve them during search',
,   "e these complex schemas like with we8 we kind of have like a collections like abstraction where you have like a uh you know like student teacher book these are like separate classes and so it's like you'd only retrieve kind of one schema at a time to format the query for one schema hopefully that ma"],
,  ['ses in weave maybe also for new listeners we could quickly do what a class is and like when to separate it into different classes how to think about adding filters versus uh classes yeah the the original motivation in off of the sort of class-based schema in vv8 was to reflect the real world so in a',
,   "h the different classes so you gave the example of like a tweet and maybe that has a has a link to i don't know post or something so you would have two separate classes and um the concept of having different classes or different collection or different namespaces and databases i think that's that's ",
,   'ke in weviate land you have this class you have this class you have this class this class contains like weba blog posts this documentation this is the code base and and so it like chooses which information source to Traverse maybe you can also talk about like SQL or like the Google search API or jus',
,   "ey-value store. IDs don't have to be globally unique, because in Weaviate classes act as namespaces. While each class has a different HNSW index, including the store around it, which is isolated on disk.\n\nThere was however one point in the API where reusing IDs between classes was causing serious is",
,   "ey-value store. IDs don't have to be globally unique, because in Weaviate classes act as namespaces. While each class has a different HNSW index, including the store around it, which is isolated on disk.\n\nThere was however one point in the API where reusing IDs between classes was causing serious is"],
,  ["ID formatting? Do I have to adhere to any standards?\n\n\n  Answer\n\n> The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a `v4` i.e. a random UUID. If you generate them yourself you could either use random ones or de",
,   "ID formatting? Do I have to adhere to any standards?\n\n\n  Answer\n\n> The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a `v4` i.e. a random UUID. If you generate them yourself you could either use random ones or de",
,   'e uuids they uniquely identify every item in our database could be images you know passages paragraphs whatever you put in your vector in the web Vector database so how does so the uuids they get sorted in in the tree is that the key idea for when you like how you use this kind of structure um would',
,   "w could we find data that was structured in different ways? I fell in love with the semantic web but the challenge I saw there, was the need to have people agree on naming conventions and standards.\n\nThis made me wonder, what if we wouldn't have to agree on this any more? What if we could just store",
,   "w could we find data that was structured in different ways? I fell in love with the semantic web but the challenge I saw there, was the need to have people agree on naming conventions and standards.\n\nThis made me wonder, what if we wouldn't have to agree on this any more? What if we could just store"],
,  ["terministically determine them based on some fields that you have. For this you'll need to use `v3` or `v5`).\n\n\n\n#### Q: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?\n\n\n  Answer\n\n> Yes, a UUID will be created if not specified.\n\n\n\n#### Q: Can I use Wea",
,   "terministically determine them based on some fields that you have. For this you'll need to use `v3` or `v5`).\n\n\n\n#### Q: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?\n\n\n  Answer\n\n> Yes, a UUID will be created if not specified.\n\n\n\n#### Q: Can I use Wea",
,   'new, random ID, and create new objects.\n\n\n\n### &nbsp;&nbsp;Specify object UUID\n\nYou could specify an object UUID at import time to serve as the object identifier. The Weaviate Python client, for example, provides a function to create a deterministic UUID based on an object. So, it could be added to ',
,   'new, random ID, and create new objects.\n\n\n\n### &nbsp;&nbsp;Specify object UUID\n\nYou could specify an object UUID at import time to serve as the object identifier. The Weaviate Python client, for example, provides a function to create a deterministic UUID based on an object. So, it could be added to ',
,   ' but if not, Weaviate will generate a random UUID.\n\nWeaviate does not check if a duplicate object is being created. As a result, using a deterministic uuid may prevent accidental creation of duplicate objects.\n\n### &nbsp;&nbsp;Vector\n\nEach object in Weaviate can have a vector embedding to represent '],
,  ['viate to create a traditional knowledge graph?\n\n\n  Answer\n\n> Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to sugg',
,   'viate to create a traditional knowledge graph?\n\n\n  Answer\n\n> Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to sugg',
,   '.\n\nWeaviate is not per se RDF- or schema.org-based, but is definitely inspired by it. One of the most important upsides of this approach was that we could use GraphQL (the graph query language which was entering the software stage through Facebook open-sourcing it) to represent the data inside Weavi',
,   '.\n\nWeaviate is not per se RDF- or schema.org-based, but is definitely inspired by it. One of the most important upsides of this approach was that we could use GraphQL (the graph query language which was entering the software stage through Facebook open-sourcing it) to represent the data inside Weavi',
,   '.\n\nWeaviate is not per se RDF- or schema.org-based, but is definitely inspired by it. One of the most important upsides of this approach was that we could use GraphQL (the graph query language which was entering the software stage through Facebook open-sourcing it) to represent the data inside Weavi'],
,  ['est you really try its semantic features. After all, you are creating a _knowledge_ graph 😉.\n\n\n\n#### Q: Why does Weaviate have a schema and not an ontology?\n\n\n  Answer\n\n> We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviat',
,   'est you really try its semantic features. After all, you are creating a _knowledge_ graph 😉.\n\n\n\n#### Q: Why does Weaviate have a schema and not an ontology?\n\n\n  Answer\n\n> We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviat',
,   "e schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.\n\n\n\n#### Q: What is the difference between a Weaviate data schema, ontologies and ta",
,   "e schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.\n\n\n\n#### Q: What is the difference between a Weaviate data schema, ontologies and ta",
,   '.\n\nWeaviate is not per se RDF- or schema.org-based, but is definitely inspired by it. One of the most important upsides of this approach was that we could use GraphQL (the graph query language which was entering the software stage through Facebook open-sourcing it) to represent the data inside Weavi'],
,  ['a graph-like connection between an object of Class 1 to the corresponding object of Class 2 to make it easy to see the equivalent in the other space.\n\n\n\n## Queries\n\n#### Q: How can I retrieve the total object count in a class?\n\n\n  Answer\n\n\n> This `Aggregate` query returns the total object count in a',
,   'a graph-like connection between an object of Class 1 to the corresponding object of Class 2 to make it easy to see the equivalent in the other space.\n\n\n\n## Queries\n\n#### Q: How can I retrieve the total object count in a class?\n\n\n  Answer\n\n\n> This `Aggregate` query returns the total object count in a',
,   'number of objects in the class.\n\n\n   Explain this query\n\nThis query aggregates the objects in the `JeopardyQuestion` class to obtain the total count. Since there are no restrictions, it returns the total number of objects which is 10,000.\n\n\n\n### &nbsp;&nbsp;`meta` property\n\nIn the above `Aggregate` ',
,   'number of objects in the class.\n\n\n   Explain this query\n\nThis query aggregates the objects in the `JeopardyQuestion` class to obtain the total count. Since there are no restrictions, it returns the total number of objects which is 10,000.\n\n\n\n### &nbsp;&nbsp;`meta` property\n\nIn the above `Aggregate` ',
,   'number of objects in the class.\n\n\n   Explain this query\n\nThis query aggregates the objects in the `JeopardyQuestion` class to obtain the total count. Since there are no restrictions, it returns the total number of objects which is 10,000.\n\n\n\n### &nbsp;&nbsp;`meta` property\n\nIn the above `Aggregate` '],
,  [" class.\n\n\n\n\n\n#### Q: How do I get the cosine similarity from Weaviate's certainty?\n\n\n  Answer\n\n> To obtain the cosine similarity from weaviate's `certainty`, you can do `cosine_sim = 2*certainty - 1`\n\n\n\n#### Q: The quality of my search results change depending on the specified limit. Why? How can I ",
,   " class.\n\n\n\n\n\n#### Q: How do I get the cosine similarity from Weaviate's certainty?\n\n\n  Answer\n\n> To obtain the cosine similarity from weaviate's `certainty`, you can do `cosine_sim = 2*certainty - 1`\n\n\n\n#### Q: The quality of my search results change depending on the specified limit. Why? How can I ",
,   "im of like yeah this is how certain uh something is um but of course you can use you can still use certainty in the future on on anything that uses cosine distance um but yeah you'll also just get the raw distance to sort of have an un un uh opinionated way of basically just having an objective numb",
,   'easy to add new distance metrics in the future.\n\n### Background\nIn the past Weaviate used a single number that would control the distances between vectors and that was **certainty**. A certainty is a number between 0 and 1, which works perfectly for cosine distances, as cosine distances are limited ',
,   'easy to add new distance metrics in the future.\n\n### Background\nIn the past Weaviate used a single number that would control the distances between vectors and that was **certainty**. A certainty is a number between 0 and 1, which works perfectly for cosine distances, as cosine distances are limited '],
,  [' API calls?\n\n\n  Answer\n\n> Yes, Weaviate supports cursor-based iteration as well as pagination through a result set.\n>\n> To iterate through all objects, you can use the `after` operator with both REST and GraphQL.\n>\n> For pagination through a result set, you can use the `offset` and `limit` operators',
,   ' API calls?\n\n\n  Answer\n\n> Yes, Weaviate supports cursor-based iteration as well as pagination through a result set.\n>\n> To iterate through all objects, you can use the `after` operator with both REST and GraphQL.\n>\n> For pagination through a result set, you can use the `offset` and `limit` operators',
,   ' integrate Weaviate into your stack, and we believe that GraphQL is the answer to this. The community and client libraries around GraphQL are enormous, and you can use almost all of them with Weaviate.\n\n\n\n## Data management\n\n#### Q: What is the best way to iterate through objects? Can I do paginated',
,   ' integrate Weaviate into your stack, and we believe that GraphQL is the answer to this. The community and client libraries around GraphQL are enormous, and you can use almost all of them with Weaviate.\n\n\n\n## Data management\n\n#### Q: What is the best way to iterate through objects? Can I do paginated',
,   "go through all of the data like in a really fast manner without like writing complicated queries or what's the what's the idea and the main idea is that now you are able to access all objects in a class well previously you are limited to 10 objects for a query so users had to craft these artificial "],
,  ['search with structured filtering.\n\n**Weaviate in a nutshell**:\n\n* Weaviate is an open source vector database.\n* Weaviate allows you to store and retrieve data objects based on their semantic properties by indexing them with vectors.\n* Weaviate can be used stand-alone (aka _bring your vectors_) or wi',
,   'search with structured filtering.\n\n**Weaviate in a nutshell**:\n\n* Weaviate is an open source vector database.\n* Weaviate allows you to store and retrieve data objects based on their semantic properties by indexing them with vectors.\n* Weaviate can be used stand-alone (aka _bring your vectors_) or wi',
,   " criteria.\n\nWe will get into this in more detail later - but for now, it's enough to know that Weaviate can perform fast vector searches as well as filtering.\n\n## &nbsp;&nbsp;Review\n\nIn this section, you learned about what vectors are and how Weaviate utilizes them at a very high level. You have als",
,   " criteria.\n\nWe will get into this in more detail later - but for now, it's enough to know that Weaviate can perform fast vector searches as well as filtering.\n\n## &nbsp;&nbsp;Review\n\nIn this section, you learned about what vectors are and how Weaviate utilizes them at a very high level. You have als",
,   't, Weaviate is agnostic of how you came up with your vectors. This means teams with experience in data science and machine learning can simply keep using their finely-tuned ML models and import their data objects alongside their existing vector positions. At the same time, Weaviate comes with option'],
,  ['\n\n## Overview\n\nThe following is an overview of features planned for Weaviate. By clicking the link, you can upvote the feature or engage in a discussion about it. You can also join our forum to discuss the roadmap in more detail.\n\nWe use your feedback and votes on GitHub pages to plan future release',
,   '\n\n## Overview\n\nThe following is an overview of features planned for Weaviate. By clicking the link, you can upvote the feature or engage in a discussion about it. You can also join our forum to discuss the roadmap in more detail.\n\nWe use your feedback and votes on GitHub pages to plan future release',
,   'use it today\nOne of the coolest things about an open-source community and users of the software is to see how people use it and what trends you can see emerge around implementations. The core features of Weaviate are the semantic search element and the semantic classification, which are used in a va',
,   "ell I think in our our slack Community I often see people requesting this feature or asking when it's going to be available or being excited when they hear that it's going to be available soon I think uh yeah it's always awesome to see people asking for things and then have it delivered that's so co",
,   'use it today\nOne of the coolest things about an open-source community and users of the software is to see how people use it and what trends you can see emerge around implementations. The core features of Weaviate are the semantic search element and the semantic classification, which are used in a va']],
, 'ground_truths': [['Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'],
,  ['Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'],
,  ['Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'],
,  ['There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'],
,  ["As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."],
,  ['Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'],
,  ['Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'],
,  ["The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."],
,  ['Yes, a UUID will be created if not specified.'],
,  ['Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'],
,  ["We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."],
,  ['Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'],
,  ["To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"],
,  ['Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'],
,  ["It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."],
,  ["Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."]]}
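
The printed dictionary above is column-oriented: each key maps to a list with one entry per question, and `contexts` and `ground_truths` hold a list of strings per row. Before passing such a structure to an evaluator, it can help to confirm that the columns line up. The following is a minimal sketch, assuming the four keys are `question`, `answer`, `contexts` and `ground_truths` (only the last two are visible in this part of the output); `check_eval_dict` and the one-row `sample` are hypothetical names introduced for illustration, not part of the notebook.

```python
from typing import Any

# Assumed column names; only 'contexts' and 'ground_truths' appear in the output above.
REQUIRED_KEYS = ("question", "answer", "contexts", "ground_truths")


def check_eval_dict(data: dict[str, list[Any]]) -> int:
    """Validate a column-oriented evaluation dict and return its row count."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Every column must have one entry per question.
    lengths = {key: len(data[key]) for key in REQUIRED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"column lengths differ: {lengths}")

    # 'contexts' and 'ground_truths' are lists of strings per row.
    for key in ("contexts", "ground_truths"):
        for i, row in enumerate(data[key]):
            if not isinstance(row, list) or not all(isinstance(s, str) for s in row):
                raise ValueError(f"{key}[{i}] must be a list of strings")

    return lengths["question"]


# Toy one-row example, shortened from the FAQ text for illustration only.
sample = {
    "question": ["Why would I use Weaviate as my vector database?"],
    "answer": ["Weaviate aims to make semantic search easy to build and run anywhere."],
    "contexts": [["Our goal is three-folded. Firstly, we want to make it as easy as possible..."]],
    "ground_truths": [["Our goal is three-folded."]],
}

n_rows = check_eval_dict(sample)
```

A check like this surfaces a dropped answer or a context stored as a bare string before any LLM calls are spent on the evaluation.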

Ragas part

[19]

[20]

evaluating with [faithfulness]

100%|██████████| 2/2 [01:55<00:00, 57.98s/it]

evaluating with [answer_relevancy]

  0%|          | 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.11/site-packages/langchain/embeddings/openai.py:501: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.5/migration/
  response = response.dict()
/usr/local/lib/python3.11/site-packages/pydantic/main.py:979: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.5/migration/
  warnings.warn('The `dict` method is deprecated; use `model_dump` instead.', DeprecationWarning)
 50%|█████     | 1/2 [00:11<00:11, 11.99s/it]
100%|██████████| 2/2 [00:18<00:00,  9.01s/it]

evaluating with [context_precision]

100%|██████████| 2/2 [00:04<00:00,  2.35s/it]

evaluating with [context_recall]

100%|██████████| 2/2 [00:53<00:00, 26.87s/it]
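
The log above shows four metrics being evaluated in order: faithfulness, answer_relevancy, context_precision and context_recall. The cell that produced it is not shown, so the following is only a sketch of how such a run might be set up, assuming a Ragas release that exposes `evaluate` and these metric objects in `ragas.metrics`, plus the Hugging Face `datasets` package. `run_ragas` and `METRIC_NAMES` are hypothetical names; column names and metric availability vary between Ragas versions, and a real run needs LLM and embedding credentials.

```python
# Metric names in the order they appear in the log above.
METRIC_NAMES = [
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
]


def run_ragas(data: dict):
    """Evaluate a column-oriented dict (question, answer, contexts, ground_truths).

    Imports are done lazily so this module can be loaded without Ragas installed.
    Calling this function issues LLM and embedding requests, which cost time and money.
    """
    from datasets import Dataset
    from ragas import evaluate
    from ragas import metrics as ragas_metrics

    dataset = Dataset.from_dict(data)
    # Look up the metric objects by name; assumes this Ragas version defines all four.
    metrics = [getattr(ragas_metrics, name) for name in METRIC_NAMES]
    return evaluate(dataset, metrics=metrics)
```

Usage would be along the lines of `result = run_ragas(data)` followed by inspecting `result`; the timings in the log (roughly two minutes for faithfulness alone) suggest trying a small subset first.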

[21]

[22]