From the course: AWS Certified AI Practitioner (AIF-C01) Cert Prep

Introduction to vector databases

- Let's talk about vector databases. But in order to do that, we need to define some terms, starting with vector database. This is a specialized data storage system that's designed to efficiently store, index, and retrieve high-dimensional vector representations of data, which can be called vector embeddings. And this enables fast similarity searches and nearest-neighbor queries.

Next, we need to talk about a vector embedding. This is a dense numerical representation, in a continuous vector space, of data points called chunks, which are contiguous strings of text ranging from single characters or words all the way up to sentences or longer. It captures the semantic relationships and similarities among the data in a way that facilitates ML and information retrieval tasks.

Our next term, well, I mentioned chunks, so what is document chunking? This is the process of taking a large document and breaking it down into smaller, more manageable pieces. Each of those is called a chunk. Chunk size in terms of…
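To make these three terms concrete, here is a minimal Python sketch of how they fit together: a document is split into chunks, each chunk is turned into a vector, and a query is answered by finding the nearest vector. Everything in it is an assumption for illustration. The helper names (`chunk_words`, `toy_embed`, `nearest_chunk`) are hypothetical, chunk size is measured in words, and the word-count "embedding" is only a stand-in for a real embedding model, which would produce dense vectors that capture meaning rather than exact word overlap. A real vector database would also use an index instead of comparing the query against every chunk.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(chunk: str, vocabulary: list[str]) -> list[float]:
    """Toy stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(chunk.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_chunk(query: str, chunks: list[str], vocabulary: list[str]) -> str:
    """Brute-force nearest-neighbor search: return the chunk most similar to the query."""
    query_vec = toy_embed(query, vocabulary)
    return max(chunks, key=lambda c: cosine_similarity(query_vec, toy_embed(c, vocabulary)))


# Made-up sample document, split into two chunks of six words each.
document = "cats purr and nap all day dogs bark and fetch all day"
chunks = chunk_words(document, 6)
vocabulary = sorted(set(document.lower().split()))

print(nearest_chunk("dogs fetch", chunks, vocabulary))  # prints "dogs bark and fetch all day"
```

The query shares two words with the second chunk and none with the first, so the second chunk has the higher cosine similarity and is returned as the nearest neighbor.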
