The ability to efficiently search and retrieve documents based on their semantic content has become increasingly important, especially when building Generative AI systems that implement the Retrieval Augmented Generation (RAG) design pattern. One powerful approach to achieving this is to represent text data as vectors, which enables similarity search based on vector distances.

This article defines what a vector search engine is, explains why you might use one, and shows how to integrate a lightweight, in-memory vector search engine (aka vector database) into a .NET application using the Build5Nines.SharpVector library.

What is a Vector Search Engine?

A vector search engine is a specialized system designed to index and retrieve documents based on their vectorized representations. Unlike traditional keyword-based search engines, vector search engines convert text data into high-dimensional vectors and use these vectors to perform similarity searches.

The key operations include:

  • Vectorization: Converting text data into numerical vectors using techniques such as word embeddings or transformers.
  • Similarity Search: Finding vectors (and thus documents) that are similar to a given query vector based on distance metrics like cosine similarity or Euclidean distance.
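
To make the two distance metrics named above concrete, here is a minimal sketch that computes cosine similarity and Euclidean distance for small, hand-written vectors. The helper names (CosineSimilarity, EuclideanDistance) and the sample vectors are my own for illustration; they are not taken from any particular library.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Values near 1 mean the vectors point in the same direction; 0 means they are orthogonal.
static double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Treat a zero vector as having no similarity to anything.
    if (normA == 0 || normB == 0) return 0;
    return dot / Math.Sqrt(normA * normB);
}

// Euclidean distance: straight-line distance between the two points. Smaller means closer.
static double EuclideanDistance(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return Math.Sqrt(sum);
}

// Toy 3-dimensional vectors (illustrative values only).
double[] query = { 1, 1, 0 };
double[] docA = { 2, 2, 0 };  // same direction as the query, different length
double[] docB = { 0, 0, 3 };  // orthogonal to the query

Console.WriteLine($"cosine(query, docA)    = {CosineSimilarity(query, docA)}");
Console.WriteLine($"cosine(query, docB)    = {CosineSimilarity(query, docB)}");
Console.WriteLine($"euclidean(query, docA) = {EuclideanDistance(query, docA)}");
```

Note that docA scores as highly similar under cosine similarity even though it is not the closest point by Euclidean distance; cosine similarity ignores vector length and compares direction only.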

Why Use a Vector Search Engine?

Here are some key reasons to use a vector search engine:

  1. Semantic Search: Vector search engines can capture the semantic meaning of text, allowing for more accurate and relevant search results compared to traditional keyword-based searches.
  2. Flexibility: They can handle a wide range of data types, including text, images, and more, as long as the data can be vectorized.
  3. Performance: Optimized for high-dimensional vector operations, they offer efficient and scalable solutions for similarity search tasks.

Getting Started with Build5Nines.SharpVector

Build5Nines.SharpVector is an open-source, lightweight library that provides a vector database (or, more precisely, text vector similarity search) for .NET applications. It provides tools for converting text data into vectors and performing similarity searches.

While building Generative AI applications in .NET, I had trouble finding a really simple vector database to facilitate text searching in a .NET / C# application. That led me to learn more about vector databases, since they are used for the RAG (Retrieval Augmented Generation) AI design pattern. As a result, I decided to create my own lightweight, in-memory vector database that could be easily used in Generative AI + RAG applications in .NET.

NuGet Package

To get started with Build5Nines.SharpVector, you can install it via NuGet. This library supports .NET 6.0 and newer, and doesn’t require any other dependencies.

In Visual Studio, open your .NET project and run the following command in the Package Manager Console:

Install-Package Build5Nines.SharpVector

Or, from the command-line using the .NET CLI:

dotnet add package Build5Nines.SharpVector

Basic Usage of Build5Nines.SharpVector

Here’s a step-by-step outline of how to use Build5Nines.SharpVector to create a lightweight, in-memory vector database in C# applications:

1. Add using statement to import the library

using Build5Nines.SharpVector;

2. Initialize the Vector Database

Create an instance of the BasicMemoryVectorDatabase class:

var database = new BasicMemoryVectorDatabase();

FYI, the BasicMemoryVectorDatabase class inherits from a base class that enables more custom implementations to be built, so the library does support more advanced usage if you need it.

3. Add Text Data to Vector Database

The .AddText() method can be used to add text documents (as string values) to the vector database, along with some metadata to attach to the document. The BasicMemoryVectorDatabase class accepts the metadata as a string.

var strDocument = ""; // the text of the document to add
var strMetadata = ""; // any extra information to store with the document
database.AddText(strDocument, strMetadata);

The document is converted to vectors when it is added to the database. The metadata is any additional data to store alongside the document; it is returned together with the document text when a later search of the database matches it.
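
To give a feel for what "converted to vectors" can mean, here is a minimal sketch of one simple vectorization technique: counting how often each vocabulary word occurs in a document (a term-frequency, or bag-of-words, vector). This is an assumption for illustration only. This article does not describe how Build5Nines.SharpVector builds its vectors internally, and the Tokenize and Vectorize helpers below are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lowercase the text and split it into word tokens.
static string[] Tokenize(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' },
               StringSplitOptions.RemoveEmptyEntries);

// Hypothetical helper: build a vector with one slot per vocabulary word,
// where each slot holds the number of times that word occurs in the text.
static double[] Vectorize(string text, IReadOnlyList<string> vocabulary)
{
    var counts = Tokenize(text)
        .GroupBy(token => token)
        .ToDictionary(group => group.Key, group => (double)group.Count());
    return vocabulary
        .Select(word => counts.TryGetValue(word, out var count) ? count : 0.0)
        .ToArray();
}

string[] documents = { "The cat sat on the mat.", "The dog chased the cat." };

// The vocabulary is every distinct word across all documents, in a stable order.
var vocabulary = documents
    .SelectMany(Tokenize)
    .Distinct()
    .OrderBy(word => word, StringComparer.Ordinal)
    .ToList();

Console.WriteLine($"vocabulary: {string.Join(", ", vocabulary)}");
foreach (var doc in documents)
{
    Console.WriteLine($"{doc} => [{string.Join(", ", Vectorize(doc, vocabulary))}]");
}
```

Because every document is mapped onto the same vocabulary, all vectors have the same length and can be compared with a distance metric such as cosine similarity.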

4. Performing a Search

var query = "some query to search for";
var result = database.Search(query,
  pageCount: 5 // optional
  );

The .Search() method can be called to perform a text similarity search. Simply pass in your text query to the database and it will return the results of a vector similarity search. You can optionally specify the pageCount argument to the method to configure the maximum number of results to return.

There are some more advanced options available, which you can find by looking through the source code in the Build5Nines.SharpVector GitHub repo. We’ll keep the code example in this article simple for now, to help you get started with the basic usage of the library.
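
Conceptually, a vector similarity search with a result limit comes down to scoring every stored vector against the query vector, sorting by score, and keeping the top matches. The sketch below illustrates that idea under stated assumptions: the SearchTop helper, the stored vectors, and the use of cosine similarity are all hypothetical choices of mine, not the library's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors (0 when either is a zero vector).
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / Math.Sqrt(normA * normB);
}

// Hypothetical helper: score every stored vector against the query vector,
// order by descending similarity, and keep at most pageCount results.
static List<(string Text, double Score)> SearchTop(
    IEnumerable<(string Text, double[] Vector)> store, double[] queryVector, int pageCount) =>
    store.Select(item => (item.Text, Score: CosineSimilarity(item.Vector, queryVector)))
         .OrderByDescending(result => result.Score)
         .Take(pageCount)
         .ToList();

// Toy store with hand-written vectors (illustrative values only).
var store = new List<(string Text, double[] Vector)>
{
    ("cats and kittens", new double[] { 1, 1, 0 }),
    ("dogs and puppies", new double[] { 0, 1, 1 }),
    ("stock markets",    new double[] { 0, 0, 1 }),
};

var results = SearchTop(store, new double[] { 1, 0.5, 0 }, pageCount: 2);
foreach (var (text, score) in results)
{
    Console.WriteLine($"{score:F3}  {text}");
}
```

A linear scan like this is fine for a small in-memory collection; larger collections typically need an index to avoid comparing the query against every stored vector.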

5. Iterate Through Search Results

The return value of the .Search() method contains the search results. It is an object with a .HasResults property, a boolean indicating whether there are any search results, and a .Texts property, an IEnumerable that can be iterated through to access the Text and Metadata of each search result.

if (result.HasResults) {
  foreach (var item in result.Texts)
  {
    Console.WriteLine(item.Text);
    Console.WriteLine(item.Metadata);
  }
}

The search results from Build5Nines.SharpVector can then be included in a Generative AI prompt, or used in any other application scenario where a vector text search is needed.
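
As a rough sketch of that hand-off, the code below assembles retrieved passages and a user question into a single prompt string. The BuildPrompt helper, the prompt wording, and the sample passages are all hypothetical; the tuple list simply stands in for the Text and Metadata values you would read from each search result.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: combine retrieved passages and the user's question into one prompt.
static string BuildPrompt(string question, IEnumerable<(string Text, string Metadata)> passages)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine();
    sb.AppendLine("Context:");
    foreach (var (text, metadata) in passages)
    {
        // Include the metadata so the model (and the reader) can see where each passage came from.
        sb.AppendLine($"- {text} (source: {metadata})");
    }
    sb.AppendLine();
    sb.Append($"Question: {question}");
    return sb.ToString();
}

// Stand-in data: in a real application these pairs would come from
// item.Text and item.Metadata while iterating the search results.
var passages = new List<(string Text, string Metadata)>
{
    ("Vector search engines retrieve documents by similarity.", "intro.txt"),
    ("RAG adds retrieved context to a prompt.", "rag.txt"),
};

string prompt = BuildPrompt("What does RAG add to a prompt?", passages);
Console.WriteLine(prompt);
```

The resulting string can be sent to whichever Generative AI model or SDK your application uses.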

What to store as Metadata?

Each Text item added to the vector database with Build5Nines.SharpVector has an associated .Metadata value. This enables you to store additional data or information alongside the Text. With the BasicMemoryVectorDatabase class, the metadata is a value of type string.

Because the metadata is a string, it can be useful to store either a JSON string containing more information, or just the filename or URL of the source of the text content. This keeps related, useful information attached to the text within the vector database.

When a vector search is performed using Build5Nines.SharpVector, the metadata is retrieved along with the text that matches the similarity search.
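
If you go the JSON route, the standard System.Text.Json serializer can produce and parse the metadata string. The sketch below is one possible approach; the "filename" and "url" keys and their values are hypothetical examples, not a format required by the library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical metadata: the keys and values here are examples only.
var metadata = new Dictionary<string, string>
{
    ["filename"] = "docs/intro.md",
    ["url"] = "https://example.com/docs/intro",
};

// Serialize to a JSON string; this is the value you would pass as the metadata argument.
string metadataJson = JsonSerializer.Serialize(metadata);
Console.WriteLine(metadataJson);

// Later, when a search result hands the string back, parse it again.
var restored = JsonSerializer.Deserialize<Dictionary<string, string>>(metadataJson);
Console.WriteLine(restored?["filename"]);
```

Using a serializer rather than string concatenation also takes care of escaping quotes and other special characters in the values.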

Loading Data using Text Chunking

In the example above, the .AddText(text, metadata) method is used to load text data and its associated metadata. This is the simplest way to add text to the vector database. The Build5Nines.SharpVector library also includes a TextDataLoader class that adds support for loading text data using a text chunking pattern.

Text chunking is the method of splitting a text document into smaller pieces when adding it to the vector database. With larger documents, this lets a search match and return the most relevant passage rather than the entire document.

Here’s an example of using the TextDataLoader to add a document to the vector database using a method of text chunking based on splitting the document up into individual paragraphs:

var loader = new Build5Nines.SharpVector.Data.TextDataLoader<int, string>(database);
// TextDataLoader uses generics since custom vector databases are supported by Build5Nines.SharpVector
// This example matches the BasicMemoryVectorDatabase, which defaults to 'int' for text ids and 'string' for metadata

loader.AddDocument(document, new TextChunkingOptions<string>
{
  Method = TextChunkingMethod.Paragraph,
  RetrieveMetadata = (chunk) => {
    // add json to metadata containing the source filename
    return "{ \"filename\": \"" + filename + "\" }";
  }
});

The TextChunkingOptions.Method is set using the TextChunkingMethod enum. This defines the method of text chunking the loader will use to load the specified text document. This example is using the Paragraph method.

The TextDataLoader supports these text chunking methods:

  • TextChunkingMethod.Paragraph: Breaks up the text document into chunks based on individual paragraphs.
  • TextChunkingMethod.Sentence: Breaks up the text document into chunks based on individual sentences.
  • TextChunkingMethod.FixedLength: Breaks up the text document into chunks based on a fixed size set using the TextChunkingOptions.ChunkSize property (default is 100 characters).

TextChunkingOptions.RetrieveMetadata is used to define a lambda expression that is called for every text chunk before it’s added to the vector database. This enables you to set the metadata to associate with the text. In this example, it sets the metadata of every chunk of the document to JSON that contains the source filename of the document.
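
To show what the three chunking methods mean in practice, here is a minimal, standalone sketch of paragraph, sentence, and fixed-length splitting. These helpers are my own simplified illustrations and are not the TextDataLoader implementation; in particular, the sentence splitter is naive and will break on abbreviations such as "e.g.".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Paragraphs: split wherever a blank line separates blocks of text.
static List<string> SplitParagraphs(string text) =>
    Regex.Split(text, @"\r?\n\s*\r?\n")
         .Select(chunk => chunk.Trim())
         .Where(chunk => chunk.Length > 0)
         .ToList();

// Sentences: split on whitespace that follows '.', '!' or '?' (naive rule).
static List<string> SplitSentences(string text) =>
    Regex.Split(text, @"(?<=[.!?])\s+")
         .Select(chunk => chunk.Trim())
         .Where(chunk => chunk.Length > 0)
         .ToList();

// Fixed length: cut the text every chunkSize characters; the last chunk may be shorter.
static List<string> SplitFixedLength(string text, int chunkSize)
{
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    var chunks = new List<string>();
    for (int i = 0; i < text.Length; i += chunkSize)
    {
        chunks.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
    }
    return chunks;
}

string sample = "First paragraph. It has two sentences.\n\nSecond paragraph!";

Console.WriteLine($"paragraphs:   {SplitParagraphs(sample).Count}");
Console.WriteLine($"sentences:    {SplitSentences(sample).Count}");
Console.WriteLine($"fixed-length: {SplitFixedLength(sample, 20).Count}");
```

Paragraph and sentence chunks follow the natural structure of the text, while fixed-length chunks are predictable in size but can cut a sentence in half.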

Summary

This article explored the concept of vector search engines and their advantages, and provided a practical guide to using the Build5Nines.SharpVector library to manage and search text vectors with a lightweight, in-memory vector database in your .NET applications. Whether you are building a Generative AI + RAG application, a semantic search engine, or any other application that requires fast vector similarity search, Build5Nines.SharpVector offers a lightweight and easy-to-use solution.

Give Build5Nines.SharpVector a try in your next .NET project and experience the benefits of a lightweight, in-memory vector similarity search engine.

Happy coding!

Chris Pietschmann is a Microsoft MVP, HashiCorp Ambassador, and Microsoft Certified Trainer (MCT) with 20+ years of experience designing and building Cloud & Enterprise systems. He has worked with companies of all sizes from startups to large enterprises. He has a passion for technology and sharing what he learns with others to help enable them to learn faster and be more productive.