Skip to main content

Google Vertex AI Matching Engine

Compatibility

Only available on Node.js.

The Google Vertex AI Matching Engine "provides the industry's leading high-scale low latency vector database. These vector databases are commonly referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service."

Setup

caution

This module expects an endpoint and deployed index already created as the creation time takes close to one hour. To learn more, see the LangChain python documentation Create Index and deploy it to an Endpoint.

Before running this code, you should make sure the Vertex AI API is enabled for the relevant project in your Google Cloud dashboard and that you've authenticated to Google Cloud using one of these methods:

  • You are logged into an account (using gcloud auth application-default login) permitted to that project.
  • You are running on a machine using a service account that is permitted to the project.
  • You have downloaded the credentials for a service account that is permitted to the project and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of this file.

Install the authentication library with:

npm install google-auth-library

The Matching Engine does not store the actual document contents, only embeddings. Therefore, you'll need a docstore. The below example uses Google Cloud Storage, which requires the following:

npm install @google-cloud/storage

Usage

Initializing the engine

When creating the MatchingEngine object, you'll need some information about the matching engine configuration. You can get this information from the Cloud Console for Matching Engine:

  • The id for the Index
  • The id for the Index Endpoint

You will also need a document store. While an InMemoryDocstore is ok for initial testing, you will want to use something like a GoogleCloudStorageDocstore to store it more permanently.

import { MatchingEngine } from "langchain/vectorstores/googlevertexai";
import { Document } from "langchain/document";
import { SyntheticEmbeddings } from "langchain/embeddings/fake";
import { GoogleCloudStorageDocstore } from "langchain/stores/doc/gcs";

const embeddings = new SyntheticEmbeddings({
vectorSize: Number.parseInt(
process.env.SYNTHETIC_EMBEDDINGS_VECTOR_SIZE ?? "768",
10
),
});

const store = new GoogleCloudStorageDocstore({
bucket: process.env.GOOGLE_CLOUD_STORAGE_BUCKET!,
});

const config = {
index: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEX!,
indexEndpoint: process.env.GOOGLE_VERTEXAI_MATCHINGENGINE_INDEXENDPOINT!,
apiVersion: "v1beta1",
docstore: store,
};

const engine = new MatchingEngine(embeddings, config);

Adding documents

const doc = new Document({ pageContent: "this" });
await engine.addDocuments([doc]);

Any metadata in a document is converted into Matching Engine "allow list" values that can be used to filter during a query.

const documents = [
new Document({
pageContent: "this apple",
metadata: {
color: "red",
category: "edible",
},
}),
new Document({
pageContent: "this blueberry",
metadata: {
color: "blue",
category: "edible",
},
}),
new Document({
pageContent: "this firetruck",
metadata: {
color: "red",
category: "machine",
},
}),
];

// Add all our documents
await engine.addDocuments(documents);

The documents are assumed to have an "id" parameter available as well. If this is not set, then an ID will be assigned and returned as part of the Document.

Querying documents

Doing a straightforward k-nearest-neighbor search which returns all results is done using any of the standard methods:

const results = await engine.similaritySearch("this");

Querying documents with a filter / restriction

We can limit what documents are returned based on the metadata that was set for the document. So if we just wanted to limit the results to those with a red color, we can do:

import { Restriction } from `langchain/vectorstores/googlevertexai`;

const redFilter: Restriction[] = [
{
namespace: "color",
allowList: ["red"],
},
];
const redResults = await engine.similaritySearch("this", 4, redFilter);

If we wanted to do something more complicated, like things that are red, but not edible:

const filter: Restriction[] = [
{
namespace: "color",
allowList: ["red"],
},
{
namespace: "category",
denyList: ["edible"],
},
];
const results = await engine.similaritySearch("this", 4, filter);

Deleting documents

Deleting documents are done using ID.

import { IdDocument } from `langchain/vectorstores/googlevertexai`;

const oldResults: IdDocument[] = await engine.similaritySearch("this", 10);
const oldIds = oldResults.map( doc => doc.id! );
await engine.delete({ids: oldIds});