
k-NN (k-nearest neighbors) is a widely used machine learning (ML) algorithm, often applied to recommendations, that locates nearby documents based on vector similarity. It can be, and has been, applied to numerous use cases, including image recognition, fraud detection, and the ‘other songs you might like’ feature in a music application.

k-NN uses proximity to provide classifications and predictions regarding the grouping of an individual data point. The algorithm is particularly beneficial because it requires no separate training phase, and it can achieve high accuracy in a broad range of prediction-type problems.

OpenSearch is a powerful search and analytics engine used to monitor significant amounts of data. Searching through this data can be challenging when users can’t access it directly or aren’t sure exactly what they’re searching for. To alleviate this issue, numerous users turn to the OpenSearch k-NN plugin. The plugin is simple to install and can offer useful results and suggestions with the same ease as a standard OpenSearch query. This allows users to leverage similarity search capabilities within the robust and scalable OpenSearch framework, making it an attractive choice for adding complex search functionality to numerous applications.

Within this article, we will discuss what OpenSearch k-NN is and how to install the OpenSearch k-NN plugin, and break down the three different methods for obtaining the k-nearest neighbors from an index of vectors.


What is OpenSearch KNN?

OpenSearch k-NN (short for k-nearest neighbors) is a plugin for OpenSearch that enables users to execute nearest neighbor search across an extensive quantity of documents (billions) with a vast number of dimensions (thousands) as simply as executing a typical OpenSearch query. Aggregations and filter clauses can be utilized to further refine a user's similarity search operations. The OpenSearch k-NN plugin supports three different methods for acquiring the k-nearest neighbors from an index of vectors: approximate k-NN, script score k-NN, and Painless extensions.

OpenSearch KNN Plugin

Installation

You will most likely already have the OpenSearch k-NN plugin installed, as it is included with the standard full distribution of OpenSearch. If you aren’t sure, you can list your installed plugins and search for ‘opensearch-knn’:

sudo bin/opensearch-plugin list

If you can’t find ‘opensearch-knn’ in the list of plugins, it is easily installed with:

sudo bin/opensearch-plugin install opensearch-knn
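If you don’t have shell access to your nodes, for example on a remote or managed cluster, you can check for the plugin through the REST API instead. This is a minimal sketch, assuming your cluster is reachable and you can issue requests against it (e.g. via Dev Tools or curl):

GET _cat/plugins?v

Each node in the response should report a row whose component column contains opensearch-knn.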

OpenSearch KNN Methods

As stated previously, there are three main methods for obtaining the k-nearest neighbors from an index of vectors. Here we will delve into the details of each of these methods.

Approximate k-NN

When dealing with particularly large, high-dimensional datasets, standard (exact) k-NN search methods can encounter scaling problems that impact search efficiency. Approximate k-NN search methods overcome this with techniques that structure the index more efficiently and reduce the dimensionality of searchable vectors.

The approximate k-NN search methods utilized by OpenSearch harness approximate nearest neighbor (ANN) algorithms from the nmslib, faiss, and Lucene libraries to enhance k-NN search performance, reducing search latency for large datasets. Among the three search methods offered by the k-NN plugin, this approach offers the best scalability for large datasets, and it’s the preferred method when dealing with datasets containing hundreds of thousands of vectors.

To utilize OpenSearch k-NN approximate search capabilities, create an index with the appropriate settings and mappings to enable k-NN search, as seen below.

1. Create an Index with k-NN Settings: Create an index with the index.knn setting enabled and define a field of type knn_vector in the mappings. (A variant that selects and tunes the underlying ANN method is sketched after step 3.)

PUT /my_knn_index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128 // Replace with your vector dimension
      }
    }
  }
}

2. Index your Vectors: Add documents containing your vectors.

PUT /my_knn_index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3, ..., 0.128] // Replace with your actual vector values
}

3. Execute a k-NN Search: To perform a k-NN search, use the following query format. This will search for the nearest neighbors of the provided vector.

POST /my_knn_index/_search
{
  "size": 3, // Number of nearest neighbors to return
  "query": {
    "knn": {
      "my_vector": {
        "vector": [0.1, 0.2, 0.3, ..., 0.128], // The query vector
        "k": 3 // Number of nearest neighbors to find
      }
    }
  }
}
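The mapping in step 1 relies on the plugin’s default ANN settings. If you want to control which library and algorithm back the field, you can specify a method object in the mapping. The following is a minimal sketch using the documented HNSW method on the Lucene engine; the parameter values are illustrative starting points, not tuned recommendations:

PUT /my_knn_index_hnsw
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw", // Hierarchical Navigable Small World graphs
          "space_type": "cosinesimil", // Distance metric used to rank neighbors
          "engine": "lucene", // Could also be "nmslib" or "faiss"
          "parameters": {
            "ef_construction": 128, // Higher = better graph quality, slower indexing
            "m": 16 // Max connections per graph node
          }
        }
      }
    }
  }
}

Increasing ef_construction and m generally improves recall at the cost of index size and indexing time, so they’re worth benchmarking against your own data.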

Script Score k-NN

The k-NN plugin employs the OpenSearch script score query, which can be used to locate the exact k-nearest neighbors to a specified query point. With the k-NN script score, you’re able to apply a filter to an index before searching (a filtered variant is sketched after the example below), making it especially useful for dynamic search cases where the index body may vary based on various conditions. It’s worth noting that the script score k-NN search performs a brute-force, exact search, meaning it doesn’t scale well, especially when compared to the approximate search.

To use a script score k-NN search, the first two steps are the same as for an approximate search: create an index with k-NN settings and index your vectors. Follow these steps from the previous example. The example below outlines how to execute a script score k-NN search.

1. Perform a k-NN Search with Script Scoring: To execute a k-NN search using script scoring, utilize the following query format. This example uses a script to calculate the cosine similarity between vectors.

POST /my_knn_index/_search
{
  "size": 3, // Number of nearest neighbors to return
  "query": {
    "script_score": {
      "query": {
        "match_all": {} // You can use a more specific query if needed
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector",
          "query_value": [0.1, 0.2, 0.3, ..., 0.128], // The query vector
          "space_type": "cosinesimil" // Metric for similarity
        }
      }
    }
  }
}
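Because the inner query runs before the script scores any documents, you can narrow the candidate set with an ordinary filter instead of match_all. Below is a minimal sketch of that pattern, assuming a hypothetical in_stock boolean field on the indexed documents:

POST /my_knn_index/_search
{
  "size": 3,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": { "in_stock": true } // Hypothetical field; only matching docs are scored
          }
        }
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector",
          "query_value": [0.1, 0.2, 0.3, ..., 0.128],
          "space_type": "cosinesimil"
        }
      }
    }
  }
}

Since the brute-force distance calculation only runs over documents that pass the filter, a selective filter can make exact search practical on otherwise large indexes.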

Painless Extensions

The OpenSearch k-NN plugin can also be utilized with Painless scripting extensions, letting you use k-NN distance functions directly in Painless scripts to execute operations on knn_vector fields. Painless maintains a strict list of allowed functions and classes per context to guarantee that scripts are secure. The k-NN plugin extends Painless scripting with additional distance functions used in k-NN scoring scripts, enabling you to customize your k-NN workloads more effectively. (An example using an alternative distance function is sketched after the cosine similarity example below.)

As with the script score search, the first two steps of a Painless extensions k-NN search are the same as for an approximate search: create an index with k-NN settings and index your vectors. Follow these steps from the approximate k-NN example. The example below outlines how to execute a Painless extensions k-NN search.

1. Perform a k-NN Search with Script Scoring Using Painless Extensions: To perform a k-NN search with customized scoring using Painless scripting extensions, use the following query format. This example uses a script to calculate cosine similarity.

POST /my_knn_index/_search
{
  "size": 3, // Number of nearest neighbors to return
  "query": {
    "script_score": {
      "query": {
        "match_all": {} // You can use a more specific query if needed
      },
      "script": {
        "source": "1.0 + cosineSimilarity(params.query_vector, doc['my_vector'])", // Adding 1.0 keeps the score non-negative
        "params": {
          "query_vector": [0.1, 0.2, 0.3, ..., 0.128] // The query vector
        }
      }
    }
  }
}
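cosineSimilarity is only one of the distance functions the plugin adds to Painless; l2Squared (squared Euclidean distance) and l1Norm are also available. A minimal sketch using l2Squared follows; since smaller distances mean closer neighbors, the script inverts the distance so that nearer vectors score higher:

POST /my_knn_index/_search
{
  "size": 3,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "1 / (1 + l2Squared(params.query_vector, doc['my_vector']))", // Invert so smaller distances rank first
        "params": {
          "query_vector": [0.1, 0.2, 0.3, ..., 0.128]
        }
      }
    }
  }
}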

OpenSearch KNN: Real World Use Cases

OpenSearch k-NN and the k-NN algorithm have been widely used in numerous real-world applications. Firstly, the algorithm can be effectively applied to anomaly detection: it defines what constitutes normal and abnormal values without a traditional training process, making k-NN usable for both small and large datasets and enabling straightforward visualizations. k-NN anomaly detection can be applied to cybersecurity, where it is particularly beneficial for identifying evolving threats. Through continuous monitoring, k-NN can identify subtle signs of intrusion or abnormal activity that might elude traditional security measures.

Another real-world use case of k-NN is within healthcare, to detect patients with similar medical conditions such as diabetes and heart disease. Here, the data points for k-NN are represented as feature vectors, with each feature corresponding to a medical attribute like blood pressure or BMI. k-NN is then used to locate the k-nearest neighbors of a new patient based on their medical data; analyzing these neighbors, the new patient can be classified with the most frequent class among them as the predicted class. This approach can be applied to disease diagnosis or treatment recommendation in a way that supports medical decision-making.

Continuing with k-NN's real-world applications, the algorithm can be utilized in retail to offer customers products based on their purchase history. The algorithm works by computing the distance between the feature vector for a new data point, say a customer's profile, and the feature vectors for existing data points, which correspond to other customers' profiles. The k-NN algorithm then identifies the k nearest neighbors according to this distance metric and assigns a label to the new data point according to the most common label among these neighbors.

For example, it can find other customers with purchasing patterns similar to a given customer's, in terms of style, size, and color of clothes, and suggest products that those similar customers have bought but the new customer hasn't.

Lastly, k-NN can be used in finance to identify fraudulent transactions by comparing them to previous examples of fraud, helping to prevent financial losses and enhance the security of financial systems. k-NN can also be used for portfolio management, identifying similar stocks or assets based on past performance and market trends. This helps in selecting investments that align with historical patterns and potential future performance.

Hosted OpenSearch

By opting for a Hosted OpenSearch solution like the one provided by Logit.io, you are guaranteed the latest version of the tool, as we handle the maintenance and ensure you never miss an update. This means that you can use capabilities such as the OpenSearch k-NN plugin straight away. With our Hosted OpenSearch solution, we also offer industry-leading customer support, so if you encounter any issues or difficulties, our experienced engineers will happily assist you.

If you’re interested in finding out more about the extensive capabilities of the Logit.io Hosted OpenSearch solution, don’t hesitate to contact us or begin exploring the platform for yourself with a 14-day free trial.

If you've enjoyed this article why not read OpenSearch vs Solr or The Leading OpenSearch Training Resources next?
