OpenSearch’s cluster architecture supports scalability, fault tolerance, and high availability, which makes it suitable for a wide range of use cases, from log and event analysis to full-text search and business intelligence. In this article, we define what clusters and nodes are in OpenSearch and walk through how to configure an OpenSearch cluster.
What is an OpenSearch Cluster?
An OpenSearch cluster is a distributed computing environment made up of multiple nodes that operate together to store and process data efficiently. One of the key features of an OpenSearch cluster is its ability to scale out horizontally by adding nodes to the cluster. This enables organizations to manage increasing data volumes and query loads without sacrificing performance or availability.
OpenSearch clusters support data sharding and replication, so data can be distributed across multiple nodes for better performance and fault tolerance. Each index is divided into shards, which are distributed across data nodes, and each shard can be replicated to provide redundancy and resilience against node failures.
OpenSearch clusters also offer features for cluster management, monitoring, and security. Administrators can configure cluster settings, monitor cluster health and performance metrics, and set up security policies to control access to data and cluster resources. With built-in tools and APIs, administrators can perform tasks such as index management, cluster monitoring, and data backup and restore.
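As a rough illustration of how shard and replica counts add up, the Python sketch below computes the total number of shard copies for an index and the smallest number of data nodes on which every copy can be assigned. The function names are made up for this example. It relies on the fact that a replica is never placed on the same node as its primary.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Return the number of shard copies (primaries plus replicas) for an index."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least one primary and a non-negative replica count")
    return primaries * (1 + replicas)


def min_data_nodes(replicas: int) -> int:
    """Return the fewest data nodes on which every shard copy can be assigned.

    Copies of the same shard must live on different nodes, so each shard
    needs one node per copy: the primary plus each replica.
    """
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return replicas + 1


if __name__ == "__main__":
    # Hypothetical index: 3 primary shards with 1 replica each.
    print(total_shard_copies(3, 1))
    print(min_data_nodes(1))
```

With fewer data nodes than `min_data_nodes` returns, some replicas stay unassigned and the cluster reports a yellow status rather than green.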
What are Nodes in OpenSearch?
In an OpenSearch cluster, each node can take on one or more roles, each serving a specific purpose in the cluster's operation. These roles include data nodes, master-eligible nodes (called cluster-manager-eligible nodes since OpenSearch 2.0), and client nodes (also known as coordinating nodes).
- Data nodes: These store and index data, execute search and analytics operations, and hold the shards and their replicas.
- Master-eligible nodes: These participate in cluster coordination and in the election of a master node, which is responsible for cluster-wide coordination and metadata management.
- Client nodes: These serve as entry points for client applications and distribute requests across the cluster.
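The roles above map onto the `node.roles` setting in `opensearch.yml`. The Python sketch below renders that line for each kind of node. The dictionary keys and the function name are hypothetical, and the role name `cluster_manager` assumes OpenSearch 2.x (older releases use `master`).

```python
# Illustrative mapping from the roles described above to node.roles values.
# Assumption: OpenSearch 2.x role names; older releases use "master"
# where this sketch uses "cluster_manager".
ROLE_SETTINGS = {
    "data": ["data"],
    "cluster_manager_eligible": ["cluster_manager"],
    # An empty list leaves the node with only its coordinating function,
    # which is what the article calls a client node.
    "coordinating_only": [],
}


def render_node_roles(kind: str) -> str:
    """Return the opensearch.yml line for the given kind of node."""
    roles = ROLE_SETTINGS[kind]
    return "node.roles: [" + ", ".join(roles) + "]"


if __name__ == "__main__":
    for kind in ROLE_SETTINGS:
        print(f"{kind:>26} -> {render_node_roles(kind)}")
```

A node may also combine roles, for example both `cluster_manager` and `data` on a small cluster.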
Why Configure an OpenSearch Cluster?
By tuning parameters such as heap size, thread pools, and caching settings, you can improve the performance of your OpenSearch cluster. Adjusting these settings helps the cluster handle the expected workload efficiently and keep response times low for search queries and data retrieval.
In addition, integrating OpenSearch with other tools and technologies, such as data ingestion frameworks, visualization tools, and monitoring solutions, lets you build a more complete data analysis platform. With these integrations, you can ingest data from a variety of sources, visualize and examine it, and monitor cluster health and performance.
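For heap size, a commonly cited rule of thumb is to give the JVM about half of the machine's RAM while staying below roughly 32 GB, so that compressed object pointers remain in use. The sketch below encodes that rule. The function names and the 31 GB cap are assumptions made for illustration, not values taken from this article.

```python
def suggested_heap_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Suggest a JVM heap size in GB: half of RAM, capped at cap_gb.

    Assumption: the 31 GB default cap is a conservative stand-in for the
    "stay below about 32 GB" guidance.
    """
    if ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for this rule of thumb")
    return min(ram_gb // 2, cap_gb)


def jvm_heap_options(ram_gb: int) -> list[str]:
    """Return matching -Xms/-Xmx flags, set equal so the heap is not resized."""
    heap = suggested_heap_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    print(jvm_heap_options(16))
    print(jvm_heap_options(128))
```

The resulting flags would typically go into `config/jvm.options`. Treat the numbers as a starting point and adjust them against the observed workload.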
How To Configure an OpenSearch Cluster
Below is a configuration guide that walks you through the process of creating a cluster in OpenSearch.
- Define Cluster Settings: In opensearch.yml, define the name of your OpenSearch cluster (cluster.name), the name of each node (node.name), and the network host settings (network.host). The special values _local_ and _site_ bind the node to its loopback and site-local addresses.
    cluster.name: my-opensearch-cluster
    node.name: ${HOSTNAME}
    network.host: [_local_, _site_]
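- Define Node Roles: Specify the roles of each node in the cluster. In this example, each node is configured as both a master-eligible node and a data node. Recent OpenSearch releases prefer the node.roles list (for example, node.roles: [cluster_manager, data]) over these legacy boolean settings.
    node.master: true
    node.data: true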
- Configure Security: Enable security features for authentication, authorization, and encryption. Specify the distinguished name (DN) of the admin user, the SSL settings for HTTP communication, and the paths to the certificate and key files. In opensearch.yml, the security plugin's settings use the plugins.security prefix.
    plugins.security.disabled: false
    plugins.security.authcz.admin_dn:
      - CN=admin,OU=client,O=client,L=Test,C=DE
    plugins.security.ssl.http.enabled: true
    plugins.security.ssl.http.pemcert_filepath: esnode.pem
    plugins.security.ssl.http.pemkey_filepath: esnode-key.pem
    plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem
- Define Index Settings and Mappings: Create an index with specified settings, such as the number of shards and replicas. Define mappings for the index fields, specifying their data types and properties.
    PUT /my-index
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
      },
      "mappings": {
        "properties": {
          "title": { "type": "text" },
          "content": { "type": "text" },
          "timestamp": { "type": "date" }
        }
      }
    }
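The same request can be prepared from a script. The sketch below uses only the Python standard library to build the request body and a PUT request object without sending it. The helper names are hypothetical, and the base URL is the placeholder endpoint used elsewhere in this guide.

```python
import json
import urllib.request


def build_index_body(shards: int = 1, replicas: int = 1) -> dict:
    """Return the settings and mappings for the example index."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"},
                "timestamp": {"type": "date"},
            }
        },
    }


def build_create_index_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder endpoint; replace it with your own cluster address.
    req = build_create_index_request(
        "https://opensearch-cluster-endpoint:443", "my-index", build_index_body()
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(req)` would additionally need credentials and a TLS context that trusts the cluster's certificate, which are left out here.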
- Configure Data Ingestion: Use Logstash to ingest data from a log file. Configure the input to read from the log file, apply filters to parse log entries, and send the output to OpenSearch, specifying the cluster endpoint, authentication credentials, and target index. The opensearch output is provided by the logstash-output-opensearch plugin.
    input {
      file {
        path => "/path/to/logfile.log"
        start_position => "beginning"
      }
    }
    filter {
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}" }
        # Replace the original message field instead of appending to it
        overwrite => ["message"]
      }
    }
    output {
      opensearch {
        hosts => ["https://opensearch-cluster-endpoint:443"]
        user => "admin"
        password => "password"
        index => "my-index"
      }
    }
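To see what the grok filter extracts, it can help to try the same idea on a sample line. The Python regular expression below is only an approximation of the TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA patterns, written for this illustration. The sample log line and the function name are invented.

```python
import re
from typing import Optional

# Approximation of "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}".
# Assumption: the real grok patterns accept more variants than this sketch.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}"
    r"(?::\d{2}(?:[.,]\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r" (?P<loglevel>[A-Za-z]+)"
    r" (?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "2024-01-15T10:30:00Z ERROR Disk usage above threshold"
    print(parse_log_line(sample))
    print(parse_log_line("not a log line"))
```

In Logstash itself, an event that does not match the pattern is not dropped. It is tagged with `_grokparsefailure` and passed on unchanged.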
- Deploy and Configure Nodes: Deploy OpenSearch nodes and specify their names, data storage paths, and log paths. Start each node with the bin/opensearch launcher, passing settings with -E, and adjust the values according to your server environment.
    bin/opensearch -E node.name=node-1 -E path.data=/path/to/data -E path.logs=/path/to/logs
- Integrate a Visualization Tool: Configure OpenSearch Dashboards, the Kibana fork that accompanies OpenSearch, to connect to the cluster for visualization and analytics. In opensearch_dashboards.yml, specify the server settings, cluster endpoint, and authentication credentials.
    server.host: "0.0.0.0"
    server.name: "my-dashboards"
    opensearch.hosts: ["https://opensearch-cluster-endpoint:443"]
    opensearch.username: "kibana"
    opensearch.password: "password"
- Test and Validate Configuration: Ensure that the OpenSearch cluster is up and running, and verify that data ingestion, indexing, querying, and visualization work as expected.
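One way to check that the cluster is up is to query the `_cluster/health` API and inspect the reported status and node count. The sketch below uses only the Python standard library. The helper names, the expected node count, and the certificate path are assumptions for illustration, and the credentials are the placeholders used earlier in this guide.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def cluster_ready(health: dict, expected_nodes: int, allow_yellow: bool = False) -> bool:
    """Decide from a _cluster/health response whether the cluster looks ready."""
    acceptable = {"green", "yellow"} if allow_yellow else {"green"}
    return (
        health.get("status") in acceptable
        and health.get("number_of_nodes", 0) >= expected_nodes
    )


def fetch_health(base_url: str, user: str, password: str, ca_file: str) -> dict:
    """Call GET /_cluster/health over HTTPS and return the parsed JSON body."""
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/_cluster/health",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Canned response so the sketch runs without a live cluster.
    sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 1}
    print(cluster_ready(sample, expected_nodes=1))
    print(cluster_ready(sample, expected_nodes=1, allow_yellow=True))
```

Against a real cluster, `fetch_health("https://opensearch-cluster-endpoint:443", "admin", "password", "root-ca.pem")` would supply the dictionary. A single-node cluster with one replica configured typically reports yellow because the replica cannot be assigned, which is why the check allows that status on request.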