What is an index in ElasticSearch?
An index is similar to a table in a relational database. The difference is that a relational database always stores the actual values, whereas in Elasticsearch this is optional: an index can store the actual values, the analyzed values, or both.
What is a document in ElasticSearch?
A document is similar to a row in a relational database. The difference is that each document in an index can have a different structure (set of fields), but fields that are common to several documents should have the same data type.
A field can hold multiple values in a single document, and the same field can be indexed in more than one way (multi-fields). Fields can also contain other documents (nested objects).
Does ElasticSearch have a schema?
Yes, Elasticsearch can have mappings which can be used to enforce schema on documents.
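As a rough illustration of what such a mapping can look like, the sketch below builds a create-index request body in Python. It assumes the typeless mapping syntax of Elasticsearch 7.x; the index contents and the field names (title, author, published, views) are hypothetical.

```python
import json

# Hypothetical mapping for an index of blog posts.
# Field names are illustrative; the field types (text, keyword, date,
# integer) are standard Elasticsearch field types.
blog_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed, full-text searchable
            "author": {"type": "keyword"},     # exact-value field, not analyzed
            "published": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}

# This JSON would be sent as the body of an index-creation request.
print(json.dumps(blog_mapping, indent=2))
```

Once a field has a mapping, a document whose value cannot be converted to the mapped type is rejected, which is how the mapping acts as a schema.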
What is a document type in ElasticSearch?
A document type can be seen as the document schema / dynamic mapping definition, which holds the mapping of all the fields in the document along with their data types.
What is indexing in ElasticSearch?
The process of storing data in an index is called indexing in Elasticsearch. Data in an index is held in write-once, read-many segments. Because segments are never modified in place, whenever an update is attempted a new version of the document is written to the index.
What is a node in ElasticSearch?
Each instance of ElasticSearch is called a node. Multiple nodes can work in harmony to form an Elasticsearch Cluster.
What is a shard in ElasticSearch?
A single machine is limited in resources such as RAM and vCPUs, so to scale out, applications need to run multiple instances of Elasticsearch on separate machines. Data in an index can be divided into multiple partitions, each handled by a separate node (instance) of Elasticsearch. Each such partition is called a shard. By default an Elasticsearch index has 5 shards (1 shard since version 7.0).
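The idea of dividing an index into shards can be sketched as hashing a routing key (by default the document ID) and taking the remainder modulo the number of primary shards. The snippet below is a simplified illustration only: Elasticsearch uses a Murmur3 hash internally, while CRC32 is used here as a stand-in, and the function name and document IDs are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a shard number."""
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % number_of_primary_shards


doc_ids = ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

for doc_id, shard in placement.items():
    print(f"{doc_id} -> shard {shard}")
```

Because the shard is derived from the number of primary shards, that number cannot simply be changed after the index is created without moving documents around.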
What is a replica in ElasticSearch?
In addition to the primary shard, Elasticsearch keeps copies of each shard; by default there is 1 copy per primary shard, so the data exists twice. These copies are called replicas. They serve the purpose of high availability and fault tolerance.
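The shard and replica counts are set per index. A minimal sketch of the corresponding settings body, and of how many shard copies the cluster then has to place, is shown below. The values 5 and 1 are only example values, and the helper function is hypothetical.

```python
import json

# number_of_shards and number_of_replicas are index-level settings.
# number_of_replicas counts the copies kept per primary shard.
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}


def total_shard_copies(settings: dict) -> int:
    """Primaries plus all of their replicas."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


print(json.dumps(index_settings))
print(total_shard_copies(index_settings))  # 5 primaries + 5 replicas = 10
```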
What is an Analyzer in Elasticsearch?
While indexing data in Elasticsearch, the data is transformed internally by the analyzer defined for the index, and then indexed. An analyzer is built from a tokenizer and filters. The following types of analyzers are available in Elasticsearch 1.x:
- STANDARD ANALYZER
- SIMPLE ANALYZER
- WHITESPACE ANALYZER
- STOP ANALYZER
- KEYWORD ANALYZER
- PATTERN ANALYZER
- LANGUAGE ANALYZERS
- SNOWBALL ANALYZER
- CUSTOM ANALYZER
What is a Tokenizer in ElasticSearch?
A tokenizer breaks down the field values of a document into a stream of tokens. Inverted indexes are created and updated using these tokens.
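To make the link between tokens and the inverted index concrete, here is a toy sketch in plain Python. It is not how Lucene is implemented; the tokenizer (split on non-alphanumeric characters, then lowercase) and the three sample documents are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Toy tokenizer: split on non-alphanumeric characters and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


docs = {
    1: "Quick brown fox",
    2: "Lazy brown dog",
    3: "Quick dog",
}

# Inverted index: token -> set of IDs of the documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for token in tokenize(text):
        inverted[token].add(doc_id)

print(sorted(inverted["brown"]))  # [1, 2]
print(sorted(inverted["quick"]))  # [1, 3]
```

A search for a term then becomes a lookup in this structure rather than a scan over every document.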
What is a Filter in ElasticSearch?
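After data is processed by the tokenizer, it is processed by token filters before indexing. The term "filter" is also used in the Query DSL, where filters restrict which documents match a search. The following types of query filters are available in Elasticsearch 1.x: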
- AND FILTER
- BOOL FILTER
- EXISTS FILTER
- GEO BOUNDING BOX FILTER
- GEO DISTANCE FILTER
- GEO DISTANCE RANGE FILTER
- GEO POLYGON FILTER
- GEOSHAPE FILTER
- GEOHASH CELL FILTER
- HAS CHILD FILTER
- HAS PARENT FILTER
- IDS FILTER
- INDICES FILTER
- LIMIT FILTER
- MATCH ALL FILTER
- MISSING FILTER
- NESTED FILTER
- NOT FILTER
- OR FILTER
- PREFIX FILTER
- QUERY FILTER
- RANGE FILTER
- REGEXP FILTER
- SCRIPT FILTER
- TERM FILTER
- TERMS FILTER
- TYPE FILTER
What is the query language of ElasticSearch?
Elasticsearch is built on Apache Lucene and provides its own JSON-based query language, which is called Query DSL.
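A Query DSL request is a JSON document. The sketch below builds one in Python, assuming the bool query syntax of Elasticsearch 2.0 and later, where filters are expressed as clauses of the query; the field names title and published and the search text are hypothetical.

```python
import json

# Hypothetical search: full-text match on "title", restricted by a
# date-range filter on "published".
search_body = {
    "query": {
        "bool": {
            # "must" clauses contribute to the relevance score.
            "must": [{"match": {"title": "elasticsearch shards"}}],
            # "filter" clauses only include or exclude documents.
            "filter": [{"range": {"published": {"gte": "2020-01-01"}}}],
        }
    },
    "size": 10,
}

print(json.dumps(search_body, indent=2))
```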
What is Type (Mapping Type) in Index of Elasticsearch?
A type used to be a logical category/partition of an index that allowed you to store different types of documents in the same index, e.g. one type for users and another type for blog posts.
Types have been deprecated: since Elasticsearch 6.0 it is no longer possible to create multiple types in an index, and the whole concept of types is phased out in the Elasticsearch 7.x releases.
What is the use of field level attributes- index and store?
The index attribute controls whether a field is searchable. Indexed fields are transformed during analysis, so the original value cannot be recovered from the indexed terms.
The store attribute controls whether Lucene stores the original field value separately, so that it can be returned when necessary. A field that is stored but not indexed is not searchable.
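The two attributes can be combined independently per field. The following sketch assumes the boolean index/store syntax of Elasticsearch 5.x and later, where index defaults to true and store defaults to false; the field names and the two helper functions are hypothetical.

```python
# Hypothetical field mappings showing the combinations of index and store.
field_mappings = {
    "properties": {
        "title": {"type": "text", "index": True, "store": True},
        "body": {"type": "text", "index": True, "store": False},
        "internal_note": {"type": "text", "index": False, "store": True},
    }
}


def searchable_fields(mappings: dict) -> list:
    """Fields that can be queried (index defaults to true)."""
    props = mappings["properties"]
    return sorted(name for name, cfg in props.items() if cfg.get("index", True))


def stored_fields(mappings: dict) -> list:
    """Fields whose original value is stored separately (store defaults to false)."""
    props = mappings["properties"]
    return sorted(name for name, cfg in props.items() if cfg.get("store", False))


print(searchable_fields(field_mappings))  # ['body', 'title']
print(stored_fields(field_mappings))      # ['internal_note', 'title']
```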
What is an Analyzer in Elasticsearch?
While data is being indexed, it is transformed internally by the analyzer defined for the index.
An analyzer is made of zero or more character filters, exactly one tokenizer, and zero or more token filters, applied in that order. The analysis module lets you register analyzers under logical names, which can then be referenced in mapping definitions or in certain APIs.
Elasticsearch ships with prebuilt analyzers that are ready to use. You can also combine the built-in character filters, tokenizers, and token filters to create custom analyzers.
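As an illustration of a custom analyzer, the sketch below builds the analysis section of an index-settings body from built-in components (the html_strip character filter, the standard tokenizer, and the lowercase and stop token filters). The analyzer name my_html_analyzer is a made-up example.

```python
import json

# Index settings that register a custom analyzer under a logical name.
analysis_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_html_analyzer": {              # hypothetical name
                    "type": "custom",
                    "char_filter": ["html_strip"],     # character filters
                    "tokenizer": "standard",           # exactly one tokenizer
                    "filter": ["lowercase", "stop"],   # token filters, in order
                }
            }
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```

A field mapping can then refer to the analyzer by its name.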
What is a Character Filter in an Elasticsearch Analyzer?
A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For example, a character filter could be used to convert Hindu-Arabic numerals into their Arabic-Latin equivalents (0123456789), or to strip HTML elements from the stream.
What are Token Filters in an Elasticsearch Analyzer?
A token filter receives the token stream and may add, remove, or change tokens. For instance, a lowercase token filter converts all tokens to lowercase, a stop token filter removes stop words, and a synonym token filter introduces synonyms into the token stream.
Token filters are not allowed to change the position or character offsets of a token.
What is a Tokenizer?
Tokenizers break a string down into a stream of tokens. A simple tokenizer might split the string into terms wherever it encounters whitespace or punctuation. Elasticsearch has a number of built-in tokenizers which can be used to build custom analyzers.
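The three stages described above (character filters, then a tokenizer, then token filters) can be mimicked in a few lines of plain Python. This is a toy model, not Elasticsearch code: the regular expressions, the stop-word list, and all function names are assumptions made for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "is"}  # illustrative list


def strip_html(text: str) -> str:
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list:
    """Tokenizer: split on whitespace and punctuation."""
    return re.findall(r"\w+", text)


def lowercase(tokens: list) -> list:
    """Token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens: list) -> list:
    """Token filter: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str) -> list:
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The QUICK brown fox is <b>fast</b></p>"))
# ['quick', 'brown', 'fox', 'fast']
```

The order matters: the stop filter runs after the lowercase filter, otherwise "The" would not match the lowercase stop-word list.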
What are the advantages of Elasticsearch?
- Elasticsearch is developed in Java, which makes it compatible with almost every platform.
- Elasticsearch is near real time (NRT): a newly indexed document becomes searchable within about one second.
- An Elasticsearch cluster is distributed, which makes it easy to scale and to integrate.
- The Elasticsearch REST API uses JSON objects, which makes it possible to call the Elasticsearch server from many different programming languages.
- Elasticsearch supports almost every document type, except those that do not support text rendering.
What is Elasticsearch REST API and use of it?
Elasticsearch provides a comprehensive and powerful REST API that you can use to interact with your cluster. Among the things that can be done with the API are the following:
- Check your cluster, node, and index health, status, and statistics
- Administer your cluster, node, and index data and metadata
- Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
- Execute advanced search operations such as aggregations, filtering, paging, scripting, and sorting, among many others
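The operations listed above map onto plain HTTP requests. The sketch below only constructs the requests with Python's standard library and does not send them. It assumes a node listening on http://localhost:9200, the 7.x-style _doc endpoints, and a hypothetical index named blog; the helper build_request is also made up for this example.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_request(method: str, path: str, body=None) -> Request:
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


requests_to_send = [
    build_request("GET", "/_cluster/health"),                   # cluster health
    build_request("PUT", "/blog/_doc/1", {"title": "Hello"}),   # create / update
    build_request("GET", "/blog/_doc/1"),                       # read
    build_request("POST", "/blog/_search",
                  {"query": {"match": {"title": "hello"}}}),    # search
    build_request("DELETE", "/blog/_doc/1"),                    # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing one of these objects to urllib.request.urlopen would send it to a running cluster.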
Does Elasticsearch have a schema?
Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document. The schema in Elasticsearch is a mapping that describes the fields in the JSON documents along with their data types, as well as how they should be indexed in the Lucene indexes that lie under the hood. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.
What is a cluster in Elasticsearch?
A cluster is a collection of nodes that together hold the data and provide indexing and search capabilities across all nodes. Each cluster is identified by a unique name, which defaults to “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
What is a node in Elasticsearch?
A node is a single server that forms part of the cluster. It stores data and participates in the cluster's indexing and search capabilities.
What is Ingest Node in Elasticsearch?
Ingest nodes execute pre-processing pipelines, each composed of one or more ingest processors. A pipeline transforms a document before it is indexed. For a dedicated ingest node, set node.ingest to true and set node.master and node.data to false.
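To give a feel for what a pipeline does to a document before indexing, the sketch below defines a small pipeline body and applies it locally. The set and lowercase processors exist in Elasticsearch, but the simulate function is a toy interpreter written for this example (it understands only these two processors), and the field names are hypothetical.

```python
# Pipeline definition in the shape used by the ingest API.
pipeline = {
    "description": "Normalize incoming log documents (illustrative)",
    "processors": [
        {"set": {"field": "environment", "value": "production"}},
        {"lowercase": {"field": "level"}},
    ],
}


def simulate(pipeline: dict, document: dict) -> dict:
    """Toy local interpreter; supports only 'set' and 'lowercase'."""
    doc = dict(document)  # leave the caller's document untouched
    for processor in pipeline["processors"]:
        (name, config), = processor.items()
        if name == "set":
            doc[config["field"]] = config["value"]
        elif name == "lowercase":
            doc[config["field"]] = doc[config["field"]].lower()
        else:
            raise ValueError(f"unsupported processor in this sketch: {name}")
    return doc


print(simulate(pipeline, {"level": "ERROR", "message": "disk full"}))
# {'level': 'error', 'message': 'disk full', 'environment': 'production'}
```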
What is Elasticsearch Data Node?
Data nodes hold the shards that contain the indexed documents. They handle data-related operations such as CRUD, search, and aggregations. Set node.data=true to make a node a data node.
Data node operations are I/O-, memory-, and CPU-intensive. The main benefit of dedicated data nodes is the separation of the master and data roles.
What are Master Node and Master Eligible Node in Elasticsearch?
The master node controls cluster-wide operations such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. A node is master-eligible when node.master=true (the default), and the master node is elected from the master-eligible nodes.
In versions before 7.0, the number of master-eligible nodes that must be visible in order to elect a master is set by the following configuration:
discovery.zen.minimum_master_nodes : number (default 1)
This number should be set to (master_eligible_nodes / 2) + 1.
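The quorum formula uses integer division. A minimal worked example follows; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum: (master_eligible_nodes / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5, 7):
    print(f"{n} master-eligible node(s) -> quorum of {minimum_master_nodes(n)}")
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one of them and still elect a master, while two isolated halves can never both reach the quorum.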
What are Tribe Node and Coordinating Node in Elasticsearch?
Tribe nodes connect to multiple clusters and execute search and other operations across all connected clusters. A tribe node is configured through the tribe.* settings.
If you take away a node's ability to handle master duties, to hold data, and to pre-process documents, you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Such a node behaves like a smart load balancer.
Every node is implicitly a coordinating node. A node that has all three settings, node.master, node.data, and node.ingest, set to false acts only as a coordinating node. The coordinating role cannot be disabled, so such a node needs enough memory and CPU to deal with the gather phase.
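The relationship between the node.master, node.data, and node.ingest settings and the resulting roles can be summarized in a small helper. This is only a sketch: it assumes the legacy boolean settings (which all default to true), and describe_node is a hypothetical function, not part of any Elasticsearch client.

```python
def describe_node(settings: dict) -> list:
    """Derive a node's roles from the legacy boolean node.* settings."""
    roles = []
    if settings.get("node.master", True):
        roles.append("master-eligible")
    if settings.get("node.data", True):
        roles.append("data")
    if settings.get("node.ingest", True):
        roles.append("ingest")
    # Every node also coordinates requests; with nothing else enabled,
    # coordinating is all the node does.
    return roles or ["coordinating-only"]


print(describe_node({}))
# ['master-eligible', 'data', 'ingest']
print(describe_node({"node.master": False, "node.data": False, "node.ingest": False}))
# ['coordinating-only']
print(describe_node({"node.master": False, "node.data": True, "node.ingest": False}))
# ['data']
```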