100 Days of DevOps — Day 84-Introduction to ElasticSearch

Prashant Lakhera
May 6, 2019

To view the updated DevOps course (101DaysofDevOps):

Course Registration link: https://www.101daysofdevops.com/register/

Course Link: https://www.101daysofdevops.com/courses/101-days-of-devops/

YouTube link: https://www.youtube.com/user/laprashant/videos

Welcome to Day 84 of 100 Days of DevOps. The focus for today is an introduction to ElasticSearch.

ElasticSearch is a fast, distributed (fault-tolerant) search and analytics engine built on the Apache Lucene project. It is designed to index and search massive datasets, on the order of petabytes.

Installing ElasticSearch

Requirement

Java 8 or later (ElasticSearch 5.x requires at least Java 8)

Installation

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.2.zip
$ unzip elasticsearch-5.4.2.zip
$ cd elasticsearch-5.4.2
# To start ElasticSearch
$ bin/elasticsearch

To test ElasticSearch

$ curl http://localhost:9200
{
  "name" : "ASEUJl9",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LSEdI1HARj-O_cSk0k8DLg",
  "version" : {
    "number" : "5.4.2",
    "build_hash" : "929b078",
    "build_date" : "2017-06-15T02:29:28.122Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.1"
  },
  "tagline" : "You Know, for Search"
}

If we want to test through a GUI, we need Kibana.

$ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.4.2-darwin-x86_64.tar.gz
$ tar -xvf kibana-5.4.2-darwin-x86_64.tar.gz
$ cd kibana-5.4.2-darwin-x86_64
$ vim config/kibana.yml
# In kibana.yml, uncomment this entry:
elasticsearch.url: "http://localhost:9200"
# Start Kibana
$ bin/kibana

# To access Kibana

http://localhost:5601/

Kibana

Now let’s try creating an index using Dev Tools (don’t worry if you don’t yet understand what an index is)

Now try to get it

ElasticSearch Stack

  • Kibana: Data visualization
  • ElasticSearch: Store, index, search, and analyze data
  • Logstash, Beats: Data ingestion
  • X-Pack: Additional features

What is an Index?

An index is a logical namespace that points to one or more shards (containers for data) in an ElasticSearch cluster; it serves as the place to store related data.

# Adding an index
$ curl -XPUT http://localhost:9200/testing
{"acknowledged":true,"shards_acknowledged":true}

# Getting all indices in a cluster
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testing Zkhf_-5pR2yPvjvqgdLlyg   5   1          0            0       650b           650b
yellow open   test    rfTZJwiSTSmufAzzHWjSkg   5   1          0            0       650b           650b
yellow open   .kibana 8-5xFao6TjCqV3a_iNCFnw   1   1          1            0      3.1kb          3.1kb

# Get a specific index in a cluster
$ curl -XGET http://localhost:9200/testing?pretty
{
  "testing" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1498323861381",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "Zkhf_-5pR2yPvjvqgdLlyg",
        "version" : {
          "created" : "5040299"
        },
        "provided_name" : "testing"
      }
    }
  }
}

NOTE:

  • In ElasticSearch 5.x, a new index is assigned 5 primary shards by default (this number is fixed once the index is created)
  • Each primary shard gets one replica shard by default (this can be changed at any time)

ElasticSearch VS Relational Database Analogy

Documents

As mentioned above, a document can be thought of as a row (an individual entry in ElasticSearch), and each field contained in a document can be thought of as a column.

Easiest way to understand this

index --> my_index
type --> my_doc
document --> 1
Fields -->
"name":"testuser",
"years":1,
"date":"2017-06-24"

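As a rough sketch, the breakdown above can be rendered as a Dev Tools style request. The helper name `console_request` is made up for illustration, and the use of PUT with an explicit id is an assumption, since the original request is not shown in the text.

```python
import json


def console_request(index, doc_type, doc_id, fields):
    """Render index/type/id/fields as a Dev Tools style indexing request."""
    # The path addresses one document: /<index>/<type>/<id>
    request_line = f"PUT /{index}/{doc_type}/{doc_id}"
    # The fields become the JSON body of the document
    return request_line + "\n" + json.dumps(fields, indent=2)


fields = {"name": "testuser", "years": 1, "date": "2017-06-24"}
text = console_request("my_index", "my_doc", 1, fields)
print(text)
```

The first printed line is the request path and the rest is the JSON body, which is the shape the Kibana Dev Tools console accepts.
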
Now to get the data back

Now to get a mapping

If we want to get information about a particular index

$ curl -XGET http://localhost:9200/my_index?pretty
{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "my_doc" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "years" : {
            "type" : "long"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1498325421498",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "upg2ElNdTHGpcCPCOMb6AA",
        "version" : {
          "created" : "5040299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

If we want a different number of primary shards or replicas, we can set both when creating an index, which is easy to do from the Kibana Developer Console. Note that the number of primary shards cannot be changed afterwards, as it is immutable.
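
A minimal sketch of what such a create-index request body might look like, printed in Dev Tools console form. The index name `my_new_test` appears later in this post's index listing; the shard and replica values used here (1 and 1) and the helper name `index_settings` are illustrative assumptions, not the values from the original screenshot.

```python
import json


def index_settings(primary_shards, replicas):
    """Build the body for creating an index with explicit shard settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,   # fixed after creation
                "number_of_replicas": replicas,       # can be changed later
            }
        }
    }


body = index_settings(primary_shards=1, replicas=1)  # illustrative values
print("PUT /my_new_test")
print(json.dumps(body, indent=2))
```
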

To verify it

As mentioned above, the number of primary shards is immutable, i.e. we can’t change that value, but we can change the number of replicas.

So let’s try to change the number of replicas
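
One way to picture the update, as a sketch: the body below targets the index `_settings` endpoint. The new replica count (2) is an illustrative value, and the client-side guard against `number_of_shards` is an assumption added here to mirror the fact that ElasticSearch itself rejects that change on an existing index.

```python
import json

# Settings that cannot be changed on an existing index (per the text above)
IMMUTABLE_SETTINGS = {"number_of_shards"}


def settings_update(**changes):
    """Build a body for PUT /<index>/_settings, refusing immutable settings."""
    blocked = IMMUTABLE_SETTINGS.intersection(changes)
    if blocked:
        raise ValueError(f"cannot change after index creation: {sorted(blocked)}")
    return {"index": changes}


update = settings_update(number_of_replicas=2)  # illustrative value
print("PUT /my_new_test/_settings")
print(json.dumps(update, indent=2))
```

Calling `settings_update(number_of_shards=3)` raises `ValueError` in this sketch, which is the local equivalent of the error the server would return.
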

But if we try to change the number of shards

To verify it

Mapping

Mapping describes the field properties of a document through the type. A mapping should be set up before the first document is added, so that the system knows what type of data each field contains.

name: string
date: date

NOTE: If you try to change the datatype of a field after data has been indexed (e.g. from date to string), all that data will become unsearchable. The only solution to this problem is to re-index all the data.

A simple example of Mapping
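
A hedged sketch of what such a mapping could look like for the two fields listed above, in ElasticSearch 5.x form with a type name. The index name `my_mapped_index` is hypothetical, the type name `my_doc` is borrowed from the earlier example, and `text` is used for the string field because ElasticSearch 5.x replaced the `string` type with `text` and `keyword`.

```python
import json

# Mapping body for PUT /<index>, declared before any document is added
mapping_body = {
    "mappings": {
        "my_doc": {
            "properties": {
                "name": {"type": "text"},   # free-text string field
                "date": {"type": "date"},   # parsed and indexed as a date
            }
        }
    }
}

print("PUT /my_mapped_index")  # hypothetical index name
print(json.dumps(mapping_body, indent=2))
```
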

Deleting an Index

First, let’s list all indices

$ curl -XGET http://localhost:9200/_cat/indices?v
health status index       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testing     Zkhf_-5pR2yPvjvqgdLlyg   5   1          0            0       650b           650b
yellow open   my_index    upg2ElNdTHGpcCPCOMb6AA   5   1          1            0      4.4kb          4.4kb
yellow open   my_new_test qcwTmf7VSbiutHfZOJSlZw   1   2          0            0       130b           130b
yellow open   test        rfTZJwiSTSmufAzzHWjSkg   5   1          0            0       795b           795b
yellow open   .kibana     8-5xFao6TjCqV3a_iNCFnw   1   1          1            0      3.1kb          3.1kb

Now to delete an index

$ curl -XDELETE http://localhost:9200/test
{"acknowledged":true}

Deleting an index effectively removes all documents and types associated with that index.

To delete multiple indices (via the Developer Console)
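
As a sketch, several indices can be removed in one request by listing their names separated by commas in the path. The snippet below only builds the request and does not send it; the helper name is made up, and the two index names are taken from the listing above purely as an example.

```python
from urllib.request import Request


def build_delete_indices_request(names, base_url="http://localhost:9200"):
    """Build (but do not send) a DELETE request for several indices."""
    if not names:
        # An empty list would produce a DELETE on the cluster root; refuse it
        raise ValueError("at least one index name is required")
    return Request(f"{base_url}/{','.join(names)}", method="DELETE")


request = build_delete_indices_request(["testing", "my_new_test"])
print(request.get_method(), request.full_url)
# → DELETE http://localhost:9200/testing,my_new_test
```
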

To check the cluster health

$ curl -XGET http://localhost:9200/_cluster/health?pretty
# The status is yellow because this is a single-node cluster, so replica shards cannot be assigned
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 12,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 48.0
}
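
The shard counts in this output can be reproduced from the index listing above (minus the deleted `test` index). The short calculation below is a sketch of that arithmetic; it assumes, as is the case on a single node, that no replica can be assigned because a replica is never placed on the same node as its primary.

```python
# (primary shards, replicas per primary) for each remaining index,
# copied from the pri and rep columns of the _cat/indices listing
indices = {
    "testing": (5, 1),
    "my_index": (5, 1),
    "my_new_test": (1, 2),
    ".kibana": (1, 1),
}

primaries = sum(pri for pri, _ in indices.values())
replica_shards = sum(pri * rep for pri, rep in indices.values())
total = primaries + replica_shards

# Single node: only primaries are active, every replica stays unassigned
active = primaries
unassigned = replica_shards
percent_active = 100 * active / total

print(primaries, unassigned, percent_active)
# → 12 13 48.0
```

These match `active_primary_shards`, `unassigned_shards`, and `active_shards_percent_as_number` in the health response.
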

Adding a document

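  • We can add a document without assigning an id; ElasticSearch then generates an id for us (we need to use POST for this purpose)
  • Optionally, we can assign the id ourselves using PUT (if a document with the same id already exists, a plain PUT overwrites it and increments its version; ElasticSearch returns a 409 conflict error only when creation is forced, e.g. with op_type=create)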
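
The two options above can be sketched as follows. The snippet builds the requests without sending them; the helper name `build_add_request` and its `create_only` flag are made up for illustration. With `op_type=create`, ElasticSearch answers with a 409 conflict if the id already exists, whereas a plain PUT to an existing id replaces the document.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # the local node used throughout this post


def build_add_request(index, doc_type, fields, doc_id=None, create_only=False):
    """Build (but do not send) a request that adds one document."""
    body = json.dumps(fields).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if doc_id is None:
        # No id given: POST to the type and let ElasticSearch generate one
        return Request(f"{BASE_URL}/{index}/{doc_type}", data=body,
                       method="POST", headers=headers)
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    if create_only:
        # Fail with 409 instead of overwriting an existing document
        url += "?op_type=create"
    return Request(url, data=body, method="PUT", headers=headers)


doc = {"name": "testuser", "years": 1, "date": "2017-06-24"}
auto_id = build_add_request("my_index", "my_doc", doc)
fixed_id = build_add_request("my_index", "my_doc", doc, doc_id=2, create_only=True)
print(auto_id.get_method(), auto_id.full_url)
print(fixed_id.get_method(), fixed_id.full_url)
# To send either one: urllib.request.urlopen(auto_id)
```
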

Let’s add a field to our document mapping. We can’t change the mapping of an existing field, but we can add new fields after the index has been created.
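
A sketch of that rule, assuming the `my_index`/`my_doc` mapping shown earlier: new fields go into a `properties` body sent to the 5.x mapping endpoint, while an attempt to redefine an existing field is refused. The field name `email` and the helper `add_fields` are hypothetical, and the local check only imitates the rejection that ElasticSearch would perform.

```python
import json

# Field types already present in the my_index mapping shown earlier
existing_properties = {
    "date": {"type": "date"},
    "name": {"type": "text"},
    "years": {"type": "long"},
}


def add_fields(existing, new_fields):
    """Build a put-mapping body, refusing to redefine existing fields."""
    clashes = [name for name, spec in new_fields.items()
               if name in existing and existing[name]["type"] != spec["type"]]
    if clashes:
        raise ValueError(f"cannot change the type of existing fields: {clashes}")
    return {"properties": new_fields}


body = add_fields(existing_properties, {"email": {"type": "keyword"}})
print("PUT /my_index/_mapping/my_doc")
print(json.dumps(body, indent=2))
```
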

To delete a document

To verify it

Bulk API

  • Allows us to index multiple documents in a single request

To verify it

To use the Bulk API to add documents from an external file

curl -XPOST http://localhost:9200/books/_bulk --data-binary @test.json
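
The contents of `test.json` are not shown here, so the following is only a sketch of how such a file could be produced. The Bulk API expects newline-delimited JSON: an action line followed by the document source, with a final newline at the end. The book documents, the type name `book`, and the helper `to_bulk_ndjson` are all hypothetical.

```python
import json


def to_bulk_ndjson(docs, doc_type):
    """Turn (id, source) pairs into a Bulk API body of index actions."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where (the index comes from the URL)
        lines.append(json.dumps({"index": {"_type": doc_type, "_id": str(doc_id)}}))
        # Source line: the document itself, on a single line
        lines.append(json.dumps(source))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n" if lines else ""


books = [  # made-up sample data
    (1, {"title": "Example Book One", "pages": 120}),
    (2, {"title": "Example Book Two", "pages": 340}),
]
payload = to_bulk_ndjson(books, "book")
print(payload, end="")

# To write the file used by the curl command above:
# with open("test.json", "w") as handle:
#     handle.write(payload)
```
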

I look forward to you joining this journey: spend at least an hour every day for the next 100 days on DevOps work, and post your progress using any of the mediums below.
