100 Days of DevOps — Day 84-Introduction to ElasticSearch
To view the updated DevOps course (101 Days of DevOps):
Course Registration link: https://www.101daysofdevops.com/register/
Course Link: https://www.101daysofdevops.com/courses/101-days-of-devops/
YouTube link: https://www.youtube.com/user/laprashant/videos
Welcome to Day 84 of 100 Days of DevOps. The focus for today is an introduction to ElasticSearch.
ElasticSearch is a fast, distributed (fault-tolerant) search and analytics engine built on the Apache Lucene project. It is designed to search indices over massive datasets, on the order of petabytes.
Installing ElasticSearch
Requirement
Java 8 or later
Installation
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.2.zip
$ unzip elasticsearch-5.4.2.zip
$ cd elasticsearch-5.4.2
# To start ElasticSearch
$ bin/elasticsearch
To test ElasticSearch
$ curl http://localhost:9200
{
  "name" : "ASEUJl9",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LSEdI1HARj-O_cSk0k8DLg",
  "version" : {
    "number" : "5.4.2",
    "build_hash" : "929b078",
    "build_date" : "2017-06-15T02:29:28.122Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.1"
  },
  "tagline" : "You Know, for Search"
}
To test through a GUI, we need Kibana.
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-5.4.2-darwin-x86_64.tar.gz
$ tar -xvf kibana-5.4.2-darwin-x86_64.tar.gz
$ cd kibana-5.4.2-darwin-x86_64
$ vim config/kibana.yml
Now go inside kibana.yml and uncomment this entry
elasticsearch.url: "http://localhost:9200"
# Start Kibana
$ bin/kibana
# To access Kibana, open http://localhost:5601 (Kibana's default address) in a browser
Now let's try to create an index using Dev Tools (don't worry if you don't yet understand what an index is)
Now try to get it back
ElasticSearch Stack
- Kibana: Data visualization
- ElasticSearch: Store, index, search, and analyze data
- Logstash, Beats: Data ingestion
- X-Pack: Additional features
What is an Index?
An index is a logical namespace that points to one or more shards (containers for data) in an ElasticSearch cluster. It serves as the place to store related data.
# Adding Index
$ curl -XPUT http://localhost:9200/testing
{"acknowledged":true,"shards_acknowledged":true}
# Getting all indices in a cluster
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testing Zkhf_-5pR2yPvjvqgdLlyg   5   1          0            0       650b           650b
yellow open   test    rfTZJwiSTSmufAzzHWjSkg   5   1          0            0       650b           650b
yellow open   .kibana 8-5xFao6TjCqV3a_iNCFnw   1   1          1            0      3.1kb          3.1kb
# Get a specific index in a cluster
$ curl -XGET http://localhost:9200/testing?pretty
{
  "testing" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1498323861381",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "Zkhf_-5pR2yPvjvqgdLlyg",
        "version" : {
          "created" : "5040299"
        },
        "provided_name" : "testing"
      }
    }
  }
}
NOTE:
- When an index is created, it is assigned 5 primary shards by default (this number is fixed once the index exists)
- By default each primary shard has one replica shard (this can be changed at any time)
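Both defaults can be overridden when the index is created. The following is a minimal Python sketch, assuming the local node at http://localhost:9200 used throughout this post. The helper name `create_index_request` is hypothetical, and the request is only built, not sent. The values mirror the `my_new_test` index (1 primary shard, 2 replicas) that appears in the index listing later in this post.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # local single-node cluster from this post


def create_index_request(index, shards, replicas):
    """Build (without sending) a PUT request that creates an index
    with an explicit number of primary shards and replicas."""
    body = {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}
    return Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("my_new_test", shards=1, replicas=2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the body before it touches the cluster.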
ElasticSearch VS Relational Database Analogy
Documents
In this analogy, a document can be thought of as a row (an individual entry in ElasticSearch), and each field contained in a document can be thought of as a column.
The easiest way to understand this:
index --> my_index
type --> my_doc
document --> 1
Fields -->"name":"testuser",
"years":1,
"date":"2017-06-24"
Now to get the data back
Now to get a mapping
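The three steps above (indexing the document, reading it back, and fetching the mapping) can be written out as follows. This is a sketch in Python that only prints each request in the Kibana Dev Tools console format, so it can be pasted into the console. The index, type, id, and fields come from the example above; the `steps` list is just an illustrative structure.

```python
import json

# Document from the example above: my_index / my_doc / 1.
document = {"name": "testuser", "years": 1, "date": "2017-06-24"}

# (method, path, body) triples in the order you would run them.
steps = [
    ("PUT", "/my_index/my_doc/1", document),  # index the document with id 1
    ("GET", "/my_index/my_doc/1", None),      # read it back
    ("GET", "/my_index/_mapping", None),      # see the mapping that was created
]

for method, path, body in steps:
    # Print each step as a Dev Tools console request.
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```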
If we want to get information about a particular index
$ curl -XGET http://localhost:9200/my_index?pretty
{
  "my_index" : {
    "aliases" : { },
    "mappings" : {
      "my_doc" : {
        "properties" : {
          "date" : {
            "type" : "date"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "years" : {
            "type" : "long"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1498325421498",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "upg2ElNdTHGpcCPCOMb6AA",
        "version" : {
          "created" : "5040299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}
To set the number of primary shards and the number of replicas, we can use the Kibana Developer Console. The number of primary shards can only be chosen when the index is created (it is immutable afterwards), while the number of replicas can be changed later.
To verify it
As mentioned above, the number of primary shards is immutable, i.e. we can't change that value on an existing index, but we can change the number of replicas
So let's try to change the number of replicas
But if we try to change the number of shards, ElasticSearch rejects the request
To verify it
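The replica change described above goes through the index `_settings` endpoint. Here is a short Python sketch, assuming the `my_index` index from this post; the helper name `replica_update` is hypothetical and the request is only described, not sent. Sending `number_of_shards` the same way to an existing index is expected to fail, since that setting is fixed at creation time.

```python
import json


def replica_update(index, replicas):
    """Return (method, path, body) for changing the replica count of an
    existing index. number_of_replicas can be updated on a live index."""
    return "PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}


method, path, body = replica_update("my_index", 2)
print(method, path)
print(json.dumps(body))

# Afterwards, GET /my_index/_settings shows the new replica count.
```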
Mapping
A mapping describes the properties of the fields in a document for a given type. The mapping should be set up before the first document is added, so that the system knows what type of data each field contains.
name: string
date: date
NOTE: If you try to change the datatype of a field after it has been indexed (e.g. from date to string), the existing data will not be searchable under the new type. The only solution to this problem is to re-index all the data.
A simple example of Mapping
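As an illustration, here is a sketch of a create-index request that defines the mapping up front. It is written in Python and only prints the request in Dev Tools console format. The index name `my_mapped_index` is made up for this example; the type `my_doc` and the fields `name` and `date` follow this post. It uses `text` for the string field, which is the type shown in the `my_index` mapping output earlier.

```python
import json

# Hypothetical index name; type and field names follow this post.
index = "my_mapped_index"

body = {
    "mappings": {
        "my_doc": {
            "properties": {
                "name": {"type": "text"},  # analysed, full-text string field
                "date": {"type": "date"},
            }
        }
    }
}

print("PUT", f"/{index}")
print(json.dumps(body, indent=2))
```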
Deleting an Index
First, let's list all indices
$ curl -XGET http://localhost:9200/_cat/indices?v
health status index       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   testing     Zkhf_-5pR2yPvjvqgdLlyg   5   1          0            0       650b           650b
yellow open   my_index    upg2ElNdTHGpcCPCOMb6AA   5   1          1            0      4.4kb          4.4kb
yellow open   my_new_test qcwTmf7VSbiutHfZOJSlZw   1   2          0            0       130b           130b
yellow open   test        rfTZJwiSTSmufAzzHWjSkg   5   1          0            0       795b           795b
yellow open   .kibana     8-5xFao6TjCqV3a_iNCFnw   1   1          1            0      3.1kb          3.1kb
Now to delete an index
$ curl -XDELETE http://localhost:9200/test
{"acknowledged":true}
Deleting an index effectively removes all documents and types associated with that index.
To delete multiple indices (via the Developer Console)
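Several indices can be removed in one request by listing them separated by commas. This is a small Python sketch that builds such a request; the helper `delete_indices` and the index names `logs_a` and `logs_b` are hypothetical and are not indices from this post.

```python
def delete_indices(*names):
    """Return (method, path) for deleting several indices in one request,
    using a comma-separated list of index names in the path."""
    if not names:
        raise ValueError("at least one index name is required")
    return "DELETE", "/" + ",".join(names)


# Printed in Dev Tools console format.
print(*delete_indices("logs_a", "logs_b"))  # DELETE /logs_a,logs_b
```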
To check the cluster health
$ curl -XGET http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow", # It shows yellow because it's a single-node cluster
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 12,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 48.0
}
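The shard counts in this health output can be reproduced from the index listing shown earlier, once the `test` index has been deleted. The following Python sketch does the arithmetic. It assumes that on a single node all primaries are assigned and no replicas are, because a replica is not placed on the same node as its primary.

```python
# (primary shards, replicas per primary) for each index remaining
# after the `test` index was deleted.
indices = {
    "testing": (5, 1),
    "my_index": (5, 1),
    "my_new_test": (1, 2),
    ".kibana": (1, 1),
}

primaries = sum(p for p, _ in indices.values())
replicas = sum(p * r for p, r in indices.values())

# Single node: every primary is active, every replica stays unassigned.
active, unassigned = primaries, replicas
percent = 100.0 * active / (active + unassigned)

print(primaries, replicas, percent)  # 12 13 48.0
```

This matches `active_primary_shards` (12), `unassigned_shards` (13), and `active_shards_percent_as_number` (48.0) in the output, and it is why the status is yellow rather than green.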
Adding a document
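- We can add a document without assigning an id; ElasticSearch then assigns an id for us (we need to use POST for this purpose)
- Optionally, we can assign the id ourselves using PUT. A plain PUT to an id that already exists overwrites the document; ElasticSearch throws a 409 conflict error only when the request is create-only (the _create endpoint or op_type=create)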
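The two ways of adding a document can be sketched as request paths. This Python helper (the name `add_document_request` is hypothetical) returns the method and path only; the document body would be sent as JSON alongside. For the explicit-id case it uses the `_create` endpoint, the create-only form of PUT, which is the variant that responds with 409 when the id is already taken.

```python
def add_document_request(index, doc_type, doc_id=None):
    """Return (method, path) for indexing a document.

    Without an id: POST, and Elasticsearch generates the id.
    With an id: PUT to the _create endpoint, which refuses to
    overwrite an existing document with the same id.
    """
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}"
    return "PUT", f"/{index}/{doc_type}/{doc_id}/_create"


print(*add_document_request("my_index", "my_doc"))     # POST /my_index/my_doc
print(*add_document_request("my_index", "my_doc", 2))  # PUT /my_index/my_doc/2/_create
```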
Let's add a field to our document mapping. We can't change the mapping of an existing field, but we can add new fields to the mapping after the index has been created
To delete a document
To verify it
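The mapping update and the delete-and-verify steps above could look like the following in Dev Tools console format. This is a Python sketch that only prints the requests. The new field name `city` and its `keyword` type are made up for illustration; the index, type, and id are the ones used in this post.

```python
import json

steps = [
    # Add a new field to the existing my_doc mapping (field name is made up).
    ("PUT", "/my_index/_mapping/my_doc",
     {"properties": {"city": {"type": "keyword"}}}),
    # Delete document 1 ...
    ("DELETE", "/my_index/my_doc/1", None),
    # ... then try to read it back; the response should report it as not found.
    ("GET", "/my_index/my_doc/1", None),
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```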
Bulk API
- Allows us to index multiple documents in a single request
To verify it
To use the bulk API to load documents from an external file
curl -XPOST http://localhost:9200/books/_bulk --data-binary @test.json
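The file passed to the bulk API is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. Here is a Python sketch of how a file like `test.json` could be generated. The helper `bulk_lines`, the type name `book`, and the sample titles are all hypothetical, since the post does not show the file's contents. Note that Elasticsearch 6.0 and later also expect a `Content-Type: application/x-ndjson` header on the curl call.

```python
import json


def bulk_lines(index, doc_type, docs):
    """Render documents as newline-delimited JSON for the bulk API:
    an action line, then the document source, for each document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Made-up sample data for illustration only.
books = {
    "1": {"title": "Example Book One", "year": 2016},
    "2": {"title": "Example Book Two", "year": 2017},
}

payload = bulk_lines("books", "book", books)
print(payload, end="")
# To produce the file used by curl: open("test.json", "w").write(payload)
```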
I look forward to you joining this journey, spending at least an hour every day for the next 100 days on DevOps work, and posting your progress using any of the channels below.
- Twitter: @100daysofdevops OR @lakhera2015
- Facebook: https://www.facebook.com/groups/795382630808645/
- Medium: https://medium.com/@devopslearning
- Slack: https://devops-myworld.slack.com/messages/CF41EFG49/
- GitHub Link: https://github.com/100daysofdevops