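Elasticsearch is an open-source, highly scalable full-text search and analytics engine. It can store very large volumes of data, search and analyze that data in near real time, and handle complex queries.
Typical use cases for Elasticsearch include:
Search for online stores
Log analysis (the ELK stack)
Monitoring product price fluctuations
Fast investigation, analysis, visualization, and ad hoc querying of massive datasets
Elasticsearch is powerful and simple to use. The rest of this article walks through setting up an Elasticsearch cluster and using it for basic tasks, so you can get started quickly.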
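Basic Concepts
Cluster
A cluster consists of one or more nodes and is identified by a unique name, which defaults to elasticsearch. If several Elasticsearch clusters run on the same network, give them different names. Nodes join a cluster by its name, so clusters that share a name conflict, and a node may join the wrong one.
Node
A node is a single server in a cluster. Nodes are also identified by name. By default the name is derived from a random UUID assigned at startup, and it can be set in the configuration. A cluster can contain any number of nodes, and a single node on its own also forms a cluster.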
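Index
An index is a collection of documents. A cluster can hold any number of indices, as long as it has enough resources.
Type
An index can define one or more types. A type is a logical category within an index, and documents that share a common set of fields are usually defined within the same type. (Starting with Elasticsearch 6.0 an index can hold only one type, and types were later removed altogether. This article uses 5.6.)
Document
A document is the basic unit of information in an index, expressed in JSON.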
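Shards and replicas
An index can store so much data that it exceeds the hardware limits of a single node. For example, an index of 1 billion documents occupying 1 TB of disk either does not fit on one node or makes that node too slow to respond to queries.
To solve this, Elasticsearch can split an index into several smaller pieces called shards. The number of shards is defined when the index is created. Each shard is a fully functional, independent index in its own right and can be stored on any node in the cluster.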
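Shards matter for two reasons:
1. They let you split the data horizontally.
2. Operations can run on multiple shards in parallel, which increases throughput.
In networked and cloud environments a failure can occur at any time, so a failover mechanism is essential. Elasticsearch lets you create one or more copies of each shard, called replica shards (replicas).
Replicas provide two benefits:
1. High availability. A replica is never allocated on the same node as its primary shard, so the data survives the loss of that node.
2. Higher search throughput, since searches can be executed on all replicas in parallel.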
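In short, each index can be split into multiple shards, and each shard can be replicated into multiple copies, so an index has primary shards and replica shards. Both the number of shards and the number of replicas can be specified when the index is created. The difference is that the number of primary shards cannot be changed afterwards, while the number of replicas can be changed dynamically at any time.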
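Why the primary count is fixed becomes clearer from how a document is assigned to a shard. Elasticsearch computes shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The sketch below is an illustration only. It substitutes CRC32 for the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster. The helper names pick_shard and moved_fraction are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Route a document: shard = hash(_routing) % number_of_primary_shards.

    Illustration only: CRC32 stands in for the Murmur3 hash that Elasticsearch
    uses, and _routing is taken to be the document _id (the default).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Share of documents whose target shard changes when the primary count changes."""
    moved = sum(1 for d in doc_ids if pick_shard(d, old_shards) != pick_shard(d, new_shards))
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [str(i) for i in range(10_000)]
    # Going from 5 to 6 primaries sends most ids to a different shard, so existing
    # documents would no longer sit on the shard the formula points to.
    print(f"{moved_fraction(ids, 5, 6):.0%} of documents would change shard")
```

Replicas do not take part in this formula, since they are copies of a primary. That is why their number can change freely.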
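By default (in Elasticsearch 5.x), each index has 5 primary shards and 1 replica. That is, each of the 5 primary shards has one replica copy, for 10 shards in total.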
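The shard arithmetic can be made explicit with a tiny helper. The name total_shards is hypothetical, and its defaults simply mirror the 5.x index defaults.

```python
def total_shards(primaries: int = 5, replicas: int = 1) -> int:
    """Shard copies in one index: every primary plus `replicas` copies of it.

    Defaults mirror the Elasticsearch 5.x index defaults (5 primaries, 1 replica).
    """
    return primaries * (1 + replicas)


if __name__ == "__main__":
    print(total_shards())      # 10: one index with the defaults
    print(2 * total_shards())  # 20: two such indices
```

This agrees with the cluster health output later in this article, which reports pri 10 and shards 20 for two indices that each have 5 primaries and 1 replica. On a single-node cluster, the replica copies cannot be allocated anywhere, because a replica never shares a node with its primary. Such a cluster therefore reports yellow health.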
Every Elasticsearch shard is a Lucene index, and a Lucene index has a maximum number of documents. Per LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128). You can monitor shard sizes with _cat/shards:
curl -XGET gd01:9200/_cat/shards/20171229
20171229 1 p STARTED 1904509 369.5mb 132.98.16.178 data-178
20171229 1 r STARTED 1902986 383.6mb 132.98.16.176 master-176
20171229 3 r STARTED 1898048 349.7mb 132.98.16.178 data-178
20171229 3 p STARTED 1898595 492.2mb 132.98.16.177 data-177
20171229 2 r STARTED 1903094 481.2mb 132.98.16.178 data-178
20171229 2 p STARTED 1904497 526.9mb 132.98.16.176 master-176
20171229 4 p STARTED 1902180 487mb 132.98.16.178 data-178
20171229 4 r STARTED 1900635 586.9mb 132.98.16.176 master-176
20171229 0 p STARTED 1902472 421.6mb 132.98.16.177 data-177
20171229 0 r STARTED 1901511 511.8mb 132.98.16.176 master-176
The columns are: index, shard number, p (primary) or r (replica), state, document count, store size, node IP, and node name.
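Post-processing helps once the shard list gets long. The following is a minimal sketch. It assumes the default _cat/shards column order (index, shard, prirep, state, docs, store, ip, node) and skips any ?v header line. It parses the rows and reports how close the fullest shard is to the Lucene per-shard document limit. The names ShardRow, parse_cat_shards and fullest_shard_ratio are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Integer.MAX_VALUE - 128: the document limit of one Lucene index (one ES shard), per LUCENE-5843.
LUCENE_MAX_DOCS = 2_147_483_647 - 128


@dataclass
class ShardRow:
    index: str
    shard: int
    prirep: str   # "p" = primary, "r" = replica
    state: str
    docs: int
    store: str
    ip: str
    node: str


def parse_cat_shards(text: str) -> List[ShardRow]:
    rows = []
    for line in text.strip().splitlines():
        f = line.split()
        # Skip a ?v header line and rows without docs/store/ip/node (e.g. UNASSIGNED shards).
        if len(f) != 8 or not f[1].isdigit():
            continue
        rows.append(ShardRow(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5], f[6], f[7]))
    return rows


def fullest_shard_ratio(rows: List[ShardRow]) -> float:
    """Share of the Lucene document limit used by the shard holding the most documents."""
    return max(r.docs for r in rows) / LUCENE_MAX_DOCS


if __name__ == "__main__":
    sample = """\
20171229 1 p STARTED 1904509 369.5mb 132.98.16.178 data-178
20171229 1 r STARTED 1902986 383.6mb 132.98.16.176 master-176
"""
    rows = parse_cat_shards(sample)
    print(len(rows), f"{fullest_shard_ratio(rows):.4%}")
```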
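Installing an Elasticsearch Cluster
Elasticsearch 5.6 runs on JDK 1.8 (Java 8), so install JDK 1.8 first.
Download the installation package
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.5.tar.gz
Extract it
tar -xvf elasticsearch-5.6.5.tar.gz
Start a single node (Elasticsearch refuses to run as root, so start it as a regular user)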
elasticsearch-5.6.5/bin/elasticsearch
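Once started, a node answers HTTP on port 9200 by default, and `curl localhost:9200` returns a short JSON banner. Scripts that start a node and then continue may want to wait for that. The sketch below assumes the default http://localhost:9200 address and uses only the Python standard library. The helper name wait_for_node is hypothetical.

```python
import time
import urllib.request
from http.client import HTTPException


def wait_for_node(url: str = "http://localhost:9200", timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, HTTPException):
            # Connection refused, socket timeout, or an HTTP error status
            # (urllib's URLError/HTTPError are OSError subclasses): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script would call `if not wait_for_node(): raise SystemExit("node did not start")` before issuing further requests.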
Cluster configuration
Example elasticsearch.yml:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: es-gotcha
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-${HOSTNAME}
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# Roles: this node is master-eligible but stores no data (a dedicated master node).
node.master: true
node.data: false
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/data/es
#
# Path to log files:
#
path.logs: /var/log/es
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 132.98.16.176
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["132.98.16.176", "132.98.16.177", "132.98.16.179", "132.98.16.180", "132.98.16.182", "132.98.16.183", "132.98.16.184"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true
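The settings that need to be configured are:
cluster.name, the cluster name, which must be identical on every node that should join the cluster
node.name, the node name
node.master, whether the node is eligible to be elected master
node.data, whether the node stores data
network.host, the address the node binds to and publishes to other nodes
discovery.zen.ping.unicast.hosts, the list of Elasticsearch cluster hosts a starting node contacts to discover the cluster
discovery.zen.minimum_master_nodes, the minimum number of master-eligible nodes that must be visible before a master can be elected; set it to (number of master-eligible nodes / 2) + 1 to avoid split brain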
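The value of discovery.zen.minimum_master_nodes follows the quorum formula from the configuration comment, (master-eligible nodes / 2) + 1 with integer division. Here is a sketch of the arithmetic. The function name is hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} master-eligible -> minimum_master_nodes: {minimum_master_nodes(n)}")
```

The value 3 in the sample configuration therefore matches a cluster with 4 or 5 master-eligible nodes. With only 2 master-eligible nodes the quorum is 2, so losing either node blocks master election. This is why 3 master-eligible nodes is the usual minimum for tolerating a node failure.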
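Deploy Elasticsearch on several machines and start the nodes one after another. They discover each other automatically and form a cluster.
Trying Out the Cluster
Elasticsearch provides a REST API and a Java API. The examples below use the REST API, with which we can:
check the health, status, and statistics of the cluster, nodes, and indices
manage the cluster, nodes, index data, and metadata
perform CRUD (create, read, update, delete) operations
run advanced searches with pagination, sorting, filtering, scripting, aggregations, and more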
Cluster health
curl -XGET 'gd01:9200/_cat/health?v'
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1514533722 15:48:42  esbds   green           3         3     20  10    0    0        0             0                  -                100.0%
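With ?v, the _cat APIs print a header row, so each data row can be turned into a record by pairing it with the header. This minimal sketch assumes that no column value contains spaces, which holds for the health, nodes, and indices output in this article. For automation, the _cat APIs also accept format=json. The names parse_cat_v and SAMPLE_HEALTH are hypothetical.

```python
from typing import Dict, List

SAMPLE_HEALTH = """\
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1514533722 15:48:42  esbds   green           3         3     20  10    0    0        0             0                  -                100.0%
"""


def parse_cat_v(text: str) -> List[Dict[str, str]]:
    """Parse `_cat/...?v` output: the first line is the header, the rest are rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # zip() stops at the shorter list, so a row with empty cells would misalign;
    # the sample outputs in this article have no empty cells.
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    health = parse_cat_v(SAMPLE_HEALTH)[0]
    print(health["status"], health["active_shards_percent"])
```

The status column has three possible values. Green means all primary and replica shards are allocated. Yellow means all primaries are allocated but some replicas are not. Red means at least one primary shard is unallocated.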
List nodes
curl -XGET 'gd01:9200/_cat/nodes?v'
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
132.98.16.176           64          26   5    1.45    1.49     1.43 mdi       *      master-176
132.98.16.177           84          19   8    1.27    1.43     1.60 di        -      data-177
132.98.16.178           57          78  16    2.24    2.40     2.45 di        -      data-178
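In 5.x, each letter in the node.role column stands for one role: m for master-eligible, d for data, and i for ingest. A node with none of them shows - and acts as a coordinating-only node. In the master column, * marks the currently elected master. A small decoder is sketched below. ROLE_LETTERS and decode_roles are hypothetical names.

```python
from typing import Set

ROLE_LETTERS = {"m": "master-eligible", "d": "data", "i": "ingest"}


def decode_roles(node_role: str) -> Set[str]:
    """Translate a _cat/nodes node.role string such as 'mdi' into role names."""
    if node_role == "-":
        return {"coordinating-only"}
    return {ROLE_LETTERS[c] for c in node_role}


if __name__ == "__main__":
    print(sorted(decode_roles("mdi")))  # master-176
    print(sorted(decode_roles("di")))   # data-177, data-178
```

Read this way, only master-176 is master-eligible in the listing above, and it also stores data (role mdi). It is not configured like the dedicated-master node (node.master: true, node.data: false) in the sample elasticsearch.yml.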
List indices
curl -XGET 'gd01:9200/_cat/indices?v'
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   20171228 ij-Y05EEQIimzEDYPyzvjw   5   1    7810000      3922446      2.6gb          1.3gb
green  open   20171229 FUabFhc5TYyi4K_y81GJ9w   5   1    9905546      6122165      4.2gb          2.2gb
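The size columns are human-readable strings. The sketch below converts them to bytes so they can be compared or summed, assuming the binary units Elasticsearch uses (1kb = 1024 bytes). As an alternative, the _cat APIs accept a bytes parameter (for example ?bytes=b) that prints raw byte counts. The name parse_size is hypothetical.

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(text: str) -> int:
    """Convert a _cat size string such as '369.5mb' or '2.6gb' into bytes."""
    m = re.fullmatch(r"([0-9.]+)([a-z]+)", text.strip().lower())
    if m is None or m.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return round(float(m.group(1)) * _UNITS[m.group(2)])


if __name__ == "__main__":
    # store.size vs pri.store.size for index 20171229 (rep = 1).
    print(f"{parse_size('4.2gb') / parse_size('2.2gb'):.2f}")
```

With one replica the ratio is close to 2 but not exact. Each copy merges its segments independently, and the _cat/shards output above shows primary and replica copies of the same shard at different sizes.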
Create an index
curl -XPUT 'gd01:9200/test_idx?pretty'
Response:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test_idx"
}
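Shard and replica counts are set at creation time, so they are usually passed explicitly rather than left at the defaults. The sketch below only builds the requests with Python's urllib and sends nothing. The host gd01:9200 comes from this article. The index name test_idx2, the counts, and the helper json_request are hypothetical. The second request changes the replica count later through the _settings endpoint, which the primary count does not allow.

```python
import json
import urllib.request

ES = "http://gd01:9200"  # host used throughout this article


def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the cluster."""
    return urllib.request.Request(
        ES + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Create an index with 3 primaries and 1 replica (hypothetical values).
create = json_request("PUT", "/test_idx2", {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
})

# Later, raise the replica count; number_of_shards cannot be changed this way.
more_replicas = json_request("PUT", "/test_idx2/_settings", {
    "index": {"number_of_replicas": 2}
})

# To actually send one: urllib.request.urlopen(create)
```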
Create a document
Create a document of type external with ID 1 in the test_idx index. The Content-Type header is optional in 5.x but required from 6.0 onward.
curl -XPUT 'gd01:9200/test_idx/external/1?pretty' -H 'Content-Type: application/json' -d '
{
  "name": "John Doe"
}'
Response:
{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}
Retrieve a document
curl -XGET 'gd01:9200/test_idx/external/1?pretty'
Response:
{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
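Bulk operations
Create documents in bulk:
curl -XPOST 'gd01:9200/test_idx/external/_bulk?pretty' -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'
A single bulk request can mix different operations. Here it updates document 1 and deletes document 2:
curl -XPOST 'gd01:9200/test_idx/external/_bulk?pretty' -H 'Content-Type: application/json' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'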
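The bulk body is newline-delimited JSON. Each action line is optionally followed by a source line, and the whole body must end with a newline. Building it programmatically avoids hand-editing mistakes. This is a minimal sketch, and the helper name bulk_body is hypothetical.

```python
import json


def bulk_body(actions) -> str:
    """Serialise (action, source) pairs into a _bulk NDJSON body.

    `source` is None for actions such as delete that carry no document line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))       # json.dumps emits no newlines by default
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_body([
        ({"update": {"_id": "1"}}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
        ({"delete": {"_id": "2"}}, None),
    ])
    print(body, end="")
```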
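Searching
In Elasticsearch, search criteria can be passed either in the URL or in the request body.
Search criteria in the URL
curl -XGET 'gd01:9200/test_idx/external/_search?q=John&pretty'
Result (shown in the single-document form; the actual _search response lists each match under hits.hits, together with metadata such as took, _shards, and a relevance _score per hit):
{
  "_index" : "test_idx",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}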
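URI searches take a Lucene query string in the q parameter, so characters such as spaces and colons must be URL-encoded. The following sketch uses urllib.parse with percent-encoding for every reserved character. The field-scoped query name:John and the helper name search_url are illustrative assumptions.

```python
from urllib.parse import quote, urlencode


def search_url(base: str, index: str, doc_type: str, q: str) -> str:
    """Build a URI search such as /test_idx/external/_search?q=name%3AJohn&pretty=true."""
    # quote_via=quote encodes spaces as %20 (not '+'); urlencode passes safe='',
    # so ':' and '/' are percent-encoded too.
    params = urlencode({"q": q, "pretty": "true"}, quote_via=quote)
    return f"{base}/{index}/{doc_type}/_search?{params}"


if __name__ == "__main__":
    print(search_url("http://gd01:9200", "test_idx", "external", "name:John"))
```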
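Search criteria in the request body
curl -XPOST 'gd01:9200/test_idx/external/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "name": "John Doe"
    }
  }
}'
This uses a match query rather than term. The name field is dynamically mapped as an analyzed text field and indexed as the lowercase tokens john and doe, so a term query for the exact string "John Doe" would not match. For an exact-value lookup, use a term query on the name.keyword sub-field instead.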
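term and match queries treat "John Doe" differently on the dynamically mapped name field. At index time the text is analyzed into lowercase tokens. A term query looks up its input verbatim, whereas a match query runs the query text through the same analyzer first. The sketch approximates the standard analyzer by splitting on ASCII non-alphanumerics and lowercasing. The real analyzer follows Unicode text segmentation, so this is an illustration rather than a reimplementation, and all function names are hypothetical.

```python
import re
from typing import List


def approx_standard_analyzer(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[0-9A-Za-z]+", text)]


def term_matches(indexed_text: str, term: str) -> bool:
    # A term query is not analyzed: the exact string must be one of the indexed tokens.
    return term in approx_standard_analyzer(indexed_text)


def match_matches(indexed_text: str, query: str) -> bool:
    # A match query analyzes the query too; with the default OR operator, any shared token matches.
    return bool(set(approx_standard_analyzer(query)) & set(approx_standard_analyzer(indexed_text)))


if __name__ == "__main__":
    doc = "John Doe"
    print(term_matches(doc, "John Doe"))   # False
    print(term_matches(doc, "john"))       # True
    print(match_matches(doc, "John Doe"))  # True
```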
Beyond simple searches, Elasticsearch also supports:
filtering, see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/_executing_filters.html
aggregations, see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/_executing_aggregations.html
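Filters and aggregations are also expressed in the request body. This hedged sketch of the 5.x query DSL does three things. It wraps a match query in a bool query, adds a non-scoring filter clause, and groups results with a terms aggregation. The age field and its range, the aggregation name by_name, and the helper filtered_search_body are illustrative assumptions rather than anything created earlier in this article. name.keyword is the keyword sub-field that 5.x dynamic mapping adds to string fields.

```python
import json


def filtered_search_body(text: str, min_age: int, max_age: int) -> dict:
    """bool query with a scored `must` clause and a non-scoring `filter`, plus a terms aggregation."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": text}},
                # Filter clauses do not affect _score and can be cached.
                "filter": {"range": {"age": {"gte": min_age, "lte": max_age}}},  # `age` is hypothetical
            }
        },
        "aggs": {
            # Bucket matching documents by the exact (not analyzed) name value.
            "by_name": {"terms": {"field": "name.keyword"}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(filtered_search_body("Doe", 20, 30), indent=2))
```

The printed JSON can be sent as the body of a POST to _search, in the same way as the request-body example above.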