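Preface
Elasticsearch 5.x introduces an important new feature: the ingest node.
Previously, any processing of the data had to happen before indexing; Logstash, for example, can structure and transform logs. Now this work can be done directly inside Elasticsearch.
Elasticsearch currently ships a number of common processors such as convert and grok. To use them, first define a pipeline that describes how documents should be processed, then specify the pipeline name when indexing a document; the document is then processed by that predefined pipeline before it is indexed.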
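The idea of a pipeline as an ordered list of processors can be illustrated with a small toy model. This is only a conceptual sketch in Python, not how Elasticsearch implements ingest nodes; the helpers `convert_to_int`, `rename`, and `run_pipeline` are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Processor = Callable[[Document], Document]


def convert_to_int(field: str) -> Processor:
    """Toy stand-in for a convert-style processor: cast one field to int."""
    def run(doc: Document) -> Document:
        updated = dict(doc)  # copy so the caller's document is left untouched
        updated[field] = int(str(updated[field]))
        return updated
    return run


def rename(old: str, new: str) -> Processor:
    """Toy processor that moves a value from one field name to another."""
    def run(doc: Document) -> Document:
        updated = dict(doc)
        updated[new] = updated.pop(old)
        return updated
    return run


def run_pipeline(processors: List[Processor], doc: Document) -> Document:
    """Apply each processor in order, the way a pipeline runs before indexing."""
    for processor in processors:
        doc = processor(doc)
    return doc


# Hypothetical pipeline: two processors applied one after the other.
pipeline = [convert_to_int("status"), rename("msg", "message")]
result = run_pipeline(pipeline, {"status": "200", "msg": "ok"})
print(result)
```

The point of the sketch is the ordering: each processor receives the document produced by the previous one, and only the final document is indexed.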
Ingest Attachment Processor Plugin
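Processes document attachments; it replaces the earlier mapper-attachments plugin.
By default the attachment content must be supplied as base64-encoded data. To avoid the base64 conversion, CBOR can be used instead (not tested here).
From the official documentation: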
The source field must be a base64 encoded binary.
If you do not want to incur the overhead of converting back and forth between base64,
you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation.
The processor will skip the base64 decoding then.
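Since the source field has to carry base64 text, the client must encode the file bytes before sending the document. A minimal Python sketch of that step follows; the function name `encode_attachment` is an assumption made for this example, and the input bytes are the plain-text sample used later in this post.

```python
import base64


def encode_attachment(raw: bytes) -> str:
    """Return the base64 text to place in the document's source field."""
    return base64.b64encode(raw).decode("ascii")


# In practice the bytes would come from a file, e.g. open(path, "rb").read().
payload = encode_attachment(b"testing my first encoded text")
print(payload)
```

The resulting string is what goes into the `data` field of the JSON document.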
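Installation (the plugin must be installed on every node in the cluster, and each node restarted afterwards)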
./bin/elasticsearch-plugin install ingest-attachment
Removal
./bin/elasticsearch-plugin remove ingest-attachment
Example: processing a single attachment with a pipeline (Using the Attachment Processor in a Pipeline)
1. Create the pipeline single_attachment
PUT _ingest/pipeline/single_attachment
{
  "description" : "Extract single attachment information",
  "processors" : [
    {
      "attachment" : {
        "field": "data",
        "indexed_chars" : -1,
        "ignore_missing" : true
      }
    }
  ]
}
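Before indexing real documents, a stored pipeline can be tried out with the simulate endpoint (`POST _ingest/pipeline/<name>/_simulate`). The Python sketch below only builds the request body and prints it; nothing is sent over the network. The helper name `build_simulate_body` is an assumption for this example, and the `_index`, `_type`, and `_id` values are placeholders borrowed from the steps in this post.

```python
import json


def build_simulate_body(encoded: str) -> str:
    """Build a simulate request body with one document holding base64 data."""
    docs = [
        {
            "_index": "index1",  # placeholder metadata for the simulated doc
            "_type": "type1",
            "_id": "1",
            "_source": {"data": encoded},
        }
    ]
    return json.dumps({"docs": docs}, indent=2)


body = build_simulate_body("dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ=")
print("POST _ingest/pipeline/single_attachment/_simulate")
print(body)
```

The printed request can be pasted into the console; the response should show the document as the pipeline would transform it, without anything being indexed.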
2. Create the index
PUT /index1
{
  "mappings" : {
    "type1" : {
      "properties" : {
        "id": {
          "type": "keyword"
        },
        "filename": {
          "type": "text",
          "analyzer": "english"
        },
        "data": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}
3. Index the data
PUT index1/type1/1?pipeline=single_attachment&refresh=true&pretty=1
{
  "id": "1",
  "filename": "1.txt",
  "data" : "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
PUT index1/type1/2?pipeline=single_attachment&refresh=true&pretty=1
{
  "id": "2",
  "subject": "2.txt",
  "data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="
}
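To see what the two sample payloads actually contain, they can be decoded locally. The following Python sketch does only that; the names `SAMPLES` and `decode_sample` are made up for this example.

```python
import base64

# The two base64 payloads indexed above, keyed by document id.
SAMPLES = {
    "1": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "2": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ=",
}


def decode_sample(encoded: str) -> bytes:
    """Decode one base64 payload back into the original bytes."""
    return base64.b64decode(encoded)


decoded = {doc_id: decode_sample(value) for doc_id, value in SAMPLES.items()}
for doc_id, raw in decoded.items():
    print(doc_id, repr(raw))
```

Decoding shows that document 1 is a small RTF fragment (despite the 1.txt filename), while document 2 is plain text.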
4. View the results
GET index1/type1/1
GET index1/type1/2
POST index1/type1/_search?pretty=true
{
  "query": {
    "match": {
      "attachment.content_type": "text plain"
    }
  }
}
POST index1/type1/_search?pretty=true
{
  "query": {
    "match": {
      "attachment.content": "testing"
    }
  },
  "highlight": {
    "fields": {
      "attachment.content": {}
    }
  }
}
Response
"hits": [
  {
    "_index": "index1",
    "_type": "type1",
    "_id": "2",
    "_score": 0.2824934,
    "_source": {
      "data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ=",
      "attachment": {
        "content_type": "text/plain; charset=ISO-8859-1",
        "language": "et",
        "content": "testing my first encoded text",
        "content_length": 30
      },
      "subject": "2.txt",
      "id": "2"
    },
    "highlight": {
      "attachment.content": [
        "<em>testing</em> my first encoded text"
      ]
    }
  }
]
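On the client side, the highlighted fragments can be pulled out of the `hits` array with a few lines of code. The sketch below is a hypothetical Python helper (`collect_highlights` is not part of any Elasticsearch client); it works on an already parsed response and uses a trimmed copy of the hit shown above as sample input.

```python
from typing import Dict, List


def collect_highlights(hits: List[dict], field: str) -> Dict[str, List[str]]:
    """Map each hit's _id to its highlighted fragments for one field."""
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in hits
    }


# Trimmed copy of the hit from the response above.
sample_hits = [
    {
        "_id": "2",
        "highlight": {
            "attachment.content": ["<em>testing</em> my first encoded text"]
        },
    }
]
snippets = collect_highlights(sample_hits, "attachment.content")
print(snippets)
```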
Example: processing multiple attachments with a pipeline (Using the Attachment Processor with arrays)
1. Create the pipeline multi_attachment
PUT _ingest/pipeline/multi_attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "indexed_chars" : -1,
            "ignore_missing" : true
          }
        }
      }
    }
  ]
}
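What the foreach processor does with `_ingest._value` can be pictured with a toy model: the inner processor runs once per array element, reading `data` from that element and writing `attachment` back into it. The Python sketch below is only a conceptual illustration. The real plugin extracts content with Apache Tika, whereas the stand-in `toy_attachment` merely decodes the bytes as UTF-8 text; both function names are invented for this example.

```python
import base64
import copy
from typing import Callable


def toy_attachment(element: dict) -> None:
    """Stand-in for the attachment processor: decode `data` as text only."""
    if "data" not in element:  # mirrors "ignore_missing": true
        return
    text = base64.b64decode(element["data"]).decode("utf-8")
    element["attachment"] = {"content": text}


def foreach(doc: dict, field: str, processor: Callable[[dict], None]) -> dict:
    """Run the processor on every element of doc[field], like _ingest._value."""
    result = copy.deepcopy(doc)  # keep the input document unchanged
    for element in result.get(field, []):
        processor(element)
    return result


doc = {
    "attachments": [
        {"filename": "test1.txt", "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="},
        {"filename": "test2.txt", "data": "VGhpcyBpcyBhIHRlc3QK"},
    ]
}
processed = foreach(doc, "attachments", toy_attachment)
for item in processed["attachments"]:
    print(item["filename"], repr(item["attachment"]["content"]))
```

Each element ends up with its own `attachment` object, which is why the extracted text is addressed as `attachments.attachment.content`.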
2. Create the index
PUT /index2
{
  "mappings" : {
    "type2" : {
      "properties" : {
        "id": {
          "type": "keyword"
        },
        "subject": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "attachments": {
          "properties": {
            "filename" : {"type": "text", "analyzer": "english"},
            "data": {"type": "text", "analyzer": "english"}
          }
        }
      }
    }
  }
}
3. Index the data
PUT index2/type2/1?pipeline=multi_attachment&refresh=true&pretty=1
{
  "id": "1",
  "subject": "Elasticsearch: The Definitive Guide007",
  "attachments" : [
    {
      "filename" : "a.txt",
      "data" : "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
    },
    {
      "filename" : "b.txt",
      "data" : "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="
    }
  ]
}
PUT index2/type2/2?pipeline=multi_attachment&refresh=true&pretty=1
{
  "id": "2",
  "subject": "Using the Attachment Processor with arrays",
  "attachments" : [
    {
      "filename" : "test1.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test2.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
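A document with an `attachments` array like the ones above can be assembled from raw file contents on the client. The Python sketch below shows one way to do it; `build_document` is a hypothetical helper, and the byte strings stand in for files that would normally be read from disk.

```python
import base64
import json
from typing import Iterable, Tuple


def build_document(doc_id: str, subject: str,
                   files: Iterable[Tuple[str, bytes]]) -> dict:
    """Build a document whose attachments array holds base64-encoded files."""
    return {
        "id": doc_id,
        "subject": subject,
        "attachments": [
            {"filename": name, "data": base64.b64encode(raw).decode("ascii")}
            for name, raw in files
        ],
    }


# Byte strings stand in for file contents read with open(path, "rb").read().
document = build_document(
    "2",
    "Using the Attachment Processor with arrays",
    [
        ("test1.txt", b"this is\njust some text\n"),
        ("test2.txt", b"This is a test\n"),
    ],
)
print(json.dumps(document, indent=2))
```

The printed JSON can be used as the body of an index request that names the multi_attachment pipeline.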
4. View the results
POST index2/type2/_search?pretty=true
{
  "query": {
    "match": {
      "attachments.attachment.content": "test"
    }
  },
  "highlight": {
    "fields": {
      "attachments.attachment.content": {}
    }
  }
}
Response
"hits": [
  {
    "_index": "index2",
    "_type": "type2",
    "_id": "2",
    "_score": 0.27233246,
    "_source": {
      "attachments": [
        {
          "filename": "test1.txt",
          "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
          "attachment": {
            "content_type": "text/plain; charset=ISO-8859-1",
            "language": "en",
            "content": "this is just some text",
            "content_length": 24
          }
        },
        {
          "filename": "test2.txt",
          "data": "VGhpcyBpcyBhIHRlc3QK",
          "attachment": {
            "content_type": "text/plain; charset=ISO-8859-1",
            "language": "en",
            "content": "This is a test",
            "content_length": 16
          }
        }
      ],
      "id": "2",
      "subject": "Using the Attachment Processor with arrays"
    },
    "highlight": {
      "attachments.attachment.content": [
        "This is a <em>test</em>"
      ]
    }
  }
]