Ingest Attachment Processor Plugin: Basic Usage

Preface

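Elasticsearch 5.x introduced an important new feature: the ingest node.
Previously, any data processing had to happen before indexing; Logstash, for example, can structure and transform logs. Now this processing can be done directly in Elasticsearch.
Elasticsearch currently ships a number of common processors such as convert and grok. To use them, first define a pipeline that contains the document processing logic, then pass the pipeline name when indexing a document; the document is then processed by the predefined pipeline before it is stored.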

Ingest Attachment Processor Plugin

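This plugin processes document attachments and replaces the earlier mapper-attachments plugin.
By default the attachment content must be supplied as base64-encoded data. If you want to avoid the base64 conversion, you can use CBOR instead (I have not tried this).
From the official documentation: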

The source field must be a base64 encoded binary. 
If you do not want to incur the overhead of converting back and forth between base64, 
you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. 
The processor will skip the base64 decoding then.
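
As a minimal sketch of the base64 requirement above, the following Python helpers turn raw file bytes into the string form the source field expects, and back again. The function names `encode_attachment` and `decode_attachment` are hypothetical and only serve this illustration; they are not part of the plugin.

```python
import base64


def encode_attachment(raw: bytes) -> str:
    """Return the base64 text expected in the pipeline's source field."""
    return base64.b64encode(raw).decode("ascii")


def decode_attachment(encoded: str) -> bytes:
    """Reverse of encode_attachment, handy for inspecting sample payloads."""
    return base64.b64decode(encoded)
```

For example, `encode_attachment(b"testing my first encoded text")` yields the same string that is used as the `data` value of the second sample document later in this article.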

Installation

./bin/elasticsearch-plugin install ingest-attachment

Removal

./bin/elasticsearch-plugin remove ingest-attachment

Example: processing a single attachment with a pipeline (Using the Attachment Processor in a Pipeline)

1. Create the pipeline single_attachment

PUT _ingest/pipeline/single_attachment
{
  "description" : "Extract single attachment information",
  "processors" : [
    {
      "attachment" : {
        "field": "data",
        "indexed_chars" : -1,
        "ignore_missing" : true
      }
    }
  ]
}
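
Before indexing real documents, a pipeline like this can be tried out through the pipeline `_simulate` endpoint. The sketch below only builds the request path and JSON body in Python and does not send anything; the helper name `build_simulate_request` is hypothetical, and it assumes that supplying only `_source` for each document is enough on your Elasticsearch version (metadata such as `_index` and `_id` can be added if it is not).

```python
import json


def build_simulate_request(pipeline_id: str, sources: list) -> tuple:
    """Build the path and JSON body for a pipeline _simulate call."""
    path = f"_ingest/pipeline/{pipeline_id}/_simulate"
    # Each simulated document wraps its fields in "_source".
    body = {"docs": [{"_source": source} for source in sources]}
    return path, json.dumps(body, indent=2)


path, body = build_simulate_request(
    "single_attachment",
    [{"data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="}],
)
print("POST", path)
print(body)
```

The printed path and body can be pasted into the console as a POST request to see what the pipeline would produce for the sample document.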

2. Create the index

PUT /index1
{
    "mappings" : {
        "type1" : {
            "properties" : {
                "id": {
                    "type": "keyword"
                },
                "filename": {
                    "type": "text",
                    "analyzer": "english"
                },
                "data":{
                    "type": "text",
                    "analyzer": "english"
                }
            }
        }
    }
}

3. Index the data

PUT index1/type1/1?pipeline=single_attachment&refresh=true&pretty=1
{
    "id": "1",
    "filename": "1.txt",
    "data" : "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
PUT index1/type1/2?pipeline=single_attachment&refresh=true&pretty=1
{
  "id": "2",
  "subject": "2.txt",
  "data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="
}
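
The same kind of index request can also be prepared from a script. The following is a rough sketch using only the Python standard library: it builds (but does not send) a PUT request shaped like the ones above. The helper name `build_index_request` and the base URL `http://localhost:9200` are assumptions made for this illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, pipeline, filename, raw):
    """Prepare a PUT request that indexes one file through an ingest pipeline."""
    query = urllib.parse.urlencode({"pipeline": pipeline, "refresh": "true"})
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?{query}"
    body = {
        "id": doc_id,
        "filename": filename,
        # The attachment processor reads base64 text from the "data" field.
        "data": base64.b64encode(raw).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node address; adjust to your own cluster.
request = build_index_request(
    "http://localhost:9200", "index1", "type1", "2",
    "single_attachment", "2.txt", b"testing my first encoded text",
)
```

Passing `request` to `urllib.request.urlopen` would send it to the cluster; that step is left out here so the sketch runs without a live node.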

4. View the results

GET index1/type1/1
GET index1/type1/2
POST index1/type1/_search?pretty=true
{
  "query": {
    "match": {
      "attachment.content_type": "text plain"
    }
  }
}
POST index1/type1/_search?pretty=true
{
  "query": {
    "match": {
      "attachment.content": "testing"
    }
  },
  "highlight": {
    "fields": {
      "attachment.content": {}
    }
  }
}

Returned result

"hits": [
    {
        "_index": "index1",
        "_type": "type1",
        "_id": "2",
        "_score": 0.2824934,
        "_source": {
            "data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ=",
            "attachment": {
                "content_type": "text/plain; charset=ISO-8859-1",
                "language": "et",
                "content": "testing my first encoded text",
                "content_length": 30
            },
            "subject": "2.txt",
            "id": "2"
        },
        "highlight": {
            "attachment.content": [
                "<em>testing</em> my first encoded text"
            ]
        }
    }
]

Example: processing multiple attachments with a pipeline (Using the Attachment Processor with arrays)

1. Create the pipeline multi_attachment

PUT _ingest/pipeline/multi_attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "indexed_chars" : -1,
            "ignore_missing" : true
          }
        }
      }
    }
  ]
}
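
To make the roles of `_ingest._value`, `field`, and `target_field` more concrete, here is a local Python imitation of what the foreach processor does with the array. It is only a sketch under loud assumptions: `extract_text` is a hypothetical stand-in that simply base64-decodes UTF-8 text and collapses whitespace, whereas the real attachment processor extracts content with Tika and also returns fields such as content_type and language.

```python
import base64


def extract_text(encoded: str) -> dict:
    """Stand-in for the attachment processor: plain base64 decode, no Tika."""
    text = base64.b64decode(encoded).decode("utf-8")
    # Collapse whitespace so the result is easy to compare by eye.
    return {"content": " ".join(text.split())}


def apply_foreach(doc: dict, field: str = "attachments") -> dict:
    """Mimic foreach: visit every array element and enrich it in place."""
    for value in doc.get(field, []):   # value plays the role of _ingest._value
        if "data" not in value:        # same spirit as "ignore_missing": true
            continue
        # Corresponds to target_field "_ingest._value.attachment".
        value["attachment"] = extract_text(value["data"])
    return doc


sample = {
    "attachments": [
        {"filename": "test1.txt", "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="},
        {"filename": "test2.txt", "data": "VGhpcyBpcyBhIHRlc3QK"},
        {"filename": "empty.txt"},
    ]
}
result = apply_foreach(sample)
```

Each element that has a `data` field ends up with its own nested `attachment` object, which is why the search later targets `attachments.attachment.content`.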

2. Create the index (the ik_max_word analyzer used below requires the IK analysis plugin to be installed)

PUT /index2
{
    "mappings" : {
        "type2" : {
            "properties" : {
                "id": { 
                    "type": "keyword"
                },
                "subject": { 
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "attachments": {
                    "properties":{
                         "filename" : {"type": "text","analyzer": "english"},
                         "data":{"type": "text","analyzer": "english"}
                    }
                }
            }
        }
    }
}

3. Index the data

PUT index2/type2/1?pipeline=multi_attachment&refresh=true&pretty=1
{
  "id": "1",
  "subject": "Elasticsearch: The Definitive Guide007",
  "attachments" : [
    {
      "filename" : "a.txt",
      "data" : "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
    },
    {
      "filename" : "b.txt",
      "data" : "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="
    }
  ]
}
PUT index2/type2/2?pipeline=multi_attachment&refresh=true&pretty=1
{
  "id": "2",
  "subject": "Using the Attachment Processor with arrays",
  "attachments" : [
    {
      "filename" : "test1.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test2.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}

4. View the results

POST index2/type2/_search?pretty=true
{
  "query": {
    "match": {
      "attachments.attachment.content": "test"
    }
  },
  "highlight": {
    "fields": {
      "attachments.attachment.content": {}
    }
  }
}

Returned result

"hits": [
    {
        "_index": "index2",
        "_type": "type2",
        "_id": "2",
        "_score": 0.27233246,
        "_source": {
            "attachments": [
                {
                    "filename": "test1.txt",
                    "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo=",
                    "attachment": {
                        "content_type": "text/plain; charset=ISO-8859-1",
                        "language": "en",
                        "content": "this is just some text",
                        "content_length": 24
                    }
                },
                {
                    "filename": "test2.txt",
                    "data": "VGhpcyBpcyBhIHRlc3QK",
                    "attachment": {
                        "content_type": "text/plain; charset=ISO-8859-1",
                        "language": "en",
                        "content": "This is a test",
                        "content_length": 16
                    }
                }
            ],
            "id": "2",
            "subject": "Using the Attachment Processor with arrays"
        },
        "highlight": {
            "attachments.attachment.content": [
                "This is a <em>test</em>"
            ]
        }
    }
]