Elasticsearch 7.0 New Features: Rank Feature & Rank Features

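Elasticsearch 7.0 adds support for the Rank Feature and Rank Features field types, which let ES store numeric feature data and use it when scoring documents.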

1. Introduction

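rank_feature and rank_features fields store numbers only and are queried with the rank_feature query. rank_features is an extension of rank_feature that stores multiple feature dimensions in a single field, which makes it a good fit when the number of feature dimensions is large.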

PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}

2. Usage

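rank_feature and rank_features fields can only be used with the rank_feature query. They do not support other queries, sorting, or aggregations, and the feature values they store must be positive numbers.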
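Because these fields are only reachable through the rank_feature query, that query's options are where scoring can be tuned. The request below is a minimal sketch against the test index defined above. It passes an explicit saturation function to the query, and the pivot value of 8 is an illustrative assumption, not a recommendation from the original post.

```
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
```

When no function is given, as in the earlier bool query, the rank_feature query falls back to saturation with a pivot that Elasticsearch derives from the indexed values.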

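If a feature is negatively correlated with the overall score, set the positive_score_impact parameter of that field to false (the default is true). A rank_feature query on the field will then lower the score as the value grows. The url_length field in a website search engine is an example: the longer the URL, the less it contributes to the document's score.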
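To see the effect of positive_score_impact in isolation, one option is to query the negatively correlated field on its own. The request below is a sketch that assumes the test index and the three sample documents from section 1.

```
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "url_length"
    }
  }
}
```

Because url_length is mapped with positive_score_impact set to false, the hits should come back with the shortest URL first: document 3 (37), then document 1 (42), then document 2 (47).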
