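Elasticsearch's rank_feature and rank_features field types make it possible to store numeric feature data and use it to influence relevance scoring.
1. Introduction
rank_feature and rank_features fields store only numeric values and are queried with the rank_feature query. rank_features extends rank_feature to hold multiple feature dimensions in a single field, which suits documents with many feature dimensions. The example below maps both types, indexes three documents, and then uses their features to adjust the score of a full-text match.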
PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}
PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}
PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
POST test/_refresh
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
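2. Usage
rank_feature and rank_features fields can only be used with the rank_feature query; they do not support other queries, sorting, or aggregations. The feature values they store must be strictly positive.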
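Because these fields cannot be sorted on, one way to order results by a feature is to make the rank_feature query the entire query, so that the score itself reflects the feature. The following is a minimal sketch against the test index from the introduction. It assumes the query's default saturation function, whose pivot Elasticsearch derives from index statistics when none is given.

```console
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "topics.sports"
    }
  }
}
```

Only documents that carry the sports feature should match, and a larger stored value should produce a higher score, so hits come back in descending feature order. Stored feature values have limited precision, so nearly equal values may tie.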
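If a feature correlates negatively with relevance, set positive_score_impact to false on that field (the default is true). The rank_feature query then lowers that field's score contribution as the stored value grows. URL length in a web search engine is a typical case: the longer the URL, the less it adds to the document's score.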
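To make the effect of positive_score_impact concrete, the sketch below scores on url_length alone with an explicit saturation function. The pivot of 45 is an assumption, chosen only because it sits near the sample URL lengths (37, 42, 47); it does not come from the original example.

```console
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "url_length",
      "saturation": {
        "pivot": 45
      }
    }
  }
}
```

Since url_length is mapped with positive_score_impact set to false, the shortest URL (document 3, length 37) should score highest and the longest (document 2, length 47) lowest. On a positive-impact field such as pagerank, the same function scores S / (S + pivot) for a stored value S, which rises toward 1 as S grows.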