50. Getting Started with Search Engines: How to Index a Field Twice to Fix String Sorting

Author: 拉提娜的爸爸 | Published: 2020-01-09 10:06

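Sorting on a string field often gives wrong results: the field is analyzed into several terms, so the sort runs on individual terms rather than on the whole value.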

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.
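
To make the idea concrete, here is a minimal Python sketch of what the two indexed forms of one value look like. The `analyze` function is only a rough stand-in for the standard analyzer (lowercase, split on non-word characters), and `index_twice` is a hypothetical helper for illustration, not an Elasticsearch API.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-word characters."""
    return [term for term in re.split(r"\W+", text.lower()) if term]

def index_twice(value):
    """Return the terms kept for the analyzed form and for the raw (not analyzed) form."""
    return {
        "analyzed": analyze(value),  # several terms, good for full-text search
        "raw": [value],              # one term holding the whole value, good for sorting
    }

entry = index_twice("first article")
print(entry)
```

The analyzed form holds one term per word, while the raw form holds the whole string as a single term, which is what a sort needs.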

1. How to index a string field twice

Example: index title twice

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title":{
          "type": "text", 
          "fields": {
            "raw":{
              "type": "keyword"
            }
          },
          "fielddata": true
        },
        "content":{
          "type": "text"
        },
        "post_date":{
          "type": "date"
        },
        "author_id":{
          "type": "long"
        }
      }
    }
  }
}
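
If several fields need the same treatment, the mapping body can be generated instead of written by hand. The sketch below is one possible way to do that in Python; `text_with_raw` and `build_mapping` are hypothetical helper names, and it assumes an Elasticsearch version where the not-analyzed string type is called `keyword`. It leaves out `fielddata`, which is only needed in order to sort on the analyzed field itself.

```python
import json

def text_with_raw(analyzed_type="text", raw_type="keyword"):
    """Build the mapping for one field that is indexed twice.

    The top-level type is analyzed and used for search; the "raw"
    sub-field keeps the whole value as a single term for sorting.
    """
    return {
        "type": analyzed_type,
        "fields": {"raw": {"type": raw_type}},
    }

def build_mapping(sortable_text_fields, other_fields):
    """Combine twice-indexed text fields with ordinary typed fields."""
    properties = {name: text_with_raw() for name in sortable_text_fields}
    properties.update({name: {"type": t} for name, t in other_fields.items()})
    return {"mappings": {"article": {"properties": properties}}}

mapping = build_mapping(
    sortable_text_fields=["title"],
    other_fields={"content": "text", "post_date": "date", "author_id": "long"},
)
# The printed JSON can be used as the body of the PUT /website request.
print(json.dumps(mapping, indent=2))
```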

2. Testing the index

(1) Add the documents

PUT /website/article/1
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2017-01-01",
  "author_id": 110
}
........the other two documents are omitted
-------------------------------Result (search hits once all three documents are indexed)-------------------------------
{
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-01-01",
          "author_id": 110
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        }
      }

(2) Query the documents and sort by title in ascending order
First, sort on the analyzed field in the ordinary way

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title": {
        "order": "asc"
      }
    }
  ]
}
-----------------------------------------Result-----------------------------------------
{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "article"
        ]
      }
    ]
  }
}

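The sort values show that, because title is analyzed, every document is sorted by the single term "article" instead of by its full title. An ascending sort on a field with several terms uses the smallest term, and "article" sorts before "first", "second", and "third". All three documents therefore share the same sort key, so their relative order is arbitrary and may differ between requests.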
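
The effect can be reproduced outside Elasticsearch. The following Python sketch approximates the analyzer with a simple lowercase-and-split function (an assumption, not the real analyzer) and picks the smallest term as the ascending sort key, mirroring the default `min` sort mode for ascending order.

```python
import re

titles = {"1": "first article", "2": "second article", "3": "third article"}

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-word characters."""
    return [term for term in re.split(r"\W+", text.lower()) if term]

def asc_sort_key(text):
    """Ascending sort on a multi-term field uses the smallest term."""
    return min(analyze(text))

keys = {doc_id: asc_sort_key(title) for doc_id, title in titles.items()}
print(keys)
```

Every document ends up with the same key, so nothing in the sort distinguishes one title from another.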
Now sort using the second, not-analyzed index of the field (title.raw)

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": {
        "order": "asc"
      }
    }
  ]
}
---------------------------------Result---------------------------------
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": null,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-01-01",
          "author_id": 110
        },
        "sort": [
          "first article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": null,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-02-01",
          "author_id": 110
        },
        "sort": [
          "second article"
        ]
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        },
        "sort": [
          "third article"
        ]
      }
    ]
  }
}

The sort values in the response show that the documents are now sorted by the entire title value.
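
As a cross-check, sorting the same three documents by the whole title string in plain Python gives the same order as the response above. This is only an illustrative sketch using the sample data from this article; it does not call Elasticsearch.

```python
docs = [
    {"_id": "2", "title": "second article"},
    {"_id": "1", "title": "first article"},
    {"_id": "3", "title": "third article"},
]

# Sort by the complete, unanalyzed title, as the title.raw sub-field does.
ordered = sorted(docs, key=lambda doc: doc["title"])
ordered_ids = [doc["_id"] for doc in ordered]
print(ordered_ids)
```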


Article link: https://www.haomeiwen.com/subject/vzaeactx.html