ElasticSearch 5.x Research Diary 3: Searching Data

Author: 不迷失 | Published 2017-01-22 22:14 | Read 193 times

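Earlier we saw how to index data into es. The ultimate purpose of all that is to search the data.

To demonstrate searching properly, we will prepare some realistic sample data.

Import some data into the account type of the bank index. Note that the body of a bulk request must end with a newline.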

POST http://localhost:9200/bank/account/_bulk

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
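The bulk body above alternates one action line with one document line. As a minimal sketch of how such a body could be generated, the Python below builds it from a list of dicts. The helper name build_bulk_body and the trimmed two-field documents are illustrative assumptions, not part of the original data set.

```python
import json

def build_bulk_body(docs, id_field="account_number"):
    """Return an NDJSON body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        # Action line: index the document under its account number as _id.
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects the final line to end with a newline too.
    return "\n".join(lines) + "\n"

# Trimmed, hypothetical versions of the first two sample accounts.
docs = [
    {"account_number": 1, "balance": 39225, "state": "IL"},
    {"account_number": 6, "balance": 5686, "state": "TN"},
]
body = build_bulk_body(docs)
print(body, end="")
```

The resulting string can be sent as the body of the POST to /bank/account/_bulk.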

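There are two basic ways to run a search: one sends the search parameters in the request URI, the other sends them in the request body. The request-body method is more expressive and lets you define the search in a more readable JSON format. We will try one example of the request-URI method first.

The REST API for search is the _search endpoint. This example matches all documents in the bank index and returns them sorted by account_number in ascending order: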

GET http://localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty

Response:

{
  "took" : 189,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "6",
        "_score" : null,
        "_source" : {
          "account_number" : 6,
          "balance" : 5686,
          "firstname" : "Hattie",
          "lastname" : "Bond",
          "age" : 36,
          "gender" : "M",
          "address" : "671 Bristol Street",
          "employer" : "Netagy",
          "email" : "hattiebond@netagy.com",
          "city" : "Dante",
          "state" : "TN"
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "13",
        "_score" : null,
        "_source" : {
          "account_number" : 13,
          "balance" : 32838,
          "firstname" : "Nanette",
          "lastname" : "Bates",
          "age" : 28,
          "gender" : "F",
          "address" : "789 Madison Street",
          "employer" : "Quility",
          "email" : "nanettebates@quility.com",
          "city" : "Nogal",
          "state" : "VA"
        },
        "sort" : [
          13
        ]
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "18",
        "_score" : null,
        "_source" : {
          "account_number" : 18,
          "balance" : 4180,
          "firstname" : "Dale",
          "lastname" : "Adams",
          "age" : 33,
          "gender" : "M",
          "address" : "467 Hutchinson Court",
          "employer" : "Boink",
          "email" : "daleadams@boink.com",
          "city" : "Orick",
          "state" : "MD"
        },
        "sort" : [
          18
        ]
      }
    ]
  }
}

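The response contains the following parts:

took - the time in milliseconds that Elasticsearch took to execute the search

timed_out - whether the search timed out

_shards - how many shards were searched, along with counts of the shards that succeeded and failed

hits - the search results

hits.total - the total number of documents matching our search criteria

hits.hits - the actual array of search results (the first 10 documents by default)

sort - the sort key of each result (absent when sorting by score)

_score and max_score - ignore these for now; they are covered later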
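To make the field list concrete, here is a small Python sketch that pulls those fields out of a response that has already been parsed into a dict. The summarize helper is a made-up name, and the response literal is a trimmed copy of the one shown above (only two of the hits are kept, without their _source).

```python
def summarize(response):
    """Pull the commonly used fields out of a _search response dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": hits["total"],  # a plain integer in the response shown above
        "ids": [hit["_id"] for hit in hits["hits"]],
    }

# Trimmed copy of the response above.
response = {
    "took": 189,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 4,
        "max_score": None,
        "hits": [
            {"_id": "1", "_score": None, "sort": [1]},
            {"_id": "6", "_score": None, "sort": [6]},
        ],
    },
}
summary = summarize(response)
print(summary)
```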

Here is the same search expressed with the request-body method:

POST http://localhost:9200/bank/_search

{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
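For readers who would rather issue this request from a script than from a REST client, the following is a rough sketch using only the Python standard library. The function name build_search_request is an assumption for illustration, and the host is the localhost:9200 address used throughout this article. The request is only built here, not sent, so the snippet runs without a live node.

```python
import json
import urllib.request

def build_search_request(body, index="bank", host="http://localhost:9200"):
    """Build (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

body = {"query": {"match_all": {}}, "sort": [{"account_number": "asc"}]}
request = build_search_request(body)
print(request.get_method(), request.full_url)

# To actually run it against a live node:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```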

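The query language

Elasticsearch provides a JSON-style domain-specific language for executing queries, known as the Query DSL. The query language is quite comprehensive and can look intimidating at first, but the best way to learn it is to start with a few basic examples.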

GET /bank/_search
{
  "query": { "match_all": {} },
  "size": 1
}

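The query part holds the query definition, match_all matches every document in the specified index, and size sets the number of hits to return. Note that if size is not specified, it defaults to 10.

The from parameter specifies the 0-based offset of the first document to return. Combined with size, it is useful for paging through search results. Note that if from is not specified, it defaults to 0.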
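The relationship between a page number and from/size is simple arithmetic. A minimal sketch follows; the page_body helper and its 1-based page convention are assumptions chosen for this example.

```python
def page_body(page, page_size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # offset of the first hit to return
        "size": page_size,
    }

third_page = page_body(3, page_size=10)
print(third_page["from"], third_page["size"])
```

Page 1 starts at offset 0, which is the same as leaving from unspecified.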

This example runs a match_all query, sorts the results by account balance in descending order, and returns the top 10 documents (the default size).

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}

By default, all fields of each document are returned. Use the _source parameter to request only the fields you need.

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
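The effect of the _source list on each hit can be pictured as a simple field projection. The sketch below imitates it locally for top-level fields only; filter_source is a hypothetical helper, and the account dict is a trimmed copy of the first sample document.

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields of a document."""
    return {name: source[name] for name in fields if name in source}

# Trimmed copy of the first sample account.
account = {"account_number": 1, "balance": 39225, "firstname": "Amber", "state": "IL"}
projected = filter_source(account, ["account_number", "balance"])
print(projected)
```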

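Now let's move on to the query part. So far we have seen how the match_all query matches all documents. Let's now introduce the match query, which runs a relevance-based full-text search on a field, something that is hard to do in a traditional database.

This example returns the account numbered 20: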

GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}

This example returns documents whose address field contains the term "mill":

GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}

This example returns documents whose address field contains the term "mill" or the term "lane":

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

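This example returns documents whose address field contains the phrase "mill lane". Note that it uses the match_phrase query, which requires the terms to appear together and in order: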

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
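The difference between match and match_phrase is easier to see on actual values. The following is a rough local simulation over the five sample addresses, not what Elasticsearch does internally: it assumes a simplified analyzer that only lowercases and splits on whitespace, and it ignores scoring entirely. The function names are made up for this sketch.

```python
def tokens(text):
    """Lowercase and split on whitespace (a rough stand-in for the analyzer)."""
    return text.lower().split()

def match(field_value, query):
    """match: true if the field shares at least one term with the query (OR)."""
    return bool(set(tokens(field_value)) & set(tokens(query)))

def match_phrase(field_value, query):
    """match_phrase: true if the query terms appear consecutively and in order."""
    field, wanted = tokens(field_value), tokens(query)
    return any(
        field[i:i + len(wanted)] == wanted
        for i in range(len(field) - len(wanted) + 1)
    )

# Addresses of the five sample accounts, keyed by _id.
addresses = {
    "1": "880 Holmes Lane",
    "6": "671 Bristol Street",
    "13": "789 Madison Street",
    "18": "467 Hutchinson Court",
    "20": "282 Kings Place",
}
any_term = [i for i, a in addresses.items() if match(a, "lane street")]
phrase = [i for i, a in addresses.items() if match_phrase(a, "holmes lane")]
print(any_term, phrase)
```

Here "lane street" matches three addresses because either term is enough, while "holmes lane" as a phrase matches only the one address that contains both words next to each other.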

Keyword highlighting

GET /bank/_search
{
  "query": { "match": { "address": "lane Street" } },
  "_source": ["address"],
  "highlight": {
        "fields" : {
            "address" : {}
        }
    }
}

In the results, the matched keywords are wrapped in <em> tags:

{
    "took": 10, 
    "timed_out": false, 
    "_shards": {
        "total": 5, 
        "successful": 5, 
        "failed": 0
    }, 
    "hits": {
        "total": 3, 
        "max_score": 0.86312973, 
        "hits": [
            {
                "_index": "bank", 
                "_type": "account", 
                "_id": "1", 
                "_score": 0.86312973, 
                "_source": {
                    "address": "880 Holmes Lane"
                }, 
                "highlight": {
                    "address": [
                        "880 Holmes <em>Lane</em>"
                    ]
                }
            }, 
            {
                "_index": "bank", 
                "_type": "account", 
                "_id": "13", 
                "_score": 0.86312973, 
                "_source": {
                    "address": "789 Madison Street"
                }, 
                "highlight": {
                    "address": [
                        "789 Madison <em>Street</em>"
                    ]
                }
            }, 
            {
                "_index": "bank", 
                "_type": "account", 
                "_id": "6", 
                "_score": 0.25316024, 
                "_source": {
                    "address": "671 Bristol Street"
                }, 
                "highlight": {
                    "address": [
                        "671 Bristol <em>Street</em>"
                    ]
                }
            }
        ]
    }
}
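As a loose illustration of what the highlight section of the response represents, the sketch below wraps query terms in <em> tags locally. It is a simplification under stated assumptions: terms are compared case-insensitively after splitting on whitespace, whereas the real highlighter works on analyzed tokens and can return fragments of longer fields. The highlight function is a hypothetical helper.

```python
def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of text that also occurs in query (case-insensitive)."""
    wanted = set(query.lower().split())
    return " ".join(
        pre + word + post if word.lower() in wanted else word
        for word in text.split()
    )

first = highlight("880 Holmes Lane", "lane Street")
second = highlight("671 Bristol Street", "lane Street")
print(first)
print(second)
```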

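The bool query

The bool query composes smaller queries into a larger one.

must: the document must satisfy every clause listed here; two must clauses behave like AND

must_not: the document must not satisfy any clause listed here; this behaves like NOT

should: two should clauses behave like OR; satisfying either one is enough

The following example returns documents whose address field contains both mill and lane.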

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

The following example returns documents whose address field contains mill or lane.

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

The following example returns documents whose address field contains neither mill nor lane.

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

The following example returns documents whose age field equals 40 and whose state field does not contain ID.

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
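When bool queries are assembled in application code, a small builder keeps the nesting manageable. This is one possible sketch; bool_query and match are hypothetical helper names, and they only produce the same JSON structure as the request bodies shown above.

```python
def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}

def bool_query(must=(), should=(), must_not=(), filter_=()):
    """Assemble a bool query body, leaving out clause lists that are empty."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {
        "query": {
            "bool": {name: list(items) for name, items in clauses.items() if items}
        }
    }

# Same query as the last example: age is 40 and state is not ID.
body = bool_query(must=[match("age", "40")], must_not=[match("state", "ID")])
print(body)
```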

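Filters

In the previous section we skipped a small detail: the document score (the _score field in the search results). The score is a numeric value, a relative measure of how well the document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.

When running a query, es computes a score for each document, but queries do not always need to produce scores, in particular when they are only used to "filter" the document set. Elasticsearch detects these situations and automatically optimizes query execution so that useless scores are not computed.

The bool query also supports a filter clause. As an example, let's introduce the range query, which filters documents by a range of values. It is generally used for numeric or date filtering.

This example uses a bool query to return all accounts with a balance between 20000 and 30000, inclusive.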

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

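The bool query above contains a match_all query (the query part) and a range query (the filter part). Any other query can be substituted into the query part and the filter part.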
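The inclusive meaning of gte and lte can be checked locally against the sample balances. The sketch below is only a simulation of the range condition; in_range is a made-up helper, and the balances are those of the five sample accounts.

```python
def in_range(value, gte=None, lte=None):
    """Inclusive bounds check, mirroring the gte/lte keys of a range query."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True

# Balances of the five sample accounts, keyed by _id.
balances = {"1": 39225, "6": 5686, "13": 32838, "18": 4180, "20": 16418}
between_20k_30k = [i for i, b in balances.items() if in_range(b, gte=20000, lte=30000)]
between_5k_20k = [i for i, b in balances.items() if in_range(b, gte=5000, lte=20000)]
print(between_20k_30k, between_5k_20k)
```

None of the five sample accounts falls between 20000 and 30000, so with only this sample data loaded the query above finds nothing; widening the bounds to 5000 and 20000 would select accounts 6 and 20.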

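Aggregations

Aggregations group your data and extract statistics from it. The easiest way to think about them is as roughly equivalent to SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a single search can return the matching hits and, separately from those hits, the aggregation results, all in one response.

First, this example groups all accounts by state and returns the top 10 states (the default), sorted by count in descending order (also the default):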

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

Response:

{
    "took": 33, 
    "timed_out": false, 
    "_shards": {
        "total": 5, 
        "successful": 5, 
        "failed": 0
    }, 
    "hits": {
        "total": 4, 
        "max_score": 0, 
        "hits": [ ]
    }, 
    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": 0, 
            "sum_other_doc_count": 0, 
            "buckets": [
                {
                    "key": "IL", 
                    "doc_count": 1
                }, 
                {
                    "key": "MD", 
                    "doc_count": 1
                }, 
                {
                    "key": "TN", 
                    "doc_count": 1
                }, 
                {
                    "key": "VA", 
                    "doc_count": 1
                }
            ]
        }
    }
}

This is equivalent to the following SQL:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
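The same grouping can be reproduced in a few lines of Python, which may help if GROUP BY is unfamiliar. This is a local sketch, not how Elasticsearch computes it: terms_buckets is a hypothetical helper, and breaking ties by key in ascending order is an assumption chosen to reproduce the bucket order shown in the response above.

```python
from collections import Counter

def terms_buckets(values, size=10):
    """Count values and return the top `size` buckets, highest count first."""
    counts = Counter(values)
    # Sort by count descending, then by key ascending to break ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# States of the four hits in the search response shown earlier.
states = ["IL", "TN", "VA", "MD"]
buckets = terms_buckets(states)
print(buckets)
```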

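Elasticsearch is both a simple and a complex product. So far we have learned the basics of what it is, how to look inside it, and how to work with it through some of its REST APIs. I hope this tutorial has given you a better understanding of what Elasticsearch is and, more importantly, inspired you to experiment further with the rest of its features!

Next, we will explore the world of elasticsearch in more depth and detail!

A special reminder
If you want to learn and master es properly, you must practice hands-on. Reading without practicing greatly reduces how much you learn.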
