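The previous sections covered how to index data into Elasticsearch; the ultimate purpose of all that is to search the data.
To demonstrate search properly, we first prepare some realistic sample data.
Import the following documents into the account type of the bank index. Note that a bulk request body must end with a newline character.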
POST http://localhost:9200/bank/account/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
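The bulk body above alternates an action line and a source line, one JSON object per line. As a minimal sketch of how such a body could be generated, assuming a hypothetical helper named build_bulk_body and a shortened version of the sample accounts (neither is part of Elasticsearch itself):

```python
import json


def build_bulk_body(docs, id_field="account_number"):
    """Serialize documents into newline-delimited JSON for the _bulk endpoint.

    build_bulk_body and id_field are illustrative names, not an Elasticsearch API.
    """
    lines = []
    for doc in docs:
        # Action line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened sample documents (only a few of the fields shown above).
accounts = [
    {"account_number": 1, "balance": 39225, "state": "IL"},
    {"account_number": 6, "balance": 5686, "state": "TN"},
]

body = build_bulk_body(accounts)
print(body, end="")
```

The resulting string can be sent as the body of the POST request to the _bulk endpoint shown above.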
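There are two basic ways to run a search: sending the search parameters in the request URI, or sending them in the request body. The request body method is more expressive and lets you define the search in a more readable JSON format. We will first try one example of the request URI method.
The REST API for search is the _search endpoint. This example returns all documents in the bank index, sorted by account_number in ascending order: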
GET http://localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty
Response:
{
"took" : 189,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : null,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
},
"sort" : [
1
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : null,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
},
"sort" : [
6
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"_score" : null,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
},
"sort" : [
13
]
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "18",
"_score" : null,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "daleadams@boink.com",
"city" : "Orick",
"state" : "MD"
},
"sort" : [
18
]
}
]
}
}
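The response contains the following parts:
took - time in milliseconds that Elasticsearch took to execute the search
timed_out - whether the search timed out
_shards - how many shards were searched, along with the counts of shards on which the search succeeded and failed
hits - the search results
hits.total - total number of documents matching the search criteria
hits.hits - the actual array of search results (by default, the first 10 documents)
sort - the sort key of each result (absent when sorting by score)
_score and max_score - ignore these for now; they are covered later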
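To make the fields listed above concrete, here is a small sketch that pulls them out of a parsed response. It assumes the response shape shown in this article (in particular, hits.total as a plain integer); summarize_response and the trimmed sample_response are illustrative names and data, not part of any client library.

```python
def summarize_response(response):
    """Collect the main parts of a search response into a flat dictionary."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total": hits["total"],
        # One entry per returned document.
        "ids": [hit["_id"] for hit in hits["hits"]],
        # "sort" is missing on a hit when results are ordered by score.
        "sort_keys": [hit.get("sort") for hit in hits["hits"]],
    }


# A trimmed copy of the response shown above (only two hits, no _source).
sample_response = {
    "took": 189,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 4,
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "1", "_score": None, "sort": [1]},
            {"_index": "bank", "_id": "6", "_score": None, "sort": [6]},
        ],
    },
}

summary = summarize_response(sample_response)
print(summary)
```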
Here is the same search as above, sent as a request body:
POST http://localhost:9200/bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
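The same request can also be prepared from a program. The following is a minimal sketch using only the Python standard library; build_search_request is a hypothetical helper, and the base URL simply reuses the localhost address from the examples above. The request is only constructed here; the commented lines show how it could be sent to a running node.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build (but do not send) a POST request to the _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


search_body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
}
request = build_search_request("http://localhost:9200", "bank", search_body)
print(request.get_method(), request.full_url)

# To send it against a running node (assumed to listen on localhost:9200):
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```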
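Query language
Elasticsearch provides a JSON-style domain-specific language for executing queries, known as the Query DSL. The query language is quite comprehensive and can look intimidating at first, but the best way to learn it is to start with a few basic examples.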
GET /bank/_search
{
"query": { "match_all": {} },
"size": 1
}
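The query part holds the query definition; match_all matches every document in the specified index; size is the number of hits to return. If size is not specified, it defaults to 10.
The from parameter specifies the 0-based offset of the first hit to return; combined with size, it is useful for paginating search results. If from is not specified, it defaults to 0.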
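As a sketch of how from and size combine for pagination, the helper below turns a 1-based page number into a request body. The name page_query and the 1-based page convention are assumptions made for this illustration.

```python
def page_query(page, page_size=10):
    """Return a match_all search body for the given 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        # Skip the hits that belong to earlier pages.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


first_page = page_query(1)
third_page = page_query(3, page_size=20)
print(first_page)
print(third_page)
```

Page 1 starts at offset 0, which matches the default value of from; page 3 with 20 hits per page starts at offset 40.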
This example runs a match_all query, sorts the results by account balance in descending order, and returns the top 10 documents (the default size).
GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}
By default, all fields of each document are returned; the _source parameter restricts the response to the fields you need.
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
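Now let us move on to the query part. So far we have seen how the match_all query matches all documents. Let us now introduce the match query, a full-text search based on document relevance, which is hard to implement in a traditional database.
This example returns the account numbered 20: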
GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
This example returns the documents whose address field contains the term mill:
GET /bank/_search
{
"query": { "match": { "address": "mill" } }
}
This example returns the documents whose address field contains mill or lane:
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
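This example returns the documents whose address field contains the phrase mill lane. Note that it uses the match_phrase query, which only matches documents containing "mill lane" as a phrase: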
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
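The difference between match and match_phrase can be illustrated with a deliberately simplified local model. This is not how Elasticsearch analyzes or scores text; it only assumes lowercase, whitespace-split tokens, and the function names and the "198 Mill Lane" and "990 Mill Road" addresses are made up for the illustration.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any(field_value, query):
    """match-like behaviour: at least one query term occurs in the field."""
    field_tokens = set(tokenize(field_value))
    return any(term in field_tokens for term in tokenize(query))


def matches_phrase(field_value, query):
    """match_phrase-like behaviour: the query terms occur consecutively, in order."""
    field_tokens = tokenize(field_value)
    query_tokens = tokenize(query)
    n = len(query_tokens)
    return n > 0 and any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


addresses = ["880 Holmes Lane", "198 Mill Lane", "990 Mill Road", "671 Bristol Street"]
for address in addresses:
    print(address, matches_any(address, "mill lane"), matches_phrase(address, "mill lane"))
```

In this model, "880 Holmes Lane" satisfies the match-style check because it contains lane, but only "198 Mill Lane" satisfies the phrase check.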
Keyword highlighting
GET /bank/_search
{
"query": { "match": { "address": "lane Street" } },
"_source": ["address"],
"highlight": {
"fields" : {
"address" : {}
}
}
}
In the results, the matched keywords are wrapped in <em> tags:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.86312973,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "1",
"_score": 0.86312973,
"_source": {
"address": "880 Holmes Lane"
},
"highlight": {
"address": [
"880 Holmes <em>Lane</em>"
]
}
},
{
"_index": "bank",
"_type": "account",
"_id": "13",
"_score": 0.86312973,
"_source": {
"address": "789 Madison Street"
},
"highlight": {
"address": [
"789 Madison <em>Street</em>"
]
}
},
{
"_index": "bank",
"_type": "account",
"_id": "6",
"_score": 0.25316024,
"_source": {
"address": "671 Bristol Street"
},
"highlight": {
"address": [
"671 Bristol <em>Street</em>"
]
}
}
]
}
}
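A client would typically read the highlighted fragments from the highlight section of each hit. The sketch below assumes the response shape shown above; highlighted_fragments is an illustrative helper name, and the sample response is a trimmed copy of the one in this article.

```python
def highlighted_fragments(response, field):
    """Map each hit's _id to its list of highlighted fragments for one field."""
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in response["hits"]["hits"]
    }


# Trimmed copy of the highlight response shown above.
highlight_response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "1", "highlight": {"address": ["880 Holmes <em>Lane</em>"]}},
            {"_id": "13", "highlight": {"address": ["789 Madison <em>Street</em>"]}},
            {"_id": "6", "highlight": {"address": ["671 Bristol <em>Street</em>"]}},
        ],
    }
}

fragments = highlighted_fragments(highlight_response, "address")
print(fragments)
```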
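bool query
The bool query combines smaller queries into a larger one.
must: the document must satisfy every clause listed here; two must clauses are equivalent to AND
must_not: the document must not satisfy any clause listed here; equivalent to NOT
should: two should clauses express an OR relationship; satisfying either one is enough
The following example returns the documents whose address field contains both mill and lane.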
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
The following example returns the documents whose address field contains mill or lane.
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
The following example returns the documents whose address field contains neither mill nor lane.
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
The following example returns the documents whose age field equals 40 and whose state field does not contain ID.
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
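When bool queries are assembled in application code, a small builder keeps the nesting manageable. The following sketch uses two hypothetical helpers, match and bool_query, which only produce the JSON structure shown above; they are not part of any Elasticsearch client.

```python
import json


def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}


def bool_query(must=None, should=None, must_not=None):
    """Build a search body with a bool query from lists of clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # all of these must match (AND)
    if should:
        clauses["should"] = list(should)      # any of these may match (OR)
    if must_not:
        clauses["must_not"] = list(must_not)  # none of these may match (NOT)
    return {"query": {"bool": clauses}}


# Reproduces the last example above: age equals 40 and state is not ID.
age_not_idaho = bool_query(
    must=[match("age", "40")],
    must_not=[match("state", "ID")],
)
print(json.dumps(age_not_idaho, indent=2))
```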
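Filtering
In the previous section we skipped a small detail: the document score (the _score field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.
Elasticsearch computes document scores when running a query, but queries do not always need scores, in particular when they are only used to "filter" the document set. Elasticsearch detects these situations and automatically optimizes query execution so that useless scores are not computed.
The bool query also supports a filter clause. As an example, let us introduce the range query, which filters documents by a range of values. It is generally used for numeric or date filtering.
This example uses a bool query to return all accounts with a balance between 20000 and 30000, inclusive.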
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
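The bool query above contains a match_all query (the query part) and a range query (the filter part). Any other query can be substituted into the query and filter parts.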
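A filter is a yes/no test: a document is either inside the range or not, and no score is involved. The sketch below applies the same inclusive gte/lte logic locally to the balances of the five sample accounts imported at the start of this article. It illustrates the semantics only, not how Elasticsearch executes the filter; in_range is a made-up helper name.

```python
def in_range(doc, field, gte, lte):
    """Inclusive range test, mirroring the gte/lte bounds of a range query."""
    return gte <= doc[field] <= lte


# account_number and balance of the five sample documents imported earlier.
accounts = [
    {"account_number": 1, "balance": 39225},
    {"account_number": 6, "balance": 5686},
    {"account_number": 13, "balance": 32838},
    {"account_number": 18, "balance": 4180},
    {"account_number": 20, "balance": 16418},
]

wide = [a["account_number"] for a in accounts if in_range(a, "balance", 5000, 35000)]
narrow = [a["account_number"] for a in accounts if in_range(a, "balance", 20000, 30000)]
print(wide)
print(narrow)
```

With only these five documents, no balance falls between 20000 and 30000, so the narrower range selects nothing, while the wider range selects accounts 6, 13 and 20.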
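Aggregations
Aggregations provide the ability to group data and extract statistics from it. The easiest way to think about aggregations is as a rough equivalent of SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a single search can return the matching hits and, in the same response, the aggregation results kept separate from those hits.
To start, this example groups all accounts by state and returns the top 10 states (the default) sorted by count in descending order (also the default):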
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
Response:
{
"took": 33,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": [ ]
},
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "IL",
"doc_count": 1
},
{
"key": "MD",
"doc_count": 1
},
{
"key": "TN",
"doc_count": 1
},
{
"key": "VA",
"doc_count": 1
}
]
}
}
}
This is roughly equivalent to the following SQL:
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10
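For readers more at home in code than in SQL, the grouping can also be sketched locally with the Python standard library. This only mimics the shape of the buckets in the response; terms_aggregation is an illustrative name, and the list of states is made-up data chosen so that the counts differ.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Group documents by a field and return the top buckets by document count."""
    counts = Counter(doc[field] for doc in docs)
    # most_common orders by count, highest first, and keeps at most `size` entries.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


# Made-up documents, not the sample accounts from this article.
docs = [{"state": s} for s in ["IL", "TN", "VA", "TN", "IL", "TN"]]

buckets = terms_aggregation(docs, "state")
print(buckets)
```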
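Elasticsearch is both a simple and a complex product. So far we have learned the basics of what it is, how to look inside it, and how to work with it through some of its REST APIs. I hope this tutorial has given you a better understanding of what Elasticsearch is and, more importantly, inspired you to experiment further with the rest of its features!
Next, we will go deeper into the world of Elasticsearch in more detail.
A special reminder
To really learn and master Elasticsearch, you must practice hands-on. Reading without practicing will greatly reduce what you learn.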










