Avg Aggregation (average)
- Using a field
POST /exams/_search?size=0
{
  "aggs" : {
    "avg_grade" : { "avg" : { "field" : "grade" } }
  }
}
- Using a script
POST /exams/_search?size=0
{
  "aggs" : {
    "avg_grade" : {
      "avg" : {
        "script" : {
          "source" : "doc.grade.value"
        }
      }
    }
  }
}
Note: use the missing parameter to define the value used for documents that have no value for the field.
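The following is a minimal pure-Python model of these semantics, not Elasticsearch code. It assumes three things. Every value of a multi-valued field is averaged. Documents without the field are skipped unless missing is given. The result is None (null) when there are no values. The document shapes are hypothetical.

```python
from typing import Iterable, Optional


def avg_agg(docs: Iterable[dict], field: str, missing: Optional[float] = None) -> Optional[float]:
    """Average over all values of `field`; multi-valued fields contribute every value."""
    total, count = 0.0, 0
    for doc in docs:
        raw = doc.get(field)
        if raw is None:
            if missing is None:
                continue          # without `missing`, the document is ignored
            raw = missing         # with `missing`, it is treated as having that value
        values = raw if isinstance(raw, list) else [raw]
        total += sum(values)
        count += len(values)
    return total / count if count else None  # no values at all -> null


docs = [{"grade": 80}, {"grade": [90, 100]}, {}]
print(avg_agg(docs, "grade"))             # (80 + 90 + 100) / 3
print(avg_agg(docs, "grade", missing=0))  # (80 + 90 + 100 + 0) / 4
```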
Weighted Avg Aggregation (weighted average)
As a formula, a weighted average is ∑(value * weight) / ∑(weight).
- Field
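If the value field is multi-valued, each value is treated as a separate value.
However, the weight field must not be multi-valued (an array); otherwise the error "Encountered more than one weight for a single document" is thrown.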
POST /exams/_doc?refresh
{
  "grade": [1, 2, 3],
  "weight": 2
}
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "field": "grade"
        },
        "weight": {
          "field": "weight"
        }
      }
    }
  }
}
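To see what the formula does with the document indexed above, here is a small Python sketch of the arithmetic. It assumes every document has both fields. It pairs each value of a multi-valued value field with the document's single weight. It raises the same message as above when the weight is an array. For grade [1, 2, 3] and weight 2 this gives (1*2 + 2*2 + 3*2) / (2 + 2 + 2) = 2.0.

```python
def weighted_avg(docs, value_field, weight_field):
    """sum(value * weight) / sum(weight); assumes both fields are present in every doc."""
    numerator = denominator = 0.0
    for doc in docs:
        weight = doc[weight_field]
        if isinstance(weight, list):
            # a multi-valued weight is rejected, as described in the text
            raise ValueError("Encountered more than one weight for a single document")
        values = doc[value_field]
        values = values if isinstance(values, list) else [values]
        for value in values:          # each value is paired with the document's weight
            numerator += value * weight
            denominator += weight
    return numerator / denominator if denominator else None


print(weighted_avg([{"grade": [1, 2, 3], "weight": 2}], "grade", "weight"))
```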
- Script
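TODO: in a script the weight may be an array, but only its first value is used; the behavior seems to differ again when both the weight and the value are arrays.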
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "script": "doc.grade.value + 1"
        },
        "weight": {
          "script": "doc.weight.value + 1"
        }
      }
    }
  }
}
Cardinality Aggregation
An approximate count of distinct values.
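- It uses the HyperLogLog++ algorithm; memory usage is about (precision_threshold * 8) bytes.
- Counts below precision_threshold are expected to be close to accurate; above it they may become fuzzier. In general, the larger precision_threshold, the more accurate the count. The maximum is 40000 (the default is 3000).
- When counting a string field with very high cardinality, you can compute the hash values yourself or precompute them at index time with the mapper-murmur3 plugin; this is much faster and saves CPU.
- Missing values can be handled with missing, but defining it together with a script had no effect and raised the error No field found for [sd] in mapping with types [].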
- Field
POST /exams/_search?size=0
{
  "aggs":{
    "type_count": {
      "cardinality":{
        "field": "weight",
        "precision_threshold": 100
      }
    }
  }
}
- Script (doc.type.value is equivalent to doc['type'].value)
POST /sales/_search?size=0
{
  "aggs" : {
    "type_promoted_count" : {
      "cardinality" : {
        "script": {
          "lang": "painless",
          "source": "doc['type'].value + ' ' + doc['promoted'].value"
        }
      }
    }
  }
}
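To illustrate the idea behind the HyperLogLog++ bullet above, here is a sketch of plain HyperLogLog. It is not the HLL++ variant Elasticsearch uses, which adds a sparse representation and bias correction. blake2b from the standard library stands in for MurmurHash3. The precision p=14 is a hypothetical choice and is not the same knob as precision_threshold. The point it shows: memory is a fixed array of registers no matter how many values are added, and duplicates do not change the estimate.

```python
import hashlib
import math


def _hash64(value) -> int:
    """64-bit hash of the value's string form (blake2b as a stand-in for MurmurHash3)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Plain HyperLogLog with linear counting for small cardinalities (not HLL++)."""

    def __init__(self, p: int = 14):
        self.p = p                        # number of index bits
        self.m = 1 << p                   # number of registers (fixed memory)
        self.registers = [0] * self.m

    def add(self, value) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)             # linear counting for small counts
        return raw


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates do not change the estimate
print(round(hll.count()))  # close to 10000, but approximate
```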
Extended Stats Aggregation
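An extended version of the stats aggregation that additionally returns the sum of squares (sum_of_squares), variance (variance), standard deviation (std_deviation) and standard deviation bounds (std_deviation_bounds; by default the mean ± 2 standard deviations, and the multiplier can be set with the sigma parameter).
Supports the missing parameter.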
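The extra fields can be computed from count, sum and the sum of squares. The Python sketch below assumes population variance, which divides by count. It computes the bounds as avg ± sigma * std_deviation. It is a model of the arithmetic, not the Elasticsearch implementation.

```python
import math


def extended_stats(values, sigma=2.0):
    """count/min/max/avg/sum plus sum_of_squares, variance, std_deviation and bounds."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    avg = total / n
    variance = max(sum_sq / n - avg * avg, 0.0)  # population variance; clamp float noise
    std = math.sqrt(variance)
    return {
        "count": n, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats([1, 2, 3, 4], sigma=3))
```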
- Field
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "field" : "grade",
        "sigma" : 3
      }
    }
  }
}
- Script
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "script" : {
          "source" : "doc['grade'].value",
          "lang" : "painless"
        }
      }
    }
  }
}
- Field plus a value script (_value refers to the field's value)
GET /exams/_search
{
  "size": 0,
  "aggs" : {
    "grades_stats" : {
      "extended_stats" : {
        "field" : "grade",
        "script" : {
          "lang" : "painless",
          "source": "_value * params.correction",
          "params" : {
            "correction" : 1.2
          }
        }
      }
    }
  }
}
Geo Bounds Aggregation
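Computes the bounding box that contains all geo_point values of a field.
wrap_longitude: whether the bounding box may overlap the international date line (default true).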
POST /museums/_search?size=0
{
  "query" : {
    "match" : { "name" : "musée" }
  },
  "aggs" : {
    "viewport" : {
      "geo_bounds" : {
        "field" : "location",
        "wrap_longitude" : true
      }
    }
  }
}
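For the simple case where the box does not cross the date line, the bounds reduce to minimum and maximum latitude and longitude. The Python sketch below uses hypothetical (lat, lon) points and returns the top_left / bottom_right shape that geo_bounds reports. It does not attempt the date-line wrapping that wrap_longitude allows.

```python
def geo_bounds(points):
    """Bounding box of (lat, lon) pairs; assumes the box does not cross the date line."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},      # north-west corner
        "bottom_right": {"lat": min(lats), "lon": max(lons)},  # south-east corner
    }


# hypothetical museum locations
print(geo_bounds([(52.37, 4.89), (48.86, 2.35), (51.22, 4.40)]))
```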
Geo Centroid Aggregation
Computes the centroid (center of gravity) of all geo_point values of a field.
POST /museums/_search?size=0
{
  "aggs" : {
    "cities" : {
      "terms" : { "field" : "city.keyword" },
      "aggs" : {
        "centroid" : {
          "geo_centroid" : { "field" : "location" }
        }
      }
    }
  }
}
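The request above nests geo_centroid under a terms bucket per city. The Python sketch below mimics that shape with a plain arithmetic mean of latitudes and longitudes per city. This is a reasonable approximation for points close together. It is an assumption that it matches Elasticsearch's exact computation, and the documents are hypothetical.

```python
from collections import defaultdict


def centroids_by_city(docs):
    """Group docs by city (like the terms agg) and average lat/lon per group."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # city -> [lat_sum, lon_sum, count]
    for doc in docs:
        lat, lon = doc["location"]
        acc = sums[doc["city"]]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {
        city: {"count": n, "location": {"lat": lat_sum / n, "lon": lon_sum / n}}
        for city, (lat_sum, lon_sum, n) in sums.items()
    }


docs = [
    {"city": "Paris", "location": (48.86, 2.35)},
    {"city": "Paris", "location": (48.87, 2.33)},
    {"city": "Amsterdam", "location": (52.37, 4.89)},
]
print(centroids_by_city(docs))
```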
Max/Min Aggregation
Returns the maximum / minimum value of a numeric field (use min instead of max for the minimum).
POST /sales/_search?size=0
{
  "aggs" : {
    "max_price" : { "max" : { "field" : "price" } }
  }
}
Percentiles Aggregation
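Percentiles: conceptually, the values of a numeric field are sorted and, for each requested percentile, the value below which that percentage of the values falls is returned. The default percentiles are [ 1, 5, 25, 50, 75, 95, 99 ].
- percents specifies which percentiles you need.
- keyed controls the response format: a map keyed by percentile (the default) or, with keyed: false, an array of objects with key and value.
- missing is supported.
The default algorithm is TDigest (introduced by Ted Dunning in Computing Accurate Quantiles using T-Digests):
- Accuracy is proportional to q(1-q): extreme percentiles (e.g. 99%) are more accurate than less extreme ones such as the median.
- For small sets of values, percentiles are highly accurate.
- compression limits the maximum number of nodes to 20 * compression and balances accuracy against memory; the default is 100.
- HDR Histogram (High Dynamic Range Histogram) is an alternative that can be faster than t-digest at the cost of a larger memory footprint. It only supports positive values and needs number_of_significant_value_digits to set the precision (number of significant digits); if the range of values is unknown, it can cause high memory usage.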
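As a reference point for the approximate algorithms above, the Python sketch below computes exact percentiles on sorted data. It uses linear interpolation between neighbouring values, which is a common convention and an assumption here. It is neither TDigest nor HDR, and Elasticsearch's estimates may differ slightly. The keys mimic the keyed response format ("50.0" and so on). The helper tdigest_accuracy_factor is a hypothetical name that simply evaluates q(1-q) from the text.

```python
def exact_percentiles(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Exact percentiles with linear interpolation between neighbouring sorted values."""
    data = sorted(values)
    n = len(data)
    result = {}
    for p in percents:
        rank = p / 100 * (n - 1)          # fractional position in the sorted data
        lo = int(rank)
        hi = min(lo + 1, n - 1)
        result[str(float(p))] = data[lo] + (data[hi] - data[lo]) * (rank - lo)
    return result


def tdigest_accuracy_factor(q):
    """q(1-q) for a quantile q in [0, 1]; smaller means better TDigest accuracy."""
    return q * (1 - q)


print(exact_percentiles(range(1, 101)))
print(tdigest_accuracy_factor(0.5), tdigest_accuracy_factor(0.99))
```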
GET latency/_search
{
  "size": 0,
  "aggs" : {
    "load_time_outlier" : {
      "percentiles" : {
        "field" : "load_time",
        "tdigest": {
          "compression" : 200
        }
      }
    }
  }
}
GET exams/_search
{
  "size": 0,
  "aggs" : {
    "load_time_outlier" : {
      "percentiles" : {
        "field" : "grade",
        "percents" : [95, 99, 99.9],
        "hdr": {
          "number_of_significant_value_digits" : 3
        },
        "keyed":false
      }
    }
  }
}
Percentile Ranks Aggregation
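Similar to the above, but inverted: percentiles asks what value x% of the data falls below, whereas percentile ranks asks what percentage of the data falls below value x.
The parameters are similar to percentiles; both hdr and tdigest compression can be used.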
GET latency/_search
{
  "size": 0,
  "aggs" : {
    "load_time_ranks" : {
      "percentile_ranks" : {
        "field" : "load_time",
        "values" : [500, 600]
      }
    }
  }
}
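The inverse question can be answered exactly on a small sample, as the Python sketch below shows. It reports the percentage of values less than or equal to each target. This boundary convention is an assumption; Elasticsearch's approximate implementation may treat it differently. The load times are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values, targets):
    """Percentage of values <= each target, keyed like the keyed response ("500.0")."""
    data = sorted(values)
    n = len(data)
    return {str(float(t)): 100.0 * bisect_right(data, t) / n for t in targets}


load_times = [100, 200, 450, 550, 600, 700, 800, 900]  # hypothetical milliseconds
print(percentile_ranks(load_times, [500, 600]))
```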
Scripted Metric Aggregation
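Computes a metric output entirely through scripts.
It follows the map-reduce pattern: you provide an init script, a map script, a combine script and a reduce script.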
POST exams/_search?size=0
{
  "query" : {
    "match_all" : {}
  },
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script" : "state.transactions = []",
        "map_script" : "state.transactions.add(doc.type.value == 'sale' ? doc.amount.value : -1 * doc.amount.value)",
        "combine_script" : "double profit = 0; for (t in state.transactions) { profit += t } return profit",
        "reduce_script" : "double profit = 0; for (a in states) { profit += a } return profit"
      }
    }
  }
}
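The four scripts run in different places. init_script, map_script and combine_script run on each shard, and reduce_script runs once over all the per-shard results (states). The Python sketch below simulates the profit example with hypothetical documents split across two shards.

```python
def scripted_metric_profit(shards):
    """Simulate init/map/combine per shard and reduce across shards."""
    states = []
    for shard_docs in shards:
        state = {"transactions": []}                         # init_script
        for doc in shard_docs:                               # map_script, once per doc
            amount = doc["amount"]
            state["transactions"].append(amount if doc["type"] == "sale" else -1 * amount)
        states.append(sum(state["transactions"]))            # combine_script: one value per shard
    return sum(states)                                       # reduce_script over all shard states


shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}],
    [{"type": "cost", "amount": 30}, {"type": "sale", "amount": 130}],
]
print(scripted_metric_profit(shards))
```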
Stats Aggregation
Summary statistics aggregation; returns min, max, sum, count and avg.
POST /exams/_search?size=0
{
  "aggs" : {
    "grades_stats" : { "stats" : { "field" : "grade" } }
  }
}
A related example: a sum aggregation with a value script, which sums the squared prices of hats:
POST /sales/_search?size=0
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "match" : { "type" : "hat" }
      }
    }
  },
  "aggs" : {
    "square_hats" : {
      "sum" : {
        "field" : "price",
        "script" : {
          "source": "_value * _value"
        }
      }
    }
  }
}
Top Hits Aggregation
Top-N aggregation: returns the top matching documents for each bucket.
POST /example/_search?size=0
{
  "aggs":{
    "a_a":{
      "terms":{
        "field": "a",
        "order":{
          "top_hit": "asc"
        },
        "size": 2
      },
      "aggs":{
        "top_tags_hits":{
          "top_hits": {
            "size": 2,
            "_source":{
              "includes": ["a","b"]
            },
            "sort": [
              {
                "b":{
                  "order":"desc"
                }
              }
            ]
          }
        },
        "top_hit":{
          "avg":{
            "script": {
              "source": "doc.b"
            }
          }
        }
      }
    }
  }
}
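- The terms buckets can be ordered by a custom metric sub-aggregation (here the avg aggregation named top_hit).
- _source selects which fields of each hit are returned.
- sort inside top_hits controls the order of the hits, i.e. whether you get the smallest or the largest top N.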
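The first request combines three steps. It groups documents by a, orders the buckets by the avg of b ascending, and keeps two hits per bucket sorted by b descending with only a and b returned. The Python sketch below models what that request computes on a single shard. The documents are hypothetical. Across several shards, Elasticsearch's terms ordering by a sub-aggregation can be approximate, which this sketch ignores.

```python
from collections import defaultdict


def terms_with_top_hits(docs, terms_size=2, hits_size=2):
    """Model of terms(a, order by avg(b) asc) + top_hits(sort b desc, _source a,b)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["a"]].append(doc)
    result = []
    for key, bucket_docs in buckets.items():
        avg_b = sum(d["b"] for d in bucket_docs) / len(bucket_docs)            # "top_hit" avg
        hits = sorted(bucket_docs, key=lambda d: d["b"], reverse=True)[:hits_size]
        result.append({
            "key": key,
            "doc_count": len(bucket_docs),
            "top_hit": avg_b,
            "top_tags_hits": [{"a": d["a"], "b": d["b"]} for d in hits],  # _source includes
        })
    result.sort(key=lambda bucket: bucket["top_hit"])  # order: {"top_hit": "asc"}
    return result[:terms_size]                        # terms size


docs = [
    {"a": "x", "b": 1}, {"a": "x", "b": 5, "c": 9}, {"a": "x", "b": 3},
    {"a": "y", "b": 10}, {"a": "y", "b": 20},
    {"a": "z", "b": 2},
]
for bucket in terms_with_top_hits(docs):
    print(bucket["key"], bucket["top_hit"], bucket["top_tags_hits"])
```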
- Top N over nested documents: aggregating on the nested comments field of each document
GET /sales/_search
{
  "query": {
    "term":{"tags":"car"}
  },
  "aggs":{
    "by_sale":{
      "nested": {
        "path": "comments"
      },
      "aggs":{
        "by_user":{
          "terms":{
            "field": "comments.username",
            "size": 2
          },
          "aggs": {
            "by_nested": {
              "top_hits": {
                "size": 4
              }
            }
          }
        }
      }
    }
  }
}
Value Count Aggregation
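Value count aggregation: similar to the Cardinality Aggregation, but without deduplication; it counts every value extracted from the documents.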
POST /example/_search?size=0
{
  "aggs" : {
    "types_count" : { "value_count" : { "field" : "c" } }
  }
}
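The difference from cardinality can be shown on a few hypothetical documents. The Python sketch below assumes that each value of a multi-valued field counts once for value_count and that documents without the field contribute nothing. The cardinality side is computed exactly here, whereas Elasticsearch only approximates it.

```python
def value_count(docs, field):
    """Number of values of `field` across docs (no deduplication)."""
    count = 0
    for doc in docs:
        raw = doc.get(field)
        if raw is None:
            continue
        count += len(raw) if isinstance(raw, list) else 1
    return count


def distinct_count(docs, field):
    """Exact number of distinct values (what cardinality approximates)."""
    seen = set()
    for doc in docs:
        raw = doc.get(field)
        if raw is None:
            continue
        seen.update(raw if isinstance(raw, list) else [raw])
    return len(seen)


docs = [{"c": "a"}, {"c": "a"}, {"c": ["b", "c"]}, {}]
print(value_count(docs, "c"), distinct_count(docs, "c"))
```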
Median Absolute Deviation Aggregation
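Median absolute deviation: a robust measure of variability, computed as median(|median(X) - Xi|). A single very large outlier can make the average large, while the median absolute deviation stays small.
compression controls the trade-off between performance and accuracy; the default is 1000.
After adding one very large outlier: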
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_average": {
      "avg": {
        "field": "rating"
      }
    },
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating"
      }
    }
  }
}
Response:
{
  "took" : 447,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "review_average" : {
      "value" : 15.625
    },
    "review_variability" : {
      "value" : 1.5
    }
  }
}
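The same effect can be reproduced exactly with Python's statistics module on a hypothetical set of ratings. It is not the data behind the response above. Elasticsearch approximates the medians with TDigest, so its values may differ slightly. Adding one outlier of 100 moves the mean from about 3.3 to 15.375, while the median absolute deviation only moves from 1 to 1.5.

```python
import statistics


def median_absolute_deviation(values):
    """median(|median(X) - Xi|), computed exactly."""
    med = statistics.median(values)
    return statistics.median([abs(med - x) for x in values])


ratings = [1, 2, 3, 3, 4, 5, 5]          # hypothetical ratings
with_outlier = ratings + [100]           # one very large outlier

print(statistics.mean(ratings), median_absolute_deviation(ratings))
print(statistics.mean(with_outlier), median_absolute_deviation(with_outlier))
```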