ES Failure

Author: Ary_zz | Published 2019-10-18 10:41

2019-10-18

primary shard lost

unassigned_info

"can_allocate" : "no_valid_shard_copy", "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt"

Several kinds of errors:

  •  A copy whose index directory holds only write.lock and no Lucene segments file, so the store cannot be opened (corrupt or empty copy):
     {
       "node_id" : "dBd4onKFSLSvrxgFCIP6GQ",
       "node_name" : "elasticsearch-data-86d6d959c5-ddlfd",
       "transport_address" : "172.16.38.77:9300",
       "node_decision" : "no",
       "store" : {
         "in_sync" : false,
         "allocation_id" : "SA157qPdRViXVK2ie2QgKg",
         "store_exception" : {
           "type" : "file_not_found_exception",
           "reason" : "no segments* file found in SimpleFSDirectory@/data/db/nodes/0/indices/sRSOtM-URGGeOs49IACD8w/1/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@42971bc2: files: [write.lock]"
         }
       }
     }
    
  •  A copy that exists but is not in sync (stale copy):
     {
       "node_id" : "WJSf3f08Riuy-4kajyLb6A",
       "node_name" : "elasticsearch-data-86d6d959c5-8jb7x",
       "transport_address" : "172.16.126.15:9300",
       "node_decision" : "no",
       "store" : {
         "in_sync" : false,
         "allocation_id" : "A8L3_M7SQG-ZSy7zbBNWFg"
       }
     }
    
    
  •  Nodes refused by the disk_threshold decider because they are above the 85% low watermark:
     {
       "node_id" : "9S-fKqTkQg-06muuMl20Uw",
       "node_name" : "elasticsearch-data-86d6d959c5-j89bq",
       "transport_address" : "172.16.23.40:9300",
       "node_decision" : "no",
       "deciders" : [
         {
           "decider" : "disk_threshold",
           "decision" : "NO",
           "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [13.930268425592923%]"
         }
       ]
     },
     {
       "node_id" : "E2BvZJ4jQu2anQzaQrgCLA",
       "node_name" : "elasticsearch-data-86d6d959c5-f2957",
       "transport_address" : "172.16.63.6:9300",
       "node_decision" : "no",
       "deciders" : [
         {
           "decider" : "disk_threshold",
           "decision" : "NO",
           "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.635894063472007%]"
         }
       ]
     },
     {
       "node_id" : "F_iPC-LbQ9uH9NquGrnYmw",
       "node_name" : "elasticsearch-data-86d6d959c5-2dlpb",
       "transport_address" : "172.16.95.7:9300",
       "node_decision" : "no",
       "deciders" : [
         {
           "decider" : "disk_threshold",
           "decision" : "NO",
           "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.663439808514775%]"
         }
       ]
     },
     {
       "node_id" : "lj1_omL1RYOSo5xws7ibQg",
       "node_name" : "elasticsearch-data-86d6d959c5-khh96",
       "transport_address" : "172.16.53.13:9300",
       "node_decision" : "no",
       "deciders" : [
         {
           "decider" : "disk_threshold",
           "decision" : "NO",
           "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [14.5565950622084%]"
         }
       ]
     }
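The allocate_explanation above says every copy found is either stale or corrupt, and the first two node decisions show one of each. As a rough illustration of how to read such output, here is a minimal Python sketch that groups the entries of a cluster allocation explain response by the state of each node's copy. SAMPLE is a hypothetical, trimmed response assembled from the decisions quoted above, and the grouping rules (store_exception means corrupt, in_sync false means stale) are this sketch's own reading of those fields, not an Elasticsearch API.

```python
import json

# Hypothetical, trimmed GET _cluster/allocation/explain response built from the
# node decisions quoted above (node_id and transport_address omitted).
SAMPLE = """
{
  "can_allocate": "no_valid_shard_copy",
  "node_allocation_decisions": [
    {"node_name": "elasticsearch-data-86d6d959c5-ddlfd",
     "node_decision": "no",
     "store": {"in_sync": false,
               "allocation_id": "SA157qPdRViXVK2ie2QgKg",
               "store_exception": {"type": "file_not_found_exception"}}},
    {"node_name": "elasticsearch-data-86d6d959c5-8jb7x",
     "node_decision": "no",
     "store": {"in_sync": false,
               "allocation_id": "A8L3_M7SQG-ZSy7zbBNWFg"}},
    {"node_name": "elasticsearch-data-86d6d959c5-j89bq",
     "node_decision": "no",
     "deciders": [{"decider": "disk_threshold", "decision": "NO"}]}
  ]
}
"""


def classify_copies(explain: dict) -> dict:
    """Group nodes by the state of their copy of the shard (sketch heuristics)."""
    groups = {"corrupt": [], "stale": [], "in_sync": [], "no_store_info": []}
    for node in explain.get("node_allocation_decisions", []):
        name = node["node_name"]
        store = node.get("store")
        if store is None:
            groups["no_store_info"].append(name)  # only decider output reported
        elif "store_exception" in store:
            groups["corrupt"].append(name)  # copy on disk could not be opened
        elif not store.get("in_sync", False):
            groups["stale"].append(name)  # copy is outside the in-sync set
        else:
            groups["in_sync"].append(name)
    return groups


def has_valid_copy(groups: dict) -> bool:
    """Assumption of this sketch: a usable primary needs an in-sync, readable copy."""
    return bool(groups["in_sync"])


if __name__ == "__main__":
    groups = classify_copies(json.loads(SAMPLE))
    print(groups)
    print("valid copy available:", has_valid_copy(groups))
```

With this sample the in_sync group is empty, which is the no_valid_shard_copy situation described by allocate_explanation.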

ES shard allocation strategy

https://doc.yonyoucloud.com/doc/mastering-elasticsearch/chapter-4/43_README.html

Too much data. Disk usage per data node (output of GET _cat/allocation?v):

shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
  4060      442.9gb   443.4gb     56.5gb      500gb           88 172.16.100.115 172.16.100.115 elasticsearch-data-f8449cccf-2qblf
  3719        439gb   442.2gb     57.7gb      500gb           88 172.16.3.50    172.16.3.50    elasticsearch-data-f8449cccf-bxpmh
   490      103.8gb   459.2gb     40.7gb      500gb           91 172.16.98.138  172.16.98.138  elasticsearch-data-f8449cccf-mr2bm
  3742      439.8gb   446.6gb     53.3gb      500gb           89 172.16.51.88   172.16.51.88   elasticsearch-data-f8449cccf-b74rw
  3631      463.9gb   464.4gb     35.5gb      500gb           92 172.16.98.137  172.16.98.137  elasticsearch-data-f8449cccf-gz62r
 11294                                                                                         UNASSIGNED

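In practice, problems start once disk usage passes about 80%. Here every data node is at 88-92%, above the default 85% low watermark, so Elasticsearch stops allocating shards to all of them while 11294 shards remain unassigned. Two nodes (mr2bm at 91%, gz62r at 92%) are also above the 90% high watermark, so Elasticsearch additionally tries to move shards off them.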
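To check nodes against the watermarks quoted from the documentation, the _cat/allocation output above can be parsed directly. A minimal sketch, assuming the default percentage watermarks (85/90/95) and whitespace-separated rows. Note that disk.percent in _cat output is a rounded integer, while the disk decider works from exact byte counts, so a node sitting right at a threshold may be classified differently here.

```python
# Rows copied from the _cat/allocation?v output above (spacing collapsed).
CAT_ALLOCATION = """
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
4060 442.9gb 443.4gb 56.5gb 500gb 88 172.16.100.115 172.16.100.115 elasticsearch-data-f8449cccf-2qblf
3719 439gb 442.2gb 57.7gb 500gb 88 172.16.3.50 172.16.3.50 elasticsearch-data-f8449cccf-bxpmh
490 103.8gb 459.2gb 40.7gb 500gb 91 172.16.98.138 172.16.98.138 elasticsearch-data-f8449cccf-mr2bm
3742 439.8gb 446.6gb 53.3gb 500gb 89 172.16.51.88 172.16.51.88 elasticsearch-data-f8449cccf-b74rw
3631 463.9gb 464.4gb 35.5gb 500gb 92 172.16.98.137 172.16.98.137 elasticsearch-data-f8449cccf-gz62r
11294 UNASSIGNED
"""

# Default percentage watermarks, as quoted from the Elasticsearch docs.
LOW, HIGH, FLOOD_STAGE = 85, 90, 95


def parse_cat_allocation(text: str):
    """Return (per-node rows as dicts keyed by header, unassigned shard count)."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    nodes, unassigned = [], 0
    for line in lines[1:]:
        cols = line.split()
        if cols[-1] == "UNASSIGNED":
            unassigned = int(cols[0])
        else:
            nodes.append(dict(zip(header, cols)))
    return nodes, unassigned


def watermark_stage(disk_percent: int) -> str:
    """Name the highest default watermark this disk usage has crossed."""
    if disk_percent > FLOOD_STAGE:
        return "above_flood_stage"  # indices with shards here get a read-only block
    if disk_percent > HIGH:
        return "above_high"  # Elasticsearch tries to relocate shards away
    if disk_percent > LOW:
        return "above_low"  # no shards allocated here (except new primaries)
    return "ok"


if __name__ == "__main__":
    nodes, unassigned = parse_cat_allocation(CAT_ALLOCATION)
    for n in nodes:
        print(n["node"], n["disk.percent"], watermark_stage(int(n["disk.percent"])))
    print("unassigned shards:", unassigned)
```

On the table above this reports three nodes above the low watermark, two (mr2bm and gz62r) above the high watermark, and 11294 unassigned shards.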

From the Elasticsearch documentation:

cluster.routing.allocation.disk.threshold_enabled Defaults to true. Set to false to disable the disk allocation decider.

cluster.routing.allocation.disk.watermark.low Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices or, specifically, any shards that have never previously been allocated.
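As quoted above, the low watermark can be a percentage, which is compared with used space, or an absolute byte value, which is compared with the remaining free space. A minimal sketch of that distinction; the unit table and its 1024-based multipliers are assumptions of this sketch, and any other value forms the setting may accept are not handled.

```python
import re

# Assumed 1024-based multipliers for byte-size strings such as "500mb".
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_byte_size(value: str) -> int:
    """Convert a string like '500mb' into a number of bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(b|kb|mb|gb|tb|pb)", value.strip().lower())
    if not m:
        raise ValueError(f"not a byte size: {value!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def above_low_watermark(watermark: str, used_bytes: int, total_bytes: int) -> bool:
    """True if a node with this disk usage is past the given low watermark."""
    w = watermark.strip()
    if w.endswith("%"):
        # Percentage form: limit on *used* space.
        return used_bytes / total_bytes * 100 > float(w[:-1])
    # Absolute form: minimum amount of *free* space required.
    return total_bytes - used_bytes < parse_byte_size(w)


if __name__ == "__main__":
    GB = 1024**3
    used, total = int(443.4 * GB), 500 * GB  # roughly node 2qblf from the table above
    for wm in ("85%", "500mb", "60gb"):
        print(wm, above_low_watermark(wm, used, total))
```

For that node, "85%" and "60gb" both mark it as past the watermark, while "500mb" does not, because about 56.6gb is still free.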

cluster.routing.allocation.disk.watermark.high Controls the high watermark. It defaults to 90%, meaning that Elasticsearch will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.

cluster.routing.allocation.disk.watermark.flood_stage Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
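Both the watermarks and the flood-stage block are changed through the settings APIs. A minimal sketch that only builds the JSON request bodies and does not send them: watermark settings for PUT _cluster/settings (stored as persistent settings here by choice), and the body for PUT <index>/_settings that releases the read-only block once disk space has been freed. The function names and the my-index placeholder are hypothetical, and the ordering check is a local sanity check for percentage values, not Elasticsearch's own validation.

```python
import json

PREFIX = "cluster.routing.allocation.disk.watermark."


def _percent(value: str) -> float:
    """Parse a percentage string such as '85%' (percentage form only)."""
    return float(value.rstrip("%"))


def watermark_settings(low="85%", high="90%", flood_stage="95%") -> dict:
    """Body for PUT _cluster/settings; the defaults are the documented values."""
    # Local sanity check: each stage should not be lower than the previous one.
    if not _percent(low) <= _percent(high) <= _percent(flood_stage):
        raise ValueError("expected low <= high <= flood_stage")
    return {"persistent": {PREFIX + "low": low,
                           PREFIX + "high": high,
                           PREFIX + "flood_stage": flood_stage}}


def release_read_only_block() -> dict:
    """Body for PUT <index>/_settings; null resets the setting, lifting the block."""
    return {"index.blocks.read_only_allow_delete": None}


if __name__ == "__main__":
    print("PUT _cluster/settings")
    print(json.dumps(watermark_settings(), indent=2))
    print("PUT my-index/_settings")  # "my-index" is a placeholder index name
    print(json.dumps(release_read_only_block()))
```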

Other issues

Memory segmentation fault (core dump)

segmentation fault

Java off-heap memory

es memory limit 5 5 10


Article link: https://www.haomeiwen.com/subject/wecimctx.html