Fixing the EFK Kibana "Unable to revive" error

Author: 王彦锋BlackStone | Published 2019-08-09 15:00
  • Symptom:

After installing EFK on OpenShift, every pod appears healthy, but the kibana pod keeps printing the following warnings:

log [22:52:02.901] [warning][elasticsearch] Unable to revive connection: https://logging-es:9200/
log [22:52:02.902] [warning][elasticsearch] No living connections
log [22:52:05.442] [warning][elasticsearch] Unable to revive connection: https://logging-es:9200/
log [22:52:05.443] [warning][elasticsearch] No living connections
  • Troubleshooting:


1. Check whether Kibana can reach the ES server (logging-es)

On a master node of the OKD cluster, run the following command:

np="openshift-logging"; \
oc exec `oc get pods -l component=kibana-ops -o name  -n $np |cut -d/ -f2` \
  -c kibana   \
  -n $np     \
  -- curl -s \
     --cacert /etc/kibana/keys/ca \
     --cert /etc/kibana/keys/cert \
     --key /etc/kibana/keys/key \
     https://logging-es:9200/

The output shows that Kibana can reach the ES service, so Kibana itself can be ruled out:

{
  "name" : "logging-es-data-master-ypgh5heq",
  "cluster_name" : "logging-es",
  "cluster_uuid" : "uHhppucFTI2JwZdYbStTaA",
  "version" : {
    "number" : "5.6.13",
    "build_hash" : "4d5320b",
    "build_date" : "2018-10-30T19:05:08.237Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

2. Check whether the ES (logging-es) cluster itself is healthy

Run the following command to check the cluster health:

np="openshift-logging"; \
oc exec `oc get pods -l component=es -o name  -n $np |cut -d/ -f2` \
  -c elasticsearch   \
  -n $np     \
  -- curl  -s \
       --cacert /etc/elasticsearch/secret/admin-ca   \
       --cert /etc/elasticsearch/secret/admin-cert   \
       --key  /etc/elasticsearch/secret/admin-key    \
       "https://localhost:9200/_cluster/health?pretty=true"

The output shows status yellow, with unassigned_shards at 14 and active_shards_percent_as_number at 50:

{
  "cluster_name" : "logging-es",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 14,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 14,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
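Why exactly 50%? Each of the 14 primaries is configured with one replica, and a replica can never be allocated to the same node as its primary, so on a one-node cluster every replica stays unassigned. A rough back-of-the-envelope model of this arithmetic (a sketch only, not Elasticsearch's actual allocator):

```python
def shard_stats(primaries, replicas_per_primary, nodes):
    """Rough model of shard allocation on a small cluster.

    A replica is never placed on the same node as its primary, so at
    most (nodes - 1) replica copies of each shard can be assigned.
    Returns (active, unassigned, active_shards_percent).
    """
    total = primaries * (1 + replicas_per_primary)
    assignable_replicas = primaries * min(replicas_per_primary, nodes - 1)
    active = primaries + assignable_replicas
    unassigned = total - active
    return active, unassigned, 100.0 * active / total

# numbers from the cluster health output above: 14 primaries, 1 replica, 1 node
print(shard_stats(14, 1, 1))  # -> (14, 14, 50.0)
```

With a second data node the same indices would go green, since every replica would then have somewhere to live.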

3. The unassigned_shards and active_shards_percent_as_number values suggest that some index shards are not being allocated.
Run the following command to list the indices:

np="openshift-logging"; \
oc exec `oc get pods -l component=es -o name  -n $np |cut -d/ -f2` \
  -c elasticsearch   \
  -n $np      \
  -- curl -s  \
        --cacert /etc/elasticsearch/secret/admin-ca  \
        --cert /etc/elasticsearch/secret/admin-cert  \
        --key  /etc/elasticsearch/secret/admin-key   \
        "https://localhost:9200/_cat/indices?v"
health status index                                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .operations.2019.08.01                           eb5nHgZdSYmU91YBKoMXWA   1   1      42731            0     31.5mb         31.5mb
yellow open   .operations.2019.07.30                           XAcy27SxQ-6eKw4e3NHtRA   1   1      67211            0     74.2mb         74.2mb
yellow open   .operations.2019.08.03                           fTcaIEx-TD-BQuYhocYOjA   1   1     275384            0    173.4mb        173.4mb
yellow open   .kibana.d033e22ae348aeb5660fc2140aec35850c4da997 fN0s-AywSDieHYiI0yor2A   1   1          5            0     57.7kb         57.7kb
yellow open   .operations.2019.07.31                           KYJ0wn8TTl6cVYjUplW3Ag   1   1      61817            0     63.8mb         63.8mb
yellow open   .operations.2019.08.08                           FmKDNSmLQ469GXZzh3-oTQ   1   1      73322            0     64.1mb         64.1mb
yellow open   .searchguard                                     E_CTiBPXQy29uAceCLfEoA   1   1          5            0     66.3kb         66.3kb
yellow open   .operations.2019.08.05                           BbDX_5rlRAO-kMbIrS7Wag   1   1     249518            0    174.7mb        174.7mb
yellow open   .operations.2019.08.02                           Tvnk20pKS-eBz2mPtR-rmA   1   1     293582            0    199.5mb        199.5mb
yellow open   .kibana                                          g8Py9rE0TNe6K37XIXmXiw   1   1          1            0      3.2kb          3.2kb
yellow open   .operations.2019.08.09                           MIbESM4KTWOzzoCPe8b9Jg   1   1      20257            0     17.3mb         17.3mb
yellow open   .operations.2019.08.04                           oZi_MuSNS9-HUvmXT61CNA   1   1     271250            0      169mb          169mb
yellow open   .operations.2019.08.06                           uOgHA4EVT42WgiDkKPf97g   1   1     124307            0    102.4mb        102.4mb
yellow open   .operations.2019.07.29                           aBQSJHprTK-bhW0JIX3UpA   1   1      39028            0     29.4mb         29.4mb
yellow open   .operations.2019.07.27                           m1QQJSjjRXS9gBMpos_2Qw   1   1      36097            0     30.4mb         30.4mb

The output shows that every index has rep (replica count) set to 1, which is not what we expect on a single-node cluster.

The main purpose of a replica shard is failover: as the Elasticsearch guide's "Life Inside a Cluster" chapter explains, if the node holding a primary shard goes down, a replica shard is promoted to primary.

It follows that a replica shard is never allocated to the same node as its primary. In a single-node cluster there is no other node to put the replicas on, so every replica shard ends up unassigned. And with only one node, losing the node that holds the primaries takes down the whole cluster anyway, so replica promotion could never happen.

  • Fix:


Since there is only one node and the replicas cannot be allocated, set number_of_replicas to 0 for every index so the ES cluster can return to green.
Run the following command:

np="openshift-logging"; \
oc exec `oc get pods -l component=es -o name  -n $np |cut -d/ -f2` \
  -c elasticsearch   \
  -n $np      \
  -- curl -s  \
        --cacert /etc/elasticsearch/secret/admin-ca    \
        --cert   /etc/elasticsearch/secret/admin-cert  \
        --key    /etc/elasticsearch/secret/admin-key   \
        -H "Content-Type: application/json"            \
        -XPUT 'https://localhost:9200/_settings'       \
        -d '{
            "index" : {
                "number_of_replicas" : 0
            }
        }'

Run the cluster health check again; the cluster is back to normal, with status green and 100% of shards active:

{
  "cluster_name" : "logging-es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
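One caveat: the _settings call above only changes indices that already exist. The OpenShift logging stack creates a new .operations.* index each day, and whether those come up with replicas depends on the deployment's configuration. If newly created indices keep turning the cluster yellow, an Elasticsearch 5.x index template can default them to zero replicas. A sketch of the template body (the template name zero-replicas is arbitrary; PUT it to https://localhost:9200/_template/zero-replicas with the same curl flags as above):

```json
{
  "template": "*",
  "settings": {
    "index": { "number_of_replicas": 0 }
  }
}
```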


Original link: https://www.haomeiwen.com/subject/fgchjctx.html