Repairing HDFS Block Corruption Caused by a Power Outage

Author: 吃货大米饭 | Published: 2019-08-22 17:33

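I. Symptom

After a power outage, the HDFS service behaves abnormally or reports corrupt blocks.

II. Delete all three replicas of one block directly on the DataNodes (replication factor 3)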

[hadoop@hadoop001 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
[hadoop@hadoop002 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
[hadoop@hadoop003 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta

Then restart HDFS. With every replica of the block gone, this reproduces the corruption directly.
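Before removing anything by hand, it can help to confirm where a block's replica files actually live on a DataNode. The following is a minimal sketch, not part of the original procedure: `find_block_files` is a hypothetical helper, and the directory you pass to it is an assumption (it should be whatever `dfs.datanode.data.dir` points to on your cluster).

```shell
# find_block_files DATA_DIR BLOCK_ID
# Hypothetical helper: prints the block file and its .meta file(s) under DATA_DIR.
# DATA_DIR is an assumption; use the value of dfs.datanode.data.dir on your cluster.
find_block_files() {
  # Match the exact block file name, or "<block id>_<generation stamp>.meta".
  find "$1" -type f \( -name "$2" -o -name "${2}_*.meta" \) | sort
}
```

For example, `find_block_files /path/to/dfs/data blk_1073741827` (the path is a placeholder) should print two lines on each DataNode that holds a replica.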

III. Check the health of the HDFS file system

hdfs fsck /path

[hadoop@hadoop001 ~]$ hdfs fsck /
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:07:58 CST 2019
.
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827

/blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
 Total size:    323544381 B
 Total dirs:    10
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      3 (avg. block size 107848127 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         55108925 B
  CORRUPT BLOCKS:       1
  ********************************
 Minimally replicated blocks:   2 (66.666664 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                1
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Aug 22 17:07:58 CST 2019 in 5 milliseconds


The filesystem under path '/' is CORRUPT
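The last line of the report is the quickest thing to check in a script. As a rough sketch, the hypothetical helper below reads `hdfs fsck` output on standard input and prints a single status word. It assumes the summary line has the same wording as in the transcript above, which may differ in other Hadoop versions.

```shell
# fsck_status: reads `hdfs fsck` output on stdin and prints HEALTHY, CORRUPT, or UNKNOWN.
# Assumes the summary line reads "The filesystem under path '<path>' is <STATUS>".
fsck_status() {
  awk '
    /^The filesystem under path .* is HEALTHY/ { s = "HEALTHY" }
    /^The filesystem under path .* is CORRUPT/ { s = "CORRUPT" }
    END { print (s == "" ? "UNKNOWN" : s) }
  '
}
```

Typical use would be `hdfs fsck / | fsck_status`.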

IV. List the corrupt blocks and the files they belong to

hdfs fsck /path -list-corruptfileblocks

[hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -list-corruptfileblocks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2Fblockrecover%2Fgenome-scores.csv
The list of corrupt files under path '/blockrecover/genome-scores.csv' are:
blk_1073741827  /blockrecover/genome-scores.csv
The filesystem under path '/blockrecover/genome-scores.csv' has 1 CORRUPT files
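To feed this list into later steps, the file paths can be pulled out of the report. The sketch below is one possible way to do it; `corrupt_files` is a hypothetical helper, and it assumes each entry is a line of the form `blk_<id>` followed by whitespace and a path that contains no spaces.

```shell
# corrupt_files: reads `hdfs fsck <path> -list-corruptfileblocks` output on stdin
# and prints each affected file path once.
# Assumption: paths contain no whitespace, so the path is the second field.
corrupt_files() {
  awk '/^blk_[0-9]+[ \t]/ { print $2 }' | sort -u
}
```

Typical use would be `hdfs fsck / -list-corruptfileblocks | corrupt_files`.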

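V. Locate which blocks of a file are stored on which machines

-files      shows the file's block summary
-blocks     shows per-block information; only takes effect together with -files
-locations  shows the IP address of each DataNode holding the block; only takes effect together with -blocks
-racks      shows the rack location; only takes effect together with -files

Corrupt case: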

[hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:16:36 CST 2019
/blockrecover/genome-scores.csv 323544381 bytes, 3 block(s): 
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
 MISSING 1 blocks of total size 55108925 B
0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741825_1001 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741826_1002 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741827_1003 len=55108925 MISSING!

Status: CORRUPT
 Total size:    323544381 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      3 (avg. block size 107848127 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         55108925 B
  CORRUPT BLOCKS:       1
  ********************************
 Minimally replicated blocks:   2 (66.666664 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                1
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Aug 22 17:16:36 CST 2019 in 1 milliseconds


The filesystem under path '/blockrecover/genome-scores.csv' is CORRUPT

Healthy case:

[hadoop@hadoop001 data]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:36:21 CST 2019
/blockrecover/genome-scores.csv 323544381 bytes, 3 block(s):  OK
0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741828_1004 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]
1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741829_1005 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010]
2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741830_1006 len=55108925 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]

Status: HEALTHY
 Total size:    323544381 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      3 (avg. block size 107848127 B)
 Minimally replicated blocks:   3 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Aug 22 17:36:21 CST 2019 in 1 milliseconds


The filesystem under path '/blockrecover/genome-scores.csv' is HEALTHY
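In the corrupt case, the affected block is the per-block line that ends in `MISSING!` instead of a replica list. A small sketch for extracting those block names from the report follows; `missing_blocks` is a hypothetical helper and assumes the per-block line layout shown in the transcripts above.

```shell
# missing_blocks: reads `hdfs fsck <path> -files -blocks -locations` output on stdin
# and prints the name of every block reported as MISSING!.
# Assumes lines like "2. <blockpool>:blk_<id>_<genstamp> len=<n> MISSING!".
missing_blocks() {
  awk '/ MISSING!/ {
    n = split($2, parts, ":")   # $2 is "<blockpool>:<block name>"
    print parts[n]
  }'
}
```

Typical use would be `hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations | missing_blocks`.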

VI. Delete the corrupt files, then reload the data from the business system

[hadoop@hadoop001 ~]$ hdfs fsck / -delete
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&delete=1&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:32:00 CST 2019
.
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827

/blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
 Total size:    323544381 B
 Total dirs:    10
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      3 (avg. block size 107848127 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         55108925 B
  CORRUPT BLOCKS:       1
  ********************************
 Minimally replicated blocks:   2 (66.666664 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                1
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Aug 22 17:32:00 CST 2019 in 15 milliseconds


The filesystem under path '/' is CORRUPT

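If the lost file is a log file, losing a small amount does not matter.
If the file holds business data such as order data, the loss must be reported and the data reloaded.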
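Because `-delete` removes whole files, not just the bad blocks, it may be safer to review exactly what will be deleted and to scope the command to individual files rather than `/`. The sketch below only prints the commands so they can be inspected first; `plan_delete` is a hypothetical helper, and it assumes one HDFS path per input line with no single quotes in the path.

```shell
# plan_delete: reads HDFS file paths on stdin (one per line) and prints, without
# running it, an `hdfs fsck <path> -delete` command for each path.
# Assumption: paths do not contain single quotes.
plan_delete() {
  while IFS= read -r path; do
    if [ -n "$path" ]; then
      printf "hdfs fsck '%s' -delete\n" "$path"
    fi
  done
}
```

For example, `plan_delete < corrupt_paths.txt` (a placeholder file name) prints one command per file. Once the list has been reported and agreed on, the printed commands can be run by hand.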

VII. Summary

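1. hdfs fsck / -delete deletes the corrupt files outright.

For HBase, there is no need to delete all of the table's files. Just reload all of the data: a put updates a row that already exists and inserts one that does not.

2. With -files -locations -blocks -racks, replica locations are printed for healthy blocks. A missing block is printed as MISSING! with no locations.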
