KVM Study Notes (Cluster Lab 2: A Cluster Based on iSCSI Shared Storage)


Author: 一个反派人物 | Published 2020-10-10 17:59

1 Planning and Design

Network topology
Host     Service network   Heartbeat network   Storage network
node1    192.168.234.129   172.16.1.231        10.0.1.231
node2    192.168.234.130   172.16.1.232        10.0.1.232
storage  192.168.234.250   -                   10.0.1.235

Cluster resource dependency chain
DLM->CLVM->File System(GFS2)->Virtual Domain

2 Node Preparation

Install the virtualization package groups (compute nodes)

yum groups install -y "Virtualization Platform " 
yum groups install -y "Virtualization Hypervisor "
yum groups install -y "Virtualization Tools "
yum groups install -y "Virtualization Client "

Install the cluster packages (compute nodes)

yum install pacemaker corosync pcs psmisc policycoreutils-python fence-agents-all -y

Install CLVM (compute nodes)

yum -y install lvm2-cluster

Install the GFS2 cluster file system and the DLM distributed lock manager (compute nodes)

yum -y install gfs2-utils dlm

Install the Linux-IO target software (storage node)

yum -y install targetcli

Add name resolution for all hosts to /etc/hosts (all nodes)

[root@node1 ~]$  cat /etc/hosts
192.168.234.129 node1
192.168.234.130 node2
10.0.1.231 node1-stor
10.0.1.232 node2-stor
10.0.1.235 stor
172.16.1.231 node1-sync
172.16.1.232 node2-sync

Configure passwordless SSH authentication (compute nodes)

ssh-keygen -t rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1 #passwordless login to itself
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2 #passwordless login to node2 (repeat in the other direction from node2)

Set up periodic time synchronization (all nodes)

yum install ntpdate -y
crontab -e
*/30 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null

Configure the firewall (all nodes)

#allow the cluster services through the firewall
firewall-cmd --permanent --add-service=high-availability
#trust the heartbeat and storage networks
firewall-cmd --zone=trusted --add-source=10.0.1.0/24 --permanent
firewall-cmd --zone=trusted --add-source=172.16.1.0/24 --permanent
#allow live migration
firewall-cmd --permanent --add-port=16509/tcp
firewall-cmd --permanent --add-port=49152-49215/tcp
#allow the VM VNC ports so virt-manager can connect remotely
firewall-cmd --permanent --add-service=vnc-server
#on the storage node, open the iscsi-target ports
firewall-cmd --permanent --add-service=iscsi-target
firewall-cmd --reload
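
Before any resources are configured, the cluster itself has to be created and started. A minimal sketch, not shown in the original notes (the hacluster password here is only a placeholder; the cluster name cluster1 and the *-sync node names match the pcs status output later on):

#on both compute nodes: start pcsd and set a password for the hacluster user
systemctl enable --now pcsd
echo 'yourpassword' | passwd --stdin hacluster
#on one node only: authenticate the nodes, create the cluster over the heartbeat network and start it
pcs cluster auth node1-sync node2-sync -u hacluster -p yourpassword
pcs cluster setup --name cluster1 node1-sync node2-sync
pcs cluster start --all
pcs cluster enable --all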

3 Configure the Storage Server

The storage is presented to the compute nodes as fileio backstores. Prepare a new disk sdb, create partition sdb1 for LVM, and create a new LV to hold the backing files.

fdisk /dev/sdb
n       #create a new partition
p       #partition type: primary (the default; just press Enter)
1       #partition number (the default; just press Enter)
(Enter) #first sector, keep the default
(Enter) #last sector, keep the default to use the whole disk
t       #change the partition type
8e      #8e is the type code for Linux LVM
w       #write the changes

Create the PV, VG and LV

#create the PV
pvcreate /dev/sdb1
#create the VG
vgcreate vglabstor /dev/sdb1
#create the LV
lvcreate -l 100%FREE -n lvlabstor vglabstor

Format the LV and mount it

mkfs.xfs /dev/vglabstor/lvlabstor
mkdir /labstor
mount /dev/vglabstor/lvlabstor /labstor

Configure the iSCSI target. The finished state is shown below: two fileio LUNs are created and both are mapped to the compute nodes.

[Figure: targetcli configuration after setup]

The detailed configuration steps are as follows:

[root@storage ~]$ targetcli 
targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
#change to the fileio directory
/> cd /backstores/fileio/
#create a fileio backstore named disk01, 1G in size
/backstores/fileio> create disk01 /labstor/disk01.img 1G
#create a fileio backstore named disk02, 20G in size
/backstores/fileio> create disk02 /labstor/disk02.img 20G
#change to the iscsi directory
/backstores/fileio> cd /iscsi/
#create the iqn target; the tpg, acls and luns directories are created automatically
/iscsi> create iqn.2020-10.linuxplux.srv:storage.target00
#change to the luns directory
/iscsi> cd iqn.2020-10.linuxplux.srv:storage.target00/tpg1/luns/
#create the LUNs
/iscsi/iqn.20...t00/tpg1/luns> create /backstores/fileio/disk01
/iscsi/iqn.20...t00/tpg1/luns> create /backstores/fileio/disk02
#change to the acls directory
/iscsi/iqn.20...t00/tpg1/luns> cd ../acls/
#create the ACLs; the initiator IQNs come from /etc/iscsi/initiatorname.iscsi on the compute nodes (edit that file to match)
/iscsi/iqn.20...t00/tpg1/acls> create iqn.1994-05.com.redhat:kvmnode1
/iscsi/iqn.20...t00/tpg1/acls> create iqn.1994-05.com.redhat:kvmnode2
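
Finally, save the targetcli configuration and make sure the target service starts at boot (a small addition to the steps above, assuming the standard target.service on CentOS 7):

/iscsi/iqn.20...t00/tpg1/acls> cd /
/> saveconfig
/> exit
systemctl enable --now target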

4 Connect to the iSCSI Storage

Discover the iSCSI target from the compute nodes

[root@node1 ~]$ iscsiadm -m discovery -t sendtargets -p stor
10.0.1.235:3260,1 iqn.2020-10.linuxplux.srv:storage.target00

Log in to the discovered iSCSI node

[root@node1 ~]$ iscsiadm -m node --login
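
To make the session come back automatically after a reboot, the node startup mode can be set explicitly (on CentOS 7 node.startup normally already defaults to automatic, so this is only a precaution):

systemctl enable iscsi iscsid
iscsiadm -m node -T iqn.2020-10.linuxplux.srv:storage.target00 -p 10.0.1.235 \
  -o update -n node.startup -v automatic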

lsblk now shows two additional block devices, sdb and sdc, 1G and 20G in size, matching the two fileio files just created on the storage node

[root@node1 ~]$ lsblk 
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part 
  ├─centos-root 253:0    0   17G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sdb               8:16   0    1G  0 disk 
sdc               8:32   0   20G  0 disk 

5 Configure STONITH (Disk-Based)

Because the STONITH device must be reachable from every node, it is configured on the shared storage; the 1G disk sdb is used for it. First look up the WWN of sdb and reference the disk by WWN: the WWN never changes, whereas the /dev/sdX name may differ between compute nodes.

[root@node1 ~]$ ll /dev/disk/by-id/ | grep sdb
lrwxrwxrwx 1 root root  9 Oct 11 09:46 scsi-36001405a2706466a01541c79afd664c9 -> ../../sdb
#use the id below, the one that starts with wwn
lrwxrwxrwx 1 root root  9 Oct 11 09:46 wwn-0x6001405a2706466a01541c79afd664c9 -> ../../sdb

Create a STONITH device named scsi-shooter of type fence_scsi. Note that a SCSI fence device does not power the node off; it only revokes the fenced node's access to the shared disk, so meta provides=unfencing is required (each node must be unfenced, i.e. have its registration restored, before it may use the device).

pcs stonith create scsi-shooter fence_scsi \
pcmk_host_list="node1-sync node2-sync" devices="/dev/disk/by-id/wwn-0x6001405a2706466a01541c79afd664c9" \
meta provides=unfencing
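
An optional check (this assumes the sg3_utils package is installed): after unfencing, every node's registration key should be visible on the shared disk:

sg_persist -n -i -k -d /dev/disk/by-id/wwn-0x6001405a2706466a01541c79afd664c9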

After the configuration, check the pcs status; the STONITH device has started on node2

[root@node1 ~]$ pcs status 
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Oct 11 15:16:12 2020
Last change: Sat Oct 10 21:00:35 2020 by root via cibadmin on node1-sync

2 nodes configured
1 resources configured

Online: [ node1-sync node2-sync ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started node2-sync

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

6 Configure the DLM Distributed Lock Manager

Method 1: edit a copy of the CIB and push it back

pcs cluster cib dlm_cfg
pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence
#the resource is named dlm and its type is ocf:pacemaker:controld
#op stands for operation; the operation is monitor, run every 30s, and the node is fenced when the resource fails
pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
#clone-max=2: at most 2 clone copies in total; clone-node-max=1: at most 1 copy per node
pcs cluster cib-push dlm_cfg

Method 2: create the resource directly with a single command

pcs resource create dlm ocf:pacemaker:controld \
op monitor interval=30s on-fail=fence \
clone interleave=true ordered=true
#Explanation of the clone parameters (from the Pacemaker documentation)
#1. interleave
Changes the behavior of ordering constraints (between clones/masters) so that
copies of the first clone can start or stop as soon as the copy on the same node of
the second clone has started or stopped (rather than waiting until every instance of
the second clone has started or stopped). Allowed values: false, true. The default
value is false.
#2. ordered
Should the copies be started in series (instead of in parallel).
Allowed values: false, true. The default value is false.

After the configuration, check the pcs status; dlm has started on both nodes

[root@node1 ~]$ pcs status 
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Oct 11 16:17:47 2020
Last change: Sat Oct 10 21:00:35 2020 by root via cibadmin on node1-sync

2 nodes configured
3 resources configured

Online: [ node1-sync node2-sync ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started node2-sync
 Clone Set: dlm-clone [dlm]
     Started: [ node1-sync node2-sync ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

7 Configure CLVM (Clustered LVM)

CLVM is the clustered extension of LVM: it lets the cluster hosts manage shared storage with the usual LVM tools. clvmd is the CLVM daemon and runs as a resource under Pacemaker; the clvmd instances on the nodes keep the LVM metadata of the shared storage in sync.


[Figure: CLVM architecture]

7.1 Change the LVM Locking Type

Change the LVM locking type from the default local, file-based locking to cluster-based locking. The nodes must be rebooted after the change.

cat /etc/lvm/lvm.conf
...
        # Configuration option global/locking_type.
        # Type of locking to use.
        # 
        # Accepted values:
        #   0
        #     Turns off locking. Warning: this risks metadata corruption if
        #     commands run concurrently.
        #   1
        #     LVM uses local file-based locking, the standard mode.
        #   2
        #     LVM uses the external shared library locking_library.
        #   3
        #     LVM uses built-in clustered locking with clvmd.
        #     This is incompatible with lvmetad. If use_lvmetad is enabled,
        #     LVM prints a warning and disables lvmetad use.
        #   4
        #     LVM uses read-only locking which forbids any operations that
        #     might change metadata.
        #   5
        #     Offers dummy locking for tools that do not need any locks.
        #     You should not need to set this directly; the tools will select
        #     when to use it instead of the configured locking_type.
        #     Do not use lvmetad or the kernel device-mapper driver with this
        #     locking type. It is used by the --readonly option that offers
        #     read-only access to Volume Group metadata that cannot be locked
        #     safely because it belongs to an inaccessible domain and might be
        #     in use, for example a virtual machine image or a disk that is
        #     shared by a clustered machine.
        # 
    locking_type = 1

Use the lvmconf configuration tool to make the change

lvmconf --enable-cluster
reboot
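
A quick check that the change took effect (lvmconf --enable-cluster sets the value to 3, built-in clustered locking with clvmd):

grep '^[[:space:]]*locking_type' /etc/lvm/lvm.conf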

7.2 Add the CLVM Resource to the Cluster

clvmd must run on every cluster node, so it is added as a clone resource

pcs resource create clvmd ocf:heartbeat:clvm \
op monitor interval=30s on-fail=fence \
clone interleave=true ordered=true
#the resource is named clvmd and its type is ocf:heartbeat:clvm; see the dlm resource above for the meaning of the parameters

After the configuration, check the pcs status; clvmd has started on both nodes

[root@node1 ~]$ pcs status 
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Oct 11 16:42:10 2020
Last change: Sat Oct 10 21:00:35 2020 by root via cibadmin on node1-sync

2 nodes configured
5 resources configured

Online: [ node1-sync node2-sync ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started node2-sync
 Clone Set: dlm-clone [dlm]
     Started: [ node1-sync node2-sync ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1-sync node2-sync ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

7.3 Add Ordering and Colocation Constraints

clvmd must start after dlm, and the two must run on the same node, so two constraints are added

pcs constraint order start dlm-clone then start clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone

7.4 Create the Clustered LV

Partition the shared disk. This can be done on any node; clvmd synchronizes the information between the nodes automatically

fdisk /dev/sdc
n       #create a new partition
p       #partition type: primary (the default; just press Enter)
1       #partition number (the default; just press Enter)
(Enter) #first sector, keep the default
(Enter) #last sector, keep the default to use the whole disk
t       #change the partition type
8e      #8e is the type code for Linux LVM
w       #write the changes

On the other nodes it is advisable to re-read the partition table manually

partprobe ; multipath -r 

Create the PV, VG and LV

#create the PV
pvcreate /dev/sdc1
#create the VG
vgcreate vmvg0 /dev/sdc1
#create the LV
lvcreate -l 100%FREE -n vmlv0 vmvg0

A VG created under CLVM carries an additional clustered attribute

[Figure: attributes of the clustered VG]
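
This can also be checked on any node from the command line (in the Attr column the sixth character is 'c' for a clustered VG):

vgs -o vg_name,vg_attr vmvg0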

8 Configure the GFS2 File System

Common cluster file systems include GFS2, OCFS (Oracle), VMFS (VMware) and VIMS (Huawei).

8.1 Format the LV as GFS2

Format the LV just created with the GFS2 file system

mkfs.gfs2 -p lock_dlm -j 2 -t cluster1:labkvm1 /dev/mapper/vmvg0-vmlv0
#-p protocol: the locking protocol
#-j journals: the number of file system journals, ideally one per cluster node that will mount the file system
#-t clustername:lockspace: clustername must match the cluster name, and lockspace must be a name that uniquely identifies this file system within the cluster

8.2 Add the GFS2 File System Resource to the Cluster

The GFS2 file system must be mounted on every cluster node, so it is added as a clone resource; the mount point /vm has to be created on each node beforehand

pcs resource create VMFS ocf:heartbeat:Filesystem \
device="/dev/vmvg0/vmlv0" directory="/vm" fstype="gfs2" clone
#the resource is named VMFS and its type is ocf:heartbeat:Filesystem

After the configuration, check the pcs status; the file system has started on both nodes

[root@node1 ~]$  pcs status 
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Oct 11 17:29:20 2020
Last change: Sat Oct 10 21:00:35 2020 by root via cibadmin on node1-sync

2 nodes configured
7 resources configured

Online: [ node1-sync node2-sync ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started node2-sync
 Clone Set: dlm-clone [dlm]
     Started: [ node1-sync node2-sync ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1-sync node2-sync ]
 Clone Set: VMFS-clone [VMFS]
     Started: [ node1-sync node2-sync ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

df -T shows that the LV is mounted on /vm and the file system type is gfs2


[Figure: the LV is mounted automatically]

8.3 Add Ordering and Colocation Constraints

GFS2 must start after clvmd, and the two must run on the same node, so two constraints are added

pcs constraint order start clvmd-clone then start VMFS-clone
pcs constraint colocation add VMFS-clone with clvmd-clone

After that, test that /vm can be read and written and that the data stays in sync between the nodes; a trivial example follows.
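
For instance (the file name is arbitrary):

#on node1
echo "hello from node1" > /vm/testfile
#on node2 the file and its content should be visible immediately
cat /vm/testfile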

9 Configure the Virtual Machine

The steps for creating the VM disk file, installing the VM with virt-install, testing migration, and copying the VM configuration file and disk file to the shared storage are omitted here; a rough sketch is given below.
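
A rough sketch of those omitted steps (the VM name centos6 and the config path match the resource definition below; the disk size, memory and ISO path are assumptions):

#create the disk image on the shared GFS2 file system and install the guest
qemu-img create -f qcow2 /vm/centos6.qcow2 10G
virt-install --name centos6 --memory 1024 --vcpus 1 \
  --disk /vm/centos6.qcow2 --cdrom /vm/iso/CentOS-6.10.iso \
  --os-variant rhel6 --graphics vnc,listen=0.0.0.0
#dump the domain XML to the shared storage so every node can read it,
#then remove the libvirt definition (pacemaker will manage the domain from the XML file)
mkdir -p /vm/qemu_config
virsh dumpxml centos6 > /vm/qemu_config/centos6.xml
virsh undefine centos6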

9.1 Add the Virtual Machine Resource to the Cluster

pcs resource create centos6 ocf:heartbeat:VirtualDomain \
hypervisor="qemu:///system" \
config="/vm/qemu_config/centos6.xml" \
migration_transport=ssh \
meta allow-migration="true" priority="100" \
op start timeout="120s" \
op stop timeout="120s" \
op monitor timeout="30" interval="10" \
op migrate_from interval="0" timeout="120s" \
op migrate_to interval="0" timeout="120s" 

After the configuration, check the pcs status; the VM has started on node1-sync. Pacemaker spreads resources across the nodes, and because scsi-shooter is running on node2, the VM is placed on node1

[root@node1 ~]$  pcs status 
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Oct 11 17:47:48 2020
Last change: Sat Oct 10 21:00:35 2020 by root via cibadmin on node1-sync

2 nodes configured
8 resources configured

Online: [ node1-sync node2-sync ]

Full list of resources:

 centos6        (ocf::heartbeat:VirtualDomain): Started node1-sync
 scsi-shooter   (stonith:fence_scsi):   Started node2-sync
 Clone Set: dlm-clone [dlm]
     Started: [ node1-sync node2-sync ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1-sync node2-sync ]
 Clone Set: VMFS-clone [VMFS]
     Started: [ node1-sync node2-sync ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

9.2 Add Ordering and Colocation Constraints

The VM must start after the GFS2 file system, and the VM and the file system must be on the same node, so two constraints are added

pcs constraint order start VMFS-clone then start centos6
pcs constraint colocation add centos6 with VMFS-clone
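
The complete set of ordering and colocation constraints can be reviewed at any time with:

pcs constraint show --full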

9.3 Migration Tests

With the KVM version shipped with CentOS, live migration works when driven directly through libvirt, but migration triggered through pcs is not live: the VM is shut down first and then started on the target node.

#put a node into standby
pcs cluster standby node1-sync
#manual move without a target node: a constraint against the source node is created automatically
pcs resource move centos6
#manual move with a target node named: no ban is left on the source node;
#even if a disabled constraint already exists for the target node, the manual move enables it and the VM starts normally
pcs resource move centos6 node1-sync
#stop the cluster service on one node
pcs cluster stop node2
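
Note that pcs resource move works by adding a location constraint; once the test is finished the constraint should be cleared, otherwise the VM stays pinned to that node:

pcs resource clear centos6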
