
Cloud Engineering Log

rook-ceph troubleshooting: OSDs are not being created (RKE)


What is a Ceph OSD?

Object Storage Daemon: the component that actually stores the data. Without OSD Pods, PVCs backed by Rook-Ceph never get Bound.

https://rook.io/docs/rook/v1.9/ceph-common-issues.html#pvcs-stay-in-pending-state
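A quick way to confirm the symptom from the Kubernetes side (a minimal sketch; app=rook-ceph-osd is the label Rook applies to its OSD pods):

# PVCs waiting on Ceph-backed storage stay Pending
kubectl get pvc -A | grep Pending

# and there should be OSD pods per node; here the list comes back empty
kubectl -n rook-ceph get pods -l app=rook-ceph-osd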

 

Reference note 1: the issue in the official docs

https://rook.io/docs/rook/v1.9/ceph-common-issues.html#osd-pods-are-not-created-on-my-devices

 


Checking the symptoms

Related symptom 1: There are no OSD Pods in the cluster (applies here)

kubectl get pods -n rook-ceph | grep osd
rook-ceph-osd-prepare-master1-kcxr5                 0/1     Completed   0          74m
rook-ceph-osd-prepare-worker1-ntmgq                 0/1     Completed   0          74m
rook-ceph-osd-prepare-worker2-28wcs                 0/1     Completed   0          74m
rook-ceph-osd-prepare-worker3-vvzk9                 0/1     Completed   0          74m

: the osd-prepare jobs ran on every node, but there are no OSD Pods

https://rook.io/docs/rook/v1.9/ceph-advanced-configuration.html#osd-information

The OSD information described at the link above does not show up either.
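For reference, that OSD information can be pulled through the toolbox, assuming the rook-ceph-tools deployment from the Rook examples is installed (a minimal sketch):

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
# with no OSDs, 'ceph status' shows 0 osds and the cluster stays unhealthy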

 

[Collapsed > checking the prepare-worker logs]


Summary: to find OSD candidates, it went through every available filesystem on all disks; vda1 would be suitable, but

cephosd: &{Name:vda1 Parent:vda HasChildren:false DevLinks:/dev/disk/by-path/virtio-pci-0000:00:04.0-part1 /dev/disk/by-partuuid/aa956a8d-7296-4cbe-bdb3-09f0e2953de5 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part1 /dev/disk/by-path/pci-0000:00:04.0-part1 /dev/disk/by-uuid/096f1af6-7b9b-47a5-b367-7348cebbfed9 /dev/disk/by-label/cloudimg-rootfs Size:107257773568 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:ext4 Mountpoint:rootfs Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda1 KernelName:vda1 Encrypted:false}
cephosd: skipping device "vda1" with mountpoint "rootfs"
cephosd: configuring osd devices: {"Entries":{}}
2022-05-24 01:44:03.050289 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-05-24 01:44:03.050540 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-05-24 01:44:03.586268 D | cephosd: {}
2022-05-24 01:44:03.586364 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-05-24 01:44:03.586410 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2022-05-24 01:44:04.586497 D | cephosd: {}
2022-05-24 01:44:04.586548 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-05-24 01:44:04.586577 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "master1"

it wraps up like this. At that point I had no idea why.

 

<Full log below>

kubectl logs -n rook-ceph rook-ceph-osd-prepare-master1-kcxr5
2022-05-24 01:44:01.961520 I | rookcmd: starting Rook v1.9.3 with arguments '/rook/rook ceph osd provision'
2022-05-24 01:44:01.961651 I | rookcmd: flag values: --cluster-id=c5d93339-e296-433c-bcec-34f71dd1bb58, --cluster-name=rook-ceph, --data-device-filter=all, --data-device-path-filter=, --data-devices=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-level=DEBUG, --metadata-device=, --node-name=master1, --operator-image=, --osd-crush-device-class=, --osd-crush-initial-weight=, --osd-database-size=0, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2022-05-24 01:44:01.961656 I | op-mon: parsing mon endpoints: a=10.43.37.53:6789,b=10.43.26.101:6789,c=10.43.128.237:6789
2022-05-24 01:44:01.983899 I | op-osd: CRUSH location=root=default host=master1
2022-05-24 01:44:01.983920 I | cephcmd: crush location of osd: root=default host=master1
2022-05-24 01:44:01.983991 D | exec: Running command: dmsetup version
2022-05-24 01:44:01.988821 I | cephosd: Library version:   1.02.181-RHEL8 (2021-10-20)
Driver version:    4.41.0
2022-05-24 01:44:02.060988 D | cephclient: No ceph configuration override to merge as "rook-config-override" configmap is empty
2022-05-24 01:44:02.061037 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-05-24 01:44:02.061212 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-05-24 01:44:02.061416 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid                = 3a77df1f-b514-4372-a9fa-ef479fb2dc3f
mon initial members = a b c
mon host            = [v2:10.43.37.53:3300,v1:10.43.37.53:6789],[v2:10.43.26.101:3300,v1:10.43.26.101:6789],[v2:10.43.128.237:3300,v1:10.43.128.237:6789]

[client.admin]
keyring = /var/lib/rook/rook-ceph/client.admin.keyring

2022-05-24 01:44:02.061425 I | cephosd: discovering hardware
2022-05-24 01:44:02.061431 D | exec: Running command: lsblk --all --noheadings --list --output KNAME
2022-05-24 01:44:02.071676 D | exec: Running command: lsblk /dev/loop0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.086591 D | sys: lsblk output: "SIZE=\"73728000\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop0\" KNAME=\"/dev/loop0\" MOUNTPOINT=\"/rootfs/snap/lxd/21029\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.086805 W | inventory: skipping device "loop0". unsupported diskType loop
2022-05-24 01:44:02.086831 D | exec: Running command: lsblk /dev/loop1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.097037 D | sys: lsblk output: "SIZE=\"58130432\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop1\" KNAME=\"/dev/loop1\" MOUNTPOINT=\"/rootfs/snap/core18/2128\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.097426 W | inventory: skipping device "loop1". unsupported diskType loop
2022-05-24 01:44:02.097683 D | exec: Running command: lsblk /dev/loop2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.105811 D | sys: lsblk output: "SIZE=\"46870528\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop2\" KNAME=\"/dev/loop2\" MOUNTPOINT=\"/rootfs/snap/snapd/15904\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.106004 W | inventory: skipping device "loop2". unsupported diskType loop
2022-05-24 01:44:02.106152 D | exec: Running command: lsblk /dev/loop3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.111286 D | sys: lsblk output: "SIZE=\"58232832\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop3\" KNAME=\"/dev/loop3\" MOUNTPOINT=\"/rootfs/snap/core18/2409\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.111451 W | inventory: skipping device "loop3". unsupported diskType loop
2022-05-24 01:44:02.111678 D | exec: Running command: lsblk /dev/loop4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.116770 D | sys: lsblk output: "SIZE=\"46845952\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop4\" KNAME=\"/dev/loop4\" MOUNTPOINT=\"/rootfs/snap/snapd/15534\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.116813 W | inventory: skipping device "loop4". unsupported diskType loop
2022-05-24 01:44:02.116829 D | exec: Running command: lsblk /dev/loop5 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.127461 D | sys: lsblk output: "SIZE=\"64909312\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop5\" KNAME=\"/dev/loop5\" MOUNTPOINT=\"/rootfs/snap/core20/1434\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.127524 W | inventory: skipping device "loop5". unsupported diskType loop
2022-05-24 01:44:02.127557 D | exec: Running command: lsblk /dev/loop6 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.140894 D | sys: lsblk output: "SIZE=\"71106560\" ROTA=\"1\" RO=\"1\" TYPE=\"loop\" PKNAME=\"\" NAME=\"/dev/loop6\" KNAME=\"/dev/loop6\" MOUNTPOINT=\"/rootfs/snap/lxd/22753\" FSTYPE=\"squashfs\""
2022-05-24 01:44:02.140936 W | inventory: skipping device "loop6". unsupported diskType loop
2022-05-24 01:44:02.140982 D | exec: Running command: lsblk /dev/loop7 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.143883 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.143920 W | inventory: skipping device "loop7". exit status 32
2022-05-24 01:44:02.143929 D | exec: Running command: lsblk /dev/nbd0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.148275 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.148299 W | inventory: skipping device "nbd0". exit status 32
2022-05-24 01:44:02.148309 D | exec: Running command: lsblk /dev/nbd1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.152009 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.152037 W | inventory: skipping device "nbd1". exit status 32
2022-05-24 01:44:02.152050 D | exec: Running command: lsblk /dev/nbd2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.160379 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.160421 W | inventory: skipping device "nbd2". exit status 32
2022-05-24 01:44:02.160436 D | exec: Running command: lsblk /dev/nbd3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.163403 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.163434 W | inventory: skipping device "nbd3". exit status 32
2022-05-24 01:44:02.163445 D | exec: Running command: lsblk /dev/nbd4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.168019 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.168044 W | inventory: skipping device "nbd4". exit status 32
2022-05-24 01:44:02.168100 D | exec: Running command: lsblk /dev/nbd5 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.174901 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.175078 W | inventory: skipping device "nbd5". exit status 32
2022-05-24 01:44:02.175267 D | exec: Running command: lsblk /dev/nbd6 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.178552 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.178707 W | inventory: skipping device "nbd6". exit status 32
2022-05-24 01:44:02.178901 D | exec: Running command: lsblk /dev/nbd7 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.182077 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.182218 W | inventory: skipping device "nbd7". exit status 32
2022-05-24 01:44:02.182321 D | exec: Running command: lsblk /dev/vda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.187227 D | sys: lsblk output: "SIZE=\"107374182400\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/vda\" KNAME=\"/dev/vda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-05-24 01:44:02.187453 D | exec: Running command: sgdisk --print /dev/vda
2022-05-24 01:44:02.197383 D | exec: Running command: udevadm info --query=property /dev/vda
2022-05-24 01:44:02.205496 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:00:04.0 /dev/disk/by-path/virtio-pci-0000:00:04.0 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9\nDEVNAME=/dev/vda\nDEVPATH=/devices/pci0000:00/0000:00:04.0/virtio1/block/vda\nDEVTYPE=disk\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=6c680baf-f1cc-41bf-af27-6859ac6944c2\nID_PATH=pci-0000:00:04.0\nID_PATH_TAG=pci-0000_00_04_0\nID_SERIAL=7dba1fcb-7396-4b86-9\nMAJOR=252\nMINOR=0\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=1100281"
2022-05-24 01:44:02.205552 D | exec: Running command: lsblk --noheadings --path --list --output NAME /dev/vda
2022-05-24 01:44:02.211064 I | inventory: skipping device "vda" because it has child, considering the child instead.
2022-05-24 01:44:02.211210 D | exec: Running command: lsblk /dev/vda1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.219322 D | sys: lsblk output: "SIZE=\"107257773568\" ROTA=\"1\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/vda\" NAME=\"/dev/vda1\" KNAME=\"/dev/vda1\" MOUNTPOINT=\"/rootfs\" FSTYPE=\"ext4\""
2022-05-24 01:44:02.219524 D | exec: Running command: udevadm info --query=property /dev/vda1
2022-05-24 01:44:02.229000 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/virtio-pci-0000:00:04.0-part1 /dev/disk/by-partuuid/aa956a8d-7296-4cbe-bdb3-09f0e2953de5 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part1 /dev/disk/by-path/pci-0000:00:04.0-part1 /dev/disk/by-uuid/096f1af6-7b9b-47a5-b367-7348cebbfed9 /dev/disk/by-label/cloudimg-rootfs\nDEVNAME=/dev/vda1\nDEVPATH=/devices/pci0000:00/0000:00:04.0/virtio1/block/vda/vda1\nDEVTYPE=partition\nID_FS_LABEL=cloudimg-rootfs\nID_FS_LABEL_ENC=cloudimg-rootfs\nID_FS_TYPE=ext4\nID_FS_USAGE=filesystem\nID_FS_UUID=096f1af6-7b9b-47a5-b367-7348cebbfed9\nID_FS_UUID_ENC=096f1af6-7b9b-47a5-b367-7348cebbfed9\nID_FS_VERSION=1.0\nID_PART_ENTRY_DISK=252:0\nID_PART_ENTRY_NUMBER=1\nID_PART_ENTRY_OFFSET=227328\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=209487839\nID_PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4\nID_PART_ENTRY_UUID=aa956a8d-7296-4cbe-bdb3-09f0e2953de5\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=6c680baf-f1cc-41bf-af27-6859ac6944c2\nID_PATH=pci-0000:00:04.0\nID_PATH_TAG=pci-0000_00_04_0\nID_SCSI=1\nID_SERIAL=7dba1fcb-7396-4b86-9\nMAJOR=252\nMINOR=1\nPARTN=1\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=1141094"
2022-05-24 01:44:02.229159 D | exec: Running command: lsblk /dev/vda14 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.235940 D | sys: lsblk output: "SIZE=\"4194304\" ROTA=\"1\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/vda\" NAME=\"/dev/vda14\" KNAME=\"/dev/vda14\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-05-24 01:44:02.236086 D | exec: Running command: udevadm info --query=property /dev/vda14
2022-05-24 01:44:02.243273 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:00:04.0-part14 /dev/disk/by-path/virtio-pci-0000:00:04.0-part14 /dev/disk/by-partuuid/27c0da2b-1a54-454d-a027-5ffcfa7ee742 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part14\nDEVNAME=/dev/vda14\nDEVPATH=/devices/pci0000:00/0000:00:04.0/virtio1/block/vda/vda14\nDEVTYPE=partition\nID_PART_ENTRY_DISK=252:0\nID_PART_ENTRY_NUMBER=14\nID_PART_ENTRY_OFFSET=2048\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=8192\nID_PART_ENTRY_TYPE=21686148-6449-6e6f-744e-656564454649\nID_PART_ENTRY_UUID=27c0da2b-1a54-454d-a027-5ffcfa7ee742\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=6c680baf-f1cc-41bf-af27-6859ac6944c2\nID_PATH=pci-0000:00:04.0\nID_PATH_TAG=pci-0000_00_04_0\nID_SCSI=1\nID_SERIAL=7dba1fcb-7396-4b86-9\nMAJOR=252\nMINOR=14\nPARTN=14\nSUBSYSTEM=block\nTAGS=:systemd:\nUDISKS_IGNORE=1\nUSEC_INITIALIZED=1151159"
2022-05-24 01:44:02.243492 D | exec: Running command: lsblk /dev/vda15 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.252414 D | sys: lsblk output: "SIZE=\"111149056\" ROTA=\"1\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/vda\" NAME=\"/dev/vda15\" KNAME=\"/dev/vda15\" MOUNTPOINT=\"/rootfs/boot/efi\" FSTYPE=\"vfat\""
2022-05-24 01:44:02.252532 D | exec: Running command: udevadm info --query=property /dev/vda15
2022-05-24 01:44:02.262361 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-partuuid/0e1a92f9-b9b4-410a-bc98-ac60295b8f0b /dev/disk/by-label/UEFI /dev/disk/by-path/virtio-pci-0000:00:04.0-part15 /dev/disk/by-uuid/FE80-7911 /dev/disk/by-path/pci-0000:00:04.0-part15 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part15\nDEVNAME=/dev/vda15\nDEVPATH=/devices/pci0000:00/0000:00:04.0/virtio1/block/vda/vda15\nDEVTYPE=partition\nID_FS_LABEL=UEFI\nID_FS_LABEL_ENC=UEFI\nID_FS_TYPE=vfat\nID_FS_USAGE=filesystem\nID_FS_UUID=FE80-7911\nID_FS_UUID_ENC=FE80-7911\nID_FS_VERSION=FAT32\nID_PART_ENTRY_DISK=252:0\nID_PART_ENTRY_NUMBER=15\nID_PART_ENTRY_OFFSET=10240\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=217088\nID_PART_ENTRY_TYPE=c12a7328-f81f-11d2-ba4b-00a0c93ec93b\nID_PART_ENTRY_UUID=0e1a92f9-b9b4-410a-bc98-ac60295b8f0b\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=6c680baf-f1cc-41bf-af27-6859ac6944c2\nID_PATH=pci-0000:00:04.0\nID_PATH_TAG=pci-0000_00_04_0\nID_SCSI=1\nID_SERIAL=7dba1fcb-7396-4b86-9\nMAJOR=252\nMINOR=15\nPARTN=15\nSUBSYSTEM=block\nTAGS=:systemd:\nUDISKS_IGNORE=1\nUSEC_INITIALIZED=1142872"
2022-05-24 01:44:02.262473 D | exec: Running command: lsblk /dev/nbd8 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.265366 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.265400 W | inventory: skipping device "nbd8". exit status 32
2022-05-24 01:44:02.265412 D | exec: Running command: lsblk /dev/nbd9 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.267850 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.267893 W | inventory: skipping device "nbd9". exit status 32
2022-05-24 01:44:02.267908 D | exec: Running command: lsblk /dev/nbd10 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.270318 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.270364 W | inventory: skipping device "nbd10". exit status 32
2022-05-24 01:44:02.270378 D | exec: Running command: lsblk /dev/nbd11 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.274057 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.274103 W | inventory: skipping device "nbd11". exit status 32
2022-05-24 01:44:02.274117 D | exec: Running command: lsblk /dev/nbd12 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.279635 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.279669 W | inventory: skipping device "nbd12". exit status 32
2022-05-24 01:44:02.279683 D | exec: Running command: lsblk /dev/nbd13 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.282219 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.282244 W | inventory: skipping device "nbd13". exit status 32
2022-05-24 01:44:02.282252 D | exec: Running command: lsblk /dev/nbd14 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.285984 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.286012 W | inventory: skipping device "nbd14". exit status 32
2022-05-24 01:44:02.286021 D | exec: Running command: lsblk /dev/nbd15 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.289889 E | sys: failed to execute lsblk. output: .
2022-05-24 01:44:02.289906 W | inventory: skipping device "nbd15". exit status 32
2022-05-24 01:44:02.289910 D | inventory: discovered disks are:
2022-05-24 01:44:02.290025 D | inventory: &{Name:vda1 Parent:vda HasChildren:false DevLinks:/dev/disk/by-path/virtio-pci-0000:00:04.0-part1 /dev/disk/by-partuuid/aa956a8d-7296-4cbe-bdb3-09f0e2953de5 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part1 /dev/disk/by-path/pci-0000:00:04.0-part1 /dev/disk/by-uuid/096f1af6-7b9b-47a5-b367-7348cebbfed9 /dev/disk/by-label/cloudimg-rootfs Size:107257773568 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:ext4 Mountpoint:rootfs Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda1 KernelName:vda1 Encrypted:false}
2022-05-24 01:44:02.290068 D | inventory: &{Name:vda14 Parent:vda HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:00:04.0-part14 /dev/disk/by-path/virtio-pci-0000:00:04.0-part14 /dev/disk/by-partuuid/27c0da2b-1a54-454d-a027-5ffcfa7ee742 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part14 Size:4194304 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda14 KernelName:vda14 Encrypted:false}
2022-05-24 01:44:02.290093 D | inventory: &{Name:vda15 Parent:vda HasChildren:false DevLinks:/dev/disk/by-partuuid/0e1a92f9-b9b4-410a-bc98-ac60295b8f0b /dev/disk/by-label/UEFI /dev/disk/by-path/virtio-pci-0000:00:04.0-part15 /dev/disk/by-uuid/FE80-7911 /dev/disk/by-path/pci-0000:00:04.0-part15 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part15 Size:111149056 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:vfat Mountpoint:efi Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda15 KernelName:vda15 Encrypted:false}
2022-05-24 01:44:02.290100 I | cephosd: creating and starting the osds
2022-05-24 01:44:02.290115 D | cephosd: desiredDevices are [{Name:all OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:true IsDevicePathFilter:false}]
2022-05-24 01:44:02.290124 D | cephosd: context.Devices are:
2022-05-24 01:44:02.290134 D | cephosd: &{Name:vda1 Parent:vda HasChildren:false DevLinks:/dev/disk/by-path/virtio-pci-0000:00:04.0-part1 /dev/disk/by-partuuid/aa956a8d-7296-4cbe-bdb3-09f0e2953de5 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part1 /dev/disk/by-path/pci-0000:00:04.0-part1 /dev/disk/by-uuid/096f1af6-7b9b-47a5-b367-7348cebbfed9 /dev/disk/by-label/cloudimg-rootfs Size:107257773568 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:ext4 Mountpoint:rootfs Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda1 KernelName:vda1 Encrypted:false}
2022-05-24 01:44:02.290150 D | cephosd: &{Name:vda14 Parent:vda HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:00:04.0-part14 /dev/disk/by-path/virtio-pci-0000:00:04.0-part14 /dev/disk/by-partuuid/27c0da2b-1a54-454d-a027-5ffcfa7ee742 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part14 Size:4194304 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda14 KernelName:vda14 Encrypted:false}
2022-05-24 01:44:02.290159 D | cephosd: &{Name:vda15 Parent:vda HasChildren:false DevLinks:/dev/disk/by-partuuid/0e1a92f9-b9b4-410a-bc98-ac60295b8f0b /dev/disk/by-label/UEFI /dev/disk/by-path/virtio-pci-0000:00:04.0-part15 /dev/disk/by-uuid/FE80-7911 /dev/disk/by-path/pci-0000:00:04.0-part15 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part15 Size:111149056 UUID: Serial:7dba1fcb-7396-4b86-9 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:vfat Mountpoint:efi Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/vda15 KernelName:vda15 Encrypted:false}
2022-05-24 01:44:02.290165 I | cephosd: skipping device "vda1" with mountpoint "rootfs"
2022-05-24 01:44:02.290171 D | exec: Running command: udevadm info --query=property /dev/vda14
2022-05-24 01:44:02.299705 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:00:04.0-part14 /dev/disk/by-id/virtio-7dba1fcb-7396-4b86-9-part14 /dev/disk/by-path/virtio-pci-0000:00:04.0-part14 /dev/disk/by-partuuid/27c0da2b-1a54-454d-a027-5ffcfa7ee742\nDEVNAME=/dev/vda14\nDEVPATH=/devices/pci0000:00/0000:00:04.0/virtio1/block/vda/vda14\nDEVTYPE=partition\nID_PART_ENTRY_DISK=252:0\nID_PART_ENTRY_NUMBER=14\nID_PART_ENTRY_OFFSET=2048\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=8192\nID_PART_ENTRY_TYPE=21686148-6449-6e6f-744e-656564454649\nID_PART_ENTRY_UUID=27c0da2b-1a54-454d-a027-5ffcfa7ee742\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=6c680baf-f1cc-41bf-af27-6859ac6944c2\nID_PATH=pci-0000:00:04.0\nID_PATH_TAG=pci-0000_00_04_0\nID_SCSI=1\nID_SERIAL=7dba1fcb-7396-4b86-9\nMAJOR=252\nMINOR=14\nPARTN=14\nSUBSYSTEM=block\nTAGS=:systemd:\nUDISKS_IGNORE=1\nUSEC_INITIALIZED=1151159"
2022-05-24 01:44:02.299770 D | exec: Running command: lsblk /dev/vda14 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-05-24 01:44:02.306364 D | sys: lsblk output: "SIZE=\"4194304\" ROTA=\"1\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/vda\" NAME=\"/dev/vda14\" KNAME=\"/dev/vda14\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-05-24 01:44:02.306408 D | exec: Running command: ceph-volume inventory --format json /dev/vda14
2022-05-24 01:44:03.037172 I | cephosd: skipping device "vda14": ["Insufficient space (<5GB)"].
2022-05-24 01:44:03.037197 I | cephosd: skipping device "vda15" with mountpoint "efi"
2022-05-24 01:44:03.050228 I | cephosd: configuring osd devices: {"Entries":{}}
2022-05-24 01:44:03.050289 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-05-24 01:44:03.050540 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-05-24 01:44:03.586268 D | cephosd: {}
2022-05-24 01:44:03.586364 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-05-24 01:44:03.586410 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2022-05-24 01:44:04.586497 D | cephosd: {}
2022-05-24 01:44:04.586548 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-05-24 01:44:04.586577 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "master1"

 

Related symptom 2: The cluster CRD exists, but there are no OSD devices (applies here)

kubectl get crd -n rook-ceph | grep cluster
cephclusters.ceph.rook.io                             2022-05-24T01:28:53Z
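Beyond the CRD, the CephCluster CR itself is worth a look as well (assuming it keeps the default name rook-ceph from the example manifests):

kubectl -n rook-ceph get cephcluster
# the PHASE / MESSAGE / HEALTH columns hint at whether the operator is still waiting on OSDs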


Related symptom 3: Only one OSD Pod starts per node (does not apply ... there are no OSD Pods at all)

 

---

 

Investigation

1. Check the useAllDevices: true or deviceFilter: / devices: settings in the cluster CRD (no problem found here; a quick check is sketched right after the prepare-pod listing below)

2. Rook skips a device if it judges it unusable (e.g. the device already has partitions or a formatted filesystem)

  - so the rook-ceph-osd-prepare jobs need to be checked

kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
NAME                                  READY   STATUS      RESTARTS   AGE
rook-ceph-osd-prepare-master1-kcxr5   0/1     Completed   0          118m
rook-ceph-osd-prepare-worker1-ntmgq   0/1     Completed   0          117m
rook-ceph-osd-prepare-worker2-28wcs   0/1     Completed   0          117m
rook-ceph-osd-prepare-worker3-vvzk9   0/1     Completed   0          117m
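(Back to item 1 for a moment: a minimal way to check those storage settings, again assuming the default CR name rook-ceph:

kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.storage}'
# useAllNodes/useAllDevices, or any deviceFilter/devices entries, should show up here)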

The Jobs themselves ran fine (Completed), but their logs show

cephosd: skipping device "vda1" with mountpoint "rootfs"

and the matching example in the official Rook docs,

# A device will be skipped if Rook sees it has partitions or a filesystem
2019-05-30 19:02:57.353171 W | cephosd: skipping device sda that is in use
2019-05-30 19:02:57.452168 W | skipping device "sdb5": ["Used by ceph-disk"]

is this very case, so I read it as: vda1 was skipped because it is the partition mounted as rootfs.

(A similar case: https://github.com/rook/rook/issues/9470)

(Skipping devices that have a mounted filesystem like this was introduced via this issue: https://github.com/rook/rook/issues/8046)

(Summary of that issue: OSDs kept consuming the rootfs and RancherOS simply got wiped out --> changed so that filesystems like rootfs are skipped)
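This is easy to confirm on the node itself (run on master1; a minimal sketch, and /dev/vda is simply the root disk of my VM):

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/vda
# vda1 carries the ext4 root filesystem, so Rook refuses to touch it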

 

rootfs here is the storage for K3OS under Rancher.

Delete it as the "solution" and the Rancher cluster itself is gone....

 

---

Solution

1. If the CRD settings are the problem, update the CR

2. Clean up the filesystem on the device or partition

-- Cleanup guide (https://rook.io/docs/rook/v1.9/ceph-teardown.html#zapping-devices) --

DISK="/dev/sdX"

# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
sgdisk --zap-all $DISK

# Wipe a large portion of the beginning of the disk to remove more LVM metadata that may be present
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# SSDs may be better cleaned with blkdiscard instead of dd
blkdiscard $DISK

# Inform the OS of partition table changes
partprobe $DISK
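On top of zapping the device, the same teardown guide points out that the Rook data directory on each host also has to be removed before re-provisioning (adjust the path if dataDirHostPath was customized in the CephCluster CR):

# run on every node that hosted a mon/OSD; /var/lib/rook is the default dataDirHostPath
rm -rf /var/lib/rook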

 

 

---

 

Reference note 2: a thread in the official community (Slack)

[Original poster]
Hi. I need help setting up a 1 Master/ 1 Worker Virtualbox based Ceph Cluster. I am using Rook 1.9.3.
I am through the process of installing rook and am trying to get the ceph cluster configured.
I have added a single 32 GB disk to my worker node (seen as /dev/sdb in the following):...
---
Testing a Rook-Ceph cluster (v1.9.3, latest at the time) on a 1:1 master/worker VirtualBox setup, with a single 32 GB disk attached to the worker node.

That is how the thread starts; I found it by searching for "rootfs".

 

The situation was roughly the same as mine: osd prepare configured nothing, every device was skipped, the Job completed, and no OSD Pods existed.

 

2022-05-17 18:23:31.604327 I | cephosd: creating and starting the osds
2022-05-17 18:23:31.605110 D | cephosd: desiredDevices are [{Name:all OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:true IsDevicePathFilter:false}]
2022-05-17 18:23:31.605231 D | cephosd: context.Devices are:
2022-05-17 18:23:31.605353 D | cephosd: &{Name:sda1 Parent:sda HasChildren:false DevLinks:/dev/disk/by-id/ata-VBOX_HARDDISK_VB0e3425cc-00258e10-part1 /dev/disk/by-partuuid/1c30fbba-01 /dev/disk/by-path/pci-0000:00:0d.0-ata-1-part1 /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0-part1 /dev/disk/by-uuid/1b72d5a0-89b2-4362-b1d3-0457b6828f3b Size:1073741824 UUID: Serial:VBOX_HARDDISK_VB0e3425cc-00258e10 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:xfs Mountpoint:boot Vendor: Model:VBOX_HARDDISK WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sda1 KernelName:sda1 Encrypted:false}
2022-05-17 18:23:31.605630 D | cephosd: &{Name:sda2 Parent:sda HasChildren:false DevLinks:/dev/disk/by-partuuid/1c30fbba-02 /dev/disk/by-path/pci-0000:00:0d.0-ata-1-part2 /dev/disk/by-id/ata-VBOX_HARDDISK_VB0e3425cc-00258e10-part2 /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0-part2 /dev/disk/by-id/lvm-pv-uuid-ywxnsM-cGSH-2esN-YcQL-uoUM-4lJ4-Pb26fF Size:84824555520 UUID: Serial:VBOX_HARDDISK_VB0e3425cc-00258e10 Type:part Rotational:true Readonly:false Partitions:[] Filesystem:LVM2_member Mountpoint: Vendor: Model:VBOX_HARDDISK WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sda2 KernelName:sda2 Encrypted:false}
2022-05-17 18:23:31.605996 D | cephosd: &{Name:sdb Parent: HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:00:0d.0-ata-3 /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0 /dev/disk/by-id/ata-VBOX_HARDDISK_VB67874fc6-03c7724f Size:34359738368 UUID:a56792a1-5470-4965-a7fb-583386731a04 Serial:VBOX_HARDDISK_VB67874fc6-03c7724f Type:disk Rotational:true Readonly:false Partitions:[] Filesystem:ceph_bluestore Mountpoint: Vendor: Model:VBOX_HARDDISK WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sdb KernelName:sdb Encrypted:false}
2022-05-17 18:23:31.609340 D | cephosd: &{Name:zram0 Parent: HasChildren:false DevLinks: Size:4109369344 UUID:c92ab2e6-7e69-4b6b-b5f5-cf8d638c1ae0 Serial: Type:disk Rotational:false Readonly:false Partitions:[] Filesystem: Mountpoint:[SWAP] Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/zram0 KernelName:zram0 Encrypted:false}
2022-05-17 18:23:31.609700 D | cephosd: &{Name:dm-0 Parent: HasChildren:false DevLinks:/dev/disk/by-id/dm-name-fedora_fedora-root /dev/disk/by-id/dm-uuid-LVM-U9eafzGutFZU9iR7ecqILn03q0Up9Gf12kysXuwlvuV06Ox3OLja3k3KwOTODsTj /dev/disk/by-uuid/e3d6f1d3-5b23-4eff-9544-34f87cf8d659 /dev/mapper/fedora_fedora-root /dev/fedora_fedora/root Size:16106127360 UUID: Serial: Type:lvm Rotational:true Readonly:false Partitions:[] Filesystem:xfs Mountpoint:rootfs Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/mapper/fedora_fedora-root KernelName:dm-0 Encrypted:false}
2022-05-17 18:23:31.609928 I | cephosd: skipping device "sda1" with mountpoint "boot"
2022-05-17 18:23:31.610167 I | cephosd: skipping device "sda2" because it contains a filesystem "LVM2_member"
2022-05-17 18:23:31.610986 I | cephosd: skipping device "sdb" because it contains a filesystem "ceph_bluestore"
2022-05-17 18:23:31.611292 I | cephosd: skipping device "zram0" with mountpoint "[SWAP]"
2022-05-17 18:23:31.611509 I | cephosd: skipping 'dm' device "dm-0"
2022-05-17 18:23:31.626751 I | cephosd: configuring osd devices: {"Entries":{}}
2022-05-17 18:23:31.626975 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2022-05-17 18:23:31.627248 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-05-17 18:23:32.423781 D | cephosd: {}
2022-05-17 18:23:32.425272 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2022-05-17 18:23:32.425607 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2022-05-17 18:23:34.421705 D | cephosd: {
    "b4fc1de8-f088-43de-abdc-3e0d719ab16d": {
        "ceph_fsid": "8c04c18e-72ca-4cb7-ac70-d9313d1986eb",
        "device": "/dev/sdb",
        "osd_id": 0,
        "osd_uuid": "b4fc1de8-f088-43de-abdc-3e0d719ab16d",
        "type": "bluestore"
    }
}
2022-05-17 18:23:34.422008 I | cephosd: skipping osd.0: "b4fc1de8-f088-43de-abdc-3e0d719ab16d" belonging to a different ceph cluster "8c04c18e-72ca-4cb7-ac70-d9313d1986eb"
2022-05-17 18:23:34.426731 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-05-17 18:23:34.426926 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "node01"

 

The rootfs could be spotted in the following line:

2022-05-17 18:23:31.609700 D | cephosd: &{Name:dm-0 Parent: HasChildren:false DevLinks:/dev/disk/by-id/dm-name-fedora_fedora-root /dev/disk/by-id/dm-uuid-LVM-U9eafzGutFZU9iR7ecqILn03q0Up9Gf12kysXuwlvuV06Ox3OLja3k3KwOTODsTj /dev/disk/by-uuid/e3d6f1d3-5b23-4eff-9544-34f87cf8d659 /dev/mapper/fedora_fedora-root /dev/fedora_fedora/root Size:16106127360 UUID: Serial: Type:lvm Rotational:true Readonly:false Partitions:[] Filesystem:xfs Mountpoint:rootfs Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/mapper/fedora_fedora-root KernelName:dm-0 Encrypted:false}

 

Solution

The comments below are the replies of a rook-ceph open-source contributor in that thread.

travisn (6 days ago) : If attaching a new virtual disk is an option, that sounds good
travisn (6 days ago) : then you just don’t have to worry about cleaning the existing one

Meaning: just attach another disk. The poster says this resolved the issue in their case:

[root@master rook]# k -n rook-ceph get pods
NAME                                               READY   STATUS      RESTARTS   AGE
rook-ceph-operator-7f88c457b7-2wc2l                1/1     Running     0          2m23s
csi-rbdplugin-ltgkb                                3/3     Running     0          96s
csi-rbdplugin-nkdnv                                3/3     Running     0          96s
csi-rbdplugin-provisioner-847b498845-tbhlf         6/6     Running     0          96s
csi-cephfsplugin-mmmtn                             3/3     Running     0          95s
csi-cephfsplugin-provisioner-7577bb4d59-7r8x2      6/6     Running     0          95s
csi-cephfsplugin-gntxj                             3/3     Running     0          95s
csi-rbdplugin-provisioner-847b498845-rz975         6/6     Running     0          96s
csi-cephfsplugin-provisioner-7577bb4d59-dfmnd      6/6     Running     0          95s
rook-ceph-mon-a-6db7d77586-8jrvj                   1/1     Running     0          86s
rook-ceph-mgr-a-6b7f869c6-zqlml                    1/1     Running     0          60s
rook-ceph-osd-prepare-master-ljd5r                 0/1     Completed   0          38s
rook-ceph-osd-prepare-node01-rhjs8                 0/1     Completed   0          38s
rook-ceph-crashcollector-node01-6d9cf648f5-82rd9   1/1     Running     0          19s
rook-ceph-osd-1-795998c474-kzbdb                   0/1     Running     0          19s
rook-ceph-osd-0-b9ddcdfb8-7t76h                    0/1     Running     0          19s

 

 

---

 

Dive To Solution 1: a survey of how people install Rook-Ceph on Rancher!

 

- https://www.cloudops.com/blog/the-ultimate-rook-and-ceph-survival-guide/

 


- Install Rook/Ceph with Rancher deployed cluster (GitHub Gist): https://gist.github.com/vitobotta/45f62ca44bfa19196bc2e44c9ec42b8b

 


 

According to material from 2-3 years ago, to secure volumes when running Rook-Ceph on RKE,

you are told to use FlexVolume, a Kubernetes feature. (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md)

The problem is that this feature is not really used anymore (the markdown is dated 2021-05-19; it has been GA since Kubernetes 1.8), and the doc says that, if you still need it, it should be configured on the kubelet.

 

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

 


The setting in question is --volume-plugin-dir, listed near the bottom of that official reference.

 

To apply this on RKE, I opened the Cluster Config, following the approach described in https://jay-chamber.tistory.com/entry/Rancher%EC%97%90%EC%84%9C-Kubernetes%EC%9D%98-Feature-Gate%EB%A5%BC-%ED%99%9C%EC%84%B1%ED%99%94%ED%95%98%EB%8A%94-%EB%B0%A9%EB%B2%95 (a post on enabling Kubernetes feature gates in Rancher),

 

 

and edited it as follows:

 

kubelet:
  extra_args:
    feature-gates: TTLAfterFinished=true
    volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
  extra_binds:
    - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec

 

Then, when saving,

 

a message like this pops up and the change starts being applied.
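Whether the kubelet really picked up the flags can be double-checked on a node (a rough sketch; on RKE the kubelet runs as a Docker container named kubelet, so its arguments are visible from the host):

ps -ef | grep kubelet | grep -E 'volume-plugin-dir|feature-gates'
docker inspect kubelet | grep -E 'volume-plugin-dir|feature-gates'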

---> This approach did not work. (Exactly the same result as before)

 

---

 

Dive To Solution 2: a survey of how people install Rook-Ceph on K3s!

 

https://coldbrewlabs.ca/highly-available-dynamic-persistent-storage-with-rook-on-k3s-ba6c14e4324

 


In that article, the author weighed Longhorn and Rook as the storage manager for a k3s cluster (chosen, they say, because it relies on local node storage), and the prerequisites read:

 

Prerequisites
To follow along with the steps below, you will need access to an already deployed Kubernetes cluster. Each node in the cluster will need to have an empty, unused disk that will be claimed by Rook. For me, this was a three node RaspberryPi 4 K3s cluster, with each Pi having a Kingston USB 3.0 32GB attached to it to achieve that extra, unused disk requirement.

 

As stated there, to use Rook you need nodes with an unused disk.... In the end, it seems the remaining option is to carve out such a disk (partitioning) on the Rancher side.

 

Resolution

In the end, just like with Longhorn storage before, the issue was resolved by adding brand-new volumes to the cluster nodes, independent of whatever was already mounted on them.
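A rough sketch of what that looks like once the new disks are attached (the device name vdb is hypothetical; it depends on how the volume is presented to the VM):

# on each node: the new disk must be completely empty (no FSTYPE, no MOUNTPOINT)
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/vdb

# re-run the prepare jobs (restarting the operator is one common way) and watch the OSD pods come up
kubectl -n rook-ceph rollout restart deploy rook-ceph-operator
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -w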

 


If this post helped you, please leave a like.