CEPH operating experience

When there is more data than fits on one disk, it is time to think about RAID. As a child I often heard from my elders: "One day RAID will be a thing of the past, object storage will fill the world, and you don't even know what CEPH is," so the first thing I did in my independent life was build my own cluster. The purpose of the experiment was to get acquainted with the internals of ceph and to understand its scope of application. How justified is deploying ceph in medium-sized and small businesses? After several years of operation and a couple of irreversible data losses, I came to understand that not everything is so simple. The peculiarities of CEPH create barriers to its widespread adoption, and because of them the experiments have reached a dead end. Below is a description of all the steps taken, the result obtained and the conclusions drawn. If knowledgeable people share their experience and clarify some points, I will be grateful.

Note: commenters have pointed out serious errors in some of the assumptions, which call for a revision of the whole article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them, duplicating each piece (4 MB by default) a specified number N of times.

Consider the simplest case with two identical disks. You can build either RAID 1 or a cluster with N=2 from them; the result is the same. If there are three disks of different sizes, it is easy to build a cluster with N=2: some data will be on disks 1 and 2, some on disks 1 and 3, and some on 2 and 3, whereas RAID cannot do this (you can assemble such a RAID, but it would be a perversion). With even more disks it becomes possible to create RAID 5; CEPH has an analogue, erasure_code, which contradicts the developers' early concepts and is therefore not considered here. RAID 5 assumes a small number of drives, all in good condition. If one fails, the rest must hold out until the disk is replaced and the data is restored onto it. CEPH with N>=3 encourages the use of old disks: in particular, if you keep several good disks to hold one copy of the data and store the remaining two or three copies on a large number of old disks, the information will be safe. As long as the new disks are alive there are no problems, and if one of them breaks, the simultaneous failure of three disks with a service life of more than five years, preferably from different servers, is an extremely unlikely event.
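The capacity side of the two-disk and three-disk cases can be sketched with a little arithmetic. This is only an idealized estimate under my own assumptions, not something CEPH reports: with N=2 and both copies on different disks, the usable space cannot exceed half of the total, nor the space left on all disks other than the largest one. The function name `usable_n2` and the sizes are hypothetical, and real CRUSH placement will not fill disks this perfectly.

```shell
#!/bin/sh
# usable_n2 SIZE...  (hypothetical helper, sizes in any one unit, e.g. GB)
# Idealized usable capacity for N=2 with the two copies on different disks:
#   min(total / 2, total - largest)
usable_n2() {
    total=0
    largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    if [ "$half" -lt "$rest" ]; then
        echo "$half"
    else
        echo "$rest"
    fi
}

usable_n2 1000 1000         # two equal disks: the same capacity as RAID 1
usable_n2 1000 2000 2000    # three unequal disks, limited by total / 2
usable_n2 1000 2000 4000    # one oversized disk, limited by the other two
```

The last call shows why a single very large disk does not help much: its second copies have nowhere to go.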

There is a subtlety in how the copies are distributed. By default, the data is assumed to be split into a large number (~100 per disk) of placement groups (PGs), each of which is duplicated on some set of disks. Say K=6 and N=2: then if any two disks fail, data is virtually guaranteed to be lost, because by probability theory there will be at least one PG that lives on exactly those two disks. And the loss of one group makes all the data in the pool unavailable. If the disks are divided into three pairs and data is allowed to be stored only on disks within one pair, such a distribution is still resistant to the failure of any one disk, but if two disks fail the probability of data loss is not 100% but only 3/15, and even with three failed disks only 12/20. Hence, entropy in the data distribution does not contribute to fault tolerance. Note also that for a file server, free RAM significantly improves response speed. The more memory in each node, and the more memory across all nodes, the faster it will be. This is undoubtedly an advantage of a cluster over a single server and, even more so, over a hardware NAS, which has only a very small amount of memory built in.
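The 3/15 and 12/20 figures for the paired layout can be checked by brute force. The sketch below assumes the six disks are numbered 0 to 5 and paired as (0,1), (2,3), (4,5); the numbering and the function names are mine. It counts how many failure combinations include both disks of at least one pair.

```shell
#!/bin/sh
# Six disks 0..5 grouped into fixed mirror pairs (0,1), (2,3), (4,5).
# Two disks belong to the same pair exactly when index / 2 is equal.
same_pair() { [ $(( $1 / 2 )) -eq $(( $2 / 2 )) ]; }

# Two simultaneous failures: data is lost only if both disks form one pair.
count_two() {
    lost=0; total=0
    for a in 0 1 2 3 4 5; do
        for b in 0 1 2 3 4 5; do
            [ "$a" -lt "$b" ] || continue
            total=$((total + 1))
            if same_pair "$a" "$b"; then
                lost=$((lost + 1))
            fi
        done
    done
    echo "$lost/$total"
}

# Three simultaneous failures: data is lost if the triple contains a whole
# pair. With a < b < c only neighbours (a,b) or (b,c) can be a pair.
count_three() {
    lost=0; total=0
    for a in 0 1 2 3 4 5; do
        for b in 0 1 2 3 4 5; do
            [ "$a" -lt "$b" ] || continue
            for c in 0 1 2 3 4 5; do
                [ "$b" -lt "$c" ] || continue
                total=$((total + 1))
                if same_pair "$a" "$b" || same_pair "$b" "$c"; then
                    lost=$((lost + 1))
                fi
            done
        done
    done
    echo "$lost/$total"
}

echo "two failed disks:   $(count_two)"
echo "three failed disks: $(count_three)"
```

The counts come out as 3 of 15 pairs and 12 of 20 triples, matching the fractions in the text.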

It follows that CEPH is a good way to build a reliable storage system for tens of TB that can be scaled with minimal investment using outdated hardware (some costs will of course be required, but they are small compared with commercial storage systems).

Cluster setup

For the experiment, let us take a decommissioned computer: Intel DQ57TM + Intel Core i3 540 + 16 GB of RAM. We will organize four 4 TB disks into something like RAID2; after a successful test we will add a second node and the same number of disks.

Installing Linux. The distribution needs to be customizable and stable. Debian and Suse meet the requirements. Suse has a more flexible installer that lets you deselect any package; unfortunately, I could not work out which ones can be dropped without damaging the system. We install Debian via debootstrap buster. The min-base option installs a broken system that lacks drivers. The size difference compared with the full version is not large enough to bother about. Since the work is done on a physical machine, I want to take snapshots, as on virtual machines. This option is provided by either LVM or btrfs (or xfs, or zfs; the difference is not large). Snapshots are not LVM's strong point. We install on btrfs. And the bootloader goes in the MBR. There is no point cluttering the disk with a 50 MB FAT partition when you can push the bootloader into the 1 MB partition table area and allocate all the space to the system. It took 700 MB on disk. I do not remember how much a basic SUSE installation takes; I think about 1.1 or 1.4 GB.

Installing CEPH. We ignore version 12 in the Debian repository and connect directly to the upstream site for 15.2.3. We follow the instructions from the "Installing CEPH manually" section with the following caveats:

  • Before adding the repository, you must install gnupg wget ca-certificates
  • After adding the repository, but before installing the cluster, the instructions omit the package installation step: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • While installing, CEPH will for unknown reasons try to install lvm2. In principle that is no great loss, but the installation fails, so CEPH does not get installed either.

    This patch helps:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
    

Cluster overview

ceph-osd is responsible for storing data on disk. For each disk a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One of them contains information about the cluster, the disk number, and the cluster keys. This 1 KB of information is created once when the disk is added and has never been observed to change afterwards. The second partition has no file system and stores CEPH binary data. Automatic installation in earlier versions created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB; the service does not complain. I think xfs could be replaced with ext without any problems. This partition is mounted in /var/lib/…, where the service reads the information about the OSD and also finds a reference to the block device where the binary data is stored. In theory, you could place the auxiliary files directly in /var/lib/… and allocate the whole disk to data. When an OSD is created via ceph-deploy, a rule is created automatically to mount the partition in /var/lib/…, and the ceph user is also granted the rights to read the required block device. If you install manually, you must do this yourself; the documentation does not say so. It is also advisable to set the osd memory target parameter so that there is enough physical memory.
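For a manual installation, one way to give the ceph user access to the data partition persistently (instead of a chown that is lost on reboot) might be a udev rule. The sketch below only prints such a rule; the helper name `osd_udev_rule`, the pattern `sd?2` (taken from the commented-out chown in the listing further down) and the rules file name are my assumptions, and this is not necessarily what ceph-deploy itself generates.

```shell
#!/bin/sh
# osd_udev_rule PATTERN  (hypothetical helper)
# Print a udev rule that hands block devices whose kernel name matches
# PATTERN (a udev glob such as 'sd?2') over to user and group ceph.
osd_udev_rule() {
    printf 'SUBSYSTEM=="block", KERNEL=="%s", OWNER="ceph", GROUP="ceph", MODE="0660"\n' "$1"
}

# Show the rule for the second partition of every sdX disk.
osd_udev_rule 'sd?2'

# To apply it (as root; the file name is an assumption):
#   osd_udev_rule 'sd?2' > /etc/udev/rules.d/99-ceph-osd.rules
#   udevadm control --reload-rules && udevadm trigger
```

Matching on the kernel name is the simplest option but also the most fragile one, since disk letters can change between boots; matching on a partition UUID would be more robust.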

ceph-mds. At the low level, CEPH is object storage. Block storage comes down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created: one for metadata, the other for data. They are combined into a file system. At that moment some kind of record is created, so if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block; I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system requires a separate instance of the service. There is an "index" option that lets you create the semblance of several file systems in one; this was not tested either.
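The two-pools-plus-file-system step can be sketched as follows. The helper only prints the commands so they can be reviewed before running; its name, the pool naming scheme and the file system name `myfs` are my assumptions, while `ceph osd pool create` and `ceph fs new` are the standard CLI calls. A running ceph-mds instance is still needed before the file system becomes usable.

```shell
#!/bin/sh
# cephfs_create_cmds FSNAME PG_NUM  (hypothetical helper)
# Print the commands that create a metadata pool, a data pool and the
# file system that binds the two together.
cephfs_create_cmds() {
    fs=$1
    pg=$2
    echo "ceph osd pool create ${fs}_metadata $pg"
    echo "ceph osd pool create ${fs}_data $pg"
    echo "ceph fs new $fs ${fs}_metadata ${fs}_data"
}

# Review the commands first; pipe the output to sh to actually run them.
cephfs_create_cmds myfs 1
```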

ceph-mon. This service stores the map of the cluster. It includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects (the details of this mechanism are unclear to me: there is a directory /var/lib/ceph/mon/…/store.db, it contains a large file of 26 MB, and in a cluster of 105K objects that comes to a little over 256 bytes per object; I think the monitor keeps a list of all objects and the PGs they live in). Damage to this directory results in the loss of all data in the cluster. Hence the conclusion that CRUSH shows how PGs are placed on OSDs, while how objects are placed in PGs is stored centrally in a database, however much the developers avoid that word. As a result, first, we cannot install the system on a flash drive in RO mode, since the database is constantly being written to and an additional disk is needed for it (hardly more than 1 GB); second, it is necessary to keep a real-time copy of this database. If there are several monitors, fault tolerance is ensured automatically, but in our case there is only one monitor, two at most. There is a theoretical procedure for restoring a monitor from OSD data; I resorted to it three times for various reasons, and three times there were no error messages, and no data either. Unfortunately, this mechanism does not work. Either we maintain a small partition on each OSD and assemble a RAID to store the database, which will certainly hurt performance badly, or we allocate at least two reliable physical media, preferably USB, so as not to occupy ports.
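With a single monitor, even a crude periodic copy of the monitor directory is better than nothing. The following is a minimal sketch of such a backup, not the real-time copy the text asks for; the function name, the archive naming and the destination path are assumptions, and I assume the monitor is stopped while the archive is taken so that the store is consistent.

```shell
#!/bin/sh
# backup_mon_store SRC_DIR DEST_DIR  (hypothetical helper)
# Pack the monitor data directory into a timestamped tar.gz under DEST_DIR
# and print the path of the archive that was written.
backup_mon_store() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/mon-store-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps only the last path component inside the archive.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example for the single monitor rbd1 used in the listings (run as root):
#   systemctl stop ceph-mon@rbd1
#   backup_mon_store /var/lib/ceph/mon/ceph-rbd1 /mnt/usb/mon-backup
#   systemctl start ceph-mon@rbd1
```

Stopping the only monitor makes the cluster unavailable for the duration of the copy, which is the price of having no second monitor.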

rados-gw exports object storage via the S3 protocol and similar ones. It creates many pools, and it is unclear why. I did not experiment with it much.

ceph-mgr. When this service is installed, several modules are started. One of them is the autoscaler, which cannot be disabled. It strives to maintain the correct PG/OSD ratio. If you want to control the ratio manually, you can disable scaling for each pool, but in that case the module crashes with a division by 0 and the cluster status becomes ERROR. The module is written in Python, and commenting out a certain line in it disables it. I am too lazy to recall the details.
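Disabling scaling pool by pool can be sketched like this. The helper reads pool names on standard input and prints one command per pool for review; its name and the example pool names are hypothetical, while `ceph osd pool set <pool> pg_autoscale_mode off` and `ceph osd pool ls` are the regular CLI calls. It does nothing about the division-by-zero crash described above.

```shell
#!/bin/sh
# autoscale_off_cmds  (hypothetical helper)
# Read pool names from stdin, one per line, and print the command that
# switches the PG autoscaler off for each of them.
autoscale_off_cmds() {
    while IFS= read -r pool; do
        [ -n "$pool" ] || continue    # skip empty lines
        echo "ceph osd pool set $pool pg_autoscale_mode off"
    done
}

# Demo with made-up pool names.
printf 'rbd\ncephfs_data\n' | autoscale_off_cmds

# On a live cluster:
#   ceph osd pool ls | autoscale_off_cmds | sh
```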

List of sources used:

Installing CEPH
Recovery from a complete monitor failure

Script listings:

Installing the system via debootstrap

export blkdev=sdb1  # exported so the chroot shell below still sees it
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
uuid=`blkid | grep $blkdev | cut -d '"' -f 2`
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh  smartmontools ntfs-3g net-tools man
exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

Creating a cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
rm -R /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

apt install ceph-osd

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Summary

The main selling point of CEPH is CRUSH, an algorithm for computing the location of data. Monitors distribute this algorithm to clients, after which clients directly query the desired node and the desired OSD. CRUSH ensures there is no centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. If you destroy and recreate the monitors while keeping all the OSDs and CRUSH, this is not enough to restore the cluster. From this it follows that each monitor stores some metadata about the whole cluster. The small size of this metadata imposes no restrictions on cluster size, but it has to be kept safe, which eliminates the disk savings from installing the system on a flash drive and rules out clusters with fewer than three nodes. The developers' policy on optional features is aggressive. It is far from minimalism. The documentation is at the level of "thanks for what we have, but it is very, very meager." The ability to interact with services at a low level is provided, but the documentation covers this topic too superficially, so in practice it is more no than yes. There is almost no chance of recovering data from an emergency situation.
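To look at the "small file" in question, the CRUSH map can be fetched from the cluster and decompiled into text. The helper below only prints the two commands; its name and the default file prefix are my assumptions, while `ceph osd getcrushmap` and `crushtool -d` are the standard tools.

```shell
#!/bin/sh
# crush_dump_cmds [PREFIX]  (hypothetical helper)
# Print the commands that save the binary CRUSH map to PREFIX.bin and
# decompile it into the readable PREFIX.txt.
crush_dump_cmds() {
    out=${1:-crushmap}
    echo "ceph osd getcrushmap -o $out.bin"
    echo "crushtool -d $out.bin -o $out.txt"
}

# Review the commands first; pipe the output to sh to actually run them.
crush_dump_cmds
```

Keeping the resulting text file next to the monitor backups costs nothing, even though, as noted above, it is not enough on its own to rebuild a cluster.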

Options for further action: abandon CEPH and use plain multi-disk btrfs (or xfs, zfs); find new information about CEPH that would allow it to be operated under the stated conditions; or try to write my own storage system as advanced training.

Source: www.habr.com
