Experience of operating CEPH

When there is more data than fits on one disk, it is time to think about RAID. As a child I often heard from my elders: "One day RAID will be a thing of the past, object storage will fill the world, and you don't even know what CEPH is." So the first thing I did in my independent life was build my own cluster. The purpose of the experiment was to get acquainted with the internal structure of ceph and to understand the limits of its applicability. How justified is deploying ceph in a medium-sized business, or in a small one? After several years of operation and a couple of irreversible data losses, I came to appreciate the subtleties: not everything is that simple. The peculiarities of CEPH put obstacles in the way of its wide adoption, and because of them the experiments reached a dead end. Below is a description of all the steps taken, the result obtained and the conclusions drawn. If knowledgeable people share their experience and explain some points, I will be grateful.

Note: Commenters pointed out serious errors in some of the assumptions, which call for a revision of the entire article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them, duplicating each piece (4 MB by default) a given number of times, N.

Consider the simplest case with two identical disks. You can build either a RAID 1 or a cluster with N=2 from them, and the result will be the same. If there are three disks of different sizes, it is easy to build a cluster with N=2: some data will be on disks 1 and 2, some on disks 1 and 3, and some on disks 2 and 3, whereas RAID cannot do this (you can assemble such a RAID, but it would be a perversion). If there are more disks, it becomes possible to create a RAID 5; CEPH has an analogue, erasure_code, which contradicts the developers' original concepts and is therefore not considered here. RAID 5 assumes a small number of drives, all in good condition. If one fails, the others must hold out until the disk is replaced and the data is restored to it. CEPH with N>=3, by contrast, encourages the use of old disks. In particular, if you keep a few good disks for one copy of the data and store the remaining two or three copies on a large number of old disks, the information will be safe: while the new disks are alive there are no problems, and if one of them breaks, the simultaneous failure of three disks with a service life of more than five years, preferably in different servers, is an extremely unlikely event.
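The three-disk case can be made concrete with a little arithmetic. The sketch below estimates an upper bound on usable space for N=2 on disks of unequal size; the disk sizes are hypothetical, and the bound only expresses that every piece is stored twice and that the two copies must sit on different disks. It ignores whatever overhead and placement imbalance a real cluster has.

```shell
#!/usr/bin/env bash
# Hypothetical disk sizes in GB; these numbers are not from the article.
sizes=(4000 2000 1000)

total=0; largest=0
for s in "${sizes[@]}"; do
  total=$((total + s))
  if ((s > largest)); then largest=$s; fi
done

# With N=2 every piece needs two copies on two different disks, so:
#   usable <= total / 2          (each piece is stored twice)
#   usable <= total - largest    (one copy must live off the largest disk)
half=$((total / 2))
rest=$((total - largest))
usable=$((half < rest ? half : rest))
echo "usable with N=2: at most ${usable} GB of ${total} GB raw"
```

With these sizes the largest disk is bigger than the other two combined, so part of it cannot be paired with anything and stays unused.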

There is a subtlety in how the copies are distributed. By default, the data is assumed to be split into a fairly large number (~100 per disk) of PG distribution groups, each of which is duplicated on some of the disks. Suppose K=6 and N=2. Then if any two disks fail, data is guaranteed to be lost, because by probability theory there will be at least one PG located on exactly those two disks. And the loss of a single group makes all the data in the pool unavailable. If the disks are instead divided into three pairs and data is allowed to be stored only on disks within one pair, such a layout also survives the failure of any single disk, but when two disks fail the probability of data loss is not 100% but only 3/15, and even with three failed disks it is only 12/20. Hence, entropy in data distribution does not contribute to fault tolerance. Note also that for a file server, free RAM significantly improves responsiveness. The more memory in each node, and the more memory across all nodes, the faster it will be. This is undoubtedly an advantage of a cluster over a single server and, even more so, over a hardware NAS, which has very little memory built in.
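The 3/15 and 12/20 figures can be checked by brute force. The sketch below enumerates every way two or three of six disks can fail and counts the cases in which both disks of some pair are gone. The pairing (0,1), (2,3), (4,5) is just a labelling choice. The last line is a rough estimate for the default layout under two assumptions of mine: 300 PGs in total (about 100 per disk, two copies each) and each PG landing on a uniformly random pair of disks.

```shell
#!/usr/bin/env bash
# Disks 0..5 are paired as (0,1), (2,3), (4,5): disk d belongs to pair d/2.
total2=0; fatal2=0
for ((a = 0; a < 6; a++)); do
  for ((b = a + 1; b < 6; b++)); do
    total2=$((total2 + 1))
    # Data is lost only if both failed disks form one pair.
    if ((a / 2 == b / 2)); then fatal2=$((fatal2 + 1)); fi
  done
done

total3=0; fatal3=0
for ((a = 0; a < 6; a++)); do
  for ((b = a + 1; b < 6; b++)); do
    for ((c = b + 1; c < 6; c++)); do
      total3=$((total3 + 1))
      # Data is lost if any two of the three failed disks form one pair.
      if ((a / 2 == b / 2 || b / 2 == c / 2 || a / 2 == c / 2)); then
        fatal3=$((fatal3 + 1))
      fi
    done
  done
done

echo "paired layout, two failures:   $fatal2/$total2"
echo "paired layout, three failures: $fatal3/$total3"

# Default layout (assumed: 300 PGs, uniform random pairs, 15 possible pairs):
# chance that one particular pair of disks shares no PG at all.
p_pair_unused=$(awk 'BEGIN { printf "%.1e", (14/15)^300 }')
echo "random layout, chance a given pair shares no PG: $p_pair_unused"
```

The last value comes out on the order of 1e-9, which is why the loss of any two disks in the default layout is treated as certain data loss.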

It follows that CEPH is a good way to build a reliable storage system for tens of TB, with the ability to scale, at minimal investment and from obsolete equipment (some costs will of course be required, but they are small compared with commercial storage systems).

Cluster implementation

For the experiment, let's take a decommissioned computer: Intel DQ57TM + Intel core i3 540 + 16 GB of RAM. We will arrange four terabyte-class disks into something like RAID 10, and after a successful test we will add a second node with the same number of disks.

Installing Linux. The distribution must be customizable and stable. Debian and Suse meet the requirements. Suse has a more flexible installer that lets you deselect any package; unfortunately, I could not work out which ones can be dropped without harming the system. We install Debian using debootstrap buster. The min-base option (debootstrap's `--variant=minbase`) installs a broken system that lacks drivers. The difference in size compared with the full variant is not large enough to worry about. Since the work is done on a physical machine, I want to take snapshots, as on virtual machines. This is provided by either LVM or btrfs (or xfs, or zfs, the difference is not great). Snapshots are not LVM's strong point. We install btrfs. And the bootloader goes into the MBR. There is no need to clutter the disk with a 50 MB FAT partition when the bootloader can be pushed into the 1 MB partition-table area and all the space given to the system. It took 700 MB on disk. I don't remember how much a basic SUSE installation takes; I think about 1.1 or 1.4 GB.

Installing CEPH. We ignore version 12 in the debian repository and connect directly to the project's site for 15.2.3. We follow the instructions from the "Install CEPH manually" section with the following caveats:

  • Before connecting the repository, you must install gnupg wget ca-certificates
  • After connecting the repository, but before installing the cluster, the package installation step is omitted from the instructions: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • While installing CEPH, for unknown reasons, it will try to install lvm2. In principle that would be no great loss, but the installation of lvm2 fails, so CEPH does not install either.

    This patch helped:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
    

Cluster overview

ceph-osd - responsible for storing data on disk. For each disk, a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One of them contains information about the cluster, the disk number, and the cluster keys. This 1 KB of information is created once, when the disk is added, and has never been observed to change since. The second partition has no file system and stores the CEPH binary data. Automatic installation in previous versions created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB, and the service does not complain. I think xfs could be replaced with ext without problems. This partition is mounted in /var/lib/…, where the service reads the information about the OSD and also finds a reference to the block device where the binary data is stored. In theory, you could place the auxiliary files directly in /var/lib/… and give the whole disk over to data. When an OSD is created via ceph-deploy, a rule is automatically created to mount the partition in /var/lib/…, and the ceph user is granted rights to read the required block device. With a manual installation you have to do this yourself; the documentation does not mention it. It is also advisable to set the osd memory target parameter so that there is enough physical memory.
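One way to make the device permissions survive a reboot during a manual installation is a udev rule. The sketch below only prints such a rule. The helper name, the `sd?2` pattern (second partition of each disk) and the rule file name in the comments are my assumptions, not something the CEPH documentation or this article prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a udev rule that hands matching block devices
# to the ceph user, so a manually created OSD can open its data partition.
osd_udev_rule() {
  local kernel_glob=$1
  printf 'KERNEL=="%s", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"\n' \
    "$kernel_glob"
}

rule=$(osd_udev_rule 'sd?2')
echo "$rule"

# To apply it as root (file name is an assumption):
#   echo "$rule" > /etc/udev/rules.d/99-ceph-osd.rules
#   udevadm control --reload && udevadm trigger
```

The small service partition still has to be mounted separately, for example through an /etc/fstab entry.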

ceph-mds. At the low level, CEPH is object storage. Block storage boils down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created: one for metadata, the other for data. They are combined into a file system. At that moment some kind of record is created, so if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block; I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system requires a separate instance of the service. There is an "index" option that lets you create the semblance of several file systems within one; this is also untested.
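The "each 4 MB block is an object" idea comes down to integer division. The sketch below maps a byte offset on a block device to an object index and an offset inside that object. The helper name `locate` is hypothetical, and real object naming and striping options are left out.

```shell
#!/usr/bin/env bash
object_size=$((4 * 1024 * 1024))   # 4 MB default piece size

# Hypothetical helper: print "<object index> <offset inside that object>"
# for a byte offset on the block device.
locate() {
  local offset=$1
  echo "$((offset / object_size)) $((offset % object_size))"
}

locate 0            # 0 0
locate 4194304      # 1 0
locate 10000000     # 2 1611392
```

A read or write that crosses a 4 MB boundary therefore touches more than one object.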

ceph-mon - this service stores the cluster map. It includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects (the details of this mechanism are unclear to me: there is a directory /var/lib/ceph/mon/…/store.db containing a large 26 MB file, and in a cluster of 105K objects that works out to slightly more than 256 bytes per object, so I suppose the monitor keeps a list of all objects and the PGs they are in). Damage to this directory leads to the loss of all data in the cluster. Hence the conclusion that CRUSH describes how PGs are placed on OSDs, while how objects are placed in PGs is kept in a centralized database, no matter how hard the developers avoid that word. As a result, first, we cannot install the system on a flash drive in RO mode, since the database is written to constantly and needs an additional disk (hardly more than 1 GB); second, this database needs a real-time copy. If there are several monitors, fault tolerance is provided automatically, but in our case there is only one monitor, two at most. There is a theoretical procedure for restoring a monitor from OSD data. I resorted to it three times for various reasons, and three times there were no error messages and no data either. Unfortunately, this mechanism does not work. Either we use a tiny partition on each OSD and assemble a RAID to store the database, which will surely hurt performance badly, or we allocate at least two reliable media, preferably USB, so as not to occupy ports.
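The per-object estimate is a one-line division. In the sketch below, 26 MB is read as 26 MiB and 105K as 105,000 objects; both readings are assumptions.

```shell
#!/usr/bin/env bash
store_bytes=$((26 * 1024 * 1024))   # 26 MB store.db file, read as MiB (assumption)
objects=105000                      # "105K objects", read as 105,000 (assumption)

per_object=$((store_bytes / objects))
echo "$per_object bytes per object"   # 259
```

The figure only says that the file size is compatible with a few hundred bytes per object; it does not show what the file actually contains.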

rados-gw - exports the object storage over the S3 protocol and similar ones. It creates a lot of pools, and it is unclear why. I did not experiment with it much.

ceph-mgr - when this service is installed, several modules are started. One of them is autoscale, which cannot be disabled. It strives to maintain the correct PG/OSD ratio. If you want to control the ratio manually, you can disable scaling for each pool, but in that case the module crashes with a division by 0 and the cluster status becomes ERROR. The module is written in Python, and commenting out the relevant line in it disables it. I am too lazy to recall the details.

List of sources used:

CEPH installation
Recovery from a complete monitor failure

Script listings:

Installing the system via debootstrap

blkdev=sdb1
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
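# ask blkid for the UUID of the partition directly instead of parsing its output
# (blkdev must be set again inside the chroot shell)
uuid=$(blkid -s UUID -o value /dev/$blkdev)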
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

# still inside the chroot: install the kernel, grub and basic tools before leaving
apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh  smartmontools ntfs-3g net-tools man
exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

Creating the cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
rm -R /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

apt install ceph-osd

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
# use the id returned by `ceph osd create` rather than a hard-coded 0
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Summary

CEPH's main marketing advantage is CRUSH, an algorithm for computing where data is located. Monitors distribute this algorithm to clients, after which the clients directly contact the node and OSD they need. CRUSH is what guarantees the absence of centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. If you destroy and recreate the monitors while keeping all the OSDs and CRUSH, that is not enough to restore the cluster. From this I conclude that each monitor stores some metadata about the entire cluster. The small size of this metadata does not limit the size of the cluster, but it has to be kept safe, which rules out saving a disk by installing the system on a flash drive and excludes clusters with fewer than three nodes. The developers' policy regarding optional features is aggressive. It is far from minimalism. The documentation is at the level of "thanks for what we have, but it is very scant". The ability to interact with the services at a low level is provided, but the documentation touches on this topic too superficially, so the answer is more likely no than yes. There is practically no chance of recovering data in an emergency.
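The general idea of computing a location instead of looking it up can be shown with a toy example. The sketch below is not CRUSH and does not use Ceph's hash function. It takes the CRC printed by `cksum` as a stand-in hash and reduces it modulo a hypothetical number of placement groups, so that any client holding the same parameters arrives at the same group without asking a central table.

```shell
#!/usr/bin/env bash
pg_num=128   # hypothetical number of placement groups

# Toy stand-in for a placement function: hash the object name, take it
# modulo the number of groups. Ceph itself uses a different hash and
# additional steps to go from a group to concrete OSDs.
pg_for() {
  local name=$1 crc
  crc=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
  echo $((crc % pg_num))
}

pg_for "my-object"
pg_for "my-object"     # same name, same group, no lookup involved
pg_for "another-object"
```

Whatever such a function cannot derive from its inputs still has to be stored somewhere, which is the part of the picture the monitor experiments above ran into.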

Options for further action: abandon CEPH and use a banal multi-disk btrfs (or xfs, zfs); learn something new about CEPH that would allow it to be operated under the stated conditions; or try to write my own storage as advanced training.

Source: www.habr.com
