CEPH operating experience

When there is more data than fits on a single disk, it is time to think about RAID. As a child I often heard from my elders: "One day RAID will be a thing of the past, object storage will flood the world, and you don't even know what CEPH is," so the first thing I did in my independent life was build my own cluster. The goal of the experiment was to get acquainted with the internal structure of ceph and to understand the limits of its applicability. How justified is deploying ceph in a medium-sized business, and in a small one? After several years of operation and a couple of irreversible data losses, I came to understand the subtleties: not everything is that simple. The peculiarities of CEPH put obstacles in the way of its wide adoption, and because of them the experiments reached a dead end. Below is a description of all the steps taken, the results obtained, and the conclusions drawn. If knowledgeable people share their experience and clarify some points, I will be grateful.

Note: commenters have pointed out serious errors in some of the assumptions, which call for a revision of the entire article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them, duplicating each chunk (4 MB by default) a given number N of times.

Let's consider the simplest case with two identical disks. From them you can assemble either RAID 1 or a cluster with N=2, and the result will be the same. If there are three disks of different sizes, it is easy to assemble a cluster with N=2: some of the data will be on disks 1 and 2, some on disks 1 and 3, and some on disks 2 and 3, whereas RAID cannot do this (you can assemble such a RAID, but it would be a perversion). If there are even more disks, it becomes possible to create RAID 5. CEPH has an analogue, erasure_code, which contradicts the developers' early concepts and is therefore not considered here. RAID 5 assumes a small number of disks, all in good condition. If one fails, the others must hold out until the disk is replaced and the data is restored onto it. CEPH, with N>=3, encourages the use of old disks. In particular, if you keep a few good disks to store one copy of the data and store the remaining two or three copies on a large number of old disks, the information will be safe: while the new disks are alive there is no problem, and if one of them breaks, the simultaneous failure of three disks with a service life of more than five years, preferably in different servers, is an extremely unlikely event.
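To make the case of three disks of different sizes more concrete, here is a minimal sketch that estimates how much data fits with N=2. It assumes an idealized placement in which the two copies of a chunk always land on different disks. The function name `usable_capacity` and the disk sizes are made up for illustration, and real CRUSH placement is weight-based and pseudo-random, so it may not reach this bound.

```shell
# usable_capacity SIZE...: prints an idealized upper bound on usable space
# for two-copy storage where both copies must sit on different disks.
# This is an illustration, not how CEPH itself reports capacity.
usable_capacity() (
    total=0 largest=0
    for size in "$@"; do
        total=$(( total + size ))
        if [ "$size" -gt "$largest" ]; then largest=$size; fi
    done
    half=$(( total / 2 ))          # two copies of everything
    rest=$(( total - largest ))    # the largest disk can only be mirrored onto the others
    if [ "$rest" -lt "$half" ]; then echo "$rest"; else echo "$half"; fi
)

usable_capacity 1000 2000 3000   # sizes in GB (illustrative); prints 3000
usable_capacity 500 500 4000     # prints 1000: most of the big disk is wasted
```

The second call shows the limitation: one disk much larger than the rest cannot be used fully, because there is nowhere to put the second copy.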

There is a subtlety in how copies are distributed. By default, the data is assumed to be divided into a large number (~100 per disk) of PG placement groups, each of which is duplicated on some set of disks. Suppose K=6 and N=2. Then if any two disks fail, data loss is guaranteed, because by probability theory there will be at least one PG located on exactly those two disks. And the loss of one group makes all the data in the pool unavailable. If instead the disks are split into three pairs and data is allowed to be stored only on disks within one pair, such a distribution is also resistant to the failure of any single disk, but if two disks fail, the probability of data loss is not 100% but only 3/15, and even if three disks fail it is only 12/20. Hence, entropy in data distribution does not contribute to fault tolerance. Also note that for a file server, free RAM significantly improves response time. The more memory in each node, and the more memory across all nodes, the faster it will be. This is undoubtedly an advantage of a cluster over a single server and, even more so, over a hardware NAS, which has very little memory built in.
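The 3/15 and 12/20 figures can be checked by brute force. The sketch below enumerates every combination of failed disks for K=6 split into three mirror pairs. The pair layout (0,1), (2,3), (4,5) and the function name `count_losses` are assumptions made for the illustration.

```shell
# count_losses F: prints "lost/total" over all ways to fail F of the 6 disks.
# Disks are bits 0..5 of a mask; the assumed pairs are (0,1), (2,3), (4,5).
count_losses() (
    failures=$1 lost=0 total=0 mask=0
    while [ "$mask" -lt 64 ]; do
        # bit i set = disk i has failed; count the failed disks
        bits=0 i=0
        while [ "$i" -lt 6 ]; do
            bits=$(( bits + ((mask >> i) & 1) ))
            i=$(( i + 1 ))
        done
        if [ "$bits" -eq "$failures" ]; then
            total=$(( total + 1 ))
            # data is lost when both disks of any one pair are down
            if [ $(( mask & 3 )) -eq 3 ] || [ $(( mask & 12 )) -eq 12 ] || [ $(( mask & 48 )) -eq 48 ]; then
                lost=$(( lost + 1 ))
            fi
        fi
        mask=$(( mask + 1 ))
    done
    echo "$lost/$total"
)

count_losses 2   # prints 3/15
count_losses 3   # prints 12/20
```

With the default scattered placement, every one of the 15 disk pairs is expected to hold at least one PG, which is why the loss there is treated as certain.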

It follows that CEPH is a good way to build a reliable, scalable storage system for tens of TB out of obsolete hardware with minimal investment (some expense will of course be needed here, but it is small compared with commercial storage systems).

Cluster implementation

For the experiment, let's take a decommissioned computer: Intel DQ57TM + Intel Core i3 540 + 16 GB of RAM. We will organize four terabyte-sized disks into something like RAID2, and after a successful test we will add a second node with the same number of disks.

We install Linux. The distribution must be customizable and stable. Debian and Suse meet the requirements. Suse has a more flexible installer that lets you disable any package; unfortunately, I could not figure out which ones can be dropped without damaging the system. We install Debian via debootstrap buster. The min-base option installs a broken system that lacks drivers. The difference in size compared with the full version is not large enough to bother with. Since the work is done on a physical machine, I want to take snapshots, as on virtual machines. This is offered by either LVM or btrfs (or xfs, or zfs; the difference is not great). Snapshots are not LVM's strong point. We install btrfs. And the bootloader goes into the MBR. There is no point in cluttering the disk with a 50 MB FAT partition when you can squeeze the bootloader into the 1 MB partition table area and give all the space to the system. The system took up 700 MB on disk. I do not remember how much a base SUSE installation takes; I think about 1.1 or 1.4 GB.

We install CEPH. We ignore version 12 in the debian repository and connect directly to the 15.2.3 repository from the project site. We follow the instructions from the "Installing CEPH manually" section with the following caveats:

  • Before connecting the repository, you need to install gnupg wget ca-certificates
  • After connecting the repository, but before setting up the cluster, the package installation step is omitted from the instructions: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • While installing, CEPH will for unknown reasons try to install lvm2. In principle that is no great loss, but the lvm2 installation fails, so CEPH does not get installed either.

    This patch helped:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
    

Cluster overview

ceph-osd - is responsible for storing data on disk. For each disk, a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One of them contains information about the cluster, the disk number, and the cluster keys. This 1 KB of information is created once when the disk is added and has never been observed to change afterwards. The second partition has no file system and stores CEPH binary data. Automatic installation in previous versions created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB, and the service does not complain. I think xfs could be replaced with ext without problems. This partition is mounted in /var/lib/…, where the service reads information about the OSD and also finds a reference to the block device where the binary data is stored. In theory, you can place the auxiliary files directly in /var/lib/… and give the entire disk to data. When creating an OSD via ceph-deploy, a rule is automatically created to mount the partition in /var/lib/…, and the ceph user is granted rights to read the required block device. If you install manually, you have to do this yourself; the documentation does not mention it. It is also advisable to specify the osd memory target parameter so that there is enough physical memory.

ceph-mds. At a low level, CEPH is an object store. Block storage boils down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created: one for metadata, the other for data. They are combined into a file system. At this point some kind of record is created, so if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block; I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system requires a separate instance of the service. There is an "index" option that lets you create a semblance of several file systems within one; this has not been tested either.

ceph-mon - this service stores the cluster map. It includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects (the details of this mechanism are unclear to me: there is a directory /var/lib/ceph/mon/…/store.db containing a large 26 MB file, and with 105K objects in the cluster that comes to slightly more than 256 bytes per object, so I think the monitor stores a list of all objects and the PGs they live in). Damage to this directory leads to the loss of all data in the cluster. Hence the conclusion that CRUSH shows how PGs are placed on OSDs, while how objects are placed in PGs is stored centrally in a database, no matter how much the developers avoid that word. As a consequence, first, we cannot install the system on a flash drive in RO mode, since the database is constantly being written to, so an additional disk is needed for it (hardly more than 1 GB); second, a real-time copy of this database is required. If there are several monitors, fault tolerance is provided by them, but in our case there is only one monitor, two at most. There is a theoretical procedure for restoring a monitor from OSD data. I resorted to it three times for various reasons, and three times there were no error messages, and no data either. Unfortunately, this mechanism does not work. Either we run a tiny partition on each OSD and assemble a RAID to store the database, which will certainly be very bad for performance, or we allocate at least two reliable physical media, preferably USB, so as not to occupy ports.
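The bytes-per-object estimate above is simple integer arithmetic. A minimal sketch, assuming "26MB" means 26 MiB and "105K" means 105000 objects (both readings are assumptions, and the variable names are illustrative):

```shell
# Sizes come from the text; the exact units are an assumption.
store_bytes=$(( 26 * 1024 * 1024 ))   # large file in store.db, read as 26 MiB
object_count=105000                   # objects in the cluster, read as 105K
bytes_per_object=$(( store_bytes / object_count ))
echo "$bytes_per_object"              # prints 259
```

The result only shows that the store is large enough to hold a short record per object; it does not prove what the monitor actually keeps there.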

rados-gw - exports object storage over the S3 protocol and similar ones. It creates many pools, and it is unclear why. I did not experiment with it much.

ceph-mgr - when this service is installed, several modules are started. One of them is autoscale, which cannot be turned off. It tries to maintain the correct number of PGs per OSD. If you want to control the ratio manually, you can disable scaling for each pool, but in that case the module crashes with a division by 0 and the cluster status becomes ERROR. The module is written in Python, and commenting out the relevant line in it disables it. I am too lazy to recall the details.

List of sources used:

Installing CEPH
Recovering from a complete monitor failure

List of scripts:

Installing the system via debootstrap

blkdev=sdb1
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
uuid=`blkid | grep $blkdev | cut -d '"' -f 2`
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh  smartmontools ntfs-3g net-tools man
exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

Creating the cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
rm -R /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

apt install ceph-osd

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Summary

The main marketing advantage of CEPH is CRUSH, an algorithm for computing data location. Monitors distribute this algorithm to clients, after which clients directly query the required node and the required OSD. CRUSH ensures there is no centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. If you destroy and recreate the monitors while keeping all the OSDs and CRUSH, that is not enough to restore the cluster. From this it is concluded that each monitor stores some metadata about the entire cluster. The small size of this metadata imposes no limit on cluster size, but it must be kept safe, which rules out saving a disk by installing the system on a flash drive and rules out clusters with fewer than three nodes. The developer's policy on optional features is aggressive. It is far from minimalism. The documentation is at the level of "thanks for what there is, but it is very, very scarce." Interaction with the services at a low level is possible, but the documentation touches on this topic too superficially, so in practice it is more a no than a yes. There is practically no chance of recovering data after an emergency.

Options for further action: abandon CEPH and use plain multi-disk btrfs (or xfs, zfs); find new information about CEPH that would allow it to be operated under the stated conditions; or try writing my own storage system as a professional development exercise.

Source: www.habr.com