Experience operating CEPH

When there is more data than fits on one disk, it is time to think about RAID. As a child I often heard from my elders: "Someday RAID will be a thing of the past, object storage will fill the world, and you don't even know what CEPH is." So the first thing I did in my independent life was build my own cluster. The purpose of the experiment was to get acquainted with the internal structure of ceph and to understand where it is applicable. How justified is deploying ceph in a medium-sized business, or in a small one? After several years of operation and a couple of irreversible data losses, I came to understand the subtleties: not everything is that simple. The peculiarities of CEPH are obstacles to its wide adoption, and because of them the experiments reached a dead end. Below is a description of all the steps taken, the result obtained, and the conclusions drawn. If knowledgeable people share their experience and explain some of the points, I will be grateful.

Note: commenters have pointed out serious errors in some of the assumptions, which call for a revision of the whole article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them, duplicating each piece (4 MB by default) a given number N of times.
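As a rough illustration of that arithmetic, here is a minimal sketch. The disk sizes are made up for the example, 1 GB is treated as 1024 MB, and service overhead and uneven fill are ignored, so the usable figure is only an upper bound.

```shell
# Hypothetical cluster: K = 4 disks of different sizes, N = 2 copies of every object.
N=2            # replication factor
object_mb=4    # default object size mentioned in the text

raw_gb=0
for d in 2000 2000 1000 500; do    # illustrative disk sizes in GB, not from the article
    raw_gb=$((raw_gb + d))
done

usable_gb=$((raw_gb / N))                    # upper bound on usable space
objects=$((usable_gb * 1024 / object_mb))    # how many 4 MB objects would fit

echo "raw=${raw_gb} GB, usable<=${usable_gb} GB, about ${objects} objects"
```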

Consider the simplest case: two identical disks. You can assemble either RAID 1 or a cluster with N=2 from them, and the result will be the same. If there are three disks of different sizes, it is easy to assemble a cluster with N=2: some of the data will be on disks 1 and 2, some on disks 1 and 3, and some on disks 2 and 3, whereas RAID cannot do this (you can assemble such a RAID, but it would be a perversion). If there are even more disks, RAID 5 becomes possible. CEPH has an analog, erasure_code, which contradicts the developers' early concepts and is therefore not considered here. RAID 5 assumes a small number of disks, all in good condition. If one fails, the others must hold out until the disk is replaced and the data is restored onto it. CEPH with N>=3 encourages the use of old disks. In particular, if you keep one copy of the data on a few good disks and the remaining two or three copies on a large number of old disks, the information will be safe: as long as the new disks are alive there are no problems, and if one of them breaks, the simultaneous failure of three disks with a service life of more than five years, preferably in different servers, is an extremely unlikely event.

There is a subtlety in how the copies are distributed. By default, the data is assumed to be divided into a large number (~100 per disk) of PG distribution groups, each of which is duplicated on some set of disks. Suppose K=6 and N=2. Then if any two disks fail, data is guaranteed to be lost, because according to probability theory there will be at least one PG located on exactly those two disks. And the loss of one group makes all the data in the pool unavailable. If the disks are instead divided into three pairs and data is allowed to be stored only within one pair, the layout is still resistant to the failure of any single disk, but if two disks fail the probability of data loss is not 100% but only 3/15, and even with three failed disks it is only 12/20. Hence, entropy in data distribution does not contribute to fault tolerance.

Also note that for a file server, free RAM significantly improves response time. The more memory in each node, and the more memory in all nodes together, the faster it will be. This is undoubtedly an advantage of a cluster over a single server and, even more so, over a hardware NAS, which has very little built-in memory.
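The 3/15 and 12/20 figures for the paired layout can be checked by brute force. The sketch below assumes six disks numbered 0..5 and paired as (0,1), (2,3), (4,5). It enumerates every possible set of two and three failed disks and counts those that contain a whole pair.

```shell
# Disk d belongs to pair d/2. Data is lost only when both disks of one pair
# are among the failed ones.
total2=0; lost2=0; total3=0; lost3=0
for a in 0 1 2 3 4 5; do
    for b in 0 1 2 3 4 5; do
        [ "$a" -lt "$b" ] || continue
        # every unordered pair of failed disks (a < b)
        total2=$((total2 + 1))
        if [ $((a / 2)) -eq $((b / 2)) ]; then lost2=$((lost2 + 1)); fi
        for c in 0 1 2 3 4 5; do
            [ "$b" -lt "$c" ] || continue
            # every unordered triple of failed disks (a < b < c); a whole pair
            # can only be formed by neighbours in this sorted order
            total3=$((total3 + 1))
            if [ $((a / 2)) -eq $((b / 2)) ] || [ $((b / 2)) -eq $((c / 2)) ]; then
                lost3=$((lost3 + 1))
            fi
        done
    done
done
echo "two failed disks:   $lost2/$total2"    # 3/15
echo "three failed disks: $lost3/$total3"    # 12/20
```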

It follows that CEPH is a good way to build a reliable storage system for tens of TB that can be scaled with minimal investment out of outdated equipment (some costs will be required, of course, but they are small compared with commercial storage systems).

Cluster implementation

For the experiment, let's take a decommissioned computer: Intel DQ57TM + Intel Core i3 540 + 16 GB of RAM. We will organize four 2 TB disks into something like RAID10, and after a successful test we will add a second node with the same number of disks.

Installing Linux. The distribution has to be customizable and stable. Debian and SUSE meet the requirements. SUSE has a more flexible installer that lets you disable any package; unfortunately, I could not figure out which ones can be thrown out without harming the system. We install Debian via debootstrap buster. The min-base option installs a broken system that lacks drivers. The difference in size compared with the full version is not large enough to bother about. Since the work is done on a physical machine, I want to take snapshots, as on virtual machines. This is provided by either LVM or btrfs (or xfs, or zfs, the difference is not large). Snapshots are not LVM's strong point. We install btrfs, and the bootloader goes into the MBR. There is no point in cluttering the disk with a 50 MB FAT partition when you can push the bootloader into the 1 MB partition table area and allocate all the space to the system. The installation took 700 MB on disk. I don't remember how much a basic SUSE installation takes; I think it is about 1.1 or 1.4 GB.

Installing CEPH. We ignore version 12 in the Debian repository and connect version 15.2.3 directly from the site. We follow the instructions in the "Installing CEPH manually" section, with the following caveats:

  • Before connecting the repository, you must install gnupg wget ca-certificates
  • After connecting the repository, but before installing the cluster, the instructions omit the package installation step: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • During installation, CEPH will for unknown reasons try to install lvm2. In principle that is no great loss, but the lvm2 installation fails, so CEPH does not install either.

    This patch helped:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
    

Cluster overview

ceph-osd - responsible for storing data on a disk. For each disk, a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One contains information about the cluster, the disk number, and the cluster keys. This 1 KB of information is created once, when the disk is added, and I have never seen it change since. The second partition has no file system and stores CEPH binary data. Automatic installation in previous versions created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB; the service does not complain. I think xfs could be replaced with ext without problems. This partition is mounted under /var/lib/…, where the service reads the information about the OSD and also finds the link to the block device that holds the binary data. In theory, you could place the auxiliary files directly in /var/lib/… and give the whole disk to data. When an OSD is created via ceph-deploy, a rule to mount the partition under /var/lib/… is created automatically. If you install manually, you have to do this yourself; the documentation does not say so. It is also advisable to set the osd memory target parameter so that there is enough physical memory.
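The osd memory target value is given in bytes, so a small conversion helper makes the numbers easier to read. The budgeting rule in osd_target_bytes (reserve part of the RAM for the OS, monitor and manager, then split the rest evenly between OSDs) is only an assumption for illustration. It is not taken from the article or from the Ceph documentation.

```shell
# Convert MiB to bytes, the unit expected by "osd memory target".
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Hypothetical budgeting rule: (total RAM - reserve) split evenly between OSDs.
osd_target_bytes() {   # args: total_mib reserve_mib osd_count
    mib_to_bytes $(( ($1 - $2) / $3 ))
}

mib_to_bytes 1536                 # 1.5 GiB, the value used in the ceph.conf listing
osd_target_bytes 16384 8192 4     # 16 GB node, 8 GB reserved (assumed), 4 OSDs
```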

ceph-mds. At the low level, CEPH is object storage. Block storage boils down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created: one for metadata, the other for data. They are combined into a file system. At that moment some kind of record is created, so if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block; I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system requires a separate instance of the service. There is an "index" option that lets you create the semblance of several file systems within one; this is also untested.
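The two-pools-plus-file-system step can be sketched as follows. The pool and file system names are hypothetical, and pg_num 1 only mirrors the defaults in the ceph.conf listing. By default the sketch just prints the commands; set RUN=1 to execute them on a real cluster with a running ceph-mds.

```shell
# Print commands by default; RUN=1 executes them for real.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

make_cephfs() {   # args: fs_name pg_num
    run ceph osd pool create "${1}_metadata" "$2"    # pool for metadata
    run ceph osd pool create "${1}_data" "$2"        # pool for file contents
    run ceph fs new "$1" "${1}_metadata" "${1}_data" # bind both pools into a file system
}

make_cephfs cephfs 1
```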

ceph-mon - this service stores the cluster map. The map includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects. The details of this mechanism are unclear to me: there is a directory /var/lib/ceph/mon/…/store.db containing a large 26 MB file, and the cluster holds 105k objects, which works out to a little over 256 bytes per object, so I think the monitor keeps a list of all objects and the PGs they are in. Damage to this directory leads to the loss of all data in the cluster. Hence the conclusion that CRUSH shows how PGs are placed on OSDs, while how objects are placed in PGs is stored in a database, no matter how much the developers avoid that word. As a consequence, first, we cannot install the system on a flash drive in RO mode, because the database is written to constantly and an additional disk is needed for it (more than 1 GB); second, a real-time copy of this database is required. If there are several monitors, fault tolerance is provided automatically, but in our case there is one monitor, two at most. There is a theoretical procedure for restoring a monitor from OSD data. I resorted to it three times for various reasons, and three times there were no error messages, and no data either. Unfortunately, this mechanism does not work. Either we use a small partition on each OSD and assemble a RAID to hold the database, which will certainly hurt performance badly, or we allocate at least two reliable physical media, preferably USB, so as not to occupy ports.
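The per-object estimate is simple division. The sketch below assumes the 26 MB figure means 26 MiB and that "105k" means 105,000 objects.

```shell
store_bytes=$((26 * 1024 * 1024))    # size of the large file in store.db (assumed MiB)
objects=105000                       # objects in the cluster, per the article
per_object=$((store_bytes / objects))
echo "$per_object bytes per object"  # 259, i.e. a little over 256
```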

rados-gw - exports object storage over the S3 protocol and similar ones. It creates a lot of pools; it is unclear why. I did not experiment with it much.

ceph-mgr - installing this service starts several modules. One of them is autoscale, which cannot be disabled. It strives to maintain the correct PG/OSD ratio. If you want to manage the ratio manually, you can turn scaling off for each pool, but then the module crashes with a division by 0 and the cluster status becomes ERROR. The module is written in Python, and commenting out the relevant line in it disables it. I am too lazy to recall the details.
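Turning scaling off per pool, as described above, can be sketched like this. The pool names are hypothetical, and by default the commands are only printed (set RUN=1 to execute them). Keep in mind that the article reports the autoscale module crashing after this on 15.2.3, so treat it as an illustration of the step rather than a recommendation.

```shell
# Print commands by default; RUN=1 executes them for real.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

disable_autoscale() {   # args: pool names
    for pool in "$@"; do
        run ceph osd pool set "$pool" pg_autoscale_mode off
    done
}

disable_autoscale cephfs_data cephfs_metadata
```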

List of sources used:

Installing CEPH
Recovering from a complete monitor failure

Script listings:

Installing the system via debootstrap

blkdev=sdb1
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
uuid=`blkid | grep $blkdev | cut -d '"' -f 2`
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

# still inside the chroot: install the kernel, bootloader and basic tools before leaving
apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh  smartmontools ntfs-3g net-tools man
exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

Creating the cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
rm -R /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

apt install ceph-osd

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Summary

The main marketing advantage of CEPH is CRUSH, the algorithm for computing where data is located. Monitors distribute this algorithm to clients, after which the clients request the right node and the right OSD directly. CRUSH ensures there is no centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. If you destroy and recreate the monitors while keeping all the OSDs and CRUSH, that is not enough to restore the cluster. From this I conclude that each monitor stores some metadata about the whole cluster. The small size of this metadata places no limit on the size of the cluster, but it does have to be kept safe, which rules out saving a disk by installing the system on a flash drive and rules out clusters with fewer than three nodes.

The developers' policy on optional features is aggressive. It is a long way from minimalism. The documentation is at the level of "thanks for what there is, but it is very, very sparse." The ability to interact with the services at a low level is provided, but the documentation touches on the topic too superficially, so the answer is more no than yes. There is practically no chance of recovering data after an emergency.

Options for further action: abandon CEPH and use a plain multi-disk btrfs (or xfs, zfs); learn something new about CEPH that would make it possible to operate it under the stated conditions; or try writing your own storage system as an exercise in professional development.

Source: www.habr.com
