Experience with CEPH

When there is more data than fits on one disk, it is time to think about RAID. As a child I often heard from my elders: "One day RAID will be a thing of the past, object storage will fill the world, and you don't even know what CEPH is," so the first thing I did in my independent life was build my own cluster. The purpose of the experiment was to get to know the internals of ceph and to understand where it is applicable. How justified is deploying ceph in small and medium-sized businesses? After several years of operation and a couple of irreversible data losses, it became clear that not everything is so simple. The peculiarities of CEPH create barriers to its wider adoption, and because of them the experiments reached a dead end. Below is a description of all the steps taken, the results obtained and the conclusions drawn. If knowledgeable people share their experience and clarify some points, I will be grateful.

Note: Commenters have pointed out serious errors in some of the assumptions, which call for a revision of the whole article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them, duplicating each piece (4 MB by default) a given number N of times.

Consider the simplest case with two identical disks. You can assemble either RAID 1 or a cluster with N=2 from them, and the result will be the same. If there are three disks of different sizes, it is easy to assemble a cluster with N=2: some of the data will be on disks 1 and 2, some on disks 1 and 3, and some on 2 and 3, whereas RAID cannot do this (you can assemble such a RAID, but it would be a perversion). If there are more disks, it is possible to create RAID 5; CEPH has an analogue, erasure_code, which contradicts the developers' early concepts and is therefore not considered here. RAID 5 assumes a small number of disks, all in good condition. If one fails, the rest must hold out until the disk is replaced and the data is restored onto it. CEPH with N>=3 encourages the use of old disks. In particular, if you keep several good disks for one copy of the data and store the remaining two or three copies on a large number of old disks, the information will be safe: as long as the new disks are alive there are no problems, and if one of them breaks, the simultaneous failure of three disks with a service life of more than five years, preferably in different servers, is an extremely unlikely event.
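To make the three-disk case concrete, here is a rough sketch of the most data that two-copy placement could hold on unequal disks. The rule used (the smaller of half the total and the total minus the largest disk) is an idealized upper bound under the assumption that the two copies must sit on different disks; it is not something CEPH promises, and real CRUSH placement may use less. The function name and the disk sizes are made up for illustration.

```shell
#!/usr/bin/env bash
# usable_n2 SIZE...: idealized usable capacity (same unit as the inputs)
# when every piece of data is stored twice, on two different disks.
# Hypothetical helper, not part of any CEPH tool.
usable_n2() {
    local s sum=0 max=0
    for s in "$@"; do
        sum=$((sum + s))
        if ((s > max)); then max=$s; fi
    done
    # Half of the raw space, unless one disk is so large that the
    # other disks cannot hold the second copy of everything on it.
    local half=$((sum / 2)) rest=$((sum - max))
    if ((half < rest)); then echo "$half"; else echo "$rest"; fi
}

usable_n2 1000 2000 3000   # prints 3000: all raw space is usable in pairs
usable_n2 500 500 4000     # prints 1000: the 4000 disk is mostly wasted
```

The second call shows where the "any sizes" promise runs out: one oversized disk cannot be mirrored by the small ones.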

There is a subtlety in how copies are distributed. By default, the data is divided into a large number (~100 per disk) of PG distribution groups, each of which is duplicated on some set of disks. Say K=6 and N=2: if any two disks fail, data loss is guaranteed, because by probability theory there will be at least one PG located on exactly those two disks. And the loss of one group makes all the data in the pool unavailable. If instead the disks are divided into three pairs and data may only be stored on the two disks of one pair, the layout is just as resistant to the failure of any single disk, but when two disks fail the probability of data loss is not 100% but only 3/15, and even with three failed disks it is only 12/20. So entropy in data distribution does not help fault tolerance. Note also that for a file server, free RAM noticeably improves response time. The more memory in each node, and the more memory across all nodes, the faster it will be. This is undoubtedly an advantage of a cluster over a single server and, even more so, over a hardware NAS, which has very little memory built in.
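The 3/15 and 12/20 figures can be checked by brute force. The sketch below assumes six disks numbered 0 to 5 and grouped into the pairs (0,1), (2,3), (4,5); it enumerates every possible set of failed disks and counts those that take out a whole pair. The function names are illustrative only.

```shell
#!/usr/bin/env bash
# Six disks (0..5) arranged as three mirrored pairs: (0,1) (2,3) (4,5).
# Data is lost when both disks of at least one pair have failed.

# pair_lost DISK...: succeeds if the failed set contains a complete pair.
pair_lost() {
    local failed=" $* " p
    for p in 0 2 4; do
        if [[ $failed == *" $p "* && $failed == *" $((p + 1)) "* ]]; then
            return 0
        fi
    done
    return 1
}

# loss_ratio_2: enumerate all two-disk failures, print "fatal/total".
loss_ratio_2() {
    local a b fatal=0 total=0
    for ((a = 0; a < 6; a++)); do
        for ((b = a + 1; b < 6; b++)); do
            total=$((total + 1))
            if pair_lost "$a" "$b"; then fatal=$((fatal + 1)); fi
        done
    done
    echo "$fatal/$total"
}

# loss_ratio_3: the same for all three-disk failures.
loss_ratio_3() {
    local a b c fatal=0 total=0
    for ((a = 0; a < 6; a++)); do
        for ((b = a + 1; b < 6; b++)); do
            for ((c = b + 1; c < 6; c++)); do
                total=$((total + 1))
                if pair_lost "$a" "$b" "$c"; then fatal=$((fatal + 1)); fi
            done
        done
    done
    echo "$fatal/$total"
}

echo "two failed disks:   $(loss_ratio_2)"   # 3/15
echo "three failed disks: $(loss_ratio_3)"   # 12/20
```

With PGs spread over all disk pairs, every one of the 15 two-disk combinations would hold some PG, which is why the text puts that case at 100%.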

It follows that CEPH is a good way to build a reliable storage system for tens of TB that can scale with minimal investment, using outdated equipment (some spending will of course be required, but little compared with commercial storage systems).

Cluster implementation

For the experiment, let's take a decommissioned computer: Intel DQ57TM + Intel Core i3 540 + 16 GB of RAM. We will organize the 2 TB disks into something like RAID10, and after a successful test we will add a second node with the same number of disks.

Installing Linux. The distribution needs to be customizable and stable. Debian and Suse meet the requirements. Suse has a more flexible installer that lets you disable any package; unfortunately, I could not figure out which ones can be thrown out without damaging the system. We install Debian via debootstrap buster. The min-base option installs a broken system that lacks drivers. The difference in size compared with the full version is not large enough to worry about. Since the work is done on a physical machine, I want to take snapshots, as on virtual machines. This is provided by either LVM or btrfs (or xfs, or zfs; the difference is not large). Snapshots are not LVM's strong point. We go with btrfs. The bootloader goes in the MBR. There is no point cluttering the disk with a 50 MB FAT partition when the bootloader can be pushed into the 1 MB partition table area and all the space can be given to the system. The installation took 700 MB on disk. I don't remember how much a basic SUSE installation takes; I think about 1.1 or 1.4 GB.
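As an illustration of the snapshot workflow that motivated btrfs, here is a minimal sketch. It assumes the top level of the btrfs volume is mounted at /mnt and contains the system subvolume @ next to a snapshot directory; the helper name and the snapshot label are hypothetical. The function only prints the command so it can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# snapshot_cmd TOP SUBVOL [LABEL]: print the command that would take a
# read-only snapshot of TOP/SUBVOL into TOP/snapshot/.
# Hypothetical helper; LABEL defaults to a timestamp.
snapshot_cmd() {
    local top=$1 subvol=$2 label=${3:-$(date +%Y%m%d-%H%M%S)}
    echo "btrfs subvolume snapshot -r $top/$subvol $top/snapshot/$subvol-$label"
}

# Review the printed line, then run it as root.
snapshot_cmd /mnt @ before-ceph
```

A read-only snapshot (-r) is used here on the assumption that it serves as a rollback point rather than a working copy.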

Installing CEPH. We ignore version 12 in the debian repository and connect version 15.2.3 directly from the project site. We follow the instructions from the "Installing CEPH manually" section, with the following caveats:

  • Before connecting the repository, you must install gnupg wget ca-certificates
  • After connecting the repository, but before installing the cluster, the instructions omit the package installation step: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • While installing, CEPH will, for unknown reasons, try to install lvm2. In principle there is no harm in that, but the installation fails, so CEPH does not get installed either.

    This patch helped:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
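
To confirm that the fake stanza is visible before installing CEPH, the status file can be inspected. The sketch below is a small stand-alone parser, assuming the plain "Field: value" layout shown in the patch above; the function name is hypothetical. On a real Debian system, dpkg-query -W -f='${Status}\n' lvm2 should report the same field.

```shell
#!/usr/bin/env bash
# pkg_status FILE PACKAGE: print the Status field of PACKAGE's stanza
# in a dpkg-style status file. Hypothetical helper for illustration.
pkg_status() {
    awk -v pkg="$2" '
        $1 == "Package:" { current = $2 }          # remember the stanza we are in
        $1 == "Status:" && current == pkg {
            sub(/^Status: /, "")                    # keep only the value
            print
            exit
        }
    ' "$1"
}

# Usage on the machine being set up:
#   pkg_status /var/lib/dpkg/status lvm2
```

If the patch was applied, the call in the comment is expected to print "install ok installed".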
    

Cluster overview

ceph-osd is responsible for storing data on disk. For each disk, a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One of them contains information about the cluster, the disk number, and the cluster keys. This 1 KB of information is created once when the disk is added and was never seen to change afterwards. The second partition has no file system and stores CEPH binary data. Automatic installation in earlier versions created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB; the service does not complain. I think xfs could be replaced with ext without any problems. This partition is mounted under /var/lib/…, where the service reads the information about the OSD and finds a link to the block device holding the binary data. In theory, you can put the auxiliary files directly in /var/lib/… and give the whole disk to data. When an OSD is created via ceph-deploy, a rule is created automatically to mount the partition under /var/lib/…, and the ceph user is granted rights to the required block device. With a manual installation you must do this yourself; the documentation does not mention it. It is also advisable to set the osd memory target parameter so that there is enough physical memory.
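One way to pick an osd memory target is to split whatever RAM is left after the operating system and the other daemons evenly between the OSD processes. This is only a rule-of-thumb sketch: the reserve sizes and the OSD count below are hypothetical, and the source does not say how its own values were chosen.

```shell
#!/usr/bin/env bash
# osd_memory_target TOTAL_GIB RESERVED_GIB OSD_COUNT
# Print a per-OSD memory target in bytes: (total - reserved) / OSDs.
# Hypothetical helper; all three inputs are assumptions of this sketch.
osd_memory_target() {
    local total_gib=$1 reserved_gib=$2 osds=$3
    echo $(( (total_gib - reserved_gib) * 1024 * 1024 * 1024 / osds ))
}

# 16 GiB machine, 4 OSDs, two different reserves for everything else:
osd_memory_target 16 8 4    # prints 2147483648 (2 GiB per OSD)
osd_memory_target 16 10 4   # prints 1610612736 (1.5 GiB per OSD)
```

The printed number goes into ceph.conf as the value of osd memory target.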

ceph-mds. At a low level, CEPH is object storage. Block storage comes down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created: one for metadata, the other for data. They are combined into a file system. At that moment some kind of record is created, so if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block; I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system requires a separate instance of the service. There is an "index" option that lets you create the semblance of several file systems inside one; also not tested.
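The two-pools-then-combine sequence described above can be sketched as follows. The pool and file system names are placeholders, the PG count of 1 simply mirrors the tiny defaults used elsewhere in this article, and the script only prints the commands unless DRY_RUN=0 is set. A running ceph-mds instance is still needed before the file system becomes usable.

```shell
#!/usr/bin/env bash
# Print each command by default; set DRY_RUN=0 to really execute it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# create_cephfs NAME: one metadata pool, one data pool, then the
# file system that ties them together. Names are placeholders.
create_cephfs() {
    local fs=$1
    run ceph osd pool create "${fs}_metadata" 1
    run ceph osd pool create "${fs}_data" 1
    run ceph fs new "$fs" "${fs}_metadata" "${fs}_data"
}

create_cephfs cephfs
```

Note the argument order of the last command: the metadata pool comes before the data pool.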

ceph-mon. This service stores the cluster map. It includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects (the details of this mechanism are unclear to me: there is a directory /var/lib/ceph/mon/…/store.db containing a large 26 MB file, and in a cluster of 105K objects that comes to slightly more than 256 bytes per object; I think the monitor keeps a list of all objects and the PGs they are in). Damage to this directory results in the loss of all data in the cluster. Hence the conclusion that CRUSH describes how PGs are placed on OSDs, while how objects are placed in PGs is stored centrally in a database, however much the developers avoid that word. As a result, first, we cannot install the system on a flash drive in RO mode, since the database is written to constantly and an additional disk is needed for it (hardly more than 1 GB); second, a real-time copy of this database is required. With several monitors, fault tolerance is provided automatically, but in our case there is only one monitor, two at most. There is a theoretical procedure for restoring a monitor from OSD data. I resorted to it three times for various reasons, and three times there were no error messages, and no data either. Unfortunately, this mechanism does not work. Either we maintain a miniature partition on the OSD and build a RAID to store the database, which will certainly hurt performance, or we dedicate at least two reliable physical media, preferably USB, so as not to occupy ports.
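Short of the real-time copy asked for above, a cold copy of the monitor directory is at least easy to take. The sketch below prints the three steps for review; the monitor id rbd1 and the /var/lib/ceph/mon/ceph-rbd1 path follow the scripts later in this article, while the helper name and the /backup destination are hypothetical. With a single monitor the cluster is unavailable while the service is stopped.

```shell
#!/usr/bin/env bash
# mon_backup_cmds MON_ID DEST_DIR: print the commands for a cold backup
# of one monitor's data directory. Hypothetical helper, review before use.
mon_backup_cmds() {
    local id=$1 dest=$2
    echo "systemctl stop ceph-mon@$id"
    echo "tar -czf $dest/mon-$id.tar.gz -C /var/lib/ceph/mon ceph-$id"
    echo "systemctl start ceph-mon@$id"
}

mon_backup_cmds rbd1 /backup
```

The archive is only as fresh as the moment the monitor was stopped, so this complements rather than replaces running several monitors.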

rados-gw exports the object storage over the S3 protocol and similar ones. It creates a lot of pools; it is unclear why. I did not experiment with it much.

ceph-mgr. When this service is installed, several modules are started. One of them is the autoscaler, which cannot be disabled. It tries to maintain the correct number of PGs per OSD. If you want to control the ratio manually, you can turn scaling off for each pool, but in that case the module crashes with a division by 0 and the cluster status becomes ERROR. The module is written in Python, and commenting out the relevant line in it effectively disables it. I am too lazy to recall the details.
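Turning the autoscaler off pool by pool, as mentioned above, looks roughly like this. The pool names are placeholders and the helper only prints the commands; keep in mind the author's report that on this version doing so made the module crash.

```shell
#!/usr/bin/env bash
# autoscale_off POOL...: print the command that switches PG autoscaling
# off for each named pool. Hypothetical helper; pool names are examples.
autoscale_off() {
    local pool
    for pool in "$@"; do
        echo "ceph osd pool set $pool pg_autoscale_mode off"
    done
}

autoscale_off rbd cephfs_data
```

On a live cluster the list of pools can be taken from ceph osd pool ls instead of being typed by hand.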

List of sources used:

Installing CEPH
Recovery from a complete monitor failure

Script listings:

Installing the system via debootstrap

blkdev=sdb1
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
uuid=`blkid | grep $blkdev | cut -d '"' -f 2`
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh  smartmontools ntfs-3g net-tools man
exit
grub-install --boot-directory=@/boot/ /dev/$blkdev
init 6

Creating the cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
rm -R /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

apt install ceph-osd

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Summary

The main marketing advantage of CEPH is CRUSH, an algorithm for computing where data is located. Monitors distribute this algorithm to clients, after which clients talk directly to the required node and the required OSD. CRUSH provides the absence of centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. If you destroy and recreate the monitors while keeping all OSDs and CRUSH, this is not enough to restore the cluster. From this I conclude that each monitor stores some metadata about the whole cluster. The small size of this metadata places no limit on the size of the cluster, but it has to be kept safe, which rules out saving a disk by installing the system on a flash drive and excludes clusters with fewer than three nodes. The developers' policy on optional features is aggressive. It is far from minimalism. The documentation is at the level of "thanks for what there is, but it is very sparse." The ability to interact with the services at a low level is provided, but the documentation covers this too superficially, so in practice the answer is more no than yes. There is practically no chance of recovering data after an emergency.
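For readers who want to see how small the CRUSH map really is, it can be exported from a running cluster and decompiled to text. The sketch below prints the two commands involved; the helper name and the file names are placeholders.

```shell
#!/usr/bin/env bash
# crush_export_cmds [BASENAME]: print the commands that fetch the binary
# CRUSH map and decompile it into a readable text file.
# Hypothetical helper; BASENAME defaults to "crushmap".
crush_export_cmds() {
    local base=${1:-crushmap}
    echo "ceph osd getcrushmap -o $base.bin"
    echo "crushtool -d $base.bin -o $base.txt"
}

crush_export_cmds
```

The resulting text file lists devices, buckets and rules; as the summary argues, it does not contain the monitors' own database.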

Options for what to do next: abandon CEPH and use plain multi-disk btrfs (or xfs, zfs); find new information about CEPH that would make it possible to run it under the conditions described; or try writing your own storage as an exercise in professional development.

source: www.habr.com
