Experience running CEPH

When you have more data than fits on a single disk, it is time to think about RAID. As a child I often heard from my elders: “One day RAID will be a thing of the past, object storage will take over the world, and you don't even know what CEPH is.” So the first thing I did once I was living on my own was build my own cluster. The goal of the experiment was to learn how Ceph works inside and to understand the limits of where it can be applied. How justified is deploying Ceph in a medium-sized business, or in a small one? After several years of operation and a couple of irreversible data losses, I came to understand the subtleties: not everything is so clear-cut. CEPH's peculiarities stand in the way of its wide adoption, and because of them my experiments hit a dead end. Below is a description of all the steps taken, the results obtained and the conclusions drawn. If knowledgeable people share their experience and explain a few points, I will be grateful.

Note: commenters have pointed out serious errors in some of the assumptions, and these errors call for a revision of the entire article.

CEPH strategy

A CEPH cluster combines an arbitrary number K of disks of arbitrary size and stores data on them. It replicates each piece (4 MB by default) a given number N of times.

Consider the simplest case of two identical disks. From them you can build either a RAID 1 or a cluster with N = 2, and the result is the same. Now suppose there are three disks of different sizes. It is then easy to build a cluster with N = 2: some data will sit on disks 1 and 2, some on disks 1 and 3, and some on disks 2 and 3. RAID cannot do this (you can assemble such a RAID, but it would be a perversion). With more disks, RAID 5 becomes possible. CEPH has an analogue, erasure_code, but it contradicts the developers' early concepts and is therefore not considered here. RAID 5 assumes a small number of drives, all in good condition. If one fails, the others must hold out until the disk is replaced and the data is rebuilt onto it. CEPH with N >= 3, by contrast, encourages the use of old disks. For example, you can keep a few good disks to store one copy of the data and put the remaining two or three copies on a large number of old disks. The information then stays safe. While the new disks are alive there is no problem. If one of them breaks, the remaining copies are on disks with more than five years of service, preferably in different servers, and the simultaneous failure of three such disks is an extremely unlikely event.
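
The three-disk case can be made concrete with a small sketch. It computes the theoretical upper bound on how much data a pool with N replicas can hold on disks of unequal size. The sketch assumes data can be split into arbitrarily small pieces, which 4 MB objects on TB-sized disks come close to. It ignores CRUSH's uneven placement and Ceph's full-ratio safety margins, so a real cluster will fill up earlier. The function name and the GB units are illustrative choices.

```shell
#!/usr/bin/env bash
# Upper bound on the data a replicated pool can hold: the largest d such
# that sum(min(size_i, d)) >= N * d. A disk holds at most one copy of any
# given piece, so it can contribute at most min(size_i, d).
# Assumes freely divisible data; ignores CRUSH imbalance and full ratios.
#   $1 - replica count N, remaining args - disk sizes (e.g. in GB)
usable_capacity() {
  local n=$1; shift
  local sizes=("$@")
  local total=0 s
  for s in "${sizes[@]}"; do total=$(( total + s )); done
  local lo=0 hi=$(( total / n )) mid fit
  # f(d) = sum(min(s_i, d)) - n*d is concave with f(0) = 0, so the
  # feasible d values form an interval [0, d_max]: binary search for d_max.
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))
    fit=0
    for s in "${sizes[@]}"; do fit=$(( fit + (s < mid ? s : mid) )); done
    if (( fit >= n * mid )); then lo=$mid; else hi=$(( mid - 1 )); fi
  done
  echo "$lo"
}

usable_capacity 2 1000 2000 3000        # three unequal disks, N=2 → 3000
usable_capacity 2 2000 2000 2000 2000   # four 2 TB disks, N=2 → 4000
usable_capacity 3 500 500 500 4000      # N=3, one big disk → 750
```

For N = 2 the bound reduces to min(S/2, S - largest). The largest disk can hold at most one copy of each piece, so the other disks together must be able to hold the second copy.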

There is a subtlety in how copies are distributed. By default, data is split into a larger number (~100 per disk) of placement groups (PGs), and each PG is replicated on some set of disks. Say K = 6 and N = 2. If any two disks fail, data loss is then guaranteed: probability theory says at least one PG will sit on exactly those two disks. And losing a single group makes all the data in the pool unavailable. Suppose instead the disks are split into three pairs and data may only be stored on disks within one pair. This layout also tolerates the failure of any single disk. When two disks fail, however, the probability of data loss is not 100% but only 3/15, and even when three disks fail it is only 12/20. Hence entropy in the data distribution does not improve fault tolerance. Note also that for a file server, free RAM significantly improves response time. The more memory in each node, and the more memory across all nodes, the faster it will be. This is undoubtedly an advantage of a cluster over a single server, and even more so over a hardware NAS, which ships with very little memory.
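
The 3/15 and 12/20 figures can be checked by brute force. The sketch numbers the six disks 0 to 5 and pairs them as 0-1, 2-3 and 4-5. It then counts the failure sets that wipe out both disks of some pair. The last line estimates the default case under a simplifying assumption of my own: 300 PGs (about 100 per disk with N = 2), each placed on a uniformly random pair of disks. CRUSH does not place PGs in exactly this way.

```shell
#!/usr/bin/env bash
# Paired layout: six disks 0..5, pairs (0,1), (2,3), (4,5). Both copies of
# every piece live inside one pair, so data is lost only when some pair
# loses both of its disks. Prints "lost/total" for failure sets of size $1.
paired_loss() {
  local k=$1 disks=6 lost=0 total=0 mask bits i pair
  for (( mask = 0; mask < (1 << disks); mask++ )); do
    # count the failed disks in this failure set (popcount of mask)
    bits=0
    for (( i = 0; i < disks; i++ )); do bits=$(( bits + ((mask >> i) & 1) )); done
    (( bits == k )) || continue
    total=$(( total + 1 ))
    for (( pair = 0; pair < disks; pair += 2 )); do
      if (( ((mask >> pair) & 3) == 3 )); then
        lost=$(( lost + 1 ))
        break
      fi
    done
  done
  echo "$lost/$total"
}

paired_loss 2   # → 3/15
paired_loss 3   # → 12/20

# Assumed default-style placement: 300 PGs on uniformly random pairs out of
# C(6,2) = 15. Chance that one given pair of disks holds no PG at all:
awk 'BEGIN { printf "%.2g\n", (14 / 15) ^ 300 }'   # about 1e-09
```

That probability is about one in a billion. When any two disks fail, some PG on that pair is practically certain to be lost.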

It follows that CEPH is a good way to build a reliable, scalable storage system for tens of TB from obsolete hardware at minimal cost. Scaling will of course cost money, but little compared with commercial storage systems.

Implementing the cluster

For the experiment, let's take a decommissioned computer: Intel DQ57TM + Intel Core i3 540 + 16 GB of RAM. We will organize four 2 TB disks into something like RAID10. After a successful test, we will add a second node with the same number of disks.

Installing Linux. The distribution needs to be customizable and stable. Debian and Suse meet these requirements. Suse has a more flexible installer that lets you deselect any package. Unfortunately, I could not figure out which packages could be dropped without breaking the system. We install Debian via debootstrap buster. The minbase variant installs a non-working system that lacks drivers. Compared with the full version, the size difference is not big enough to bother with. Since the work is done on a physical machine, I want snapshots, as on virtual machines. Either LVM or btrfs provides them (or xfs, or zfs; the difference is not great). Snapshots are not LVM's strong point, so we install btrfs. The bootloader goes into the MBR. There is no point in cluttering the disk with a 50 MB FAT partition. The bootloader can be squeezed into the 1 MB area next to the partition table, leaving all the remaining space to the system. The installation took 700 MB on disk. I don't remember how much a base SUSE installation takes; I think it was about 1.1 or 1.4 GB.

Installing CEPH. We ignore version 12 in the Debian repository and add version 15.2.3 directly from the Ceph site. We follow the instructions in the “Installing CEPH manually” section, with the following caveats:

  • Before adding the repository, you must install gnupg, wget and ca-certificates.
  • After adding the repository, but before installing the cluster, the instructions skip installing the packages. Run: apt -y --no-install-recommends install ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
  • During installation, CEPH will, for unknown reasons, try to install lvm2. In principle that is no loss, but the lvm2 installation fails, so CEPH does not get installed either.

    This patch helped. It registers a dummy lvm2 package in the dpkg database:

    cat << EOF >> /var/lib/dpkg/status
    Package: lvm2
    Status: install ok installed
    Priority: important
    Section: admin
    Installed-Size: 0
    Maintainer: Debian Adduser Developers <[email protected]>
    Architecture: all
    Multi-Arch: foreign
    Version: 113.118
    Description: No-install
    EOF
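
The patch works because dpkg, and apt on top of it, decide whether a package is installed by reading its stanza in /var/lib/dpkg/status. A stanza with "Status: install ok installed" is enough to satisfy the dependency. On a live system, `dpkg-query -W -f='${Status}\n' lvm2` shows that field. The sketch below reads the field directly with awk's paragraph mode, so the file format is visible. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the Status field of package $1 from a dpkg status file
# ($2, default /var/lib/dpkg/status). Stanzas are separated by blank lines;
# awk paragraph mode (RS="") reads one stanza per record, one line per field.
pkg_status() {
  local pkg=$1 file=${2:-/var/lib/dpkg/status}
  awk -v pkg="$pkg" '
    BEGIN { RS = ""; FS = "\n" }
    {
      name = ""; status = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Package: /) name = substr($i, 10)
        else if ($i ~ /^Status: /) status = substr($i, 9)
      }
      if (name == pkg) print status
    }' "$file"
}

# Demonstration on a throwaway file containing stub stanzas like the patch
tmp=$(mktemp)
cat > "$tmp" << 'EOF'
Package: lvm2
Status: install ok installed
Version: 113.118

Package: sudo
Status: install ok installed
Version: 113.118
EOF
pkg_status lvm2 "$tmp"   # → install ok installed
rm -f "$tmp"
```

Because stanzas are delimited by blank lines, text appended this way must be separated from the previous stanza by a blank line. Otherwise it merges into that stanza.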
    

Cluster overview

ceph-osd is responsible for storing data on disk. For each disk, a network service is started that accepts and executes requests to read or write objects. Two partitions are created on the disk. One of them contains information about the cluster, the disk number and the cluster keys. This 1 KB of information is created once, when the disk is added, and I have never seen it change. The second partition has no file system and stores CEPH's binary data. In previous versions, the automatic installation created a 100 MB xfs partition for the service information. I converted the disk to MBR and allocated only 16 MB, and the service does not complain. I think xfs could be replaced with ext without any problems. This partition is mounted under /var/lib/…. There the service reads information about the OSD and also finds a link to the block device where the binary data is stored. In theory, you could put the auxiliary files directly in /var/lib/… and give the whole disk to data. When an OSD is created with ceph-deploy, a rule is automatically created to mount the partition under /var/lib/…. The ceph user is also granted permission to read the required block device. With a manual installation you have to do this yourself, and the documentation does not mention it. It is also advisable to set the osd memory target parameter so that there is enough physical memory.
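
For the manual case, here is a minimal sketch of the two pieces that have to be set up by hand. The first is an /etc/fstab line that mounts the small metadata partition under /var/lib/ceph/osd/ceph-<id>. The second is a udev rule that hands the raw data partition to the ceph user on every boot. The helper name, the rule layout and the metadata PARTUUID d8cc3da6-01 are assumptions. Only the data partition d8cc3da6-02 comes from the OSD listing later in this article. This is not the exact configuration that ceph-deploy generates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fstab line and the udev rule that a
# manually created OSD needs in order to come back after a reboot.
#   $1 - OSD id
#   $2 - PARTUUID of the small xfs metadata partition (assumed)
#   $3 - PARTUUID of the raw data partition that the "block" link points to
osd_persistence() {
  local id=$1 meta=$2 data=$3
  # /etc/fstab: mount the metadata partition where ceph-osd expects it
  printf 'PARTUUID=%s /var/lib/ceph/osd/ceph-%s xfs defaults,noatime 0 0\n' "$meta" "$id"
  # udev rule: give the ceph user read/write access to the data partition
  printf 'SUBSYSTEM=="block", ENV{ID_PART_ENTRY_UUID}=="%s", OWNER="ceph", GROUP="ceph", MODE="0660"\n' "$data"
}

osd_persistence 0 d8cc3da6-01 d8cc3da6-02
```

Append the first line to /etc/fstab. Save the second line as a file under /etc/udev/rules.d/; the file name is up to you. ID_PART_ENTRY_UUID is the same property udev uses to create the /dev/disk/by-partuuid/ links.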

ceph-mds. At the low level, CEPH is an object store. Block storage comes down to storing each 4 MB block as an object. File storage works on the same principle. Two pools are created, one for metadata and the other for data, and they are combined into a file system. At that moment some kind of record is created. So if you delete the file system but keep both pools, you will not be able to restore it. There is a procedure for extracting files block by block, but I have not tested it. The ceph-mds service is responsible for access to the file system. Each file system needs a separate instance of the service. There is an “index” option that lets you create the semblance of several file systems within one; that was not tested either.
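
To illustrate the “each 4 MB block is an object” point for block devices (RBD), the sketch maps a byte offset inside an image to the RADOS object that stores it. It assumes the default 4 MiB object size and the format-2 naming scheme rbd_data.<image id>.<object number in 16 hex digits>. The image id below is made up. On a real cluster, `rbd info <image>` reports the actual prefix as block_name_prefix.

```shell
#!/usr/bin/env bash
# Map a byte offset in an RBD image to the RADOS object that stores it.
# Assumes format-2 naming, rbd_data.<image_id>.<object number as 16 hex
# digits>, and the default object size of 2^22 bytes = 4 MiB ("order 22").
#   $1 - image id, $2 - byte offset, $3 - object order (default 22)
rbd_object_for_offset() {
  local image_id=$1 offset=$2 order=${3:-22}
  local objno=$(( offset >> order ))               # which 4 MiB object
  local inner=$(( offset & ((1 << order) - 1) ))   # offset inside that object
  printf 'rbd_data.%s.%016x +%d\n' "$image_id" "$objno" "$inner"
}

# Hypothetical image id; 10 GiB + 5 bytes into the image:
rbd_object_for_offset 1014b8b4567 $(( 10 * 1024 * 1024 * 1024 + 5 ))
# → rbd_data.1014b8b4567.0000000000000a00 +5
```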

ceph-mon: this service stores the cluster map. The map includes information about all OSDs, the algorithm for distributing PGs across OSDs and, most importantly, information about all objects. The details of this mechanism are unclear to me. There is a /var/lib/ceph/mon/…/store.db directory containing a large file of 26 MB. With 105K objects in the cluster, that works out to a little over 256 bytes per object. I think the monitor stores a list of all objects and the PGs they reside in. Damage to this directory leads to the loss of all data in the cluster. From this I concluded that CRUSH only describes how PGs are placed on OSDs. How objects are placed in PGs is stored centrally in a database, however much the developers avoid that word. This has two consequences. First, we cannot install the system on a flash drive in read-only mode, because the database is written to constantly. An additional disk is needed for it (hardly more than 1 GB). Second, we need a real-time copy of this database. With several monitors, fault tolerance is provided automatically, but in our case there is only one monitor, two at most. There is a theoretical procedure for recovering a monitor from OSD data. I resorted to it three times for various reasons. All three times there were no error messages, and no data either. Unfortunately, this mechanism does not work. So there are two options. We can run a miniature partition on each OSD and assemble a RAID to store the database, which will surely hurt performance badly. Or we can allocate at least two reliable physical media, preferably USB so as not to occupy ports.
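
The per-object figure above is simple division. Reading 26 MB as MiB gives the “a little over 256 bytes” estimate. On your own cluster, the inputs could come from `du -sb` on the store.db directory and from the OBJECTS column of `ceph df`. The helper name is illustrative.

```shell
#!/usr/bin/env bash
# Average monitor-store bytes per object (integer division).
#   $1 - size of store.db in bytes, $2 - number of objects in the cluster
bytes_per_object() {
  local store_bytes=$1 objects=$2
  echo $(( store_bytes / objects ))
}

bytes_per_object $(( 26 * 1024 * 1024 )) 105000   # → 259
```

Reading 26 MB as 26,000,000 bytes gives 247 instead, so the estimate is only good to within a few percent.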

rados-gw exports the object store over the S3 protocol and similar ones. It creates many pools, for reasons that are unclear. I did not experiment with it much.

ceph-mgr: when this service is installed, several modules are started. One of them is autoscale, which cannot be disabled. It tries to maintain the correct PG/OSD ratio. If you want to control the ratio manually, you can disable scaling for each pool. In that case, however, the module crashes with a division by zero and the cluster status becomes ERROR. The module is written in Python, and commenting out the relevant line in it disables it. I'm too lazy to recall the details.
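
With autoscaling turned off, pg_num has to be chosen by hand. The ceph.conf in the listings below sets osd pool default pg autoscale mode = off, and a single pool can be switched with `ceph osd pool set <pool> pg_autoscale_mode off`. The chosen value is applied with `ceph osd pool set <pool> pg_num <n>`. The sketch uses a commonly cited rule of thumb of about 100 PG copies per OSD. The target value and the function names are assumptions, not what the autoscaler itself computes.

```shell
#!/usr/bin/env bash
# Rule of thumb for choosing pg_num by hand: aim for roughly $3
# (default 100) PG copies per OSD, i.e. osds * target / size,
# rounded up to the next power of two.
#   $1 - number of OSDs, $2 - pool size (replicas), $3 - target per OSD
pg_num_for() {
  local osds=$1 size=$2 target=${3:-100}
  local raw=$(( (osds * target + size - 1) / size )) pg=1
  while (( pg < raw )); do pg=$(( pg * 2 )); done
  echo "$pg"
}

# PG copies each OSD ends up carrying; compare with "mon max pg per osd".
pgs_per_osd() {
  local pg_num=$1 size=$2 osds=$3
  echo $(( pg_num * size / osds ))
}

pg_num_for 4 2       # four OSDs, two replicas → 256
pgs_per_osd 256 2 4  # → 128, well under the 385 set in ceph.conf
```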

List of sources used:

Installing CEPH
Recovering from a complete monitor failure

Script listings:

Installing the system via debootstrap

blkdev=sdb1
mkfs.btrfs -f /dev/$blkdev
mount /dev/$blkdev /mnt
cd /mnt
for i in {@,@var,@home}; do btrfs subvolume create $i; done
mkdir snapshot @/{var,home}
for i in {var,home}; do mount -o bind @${i} @/$i; done
debootstrap buster @ http://deb.debian.org/debian; echo $?
for i in {dev,proc,sys}; do mount -o bind /$i @/$i; done
cp /etc/bash.bashrc @/etc/

chroot /mnt/@ /bin/bash
echo rbd1 > /etc/hostname
passwd
blkdev=sdb1  # set again: variables from the outer shell are not visible inside the chroot
uuid=$(blkid -s UUID -o value /dev/$blkdev)
cat << EOF > /etc/fstab
UUID=$uuid / btrfs noatime,nodiratime,subvol=@ 0 1
UUID=$uuid /var btrfs noatime,nodiratime,subvol=@var 0 2
UUID=$uuid /home btrfs noatime,nodiratime,subvol=@home 0 2
EOF
cat << EOF >> /var/lib/dpkg/status
Package: lvm2
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install

Package: sudo
Status: install ok installed
Priority: important
Section: admin
Installed-Size: 0
Maintainer: Debian Adduser Developers <[email protected]>
Architecture: all
Multi-Arch: foreign
Version: 113.118
Description: No-install
EOF

apt -yq install --no-install-recommends linux-image-amd64 bash-completion ed btrfs-progs grub-pc iproute2 ssh smartmontools ntfs-3g net-tools man
exit
# GRUB goes into the MBR of the whole disk (sdb), not into the partition $blkdev
grub-install --boot-directory=@/boot/ /dev/sdb
init 6

Creating the cluster

apt -yq install --no-install-recommends gnupg wget ca-certificates
echo 'deb https://download.ceph.com/debian-octopus/ buster main' >> /etc/apt/sources.list
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
apt update
apt -yq install --no-install-recommends ceph-common ceph-mon

echo 192.168.11.11 rbd1 >> /etc/hosts
uuid=`cat /proc/sys/kernel/random/uuid`
cat << EOF > /etc/ceph/ceph.conf
[global]
fsid = $uuid
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon allow pool delete = true
mon host = 192.168.11.11
mon initial members = rbd1
mon max pg per osd = 385
osd crush update on start = false
#osd memory target = 2147483648
osd memory target = 1610612736
osd scrub chunk min = 1
osd scrub chunk max = 2
osd scrub sleep = .2
osd pool default pg autoscale mode = off
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 1
osd pool default pgp num = 1
[mon]
mgr initial modules = dashboard
EOF

ceph-authtool --create-keyring ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
cp ceph.client.admin.keyring /etc/ceph/
ceph-authtool --create-keyring bootstrap-osd.ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
cp bootstrap-osd.ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-authtool ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
monmaptool --create --add rbd1 192.168.11.11 --fsid $uuid monmap
mkdir -p /var/lib/ceph/mon/ceph-rbd1
rm -rf /var/lib/ceph/mon/ceph-rbd1/*
ceph-mon --mkfs -i rbd1 --monmap monmap --keyring ceph.mon.keyring
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-mon@rbd1
systemctl start ceph-mon@rbd1
ceph mon enable-msgr2
ceph status

# dashboard

apt -yq install --no-install-recommends ceph-mgr ceph-mgr-dashboard python3-distutils python3-yaml
mkdir /var/lib/ceph/mgr/ceph-rbd1
ceph auth get-or-create mgr.rbd1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-rbd1/keyring
systemctl enable ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1
ceph config set mgr mgr/dashboard/ssl false
ceph config set mgr mgr/dashboard/server_port 7000
ceph dashboard ac-user-create root 1111115 administrator
systemctl stop ceph-mgr@rbd1
systemctl start ceph-mgr@rbd1

Adding an OSD (partial)

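# xfsprogs provides mkfs.xfs, which the base system installed above lacks
apt -yq install --no-install-recommends ceph-osd xfsprogs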

osdnum=`ceph osd create`
mkdir -p /var/lib/ceph/osd/ceph-$osdnum
mkfs -t xfs /dev/sda1
mount -t xfs /dev/sda1 /var/lib/ceph/osd/ceph-$osdnum
cd /var/lib/ceph/osd/ceph-$osdnum
ceph auth get-or-create osd.$osdnum mon 'profile osd' mgr 'profile osd' osd 'allow *' > /var/lib/ceph/osd/ceph-$osdnum/keyring
ln -s /dev/disk/by-partuuid/d8cc3da6-02  block
ceph-osd -i $osdnum --mkfs
#chown ceph:ceph /dev/sd?2
chown ceph:ceph -R /var/lib/ceph
systemctl enable ceph-osd@$osdnum
systemctl start ceph-osd@$osdnum

Conclusions

The main marketing advantage of CEPH is CRUSH, an algorithm for computing where data is located. Monitors distribute this algorithm to clients, and clients then query the required node and the required OSD directly. CRUSH ensures there is no centralization. It is a small file that you could even print out and hang on the wall. Practice has shown that CRUSH is not an exhaustive map. Suppose you destroy and recreate the monitors while keeping all OSDs and CRUSH: that is not enough to restore the cluster. From this I concluded that each monitor stores some metadata about the whole cluster. The volume of this metadata is small and places no limits on cluster size, but the metadata must be kept safe. That rules out saving a disk by installing the system on a flash drive, and it rules out clusters with fewer than three nodes. The developer's policy on optional features is aggressive, and the design is far from minimalism. The documentation is at the level of “thanks for what there is, but it is very, very sparse.” Low-level interaction with the services is possible in principle, but the documentation touches on the topic only superficially, so the answer is more no than yes. There is practically no chance of recovering data after a failure.
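
The core idea of “computing the location instead of looking it up” can be illustrated with a sketch. It uses plain rendezvous (highest-random-weight) hashing as a simplified stand-in, and it is not the CRUSH algorithm. Every client that knows the object name and the list of disks computes the same placement, with no central table. The function name and the use of cksum as the hash are illustrative choices.

```shell
#!/usr/bin/env bash
# Rendezvous (highest-random-weight) hashing: a simplified stand-in for the
# idea behind CRUSH, NOT the CRUSH algorithm itself. The placement depends
# only on the object name and the disk list, so any client can compute it.
#   $1 - object name, $2 - replica count N, remaining args - disk names
place() {
  local obj=$1 n=$2
  shift 2
  local disk score
  for disk in "$@"; do
    # deterministic 32-bit CRC of "object/disk" serves as this disk's score
    score=$(printf '%s/%s' "$obj" "$disk" | cksum | cut -d ' ' -f 1)
    printf '%s %s\n' "$score" "$disk"
  done | sort -k1,1nr -k2,2 | head -n "$n" | cut -d ' ' -f 2 | paste -s -d ' ' -
}

place rbd_data.abc.0000000000000a00 2 osd.0 osd.1 osd.2 osd.3
```

Each disk's score depends only on the object and that disk. Removing a disk that was not chosen therefore leaves the placement unchanged, and removing a chosen disk moves only the objects that lived on it. CRUSH's default straw2 buckets build on a similar per-item draw, adding weights, a host/rack hierarchy and placement rules.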

Options for further action:
  • abandon CEPH and use a plain multi-disk btrfs (or xfs, zfs);
  • find new information about CEPH that would allow running it under the conditions described;
  • try writing my own storage system as a way to improve my skills.

Source: www.habr.com
