Lots of free RAM, an NVMe Intel P4500, and everything is painfully slow - the story of an unsuccessfully added swap partition

In this article I want to talk about a situation that recently occurred with one of the servers in our VPS cloud and left me stumped for several hours. I have been configuring and troubleshooting Linux servers for about 15 years, but this case does not fit into my practice at all: I made several false assumptions and got slightly desperate before I managed to correctly determine the cause of the problem and solve it.

Preamble

We run a medium-sized cloud, which we build on standard servers with the following configuration: 32 cores, 256 GB of RAM and a 4 TB PCI-E NVMe Intel P4500 drive. We really like this configuration because it removes the need to worry about IO oversubscription, since the proper limits are enforced at the level of the VM instance type. And since the NVMe Intel P4500 has impressive performance, we can simultaneously provide both full IOPS provisioning to the machines and backup storage to a backup server with zero IOWAIT.

We are among the old believers who do not use hyperconverged SDN and other stylish, fashionable, youthful things for storing VM volumes, believing that the simpler the system, the easier it is to fix it under the conditions of "the main guru has gone off to the mountains." As a result, we store VM volumes in QCOW2 format on XFS or EXT4, deployed on top of LVM2.

We are also forced to use QCOW2 by the product we use for orchestration - Apache CloudStack.

To make a backup, we take a full image of the volume as an LVM2 snapshot (yes, we know that LVM2 snapshots are slow, but the Intel P4500 helps us out here too). We run lvcreate -s .. and, with the help of dd, send the backup copy to a remote server with ZFS storage. Here we are still slightly progressive: after all, ZFS can store the data in compressed form, and we can restore it quickly with dd or pull out an individual VM volume using mount -o loop ....
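
As an illustration of that last point, here is a minimal sketch of how a single QCOW2 image could be pulled out of such a raw backup on the backup server. The file name, mount point and VM disk name are hypothetical; only the mount -o loop idea comes from the description above.

# Hypothetical backup file name; the naming scheme follows the script shown below
BACKUP=/mnt/backups/volumes/hv01-17.raw
mkdir -p /mnt/restore

# Mount the raw LVM2 volume image read-only through a loop device
mount -o loop,ro "$BACKUP" /mnt/restore

# Copy out just the VM disk we need instead of restoring the whole volume
cp /mnt/restore/example-vm-disk.qcow2 /tmp/
umount /mnt/restore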

You can, of course, skip taking a full image of the LVM2 volume and instead mount the filesystem in RO mode and copy the QCOW2 images themselves; however, we ran into the fact that XFS gets corrupted by this, and not immediately, but in an unpredictable way. We really do not like it when hypervisor hosts suddenly "hang" on weekends, at night, or on holidays because of errors that give no hint as to when they will occur. Therefore, for XFS we do not use snapshot mounting in RO mode to extract individual volumes; we simply copy the entire LVM2 volume.

The speed of backing up to the backup server is, in our case, determined by the performance of the backup server, which is about 600-800 MB/s for incompressible data; a further limiter is the 10 Gbit/s channel over which the backup server is connected to the cluster.

At the same time, backup copies from 8 hypervisor servers are uploaded simultaneously to a single backup server. Thus, the disk and network subsystems of the backup server, being slower, do not let the disk subsystems of the hypervisor hosts become overloaded, since they are simply unable to process, say, 8 GB/s, which the hypervisor hosts can easily produce.

The copying process described above is very important for the rest of the story, including its details: the use of a fast Intel P4500 drive, the use of NFS and, presumably, the use of ZFS.

The backup story

On each hypervisor node we have a small SWAP partition of 8 GB, and we "roll out" the hypervisor node itself with dd from a reference image. For the system volume on the servers we use 2xSATA SSD RAID1 or 2xSAS HDD RAID1 on LSI or HP hardware controllers. In general, we do not care at all what is underneath, since our system volume operates in "almost read-only" mode, except for SWAP. And since we have a lot of RAM on the server and 30-40% of it is free, we do not think about SWAP.

The backup process. This job looks something like this:

#!/bin/bash

mkdir -p /mnt/backups/volumes

DIR=/mnt/images-snap
VOL=images/volume
DATE=$(date "+%d")
HOSTNAME=$(hostname)

# Snapshot the volume, giving the copy-on-write area all remaining free space in the VG
lvcreate -s -n $VOL-snap -l100%FREE $VOL
# Stream the snapshot to the NFS-mounted backup directory, reading with direct IO
ionice -c3 dd iflag=direct if=/dev/$VOL-snap bs=1M of=/mnt/backups/volumes/$HOSTNAME-$DATE.raw
# Drop the snapshot once the copy is finished
lvremove -f $VOL-snap

Pay attention to ionice -c3: this thing is, in fact, completely useless for NVMe devices, since the IO scheduler for them is set as:

cat /sys/block/nvme0n1/queue/scheduler
[none] 

However, we have quite a few legacy nodes with conventional SSD RAID, for which this is relevant, so the script travels along AS IS. Overall, it is simply an interesting piece of code that illustrates the futility of ionice with such a configuration.

Also note the iflag=direct flag for dd. We use direct IO, bypassing the page cache, to avoid needlessly displacing IO buffers while reading. However, we do not use oflag=direct because we ran into ZFS performance problems when using it.

We used this scheme successfully for several years without any problems.

And then it started... We discovered that one of the nodes had stopped being backed up, and that the previous backup run was plodding along with a monstrous IOWAIT of 50%. While trying to understand why the copy was not happening, we ran into the following:

Volume group "images" not found

We started thinking "the Intel P4500 has reached its end"; however, before shutting the server down to replace the drive, we still had to make a backup. We fixed LVM2 by restoring the metadata from an LVM2 backup:

vgcfgrestore images
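
For completeness, a sketch of the metadata recovery steps, assuming the standard LVM2 metadata archive under /etc/lvm; the archive file name below is illustrative:

# List the metadata versions that LVM2 has archived for the volume group
vgcfgrestore --list images

# Restore a specific archived version (file name is illustrative)
vgcfgrestore -f /etc/lvm/archive/images_00042-1234567890.vg images

# Re-activate the logical volumes of the recovered volume group
vgchange -ay images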

We launched the backup and saw the following picture:
[image]

We became very sad again: it was obvious that we could not live like this, since all the VPSs would suffer, which means we would suffer too. What was happening was completely unclear - iostat showed pitiful IOPS and sky-high IOWAIT. There were no ideas other than "let's replace the NVMe," but the realization arrived just in time.
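
In hindsight, a per-device view would have pointed at the real culprit much sooner. A sketch of that check (the interpretation in the comment anticipates the explanation below):

# Extended per-device statistics, refreshed every second
iostat -x 1
# With the workload described below, the high await/%util shows up on the
# system RAID device that holds SWAP, while the NVMe itself stays almost idle.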

Step-by-step analysis of the situation

The event history. A few days earlier, a large VPS with 128 GB of RAM had to be created on this very server. There seemed to be enough memory, but to be on the safe side we allocated another 32 GB for a swap partition. The VPS was created, the task was completed successfully, the incident was forgotten, but the SWAP partition remained.
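
The article does not show the exact commands used; a typical sequence for adding such a partition might look like this (device name hypothetical):

# Format the freshly carved-out 32 GB partition as swap and enable it
mkswap /dev/sdb3        # /dev/sdb3 is a hypothetical device name
swapon /dev/sdb3
swapon --show           # verify: the node now has both the 8 GB and 32 GB swap areas active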

Configuration details. On all cloud servers the vm.swappiness parameter was set to the default of 60. And the SWAP was created on the SAS HDD RAID1.

What happened (according to the editors). While backing up, dd generated a large amount of write data, which was placed in RAM buffers before being written out to NFS. The kernel, guided by the swappiness policy, moved many pages of VPS memory to the swap area, which lived on the slow HDD RAID1 volume. This caused IOWAIT to grow enormously, not because of IO to the NVMe, but because of IO to the HDD RAID1.
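
A quick way to confirm this behaviour while it is happening is to watch the swap-in/swap-out counters; a sketch:

# si/so columns show swap-in/swap-out traffic in KiB/s; sustained non-zero 'so'
# during the dd run is the signature of the behaviour described above
vmstat 1 5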

How the problem was solved. The 32 GB swap partition was disabled. This took 16 hours; you can read separately about how and why SWAP switches off so slowly. The swappiness setting was changed to a value of 5 across the entire cloud.
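
The exact commands are not given in the article; the fix boils down to something like the following (device name hypothetical, sysctl.d file name illustrative):

# Detach the 32 GB swap partition; every in-use page must be read back from
# the slow HDD RAID1 first, which is why this step takes many hours
swapoff /dev/sdb3

# Lower swappiness immediately on the running system...
sysctl -w vm.swappiness=5

# ...and persist it across reboots
echo 'vm.swappiness = 5' > /etc/sysctl.d/99-swappiness.conf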

How could this have been avoided? Firstly, if the SWAP had been on an SSD RAID or NVMe device; secondly, if there had been no NVMe device but a slower one that would not have produced such a volume of data. Paradoxically, the problem happened because the NVMe is too fast.

After that, everything started working as before, with zero IOWAIT.

source: www.habr.com
