FAQ on the architecture and operation of VKontakte

The story of how VKontakte was created is on Wikipedia, and Pavel has told it himself. It seems everyone already knows it. Pavel talked about the internals, architecture and structure of the site at HighLoad++ back in 2010. A lot of servers have gone under the bridge since then, so we will update the picture: open it up, pull out the insides, weigh them, and look at how VK works from a technical point of view.

Alexei Akulovich (AterCattus) is a backend developer on the VKontakte team. This transcript of his talk is a collective answer to frequently asked questions about how the platform runs, its infrastructure, its servers and how they interact: not about development, but about the hardware side. Separately, it covers databases and what VK uses instead of them, log collection, and monitoring of the whole project. Details are below.



For more than four years I have been working on all sorts of backend tasks.

  • Uploading, storing, processing and distributing media: video, live streaming, audio, photos, documents.
  • Infrastructure, platform, developer-facing monitoring, logs, regional caches, CDN, a proprietary RPC protocol.
  • Integration with external services: push notifications, external link parsing, RSS feeds.
  • Helping colleagues with assorted questions whose answers require diving into unfamiliar code.

During this time I have had a hand in many parts of the site. I want to share that experience.

General architecture

Everything, as usual, starts with a server or a group of servers that accept requests.

Front servers

A front server accepts requests over HTTPS, RTMP and WSS.

HTTPS carries requests for the main and mobile web versions of the site, vk.com and m.vk.com, and for the other official and unofficial clients of our API: mobile clients and messengers. We also accept RTMP traffic for live broadcasts on separate front servers, and WSS connections for the Streaming API.

For HTTPS and WSS the servers run nginx. For RTMP broadcasts we recently switched to our own solution, kive, but that is beyond the scope of this talk. For fault tolerance these servers announce shared IP addresses and work in groups, so that a problem on one server does not lose user requests. For HTTPS and WSS these same servers terminate the encryption, taking part of the CPU load on themselves.

From here on we will not talk about WSS and RTMP, only about standard HTTPS requests, which are what people usually associate with a web project.

Backend

Behind the fronts there are usually backend servers. They process the requests that a front server receives from clients.

These are kPHP servers running an HTTP daemon, since HTTPS has already been decrypted. kPHP is a server that works on the prefork model: it starts a master process and a batch of child processes, hands them the listening sockets, and they process requests. The processes are not restarted between user requests; they simply reset their state to the original zero-value state, request after request.

Load distribution

Our backends are not one huge pool of machines that can each process any request. We split them into separate groups: general, mobile, api, video, staging... A problem on one group of machines does not affect the others. If video has problems, a user listening to music will not even notice. Which backend to send a request to is decided by nginx on the front according to its config.
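
As a rough illustration of routing by config, here is a minimal Python sketch. The group names come from the talk; the matching rules, the api.vk.com host check and the function name pick_group are invented for the example and are not VK's actual nginx configuration.

```python
# Hypothetical routing table: the first matching rule decides the backend group.
# Group names are from the talk; the rules themselves are made up.
ROUTES = [
    (lambda host, path: host == "m.vk.com", "mobile"),
    (lambda host, path: host == "api.vk.com", "api"),
    (lambda host, path: path.startswith("/video"), "video"),
]
DEFAULT_GROUP = "general"


def pick_group(host: str, path: str) -> str:
    """Return the backend group that should handle this request."""
    for matches, group in ROUTES:
        if matches(host, path):
            return group
    return DEFAULT_GROUP
```

Because each group has its own machines, a broken rule or an overloaded group only affects the requests routed to it.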

Collecting metrics and rebalancing

To understand how many machines each group needs, we do not rely on QPS. The backends differ, their requests differ, and each request has a different cost, so QPS is a poor measure. That is why we work with the notion of load on the server as a whole: CPU and perf.

We have thousands of such servers. Each physical server runs a group of kPHP processes to make use of all the cores (because kPHP is single-threaded).

Content Server

A CS, or Content Server, is storage. A CS is a server that stores files and also processes uploaded files, plus all sorts of background synchronous tasks that the main web frontend assigns to it.

We have tens of thousands of physical servers that store files. Users love to upload files, and we love to store and share them. Some of these servers are covered by special pu/pp servers.

pu/pp

If you have opened the network tab in VK, you have seen pu/pp.

What is pu/pp? If we put one server behind another, there are two ways to upload a file to, and download it from, the covered server: directly via http://cs100500.userapi.com/path, or through an intermediate server via http://pu.vk.com/c100500/path.

pu is the historical name for photo upload, and pp is photo proxy. That is, one server is for uploading photos and the other for serving them. These days it is not only photos that are uploaded, but the name has stuck.

These servers terminate HTTPS sessions to take the CPU load off the storage. Also, since user files are processed on the storage servers, the less sensitive information those machines hold, the better. HTTPS encryption keys, for example.

Since the machines are covered by our other machines, we can avoid giving them "white" external IPs and give them "gray" ones instead. This way we save on the IP pool and are guaranteed to protect the machines from outside access: there is simply no IP through which to reach them.

Fault tolerance through shared IPs. For fault tolerance the scheme works the same way: several physical servers share a common physical IP, and the hardware in front of them chooses where to send a request. I will talk about other options later.

The debatable point is that in this case the client keeps fewer connections. If several machines share one IP and one host, pu.vk.com or pp.vk.com, the client's browser limits the number of simultaneous requests to a single host. But in the age of ubiquitous HTTP/2 I believe this no longer matters much.

The obvious downside of the scheme is that all traffic going to the storage has to be pumped through one more server. Since we pump traffic through machines, we cannot yet push heavy traffic, such as video, through the same scheme. We serve that directly: a separate direct connection to separate storage dedicated to video. Lighter content goes through the proxy.

Not long ago we got an improved version of the proxy. Now I will explain how it differs from the usual ones and why it was needed.

Sun

In September 2017 Oracle, which had previously bought Sun, laid off a large number of Sun employees. You could say that at that moment the company ceased to exist. When choosing a name for the new system, our admins decided to pay tribute to the memory of that company and named the new system Sun. Among ourselves we simply call them "suns".

pp had a few problems. One IP per group means an inefficient cache. Several physical servers share a common IP address, and there is no way to control which server a request lands on. So if different users come for the same file and these servers have a cache, the file ends up in the cache of every server. That is a very inefficient scheme, but nothing could be done about it.

As a consequence, we cannot shard content, because we cannot pick a specific server in the group: they share an IP. Also, for some internal reasons, we could not install such servers in the regions. They stood only in St. Petersburg.

With the suns we changed the selection scheme. We now have anycast routing: dynamic routing, anycast, and a self-check daemon. Each server has its own individual IP, but a shared subnet. Everything is set up so that if one server fails, its traffic is automatically spread across the other servers of the same group. Now it is possible to pick a specific server, there is no redundant caching, and reliability is not affected.

Weight support. We can now install machines of different capacity as needed and, in case of temporary problems, change the weights of the working suns to reduce their load so they can "rest" and get back to work.

Sharding by content id. A funny thing about sharding: we usually shard content so that different users fetch the same file through the same sun and share a common cache.

We recently launched the Clover app. It is an online quiz in a live broadcast, where the host asks questions and users answer in real time by choosing options. The app has a chat where users can talk. More than 100 thousand people can be connected to a broadcast at the same time. They all write messages that are sent to all participants, and an avatar comes along with each message. If 100 thousand people come for one avatar to one sun, it can sometimes roll behind a cloud.

To withstand bursts of requests for the same file, for this particular type of content we switch on a dumb scheme that spreads files across all the available suns in the region.
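
The talk does not say which algorithm picks a sun for a content id. One way to get both properties described above (the same file always goes to the same server, and weights shift load) is weighted rendezvous hashing; the sketch below assumes that technique, and the names pick_sun and pick_sun_spread are hypothetical.

```python
import hashlib
import math
import random


def _unit_hash(key: str) -> float:
    """Map a string to a float strictly inside (0, 1)."""
    digest = hashlib.md5(key.encode()).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (n + 0.5) / 2**52


def pick_sun(content_id: str, weights: dict) -> str:
    """Weighted rendezvous hashing: one content id always maps to one server.

    A server with weight 0 is skipped, which models letting it "rest".
    """
    live = {server: w for server, w in weights.items() if w > 0}
    return max(
        live,
        key=lambda s: -live[s] / math.log(_unit_hash(f"{s}:{content_id}")),
    )


def pick_sun_spread(weights: dict, rng=random) -> str:
    """The "dumb" scheme for hot content: any live server, chosen at random."""
    return rng.choice([server for server, w in weights.items() if w > 0])
```

With this choice, taking one server out of rotation only remaps the files that server owned; every other file keeps its cache location.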

Sun from the inside

A reverse proxy on nginx, with the cache either in RAM or on fast Optane/NVMe disks. Example: http://sun4-2.userapi.com/c100500/path is a link to a sun located in the fourth region, second server group. It covers the file path, which physically lives on server 100500.
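
The naming convention in that example link can be decoded mechanically. This is a small sketch based only on the single URL above; the regular expressions and the function name parse_sun_url are assumptions, not a documented format.

```python
import re
from urllib.parse import urlparse

# Patterns inferred from the one example URL in the text.
SUN_HOST = re.compile(r"^sun(\d+)-(\d+)\.userapi\.com$")
STORAGE_PATH = re.compile(r"^/c(\d+)/(.*)$")


def parse_sun_url(url: str) -> dict:
    """Split a sun URL into region, server group, storage server and file path."""
    parts = urlparse(url)
    host = SUN_HOST.match(parts.hostname or "")
    path = STORAGE_PATH.match(parts.path)
    if not host or not path:
        raise ValueError(f"not a sun URL: {url}")
    return {
        "region": int(host.group(1)),
        "group": int(host.group(2)),
        "storage": int(path.group(1)),
        "file_path": path.group(2),
    }
```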

Cache

We add one more node to our architecture diagram: the caching layer.

Below is the layout of the regional caches; there are about 20 of them. These are the places where caches and suns sit and can cache traffic through themselves.

This is caching of multimedia content; no user data is stored here, only music, video and photos.

To determine a user's region, we collect the BGP network prefixes announced in the regions. As a fallback we also have to parse a geoip database if we cannot find the IP by prefix. We determine the region from the user's IP. In code, we can look at one or several regions for a user: the points they are geographically closest to.
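
A minimal sketch of that lookup, assuming a longest-prefix match over the collected prefixes and a geoip fallback. The prefixes are documentation address ranges, and the region names and the geoip_lookup stub are invented for the example.

```python
import ipaddress

# Invented sample data: documentation ranges, not real BGP announcements.
REGION_PREFIXES = {
    "192.0.2.0/24": "region-a",
    "198.51.100.0/24": "region-b",
    "198.51.100.128/25": "region-c",
}

# Most specific prefixes first, so the first hit is the longest match.
_NETS = sorted(
    ((ipaddress.ip_network(prefix), region)
     for prefix, region in REGION_PREFIXES.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def geoip_lookup(ip: str) -> str:
    """Stand-in for a geoip database query (hypothetical)."""
    return "region-default"


def region_for(ip: str) -> str:
    """Resolve a user's region by announced prefix, falling back to geoip."""
    addr = ipaddress.ip_address(ip)
    for net, region in _NETS:
        if addr in net:
            return region
    return geoip_lookup(ip)
```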

How does it work?

We count file popularity per region. There is the number of the regional cache where the user is located and the file identifier; we take this pair and increment its rating on every download.

Meanwhile, daemons (services in the regions) come to the API from time to time and say: "I am such-and-such cache, give me a list of the most popular files in my region that I do not have yet." The API returns a batch of files sorted by rating, and the daemon downloads them, brings them to its region and serves them from there. This is the fundamental difference between pu/pp and Sun on the one hand and caches on the other: the former pass a file through themselves immediately, even if it is not in their cache, whereas a cache first downloads the file to itself and only then starts serving it.
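
The counting and the daemon's request can be sketched as follows. The in-memory Counter and the function names record_download and top_missing are illustrative assumptions; the talk does not describe the real storage or API.

```python
from collections import Counter

# (region_id, file_id) -> number of downloads seen so far.
downloads = Counter()


def record_download(region_id: int, file_id: str) -> None:
    """Bump the rating of a file in the region the user came from."""
    downloads[(region_id, file_id)] += 1


def top_missing(region_id: int, already_cached: set, limit: int) -> list:
    """Most popular files of a region that its cache does not hold yet."""
    ranked = sorted(
        ((count, file_id)
         for (region, file_id), count in downloads.items()
         if region == region_id and file_id not in already_cached),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [file_id for _, file_id in ranked[:limit]]
```

A regional daemon would call something like top_missing periodically, download the returned files, and only then start serving them.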

This way we get content closer to users and spread the network load. For example, from the Moscow cache alone we serve more than 1 Tbit/s at peak hours.

But there are problems: cache servers are not made of rubber. For super-popular content, one server's network link is sometimes not enough. Our cache servers have 40-50 Gbit/s links, but some content saturates such a link completely. We are moving towards storing more than one copy of popular files in a region. I hope we will implement it by the end of the year.

We have looked at the general architecture.

  • Front servers that accept requests.
  • Backends that process requests.
  • Storage covered by two kinds of proxies.
  • Regional caches.

What is missing from this diagram? The databases where we store data, of course.

Databases or engines

We call them not databases but engines, because we have practically no databases in the commonly accepted sense.

This was a forced measure. It happened because in 2008-2009, when VK's popularity was growing explosively, the project ran entirely on MySQL and Memcache, and there were problems. MySQL liked to crash and corrupt files, after which it would not come back up, and Memcache gradually degraded in performance and had to be restarted.

It turns out that an increasingly popular project had persistent storage that corrupted data and a cache that slowed down. Under such conditions it is hard to develop a growing project. It was decided to try rewriting the critical things the project relied on as our own homegrown solutions.

The decision worked out. There was both the opportunity to do it and a dire need, because no other ways to scale existed at the time. There was no pile of databases, NoSQL did not exist yet, there were only MySQL, Memcache, PostgreSQL, and that was it.

Uniform operation. Our team of C developers led the development, and everything was done in a consistent way. Regardless of the engine, they all had roughly the same on-disk file format, the same launch parameters, the same signal handling, and roughly the same behaviour in edge cases and failures. As the number of engines grew, this made the system convenient for admins to operate: there is no zoo to maintain and no need to learn afresh how to run every new third-party database, which made it possible to grow the number of engines quickly and conveniently.

Types of engines

The team wrote quite a few engines. Here are just some of them: friend, hints, image, ipdb, letters, lists, logs, memcached, meowdb, news, nostradamus, photo, playlists, pmemcached, sandbox, search, storage, likes, tasks, …

For every task that needs a specific data structure or handles atypical requests, the C team writes a new engine. Why not.

We have a separate memcached engine, which is similar to the regular one but with a bunch of goodies, and which does not slow down. Not ClickHouse, but it works too. There is a separate pmemcached, a persistent memcached that can also store data on disk, and more of it than fits in RAM, so that data is not lost on restart. There are various engines for individual tasks: queues, lists, sets, everything our project needs.

Clusters

From the code's point of view, there is no need to think of engines or databases as processes, entities or instances. The code works with clusters, with groups of engines, one type per cluster. Say there is a memcached cluster: it is just a group of machines.

The code does not need to know the physical location, size or number of servers at all. It goes to a cluster by a certain identifier.

For this to work, we need to add one more entity that sits between the code and the engines: a proxy.

RPC proxy

The proxy is a connecting bus on which almost the entire site runs. At the same time we have no service discovery; instead there is a config for this proxy, which knows the location of all clusters and all shards of each cluster. This is what the admins take care of.

Programmers do not care at all how many there are, where they are or what they cost; they just go to the cluster. This allows us a lot. On receiving a request, the proxy forwards it, knowing where to; it determines this itself.

Here the proxy is also a point of protection against service failure. If some engine slows down or crashes, the proxy understands this and responds accordingly to the client side. This lets us remove the timeout: the code does not wait for the engine to respond, but understands that it is not working and that it has to behave differently. The code must be prepared for the fact that databases do not always work.

Specific implementations

Sometimes we still really want to have some non-standard solution as an engine. In that case it was decided not to use our ready-made rpc-proxy, built specifically for our engines, but to make a separate proxy for the task.

For MySQL, which we still have here and there, we use db-proxy, and for ClickHouse, KittenHouse.

In general it works like this. There is a server running kPHP, Go, Python, or any other code that can speak our RPC protocol. The code talks locally to an RPC proxy: every server that runs code also runs its own local proxy. On a request, the proxy works out where to go.

If one engine wants to reach another, even a neighbour, it goes through the proxy, because the neighbour may be in a different data center. An engine should not depend on knowing the location of anything but itself; that is our standard solution. But of course there are exceptions :)

An example of the TL scheme that all engines work with.

memcache.not_found = memcache.Value;
memcache.strvalue value:string flags:int = memcache.Value;
memcache.addOrIncr key:string flags:int delay:int value:long = memcache.Value;

tasks.task
    fields_mask:#
    flags:int
    tag:%(Vector int)
    data:string
    id:fields_mask.0?long
    retries:fields_mask.1?int
    scheduled_time:fields_mask.2?int
    deadline:fields_mask.3?int
    = tasks.Task;
 
tasks.addTask type_name:string queue_id:%(Vector int) task:%tasks.Task = Long;

This is a binary protocol whose closest analogue is protobuf. The scheme describes optional fields, complex types (extensions of the built-in scalars), and queries. Everything works over this protocol.
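
To show the idea behind the fields_mask lines in the scheme above, here is a simplified Python sketch of optional fields guarded by bits of a mask. It is not the real TL wire format: the little-endian layout and the helper names encode_optional and decode_optional are assumptions made for illustration, and the required fields of tasks.task are left out.

```python
import struct

# (field name, bit in fields_mask, struct format); bit numbers follow the
# tasks.task scheme above, the byte layout is an assumption.
OPTIONAL_FIELDS = [
    ("id", 0, "<q"),
    ("retries", 1, "<i"),
    ("scheduled_time", 2, "<i"),
    ("deadline", 3, "<i"),
]


def encode_optional(task: dict) -> bytes:
    """Write the mask first, then only the fields whose bit is set."""
    mask, body = 0, b""
    for name, bit, fmt in OPTIONAL_FIELDS:
        if name in task:
            mask |= 1 << bit
            body += struct.pack(fmt, task[name])
    return struct.pack("<I", mask) + body


def decode_optional(data: bytes) -> dict:
    """Read the mask, then read exactly the fields it announces."""
    (mask,) = struct.unpack_from("<I", data, 0)
    offset, task = 4, {}
    for name, bit, fmt in OPTIONAL_FIELDS:
        if mask & (1 << bit):
            (task[name],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
    return task
```

Absent fields cost no bytes on the wire, which is the point of putting the mask in front.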

RPC over TL over TCP/UDP… UDP?

We have an RPC protocol for executing engine requests that runs on top of the TL scheme. All of this works over a TCP/UDP connection. TCP is understandable, but why do we often need UDP?

UDP helps avoid the problem of a huge number of connections between servers. If every server has an RPC proxy and, in general, it can go to any engine, you end up with tens of thousands of TCP connections per server. There is load, but it is useless. With UDP this problem does not exist.

No redundant TCP handshake. This is a typical problem: when a new engine or a new server comes up, many TCP connections are established at once. For small lightweight requests, for example a UDP payload, all communication between the code and the engine is two UDP packets: one flies one way, the second flies back. One round trip, and the code has its answer from the engine without a handshake.

Yes, this all works only with a very small percentage of packet loss. The protocol supports retransmits and timeouts, but if we lose a lot, we end up with practically TCP, which is not worth it. We do not send UDP across oceans.

We have thousands of such servers, and the scheme is the same: a pack of engines is installed on each physical server. Most of them are single-threaded so that they run as fast as possible without locks, and they are sharded as single-threaded solutions. At the same time, we have nothing more reliable than these engines, and a great deal of attention is paid to persistent data storage.

Persistent data storage

Engines write binlogs. A binlog is a file to the end of which an event describing a change of state or data is appended. Different solutions call it different things: binary log, WAL, AOF, but the principle is the same.

So that an engine does not have to re-read the whole binlog of many years on restart, engines write snapshots: the current state. When needed, they first read the snapshot and then finish reading from the binlog. All binlogs are written in the same binary format, according to the TL scheme, so that admins can administer them uniformly with their own tools. Snapshots have no such requirement. There is a common header that says whose snapshot it is (an int, the engine's magic), and the body matters to nobody else. That is the problem of the engine that wrote the snapshot.

I will quickly describe how it works. There is a server running an engine. It opens a new empty binlog for writing and writes a change event to it.

At some point it either decides to take a snapshot itself or receives a signal. The server creates a new file, writes its entire state into it, appends the current binlog size (the offset) to the end of the file, and continues writing. No new binlog is created.

At some point, when the engine restarts, there will be both a binlog and a snapshot on disk. The engine reads the whole snapshot and brings up its state as of a certain moment.

It reads the position that was current when the snapshot was created, along with the binlog size.

It reads the tail of the binlog to get the current state and continues writing further events. This is a simple scheme; all our engines work according to it.
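
The snapshot-plus-binlog cycle just described can be modelled in a few lines. This toy engine keeps everything in memory and uses JSON for the snapshot body; the class name CounterEngine and the counter semantics are invented for the example and say nothing about the real on-disk formats.

```python
import json


class CounterEngine:
    """Toy key -> counter engine persisted as a snapshot plus a binlog tail."""

    def __init__(self):
        self.state = {}
        self.binlog = []      # append-only event list, stands in for a file
        self.snapshot = None  # (serialized state, binlog offset at that time)

    def incr(self, key: str, delta: int) -> None:
        self.binlog.append((key, delta))  # record the change event first
        self._apply(key, delta)

    def _apply(self, key: str, delta: int) -> None:
        self.state[key] = self.state.get(key, 0) + delta

    def take_snapshot(self) -> None:
        """Dump the full state together with the current binlog offset."""
        self.snapshot = (json.dumps(self.state), len(self.binlog))

    def restart(self) -> None:
        """Load the snapshot, then replay only the binlog tail after its offset."""
        self.state, offset = {}, 0
        if self.snapshot is not None:
            dumped, offset = self.snapshot
            self.state = json.loads(dumped)
        for key, delta in self.binlog[offset:]:
            self._apply(key, delta)
```

The stored offset is what makes the restart cheap: events before it are already reflected in the snapshot and are never read again.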

Data replication

As a result, our data replication is statement-based: we write to the binlog not page changes but the change requests themselves. It is very similar to what arrives over the network, only slightly modified.

The same scheme is used not only for replication but also for creating backups. We have an engine, the writing master, which writes to the binlog. The binlog is copied to wherever the admins configure, and that is it: we have a backup.

If a read replica is needed to reduce the read load on the CPU, we simply start a reading engine that reads the tail of the binlog and executes those commands locally.

The lag here is very small, and it is possible to find out how far a replica is behind the master.

Data sharding in the RPC proxy

How does sharding work? How does the proxy understand which cluster shard to send to? The code does not say "send to shard 15!"; no, the proxy does that.

The simplest scheme is firstint, the first number in the request.

get(photo100_500) => 100 % N.

This is an example for a simple memcached text protocol, but of course requests can be complex and structured. The example takes the first number in the request and the remainder of dividing it by the cluster size.

This is useful when we want data locality for a single entity. Say 100 is a user or group ID, and we want all the data of one entity to live on one shard for complex queries.

If we do not care how requests are spread across the cluster, there is another option: hashing the whole key.

hash(photo100_500) => 3539886280 % N

We likewise get a hash, the remainder of the division, and the shard number.

Both of these options work only if we are prepared for the fact that when we grow the cluster, we will split it or grow it by a multiple. For example, we had 16 shards, that is not enough, we want more: we can safely get 32 without downtime. If we want to grow by a non-multiple, there will be downtime, because we cannot split everything cleanly without losses. These options are useful, but not always.
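
Both schemes fit in a few lines of Python. The CRC32 hash is a stand-in chosen for the sketch (the talk does not name the hash function), and the function names are hypothetical. The last comment points at one plausible reason why doubling is painless: with modulo sharding, every key of old shard k lands on either k or k + N after going from N to 2N shards, so each old shard splits in two and no key crosses between unrelated shards.

```python
import re
import zlib


def shard_by_first_number(key: str, n_shards: int) -> int:
    """firstint: the first number found in the key, modulo the cluster size."""
    return int(re.search(r"\d+", key).group()) % n_shards


def shard_by_hash(key: str, n_shards: int) -> int:
    """Hash of the whole key, modulo the cluster size (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode()) % n_shards


# Doubling property: x % (2 * n) is always either x % n or x % n + n.
```

For example, shard_by_first_number("photo100_500", 16) picks shard 100 % 16, so every key that starts with user 100 stays together.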

If we need to add or remove an arbitrary number of servers, we use consistent hashing on a ring, à la Ketama. But then we completely lose data locality; we have to fan the request out to the cluster so that every piece returns its small response, and then merge the responses in the proxy.
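
For readers who have not met the technique, here is a minimal hash ring in the spirit of Ketama. It is a generic sketch, not VK's proxy code: the class name HashRing, the MD5-based point placement and the 100 points per server are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing: each server owns many points on a 32-bit ring."""

    def __init__(self, servers, points_per_server: int = 100):
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

    def server_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next server point."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a server only moves the keys that now fall just before one of its points; everything else stays where it was, which is why an arbitrary number of servers can be added or removed.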

There are super-specific requests too. It looks like this: the RPC proxy receives a request, determines which cluster to go to and determines the shard. Then there are either writing masters or, if the cluster supports replicas, it sends to a replica on request. The proxy does all of this.

Logs

We write logs in several ways. The most obvious and simple one is writing logs to memcache.

ring-buffer: prefix.idx = line

There is a key prefix (the name of the log), a line, and the size of this log (the number of lines). We take a random number from 0 to the number of lines minus 1. The key in memcache is the prefix concatenated with this random number. In the value we store the log line and the current time.

When we need to read the logs, we do a Multi Get of all the keys, sort by time, and thus get a production log in real time. The scheme is used when you need to debug something in production in real time, without breaking anything, without stopping or diverting traffic to other machines, but this log does not live long.
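
A sketch of that ring-buffer log, with a plain dict standing in for the memcache client (an assumption; a real client would use its own set and multi-get calls). The function names log_line and read_log are hypothetical.

```python
import random
import time

# Stand-in for memcache: key -> (timestamp, line).
cache = {}


def log_line(prefix: str, size: int, line: str, now=None, rng=random) -> None:
    """Store the line under prefix.<random idx>, overwriting whatever was there."""
    idx = rng.randint(0, size - 1)  # inclusive on both ends
    stamp = time.time() if now is None else now
    cache[f"{prefix}.{idx}"] = (stamp, line)


def read_log(prefix: str, size: int) -> list:
    """Fetch every slot (the multi-get), then order the lines by time."""
    keys = [f"{prefix}.{i}" for i in range(size)]
    found = [cache[key] for key in keys if key in cache]
    return [line for _, line in sorted(found)]
```

Because slots are picked at random, new lines overwrite old ones and the log never grows beyond size entries, which is why it is short-lived by design.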

For reliable log storage we have the logs-engine. This is exactly why it was created, and it is widely used in a huge number of clusters. The largest cluster I know of stores 600 TB of packed logs.

The engine is very old; there are clusters that are already 6-7 years old. It has problems that we are trying to solve; for example, we have started actively using ClickHouse for storing logs.

Collecting logs in ClickHouse

This diagram shows how we go to our engines.

There is code that goes locally over RPC to the RPC proxy, which understands where to go to reach the engine. If we want to write logs to ClickHouse, we need to change two parts of this scheme:

  • replace some engine with ClickHouse;
  • replace the RPC proxy, which cannot talk to ClickHouse, with some solution that can, and does so over RPC.

The engine part is simple: we replace it with a server or a cluster of servers running ClickHouse.

And to get to ClickHouse, we built KittenHouse. If we went directly from KittenHouse to ClickHouse, it would not cope. Even without any requests, the HTTP connections from a huge number of machines add up. For the scheme to work, a local reverse proxy is brought up on the ClickHouse server, written so that it can withstand the required volume of connections. It can also buffer data in itself relatively reliably.

Sometimes we do not want to implement the RPC scheme in non-standard solutions, for example in nginx. So KittenHouse can also receive logs over UDP.

If the sender and receiver of the logs run on the same machine, the probability of losing a UDP packet within the local host is quite low. As a compromise between having to implement RPC in a third-party solution and reliability, we simply use UDP sending. We will come back to this scheme later.

Monitoring

We have two kinds of logs: those collected by admins on their servers and those written by developers from code. They correspond to two kinds of metrics: system and product.

System metrics

Netdata runs on all our servers; it collects statistics and sends them to Graphite Carbon. ClickHouse, rather than Whisper for example, is used as the storage. If needed, you can read directly from ClickHouse or use Grafana for metrics, graphs and reports. As developers, we have enough access to Netdata and Grafana.

Product metrics

For convenience we have written a lot of things. For example, there is a set of ordinary functions that let you write Counts and UniqueCounts values into statistics, which are then sent somewhere further.

statlogsCountEvent  ( 'stat_name',         $key1, $key2, …)
statlogsUniqueCount ( 'stat_name', $uid,   $key1, $key2, …)
statlogsValueEvent  ( 'stat_name', $value, $key1, $key2, …)

$stats = statlogsStatData($params)

Later we can use sorting and grouping filters and do whatever we want with the statistics: build graphs, set up watchdogs.

We write a great many metrics: the number of events ranges from 600 billion to 1 trillion per day. And yet we want to keep them for at least a couple of years to understand trends in the metrics. Gluing all this together is a big problem that we have not solved yet. I will tell you how it has worked for the last few years.

We have functions that write these metrics to a local memcache to reduce the number of writes. Once in a short while, a locally running stats-daemon collects all the records. The daemon then merges the metrics into two layers of logs-collector servers, which aggregate statistics from a bunch of our machines so that the layer behind them does not die.
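
The point of the local layer is that many events collapse into a few numbers before anything leaves the machine. A minimal sketch of that pre-aggregation follows; the class LocalStats, its methods and the batch layout are invented for illustration and are not the real statlogs functions or the stats-daemon.

```python
from collections import defaultdict


class LocalStats:
    """Per-machine buffer that is flushed to a collector once in a while."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.uniques = defaultdict(set)

    def count_event(self, name: str, *keys) -> None:
        self.counts[(name, *keys)] += 1

    def unique_count(self, name: str, uid: int, *keys) -> None:
        self.uniques[(name, *keys)].add(uid)

    def flush(self) -> dict:
        """Return one aggregated batch and reset the local buffer.

        Simplification: only the number of unique ids is sent. Merging
        uniques across machines would need the ids or a sketch structure.
        """
        batch = {
            "counts": dict(self.counts),
            "uniques": {key: len(ids) for key, ids in self.uniques.items()},
        }
        self.counts.clear()
        self.uniques.clear()
        return batch
```

However many events arrive between flushes, the collector sees one record per metric key, which is what keeps the layers behind it alive.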

If needed, we can write directly to the logs-collectors.

But writing from code directly to the collectors, bypassing the stats-daemon, is a poorly scalable solution because it increases the load on the collector. It is suitable only if for some reason we cannot bring up the memcache and stats-daemon on the machine, or it has crashed and we went directly.

Next, the logs-collectors merge the statistics into meowDB, which is our database that can also store metrics.

Then from code we can make binary "near-SQL" selects.

Experiment

In the summer of 2018 we had an internal hackathon, and the idea came up to try replacing the red part of the diagram with something that could store metrics in ClickHouse. We already have logs in ClickHouse, so why not try?

We had a scheme that wrote logs through KittenHouse.

We decided to add another "*House" to the diagram, which would receive metrics in exactly the format in which our code writes them over UDP. This *House then turns them into inserts, like logs, which KittenHouse understands. It can deliver these logs to ClickHouse perfectly well, and ClickHouse should be able to read them.

The scheme with memcache, stats-daemon and logs-collectors is replaced with this one.

  • There is sending from the code, which is written locally into StatsHouse.
  • StatsHouse writes the UDP metrics, already converted into SQL inserts, to KittenHouse in batches.
  • KittenHouse sends them to ClickHouse.
  • If we want to read them, we read bypassing StatsHouse, directly from ClickHouse using ordinary SQL.

This is still an experiment, but we like how it is turning out. If we fix the problems with the scheme, then perhaps we will switch to it completely. Personally, I hope so.

The scheme does not save hardware. Local stats-daemons and logs-collectors are no longer needed, but ClickHouse requires beefier servers than those in the current scheme. Fewer servers are needed, but they have to be more expensive and more powerful.

Deploy

First, let's look at the PHP deploy. We develop in Git, using GitLab and TeamCity for deployment. Development branches are merged into the master branch, from master they are merged into staging for testing, and from staging into production.

Before a deploy, the current production branch and the previous one are taken, and a diff of files between them is computed: created, deleted, changed. This change is written to the binlog of a special copyfast engine, which can quickly replicate changes to our entire server fleet. What is used here is not direct copying but gossip replication, where one server sends changes to its nearest neighbours, those send them to their neighbours, and so on. This lets us update the code across the whole fleet within tens of seconds, or even single seconds. When a change reaches a local replica, it applies these patches to its own local file system. Rollback follows the same scheme.
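
To see why gossip reaches a whole fleet so quickly, here is a toy simulation. It assumes each informed server forwards the change to a few randomly chosen peers per round, which is a simplification of "nearest neighbours"; the function name gossip_rounds and the fanout parameter are invented for the sketch.

```python
import random


def gossip_rounds(n_servers: int, fanout: int, seed: int = 0) -> int:
    """Count rounds until every server has the change.

    Server 0 starts with the change; each round, every informed server
    pushes it to `fanout` random peers (it may pick itself or a server
    that already has the change).
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_servers:
        newly = set()
        for _ in informed:
            newly.update(rng.sample(range(n_servers), fanout))
        informed |= newly
        rounds += 1
    return rounds
```

The number of informed servers grows roughly geometrically, so the round count scales with the logarithm of the fleet size rather than with the fleet size itself.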

We also deploy kPHP a lot, and it too has its own development in Git following the scheme above. Since this is an HTTP server binary, we cannot produce a diff: the release binary weighs hundreds of MB. So there is another option here: the version is written to the copyfast binlog. With every build it increases, and on rollback it increases too. The version is replicated to the servers. Local copyfasts see that a new version has landed in the binlog, and by the same gossip replication they fetch the fresh version of the binary for themselves, without wearing out our master server but carefully spreading the load across the network. What follows is a graceful restart onto the new version.

For our engines, which are also essentially binaries, the scheme is very similar:

  • git master branch;
  • binary in a .deb;
  • the version is written to the copyfast binlog;
  • replicated to the servers;
  • the server pulls down the fresh .deb;
  • dpkg -i;
  • graceful restart onto the new version.

The difference is that our binary is packaged into .deb archives, and when pulled down they are installed on the system with dpkg -i. Why is kPHP deployed as a binary while the engines are deployed via dpkg? It just happened that way. It works, so do not touch it.

Alexei Akulovich is one of those who, as part of the program committee, is helping PHP Russia on May 17 become the biggest event for PHP developers in recent times. Look at what a cool program committee we have, and what speakers (two of them develop the PHP core!): it looks like something you cannot miss if you write PHP.

source: www.habr.com
