Searching at 1 TB/s

TL;DR: Four years ago I left Google with an idea for a new server monitoring tool. The idea was to combine functions that are usually separate into a single service: log collection and analysis, metrics gathering, alerts, and dashboards. One of the principles was that the service had to be truly fast, giving devops an easy, interactive, enjoyable experience. That requires processing multi-gigabyte data sets in fractions of a second while staying within budget. Existing log management tools are often slow and clunky, so we faced a good challenge: to design a tool clever enough to give users a new kind of experience.

This article describes how we at Scalyr solved this problem with old-school methods: a brute-force approach, eliminating unnecessary layers, and avoiding complex data structures. You can apply these lessons to your own engineering problems.

The Power of Old School

Log analysis usually begins with a search: find all messages that match some pattern. At Scalyr, that means tens or hundreds of gigabytes of logs from many servers. Modern approaches, as a rule, involve building a complex data structure optimized for search. I certainly saw this at Google, where they are very good at this kind of thing. But we settled on a much cruder approach: a linear scan of the logs. And it worked: we provide a searchable interface that is orders of magnitude faster than our competitors' (see the animation at the end).

The key insight was that modern processors are extremely fast at simple, straightforward operations. This is easy to miss in complex, multi-layered systems that depend on I/O speed and network operations, and such systems are very common today. So we developed a design that minimizes layers and excess cruft. With many processors and servers working in parallel, search speed reaches 1 TB per second.

Key takeaways from this article:

  • Brute-force search is a viable approach to solving real-world, large-scale problems.
  • Brute force is a design technique, not a way of avoiding work. Like any technique, it suits some problems better than others, and it can be implemented badly or well.
  • Brute force is especially good at delivering consistent performance.
  • Using brute force effectively requires optimizing code and applying sufficient resources at the right time. It is a good fit if your servers carry a heavy non-user-facing workload and user operations still take priority.
  • Performance depends on the design of the entire system, not just the inner-loop algorithm.

(This article describes searching data in memory. In most cases, when a user runs a log search, Scalyr's servers have already cached the data. The next article will discuss searching uncached logs. The same principles apply: efficient code and brute force backed by large computational resources.)

The brute-force method

Traditionally, a large data set is searched using a keyword index. Applied to server logs, this means finding every unique word in the log. For each word, you build a list of all the messages that contain it. This makes it easy to find all messages with a given word, for example 'error', 'firefox' or "transaction_16851951": just look it up in the index.
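To make the idea concrete, here is a minimal sketch of such a keyword index in Java. It is only an illustration: the class name `KeywordIndex`, its methods, and the tokenization rule (split on anything that is not a letter, digit, or underscore) are assumptions made for this example, not a description of any real product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy keyword index (hypothetical): maps each lowercase word to the ids of messages containing it. */
final class KeywordIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Splits the message into words and records the message id under each word. */
    void add(int messageId, String message) {
        // Assumed tokenization: anything other than a-z, 0-9 and '_' separates words.
        for (String word : message.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(word, w -> new ArrayList<>());
            // Skip the id if this word already occurred earlier in the same message.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != messageId) {
                ids.add(messageId);
            }
        }
    }

    /** Returns the ids of messages containing exactly this word, or an empty list. */
    List<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), List.of());
    }
}
```

A lookup is a single hash-map access, which is why whole-word queries are so cheap with this structure.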

I used this approach at Google and it worked well. But at Scalyr we search logs byte by byte.

Why? From an abstract algorithmic point of view, keyword indexes are far more efficient than brute-force search. However, we don't sell algorithms, we sell performance. And performance is not only about algorithms but also about systems engineering. We have to consider everything: the volume of data, the type of search, the available hardware, and the software context. We decided that for our particular problem, something like 'grep' was a better fit than an index.

Indexes are great, but they have limitations. A single word is easy to find. But searching for messages that contain several words, such as 'googlebot' and '404', is much harder. Searching for a phrase like 'uncaught exception' requires a more cumbersome index that records not only every message containing each word but also the word's position within the message.

The real difficulty comes when you are not searching for words. Say you want to see how much traffic comes from bots. The first thought is to search the logs for the word 'bot'. That is how you would find some bots: Googlebot, Bingbot and many others. But here 'bot' is not a word, only part of one. If we look up 'bot' in the index, we won't find any messages containing the word 'Googlebot'. If you check every word in the index and then scan the index for each matching keyword, the search slows down dramatically. As a result, some log tools don't allow partial-word searches at all, or (at best) allow them through a special syntax with lower performance. We wanted to avoid that.

Another problem is punctuation. Do you want to find all requests from 50.168.29.7? What about debugging logs that contain [error]? Indexes usually skip punctuation.

Finally, engineers love powerful tools, and sometimes a problem can only be solved with a regular expression. A keyword index is not well suited to that.
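By contrast, a grep-style scan treats all three cases (word fragments, punctuation, regular expressions) the same way. The sketch below is a minimal illustration under assumed names (`LinearScan`, `countContaining`, `countMatching`); it works on already-decoded lines for readability and says nothing about how a production scanner would be tuned.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Grep-style scan (hypothetical helper): every query type is just a pass over the text. */
final class LinearScan {
    /** Counts lines containing the literal substring; punctuation needs no special handling. */
    static long countContaining(List<String> lines, String needle) {
        return lines.stream().filter(line -> line.contains(needle)).count();
    }

    /** Counts lines in which the regular expression matches anywhere in the line. */
    static long countMatching(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return lines.stream().filter(line -> pattern.matcher(line).find()).count();
    }
}
```

With this shape, `countContaining(lines, "bot")` finds lines mentioning Googlebot or bingbot, and a query such as "[error]" or an IP address is matched literally, brackets and dots included.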

On top of that, indexes are complex. Each message has to be added to several keyword lists. These lists must be kept in a search-friendly format at all times. Queries with phrases, word fragments, or regular expressions have to be translated into operations across multiple lists, and the results scanned and merged to produce a result set. In a large-scale, multi-tenant service, this complexity creates performance problems that are invisible when you only analyze the algorithms.

Keyword indexes also take up a lot of space, and storage is a major cost in a log management system.

On the other hand, each search can afford to consume a lot of computing power. Our users appreciate fast searches for one-off queries, but such queries are relatively rare. For routine search queries, for example those behind a dashboard, we use special techniques (we'll describe them in the next article). Other queries are infrequent enough that we rarely have to process more than one at a time. That doesn't mean our servers are idle: they are busy receiving, parsing and compressing new messages, evaluating alerts, compressing old data, and so on. So we have a substantial supply of processors that can be put to work on queries.

Brute force works if you have a brute problem (and a lot of force)

Brute force works best on simple problems with small inner loops. Often you can optimize the inner loop to run at very high speed. If the code is complex, it is much harder to optimize.

Our search code originally had a fairly large inner loop. We store messages in 4K pages; each page contains some messages (in UTF-8) and metadata for each message. The metadata is a structure that encodes the length of the value, the internal message ID, and other fields. The search loop looked like this:

[Image: code listing of the original per-message search loop]

This is a simplified version of the actual code. But even here you can see multiple object allocations, data copies, and function calls. The JVM is quite good at optimizing function calls and allocating ephemeral objects, so this code performed better than we deserved. During testing, customers used it quite successfully. But eventually we took it to the next level.
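Since the original listing is not reproduced here, the following is a hedged reconstruction of what a per-message loop of this kind might look like, not the actual Scalyr code. The class `PerMessageSearch`, the record-like `MessageMeta`, and its fields are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a per-message search loop (hypothetical layout, not the original code). */
final class PerMessageSearch {
    /** Hypothetical metadata: message id plus where its UTF-8 text sits in the page. */
    static final class MessageMeta {
        final long id;
        final int offset;
        final int length;

        MessageMeta(long id, int offset, int length) {
            this.id = id;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the ids of messages whose text contains the needle. */
    static List<Long> search(byte[] page, List<MessageMeta> metas, String needle) {
        List<Long> hits = new ArrayList<>();
        for (MessageMeta meta : metas) {
            // Per-message overhead: decode and copy the bytes into a fresh String.
            String text = new String(page, meta.offset, meta.length, StandardCharsets.UTF_8);
            // The "real work" is only this call; everything around it is overhead.
            if (text.indexOf(needle) >= 0) {
                hits.add(meta.id);
            }
        }
        return hits;
    }
}
```

Every iteration touches the metadata, allocates a String, and copies the message bytes before any searching happens, which is the kind of overhead the text describes.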

(You may ask why we store messages in this format, with 4K pages, text and metadata, rather than working with the logs directly. There are many reasons, which boil down to the fact that internally the Scalyr engine is more like a distributed database than a file system. Text search is often combined with DBMS-style filters on fields extracted by log parsing. We may search many thousands of logs at once, and simple text files are not suitable for our transactional, replicated, distributed data management.)

At first, it seemed that such code was not a good candidate for brute-force optimization. The "real work" inside String.indexOf() didn't even dominate the CPU profile. In other words, optimizing that method alone would not have made a significant difference.

It so happens that we store the metadata at the beginning of each page, and the UTF-8 text of all the messages is packed together at the other end. Taking advantage of this, we rewrote the loop to search the entire page at once:

[Image: code listing of the rewritten whole-page search loop]

This version works directly on the raw byte[] representation and searches all the messages at once across the entire 4K page.

This is much easier to optimize for brute force. The inner search loop is invoked once for the whole 4K page rather than separately for each message. There is no data copying and no object allocation. And the more complex metadata operations are invoked only when there is a match, not for every message. This eliminated a ton of overhead, and the remaining load is concentrated in a small inner search loop that is well suited to further optimization.
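A rough sketch of the whole-page approach follows. It is an illustration under stated assumptions rather than the production loop: the class `PageSearch`, the `textStart` parameter, and the `messageEnds` array (the exclusive end offset of each message, in ascending order) are hypothetical stand-ins for whatever the real page metadata provides. A plain byte-by-byte comparison is used here in place of a tuned search routine.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a whole-page search over raw bytes (hypothetical layout, not the original code). */
final class PageSearch {
    /** Returns the indexes of messages in the page whose text contains the needle. */
    static List<Integer> search(byte[] page, int textStart, int[] messageEnds, byte[] needle) {
        List<Integer> hits = new ArrayList<>();
        if (messageEnds.length == 0 || needle.length == 0) {
            return hits;
        }
        int textEnd = messageEnds[messageEnds.length - 1];
        int pos = textStart;
        while (pos + needle.length <= textEnd) {
            if (!matchesAt(page, pos, needle)) {
                pos++;
                continue;
            }
            // Metadata is consulted only after a byte-level match has been found.
            int msg = owner(messageEnds, pos);
            if (pos + needle.length <= messageEnds[msg]) {
                hits.add(msg);
                pos = messageEnds[msg]; // one hit per message is enough; jump to the next message
            } else {
                pos++; // the match straddles two messages, so it is not a real hit
            }
        }
        return hits;
    }

    /** Binary search for the first message whose end offset lies beyond pos. */
    static int owner(int[] ends, int pos) {
        int lo = 0;
        int hi = ends.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ends[mid] > pos) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    static boolean matchesAt(byte[] data, int pos, byte[] needle) {
        for (int i = 0; i < needle.length; i++) {
            if (data[pos + i] != needle[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The loop allocates nothing per message and decodes nothing; the only per-hit cost is the metadata lookup and the check that a match does not cross a message boundary.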

Our actual search algorithm is based on a great idea by Leonid Volnitsky. It is similar to the Boyer-Moore algorithm, skipping ahead by roughly the length of the search string at each step. The main difference is that it checks two bytes at a time to minimize false matches.

Our implementation requires building a 64K lookup table for each search, but that is nothing compared with the gigabytes of data we search. The inner loop processes several gigabytes per second on a single core. In practice, sustained performance is around 1.25 GB per second per core, and there is room for improvement. It should be possible to eliminate some of the overhead outside the inner loop, and we plan to experiment with an inner loop written in C instead of Java.
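The two-byte idea can be sketched as follows. This is a simplified illustration, not Scalyr's implementation: the class name `BigramSearch` is hypothetical, and the choice to index a 64K-entry table directly by the byte pair, with a small side array chaining needle offsets that share the same pair, is an assumption made here to keep the example short and correct.

```java
/** Sketch of a Volnitsky-style search: sample two bytes every (needle length - 1) positions. */
final class BigramSearch {
    /** Returns the index of the first occurrence of needle in haystack[from, to), or -1. */
    static int indexOf(byte[] haystack, int from, int to, byte[] needle) {
        int n = needle.length;
        if (n < 2) {
            // Too short for byte pairs: fall back to a plain scan.
            for (int s = from; s + n <= to; s++) {
                if (matchesAt(haystack, s, needle)) {
                    return s;
                }
            }
            return -1;
        }
        int[] head = new int[1 << 16]; // byte pair -> (largest needle offset with that pair) + 1; 0 = absent
        int[] next = new int[n];       // links to further offsets sharing the same pair
        for (int i = 0; i + 1 < n; i++) {
            int key = pair(needle, i);
            next[i] = head[key];
            head[key] = i + 1;
        }
        int step = n - 1; // every occurrence contains exactly one sampled pair
        for (int p = from + n - 2; p + 1 < to; p += step) {
            for (int e = head[pair(haystack, p)]; e != 0; e = next[e - 1]) {
                int start = p - (e - 1); // where the needle would begin if this pair lines up
                if (start + n <= to && matchesAt(haystack, start, needle)) {
                    return start;
                }
            }
        }
        return -1;
    }

    static int pair(byte[] data, int i) {
        return ((data[i] & 0xFF) << 8) | (data[i + 1] & 0xFF);
    }

    static boolean matchesAt(byte[] data, int pos, byte[] needle) {
        for (int i = 0; i < needle.length; i++) {
            if (data[pos + i] != needle[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Most sampled pairs do not occur in the needle at all, so the loop usually does one table lookup and then jumps ahead by nearly the full needle length. That is where the Boyer-Moore-like skipping comes from, and why longer search strings scan faster.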

Applying force

We've argued that log search can be implemented "brute", but how much "force" do we have? Quite a lot.

1 core: Used properly, a single core of a modern processor is quite powerful in its own right.

8 cores: We currently run on Amazon hi1.4xlarge and i2.4xlarge SSD servers, each with 8 cores (16 threads). As mentioned above, these cores are usually busy with background operations. When a user runs a search, the background operations are suspended, freeing all 8 cores for the search. The search usually finishes in a fraction of a second, after which background work resumes (a throttling mechanism ensures that a barrage of search queries doesn't interfere with important background work).

16 cores: For reliability, we organize servers into master/slave groups. Each master has one SSD server and one EBS server under its command. If the master crashes, the SSD server immediately takes its place. Almost all the time, the master and the slave are both healthy, so each block of data is searchable on two different servers (the EBS slave has a weak processor, so we don't count it). We split the work between them, which gives us a total of 16 cores.

Many cores: In the near future, we will distribute data across servers so that all of them take part in processing every non-trivial query. Every core will be put to work. [Note: we have since carried out this plan and raised search speed to 1 TB/s; see the note at the end of the article.]
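The arithmetic behind these steps is simple: at roughly 1.25 GB/s per core, 8 cores give about 10 GB/s and 16 cores about 20 GB/s, assuming the scan parallelizes cleanly. The sketch below shows one way to spread a page scan across cores on a single machine. It is an illustration only: the class `ParallelScan`, the method names, and the round-robin split of pages between workers are assumptions, and it does not model the suspension of background work or the distribution across servers described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: scan pages on several cores at once (hypothetical helper, single machine only). */
final class ParallelScan {
    /** Counts the pages that contain the needle, using the given number of worker threads. */
    static long countPagesContaining(List<byte[]> pages, byte[] needle, int cores) {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < cores; w++) {
                final int worker = w;
                Callable<Long> task = () -> {
                    long hits = 0;
                    // Worker w takes pages w, w + cores, w + 2 * cores, ...
                    for (int i = worker; i < pages.size(); i += cores) {
                        if (contains(pages.get(i), needle)) {
                            hits++;
                        }
                    }
                    return hits;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Plain byte-by-byte containment check, standing in for a tuned inner loop. */
    static boolean contains(byte[] page, byte[] needle) {
        for (int s = 0; s + needle.length <= page.length; s++) {
            int i = 0;
            while (i < needle.length && page[s + i] == needle[i]) {
                i++;
            }
            if (i == needle.length) {
                return true;
            }
        }
        return false;
    }
}
```

Because pages are independent, the workers share no mutable state and need no locks; the only coordination is summing the per-worker counts at the end.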

Simplicity ensures reliability

Another advantage of the brute-force method is its fairly consistent performance. As a rule, brute-force search is not very sensitive to the details of the problem or the data set (I suppose that's why it's called "brute").

A keyword index sometimes produces results incredibly fast, and sometimes it doesn't. Say you have 50 GB of logs in which the term 'customer_5987235982' appears exactly three times. A search for that term reads three locations straight from the index and finishes instantly. But a complex wildcard search can scan thousands of keywords and take a long time.

Brute-force searches, on the other hand, run at more or less the same speed for any query. Searching for long words is faster, but even searching for a single character is quite fast.

The simplicity of the brute-force method means that its performance is close to the theoretical maximum. There are fewer opportunities for unexpected disk overloads, lock contention, pointer chasing, and thousands of other causes of failure. I just looked at the queries made by Scalyr users over the past week on our busiest server. There were 14,000 queries. Exactly eight of them took more than one second; 99% completed within 111 milliseconds (if you haven't used log analysis tools, trust me: that is fast).

Consistent, reliable performance matters for the usability of the service. If it lags from time to time, users will perceive it as unreliable and be reluctant to use it.

Log search in action

Here is a short animation showing Scalyr search in action. We have a demo account into which we import every event from every public Github repository. In this demo, I examine a week's worth of data: approximately 600 MB of raw logs.

The video was recorded live, with no special preparation, on my desktop (about 5,000 kilometers from the server). The performance you'll see is due in large part to web client optimization, as well as a fast and reliable backend. Whenever there is a pause without a 'loading' indicator, it is me pausing so that you can read what I'm about to click.

[Animation: Scalyr log search demo]

In conclusion

When processing large amounts of data, it is important to choose a good algorithm, but "good" does not mean "fancy." Think about how your code will behave in practice. Theoretical analysis of algorithms leaves out some factors that can matter a great deal in the real world. Simpler algorithms are easier to optimize and more stable in edge cases.

Also think about the context in which the code will run. In our case, we need servers powerful enough to handle the background tasks. Users initiate searches relatively infrequently, so we can borrow a whole group of servers for the short time needed to complete each search.

Using the brute-force method, we implemented fast, reliable, flexible search across a set of logs. We hope these ideas are useful for your projects.

Edit: The title and text have been changed from "Search at 20 GB per second" to "Search at 1 TB per second" to reflect the performance gains of the past few years. This increase in speed is due to changes in the type and number of EC2 servers we deploy today to serve our growing customer base. More changes are coming soon that will deliver another dramatic boost in operational efficiency, and we can't wait to share them.

Source: www.habr.com
