Searching at 1 TB/s

TL;DR: Four years ago I left Google with the idea for a new server monitoring tool. The idea was to combine functions that are usually separate (log collection and analysis, metrics collection, alerting, and dashboards) into a single service. One of our principles was that the service had to be genuinely fast, giving devops teams an easy, interactive, enjoyable experience. That means processing multi-gigabyte datasets in fractions of a second while staying within budget. Existing log management tools are often slow and clunky, so we faced a nice challenge: designing a tool cleverly enough to give users a new kind of experience.

This article describes how we at Scalyr solved this problem by applying old-school methods: a brute-force approach, eliminating unnecessary layers, and avoiding complex data structures. You can apply these lessons to your own engineering problems.

The power of old school

Log analysis usually starts with a search: find all messages that match some pattern. At Scalyr, that means tens or hundreds of gigabytes of logs from many servers. Modern approaches, as a rule, involve building some complex data structure optimized for search. I certainly saw this at Google, where they are very good at that sort of thing. But we settled on a much cruder approach: linear scanning of the logs. And it worked: we provide a search interface that is orders of magnitude faster than our competitors' (see the animation at the end).

The key insight was that modern processors are very fast at simple, straightforward operations. This is easy to overlook in complex, multi-layered systems that depend on I/O speed and network operations, and such systems are very common today. So we built a design that minimizes layers and excess overhead. With many processors and servers working in parallel, search speed reaches 1 TB per second.

Key takeaways from this article:

  • Brute-force search is a viable approach for solving real-world, large-scale problems.
  • Brute force is a design technique, not a substitute for effort. Like any technique, it suits some problems better than others, and it can be implemented poorly or well.
  • Brute force is especially good at delivering stable, predictable performance.
  • Using brute force effectively requires optimizing the code and applying sufficient resources at the right moment. It is a good fit if your servers carry a heavy non-user workload and user-facing operations remain the priority.
  • Performance depends on the design of the whole system, not just on the inner-loop algorithm.

(This article describes searching data held in memory. In most cases, when a user runs a log search, the Scalyr servers have already cached the data. A follow-up article will cover searching logs that are not cached. The same principles apply: efficient code and brute force backed by large computing resources.)

The brute-force approach

Traditionally, a large dataset is searched using a keyword index. Applied to server logs, this means finding every unique word in the log. For each word, you build a list of every message that contains it. This makes it easy to find all messages containing a given word, such as 'error', 'firefox' or 'transaction_16851951': just look it up in the index.
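To make the idea concrete, here is a minimal Java sketch of such a keyword (inverted) index. It is illustrative only: the class name, the tokenizer rule (lowercase, then split on anything that is not a letter, digit or underscore) and the in-memory `HashMap` of posting lists are assumptions, not how Google or any particular product builds its index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal keyword (inverted) index: word -> IDs of the messages containing it.
class KeywordIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Assumed tokenizer: lowercase, then split on runs of characters other than a-z, 0-9 and '_'.
    static String[] tokenize(String message) {
        return message.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+");
    }

    // Adds every word of the message to its posting list; IDs must be added in increasing order.
    void add(int messageId, String message) {
        for (String word : tokenize(message)) {
            if (word.isEmpty()) continue; // split() yields "" when the message starts with a separator
            List<Integer> ids = postings.computeIfAbsent(word, k -> new ArrayList<>());
            if (ids.isEmpty() || ids.get(ids.size() - 1) != messageId) {
                ids.add(messageId); // record each message at most once per word
            }
        }
    }

    // Exact-word lookup: a single hash probe, independent of how many messages are stored.
    List<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add(0, "ERROR connection reset by peer");
        index.add(1, "GET /index.html firefox 200");
        index.add(2, "commit failed for transaction_16851951: error");
        System.out.println(index.lookup("error"));                // [0, 2]
        System.out.println(index.lookup("firefox"));              // [1]
        System.out.println(index.lookup("transaction_16851951")); // [2]
    }
}
```

Each lookup is one hash probe, which is why exact-word queries are so cheap; the cost is paid at ingestion time, when every word of every message has to be appended to a list.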

I used this approach at Google and it worked well. But at Scalyr we search the logs byte by byte.

Why? From an abstract algorithmic point of view, keyword indexes are far more efficient than brute-force search. However, we don't sell algorithms, we sell performance. And performance is not only about algorithms but also about systems engineering. We have to consider everything: data volume, the kinds of searches, the available hardware, and the software context. We decided that, for our particular problem, something like 'grep' was a better fit than an index.

Indexes are great, but they have limitations. A single word is easy to find. But searching for messages containing several words, such as 'googlebot' and '404', is much harder. Searching for a phrase such as 'uncaught exception' requires a more cumbersome index that records not only every message containing each word, but also the word's position within the message.
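A multi-word query such as `googlebot` AND `404` turns into an operation on two posting lists. The sketch below, with made-up message IDs and a hypothetical class name, shows the usual linear merge of two sorted lists; a phrase query would additionally need word positions stored in every entry, which this sketch does not model.

```java
import java.util.Arrays;

// Hypothetical AND query over two posting lists, the way a keyword index would evaluate it.
class PostingIntersection {
    // Linear merge of two ascending, duplicate-free arrays of message IDs.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i]; // present in both lists
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] googlebot = {3, 8, 15, 42, 99}; // made-up IDs of messages containing "googlebot"
        int[] code404 = {8, 16, 42, 77};      // made-up IDs of messages containing "404"
        System.out.println(Arrays.toString(intersect(googlebot, code404))); // [8, 42]
    }
}
```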

The real trouble starts when you are not looking for whole words. Say you want to see how much traffic comes from bots. The first idea is to search the logs for the word 'bot'. That is how you would find some of the bots: Googlebot, Bingbot and many others. But here 'bot' is not a word, it is part of one. If we look up 'bot' in the index, we won't find any messages containing the word 'Googlebot'. If you check every word in the index and then scan the index entries for each matching keyword, the search slows down dramatically. As a result, some log tools don't allow partial-word searches at all or, at best, allow them only through a special syntax with lower performance. We want to avoid that.

Another problem is punctuation. Do you want to find all requests from 50.168.29.7? What about debugging logs that contain [error]? Indexes usually skip punctuation.
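The sketch below contrasts an exact-word lookup (using the same assumed tokenizer as a typical word index: lowercase, split on non-alphanumeric characters) with a plain substring scan over one illustrative log line. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Contrasts an exact-word index lookup with a brute-force substring scan on one log line.
class IndexVsScan {
    // Same assumed tokenizer as a typical word index: lowercase, split on non-alphanumerics.
    static Set<String> words(String message) {
        return new HashSet<>(Arrays.asList(message.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+")));
    }

    // What a word index can answer: is the term one of the message's tokens?
    static boolean indexHit(String message, String term) {
        return words(message).contains(term.toLowerCase(Locale.ROOT));
    }

    // What a linear scan answers: does the term occur anywhere, punctuation included?
    static boolean scanHit(String message, String term) {
        return message.contains(term);
    }

    public static void main(String[] args) {
        String log = "50.168.29.7 GET /robots.txt 404 \"Googlebot/2.1\" [error] not found";
        for (String term : List.of("bot", "50.168.29.7", "[error]", "404")) {
            System.out.printf("%-12s index=%-5b scan=%b%n", term, indexHit(log, term), scanHit(log, term));
        }
    }
}
```

With this tokenizer, only `404` is found by the word lookup: `bot` is part of a larger token, and `50.168.29.7` and `[error]` are split apart or stripped of their punctuation. The substring scan finds all four.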

Finally, engineers love powerful tools, and sometimes a problem can only be solved with regular expressions. A keyword index is poorly suited to that.
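Regular expressions are where a linear scan is most natural: the pattern is simply evaluated against every line. A minimal sketch using `java.util.regex`, with made-up log lines and an assumed log format, that finds requests taking 1000 ms or more, a query a word index cannot answer with a simple lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Brute-force regular-expression search: evaluate the pattern against every line.
class RegexScan {
    static List<String> grep(List<String> lines, Pattern pattern) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) out.add(line); // find() matches anywhere in the line
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "GET /api/users/17 200 12ms",
                "GET /api/users/17 500 2034ms",
                "POST /api/orders 201 98ms");
        // Latencies written with four or more digits, i.e. 1000 ms or more (assumed log format).
        Pattern slow = Pattern.compile("\\b\\d{4,}ms\\b");
        grep(logs, slow).forEach(System.out::println);
    }
}
```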

On top of that, indexes are complex. Each message has to be added to several keyword lists. These lists must be kept in an easily searchable form at all times. Queries with phrases, word fragments, or regular expressions have to be translated into operations on multiple lists, and the results scanned and combined to produce the result set. In the context of a large-scale, multi-tenant service, this complexity creates performance problems that are invisible when you analyze the algorithms in isolation.

Keyword indexes also take up a lot of space, and storage is a major cost in a log management system.

On the other hand, each search can consume a lot of computing power. Our users appreciate very fast responses to ad hoc queries, but such queries are relatively infrequent. For common queries, for example those behind a dashboard, we use special techniques (we'll describe them in a later article). The remaining requests are infrequent enough that you rarely have to process more than one at a time. That doesn't mean our servers are idle: they are busy receiving, parsing and compressing new messages, evaluating alerts, compressing old data, and so on. So we have a substantial reserve of processor capacity that can be used to run queries.

Brute force works if you have a brute problem (and plenty of force)

Brute force works best on simple problems with small inner loops. You can often optimize such an inner loop to run at very high speed. If the code is complex, it is much harder to optimize.

Our search code originally had a fairly large inner loop. We store messages in 4K pages; each page contains a number of messages (in UTF-8) plus metadata for each message. The metadata is a structure that encodes the length of the value, the internal message ID, and other fields. The search loop looked like this:

[Code listing, shown as an image in the original: the per-message search loop]

This is a simplified version of the real code. But even here you can see multiple object allocations, data copies, and function calls. The JVM is very good at optimizing function calls and allocating short-lived objects, so this code performed better than it had any right to. During testing, customers used it quite successfully. But eventually we took it to the next level.
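For illustration, here is a hypothetical Java sketch in the spirit of the loop described above. It is not Scalyr's code: `MessageMeta` and its fields are assumptions standing in for the per-message metadata. It shows where the per-message overhead comes from: each message's bytes are copied and decoded into a new `String` before `indexOf` runs, and each hit is boxed into the result list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in the style of the original per-message loop (not Scalyr's code).
class PerMessageSearch {
    // Illustrative stand-in for per-message metadata: ID plus where the UTF-8 text sits in the page.
    static final class MessageMeta {
        final long messageId;
        final int textOffset;
        final int textLength;

        MessageMeta(long messageId, int textOffset, int textLength) {
            this.messageId = messageId;
            this.textOffset = textOffset;
            this.textLength = textLength;
        }
    }

    static List<Long> search(byte[] page, List<MessageMeta> metas, String term) {
        List<Long> hits = new ArrayList<>();
        for (MessageMeta meta : metas) {
            // Copies and decodes each message into a new String: an allocation per message.
            String text = new String(page, meta.textOffset, meta.textLength, StandardCharsets.UTF_8);
            if (text.indexOf(term) >= 0) {
                hits.add(meta.messageId); // boxed into a Long, which may allocate
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] page = "user login okGooglebot fetched".getBytes(StandardCharsets.UTF_8);
        List<MessageMeta> metas = List.of(
                new MessageMeta(100, 0, 13),   // "user login ok"
                new MessageMeta(101, 13, 17)); // "Googlebot fetched"
        System.out.println(search(page, metas, "bot"));   // [101]
        System.out.println(search(page, metas, "okGoo")); // [] (each message is searched separately)
    }
}
```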

(You might ask why we store messages in this format, with 4K pages, text and metadata, rather than working with the log files directly. There are many reasons, which boil down to the fact that internally the Scalyr engine is more like a distributed database than a file system. Text search is often combined with DBMS-style filters on fields extracted by log parsing. We can search many thousands of logs simultaneously, and plain text files are not suited to our transactional, replicated, distributed data management.)

At first, code like this did not seem very amenable to brute-force optimization. The "real work" in String.indexOf() didn't even dominate the CPU profile. In other words, optimizing that method alone would not have made a significant difference.

It so happens that we store the metadata at the beginning of each page, while the UTF-8 text of all the messages is packed at the other end. Taking advantage of this, we rewrote the loop to search the entire page at once:

[Code listing, shown as an image in the original: the rewritten loop that searches a whole page at once]

This version works directly on a raw byte[] view and searches all the messages at once, across the entire 4K page.

This is much easier to optimize for brute force. The inner search loop is called once for the whole 4K page instead of separately for each message. There is no data copying and no object allocation. And the more complex metadata operations are called only when there is a match, not for every message. This way we eliminated a ton of overhead, and the remaining work is concentrated in a small inner search loop that is well suited to further optimization.
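A minimal sketch of the page-at-a-time idea, under an assumed layout that is not Scalyr's actual format: message texts packed back to back at the end of a 4K page, plus an array of start offsets standing in for the metadata. The inner loop here is a plain nested byte comparison rather than the Volnitsky-based search the article uses; the point is the structure. One scan covers the whole text region, and the start offsets are consulted only after a hit, via `Arrays.binarySearch`. Because the texts are contiguous, a raw hit can straddle two messages, so the sketch checks for that and keeps scanning. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of page-at-a-time search under an assumed layout (not Scalyr's actual format):
// message texts packed back to back at the end of the page, starts[i] = offset of message i.
class PageSearch {
    // Tight inner loop over raw bytes: no copies, no allocations. Returns first match or -1.
    static int indexOf(byte[] hay, int from, int to, byte[] needle) {
        outer:
        for (int i = from; i <= to - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Scans the whole text region once; message boundaries are looked up only after a hit.
    static List<Integer> searchPage(byte[] page, int[] starts, int textEnd, byte[] needle) {
        if (needle.length == 0) throw new IllegalArgumentException("empty search term");
        List<Integer> hits = new ArrayList<>();
        int pos = starts.length == 0 ? textEnd : starts[0];
        int hit;
        while ((hit = indexOf(page, pos, textEnd, needle)) >= 0) {
            int k = Arrays.binarySearch(starts, hit);
            int msg = k >= 0 ? k : -k - 2; // last message starting at or before the hit
            while (msg + 1 < starts.length && starts[msg + 1] <= hit) msg++; // skip empty messages
            int msgEnd = msg + 1 < starts.length ? starts[msg + 1] : textEnd;
            if (hit + needle.length <= msgEnd) {
                hits.add(msg);
                pos = msgEnd;  // one hit per message is enough: jump to the next message
            } else {
                pos = hit + 1; // match straddles two messages, so keep scanning
            }
        }
        return hits;
    }

    // Packs texts at the end of a zero-filled page and records their start offsets.
    static byte[] pack(String[] texts, int[] startsOut, int pageSize) {
        byte[][] encoded = new byte[texts.length][];
        int total = 0;
        for (int i = 0; i < texts.length; i++) {
            encoded[i] = texts[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        byte[] page = new byte[pageSize]; // the front would hold metadata; left as zeros here
        int pos = pageSize - total;
        for (int i = 0; i < texts.length; i++) {
            startsOut[i] = pos;
            System.arraycopy(encoded[i], 0, page, pos, encoded[i].length);
            pos += encoded[i].length;
        }
        return page;
    }

    public static void main(String[] args) {
        String[] texts = {"user login ok", "Googlebot fetched /robots.txt", "disk almost full"};
        int[] starts = new int[texts.length];
        byte[] page = pack(texts, starts, 4096);
        System.out.println(searchPage(page, starts, page.length, "bot".getBytes(StandardCharsets.UTF_8)));
        System.out.println(searchPage(page, starts, page.length, "okGoo".getBytes(StandardCharsets.UTF_8)));
    }
}
```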

Our actual search algorithm is based on a great idea by Leonid Volnitsky. It is similar to the Boyer-Moore algorithm, skipping roughly the length of the search string at each step. The main difference is that it examines two bytes at a time to minimize false matches.
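A minimal Java sketch of a Volnitsky-style search, written from the published idea rather than from Scalyr's implementation. A 65,536-entry table indexed by the 16-bit value of each byte pair in the search string points to chains of offsets where that pair occurs. The scan reads one byte pair every m − 1 bytes (m being the search-string length): every occurrence spans m − 1 consecutive byte-pair positions, so exactly one of them is examined, and each table hit is then verified byte by byte. Volnitsky's original uses an open-addressing hash table; the array-chained table here is a simplification, and the sketch assumes a search string of at least 2 bytes.

```java
import java.nio.charset.StandardCharsets;

// Volnitsky-style substring search over raw bytes: an illustrative sketch of the published
// idea, not Scalyr's implementation. Requires a search term of at least 2 bytes.
class VolnitskySearch {
    private final byte[] needle;
    // The 64K lookup table: head[w] = 1 + offset of a needle position whose byte pair equals w, 0 = none.
    private final int[] head = new int[1 << 16];
    // Chains positions sharing a byte pair: next[i] = 1 + offset of another such position, 0 = end.
    private final int[] next;

    VolnitskySearch(byte[] needle) {
        if (needle.length < 2) throw new IllegalArgumentException("search term must be at least 2 bytes");
        this.needle = needle.clone();
        this.next = new int[needle.length - 1];
        for (int i = 0; i < needle.length - 1; i++) {
            int w = bigram(needle, i);
            next[i] = head[w];
            head[w] = i + 1;
        }
    }

    // 16-bit value of the byte pair starting at index i.
    private static int bigram(byte[] b, int i) {
        return ((b[i] & 0xFF) << 8) | (b[i + 1] & 0xFF);
    }

    private boolean matchesAt(byte[] hay, int start) {
        for (int j = 0; j < needle.length; j++) {
            if (hay[start + j] != needle[j]) return false;
        }
        return true;
    }

    // Returns the first index of the search term within hay[from, to), or -1.
    int indexOf(byte[] hay, int from, int to) {
        int m = needle.length;
        int step = m - 1; // every occurrence spans m - 1 consecutive byte-pair positions
        for (int p = from; p + 1 < to; p += step) {
            int best = -1;
            // Candidates found at p start in [p - m + 2, p]; these windows never overlap
            // between steps, so the first step with a verified match holds the earliest one.
            for (int e = head[bigram(hay, p)]; e != 0; e = next[e - 1]) {
                int start = p - (e - 1);
                if (start >= from && start + m <= to && matchesAt(hay, start)
                        && (best < 0 || start < best)) {
                    best = start;
                }
            }
            if (best >= 0) return best;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] hay = "GET /robots.txt 404 Googlebot/2.1".getBytes(StandardCharsets.UTF_8);
        VolnitskySearch bot = new VolnitskySearch("bot".getBytes(StandardCharsets.UTF_8));
        System.out.println(bot.indexOf(hay, 0, hay.length)); // 7, the "bot" inside "robots"
    }
}
```

The table is built once per search and costs 256 KB of `int`s here, which is negligible next to gigabytes of scanned data; the scan itself touches only one byte pair per m − 1 bytes unless the table reports a candidate.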

Our implementation needs to build a 64K lookup table for each search, but that is nothing compared with the gigabytes of data we search through. The inner loop processes several gigabytes per second on a single core. In practice, sustained throughput is around 1.25 GB per second per core, and there is room for improvement. We could eliminate some of the overhead outside the inner loop, and we plan to experiment with writing the inner loop in C instead of Java.

Applying the force

We've established that log search can be done "crudely", but how much "force" do we actually have? Quite a lot.

1 core: Used correctly, a single core of a modern processor is quite powerful in its own right.

8 cores: We currently run on Amazon hi1.4xlarge and i2.4xlarge SSD servers, each with 8 cores (16 threads). As mentioned above, these cores are usually busy with background work. When a user runs a search, background work is paused, freeing all 8 cores for the search. The search usually completes in a fraction of a second, after which background work resumes (a throttling system ensures that a flurry of search queries doesn't starve important background work).
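The fan-out itself can be sketched with a standard `ExecutorService`: split the pages into one contiguous slice per core, scan the slices in parallel, and add up the partial results. This is a simplified illustration with hypothetical names; pausing background work and the throttling described above are not modeled, and the naive `contains` stands in for the optimized inner loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of fanning one search out over all cores; names are hypothetical.
class ParallelPageSearch {
    // Naive byte-level containment check standing in for the optimized inner loop.
    static boolean contains(byte[] hay, byte[] needle) {
        outer:
        for (int i = 0; i <= hay.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    // Counts pages containing needle; each worker scans one contiguous slice of the pages.
    static long countMatchingPages(List<byte[]> pages, byte[] needle, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (pages.size() + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                int from = Math.min(pages.size(), t * chunk);
                int to = Math.min(pages.size(), from + chunk);
                Callable<Long> task = () -> {
                    long n = 0;
                    for (int p = from; p < to; p++) {
                        if (contains(pages.get(p), needle)) n++;
                    }
                    return n;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // wait for every slice
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> pages = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            String text = (i % 100 == 0) ? "GET / Googlebot/2.1" : "GET / Mozilla/5.0";
            pages.add(text.getBytes(StandardCharsets.UTF_8));
        }
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(countMatchingPages(pages, "bot".getBytes(StandardCharsets.UTF_8), cores)); // 100
    }
}
```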

16 cores: For reliability, we organize servers into master/slave groups. Each master has one SSD server and one EBS server under it. If the master fails, the SSD server immediately takes its place. Almost all the time, the master and slave are both healthy, so every block of data can be searched on two different servers (the EBS slave has a weak processor, so we don't count it). We split the work between them, giving us a total of 16 cores.
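As a back-of-the-envelope check, assuming the roughly 1.25 GB/s per core quoted earlier holds and the work splits perfectly (decimal units):

```latex
16\ \text{cores} \times 1.25\ \frac{\text{GB/s}}{\text{core}} = 20\ \text{GB/s}
\qquad
\frac{1\ \text{TB/s}}{1.25\ \text{GB/s per core}} = \frac{1000\ \text{GB/s}}{1.25\ \text{GB/s}} = 800\ \text{cores}
```

The first figure matches the earlier title, "Search at 20 GB per second", mentioned in the closing edit note. The second is only an order-of-magnitude illustration, since the later speedup also came from changing the type of EC2 servers.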

More cores: In the near future, we will distribute data across servers so that all of them take part in processing every non-trivial request. Every core will be put to work. [Note: we implemented this plan and raised search speed to 1 TB/s; see the note at the end of the article.]

Simplicity means reliability

Another advantage of the brute-force approach is its fairly consistent performance. As a rule, the search is not very sensitive to the particulars of the problem or the dataset (I suppose that's why it's called "brute").

A keyword index sometimes produces incredibly fast results, and sometimes it doesn't. Say you have 50 GB of logs in which the term 'customer_5987235982' appears exactly three times. A search for this term reads those three locations straight from the index and finishes instantly. But a complex wildcard search may have to scan thousands of keywords and take a long time.

Brute-force search, on the other hand, runs at more or less the same speed for any query. Searching for long words is faster, but even a search for a single character is quite fast.

The simplicity of the brute-force approach means its performance stays close to its theoretical maximum. There are fewer opportunities for unexpected disk overload, lock contention, pointer chasing, and thousands of other causes of slowdowns. I just looked at the queries Scalyr users ran over the past week on our busiest server. There were 174,000 of them. Exactly eight took longer than one second; 99% completed within 111 milliseconds (if you haven't used log analysis tools, trust me: that's fast).

Stable, predictable performance is important for a service to be pleasant to use. If it stalls from time to time, users will perceive it as unreliable and become reluctant to use it.

Log search in action

Here is a short animation showing Scalyr search in action. We have a demo account into which we import every event from every public GitHub repository. In this demo, I query a week's worth of data: about 600 MB of raw logs.

The video was recorded live, without any special preparation, on my desktop (about 5,000 km from the server). The performance you'll see is largely due to optimizations in the web client, together with a fast and reliable backend. Whenever there is a pause without a "loading" indicator, that's me pausing so you can read what I'm about to click.

[Animation: Scalyr log search demo]

Conclusion

When processing large amounts of data, it is important to choose a good algorithm, but "good" doesn't mean "fancy". Think about how your code will behave in practice. Theoretical analysis of algorithms leaves out some factors that can matter a great deal in the real world. Simpler algorithms are easier to optimize and more robust in edge cases.

Also think about the context in which the code will run. In our case, we need servers powerful enough to handle background work anyway. Users run searches relatively rarely, so we can borrow an entire group of servers for the short time needed to complete each search.

Using brute force, we built fast, reliable, flexible search over a large body of logs. We hope these ideas prove useful in your own projects.

Edit: The title and text were changed from "Search at 20 GB per second" to "Search at 1 TB per second" to reflect the performance gains of the past few years. This speedup is mainly due to changes in the type and number of EC2 servers we run today to serve our expanded customer base. Further changes coming soon will deliver another dramatic boost in efficiency, and we can't wait to share them.

Source: www.habr.com
