Rewriting the VKontakte message database from scratch and surviving

Our users message each other tirelessly.
That is a lot. If you set out to read every message of every user, it would take more than 150 thousand years, and that assumes you are a fairly advanced reader who spends no more than a second on each message.

With this much data, it is critical that the logic for storing and accessing it is built correctly. Otherwise, at some not-so-wonderful moment, it may become clear that everything is about to go wrong.

For us, that moment came a year and a half ago. How we got there and how it all ended, we will tell in order.

Background

In their first implementation, VKontakte messages ran on a combination of a PHP backend and MySQL. That is a perfectly normal solution for a small student website. However, the site grew uncontrollably and started to demand its own optimized data structures.

At the end of 2009, the first text-engine storage was written, and in 2010 messages were migrated to it.

In text-engine, messages were stored in lists, a kind of "mailbox". Each such list is identified by a uid: the user who owns all of these messages. A message has a set of attributes: the interlocutor's identifier, text, attachments, and so on. A message's identifier inside the "mailbox" is its local_id; it never changes and is assigned sequentially to new messages. The "mailboxes" are independent and are not synchronized with one another inside the engine; communication between them happens at the PHP level. You can take a closer look at the data structure and capabilities of text-engine from the inside here.
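A minimal in-memory sketch of the mailbox model just described. The class and function names (`Mailbox`, `TextEngine`, `php_send_dialog`) are hypothetical, and local_id is assumed to start at 1; the point is that each mailbox numbers its own messages and that the PHP layer, not the engine, writes both copies of a dialog message.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    local_id: int                 # assigned sequentially inside the mailbox, never changes
    peer_id: int                  # the interlocutor
    text: str
    attachments: list = field(default_factory=list)


class Mailbox:
    """One text-engine list: every message owned by a single uid."""

    def __init__(self, uid):
        self.uid = uid
        self.messages = []

    def append(self, peer_id, text):
        msg = Message(local_id=len(self.messages) + 1, peer_id=peer_id, text=text)
        self.messages.append(msg)
        return msg.local_id


class TextEngine:
    def __init__(self):
        self.mailboxes = {}       # uid -> Mailbox; mailboxes know nothing about each other

    def mailbox(self, uid):
        if uid not in self.mailboxes:
            self.mailboxes[uid] = Mailbox(uid)
        return self.mailboxes[uid]


def php_send_dialog(engine, sender, recipient, text):
    """Hypothetical PHP-side helper: writes the sender's and the recipient's copies separately."""
    return (engine.mailbox(sender).append(recipient, text),
            engine.mailbox(recipient).append(sender, text))
```

Because the two copies get independent local_ids, the same message is known under different identifiers in each mailbox.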
This was quite enough for communication between two users. Guess what happened next?

In May 2011, VKontakte introduced conversations with several participants: multichats. To support them, we brought up two new clusters: member-chats and chat-members. The first stores data about chats by user, the second stores data about users by chat. Besides the lists themselves, this includes, for example, the user who sent the invitation and the time a member was added to the chat.

"PHP, let's send a message to the chat," says the user.
"Come on, {username}," says PHP.
This scheme has drawbacks. Synchronization is still PHP's responsibility. Large chats, with users sending messages to them at the same time, are a dangerous story. Since a text-engine instance depends on the uid, chat participants could receive the same message at different times. One could live with this if development stood still. But that was not going to happen.
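To make the synchronization problem concrete, here is a hypothetical sketch of PHP fanning a multichat message out to per-user mailboxes one write at a time. The names and the abstract `clock` are illustrative assumptions; each copy ends up with its own local_id and its own timestamp, which is exactly what later makes merging copies hard.

```python
import itertools


class Mailbox:
    def __init__(self):
        self.messages = []        # (local_id, timestamp, chat_id, text)

    def append(self, timestamp, chat_id, text):
        local_id = len(self.messages) + 1
        self.messages.append((local_id, timestamp, chat_id, text))
        return local_id


def php_fan_out(mailboxes, members, chat_id, text, clock):
    """Write one copy per member sequentially; nothing makes the writes atomic."""
    result = {}
    for uid in members:
        ts = clock()              # each write reads the clock at its own moment
        result[uid] = mailboxes[uid].append(ts, chat_id, text)
    return result


ticks = itertools.count(100)      # stand-in clock that advances on every call
clock = lambda: next(ticks)
```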

At the end of 2015 we launched community messages, and at the beginning of 2016 we launched an API for them. With the arrival of large chatbots in communities, we could forget about any even distribution of load.

A good bot generates several million messages a day; not even the chattiest users can boast of that. This meant that some text-engine instances, the ones where such bots lived, started to suffer badly.

In 2016, the message engines were 100 instances of chat-members and member-chats, plus 8000 text-engines. They were hosted on a thousand servers, each with 64 GB of memory. As a first emergency measure, we added another 32 GB of memory. We estimated the forecasts: without major changes, this would last roughly one more year. We had to either acquire more hardware or optimize the databases themselves.

Because of the architecture, adding hardware only makes sense in multiples, that is, at least doubling the number of machines. Obviously, that path is quite expensive. We chose to optimize.

A new concept

The central entity of the new approach is the chat. A chat has a list of messages that belong to it. A user has a list of chats.

The minimum required is two new databases:

  • chat-engine. This is a storage of chat vectors. Each chat has a vector of messages that belong to it. Each message has a text and a message identifier that is unique within the chat: chat_local_id.
  • user-engine. This is a storage of user vectors: references to users. Each user has a vector of peer_id (interlocutors: other users, multichats, or communities) and a vector of messages. Each peer_id has a vector of messages that belong to it. Each message has a chat_local_id and a message ID that is unique for that user: user_local_id.
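A minimal sketch of the two vectors, assuming for simplicity that a peer_id can also be used as the key of the chat it refers to (in reality a dialog's peer_id is the other user and has to be mapped to a chat). The class names and `text_of` helper are hypothetical; the point is that user-engine stores only references, and the text lives once in chat-engine.

```python
from dataclasses import dataclass, field


@dataclass
class Chat:
    """chat-engine: a chat owns its message vector; index i holds chat_local_id i + 1."""
    messages: list = field(default_factory=list)

    def append(self, text):
        self.messages.append(text)
        return len(self.messages)                  # chat_local_id


@dataclass
class User:
    """user-engine: references into chats, also grouped by peer_id."""
    messages: list = field(default_factory=list)   # index i -> (peer_id, chat_local_id); user_local_id i + 1
    by_peer: dict = field(default_factory=dict)    # peer_id -> [user_local_id, ...]

    def add(self, peer_id, chat_local_id):
        self.messages.append((peer_id, chat_local_id))
        user_local_id = len(self.messages)
        self.by_peer.setdefault(peer_id, []).append(user_local_id)
        return user_local_id


def text_of(user, user_local_id, chats):
    """Resolve a user's message through the chat that actually stores its text."""
    peer_id, chat_local_id = user.messages[user_local_id - 1]
    return chats[peer_id].messages[chat_local_id - 1]
```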

The new clusters communicate over TCP, which guarantees that the order of requests does not change. The requests themselves and their acknowledgments are written to the hard drive, so we can restore the state of the queue at any moment after a failure or an engine restart. Since user-engine and chat-engine consist of 4 thousand shards each, the request queue between the clusters is spread evenly (and in practice there is hardly any queue at all; it works very fast).
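A hypothetical sketch of the recovery idea: every request and every acknowledgment is appended to a log on disk, and after a restart the queue is rebuilt as "requests without an ack, in their original order". The JSON-lines file format and class name are assumptions for illustration; the TCP transport itself is not modeled.

```python
import json
import os


class DurableQueue:
    """Log every request and every ack before acting on it; replay the log after a restart."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())          # make the record survive a crash right after the write

    def enqueue(self, req_id, payload):
        self._append({"type": "req", "id": req_id, "payload": payload})

    def ack(self, req_id):
        self._append({"type": "ack", "id": req_id})

    def pending(self):
        """Requests that were never acknowledged, in the order they were logged."""
        if not os.path.exists(self.path):
            return []
        requests, acked = {}, set()
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record["type"] == "req":
                    requests[record["id"]] = record["payload"]   # dicts keep insertion order
                else:
                    acked.add(record["id"])
        return [(rid, p) for rid, p in requests.items() if rid not in acked]
```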

In most cases, our databases work with disk through a combination of a binary log of changes (binlog), static snapshots, and a partial in-memory image. Changes made during the day are written to the binlog, and a snapshot of the current state is created periodically. A snapshot is a collection of data structures optimized for our purposes. It consists of a header (the metaindex of the image) and a set of metafiles. The header is kept permanently in RAM and indicates where to look for data in the snapshot. Each metafile contains data that are likely to be needed at around the same time, for example, data belonging to a single user. When a query arrives, the snapshot header is used to read the required metafile, and then the binlog changes made after the snapshot was created are applied on top. You can read more about the advantages of this approach here.
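A simplified sketch of the read path described above, with plain dicts standing in for files on disk. The names (`Store`, `metafile_reads`) and the per-user message lists are assumptions; the structure follows the text: an in-RAM header points to one metafile, and the binlog tail written after the snapshot is kept in memory and applied on top.

```python
class Store:
    """Snapshot (header + metafiles) plus the in-memory image of the binlog tail."""

    def __init__(self, header, metafiles, binlog, snapshot_pos):
        self.header = header            # kept in RAM: uid -> metafile id
        self.metafiles = metafiles      # stand-in for disk: metafile id -> {uid: [texts]}
        self.delta = {}                 # changes made after the snapshot was taken
        for uid, text in binlog[snapshot_pos:]:   # earlier events are already in the snapshot
            self.delta.setdefault(uid, []).append(text)
        self.metafile_reads = 0

    def messages(self, uid):
        base = []
        if uid in self.header:          # the header says which single metafile to read
            self.metafile_reads += 1
            base = self.metafiles[self.header[uid]].get(uid, [])
        return base + self.delta.get(uid, [])
```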

At the same time, the data on the hard drive itself change only once a day, late at night Moscow time, when the load is minimal. Because of this (we know that the on-disk structure stays constant throughout the day), we can afford to use vectors instead of fixed-size structures, and thereby save memory.

Sending a message in the new scheme looks like this:

  1. The PHP backend calls user-engine with a request to send a message.
  2. user-engine proxies the request to the relevant chat-engine instance, which returns to user-engine the chat_local_id: the unique identifier of the new message within this chat. chat-engine then broadcasts the message to all recipients in the chat.
  3. user-engine receives the chat_local_id from chat-engine and returns the user_local_id to PHP: the message identifier that is unique for this user. This identifier is then used, for example, to work with messages through the API.
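The three steps can be sketched in-process as follows. This is a hypothetical model, not the production protocol: `N_SHARDS`, routing by `id % N_SHARDS`, and using the chat_id as the peer_id are assumptions, and network calls become method calls. The sender's own copy is recorded by its user-engine in step 3; the other recipients receive theirs through chat-engine's broadcast.

```python
N_SHARDS = 4   # hypothetical; the article's clusters have about 4000 instances each


class ChatEngine:
    def __init__(self):
        self.chats = {}                                    # chat_id -> [text, ...]

    def append(self, chat_id, sender, members, text, user_engines):
        msgs = self.chats.setdefault(chat_id, [])
        msgs.append(text)
        chat_local_id = len(msgs)                          # unique within this chat
        for uid in members:                                # broadcast to the other recipients
            if uid != sender:
                user_engines[uid % N_SHARDS].deliver(uid, chat_id, chat_local_id)
        return chat_local_id


class UserEngine:
    def __init__(self):
        self.users = {}                                    # uid -> [(peer_id, chat_local_id), ...]

    def deliver(self, uid, peer_id, chat_local_id):
        box = self.users.setdefault(uid, [])
        box.append((peer_id, chat_local_id))
        return len(box)                                    # user_local_id

    def send(self, uid, chat_id, members, text, chat_engines, user_engines):
        # Step 2: proxy to the chat-engine instance that owns this chat.
        owner = chat_engines[chat_id % N_SHARDS]
        chat_local_id = owner.append(chat_id, uid, members, text, user_engines)
        # Step 3: record the sender's copy and hand user_local_id back to PHP.
        return self.deliver(uid, chat_id, chat_local_id)


def php_send(uid, chat_id, members, text, chat_engines, user_engines):
    # Step 1: the PHP backend calls the sender's user-engine instance.
    return user_engines[uid % N_SHARDS].send(uid, chat_id, members, text, chat_engines, user_engines)
```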

But besides actually sending messages, several other important things have to be implemented:

  • Sublists: for example, the most recent messages you see when you open the chat list, unread messages, and messages with tags ("Important", "Spam", etc.).
  • Compressing messages in chat-engine.
  • Caching messages in user-engine.
  • Search (across all dialogs and within a specific one).
  • Real-time updates (Longpolling).
  • Saving history to support caching on mobile clients.

All sublists are rapidly changing structures. To work with them, we use Splay trees. The choice is explained by the fact that a tree vertex sometimes stores a whole segment of messages from the snapshot: for example, after the nightly rewrite, the tree consists of a single vertex that contains all the messages of the sublist. A Splay tree makes it easy to insert into the middle of such a vertex without having to think about balancing. In addition, Splay does not store any extra data, which saves memory.
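For readers unfamiliar with the structure, here is a textbook top-down splay tree (Sleator's formulation) keyed by message id. It is only the generic technique: the production trees additionally let one vertex hold a whole run of snapshot messages, which this sketch omits. Note that a node stores nothing but its key and two children, with no balance factors or colors.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def splay(root, key):
    """Bring `key` (or the last node on its search path) to the root, top-down."""
    if root is None:
        return None
    header = Node(None)
    left_max = right_min = header        # tails of the left and right trees being assembled
    t = root
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:         # zig-zig: rotate right
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            right_min.left = t           # link t into the right tree
            right_min, t = t, t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:        # zag-zag: rotate left
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            left_max.right = t           # link t into the left tree
            left_max, t = t, t.right
        else:
            break
    left_max.right, right_min.left = t.left, t.right   # reassemble
    t.left, t.right = header.right, header.left
    return t


def insert(root, key):
    if root is None:
        return Node(key)
    root = splay(root, key)
    if key == root.key:
        return root
    node = Node(key)
    if key < root.key:
        node.left, node.right, root.left = root.left, root, None
    else:
        node.right, node.left, root.right = root.right, root, None
    return node


def delete(root, key):
    if root is None:
        return None
    root = splay(root, key)
    if root.key != key:
        return root
    if root.left is None:
        return root.right
    new_root = splay(root.left, key)     # brings the maximum of the left subtree up
    new_root.right = root.right
    return new_root


def inorder(root):
    out, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```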

Messages contain a lot of information, mostly text, which it pays to be able to compress. It is important that we can decompress even a single message on its own. To compress messages we use the Huffman algorithm with our own heuristics: for example, we know that in messages words alternate with "non-words" (spaces, punctuation), and we also take into account some peculiarities of how characters are used in Russian.
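A sketch of how the word/non-word alternation can be exploited, under stated assumptions: two separate Huffman tables (one for words, one for separators) are trained on a corpus that is assumed to contain every token, bits are kept as a `"0"/"1"` string for readability, and the Russian-specific heuristics are omitted. Because tokens strictly alternate, the decoder always knows which table to use, and each message decodes independently.

```python
import heapq
import itertools
import re
from collections import Counter

TOKEN = re.compile(r"\w+|\W+")        # maximal runs, so words and non-words alternate


def tokenize(text):
    return TOKEN.findall(text)


def is_word(token):
    return re.match(r"\w", token) is not None


def huffman_codes(freqs):
    """Classic Huffman construction: {symbol: bit string}."""
    if not freqs:
        return {}
    tie = itertools.count()                               # keeps heap entries comparable
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))  # tuples are internal nodes
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


class MessageCodec:
    """Hypothetical codec: one code table for words, one for separators."""

    def __init__(self, corpus):
        words, seps = Counter(), Counter()
        for text in corpus:
            for tok in tokenize(text):
                (words if is_word(tok) else seps)[tok] += 1
        self.codes = {True: huffman_codes(words), False: huffman_codes(seps)}
        self.decode_tables = {kind: {bits: tok for tok, bits in table.items()}
                              for kind, table in self.codes.items()}

    def encode(self, text):
        toks = tokenize(text)
        first_is_word = bool(toks) and is_word(toks[0])
        bits = "".join(self.codes[is_word(t)][t] for t in toks)
        return first_is_word, len(toks), bits

    def decode(self, first_is_word, n_tokens, bits):
        out, pos, word_turn = [], 0, first_is_word
        for _ in range(n_tokens):
            table = self.decode_tables[word_turn]         # alternation picks the table
            end = pos + 1
            while bits[pos:end] not in table:
                if end > len(bits):
                    raise ValueError("corrupt bit string")
                end += 1
            out.append(table[bits[pos:end]])
            pos, word_turn = end, not word_turn
        return "".join(out)
```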

Since there are far fewer users than chats, we cache messages in user-engine to save random-access disk requests in chat-engine.

Message search is implemented as a diagonal query from user-engine to all chat-engine instances that hold this user's chats. The results are merged in user-engine itself.
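A hypothetical in-process sketch of the diagonal query: user-engine groups the user's chats by owning shard, sends one request per shard, and merges the newest-first partial results with `heapq.merge`. The shard count, the modulo routing, and substring matching are assumptions made for illustration.

```python
import heapq
import itertools
from collections import defaultdict

N_SHARDS = 4   # hypothetical; chat-engine has about 4000 instances


class ChatEngineShard:
    def __init__(self):
        self.chats = defaultdict(list)            # chat_id -> [(timestamp, text), ...]

    def search(self, chat_ids, query):
        hits = [(ts, chat_id, text)
                for chat_id in chat_ids
                for ts, text in self.chats.get(chat_id, [])
                if query in text]
        return sorted(hits, reverse=True)         # newest first


def search_user(user_chats, shards, query, limit=20):
    by_shard = defaultdict(list)
    for chat_id in user_chats:
        by_shard[chat_id % N_SHARDS].append(chat_id)
    # One request per shard that actually holds some of this user's chats.
    partial = [shards[s].search(ids, query) for s, ids in by_shard.items()]
    merged = heapq.merge(*partial, reverse=True)  # inputs are already sorted descending
    return list(itertools.islice(merged, limit))
```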

So, all the details were thought through; all that remained was to switch to the new scheme, preferably without users noticing.

Data migration

So, we have text-engine, which stores messages by user, and two clusters, chat-members and member-chats, which store data about multichat rooms and the users in them. How do we get from this to the new user-engine and chat-engine?

In the old scheme, member-chats was used mainly for optimization. We quickly transferred the data we needed from it to chat-members, after which it took no further part in the migration.

Next in line is chat-members. It consists of 100 instances, while chat-engine has 4 thousand. To transfer the data, the two have to be brought into correspondence: for this, chat-members was split into 4 thousand identical copies, and then chat-engine was allowed to read the chat-members binlog.
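One way to picture "each chat-engine instance reads its copy of the chat-members binlog" is sketched below. The placement rule is an assumption, not something the article states: if both clusters shard by chat_id modulo their instance count, then because 100 divides 4000, every chat of a given new instance lives on exactly one old instance, and the new instance only has to skip events that belong to its sibling copies.

```python
OLD_SHARDS, NEW_SHARDS = 100, 4000   # chat-members vs chat-engine instance counts


def old_shard(chat_id):              # hypothetical placement rule for chat-members
    return chat_id % OLD_SHARDS


def new_shard(chat_id):              # hypothetical placement rule for chat-engine
    return chat_id % NEW_SHARDS


def source_for(new_id):
    # chat_id % 100 == (chat_id % 4000) % 100, so one old shard feeds each new one.
    return new_id % OLD_SHARDS


def replay_into(new_id, old_binlogs):
    """Build one chat-engine instance's membership state from its copy of an old binlog."""
    members = {}
    for chat_id, op, uid in old_binlogs[source_for(new_id)]:
        if new_shard(chat_id) != new_id:
            continue                 # the event belongs to a sibling copy
        if op == "add":
            members.setdefault(chat_id, set()).add(uid)
        elif op == "remove":
            members.get(chat_id, set()).discard(uid)
    return members
```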
Now chat-engine knows about the multichats from chat-members, but it knows nothing yet about dialogs between two interlocutors. Such dialogs live in text-engine, keyed by user. Here we took the data head-on: each chat-engine instance asked every text-engine instance whether it had the dialog it needed.

Great: chat-engine knows which multichats exist and which dialogs exist.

Now the messages in multichats have to be merged so that each chat ends up with a single list of messages. First, chat-engine fetches from text-engine all of the users' messages from this chat. In some cases there are very many of them (up to hundreds of millions), but with very rare exceptions a chat fits entirely in RAM. We have unordered messages, each in several copies: after all, they are pulled from the different text-engine instances that correspond to the users. The goal is to sort the messages and discard the copies that waste space.

Each message has a timestamp with the time it was sent, and a text. We use the time for sorting: we place pointers at the oldest messages of each multichat participant and compare hashes of the text of the candidate copies, moving toward increasing timestamps. Logically, copies should have the same hash and timestamp, but in practice this is not always the case. As you remember, synchronization in the old scheme was handled by PHP, and in rare cases the sending time of the same message differed between users. In these cases we allowed ourselves to adjust the timestamp, usually within one second. The second problem is that different recipients may see messages in a different order. In such cases we allowed an extra copy to be created, so that different users could keep their different orderings.
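A simplified sketch of the pointer walk described above, with hypothetical names: at each step the oldest head among all participants leads, every other head with the same text hash and a timestamp within `TOLERANCE` seconds is treated as a copy and consumed (its timestamp is effectively adjusted to the leader's), and a copy that sits in a different position for some user is simply emitted again later as the allowed extra copy.

```python
import hashlib

TOLERANCE = 1   # seconds; the article says timestamps were adjusted "usually within a second"


def text_hash(text):
    return hashlib.sha1(text.encode("utf-8")).digest()


def merge_copies(per_user):
    """per_user: {uid: [(timestamp, text), ...]}, each list in that user's own order.
    Returns [(timestamp, text, owners)] with matching copies collapsed into one entry."""
    pos = {uid: 0 for uid in per_user}
    merged = []
    while True:
        heads = [(msgs[pos[uid]][0], uid)
                 for uid, msgs in per_user.items() if pos[uid] < len(msgs)]
        if not heads:
            return merged
        ts, leader = min(heads)                       # oldest remaining head leads
        text = per_user[leader][pos[leader]][1]
        h = text_hash(text)
        owners = []
        for head_ts, uid in heads:
            _, other_text = per_user[uid][pos[uid]]
            if abs(head_ts - ts) <= TOLERANCE and text_hash(other_text) == h:
                owners.append(uid)
                pos[uid] += 1                         # this head is a copy: consume it
        merged.append((ts, text, frozenset(owners)))
```

In the usage below, user 2's clock is one second off and user 3 saw the two messages in the opposite order, so user 3 gets an extra copy of "hi".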

After this, the data about messages in multichats are sent to user-engine. And here an unpleasant property of imported messages shows up. In normal operation, messages arriving at the engine are strictly ordered by ascending user_local_id. Messages imported from the old engine lost this useful property. At the same time, to make checks convenient, we need to be able to access them quickly, look things up in them, and add new ones.

We use a special data structure to store imported messages.

It is a vector of size $n = 2^{i_1} + 2^{i_2} + \dots + 2^{i_k}$, where all $i_j$ are distinct and sorted in decreasing order, with a special ordering of elements. Within each segment with indices $[2^{i_1} + \dots + 2^{i_{j-1}},\ 2^{i_1} + \dots + 2^{i_j})$ the elements are sorted. Searching for an element in such a structure takes $O(\log^2 n)$ time via $O(\log n)$ binary searches. Adding an element takes $O(\log n)$ amortized time.
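A minimal sketch of such a vector, assuming entries are `(user_local_id, message)` pairs with unique ids. Adding works like incrementing a binary counter (equal-sized trailing segments are merged), which gives the amortized $O(\log n)$ insert; lookup runs one `bisect` per segment, giving $O(\log^2 n)$. The class and method names are hypothetical.

```python
import bisect
import heapq


class ImportedMessages:
    """One backing vector split into sorted segments of sizes 2^i1 > 2^i2 > ... > 2^ik."""

    def __init__(self):
        self.data = []     # the single backing vector of (user_local_id, message)
        self.sizes = []    # segment sizes, powers of two, strictly decreasing

    def add(self, user_local_id, message):
        self.data.append((user_local_id, message))
        size = 1
        # Binary-counter step: merge equal-sized segments at the tail of the vector.
        while self.sizes and self.sizes[-1] == size:
            start = len(self.data) - 2 * size
            left, right = self.data[start:start + size], self.data[start + size:]
            self.data[start:] = list(heapq.merge(left, right))
            self.sizes.pop()
            size *= 2
        self.sizes.append(size)

    def get(self, user_local_id):
        start = 0
        for size in self.sizes:                       # at most log2(n) + 1 segments
            end = start + size
            i = bisect.bisect_left(self.data, (user_local_id,), start, end)
            if i < end and self.data[i][0] == user_local_id:
                return self.data[i][1]
            start = end
        return None
```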

So we figured out how to move data from the old engines to the new ones. But this process takes several days, and our users are unlikely to drop the habit of writing to each other during that time. To avoid losing messages in the meantime, we switch to a scheme that uses both the old and the new clusters.

Data are written to chat-members and user-engine (and not to text-engine, as in normal operation under the old scheme). user-engine proxies the request to chat-engine, and here the behavior depends on whether this chat has already been merged. If the chat has not been merged yet, chat-engine does not store the message, and it is processed only in text-engine. If the chat has already been merged into chat-engine, chat-engine returns the chat_local_id to user-engine and sends the message to all recipients. user-engine proxies all data to text-engine, so that if something goes wrong we can always roll back, with all current data still in the old engine. text-engine returns the user_local_id, which user-engine stores and returns to the backend.
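The routing rule during the migration window can be sketched as below. This hypothetical model keeps only the decision logic: chat-engine is written only for already-merged chats, text-engine is always written (so a rollback loses nothing) and still assigns the user_local_id, and user-engine remembers both ids. Broadcasting to other recipients and the chat-members write are left out.

```python
class MigrationProxy:
    """user-engine during the migration: dual writes, old engine stays authoritative."""

    def __init__(self, merged_chats):
        self.merged_chats = set(merged_chats)   # chats already merged into chat-engine
        self.chat_engine = {}                   # chat_id -> [text, ...]
        self.text_engine = {}                   # uid -> [text, ...] (old per-user mailboxes)
        self.user_ids = {}                      # uid -> [(chat_id, chat_local_id, user_local_id)]

    def send(self, uid, chat_id, text):
        chat_local_id = None
        if chat_id in self.merged_chats:        # unmerged chats are handled by text-engine only
            msgs = self.chat_engine.setdefault(chat_id, [])
            msgs.append(text)
            chat_local_id = len(msgs)
        box = self.text_engine.setdefault(uid, [])   # always proxied, to keep rollback possible
        box.append(text)
        user_local_id = len(box)                # text-engine still hands out user_local_id
        self.user_ids.setdefault(uid, []).append((chat_id, chat_local_id, user_local_id))
        return user_local_id
```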
As a result, the transition looks like this: we connect empty user-engine and chat-engine clusters. chat-engine reads the entire chat-members binlog, and then proxying starts according to the scheme described above. We transfer the old data and end up with two synchronized clusters (old and new). All that remains is to switch reads from text-engine to user-engine and turn off proxying.

Results

Thanks to the new approach, all of the engines' performance metrics improved, and the data consistency problems were solved. Now we can quickly ship new features in messages (and we already have: we increased the maximum number of chat participants, added search across forwarded messages, launched pinned messages, and raised the limit on the total number of messages per user).

The changes in logic are truly massive. And I would like to point out that this does not always mean years of development by a huge team and endless lines of code. chat-engine and user-engine, together with all the extras such as Huffman for message compression, Splay trees, and the structure for imported messages, come to fewer than 20 thousand lines of code. They were written by 3 developers in just 10 months (though it is worth remembering that all three are world champions in competitive programming).

Moreover, instead of doubling the number of servers, we cut it in half: user-engine and chat-engine now run on 500 physical machines, and the new scheme has plenty of headroom for load. We saved a lot of money on hardware: about $5 million, plus $750 thousand a year in operating expenses.

We strive to find the best solutions to complex, large-scale problems. We have plenty of them, which is why we are looking for talented developers for the database department. If you love solving such problems, know how to do it, and have an excellent command of algorithms and data structures, we invite you to join the team. Contact our HR for details.

Even if this story is not about you, please note that we value referrals. Tell a friend about our developer vacancies, and if they successfully complete the probation period, you will receive a bonus of 100 thousand rubles.

Source: www.habr.com
