Haɓaka tambayoyin bayanai akan misalin sabis na B2B don magina

Yadda za a girma sau 10 adadin tambayoyin zuwa bayanan bayanai ba tare da matsawa zuwa uwar garken da ya fi dacewa ba kuma kula da aikin tsarin? Zan gaya muku yadda muka magance tabarbarewar ayyukan bayanan mu, yadda muka inganta tambayoyin SQL don yin hidima ga masu amfani da yawa gwargwadon yuwuwar ba ƙara farashin kayan aikin kwamfuta ba.

Ina yin sabis don sarrafa hanyoyin kasuwanci a cikin kamfanonin gine-gine. Kimanin kamfanoni dubu 3 ne ke aiki tare da mu. Fiye da mutane dubu 10 suna aiki tare da tsarinmu kowace rana don 4-10 hours. Yana magance matsalolin daban-daban na tsarawa, sanarwa, faɗakarwa, tabbatarwa ... Muna amfani da PostgreSQL 9.6. Muna da tebura kusan 300 a cikin ma'ajin bayanai kuma ana karɓar tambayoyi har miliyan 200 (mabambanta dubu 10) kowace rana. A matsakaita muna da buƙatun dubu 3-4 a sakan daya, a mafi yawan lokutan aiki fiye da buƙatun dubu 10 a sakan daya. Yawancin tambayoyin sune OLAP. Akwai ƙarancin ƙari, gyare-gyare da gogewa, ma'ana nauyin OLTP yana da ɗan haske. Na ba da duk waɗannan lambobin don ku iya tantance girman aikinmu kuma ku fahimci yadda ƙwarewarmu za ta iya zama a gare ku.

Hoto na daya. Na waka

Lokacin da muka fara haɓakawa, ba mu yi tunani sosai game da irin nauyin da zai faɗo a kan bayanan ba da kuma abin da za mu yi idan uwar garken ta daina ja. Lokacin zayyana bayanan, mun bi shawarwarin gabaɗaya kuma mun yi ƙoƙarin kada mu harbe kanmu a ƙafa, amma ya wuce nasiha ta gabaɗaya kamar “kada ku yi amfani da tsarin. Dabi'un Halin Mahalli ba mu shiga ba. Mun tsara bisa ka'idodin daidaitawa, guje wa sakewa bayanai kuma ba mu damu da hanzarta wasu tambayoyin ba. Da zaran masu amfani na farko sun isa, mun ci karo da matsalar aiki. Kamar yadda muka saba, sam ba mu yi shiri da wannan ba. Matsalolin farko sun zama masu sauƙi. A matsayinka na mai mulki, duk abin da aka warware ta hanyar ƙara sabon index. Amma akwai lokacin da faci masu sauƙi suka daina aiki. Da yake fahimtar cewa ba mu da kwarewa kuma yana da wuya a gare mu mu fahimci abin da ke haifar da matsalolin, mun dauki hayar kwararru waɗanda suka taimaka mana wajen kafa uwar garken daidai, haɗa sa ido, kuma sun nuna mana inda za mu duba don samun. kididdiga.

Hoto na biyu. Ƙididdiga

Don haka muna da tambayoyi daban-daban kusan dubu 10 waɗanda ake aiwatarwa a cikin ma'ajin mu kowace rana. Daga cikin waɗannan dubu 10, akwai dodanni waɗanda ake aiwatar da su sau miliyan 2-3 tare da matsakaicin lokacin aiwatarwa na 0.1-0.3 ms, kuma akwai tambayoyi tare da matsakaicin lokacin aiwatarwa na daƙiƙa 30 waɗanda ake kira sau 100 a rana.

Ba zai yiwu a inganta dukkan tambayoyin dubu 10 ba, don haka mun yanke shawarar gano inda za mu jagoranci kokarinmu don inganta aikin da aka yi daidai da bayanan. Bayan an maimaita sau da yawa, mun fara rarraba buƙatun zuwa nau'ikan.

TOP buƙatun

Waɗannan su ne tambayoyin mafi nauyi waɗanda ke ɗaukar mafi yawan lokaci (jimlar lokaci). Waɗannan su ne tambayoyin da ake kira sau da yawa ko kuma tambayoyin da ke ɗaukar lokaci mai tsawo don aiwatarwa (an inganta yawan tambayoyin da aka fi sani da dogon lokaci a farkon matakan gwagwarmaya). Sakamakon haka, uwar garken yana ciyar da mafi yawan lokaci akan aiwatar da su. Haka kuma, yana da mahimmanci a raba manyan buƙatun ta jimlar lokacin aiwatarwa da daban ta lokacin IO. Hanyoyin inganta irin waɗannan tambayoyin sun ɗan bambanta.

Ayyukan da aka saba na duk kamfanoni shine yin aiki tare da buƙatun TOP. Akwai kaɗan daga cikinsu; inganta ko da tambaya ɗaya na iya 'yantar da 5-10% na albarkatu. Koyaya, yayin da aikin ya girma, haɓaka tambayoyin TOP ya zama ƙaramar aiki mara nauyi. An riga an yi aiki da duk hanyoyi masu sauƙi, kuma mafi yawan buƙatar "nauyi" yana ɗaukar "kawai" 3-5% na albarkatu. Idan tambayoyin TOP gabaɗaya suna ɗaukar ƙasa da 30-40% na lokacin, to tabbas kun riga kun yi ƙoƙarin sanya su aiki cikin sauri kuma lokaci yayi da za ku ci gaba don inganta tambayoyin daga rukuni na gaba.
Ya rage don amsa tambayar nawa manyan tambayoyin da ya kamata a haɗa a cikin wannan rukunin. Yawancin lokaci ina ɗaukar akalla 10, amma ba fiye da 20. Ina ƙoƙarin tabbatar da cewa lokacin farko da na ƙarshe a cikin ƙungiyar TOP ya bambanta da fiye da sau 10. Wato idan lokacin aiwatar da tambaya ya ragu sosai daga matsayi na 1 zuwa na 10, to sai in dauki TOP-10, idan digon ya fi a hankali, to ina kara girman kungiyar zuwa 15 ko 20.
Haɓaka tambayoyin bayanai akan misalin sabis na B2B don magina

Makiyaya na tsakiya

Waɗannan duk buƙatun ne waɗanda ke zuwa nan da nan bayan TOP, ban da na ƙarshe na 5-10%. Yawancin lokaci, wajen inganta waɗannan tambayoyin yana nufin damar haɓaka aikin uwar garken sosai. Waɗannan buƙatun na iya yin nauyi har zuwa 80%. Amma ko da rabonsu ya wuce 50%, to lokaci ya yi da za a duba su da kyau.

Wutsiya

Kamar yadda aka ambata, waɗannan tambayoyin suna zuwa a ƙarshe kuma suna ɗaukar 5-10% na lokaci. Kuna iya mantawa da su kawai idan ba ku yi amfani da kayan aikin bincike ta atomatik ba, to inganta su kuma na iya zama arha.

Yadda za a tantance kowace ƙungiya?

Ina amfani da tambayar SQL wanda ke taimakawa yin irin wannan kima don PostgreSQL (Na tabbata cewa ana iya rubuta irin wannan tambayar don sauran DBMSs da yawa)

Tambayar SQL don kimanta girman ƙungiyoyin TOP-MEDIUM-TAIL

SELECT sum(time_top) AS sum_top, sum(time_medium) AS sum_medium, sum(time_tail) AS sum_tail
FROM
(
  SELECT CASE WHEN rn <= 20              THEN tt_percent ELSE 0 END AS time_top,
         CASE WHEN rn > 20 AND rn <= 800 THEN tt_percent ELSE 0 END AS time_medium,
         CASE WHEN rn > 800              THEN tt_percent ELSE 0 END AS time_tail
  FROM (
    SELECT total_time / (SELECT sum(total_time) FROM pg_stat_statements) * 100 AS tt_percent, query,
    ROW_NUMBER () OVER (ORDER BY total_time DESC) AS rn
    FROM pg_stat_statements
    ORDER BY total_time DESC
  ) AS t
)
AS ts

Sakamakon tambayar shine ginshiƙai guda uku, kowannensu yana ɗauke da adadin lokacin da ake ɗauka don aiwatar da tambayoyi daga wannan rukunin. A cikin buƙatun akwai lambobi biyu (a cikin akwati na shine 20 da 800) waɗanda ke raba buƙatun daga rukuni ɗaya daga wata.

Wannan shine yadda rabon buƙatun ke kwatanta daidai lokacin da aikin ingantawa ya fara da yanzu.

Haɓaka tambayoyin bayanai akan misalin sabis na B2B don magina

Jadawalin ya nuna cewa rabon buƙatun TOP ya ragu sosai, amma "masu ƙauyen tsakiya" sun karu.
Da farko, buƙatun TOP sun haɗa da kurakurai. A tsawon lokaci, cututtukan yara sun ɓace, rabon buƙatun TOP ya ragu, kuma dole ne a ƙara ƙoƙari don hanzarta buƙatun buƙatun.

Don samun rubutun buƙatun muna amfani da buƙatun mai zuwa

SELECT * FROM (
  SELECT ROW_NUMBER () OVER (ORDER BY total_time DESC) AS rn, total_time / (SELECT sum(total_time) FROM pg_stat_statements) * 100 AS tt_percent, query
  FROM pg_stat_statements
  ORDER BY total_time DESC
) AS T
WHERE
rn <= 20 -- TOP
-- rn > 20 AND rn <= 800 -- MEDIUM
-- rn > 800  -- TAIL

Anan akwai jerin dabarun da aka fi amfani da su waɗanda suka taimaka mana saurin tambayoyin TOP:

  • Sake fasalin tsarin, alal misali, sake yin aikin dabaru ta hanyar amfani da dillalin saƙo maimakon tambayoyin lokaci-lokaci zuwa bayanan bayanai.
  • Ƙara ko canza fihirisa
  • Sake rubuta tambayoyin ORM zuwa tsantsar SQL
  • Sake rubuta maƙarƙashiya data loda hankali
  • Caching ta hanyar lalata bayanai. Misali, muna da isar da haɗin tebur -> Invoice -> Buƙatar -> Aikace-aikace. Wato, kowane bayarwa yana da alaƙa da aikace-aikacen ta wasu tebur. Don kar a haɗa duk tebur a cikin kowane buƙatun, mun kwafi hanyar haɗin zuwa buƙatun a cikin Teburin Bayarwa.
  • Caching a tsaye tebur tare da litattafan tunani da wuyan canza tebur a cikin ƙwaƙwalwar shirin.

Wani lokaci canje-canjen sun kai ga sake fasalin mai ban sha'awa, amma sun ba da 5-10% na nauyin tsarin kuma an barata. Bayan lokaci, shaye-shaye ya zama ƙarami kuma ƙarami, kuma ana buƙatar sake fasalin da yawa.

Sa'an nan kuma muka mayar da hankali ga rukuni na biyu na buƙatun - ƙungiyar manoma na tsakiya. Akwai ƙarin tambayoyi da yawa a cikinsa kuma da alama zai ɗauki lokaci mai yawa don nazarin ƙungiyar gaba ɗaya. Koyaya, yawancin tambayoyin sun zama masu sauƙi don haɓakawa, kuma an maimaita matsaloli da yawa sau da yawa a cikin bambancin daban-daban. Anan akwai misalan wasu haɓakawa na yau da kullun waɗanda muka yi amfani da su ga yawancin tambayoyi iri ɗaya kuma kowane rukunin ingantattun tambayoyin sun sauke bayanan da kashi 3-5%.

  • Maimakon bincika kasancewar bayanan ta amfani da COUNT da cikakken sikanin tebur, an fara amfani da EXISTS
  • Ka rabu da DISTINCT (babu girke-girke na gaba ɗaya, amma wani lokacin zaka iya kawar da shi cikin sauƙi ta hanyar hanzarta buƙatar ta sau 10-100).

    Misali, maimakon tambaya don zaɓar duk direbobi daga babban tebur na isar da saƙo (DELIVERY)

    SELECT DISTINCT P.ID, P.FIRST_NAME, P.LAST_NAME
    FROM DELIVERY D JOIN PERSON P ON D.DRIVER_ID = P.ID
    

    yayi tambaya akan ƙaramin tebur MUTUM

    SELECT P.ID, P.FIRST_NAME, P.LAST_NAME
    FROM PERSON
    WHERE EXISTS(SELECT D.ID FROM DELIVERY WHERE D.DRIVER_ID = P.ID)
    

    Da alama mun yi amfani da abin da ke da alaƙa, amma yana ba da saurin gudu fiye da sau 10.

  • A yawancin lokuta, an yi watsi da COUNT gaba ɗaya kuma
    maye gurbinsu da lissafin ƙima
  • maimakon
    UPPER(s) LIKE JOHN%’ 
    

    amfani

    s ILIKE “John%”
    

Kowane takamaiman buƙatu wani lokaci ana yin saurin sauri da sau 3-1000. Duk da rawar gani da aka yi, da farko mun ga kamar babu wata ma'ana a inganta tambayar da ke ɗaukar 10 ms don kammalawa, tana ɗaya daga cikin tambayoyin ɗari uku mafi nauyi, kuma tana ɗaukar ɗaruruwan kashi na jimlar lokacin ɗaukar bayanai. Amma ta hanyar yin amfani da girke-girke iri ɗaya ga rukunin tambayoyin nau'in iri ɗaya, mun sami nasara kaɗan kaɗan. Don kada mu ɓata lokaci da hannu muna bitar duk ɗaruruwan tambayoyi da hannu, mun rubuta sauƙaƙan rubutun da yawa waɗanda suka yi amfani da maganganu na yau da kullun don nemo tambayoyi iri ɗaya. Sakamakon haka, bincika ƙungiyoyin tambayoyi ta atomatik ya ba mu damar ƙara haɓaka ayyukanmu tare da ƙaramin ƙoƙari.

A sakamakon haka, mun yi aiki a kan kayan aiki iri ɗaya tsawon shekaru uku yanzu. Matsakaicin nauyin yau da kullun yana kusan 30%, a cikin kololuwar ya kai 70%. Yawan buƙatun, da kuma adadin masu amfani, ya ƙaru kusan sau 10. Kuma duk wannan godiya ga ci gaba da lura da waɗannan ƙungiyoyin buƙatun TOP-MEDIUM. Da zaran sabuwar bukata ta bayyana a cikin rukunin TOP, nan da nan za mu bincika ta kuma mu yi ƙoƙarin hanzarta ta. Muna bitar ƙungiyar MEDIUM sau ɗaya a mako ta amfani da rubutun bincike. Idan muka ci karo da sababbin tambayoyin da muka riga mun san yadda ake inganta su, muna canza su da sauri. Wani lokaci muna samun sababbin hanyoyin ingantawa waɗanda za a iya amfani da su ga tambayoyi da yawa lokaci guda.

Dangane da hasashen mu, uwar garken na yanzu zai jure haɓakar yawan masu amfani da wani sau 3-5. Gaskiya, muna da ƙarin ace guda ɗaya sama da hannunmu - har yanzu ba mu canja wurin tambayoyin SELECT zuwa madubi ba, kamar yadda aka ba da shawarar. Amma ba za mu yi wannan da hankali ba, saboda muna so mu fara ƙare gaba ɗaya damar haɓakawa na "smart" kafin kunna "makamai masu nauyi".
Mahimman kallon aikin da aka yi na iya ba da shawarar yin amfani da ma'auni a tsaye. Sayi uwar garken mafi ƙarfi maimakon ɓata lokacin ƙwararru. Maiyuwa uwar garken ba zai yi tsada haka ba, musamman tun da har yanzu ba mu ƙãre iyakar sikelin a tsaye ba. Koyaya, kawai adadin buƙatun ya ƙaru sau 10. A cikin shekaru da yawa, aikin tsarin ya karu kuma yanzu akwai ƙarin nau'ikan buƙatun. Godiya ga caching, aikin da ya wanzu ana yin shi a cikin ƴan buƙatu kaɗan, da buƙatu mafi inganci. Wannan yana nufin zaku iya ninka a amince da wani 5 don samun ainihin saurin haɓakawa. Don haka, bisa ga mafi yawan ƙididdiga masu ra'ayin mazan jiya, zamu iya cewa haɓakawa ya kasance sau 50 ko fiye. Juyawa uwar garken a tsaye zai kara farashin sau 50. Musamman ganin cewa da zarar an inganta aikin yana aiki koyaushe, kuma lissafin uwar garken haya yana zuwa kowane wata.

source: www.habr.com

Add a comment