I analyzed data engineer job postings as of January 2020 to find out which technical skills are most in demand. I then compared the results with the data for data scientist postings, and some interesting differences emerged.
Without further ado, here are the ten technologies mentioned most often in job postings:
Technology mentions in data engineer job postings, 2020
Data engineer responsibilities
Today, data engineers play a vital role in organizations: they are responsible for storing data and getting it into a form that other employees can work with. Data engineers build pipelines that stream or batch data from many sources. The pipelines then perform extract, transform, and load operations (in other words, ETL processes) that make the data suitable for further use. Next, the data is handed to analysts and data scientists for deeper work. Finally, the data ends its journey in dashboards, reports, and machine learning models.
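To make the extract, transform, load sequence concrete, here is a minimal sketch using only the Python standard library. The CSV data, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (illustration only).
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,5.00,DE
3,,us
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows without an amount, fix types, normalize casing."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, not loaded
        clean.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a table that analysts can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())
```

Keeping each stage as a separate function mirrors the ETL split described above: each step can be tested or replaced on its own.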
I was looking for information that would let me draw conclusions about which technologies are most in demand in data engineering today.
Methods
I collected data from three job search sites.
For each keyword, I calculated the percentage of hits out of the total number of listings on each site, and then averaged those percentages across the three sources.
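The scoring described above can be sketched in a few lines of Python. The hit counts below are hypothetical, not figures from this analysis, and the site names are placeholders.

```python
from statistics import mean

def keyword_share(hits, total_listings):
    """Percentage of a site's listings that mention the keyword."""
    return 100.0 * hits / total_listings

def average_share(counts):
    """counts maps site -> (hits for the keyword, total listings on that site).
    Returns the unweighted mean of the per-site percentages."""
    return mean(keyword_share(hits, total) for hits, total in counts.values())

# Hypothetical counts for one keyword; NOT data from the study.
sql_counts = {"site_a": (700, 1000), "site_b": (1300, 2000), "site_c": (360, 500)}
print(average_share(sql_counts))
```

Averaging the per-site percentages gives each site equal weight regardless of how many listings it carries. Pooling the raw counts instead (2,360 hits out of 3,500 listings here, about 67.4%) would let the largest site dominate.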
Results
Below are the thirty data engineering technology keywords with the highest scores across the three job sites.
And here are the same scores shown as a table.
Let's walk through the ranking.
Looking at the results
SQL and Python each appear in more than two-thirds of the postings analyzed. These are the two technologies most worth learning first.
Spark is mentioned in about half of the postings.
AWS appears in about 45% of the listings. It is a cloud computing platform built by Amazon, and it holds the largest market share of all cloud platforms.
Next come Java and Hadoop, each at just over 40%.
It's like taking a ride in a time machine
Next we see Hive, Scala, Kafka, and NoSQL, each mentioned in about a quarter of the postings. Apache Hive is data warehouse software that "facilitates reading, writing, and managing large datasets residing in distributed storage using SQL."
Comparison with keywords in data scientist postings
Here are the thirty technology keywords most common in data scientist job postings. I built this list with the same method described above for data engineering.
Technology mentions in data scientist job postings, 2020
In terms of raw counts, the search described earlier turned up 28% more data engineer postings than data scientist postings (12 versus 013). Let's look at which technologies appear less often in data scientist postings than in data engineer postings.
More popular in data engineering
The chart below shows the keywords whose average difference is greater than 10 or less than -10 percentage points.
Largest differences in keyword frequency between data engineering and data science
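A threshold filter like the one behind this chart might look as follows. This is a sketch under assumptions: the keyword names and shares are hypothetical, and each share is a percentage of postings, so the differences are in percentage points.

```python
def large_differences(engineer, scientist, threshold=10.0):
    """Return (keyword, difference) pairs whose share differs by more than
    `threshold` percentage points, largest absolute gap first.
    Positive values mean the keyword is more common for data engineers."""
    diffs = {kw: engineer[kw] - scientist[kw] for kw in engineer.keys() & scientist.keys()}
    big = {kw: d for kw, d in diffs.items() if abs(d) > threshold}
    return sorted(big.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical shares (% of postings); NOT data from the study.
engineer = {"keyword_a": 50.0, "keyword_b": 40.0, "keyword_c": 12.0}
scientist = {"keyword_a": 45.0, "keyword_b": 15.0, "keyword_c": 30.0}
print(large_differences(engineer, scientist))
```

Only keywords present in both lists are compared; `keyword_a` is dropped because its 5-point gap is inside the threshold.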
AWS shows the biggest increase: it appears in data engineering postings 25 percentage points more often than in data science postings (about 45% versus 20% of all postings, respectively). That is a striking difference!
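Because both figures are already percentages of postings, the 25-point gap is an absolute difference. The quick calculation below, using the approximate shares quoted above, contrasts it with the relative difference: AWS comes up about 2.25 times as often in data engineering postings.

```python
def percentage_point_gap(a, b):
    """Absolute gap between two shares, in percentage points."""
    return a - b

def relative_increase(a, b):
    """How much larger share a is than share b, as a percentage of b."""
    return 100.0 * (a - b) / b

aws_engineer, aws_scientist = 45.0, 20.0  # approximate shares quoted in the text
print(percentage_point_gap(aws_engineer, aws_scientist))  # 25.0
print(relative_increase(aws_engineer, aws_scientist))     # 125.0
```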
Here is the same data in a slightly different view: the chart shows the results for each keyword in data engineering and data science postings side by side.
Largest differences in keyword frequency between data engineering and data science (side by side)
The next biggest jump I saw was for Spark, which makes sense: data engineers often work with big data.
Less popular in data engineering
Now let's look at the technologies that are less popular in data engineer postings.
The biggest drop relative to the data science postings was for
In demand in both data engineering and data science
Note that eight of the top ten positions are shared by both groups: SQL, Python, Spark, AWS, Java, Hadoop, Hive, and Scala are in the top ten for both data engineering and data science jobs. The chart below shows the most popular technologies in data engineering postings, each followed by its score in data scientist postings.
Recommendations
If you want to get into data engineering, I suggest mastering the following technologies, listed in order of priority.
Learn SQL. I recommend PostgreSQL because it is open source, very popular with the community, and still growing. You can learn the language from my book Memorable SQL; a pre-release version is available
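As a first taste of SQL, here is a small aggregate query run from Python. The built-in sqlite3 module stands in for PostgreSQL so the example runs without a database server; the `postings` table and its rows are hypothetical. The SELECT statement is standard SQL and should run unchanged on PostgreSQL, although there the `id` column would need to be an identity or SERIAL column for the INSERT to work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postings (id INTEGER PRIMARY KEY, title TEXT, skill TEXT);
    INSERT INTO postings (title, skill) VALUES
        ('Data Engineer', 'SQL'),
        ('Data Engineer', 'Spark'),
        ('Data Scientist', 'SQL'),
        ('Data Scientist', 'Python');
""")

# Count postings per skill, most frequent first, ties broken alphabetically.
query = """
    SELECT skill, COUNT(*) AS n
    FROM postings
    GROUP BY skill
    ORDER BY n DESC, skill
"""
for skill, n in conn.execute(query):
    print(skill, n)
```

GROUP BY with COUNT is the kind of aggregation behind the keyword tallies in this article.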
Master Python, though not necessarily at the most advanced level. My book Memorable Python is designed for beginners. It can be purchased at
Once you're comfortable with Python, move on to pandas, a Python library for cleaning and manipulating data. If you want a job that requires writing Python (and most of them do), you can be sure that knowledge of pandas will be expected as a matter of course. I'm finishing up a beginner's guide to working with pandas; you can
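A typical first pandas task is exactly the cleaning described above. This minimal sketch assumes pandas is installed; the DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical raw data with common problems: stray whitespace,
# inconsistent casing, a missing value, and a duplicate row.
raw = pd.DataFrame({
    "title": ["Data Engineer", "data engineer ", "Data Scientist", "Data Scientist"],
    "salary": [120000, 120000, None, 110000],
})

clean = (
    raw.assign(title=raw["title"].str.strip().str.title())  # normalize text
       .dropna(subset=["salary"])                           # drop incomplete rows
       .drop_duplicates()                                   # remove exact repeats
       .reset_index(drop=True)
)

# Average salary per job title.
print(clean.groupby("title")["salary"].mean())
```

The order matters: normalizing the titles first lets `drop_duplicates` recognize the two "Data Engineer" rows as the same record.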
Learn the basics of AWS. If you want to become a data engineer, you can't do without a cloud platform in your toolkit, and AWS is the most popular one. These courses helped me a lot
If you have worked through this whole list and want to stand out further to employers as a data engineer, I suggest adding Apache Spark for working with big data. Although my research on data scientist postings showed declining interest in Spark, it still appears in nearly every second data engineer posting.
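To give a feel for the model Spark is built around, here is a plain-Python imitation of the split, map, and reduce pattern. This is NOT Spark code: each "partition" is just a list processed in a single process, whereas Spark would spread partitions across a cluster. The documents are hypothetical.

```python
from collections import Counter
from functools import reduce

documents = [
    "sql python spark",
    "python aws spark",
    "sql hive scala",
    "spark kafka sql",
]

def partition(items, n):
    """Split the input into n roughly equal chunks (stand-ins for partitions)."""
    return [items[i::n] for i in range(n)]

def map_partition(chunk):
    """Map step: count words locally within one partition."""
    return Counter(word for doc in chunk for word in doc.split())

def merge(a, b):
    """Reduce step: combine two partial counts."""
    return a + b

partials = [map_partition(chunk) for chunk in partition(documents, 2)]
totals = reduce(merge, partials, Counter())
print(totals.most_common(3))
```

Because each partition is counted independently and the merge step is associative, the same logic can run in parallel, which is what makes this pattern suitable for big data.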
Wrap-up
I hope you found this overview of the most in-demand technologies for aspiring data engineers useful. If you are curious how the analysis was carried out, read
Source: www.habr.com