ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ืื ื• ื—ื™ื™ื ื‘ืชืงื•ืคื” ืžื“ื”ื™ืžื” ืฉื‘ื” ืืชื” ื™ื›ื•ืœ ืœื—ื‘ืจ ื‘ืžื”ื™ืจื•ืช ื•ื‘ืงืœื•ืช ื›ืžื” ื›ืœื™ื ืžื•ื›ื ื™ื ื‘ืงื•ื“ ืคืชื•ื—, ืœื”ื’ื“ื™ืจ ืื•ืชื ืขื "ื”ืชื•ื“ืขื” ื›ื‘ื•ื™ื”" ืœืคื™ ื”ืขืฆื” ืฉืœ stackoverflow, ืžื‘ืœื™ ืœื”ืชืขืžืง ื‘"ืื•ืชื™ื•ืช ืžืจื•ื‘ื•ืช", ื•ืœื”ืคืขื™ืœ ืื•ืชื ืœืคืขื•ืœื” ืžืกื—ืจื™ืช. ื•ื›ืฉืฆืจื™ืš ืœืขื“ื›ืŸ/ืœื”ืจื—ื™ื‘ ืื• ืฉืžื™ืฉื”ื• ืžืืชื—ืœ ื‘ื˜ืขื•ืช ื›ืžื” ืžื›ื•ื ื•ืช - ืืชื” ืžื‘ื™ืŸ ืฉื”ืชื—ื™ืœ ืื™ื–ื” ื—ืœื•ื ืจืข ืื•ื‘ืกืกื™ื‘ื™, ื”ื›ืœ ื”ืกืชื‘ืš ื‘ืฆื•ืจื” ื“ืจืžื˜ื™ืช ืœืœื ื”ื™ื›ืจ, ืื™ืŸ ื“ืจืš ื—ื–ืจื”, ื”ืขืชื™ื“ ืžืขื•ืจืคืœ ื•ื‘ื˜ื•ื— ื™ื•ืชืจ, ื‘ืžืงื•ื ืœืชื›ื ืช, ืœื’ื“ืœ ื“ื‘ื•ืจื™ื ื•ืœืขืฉื•ืช ื’ื‘ื™ื ื”.

ืœื ื‘ื›ื“ื™ ืขืžื™ืชื™ื ืžื ื•ืกื™ื ื™ื•ืชืจ, ืฉืจืืฉื™ื”ื ื–ืจื•ืขื™ื ื‘ืื’ื™ื ื•ืœื›ืŸ ื›ื‘ืจ ืืคื•ืจื™ื, ืฉื•ืงืœื™ื ืืช ื”ืคืจื™ืกื” ื”ืžื”ื™ืจื” ืœื”ืคืœื™ื ืฉืœ ื—ื‘ื™ืœื•ืช "ืžื›ื•ืœื•ืช" ื‘"ืงื•ื‘ื™ื•ืช" ื‘ืขืฉืจื•ืช ืฉืจืชื™ื ื‘"ืฉืคื•ืช ืื•ืคื ืชื™ื•ืช" ืขื ืชืžื™ื›ื” ืžื•ื‘ื ื™ืช ื‘- ืงืœื˜/ืคืœื˜ ืืกื™ื ื›ืจื•ื ื™ ืœื ื—ื•ืกื, ื—ื™ื™ืš ื‘ืฆื ื™ืขื•ืช. ื•ื”ื ืžืžืฉื™ื›ื™ื ื‘ืฉืงื˜ ืœืงืจื•ื ืžื—ื“ืฉ ืืช "man ps", ืœื”ืชืขืžืง ื‘ืงื•ื“ ื”ืžืงื•ืจ ืฉืœ "nginx" ืขื“ ืฉืขื™ื ื™ื”ื ืžื“ืžืžื•ืช, ื•ืœื›ืชื•ื‘, ืœื›ืชื•ื‘, ืœื›ืชื•ื‘ ื‘ื“ื™ืงื•ืช ื™ื—ื™ื“ื”. ืขืžื™ืชื™ื ื™ื•ื“ืขื™ื ืฉื”ื“ื‘ืจ ื”ืžืขื ื™ื™ืŸ ื‘ื™ื•ืชืจ ื™ื’ื™ืข ื›ืืฉืจ "ื›ืœ ื–ื”" ื™ื•ื ืื—ื“ ื™ืชื”ืคืš ื‘ืœื™ืœื” ื‘ืขืจื‘ ื”ืฉื ื” ื”ื—ื“ืฉื”. ื•ื”ื ื™ืขื–ืจื• ืจืง ืขืœ ื™ื“ื™ ื”ื‘ื ื” ืžืขืžื™ืงื” ืฉืœ ื˜ื‘ืขื• ืฉืœ ื™ื•ื ื™ืงืก, ื˜ื‘ืœืช ื”ืžืฆื‘ื™ื ื”ืžืฉื•ื ื ืช ืฉืœ TCP/IP ื•ืืœื’ื•ืจื™ืชืžื™ ืžื™ื•ืŸ-ื—ื™ืคื•ืฉ ื‘ืกื™ืกื™ื™ื. ืœื”ื—ื–ื™ืจ ืืช ื”ืžืขืจื›ืช ืœื—ื™ื™ื ื‘ื–ืžืŸ ืฉื”ืฆืœืฆื•ืœื™ื ืžื›ื™ื.

ืื” ื›ืŸ, ืงืฆืช ื”ื™ืกื— ื“ืขืชื™, ืื‘ืœ ืื ื™ ืžืงื•ื•ื” ืฉื”ืฆืœื—ืชื™ ืœื”ืขื‘ื™ืจ ืืช ืžืฆื‘ ื”ืฆื™ืคื™ื™ื”.
ื”ื™ื•ื ืื ื™ ืจื•ืฆื” ืœื—ืœื•ืง ืืช ื”ื ื™ืกื™ื•ืŸ ืฉืœื ื• ื‘ืคืจื™ืกืช ืžื—ืกื ื™ืช ื ื•ื—ื” ื•ื–ื•ืœื” ืขื‘ื•ืจ DataLake, ืืฉืจ ืคื•ืชืจืช ืืช ืจื•ื‘ ื”ืžืฉื™ืžื•ืช ื”ืื ืœื™ื˜ื™ื•ืช ื‘ื—ื‘ืจื” ืขื‘ื•ืจ ื—ื˜ื™ื‘ื•ืช ืžื‘ื ื™ื•ืช ืฉื•ื ื•ืช ืœื—ืœื•ื˜ื™ืŸ.

ืœืคื ื™ ื–ืžืŸ ืžื” ื”ื’ืขื ื• ืœื”ื‘ื ื” ืฉื—ื‘ืจื•ืช ืฆืจื™ื›ื•ืช ื™ื•ืชืจ ื•ื™ื•ืชืจ ืืช ื”ืคื™ืจื•ืช ืฉืœ ื ื™ืชื•ื— ืžื•ืฆืจ ื•ื˜ื›ื ื™ ื›ืื—ื“ (ืฉืœื ืœื“ื‘ืจ ืขืœ ื”ื“ื•ื‘ื“ื‘ืŸ ืฉื‘ืงืฆืคืช ื‘ื“ืžื•ืช ืœืžื™ื“ืช ืžื›ื•ื ื”) ื•ื›ื“ื™ ืœื”ื‘ื™ืŸ ืžื’ืžื•ืช ื•ืกื™ื›ื•ื ื™ื - ืฆืจื™ืš ืœืืกื•ืฃ ื•ืœื ืชื— ืขื•ื“ ื•ืขื•ื“ ืžื“ื“ื™ื.

ื ื™ืชื•ื— ื˜ื›ื ื™ ื‘ืกื™ืกื™ ื‘-Bitrix24

ืœืคื ื™ ืžืกืคืจ ืฉื ื™ื, ื‘ืžืงื‘ื™ืœ ืœื”ืฉืงืช ืฉื™ืจื•ืช Bitrix24, ื”ืฉืงืขื ื• ื‘ืื•ืคืŸ ืืงื˜ื™ื‘ื™ ื–ืžืŸ ื•ืžืฉืื‘ื™ื ื‘ื™ืฆื™ืจืช ืคืœื˜ืคื•ืจืžื” ืื ืœื™ื˜ื™ืช ืคืฉื•ื˜ื” ื•ืืžื™ื ื” ืฉืชืขื–ื•ืจ ืœืจืื•ืช ื‘ืžื”ื™ืจื•ืช ื‘ืขื™ื•ืช ื‘ืชืฉืชื™ืช ื•ืœืชื›ื ืŸ ืืช ื”ืฉืœื‘ ื”ื‘ื. ื›ืžื•ื‘ืŸ ืฉื”ื™ื” ืžื•ืžืœืฅ ืœืงื—ืช ื›ืœื™ื ืžื•ื›ื ื™ื ื›ืžื” ืฉื™ื•ืชืจ ืคืฉื•ื˜ื™ื ื•ืžื•ื‘ื ื™ื. ื›ืชื•ืฆืื” ืžื›ืš, ื ืื’ื™ื•ืก ื ื‘ื—ืจ ืœื ื™ื˜ื•ืจ ื•-munin ืœื ื™ืชื•ื— ื•ื”ื“ืžื™ื”. ืขื›ืฉื™ื• ื™ืฉ ืœื ื• ืืœืคื™ ืฆ'ืงื™ื ื‘ื ื’ื™ื•ืก, ืžืื•ืช ืชืจืฉื™ืžื™ื ื‘ืžื•ื ื™ืŸ, ื•ื”ืงื•ืœื’ื•ืช ืฉืœื ื• ืžืฉืชืžืฉื™ื ื‘ื”ื ื‘ื”ืฆืœื—ื” ืžื“ื™ ื™ื•ื. ื”ืžื“ื“ื™ื ื‘ืจื•ืจื™ื, ื”ื’ืจืคื™ื ื‘ืจื•ืจื™ื, ื”ืžืขืจื›ืช ืขื•ื‘ื“ืช ื‘ืฆื•ืจื” ืืžื™ื ื” ื›ื‘ืจ ื›ืžื” ืฉื ื™ื ื•ืžืชื•ื•ืกืคื™ื ืœื” ื‘ืื•ืคืŸ ืงื‘ื•ืข ื‘ื“ื™ืงื•ืช ื•ื’ืจืคื™ื ื—ื“ืฉื™ื: ื›ืฉืื ื—ื ื• ืžื›ื ื™ืกื™ื ืฉื™ืจื•ืช ื—ื“ืฉ ืœืคืขื•ืœื”, ืžื•ืกื™ืคื™ื ืžืกืคืจ ื‘ื“ื™ืงื•ืช ื•ื’ืจืคื™ื. ื‘ื”ืฆืœื—ื”.

ืืฆื‘ืข ืขืœ ื”ื“ื•ืคืง - ื ื™ืชื•ื— ื˜ื›ื ื™ ืžืชืงื“ื

ื”ืจืฆื•ืŸ ืœืงื‘ืœ ืžื™ื“ืข ืขืœ ื‘ืขื™ื•ืช "ื›ืžื” ืฉื™ื•ืชืจ ืžื”ืจ" ื”ื•ื‘ื™ืœ ืื•ืชื ื• ืœื ื™ืกื•ื™ื™ื ืคืขื™ืœื™ื ื‘ื›ืœื™ื ืคืฉื•ื˜ื™ื ื•ืžื•ื‘ื ื™ื - pinba ื•-xhprof.

ืคื™ื ื‘ื” ืฉืœื—ื” ืœื ื• ื ืชื•ื ื™ื ืกื˜ื˜ื™ืกื˜ื™ื™ื ื‘ื—ื‘ื™ืœื•ืช UDP ืขืœ ืžื”ื™ืจื•ืช ื”ืคืขื•ืœื” ืฉืœ ื—ืœืงื™ื ืžื“ืคื™ ืื™ื ื˜ืจื ื˜ ื‘-PHP, ื•ื™ื›ื•ืœื ื• ืœืจืื•ืช ื‘ืื™ื ื˜ืจื ื˜ ื‘ืื—ืกื•ืŸ MySQL (ืคื™ื ื‘ื” ืžื’ื™ืขื” ืขื ืžื ื•ืข MySQL ืžืฉืœื” ืœื ื™ืชื•ื— ืื™ืจื•ืขื™ื ืžื”ื™ืจ) ืจืฉื™ืžื” ืงืฆืจื” ืฉืœ ื‘ืขื™ื•ืช ื•ืœื”ื’ื™ื‘ ืœ ืื•ึนืชึธื. ื•-xhprof ืืคืฉืจื” ืœื ื• ืื•ื˜ื•ืžื˜ื™ืช ืœืืกื•ืฃ ื’ืจืคื™ื ืฉืœ ื‘ื™ืฆื•ืข ื“ืคื™ PHP ื”ื›ื™ ืื™ื˜ื™ื™ื ืžืœืงื•ื—ื•ืช ื•ืœื ืชื— ืžื” ื™ื›ื•ืœ ืœื”ื•ื‘ื™ืœ ืœื›ืš - ื‘ืจื•ื’ืข, ืœืžื–ื•ื’ ืชื” ืื• ืžืฉื”ื• ื—ื–ืง ื™ื•ืชืจ.

ืœืคื ื™ ื–ืžืŸ ืžื”, ืขืจื›ืช ื”ื›ืœื™ื ื”ืชื—ื“ืฉื” ื‘ืžื ื•ืข ื“ื™ ืคืฉื•ื˜ ื•ืžื•ื‘ืŸ ื”ืžื‘ื•ืกืก ืขืœ ืืœื’ื•ืจื™ืชื ื”ืื™ื ื“ืงืก ื”ื”ืคื•ืš, ืžื™ื•ืฉื ื‘ืฆื•ืจื” ืžื•ืฉืœืžืช ื‘ืกืคืจื™ื™ืช Lucene ื”ืื’ื“ื™ืช - Elastic/Kibana. ื”ืจืขื™ื•ืŸ ื”ืคืฉื•ื˜ ืฉืœ ื”ืงืœื˜ืช ืžืกืžื›ื™ื ืžืจื•ื‘ื” ื”ืœื™ื›ื™ ืœืื™ื ื“ืงืก ื”ืคื•ืš ืฉืœ Lucene ื”ืžื‘ื•ืกืก ืขืœ ืื™ืจื•ืขื™ื ื‘ื™ื•ืžื ื™ื ื•ื—ื™ืคื•ืฉ ืžื”ื™ืจ ื‘ื”ื ื‘ืืžืฆืขื•ืช ื—ืœื•ืงืช ื”ื™ื‘ื˜ื™ื ื”ืชื‘ืจืจ ื›ืžื•ืขื™ืœ ืžืื•ื“.

ืœืžืจื•ืช ื”ืžืจืื” ื”ื˜ื›ื ื™ ืœืžื“ื™ ืฉืœ ื”ื“ืžื™ื•ืช ื‘-Kibana ืขื ืžื•ืฉื’ื™ื ื‘ืจืžื” ื ืžื•ื›ื” ื›ืžื• "ื“ืœื™" "ื–ื•ืจื ื›ืœืคื™ ืžืขืœื”" ื•ื”ืฉืคื” ืฉื”ื•ืžืฆืื” ืžื—ื“ืฉ ืฉืœ ื”ืืœื’ื‘ืจื” ื”ื”ืชื™ื™ื—ืกื•ืชื™ืช ืฉื˜ืจื ื ืฉื›ื—ื” ืœื—ืœื•ื˜ื™ืŸ, ื”ื›ืœื™ ื”ื—ืœ ืœืขื–ื•ืจ ืœื ื• ื”ื™ื˜ื‘ ื‘ืžืฉื™ืžื•ืช ื”ื‘ืื•ืช:

  • ื›ืžื” ืฉื’ื™ืื•ืช PHP ื”ื™ื• ืœืœืงื•ื— Bitrix24 ื‘ืคื•ืจื˜ืœ p1 ื‘ืฉืขื” ื”ืื—ืจื•ื ื” ื•ืื™ื–ื”? ืœื”ื‘ื™ืŸ, ืœืกืœื•ื— ื•ืœืชืงืŸ ื‘ืžื”ื™ืจื•ืช.
  • ื›ืžื” ืฉื™ื—ื•ืช ื•ื™ื“ืื• ื‘ื•ืฆืขื• ื‘ืคื•ืจื˜ืœื™ื ื‘ื’ืจืžื ื™ื” ื‘-24 ื”ืฉืขื•ืช ื”ืงื•ื“ืžื•ืช, ื‘ืื™ื–ื• ืื™ื›ื•ืช ื•ื”ื™ื• ืงืฉื™ื™ื ื‘ืขืจื•ืฅ/ืจืฉืช?
  • ืขื“ ื›ืžื” ืคื•ืขืœืช ืคื•ื ืงืฆื™ื•ื ืœื™ื•ืช ื”ืžืขืจื›ืช (ืชื•ืกืฃ C ืฉืœื ื• ืขื‘ื•ืจ PHP), ืฉื”ื•ืจื›ื‘ื” ืžื”ืžืงื•ืจ ื‘ืขื“ื›ื•ืŸ ื”ืฉื™ืจื•ืช ื”ืื—ืจื•ืŸ ื•ื”ื•ืฉืงื” ืœืœืงื•ื—ื•ืช? ื”ืื ื™ืฉ ืชืงืœื•ืช?
  • ื”ืื ื ืชื•ื ื™ ืœืงื•ื—ื•ืช ืžืชืื™ืžื™ื ืœื–ื™ื›ืจื•ืŸ PHP? ื”ืื ื™ืฉ ืฉื’ื™ืื•ืช ืœื’ื‘ื™ ื—ืจื™ื’ื” ืžื”ื–ื™ื›ืจื•ืŸ ืฉื”ื•ืงืฆื” ืœืชื”ืœื™ื›ื™ื: "ื—ืกืจ ื–ื™ื›ืจื•ืŸ"? ืœืžืฆื•ื ื•ืœื ื˜ืจืœ.

ื”ื ื” ื“ื•ื’ืžื” ืงื•ื ืงืจื˜ื™ืช. ืœืžืจื•ืช ื‘ื“ื™ืงื” ื™ืกื•ื“ื™ืช ื•ืจื‘-ืฉื›ื‘ืชื™ืช, ื”ืœืงื•ื—, ืขื ืžืืจื– ืžืื•ื“ ืœื ืกื˜ื ื“ืจื˜ื™ ื•ื ืชื•ื ื™ ืงืœื˜ ืคื’ื•ืžื™ื, ืงื™ื‘ืœ ืฉื’ื™ืื” ืžืขืฆื‘ื ืช ื•ื‘ืœืชื™ ืฆืคื•ื™ื”, ื ืฉืžืขื” ืกื™ืจื ื” ื•ื”ืชื—ื™ืœ ืชื”ืœื™ืš ื”ืชื™ืงื•ืŸ ื”ืžื”ื™ืจ:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื‘ื ื•ืกืฃ, kibana ืžืืคืฉืจืช ืœืืจื’ืŸ ื”ื•ื“ืขื•ืช ืœืื™ืจื•ืขื™ื ืžื•ื’ื“ืจื™ื, ื•ืชื•ืš ื–ืžืŸ ืงืฆืจ ื”ื›ืœื™ ื‘ื—ื‘ืจื” ื”ื—ืœ ืœืฉืžืฉ ืขืฉืจื•ืช ืขื•ื‘ื“ื™ื ืžืžื—ืœืงื•ืช ืฉื•ื ื•ืช - ืžืชืžื™ื›ื” ื˜ื›ื ื™ืช ื•ืคื™ืชื•ื— ื•ืขื“ QA.

ื”ืคืขื™ืœื•ืช ืฉืœ ื›ืœ ืžื—ืœืงื” ื‘ื—ื‘ืจื” ื”ืคื›ื” ื ื•ื—ื” ืœืžืขืงื‘ ื•ืœืžื“ื™ื“ื” - ื‘ืžืงื•ื ืœื ืชื— ื™ื“ื ื™ืช ื™ื•ืžื ื™ื ื‘ืฉืจืชื™ื, ืืชื” ืจืง ืฆืจื™ืš ืœื”ื’ื“ื™ืจ ื™ื•ืžื ื™ ื ื™ืชื•ื— ืคืขื ืื—ืช ื•ืœืฉืœื•ื— ืื•ืชื ืœืืฉื›ื•ืœ ื”ืืœืกื˜ื™ ื›ื“ื™ ืœื™ื”ื ื•ืช, ืœืžืฉืœ, ืžื”ืจื”ื•ืจื™ื ื‘ืงื™ื‘ื ื”. ืœื•ื— ื”ืžื—ื•ื•ื ื™ื ืžืกืคืจ ื”ื’ื•ืจื™ื ืฉื ืžื›ืจื• ื‘ืขืœื™ ืฉื ื™ ืจืืฉื™ื ืฉื”ื•ื“ืคืกื• ื‘ืžื“ืคืกืช ืชืœืช ืžื™ืžื“ ื‘ื—ื•ื“ืฉ ื”ื™ืจื— ื”ืื—ืจื•ืŸ.

ื ื™ืชื•ื— ืขืกืงื™ ื‘ืกื™ืกื™

ื›ื•ืœื ื™ื•ื“ืขื™ื ืฉื ื™ืชื•ื— ืขืกืงื™ ื‘ื—ื‘ืจื•ืช ืžืชื—ื™ืœ ืœืจื•ื‘ ื‘ืฉื™ืžื•ืฉ ืืงื˜ื™ื‘ื™ ื‘ื™ื•ืชืจ ื‘-, ื›ืŸ, ืืงืกืœ. ืื‘ืœ ื”ืขื™ืงืจ ืฉื–ื” ืœื ื ื’ืžืจ ืฉื. ื’ื ื’ื•ื’ืœ ืื ืœื™ื˜ื™ืงืก ืžื‘ื•ืกืก ืขื ืŸ ืžื•ืกื™ืฃ ืฉืžืŸ ืœืžื“ื•ืจื” - ืืชื” ืžืชื—ื™ืœ ืžื”ืจ ืœื”ืชืจื’ืœ ืœื“ื‘ืจื™ื ื”ื˜ื•ื‘ื™ื.

ื‘ื—ื‘ืจื” ื”ืžืชืคืชื—ืช ื‘ืฆื•ืจื” ื”ืจืžื•ื ื™ืช ืฉืœื ื•, ื”ื—ืœื• ืœื”ื•ืคื™ืข ืคื” ื•ืฉื "ื ื‘ื™ืื™ื" ืฉืœ ืขื‘ื•ื“ื” ืื™ื ื˜ื ืกื™ื‘ื™ืช ื™ื•ืชืจ ืขื ื ืชื•ื ื™ื ื’ื“ื•ืœื™ื ื™ื•ืชืจ. ื”ืฆื•ืจืš ื‘ื“ื™ื•ื•ื—ื™ื ืžืขืžื™ืงื™ื ื•ืจื‘ ื’ื•ื ื™ ื”ื—ืœ ืœื”ื•ืคื™ืข ื‘ืื•ืคืŸ ืงื‘ื•ืข, ื•ื‘ืืžืฆืขื•ืช ืžืืžืฆื™ื ืฉืœ ื—ื‘ืจ'ื” ืžืžื—ืœืงื•ืช ืฉื•ื ื•ืช, ืœืคื ื™ ื–ืžืŸ ืžื” ืื•ืจื’ืŸ ืคืชืจื•ืŸ ืคืฉื•ื˜ ื•ืคืจืงื˜ื™ - ืฉื™ืœื•ื‘ ืฉืœ ClickHouse ื•-PowerBI.

ื‘ืžืฉืš ื“ื™ ื”ืจื‘ื” ื–ืžืŸ ื”ืคืชืจื•ืŸ ื”ื’ืžื™ืฉ ื”ื–ื” ืขื–ืจ ืžืื•ื“, ืื‘ืœ ืœืื˜ ืœืื˜ ื”ืชื—ื™ืœื” ืœื”ื’ื™ืข ื”ื”ื‘ื ื” ืฉ-ClickHouse ื”ื•ื ืœื ื’ื•ืžื™ ื•ืื™ ืืคืฉืจ ืœืœืขื•ื’ ื›ื›ื”.

ื›ืืŸ ื—ืฉื•ื‘ ืœื”ื‘ื™ืŸ ื”ื™ื˜ื‘ ืฉ-ClickHouse, ื›ืžื• Druid, ื›ืžื• Vertica, ื›ืžื• Amazon RedShift (ื”ืžื‘ื•ืกืกืช ืขืœ postgres), ื”ื ืžื ื•ืขื™ื ืื ืœื™ื˜ื™ื™ื ื”ืžื•ืชืืžื™ื ืœื ื™ืชื•ื— ื ื•ื— ืœืžื“ื™ (ืกื›ื•ืžื™ื, ืฆื‘ื™ืจื”, ืžื™ื ื™ืžื•ื-ืžืงืกื™ืžื•ื ืœืคื™ ืขืžื•ื“ื•ืช ื•ื›ืžื” ื”ืฆื˜ืจืคื•ืช ืืคืฉืจื™ื•ืช ), ื›ื™ ืžืื•ืจื’ืŸ ืœืื—ืกื•ืŸ ื™ืขื™ืœ ืฉืœ ืขืžื•ื“ื•ืช ืฉืœ ื˜ื‘ืœืื•ืช ื™ื—ืกื™ื, ื‘ื ื™ื’ื•ื“ ืœ-MySQL ื•ืžืกื“ื™ ื ืชื•ื ื™ื ืื—ืจื™ื (ืžื•ื›ื•ื•ื ื™ ืฉื•ืจื•ืช) ื”ืžื•ื›ืจื™ื ืœื ื•.

ื‘ืขืฆื, ClickHouse ื”ื•ื ืจืง "ื‘ืกื™ืก ื ืชื•ื ื™ื" ืจื—ื‘ ื™ื•ืชืจ, ืขื ื”ื›ื ืกื” ืœื ื ื•ื—ื” ื‘ืžื™ื•ื—ื“ ื ืงื•ื“ืชื™ืช (ื›ื›ื” ื–ื” ื ื•ืขื“, ื”ื›ืœ ื‘ืกื“ืจ), ืื‘ืœ ื ื™ืชื•ื— ื ืขื™ื ื•ืžืขืจื›ืช ืฉืœ ืคื•ื ืงืฆื™ื•ืช ื—ื–ืงื•ืช ืžืขื ื™ื™ื ื•ืช ืœืขื‘ื•ื“ื” ืขื ื ืชื•ื ื™ื. ื›ืŸ, ืืชื” ื™ื›ื•ืœ ืืคื™ืœื• ืœื™ืฆื•ืจ ืžืงื‘ืฅ - ืื‘ืœ ืืชื” ืžื‘ื™ืŸ ืฉืœืงื™ืขื” ื‘ืžืกืžืจื™ื ืขื ืžื™ืงืจื•ืกืงื•ืค ื–ื” ืœื ืœื’ืžืจื™ ื ื›ื•ืŸ ื•ื”ืชื—ืœื ื• ืœื—ืคืฉ ืคืชืจื•ื ื•ืช ืื—ืจื™ื.

ื‘ื™ืงื•ืฉ ืœืคื™ืชื•ืŸ ื•ืื ืœื™ืกื˜ื™ื

ืœื—ื‘ืจื” ืฉืœื ื• ื™ืฉ ืžืคืชื—ื™ื ืจื‘ื™ื ืฉื›ื•ืชื‘ื™ื ืงื•ื“ ื›ืžืขื˜ ื›ืœ ื™ื•ื ื‘ืžืฉืš 10-20 ืฉื ื” ื‘-PHP, JavaScript, C#, C/C++, Java, Go, Rust, Python, Bash. ื™ืฉื ื ื’ื ืžื ื”ืœื™ ืžืขืจื›ืช ืžื ื•ืกื™ื ืจื‘ื™ื ืฉื—ื•ื• ื™ื•ืชืจ ืžืืกื•ืŸ ืื—ื“ ืžื“ื”ื™ื ืœื—ืœื•ื˜ื™ืŸ ืฉืื™ื ื• ืžืชืื™ื ืœื—ื•ืงื™ ื”ืกื˜ื˜ื™ืกื˜ื™ืงื” (ืœื“ื•ื’ืžื”, ื›ืืฉืจ ืจื•ื‘ ื”ื“ื™ืกืงื™ื ื‘-raid-10 ื ื”ืจืกื™ื ืขืœ ื™ื“ื™ ืžื›ืช ื‘ืจืง ื—ื–ืงื”). ื‘ื ืกื™ื‘ื•ืช ื›ืืœื”, ื‘ืžืฉืš ื–ืžืŸ ืจื‘ ืœื ื”ื™ื” ื‘ืจื•ืจ ืžื” ื–ื” "ืื ืœื™ื˜ื™ืงืื™ ืคื™ืชื•ืŸ". Python ื”ื•ื ื›ืžื• PHP, ืจืง ื”ืฉื ืงืฆืช ื™ื•ืชืจ ืืจื•ืš ื•ื™ืฉ ืงืฆืช ืคื—ื•ืช ืขืงื‘ื•ืช ืฉืœ ื—ื•ืžืจื™ื ืžืฉื ื™ื ืชื•ื“ืขื” ื‘ืงื•ื“ ื”ืžืงื•ืจ ืฉืœ ื”ืžืชื•ืจื’ืžืŸ. ืขื ื–ืืช, ื›ื›ืœ ืฉื ื•ืฆืจื• ื™ื•ืชืจ ื•ื™ื•ืชืจ ื“ื•ื—ื•ืช ืื ืœื™ื˜ื™ื™ื, ืžืคืชื—ื™ื ืžื ื•ืกื™ื ื”ื—ืœื• ืœื”ื‘ื™ืŸ ื™ื•ืชืจ ื•ื™ื•ืชืจ ืืช ื”ื—ืฉื™ื‘ื•ืช ืฉืœ ื”ืชืžื—ื•ืช ืฆืจื” ื‘ื›ืœื™ื ื›ืžื• numpy, pandas, matplotlib, seaborn.
ืืช ื”ืชืคืงื™ื“ ื”ืžื›ืจื™ืข, ื›ื›ืœ ื”ื ืจืื”, ืžื™ืœืื” ื”ืชืขืœืคื•ืช ืคืชืื•ืžื™ืช ืฉืœ ืขื•ื‘ื“ื™ื ืžืฉื™ืœื•ื‘ ื”ืžื™ืœื™ื "ืจื’ืจืกื™ื” ืœื•ื’ื™ืกื˜ื™ืช" ื•ื”ืคื’ื ืช ื“ื™ื•ื•ื— ื™ืขื™ืœ ืขืœ ื ืชื•ื ื™ื ื’ื“ื•ืœื™ื ื‘ืืžืฆืขื•ืช, ื›ืŸ, ื›ืŸ, pyspark.

Apache Spark, ื”ืคืจื“ื™ื’ืžื” ื”ืคื•ื ืงืฆื™ื•ื ืœื™ืช ืฉืœื” ืžืชืื™ืžื” ืืœื’ื‘ืจื” ืจืœืฆื™ื•ื ื™ืช ื‘ืฆื•ืจื” ืžื•ืฉืœืžืช, ื•ื”ื™ื›ื•ืœื•ืช ืฉืœื” ืขืฉื• ืจื•ืฉื ืขืœ ืžืคืชื—ื™ื ื”ืžื•ืจื’ืœื™ื ื‘-MySQL ืฉื”ืฆื•ืจืš ืœื—ื–ืง ืืช ื”ืฉื•ืจื•ืช ืขื ืื ืœื™ืกื˜ื™ื ืžื ื•ืกื™ื ื”ืชื‘ืจืจ ื›ืฉืžืฉ.

ื ื™ืกื™ื•ื ื•ืช ื ื•ืกืคื™ื ืฉืœ Apache Spark/Hadoop ืœื”ืžืจื™ื ื•ืžื” ืœื ืžืžืฉ ื”ืœืš ืœืคื™ ื”ืชืกืจื™ื˜

ืขื ื–ืืช, ืžื”ืจ ืžืื•ื“ ื”ืชื‘ืจืจ ืฉืžืฉื”ื• ืžืขืจื›ืชื™ืช ืœื ืžืžืฉ ืชืงื™ืŸ ืขื Spark, ืื• ืฉืคืฉื•ื˜ ื”ื™ื” ืฆื•ืจืš ืœืฉื˜ื•ืฃ ืืช ื”ื™ื“ื™ื™ื ื˜ื•ื‘ ื™ื•ืชืจ. ืื ืขืจื™ืžืช Hadoop/MapReduce/Lucene ื ื•ืฆืจื” ืขืœ ื™ื“ื™ ืžืชื›ื ืชื™ื ืžื ื•ืกื™ื ืœืžื“ื™, ื•ื–ื” ื‘ืจื•ืจ ืื ืžืกืชื›ืœื™ื ืžืงืจื•ื‘ ืขืœ ืงื•ื“ ื”ืžืงื•ืจ ื‘-Java ืื• ื‘ืจืขื™ื•ื ื•ืช ืฉืœ Doug Cutting ื‘- Lucene, ืื– Spark, ืคืชืื•ื, ื ื›ืชื‘ ื‘ืฉืคื” ื”ืืงื–ื•ื˜ื™ืช Scala, ืฉื”ื™ื ืžืื•ื“ ืฉื ื•ื™ ื‘ืžื—ืœื•ืงืช ืžื ืงื•ื“ืช ืžื‘ื˜ ืฉืœ ืคืจืงื˜ื™ื•ืช ื•ื›ืจื’ืข ืื™ื ื• ืžืชืคืชื—. ื•ื”ื™ืจื™ื“ื” ื”ืงื‘ื•ืขื” ื‘ื—ื™ืฉื•ื‘ื™ื ื‘ืืฉื›ื•ืœ Spark ืขืงื‘ ืขื‘ื•ื“ื” ืœื ื”ื’ื™ื•ื ื™ืช ื•ืœื ืžืื•ื“ ืฉืงื•ืคื” ืขื ื”ืงืฆืืช ื–ื™ื›ืจื•ืŸ ืœืฆืžืฆื•ื ืคืขื•ืœื•ืช (ืžืคืชื—ื•ืช ืจื‘ื™ื ืžื’ื™ืขื™ื ื‘ื‘ืช ืื—ืช) ื™ืฆืจื” ืกื‘ื™ื‘ื• ื”ื™ืœื” ืฉืœ ืžืฉื”ื• ืฉื™ืฉ ืœื• ืžืงื•ื ืœื’ื“ื•ืœ. ื‘ื ื•ืกืฃ, ื”ืžืฆื‘ ื”ื•ื—ืžืจ ืขืœ ื™ื“ื™ ืžืกืคืจ ืจื‘ ืฉืœ ื™ืฆื™ืื•ืช ืคืชื•ื—ื•ืช ืžื•ื–ืจื•ืช, ืงื‘ืฆื™ื ื–ืžื ื™ื™ื ืฉื’ื“ืœื• ื‘ืžืงื•ืžื•ืช ื”ื›ื™ ืœื ืžื•ื‘ื ื™ื ื•ืชืœื•ืช ื‘ื’ื™ื”ื ื•ื ืฉืœ ืฆื ืฆื ืช - ืžื” ืฉื’ืจื ืœืžื ื”ืœื™ ืžืขืจื›ืช ืœืชื—ื•ืฉื” ืื—ืช ืฉื”ื™ื™ืชื” ืžื•ื›ืจืช ื”ื™ื˜ื‘ ืžื™ืœื“ื•ืช: ืฉื ืื” ืขื–ื” (ืื• ืื•ืœื™ ื”ื ื”ื™ื• ืฆืจื™ื›ื™ื ืœืฉื˜ื•ืฃ ื™ื“ื™ื™ื ืขื ืกื‘ื•ืŸ).

ื›ืชื•ืฆืื” ืžื›ืš, "ืฉืจื“ื ื•" ืžืกืคืจ ืคืจื•ื™ืงื˜ื™ื ืื ืœื™ื˜ื™ื™ื ืคื ื™ืžื™ื™ื ื”ืžืฉืชืžืฉื™ื ื‘ืื•ืคืŸ ืคืขื™ืœ ื‘- Apache Spark (ื›ื•ืœืœ Spark Streaming, Spark SQL) ื•ื‘ืžืขืจื›ืช ื”ืืงื•ืœื•ื’ื™ืช ืฉืœ Hadoop (ื•ื›ืŸ ื”ืœืื” ื•ื›ืŸ ื”ืœืื”). ืœืžืจื•ืช ื”ืขื•ื‘ื“ื” ืฉืขื ื”ื–ืžืŸ ืœืžื“ื ื• ืœื”ืชื›ื•ื ืŸ ื•ืœื ื˜ืจ ืืช "ื–ื”" ื“ื™ ื˜ื•ื‘, ื•"ื–ื”" ื›ืžืขื˜ ื”ืคืกื™ืง ืœืงืจื•ืก ืคืชืื•ื ื‘ื’ืœืœ ืฉื™ื ื•ื™ื™ื ื‘ืื•ืคื™ ื”ื ืชื•ื ื™ื ื•ื—ื•ืกืจ ื”ืื™ื–ื•ืŸ ืฉืœ ื’ื™ื‘ื•ื‘ RDD ืื—ื™ื“, ื”ืจืฆื•ืŸ ืœืงื—ืช ืžืฉื”ื• ื›ื‘ืจ ืžื•ื›ืŸ , ืขื•ื“ื›ืŸ ื•ืžื ื•ื”ืœ ืื™ืคืฉื”ื• ื‘ืขื ืŸ ื”ืชื—ื–ืง ื™ื•ืชืจ ื•ื™ื•ืชืจ. ื‘ืชืงื•ืคื” ื–ื• ื ื™ืกื™ื ื• ืœื”ืฉืชืžืฉ ื‘ืžื›ืœื•ืœ ื”ืขื ืŸ ื”ืžื•ื›ืŸ ืฉืœ ืฉื™ืจื•ืชื™ ื”ืื™ื ื˜ืจื ื˜ ืฉืœ ืืžื–ื•ืŸ - EMR ื•ืœืื—ืจ ืžื›ืŸ ื ื™ืกื• ืœืคืชื•ืจ ื‘ืขื™ื•ืช ื‘ืืžืฆืขื•ืชื•. EMR ื”ื•ื Apache Spark ืฉื”ื•ื›ืŸ ืขืœ ื™ื“ื™ ืืžื–ื•ืŸ ืขื ืชื•ื›ื ื” ื ื•ืกืคืช ืžื”ืžืขืจื›ืช ื”ืืงื•ืœื•ื’ื™ืช, ื‘ื“ื•ืžื” ืœื‘ื ื™ื™ืช Cloudera/Hortonworks.

ืื—ืกื•ืŸ ืงื‘ืฆื™ ื’ื•ืžื™ ืœื ื™ืชื•ื— ื”ื•ื ืฆื•ืจืš ื“ื—ื•ืฃ

ื”ื—ื•ื•ื™ื” ืฉืœ "ื‘ื™ืฉื•ืœ" Hadoop/Spark ืขื ื›ื•ื•ื™ื•ืช ื‘ื—ืœืงื™ื ืฉื•ื ื™ื ื‘ื’ื•ืฃ ืœื ื”ื™ื™ืชื” ืœืฉื•ื•ื. ื”ืฆื•ืจืš ืœื™ืฆื•ืจ ืื—ืกื•ืŸ ืงื‘ืฆื™ื ื™ื—ื™ื“, ื–ื•ืœ ื•ืืžื™ืŸ ืฉื™ื”ื™ื” ืขืžื™ื“ ื‘ืคื ื™ ืชืงืœื•ืช ื—ื•ืžืจื” ื•ื‘ื• ื ื™ืชืŸ ื™ื”ื™ื” ืœืื—ืกืŸ ืงื‘ืฆื™ื ื‘ืคื•ืจืžื˜ื™ื ืฉื•ื ื™ื ืžืžืขืจื›ื•ืช ืฉื•ื ื•ืช ื•ืœืขืฉื•ืช ื“ื’ื™ืžื•ืช ื™ืขื™ืœื•ืช ื•ื—ืกื›ื•ื ื™ื•ืช ื‘ื–ืžืŸ ืœื“ื™ื•ื•ื—ื™ื ืžื ืชื•ื ื™ื ืืœื•. ื‘ืจื•ืจ.

ืจืฆื™ืชื™ ื’ื ืฉืขื“ื›ื•ืŸ ื”ืชื•ื›ื ื” ืฉืœ ื”ืคืœื˜ืคื•ืจืžื” ื”ื–ื• ืœื ื™ื”ืคื•ืš ืœืกื™ื•ื˜ ืฉืœ ื”ืฉื ื” ื”ื—ื“ืฉื” ืขื ืงืจื™ืืช ืขืงื‘ื•ืช Java ืฉืœ 20 ืขืžื•ื“ื™ื ื•ื ื™ืชื•ื— ื™ื•ืžื ื™ื ืžืคื•ืจื˜ื™ื ื‘ืื•ืจืš ืงื™ืœื•ืžื˜ืจื™ื ืฉืœ ื”ืืฉื›ื•ืœ ื‘ืืžืฆืขื•ืช Spark History Server ื•ื–ื›ื•ื›ื™ืช ืžื’ื“ืœืช ืขื ืชืื•ืจื” ืื—ื•ืจื™ืช. ืจืฆื™ืชื™ ืฉื™ื”ื™ื” ืœื™ ื›ืœื™ ืคืฉื•ื˜ ื•ืฉืงื•ืฃ ืฉืœื ืžืฆืจื™ืš ืฆืœื™ืœื” ืงื‘ื•ืขื” ืžืชื—ืช ืœืžื›ืกื” ื”ืžื ื•ืข ืื ื‘ืงืฉืช ื”- MapReduce ื”ืกื˜ื ื“ืจื˜ื™ืช ืฉืœ ื”ืžืคืชื— ืชืคืกื™ืง ืœืคืขื•ืœ ื›ืืฉืจ ืขื•ื‘ื“ ื”-reduce data ื ืคืœ ืžื”ื–ื™ื›ืจื•ืŸ ื‘ื’ืœืœ ืืœื’ื•ืจื™ืชื ื—ืœื•ืงืช ื ืชื•ื ื™ ืžืงื•ืจ ืฉืœื ื ื‘ื—ืจ ื”ื™ื˜ื‘.

ื”ืื ืืžื–ื•ืŸ S3 ืžื•ืขืžื“ืช ืœ-DataLake?

ื”ื ื™ืกื™ื•ืŸ ืขื Hadoop/MapReduce ืœื™ืžื“ ืื•ืชื ื• ืฉืื ื• ื–ืงื•ืงื™ื ืœืžืขืจื›ืช ืงื‘ืฆื™ื ื ื™ืชื ืช ืœื”ืจื—ื‘ื” ื•ืืžื™ื ื” ื•ืžืขืœื™ื” ืขื•ื‘ื“ื™ื ื ื™ืชื ื™ื ืœื”ืจื—ื‘ื”, ืฉ"ืžืชืงืจื‘ื™ื" ืœื ืชื•ื ื™ื ื›ื“ื™ ืœื ืœื”ื ื™ืข ืืช ื”ื ืชื•ื ื™ื ื‘ืจืฉืช. ืขื•ื‘ื“ื™ื ืฆืจื™ื›ื™ื ืœื”ื™ื•ืช ืžืกื•ื’ืœื™ื ืœืงืจื•ื ื ืชื•ื ื™ื ื‘ืคื•ืจืžื˜ื™ื ืฉื•ื ื™ื, ืืš ืจืฆื•ื™ ืœื ืœืงืจื•ื ืžื™ื“ืข ืžื™ื•ืชืจ ื•ืœื”ื™ื•ืช ืžืกื•ื’ืœื™ื ืœืื—ืกืŸ ื ืชื•ื ื™ื ืžืจืืฉ ื‘ืคื•ืจืžื˜ื™ื ื ื•ื—ื™ื ืœืขื•ื‘ื“ื™ื.

ืฉื•ื‘, ื”ืจืขื™ื•ืŸ ื”ื‘ืกื™ืกื™. ืื™ืŸ ืฉื•ื ืจืฆื•ืŸ "ืœืฉืคื•ืš" ื‘ื™ื’ ื“ืื˜ื” ืœืชื•ืš ืžื ื•ืข ืื ืœื™ื˜ื™ ืžืงื‘ืฅ ืื—ื“, ืฉื™ื™ื—ื ืง ื‘ืžื•ืงื“ื ืื• ื‘ืžืื•ื—ืจ ื•ืชืฆื˜ืจืš ืœื’ื–ื•ืจ ืื•ืชื• ื‘ืฆื•ืจื” ืžื›ื•ืขืจืช. ืื ื™ ืจื•ืฆื” ืœืื—ืกืŸ ืงื‘ืฆื™ื, ืจืง ืงื‘ืฆื™ื, ื‘ืคื•ืจืžื˜ ืžื•ื‘ืŸ ื•ืœื‘ืฆืข ืขืœื™ื”ื ืฉืื™ืœืชื•ืช ืื ืœื™ื˜ื™ื•ืช ื™ืขื™ืœื•ืช ื‘ืืžืฆืขื•ืช ื›ืœื™ื ืฉื•ื ื™ื ืืš ืžื•ื‘ื ื™ื. ื•ื™ื”ื™ื• ืขื•ื“ ื•ืขื•ื“ ืงื‘ืฆื™ื ื‘ืคื•ืจืžื˜ื™ื ืฉื•ื ื™ื. ื•ืขื“ื™ืฃ ืœื’ืจื•ืก ืœื ืืช ื”ืžื ื•ืข, ืืœื ืืช ื ืชื•ื ื™ ื”ืžืงื•ืจ. ืื ื—ื ื• ืฆืจื™ื›ื™ื DataLake ื ื™ืชืŸ ืœื”ืจื—ื‘ื” ื•ืื•ื ื™ื‘ืจืกืœื™, ื”ื—ืœื˜ื ื•...

ืžื” ืื ืชืื—ืกื ื• ืงื‘ืฆื™ื ื‘ืื—ืกื•ืŸ ื”ืขื ืŸ ื”ื ื™ืชืŸ ืœื”ืจื—ื‘ื” ื”ืžื•ื›ืจ ื•ื”ื™ื“ื•ืข Amazon S3, ืžื‘ืœื™ ืฉืชืฆื˜ืจื›ื• ืœื”ื›ื™ืŸ ืฆืœืขื•ืช ืžืฉืœื›ื ืž-Hadoop?

ื‘ืจื•ืจ ืฉื”ื ืชื•ื ื™ื ื”ืื™ืฉื™ื™ื "ื ืžื•ื›ื™ื", ืื‘ืœ ืžื” ืœื’ื‘ื™ ื ืชื•ื ื™ื ืื—ืจื™ื ืื ืื ื—ื ื• ืžื•ืฆื™ืื™ื ืื•ืชื ืœืฉื ื•"ืžื ื™ืขื™ื ืื•ืชื ื‘ื™ืขื™ืœื•ืช"?

ืžืขืจื›ืช ืืงื•ืœื•ื’ื™ืช ืฉืœ Cluster-bigdata-analytics ืฉืœ Amazon Web Services - ื‘ืžื™ืœื™ื ืคืฉื•ื˜ื•ืช ืžืื•ื“

ืื ืœืฉืคื•ื˜ ืขืœ ืคื™ ื”ื ื™ืกื™ื•ืŸ ืฉืœื ื• ืขื AWS, Apache Hadoop/MapReduce ื ืžืฆื ื‘ืฉื™ืžื•ืฉ ืคืขื™ืœ ืฉื ื›ื‘ืจ ื–ืžืŸ ืจื‘ ืชื—ืช ืจื˜ื‘ื™ื ืฉื•ื ื™ื, ืœืžืฉืœ ื‘ืฉื™ืจื•ืช DataPipeline (ืื ื™ ืžืงื ื ื‘ืขืžื™ืชื™ื™, ื”ื ืœืžื“ื• ืœื”ื›ื™ืŸ ืื•ืชื• ื ื›ื•ืŸ). ื›ืืŸ ื”ื’ื“ืจื ื• ื’ื™ื‘ื•ื™ื™ื ืžืฉื™ืจื•ืชื™ื ืฉื•ื ื™ื ืžื˜ื‘ืœืื•ืช DynamoDB:
ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื•ื”ื ืคื•ืขืœื™ื ื‘ืื•ืคืŸ ืงื‘ื•ืข ืขืœ ืืฉื›ื•ืœื•ืช Hadoop/MapReduce ืžืฉื•ื‘ืฆื™ื ื›ืžื• ืฉืขื•ืŸ ื›ื‘ืจ ื›ืžื” ืฉื ื™ื. "ืงื‘ืข ืืช ื–ื” ื•ืฉื›ื— ืžื–ื”":

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ืืชื” ื™ื›ื•ืœ ื’ื ืœืขืกื•ืง ื‘ื™ืขื™ืœื•ืช ื‘ืฉื˜ื ื™ื–ื ื ืชื•ื ื™ื ืขืœ ื™ื“ื™ ื”ื’ื“ืจืช ืžื—ืฉื‘ื™ื ื ื™ื™ื“ื™ื ืฉืœ Jupiter ื‘ืขื ืŸ ืขื‘ื•ืจ ืื ืœื™ืกื˜ื™ื ื•ืฉื™ืžื•ืฉ ื‘ืฉื™ืจื•ืช AWS SageMaker ื›ื“ื™ ืœืืžืŸ ื•ืœืคืจื•ืก ืžื•ื“ืœื™ื ืฉืœ AI ืœืงืจื‘. ื›ืš ื–ื” ื ืจืื” ืืฆืœื ื•:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื•ื›ืŸ, ืืชื” ื™ื›ื•ืœ ืœื”ืจื™ื ืžื—ืฉื‘ ื ื™ื™ื“ ืœืขืฆืžืš ืื• ืœืื ืœื™ืกื˜ ื‘ืขื ืŸ ื•ืœืฆืจืฃ ืื•ืชื• ืœืืฉื›ื•ืœ Hadoop/Spark, ืœืขืฉื•ืช ืืช ื”ื—ื™ืฉื•ื‘ื™ื ื•ืื– ืœืžืกืžืจ ื”ื›ืœ:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ืžืžืฉ ื ื•ื— ืœืคืจื•ื™ืงื˜ื™ื ืื ืœื™ื˜ื™ื™ื ื‘ื•ื“ื“ื™ื ื•ืœื—ืœืงื ื”ืฉืชืžืฉื ื• ื‘ื”ืฆืœื—ื” ื‘ืฉื™ืจื•ืช EMR ืœื—ื™ืฉื•ื‘ื™ื ื•ื ื™ืชื•ื—ื™ื ื‘ืงื ื” ืžื™ื“ื” ื’ื“ื•ืœ. ืžื” ืœื’ื‘ื™ ืคืชืจื•ืŸ ืžืขืจื›ืช ืขื‘ื•ืจ DataLake, ื”ืื ื–ื” ื™ืขื‘ื•ื“? ื‘ืจื’ืข ื–ื” ื”ื™ื™ื ื• ืขืœ ืกืฃ ืชืงื•ื•ื” ื•ื™ื™ืื•ืฉ ื•ื”ืžืฉื›ื ื• ื‘ื—ื™ืคื•ืฉื™ื.

ื“ื‘ืง AWS - ืืจื•ื– ื‘ืงืคื™ื“ื” ืืคืืฆ'ื™ ืกืคืืจืง ืขืœ ืกื˜ืจื•ืื™ื“ื™ื

ื”ืชื‘ืจืจ ืฉืœ-AWS ื™ืฉ ื’ืจืกื” ืžืฉืœื” ืฉืœ ืขืจื™ืžืช "ื›ื•ื•ืจืช/ื—ื–ื™ืจ/ื ื™ืฆื•ืฅ". ืชืคืงื™ื“ื” ืฉืœ ื›ื•ื•ืจืช, ื›ืœื•ืžืจ. ืงื˜ืœื•ื’ ื”ืงื‘ืฆื™ื ื•ืกื•ื’ื™ื”ื ื‘-DataLake ืžื‘ื•ืฆืข ืขืœ ื™ื“ื™ ืฉื™ืจื•ืช "ืงื˜ืœื•ื’ ื ืชื•ื ื™ื", ืฉืื™ื ื• ืžืกืชื™ืจ ืืช ืชืื™ืžื•ืชื• ืœืคื•ืจืžื˜ Apache Hive. ืขืœื™ืš ืœื”ื•ืกื™ืฃ ืžื™ื“ืข ืœืฉื™ืจื•ืช ื–ื” ืœื’ื‘ื™ ื”ื™ื›ืŸ ืžืžื•ืงืžื™ื ื”ืงื‘ืฆื™ื ืฉืœืš ื•ื‘ืื™ื–ื” ืคื•ืจืžื˜ ื”ื ื ืžืฆืื™ื. ื”ื ืชื•ื ื™ื ื™ื›ื•ืœื™ื ืœื”ื™ื•ืช ืœื ืจืง ื‘-s3, ืืœื ื’ื ื‘ืžืกื“ ื”ื ืชื•ื ื™ื, ืื‘ืœ ื–ื” ืœื ื”ื ื•ืฉื ืฉืœ ื”ืคื•ืกื˜ ื”ื–ื”. ื›ืš ืžืื•ืจื’ื ืช ืกืคืจื™ื™ืช ื”ื ืชื•ื ื™ื ืฉืœ DataLake ืฉืœื ื•:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื”ืงื‘ืฆื™ื ืจืฉื•ืžื™ื, ืžืขื•ืœื”. ืื ื”ืงื‘ืฆื™ื ืขื•ื“ื›ื ื•, ืื ื• ืžืฉื™ืงื™ื ืกื•ืจืงื™ื ื‘ืื•ืคืŸ ื™ื“ื ื™ ืื• ืขืœ ืคื™ ืœื•ื— ื–ืžื ื™ื, ืฉื™ืขื“ื›ื ื• ืžื™ื“ืข ืขืœื™ื”ื ืžื”ืื’ื ื•ื™ืฉืžืจื• ืื•ืชื. ืœืื—ืจ ืžื›ืŸ ื ื™ืชืŸ ืœืขื‘ื“ ืืช ื”ื ืชื•ื ื™ื ืžื”ืื’ื ื•ืœื”ืขืœื•ืช ืืช ื”ืชื•ืฆืื•ืช ืœืžืงื•ื ื›ืœืฉื”ื•. ื‘ืžืงืจื” ื”ืคืฉื•ื˜ ื‘ื™ื•ืชืจ, ืื ื• ืžืขืœื™ื ื’ื ืœ-s3. ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื™ื›ื•ืœ ืœื”ื™ืขืฉื•ืช ื‘ื›ืœ ืžืงื•ื, ืืš ืžื•ืžืœืฅ ืœื”ื’ื“ื™ืจ ืืช ื”ืขื™ื‘ื•ื“ ื‘ืืฉื›ื•ืœ Apache Spark ื‘ืืžืฆืขื•ืช ื™ื›ื•ืœื•ืช ืžืชืงื“ืžื•ืช ื“ืจืš AWS Glue API. ืœืžืขืฉื”, ืืชื” ื™ื›ื•ืœ ืœืงื—ืช ืืช ืงื•ื“ ื”ืคื™ืชื•ืŸ ื”ื™ืฉืŸ ื•ื”ื˜ื•ื‘ ื•ื”ืžื•ื›ืจ ื‘ืืžืฆืขื•ืช ืกืคืจื™ื™ืช pyspark ื•ืœื”ื’ื“ื™ืจ ืืช ื”ื‘ื™ืฆื•ืข ืฉืœื• ืขืœ N ืฆืžืชื™ื ืฉืœ ืืฉื›ื•ืœ ื‘ืขืœ ืงื™ื‘ื•ืœืช ื›ืœืฉื”ื™ ืขื ื ื™ื˜ื•ืจ, ืžื‘ืœื™ ืœื—ืคื•ืจ ื‘ืžืขื™ื™ื ืฉืœ Hadoop ื•ืœื’ืจื•ืจ ืงื•ื ื˜ื™ื™ื ืจื™ื ืฉืœ docker-smoker ื•ืœื‘ื˜ืœ ื”ืชื ื’ืฉื•ื™ื•ืช ืชืœื•ืช .

ืฉื•ื‘, ืจืขื™ื•ืŸ ืคืฉื•ื˜. ืื™ืŸ ืฆื•ืจืš ืœื”ื’ื“ื™ืจ ืืช Apache Spark, ืืชื” ืจืง ืฆืจื™ืš ืœื›ืชื•ื‘ ืงื•ื“ python ืขื‘ื•ืจ pyspark, ืœื‘ื“ื•ืง ืื•ืชื• ื‘ืื•ืคืŸ ืžืงื•ืžื™ ืขืœ ืฉื•ืœื—ืŸ ื”ืขื‘ื•ื“ื” ืฉืœืš ื•ืื– ืœื”ืคืขื™ืœ ืื•ืชื• ืขืœ ืืฉื›ื•ืœ ื’ื“ื•ืœ ื‘ืขื ืŸ, ืชื•ืš ืฆื™ื•ืŸ ื”ื™ื›ืŸ ื ืžืฆืื™ื ื ืชื•ื ื™ ื”ืžืงื•ืจ ื•ื”ื™ื›ืŸ ืœืฉื™ื ืืช ื”ืชื•ืฆืื”. ืœืคืขืžื™ื ื–ื” ื”ื›ืจื—ื™ ื•ืฉื™ืžื•ืฉื™, ื•ื”ื ื” ืื™ืš ืžื’ื“ื™ืจื™ื ืืช ื–ื”:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ืœืคื™ื›ืš, ืื ืืชื” ืฆืจื™ืš ืœื—ืฉื‘ ืžืฉื”ื• ืขืœ ืืฉื›ื•ืœ Spark ื‘ืืžืฆืขื•ืช ื ืชื•ื ื™ื ื‘-s3, ืื ื• ื›ื•ืชื‘ื™ื ืงื•ื“ ื‘-python/pyspark, ื‘ื•ื“ืงื™ื ืื•ืชื•, ื•ื‘ื”ืฆืœื—ื” ืœืขื ืŸ.

ืžื” ืขื ื”ืชื–ืžื•ืจ? ืžื” ืื ื”ืžืฉื™ืžื” ืชื™ืคื•ืœ ื•ืชื™ืขืœื? ื›ืŸ, ืžื•ืฆืข ืœืขืฉื•ืช ืฆื™ื ื•ืจ ื™ืคื” ื‘ืกื’ื ื•ืŸ Apache Pig ื•ืืคื™ืœื• ื ื™ืกื™ื ื• ืื•ืชื, ืื‘ืœ ื‘ื™ื ืชื™ื™ื ื”ื—ืœื˜ื ื• ืœื”ืฉืชืžืฉ ื‘ืชื–ืžื•ืจ ื”ืžื•ืชืื ืื™ืฉื™ืช ืฉืœื ื• ื‘-PHP ื•-JavaScript (ืื ื™ ืžื‘ื™ืŸ, ื™ืฉ ื“ื™ืกื•ื ื ืก ืงื•ื’ื ื™ื˜ื™ื‘ื™, ืื‘ืœ ื–ื” ืขื•ื‘ื“, ืขื‘ื•ืจ ืฉื ื™ื ื•ืœืœื ืฉื’ื™ืื•ืช).

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื”ืคื•ืจืžื˜ ืฉืœ ื”ืงื‘ืฆื™ื ื”ืžืื•ื—ืกื ื™ื ื‘ืื’ื ื”ื•ื ื”ืžืคืชื— ืœื‘ื™ืฆื•ืขื™ื

ื–ื” ืžืื•ื“ ืžืื•ื“ ื—ืฉื•ื‘ ืœื”ื‘ื™ืŸ ืขื•ื“ ืฉืชื™ ื ืงื•ื“ื•ืช ืžืคืชื—. ืขืœ ืžื ืช ืฉืฉืื™ืœืชื•ืช ืขืœ ื ืชื•ื ื™ ืงื‘ืฆื™ื ื‘ืื’ื ื™ืชื‘ืฆืขื• ื‘ืžื”ื™ืจื•ืช ื”ืืคืฉืจื™ืช ื•ื”ื‘ื™ืฆื•ืขื™ื ืœื ื™ื“ืจื“ืจื• ื›ืืฉืจ ืžื™ื“ืข ื—ื“ืฉ ื ื•ืกืฃ, ืขืœื™ืš:

  • ืื—ืกืŸ ืขืžื•ื“ื•ืช ืฉืœ ืงื‘ืฆื™ื ื‘ื ืคืจื“ (ื›ื“ื™ ืฉืœื ืชืฆื˜ืจืš ืœืงืจื•ื ืืช ื›ืœ ื”ืฉื•ืจื•ืช ื›ื“ื™ ืœื”ื‘ื™ืŸ ืžื” ื™ืฉ ื‘ืขืžื•ื“ื•ืช). ื‘ืฉื‘ื™ืœ ื–ื” ืœืงื—ื ื• ืืช ืคื•ืจืžื˜ ื”ืคืจืงื˜ ืขื ื“ื—ื™ืกื”
  • ื—ืฉื•ื‘ ืžืื•ื“ ืœื’ื–ื•ืจ ืงื‘ืฆื™ื ืœืชื™ืงื™ื•ืช ื›ืžื•: ืฉืคื”, ืฉื ื”, ื—ื•ื“ืฉ, ื™ื•ื, ืฉื‘ื•ืข. ืžื ื•ืขื™ื ืฉืžื‘ื™ื ื™ื ื‘ืกื•ื’ ื–ื” ืฉืœ ืจื™ืกื•ืง ื™ืกืชื›ืœื• ืจืง ืขืœ ื”ืชื™ืงื™ื•ืช ื”ื“ืจื•ืฉื•ืช, ืžื‘ืœื™ ืœื ืคื•ืช ืืช ื›ืœ ื”ื ืชื•ื ื™ื ื‘ืจืฆืฃ.

ื‘ืขื™ืงืจื• ืฉืœ ื“ื‘ืจ, ื‘ื“ืจืš ื–ื•, ืืชื” ืคื•ืจืก ืืช ื ืชื•ื ื™ ื”ืžืงื•ืจ ื‘ืฆื•ืจื” ื”ื™ืขื™ืœื” ื‘ื™ื•ืชืจ ืขื‘ื•ืจ ื”ืžื ื•ืขื™ื ื”ืื ืœื™ื˜ื™ื™ื ื”ืชืœื•ื™ื™ื ืœืžืขืœื”, ืฉืืคื™ืœื• ื‘ืชื™ืงื™ื•ืช ืžืจื•ืกืงื•ืช ื™ื›ื•ืœื™ื ืœื”ื™ื›ื ืก ื•ืœืงืจื•ื ืจืง ืืช ื”ืขืžื•ื“ื•ืช ื”ื ื—ื•ืฆื•ืช ืžืงื‘ืฆื™ื ื‘ืื•ืคืŸ ืกืœืงื˜ื™ื‘ื™. ืืชื” ืœื ืฆืจื™ืš "ืœืžืœื" ืืช ื”ื ืชื•ื ื™ื ื‘ืฉื•ื ืžืงื•ื (ื”ืื—ืกื•ืŸ ืคืฉื•ื˜ ื™ืชืคื•ืฆืฅ) - ืคืฉื•ื˜ ื”ื›ื ืก ืื•ืชื ืžื™ื“ ื‘ื—ื•ื›ืžื” ืœืžืขืจื›ืช ื”ืงื‘ืฆื™ื ื‘ืคื•ืจืžื˜ ื”ื ื›ื•ืŸ. ื›ืžื•ื‘ืŸ, ื›ืืŸ ืฆืจื™ืš ืœื”ื™ื•ืช ื‘ืจื•ืจ ืฉืื—ืกื•ืŸ ืงื•ื‘ืฅ csv ืขื ืง ื‘-DataLake, ืฉืงื•ื“ื ื›ืœ ืฆืจื™ืš ืœืงืจื•ื ืื•ืชื• ืฉื•ืจื” ืื—ืจ ืฉื•ืจื” ืขืœ ื™ื“ื™ ื”ืืฉื›ื•ืœ ื›ื“ื™ ืœื—ืœืฅ ืืช ื”ืขืžื•ื“ื•ืช, ืœื ืžืื•ื“ ืจืฆื•ื™. ืชื—ืฉื•ื‘ ืฉื•ื‘ ืขืœ ืฉืชื™ ื”ื ืงื•ื“ื•ืช ืœืขื™ืœ ืื ืขื“ื™ื™ืŸ ืœื ื‘ืจื•ืจ ืœืžื” ื›ืœ ื–ื” ืงื•ืจื”.

AWS Athena - ื”ื’'ืง-in-the-box

ื•ืื–, ืชื•ืš ื›ื“ื™ ื™ืฆื™ืจืช ืื’ื, ืื™ื›ืฉื”ื• ื ืชืงืœื ื• ื‘ื˜ืขื•ืช ื‘ืืžื–ื•ื ืก ืืชื ื”. ืคืชืื•ื ื”ืชื‘ืจืจ ืฉื‘ืืžืฆืขื•ืช ืกื™ื“ื•ืจ ืงืคื“ื ื™ ืฉืœ ืงื‘ืฆื™ ื”ื™ื•ืžืŸ ื”ืขื ืงื™ื™ื ืฉืœื ื• ืœืจืกื™ืกื™ ืชื™ืงื™ื•ืช ื‘ืคื•ืจืžื˜ ื”ืขืžื•ื“ื•ืช ื”ื ื›ื•ืŸ (ืคืจืงื˜), ืืชื” ื™ื›ื•ืœ ืžื”ืจ ืžืื•ื“ ืœื‘ืฆืข ืžื”ื ื‘ื—ื™ืจื•ืช ืื™ื ืคื•ืจืžื˜ื™ื‘ื™ื•ืช ื‘ื™ื•ืชืจ ื•ืœื‘ื ื•ืช ื“ื•ื—ื•ืช ื‘ืœื™, ื‘ืœื™ ืืฉื›ื•ืœ Apache Spark/Glue.

ืžื ื•ืข Athena ื”ืžื•ืคืขืœ ืขืœ ื™ื“ื™ ื ืชื•ื ื™ื ื‘-s3 ืžื‘ื•ืกืก ืขืœ ื”ืื’ื“ื™ ืคืจืกื˜ื• - ื ืฆื™ื’ ืžืžืฉืคื—ืช ื”ื’ื™ืฉื•ืช MPP (ืขื™ื‘ื•ื“ ืžืงื‘ื™ืœื™ ืžืกื™ื‘ื™) ืœืขื™ื‘ื•ื“ ื ืชื•ื ื™ื, ืœื•ืงื— ื ืชื•ื ื™ื ื”ื™ื›ืŸ ืฉื”ื ื ืžืฆืื™ื, ืž-s3 ื•-Hadoop ื•ืขื“ ืœืงืกื ื“ืจื” ื•ืงื‘ืฆื™ ื˜ืงืกื˜ ืจื’ื™ืœื™ื. ืืชื” ืจืง ืฆืจื™ืš ืœื‘ืงืฉ ืžืืชื ื” ืœื‘ืฆืข ืฉืื™ืœืชืช SQL, ื•ืื– ื”ื›ืœ "ืขื•ื‘ื“ ื‘ืžื”ื™ืจื•ืช ื•ื‘ืื•ืคืŸ ืื•ื˜ื•ืžื˜ื™". ื—ืฉื•ื‘ ืœืฆื™ื™ืŸ ืฉืืชื ื” ื”ื™ื "ื—ื›ืžื”", ื”ื™ื ืขื•ื‘ืจืช ืจืง ืœืชื™ืงื™ื•ืช ื”ืžืจื•ืกืงื•ืช ื”ื“ืจื•ืฉื•ืช ื•ืงื•ืจืืช ืจืง ืืช ื”ืขืžื•ื“ื•ืช ื”ื ื—ื•ืฆื•ืช ื‘ื‘ืงืฉื”.

ื’ื ื”ืชืžื—ื•ืจ ืœื‘ืงืฉื•ืช ืœืืชื ื” ืžืขื ื™ื™ืŸ. ืื ื—ื ื• ืžืฉืœืžื™ื ืขื‘ื•ืจ ื ืคื— ื”ื ืชื•ื ื™ื ื”ืกืจื•ืงื™ื. ื”ึธื”ึตืŸ. ืœื ืขื‘ื•ืจ ืžืกืคืจ ื”ืžื›ื•ื ื•ืช ื‘ืืฉื›ื•ืœ ืœื“ืงื”, ืืœื... ืขื‘ื•ืจ ื”ื ืชื•ื ื™ื ืฉื ืกืจืงื• ื‘ืคื•ืขืœ ืขืœ 100-500 ืžื›ื•ื ื•ืช, ืจืง ื”ื ืชื•ื ื™ื ื”ื“ืจื•ืฉื™ื ืœื”ืฉืœืžืช ื”ื‘ืงืฉื”.

ื•ื‘ื‘ืงืฉืช ืจืง ืืช ื”ืขืžื•ื“ื•ืช ื”ื ื—ื•ืฆื•ืช ืžืชื™ืงื™ื•ืช ืฉื ืฉื‘ืจื• ื›ื”ืœื›ื”, ื”ืชื‘ืจืจ ืฉืฉื™ืจื•ืช ืืชื ื” ืขื•ืœื” ืœื ื• ืขืฉืจื•ืช ื“ื•ืœืจื™ื ื‘ื—ื•ื“ืฉ. ื•ื‘ื›ืŸ, ื ื”ื“ืจ, ื›ืžืขื˜ ื‘ื—ื™ื ื, ื‘ื”ืฉื•ื•ืื” ืœื ื™ืชื•ื— ืขืœ ืืฉื›ื•ืœื•ืช!

ืื’ื‘, ื›ืš ืื ื• ืžื—ืœืงื™ื ืืช ื”ื ืชื•ื ื™ื ืฉืœื ื• ื‘-s3:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ื›ืชื•ืฆืื” ืžื›ืš, ืชื•ืš ื–ืžืŸ ืงืฆืจ ื”ื—ืœื• ืžื—ืœืงื•ืช ืฉื•ื ื•ืช ืœื—ืœื•ื˜ื™ืŸ ื‘ื—ื‘ืจื”, ืžืื‘ื˜ื—ืช ืžื™ื“ืข ื•ื›ืœื” ื‘ืื ืœื™ื˜ื™ืงื”, ืœืฉืœื•ื— ื‘ืงืฉื•ืช ืœืืชื ื” ื‘ืื•ืคืŸ ืืงื˜ื™ื‘ื™ ื•ืžื”ืจ, ืชื•ืš ืฉื ื™ื•ืช, ืœืงื‘ืœ ืชืฉื•ื‘ื•ืช ืžื•ืขื™ืœื•ืช ืžื ืชื•ื ื™ื "ื’ื“ื•ืœื™ื" ืขืœ ืคื ื™ ืชืงื•ืคื•ืช ืืจื•ื›ื•ืช ืœืžื“ื™: ื—ื•ื“ืฉื™ื, ื—ืฆื™ ืฉื ื” ื•ื›ื•' ืค.

ืื‘ืœ ื”ืœื›ื ื• ืจื—ื•ืง ื™ื•ืชืจ ื•ื”ืชื—ืœื ื• ืœืœื›ืช ืœืขื ืŸ ื›ื“ื™ ืœืงื‘ืœ ืชืฉื•ื‘ื•ืช ื“ืจืš ืžื ื”ืœ ื”ืชืงืŸ ODBC: ืื ืœื™ืกื˜ ื›ื•ืชื‘ ืฉืื™ืœืชืช SQL ื‘ืงื•ื ืกื•ืœื” ืžื•ื›ืจืช, ืฉื‘-100-500 ืžื›ื•ื ื•ืช "ื‘ื’ืจื•ืฉื™ื" ืฉื•ืœื—ืช ื ืชื•ื ื™ื ืœ-s3 ื•ืžื—ื–ื™ืจื” ืชืฉื•ื‘ื” ื‘ื“ืจืš ื›ืœืœ ืชื•ืš ืžืกืคืจ ืฉื ื™ื•ืช. ื ื•ึนื—ึท. ื•ืžื”ืจ. ืื ื™ ืขื“ื™ื™ืŸ ืœื ืžืืžื™ืŸ.

ื›ืชื•ืฆืื” ืžื›ืš, ืœืื—ืจ ืฉื”ื—ืœื˜ื ื• ืœืื—ืกืŸ ื ืชื•ื ื™ื ื‘-s3, ื‘ืคื•ืจืžื˜ ื˜ื•ืจ ื™ืขื™ืœ ื•ืขื ืคื™ืฆื•ืœ ืกื‘ื™ืจ ืฉืœ ื ืชื•ื ื™ื ืœืชื™ืงื™ื•ืช... ืงื™ื‘ืœื ื• ืืช DataLake ื•ืžื ื•ืข ืื ืœื™ื˜ื™ ืžื”ื™ืจ ื•ื–ื•ืœ - ื‘ื—ื™ื ื. ื•ื”ื•ื ื”ืคืš ืคื•ืคื•ืœืจื™ ืžืื•ื“ ื‘ื—ื‘ืจื”, ื›ื™... ืžื‘ื™ืŸ SQL ื•ืขื•ื‘ื“ ื‘ืกื“ืจื™ ื’ื•ื“ืœ ืžื”ืจ ื™ื•ืชืจ ืžืืฉืจ ื“ืจืš ื”ืคืขืœื”/ืขืฆื™ืจื”/ื”ื’ื“ืจืช ืืฉื›ื•ืœื•ืช. "ื•ืื ื”ืชื•ืฆืื” ื–ื”ื”, ืœืžื” ืœืฉืœื ื™ื•ืชืจ?"

ื‘ืงืฉื” ืœืืชื ื” ื ืจืื™ืช ื‘ืขืจืš ื›ืš. ืื ืชืจืฆื”, ื›ืžื•ื‘ืŸ, ืืชื” ื™ื›ื•ืœ ืœื™ืฆื•ืจ ืžืกืคื™ืง ืฉืื™ืœืชืช SQL ืžื•ืจื›ื‘ืช ื•ืžืจื•ื‘ืช ืขืžื•ื“ื™ื, ืื‘ืœ ื ื’ื‘ื™ืœ ืืช ืขืฆืžื ื• ืœืงื™ื‘ื•ืฅ ืคืฉื•ื˜. ื‘ื•ืื• ื ืจืื” ืื™ืœื• ืงื•ื“ื™ ืชื’ื•ื‘ื” ื”ื™ื• ืœืœืงื•ื— ืœืคื ื™ ืžืกืคืจ ืฉื‘ื•ืขื•ืช ื‘ื™ื•ืžื ื™ ืฉืจืช ื”ืื™ื ื˜ืจื ื˜ ื•ื ื•ื•ื“ื ืฉืื™ืŸ ืฉื’ื™ืื•ืช:

ืื™ืš ืืจื’ื ื• DataLake ื™ืขื™ืœ ื•ื–ื•ืœ ื‘ืžื™ื•ื—ื“ ื•ืžื“ื•ืข ื–ื” ื›ืš

ืžืžืฆืื™ื

ืœืื—ืจ ืฉืขื‘ืจื ื•, ืฉืœื ืœื•ืžืจ, ื“ืจืš ืืจื•ื›ื”, ืื‘ืœ ื›ื•ืื‘ืช, ืชื•ืš ื”ืขืจื›ืช ื›ืœ ื”ื–ืžืŸ ื›ืจืื•ื™ ืืช ื”ืกื™ื›ื•ื ื™ื ื•ืืช ืจืžืช ื”ืžื•ืจื›ื‘ื•ืช ื•ื”ืขืœื•ืช ืฉืœ ื”ืชืžื™ื›ื”, ืžืฆืื ื• ืคืชืจื•ืŸ ืขื‘ื•ืจ DataLake ื•ืื ืœื™ื˜ื™ืงืก ืฉืœื ืžืคืกื™ืง ืœืจืฆื•ืช ืื•ืชื ื• ื’ื ื‘ืžื”ื™ืจื•ืช ื•ื’ื ื‘ืขืœื•ืช ื”ื‘ืขืœื•ืช.

ื”ืชื‘ืจืจ ืฉื‘ื ื™ื™ืช DataLake ื™ืขื™ืœ, ืžื”ื™ืจ ื•ื–ื•ืœ ืœืชืคืขื•ืœ ืœืฆืจื›ื™ื ืฉืœ ืžื—ืœืงื•ืช ืฉื•ื ื•ืช ืœื—ืœื•ื˜ื™ืŸ ื‘ื—ื‘ืจื” ื”ื™ื ืœื’ืžืจื™ ื‘ืžืกื’ืจืช ื”ื™ื›ื•ืœื•ืช ืฉืœ ื’ื ืžืคืชื—ื™ื ืžื ื•ืกื™ื ืฉืžืขื•ืœื ืœื ืขื‘ื“ื• ื›ืื“ืจื™ื›ืœื™ื ื•ืœื ื™ื•ื“ืขื™ื ืœืฉืจื˜ื˜ ืจื™ื‘ื•ืขื™ื ืขืœ ืจื™ื‘ื•ืขื™ื ืขื ื—ื™ืฆื™ื ื•ืžื›ื™ืจื™ื 50 ืžื•ื ื—ื™ื ืžื”ืืงื•ืกื™ืกื˜ื ืฉืœ Hadoop.

ื‘ืชื—ื™ืœืช ื”ื“ืจืš ืจืืฉื™ ื”ืชืคืฆืœ ืžื’ื ื™ ื”ื—ื™ื•ืช ื”ืคืจืื™ื™ื ื”ืจื‘ื™ื ืฉืœ ืชื•ื›ื ื•ืช ืคืชื•ื—ื•ืช ื•ืกื’ื•ืจื•ืช ื•ืžื”ื‘ื ืช ื ื˜ืœ ื”ืื—ืจื™ื•ืช ืขืœ ื”ืฆืืฆืื™ื. ืคืฉื•ื˜ ื”ืชื—ืœ ืœื‘ื ื•ืช ืืช ื”-DataLake ืฉืœืš โ€‹โ€‹ืžื›ืœื™ื ืคืฉื•ื˜ื™ื: nagios/munin -> elastic/kibana -> Hadoop/Spark/s3..., ืื™ืกื•ืฃ ืžืฉื•ื‘ ื•ื”ื‘ื ื” ืขืžื•ืงื” ืฉืœ ื”ืคื™ื–ื™ืงื” ืฉืœ ื”ืชื”ืœื™ื›ื™ื ื”ืžืชืจื—ืฉื™ื. ื›ืœ ื“ื‘ืจ ืžื•ืจื›ื‘ ื•ืขื›ื•ืจ - ืชื ื• ืื•ืชื• ืœืื•ื™ื‘ื™ื ื•ืœืžืชื—ืจื™ื.

ืื ืืชื” ืœื ืจื•ืฆื” ืœืœื›ืช ืœืขื ืŸ ื•ืื•ื”ื‘ ืœืชืžื•ืš, ืœืขื“ื›ืŸ ื•ืœืชืงืŸ ืคืจื•ื™ืงื˜ื™ื ืฉืœ ืงื•ื“ ืคืชื•ื—, ืืชื” ื™ื›ื•ืœ ืœื‘ื ื•ืช ืกื›ื™ืžื” ื“ื•ืžื” ืœืฉืœื ื• ื‘ืื•ืคืŸ ืžืงื•ืžื™, ืขืœ ืžื›ื•ื ื•ืช ืžืฉืจื“ื™ื•ืช ื–ื•ืœื•ืช ืขื Hadoop ื•-Presto ืœืžืขืœื”. ื”ืขื™ืงืจ ืœื ืœืขืฆื•ืจ ื•ืœื”ืชืงื“ื, ืœืกืคื•ืจ, ืœื—ืคืฉ ืคืชืจื•ื ื•ืช ืคืฉื•ื˜ื™ื ื•ื‘ืจื•ืจื™ื ื•ื”ื›ืœ ื‘ื”ื—ืœื˜ ื™ืกืชื“ืจ! ื‘ื”ืฆืœื—ื” ืœื›ื•ืœื ื•ืœื”ืชืจืื•ืช ืฉื•ื‘!

ืžืงื•ืจ: www.habr.com

ื”ื•ืกืคืช ืชื’ื•ื‘ื”