Data Build Tool, or What a Data Warehouse and a Smoothie Have in Common

On what principles is a Data Warehouse built?

Focus on business value and analytics, with no boilerplate code. Manage the DWH as a codebase: versioning, review, automated testing, and CI. Modular, extensible, open source, and community-driven. User-friendly documentation and dependency visualization (Data Lineage).

More on all of this, and on the role of DBT in the Big Data & Analytics ecosystem, below.

Hello everyone

My name is Artemy Kozyr. For more than 5 years I have been working with data warehouses, building ETL/ELT, and doing data analytics and visualization. I currently work at Wheely and teach the Data Engineer course at OTUS, and today I want to share an article I wrote ahead of the start of a new enrollment for the course.

Brief Overview

The DBT framework is all about the T in the ELT acronym (Extract - Load - Transform).

With the arrival of productive, scalable analytical databases such as BigQuery, Redshift, and Snowflake, there is little reason to perform transformations outside the Data Warehouse.

DBT does not extract data from sources, but it offers plenty of options for working with data that has already been loaded into the Warehouse (internal or external storage).

The main purpose of DBT is to take code, compile it into SQL, and execute the commands in the correct order in the Warehouse.

DBT Project Structure

A project consists of directories and files of just 2 types:

  • Model (.sql) - a unit of transformation expressed as a SELECT query
  • Configuration file (.yml) - parameters, settings, tests, documentation

At a basic level, the work is organized as follows:

  • The user prepares the model code in any convenient IDE
  • The models are run from the CLI, and DBT compiles the model code into SQL
  • The compiled SQL is executed in the Warehouse in the given order (graph)

Here is what a run from the CLI looks like:

[Image: screenshot of a DBT run in the CLI]

Everything is a SELECT

This is a killer feature of the Data Build Tool framework. In other words, DBT abstracts away all the code related to materializing your queries in the Warehouse (variations of the CREATE, INSERT, UPDATE, DELETE, ALTER, GRANT, ... commands).

Every model involves writing a single SELECT query that defines the resulting dataset.

The transformation logic can also combine data from several other models. Here is an example of a model that builds an orders mart (f_orders):

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final

What interesting things can we see here?

First: CTEs (Common Table Expressions) are used to organize and make sense of code that contains many transformations and a lot of business logic.

Second: the model code is a mix of SQL and the Jinja language (a templating language).

The example uses a for loop to generate an amount column for each payment method listed in the set expression. It also uses the ref function, which lets you reference other models from within the code:

  • At compile time, ref is replaced by a reference to the target table or view in the Warehouse
  • ref lets you build the model dependency graph
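
The two effects of ref listed above can be sketched in plain Python. This is only an illustration, not how DBT is implemented: the REF_PATTERN regex, the find_dependencies and resolve_refs helpers, and the analytics schema name are all assumptions made up for the example.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes (illustrative only).
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def find_dependencies(model_sql):
    """Return the models referenced via ref(), in order of first appearance."""
    seen = []
    for name in REF_PATTERN.findall(model_sql):
        if name not in seen:
            seen.append(name)
    return seen


def resolve_refs(model_sql, schema):
    """Replace every ref() call with a schema-qualified relation name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", model_sql)


model_sql = """
select *
from {{ ref('stg_orders') }}
left join {{ ref('order_payments') }} using (order_id)
"""

dependencies = find_dependencies(model_sql)          # edges of the dependency graph
compiled_sql = resolve_refs(model_sql, "analytics")  # hypothetical target schema
print(dependencies)
print(compiled_sql)
```

The same pass over the source yields both the compiled SQL and the list of upstream models, which is why a single ref call is enough to place a model in the graph.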

It is Jinja that gives DBT almost unlimited possibilities. The most commonly used features are:

  • If / else statements - branching
  • For loops - iteration
  • Variables
  • Macros - creating reusable macros
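
For intuition, the for loop in the orders model above expands into one column per payment method. The same expansion written in plain Python looks roughly like this; the render_amount_columns helper is hypothetical and only mimics the rendered output, not how Jinja works internally.

```python
payment_methods = ["credit_card", "coupon", "bank_transfer", "gift_card"]


def render_amount_columns(methods, table="order_payments"):
    """Produce one '<table>.<method>_amount,' line per payment method."""
    return "\n".join(f"{table}.{method}_amount," for method in methods)


rendered = render_amount_columns(payment_methods)
print(rendered)
```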

Materialization: Table, View, Incremental

A materialization strategy is the way a model's resulting dataset is persisted in the Warehouse.

The basic strategies are:

  • Table - a physical table in the Warehouse
  • View - a view, that is, a virtual table in the Warehouse

There are also more complex materialization strategies:

  • Incremental - incremental loading (of large fact tables); new rows are added, changed rows are updated, deleted rows are removed
  • Ephemeral - the model is not materialized directly, but takes part as a CTE in other models
  • Any other strategies that you add yourself

In addition to materialization strategies, there are optimization options for specific Warehouses, for example:

  • Snowflake: Transient tables, Merge behavior, Table clustering, Copying grants, Secure views
  • Redshift: Distkey, Sortkey (interleaved, compound), Late Binding Views
  • BigQuery: Table partitioning & clustering, Merge behavior, KMS Encryption, Labels & Tags
  • Spark: File format (parquet, csv, json, orc, delta), partition_by, clustered_by, buckets, incremental_strategy

The following Warehouses are currently supported:

  • Postgres
  • Redshift
  • BigQuery
  • Snowflake
  • Presto (partially)
  • Spark (partially)
  • Microsoft SQL Server (community adapter)

Let's improve our model:

  • Make its loading incremental (Incremental)
  • Add distribution and sort keys for Redshift

-- Model configuration:
-- Incremental loading, unique key for updating records (unique_key)
-- Distribution key (dist), sort key (sort)
{{
  config(
       materialized='incremental',
       unique_key='order_id',
       dist="customer_id",
       sort="order_date"
   )
}}
 
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
   where 1=1
   {% if is_incremental() -%}
       -- This filter is applied only on an incremental run
       and order_date >= (select max(order_date) from {{ this }})
   {%- endif %} 
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final
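
What an incremental run with a unique_key amounts to can be approximated outside DBT. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Warehouse. The table contents and the incremental_load helper are made up for illustration, and delete-then-insert is just one possible way to apply the new rows, not necessarily what DBT generates for a given adapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stg_orders (order_id integer, order_date text, status text);
    create table f_orders   (order_id integer, order_date text, status text);
    insert into f_orders values
        (1, '2020-05-01', 'placed'),
        (2, '2020-05-02', 'placed');
    insert into stg_orders values
        (1, '2020-05-01', 'placed'),
        (2, '2020-05-02', 'shipped'),   -- changed since the last run
        (3, '2020-05-03', 'placed');    -- new since the last run
""")


def incremental_load(conn, unique_key="order_id"):
    """Apply only the rows at or after the latest order_date already loaded."""
    # Same filter as in the model: order_date >= max(order_date) in the target.
    conn.execute("""
        create temp table new_rows as
        select * from stg_orders
        where order_date >= (select max(order_date) from f_orders)
    """)
    # Rows whose key already exists are replaced; the rest are appended.
    conn.execute(
        f"delete from f_orders where {unique_key} in (select {unique_key} from new_rows)"
    )
    conn.execute("insert into f_orders select * from new_rows")
    conn.execute("drop table new_rows")
    conn.commit()


incremental_load(conn)
rows = conn.execute("select * from f_orders order by order_id").fetchall()
print(rows)
```

Only orders 2 and 3 pass the date filter, so order 1 is never touched; this is where the savings on large fact tables come from.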

Dependency Graph

It is also known as a dependency tree, or DAG (Directed Acyclic Graph).

DBT builds the graph from the configuration of all the models in the project, or more precisely from the ref() links inside models to other models. Having the graph lets you do the following:

  • Run models in the correct order
  • Build marts in parallel
  • Run an arbitrary subgraph

An example of graph visualization:

[Image: visualization of the model dependency graph]
Each node of the graph is a model; the edges of the graph are defined by ref expressions.
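
The ordering, parallel building, and subgraph selection that the graph makes possible can be sketched with Python's standard graphlib module (Python 3.9+). The project below is hypothetical (stg_payments in particular is an invented model), and run_batches and upstream_subgraph are illustrative helpers rather than anything DBT exposes.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it references via ref() (hypothetical project).
dependencies = {
    "stg_orders": set(),
    "stg_payments": set(),
    "order_payments": {"stg_payments"},
    "f_orders": {"stg_orders", "order_payments"},
}


def run_batches(graph):
    """Group models into batches; models in one batch could be built in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def upstream_subgraph(graph, target):
    """Return the target model together with everything it depends on."""
    selected, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in selected:
            selected.add(node)
            stack.extend(graph[node])
    return {node: graph[node] for node in selected}


batches = run_batches(dependencies)
subgraph = upstream_subgraph(dependencies, "order_payments")
print(batches)
print(sorted(subgraph))
```

Each batch contains only models whose upstream models are already built, so a batch can be handed to several workers at once.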

Data Quality and Documentation

Beyond building the models themselves, DBT lets you test a number of assumptions (assertions) about the resulting dataset, such as:

  • Not Null
  • Unique
  • Reference Integrity - referential integrity (for example, customer_id in the orders table matches id in the customers table)
  • Matching against a list of accepted values

You can also add your own tests (custom data tests), for example the % deviation of revenue compared with the figures from a day, a week, or a month ago. Any assumption formulated as a SQL query can become a test.

This way you can catch unwanted deviations and errors in the data of the Warehouse marts.
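
As a rough illustration of a custom test, the sketch below treats a test as a SQL query that selects the rows violating an assumption, so the test passes when the query returns no rows. It uses Python's sqlite3 module in place of the Warehouse; the daily_revenue table, the 50% threshold, and the run_test helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table daily_revenue (day text, revenue real);
    insert into daily_revenue values
        ('2020-05-01', 1000.0),
        ('2020-05-02', 1100.0),
        ('2020-05-03', 300.0);
""")

# Select the days on which revenue deviates by more than 50% from the previous day.
REVENUE_DEVIATION_TEST = """
    select cur.day, cur.revenue, prev.revenue as prev_revenue
    from daily_revenue as cur
    join daily_revenue as prev
      on prev.day = date(cur.day, '-1 day')
    where abs(cur.revenue - prev.revenue) / prev.revenue > 0.5
"""


def run_test(conn, test_sql):
    """Return the offending rows; an empty list means the test passed."""
    return conn.execute(test_sql).fetchall()


failures = run_test(conn, REVENUE_DEVIATION_TEST)
print("passed" if not failures else f"failed: {failures}")
```

Writing the test as "select the bad rows" means a failure report already shows which records broke the assumption.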

As for documentation, DBT provides mechanisms for adding, versioning, and distributing metadata and comments at the model level and even at the attribute level.

Here is what adding tests and documentation looks like at the configuration file level:

 - name: fct_orders
   description: This table has basic information about orders, as well as some derived facts based on payments
   columns:
     - name: order_id
       tests:
         - unique # uniqueness check
         - not_null # null check
       description: This is a unique identifier for an order
     - name: customer_id
       description: Foreign key to the customers table
       tests:
         - not_null
         - relationships: # referential integrity check
             to: ref('dim_customers')
             field: customer_id
     - name: order_date
       description: Date (UTC) that the order was placed
     - name: status
       description: '{{ doc("orders_status") }}'
       tests:
         - accepted_values: # accepted values check
             values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']

And here is what this documentation looks like on the generated website:

[Image: screenshot of the generated documentation website]

Macros and Modules

The purpose of DBT is not so much to be a collection of SQL scripts as to give users a powerful, feature-rich means of building their own transformations and distributing them as modules.

Macros are sets of constructs and expressions that can be called as functions inside models. Macros let you reuse SQL across models and projects, in line with the DRY (Don't Repeat Yourself) engineering principle.

Macro example:

{% macro rename_category(column_name) %}
case
 when {{ column_name }} ilike  '%osx%' then 'osx'
 when {{ column_name }} ilike  '%android%' then 'android'
 when {{ column_name }} ilike  '%ios%' then 'ios'
 else 'other'
end as renamed_product
{% endmacro %}

And its usage:

{% set column_name = 'product' %}
select
 product,
 {{ rename_category(column_name) }} -- macro call
from my_table

DBT comes with a package manager that lets users publish and reuse individual modules and macros.

This means you can load and use libraries such as:

  • dbt_utils: working with Date/Time, Surrogate Keys, Schema tests, Pivot/Unpivot, and more
  • Ready-made mart templates for services such as Snowplow and Stripe
  • Libraries for specific Data Warehouses, e.g. Redshift
  • Logging - a module for logging DBT runs

A complete list of packages can be found on dbt hub.

Even More Features

Here I will describe a few other interesting features and implementations that the team and I use to build the Data Warehouse at Wheely.

Separation of runtime environments DEV - TEST - PROD

This works even within a single DWH cluster (across different schemas). For example, using the following expression:

with source as (

   select * from {{ source('salesforce', 'users') }}
   where 1=1
   {% if target.name in ['dev', 'test', 'ci'] -%}
       -- Limit the data volume outside prod; "and" continues the "where 1=1" clause
       and timestamp >= dateadd(day, -3, current_date)
   {%- endif %}

)

This code literally says: for the dev, test, and ci environments, take only the data for the last 3 days and no more. As a result, runs in these environments are much faster and require fewer resources. When the model runs in the prod environment, the filter condition is skipped.
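
The branching can be reproduced in a few lines of Python to show what ends up in the compiled query for each environment. The build_source_query helper and the salesforce.users relation name are hypothetical; in DBT the environment name comes from target.name.

```python
def build_source_query(target_name):
    """Render the source query, limiting the data volume outside production."""
    lines = [
        "select * from salesforce.users",
        "where 1=1",
    ]
    if target_name in ("dev", "test", "ci"):
        # Only the last 3 days of data for the lightweight environments.
        lines.append("    and timestamp >= dateadd(day, -3, current_date)")
    return "\n".join(lines)


dev_sql = build_source_query("dev")
prod_sql = build_source_query("prod")
print(dev_sql)
print(prod_sql)
```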

Materialization with alternate column encoding

Redshift is a columnar DBMS that lets you set a data compression algorithm for each individual column. Choosing optimal algorithms can reduce disk usage by 20-50%.

The redshift.compress_table macro runs the ANALYZE COMPRESSION command, creates a new table with the recommended column encoding algorithms and the specified distribution key (dist_key) and sort keys (sort_key), transfers the data into it, and, if necessary, drops the old copy.

Macro signature:

{{ compress_table(schema, table,
                 drop_backup=False,
                 comprows=none|Integer,
                 sort_style=none|compound|interleaved,
                 sort_keys=none|List<String>,
                 dist_style=none|all|even,
                 dist_key=none|String) }}

Logging model runs

You can attach hooks to each model run; they are executed before the run starts or immediately after the model has finished building:

   pre-hook: "{{ logging.log_model_start_event() }}"
   post-hook: "{{ logging.log_model_end_event() }}"

The logging module lets you record all the necessary metadata in a separate table, which can later be used for auditing and for analyzing bottlenecks.

This is what a dashboard based on the logging data looks like in Looker:

[Image: Looker dashboard built on the logging data]

Automating Warehouse Maintenance

If you use extensions of your Warehouse's functionality, such as UDFs (User Defined Functions), then DBT is a very convenient place to version these functions, control access to them, and automate the rollout of new releases.

We use Python UDFs to compute hashes, extract email domains, and decode bitmasks.

An example of a macro that creates a UDF in any runtime environment (dev, test, prod):

{% macro create_udf() -%}
 
 {% set sql %}
       CREATE OR REPLACE FUNCTION {{ target.schema }}.f_sha256(mes "varchar")
           RETURNS varchar
           LANGUAGE plpythonu
           STABLE
       AS $$  
           import hashlib
           return hashlib.sha256(mes).hexdigest()
       $$
       ;
 {% endset %}
  
 {% set table = run_query(sql) %}
 
{%- endmacro %}

At Wheely we use Amazon Redshift, which is based on PostgreSQL. For Redshift it is important to regularly collect table statistics and free up disk space, using the ANALYZE and VACUUM commands respectively.

To do this, the commands from the redshift_maintenance macro are run every night:

{% macro redshift_maintenance() %}
 
   {% set vacuumable_tables=run_query(vacuumable_tables_sql) %}
 
   {% for row in vacuumable_tables %}
       {% set message_prefix=loop.index ~ " of " ~ loop.length %}
 
       {%- set relation_to_vacuum = adapter.get_relation(
                                               database=row['table_database'],
                                               schema=row['table_schema'],
                                               identifier=row['table_name']
                                   ) -%}
       {% do run_query("commit") %}
 
       {% if relation_to_vacuum %}
           {% set start=modules.datetime.datetime.now() %}
           {{ dbt_utils.log_info(message_prefix ~ " Vacuuming " ~ relation_to_vacuum) }}
           {% do run_query("VACUUM " ~ relation_to_vacuum ~ " BOOST") %}
           {{ dbt_utils.log_info(message_prefix ~ " Analyzing " ~ relation_to_vacuum) }}
           {% do run_query("ANALYZE " ~ relation_to_vacuum) %}
           {% set end=modules.datetime.datetime.now() %}
           {% set total_seconds = (end - start).total_seconds() | round(2)  %}
           {{ dbt_utils.log_info(message_prefix ~ " Finished " ~ relation_to_vacuum ~ " in " ~ total_seconds ~ "s") }}
       {% else %}
           {{ dbt_utils.log_info(message_prefix ~ ' Skipping relation "' ~ row.values() | join ('"."') ~ '" as it does not exist') }}
       {% endif %}
 
   {% endfor %}
 
{% endmacro %}

DBT Cloud

DBT can also be used as a service (Managed Service). It includes:

  • A web IDE for developing projects and models
  • Job configuration and scheduling
  • Simple and convenient access to logs
  • A website with your project's documentation
  • CI integration (Continuous Integration)


Conclusion

Preparing and consuming a DWH becomes as enjoyable and beneficial as drinking a smoothie. DBT consists of Jinja, user extensions (modules), a compiler, an executor, and a package manager. Put these elements together and you get a complete working environment for your Data Warehouse. There is hardly a better way to manage transformations inside a DWH today.


The beliefs held by the developers of DBT are formulated as follows:

  • Code, not a GUI, is the best abstraction for expressing complex analytical logic
  • Working with data should adopt the best practices of software engineering (Software Engineering)
  • Critical data infrastructure should be controlled by its user community as open source software
  • Not only analytics tools, but analytics code as well, will increasingly belong to the Open Source community

These beliefs have produced a product that is used by more than 850 companies today, and they form the basis of many interesting extensions to be built in the future.

For those interested, there is a video of an open lesson I gave at OTUS a few months ago - Data Build Tool for Amazon Redshift Storage.

Besides DBT and Data Warehousing, as part of the Data Engineer course on the OTUS platform, my colleagues and I teach classes on a number of other relevant and modern topics:

  • Architectural concepts for Big Data applications
  • Practice with Spark and Spark Streaming
  • Exploring methods and tools for loading data sources
  • Building analytical marts in the DWH
  • NoSQL concepts: HBase, Cassandra, ElasticSearch
  • Principles of monitoring and orchestration
  • Final project: bringing all the skills together with mentoring support

References:

  1. DBT documentation - Introduction - official documentation
  2. What, exactly, is dbt? - overview article by one of the authors of DBT
  3. Data Build Tool for Amazon Redshift Storage - YouTube, recording of an OTUS open lesson
  4. Getting to know Greenplum - the next open lesson, May 15, 2020
  5. Data Engineering course - OTUS
  6. Building a Mature Analytics Workflow - a look at the future of working with data and analytics
  7. It's time for open source analytics - the evolution of analytics and the influence of Open Source
  8. Continuous Integration and Automated Build Testing with dbtCloud - principles of building CI with DBT
  9. Getting started with DBT tutorial - practice, step-by-step instructions for independent work
  10. Jaffle shop - Github DBT Tutorial - Github, educational project code

Learn more about the course.

Source: www.habr.com
