Data Build Tool, or what a Data Warehouse and a smoothie have in common

What principles is an ideal Data Warehouse built on?

Focus on business value and analytics, with no boilerplate code. Manage the DWH as a codebase: versioning, review, automated testing, and CI. Modularity, extensibility, open source, and community. User-friendly documentation and dependency visualization (Data Lineage).

To learn more about all of this, and about the role of DBT in the Big Data & Analytics ecosystem, read on.

Hello everyone

Artemy Kozyr here. For more than 5 years I have worked with data warehouses, building ETL / ELT, as well as with data analytics and visualization. I currently work at Wheely, I teach on the Data Engineer course at OTUS, and today I want to share an article I wrote ahead of the start of a new enrollment for the course.

Brief overview

The DBT framework is all about the T in the ELT (Extract - Load - Transform) acronym.

With the arrival of high-performance, scalable analytical databases such as BigQuery, Redshift, and Snowflake, there is little point in doing transformations outside the Data Warehouse.

DBT does not extract data from sources. Instead, it provides powerful tools for working with data that has already been loaded into the warehouse (in internal or external storage).

The main purpose of DBT is to take the code, compile it into SQL, and execute the commands in the correct order in the warehouse.

DBT Project Structure

A project consists of directories and files of only 2 types:

  • Model (.sql) - a unit of transformation expressed as a SELECT query
  • Configuration file (.yml) - parameters, settings, tests, documentation

At a basic level, the work is organized as follows:

  • The user writes the model code in any convenient IDE
  • Models are run from the CLI, and DBT compiles the model code into SQL
  • The compiled SQL is executed in the warehouse in a given order (the graph)

Here is what a run from the CLI might look like:

[Image: output of a DBT run in the CLI]

Everything is a SELECT

This is a killer feature of the Data Build Tool framework. In other words, DBT abstracts away all the code involved in materializing your queries in the warehouse (variations of the CREATE, INSERT, UPDATE, DELETE, ALTER, GRANT, ... commands).

Every model is written as a single SELECT query that defines the resulting dataset.

The transformation logic can still be multi-level and can combine data from several other models. Here is an example of a model that builds an orders data mart (f_orders):

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final

What is interesting here?

First: CTEs (Common Table Expressions) are used to organize the code and keep it readable when it contains many transformations and a lot of business logic.

Second: the model code is a mix of SQL and Jinja (a templating language).

The example uses a for loop to generate an amount column for each payment method listed in the set expression. It also uses the ref function, which lets one model reference other models in its code:

  • At compile time, ref is converted into a pointer to the target table or view in the warehouse
  • ref lets DBT build the model dependency graph
It is Jinja that gives DBT almost unlimited possibilities. The most commonly used features are:

  • If / else statements - branching
  • For loops
  • Variables
  • Macro - defining macros

Materialization: Table, View, Incremental

A materialization strategy is the approach by which a model's resulting dataset is stored in the warehouse.

In the basic case it is one of:

  • Table - a physical table in the warehouse
  • View - a view, a virtual table in the warehouse

There are also more complex materialization strategies:

  • Incremental - incremental loading (of large fact tables); new rows are added, changed rows are updated, deleted rows are cleaned out
  • Ephemeral - the model is not materialized directly, but is included as a CTE in other models
  • Any other strategies that you add yourself
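
As an illustration of the incremental idea, here is a small Python sketch that uses an in-memory SQLite database. It is one possible way to apply a batch by unique key (delete the matching keys, then insert), not a description of what DBT generates for any particular warehouse. The table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stg_orders (order_id integer, order_date text, status text);
    create table f_orders  (order_id integer, order_date text, status text);

    -- State after an earlier run: two orders already loaded.
    insert into f_orders values
        (1, '2020-05-01', 'placed'),
        (2, '2020-05-02', 'placed');

    -- The source now holds a changed order 2 and a new order 3.
    insert into stg_orders values
        (1, '2020-05-01', 'placed'),
        (2, '2020-05-02', 'shipped'),
        (3, '2020-05-03', 'placed');
""")


def incremental_run(conn):
    """Select only recent rows, then apply them by unique key (order_id)."""
    conn.executescript("""
        create temp table new_rows as
            select * from stg_orders
            where order_date >= (select max(order_date) from f_orders);

        delete from f_orders where order_id in (select order_id from new_rows);
        insert into f_orders select * from new_rows;

        drop table new_rows;
    """)


incremental_run(conn)
rows = conn.execute("select * from f_orders order by order_id").fetchall()
print(rows)
```

Order 1 is never touched, order 2 is replaced with its new status, and order 3 is added, so only the recent slice of the source is processed.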

In addition to materialization strategies, there are optimization options for specific warehouses, for example:

  • Snowflake: Transient tables, Merge behavior, Table clustering, Copying grants, Secure views
  • Redshift: Distkey, Sortkey (interleaved, compound), Late Binding Views
  • BigQuery: Table partitioning & clustering, Merge behavior, KMS Encryption, Labels & Tags
  • Spark: File format (parquet, csv, json, orc, delta), partition_by, clustered_by, buckets, incremental_strategy

The following warehouses are currently supported:

  • Postgres
  • Redshift
  • BigQuery
  • Snowflake
  • Presto (partially)
  • Spark (partially)
  • Microsoft SQL Server (community adapter)

Let's improve our model:

  • Make it incremental (Incremental)
  • Add distribution and sort keys for Redshift

-- Model configuration:
-- Incremental load, with a unique key used to update existing records (unique_key)
-- Distribution key (dist), sort key (sort)
{{
  config(
       materialized='incremental',
       unique_key='order_id',
       dist="customer_id",
       sort="order_date"
   )
}}
 
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
   where 1=1
   {% if is_incremental() -%}
       -- This filter is applied only on incremental runs
       and order_date >= (select max(order_date) from {{ this }})
   {%- endif %} 
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final

Model dependency graph

It is also a dependency tree, also known as a DAG (Directed Acyclic Graph).

DBT builds the graph from the configuration of all the project's models, or more precisely from the ref() links that models contain to other models. Having the graph lets you:

  • Run models in the correct order
  • Build data marts in parallel
  • Run an arbitrary subgraph
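
The three capabilities above follow directly from having a DAG. The Python sketch below illustrates them with the standard-library `graphlib` module (Python 3.9+). The project layout is hypothetical, in particular the `stg_payments` model, and this is not DBT's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the models it references via ref().
dependencies = {
    "stg_orders": set(),
    "stg_payments": set(),
    "order_payments": {"stg_payments"},
    "f_orders": {"stg_orders", "order_payments"},
}


def run_order(graph):
    """Return models in an order where every model follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def parallel_batches(graph):
    """Group models into batches whose members can be built concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def upstream_subgraph(graph, model):
    """Return the model plus everything it depends on, directly or indirectly."""
    selected, stack = {}, [model]
    while stack:
        name = stack.pop()
        if name not in selected:
            selected[name] = graph[name]
            stack.extend(graph[name])
    return selected


print(run_order(dependencies))
print(parallel_batches(dependencies))
print(sorted(upstream_subgraph(dependencies, "order_payments")))
```

Models in the same batch have no dependencies on each other, which is what makes building them in parallel safe.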

An example of graph visualization:

[Image: visualization of the model graph]
Each node of the graph is a model; the edges of the graph are defined by ref expressions.

Data quality and documentation

In addition to building the models themselves, DBT lets you test a number of assertions about the resulting dataset, such as:

  • Not Null
  • Unique
  • Reference Integrity - referential integrity (for example, customer_id in the orders table corresponds to id in the customers table)
  • Matching against a list of accepted values

You can also add your own tests (custom data tests), for example the % deviation of revenue from the figures of a day, a week, or a month ago. Any assertion that can be expressed as an SQL query can become a test.

This way you can catch unwanted deviations and errors in the data in the Warehouse data marts.
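
The idea that any SQL query can become a test is easy to show in isolation. In the Python sketch below, which uses an in-memory SQLite database with made-up data, each test is a SELECT that returns the offending rows, and a test passes when it returns nothing. The queries are written by hand for illustration and are not the SQL that DBT generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (id integer);
    create table orders (order_id integer, customer_id integer);

    insert into customers values (1), (2);
    -- order_id 11 is duplicated; the last order points at a missing customer.
    insert into orders values (10, 1), (11, 2), (11, 2), (12, 99);
""")

# Each test is a SELECT that returns the offending rows; zero rows means "pass".
tests = {
    "not_null(order_id)": "select * from orders where order_id is null",
    "unique(order_id)": """
        select order_id from orders
        group by order_id having count(*) > 1
    """,
    "relationships(customer_id -> customers.id)": """
        select o.customer_id from orders o
        left join customers c on c.id = o.customer_id
        where o.customer_id is not null and c.id is null
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in tests.items()}
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
    print(f"{name}: {status}")
```

A custom test, such as a revenue deviation check, fits the same shape: a query that returns rows only when the assertion is violated.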

As for documentation, DBT provides mechanisms for adding, versioning, and distributing metadata and comments at the model level and even at the attribute level.

Here is what adding tests and documentation looks like at the configuration file level:

 - name: fct_orders
   description: This table has basic information about orders, as well as some derived facts based on payments
   columns:
     - name: order_id
       tests:
         - unique # check that values are unique
         - not_null # check that there are no nulls
       description: This is a unique identifier for an order
     - name: customer_id
       description: Foreign key to the customers table
       tests:
         - not_null
         - relationships: # referential integrity check
             to: ref('dim_customers')
             field: customer_id
     - name: order_date
       description: Date (UTC) that the order was placed
     - name: status
       description: '{{ doc("orders_status") }}'
       tests:
         - accepted_values: # check against the list of accepted values
             values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']

And here is what this documentation looks like on the generated website:

[Image: generated documentation website]

Macros and Modules

The purpose of DBT is not so much to be a set of SQL scripts as to give users powerful, feature-rich tools for building their own transformations and distributing them as modules.

Macros are sets of constructs and expressions that can be called as functions inside models. Macros let you reuse SQL across models and projects, in line with the DRY (Don't Repeat Yourself) engineering principle.

Macro example:

{% macro rename_category(column_name) %}
case
 when {{ column_name }} ilike  '%osx%' then 'osx'
 when {{ column_name }} ilike  '%android%' then 'android'
 when {{ column_name }} ilike  '%ios%' then 'ios'
 else 'other'
end as renamed_product
{% endmacro %}

And its use:

{% set column_name = 'product' %}
select
 product,
 {{ rename_category(column_name) }} -- macro call
from my_table

DBT comes with a package manager that lets users publish and reuse individual modules and macros.

This means you can load and use libraries such as:

  • dbt_utils: working with Date / Time, Surrogate Keys, Schema tests, Pivot / Unpivot, and more
  • Ready-made data mart templates for services such as Snowplow and Stripe
  • Libraries for specific warehouses, for example Redshift
  • Logging - a module for logging DBT runs

A complete list of packages can be found on dbt hub.

Even more features

Here I will describe a few other interesting features and implementations that the team and I use to build the Data Warehouse at Wheely.

Separation of runtime environments DEV - TEST - PROD

This works even within the same DWH cluster (in different schemas). For example, using the following expression:

with source as (
 
   select * from {{ source('salesforce', 'users') }}
   where 1=1
   {% if target.name in ['dev', 'test', 'ci'] -%}
       and timestamp >= dateadd(day, -3, current_date)
   {%- endif %}
 
)

This code literally says: for the dev, test, and ci environments, take data for the last 3 days only and nothing more. Runs in these environments will therefore be much faster and need fewer resources. When running in the prod environment, the filter condition is skipped.
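
The same branching can be written out in plain Python to show what the compiled query looks like for each environment. This is only a sketch of the logic: the relation name `salesforce.users` is an assumption about how the source would resolve, and `source_query` is a hypothetical helper, not part of DBT.

```python
# Target names treated as non-production, as in the model above.
DEV_LIKE_TARGETS = ("dev", "test", "ci")


def source_query(target_name, days=3):
    """Build the source query, limiting history for non-production targets."""
    # `salesforce.users` is an assumed relation name, used for illustration.
    sql = "select * from salesforce.users\nwhere 1=1"
    if target_name in DEV_LIKE_TARGETS:
        sql += f"\n  and timestamp >= dateadd(day, -{days}, current_date)"
    return sql


print(source_query("dev"))   # includes the 3-day filter
print(source_query("prod"))  # no extra filter
```

The `where 1=1` line is what allows the optional condition to be appended with a plain `and`.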

Materialization with alternate column encoding

Redshift is a columnar DBMS that lets you set a data compression algorithm for each individual column. Choosing optimal algorithms can reduce disk space usage by 20-50%.

The redshift.compress_table macro runs the ANALYZE COMPRESSION command, creates a new table with the recommended column encoding algorithms and the specified distribution key (dist_key) and sort keys (sort_key), transfers the data into it, and, if necessary, deletes the old copy.

Macro signature:

{{ compress_table(schema, table,
                 drop_backup=False,
                 comprows=none|Integer,
                 sort_style=none|compound|interleaved,
                 sort_keys=none|List<String>,
                 dist_style=none|all|even,
                 dist_key=none|String) }}

Logging model runs

You can attach hooks to each model execution. They run before the model starts building or immediately after it finishes:

   pre-hook: "{{ logging.log_model_start_event() }}"
   post-hook: "{{ logging.log_model_end_event() }}"

The logging module lets you record all the necessary metadata in a separate table, which can then be used to audit runs and analyze bottlenecks.
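
Conceptually, the two hooks simply write a row before and after the build. The Python sketch below mimics that with an in-memory SQLite table. The `audit_log` table and its columns are hypothetical, and the real logging package defines its own schema and macros.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical audit table; the real logging package defines its own schema.
conn.execute("create table audit_log (model text, event text, logged_at text)")


def log_event(model, event):
    """Append one event row with a UTC timestamp."""
    conn.execute(
        "insert into audit_log values (?, ?, ?)",
        (model, event, datetime.now(timezone.utc).isoformat()),
    )


def run_model(model, build):
    """Wrap a model build with a pre-hook and a post-hook."""
    log_event(model, "start")  # pre-hook: runs before the model is built
    build()
    log_event(model, "end")    # post-hook: runs right after a successful build


run_model("f_orders", lambda: None)  # the build step is a stand-in here
events = conn.execute(
    "select model, event from audit_log order by rowid"
).fetchall()
print(events)
```

With start and end timestamps per model in one table, run durations and slow models can be found with an ordinary query.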

Here is what a dashboard built on the logging data looks like in Looker:

[Image: Looker dashboard based on the logging data]

Automation of warehouse maintenance

If you use extensions to the functionality of your warehouse, such as UDFs (User Defined Functions), then versioning these functions, managing access to them, and automating the rollout of new releases is very convenient to do in DBT.

We use UDFs in Python to compute hashes, extract email domains, and decode bitmasks.

An example of a macro that creates a UDF in any runtime environment (dev, test, prod):

{% macro create_udf() -%}
 
 {% set sql %}
       CREATE OR REPLACE FUNCTION {{ target.schema }}.f_sha256(mes "varchar")
           RETURNS varchar
           LANGUAGE plpythonu
           STABLE
       AS $$  
           import hashlib
           return hashlib.sha256(mes).hexdigest()
       $$
       ;
 {% endset %}
  
 {% set table = run_query(sql) %}
 
{%- endmacro %}
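
The bodies of such UDFs are ordinary Python. As a rough sketch of the three kinds of function mentioned above, written for Python 3 rather than for the warehouse runtime: only `f_sha256` appears in the article, while `f_email_domain`, `f_decode_bitmask`, and the flag names are hypothetical.

```python
import hashlib


def f_sha256(mes):
    """Hex SHA-256 digest; Python 3 needs bytes, hence the explicit encode."""
    return hashlib.sha256(mes.encode("utf-8")).hexdigest()


def f_email_domain(email):
    """Lower-cased part after the last '@', or '' when there is no '@'."""
    _, sep, domain = email.rpartition("@")
    return domain.lower() if sep else ""


def f_decode_bitmask(mask, flags):
    """Names of the flags whose bit is set; flags[0] is the lowest bit."""
    return [name for bit, name in enumerate(flags) if mask & (1 << bit)]


# Hypothetical flag names, for illustration only.
FLAGS = ["email_verified", "phone_verified", "blocked"]

print(f_sha256("abc"))
print(f_email_domain("Rider@Example.com"))
print(f_decode_bitmask(0b101, FLAGS))
```

Keeping such functions in the project means they are reviewed, versioned, and deployed to every environment together with the models that call them.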

At Wheely we use Amazon Redshift, which is based on PostgreSQL. For Redshift it is important to regularly collect table statistics and free up disk space, using the ANALYZE and VACUUM commands respectively.

To do this, the commands from the redshift_maintenance macro are run every night:

{% macro redshift_maintenance() %}
 
   {% set vacuumable_tables=run_query(vacuumable_tables_sql) %}
 
   {% for row in vacuumable_tables %}
       {% set message_prefix=loop.index ~ " of " ~ loop.length %}
 
       {%- set relation_to_vacuum = adapter.get_relation(
                                               database=row['table_database'],
                                               schema=row['table_schema'],
                                               identifier=row['table_name']
                                   ) -%}
       {% do run_query("commit") %}
 
       {% if relation_to_vacuum %}
           {% set start=modules.datetime.datetime.now() %}
           {{ dbt_utils.log_info(message_prefix ~ " Vacuuming " ~ relation_to_vacuum) }}
           {% do run_query("VACUUM " ~ relation_to_vacuum ~ " BOOST") %}
           {{ dbt_utils.log_info(message_prefix ~ " Analyzing " ~ relation_to_vacuum) }}
           {% do run_query("ANALYZE " ~ relation_to_vacuum) %}
           {% set end=modules.datetime.datetime.now() %}
           {% set total_seconds = (end - start).total_seconds() | round(2)  %}
           {{ dbt_utils.log_info(message_prefix ~ " Finished " ~ relation_to_vacuum ~ " in " ~ total_seconds ~ "s") }}
       {% else %}
           {{ dbt_utils.log_info(message_prefix ~ ' Skipping relation "' ~ row.values() | join ('"."') ~ '" as it does not exist') }}
       {% endif %}
 
   {% endfor %}
 
{% endmacro %}

DBT Cloud

DBT can also be used as a service (Managed Service). It includes:

  • A web IDE for developing projects and models
  • Job configuration and scheduling
  • Simple and convenient access to logs
  • A website with your project's documentation
  • CI (Continuous Integration) support


Conclusion

Preparing and consuming a DWH becomes as enjoyable and beneficial as drinking a smoothie. DBT consists of Jinja, user extensions (modules), a compiler, an executor, and a package manager. Put these elements together and you get a complete working environment for your Data Warehouse. There is hardly a better way to manage transformations inside a DWH today.


The beliefs held by the developers of DBT are formulated as follows:

  • Code, not a GUI, is the best abstraction for expressing complex analytical logic
  • Working with data should adopt the best practices of software engineering
  • Critical data infrastructure should be controlled by its user community as open source software
  • Not only analytics tools, but also analytics code, will increasingly become the property of the Open Source community

These core beliefs have produced a product that is used by more than 850 companies today, and they form the basis of many interesting extensions still to come.

For those interested, there is a video of an open lesson I gave a few months ago at OTUS - Data Build Tool for Amazon Redshift Storage.

Besides DBT and Data Warehousing, as part of the Data Engineer course on the OTUS platform, my colleagues and I teach classes on a number of other relevant and modern topics:

  • Architectural Concepts for Big Data Applications
  • Practice with Spark and Spark Streaming
  • Exploring methods and tools for loading data from sources
  • Building analytical data marts in the DWH
  • NoSQL concepts: HBase, Cassandra, ElasticSearch
  • Principles of monitoring and orchestration
  • Final project: bringing all the skills together with mentoring support

Links:

  1. DBT documentation - Introduction - Official documentation
  2. What, exactly, is dbt? - Overview article by one of the authors of DBT
  3. Data Build Tool for Amazon Redshift Storage - YouTube, recording of the OTUS open lesson
  4. Getting to know Greenplum - The next open lesson is on May 15, 2020
  5. Data Engineering Course - OTUS
  6. Building a Mature Analytics Workflow - A look at the future of working with data and analytics
  7. It's time for open source analytics - The evolution of analytics and the influence of Open Source
  8. Continuous Integration and Automated Build Testing with dbtCloud - Principles of building CI with DBT
  9. Getting started with DBT tutorial - Practice, step-by-step instructions for independent work
  10. Jaffle shop - Github DBT Tutorial - Github, code of the training project

Learn more about the course.

Source: www.hab.com
