The compiled SQL code is executed in the warehouse in a defined order (a graph)
Here is what a run from the CLI looks like:
Everything is a SELECT
This is a killer feature of the Data Build Tool framework. In other words, DBT abstracts away all the code involved in materializing your queries in the warehouse (variations of the CREATE, INSERT, UPDATE, DELETE, ALTER, GRANT, ... commands).
Writing a model means writing a single SELECT query that defines the resulting dataset.
The transformation logic can still be multi-stage and can consolidate data from several other models. An example of a model that builds an orders data mart (f_orders):
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
with orders as (
    select * from {{ ref('stg_orders') }}
),
order_payments as (
    select * from {{ ref('order_payments') }}
),
final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        orders.status,
        {% for payment_method in payment_methods -%}
        order_payments.{{payment_method}}_amount,
        {% endfor -%}
        order_payments.total_amount as amount
    from orders
    left join order_payments using (order_id)
)
select * from final
What interesting things can we see here?
First: CTEs (Common Table Expressions) are used to organize code that contains many transformations and a lot of business logic, and to keep it readable
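The model above also contains a Jinja loop over payment_methods, which is expanded into plain SQL before the query reaches the warehouse. As a rough illustration only, the expansion can be imitated in plain Python (this is not how DBT renders templates internally; the function name render_payment_columns is made up for this sketch):

```python
# Plain-Python imitation of the Jinja for-loop in the model above.
# render_payment_columns is a hypothetical helper, not part of DBT.
payment_methods = ["credit_card", "coupon", "bank_transfer", "gift_card"]


def render_payment_columns(methods):
    """Return one 'order_payments.<method>_amount,' line per payment method."""
    return "\n".join(f"order_payments.{m}_amount," for m in methods)


columns_sql = render_payment_columns(payment_methods)
print(columns_sql)
```

Adding a new payment method then means extending the list in one place instead of editing the column list by hand.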
-- Model configuration:
-- Incremental load, unique key for updating records (unique_key)
-- Distribution key (dist), sort key (sort)
{{
  config(
    materialized='incremental',
    unique_key='order_id',
    dist="customer_id",
    sort="order_date"
  )
}}
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
with orders as (
    select * from {{ ref('stg_orders') }}
    where 1=1
    {% if is_incremental() -%}
    -- This filter is applied only on an incremental run
    and order_date >= (select max(order_date) from {{ this }})
    {%- endif %}
),
order_payments as (
    select * from {{ ref('order_payments') }}
),
final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        orders.status,
        {% for payment_method in payment_methods -%}
        order_payments.{{payment_method}}_amount,
        {% endfor -%}
        order_payments.total_amount as amount
    from orders
    left join order_payments using (order_id)
)
select * from final
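To make the incremental logic easier to follow, here is a simplified Python sketch of what the model above asks for: on an incremental run only rows at or after the latest order_date already loaded are taken, and rows with an existing order_id are overwritten. The function incremental_load and the sample rows are invented for illustration; the SQL that DBT actually generates depends on the warehouse adapter.

```python
from datetime import date


def incremental_load(target, source, unique_key="order_id", date_field="order_date"):
    """Upsert recent source rows into target (a dict keyed by unique_key)."""
    if target:
        # Stand-in for is_incremental(): here it simply means "target has rows".
        high_water = max(row[date_field] for row in target.values())
        new_rows = [r for r in source if r[date_field] >= high_water]
    else:
        # First run: load everything.
        new_rows = list(source)
    for row in new_rows:
        # Insert the row, or overwrite the existing row with the same key.
        target[row[unique_key]] = row
    return target


# Made-up sample data.
source = [
    {"order_id": 1, "order_date": date(2020, 1, 1), "status": "completed"},
    {"order_id": 2, "order_date": date(2020, 1, 2), "status": "shipped"},
]
target = incremental_load({}, source)

# Later: order 2 changes status and order 3 arrives.
source[1] = {"order_id": 2, "order_date": date(2020, 1, 2), "status": "completed"}
source.append({"order_id": 3, "order_date": date(2020, 1, 3), "status": "placed"})
target = incremental_load(target, source)
```

The `>=` comparison re-reads the most recent day on every run, which is why a unique key is needed to avoid duplicate rows.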
Model dependency graph
It is also a dependency tree. It is also known as a DAG (Directed Acyclic Graph).
DBT builds a graph from the configuration of all the models in the project, or more precisely, from the ref() links that models make to other models. Having a graph lets you do the following:
Run models in the correct order
Parallelize the building of data marts
Run an arbitrary subgraph
An example of the graph visualization:
Each node of the graph is a model; the edges of the graph are defined by ref expressions.
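The first two points can be sketched with Python's standard graphlib module: given a mapping from each model to the models it references, a topological sort yields batches in which every model's dependencies are already built, and models inside one batch could be built in parallel. The project layout below is hypothetical (stg_payments in particular is an assumed model name), and this is not DBT's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical project: model -> set of models it references via ref().
refs = {
    "stg_orders": set(),
    "stg_payments": set(),
    "order_payments": {"stg_payments"},
    "f_orders": {"stg_orders", "order_payments"},
}


def build_batches(refs):
    """Group models into batches; models in one batch do not depend on each other."""
    sorter = TopologicalSorter(refs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every model whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = build_batches(refs)
print(batches)
# [['stg_orders', 'stg_payments'], ['order_payments'], ['f_orders']]
```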
Data quality and documentation
Besides building the models themselves, DBT lets you test a number of assumptions about the resulting dataset, such as uniqueness, the absence of nulls, referential integrity, and conformity to a list of accepted values.
Here is how adding tests and documentation looks in the configuration file:
- name: fct_orders
  description: This table has basic information about orders, as well as some derived facts based on payments
  columns:
    - name: order_id
      tests:
        - unique # checks that values are unique
        - not_null # checks that there are no nulls
      description: This is a unique identifier for an order
    - name: customer_id
      description: Foreign key to the customers table
      tests:
        - not_null
        - relationships: # checks referential integrity
            to: ref('dim_customers')
            field: customer_id
    - name: order_date
      description: Date (UTC) that the order was placed
    - name: status
      description: '{{ doc("orders_status") }}'
      tests:
        - accepted_values: # checks against the list of allowed values
            values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
And here is how this documentation looks on the generated website:
Macros and Modules
DBT's purpose is not so much to be a collection of SQL scripts as to give users a powerful, feature-rich means of building their own transformations and distributing them as modules.
Macros are sets of constructs and expressions that can be called as functions inside models. Macros let you reuse SQL across models and projects, in line with the DRY (Don't Repeat Yourself) engineering principle.
Macro example:
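Returning to the four tests declared in the configuration above, their meaning can be illustrated with a small Python sketch. The functions mirror the test names but are written for this example only; DBT itself runs such checks as SQL queries in the warehouse, and its handling of nulls may differ from this simplification. The sample rows are made up.

```python
def unique(rows, column):
    """True if no value in the column repeats."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values))


def not_null(rows, column):
    """True if the column has no missing values."""
    return all(r[column] is not None for r in rows)


def relationships(rows, column, parent_rows, parent_column):
    """True if every non-null value exists in the parent table."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return all(r[column] is None or r[column] in parent_keys for r in rows)


def accepted_values(rows, column, values):
    """True if every value belongs to the allowed list."""
    return all(r[column] in values for r in rows)


# Made-up sample data: order 3 points to a missing customer and has a bad status.
dim_customers = [{"customer_id": 10}, {"customer_id": 11}]
fct_orders = [
    {"order_id": 1, "customer_id": 10, "status": "placed"},
    {"order_id": 2, "customer_id": 11, "status": "shipped"},
    {"order_id": 3, "customer_id": 12, "status": "lost"},
]
allowed = ["placed", "shipped", "completed", "return_pending", "returned"]

results = {
    "order_id unique": unique(fct_orders, "order_id"),
    "order_id not_null": not_null(fct_orders, "order_id"),
    "customer_id relationships": relationships(
        fct_orders, "customer_id", dim_customers, "customer_id"
    ),
    "status accepted_values": accepted_values(fct_orders, "status", allowed),
}
print(results)
```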
{% macro rename_category(column_name) %}
case
    when {{ column_name }} ilike '%osx%' then 'osx'
    when {{ column_name }} ilike '%android%' then 'android'
    when {{ column_name }} ilike '%ios%' then 'ios'
    else 'other'
end as renamed_product
{% endmacro %}
And its use:
{% set column_name = 'product' %}
select
    product,
    {{ rename_category(column_name) }} -- macro call
from my_table
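One way to think about a macro is as a function that returns SQL text. The Python sketch below imitates rename_category under that reading, to show roughly what the query looks like once the macro call has been substituted. It is an analogy only and does not reproduce Jinja's exact whitespace handling.

```python
def rename_category(column_name):
    """Return the CASE expression that the macro would render for column_name."""
    patterns = ["osx", "android", "ios"]
    whens = "\n".join(
        f"    when {column_name} ilike '%{p}%' then '{p}'" for p in patterns
    )
    return f"case\n{whens}\n    else 'other'\nend as renamed_product"


# The macro call is replaced by the text it returns.
query = f"select\n    product,\n    {rename_category('product')}\nfrom my_table"
print(query)
```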
DBT comes with a package manager that lets users publish and reuse individual modules and macros.
This means you can install ready-made libraries and use them in your own project.
with source as (
    select * from {{ source('salesforce', 'users') }}
    where 1=1
    {% if target.name in ['dev', 'test', 'ci'] -%}
    -- extra condition is appended with "and", since "where 1=1" already opens the clause
    and timestamp >= dateadd(day, -3, current_date)
    {%- endif %}
)
This code says: for the dev, test, and ci environments, take only the last 3 days of data and nothing more. As a result, runs in these environments are much faster and need fewer resources. When running in the prod environment, the filter is skipped.
Materialization with alternate column encoding
Redshift is a columnar DBMS that lets you set a data compression algorithm for each individual column. Choosing optimal algorithms can reduce disk usage by 20-50%.
A website with documentation for your project
Connecting CI (Continuous Integration)
Conclusion
Preparing and consuming a DWH becomes as pleasant and beneficial as drinking a smoothie. DBT consists of Jinja, user extensions (modules), a compiler, an executor, and a package manager. Put these elements together and you get a complete working environment for your data warehouse. It is hard to name a better way to manage transformations inside a DWH today.
The beliefs held by the developers of DBT are formulated as follows:
Critical data infrastructure should be controlled by its user community as open source software
Not only analytics tools, but also analytics code will increasingly become the property of the open source community
These core beliefs have produced a product that is used by more than 850 companies today, and they form the foundation of many extensions that will be built in the future.
For those interested, there is a video of an open lesson I gave a few months ago at OTUS: Data Build Tool for Amazon Redshift Storage.
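The same branching can be written out in Python to show the two queries that result. The function build_source_query is hypothetical, and salesforce.users stands in for whatever relation the source() call resolves to in a real project.

```python
# Targets that should only see recent data, as in the model above.
LIMITED_TARGETS = {"dev", "test", "ci"}


def build_source_query(target_name):
    """Return the source query, with a 3-day filter for non-production targets."""
    lines = ["select * from salesforce.users", "where 1=1"]
    if target_name in LIMITED_TARGETS:
        lines.append("and timestamp >= dateadd(day, -3, current_date)")
    return "\n".join(lines)


dev_sql = build_source_query("dev")
prod_sql = build_source_query("prod")
print(dev_sql)
print(prod_sql)
```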