Cassandra. How not to die if you only know Oracle

Hello, Habr.

My name is Misha Butrimov, and I would like to tell you a little about Cassandra. My story will be useful to those who have never dealt with NoSQL databases: Cassandra has many implementation details and pitfalls that you need to know about. And if you have never seen anything other than Oracle or some other relational database, these things will save your life.

What is good about Cassandra? It is a NoSQL database designed without a single point of failure, and it scales well. If you need to add a couple of terabytes to some database, you simply add nodes to the ring. Extend it to another data center? Add nodes to the cluster. Increase the RPS it can handle? Add nodes to the cluster. It works in the opposite direction as well.

What else is it good at? Handling a large number of requests. But how many is "large"? 10, 20, 30, 40 thousand requests per second is not a lot. 100 thousand write requests per second is not a lot either. There are companies that say they sustain 2 million requests per second. We will probably have to take their word for it.

And fundamentally, Cassandra has one big difference from relational databases: it is nothing like them. This is very important to remember.

Not everything that looks the same works the same

A colleague once came to me and asked: "Here is CQL, the Cassandra query language. It has a select statement, it has where, it has and. I write a query and it does not work. Why?" Treating Cassandra like a relational database is the perfect way to end things violently. And I am not promoting that, it is prohibited in Russia. You will simply design something wrong.

For example, a customer comes to us and says: "Let's build a database for TV series, or a database for a recipe directory. It will hold dishes, or a list of TV series and the actors in them." We happily say: "Let's go!" Just send a couple of bytes, a couple of characters, and you are done; everything will work quickly and reliably. And everything is fine until the customers come back and say that housewives also solve the inverse problem: they have a list of products, and they want to know which dish they can cook from them. You are dead.

This is because Cassandra is a hybrid database: it is a key-value store and a wide-column store at the same time. In Java or Kotlin, it could be described like this:

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

That is, a map that contains a sorted map inside it. The key of the outer map is the Row key, also called the Partition key. The second key, which is the key of the already sorted inner map, is the Clustering key.
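The same shape can be sketched in Python, the language used for the illustrative snippets added to this article. This is only a toy model, not how Cassandra stores data: the `table` dict, the `put` helper and the series keys are all hypothetical names, and a real table would overwrite a row with the same clustering key instead of keeping both.

```python
from bisect import insort

# Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>>.
# Outer dict: partition key -> list of (clustering_key, value) pairs,
# kept sorted by clustering key on every insert.
table = {}

def put(partition_key, clustering_key, value):
    rows = table.setdefault(partition_key, [])
    insort(rows, (clustering_key, value))

# Rows arrive out of order but are stored in clustering-key order.
put("series:1", 2, "episode two")
put("series:1", 1, "episode one")
put("series:2", 1, "pilot")

first_partition = table["series:1"]
```

Looking up a partition is a plain dict access; reading rows inside it follows the clustering-key order without any extra sorting step.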

To illustrate how the database is distributed, let's draw three nodes. Now we need to understand how to spread the data across the nodes. If we shove everything into one node (there can be a thousand of them, by the way, or two thousand, or five, as many as you like), that is not really distribution. So we need a mathematical function that returns a number. Just a number, a long int that falls into some range. One node will be responsible for the first range, the second node for the second range, and the n-th node for the n-th range.
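A minimal sketch of that idea, assuming three nodes with equal ranges. MD5 is used here only as a convenient stand-in hash; Cassandra's own partitioners (Murmur3-based by default) and its virtual nodes work differently in detail. The names `NODES`, `token` and `node_for` are hypothetical.

```python
import hashlib
from bisect import bisect_right

NODES = ["node-1", "node-2", "node-3"]
RING_SIZE = 2 ** 64

# Upper bound (exclusive) of the token range owned by each node.
BOUNDARIES = [RING_SIZE * (i + 1) // len(NODES) for i in range(len(NODES))]

def token(key: str) -> int:
    """Map a key to a number in [0, RING_SIZE) using a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(key: str) -> str:
    """Pick the node whose range contains the key's token."""
    return NODES[bisect_right(BOUNDARIES, token(key))]
```

The same key always lands on the same node, and a large enough set of different keys spreads across all three.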

This number is obtained with a hash function applied to what we call the Partition key. That is the column specified in the Primary key directive, and it is the column that becomes the first and most basic key of the map. It determines which node receives which data. A table is created in Cassandra with almost the same syntax as in SQL:

CREATE TABLE users (
	user_id uuid,
	name text,
	year int,
	salary float,
	PRIMARY KEY(user_id)
);

The Primary key in this case consists of a single column, and that column is also the Partition key.

What happens to our users? Some go to one node, some to another, and some to the third. The result is an ordinary hash table, also known as a map, also known as a dictionary in Python: a simple key-value structure from which we can read all the values, and read and write by key.

Select: when allow filtering turns into a full scan, or how not to do it

Let's write a select statement: select * from users where user_id = ... . It turns out just like in Oracle: we write select, specify the conditions, and everything works, we get our users. But if you try to select, for example, users with a certain year of birth, Cassandra complains that it cannot execute the query. It knows nothing about how the data is distributed by year of birth: only one column is specified as the key. Then it says: "OK, I can still execute this query. Add allow filtering." We add the directive and everything works. And at this moment something terrible happens.

While we run on test data, everything is fine. But when the query runs in production, where we have, say, 4 million records, things do not go well for us. Allow filtering is a directive that lets Cassandra collect all the data of this table from all nodes and all data centers (if the cluster has several), and only then filter it. This is an analogue of a Full Scan, and hardly anyone is happy with that.
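The difference in the amount of work can be sketched with a toy cluster. Everything here is an assumption made for illustration: the three dict-based `nodes`, the modulo placement rule in `owner`, the integer IDs and the sample users. It only counts how many rows each kind of query has to look at.

```python
# Toy cluster: three nodes, each a dict of user_id -> row. The modulo
# placement rule is an illustration, not Cassandra's partitioner.
nodes = [{}, {}, {}]

def owner(user_id):
    return nodes[user_id % len(nodes)]

def insert(user_id, name, year):
    owner(user_id)[user_id] = {"user_id": user_id, "name": name, "year": year}

def select_by_id(user_id):
    """Lookup by partition key: one node, one row examined."""
    row = owner(user_id).get(user_id)
    return row, 1

def select_by_year_allow_filtering(year):
    """Filter on a non-key column: every row on every node is examined."""
    examined, matches = 0, []
    for node in nodes:
        for row in node.values():
            examined += 1
            if row["year"] == year:
                matches.append(row)
    return matches, examined

for uid, name, year in [(1, "Ann", 1990), (2, "Bob", 1985), (3, "Eve", 1990),
                        (4, "Dan", 1971), (5, "Joe", 1985)]:
    insert(uid, name, year)
```

With five rows the difference is invisible; with millions of rows spread over many nodes, the second function is the one that hurts.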

If we only needed users by ID, this would suit us. But sometimes we need to write other queries and put other restrictions on the selection. So let's recall: all of this is a map that has a Partition key, but inside it there is a sorted map.

And that inner map has a key too, which we call the Clustering key. This key, which in turn consists of the columns we choose, is what lets Cassandra understand how its data is physically sorted and how it will lie on each node. That is, for a given Partition key, the Clustering key tells Cassandra exactly how to push the data into this tree and which place it will take there.

It really is a tree: a comparator is simply called there, and we pass it a certain set of columns as an object. That set is likewise specified as a list of columns.

CREATE TABLE users_by_year_salary_id (
	user_id uuid,
	name text,
	year int,
	salary float,
	PRIMARY KEY((year), salary, user_id)
);

Pay attention to the Primary key directive: its first argument (in our case, the year) is always the Partition key. It can consist of one or several columns, it does not matter. If there are several columns, they must be wrapped in one more pair of parentheses so that the language preprocessor understands that together they form the Partition key, and that all the columns after them are the Clustering key. The clustering columns are passed to the comparator in the order in which they appear: the first column is the most significant, the second is less significant, and so on. It is much like writing equals (or rather a comparison) for data classes: we list the fields and define for them which is greater and which is smaller. In Cassandra these columns are, roughly speaking, the fields of a data class to which the comparison written for it will be applied.
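Python tuples already compare this way, field by field from left to right, so the ordering inside one partition can be sketched without writing a comparator by hand. The partition for year 1990, the `insert` helper and the sample rows are hypothetical, and short strings stand in for UUIDs.

```python
from bisect import insort

# One partition (year = 1990) of a table keyed like users_by_year_salary_id.
# Rows are ordered by the clustering key (salary, user_id): salary decides
# first, user_id only breaks ties between equal salaries.
partition_1990 = []

def insert(salary, user_id, name):
    insort(partition_1990, ((salary, user_id), name))

insert(3000.0, "b", "Bob")
insert(1500.0, "z", "Zoe")
insert(3000.0, "a", "Ann")

ordered_names = [name for _key, name in partition_1990]
```

Zoe comes first because her salary is lowest; Ann precedes Bob only because their salaries are equal and "a" sorts before "b".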

We set up sorting and impose restrictions

You need to remember that the sort order (descending, ascending, whatever) is set at the same moment the key is created, and it cannot be changed later. It physically determines how the data is sorted and how it is stored. If you need to change the Clustering key or the sort order, you will have to create a new table and move the data into it. It cannot be done with an existing table.

We filled our table with users and saw that they landed on the ring first by year of birth, and then, inside each node, by salary and user ID. Now we can select by imposing restrictions.

Our familiar where and and work again, we get our users, and everything is fine once more. But if we try to use only part of the Clustering key, and a less significant part at that, Cassandra immediately complains that it cannot find the place in our map where this object lies, because the more significant fields are null for the comparator and only this one is set. It would have to pull all the data from the node again and filter it. That is an analogue of a Full Scan within a node, and that is bad.
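Why the leading column matters can be sketched on one sorted partition. This is a toy model under stated assumptions: `rows` is a hypothetical, already sorted list of (salary, user_id) clustering keys, and binary search stands in for whatever Cassandra does internally.

```python
from bisect import bisect_left, bisect_right

# Clustering keys (salary, user_id) of one partition, already sorted.
rows = [(1000.0, "u3"), (1500.0, "u1"), (1500.0, "u7"),
        (2000.0, "u2"), (3000.0, "u5")]
salaries = [salary for salary, _user_id in rows]

def by_salary_range(low, high):
    """Restricting the leading column: two binary searches, one contiguous slice."""
    return rows[bisect_left(salaries, low):bisect_right(salaries, high)]

def by_user_id_only(user_id):
    """Skipping salary: the sort order does not help, every row is checked."""
    return [row for row in rows if row[1] == user_id]
```

A restriction on salary maps to a contiguous slice of the sorted data; a restriction on user_id alone can match rows anywhere, so the whole partition has to be walked.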

In any unclear situation, create a new table

If we want to be able to look up users by ID, or by age, or by salary, what should we do? Nothing special. Just use two tables. If you need to reach users in three different ways, there will be three tables. Gone are the days when we saved space on disks. Disk space is the cheapest resource. It costs far less than response time, which can hurt the user. It is much nicer for the user to get something in a second than in 10 minutes.
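A sketch of what "one table per access path" means for the writing side, assuming the application itself writes every copy. The three dict-based tables and the `save_user` helper are hypothetical names for illustration.

```python
from collections import defaultdict

# One "table" per way of reaching users; each holds its own copy of the row.
users_by_id = {}
users_by_year = defaultdict(list)
users_by_salary = defaultdict(list)

def save_user(user_id, name, year, salary):
    row = {"user_id": user_id, "name": name, "year": year, "salary": salary}
    # The same data is written three times, once per query pattern.
    users_by_id[user_id] = dict(row)
    users_by_year[year].append(dict(row))
    users_by_salary[salary].append(dict(row))

save_user("u1", "Ann", 1990, 3000.0)
save_user("u2", "Bob", 1990, 1500.0)
```

Each read then becomes a single key lookup in the table built for it, at the cost of storing every row three times.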

We trade extra space and denormalized data for the ability to scale well and work reliably. After all, a cluster that consists of three data centers with five nodes each, with an acceptable level of data safety (when nothing is lost), can survive the complete death of one data center, plus two more nodes in each of the remaining two. Only after that do the problems begin. That is pretty good redundancy, and it is worth a couple of extra SSD drives and processors. So, in order to use Cassandra, which is not SQL at all and has no relations or foreign keys, you need to know a few simple rules.

We design everything starting from the query. The main thing is not the data, but how the application is going to work with it. If it needs to receive different data in different ways, or the same data in different ways, we must lay the data out in the way that is convenient for the application. Otherwise we will end up in a Full Scan and Cassandra will give us no advantage.

Denormalizing data is the norm. We forget about normal forms, we no longer have a relational database. If we write something down 100 times, it will be stored 100 times. That is still cheaper than stopping.

We choose partition keys so that the data is distributed evenly. We do not want the hashes of our keys to fall into one narrow range. In that sense, the year of birth in the example above is a bad example. More precisely, it is fine if our users are spread evenly across years of birth, and bad if we are talking about 5th-grade students: there the partitioning will not be very good.
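One rough way to check a candidate partition key before committing to it is to measure how much of the data would end up in the largest partition. The helper `largest_share` and both sample data sets are made up for illustration; the birth years are not taken from any real system.

```python
from collections import Counter

def largest_share(partition_keys):
    """Fraction of all rows that would land in the single largest partition."""
    sizes = Counter(partition_keys)
    return max(sizes.values()) / len(partition_keys)

# Hypothetical users spread evenly over 40 birth years: 100 users per year.
spread_out = [1960 + i % 40 for i in range(4000)]

# Hypothetical 5th-grade students: almost everyone was born in the same year.
fifth_graders = [2013] * 3600 + [2012] * 400

even_share = largest_share(spread_out)
skewed_share = largest_share(fifth_graders)
```

In the first data set no partition holds more than a few percent of the rows; in the second, a single partition holds nine rows out of ten, so one range of the ring carries nearly all of the load.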

Sorting is chosen once, at the moment the Clustering key is created. If it has to be changed, we will have to reload our table under a different key.

And most importantly: if we need to retrieve the same data in 100 different ways, then we will have 100 different tables.

Source: www.hab.com
