The tidyr package is part of the core of one of the most popular libraries in the R language: tidyverse.
The main purpose of the package is to bring data into a tidy form.
An article on it is already available on Habr.
S.J.K.: Will gather() and spread() be deprecated?
Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.
Contents
- The TidyData concept
- Main functions in the tidyr package
- A new concept for converting data from wide to long format and vice versa
- Installing the latest version of tidyr 0.8.3.9000
- Migrating to the new functions
- A simple example of converting data from wide to long format
- Specifications
- Specification using multiple values (.value)
- Converting data frames from long to wide format
- Several advanced examples of working with the new tidyr concept
- Conclusion
The TidyData concept
The goal of tidyr is to help you bring data into a so-called tidy form. Tidy data is data where:
- Each variable is in a column.
- Each observation is a row.
- Each value is a cell.
Data presented in tidy form is much simpler and more convenient to work with during analysis.
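To make the three rules concrete, here is a minimal sketch with made-up numbers (the tables and their column names are illustrative assumptions, not data from this article). The first table hides the year variable in its column names; the second is the tidy equivalent.

```r
library(tibble)

# Untidy: the year variable is hidden in the column names.
untidy <- tribble(
  ~country, ~`2019`, ~`2020`,
  "A",      10,      12,
  "B",      20,      25
)

# Tidy: one column per variable (country, year, cases),
# one row per observation, one value per cell.
tidy <- tribble(
  ~country, ~year, ~cases,
  "A",      2019,  10,
  "A",      2020,  12,
  "B",      2019,  20,
  "B",      2020,  25
)
```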
Main functions in the tidyr package
tidyr contains a set of functions designed for transforming tables:
- fill(): fills missing values in a column with the previous values;
- separate(): splits one field into several using a separator;
- unite(): combines several fields into one, the inverse of separate();
- pivot_longer(): converts data from wide format to long format;
- pivot_wider(): converts data from long format to wide format, the inverse of pivot_longer();
- gather() (deprecated): converts data from wide format to long format;
- spread() (deprecated): converts data from long format to wide format, the inverse of gather().
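The first three helpers are not demonstrated elsewhere in this article, so here is a minimal sketch of them on a made-up table (the raw data frame and its columns are assumptions chosen for illustration).

```r
library(tidyr)
library(tibble)

raw <- tibble(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  sales  = c(10, 12, 7, 9)
)

# fill(): carry the last non-missing region down the column.
filled <- fill(raw, region)

# separate(): split period into year and quarter at the "-" separator.
parts <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): the inverse operation, gluing the two fields back together.
joined <- unite(parts, "period", year, quarter, sep = "-")
```

After these steps joined has the same columns as raw, with the gaps in region filled in.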
A new concept for converting data from wide to long format and vice versa
Previously, the functions gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the package author, their names and arguments were not obvious. This made them hard to find, and it was hard to remember which one converts a data frame from wide to long format and which does the opposite.
For this reason, two new important functions for transforming data frames were added to tidyr.
The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.
Installing the latest version of tidyr 0.8.3.9000
To install the newest version of the tidyr package, 0.8.3.9000, which contains the new functions, use the following code.
devtools::install_github("tidyverse/tidyr")
At the time of writing, these functions are available only in the dev version of the package on GitHub.
Migrating to the new functions
In fact, porting old scripts to the new functions is not hard. To make this clearer, I will take an example from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.
Converting wide format to long format.
Example code from the gather() function documentation
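# example
library(tidyr)  # provides gather(), spread(), pivot_longer(), pivot_wider()
library(dplyr)  # provides the %>% pipe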
stocks <- data.frame(
time = as.Date('2009-01-01') + 0:9,
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2),
Z = rnorm(10, 0, 4)
)
# old
stocks_gather <- stocks %>% gather(key = stock,
value = price,
-time)
# new
stocks_long <- stocks %>% pivot_longer(cols = -time,
names_to = "stock",
values_to = "price")
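One detail worth checking when migrating is row order. The sketch below uses a shortened, seeded copy of the stocks data (the three-row size and the seed are assumptions made for reproducibility): gather() stacks whole columns, pivot_longer() goes row by row, and after sorting both hold the same values.

```r
library(tidyr)
library(dplyr)

set.seed(1)
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:2,
  X = rnorm(3), Y = rnorm(3), Z = rnorm(3)
)

old <- stocks %>% gather(key = stock, value = price, -time)
new <- stocks %>% pivot_longer(cols = -time, names_to = "stock", values_to = "price")

# gather() stacks whole columns (all X, then Y, then Z);
# pivot_longer() walks row by row (X, Y, Z for the first date, and so on).
old$stock[1:3]
new$stock[1:3]

# After sorting, both results hold the same values.
same <- all.equal(
  as.data.frame(arrange(old, time, stock)),
  as.data.frame(arrange(new, time, stock)),
  check.attributes = FALSE
)
```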
Converting long format to wide format.
Example code from the spread() function documentation
# old
stocks_spread <- stocks_gather %>% spread(key = stock,
value = price)
# new
stock_wide <- stocks_long %>% pivot_wider(names_from = "stock",
values_from = "price")
Because the columns named in the names_to and values_to arguments in the pivot_longer() and pivot_wider() examples above do not exist in the source table stocks, their names must be given in quotation marks.
[Table: how to switch to working with the new tidyr concept]
Note from the author
All the text below is an adaptation, I would even say a free translation, of the vignette from the official tidyverse website.
A simple example of converting data from wide to long format
pivot_longer() makes data longer by reducing the number of columns and increasing the number of rows.
To run the examples in this article, you first need to load the required packages:
library(tidyr)
library(dplyr)
library(readr)
Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:
#> # A tibble: 18 x 11
#> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Agnostic 27 34 60 81 76 137
#> 2 Atheist 12 27 37 52 35 70
#> 3 Buddhist 27 21 30 34 33 58
#> 4 Catholic 418 617 732 670 638 1116
#> 5 Don’t k… 15 14 15 11 10 35
#> 6 Evangel… 575 869 1064 982 881 1486
#> 7 Hindu 1 9 7 9 11 34
#> 8 Histori… 228 244 236 238 197 223
#> 9 Jehovah… 20 27 24 24 21 30
#> 10 Jewish 19 19 25 25 30 95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> # `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>
This table holds the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell values at the intersection of religion and income level. To bring the table into a tidy, correct format, it is enough to use pivot_longer():
pew %>%
pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#> religion income count
#> <chr> <chr> <dbl>
#> 1 Agnostic <$10k 27
#> 2 Agnostic $10-20k 34
#> 3 Agnostic $20-30k 60
#> 4 Agnostic $30-40k 81
#> 5 Agnostic $40-50k 76
#> 6 Agnostic $50-75k 137
#> 7 Agnostic $75-100k 122
#> 8 Agnostic $100-150k 109
#> 9 Agnostic >150k 84
#> 10 Agnostic Don't know/refused 96
#> # … with 170 more rows
Arguments of the pivot_longer() function:
- The first argument, cols, describes which columns need to be merged. In this case, all columns except religion.
- The names_to argument gives the name of the variable that will be created from the names of the merged columns.
- values_to gives the name of the variable that will be created from the data stored in the cell values of the merged columns.
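The three arguments can be tried on a small hand-typed table. This is a minimal sketch: the survey table below is an assumption that copies only two rows and two income columns from the output shown earlier, since the pew object itself is not constructed in this article.

```r
library(tidyr)
library(tibble)

survey <- tribble(
  ~religion,  ~`<$10k`, ~`$10-20k`,
  "Agnostic", 27,       34,
  "Atheist",  12,       27
)

long <- pivot_longer(
  survey,
  cols      = -religion,  # every column except religion is pivoted
  names_to  = "income",   # the old column names land here
  values_to = "count"     # the old cell values land here
)
```

Each of the two source rows becomes two rows, one per income bracket.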
Specifications
This is new functionality in the tidyr package that was not available with the legacy functions.
A specification is a data frame in which each row corresponds to one column of the new output data frame, with two special columns whose names start with a dot:
- .name contains the original column name.
- .value contains the name of the column that will hold the cell values.
The remaining columns of the specification describe how the new data frame will represent the names of the columns collapsed from .name.
The specification describes the metadata stored in a column name, with one row for each column and one column for each variable, combined with the column name. This definition may seem confusing right now, but after a few examples it will become much clearer.
The point of the specification is that you can retrieve, modify, and define new metadata for the data frame being transformed.
To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.
This function takes any data frame and generates its metadata in the manner described above.
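A minimal sketch of what such a specification looks like for a tiny made-up table. Note the assumption: this sketch targets released tidyr (1.0.0 and later), where the spec generator is called build_longer_spec(); in the development version described in this article the same role is played by pivot_longer_spec(). The wide table and the key and amount names are illustrative.

```r
library(tidyr)
library(tibble)

wide <- tibble(id = 1:2, x_2019 = c(1, 2), x_2020 = c(3, 4))

# One row per pivoted column: .name is the source column,
# .value is the output column that will receive its cell values,
# and key holds the metadata taken from the column name.
spec <- build_longer_spec(wide, cols = -id, names_to = "key", values_to = "amount")
spec
```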
As an example, let's take the who dataset supplied with the tidyr package. This dataset contains information provided by the World Health Organization on tuberculosis cases.
who
#> # A tibble: 7,240 x 60
#> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534
#> <chr> <chr> <chr> <int> <int> <int> <int>
#> 1 Afghan… AF AFG 1980 NA NA NA
#> 2 Afghan… AF AFG 1981 NA NA NA
#> 3 Afghan… AF AFG 1982 NA NA NA
#> 4 Afghan… AF AFG 1983 NA NA NA
#> 5 Afghan… AF AFG 1984 NA NA NA
#> 6 Afghan… AF AFG 1985 NA NA NA
#> 7 Afghan… AF AFG 1986 NA NA NA
#> 8 Afghan… AF AFG 1987 NA NA NA
#> 9 Afghan… AF AFG 1988 NA NA NA
#> 10 Afghan… AF AFG 1989 NA NA NA
#> # … with 7,230 more rows, and 53 more variables
Let's build its specification.
spec <- who %>%
pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")
#> # A tibble: 56 x 3
#> .name .value name
#> <chr> <chr> <chr>
#> 1 new_sp_m014 count new_sp_m014
#> 2 new_sp_m1524 count new_sp_m1524
#> 3 new_sp_m2534 count new_sp_m2534
#> 4 new_sp_m3544 count new_sp_m3544
#> 5 new_sp_m4554 count new_sp_m4554
#> 6 new_sp_m5564 count new_sp_m5564
#> 7 new_sp_m65 count new_sp_m65
#> 8 new_sp_f014 count new_sp_f014
#> 9 new_sp_f1524 count new_sp_f1524
#> 10 new_sp_f2534 count new_sp_f2534
#> # … with 46 more rows
The fields country, iso2, and iso3 are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.
The names of these columns store the following information:
- The prefix new_ indicates that the column contains data on new tuberculosis cases. The current data frame contains only new cases, so this prefix carries no meaning in the present context.
- sp/rel/sn/ep describes the method of diagnosis.
- m/f is the patient's gender.
- 014/1524/2534/3544/4554/5564/65 is the patient's age range.
We can split these columns using the extract() function with a regular expression.
spec <- spec %>%
extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")
#> # A tibble: 56 x 5
#> .name .value diagnosis gender age
#> <chr> <chr> <chr> <chr> <chr>
#> 1 new_sp_m014 count sp m 014
#> 2 new_sp_m1524 count sp m 1524
#> 3 new_sp_m2534 count sp m 2534
#> 4 new_sp_m3544 count sp m 3544
#> 5 new_sp_m4554 count sp m 4554
#> 6 new_sp_m5564 count sp m 5564
#> 7 new_sp_m65 count sp m 65
#> 8 new_sp_f014 count sp f 014
#> 9 new_sp_f1524 count sp f 1524
#> 10 new_sp_f2534 count sp f 2534
#> # … with 46 more rows
Note that the .name column must remain unchanged, since it is our index into the column names of the original dataset.
Gender and age (the gender and age columns) have fixed and known values, so it is recommended to convert these columns to factors:
spec <- spec %>%
mutate(
gender = factor(gender, levels = c("f", "m")),
age = factor(age, levels = unique(age), ordered = TRUE)
)
Finally, to apply the specification we created to the original who data frame, we need to use the spec argument of the pivot_longer() function.
who %>% pivot_longer(spec = spec)
#> # A tibble: 405,440 x 8
#> country iso2 iso3 year diagnosis gender age count
#> <chr> <chr> <chr> <int> <chr> <fct> <ord> <int>
#> 1 Afghanistan AF AFG 1980 sp m 014 NA
#> 2 Afghanistan AF AFG 1980 sp m 1524 NA
#> 3 Afghanistan AF AFG 1980 sp m 2534 NA
#> 4 Afghanistan AF AFG 1980 sp m 3544 NA
#> 5 Afghanistan AF AFG 1980 sp m 4554 NA
#> 6 Afghanistan AF AFG 1980 sp m 5564 NA
#> 7 Afghanistan AF AFG 1980 sp m 65 NA
#> 8 Afghanistan AF AFG 1980 sp f 014 NA
#> 9 Afghanistan AF AFG 1980 sp f 1524 NA
#> 10 Afghanistan AF AFG 1980 sp f 2534 NA
#> # … with 405,430 more rows
[Diagram: schematic summary of the steps we have just performed]
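The steps above can also be sketched as a single pipeline. This is a hedged sketch, assuming released tidyr (1.0.0 and later): there the specification is generated with build_longer_spec() and applied with pivot_longer_spec(data, spec), rather than with the spec argument of pivot_longer() used in the development version described here. The name who_long is illustrative.

```r
library(tidyr)
library(dplyr)

# 1. Generate the spec, 2. split the metadata out of the column names,
# 3. turn gender and age into factors.
spec <- who %>%
  build_longer_spec(new_sp_m014:newrel_f65, values_to = "count") %>%
  extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)") %>%
  mutate(
    gender = factor(gender, levels = c("f", "m")),
    age    = factor(age, levels = unique(age), ordered = TRUE)
  )

# 4. Apply the spec to the original data frame.
who_long <- pivot_longer_spec(who, spec)
```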
Specification using multiple values (.value)
In the example above, the .value column of the specification contained only one value, and in most cases that is how it is.
Occasionally, though, you need to collect data from columns whose values have different data types. With the legacy spread() function this would be quite hard to do.
The example below is borrowed from another source.
Let's create a training data frame.
family <- tibble::tribble(
~family, ~dob_child1, ~dob_child2, ~gender_child1, ~gender_child2,
1L, "1998-11-26", "2000-01-29", 1L, 2L,
2L, "1996-06-22", NA, 2L, NA,
3L, "2002-07-11", "2004-04-05", 2L, 2L,
4L, "2004-10-10", "2009-08-27", 1L, 1L,
5L, "2000-12-05", "2005-02-28", 2L, 1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)
#> # A tibble: 5 x 5
#> family dob_child1 dob_child2 gender_child1 gender_child2
#> <int> <date> <date> <int> <int>
#> 1 1 1998-11-26 2000-01-29 1 2
#> 2 2 1996-06-22 NA 2 NA
#> 3 3 2002-07-11 2004-04-05 2 2
#> 4 4 2004-10-10 2009-08-27 1 1
#> 5 5 2000-12-05 2005-02-28 2 1
Each row of the resulting data frame holds data on the children of one family. Families can have one or two children. For each child we have the date of birth and the gender, and each child's data sits in separate columns. Our task is to bring this data into the correct format for analysis.
Note that we have two variables with information about each child: gender and date of birth (columns with the dob prefix contain the date of birth, columns with the gender prefix contain the child's gender). In the expected result they should appear in separate columns. We can do this by generating a specification in which the .value column has two different values.
spec <- family %>%
pivot_longer_spec(-family) %>%
separate(col = name, into = c(".value", "child"))%>%
mutate(child = parse_number(child))
#> # A tibble: 4 x 3
#> .name .value child
#> <chr> <chr> <dbl>
#> 1 dob_child1 dob 1
#> 2 dob_child2 dob 2
#> 3 gender_child1 gender 1
#> 4 gender_child2 gender 2
So, let's go step by step through what the code above does.
- pivot_longer_spec(-family): create a specification that collapses all existing columns except the family column.
- separate(col = name, into = c(".value", "child")): split the name column, which contains the names of the source fields, at the underscore, and put the resulting values into the .value and child columns.
- mutate(child = parse_number(child)): convert the values of the child field from text to a numeric type.
Now we can apply the resulting specification to the original data frame and bring the table into the desired form.
family %>%
pivot_longer(spec = spec, na.rm = TRUE)
#> # A tibble: 9 x 4
#> family child dob gender
#> <int> <dbl> <date> <int>
#> 1 1 1 1998-11-26 1
#> 2 1 2 2000-01-29 2
#> 3 2 1 1996-06-22 2
#> 4 3 1 2002-07-11 2
#> 5 3 2 2004-04-05 2
#> 6 4 1 2004-10-10 1
#> 7 4 2 2009-08-27 1
#> 8 5 1 2000-12-05 2
#> 9 5 2 2005-02-28 1
We use the na.rm = TRUE argument because the current shape of the data forces the creation of extra rows for observations that do not exist. Because family 2 has only one child, na.rm = TRUE guarantees that family 2 will have a single row in the output.
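For readers on released tidyr (1.0.0 and later), a hedged sketch of the same reshaping without an explicit spec. The assumptions here: the special ".value" entry in names_to and the values_drop_na argument (the released counterpart of na.rm) come from the released API, not from the development version described in this article, and only the first two families are typed in, with dates left as plain strings.

```r
library(tidyr)
library(tibble)

family <- tribble(
  ~family, ~dob_child1,  ~dob_child2,  ~gender_child1, ~gender_child2,
  1L,      "1998-11-26", "2000-01-29", 1L,             2L,
  2L,      "1996-06-22", NA,           2L,             NA
)

children <- pivot_longer(
  family,
  cols           = -family,
  names_to       = c(".value", "child"),  # first name part becomes the output column
  names_sep      = "_",
  values_drop_na = TRUE                   # drop the non-existent second child of family 2
)
```

Family 1 contributes two rows and family 2 only one.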
Converting data frames from long to wide format
pivot_wider() is the inverse transformation: it increases the number of columns of the data frame by reducing the number of rows.
This kind of transformation is rarely used to bring data into tidy form. It can, however, be useful for creating pivot tables for presentations, or for integration with other tools.
In fact, pivot_longer() and pivot_wider() are symmetrical and perform opposite operations. That is, df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) both return the original df.
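The round trip can be checked on a toy table. A minimal sketch, assuming released tidyr (1.0.0 and later), where a spec is applied with pivot_longer_spec() and pivot_wider_spec() instead of the spec argument shown above; the df table is made up for the check.

```r
library(tidyr)
library(tibble)

df <- tibble(id = 1:2, x = c(1, 2), y = c(3, 4))

# The same spec drives both directions.
spec <- build_longer_spec(df, cols = c(x, y))

long <- pivot_longer_spec(df, spec)    # 4 rows: id, name, value
back <- pivot_wider_spec(long, spec)   # 2 rows: id, x, y again
```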
A simple example of converting a table to wide format
To demonstrate how the pivot_wider() function works, we will use the fish_encounters dataset, which stores information about how various stations record the movement of fish along a river.
#> # A tibble: 114 x 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # … with 104 more rows
In most cases, this table will be more informative and easier to use if the information for each station is presented in a separate column.
fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA
#> # … with 9 more rows, and 1 more variable: MAW <int>
This dataset only records information when a fish was detected by a station. If a fish was not recorded by some station, that data is simply absent from the table, which means the output will be filled with NA.
In this case, however, we know that the absence of a record means the fish was not seen, so we can use the values_fill argument of the pivot_wider() function and fill these missing values with zeros:
fish_encounters %>% pivot_wider(
names_from = station,
values_from = seen,
values_fill = list(seen = 0)
)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 0 0 0 0 0
#> 5 4847 1 1 1 0 0 0 0 0 0 0
#> 6 4848 1 1 1 1 0 0 0 0 0 0
#> 7 4849 1 1 0 0 0 0 0 0 0 0
#> 8 4850 1 1 0 1 1 1 1 0 0 0
#> 9 4851 1 1 0 0 0 0 0 0 0 0
#> 10 4854 1 1 0 0 0 0 0 0 0 0
#> # … with 9 more rows, and 1 more variable: MAW <int>
Generating column names from multiple source variables
Imagine we have a table containing combinations of product, country, and year. To generate a test data frame, you can run the following code:
df <- expand_grid(
product = c("A", "B"),
country = c("AI", "EI"),
year = 2000:2014
) %>%
filter((product == "A" & country == "AI") | product == "B") %>%
mutate(value = rnorm(nrow(.)))
#> # A tibble: 45 x 4
#> product country year value
#> <chr> <chr> <int> <dbl>
#> 1 A AI 2000 -2.05
#> 2 A AI 2001 -0.676
#> 3 A AI 2002 1.60
#> 4 A AI 2003 -0.353
#> 5 A AI 2004 -0.00530
#> 6 A AI 2005 0.442
#> 7 A AI 2006 -0.610
#> 8 A AI 2007 -2.77
#> 9 A AI 2008 0.899
#> 10 A AI 2009 -0.106
#> # … with 35 more rows
Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, simply pass to the names_from argument a vector containing the names of the fields to be combined.
df %>% pivot_wider(names_from = c(product, country),
values_from = "value")
#> # A tibble: 15 x 4
#> year A_AI B_AI B_EI
#> <int> <dbl> <dbl> <dbl>
#> 1 2000 -2.05 0.607 1.20
#> 2 2001 -0.676 1.65 -0.114
#> 3 2002 1.60 -0.0245 0.501
#> 4 2003 -0.353 1.30 -0.459
#> 5 2004 -0.00530 0.921 -0.0589
#> 6 2005 0.442 -1.55 0.594
#> 7 2006 -0.610 0.380 -1.28
#> 8 2007 -2.77 0.830 0.637
#> 9 2008 0.899 0.0175 -1.30
#> 10 2009 -0.106 -0.195 1.03
#> # … with 5 more rows
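The combined column names are joined with an underscore by default. A small sketch, assuming released tidyr, where pivot_wider() has a names_sep argument for choosing a different separator; the shortened deterministic df below stands in for the random data above.

```r
library(tidyr)
library(dplyr)

df <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year    = 2000:2001
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = seq_len(n()))

# Join product and country with "." instead of the default "_".
wide <- df %>%
  pivot_wider(
    names_from  = c(product, country),
    values_from = value,
    names_sep   = "."
  )
```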
You can also apply specifications to the pivot_wider() function. When passed to pivot_wider(), however, the specification performs the opposite transformation to pivot_longer(): the columns named in .name are created using the values from .value and the other columns.
For this dataset, you can generate a custom specification if you want every possible combination of country and product to have its own column, not just those present in the data:
spec <- df %>%
expand(product, country, .value = "value") %>%
unite(".name", product, country, remove = FALSE)
#> # A tibble: 4 x 4
#> .name product country .value
#> <chr> <chr> <chr> <chr>
#> 1 A_AI A AI value
#> 2 A_EI A EI value
#> 3 B_AI B AI value
#> 4 B_EI B EI value
df %>% pivot_wider(spec = spec) %>% head()
#> # A tibble: 6 x 5
#> year A_AI A_EI B_AI B_EI
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 -2.05 NA 0.607 1.20
#> 2 2001 -0.676 NA 1.65 -0.114
#> 3 2002 1.60 NA -0.0245 0.501
#> 4 2003 -0.353 NA 1.30 -0.459
#> 5 2004 -0.00530 NA 0.921 -0.0589
#> 6 2005 0.442 NA -1.55 0.594
Several advanced examples of working with the new tidyr concept
Tidying data, using the US Census income and rent dataset as an example.
The us_rent_income dataset contains median income and rent information for each US state for 2017 (the dataset is available in the tidycensus package).
us_rent_income
#> # A tibble: 104 x 5
#> GEOID NAME variable estimate moe
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 01 Alabama income 24476 136
#> 2 01 Alabama rent 747 3
#> 3 02 Alaska income 32940 508
#> 4 02 Alaska rent 1200 13
#> 5 04 Arizona income 27517 148
#> 6 04 Arizona rent 972 4
#> 7 05 Arkansas income 23789 165
#> 8 05 Arkansas rent 709 5
#> 9 06 California income 29454 109
#> 10 06 California rent 1358 3
#> # … with 94 more rows
In the form in which the data is stored in the us_rent_income dataset, it is extremely inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income, income_moe. There are many ways to create this specification, but the key point is that we need to generate every combination of the values of variable and estimate/moe, and then generate the column name.
spec <- us_rent_income %>%
expand(variable, .value = c("estimate", "moe")) %>%
mutate(
.name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
)
#> # A tibble: 4 x 3
#> variable .value .name
#> <chr> <chr> <chr>
#> 1 income estimate income
#> 2 income moe income_moe
#> 3 rent estimate rent
#> 4 rent moe rent_moe
Supplying this specification to pivot_wider() gives us the result we are looking for:
us_rent_income %>% pivot_wider(spec = spec)
#> # A tibble: 52 x 6
#> GEOID NAME income income_moe rent rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Columbia 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
#> # … with 42 more rows
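For comparison, a hedged sketch of what happens without a custom specification, assuming released tidyr (1.0.0 and later), which bundles the us_rent_income dataset. Passing both value columns to values_from works, but the generated names are prefixed with the value column, which is why the spec above is used to get income, income_moe, rent, and rent_moe instead. The name plain is illustrative.

```r
library(tidyr)

# Default naming: <value column>_<variable>, e.g. estimate_income, moe_rent.
plain <- pivot_wider(
  us_rent_income,
  names_from  = variable,
  values_from = c(estimate, moe)
)

names(plain)
```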
The World Bank
Sometimes bringing a dataset into the desired form requires several steps.
The world_bank_pop dataset contains World Bank data on the population of each country between 2000 and 2018.
#> # A tibble: 1,056 x 20
#> country indicator `2000` `2001` `2002` `2003` `2004` `2005` `2006`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ABW SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4 4.49e+4
#> 2 ABW SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#> 3 ABW SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5 1.01e+5
#> 4 ABW SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0 7.98e-1
#> 5 AFG SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6 5.93e+6
#> 6 AFG SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0 4.12e+0
#> 7 AFG SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7 2.59e+7
#> 8 AFG SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0 3.23e+0
#> 9 AGO SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7 1.15e+7
#> 10 AGO SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0 4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> # `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> # `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
Our goal is to create a tidy dataset with each variable in its own column. It is not entirely clear which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.
To fix this, use the pivot_longer() function.
pop2 <- world_bank_pop %>%
pivot_longer(`2000`:`2017`, names_to = "year")
#> # A tibble: 19,008 x 4
#> country indicator year value
#> <chr> <chr> <chr> <dbl>
#> 1 ABW SP.URB.TOTL 2000 42444
#> 2 ABW SP.URB.TOTL 2001 43048
#> 3 ABW SP.URB.TOTL 2002 43670
#> 4 ABW SP.URB.TOTL 2003 44246
#> 5 ABW SP.URB.TOTL 2004 44669
#> 6 ABW SP.URB.TOTL 2005 44889
#> 7 ABW SP.URB.TOTL 2006 44881
#> 8 ABW SP.URB.TOTL 2007 44686
#> 9 ABW SP.URB.TOTL 2008 44375
#> 10 ABW SP.URB.TOTL 2009 44052
#> # … with 18,998 more rows
The next step is to look at the indicator variable.
pop2 %>% count(indicator)
#> # A tibble: 4 x 2
#> indicator n
#> <chr> <int>
#> 1 SP.POP.GROW 4752
#> 2 SP.POP.TOTL 4752
#> 3 SP.URB.GROW 4752
#> 4 SP.URB.TOTL 4752
Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* is the same, but only for urban areas. Let's split these values into two variables: area (total or urban) and a variable holding the actual data (population or growth):
pop3 <- pop2 %>%
separate(indicator, c(NA, "area", "variable"))
#> # A tibble: 19,008 x 5
#> country area variable year value
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 ABW URB TOTL 2000 42444
#> 2 ABW URB TOTL 2001 43048
#> 3 ABW URB TOTL 2002 43670
#> 4 ABW URB TOTL 2003 44246
#> 5 ABW URB TOTL 2004 44669
#> 6 ABW URB TOTL 2005 44889
#> 7 ABW URB TOTL 2006 44881
#> 8 ABW URB TOTL 2007 44686
#> 9 ABW URB TOTL 2008 44375
#> 10 ABW URB TOTL 2009 44052
#> # … with 18,998 more rows
Now all we have to do is split the variable into two columns:
pop3 %>%
pivot_wider(names_from = variable, values_from = value)
#> # A tibble: 9,504 x 5
#> country area year TOTL GROW
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 ABW URB 2000 42444 1.18
#> 2 ABW URB 2001 43048 1.41
#> 3 ABW URB 2002 43670 1.43
#> 4 ABW URB 2003 44246 1.31
#> 5 ABW URB 2004 44669 0.951
#> 6 ABW URB 2005 44889 0.491
#> 7 ABW URB 2006 44881 -0.0178
#> 8 ABW URB 2007 44686 -0.435
#> 9 ABW URB 2008 44375 -0.698
#> 10 ABW URB 2009 44052 -0.731
#> # … with 9,494 more rows
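The three steps can also be chained into one pipeline. A minimal sketch, assuming released tidyr with the bundled world_bank_pop dataset; the final conversion of year to integer is an extra step added here for illustration and is not part of the walkthrough above, and pop_tidy is an illustrative name.

```r
library(tidyr)
library(dplyr)

pop_tidy <- world_bank_pop %>%
  # 1. Years move from column names into a year column.
  pivot_longer(`2000`:`2017`, names_to = "year", values_to = "value") %>%
  # 2. "SP.URB.TOTL" splits into area = "URB" and variable = "TOTL"; "SP" is dropped.
  separate(indicator, into = c(NA, "area", "variable")) %>%
  # 3. TOTL and GROW become columns of their own.
  pivot_wider(names_from = variable, values_from = value) %>%
  # Extra step (assumption): store year as a number rather than text.
  mutate(year = as.integer(year))
```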
Contact list
One last example: imagine you have a contact list that you copied and pasted from a website:
contacts <- tribble(
~field, ~value,
"name", "Jiena McLellan",
"company", "Toyota",
"name", "John Smith",
"company", "google",
"email", "[email protected]",
"name", "Huxley Ratcliffe"
)
Tabulating this list is hard because there is no variable that identifies which data belongs to which contact. We can fix this by noting that each new contact's data starts with "name", so we can create a unique identifier and increment it by one each time the field column contains the value "name":
contacts <- contacts %>%
mutate(
person_id = cumsum(field == "name")
)
contacts
#> # A tibble: 6 x 3
#> field value person_id
#> <chr> <chr> <int>
#> 1 name Jiena McLellan 1
#> 2 company Toyota 1
#> 3 name John Smith 2
#> 4 company google 2
#> 5 email [email protected] 2
#> 6 name Huxley Ratcliffe 3
Now that we have a unique ID for each contact, we can turn field and value into columns:
contacts %>%
pivot_wider(names_from = field, values_from = value)
#> # A tibble: 3 x 4
#> person_id name company email
#> <int> <chr> <chr> <chr>
#> 1 1 Jiena McLellan Toyota <NA>
#> 2 2 John Smith google [email protected]
#> 3 3 Huxley Ratcliffe <NA> <NA>
Conclusion
In my opinion, the new tidyr concept really is more intuitive, and considerably more capable than the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().
Source: www.habr.com