The tidyr package is part of the core of one of the most popular libraries in the R language, the tidyverse.
The main purpose of the package is to bring data into a tidy form.
The package has already been covered on Habr.
S.J.K.: Will gather() and spread() be deprecated?
Hadley Wickham: To some extent. We will no longer recommend these functions or fix bugs in them, but they will continue to exist in the package in their current state.
Contents
- The TidyData concept
- Main functions in the tidyr package
- A new concept for converting data from wide to long format and back
- Installing the latest version of tidyr, 0.8.3.9000
- Switching to the new functions
- A simple example of converting data from wide to long format
- Specifications
- Specifications with multiple values (.value)
- Converting data frames from long to wide format
- More advanced examples of working with the new tidyr concept
- Conclusion
The TidyData concept
The goal of tidyr is to help you bring data into what is called tidy form. Tidy data is data where:
- Each variable is in a column.
- Each observation is a row.
- Each value is a cell.
Data presented in tidy form is much simpler and more convenient to work with during analysis.
Main functions in the tidyr package
tidyr contains a set of functions designed for reshaping tables:
- fill(): fills missing values in a column with the previous values;
- separate(): splits one field into several using a separator;
- unite(): combines several fields into one, the inverse of separate();
- pivot_longer(): converts data from wide format to long format;
- pivot_wider(): converts data from long format to wide format, the inverse of pivot_longer();
- gather() (deprecated): converts data from wide format to long format;
- spread() (deprecated): converts data from long format to wide format, the inverse of gather().
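To make the first three functions in the list concrete, here is a minimal sketch. The `sales` table and its column names are hypothetical and exist only for illustration; the calls themselves are the ordinary tidyr functions named above.

```r
library(tidyr)

# Hypothetical toy table: the region is written only on the first row of each group
sales <- data.frame(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  amount = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing region downwards
filled <- fill(sales, region)

# separate(): split "2019-Q1" into year and quarter on the "-" separator
split_up <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): glue the two fields back together, the inverse of separate()
reunited <- unite(split_up, "period", year, quarter, sep = "-")
```

After the last step the `period` column should hold the same strings as in the original table, which is a quick way to see that unite() undoes separate().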
A new concept for converting data from wide to long format and back
Previously, the functions gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the package author, their names and arguments were not obvious, which made it hard to find them and to remember which one converts a data frame from wide to long format and which does the reverse.
For this reason, two new, important functions for reshaping data frames were added to tidyr.
The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.
Installing the latest version of tidyr, 0.8.3.9000
To install the newest version of the package, tidyr 0.8.3.9000, in which the new functions are available, use the following code.
devtools::install_github("tidyverse/tidyr")
At the time of writing, these functions are available only in the dev version of the package on GitHub.
Switching to the new functions
In fact, it is not difficult to port old scripts to the new functions. To make this clearer, I will take an example from the documentation of the old functions and show how the same operations are done with the new pivot_*() functions.
Converting wide format to long format.
Example code from the gather() function documentation
# example
library(tidyr)  # gather(), pivot_longer()
library(dplyr)  # %>%
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
# old
stocks_gather <- stocks %>% gather(key = stock,
                                   value = price,
                                   -time)
# new
stocks_long <- stocks %>% pivot_longer(cols = -time,
                                       names_to = "stock",
                                       values_to = "price")
Converting long format to wide format.
Example code from the spread() function documentation
# old
stocks_spread <- stocks_gather %>% spread(key = stock,
value = price)
# new
stock_wide <- stocks_long %>% pivot_wider(names_from = "stock",
values_from = "price")
Because, in the pivot_longer() and pivot_wider() examples above, the source table stocks has no columns with the names listed in the names_to and values_to arguments, those names must be given in quotes.
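A table that will help you most easily work out how to switch to the new tidyr concept:
- gather(key = , value = ) becomes pivot_longer(names_to = , values_to = )
- spread(key = , value = ) becomes pivot_wider(names_from = , values_from = )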
A note from the author
All the text below is adapted; I would even call it a free translation of the vignettes on the library's official website.
A simple example of converting data from wide to long format
pivot_longer() makes a data set longer by reducing the number of columns and increasing the number of rows.
To run the examples in this article, first attach the required packages:
library(tidyr)
library(dplyr)
library(readr)
Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:
#> # A tibble: 18 x 11
#> religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Agnostic 27 34 60 81 76 137
#> 2 Atheist 12 27 37 52 35 70
#> 3 Buddhist 27 21 30 34 33 58
#> 4 Catholic 418 617 732 670 638 1116
#> 5 Don’t k… 15 14 15 11 10 35
#> 6 Evangel… 575 869 1064 982 881 1486
#> 7 Hindu 1 9 7 9 11 34
#> 8 Histori… 228 244 236 238 197 223
#> 9 Jehovah… 20 27 24 24 21 30
#> 10 Jewish 19 19 25 25 30 95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> # `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>
This table holds the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell values at the intersection of religion and income level. To bring the table into a tidy, correct format, it is enough to apply pivot_longer():
pew %>%
pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#> religion income count
#> <chr> <chr> <dbl>
#> 1 Agnostic <$10k 27
#> 2 Agnostic $10-20k 34
#> 3 Agnostic $20-30k 60
#> 4 Agnostic $30-40k 81
#> 5 Agnostic $40-50k 76
#> 6 Agnostic $50-75k 137
#> 7 Agnostic $75-100k 122
#> 8 Agnostic $100-150k 109
#> 9 Agnostic >150k 84
#> 10 Agnostic Don't know/refused 96
#> # … with 170 more rows
Arguments of the pivot_longer() function
- The first argument, cols, describes which columns need to be merged. In this case, all columns except religion.
- The names_to argument gives the name of the variable that will be created from the names of the columns we merged.
- values_to gives the name of the variable that will be created from the data stored in the cell values of the merged columns.
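The three arguments can be seen at work on a two-row miniature of the survey table. This is only a sketch: the `survey` object is a hypothetical stand-in built from the first two rows and first two income brackets of the printed output above.

```r
library(tidyr)

# Hypothetical miniature of the survey table: one row per religion,
# one column per income bracket
survey <- tibble::tibble(
  religion = c("Agnostic", "Atheist"),
  `<$10k` = c(27, 12),
  `$10-20k` = c(34, 27)
)

long <- pivot_longer(
  survey,
  cols = -religion,     # every column except religion is pivoted
  names_to = "income",  # the old column names land in this variable
  values_to = "count"   # the old cell values land in this variable
)
```

Each of the two input rows turns into two output rows, one per pivoted column, so `long` has four rows and the columns religion, income and count.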
Specifications
This is new functionality in tidyr that was not available with the legacy functions.
A specification is a data frame in which each row corresponds to one column in the wide form of the data (the input of pivot_longer(), the output of pivot_wider()), with two special columns whose names start with a dot:
- .name contains the original column name.
- .value contains the name of the column that will hold the cell values.
The remaining columns of the specification show how the new columns will represent the names of the compressed columns from .name.
A specification describes the metadata stored in a column name, with one row for each column and one column for each variable combined into the column name. This definition may seem confusing for now, but after a few examples it will become much clearer.
The point of a specification is that you can retrieve, modify, and define new metadata for the data frame being converted.
To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.
This function takes any data frame and generates its metadata in the way described above.
As an example, let's take the who dataset that ships with the tidyr package. It contains information provided by the World Health Organization on the incidence of tuberculosis.
who
#> # A tibble: 7,240 x 60
#> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534
#> <chr> <chr> <chr> <int> <int> <int> <int>
#> 1 Afghan… AF AFG 1980 NA NA NA
#> 2 Afghan… AF AFG 1981 NA NA NA
#> 3 Afghan… AF AFG 1982 NA NA NA
#> 4 Afghan… AF AFG 1983 NA NA NA
#> 5 Afghan… AF AFG 1984 NA NA NA
#> 6 Afghan… AF AFG 1985 NA NA NA
#> 7 Afghan… AF AFG 1986 NA NA NA
#> 8 Afghan… AF AFG 1987 NA NA NA
#> 9 Afghan… AF AFG 1988 NA NA NA
#> 10 Afghan… AF AFG 1989 NA NA NA
#> # … with 7,230 more rows, and 53 more variables
Let's build its specification.
spec <- who %>%
pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")
#> # A tibble: 56 x 3
#> .name .value name
#> <chr> <chr> <chr>
#> 1 new_sp_m014 count new_sp_m014
#> 2 new_sp_m1524 count new_sp_m1524
#> 3 new_sp_m2534 count new_sp_m2534
#> 4 new_sp_m3544 count new_sp_m3544
#> 5 new_sp_m4554 count new_sp_m4554
#> 6 new_sp_m5564 count new_sp_m5564
#> 7 new_sp_m65 count new_sp_m65
#> 8 new_sp_f014 count new_sp_f014
#> 9 new_sp_f1524 count new_sp_f1524
#> 10 new_sp_f2534 count new_sp_f2534
#> # … with 46 more rows
The fields country, iso2, iso3 and year are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.
The names of these columns store the following information:
- The prefix new_ indicates that the column contains data on new tuberculosis cases. The current data frame contains information on new cases only, so in this context the prefix carries no meaning.
- sp/rel/sn/ep describes how the disease was diagnosed.
- m/f is the patient's sex.
- 014/1524/2534/3544/4554/5564/65 is the patient's age range.
We can split these column names using the extract() function with a regular expression.
spec <- spec %>%
extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")
#> # A tibble: 56 x 5
#> .name .value diagnosis gender age
#> <chr> <chr> <chr> <chr> <chr>
#> 1 new_sp_m014 count sp m 014
#> 2 new_sp_m1524 count sp m 1524
#> 3 new_sp_m2534 count sp m 2534
#> 4 new_sp_m3544 count sp m 3544
#> 5 new_sp_m4554 count sp m 4554
#> 6 new_sp_m5564 count sp m 5564
#> 7 new_sp_m65 count sp m 65
#> 8 new_sp_f014 count sp f 014
#> 9 new_sp_f1524 count sp f 1524
#> 10 new_sp_f2534 count sp f 2534
#> # … with 46 more rows
Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.
Gender and age (the gender and age columns) have fixed, known values, so it is advisable to convert these columns to factors:
spec <- spec %>%
mutate(
gender = factor(gender, levels = c("f", "m")),
age = factor(age, levels = unique(age), ordered = TRUE)
)
Finally, to apply the specification we created to the original who data frame, we need to use the spec argument of the pivot_longer() function.
who %>% pivot_longer(spec = spec)
#> # A tibble: 405,440 x 8
#> country iso2 iso3 year diagnosis gender age count
#> <chr> <chr> <chr> <int> <chr> <fct> <ord> <int>
#> 1 Afghanistan AF AFG 1980 sp m 014 NA
#> 2 Afghanistan AF AFG 1980 sp m 1524 NA
#> 3 Afghanistan AF AFG 1980 sp m 2534 NA
#> 4 Afghanistan AF AFG 1980 sp m 3544 NA
#> 5 Afghanistan AF AFG 1980 sp m 4554 NA
#> 6 Afghanistan AF AFG 1980 sp m 5564 NA
#> 7 Afghanistan AF AFG 1980 sp m 65 NA
#> 8 Afghanistan AF AFG 1980 sp f 014 NA
#> 9 Afghanistan AF AFG 1980 sp f 1524 NA
#> 10 Afghanistan AF AFG 1980 sp f 2534 NA
#> # … with 405,430 more rows
Everything we have just done can be depicted schematically as follows:
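The same build, refine, apply sequence can be sketched on a tiny table. Two assumptions to flag: the `tb` data frame is a hypothetical three-column miniature of who, and the sketch uses the function names from released tidyr (1.0.0 and later), where the specification builder is build_longer_spec() and the specification is applied with pivot_longer_spec(), rather than the dev-version spelling shown in the article.

```r
library(tidyr)

# Hypothetical three-column miniature of `who`
tb <- tibble::tibble(
  country = c("A", "B"),
  new_sp_m014 = c(1L, 2L),
  new_sp_f014 = c(3L, 4L),
  newrel_f65 = c(5L, 6L)
)

# 1. Build the default specification: one row per pivoted column
#    (build_longer_spec() is the tidyr >= 1.0.0 name, an assumption here)
spec <- build_longer_spec(tb, cols = -country, values_to = "count")

# 2. Refine it: pull diagnosis, gender and age out of the column names,
#    leaving .name untouched
spec <- tidyr::extract(
  spec, name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)"
)

# 3. Apply the edited specification to the data
long <- pivot_longer_spec(tb, spec)
```

The result has one row per country and pivoted column, with diagnosis, gender and age as separate variables and the cell values in count.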
Specifications with multiple values (.value)
In the example above, the .value column of the specification contained only one value; in most cases this is how it is.
But sometimes you need to collect data from columns whose values have different data types. With the legacy function gather() this would be quite difficult to do.
For the example below, let's create a practice data frame.
family <- tibble::tribble(
~family, ~dob_child1, ~dob_child2, ~gender_child1, ~gender_child2,
1L, "1998-11-26", "2000-01-29", 1L, 2L,
2L, "1996-06-22", NA, 2L, NA,
3L, "2002-07-11", "2004-04-05", 2L, 2L,
4L, "2004-10-10", "2009-08-27", 1L, 1L,
5L, "2000-12-05", "2005-02-28", 2L, 1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)
#> # A tibble: 5 x 5
#> family dob_child1 dob_child2 gender_child1 gender_child2
#> <int> <date> <date> <int> <int>
#> 1 1 1998-11-26 2000-01-29 1 2
#> 2 2 1996-06-22 NA 2 NA
#> 3 3 2002-07-11 2004-04-05 2 2
#> 4 4 2004-10-10 2009-08-27 1 1
#> 5 5 2000-12-05 2005-02-28 2 1
Each row of the data frame we created holds data on the children of one family. A family can have one or two children. For each child we have the date of birth and the gender, and each child's data sits in separate columns; our task is to bring this data into a format suitable for analysis.
Note that we have two variables with information about each child: gender and date of birth (columns with the prefix dob contain the date of birth, columns with the prefix gender contain the child's gender). In the result they should end up in separate columns. We can do this by generating a specification in which the .value column has two different values.
spec <- family %>%
pivot_longer_spec(-family) %>%
separate(col = name, into = c(".value", "child"))%>%
mutate(child = parse_number(child))
#> # A tibble: 4 x 3
#> .name .value child
#> <chr> <chr> <dbl>
#> 1 dob_child1 dob 1
#> 2 dob_child2 dob 2
#> 3 gender_child1 gender 1
#> 4 gender_child2 gender 2
So, let's go step by step through what the code above does.
- pivot_longer_spec(-family) creates a specification that compresses all existing columns except the family column.
- separate(col = name, into = c(".value", "child")) splits the name column, which contains the names of the source fields, on the underscore and puts the resulting values into the .value and child columns.
- mutate(child = parse_number(child)) converts the values of the child field from text to a numeric data type.
Now we can apply the resulting specification to the original data frame and bring the table into the desired form.
family %>%
  pivot_longer(spec = spec, na.rm = TRUE)
#> # A tibble: 9 x 4
#> family child dob gender
#> <int> <dbl> <date> <int>
#> 1 1 1 1998-11-26 1
#> 2 1 2 2000-01-29 2
#> 3 2 1 1996-06-22 2
#> 4 3 1 2002-07-11 2
#> 5 3 2 2004-04-05 2
#> 6 4 1 2004-10-10 1
#> 7 4 2 2009-08-27 1
#> 8 5 1 2000-12-05 2
#> 9 5 2 2005-02-28 1
We use the argument na.rm = TRUE because the current shape of the data forces the creation of extra rows for observations that do not exist. Because family 2 has only one child, na.rm = TRUE ensures that family 2 will have a single row in the output.
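For comparison, here is a sketch of the same reshaping without an explicit specification. It assumes released tidyr (1.0.0 and later), where the special name ".value" can be passed directly in names_to and the argument for dropping missing rows is called values_drop_na; the three-family table is a shortened, hypothetical version of the one above.

```r
library(tidyr)

# Shortened version of the family table (first three families only)
family <- tibble::tibble(
  family = 1:3,
  dob_child1 = as.Date(c("1998-11-26", "1996-06-22", "2002-07-11")),
  dob_child2 = as.Date(c("2000-01-29", NA, "2004-04-05")),
  gender_child1 = c(1L, 2L, 2L),
  gender_child2 = c(2L, NA, 2L)
)

long <- pivot_longer(
  family,
  cols = -family,
  names_to = c(".value", "child"),  # first name part picks the output column
  names_sep = "_",                  # "dob_child1" -> "dob" and "child1"
  values_drop_na = TRUE             # drop the non-existent second child of family 2
)

# Turn "child1"/"child2" into the numbers 1 and 2
long$child <- as.integer(sub("child", "", long$child))
```

Family 2 contributes a single row, so the result has five rows with the columns family, child, dob and gender.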
Converting data frames from long to wide format
pivot_wider() is the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.
This kind of transformation is rarely used to bring data into tidy form; however, the technique can be useful for creating pivot tables for presentations, or for integrating with other tools.
In fact, pivot_longer() and pivot_wider() are symmetrical and perform mutually inverse operations, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) will return the original df.
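The round trip can be checked on a small table. This sketch uses the plain names_to/names_from arguments rather than a shared specification, and the `stocks` values are made up for illustration.

```r
library(tidyr)

# Hypothetical small wide table
stocks <- tibble::tibble(
  time = as.Date("2009-01-01") + 0:2,
  X = c(0.1, 0.2, 0.3),
  Y = c(1.5, 2.5, 3.5)
)

# Wide -> long -> wide again
long <- pivot_longer(stocks, cols = -time, names_to = "stock", values_to = "price")
wide <- pivot_wider(long, names_from = stock, values_from = price)

# Compare the round-tripped table with the original
same <- isTRUE(all.equal(as.data.frame(wide), as.data.frame(stocks)))
```

Here `same` should come out TRUE: widening the long table restores the original columns and values.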
The simplest example of converting a table to wide format
To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which stores information about how various stations record the movement of fish along a river.
#> # A tibble: 114 x 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # … with 104 more rows
In most cases this table will be more informative and easier to use if the information for each station is presented in a separate column.
fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA
#> # … with 9 more rows, and 1 more variable: MAW <int>
This dataset records information only when a fish was detected by a station, i.e. if a fish was not recorded by some station, there is no row for it in the table. This means the output will be filled with NA.
However, in this case we know that the absence of a record means the fish was not seen, so we can use the values_fill argument of pivot_wider() to fill these missing values with zeros:
fish_encounters %>% pivot_wider(
names_from = station,
values_from = seen,
values_fill = list(seen = 0)
)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 0 0 0 0 0
#> 5 4847 1 1 1 0 0 0 0 0 0 0
#> 6 4848 1 1 1 1 0 0 0 0 0 0
#> 7 4849 1 1 0 0 0 0 0 0 0 0
#> 8 4850 1 1 0 1 1 1 1 0 0 0
#> 9 4851 1 1 0 0 0 0 0 0 0 0
#> 10 4854 1 1 0 0 0 0 0 0 0 0
#> # … with 9 more rows, and 1 more variable: MAW <int>
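The effect of values_fill is easy to see on a three-row miniature. The `sightings` table below is hypothetical; only the fish and station names are borrowed from the output above.

```r
library(tidyr)

# Hypothetical miniature: fish 4845 was never recorded at Lisbon
sightings <- tibble::tibble(
  fish = c("4842", "4842", "4845"),
  station = c("Release", "Lisbon", "Release"),
  seen = c(1L, 1L, 1L)
)

# Without values_fill the missing combination becomes NA
wide_na <- pivot_wider(sightings, names_from = station, values_from = seen)

# With values_fill the missing combination becomes 0
wide_zero <- pivot_wider(
  sightings,
  names_from = station,
  values_from = seen,
  values_fill = list(seen = 0L)
)
```

The only difference between the two results is the Lisbon cell for fish 4845: NA in `wide_na`, 0 in `wide_zero`.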
Generating a column name from multiple source variables
Imagine we have a table containing combinations of product, country, and year. To create a test data frame, you can run the following code:
df <- expand_grid(
product = c("A", "B"),
country = c("AI", "EI"),
year = 2000:2014
) %>%
filter((product == "A" & country == "AI") | product == "B") %>%
mutate(value = rnorm(nrow(.)))
#> # A tibble: 45 x 4
#> product country year value
#> <chr> <chr> <int> <dbl>
#> 1 A AI 2000 -2.05
#> 2 A AI 2001 -0.676
#> 3 A AI 2002 1.60
#> 4 A AI 2003 -0.353
#> 5 A AI 2004 -0.00530
#> 6 A AI 2005 0.442
#> 7 A AI 2006 -0.610
#> 8 A AI 2007 -2.77
#> 9 A AI 2008 0.899
#> 10 A AI 2009 -0.106
#> # … with 35 more rows
Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, just pass the names_from argument a vector containing the names of the fields to be combined.
df %>% pivot_wider(names_from = c(product, country),
values_from = "value")
#> # A tibble: 15 x 4
#> year A_AI B_AI B_EI
#> <int> <dbl> <dbl> <dbl>
#> 1 2000 -2.05 0.607 1.20
#> 2 2001 -0.676 1.65 -0.114
#> 3 2002 1.60 -0.0245 0.501
#> 4 2003 -0.353 1.30 -0.459
#> 5 2004 -0.00530 0.921 -0.0589
#> 6 2005 0.442 -1.55 0.594
#> 7 2006 -0.610 0.380 -1.28
#> 8 2007 -2.77 0.830 0.637
#> 9 2008 0.899 0.0175 -1.30
#> 10 2009 -0.106 -0.195 1.03
#> # … with 5 more rows
You can also pass a specification to pivot_wider(). But when passed to pivot_wider(), the specification performs the opposite transformation to pivot_longer(): the columns named in .name are created, using the values from .value and the other columns.
For this dataset, you can generate a custom specification if you want every possible combination of country and product to have its own column, not just those present in the data:
spec <- df %>%
expand(product, country, .value = "value") %>%
unite(".name", product, country, remove = FALSE)
#> # A tibble: 4 x 4
#> .name product country .value
#> <chr> <chr> <chr> <chr>
#> 1 A_AI A AI value
#> 2 A_EI A EI value
#> 3 B_AI B AI value
#> 4 B_EI B EI value
df %>% pivot_wider(spec = spec) %>% head()
#> # A tibble: 6 x 5
#> year A_AI A_EI B_AI B_EI
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 -2.05 NA 0.607 1.20
#> 2 2001 -0.676 NA 1.65 -0.114
#> 3 2002 1.60 NA -0.0245 0.501
#> 4 2003 -0.353 NA 1.30 -0.459
#> 5 2004 -0.00530 NA 0.921 -0.0589
#> 6 2005 0.442 NA -1.55 0.594
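Below is a sketch of the same idea on a three-row table. It assumes released tidyr (1.0.0 and later), where a hand-built specification is applied with pivot_wider_spec() rather than through a spec argument; the `df` values are made up.

```r
library(tidyr)

# Hypothetical miniature: product A is sold in AI only
df <- tibble::tibble(
  product = c("A", "B", "B"),
  country = c("AI", "AI", "EI"),
  year = 2000L,
  value = c(1, 2, 3)
)

# Every product/country combination, including the absent A/EI pair
spec <- expand(df, product, country, .value = "value")
spec <- unite(spec, ".name", product, country, remove = FALSE)

# Apply the specification (pivot_wider_spec() is the tidyr >= 1.0.0 name)
wide <- pivot_wider_spec(df, spec)
```

Because the specification lists all four combinations, the result gets an A_EI column even though no such rows exist in the data; that column is filled with NA.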
More advanced examples of working with the new tidyr concept
Tidying data, using the US Census income and rent dataset as an example.
The us_rent_income dataset contains median income and rent information for each US state for 2017 (the dataset is available in the tidycensus package).
us_rent_income
#> # A tibble: 104 x 5
#> GEOID NAME variable estimate moe
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 01 Alabama income 24476 136
#> 2 01 Alabama rent 747 3
#> 3 02 Alaska income 32940 508
#> 4 02 Alaska rent 1200 13
#> 5 04 Arizona income 27517 148
#> 6 04 Arizona rent 972 4
#> 7 05 Arkansas income 23789 165
#> 8 05 Arkansas rent 709 5
#> 9 06 California income 29454 109
#> 10 06 California rent 1358 3
#> # … with 94 more rows
In the form in which the data is stored in the us_rent_income dataset, it is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income, income_moe. There are many ways to create this specification, but the key point is that we need to generate every combination of the values of variable with estimate/moe, and then generate the column names.
spec <- us_rent_income %>%
expand(variable, .value = c("estimate", "moe")) %>%
mutate(
.name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
)
#> # A tibble: 4 x 3
#> variable .value .name
#> <chr> <chr> <chr>
#> 1 income estimate income
#> 2 income moe income_moe
#> 3 rent estimate rent
#> 4 rent moe rent_moe
Providing this specification to pivot_wider() gives us the result we need:
us_rent_income %>% pivot_wider(spec = spec)
#> # A tibble: 52 x 6
#> GEOID NAME income income_moe rent rent_moe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 01 Alabama 24476 136 747 3
#> 2 02 Alaska 32940 508 1200 13
#> 3 04 Arizona 27517 148 972 4
#> 4 05 Arkansas 23789 165 709 5
#> 5 06 California 29454 109 1358 3
#> 6 08 Colorado 32401 109 1125 5
#> 7 09 Connecticut 35326 195 1123 5
#> 8 10 Delaware 31560 247 1076 10
#> 9 11 District of Columbia 43198 681 1424 17
#> 10 12 Florida 25952 70 1077 3
#> # … with 42 more rows
The World Bank
Sometimes bringing data into the desired form requires several steps.
The world_bank_pop dataset contains World Bank data on the population of each country between 2000 and 2017.
#> # A tibble: 1,056 x 20
#> country indicator `2000` `2001` `2002` `2003` `2004` `2005` `2006`
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ABW SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4 4.49e+4
#> 2 ABW SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#> 3 ABW SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5 1.01e+5
#> 4 ABW SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0 7.98e-1
#> 5 AFG SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6 5.93e+6
#> 6 AFG SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0 4.12e+0
#> 7 AFG SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7 2.59e+7
#> 8 AFG SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0 3.23e+0
#> 9 AGO SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7 1.15e+7
#> 10 AGO SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0 4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> # `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> # `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
Our goal is to create a tidy dataset with each variable in its own column. It is not yet clear exactly which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.
To fix this, use the pivot_longer() function.
pop2 <- world_bank_pop %>%
pivot_longer(`2000`:`2017`, names_to = "year")
#> # A tibble: 19,008 x 4
#> country indicator year value
#> <chr> <chr> <chr> <dbl>
#> 1 ABW SP.URB.TOTL 2000 42444
#> 2 ABW SP.URB.TOTL 2001 43048
#> 3 ABW SP.URB.TOTL 2002 43670
#> 4 ABW SP.URB.TOTL 2003 44246
#> 5 ABW SP.URB.TOTL 2004 44669
#> 6 ABW SP.URB.TOTL 2005 44889
#> 7 ABW SP.URB.TOTL 2006 44881
#> 8 ABW SP.URB.TOTL 2007 44686
#> 9 ABW SP.URB.TOTL 2008 44375
#> 10 ABW SP.URB.TOTL 2009 44052
#> # … with 18,998 more rows
The next step is to look at the indicator variable.
pop2 %>% count(indicator)
#> # A tibble: 4 x 2
#> indicator n
#> <chr> <int>
#> 1 SP.POP.GROW 4752
#> 2 SP.POP.TOTL 4752
#> 3 SP.URB.GROW 4752
#> 4 SP.URB.TOTL 4752
Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* are the same, but only for urban areas. Let's split these values into two variables: area (total or urban) and a variable containing the actual data (population or growth):
pop3 <- pop2 %>%
separate(indicator, c(NA, "area", "variable"))
#> # A tibble: 19,008 x 5
#> country area variable year value
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 ABW URB TOTL 2000 42444
#> 2 ABW URB TOTL 2001 43048
#> 3 ABW URB TOTL 2002 43670
#> 4 ABW URB TOTL 2003 44246
#> 5 ABW URB TOTL 2004 44669
#> 6 ABW URB TOTL 2005 44889
#> 7 ABW URB TOTL 2006 44881
#> 8 ABW URB TOTL 2007 44686
#> 9 ABW URB TOTL 2008 44375
#> 10 ABW URB TOTL 2009 44052
#> # … with 18,998 more rows
Now all that remains is to split variable into two columns:
pop3 %>%
pivot_wider(names_from = variable, values_from = value)
#> # A tibble: 9,504 x 5
#> country area year TOTL GROW
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 ABW URB 2000 42444 1.18
#> 2 ABW URB 2001 43048 1.41
#> 3 ABW URB 2002 43670 1.43
#> 4 ABW URB 2003 44246 1.31
#> 5 ABW URB 2004 44669 0.951
#> 6 ABW URB 2005 44889 0.491
#> 7 ABW URB 2006 44881 -0.0178
#> 8 ABW URB 2007 44686 -0.435
#> 9 ABW URB 2008 44375 -0.698
#> 10 ABW URB 2009 44052 -0.731
#> # … with 9,494 more rows
Contact list
A final example: imagine you have a contact list that you copied and pasted from a website:
contacts <- tribble(
~field, ~value,
"name", "Jiena McLellan",
"company", "Toyota",
"name", "John Smith",
"company", "google",
"email", "[email protected]",
"name", "Huxley Ratcliffe"
)
Tabulating this list is quite difficult because there is no variable that identifies which data belongs to which contact. We can fix this by noting that each new contact's data starts with "name", so we can create a unique identifier and increase it by one every time the field column contains the value "name":
contacts <- contacts %>%
mutate(
person_id = cumsum(field == "name")
)
contacts
#> # A tibble: 6 x 3
#> field value person_id
#> <chr> <chr> <int>
#> 1 name Jiena McLellan 1
#> 2 company Toyota 1
#> 3 name John Smith 2
#> 4 company google 2
#> 5 email [email protected] 2
#> 6 name Huxley Ratcliffe 3
Now that we have a unique ID for each contact, we can turn field and value into columns:
contacts %>%
pivot_wider(names_from = field, values_from = value)
#> # A tibble: 3 x 4
#> person_id name company email
#> <int> <chr> <chr> <chr>
#> 1 1 Jiena McLellan Toyota <NA>
#> 2 2 John Smith google [email protected]
#> 3 3 Huxley Ratcliffe <NA> <NA>
Conclusion
My personal opinion is that the new tidyr concept really is more intuitive, and significantly superior in functionality to the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().
Source: www.habr.com