ProHoster > Блог > Kev tswj hwm > R pob tidyr thiab nws cov haujlwm tshiab pivot_longer thiab pivot_wider
The R package tidyr and its new functions pivot_longer and pivot_wider
The tidyr package is part of the core of the tidyverse, one of the most popular collections of packages in R.
The main purpose of the package is to bring data into a tidy form.
There is already a post on Habr dedicated to this package, but it dates back to 2015. I want to tell you about the most recent changes, which were announced a few days ago by its author, Hadley Wickham.
S.J.K.: Will gather() and spread() be deprecated?
Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.
If you are interested in data analysis, you might like my Telegram and YouTube channels. Most of their content is devoted to the R language.
The goal of tidyr is to help you bring data into the so-called tidy form. Tidy data is data in which:
Every variable is a column.
Every observation is a row.
Every value is a cell.
Data presented in tidy form is much simpler and more convenient to work with during analysis.
Main functions of the tidyr package
tidyr contains functions designed to reshape tables:
pivot_longer() - converts data from wide format to long format;
pivot_wider() - converts data from long format to wide format. It is the inverse of pivot_longer().
gather() (outdated) - converts data from wide format to long format;
spread() (outdated) - converts data from long format to wide format. It is the inverse of gather().
A new concept for converting data from wide to long format and back
Previously, gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became obvious that for most users, including the package author, the names of these functions and of their arguments were not intuitive. This made it hard to find them and to understand which of them converts a data frame from wide to long format and which does the reverse.
To address this, two new key functions for reshaping data frames have been added to tidyr.
The new functions pivot_longer() and pivot_wider() were inspired by some of the functions in the cdata package, created by John Mount and Nina Zumel.
Installing the current version of tidyr, 0.8.3.9000
To install the latest version of tidyr, 0.8.3.9000, which contains the new functions, use the following code.
devtools::install_github("tidyverse/tidyr")
At the time of writing, these functions are available only in the development version of the package on GitHub.
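A quick way to confirm that the installed build really contains the new functions is sketched below. It uses only base R's packageVersion() and getNamespaceExports(), and it only checks that the names are exported, not that their arguments match the ones used in this article.

```r
library(tidyr)

# Print the installed tidyr version
print(packageVersion("tidyr"))

# TRUE if both new pivoting functions are exported by the installed build
has_pivot <- all(c("pivot_longer", "pivot_wider") %in% getNamespaceExports("tidyr"))
print(has_pivot)
```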
Switching to the new functions
In fact, migrating old scripts to the new functions is not difficult. To make this clearer, I will take examples from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.
Converting wide format to long format.
Example code from the gather() documentation
# example
library(dplyr)
library(tidyr)  # provides gather(), spread(), pivot_longer(), pivot_wider()
stocks <- data.frame(
time = as.Date('2009-01-01') + 0:9,
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2),
Z = rnorm(10, 0, 4)
)
# old
stocks_gather <- stocks %>% gather(key = stock,
value = price,
-time)
# new
stocks_long <- stocks %>% pivot_longer(cols = -time,
names_to = "stock",
values_to = "price")
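gather() and pivot_longer() return the same data, but in a different row order: gather() stacks the whole X column first, then Y, then Z, while pivot_longer() lists the three stocks for each date in turn. A small sketch that checks this by sorting both results, assuming tidyr 1.0 or later; set.seed() is added here only to make the random prices reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # reproducible random prices
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

long_old <- stocks %>% gather(key = stock, value = price, -time)
long_new <- stocks %>% pivot_longer(cols = -time, names_to = "stock", values_to = "price")

# Row order differs: gather() stacks whole columns, pivot_longer() goes row by row
head(long_old$stock, 3)  # "X" "X" "X"
head(long_new$stock, 3)  # "X" "Y" "Z"

# After sorting by stock and date, both results hold the same values
old_sorted <- long_old %>% arrange(stock, time)
new_sorted <- long_new %>% arrange(stock, time)
same_data <- identical(old_sorted$price, new_sorted$price) &&
  identical(old_sorted$stock, new_sorted$stock)
same_data
```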
Converting long format to wide format.
Example code from the spread() documentation
# old
stocks_spread <- stocks_gather %>% spread(key = stock,
value = price)
# new
stock_wide <- stocks_long %>% pivot_wider(names_from = "stock",
values_from = "price")
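In the same way, spread() and pivot_wider() rebuild the same wide table. A sketch, assuming tidyr 1.0 or later, that widens the long stocks data both ways and compares the result with the original columns:

```r
library(dplyr)
library(tidyr)

set.seed(1)
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
stocks_long <- stocks %>%
  pivot_longer(cols = -time, names_to = "stock", values_to = "price")

wide_old <- stocks_long %>% spread(key = stock, value = price)
wide_new <- stocks_long %>% pivot_wider(names_from = "stock", values_from = "price")

# Both contain the columns time, X, Y, Z with the original values
print(names(wide_new))
round_trip_ok <- identical(wide_new$X, stocks$X) &&
  identical(wide_new$Z, stocks$Z) &&
  identical(wide_old$Y, stocks$Y)
round_trip_ok
```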
Note that in the pivot_longer() and pivot_wider() examples above, the columns named in the names_to and values_to arguments do not exist in the original stocks table, so their names must be given in quotes.
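The columns passed to names_from and values_from, by contrast, already exist, so in tidyr 1.0 and later (where these arguments use tidyselect) the quotes there are optional. A small check on a made-up three-row table:

```r
library(tidyr)

# Hypothetical long table, used only for this illustration
long <- data.frame(
  id  = c(1, 1, 2),
  key = c("a", "b", "a"),
  val = c(10, 20, 30)
)

quoted   <- pivot_wider(long, names_from = "key", values_from = "val")
unquoted <- pivot_wider(long, names_from = key, values_from = val)

# Same result either way; id 2 has no "b" value, so that cell is NA
same <- identical(quoted, unquoted)
same
```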
Below is a table that should make it easy to see how to switch to the new tidyr concept.
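Two more arguments carry over under new names: gather(na.rm = TRUE) corresponds to pivot_longer(values_drop_na = TRUE), and spread(fill = ...) corresponds to pivot_wider(values_fill = ...). A sketch on a hypothetical table with one missing value, assuming tidyr 1.0 or later:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table with one missing measurement
wide <- data.frame(id = 1:2, a = c(1, NA), b = c(3, 4))

# Dropping missing values while lengthening
old_long <- wide %>% gather(key = "key", value = "value", a, b, na.rm = TRUE)
new_long <- wide %>% pivot_longer(c(a, b), names_to = "key", values_to = "value",
                                  values_drop_na = TRUE)

# Filling the gap back in while widening
old_wide <- old_long %>% spread(key = key, value = value, fill = 0)
new_wide <- new_long %>% pivot_wider(names_from = key, values_from = value,
                                     values_fill = list(value = 0))

nrow(old_long) == nrow(new_long)  # both drop the NA row
new_wide$a                        # 1 0
```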
Author's note
All of the text below is adapted; I would even call it a free translation of the vignettes from the official tidyverse website.
A simple example of converting data from wide to long format
pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.
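The simplest case is a table whose column names are values of a single variable. A sketch using the relig_income dataset that ships with tidyr 1.0 and later (it does not appear elsewhere in this article): each income bracket is a column, and pivot_longer() turns them into an income column and a count column.

```r
library(tidyr)

# relig_income: one row per religion, one column per income bracket
dim(relig_income)

relig_long <- relig_income %>%
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")

# One row per (religion, income bracket) pair, three columns
dim(relig_long)
```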
A specification (spec) is a data frame that describes the metadata stored in the column names: it has one row for each column and one column for each variable packed into the column name. Its additional columns describe how the values of the new columns are derived from the original column names recorded in .name. This concept may seem confusing right now, but after a few examples it will become much clearer.
To work with a specification when converting a table from wide to long format, use the pivot_longer_spec() function.
It takes a data frame and generates its metadata in the form described above.
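To make the idea concrete, here is a sketch with a hypothetical two-column table. It assumes tidyr 1.0 or later, where the spec is built with build_longer_spec() and pivot_longer_spec() applies an existing spec; in the development version described in this article, pivot_longer_spec() played the building role.

```r
library(tidyr)

# Hypothetical wide table: one row per id, one column per measurement
df <- tibble::tibble(id = 1:2, x = c(1, 2), y = c(3, 4))

# Build the spec: one row per wide column (x, y)
spec <- build_longer_spec(df, cols = c(x, y), names_to = "name", values_to = "value")
print(spec)
# .name  = original column names ("x", "y")
# .value = column that will receive the cell values ("value")
# name   = new column holding the metadata taken from .name

long <- pivot_longer_spec(df, spec)
print(long)
```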
As an example, let's take the who dataset that ships with tidyr. It contains World Health Organization data on tuberculosis incidence.
who
#> # A tibble: 7,240 x 60
#> country iso2 iso3 year new_sp_m014 new_sp_m1524 new_sp_m2534
#> <chr> <chr> <chr> <int> <int> <int> <int>
#> 1 Afghan… AF AFG 1980 NA NA NA
#> 2 Afghan… AF AFG 1981 NA NA NA
#> 3 Afghan… AF AFG 1982 NA NA NA
#> 4 Afghan… AF AFG 1983 NA NA NA
#> 5 Afghan… AF AFG 1984 NA NA NA
#> 6 Afghan… AF AFG 1985 NA NA NA
#> 7 Afghan… AF AFG 1986 NA NA NA
#> 8 Afghan… AF AFG 1987 NA NA NA
#> 9 Afghan… AF AFG 1988 NA NA NA
#> 10 Afghan… AF AFG 1989 NA NA NA
#> # … with 7,230 more rows, and 53 more variables
Let's create its specification:
spec <- who %>%
pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")
#> # A tibble: 405,440 x 8
#> country iso2 iso3 year diagnosis gender age count
#> <chr> <chr> <chr> <int> <chr> <fct> <ord> <int>
#> 1 Afghanistan AF AFG 1980 sp m 014 NA
#> 2 Afghanistan AF AFG 1980 sp m 1524 NA
#> 3 Afghanistan AF AFG 1980 sp m 2534 NA
#> 4 Afghanistan AF AFG 1980 sp m 3544 NA
#> 5 Afghanistan AF AFG 1980 sp m 4554 NA
#> 6 Afghanistan AF AFG 1980 sp m 5564 NA
#> 7 Afghanistan AF AFG 1980 sp m 65 NA
#> 8 Afghanistan AF AFG 1980 sp f 014 NA
#> 9 Afghanistan AF AFG 1980 sp f 1524 NA
#> 10 Afghanistan AF AFG 1980 sp f 2534 NA
#> # … with 405,430 more rows
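The printed result splits each column name such as new_sp_m014 into diagnosis, gender and age, which a spec built from the column range alone does not do. A sketch that produces the same 405,440 x 8 shape, assuming tidyr 1.0 or later (build_longer_spec() to build, pivot_longer_spec() to apply). The regular expression is the one used in the tidyr pivoting vignette, and the factor conversions are added here to reproduce the <fct> and <ord> column types shown above.

```r
library(dplyr)
library(tidyr)

spec <- who %>%
  build_longer_spec(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"),
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  ) %>%
  # The spec is an ordinary tibble, so its metadata columns can be retyped
  mutate(
    gender = factor(gender, levels = c("f", "m")),
    age = factor(age,
                 levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"),
                 ordered = TRUE)
  )

who_long <- pivot_longer_spec(who, spec)
dim(who_long)
```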
Everything we have just done can be illustrated schematically as follows:
Specification with multiple values (.value)
In the example above, the .value column of the specification contains only one value, which is the most common case.
We use the argument na.rm = TRUE because the current form of the data forces extra rows to be created for observations that do not exist. Since family 2 has only one child, na.rm = TRUE guarantees that family 2 will have only one row in the output.
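The family table these sentences refer to is not reproduced here, so the sketch below uses a small hypothetical table in the same spirit: one row per family, with dob_child1, dob_child2, gender_child1 and gender_child2 columns, where family 2 has only one child. Putting ".value" into names_to tells pivot_longer() that the first part of each name (dob, gender) becomes a column of its own. It assumes tidyr 1.0 or later, where values_drop_na plays the role of the na.rm argument mentioned above.

```r
library(tidyr)

# Hypothetical data: family 2 has only one child
family <- tibble::tribble(
  ~family, ~dob_child1,  ~dob_child2,  ~gender_child1, ~gender_child2,
  1L,      "1998-11-26", "2000-01-29", 1L,             2L,
  2L,      "1996-06-22", NA,           2L,             NA,
  3L,      "2002-07-11", "2004-04-05", 2L,             2L
)

family_long <- family %>%
  pivot_longer(
    cols = -family,
    names_to = c(".value", "child"),  # "dob"/"gender" become columns, "child1"/"child2" go to child
    names_sep = "_",
    values_drop_na = TRUE             # drop the non-existent second child of family 2
  )
family_long
```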
Converting data frames from long to wide format
pivot_wider() is the inverse transformation: it makes data frames wider by increasing the number of columns and decreasing the number of rows.
This transformation is rarely needed to tidy data; however, it can be very useful for creating pivot tables for presentations, or for integration with other tools.
In fact, pivot_longer() and pivot_wider() are symmetric and designed as inverses of each other, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) return the original df.
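In the released API (tidyr 1.0 and later) a spec is passed to pivot_longer_spec() and pivot_wider_spec() rather than to a spec argument. A sketch of the round trip on a hypothetical table, under that assumption:

```r
library(tidyr)

# Hypothetical wide table
df <- tibble::tibble(id = 1:3, x = c(1, 2, 3), y = c(4, 5, 6))

spec <- build_longer_spec(df, cols = c(x, y))

long <- pivot_longer_spec(df, spec)   # wide -> long (6 rows)
back <- pivot_wider_spec(long, spec)  # long -> wide, using the same spec

same <- identical(names(back), names(df)) &&
  identical(back$x, df$x) && identical(back$y, df$y)
same
```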
The simplest example of converting a table to wide format
To show how pivot_wider() works, we will use the fish_encounters dataset, which records how various stations detect the movement of fish along a river.
fish_encounters
#> # A tibble: 114 x 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # … with 104 more rows
This table will usually be more informative and easier to use if the data for each station is placed in a separate column.
fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA
#> # … with 9 more rows, and 1 more variable: MAW <int>
This dataset only records a fish when it was detected by a station, i.e. if a fish was not detected by some station, there is simply no row for it in the table. This means the output will be filled with NA.
However, in this case we know that a missing record means the fish was not seen, so we can use the values_fill argument of pivot_wider() to fill the missing values with zeros.
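A minimal sketch of that call, assuming tidyr 1.0 or later; values_fill is given as a named list mapping the value column to its fill value:

```r
library(tidyr)

fish_wide <- fish_encounters %>%
  pivot_wider(
    names_from = station,
    values_from = seen,
    values_fill = list(seen = 0L)  # absence of a record means "not seen"
  )
dim(fish_wide)
```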
Now imagine a table containing combinations of product, country and year. To create a test data frame, you can run the following code:
library(dplyr)
library(tidyr)  # expand_grid() is provided by tidyr
df <- expand_grid(
product = c("A", "B"),
country = c("AI", "EI"),
year = 2000:2014
) %>%
filter((product == "A" & country == "AI") | product == "B") %>%
mutate(value = rnorm(nrow(.)))
#> # A tibble: 45 x 4
#> product country year value
#> <chr> <chr> <int> <dbl>
#> 1 A AI 2000 -2.05
#> 2 A AI 2001 -0.676
#> 3 A AI 2002 1.60
#> 4 A AI 2003 -0.353
#> 5 A AI 2004 -0.00530
#> 6 A AI 2005 0.442
#> 7 A AI 2006 -0.610
#> 8 A AI 2007 -2.77
#> 9 A AI 2008 0.899
#> 10 A AI 2009 -0.106
#> # … with 35 more rows
#> # A tibble: 4 x 4
#> .name product country .value
#> <chr> <chr> <chr> <chr>
#> 1 A_AI A AI value
#> 2 A_EI A EI value
#> 3 B_AI B AI value
#> 4 B_EI B EI value
df %>% pivot_wider(spec = spec) %>% head()
#> # A tibble: 6 x 5
#> year A_AI A_EI B_AI B_EI
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 -2.05 NA 0.607 1.20
#> 2 2001 -0.676 NA 1.65 -0.114
#> 3 2002 1.60 NA -0.0245 0.501
#> 4 2003 -0.353 NA 1.30 -0.459
#> 5 2004 -0.00530 NA 0.921 -0.0589
#> 6 2005 0.442 NA -1.55 0.594
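pivot_wider() only creates columns for combinations that occur in the data, so A_EI would be missing from an automatically generated spec; building the spec from all product/country combinations keeps it. The spec printed above can be produced with expand() and unite(). A sketch assuming tidyr 1.0 or later, where the spec is applied with pivot_wider_spec():

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = rnorm(nrow(.)))

# All product/country combinations, including A_EI, which has no rows in df
spec <- df %>%
  expand(product, country, .value = "value") %>%
  unite(".name", product, country, remove = FALSE)

wide <- pivot_wider_spec(df, spec)
names(wide)
```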
More examples of working with the new tidyr concept
Tidying data, using the US Census income and rent dataset as an example.
The us_rent_income dataset contains median income and rent data for every US state for 2017 (the data comes from the tidycensus package).
us_rent_income
#> # A tibble: 104 x 5
#> GEOID NAME variable estimate moe
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 01 Alabama income 24476 136
#> 2 01 Alabama rent 747 3
#> 3 02 Alaska income 32940 508
#> 4 02 Alaska rent 1200 13
#> 5 04 Arizona income 27517 148
#> 6 04 Arizona rent 972 4
#> 7 05 Arkansas income 23789 165
#> 8 05 Arkansas rent 709 5
#> 9 06 California income 29454 109
#> 10 06 California rent 1358 3
#> # … with 94 more rows
The form in which the data is stored in us_rent_income is very inconvenient to work with, so we want to create a dataset with the columns rent, rent_moe, income and income_moe. There are many ways to build this specification, but the key point is that we need to generate every combination of the values of variable and estimate/moe, and then generate the column names.
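A sketch of one such spec, assuming tidyr 1.0 or later (pivot_wider_spec() applies the spec): expand() crosses each value of variable with estimate/moe, and mutate() derives the column names.

```r
library(dplyr)
library(tidyr)

# One spec row per variable x {estimate, moe} combination
spec <- us_rent_income %>%
  expand(variable, .value = c("estimate", "moe")) %>%
  mutate(.name = paste0(variable, ifelse(.value == "moe", "_moe", "")))

us_wide <- pivot_wider_spec(us_rent_income, spec)
names(us_wide)
```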
#> # A tibble: 6 x 3
#> field value person_id
#> <chr> <chr> <int>
#> 1 name Jiena McLellan 1
#> 2 company Toyota 1
#> 3 name John Smith 2
#> 4 company google 2
#> 5 email [email protected] 2
#> 6 name Huxley Ratcliffe 3
Now that every contact has a unique ID, we can turn field and value into columns:
#> # A tibble: 3 x 4
#> person_id name company email
#> <int> <chr> <chr> <chr>
#> 1 1 Jiena McLellan Toyota <NA>
#> 2 2 John Smith google [email protected]
#> 3 3 Huxley Ratcliffe <NA> <NA>
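The two steps behind the outputs above can be sketched as follows, assuming tidyr 1.0 or later. Every contact starts with a name field, so counting the "name" rows with cumsum() gives each contact its person_id, and pivot_wider() then spreads field/value into columns. The data is typed in by hand, and the e-mail address is a hypothetical placeholder.

```r
library(dplyr)
library(tidyr)

contacts <- tibble::tribble(
  ~field,    ~value,
  "name",    "Jiena McLellan",
  "company", "Toyota",
  "name",    "John Smith",
  "company", "google",
  "email",   "john@example.com",  # placeholder address
  "name",    "Huxley Ratcliffe"
)

# A new contact starts at every "name" row
contacts <- contacts %>% mutate(person_id = cumsum(field == "name"))

contacts_wide <- contacts %>% pivot_wider(names_from = field, values_from = value)
contacts_wide
```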
Conclusion
Personally, I find the new tidyr concept genuinely more intuitive and clearly superior in functionality to the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().