The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular libraries in the R language, the tidyverse.
The main purpose of the package is to bring data into tidy form.

There is already a publication on Habr devoted to this package, but it dates back to 2015. I want to tell you about the most recent changes, which were announced a few days ago by its author, Hadley Wickham.

S.J.K.: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.

If you are interested in data analysis, you may also be interested in my Telegram and YouTube channels. Most of the content is devoted to the R language.

The tidy data concept

The goal of tidyr is to help you bring data into so-called tidy form. Tidy data is data where:

  • Each variable is in its own column.
  • Each observation is a row.
  • Each value is a cell.

Data presented in tidy form is much simpler and more convenient to work with during analysis.

The main functions of the tidyr package

tidyr contains a set of functions designed to transform tables:

  • fill() - fills missing values in a column with the preceding values;
  • separate() - splits one field into several using a separator;
  • unite() - combines several fields into one; the inverse of separate();
  • pivot_longer() - converts data from wide format to long format;
  • pivot_wider() - converts data from long format to wide format; the inverse of pivot_longer();
  • gather() (obsolete) - converts data from wide format to long format;
  • spread() (obsolete) - converts data from long format to wide format; the inverse of gather().
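As a rough illustration of the first three helpers, here is a minimal sketch on a made-up table (the sales data frame and its column names are hypothetical, not taken from the article):

```r
library(tidyr)

# Hypothetical toy table: the region is only written on the first row of each group
sales <- data.frame(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing region downwards
filled <- fill(sales, region)

# separate(): split "2019-Q1" into two columns at the dash
parts <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): glue the two columns back together, undoing separate()
joined <- unite(parts, "period", year, quarter, sep = "-")
```

After the last step the period column holds the same strings as in the original table, which is what "inverse of separate()" means in practice.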

A new concept for converting data from wide to long format and back

Previously, the functions gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the package author, the names of the functions and of their arguments were not obvious. That made them hard to find, and it was hard to remember which of them converts a data frame from wide to long format and which does the reverse.

For this reason, two new, important functions for transforming data frames were added to tidyr.

The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.

Installing the most current version of tidyr, 0.8.3.9000

To install the newest version of the tidyr package, 0.8.3.9000, which contains the new functions, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are available only in the dev version of the package on GitHub.
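Since this article was written the functions have, to my knowledge, been released on CRAN as part of tidyr 1.0.0, so on a current setup the stable package should be enough. A minimal check, assuming a CRAN mirror is configured:

```r
# Install the released package only if it is missing
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr")
}

# Confirm that the installed version exports the new functions
has_pivot <- all(c("pivot_longer", "pivot_wider") %in% getNamespaceExports("tidyr"))
has_pivot
```

Note that some calls shown later in this article follow the development version and may be spelled differently in the released package.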

Migrating to the new functions

In fact, porting old scripts to the new functions is not hard. For a better understanding, I will take examples from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting wide format to long format.

Example code from the gather() function documentation

# example
library(tidyr)  # provides gather(), spread(), pivot_longer() and pivot_wider()
library(dplyr)
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting long format to wide format.

Example code from the spread() function documentation

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

In the pivot_longer() and pivot_wider() examples above, the columns named in the names_to and values_to arguments do not exist in the original stocks table, so their names must be given in quotation marks.
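gather() stacks the columns one after another, while pivot_longer() keeps the rows of each observation together, so the two results differ in row order. A minimal sketch, assuming a three-day version of the stocks table and an arbitrary seed, that checks the values match once the rows are sorted:

```r
library(tidyr)
library(dplyr)

set.seed(42)  # arbitrary seed so the sketch is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:2,
  X = rnorm(3),
  Y = rnorm(3)
)

old <- stocks %>%
  gather(key = "stock", value = "price", -time) %>%
  arrange(time, stock)

new <- stocks %>%
  pivot_longer(cols = -time, names_to = "stock", values_to = "price") %>%
  arrange(time, stock)

# Same values once the row order is aligned
same_values <- identical(old$stock, new$stock) &&
  isTRUE(all.equal(old$price, new$price))

# Going back to wide format restores the original columns
wide <- new %>% pivot_wider(names_from = stock, values_from = price)
```

Here names_from and values_from refer to columns that already exist, so they can be passed without quotes.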

A table that will help you quickly work out how to switch to the new tidyr concept.

[Image: comparison table of the legacy and the new functions]

Note from the author

All the text below is an adaptation, I would even say a free translation, of the vignette from the official tidyverse website.

A simple example of converting data from wide to long format

pivot_longer() makes datasets longer by reducing the number of columns and increasing the number of rows.

To run the examples presented in the article, you first need to load the required packages:

library(tidyr)
library(dplyr)
library(readr)

Let's say we have a table with the results of a survey that (among other things) asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table holds the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell values at the intersection of religion and income level. To bring the table into a tidy, correct format, it is enough to use pivot_longer():

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

Arguments of the pivot_longer() function

  • The first argument, cols, describes which columns need to be merged. In this case, all columns except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the merged columns.
  • values_to gives the name of the variable that will be created from the data stored in the cells of the merged columns.
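The pew table itself is read from a file that is not shown here. In released versions of tidyr the same survey data ships, as far as I know, as the built-in relig_income dataset, so the call can be sketched as follows (using relig_income as a stand-in for pew is my assumption):

```r
library(tidyr)

# relig_income: 18 religions, 10 income brackets, plus the religion column
long <- relig_income %>%
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")

# cols accepts any tidyselect expression; column positions work as well
long_by_position <- relig_income %>%
  pivot_longer(cols = 2:ncol(relig_income), names_to = "income", values_to = "count")

dim(long)
```

Both calls select the same ten columns, so they should produce the same long table.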

Specifications

This is new functionality in the tidyr package that was not available with the legacy functions.

A specification is a data frame in which each row corresponds to one column of the wide-format table, and which has two special columns whose names start with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that will hold the cell values.

The remaining columns of the specification show how the information packed into the column names from .name will appear in the new columns.

The specification describes the metadata stored in the column names, with one row for each column and one column for each variable encoded in the column name. This definition may seem confusing right now, but after a few examples it will become much clearer.

The point of the specification is that you can retrieve, modify, and define new metadata for the data frame being transformed.

To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.

This function takes any data frame and generates its metadata in the form described above.

As an example, let's take the who dataset that ships with the tidyr package. This dataset contains information provided by the World Health Organization on the incidence of tuberculosis.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The fields country, iso2, iso3 and year are already variables. Our task is to pivot the columns from new_sp_m014 through newrel_f65.

The names of these columns store the following information:

  • The prefix new_ indicates that the column contains data on new cases of tuberculosis. The current data frame contains information only on new cases, so in this context the prefix carries no meaning.
  • sp/rel/sn/ep describes the method used to diagnose the disease.
  • m/f is the patient's gender.
  • 014/1524/2534/3544/4554/5564/65 is the patient's age range.

We can split these columns using the extract() function and a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows
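To see what the regular expression captures, the three groups can be pulled out with base R's sub(). This only illustrates the pattern, not how extract() is implemented; perl = TRUE is my choice here to get the usual greedy, backtracking matching:

```r
pattern <- "new_?(.*)_(.)(.*)"
column_names <- c("new_sp_m014", "new_ep_f5564", "newrel_f65")

# Each backreference returns one capture group
diagnosis <- sub(pattern, "\\1", column_names, perl = TRUE)
gender    <- sub(pattern, "\\2", column_names, perl = TRUE)
age       <- sub(pattern, "\\3", column_names, perl = TRUE)

data.frame(column_names, diagnosis, gender, age)
```

The optional underscore after "new" is what lets the same pattern handle both new_sp_... and newrel_... names.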

Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.

Gender and age (the gender and age columns) have a fixed, known set of values, so it is recommended to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original data frame, we need to use the spec argument of the pivot_longer() function.

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows
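The calls above follow the development version described in this article. In the released tidyr (1.0.0 and later), as far as I am aware, the specification is built with build_longer_spec() and applied with pivot_longer_spec(), and the pattern can be passed directly through names_pattern. A sketch under that assumption:

```r
library(tidyr)

# Build the specification; names_pattern splits each column name into three parts
spec <- who %>%
  build_longer_spec(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"),
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  )

# Apply the specification to the original data frame
who_long <- who %>% pivot_longer_spec(spec)

dim(who_long)
```

The spec is still an ordinary data frame, so it can be inspected or adjusted before it is applied.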

Everything we have just done can be depicted schematically as follows:

[Image: diagram of the transformation]

Specification using multiple values (.value)

In the examples above, the .value column of the specification contained only one value; in most cases this is so.

But occasionally you need to collect data from columns whose values have different data types. With the legacy function gather() this would be quite hard to do.

The example below is taken from the vignette of the data.table package.

Let's create a training data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the data frame we created contains data on the children of one family. Families may have one or two children. For each child we have the date of birth and the gender, and the data for each child is in separate columns. Our task is to bring this data into the correct format for analysis.

Note that we have two variables with information about each child: their gender and their date of birth (columns with the dob prefix contain the date of birth, columns with the gender prefix contain the child's gender). In the expected result they should end up in separate columns. We can achieve this by generating a specification in which the .value column has two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

So, let's go step by step through the actions performed by the code above.

  • pivot_longer_spec(-family) - creates a specification that collapses all existing columns except the family column.
  • separate(col = name, into = c(".value", "child")) - splits the name column, which contains the names of the source fields, at the underscore and puts the resulting values into the .value and child columns.
  • mutate(child = parse_number(child)) - converts the values of the child field from text to a numeric data type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the na.rm = TRUE argument because the current shape of the data forces the creation of extra rows for non-existent observations. Because family 2 has only one child, na.rm = TRUE guarantees that family 2 will have a single row in the output.
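In released versions of tidyr the same reshaping can, to my knowledge, be written without an explicit specification by putting the special name ".value" into names_to; the na.rm argument is called values_drop_na there. A minimal sketch on a two-family subset of the table above:

```r
library(tidyr)

family <- tibble::tibble(
  family = 1:2,
  dob_child1 = as.Date(c("1998-11-26", "1996-06-22")),
  dob_child2 = as.Date(c("2000-01-29", NA)),
  gender_child1 = c(1L, 2L),
  gender_child2 = c(2L, NA)
)

# The part before "_" becomes the output column (.value), the part after it the child
children <- family %>%
  pivot_longer(
    cols = -family,
    names_to = c(".value", "child"),
    names_sep = "_",
    values_drop_na = TRUE
  )

children
```

In this sketch the child column keeps the text labels "child1" and "child2"; converting them to numbers would be a separate step.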

Converting data frames from long to wide format

pivot_wider() is the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.

This kind of transformation is rarely used to bring data into tidy form. It can, however, be useful for creating pivot tables used in presentations, or for integration with other tools.

In fact, the functions pivot_longer() and pivot_wider() are symmetric and perform mutually inverse operations, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) will return the original df.
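The round trip is easy to check on a small table. A minimal sketch that uses names_to and names_from instead of a spec (the wide table is made up for the illustration):

```r
library(tidyr)

wide <- tibble::tibble(id = 1:2, a = c(10, 20), b = c(30, 40))

long <- wide %>% pivot_longer(cols = -id, names_to = "key", values_to = "val")
back <- long %>% pivot_wider(names_from = key, values_from = val)

# TRUE when the round trip restores the original table
round_trip_ok <- isTRUE(all.equal(as.data.frame(back), as.data.frame(wide)))
round_trip_ok
```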

The simplest example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which stores information about how different stations record the movement of fish along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

In most cases this table would be more informative and easier to use if the information for each station were presented in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset records information only when a fish was detected by a station, i.e. if a fish was not recorded by some station, that data is simply absent from the table. This means the output will be filled with NA.

However, in this case we know that the absence of a record means the fish was not seen, so we can use the values_fill argument of pivot_wider() and fill the missing values with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>
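Newer releases of tidyr (1.1.0 onward, if I recall the changelog correctly) also accept a single scalar for values_fill, which is applied to every value column. A sketch under that assumption:

```r
library(tidyr)

filled <- fish_encounters %>%
  pivot_wider(names_from = station, values_from = seen, values_fill = 0)

# No missing values should remain after filling
anyNA(filled)
```

The named-list form shown above stays useful when several value columns need different fill values.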

Generating a column name from multiple source variables

Imagine we have a table containing combinations of product, country and year. To generate a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, simply pass to the names_from argument a vector containing the names of the fields to be combined.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows
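The pieces of the new column names are joined with an underscore by default; pivot_wider() has a names_sep argument to change that. A small sketch on a few made-up rows:

```r
library(tidyr)

toy <- tibble::tibble(
  product = c("A", "B", "B"),
  country = c("AI", "AI", "EI"),
  year    = 2000L,
  value   = c(1, 2, 3)
)

wide <- toy %>%
  pivot_wider(
    names_from = c(product, country),
    values_from = value,
    names_sep = "."
  )

names(wide)
```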

You can also apply specifications to the pivot_wider() function. When supplied to pivot_wider(), however, the specification performs the opposite transformation to pivot_longer(): the columns listed in .name are created, using the values from .value and the other columns.

For this dataset you can generate a custom specification if you want every possible combination of country and product to have its own column, not only those present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594
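In the released tidyr the spec argument of pivot_wider() was, as far as I know, moved into a separate function, pivot_wider_spec(). A sketch of the same idea under that assumption, with sequential numbers instead of random values so the result is reproducible:

```r
library(tidyr)
library(dplyr)

df <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year = 2000:2002
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = seq_len(n()))

# One spec row per product/country pair, including the pair absent from df
spec <- df %>%
  expand(product, country, .value = "value") %>%
  unite(".name", product, country, remove = FALSE)

wide <- df %>% pivot_wider_spec(spec)
names(wide)
```

The A_EI column exists only because the spec asks for it, so it contains nothing but NA.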

Several advanced examples of working with the new tidyr concept

Tidying data, using the US Census income and rent dataset as an example.

The us_rent_income dataset contains median income and rent information for every state in the US for 2017 (the data is available through the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

In the form in which the data is stored in the us_rent_income dataset it is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income and income_moe. There are many ways to create this specification, but the main point is that we need to generate every combination of the variable values and estimate/moe, and then generate the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Supplying this specification to pivot_wider() gives us the result we are looking for:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows
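With the released tidyr the starting specification can, to my knowledge, be generated by build_wider_spec() and then renamed, instead of being assembled by hand with expand(). A sketch under that assumption:

```r
library(tidyr)
library(dplyr)

# Default spec: one output column per variable/value-column pair
spec <- us_rent_income %>%
  build_wider_spec(names_from = variable, values_from = c(estimate, moe)) %>%
  # Rename the output columns: "income", "rent", "income_moe", "rent_moe"
  mutate(.name = paste0(variable, ifelse(.value == "moe", "_moe", "")))

wide <- us_rent_income %>% pivot_wider_spec(spec)
names(wide)
```

Editing .name is the only change to the generated spec; the mapping from variable and .value to cells stays as built.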

The World Bank

Sometimes bringing a dataset into the desired form requires several steps.
The world_bank_pop dataset contains World Bank data on the population of each country from 2000 to 2017.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset with each variable in its own column. It is not clear exactly which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.

To fix this, use the pivot_longer() function.

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.

pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth, SP.POP.TOTL is the total population, and SP.URB.* are the same measures for urban areas only. Let's split these values into two variables: area, the kind of area (total or urban), and a variable that names the actual measure (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows

Now all that is left is to spread variable and value into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows
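The three steps can also be chained into a single pipeline. The sketch below does that and, as an extra step that is not part of the walkthrough above, converts year from text to integer, since column names always arrive as character strings:

```r
library(tidyr)
library(dplyr)

pop_tidy <- world_bank_pop %>%
  # years spread over columns -> one "year" column
  pivot_longer(`2000`:`2017`, names_to = "year", values_to = "value") %>%
  # "SP.URB.TOTL" -> area = "URB", variable = "TOTL"; the "SP" piece is dropped
  separate(indicator, c(NA, "area", "variable")) %>%
  # TOTL and GROW become columns of their own
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(year = as.integer(year))

head(pop_tidy)
```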

Contact list

One last example: imagine you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "[email protected]",
  "name", "Huxley Ratcliffe"
)

Processing this list is quite difficult because there is no variable that identifies which data belongs to which contact. We can fix this by noting that the data for each new contact starts with a name ("name"), so we can create a unique identifier and increment it by one each time the value "name" is encountered in the field column:

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   [email protected]          2
#> 6 name    Huxley Ratcliffe         3
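The identifier relies on two base R facts: a comparison returns a logical vector, and cumsum() treats TRUE as 1 and FALSE as 0. A stand-alone sketch of just that mechanism:

```r
field <- c("name", "company", "name", "company", "email", "name")

# TRUE wherever a new contact starts
starts_contact <- field == "name"

# Running count of TRUE values: every row inherits the id of the latest "name"
person_id <- cumsum(starts_contact)
person_id
```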

Now that we have a unique ID for each contact, we can pivot field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  [email protected]
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

My personal opinion is that the new tidyr concept really is more intuitive, and that it is significantly superior in functionality to the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: www.habr.com