The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular libraries in the R language: the tidyverse.
The main purpose of the package is to bring data into a tidy form.

There is already a publication on Habr devoted to this package, but it dates back to 2015. I want to talk about the most recent changes, which its author, Hadley Wickham, announced a few days ago.


SJK: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will no longer recommend using these functions and will not fix bugs in them, but they will remain in the package in their current state.


If you are interested in data analysis, you may also be interested in my Telegram and YouTube channels. Most of their content is devoted to the R language.

The tidy data concept

The goal of tidyr is to help you bring data into so-called tidy form. Tidy data is data where:

  • Each variable is a column.
  • Each observation is a row.
  • Each value is a cell.

When doing analysis, data presented in tidy form is much easier and more convenient to work with.

Main functions in the tidyr package

tidyr contains a set of functions designed for transforming tables:

  • fill(): fills missing values in a column with the previous value.
  • separate(): splits one field into several using a separator.
  • unite(): combines several fields into one, the inverse of separate().
  • pivot_longer(): converts data from wide format to long format.
  • pivot_wider(): converts data from long format to wide format, the inverse of pivot_longer().
  • gather() (deprecated): converts data from wide format to long format.
  • spread() (deprecated): converts data from long format to wide format, the inverse of gather().
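
The first three helpers can be tried on a small table. The sketch below uses a made-up sales data frame (the column names and values are assumptions for illustration, not taken from the article) and relies only on fill(), separate() and unite() as exported by tidyr.

```r
library(tidyr)

# Hypothetical toy data: names and values are invented for illustration.
sales <- data.frame(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  amount = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing region downwards
filled <- fill(sales, region)

# separate(): split "2019-Q1" into two columns at the dash
split <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): glue the two columns back together, the inverse of separate()
joined <- unite(split, "period", year, quarter, sep = "-")
```

After the last step the period column holds the same strings as in the original table, which is what "inverse of separate()" means in practice.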

A new concept for converting data from wide to long format and back

Previously, gather() and spread() were used for this kind of transformation. Over the years it became clear that for most users, including the package author, the names of these functions and of their arguments were not obvious. This made the functions hard to find, and it was hard to remember which one converts a data frame from wide to long format and which does the reverse.

For this reason, two new and important functions for transforming data frames were added to tidyr.

The new functions pivot_longer() and pivot_wider() were inspired by some of the functions in the cdata package, created by John Mount and Nina Zumel.

Installing the latest version, tidyr 0.8.3.9000

To install the newest version of the package, tidyr 0.8.3.9000, in which the new functions are available, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are only available in the development version of the package on GitHub.

Switching to the new functions

In fact, porting old scripts to the new functions is not difficult. For a better understanding, I will take an example from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting wide format to long format.

Example code from the gather() documentation


# example
library(tidyr)  # gather(), spread(), pivot_longer(), pivot_wider()
library(dplyr)  # the %>% pipe
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting long format to wide format.

Example code from the spread() documentation

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

In the pivot_longer() and pivot_wider() examples above, the original stocks table has no columns with the names given in the names_to and values_to arguments, so those names must be given in quotes.

A table that makes it easiest to see how to switch to the new tidyr concept:

[Image: migration table from the old functions to the new ones, not reproduced here]

Author's note

All of the text below is an adaptation, I would even say a free translation, of the vignette from the official tidyverse website.

A simple example of converting data from wide format to long format

pivot_longer() makes datasets longer by reducing the number of columns and increasing the number of rows.


To run the examples in this article, you first need to load the required packages:

library(tidyr)
library(dplyr)
library(readr)

Suppose we have a table with the results of a survey that asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table holds the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell at the intersection of religion and income level. To bring the table into a tidy, correct format, pivot_longer() is all we need:

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

Arguments of pivot_longer()

  • The first argument, cols, describes which columns need to be collapsed. In this case, all columns except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the collapsed columns.
  • values_to gives the name of the variable that will be created from the data stored in the cells of the collapsed columns.
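
To see the three arguments in isolation, here is a minimal sketch on a made-up temps table (hypothetical data and column names, not part of the article). It assumes a tidyr version that already provides pivot_longer().

```r
library(tidyr)

# Hypothetical wide table: one row per city, one column per year.
temps <- data.frame(
  city   = c("Oslo", "Rome"),
  `2018` = c(6.1, 16.2),
  `2019` = c(6.9, 16.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

temps_long <- pivot_longer(
  temps,
  cols      = -city,        # every column except city is stacked
  names_to  = "year",       # the old column names end up here
  values_to = "mean_temp"   # the old cell values end up here
)
```

The two year columns become four rows, one per city and year, and the year values are stored as character strings because they come from column names.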

Specifications

This is new functionality in tidyr that was not available with the legacy functions.

A specification is a data frame in which each row corresponds to one column of the wide-format version of the data, and which has two special columns whose names start with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that the cell values will go into.

The remaining columns of the specification describe how the names of the collapsed columns, taken from .name, will be represented in the new columns.

A specification describes the metadata stored in the column names: one row for each column, and one column for each variable encoded in the column name. This definition may seem confusing right now, but it becomes much clearer after a few examples.

The point of a specification is that you can extract, modify and define new metadata for the data frame being transformed.

To work with specifications when converting a table from wide format to long format, use the function pivot_longer_spec().

This function takes any data frame and generates its metadata in the way described above.

As an example, let's take the who dataset that ships with tidyr. It contains information on tuberculosis incidence provided by the World Health Organization.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The fields country, iso2, iso3 and year are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.

The names of these columns store the following information:

  • The prefix new_ indicates that the column contains data on new tuberculosis cases. The current data frame only holds information on new cases, so in this context the prefix carries no meaning.
  • sp/rel/sn/ep describes how the disease was diagnosed.
  • m/f is the patient's sex.
  • 014/1524/2534/3544/4554/5564/65 is the patient's age range.

We can split these columns with the extract() function, using a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows

Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.

Sex and age (the gender and age columns) have fixed, known values, so it is a good idea to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original data frame who, we use the spec argument of pivot_longer().

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows

Everything we have just done can be depicted schematically as follows:

[Image: diagram of applying the specification to who, not reproduced here]
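
For readers on a released tidyr: to my knowledge, starting with tidyr 1.0.0 a spec is built with build_longer_spec() and applied with pivot_longer_spec(data, spec), rather than through pivot_longer(spec = ...) as in the development version described in this article. Below is a minimal sketch of the same build, edit, apply cycle under that assumption. The wide table and its column names are made up for illustration.

```r
library(tidyr)

# Hypothetical wide table; the column names encode a measure and a year.
wide <- data.frame(
  id      = 1:2,
  wt_2018 = c(70, 82),
  wt_2019 = c(71, 80)
)

# Build the spec: one row per column being pivoted,
# with the columns .name, .value and name.
spec <- build_longer_spec(wide, cols = -id, values_to = "weight")

# Edit the metadata: keep only the year part and rename the column.
# .name is left untouched, it is the index into the original columns.
spec$name <- sub("wt_", "", spec$name)
names(spec)[names(spec) == "name"] <- "year"

# Apply the edited spec to the data.
long <- pivot_longer_spec(wide, spec)
```

The result has one row per id and year, with the measurements collected in the weight column.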

A specification with multiple values (.value)

In the example above, the .value column of the specification contained only one value, which is what happens in most cases.

Occasionally, however, you need to collect data from columns whose values have different data types. With the legacy function spread() this would be quite difficult.

The example below is taken from the vignette of the data.table package.

Let's create a practice data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the resulting data frame holds data on the children of one family. A family can have one or two children. For each child we have the date of birth and the sex, and each child's data sits in separate columns. Our task is to bring this data into the right format for analysis.

Note that there are two variables for each child: sex and date of birth (columns prefixed with dob hold the date of birth, columns prefixed with gender hold the child's sex). In the expected result they should be in separate columns. We can do this by generating a specification in which the .value column takes two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

Now let's go through what the code above does, step by step.

  • pivot_longer_spec(-family): creates a specification that collapses all existing columns except the family column.
  • separate(col = name, into = c(".value", "child")): splits the name column, which holds the names of the source fields, at the underscore and puts the resulting values into the .value and child columns.
  • mutate(child = parse_number(child)): converts the values of the child field from text to a numeric data type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the argument na.rm = TRUE because the current shape of the data forces extra rows to be created for observations that do not exist. Family 2 has only one child, so na.rm = TRUE ensures that family 2 gets a single row in the output.
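
In released tidyr versions (1.0.0 and later, as far as I know), the same reshaping can be written without an explicit spec by putting the special name ".value" into names_to, and na.rm is spelled values_drop_na. The sketch below works on a two-family subset of the data above. Storing the dates with as.Date() instead of parse_date() is my simplification.

```r
library(tidyr)

# Two of the five families from the example above.
family <- data.frame(
  family        = 1:2,
  dob_child1    = as.Date(c("1998-11-26", "1996-06-22")),
  dob_child2    = as.Date(c("2000-01-29", NA)),
  gender_child1 = c(1L, 2L),
  gender_child2 = c(2L, NA)
)

children <- pivot_longer(
  family,
  cols           = -family,
  names_to       = c(".value", "child"),  # part before "_" names the output column
  names_sep      = "_",
  values_drop_na = TRUE                   # drop the non-existent second child of family 2
)
```

Here dob and gender keep their own types (date and integer) because each becomes a separate output column.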

Converting data frames from long format to wide format

pivot_wider() is the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.


This kind of transformation is rarely used to bring data into tidy form, but it can be useful for creating pivot tables for presentations or for integrating with other tools.

In fact, pivot_longer() and pivot_wider() are symmetric and perform mutually inverse operations. That is, both df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) return the original df.
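
The round trip is easy to check on a toy table. The sketch below uses names_to and names_from instead of a spec, and a made-up data frame. It is only meant to illustrate the symmetry, not to reproduce the spec-based calls above.

```r
library(tidyr)

# Hypothetical wide table.
wide <- data.frame(id = 1:2, a = c(1, 2), b = c(3, 4))

# Wide -> long -> wide again.
long <- pivot_longer(wide, cols = -id, names_to = "key", values_to = "val")
back <- pivot_wider(long, names_from = key, values_from = val)

# Apart from the tibble class, we are back where we started.
same <- isTRUE(all.equal(as.data.frame(back), wide))
```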

The simplest example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which records how different stations register the movement of fish along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

In most cases this table is more informative and easier to use if the information for each station is shown in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset only records information when a fish was detected by a station. If a fish was not registered by some station, the table has no row for that combination. This means the corresponding cells in the output are filled with NA.

In this case, however, we know that a missing record means the fish was not seen, so we can use the values_fill argument of pivot_wider() to fill these missing values with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>

Generating column names from multiple source variables

Imagine we have a table containing combinations of product, country and year. To generate a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, it is enough to pass the names_from argument a vector with the names of the fields to combine.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows

You can also apply specifications to pivot_wider(). When passed to pivot_wider(), a specification performs the opposite transformation to pivot_longer(): the columns listed in .name are created, using the values from .value and the other columns.

For this dataset you can generate a custom specification if you want every possible combination of country and product to have its own column, not just the combinations present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594
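
On a released tidyr (1.0.0 and later), a hand-built spec like this one is, to my knowledge, applied with pivot_wider_spec(data, spec) rather than pivot_wider(spec = ...). A reduced sketch under that assumption, with a made-up three-row table in place of the random data above:

```r
library(tidyr)

# Hypothetical data: product A is only sold in country AI.
df <- data.frame(
  product = c("A", "B", "B"),
  country = c("AI", "AI", "EI"),
  year    = 2000L,
  value   = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# All product/country combinations, including the absent A/EI pair.
spec <- expand(df, product, country, .value = "value")
spec <- unite(spec, ".name", product, country, remove = FALSE)

# Apply the spec; the A_EI column exists but holds NA.
wide <- pivot_wider_spec(df, spec)
```

The point of building the spec by hand is visible in the result: A_EI gets its own column even though no row in df has that combination.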

Some advanced examples of working with the new tidyr concept

Tidying data, using the US census income and rent dataset as an example.

The us_rent_income dataset contains information on median income and rent for each US state for 2017 (the dataset is available in the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

In the form in which us_rent_income stores the data, it is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income and income_moe. There are many ways to create this specification, but the key point is that we need to generate every combination of the variable values and estimate/moe, and then generate the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Passing this specification to pivot_wider() gives us the result we are looking for:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows

World Bank

Sometimes bringing a dataset into the desired form takes several steps.
The world_bank_pop dataset contains World Bank data on the population of each country, with one column per year from 2000 through 2017.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset in which each variable is in its own column. It is not clear exactly which steps are needed, but we will start with the most obvious problem: the year is spread across several columns.

To fix this, we use pivot_longer().

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.

pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* are the same but only for urban areas. Let's split these values into two variables: area, the type of area (total or urban), and a variable holding the actual data (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows

Now all that remains is to split the variable column into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows
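
The three steps can also be chained into one pipeline. This sketch assumes a released tidyr, where world_bank_pop ships with the package and the %>% pipe is re-exported. The name pop_tidy is my own.

```r
library(tidyr)

pop_tidy <- world_bank_pop %>%
  # 1. years from column names into a single year column
  pivot_longer(`2000`:`2017`, names_to = "year", values_to = "value") %>%
  # 2. "SP.URB.TOTL" -> area = "URB", variable = "TOTL" (the "SP" part is dropped)
  separate(indicator, c(NA, "area", "variable")) %>%
  # 3. one column per variable (TOTL and GROW)
  pivot_wider(names_from = variable, values_from = value)
```

Writing it as one chain avoids the intermediate pop2 and pop3 objects, at the cost of not being able to inspect each step.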

Contact list

As a last example, imagine you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "[email protected]",
  "name", "Huxley Ratcliffe"
)

Turning this list into a table is quite difficult, because there is no variable that identifies which data belongs to which contact. We can fix this by noting that the data for each new contact starts with "name". So we can create a unique identifier and increase it by one every time the field column contains the value "name":

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   [email protected]          2
#> 6 name    Huxley Ratcliffe         3

Now that we have a unique identifier for each contact, we can turn field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  [email protected]
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

My personal opinion is that the new tidyr concept really is more intuitive, and that it is significantly more capable than the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: habr.com
