Expanding nested list-columns in R (the tidyr package and the unnest family of functions)
When you work with an API response or other data that has a complex tree structure, you will very often run into the JSON and XML formats.
These formats have many advantages: they store data compactly and avoid unnecessary duplication.
Their drawback is that such data is harder to process and analyze. Unstructured data cannot be used directly in calculations, and you cannot build visualizations on top of it.
This article is a logical continuation of the post "The R package tidyr and its new functions pivot_longer and pivot_wider". It will help you bring unstructured data into a familiar tabular form suitable for analysis, using tidyr, a package in the tidyverse core, and its unnest_*() family of functions.
If you are interested in data analysis, you may also be interested in my Telegram and YouTube channels. Most of the content is devoted to the R language.
Rectangling is the process of taking unstructured data with nested arrays and bringing it into a familiar two-dimensional table of rows and columns. tidyr has several functions that help expand nested list-columns and reduce data to a rectangular, tabular form:
unnest_longer() takes each element of a list-column and makes a new row.
unnest_wider() takes each element of a list-column and makes a new column.
unnest_auto() automatically decides which of unnest_longer() and unnest_wider() is the better fit.
hoist() is similar to unnest_wider() but extracts only the components you specify, and lets you reach through several levels of nesting.
Most problems of turning unstructured data with several levels of nesting into a two-dimensional table can be solved by combining these functions with dplyr.
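A minimal sketch of how these pieces fit together, on a small hand-built tibble (the data and the column names `id`, `info`, `key` and `value` are invented for illustration; assumes tidyr and dplyr are installed): unnest_longer() gives one row per element of each inner list, unnest_wider() gives one column per named component, and ordinary dplyr verbs take over once the data is rectangular.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: each cell of `info` holds a list of named records
df <- tibble(
  id = c(1, 2),
  info = list(
    list(list(key = "a", value = 10), list(key = "b", value = 20)),
    list(list(key = "c", value = 30))
  )
)

# unnest_longer(): one row per record (3 rows in total)
long <- df %>% unnest_longer(info)

# unnest_wider(): one column per named component of each record
wide <- long %>% unnest_wider(info)

# Once the data is rectangular, regular dplyr verbs apply
big <- wide %>% filter(value > 15)
big
```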
To demonstrate these techniques we will use the repurrrsive package, which provides several complex, multi-level lists obtained from web APIs.
GitHub users
Let's start with gh_users, a list containing information about six GitHub users. First we convert the gh_users list into a tibble:
library(tidyr)
library(dplyr)
library(repurrrsive)
users <- tibble(user = gh_users)
This may seem counterintuitive: why turn the gh_users list into an even more complex data structure? But a data frame has a big advantage: it bundles several vectors together, so that everything is tracked in a single object.
Each element of user is a named list in which each element represents a column.
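The next paragraph describes the table that unnest_wider() produces from this column, but the call itself is not shown. A minimal reconstruction, assuming the tidyr and repurrrsive packages are installed (`users_wide` is an illustrative name):

```r
library(tidyr)
library(repurrrsive)

users <- tibble(user = gh_users)

# Turn every named component of each user record into its own column
users_wide <- users %>% unnest_wider(user)
dim(users_wide)
```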
Here that would give a table with 30 columns, most of which we don't need, so instead of unnest_wider() we can use hoist(). hoist() lets you pull out selected components using the same syntax as purrr::pluck():
users %>% hoist(user,
  followers = "followers",
  login = "login",
  url = "html_url"
)
#> # A tibble: 6 x 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>
hoist() removes the named components from the user list-column, so you can think of hoist() as moving components out of the inner list of a data frame up to its top level.
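An offline sketch of this behaviour on a hand-made list-column (the records and field names are hypothetical, loosely modelled on gh_users; assumes tidyr is installed): by default the hoisted component disappears from the remaining list, while `.remove = FALSE` copies the value and leaves the list untouched.

```r
library(tidyr)

# Hypothetical records shaped a little like the gh_users entries
people <- tibble(
  user = list(
    list(login = "alice", followers = 10L, site = "a.example"),
    list(login = "bob",   followers = 3L,  site = "b.example")
  )
)

# Default: `login` becomes a column and is removed from each list
moved <- people %>% hoist(user, login = "login")
names(moved$user[[1]])

# .remove = FALSE: `login` is copied, the lists keep all components
copied <- people %>% hoist(user, login = "login", .remove = FALSE)
names(copied$user[[1]])
```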
GitHub repositories
We start flattening the gh_repos list the same way, by converting it to a tibble:
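The code that creates `repos` is missing here. Judging by the next step, which works on a `repo` column, it presumably looks like this (assumes tidyr and repurrrsive are installed):

```r
library(tidyr)
library(repurrrsive)

# One row per GitHub user; each cell holds that user's list of repositories
repos <- tibble(repo = gh_repos)
repos
```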
This time each element of repo is a list of the repositories that belong to one user. Each repository is a separate observation, so according to the tidy data concept they should become new rows; that is why we use unnest_longer() rather than unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> 7 <named list [68]>
#> 8 <named list [68]>
#> 9 <named list [68]>
#> 10 <named list [68]>
#> # … with 166 more rows
Now we can use unnest_wider() or hoist():
repos %>% hoist(repo,
  login = c("owner", "login"),
  name = "name",
  homepage = "homepage",
  watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> 7 gaborcsardi cmaker <NA> 0 <named list [65]>
#> 8 gaborcsardi cmark <NA> 0 <named list [65]>
#> 9 gaborcsardi conditions <NA> 0 <named list [65]>
#> 10 gaborcsardi crayon <NA> 52 <named list [65]>
#> # … with 166 more rows
Note the use of c("owner", "login"): it lets us pull a second-level value out of the nested owner list. An alternative approach is to extract the whole owner list and then use unnest_wider() to put each of its elements into its own column:
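A sketch of that alternative, repeating the earlier steps so it runs on its own (assumes tidyr and repurrrsive are installed; `owners` is an illustrative name):

```r
library(tidyr)
library(repurrrsive)

# One row per repository, as above
repos <- tibble(repo = gh_repos) %>% unnest_longer(repo)

# Pull out the whole `owner` record, then spread its fields into columns
owners <- repos %>%
  hoist(repo, owner = "owner") %>%
  unnest_wider(owner)
```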
Instead of deciding whether unnest_longer() or unnest_wider() is the right function, you can use unnest_auto(). This function uses several heuristics to choose the most suitable transformation and prints a message about the choice it made.
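A sketch of the same gh_repos flattening done with unnest_auto() alone (assumes tidyr and repurrrsive are installed; `auto` is an illustrative name). Each call prints a message saying whether it chose unnest_longer() or unnest_wider():

```r
library(tidyr)
library(repurrrsive)

# First call: the inner lists are unnamed, so rows are expected;
# second call: the repository records are named, so columns are expected
auto <- tibble(repo = gh_repos) %>%
  unnest_auto(repo) %>%
  unnest_auto(repo)
```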
Game of Thrones characters
got_chars has the same structure as gh_users: it is a set of named lists, where each element of the inner list describes some attribute of a Game of Thrones character. To bring got_chars into tabular form, we start, as in the previous examples, by creating a data frame, and then turn each element into a separate column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> 7 <named list [18]>
#> 8 <named list [18]>
#> 9 <named list [18]>
#> 10 <named list [18]>
#> # … with 20 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 http… 1022 Theo… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 2 http… 1052 Tyri… Male "" In 2… "" TRUE <chr … <chr [… ""
#> 3 http… 1074 Vict… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 4 http… 1109 Will Male "" "" In 2… FALSE <chr … <chr [… ""
#> 5 http… 1166 Areo… Male Norvos… In 2… "" TRUE <chr … <chr [… ""
#> 6 http… 1267 Chett Male "" At H… In 2… FALSE <chr … <chr [… ""
#> 7 http… 1295 Cres… Male "" In 2… In 2… FALSE <chr … <chr [… ""
#> 8 http… 130 Aria… Female Dornish In 2… "" TRUE <chr … <chr [… ""
#> 9 http… 1303 Daen… Female Valyri… In 2… "" TRUE <chr … <chr [… ""
#> 10 http… 1319 Davo… Male Wester… In 2… "" TRUE <chr … <chr [… ""
#> # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
The structure of got_chars is a little more complex than that of gh_users, because some components of char are themselves lists, so we end up with list-columns:
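One way to see those list-columns is to keep only the columns that are still lists after unnest_wider(). A sketch, assuming tidyr, dplyr (1.0 or later, for `where()`) and repurrrsive are installed; `list_cols` is an illustrative name:

```r
library(tidyr)
library(dplyr)
library(repurrrsive)

chars2 <- tibble(char = got_chars) %>% unnest_wider(char)

# Keep only the columns that are still lists
list_cols <- chars2 %>% select(where(is.list))
names(list_cols)
```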
What you do next depends on the goals of your analysis. Perhaps you want one row for each book and TV series the character appears in:
chars2 %>%
  select(name, books, tvSeries) %>%
  pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
  unnest_longer(value)
#> # A tibble: 180 x 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> 7 Theon Greyjoy tvSeries Season 4
#> 8 Theon Greyjoy tvSeries Season 5
#> 9 Theon Greyjoy tvSeries Season 6
#> 10 Tyrion Lannister books A Feast for Crows
#> # … with 170 more rows
Or perhaps you want a table that matches each character to their titles:
chars2 %>%
  select(name, title = titles) %>%
  unnest_longer(title)
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
(Note the empty values "" in the title column. They come from an error in the got_chars input data: characters without titles should have a length-0 vector in titles, not a length-1 vector containing an empty string.)
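A toy illustration of why the length matters (the data is invented; assumes tidyr and dplyr are installed): with unnest_longer(), a length-0 vector contributes no row at all, while "" contributes a row with an empty title. Filtering afterwards is one simple way to drop such placeholder rows; it is a cleanup suggestion, not part of the original workflow.

```r
library(tidyr)
library(dplyr)

# Hypothetical characters: "b" stores "no titles" as a length-0 vector,
# "c" stores it as an empty string, like the got_chars data
toy <- tibble(
  name = c("a", "b", "c"),
  title = list(c("Lord", "Knight"), character(0), "")
)

# "b" produces no row; "c" produces a row with title ""
rows <- toy %>% unnest_longer(title)

# Drop the empty-string placeholders after the fact
clean <- rows %>% filter(title != "")
```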
We can rewrite the example above using unnest_auto(). This approach is convenient for one-off analysis, but you should not rely on unnest_auto() in code you run regularly. The problem is that if your data structure changes, unnest_auto() may change the transformation it picks: a column it originally expanded into rows with unnest_longer() might be expanded with unnest_wider() once the structure of the incoming data changes, so relying on it in routine code can lead to unexpected errors.
tibble(char = got_chars) %>%
  unnest_auto(char) %>%
  select(name, title = titles) %>%
  unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
Geocoding with Google
Next we'll look at a more complex data structure, returned by Google's geocoding service. Caching this data is against the Google Maps API terms of service, so I first write a simple wrapper around the API. It relies on a Google Maps API key stored in an environment variable; if no key is stored there, the code chunks in this section will not be run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
  message("No Google Maps API key found; code chunks will not be run")
}

# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  url <- "https://maps.googleapis.com/maps/api/geocode/json"
  url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
  jsonlite::read_json(url)
}
The list this function returns is quite complex, with several levels of nesting.
Fortunately, we can turn this data into tabular form step by step with tidyr functions. To make the task a little more challenging and realistic, I'll start by geocoding several cities:
city <- c("Houston", "LA", "New York", "Chicago", "Springfield")
city_geo <- purrr::map(city, geocode)
I'll convert the result to a tibble and, for convenience, add a column with the corresponding city name.
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
The first level contains the components status and results, which we can expand with unnest_wider():
loc %>%
  unnest_wider(json)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
Note that results is a list of lists. Most cities have one element (a unique match from the geocoding API), but an ambiguous name such as Springfield can return more than one (in the output shown here, every city returned a single match). We can move the elements into separate rows with unnest_longer():
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [5]> OK
Now all these elements have the same components, as unnest_wider() shows:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results)
#> # A tibble: 5 x 7
#> city address_componen… formatted_addre… geometry place_id types status
#> <chr> <list> <chr> <list> <chr> <lis> <chr>
#> 1 Houst… <list [4]> Houston, TX, USA <named … ChIJAYWN… <lis… OK
#> 2 LA <list [4]> Los Angeles, CA… <named … ChIJE9on… <lis… OK
#> 3 New Y… <list [3]> New York, NY, U… <named … ChIJOwg_… <lis… OK
#> 4 Chica… <list [4]> Chicago, IL, USA <named … ChIJ7cv0… <lis… OK
#> 5 Sprin… <list [5]> Springfield, MO… <named … ChIJP5jI… <lis… OK
We can get the latitude and longitude of each city by expanding the geometry list:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry)
#> # A tibble: 5 x 10
#> city address_compone… formatted_addre… bounds location location_type
#> <chr> <list> <chr> <list> <list> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… <named … APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… <named … APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… <named … APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… <named … APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… <named … APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Then we need to expand location:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Once again, unnest_auto() simplifies the operation, at the cost of the risks that come with changes in the structure of the incoming data:
loc %>%
  unnest_auto(json) %>%
  unnest_auto(results) %>%
  unnest_auto(results) %>%
  unnest_auto(geometry) %>%
  unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
We can also look only at the first address for each city:
loc %>%
  unnest_wider(json) %>%
  hoist(results, first_result = 1) %>%
  unnest_wider(first_result) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Or use hoist() to dive several levels deep and go straight to lat and lng:
loc %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
#> # A tibble: 5 x 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 37.2 -93.3 <named list [2]>
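Because the geocoding calls need a live API key, here is an offline sketch of the same deep hoist() on a hand-built stand-in for a single response (assumes tidyr is installed). It keeps only the fields used by the path, the coordinates are the rounded Houston values printed above, and `loc_toy` and `coords` are illustrative names; it is not a real API response.

```r
library(tidyr)

# Hand-built stand-in for one geocoding response (not real API output)
loc_toy <- tibble(
  city = "Houston",
  json = list(list(
    results = list(list(
      geometry = list(location = list(lat = 29.8, lng = -95.4))
    )),
    status = "OK"
  ))
)

# Each list() path is followed one level at a time, as in purrr::pluck():
# name "results", then position 1, then "geometry", "location", "lat"
coords <- loc_toy %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
coords
```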
Sharla Gelfand's discography
Finally, we'll tackle the most complex structure: Sharla Gelfand's discography. As in the examples above, we start by converting the list into a single-column data frame and then expand it so that each component becomes a separate column. I will also convert the date_added column to a proper date-time format in R.
discs <- tibble(disc = discog) %>%
  unnest_wider(disc) %>%
  mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S")))
discs
#> # A tibble: 155 x 5
#> instance_id date_added basic_information id rating
#> <int> <dttm> <list> <int> <int>
#> 1 354823933 2019-02-16 17:48:59 <named list [11]> 7496378 0
#> 2 354092601 2019-02-13 14:13:11 <named list [11]> 4490852 0
#> 3 354091476 2019-02-13 14:07:23 <named list [11]> 9827276 0
#> 4 351244906 2019-02-02 11:39:58 <named list [11]> 9769203 0
#> 5 351244801 2019-02-02 11:39:37 <named list [11]> 7237138 0
#> 6 351052065 2019-02-01 20:40:53 <named list [11]> 13117042 0
#> 7 350315345 2019-01-29 15:48:37 <named list [11]> 7113575 0
#> 8 350315103 2019-01-29 15:47:22 <named list [11]> 10540713 0
#> 9 350314507 2019-01-29 15:44:08 <named list [11]> 11260950 0
#> 10 350314047 2019-01-29 15:41:35 <named list [11]> 11726853 0
#> # … with 145 more rows
At this level we learn when each disc was added to Sharla's discography, but we don't see any information about the discs themselves. For that we need to expand the basic_information column:
discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.
Unfortunately, this fails: the basic_information list contains a component named id, and the top level already has a column with the same name. When an error like this occurs, you can quickly see what is going on by using names_repair = "unique":
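A sketch of that diagnosis, followed by one possible fix: dropping the top-level `id` before unnesting, on the assumption that the inner `id` repeats it. Assumes tidyr, dplyr and repurrrsive are installed; `repaired` and `fixed` are illustrative names.

```r
library(tidyr)
library(dplyr)
library(repurrrsive)

discs <- tibble(disc = discog) %>% unnest_wider(disc)

# Let tidyr rename the duplicates instead of failing, to expose the clash
repaired <- discs %>% unnest_wider(basic_information, names_repair = "unique")
grep("^id", names(repaired), value = TRUE)

# One possible fix: drop the outer `id` before unnesting
fixed <- discs %>%
  select(-id) %>%
  unnest_wider(basic_information)
```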
Once the pieces you need have been extracted, you can join them back to the original dataset as needed.
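One way to do this, sketched under assumptions: each `basic_information` record is assumed to contain an `artists` list whose entries have a `name` field, and `disc_artists` and `discs_with_artists` are illustrative names (assumes tidyr, dplyr and repurrrsive are installed). The artists go into their own table, one row per disc and artist, which is then joined back onto the top-level data by disc id.

```r
library(tidyr)
library(dplyr)
library(repurrrsive)

discs <- tibble(disc = discog) %>% unnest_wider(disc)

# Separate table: one row per (disc, artist) pair
disc_artists <- discs %>%
  hoist(basic_information, artist = "artists") %>%
  select(disc_id = id, artist) %>%
  unnest_longer(artist) %>%
  unnest_wider(artist) %>%
  select(disc_id, artist_name = name)

# Join the artist names back onto the top-level disc data
discs_with_artists <- discs %>%
  select(id, date_added) %>%
  left_join(disc_artists, by = c("id" = "disc_id"))
```

A left join keeps every disc; a disc with several artists appears once per artist.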
Conclusion
The tidyverse core includes many useful packages united by a common philosophy of data processing.
In this article we looked at the unnest_*() family of functions, which are designed for extracting elements from nested lists. The tidyr package contains many other useful functions that make it easier to transform data according to the tidy data concept.