Expanding nested list columns using the R language (the tidyr package and the unnest family of functions)
In most cases, when you work with a response received from an API, or with any other data that has a complex tree structure, you run into the JSON and XML formats.
These formats have many advantages: they store data quite compactly and let you avoid unnecessary duplication of information.
The disadvantage of these formats is that they are hard to process and analyze. Unstructured data cannot be used in calculations, and you cannot build visualizations on top of it.
This article is a logical continuation of the publication "R package tidyr and its new functions pivot_longer and pivot_wider". It will help you bring nested data structures into a familiar tabular form suitable for analysis, using the tidyr package, which is part of the core of the tidyverse library, and its unnest_*() family of functions.
If you are interested in data analysis, you may also be interested in my Telegram and YouTube channels. Most of the content is devoted to the R language.
Rectangling (translator's note: I did not find a suitable translation for this term, so we leave it as is) is the process of bringing unstructured data with nested arrays into a two-dimensional table made of familiar rows and columns. tidyr has several functions that help you expand nested list columns and bring data into a rectangular, tabular form:
unnest_longer() takes each element of a list column and creates a new row.
unnest_wider() takes each element of a list column and creates a new column.
unnest_auto() automatically determines which function is better to use: unnest_longer() or unnest_wider().
hoist() is similar to unnest_wider(), but it selects only the specified components and lets you work with several levels of nesting.
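The difference between the first two functions is easiest to see on a small example. The sketch below uses hypothetical toy data (the people and posts tables are not part of the article's datasets) and assumes only that tidyr is installed.

```r
library(tidyr)

# Hypothetical toy data: each row holds a named list of attributes
people <- tibble(
  id = 1:2,
  info = list(
    list(name = "Ann", age = 31L),
    list(name = "Bob", age = 45L)
  )
)

# unnest_wider(): every component of the list becomes a column
wide <- people %>% unnest_wider(info)
wide

# Hypothetical toy data: each row holds an unnamed vector of tags
posts <- tibble(
  id = 1:2,
  tags = list(c("r", "tidyr"), "json")
)

# unnest_longer(): every element of the list becomes a row
long <- posts %>% unnest_longer(tags)
long
```

Here wide keeps two rows and gains the name and age columns, while long grows to three rows because the first post has two tags.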
Most problems of bringing unstructured data with several levels of nesting into a two-dimensional table can be solved by combining the functions listed above with dplyr.
To demonstrate these techniques, we will use the repurrrsive package, which provides several complex, multi-level lists derived from web APIs.
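The text does not show the setup step, so here is a minimal sketch of it, assuming the three packages used in the examples (tidyr, dplyr and repurrrsive) are already installed from CRAN.

```r
# Attach the packages used throughout the examples
library(tidyr)        # unnest_longer(), unnest_wider(), unnest_auto(), hoist()
library(dplyr)        # select() and other verbs used alongside tidyr
library(repurrrsive)  # example lists: gh_users, gh_repos, got_chars, discog

# Quick look at the size of two of the example lists
length(gh_users)
length(got_chars)
```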
We start with gh_users, a list that contains information about six GitHub users. First, we convert the gh_users list into a tibble data frame:
users <- tibble(user = gh_users)
This seems a little counterintuitive: why turn the gh_users list into a more complex data structure? A data frame has a big advantage: it bundles several vectors together so that everything is tracked in a single object.
Each element of users is a named list, in which each element represents a column.
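The step that expands every component into its own column is not shown in this text. A minimal sketch of it, assuming the users tibble is built as above, could look like this:

```r
library(tidyr)
library(repurrrsive)

# One row per GitHub user, with a single list column
users <- tibble(user = gh_users)

# Expand every named component of each user into its own column
users_wide <- users %>% unnest_wider(user)

# Number of rows and columns after expansion
dim(users_wide)
```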
Expanding every component would give a table of 30 columns, most of which we do not need, so instead of unnest_wider() we can use hoist(). hoist() lets us pull out selected components using the same syntax as purrr::pluck():
users %>% hoist(user,
followers = "followers",
login = "login",
url = "html_url"
)
#> # A tibble: 6 x 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>
hoist() removes the specified named components from the user list column, so you can think of hoist() as moving components from the inner list up to the top level of the data frame.
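The removal can be checked directly on a small example. The sketch below uses hypothetical toy data (users_toy and its example.org URLs are made up for illustration) to show that the hoisted components leave the list column.

```r
library(tidyr)

# Hypothetical toy data shaped like a tiny gh_users
users_toy <- tibble(user = list(
  list(login = "alice", followers = 3L, html_url = "https://example.org/alice"),
  list(login = "bob", followers = 7L, html_url = "https://example.org/bob")
))

# Pull two components up to the top level
hoisted <- users_toy %>% hoist(user,
  login = "login",
  followers = "followers"
)

# New columns come first; the list column keeps what was not hoisted
names(hoisted)
lengths(hoisted$user)
```

Each element of the remaining user column now holds only html_url, which mirrors the drop from 30 to 27 components in the output above.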
GitHub repositories
We start rectangling the gh_repos list in the same way, by converting it to a tibble.
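The conversion itself is not shown in this text. A minimal sketch, assuming the list column is named repo (the name that the later calls rely on):

```r
library(tidyr)
library(repurrrsive)

# One row per GitHub user; each cell holds the list of that user's repositories
repos <- tibble(repo = gh_repos)
repos
```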
This time the elements of the repo column are lists of the repositories owned by each user. Every repository is a separate observation, so according to the tidy data concept they should become new rows, which is why we use unnest_longer() rather than unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> 7 <named list [68]>
#> 8 <named list [68]>
#> 9 <named list [68]>
#> 10 <named list [68]>
#> # … with 166 more rows
Now we can use unnest_wider() or hoist():
repos %>% hoist(repo,
login = c("owner", "login"),
name = "name",
homepage = "homepage",
watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> 7 gaborcsardi cmaker <NA> 0 <named list [65]>
#> 8 gaborcsardi cmark <NA> 0 <named list [65]>
#> 9 gaborcsardi conditions <NA> 0 <named list [65]>
#> 10 gaborcsardi crayon <NA> 52 <named list [65]>
#> # … with 166 more rows
Note the use of c("owner", "login"): it lets us reach a value two levels deep, inside the nested owner list. Alternatively, you can pull out the whole owner list and then use unnest_wider() to put each of its elements into a column.
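The alternative approach can be sketched as follows. The example uses hypothetical toy data (repos_toy is made up and much smaller than gh_repos) so that it runs without any external dataset.

```r
library(tidyr)

# Hypothetical toy data: each repository has a nested owner list
repos_toy <- tibble(repo = list(
  list(name = "pkgA", owner = list(login = "alice", id = 1L)),
  list(name = "pkgB", owner = list(login = "bob", id = 2L))
))

# Step 1: hoist the whole owner list into its own list column
# Step 2: spread the components of owner into columns
owners <- repos_toy %>%
  hoist(repo, owner = "owner") %>%
  unnest_wider(owner)
owners
```

Compared with c("owner", "login"), this form is convenient when you want most or all of the fields of the nested list rather than a single value.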
Instead of thinking about whether to choose unnest_longer() or unnest_wider(), you can use unnest_auto(). This function uses several heuristics to pick the most suitable function for transforming the data, and prints a message about the method it chose.
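A minimal sketch of this behaviour on hypothetical toy data: when the elements share names, unnest_auto() goes wide, and when no element has names, it goes long. The column names here are invented for illustration.

```r
library(tidyr)

# Elements are named lists with the same names: expanded into columns
auto_wide <- tibble(x = list(list(a = 1, b = 2), list(a = 3, b = 4))) %>%
  unnest_auto(x)

# Elements are unnamed vectors: expanded into rows
auto_long <- tibble(x = list(1:2, 3:5)) %>%
  unnest_auto(x)

auto_wide
auto_long
```

Each call also prints a message naming the function that was picked and the reason for the choice.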
got_chars has the same structure as gh_users: it is a list of named lists, where each element of the inner list describes some attribute of a Game of Thrones character. To bring got_chars into tabular form, we start by creating a data frame, as in the previous examples, and then turn each element into a separate column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> 7 <named list [18]>
#> 8 <named list [18]>
#> 9 <named list [18]>
#> 10 <named list [18]>
#> # … with 20 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 http… 1022 Theo… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 2 http… 1052 Tyri… Male "" In 2… "" TRUE <chr … <chr [… ""
#> 3 http… 1074 Vict… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 4 http… 1109 Will Male "" "" In 2… FALSE <chr … <chr [… ""
#> 5 http… 1166 Areo… Male Norvos… In 2… "" TRUE <chr … <chr [… ""
#> 6 http… 1267 Chett Male "" At H… In 2… FALSE <chr … <chr [… ""
#> 7 http… 1295 Cres… Male "" In 2… In 2… FALSE <chr … <chr [… ""
#> 8 http… 130 Aria… Female Dornish In 2… "" TRUE <chr … <chr [… ""
#> 9 http… 1303 Daen… Female Valyri… In 2… "" TRUE <chr … <chr [… ""
#> 10 http… 1319 Davo… Male Wester… In 2… "" TRUE <chr … <chr [… ""
#> # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
The structure of got_chars is somewhat more complicated than that of gh_users, because some components of the char list are themselves lists, so as a result we get list columns.
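One way to see which columns are still lists is to select them by type. The sketch below uses hypothetical toy data (chars_toy is a two-row stand-in for chars2) and assumes dplyr 1.0.0 or later for select(where(...)).

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: titles has a different length per character
chars_toy <- tibble(char = list(
  list(name = "A", titles = c("Lord", "Captain")),
  list(name = "B", titles = "")
)) %>%
  unnest_wider(char)

# Keep only the columns that are still lists after widening
list_cols <- chars_toy %>% select(where(is.list))
list_cols
```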
What you do next depends on the goals of your analysis. Perhaps you need a row for every book and TV series in which each character appears:
chars2 %>%
select(name, books, tvSeries) %>%
pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
unnest_longer(value)
#> # A tibble: 180 x 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> 7 Theon Greyjoy tvSeries Season 4
#> 8 Theon Greyjoy tvSeries Season 5
#> 9 Theon Greyjoy tvSeries Season 6
#> 10 Tyrion Lannister books A Feast for Crows
#> # … with 170 more rows
Or perhaps you want to build a table that lets you match characters with their titles:
chars2 %>%
select(name, title = titles) %>%
unnest_longer(title)
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
(Note the empty "" values in the title field. They come from an inconsistency in the got_chars input data: for characters without titles, the title field should ideally hold a vector of length 0, not a vector of length 1 containing an empty string.)
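If those placeholder rows get in the way, one possible workaround is to drop them after unnesting. This is only a sketch on hypothetical toy data (titles_toy is made up), not something the original analysis does.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: the second character has "" instead of no titles
titles_toy <- tibble(
  name = c("Theon", "Will"),
  title = list(c("Prince of Winterfell", "Captain of Sea Bitch"), "")
)

# Expand to one row per title, then drop the empty-string placeholders
cleaned <- titles_toy %>%
  unnest_longer(title) %>%
  filter(title != "")
cleaned
```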
We can rewrite the example above using unnest_auto(). This approach is convenient for a one-off analysis, but you should not rely on unnest_auto() for regular use. The point is that if the structure of your data changes, unnest_auto() may change the transformation it chooses: if it initially expanded a list column into rows using unnest_longer(), then, once the structure of the incoming data changes, it may switch to unnest_wider(), and using this approach on an ongoing basis can lead to unexpected errors.
tibble(char = got_chars) %>%
unnest_auto(char) %>%
select(name, title = titles) %>%
unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
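The risk described above can be reproduced on a small scale. In this sketch on hypothetical toy data, the only difference between the two inputs is whether the elements carry names, yet the shape of the result changes completely.

```r
library(tidyr)

# Today's data: named elements, so unnest_auto() picks unnest_wider()
stable <- tibble(x = list(c(a = 1, b = 2), c(a = 3, b = 4)))
res_stable <- unnest_auto(stable, x)

# Tomorrow's data: the names are gone, so unnest_auto() picks unnest_longer()
changed <- tibble(x = list(c(1, 2), c(3, 4)))
res_changed <- unnest_auto(changed, x)

dim(res_stable)   # 2 rows, 2 columns (a and b)
dim(res_changed)  # 4 rows, 1 column (x)
```

Code that expects columns a and b would break on the second result, which is why an explicit unnest_wider() or unnest_longer() call is the safer choice in scripts that run repeatedly.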
Geocoding with Google
Next, we will look at a more complex data structure obtained from Google's geocoding service. Caching results is against the terms of use of the Google Maps API, so I will first write a simple wrapper around the API. It relies on the Google Maps API key being stored in an environment variable; if you do not have a Google Maps API key stored in your environment variables, the code fragments in this section will not be run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
message("No Google Maps API key found; code chunks will not be run")
}
# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
url <- "https://maps.googleapis.com/maps/api/geocode/json"
url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
jsonlite::read_json(url)
}
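To see what request the wrapper sends without calling the service, the URL construction can be looked at on its own. The helper below is hypothetical (build_geocode_url and the DUMMY_KEY value are not part of the article), and it passes reserved = TRUE to URLencode(), which is an assumption added here so that characters such as & in an address are escaped as well.

```r
# Hypothetical helper: builds the request URL but does not send it
build_geocode_url <- function(address, api_key = "DUMMY_KEY") {
  base <- "https://maps.googleapis.com/maps/api/geocode/json"
  # reserved = TRUE also escapes characters like "&" and "=" in the address
  paste0(base, "?address=", utils::URLencode(address, reserved = TRUE),
         "&key=", api_key)
}

url_ny <- build_geocode_url("New York")
url_ny
```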
Fortunately, we can solve the problem of converting this data into tabular form step by step with tidyr functions. To make the task a little harder and more realistic, I will start by geocoding a few cities:
city <- c("Houston", "LA", "New York", "Chicago", "Springfield")
city_geo <- purrr::map(city, geocode)
I will convert the result into a tibble and, for convenience, add a column with the corresponding city name.
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
The first level contains the components status and results, which we can expand with unnest_wider():
loc %>%
unnest_wider(json)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
Note that results is a list of lists. Most cities have 1 element (representing a single match returned by the geocoding API), but Springfield has two. We can pull them out into separate rows with unnest_longer():
loc %>%
unnest_wider(json) %>%
unnest_longer(results)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [5]> OK
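What happens when a city really does have several matches can be sketched on hypothetical toy data (loc_toy and its place_id values are invented, and no API call is made): the city name is repeated once per match.

```r
library(tidyr)

# Hypothetical toy data: Springfield has two matches, Chicago has one
loc_toy <- tibble(
  city = c("Chicago", "Springfield"),
  results = list(
    list(list(place_id = "c1")),
    list(list(place_id = "s1"), list(place_id = "s2"))
  )
)

# One row per match; values in the other columns are repeated as needed
rows <- loc_toy %>% unnest_longer(results)
rows
```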
Now all the elements have the same components, which we can verify with unnest_wider():
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results)
#> # A tibble: 5 x 7
#> city address_componen… formatted_addre… geometry place_id types status
#> <chr> <list> <chr> <list> <chr> <lis> <chr>
#> 1 Houst… <list [4]> Houston, TX, USA <named … ChIJAYWN… <lis… OK
#> 2 LA <list [4]> Los Angeles, CA… <named … ChIJE9on… <lis… OK
#> 3 New Y… <list [3]> New York, NY, U… <named … ChIJOwg_… <lis… OK
#> 4 Chica… <list [4]> Chicago, IL, USA <named … ChIJ7cv0… <lis… OK
#> 5 Sprin… <list [5]> Springfield, MO… <named … ChIJP5jI… <lis… OK
We can find the latitude and longitude of each city by expanding geometry:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry)
#> # A tibble: 5 x 10
#> city address_compone… formatted_addre… bounds location location_type
#> <chr> <list> <chr> <list> <list> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… <named … APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… <named … APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… <named … APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… <named … APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… <named … APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
And then location, which also needs to be expanded:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Again, unnest_auto() simplifies the operation described above, with some risks that can arise when the structure of the incoming data changes:
loc %>%
unnest_auto(json) %>%
unnest_auto(results) %>%
unnest_auto(results) %>%
unnest_auto(geometry) %>%
unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
We could also look only at the first address for each city:
loc %>%
unnest_wider(json) %>%
hoist(results, first_result = 1) %>%
unnest_wider(first_result) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Or use hoist() to dive several levels deep and go directly to lat and lng.
loc %>%
hoist(json,
lat = list("results", 1, "geometry", "location", "lat"),
lng = list("results", 1, "geometry", "location", "lng")
)
#> # A tibble: 5 x 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 37.2 -93.3 <named list [2]>
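The path syntax can be tried without an API key on a mock response. The list below is a hypothetical, heavily trimmed imitation of the geocoding reply (only the fields needed for the path are included), so treat it as a sketch of the nesting rather than the real payload.

```r
library(tidyr)

# Hypothetical mock of one geocoding response, trimmed to the needed fields
mock_json <- list(
  results = list(
    list(
      formatted_address = "Houston, TX, USA",
      geometry = list(location = list(lat = 29.8, lng = -95.4))
    )
  ),
  status = "OK"
)

loc_mock <- tibble(city = "Houston", json = list(mock_json))

# Each path is a list of names and positions, walked from the outside in
coords <- loc_mock %>% hoist(json,
  lat = list("results", 1, "geometry", "location", "lat"),
  lng = list("results", 1, "geometry", "location", "lng")
)
coords
```

Names select components by key and the number 1 selects the first element of the unnamed results list, which is the same convention purrr::pluck() uses.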
Discography of Sharla Gelfand
Finally, we will look at the most complex structure: Sharla Gelfand's discography. As in the previous examples, we start by converting the list to a single-column data frame, and then expand it so that each component becomes a separate column. I also convert the date_added column to a proper date-time format in R.
discs <- tibble(disc = discog) %>%
unnest_wider(disc) %>%
mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S")))
discs
#> # A tibble: 155 x 5
#> instance_id date_added basic_information id rating
#> <int> <dttm> <list> <int> <int>
#> 1 354823933 2019-02-16 17:48:59 <named list [11]> 7496378 0
#> 2 354092601 2019-02-13 14:13:11 <named list [11]> 4490852 0
#> 3 354091476 2019-02-13 14:07:23 <named list [11]> 9827276 0
#> 4 351244906 2019-02-02 11:39:58 <named list [11]> 9769203 0
#> 5 351244801 2019-02-02 11:39:37 <named list [11]> 7237138 0
#> 6 351052065 2019-02-01 20:40:53 <named list [11]> 13117042 0
#> 7 350315345 2019-01-29 15:48:37 <named list [11]> 7113575 0
#> 8 350315103 2019-01-29 15:47:22 <named list [11]> 10540713 0
#> 9 350314507 2019-01-29 15:44:08 <named list [11]> 11260950 0
#> 10 350314047 2019-01-29 15:41:35 <named list [11]> 11726853 0
#> # … with 145 more rows
At this level we have information about when each disc was added to Sharla's discography, but we do not see any data about the discs themselves. To get it, we need to expand the basic_information column:
discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.
Unfortunately, we get an error, because the basic_information list contains a component named id, the same name as a column that already exists in the table. If such an error occurs, you can use names_repair = "unique" to quickly find its cause.
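A minimal sketch of this diagnostic on hypothetical toy data (discs_toy is a two-row stand-in for discs). It also shows names_sep as another way to avoid the clash, which is an addition here and not something the text above uses.

```r
library(tidyr)

# Hypothetical toy data: id exists both at the top level and inside the list
discs_toy <- tibble(
  id = 1:2,
  basic_information = list(
    list(id = 10L, title = "A"),
    list(id = 20L, title = "B")
  )
)

# Let tidyr rename the duplicates so that the clash becomes visible
repaired <- discs_toy %>%
  unnest_wider(basic_information, names_repair = "unique")
names(repaired)

# Alternative: prefix the new columns with the name of the list column
prefixed <- discs_toy %>%
  unnest_wider(basic_information, names_sep = "_")
names(prefixed)
```

With names_repair = "unique" the duplicated names get a positional suffix, which makes it easy to spot that id appears twice.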
You can then join them back to the original dataset as needed.
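The tables being joined are not shown in this text. As a sketch of the idea on hypothetical toy data, assume a separate artists table is split out of the nested list and keyed by disc id (discs_toy, the artists component and the disc_id key are all assumptions made for illustration):

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: every disc has one or more artists
discs_toy <- tibble(
  id = 1:2,
  basic_information = list(
    list(title = "A", artists = list(
      list(id = 100L, name = "X"),
      list(id = 101L, name = "Y")
    )),
    list(title = "B", artists = list(
      list(id = 100L, name = "X")
    ))
  )
)

# Separate table: one row per disc and artist pair
artists <- discs_toy %>%
  hoist(basic_information, artist = "artists") %>%
  select(disc_id = id, artist) %>%
  unnest_longer(artist) %>%
  unnest_wider(artist)

# Join the artists back to a disc-level table by the shared key
joined <- discs_toy %>%
  hoist(basic_information, title = "title") %>%
  select(disc_id = id, title) %>%
  left_join(artists, by = "disc_id")
joined
```

Keeping artists in their own table avoids repeating disc-level fields until a particular analysis actually needs the combined view.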
Conclusion
The core of the tidyverse library includes many useful packages united by a common data-processing philosophy.
In this article we examined the unnest_*() family of functions, which are designed for extracting elements from nested lists. The tidyr package contains many other useful features that make it easier to transform data according to the Tidy Data concept.