Greetings, Habr residents. A new course is about to start, and for the occasion we have prepared a translation of an interesting article.

In many of the use cases we see at our customers' sites, the relevant information is hidden in the relations between entities, for example when analysing relations between users, dependencies between items, or connections between sensors. Such use cases are typically modelled as a graph. Earlier this year, Amazon released its new graph database, Neptune. In this post, we want to share our first impressions, good practices, and what could be improved over time.
Why We Need Amazon Neptune
Graph databases promise to handle highly connected datasets better than their relational counterparts. In such datasets, the relevant information is typically stored in the relations between objects. To test Neptune, we used an amazing open data project, MusicBrainz. MusicBrainz collects every conceivable piece of metadata about music, such as information about artists, songs, album releases, or concerts, as well as who an artist collaborated with or when a release was published in which country. MusicBrainz can be seen as a huge network of entities that are somehow connected to the music industry.
The MusicBrainz dataset is provided as a CSV dump of a relational database. In total, the dump contains about 93 million rows in 157 tables. While some of these tables hold master data such as artists, events, recordings, releases, or tracks, others (the link tables) store relations between artists and recordings, other artists, releases, and so on. They reflect the graph structure of the dataset. Converting the dataset to RDF triples gave us about 500 million triples.
Based on experience and feedback from the project partners we work with, we imagine a setting in which this knowledge base is used to derive new information. We also assume that it is updated regularly, for example by adding new releases or updating band members.
Setup
As expected, setting up Amazon Neptune is easy, and the process is well documented. You can launch a graph database with just a few clicks. However, when it comes to more detailed configuration, the necessary information is hard to find. We therefore want to point out one configuration parameter.

Screenshot: configuration of parameter groups
Amazon states that Neptune focuses on low-latency transactional workloads, which is why the default request timeout is 120 seconds. However, we tested many analytical use cases in which we regularly hit this limit. The timeout can be changed by creating a new parameter group for Neptune and setting neptune_query_timeout to a suitable limit in it.
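As a rough illustration of that parameter change, the sketch below builds the parameter entry and, optionally, applies it with boto3. It assumes that the value is given in milliseconds (so the 120-second default corresponds to 120000), that a custom cluster parameter group already exists, and that the parameter is set at cluster level, which may depend on the engine version. The function names and the apply method are illustrative assumptions, not settings taken from our setup.

```python
def query_timeout_parameter(seconds: int) -> dict:
    """Build the parameter entry for neptune_query_timeout (value in milliseconds)."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {
        "ParameterName": "neptune_query_timeout",
        "ParameterValue": str(seconds * 1000),
        # Assumption: check which apply method your engine version expects.
        "ApplyMethod": "pending-reboot",
    }


def apply_query_timeout(group_name: str, seconds: int) -> None:
    """Apply the timeout to an existing cluster parameter group.

    Needs boto3 and AWS credentials; group_name is a hypothetical placeholder.
    """
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("neptune")
    client.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=group_name,
        Parameters=[query_timeout_parameter(seconds)],
    )
```

For the analytical queries discussed later, which can run for more than 20 minutes, a call such as query_timeout_parameter(30 * 60) would be a plausible starting point.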
Loading Data
Below we discuss in detail how we loaded the MusicBrainz data into Neptune.
Relations to Triples
First, we converted the MusicBrainz data to RDF triples. To do so, we defined a template for each table that specifies how each column is represented in a triple. In this example, each row of the artist table is mapped to twelve RDF triples.
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gid> "${gid}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> "${name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/sort-name> "${sort_name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/begin-date> "${begin_date_year}-${begin_date_month}-${begin_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/end-date> "${end_date_year}-${end_date_month}-${end_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/type> <http://musicbrainz.foo/artist-type/${type}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> <http://musicbrainz.foo/area/${area}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gender> <http://musicbrainz.foo/gender/${gender}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/comment> "${comment}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/edits-pending> "${edits_pending}"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/last-updated> "${last_updated}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/ended> "${ended}"^^<http://www.w3.org/2001/XMLSchema#boolean> .
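To make the template idea more concrete, here is a minimal sketch of how a single CSV row could be rendered into N-Triples lines. It is not our actual conversion code: it uses only two of the twelve templates, the helper names are hypothetical, and it assumes that empty strings and the PostgreSQL-style \N marker both stand for a missing value.

```python
from string import Template

# Two of the twelve artist templates, with the same ${column} placeholders.
ARTIST_TEMPLATES = [
    Template('<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> '
             '"${name}"^^<http://www.w3.org/2001/XMLSchema#string> .'),
    Template('<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> '
             '<http://musicbrainz.foo/area/${area}> .'),
]

# Assumption: both markers mean "no value" in the dump.
NULL_MARKERS = {"", r"\N"}


def escape_literal(value: str) -> str:
    """Escape characters that would otherwise break an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_triples(row: dict) -> list:
    """Render one row; a template is skipped when one of its columns is missing."""
    values = {column: escape_literal(value)
              for column, value in row.items()
              if value is not None and value not in NULL_MARKERS}
    triples = []
    for template in ARTIST_TEMPLATES:
        try:
            triples.append(template.substitute(values))
        except KeyError:
            # A required column has no value, so no triple is emitted.
            continue
    return triples


if __name__ == "__main__":
    for line in row_to_triples({"id": "4", "name": "Elton John", "area": r"\N"}):
        print(line)
```

Skipping a triple when a column is empty avoids emitting literals such as "--" for unknown dates, and escaping quotes and backslashes keeps each literal on one well-formed line.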
Bulk Upload
The recommended way to load large amounts of data into Neptune is the bulk upload via S3. After uploading your triple files to S3, you start the load with a POST request. In our case, this took about 24 hours for 500 million triples. We had expected it to be faster.
curl -X POST -H 'Content-Type: application/json' http://your-neptune-cluster:8182/loader -d '{
"source" : "s3://your-s3-bucket",
"format" : "ntriples",
"iamRoleArn" : "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
"region" : "eu-west-1",
"failOnError" : "FALSE"
}'
To avoid this lengthy process every time we start Neptune, we decided to restore the instance from a snapshot in which these triples were already loaded. Starting from a snapshot is considerably faster, but it still takes about an hour until Neptune is available for requests.
When we first loaded the triples into Neptune, we ran into several errors.
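For scripted setups, the loader request from the curl call could also be issued from Python. The following is a minimal sketch using only the standard library and the same placeholder values; the function names are hypothetical, and the shape of the response is not assumed beyond it being JSON.

```python
import json
import urllib.request


def build_load_request(source: str, iam_role_arn: str, region: str,
                       fmt: str = "ntriples", fail_on_error: bool = False) -> dict:
    """Assemble the JSON body with the same fields as the curl example."""
    return {
        "source": source,
        "format": fmt,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "TRUE" if fail_on_error else "FALSE",
    }


def start_bulk_load(endpoint: str, payload: dict) -> dict:
    """POST the payload to <endpoint>/loader and return the decoded JSON reply."""
    request = urllib.request.Request(
        url=endpoint.rstrip("/") + "/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_load_request(
        source="s3://your-s3-bucket",
        iam_role_arn="arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
        region="eu-west-1",
    )
    print(json.dumps(payload, indent=2))
    # start_bulk_load("http://your-neptune-cluster:8182", payload)  # needs network access
```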
{
"errorCode" : "PARSING_ERROR",
"errorMessage" : "Content after '.' is not allowed",
"fileName" : [...],
"recordNum" : 25
}
Some of them were parsing errors, as shown above. So far, we have not figured out what exactly went wrong here. A bit more detail in the error message would definitely help. This error occurred for about 1% of the inserted triples. For the purpose of testing Neptune, however, we accepted that we only work with 99% of the data from MusicBrainz.
Even though this is no problem for people familiar with SPARQL, keep in mind that RDF triples have to be annotated with explicit data types, which can again cause errors.
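Since the loader only reports a file name and record number, it can help to check the generated files before uploading them. The sketch below is a rough heuristic, not a complete N-Triples parser, and we did not verify that it would have caught the errors above. It flags lines that do not look like a single well-formed triple, for example because of unescaped quotes inside a literal or a prefixed name in front of a datatype IRI. All names are hypothetical.

```python
import re

# Rough building blocks; a heuristic, not the full N-Triples grammar.
IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.-]*'
LANG = r'@[A-Za-z]+(?:-[A-Za-z0-9]+)*'
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI + '|' + LANG + ')?'
TRIPLE = re.compile(
    r'^(?:' + IRI + '|' + BNODE + r')\s+'
    + IRI + r'\s+'
    + r'(?:' + IRI + '|' + BNODE + '|' + LITERAL + r')'
    + r'\s*\.\s*(?:#.*)?$'
)


def find_invalid_lines(lines) -> list:
    """Return (line number, text) for every line that does not look like one triple."""
    invalid = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not TRIPLE.match(stripped):
            invalid.append((number, stripped))
    return invalid


if __name__ == "__main__":
    sample = [
        '<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/name> '
        '"A"^^<http://www.w3.org/2001/XMLSchema#string> .',
        '<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/comment> '
        '"a "quoted" word"^^<http://www.w3.org/2001/XMLSchema#string> .',
    ]
    for number, text in find_invalid_lines(sample):
        print(number, text)
```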
Streaming Load
As mentioned above, we do not want to use Neptune as a static data store, but as a flexible and evolving knowledge base. We therefore needed ways to insert new triples whenever the knowledge base changes, for example when a new album is released or when we want to materialize derived knowledge.
Neptune supports insert operations via SPARQL queries, both with explicit data and based on graph patterns. We discuss both approaches below.
One of our goals was to insert data in a streaming fashion. Consider the release of an album in a new country. From the MusicBrainz point of view, this means that for a release (which covers albums, singles, EPs, and so on) a new record is added to the release-country table. In RDF, we map this information to two new triples.
INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/release> <http://musicbrainz.foo/release/435759> };
INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/date-year> "2018"^^<http://www.w3.org/2001/XMLSchema#int> };
Another goal was to derive new knowledge from the graph. Suppose we want to obtain the number of releases each artist has published over their career. Such a query is fairly complex and takes more than 20 minutes in Neptune, so we need to materialize the result in order to reuse this new knowledge in other queries. We therefore add triples with this information back to the graph by inserting the result of a subquery.
INSERT {
?artist_credit <http://musicbrainz.foo/number-of-releases> ?number_of_releases
} WHERE {
SELECT ?artist_credit (COUNT(*) as ?number_of_releases)
WHERE {
?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
?release_group <http://musicbrainz.foo/name> ?release_group_name .
}
GROUP BY ?artist_credit
}
Adding single triples to the graph takes a few milliseconds, whereas the execution time for inserting the result of a subquery depends on the execution time of the subquery itself.
Although we did not use it as often, Neptune also lets you delete triples based on patterns or explicit data, which can be used to update information.
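The single-triple inserts described in this section are easy to generate from an application. As a minimal sketch, the following builds the update for one new release-country record and sends it as a form-encoded update parameter to the SPARQL endpoint. The function names are hypothetical, and both triples are batched into one INSERT DATA statement here, which is a design choice of this sketch rather than something we measured.

```python
import urllib.parse
import urllib.request

BASE = "http://musicbrainz.foo"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def release_country_update(release_country_id: int, release_id: int, year: int) -> str:
    """Build one INSERT DATA statement with both triples of a release-country record."""
    # int() guards against anything but plain numbers ending up in the IRIs.
    subject = f"<{BASE}/release-country/{int(release_country_id)}>"
    return (
        "INSERT DATA {\n"
        f"  {subject} <{BASE}/release> <{BASE}/release/{int(release_id)}> .\n"
        f'  {subject} <{BASE}/date-year> "{int(year)}"^^<{XSD_INT}> .\n'
        "}"
    )


def send_update(endpoint: str, update: str) -> bytes:
    """POST the update as a form-encoded 'update' parameter to <endpoint>/sparql."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/sparql", data=body, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    print(release_country_update(737041, 435759, 2018))
    # send_update("http://your-neptune-cluster:8182", ...)  # needs network access
```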
SPARQL Queries
With the subquery above, which returns the number of releases for each artist, we have already introduced the first type of query we want to answer with Neptune. Issuing a query against Neptune is simple: send a POST request to the SPARQL endpoint, as shown below:
curl -X POST --data-binary 'query=SELECT ?artist ?p ?o where {?artist <http://musicbrainz.foo/name> "Elton John" . ?artist ?p ?o . }' http://your-neptune-cluster:8182/sparql
We also wrote a query that returns artist profiles with information about name, age, and country of origin. Keep in mind that artists can be persons, bands, or orchestras. We additionally enrich this data with the number of releases each artist published per year. For solo artists, we also include the bands they were part of in each year.
SELECT
?artist_name ?year
?releases_in_year ?releases_up_year
?artist_type_name ?releases
?artist_gender ?artist_country_name
?artist_begin_date ?bands
?bands_in_year
WHERE {
# Bands for each artist
{
SELECT
?year
?first_artist
(group_concat(DISTINCT ?second_artist_name;separator=",") as ?bands)
(COUNT(DISTINCT ?second_artist_name) AS ?bands_in_year)
WHERE {
VALUES ?year {
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010 2011 2012 2013 2014 2015 2016 2017 2018
}
?first_artist <http://musicbrainz.foo/name> "Elton John" .
?first_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
?first_artist <http://musicbrainz.foo/type> ?first_artist_type .
?first_artist <http://musicbrainz.foo/name> ?first_artist_name .
?second_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
?second_artist <http://musicbrainz.foo/type> ?second_artist_type .
?second_artist <http://musicbrainz.foo/name> ?second_artist_name .
optional { ?second_artist <http://musicbrainz.foo/begin-date-year> ?second_artist_begin_date_year . }
optional { ?second_artist <http://musicbrainz.foo/end-date-year> ?second_artist_end_date_year . }
?l_artist_artist <http://musicbrainz.foo/entity0> ?first_artist .
?l_artist_artist <http://musicbrainz.foo/entity1> ?second_artist .
?l_artist_artist <http://musicbrainz.foo/link> ?link .
optional { ?link <http://musicbrainz.foo/begin-date-year> ?link_begin_date_year . }
optional { ?link <http://musicbrainz.foo/end-date-year> ?link_end_date_year . }
FILTER (!bound(?link_begin_date_year) || ?link_begin_date_year <= ?year)
FILTER (!bound(?link_end_date_year) || ?link_end_date_year >= ?year)
FILTER (!bound(?second_artist_begin_date_year) || ?second_artist_begin_date_year <= ?year)
FILTER (!bound(?second_artist_end_date_year) || ?second_artist_end_date_year >= ?year)
FILTER (?first_artist_type NOT IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
FILTER (?second_artist_type IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
}
GROUP BY ?first_artist ?year
}
# Releases up to a year
{
SELECT
?artist
?year
(group_concat(DISTINCT ?release_name;separator=",") as ?releases)
(COUNT(*) as ?releases_up_year)
WHERE {
VALUES ?year {
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010 2011 2012 2013 2014 2015 2016 2017 2018
}
?artist <http://musicbrainz.foo/name> "Elton John" .
?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
?release_group <http://musicbrainz.foo/name> ?release_group_name .
?release <http://musicbrainz.foo/release-group> ?release_group .
?release <http://musicbrainz.foo/name> ?release_name .
?release_country <http://musicbrainz.foo/release> ?release .
?release_country <http://musicbrainz.foo/date-year> ?release_country_year .
FILTER (?release_country_year <= ?year)
}
GROUP BY ?artist ?year
}
# Releases in a year
{
SELECT ?artist ?year (COUNT(*) as ?releases_in_year)
WHERE {
VALUES ?year {
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010 2011 2012 2013 2014 2015 2016 2017 2018
}
?artist <http://musicbrainz.foo/name> "Elton John" .
?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
?release_group <http://musicbrainz.foo/name> ?release_group_name .
?release <http://musicbrainz.foo/release-group> ?release_group .
?release_country <http://musicbrainz.foo/release> ?release .
?release_country <http://musicbrainz.foo/date-year> ?release_country_year .
FILTER (?release_country_year = ?year)
}
GROUP BY ?artist ?year
}
# Master data
{
SELECT DISTINCT ?artist ?artist_name ?artist_gender ?artist_begin_date ?artist_country_name
WHERE {
?artist <http://musicbrainz.foo/name> ?artist_name .
?artist <http://musicbrainz.foo/name> "Elton John" .
?artist <http://musicbrainz.foo/gender> ?artist_gender_id .
?artist_gender_id <http://musicbrainz.foo/name> ?artist_gender .
?artist <http://musicbrainz.foo/area> ?birth_area .
?artist <http://musicbrainz.foo/begin-date-year> ?artist_begin_date.
?birth_area <http://musicbrainz.foo/name> ?artist_country_name .
FILTER(datatype(?artist_begin_date) = <http://www.w3.org/2001/XMLSchema#int>)
}
}
}
Because of the complexity of this query, we could only run point queries for a specific artist, such as Elton John, but not for all artists. Neptune does not seem to optimize such a query by pushing the filters down into the subselects. Each subselect therefore has to be filtered manually by the artist's name.
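Because the artist name and the list of years have to be repeated in every subselect, it can be convenient to generate these fragments in one place when building the query text. The helpers below are a hypothetical sketch of that idea, not part of our original tooling; they only produce the repeated fragments and leave the rest of the query untouched.

```python
def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def name_constraint(variable: str, artist_name: str) -> str:
    """Triple pattern that pins a variable to the artist with the given name."""
    return ("?" + variable + " <http://musicbrainz.foo/name> "
            + sparql_string(artist_name) + " .")


def year_values(first: int, last: int, per_line: int = 10) -> str:
    """VALUES block listing every year from first to last, inclusive."""
    years = [str(year) for year in range(first, last + 1)]
    rows = [" ".join(years[i:i + per_line]) for i in range(0, len(years), per_line)]
    return "VALUES ?year {\n" + "\n".join(rows) + "\n}"


if __name__ == "__main__":
    # The same two fragments would be placed into each of the subselects.
    print(year_values(1960, 2018))
    print(name_constraint("first_artist", "Elton John"))
    print(name_constraint("artist", "Elton John"))
```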
Neptune has both per-hour and per-I/O charges. For our tests, we used the smallest Neptune instance, which costs $0.384/hour. For the query above, which computes the profile of a single artist, Amazon charges us for a few tens of thousands of I/O operations, which amounts to a cost of about $0.02.
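The arithmetic behind such an estimate can be sketched as follows. The per-million I/O price used in the example is an assumed placeholder, not a quoted price, and the number of I/O requests a query causes is only known after the fact; the function and parameter names are hypothetical.

```python
def query_cost_usd(io_requests: int, runtime_seconds: float,
                   price_per_million_io: float,
                   price_per_instance_hour: float) -> dict:
    """Split the cost of one query into its I/O share and its instance-time share."""
    io_cost = io_requests / 1_000_000 * price_per_million_io
    instance_cost = runtime_seconds / 3600 * price_per_instance_hour
    return {"io": io_cost, "instance": instance_cost, "total": io_cost + instance_cost}


if __name__ == "__main__":
    # Assumed inputs: 100,000 I/O requests at 0.20 USD per million requests,
    # and one minute of runtime on an instance billed at 0.384 USD per hour.
    estimate = query_cost_usd(100_000, 60, 0.20, 0.384)
    print({name: round(value, 4) for name, value in estimate.items()})
```

With these assumed inputs the I/O share comes to $0.02, the same order of magnitude as the figure above. The hard part in practice is predicting io_requests before running the query.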
Conclusion
First of all, Amazon Neptune delivers on most of its promises. As a managed service, it is a graph database that is very easy to set up and can be up and running without much configuration. Here are our five key takeaways:
- Bulk upload is simple but slow. It can be complicated by error messages that are not very helpful.
- Streaming load supported everything we expected and was fast.
- Queries are easy to issue, but not interactive enough for analytical workloads.
- SPARQL queries have to be optimized manually.
- Amazon costs are hard to estimate because it is difficult to predict how much data a SPARQL query will scan.
That's all for now.
Source: www.habr.com
