First impressions of Amazon Neptune

Hello, Habr readers. Ahead of the start of the course "AWS for Developers", we have prepared a translation of an interesting article.

In many of the use cases that we at bakdata see at our customers' sites, the relevant information is hidden in the relationships between entities, for example when analyzing relationships between users, dependencies between elements, or connections between sensors. Such use cases are usually modeled as a graph. Earlier this year, Amazon released its new graph database, Neptune. In this post we want to share our first impressions, good practices, and what could be improved over time.

Why we needed Amazon Neptune

Graph databases promise to handle highly connected datasets better than their relational equivalents. In such datasets, the relevant information is usually stored in the relationships between objects. To test Neptune we used an amazing open data project, MusicBrainz. MusicBrainz collects every kind of music metadata imaginable, such as information about artists, songs, album releases or concerts, as well as who the artist behind a song collaborated with or when an album was released in which country. MusicBrainz can be seen as a huge network of entities that are in some way connected to the music industry.

The MusicBrainz dataset is provided as a CSV dump of a relational database. In total, the dump contains about 93 million rows in 157 tables. While some of these tables contain master data such as artists, events, recordings, releases or tracks, others, the link tables, store relationships between artists and recordings, other artists or releases, and so on. They reflect the graph structure of the dataset. When converting the dataset to RDF triples, we obtained approximately 500 million instances.

Based on the experience and impressions of the project partners we work with, we present a setting in which this knowledge base is used to obtain new information. In addition, we expect it to be updated regularly, for example by adding new releases or updating band members.

Setup

As expected, setting up Amazon Neptune is easy. It is well documented, and you can launch a graph database in just a few steps. However, when it comes to more detailed configuration, the necessary information is hard to find. Therefore, we want to point out one configuration parameter.

Screenshot: configuration of parameter groups

Amazon states that Neptune focuses on low-latency transactional workloads, which is why the default query timeout is 120 seconds. We, however, tested many analytical use cases in which we regularly hit this limit. The timeout can be changed by creating a new parameter group for Neptune and setting neptune_query_timeout to an appropriate limit.

Loading Data

Below we discuss in detail how we loaded the MusicBrainz data into Neptune.

From relations to triples

First, we converted the MusicBrainz data to RDF triples. To do so, we defined for each table a template that specifies how each column is represented in a triple. In this example, each row of the artist table is mapped to twelve RDF triples.

<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gid> "${gid}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> "${name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/sort-name> "${sort_name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/begin-date> "${begin_date_year}-${begin_date_month}-${begin_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/end-date> "${end_date_year}-${end_date_month}-${end_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/type> <http://musicbrainz.foo/artist-type/${type}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> <http://musicbrainz.foo/area/${area}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gender> <http://musicbrainz.foo/gender/${gender}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/comment> "${comment}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/edits-pending> "${edits_pending}"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/last-updated> "${last_updated}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/ended> "${ended}"^^<http://www.w3.org/2001/XMLSchema#boolean> .
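The template step can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the article: the names `ARTIST_TEMPLATES`, `escape_literal` and `row_to_triples` are hypothetical, only two of the twelve templates are shown, and skipping empty columns is an assumption about how missing values might be handled.

```python
from string import Template

# Two of the artist templates, each paired with the column it depends on.
ARTIST_TEMPLATES = [
    ("name", Template(
        '<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> '
        '"${name}"^^<http://www.w3.org/2001/XMLSchema#string> .')),
    ("area", Template(
        '<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> '
        '<http://musicbrainz.foo/area/${area}> .')),
]


def escape_literal(value):
    """Escape characters that must not appear raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_triples(row):
    """Render one artist row; templates whose column is empty are skipped (assumption)."""
    values = {key: escape_literal(value) for key, value in row.items() if value}
    if "id" not in values:
        return []
    return [template.substitute(values)
            for column, template in ARTIST_TEMPLATES
            if column in values]


# Hypothetical row: the quotes in the name are escaped, the empty area is skipped.
for line in row_to_triples({"id": "1", "name": 'The "X" Band', "area": ""}):
    print(line)
```

Escaping literals before substitution matters here, because an unescaped quote or line break in a column value would produce a line that is no longer a single valid triple.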

Bulk upload

The recommended way to load large amounts of data into Neptune is the bulk load process via S3. After uploading your triple files to S3, you start the load with a POST request. In our case, it took about 24 hours for 500 million triples. We expected it to be faster.

curl -X POST -H 'Content-Type: application/json' http://your-neptune-cluster:8182/loader -d '{
  "source" : "s3://your-s3-bucket",
  "format" : "ntriples",
  "iamRoleArn" : "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
  "region" : "eu-west-1",
  "failOnError" : "FALSE"
}'
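The same loader request can be assembled from a script. The following is a sketch using only the Python standard library; `build_loader_request` is a hypothetical helper, the endpoint, bucket and role are the placeholders from the command above, and the request is only built, not sent, because sending it needs network access to the cluster.

```python
import json
import urllib.request


def build_loader_request(endpoint, source, iam_role_arn, region, fail_on_error=False):
    """Build (but do not send) the bulk-load POST request for N-Triples files."""
    payload = {
        "source": source,
        "format": "ntriples",
        "iamRoleArn": iam_role_arn,
        "region": region,
        # The loader payload above passes this flag as a string.
        "failOnError": "TRUE" if fail_on_error else "FALSE",
    }
    return urllib.request.Request(
        url=f"{endpoint}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_loader_request(
    "http://your-neptune-cluster:8182",
    "s3://your-s3-bucket",
    "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
    "eu-west-1",
)

# To actually start the load from a machine that can reach the cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```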

To avoid this lengthy process every time we start Neptune, we decided to restore the instance from a snapshot in which these triples had already been loaded. Starting from a snapshot is considerably faster, but it still takes about an hour until Neptune is available for requests.

When we first loaded the triples into Neptune, we ran into various errors.

{
  "errorCode" : "PARSING_ERROR",
  "errorMessage" : "Content after '.' is not allowed",
  "fileName" : [...],
  "recordNum" : 25
}

Some of them were parsing errors, as shown above. To date, we have not figured out what exactly went wrong at this point. A little more detail in the error message would definitely help here. This error occurred for about 1% of the inserted triples. But for the purpose of testing Neptune, we accepted the fact that we only work with 99% of the information from MusicBrainz.

Although this is obvious to people familiar with SPARQL, keep in mind that RDF triples must be annotated with explicit data types, which can also cause errors.
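Since the loader only reports a record number, it can help to pre-check the generated files before uploading them. The sketch below flags lines that do not look like exactly one complete triple. It is an assumption-laden illustration: `find_suspicious_lines` is a hypothetical helper, and the pattern covers only a simplified subset of N-Triples (IRIs and quoted literals with an optional datatype), not the full specification.

```python
import re

# Simplified N-Triples grammar: IRIs and quoted literals only (no blank nodes,
# no language tags). This matches the shape of the templates above, not the
# full specification.
IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI + r')?'
TRIPLE = re.compile(
    r'^' + IRI + r'\s+' + IRI + r'\s+(?:' + IRI + r'|' + LITERAL + r')\s*\.$'
)


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs that do not look like one complete triple."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if not TRIPLE.match(stripped):
            problems.append((number, stripped))
    return problems


good = ('<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/name> '
        '"Elton John"^^<http://www.w3.org/2001/XMLSchema#string> .')
# Text after the terminating dot, e.g. from a value that was split across lines.
trailing = ('<http://musicbrainz.foo/artist/2> <http://musicbrainz.foo/comment> '
            '"a" . leftover')
# An unescaped quote inside the literal ends it too early.
unescaped = ('<http://musicbrainz.foo/artist/3> <http://musicbrainz.foo/comment> '
             '"He said "hi"" .')

for number, text in find_suspicious_lines([good, trailing, unescaped]):
    print(number, text)
```

A check like this does not explain the loader's error by itself, but it narrows a failing record number down to a concrete line that can be inspected.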

Streaming load

As mentioned above, we do not want to use Neptune as a static data store, but rather as a flexible and evolving knowledge base. So we needed to find ways to insert new triples when the knowledge base changes, for example when a new album is released or when we want to materialize derived knowledge.

Neptune supports insert operations via SPARQL queries, both with explicit (raw) data and based on patterns. We discuss both approaches below.

One of our goals was to insert data in a streaming fashion. Consider the release of an album in a new country. From the MusicBrainz point of view, this means that for a release, which includes albums, singles, EPs, and so on, a new entry is added to the release-country table. In RDF, we model this information with two new triples.

INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/release> <http://musicbrainz.foo/release/435759> };
INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/date-year> "2018"^^<http://www.w3.org/2001/XMLSchema#int> };
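In a streaming setup, such updates would typically be generated by code rather than written by hand. The sketch below builds one INSERT DATA statement for both triples and wraps it in a POST request with the standard SPARQL `update` form parameter. The helper names are hypothetical, the endpoint is a placeholder, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def insert_data_update(triples):
    """Build one INSERT DATA statement from (subject, predicate, object) terms.

    Terms must already be serialized, e.g. '<http://...>' or '"2018"^^<...>'.
    """
    statements = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return "INSERT DATA {\n" + statements + "\n}"


def build_update_request(endpoint, update):
    """Wrap a SPARQL update in a form-encoded POST request (not sent here)."""
    data = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/sparql",
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


release_country = "<http://musicbrainz.foo/release-country/737041>"
update = insert_data_update([
    (release_country, "<http://musicbrainz.foo/release>",
     "<http://musicbrainz.foo/release/435759>"),
    (release_country, "<http://musicbrainz.foo/date-year>",
     '"2018"^^<http://www.w3.org/2001/XMLSchema#int>'),
])
request = build_update_request("http://your-neptune-cluster:8182", update)
print(update)
```

Putting both triples into a single statement keeps the release and its year together in one update, which is a design choice of this sketch rather than something the article prescribes.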

Another goal was to derive new knowledge from the graph. Suppose we want to obtain the number of releases each artist has published in their career. Such a query is quite complex and takes 20 minutes in Neptune, so we need to materialize the result in order to reuse this new knowledge in another query. We therefore add triples with this information back to the graph by inserting the result of a subquery.

INSERT {
  ?artist_credit <http://musicbrainz.foo/number-of-releases> ?number_of_releases
} WHERE {
  SELECT ?artist_credit (COUNT(*) as ?number_of_releases)
  WHERE {
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
  }
  GROUP BY ?artist_credit
}

Adding individual triples to the graph takes a few milliseconds, whereas the execution time for inserting the result of a subquery depends on the execution time of the subquery itself.

Although we did not use it often, Neptune also lets you delete triples based on patterns or explicit data, which can be used to update information.

SPARQL queries

With the previous pattern, which returns the number of releases for each artist, we have already introduced the first type of query we want to answer using Neptune. Running a query in Neptune is easy: send a POST request to the SPARQL endpoint, as shown below:

curl -X POST --data-binary 'query=SELECT ?artist ?p ?o where {?artist <http://musicbrainz.foo/name> "Elton John" . ?artist ?p ?o . }' http://your-neptune-cluster:8182/sparql
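Once the response arrives, the result usually needs to be turned into plain rows. The sketch below assumes the endpoint returns the standard SPARQL 1.1 JSON results format; `bindings_to_rows` is a hypothetical helper, and the sample document is made up for illustration, not actual Neptune output.

```python
import json


def bindings_to_rows(result_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    document = json.loads(result_json)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # Unbound variables are simply missing from a binding; map them to None.
        rows.append({name: binding[name]["value"] if name in binding else None
                     for name in variables})
    return rows


# Made-up sample in the standard result layout (the artist id is a placeholder).
SAMPLE = json.dumps({
    "head": {"vars": ["artist", "p", "o"]},
    "results": {"bindings": [
        {"artist": {"type": "uri", "value": "http://musicbrainz.foo/artist/123"},
         "p": {"type": "uri", "value": "http://musicbrainz.foo/name"},
         "o": {"type": "literal", "value": "Elton John"}},
    ]},
})

for row in bindings_to_rows(SAMPLE):
    print(row["artist"], row["p"], row["o"])
```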

In addition, we implemented a query that returns an artist's profile with information about their name, age, or country of origin. Keep in mind that artists can be persons, bands, or orchestras. Furthermore, we enrich this data with information about the number of releases published by the artist per year. For solo artists, we also add information about the bands the artist was part of in each year.

SELECT
  ?artist_name ?year
  ?releases_in_year ?releases_up_year
  ?artist_type_name ?releases
  ?artist_gender ?artist_country_name
  ?artist_begin_date ?bands
  ?bands_in_year
WHERE {
  # Bands for each artist
  {
    SELECT
      ?year
      ?first_artist
      (group_concat(DISTINCT ?second_artist_name;separator=",") as ?bands)
      (COUNT(DISTINCT ?second_artist_name) AS ?bands_in_year)
    WHERE {
      VALUES ?year {
        1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
        1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
        1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
        1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
        2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
        2010 2011 2012 2013 2014 2015 2016 2017 2018
      }
      ?first_artist <http://musicbrainz.foo/name> "Elton John" .
      ?first_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
      ?first_artist <http://musicbrainz.foo/type> ?first_artist_type .
      ?first_artist <http://musicbrainz.foo/name> ?first_artist_name .

      ?second_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
      ?second_artist <http://musicbrainz.foo/type> ?second_artist_type .
      ?second_artist <http://musicbrainz.foo/name> ?second_artist_name .
      optional { ?second_artist <http://musicbrainz.foo/begin-date-year> ?second_artist_begin_date_year . }
      optional { ?second_artist <http://musicbrainz.foo/end-date-year> ?second_artist_end_date_year . }

      ?l_artist_artist <http://musicbrainz.foo/entity0> ?first_artist .
      ?l_artist_artist <http://musicbrainz.foo/entity1> ?second_artist .
      ?l_artist_artist <http://musicbrainz.foo/link> ?link .

      optional { ?link <http://musicbrainz.foo/begin-date-year> ?link_begin_date_year . }
      optional { ?link <http://musicbrainz.foo/end-date-year> ?link_end_date_year . }

      FILTER (!bound(?link_begin_date_year) || ?link_begin_date_year <= ?year)
      FILTER (!bound(?link_end_date_year) || ?link_end_date_year >= ?year)
      FILTER (!bound(?second_artist_begin_date_year) || ?second_artist_begin_date_year <= ?year)
      FILTER (!bound(?second_artist_end_date_year) || ?second_artist_end_date_year >= ?year)
      FILTER (?first_artist_type NOT IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
      FILTER (?second_artist_type IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
    }
    GROUP BY ?first_artist ?year
  }
  # Releases up to a year
  {
    SELECT
      ?artist
      ?year
      (group_concat(DISTINCT ?release_name;separator=",") as ?releases)
      (COUNT(*) as ?releases_up_year)
    WHERE {
      VALUES ?year {
        1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
        1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
        1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
        1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
        2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
        2010 2011 2012 2013 2014 2015 2016 2017 2018
      }

      ?artist <http://musicbrainz.foo/name> "Elton John" .

      ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
      ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
      ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
      ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

      ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
      ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
      ?release_group <http://musicbrainz.foo/name> ?release_group_name .
      ?release <http://musicbrainz.foo/release-group> ?release_group .
      ?release <http://musicbrainz.foo/name> ?release_name .
      ?release_country <http://musicbrainz.foo/release> ?release .
      ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

      FILTER (?release_country_year <= ?year)
    }
    GROUP BY ?artist ?year
  }
  # Releases in a year
  {
    SELECT ?artist ?year (COUNT(*) as ?releases_in_year)
    WHERE {
      VALUES ?year {
        1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
        1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
        1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
        1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
        2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
        2010 2011 2012 2013 2014 2015 2016 2017 2018
      }

      ?artist <http://musicbrainz.foo/name> "Elton John" .

      ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
      ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
      ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
      ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

      ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
      ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
      ?release_group <http://musicbrainz.foo/name> ?release_group_name .
      ?release <http://musicbrainz.foo/release-group> ?release_group .
      ?release_country <http://musicbrainz.foo/release> ?release .
      ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

      FILTER (?release_country_year = ?year)
    }
    GROUP BY ?artist ?year
  }
  # Master data
  {
    SELECT DISTINCT ?artist ?artist_name ?artist_gender ?artist_begin_date ?artist_country_name
    WHERE {
      ?artist <http://musicbrainz.foo/name> ?artist_name .
      ?artist <http://musicbrainz.foo/name> "Elton John" .
      ?artist <http://musicbrainz.foo/gender> ?artist_gender_id .
      ?artist_gender_id <http://musicbrainz.foo/name> ?artist_gender .
      ?artist <http://musicbrainz.foo/area> ?birth_area .
      ?artist <http://musicbrainz.foo/begin-date-year> ?artist_begin_date .
      ?birth_area <http://musicbrainz.foo/name> ?artist_country_name .

      FILTER(datatype(?artist_begin_date) = <http://www.w3.org/2001/XMLSchema#int>)
    }
  }
}

Due to the complexity of such a query, we could only run point queries for a specific artist, such as Elton John, but not for all artists. Neptune does not seem to optimize such a query by pushing filters down into the subselects. Therefore, each subselect has to be filtered manually by the artist's name.
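Because every subselect has to repeat the same artist pattern and the same list of years, one way to keep the copies in sync is to generate these fragments in one place and paste them into the query text. The helpers below are a hypothetical sketch of that idea, not part of the original setup.

```python
def years_values(first_year, last_year):
    """Build the VALUES clause that enumerates the years of interest."""
    years = " ".join(str(year) for year in range(first_year, last_year + 1))
    return "VALUES ?year { " + years + " }"


def artist_pattern(variable, name):
    """Build the triple pattern that pins a subselect to one artist name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'?{variable} <http://musicbrainz.foo/name> "{escaped}" .'


# The same fragments can be reused in each subselect of the profile query.
print(years_values(1960, 1962))
print(artist_pattern("artist", "Elton John"))
print(artist_pattern("first_artist", "Elton John"))
```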

Neptune charges both per hour and per I/O operation. For our testing, we used the smallest Neptune instance, which costs $0.384/hour. For the query above, which computes the profile of a single artist, Amazon charges us for several tens of thousands of I/O operations, which implies a cost of about $0.02.
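The two cost components can be put into a small back-of-the-envelope calculation. In the sketch below, the hourly rate is the figure quoted above, while the I/O price of $0.20 per million requests is an assumption that is not stated in the article and should be checked against current AWS pricing; `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(hours, io_requests, hourly_rate=0.384, price_per_million_io=0.20):
    """Return (instance_cost, io_cost) in dollars.

    hourly_rate is the instance price quoted in the article;
    price_per_million_io is an assumed value, not taken from the article.
    """
    instance_cost = hours * hourly_rate
    io_cost = io_requests / 1_000_000 * price_per_million_io
    return instance_cost, io_cost


# One hour of instance time plus a hypothetical 100,000 I/O requests.
instance_cost, io_cost = estimate_cost(hours=1, io_requests=100_000)
print(f"instance: ${instance_cost:.3f}, I/O: ${io_cost:.3f}")
```

Under the assumed I/O price, a single profile query costs little compared with the hourly instance charge; the difficulty is that the number of I/O requests a query will trigger is hard to know in advance.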

Conclusion

First of all, Amazon Neptune keeps most of its promises. As a managed service, it is a graph database that is extremely easy to set up and can be up and running without much configuration. Here are our five key findings:

  • Bulk loading is easy but slow, and it can be complicated by error messages that are not very helpful.
  • Streaming loads supported everything we expected and were fast enough.
  • Queries are simple, but not interactive enough to run analytical queries.
  • SPARQL queries have to be optimized manually.
  • Amazon's charges are hard to estimate because it is difficult to predict the amount of data scanned by a SPARQL query.


Source: www.habr.com
