First impressions of Amazon Neptune

Hello, Habr readers. Ahead of the start of the course "AWS for Developers", we have prepared a translation of an interesting article.

In many of the use cases that we at bakdata see at our customers' sites, the relevant information is hidden in the relations between entities, for example when analyzing relations between users, dependencies between elements, or connections between pieces of music. Such use cases are usually modeled as a graph. Earlier this year, Amazon released its new graph database, Neptune. In this post we want to share our first impressions, good practices, and what could be improved over time.

Why we needed Amazon Neptune

Graph databases promise to handle highly connected datasets better than their relational counterparts. In such datasets, the relevant information is usually stored in the relations between objects. To test Neptune, we used an amazing open data project, MusicBrainz. MusicBrainz collects every kind of music metadata imaginable, such as information about artists, songs, album releases, or concerts, as well as who the artist behind a song collaborated with and when an album was released in which country. MusicBrainz can be seen as a huge network of entities that are connected to the music industry in some way.

The MusicBrainz dataset is provided as a CSV dump of a relational database. In total, the dump contains about 93 million rows in 157 tables. While some of these tables contain master data such as artists, events, recordings, releases, or tracks, others are link tables that store relationships between artists and recordings, other artists, releases, and so on. They reveal the graph structure of the dataset. Converting the dataset to RDF triples gave us about 500 million triples.

Based on the experience and impressions of the project partners we work with, we present a setting in which this knowledge base is used to obtain new information. In addition, we expect it to be updated regularly, for example by adding new releases or updating band members.

Setup

As expected, setting up Amazon Neptune is easy. The process is documented in great detail. You can launch a graph database in just a few clicks. However, when it comes to more detailed configuration, the relevant information is hard to find. We therefore want to point out one configuration parameter.

Screenshot: configuration of parameter groups

According to Amazon, Neptune focuses on low-latency transactional workloads, which is why the default request timeout is 120 seconds. However, we tested many analytical use cases in which we regularly hit this limit. The timeout can be changed by creating a new parameter group for Neptune and setting neptune_query_timeout to a suitable limit.
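As a rough sketch of the timeout setting described above, the helper below prepares a parameter entry for neptune_query_timeout. It assumes the parameter value is given in milliseconds (so the 120-second default corresponds to 120000). The helper name, the 30-minute target, and the "pending-reboot" apply method are illustrative assumptions, and the dictionary layout follows the RDS-style parameter structure. No AWS call is made.

```python
def query_timeout_parameter(seconds: int) -> dict:
    """Build a parameter entry for neptune_query_timeout (value in milliseconds)."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {
        "ParameterName": "neptune_query_timeout",
        "ParameterValue": str(seconds * 1000),  # assumption: unit is milliseconds
        "ApplyMethod": "pending-reboot",        # assumption: adjust to your setup
    }


# Hypothetical target: let analytical queries run for up to 30 minutes.
parameters = [query_timeout_parameter(30 * 60)]
print(parameters[0]["ParameterValue"])
```

The resulting list could then be passed to whichever tool you use to modify the parameter group (console, CLI, or SDK).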

Data loading

Below we discuss how we loaded the MusicBrainz data into Neptune.

Relations as triples

First, we converted the MusicBrainz data into RDF triples. To do so, for each table we defined a template that determines how each column is represented in a triple. In this example, each row of the artist table is mapped to twelve RDF triples.

<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gid> "${gid}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> "${name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/sort-name> "${sort_name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/begin-date> "${begin_date_year}-${begin_date_month}-${begin_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/end-date> "${end_date_year}-${end_date_month}-${end_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/type> <http://musicbrainz.foo/artist-type/${type}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> <http://musicbrainz.foo/area/${area}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gender> <http://musicbrainz.foo/gender/${gender}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/comment> "${comment}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/edits-pending> "${edits_pending}"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/last-updated> "${last_updated}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/ended> "${ended}"^^<http://www.w3.org/2001/XMLSchema#boolean> .
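To illustrate how such a template might be applied to a CSV row, here is a minimal sketch covering only three of the twelve columns. The function names, the choice to skip empty columns, and the escaping rules are our assumptions for illustration. The article does not describe how the actual conversion handled NULL values or special characters.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://musicbrainz.foo/"


def escape_literal(value: str) -> str:
    """Escape characters that would break an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def literal(value: str, datatype: str) -> str:
    """Render a typed literal such as "abc"^^<...#string>."""
    return f'"{escape_literal(value)}"^^<{XSD}{datatype}>'


def artist_triples(row: dict) -> list[str]:
    """Map one row of the artist table to N-Triples lines (subset of columns)."""
    subject = f"<{BASE}artist/{row['id']}>"
    triples = []
    for column, datatype in (("gid", "string"), ("name", "string")):
        value = row.get(column)
        if value:  # assumption: empty/NULL columns produce no triple
            triples.append(f"{subject} <{BASE}{column}> {literal(value, datatype)} .")
    if row.get("area"):
        triples.append(f"{subject} <{BASE}area> <{BASE}area/{row['area']}> .")
    return triples


# Hypothetical row, not taken from the real dump.
for line in artist_triples({"id": "1", "gid": "abc", "name": 'Say "Hi"', "area": ""}):
    print(line)
```

Escaping quotes and line breaks before writing the literal keeps every triple on a single, well-formed line.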

Upload

The intended way to load large amounts of data into Neptune is a bulk upload via S3. After uploading your triple files to S3, you start the load with a POST request. In our case, it took about 24 hours for 500 million triples. We expected it to be faster.

curl -X POST -H 'Content-Type: application/json' http://your-neptune-cluster:8182/loader -d '{
 "source" : "s3://your-s3-bucket",
 "format" : "ntriples",
 "iamRoleArn" : "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
 "region" : "eu-west-1",
 "failOnError" : "FALSE"
}'
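The same load request can be assembled programmatically. The sketch below only builds the request object with the Python standard library and does not send it. The function name is ours, and the endpoint, bucket, and role values are the placeholders from the curl command above.

```python
import json
from urllib.request import Request


def build_load_request(endpoint: str, source: str, role_arn: str, region: str) -> Request:
    """Build (but do not send) a bulk-load POST request for the /loader endpoint."""
    payload = {
        "source": source,
        "format": "ntriples",
        "iamRoleArn": role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
    return Request(
        url=f"{endpoint}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_load_request(
    "http://your-neptune-cluster:8182",
    "s3://your-s3-bucket",
    "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
    "eu-west-1",
)
# urllib.request.urlopen(request) would submit the job; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```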

To avoid this long process every time we launch Neptune, we decided to restore the instance from a snapshot in which these triples were already loaded. Starting from a snapshot is considerably faster, but it still takes about an hour until Neptune is available for requests.

When we first loaded the triples into Neptune, we ran into a number of errors.

{
 "errorCode" : "PARSING_ERROR",
 "errorMessage" : "Content after '.' is not allowed",
 "fileName" : [...],
 "recordNum" : 25
}

Some of them were parsing errors, like the one above. To this day, we have not figured out what exactly went wrong there. A little more detail in the message would definitely help. This error occurred for about 1% of the inserted triples. For the purpose of testing Neptune, however, we accepted working with only 99% of the information from MusicBrainz.

Although this will be obvious to anyone familiar with SPARQL, be aware that RDF triples must be written with explicit data types, otherwise errors will occur.
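One way to catch malformed lines before a long upload is a local pre-check. The sketch below is a deliberately simplified line validator, not a full N-Triples parser. It accepts only IRI subjects and predicates and IRI or literal objects, and ignores blank nodes. The sample lines are made up, and we do not claim that these patterns caused the errors reported above.

```python
import re

IRI = r'<[^<>\s]*>'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^' + IRI + r'|@[A-Za-z0-9-]+)?'
TRIPLE = re.compile(
    r'^\s*' + IRI + r'\s+' + IRI + r'\s+(?:' + IRI + r'|' + LITERAL + r')\s*\.\s*$'
)


def invalid_lines(lines):
    """Return (line_number, line) pairs for non-empty lines that fail the check."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if line.strip() and not TRIPLE.match(line)
    ]


# Made-up sample lines for illustration only.
good = ('<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/name> '
        '"Elton John"^^<http://www.w3.org/2001/XMLSchema#string> .')
trailing = '<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/name> "A. B." . extra'
bad_type = ('<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/begin-date> '
            '"1947-03-25"^^xsd:<http://www.w3.org/2001/XMLSchema#date> .')

for number, line in invalid_lines([good, trailing, bad_type]):
    print(number, line)
```

Running such a check over the generated files costs minutes, compared with the hours a failed bulk load can take.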

Streaming load

As mentioned above, we do not want to use Neptune as a static data store, but rather as a flexible and evolving knowledge base. So we needed to find ways to insert new triples when the knowledge base changes, for example when a new album is released, or when we want to materialize derived knowledge.

Neptune supports insert operations through SPARQL queries, both with raw data and based on patterns. We discuss both approaches below.

One of our goals was to insert data in a streaming fashion. Consider the release of an album in a new country. From the MusicBrainz perspective, this means that for a release, which includes albums, singles, EPs, and so on, a new entry is added to the release-country table. In RDF, we map this information to two new triples.

INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/release> <http://musicbrainz.foo/release/435759> };
INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/date-year> "2018"^^<http://www.w3.org/2001/XMLSchema#int> };
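A streaming insert like this could be generated from an incoming record. The following sketch builds the update text and a POST request carrying it in the `update` form field of the SPARQL endpoint. Unlike the statements above, it places both triples in a single INSERT DATA block, which is our simplification. The function names are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://musicbrainz.foo/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def release_country_update(entry_id: int, release_id: int, year: int) -> str:
    """Build an INSERT DATA update for one new release-country entry."""
    subject = f"<{BASE}release-country/{int(entry_id)}>"
    return (
        "INSERT DATA {\n"
        f"  {subject} <{BASE}release> <{BASE}release/{int(release_id)}> .\n"
        f'  {subject} <{BASE}date-year> "{int(year)}"^^<{XSD_INT}> .\n'
        "}"
    )


def build_update_request(endpoint: str, update: str) -> Request:
    """Build (but do not send) a form-encoded SPARQL update request."""
    return Request(
        url=f"{endpoint}/sparql",
        data=urlencode({"update": update}).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


update = release_country_update(737041, 435759, 2018)
request = build_update_request("http://your-neptune-cluster:8182", update)
print(update)
```

Converting the identifiers with `int()` keeps arbitrary strings out of the update text.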

Another goal was to derive new knowledge from the graph. Suppose we want to obtain the number of releases each artist has published over their career. Such a query is rather complex and takes over 20 minutes in Neptune, so we need to materialize the result in order to reuse this new knowledge in other queries. We therefore add triples with this information back to the graph, feeding in the result of a subquery.

INSERT {
  ?artist_credit <http://musicbrainz.foo/number-of-releases> ?number_of_releases
} WHERE {
  SELECT ?artist_credit (COUNT(*) as ?number_of_releases)
  WHERE {
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
  }
  GROUP BY ?artist_credit
}

Adding single triples to the graph takes only a few milliseconds, whereas the run time for inserting the result of a subquery depends on the run time of the subquery itself.

Although we did not use it often, Neptune can also delete triples based on patterns or explicit data, which can be used to update information.

SPARQL queries

With the previous example, which returns the number of releases for each artist, we have already introduced the first type of query we want to answer with Neptune. Querying Neptune is easy: send a POST request to the SPARQL endpoint, as shown below:

curl -X POST --data-binary 'query=SELECT ?artist ?p ?o where {?artist <http://musicbrainz.foo/name> "Elton John" . ?artist ?p ?o . }' http://your-neptune-cluster:8182/sparql
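On the client side, the response then needs to be turned into something usable. The sketch below assumes the endpoint answers in the W3C SPARQL 1.1 Query Results JSON format. The sample response is made up for illustration (the artist IRI is a placeholder), and the helper name is ours.

```python
import json


def bindings_to_rows(response_text: str) -> list:
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    document = json.loads(response_text)
    variables = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


# Made-up sample response; not real MusicBrainz data.
sample_response = json.dumps({
    "head": {"vars": ["artist", "p", "o"]},
    "results": {"bindings": [{
        "artist": {"type": "uri", "value": "http://musicbrainz.foo/artist/0"},
        "p": {"type": "uri", "value": "http://musicbrainz.foo/name"},
        "o": {"type": "literal", "value": "Elton John"},
    }]},
})

for row in bindings_to_rows(sample_response):
    print(row["p"], row["o"])
```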

In addition, we implemented a query that returns an artist profile containing information such as their name, age, or country of origin. Keep in mind that artists can be individuals, groups, or choirs. We also enrich this data with information about the number of releases the artists published per year. For solo artists, we additionally add information about the bands the artist was part of in each year.

SELECT
 ?artist_name ?year
 ?releases_in_year ?releases_up_year
 ?artist_type_name ?releases
 ?artist_gender ?artist_country_name
 ?artist_begin_date ?bands
 ?bands_in_year
WHERE {
 # Bands for each artist
 {
   SELECT
     ?year
     ?first_artist
     (group_concat(DISTINCT ?second_artist_name;separator=",") as ?bands)
     (COUNT(DISTINCT ?second_artist_name) AS ?bands_in_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }
     ?first_artist <http://musicbrainz.foo/name> "Elton John" .
     ?first_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
     ?first_artist <http://musicbrainz.foo/type> ?first_artist_type .
     ?first_artist <http://musicbrainz.foo/name> ?first_artist_name .

     ?second_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
     ?second_artist <http://musicbrainz.foo/type> ?second_artist_type .
     ?second_artist <http://musicbrainz.foo/name> ?second_artist_name .
     optional { ?second_artist <http://musicbrainz.foo/begin-date-year> ?second_artist_begin_date_year . }
     optional { ?second_artist <http://musicbrainz.foo/end-date-year> ?second_artist_end_date_year . }

     ?l_artist_artist <http://musicbrainz.foo/entity0> ?first_artist .
     ?l_artist_artist <http://musicbrainz.foo/entity1> ?second_artist .
     ?l_artist_artist <http://musicbrainz.foo/link> ?link .

     optional { ?link <http://musicbrainz.foo/begin-date-year> ?link_begin_date_year . }
     optional { ?link <http://musicbrainz.foo/end-date-year> ?link_end_date_year . }

     FILTER (!bound(?link_begin_date_year) || ?link_begin_date_year <= ?year)
     FILTER (!bound(?link_end_date_year) || ?link_end_date_year >= ?year)
     FILTER (!bound(?second_artist_begin_date_year) || ?second_artist_begin_date_year <= ?year)
     FILTER (!bound(?second_artist_end_date_year) || ?second_artist_end_date_year >= ?year)
     FILTER (?first_artist_type NOT IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
     FILTER (?second_artist_type IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
   }
   GROUP BY ?first_artist ?year
 }
 
 # Releases up to a year
 {
   SELECT
     ?artist
     ?year
     (group_concat(DISTINCT ?release_name;separator=",") as ?releases)
     (COUNT(*) as ?releases_up_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }

     ?artist <http://musicbrainz.foo/name> "Elton John" .

     ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
     ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
     ?release <http://musicbrainz.foo/release-group> ?release_group .
     ?release <http://musicbrainz.foo/name> ?release_name .
     ?release_country <http://musicbrainz.foo/release> ?release .
     ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

     FILTER (?release_country_year <= ?year)
   }
   GROUP BY ?artist ?year
 }
 
 # Releases in a year
 {
   SELECT ?artist ?year (COUNT(*) as ?releases_in_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }

     ?artist <http://musicbrainz.foo/name> "Elton John" .

     ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
     ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
     ?release <http://musicbrainz.foo/release-group> ?release_group .
     ?release_country <http://musicbrainz.foo/release> ?release .
     ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

     FILTER (?release_country_year = ?year)
   }
   GROUP BY ?artist ?year
 }
 
 # Master data
 {
   SELECT DISTINCT ?artist ?artist_name ?artist_gender ?artist_begin_date ?artist_country_name
   WHERE {
     ?artist <http://musicbrainz.foo/name> ?artist_name .
     ?artist <http://musicbrainz.foo/name> "Elton John" .
     ?artist <http://musicbrainz.foo/gender> ?artist_gender_id .
     ?artist_gender_id <http://musicbrainz.foo/name> ?artist_gender .
     ?artist <http://musicbrainz.foo/area> ?birth_area .
     ?artist <http://musicbrainz.foo/begin-date-year> ?artist_begin_date.
     ?birth_area <http://musicbrainz.foo/name> ?artist_country_name .

     FILTER(datatype(?artist_begin_date) = <http://www.w3.org/2001/XMLSchema#int>)
   }
 }
}

Due to the complexity of such a query, we could only run point queries for a specific artist, such as Elton John, but not for all artists. Neptune does not seem to optimize such a query by pushing filters down into the subselects. Therefore, each subselect has to be filtered manually by artist name.

Neptune charges both per hour and per I/O operation. For our tests, we used the smallest Neptune instance, which costs $0.384/hour. For the query above, which computes the profile for a single artist, Amazon charged us for tens of thousands of I/O operations, which amounts to a cost of $0.02.
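Since the artist filter and the year list have to be repeated in every subselect, generating those fragments reduces copy-and-paste mistakes. The helpers below are a hypothetical sketch of that idea (names and layout are ours). They only produce text fragments and do not assemble or run the full query.

```python
def years_values(first: int, last: int, per_line: int = 10) -> str:
    """Render a VALUES block binding ?year to every year from first to last."""
    years = [str(year) for year in range(first, last + 1)]
    lines = [" ".join(years[i:i + per_line]) for i in range(0, len(years), per_line)]
    body = "\n".join("  " + line for line in lines)
    return "VALUES ?year {\n" + body + "\n}"


def artist_filter(variable: str, name: str) -> str:
    """Render the triple pattern that pins a subselect to one artist name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'?{variable} <http://musicbrainz.foo/name> "{escaped}" .'


print(years_values(1960, 2018))
print(artist_filter("first_artist", "Elton John"))
print(artist_filter("artist", "Elton John"))
```

Each subselect template can then include the same generated fragments, so changing the artist or the year range happens in one place.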

Conclusion

First of all, Amazon Neptune keeps most of its promises. As a managed service, it is a graph database that is extremely easy to set up and can be up and running without much configuration. Here are our five key findings:

  • Bulk upload is easy but slow. It can also get complicated because the error messages are not very helpful.
  • Streaming load supports everything we expected and is quite fast.
  • Queries are simple, but not interactive enough to run analytical queries.
  • SPARQL queries have to be optimized manually.
  • Amazon costs are hard to estimate because it is difficult to estimate the amount of data scanned by a SPARQL query.

That's all. Sign up for a free webinar on the topic "Load Balancing".


Source: will.com
