First impressions of Amazon Neptune

Hello, Habr readers. Ahead of the start of the "AWS for Developers" course, we have prepared a translation of an interesting article.

In many of the use cases that we at bakdata see at our customers' sites, the relevant information is hidden in the connections between entities, for example when analyzing relationships between users, dependencies between items, or connections between sensors. Such use cases are usually modeled as a graph. Earlier this year, Amazon released its new graph database, Neptune. In this post we want to share our first impressions, good practices, and what could be improved over time.

Why we needed Amazon Neptune

Graph databases promise to handle highly connected datasets better than their relational equivalents. In such datasets, the relevant information is usually stored in the relationships between objects. To test Neptune, we used a great open data project, MusicBrainz. MusicBrainz collects every kind of music metadata imaginable, such as information about artists, songs, album releases, or concerts, as well as whom the artist behind a song collaborated with, or when an album was released in which country. MusicBrainz can be seen as a huge network of entities that are in some way connected to the music industry.

The MusicBrainz dataset is provided as a CSV dump of a relational database. In total, the dump contains about 93 million rows in 157 tables. Some of these tables contain master data such as artists, events, recordings, releases, or tracks. Others, the link tables, store relationships between artists and recordings, other artists, releases, and so on. These link tables show the graph structure of the dataset. When we converted the dataset into RDF triples, we ended up with roughly 500 million of them.

Based on the experience and impressions of the project partners we work with, we present a setting in which this knowledge base is used to obtain new information. In addition, we expect it to be updated regularly, for example by adding new releases or updating band members.

Setup

As expected, setting up Amazon Neptune is simple. It is documented in detail, and you can launch a graph database in just a few clicks. However, when it comes to more detailed configuration, the necessary information is hard to find. We therefore want to point out one configuration parameter.

Figure: configuration screenshot for parameter groups

Amazon says that Neptune focuses on low-latency transactional workloads, which is why the default query timeout is 120 seconds. However, we tested many analytical use cases in which we regularly hit this limit. The timeout can be changed by creating a new parameter group for Neptune and setting neptune_query_timeout to an appropriate value.
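As a rough illustration of that setting, the sketch below builds the parameter entry one might pass to a custom parameter group. It assumes that neptune_query_timeout is expressed in milliseconds (so the 120-second default corresponds to 120000) and that the entry has the shape boto3's parameter-group calls expect. The helper name and the "pending-reboot" apply method are our own conservative assumptions, not something the article states.

```python
# Sketch only: builds the payload for a parameter group, does not call AWS.
# Assumption: neptune_query_timeout is expressed in milliseconds.


def query_timeout_parameter(seconds: int, apply_method: str = "pending-reboot") -> dict:
    """Return one parameter entry that sets the Neptune query timeout."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {
        "ParameterName": "neptune_query_timeout",
        "ParameterValue": str(seconds * 1000),  # seconds -> milliseconds
        "ApplyMethod": apply_method,  # hypothetical default, check your engine version
    }


# 30 minutes, which would cover the 20-minute analytical query mentioned later.
parameters = [query_timeout_parameter(30 * 60)]
print(parameters)

# With boto3 installed and credentials configured, this list could be passed as
# Parameters=parameters to the Neptune client's modify_db_cluster_parameter_group.
```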

Loading data

Below we discuss in detail how we loaded the MusicBrainz data into Neptune.

From relations to triples

First, we converted the MusicBrainz data into RDF triples. To do so, we defined a template for each table that specifies how every row is represented as triples. In this example, each row of the artist table is mapped to twelve RDF triples.

<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gid> "${gid}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/name> "${name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/sort-name> "${sort_name}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/begin-date> "${begin_date_year}-${begin_date_month}-${begin_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/end-date> "${end_date_year}-${end_date_month}-${end_date_day}"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/type> <http://musicbrainz.foo/artist-type/${type}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/area> <http://musicbrainz.foo/area/${area}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/gender> <http://musicbrainz.foo/gender/${gender}> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/comment> "${comment}"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/edits-pending> "${edits_pending}"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/last-updated> "${last_updated}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<http://musicbrainz.foo/artist/${id}> <http://musicbrainz.foo/ended> "${ended}"^^<http://www.w3.org/2001/XMLSchema#boolean> .
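To make the template idea more concrete, here is a minimal sketch of how a single CSV row could be rendered into N-Triples lines. It is not the converter used for the article. The function names are hypothetical, only three of the twelve columns are shown, and the `\N` NULL marker is an assumption about the dump format. The sketch escapes quotes and backslashes and skips empty columns, two things a plain string substitution would not do.

```python
from typing import Dict, List, Optional

BASE = "http://musicbrainz.foo"
XSD = "http://www.w3.org/2001/XMLSchema#"
NULL = r"\N"  # assumption: marker used for NULL values in the CSV dump


def escape_literal(value: str) -> str:
    """Escape characters that must not appear raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def literal_triple(subject: str, predicate: str, value: Optional[str], datatype: str) -> Optional[str]:
    """Render one typed-literal triple, or None if the column is empty."""
    if value is None or value == NULL:
        return None
    return f'<{subject}> <{BASE}/{predicate}> "{escape_literal(value)}"^^<{XSD}{datatype}> .'


def artist_row_to_triples(row: Dict[str, str]) -> List[str]:
    """Map a few columns of one artist row to N-Triples lines."""
    subject = f"{BASE}/artist/{row['id']}"
    candidates = [
        literal_triple(subject, "gid", row.get("gid"), "string"),
        literal_triple(subject, "name", row.get("name"), "string"),
        literal_triple(subject, "edits-pending", row.get("edits_pending"), "int"),
    ]
    return [triple for triple in candidates if triple is not None]


example = {"id": "1", "gid": "abc", "name": 'The "Band"', "edits_pending": r"\N"}
for line in artist_row_to_triples(example):
    print(line)
```

Skipping NULL columns instead of emitting empty literals keeps values such as an incomplete begin date from turning into malformed typed literals.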

Bulk upload

The recommended way to load large amounts of data into Neptune is the bulk upload process via S3. After uploading your triple files to S3, you start the load with a POST request. In our case, it took about 24 hours for 500 million triples. We had expected it to be faster.

curl -X POST -H 'Content-Type: application/json' http://your-neptune-cluster:8182/loader -d '{
 "source" : "s3://your-s3-bucket",
 "format" : "ntriples",
 "iamRoleArn" : "arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
 "region" : "eu-west-1",
 "failOnError" : "FALSE"
}'
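The same request can be sketched in Python with only the standard library. The snippet builds the request without sending it, so the placeholder host, bucket, and role names from the curl call are reused unchanged. The helper names are hypothetical. The status path `/loader/<loadId>` reflects our understanding of the loader API and should be checked against the Neptune documentation.

```python
import json
import urllib.request


def build_load_request(endpoint: str, source: str, iam_role_arn: str, region: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a bulk load."""
    body = {
        "source": source,
        "format": "ntriples",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        url=f"{endpoint}/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_status_url(endpoint: str, load_id: str) -> str:
    """URL to poll for the state of a running load (assumed path layout)."""
    return f"{endpoint}/loader/{load_id}"


request = build_load_request(
    endpoint="http://your-neptune-cluster:8182",
    source="s3://your-s3-bucket",
    iam_role_arn="arn:aws:iam::your-iam-user:role/NeptuneLoadFromS3",
    region="eu-west-1",
)
print(request.get_method(), request.full_url)

# Sending requires network access to the cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

For a load that runs for many hours, polling the status URL from a script is more practical than re-running curl by hand.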

To avoid this lengthy process every time we launch Neptune, we decided to restore the instance from a snapshot in which the triples were already loaded. Starting from a snapshot is considerably faster, but it still takes about an hour until Neptune is available for queries.

When we initially loaded the triples into Neptune, we ran into various errors.

{
 "errorCode" : "PARSING_ERROR",
 "errorMessage" : "Content after '.' is not allowed",
 "fileName" : [...],
 "recordNum" : 25
}

Some of them were parsing errors, as shown above. To this day, we have not figured out what exactly went wrong at this point. A bit more detail in the error message would definitely help here. This error occurred for about 1% of the inserted triples. But since we were only testing Neptune, we accepted the fact that we work with only 99% of the information from MusicBrainz.

Although this is obvious to people familiar with SPARQL, be aware that RDF triples must be annotated with explicit datatypes, which is another possible source of errors.
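One way to narrow down such failures before uploading is to run a cheap syntax check over the generated files. The sketch below uses a deliberately simplified regular expression for an N-Triples line (no blank nodes, no validation of the IRI contents), so it is an approximation of the grammar rather than a full parser. We also do not know whether it would have caught the records Neptune rejected. The function name is hypothetical.

```python
import re

# Simplified building blocks of the N-Triples grammar (assumption: no blank nodes).
IRI = r"<[^<>\"\s]*>"
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"
TRIPLE = re.compile(rf"^\s*{IRI}\s+{IRI}\s+(?:{IRI}|{LITERAL})\s*\.\s*$")


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs that do not look like exactly one triple."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.strip() and not TRIPLE.match(text):
            suspicious.append((number, text))
    return suspicious


good = '<http://musicbrainz.foo/artist/1> <http://musicbrainz.foo/name> "Elton John"^^<http://www.w3.org/2001/XMLSchema#string> .'
# An unescaped quote inside the literal breaks the line.
bad = '<http://musicbrainz.foo/artist/2> <http://musicbrainz.foo/comment> "say "hi""^^<http://www.w3.org/2001/XMLSchema#string> .'

for number, text in find_suspicious_lines([good, bad]):
    print(f"line {number}: {text}")
```

Since the loader reports a recordNum per file, mapping the reported record back to a line flagged by a check like this can make the error message more actionable.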

Streaming upload

As mentioned above, we do not want to use Neptune as a static data store, but rather as a flexible and evolving knowledge base. So we needed to find ways to insert new triples when the knowledge base changes, for example when new albums are published or when we want to materialize derived knowledge.

Neptune supports inserts via SPARQL queries, both with raw data and based on patterns. We discuss both approaches below.

One of our goals was to insert data in a streaming fashion. Consider the release of an album in a new country. From the MusicBrainz perspective, this means that for a release, which covers albums, singles, EPs, and so on, a new entry is added to the release-country table. In RDF, we map this information to two new triples.

INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/release> <http://musicbrainz.foo/release/435759> };
INSERT DATA { <http://musicbrainz.foo/release-country/737041> <http://musicbrainz.foo/date-year> "2018"^^<http://www.w3.org/2001/XMLSchema#int> };
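In a streaming setting, statements like these would be generated per incoming row. The following sketch shows one possible way to do that. The function names are hypothetical, and it combines both triples into a single INSERT DATA block, which is a design choice of this example rather than what the article did. It assumes the update is sent to the SPARQL endpoint as an `update=` form parameter.

```python
from urllib.parse import urlencode

BASE = "http://musicbrainz.foo"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def release_country_update(row_id: int, release_id: int, year: int) -> str:
    """Build one SPARQL update that inserts both triples for a release-country row."""
    subject = f"<{BASE}/release-country/{row_id}>"
    return (
        "INSERT DATA {\n"
        f"  {subject} <{BASE}/release> <{BASE}/release/{release_id}> .\n"
        f'  {subject} <{BASE}/date-year> "{year}"^^<{XSD_INT}> .\n'
        "}"
    )


def as_form_body(update: str) -> bytes:
    """Encode the update as the body of a form POST to the /sparql endpoint."""
    return urlencode({"update": update}).encode("utf-8")


update = release_country_update(737041, 435759, 2018)
print(update)
```

Sending both triples in one update means a new release-country entry becomes visible with its release and its year at the same time.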

Another goal was to derive new knowledge from the graph. Suppose we want to obtain the number of releases each artist has published in their career. Such a query is quite complex and takes more than 20 minutes in Neptune, so we need to materialize the result in order to reuse this new knowledge in other queries. We therefore add triples with this information back to the graph by inserting the result of a subquery.

INSERT {
  ?artist_credit <http://musicbrainz.foo/number-of-releases> ?number_of_releases
} WHERE {
  SELECT ?artist_credit (COUNT(*) as ?number_of_releases)
  WHERE {
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .
     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
  }
  GROUP BY ?artist_credit
}

Adding single triples to the graph takes a few milliseconds, while the execution time for inserting the result of a subquery depends on the execution time of the subquery itself.

Although we did not use it often, Neptune also lets you delete triples based on patterns or explicit data, which can be used to update information.
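As an illustration of such an update, the sketch below builds a standard SPARQL 1.1 DELETE/INSERT statement that replaces a previously materialized release count. The IRI layout `artist-credit/<id>` and the function name are assumptions made for this example, by analogy with the artist template. The OPTIONAL in the WHERE clause lets the insert happen even when no old value exists.

```python
BASE = "http://musicbrainz.foo"


def replace_release_count(artist_credit_id: int, new_count: int) -> str:
    """Build an update that swaps the stored release count for one artist credit."""
    subject = f"<{BASE}/artist-credit/{artist_credit_id}>"  # hypothetical IRI layout
    predicate = f"<{BASE}/number-of-releases>"
    return (
        f"DELETE {{ {subject} {predicate} ?old }}\n"
        f"INSERT {{ {subject} {predicate} {new_count} }}\n"
        f"WHERE  {{ OPTIONAL {{ {subject} {predicate} ?old }} }}"
    )


print(replace_release_count(42, 7))
```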

SPARQL queries

With the previous pattern, which returns the number of releases for each artist, we have already introduced the first type of query we want to answer with Neptune. Issuing a query in Neptune is easy: send a request to the SPARQL endpoint, as shown below:

curl -X POST --data-binary 'query=SELECT ?artist ?p ?o where {?artist <http://musicbrainz.foo/name> "Elton John" . ?artist ?p ?o . }' http://your-neptune-cluster:8182/sparql
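For use from application code, the same call can be sketched with Python's standard library. The request builder mirrors the curl command above. The parser assumes the endpoint returns the standard SPARQL 1.1 JSON results format when asked for it through the Accept header. Both function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a form POST carrying a SPARQL query."""
    return urllib.request.Request(
        url=f"{endpoint}/sparql",
        data=urlencode({"query": query}).encode("utf-8"),
        headers={"Accept": "application/sparql-results+json"},
        method="POST",
    )


def rows_from_results(document: dict) -> list:
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    variables = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


query = (
    'SELECT ?artist ?p ?o WHERE { '
    '?artist <http://musicbrainz.foo/name> "Elton John" . ?artist ?p ?o . }'
)
request = build_query_request("http://your-neptune-cluster:8182", query)

# Sending requires network access to the cluster:
# with urllib.request.urlopen(request) as response:
#     for row in rows_from_results(json.load(response)):
#         print(row)
```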

In addition, we implemented a query that returns an artist profile containing information such as their name, age, or country of origin. Keep in mind that artists can be persons, bands, or orchestras. We also enrich this data with the number of releases by the artist per year. For solo artists, we additionally include the bands the artist took part in each year.

SELECT
 ?artist_name ?year
 ?releases_in_year ?releases_up_year
 ?artist_type_name ?releases
 ?artist_gender ?artist_country_name
 ?artist_begin_date ?bands
 ?bands_in_year
WHERE {
 # Bands for each artist
 {
   SELECT
     ?year
     ?first_artist
     (group_concat(DISTINCT ?second_artist_name;separator=",") as ?bands)
     (COUNT(DISTINCT ?second_artist_name) AS ?bands_in_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }
     ?first_artist <http://musicbrainz.foo/name> "Elton John" .
     ?first_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
     ?first_artist <http://musicbrainz.foo/type> ?first_artist_type .
     ?first_artist <http://musicbrainz.foo/name> ?first_artist_name .

     ?second_artist <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist> .
     ?second_artist <http://musicbrainz.foo/type> ?second_artist_type .
     ?second_artist <http://musicbrainz.foo/name> ?second_artist_name .
     optional { ?second_artist <http://musicbrainz.foo/begin-date-year> ?second_artist_begin_date_year . }
     optional { ?second_artist <http://musicbrainz.foo/end-date-year> ?second_artist_end_date_year . }

     ?l_artist_artist <http://musicbrainz.foo/entity0> ?first_artist .
     ?l_artist_artist <http://musicbrainz.foo/entity1> ?second_artist .
     ?l_artist_artist <http://musicbrainz.foo/link> ?link .

     optional { ?link <http://musicbrainz.foo/begin-date-year> ?link_begin_date_year . }
     optional { ?link <http://musicbrainz.foo/end-date-year> ?link_end_date_year . }

     FILTER (!bound(?link_begin_date_year) || ?link_begin_date_year <= ?year)
     FILTER (!bound(?link_end_date_year) || ?link_end_date_year >= ?year)
     FILTER (!bound(?second_artist_begin_date_year) || ?second_artist_begin_date_year <= ?year)
     FILTER (!bound(?second_artist_end_date_year) || ?second_artist_end_date_year >= ?year)
     FILTER (?first_artist_type NOT IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
     FILTER (?second_artist_type IN (<http://musicbrainz.foo/artist-type/2>, <http://musicbrainz.foo/artist-type/5>, <http://musicbrainz.foo/artist-type/6>))
   }
   GROUP BY ?first_artist ?year
 }
 # Releases up to a year
 {
   SELECT
     ?artist
     ?year
     (group_concat(DISTINCT ?release_name;separator=",") as ?releases)
     (COUNT(*) as ?releases_up_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }

     ?artist <http://musicbrainz.foo/name> "Elton John" .

     ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
     ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
     ?release <http://musicbrainz.foo/release-group> ?release_group .
     ?release <http://musicbrainz.foo/name> ?release_name .
     ?release_country <http://musicbrainz.foo/release> ?release .
     ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

     FILTER (?release_country_year <= ?year)
   }
   GROUP BY ?artist ?year
 }
 # Releases in a year
 {
   SELECT ?artist ?year (COUNT(*) as ?releases_in_year)
   WHERE {
     VALUES ?year {
       1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
       1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
       1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
       1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
       2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
       2010 2011 2012 2013 2014 2015 2016 2017 2018
     }

     ?artist <http://musicbrainz.foo/name> "Elton John" .

     ?artist_credit_name <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?artist_credit_name <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit-name> .
     ?artist_credit_name <http://musicbrainz.foo/artist> ?artist .
     ?artist_credit <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/artist-credit> .

     ?release_group <http://musicbrainz.foo/artist-credit> ?artist_credit .
     ?release_group <http://musicbrainz.foo/rdftype> <http://musicbrainz.foo/release-group> .
     ?release_group <http://musicbrainz.foo/name> ?release_group_name .
     ?release <http://musicbrainz.foo/release-group> ?release_group .
     ?release_country <http://musicbrainz.foo/release> ?release .
     ?release_country <http://musicbrainz.foo/date-year> ?release_country_year .

     FILTER (?release_country_year = ?year)
   }
   GROUP BY ?artist ?year
 }
 # Master data
 {
   SELECT DISTINCT ?artist ?artist_name ?artist_gender ?artist_begin_date ?artist_country_name
   WHERE {
     ?artist <http://musicbrainz.foo/name> ?artist_name .
     ?artist <http://musicbrainz.foo/name> "Elton John" .
     ?artist <http://musicbrainz.foo/gender> ?artist_gender_id .
     ?artist_gender_id <http://musicbrainz.foo/name> ?artist_gender .
     ?artist <http://musicbrainz.foo/area> ?birth_area .
     ?artist <http://musicbrainz.foo/begin-date-year> ?artist_begin_date .
     ?birth_area <http://musicbrainz.foo/name> ?artist_country_name .

     FILTER(datatype(?artist_begin_date) = <http://www.w3.org/2001/XMLSchema#int>)
   }
 }
}

Due to the complexity of this query, we could only run it for a specific artist, such as Elton John, but not for all artists. Neptune does not seem to optimize such a query by pushing filters down into the subselects. Therefore, every subselect has to be filtered manually by the artist's name.
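Because the filter and the list of years have to be repeated in every subselect, it can help to generate those fragments instead of copying them by hand. The helpers below are a small sketch of that idea. Their names are hypothetical, and they only produce text to paste or interpolate into the query. They are not a general query builder.

```python
def years_values_clause(first: int, last: int, per_line: int = 10) -> str:
    """Render the VALUES ?year { ... } block for an inclusive range of years."""
    years = [str(year) for year in range(first, last + 1)]
    rows = [" ".join(years[i:i + per_line]) for i in range(0, len(years), per_line)]
    body = "\n".join(f"  {row}" for row in rows)
    return f"VALUES ?year {{\n{body}\n}}"


def artist_name_pattern(variable: str, name: str) -> str:
    """Render the triple pattern that pins a subselect to one artist name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'?{variable} <http://musicbrainz.foo/name> "{escaped}" .'


print(years_values_clause(1960, 2018))
print(artist_name_pattern("artist", "Elton John"))
print(artist_name_pattern("first_artist", "Elton John"))
```

Generating the pattern from one place keeps the artist name consistent across subselects when the profile is requested for a different artist.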

Neptune has both hourly and per-I/O charges. For our tests, we used the smallest Neptune instance, which costs $0.384/hour. For the query above, which computes the profile for a single artist, Amazon charged us for tens of thousands of I/O operations, implying a cost of $0.02.
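The arithmetic behind such an estimate is simple enough to sketch. The hourly price is the one quoted above. The price per million I/O requests is passed in as a parameter, and the value of $0.20 used in the example is an assumption on our part, so check the current price list for your region.

```python
def instance_cost(hours: float, usd_per_hour: float = 0.384) -> float:
    """Cost of keeping the instance running for the given number of hours."""
    return hours * usd_per_hour


def query_io_cost(io_requests: int, usd_per_million: float) -> float:
    """Cost of the I/O requests billed for a query."""
    return io_requests / 1_000_000 * usd_per_million


# 24 hours of instance time, the duration reported for the bulk load.
print(f"instance, 24 h: ${instance_cost(24):.2f}")

# 100,000 I/O requests at an assumed $0.20 per million requests.
print(f"query I/O: ${query_io_cost(100_000, usd_per_million=0.20):.2f}")
```

The instance part is predictable. The I/O part depends on how much data each query touches, which is the quantity that is hard to know in advance.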

Conclusion

First of all, Amazon Neptune keeps most of its promises. As a managed service, it is a graph database that is very easy to install and can be up and running without much configuration. Here are our five key findings:

  • Bulk upload is simple but slow. It can also get complicated because the error messages are not very helpful.
  • Streaming upload supports everything we expected and was quite fast.
  • Queries are simple, but not interactive enough to run analytical queries.
  • SPARQL queries have to be optimized manually.
  • Amazon charges are hard to estimate because it is difficult to estimate the amount of data scanned by a SPARQL query.



Source: www.hab.com