Elasticsearch is a search engine with a JSON REST API, built on Lucene and written in Java. A description of all the advantages of this engine is available on the official website. From here on we will refer to Elasticsearch as ES.
Engines like this are used for complex search over a document database, for example search that takes the morphology of a language into account, or search by geo coordinates.
In this article I will cover the basics of ES using the example of indexing blog posts. I will show how to filter, sort and search documents.
To stay independent of the operating system, I will make all requests to ES with curl. There is also a Google Chrome plugin for this purpose.
The text contains links to the documentation and other sources. At the end there are links for quick access to the documentation. Definitions of unfamiliar terms can be found in the glossary.
Installing ES
For this we first need Java. The developers recommend installing a Java version newer than Java 8 update 20 or Java 7 update 55.
The ES distribution is available on the developer's site. After unpacking the archive, run bin/elasticsearch. Other installation options are also available.
After installing and starting ES, let's check that it works:
# For convenience, store the address in a variable
#export ES_URL=$(docker-machine ip dev):9200
export ES_URL=localhost:9200
curl -X GET "$ES_URL"
We will get something like this:
{
"name" : "Heimdall",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.1",
"build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
"build_timestamp" : "2016-03-09T09:38:54Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
Indexing
Let's add a post to ES:
# Add a document with id 1 of type post to the blog index.
# ?pretty makes the output human-readable.
curl -XPUT "$ES_URL/blog/post/1?pretty" -d'
{
"title": "Веселые котята",
"content": "<p>Смешная история про котят<p>",
"tags": [
"котята",
"смешная история"
],
"published_at": "2014-09-12T20:44:42+00:00"
}'
Server response:
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : false
}
ES automatically created the blog index and the post type. A rough analogy: an index is a database, and a type is a table in that database. Each type has its own schema, called a mapping, just like a relational table. The mapping is generated automatically when a document is indexed:
# Get the mapping of all types in the blog index
curl -XGET "$ES_URL/blog/_mapping?pretty"
In the server response below, I have added the field values of the indexed document as comments:
{
"blog" : {
"mappings" : {
"post" : {
"properties" : {
/* "content": "<p>Смешная история про котят<p>", */
"content" : {
"type" : "string"
},
/* "published_at": "2014-09-12T20:44:42+00:00" */
"published_at" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
/* "tags": ["котята", "смешная история"] */
"tags" : {
"type" : "string"
},
/* "title": "Веселые котята" */
"title" : {
"type" : "string"
}
}
}
}
}
}
It is worth noting that ES does not distinguish between a single value and an array of values. For example, the title field holds a single title and the tags field holds an array of strings, yet both are represented the same way in the mapping.
We will return to mappings in more detail later.
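To make the point about single values and arrays concrete, here is a minimal Python sketch of how a field type might be guessed from a JSON value. It is a toy with simplified rules, not how ES actually implements dynamic mapping, and the names infer_type and infer_mapping are made up for this illustration.

```python
from datetime import datetime


def infer_type(value):
    """Guess a mapping type for one JSON value (toy rules, not ES code)."""
    if value is None:
        return None
    if isinstance(value, list):
        # An array gets the type of its elements; there is no separate array type.
        return infer_type(value[0]) if value else None
    if isinstance(value, bool):  # bool must be checked before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # crude stand-in for date detection
            return "date"
        except ValueError:
            return "string"
    return "object"


def infer_mapping(doc):
    """Build a field -> type dictionary for a whole document."""
    return {field: infer_type(value) for field, value in doc.items()}


post = {
    "title": "Веселые котята",
    "tags": ["котята", "смешная история"],
    "published_at": "2014-09-12T20:44:42+00:00",
}
print(infer_mapping(post))
```

Both title (one string) and tags (an array of strings) come out as "string", which mirrors what the generated mapping shows.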
Queries
Retrieving a document by its id:
# Retrieve the document with id 1 of type post from the blog index
curl -XGET "$ES_URL/blog/post/1?pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
}
New keys have appeared in the response: _version and _source. In general, all keys that start with _ are service keys.
The _version key holds the document version. It is required for the optimistic locking mechanism. Suppose we want to change a document that has version 1. We submit the changed document and state that this is an edit of the document with version 1. If someone else also edited version 1 and submitted their changes before us, ES will reject our changes, because it now stores the document with version 2.
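The version check described above can be illustrated with a small in-memory Python sketch. It only models the idea of optimistic locking. ToyStore, VersionConflict and the expected_version parameter are hypothetical names for this example and are not part of any ES client.

```python
class VersionConflict(Exception):
    """Raised when the caller edited a stale version of a document."""


class ToyStore:
    """In-memory stand-in for an index that versions every document."""

    def __init__(self):
        self._docs = {}  # id -> (version, document)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, doc, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller based its edit on an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, stored version is {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, doc)
        return new_version


store = ToyStore()
store.put("1", {"title": "draft"})                       # stored as version 1
store.put("1", {"title": "edit A"}, expected_version=1)  # accepted, now version 2

try:
    # A second editor also started from version 1 and arrives too late.
    store.put("1", {"title": "edit B"}, expected_version=1)
    conflict = False
except VersionConflict:
    conflict = True

print(conflict, store.get("1"))
```

The late edit is rejected rather than silently overwriting the first one, so the second editor has to re-read the document and retry.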
The _source key contains the document we indexed. ES does not use this value for search operations, because searching is done against the indexes. To save space, ES stores the source document compressed. If we only need the id and not the whole source document, storing the source can be disabled.
If we do not need the additional information, we can retrieve only the contents of _source:
curl -XGET "$ES_URL/blog/post/1/_source?pretty"
{
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
You can also select only certain fields:
# Retrieve only the title field
curl -XGET "$ES_URL/blog/post/1?_source=title&pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята"
}
}
Let's index a few more posts and run more complex queries.
curl -XPUT "$ES_URL/blog/post/2" -d'
{
"title": "Веселые щенки",
"content": "<p>Смешная история про щенков<p>",
"tags": [
"щенки",
"смешная история"
],
"published_at": "2014-08-12T20:44:42+00:00"
}'
curl -XPUT "$ES_URL/blog/post/3" -d'
{
"title": "Как у меня появился котенок",
"content": "<p>Душераздирающая история про бедного котенка с улицы<p>",
"tags": [
"котята"
],
"published_at": "2014-07-21T20:44:42+00:00"
}'
Sorting
# Find the most recent post by publication date and retrieve the title and published_at fields
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"size": 1,
"_source": ["title", "published_at"],
"sort": [{"published_at": "desc"}]
}'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : null,
"_source" : {
"title" : "Веселые котята",
"published_at" : "2014-09-12T20:44:42+00:00"
},
"sort" : [ 1410554682000 ]
} ]
}
}
We selected the most recent post. size limits the number of documents returned. total shows the total number of documents that match the query. sort in the output contains the array of integers used for sorting, i.e. the date was converted to an integer. More about sorting can be found in the documentation.
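The integer in the sort array is the publication date expressed as milliseconds since the Unix epoch. A short Python check, where to_epoch_millis is a helper name invented for this example, reproduces the value from the response:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp):
    """Convert an ISO 8601 timestamp with offset to whole milliseconds since the epoch."""
    moment = datetime.fromisoformat(iso_timestamp)
    return (moment - EPOCH) // timedelta(milliseconds=1)


sort_value = to_epoch_millis("2014-09-12T20:44:42+00:00")
print(sort_value)  # 1410554682000
```

Comparing such integers is all that sorting by date comes down to.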
Filters and queries
Since version 2, ES no longer distinguishes between filters and queries; the notion of contexts is used instead.
Query context differs from filter context in that a query produces a _score and is not cached. I will show what _score is later.
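As an illustration that is not taken from the original article: in the ES 2.x query DSL, one way to combine the two contexts in a single request is a bool query, where the must clause runs in query context and the filter clause runs in filter context. The Python sketch below only builds and prints such a request body and does not talk to a server. The field names match the blog posts used in this article.

```python
import json

# must   -> query context: the match clause contributes to _score
# filter -> filter context: the range clause only includes or excludes documents
body = {
    "query": {
        "bool": {
            "must": {"match": {"content": "история"}},
            "filter": {"range": {"published_at": {"gte": "2014-09-01"}}},
        }
    }
}

# ensure_ascii=False keeps the Cyrillic text readable in the output
payload = json.dumps(body, ensure_ascii=False, indent=2)
print(payload)
```

The printed JSON could be passed to curl with -d in the same way as the other request bodies in this article.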
Filtering by date
We use a range query in filter context:
# Get the posts published on September 1st or later
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"filter": {
"range": {
"published_at": { "gte": "2014-09-01" }
}
}
}'
Filtering by tags
We use a term query to find the ids of documents that contain a given word:
# Find all documents whose tags field contains the element 'котята'
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": [
"title",
"tags"
],
"filter": {
"term": {
"tags": "котята"
}
}
}'
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "Веселые котята",
"tags" : [ "котята", "смешная история" ]
}
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "Как у меня появился котенок",
"tags" : [ "котята" ]
}
} ]
}
}
Full-text search
Three of our documents contain the following in the content field:
<p>Смешная история про котят<p>
<p>Смешная история про щенков<p>
<p>Душераздирающая история про бедного котенка с улицы<p>
We use a match query to find the ids of documents that contain a given word:
# "_source": false means the _source of the found documents does not need to be retrieved
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": false,
"query": {
"match": {
"content": "история"
}
}
}'
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.11506981,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "2",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 0.095891505
} ]
}
}
However, if we search the content field for the word "истории", we will find nothing, because the index contains only the original words, not their stems. To get good-quality search, an analyzer has to be configured.
The _score field shows relevance. If a request is executed in filter context, the _score value is always 1, which means a full match with the filter.
Analyzers
Analyzers are needed to turn source text into a set of tokens.
An analyzer consists of one Tokenizer and several optional TokenFilters. The Tokenizer may be preceded by several CharFilters. A Tokenizer breaks the source string into tokens, for example on spaces and punctuation. A TokenFilter can change tokens, remove them or add new ones, for example keep only the stem of a word, remove prepositions, or add synonyms. A CharFilter changes the whole source string, for example by stripping html tags.
ES has several standard analyzers, for example the russian analyzer.
Let's use the API to see how the standard and russian analyzers transform the string "Веселые истории про котят":
# Use the standard analyzer
# Non-ASCII characters must be URL-encoded
curl -XGET "$ES_URL/_analyze?pretty&analyzer=standard&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "веселые",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истории",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "про",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "котят",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
# Use the russian analyzer
curl -XGET "$ES_URL/_analyze?pretty&analyzer=russian&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "весел",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истор",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "кот",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
The standard analyzer split the string on spaces and converted everything to lowercase. The russian analyzer removed the insignificant words, converted everything to lowercase and kept only the stems of the words.
Let's see which Tokenizer, TokenFilters and CharFilters the russian analyzer uses:
{
"filter": {
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": []
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"russian": {
"tokenizer": "standard",
/* TokenFilters */
"filter": [
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer"
]
/* there are no CharFilters */
}
}
}
Let's describe our own analyzer based on russian that also strips html tags. We will call it default, because the analyzer with that name is used by default.
{
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
/* add html tag stripping */
"char_filter": ["html_strip"],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
First, all HTML tags are removed from the source string. Then the standard tokenizer splits it into tokens, the resulting tokens are converted to lowercase, insignificant words are removed, and the remaining tokens are reduced to the word stem.
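The order of these steps can be sketched in plain Python. This is only a toy model of the pipeline: the stop list is a tiny made-up sample, and toy_stem cuts a handful of hard-coded suffixes instead of running the real Russian stemmer. All function names are hypothetical.

```python
import re

STOP_WORDS = {"про", "и", "в", "на", "с"}   # tiny sample list, not the _russian_ list
SUFFIXES = ("ая", "ия", "ии", "ые", "ят")   # toy suffix list, not the real stemmer


def html_strip(text):
    """CharFilter step: works on the whole string before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer step: split the string into word tokens."""
    return re.findall(r"\w+", text)


def toy_stem(token):
    """TokenFilter step: cut one known suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the steps in the same order as the default analyzer described above."""
    tokens = tokenize(html_strip(text))
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


print(analyze("<p>Веселые истории про котят<p>"))
print(analyze("<p>Смешная история про котят<p>"))
```

Both "истории" and "история" end up as the same token in this sketch, which is the property that lets a search for one form find documents containing the other.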
Creating an index
Above we described the default analyzer. It is applied to all string fields. Our post contains an array of tags, so the tags would also be processed by the analyzer. Since we look up posts by exact match on a tag, analysis has to be disabled for the tags field.
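Why analysis gets in the way of exact tag matching can be shown with a toy inverted index in Python. It assumes a very simple stand-in analyzer (lowercase plus split on whitespace). build_index and simple_analyze are names invented for this sketch.

```python
from collections import defaultdict


def simple_analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs, analyzed):
    """Map each indexed term of the tags field to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            # Analyzed: the tag is broken into tokens. Not analyzed: it stays one term.
            terms = simple_analyze(tag) if analyzed else [tag]
            for term in terms:
                index[term].add(doc_id)
    return index


docs = {
    "1": ["котята", "смешная история"],
    "2": ["щенки", "смешная история"],
    "3": ["котята"],
}

analyzed_index = build_index(docs, analyzed=True)
exact_index = build_index(docs, analyzed=False)

# An exact lookup compares the query string with the indexed terms as-is.
print(sorted(analyzed_index.get("смешная история", set())))  # []
print(sorted(exact_index.get("смешная история", set())))     # ['1', '2']
```

With analysis enabled the two-word tag no longer exists in the index as a single term, so an exact lookup for it finds nothing.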
Let's create the blog2 index with this analyzer and a mapping in which analysis of the tags field is disabled:
curl -XPOST "$ES_URL/blog2" -d'
{
"settings": {
"analysis": {
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
"char_filter": [
"html_strip"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
},
"mappings": {
"post": {
"properties": {
"content": {
"type": "string"
},
"published_at": {
"type": "date"
},
"tags": {
"type": "string",
"index": "not_analyzed"
},
"title": {
"type": "string"
}
}
}
}
}'
Let's add the same 3 posts to this index (blog2). I will skip this step, since it is the same as adding documents to the blog index.
Full-text search with expression support
Let's look at another type of query:
# Find documents that contain the word 'истории'
# query -> simple_query_string -> query holds the search query
# the title field has priority 3
# the tags field has priority 2
# the content field has priority 1
# priority is used when ranking the results
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "истории",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
Since we are using an analyzer with Russian stemming, this query returns all the documents, even though they only contain the word 'история'.
The query may contain special characters, for example:
"\"fried eggs\" +(eggplant | potato) -frittata"
Query syntax:
+ signifies AND operation
| signifies OR operation
- negates a single token
" wraps a number of tokens to signify a phrase for searching
* at the end of a term signifies a prefix query
( and ) signify precedence
~N after a word signifies edit distance (fuzziness)
~N after a phrase signifies slop amount
# Find documents without the word 'щенки'
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "-щенки",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
# This returns the 2 posts about kittens
Links
PS
If you are interested in tutorial articles like this one, have ideas for new articles, or have proposals for cooperation, I will be glad to receive a private message or an email at m.kuzmin+habr@darkleaf.ru.
Source: www.hab.com
