Elasticsearch is a search engine with a JSON REST API, built on Lucene and written in Java. A description of all the advantages of this engine is available on the official website. From here on we will refer to Elasticsearch as ES.
Engines like this are used for complex search over a document database, for example search that takes the morphology of a language into account, or search by geo coordinates.
In this article I will cover the basics of ES using the example of indexing blog posts. I will show how to filter, sort, and search documents.
To avoid depending on the operating system, I will make all requests to ES with curl. There is also a plugin for Google Chrome that can be used instead.
The text contains links to documentation and other sources. At the end there are links for quick access to the documentation. Definitions of unfamiliar terms can be found in the glossary.
Installation
For this we first need Java. The developers recommend installing a Java version newer than Java 8 update 20 or Java 7 update 55.
The ES distribution is available from the developer's website. After unpacking the archive, run bin/elasticsearch. Other installation options are available as well.
After installing and starting ES, let's check that it works:
# for convenience, store the address in a variable
#export ES_URL=$(docker-machine ip dev):9200
export ES_URL=localhost:9200
curl -X GET $ES_URL
We will get a response like this:
{
"name" : "Heimdall",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.1",
"build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
"build_timestamp" : "2016-03-09T09:38:54Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
Indexing
Let's add a post to ES:
# Add a document with id 1 of type post to the blog index.
# ?pretty asks for human-readable output.
curl -XPUT "$ES_URL/blog/post/1?pretty" -d'
{
"title": "Веселые котята",
"content": "<p>Смешная история про котят<p>",
"tags": [
"котята",
"смешная история"
],
"published_at": "2014-09-12T20:44:42+00:00"
}'
Server response:
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : false
}
ES automatically created the index blog and the type post. A rough analogy: an index is a database, and a type is a table in that database. Each type has its own schema, called a mapping, just like a relational table. The mapping is generated automatically when a document is indexed:
# get the mapping of all types in the blog index
curl -XGET "$ES_URL/blog/_mapping?pretty"
In the server response below, I have added the field values of the indexed document as comments:
{
"blog" : {
"mappings" : {
"post" : {
"properties" : {
/* "content": "<p>Смешная история про котят<p>", */
"content" : {
"type" : "string"
},
/* "published_at": "2014-09-12T20:44:42+00:00" */
"published_at" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
/* "tags": ["котята", "смешная история"] */
"tags" : {
"type" : "string"
},
/* "title": "Веселые котята" */
"title" : {
"type" : "string"
}
}
}
}
}
}
It is worth noting that ES makes no distinction between a single value and an array of values. For example, the title field holds just a title while the tags field holds an array of strings, yet both are represented the same way in the mapping.
We will say more about mappings later.
Queries
Retrieving a document by its id:
# retrieve the document with id 1 of type post from the blog index
curl -XGET "$ES_URL/blog/post/1?pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
}
New keys have appeared in the response: _version and _source. In general, all keys that start with _ are reserved for internal use.
The _version key shows the version of the document. It is needed for the optimistic locking mechanism to work. For example, suppose we want to change a document that has version 1. We submit the changed document and state that this is an edit of the document at version 1. If someone else also edited the document at version 1 and submitted their changes before us, ES will not accept our changes, because it now stores the document at version 2.
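The version check described above can be sketched outside of ES. The following is a minimal in-memory illustration, not the ES implementation; the names VersionedStore, VersionConflict, and expected_version are hypothetical and exist only for this example.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Toy in-memory store; it only imitates the version check, not ES itself."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        # A document that does not exist yet is treated as version 0.
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, "
                f"stored version is {current_version}"
            )
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
store.put("1", {"title": "draft"})                              # becomes version 1
store.put("1", {"title": "someone else"}, expected_version=1)   # becomes version 2
try:
    # Our edit was also based on version 1, which is now stale.
    store.put("1", {"title": "our edit"}, expected_version=1)
except VersionConflict as error:
    print("rejected:", error)
```

The second writer loses because its edit was prepared against a version that is no longer current; it has to re-read the document and try again.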
The _source key contains the document that we indexed. ES does not use this value for search operations, because indexes are used for searching. To save space, ES stores the source document compressed. If we only need the id and not the whole source document, we can disable storing the source.
If we don't need the additional information, we can fetch only the contents of _source:
curl -XGET "$ES_URL/blog/post/1/_source?pretty"
{
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
You can also select only certain fields:
# retrieve only the title field
curl -XGET "$ES_URL/blog/post/1?_source=title&pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята"
}
}
Let's index a few more posts and run more complex queries.
curl -XPUT "$ES_URL/blog/post/2" -d'
{
"title": "Веселые щенки",
"content": "<p>Смешная история про щенков<p>",
"tags": [
"щенки",
"смешная история"
],
"published_at": "2014-08-12T20:44:42+00:00"
}'
curl -XPUT "$ES_URL/blog/post/3" -d'
{
"title": "Как у меня появился котенок",
"content": "<p>Душераздирающая история про бедного котенка с улицы<p>",
"tags": [
"котята"
],
"published_at": "2014-07-21T20:44:42+00:00"
}'
Sorting
# find the latest post by publication date and retrieve the title and published_at fields
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"size": 1,
"_source": ["title", "published_at"],
"sort": [{"published_at": "desc"}]
}'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : null,
"_source" : {
"title" : "Веселые котята",
"published_at" : "2014-09-12T20:44:42+00:00"
},
"sort" : [ 1410554682000 ]
} ]
}
}
We selected the latest post. size limits the number of documents returned. total shows the total number of documents that match the query. sort in the output contains an array of integers by which the sorting is performed; that is, the date was converted to an integer. More about sorting can be found in the documentation.
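The integer in the sort array is the publication date expressed as milliseconds since the Unix epoch. A small standalone check, assuming the timestamp carries an explicit UTC offset as in the posts above (to_epoch_millis is a helper name made up for this sketch):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    moment = datetime.fromisoformat(iso_timestamp)
    # Integer division of two timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2014-09-12T20:44:42+00:00"))  # 1410554682000
```

The printed value is the same number that appears in the sort field of the response.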
Filters and queries
Since version 2, ES does not distinguish between filters and queries; instead, it introduces the notion of contexts.
Query context differs from filter context in that a query produces a _score and is not cached. I will show what _score is later.
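As a rough illustration of the two contexts: in the ES 2.x query DSL, a bool query runs its must clauses in query context and its filter clauses in filter context. The sketch below only builds such a request body as a Python dictionary and prints it as JSON; it does not contact ES, and the helper name filtered_search_body is an assumption of this example. Note that the requests later in this article use a top-level filter key rather than this form.

```python
import json


def filtered_search_body(text, field, published_from):
    """Build a search body with one scored clause and one filter clause."""
    return {
        "query": {
            "bool": {
                # query context: this clause contributes to _score
                "must": {"match": {field: text}},
                # filter context: a yes/no match with no effect on _score
                "filter": {
                    "range": {"published_at": {"gte": published_from}}
                },
            }
        }
    }


body = filtered_search_body("история", "content", "2014-09-01")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON could be passed to curl with -d in the same way as the other request bodies in this article.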
Filtering by date
We use the range query in filter context:
# get posts published on September 1 or later
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"filter": {
"range": {
"published_at": { "gte": "2014-09-01" }
}
}
}'
Filtering by tags
We use a term query to find the ids of documents that contain a given word:
# find all documents whose tags field contains the element 'котята'
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": [
"title",
"tags"
],
"filter": {
"term": {
"tags": "котята"
}
}
}'
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "Веселые котята",
"tags" : [ "котята", "смешная история" ]
}
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "Как у меня появился котенок",
"tags" : [ "котята" ]
}
} ]
}
}
Full-text search
Our three documents contain the following in the content field:
<p>Смешная история про котят<p>
<p>Смешная история про щенков<p>
<p>Душераздирающая история про бедного котенка с улицы<p>
We use a match query to find the ids of documents that contain a given word:
# _source: false means the _source of the matched documents does not need to be retrieved
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": false,
"query": {
"match": {
"content": "история"
}
}
}'
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.11506981,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "2",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 0.095891505
} ]
}
}
However, if we search for the word 'истории' in the content field, we will find nothing, because the index contains only the original words, not their stems. To get good-quality search, the analyzer needs to be configured.
The _score field shows relevance. If a query is run in filter context, the value of _score is always 1, which means a complete match with the filter.
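Why 'история' matches and 'истории' does not can be seen with a toy inverted index. This is only a sketch: the tokenizer below (lowercase, split on non-word characters) is a simplified stand-in for the standard analyzer, and the names POSTS, tokenize, and build_index are made up for the example.

```python
import re
from collections import defaultdict

# The content fields of the three posts indexed above.
POSTS = {
    1: "<p>Смешная история про котят<p>",
    2: "<p>Смешная история про щенков<p>",
    3: "<p>Душераздирающая история про бедного котенка с улицы<p>",
}


def tokenize(text):
    """Lowercase the text and split it on non-word characters; no stemming."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


index = build_index(POSTS)
print(sorted(index.get("история", set())))  # [1, 2, 3]
print(sorted(index.get("истории", set())))  # []
```

Lookup is an exact match on stored tokens, so a different word form finds nothing unless both the indexed text and the query are reduced to a common stem.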
Analyzers
Analyzers are needed to convert the source text into a set of tokens.
An analyzer consists of one Tokenizer and several optional TokenFilters. The Tokenizer may be preceded by several CharFilters. Tokenizers split the source string into tokens, for example on spaces and punctuation characters. A TokenFilter can modify tokens, remove them, or add new ones; for example, it can keep only the stem of a word, remove prepositions, or add synonyms. A CharFilter modifies the whole source string; for example, it strips html tags.
ES ships with several standard analyzers, for example the russian analyzer.
Let's use the _analyze API to see how the standard and russian analyzers transform the string "Веселые истории про котят":
# use the standard analyzer
# non-ASCII characters must be URL-encoded
curl -XGET "$ES_URL/_analyze?pretty&analyzer=standard&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "веселые",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истории",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "про",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "котят",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
# use the russian analyzer
curl -XGET "$ES_URL/_analyze?pretty&analyzer=russian&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "весел",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истор",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "кот",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
The standard analyzer split the string on spaces and converted everything to lowercase. The russian analyzer removed insignificant words, converted everything to lowercase, and kept only the stems of the words.
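The percent-encoded text parameter in the two _analyze requests above is the UTF-8 encoding of the Russian string. One way to produce it is Python's standard urllib.parse.quote; the helper analyze_url and the localhost:9200 address are assumptions of this sketch.

```python
from urllib.parse import quote


def analyze_url(base_url, analyzer, text):
    """Build an _analyze URL with the text percent-encoded as UTF-8."""
    return f"{base_url}/_analyze?pretty&analyzer={analyzer}&text={quote(text)}"


# The address is hypothetical; use whatever $ES_URL points at.
print(analyze_url("localhost:9200", "russian", "Веселые истории про котят"))
```

The resulting URL can be passed to curl as is.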
Let's see which Tokenizer, TokenFilters, and CharFilters the russian analyzer uses:
{
"filter": {
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": []
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"russian": {
"tokenizer": "standard",
/* TokenFilters */
"filter": [
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer"
]
/* no CharFilters */
}
}
}
Let's define our own analyzer based on russian that also strips html tags. We will call it default, because an analyzer with this name is used by default.
{
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
/* add stripping of html tags */
"char_filter": ["html_strip"],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
First, all html tags are stripped from the source string. Then the standard tokenizer splits it into tokens, the resulting tokens are converted to lowercase, insignificant words are removed, and the remaining tokens are reduced to the stem of the word.
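The order of these steps can be imitated with a toy pipeline. This is a sketch under loud assumptions: STOP_WORDS is a tiny stand-in for the _russian_ stop list, SUFFIXES is a crude stand-in for the Russian stemmer (the real one is far more thorough), and all function names are invented for the example.

```python
import re

STOP_WORDS = {"про", "с", "у", "и", "в"}      # tiny stand-in for _russian_
SUFFIXES = ("ые", "ая", "ии", "ия", "ят")     # crude stand-in for the stemmer


def strip_html(text):
    """Char filter: replace html tags with spaces."""
    return re.sub(r"<[^>]*>", " ", text)


def tokenize(text):
    """Tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def stem(token):
    """Token filter: cut one known suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run char filter, tokenizer, then token filters in that order."""
    tokens = tokenize(strip_html(text))
    tokens = [token.lower() for token in tokens]
    tokens = [token for token in tokens if token not in STOP_WORDS]
    return [stem(token) for token in tokens]


print(analyze("<p>Веселые истории про котят</p>"))  # ['весел', 'истор', 'кот']
print(analyze("история") == analyze("истории"))     # True
```

Because both word forms end up as the same token, a query for one form can match documents that contain the other.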
Creating an index
Above we described the default analyzer. It applies to all string fields. Our post contains an array of tags, so the tags would also be processed by the analyzer. Because we look up posts by an exact match on the tag, we need to disable analysis for the tags field.
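The effect of disabling analysis can be shown with a toy model of what gets stored for a field value. The analyzed branch uses a simple lowercase-and-split stand-in rather than the real default analyzer, and indexed_terms is a name made up for this sketch.

```python
import re


def indexed_terms(value, analyzed):
    """Return the terms stored for one field value.

    analyzed=True uses a simple lowercase/split stand-in for an analyzer;
    analyzed=False keeps the value verbatim, like "index": "not_analyzed".
    """
    if analyzed:
        return re.findall(r"\w+", value.lower())
    return [value]


tag = "смешная история"
print(tag in indexed_terms(tag, analyzed=True))   # False: stored as two terms
print(tag in indexed_terms(tag, analyzed=False))  # True: stored as one exact term
```

An exact-match lookup for a multi-word tag can only succeed when the tag is stored as a single term.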
Let's create the index blog2 with the analyzer and a mapping in which analysis of the tags field is disabled:
curl -XPOST "$ES_URL/blog2" -d'
{
"settings": {
"analysis": {
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
"char_filter": [
"html_strip"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
},
"mappings": {
"post": {
"properties": {
"content": {
"type": "string"
},
"published_at": {
"type": "date"
},
"tags": {
"type": "string",
"index": "not_analyzed"
},
"title": {
"type": "string"
}
}
}
}
}'
Let's add the same 3 posts to this index (blog2). I will skip that step, since it is the same as adding documents to the blog index.
Full-text search with expression support
Let's look at another type of query:
# find documents that contain the word 'истории'
# query -> simple_query_string -> query holds the search query
# the title field has priority 3
# the tags field has priority 2
# the content field has priority 1
# priority is used when ranking the results
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "истории",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
Because we are using an analyzer with Russian stemming, this query returns all the documents, even though they contain only the word 'история'.
The query can contain special characters, for example:
"\"fried eggs\" +(eggplant | potato) -frittata"
Query syntax:
+ signifies AND operation
| signifies OR operation
- negates a single token
" wraps a number of tokens to signify a phrase for searching
* at the end of a term signifies a prefix query
( and ) signify precedence
~N after a word signifies edit distance (fuzziness)
~N after a phrase signifies slop amount
# find documents that do not contain the word 'щенки'
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "-щенки",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
# we get 2 posts about cats
Links
PS
If you are interested in tutorials like this, have ideas for new articles, or want to propose a collaboration, I will be glad to get a private message or an email at m.kuzmin+habr@darkleaf.ru.
Source: www.habr.com
