Elasticsearch Basics

Elasticsearch is a search engine with a JSON REST API, built on Lucene and written in Java. A description of all its advantages is available on the official website. From here on we will refer to Elasticsearch as ES.

Engines like this are used for complex searches over a document database, for example search that takes the morphology of a language into account, or search by geo coordinates.

In this article I will cover the basics of ES using the example of indexing blog posts. I will show how to filter, sort, and search documents.

To stay independent of the operating system, I will make all requests to ES using curl. There is also a Google Chrome plugin called Sense.

The text contains links to the documentation and other sources. At the end there are links for quick access to the documentation. Definitions of unfamiliar terms can be found in the glossary.

Installation

To install ES, we first need Java. The developers recommend installing a Java version newer than Java 8 update 20 or Java 7 update 55.

The ES distribution is available on the developer's website. After unpacking the archive, run bin/elasticsearch. Packages for apt and yum are also available, and there is an official Docker image. See the installation documentation for more details.

After installing and starting ES, let's check that it works:

# for convenience, store the address in a variable
#export ES_URL=$(docker-machine ip dev):9200
export ES_URL=localhost:9200

curl -X GET $ES_URL

We will get something like this:

{
  "name" : "Heimdall",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.1",
    "build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
    "build_timestamp" : "2016-03-09T09:38:54Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}

Indexing

Let's add a post to ES:

# Add a document with id 1 of type post to the blog index.
# ?pretty asks for human-readable output.

curl -XPUT "$ES_URL/blog/post/1?pretty" -d'
{
  "title": "Веселые котята",
  "content": "<p>Смешная история про котят<p>",
  "tags": [
    "котята",
    "смешная история"
  ],
  "published_at": "2014-09-12T20:44:42+00:00"
}'

Server response:

{
  "_index" : "blog",
  "_type" : "post",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}

ES automatically created the blog index and the post type. We can draw a rough analogy: an index is a database, and a type is a table in that database. Each type has its own schema, called the mapping, just like a relational table. The mapping is generated automatically when a document is indexed:

# Get the mapping of all types in the blog index
curl -XGET "$ES_URL/blog/_mapping?pretty"

The server response, with the field values of the indexed document added as comments:

{
  "blog" : {
    "mappings" : {
      "post" : {
        "properties" : {
          /* "content": "<p>БмСшная история ΠΏΡ€ΠΎ котят<p>", */ 
          "content" : {
            "type" : "string"
          },
          /* "published_at": "2014-09-12T20:44:42+00:00" */
          "published_at" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          /* "tags": ["котята", "смСшная история"] */
          "tags" : {
            "type" : "string"
          },
          /*  "title": "ВСсСлыС котята" */
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

Note that ES does not distinguish between a single value and an array of values. For example, the title field simply contains a title, while the tags field contains an array of strings, yet both are represented the same way in the mapping. We will say more about mapping later.

Queries

Retrieving a document by its id:

# fetch the document with id 1 of type post from the blog index
curl -XGET "$ES_URL/blog/post/1?pretty"
{
  "_index" : "blog",
  "_type" : "post",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "ВСсСлыС котята",
    "content" : "<p>БмСшная история ΠΏΡ€ΠΎ котят<p>",
    "tags" : [ "котята", "смСшная история" ],
    "published_at" : "2014-09-12T20:44:42+00:00"
  }
}

New keys appeared in the response: _version and _source. In general, all keys that start with _ are reserved for internal use.

The _version key shows the version of the document. It is needed for the optimistic locking mechanism to work. For example, suppose we want to change a document that has version 1. We submit the changed document and indicate that this is an edit of the document with version 1. If someone else also edited the document with version 1 and submitted their changes before us, ES will not accept our changes, because it already stores the document with version 2.
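
The version check described above can be sketched in plain Python. This is a toy in-memory store, not ES code: ToyStore, VersionConflict, and the expected_version parameter are hypothetical names chosen for illustration. In ES 2.x itself the expected version is, as far as I know, passed as a version query parameter and a mismatch is reported as a conflict error.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ToyStore:
    """In-memory stand-in for a versioned document store (not ES itself)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, body, expected_version=None):
        # A document that does not exist yet is treated as version 0.
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, stored version {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version


store = ToyStore()
v1 = store.index("1", {"title": "draft"})  # creates version 1
v2 = store.index("1", {"title": "edit by someone else"}, expected_version=1)

try:
    # Our edit was also based on version 1, but the store is already at 2.
    store.index("1", {"title": "our edit"}, expected_version=1)
    conflict = False
except VersionConflict:
    conflict = True

print(v1, v2, conflict)
```

The second writer loses: the stale edit is rejected instead of silently overwriting the newer document, which is the point of optimistic locking.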

The _source key contains the document that we indexed. ES does not use this value for search operations, because indexes are used for searching. To save space, ES stores the source document compressed. If we only need the id and not the whole source document, we can disable source storage.

If we do not need the additional information, we can get only the contents of _source:

curl -XGET "$ES_URL/blog/post/1/_source?pretty"
{
  "title" : "ВСсСлыС котята",
  "content" : "<p>БмСшная история ΠΏΡ€ΠΎ котят<p>",
  "tags" : [ "котята", "смСшная история" ],
  "published_at" : "2014-09-12T20:44:42+00:00"
}

You can also select only certain fields:

# fetch only the title field
curl -XGET "$ES_URL/blog/post/1?_source=title&pretty"
{
  "_index" : "blog",
  "_type" : "post",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "ВСсСлыС котята"
  }
}

Let's index a few more posts and run more complex queries.

curl -XPUT "$ES_URL/blog/post/2" -d'
{
  "title": "Веселые щенки",
  "content": "<p>Смешная история про щенков<p>",
  "tags": [
    "щенки",
    "смешная история"
  ],
  "published_at": "2014-08-12T20:44:42+00:00"
}'
curl -XPUT "$ES_URL/blog/post/3" -d'
{
  "title": "Как у меня появился котенок",
  "content": "<p>Душераздирающая история про бедного котенка с улицы<p>",
  "tags": [
    "котята"
  ],
  "published_at": "2014-07-21T20:44:42+00:00"
}'

Sorting

# find the latest post by publication date and fetch the title and published_at fields
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
  "size": 1,
  "_source": ["title", "published_at"],
  "sort": [{"published_at": "desc"}]
}'
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "title" : "ВСсСлыС котята",
        "published_at" : "2014-09-12T20:44:42+00:00"
      },
      "sort" : [ 1410554682000 ]
    } ]
  }
}

We selected the latest post. size limits the number of documents returned. total shows the total number of documents that match the query. sort in the output contains the array of integers used for sorting; that is, the date was converted to an integer. More about sorting can be found in the documentation.
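
As a quick check of that conversion, the following sketch turns the published_at value of post 1 into milliseconds since the Unix epoch. The helper name to_epoch_millis is made up for this example; only the Python standard library is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to milliseconds since the Unix epoch."""
    moment = datetime.fromisoformat(iso_timestamp)
    # Integer division of two timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


sort_value = to_epoch_millis("2014-09-12T20:44:42+00:00")
print(sort_value)  # 1410554682000, the value in the "sort" array of the response
```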

Filters and queries

Since version 2, ES no longer distinguishes between filters and queries; instead, the concept of contexts was introduced. The query context differs from the filter context in that a query generates a _score and is not cached. I will show what _score is later.

Filter by date

We use the range query in the filter context:

# get posts published on September 1 or later
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
  "filter": {
    "range": {
      "published_at": { "gte": "2014-09-01" }
    }
  }
}'
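
From ES 2.0 on, the bool query also accepts a filter clause, which is another way to run a condition in the filter context. The sketch below only builds such a request body in Python and prints it; the helper filtered_search is a hypothetical name, and the _score values returned for this form may differ from those of the request above.

```python
import json


def filtered_search(filter_clause, source_fields=None):
    """Wrap a filter clause in a bool query so that it runs in the filter context."""
    body = {"query": {"bool": {"filter": filter_clause}}}
    if source_fields is not None:
        body["_source"] = source_fields
    return body


# Same condition as above: posts published on September 1, 2014 or later.
body = filtered_search({"range": {"published_at": {"gte": "2014-09-01"}}})
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d in the same way as the other request bodies in this article.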

Filter by tags

We use the term query to find the ids of documents that contain a given word:

# find all documents whose tags field contains the element 'котята'
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
  "_source": [
    "title",
    "tags"
  ],
  "filter": {
    "term": {
      "tags": "котята"
    }
  }
}'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "title" : "ВСсСлыС котята",
        "tags" : [ "котята", "смСшная история" ]
      }
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "title" : "Как Ρƒ мСня появился ΠΊΠΎΡ‚Π΅Π½ΠΎΠΊ",
        "tags" : [ "котята" ]
      }
    } ]
  }
}

Full-text search

Our three documents contain the following in the content field:

  • <p>Смешная история про котят<p>
  • <p>Смешная история про щенков<p>
  • <p>Душераздирающая история про бедного котенка с улицы<p>

We use the match query to find the ids of documents that contain a given word:

# _source: false means the _source of the matching documents is not returned
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
  "_source": false,
  "query": {
    "match": {
      "content": "история"
    }
  }
}'
{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.11506981,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.11506981
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 0.11506981
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 0.095891505
    } ]
  }
}

However, if we search for "истории" ("stories") in the content field, we will find nothing, because the index contains only the original words, not their stems. To get good search quality, we need to configure an analyzer.

The _score field shows relevance. If the request is executed in the filter context, the _score value is always 1, which means a full match with the filter.

Analyzers

Analyzers are needed to convert source text into a set of tokens. An analyzer consists of one Tokenizer and several optional TokenFilters. The Tokenizer may be preceded by several CharFilters. A Tokenizer splits the source string into tokens, for example on whitespace and punctuation. A TokenFilter can change tokens, remove them, or add new ones: for example, keep only the stem of a word, remove prepositions, or add synonyms. A CharFilter changes the whole source string, for example by stripping HTML tags.
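
The three stages can be imitated in a few lines of plain Python. This is only a toy model of the pipeline order (CharFilters, then the Tokenizer, then TokenFilters), not how ES or Lucene implement it; the function names and the tiny stop-word list are assumptions made for the example, and stemming is left out.

```python
import re

# Tiny illustrative stop-word list; not the _russian_ list that ES ships with.
STOP_WORDS = {"про", "с", "и", "в", "на"}


def html_strip(text):
    """CharFilter: rewrites the whole source string (here it drops HTML tags)."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: splits the string into word tokens on whitespace and punctuation."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """TokenFilter: changes tokens."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens):
    """TokenFilter: removes tokens."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, char_filters, tokenizer, token_filters):
    """Apply the stages in order: CharFilters, one Tokenizer, TokenFilters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>Смешная история про котят<p>",
    char_filters=[html_strip],
    tokenizer=tokenize,
    token_filters=[lowercase, remove_stop_words],
)
print(tokens)  # ['смешная', 'история', 'котят']
```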

ES ships with several standard analyzers, for example the russian analyzer.

Let's use the API and see how the standard and russian analyzers transform the string "Веселые истории про котят" ("Funny stories about kittens"):

# use the standard analyzer
# non-ASCII characters must be URL-encoded
curl -XGET "$ES_URL/_analyze?pretty&analyzer=standard&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
  "tokens" : [ {
    "token" : "вСсСлыС",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "истории",
    "start_offset" : 8,
    "end_offset" : 15,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "ΠΏΡ€ΠΎ",
    "start_offset" : 16,
    "end_offset" : 19,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "котят",
    "start_offset" : 20,
    "end_offset" : 25,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}

# use the russian analyzer
curl -XGET "$ES_URL/_analyze?pretty&analyzer=russian&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
  "tokens" : [ {
    "token" : "вСсСл",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "истор",
    "start_offset" : 8,
    "end_offset" : 15,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "ΠΊΠΎΡ‚",
    "start_offset" : 20,
    "end_offset" : 25,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}

The standard analyzer split the string on whitespace and lowercased everything. The russian analyzer removed the insignificant words, lowercased everything, and kept only the word stems.
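
The percent-encoded text parameter in the _analyze requests above does not have to be written by hand. A minimal sketch using Python's standard urllib.parse.quote is shown here; analyze_url is a hypothetical helper, and localhost:9200 is the same address as in ES_URL.

```python
from urllib.parse import quote

text = "Веселые истории про котят"

# quote() percent-encodes the UTF-8 bytes of the string; spaces become %20.
encoded = quote(text)
print(encoded)


def analyze_url(base_url, analyzer, text):
    """Build the _analyze URL in the same form as the curl calls above."""
    return f"{base_url}/_analyze?pretty&analyzer={quote(analyzer)}&text={quote(text)}"


print(analyze_url("localhost:9200", "russian", text))
```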

Let's see which Tokenizer, TokenFilters, and CharFilters the russian analyzer uses:

{
  "filter": {
    "russian_stop": {
      "type":       "stop",
      "stopwords":  "_russian_"
    },
    "russian_keywords": {
      "type":       "keyword_marker",
      "keywords":   []
    },
    "russian_stemmer": {
      "type":       "stemmer",
      "language":   "russian"
    }
  },
  "analyzer": {
    "russian": {
      "tokenizer":  "standard",
      /* TokenFilters */
      "filter": [
        "lowercase",
        "russian_stop",
        "russian_keywords",
        "russian_stemmer"
      ]
      /* no CharFilters */
    }
  }
}

Let's define our own analyzer based on russian that strips HTML tags. We will call it default, because an analyzer with this name is used by default.

{
  "filter": {
    "ru_stop": {
      "type":       "stop",
      "stopwords":  "_russian_"
    },
    "ru_stemmer": {
      "type":       "stemmer",
      "language":   "russian"
    }
  },
  "analyzer": {
    "default": {
      /* add HTML tag stripping */
      "char_filter": ["html_strip"],
      "tokenizer":  "standard",
      "filter": [
        "lowercase",
        "ru_stop",
        "ru_stemmer"
      ]
    }
  }
}

First, all HTML tags are removed from the source string; then the standard tokenizer splits it into tokens; the resulting tokens are lowercased; insignificant words are removed; and the remaining tokens are reduced to word stems.

Creating an index

Above we described the default analyzer. It applies to all string fields. Our post contains an array of tags, so the tags will be processed by the analyzer too. Since we look for posts by exact tag match, we need to disable analysis for the tags field.
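
To see why analysis gets in the way of exact tag matching, here is a toy inverted index in plain Python. It is an illustration only: analyze is a crude stand-in for an analyzer (split on non-word characters and lowercase), not the default analyzer defined above, and build_index is a made-up helper.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for an analyzer: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            # An analyzed field is indexed token by token;
            # a non-analyzed field is indexed as one exact term.
            terms = analyze(tag) if analyzed else [tag]
            for term in terms:
                index[term].add(doc_id)
    return index


docs = {
    1: ["котята", "смешная история"],
    2: ["щенки", "смешная история"],
    3: ["котята"],
}

analyzed_index = build_index(docs, analyzed=True)
exact_index = build_index(docs, analyzed=False)

print(sorted(analyzed_index.get("смешная история", set())))  # []
print(sorted(exact_index.get("смешная история", set())))     # [1, 2]
print(sorted(analyzed_index.get("история", set())))          # [1, 2]
```

With analysis enabled, the two-word tag no longer exists as a single term, so an exact lookup for it finds nothing, while a lookup for one of its words matches posts it should not.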

Let's create the blog2 index with this analyzer and a mapping in which analysis of the tags field is disabled:

curl -XPOST "$ES_URL/blog2" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "ru_stop": {
          "type": "stop",
          "stopwords": "_russian_"
        },
        "ru_stemmer": {
          "type": "stemmer",
          "language": "russian"
        }
      },
      "analyzer": {
        "default": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ru_stop",
            "ru_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "content": {
          "type": "string"
        },
        "published_at": {
          "type": "date"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "title": {
          "type": "string"
        }
      }
    }
  }
}'

Let's add the same 3 posts to this index (blog2). I will omit this step, since it is the same as adding documents to the blog index.
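
For readers who prefer a script to three curl calls, the omitted step could look roughly like the sketch below. It only builds the PUT requests and prints them as a dry run; nothing is sent. The base URL http://localhost:9200 and the helper build_put_request are assumptions for the example, and the Content-Type header is added for good measure.

```python
import json
from urllib.request import Request

# The same three posts that were indexed into blog earlier.
POSTS = {
    1: {
        "title": "Веселые котята",
        "content": "<p>Смешная история про котят<p>",
        "tags": ["котята", "смешная история"],
        "published_at": "2014-09-12T20:44:42+00:00",
    },
    2: {
        "title": "Веселые щенки",
        "content": "<p>Смешная история про щенков<p>",
        "tags": ["щенки", "смешная история"],
        "published_at": "2014-08-12T20:44:42+00:00",
    },
    3: {
        "title": "Как у меня появился котенок",
        "content": "<p>Душераздирающая история про бедного котенка с улицы<p>",
        "tags": ["котята"],
        "published_at": "2014-07-21T20:44:42+00:00",
    },
}


def build_put_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request for /<index>/<type>/<id>."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return Request(url, data=data, method="PUT",
                   headers={"Content-Type": "application/json"})


requests_to_send = [
    build_put_request("http://localhost:9200", "blog2", "post", doc_id, doc)
    for doc_id, doc in POSTS.items()
]

# Dry run: show what would be sent.
for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

To actually index the posts against a running ES node, each request can be passed to urllib.request.urlopen.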

Full-text search with expression support

Let's look at another type of query:

# find documents that contain the word 'истории'
# query -> simple_query_string -> query contains the search query
# the title field has priority 3
# the tags field has priority 2
# the content field has priority 1
# the priority is used when ranking the results
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
  "query": {
    "simple_query_string": {
      "query": "истории",
      "fields": [
        "title^3",
        "tags^2",
        "content"
      ]
    }
  }
}'

Since we use an analyzer with Russian stemming, this query returns all documents, even though they contain only the word 'история' ("story").

The query can contain special characters, for example:

"\"fried eggs\" +(eggplant | potato) -frittata"

Query syntax:

+ signifies AND operation
| signifies OR operation
- negates a single token
" wraps a number of tokens to signify a phrase for searching
* at the end of a term signifies a prefix query
( and ) signify precedence
~N after a word signifies edit distance (fuzziness)
~N after a phrase signifies slop amount

# find documents that do not contain the word 'щенки'
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
  "query": {
    "simple_query_string": {
      "query": "-Ρ‰Π΅Π½ΠΊΠΈ",
      "fields": [
        "title^3",
        "tags^2",
        "content"
      ]
    }
  }
}'

# returns the 2 posts about kittens
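
When such requests are generated from code, the field priorities can be kept in a mapping and turned into the field^weight notation used above. The following is a small sketch under that assumption; simple_query is a hypothetical helper that only builds and prints the request body.

```python
import json


def simple_query(query, boosts):
    """Build a simple_query_string body; a weight of 1 is written without a ^ suffix."""
    fields = [
        name if weight == 1 else f"{name}^{weight}"
        for name, weight in boosts.items()
    ]
    return {"query": {"simple_query_string": {"query": query, "fields": fields}}}


# A phrase combined with a negated word, using the operators listed above.
body = simple_query('"смешная история" -щенки', {"title": 3, "tags": 2, "content": 1})
print(json.dumps(body, ensure_ascii=False, indent=2))
```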

References

PS

If you are interested in similar tutorial articles, have ideas for new articles, or have a proposal for collaboration, I will be glad to hear from you by private message or email.

Source: www.habr.com