Elasticsearch is a search engine with a JSON REST API, built on Lucene and written in Java. A description of all the advantages of this engine is available on the official website. From here on we will refer to Elasticsearch as ES.
Engines like this are used for complex searches over a document database, for example search that takes the morphology of a language into account, or search by geographic coordinates.
In this article I will cover the basics of ES using the indexing of blog posts as an example. I will show how to filter, sort and search documents.
To avoid depending on the operating system, I will make all requests to ES with curl. There is also a plugin for Google Chrome for this purpose.
The text contains links to documentation and other sources. At the end there are links for quick access to the documentation. Definitions of unfamiliar terms can be found in the glossary.
Installing ES
To do this, we first need Java. The developers recommend installing Java versions newer than Java 8 update 20 or Java 7 update 55.
The ES distribution is available from the developer's site. After unpacking the archive, run bin/elasticsearch. Packages and a Docker image are also available.
After installing and starting ES, let's check that it works:
# for convenience, store the address in a variable
#export ES_URL=$(docker-machine ip dev):9200
export ES_URL=localhost:9200
curl -X GET $ES_URL
We will get something like this:
{
"name" : "Heimdall",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.1",
"build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
"build_timestamp" : "2016-03-09T09:38:54Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
Indexing
Let's add a post to ES:
# Add a document with id 1 of type post to the blog index.
# ?pretty asks for human-readable output.
curl -XPUT "$ES_URL/blog/post/1?pretty" -d'
{
"title": "Веселые котята",
"content": "<p>Смешная история про котят<p>",
"tags": [
"котята",
"смешная история"
],
"published_at": "2014-09-12T20:44:42+00:00"
}'
Server response:
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : false
}
The blog index and the post type were created automatically. A rough analogy: an index is a database, and a type is a table in that database. Each type has its own schema, a mapping, just like a relational table. The mapping is generated automatically when a document is indexed:
# Get the mapping of all types in the blog index
curl -XGET "$ES_URL/blog/_mapping?pretty"
In the server response I have added the field values of the indexed document as comments:
{
"blog" : {
"mappings" : {
"post" : {
"properties" : {
/* "content": "<p>Смешная история про котят<p>", */
"content" : {
"type" : "string"
},
/* "published_at": "2014-09-12T20:44:42+00:00" */
"published_at" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
/* "tags": ["котята", "смешная история"] */
"tags" : {
"type" : "string"
},
/* "title": "Веселые котята" */
"title" : {
"type" : "string"
}
}
}
}
}
}
It is worth noting that ES does not distinguish between a single value and an array of values. For example, the title field holds a single title and the tags field holds an array of strings, yet both are represented the same way in the mapping.
We will talk more about mappings later.
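To make the point about single values and arrays concrete, here is a rough Python sketch of dynamic mapping. It is not how ES actually infers types: `infer_field_type` and `infer_mapping` are hypothetical helpers that only know strings and ISO 8601 dates, and they leave out the date format shown in the real response.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess a mapping type for one value (simplified, not ES's real logic)."""
    # An array is mapped by the type of its elements, so a list of strings
    # and a single string both end up as "string".
    if isinstance(value, list) and value:
        value = value[0]
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(document):
    """Build a {field: {"type": ...}} mapping for a flat document."""
    return {field: {"type": infer_field_type(value)}
            for field, value in document.items()}


post = {
    "title": "Веселые котята",
    "content": "<p>Смешная история про котят<p>",
    "tags": ["котята", "смешная история"],
    "published_at": "2014-09-12T20:44:42+00:00",
}

mapping = infer_mapping(post)
print(mapping)
```

In this sketch title and tags get the same entry, which mirrors what the mapping returned by ES shows.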
Queries
Retrieving a document by its id:
# retrieve the document with id 1 of type post from the blog index
curl -XGET "$ES_URL/blog/post/1?pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
}
New keys appeared in the response: _version and _source. In general, all keys starting with _ are internal.
The _version key shows the version of the document. It is needed for the optimistic locking mechanism. For example, we want to change a document that has version 1. We submit the changed document and state that this is an edit of the document with version 1. If someone else has also edited the document with version 1 and submitted their changes before us, ES will not accept our changes, because it now stores the document with version 2.
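The scenario above can be sketched with a small in-memory store in Python. This is only an illustration of the optimistic locking idea, not the ES API: `VersionedStore`, `VersionConflict` and the `expected_version` parameter are names I made up for the example.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale document version."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # document id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        """Store a document; reject the write if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, "
                f"but the stored version is {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version

    def get(self, doc_id):
        return self._docs[doc_id]


store = VersionedStore()
store.index("1", {"title": "Веселые котята"})  # creates version 1

# Someone else edits version 1 first, so the stored version becomes 2.
store.index("1", {"title": "edit by someone else"}, expected_version=1)

# Our edit is also based on version 1 and is therefore rejected.
conflict_message = None
try:
    store.index("1", {"title": "our edit"}, expected_version=1)
except VersionConflict as error:
    conflict_message = str(error)

print(conflict_message)
```

The rejected writer has to re-read the document, reapply the change on top of version 2 and try again.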
The _source key contains the document we indexed. ES does not use this value for search operations, because indexes are used for searching. To save space, ES stores the source document in compressed form. If we only need the id and not the whole source document, we can disable storing the source.
If we do not need the extra information, we can get only the contents of _source:
curl -XGET "$ES_URL/blog/post/1/_source?pretty"
{
"title" : "Веселые котята",
"content" : "<p>Смешная история про котят<p>",
"tags" : [ "котята", "смешная история" ],
"published_at" : "2014-09-12T20:44:42+00:00"
}
You can also select only specific fields:
# retrieve only the title field
curl -XGET "$ES_URL/blog/post/1?_source=title&pretty"
{
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "Веселые котята"
}
}
Let's index a few more posts and run more complex queries.
curl -XPUT "$ES_URL/blog/post/2" -d'
{
"title": "Веселые щенки",
"content": "<p>Смешная история про щенков<p>",
"tags": [
"щенки",
"смешная история"
],
"published_at": "2014-08-12T20:44:42+00:00"
}'
curl -XPUT "$ES_URL/blog/post/3" -d'
{
"title": "Как у меня появился котенок",
"content": "<p>Душераздирающая история про бедного котенка с улицы<p>",
"tags": [
"котята"
],
"published_at": "2014-07-21T20:44:42+00:00"
}'
Sorting
# find the latest post by publication date and retrieve the title and published_at fields
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"size": 1,
"_source": ["title", "published_at"],
"sort": [{"published_at": "desc"}]
}'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : null,
"_source" : {
"title" : "Веселые котята",
"published_at" : "2014-09-12T20:44:42+00:00"
},
"sort" : [ 1410554682000 ]
} ]
}
}
We selected the latest post. size limits the number of documents returned. total shows the total number of documents matching the query. sort in the output contains the array of integers used for sorting, i.e. the date was converted to an integer. More on sorting can be found in the documentation.
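The integer in sort is the publication date expressed as milliseconds since the Unix epoch. A short Python sketch, using the three posts indexed above, reproduces both the sort order and that number. The `epoch_millis` helper is my own name, and the sorting here is plain Python, not what ES does internally.

```python
from datetime import datetime

posts = [
    {"id": "1", "title": "Веселые котята",
     "published_at": "2014-09-12T20:44:42+00:00"},
    {"id": "2", "title": "Веселые щенки",
     "published_at": "2014-08-12T20:44:42+00:00"},
    {"id": "3", "title": "Как у меня появился котенок",
     "published_at": "2014-07-21T20:44:42+00:00"},
]


def epoch_millis(iso_timestamp):
    """Convert an ISO 8601 timestamp with an offset to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_timestamp).timestamp() * 1000)


# Equivalent of "sort": [{"published_at": "desc"}] with "size": 1
size = 1
hits = sorted(posts, key=lambda post: epoch_millis(post["published_at"]),
              reverse=True)[:size]

latest = hits[0]
sort_value = epoch_millis(latest["published_at"])
print(latest["id"], sort_value)  # 1 1410554682000
```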
Filters and queries
Since version 2, ES no longer distinguishes between filters and queries; instead, it introduces the concept of contexts.
Query context differs from filter context in that a query produces a _score and is not cached. I will show what _score is later.
Filtering by date
We use the range query in filter context:
# get the posts published on September 1st or later
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"filter": {
"range": {
"published_at": { "gte": "2014-09-01" }
}
}
}'
Filtering by tags
We use the term query to find the ids of documents that contain a given word:
# find all documents whose tags field contains the element 'котята'
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": [
"title",
"tags"
],
"filter": {
"term": {
"tags": "котята"
}
}
}'
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "Веселые котята",
"tags" : [ "котята", "смешная история" ]
}
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "Как у меня появился котенок",
"tags" : [ "котята" ]
}
} ]
}
}
Full-text search
Three of our documents contain the following in the content field:
<p>Смешная история про котят<p>
<p>Смешная история про щенков<p>
<p>Душераздирающая история про бедного котенка с улицы<p>
We use the match query to find the ids of documents that contain a given word:
# source: false means the _source of the found documents does not need to be retrieved
curl -XGET "$ES_URL/blog/post/_search?pretty" -d'
{
"_source": false,
"query": {
"match": {
"content": "история"
}
}
}'
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.11506981,
"hits" : [ {
"_index" : "blog",
"_type" : "post",
"_id" : "2",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "1",
"_score" : 0.11506981
}, {
"_index" : "blog",
"_type" : "post",
"_id" : "3",
"_score" : 0.095891505
} ]
}
}
However, if we search for "истории" in the content field, we will find nothing, because the index contains only the original words, not their stems. To get high-quality search, we need to configure an analyzer.
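Why the other word form finds nothing becomes clearer with a toy inverted index. The Python sketch below is a simplification under my own assumptions: `tokenize` only lowercases and splits on non-word characters, which roughly imitates indexing without stemming, and `search` is an exact term lookup.

```python
import re
from collections import defaultdict

contents = {
    "1": "<p>Смешная история про котят<p>",
    "2": "<p>Смешная история про щенков<p>",
    "3": "<p>Душераздирающая история про бедного котенка с улицы<p>",
}


def tokenize(text):
    """Lowercase the text and split it into words, without any stemming."""
    return re.findall(r"\w+", text.lower())


# term -> set of ids of the documents that contain this exact term
inverted_index = defaultdict(set)
for doc_id, text in contents.items():
    for token in tokenize(text):
        inverted_index[token].add(doc_id)


def search(word):
    """Return the ids of documents containing exactly this word form."""
    return sorted(inverted_index.get(word.lower(), set()))


print(search("история"))  # ['1', '2', '3']
print(search("истории"))  # []
```

Only the word forms that were actually indexed are present as keys, so a different grammatical form has nothing to match.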
The _score field shows relevance. If the query is executed in filter context, the _score value is always 1, which means a complete match with the filter.
Analyzers
Analyzers are needed to convert the source text into a set of tokens.
An analyzer consists of one Tokenizer and several optional TokenFilters. The Tokenizer may be preceded by several CharFilters. Tokenizers split the source string into tokens, for example on spaces and punctuation. A TokenFilter can change tokens, remove them or add new ones, for example keep only the stem of a word, remove prepositions, or add synonyms. A CharFilter changes the source string as a whole, for example by stripping html tags.
ES has several standard analyzers, for example the russian analyzer.
Let's use the API and see how the standard and russian analyzers transform the string "Веселые истории про котят":
# use the standard analyzer
# non-ASCII characters must be percent-encoded
curl -XGET "$ES_URL/_analyze?pretty&analyzer=standard&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "веселые",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истории",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "про",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "котят",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
# use the russian analyzer
curl -XGET "$ES_URL/_analyze?pretty&analyzer=russian&text=%D0%92%D0%B5%D1%81%D0%B5%D0%BB%D1%8B%D0%B5%20%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D0%B8%20%D0%BF%D1%80%D0%BE%20%D0%BA%D0%BE%D1%82%D1%8F%D1%82"
{
"tokens" : [ {
"token" : "весел",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "истор",
"start_offset" : 8,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "кот",
"start_offset" : 20,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 3
} ]
}
The standard analyzer split the string on spaces and converted everything to lowercase. The russian analyzer removed the insignificant words, converted everything to lowercase and kept only the stems of the words.
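The percent-encoded text parameter in the two _analyze calls does not have to be typed by hand. As a small sketch, Python's standard urllib.parse.quote produces it; the localhost:9200 address is the same assumption as in ES_URL.

```python
from urllib.parse import quote, unquote

text = "Веселые истории про котят"

# quote() percent-encodes the UTF-8 bytes; spaces become %20
encoded = quote(text)

analyze_url = (
    "localhost:9200/_analyze?pretty&analyzer=russian&text=" + encoded
)
print(analyze_url)

# Decoding gives back the original string
print(unquote(encoded))
```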
Let's see which Tokenizer, TokenFilters and CharFilters the russian analyzer uses:
{
"filter": {
"russian_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"russian_keywords": {
"type": "keyword_marker",
"keywords": []
},
"russian_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"russian": {
"tokenizer": "standard",
/* TokenFilters */
"filter": [
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer"
]
/* no CharFilters */
}
}
}
Let's define our own analyzer based on russian, one that also strips html tags. We will call it default, because the analyzer with that name is used by default.
{
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
/* add removal of html tags */
"char_filter": ["html_strip"],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
First, all HTML tags are removed from the source string, then the standard tokenizer splits it into tokens, the resulting tokens are converted to lowercase, the insignificant words are removed, and the remaining tokens are reduced to the stems of the words.
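The order of these stages can be imitated in a few lines of Python. This is a toy model under loud assumptions: the stop list is a tiny subset I picked, not the _russian_ list, and the suffix rules are invented for this example, not the real Russian stemmer. Only the order of the stages follows the analyzer definition.

```python
import re

STOP_WORDS = {"про", "и", "в", "на", "с"}   # illustrative subset only
SUFFIXES = ("ая", "ые", "ии", "ия", "ят")   # toy rules, not a real stemmer


def html_strip(text):
    """CharFilter stage: replace html tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def standard_tokenizer(text):
    """Tokenizer stage: split the string into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [token.lower() for token in tokens]


def stop(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


def stem(tokens):
    """Cut one known suffix off each token, keeping at least 3 letters."""
    stemmed = []
    for token in tokens:
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed


def analyze(text):
    """char_filter -> tokenizer -> lowercase -> stop -> stemmer."""
    return stem(stop(lowercase(standard_tokenizer(html_strip(text)))))


print(analyze("<p>Веселые истории про котят<p>"))  # ['весел', 'истор', 'кот']
```

For this one sentence the toy rules happen to give the same tokens as the russian analyzer output shown earlier; on other input they will diverge.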
Creating an index
Above we defined the default analyzer. It will be applied to all string fields. Our post contains an array of tags, so the tags will be processed by the analyzer as well. Since we look for posts by exact match with a tag, we need to disable analysis for the tags field.
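A small Python sketch shows what goes wrong if the tags are analyzed. The two helpers are hypothetical stand-ins for how terms end up in the index: `analyzed_terms` only lowercases and splits (the real default analyzer would also stem), and `not_analyzed_terms` keeps the value as a single term.

```python
import re


def analyzed_terms(value):
    """Analyzed field: the value is broken into separate lowercase words."""
    return set(re.findall(r"\w+", value.lower()))


def not_analyzed_terms(value):
    """Not analyzed field: the whole value is indexed as one term."""
    return {value}


tags = ["котята", "смешная история"]

analyzed = set().union(*(analyzed_terms(tag) for tag in tags))
exact = set().union(*(not_analyzed_terms(tag) for tag in tags))

# An exact lookup of the full tag only works on the not analyzed field
print("смешная история" in analyzed)  # False
print("смешная история" in exact)     # True
```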
Let's create the index blog2 with the analyzer and a mapping in which analysis of the tags field is disabled:
curl -XPOST "$ES_URL/blog2" -d'
{
"settings": {
"analysis": {
"filter": {
"ru_stop": {
"type": "stop",
"stopwords": "_russian_"
},
"ru_stemmer": {
"type": "stemmer",
"language": "russian"
}
},
"analyzer": {
"default": {
"char_filter": [
"html_strip"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"ru_stop",
"ru_stemmer"
]
}
}
}
},
"mappings": {
"post": {
"properties": {
"content": {
"type": "string"
},
"published_at": {
"type": "date"
},
"tags": {
"type": "string",
"index": "not_analyzed"
},
"title": {
"type": "string"
}
}
}
}
}'
Let's add the same 3 posts to this index (blog2). I will omit this step, since it is the same as adding documents to the blog index.
Full-text search with expression support
Let's look at another type of query:
# find the documents that contain the word 'истории'
# query -> simple_query_string -> query holds the search query
# the title field has priority 3
# the tags field has priority 2
# the content field has priority 1
# priority is used when ranking the results
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "истории",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
Since we use an analyzer with Russian stemming, this query returns all the documents, even though they contain only the word 'история'.
The query can contain special characters, for example:
""fried eggs" +(eggplant | potato) -frittata"
Query syntax:
+ signifies AND operation
| signifies OR operation
- negates a single token
" wraps a number of tokens to signify a phrase for searching
* at the end of a term signifies a prefix query
( and ) signify precedence
~N after a word signifies edit distance (fuzziness)
~N after a phrase signifies slop amount
# find the documents without the word 'щенки'
curl -XPOST "$ES_URL/blog2/post/_search?pretty" -d'
{
"query": {
"simple_query_string": {
"query": "-щенки",
"fields": [
"title^3",
"tags^2",
"content"
]
}
}
}'
# we get 2 posts about kittens
PS
If you are interested in tutorial articles like this one, have ideas for new articles or proposals for cooperation, I will be glad to receive a message via private message or by email at m.kuzmin+habr@darkleaf.ru.
Source: www.habr.com
