ELK搜索高级课程

第十一章 索引Index入门

一、索引管理

1、为什么我们要手动创建索引

直接put数据 PUT index/_doc/1,es会自动生成索引,并建立动态映射dynamic mapping。

在生产上,我们需要自己手动建立索引和映射,为了更好地管理索引。就像数据库的建表语句一样。

2、索引管理

(1)创建索引

创建索引的语法

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
PUT /index
{
"settings": { ... any settings ... },
"mappings": {
"properties" : {
"field1" : { "type" : "text" }
}
},
"aliases": {
"default_index": {}
}
}

举例:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"field1":{
"type": "text"
},
"field2":{
"type": "text"
}
}
},
"aliases": {
"default_index": {}
}
}

索引别名

插入数据

From: 元动力
1
2
3
4
5
POST /my_index/_doc/1
{
"field1":"java",
"field2":"js"
}

查询数据 都可以查到

GET /my_index/_doc/1

GET /default_index/_doc/1

(2)查询索引

GET /my_index/_mapping

GET /my_index/_setting

(3)修改索引

修改副本数

From: 元动力
1
2
3
4
5
6
PUT /my_index/_settings
{
"index" : {
"number_of_replicas" : 2
}
}
(4)删除索引

DELETE /my_index

DELETE /index_one,index_two

DELETE /index_*

DELETE /_all

为了安全起见,防止恶意删除索引,删除时必须指定索引名:

elasticsearch.yml

action.destructive_requires_name: true

二、定制分词器

1、默认的分词器

standard

分词三个组件,character filter,tokenizer,token filter

standard tokenizer:以单词边界进行切分

standard token filter:什么都不做

lowercase token filter:将所有字母转换为小写

stop token filer(默认被禁用):移除停用词,比如a the it等等

2、修改分词器的设置

启用english停用词token filter

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}

测试分词

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}

GET /my_index/_analyze
{
"analyzer": "es_std",
"text":"a dog is in the house"
}

3、定制化自己的分词器

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": ["&=> and"]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": ["the", "a"]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["html_strip", "&_to_and"],
"tokenizer": "standard",
"filter": ["lowercase", "my_stopwords"]
}
}
}
}
}

测试

From: 元动力
1
2
3
4
5
GET /my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "tom&jerry are a friend in the house, <a>, HAHA!!"
}

设置字段使用自定义分词器

From: 元动力
1
2
3
4
5
6
7
8
9
PUT /my_index/_mapping/
{
"properties": {
"content": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}

三、type底层结构及弃用原因

1、type是什么

type,是一个index中用来区分类似的数据的,类似的数据,但是可能有不同的fields,而且有不同的属性来控制索引建立、分词器. field的value,在底层的lucene中建立索引的时候,全部是opaque bytes类型,不区分类型的。 lucene是没有type的概念的,在document中,实际上将type作为一个document的field来存储,即_type,es通过_type来进行type的过滤和筛选。

2、es中不同type存储机制

一个index中的多个type,实际上是放在一起存储的,因此一个index下,不能有多个type重名,而类型或者其他设置不同的,因为那样是无法处理的

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
{
"goods": {
"mappings": {
"electronic_goods": {
"properties": {
"name": {
"type": "string",
},
"price": {
"type": "double"
},
"service_period": {
"type": "string"
}
}
},
"fresh_goods": {
"properties": {
"name": {
"type": "string",
},
"price": {
"type": "double"
},
"eat_period": {
"type": "string"
}
}
}
}
}
}
From: 元动力
1
2
3
4
5
6
PUT /goods/electronic_goods/1
{
"name": "小米空调",
"price": 1999.0,
"service_period": "one year"
}
From: 元动力
1
2
3
4
5
6
PUT /goods/fresh_goods/1
{
"name": "澳洲龙虾",
"price": 199.0,
"eat_period": "one week"
}

es文档在底层的存储是这样子的

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
{
"goods": {
"mappings": {
"_type": {
"type": "string",
"index": "false"
},
"name": {
"type": "string"
}
"price": {
"type": "double"
}
"service_period": {
"type": "string"
},
"eat_period": {
"type": "string"
}
}
}
}

底层数据存储格式

From: 元动力
1
2
3
4
5
6
7
{
"_type": "electronic_goods",
"name": "小米空调",
"price": 1999.0,
"service_period": "one year",
"eat_period": ""
}
From: 元动力
1
2
3
4
5
6
7
{
"_type": "fresh_goods",
"name": "澳洲龙虾",
"price": 199.0,
"service_period": "",
"eat_period": "one week"
}

3、type弃用

同一索引下,不同type的数据存储其他type的field 大量空值,造成资源浪费。

所以,不同类型数据,要放到不同的索引中。

es9中,将会彻底删除type。

四、定制dynamic mapping

1、定制dynamic策略

true:遇到陌生字段,就进行dynamic mapping

false:新检测到的字段将被忽略。这些字段将不会被索引,因此将无法搜索,但仍将出现在返回点击的源字段中。这些字段不会添加到映射中,必须显式添加新字段。

strict:遇到陌生字段,就报错

创建mapping

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
PUT /my_index
{
"mappings": {
"dynamic": "strict",
"properties": {
"title": {
"type": "text"
},
"address": {
"type": "object",
"dynamic": "true"
}
}
}
}

插入数据

From: 元动力
1
2
3
4
5
6
7
8
9
PUT /my_index/_doc/1
{
"title": "my article",
"content": "this is my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
}
}

报错

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
{
"error": {
"root_cause": [
{
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
}
],
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
},
"status": 400
}

2、自定义 dynamic mapping策略

es会根据传入的值,推断类型。

1569123478530
1569123478530

date_detection 日期探测

默认会按照一定格式识别date,比如yyyy-MM-dd。但是如果某个field先过来一个2017-01-01的值,就会被自动dynamic mapping成date,后面如果再来一个"hello world"之类的值,就会报错。可以手动关闭某个type的date_detection,如果有需要,自己手动指定某个field为date类型。

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
PUT /my_index
{
"mappings": {
"date_detection": false,
"properties": {
"title": {
"type": "text"
},
"address": {
"type": "object",
"dynamic": "true"
}
}
}
}

测试

From: 元动力
1
2
3
4
5
6
7
8
9
10
PUT /my_index/_doc/1
{
"title": "my article",
"content": "this is my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
},
"post_date":"2019-09-10"
}

查看映射

From: 元动力
1
GET /my_index/_mapping

自定义日期格式

From: 元动力
1
2
3
4
5
6
PUT my_index
{
"mappings": {
"dynamic_date_formats": ["MM/dd/yyyy"]
}
}

插入数据

From: 元动力
1
2
3
4
PUT my_index/_doc/1
{
"create_date": "09/25/2019"
}

numeric_detection 数字探测

虽然json支持本机浮点和整数数据类型,但某些应用程序或语言有时可能将数字呈现为字符串。通常正确的解决方案是显式地映射这些字段,但是可以启用数字检测(默认情况下禁用)来自动完成这些操作。

From: 元动力
1
2
3
4
5
6
PUT my_index
{
"mappings": {
"numeric_detection": true
}
}
From: 元动力
1
2
3
4
5
PUT my_index/_doc/1
{
"my_float": "1.0",
"my_integer": "1"
}

3、定制自己的dynamic mapping template

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
PUT /my_index
{
"mappings": {
"dynamic_templates": [
{
"en": {
"match": "*_en",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "english"
}
}
}
]
}
}

插入数据

From: 元动力
1
2
3
4
5
6
7
8
9
PUT /my_index/_doc/1
{
"title": "this is my first article"
}

PUT /my_index/_doc/2
{
"title_en": "this is my first article"
}

搜索

From: 元动力
1
2
GET my_index/_search?q=first
GET my_index/_search?q=is

title没有匹配到任何的dynamic模板,默认就是standard分词器,不会过滤停用词,is会进入倒排索引,用is来搜索是可以搜索到的

title_en匹配到了dynamic模板,就是english分词器,会过滤停用词,is这种停用词就会被过滤掉,用is来搜索就搜索不到了

模板写法

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"integers": {
"match_mapping_type": "long",
"mapping": {
"type": "integer"
}
}
},
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
]
}
}

模板参数

From: 元动力
1
2
3
4
5
"match":   "long_*",
"unmatch": "*_text",
"match_mapping_type": "string",
"path_match": "name.*",
"path_unmatch": "*.middle",
From: 元动力
1
2
"match_pattern": "regex",
"match": "^profit_\d+$"

场景

1结构化搜索

默认情况下,elasticsearch将字符串字段映射为带有子关键字字段的文本字段。但是,如果只对结构化内容进行索引,而对全文搜索不感兴趣,则可以仅将“字段”映射为“关键字”。请注意,这意味着为了搜索这些字段,必须搜索索引所用的完全相同的值。

From: 元动力
1
2
3
4
5
6
7
8
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}

2仅搜索

与前面的示例相反,如果您只关心字符串字段的全文搜索,并且不打算对字符串字段运行聚合、排序或精确搜索,您可以告诉弹性搜索将其仅映射为文本字段(这是5之前的默认行为)

From: 元动力
1
2
3
4
5
6
7
8
{
"strings_as_text": {
"match_mapping_type": "string",
"mapping": {
"type": "text"
}
}
}

3norms 不关心评分

norms是指标时间的评分因素。如果您不关心评分,例如,如果您从不按评分对文档进行排序,则可以在索引中禁用这些评分因子的存储并节省一些空间。

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"norms": false,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}

五、零停机重建索引

1、零停机重建索引

场景:

一个field的设置是不能被修改的,如果要修改一个Field,那么应该重新按照新的mapping,建立一个index,然后将数据批量查询出来,重新用bulk api写入index中。

批量查询的时候,建议采用scroll api,并且采用多线程并发的方式来reindex数据,每次scoll就查询指定日期的一段数据,交给一个线程即可。

(1)一开始,依靠dynamic mapping,插入数据,但是不小心有些数据是2019-09-10这种日期格式的,所以title这种field被自动映射为了date类型,实际上它应该是string类型的

From: 元动力
1
2
3
4
5
6
7
8
9
PUT /my_index/_doc/1
{
"title": "2019-09-10"
}

PUT /my_index/_doc/2
{
"title": "2019-09-11"
}

(2)当后期向索引中加入string类型的title值的时候,就会报错

From: 元动力
1
2
3
4
PUT /my_index/_doc/3
{
"title": "my first article"
}

报错

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Invalid format: \"my first article\""
}
},
"status": 400
}

(3)如果此时想修改title的类型,是不可能的

From: 元动力
1
2
3
4
5
6
7
8
PUT /my_index/_mapping
{
"properties": {
"title": {
"type": "text"
}
}
}

报错

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [title] of different type, current_type [date], merged_type [text]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [title] of different type, current_type [date], merged_type [text]"
},
"status": 400
}

(4)此时,唯一的办法,就是进行reindex,也就是说,重新建立一个索引,将旧索引的数据查询出来,再导入新索引。

(5)如果说旧索引的名字,是old_index,新索引的名字是new_index,终端java应用,已经在使用old_index在操作了,难道还要去停止java应用,修改使用的index为new_index,才重新启动java应用吗?这个过程中,就会导致java应用停机,可用性降低。

(6)所以说,给java应用一个别名,这个别名是指向旧索引的,java应用先用着,java应用先用prod_index alias来操作,此时实际指向的是旧的my_index

From: 元动力
1
PUT /my_index/_alias/prod_index

(7)新建一个index,调整其title的类型为string

From: 元动力
1
2
3
4
5
6
7
8
9
10
PUT /my_index_new
{
"mappings": {
"properties": {
"title": {
"type": "text"
}
}
}
}

(8)使用scroll api将数据批量查询出来

From: 元动力
1
2
3
4
5
6
7
GET /my_index/_search?scroll=1m
{
"query": {
"match_all": {}
},
"size": 1
}

返回

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAADpAFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAA6QRY0b25zVFlWWlRqR3ZJajlfc3BXejJ3AAAAAAAAOkIWNG9uc1RZVlpUakd2SWo5X3NwV3oydwAAAAAAADpDFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAA6RBY0b25zVFlWWlRqR3ZJajlfc3BXejJ3",
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": null,
"_source": {
"title": "2019-01-02"
},
"sort": [
0
]
}
]
}
}

(9)采用bulk api将scoll查出来的一批数据,批量写入新索引

From: 元动力
1
2
3
POST /_bulk
{ "index": { "_index": "my_index_new", "_id": "1" }}
{ "title": "2019-09-10" }

(10)反复循环8~9,查询一批又一批的数据出来,采取bulk api将每一批数据批量写入新索引

(11)将prod_index alias切换到my_index_new上去,java应用会直接通过index别名使用新的索引中的数据,java应用程序不需要停机,零提交,高可用

From: 元动力
1
2
3
4
5
6
7
POST /_aliases
{
"actions": [
{ "remove": { "index": "my_index", "alias": "prod_index" }},
{ "add": { "index": "my_index_new", "alias": "prod_index" }}
]
}

(12)直接通过prod_index别名来查询,是否ok

From: 元动力
1
GET /prod_index/_search

2、生产实践:基于alias对client透明切换index

From: 元动力
1
PUT /my_index_v1/_alias/my_index

client对my_index进行操作

reindex操作,完成之后,切换v1到v2

From: 元动力
1
2
3
4
5
6
7
POST /_aliases
{
"actions": [
{ "remove": { "index": "my_index_v1", "alias": "my_index" }},
{ "add": { "index": "my_index_v2", "alias": "my_index" }}
]
}

第十二章 中文分词器 IK分词器

一、Ik分词器安装使用

1、中文分词器

standard 分词器,仅适用于英文。

From: 元动力
1
2
3
4
5
GET /_analyze
{
"analyzer": "standard",
"text": "中华人民共和国人民大会堂"
}

我们想要的效果是什么:中华人民共和国,人民大会堂

IK分词器就是目前最流行的es中文分词器

2、安装

官网:https://github.com/medcl/elasticsearch-analysis-ikopen in new window

下载地址:https://github.com/medcl/elasticsearch-analysis-ik/releasesopen in new window

根据es版本下载相应版本包。

解压到 es/plugins/ik中。

重启es

3、ik分词器基础知识

ik_max_word: 会将文本做最细粒度的拆分,比如会将“中华人民共和国人民大会堂”拆分为“中华人民共和国,中华人民,中华,华人,人民共和国,人民大会堂,人民大会,大会堂”,会穷尽各种可能的组合;

ik_smart: 会做最粗粒度的拆分,比如会将“中华人民共和国人民大会堂”拆分为“中华人民共和国,人民大会堂”。

4、ik分词器的使用

存储时,使用ik_max_word,搜索时,使用ik_smart

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
PUT /my_index 
{
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}
}

搜索

From: 元动力
1
GET /my_index/_search?q=中华人民共和国人民大会堂

二、 ik配置文件

1、 ik配置文件

ik配置文件地址:es/plugins/ik/config目录

IKAnalyzer.cfg.xml:用来配置自定义词库

main.dic:ik原生内置的中文词库,总共有27万多条,只要是这些单词,都会被分在一起

preposition.dic: 介词

quantifier.dic:放了一些单位相关的词,量词

suffix.dic:放了一些后缀

surname.dic:中国的姓氏

stopword.dic:英文停用词

ik原生最重要的两个配置文件

main.dic:包含了原生的中文词语,会按照这个里面的词语去分词

stopword.dic:包含了英文的停用词

停用词,stopword

a the and at but

一般,像停用词,会在分词的时候,直接被干掉,不会建立在倒排索引中

2、自定义词库

(1)自己建立词库:每年都会涌现一些特殊的流行词,网红,蓝瘦香菇,喊麦,鬼畜,一般不会在ik的原生词典里

自己补充自己的最新的词语,到ik的词库里面

IKAnalyzer.cfg.xml:ext_dict,创建mydict.dic。

补充自己的词语,然后需要重启es,才能生效

(2)自己建立停用词库:比如了,的,啥,么,我们可能并不想去建立索引,让人家搜索

custom/ext_stopword.dic,已经有了常用的中文停用词,可以补充自己的停用词,然后重启es

三、使用mysql热更新 词库

1、热更新

每次都是在es的扩展词典中,手动添加新词语,很坑

(1)每次添加完,都要重启es才能生效,非常麻烦

(2)es是分布式的,可能有数百个节点,你不能每次都一个一个节点上面去修改

es不停机,直接我们在外部某个地方添加新的词语,es中立即热加载到这些新词语

热更新的方案

(1)基于ik分词器原生支持的热更新方案,部署一个web服务器,提供一个http接口,通过modified和tag两个http响应头,来提供词语的热更新

(2)修改ik分词器源码,然后手动支持从mysql中每隔一定时间,自动加载新的词库

用第二种方案,第一种,ik git社区官方都不建议采用,觉得不太稳定

2、步骤

1、下载源码

https://github.com/medcl/elasticsearch-analysis-ik/releasesopen in new window

ik分词器,是个标准的java maven工程,直接导入eclipse就可以看到源码

2、修改源

org.wltea.analyzer.dic.Dictionary类,160行Dictionary单例类的初始化方法,在这里需要创建一个我们自定义的线程,并且启动它

org.wltea.analyzer.dic.HotDictReloadThread类:就是死循环,不断调用Dictionary.getSingleton().reLoadMainDict(),去重新加载词典

Dictionary类,399行:this.loadMySQLExtDict(); 加载mymsql字典。

Dictionary类,609行:this.loadMySQLStopwordDict();加载mysql停用词

config下jdbc-reload.properties。mysql配置文件

3、mvn package打包代码

target\releases\elasticsearch-analysis-ik-7.3.0.zip

4、解压缩ik压缩包

将mysql驱动jar,放入ik的目录下

5、修改jdbc相关配置

6、重启es

观察日志,日志中就会显示我们打印的那些东西,比如加载了什么配置,加载了什么词语,什么停用词

7、在mysql中添加词库与停用词

8、分词实验,验证热更新生效

From: 元动力
1
2
3
4
5
GET /_analyze
{
"analyzer": "ik_smart",
"text": "传智播客"
}

第十三章 java api 实现索引管理

代码

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestIndex {

@Autowired
RestHighLevelClient client;

// @Autowired
// RestClient restClient;
//创建索引
@Test
public void testCreateIndex() throws IOException {
//创建索引对象
CreateIndexRequest createIndexRequest = new CreateIndexRequest("ydlclass_book");
//设置参数
createIndexRequest.settings(Settings.builder().put("number_of_shards", "1").put("number_of_replicas", "0"));
//指定映射1
createIndexRequest.mapping(" {\n" +
" \t\"properties\": {\n" +
" \"name\":{\n" +
" \"type\":\"keyword\"\n" +
" },\n" +
" \"description\": {\n" +
" \"type\": \"text\"\n" +
" },\n" +
" \"price\":{\n" +
" \"type\":\"long\"\n" +
" },\n" +
" \"pic\":{\n" +
" \"type\":\"text\",\n" +
" \"index\":false\n" +
" }\n" +
" \t}\n" +
"}", XContentType.JSON);

//指定映射2


// Map<String, Object> message = new HashMap<>();
// message.put("type", "text");
// Map<String, Object> properties = new HashMap<>();
// properties.put("message", message);
// Map<String, Object> mapping = new HashMap<>();
// mapping.put("properties", properties);
// createIndexRequest.mapping(mapping);

//指定映射3


// XContentBuilder builder = XContentFactory.jsonBuilder();
// builder.startObject();
// {
// builder.startObject("properties");
// {
// builder.startObject("message");
// {
// builder.field("type", "text");
// }
// builder.endObject();
// }
// builder.endObject();
// }
// builder.endObject();
// createIndexRequest.mapping(builder);


//设置别名
createIndexRequest.alias(new Alias("ydlclass_index_new"));

// 额外参数
//设置超时时间
createIndexRequest.setTimeout(TimeValue.timeValueMinutes(2));
//设置主节点超时时间
createIndexRequest.setMasterTimeout(TimeValue.timeValueMinutes(1));
//在创建索引API返回响应之前等待的活动分片副本的数量,以int形式表示
createIndexRequest.waitForActiveShards(ActiveShardCount.from(2));
createIndexRequest.waitForActiveShards(ActiveShardCount.DEFAULT);

//操作索引的客户端
IndicesClient indices = client.indices();
//执行创建索引库
CreateIndexResponse createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);

//得到响应(全部)
boolean acknowledged = createIndexResponse.isAcknowledged();
//得到响应 指示是否在超时前为索引中的每个分片启动了所需数量的碎片副本
boolean shardsAcknowledged = createIndexResponse.isShardsAcknowledged();

System.out.println("!!!!!!!!!!!!!!!!!!!!!!!!!!!" + acknowledged);
System.out.println(shardsAcknowledged);

}


//异步新增索引
@Test
public void testCreateIndexAsync() throws IOException {
//创建索引对象
CreateIndexRequest createIndexRequest = new CreateIndexRequest("ydlclass_book2");
//设置参数
createIndexRequest.settings(Settings.builder().put("number_of_shards", "1").put("number_of_replicas", "0"));
//指定映射1
createIndexRequest.mapping(" {\n" +
" \t\"properties\": {\n" +
" \"name\":{\n" +
" \"type\":\"keyword\"\n" +
" },\n" +
" \"description\": {\n" +
" \"type\": \"text\"\n" +
" },\n" +
" \"price\":{\n" +
" \"type\":\"long\"\n" +
" },\n" +
" \"pic\":{\n" +
" \"type\":\"text\",\n" +
" \"index\":false\n" +
" }\n" +
" \t}\n" +
"}", XContentType.JSON);

//监听方法
ActionListener<CreateIndexResponse> listener =
new ActionListener<CreateIndexResponse>() {

@Override
public void onResponse(CreateIndexResponse createIndexResponse) {
System.out.println("!!!!!!!!创建索引成功");
System.out.println(createIndexResponse.toString());
}

@Override
public void onFailure(Exception e) {
System.out.println("!!!!!!!!创建索引失败");
e.printStackTrace();
}
};

//操作索引的客户端
IndicesClient indices = client.indices();
//执行创建索引库
indices.createAsync(createIndexRequest, RequestOptions.DEFAULT, listener);

try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}



//删除索引库
@Test
public void testDeleteIndex() throws IOException {
//删除索引对象
DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("ydlclass_book2");
//操作索引的客户端
IndicesClient indices = client.indices();
//执行删除索引
AcknowledgedResponse delete = indices.delete(deleteIndexRequest, RequestOptions.DEFAULT);
//得到响应
boolean acknowledged = delete.isAcknowledged();
System.out.println(acknowledged);

}

//异步删除索引库
@Test
public void testDeleteIndexAsync() throws IOException {
//删除索引对象
DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("ydlclass_book2");
//操作索引的客户端
IndicesClient indices = client.indices();

//监听方法
ActionListener<AcknowledgedResponse> listener =
new ActionListener<AcknowledgedResponse>() {
@Override
public void onResponse(AcknowledgedResponse deleteIndexResponse) {
System.out.println("!!!!!!!!删除索引成功");
System.out.println(deleteIndexResponse.toString());
}

@Override
public void onFailure(Exception e) {
System.out.println("!!!!!!!!删除索引失败");
e.printStackTrace();
}
};
//执行删除索引
indices.deleteAsync(deleteIndexRequest, RequestOptions.DEFAULT, listener);

try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}

}

// Indices Exists API
@Test
public void testExistIndex() throws IOException {
GetIndexRequest request = new GetIndexRequest("ydlclass_book");
request.local(false);//从主节点返回本地信息或检索状态
request.humanReadable(true);//以适合人类的格式返回结果
request.includeDefaults(false);//是否返回每个索引的所有默认设置

boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}


// Indices Open API
@Test
public void testOpenIndex() throws IOException {
OpenIndexRequest request = new OpenIndexRequest("ydlclass_book");

OpenIndexResponse openIndexResponse = client.indices().open(request, RequestOptions.DEFAULT);
boolean acknowledged = openIndexResponse.isAcknowledged();
System.out.println("!!!!!!!!!"+acknowledged);
}

// Indices Close API
@Test
public void testCloseIndex() throws IOException {
CloseIndexRequest request = new CloseIndexRequest("index");
AcknowledgedResponse closeIndexResponse = client.indices().close(request, RequestOptions.DEFAULT);
boolean acknowledged = closeIndexResponse.isAcknowledged();
System.out.println("!!!!!!!!!"+acknowledged);

}
}

第十四章 search搜索入门

一、搜索语法入门

无条件搜索所有

From: 元动力
1
GET /book/_search
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
{
"took" : 969,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "book",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Bootstrap开发",
"description" : "Bootstrap是由Twitter推出的一个前台页面开发css框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长css页面开发的程序人员)轻松的实现一个css,不受浏览器限制的精美界面css效果。",
"studymodel" : "201002",
"price" : 38.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"bootstrap",
"dev"
]
}
},
{
"_index" : "book",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "java编程思想",
"description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
"studymodel" : "201001",
"price" : 68.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"java",
"dev"
]
}
},
{
"_index" : "book",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "spring开发基础",
"description" : "spring 在java领域非常流行,java程序员都在用。",
"studymodel" : "201001",
"price" : 88.6,
"timestamp" : "2019-08-24 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"spring",
"java"
]
}
}
]
}
}

解释

took:耗费了几毫秒

timed_out:是否超时,这里是没有

_shards:到几个分片搜索,成功几个,跳过几个,失败几个。

hits.total:查询结果的数量,3个document

hits.max_score:score的含义,就是document对于一个search的相关度的匹配分数,越相关,就越匹配,分数也高

hits.hits:包含了匹配搜索的document的所有详细数据

2、传参

与http请求传参类似

From: 元动力
1
GET /book/_search?q=name:java&sort=price:desc

类比sql: select * from book where name like ’ %java%’ order by price desc

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "book",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "java编程思想",
"description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
"studymodel" : "201001",
"price" : 68.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"java",
"dev"
]
},
"sort" : [
68.6
]
}
]
}
}

3、图解timeout

GET /book/_search?timeout=10ms

全局设置:配置文件中设置 search.default_search_timeout:100ms。默认不超时。

二、multi-index 多索引搜索

1、multi-index搜索模式

告诉你如何一次性搜索多个index和多个type下的数据

From: 元动力
1
2
3
4
/_search:所有索引下的所有数据都搜索出来
/index1/_search:指定一个index,搜索其下所有的数据
/index1,index2/_search:同时搜索两个index下的数据
/index*/_search:按照通配符去匹配多个索引

应用场景:生产环境log索引可以按照日期分开。

log_to_es_20190910

log_to_es_20190911

log_to_es_20180910

2、初步图解一下简单的搜索原理

搜索原理初步图解

三、分页搜索

1、分页搜索的语法

sql: select * from book limit 1,5

size,from

GET /book/_search?size=10

GET /book/_search?size=10&from=0

GET /book/_search?size=10&from=20

GET /book_search?from=0&size=3

2、deep paging

什么是deep paging

根据相关度评分倒排序,所以分页过深,协调节点会将大量数据聚合分析。

deep paging 性能问题

1消耗网络带宽,因为所搜过深的话,各 shard 要把数据传递给 coordinate node,这个过程是有大量数据传递的,消耗网络。

2消耗内存,各 shard 要把数据传送给 coordinate node,这个传递回来的数据,是被 coordinate node 保存在内存中的,这样会大量消耗内存。

3消耗cup,coordinate node 要把传回来的数据进行排序,这个排序过程很消耗cpu。 所以:鉴于deep paging的性能问题,所有应尽量减少使用。

四、 query string基础语法

1、query string基础语法

GET /book/_search?q=name:java

GET /book/_search?q=+name:java

GET /book/_search?q=-name:java

一个是掌握q=field:search content的语法,还有一个是掌握+和-的含义

2、_all metadata的原理和作用

From: 元动力
1
GET /book/_search?q=java

直接可以搜索所有的field,任意一个field包含指定的关键字就可以搜索出来。我们在进行中搜索的时候,难道是对document中的每一个field都进行一次搜索吗?不是的。

es中_all元数据。建立索引的时候,插入一条docunment,es会将所有的field值经行全量分词,把这些分词,放到_all field中。在搜索的时候,没有指定field,就在_all搜索。

举例

From: 元动力
1
2
3
4
5
{
name:jack
email:123@qq.com
address:beijing
}

_all : jack,123@qq.com,beijing

五、query DSL入门

1、DSL

query string 后边的参数原来越多,搜索条件越来越复杂,不能满足需求。

GET /book/_search?q=name:java&size=10&from=0&sort=price:desc

DSL:Domain Specified Language,特定领域的语言

es特有的搜索语言,可在请求体中携带搜索条件,功能强大。

查询全部 GET /book/_search

From: 元动力
1
2
3
4
GET /book/_search
{
"query": { "match_all": {} }
}

排序 GET /book/_search?sort=price:desc

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
GET /book/_search 
{
"query" : {
"match" : {
"name" : " java"
}
},
"sort": [
{ "price": "desc" }
]
}

分页查询 GET /book/_search?size=10&from=0

From: 元动力
1
2
3
4
5
6
GET  /book/_search 
{
"query": { "match_all": {} },
"from": 0,
"size": 1
}

指定返回字段 GET /book/ _search? _source=name,studymodel

From: 元动力
1
2
3
4
5
GET /book/_search 
{
"query": { "match_all": {} },
"_source": ["name", "studymodel"]
}

通过组合以上各种类型查询,实现复杂查询。

2、 Query DSL语法

From: 元动力
1
2
3
4
5
6
{
QUERY_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
From: 元动力
1
2
3
4
5
6
7
8
{
QUERY_NAME: {
FIELD_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
From: 元动力
1
2
3
4
5
6
7
8
GET /test_index/_search 
{
"query": {
"match": {
"test_field": "test"
}
}
}

3、组合多个搜索条件

搜索需求:title必须包含elasticsearch,content可以包含elasticsearch也可以不包含,author_id必须不为111

sql where and or !=

初始数据:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
POST /website/_doc/1
{
"title": "my hadoop article",
"content": "hadoop is very bad",
"author_id": 111
}

POST /website/_doc/2
{
"title": "my elasticsearch article",
"content": "es is very bad",
"author_id": 112
}
POST /website/_doc/3
{
"title": "my elasticsearch article",
"content": "es is very goods",
"author_id": 111
}

搜索:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
GET /website/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "elasticsearch"
}
}
],
"should": [
{
"match": {
"content": "elasticsearch"
}
}
],
"must_not": [
{
"match": {
"author_id": 111
}
}
]
}
}
}

返回:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
{
"took" : 488,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47000363,
"hits" : [
{
"_index" : "website",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.47000363,
"_source" : {
"title" : "my elasticsearch article",
"content" : "es is very bad",
"author_id" : 112
}
}
]
}
}

更复杂的搜索需求:

select * from test_index where name='tom' or (hired =true and (personality ='good' and rude != true ))

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
GET /test_index/_search
{
"query": {
"bool": {
"must": { "match":{ "name": "tom" }},
"should": [
{ "match":{ "hired": true }},
{ "bool": {
"must":{ "match": { "personality": "good" }},
"must_not": { "match": { "rude": true }}
}}
],
"minimum_should_match": 1
}
}
}

六、full-text search 全文检索

1、全文检索

重新创建book索引

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PUT /book/
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"description":{
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"studymodel":{
"type": "keyword"
},
"price":{
"type": "double"
},
"timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"pic":{
"type":"text",
"index":false
}
}
}
}插入数据
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
PUT /book/_doc/1
{
"name": "Bootstrap开发",
"description": "Bootstrap是由Twitter推出的一个前台页面开发css框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长css页面开发的程序人员)轻松的实现一个css,不受浏览器限制的精美界面css效果。",
"studymodel": "201002",
"price":38.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "bootstrap", "dev"]
}

PUT /book/_doc/2
{
"name": "java编程思想",
"description": "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
"studymodel": "201001",
"price":68.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "java", "dev"]
}

PUT /book/_doc/3
{
"name": "spring开发基础",
"description": "spring 在java领域非常流行,java程序员都在用。",
"studymodel": "201001",
"price":88.6,
"timestamp":"2019-08-24 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "spring", "java"]
}

搜索

From: 元动力
1
2
3
4
5
6
7
8
GET  /book/_search 
{
"query" : {
"match" : {
"description" : "java程序员"
}
}
}

2、_score初探

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.137549,
"hits" : [
{
"_index" : "book",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.137549,
"_source" : {
"name" : "spring开发基础",
"description" : "spring 在java领域非常流行,java程序员都在用。",
"studymodel" : "201001",
"price" : 88.6,
"timestamp" : "2019-08-24 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"spring",
"java"
]
}
},
{
"_index" : "book",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.57961315,
"_source" : {
"name" : "java编程思想",
"description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
"studymodel" : "201001",
"price" : 68.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"java",
"dev"
]
}
}
]
}
}

结果分析

1、建立索引时, description字段 term倒排索引

java 2,3

程序员 3

2、搜索时,直接找description中含有java的文档 2,3,并且3号文档含有两个java字段,一个程序员,所以得分高,排在前面。2号文档含有一个java,排在后面。

七、DSL 语法练习

1、match_all

From: 元动力
1
2
3
4
5
6
GET /book/_search
{
"query": {
"match_all": {}
}
}

2、match

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_search
{
"query": {
"match": {
"description": "java程序员"
}
}
}

3、multi_match

From: 元动力
1
2
3
4
5
6
7
8
9
GET /book/_search
{
"query": {
"multi_match": {
"query": "java程序员",
"fields": ["name", "description"]
}
}
}

4、range query 范围查询

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
GET /book/_search
{
"query": {
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
}

5、term query

字段为keyword时,存储和搜索都不分词

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_search
{
"query": {
"term": {
"description": "java程序员"
}
}
}

6、terms query

From: 元动力
1
2
3
4
5
GET /book/_search
{
"query": { "terms": { "tag": [ "search", "full_text", "nosql" ] }}
}

7、exist query 查询有某些字段值的文档

From: 元动力
1
2
3
4
5
6
7
8
GET /_search
{
"query": {
"exists": {
"field": "join_date"
}
}
}

8、Fuzzy query

返回包含与搜索词类似的词的文档,该词由Levenshtein编辑距离度量。

包括以下几种情况:

  • 更改角色(box→fox)

  • 删除字符(aple→apple)

  • 插入字符(sick→sic)

  • 调换两个相邻字符(ACT→CAT)

From: 元动力
1
2
3
4
5
6
7
8
9
10
GET /book/_search
{
"query": {
"fuzzy": {
"description": {
"value": "jave"
}
}
}
}

9、IDs

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}

10、prefix 前缀查询

From: 元动力
1
2
3
4
5
6
7
8
9
10
GET /book/_search
{
"query": {
"prefix": {
"description": {
"value": "spring"
}
}
}
}

11、regexp query 正则查询

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
GET /book/_search
{
"query": {
"regexp": {
"description": {
"value": "j.*a",
"flags" : "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}

八、 Filter

1、 filter与query示例

需求:用户查询description中有"java程序员",并且价格大于80小于90的数据。

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
GET /book/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "java程序员"
}
},
{
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
]
}
}
}

使用filter:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
GET /book/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "java程序员"
}
}
],
"filter": {
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
}
}
}

2、filter与query对比

filter,仅仅只是按照搜索条件过滤出需要的数据而已,不计算任何相关度分数,对相关度没有任何影响。

query,会去计算每个document相对于搜索条件的相关度,并按照相关度进行排序。

应用场景:

一般来说,如果你是在进行搜索,需要将最匹配搜索条件的数据先返回,那么用query 如果你只是要根据一些条件筛选出一部分数据,不关注其排序,那么用filter

3、filter与query性能

filter,不需要计算相关度分数,不需要按照相关度分数进行排序,同时还有内置的自动cache最常使用filter的数据

query,相反,要计算相关度分数,按照分数进行排序,而且无法cache结果

九、定位错误语法

验证错误语句:

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_validate/query?explain
{
"query": {
"mach": {
"description": "java程序员"
}
}
}

返回:

From: 元动力
1
2
3
4
{
"valid" : false,
"error" : "org.elasticsearch.common.ParsingException: no [query] registered for [mach]"
}

正确

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_validate/query?explain
{
"query": {
"match": {
"description": "java程序员"
}
}
}

返回

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"valid" : true,
"explanations" : [
{
"index" : "book",
"valid" : true,
"explanation" : "description:java description:程序员"
}
]
}

一般用在那种特别复杂庞大的搜索下,比如你一下子写了上百行的搜索,这个时候可以先用validate api去验证一下,搜索是否合法。

合法以后,explain就像mysql的执行计划,可以看到搜索的目标等信息。

十、定制排序规则

1、默认排序规则

默认情况下,是按照_score降序排序的

然而,某些情况下,可能没有有用的_score,比如说filter

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
GET book/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "java程序员"
}
}
]
}
}
}

当然,也可以是constant_score

2、定制排序规则

相当于sql中order by ?sort=sprice:desc

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET /book/_search 
{
"query": {
"constant_score": {
"filter" : {
"term" : {
"studymodel" : "201001"
}
}
}
},
"sort": [
{
"price": {
"order": "asc"
}
}
]
}

十一、 Text字段排序问题

如果对一个text field进行排序,结果往往不准确,因为分词后是多个单词,再排序就不是我们想要的结果了。

通常解决方案是,将一个text field建立两次索引,一个分词,用来进行搜索;一个不分词,用来进行排序。

fielddate:true

PUT /website 
{
  "mappings": {
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }        
      }      
    },
    "content": {
      "type": "text"
    },
    "post_date": {
      "type": "date"
    },
    "author_id": {
      "type": "long"
    }
  }
 }
}

插入数据

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
PUT /website/_doc/1
{
"title": "first article",
"content": "this is my second article",
"post_date": "2019-01-01",
"author_id": 110
}

PUT /website/_doc/2
{
"title": "second article",
"content": "this is my second article",
"post_date": "2019-01-01",
"author_id": 110
}

PUT /website/_doc/3
{
"title": "third article",
"content": "this is my third article",
"post_date": "2019-01-02",
"author_id": 110
}

搜索

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
GET /website/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title.keyword": {
"order": "desc"
}
}
]
}

十二、Scroll分批查询

场景:下载某一个索引中1亿条数据,到文件或是数据库。

不能一下全查出来,系统内存溢出。所以使用scoll滚动搜索技术,一批一批查询。

scoll搜索会在第一次搜索的时候,保存一个当时的视图快照,之后只会基于该旧的视图快照提供数据搜索,如果这个期间数据变更,是不会让用户看到的

每次发送scroll请求,我们还需要指定一个scoll参数,指定一个时间窗口,每次搜索请求只要在这个时间窗口内能完成就可以了。

搜索

From: 元动力
1
2
3
4
5
6
7
GET /book/_search?scroll=1m
{
"query": {
"match_all": {}
},
"size": 3
}

返回

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
{
"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ==",
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [

]
}
}

获得的结果会有一个scoll_id,下一次再发送scoll请求的时候,必须带上这个scoll_id

From: 元动力
1
2
3
4
5
GET /_search/scroll
{
"scroll": "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ=="
}

与分页区别:

分页给用户看的 deep paging

scroll是用户系统内部操作,如下载批量数据,数据转移。零停机改变索引映射。

第十五章 java api实现搜索

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
package com.itheima.es;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.Map;

/**
* creste by itheima.itcast
*/
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestSearch {
@Autowired
RestHighLevelClient client;

//搜索全部记录
@Test
public void testSearchAll() throws IOException {

// GET book/_search
// {
// "query": {
// "match_all": {}
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

//获取某些字段
searchSourceBuilder.fetchSource(new String[]{"name"}, new String[]{});

searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}



//搜索分页
@Test
public void testSearchPage() throws IOException {
// GET book/_search
// {
// "query": {
// "match_all": {}
// },
// "from": 0,
// "size": 2
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());


//第几页
int page=1;
//每页几个
int size=2;
//下标计算
int from=(page-1)*size;


searchSourceBuilder.from(from);
searchSourceBuilder.size(size);


searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}



//ids搜索
@Test
public void testSearchIds() throws IOException {
// GET /book/_search
// {
// "query": {
// "ids" : {
// "values" : ["1", "4", "100"]
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.idsQuery().addIds("1","4","100"));




searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}


//match搜索
@Test
public void testSearchMatch() throws IOException {
//
// GET /book/_search
// {
// "query": {
// "match": {
// "description": "java程序员"
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchQuery("description", "java程序员"));




searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}

//term 搜索
@Test
public void testSearchTerm() throws IOException {
//
// GET /book/_search
// {
// "query": {
// "term": {
// "description": "java程序员"
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("description", "java程序员"));




searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}


//multi_match搜索
@Test
public void testSearchMultiMatch() throws IOException {
// GET /book/_search
// {
// "query": {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name", "description"]
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.multiMatchQuery("java程序员","name","description"));




searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}
// GET /book/_search
// {
// "query": {
// "bool": {
// "must": [
// {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name","description"]
// }
// }
// ],
// "should": [
// {
// "match": {
// "studymodel": "201001"
// }
// }
// ]
// }
// }
// }

//bool搜索
@Test
public void testSearchBool() throws IOException {
// GET /book/_search
// {
// "query": {
// "bool": {
// "must": [
// {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name","description"]
// }
// }
// ],
// "should": [
// {
// "match": {
// "studymodel": "201001"
// }
// }
// ]
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

//构建multiMatch请求
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("java程序员", "name", "description");
//构建match请求
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("studymodel", "201001");

BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
boolQueryBuilder.must(multiMatchQueryBuilder);
boolQueryBuilder.should(matchQueryBuilder);

searchSourceBuilder.query(boolQueryBuilder);




searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}

// GET /book/_search
// {
// "query": {
// "bool": {
// "must": [
// {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name","description"]
// }
// }
// ],
// "should": [
// {
// "match": {
// "studymodel": "201001"
// }
// }
// ],
// "filter": {
// "range": {
// "price": {
// "gte": 50,
// "lte": 90
// }
// }
//
// }
// }
// }
// }

//filter搜索
@Test
public void testSearchFilter() throws IOException {
// GET /book/_search
// {
// "query": {
// "bool": {
// "must": [
// {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name","description"]
// }
// }
// ],
// "should": [
// {
// "match": {
// "studymodel": "201001"
// }
// }
// ],
// "filter": {
// "range": {
// "price": {
// "gte": 50,
// "lte": 90
// }
// }
//
// }
// }
// }
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

//构建multiMatch请求
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("java程序员", "name", "description");
//构建match请求
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("studymodel", "201001");

BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
boolQueryBuilder.must(multiMatchQueryBuilder);
boolQueryBuilder.should(matchQueryBuilder);

boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(50).lte(90));

searchSourceBuilder.query(boolQueryBuilder);


searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");

}
}




//sort搜索
@Test
public void testSearchSort() throws IOException {
// GET /book/_search
// {
// "query": {
// "bool": {
// "must": [
// {
// "multi_match": {
// "query": "java程序员",
// "fields": ["name","description"]
// }
// }
// ],
// "should": [
// {
// "match": {
// "studymodel": "201001"
// }
// }
// ],
// "filter": {
// "range": {
// "price": {
// "gte": 50,
// "lte": 90
// }
// }
//
// }
// }
// },
// "sort": [
// {
// "price": {
// "order": "asc"
// }
// }
// ]
// }
//1构建搜索请求
SearchRequest searchRequest = new SearchRequest("book");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

//构建multiMatch请求
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery("java程序员", "name", "description");
//构建match请求
MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("studymodel", "201001");

BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
boolQueryBuilder.must(multiMatchQueryBuilder);
boolQueryBuilder.should(matchQueryBuilder);

boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gte(50).lte(90));

searchSourceBuilder.query(boolQueryBuilder);

//按照价格升序
searchSourceBuilder.sort("price", SortOrder.ASC);


searchRequest.source(searchSourceBuilder);

//2执行搜索
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3获取结果
SearchHits hits = searchResponse.getHits();

//数据数据
SearchHit[] searchHits = hits.getHits();
System.out.println("--------------------------");
for (SearchHit hit : searchHits) {
String id = hit.getId();
float score = hit.getScore();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String name = (String) sourceAsMap.get("name");
String description = (String) sourceAsMap.get("description");
Double price = (Double) sourceAsMap.get("price");
System.out.println("id:" + id);
System.out.println("name:" + name);
System.out.println("description:" + description);
System.out.println("price:" + price);
System.out.println("==========================");
}
}

}

第十六章 评分机制详解

一、评分机制 TF\IDF

1、算法介绍

relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度。

Elasticsearch使用的是 term frequency/inverse document frequency算法,简称为TF/IDF算法。TF词频(Term Frequency),IDF逆向文件频率(Inverse Document Frequency)

Term frequency:搜索文本中的各个词条在field文本中出现了多少次,出现次数越多,就越相关。

1571494142950
1571494142950

举例:搜索请求:hello world

doc1 : hello you and me,and world is very good.

doc2 : hello,how are you

Inverse document frequency:搜索文本中的各个词条在整个索引的所有文档中出现了多少次,出现的次数越多,就越不相关.

1571494159465
1571494159465
1571494176760
1571494176760

举例:搜索请求:hello world

doc1 : hello ,today is very good

doc2 : hi world ,how are you

整个index中1亿条数据。hello的document 1000个,有world的document 有100个。

doc2 更相关

Field-length norm:field长度,field越长,相关度越弱

举例:搜索请求:hello world

doc1 : {"title":"hello article","content ":"balabalabal 1万个"}

doc2 : {"title":"my article","content ":"balabalabal 1万个,world"}

2、_score是如何被计算出来的

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_search?explain=true
{
"query": {
"match": {
"description": "java程序员"
}
}
}

返回

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.137549,
"hits" : [
{
"_shard" : "[book][0]",
"_node" : "MDA45-r6SUGJ0ZyqyhTINA",
"_index" : "book",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.137549,
"_source" : {
"name" : "spring开发基础",
"description" : "spring 在java领域非常流行,java程序员都在用。",
"studymodel" : "201001",
"price" : 88.6,
"timestamp" : "2019-08-24 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"spring",
"java"
]
},
"_explanation" : {
"value" : 2.137549,
"description" : "sum of:",
"details" : [
{
"value" : 0.7936629,
"description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.7936629,
"description" : "score(freq=2.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.47000363,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 3,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.7675597,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 2.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 12.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 35.333332,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 1.3438859,
"description" : "weight(description:程序员 in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 1.3438859,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.98082924,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 3,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.6227967,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 12.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 35.333332,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
},
{
"_shard" : "[book][0]",
"_node" : "MDA45-r6SUGJ0ZyqyhTINA",
"_index" : "book",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.57961315,
"_source" : {
"name" : "java编程思想",
"description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
"studymodel" : "201001",
"price" : 68.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"java",
"dev"
]
},
"_explanation" : {
"value" : 0.57961315,
"description" : "sum of:",
"details" : [
{
"value" : 0.57961315,
"description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.57961315,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.47000363,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 3,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.56055,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 19.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 35.333332,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}
]
}
}

3、分析一个document是如何被匹配上的

From: 元动力
1
2
3
4
5
6
7
8
GET /book/_explain/3
{
"query": {
"match": {
"description": "java程序员"
}
}
}

二、Doc value

搜索的时候,要依靠倒排索引;排序的时候,需要依靠正排索引,看到每个document的每个field,然后进行排序,所谓的正排索引,其实就是doc values

在建立索引的时候,一方面会建立倒排索引,以供搜索用;一方面会建立正排索引,也就是doc values,以供排序,聚合,过滤等操作使用

doc values是被保存在磁盘上的,此时如果内存足够,os会自动将其缓存在内存中,性能还是会很高;如果内存不足够,os会将其写入磁盘上

倒排索引

doc1: hello world you and me

doc2: hi, world, how are you

termdoc1doc2
hello*
world**
you**
and*
me*
hi*
how*
are*

搜索时:

hello you --> hello, you

hello --> doc1

you --> doc1,doc2

doc1: hello world you and me

doc2: hi, world, how are you

sort by 出现问题

正排索引

doc1: { "name": "jack", "age": 27 }

doc2: { "name": "tom", "age": 30 }

documentnameage
doc1jack27
doc2tom30

三、query phase

1、query phase

(1)搜索请求发送到某一个coordinate node,构构建一个priority queue,长度以paging操作from和size为准,默认为10

(2)coordinate node将请求转发到所有shard,每个shard本地搜索,并构建一个本地的priority queue

(3)各个shard将自己的priority queue返回给coordinate node,并构建一个全局的priority queue

2、replica shard如何提升搜索吞吐量

一次请求要打到所有shard的一个replica/primary上去,如果每个shard都有多个replica,那么同时并发过来的搜索请求可以同时打到其他的replica上去

四、 fetch phase

1、fetch phbase工作流程

(1)coordinate node构建完priority queue之后,就发送mget请求去所有shard上获取对应的document

(2)各个shard将document返回给coordinate node

(3)coordinate node将合并后的document结果返回给client客户端

2、一般搜索,如果不加from和size,就默认搜索前10条,按照_score排序

五、搜索参数小总结

1、preference

决定了哪些shard会被用来执行搜索操作

_primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3

bouncing results问题,两个document排序,field值相同;不同的shard上,可能排序不同;每次请求轮询打到不同的replica shard上;每次页面上看到的搜索结果的排序都不一样。这就是bouncing result,也就是跳跃的结果。

搜索的时候,是轮询将搜索请求发送到每一个replica shard(primary shard),但是在不同的shard上,可能document的排序不同

解决方案就是将preference设置为一个字符串,比如说user_id,让每个user每次搜索的时候,都使用同一个replica shard去执行,就不会看到bouncing results了

2、timeout

已经讲解过原理了,主要就是限定在一定时间内,将部分获取到的数据直接返回,避免查询耗时过长

3、routing

document文档路由,_id路由,routing=user_id,这样的话可以让同一个user对应的数据到一个shard上去

4、search_type

default:query_then_fetch

dfs_query_then_fetch,可以提升revelance sort精准度

第十七章 聚合入门

一、聚合示例

1、需求:计算每个studymodel下的商品数量

sql语句: select studymodel,count(*) from book group by studymodel

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
GET /book/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"group_by_model": {
"terms": { "field": "studymodel" }
}
}
}

2、需求:计算每个tags下的商品数量

设置字段"fielddata": true

From: 元动力
1
2
3
4
5
6
7
8
9
PUT /book/_mapping/
{
"properties": {
"tags": {
"type": "text",
"fielddata": true
}
}
}

查询

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
GET /book/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}

3、需求:加上搜索条件,计算每个tags下的商品数量

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
GET /book/_search
{
"size": 0,
"query": {
"match": {
"description": "java程序员"
}
},
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}

4、需求:先分组,再算每组的平均值,计算每个tag下的商品的平均价格

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
GET /book/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : {
"field" : "tags"
},
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}

5、需求:计算每个tag下的商品的平均价格,并且按照平均价格降序排序

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET /book/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : {
"field" : "tags",
"order": {
"avg_price": "desc"
}
},
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}

6、需求:按照指定的价格范围区间进行分组,然后在每组内再按照tag进行分组,最后再计算每组的平均价格

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
GET /book/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 40
},
{
"from": 40,
"to": 60
},
{
"from": 60,
"to": 80
}
]
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"average_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}

二、两个核心概念:bucket和metric

1、bucket:一个数据分组

city name 北京 张三 北京 李四 天津 王五 天津 赵六

天津 王麻子

划分出来两个bucket,一个是北京bucket,一个是天津bucket 北京bucket:包含了2个人,张三,李四 上海bucket:包含了3个人,王五,赵六,王麻子

2、metric:对一个数据分组执行的统计

metric,就是对一个bucket执行的某种聚合分析的操作,比如说求平均值,求最大值,求最小值

select count(*) from book group studymodel

bucket:group by studymodel --> 那些studymodel相同的数据,就会被划分到一个bucket中 metric:count(*),对每个user_id bucket中所有的数据,计算一个数量。还有avg(),sum(),max(),min()

3、电视案例

创建索引及映射

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
PUT /tvs
PUT /tvs/_search
{
"properties": {
"price": {
"type": "long"
},
"color": {
"type": "keyword"
},
"brand": {
"type": "keyword"
},
"sold_date": {
"type": "date"
}
}
}

插入数据

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
POST /tvs/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2019-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2019-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2019-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2020-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2020-02-12" }
需求1 统计哪种颜色的电视销量最高
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
GET /tvs/_search
{
"size" : 0,
"aggs" : {
"popular_colors" : {
"terms" : {
"field" : "color"
}
}
}
}

查询条件解析

size:只获取聚合结果,而不要执行聚合的原始数据 aggs:固定语法,要对一份数据执行分组聚合操作 popular_colors:就是对每个aggs,都要起一个名字, terms:根据字段的值进行分组 field:根据指定的字段的值进行分组

返回

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
{
"took" : 18,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"popular_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "红色",
"doc_count" : 4
},
{
"key" : "绿色",
"doc_count" : 2
},
{
"key" : "蓝色",
"doc_count" : 2
}
]
}
}
}

返回结果解析

hits.hits:我们指定了size是0,所以hits.hits就是空的 aggregations:聚合结果 popular_color:我们指定的某个聚合的名称 buckets:根据我们指定的field划分出的buckets key:每个bucket对应的那个值 doc_count:这个bucket分组内,有多少个数据 数量,其实就是这种颜色的销量

每种颜色对应的bucket中的数据的默认的排序规则:按照doc_count降序排序

需求2 统计每种颜色电视平均价格
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
GET /tvs/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}

在一个aggs执行的bucket操作(terms),平级的json结构下,再加一个aggs,这个第二个aggs内部,同样取个名字,执行一个metric操作,avg,对之前的每个bucket中的数据的指定的field,price field,求一个平均值

返回:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "红色",
"doc_count" : 4,
"avg_price" : {
"value" : 3250.0
}
},
{
"key" : "绿色",
"doc_count" : 2,
"avg_price" : {
"value" : 2100.0
}
},
{
"key" : "蓝色",
"doc_count" : 2,
"avg_price" : {
"value" : 2000.0
}
}
]
}
}
}

buckets,除了key和doc_count avg_price:我们自己取的metric aggs的名字 value:我们的metric计算的结果,每个bucket中的数据的price字段求平均值后的结果

相当于sql: select avg(price) from tvs group by color

需求3 继续下钻分析

每个颜色下,平均价格及每个颜色下,每个品牌的平均价格

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
GET /tvs/_search 
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"color_avg_price": {
"avg": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
需求4:更多的metric

count:bucket,terms,自动就会有一个doc_count,就相当于是count avg:avg aggs,求平均值 max:求一个bucket内,指定field值最大的那个数据 min:求一个bucket内,指定field值最小的那个数据 sum:求一个bucket内,指定field值的总和

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
GET /tvs/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": { "avg": { "field": "price" } },
"min_price" : { "min": { "field": "price"} },
"max_price" : { "max": { "field": "price"} },
"sum_price" : { "sum": { "field": "price" } }
}
}
}
}
需求5:划分范围 histogram
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET /tvs/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 2000
},
"aggs":{
"income": {
"sum": {
"field" : "price"
}
}
}
}
}
}

histogram:类似于terms,也是进行bucket分组操作,接收一个field,按照这个field的值的各个范围区间,进行bucket分组操作

From: 元动力
1
2
3
4
"histogram":{ 
"field": "price",
"interval": 2000
}

interval:2000,划分范围,02000,20004000,40006000,60008000,8000~10000,buckets

bucket有了之后,一样的,去对每个bucket执行avg,count,sum,max,min,等各种metric操作,聚合分析

需求6:按照日期分组聚合

date_histogram,按照我们指定的某个date类型的日期field,以及日期interval,按照一定的日期间隔,去划分bucket

min_doc_count:即使某个日期interval,2017-01-01~2017-01-31中,一条数据都没有,那么这个区间也是要返回的,不然默认是会过滤掉这个区间的 extended_bounds,min,max:划分bucket的时候,会限定在这个起始日期,和截止日期内

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
GET /tvs/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2019-01-01",
"max" : "2020-12-31"
}
}
}
}
}
需求7 统计每季度每个品牌的销售额
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
GET /tvs/_search 
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2019-01-01",
"max": "2020-12-31"
}
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
},
"total_sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
需求8 :搜索与聚合结合,查询某个品牌按颜色销量

搜索与聚合可以结合起来。

sql select count(*)

from tvs

where brand like "%小米%"

group by color

es aggregation,scope,任何的聚合,都必须在搜索出来的结果数据中之行,搜索结果,就是聚合分析操作的scope

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
GET /tvs/_search 
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
}
}
}
}
需求9 global bucket:单个品牌与所有品牌销量对比

aggregation,scope,一个聚合操作,必须在query的搜索结果范围内执行

出来两个结果,一个结果,是基于query搜索结果来聚合的; 一个结果,是对所有数据执行聚合的

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
GET /tvs/_search 
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"single_brand_avg_price": {
"avg": {
"field": "price"
}
},
"all": {
"global": {},
"aggs": {
"all_brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
需求10:过滤+聚合:统计价格大于1200的电视平均价格

搜索+聚合

过滤+聚合

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
GET /tvs/_search 
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"price": {
"gte": 1200
}
}
}
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
需求11 bucket filter:统计品牌最近一个月的平均价格
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
GET /tvs/_search 
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"recent_150d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-150d"
}
}
},
"aggs": {
"recent_150d_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_140d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-140d"
}
}
},
"aggs": {
"recent_140d_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_130d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-130d"
}
}
},
"aggs": {
"recent_130d_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}

aggs.filter,针对的是聚合去做的

如果放query里面的filter,是全局的,会对所有的数据都有影响

但是,如果,比如说,你要统计,长虹电视,最近1个月的平均值; 最近3个月的平均值; 最近6个月的平均值

bucket filter:对不同的bucket下的aggs,进行filter

需求12 排序:按每种颜色的平均销售额降序排序
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
GET /tvs/_search 
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}

相当于sql子表数据字段可以立刻使用。

需求13 排序:按每种颜色的每种品牌平均销售额降序排序
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
GET /tvs/_search  
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand",
"order": {
"avg_price": "desc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}

第十八章 java api实现聚合

简单聚合,多种聚合,详见代码。

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
package com.ydlclass.es;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.*;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.*;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.List;

/**
* creste by ydlclass.ydl
*/
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestAggs {
@Autowired
RestHighLevelClient client;

//需求一:按照颜色分组,计算每个颜色卖出的个数
@Test
public void testAggs() throws IOException {
// GET /tvs/_search
// {
// "size": 0,
// "query": {"match_all": {}},
// "aggs": {
// "group_by_color": {
// "terms": {
// "field": "color"
// }
// }
// }
// }

//1 构建请求
SearchRequest searchRequest=new SearchRequest("tvs");

//请求体
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("group_by_color").field("color");
searchSourceBuilder.aggregation(termsAggregationBuilder);

//请求体放入请求头
searchRequest.source(searchSourceBuilder);

//2 执行
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3 获取结果
// "aggregations" : {
// "group_by_color" : {
// "doc_count_error_upper_bound" : 0,
// "sum_other_doc_count" : 0,
// "buckets" : [
// {
// "key" : "红色",
// "doc_count" : 4
// },
// {
// "key" : "绿色",
// "doc_count" : 2
// },
// {
// "key" : "蓝色",
// "doc_count" : 2
// }
// ]
// }
Aggregations aggregations = searchResponse.getAggregations();
Terms group_by_color = aggregations.get("group_by_color");
List<? extends Terms.Bucket> buckets = group_by_color.getBuckets();
for (Terms.Bucket bucket : buckets) {
String key = bucket.getKeyAsString();
System.out.println("key:"+key);

long docCount = bucket.getDocCount();
System.out.println("docCount:"+docCount);

System.out.println("=================================");
}
}

// #需求二:按照颜色分组,计算每个颜色卖出的个数,每个颜色卖出的平均价格
@Test
public void testAggsAndAvg() throws IOException {
// GET /tvs/_search
// {
// "size": 0,
// "query": {"match_all": {}},
// "aggs": {
// "group_by_color": {
// "terms": {
// "field": "color"
// },
// "aggs": {
// "avg_price": {
// "avg": {
// "field": "price"
// }
// }
// }
// }
// }
// }

//1 构建请求
SearchRequest searchRequest=new SearchRequest("tvs");

//请求体
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("group_by_color").field("color");

//terms聚合下填充一个子聚合
AvgAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg_price").field("price");
termsAggregationBuilder.subAggregation(avgAggregationBuilder);

searchSourceBuilder.aggregation(termsAggregationBuilder);

//请求体放入请求头
searchRequest.source(searchSourceBuilder);

//2 执行
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3 获取结果
// {
// "key" : "红色",
// "doc_count" : 4,
// "avg_price" : {
// "value" : 3250.0
// }
// }
Aggregations aggregations = searchResponse.getAggregations();
Terms group_by_color = aggregations.get("group_by_color");
List<? extends Terms.Bucket> buckets = group_by_color.getBuckets();
for (Terms.Bucket bucket : buckets) {
String key = bucket.getKeyAsString();
System.out.println("key:"+key);

long docCount = bucket.getDocCount();
System.out.println("docCount:"+docCount);

Aggregations aggregations1 = bucket.getAggregations();
Avg avg_price = aggregations1.get("avg_price");
double value = avg_price.getValue();
System.out.println("value:"+value);

System.out.println("=================================");
}
}

// #需求三:按照颜色分组,计算每个颜色卖出的个数,以及每个颜色卖出的平均值、最大值、最小值、总和。
@Test
public void testAggsAndMore() throws IOException {
// GET /tvs/_search
// {
// "size" : 0,
// "aggs": {
// "group_by_color": {
// "terms": {
// "field": "color"
// },
// "aggs": {
// "avg_price": { "avg": { "field": "price" } },
// "min_price" : { "min": { "field": "price"} },
// "max_price" : { "max": { "field": "price"} },
// "sum_price" : { "sum": { "field": "price" } }
// }
// }
// }
// }

//1 构建请求
SearchRequest searchRequest=new SearchRequest("tvs");

//请求体
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("group_by_color").field("color");


//termsAggregationBuilder里放入多个子聚合
AvgAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg_price").field("price");
MinAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min_price").field("price");
MaxAggregationBuilder maxAggregationBuilder = AggregationBuilders.max("max_price").field("price");
SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("sum_price").field("price");

termsAggregationBuilder.subAggregation(avgAggregationBuilder);
termsAggregationBuilder.subAggregation(minAggregationBuilder);
termsAggregationBuilder.subAggregation(maxAggregationBuilder);
termsAggregationBuilder.subAggregation(sumAggregationBuilder);


searchSourceBuilder.aggregation(termsAggregationBuilder);

//请求体放入请求头
searchRequest.source(searchSourceBuilder);

//2 执行
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3 获取结果
// {
// "key" : "红色",
// "doc_count" : 4,
// "max_price" : {
// "value" : 8000.0
// },
// "min_price" : {
// "value" : 1000.0
// },
// "avg_price" : {
// "value" : 3250.0
// },
// "sum_price" : {
// "value" : 13000.0
// }
// }
Aggregations aggregations = searchResponse.getAggregations();
Terms group_by_color = aggregations.get("group_by_color");
List<? extends Terms.Bucket> buckets = group_by_color.getBuckets();
for (Terms.Bucket bucket : buckets) {
String key = bucket.getKeyAsString();
System.out.println("key:"+key);

long docCount = bucket.getDocCount();
System.out.println("docCount:"+docCount);

Aggregations aggregations1 = bucket.getAggregations();

Max max_price = aggregations1.get("max_price");
double maxPriceValue = max_price.getValue();
System.out.println("maxPriceValue:"+maxPriceValue);

Min min_price = aggregations1.get("min_price");
double minPriceValue = min_price.getValue();
System.out.println("minPriceValue:"+minPriceValue);

Avg avg_price = aggregations1.get("avg_price");
double avgPriceValue = avg_price.getValue();
System.out.println("avgPriceValue:"+avgPriceValue);

Sum sum_price = aggregations1.get("sum_price");
double sumPriceValue = sum_price.getValue();
System.out.println("sumPriceValue:"+sumPriceValue);

System.out.println("=================================");
}
}

// #需求四:按照售价每2000价格划分范围,算出每个区间的销售总额 histogram
@Test
public void testAggsAndHistogram() throws IOException {
// GET /tvs/_search
// {
// "size" : 0,
// "aggs":{
// "by_histogram":{
// "histogram":{
// "field": "price",
// "interval": 2000
// },
// "aggs":{
// "income": {
// "sum": {
// "field" : "price"
// }
// }
// }
// }
// }
// }

//1 构建请求
SearchRequest searchRequest=new SearchRequest("tvs");

//请求体
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

HistogramAggregationBuilder histogramAggregationBuilder = AggregationBuilders.histogram("by_histogram").field("price").interval(2000);

SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("income").field("price");
histogramAggregationBuilder.subAggregation(sumAggregationBuilder);
searchSourceBuilder.aggregation(histogramAggregationBuilder);

//请求体放入请求头
searchRequest.source(searchSourceBuilder);

//2 执行
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3 获取结果
// {
// "key" : 0.0,
// "doc_count" : 3,
// income" : {
// "value" : 3700.0
// }
// }
Aggregations aggregations = searchResponse.getAggregations();
Histogram group_by_color = aggregations.get("by_histogram");
List<? extends Histogram.Bucket> buckets = group_by_color.getBuckets();
for (Histogram.Bucket bucket : buckets) {
String keyAsString = bucket.getKeyAsString();
System.out.println("keyAsString:"+keyAsString);
long docCount = bucket.getDocCount();
System.out.println("docCount:"+docCount);

Aggregations aggregations1 = bucket.getAggregations();
Sum income = aggregations1.get("income");
double value = income.getValue();
System.out.println("value:"+value);

System.out.println("=================================");

}
}

// #需求五:计算每个季度的销售总额
@Test
public void testAggsAndDateHistogram() throws IOException {
// GET /tvs/_search
// {
// "size" : 0,
// "aggs": {
// "sales": {
// "date_histogram": {
// "field": "sold_date",
// "interval": "quarter",
// "format": "yyyy-MM-dd",
// "min_doc_count" : 0,
// "extended_bounds" : {
// "min" : "2019-01-01",
// "max" : "2020-12-31"
// }
// },
// "aggs": {
// "income": {
// "sum": {
// "field": "price"
// }
// }
// }
// }
// }
// }

//1 构建请求
SearchRequest searchRequest=new SearchRequest("tvs");

//请求体
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.query(QueryBuilders.matchAllQuery());

DateHistogramAggregationBuilder dateHistogramAggregationBuilder = AggregationBuilders.dateHistogram("date_histogram").field("sold_date").calendarInterval(DateHistogramInterval.QUARTER)
.format("yyyy-MM-dd").minDocCount(0).extendedBounds(new ExtendedBounds("2019-01-01", "2020-12-31"));
SumAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("income").field("price");
dateHistogramAggregationBuilder.subAggregation(sumAggregationBuilder);

searchSourceBuilder.aggregation(dateHistogramAggregationBuilder);
//请求体放入请求头
searchRequest.source(searchSourceBuilder);

//2 执行
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

//3 获取结果
// {
// "key_as_string" : "2019-01-01",
// "key" : 1546300800000,
// "doc_count" : 0,
// "income" : {
// "value" : 0.0
// }
// }
Aggregations aggregations = searchResponse.getAggregations();
ParsedDateHistogram date_histogram = aggregations.get("date_histogram");
List<? extends Histogram.Bucket> buckets = date_histogram.getBuckets();
for (Histogram.Bucket bucket : buckets) {
String keyAsString = bucket.getKeyAsString();
System.out.println("keyAsString:"+keyAsString);
long docCount = bucket.getDocCount();
System.out.println("docCount:"+docCount);

Aggregations aggregations1 = bucket.getAggregations();
Sum income = aggregations1.get("income");
double value = income.getValue();
System.out.println("value:"+value);

System.out.println("====================");
}

}

}

第十九章 es7 sql新特性

一、快速入门

From: 元动力
1
2
3
4
POST /_sql?format=txt
{
"query": "SELECT * FROM tvs "
}

二、启动方式

1、http 请求

2、客户端:elasticsearch-sql-cli.bat

3、代码

三、显示方式

1573212830146
1573212830146

四、sql 翻译

From: 元动力
1
2
3
4
POST /_sql/translate
{
"query": "SELECT * FROM tvs "
}

返回:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{
"size" : 1000,
"_source" : false,
"stored_fields" : "_none_",
"docvalue_fields" : [
{
"field" : "brand"
},
{
"field" : "color"
},
{
"field" : "price"
},
{
"field" : "sold_date",
"format" : "epoch_millis"
}
],
"sort" : [
{
"_doc" : {
"order" : "asc"
}
}
]
}

五、与其他DSL结合

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
POST /_sql?format=txt
{
"query": "SELECT * FROM tvs",
"filter": {
"range": {
"price": {
"gte" : 1200,
"lte" : 2000
}
}
}
}

六、 java 代码实现sql功能

1、前提 es拥有白金版功能

kibana中管理-》许可管理 开启白金版试用

2、导入依赖

    <dependency>
        <groupId>org.elasticsearch.plugin</groupId>
        <artifactId>x-pack-sql-jdbc</artifactId>
        <version>7.3.0</version>
    </dependency>

&lt;repositories&gt;
    &lt;repository&gt;
        &lt;id&gt;elastic.co&lt;/id&gt;
        &lt;url&gt;https://artifacts.elastic.co/maven&lt;/url&gt;
    &lt;/repository&gt;
&lt;/repositories&gt;

3、代码

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
public static void main(String[] args) {
try {
Connection connection = DriverManager.getConnection("jdbc:es://http://localhost:9200");
Statement statement = connection.createStatement();
ResultSet results = statement.executeQuery(
"select * from tvs");
while(results.next()){
System.out.println(results.getString(1));
System.out.println(results.getString(2));
System.out.println(results.getString(3));
System.out.println(results.getString(4));
System.out.println("============================");
}
}catch (Exception e){
e.printStackTrace();
}

大型企业可以购买白金版,增加Machine Learning、高级安全性x-pack。

第二十章 Logstash学习

一、Logstash基本语法组成

1573291947262
1573291947262

1、什么是Logstash

logstash是一个数据抽取工具,将数据从一个地方转移到另一个地方。如hadoop生态圈的sqoop等。下载地址:https://www.elastic.co/cn/downloads/logstashopen in new window

logstash之所以功能强大和流行,还与其丰富的过滤器插件是分不开的,过滤器提供的并不单单是过滤的功能,还可以对进入过滤器的原始数据进行复杂的逻辑处理,甚至添加独特的事件到后续流程中。 Logstash配置文件有如下三部分组成,其中input、output部分是必须配置,filter部分是可选配置,而filter就是过滤器插件,可以在这部分实现各种日志过滤功能。

2、配置文件:

From: 元动力
1
2
3
4
5
6
7
8
9
input {
#输入插件
}
filter {
#过滤匹配插件
}
output {
#输出插件
}

3、启动操作:

From: 元动力
1
logstash.bat -e 'input{stdin{}} output{stdout{}}'

为了好维护,将配置写入文件,启动

From: 元动力
1
logstash.bat -f ../config/test1.conf

二、Logstash输入插件(input)

https://www.elastic.co/guide/en/logstash/current/input-plugins.htmlopen in new window

1、标准输入(Stdin)

From: 元动力
1
2
3
4
5
6
7
8
9
10
input{
stdin{

}
}
output {
stdout{
codec=&gt;rubydebug
}
}

2、读取文件(File)

logstash使用一个名为filewatch的ruby gem库来监听文件变化,并通过一个叫.sincedb的数据库文件来记录被监听的日志文件的读取进度(时间戳),这个sincedb数据文件的默认路径在 <path.data>/plugins/inputs/file下面,文件名类似于.sincedb_123456,而<path.data>表示logstash插件存储目录,默认是LOGSTASH_HOME/data。

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
input {
file {
path =&gt; ["/var/*/*"]
start_position =&gt; "beginning"
}
}
output {
stdout{
codec=&gt;rubydebug
}
}

默认情况下,logstash会从文件的结束位置开始读取数据,也就是说logstash进程会以类似tail -f命令的形式逐行获取数据。

3、读取TCP网络数据

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
input {
tcp {
port =&gt; "1234"
}
}

filter {
grok {
match =&gt; { "message" =&gt; "%{SYSLOGLINE}" }
}
}

output {
stdout{
codec=&gt;rubydebug
}
}

三、Logstash过滤器插件(Filter)

https://www.elastic.co/guide/en/logstash/current/filter-plugins.htmlopen in new window

1、Grok 正则捕获

grok是一个十分强大的logstash filter插件,他可以通过正则解析任意文本,将非结构化日志数据弄成结构化和方便查询的结构。他是目前logstash 中解析非结构化日志数据最好的方式。

Grok 的语法规则是:

From: 元动力
1
%{语法: 语义}

例如输入的内容为:

From: 元动力
1
172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039

%{IP:clientip}匹配模式将获得的结果为:clientip: 172.16.213.132 %{HTTPDATE:timestamp}匹配模式将获得的结果为:timestamp: 07/Feb/2018:16:24:19 +0800 而%{QS:referrer}匹配模式将获得的结果为:referrer: "GET / HTTP/1.1"

下面是一个组合匹配模式,它可以获取上面输入的所有内容:

From: 元动力
1
%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}

通过上面这个组合匹配模式,我们将输入的内容分成了五个部分,即五个字段,将输入内容分割为不同的数据字段,这对于日后解析和查询日志数据非常有用,这正是使用grok的目的。

例子:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
input{
stdin{}
}
filter{
grok{
match =&gt; ["message","%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}"]
}
}
output{
stdout{
codec =&gt; "rubydebug"
}
}

输入内容:

From: 元动力
1
172.16.213.132 [07/Feb/2019:16:24:19 +0800] "GET / HTTP/1.1" 403 5039

2、时间处理(Date)

date插件是对于排序事件和回填旧数据尤其重要,它可以用来转换日志记录中的时间字段,变成LogStash::Timestamp对象,然后转存到@timestamp字段里,这在之前已经做过简单的介绍。 下面是date插件的一个配置示例(这里仅仅列出filter部分):

From: 元动力
1
2
3
4
5
6
7
8
filter {
grok {
match =&gt; ["message", "%{HTTPDATE:timestamp}"]
}
date {
match =&gt; ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}
}

3、数据修改(Mutate)

(1)正则表达式替换匹配字段

gsub可以通过正则表达式替换字段中匹配到的值,只对字符串字段有效,下面是一个关于mutate插件中gsub的示例(仅列出filter部分):

From: 元动力
1
2
3
4
5
filter {
mutate {
gsub =&gt; ["filed_name_1", "/" , "_"]
}
}

这个示例表示将filed_name_1字段中所有"/"字符替换为"_"。

(2)分隔符分割字符串为数组

split可以通过指定的分隔符分割字段中的字符串为数组,下面是一个关于mutate插件中split的示例(仅列出filter部分):

From: 元动力
1
2
3
4
5
filter {
mutate {
split =&gt; ["filed_name_2", "|"]
}
}

这个示例表示将filed_name_2字段以"|"为区间分隔为数组。

(3)重命名字段

rename可以实现重命名某个字段的功能,下面是一个关于mutate插件中rename的示例(仅列出filter部分):

From: 元动力
1
2
3
4
5
filter {
mutate {
rename =&gt; { "old_field" =&gt; "new_field" }
}
}

这个示例表示将字段old_field重命名为new_field。

(4)删除字段

remove_field可以实现删除某个字段的功能,下面是一个关于mutate插件中remove_field的示例(仅列出filter部分):

From: 元动力
1
2
3
4
5
filter {
mutate {
remove_field =&gt; ["timestamp"]
}
}

这个示例表示将字段timestamp删除。

(5)GeoIP 地址查询归类
From: 元动力
1
2
3
4
5
filter {
geoip {
source =&gt; "ip_field"
}
}
综合例子:
From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
input {
stdin {}
}
filter {
grok {
match =&gt; { "message" =&gt; "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}" }
remove_field =&gt; [ "message" ]
}
date {
match =&gt; ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}
mutate {
convert =&gt; [ "response","float" ]
rename =&gt; { "response" =&gt; "response_new" }
gsub =&gt; ["referrer","\"",""]
split =&gt; ["clientip", "."]
}
}
output {
stdout {
codec =&gt; "rubydebug"
}

四、Logstash输出插件(output)

https://www.elastic.co/guide/en/logstash/current/output-plugins.htmlopen in new window

output是Logstash的最后阶段,一个事件可以经过多个输出,而一旦所有输出处理完成,整个事件就执行完成。 一些常用的输出包括:

  • file: 表示将日志数据写入磁盘上的文件。
  • elasticsearch:表示将日志数据发送给Elasticsearch。Elasticsearch可以高效方便和易于查询的保存数据。

1、输出到标准输出(stdout)

From: 元动力
1
2
3
4
5
output {
stdout {
codec =&gt; rubydebug
}
}

2、保存为文件(file)

From: 元动力
1
2
3
4
5
output {
file {
path =&gt; "/data/log/%{+yyyy-MM-dd}/%{host}_%{+HH}.log"
}
}

3、输出到elasticsearch

From: 元动力
1
2
3
4
5
6
output {
elasticsearch {
host =&gt; ["192.168.1.1:9200","172.16.213.77:9200"]
index =&gt; "logstash-%{+YYYY.MM.dd}"
}
}
  • host:是一个数组类型的值,后面跟的值是elasticsearch节点的地址与端口,默认端口是9200。可添加多个地址。
  • index:写入elasticsearch的索引的名称,这里可以使用变量。Logstash提供了%{+YYYY.MM.dd}这种写法。在语法解析的时候,看到以+ 号开头的,就会自动认为后面是时间格式,尝试用时间格式来解析后续字符串。这种以天为单位分割的写法,可以很容易的删除老的数据或者搜索指定时间范围内的数据。此外,注意索引名中不能有大写字母。
  • manage_template:用来设置是否开启logstash自动管理模板功能,如果设置为false将关闭自动管理模板功能。如果我们自定义了模板,那么应该设置为false。
  • template_name:这个配置项用来设置在Elasticsearch中模板的名称。

五、综合案例

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
input {
file {
path =&gt; ["D:/ES/logstash-7.3.0/nginx.log"]
start_position =&gt; "beginning"
}
}

filter {
grok {
match =&gt; { "message" =&gt; "%{IP:clientip}\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:response}\ %{NUMBER:bytes}" }
remove_field =&gt; [ "message" ]
}
date {
match =&gt; ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}
mutate {
rename =&gt; { "response" =&gt; "response_new" }
convert =&gt; [ "response","float" ]
gsub =&gt; ["referrer","\"",""]
remove_field =&gt; ["timestamp"]
split =&gt; ["clientip", "."]
}
}

output {
stdout {
codec =&gt; "rubydebug"
}

elasticsearch {
host =&gt; ["localhost:9200"]
index =&gt; "logstash-%{+YYYY.MM.dd}"
}

}

第二十一章 kibana学习

一、基本查询

1、是什么:elk中数据展现工具。

2、下载:https://www.elastic.co/cn/downloads/kibanaopen in new window

3、使用:建立索引模式,index partten

discover 中使用DSL搜索。

二、可视化

绘制图形

三、仪表盘

将各种可视化图形放入,形成大屏幕。

四、使用模板数据指导绘图

点击主页的添加模板数据,可以看到很多模板数据以及绘图。

五、其他功能

监控,日志,APM等功能非常丰富。

第二十二章 集群部署

见部署图

结点的三个角色

主结点:master节点主要用于集群的管理及索引 比如新增结点、分片分配、索引的新增和删除等。 数据结点:data 节点上保存了数据分片,它负责索引和搜索操作。 客户端结点:client 节点仅作为请求客户端存在,client的作用也作为负载均衡器,client 节点不存数据,只是将请求均衡转发到其它结点。

通过下边两项参数来配置结点的功能:

From: 元动力
1
2
3
node.master: #是否允许为主结点
node.data: #允许存储数据作为数据结点
node.ingest: #是否允许成为协调节点

四种组合方式:

From: 元动力
1
2
3
4
master=true,data=true:即是主结点又是数据结点
master=false,data=true:仅是数据结点
master=true,data=false:仅是主结点,不存储数据
master=false,data=false:即不是主结点也不是数据结点,此时可设置ingest为true表示它是一个客户端。

第二十三章 项目实战

一、项目一:ELK用于日志分析

需求:集中收集分布式服务的日志

1逻辑模块程序随时输出日志

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
package com.ydlclass.es;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.util.Random;

/**
* creste by ydlclass.ydl
*/
@SpringBootTest
@RunWith(SpringRunner.class)
public class TestLog {
private static final Logger LOGGER= LoggerFactory.getLogger(TestLog.class);

@Test
public void testLog(){
Random random =new Random();

while (true){
int userid=random.nextInt(10);
LOGGER.info("userId:{},send:{}",userid,"hello world.I am "+userid);
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}

2logstash收集日志到es

grok 内置类型

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?&lt;![0-9.+-])(?&gt;[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?&lt;![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?&lt;![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?&gt;(?&lt;!\\)(?&gt;"(?&gt;\\.|[^\\"]+)+"|""|(?&gt;'(?&gt;\\.|[^\\']+)+')|''|(?&gt;`(?&gt;\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?&lt;![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT %{IPORHOST}:%{POSINT}

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?&gt;/(?&gt;[\w_%!$@:.,-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?&gt;[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&amp;]*))?(?:&amp;(?:[A-Za-z0-9]+(?:=(?:[^&amp;]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&amp;/=:;_?\-\[\]]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR (?&gt;\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!&lt;[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY &lt;%{NONNEGINT:facility}.%{NONNEGINT:priority}&gt;
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)

写logstash配置文件。

3kibana展现数据

二、项目二:学成在线站内搜索模块

1、mysql导入course_pub表

2、创建索引xc_course

3、创建映射

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
PUT /xc_course
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"description" : {
"analyzer" : "ik_max_word",
"search_analyzer": "ik_smart",
"type" : "text"
},
"grade" : {
"type" : "keyword"
},
"id" : {
"type" : "keyword"
},
"mt" : {
"type" : "keyword"
},
"name" : {
"analyzer" : "ik_max_word",
"search_analyzer": "ik_smart",
"type" : "text"
},
"users" : {
"index" : false,
"type" : "text"
},
"charge" : {
"type" : "keyword"
},
"valid" : {
"type" : "keyword"
},
"pic" : {
"index" : false,
"type" : "keyword"
},
"qq" : {
"index" : false,
"type" : "keyword"
},
"price" : {
"type" : "float"
},
"price_old" : {
"type" : "float"
},
"st" : {
"type" : "keyword"
},
"status" : {
"type" : "keyword"
},
"studymodel" : {
"type" : "keyword"
},
"teachmode" : {
"type" : "keyword"
},
"teachplan" : {
"analyzer" : "ik_max_word",
"search_analyzer": "ik_smart",
"type" : "text"
},
"expires" : {
"type" : "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"pub_time" : {
"type" : "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"start_time" : {
"type" : "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"end_time" : {
"type" : "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}

4、logstash创建模板文件

Logstash的工作是从MySQL中读取数据,向ES中创建索引,这里需要提前创建mapping的模板文件以便logstash使用。

在logstach的config目录创建xc_course_template.json,内容如下:

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
{
"mappings" : {
"doc" : {
"properties" : {
"charge" : {
"type" : "keyword"
},
"description" : {
"analyzer" : "ik_max_word",
"search_analyzer" : "ik_smart",
"type" : "text"
},
"end_time" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"type" : "date"
},
"expires" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"type" : "date"
},
"grade" : {
"type" : "keyword"
},
"id" : {
"type" : "keyword"
},
"mt" : {
"type" : "keyword"
},
"name" : {
"analyzer" : "ik_max_word",
"search_analyzer" : "ik_smart",
"type" : "text"
},
"pic" : {
"index" : false,
"type" : "keyword"
},
"price" : {
"type" : "float"
},
"price_old" : {
"type" : "float"
},
"pub_time" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"type" : "date"
},
"qq" : {
"index" : false,
"type" : "keyword"
},
"st" : {
"type" : "keyword"
},
"start_time" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"type" : "date"
},
"status" : {
"type" : "keyword"
},
"studymodel" : {
"type" : "keyword"
},
"teachmode" : {
"type" : "keyword"
},
"teachplan" : {
"analyzer" : "ik_max_word",
"search_analyzer" : "ik_smart",
"type" : "text"
},
"users" : {
"index" : false,
"type" : "text"
},
"valid" : {
"type" : "keyword"
}
}
}
},
"template" : "xc_course"
}

5、logstash配置mysql.conf

1、ES采用UTC时区问题

ES采用UTC 时区,比北京时间早8小时,所以ES读取数据时让最后更新时间加8小时

where timestamp > date_add(:sql_last_value,INTERVAL 8 HOUR)

2、logstash每个执行完成会在/config/logstash_metadata记录执行时间下次以此时间为基准进行增量同步数据到索引库。

6、启动

From: 元动力
1
.\logstash.bat -f ..\config\mysql.conf

7、后端代码

(1)Controller

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
@RestController
@RequestMapping("/search/course")
public class EsCourseController {
@Autowired
EsCourseService esCourseService;

@GetMapping(value="/list/{page}/{size}")
public QueryResponseResult&lt;CoursePub&gt; list(@PathVariable("page") int page, @PathVariable("size") int size, CourseSearchParam courseSearchParam) {
return esCourseService.list(page,size,courseSearchParam);
}

}

(2)

From: 元动力
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
@Service
public class EsCourseService {
@Value("${heima.course.source_field}")
private String source_field;

@Autowired
RestHighLevelClient restHighLevelClient;

//课程搜索
public QueryResponseResult&lt;CoursePub&gt; list(int page, int size, CourseSearchParam courseSearchParam) {
if (courseSearchParam == null) {
courseSearchParam = new CourseSearchParam();
}
//1创建搜索请求对象
SearchRequest searchRequest = new SearchRequest("xc_course");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//过虑源字段
String[] source_field_array = source_field.split(",");
searchSourceBuilder.fetchSource(source_field_array, new String[]{});
//创建布尔查询对象
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
//搜索条件
//根据关键字搜索
if (StringUtils.isNotEmpty(courseSearchParam.getKeyword())) {
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(courseSearchParam.getKeyword(), "name", "description", "teachplan")
.minimumShouldMatch("70%")
.field("name", 10);
boolQueryBuilder.must(multiMatchQueryBuilder);
}
if (StringUtils.isNotEmpty(courseSearchParam.getMt())) {
//根据一级分类
boolQueryBuilder.filter(QueryBuilders.termQuery("mt", courseSearchParam.getMt()));
}
if (StringUtils.isNotEmpty(courseSearchParam.getSt())) {
//根据二级分类
boolQueryBuilder.filter(QueryBuilders.termQuery("st", courseSearchParam.getSt()));
}
if (StringUtils.isNotEmpty(courseSearchParam.getGrade())) {
//根据难度等级
boolQueryBuilder.filter(QueryBuilders.termQuery("grade", courseSearchParam.getGrade()));
}

//设置boolQueryBuilder到searchSourceBuilder
searchSourceBuilder.query(boolQueryBuilder);
//设置分页参数
if (page &lt;= 0) {
page = 1;
}
if (size &lt;= 0) {
size = 12;
}
//起始记录下标
int from = (page - 1) * size;
searchSourceBuilder.from(from);
searchSourceBuilder.size(size);

//设置高亮
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.preTags("&lt;font class='eslight'&gt;");
highlightBuilder.postTags("&lt;/font&gt;");
//设置高亮字段
// &lt;font class='eslight'&gt;node&lt;/font&gt;学习
highlightBuilder.fields().add(new HighlightBuilder.Field("name"));
searchSourceBuilder.highlighter(highlightBuilder);

searchRequest.source(searchSourceBuilder);

QueryResult&lt;CoursePub&gt; queryResult = new QueryResult();
List&lt;CoursePub&gt; list = new ArrayList&lt;CoursePub&gt;();
try {
//2执行搜索
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
//3获取响应结果
SearchHits hits = searchResponse.getHits();
long totalHits=hits.getTotalHits().value;
//匹配的总记录数
// long totalHits = hits.totalHits;
queryResult.setTotal(totalHits);
SearchHit[] searchHits = hits.getHits();
for (SearchHit hit : searchHits) {
CoursePub coursePub = new CoursePub();
//源文档
Map&lt;String, Object&gt; sourceAsMap = hit.getSourceAsMap();
//取出id
String id = (String) sourceAsMap.get("id");
coursePub.setId(id);
//取出name
String name = (String) sourceAsMap.get("name");
//取出高亮字段name
Map&lt;String, HighlightField&gt; highlightFields = hit.getHighlightFields();
if (highlightFields != null) {
HighlightField highlightFieldName = highlightFields.get("name");
if (highlightFieldName != null) {
Text[] fragments = highlightFieldName.fragments();
StringBuffer stringBuffer = new StringBuffer();
for (Text text : fragments) {
stringBuffer.append(text);
}
name = stringBuffer.toString();
}
}
coursePub.setName(name);
//图片
String pic = (String) sourceAsMap.get("pic");
coursePub.setPic(pic);
//价格
Double price = null;
try {
if (sourceAsMap.get("price") != null) {
price = (Double) sourceAsMap.get("price");
}

} catch (Exception e) {
e.printStackTrace();
}
coursePub.setPrice(price);
//旧价格
Double price_old = null;
try {
if (sourceAsMap.get("price_old") != null) {
price_old = (Double) sourceAsMap.get("price_old");
}
} catch (Exception e) {
e.printStackTrace();
}
coursePub.setPrice_old(price_old);
//将coursePub对象放入list
list.add(coursePub);
}


} catch (IOException e) {
e.printStackTrace();
}

queryResult.setList(list);
QueryResponseResult&lt;CoursePub&gt; queryResponseResult = new QueryResponseResult&lt;CoursePub&gt;(CommonCode.SUCCESS, queryResult);

return queryResponseResult;
}


}

本站由 钟意 使用 Stellar 1.28.1 主题创建。
又拍云 提供CDN加速/云存储服务
vercel 提供托管服务
湘ICP备2023019799号-1
总访问 次 | 本页访问