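I have been learning Elasticsearch recently, and the span_not query left me completely lost; the official documentation offers no detailed example. It bothered me like a fishbone stuck in my throat.
After some searching and hands-on experimenting, I worked out a few things.
First, define the mapping: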
PUT /span_not_query_test
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}
Then index two documents:
PUT /span_not_query_test/_doc/1
{
  "content": "the quick red fox jumps over the sleepy cat"
}
PUT /span_not_query_test/_doc/2
{
  "content": "the quick brown fox jumps over the lazy dog"
}
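Span queries work on term positions, so it helps to see where each term sits before reading the examples. The sketch below is only a stand-in for the analyzer: it assumes that lowercasing and splitting on whitespace gives the same tokens as the default analyzer for these two sentences. The names DOCS and positions are hypothetical helpers, not part of any Elasticsearch client. The real positions can be confirmed with the _analyze API.

```python
# Hypothetical stand-in for the analyzer: lowercase and split on whitespace.
# Assumption: for these two simple sentences this yields the same tokens
# and positions as the default analyzer on a "text" field.
DOCS = {
    1: "the quick red fox jumps over the sleepy cat",
    2: "the quick brown fox jumps over the lazy dog",
}


def positions(text):
    """Return (position, term) pairs, with positions starting at 0."""
    return list(enumerate(text.lower().split()))


for doc_id, text in DOCS.items():
    print(doc_id, positions(text))
```

In both documents quick sits at position 1 and over at position 5, while the appears at positions 0 and 6.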
Example 1
POST /span_not_query_test/_search
{
  "query": {
    "span_not": {
      "include": {
        "span_term": {
          "content": {
            "value": "quick"
          }
        }
      },
      "exclude": {
        "span_term": {
          "content": {
            "value": "the"
          }
        }
      }
    }
  }
}
Conclusion: when exclude.span_term.content.value == quick, no documents are returned; for any other value, both documents are returned.
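One way to read this result is as an overlap test on positions. The following is a minimal sketch under that assumption, not Lucene's actual implementation: each span_term match is modelled as a half-open span covering one position, and an include span is kept only if no exclude span overlaps it. The functions term_spans, overlaps, span_not, and search are hypothetical names used for illustration.

```python
DOCS = {
    1: "the quick red fox jumps over the sleepy cat",
    2: "the quick brown fox jumps over the lazy dog",
}


def term_spans(tokens, term):
    """Model a span_term match as a half-open span [i, i + 1)."""
    return [(i, i + 1) for i, t in enumerate(tokens) if t == term]


def overlaps(a, b):
    """True when half-open spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def span_not(include, exclude):
    """Keep the include spans that no exclude span overlaps."""
    return [s for s in include if not any(overlaps(s, x) for x in exclude)]


def search(include_term, exclude_term):
    """Return ids of documents with at least one surviving include span."""
    hits = []
    for doc_id, text in DOCS.items():
        tokens = text.split()
        kept = span_not(term_spans(tokens, include_term),
                        term_spans(tokens, exclude_term))
        if kept:
            hits.append(doc_id)
    return hits


print(search("quick", "the"))    # "the" never sits on the position of "quick"
print(search("quick", "quick"))  # exclude covers every include span
```

Under this model the first call keeps both documents and the second keeps none, which agrees with the observation above.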
Example 2
POST /span_not_query_test/_search
{
  "query": {
    "span_not": {
      "include": {
        "span_near": {
          "clauses": [
            {
              "span_term": {
                "content": {
                  "value": "quick"
                }
              }
            },
            {
              "span_term": {
                "content": {
                  "value": "over"
                }
              }
            }
          ],
          "slop": 3,
          "in_order": true
        }
      },
      "exclude": {
        "span_term": {
          "content": {
            "value": "lazy"
          }
        }
      }
    }
  }
}
The experimental results are as follows:
- exclude.span_term.content.value in [quick, fox, jumps, over]: no documents are returned.
- exclude.span_term.content.value == red: only "the quick brown fox jumps over the lazy dog" is returned.
- exclude.span_term.content.value == brown: only "the quick red fox jumps over the sleepy cat" is returned.
- exclude.span_term.content.value in [the, lazy, dog, sleepy, cat], that is, any term that appears before quick or after over in content: both documents are returned.
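These results can be reproduced with a small position-based model. This is a simplified sketch, not the Lucene algorithm: it assumes an ordered span_near over two terms matches the span from the first term through the second when at most slop positions lie between them, and that an include span is dropped when an excluded term falls inside it. The helpers term_starts, near_in_order, and returned_docs are hypothetical.

```python
DOCS = {
    1: "the quick red fox jumps over the sleepy cat",
    2: "the quick brown fox jumps over the lazy dog",
}


def term_starts(tokens, term):
    """Positions at which `term` occurs."""
    return [i for i, t in enumerate(tokens) if t == term]


def near_in_order(tokens, first, second, slop):
    """Half-open spans from `first` through `second`, in order,
    with at most `slop` positions between the two terms."""
    spans = []
    for a in term_starts(tokens, first):
        for b in term_starts(tokens, second):
            if b > a and (b - a - 1) <= slop:
                spans.append((a, b + 1))
    return spans


def returned_docs(exclude_term):
    """Ids of documents that keep at least one include span."""
    hits = []
    for doc_id, text in DOCS.items():
        tokens = text.split()
        include = near_in_order(tokens, "quick", "over", slop=3)
        blocked = term_starts(tokens, exclude_term)
        # An include span survives only if no excluded position lies inside it.
        if any(not any(s <= p < e for p in blocked) for s, e in include):
            hits.append(doc_id)
    return hits


for term in ["quick", "fox", "jumps", "over", "red", "brown",
             "the", "lazy", "dog", "sleepy", "cat"]:
    print(term, returned_docs(term))
```

In both documents the include span covers positions 1 through 5 (quick to over), so only a term sitting inside that range can remove a document from the results.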
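Conclusion:
exclude does not work like must_not. It does not first find the docs that match its own condition and then remove them from the include results. Instead, it only checks whether its match falls within the range matched by the include condition: an include match is dropped only when an exclude match overlaps it.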
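The difference can be illustrated by putting the two filtering styles side by side. This is a rough sketch under simplifying assumptions: both sides are single terms, so "overlap" reduces to "same position", and the functions must_not_style and span_not_style are hypothetical models rather than Elasticsearch APIs.

```python
DOCS = {
    1: "the quick red fox jumps over the sleepy cat",
    2: "the quick brown fox jumps over the lazy dog",
}


def must_not_style(include_term, exclude_term):
    """Document-level filtering: drop every document that contains
    exclude_term anywhere."""
    return [doc_id for doc_id, text in DOCS.items()
            if include_term in text.split()
            and exclude_term not in text.split()]


def span_not_style(include_term, exclude_term):
    """Position-level filtering: drop an include match only where the
    excluded term sits on the same position."""
    hits = []
    for doc_id, text in DOCS.items():
        tokens = text.split()
        inc = {i for i, t in enumerate(tokens) if t == include_term}
        exc = {i for i, t in enumerate(tokens) if t == exclude_term}
        if inc - exc:
            hits.append(doc_id)
    return hits


print(must_not_style("quick", "the"))  # every document contains "the"
print(span_not_style("quick", "the"))  # "the" is never on the position of "quick"
```

With document-level filtering, excluding the removes both documents because each contains it somewhere; with position-level filtering both documents remain, as in Example 1.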
Of course, the best approach is still to read the SpanNotQuery source code in Elasticsearch and Lucene.
References
- https://stackoverflow.com/questions/24260103/spannotquery-giving-unexpected-results-exclude-is-ignored
- https://elasticsearch.cn/article/13677