Doris 4.0.5: poor query performance for long strings on an inverted index with ngram tokenization

Creating the tokenizer and analyzer for the table index:
CREATE inverted index tokenizer if not exists ngram3_tokenizer
properties (
"type"="ngram",
"min_gram" = "3",
"max_gram" = "3"
);

CREATE inverted index analyzer if not exists ngram3_analyzer
properties (
"tokenizer" = "ngram3_tokenizer",
"token_filter" = "lowercase"
);

**The index specified when creating the table is as follows:**
INDEX inv_URL (URL) USING INVERTED PROPERTIES( "analyzer"="ngram3_analyzer", "support_phrase"="true" )

Query SQL:

select url
from t1
where search('url:"https://doris.apache.org/zh-CN/docs/3.x/query-acceleration/materialized-view/async-materialized-view/functions-and-demands/"')
and time > date_sub(now(), interval 7 day) and time < now()

Over a 7-day time range, this query with a long search string is very slow.

If I switch to a shorter string, it is much faster, as in the following SQL:
select url
from t1
where search('url:"doris.apache.org"') and time > date_sub(now(), interval 7 day) and time < now()

select url
from t1
where search('url:"apache"') and time > date_sub(now(), interval 7 day) and time < now()

When the search string is very short, the query is very fast.
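To make the difference between the three queries concrete, here is a rough Python sketch of how many tokens each search string would produce under the ngram3_tokenizer settings (min_gram = max_gram = 3) plus the lowercase filter. It assumes a plain sliding window over characters; `char_ngrams` is a hypothetical helper written for illustration, not a Doris function, and Doris's real tokenizer and phrase matching may behave differently. If a phrase search has to look up every token and verify their positions (an assumption, not something confirmed here), the work would grow with the token count.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return lowercase character n-grams of `text` using a sliding window.

    Hypothetical helper for illustration only; it approximates an ngram
    tokenizer with min_gram == max_gram == n followed by a lowercase filter.
    """
    text = text.lower()
    if len(text) < n:
        # Strings shorter than n produce no tokens in this sketch.
        return []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The three search strings from the queries in this post.
QUERIES = [
    "https://doris.apache.org/zh-CN/docs/3.x/query-acceleration/"
    "materialized-view/async-materialized-view/functions-and-demands/",
    "doris.apache.org",
    "apache",
]

for q in QUERIES:
    tokens = char_ngrams(q)
    # A string of length L yields L - 2 trigrams, so the long URL produces
    # far more tokens than the short strings.
    print(f"length={len(q):4d}  tokens={len(tokens):4d}  first={tokens[:3]}")
```

In this sketch "apache" yields 4 tokens and "doris.apache.org" yields 14, while the full URL yields one token for almost every character, which matches the observation that only the long string is slow.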

Users also need to search for strings as short as three characters, for example pwd.

How can this be optimized?
