【doris查询paimon catalog表】连续执行同一个查询sql出现性能波动 和 并发执行同一个查询明显变慢

Viewed 13

针对以下两种情况,可以做哪些优化?
问题1:
同个sql(查询paimon表)并发查询50 ,plan time增加
Execution Summary:
- Parse SQL Time: 3ms
- Nereids Lock Table Time: 4ms
- Nereids Analysis Time: 4ms
- Nereids Rewrite Time: 110ms
- Nereids Optimize Time: 6ms
- Nereids Translate Time: 2ms
- Workload Group: normal
- Analysis Time: 5ms
- Plan Time: 11sec788ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: 386ms
- Finalize Scan Node Time: 1sec331ms
- Get Splits Time: 1sec331ms
- Get Partitions Time: N/A
- Get Partition Files Time: N/A
- Create Scan Range Time: -3693ms
- Get Partition Version Time: N/A
- Get Partition Version Count (hasData): N/A
- Get Partition Version Count: N/A
- Get Table Version Time: N/A
- Get Table Version Count: N/A
- Schedule Time: 92ms
- Fragment Assign Time: 1ms
- Fragment Serialize Time: 14ms
- Fragment RPC Phase1 Time: 72ms
- Fragment RPC Phase2 Time: 5ms
- Fragment Compressed Size: 1.45 MB
- Fragment RPC Count: 6

问题2:
单个sql(查paimon表)查询, 进行连续执行数次,会有一次查询耗时明显偏高,查看profile文件主要是Nereids Analysis Time 和 Nereids Lock Table Time 指标明显升高。
将paimon catalog配置的刷新周期"metadata_refresh_interval_sec" 调大,查询时间有所降低,存在问题是刷新周期时间调大后,查不到最新数据,延迟太高(数据更新,paimon snapshot文件也会有变化,而元数据包括了文件信息)。
Execution Summary:
- Parse SQL Time: 15ms
- Nereids Lock Table Time: 13sec944ms
- Nereids Analysis Time: 6sec703ms
- Nereids Rewrite Time: 54ms
- Nereids Optimize Time: 183ms
- Nereids Translate Time: 4ms
- Workload Group: normal
- Analysis Time: 20sec631ms
- Plan Time: 1sec845ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: 111ms
- Finalize Scan Node Time: 199ms
- Get Splits Time: 199ms
- Get Partitions Time: N/A
- Get Partition Files Time: N/A
- Create Scan Range Time: -199ms
- Get Partition Version Time: N/A
- Get Partition Version Count (hasData): N/A
- Get Partition Version Count: N/A
- Get Table Version Time: N/A
- Get Table Version Count: N/A
- Schedule Time: 110ms

1 Answers
  1. "并发查询 50,plan time增加" ,并发上去后 FE 的负载有观察吗?比如是不是 CPU 比较高,导致优化器解析比较慢什么的,可能得结合负载情况观察下。
  2. "Nereids Analysis Time 和 Nereids Lock Table Time 指标明显升高" 这个如果能复现的话,可能也需要结合监控查看,并且可以收集个火焰图:https://doris.apache.org/zh-CN/community/developer-guide/fe-profiler/

您可以加我主页微信,一起分析下