进行tpch性能测试时,使用query2进行hive表查询,发现通过webui手动连续执行数次会有1次查询耗时明显偏高,分析profile发现是“Nereids Analysis Time”指标导致。下面列出正常和异常的profile信息片段
异常:
Execution Summary:
- Parse SQL Time: 1ms
- Nereids Lock Table Time: 11ms
**- Nereids Analysis Time: 4s602ms**
- Nereids Rewrite Time: 6ms
- Nereids Optimize Time: 3ms
- Nereids Translate Time: 0ms
- Workload Group: normal
- Analysis Time: 4s612ms
- Plan Time: 42ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: 0ms
- Finalize Scan Node Time: 11ms
- Get Splits Time: 7ms
- Get Partitions Time: 0ms
- Get Partition Files Time: 7ms
- Create Scan Range Time: 4ms
- Schedule Time: 28ms
- Fragment Assign Time: 1ms
- Fragment Serialize Time: 4ms
- Fragment RPC Phase1 Time: 22ms
- Fragment RPC Phase2 Time: 1ms
- Fragment Compressed Size: 50.58 KB
- Fragment RPC Count: 2
- Schedule Time Of BE: {"phase1":{"158.2.100.39: 8060":{"RPC Work Time":"21ms","RPC Latency From FE To BE":"1ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"0ms"}},"phase2":{"158.2.100.39: 8060":{"RPC Work Time":"1ms","RPC Latency From FE To BE":"0ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"0ms"}}}
- Wait and Fetch Result Time: 532ms
- Fetch Result Time: 530ms
- Write Result Time: 0ms
- Doris Version: doris-2.1.8-1-834d802457
- Is Nereids: Yes
- Is Pipeline: Yes
- Is Cached: No
- Total Instances Num: 52
- Instances Num Per BE: 158.2.100.39:8060:52
- Parallel Fragment Exec Instance Num: 16
- Trace ID: 716c05af-db0d-423d-bf9a-0a8e222eb51c
- Transaction Commit Time: N/A
- Executed By Frontend: N/A
- Nereids GarbageCollect Time: 0ms
- Nereids BeFoldConst Time: 0ms
正常:
Execution Summary:
- Parse SQL Time: 1ms
- Nereids Lock Table Time: 2ms
- Nereids Analysis Time: 2ms
- Nereids Rewrite Time: 5ms
- Nereids Optimize Time: 3ms
- Nereids Translate Time: 1ms
- Workload Group: normal
- Analysis Time: 3ms
- Plan Time: 16ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: 0ms
- Finalize Scan Node Time: 4ms
- Get Splits Time: 0ms
- Get Partitions Time: 0ms
- Get Partition Files Time: 0ms
- Create Scan Range Time: 4ms
- Schedule Time: 27ms
- Fragment Assign Time: 0ms
- Fragment Serialize Time: 3ms
- Fragment RPC Phase1 Time: 21ms
- Fragment RPC Phase2 Time: 3ms
- Fragment Compressed Size: 50.43 KB
- Fragment RPC Count: 2
- Schedule Time Of BE: {"phase1":{"158.2.100.39: 8060":{"RPC Work Time":"20ms","RPC Latency From FE To BE":"1ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"0ms"}},"phase2":{"158.2.100.39: 8060":{"RPC Work Time":"0ms","RPC Latency From FE To BE":"2ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"1ms"}}}
- Wait and Fetch Result Time: 509ms
- Fetch Result Time: 508ms
- Write Result Time: 0ms
- Doris Version: doris-2.1.8-1-834d802457
- Is Nereids: Yes
- Is Pipeline: Yes
- Is Cached: No
- Total Instances Num: 52
- Instances Num Per BE: 158.2.100.39:8060:52
- Parallel Fragment Exec Instance Num: 16
- Trace ID: 376b8d76-8e49-496f-ac42-c185d15b871b
- Transaction Commit Time: N/A
- Executed By Frontend: N/A
- Nereids GarbageCollect Time: 0ms
- Nereids BeFoldConst Time: 0ms
补充:
- 进一步测试了3.0.4版本,依然有类似情况
- 暂停一小段时间,比如1分钟,再次执行也会出现性能明显变慢的情况
- 通过analyze database计算统计信息后,依然存在该情况
- 环境为1个FE+1个BE
结果:
经selectdb大佬指导,是由于hive catalog配置的刷新周期过于频繁导致"metadata_refresh_interval_sec" = "60",将其调大即可解决