Doris 2.0.8: many "runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms" warnings and a surge in slow queries


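With no changes on the business side, the number of Doris connections rose from around 50 to nearly 200, and queries that are normally fast took a long time to return.
The cluster has 3 machines, each running one FE and one BE.
1. The cluster status is normal. The FE nodes logged no similar timeout errors at the same time, and the servers' CPU, memory, and disks were not under pressure.
2. At the time of the anomaly, Prometheus monitoring of the servers' network showed the "RetransSegs - Segments retransmitted" metric close to 200, compared with the usual 4-5, for about 3 minutes. The ES node on the same machine showed no anomalies at all.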
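
To cross-check the retransmission spike directly on a BE host, independent of the dashboard, something like the following could be used. It is a minimal sketch that assumes the Prometheus metric is derived from the Linux kernel's Tcp counters in /proc/net/snmp; the function names parse_tcp_counters and retrans_rate are made up for this illustration.

```python
def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Return the Tcp counters found in the text of /proc/net/snmp.

    The file holds pairs of lines per protocol: a header line with the
    counter names and a value line, both prefixed with "Tcp:".
    """
    tcp_lines = [
        line.split()[1:]
        for line in snmp_text.splitlines()
        if line.startswith("Tcp:")
    ]
    if len(tcp_lines) != 2:
        raise ValueError("expected one Tcp header line and one Tcp value line")
    names, values = tcp_lines
    return dict(zip(names, (int(v) for v in values)))


def retrans_rate(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of sent segments that were retransmissions between two samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

Reading /proc/net/snmp twice, a few seconds apart, and passing both parsed samples to retrans_rate gives the share of retransmitted segments in that interval. A sustained jump on every BE at once would point at the network path between the hosts rather than at Doris itself.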

BE logs from around the time of the anomaly:

W1127 14:17:36.061868 30916 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:37.821758 31032 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:37.821802 31032 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:38.387328 30866 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:38.387403 30866 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:38.566062 30986 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:38.566102 30986 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:44.912712 30950 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:44.912743 30950 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:48.016036 30740 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:48.016072 30740 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:50.023115 26207 status.h:396] meet error status: [INTERNAL_ERROR]RuntimeFilter::join_rpc meet rpc error, msg=[E1008]Reached timeout=3000ms @10.100.0.56:8060.

  0#  doris::IRuntimeFilter::join_rpc() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
  1#  doris::VRuntimeFilterSlots::finish_publish() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
  2#  doris::vectorized::HashJoinNode::~HashJoinNode() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
  3#  doris::ObjectPool::add<doris::vectorized::HashJoinNode>(doris::vectorized::HashJoinNode*)::{lambda(void*)#1}::__invoke(void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/object_pool.h:40
  4#  doris::RuntimeState::~RuntimeState() at /home/zcp/repo_center/doris_release/doris/be/src/common/object_pool.h:0
  5#  doris::pipeline::PipelineFragmentContext::~PipelineFragmentContext() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/runtime_state.h:58
  6#  doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98
  7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/task_scheduler.cpp:0
  8#  doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
  9#  doris::Thread::supervise_thread(void*) at /var/local/ldb_toolchain/bin/../usr/include/pthread.h:562
  10# start_thread
  11# clone
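
To see whether the timeouts are concentrated on one peer or spread across all of them, the warnings can be tallied per target address. This is only a sketch: the regular expression is written against the warning format shown in the log excerpt above, and the names TIMEOUT_RE and count_timeouts_by_peer are hypothetical, not part of Doris.

```python
import re
from collections import Counter

# Matches the runtime filter RPC timeout warnings as they appear in the
# BE log excerpt, capturing the timeout in ms and the target "ip:port".
TIMEOUT_RE = re.compile(
    r"runtimefilter rpc err:\[E1008\]Reached timeout=(\d+)ms @([\d.]+:\d+)"
)


def count_timeouts_by_peer(log_lines):
    """Count runtime filter RPC timeout warnings per target address."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Passing the lines of the BE warning log (for example an open file object) returns a Counter keyed by peer address. Roughly equal counts for every remote BE, as in the excerpt, suggest a problem on the shared network path rather than on a single node.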
2 Answers

Solved. Our Doris nodes sit on different switches, and the bandwidth of the link between the switches was saturated.