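With no change in the business workload, the number of Doris connections rose from about 50 to nearly 200, and queries that are normally fast took a long time to return.
The cluster has 3 machines, each running one FE and one BE.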
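1. The cluster status was normal, the FE nodes logged no similar timeout errors at the same time, and the servers showed no CPU, memory, or disk pressure.
2. At the time of the anomaly, Prometheus network monitoring of the servers showed the RetransSegs (Segments retransmitted) metric at nearly 200, compared with 4-5 normally, and this lasted about 3 minutes. However, the ES node on the same machine showed no anomaly at all.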
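To cross-check the Prometheus graph directly on a host, the raw counter can be read from `/proc/net/snmp` on Linux. The following is a minimal sketch; the function name `parse_retrans_segs` is hypothetical, and it assumes the usual layout of that file (a `Tcp:` header line followed by a `Tcp:` value line). The counter is cumulative, so a rate like the one on the dashboard would need two readings taken some seconds apart.

```python
def parse_retrans_segs(snmp_text: str) -> int:
    """Return the cumulative TCP RetransSegs counter from /proc/net/snmp content."""
    # The file holds pairs of lines per protocol: a header line and a value line.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp header line and a Tcp value line")
    header, values = tcp_lines[0], tcp_lines[1]
    # Pair each column name with its value and pick the retransmission counter.
    return int(dict(zip(header, values))["RetransSegs"])
```

On a BE host this could be called as `parse_retrans_segs(open("/proc/net/snmp").read())` before and after a short sleep, and the difference divided by the interval.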
BE logs from around the time of the anomaly:
W1127 14:17:36.061868 30916 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:37.821758 31032 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:37.821802 31032 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:38.387328 30866 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:38.387403 30866 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:38.566062 30986 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:38.566102 30986 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:44.912712 30950 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:44.912743 30950 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:48.016036 30740 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.55:8060
W1127 14:17:48.016072 30740 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
W1127 14:17:50.023115 26207 status.h:396] meet error status: [INTERNAL_ERROR]RuntimeFilter::join_rpc meet rpc error, msg=[E1008]Reached timeout=3000ms @10.100.0.56:8060.
0# doris::IRuntimeFilter::join_rpc() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# doris::VRuntimeFilterSlots::finish_publish() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
2# doris::vectorized::HashJoinNode::~HashJoinNode() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
3# doris::ObjectPool::add<doris::vectorized::HashJoinNode>(doris::vectorized::HashJoinNode*)::{lambda(void*)#1}::__invoke(void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/object_pool.h:40
4# doris::RuntimeState::~RuntimeState() at /home/zcp/repo_center/doris_release/doris/be/src/common/object_pool.h:0
5# doris::pipeline::PipelineFragmentContext::~PipelineFragmentContext() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/runtime_state.h:58
6# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98
7# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/task_scheduler.cpp:0
8# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
9# doris::Thread::supervise_thread(void*) at /var/local/ldb_toolchain/bin/../usr/include/pthread.h:562
10# start_thread
11# clone
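To see whether the timeouts cluster on particular peers or minutes, the warning lines above can be tallied with a small script. This is only a sketch: the names `TIMEOUT_RE` and `count_timeouts` are hypothetical, and the regular expression assumes the exact line format shown in the log excerpt above.

```python
import re
from collections import Counter

# Matches lines such as:
# W1127 14:17:36.061868 30916 runtime_filter_mgr.cpp:472] runtimefilter rpc err:[E1008]Reached timeout=500ms @10.100.0.56:8060
TIMEOUT_RE = re.compile(
    r"^W\d{4} (\d{2}:\d{2}):\d{2}\.\d+ +\d+ runtime_filter_mgr\.cpp:\d+\] "
    r"runtimefilter rpc err:\[E1008\]Reached timeout=(\d+)ms @(\S+)"
)

def count_timeouts(lines):
    """Count runtime filter RPC timeouts per (HH:MM, peer address)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            minute, _timeout_ms, peer = m.groups()
            counts[(minute, peer)] += 1
    return counts
```

Feeding it the BE warning log, for example `count_timeouts(open(path))` with `path` pointing at the log file, gives a per-minute, per-peer count that can be lined up against the RetransSegs spike.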