Doris 3.0.8: INSERT INTO job fails with "send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED"


The error output is as follows:
ERROR TASK - [taskid=581172014614581248],send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]], host: 192.168.249.83
java.sql.SQLException: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]], host: 192.168.249.83
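When many of these errors pile up, it can help to pull the elapsed time and the remote address out of each message and group by host. The following is a minimal sketch that assumes the message layout seen in the line above; `DEADLINE_RE` and `parse_deadline_error` are illustrative names, not part of Doris or grpc-java, and other grpc-java versions may format the message differently.

```python
import re

# Pattern derived from the single error line quoted above (an assumption:
# other grpc-java versions may lay the message out differently).
DEADLINE_RE = re.compile(
    r"deadline exceeded after (?P<elapsed>[\d.]+)s\..*?"
    r"remote_addr=(?P<host>[\d.]+)/[\d.]+:(?P<port>\d+)"
)


def parse_deadline_error(message: str):
    """Return (elapsed_seconds, host, port), or None if the line does not match."""
    match = DEADLINE_RE.search(message)
    if match is None:
        return None
    return float(match.group("elapsed")), match.group("host"), int(match.group("port"))


# The error text reported in this thread.
line = (
    "send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: "
    "CallOptions deadline exceeded after 304.999996916s. Name resolution delay "
    "0.000000000 seconds. [closed=[], "
    "open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]], host: 192.168.249.83"
)
print(parse_deadline_error(line))
```

For the line in this thread, the parsed values show that the call to 192.168.249.83:8060 was abandoned after roughly 305 seconds, so the target BE on .83 is the first place to look.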
Details from be.INFO:
W20260423 09:59:02.613395 96636 status.h:444] meet error status: [INTERNAL_ERROR]PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]]

     0#  doris::Status doris::Status::create<true>(doris::PStatus const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
     1#  std::_Function_handler<void (), doris::PInternalService::cancel_plan_fragment(google::protobuf::RpcController*, doris::PCancelPlanFragmentRequest const*, doris::PCancelPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
     2#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
     3#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
     4#  start_thread
     5#  __clone

I20260423 09:59:02.613548 96636 internal_service.cpp:659] Cancel query a7a506afab9147e0-897fe24d54670503, reason: [INTERNAL_ERROR]PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]]

     0#  doris::Status doris::Status::create<true>(doris::PStatus const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
     1#  std::_Function_handler<void (), doris::PInternalService::cancel_plan_fragment(google::protobuf::RpcController*, doris::PCancelPlanFragmentRequest const*, doris::PCancelPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
     2#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
     3#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
     4#  start_thread
     5#  __clone

W20260423 09:59:02.615175 99866 fragment_mgr.cpp:545] report error status: PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]] to coordinator: TNetworkAddress(hostname=192.168.249.84, port=9020), query id: a7a506afab9147e0-897fe24d54670503
I20260423 09:59:02.615635 96636 pipeline_fragment_context.cpp:171] PipelineFragmentContext::cancel|query_id=a7a506afab9147e0-897fe24d54670503|fragment_id=0|reason=[INTERNAL_ERROR]PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]]

     0#  doris::Status doris::Status::create<true>(doris::PStatus const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
     1#  std::_Function_handler<void (), doris::PInternalService::cancel_plan_fragment(google::protobuf::RpcController*, doris::PCancelPlanFragmentRequest const*, doris::PCancelPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
     2#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
     3#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
     4#  start_thread
     5#  __clone
2 Answers

Do you have monitoring in place? Check the metrics and logs around 20260423 09:59:02.613395. A network problem may be the cause.
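As a quick first check on the network theory, one could probe the address from the error message (192.168.249.83:8060; 8060 is the default BE `brpc_port`) from the node that reported the failure. This is a minimal sketch, not a Doris tool: `probe_tcp` is a hypothetical helper, and a successful TCP connect only shows that the port is reachable, not that the BE is answering RPCs in time.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0):
    """Attempt one TCP connect; return the connect time in seconds, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) if the connect fails.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def describe(elapsed) -> str:
    """Format a probe result for printing."""
    return "unreachable" if elapsed is None else f"{elapsed * 1000:.1f} ms"
```

Calling `describe(probe_tcp("192.168.249.83", 8060))` a few times around the failing window gives a rough picture: intermittent `unreachable` results or connect times that spike would point toward the network, while consistently fast connects suggest looking at load on the BE instead.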

Also, what does your BE setup look like: how many CPU cores does it have, and what is parallel_pipeline_task_num set to? Check whether excessive concurrency is causing this.

You can add me on WeChat through my profile page.

1. We do have monitoring. Which part of it should I look at?
2. BE configuration:
storage_root_path=/app/apache-doris-3.0.8-bin-x64/be/storage
spill_storage_root_path=/app/apache-doris-3.0.8-bin-x64/be/storeg/spilll
webserver_port=18040
priority_networks = 192.168.249.84/32
JAVA_HOME="/app/jdk-17.0.10"
fragment_pool_thread_num_max=2048
fragment_pool_queue_size=4096
brpc_num_threads=1024
result_buffer_cancelled_interval_time=30000
3. CPU
32 cores
4. parallel_pipeline_task_num
This parameter has not been configured.
5. The log on the corresponding server, 192.168.249.84, looks like this:
W20260423 09:59:02.613395 96636 status.h:444] meet error status: [INTERNAL_ERROR]PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]]

0# doris::Status doris::Status::create<true>(doris::PStatus const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# std::_Function_handler<void (), doris::PInternalService::cancel_plan_fragment(google::protobuf::RpcController*, doris::PCancelPlanFragmentRequest const*, doris::PCancelPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
2# doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
3# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85

I20260423 09:59:02.613548 96636 internal_service.cpp:659] Cancel query a7a506afab9147e0-897fe24d54670503, reason: [INTERNAL_ERROR]PStatus: send fragments failed. io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 304.999996916s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=192.168.249.83/192.168.249.83:8060]]]

0# doris::Status doris::Status::create<true>(doris::PStatus const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# std::_Function_handler<void (), doris::PInternalService::cancel_plan_fragment(google::protobuf::RpcController*, doris::PCancelPlanFragmentRequest const*, doris::PCancelPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
2# doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
3# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85