Doris 2.1.5 cluster: ingestion tasks occasionally fail. Can anyone help figure out the cause?

The ingestion task reports this error:

  [2025-06-12 15:06:10] {standard_task_runner.py:124} ERROR - Failed to execute job 2982148 for task ae118afe57a54e51acf850e5d27c313d ((1105, \'errCode = 2, detailMessage = (be1.doris.server)[CANCELLED]JdbcExecutorException: Initialize datasource failed: \
  CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.\'); 1085808)

I then checked the log on the BE node be1.doris.server:

I20250612 15:06:09.928042 380760 fragment_mgr.cpp:705] Register query/load memory tracker, query/load id: 5b1616e9186b4826-b978b2fc3aeb07cc limit: 0
I20250612 15:06:09.928058 380760 pipeline_x_fragment_context.cpp:189] PipelineXFragmentContext::prepare|query_id=5b1616e9186b4826-b978b2fc3aeb07cc|fragment_id=3|pthread_id=139860029560576
W20250612 15:06:09.929618 3952632 jni-util.cpp:259] org.apache.doris.jdbc.JdbcExecutorException: Initialize datasource failed:
        at org.apache.doris.jdbc.BaseJdbcExecutor.init(BaseJdbcExecutor.java:345)
        at org.apache.doris.jdbc.BaseJdbcExecutor.<init>(BaseJdbcExecutor.java:93)
        at org.apache.doris.jdbc.MySQLJdbcExecutor.<init>(MySQLJdbcExecutor.java:50)
Caused by: java.sql.SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
        at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
        at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
        at org.apache.doris.jdbc.BaseJdbcExecutor.init(BaseJdbcExecutor.java:334)
        ... 2 more
W20250612 15:06:09.929708 3952632 status.h:412] meet error status: [INTERNAL_ERROR]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.

        0#  doris::JniUtil::GetJniExceptionMsg(JNIEnv_*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /home/zcp/repo_center/doris_release/doris/be/src/util/jni-util.h:117
        1#  doris::vectorized::JdbcConnector::open(doris::RuntimeState*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
        2#  doris::vectorized::NewJdbcScanner::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
        3#  doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:377
        4#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
        5#  doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
        6#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
        7#  ?
        8#  __clone
W20250612 15:06:09.929783 381895 task_scheduler.cpp:361] Pipeline task failed. query_id: 19041b233144e4a-995f84bdc9aefe23|0-0 reason: [INTERNAL_ERROR]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.

        0#  doris::JniUtil::GetJniExceptionMsg(JNIEnv_*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /home/zcp/repo_center/doris_release/doris/be/src/util/jni-util.h:117
        1#  doris::vectorized::JdbcConnector::open(doris::RuntimeState*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
        2#  doris::vectorized::NewJdbcScanner::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
        3#  doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:377
        4#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
        5#  doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
        6#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
        7#  ?
        8#  __clone
I20250612 15:06:09.929948 381895 pipeline_x_fragment_context.cpp:141] PipelineXFragmentContext::cancel|query_id=19041b233144e4a-995f84bdc9aefe23|fragment_id=0|reason=3|error message=JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
I20250612 15:06:09.929968 381895 pipeline_x_fragment_context.cpp:141] PipelineXFragmentContext::cancel|query_id=19041b233144e4a-995f84bdc9aefe23|fragment_id=1|reason=3|error message=JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
W20250612 15:06:09.929982 381895 pipeline_x_fragment_context.cpp:154] PipelineXFragmentContext cancel instance: 19041b233144e4a-995f84bdc9aefe25
W20250612 15:06:09.930008 381895 pipeline_x_fragment_context.cpp:154] PipelineXFragmentContext cancel instance: 19041b233144e4a-995f84bdc9aefe24
W20250612 15:06:09.930061 3910788 vtablet_writer.cpp:587] cancel node channel VNodeChannel[409261-10302], load_id=19041b233144e4a-995f84bdc9aefe23, txn_id=5213769, node=be3.doris.server:18060, error message: [CANCELLED]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
I20250612 15:06:09.930064 381894 fragment_mgr.cpp:608] Removing query 19041b233144e4a-995f84bdc9aefe23 instance 19041b233144e4a-995f84bdc9aefe25, all done? false
W20250612 15:06:09.930109 379037 fragment_mgr.cpp:433] report error status: JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms. to coordinator: TNetworkAddress(hostname=fe1.doris.server, port=19020), query id: 19041b233144e4a-995f84bdc9aefe23, instance id: 0-0
W20250612 15:06:09.930167 379046 fragment_mgr.cpp:433] report error status: JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms. to coordinator: TNetworkAddress(hostname=fe1.doris.server, port=19020), query id: 19041b233144e4a-995f84bdc9aefe23, instance id: 0-0
I20250612 15:06:09.930131 381895 fragment_mgr.cpp:608] Removing query 19041b233144e4a-995f84bdc9aefe23 instance 19041b233144e4a-995f84bdc9aefe24, all done? true
W20250612 15:06:09.930164 3910788 vtablet_writer.cpp:587] cancel node channel VNodeChannel[409261-14637], load_id=19041b233144e4a-995f84bdc9aefe23, txn_id=5213769, node=be1.doris.server:18060, error message: [CANCELLED]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
I20250612 15:06:09.930250 381895 fragment_mgr.cpp:614] Query 19041b233144e4a-995f84bdc9aefe23 finished
W20250612 15:06:09.930303 3910788 vtablet_writer.cpp:587] cancel node channel VNodeChannel[409261-14626], load_id=19041b233144e4a-995f84bdc9aefe23, txn_id=5213769, node=be2.doris.server:18060, error message: [CANCELLED]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
I20250612 15:06:09.930371 3910788 vtablet_writer.cpp:1347] close olap table sink. load_id=19041b233144e4a-995f84bdc9aefe23, txn_id=5213769, canceled all node channels due to error: [CANCELLED]JdbcExecutorException: Initialize datasource failed:
CAUSED BY: SQLTransientConnectionException: HikariPool-13 - Connection is not available, request timed out after 5000ms.
I20250612 15:06:09.930541 381071 load_channel_mgr.cpp:194] load channel has been cancelled: 019041b233144e4a-995f84bdc9aefe23
I20250612 15:06:09.930567 381071 load_channel.cpp:69] load channel removed load_id=019041b233144e4a-995f84bdc9aefe23, is high priority=0, sender_ip=be1.doris.server
I20250612 15:06:09.930603 381245 vtablet_writer.cpp:995] All node channels are stopped(maybe finished/offending/cancelled), sender thread exit. 19041b233144e4a-995f84bdc9aefe23

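My scheduled tasks run once an hour, so many ingestion tasks write into Doris at the same time. Occasionally one of them fails, and it is a different task each time rather than the same task failing repeatedly. The failures almost always show this same error. Can anyone help figure out the cause?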

The data is ingested through a catalog. The same task can succeed on one run and fail on the next, which seems very strange.
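
To check whether the failures cluster around the top of the hour, the BE log can be scanned for the HikariCP timeout message. The following is only a rough sketch, not a Doris tool: `count_pool_timeouts` is a hypothetical helper, the regular expressions are based solely on the log lines shown above, and the log file path passed on the command line is whatever your BE warning log is called.

```python
import re
import sys
from collections import Counter

# Matches the HikariCP timeout message as it appears in the log above.
TIMEOUT_RE = re.compile(
    r"(HikariPool-\d+) - Connection is not available, "
    r"request timed out after (\d+)ms"
)
# Prefix of BE log lines, e.g. "W20250612 15:06:09.929618 ...".
PREFIX_RE = re.compile(r"^[IWEF](\d{8}) (\d{2}:\d{2}):\d{2}\.\d+ ")


def count_pool_timeouts(lines):
    """Count pool timeouts per (date, HH:MM, pool name).

    Continuation lines such as "CAUSED BY: ..." carry no timestamp,
    so they are attributed to the most recent timestamped line.
    """
    counts = Counter()
    current = None
    for line in lines:
        prefix = PREFIX_RE.match(line)
        if prefix:
            current = (prefix.group(1), prefix.group(2))
        hit = TIMEOUT_RE.search(line)
        if hit and current is not None:
            counts[(current[0], current[1], hit.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python count_timeouts.py <be log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for key, n in sorted(count_pool_timeouts(f).items()):
            print(*key, n)
```

If the counts spike in the minute or two after each hourly trigger and always name the same pool, that would be consistent with many concurrent tasks competing for connections from one pool, though the log alone does not prove it.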
