最近将Doris环境的版本从2.1.5升级到4.0.1,升级后验证功能是否都正常时发现使用streamload方式向一张聚合表批量导入数据时会超时ReadTimeOut, 表结构如下
CREATE TABLE instantaneous (
interval bigint NOT NULL,
org_id varchar(32) NULL DEFAULT "",
device_code varchar(64) NULL,
point_name varchar(128) NULL,
max_value decimal(21,5) MAX NULL,
min_value decimal(21,5) MIN NULL,
avg_value decimal(21,5) REPLACE_IF_NOT_NULL NULL,
diff_value decimal(21,5) REPLACE_IF_NOT_NULL NULL,
instantaneous varchar(64) REPLACE NULL,
last_update_time datetime REPLACE NULL
) ENGINE=OLAP AGGREGATE KEY(interval, org_id, device_code, point_name)
DISTRIBUTED BY HASH(interval) BUCKETS AUTO
一开始怀疑是httpclient的问题,尝试过将http请求由hutool的HttpRequest换成HttpURLConnection、okHttp和apache的httpclient都不行,甚至还手动用apifox客户端发起请求,也是以失败告终。
http请求超时信息如下:
16:17:56.354 [http-nio-8977-exec-1] INFO c.w.s.i.xxxxServiceImpl - [executeWithRedirectHandling,146] - 重定向到: http://xxx:8040/api/db/instantaneous/_stream_load?
16:19:26.391 [http-nio-8977-exec-1] ERROR c.w.s.i.xxxxServiceImpl - [streamLoadOriginxxx,91] - Exception:Read timed out
16:19:26.403 [http-nio-8977-exec-1] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - [log,175] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: java.net.SocketTimeoutException: Read timed out] with root cause
DorisBE中的日志信息如下:
I20260330 16:17:58.209887 3649372 stream_load.cpp:218] new income streaming load request.id=a24a526ecb2d21dc-ff901fd4e5a3c38a, job_id=-1, txn_id=-1, label=xx_instantaneous_-149401293_636232920, elapse(s)=0, db=dbname, tbl=xx_instantaneous, group_commit=0, HTTP headers=Content-Length:131, Accept:text/html, image/gif, image/jpeg, *; q=.2, /; q=.2, Connection:keep-alive, Host:192.xx.xx.223:8040, Pragma:no-cache, Cache-Control:no-cache, User-Agent:Java/1.8.0_421, column_separator:,, Content-Encoding:gzip, label:xx_instantaneous_-149401293_636232920, Expect:100-continue, Content-Type:text/plain, Authorization:Basic cm9vdDp6Ynh4QDIwXXX=,
I20260330 16:17:58.218360 3649372 stream_load_executor.cpp:74] begin to execute stream load. label=xx_instantaneous_-149401293_636232920, txn_id=1640929, query_id=a24a526ecb2d21dc-ff901fd4e5a3c38a
I20260330 16:17:58.218484 3649372 fragment_mgr.cpp:716] query_id: a24a526ecb2d21dc-ff901fd4e5a3c38a, coord_addr: TNetworkAddress(hostname=192.xx.xx.224, port=9020), total fragment num on current host: 0, fe process uuid: 1774077869210, query type: LOAD, report audit fe:TNetworkAddress(hostname=192.xx.xx.224, port=9020), use wg:1773992035626,normal
I20260330 16:17:58.221365 3649372 stream_load.cpp:225] finished to handle HTTP header, id=a24a526ecb2d21dc-ff901fd4e5a3c38a, job_id=-1, txn_id=1640929, label=xx_instantaneous_-149401293_636232920, elapse(s)=0
I20260330 16:17:58.224470 3648292 tablets_channel.cpp:162] open tablets channel (load_id=a24a526ecb2d21dc-ff901fd4e5a3c38a, index_id=1774078842526), tablets num: 2 timeout(s): 259200, init senders 1 with incremental off
I20260330 16:17:58.227481 1841946 vtablet_writer.cpp:1257] VNodeChannel[1774078842526-1773992035664], load_id=a24a526ecb2d21dc-ff901fd4e5a3c38a, txn_id=1640929, node=192.xx.xx.223:8060 mark closed, left pending batch size: 1 hang_wait: 0
I20260330 16:17:58.230815 3648332 tablets_channel.cpp:283] txn 1640929: close tablets channel of index 1774078842526 , sender id: 0, backend 1773992035664, remain senders: 0
I20260330 16:17:58.230840 3649080 vtablet_writer.cpp:1300] All node channels are stopped(maybe finished/offending/cancelled), sender thread exit. a24a526ecb2d21dc-ff901fd4e5a3c38a
I20260330 16:17:58.234810 3648188 segment_creator.cpp:269] tablet_id:1774078842563, flushing rowset_dir: /opt/module/doris-4.0.1/be/storage/data/890/1774078842563/102384574, rowset_id:0200000000eda20b444777b36a916c4c4b065fd75017308a, data size:1.89 KB, index size:0, timing breakdown: total=2ms, finalize=2ms, inverted_index=0ms, collector=0ms
I20260330 16:18:02.542423 2042950 backend_service.cpp:1290] query for dictionary status, return 0 rows
I20260330 16:18:02.605852 3648068 olap_server.cpp:1322] cooldown producer get tablet num: 0
I20260330 16:18:06.612844 3647967 wal_manager.cpp:489] Scheduled(every 10s) WAL info: [/opt/module/doris-4.0.1/be/storage/wal: limit 28538003865 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 28538003865 Bytes.];
I20260330 16:18:07.549392 2042950 backend_service.cpp:1290] query for dictionary status, return 0 rows
W20260330 16:10:12.143518 1340683 pipeline_fragment_context.cpp:215] PipelineFragmentContext cancel instance: d64ae1260a62e09a-e5e7889fa459b481
W20260330 16:10:12.144016 2746942 fragment_mgr.cpp:559] report error status: rpc failed, error code:1008, error text:[E1008]Reached timeout=599968ms @192.xx.xx.223:8060, host: 192.xx.xx.223 to coordinator: TNetworkAddress(hostname=192.xx.xx.224, port=9020), query id: d64ae1260a62e09a-e5e7889fa459b481
W20260330 16:10:12.146026 2746942 stream_load_executor.cpp:107] fragment execute failed, err_msg=[INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599968ms @192.xx.xx.223:8060, host: 192.xx.xx.223, id=d64ae1260a62e09a-e5e7889fa459b481, job_id=-1, txn_id=1640914, label=audit_log_20260330_160016_489_192_168_218_224_9010, elapse(s)=600
W20260330 16:10:12.146436 1341916 stream_load.cpp:113] handle streaming load failed, id=d64ae1260a62e09a-e5e7889fa459b481, errmsg=[INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599968ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:11:28.912442 1340683 task_scheduler.cpp:84] Pipeline task failed. query_id: 8df17604b5342d7-a0bb20d59613ebce reason: [LIMIT_REACH]PStatus: query reach limit
W20260330 16:13:21.196149 1341717 vtablet_writer.h:176] failed to send brpc batch, error=RPC call is timed out, error_text=[E1008]Reached timeout=599954ms @192.xx.xx.223:8060
W20260330 16:13:21.196503 1341717 vtablet_writer.cpp:1100] rpc failed may caused by timeout. increase BE config min_load_rpc_timeout_ms of to avoid this if you are sure that your table building and data are reasonable.
W20260330 16:13:21.196563 1341717 vtablet_writer.cpp:832] cancel node channel VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, error message: VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.196676 1341717 brpc_closure.h:128] RPC meet failed: [E1008]Reached timeout=599954ms @192.xx.xx.223:8060
W20260330 16:13:21.196988 1341599 vtablet_writer.h:176] failed to send brpc batch, error=RPC call is timed out, error_text=[E1008]Reached timeout=599956ms @192.xx.xx.224:8060
W20260330 16:13:21.197175 1341599 vtablet_writer.cpp:1100] rpc failed may caused by timeout. increase BE config min_load_rpc_timeout_ms of to avoid this if you are sure that your table building and data are reasonable.
W20260330 16:13:21.197221 1341599 vtablet_writer.cpp:832] cancel node channel VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, error message: VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.197259 1341599 brpc_closure.h:128] RPC meet failed: [E1008]Reached timeout=599956ms @192.xx.xx.224:8060
W20260330 16:13:21.197507 2730999 vtablet_writer.cpp:295] VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, close channel failed, err: [INTERNAL_ERROR]VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.197764 2730999 vtablet_writer.cpp:832] cancel node channel VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, error message: [INTERNAL_ERROR]VNodeChannel[1773992035924-1773992035665], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.224:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.198024 2730999 vtablet_writer.cpp:295] VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, close channel failed, err: [INTERNAL_ERROR]VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.198139 2730999 vtablet_writer.cpp:832] cancel node channel VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, error message: [INTERNAL_ERROR]VNodeChannel[1773992035924-1773992035664], load_id=404e4d9e3ff2f76c-188b50fc8a456fb1, txn_id=1640918, node=192.xx.xx.223:8060, err: [INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
W20260330 16:13:21.198959 1340680 pipeline_fragment_context.cpp:215] PipelineFragmentContext cancel instance: 404e4d9e3ff2f76c-188b50fc8a456fb1
W20260330 16:13:21.199514 2730999 fragment_mgr.cpp:559] report error status: rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223 to coordinator: TNetworkAddress(hostname=192.xx.xx.224, port=9020), query id: 404e4d9e3ff2f76c-188b50fc8a456fb1
W20260330 16:13:21.200660 2730999 stream_load_executor.cpp:107] fragment execute failed, err_msg=[INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223, id=404e4d9e3ff2f76c-188b50fc8a456fb1, job_id=-1, txn_id=1640918, label=audit_log_202xxx30_160325_570_192_x_x_224_9010, elapse(s)=600
W20260330 16:13:21.200966 1341920 stream_load.cpp:113] handle streaming load failed, id=404e4d9e3ff2f76c-188b50fc8a456fb1, errmsg=[INTERNAL_ERROR]rpc failed, error code:1008, error text:[E1008]Reached timeout=599954ms @192.xx.xx.223:8060, host: 192.xx.xx.223
为了不长时间影响业务,将streamload方式改为insert into方式批量插入。
然后在测试环境下尝试将上述的表改为UNIQUE模型的表,居然没有问题,谁遇到过,这是怎么回事呢?