Doris 2.1.9: Stream Load import fails with "RPC call is timed out"

Doris version: 2.1.9
Cluster: 1 FE and 5 BEs, each with 16 cores and 64 GB of memory

While loading real-time data into Doris through Stream Load, the following error is reported:

[CANCELLED][INTERNAL_ERROR]VNodeChannel[1368917458-194561778], load_id=1b4596a92b373154-970aa09005a78d9b, txn_id=808994560, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[1368917458-194561778], load_id=1b4596a92b373154-970aa09005a78d9b, txn_id=808994560, node=dw-doris-be-0003:10114, host: dw-doris-be-0003

The be.INFO log contains many timeout entries.

Here is a more detailed log excerpt:

I20250725 08:47:30.579151 3735102 merger.cpp:388] estimate batch size for vertical compaction, tablet id: 1368884299 group data size: 0 row num: 439 consume bytes: 18434 way cnt: 50 batch size: 4064
I20250725 08:47:30.584628 3735102 merger.cpp:388] estimate batch size for vertical compaction, tablet id: 1368884299 group data size: 0 row num: 439 consume bytes: 26370 way cnt: 50 batch size: 4064
I20250725 08:47:30.590526 3735102 merger.cpp:388] estimate batch size for vertical compaction, tablet id: 1368884299 group data size: 0 row num: 439 consume bytes: 10975 way cnt: 50 batch size: 4064
I20250725 08:47:30.596400 3735102 merger.cpp:388] estimate batch size for vertical compaction, tablet id: 1368884299 group data size: 0 row num: 439 consume bytes: 7463 way cnt: 50 batch size: 4064
W20250725 08:47:30.611594 3736105 ref_count_closure.h:115] RPC meet failed: [E1008]Reached timeout=60000ms @10.5.3.176:10114
I20250725 08:47:30.611860 3734687 vtablet_writer.cpp:158] mark node_id:VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114 tablet_id: -1 as failed, err: VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114
W20250725 08:47:30.611897 3734687 vtablet_writer.cpp:589] cancel node channel VNodeChannel[269056337-194561779], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0004:10114, error message: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.611943 3734687 vtablet_writer.cpp:589] cancel node channel VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, error message: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.611992 3734687 vtablet_writer.cpp:589] cancel node channel VNodeChannel[269056337-194561757], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0002:10114, error message: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
I20250725 08:47:30.612001 3735102 tablet_meta_manager.cpp:293] remove old version delete bitmap, tablet_id: 1368884299 version: 29166, removed keys size: 0
W20250725 08:47:30.612020 3734687 vtablet_writer.cpp:589] cancel node channel VNodeChannel[269056337-10023], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0001:10114, error message: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.612041 3734687 vtablet_writer.cpp:589] cancel node channel VNodeChannel[269056337-194561782], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0005:10114, error message: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
I20250725 08:47:30.612071 3734687 vtablet_writer.cpp:1373] close olap table sink. load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, canceled all node channels due to error: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
I20250725 08:47:30.612063 3735102 compaction.cpp:786] succeed to do cumulative compaction is_vertical=1. tablet=1368884299, output_version=[28280-29166], current_max_version=29166, disk=/msun/data/disk1/doris, segments=50, input_rowset_size=949464, output_rowset_size=89716, input_row_num=813, output_row_num=448, filtered_row_num=0, merged_row_num=365. elapsed time=0.108558s. cumulative_compaction_policy=size_based, compact_row_per_second=7488
W20250725 08:47:30.612388 3734753 fragment_mgr.cpp:636] report error status: [INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003 to coordinator: TNetworkAddress(hostname=dw-doris-fe-0001, port=10116), query id: 6144299f339516cc-5dd62e0301954b93, instance id: 6144299f339516cc-5dd62e0301954b94
I20250725 08:47:30.612732 3734753 query_context.cpp:189] Query 6144299f339516cc-5dd62e0301954b93 deconstructed, mem_tracker: , deregister query/load memory tracker, queryId=6144299f339516cc-5dd62e0301954b93, Limit=2.00 GB, CurrUsed=1.03 MB, PeakUsed=2.14 MB
W20250725 08:47:30.612802 3734753 stream_load_executor.cpp:105] fragment execute failed, err_msg=[CANCELLED][INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003, id=6144299f339516cc-5dd62e0301954b93, job_id=-1, txn_id=808994379, label=95f4c2b1-d118-4fa4-adf0-20a9245e59be, elapse(s)=60
W20250725 08:47:30.612845 3736323 stream_load.cpp:110] handle streaming load failed, id=6144299f339516cc-5dd62e0301954b93, errmsg=[CANCELLED][INTERNAL_ERROR]VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[269056337-194561778], load_id=6144299f339516cc-5dd62e0301954b93, txn_id=808994379, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
I20250725 08:47:30.614789 3736323 stream_load.cpp:137] finished to execute stream load. label=95f4c2b1-d118-4fa4-adf0-20a9245e59be, txn_id=808994379, query_id=6144299f339516cc-5dd62e0301954b93, load_cost_ms=60008, receive_data_cost_ms=6, read_data_cost_ms=0, write_data_cost_ms=60002, commit_and_publish_txn_cost_ms=0, number_total_rows=0, number_loaded_rows=0, receive_bytes=905, loaded_bytes=0
I20250725 08:47:30.614886 3736323 stream_load.cpp:208] new income streaming load request.id=784420658288728e-7686bf3c974a21a6, job_id=-1, txn_id=-1, label=0593e179-70ff-40b8-b209-e83878979094, elapse(s)=0, db=cdrapp_sxzyyy, tbl=st_st_pd_mk_mdc2_mz_disease_view, group_commit=0
I20250725 08:47:30.617357 3736323 stream_load_executor.cpp:72] begin to execute stream load. label=0593e179-70ff-40b8-b209-e83878979094, txn_id=808994467, query_id=784420658288728e-7686bf3c974a21a6
I20250725 08:47:30.617954 3736323 stream_load.cpp:214] finished to handle HTTP header, id=784420658288728e-7686bf3c974a21a6, job_id=-1, txn_id=808994467, label=0593e179-70ff-40b8-b209-e83878979094, elapse(s)=0
I20250725 08:47:30.618211 3734687 vtablet_writer.cpp:127] init new node for instance 0, incremantal:0
I20250725 08:47:30.618228 3734687 vtablet_writer.cpp:127] init new node for instance 0, incremantal:0
I20250725 08:47:30.618234 3734687 vtablet_writer.cpp:127] init new node for instance 0, incremantal:0
I20250725 08:47:30.618242 3734687 vtablet_writer.cpp:127] init new node for instance 0, incremantal:0
I20250725 08:47:30.618252 3734687 vtablet_writer.cpp:127] init new node for instance 0, incremantal:0
W20250725 08:47:30.626075 3736044 ref_count_closure.h:115] RPC meet failed: [E1008]Reached timeout=60000ms @10.5.3.176:10114
I20250725 08:47:30.626250 3734721 vtablet_writer.cpp:158] mark node_id:VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114 tablet_id: -1 as failed, err: VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114
W20250725 08:47:30.626286 3734721 vtablet_writer.cpp:589] cancel node channel VNodeChannel[210228238-194561782], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0005:10114, error message: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.626324 3734721 vtablet_writer.cpp:589] cancel node channel VNodeChannel[210228238-194561779], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0004:10114, error message: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.626343 3734721 vtablet_writer.cpp:589] cancel node channel VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, error message: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.626368 3734721 vtablet_writer.cpp:589] cancel node channel VNodeChannel[210228238-194561757], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0002:10114, error message: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.626397 3734721 vtablet_writer.cpp:589] cancel node channel VNodeChannel[210228238-10023], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0001:10114, error message: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
I20250725 08:47:30.626415 3734721 vtablet_writer.cpp:1373] close olap table sink. load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, canceled all node channels due to error: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003
W20250725 08:47:30.626729 3734753 fragment_mgr.cpp:636] report error status: [INTERNAL_ERROR]VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, open failed, err: [INTERNAL_ERROR]failed to open tablet writer, error=RPC call is timed out, error_text=[E1008]Reached timeout=60000ms @10.5.3.176:10114, info=VNodeChannel[210228238-194561778], load_id=cc43aa1da4c41093-f2531f055ed97590, txn_id=808994389, node=dw-doris-be-0003:10114, host: dw-doris-be-0003 to coordinator: TNetworkAddress(hostname=dw-doris-fe-0001, port=10116), query id: cc43aa1da4c41093-f2531f055ed97590, instance id: cc43aa1da4c41093-f2531f055ed97591
I20250725 08:47:30.626955 1054973 task_worker_pool.cpp:337] successfully submit task|type=CLEAR_TRANSACTION_TASK|signature=478544270
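
To check whether the timeouts are concentrated on one BE, the warnings can be tallied by target address. The following is a minimal sketch, not an official Doris tool: it assumes the warning keeps the `RPC meet failed: [E1008]Reached timeout=...ms @ip:port` shape seen in the excerpt above, and the function name `count_rpc_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# Pattern taken from the "ref_count_closure.h" warnings in the excerpt above.
# It is an assumption that other BE versions log the same text.
TIMEOUT_RE = re.compile(
    r"RPC meet failed: \[E1008\]Reached timeout=(\d+)ms @([\d.]+:\d+)"
)

def count_rpc_timeouts(lines):
    """Return a Counter mapping target 'ip:port' to the number of timeout warnings."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Three lines copied from the log excerpt; only two are timeout warnings.
sample = [
    "W20250725 08:47:30.611594 3736105 ref_count_closure.h:115] RPC meet failed: [E1008]Reached timeout=60000ms @10.5.3.176:10114",
    "I20250725 08:47:30.612001 3735102 tablet_meta_manager.cpp:293] remove old version delete bitmap, tablet_id: 1368884299 version: 29166, removed keys size: 0",
    "W20250725 08:47:30.626075 3736044 ref_count_closure.h:115] RPC meet failed: [E1008]Reached timeout=60000ms @10.5.3.176:10114",
]
timeouts = count_rpc_timeouts(sample)
for address, count in timeouts.most_common():
    print(address, count)
```

On a real node the same function could be fed an open file handle for be.INFO instead of `sample`. In the excerpt every timeout points at 10.5.3.176:10114, which the surrounding lines identify as dw-doris-be-0003.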

In addition, the loads also report many errors saying that the thread pool is full:

[CANCELLED]VNodeChannel[1329479264-194561778], load_id=604f6febb4bd0722-e6f68483e2be058e, txn_id=809055286, node=dw-doris-be-0003:10114, open failed, err: [CANCELLED]PStatus: (dw-doris-be-0003)[CANCELLED]fail to offer request to the work pool, pool=PriorityThreadPool(name=brpc_light, queue_size=10240/10240, active_thread=128/128, total_get_wait_time=1111855273204023275, total_put_wait_time=203712707044), host: dw-doris-be-0003
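
The pool status embedded in this message can be pulled out to confirm that both the queue and the worker threads are exhausted. This is a rough sketch that assumes the two `a/b` pairs mean "current/limit", which is how the message reads but is not confirmed here; the helper name `parse_pool_status` is hypothetical.

```python
import re

# Field layout copied from the error message above.
POOL_RE = re.compile(
    r"PriorityThreadPool\(name=(?P<name>\w+), "
    r"queue_size=(?P<queued>\d+)/(?P<queue_cap>\d+), "
    r"active_thread=(?P<active>\d+)/(?P<threads>\d+)"
)

def parse_pool_status(message):
    """Extract pool name, queue usage and thread usage; return None if absent."""
    match = POOL_RE.search(message)
    if match is None:
        return None
    status = {"name": match.group("name")}
    for key in ("queued", "queue_cap", "active", "threads"):
        status[key] = int(match.group(key))
    # Assumption: the pool rejects new requests once queue and threads are both at their limit.
    status["saturated"] = (
        status["queued"] >= status["queue_cap"]
        and status["active"] >= status["threads"]
    )
    return status

message = (
    "fail to offer request to the work pool, pool=PriorityThreadPool("
    "name=brpc_light, queue_size=10240/10240, active_thread=128/128, "
    "total_get_wait_time=1111855273204023275, total_put_wait_time=203712707044)"
)
status = parse_pool_status(message)
print(status)
```

For the message above this reports the brpc_light pool with 10240 of 10240 queue slots and 128 of 128 threads in use.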

1 Answer

Issue 1:

Try setting tablet_writer_open_rpc_timeout_sec = 600. The timeout may be caused by a single open request having to open too many tablet writers.

Issue 2:
The thread pool is saturated. Try doubling brpc_light_work_pool_max_queue_size.
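
As an illustration of the two suggestions, the sketch below rewrites a be.conf-style text with the proposed values. It is an assumption-laden example: the helper `set_conf_value` and the sample file content are invented, the starting value of 60 is inferred from the 60000 ms timeout in the logs, and 10240 is the queue limit shown in the thread-pool error. Changes to be.conf typically need a BE restart before they take effect.

```python
def set_conf_value(conf_text, key, value):
    """Return conf_text with `key = value` set, replacing an uncommented entry or appending one."""
    new_line = f"{key} = {value}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"

# Hypothetical be.conf excerpt, not taken from the poster's cluster.
conf = "# be.conf (hypothetical excerpt)\ntablet_writer_open_rpc_timeout_sec = 60\n"

current_queue_size = 10240  # from queue_size=10240/10240 in the error message

conf = set_conf_value(conf, "tablet_writer_open_rpc_timeout_sec", 600)
conf = set_conf_value(conf, "brpc_light_work_pool_max_queue_size", current_queue_size * 2)
print(conf)
```

Raising the timeout and queue size only gives the BE more room to absorb bursts; if dw-doris-be-0003 stays overloaded, the load frequency or the number of tablets touched per load likely still needs attention.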