Doris 4.0.2, three-host cluster deployment: unstable, with frequent errors. INSERT fails with "failed to write enough replicas 0/1 for tablet", and SELECT queries also fail


1. Query errors:
FE follower node log:
2026-04-02 12:31:19,596 WARN (mysql-nio-pool-36|689) [ReadListener.lambda$handleEvent$0():60] Exception happened in one session(org.apache.doris.qe.ConnectContext@518705f9:dict_user).
java.io.IOException: Error happened when receiving packet.
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:389)
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:842)
2026-04-02 12:31:19,598 WARN (mysql-nio-pool-34|673) [ConnectProcessor.handleQuery():202] execute query exception
org.apache.doris.common.UserException: errCode = 2, detailMessage = cancel query by user from x.x.x.x:62926
at org.apache.doris.qe.runtime.QueryProcessor.doGetNext(QueryProcessor.java:166)
at org.apache.doris.qe.runtime.QueryProcessor.getNext(QueryProcessor.java:123)
at org.apache.doris.qe.NereidsCoordinator.getNext(NereidsCoordinator.java:217)
at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:1321)
at org.apache.doris.qe.StmtExecutor.handleCacheStmt(StmtExecutor.java:1183)
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1254)
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:914)
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:818)
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:541)
at org.apache.doris.qe.StmtExecutor.queryRetry(StmtExecutor.java:500)
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:485)
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:311)
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:198)
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:231)
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:259)
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:403)
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:842)
_______________
FE master node:
2026-04-02 12:31:19,591 WARN (mysql-nio-pool-48|1215) [StmtExecutor.execute():546] Analyze failed. stmt[1806, 229d0aa41d4d4ba8-869d7d4b338a15b3]
org.apache.doris.common.NereidsException: errCode = 2, detailMessage = Unknown thread id: 3023
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:698)
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:541)
at org.apache.doris.qe.StmtExecutor.queryRetry(StmtExecutor.java:500)
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:485)
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:311)
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:198)
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:231)
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:259)
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:403)
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = Unknown thread id: 3023
... 13 more
Caused by: org.apache.doris.common.DdlException: errCode = 2, detailMessage = Unknown thread id: 3023


BE (main node): report error status: PStatus: cancel query by user from x.x.x.x:30186 to coordinator: TNetworkAddress(hostname=x.x.x.x, port=9020), query id: 6c67b03286844545-b1ee2c43a10de21e

2. Insert errors
FE master node:
2026-03-26 18:13:41,084 WARN (mtmv-task-execute-1-thread-10|5404) [AbstractInsertExecutor.execImpl():200] insert [label_944351d8bab1478a_a34f6c407d5eb4af] with query id 944351d8bab1478a-a34f6c407d5eb4af failed, (x.x.x.x)[INTERNAL_ERROR]failed to open DeltaWriter 1774432324439: failed to write enough replicas 0/1 for tablet 1774432324439 due to connection errors

BE node:
W20260326 17:11:25.704821 872655 fragment_mgr.cpp:575] Retrying ReportExecStatus. query id: a133aebc924448aa-9762e261beb80f9c, instance id: 0-0 to TNetworkAddress(hostname=x.x.x.x, port=9020), err: No more data to read.
W20260326 17:12:18.653076 870044 fragment_mgr.cpp:917] Query d6fd5bcce6d44ba2-b4017ad3cf8157d0 does not exists, failed to cancel it
W20260326 17:12:25.701879 869544 status.h:439] meet error status: [INTERNAL_ERROR]Failed to connect to backend 1766541440409: [E1008]Reached timeout=60000ms @x.x.x.x:8060
W20260326 17:12:25.702281 869544 load_stream_stub.cpp:532] open stream failed: [INTERNAL_ERROR]Failed to connect to backend 1766541440409: [E1008]Reached timeout=60000ms @x.x.x.x:8060

1 Answer

The BE reports: Failed to connect to backend 1766541440409

Check whether communication on port 8060 is working between the node that reports this error and the BE node with be_id = 1766541440409.
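As a rough first check, the target address can be pulled straight out of the BE warning lines above (they end in "@host:8060"), and a plain TCP connect can be attempted from the node that logged the error. This is a minimal sketch under stated assumptions: 8060 is taken to be the target BE's brpc port because that is the port shown in the log, the helper names (parse_connect_failure, probe_tcp, check_log_lines) are hypothetical, and a successful TCP connect only shows the port is open and reachable, not that the brpc service behind it is healthy. The be_id-to-host mapping can also be cross-checked with SHOW BACKENDS on the FE.

```python
import re
import socket
from typing import Dict, Iterable, Optional, Tuple

# Matches BE log text such as:
# "Failed to connect to backend 1766541440409: [E1008]Reached timeout=60000ms @x.x.x.x:8060"
# Assumption: the address is an IPv4 address or hostname (no IPv6 brackets).
_CONNECT_FAIL = re.compile(
    r"Failed to connect to backend (?P<be_id>\d+):.*?@(?P<host>[^\s:]+):(?P<port>\d+)"
)


def parse_connect_failure(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (be_id, host, port) from a 'Failed to connect to backend' line, else None."""
    m = _CONNECT_FAIL.search(line)
    if m is None:
        return None
    return m.group("be_id"), m.group("host"), int(m.group("port"))


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # The connection is closed immediately; we only care whether it opens.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_log_lines(
    lines: Iterable[str], timeout: float = 3.0
) -> Dict[Tuple[str, str, int], bool]:
    """Probe each distinct (be_id, host, port) found in the log lines exactly once."""
    results: Dict[Tuple[str, str, int], bool] = {}
    for line in lines:
        parsed = parse_connect_failure(line)
        if parsed is None or parsed in results:
            continue
        _, host, port = parsed
        results[parsed] = probe_tcp(host, port, timeout)
    return results
```

For example, running check_log_lines(open("be.WARNING")) on the host that logged the error would report whether each failing backend address is reachable from there; the log file name and location are assumptions and depend on how the BE's log directory is configured. A False result points at a firewall, routing, or a BE process not listening on 8060, which matches the "[E1008]Reached timeout" in the log.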