Doris failures after upgrading from 2.1.6 to 3.1.2

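One day before the Spring Festival, the FE disk filled up and was not cleaned up in time. After freeing up space, I repaired the metadata with the --metadata_failure_recovery option and started the master node. When I then restarted the follower node, the FOLLOWER could not join the master: as soon as the follower starts, the master goes down with a BDB exception. From what I found, this comes from a BDB protection mechanism added in 3.x to prevent BDB data corruption.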
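Following an AI assistant's suggestion, I then dropped the follower with
ALTER SYSTEM DROP FOLLOWER "10.0.30.216:9010";
and re-added it with
ALTER SYSTEM ADD FOLLOWER "10.0.30.216:9010";
After running the statement on the master, the following error appeared three times: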
2026-03-27 15:14:00,517 ERROR (EditLog-Flusher|62) [BDBJEJournal.write():198] catch an exception when writing to database. sleep and retry. the first journal id 446603267
com.sleepycat.je.rep.InsufficientReplicasException: (JE 18.3.12) Commit policy: SIMPLE_MAJORITY required 1 replica. But none were active with this master.
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureReplicasForCommit(DurabilityQuorum.java:116) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
After these errors, the master reported a BDB write failure and exited on its own:
2026-03-27 15:14:05,518 ERROR (EditLog-Flusher|62) [BDBJEJournal.write():222] write bdb failed. will exit. the first journalId: 446603267, bdb database Name: 446602721
2026-03-27 15:14:05,519 ERROR (EditLog-Flusher|62) [LogUtils.stderr():54] StderrLogger 2026-03-27 15:14:05,519 write bdb failed. will exit. the first journalId: 446603267, bdb database Name: 446602721
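The InsufficientReplicasException line above states the rule directly: under the SIMPLE_MAJORITY commit policy the master needed 1 replica acknowledgement and had none. A minimal sketch of that arithmetic, assuming the quorum is a simple majority of electable nodes with the master counting itself toward it. The function names are hypothetical illustrations, not Doris or BDB JE APIs, and JE special cases such as the designated primary for two-node groups are ignored.

```python
def simple_majority(electable_nodes: int) -> int:
    """Nodes (master included) that must hold a commit under a simple majority (assumed model)."""
    return electable_nodes // 2 + 1


def required_replica_acks(electable_nodes: int) -> int:
    """Acks the master must collect from other nodes; the master counts itself."""
    return simple_majority(electable_nodes) - 1


def can_commit(electable_nodes: int, active_replicas: int) -> bool:
    """True if enough replicas are connected to satisfy the assumed quorum."""
    return active_replicas >= required_replica_acks(electable_nodes)


# Master alone: no replica ack is needed, so writes go through.
print(required_replica_acks(1), can_commit(1, 0))  # 0 True
# Master plus one registered follower that never connects: 1 ack required, 0 available.
print(required_replica_acks(2), can_commit(2, 0))  # 1 False
# Three electable FE nodes with one follower down: the quorum still holds.
print(required_replica_acks(3), can_commit(3, 1))  # 1 True
```

Under this model, once the re-added follower counts as an electable member but never connects, the master cannot commit edit log writes, which is consistent with the "required 1 replica. But none were active" message followed by the exit.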
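I have tried deleting the old follower's BDB directory and clearing its meta-data, but it still cannot join the master's cluster. Right now only the master is running, as a single node. Is there a solution?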
