Doris 4.0.3 compute-storage decoupled deployment on k8s: BE fails to start after host reboot

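Doris version: 4.0.3
Deployment: a 2-node k3s cluster. FDB runs as 2 nodes directly on the hosts; the operator and Doris are both deployed from Helm charts; storage is MinIO. After the initial install all services were healthy. A default vault and a compute group (cg) have been created and set. The cluster is empty: no databases or tables have been created.
Symptom: after rebooting the cluster hosts, the other services come back normally, but the 2 BE pods cannot start and keep crashing. The BE pod log is as follows: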

[root@ky10sp3 pvc-088260e4-3787-49eb-80d4-3e10eb2821db]# kubectl logs -n hope doriscluster-helm-cg0-0
Defaulted container "compute" out of: compute, default-init (init)
[Sun Mar 8 10:22:15 CST 2026] [info] the host machine support avx2 instruction set.
[Sun Mar 8 10:22:15 CST 2026] [info] Process conf file be.conf ...
[Sun Mar 8 10:22:15 CST 2026] [info] read 'be.conf' config [ enable_tls: ]
[Sun Mar 8 10:22:15 CST 2026] [info] read 'be.conf' config [ tls_private_key_path: ]
[Sun Mar 8 10:22:15 CST 2026] [info] read 'be.conf' config [ tls_certificate_path: ]
[Sun Mar 8 10:22:15 CST 2026] [info] read 'be.conf' config [ tls_ca_certificate_path: ]
[Sun Mar 8 10:22:15 CST 2026] [info] read 'be.conf' config [ heartbeat_service_port: ]
[Sun Mar 8 10:22:15 CST 2026] [info] use root no password show backends result 1772935622904 doriscluster-helm-cg0-0.doriscluster-helm-cg0.hope.svc.cluster.local 9050 9060 8040 8060 8050 2026-03-08 10:07:32 2026-03-08 10:13:14 false false 16 0.000 0.000 1.000 B 0.000 0.00 % 0.00 % 0.000 {"cloud_unique_id" : "1:1427461416:p_NUM_Vd", "compute_group_status" : "NORMAL", "private_endpoint" : "", "compute_group_name" : "cg0", "location" : "default", "public_endpoint" : "", "compute_group_id" : "nzEclcBd"} java.net.UnknownHostException: doriscluster-helm-cg0-0.doriscluster-helm-cg0.hope.svc.cluster.local {"lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false,"isActive":true,"isShutdown":false,"currentFragmentNum":0,"lastFragmentUpdateTime":0} 26 2 0.00 2026-03-08 10:13:14 0
1772935622905 doriscluster-helm-cg0-1.doriscluster-helm-cg0.hope.svc.cluster.local 9050 9060 8040 8060 8050 2026-03-08 10:07:25 2026-03-08 10:13:44 false false 17 0.000 0.000 1.000 B 0.000 0.00 % 0.00 % 0.000 {"cloud_unique_id" : "1:1427461416:NT7MV5rP", "compute_group_status" : "NORMAL", "private_endpoint" : "", "compute_group_name" : "cg0", "location" : "default", "public_endpoint" : "", "compute_group_id" : "nzEclcBd"} java.net.UnknownHostException: doriscluster-helm-cg0-1.doriscluster-helm-cg0.hope.svc.cluster.local {"lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false,"isActive":true,"isShutdown":false,"currentFragmentNum":0,"lastFragmentUpdateTime":0} 26 2 0.00 2026-03-08 10:13:44 0 .
[Sun Mar 8 10:22:15 CST 2026] [info] Check myself (doriscluster-helm-cg0-0.doriscluster-helm-cg0.hope.svc.cluster.local:9050) exist in FE, start be directly ...
[Sun Mar 8 10:22:15 CST 2026] run start_be.sh
StdoutLogger 2026-03-08 10:22:16,356 Start time: Sun Mar 8 10:22:16 CST 2026
StdoutLogger 2026-03-08 10:22:16,488 Added missing Java option: -Dhadoop.shell.setsid.enabled=false
StdoutLogger 2026-03-08 10:22:16,493 Added missing Java option: -Darrow.enable_null_check_for_get=false
StdoutLogger 2026-03-08 10:22:16,499 Added missing Java option: -Djol.skipHotspotSAAttach=true
StdoutLogger 2026-03-08 10:22:16,511 Added missing Java option: -Dfile.encoding=UTF-8
StdoutLogger 2026-03-08 10:22:16,561 Added missing Java option: --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
StdoutLogger 2026-03-08 10:22:16,568 Added missing Java option: --add-opens=java.xml/com.sun.org.apache.xerces.internal.jaxp=ALL-UNNAMED
W20260308 10:22:22.443363 292 timezone_utils.cpp:99] Meet illegal tzdata file: iso3166.tab. skipped
W20260308 10:22:22.444608 292 timezone_utils.cpp:99] Meet illegal tzdata file: leap-seconds.list. skipped
W20260308 10:22:22.445037 292 timezone_utils.cpp:99] Meet illegal tzdata file: leapseconds. skipped
W20260308 10:22:22.446023 292 timezone_utils.cpp:99] Meet illegal tzdata file: tzdata.zi. skipped
W20260308 10:22:22.446455 292 timezone_utils.cpp:99] Meet illegal tzdata file: zone.tab. skipped
W20260308 10:22:22.446887 292 timezone_utils.cpp:99] Meet illegal tzdata file: zone1970.tab. skipped
W20260308 10:22:22.447294 292 timezone_utils.cpp:99] Meet illegal tzdata file: zonenow.tab. skipped
E20260308 10:22:22.460981 292 variable.cpp:179] Already exposed doris_cache_data_page_cache' whose value is 0'
E20260308 10:22:22.461067 292 variable.cpp:179] Already exposed doris_cache_data_page_cache_persecond' whose value is 0'
E20260308 10:22:22.461370 292 variable.cpp:179] Already exposed doris_cache_index_page_cache' whose value is 0'
E20260308 10:22:22.461418 292 variable.cpp:179] Already exposed doris_cache_index_page_cache_persecond' whose value is 0'
E20260308 10:22:22.461577 292 variable.cpp:179] Already exposed doris_cache_pkindex_page_cache' whose value is 0'
E20260308 10:22:22.461594 292 variable.cpp:179] Already exposed doris_cache_pkindex_page_cache_persecond' whose value is 0'
E20260308 10:22:22.461699 292 variable.cpp:179] Already exposed doris_cache_point_query_row_cache' whose value is 0'
E20260308 10:22:22.461719 292 variable.cpp:179] Already exposed doris_cache_point_query_row_cache_persecond' whose value is 0'
E20260308 10:22:22.461794 292 variable.cpp:179] Already exposed doris_cache_segment_cache' whose value is 0'
E20260308 10:22:22.461807 292 variable.cpp:179] Already exposed doris_cache_segment_cache_persecond' whose value is 0'
E20260308 10:22:22.461861 292 variable.cpp:179] Already exposed doris_cache_schema_cache' whose value is 0'
E20260308 10:22:22.461875 292 variable.cpp:179] Already exposed doris_cache_schema_cache_persecond' whose value is 0'
E20260308 10:22:22.461941 292 variable.cpp:179] Already exposed doris_cache_common_obj_lrucache' whose value is 0'
E20260308 10:22:22.461962 292 variable.cpp:179] Already exposed doris_cache_common_obj_lrucache_persecond' whose value is 0'
E20260308 10:22:22.462034 292 variable.cpp:179] Already exposed doris_cache_point_query_lookup_connection_cache' whose value is 0'
E20260308 10:22:22.462061 292 variable.cpp:179] Already exposed doris_cache_point_query_lookup_connection_cache_persecond' whose value is 0'
E20260308 10:22:22.462263 292 variable.cpp:179] Already exposed doris_cache_inverted_index_searcher_cache' whose value is 0'
E20260308 10:22:22.462302 292 variable.cpp:179] Already exposed doris_cache_inverted_index_searcher_cache_persecond' whose value is 0'
E20260308 10:22:22.462481 292 variable.cpp:179] Already exposed doris_cache_inverted_index_query_cache' whose value is 0'
E20260308 10:22:22.462513 292 variable.cpp:179] Already exposed doris_cache_inverted_index_query_cache_persecond' whose value is 0'
E20260308 10:22:22.462613 292 variable.cpp:179] Already exposed doris_cache_query_cache' whose value is 0'
E20260308 10:22:22.462642 292 variable.cpp:179] Already exposed doris_cache_query_cache_persecond' whose value is 0'
E20260308 10:22:22.462821 292 variable.cpp:179] Already exposed doris_cache_mow_delete_bitmap_agg_cache' whose value is 0'
E20260308 10:22:22.462854 292 variable.cpp:179] Already exposed doris_cache_mow_delete_bitmap_agg_cache_persecond' whose value is 0'
0# 0x0000560E52F45C35 in /opt/apache-doris/be/lib/doris_be
1# 0x00007FF08B1C1520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
3# raise in /lib/x86_64-linux-gnu/libc.so.6
4# abort in /lib/x86_64-linux-gnu/libc.so.6
5# 0x0000560E5D48C84F in /opt/apache-doris/be/lib/doris_be
6# __cxxabiv1::__terminate(void (*)()) in /opt/apache-doris/be/lib/doris_be
7# 0x0000560E5D48AD09 in /opt/apache-doris/be/lib/doris_be
8# 0x0000560E5D48AE53 in /opt/apache-doris/be/lib/doris_be
9# std::__throw_invalid_argument(char const*) in /opt/apache-doris/be/lib/doris_be
10# doris::io::FSFileCacheStorage::load_cache_info_into_memory(doris::io::BlockFileCache*) const in /opt/apache-doris/be/lib/doris_be
11# 0x0000560E53045DB0 in /opt/apache-doris/be/lib/doris_be
12# 0x0000560E5D520390 in /opt/apache-doris/be/lib/doris_be
13# 0x00007FF08B213AC3 in /lib/x86_64-linux-gnu/libc.so.6
14# __clone in /lib/x86_64-linux-gnu/libc.so.6

/opt/apache-doris/be/bin/start_be.sh: line 600: 292 Aborted (core dumped) ${LIMIT:+${LIMIT}} "${DORIS_HOME}/lib/doris_be" "$@" 2>> "${LOG_DIR}/be.out" < /dev/null
[Sun Mar 8 10:22:52 CST 2026] run post_exit
[Sun Mar 8 10:22:52 CST 2026] [info] read 'be.conf' config [ LOG_DIR: ]

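It looks like the crash happens while loading the file cache, but I checked that the cache directory is mounted correctly and the files under it look normal. What could be causing this?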
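One possible reading of the trace: frame 9 is std::__throw_invalid_argument, called from FSFileCacheStorage::load_cache_info_into_memory. libstdc++ raises it from, for example, the std::stoi / std::stoul family when a string contains no convertible digits, so some file or directory name under the cache path may have failed numeric parsing. That is only an inference from the stack, not something confirmed here. Below is a minimal Python sketch for looking for such names. The directory layout in the docstring, the two function names, and the idea that numeric fields sit at those positions are all my assumptions and have not been checked against the 4.0.3 source.

```python
import os
import sys


def parses_as_unsigned(token):
    """True if token is a non-empty run of ASCII decimal digits.

    This is stricter than C++ std::stoul, which also accepts leading
    whitespace, a sign, and trailing non-digit characters.
    """
    return token.isascii() and token.isdigit()


def suspicious_entries(cache_root):
    """Return sorted relative paths whose assumed numeric field is not numeric.

    Assumed layout (hypothetical, adjust to what is actually on disk):
      <cache_root>/<prefix>/<hash>_<expiration>/<offset>[_<suffix>]
    """
    flagged = []
    for dirpath, dirnames, filenames in os.walk(cache_root):
        rel = os.path.relpath(dirpath, cache_root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth == 1:
            # Directories under a prefix: field after the last "_" should be numeric.
            for d in dirnames:
                if "_" not in d or not parses_as_unsigned(d.rsplit("_", 1)[1]):
                    flagged.append(os.path.join(rel, d))
        elif depth == 2:
            # Files inside such a directory: field before the first "_" should be numeric.
            for f in filenames:
                if not parses_as_unsigned(f.split("_", 1)[0]):
                    flagged.append(os.path.join(rel, f))
    return sorted(flagged)


if __name__ == "__main__":
    # Usage: python scan_cache.py <path of the mounted cache directory>
    if len(sys.argv) == 2:
        for path in suspicious_entries(sys.argv[1]):
            print(path)
```

If it prints nothing, that only rules out this particular guess. The names it flags are candidates to inspect, not a confirmed cause.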