DDC on K8s: BE keeps restarting after upgrading from 3.0.3 to 3.0.4


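Has anyone run into BE restarting continuously after upgrading DDC from 3.0.3 to 3.0.4?
The cluster is deployed on K8s. FE and MS both start successfully; only BE fails, and it starts normally again after switching back to 3.0.3.
The error log is as follows: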

[Mon Apr  7 12:15:02 CST 2025] [info] the host machine support avx2 instruction set.
[Mon Apr  7 12:15:02 CST 2025] [info] /etc/doris not exist or not a directory, ignore ...
[Mon Apr  7 12:15:02 CST 2025] [info] use root no password show backends result 10041	dev-disaggregated-cluster-cg1-0.dev-disaggregated-cluster-cg1.hd-dev-doris-v1.svc.cluster.local	9050	9060	8040	8060	-1	2025-04-03 14:39:55	2025-04-07 12:14:57	true	false	252	0.000 	0.000 	1.000 B	0.000 	0.00 %	0.00 %	0.000 	{"cloud_unique_id" : "1:1837630261:COSUbH8v", "compute_group_status" : "NORMAL", "private_endpoint" : "", "compute_group_name" : "cg1", "location" : "default", "public_endpoint" : "", "compute_group_id" : "YeLW9pRV"}		doris-3.0.3-rc04-62a58bff4c	{"lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false,"isActive":true,"currentFragmentNum":0,"lastFragmentUpdateTime":1743998921243}	0	mix	8	80.00 GB
10042	dev-disaggregated-cluster-cg1-1.dev-disaggregated-cluster-cg1.hd-dev-doris-v1.svc.cluster.local	9050	9060	8040	8060	-1	2025-04-03 14:39:56	2025-04-07 12:14:57	true	false	254	0.000 	0.000 	1.000 B	0.000 	0.00 %	0.00 %	0.000 	{"cloud_unique_id" : "1:1837630261:ifUJOaqM", "compute_group_status" : "NORMAL", "private_endpoint" : "", "compute_group_name" : "cg1", "location" : "default", "public_endpoint" : "", "compute_group_id" : "YeLW9pRV"}		doris-3.0.3-rc04-62a58bff4c	{"lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false,"isActive":true,"currentFragmentNum":0,"lastFragmentUpdateTime":1743998921242}	0	mix	8	80.00 GB
10043	dev-disaggregated-cluster-cg1-2.dev-disaggregated-cluster-cg1.hd-dev-doris-v1.svc.cluster.local	9050	9060	8040	8060	-1	2025-04-03 14:40:01	2025-04-07 11:19:37	false	false	252	0.000 	0.000 	1.000 B	0.000 	0.00 %	0.00 %	0.000 	{"cloud_unique_id" : "1:1837630261:CTzo5IV4", "compute_group_status" : "NORMAL", "private_endpoint" : "", "compute_group_name" : "cg1", "location" : "default", "public_endpoint" : "", "compute_group_id" : "YeLW9pRV"}	java.net.UnknownHostException: dev-disaggregated-cluster-cg1-2.dev-disaggregated-cluster-cg1.hd-dev-doris-v1.svc.cluster.local		{"lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false,"isActive":true,"currentFragmentNum":0,"lastFragmentUpdateTime":0}	240		8	0.00  .
[Mon Apr  7 12:15:02 CST 2025] [info] Check myself (dev-disaggregated-cluster-cg1-2.dev-disaggregated-cluster-cg1.hd-dev-doris-v1.svc.cluster.local:9050) exist in FE, start be directly ...
/etc/podinfo/annotationsis not exists.
[Mon Apr  7 12:15:02 CST 2025] run start_be.sh
StdoutLogger 2025-04-07 12:15:02,697 Start time: Mon Apr  7 12:15:02 CST 2025
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250407 12:15:03.413686   684 config.cpp:1776] set config enable_file_cache true [OK]
I20250407 12:15:03.413869   684 config.cpp:1776] set config enable_file_cache true [OK]
RuntimeLogger I20250407 12:15:03.413899   684 doris_main.cpp:388]  version doris-3.0.4-rc02(AVX2) RELEASE (build git://vm-241@39f9074cec769a10a0a93658b35d16ce59630e1e)
Built on Fri, 21 Feb 2025 16:31:22 1Z by vm-241
OpenJDK 64-Bit Server VM warning: Option CriticalJNINatives was deprecated in version 16.0 and will likely be removed in a future release.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
RuntimeLogger I20250407 12:15:05.241397   684 doris_main.cpp:496] Doris backend JNI is initialized.
[WARNING!] /sys/kernel/mm/transparent_hugepage/enabled: [always] madvise never, Doris not recommend turning on THP, which may cause the BE process to use more memory and cannot be freed in time. Turn off THP: `echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled`
RuntimeLogger I20250407 12:15:05.244735   684 mem_info.cpp:378] Physical Memory: 2164198039552, BE Available Physical Memory(consider cgroup): 85899345920, Mem Limit: 72.00 GB, origin config value: 90%, System Mem Available Min Reserve: 4.00 GB, Vm Min Free KBytes: 179.58 MB, Vm Overcommit Memory: 1
RuntimeLogger I20250407 12:15:05.244786   684 doris_main.cpp:514] Cpu Info:
  Model: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Cores: 112
  Max Possible Cores: 112
  L1 Cache: 48.00 KB (Line: 64.00 B)
  L2 Cache: 1.25 MB (Line: 64.00 B)
  L3 Cache: 42.00 MB (Line: 64.00 B)
  Hardware Supports:
    ssse3
    sse4_1
    sse4_2
    popcnt
    avx
    avx2
  Numa Nodes: 2
  Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 | 8->0 | 9->0 | 10->0 | 11->0 | 12->0 | 13->0 | 14->0 | 15->0 | 16->0 | 17->0 | 18->0 | 19->0 | 20->0 | 21->0 | 22->0 | 23->0 | 24->0 | 25->0 | 26->0 | 27->0 | 28->1 | 29->1 | 30->1 | 31->1 | 32->1 | 33->1 | 34->1 | 35->1 | 36->1 | 37->1 | 38->1 | 39->1 | 40->1 | 41->1 | 42->1 | 43->1 | 44->1 | 45->1 | 46->1 | 47->1 | 48->1 | 49->1 | 50->1 | 51->1 | 52->1 | 53->1 | 54->1 | 55->1 | 56->0 | 57->0 | 58->0 | 59->0 | 60->0 | 61->0 | 62->0 | 63->0 | 64->0 | 65->0 | 66->0 | 67->0 | 68->0 | 69->0 | 70->0 | 71->0 | 72->0 | 73->0 | 74->0 | 75->0 | 76->0 | 77->0 | 78->0 | 79->0 | 80->0 | 81->0 | 82->0 | 83->0 | 84->1 | 85->1 | 86->1 | 87->1 | 88->1 | 89->1 | 90->1 | 91->1 | 92->1 | 93->1 | 94->1 | 95->1 | 96->1 | 97->1 | 98->1 | 99->1 | 100->1 | 101->1 | 102->1 | 103->1 | 104->1 | 105->1 | 106->1 | 107->1 | 108->1 | 109->1 | 110->1 | 111->1 |
RuntimeLogger I20250407 12:15:05.244817   684 doris_main.cpp:515] Disk Info: 
  Num disks 103: sdc, sda, sdb, sdd, dm-, loop, sde, sdf, sdg, sdh, sdag, sdah, sdai, sdaj, sdak, sdal, sdam, sdan, sdao, sdap, sdaq, sdar, sdaw, sdax, sday, sdaz, sdbe, sdbf, sdbg, sdbh, sdbi, sdbj, sdbk, sdbl, sdbq, sdbr, sdbs, sdbt, sdbu, sdbv, sdbw, sdbx, sdco, sdcp, sdcq, sdcr, sdcs, sdct, sdcu, sdcv, rbd, sdi, sdk, sdj, sdl, sdm, sdn, sdo, sdp, sdq, sdr, sds, sdt, sdu, sdv, sdw, sdx, sdy, sdz, sdaa, sdab, sdac, sdad, sdae, sdaf, sdas, sdat, sdau, sdav, sdba, sdbb, sdbc, sdbd, sdbm, sdbn, sdbo, sdbp, sdby, sdbz, sdca, sdcb, sdcc, sdcd, sdce, sdcf, sdcg, sdch, sdci, sdcj, sdck, sdcl, sdcm, sdcn
RuntimeLogger I20250407 12:15:05.244827   684 doris_main.cpp:516] Physical Memory: 85899345920
Memory Limt: 77309411328
CGroup Info: Process CGroup Memory Info (cgroups path: /sys/fs/cgroup, cgroup version: v2): memory limit: 85899345920, memory usage: 727432312
RuntimeLogger I20250407 12:15:05.245234   684 backend_options.cpp:62] local host ip=172.51.21.60
RuntimeLogger W20250407 12:15:05.346850   684 timezone_utils.cpp:98] Meet illegal tzdata file: iso3166.tab. skipped
RuntimeLogger W20250407 12:15:05.346975   684 timezone_utils.cpp:98] Meet illegal tzdata file: leap-seconds.list. skipped
RuntimeLogger W20250407 12:15:05.346998   684 timezone_utils.cpp:98] Meet illegal tzdata file: leapseconds. skipped
RuntimeLogger W20250407 12:15:05.347478   684 timezone_utils.cpp:98] Meet illegal tzdata file: tzdata.zi. skipped
RuntimeLogger W20250407 12:15:05.347504   684 timezone_utils.cpp:98] Meet illegal tzdata file: zone.tab. skipped
RuntimeLogger W20250407 12:15:05.347525   684 timezone_utils.cpp:98] Meet illegal tzdata file: zone1970.tab. skipped
RuntimeLogger W20250407 12:15:05.347550   684 timezone_utils.cpp:98] Meet illegal tzdata file: zonenow.tab. skipped
RuntimeLogger I20250407 12:15:05.347810   684 timezone_utils.cpp:115] Preloaded657 timezones.
RuntimeLogger I20250407 12:15:05.408318   684 cgroup_cpu_ctl.cpp:68] [cgroup_init_path]doris cgroup home path is not specify, if you not use workload group, you can ignore this log.
RuntimeLogger I20250407 12:15:05.408483   684 block_file_cache_factory.cpp:100] The cache /opt/apache-doris/be/file_cache config size 0 is larger than 88% disk size 185485668515 or zero, recalc it.
RuntimeLogger I20250407 12:15:05.408506   684 block_file_cache_factory.cpp:108] [FileCache] path: /opt/apache-doris/be/file_cache total_size: 185485668515 disk_total_size: 185485668515
RuntimeLogger I20250407 12:15:05.409040   684 block_file_cache.cpp:238] file cache path= /opt/apache-doris/be/file_cache capacity: 185485668515, max_file_block_size: 1048576, max_query_cache_size: 0, disposable_queue_size: 9274283425, disposable_queue_elements: 102400, index_queue_size: 9274283425, index_queue_elements: 102400, ttl_queue_size: 92742834250, ttl_queue_elements: 102400, query_queue_size: 74194267415, query_queue_elements: 102400, storage: disk
RuntimeLogger I20250407 12:15:05.416416   856 fs_file_cache_storage.cpp:109] FileCache /opt/apache-doris/be/file_cache lazy load done.
RuntimeLogger I20250407 12:15:05.417874   684 exec_env_init.cpp:362] pipeline executors_size set |size=112
RuntimeLogger I20250407 12:15:05.417891   860 block_file_cache.cpp:1876] Starting background evict in advance thread
RuntimeLogger I20250407 12:15:05.441609   684 task_scheduler.cpp:61] TaskScheduler set cores|size=112
RuntimeLogger I20250407 12:15:05.443812   974 fragment_mgr.cpp:976] FragmentMgr cancel worker start working.
RuntimeLogger I20250407 12:15:05.786084   684 load_path_mgr.cpp:67] Load path configured to []
RuntimeLogger I20250407 12:15:05.786890  1007 result_buffer_mgr.cpp:192] result buffer manager cancel thread begin.
RuntimeLogger I20250407 12:15:05.813710   684 cache_manager.h:47] Register Cache DataPageCache
RuntimeLogger E20250407 12:15:05.813846   684 variable.cpp:179] Already exposed `doris_cache_data_page_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.813879   684 variable.cpp:179] Already exposed `doris_cache_data_page_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.813905   684 cache_manager.h:47] Register Cache IndexPageCache
RuntimeLogger E20250407 12:15:05.814010   684 variable.cpp:179] Already exposed `doris_cache_index_page_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814018   684 variable.cpp:179] Already exposed `doris_cache_index_page_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814025   684 cache_manager.h:47] Register Cache PKIndexPageCache
RuntimeLogger E20250407 12:15:05.814128   684 variable.cpp:179] Already exposed `doris_cache_pkindex_page_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814136   684 variable.cpp:179] Already exposed `doris_cache_pkindex_page_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814142   684 exec_env_init.cpp:476] Storage page cache memory limit: 14.40 GB, origin config value: 20%
RuntimeLogger I20250407 12:15:05.814163   684 cache_manager.h:47] Register Cache PointQueryRowCache
RuntimeLogger E20250407 12:15:05.814224   684 variable.cpp:179] Already exposed `doris_cache_point_query_row_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814232   684 variable.cpp:179] Already exposed `doris_cache_point_query_row_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814242   684 exec_env_init.cpp:489] Row cache memory limit: 14.40 GB, origin config value: 20%
RuntimeLogger I20250407 12:15:05.814250   684 cache_manager.h:47] Register Cache SegmentCache
RuntimeLogger E20250407 12:15:05.814288   684 variable.cpp:179] Already exposed `doris_cache_segment_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814296   684 variable.cpp:179] Already exposed `doris_cache_segment_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814302   684 exec_env_init.cpp:514] segment_cache_capacity <= fd_number * 1 / 5, fd_number: 1048576 segment_cache_capacity: 209700 min_segment_cache_mem_limit 3865470565
RuntimeLogger I20250407 12:15:05.814311   684 cache_manager.h:47] Register Cache SchemaCache
RuntimeLogger E20250407 12:15:05.814342   684 variable.cpp:179] Already exposed `doris_cache_schema_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814348   684 variable.cpp:179] Already exposed `doris_cache_schema_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814354   684 exec_env_init.cpp:522] max file reader cache size is: 349525, resource hard limit is: 1048576, config file_cache_max_file_reader_cache_size is: 1000000
RuntimeLogger I20250407 12:15:05.814410   684 cache_manager.h:47] Register Cache CommonObjLRUCache
RuntimeLogger E20250407 12:15:05.814450   684 variable.cpp:179] Already exposed `doris_cache_common_obj_lrucache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814456   684 variable.cpp:179] Already exposed `doris_cache_common_obj_lrucache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814464   684 cache_manager.h:47] Register Cache PointQueryLookupConnectionCache
RuntimeLogger E20250407 12:15:05.814497   684 variable.cpp:179] Already exposed `doris_cache_point_query_lookup_connection_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814504   684 variable.cpp:179] Already exposed `doris_cache_point_query_lookup_connection_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814517   684 inverted_index_cache.cpp:59] fd_number: 1048576, inverted index open searcher limit: 209715
RuntimeLogger I20250407 12:15:05.814522   684 cache_manager.h:47] Register Cache InvertedIndexSearcherCache
RuntimeLogger E20250407 12:15:05.814615   684 variable.cpp:179] Already exposed `doris_cache_inverted_index_searcher_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814626   684 variable.cpp:179] Already exposed `doris_cache_inverted_index_searcher_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814635   684 exec_env_init.cpp:543] Inverted index searcher cache memory limit: 7.20 GB, origin config value: 10%
RuntimeLogger I20250407 12:15:05.814642   684 cache_manager.h:47] Register Cache InvertedIndexQueryCache
RuntimeLogger E20250407 12:15:05.814738   684 variable.cpp:179] Already exposed `doris_cache_inverted_index_query_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814745   684 variable.cpp:179] Already exposed `doris_cache_inverted_index_query_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814752   684 exec_env_init.cpp:557] Inverted index query match cache memory limit: 7.20 GB, origin config value: 10%
RuntimeLogger I20250407 12:15:05.814757   684 cache_manager.h:47] Register Cache QueryCache
RuntimeLogger E20250407 12:15:05.814781   684 variable.cpp:179] Already exposed `doris_cache_query_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814788   684 variable.cpp:179] Already exposed `doris_cache_query_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814795   684 exec_env_init.cpp:566] query cache memory limit: 512MB
RuntimeLogger I20250407 12:15:05.814805   684 cache_manager.h:47] Register Cache LastSuccessChannelCache
RuntimeLogger E20250407 12:15:05.814831   684 variable.cpp:179] Already exposed `doris_cache_last_success_channel_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.814865   684 variable.cpp:179] Already exposed `doris_cache_last_success_channel_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.814929   684 wal_manager.cpp:116] wal_dir:/opt/apache-doris/be/storage/wal, tmp_dir:/opt/apache-doris/be/storage/wal/tmp
RuntimeLogger I20250407 12:15:05.815239   684 cache_manager.h:47] Register Cache TabletSchemaCache
RuntimeLogger I20250407 12:15:05.815253  1466 wal_manager.cpp:479] Sleep 1s to wait for storage engine init.
RuntimeLogger E20250407 12:15:05.815289   684 variable.cpp:179] Already exposed `doris_cache_tablet_schema_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.815304   684 variable.cpp:179] Already exposed `doris_cache_tablet_schema_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.815315   684 cache_manager.h:47] Register Cache TabletColumnObjectPool
RuntimeLogger E20250407 12:15:05.815337   684 variable.cpp:179] Already exposed `doris_cache_tablet_column_object_pool' whose value is `0'
RuntimeLogger E20250407 12:15:05.815346   684 variable.cpp:179] Already exposed `doris_cache_tablet_column_object_pool_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.815436   684 exec_env_init.cpp:634] The file deploy_mode doesn't exist, create it.
start BE in cloud mode, cloud_unique_id: , meta_service_endpoint: 
RuntimeLogger I20250407 12:15:05.815493   684 cache_manager.h:47] Register Cache CloudTabletCache
RuntimeLogger E20250407 12:15:05.815521   684 variable.cpp:179] Already exposed `doris_cache_cloud_tablet_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.815531   684 variable.cpp:179] Already exposed `doris_cache_cloud_tablet_cache_persecond' whose value is `0'
RuntimeLogger W20250407 12:15:05.815559   684 cloud_storage_engine.cpp:306] failed to get storage vault info. err=[INVALID_ARGUMENT]Meta service endpoint is empty. Please configure manually or wait for heartbeat to obtain.
RuntimeLogger I20250407 12:15:05.816473   684 cache_manager.h:47] Register Cache CloudTxnDeleteBitmapCache
RuntimeLogger E20250407 12:15:05.816515   684 variable.cpp:179] Already exposed `doris_cache_cloud_txn_delete_bitmap_cache' whose value is `0'
RuntimeLogger E20250407 12:15:05.816529   684 variable.cpp:179] Already exposed `doris_cache_cloud_txn_delete_bitmap_cache_persecond' whose value is `0'
RuntimeLogger I20250407 12:15:05.817157   684 storage_engine.cpp:134] stream load record path: /opt/apache-doris/be/storage
RuntimeLogger I20250407 12:15:05.822523   684 cloud_storage_engine.cpp:241] refresh s3 info thread started
RuntimeLogger I20250407 12:15:05.822577   684 cloud_storage_engine.cpp:247] vacuum stale rowsets thread started
RuntimeLogger I20250407 12:15:05.822618   684 cloud_storage_engine.cpp:252] sync tablets thread started
RuntimeLogger I20250407 12:15:05.822659   684 cloud_storage_engine.cpp:258] evict quering thread started
RuntimeLogger I20250407 12:15:05.825286   684 cloud_storage_engine.cpp:283] compaction tasks producer thread started, base thread num 4 cumu thread num 20
RuntimeLogger I20250407 12:15:05.825320  1617 cloud_storage_engine.cpp:409] try to start compaction producer process!
RuntimeLogger I20250407 12:15:05.825347   684 cloud_storage_engine.cpp:290] lease compaction thread started
RuntimeLogger I20250407 12:15:05.825358  1617 cloud_tablet_mgr.cpp:357] get_topn_compaction_score, n=8 type=2 num_tablets=0 num_skipped=0 num_disabled=0 num_filtered=0 max_score=0 max_score_tablet=0 tablets=[]
RuntimeLogger I20250407 12:15:05.825388   684 cloud_storage_engine.cpp:296] check tablet delete bitmap score thread started
RuntimeLogger I20250407 12:15:05.825425  1619 cloud_storage_engine.cpp:793] try to start check tablet delete bitmap score!
RuntimeLogger I20250407 12:15:05.825428   684 workload_sched_policy_mgr.cpp:35] start workload scheduler 
RuntimeLogger I20250407 12:15:05.825436   684 spill_stream_manager.cpp:51] init spill stream manager
RuntimeLogger I20250407 12:15:05.825486   684 spill_stream_manager.cpp:322] spill storage path: /opt/apache-doris/be/storage, capacity: 892.81 GB, limit: 160.71 GB, available: 315.15 GB
RuntimeLogger I20250407 12:15:05.837584   684 spill_stream_manager.cpp:85] spill gc thread started
RuntimeLogger I20250407 12:15:05.837786   684 agent_server.cpp:59] Register workload group listener
RuntimeLogger I20250407 12:15:05.837795   684 agent_server.cpp:65] Register workload scheduler policy listener
RuntimeLogger I20250407 12:15:05.838698   684 cloud_backend_service.cpp:57] Doris CloudBackendService listening on 9060
RuntimeLogger I20250407 12:15:05.838898   684 thrift_server.cpp:404] ThriftServer 'backend' started on port: 9060
RuntimeLogger I20250407 12:15:05.939824   684 brpc_service.cpp:92] BRPC server bind to host: 0.0.0.0, port: 8060
RuntimeLogger W20250407 12:15:05.959290   684 task_control.cpp:202] Fail to create _workers[241], Resource temporarily unavailable
RuntimeLogger I20250407 12:15:05.960184   684 server.cpp:1106] Server[doris::CloudInternalServiceImpl] is serving on port=8060.
RuntimeLogger I20250407 12:15:05.960196   684 server.cpp:1109] Check out http://dev-disaggregated-cluster-cg1-2:8060 in web browser.
RuntimeLogger W20250407 12:15:06.792353   684 status.h:424] meet error status: [RUNTIME_ERROR]Could not create thread. (error 11) Resource temporarily unavailable

	0#  doris::Thread::start_thread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&, unsigned long, scoped_refptr<doris::Thread>*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:445
	1#  doris::ThreadPool::create_thread() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
	2#  doris::ThreadPool::init() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:502
	3#  doris::Status doris::ThreadPoolBuilder::build<doris::ThreadPool>(std::unique_ptr<doris::ThreadPool, std::default_delete<doris::ThreadPool> >*) const at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:502
	4#  doris::EvHttpServer::start() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
	5#  doris::HttpService::start() at /home/zcp/repo_center/doris_release/doris/be/src/service/http_service.cpp:0
	6#  main at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:389
	7#  ?
	8#  __libc_start_main
	9#  _start
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1743999306 (unix time) try "date -d @1743999306" if you are using GNU date ***
*** Current BE git commitID: 39f9074cec ***
*** SIGSEGV address not mapped to object (@0x8) received by PID 684 (TID 684 OR 0x7fb2013c1a00) from PID 8; stack trace: ***
RuntimeLogger I20250407 12:15:06.815495  1466 wal_manager.cpp:485] Scheduled(every 10s) WAL info: [/opt/apache-doris/be/storage/wal: limit 33838706688 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 33838706688 Bytes.];
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java/lib/server/libjvm.so
 3# 0x00007FB201517520 in /lib/x86_64-linux-gnu/libc.so.6
 4# std::_Hashtable<std::shared_ptr<doris::MetricEntity>, std::pair<std::shared_ptr<doris::MetricEntity> const, int>, std::allocator<std::pair<std::shared_ptr<doris::MetricEntity> const, int> >, std::__detail::_Select1st, doris::MetricEntityEqualTo, doris::MetricEntityHash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::shared_ptr<doris::MetricEntity> const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/hashtable.h:1570
 5# doris::MetricRegistry::deregister_entity(std::shared_ptr<doris::MetricEntity> const&) in /opt/apache-doris/be/lib/doris_be
 6# doris::ThreadPool::shutdown() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:330
 7# doris::ThreadPool::init() in /opt/apache-doris/be/lib/doris_be
 8# doris::Status doris::ThreadPoolBuilder::build<doris::ThreadPool>(std::unique_ptr<doris::ThreadPool, std::default_delete<doris::ThreadPool> >*) const at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.h:125
 9# doris::EvHttpServer::start() at /home/zcp/repo_center/doris_release/doris/be/src/http/ev_http_server.cpp:116
10# doris::HttpService::start() in /opt/apache-doris/be/lib/doris_be
11# main at /home/zcp/repo_center/doris_release/doris/be/src/service/doris_main.cpp:576
12# 0x00007FB2014FED90 in /lib/x86_64-linux-gnu/libc.so.6
13# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
14# _start in /opt/apache-doris/be/lib/doris_be

/opt/apache-doris/be/bin/start_be.sh: line 435:   684 Segmentation fault      (core dumped) ${LIMIT:+${LIMIT}} "${DORIS_HOME}/lib/doris_be" "$@" 2>&1 < /dev/null
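
The first failure in the log above is `Could not create thread. (error 11) Resource temporarily unavailable`. On Linux, errno 11 is EAGAIN, which `pthread_create` returns when a thread or process limit has been reached. The SIGSEGV that follows is raised in `ThreadPool::shutdown()` called from `ThreadPool::init()`, so it looks like a consequence of the failed thread creation rather than a separate problem. One thing that may be worth checking from inside the BE container (a possible cause, not a confirmed one) is the cgroup v2 `pids` limit and the per-user process limit. Below is a minimal Python sketch, assuming Python is available in the image and that the pids controller files are visible under `/sys/fs/cgroup` (the cgroup path reported in the log). The helper names are illustrative only.

```python
import resource
from pathlib import Path
from typing import Optional

# cgroup v2 mount point, as reported in the BE log ("cgroups path: /sys/fs/cgroup").
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_limit(text: str) -> Optional[int]:
    """Parse a cgroup v2 limit value; the literal 'max' means unlimited (None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def headroom(current: int, limit: Optional[int]) -> Optional[int]:
    """Return how many more tasks may be created, or None if there is no limit."""
    return None if limit is None else max(limit - current, 0)


def read_limit_file(path: Path) -> Optional[int]:
    """Read a single-value cgroup file; None means 'max' or file not readable."""
    try:
        return parse_limit(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    pids_max = read_limit_file(CGROUP_ROOT / "pids.max")
    pids_current = read_limit_file(CGROUP_ROOT / "pids.current")
    print("pids.max     :", pids_max if pids_max is not None else "max or unavailable")
    print("pids.current :", pids_current if pids_current is not None else "unavailable")
    if pids_current is not None:
        print("headroom     :", headroom(pids_current, pids_max))

    # Per-user process/thread limit (what `ulimit -u` shows).
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    unlimited = resource.RLIM_INFINITY
    print("RLIMIT_NPROC :", "unlimited" if soft == unlimited else soft,
          "/", "unlimited" if hard == unlimited else hard)
```

If `pids.max` turns out to be a small number compared with the threads BE tries to create at startup, that would be consistent with the EAGAIN above. The same files can also be read with `cat` if Python is not installed.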

The DDC configuration is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  namespace: hd-dev-doris-v1
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    # For jdk 17, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_17="-Xmx1024m -DlogPath=$LOG_DIR/jni.log -Xlog:gc*:$LOG_DIR/be.gc.log.$CUR_DATE:time,uptime:filecount=10,filesize=50M -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED"
    file_cache_path = [{"path":"/mnt/disk1/doris_cloud/file_cache","total_size":207374182400,"query_limit":207374182400}]
---
apiVersion: disaggregated.cluster.doris.com/v1
kind: DorisDisaggregatedCluster
metadata:
  name: dev-disaggregated-cluster
  namespace: hd-dev-doris-v1
spec:
  metaService:
    image: apache/doris:ms-3.0.3
    envVars:
      - name: TZ
        value: Asia/Shanghai
    requests:
      cpu: 4
      memory: 4Gi
    limits:
      cpu: 4
      memory: 4Gi
    fdb:
      configMapNamespaceName:
        name: fdb-dev-cluster-config
        namespace: hd-dev-doris-v1
  feSpec:
    replicas: 2
    image: apache/doris:fe-3.0.3
    envVars:
      - name: TZ
        value: Asia/Shanghai
    requests:
      cpu: 6
      memory: 80Gi
    limits:
      cpu: 6
      memory: 80Gi
    service:
      type: NodePort
      portMaps:
        - nodePort: 30830
          targetPort: 8030
        - nodePort: 30920
          targetPort: 9020
        - nodePort: 30930
          targetPort: 9030
        - nodePort: 30910
          targetPort: 9010
    persistentVolume:
      persistentVolumeClaimSpec:
        storageClassName: netapp-iscsi
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  computeGroups:
    - uniqueId: cg1
      replicas: 3
      image: apache/doris:be-3.0.3
      envVars:
      - name: TZ
        value: Asia/Shanghai
      requests:
        cpu: 8
        memory: 80Gi
      limits:
        cpu: 8
        memory: 80Gi
      persistentVolume:
        # logNotStore: true
        persistentVolumeClaimSpec:
          storageClassName: ocs-storagecluster-ceph-rbd
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 200Gi
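
The compute group above is limited to `cpu: 8`, while the BE log reports `Cores: 112` and `pipeline executors_size set |size=112`. This suggests that at least some thread pools are sized from the host core count rather than the container quota. The following Python sketch puts the two numbers side by side. It assumes a cgroup v2 `cpu.max` file at `/sys/fs/cgroup/cpu.max`, and the function name is made up for illustration.

```python
import os
from pathlib import Path
from typing import Optional

# cgroup v2 CPU quota file; the format is "<quota> <period>" or "max <period>".
CPU_MAX = Path("/sys/fs/cgroup/cpu.max")


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU quota in cores, or None when the quota is 'max' (unlimited)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    host_cores = os.cpu_count()
    try:
        quota_cores = parse_cpu_max(CPU_MAX.read_text())
    except (OSError, ValueError):
        quota_cores = None  # file missing, unreadable, or not in the expected format
    print("cores visible to the process:", host_cores)
    print("cgroup CPU quota (cores)    :",
          quota_cores if quota_cores is not None else "unlimited or unavailable")
```

A large gap between the two values (for example 112 versus 8) means that anything sized by visible cores will create far more threads than the CPU limit implies, which makes a thread or pid limit easier to hit.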
1 Answer

Could you share the complete BE log? You can add me on WeChat (hhj_0530) and I'll take a look.
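
When sharing a long BE log, it can help to pull out the warning and error lines first. The sketch below is a small Python filter for glog-style lines like the ones in the question, with an optional `RuntimeLogger` prefix, a severity letter, date, time, thread id and `file:line]`. The regular expression is inferred from the log lines shown in this thread, not taken from any Doris tooling, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g.:
# RuntimeLogger W20250407 12:15:05.959290   684 task_control.cpp:202] message
GLOG_LINE = re.compile(
    r"^(?:RuntimeLogger\s+)?"          # optional prefix seen in the BE stdout log
    r"([IWEF])(\d{8}) "                # severity letter and date (YYYYMMDD)
    r"(\d{2}:\d{2}:\d{2}\.\d{6})\s+"   # time with microseconds
    r"(\d+) "                          # thread id
    r"([^\]]+)\] ?(.*)$"               # source file:line, then the message
)


def filter_log(lines: Iterable[str], levels: str = "WEF") -> List[Tuple[str, str, str, str]]:
    """Return (severity, time, source, message) for lines whose severity is in `levels`."""
    selected = []
    for line in lines:
        match = GLOG_LINE.match(line.rstrip("\n"))
        if match and match.group(1) in levels:
            severity, _date, time, _tid, source, message = match.groups()
            selected.append((severity, time, source, message))
    return selected


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python filter_log.py < be.out
    for severity, time, source, message in filter_log(sys.stdin):
        print(severity, time, source, message)
```

Stack-trace lines do not match the pattern and are skipped, so the full log is still needed for the actual crash frames. The filter is only meant to give a quick overview.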