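The cluster currently has four BE nodes, about 700,000 tablets, and roughly 3 TB of stored data (including three replicas). After a BE restart, the following message is logged repeatedly for a while:
I20250211 13:05:59.079993 53302 wal_manager.cpp:480] Sleep 1s to wait for storage engine init.
During this period the BE is still available, but shortly afterwards it reports insufficient memory and then crashes. Memory was increased from 64 GB to 90 GB, but the BE seems to use however much it is given. mem_limit is set to 70%. No workload configuration has been set up; it is left at the defaults.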
W20250211 13:33:30.205469 63248 internal_service.cpp:338] exec plan fragment failed, errmsg=[MEM_ALLOC_FAILED]Create Expr failed because [E11] Allocator sys memory check failed: Cannot alloc:64, consuming tracker:<Load#Id=38d11b823906491c-80453db25555dcdc>, peak used 0, current used 0, exec node:<>, process memory used 86.61 GB exceed limit 62.04 GB or sys available memory 93.68 MB less than low water mark 4.43 GB.
0# doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173
1# Allocator<false, false, false, DefaultMemoryAllocator>::sys_memory_check(unsigned long) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/allocator.cpp:76
2# Allocator<false, false, false, DefaultMemoryAllocator>::alloc_impl(unsigned long, unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/allocator.cpp:187
3# doris::vectorized::ColumnVector<unsigned long>::reserve(unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/pod_array.h:143
4# doris::vectorized::IDataType::create_column_const(unsigned long, doris::vectorized::Field const&) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/cow.h:198
5# doris::vectorized::VLiteral::init(doris::TExprNode const&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/cow.h:143
6# std::_Sp_counted_ptr_inplace<doris::vectorized::VLiteral, std::allocator<doris::vectorized::VLiteral>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::TExprNode const&>(std::allocator<doris::vectorized::VLiteral>, doris::TExprNode const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:521
7# doris::vectorized::VExpr::create_expr(doris::TExprNode const&, std::shared_ptr<doris::vectorized::VExpr>&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:565
8# doris::vectorized::VExpr::create_tree_from_thrift(std::vector<doris::TExprNode, std::allocator<doris::TExprNode> > const&, int*, std::shared_ptr<doris::vectorized::VExpr>&, std::shared_ptr<doris::vectorized::VExprContext>&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
9# doris::vectorized::VExpr::create_expr_tree(doris::TExpr const&, std::shared_ptr<doris::vectorized::VExprContext>&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
10# doris::vectorized::VExpr::create_expr_trees(std::vector<doris::TExpr, std::allocator<doris::TExpr> > const&, std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext> > >&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
11# doris::pipeline::UnionSourceOperatorX::init(doris::TPlanNode const&, doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
12# doris::pipeline::PipelineXFragmentContext::_create_tree_helper(doris::ObjectPool*, std::vector<doris::TPlanNode, std::allocator<doris::TPlanNode> > const&, doris::TPipelineFragmentParams const&, doris::DescriptorTbl const&, std::shared_ptr<doris::pipeline::OperatorXBase>, int*, std::shared_ptr<doris::pipeline::OperatorXBase>*, std::shared_ptr<doris::pipeline::Pipeline>&, int, bool) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
13# doris::pipeline::PipelineXFragmentContext::_build_pipelines(doris::ObjectPool*, doris::TPipelineFragmentParams const&, doris::DescriptorTbl const&, std::shared_ptr<doris::pipeline::OperatorXBase>*, std::shared_ptr<doris::pipeline::Pipeline>) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/pipeline_x_fragment_context.cpp:761
14# doris::pipeline::PipelineXFragmentContext::prepare(doris::TPipelineFragmentParams const&, doris::ThreadPool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/pipeline_x_fragment_context.cpp:262
15# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:491
16# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
17# doris::PInternalServiceImpl::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
18# doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:378
19# doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
20# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
21# start_thread
22# __clone
I20250211 13:33:30.209805 63271 internal_service.cpp:634] Cancel query 38d11b823906491c-80453db25555dcdc, reason: INTERNAL_ERROR
W20250211 13:33:30.209880 63271 fragment_mgr.cpp:1252] Could not find the query id:38d11b823906491c-80453db25555dcdc fragment id:0 to cancel
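To make the numbers in the MEM_ALLOC_FAILED message easier to read, here is a small sketch that pulls the four figures out of the message and checks which of the two conditions was hit. It is not part of Doris: the regex, the function names, and the assumption that the limit equals mem_limit (70%) times physical memory are only for illustration.

```python
import re

# Unit multipliers; treating the log units as binary (1 GB = 1024**3 bytes) is an assumption.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches the tail of the "Allocator sys memory check failed" message quoted above.
PATTERN = re.compile(
    r"process memory used (?P<used>[\d.]+) (?P<used_u>[KMGT]?B) "
    r"exceed limit (?P<limit>[\d.]+) (?P<limit_u>[KMGT]?B) "
    r"or sys available memory (?P<avail>[\d.]+) (?P<avail_u>[KMGT]?B) "
    r"less than low water mark (?P<lwm>[\d.]+) (?P<lwm_u>[KMGT]?B)"
)


def to_bytes(value: str, unit: str) -> float:
    """Convert a number and unit from the log into bytes."""
    return float(value) * UNITS[unit]


def explain(message: str, mem_limit_fraction: float = 0.70) -> dict:
    """Report which memory condition in the message is violated.

    mem_limit_fraction is the 70% mentioned in the post; the implied total
    is only meaningful if the limit really is that fraction of physical memory.
    """
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("message does not contain the expected memory figures")
    used = to_bytes(m["used"], m["used_u"])
    limit = to_bytes(m["limit"], m["limit_u"])
    avail = to_bytes(m["avail"], m["avail_u"])
    lwm = to_bytes(m["lwm"], m["lwm_u"])
    return {
        "over_limit": used > limit,
        "below_low_water_mark": avail < lwm,
        "over_limit_by_gb": (used - limit) / UNITS["GB"],
        "implied_total_gb": limit / mem_limit_fraction / UNITS["GB"],
    }


msg = (
    "process memory used 86.61 GB exceed limit 62.04 GB or sys available "
    "memory 93.68 MB less than low water mark 4.43 GB."
)
result = explain(msg)
print(result)
```

For the message above, both conditions hold: the process is about 24.6 GB over the limit, and available system memory is far below the low water mark. The 62.04 GB limit divided by 0.70 gives about 88.6 GB, which is consistent with a 90 GB machine, so the limit appears to be computed as expected while the process itself has grown well past it.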