Doris version: 2.1.8
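Cluster environment: 3 FE nodes and 7 BE nodes, all virtual machines. Each node has 16 cores, 32 GB of memory, and a 10 TB disk. The 3 FE nodes are co-located with BE nodes; the other 4 BE nodes are deployed on their own.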
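Initially, 1 or 2 random BE nodes would crash at around 2:00 AM on Saturday or Sunday nights. The application's data collection jobs are concentrated in the 01:30-03:00 window. I checked be.out and found no crash information, and the INFO and WARNING logs contained no specific errors either. After I upgraded the cluster to 2.1.11 through doris-manager, 6 BE nodes all crashed and restarted at 2:00 AM. The be.out log is as follows: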
*** Query id: 1c2e81dfe754dc3-aa6a44613e368e0b ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1764612023 (unix time) try "date -d @1764612023" if you are using GNU date ***
*** Current BE git commitID: 97b77e6cda ***
*** SIGSEGV invalid permissions for mapped object (@0x55aedb23afc8) received by PID 4102987 (TID 4105799 OR 0x7f3c04bb6640) from PID 18446744073091133384; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# os::Linux::chained_handler(int, siginfo*, void*) in /data/sdv1/jdk1.8/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /data/sdv1/jdk1.8/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /data/sdv1/jdk1.8/jre/lib/amd64/server/libjvm.so
4# 0x00007F401B00C520 in /lib/x86_64-linux-gnu/libc.so.6
5# doris::vectorized::VMergeIteratorContext::copy_rows(doris::vectorized::Block*, bool) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/vgeneric_iterators.cpp:148
6# doris::Status doris::vectorized::VMergeIterator::_next_batch(doris::vectorized::Block*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/vgeneric_iterators.h:242
7# doris::vectorized::VMergeIterator::next_batch(doris::vectorized::Block*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/vgeneric_iterators.h:201
8# doris::BetaRowsetReader::next_block(doris::vectorized::Block*) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/beta_rowset_reader.cpp:357
9# doris::vectorized::VCollectIterator::_topn_next(doris::vectorized::Block*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/vcollect_iterator.cpp:292
10# doris::vectorized::VCollectIterator::next(doris::vectorized::Block*) in /data/sdv1/doris/be/lib/doris_be
11# doris::vectorized::BlockReader::_direct_next_block(doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/block_reader.cpp:262
12# doris::vectorized::BlockReader::next_block_with_aggregation(doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/olap/block_reader.cpp:67
13# doris::vectorized::NewOlapScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/new_olap_scanner.cpp:508
14# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /data/sdv1/doris/be/lib/doris_be
15# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:102
16# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr, std::shared_ptr) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:280
17# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr, std::shared_ptr)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
18# doris::ThreadPool::dispatch_thread() in /data/sdv1/doris/be/lib/doris_be
19# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
20# 0x00007F401B05EAC3 in /lib/x86_64-linux-gnu/libc.so.6
21# 0x00007F401B0F08C0 in /lib/x86_64-linux-gnu/libc.so.6
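
To check the abort time in the trace against the 01:30-03:00 collection window, the Unix timestamp from the "Aborted at" line can be converted to local time. This is a minimal Python sketch, assuming the cluster clocks are in UTC+8 (the time zone is not stated above); the function name `abort_time_local` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def abort_time_local(unix_ts: int, utc_offset_hours: int = 8) -> datetime:
    """Convert the 'Aborted at <unix time>' value from be.out to local time.

    utc_offset_hours defaults to 8 (UTC+8); this is an assumption,
    adjust it to the actual time zone of the BE hosts.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_ts, tz=tz)


if __name__ == "__main__":
    # Timestamp copied from the "*** Aborted at 1764612023 ..." line.
    print(abort_time_local(1764612023).isoformat())
    # prints 2025-12-02T02:00:23+08:00
```

Under the UTC+8 assumption, the crash falls at 02:00:23, inside the collection window.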