*** Query id: be4f1a6f1146ec9d-2bd4fd2fc8345080 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1779748364 (unix time) try "date -d @1779748364" if you are using GNU date ***
*** Current BE git commitID: fa98abc190 ***
*** SIGSEGV address not mapped to object (@0x4fba39000000) received by PID 1985065 (TID 1985943 OR 0x7f82ec8f1700) from PID 956301312; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /data01/srdcloud/sourcecode/doris/be/src/common/signal_handler.h:421
1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/jdk64/current/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/jdk64/current/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo_t*, void*) in /usr/jdk64/current/jre/lib/amd64/server/libjvm.so
4# 0x00007F84A4AE8680 in /lib64/libc.so.6
5# doris::validate_utf8(doris::TFileScanRangeParams const&, char const*, unsigned long) at /data01/srdcloud/sourcecode/doris/be/src/util/utf8_check.cpp:335
6# doris::vectorized::CsvReader::_validate_line(doris::Slice const&, bool*) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/format/csv/csv_reader.cpp:722
7# doris::vectorized::CsvReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/format/csv/csv_reader.cpp:565
8# doris::vectorized::VFileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/scan/vfile_scanner.cpp:364
9# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/scan/vfile_scanner.cpp:307
10# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) in /usr/local/doris-be/lib/doris_be
11# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/scan/vscanner.cpp:102
12# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /data01/srdcloud/sourcecode/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:280
13# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::ThreadPool::dispatch_thread() in /usr/local/doris-be/lib/doris_be
15# doris::Thread::supervise_thread(void*) at /data01/srdcloud/sourcecode/doris/be/src/util/thread.cpp:499
16# 0x00007F84A48F2F2B in /lib64/libpthread.so.0
17# __clone in /lib64/libc.so.6
I20260601 00:28:17.715545 1536357 stream_load.cpp:208] new income streaming load request.id=be4f1a6f1146ec9d-2bd4fd2fc8345080, job_id=-1, txn_id=-1, label=e86d5039-af6b-4043-ac02-3bd63e8d288d, elapse(s)=0, db=paaslog, tbl=telepg_stat_slow_query, group_commit=0
I20260601 00:28:17.737607 1536357 stream_load_executor.cpp:72] begin to execute stream load. label=e86d5039-af6b-4043-ac02-3bd63e8d288d, txn_id=2694415957, query_id=be4f1a6f1146ec9d-2bd4fd2fc8345080
I20260601 00:28:17.740809 1536357 stream_load.cpp:214] finished to handle HTTP header, id=be4f1a6f1146ec9d-2bd4fd2fc8345080, job_id=-1, txn_id=2694415957, label=e86d5039-af6b-4043-ac02-3bd63e8d288d, elapse(s)=0
I20260601 00:28:17.757200 1535891 tablets_channel.cpp:162] open tablets channel (load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, index_id=27215197), tablets num: 613 timeout(s): 1800, init senders 1 with incremental off
W20260601 00:58:18.001104 1533810 task_scheduler.cpp:134] Timeout, query_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, instance_id=be4f1a6f1146ec9d-2bd4fd2fc8345081, task info: QueryId: be4f1a6f1146ec9d-2bd4fd2fc8345080
InstanceId: be4f1a6f1146ec9d-2bd4fd2fc8345081
ScanOperator: is_source: 1, is_sink: 0, is_closed: 0, is_pending_finish: 0, scanner_ctx is null: false , scanner ctx detail = id: 8f4a794c3ea167c8-21afa957409de397, total scanners: 1, blocks in queue: 0, _should_stop: false, _is_finished: false, free blocks: 0, limit: -1, _num_running_scanners: 1, _max_thread_num: 1, _max_bytes_in_queue: 107374182, query_id: be4f1a6f1146ec9d-2bd4fd2fc8345080
W20260601 00:58:18.002229 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830822], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.34:8060, error message: [CANCELLED]
W20260601 00:58:18.002250 1533887 fragment_mgr.cpp:628] report error status: to coordinator: TNetworkAddress(hostname=10.251.112.145, port=9020), query id: be4f1a6f1146ec9d-2bd4fd2fc8345080, instance id: be4f1a6f1146ec9d-2bd4fd2fc8345081
W20260601 00:58:18.002302 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10010], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.160:8060, error message: [CANCELLED]
W20260601 00:58:18.002334 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10020], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.159:8060, error message: [CANCELLED]
W20260601 00:58:18.002372 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10016], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.157:8060, error message: [CANCELLED]
W20260601 00:58:18.002405 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830820], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.32:8060, error message: [CANCELLED]
W20260601 00:58:18.002439 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10003], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.153:8060, error message: [CANCELLED]
W20260601 00:58:18.002475 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830819], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.31:8060, error message: [CANCELLED]
W20260601 00:58:18.002502 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830787], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.30:8060, error message: [CANCELLED]
W20260601 00:58:18.002526 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-822818844], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.149:8060, error message: [CANCELLED]
W20260601 00:58:18.002565 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-822818702], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.36:8060, error message: [CANCELLED]
W20260601 00:58:18.002607 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10018], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.158:8060, error message: [CANCELLED]
W20260601 00:58:18.002642 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830823], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.35:8060, error message: [CANCELLED]
W20260601 00:58:18.002672 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10007], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.155:8060, error message: [CANCELLED]
W20260601 00:58:18.002696 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10009], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.156:8060, error message: [CANCELLED]
W20260601 00:58:18.002763 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10011], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.161:8060, error message: [CANCELLED]
W20260601 00:58:18.002789 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10012], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.150:8060, error message: [CANCELLED]
W20260601 00:58:18.002813 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10013], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.162:8060, error message: [CANCELLED]
W20260601 00:58:18.002840 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10014], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.151:8060, error message: [CANCELLED]
W20260601 00:58:18.002866 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10005], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.154:8060, error message: [CANCELLED]
W20260601 00:58:18.002890 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-830821], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.33:8060, error message: [CANCELLED]
W20260601 00:58:18.002921 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10015], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.152:8060, error message: [CANCELLED]
W20260601 00:58:18.002957 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10017], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.146:8060, error message: [CANCELLED]
W20260601 00:58:18.002985 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10019], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.147:8060, error message: [CANCELLED]
W20260601 00:58:18.003024 1533846 vtablet_writer.cpp:589] cancel node channel VNodeChannel[27215197-10021], load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, node=10.251.112.148:8060, error message: [CANCELLED]
I20260601 00:58:18.003054 1533846 vtablet_writer.cpp:1376] close olap table sink. load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, txn_id=2694415957, canceled all node channels due to error: [CANCELLED]
I20260601 00:58:18.003557 1536259 vtablet_writer.cpp:1016] All node channels are stopped(maybe finished/offending/cancelled), sender thread exit. be4f1a6f1146ec9d-2bd4fd2fc8345080
I20260601 00:58:18.003844 1535786 load_channel_mgr.cpp:196] load channel has been cancelled: be4f1a6f1146ec9d-2bd4fd2fc8345080
I20260601 00:58:18.003865 1535786 load_channel.cpp:71] load channel removed load_id=be4f1a6f1146ec9d-2bd4fd2fc8345080, is high priority=0, sender_ip=10.251.112.147
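Our production Doris 2.1.11 deployment (x86) frequently hits BE coredumps. The crash occurs at utf8_check.cpp:335.
The operation being executed is a Stream Load import. Timeline:
00:28 Stream Load starts
00:58 Stream Load task times out and is cancelled (1800 seconds)
04:xx BE coredump occurs (the query id in the coredump matches the Stream Load id)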
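The cancellation time can be checked against the 1800-second timeout using the timestamps in the log lines above. A minimal sketch of that arithmetic; `to_seconds` and `elapsed_seconds` are illustrative helpers, not Doris APIs:

```cpp
#include <cassert>

// Seconds since midnight for a wall-clock time.
// Illustrative helper only, not part of Doris.
int to_seconds(int hh, int mm, int ss) {
    assert(hh >= 0 && hh < 24 && mm >= 0 && mm < 60 && ss >= 0 && ss < 60);
    return hh * 3600 + mm * 60 + ss;
}

// Elapsed seconds between two times on the same day.
int elapsed_seconds(int from, int to) {
    assert(to >= from);
    return to - from;
}
```

With the request logged at 00:28:17 and the timeout warning at 00:58:18, `elapsed_seconds(to_seconds(0, 28, 17), to_seconds(0, 58, 18))` gives 1801, so the cancellation fired about one second after the 1800-second limit.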
We also observed this log line:
ScanOperator: is_source: 1, is_sink: 0, is_closed: 0, is_pending_finish: 0, scanner_ctx is null: false , scanner ctx detail = id: 8f4a794c3ea167c8-21afa957409de397, total scanners: 1, blocks in queue: 0, _should_stop: false, _is_finished: false, free blocks: 0, limit: -1, _num_running_scanners: 1, _max_thread_num: 1, _max_bytes_in_queue: 107374182, query_id: be4f1a6f1146ec9d-2bd4fd2fc8345080
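This shows that when the task was cancelled, one scanner thread was indeed still running (_num_running_scanners: 1), and a few hours later it hit the problem when it reached the UTF-8 check step. (Why did it take roughly three hours to get there?)
Judging from the coredump, the fault is more likely an access to an invalid address (line.data) at @0x4fba39000000, or an out-of-bounds read (a bad line.size), rather than a bug in validate_utf8 itself.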
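To make this hypothesis concrete: a validator of this kind only reads bytes in [data, data + size), so it can fault only if that range is not fully mapped. The sketch below is not the Doris implementation. `LineView`, `is_page_aligned`, and `validate_utf8_bounded` are hypothetical names, and the 4 KiB page size is an assumption for typical x86-64 Linux. It shows a page-alignment check for the faulting address and a validator that never reads past `size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a (pointer, length) view such as doris::Slice.
struct LineView {
    const char* data;
    std::size_t size;
};

// Assumption: 4 KiB pages, as on typical x86-64 Linux.
constexpr std::uint64_t kPageSize = 4096;

// True if addr is the first byte of a page.
bool is_page_aligned(std::uint64_t addr) {
    return addr % kPageSize == 0;
}

// UTF-8 validator that checks the remaining length before every read,
// so it touches only bytes in [data, data + size).
bool validate_utf8_bounded(const LineView& line) {
    assert(line.data != nullptr || line.size == 0);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(line.data);
    std::size_t i = 0;
    while (i < line.size) {
        const unsigned char c = p[i];
        if (c < 0x80) {  // ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;      // number of continuation bytes expected
        std::uint32_t cp = 0;       // decoded code point
        std::uint32_t min_cp = 0;   // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min_cp = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min_cp = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min_cp = 0x10000;
        } else {
            return false;  // invalid lead byte
        }
        if (line.size - i <= extra) {
            return false;  // sequence truncated at end of buffer
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) {
                return false;  // not a continuation byte
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, surrogates, and values above U+10FFFF.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```

`is_page_aligned(0x4fba39000000)` returns true. That would be consistent with a sequential read running off the end of a mapped region into the first unmapped page, which points toward an oversized line.size or a buffer freed after the cancel, although it does not prove either.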
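Please help investigate what the cause might be.
We tried setting enable_text_validate_utf8 = false as a workaround, but this parameter does not take effect for Stream Load, so we currently have no good way to avoid the problem.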