Doris version: 2.1.8
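Symptom:
One row cannot be retrieved through Doris: the query fails with the error "Size of filter doesn't match size of column", even though the same row is intact and can be queried normally in the underlying Hive external table.
Query: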
SELECT
STRUCT_ELEMENT(device_info,'device_os') as dim_value,
tdid AS kpi_value
FROM cdp_traffic_app.v_cdp_wechat_mp_new_user_yumid new
where
partitionday = '20250207'
and appkey = '642931FE9530465391215F77CD92957A'
and tdid = 'omxHq0BcsEs2raBbcrszmiqq8pfc'
Queries for other rows succeed (the statement is identical except for the tdid value).
Definition of the view v_cdp_wechat_mp_new_user_yumid:
CREATE OR REPLACE
VIEW `v_cdp_wechat_mp_new_user_yumid` AS
select
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`yumid`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`tdid`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`openid`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`geo_info`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`geo_info_extra`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`device_info`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`device_info_extra`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`app_info`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`app_info_extra`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`network_type`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`etl_insert_time`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`partitionday`,
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`.`appkey`
from
`cdp_traffic_hive`.`dw_all`.`cdp_wechat_mp_new_user_yumid`;
The row shows nothing abnormal in Hive either.
SQL error and BE log:
SQL [1105] [HY000]: errCode = 2, detailMessage = (172.16.24.170)[CANCELLED]cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
0# doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173
1# doris::Exception::Exception<unsigned long&, unsigned long&>(int, std::basic_string_view<char, std::char_traits<char> > const&, unsigned long&, unsigned long&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
2# doris::vectorized::ColumnVector<unsigned char>::filter(doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 16ul, 16ul> const&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/columns/columns_common.h:86
3# doris::vectorized::ColumnNullable::filter(doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 16ul, 16ul> const&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/columns/column_nullable.cpp:373
4# doris::vectorized::Block::filter_block_internal(doris::vectorized::Block*, std::vector<unsigned int, std::allocator<unsigned int> > const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 16ul, 16ul> const&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/core/block.cpp:790
5# doris::vectorized::RowGroupReader::next_batch(doris::vectorized::Block*, unsigned long, unsigned long*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:0
6# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:489
7# doris::vectorized::VFileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:494
8# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:494
9# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:0
10# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:102
11# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:380
12# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
13# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
14# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
15# start_thread
16# __clone
W20250415 18:50:34.700150 3805 fragment_mgr.cpp:644] report error status: cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
W20250415 18:50:34.700441 49548 fragment_mgr.cpp:644] report error status: cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
W20250415 18:51:31.542778 28789 status.h:415] meet error status: [INTERNAL_ERROR]Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
W20250415 18:51:31.543093 28789 scanner_scheduler.cpp:283] Scan thread read VScanner failed: [INTERNAL_ERROR]cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
W20250415 18:51:31.543856 7691 task_scheduler.cpp:361] Pipeline task failed. query_id: 76d8677cd4c84bf7-bd9c31850fb30884|0-0 reason: [INTERNAL_ERROR]cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
W20250415 18:51:31.545722 49511 fragment_mgr.cpp:644] report error status: cur path: hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0. Read parquet file hdfs://datalake-prd.bigdata3.prd.storage.local/apps/hive/warehouse/dw/all/cdp_wechat_mp_new_user_yumid/partitionday=20250207/appkey=642931FE9530465391215F77CD92957A/000000_0 failed, reason = [E-1721][E-1721] Size of filter doesn't match size of column: size=918, filter.size=4064
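How should this be troubleshot, and are there any known cases like it? This is fairly urgent for us.
I found a similar question on the forum, but it has no resolution yet: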
https://ask.selectdb.com/questions/D11Y1/cha-xun-bao-cuo-size-of-filter-doesn-t-match-size-of-column