时间分区字段str_to_date格式不正确后直接全表扫描导致CPU飙升

Viewed 18

升级到2.1.10后,又重现与之前提过的情况相同,升级2.1.7后大表查询CPU利用率飙升 2.1.8貌似有修复

select * from table where datatime > str_to_date('2025-05-29 17:30', '%Y-%m-%d %H:%i:%s')  and datatime <= '2025-05-29 14:30:00' and datatime <= '2025-05-29 16:30:00'

对于时间分区表的分区字段用str_to_date转换, '%Y-%m-%d %H:%i:%s'中多一个空格或少一个空格,都会直接扫描全表,explain

  0:VOlapScanNode(131)
     TABLE: TABLE***** PREAGGREGATION: OFF. Reason: No aggregate on scan.
     PREDICATES: ((CAST(datatime[#0] AS datetimev2(6)) > str_to_date('2025-05-29 17:30', '%Y-%m-%d %H:%i:%s')) AND (datatime[#0] <= '2025-05-29 14:30:00'))
     partitions=93/99 (p20250226,p20250227,p20250228,p20250301,p20250302,p20250303,p20250304,p20250305,p20250306,p20250307,p20250308,p20250309,p20250310,p20250311,p20250312,p20250313,p20250314,p20250315,p20250316,p20250317,p20250318,p20250319,p20250320,p20250321,p20250322,p20250323,p20250324,p20250325,p20250326,p20250327,p20250328,p20250329,p20250330,p20250331,p20250401,p20250402,p20250403,p20250404,p20250405,p20250406,p20250407,p20250408,p20250409,p20250410,p20250411,p20250412,p20250413,p20250414,p20250415,p20250416,p20250417,p20250418,p20250419,p20250420,p20250421,p20250422,p20250423,p20250424,p20250425,p20250426,p20250427,p20250428,p20250429,p20250430,p20250501,p20250502,p20250503,p20250504,p20250505,p20250506,p20250507,p20250508,p20250509,p20250510,p20250511,p20250512,p20250513,p20250514,p20250515,p20250516,p20250517,p20250518,p20250519,p20250520,p20250521,p20250522,p20250523,p20250524,p20250525,p20250526,p20250527,p20250528,p20250529)
     tablets=744/744, tabletList=561796735,561796739,561796743 ...
     cardinality=1385948369, avgRowSize=0.0, numNodes=1
     pushAggOp=NONE
     limit: 200

改为正确的格式str_to_date('2025-05-29 17:30', '%Y-%m-%d %H:%i') 则扫描分区正常
期望的结果情况是格式不匹配则报异常或者直接返回空结果

1 Answers

2.1.8 正常,2.1.10 有问题是这样吗?