Doris 使用 CREATE-TABLE-AS-SELECT 语句导入clickhouse数据的问题

Viewed 26

Doris版本:4.0.2
通过创建catalog的方式导入clickhouse数据库中的表
clickhouse的驱动版本:clickhouse-jdbc-0.4.6-all.jar

clickhouse的表结构定义如下:
CREATE TABLE
indicator_history.xzl_hq_indicator_1000 (
mktcode Int32 COMMENT '市场代码',
tgtcode String COMMENT '标的代码',
bizdate Date COMMENT '业务时间',
_order Int32,
time Nullable(Int64),
lastPx Nullable(Int32),
volume Nullable(Int64),
turnover Nullable(Int64),
totalVol Nullable(Int64),
totalTur Nullable(Int64),
highPx Nullable(Int64),
lowPx Nullable(Int64)
) ENGINE = HaUniqueMergeTree(
'/clickhouse/tables/ee3f72ec-b673-4675-ae3f-72ecb6738675/{shard}',
'{replica}'
)
PARTITION BY
toYear(bizdate)
ORDER BY
(mktcode, tgtcode, bizdate, _order) UNIQUE KEY (mktcode, tgtcode, bizdate, _order) SETTINGS index_granularity = 8192,
max_gc_outdated_parts_per_round = 500

CREATE-TABLE-AS-SELECT语句如下
CREATE TABLE IF NOT EXISTS indicator.xzl_hq_indicator_1000
AUTO PARTITION BY RANGE (date_trunc(bizdate, 'year'))()
DISTRIBUTED BY HASH(mktcode,tgtcode) BUCKETS 24
PROPERTIES (
'replication_num' = '1'
)
AS SELECT * FROM clickhouse.indicator_history.xzl_hq_indicator_1000

报错如下:ERROR 1105 (HY000) at line 1: errCode = 2, detailMessage = [JNI_ERROR]JdbcExecutorException: jdbc get block address: | CAUSED BY: SQLException: java.io.IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32 | CAUSED BY: IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32

3 Answers

由于 JDBC Catalog 的类型推断和外部库的实际存储可能存在不完全一致的情况,官方在函数下推部分也特别强调了不同数据库之间类型/函数语义不一致的问题,并通过黑白名单、重写规则等方式规避错误。

在你的场景下,可以先不用 CTAS SELECT *,而是:

在 Doris 里显式列出所有字段;
对被识别为 BOOLEAN 的列,强制转成 INT 或 STRING 再导入。
示例(假设 some_col 被识别为 BOOLEAN):

CREATE TABLE IF NOT EXISTS indicator.xzl_hq_indicator_1000
AUTO PARTITION BY RANGE (date_trunc(bizdate, 'year'))()
DISTRIBUTED BY HASH(mktcode, tgtcode) BUCKETS 24
PROPERTIES (
'replication_num' = '1'
)
AS
SELECT
mktcode,
tgtcode,
bizdate,
_order,
time,
lastPx,
volume,
turnover,
totalVol,
totalTur,
highPx,
lowPx
FROM clickhouse.indicator_history.xzl_hq_indicator_1000;
如果 SHOW CREATE TABLE 里确实有某列是 BOOLEAN,可以在 SELECT 里写成:

CAST(some_bool_like_col AS INT) AS some_bool_like_col
或:

CAST(some_bool_like_col AS STRING) AS some_bool_like_col
这样 Doris 在通过 JDBC 读取时就不会按布尔去解析,从而绕过“0/1 以外值导致的解析错误”。

SHOW CREATE TABLE 结果如下:
xzl_hq_indicator_1000 | CREATE TABLE xzl_hq_indicator_1000 (
mktcode int NOT NULL,
tgtcode varchar(65533) NOT NULL,
bizdate date NOT NULL,
_order int NOT NULL,
time bigint NULL,
lastPx int NULL,
volume bigint NULL,
turnover bigint NULL,
totalVol bigint NULL,
totalTur bigint NULL,
highPx bigint NULL,
lowPx bigint NULL
) ENGINE=OLAP
UNIQUE KEY(mktcode, tgtcode, bizdate)
AUTO PARTITION BY RANGE (date_trunc(bizdate, 'year'))
()
DISTRIBUTED BY HASH(mktcode, tgtcode) BUCKETS 24
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"min_load_replica_num" = "-1",
"is_being_synced" = "false",
"storage_medium" = "hdd",
"storage_format" = "V2",
"inverted_index_storage_format" = "V3",
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false",
"group_commit_interval_ms" = "10000",
"group_commit_data_bytes" = "134217728",
"enable_mow_light_delete" = "false"
);
看上去并没有发现BOOLEAN,然后我使用insert into select语句导入数据,还是一样的问题:
MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select mktcode,
-> tgtcode,
-> bizdate,
-> _order,
-> time,
-> lastPx,
-> volume,
-> turnover,
-> totalVol,
-> totalTur,
-> highPx,
-> lowPx
-> FROM clickhouse.indicator_history.xzl_hq_indicator_1000;
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.102.29.13)[JNI_ERROR]JdbcExecutorException: jdbc get block address: | CAUSED BY: SQLException: java.io.IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32 | CAUSED BY: IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32。会不会是数据量太大导致的呢?

感觉是clickhouse直接拒绝处理了,比如说我想通过排序,然后插入前1000条数据,也会报错,插入一条也会报错,如果我不排序,直接插入的话很快,插入1000000条都很快,过程如下:
MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select * from clickhouse.indicator_history.xzl_hq_indicator_1000 order by mktcode, tgtcode, bizdate, _order limit 0,1000;
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.102.29.13)[JNI_ERROR]JdbcExecutorException: jdbc get block address: | CAUSED BY: SQLException: java.io.IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32 | CAUSED BY: IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select * from clickhouse.indicator_history.xzl_hq_indicator_1000 order by mktcode, tgtcode, bizdate, _order limit 0,1;
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.102.29.12)[JNI_ERROR]JdbcExecutorException: jdbc get block address: | CAUSED BY: SQLException: java.io.IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32 | CAUSED BY: IOException: Failed to read boolean value, expect 0 (false) or 1 (true) but we got: 32
MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select * from clickhouse.indicator_history.xzl_hq_indicator_1000 limit 1;
Query OK, 1 row affected (0.124 sec)
{'label':'label_357fd3d103a14e90_8254758dc1eafafd', 'status':'VISIBLE', 'txnId':'7820'}

MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select * from clickhouse.indicator_history.xzl_hq_indicator_1000 limit 100000;
Query OK, 100000 rows affected (0.408 sec)
{'label':'label_dc0d5d2c11834a69_a836e4376082827c', 'status':'VISIBLE', 'txnId':'7821'}

MySQL [(none)]> insert into indicator.xzl_hq_indicator_1000 select * from clickhouse.indicator_history.xzl_hq_indicator_1000 limit 1000000;
Query OK, 1000000 rows affected (2.323 sec)
{'label':'label_d8ee9b4f39b74135_bb69e1706df6a33f', 'status':'VISIBLE', 'txnId':'7822'}