This is giving me a headache: I created a ROUTINE LOAD job, but it neither consumes data nor executes. Doris version is 3.0.5, Kafka version is kafka_2.13-4.0.0

[Screenshot: image.png]

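I asked GPT, and it said the versions are incompatible. Is that really the case?

I also tried the supposedly compatible version listed below and it still doesn't work. Please help.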

CREATE ROUTINE LOAD spider_origin_data.routine_load_news_data
ON news_data
COLUMNS (
    unique_key,
    title,
    content,
    province = json_extract(region, '$.province'),
    city = json_extract(region, '$.city'),
    area = json_extract(region, '$.area'),
    publish_at,
    files,
    site_url,
    site_id,
    site_subject_id,
    type
)
PROPERTIES (
    "desired_concurrent_number" = "3",
    "max_batch_interval" = "20",
    "max_batch_rows" = "300001",      -- must be > 200000
    "max_batch_size" = "104857600",  -- 100 MB (in bytes)
    "format" = "json",
    "strip_outer_array" = "true",
    "json_root" = ""
)
FROM KAFKA (
    "kafka_broker_list" = "192.168.xxx:9092,192.168.xxx:9092,192.168.xxx:9092",
    "kafka_topic" = "origin_spider_data",
    "property.group.id" = "doris-loader-news",  -- explicit setting recommended for Doris 3.0
    "property.auto.offset.reset" = "earliest"
);

GPT's answer:
[Screenshot: image.png]

2 Answers

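Judging from the status in the screenshot, the ROUTINE LOAD task is running.
With your current settings the job starts consuming from the newest data, so messages already in the topic are skipped. If you need to consume from the earliest offset of the topic, set "property.kafka_default_offsets" = "OFFSET_BEGINNING" in the FROM KAFKA clause, or specify the start offset per partition with "kafka_partitions" together with "kafka_offsets" = "OFFSET_BEGINNING".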
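
As a rough sketch of that suggestion, the FROM KAFKA clause of the statement in the question might look like the fragment below. It reuses the broker list, topic, and group id from the question unchanged and assumes the job is recreated with this clause; it is not a complete statement on its own.

```sql
-- Fragment only: intended to replace the FROM KAFKA clause of the
-- CREATE ROUTINE LOAD statement in the question.
FROM KAFKA (
    "kafka_broker_list" = "192.168.xxx:9092,192.168.xxx:9092,192.168.xxx:9092",
    "kafka_topic" = "origin_spider_data",
    "property.group.id" = "doris-loader-news",
    -- Start every partition from its earliest available offset
    -- instead of only picking up newly arriving messages.
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```

If only some partitions need a specific starting point, "kafka_partitions" and "kafka_offsets" can be listed explicitly instead, with one offset per partition.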

[Screenshot: image.png]
Adding the ROUTINE LOAD TASK details.
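
For anyone who wants the same details as text rather than a screenshot, a minimal sketch of the statements that would typically be used is shown here, assuming the database and job names from the question.

```sql
-- Job-level status: check State, Progress, Lag, and ReasonOfStateChanged.
SHOW ROUTINE LOAD FOR spider_origin_data.routine_load_news_data;

-- Currently scheduled tasks of the job.
USE spider_origin_data;
SHOW ROUTINE LOAD TASK WHERE JobName = "routine_load_news_data";
```

A job in the RUNNING state whose Progress offsets never advance usually means it is waiting for new messages rather than failing, which matches the explanation in the answer above.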