BROKER Load is very slow after upgrading to version 4.1


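2026-05-13: I upgraded to version 4.1 on the 8th. Since then, importing from HDFS into Doris via BROKER Load has slowed down: the sync time for a single partition went from about 40 s to about 3 minutes. The HDFS data is in ORC format, and a single partition is roughly 32 MB, about 350,000 rows. I have ruled out network problems, since everything is in the same data center. From what I can see, only one BE is reading, with a single concurrent reader, and the HDFS path contains only one file. How can I fix this and increase read concurrency?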

JobId: 1778493883807
Label: dwd_sale_c_pos_promo_order_item_di_2026_04_13_20260513080032_77436
State: FINISHED
Progress: 100.00% (1/1)
Type: BROKER
EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=357493
TaskInfo: cluster:hdfs_cluster; timeout(s):3600; max_filter_ratio:0.0
ErrorMsg: (empty)
CreateTime: 2026-05-13 13:06:05
EtlStartTime: 2026-05-13 13:06:10
EtlFinishTime: 2026-05-13 13:06:10
LoadStartTime: 2026-05-13 13:06:10
LoadFinishTime: 2026-05-13 13:09:12
URL: (empty)
JobDetails: {
"ScannedRows":357493,
"LoadBytes":161351849,
"FileNumber":1,
"FileSize":32347019,
"TaskNumber":1,
"Unfinished backends":[],
"All backends":[10009]
}
TransactionId: 5416801
ErrorTablets: {}
User: root
Comment: (empty)
FirstErrorMsg: (empty)
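
To put the job record above in numbers, here is a minimal Python sketch that derives the loading time and effective rates from the LoadStartTime, LoadFinishTime, FileSize, and ScannedRows values shown. The function name `load_stats` is made up for illustration and is not part of Doris.

```python
from datetime import datetime

# Timestamp layout used in the SHOW LOAD output above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_stats(load_start, load_finish, file_size_bytes, scanned_rows):
    """Return elapsed seconds and effective rates for one load job.

    Hypothetical helper for illustration; inputs are copied by hand
    from the SHOW LOAD row.
    """
    start = datetime.strptime(load_start, TIME_FORMAT)
    finish = datetime.strptime(load_finish, TIME_FORMAT)
    elapsed_s = (finish - start).total_seconds()
    return {
        "elapsed_s": elapsed_s,
        # Source file bytes (FileSize) read per second, in MiB/s.
        "file_mib_per_s": file_size_bytes / elapsed_s / (1024 * 1024),
        "rows_per_s": scanned_rows / elapsed_s,
    }


# Values taken from the job record in the question.
stats = load_stats(
    "2026-05-13 13:06:10",  # LoadStartTime
    "2026-05-13 13:09:12",  # LoadFinishTime
    32347019,               # FileSize
    357493,                 # ScannedRows
)
print(stats)
```

The loading phase took 182 s for a single file of about 31 MiB, which is well under 1 MiB/s of source data. That matches the "about 3 minutes" reported and is low enough that a load profile is the natural next step for finding where the time goes.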

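2026-05-17: I tested with DataX, and the same partition finished in 22 s. That is as expected and comparable to the BROKER Load time on the previous version, 2.1.11. So BROKER Load in the current version has a problem with slow HDFS reads. (Note: DataX uses Stream Load.)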

1 Answer

Please capture a profile of the load:

  1. set enable_profile = true;
  2. set profile_level = 2;
  3. SHOW LOAD WHERE LABEL = "your_label"; (to find the JobId)
  4. SHOW LOAD PROFILE;
  5. curl -u user:password \
    "http://<fe_host>:<fe_http_port>/api/profile/text/?query_id=<job_id>" > brokerload_profile.txt
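
If scripting is more convenient than curl, the download in step 5 can be sketched in Python with only the standard library. This is a rough equivalent, assuming the same `/api/profile/text/` endpoint and HTTP basic authentication as the curl command. The function names, the host, port, and credential arguments are placeholders introduced for illustration.

```python
import base64
import urllib.parse
import urllib.request


def profile_url(fe_host, fe_http_port, job_id):
    """Build the profile URL used in step 5 (same path as the curl call)."""
    query = urllib.parse.urlencode({"query_id": job_id})
    return f"http://{fe_host}:{fe_http_port}/api/profile/text/?{query}"


def fetch_profile(fe_host, fe_http_port, job_id, user, password,
                  out_path="brokerload_profile.txt"):
    """Download the profile text and save it to out_path.

    Hypothetical helper; mirrors `curl -u user:password ... > file`.
    """
    request = urllib.request.Request(profile_url(fe_host, fe_http_port, job_id))
    # Equivalent of curl's -u flag: HTTP basic authentication header.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
    return out_path
```

A call would look like `fetch_profile("<fe_host>", <fe_http_port>, "<job_id>", "user", "password")`, with the placeholders filled in from your own cluster and the JobId found in step 3.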

Follow these steps to capture the load profile. You can send it to me privately via the WeChat contact on my profile page.