Doris: loading data via curl Stream Load is very slow; I/O is not saturated, only 1% is used

1. Table DDL

CREATE TABLE IF NOT EXISTS test.test_orc (
  -- columns XXX
)
ENGINE = OLAP
-- must use UNIQUE KEY
UNIQUE KEY(id)
-- match the 96 cores to maximize parallelism
DISTRIBUTED BY HASH(id) BUCKETS 96
PROPERTIES (
  -- single machine, no replicas needed
  "replication_num" = "1",
  -- LZ4 decompresses faster than ZSTD (trade CPU for I/O)
  "compression" = "LZ4"
);
2. curl command

curl --location-trusted -u root:123456 \
  -H "label:test_orc_$(date +%s)" \
  -H "format:orc" \
  -H "max_filter_ratio:0.01" \
  -H "timeout:300" \
  -T /home/part-00048-c40b4171-648c-4148-92fa-d63b544a7860-c000.snappy.orc \
  -XPUT http://172.17.23.3:8030/api/test/test_orc/_stream_load

Response:
{
"TxnId": 5244,
"Label": "test_orc_1769563705",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 920905,
"NumberLoadedRows": 920905,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 102362174,
"LoadTimeMs": 65771,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 184,
"ReadDataTimeMs": 115,
"WriteDataTimeMs": 65404,
"ReceiveDataTimeMs": 224,
"CommitAndPublishTimeMs": 25
}
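The response timings above already localize the stall: nearly all of LoadTimeMs is spent in WriteDataTimeMs, not in receiving or decoding the data. A quick sanity check in plain Python (the numbers are copied from the response above):

```python
# Timing fields copied from the Stream Load response above.
resp = {
    "LoadBytes": 102362174,     # ~102 MB file
    "LoadTimeMs": 65771,
    "ReadDataTimeMs": 115,
    "WriteDataTimeMs": 65404,
}

mb_per_s = resp["LoadBytes"] / 1e6 / (resp["LoadTimeMs"] / 1000)
write_share = resp["WriteDataTimeMs"] / resp["LoadTimeMs"]

print(f"overall throughput: {mb_per_s:.2f} MB/s")        # ~1.56 MB/s
print(f"write phase share:  {write_share:.1%} of total")  # ~99.4%
```

So whatever is slow happens inside the BE write path, not on the network and not in ORC reading.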
3. The time between PrepareTime and CommitTime is very long
show proc '/transactions/1769048215630/finished';

TransactionId:      5141
Label:              label_nb_mass_resource_all_06422579079z0YdKBS6I_1769501280028
Coordinator:        BE: 172.17.23.3
TransactionStatus:  VISIBLE
LoadJobSourceType:  BACKEND_STREAMING
PrepareTime:        2026-01-27 16:08:00
PreCommitTime:      -
CommitTime:         2026-01-27 16:22:37
PublishTime:        2026-01-27 16:22:37
FinishTime:         2026-01-27 16:22:37
Reason:
ErrorReplicasCount: 0
ListenerId:         -1
TimeoutMs:          3600000
ErrMsg:
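This transaction sat between Prepare and Commit for over 14 minutes. Computing the gap from the timestamps copied out of the row above:

```python
from datetime import datetime

# Timestamps copied from the transaction record above.
prepare = datetime.fromisoformat("2026-01-27 16:08:00")
commit = datetime.fromisoformat("2026-01-27 16:22:37")

gap_s = (commit - prepare).total_seconds()
print(f"prepare -> commit: {gap_s:.0f} s (~{gap_s / 60:.1f} min)")  # 877 s, ~14.6 min
```

Commit, Publish, and Finish then all land in the same second, so the cost is entirely in the write phase before commit, consistent with the WriteDataTimeMs observation above.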

4. I/O and CPU utilization are very low
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.00 0.00 3.00 0.00 31.20 20.80 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 156.80 78.40 19978.40 55785.60 644.25 0.05 0.22 0.07 0.53 0.12 2.86

top - 16:54:07 up 8 days, 8:03, 10 users, load average: 4.56, 5.29, 6.75
Tasks: 891 total, 2 running, 888 sleeping, 1 stopped, 0 zombie
%Cpu(s): 2.4 us, 0.9 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 36228716+total, 44329224 free, 27226729+used, 45690644 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 69148312 avail Mem
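Reading the iostat and top snapshots the same way: the data disk (sdb) is moving roughly 74 MB/s combined at under 3% utilization, and the CPUs are almost entirely idle, so under these numbers neither disk nor CPU is anywhere near a bottleneck:

```python
# Values copied from the iostat line for sdb and the top %Cpu(s) line above.
rkb_s, wkb_s, util_pct = 19978.40, 55785.60, 2.86
cpu_idle_pct = 96.7

total_mb_s = (rkb_s + wkb_s) / 1024  # kB/s -> MB/s
print(f"sdb throughput: {total_mb_s:.1f} MB/s at {util_pct}% util")  # ~74.0 MB/s
print(f"CPU idle: {cpu_idle_pct}%")
```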

5. The following causes have been ruled out (tested):

- "Read before write" blocking from UNIQUE KEY merge-on-write random reads: not the cause;
- Too many threads (checked with ps -T -p $(pgrep doris_be) -o comm= | sort | uniq -c | sort -nr): not the cause;
- Needing concurrent curl: not the cause; a single curl is already slow, so concurrency is pointless;
- Changing BUCKETS from 96 to 10: not the cause;
- This is an ORC-format load, not CSV/JSON/Parquet: not the cause;
- "Convert ORC to CSV, then CSV Stream Load" as the only way to saturate I/O/CPU: tried it, doesn't work;
- Broker Load requires HDFS; the file sits on the CentOS local disk, so only Stream Load is usable;
- Creating the table with PROPERTIES ("enable_unique_key_merge_on_write" = "false"): doesn't help either;
- enable_single_replica_insert_wal exists in neither the table PROPERTIES nor be.conf: not an option;
- Adding -H "batch_size:10240" -H "max_batch_rows:10240": doesn't help either;
- The disks are 14 SSDs in a RAID array; raw I/O performance is excellent.
