2.1.8.1: importing a JSON file via Stream Load stores a different number of rows than expected


Version:

apache-doris-2.1.8.1-bin-arm64

Stream Load header parameters:

format:json,
max_filter_ratio:1,
read_json_by_line:true,

Response returned by the Doris Stream Load request:

{"Status":"Success","Comment":"","BeginTxnTimeMs":0,"Message":"OK","NumberUnselectedRows":0,"CommitAndPublishTimeMs":41,"Label":"544446ac-20fe-41f5-a532-50f3739ac782","LoadBytes":358522815,"StreamLoadPutTimeMs":3,"NumberTotalRows":447129,"WriteDataTimeMs":33070,"ReceiveDataTimeMs":30587,"TxnId":181554569,"LoadTimeMs":33116,"TwoPhaseCommit":"false","ReadDataTimeMs":1048,"NumberLoadedRows":447129,"NumberFilteredRows":0}

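Total row count returned by querying the database:

(screenshot: image.png)
This load imported 447129 rows with no error messages, yet querying the database returns only 303886 rows.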

1 Answer

Possible causes

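  1. Data quality issues

    Set max_filter_ratio to 0 and turn off strict mode. With max_filter_ratio set to 1, every row that has a data quality problem is filtered out and the load still reports success. Rows dropped this way are counted in NumberFilteredRows, which is 0 in the response above, so the second cause is worth checking as well.

  2. The target table uses the Unique Key model
    In a unique table, rows with the same primary key overwrite one another, so only one row per key remains.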
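
One way to test the second cause before touching the table is to count how many distinct key values the source file actually contains. The sketch below assumes the file has one JSON object per line (matching read_json_by_line:true) and that you know the table's key columns. The function name `count_key_duplicates` and the column name `id` are hypothetical, chosen only for illustration.

```python
import json


def count_key_duplicates(lines, key_columns):
    """Return (total_rows, distinct_keys) for an iterable of JSON lines.

    key_columns is the list of the table's unique-key columns
    (an assumption: replace it with the columns from your own DDL).
    """
    seen = set()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # Serialize each key value so that any JSON value is hashable.
        key = tuple(json.dumps(row.get(col), sort_keys=True) for col in key_columns)
        seen.add(key)
        total += 1
    return total, len(seen)


if __name__ == "__main__":
    # Tiny in-memory sample; for a real file, pass an open file object instead.
    sample = [
        '{"id": 1, "v": "a"}',
        '{"id": 2, "v": "b"}',
        '{"id": 1, "v": "c"}',
    ]
    total, distinct = count_key_duplicates(sample, ["id"])
    print(f"{total} rows, {distinct} distinct keys, {total - distinct} overwritten")
```

If the distinct-key count for the real file comes out at 303886 (that is, 447129 - 303886 = 143243 rows share a key with another row), the difference is explained by unique-key overwrites rather than by lost data.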