Cluster environment
- Version: doris-2.0.5-rc02
- 3 FE deployed via Docker, 16 cores / 64 GB RAM each, JVM max heap 32 GB
- 10 BE deployed via Docker, 16 cores / 64 GB RAM each
- On nodes where BE and FE are co-deployed, the two use separate disks
- BE disks: 6893 GB xfs (5893 GB xfs on the co-deployed nodes)
- FE disk: 893 GB xfs
Symptoms
Ops noticed that disk 4 on BE node 5 kept climbing until it hit 80% usage, while the other disks sat at roughly 50%.
- Ran ADMIN CLEAN TRASH ON("BackendHost:BackendHeartBeatPort"); against BE node 5; its TrashUsedCapcacity dropped to 0
- Ran ADMIN REBALANCE DISK against BE node 5 and waited ten-plus minutes with no visible change
- Ran ADMIN CLEAN TRASH against the whole cluster
- Adjusted the watermarks on all BE and FE through the HTTP API:
```
# BE: /api/update_config
/api/update_config?storage_flood_stage_usage_percent=80&persist=true
/api/update_config?storage_flood_stage_left_capacity_bytes=193273528320&persist=true
# FE: /api/_set_config
/api/_set_config?storage_high_watermark_usage_percent=78&storage_min_left_capacity_bytes=211527139328&persist=true&reset_persist=false
/api/_set_config?storage_flood_stage_usage_percent=80&storage_flood_stage_left_capacity_bytes=193273528320&persist=true&reset_persist=false
```
- Stream load started failing: disk 4 on BE node 5 had hit the limit
```
disk /opt/apache-doris/be/storage3 on backend 15617 exceed limit usage, path hash: -8679819090117116242
```
- Raised storage_flood_stage_usage_percent to 95 on BE node 5
- Stream load recovered
- Disk 4 on BE node 5 reached 95% usage
- Stream load failed again: disk 4 on BE node 5 had hit the new limit
- Took BE node 5 offline with ALTER SYSTEM DECOMMISSION BACKEND
- Stream load recovered
- Disk usage on node 8 reached 80%, and stream load failed again:
```
disk /opt/apache-doris/be/storage3 on backend 15642
```
- Stopped the stream load (the admin statements used in the steps above are sketched after this list)
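A minimal sketch of those admin statements, assuming a placeholder BE address (the real host:heartbeat_port pairs are not in this post):

```sql
-- Empty the trash directories on one BE (address is a placeholder)
ADMIN CLEAN TRASH ON ("be_host:9050");
-- Empty the trash on every BE in the cluster
ADMIN CLEAN TRASH;
-- Ask one BE to rebalance data across its own disks
ADMIN REBALANCE DISK ON ("be_host:9050");
-- Drain a BE: migrate its replicas away, then remove it
ALTER SYSTEM DECOMMISSION BACKEND "be_host:9050";
```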
BE disk usage
Nodes 5, 6, and 7 are the ones co-hosting FE.
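The per-BE usage summarized above can be pulled from the FE with SHOW PROC; a minimal sketch:

```sql
-- One row per BE: TotalCapacity, UsedPct, TrashUsedCapcacity, and so on
SHOW PROC '/backends';
```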
Investigation
- Parameter check: disable_balance, disable_disk_balance, disable_colocate_balance, and disable_tablet_scheduler are all false
- Balance weight check: after running ADMIN CLEAN TRASH on the whole cluster, node 5's load class showed LOW while its disk 4 showed HIGH
- Balance task check: all 16 tasks were RUNNING, 14 of them balance tasks; as I recall, SrcBe and DestBe were the same backend (the statements behind these checks are sketched below)
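A minimal sketch of those checks, assuming the default tag and medium names (location_default, HDD):

```sql
-- 1. Scheduler and balance switches
ADMIN SHOW FRONTEND CONFIG LIKE '%balance%';
ADMIN SHOW FRONTEND CONFIG LIKE 'disable_tablet_scheduler';
-- 2. Pending/running scheduling queues, including balance tasks
SHOW PROC '/cluster_balance';
SHOW PROC '/cluster_balance/running_tablets';
-- 3. Load class (HIGH/MID/LOW) per BE for the HDD medium
SHOW PROC '/cluster_balance/cluster_load_stat/location_default/HDD';
```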
Error logs
fe.log
2024-05-12 11:59:38,344 INFO (thrift-server-pool-63|49686) [DatabaseTransactionMgr.abortTransaction():1405] abort transaction: TransactionState. transaction id: 332518, label: 8b0b5e45-d6e6-4f8e-8596-059cd58d3576, db id: 16009, table id list: 46171, callback id: -1, coordinator: BE: 10.161.71.111, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1715515178342, commit time: -1, finish time: 1715515178343, reason: [ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = disk /opt/apache-doris/be/storage3 on backend 15617 exceed limit usage, path hash: -8679819090117116242
0# doris::Status doris::Status::create<true>(doris::TStatus const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# doris::StreamLoadAction::_process_put(doris::HttpRequest*, std::shared_ptr<doris::StreamLoadContext>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:445
2# doris::StreamLoadAction::_on_header(doris::HttpRequest*, std::shared_ptr<doris::StreamLoadContext>) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
3# doris::StreamLoadAction::on_header(doris::HttpRequest*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
4# doris::EvHttpServer::on_header(evhttp_request*) at /home/zcp/repo_center/doris_release/doris/be/src/http/ev_http_server.cpp:255
5# ?
6# bufferevent_run_readcb_
7# ?
8# ?
9# ?
10# ?
11# std::_Function_handler<void (), doris::EvHttpServer::start()::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98
12# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0
13# doris::Thread::supervise_thread(void*) at /var/local/ldb_toolchain/bin/../usr/include/pthread.h:562
14# ?
15# clone
fe.warn.log
2024-05-12 10:57:11,792 WARN (thrift-server-pool-132|63851) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:11,856 WARN (thrift-server-pool-124|50415) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:12,806 WARN (thrift-server-pool-32|414) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:12,872 WARN (thrift-server-pool-106|50331) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:13,809 WARN (thrift-server-pool-132|63851) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:13,873 WARN (thrift-server-pool-124|50415) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:31,959 WARN (thrift-server-pool-30|412) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:32,019 WARN (thrift-server-pool-74|50095) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:32,961 WARN (thrift-server-pool-107|50332) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:33,029 WARN (thrift-server-pool-32|414) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:33,968 WARN (thrift-server-pool-106|50331) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49815, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:34,084 WARN (thrift-server-pool-132|63851) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:114, be_port:9595, http_port:9596), task_type:CLONE, signature:49799, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(114)[INTERNAL_ERROR]Disk reach capacity limit]), report_version:17151864112113)
2024-05-12 10:57:34,829 WARN (ForkJoinPool-1-worker-15|106673) [TabletInvertedIndex.lambda$null$0():190] replica 88559 of tablet 49803 on backend 15621 need recovery. replica in FE: [replicaId=88559, BackendId=15621, version=55525, dataSize=176756482189, rowCount=2528641921, lastFailedVersion=63872, lastSuccessVersion=55525, lastFailedTimestamp=1715511452350, schemaHash=971940348, state=NORMAL], report version 55525, report schema hash: 971940348, is bad: false, is version missing: true
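The "is version missing: true" replica in the last warning can be cross-checked from SQL. A hedged sketch; SHOW TABLET reveals which table tablet 49803 actually belongs to, and `100301` below is only an assumption:

```sql
-- Map the tablet from the warning back to its database and table
SHOW TABLET 49803;
-- List every replica of that table whose status is not OK
ADMIN SHOW REPLICA STATUS FROM `100301` WHERE STATUS != "OK";
```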
Balance task entries grepped from fe.log
2024-05-12 10:37:12,732 INFO (tablet scheduler|40) [TabletScheduler.addTablet():272] Add tablet to pending queue, tablet id: 72486, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, visible version: -1, committed version: -1
2024-05-12 10:37:12,736 INFO (tablet scheduler|40) [TabletScheduler.removeTabletCtx():1589] remove the tablet tablet id: 72486, status: HEALTHY, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, from backend: 15558, src path hash: -3799794340428994182, visible version: 1, committed version: 1. err: unable to find low backend. because: unable to find low backend
2024-05-12 11:00:00,144 INFO (tablet scheduler|40) [BeLoadRebalancer.selectAlternativeTabletsForCluster():220] select alternative tablets, medium: HDD, num: 3, detail: [72486, 40764, 21717]
2024-05-12 11:00:00,144 INFO (tablet scheduler|40) [TabletScheduler.addTablet():272] Add tablet to pending queue, tablet id: 72486, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, visible version: -1, committed version: -1
2024-05-12 11:00:00,144 INFO (tablet scheduler|40) [TabletScheduler.removeTabletCtx():1589] remove the tablet tablet id: 72486, status: HEALTHY, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, from backend: 15558, src path hash: -3799794340428994182, visible version: 1, committed version: 1. err: unable to find low backend. because: unable to find low backend
2024-05-12 11:01:20,847 INFO (tablet scheduler|40) [BeLoadRebalancer.selectAlternativeTabletsForCluster():220] select alternative tablets, medium: HDD, num: 3, detail: [72486, 45971, 42265]
2024-05-12 11:01:20,848 INFO (tablet scheduler|40) [TabletScheduler.addTablet():272] Add tablet to pending queue, tablet id: 72486, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, visible version: -1, committed version: -1
2024-05-12 11:01:22,863 INFO (tablet scheduler|40) [TabletScheduler.removeTabletCtx():1589] remove the tablet tablet id: 72486, status: HEALTHY, state: PENDING, type: BALANCE, balance: BE_BALANCE, priority: LOW, tablet size: 0, from backend: 15558, src path hash: -3799794340428994182, visible version: 1, committed version: 1. err: unable to find low backend. because: unable to find low backend
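These entries show the scheduler repeatedly picking tablet 72486 for BE_BALANCE and then dropping it with "unable to find low backend". That is consistent with the weight check above: cross-BE balance classifies whole backends, and node 5 as a whole was LOW even though its disk 4 was HIGH. The per-disk view can be inspected as below; appending a backend id to the load-stat proc path is how 2.0.x exposes per-path stats, though the exact layout may vary by version:

```sql
-- Per-BE load class; node 5 shows LOW here despite disk 4 being HIGH
SHOW PROC '/cluster_balance/cluster_load_stat/location_default/HDD';
-- Drill into backend 15617: one row per storage path, where path hash
-- -8679819090117116242 from the stream load error should show up
SHOW PROC '/cluster_balance/cluster_load_stat/location_default/HDD/15617';
```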
Application
Data distribution
SHOW PARTITIONS FROM `100301`;
| Partition | Data size |
|---|---|
| 220240510 | 277.070 GB |
| 220240511 | 6.648 TB |
| 220240512 | 5.653 TB |
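For scale: one day at 6.648 TB times 3 replicas is roughly 20 TB landing across 10 BEs per day, so about 2 TB per BE; whether one disk overflows then depends on how many buckets AUTO chose and how large each tablet grew. A hedged way to check (the partition name in the second statement is a guess derived from dynamic_partition.prefix "jc2"):

```sql
-- One row per tablet, including DataSize
SHOW TABLETS FROM `100301`;
-- Replica counts per BE for a single day's partition
ADMIN SHOW REPLICA DISTRIBUTION FROM `100301` PARTITION (jc220240511);
```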
CREATE TABLE statement
CREATE TABLE `100301` (
in_time DATETIME,
pcode char(2),
s_logo CHAR(4),
d_flag BOOLEAN DEFAULT '0',
s_id VARCHAR(10),
dgi STRING,
sri STRING,
fid STRING,
ofa STRING,
cid STRING,
usage_type STRING,
file_category STRING,
exception_file_id STRING,
exception_cdr_id STRING,
city_code STRING,
sorting_action STRING,
exception_type STRING,
record_type STRING,
phone_no VARCHAR(200),
call_start_time STRING,
call_end_time STRING,
billed_duration BIGINT,
charging_condition_changes STRING,
data_traffic_list STRING,
total_charging_volume STRING,
service_code STRING,
upstream_traffic_1 BIGINT,
downstream_traffic_1 BIGINT,
upstream_traffic_2 BIGINT,
downstream_traffic_2 BIGINT,
mobile_user_IMSI STRING,
mobile_device_IMEI STRING,
location_area_identification STRING,
cell_number STRING,
home_network_operator STRING,
roaming_network_operator STRING,
home_location_area_code STRING,
visited_location_area_code STRING,
home_province STRING,
roaming_province STRING,
SGSN_service_node_IP_address STRING,
current_GGSN_PGW_IP_address STRING,
network_initiated_PDP_context STRING,
PDP_context_billing_Identifier STRING,
PDP_type STRING,
camel_related_to_PDP_context STRING,
IMS_signaling_PDP_context_flag STRING,
service_fee STRING,
communication_charge STRING,
charge_code STRING,
partial_record_indicator STRING,
mobile_network_capabilities STRING,
routing_area_at_record_creation STRING,
location_area_at_record_creation STRING,
cell_identity_or_service_area_code STRING,
APN_network_identifier STRING,
APN_selection_mode STRING,
APN_operational_identifier STRING,
SGSN_change_flag STRING,
used_SGSN_PLMN_identifier STRING,
roaming_type STRING,
user_type STRING,
service_major_class STRING,
record_closure_reason STRING,
supplementary_fields STRING,
user_data_charging_characteristics STRING,
rat_type_value STRING,
charging_features_selection_mode STRING,
GSN_code STRING,
camel_charging_information_set STRING,
user_account STRING,
access_point_info STRING,
access_controller_IP_address STRING,
nas_IP_address STRING,
ap_ssid STRING,
online_Charging_Session_Description STRING,
pdn_connection_identifier STRING,
user_csg_information STRING,
ipv4_address STRING,
ims_signaling STRING,
p_gw_control_plane_IP_address STRING,
pdn_type_ipv4_and_ipv6_dual_stack_info STRING,
service_priority STRING,
radio_resource_occupancy_priority STRING,
uplink_bandwidth STRING,
downlink_bandwidth STRING,
guaranteed_bandwidth STRING,
reserved_field_1 STRING,
reserved_field_2 STRING,
reserved_field_3 STRING,
rg STRING,
reserved_field_5 STRING,
sgsn_ip_address_ipv6 STRING,
current_ggsn_pgws_ip_address_ipv6 STRING,
served_pdppdn_address_ipv6 STRING,
reserved_field_6 STRING,
reserved_field_7 STRING,
reserved_field_8 STRING,
reserved_field_9 STRING,
reserved_field_10 STRING,
served_pdppdn_address_ipv4 STRING,
apn_operational_identifier_duplicate STRING,
network_initiated_pdp_context_duplicate STRING,
pdp_context_billing_identifier_duplicate STRING,
roaming_city_code STRING,
downstream_traffic_1_duplicate BIGINT,
upstream_traffic_1_duplicate BIGINT,
current_ggsn_pgws_ip_address_ipv6_duplicate STRING,
served_pdppdn_address_ipv6_duplicate STRING,
qci STRING,
/* index */
INDEX idx_phone_no (`phone_no`) USING INVERTED PROPERTIES("parser" = "english")
)
ENGINE = OLAP
DUPLICATE KEY(in_time,pcode,s_logo,d_flag)
PARTITION BY RANGE (`in_time`) ()
DISTRIBUTED BY HASH(`phone_no`) BUCKETS AUTO
PROPERTIES (
"compression" = "zstd",
"replication_allocation" = "tag.location.default: 3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-10",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "jc2",
"dynamic_partition.create_history_partition" = "true",
"dynamic_partition.history_partition_num" = "10",
"bloom_filter_columns" = "phone_no"
);
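Not from the original post, but relevant to the first question below: with BUCKETS AUTO and no size hint, the auto-bucket estimate may come out far smaller than these ~6.6 TB daily partitions, leaving each tablet holding tens of gigabytes. A minimal sketch of pairing BUCKETS AUTO with the documented "estimate_partition_size" hint (hypothetical table name, illustrative value):

```sql
CREATE TABLE `100301_sketch` (
    in_time DATETIME,
    phone_no VARCHAR(200)
)
ENGINE = OLAP
DUPLICATE KEY(in_time)
PARTITION BY RANGE (`in_time`) ()
-- tell auto bucketing how large one partition is expected to be,
-- so each new dynamic partition gets enough buckets up front
DISTRIBUTED BY HASH(`phone_no`) BUCKETS AUTO
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3",
    "estimate_partition_size" = "6144G",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-10",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "jc2"
);
```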
Questions
- Is it individual tablets holding too much data that prevents intra-node disk balancing?
- Once a single disk on a BE hits the watermark, the stream load errors show writes still being routed to that disk. Why? Shouldn't writes fall back to the other, less-loaded disks on the same node?