Doris 4.0.2, 5-node cluster: a partial disk failure on just one node caused tablets in __internal_schema to be lost and unrecoverable

Question

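Take the audit_log table in the __internal_schema database as an example. Both the database and the table are created automatically by the system.
The DDL of audit_log is as follows: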

-- __internal_schema.audit_log definition

CREATE TABLE `audit_log` (
  `query_id` varchar(48) NULL,
  `time` datetime(3) NULL,
  `client_ip` varchar(128) NULL,
  `user` varchar(128) NULL,
  `frontend_ip` varchar(1024) NULL,
  `catalog` varchar(128) NULL,
  `db` varchar(128) NULL,
  `state` varchar(128) NULL,
  `error_code` int NULL,
  `error_message` text NULL,
  `query_time` bigint NULL,
  `cpu_time_ms` bigint NULL,
  `peak_memory_bytes` bigint NULL,
  `scan_bytes` bigint NULL,
  `scan_rows` bigint NULL,
  `return_rows` bigint NULL,
  `shuffle_send_rows` bigint NULL,
  `shuffle_send_bytes` bigint NULL,
  `spill_write_bytes_from_local_storage` bigint NULL,
  `spill_read_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_remote_storage` bigint NULL,
  `parse_time_ms` int NULL,
  `plan_times_ms` map NULL,
  `get_meta_times_ms` map NULL,
  `schedule_times_ms` map NULL,
  `hit_sql_cache` tinyint NULL,
  `handled_in_fe` tinyint NULL,
  `queried_tables_and_views` array NULL,
  `chosen_m_views` array NULL,
  `changed_variables` map NULL,
  `sql_mode` text NULL,
  `stmt_type` varchar(48) NULL,
  `stmt_id` bigint NULL,
  `sql_hash` varchar(128) NULL,
  `sql_digest` varchar(128) NULL,
  `is_query` tinyint NULL,
  `is_nereids` tinyint NULL,
  `is_internal` tinyint NULL,
  `workload_group` text NULL,
  `compute_group` text NULL,
  `stmt` text NULL
) ENGINE=OLAP
DUPLICATE KEY(`query_id`, `time`, `client_ip`)
COMMENT 'Doris internal audit table, DO NOT MODIFY IT'
PARTITION BY RANGE(`time`)
(PARTITION p20251219 VALUES [('2025-12-19 00:00:00'), ('2025-12-20 00:00:00')),
PARTITION p20251220 VALUES [('2025-12-20 00:00:00'), ('2025-12-21 00:00:00')),
PARTITION p20251221 VALUES [('2025-12-21 00:00:00'), ('2025-12-22 00:00:00')),
PARTITION p20251222 VALUES [('2025-12-22 00:00:00'), ('2025-12-23 00:00:00')),
PARTITION p20251223 VALUES [('2025-12-23 00:00:00'), ('2025-12-24 00:00:00')),
PARTITION p20251224 VALUES [('2025-12-24 00:00:00'), ('2025-12-25 00:00:00')),
PARTITION p20251225 VALUES [('2025-12-25 00:00:00'), ('2025-12-26 00:00:00')),
PARTITION p20251226 VALUES [('2025-12-26 00:00:00'), ('2025-12-27 00:00:00')),
PARTITION p20251227 VALUES [('2025-12-27 00:00:00'), ('2025-12-28 00:00:00')))
DISTRIBUTED BY HASH(`query_id`) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"min_load_replica_num" = "-1",
"is_being_synced" = "false",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.replication_allocation" = "tag.location.default: 3",
"dynamic_partition.buckets" = "2",
"dynamic_partition.create_history_partition" = "false",
"dynamic_partition.history_partition_num" = "-1",
"dynamic_partition.hot_partition_num" = "0",
"dynamic_partition.reserved_history_periods" = "NULL",
"dynamic_partition.storage_policy" = "",
"storage_medium" = "hdd",
"storage_format" = "V2",
"inverted_index_storage_format" = "V3",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false",
"group_commit_interval_ms" = "10000",
"group_commit_data_bytes" = "134217728"
);

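As you can see, the replica count is already set to 3. However, running
SHOW REPLICA STATUS FROM audit_log shows that the first 8 tablets of this table have only a single replica.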

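As a result, even though the disks of only one node failed, the data cannot be repaired. And because this is a system table, I am not sure whether I can rebuild it, or the other tables in that database.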
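The replica check described above can be sketched with a few read-only statements. This is a minimal sketch, not a definitive procedure: it assumes the table is addressed as __internal_schema.audit_log and that the filter form `WHERE STATUS != "OK"` is accepted by this Doris version. Comparing the per-partition output of SHOW PARTITIONS with the table-level "replication_allocation" is one way to see whether older partitions still carry a smaller replica count than the DDL suggests.

```sql
-- Health of every replica of every tablet in the audit table.
SHOW REPLICA STATUS FROM __internal_schema.audit_log;

-- Only the replicas that are not reported as healthy.
SHOW REPLICA STATUS FROM __internal_schema.audit_log WHERE STATUS != "OK";

-- How the replicas are spread across the backends.
SHOW REPLICA DISTRIBUTION FROM __internal_schema.audit_log;

-- Per-partition properties, including each partition's own replica allocation,
-- which may differ from the table-level property shown in the DDL.
SHOW PARTITIONS FROM __internal_schema.audit_log;
```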

阿渊@SelectDB (if I have not replied, add me on WeChat via my profile) · Answer

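First, check whether reads and writes on this table still work. For a multi-replica table, for example one with 3 replicas, losing one replica does not affect reads or writes. Losing two replicas leaves the table readable, but writes are affected, because a write must succeed on a majority of replicas.
If the table reads and writes normally, the tablets will be repaired automatically, or you can intervene manually: run show tablet $tablet_id to view the details, then use admin set replica status to manually mark the bad replica as bad.
This is a physical table, so it can also be rebuilt manually.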
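The manual intervention mentioned above might look like the following. This is a sketch under stated assumptions: the tablet id 10003 and backend id 10001 are hypothetical placeholders to be replaced with the values from your own cluster, and ADMIN REPAIR TABLE is an extra step not mentioned in this answer, included only as an optional way to ask the scheduler to prioritize the repair.

```sql
-- Locate the tablet (10003 is a placeholder tablet id).
-- The result includes a DetailCmd column holding a SHOW PROC statement;
-- run that statement to list each replica and the backend it lives on.
SHOW TABLET 10003;

-- Mark the damaged replica as bad so the scheduler clones a new one
-- from a healthy replica (placeholder tablet id and backend id).
ADMIN SET REPLICA STATUS PROPERTIES (
    "tablet_id" = "10003",
    "backend_id" = "10001",
    "status" = "bad"
);

-- Optional, not part of the answer above: raise the repair priority of the table.
ADMIN REPAIR TABLE __internal_schema.audit_log;
```

Marking a replica as bad only helps when at least one healthy replica of the same tablet exists. A tablet whose only replica is lost has nothing to clone from.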

Feel free to add me on WeChat so we can follow up promptly.

oicq1699 · Answer

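Thanks for the reply.
This may be a bug in the system, though I am not sure it counts as one. My guess is that once the number of BE nodes reached 3 or more, the DDL of these tables was redefined and the replica count was changed from 1 to 3, but the replica count of the tablets that already existed was not fixed, so some tablets still have only 1 replica.
In the end I repaired them with empty tablets. Some data was certainly lost, but these are all statistics tables, so the impact is minor.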
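The post does not say exactly how the empty tablets were applied. One documented way, shown here only as an assumption about what that repair could look like, is the FE configuration item recover_with_empty_tablet, which lets Doris fill tablets whose replicas are all lost with blank replicas. The data in those tablets is not recovered.

```sql
-- Assumption: this may not be the exact procedure the poster used.
-- Allow the FE to replace tablets whose replicas are all lost with empty ones.
ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "true");

-- Wait for the tablet scheduler to run, then confirm no unhealthy replicas remain.
SHOW REPLICA STATUS FROM __internal_schema.audit_log WHERE STATUS != "OK";

-- Switch the option back off so later losses are not silently papered over.
ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "false");
```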

