Doris 4.0.2, 5-node cluster: a partial disk failure on a single node caused tablets in __internal_schema to be lost and unrecoverable


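Take the audit_log table in the __internal_schema database as an example. Both the database and the table are created automatically by the system.
The DDL of audit_log is as follows: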

-- __internal_schema.audit_log definition

CREATE TABLE `audit_log` (
  `query_id` varchar(48) NULL,
  `time` datetime(3) NULL,
  `client_ip` varchar(128) NULL,
  `user` varchar(128) NULL,
  `frontend_ip` varchar(1024) NULL,
  `catalog` varchar(128) NULL,
  `db` varchar(128) NULL,
  `state` varchar(128) NULL,
  `error_code` int NULL,
  `error_message` text NULL,
  `query_time` bigint NULL,
  `cpu_time_ms` bigint NULL,
  `peak_memory_bytes` bigint NULL,
  `scan_bytes` bigint NULL,
  `scan_rows` bigint NULL,
  `return_rows` bigint NULL,
  `shuffle_send_rows` bigint NULL,
  `shuffle_send_bytes` bigint NULL,
  `spill_write_bytes_from_local_storage` bigint NULL,
  `spill_read_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_remote_storage` bigint NULL,
  `parse_time_ms` int NULL,
  `plan_times_ms` map<text,int> NULL,
  `get_meta_times_ms` map<text,int> NULL,
  `schedule_times_ms` map<text,int> NULL,
  `hit_sql_cache` tinyint NULL,
  `handled_in_fe` tinyint NULL,
  `queried_tables_and_views` array<text> NULL,
  `chosen_m_views` array<text> NULL,
  `changed_variables` map<text,text> NULL,
  `sql_mode` text NULL,
  `stmt_type` varchar(48) NULL,
  `stmt_id` bigint NULL,
  `sql_hash` varchar(128) NULL,
  `sql_digest` varchar(128) NULL,
  `is_query` tinyint NULL,
  `is_nereids` tinyint NULL,
  `is_internal` tinyint NULL,
  `workload_group` text NULL,
  `compute_group` text NULL,
  `stmt` text NULL
) ENGINE=OLAP
DUPLICATE KEY(`query_id`, `time`, `client_ip`)
COMMENT 'Doris internal audit table, DO NOT MODIFY IT'
PARTITION BY RANGE(`time`)
(PARTITION p20251219 VALUES [('2025-12-19 00:00:00'), ('2025-12-20 00:00:00')),
PARTITION p20251220 VALUES [('2025-12-20 00:00:00'), ('2025-12-21 00:00:00')),
PARTITION p20251221 VALUES [('2025-12-21 00:00:00'), ('2025-12-22 00:00:00')),
PARTITION p20251222 VALUES [('2025-12-22 00:00:00'), ('2025-12-23 00:00:00')),
PARTITION p20251223 VALUES [('2025-12-23 00:00:00'), ('2025-12-24 00:00:00')),
PARTITION p20251224 VALUES [('2025-12-24 00:00:00'), ('2025-12-25 00:00:00')),
PARTITION p20251225 VALUES [('2025-12-25 00:00:00'), ('2025-12-26 00:00:00')),
PARTITION p20251226 VALUES [('2025-12-26 00:00:00'), ('2025-12-27 00:00:00')),
PARTITION p20251227 VALUES [('2025-12-27 00:00:00'), ('2025-12-28 00:00:00')))
DISTRIBUTED BY HASH(`query_id`) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"min_load_replica_num" = "-1",
"is_being_synced" = "false",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.replication_allocation" = "tag.location.default: 3",
"dynamic_partition.buckets" = "2",
"dynamic_partition.create_history_partition" = "false",
"dynamic_partition.history_partition_num" = "-1",
"dynamic_partition.hot_partition_num" = "0",
"dynamic_partition.reserved_history_periods" = "NULL",
"dynamic_partition.storage_policy" = "",
"storage_medium" = "hdd",
"storage_format" = "V2",
"inverted_index_storage_format" = "V3",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false",
"group_commit_interval_ms" = "10000",
"group_commit_data_bytes" = "134217728"
);

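As the DDL shows, the replica count is already set to 3. However, when I check with
SHOW REPLICA STATUS FROM audit_log, the first 8 tablets of the table have only a single replica.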
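One way to narrow this down, sketched under assumptions: as far as I understand, the table-level `replication_allocation` in the DDL is the default for newly created partitions, while each existing partition keeps the allocation it was created with. If that holds here, the oldest partitions may still carry a single-replica allocation, which would match the 8 tablets (4 partitions with 2 buckets each). The statements below only read metadata. The column name mentioned in the comment is from memory and may differ in 4.0.2.

```sql
-- Per-partition settings: compare the replica allocation reported for each
-- partition (ReplicaAllocation column, name assumed) with the table-level
-- value "tag.location.default: 3".
SHOW PARTITIONS FROM __internal_schema.audit_log;

-- How the replicas of this table are spread across the backends.
SHOW REPLICA DISTRIBUTION FROM __internal_schema.audit_log;

-- Only the replicas that are not in the OK state.
SHOW REPLICA STATUS FROM __internal_schema.audit_log WHERE STATUS != "OK";
```

If the first statement reports a single replica for the earliest partitions and three for the later ones, the mismatch is at the partition level and not in the table property itself.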

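As a result, even though disks failed on only one node, the data cannot be repaired. And because this is a system table, I am not sure whether this table, or the other tables in this database, can be rebuilt.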
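A minimal sketch of what a repair attempt could look like, assuming the affected partitions really are single-replica. This is not a verified procedure for system tables: the table comment says DO NOT MODIFY IT, so it would be safer to try this on a test cluster first. Whether `MODIFY PARTITION` is accepted on this table while dynamic partitioning is enabled is also an assumption.

```sql
-- Assumption: raise the partition-level replica allocation of all existing
-- partitions, so tablets that still have one healthy replica get cloned
-- up to three replicas.
ALTER TABLE __internal_schema.audit_log
MODIFY PARTITION (*) SET ("replication_allocation" = "tag.location.default: 3");

-- Ask the FE to schedule repair of this table's tablets with priority.
ADMIN REPAIR TABLE __internal_schema.audit_log;

-- Last resort, and it discards data: for tablets whose only replica sat on
-- the failed disk, this FE config lets Doris replace them with empty
-- tablets so the table becomes usable again. Left commented out on purpose.
-- ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "true");
```

Cloning can only help tablets that still have at least one healthy replica. A tablet whose single replica was on the failed disk has no source to copy from, so its data cannot be restored this way.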
