manager_id file and agent_id not exist, waiting for register request


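As shown in the screenshot (image.png), the monitoring data is all being collected, but the status shown is wrong. How can I fix this? The monitoring log shows the following: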

time="2025-05-14T14:34:20.651+08:00" level=info msg="supervisord quit since agent is stopping"
time="2025-05-14T14:34:21.439+08:00" level=info msg="patroller quit since agent is stopping"
time="2025-05-14T14:34:21.439+08:00" level=info msg="doris agent stopped gracefully"
time="2025-05-14T14:34:37.846+08:00" level=info msg="Doris agent version: 24.1.5"
time="2025-05-14T14:34:37.846+08:00" level=info msg="Build version: go1.22.5"
time="2025-05-14T14:34:37.846+08:00" level=info msg="Build commit: 228caf91c"
time="2025-05-14T14:34:37.846+08:00" level=info msg="Build time: 2025-03-31T14:54:13+0800"
time="2025-05-14T14:34:37.846+08:00" level=info msg="doris agent starting"
time="2025-05-14T14:34:37.849+08:00" level=info msg="doris agent started"
time="2025-05-14T14:34:38.718+08:00" level=debug msg="manager_id file and agent_id not exist, waiting for register request"

time="2025-05-14T14:35:08.722+08:00" level=debug msg="manager_id file and agent_id not exist, waiting for register request"
time="2025-05-14T14:35:38.720+08:00" level=debug msg="manager_id file and agent_id not exist, waiting for register request"
time="2025-05-14T14:36:08.729+08:00" level=debug msg="manager_id file and agent_id not exist, waiting for register request"
time="2025-05-14T14:36:38.724+08:00" level=debug msg="manager_id file and agent_id not exist, waiting for register request"

1 Answer

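Did you manually delete the agent_id and manager_id files in the {Agent deployment directory}/bin directory? These two are authentication files and must not be deleted by hand. The manager versions released so far cannot re-register an agent node that already exists.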
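To confirm this diagnosis, you can check whether both files are present on the affected agent. This is only a minimal sketch: the function name check_agent_ids is made up for illustration, and the argument is a placeholder for your own {Agent deployment directory}/bin path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Doris Manager): reports whether the two
# authentication files exist and are non-empty in the given bin directory.
check_agent_ids() {
  bin_dir="$1"   # placeholder: {Agent deployment directory}/bin
  missing=0
  for f in manager_id agent_id; do
    if [ ! -s "$bin_dir/$f" ]; then
      echo "missing or empty: $bin_dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_agent_ids against the bin directory prints one line per missing file and returns a non-zero status if either one is absent, which would match the "manager_id file and agent_id not exist" message in the log.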
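Solutions:
Method 1: Take over (host) the cluster again in manager, which re-registers the agent.
Method 2: [unofficial workaround]
2.1 On another agent machine that is working normally, find the manager_id file under {deployment directory}/bin and copy it over to the affected machine (the three machines in the screenshot belong to the same manager, so the contents of their manager_id files are identical).
2.2 Then, on the machine running manager, find the webserver/log directory. Assuming your agent uses the default port 8972, cd into webserver/log and run: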

grep -C 20 -rnw "8972/heartbeat" . | grep -C 1 "agent_id"

Copy the agent_id value from the output and write it into the agent_id file under the affected agent's bin directory.
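The file-writing part of Method 2 could be sketched roughly as follows. Everything here is an assumption for illustration: restore_agent_ids is a made-up helper, its three arguments are placeholders you supply yourself, and writing agent_id without a trailing newline is a guess, so compare against the agent_id file on a healthy agent before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not an official Doris Manager tool).
# $1: manager_id file already copied from a healthy agent (step 2.1)
# $2: agent_id value found with the grep command on the manager (step 2.2)
# $3: {Agent deployment directory}/bin on the affected agent
restore_agent_ids() {
  healthy_manager_id="$1"
  agent_id_value="$2"
  agent_bin_dir="$3"

  # Refuse to write anything if an input is missing.
  [ -s "$healthy_manager_id" ] || { echo "manager_id source missing or empty" >&2; return 1; }
  [ -n "$agent_id_value" ] || { echo "agent_id value is empty" >&2; return 1; }
  [ -d "$agent_bin_dir" ] || { echo "bin directory not found: $agent_bin_dir" >&2; return 1; }

  cp "$healthy_manager_id" "$agent_bin_dir/manager_id" || return 1
  # Assumption: the file holds only the id, with no trailing newline.
  printf '%s' "$agent_id_value" > "$agent_bin_dir/agent_id"
}
```

For example, restore_agent_ids /tmp/manager_id "<agent_id from the log>" "<agent dir>/bin", with both placeholders replaced by your own values. According to the log above, the agent checks for these files about every 30 seconds; whether it also needs a restart after they are restored is not stated in this thread.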