Featured image of post Zabbix 6.0 でデータベース肥大化が原因で Disk Full したので対処した

Zabbix 6.0 でデータベース肥大化が原因で Disk Full したので対処した

Summary

弊宅の監視をになっている Zabbix が Disk Full 直前であることに気づいて調査を始めた。

原因調査

弊宅ではヒストリーを特大保存することが想定されたので PostgreSQL に変更して、 TimescaleDB を導入したため圧縮率を確認した。その結果は下記で非常に効率的に圧縮されていて Disk Full になるような形跡はない。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
> sudo -u postgres psql -d zabbix-server

zabbix-server=# SELECT hypertable_name, pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass))
  FROM timescaledb_information.hypertables;
 hypertable_name | hypertable_size
-----------------+-----------------
 history         | 2767 MB
 history_uint    | 2465 MB
 history_str     |   26 MB
 history_text    |   22 kB
 history_log     |   24 kB
 trends          |  279 MB
 trends_uint     |  202 MB
(7 rows)

zabbix-server=# SELECT hypertable_name, pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass))
  FROM timescaledb_information.hypertables;
 hypertable_name | pg_size_pretty
-----------------+----------------
 history         | 1829 MB
 history_uint    | 1662 MB
 history_str     |   13 MB
 history_text    | 9144 kB
 history_log     |   24 kB
 trends          |  267 MB
 trends_uint     |  193 MB
(7 rows)

そうなると、次は「どのテーブル?」が気になるのでネットで拾ったクエリーで確認する
auditlog のテーブルが 55GB で特大である。。。これは Zabbix 6.0 で監査ログがめちゃくちゃ出るようになったのが影響してそうである。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
zabbix-server=# SELECT pgn.nspname, relname, pg_size_pretty(relpages::bigint * 8 * 1024) AS size, CASE WHEN relkind =
't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.reltoastrelid = pg.oid) WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgt.relname FROM pg_class pgt WHERE SUBSTRING(pgt.relname
FROM 10) = REPLACE(SUBSTRING(pg.relname FROM 10), '_index', '')) ELSE (SELECT pgc.relname FROM
pg_class pgc WHERE pg.reltoastrelid = pgc.oid) END::varchar AS refrelname, CASE WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgts.relname FROM pg_class pgts WHERE pgts.reltoastrelid =
(SELECT pgt.oid FROM pg_class pgt WHERE SUBSTRING(pgt.relname FROM 10) = REPLACE(SUBSTRING(pg.relname
FROM 10), '_index', ''))) END AS relidxrefrelname, relfilenode, relkind, reltuples::bigint, relpages
FROM pg_class pg, pg_namespace pgn WHERE pg.relnamespace = pgn.oid AND pgn.nspname NOT IN
('information_schema', 'pg_catalog') ORDER BY relpages DESC limit 100;
        nspname        |           relname           |  size   |    refrelname    | relidxrefrelname | relfilenode | relkind | reltuples | relpages
-----------------------+-----------------------------+---------+------------------+------------------+-------------+---------+-----------+----------
 public                | auditlog                    | 55 GB   | pg_toast_1487938 |                  |     1487938 | r       |  71637048 |  7255536
 public                | auditlog_pkey               | 3863 MB |                  |                  |     1487950 | i       |  71637048 |   494401
 public                | auditlog_1                  | 516 MB  |                  |                  |     1487952 | i       |  71637048 |    66045
 public                | auditlog_2                  | 493 MB  |                  |                  |     1487953 | i       |  71637048 |    63115
 public                | auditlog_3                  | 450 MB  |                  |                  |     1487954 | i       |  71637048 |    57610

How to cleanup database disk space from oversized auditlog database table - ZABBIX Forums

対処

普通に、 Table 上のデータを削除する試みも試したが PostgreSQL の仕様上、追記型データベースになり削除されたフラグを書き込みプロセスの任意のタイミングで VACUUM と言われる領域開放のコマンドが発行され Disk Block が開放される仕組みのため今回の様に Disk Full してからでは削除できない。

削除した場合は下記の順番で Disk が使用され詰んだ そのため作業を実施するには2倍の Disk 容量では足りず、 3倍の Disk が必要である。

  1. delete でデータを削除, 削除されたフラグが記録される
  2. WAL ログにも記録される
  3. 全部が終わるまでは書き込んだデータと VACUUM で消えるデータの分 Disk が必要

これが深夜の 23:00 に発覚したので辛かったゾ

Disk がなくてどうしょうもないため、インスタンスサイズを上げて処理したあと WAL-G の機能でリストアすることにする。 一時利用のインスタンスが上がってきたら下記で zabbix-server を止めておく

1
systemctl stop zabbix-server

次に現状の最古の監査ログと件数を確認しておく。
弊宅では 730日保管するように設定したが約 530 日分が保管されており、 7300万件あった。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
> sudo -u postgres psql -d zabbix-server -c 'select to_timestamp(min(clock)) as date from auditlog;'
          date          
------------------------
 2023-09-25 00:23:18+09
(1 row)

> sudo -u postgres psql -d zabbix-server -c 'select count(*) from auditlog;'
  count   
----------
 73160735   # <= 73,160,735

原因となりそうな箇所として、 Housekeeper で削除される最大件数が 500 件に設定されていた。
あとから調べた限り、監査ログは毎時6,000件は発生ていたようで、弊宅の環境では毎時100万になることはないため最大値を設定しまった。

/etc/zabbix/zabbix_server.conf
1
MaxHousekeeperDelete=1000000

PostgreSQL の設定では WAL-g を使ってバックアップとリストアを実施するため Archive モードを確認しておく

archive_modeon であることを確認しておく

/etc/postgresql/14/main/postgresql.conf
1
2
3
# - Archiving -

archive_mode = 'on'

ここまでできたら一度 PostgreSQL を再起動

1
systemctl restart postgresql

いよいよ監査ログを削除する。 今回は 15日分残すことにしたので、 1709218800 以前はバッサリ削除することにした。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# 1709218800 2023-03-01 00:00:00
> sudo -u postgres psql -d zabbix-server -c 'delete from auditlog where clock < 1709218800;'
DELETE 60887120 # 60,887,120

> dh -h
Filesystem     1M-blocks  Used Available Use% Mounted on
tmpfs               1599     2      1598   1% /run
/dev/vda1         336097 94010    224152  30% /
tmpfs               7995     1      7995   1% /dev/shm
tmpfs                  5     0         5   0% /run/lock
tmpfs               1599     1      1599   1% /run/user/1001

> sudo -u postgres psql -d zabbix-server -c 'select to_timestamp(min(clock)) as date from auditlog;'
          date          
------------------------
 2024-03-01 00:00:04+09
(1 row)

> sudo -u postgres psql -d zabbix-server -c 'select count(*) from auditlog;'
  count   
----------
 12273619 # 12,273,619
(1 row)

前述の通り、 VACUUM しなければ Disk Block は開放されないため実施する。
これが完了すれば Disk は空くはずである

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
> sudo -u postgres psql -d zabbix-server -c 'vacuum full;'
VACUUM

> df -h
Filesystem     1M-blocks  Used Available Use% Mounted on
tmpfs               1599     2      1598   1% /run
/dev/vda1         336097 38239    279922  13% /
tmpfs               7995     1      7995   1% /dev/shm
tmpfs                  5     0         5   0% /run/lock
tmpfs               1599     1      1599   1% /run/user/1001

Table 容量も確認しておこう。結果 10GB 程度にダイエットが完了した。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
sudo -u postgres psql -d zabbix-server

zabbix-server=# SELECT pgn.nspname, relname, pg_size_pretty(relpages::bigint * 8 * 1024) AS size, CASE WHEN relkind =
't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.reltoastrelid = pg.oid) WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgt.relname FROM pg_class pgt WHERE SUBSTRING(pgt.relname
FROM 10) = REPLACE(SUBSTRING(pg.relname FROM 10), '_index', '')) ELSE (SELECT pgc.relname FROM
pg_class pgc WHERE pg.reltoastrelid = pgc.oid) END::varchar AS refrelname, CASE WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgts.relname FROM pg_class pgts WHERE pgts.reltoastrelid =
(SELECT pgt.oid FROM pg_class pgt WHERE SUBSTRING(pgt.relname FROM 10) = REPLACE(SUBSTRING(pg.relname
FROM 10), '_index', ''))) END AS relidxrefrelname, relfilenode, relkind, reltuples::bigint, relpages
FROM pg_class pg, pg_namespace pgn WHERE pg.relnamespace = pgn.oid AND pgn.nspname NOT IN
('information_schema', 'pg_catalog') ORDER BY relpages DESC limit 10;
        nspname        |       relname       |   size   |    refrelname    | relidxrefrelname | relfilenode | relkind | reltuples | relpages 
-----------------------+---------------------+----------+------------------+------------------+-------------+---------+-----------+----------
 public                | auditlog            | 10213 MB | pg_toast_1487938 |                  |    14615045 | r       |  12273619 |  1307311
 public                | auditlog_pkey       | 582 MB   |                  |                  |    14615050 | i       |  12273619 |    74519
 _timescaledb_internal | _hyper_1_3647_chunk | 130 MB   |                  |                  |    12588989 | r       |   2266514 |    16670
 _timescaledb_internal | _hyper_1_3639_chunk | 130 MB   |                  |                  |    12562069 | r       |   2263446 |    16648
 _timescaledb_internal | _hyper_1_3655_chunk | 128 MB   |                  |                  |    12629568 | r       |   2232690 |    16421
 _timescaledb_internal | _hyper_1_3663_chunk | 127 MB   |                  |                  |    12656420 | r       |   2205610 |    16221
 _timescaledb_internal | _hyper_1_3615_chunk | 124 MB   |                  |                  |    12489544 | r       |   2158248 |    15871
 _timescaledb_internal | _hyper_1_3623_chunk | 121 MB   |                  |                  |    12511324 | r       |   2102781 |    15465
 _timescaledb_internal | _hyper_1_3631_chunk | 119 MB   |                  |                  |    12535552 | r       |   2079475 |    15292
 _timescaledb_internal | _hyper_1_3673_chunk | 113 MB   |                  |                  |    12682644 | r       |   1960139 |    14416
(10 rows)

zabbix-server=# SELECT hypertable_name, pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass))
  FROM timescaledb_information.hypertables;
 hypertable_name | pg_size_pretty 
-----------------+----------------
 history         | 2438 MB
 history_uint    | 2129 MB
 history_str     | 19 MB
 history_text    | 16 MB
 history_log     | 24 kB
 trends          | 265 MB
 trends_uint     | 179 MB
(7 rows)

バックアップの取得

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
> sudo -u postgres /usr/local/bin/wal-g backup-push
INFO: 2024/03/13 20:45:36.942069 Running remote backup through Postgres connection.
INFO: 2024/03/13 20:45:36.942145 Features like delta backup are disabled, there might be a performance impact.
INFO: 2024/03/13 20:45:36.942149 To run with local backup functionalities, supply [db_directory].
INFO: 2024/03/13 20:45:36.946127 Starting remote backup
INFO: 2024/03/13 20:45:36.978800 Streaming remote backup
INFO: 2024/03/13 20:45:36.984445 Adding data directory
INFO: 2024/03/13 20:45:40.425341 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:45:48.601133 This file base/1486151/14615045 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:46:02.575249 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:46:10.564937 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:46:13.159179 This file base/1486151/14615045.4 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:46:26.894618 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:46:32.419388 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:46:35.625057 This file base/1486151/14615045.7 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:46:50.693952 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:47:00.767213 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:47:02.461613 This file base/1486151/14615045.3 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:47:18.187064 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:47:31.972262 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:47:33.258112 This file base/1486151/14615045.6 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:47:47.199094 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:48:01.637708 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:48:03.311752 This file base/1486151/14615045.1 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:48:15.411863 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:48:31.695367 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:48:34.268306 This file base/1486151/14615045.2 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:48:46.795904 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:49:09.892732 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:49:12.314419 This file base/1486151/14615045.8 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:49:27.994861 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:49:41.526432 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 20:49:43.451676 This file base/1486151/14615045.5 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 20:49:57.294651 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:50:13.219922 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:50:37.404133 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:50:54.636412 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 20:51:16.451883 Finishing backup
INFO: 2024/03/13 20:51:16.451919 If wal-g hangs during this step, please Postgres log file for details.
INFO: 2024/03/13 20:51:16.457887 Updating metadata
INFO: 2024/03/13 20:51:16.457933 Uploading metadata
INFO: 2024/03/13 20:51:17.285365 Wrote backup with name base_0000000100000C4400000030

base backup を確認 base_0000000100000C4400000030 が作ったやつ

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
> sudo -u postgres /usr/local/bin/wal-g backup-list
name                          modified             wal_segment_backup_start
base_0000000100000BFC00000085 2024-03-03T15:33:21Z 0000000100000BFC00000085
base_0000000100000C020000003B 2024-03-04T15:48:18Z 0000000100000C020000003B
base_0000000100000C07000000F5 2024-03-05T16:11:46Z 0000000100000C07000000F5
base_0000000100000C0D00000076 2024-03-06T15:39:12Z 0000000100000C0D00000076
base_0000000100000C130000001D 2024-03-07T15:38:52Z 0000000100000C130000001D
base_0000000100000C18000000C0 2024-03-08T15:39:26Z 0000000100000C18000000C0
base_0000000100000C1E0000004C 2024-03-09T15:17:43Z 0000000100000C1E0000004C
base_0000000100000C23000000EE 2024-03-10T15:17:01Z 0000000100000C23000000EE
base_0000000100000C4400000009 2024-03-13T02:47:45Z 0000000100000C4400000009
base_0000000100000C4400000030 2024-03-13T03:51:18Z 0000000100000C4400000030

本番にリストア

本番側で DB を止めて ファイルを削除する

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
> systemctl stop postgresql
> systemctl status postgresql
○ postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2024-03-13 11:00:21 JST; 1h 15min ago
    Process: 1613 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1613 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Mar 13 01:19:30 zabbix-01 systemd[1]: Starting PostgreSQL RDBMS...
Mar 13 01:19:30 zabbix-01 systemd[1]: Finished PostgreSQL RDBMS.
Mar 13 11:00:21 zabbix-01 systemd[1]: postgresql.service: Deactivated successfully.
Mar 13 11:00:21 zabbix-01 systemd[1]: Stopped PostgreSQL RDBMS.

> df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           392M  1.3M  390M   1% /run
/dev/vda1        94G   89G  253M 100% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           392M  4.0K  392M   1% /run/user/1001

> sudo -u postgres bash -c 'rm -r /var/lib/postgresql/14/main/*'

> df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           392M  1.3M  390M   1% /run
/dev/vda1        94G   17G   73G  19% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           392M  4.0K  392M   1% /run/user/1001

バックアップの戻し LATEST backup is: 'base_0000000100000C4400000030' が先程の wal-g backup-push を一致することを確認しておく

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
> sudo -u postgres /usr/local/bin/wal-g backup-fetch /var/lib/postgresql/14/main/ LATEST
INFO: 2024/03/13 20:52:42.968259 Selecting the latest backup...
INFO: 2024/03/13 20:52:44.445459 LATEST backup is: 'base_0000000100000C4400000030'
INFO: 2024/03/13 20:52:51.465022 Finished extraction of part_001.tar.br
INFO: 2024/03/13 20:52:57.457942 Finished extraction of part_002.tar.br
INFO: 2024/03/13 20:53:02.447395 Finished extraction of part_010.tar.br
INFO: 2024/03/13 20:53:12.755538 Finished extraction of part_012.tar.br
INFO: 2024/03/13 20:53:13.637649 Finished extraction of part_004.tar.br
INFO: 2024/03/13 20:53:20.211405 Finished extraction of part_014.tar.br
INFO: 2024/03/13 20:53:20.894047 Finished extraction of part_008.tar.br
INFO: 2024/03/13 20:53:20.918344 Finished extraction of part_005.tar.br
INFO: 2024/03/13 20:53:31.182877 Finished extraction of part_018.tar.br
INFO: 2024/03/13 20:53:31.592347 Finished extraction of part_011.tar.br
INFO: 2024/03/13 20:53:34.101487 Finished extraction of part_016.tar.br
INFO: 2024/03/13 20:53:38.788110 Finished extraction of part_009.tar.br
INFO: 2024/03/13 20:53:39.836330 Finished extraction of part_006.tar.br
INFO: 2024/03/13 20:53:45.067718 Finished extraction of part_003.tar.br
INFO: 2024/03/13 20:53:47.445449 Finished extraction of part_007.tar.br
INFO: 2024/03/13 20:53:55.481410 Finished extraction of part_021.tar.br
INFO: 2024/03/13 20:54:08.276366 Finished extraction of part_019.tar.br
INFO: 2024/03/13 20:54:14.148661 Finished extraction of part_017.tar.br
INFO: 2024/03/13 20:54:14.300711 Finished extraction of part_020.tar.br
INFO: 2024/03/13 20:54:21.061829 Finished extraction of part_013.tar.br
INFO: 2024/03/13 20:54:35.256035 Finished extraction of part_022.tar.br
INFO: 2024/03/13 20:55:18.226019 Finished extraction of part_015.tar.br
INFO: 2024/03/13 20:55:18.475307 Finished extraction of pg_control.tar.br
INFO: 2024/03/13 20:55:18.475843 
Backup extraction complete.

展開された Disk 使用量も想定どおり

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
> ls -l /var/lib/postgresql/14/main/
total 84
-rw------- 1 postgres postgres  209 Mar 13 21:18 backup_label
drwx------ 7 postgres postgres 4096 Mar 13 21:20 base
drwx------ 2 postgres postgres 4096 Mar 13 21:21 global
drwx------ 2 postgres postgres 4096 Mar 13 21:18 log
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_commit_ts
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_dynshmem
drwx------ 4 postgres postgres 4096 Mar 13 21:18 pg_logical
drwx------ 4 postgres postgres 4096 Mar 13 21:21 pg_multixact
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_notify
drwx------ 2 postgres postgres 4096 Mar 13 21:21 pg_replslot
drwx------ 2 postgres postgres 4096 Mar 13 21:21 pg_serial
drwx------ 2 postgres postgres 4096 Mar 13 21:21 pg_snapshots
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_stat
drwx------ 2 postgres postgres 4096 Mar 13 21:21 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Mar 13 21:21 pg_subtrans
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_tblspc
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_twophase
-rw------- 1 postgres postgres    3 Mar 13 21:21 PG_VERSION
drwx------ 3 postgres postgres 4096 Mar 13 21:21 pg_wal
drwx------ 2 postgres postgres 4096 Mar 13 21:18 pg_xact
-rw------- 1 postgres postgres   88 Mar 13 21:18 postgresql.auto.conf
-rw------- 1 postgres postgres    0 Mar 13 21:18 tablespace_map

> df -h
Filesystem     1M-blocks  Used Available Use% Mounted on
tmpfs                392     2       390   1% /run
/dev/vda1          95941 34135     56671  38% /
tmpfs               1956     0      1956   0% /dev/shm
tmpfs                  5     0         5   0% /run/lock
tmpfs                392     1       392   1% /run/user/1001

postgresql.conf にリストアコマンドを設定

/etc/postgresql/14/main/postgresql.conf
1
2
3
4
5
#restore_command = ''           # command to use to restore an archived logfile segment
                                # placeholders: %p = path of file to restore
                                #               %f = file name only
                                # e.g. 'cp /mnt/server/archivedir/%f %p'
restore_command = '/usr/local/bin/wal-g wal-fetch "%f" "%p"'

複数配置されている可能性もあるので確認

1
2
> grep -E '^(restore|recovery)' /etc/postgresql/14/main/postgresql.conf
restore_command = '/usr/local/bin/wal-g wal-fetch "%f" "%p"'

このファイルを作成することで PostgreSQL 起動時に restore_command が実行されて Database が再構築される

1
> sudo -u postgres touch /var/lib/postgresql/14/main/recovery.signal

PostgreSQL を起動して修復させる
starting archive recovery で始まり、 archive recovery complete で完了するはずである

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
> systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: active (exited) since Wed 2024-03-13 22:06:34 JST; 36s ago
    Process: 231285 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 231285 (code=exited, status=0/SUCCESS)
        CPU: 1ms

Mar 13 22:06:34 zabbix-01 systemd[1]: Starting PostgreSQL RDBMS...
Mar 13 22:06:34 zabbix-01 systemd[1]: Finished PostgreSQL RDBMS.

> 
2024-03-13 22:06:25.007 JST [231164] LOG:  starting PostgreSQL 14.11 (Ubuntu 14.11-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-03-13 22:06:25.008 JST [231164] LOG:  listening on IPv6 address "::1", port 5432
2024-03-13 22:06:25.008 JST [231164] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-03-13 22:06:25.008 JST [231164] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-03-13 22:06:25.015 JST [231165] LOG:  database system was interrupted; last known up at 2024-03-13 12:45:36 JST
ERROR: 2024/03/13 22:06:27.869465 Archive '00000002.history' does not exist.
2024-03-13 22:06:27.872 JST [231165] LOG:  starting archive recovery
2024-03-13 22:06:29.923 JST [231197] netbox@netbox FATAL:  the database system is starting up
2024-03-13 22:06:29.930 JST [231198] netbox@netbox FATAL:  the database system is starting up
2024-03-13 22:06:30.168 JST [231165] LOG:  restored log file "0000000100000C4400000030" from archive
2024-03-13 22:06:30.183 JST [231165] LOG:  redo starts at C44/30000028
INFO: 2024/03/13 22:06:30.907840 WAL-prefetch file:  0000000100000C4400000031
INFO: 2024/03/13 22:06:30.918103 WAL-prefetch file:  0000000100000C4400000032
INFO: 2024/03/13 22:06:30.928326 WAL-prefetch file:  0000000100000C4400000033
INFO: 2024/03/13 22:06:30.938626 WAL-prefetch file:  0000000100000C4400000034
INFO: 2024/03/13 22:06:30.948929 WAL-prefetch file:  0000000100000C4400000035
INFO: 2024/03/13 22:06:30.959148 WAL-prefetch file:  0000000100000C4400000036
INFO: 2024/03/13 22:06:30.969413 WAL-prefetch file:  0000000100000C4400000037
INFO: 2024/03/13 22:06:30.979660 WAL-prefetch file:  0000000100000C4400000038
INFO: 2024/03/13 22:06:30.989899 WAL-prefetch file:  0000000100000C4400000039
INFO: 2024/03/13 22:06:31.000128 WAL-prefetch file:  0000000100000C440000003A
2024-03-13 22:06:31.937 JST [231165] LOG:  restored log file "0000000100000C4400000031" from archive
2024-03-13 22:06:31.947 JST [231165] LOG:  consistent recovery state reached at C44/31000050
2024-03-13 22:06:31.948 JST [231164] LOG:  database system is ready to accept read-only connections
2024-03-13 22:06:32.717 JST [231165] LOG:  restored log file "0000000100000C4400000032" from archive
INFO: 2024/03/13 22:06:32.782694 WAL-prefetch file:  0000000100000C440000003B
2024-03-13 22:06:33.519 JST [231165] LOG:  restored log file "0000000100000C4400000033" from archive
INFO: 2024/03/13 22:06:33.594427 WAL-prefetch file:  0000000100000C440000003C
2024-03-13 22:06:34.326 JST [231165] LOG:  restored log file "0000000100000C4400000034" from archive
INFO: 2024/03/13 22:06:34.406829 WAL-prefetch file:  0000000100000C440000003D
ERROR: 2024/03/13 22:06:34.627416 Archive '0000000100000C440000003B' does not exist.

2024-03-13 22:06:35.106 JST [231165] LOG:  restored log file "0000000100000C4400000035" from archive
INFO: 2024/03/13 22:06:35.123471 WAL-prefetch file:  0000000100000C440000003B
INFO: 2024/03/13 22:06:35.154871 WAL-prefetch file:  0000000100000C440000003E
ERROR: 2024/03/13 22:06:35.402278 Archive '0000000100000C440000003C' does not exist.

2024-03-13 22:06:35.876 JST [231165] LOG:  restored log file "0000000100000C4400000036" from archive
INFO: 2024/03/13 22:06:35.911866 WAL-prefetch file:  0000000100000C440000003C
INFO: 2024/03/13 22:06:35.942650 WAL-prefetch file:  0000000100000C440000003F
ERROR: 2024/03/13 22:06:36.277734 Archive '0000000100000C440000003D' does not exist.

2024-03-13 22:06:36.645 JST [231165] LOG:  restored log file "0000000100000C4400000037" from archive
INFO: 2024/03/13 22:06:36.683864 WAL-prefetch file:  0000000100000C440000003D
INFO: 2024/03/13 22:06:36.714766 WAL-prefetch file:  0000000100000C4400000040
ERROR: 2024/03/13 22:06:36.872674 Archive '0000000100000C440000003B' does not exist.

ERROR: 2024/03/13 22:06:36.914242 Archive '0000000100000C440000003E' does not exist.

INFO: 2024/03/13 22:06:37.434530 WAL-prefetch file:  0000000100000C440000003B
2024-03-13 22:06:37.437 JST [231165] LOG:  restored log file "0000000100000C4400000038" from archive
INFO: 2024/03/13 22:06:37.465291 WAL-prefetch file:  0000000100000C440000003E
INFO: 2024/03/13 22:06:37.496499 WAL-prefetch file:  0000000100000C4400000041
ERROR: 2024/03/13 22:06:37.639744 Archive '0000000100000C440000003C' does not exist.

ERROR: 2024/03/13 22:06:37.715640 Archive '0000000100000C440000003F' does not exist.

INFO: 2024/03/13 22:06:38.217963 WAL-prefetch file:  0000000100000C440000003C
2024-03-13 22:06:38.243 JST [231165] LOG:  restored log file "0000000100000C4400000039" from archive
INFO: 2024/03/13 22:06:38.248569 WAL-prefetch file:  0000000100000C440000003F
INFO: 2024/03/13 22:06:38.279192 WAL-prefetch file:  0000000100000C4400000042
ERROR: 2024/03/13 22:06:38.469648 Archive '0000000100000C4400000040' does not exist.

ERROR: 2024/03/13 22:06:38.513063 Archive '0000000100000C440000003D' does not exist.

2024-03-13 22:06:38.996 JST [231165] LOG:  restored log file "0000000100000C440000003A" from archive
INFO: 2024/03/13 22:06:39.013546 WAL-prefetch file:  0000000100000C440000003D
INFO: 2024/03/13 22:06:39.044783 WAL-prefetch file:  0000000100000C4400000040
INFO: 2024/03/13 22:06:39.076103 WAL-prefetch file:  0000000100000C4400000043
ERROR: 2024/03/13 22:06:39.291183 Archive '0000000100000C440000003E' does not exist.

ERROR: 2024/03/13 22:06:39.308130 Archive '0000000100000C4400000041' does not exist.

ERROR: 2024/03/13 22:06:39.309227 Archive '0000000100000C440000003B' does not exist.

INFO: 2024/03/13 22:06:39.750725 WAL-prefetch file:  0000000100000C440000003B
INFO: 2024/03/13 22:06:39.781402 WAL-prefetch file:  0000000100000C440000003E
INFO: 2024/03/13 22:06:39.812008 WAL-prefetch file:  0000000100000C4400000041
INFO: 2024/03/13 22:06:39.843517 WAL-prefetch file:  0000000100000C4400000044
ERROR: 2024/03/13 22:06:39.972609 Archive '0000000100000C440000003C' does not exist.

ERROR: 2024/03/13 22:06:40.055627 Archive '0000000100000C440000003F' does not exist.

ERROR: 2024/03/13 22:06:40.164410 Archive '0000000100000C4400000042' does not exist.

ERROR: 2024/03/13 22:06:40.805630 Archive '0000000100000C440000003D' does not exist.

ERROR: 2024/03/13 22:06:40.843434 Archive '0000000100000C4400000040' does not exist.

ERROR: 2024/03/13 22:06:41.002822 Archive '0000000100000C4400000043' does not exist.

ERROR: 2024/03/13 22:06:41.484332 Archive '0000000100000C440000003B' does not exist.

ERROR: 2024/03/13 22:06:41.546919 Archive '0000000100000C4400000041' does not exist.

ERROR: 2024/03/13 22:06:41.625724 Archive '0000000100000C440000003E' does not exist.

ERROR: 2024/03/13 22:06:41.748877 Archive '0000000100000C4400000044' does not exist.

ERROR: 2024/03/13 22:06:41.804121 Archive '0000000100000C440000003B' does not exist.
2024-03-13 22:06:41.807 JST [231165] LOG:  redo done at C44/3A000110 system usage: CPU: user: 0.00 s, system: 0.02 s, elapsed: 11.62 s
2024-03-13 22:06:41.807 JST [231165] LOG:  last completed transaction was at log time 2024-03-13 15:39:32.252469+09
2024-03-13 22:06:43.320 JST [231165] LOG:  restored log file "0000000100000C440000003A" from archive
INFO: 2024/03/13 22:06:44.065522 WAL-prefetch file:  0000000100000C440000003B
INFO: 2024/03/13 22:06:44.075757 WAL-prefetch file:  0000000100000C440000003C
INFO: 2024/03/13 22:06:44.086004 WAL-prefetch file:  0000000100000C440000003D
INFO: 2024/03/13 22:06:44.096180 WAL-prefetch file:  0000000100000C440000003E
INFO: 2024/03/13 22:06:44.106405 WAL-prefetch file:  0000000100000C440000003F
INFO: 2024/03/13 22:06:44.116670 WAL-prefetch file:  0000000100000C4400000040
INFO: 2024/03/13 22:06:44.126875 WAL-prefetch file:  0000000100000C4400000041
INFO: 2024/03/13 22:06:44.137178 WAL-prefetch file:  0000000100000C4400000042
INFO: 2024/03/13 22:06:44.147359 WAL-prefetch file:  0000000100000C4400000043
INFO: 2024/03/13 22:06:44.157616 WAL-prefetch file:  0000000100000C4400000044
ERROR: 2024/03/13 22:06:45.824583 Archive '0000000100000C440000003F' does not exist.

ERROR: 2024/03/13 22:06:45.877488 Archive '0000000100000C4400000041' does not exist.

ERROR: 2024/03/13 22:06:45.892335 Archive '0000000100000C440000003B' does not exist.

ERROR: 2024/03/13 22:06:45.899515 Archive '0000000100000C440000003E' does not exist.

ERROR: 2024/03/13 22:06:45.905118 Archive '0000000100000C440000003C' does not exist.

ERROR: 2024/03/13 22:06:45.916719 Archive '00000002.history' does not exist.
ERROR: 2024/03/13 22:06:45.917252 Archive '0000000100000C4400000040' does not exist.

2024-03-13 22:06:45.919 JST [231165] LOG:  selected new timeline ID: 2
ERROR: 2024/03/13 22:06:45.932770 Archive '0000000100000C440000003D' does not exist.

2024-03-13 22:06:45.942 JST [231165] LOG:  archive recovery complete
ERROR: 2024/03/13 22:06:45.950458 Archive '0000000100000C4400000044' does not exist.

ERROR: 2024/03/13 22:06:45.965251 Archive '0000000100000C4400000042' does not exist.

ERROR: 2024/03/13 22:06:45.975817 Archive '0000000100000C4400000043' does not exist.

ERROR: 2024/03/13 22:06:48.562238 Archive '00000001.history' does not exist.
2024-03-13 22:06:48.580 JST [231164] LOG:  database system is ready to accept connections
2024-03-13 22:06:48.582 JST [231469] LOG:  TimescaleDB background worker launcher connected to shared catalogs
INFO: 2024/03/13 22:06:51.363506 FILE PATH: 00000002.history.br

起動したら、 テーブルサイズなどを確認しておく

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
sudo -u postgres psql -d zabbix-server

zabbix-server=# SELECT pgn.nspname, relname, pg_size_pretty(relpages::bigint * 8 * 1024) AS size, CASE WHEN relkind =
't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.reltoastrelid = pg.oid) WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgt.relname FROM pg_class pgt WHERE SUBSTRING(pgt.relname
FROM 10) = REPLACE(SUBSTRING(pg.relname FROM 10), '_index', '')) ELSE (SELECT pgc.relname FROM
pg_class pgc WHERE pg.reltoastrelid = pgc.oid) END::varchar AS refrelname, CASE WHEN nspname =
'pg_toast' AND relkind = 'i' THEN (SELECT pgts.relname FROM pg_class pgts WHERE pgts.reltoastrelid =
(SELECT pgt.oid FROM pg_class pgt WHERE SUBSTRING(pgt.relname FROM 10) = REPLACE(SUBSTRING(pg.relname
FROM 10), '_index', ''))) END AS relidxrefrelname, relfilenode, relkind, reltuples::bigint, relpages
FROM pg_class pg, pg_namespace pgn WHERE pg.relnamespace = pgn.oid AND pgn.nspname NOT IN
('information_schema', 'pg_catalog') ORDER BY relpages DESC limit 10;
        nspname        |       relname       |   size   |    refrelname    | relidxrefrelname | relfilenode | relkind | reltuples | relpages 
-----------------------+---------------------+----------+------------------+------------------+-------------+---------+-----------+----------
 public                | auditlog            | 10213 MB | pg_toast_1487938 |                  |    14615045 | r       |  12271772 |  1307311
 public                | auditlog_pkey       | 582 MB   |                  |                  |    14615050 | i       |  12271772 |    74519
 _timescaledb_internal | _hyper_1_3647_chunk | 130 MB   |                  |                  |    12588989 | r       |   2266514 |    16670
 _timescaledb_internal | _hyper_1_3639_chunk | 130 MB   |                  |                  |    12562069 | r       |   2263446 |    16648
 _timescaledb_internal | _hyper_1_3655_chunk | 128 MB   |                  |                  |    12629568 | r       |   2232690 |    16421
 _timescaledb_internal | _hyper_1_3663_chunk | 127 MB   |                  |                  |    12656420 | r       |   2205610 |    16221
 _timescaledb_internal | _hyper_1_3615_chunk | 124 MB   |                  |                  |    12489544 | r       |   2158248 |    15871
 _timescaledb_internal | _hyper_1_3623_chunk | 121 MB   |                  |                  |    12511324 | r       |   2102781 |    15465
 _timescaledb_internal | _hyper_1_3631_chunk | 119 MB   |                  |                  |    12535552 | r       |   2079475 |    15292
 _timescaledb_internal | _hyper_1_3673_chunk | 113 MB   |                  |                  |    12682644 | r       |   1960139 |    14416
(10 rows)

zabbix-server=# SELECT hypertable_name, pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass))
  FROM timescaledb_information.hypertables;
 hypertable_name | pg_size_pretty 
-----------------+----------------
 history         | 2438 MB
 history_uint    | 2129 MB
 history_str     | 19 MB
 history_text    | 16 MB
 history_log     | 24 kB
 trends          | 265 MB
 trends_uint     | 179 MB
(7 rows)

zabbix-server=# select to_timestamp(min(clock)) as date from auditlog;
          date          
------------------------
 2024-03-01 00:00:04+09
(1 row)

zabbix-server=# select count(*) from auditlog;
  count   
----------
 12273629
(1 row)

1
2
3
4
5
6
7
> df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
tmpfs                392     2       390   1% /run
/dev/vda1          95941 34370     56436  38% /
tmpfs               1956     1      1956   1% /dev/shm
tmpfs                  5     0         5   0% /run/lock
tmpfs                392     1       392   1% /run/user/1001

事後バックアップを取得

作業が終わったので base バックアップを再度取得して S3 に保管しておく

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
> sudo -u postgres /usr/local/bin/wal-g backup-push
INFO: 2024/03/13 16:27:54.308194 Running remote backup through Postgres connection.
INFO: 2024/03/13 16:27:54.308237 Features like delta backup are disabled, there might be a performance impact.
INFO: 2024/03/13 16:27:54.308243 To run with local backup functionalities, supply [db_directory].
INFO: 2024/03/13 16:27:54.311153 Starting remote backup
INFO: 2024/03/13 16:27:54.394404 Streaming remote backup
INFO: 2024/03/13 16:27:54.398481 Adding data directory
INFO: 2024/03/13 16:27:57.292151 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:28:00.459655 This file base/1486151/14615045 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:28:10.562509 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:28:17.266360 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:28:19.918033 This file base/1486151/14615045.4 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:28:29.294349 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:28:33.425415 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:28:36.106621 This file base/1486151/14615045.7 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:28:45.449469 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:28:53.925799 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:28:55.575111 This file base/1486151/14615045.3 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:29:04.921248 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:29:13.961158 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:29:16.913536 This file base/1486151/14615045.6 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:29:27.625260 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:29:37.194185 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:29:38.972942 This file base/1486151/14615045.1 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:29:47.820663 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:29:59.633128 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:30:01.481208 This file base/1486151/14615045.2 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:30:11.487502 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:30:28.594310 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:30:32.375201 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:30:33.760537 This file base/1486151/14615045.8 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:30:43.351224 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:30:53.651041 maxTarSize exceeded. Closing this tar.
WARNING: 2024/03/13 16:30:55.854545 This file base/1486151/14615045.5 is larger than max tar size. It will have its own tar file, which will be larger than the selected max tar size.
INFO: 2024/03/13 16:31:05.428701 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:31:16.780749 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:31:31.963916 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:31:44.598928 maxTarSize exceeded. Closing this tar.
INFO: 2024/03/13 16:32:01.543754 Finishing backup
INFO: 2024/03/13 16:32:01.543798 If wal-g hangs during this step, please Postgres log file for details.
INFO: 2024/03/13 16:32:01.550277 Updating metadata
INFO: 2024/03/13 16:32:01.550325 Uploading metadata
INFO: 2024/03/13 16:32:02.399014 Wrote backup with name base_0000000200000C4400000055

> sudo -u postgres /usr/local/bin/wal-g backup-list
name                          modified             wal_segment_backup_start
base_0000000100000BFC00000085 2024-03-03T15:33:21Z 0000000100000BFC00000085
base_0000000100000C020000003B 2024-03-04T15:48:18Z 0000000100000C020000003B
base_0000000100000C07000000F5 2024-03-05T16:11:46Z 0000000100000C07000000F5
base_0000000100000C0D00000076 2024-03-06T15:39:12Z 0000000100000C0D00000076
base_0000000100000C130000001D 2024-03-07T15:38:52Z 0000000100000C130000001D
base_0000000100000C18000000C0 2024-03-08T15:39:26Z 0000000100000C18000000C0
base_0000000100000C1E0000004C 2024-03-09T15:17:43Z 0000000100000C1E0000004C
base_0000000100000C23000000EE 2024-03-10T15:17:01Z 0000000100000C23000000EE
base_0000000100000C4400000009 2024-03-13T02:47:45Z 0000000100000C4400000009
base_0000000100000C4400000030 2024-03-13T03:51:18Z 0000000100000C4400000030
base_0000000200000C4400000055 2024-03-13T07:32:03Z 0000000200000C4400000055

これで乗ってるサービスを再起動すれば問題なくもとに戻る。

Disk 監視はちゃんとやろうなお兄さんと約束だぞw

予防策

監査ログ保管期間の修正

デフォルトでは 356d になりそれだけで Disk を弊宅の場合 40GB 以上食うことになるため、 15d に変更した。
Zabbix 6.0 では監査ログを API で取得できるため必要になったら 毎日 Export する仕組みを作ろうと思う

管理 > 一般設定 > 監査ログ

Housekeeper の設定修正

ParameterMandatoryRangeDefaultDescription
MaxHousekeeperDeleteno0-10000005000No more than MaxHousekeeperDelete rows (corresponding to [tablename], [field], [value]) will be deleted per one task in one housekeeping cycle.
If set to 0 then no limit is used at all. In this case you must know what you are doing, so as not to overload the database! 2
This parameter applies only to deleting history and trends of already deleted items.

1 Zabbix server
/etc/zabbix/zabbix_server.conf
1
MaxHousekeeperDelete=1000000

参考情報

How to cleanup database disk space from oversized auditlog database table - ZABBIX Forums

Built with Hugo
テーマ StackJimmy によって設計されています。