Skip to content

Instantly share code, notes, and snippets.

@hyperion-cs
Last active March 12, 2025 07:19
Show Gist options
  • Save hyperion-cs/c0df0110e6fc575dc85f797af4342afe to your computer and use it in GitHub Desktop.
Save hyperion-cs/c0df0110e6fc575dc85f797af4342afe to your computer and use it in GitHub Desktop.
PostgreSQL probably orphaned files
WITH opt AS (
SELECT '1 day'::interval older_than
),base_files AS (
SELECT
format('base/%s/%s', db.oid::text, ls.file) path,
ls.file, substring(ls.file, '^\d+') parent
FROM pg_database db
CROSS JOIN LATERAL pg_ls_dir('base/' || db.oid::text) ls(file)
WHERE db.datname = current_database() AND ls.file ~ '^\d+'
),probably_orphaned_parents AS (
SELECT * FROM base_files bf, opt
WHERE file = parent AND NOT EXISTS (
SELECT FROM pg_class p
WHERE pg_relation_filenode(p.oid)::text = bf.file
)
),probably_orphaned_all as (
SELECT bf.*, sf.* FROM probably_orphaned_parents pop
JOIN base_files bf ON bf.parent = pop.parent
JOIN opt ON true
CROSS JOIN LATERAL pg_stat_file(bf.path) AS sf
WHERE sf.access <= now() - opt.older_than OR sf.access IS NULL
)
SELECT path, pg_size_pretty(size) size_pretty, access access_ts
FROM probably_orphaned_all
UNION SELECT '~TOTAL~', (SELECT pg_size_pretty(SUM(size)) FROM probably_orphaned_all), NULL
ORDER BY path;
@hyperion-cs
Copy link
Author

hyperion-cs commented Mar 6, 2025

❗ The script above is provided as is, i.e. the authors are not responsible for the accuracy of the output results and their possible use. As a rule, it should be called as a superuser, because the functions pg_ls_dir and pg_stat_file are used.

It may help if there was a DBMS failure (including unfavorable transaction rollback and/or DBMS crash)
=> in this case, so-called “probably orphaned files” may probably appear. These files are physically present on the disk, but they are no longer logically connected with the database (i.e., they are actually just garbage). You can detect them (and later, possibly, manually delete them) with the help of this script.

As you can see, the script only reads the data, i.e. it does not make any changes on its own.

Options

older_than - show only orphans that were accessed earlier or equal than the specified interval (default is 1 day ago or earlier).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment