gh-134004: Dbm vacuuming by Andrea-Oliveri · Pull Request #134028 · python/cpython · GitHub

gh-134004: Dbm vacuuming #134028


Merged
merged 21 commits into from
Jun 1, 2025
21 commits
7505713
Added tests for vacuuming functionality of dbm
Andrea-Oliveri May 14, 2025
1147774
Added vacuuming logic to dbm.sqlite
Andrea-Oliveri May 14, 2025
109a378
Added vacuuming logic to dbm.dumb
Andrea-Oliveri May 14, 2025
cdacb53
Updated documentation of dbm
Andrea-Oliveri May 14, 2025
02a7b8a
Adapted vacuum tests to allow for submodules missing method
Andrea-Oliveri May 14, 2025
dcb43a2
Pushing news and acks entries
Andrea-Oliveri May 15, 2025
89fb2db
Changed News entry to avoid failure during Doc testing due to referen…
Andrea-Oliveri May 15, 2025
476dc55
Changed method names from .vacuum to .reorganize in dbm.sqlite and db…
Andrea-Oliveri May 15, 2025
19c0c8d
Added .reorganize() method in shelve to expose dbm submodule's own .r…
Andrea-Oliveri May 15, 2025
88b4014
Added documentation for shelve.reorganize
Andrea-Oliveri May 15, 2025
5c1d45f
Fixed link in doc
Andrea-Oliveri May 15, 2025
992e7aa
Updated news
Andrea-Oliveri May 15, 2025
b96480b
PR review: removed unnecessary .keys()
Andrea-Oliveri May 15, 2025
4c23b64
Updated documentation to correct notes indentation
Andrea-Oliveri May 17, 2025
8a80977
Left previously removed comment as requested in PR
Andrea-Oliveri May 17, 2025
166a553
Modified documentation of dbm.dumb warning to align with shelve warning
Andrea-Oliveri May 17, 2025
6f34de5
Skipping test instead of succeeding if method not implemented for sub…
Andrea-Oliveri May 28, 2025
059ad82
Converted redundant f-string to regular string
Andrea-Oliveri May 28, 2025
3e7049f
Added versionadded to method documentations
Andrea-Oliveri May 28, 2025
2f5af38
Added whatsnew entries
Andrea-Oliveri May 28, 2025
e2370ac
Merged changes from branch main
Andrea-Oliveri May 28, 2025
Changed method names from .vacuum to .reorganize in dbm.sqlite and dbm.dumb for consistency with dbm.gnu. Also updated documentations and tests to reflect the change
Andrea-Oliveri committed May 15, 2025
commit 476dc55d3a05afb0fb199ea392f4650d2acec8c6
13 changes: 6 additions & 7 deletions Doc/library/dbm.rst
@@ -21,9 +21,8 @@ the Oracle Berkeley DB.

 .. note::
    None of the underlying modules will automatically shrink the disk space used by
-   the database file. However, :mod:`dbm.sqlite3` and :mod:`dbm.dumb` provide
-   a :meth:`!vacuum` method that can be used for this purpose. :mod:`dbm.gnu` can
-   do the same with its :meth:`!reorganize`, called like this for retro-compatibility.
+   the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
+   provide a :meth:`!reorganize` method that can be used for this purpose.


 .. exception:: error
@@ -193,14 +192,14 @@ or any other SQLite browser, including the SQLite CLI.
       The Unix file access mode of the file (default: octal ``0o666``),
       used only when the database has to be created.

-   .. method:: sqlite3.vacuum()
+   .. method:: sqlite3.reorganize()

       If you have carried out a lot of deletions and would like to shrink the space
-      used on disk, this method will reorganize the database; therwise, deleted file
+      used on disk, this method will reorganize the database; otherwise, deleted file
       space will be kept and reused as new (key, value) pairs are added.

       .. note::
-         During vacuuming, as much as twice the size of the original database is required
+         While reorganizing, as much as twice the size of the original database is required
          in free disk space.


@@ -481,7 +480,7 @@ The :mod:`!dbm.dumb` module defines the following:

       Close the database.

-   .. method:: dumbdbm.vacuum()
+   .. method:: dumbdbm.reorganize()

       If you have carried out a lot of deletions and would like to shrink the space
       used on disk, this method will reorganize the database; otherwise, deleted file
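As a rough illustration of the documented behaviour (not part of this PR), the sketch below deletes most entries from a database and then calls `reorganize()` only if the opened backend provides it. The helper name `shrink_if_possible` and the file name are hypothetical. On Python versions that predate this change, `dbm.dumb` has no such method and the call is simply skipped.

```python
import dbm.dumb
import os
import tempfile


def shrink_if_possible(db):
    """Call db.reorganize() when the backend offers it; report whether it ran."""
    reorganize = getattr(db, "reorganize", None)
    if reorganize is None:
        return False
    reorganize()
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_db")  # hypothetical file name

    with dbm.dumb.open(path, "c") as db:
        for i in range(100):
            db[f"key{i}".encode("ascii")] = b"x" * 1024
        # Delete most entries, leaving unused space in the data file.
        for i in range(90):
            del db[f"key{i}".encode("ascii")]
        ran = shrink_if_possible(db)

    # Whether or not the file was shrunk, the surviving entries are unchanged.
    with dbm.dumb.open(path, "r") as db:
        remaining = sorted(db.keys())
```

The `getattr` guard mirrors the `hasattr` checks in the tests of this commit, so the same code runs against backends with and without the method.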
14 changes: 7 additions & 7 deletions Lib/dbm/dumb.py
@@ -284,30 +284,30 @@ def __enter__(self):
     def __exit__(self, *args):
         self.close()

-    def vacuum(self):
+    def reorganize(self):
         if self._readonly:
             raise error('The database is opened for reading only')
         self._verify_open()
-        # Ensure all changes are committed before vacuuming.
+        # Ensure all changes are committed before reorganizing.
         self._commit()
         # Open file in r+ to allow changing in-place.
         with _io.open(self._datfile, 'rb+') as f:
-            vacuum_pos = 0
+            reorganize_pos = 0

             # Iterate over existing keys, sorted by starting byte.
             for key in sorted(self._index.keys(), key = lambda k: self._index[k][0]):
                 pos, siz = self._index[key]
                 f.seek(pos)
                 val = f.read(siz)

-                f.seek(vacuum_pos)
+                f.seek(reorganize_pos)
                 f.write(val)
-                self._index[key] = (vacuum_pos, siz)
+                self._index[key] = (reorganize_pos, siz)

                 blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
-                vacuum_pos += blocks_occupied * _BLOCKSIZE
+                reorganize_pos += blocks_occupied * _BLOCKSIZE

-            f.truncate(vacuum_pos)
+            f.truncate(reorganize_pos)
             # Commit changes to index, which were not in-place.
             self._commit()

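The in-place compaction above can be pictured with a self-contained model. The sketch below is not the module's code: it assumes an in-memory `io.BytesIO` in place of the `.dat` file, a plain dict in place of `self._index`, and a hypothetical `BLOCKSIZE` of 8 bytes chosen to keep the offsets small.

```python
import io

BLOCKSIZE = 8  # hypothetical; small so the offsets are easy to follow


def compact(f, index):
    """Slide live records toward the start of f and truncate the tail.

    index maps key -> (pos, siz) and is updated in place.
    """
    write_pos = 0
    # Visit records in file order so that a record is never overwritten
    # before it has been read.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, siz = index[key]
        f.seek(pos)
        val = f.read(siz)

        f.seek(write_pos)
        f.write(val)
        index[key] = (write_pos, siz)

        # Each record occupies a whole number of blocks.
        blocks_occupied = (siz + BLOCKSIZE - 1) // BLOCKSIZE
        write_pos += blocks_occupied * BLOCKSIZE

    f.truncate(write_pos)


# Three block-aligned records; the middle one is no longer in the index,
# as if its key had been deleted.
data = io.BytesIO(
    b"aaaa".ljust(8, b"\0") + b"bbbbbbbbbb".ljust(16, b"\0") + b"cc".ljust(8, b"\0")
)
index = {b"a": (0, 4), b"c": (24, 2)}

compact(data, index)
# index is now {b"a": (0, 4), b"c": (8, 2)} and the buffer is 16 bytes long.
```

Note that the padding after a moved record may still hold stale bytes from the record it replaced; only the `(pos, siz)` pairs in the index decide what counts as live data.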
6 changes: 3 additions & 3 deletions Lib/dbm/sqlite3.py
@@ -15,7 +15,7 @@
 STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
 DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
 ITER_KEYS = "SELECT key FROM Dict"
-VACUUM = "VACUUM"
+REORGANIZE = "VACUUM"


 class error(OSError):
@@ -123,8 +123,8 @@ def __enter__(self):
     def __exit__(self, *args):
         self.close()

-    def vacuum(self):
-        self._execute(VACUUM)
+    def reorganize(self):
+        self._execute(REORGANIZE)


 def open(filename, /, flag="r", mode=0o666):
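Since `reorganize()` here only executes SQLite's `VACUUM` statement, its effect can be observed with the `sqlite3` module alone. The sketch below assumes a throwaway table that mimics the `Dict` table named in the queries above; the file name, row count and value size are arbitrary choices for illustration, and the size behaviour assumes SQLite's default settings (no auto-vacuum).

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sqlite")  # hypothetical file name

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE Dict (key BLOB UNIQUE NOT NULL, value BLOB NOT NULL)")
    conn.executemany(
        "INSERT INTO Dict (key, value) VALUES (?, ?)",
        [(str(i).encode("ascii"), b"x" * 10_000) for i in range(200)],
    )
    conn.commit()

    # Delete half of the rows; the freed pages stay inside the file.
    conn.executemany(
        "DELETE FROM Dict WHERE key = ?",
        [(str(i).encode("ascii"),) for i in range(100)],
    )
    conn.commit()
    size_before = os.path.getsize(path)

    # VACUUM cannot run inside an open transaction, hence the commit above.
    conn.execute("VACUUM")
    remaining = conn.execute("SELECT COUNT(*) FROM Dict").fetchone()[0]
    conn.close()
    size_after = os.path.getsize(path)
```

Comparing `size_before` with `size_after` shows the file shrinking while the remaining rows are untouched.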
27 changes: 13 additions & 14 deletions Lib/test/test_dbm.py
@@ -135,31 +135,31 @@ def test_anydbm_access(self):
         assert(f[key] == b"Python:")
         f.close()

-    def test_anydbm_readonly_vacuum(self):
+    def test_anydbm_readonly_reorganize(self):
         self.init_db()
         with dbm.open(_fname, 'r') as d:
             # Early stopping.
-            if not hasattr(d, 'vacuum'):
+            if not hasattr(d, 'reorganize'):
                 return

-            self.assertRaises(dbm.error, lambda: d.vacuum())
+            self.assertRaises(dbm.error, lambda: d.reorganize())

-    def test_anydbm_vacuum_not_changed_content(self):
+    def test_anydbm_reorganize_not_changed_content(self):
         self.init_db()
         with dbm.open(_fname, 'c') as d:
             # Early stopping.
-            if not hasattr(d, 'vacuum'):
+            if not hasattr(d, 'reorganize'):
                 return

             keys_before = sorted(d.keys())
             values_before = [d[k] for k in keys_before]
-            d.vacuum()
+            d.reorganize()
             keys_after = sorted(d.keys())
             values_after = [d[k] for k in keys_before]
             self.assertEqual(keys_before, keys_after)
             self.assertEqual(values_before, values_after)

-    def test_anydbm_vacuum_decreased_size(self):
+    def test_anydbm_reorganize_decreased_size(self):

         def _calculate_db_size(db_path):
             if os.path.isfile(db_path):
@@ -171,31 +171,30 @@ def _calculate_db_size(db_path):
                     total_size += os.path.getsize(file_path)
             return total_size

-        # This test requires relatively large databases to reliably show difference in size before and after vacuum.
+        # This test requires relatively large databases to reliably show difference in size before and after reorganizing.
         with dbm.open(_fname, 'n') as f:
             # Early stopping.
-            if not hasattr(f, 'vacuum'):
+            if not hasattr(f, 'reorganize'):
                 return

             for k in self._dict:
                 f[k.encode('ascii')] = self._dict[k] * 100000
             db_keys = list(f.keys())

         # Make sure to calculate size of database only after file is closed to ensure file content are flushed to disk.
-        size_before = _calculate_db_size(_fname)
+        size_before = _calculate_db_size(os.path.dirname(_fname))

         # Delete some elements from the start of the database.
         keys_to_delete = db_keys[:len(db_keys) // 2]
         with dbm.open(_fname, 'c') as f:
             for k in keys_to_delete:
                 del f[k]
-            f.vacuum()
+            f.reorganize()

         # Make sure to calculate size of database only after file is closed to ensure file content are flushed to disk.
-        size_after = _calculate_db_size(_fname)
+        size_after = _calculate_db_size(os.path.dirname(_fname))

-        # Less or equal because not all submodules support vacuuming.
-        self.assertLessEqual(size_after, size_before)
+        self.assertLess(size_after, size_before)

     def test_open_with_bytes(self):
         dbm.open(os.fsencode(_fname), "c").close()
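The commit does not say why the test now measures `os.path.dirname(_fname)`. One plausible reason is that some backends store their data next to the given name rather than at it (`dbm.dumb`, for instance, writes `.dat` and `.dir` files), so sizing the bare name can yield zero and make a strict `assertLess` impossible to satisfy. The helper below is a standalone variant of `_calculate_db_size`; the name `directory_size` and the file names are hypothetical.

```python
import os
import tempfile


def directory_size(path):
    """Return the size of a file, or the total size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total_size = 0
    for root, _, filenames in os.walk(path):
        for filename in filenames:
            total_size += os.path.getsize(os.path.join(root, filename))
    return total_size


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "example_db")  # hypothetical database name
    # Mimic a backend that stores its data next to, not at, the given name.
    for suffix, payload in ((".dat", b"x" * 4096), (".dir", b"y" * 128)):
        with open(base + suffix, "wb") as f:
            f.write(payload)

    size_of_name = directory_size(base)                  # the bare name does not exist
    size_of_dir = directory_size(os.path.dirname(base))  # counts both files
```

Here `size_of_name` is 0 because `os.walk` yields nothing for a missing path, while `size_of_dir` is the combined 4224 bytes of both files.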
@@ -1,2 +1,2 @@
-:mod:`!dbm.dumb` and :mod:`!dbm.sqlite` now have :meth:`!vacuum` methods to
+:mod:`!dbm.dumb` and :mod:`!dbm.sqlite` now have :meth:`!reorganize` methods to
 recover unused free space previously occupied by deleted entries.