add keep_last property for housekeeping duties, remove clean_up option · homeylab/bookstack-file-exporter@bd871a4 · GitHub

Commit bd871a4

add keep_last property for housekeeping duties, remove clean_up option
1 parent e71cf35 commit bd871a4

File tree

14 files changed

+188
-23
lines changed


.pylintrc

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -430,7 +430,7 @@ disable=raw-checker-failed,
430430
useless-suppression,
431431
deprecated-pragma,
432432
use-symbolic-message-instead,
433-
missing-module-docstring
433+
missing-module-docstring # added for lint warnings that don't seem necessary
434434

435435

436436
# Enable the message, report, category or checker with the given id(s). You can

Makefile

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ BASE_IMAGE_TAG=3.11-slim-bookworm
44
IMAGE_NAME=homeylab/bookstack-file-exporter
55
# keep this start sequence unique (IMAGE_TAG=)
66
# github actions will use this to create a tag
7-
IMAGE_TAG=0.0.1
7+
IMAGE_TAG=0.0.2
88
DOCKER_WORK_DIR=/export
99
DOCKER_CONFIG_DIR=/export/config
1010
DOCKER_EXPORT_DIR=/export/dump

README.md

Lines changed: 19 additions & 5 deletions
@@ -29,6 +29,8 @@ Supported backup formats are shown [here](https://demo.bookstackapp.com/api/docs
2929

3030
Backups are exported in `.tgz` format and generated based off timestamp. Export names will be in the format: `%Y-%m-%d_%H-%M-%S` (Year-Month-Day_Hour-Minute-Second). *Files are first pulled locally to create the tarball and then can be sent to object storage if needed*. Example file name: `bookstack_export_2023-09-22_07-19-54.tgz`.
3131

32+
The exporter can also do housekeeping duties and keep a configured number of archives and delete older ones. See `keep_last` property in the `Configuration` section. Object storage provider configurations include their own `keep_last` property for flexibility.
33+
3234
## Using This Application
3335
Ensure a valid configuration is provided when running this application. See `Configuration` section for more details.
3436

@@ -102,6 +104,9 @@ Env variables for credentials will take precedence over configuration file optio
102104
### Configuration
103105
See below for an example and explanation. Optionally, look at `examples/` folder of the github repo for more examples.
104106

107+
For object storage configuration, find more information in the respective sections:
108+
- [Minio](https://github.com/homeylab/bookstack-file-exporter#minio-backups)
109+
105110
Schema and values are checked so ensure proper settings are provided. As mentioned, credentials can be specified as environment variables instead if preferred.
106111
```
107112
# if http/https not specified, defaults to https
@@ -140,6 +145,7 @@ minio_config:
140145
region: "us-east-1"
141146
bucket: "mybucket"
142147
path: "bookstack/file_backups"
148+
keep_last:
143149
144150
# output directory for the exported archive
145151
# relative or full path
@@ -153,11 +159,13 @@ output_path: "bkps/"
153159
# omit this or set to false if not needed
154160
export_meta: true
155161
156-
# optional if using object storage targets
157-
# After uploading to object storage targets, choose to clean up local files
158-
# delete the archive from local filesystem
159-
# will not be cleaned up if set to false or omitted
160-
clean_up: true
162+
# optional; if specified, the exporter can delete older archives
163+
# valid values are:
164+
# set to -1 if you want to delete all archives after each run
165+
# - this is useful if you only want to upload to object storage
166+
# set to 1+ if you want to retain a certain number of archives
167+
# set to 0 or comment out section if you want no action done
168+
keep_last: 5
161169
```
162170

163171
### Backup Behavior
@@ -230,6 +238,12 @@ secret_key: ""
230238
# optional, will use root bucket path if not set
231239
# in example below, the exported archive will appear in: `<bucket_name>:/bookstack/backups/bookstack-<timestamp>.tgz`
232240
path: "bookstack/file_backups"
241+
242+
# optional; if specified, the exporter can delete older archives
243+
# valid values are:
244+
# set to 1+ if you want to retain a certain number of archives
245+
# set to 0 or comment out section if you want no action done
246+
keep_last: 5
233247
```
234248

235249
As mentioned you can optionally set access and secret key as env variables. If both are specified, env variable will take precedence.
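
Reviewer note: the `keep_last` values documented in the README hunks above can be read as a small decision rule for the local archives. The sketch below is only an illustration of that rule; `local_deletions` is a hypothetical helper name that does not appear in the commit.

```python
from typing import Optional


def local_deletions(keep_last: Optional[int], archive_count: int) -> int:
    """Return how many of the oldest local archives would be removed.

    Hypothetical helper that mirrors the README comments for the
    top-level `keep_last` option; it is not part of the exporter.
    """
    # omitted (None) or 0: no housekeeping at all
    if not keep_last:
        return 0
    # negative value (the README suggests -1): remove every local archive
    if keep_last < 0:
        return archive_count
    # 1 or more: retain that many archives, remove the surplus if any
    return max(archive_count - keep_last, 0)


if __name__ == "__main__":
    for value in (None, 0, -1, 5):
        print(value, local_deletions(value, archive_count=7))
```

Per the second README hunk, the MinIO `keep_last` only documents the `1+` and `0` cases; the diff of `minio_archiver.py` ignores negative values so that at least one remote copy remains.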

bookstack_file_exporter/archiver/archiver.py

Lines changed: 55 additions & 6 deletions
@@ -1,6 +1,7 @@
11
from typing import List, Dict, Union
22
from datetime import datetime
33
import logging
4+
import os
45

56
from bookstack_file_exporter.exporter.node import Node
67
from bookstack_file_exporter.archiver import util
@@ -27,6 +28,8 @@
2728

2829
_DATE_STR_FORMAT = "%Y-%m-%d_%H-%M-%S"
2930

31+
# pylint: disable=too-many-instance-attributes
32+
3033
class Archiver:
3134
"""
3235
Archiver pulls all the necessary files from upstream
@@ -97,19 +100,65 @@ def _gzip_tar(self):
97100
def _archive_minio(self, config: StorageProviderConfig):
98101
minio_archiver = MinioArchiver(config)
99102
minio_archiver.upload_backup(self._archive_file)
103+
minio_archiver.clean_up(config.keep_last, _FILE_EXTENSION_MAP['tgz'])
100104

101105
def _archive_s3(self, config: StorageProviderConfig):
102106
pass
103107

104-
def clean_up(self, clean_up_archive: Union[bool, None]):
108+
def clean_up(self, keep_last: Union[int, None]):
105109
"""remove archive after sending to remote target"""
106-
self._clean(clean_up_archive)
107-
108-
def _clean(self, clean_up_archive: Union[bool, None]):
110+
# this captures keep_last = 0
111+
if not keep_last:
112+
return
113+
to_delete = self._get_stale_archives(keep_last)
114+
if to_delete:
115+
self._delete_files(to_delete)
116+
117+
def _get_stale_archives(self, keep_last: int) -> List[str]:
109118
# if user is uploading to object storage
110119
# delete the local .tgz archive since we have it there already
111-
if clean_up_archive:
112-
util.remove_file(self._archive_file)
120+
archive_list: List[str] = util.scan_archives(self.base_dir, _FILE_EXTENSION_MAP['tgz'])
121+
if not archive_list:
122+
log.debug("No archive files found to clean up")
123+
return []
124+
# if negative number, we remove all local archives
125+
# assume user is using remote storage and will upload there
126+
if keep_last < 0:
127+
log.debug("Local archive files will be deleted, keep_last: -1")
128+
return archive_list
129+
# keep_last > 0 condition
130+
to_delete = []
131+
if len(archive_list) > keep_last:
132+
log.debug("Number of archives is greater than 'keep_last'")
133+
log.debug("Running clean up of local archives")
134+
to_delete = self._filter_archives(keep_last, archive_list)
135+
return to_delete
136+
137+
def _filter_archives(self, keep_last: int, file_list: List[str]) -> List[str]:
138+
"""get older archives based on keep number"""
139+
file_dict = {}
140+
for file in file_list:
141+
file_dict[file] = os.stat(file).st_ctime
142+
# order dict by creation time
143+
# ascending order
144+
ordered_dict = dict(sorted(file_dict.items(), key=lambda item: item[1]))
145+
# ordered_dict = {k: v for k, v in sorted(file_dict.items(),
146+
# key=lambda item: item[1])}
147+
148+
files_to_clean = []
149+
# how many items we will have to delete to fulfill keep_last
150+
to_delete = len(ordered_dict) - keep_last
151+
for key in ordered_dict:
152+
files_to_clean.append(key)
153+
to_delete -= 1
154+
if to_delete <= 0:
155+
break
156+
log.debug("%d local archives will be cleaned up", len(files_to_clean))
157+
return files_to_clean
158+
159+
def _delete_files(self, file_list: List[str]):
160+
for file in file_list:
161+
util.remove_file(file)
113162

114163
# convert page data to bytes
115164
def _get_data_format(self, page_node_id: int, export_format: str) -> bytes:
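
Reviewer note: the selection done by `_filter_archives` (sort ascending by timestamp, then collect the first `len - keep_last` entries) can also be expressed as a sort followed by a slice. The following is a minimal sketch under that reading; `oldest_beyond_limit` and its dictionary input are hypothetical and stand in for the `os.stat` lookups in the diff.

```python
from typing import Dict, List


def oldest_beyond_limit(timestamps: Dict[str, float], keep_last: int) -> List[str]:
    """Return the oldest file names that exceed `keep_last`, oldest first.

    `timestamps` maps a file name to a numeric timestamp (hypothetical
    input; the commit reads `os.stat(file).st_ctime` instead).
    """
    # nothing to delete when the limit is not positive or not exceeded
    if keep_last <= 0 or len(timestamps) <= keep_last:
        return []
    # ascending order: oldest entries come first
    ordered = sorted(timestamps, key=timestamps.get)
    # everything before the newest `keep_last` entries is surplus
    return ordered[:len(ordered) - keep_last]


if __name__ == "__main__":
    sample = {"a.tgz": 3.0, "b.tgz": 1.0, "c.tgz": 2.0}
    print(oldest_beyond_limit(sample, keep_last=1))
```

One point that may be worth checking: on Unix, `st_ctime` is the time of the last metadata change rather than the creation time, so `st_mtime` or the timestamp embedded in the archive name could be a more stable sort key.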

bookstack_file_exporter/archiver/minio_archiver.py

Lines changed: 68 additions & 1 deletion
@@ -1,10 +1,15 @@
1-
from typing import Union
1+
from typing import Union, List
22
import logging
33

4+
# pylint: disable=import-error
45
from minio import Minio
6+
# pylint: disable=import-error
7+
from minio.datatypes import Object as MinioObject
58

69
from bookstack_file_exporter.config_helper.remote import StorageProviderConfig
710

11+
12+
813
log = logging.getLogger(__name__)
914

1015
class MinioArchiver:
@@ -54,3 +59,65 @@ def upload_backup(self, local_file_path: str):
5459
result = self._client.fput_object(self.bucket, object_path, local_file_path)
5560
log.info("""Created object: %s with tag: %s and version-id: %s""",
5661
result.object_name, result.etag, result.version_id)
62+
63+
def clean_up(self, keep_last: Union[int, None], file_extension: str):
64+
"""delete objects based on 'keep_last' number"""
65+
# this captures keep_last = 0
66+
if not keep_last:
67+
return
68+
to_delete = self._get_stale_objects(keep_last, file_extension)
69+
if to_delete:
70+
self._delete_objects(to_delete)
71+
72+
def _scan_objects(self, file_extension: str) -> List[MinioObject]:
73+
filter_str = "bookstack_export_"
74+
# prefix should end in '/' for minio
75+
# ref: https://min.io/docs/minio/linux/developers/python/API.html#list_objects
76+
path_prefix = self.path + "/"
77+
# get all objects in archive path/directory
78+
full_list: List[MinioObject] = self._client.list_objects(self.bucket, prefix=path_prefix)
79+
# validate and filter out non managed objects
80+
if full_list:
81+
return [object for object in full_list
82+
if object.object_name.endswith(file_extension)
83+
and filter_str in object.object_name]
84+
return []
85+
86+
def _get_stale_objects(self, keep_last: int, file_extension: str) -> List[MinioObject]:
87+
minio_objects = self._scan_objects(file_extension)
88+
if not minio_objects:
89+
log.debug("No minio objects found to clean up")
90+
return []
91+
if keep_last < 0:
92+
# we want to keep one copy at least
93+
# last copy that remains if local is deleted
94+
log.debug("Minio 'keep_last' set to negative number, ignoring")
95+
return []
96+
# keep_last > 0 condition
97+
to_delete = []
98+
if len(minio_objects) > keep_last:
99+
log.debug("Number of minio objects is greater than 'keep_last'")
100+
log.debug("Running clean up of minio objects")
101+
to_delete = self._filter_objects(keep_last, minio_objects)
102+
return to_delete
103+
104+
def _filter_objects(self, keep_last: int,
105+
minio_objects: List[MinioObject]) -> List[MinioObject]:
106+
# sort by minio datetime 'last_modified' time
107+
# ascending order
108+
sorted_objects = sorted(minio_objects, key=lambda d: d.last_modified)
109+
objects_to_clean = []
110+
# how many items we will have to delete to fulfill 'keep_last'
111+
to_delete = len(sorted_objects) - keep_last
112+
# collect objects to delete
113+
for item in sorted_objects:
114+
objects_to_clean.append(item)
115+
to_delete -= 1
116+
if to_delete <= 0:
117+
break
118+
log.debug("%d minio objects will be cleaned up", len(objects_to_clean))
119+
return objects_to_clean
120+
121+
def _delete_objects(self, minio_objects: List[MinioObject]):
122+
for item in minio_objects:
123+
self._client.remove_object(self.bucket, item.object_name)
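
Reviewer note: the remote clean-up combines two steps, filtering to managed objects and picking the oldest surplus. The sketch below reproduces that logic without a MinIO connection. `FakeObject` is a hypothetical stand-in exposing only the two attributes the diff reads (`object_name`, `last_modified`), and the `.tgz` default is an assumption about the value of `_FILE_EXTENSION_MAP['tgz']`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass
class FakeObject:
    """Hypothetical stand-in for the object type returned by the client."""
    object_name: str
    last_modified: datetime


def stale_objects(objects: Iterable[FakeObject], keep_last: int,
                  extension: str = ".tgz",  # assumed extension value
                  marker: str = "bookstack_export_") -> List[FakeObject]:
    """Return the oldest managed objects beyond `keep_last`."""
    # zero means no action; negative values are ignored for remote storage
    if keep_last <= 0:
        return []
    # keep only objects that look like exporter archives
    managed = [obj for obj in objects
               if obj.object_name.endswith(extension)
               and marker in obj.object_name]
    # ascending by modification time: oldest first
    managed.sort(key=lambda obj: obj.last_modified)
    excess = len(managed) - keep_last
    return managed[:excess] if excess > 0 else []


if __name__ == "__main__":
    demo = [
        FakeObject("backups/bookstack_export_a.tgz",
                   datetime(2023, 9, 21, tzinfo=timezone.utc)),
        FakeObject("backups/bookstack_export_b.tgz",
                   datetime(2023, 9, 20, tzinfo=timezone.utc)),
    ]
    print([obj.object_name for obj in stale_objects(iter(demo), 1)])
```

The sketch accepts any iterable on purpose. If `list_objects` returns a lazy iterator, as the MinIO Python documentation appears to describe, the `if full_list:` guard in `_scan_objects` would always be true; the list comprehension already handles the empty case, so the guard may be redundant rather than harmful.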

bookstack_file_exporter/archiver/util.py

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,7 @@
66
import shutil
77
from io import BytesIO
88
import gzip
9+
import glob
910

1011
from bookstack_file_exporter.common import util
1112

@@ -41,3 +42,8 @@ def create_gzip(tar_file: str, gzip_file: str, remove_old: bool = True):
4142
shutil.copyfileobj(f_in, f_out)
4243
if remove_old:
4344
remove_file(tar_file)
45+
46+
def scan_archives(base_dir: str, extension: str) -> list[str]:
47+
"""scan export directory for archives"""
48+
file_pattern = f"{base_dir}_*{extension}"
49+
return glob.glob(file_pattern)
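
Reviewer note: the pattern `f"{base_dir}_*{extension}"` only matches as intended if `base_dir` is a path prefix rather than a directory with a trailing slash. The sketch below demonstrates this in a temporary directory, assuming the prefix ends in `bookstack_export` (taken from the README's example file name) and that the extension is `.tgz`; both are assumptions, since neither value is visible in this diff.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # assumed prefix: output directory joined with "bookstack_export"
    base_dir = os.path.join(tmp, "bookstack_export")
    for name in ("bookstack_export_2023-09-22_07-19-54.tgz",
                 "bookstack_export_2023-09-23_07-19-54.tgz",
                 "notes.txt"):
        # create empty placeholder files
        open(os.path.join(tmp, name), "w").close()
    # same pattern shape as scan_archives, with the assumed extension
    pattern = f"{base_dir}_*.tgz"
    # glob returns a list of paths; sort because glob order is arbitrary
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(matches)
```

Since `glob.glob` returns a list of paths, the function's return annotation would more accurately describe a list of strings than a single string.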

bookstack_file_exporter/common/util.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
import logging
22
from typing import Tuple, Dict
3+
# pylint: disable=import-error
34
import requests
5+
# pylint: disable=import-error
46
from requests.adapters import HTTPAdapter, Retry
57

68
log = logging.getLogger(__name__)

bookstack_file_exporter/config_helper/config_helper.py

Lines changed: 5 additions & 3 deletions
@@ -2,15 +2,14 @@
22
import argparse
33
from typing import Dict, Tuple
44
import logging
5-
5+
# pylint: disable=import-error
66
import yaml
77

88
from bookstack_file_exporter.config_helper import models
99
from bookstack_file_exporter.config_helper.remote import StorageProviderConfig
1010

1111
log = logging.getLogger(__name__)
1212

13-
1413
_DEFAULT_HEADERS = {
1514
'Content-Type': 'application/json; charset=utf-8'
1615
}
@@ -31,6 +30,8 @@
3130
_MINIO_ACCESS_KEY_FIELD='MINIO_ACCESS_KEY'
3231
_MINIO_SECRET_KEY_FIELD='MINIO_SECRET_KEY'
3332

33+
# pylint: disable=too-many-instance-attributes
34+
3435
## Normalize config from cli or from config file
3536
class ConfigNode:
3637
"""
@@ -102,7 +103,8 @@ def _generate_remote_config(self) -> Dict[str, StorageProviderConfig]:
102103
minio_secret_key, self.user_inputs.minio_config.bucket,
103104
host=self.user_inputs.minio_config.host,
104105
path=self.user_inputs.minio_config.path,
105-
region=self.user_inputs.minio_config.region)
106+
region=self.user_inputs.minio_config.region,
107+
keep_last=self.user_inputs.minio_config.keep_last)
106108
return object_config
107109

108110
def _generate_headers(self) -> Dict[str, str]:
Lines changed: 6 additions & 3 deletions
@@ -1,8 +1,8 @@
11
from typing import Dict, Literal, List, Optional
2+
# pylint: disable=import-error
23
from pydantic import BaseModel
34

4-
# pylint: disable=R0903
5-
5+
# pylint: disable=too-few-public-methods
66
class MinioConfig(BaseModel):
77
"""YAML schema for minio configuration"""
88
host: str
@@ -11,12 +11,15 @@ class MinioConfig(BaseModel):
1111
bucket: str
1212
path: Optional[str] = None
1313
region: str
14+
keep_last: Optional[int] = None
1415

16+
# pylint: disable=too-few-public-methods
1517
class BookstackAccess(BaseModel):
1618
"""YAML schema for bookstack access credentials"""
1719
token_id: str
1820
token_secret: str
1921

22+
# pylint: disable=too-few-public-methods
2023
class UserInput(BaseModel):
2124
"""YAML schema for user provided configuration file"""
2225
host: str
@@ -26,4 +29,4 @@ class UserInput(BaseModel):
2629
output_path: Optional[str] = None
2730
export_meta: Optional[bool] = None
2831
minio_config: Optional[MinioConfig] = None
29-
clean_up: Optional[bool] = None
32+
keep_last: Optional[int] = None

bookstack_file_exporter/config_helper/remote.py

Lines changed: 2 additions & 1 deletion
@@ -20,10 +20,11 @@ class StorageProviderConfig:
2020
"""
2121
def __init__(self, access_key: str, secret_key: str, bucket: str,
2222
host: Union[str, None]=None, path: Union[str, None]=None,
23-
region: Union[str, None]=None):
23+
region: Union[str, None]=None, keep_last: Union[int, None] = None):
2424
self.host = host
2525
self.access_key = access_key
2626
self.secret_key = secret_key
2727
self.bucket = bucket
2828
self.path = path
2929
self.region = region
30+
self.keep_last = keep_last
