Enable Online Backup in PostgreSQL
Cook-book style instructions for enabling an online backup for PostgreSQL (tested with PostgreSQL 8.4 & 9.0).

1. Put the scripts archive_wal.sh and full_db_backup.sh into /usr/local/bin

/usr/local/bin/archive_wal.sh
#! /bin/bash

WAL_PATH="$1"
WAL_FILE="$2"
BACKUP_DIR=/mnt/local0/db-backups/pg_xlog

test ! -f ${BACKUP_DIR}/${WAL_FILE} && /bin/cp ${WAL_PATH} ${BACKUP_DIR}/${WAL_FILE}
/usr/local/bin/full_db_backup.sh
#! /bin/bash
MAIL_LIST="mail@bsse.ethz.ch"
BOX=`uname -n`
MAILX="/bin/mail"
PG_DATA_DIR=/var/lib/pgsql/9.0/data
BACKUP_DIR=/net/bs-dsu-data/array0/dsu/bsse/db-backups/full
DATE=`/bin/date +%Y-%m-%d_%H%M`
BACKUP_PATH="${BACKUP_DIR}/${DATE}"

/usr/bin/psql -U postgres -c "SELECT pg_start_backup('${BACKUP_PATH}')"
/usr/bin/rsync -a --exclude "pg_xlog/*" ${PG_DATA_DIR} ${BACKUP_PATH}/
/usr/bin/psql -U postgres -c "SELECT pg_stop_backup()"

if [ $? -ne 0 ]; then
    echo -e "PostgreSQL DB backup broken ... :-(" | $MAILX -s "PostgreSQL backup from $BOX is B R O K E N !" $MAIL_LIST
else
    echo -e "PostgreSQL DB backup ran OK on $BOX :-)" | $MAILX -s "PostgreSQL Backup from $BOX ran OK" $MAIL_LIST
fi
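The scripts need to be executable, and full_db_backup.sh has to be run regularly. One way to do that (not part of the recipe above; the nightly 03:00 schedule is only an example) is a crontab entry for user postgres:

chmod 755 /usr/local/bin/archive_wal.sh /usr/local/bin/full_db_backup.sh

# example crontab entry for user postgres (edit with: crontab -e -u postgres)
# run the full backup every night at 03:00
0 3 * * * /usr/local/bin/full_db_backup.sh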
2. Create the directories given as $BACKUP_DIR in the two scripts and change their ownership to user postgres:
mkdir /mnt/local0/db-backups/full
chown postgres:postgres /mnt/local0/db-backups/full
mkdir /mnt/local0/db-backups/pg_xlog
chown postgres:postgres /mnt/local0/db-backups/pg_xlog
3. Enable WAL archiving in postgresql.conf:

[..]
checkpoint_timeout = 1h         # range 30s-1h
[..]
archive_mode = on               # allows archiving to be done
                                # (change requires restart)
archive_command = '/usr/local/bin/archive_wal.sh %p %f'
                                # command to use to archive a logfile segment
archive_timeout = 60            # force a logfile segment switch after this
                                # number of seconds
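To verify that archiving works once the server has been restarted with these settings, one simple check (not part of the original recipe) is to force a WAL segment switch and look for the archived file; pg_switch_xlog() is the function name in 8.4/9.0:

/usr/bin/psql -U postgres -c "SELECT pg_switch_xlog()"
ls -l /mnt/local0/db-backups/pg_xlog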
Restoring from online backup

1. Make sure that the old postgres process is stopped:
# /etc/init.d/postgresql stop
If you move the old data directory to a place on the same partition, be careful to check that there is enough space for two copies of the database on that partition.

3. Copy the latest full backup to pgsql/data:
# rsync -a /mnt/local0/db-backups/full/<last>/data /mnt/localssd/pgsql/
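PostgreSQL refuses to start if the data directory is not owned by postgres and not restricted to mode 0700, so after copying as root it may be necessary to fix ownership and permissions (a small sketch using the paths from above):

# chown -R postgres:postgres /mnt/localssd/pgsql/data
# chmod 700 /mnt/localssd/pgsql/data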
5. Create recovery.conf:
/mnt/localssd/pgsql/data/recovery.conf

restore_command = '/bin/cp /mnt/local0/db-backups/pg_xlog/%f "%p"'
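Once recovery.conf is in place, start the server again; it replays the archived WAL files via restore_command and renames recovery.conf to recovery.done when recovery has finished. A rough way to follow the progress (the log location below is an assumption and depends on your logging setup):

# /etc/init.d/postgresql start
# tail -f /mnt/localssd/pgsql/data/pg_log/postgresql-*.log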
Background information about using write-ahead logs

Quick Introduction to Checkpoint Timing

As you generate transactions, Postgres puts data into the write-ahead log (WAL). The WAL is organized into segments that are typically 16MB each. Periodically, after the system finishes a checkpoint, the WAL data up to a certain point is guaranteed to have been applied to the database. At that point the old WAL files aren't needed anymore and can be reused.

Checkpoints are generally caused by one of two things happening:

- checkpoint_segments worth of WAL files have been written
- more than checkpoint_timeout seconds have passed since the last checkpoint

The system doesn't stop working while the checkpoint is happening; it just keeps creating new WAL files. As long as each checkpoint finishes before the next one is required, things should be fine.

In the 8.2 model, processing the checkpoint occurs as fast as data can be written to disk. All of the dirty data is written out in one burst, then PostgreSQL asks the operating system to confirm the data has been written via the fsync call (see Tuning PostgreSQL WAL Synchronization for lots of details about what fsync does). 8.3 lets the checkpoint occur at a more leisurely pace.

source: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

pg_stat_bgwriter sample analysis

Here is an example from a busier server than the earlier example, courtesy of pgsql-general, and the advice they were given based on these statistics:

db=# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
               118 |             435 |            1925161 |        126291 |                7 |         1397373 |       2665693

You had 118 checkpoints that happened because of checkpoint_timeout passing. 435 of them happened before that; typically those are because checkpoint_segments was reached. This suggests you might improve your checkpoint situation by increasing checkpoint_segments, but that's not a bad ratio. Increasing that parameter and spacing checkpoints further apart helps give the checkpoint spreading logic of checkpoint_completion_target more room to work over, which reduces the average load from the checkpoint process.
During those checkpoints, 1,925,161 8K buffers were written out. That means on average a typical checkpoint is writing 3481 buffers out, which works out to be 27.2MB each. Pretty low, but that's an average; there could have been some checkpoints that wrote a lot more while others wrote nothing, and you'd need to sample this data regularly to figure that out.

The background writer cleaned 126,291 buffers (cleaned = wrote out dirty ones) during that time. 7 times, it wrote the maximum number it was allowed to before meeting its other goals. That's pretty low; if it were higher, it would be obvious you could gain some improvement by increasing bgwriter_lru_maxpages.

Since the last reset, 2,665,693 8K buffers were allocated to hold database pages. Out of those allocations, 1,397,373 times a database backend (probably the client itself) had to write a page in order to make space for the new allocation. That's not awful, but it's not great. You might try to get a higher percentage written by the background writer in advance of when the backend needs them by increasing bgwriter_lru_maxpages and bgwriter_lru_multiplier, and by decreasing bgwriter_delay; making the changes in that order is the most effective strategy.

source: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
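The derived figures from this analysis (average buffers and MB written per checkpoint, and the share of allocations where a backend had to write a page itself) can be computed directly from pg_stat_bgwriter. A minimal sketch, assuming the default 8K block size and at least one completed checkpoint:

/usr/bin/psql -U postgres <<'EOF'
SELECT checkpoints_timed + checkpoints_req                        AS checkpoints,
       buffers_checkpoint / (checkpoints_timed + checkpoints_req) AS avg_buffers_per_checkpoint,
       round(buffers_checkpoint * 8.0 / 1024
             / (checkpoints_timed + checkpoints_req), 1)          AS avg_mb_per_checkpoint,
       round(100.0 * buffers_backend / buffers_alloc, 1)          AS backend_write_pct
FROM pg_stat_bgwriter;
EOF

For the sample numbers above this gives 553 checkpoints, roughly 3481 buffers (about 27.2MB) per checkpoint, and about 52% of allocations where the backend had to write the page itself.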