Suppose you have a time series database containing terabytes of data. How do you manage backups for this data? Do you think it is too big to back up, and blindly rely on database replication for data safety? Then you are in trouble.
Why doesn’t replication save you from disaster?
Replication is the process of creating multiple copies of the same data on distinct hardware resources and keeping these copies in a consistent state. Replication protects against hardware failures: if a node or disk goes out of service, your data shouldn’t be lost or corrupted, since at least one copy of the data should remain. Are we safe? No:
- What if you accidentally remove the data during a database migration? It is removed from all the replicas. It is lost. There is no way to recover it.
- What if you make a mistake during a routine database cluster upgrade or reconfiguration and this leads to data loss? Replication doesn’t help in this case either.
How do you protect against these issues? Use plain old backups.
Plain old backups
S3 and GCS are the most promising storage options for backups. They are inexpensive, reliable and durable. But they have certain limitations:
- There are limits on the size of files that can be uploaded to object storage: 5TB maximum object size and 5GB maximum part size per upload. What if you need to back up files exceeding these limits?
- Network bandwidth is limited, so full backups can take a few days to complete. For instance, transferring 10TB of data over a gigabit network takes more than 27 hours, assuming a sustained throughput of about 100MB/s.
- Network errors have a non-zero probability. What if a network error occurs near the end of uploading a 10TB file? Do you spend another 27 hours uploading it again?
- Egress traffic from the datacenter where the database is located costs money. Bigger and more frequent backups increase network bandwidth costs.
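The transfer-time figure in the list above is easy to check with a few lines of arithmetic. The sketch below is only an estimate: the 100MB/s sustained throughput is an assumed value for a gigabit link after protocol overhead, not a measurement, and `transfer_hours` is a helper name made up for this example.

```python
def transfer_hours(size_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Return the hours needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_sec / 3600


TB = 10**12

# Theoretical ceiling of a 1 Gbit/s link: 125 MB/s, ignoring protocol overhead.
line_rate = 1e9 / 8
# Assumed sustained throughput after overhead (illustrative, not measured).
effective = 100e6

print(f"{transfer_hours(10 * TB, line_rate):.1f} h at line rate")   # 22.2 h
print(f"{transfer_hours(10 * TB, effective):.1f} h at 100 MB/s")    # 27.8 h
```

Even at the theoretical line rate a 10TB upload takes close to a day, so a failed upload that has to start from scratch is expensive either way.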
Is there a way to overcome these limitations? Yes, if certain conditions are met:
- Data must be split into multiple files, so each file can be backed up independently.
- Files must be immutable, i.e. their contents shouldn’t change over time. This allows uploading each file to backup storage only once.
- New data must go into new files, so incremental backups are cheap: just back up the new files.
- The total number of files shouldn’t be too high. This reduces the overhead and cost of per-file operations and management.
- Data must be stored in compressed form on disk in order to reduce network bandwidth usage during backups.
If the database stores all its data according to these conditions, then it is quite easy to set up inexpensive and fast incremental backups on top of S3 or GCS. Full backups can also be sped up with server-side copying of shared immutable files from the old backup to the new one. Both GCS and S3 support server-side object copy. This operation is usually fast for objects of any size within the same bucket, since only metadata is copied.
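Here is a minimal sketch of how such a backup could be planned. It assumes that files are immutable and that a file name uniquely identifies its contents, so comparing two sets of names is enough. The function `plan_backup` and the `part_NNNN` file names are hypothetical and do not come from any particular tool.

```python
def plan_backup(local_files: set[str], previous_backup: set[str]) -> dict[str, set[str]]:
    """Classify files for a new backup, relying on file immutability."""
    return {
        # Not present in the previous backup: must travel over the network.
        "upload": local_files - previous_backup,
        # Already in object storage: reused as-is by an incremental backup,
        # or server-side copied into a new full backup.
        "reuse_or_copy": local_files & previous_backup,
        # Merged away or deleted locally: not part of the new backup.
        "drop": previous_backup - local_files,
    }


plan = plan_backup(
    local_files={"part_0003", "part_0004", "part_0005"},
    previous_backup={"part_0001", "part_0002", "part_0003"},
)
print(sorted(plan["upload"]))         # ['part_0004', 'part_0005']
print(sorted(plan["reuse_or_copy"]))  # ['part_0003']
print(sorted(plan["drop"]))           # ['part_0001', 'part_0002']
```

Only the files in `upload` cost network bandwidth. If files could change in place, a name comparison would not be enough and every file would have to be re-read and compared on each backup.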
Which data structure adheres to the principles mentioned above and can be used as a building block for a time series database? B-tree, the heart of most databases? LMDB? PGDATA or TOAST from PostgreSQL?
No. All of them modify file contents on disk.
Log-structured merge trees and backups
An LSM tree satisfies all the conditions mentioned above:
- It stores data in multiple files.
- Files are immutable.
- New data goes into new files.
- The total number of files remains low thanks to background merging of smaller files into bigger files.
- Rows are stored sorted, which usually gives a good compression ratio.
Several databases are built on LSM-like data structures:
- CockroachDB: a large-scale distributed database with SQL support.
- ClickHouse: a fast column-based analytical database with SQL-like syntax.
- MyRocks: a fast MySQL storage engine from Facebook, built on RocksDB, with good on-disk data compression.
- VictoriaMetrics: a fast and cost-effective time series database with the best on-disk data compression and PromQL support.
In theory, all these databases can support incremental backups, provided they store all their data in LSM-like data structures. But how do you make backups from live data when new files are constantly added and old files are constantly removed from the database? Thanks to the immutability of files in LSM-like data structures, it is easy to make an instant snapshot via hard links and then back up the data from the snapshot.
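The hard-link trick can be sketched with the standard library alone. This is an illustration of the idea, not the way any particular database implements snapshots: the function name and directory layout are made up, the snapshot directory must live on the same filesystem as the data, and a real implementation also has to make sure the set of files doesn’t change while the links are being created.

```python
import os
import tempfile


def snapshot_via_hard_links(data_dir: str, snapshot_dir: str) -> list[str]:
    """Hard-link every regular file from data_dir into a new snapshot_dir."""
    os.makedirs(snapshot_dir)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            # A hard link is a second name for the same on-disk content,
            # so no data is copied and the call returns almost instantly.
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


with tempfile.TemporaryDirectory() as root:
    data = os.path.join(root, "data")
    os.makedirs(data)
    with open(os.path.join(data, "part_0001"), "w") as f:
        f.write("immutable rows")

    names = snapshot_via_hard_links(data, os.path.join(root, "snapshot"))

    # The database later removes the file, e.g. after a background merge...
    os.remove(os.path.join(data, "part_0001"))
    # ...but the snapshot still holds the content.
    with open(os.path.join(root, "snapshot", "part_0001")) as f:
        survived = f.read()

print(names, survived)  # ['part_0001'] immutable rows
```

This works only because the files are never modified after creation. With in-place updates, a hard link would see every later change and the snapshot would not be a consistent point-in-time view.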
Full disclosure: I’m the core developer of VictoriaMetrics, so this section is dedicated to the recently published vmbackup tool. This command-line utility creates VictoriaMetrics backups on S3 and GCS. It takes full advantage of the LSM tree properties mentioned above:
- It supports incremental backups by uploading only new files to backup storage.
- It supports fast full backups by employing server-side copy of shared files from already existing backups.
These features save hours of time and terabytes of network traffic when backing up a multi-terabyte time series database.
It is quite easy to set up smart backups with frequent incremental backups and less frequent full backups with server-side copy.
VictoriaMetrics can produce files exceeding 5TB. How does vmbackup handle such files given the 5TB object size limit mentioned at the beginning of the article? And how does it handle network errors near the end of uploading such a big file? The answer is simple: it splits files into 1GB chunks and uploads each chunk independently. So in the worst case it re-transfers only 1GB of data when a network error occurs during the upload.
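The chunking idea can be illustrated with a short sketch. This is not vmbackup’s actual code: `upload_chunk` is a hypothetical callback standing in for an object-storage upload, and the retry policy (three attempts, retry on `OSError`) is an assumption made for the example.

```python
import io

CHUNK_SIZE = 1024**3  # 1GB, the chunk size mentioned in the text


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, data) pairs read sequentially from a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            return
        yield index, data
        index += 1


def upload_in_chunks(stream, upload_chunk, chunk_size=CHUNK_SIZE, max_attempts=3):
    """Upload each chunk independently, retrying only the chunk that failed."""
    uploaded = 0
    for index, data in iter_chunks(stream, chunk_size):
        for attempt in range(1, max_attempts + 1):
            try:
                upload_chunk(index, data)  # hypothetical object-storage upload
                break
            except OSError:
                if attempt == max_attempts:
                    raise
        uploaded += 1
    return uploaded


# Demo with a tiny chunk size and an uploader that fails once on the last chunk.
stored = {}
failures = {"left": 1}


def flaky_upload(index, data):
    if index == 2 and failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("simulated network error")
    stored[index] = data


count = upload_in_chunks(io.BytesIO(b"abcdefghij"), flaky_upload, chunk_size=4)
print(count, stored)  # 3 {0: b'abcd', 1: b'efgh', 2: b'ij'}
```

The failure on the last chunk costs one extra chunk upload, not a restart of the whole file. Each chunk also stays far below the per-object size limit, no matter how large the source file is.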
Conclusions
- While replication provides availability during hardware failures, it doesn’t protect against data loss. Use backups.
- Large backups can be fast and cheap if a suitable database is used. I’d recommend VictoriaMetrics :)
- VictoriaMetrics backups can be used for backing up data collected from many Prometheus instances.