Cassandra: Backups and Snapshots
Taking a backup in Cassandra means taking a snapshot.
You can take a snapshot of the SSTables for a given keyspace.
A snapshot is taken using the nodetool utility.
This creates hard links to the SSTable files being backed up, in a snapshots directory under the keyspace's data directory.
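Hard links do not consume additional disk space when they are created. Once compaction removes the original SSTables, however, the snapshot becomes the only reference to those files and keeps their space in use.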
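The hard link behaviour can be seen without Cassandra at all. The following is a minimal sketch that runs in a throwaway temp directory; the file name nb-1-big-Data.db is only a hypothetical stand-in for an SSTable data file, and the inode_of helper is introduced here purely for illustration.

```shell
#!/bin/sh
# Sketch: a hard link shares its data with the original file.
# Everything happens in a temp directory; no Cassandra is required.
workdir=$(mktemp -d)
original="$workdir/nb-1-big-Data.db"            # hypothetical SSTable file
snapshot="$workdir/snapshots/nb-1-big-Data.db"  # its hard-linked copy

printf 'sstable bytes' > "$original"
mkdir "$workdir/snapshots"

# A snapshot does the equivalent of this for every SSTable file.
ln "$original" "$snapshot"

# Helper (illustrative): print the inode number of a file.
inode_of() { ls -i "$1" | awk '{print $1}'; }

# Both names point at the same inode, so no data was copied.
if [ "$(inode_of "$original")" = "$(inode_of "$snapshot")" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"

# Removing the original (as compaction eventually does) leaves the
# snapshot readable; from then on it alone holds the disk space.
rm "$original"
remaining=$(cat "$snapshot")
echo "snapshot still readable: $remaining"

rm -rf "$workdir"
```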
The nodetool snapshot command is local to the node it runs on. For a cluster-wide snapshot, the command must be run on every node in the cluster.
For a global snapshot, one can run it on all nodes at once using a parallel ssh utility such as pssh.
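As a rough sketch of the cluster-wide case, the helper below only prints one ssh command per node so the result can be reviewed before anything is run. The function name snapshot_commands, the host names and the snapshot tag are all assumptions made for illustration; -t is the nodetool snapshot option for naming the snapshot. With pssh the same idea would look roughly like pssh -h hosts.txt -i "nodetool snapshot -t TAG scott", assuming hosts.txt lists the nodes.

```shell
#!/bin/sh
# Sketch: print (do not run) the snapshot command for every node, so the
# same tagged snapshot is requested cluster-wide.
# Arguments: keyspace, snapshot tag, then one or more host names.
snapshot_commands() {
    keyspace=$1
    tag=$2
    shift 2
    for host in "$@"; do
        echo "ssh $host nodetool snapshot -t $tag $keyspace"
    done
}

# Dry run for a hypothetical three-node cluster.
snapshot_commands scott backup_2024_01_01 node1 node2 node3
```

Using the same tag on every node makes it easier to find and later clear the matching snapshot directories across the cluster.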
The snapshots should then be copied to a separate offline location.
As in an RDBMS, Cassandra also supports incremental backups.
By default, this feature is disabled.
To enable it, change the following setting in cassandra.yaml:
incremental_backups: true
For recovery of data, the relevant SSTables must be present.
As a best practice, old snapshots should be deleted, since they otherwise keep accumulating on disk.
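Before deleting anything it helps to see which snapshots exist on a node. The sketch below lists the distinct snapshot names found under a data directory, assuming the usual keyspace/table_name-UUID/snapshots/snapshot_name layout; the function name list_snapshot_tags and the example path in the comment are assumptions, not part of Cassandra.

```shell
#!/bin/sh
# Sketch: list the distinct snapshot names present under a data directory.
# Argument: the Cassandra data directory of this node.
list_snapshot_tags() {
    data_dir=$1
    # Match .../snapshots/<name> directories and do not descend into them,
    # then keep only the last path component and de-duplicate.
    find "$data_dir" -type d -path '*/snapshots/*' -prune -print |
        sed 's|.*/||' | sort -u
}

# Example call (the path is an assumption; adjust to your installation):
#   list_snapshot_tags /var/lib/cassandra/data
```

A snapshot name that is no longer needed can then be removed with nodetool clearsnapshot -t followed by that name, which is safer than deleting the directories by hand.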
Fundamentally, backing up data in Cassandra involves taking a snapshot of the SSTables for
a given keyspace at a moment in time. The snapshot must include all of the keyspace's tables
in order to recover properly if needed.
You can create a snapshot using nodetool (we specify hostname, JMX port and keyspace):
# nodetool -h localhost -p 7199 snapshot scott
The snapshot is created in the data_directory/keyspace_name/table_name-UUID/snapshots/snapshot_name directory.
The snapshot directory will contain *.db files (these hold the data as it was when the snapshot was taken).
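Given that layout, copying a snapshot off the node comes down to collecting every snapshots/snapshot_name directory of the keyspace. The following is one possible sketch that packs them into a single tar file, which can then be moved to offline storage. The function name archive_snapshot and its arguments are hypothetical, and it assumes a tar that accepts -T - to read file names from standard input (GNU tar and bsdtar do).

```shell
#!/bin/sh
# Sketch: pack all directories of one named snapshot into a tar archive.
# Arguments: data directory, keyspace, snapshot name, output tar file.
archive_snapshot() {
    data_dir=$1
    keyspace=$2
    tag=$3
    archive=$4
    # List snapshot directories relative to data_dir so the archive keeps
    # the keyspace/table_name-UUID/snapshots/<name> structure.
    ( cd "$data_dir" &&
      find "$keyspace" -type d -path "*/snapshots/$tag" -prune -print ) |
        tar -cf "$archive" -C "$data_dir" -T -
}

# Example call (paths and snapshot name are assumptions):
#   archive_snapshot /var/lib/cassandra/data scott backup_2024_01_01 /tmp/scott.tar
```

Keeping the relative paths in the archive means the table directories can be told apart again at restore time.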