RDS Snapshots
RDS snapshots have multiple purposes: migrations, backups, etc. When RDS DB instances are created using the Cloud Platform RDS terraform module, an IAM user account is created for management purposes. This user can create, delete, copy, and restore RDS snapshots.
Examples of managing RDS DB snapshots using the AWS CLI via the Cloud Platform Service Pod can be found in the README of the RDS terraform module.
Considerations
- The number of manual snapshots per AWS account is limited, so it’s important to clean up old snapshots
- Daily snapshots are provided “out of the box”, and do not count towards the “Manual Snapshots” total
- Managing snapshots is the teams’ responsibility (as are snapshot restores), so teams are responsible for cleaning up unneeded manual snapshots in order to avoid hitting our AWS account limits
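As a rough sketch of that clean-up, the helper functions below wrap the AWS CLI calls for listing and deleting manual snapshots. The function names are hypothetical, and the sketch assumes it runs in a Service Pod whose credentials are permitted to run these commands against your instance.

```shell
# Hypothetical helper: list the manual snapshots of one DB instance.
# $1 is the db-instance-identifier.
list_manual_snapshots() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "$1" \
    --snapshot-type manual \
    --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotCreateTime]' \
    --output text
}

# Hypothetical helper: delete one manual snapshot.
# $1 is the DBSnapshotIdentifier taken from the listing above.
delete_manual_snapshot() {
  aws rds delete-db-snapshot --db-snapshot-identifier "$1"
}
```

For example, `list_manual_snapshots <db-instance-identifier>` followed by `delete_manual_snapshot <snapshot-identifier>` for each snapshot that is no longer needed.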
Restoring live services from an RDS DB snapshot
If you want to restore your production RDS DB instance from a previous snapshot, either because the database is corrupted or the whole database was deleted, here is the procedure:
1. Get the db-instance-identifier of the DB instance you want to restore, using the Cloud Platform CLI:
cloud-platform decode-secret -n <your namespace> -s <the secret storing your RDS details>
Look for a line like this in the output:
"rds_instance_address": "cloud-platform-xxxxxxxxxxxxxxxx.abcdefghijkl.eu-west-2.rds.amazonaws.com",
The db-instance-identifier is the cloud-platform-xxxxxxxxxxxxxxxx part.
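If you prefer not to copy the identifier by hand, shell parameter expansion can take everything before the first dot of the address. This is only a sketch: the variable names are illustrative, and the address is the placeholder value from the example output, not a real endpoint.

```shell
# Placeholder value copied from the example decode-secret output.
rds_instance_address="cloud-platform-xxxxxxxxxxxxxxxx.abcdefghijkl.eu-west-2.rds.amazonaws.com"

# Remove the longest suffix starting at the first ".", leaving the identifier.
db_instance_identifier="${rds_instance_address%%.*}"

echo "$db_instance_identifier"
```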
2. List the available snapshots using the AWS CLI:
First of all, ensure you have a Service Pod running and configured for your target RDS instance, in order to run the AWS CLI commands.
aws rds describe-db-snapshots --db-instance-identifier <db-instance-identifier>
The output will have a list of snapshots available for that RDS DB instance. Pick the DBSnapshotIdentifier from which you want the RDS to be restored.
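The full JSON output can be long. One possible way to narrow it down is the AWS CLI's `--query` option, sketched here as a hypothetical helper function that prints only the identifier, type, creation time, and status of each snapshot.

```shell
# Hypothetical helper: summarise the snapshots of one DB instance as a table.
# $1 is the db-instance-identifier.
list_snapshots() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "$1" \
    --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotType,SnapshotCreateTime,Status]' \
    --output table
}
```

Run it as `list_snapshots <db-instance-identifier>` from the Service Pod.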
3. Contact the Cloud Platform team about the restore.
Before making changes to the RDS manifest, you need to:
- Add an APPLY_PIPELINE_SKIP_THIS_NAMESPACE file at the root of your namespace.
- Provide the team with your db-instance-identifier so they can rename it via the AWS console.
4. Update your current RDS manifest file (usually called rds.tf) in the cloud-platform-environments repo to include snapshot_identifier:
snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"
For example:
module "example_team_rds" {
  source                 = "github.com/ministryofjustice/cloud-platform-terraform-rds-instance?ref=5.1"
  cluster_name           = var.cluster_name
  cluster_state_bucket   = var.cluster_state_bucket
  team_name              = var.team_name
  business-unit          = var.business-unit
  application            = var.application
  is-production          = var.is-production
  db_engine_version      = "11.4"
  environment-name       = var.environment-name
  infrastructure-support = var.infrastructure-support
  force_ssl              = "true"
  rds_family             = "postgres11"
  snapshot_identifier    = "rds:cloud-platform-XXXXX-2020-02-08-04-23"
  providers = {
    # Can be either "aws.london" or "aws.ireland"
    aws = aws.london
  }
}
5. Check the actual “AllocatedStorage” value described in your RDS instance using the command:
aws rds describe-db-instances --db-instance-identifier <your-rds-db-identifier>
If that differs from the value set in your RDS manifest, update the manifest to include the actual db_allocated_storage by adding:
db_allocated_storage = "<the actual value you got from the above cli command>"
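To read just that one value rather than scanning the full output, the same command can be combined with `--query`. The helper name below is hypothetical; the sketch assumes the Service Pod credentials can describe the instance.

```shell
# Hypothetical helper: print only the AllocatedStorage value (in GiB)
# of one DB instance. $1 is the db-instance-identifier.
get_allocated_storage() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].AllocatedStorage' \
    --output text
}
```

The number printed by `get_allocated_storage <your-rds-db-identifier>` is the value to use for db_allocated_storage.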
Important: It is crucial to retain the other original settings of the DB instance when restoring from the snapshot. Settings such as db-engine-version, db-engine, and rds-family must be the same as those of the original instance when the snapshot was taken, so don’t make any changes to this file other than those described above.
Warning: If there is an RDS read replica for the DB instance being restored, you must set count = 0 or comment out the read replica code (and any Kubernetes secrets relating to the read replica). This removes the read replica. It is required because Terraform cannot change the read replica’s target and will otherwise fail with the following error:
Error: cannot elect new source database for replication
When the read replica is re-created, it will be created with a new database identifier.
6. Raise a PR
After making the necessary changes to the manifest, raise a PR and contact the Cloud Platform team via the #ask-cloud-platform channel with the PR link. Once the new DB instance is restored and available, your application should be able to access the database with the same database credentials and endpoint as before.
7. Only required for Read Replicas - Recreate the DB read replica if it was removed as a part of step 5:
In Terraform, re-enable the DB read replica code by setting count = 1 or uncommenting the code and any Kubernetes secrets that may also be created as a part of it.
Create a PR that includes the changes for the Cloud Platform team and post the link into the #ask-cloud-platform channel.
Once the PR has been approved, merge it to main and check that the Apply Pipeline creates the new DB read replica instance.
Note: The read replica will have a new database identifier once it is created.
8. Important: Version Downgrades Cannot Be Done In-Place
AWS does not support downgrading a database to an earlier version on the same instance. For example, if a database has been upgraded from PostgreSQL 16.3 to 17, it cannot be downgraded back to 16.3 using the standard snapshot restoration process outlined above.
To properly downgrade an RDS database to a previous version:
- Take a snapshot of the current database to preserve it (in case restoration is needed later):
aws rds create-db-snapshot \
--db-instance-identifier <current-db-instance-identifier> \
--db-snapshot-identifier <current-version-snapshot-name>
- Create a new database instance with:
  - A different identifier than the current database
  - Engine version matching the snapshot you want to restore from
  - RDS family matching the snapshot you want to restore from
  - All other configuration parameters consistent with the source of the snapshot
In the terraform module:
module "example_team_rds" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-rds-instance?ref=5.1"
  # ... other parameters ...
  db_instance_identifier = "<new-instance-name>" # Must be different from current DB
  db_engine_version      = "16.3"                # Earlier version
  rds_family             = "postgres16"          # Family matching earlier version
  snapshot_identifier    = "<old-version-snapshot>"
  # ... other parameters ...
}
Note: This approach requires a cutover as the new database will have a different endpoint and connection string.
After verifying the new downgraded database works properly, the old one can be deleted if needed.
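As a starting point for that verification, the status, engine version, and endpoint of the new instance can be read with the AWS CLI. The helper name is hypothetical, and this assumes your Service Pod credentials are allowed to describe the new instance; it does not replace testing your application against the new endpoint.

```shell
# Hypothetical helper: print status, engine version and endpoint address
# of one DB instance. $1 is the new db-instance-identifier.
show_instance_summary() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].[DBInstanceStatus,EngineVersion,Endpoint.Address]' \
    --output text
}
```

Before cutting over, check that `show_instance_summary <new-instance-name>` reports a status of `available` and the engine version you expect.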