RDS Snapshots
RDS snapshots have multiple purposes: migrations, backups, etc. When RDS DB instances are created using the Cloud Platform RDS terraform module, an IAM user account is created for management purposes. This user can create, delete, copy, and restore RDS snapshots.
Examples of managing RDS DB snapshots using the AWS CLI can be found in the README within the RDS terraform module
Considerations
- The amount of manual snapshots per AWS account is limited, so it’s important to cleanup old snapshots
- Daily snapshots are provided “out of the box”, and do not count towards the “Manual Snapshots” total
- Managing snapshots is the teams’ responsibility (as are snapshot restores), so teams are responsible for cleaning up unneeded manual snapshots in order to avoid hitting our AWS account limits
Restoring live services from an RDS DB snapshot
If you want to restore your production RDS DB instance from a previous snapshot, either because the database is corrupted or the whole database was deleted, here is the procedure:
1. Get the db-instance-identifier
of the DB instance you want to restore, using the Cloud Platform CLI;
cloud-platform decode-secret -n <your namespace> -s <the secret storing your RDS details>
Look for a line like this in the output;
"rds_instance_address": "cloud-platform-xxxxxxxxxxxxxxxx.abcdefghijkl.eu-west-2.rds.amazonaws.com",
The db-instance-identifier
is the cloud-platform-xxxxxxxxxxxxxxxx
part.
2. List the available snapshots using the AWS CLI:
First of all, ensure you have a Service Pod running and configured for your target RDS instance, in order to run the AWS CLI commands.
aws rds describe-db-snapshots --db-instance-identifier <db-instance-identifier>
The output will have a list of snapshots available for that RDS DB instance. Pick the DBSnapshotIdentifier
from which you want the RDS to be restored.
3. Update your current RDS manifest file (usually called rds.tf
) in the cloud-platform-environments repo to include snapshot-identifier
and raise the PR.
snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"
For example:
module "example_team_rds" {
source = "github.com/ministryofjustice/cloud-platform-terraform-rds-instance?ref=5.1"
cluster_name = var.cluster_name
cluster_state_bucket = var.cluster_state_bucket
team_name = var.team_name
business-unit = var.business-unit
application = var.application
is-production = var.is-production
db_engine_version = "11.4"
environment-name = var.environment-name
infrastructure-support = var.infrastructure-support
force_ssl = "true"
rds_family = "postgres11"
snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"
providers = {
# Can be either "aws.london" or "aws.ireland"
aws = aws.london
}
}
Check the terraform plan for any change to the current values except known after apply
.
Your RDS db_allocated_storage
will be autoscaled to maximum of 10000
by default unless you have
set to specific number. Hence, the actual db_allocated_storage
may be higher than what is set as default(10).
Check the actual “AllocatedStorage” value described in your RDS instance using the command:
aws rds describe-db-instances --db-instance-identifier <your-rds-db-identifier>
If that is different to what is set for your RDS, update your RDS manifest to include the actual db_allocated_storage
by adding:
db_allocated_storage = "<the actual value you got from the above cli command>"
It is important to retain other original settings associated with the DB instance when restoring from the snapshot. So don’t make any other changes to this file except the above mentioned.
Warning: If there is an RDS read replica for the DB instance that is being restored, then you must set
count = 0
or comment out the read replica code (and any Kubernetes secrets relating to the read replica). This will remove the read replica. This is required as Terraform will throw the following error as Terraform is unable to change the read replica target:Error: cannot elect new source database for replication
When the read replica is re-created, it will be created with a new database identifier.
4. Contact the Cloud Platform team about the restore, providing the db-instance-identifier
and the PR raised.
5. Before the PR can be merged, the Cloud Platform team have to:
- pause the Apply Pipeline, and
- rename the DB Instance via the AWS console
This allows the pipeline to create the new DB Instance from the snapshot, using the same name and settings as the original, and preserves the database endpoint and credentials.
Sometimes the pipeline throws an error:
InvalidParameterCombination: Cannot upgrade mariadb from 10.4.13 to 10.4.8
.This will happen if AWS upgraded the minor version automatically and terraform stored the exact db_engine_version in its state.
To fix this, change the
db_engine_version
in yourrds.tf
from10.4
to the one AWS upgraded to, in this case10.4.13
, and repeat the previous steps again.Once the the terraform apply is successful, and the new DB instance has been created, revert the changes made for
db_engine_version
.
6. Merge the PR to Main and check that the Apply Pipeline creates the new DB instance with the same identifier and endpoints. The Cloud Platform team can also apply your changes manually if this is time critical.
Once the new DB instance is restored and available, your application should be able to access the database with the same database credentials and endpoint as before.
After you have verified that your DB instance and the database is working as expected, let the Cloud Platform team know so they can delete the old, renamed DB instance.
7. Only required for Read Replicas - Recreate the DB read replica if it was removed as a part of step 3:
In Terraform, re-enable the DB read replica code by setting count = 1
or uncommenting the code and any Kubernetes secrets that may also be created as a part of it.
Create a PR that includes the changes for the Cloud Platform team and post the link into the #ask-cloud-platform channel.
Once the PR has been approved, Merge to Main and check that the Apply Pipeline creates the new DB read replica instance.
Note: The read replica will have a new database identifier once it is created.