Skip to main content

RDS Snapshots

RDS snapshots have multiple purposes: migrations, backups, etc. When RDS DB instances are created using the Cloud Platform RDS terraform module, an IAM user account is created for management purposes. This user can create, delete, copy, and restore RDS snapshots.

Examples of managing RDS DB snapshots using the AWS CLI can be found in the README within the RDS terraform module

Considerations

  • The amount of manual snapshots per AWS account is limited, so it’s important to cleanup old snapshots
  • Daily snapshots are provided “out of the box”, and do not count towards the “Manual Snapshots” total
  • Managing snapshots is the teams’ responsibility (as are snapshot restores), so teams are responsible for cleaning up unneeded manual snapshots in order to avoid hitting our AWS account limits

Restoring live services from an RDS DB snapshot

If you want to restore your production RDS DB instance from a previous snapshot, either because the database is corrupted or the whole database was deleted, here is the procedure:

1. Get the db-instance-identifier of the DB instance you want to restore, using the Cloud Platform CLI;

cloud-platform decode-secret -n <your namespace> -s <the secret storing your RDS details>

Look for a line like this in the output;

"rds_instance_address": "cloud-platform-xxxxxxxxxxxxxxxx.abcdefghijkl.eu-west-2.rds.amazonaws.com",

The db-instance-identifier is the cloud-platform-xxxxxxxxxxxxxxxx part.

2. List the available snapshots using the AWS CLI:

First of all, ensure you have a Service Pod running and configured for your target RDS instance, in order to run the AWS CLI commands.

aws rds describe-db-snapshots --db-instance-identifier <db-instance-identifier>

The output will have a list of snapshots available for that RDS DB instance. Pick the DBSnapshotIdentifier from which you want the RDS to be restored.

3. Update your current RDS manifest file (usually called rds.tf) in the cloud-platform-environments repo to include snapshot-identifier and raise the PR.

snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"

For example:

module "example_team_rds" {
    source               = "github.com/ministryofjustice/cloud-platform-terraform-rds-instance?ref=5.1"
    cluster_name           = var.cluster_name
    cluster_state_bucket   = var.cluster_state_bucket
    team_name              = var.team_name
    business-unit          = var.business-unit
    application            = var.application
    is-production          = var.is-production
    db_engine_version      = "11.4"
    environment-name       = var.environment-name
    infrastructure-support = var.infrastructure-support
    force_ssl              = "true"
    rds_family             = "postgres11"

    snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"

    providers = {
    # Can be either "aws.london" or "aws.ireland"
    aws = aws.london
    }
}

Check the terraform plan for any change to the current values except known after apply.

Your RDS db_allocated_storage will be autoscaled to maximum of 10000 by default unless you have set to specific number. Hence, the actual db_allocated_storage may be higher than what is set as default(10).

Check the actual “AllocatedStorage” value described in your RDS instance using the command:

aws rds describe-db-instances --db-instance-identifier <your-rds-db-identifier>

If that is different to what is set for your RDS, update your RDS manifest to include the actual db_allocated_storage by adding:

db_allocated_storage = "<the actual value you got from the above cli command>"

It is important to retain other original settings associated with the DB instance when restoring from the snapshot. So don’t make any other changes to this file except the above mentioned.

Warning: If there is an RDS read replica for the DB instance that is being restored, then you must set count = 0 or comment out the read replica code (and any Kubernetes secrets relating to the read replica). This will remove the read replica. This is required as Terraform will throw the following error as Terraform is unable to change the read replica target: Error: cannot elect new source database for replication

When the read replica is re-created, it will be created with a new database identifier.

4. Contact the Cloud Platform team about the restore, providing the db-instance-identifier and the PR raised.

5. Before the PR can be merged, the Cloud Platform team have to:

  • pause the Apply Pipeline, and
  • rename the DB Instance via the AWS console

This allows the pipeline to create the new DB Instance from the snapshot, using the same name and settings as the original, and preserves the database endpoint and credentials.

Sometimes the pipeline throws an error: InvalidParameterCombination: Cannot upgrade mariadb from 10.4.13 to 10.4.8.

This will happen if AWS upgraded the minor version automatically and terraform stored the exact db_engine_version in its state.

To fix this, change the db_engine_version in your rds.tf from 10.4 to the one AWS upgraded to, in this case 10.4.13, and repeat the previous steps again.

Once the the terraform apply is successful, and the new DB instance has been created, revert the changes made for db_engine_version.

6. Merge the PR to Main and check that the Apply Pipeline creates the new DB instance with the same identifier and endpoints. The Cloud Platform team can also apply your changes manually if this is time critical.

Once the new DB instance is restored and available, your application should be able to access the database with the same database credentials and endpoint as before.

After you have verified that your DB instance and the database is working as expected, let the Cloud Platform team know so they can delete the old, renamed DB instance.

7. Only required for Read Replicas - Recreate the DB read replica if it was removed as a part of step 3:

In Terraform, re-enable the DB read replica code by setting count = 1 or uncommenting the code and any Kubernetes secrets that may also be created as a part of it.

Create a PR that includes the changes for the Cloud Platform team and post the link into the #ask-cloud-platform channel.

Once the PR has been approved, Merge to Main and check that the Apply Pipeline creates the new DB read replica instance.

Note: The read replica will have a new database identifier once it is created.

This page was last reviewed on 28 October 2024. It needs to be reviewed again on 28 April 2025 by the page owner #cloud-platform .
This page was set to be reviewed before 28 April 2025 by the page owner #cloud-platform. This might mean the content is out of date.