Skip to main content

RDS Snapshots

RDS snapshots have multiple purposes: migrations, backups, etc. When RDS DB instances are created using the Cloud Platform RDS terraform module an IAM user account is created for management purposes. This user can create, delete, copy, and restore RDS snapshots.

Examples of managing RDS DB snapshots using the AWS CLI can be found in the README within the RDS terraform module

Considerations

  • The amount of manual snapshots per AWS account is limited, so it’s important to cleanup old snapshots
  • Daily snapshots are provided “out of the box”, and do not count towards the “Manual Snapshots” total
  • Managing snapshots is the teams’ responsibility (as are snapshot restores), so teams are responsible for cleaning up unneeded manual snapshots in order to avoid hitting our AWS account limits

Restoring live services from an RDS DB snapshot

If you want to restore your production RDS DB instance from a previous snapshot, either because the database is corrupted or the whole database was deleted, here is the procedure:

1. Get the db-instance-identifier of the DB instance you want to restore, using the Cloud Platform CLI;

cloud-platform decode-secret -n <your namespace> -s <the secret storing your RDS details>

Look for a line like this in the output;

"rds_instance_address": "cloud-platform-xxxxxxxxxxxxxxxx.abcdefghijkl.eu-west-2.rds.amazonaws.com",

The db-instance-identifier is the cloud-platform-xxxxxxxxxxxxxxxx part.

2. List the available snapshots using the AWS CLI:

eval $(cloud-platform decode-secret -n <your namespace> -s <the secret storing your RDS details> --export-aws-credentials)
aws rds describe-db-snapshots --db-instance-identifier <db-instance-identifier>

The output will have a list of snapshots available for that RDS DB instance. Pick the DBSnapshotIdentifier from which you want the RDS to be restored.

3. Update your current RDS manifest file (usually called rds.tf) in the cloud-platform-environments repo to include snapshot-identifier and raise the PR.

snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"

For example:

module "example_team_rds" {
    source               = "github.com/ministryofjustice/cloud-platform-terraform-rds-instance?ref=5.1"
    cluster_name           = var.cluster_name
    cluster_state_bucket   = var.cluster_state_bucket
    team_name              = var.team_name
    business-unit          = var.business-unit
    application            = var.application
    is-production          = var.is-production
    db_engine_version      = "11.4"
    environment-name       = var.environment-name
    infrastructure-support = var.infrastructure-support
    force_ssl              = "true"
    rds_family             = "postgres11"

    snapshot_identifier = "rds:cloud-platform-XXXXX-2020-02-08-04-23"

    providers = {
    # Can be either "aws.london" or "aws.ireland"
    aws = aws.london
    }
}

It is important to retain the original settings associated with the DB instance when restoring from the snapshot. So don’t make any other changes to this file.

4. Contact the Cloud Platform team about the restore, providing the db-instance-identifier and the PR raised.

5. Before the PR can be merged, the Cloud Platform team have to:

  • pause the concourse pipeline, and
  • rename the DB Instance via the AWS console

This allows the pipeline to create the new DB Instance from the snapshot, using the same name and settings as the original, and preserves the database endpoint and credentials.

Sometimes the pipeline throws an error: InvalidParameterCombination: Cannot upgrade mariadb from 10.4.13 to 10.4.8.

This will happen if AWS upgraded the minor version automatically and terraform stored the exact db_engine_version in its state.

To fix this, change the db_engine_version in your rds.tf from 10.4 to the one AWS upgraded to, in this case 10.4.13 and repeat the previous steps again.

Once the the terraform apply is successful, and the new DB instance has been created, revert the changes made for db_engine_version.

6. Merge the PR to master and check that the build pipeline creates the new DB instance with the same identifier and endpoints. The Cloud Platform team can also apply your changes manually if this is time critical.

Once the new DB instance is restored and available, your application should be able to access the database with the same database credentials and endpoint as before.

After you have verified that your DB instance and the database is working as expected, let the Cloud Platform team know so they can delete the old, renamed DB instance.

This page was last reviewed on 27 April 2021. It needs to be reviewed again on 27 July 2021 .
This page was set to be reviewed before 27 July 2021. This might mean the content is out of date.