@kichik
Last active June 18, 2025 06:12
CloudFormation template that stops RDS from automatically starting back up
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between the time the database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.12
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time
            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]
                id = event["detail"]["SourceIdentifier"]
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)
                if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
                elif message == "DB instance started":
                    print("database started (and sort of available?)")
                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])
                    if time.time() < last_seen + (60 * ${MaxStartupTime}):
                        print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_instance(DBInstanceIdentifier=id)
                        print("success! removing auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
                    else:
                        print("ignoring manual database start")
                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
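
The handler's stop-or-ignore decision depends only on the current time, the value of the DbStopperLastSeen tag, and MaxStartupTime. As a rough sketch, that check can be pulled out into a pure function and exercised offline. The name `should_stop` and its parameters are hypothetical and do not appear in the template.

```python
def should_stop(now: float, last_seen: int, max_startup_minutes: int) -> bool:
    """Return True if a "DB instance started" event falls inside the window
    opened by the automatic-start event (recorded in DbStopperLastSeen)."""
    return now < last_seen + 60 * max_startup_minutes


# Automatic start recorded 10 minutes ago, 25-minute window: stop the database.
print(should_stop(now=1_000_600, last_seen=1_000_000, max_startup_minutes=25))  # True

# The tag is reset to 0 after a successful stop (or was never set), so a
# manual start always lands outside the window.
print(should_stop(now=1_000_600, last_seen=0, max_startup_minutes=25))  # False
```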
@sqlxpert

Hello! I want to share github.com/sqlxpert/stay-stopped-aws-rds-aurora, an open-source utility that does not require pre-designating a database, creating a separate stack for each affected database, or tagging databases temporarily. It works for RDS and Aurora. You can deploy it as a CloudFormation stack in one region in one AWS account or as a StackSet across multiple regions and/or multiple AWS accounts.

I've linked directly to the "Design" section of the ReadMe. In brief, Stay-Stopped responds to the RDS-specific RDS-EVENT-0154 (DB instance is being started due to it exceeding the maximum allowed time being stopped.) and to the Aurora-specific RDS-EVENT-0153 (DB cluster is being started due to it exceeding the maximum allowed time being stopped.). Stay-Stopped overcomes the issue mentioned in the former TODO on line 24 of KeepDbStopped without a Step Function. A "Perspective" at the end of the ReadMe goes into considerable detail about avoiding a race condition bug, which is what was lurking there.
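
For illustration only, here is a sketch of an EventBridge pattern that selects the two forced-start events by ID instead of by message text. It is not taken from Stay-Stopped. It assumes that the event's `detail` carries an `EventID` field and that cluster events use the detail-type "RDS DB Cluster Event". The `matches` helper is a hypothetical, much-reduced stand-in for EventBridge's own matching, and the sample event is trimmed and invented.

```python
import json

# Hypothetical pattern: match forced starts for RDS instances and Aurora clusters.
FORCED_START_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event", "RDS DB Cluster Event"],
    "detail": {"EventID": ["RDS-EVENT-0154", "RDS-EVENT-0153"]},
}


def matches(pattern, event):
    """Tiny subset of EventBridge matching: every pattern key must be present,
    nested dicts recurse, and a list matches if it contains the event value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Trimmed, made-up event for demonstration.
sample = {
    "source": "aws.rds",
    "detail-type": "RDS DB Instance Event",
    "detail": {"EventID": "RDS-EVENT-0154", "SourceIdentifier": "example-db"},
}

print(json.dumps(FORCED_START_PATTERN, indent=2))
print(matches(FORCED_START_PATTERN, sample))
```

Matching on an event ID avoids depending on the exact wording and punctuation of the message string.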

According to "Starting an Amazon RDS DB instance that was previously stopped" in the Amazon Relational Database Service User Guide, "The startup process can take minutes to hours."

The Aurora proposal in the 2021-01-24 comment, above, would not have worked as written. At minimum, it would have been necessary to edit the EventPattern and the handler code to match these two Aurora-specific events: RDS-EVENT-0153 (DB cluster is being started due to it exceeding the maximum allowed time being stopped.) and RDS-EVENT-0151 (DB cluster started.). According to "Stopping and starting an Amazon Aurora DB cluster" in the User Guide for Aurora, "The startup process can take minutes to hours, but usually takes several minutes." "Usually" is cold comfort, and stop_db_cluster will fail until all database instances in the cluster have reached available status.
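
To make the last point concrete, here is a minimal sketch of checking the cluster's member instances before attempting a stop. It assumes `rds` behaves like `boto3.client("rds")`. The helper names are hypothetical, pagination is ignored, and treating an empty result as "not ready" is my own assumption.

```python
def cluster_members_available(rds, cluster_id):
    """Return True only if every DB instance in the cluster reports 'available'.

    Only the first page of describe_db_instances results is inspected.
    An empty result is treated as not ready (an assumption).
    """
    response = rds.describe_db_instances(
        Filters=[{"Name": "db-cluster-id", "Values": [cluster_id]}]
    )
    instances = response["DBInstances"]
    return bool(instances) and all(
        instance["DBInstanceStatus"] == "available" for instance in instances
    )


def stop_cluster_if_ready(rds, cluster_id):
    """Call stop_db_cluster only when all members are available.

    Returns True if the stop request was submitted, False otherwise.
    """
    if not cluster_members_available(rds, cluster_id):
        return False
    rds.stop_db_cluster(DBClusterIdentifier=cluster_id)
    return True
```

A status can still change between the check and the call, so stop_db_cluster may raise even after a successful check. The caller still needs error handling and a retry strategy.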

I learned more than I ever wanted to know about start_db_instance / stop_db_instance and start_db_cluster / stop_db_cluster from writing Stay-Stopped and updating my other utility, github.com/sqlxpert/lights-off-aws (which starts and stops EC2 compute instances as well as RDS and Aurora databases according to cron schedules in their tags). RDS and Aurora require different error-handling. I wrote up my findings in "5 AWS Services, 5 Different Approaches to Idempotence" on community.aws, in case the information is useful to you. Feedback on the utilities and on the article is welcome.

Before closing, I'd like to highlight two strengths of KeepDbStopped:

  1. Responding to a pair of events allows as much time as necessary between when a database enters starting status and when it reaches available status, the only status in which a request to stop it can be submitted successfully. A potential problem is that the database might never reach available status (so there is no second event), or that it might reach available status but enter a different status between the 3 total tries allowed when a Lambda function is invoked asynchronously from an EventBridge event bus. After the second event, which is in fact RDS-EVENT-0088 (DB instance started.), all 3 stop_db_instance calls can fail in rare cases such as long-running maintenance or storage optimization.

  2. Restricting rds:AddTagsToResource and rds:RemoveTagsFromResource to a designated database Resource and a designated TagKey avoids a security risk. It is difficult to prevent the modification of a Lambda function's source code and to prevent the passing of a Lambda function role to an arbitrary function. In the wrong hands, a role with a more permissive policy would allow arbitrary tagging of arbitrary RDS resources. For the benefit of others, I'll mention the Allow + ForAllValues Null gotcha, which shouldn't affect add_tags_to_resource or remove_tags_from_resource but can affect operations where specifying tags is optional.
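
The gotcha in the second point follows from how the ForAllValues qualifier is defined: it tests that every value present in the request is in the allowed set, so it is also satisfied when the request carries no tag keys at all. Below is a small simulation of that set logic. It is not an IAM policy evaluator, and the function names are hypothetical.

```python
ALLOWED_TAG_KEYS = {"DbStopperLastSeen"}


def for_all_values_string_equals(request_keys, allowed_keys):
    """Mimic ForAllValues:StringEquals: true when every request key is allowed.
    Vacuously true when the request has no tag keys."""
    return all(key in allowed_keys for key in request_keys)


def with_null_guard(request_keys, allowed_keys):
    """Same test plus the equivalent of a Null condition on aws:TagKeys set to
    "false", which requires tag keys to be present in the request."""
    return bool(request_keys) and for_all_values_string_equals(request_keys, allowed_keys)


print(for_all_values_string_equals(["DbStopperLastSeen"], ALLOWED_TAG_KEYS))  # True
print(for_all_values_string_equals(["Owner"], ALLOWED_TAG_KEYS))              # False
print(for_all_values_string_equals([], ALLOWED_TAG_KEYS))                     # True
print(with_null_guard([], ALLOWED_TAG_KEYS))                                  # False
```

The third result is the surprising one: an Allow statement guarded only by ForAllValues also permits requests that specify no tags, which matters for operations where tags are optional.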

Nice work, and cheers!