Gist kichik/7a2ecb0d36358c50c7b878ad9fd982bc: KeepDbStopped.yml
```yaml
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between when the database is automatically started and when it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.12
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]  # ARN, used for tagging
                db_id = event["detail"]["SourceIdentifier"]  # instance identifier, used for stopping
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)
                if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                    # First event: record when AWS auto-started the database
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
                elif message == "DB instance started":
                    # Second event: stop only if the auto-start happened recently
                    print("database started (and sort of available?)")
                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])
                    if time.time() < last_seen + (60 * ${MaxStartupTime}):
                        print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_instance(DBInstanceIdentifier=db_id)
                        print("success! resetting auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
                    else:
                        print("ignoring manual database start")
                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
```
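The template's handler treats the two RDS events as a small state machine: the auto-start event writes a timestamp tag, and the later "started" event stops the database only if that timestamp is less than `MaxStartupTime` minutes old. A minimal sketch of that branching as a pure Python function, so it can be checked without an AWS account; `decide_action` and its return labels are hypothetical names, not part of the template.

```python
# Message strings copied from the template's EventPattern.
AUTO_START_MESSAGE = (
    "DB instance is being started due to it exceeding "
    "the maximum allowed time being stopped."
)
STARTED_MESSAGE = "DB instance started"


def decide_action(message, last_seen, now, max_startup_minutes):
    """Hypothetical pure version of the handler's branching.

    last_seen is the DbStopperLastSeen tag value (0 if absent or reset).
    Returns "tag", "stop", "ignore" or "unknown".
    """
    if message == AUTO_START_MESSAGE:
        # First event: the handler records when AWS auto-started the database.
        return "tag"
    if message == STARTED_MESSAGE:
        # Second event: stop only if the auto-start was recent enough.
        if now < last_seen + 60 * max_startup_minutes:
            return "stop"
        # A manual start finds the tag at 0 (or stale), so it is left running.
        return "ignore"
    return "unknown"


if __name__ == "__main__":
    t0 = 1_700_000_000
    print(decide_action(AUTO_START_MESSAGE, 0, t0, 25))          # tag
    print(decide_action(STARTED_MESSAGE, t0, t0 + 10 * 60, 25))  # stop
    print(decide_action(STARTED_MESSAGE, 0, t0, 25))             # ignore
    print(decide_action("something else", 0, t0, 25))           # unknown
```

Keeping the time comparison free of boto3 calls is what makes it testable; in the template the same comparison runs after `list_tags_for_resource` has read the tag.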
Run it against actual event test data. Do you see `SourceArn` anywhere in `{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}`?
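The Lambda console's default test event has none of the keys the handler reads, which is why it raises `KeyError`. A minimal sketch of a test event that does, plus a check for missing keys. It includes only the fields the template's handler accesses (`detail.SourceArn`, `detail.SourceIdentifier`, `detail.Message`, `region`) and a few pattern fields; real EventBridge events carry more. The region, account ID and instance name are placeholders, and `make_test_event` and `missing_fields` are hypothetical helpers.

```python
import json

# Placeholder identifiers: substitute your own region, account ID and instance name.
REGION = "us-east-1"
DB_ID = "my-test-db"
DB_ARN = f"arn:aws:rds:{REGION}:123456789012:db:{DB_ID}"

# Message strings the template's handler compares against.
AUTO_START_MESSAGE = ("DB instance is being started due to it exceeding "
                      "the maximum allowed time being stopped.")
STARTED_MESSAGE = "DB instance started"


def make_test_event(message):
    """Hypothetical helper: a minimal event with the fields the handler reads."""
    return {
        "source": "aws.rds",
        "detail-type": "RDS DB Instance Event",
        "region": REGION,
        "resources": [DB_ARN],
        "detail": {
            "SourceArn": DB_ARN,
            "SourceIdentifier": DB_ID,
            "Message": message,
        },
    }


def missing_fields(event):
    """Hypothetical helper: list the handler's required fields absent from event."""
    missing = []
    detail = event.get("detail", {})
    for key in ("SourceArn", "SourceIdentifier", "Message"):
        if key not in detail:
            missing.append("detail." + key)
    if "region" not in event:
        missing.append("region")
    return missing


if __name__ == "__main__":
    console_default = {"key1": "value1", "key2": "value2", "key3": "value3"}
    print(missing_fields(console_default))
    # ['detail.SourceArn', 'detail.SourceIdentifier', 'detail.Message', 'region']
    print(json.dumps(make_test_event(AUTO_START_MESSAGE), indent=2))
```

The printed JSON can be pasted into the console's test event editor. Be aware that sending the auto-start message and then the started message within `MaxStartupTime` minutes will tag and then really stop whatever instance `SourceArn` and `SourceIdentifier` name.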
Hello! I want to share github.com/sqlxpert/stay-stopped-aws-rds-aurora, an open-source utility that does not require pre-designating a database, creating a separate stack for each affected database, or tagging databases temporarily. It works for RDS and Aurora. You can deploy it as a CloudFormation stack in one region in one AWS account or as a StackSet across multiple regions and/or multiple AWS accounts.
I've linked directly to the "Design" section of the ReadMe. In brief, Stay-Stopped responds to the RDS-specific RDS-EVENT-0154 (`DB instance is being started due to it exceeding the maximum allowed time being stopped.`) and to the Aurora-specific RDS-EVENT-0153 (`DB cluster is being started due to it exceeding the maximum allowed time being stopped.`). Stay-Stopped overcomes the issue mentioned in the former TODO on line 24 of KeepDbStopped without a Step Function. A "Perspective" at the end of the ReadMe goes into considerable detail about avoiding a race condition bug, which is what was lurking there.
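Matching on event IDs instead of exact message text is one way to cover both auto-start events with a single rule. The sketch below builds an EventBridge event pattern as a Python dict; it assumes that RDS events delivered to EventBridge carry the ID in `detail.EventID` and use the detail types `RDS DB Instance Event` and `RDS DB Cluster Event`. It is an illustration, not taken from Stay-Stopped's source, and `matches` is a hypothetical, much-simplified local matcher (exact string values only).

```python
import json

# Auto-start event IDs named above: RDS instance and Aurora cluster.
AUTO_START_EVENT_IDS = ["RDS-EVENT-0154", "RDS-EVENT-0153"]


def auto_start_pattern():
    """Hypothetical EventBridge pattern matching either auto-start event by ID."""
    return {
        "source": ["aws.rds"],
        "detail-type": ["RDS DB Instance Event", "RDS DB Cluster Event"],
        "detail": {"EventID": AUTO_START_EVENT_IDS},
    }


def matches(pattern, event):
    """Simplified local check: pattern lists hold allowed string values and
    nested dicts recurse. Real EventBridge supports many more operators and
    also matches array-valued event fields, which this does not handle."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(auto_start_pattern(), indent=2))
```

Leaving out the template's `resources` filter is what would let one rule cover every database in a region rather than one designated ARN.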
According to "Starting an Amazon RDS DB instance that was previously stopped" in the Amazon Relational Database Service User Guide, "The startup process can take minutes to hours."
The Aurora proposal in the 2021-01-24 comment, above, would not have worked as written. At minimum, it would have been necessary to edit the `EventPattern` and the handler code to match these two Aurora-specific events: RDS-EVENT-0153 (`DB cluster is being started due to it exceeding the maximum allowed time being stopped.`) and RDS-EVENT-0151 (`DB cluster started.`). According to "Stopping and starting an Amazon Aurora DB cluster" in the User Guide for Aurora, "The startup process can take minutes to hours, but usually takes several minutes." "Usually" is cold comfort, and `stop_db_cluster` will fail until all database instances in the cluster have reached `available` status.
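Because `stop_db_cluster` fails until every member instance is `available`, an Aurora variant of the handler would need a readiness check before stopping. A minimal sketch using the boto3 RDS client's `describe_db_clusters` and `describe_db_instances` calls; `cluster_ready_to_stop` and `FakeRds` are hypothetical names, and the client is passed in so the logic can be tried with the stub instead of a real account.

```python
def cluster_ready_to_stop(rds, cluster_id):
    """Return True only if the cluster and every member instance are 'available'.

    rds is a boto3 RDS client, or any object offering the same two methods.
    """
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    if cluster["Status"] != "available":
        return False
    for member in cluster.get("DBClusterMembers", []):
        instance = rds.describe_db_instances(
            DBInstanceIdentifier=member["DBInstanceIdentifier"]
        )["DBInstances"][0]
        if instance["DBInstanceStatus"] != "available":
            return False
    return True


class FakeRds:
    """Hypothetical stand-in for boto3.client("rds") returning canned statuses."""

    def __init__(self, cluster_status, instance_statuses):
        self.cluster_status = cluster_status
        self.instance_statuses = instance_statuses  # {instance_id: status}

    def describe_db_clusters(self, DBClusterIdentifier):
        members = [{"DBInstanceIdentifier": name} for name in self.instance_statuses]
        return {"DBClusters": [{"DBClusterIdentifier": DBClusterIdentifier,
                                "Status": self.cluster_status,
                                "DBClusterMembers": members}]}

    def describe_db_instances(self, DBInstanceIdentifier):
        status = self.instance_statuses[DBInstanceIdentifier]
        return {"DBInstances": [{"DBInstanceIdentifier": DBInstanceIdentifier,
                                 "DBInstanceStatus": status}]}


if __name__ == "__main__":
    stub = FakeRds("available", {"writer": "available", "reader": "starting"})
    print(cluster_ready_to_stop(stub, "demo"))  # False: the reader is still starting
```

Returning a boolean leaves the waiting strategy to the caller; raising instead would hand the wait to Lambda's limited asynchronous retries.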
I learned more than I ever wanted to know about `start_db_instance` / `stop_db_instance` and `start_db_cluster` / `stop_db_cluster` from writing Stay-Stopped and updating my other utility, github.com/sqlxpert/lights-off-aws (which starts and stops EC2 compute instances as well as RDS and Aurora databases according to `cron` schedules in their tags). RDS and Aurora require different error-handling. I wrote up my findings in "5 AWS Services, 5 Different Approaches to Idempotence" on community.aws, in case the information is useful to you. Feedback on the utilities and on the article is welcome.
Before closing, I'd like to highlight two strengths of KeepDbStopped:
- Responding to a pair of events allows as much time as necessary between when a database enters `starting` status and when it reaches `available` status, the only status that allows successful submission of a request to stop it. A potential problem is that the database might never reach `available` status (no second event), or that it might reach `available` status but then enter a different status during the 3 total tries allowed when a Lambda function is invoked asynchronously from an EventBridge event bus. After the second event, which is in fact RDS-EVENT-0088 (`DB instance started.`), all 3 `stop_db_instance` calls can fail in rare cases such as long-running maintenance or storage optimization.
- Restricting `rds:AddTagsToResource` and `rds:RemoveTagsFromResource` to a designated database `Resource` and a designated `TagKey` avoids a security risk. It is difficult to prevent the modification of a Lambda function's source code and to prevent the passing of a Lambda function role to an arbitrary function. In the wrong hands, a role with a more permissive policy would allow arbitrary tagging of arbitrary RDS resources. For the benefit of others, I'll mention the Allow + `ForAllValues` `Null` gotcha, which shouldn't affect `add_tags_to_resource` or `remove_tags_from_resource` but can affect operations where specifying tags is optional.
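The `Null` gotcha arises because a `ForAllValues` condition also evaluates to true when the request contains no tag keys at all. A minimal sketch of a tighter policy, written as a Python dict to match the other examples: the tagging actions get an extra `Null` check requiring `aws:TagKeys` to be present, and `rds:ListTagsForResource`, which sends no tag keys, moves to its own statement so that check cannot deny it. The ARN is a placeholder and `stopper_policy` is a hypothetical helper; this illustrates the technique rather than reproducing the template's policy.

```python
import json

DB_ARN = "arn:aws:rds:us-east-1:123456789012:db:my-db"  # placeholder ARN
TAG_KEY = "DbStopperLastSeen"


def stopper_policy(db_arn, tag_key):
    """Hypothetical least-privilege policy document for the stopper role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # No tag keys in these requests, so no tag conditions here.
                "Effect": "Allow",
                "Action": ["rds:StopDBInstance", "rds:ListTagsForResource"],
                "Resource": db_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["rds:AddTagsToResource", "rds:RemoveTagsFromResource"],
                "Resource": db_arn,
                "Condition": {
                    # Every tag key in the request must be the designated one...
                    "ForAllValues:StringEquals": {"aws:TagKeys": [tag_key]},
                    # ...and the request must include tag keys at all.
                    "Null": {"aws:TagKeys": "false"},
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(stopper_policy(DB_ARN, TAG_KEY), indent=2))
```

Keeping the read-only and tag-writing actions in separate statements is the design choice that lets the `Null` check be strict without breaking the handler's tag lookup.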
Nice work, and cheers!
@kichik It doesn't work for me when I run a test event in Lambda. I get this error:
```
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) main started at epoch 1645106978694
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) init completed at epoch 1645106978694
got {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
'SourceArn' : KeyError
Traceback (most recent call last):
  File "/var/task/index.py", line 6, in handler
    db = event["SourceArn"]
KeyError: 'SourceArn'
```
This also happens with `["detail"]`, i.e. `db = event["detail"]["SourceArn"]`.
I have only run this as a test event configured in the Lambda console. I have not yet tested it through the EventBridge rule that listens for the message.
In the AWS CLI, if I run `aws rds describe-events` against my RDS cluster, I can see the following fields under `Events`: `SourceIdentifier`, `SourceType`, `SourceArn` and `Message`.