@pblittle
Created October 30, 2024 00:20

LND Infra on Google Cloud Platform

This document outlines the architecture and implementation details for deploying a Lightning Network Daemon (LND) infrastructure on Google Cloud Platform (GCP). Our setup is designed to provide a secure, scalable, and highly available environment for running LND nodes and associated services.

High-Level Overview

  1. Multi-Environment Setup: We maintain separate environments for development, non-production, and production, each with its own set of resources and security measures.

  2. Secure Networking: GCP Shared VPC segregates network traffic, with a base network and a restricted network in each environment.

  3. Containerized Deployments: LND and associated services run as containers in Google Kubernetes Engine (GKE) clusters for consistent, scalable deployments.

  4. Managed Services: We use GCP managed services where possible, including Cloud SQL for databases and Cloud Storage for backups.

  5. Observability: Logging and monitoring combine GCP's native tools with Datadog.

  6. CI/CD Automation: GitHub Actions and Terraform Cloud drive infrastructure as code and automated deployments.

  7. Security-First Approach: IAM, encryption, and security controls follow GCP's security foundations guide.

  8. Cost Optimization: Budget controls and cost optimization strategies keep cloud spending in check.

Architecture Goals

  • Maintain clear separation between environments
  • Centralize shared resources for efficient management
  • Implement granular access controls at each level
  • Facilitate consistent policy application across resources
  • Streamline automation and CI/CD processes across the organization

Folders and Projects

│ foo.xyz
│
├── fldr-bootstrap
│   └── <TBD>
│
├── fldr-common
│   ├── dns-hub
│   ├── logging
│   ├── billing
│   ├── secrets
│   ├── security-command-center
│   ├── container-registry
│   └── monitoring
│
├── fldr-production
│   ├── base-shared-vpc-host
│   ├── restricted-shared-vpc-host
│   ├── secrets
│   ├── monitoring
│   └── application-projects
│
├── fldr-nonproduction
│   ├── base-shared-vpc-host
│   ├── restricted-shared-vpc-host
│   ├── secrets
│   ├── monitoring
│   └── application-projects
│
└── fldr-development
    ├── base-shared-vpc-host
    ├── restricted-shared-vpc-host
    ├── secrets
    ├── monitoring
    └── application-projects
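
The top-level folders in the tree above could be created from a single resource with for_each. This is a minimal sketch, not the final layout: the local value `folders`, the resource label `top_level`, and the output name are illustrative assumptions.

```hcl
variable "org_id" {
  description = "GCP Organization ID"
  type        = string
}

locals {
  # Folder suffixes taken from the tree above; the local name is illustrative.
  folders = toset(["bootstrap", "common", "production", "nonproduction", "development"])
}

# One folder per suffix, each created directly under the organization.
resource "google_folder" "top_level" {
  for_each     = local.folders
  display_name = "fldr-${each.key}"
  parent       = "organizations/${var.org_id}"
}

# Map of suffix to folder resource name (folders/<numeric id>), for use as a
# parent when creating projects.
output "folder_names" {
  description = "Folder resource names keyed by folder suffix"
  value       = { for suffix, folder in google_folder.top_level : suffix => folder.name }
}
```

Keying the folders by suffix keeps the addresses stable (for example `google_folder.top_level["common"]`) if the set is reordered later.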

Terraform Notes

Project Directory Structure

├── Makefile
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tf
├── terraform_cloud.tfvars
└── modules/
    └── folder/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
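
The contents of modules/folder are not shown in this document. One possible shape for the module, assuming it only wraps a single google_folder resource, is sketched below; the variable names `name` and `parent` and the output `folder_name` are hypothetical.

```hcl
# modules/folder/variables.tf
variable "name" {
  description = "Folder purpose, used as the suffix in fldr-<name>"
  type        = string
}

variable "parent" {
  description = "Parent resource, e.g. organizations/<id> or folders/<id>"
  type        = string
}

# modules/folder/main.tf
resource "google_folder" "this" {
  display_name = "fldr-${var.name}"
  parent       = var.parent
}

# modules/folder/outputs.tf
output "folder_name" {
  description = "Resource name of the folder (folders/<numeric id>)"
  value       = google_folder.this.name
}

# Root main.tf: example call, assuming an org_id variable is declared in the
# root variables.tf.
module "common_folder" {
  source = "./modules/folder"
  name   = "common"
  parent = "organizations/${var.org_id}"
}
```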

terraform.tf

# The cloud block cannot reference input variables. Leave it empty and set
# TF_CLOUD_ORGANIZATION and TF_WORKSPACE in the environment instead.
terraform {
  cloud {}
}

Makefile

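TF_VAR_FILE := terraform.tfvars

# Phony targets
.PHONY: init

# Initialize Terraform. Terraform Cloud settings are read from the environment
# (TF_CLOUD_ORGANIZATION and TF_WORKSPACE); -backend-config applies only to
# state backends, not to the cloud block.
init:
	@echo "Initializing Terraform..."
	terraform init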

org_id is 1026320405589

environment_code is d, n, or p for development, non-production, and production respectively.

variable "org_id" {
  description = "GCP Organization ID"
  type        = string
}

variable "environment_code" {
  description = "Short code for the environment"
  type        = string
}

resource "google_folder" "common" {
  display_name = "fldr-common"
  parent       = "organizations/${var.org_id}"
}

resource "google_project" "billing" {
  name       = "prj-common-billing"
  project_id = "prj-common-billing-<UNIQ_STRING>"
  # google_folder.name has the form folders/<numeric id>
  folder_id  = google_folder.common.name
}
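
The d, n, or p constraint on environment_code described above could be enforced at plan time with a validation block. The following is a sketch of a variant of that variable declaration; the `environment_names` lookup is an illustrative addition, not part of the original configuration.

```hcl
variable "environment_code" {
  description = "Short code for the environment"
  type        = string

  # Reject anything other than the three documented codes.
  validation {
    condition     = contains(["d", "n", "p"], var.environment_code)
    error_message = "The environment_code value must be one of d, n, or p."
  }
}

locals {
  # Illustrative lookup from the short code to the folder suffix.
  environment_names = {
    d = "development"
    n = "nonproduction"
    p = "production"
  }
  environment_name = local.environment_names[var.environment_code]
}
```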

fldr-bootstrap

Centralize and manage resources required for automation across the organization.

Contains

  • Service accounts
  • IAM configurations
  • Other automation-related resources
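
As a rough illustration of what this folder might hold, the sketch below creates a Terraform automation service account and grants it organization-level roles. The bootstrap project is still marked TBD, so `bootstrap_project_id`, the account ID, and the choice of roles are all assumptions to be adjusted.

```hcl
variable "org_id" {
  description = "GCP Organization ID"
  type        = string
}

variable "bootstrap_project_id" {
  description = "Hypothetical: ID of a project under fldr-bootstrap"
  type        = string
}

# Service account used by the automation pipeline.
resource "google_service_account" "terraform" {
  project      = var.bootstrap_project_id
  account_id   = "sa-terraform-org"
  display_name = "Terraform automation (organization level)"
}

# Assumed roles: enough to create folders and projects. Narrow as needed.
resource "google_organization_iam_member" "terraform" {
  for_each = toset([
    "roles/resourcemanager.folderAdmin",
    "roles/resourcemanager.projectCreator",
  ])

  org_id = var.org_id
  role   = each.key
  member = "serviceAccount:${google_service_account.terraform.email}"
}
```

Using google_organization_iam_member rather than an authoritative binding leaves existing grants on those roles untouched.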

fldr-common

Provide centralized management for resources that are used by all environments.

Projects

  • dns-hub: Central DNS management for the organization.
  • logging: Centralized logging for all environments.
  • billing: Manages billing data and reports.
  • secrets: Centralized secrets management.
  • security-command-center: Security monitoring and management.
  • container-registry: Stores and manages container images.
  • monitoring: Organization-wide monitoring and alerting.
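
One way the centralized logging project could receive logs from every environment is an organization-level sink that writes to a log bucket in that project. This is only a sketch under assumptions: `logging_project_id`, the bucket ID, the sink name, and the retention period are hypothetical.

```hcl
variable "org_id" {
  description = "GCP Organization ID"
  type        = string
}

variable "logging_project_id" {
  description = "Hypothetical: ID of the logging project in fldr-common"
  type        = string
}

# Log bucket in the central logging project.
resource "google_logging_project_bucket_config" "central" {
  project        = var.logging_project_id
  location       = "global"
  bucket_id      = "org-central-logs"
  retention_days = 30
}

# Organization sink that also captures logs from all child folders and projects.
resource "google_logging_organization_sink" "central" {
  name             = "sk-org-central-logs"
  org_id           = var.org_id
  include_children = true
  destination      = "logging.googleapis.com/projects/${var.logging_project_id}/locations/global/buckets/${google_logging_project_bucket_config.central.bucket_id}"
}

# The sink writes with its own identity, which needs write access to the bucket.
resource "google_project_iam_member" "sink_writer" {
  project = var.logging_project_id
  role    = "roles/logging.bucketWriter"
  member  = google_logging_organization_sink.central.writer_identity
}
```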

fldr-production

Host live, customer-facing services with the strictest security controls.

Projects

  • base-shared-vpc-host: Hosts the base Shared VPC network.
  • restricted-shared-vpc-host: Hosts the restricted Shared VPC network.
  • secrets: Environment-specific secrets management.
  • monitoring: Environment-specific monitoring and alerting.
  • application-projects: Contains all production application-specific projects.

fldr-nonproduction

Provide an environment for staging, integration testing, and pre-production validation.

Projects

  • base-shared-vpc-host: Hosts the base Shared VPC network.
  • restricted-shared-vpc-host: Hosts the restricted Shared VPC network.
  • secrets: Environment-specific secrets management.
  • monitoring: Environment-specific monitoring and alerting.
  • application-projects: Contains all non-production application-specific projects.

fldr-development

Facilitate rapid development and experimentation while maintaining basic security controls.

Projects

  • base-shared-vpc-host: Hosts the base Shared VPC network.
  • restricted-shared-vpc-host: Hosts the restricted Shared VPC network.
  • secrets: Environment-specific secrets management.
  • monitoring: Environment-specific monitoring and alerting.
  • application-projects: Contains all development application-specific projects.
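
Each environment pairs Shared VPC host projects with application projects that attach to them. A minimal sketch of that relationship for the development base network follows; the two project ID variables are placeholders, and subnets, firewall rules, and the restricted network are omitted.

```hcl
variable "host_project_id" {
  description = "Hypothetical: ID of the base-shared-vpc-host project"
  type        = string
}

variable "service_project_id" {
  description = "Hypothetical: ID of an application project"
  type        = string
}

# Mark the host project as a Shared VPC host.
resource "google_compute_shared_vpc_host_project" "base" {
  project = var.host_project_id
}

# The shared network lives in the host project; subnets are defined explicitly.
resource "google_compute_network" "base" {
  project                 = var.host_project_id
  name                    = "vpc-d-shared-base"
  auto_create_subnetworks = false
}

# Attach an application project so its workloads can use the shared network.
resource "google_compute_shared_vpc_service_project" "app" {
  host_project    = google_compute_shared_vpc_host_project.base.project
  service_project = var.service_project_id
}
```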

Work-in-Progress

fldr-development
├── prj-dev-shared-base-network
│   └── vpc-d-shared-base
├── prj-dev-shared-restricted-network
│   └── vpc-d-shared-restricted
├── prj-dev-secrets
│   └── bkt-dev-lightning-secrets
└── fldr-dev-applications
    ├── prj-dev-lightning-core
    │   ├── gke-dev-lightning-core-cluster
    │   │   ├── gke-bitcoin-node (StatefulSet)
    │   │   └── gke-lnd-nodes (StatefulSet)
    │   ├── sql-dev-lnd-database (PostgreSQL)
    │   └── bkt-dev-lightning-backups
    ├── prj-dev-lightning-services
    │   └── gke-dev-lightning-services-cluster
    ├── prj-dev-lightning-api
    │   └── ep-dev-lightning-api (Cloud Endpoints)
    └── prj-dev-lightning-events
        └── ps-dev-lightning-events (Pub/Sub)

Architecture Notes

  1. LND Core:

    • Deployed as a StatefulSet in the gke-dev-lightning-core-cluster.
    • Uses an external PostgreSQL database (sql-dev-lnd-database) instead of the default bbolt.
  2. LND Database:

    • Utilizes Cloud SQL for PostgreSQL (sql-dev-lnd-database).
    • Ensures data persistence and allows for easier backups and scaling.
  3. Bitcoin Node:

    • Runs as a StatefulSet in the gke-dev-lightning-core-cluster.
    • The full blockchain data is stored on the node itself and is not backed up to Cloud Storage, since it can be re-synced from the Bitcoin network.
  4. Additional Services:

    • Custom Lightning Service and User Authentication are example services.
    • These are optional and depend on specific application needs.
  5. Databases:

    • All databases, including LND's, use PostgreSQL on Cloud SQL.
    • Separate instances for LND and custom services ensure isolation.
  6. API and Events:

    • Cloud Endpoints and Pub/Sub are included for API management and event-driven architecture.
    • These are optional components based on the broader application needs.
  7. Networking:

    • Uses both base and restricted shared VPC networks for different security requirements.
  8. Secrets Management:

    • Dedicated project and Cloud Storage bucket for managing secrets.
  9. Backup Strategy:

    • bkt-dev-lightning-backups: A Cloud Storage bucket for storing critical backup files.
    • Includes backups of:
      • Bitcoin Core wallet backups (if applicable)
      • LND static channel backups (SCB)
      • Macaroons and TLS certificates
      • Configuration files
    • Does not include full blockchain data, as it's not necessary to back up.
    • PostgreSQL databases use Cloud SQL's built-in backup functionality.
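
The database and backup notes above might translate into Terraform roughly as follows. This is a sketch under assumptions: `project_id` and `region` are placeholder variables, and the PostgreSQL version, machine tier, and bucket versioning are illustrative choices rather than settled decisions.

```hcl
variable "project_id" {
  description = "Hypothetical: ID of the prj-dev-lightning-core project"
  type        = string
}

variable "region" {
  description = "Region for the database and the backup bucket"
  type        = string
}

# Cloud SQL for PostgreSQL instance backing LND, with built-in backups enabled.
resource "google_sql_database_instance" "lnd" {
  project          = var.project_id
  name             = "sql-dev-lnd-database"
  region           = var.region
  database_version = "POSTGRES_15"

  settings {
    tier = "db-custom-2-7680"

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
    }
  }
}

# Bucket for static channel backups, macaroons, TLS certificates, and config
# files. Bucket names are globally unique, so the real name may need a suffix.
resource "google_storage_bucket" "backups" {
  project                     = var.project_id
  name                        = "bkt-dev-lightning-backups"
  location                    = var.region
  uniform_bucket_level_access = true

  # Keep earlier object versions so an overwritten backup can be recovered.
  versioning {
    enabled = true
  }
}
```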

Naming Conventions

We follow GCP's recommended naming conventions:

  • Folders: fldr-{purpose}
  • Projects: prj-{environment}-{purpose}
  • VPC networks: vpc-{environment}-{purpose}
  • GKE clusters: gke-{environment}-{purpose}
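
These patterns could be captured once in locals so that resource names are derived rather than typed by hand. A small sketch, where the variable names `environment` and `purpose` are assumptions:

```hcl
variable "environment" {
  description = "Environment segment of the name, e.g. dev"
  type        = string
}

variable "purpose" {
  description = "Purpose segment of the name, e.g. lightning-core"
  type        = string
}

locals {
  # Folders carry no environment segment: fldr-{purpose}
  folder_name  = "fldr-${var.purpose}"
  project_name = "prj-${var.environment}-${var.purpose}"
  vpc_name     = "vpc-${var.environment}-${var.purpose}"
  gke_name     = "gke-${var.environment}-${var.purpose}"
}

output "names" {
  description = "Derived resource names for this environment and purpose"
  value = {
    folder  = local.folder_name
    project = local.project_name
    vpc     = local.vpc_name
    gke     = local.gke_name
  }
}
```

With environment set to dev and purpose set to lightning-core, project_name evaluates to prj-dev-lightning-core.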

Development Environment

Projects

  • fldr-development
    • prj-dev-shared-base-network
    • prj-dev-shared-restricted-network
    • prj-dev-secrets
    • fldr-dev-applications
      • prj-dev-lightning-core
        • gke-dev-lightning-core-cluster
          • gke-bitcoin-node (StatefulSet)
          • gke-lnd-nodes (StatefulSet)
        • sql-dev-lnd-database (PostgreSQL)
        • bkt-dev-lightning-backups
      • prj-dev-lightning-services
        • gke-dev-lightning-services-cluster
      • prj-dev-lightning-api
        • ep-dev-lightning-api (Cloud Endpoints)
      • prj-dev-lightning-events
        • ps-dev-lightning-events (Pub/Sub)