@topherbullock
Last active March 28, 2019 18:52
The Road to K8s

How Can Concourse Best Leverage K8s as a Runtime?

The Kubernetes CI/CD landscape has gained a lot of new options since the original Kubernetes Runtime RFC (#2), so it makes sense for the Concourse team + community to have a discussion around how we can best leverage K8s for Concourse workloads.

🚂

Commonalities

Regardless of how we choose to move forward with K8s, there are some common concerns we'll need to address to make the K8s runtime a reality (whether we use Tekton or roll our own solution built on K8s primitives).

Unit of Execution

tldr: Does the proposed K8s runtime schedule individual steps, builds (with many steps), or a whole pipeline at once?

What is the smallest unit of execution we can / want to use in the K8s runtime, to ensure we're optimally leveraging the features of Kubernetes while addressing the technical challenges around volume streaming between steps of a build?
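To make the question concrete, here's a hypothetical sketch of the three granularities the runtime could schedule at; none of these interfaces exist in Concourse today, and all names are illustrative.

```go
// Hypothetical sketch: three candidate units of execution for a K8s
// runtime. None of these types exist in Concourse today.
package sketch

import "context"

// Step, Build, and Pipeline stand in for the corresponding Concourse
// concepts.
type (
	Step     struct{}
	Build    struct{}
	Pipeline struct{}
)

// StepRunner schedules each step (get/task/put) as its own K8s object,
// e.g. one Pod per step. Volume streaming between steps then has to
// cross Pod (and possibly node) boundaries.
type StepRunner interface {
	RunStep(ctx context.Context, step Step) error
}

// BuildRunner schedules a whole build at once, e.g. one Pod whose
// containers are the build's steps, so steps can share volumes locally.
type BuildRunner interface {
	RunBuild(ctx context.Context, build Build) error
}

// PipelineRunner hands an entire pipeline to the runtime (the Tekton
// Pipeline / PipelineRun model) and lets it schedule the pieces.
type PipelineRunner interface {
	RunPipeline(ctx context.Context, pipeline Pipeline) error
}
```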

Image Support

tldr: How does the proposed runtime support existing task image behaviour in Concourse?

There are currently several ways in which users fetch or specify images to define the RootFS for a task step or custom resource. Users can provide an image_resource, which uses a resource get operation to fetch the image RootFS into a volume, but they can also use an output from another step in the pipeline as the image: for a task.
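As a rough sketch of what any K8s runtime has to handle, with illustrative type names rather than Concourse's actual API:

```go
// Hypothetical sketch of the two image sources a K8s runtime needs to
// support for a task step.
package sketch

// ImageSpec captures where a step's RootFS comes from; exactly one
// field is set.
type ImageSpec struct {
	// ImageResource is fetched via a resource `get`, producing a
	// volume holding the RootFS (the `image_resource:` case). This maps
	// awkwardly onto a K8s container's `image:` field, which only
	// accepts a registry reference.
	ImageResource *ImageResourceSpec

	// ArtifactName names an output of a previous step to use as the
	// image (the `image:` case), so the RootFS is an opaque volume the
	// runtime must somehow run a container from.
	ArtifactName string
}

// ImageResourceSpec mirrors the `image_resource:` config of a task.
type ImageResourceSpec struct {
	Type   string                 // e.g. "registry-image"
	Source map[string]interface{} // resource source configuration
}
```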

Volume Caching

tldr: Where does the existing behaviour of using Baggageclaim to cache volumes (resource caches, task caches, etc.) fit in, given we cannot control Volumes' lifecycle outside of a pod / deployment?
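For concreteness, here's a simplified sketch of the kinds of lookups Baggageclaim-backed caching serves today; on K8s the returned volumes would need a lifecycle that outlives any single Pod. All names are illustrative.

```go
// Hypothetical sketch of the cache lookups a K8s runtime would need to
// answer in order to preserve today's caching behaviour.
package sketch

import "context"

// Volume is an opaque handle to a cached volume.
type Volume struct {
	Handle string
}

// ResourceCacheKey identifies a resource cache: the same type, version,
// and params should hit the same cached volume across builds.
type ResourceCacheKey struct {
	Type    string
	Version map[string]string
	Params  map[string]interface{}
}

type VolumeCache interface {
	// FindResourceCache returns a cached volume from a previous `get`
	// of the same resource version, if one still exists.
	FindResourceCache(ctx context.Context, key ResourceCacheKey) (Volume, bool, error)

	// FindTaskCache returns a task's cached path (a directory declared
	// under the task's `caches:`) for reuse across builds.
	FindTaskCache(ctx context.Context, taskID int, path string) (Volume, bool, error)
}
```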

Tekton

https://github.com/tektoncd

Unit of Execution

Image Support

tektoncd/pipeline#639

Volume Caching

https://github.com/tektoncd/pipeline/blob/master/docs/developers/README.md#how-are-resources-shared-between-tasks

PipelineRun supports using a Persistent Volume Claim or a GCS bucket to share artifacts between tasks.
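A minimal sketch of the PVC option, assuming the controller provisions something roughly like this per PipelineRun and mounts it into each task's pods; the name and size here are illustrative, not Tekton's actual defaults.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// artifactPVC builds a claim a PipelineRun could use to pass artifacts
// between tasks. ReadWriteOnce keeps it simple, but pins all tasks that
// share artifacts to the same node unless the storage class supports
// multi-node access.
func artifactPVC(pipelineRunName string) *corev1.PersistentVolumeClaim {
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name: pipelineRunName + "-artifacts", // illustrative naming
		},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{
				corev1.ReadWriteOnce,
			},
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("5Gi"),
				},
			},
		},
	}
}
```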

Rolling our Own

Using K8s primitives, or some mix of K8s primitives and custom CRDs

Unit of Execution

One big ol' Pod for a running build's steps? Jobs for individual build steps?
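As a sketch of the "one big Pod" option: each step becomes an init container (init containers run one at a time, in order), sharing an emptyDir for artifact hand-off. The images and names below are illustrative only.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildPod sketches a linear get -> task -> put plan as a single Pod.
func buildPod(buildID string) *corev1.Pod {
	artifacts := corev1.VolumeMount{Name: "artifacts", MountPath: "/tmp/build"}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "build-" + buildID},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			// Init containers run sequentially, matching step order.
			InitContainers: []corev1.Container{
				{Name: "get-repo", Image: "concourse/git-resource", VolumeMounts: []corev1.VolumeMount{artifacts}},
				{Name: "task-unit", Image: "golang:1.12", VolumeMounts: []corev1.VolumeMount{artifacts}},
			},
			Containers: []corev1.Container{
				{Name: "put-repo", Image: "concourse/git-resource", VolumeMounts: []corev1.VolumeMount{artifacts}},
			},
			Volumes: []corev1.Volume{{
				Name: "artifacts",
				VolumeSource: corev1.VolumeSource{
					EmptyDir: &corev1.EmptyDirVolumeSource{},
				},
			}},
		},
	}
}
```

The obvious costs: the whole build's resources get scheduled onto one node up front, a failed middle step can't easily be retried on its own, and non-linear plans (in_parallel / aggregate) don't map onto init container ordering.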

Image Support

Volume Caching

A separate Baggageclaim component deployed to support this? One Baggageclaim per node using anti-affinity? How do we mount volumes into the containers of a build step?
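One hypothetical shape for the per-node option (a DaemonSet gives per-node placement without juggling anti-affinity rules) is Baggageclaim over a hostPath directory, so step containers on the same node could mount cached volumes locally. Purely illustrative, not an existing component:

```go
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// baggageclaimDaemonSet sketches one baggageclaim per node, keeping its
// volumes on the node's filesystem via hostPath.
func baggageclaimDaemonSet() *appsv1.DaemonSet {
	labels := map[string]string{"app": "baggageclaim"}
	hostPathType := corev1.HostPathDirectoryOrCreate
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "baggageclaim"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "baggageclaim",
						Image: "concourse/concourse", // illustrative image
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "volumes",
							MountPath: "/worker-state/volumes",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "volumes",
						VolumeSource: corev1.VolumeSource{
							HostPath: &corev1.HostPathVolumeSource{
								Path: "/var/lib/baggageclaim",
								Type: &hostPathType,
							},
						},
					}},
				},
			},
		},
	}
}
```

Build step pods would then mount the same hostPath (or talk to baggageclaim's API over the node network) to reuse cached volumes, which ties cache hits to node placement.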

https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/

The Runtime Interface

What is the Line Between Core and Runtime?

...

Build Step Execution

The exec package is the cleanest place to discuss the process by which a scheduled build's plan is converted into executable steps and handed off to the "runtime". Executing those steps involves the creation of containers and volumes by the "runtime" (currently most of this lives in the worker package, but there's a higher-order abstraction for resources in the resource package, and the db package is also involved at various levels).
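As a simplified sketch of that seam (the signatures here are approximate, not Concourse's actual code):

```go
// Sketch of the exec-side interface: each plan node becomes a step, and
// all runtime-facing work (containers, volumes) happens inside Run.
package sketch

import "context"

// Step is one executable node of a build plan.
type Step interface {
	// Run asks the runtime (today, the worker package) for whatever
	// containers and volumes the step needs, then executes it.
	Run(ctx context.Context) error

	// Succeeded reports whether the step finished successfully, which
	// drives meta-steps like on_success / on_failure / try.
	Succeeded() bool
}
```

The runtime question is what sits behind Run: Garden containers and Baggageclaim volumes today, or Pods / Tekton objects on K8s.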

There are three build step types whose concrete execution involves Containers and Volumes, while the other available step types are meta-steps:

get

exec/get_step.go

task

exec/task_step.go

put

exec/put_step.go

Resource Checking

But what about resource checking? radar also creates containers!

Runtime's State

How much state can / should the runtime store in the DB?

Are the Current "Runtime" DB Objects Too Garden-Specific?

Efficiency Gains for Different Runtime Engines
