@oblazek
Last active March 23, 2021 20:59
Cilium in OpenStack

Architecture

       .------------------------------------.                     .------------------------------------.
       |         openstack cluster          |                     |           k8s cluster              |
       |           tt-ost2-ko               |                     |           tt-k8st2-ko              |
       |             worker                 |                     |             worker                 |
       |         tt-lab3.ko.iszn.cz         |                     |         tt-lab9.ko.iszn.cz         |
       | .-------------.   .-------------.  |                     | .---------------..----------------.|
       | |     vm1     |   |     vm2     |  |                     | |     pod1      ||     pod2       ||
       | | 10.247.2.19 |   | 10.247.2.20 |  |                     | | 10.247.144.77 || 10.247.144.152 ||
       | '------|------'   '------|------'  |                     | '------|--------''-------|--------'|
       |.-------v--------..-------v--------.|                     |.-------v--------..-------v--------.|
       || tape4f9e219-e1 || tap853025d2-98 ||                     || lxc5b048b77e28 || lxca302b9e65ee ||
       |'-------|--------''-------|--------'|                     |'-------|--------''-------|--------'|
       |        |                 |         |                     |        |                 |         |
       |        |                 |         |                     |        |                 |         |
       |        |                 |         |                     |        |                 |         |
       |        |                 |         |                     |        |                 |         |
       |        |                 |     .-----.                .-----.     |                 |         |
       |        '-----------------'---->| NIC |-----overlay----| NIC <-----------------------'         |
       |"powered" by cilium             '-----'                '-----'               powered by cilium |
       '------------------------------------'                     '------------------------------------'

Cilium status

root@tt-lab3:~# cilium status --verbose | more
KVStore:                Ok   etcd: 1/1 connected, lease-ID=34f27840cc3acfa9, lock lease-ID=34f27840cc3acfab, has-quorum=true: https://10.248.14.20:4379 - 3.3.12 (Leader)
Kubernetes:             Ok   1.19 (v1.19.7+k3s1) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumNetworkPolicy", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Partial
Cilium:                 Ok        OK
NodeMonitor:            Disabled
Cilium health daemon:   Ok
IPAM:                   IPv4: 2/255 allocated from 10.247.2.0/24,
Allocated addresses:
  10.247.2.183 (health)
  10.247.2.95 (router)
ClusterMesh:   1/1 clusters ready, 0 global-services
   tt-k8st2-ko.conf: ready, 0 nodes, 11 identities, 0 services, 0 failures (last: never)
   └  etcd: 3/3 connected, lease-ID=158c77439c7b75a0, lock lease-ID=2ed1771796079eaf, has-quorum=true: https://tt-lab11.ko.iszn.cz:2479 - 3.4.3 (Leader); https://tt-lab9.ko.iszn.cz:2479 - 3.4.3; https://tt-lab10.ko.iszn.cz:2479 - 3.4.3
BandwidthManager:    Disabled
Masquerading:        Disabled
Controller Status:   62/62 healthy
Proxy Status:   No managed proxy redirect
Hubble:         Disabled
KubeProxyReplacement Details:
  Status:             Partial
  Protocols:          TCP, UDP
  Session Affinity:   Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Disabled
  - LoadBalancer:   Disabled
  - externalIPs:    Disabled
  - HostPort:       Disabled
BPF Maps:   dynamic sizing: off
  Name                          Size
  Non-TCP connection tracking   262144
  TCP connection tracking       524288
  Endpoint policy               65535
  Events                        40
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           524288
  Neighbor table                524288
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        40
  Sockmap                       65535
  Sock reverse NAT              262144
  Tunnel                        65536
Cluster health:                               4/4 reachable   (2021-03-23T14:13:11Z)
  Name                                        IP              Node        Endpoints
  tt-ost2-ko/tt-lab3.ko.iszn.cz (localhost)   10.248.14.26    reachable   reachable
  tt-ost2-ko/tt-lab4.ko.iszn.cz               10.248.14.24    reachable   reachable
  tt-ost2-ko/tt-lab5.ko.iszn.cz               10.248.14.22    reachable   reachable
  tt-ost2-ko/tt-lab6.ko.iszn.cz               10.248.14.20    reachable   reachable

Cilium-agent arguments in the OpenStack cluster

ExecStart=/opt/cilium/sbin/cilium-agent \
        --enable-l7-proxy=false \
        --disable-envoy-version-check=true \
        --enable-remote-node-identity \
        --k8s-kubeconfig-path /opt/cilium/conf/kubeconfig.yaml \
        --enable-ipv6=false \
        --prometheus-serve-addr=":9099" \
        --enable-host-reachable-services=true \
        --enable-endpoint-routes=true \
        --enable-local-node-route=false \
        --masquerade=false \
        --kvstore etcd \
        --kvstore-opt etcd.config=/opt/cilium/conf/etcd.config \
        --clustermesh-config /opt/cilium/conf/clusters \
        --cluster-id {{ cluster_id }} \
        --cluster-name {{ cluster_name }}

There are currently 4 nodes in the OpenStack cluster, each with an InternalIP from the range 10.248.14.0/24. OpenStack networking (including IPAM) is for now provided by Calico, so this works much like chaining mode. BIRD is used for BGP announcements, since each VM has a routable IP address.

The problem with Calico in this setup is that it does not use a per-worker CIDR, which is what Cilium normally works with. It allocates IPs from a single network (the selected subnet) to every VM spawned anywhere in the cluster. This becomes a problem when cilium-agent tries to insert the node CIDR of each remote node into the routing table.

func (n *linuxNodeHandler) updateNodeRoute(prefix *cidr.CIDR, addressFamilyEnabled bool, isLocalNode bool) error {
	if prefix == nil || !addressFamilyEnabled {
		return nil
	}

	nodeRoute, err := n.createNodeRouteSpec(prefix, isLocalNode)
	if err != nil {
		return err
	}
	if _, err := route.Upsert(nodeRoute); err != nil {
		log.WithError(err).WithFields(nodeRoute.LogFields()).Warning("Unable to update route")
		return err
	}

	return nil
}

This results in a route such as 10.247.2.0/24 via 10.247.2.95 dev cilium_host src 10.247.2.95 mtu 1450 in the routing table on each node.

# ip route
root@tt-lab3:~# ip r
default via 10.248.14.1 dev eth0 onlink
10.247.2.0/24 via 10.247.2.95 dev cilium_host src 10.247.2.95 mtu 1450
10.247.2.19 dev tape4f9e219-e1 scope link
10.247.2.20 dev tap853025d2-98 scope link
10.247.2.95 dev cilium_host scope link
10.247.2.183 dev lxc_health scope link
10.248.14.0/24 dev eth0 proto kernel scope link src 10.248.14.26

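Each node also contains a direct route (above) for each local instance, such as 10.247.2.19 dev tape4f9e219-e1 scope link, which is inserted into the routing table by calico-felix (a daemon running on every node).

All this becomes a problem when a remote instance (vm1) tries to talk to the local node (tt-lab5.ko.iszn.cz). tt-lab5 has no per-instance route for 10.247.2.19, so anything it sends back to vm1 matches the 10.247.2.0/24 route via cilium_host rather than the default route. The setup is shown in this picture: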

     .---------------------------------------------------------------------------------------------------.
     |                                        openstack cluster                                          |
     |                                          tt-ost2-ko                                               |
     |                                                                                                   |
     | .---------------------------------.                    .---------------------------------.        |
     | .-------------.   .-------------. |                    .-------------.   .-------------. |        |
     | |     vm1     |   |     vm2     | |                    |     vmX     |   |     vmY     | |        |
     | | 10.247.2.19 |   | 10.247.2.20 | |                    |             |   |             | |        |
     | '------|------'   '------|------' |                    '------|------'   '------|------' |        |
     |.-------v--------..-------v--------.                   .-------v--------..-------v--------.        |
     || tape4f9e219-e1 || tap853025d2-98 |                   | tape4f9e219-e1 || tap853025d2-98 |        |
     |'-------|--------''----------------'                   '----------------''----------------'        |
     | |      |                          |                    |                                 |        |
     | |      |                          |                    |          .---------------------.|        |
     | |      |                          |                    |          | http server         ||        |
     | |      |                          |                    |      .-->| running on the host ||        |
     | |      |                       .-----.              .-----.   |   '---------------------'|        |
     | |      '---------------------->| NIC |------------->| NIC |----                          |        |
     | |node "tt-lab3.ko.iszn.cz"     '-----'              '-----'     node "tt-lab5.ko.iszn.cz"|        |
     | '---------------------------------'                    '---------------------------------'        |
     |                                                                                                   |
     | root@tt-lab3:~# ip r                                root@tt-lab5:~# ip r                          |
     | default via 10.248.14.1 dev eth0 onlink             default via 10.248.14.1 dev eth0 onlink       |
     | 10.247.2.0/24 via 10.247.2.95 dev cilium_host       10.247.1.27 dev tap535fa509-a6 scope link     |
     |  src 10.247.2.95 mtu 1450                           10.247.2.0/24 via 10.247.2.135 dev cilium_host|
     | 10.247.2.19 dev tape4f9e219-e1 scope link            src 10.247.2.135 mtu 1450                    |
     | 10.247.2.20 dev tap853025d2-98 scope link           10.247.2.11 dev tap4355f2bb-33 scope link     |
     | 10.247.2.95 dev cilium_host scope link              10.247.2.12 dev tap8cd570f1-67 scope link     |
     | 10.247.2.183 dev lxc_health scope link              10.247.2.29 dev tapbee1339f-84 scope link     |
     | 10.248.14.0/24 dev eth0 proto kernel scope          10.247.2.135 dev cilium_host scope link       |
     |  link src 10.248.14.26                              10.247.2.218 dev lxc_health scope link        |
     |                                                     10.248.14.0/24 dev eth0 proto kernel scope    |
     |                                                      link src 10.248.14.22                        |
     '---------------------------------------------------------------------------------------------------'

Anyway, if the updateNodeRoute function is modified so that it no longer adds the CIDR route 10.247.2.0/24 via 10.247.2.95, everything works as expected.

With this I would like to open a discussion on what the right way to make this work would be.
