K8s and MetalLB: A LoadBalancer for On-Prem Deployments


So, you just spun up a brand new K8s cluster on vSphere, OpenStack, Bare Metal, or a Raspberry Pi cluster and started running your first workloads. Everything is going great so far. That’s awesome! But then, you try to deploy a helm chart for that fancy new app you’ve been wanting to test out. The pods come up OK, but you can’t seem to access the app from the link the chart spit out. Yikes!

So you think to yourself, “What the heck? I can get to it fine from inside the cluster and there aren’t any issues with the pods or in the logs…”

You put on your debugging glasses and take a look at the service responsible for handling the traffic to the pods. Awful! That service is stuck in the pending state, waiting for an External-IP.

$ kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
test         LoadBalancer   <pending>     80:31168/TCP   3s

Describing the service doesn’t offer much help either. It doesn’t show anything out of the ordinary other than the fact that there are no events associated with it:

kubectl describe svc test
Name:                     test
Namespace:                default
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
Selector:                 app=nginx
Type:                     LoadBalancer
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32325/TCP
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

So, what’s going on here? What’s wrong with my service?

Well, the issue here is that our cluster is running on-prem and doesn’t have a cloud provider configured with LoadBalancer support like you would have with AWS, GCP, Azure, etc. Therefore, when you create a service of type LoadBalancer, it will sit in the pending state until the cluster is configured to provide those capabilities.
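If you want to see every service that is affected, a small filter over the kubectl output does the trick. This is only a sketch: the function name pending_lb_services is made up for this post, and it assumes the default column order of kubectl get svc (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE).

```shell
# List LoadBalancer services that are still waiting for an external IP.
# Reads `kubectl get svc --no-headers` output on stdin and assumes the
# default column order: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
pending_lb_services() {
  awk '$2 == "LoadBalancer" && $4 == "<pending>" { print $1 }'
}

# Usage (against a live cluster):
#   kubectl get svc --no-headers | pending_lb_services
```

Note that with --all-namespaces kubectl adds a leading NAMESPACE column, so the field numbers in the awk expression would need to shift by one.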

So, what are our options here? With vSphere you have the NSX Container Plug-in (NCP), with OpenStack you have the Neutron LBaaS plugin, and for Bare Metal and Pi clusters…well, you’re kind of on your own. So, why not just use the NCP or Neutron plugins? The answer usually comes down to complexity and cost. Spinning up a full SDN adds quite a bit of overhead to manage. However, if you are already running NSX or Neutron, they may be worth considering first.

So, what option do we have that will satisfy the needs of all of our environments? This is where MetalLB comes in. MetalLB is a simple solution for K8s network load balancing that uses standard routing protocols and aims to “Just Work.”

Setting up MetalLB

A basic deployment of MetalLB requires the following prerequisite components to function properly:

  • A Kubernetes cluster (v1.13.0+) that does not already have network load-balancing
  • IPv4 address ranges for MetalLB to provision service instances
  • A CNI that is compatible with MetalLB. Most are supported, but see the MetalLB documentation for caveats.

IPVS Configuration Requirements

If you happen to be running kube-proxy in ipvs mode you’ll have to make a quick change to allow ARP. If you are using the default iptables mode then you can safely skip this step.

$ kubectl edit configmap -n kube-system kube-proxy

Set the following keys in the kube-proxy configmap:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
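If you would rather not edit the configmap by hand, the same change can be scripted. The sketch below assumes your kube-proxy configmap already contains a strictARP: false line (if the key is absent, nothing is changed). The helper name enable_strict_arp is made up for this post.

```shell
# Flip strictARP from false to true in a kube-proxy ConfigMap manifest.
# Reads YAML on stdin and writes the edited YAML to stdout.
# All other lines pass through untouched.
enable_strict_arp() {
  sed -e 's/strictARP: false/strictARP: true/'
}

# Usage (against a live cluster):
#   kubectl get configmap kube-proxy -n kube-system -o yaml \
#     | enable_strict_arp \
#     | kubectl apply -f - -n kube-system
```

Piping through kubectl diff -f - instead of kubectl apply first is a cheap way to preview what would change.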

Deploying MetalLB Resources

The following commands will create a namespace metallb-system, the speaker daemonset, controller deployment, and service accounts:

$ kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml

The first time you deploy MetalLB, you’ll also have to generate the secretkey used for speaker communication. For upgrades, the existing secret can remain unchanged:

$ kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
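Since the secret should only be created once, it can be handy to wrap the command in a check so the same install script is safe to re-run. A minimal sketch, with a made-up function name (ensure_memberlist_secret):

```shell
# Create the memberlist secret only when it does not exist yet, so that
# re-running the install (for example during an upgrade) keeps the old key.
ensure_memberlist_secret() {
  if kubectl get secret -n metallb-system memberlist >/dev/null 2>&1; then
    echo "memberlist secret already present, leaving it unchanged"
  else
    kubectl create secret generic -n metallb-system memberlist \
      --from-literal=secretkey="$(openssl rand -base64 128)"
  fi
}

# Usage:
#   ensure_memberlist_secret
```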

At this point, MetalLB should have spun up a controller pod and one speaker pod per node, all of which should reach the Running state:

$ kubectl get po -n metallb-system
NAME                          READY   STATUS    RESTARTS   AGE
controller-5c9894b5cd-8mww9   1/1     Running   0          4d1h
speaker-4bjhf                 1/1     Running   0          4d1h
speaker-jnfpk                 1/1     Running   0          4d1h
speaker-sgwht                 1/1     Running   0          4d1h
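Rather than eyeballing the pod list, you can let kubectl block until the rollout finishes. This sketch assumes the resource names match what the manifests above create (a deployment called controller and a daemonset called speaker). The 120 second timeout and the function name wait_for_metallb are arbitrary choices for illustration.

```shell
# Wait for the MetalLB controller deployment and speaker daemonset to finish
# rolling out. Returns non-zero if either one fails or times out.
wait_for_metallb() {
  kubectl rollout status deployment/controller -n metallb-system --timeout=120s &&
    kubectl rollout status daemonset/speaker -n metallb-system --timeout=120s
}

# Usage:
#   wait_for_metallb && echo "MetalLB is up"
```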

Though the components are all up and running now, they won’t actually do anything until you provide them with configuration. MetalLB can be configured to run in either Layer2 or BGP mode. For this deployment we will be using Layer2. More information and BGP configuration examples can be found in the MetalLB documentation.

Layer2 Configuration

For Layer2 configuration, all you need is a set of IPv4 addresses allocated for MetalLB to hand out to requesting services. This configuration is set in the ConfigMap shown below. After writing it to a file, apply it via kubectl apply -f config.yml:

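apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # Replace with a free IPv4 range from your own network.
      - <start-ip>-<end-ip>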
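When picking a range, it helps to know how many addresses it actually contains, since each LoadBalancer service normally consumes one of them. Here is a rough sketch in plain shell arithmetic. The function names are made up, the range in the usage line is only an example, and it handles the start-end form only (no CIDR notation, no input validation).

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

# Count the addresses in a "start-end" IPv4 range, both ends inclusive.
pool_size() {
  echo $(( $(ip_to_int "${1#*-}") - $(ip_to_int "${1%-*}") + 1 ))
}

# Usage (example range only):
#   pool_size 192.168.1.240-192.168.1.250
```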

After applying the configmap, our changes should take effect and our service should now have an External-IP ready for use! Note that the External IP was pulled from the IP range in the addresses array of the ConfigMap.

$ kubectl get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
kubernetes   ClusterIP      <none>          443/TCP        5d1h
test         LoadBalancer   80:32325/TCP   22h
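To confirm the service really is reachable from outside the cluster, you can pull the assigned address straight out of the service status and hit it with curl. A small sketch, assuming the service is named test as in the example above. The helper name service_external_ip is made up.

```shell
# Print the external IP assigned to a LoadBalancer service, or fail with a
# message on stderr if none has been assigned yet.
# $1: service name, $2: namespace (defaults to "default").
service_external_ip() {
  ip="$(kubectl get svc "$1" -n "${2:-default}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  if [ -z "$ip" ]; then
    echo "service $1 has no external IP yet" >&2
    return 1
  fi
  echo "$ip"
}

# Usage:
#   curl "http://$(service_external_ip test)"
```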

A few Caveats and Limitations

Layer 2 mode has two primary limitations that the MetalLB documentation calls out:

  • Potentially slow failover
  • Single node bottlenecking

In Layer2 mode, a single leader-elected node receives all traffic for a service IP. This means that your service’s ingress bandwidth is limited to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to steer traffic.
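If you are curious which node is currently taking the traffic, the speaker should record that as an event on the service. The sketch below assumes the event message reads announcing from node "<name>"; that wording may differ in your MetalLB version, so treat the pattern as an assumption. The function name announcing_node is made up.

```shell
# Extract the node name from MetalLB's "announcing from node" service events.
# Reads `kubectl describe svc <name>` output on stdin and prints the most
# recent match. Prints nothing if no such event is present.
announcing_node() {
  sed -n 's/.*announcing from node "\([^"]*\)".*/\1/p' | tail -n 1
}

# Usage:
#   kubectl describe svc test | announcing_node
```

Keep in mind that events expire after a while, so an empty result on a long-running service does not mean nobody is announcing it.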

Also, due to the leader lease time of 10 seconds, it can take time for failover to occur and properly direct traffic in the event of a failure.

For many situations this is not a dealbreaker. For more critical systems, however, you may want to leverage BGP mode. For more information regarding the tradeoffs, see the BGP Limitations in the documentation.
