Creating a Highly Available Secured Kubernetes Cluster on AWS with Kops
Today I will be talking about Kops, which is an official tool for creating Kubernetes clusters on AWS, with support for GCE and VMware vSphere in alpha. It takes a whole lot of the pain out of setting up a Kubernetes cluster yourself, but still presents many challenges to overcome and a great degree of freedom in how you configure the cluster.
I have recently created a Kubernetes cluster on AWS for a client, where I used the Kops tool for the very first time and I will here present what I learnt about implementing best practices with this technology stack. Given the rapid development of Kubernetes itself, and how relatively young Kops is, it proved to be far from a walk in the park to create a production-grade cluster. Documentation is often relatively poor, or just plain missing. The intention of this blog post is to make it easier for others going down the same route.
What we set out to do on this project was to produce a highly available Kubernetes cluster (on AWS) that you can only SSH into via a dedicated host (a so-called bastion), and that secures the control plane/API with client certificate based authentication and RBAC authorization.
Creating the Cluster Itself
The very first step, which is thankfully quite simple thanks to the awesome power of Kops, is to bring the cluster up. The kops invocation below creates a highly available cluster on AWS, with five worker nodes spread across three availability zones and three master nodes in the same AZs. For security, all master/worker nodes are placed in a private subnet and not exposed to the Internet. We also instantiate a bastion host as the sole SSH entry point into the cluster, and the cluster is configured to use RBAC as its authorization mode. I chose Flannel as the networking system, as I have some experience with it from prior work with Kubernetes and have a good impression of it.
kops --state s3://example.com create cluster \
  --zones eu-central-1a,eu-central-1b,eu-central-1c \
  --master-zones eu-central-1a,eu-central-1b,eu-central-1c \
  --topology private --networking flannel \
  --master-size m4.large --node-size m4.large --node-count 5 \
  --bastion --cloud aws --ssh-public-key id_rsa.pub --authorization RBAC --yes \
  example.com
On the topic of high availability: since the cluster has three master nodes in three different availability zones, it is protected against the outage of an individual AZ and remains available if a master goes down. The same goes for the worker nodes.
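Bringing everything up takes a few minutes. Once the instances are running, you can ask Kops to check that all masters and nodes have registered and are healthy; a quick sketch, assuming the same state bucket and cluster name as above, and that kops wrote the cluster's kubeconfig to your default ~/.kube/config (which it does on create):

kops validate cluster --state s3://example.com --name example.com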
Exporting the Kubectl Configuration
After creating the cluster, we would like to generate a configuration file that lets kubectl operate against it. We do this with the following command:
KUBECONFIG=$CLUSTER.kubeconfig kops export kubecfg $CLUSTER
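To confirm that the exported configuration works, a quick sanity check is to list the cluster's nodes with it:

kubectl --kubeconfig=$CLUSTER.kubeconfig get nodes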
Configuring Cluster Components for RBAC
In order for certain cluster components to function with RBAC enabled, some configuration is required. Essentially, we need to bind the right roles to the relevant service accounts, so that the pods using those accounts are allowed to perform their tasks.
Configure the cluster when it’s ready by applying the configuration files in the sections below:
kubectl --kubeconfig=$CLUSTER.kubeconfig apply -f kube-system-rbac.yaml
kubectl --kubeconfig=$CLUSTER.kubeconfig apply -f kube-flannel-rbac.yaml
Default System Service Account
The default service account in the kube-system namespace must be bound to the cluster-admin role (save the following as kube-system-rbac.yaml):
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:default-sa
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
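If you want to verify that the binding took effect, kubectl can check permissions on behalf of the service account via impersonation; a small sketch, assuming your own user is allowed to impersonate (which cluster-admin is):

kubectl --kubeconfig=$CLUSTER.kubeconfig auth can-i '*' '*' \
  --as=system:serviceaccount:kube-system:default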
Flannel Service Account
The flannel service account in the kube-system namespace must be given a role with certain permissions in order for the Flannel networking component to do its job (save the following as kube-flannel-rbac.yaml):
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
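With the role binding in place, the Flannel pods in kube-system should be (or become) healthy. A simple, if crude, way to check is to list the kube-system pods and confirm that the flannel pods are Running:

kubectl --kubeconfig=$CLUSTER.kubeconfig get pods -n kube-system | grep flannel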
Client Certificate Based Authentication
I decided to implement user authentication via TLS certificates, as this is directly supported by the kubectl tool and ties in easily with RBAC authorization. The trick here is to get hold of the certificate authority (CA) certificate and key that Kops used when creating the cluster, as they allow us to generate valid user certificates. Luckily, these files are stored in Kops' S3 bucket. The following commands copy the CA key and certificate to the local directory:
aws s3 cp s3://$BUCKET/$CLUSTER/pki/private/ca/$KEY ca.key
aws s3 cp s3://$BUCKET/$CLUSTER/pki/issued/ca/$CERT ca.crt
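The exact file names under those prefixes are generated by Kops, hence the $KEY and $CERT placeholders above; if you are unsure what they are called in your bucket, you can list the prefixes first:

aws s3 ls s3://$BUCKET/$CLUSTER/pki/private/ca/
aws s3 ls s3://$BUCKET/$CLUSTER/pki/issued/ca/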
Now that we have the CA key and certificate, we can generate a user certificate with the openssl command line tool. The procedure is to first generate a private key, then use that key to create a certificate signing request for a user with username $USERNAME, and finally sign the certificate with the help of the CA key and certificate. The commands below produce a key and certificate named user.key and user.crt, respectively:
openssl genrsa -out user.key 4096
openssl req -new -key user.key -out user.csr -subj "/CN=$USERNAME/O=developer"
openssl x509 -req -in user.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out user.crt -days 365
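Before moving on, it is worth double-checking that the subject, issuer and validity period of the new certificate look right:

openssl x509 -in user.crt -noout -subject -issuer -dates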
Granting Cluster Administrator Rights to User
We would like our user, as represented by the certificate, to have cluster administrator rights, meaning that they are permitted to perform essentially any operation on the cluster. The way to do this is to create a ClusterRoleBinding that grants the cluster-admin role to the new user, as in the following command:
kubectl --kubeconfig $CLUSTER.kubeconfig create clusterrolebinding \
  $USERNAME-cluster-admin-binding --clusterrole=cluster-admin --user=$USERNAME
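If you want to confirm exactly what was created, you can inspect the resulting binding:

kubectl --kubeconfig $CLUSTER.kubeconfig get clusterrolebinding \
  $USERNAME-cluster-admin-binding -o yaml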
After granting the user this role, we can start using it instead of the default admin user, as you'll see in the next section.
Identifying User through Certificate via Kubectl
Given the certificate we created for the user previously, and having assigned the user the cluster-admin role, we can now identify ourselves to Kubernetes by modifying the kubectl configuration. We configure kubectl to authenticate to the cluster with the certificate via the following commands:
kubectl --kubeconfig=$CLUSTER.kubeconfig config set-credentials $USERNAME \
  --client-key=user.key --client-certificate=user.crt
kubectl --kubeconfig=$CLUSTER.kubeconfig config set-context $CLUSTER --user $USERNAME
kubectl --kubeconfig=$CLUSTER.kubeconfig config use-context $CLUSTER
After configuring kubectl with the previous commands, you should be able to operate on the cluster as the new user. Try, for example, listing all pods:
kubectl --kubeconfig $CLUSTER.kubeconfig get pods --all-namespaces
If the above command worked, you should now have a fully functional cluster into which you can deploy your applications - have fun!
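If you would like a quick smoke test before deploying your real applications, something like the following works; a minimal sketch, assuming the public nginx image and a cluster from roughly the same era as this post (on newer clusters the Deployment apiVersion is apps/v1 and an explicit selector is required):

# create a small two-replica nginx deployment and check that its pods come up
cat <<EOF | kubectl --kubeconfig $CLUSTER.kubeconfig apply -f -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
kubectl --kubeconfig $CLUSTER.kubeconfig get pods -l app=nginx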