Intro to K8s

What is K8s?

Kubernetes (K8s) is an open-source system for automating the deployment, scaling and management of containerised applications. It supports on-prem, public-cloud and hybrid-cloud environments.

Key Features:

  1. Service discovery and load balancing

  2. Self-healing

  3. Horizontal scaling

  4. IPv4 / IPv6 dual stack

  5. Automated rollouts and rollbacks

  6. Secret and configuration management

  7. Storage orchestration

A Kubernetes cluster is made up of masters and nodes. These can be Linux hosts of any kind: VMs, bare-metal servers, cloud instances and so on.

Control Plane (MASTER)

The master is a collection of services that make up the control plane for the cluster. It is the brain of the cluster, where all the control and scheduling decisions are made, and it runs a number of specialised control loops and services.

Best way to set up the master

  • Simple setups like labs and test environments can have all the master services run on a single host

  • The best practice for all other setups is a multi-master high-availability (HA) configuration

  • HA masters are the default in most managed cloud offerings like Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE)

  • Running 3 or 5 replicas of the master in an HA configuration is common practice (an odd number makes it easier to reach quorum)

  • Do not run user applications on the master, so that it can concentrate fully on managing the cluster

The control plane provides :

  1. API server :

    • The front end / gateway to the control plane: all instructions and communications go through it

    • This is the central component of the Kubernetes cluster. All internal and external components communicate via the API server.

    • It exposes a RESTful API; you POST YAML configuration files to it over HTTPS. These YAML files are called manifests. Manifests describe the desired state of the application: which image to use for which container, which ports to expose, how many pod replicas to run (see the example manifest after this list)

    • Every request the API server receives is authenticated and authorised. Once that is done, the configuration is validated, persisted in the cluster store, and deployed.

  2. ETCD (cluster store) :

    • The only stateful (information retained for future use) part of the control plane

    • Persistently stores the entire configuration and the state of the cluster

    • Based on ETCD (a popular distributed database)

    • A good practice is to run 3 or 5 etcd replicas for HA; an odd number of replicas helps the cluster reach quorum and recover when things go wrong

    • Prefers consistency over availability (it uses the Raft consensus algorithm)

  3. Kube controller manager :

    • Implements all the background control loops that monitor the cluster

    • It’s a controller of controllers: it spawns all the independent control loops and monitors them

    • Some of these control loops include the node controller, the endpoints controller and the ReplicaSet controller

    • These control loops run in the background and continuously watch the API server for changes. The aim is to ensure that the current state of the cluster matches the desired state

    • The logic of all these loops is essentially :

      1. obtain the desired state

      2. observe the current state

      3. determine the difference if any

      4. reconcile the differences

  4. Scheduler :

    • Watches the API server for any new tasks, and assigns them to the appropriate nodes

    • Runs complex logic that filters out nodes incapable of running the task and ranks the nodes that are capable; the node with the highest ranking is selected to run the task

    • How is the ranking done?

      Each capable node is scored against a set of criteria (for example, available resources and affinity rules); each criterion carries a weighting, and the task is assigned to the node that gains the most points. The example manifest after this list shows some of the properties the scheduler considers.

  5. Cloud controller manager :

    • If your cluster is running on a public cloud platform like AWS, Azure or GCP, the control plane will also run a cloud controller manager

    • It manages the integrations with the underlying cloud technologies and services (load balancers, storage, etc.)
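
To make this concrete, here is a minimal sketch of the kind of manifest you might POST to the API server (the name and image are illustrative, not from any real setup). The resources.requests fields are among the properties the scheduler weighs when ranking nodes:

    apiVersion: v1
    kind: Pod
    metadata:
      name: hello-pod              # hypothetical name
    spec:
      containers:
        - name: web
          image: nginx:1.25        # which image to use for the container
          ports:
            - containerPort: 80    # which port the container exposes
          resources:
            requests:              # the scheduler filters and ranks nodes on
              cpu: 250m            # whether they can satisfy these requests
              memory: 128Mi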

Worker (Nodes)

Nodes are the workers of a Kubernetes cluster. They do the following:

  • Watch the API server for new work assignments

  • Execute new assignments

  • Report back to the control plane via the API server

They are much simpler than the master.

Kube Proxy :

  • A network proxy that allows communication both inside and outside the cluster (e.g. pod-to-pod communication)

  • Runs on every node in the cluster and is responsible for local cluster networking

  • Makes sure that each node gets its own unique IP address, and implements local IPTABLES or IPVS rules to handle routing and load balancing of traffic on the Pod network
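
To sketch what kube-proxy acts on, here is a minimal Service manifest (the name, labels and ports are hypothetical). kube-proxy programs the IPTABLES/IPVS rules that spread traffic sent to this Service across all pods matching its selector:

    apiVersion: v1
    kind: Service
    metadata:
      name: web                # hypothetical Service name
    spec:
      selector:
        app: web               # traffic is load-balanced across pods labelled app=web
      ports:
        - port: 80             # the port the Service listens on
          targetPort: 8080     # the port the backing pods listen on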


Container runtime (CRI) :

  • The container runtime is the software that actually runs containers and images, for example Docker; Kubernetes talks to it through the CRI (container runtime interface)

  • It handles pulling images and starting and stopping containers

  • Examples : Docker, containerd (pronounced "container-dee")


Kubelet :

  • Main component of the worker node

  • Runs on every node in the cluster

  • The terms kubelet and node are often used interchangeably

  • Watches the API server for new work assignments

  • Maintains a reporting channel to the control plane

  • If it cannot execute a task, it lets the control plane know

  • Monitors and maintains information about the server it runs on, and exchanges this information with the API server (e.g. a pod got killed or is failing to run); the scheduler then decides where that workload should be redeployed

Kubernetes DNS

  • Every Kubernetes cluster has an internal DNS service that is vital to its operations

  • The cluster’s DNS service has a static IP address that is hard-coded into every pod on the cluster (so every pod knows how to find it)
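
A quick way to see the DNS service in action is a throwaway pod that resolves a Service name through the cluster DNS. This is only a sketch: it assumes a Service called web exists in the default namespace, and that the cluster uses the standard cluster.local domain. Services are reachable by names of the form <service>.<namespace>.svc.cluster.local.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dns-test                 # hypothetical name
    spec:
      restartPolicy: Never
      containers:
        - name: lookup
          image: busybox:1.36
          # resolves via the cluster DNS IP baked into this pod's /etc/resolv.conf
          command: ["nslookup", "web.default.svc.cluster.local"]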

Data Plane

Pods

A group of whales is called a “pod of whales”. The Docker logo is a whale, so a group of containers is called a pod.

There are two ways to run a pod

  1. Single container per pod

  2. Multiple containers per pod (multi-container pods)

Either way, a Kubernetes Pod is a construct for running one or more containers.
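
For illustration, here is a sketch of a multi-container pod: a web server plus a hypothetical log-tailing sidecar. The two containers share the pod's network namespace, and a shared volume gives the sidecar access to the server's log files:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-with-sidecar       # hypothetical name
    spec:
      volumes:
        - name: logs
          emptyDir: {}             # scratch volume shared by both containers
      containers:
        - name: app
          image: nginx:1.25
          volumeMounts:
            - name: logs
              mountPath: /var/log/nginx   # nginx writes its logs here
        - name: log-tailer         # illustrative sidecar
          image: busybox:1.36
          command: ["sh", "-c", "tail -F /var/log/nginx/access.log"]
          volumeMounts:
            - name: logs
              mountPath: /var/log/nginx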

Pod Anatomy

A Pod is an environment for running containers. Pods themselves do not run anything; a pod is just a sandbox for hosting containers. Keeping it high level, you ring-fence an area of the host OS, build a network stack, create a bunch of kernel namespaces, and run one or more containers in it. That’s a Pod.

Minimum unit for scaling

Pods are the minimum unit of scaling in K8s. To scale up or down, you add or remove pods. You should not scale by adding more containers to an existing pod.

Atomic Operations

Deploying a pod is an atomic operation: a pod is considered ready only when all of its containers are up and running.

Pod Lifecycle

Pods are mortal: they can die. If a pod dies unexpectedly, K8s brings up a new one in its place, with a new IP and ID. All pods sit on the same flat, routable pod network, so containers in different pods can still communicate. Pod IP addresses are assigned by the cluster’s network plugin (CNI).

Should all my containers be in the same pod?

  • Do all the services need to be co-located and co-scheduled?

  • Example : a database service and a front-end service

    • Do these two have to be in the same pod?

    • You can scale the front end to 10-12 instances, but you cannot do that to the database

    • So these two services should not be co-scheduled: they belong in separate pods

How does K8s ensure replication of pods?

  • Deployment : a YAML object that defines the pods to run and the number of pod copies, called replicas (see the example manifest in the Deployments section below)

  • The ReplicaSet controller is a control loop running on the control plane that keeps the right number of replicas alive

  • The scheduler decides which pods run on which nodes

Kubernetes Networking

  • Pods communicate with other pods without NAT (Network Address Translation): they talk directly to each other’s IP addresses

  • Nodes communicate with pods without NAT

  • Every node in the cluster is assigned a CIDR block: a range of IP addresses for the pods running on it
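
You can see the range assigned to a node on the Node object itself. Here is a trimmed, illustrative excerpt (the node name and addresses are made up):

    # e.g. from: kubectl get node worker-1 -o yaml
    apiVersion: v1
    kind: Node
    metadata:
      name: worker-1             # hypothetical node name
    spec:
      podCIDR: 10.244.1.0/24     # pods on this node draw their IPs from this range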

Namespace

  • A (Linux network) namespace provides a logical networking stack with its own routes, firewall rules and network devices

  • By default, a node’s processes use the “root” namespace (unless specified otherwise). When a pod comes up, it is given its own namespace

  • Namespaces are completely isolated from one another; communication between namespaces happens via virtual Ethernet (veth) devices that behave like patch cables

Example :

  • If you have 3 nodes (VMs, physical machines, etc.) and you want them to behave like a cluster, they must be able to communicate with one another.

  • Each pod gets a unique IP address. By default, Kubernetes allocates one range of IPs per node to avoid conflicting IPs.

  • Within a pod, all containers are in the same namespace, so they can communicate with each other over localhost

  • What about communication from one pod to another? That traffic goes over the pod network: a pod sends packets straight to the other pod’s IP address, and the network plugin routes them between nodes

Deployments

For any application to run on a K8s cluster, it needs to be:

  • Packaged in a container

  • Wrapped in a pod (a pod is the wrapper that allows a container to run on a Kubernetes cluster)

  • Deployed via a manifest file

Pods are deployed using a higher-level controller. The most common controller is the Deployment. It offers :

  • Scalability

  • Self-healing

  • Rolling updates

Deployments are defined in YAML manifest files that specify things like which image to use and how many replicas to deploy. The manifest is then POSTed to the API server as the desired state of the application, and K8s implements it.
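
As a sketch, a minimal Deployment manifest might look like this (the name, labels and image are illustrative, not from any real setup):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-deploy               # hypothetical name
    spec:
      replicas: 3                    # how many pod replicas to run
      selector:
        matchLabels:
          app: web                   # which pods this Deployment manages
      strategy:
        type: RollingUpdate          # how to perform updates
        rollingUpdate:
          maxUnavailable: 1
          maxSurge: 1
      template:                      # the pod wrapped around the container
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.25      # which image to use
              ports:
                - containerPort: 80

POSTing this with kubectl apply -f deployment.yaml hands it to the API server as the desired state; scaling up or down is then just a matter of editing replicas and re-applying.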

The Declarative model

  1. Declare the desired state of an application in a manifest file

    • Manifest files are written in YAML and tell K8s what the application should look like (the desired state)

    • They specify which image to use, how many replicas to run, which network ports to listen on, and how to perform updates

  2. POST it to the API server

    • The most common way of doing this is using the kubectl command-line utility

    • This sends the manifest to the control plane as an HTTP POST, usually over port 443

  3. K8s stores it in the cluster store as the application’s desired state

    • Once the request is authenticated and authorized, K8s inspects the manifest and identifies which controller to send it to

    • It records the config in the cluster store as part of the cluster’s overall desired state

  4. K8s implements the desired state

    • Work gets scheduled on the cluster

    • Images are pulled, containers are started, networks are built, and application processes are started

  5. K8s implements watch loops to make sure the current state of the application does not vary from the desired state

    • K8s uses background reconciliation loops that constantly monitor the state of the cluster

    • If the current state of the cluster varies from the desired state, K8s will perform whatever tasks are necessary to reconcile them

Declarative vs Imperative Model

The imperative model is where you provide a long list of platform-specific commands to build things. The declarative model is the opposite of this.

Declarative model is :

  • Simpler than the imperative model

  • Enables self-healing and scaling, and lends itself to version control and self-documentation

  • tells the cluster how things should look (desired state)

  • if it stops looking like this, the cluster notices the discrepancy and does the reconciliation to bring it back to the desired state

That’s all for this post! In the next post I will document more on pods, networking and installation and usage of K8s.

 
 