This week’s blog is a joint post by Nati Shalom, CTO of Cloudify, and yours truly. We discuss the need for multiple K8s clusters along with network orchestration.
The need for a multi Kubernetes cluster:
According to the CNCF survey, most organizations are using more than one Kubernetes cluster. The ‘typical’ use cases for multi Kubernetes clusters can be driven by different needs:
- To separate clusters between applications and teams
- To separate between deployment stages, as in the case of development and production
- To regulate compliance—separating clusters per region
- Multi Cloud—to support Kubernetes clusters on multiple clouds including hybrid, public, and private clouds
- To avoid vendor lock-in—allowing portability among cloud providers
- Edge/IoT—to manage deployments across many distributed Kubernetes clusters on edge devices
Challenges with the Existing Cluster Management
The Kubernetes project provides an open container management framework that can run on multiple environments across public clouds. Having said that, managing a Kubernetes distribution includes many extensions that cover things like high availability, support for multiple environments and operating systems, bare-metal, and upgrades that are not covered by the open source project and tend to be proprietary for each distribution.
One of the challenges of managing multiple Kubernetes clusters is the inconsistency among the different Kubernetes distributions. Each provides its own set of tools for provisioning, high availability and clustering, network management, application management, and monitoring. This stresses IT departments, increases both Capex and Opex, and requires subject-matter experts who are not readily available.
For example, Red Hat OpenShift only supports Red Hat Enterprise Linux as an operating system, and it comes with lots of custom application provisioning and management extensions.
Google Anthos supports its own proprietary service mesh; it also assumes that Anthos must be upgraded to support K8s upgrades. It relies heavily on VMware for on-premises environments, and its bare-metal support still requires a lot of manual intervention.
You can read more about the comparison between Anthos and OpenShift here.
Another challenge with multi Kubernetes cluster management is managing distributed deployments that span multiple clusters, which means handling drift, partial failures, and continuous updates of such a distributed service, as outlined in ‘The Challenge of Continuous Delivery in Distributed Environments’.
Most existing solutions focus primarily on managing the Kubernetes infrastructure rather than the service itself.
EKS Anywhere: Enabling a Consistent and Open Kubernetes Management Across Any Environment
According to the same CNCF survey, EKS is the most popular managed Kubernetes platform.
Amazon announced the availability of EKS Anywhere and EKS Distribution during re:Invent 2020. This paved the way for an open source bridge to its managed Kubernetes platform, EKS. With this announcement, we now have the opportunity to deliver open and consistent multi Kubernetes cluster management across any environment.
EKS Platforms support four simultaneous K8s versions and we support mix and match of K8s clusters running different versions, as long as they are supported in the roadmap. For EKS Rover/EKSA, it is expected that the same policy will apply.
Cloudify and Multi Kubernetes Cluster Management
What’s needed is an open-source solution that provides this orchestration functionality across Kubernetes platforms such as EKS, AKS, GKE, OpenShift, Minikube, etc. That’s what Cloudify delivers.
Key features of Cloudify service orchestration for multi Kubernetes clusters include:
- Support for mixed workloads: Cloudify supports mixed workloads through a single automation scheme. This includes Cloud Native, Serverless, VM-based and any generic REST service.
- Multi-Cloud support: Cloudify supports all major public and private clouds.
- Auto-discovery: Cloudify automatically discovers Kubernetes clusters.
- Lifecycle management and day-2 operation of application services: Cloudify supports both across multiple Kubernetes clusters, including handling of partial failure scenarios (resume, rollback).
- Distributed workflow execution: Cloudify manages this via a single command for executing workflows across many Kubernetes environments/workloads.
- Avoiding Kubernetes application service configuration drifts: Cloudify continuously monitors the service configuration and ensures that it meets its desired service configuration state, even when changes are made directly on the service. Cloudify detects such divergences and self-heals the difference (a.k.a. deployment update).
- Scheduled workflow: Cloudify handles periodic deployment and maintenance tasks.
- Multi-tenancy between clusters: Cloudify matches workload and cluster per user/tenant or location.
- Continuous availability between clusters: reliable replication of deployment between multiple clusters.
- Breaking silos between clusters and external services: Cloudify integrates Kubernetes with external DB, storage, serverless or legacy services.
- Policy-based workload management between clusters: placing workload on a target cluster based on the service requirements and Kubernetes cluster capabilities (tagging).
- Support for highly secured air-gapped environments.
- Extreme (Edge) Scale: Cloudify can deploy the same service across hundreds or thousands of Kubernetes clusters.
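The drift-handling feature in the list above can be pictured as a compare-and-heal step: diff the desired configuration against what is actually observed, then revert the differences. The following is only a minimal sketch of that idea in Python, not Cloudify's implementation. The function names `diff_config` and `reconcile` are hypothetical, and configurations are assumed to be flat dictionaries in which an absent key is treated the same as `None`.

```python
from typing import Any, Dict, Tuple

Config = Dict[str, Any]


def diff_config(desired: Config, observed: Config) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (observed_value, desired_value)} for every drifted key.

    Assumption: a key that is absent on one side is reported as None.
    """
    drift = {}
    for key in desired.keys() | observed.keys():
        have, want = observed.get(key), desired.get(key)
        if have != want:
            drift[key] = (have, want)
    return drift


def reconcile(desired: Config, observed: Config) -> Config:
    """Return a copy of `observed` with all drift reverted to the desired state."""
    healed = dict(observed)
    for key, (_, want) in diff_config(desired, observed).items():
        if key in desired:
            healed[key] = want      # value was changed out of band: restore it
        else:
            healed.pop(key, None)   # key was added out of band: remove it
    return healed


if __name__ == "__main__":
    desired = {"replicas": 3, "image": "web:1.4"}
    observed = {"replicas": 5, "image": "web:1.4", "debug": True}
    print(diff_config(desired, observed))
    print(reconcile(desired, observed) == desired)
```

In a real system the observed state would come from the cluster API and the healing step would apply changes to the cluster rather than to a dictionary. The sketch only shows the comparison logic.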
EKS Anywhere Differentiators
Amazon EKS Anywhere is a new deployment option for Amazon EKS that allows customers to create and operate Kubernetes clusters on customer-managed infrastructure, supported by AWS. Customers can now run Amazon EKS Anywhere on their own on-premises infrastructure using VMware and/or bare-metal deployments.
As a full telco-grade distribution, Amazon EKS Anywhere helps simplify the creation and operation of on-premises Kubernetes clusters with default component configurations, all while providing tools for automating cluster management. It builds on the strengths of Amazon EKS Distro: the same Kubernetes distribution that powers Amazon EKS on AWS. AWS supports all Amazon EKS Anywhere components, including the integrated third-party software, so that customers may reduce their support costs and avoid maintenance of redundant open-source and third-party tools. In addition, Amazon EKS Anywhere gives customers on-premises Kubernetes operational tooling that’s consistent with Amazon EKS. You can leverage the EKS console to view all of your Kubernetes clusters (including EKS Anywhere clusters) running anywhere, through the EKS Connector.
Continuous Deployment Support Across Multiple Clusters Made Simple (GitOps ready)
What’s also needed is a single command that handles the execution of deployment and workflow updates of distributed services across multiple clusters, including things like placement, rollback and resuming operations. Again, the Cloudify project provides this functionality.
In addition, handling custom rolling upgrade workflows becomes significantly simpler using Cloudify. Examples include blue/green processes in which the update moves gradually among different clusters until it reaches full completion, or rolls back to a previous stable version if a failure is detected.
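The cluster-by-cluster upgrade with rollback described above can be sketched as follows. This is an illustrative staged rollout, not Cloudify's workflow engine and not a full blue/green implementation. The `deploy` and `healthy` callbacks are hypothetical stand-ins for whatever actually applies a version to a cluster and checks its health.

```python
from typing import Callable, Dict, List


def staged_rollout(
    clusters: List[str],
    current: Dict[str, str],
    new_version: str,
    deploy: Callable[[str, str], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Upgrade clusters one at a time; undo everything if one fails its check.

    `current` maps cluster name -> version running before the rollout.
    Returns True if every cluster ends up on `new_version`.
    """
    touched: List[str] = []
    for cluster in clusters:
        deploy(cluster, new_version)
        touched.append(cluster)
        if not healthy(cluster):
            # Roll back in reverse order, including the cluster that failed.
            for name in reversed(touched):
                deploy(name, current[name])
            return False
    return True


if __name__ == "__main__":
    # In-memory stand-in for real clusters (hypothetical names).
    state = {"edge-1": "1.0", "edge-2": "1.0", "edge-3": "1.0"}
    before = dict(state)

    def fake_deploy(cluster: str, version: str) -> None:
        state[cluster] = version

    ok = staged_rollout(list(state), before, "1.1", fake_deploy,
                        healthy=lambda c: c != "edge-3")
    print(ok, state == before)
```

Here the third cluster fails its health check, so the two clusters upgraded before it are returned to their previous version as well. A production workflow would also need to handle a rollback step that itself fails, which this sketch leaves out.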
Cloudify also provides built-in integration with CI/CD platforms such as GitHub Actions, CircleCI, AWS CodePipeline and Jenkins, which can trigger workflows and handle update processes in a way that is consistent with any continuous update process.
Managing a Full Kubernetes Environment Stack
A typical Kubernetes environment includes other infrastructure services such as storage, network, database, etc.
Cloudify’s ‘Environment as a Service’ (EaaS) allows users to create a full-stack Kubernetes environment via a single command. It can also create an optimized stack for development and production environments as described in the development and production use case.
This example also illustrates how you can continuously update the environment stack itself – as code.
EKS Anywhere and Cloudify – an Open Alternative to Google Anthos
Let’s take a look at a specific, practical example. Cloudify’s close partnership with AWS provides a tight integration with EKS.
Cloudify also integrates natively with the rest of the AWS stack, including support for AWS CloudFormation, as illustrated in the diagram below.
The result of this integration provides a consistent multi Kubernetes cluster across any environment and includes a rich management UI as well as deployment and workflow execution among clusters.
The focus behind this UX design is to allow an IT engineer who doesn’t know what the workload is doing to easily understand what’s happening, monitor the progress, and have an easy integration for closed-loop operations, etc.
In addition, Cloudify provides the ability to manage deployment across multiple clusters through a single workflow. We refer to that as a bulk operation. In this case, Cloudify can deploy a Kubernetes service that is packaged as a Helm chart across multiple clusters, handling the specific token authentication per cluster and using the specific inventory information that is relevant to each cluster. It also handles partial failure scenarios by allowing such deployments to be resumed or rolled back.
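The resume behavior of such a bulk operation can be illustrated with a small sketch. This is an assumption-laden model rather than Cloudify's API: `BulkRun`, `run_bulk`, and the `install` callback are hypothetical names, and `install` stands in for whatever actually installs the Helm chart on one cluster with that cluster's own token and inventory values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BulkRun:
    """Tracks which clusters a bulk deployment has completed or failed on."""
    done: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def run_bulk(
    clusters: Dict[str, dict],
    install: Callable[[str, dict], None],
    run: BulkRun,
) -> BulkRun:
    """Call `install` once per cluster, skipping clusters already in `run.done`.

    `clusters` maps a cluster name to its own settings (token, inventory values).
    Failures are recorded rather than raised, so the same `run` can be passed in
    again later to resume only the clusters that did not finish.
    """
    for name, settings in clusters.items():
        if name in run.done:
            continue
        try:
            install(name, settings)
        except Exception as exc:  # record and keep going with the other clusters
            run.failed[name] = str(exc)
        else:
            run.done.append(name)
            run.failed.pop(name, None)
    return run
```

A first call returns a `BulkRun` listing the clusters that succeeded and the error for each one that failed. Passing the same object to a second call retries only the unfinished clusters, which is the essence of resuming after a partial failure.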
Optimized for Extreme Performance with Support for Intel Data Plane Acceleration
Cloudify and Intel partnered to provide a performance-optimized data-plane stack for Kubernetes. This includes orchestration of virtualized radio access network workloads based on Cloud Native Network Functions (CNF) and infrastructure buildout in hybrid cloud.
An example of this was shown at Mobile World Congress 2022 in the AWS booth with a fully automated O-RAN Alliance xRAN test. It was running Intel FlexRAN (vRAN SDK) on EKS-D with appropriate Kubernetes features activated (SR-IOV Device Plugin, Multus for multiple network interfaces into pods, and Node Feature Discovery). This was delivered across multiple bare-metal servers configured for real-time (kernel and tuned profile) and dataplane workloads (Huge Pages, Isolcpus), accelerated in the processor (AVX512 in Xeon) and in adapters (ACC100 vRAN Accelerator card, 710 NICs). Hardware support for Precision Time Protocol (PTP) in the NICs and Ethernet fabric was also part of this demo.
You can learn about the specific integration and benchmark result in this post.
Optimized for Cost with Spot Ocean(™) Integration
Cost optimization is another critical consideration for multi Kubernetes cluster management. Cloudify’s partnership and integration with Spot Ocean let developers provision a cost-optimized EKS cluster by plugging in Ocean as the EKS data plane.
Spot Ocean automates cloud infrastructure for containers. It continuously analyzes how your containers are using infrastructure, and automatically scales compute resources to maximize utilization and availability, using the optimal blend of spot, reserved and on-demand compute instances.
Cloudify will also offer a dedicated Ocean plugin (coming soon) that lets users control how they use and configure Ocean.
You can read more about the Cloudify and Spot Ocean integration here.
Distributed Orchestration at Scale
A distributed Kubernetes cluster can often run in a hybrid environment where some of the clusters can be located in an environment that is not accessible to external management.
Managing such an environment requires a federated cluster architecture, a.k.a. ‘manager of managers’. In this model, the central manager provides a single point of access and API to manage the deployment of services across all the sub managers. The deployment target can be defined explicitly, by pointing to a specific cluster, or dynamically, through tags, filters, and a placement policy.
The sub managers often act as a gateway service to the downstream services that reside within each network domain, as illustrated in the diagram below.
A detailed explanation of how this model works is provided in the Distributed Orchestration at Scale post.
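The two ways of choosing a deployment target mentioned above, explicit naming versus dynamic selection by tags, can be sketched in a few lines. This is a simplified illustration with hypothetical names (`select_targets`, the example inventory), not the actual placement-policy mechanism. It assumes each cluster is described by a plain set of tags.

```python
from typing import Dict, List, Optional, Set


def select_targets(
    inventory: Dict[str, Set[str]],
    name: Optional[str] = None,
    required_tags: Optional[Set[str]] = None,
) -> List[str]:
    """Resolve deployment targets from an inventory of {cluster: tags}.

    Pass `name` to pin one cluster explicitly, or `required_tags` to pick every
    cluster that carries all of the given tags.
    """
    if name is not None:
        if name not in inventory:
            raise KeyError(f"unknown cluster: {name}")
        return [name]
    wanted = required_tags or set()
    return sorted(c for c, tags in inventory.items() if wanted <= tags)


if __name__ == "__main__":
    # Hypothetical inventory a central manager might hold.
    inventory = {
        "eks-us": {"aws", "prod", "gpu"},
        "eks-a-factory": {"on-prem", "prod"},
        "dev-1": {"aws", "dev"},
    }
    print(select_targets(inventory, name="dev-1"))              # ['dev-1']
    print(select_targets(inventory, required_tags={"prod"}))    # ['eks-a-factory', 'eks-us']
```

In a federated setup, the central manager would run a resolution step like this and then hand each selected target to the sub manager responsible for that network domain.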
Final Notes
Amazon’s release of EKS-A and EKS-D is a game changer for the Kubernetes distribution ecosystem, as it brings the most popular managed Kubernetes platform to any environment.
Having said that, managing workloads across multiple Kubernetes clusters is still a pain that grows exponentially with the number of clusters.
The need for managing multiple Kubernetes clusters ranges from the simple case of separating development from production environments, or segregated departments within the same enterprise, to the more extreme case of managing hundreds (and potentially thousands) of clusters across edge networks.
The post, Architecting Kubernetes clusters — how many should you have? provides a good summary of the tradeoffs between using shared clusters and separate clusters even in more mainstream enterprise use cases. The bottom line from that exercise is that the need for multi Kubernetes clusters is going to be more commonly seen, even with medium size enterprises.
The challenge with multiple Kubernetes clusters is that they are significantly harder to manage. This is what the new class of multi Kubernetes cluster management platforms, such as Google Anthos, is trying to address.
The combination of Cloudify as an open orchestrator and AWS EKS and EKS-A provides an industry-first solution that addresses both the consistency issues and distributed service management.
A good starting point is to try our Cloudify and EKS integration, with either our free community edition or our SaaS version. You can find more about it here: https://cloudify.co/kubernetes/
References
- Distributed Orchestration at Scale
- The Challenge of Continuous Delivery in Distributed Environments
- Run Data Analytics on Kubernetes 2X times faster
- Delivering the Right Infrastructure for the Right Job
- Change Management — From a Business Request to Fulfillment ServiceNow & Cloudify
- Architecting Kubernetes clusters — how many should you have?
- EKS Anywhere vs. Anthos, the Showdown of cloud k8s
- Getting started with Cloudify Multi Kubernetes Cluster
Image by Ashish Bogawat from Pixabay