Tech Blog | Haptik

From Slow to Swift: Revolutionizing Kubernetes StatefulSet Updates with OpenKruise

Written by Shoeb Khan | Oct 1, 2024 12:17:15 PM

Discover how we tackled the challenges of slow, sequential updates in our Kubernetes-based Message Consumer application, deployed using StatefulSets, and how OpenKruise’s AdvancedStatefulSet enabled us to dramatically improve our update times.

The Issue

We hit a significant bottleneck when rolling out updates to our StatefulSet-managed application. With the RollingUpdate strategy, the native Kubernetes StatefulSet updates one pod at a time and waits for each updated pod to be Running and Ready before moving on, so the update duration grows with the number of pods. For example, updating 100 pods took about 14 minutes. Our architecture requires each pod to keep a unique identity (an ordinal from 0 to N-1), because each pod connects to a specific Azure EventHub partition (EventHub is Azure's Kafka-equivalent service). This one-to-one mapping is necessary to prevent partition-related errors. Because of this requirement, we couldn't simply switch to a Deployment, whose pods get random names and no stable ordinal identity.

Why We Needed StatefulSets

Our Message Consumer application runs as a StatefulSet on Azure Kubernetes Service (AKS), with the number of pods matching the number of partitions in the corresponding EventHub topic. Each pod runs a single consumer that connects to one EventHub partition:

  • Pod 0 connects to Partition 0

  • Pod 1 connects to Partition 1

  • ...

  • Pod N-1 connects to Partition N-1

This one-to-one mapping is essential to avoid the epoch-related errors that arise when consumers attempt to connect to multiple partitions and end up contending for the same partition. Since we adopted this architecture, these errors have stopped.

In summary, StatefulSets were necessary because they allow each pod to have a unique, persistent identity (0 to N-1), which is required for each pod to connect to a specific EventHub partition.
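For illustration, a pod can discover which partition it owns by parsing the ordinal suffix of its own name: StatefulSet pods are named <statefulset-name>-<ordinal>, and a pod's hostname matches its name. The helper below is a hypothetical sketch, not our production consumer code. It assumes the partition ID equals the pod ordinal, and the function name, the pod name message-consumer-7 and the partition count of 32 are made-up example values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a StatefulSet pod name to the EventHub partition it should own.
# Assumes pods are named <statefulset-name>-<ordinal> and partition ID == pod ordinal.
partition_for_pod() {
  local pod_name="$1" partition_count="$2"
  local ordinal="${pod_name##*-}"   # keep only the text after the last '-'

  # Reject names without a purely numeric ordinal suffix.
  case "$ordinal" in
    '' | *[!0-9]*)
      echo "error: '$pod_name' has no numeric ordinal suffix" >&2
      return 1
      ;;
  esac

  ordinal=$((10#$ordinal))   # force base 10 so a leading zero is not read as octal

  # An ordinal with no matching partition means there are more replicas than partitions.
  if (( ordinal >= partition_count )); then
    echo "error: ordinal $ordinal is out of range for $partition_count partitions" >&2
    return 1
  fi

  echo "$ordinal"
}

# Example values (hypothetical): pod message-consumer-7 reading a 32-partition EventHub.
partition_for_pod "message-consumer-7" 32   # prints 7
```

Rejecting ordinals at or above the partition count makes a replica count larger than the partition count fail loudly instead of leaving a pod silently idle.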

The Solution: OpenKruise AdvancedStatefulSet

To overcome the limitation of serial updates in StatefulSets, we turned to OpenKruise's AdvancedStatefulSet. OpenKruise extends the native Kubernetes StatefulSet, adding advanced features like parallel updates while preserving StatefulSet’s key benefit of persistent, unique pod identities.

With OpenKruise AdvancedStatefulSet, we were able to:

  • Perform parallel updates, significantly reducing update time.

  • Maintain the unique identity of each pod, ensuring the continued one-to-one mapping of pods to EventHub partitions.

Benefits of the Solution

The impact of switching to OpenKruise AdvancedStatefulSet was immediate:

  • Previously, updating 100 pods one by one took 14 minutes.

  • With OpenKruise AdvancedStatefulSet and 25% parallelism enabled, the same update now completes in just 1.5 minutes, a reduction of roughly 90%.

This made our update process dramatically faster and more efficient.
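The headline figure is easy to check from the two measured durations, 14 minutes (840 s) and 1.5 minutes (90 s). The small helpers below are a sketch (the function names are ours) that compute the relative reduction and the speedup.

```shell
#!/usr/bin/env bash
# Compare two rollout durations given in seconds.
# awk is used because bash arithmetic is integer-only.
percent_reduction() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

percent_reduction 840 90   # 14 min vs 1.5 min -> 89.3
speedup 840 90             # -> 9.3
```

So the precise reduction is about 89%, which is roughly a 9x speedup.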

More About OpenKruise

OpenKruise is a CNCF (Cloud Native Computing Foundation) incubating project. This designation reflects its strong community backing, adherence to best practices, and commitment to open standards. It integrates seamlessly with Kubernetes and offers powerful extensions like AdvancedStatefulSet that enhance the functionality of StatefulSets.

OpenKruise Architecture

Fig 1. OpenKruise Architecture. Ref: https://openkruise.io/docs/core-concepts/architecture

Deployment Process

Here’s how we integrated OpenKruise AdvancedStatefulSet into our Kubernetes environment:

1. Install OpenKruise Controller: We installed the OpenKruise controller into our existing Kubernetes cluster using Helm.

# First, add the OpenKruise chart repository (skip this if you have already added it).
$ helm repo add openkruise https://openkruise.github.io/charts/

# [Optional] Refresh the local chart index.
$ helm repo update

# Install a pinned chart version (1.7.1, the latest at the time of writing).
$ helm install kruise openkruise/kruise --version 1.7.1

2. Update StatefulSet Definitions: We modified our StatefulSet manifests to use AdvancedStatefulSet instead of the native StatefulSet. The key change is the apiVersion; the kind stays StatefulSet.

-  apiVersion: apps/v1
+  apiVersion: apps.kruise.io/v1beta1
   kind: StatefulSet
   metadata:
     name: sample
   spec:
     #...

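3. Configure Parallelism: We set maxUnavailable to 25% in the rolling update strategy, so that up to a quarter of our pods can be updated at the same time. In OpenKruise, maxUnavailable is intended to be used together with podManagementPolicy: Parallel, which is why both fields are set.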

apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
spec:
  # ...
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
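To make the 25% figure concrete, the sketch below turns a percentage maxUnavailable into a pod count and an approximate number of update waves. It assumes the percentage is rounded down with a floor of one pod, which is our reading of how the controller resolves it; verify this against the OpenKruise documentation for your version. The helper names are hypothetical, and real rollouts refill free slots as pods become ready rather than moving in strict waves, so the wave count is only a rough guide.

```shell
#!/usr/bin/env bash
# Rough arithmetic for a percentage-based maxUnavailable.
# Assumption: the percentage is rounded down, with a floor of 1 pod.
max_unavailable_pods() {
  local replicas="$1" percent="$2"
  local pods=$(( replicas * percent / 100 ))   # integer division rounds down
  if (( pods < 1 )); then
    pods=1
  fi
  echo "$pods"
}

# Approximate number of update waves if every batch takes about the same time.
update_waves() {
  local replicas="$1" percent="$2"
  local batch
  batch=$(max_unavailable_pods "$replicas" "$percent")
  echo $(( (replicas + batch - 1) / batch ))   # ceiling division
}

max_unavailable_pods 100 25   # 25 pods may be updating at once
update_waves 100 25           # about 4 waves
update_waves 100 1            # 100 waves, i.e. one pod at a time
```

With 100 replicas, 25% gives 25 pods per batch and about 4 waves, compared with 100 sequential steps under the native StatefulSet's one-at-a-time rollout.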


Fig 2. OpenKruise Advanced StatefulSet: kubectl commands

Kubernetes Native StatefulSet vs OpenKruise Advanced StatefulSet: Update Time Comparison

Updating 100 pods sequentially with the native Kubernetes StatefulSet took approximately 14 minutes. In contrast, updating the same 100 pods with OpenKruise's Advanced StatefulSet, using a rolling update with maxUnavailable set to 25%, took just 1.5 minutes, a reduction of roughly 90%. The images below illustrate this improvement.


Fig 3. Kubernetes native StatefulSet: sequential update took 14 minutes


Fig 4. OpenKruise Advanced StatefulSet: parallel update took only 1.5 minutes, an improvement of ~90%

Additional Resources: OpenKruise Documentation

Conclusion

By deploying OpenKruise AdvancedStatefulSet, we overcame the limitations of serial updates in Kubernetes StatefulSets. The ability to perform parallel updates drastically cut down the time required for updates, improving our workflow efficiency without compromising the unique pod identity requirements of our architecture.

OpenKruise, as a CNCF incubating project, has proven to be a reliable and well-supported solution for managing stateful workloads. It enabled us to seamlessly integrate advanced functionality into our existing Kubernetes setup, and it’s now a crucial part of our infrastructure.
