Understanding the K8sGPT Operator: Theory, Architecture, and Code

Dec 13, 2024

The K8sGPT Operator is an intermediary between the Kubernetes control plane, K8sGPT diagnostics workloads, and external observability tools like Prometheus. Here’s a breakdown of its key components and their interactions:

Custom Resource Definition (CRD): The K8sGPT CRD allows users to declaratively define the scope and behavior of K8sGPT diagnostics through YAML manifests.
K8sGPT Operator: The operator watches K8sGPT custom resources for changes. It reconciles the desired state declared in each resource with the cluster’s actual state, ensuring the K8sGPT deployment matches user specifications. It also gathers analysis results from the K8sGPT deployment and exposes metrics via Prometheus-compatible endpoints.
K8sGPT Deployment: The K8sGPT deployment is the main diagnostic engine. It interacts directly with the Kubernetes API Server to gather cluster state and resource information and performs AI-driven analysis on the specified resources (e.g., pods, services, deployments).
API Server Integration: The API Server provides real-time cluster state to the K8sGPT deployment, which uses this data to perform checks, identify anomalies, and analyze cluster health. The findings are passed back to the operator for further processing.
Prometheus Integration: Prometheus scrapes metrics exposed by the K8sGPT Operator. These metrics can be visualized using dashboards like Grafana or used for alerting purposes.
Results Handling: Once diagnostics are performed, the results are:

Processed and stored by the operator.
Sent to user-defined destinations (e.g., logs, dashboards).
Integrated into workflows like CI/CD pipelines for automated actions.
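
To make the Prometheus integration from the component list more concrete, here is a minimal sketch of a Prometheus-compatible endpoint built with the Go standard library only. It is not the operator's actual code: the metric name `example_k8sgpt_results`, the `kind` label, and both function names are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// renderMetrics formats per-kind result counts in the Prometheus text
// exposition format. The metric name and label are hypothetical.
func renderMetrics(countsByKind map[string]int) string {
	kinds := make([]string, 0, len(countsByKind))
	for kind := range countsByKind {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds) // map iteration order is random; sort for stable output

	var b strings.Builder
	b.WriteString("# HELP example_k8sgpt_results Number of analysis results per resource kind.\n")
	b.WriteString("# TYPE example_k8sgpt_results gauge\n")
	for _, kind := range kinds {
		fmt.Fprintf(&b, "example_k8sgpt_results{kind=%q} %d\n", kind, countsByKind[kind])
	}
	return b.String()
}

// metricsHandler returns an HTTP handler that Prometheus could scrape.
func metricsHandler(countsByKind map[string]int) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(countsByKind))
	}
}

func main() {
	counts := map[string]int{"Pod": 2, "Service": 1}
	// Print what a scrape would return. To serve it, register
	// metricsHandler(counts) on "/metrics" and call http.ListenAndServe.
	fmt.Print(renderMetrics(counts))
}
```

A real operator would more likely use the Prometheus Go client library instead of formatting the text by hand; the sketch only shows what ends up on the wire.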

Step-by-Step Workflow

User Input:

A user creates or updates a K8sGPT Custom Resource (CR), defining the desired analysis behavior and scope.

Reconciliation:

The operator detects the CR change and enters its reconciliation loop.
It deploys or updates the K8sGPT workload to match the spec defined in the CR.
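
The reconciliation step can be pictured as a pure comparison between desired and observed state. The sketch below is a simplified illustration, not the operator's implementation: `DesiredSpec`, `ObservedDeployment`, `Action`, and `decide` are hypothetical names, and only the image (repository plus version) is compared.

```go
package main

import "fmt"

// DesiredSpec is a hypothetical, trimmed-down view of a K8sGPT CR spec.
type DesiredSpec struct {
	Repository string
	Version    string
}

// ObservedDeployment is a hypothetical view of what currently runs in the cluster.
type ObservedDeployment struct {
	Image string
}

// Action names the change needed to converge on the desired state.
type Action string

const (
	ActionCreate Action = "create"
	ActionUpdate Action = "update"
	ActionNone   Action = "none"
)

// decide compares desired and observed state and returns the action needed
// plus the image that should be running. It has no side effects, so calling
// it repeatedly with the same inputs always gives the same answer.
func decide(desired DesiredSpec, observed *ObservedDeployment) (Action, string) {
	wantImage := desired.Repository + ":" + desired.Version
	switch {
	case observed == nil:
		return ActionCreate, wantImage
	case observed.Image != wantImage:
		return ActionUpdate, wantImage
	default:
		return ActionNone, wantImage
	}
}

func main() {
	spec := DesiredSpec{Repository: "ghcr.io/k8sgpt-ai/k8sgpt", Version: "v0.3.8"}
	var cluster *ObservedDeployment // nothing deployed yet

	// Run several passes; after the first one the state has converged.
	for pass := 1; pass <= 3; pass++ {
		action, image := decide(spec, cluster)
		fmt.Printf("pass %d: %s %s\n", pass, action, image)
		if action != ActionNone {
			cluster = &ObservedDeployment{Image: image}
		}
	}
}
```

Because the decision depends only on the current state, the loop is idempotent: rerunning it after convergence changes nothing, which is the property a reconciliation loop relies on.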

Diagnostics:

The K8sGPT deployment interacts with the API Server to gather live cluster data.
It performs diagnostic checks and anomaly detection based on user specifications.
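
As a rough picture of what a diagnostic check might look like, the sketch below scans a list of pods and reports the ones that look unhealthy, optionally limited to one namespace. The `Pod` and `Finding` types and the `analyzePods` function are hypothetical stand-ins; the real analyzers in K8sGPT read richer objects from the API Server and are not shown in this article.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for pod data read from the API Server.
type Pod struct {
	Namespace     string
	Name          string
	Phase         string // e.g. "Running", "Pending"
	WaitingReason string // e.g. "CrashLoopBackOff"; empty when no container is waiting
}

// Finding is a hypothetical record of one detected problem.
type Finding struct {
	Kind   string
	Name   string
	Detail string
}

// analyzePods reports pods with a waiting container or a Pending phase.
// An empty namespace means "all namespaces".
func analyzePods(pods []Pod, namespace string) []Finding {
	var findings []Finding
	for _, p := range pods {
		if namespace != "" && p.Namespace != namespace {
			continue // outside the requested scope
		}
		name := p.Namespace + "/" + p.Name
		switch {
		case p.WaitingReason != "":
			findings = append(findings, Finding{"Pod", name, "container waiting: " + p.WaitingReason})
		case p.Phase == "Pending":
			findings = append(findings, Finding{"Pod", name, "pod is stuck in Pending"})
		}
	}
	return findings
}

func main() {
	pods := []Pod{
		{Namespace: "default", Name: "api", Phase: "Running", WaitingReason: "CrashLoopBackOff"},
		{Namespace: "default", Name: "web", Phase: "Running"},
		{Namespace: "batch", Name: "job", Phase: "Pending"},
	}
	for _, f := range analyzePods(pods, "") {
		fmt.Printf("%s %s: %s\n", f.Kind, f.Name, f.Detail)
	}
}
```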

Metrics and Results:

The operator collects analysis results from the K8sGPT deployment and exposes metrics derived from them.
Prometheus scrapes those metrics, and results are forwarded to user-defined destinations.

Integration and Action:

The operator integrates the results into existing workflows, triggering alerts or automated actions based on findings.
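
One way to picture this hand-off is a webhook-style sink that receives findings as JSON. The sketch below only builds the HTTP request and does not send it; the payload shape, the `Finding` type, the function names, and the URL are all assumptions for illustration, not the operator's actual sink format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Finding is a hypothetical record of one analysis result.
type Finding struct {
	Kind   string `json:"kind"`
	Name   string `json:"name"`
	Detail string `json:"detail"`
}

// payload is a hypothetical envelope sent to the sink.
type payload struct {
	Count    int       `json:"count"`
	Findings []Finding `json:"findings"`
}

// encodeFindings serializes findings into the JSON body for the sink.
func encodeFindings(findings []Finding) ([]byte, error) {
	return json.Marshal(payload{Count: len(findings), Findings: findings})
}

// newSinkRequest builds a POST request carrying the findings as JSON.
// Sending it is left to the caller, e.g. via http.DefaultClient.Do(req).
func newSinkRequest(url string, findings []Finding) (*http.Request, error) {
	body, err := encodeFindings(findings)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	findings := []Finding{{Kind: "Pod", Name: "default/api", Detail: "CrashLoopBackOff"}}
	// The URL is a placeholder; nothing is sent over the network here.
	req, err := newSinkRequest("http://sink.example.invalid/hook", findings)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	body, _ := encodeFindings(findings)
	fmt.Println(req.Method, req.URL)
	fmt.Println(string(body))
}
```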

Advantages of This Architecture

Scalability: The operator dynamically manages the K8sGPT workload, scaling it with cluster requirements.
Flexibility: Through CRDs, users have full control over the scope and configuration of diagnostics.
Observability: The operator exposes Prometheus-compatible metrics and supports external dashboards such as Grafana.
Automation: The architecture enables automated issue detection and resolution, streamlining DevOps workflows.

To follow the operator from a code point of view, start with these files, which cover the CRD definition, the API types, the controller logic, and resource handling:

Custom Resource Definitions (CRD): The CRD defines how the custom resource (K8sGPT) behaves and is configured.

k8sgpt-operator/config/crd/bases/core.k8sgpt.ai_k8sgpts.yaml

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.14.0
  name: k8sgpts.core.k8sgpt.ai
spec:
  group: core.k8sgpt.ai
  names:
    kind: K8sGPT
    listKind: K8sGPTList
    plural: k8sgpts
    singular: k8sgpt
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: K8sGPT is the Schema for the k8sgpts API
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: K8sGPTSpec defines the desired state of K8sGPT
            properties:
              ai:
                properties:
                  anonymized:
                    default: true
                    type: boolean
                  backOff:
                    properties:
                      enabled:
                        default: false
                        type: boolean
                      maxRetries:
                        default: 5
                        type: integer
                    required:
                    - enabled
                    - maxRetries
                    type: object
                  backend:
                    default: openai
                    enum:
                    - ibmwatsonxai
                    - openai
                    - localai
                    - azureopenai
                    - amazonbedrock
                    - cohere
                    - amazonsagemaker
                    - google
                    - googlevertexai
                    # ... (excerpt truncated)
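
The `default` and `enum` keywords in the schema excerpt above are applied by the Kubernetes API server, not by the operator. To illustrate what they mean for a submitted resource, here is a small Go sketch that mimics that behavior for two fields. `AIConfig`, `applyDefaults`, and `validate` are hypothetical names, and the backend list contains only the values visible in the excerpt.

```go
package main

import "fmt"

// AIConfig mirrors two fields from the CRD excerpt (hypothetical type).
type AIConfig struct {
	Backend    string
	Anonymized *bool // pointer so "unset" can be told apart from an explicit false
}

// allowedBackends lists the enum values visible in the CRD excerpt.
var allowedBackends = map[string]bool{
	"ibmwatsonxai":    true,
	"openai":          true,
	"localai":         true,
	"azureopenai":     true,
	"amazonbedrock":   true,
	"cohere":          true,
	"amazonsagemaker": true,
	"google":          true,
	"googlevertexai":  true,
}

// applyDefaults fills unset fields with the defaults from the schema:
// backend defaults to "openai" and anonymized defaults to true.
func applyDefaults(c *AIConfig) {
	if c.Backend == "" {
		c.Backend = "openai"
	}
	if c.Anonymized == nil {
		enabled := true
		c.Anonymized = &enabled
	}
}

// validate rejects backends that are not part of the enum.
func validate(c AIConfig) error {
	if !allowedBackends[c.Backend] {
		return fmt.Errorf("unsupported backend %q", c.Backend)
	}
	return nil
}

func main() {
	cfg := AIConfig{} // user left everything unset
	applyDefaults(&cfg)
	fmt.Println(cfg.Backend, *cfg.Anonymized, validate(cfg))

	fmt.Println(validate(AIConfig{Backend: "not-a-backend"}))
}
```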

API Types: This file defines the Go types and data structures for the K8sGPT resource. The CRD schema shown above is generated from these types by controller-gen.

k8sgpt-operator/api/v1alpha1/k8sgpt_types.go

// K8sGPTSpec defines the desired state of K8sGPT
type K8sGPTSpec struct {
 Version string `json:"version,omitempty"`
 // +kubebuilder:default:=ghcr.io/k8sgpt-ai/k8sgpt
 Repository       string             `json:"repository,omitempty"`
 ImagePullSecrets []ImagePullSecrets `json:"imagePullSecrets,omitempty"`
 NoCache          bool               `json:"noCache,omitempty"`
 CustomAnalyzers  []CustomAnalyzer   `json:"customAnalyzers,omitempty"`
 Filters          []string           `json:"filters,omitempty"`
 ExtraOptions     *ExtraOptionsRef   `json:"extraOptions,omitempty"`
 Sink             *WebhookRef        `json:"sink,omitempty"`
 AI               *AISpec            `json:"ai,omitempty"`
 RemoteCache      *RemoteCacheRef    `json:"remoteCache,omitempty"`
 Integrations     *Integrations      `json:"integrations,omitempty"`
 NodeSelector     map[string]string  `json:"nodeSelector,omitempty"`
 TargetNamespace  string             `json:"targetNamespace,omitempty"`
 // Define the kubeconfig the Deployment must use.
 // If empty, the Deployment will use the ServiceAccount provided by Kubernetes itself.
 Kubeconfig *SecretRef `json:"kubeconfig,omitempty"`
}
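
The `json:"...,omitempty"` struct tags determine how these Go fields map to the keys in a K8sGPT manifest. The sketch below shows the effect with a trimmed-down copy of the struct; `miniSpec`, `encode`, and `decode` are hypothetical names, and only a handful of fields are included.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// miniSpec is a hypothetical, trimmed-down copy of K8sGPTSpec that keeps
// the same JSON tags for a few fields.
type miniSpec struct {
	Version         string   `json:"version,omitempty"`
	Repository      string   `json:"repository,omitempty"`
	NoCache         bool     `json:"noCache,omitempty"`
	Filters         []string `json:"filters,omitempty"`
	TargetNamespace string   `json:"targetNamespace,omitempty"`
}

// encode serializes a spec; fields left at their zero value are omitted
// because of the omitempty option.
func encode(spec miniSpec) (string, error) {
	out, err := json.Marshal(spec)
	return string(out), err
}

// decode parses a JSON document into a spec; missing keys stay at zero values.
func decode(doc string) (miniSpec, error) {
	var spec miniSpec
	err := json.Unmarshal([]byte(doc), &spec)
	return spec, err
}

func main() {
	s, err := encode(miniSpec{Version: "v0.3.8", Repository: "ghcr.io/k8sgpt-ai/k8sgpt"})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(s) // only version and repository appear

	spec, err := decode(`{"noCache":true,"filters":["Pod"]}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```

One consequence of `omitempty` on a plain `bool` is that an explicit `false` is indistinguishable from "not set" once serialized, which is worth keeping in mind when a field's default is `true`.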

k8sgpt-operator/controllers/k8sgpt_controller.go: The operator’s controller logic is implemented here. The core responsibility of the controller is to reconcile the desired state defined in the CR with the actual state in the cluster.

func (r *K8sGPTReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
 _ = log.FromContext(ctx)

 // Bundle the reconciler, request, and context so every step shares the same state.
 instance := K8sGPTInstance{
  r:      r,
  req:    req,
  ctx:    ctx,
  logger: k8sgptControllerLog,
 }

 // Each phase of reconciliation is modeled as its own step.
 initStep := InitStep{}
 finalizerStep := FinalizerStep{}
 configureStep := ConfigureStep{}
 preAnalysisStep := PreAnalysisStep{}
 analysisStep := AnalysisStep{}
 resultStatusStep := ResultStatusStep{}

 // Link the steps into a chain so each one can hand off to its successor.
 initStep.setNext(&finalizerStep)
 finalizerStep.setNext(&configureStep)
 configureStep.setNext(&preAnalysisStep)
 preAnalysisStep.setNext(&analysisStep)
 analysisStep.setNext(&resultStatusStep)

 // Start the chain at the first step.
 return initStep.execute(&instance)
}
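
The `setNext` and `execute` calls above suggest a chain-of-responsibility design. The operator's own step implementations are not shown in this article, so the following is a minimal, self-contained sketch of how such a chain could work. The `state`, `step`, and `namedStep` types and the `runChain` helper are hypothetical, and `execute` returns only an error here rather than a `ctrl.Result`.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the instance passed along the chain.
type state struct {
	trace  []string // names of the steps that ran, in order
	failAt string   // name of a step that should fail (for demonstration)
}

// step is the contract every link in the chain fulfils.
type step interface {
	execute(s *state) error
	setNext(next step)
}

// namedStep records its name, then delegates to the next step if there is one.
type namedStep struct {
	name string
	next step
}

func (n *namedStep) setNext(next step) { n.next = next }

func (n *namedStep) execute(s *state) error {
	s.trace = append(s.trace, n.name)
	if s.failAt == n.name {
		// A failing step stops the chain; later steps never run.
		return fmt.Errorf("step %s failed", n.name)
	}
	if n.next == nil {
		return nil // end of the chain
	}
	return n.next.execute(s)
}

// runChain links the named steps in order and executes them from the first.
func runChain(names []string, failAt string) ([]string, error) {
	if len(names) == 0 {
		return nil, nil
	}
	steps := make([]*namedStep, len(names))
	for i, name := range names {
		steps[i] = &namedStep{name: name}
	}
	for i := 0; i+1 < len(steps); i++ {
		steps[i].setNext(steps[i+1])
	}
	s := &state{failAt: failAt}
	err := steps[0].execute(s)
	return s.trace, err
}

func main() {
	names := []string{"init", "finalizer", "configure", "preAnalysis", "analysis", "resultStatus"}

	trace, err := runChain(names, "")
	fmt.Println(trace, err)

	trace, err = runChain(names, "configure")
	fmt.Println(trace, err)
}
```

The appeal of this layout is that each phase stays small and testable on its own, and a failure in one phase naturally short-circuits everything after it.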

k8sgpt-operator/pkg/resources/k8sgpt.go (Resource handling): Helper functions in k8sgpt.go manage Kubernetes resources like Deployments and Services for K8sGPT workloads.

// GetDeployment Create deployment with the latest K8sGPT image
func GetDeployment(config v1alpha1.K8sGPT, outOfClusterMode bool, c client.Client) (*appsv1.Deployment, error) {

 // Create deployment
 image := config.Spec.Repository + ":" + config.Spec.Version
 replicas := int32(1)
 deployment := appsv1.Deployment{
  ObjectMeta: metav1.ObjectMeta{
   Name:      config.Name,
   Namespace: config.Namespace,
   OwnerReferences: []metav1.OwnerReference{
    {
     Kind:               config.Kind,
     Name:               config.Name,
     UID:                config.UID,
     APIVersion:         config.APIVersion,
     BlockOwnerDeletion: utils.PtrBool(true),
     Controller:         utils.PtrBool(true),
    },
   },
  },
  // ... (excerpt truncated)

Sample Custom Resource:

k8sgpt-operator/config/samples/valid_k8sgpt.yaml

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: default
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    # anonymized: false
    # language: english
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8

Conclusion

The K8sGPT Operator is a powerful tool that simplifies and automates the integration of AI-driven diagnostics within Kubernetes environments. By building on the Kubernetes Operator pattern, it bridges the gap between theoretical concepts and practical implementation, making it easier for DevOps teams to deploy, configure, and manage K8sGPT workloads.

From its custom resource definitions (CRDs) to the controller logic and resource management code, the operator ensures seamless reconciliation of your cluster’s desired and actual states. It provides a flexible and extensible framework for cluster diagnostics while supporting integrations with external systems like Prometheus and Slack.

Through this exploration, we’ve covered the K8sGPT Operator’s theoretical foundation and delved into its inner workings, such as the CRD schema, reconciliation loop, and resource generation. This holistic understanding equips you with the knowledge to deploy and utilize the operator effectively in real-world scenarios.

The K8sGPT Operator exemplifies the synergy between AI and Kubernetes, enabling smarter cluster management. Implementing and customizing this operator for your specific use cases will empower your workflows with automation, observability, and actionable insights, paving the way for more resilient and efficient Kubernetes operations.

Written by Prashant Lakhera
