Understanding the K8sGPT Operator: Theory, Architecture, and Code
The K8sGPT Operator is an intermediary between the Kubernetes control plane, K8sGPT diagnostics workloads, and external observability tools like Prometheus. Here’s a breakdown of its key components and their interactions:
- Custom Resource Definition (CRD): The K8sGPT CRD allows users to declaratively define the scope and behavior of K8sGPT diagnostics through YAML manifests.
- K8sGPT Operator: The operator continuously watches for changes to K8sGPT custom resources (instances of the CRD). It reconciles the desired state declared in each custom resource with the cluster's actual state, ensuring the K8sGPT deployment matches user specifications. It also gathers diagnostic metrics and analysis results from the K8sGPT deployment and exposes metrics via Prometheus-compatible endpoints.
- K8sGPT Deployment: The K8sGPT deployment is the main diagnostic engine. It interacts directly with the Kubernetes API Server to gather cluster state and resource information and performs AI-driven analysis on the specified resources (e.g., pods, services, deployments).
- API Server Integration: The API Server provides real-time cluster state to the K8sGPT deployment. The K8sGPT deployment uses this data to perform checks, identify anomalies, and analyze cluster health. The findings are sent back to the operator for further processing.
- Prometheus Integration: Prometheus scrapes metrics exposed by the K8sGPT Operator. These metrics can be visualized using dashboards like Grafana or used for alerting purposes.
- Results Handling: Once diagnostics are performed, the results are:
  - Processed and stored by the operator.
  - Sent to user-defined destinations (e.g., logs, dashboards).
  - Integrated into workflows like CI/CD pipelines for automated actions.
Step-by-Step Workflow
User Input:
- A user creates or updates a K8sGPT Custom Resource (CR), defining the desired analysis behavior and scope.
Reconciliation:
- The operator detects the CR change and enters its reconciliation loop.
- It deploys or updates the K8sGPT workload to match the spec defined in the CR.
Diagnostics:
- The K8sGPT deployment interacts with the API Server to gather live cluster data.
- It performs diagnostic checks and anomaly detection based on user specifications.
Metrics and Results:
- The K8sGPT deployment produces analysis results and metrics, which the operator collects.
- The operator exposes metrics for Prometheus to scrape and forwards results to user-defined destinations.
Integration and Action:
- The operator integrates the results into existing workflows, triggering alerts or automated actions based on findings.
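The reconciliation step in this workflow can be illustrated with a small, self-contained sketch. It is not the operator's code and does not use controller-runtime: `desiredState`, `cluster`, and `reconcile` are hypothetical names, the "cluster" is just an in-memory map, and the `v0.3.9` tag is only an illustrative value. What it shows is the core idea of comparing desired and actual state and acting only on the difference.

```go
package main

import "fmt"

// desiredState is a hypothetical stand-in for the parts of a K8sGPT CR
// that determine what the workload should look like.
type desiredState struct {
	Name  string
	Image string
}

// cluster is a hypothetical in-memory model of the actual state:
// workload name -> image currently running.
type cluster map[string]string

// reconcile compares desired and actual state and returns the action taken.
// It is idempotent: calling it again without changes returns "noop".
func reconcile(c cluster, d desiredState) string {
	current, exists := c[d.Name]
	switch {
	case !exists:
		c[d.Name] = d.Image
		return "created"
	case current != d.Image:
		c[d.Name] = d.Image
		return "updated"
	default:
		return "noop"
	}
}

func main() {
	c := cluster{}
	d := desiredState{Name: "k8sgpt-sample", Image: "ghcr.io/k8sgpt-ai/k8sgpt:v0.3.8"}

	fmt.Println(reconcile(c, d)) // first pass: the workload does not exist yet
	fmt.Println(reconcile(c, d)) // second pass: nothing has changed

	d.Image = "ghcr.io/k8sgpt-ai/k8sgpt:v0.3.9" // illustrative tag: the user edits the CR
	fmt.Println(reconcile(c, d))                // third pass: the workload is updated
}
```

Because each pass looks at the current state rather than at individual events, repeating a pass is harmless, which is why a reconciliation loop can safely be retried after errors.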
Advantages of This Architecture
- Scalability: The operator dynamically manages the K8sGPT workload, scaling it with cluster requirements.
- Flexibility: Through CRDs, users have full control over the scope and configuration of diagnostics.
- Observability: Integrates natively with Prometheus and supports external dashboards.
- Automation: Enables automation of issue detection and resolution, streamlining DevOps workflows.
To see how this architecture maps to code, these are the most relevant files for the K8sGPT Operator and its CRD definitions:
Custom Resource Definition (CRD): The CRD defines how the custom resource (K8sGPT) behaves and is configured.
- k8sgpt-operator/config/crd/bases/core.k8sgpt.ai_k8sgpts.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.14.0
  name: k8sgpts.core.k8sgpt.ai
spec:
  group: core.k8sgpt.ai
  names:
    kind: K8sGPT
    listKind: K8sGPTList
    plural: k8sgpts
    singular: k8sgpt
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: K8sGPT is the Schema for the k8sgpts API
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: K8sGPTSpec defines the desired state of K8sGPT
            properties:
              ai:
                properties:
                  anonymized:
                    default: true
                    type: boolean
                  backOff:
                    properties:
                      enabled:
                        default: false
                        type: boolean
                      maxRetries:
                        default: 5
                        type: integer
                    required:
                    - enabled
                    - maxRetries
                    type: object
                  backend:
                    default: openai
                    enum:
                    - ibmwatsonxai
                    - openai
                    - localai
                    - azureopenai
                    - amazonbedrock
                    - cohere
                    - amazonsagemaker
                    - google
                    - googlevertexai
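To make the `default` and `enum` keywords in the schema more concrete, here is a rough sketch of the effect they have on the `backend` field. In a real cluster this defaulting and validation is done by the Kubernetes API server from the CRD schema, not by hand-written code; `resolveBackend` and `allowedBackends` are hypothetical names used only for illustration, and the list contains only the values visible in the truncated excerpt above.

```go
package main

import "fmt"

// allowedBackends mirrors only the enum values visible in the CRD excerpt;
// the excerpt is truncated, so the real list may be longer.
var allowedBackends = []string{
	"ibmwatsonxai", "openai", "localai", "azureopenai", "amazonbedrock",
	"cohere", "amazonsagemaker", "google", "googlevertexai",
}

// defaultBackend mirrors `default: openai` in the schema.
const defaultBackend = "openai"

// resolveBackend is a hypothetical helper that imitates what schema
// defaulting and enum validation achieve for the backend field.
func resolveBackend(requested string) (string, error) {
	if requested == "" {
		return defaultBackend, nil // field omitted: the default applies
	}
	for _, b := range allowedBackends {
		if b == requested {
			return requested, nil
		}
	}
	return "", fmt.Errorf("unsupported backend %q", requested)
}

func main() {
	for _, in := range []string{"", "localai", "not-a-backend"} {
		backend, err := resolveBackend(in)
		fmt.Printf("input=%q backend=%q err=%v\n", in, backend, err)
	}
}
```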
API Types: This file defines the Go types and data structures for the K8sGPT resource. The CRD schema shown above is generated from these types.
- k8sgpt-operator/api/v1alpha1/k8sgpt_types.go (Defines the CRD schema and types).
// K8sGPTSpec defines the desired state of K8sGPT
type K8sGPTSpec struct {
	Version string `json:"version,omitempty"`
	// +kubebuilder:default:=ghcr.io/k8sgpt-ai/k8sgpt
	Repository       string             `json:"repository,omitempty"`
	ImagePullSecrets []ImagePullSecrets `json:"imagePullSecrets,omitempty"`
	NoCache          bool               `json:"noCache,omitempty"`
	CustomAnalyzers  []CustomAnalyzer   `json:"customAnalyzers,omitempty"`
	Filters          []string           `json:"filters,omitempty"`
	ExtraOptions     *ExtraOptionsRef   `json:"extraOptions,omitempty"`
	Sink             *WebhookRef        `json:"sink,omitempty"`
	AI               *AISpec            `json:"ai,omitempty"`
	RemoteCache      *RemoteCacheRef    `json:"remoteCache,omitempty"`
	Integrations     *Integrations      `json:"integrations,omitempty"`
	NodeSelector     map[string]string  `json:"nodeSelector,omitempty"`
	TargetNamespace  string             `json:"targetNamespace,omitempty"`
	// Define the kubeconfig the Deployment must use.
	// If empty, the Deployment will use the ServiceAccount provided by Kubernetes itself.
	Kubeconfig *SecretRef `json:"kubeconfig,omitempty"`
}
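The `json:"...,omitempty"` tags are what connect the fields in a manifest to this struct. The sketch below uses a trimmed, hypothetical copy of the struct (`trimmedSpec`, with only four of the fields) and the standard `encoding/json` package to show the mapping in both directions. The real operator relies on the Kubernetes client machinery for this, so treat it as an illustration of the tags only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// trimmedSpec is a hypothetical, cut-down copy of K8sGPTSpec that keeps
// the same field names and JSON tags for a handful of fields.
type trimmedSpec struct {
	Version    string   `json:"version,omitempty"`
	Repository string   `json:"repository,omitempty"`
	NoCache    bool     `json:"noCache,omitempty"`
	Filters    []string `json:"filters,omitempty"`
}

// decodeSpec maps JSON keys onto struct fields using the tags.
func decodeSpec(raw []byte) (trimmedSpec, error) {
	var spec trimmedSpec
	err := json.Unmarshal(raw, &spec)
	return spec, err
}

// encodeSpec serializes the struct; omitempty drops zero-valued fields.
func encodeSpec(spec trimmedSpec) (string, error) {
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	raw := []byte(`{"version":"v0.3.8","repository":"ghcr.io/k8sgpt-ai/k8sgpt","noCache":false}`)

	spec, err := decodeSpec(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)

	// noCache (false) and filters (nil) are zero values, so they are omitted.
	out, err := encodeSpec(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```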
k8sgpt-operator/controllers/k8sgpt_controller.go:
The operator’s controller logic is implemented here. The core responsibility of the controller is to reconcile the desired state defined in the CR with the actual state in the cluster.
func (r *K8sGPTReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	instance := K8sGPTInstance{
		r:      r,
		req:    req,
		ctx:    ctx,
		logger: k8sgptControllerLog,
	}

	initStep := InitStep{}
	finalizerStep := FinalizerStep{}
	configureStep := ConfigureStep{}
	preAnalysisStep := PreAnalysisStep{}
	analysisStep := AnalysisStep{}
	resultStatusStep := ResultStatusStep{}

	initStep.setNext(&finalizerStep)
	finalizerStep.setNext(&configureStep)
	configureStep.setNext(&preAnalysisStep)
	preAnalysisStep.setNext(&analysisStep)
	analysisStep.setNext(&resultStatusStep)

	return initStep.execute(&instance)
}
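The `setNext`/`execute` calls above form a chain of steps, each handing over to the next. The following standalone sketch shows how such a chain can behave. It is a simplified model, not the operator's implementation: `instance`, `step`, `namedStep`, and `buildChain` are hypothetical, and `execute` here returns only an `error`, whereas the operator's steps return `(ctrl.Result, error)`.

```go
package main

import (
	"errors"
	"fmt"
)

// instance is a hypothetical stand-in for K8sGPTInstance. It records
// which steps ran and can be told to fail at a given step.
type instance struct {
	trace  []string
	failAt string
}

// step is a hypothetical interface modeled on the setNext/execute calls.
type step interface {
	execute(in *instance) error
	setNext(next step)
}

// namedStep does no real work: it logs its name, then delegates.
type namedStep struct {
	name string
	next step
}

func (s *namedStep) setNext(next step) { s.next = next }

func (s *namedStep) execute(in *instance) error {
	in.trace = append(in.trace, s.name)
	if in.failAt == s.name {
		return errors.New(s.name + " failed") // stop the chain here
	}
	if s.next == nil {
		return nil // last step in the chain
	}
	return s.next.execute(in)
}

// buildChain links the steps in order and returns the first one.
func buildChain(first string, rest ...string) step {
	head := &namedStep{name: first}
	prev := head
	for _, n := range rest {
		cur := &namedStep{name: n}
		prev.setNext(cur)
		prev = cur
	}
	return head
}

func main() {
	chain := buildChain("init", "finalizer", "configure", "preAnalysis", "analysis", "resultStatus")

	ok := &instance{}
	err := chain.execute(ok)
	fmt.Println(ok.trace, err)

	bad := &instance{failAt: "configure"}
	err = chain.execute(bad)
	fmt.Println(bad.trace, err)
}
```

One benefit of this layout is that a failing step short-circuits everything after it, so later steps never run against a half-configured instance.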
k8sgpt-operator/pkg/resources/k8sgpt.go (Resource handling): Helper functions in k8sgpt.go manage Kubernetes resources like Deployments and Services for K8sGPT workloads.
// GetDeployment Create deployment with the latest K8sGPT image
func GetDeployment(config v1alpha1.K8sGPT, outOfClusterMode bool, c client.Client) (*appsv1.Deployment, error) {
	// Create deployment
	image := config.Spec.Repository + ":" + config.Spec.Version
	replicas := int32(1)
	deployment := appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      config.Name,
			Namespace: config.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				{
					Kind:               config.Kind,
					Name:               config.Name,
					UID:                config.UID,
					APIVersion:         config.APIVersion,
					BlockOwnerDeletion: utils.PtrBool(true),
					Controller:         utils.PtrBool(true),
				},
			},
		},
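The excerpt builds the container image reference by joining `Spec.Repository` and `Spec.Version` with a colon. The sketch below isolates that one step. `imageRef` is a hypothetical helper, and the empty-value check is an addition for illustration: the excerpt itself concatenates directly, and whether validation happens elsewhere in the operator is not visible from it.

```go
package main

import (
	"errors"
	"fmt"
)

// imageRef is a hypothetical helper that joins repository and version the
// same way the excerpt does, with an added guard against empty values,
// which would otherwise yield a malformed reference such as "repo:".
func imageRef(repository, version string) (string, error) {
	if repository == "" || version == "" {
		return "", errors.New("repository and version must both be set")
	}
	return repository + ":" + version, nil
}

func main() {
	// Values taken from the sample custom resource in this article.
	ref, err := imageRef("ghcr.io/k8sgpt-ai/k8sgpt", "v0.3.8")
	fmt.Println(ref, err)

	// A missing version is reported instead of producing "repo:".
	_, err = imageRef("ghcr.io/k8sgpt-ai/k8sgpt", "")
	fmt.Println(err)
}
```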
Sample Custom Resource:
- k8sgpt-operator/config/samples/valid_k8sgpt.yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: default
spec:
  ai:
    enabled: true
    model: gpt-3.5-turbo
    backend: openai
    secret:
      name: k8sgpt-sample-secret
      key: openai-api-key
    # anonymized: false
    # language: english
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.8
Conclusion
The K8sGPT Operator is a powerful tool that simplifies and automates the integration of AI-driven diagnostics within Kubernetes environments. By building on the Kubernetes Operator pattern, it bridges the gap between theoretical concepts and practical implementation, making it easier for DevOps teams to deploy, configure, and manage K8sGPT workloads.
From its custom resource definitions (CRDs) to the controller logic and resource management code, the operator ensures seamless reconciliation of your cluster’s desired and actual states. It provides a flexible and extensible framework for cluster diagnostics while supporting integrations with external systems like Prometheus and Slack.
Through this exploration, we’ve covered the K8sGPT Operator’s theoretical foundation and delved into its inner workings, such as the CRD schema, reconciliation loop, and resource generation. This holistic understanding equips you with the knowledge to deploy and utilize the operator effectively in real-world scenarios.
The K8sGPT Operator exemplifies the synergy between AI and Kubernetes, enabling smarter cluster management. Implementing and customizing this operator for your specific use cases will empower your workflows with automation, observability, and actionable insights, paving the way for more resilient and efficient Kubernetes operations.