🤖 The tool that will replace your DevOps/SRE/System Engineering team — DevOps GPT 🤖
“The tool that will replace your DevOps/SRE/System Engineering team” — Now that I have your attention, let’s talk about automation in the world of DevOps.
At a high level, the DevOps/SRE/System Engineering teams handle two critical tasks:
1️⃣ Writing and deploying the code
2️⃣ Monitoring the infrastructure post-deployment
While the first task has seen a lot of automation, the second, monitoring and troubleshooting, is still a manual, time-consuming process. Most solutions today just send logs to a centralized system and rely on alerts, leaving engineers to figure out the root cause themselves.
But what if LLMs could do the heavy lifting?
💡 Introducing DevOps GPT — An AI-driven agent that proactively monitors logs and suggests solutions in real-time.
🔍 How It Works:
✅ AI Agent runs directly on the server
✅ Detects errors from logs
✅ First, it checks its internal knowledge base (built from experience)
✅ Caches previous solutions to save costs and avoid redundant LLM calls
✅ If no match is found, it queries models such as OpenAI, Llama 3, or DeepSeek for recommendations
✅ Sends the recommended solution via Slack (more integrations coming soon!)
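The cache-first lookup described above can be sketched in shell. This is purely an illustrative sketch, not the agent's actual code: the hashing scheme and the `query_llm` helper are assumptions, and only the cache path matches what the service logs later in this post.

```shell
#!/bin/sh
# Illustrative sketch of the agent's cache-first flow (assumptions noted above).
CACHE_DIR="${CACHE_DIR:-/var/cache/devops-gpt/openai}"

query_llm() {
    # Placeholder: in the real agent this would be an OpenAI/Ollama call
    echo "LLM-suggested fix for: $1"
}

lookup_error() {
    mkdir -p "$CACHE_DIR"
    # Key the cache on a hash of the error message
    key=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
    if [ -f "$CACHE_DIR/$key" ]; then
        cat "$CACHE_DIR/$key"          # cache hit: no LLM call needed
    else
        solution=$(query_llm "$1")     # cache miss: ask the LLM
        printf '%s\n' "$solution" | tee "$CACHE_DIR/$key"
    fi
}
```

The key point is that a repeated error never triggers a second LLM call, which is where the cost savings come from.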
Installation
Step 1: Install the RPM Package
Run the following command on a RHEL-based system to install the DevOps-GPT agent (installation requires root privileges):
sudo rpm -ivh https://github.com/thedevops-gpt/devops-gpt/releases/download/0.0.1/devops-gpt-0.0.1-0.el9.x86_64.rpm
Note: If you encounter dependency issues, resolve them with:
sudo yum -y install python3-pip
Step 2: Configure the Agent
You can configure OpenAI, DeepSeek, or Llama 3 based on your requirements. Run the configuration command and follow the prompts:
# OpenAI
sudo devops-gpt-configure
Enter check interval in seconds [10]:
Enter batch size [1]:
Enter maximum errors per batch [10]:
Enter error window in seconds [3600]:
Available LLM providers:
1. OpenAI (requires API key)
2. Ollama (local)
Choose LLM provider (1/2) [1]:
Enter OpenAI API key: sk-XXXXX
Enable Slack notifications? (y/n) [y]:
Enter Slack webhook URL: https://hooks.slack.com/services/XXXXXXXXX
Configuration saved to /etc/devops-gpt/config.yaml
# Ollama (DeepSeek or Llama 3)
sudo devops-gpt-configure
Enter check interval in seconds [10]:
Enter batch size [1]:
Enter maximum errors per batch [10]:
Enter error window in seconds [3600]:
Available LLM providers:
1. OpenAI (requires API key)
2. Ollama (local)
Choose LLM provider (1/2) [1]: 2
Available Ollama models:
1. Llama 3.3
2. DeepSeek
Choose Ollama model (1/2) [1]: 2
Enable Slack notifications? (y/n) [y]:
Enter Slack webhook URL: https://hooks.slack.com/services/XXXXXX
Configuration saved to /etc/devops-gpt/config.yaml
Configuration Prompts:
- Enter check interval in seconds [10]: Set the interval between checks (default: 10 seconds).
- Enter batch size [1]: Define the number of errors to process in a batch (default: 1).
- Enter maximum errors per batch [10]: Set the maximum number of errors in a batch (default: 10).
- Enter error window in seconds [3600]: Specify the time window for error tracking (default: 1 hour).
Choose LLM provider:
- OpenAI (requires an API key)
- Ollama (local LLM that serves Llama 3.3 or DeepSeek)
- Enter OpenAI API key: Provide your OpenAI API key (https://platform.openai.com/api-keys).
- Enable Slack notifications? (y/n): Enable or disable Slack notifications.
- Enter Slack webhook URL: Add your Slack Webhook URL (https://api.slack.com/messaging/webhooks).
- The configuration is saved to /etc/devops-gpt/config.yaml.
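For reference, here is roughly what the generated file looks like, reconstructed from the debug output shown later in the service logs. Only the keys visible in that log are listed; the exact contents may vary by version and provider.

```yaml
# /etc/devops-gpt/config.yaml (illustrative; written by devops-gpt-configure)
log_paths:
  - system
  - syslog
check_interval: 10        # seconds between log checks
batch_size: 1             # errors processed per batch
max_errors_per_batch: 10
error_window: 3600        # error-tracking window in seconds (1 hour)
llm_provider: openai      # or "ollama"
```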
Step 3: Start and Enable the Service
Start, enable, and verify the DevOps-GPT agent service:
sudo systemctl start devops-gpt
sudo systemctl enable devops-gpt
sudo systemctl status devops-gpt
● devops-gpt.service - DevOps GPT Service
Loaded: loaded (/usr/lib/systemd/system/devops-gpt.service; disabled; preset: disabled)
Active: active (running) since Sun 2025-02-02 01:53:24 UTC; 4s ago
Main PID: 28448 (python3)
Tasks: 1 (limit: 48154)
Memory: 26.9M
CPU: 618ms
CGroup: /system.slice/devops-gpt.service
└─28448 /usr/bin/python3 -m devops_gpt.main
Feb 02 01:53:24 plakhera.example.com systemd[1]: Started DevOps GPT Service.
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,646 - __main__ - INFO - Starting DevOps GPT service...
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,646 - devops_gpt.config_manager - INFO - Loading config from: /etc/devops-gpt/config.yaml
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: Debug: Loaded config contents: {'log_paths': ['system', 'syslog'], 'check_interval': 10, 'batch_size': 1, 'max_errors_per_batch': 10, 'error_window': 3600, 'llm_provider': 'openai', 'op>
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,654 - __main__ - INFO - Configuration loaded. Monitoring paths: ['system', 'syslog']
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,654 - devops_gpt.log_monitor - INFO - DevOps GPT Log Monitor initialized with paths: ['system', 'syslog']
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,672 - devops_gpt.cache.local_cache - INFO - DevOps GPT Cache initialized at /var/cache/devops-gpt/openai
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,672 - devops_gpt.llms.openai_provider - INFO - DevOps GPT OpenAI provider initialized
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,679 - devops_gpt.error_patterns - INFO - DevOps GPT Error Analyzer initialized with 20 patterns
Feb 02 01:53:24 plakhera.example.com devops-gpt[28448]: 2025-02-02 01:53:24,679 - __main__ - INFO - DevOps GPT initialized with OPENAI as LLM provider
Once the service is up and running, you will receive an alert on Slack.
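For anyone curious what a Slack alert involves under the hood, an incoming webhook simply accepts an HTTP POST with a JSON body. The helper below is a rough illustration, not the agent's actual code: the function name, payload shape, and environment variable are assumptions, and the URL is the placeholder from the configuration step.

```shell
#!/bin/sh
# Illustrative sketch of posting an alert to a Slack incoming webhook.
SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:-https://hooks.slack.com/services/XXXXXXXXX}"

slack_notify() {
    # Incoming webhooks accept a JSON body with a "text" field.
    # Real messages would need proper JSON escaping; kept simple here.
    payload=$(printf '{"text": "%s"}' "$1")
    curl -s -X POST -H 'Content-Type: application/json' \
         -d "$payload" "$SLACK_WEBHOOK_URL"
}
```

You can verify your webhook works with a one-off call such as `slack_notify "DevOps GPT test alert"` before relying on the service to send alerts.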
💬 Give it a try! Your feedback is invaluable as I refine this tool. I’ll be sharing more updates throughout the week — stay tuned!
🔗 For more details about the installation process, see the following link: https://github.com/thedevops-gpt/devops-gpt