OpenShift Logging Complete Guide: Log Collection, Analysis, and Monitoring [2026]
![OpenShift Logging Complete Guide: Log Collection, Analysis, and Monitoring [2026]](/images/blog/openshift/openshift-logging-hero.webp)
When something goes wrong with your system, logs are your best friend. But in Kubernetes environments, Pods can be deleted and recreated at any time, and logs disappear with them.
OpenShift Logging solves this problem by centrally collecting, storing, and querying all logs. No matter where Pods run or how long they live, logs won't be lost.
This article provides a complete introduction to OpenShift Logging architecture and configuration to help you build a reliable log management system. If you're not familiar with OpenShift yet, we recommend first reading the OpenShift Complete Guide.
Introduction to OpenShift Logging
Importance of Logs
In container environments, logs are:
- First-hand data for troubleshooting: Application errors, performance issues
- Key for security audits: Who did what, when
- Compliance requirements: Many regulations require log retention
- Source for business analysis: User behavior, usage patterns
Operating without a logging system is like driving without a dashboard.
Architecture Overview
OpenShift Logging consists of three major parts:
┌──────────────────────────────────────────────────────┐
│                      Log Sources                     │
│  ┌─────────────┐ ┌──────────────┐ ┌─────────────┐    │
│  │ Application │ │Infrastructure│ │    Audit    │    │
│  │    Logs     │ │     Logs     │ │    Logs     │    │
│  └──────┬──────┘ └──────┬───────┘ └──────┬──────┘    │
│         └───────────────┼────────────────┘           │
│                         ▼                            │
│                    ┌─────────┐                       │
│                    │ Vector  │ ← Log Collector       │
│                    └────┬────┘                       │
│                         ▼                            │
│                    ┌─────────┐                       │
│                    │  Loki   │ ← Log Storage         │
│                    └────┬────┘                       │
│                         ▼                            │
│                    ┌─────────┐                       │
│                    │ Console │ ← Log Query           │
│                    └─────────┘                       │
└──────────────────────────────────────────────────────┘
5.x vs 6.x Differences
OpenShift Logging had major architecture changes in version 6.x:
| Aspect | 5.x | 6.x |
|---|---|---|
| Log Collector | Fluentd | Vector |
| Log Storage | Elasticsearch | LokiStack (recommended) |
| Configuration | ClusterLogging CR | ClusterLogForwarder primary |
| Architecture Complexity | More complex | Simplified |
| Performance | Average | Improved |
6.x is the currently recommended version; this article focuses on 6.x.
Logging 6.x New Features
Vector Log Collector
Vector replaces Fluentd as the default collector.
Vector's advantages:
- Better performance: Written in Rust, more memory and CPU efficient
- Simpler configuration: More intuitive syntax
- More complete features: Built-in transformation, routing, filtering
Vector is deployed as a DaemonSet, with one collector Pod per node:
oc get pods -n openshift-logging -l app.kubernetes.io/component=collector
LokiStack
LokiStack is the recommended log storage solution, replacing Elasticsearch.
Loki's design philosophy differs from Elasticsearch:
- Only indexes metadata (timestamps, labels), not log content
- Lower storage costs
- Fast queries (when labels are known)
- Easier horizontal scaling
LokiStack architecture:
┌──────────────────────────────────────┐
│              LokiStack               │
│    ┌──────────┐    ┌──────────┐      │
│    │ Ingester │    │ Querier  │      │
│    └────┬─────┘    └────┬─────┘      │
│         │               │            │
│    ┌────▼───────────────▼────┐       │
│    │     Object Storage      │       │
│    │       (S3 / ODF)        │       │
│    └─────────────────────────┘       │
└──────────────────────────────────────┘
Simplified Architecture
6.x architecture is more streamlined:
Old Architecture (5.x): Fluentd → Kafka (optional) → Elasticsearch → Kibana
New Architecture (6.x): Vector → LokiStack → OpenShift Console
No separate Kibana is needed; you query logs directly in the OpenShift Console.
Installation and Configuration
Install Operators
Step 1: Install Loki Operator
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: stable-6.0
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Step 2: Install Cluster Logging Operator
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: stable-6.0
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
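Both Subscriptions assume the target namespaces and OperatorGroups already exist. If not, create them first; a minimal sketch for the openshift-logging side (names follow common Red Hat documentation conventions, adjust as needed):

```yaml
# Namespace for the Cluster Logging Operator
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  labels:
    openshift.io/cluster-monitoring: "true"
---
# OperatorGroup scoping the operator to this namespace
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
    - openshift-logging
```

The Loki Operator is typically installed cluster-wide in openshift-operators-redhat with an all-namespaces OperatorGroup.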
Configure LokiStack
LokiStack requires object storage. The following uses AWS S3 as an example:
Step 1: Create Secret (S3 Credentials)
apiVersion: v1
kind: Secret
metadata:
  name: logging-loki-s3
  namespace: openshift-logging
stringData:
  access_key_id: "<AWS_ACCESS_KEY>"
  access_key_secret: "<AWS_SECRET_KEY>"
  bucketnames: "openshift-logging-loki"
  endpoint: "https://s3.ap-northeast-1.amazonaws.com"
  region: "ap-northeast-1"
Step 2: Create LokiStack
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
      - version: v13
        effectiveDate: "2024-10-01"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
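After applying the LokiStack CR, it takes a few minutes for all components to come up. You can watch progress with standard oc commands (resource names follow the manifest above; the label selector is an assumption, a plain `oc get pods -n openshift-logging` also works):

```shell
# Wait for the LokiStack to report Ready
oc get lokistack logging-loki -n openshift-logging

# Watch the component Pods (distributor, ingester, querier, ...) start
oc get pods -n openshift-logging -l app.kubernetes.io/instance=logging-loki
```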
Configure ClusterLogForwarder
ClusterLogForwarder defines how logs are collected and forwarded:
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: cluster-logging-operator
  outputs:
    - name: default-lokistack
      type: lokiStack
      lokiStack:
        target:
          name: logging-loki
          namespace: openshift-logging
        authentication:
          token:
            from: serviceAccount
  pipelines:
    - name: default-logstore
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - default-lokistack
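In Logging 6.x, the service account referenced by the ClusterLogForwarder must be authorized to collect each log type via the collect-* ClusterRoles shipped with the operator. A sketch of the binding for application logs (repeat with collect-infrastructure-logs for infrastructure logs; the service account name matches the manifest above):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: collect-application-logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-application-logs
subjects:
  - kind: ServiceAccount
    name: cluster-logging-operator
    namespace: openshift-logging
```

Equivalently: `oc adm policy add-cluster-role-to-user collect-application-logs -z cluster-logging-operator -n openshift-logging`.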
Storage Configuration Recommendations
| Cluster Size | LokiStack Size | Storage Needed/Day | Recommended Retention |
|---|---|---|---|
| Small (<50 Pods) | 1x.extra-small | ~10 GB | 7 days |
| Medium (50-200 Pods) | 1x.small | ~50 GB | 14 days |
| Large (200+ Pods) | 1x.medium | ~200 GB | 30 days |
Loki Integration
LokiStack Architecture Details
LokiStack includes multiple components:
| Component | Function |
|---|---|
| Distributor | Receives logs, distributes to Ingesters |
| Ingester | Temporarily stores logs, writes to storage |
| Querier | Handles query requests |
| Query Frontend | Query caching, splitting |
| Compactor | Compresses old data |
LogQL Query Syntax
Loki uses LogQL query language, similar to PromQL:
Basic Queries:
# Query logs for specific namespace
{kubernetes_namespace_name="my-app"}
# Query specific Pod
{kubernetes_pod_name="my-pod-abc123"}
# Multiple conditions
{kubernetes_namespace_name="my-app", kubernetes_container_name="web"}
Filter Log Content:
# Logs containing "error"
{kubernetes_namespace_name="my-app"} |= "error"
# Logs not containing "debug"
{kubernetes_namespace_name="my-app"} != "debug"
# Regular expression
{kubernetes_namespace_name="my-app"} |~ "error|exception"
JSON Parsing:
# Parse JSON logs, filter level=error
{kubernetes_namespace_name="my-app"} | json | level="error"
# Extract specific fields
{kubernetes_namespace_name="my-app"} | json | line_format "{{.message}}"
Statistical Queries:
# Error count in past 5 minutes
count_over_time({kubernetes_namespace_name="my-app"} |= "error" [5m])
# Log volume per minute
rate({kubernetes_namespace_name="my-app"} [1m])
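Metric queries can also be grouped with `sum by`, which is handy for spotting which Pod produces the most errors (label names as in the examples above):

```logql
# Error count per Pod over the past 5 minutes
sum by (kubernetes_pod_name) (
  count_over_time({kubernetes_namespace_name="my-app"} |= "error" [5m])
)
```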
Query in Console
The OpenShift Console has a built-in log query interface:
- Go to Observe → Logs
- Select log type (Application / Infrastructure / Audit)
- Enter LogQL query
- View results
You can also use the CLI:
# Install logcli (binaries are available from the Grafana Loki releases)
# Query the latest 100 log lines
logcli query '{kubernetes_namespace_name="my-app"}' --limit=100
Log Types
OpenShift categorizes logs into three major types:
Application Logs
Logs from user-deployed applications (namespaces other than openshift-*):
- Container stdout/stderr
- Log files written by applications
Label example:
{
  log_type: "application",
  kubernetes_namespace_name: "my-app",
  kubernetes_pod_name: "web-abc123",
  kubernetes_container_name: "web"
}
Infrastructure Logs
From OpenShift system components:
- Pods in openshift-* namespaces
- Pods in kube-* namespaces
- Node system logs (journald)
Label example:
{
  log_type: "infrastructure",
  kubernetes_namespace_name: "openshift-apiserver"
}
Audit Logs
System operation audit records:
- Kubernetes API Audit: Who did what to the API
- OpenShift API Audit: OAuth, Route operations, etc.
- OVN Audit: Network operations
- Node Audit: Linux auditd logs
Audit logs are critical for compliance.
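Audit logs can be queried with the same LogQL syntax as other log types. A sketch of finding who deleted Pods (field names follow the Kubernetes audit event schema; the json parser flattens nested fields with underscores):

```logql
# Pod deletions recorded in the Kubernetes API audit log
{log_type="audit"} | json | verb="delete" | objectRef_resource="pods"
```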
Log Forwarding
Besides storing to LokiStack, logs can be forwarded to external systems.
Forward to Kafka
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: kafka-receiver
      type: kafka
      kafka:
        brokers:
          - kafka.example.com:9092
        topic: openshift-logs
  pipelines:
    - name: to-kafka
      inputRefs:
        - application
      outputRefs:
        - kafka-receiver
Forward to Splunk
outputs:
  - name: splunk-receiver
    type: splunk
    splunk:
      url: https://splunk.example.com:8088
      token:
        secretName: splunk-token
        key: token
Forward to Cloud Services
AWS CloudWatch:
outputs:
  - name: cloudwatch
    type: cloudwatch
    cloudwatch:
      region: ap-northeast-1
      groupBy: namespaceName
    secret:
      name: cloudwatch-credentials
Azure Monitor:
outputs:
  - name: azure-monitor
    type: azureMonitor
    azureMonitor:
      customerId: "<workspace-id>"
      logType: openshift
    secret:
      name: azure-monitor-secret
Log forwarding architecture needs to consider latency, reliability, and cost. Book an architecture consultation and let us help you design the best solution.
Audit Log Management
Kubernetes API Audit
Records all operations to Kubernetes API:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods",
  "verb": "create",
  "user": {
    "username": "admin",
    "groups": ["system:authenticated"]
  },
  "responseStatus": {
    "code": 201
  }
}
Audit Policy
OpenShift's default audit policy balances completeness and storage. To adjust:
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  audit:
    profile: Default  # or WriteRequestBodies, AllRequestBodies
| Profile | Description | Log Volume |
|---|---|---|
| Default | Records metadata | Medium |
| WriteRequestBodies | Records write operation bodies | Large |
| AllRequestBodies | Records all operation bodies | Very Large |
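You can inspect and change the active profile with oc (a sketch; patching the APIServer resource triggers a rolling restart of the API server Pods, so plan accordingly):

```shell
# Show the current audit profile
oc get apiserver cluster -o jsonpath='{.spec.audit.profile}'

# Switch to WriteRequestBodies
oc patch apiserver cluster --type=merge \
  -p '{"spec":{"audit":{"profile":"WriteRequestBodies"}}}'
```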
Compliance Requirements
Log requirements of common compliance standards:
| Standard | Requirements |
|---|---|
| PCI DSS | Retain at least 1 year, accessible for 3 months |
| SOC 2 | System access logs, change logs |
| HIPAA | Access auditing, integrity protection |
| ISO 27001 | Log protection, regular review |
Set log retention policy:
apiVersion: loki.grafana.com/v1
kind: LokiStack
spec:
  limits:
    global:
      retention:
        days: 30
        streams:
          - selector: '{log_type="audit"}'
            priority: 1
            days: 365  # Audit logs retained 1 year
Best Practices
Log Retention Policy
Tiered Retention:
| Log Type | Recommended Retention | Notes |
|---|---|---|
| Application Logs | 7-30 days | Based on troubleshooting needs |
| Infrastructure Logs | 14-30 days | System troubleshooting |
| Audit Logs | 90-365 days | Compliance requirements |
Resource Planning
Vector (Collector):
- ~200-500 MB RAM per node
- CPU depends on log volume
LokiStack:
- Ingesters need more memory
- Choose appropriate size
# Production recommendation
spec:
  size: 1x.medium  # or larger
  replication:
    factor: 3  # Replica count
Performance Tuning
1. Use Label Filtering
# Good: Filter by labels first, reduce scan volume
{kubernetes_namespace_name="my-app"} |= "error"
# Bad: No labels, scans everything
{} |= "error"
2. Limit Time Range
# Good: Explicit time range
{kubernetes_namespace_name="my-app"} |= "error"
# Select "Past 1 hour" when querying
# Bad: Query "Past 30 days" of massive data
3. Appropriate Log Levels
Applications should:
- Use INFO or WARN level in production
- Avoid excessive DEBUG logs
- Structured logs (JSON) are easier to query
Security Considerations
1. Access Control
Limit who can see which logs:
# RBAC: Only allow viewing logs in your own namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-viewer
  namespace: my-app
subjects:
  - kind: User
    name: developer
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-logging-application-view
  apiGroup: rbac.authorization.k8s.io
2. Sensitive Data
Avoid logging sensitive information:
- Passwords, API Keys
- Personally Identifiable Information (PII)
- Credit card numbers
You can use Vector's filtering to remove sensitive fields before forwarding. Note that the prune filter drops whole fields rather than masking substrings: paths listed under in are removed. A sketch that drops hypothetical sensitive label fields (the filter must also be referenced by a pipeline's filterRefs):
spec:
  filters:
    - name: drop-sensitive
      type: prune
      prune:
        in:  # field paths to remove (hypothetical examples)
          - .kubernetes.labels.password
          - .kubernetes.labels.secret
  pipelines:
    - name: default-logstore
      filterRefs:
        - drop-sensitive
Common Troubleshooting
Missing Logs
Symptom: Some Pod logs don't appear
Troubleshoot:
# Check Vector Pod status
oc get pods -n openshift-logging -l app.kubernetes.io/component=collector
# Check Vector logs
oc logs -n openshift-logging -l app.kubernetes.io/component=collector
# Check ClusterLogForwarder status
oc get clusterlogforwarder -n openshift-logging -o yaml
Common causes:
- Pod logs are not written to stdout/stderr (the standard location)
- Vector has insufficient resources
- The target storage has issues
Performance Issues
Symptom: Queries very slow or timeout
Solutions:
- Narrow query scope (time, namespace)
- Increase LokiStack resources
- Check object storage latency
Insufficient Storage Space
Symptom: Logs stop writing
Solutions:
- Check storage usage
- Adjust retention policy
- Expand storage
# Check PVC usage
oc get pvc -n openshift-logging
FAQ
Q1: Why did 6.x switch to Loki instead of Elasticsearch?
Loki's design is a better fit for logging: (1) it indexes only labels, not log content, so storage costs are much lower; (2) its architecture is simpler, reducing the operations burden; (3) it integrates well with the Prometheus ecosystem. Elasticsearch is powerful but somewhat over-engineered for logs, and resource-intensive.
Q2: Can old Elasticsearch logs be migrated to Loki?
Technically possible, but usually not recommended. A more pragmatic approach: (1) keep the old Elasticsearch available for reading old logs; (2) send new logs to LokiStack; (3) let the old logs age out naturally after their retention period.
Q3: Log volume is huge, how to control costs?
Several directions: (1) collect only the logs you need, using ClusterLogForwarder to filter; (2) shorten the retention period; (3) use cheaper object storage (e.g. S3 Standard-IA); (4) increase the compression ratio; (5) avoid DEBUG-level logging in production.
Q4: Can I send to both Loki and Splunk simultaneously?
Yes. ClusterLogForwarder supports multiple outputs, so the same logs can be sent to multiple destinations. But watch network bandwidth and Splunk licensing costs.
Q5: How do I know if logs are being lost?
Several checkpoints: (1) compare the Pod count against the number of Pods with logs; (2) monitor Vector's dropped-event metrics; (3) spot-check specific Pod logs for completeness; (4) set up alerts for sudden drops in log volume.
Logging Systems are Operations' Eyes
Poor design will leave you blind to problems. From log architecture to retention policy, every decision affects subsequent troubleshooting efficiency.
Book an architecture consultation and let us help you optimize your logging architecture.