mirror of
https://github.com/morten-olsen/homelab-nuclei-operator.git
synced 2026-02-08 02:16:23 +01:00
feat: implement pod-based scanning architecture
This major refactor moves from synchronous subprocess-based scanning to asynchronous pod-based scanning using Kubernetes Jobs.

## Architecture Changes

- Scanner jobs are now Kubernetes Jobs with TTLAfterFinished for automatic cleanup
- Jobs have owner references for garbage collection when NucleiScan is deleted
- Configurable concurrency limits, timeouts, and resource requirements

## New Features

- Dual-mode binary: --mode=controller (default) or --mode=scanner
- Annotation-based configuration for Ingress/VirtualService resources
- Operator-level configuration via environment variables
- Startup recovery for orphaned scans after operator restart
- Periodic cleanup of stuck jobs

## New Files

- DESIGN.md: Comprehensive architecture design document
- internal/jobmanager/: Job Manager for creating/monitoring scanner jobs
- internal/scanner/runner.go: Scanner mode implementation
- internal/annotations/: Annotation parsing utilities
- charts/nuclei-operator/templates/scanner-rbac.yaml: Scanner RBAC

## API Changes

- Added ScannerConfig struct for per-scan scanner configuration
- Added JobReference struct for tracking scanner jobs
- Added ScannerConfig field to NucleiScanSpec
- Added JobRef and ScanStartTime fields to NucleiScanStatus

## Supported Annotations

- nuclei.homelab.mortenolsen.pro/enabled
- nuclei.homelab.mortenolsen.pro/templates
- nuclei.homelab.mortenolsen.pro/severity
- nuclei.homelab.mortenolsen.pro/schedule
- nuclei.homelab.mortenolsen.pro/timeout
- nuclei.homelab.mortenolsen.pro/scanner-image

## RBAC Updates

- Added Job and Pod permissions for operator
- Created separate scanner service account with minimal permissions

## Documentation

- Updated README, user-guide, api.md, and Helm chart README
- Added example annotated Ingress resources
100
docs/api.md
@@ -10,6 +10,8 @@ This document provides a complete reference for the Nuclei Operator Custom Resou
  - [Status](#status)
- [Type Definitions](#type-definitions)
  - [SourceReference](#sourcereference)
  - [ScannerConfig](#scannerconfig)
  - [JobReference](#jobreference)
  - [Finding](#finding)
  - [ScanSummary](#scansummary)
  - [ScanPhase](#scanphase)
@@ -62,6 +64,16 @@ spec:
    - critical
  schedule: "@every 24h"
  suspend: false
  scannerConfig:
    image: "custom-scanner:latest"
    timeout: "1h"
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi
```

#### Spec Fields
@@ -74,6 +86,7 @@ spec:
| `severity` | []string | No | Severity filter. Valid values: `info`, `low`, `medium`, `high`, `critical` |
| `schedule` | string | No | Cron schedule for periodic rescanning |
| `suspend` | bool | No | When true, suspends scheduled scans |
| `scannerConfig` | [ScannerConfig](#scannerconfig) | No | Scanner-specific configuration overrides |

#### Schedule Format

@@ -110,6 +123,12 @@ status:
  lastScanTime: "2024-01-15T10:30:00Z"
  completionTime: "2024-01-15T10:35:00Z"
  nextScheduledTime: "2024-01-16T10:30:00Z"
  scanStartTime: "2024-01-15T10:30:05Z"
  jobRef:
    name: my-app-scan-abc123
    uid: "job-uid-12345"
    podName: my-app-scan-abc123-xyz
    startTime: "2024-01-15T10:30:00Z"
  summary:
    totalFindings: 3
    findingsBySeverity:
@@ -127,6 +146,7 @@ status:
      timestamp: "2024-01-15T10:32:00Z"
  lastError: ""
  observedGeneration: 1
  retryCount: 0
```

#### Status Fields
@@ -138,10 +158,14 @@ status:
| `lastScanTime` | *Time | When the last scan was initiated |
| `completionTime` | *Time | When the last scan completed |
| `nextScheduledTime` | *Time | When the next scheduled scan will run |
| `scanStartTime` | *Time | When the scanner pod actually started scanning |
| `jobRef` | *[JobReference](#jobreference) | Reference to the current or last scanner job |
| `summary` | *[ScanSummary](#scansummary) | Aggregated scan statistics |
| `findings` | [][Finding](#finding) | Array of scan results |
| `lastError` | string | Error message if the scan failed |
| `observedGeneration` | int64 | Generation observed by the controller |
| `retryCount` | int | Number of consecutive availability check retries |
| `lastRetryTime` | *Time | When the last availability check retry occurred |

#### Conditions

@@ -188,6 +212,82 @@ type SourceReference struct {
| `namespace` | string | Yes | Namespace of the source resource |
| `uid` | string | Yes | UID of the source resource |

### ScannerConfig

`ScannerConfig` defines scanner-specific configuration that can override default settings.

```go
type ScannerConfig struct {
	Image        string                       `json:"image,omitempty"`
	Resources    *corev1.ResourceRequirements `json:"resources,omitempty"`
	Timeout      *metav1.Duration             `json:"timeout,omitempty"`
	TemplateURLs []string                     `json:"templateURLs,omitempty"`
	NodeSelector map[string]string            `json:"nodeSelector,omitempty"`
	Tolerations  []corev1.Toleration          `json:"tolerations,omitempty"`
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `image` | string | No | Override the default scanner image |
| `resources` | ResourceRequirements | No | Resource requirements for the scanner pod |
| `timeout` | Duration | No | Override the default scan timeout |
| `templateURLs` | []string | No | Additional template repositories to clone |
| `nodeSelector` | map[string]string | No | Node selector for scanner pod scheduling |
| `tolerations` | []Toleration | No | Tolerations for scanner pod scheduling |

**Example:**

```yaml
scannerConfig:
  image: "ghcr.io/custom/scanner:v1.0.0"
  timeout: "1h"
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 2Gi
  nodeSelector:
    node-type: scanner
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "scanner"
      effect: "NoSchedule"
```

### JobReference

`JobReference` contains information about the scanner job for tracking and debugging.

```go
type JobReference struct {
	Name      string       `json:"name"`
	UID       string       `json:"uid"`
	PodName   string       `json:"podName,omitempty"`
	StartTime *metav1.Time `json:"startTime,omitempty"`
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Name of the Kubernetes Job |
| `uid` | string | Yes | UID of the Job |
| `podName` | string | No | Name of the scanner pod (for log retrieval) |
| `startTime` | *Time | No | When the job was created |

**Example:**

```yaml
jobRef:
  name: my-scan-abc123
  uid: "12345678-1234-1234-1234-123456789012"
  podName: my-scan-abc123-xyz
  startTime: "2024-01-15T10:30:00Z"
```

### Finding

`Finding` represents a single vulnerability or issue discovered during a scan.

@@ -7,6 +7,8 @@ This guide provides detailed instructions for using the Nuclei Operator to autom
- [Introduction](#introduction)
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Scanner Architecture](#scanner-architecture)
- [Annotation-Based Configuration](#annotation-based-configuration)
- [Configuration Options](#configuration-options)
- [Working with Ingress Resources](#working-with-ingress-resources)
- [Working with VirtualService Resources](#working-with-virtualservice-resources)
@@ -24,11 +26,13 @@ The Nuclei Operator automates security scanning by watching for Kubernetes Ingre

1. Extracts target URLs from the resource
2. Creates a NucleiScan custom resource
3. Executes a Nuclei security scan
3. Creates a Kubernetes Job to execute the Nuclei security scan in an isolated pod
4. Stores the results in the NucleiScan status

This enables continuous security monitoring of your web applications without manual intervention.

The operator uses a **pod-based scanning architecture** where each scan runs in its own isolated Kubernetes Job, providing better scalability, reliability, and resource control.

---

## Installation
@@ -151,6 +155,224 @@ kubectl apply -f manual-scan.yaml

---

## Scanner Architecture

The nuclei-operator uses a pod-based scanning architecture for improved scalability and reliability:

1. **Operator Pod**: Manages NucleiScan resources and creates scanner jobs
2. **Scanner Jobs**: Kubernetes Jobs that execute nuclei scans in isolated pods
3. **Direct Status Updates**: Scanner pods update NucleiScan status directly via the Kubernetes API

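The commit message describes a dual-mode binary selected with `--mode=controller` (the default) or `--mode=scanner`, which is how the same image can serve as both the operator pod and the scanner jobs. A minimal sketch of that dispatch, using only the standard `flag` package, could look like the following. The helper name `parseMode` and the error text are assumptions for illustration, not the operator's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseMode reads the --mode flag from args and validates its value.
// Hypothetical helper: the real operator may wire its flags differently.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("nuclei-operator", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // parse errors are returned rather than printed
	mode := fs.String("mode", "controller", "run as 'controller' or 'scanner'")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "controller", "scanner":
		return *mode, nil
	default:
		return "", fmt.Errorf("unknown mode %q", *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// A real binary would start the controller manager or run a scan here.
	fmt.Println("starting in mode:", mode)
}
```

Rejecting unknown values up front means a typo in a Job template fails fast instead of silently starting a second controller.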
### Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Kubernetes Cluster                          │
│                                                                     │
│  ┌──────────────────┐     ┌──────────────────────────────────────┐  │
│  │  Operator Pod    │     │         Scanner Jobs                 │  │
│  │                  │     │                                      │  │
│  │  ┌────────────┐  │     │  ┌─────────┐  ┌─────────┐            │  │
│  │  │ Controller │──┼─────┼─▶│  Job 1  │  │  Job 2  │  ...       │  │
│  │  │  Manager   │  │     │  │(Scanner)│  │(Scanner)│            │  │
│  │  └────────────┘  │     │  └────┬────┘  └────┬────┘            │  │
│  │        │         │     │       │            │                 │  │
│  └────────┼─────────┘     └───────┼────────────┼─────────────────┘  │
│           │                       │            │                    │
│           ▼                       ▼            ▼                    │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                     Kubernetes API Server                     │  │
│  │                                                               │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐            │  │
│  │  │ NucleiScan  │  │ NucleiScan  │  │ NucleiScan  │  ...       │  │
│  │  │  Resource   │  │  Resource   │  │  Resource   │            │  │
│  │  └─────────────┘  └─────────────┘  └─────────────┘            │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Benefits

- **Scalability**: Multiple scans can run concurrently across the cluster
- **Isolation**: Each scan runs in its own pod with dedicated resources
- **Reliability**: Scans survive operator restarts
- **Resource Control**: Per-scan resource limits and quotas
- **Observability**: Individual pod logs for each scan

### Scanner Configuration

Configure scanner behavior via Helm values:

```yaml
scanner:
  # Enable scanner RBAC resources
  enabled: true

  # Scanner image (defaults to operator image)
  image: "ghcr.io/morten-olsen/nuclei-operator:latest"

  # Default scan timeout
  timeout: "30m"

  # Maximum concurrent scan jobs
  maxConcurrent: 5

  # Job TTL after completion (seconds)
  ttlAfterFinished: 3600

  # Default resource requirements for scanner pods
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 1Gi

  # Default templates to use
  defaultTemplates: []

  # Default severity filter
  defaultSeverity: []
```

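How `maxConcurrent` might gate job creation can be sketched as a simple admission check: count the scanner jobs that have not finished yet and defer new scans once the limit is reached. The `jobState` type and the `canStartScan` function are hypothetical, and treating a non-positive limit as "no limit" is an assumption of this sketch rather than documented behavior.

```go
package main

import "fmt"

// jobState is a hypothetical, simplified view of a scanner Job.
type jobState struct {
	Name     string
	Finished bool // true once the Job has succeeded or failed
}

// activeJobs counts the jobs that are still running or pending.
func activeJobs(jobs []jobState) int {
	n := 0
	for _, j := range jobs {
		if !j.Finished {
			n++
		}
	}
	return n
}

// canStartScan reports whether another scanner Job may be created without
// exceeding maxConcurrent. A limit <= 0 disables the check (assumption).
func canStartScan(jobs []jobState, maxConcurrent int) bool {
	if maxConcurrent <= 0 {
		return true
	}
	return activeJobs(jobs) < maxConcurrent
}

func main() {
	jobs := []jobState{
		{Name: "scan-a", Finished: false},
		{Name: "scan-b", Finished: true},
		{Name: "scan-c", Finished: false},
	}
	fmt.Println("active jobs:", activeJobs(jobs))
	fmt.Println("may start with limit 5:", canStartScan(jobs, 5))
	fmt.Println("may start with limit 2:", canStartScan(jobs, 2))
}
```

A scan that is refused here would typically be requeued and retried later rather than failed.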
### Per-Scan Scanner Configuration

You can override scanner settings for individual scans using the `scannerConfig` field in the NucleiScan spec:

```yaml
apiVersion: nuclei.homelab.mortenolsen.pro/v1alpha1
kind: NucleiScan
metadata:
  name: custom-scan
spec:
  sourceRef:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: my-ingress
    namespace: default
    uid: "abc123"
  targets:
    - https://example.com
  scannerConfig:
    # Override scanner image
    image: "custom-scanner:latest"
    # Override timeout
    timeout: "1h"
    # Custom resource requirements
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
    # Node selector for scanner pod
    nodeSelector:
      node-type: scanner
    # Tolerations for scanner pod
    tolerations:
      - key: "scanner"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
```

---

## Annotation-Based Configuration

You can configure scanning behavior for individual Ingress or VirtualService resources using annotations.

### Supported Annotations

| Annotation | Type | Default | Description |
|------------|------|---------|-------------|
| `nuclei.homelab.mortenolsen.pro/enabled` | bool | `true` | Enable/disable scanning for this resource |
| `nuclei.homelab.mortenolsen.pro/templates` | string | - | Comma-separated list of template paths or tags |
| `nuclei.homelab.mortenolsen.pro/severity` | string | - | Comma-separated severity filter: info,low,medium,high,critical |
| `nuclei.homelab.mortenolsen.pro/schedule` | string | - | Cron schedule for periodic scans |
| `nuclei.homelab.mortenolsen.pro/timeout` | duration | `30m` | Scan timeout |
| `nuclei.homelab.mortenolsen.pro/scanner-image` | string | - | Override scanner image |
| `nuclei.homelab.mortenolsen.pro/exclude-templates` | string | - | Templates to exclude |
| `nuclei.homelab.mortenolsen.pro/tags` | string | - | Template tags to include |
| `nuclei.homelab.mortenolsen.pro/exclude-tags` | string | - | Template tags to exclude |

### Example Annotated Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nuclei.homelab.mortenolsen.pro/enabled: "true"
    nuclei.homelab.mortenolsen.pro/severity: "medium,high,critical"
    nuclei.homelab.mortenolsen.pro/schedule: "0 2 * * *"
    nuclei.homelab.mortenolsen.pro/templates: "cves/,vulnerabilities/"
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```

### Example Annotated VirtualService

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
  annotations:
    nuclei.homelab.mortenolsen.pro/enabled: "true"
    nuclei.homelab.mortenolsen.pro/severity: "high,critical"
    nuclei.homelab.mortenolsen.pro/timeout: "1h"
    nuclei.homelab.mortenolsen.pro/tags: "cve,oast"
spec:
  hosts:
    - myapp.example.com
  gateways:
    - my-gateway
  http:
    - route:
        - destination:
            host: myapp
            port:
              number: 80
```

### Disabling Scanning

To disable scanning for a specific resource:

```yaml
metadata:
  annotations:
    nuclei.homelab.mortenolsen.pro/enabled: "false"
```

This is useful when you want to temporarily exclude certain resources from scanning without removing them from the cluster.

### Annotation Precedence

When both annotations and NucleiScan spec fields are present, the following precedence applies:

1. **NucleiScan spec fields** (highest priority) - Direct configuration in the NucleiScan resource
2. **Annotations** - Configuration from the source Ingress/VirtualService
3. **Helm values** - Default configuration from the operator deployment
4. **Built-in defaults** (lowest priority) - Hardcoded defaults in the operator

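The precedence order above amounts to "first non-empty value wins". A minimal sketch for a single string setting such as the scan timeout is shown below. The function names are hypothetical, and the built-in default of `30m` used in `main` is assumed for the sake of the example; this guide only documents `30m` as the Helm and annotation default.

```go
package main

import "fmt"

// firstNonEmpty returns the first non-empty string in values, or "" if
// every value is empty.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// resolveSetting applies the documented precedence: NucleiScan spec field,
// then annotation, then Helm value, then built-in default.
func resolveSetting(spec, annotation, helm, builtin string) string {
	return firstNonEmpty(spec, annotation, helm, builtin)
}

func main() {
	// No spec override, so the annotation beats the Helm value.
	fmt.Println(resolveSetting("", "1h", "30m", "30m"))
	// A spec field overrides everything else.
	fmt.Println(resolveSetting("2h", "1h", "30m", "30m"))
}
```

The same pattern extends to list-valued settings such as severity by checking for a non-empty slice instead of a non-empty string.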
---

## Configuration Options

### Severity Filtering
