feat: implement pod-based scanning architecture

This major refactor moves from synchronous subprocess-based scanning to
asynchronous pod-based scanning using Kubernetes Jobs.

## Architecture Changes
- Scans now run as Kubernetes Jobs, with ttlSecondsAfterFinished set so finished Jobs are cleaned up automatically
- Jobs have owner references for garbage collection when NucleiScan is deleted
- Configurable concurrency limits, timeouts, and resource requirements

## New Features
- Dual-mode binary: --mode=controller (default) or --mode=scanner
- Annotation-based configuration for Ingress/VirtualService resources
- Operator-level configuration via environment variables
- Startup recovery for orphaned scans after operator restart
- Periodic cleanup of stuck jobs
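
The dual-mode binary could be wired up roughly as follows. This is a minimal sketch using only the standard `flag` package. The `--mode` flag and its two values come from this commit message; the `parseMode` helper and the flag set name are hypothetical and not taken from the actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseMode reads --mode from args and validates it.
// Hypothetical helper: the real operator may parse its flags differently.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("nuclei-operator", flag.ContinueOnError)
	mode := fs.String("mode", "controller", "run as controller or scanner")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "controller", "scanner":
		return *mode, nil
	default:
		return "", fmt.Errorf("unknown mode %q", *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real binary would start the controller manager or the scan runner here.
	fmt.Println("starting in mode:", mode)
}
```

Keeping both modes in one binary means the scanner Job can reuse the operator image, which matches the chart default where the scanner image falls back to the operator image.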

## New Files
- DESIGN.md: Comprehensive architecture design document
- internal/jobmanager/: Job Manager for creating/monitoring scanner jobs
- internal/scanner/runner.go: Scanner mode implementation
- internal/annotations/: Annotation parsing utilities
- charts/nuclei-operator/templates/scanner-rbac.yaml: Scanner RBAC

## API Changes
- Added ScannerConfig struct for per-scan scanner configuration
- Added JobReference struct for tracking scanner jobs
- Added ScannerConfig field to NucleiScanSpec
- Added JobRef and ScanStartTime fields to NucleiScanStatus

## Supported Annotations
- nuclei.homelab.mortenolsen.pro/enabled
- nuclei.homelab.mortenolsen.pro/templates
- nuclei.homelab.mortenolsen.pro/severity
- nuclei.homelab.mortenolsen.pro/schedule
- nuclei.homelab.mortenolsen.pro/timeout
- nuclei.homelab.mortenolsen.pro/scanner-image

## RBAC Updates
- Added Job and Pod permissions for operator
- Created separate scanner service account with minimal permissions

## Documentation
- Updated README, user-guide, api.md, and Helm chart README
- Added example annotated Ingress resources
Morten Olsen
2025-12-12 20:51:23 +01:00
parent 519ed32de3
commit 335689da22
22 changed files with 3060 additions and 245 deletions

@@ -2,6 +2,14 @@
A Helm chart for deploying the Nuclei Operator, a Kubernetes operator that automatically scans Ingress and VirtualService resources using the Nuclei security scanner.
## Features
- **Pod-based Scanning Architecture**: Each scan runs in an isolated Kubernetes Job for better scalability and reliability
- **Annotation-based Configuration**: Configure scanning behavior per-resource using annotations
- **Automatic Discovery**: Watches Kubernetes Ingress and Istio VirtualService resources
- **Scheduled Scans**: Support for cron-based scheduled rescanning
- **Flexible Configuration**: Configurable templates, severity filters, and scan options
## Prerequisites
- Kubernetes 1.26+
@@ -137,6 +145,24 @@ The following table lists the configurable parameters of the Nuclei Operator cha
| `nuclei.backoff.max` | Maximum backoff interval | `10m` |
| `nuclei.backoff.multiplier` | Backoff multiplier | `2.0` |
### Scanner Pod Configuration
The operator uses a pod-based scanning architecture where each scan runs in its own Kubernetes Job. Configure scanner pod behavior with these parameters:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `scanner.enabled` | Enable scanner RBAC resources | `true` |
| `scanner.image` | Scanner image (defaults to operator image) | `""` |
| `scanner.timeout` | Default scan timeout | `30m` |
| `scanner.maxConcurrent` | Maximum concurrent scan jobs | `5` |
| `scanner.ttlAfterFinished` | Job TTL after completion (seconds) | `3600` |
| `scanner.resources.requests.cpu` | Scanner pod CPU request | `100m` |
| `scanner.resources.requests.memory` | Scanner pod memory request | `256Mi` |
| `scanner.resources.limits.cpu` | Scanner pod CPU limit | `1` |
| `scanner.resources.limits.memory` | Scanner pod memory limit | `1Gi` |
| `scanner.defaultTemplates` | Default templates to use | `[]` |
| `scanner.defaultSeverity` | Default severity filter | `[]` |
### ServiceMonitor (Prometheus Operator)
| Parameter | Description | Default |
@@ -199,6 +225,28 @@ nuclei:
rescanAge: "24h"
```
### With Custom Scanner Configuration
```yaml
# values.yaml
scanner:
  enabled: true
  timeout: "1h"
  maxConcurrent: 10
  ttlAfterFinished: 7200
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: "2"
      memory: 2Gi
  defaultSeverity:
    - medium
    - high
    - critical
```
```
### With Node Affinity
```yaml
@@ -215,6 +263,44 @@ affinity:
- arm64
```
## Annotation-Based Configuration
You can configure scanning behavior for individual Ingress or VirtualService resources using annotations:
| Annotation | Description |
|------------|-------------|
| `nuclei.homelab.mortenolsen.pro/enabled` | Enable/disable scanning (`true`/`false`) |
| `nuclei.homelab.mortenolsen.pro/templates` | Comma-separated list of template paths |
| `nuclei.homelab.mortenolsen.pro/severity` | Comma-separated severity filter |
| `nuclei.homelab.mortenolsen.pro/schedule` | Cron schedule for periodic scans |
| `nuclei.homelab.mortenolsen.pro/timeout` | Scan timeout duration |
| `nuclei.homelab.mortenolsen.pro/scanner-image` | Override scanner image |
### Example Annotated Ingress
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nuclei.homelab.mortenolsen.pro/enabled: "true"
    nuclei.homelab.mortenolsen.pro/severity: "medium,high,critical"
    nuclei.homelab.mortenolsen.pro/schedule: "0 2 * * *"
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```
## Uninstallation
```bash

@@ -60,6 +60,24 @@ spec:
value: {{ .Values.nuclei.backoff.max | quote }}
- name: NUCLEI_BACKOFF_MULTIPLIER
value: {{ .Values.nuclei.backoff.multiplier | quote }}
- name: SCANNER_IMAGE
value: {{ .Values.scanner.image | default (printf "%s:%s" .Values.image.repository (.Values.image.tag | default .Chart.AppVersion)) | quote }}
- name: SCANNER_TIMEOUT
value: {{ .Values.scanner.timeout | quote }}
- name: MAX_CONCURRENT_SCANS
value: {{ .Values.scanner.maxConcurrent | quote }}
- name: JOB_TTL_AFTER_FINISHED
value: {{ .Values.scanner.ttlAfterFinished | quote }}
- name: SCANNER_SERVICE_ACCOUNT
value: {{ include "nuclei-operator.fullname" . }}-scanner
{{- if .Values.scanner.defaultTemplates }}
- name: DEFAULT_TEMPLATES
value: {{ join "," .Values.scanner.defaultTemplates | quote }}
{{- end }}
{{- if .Values.scanner.defaultSeverity }}
- name: DEFAULT_SEVERITY
value: {{ join "," .Values.scanner.defaultSeverity | quote }}
{{- end }}
ports: []
securityContext:
{{- toYaml .Values.securityContext | nindent 10 }}

@@ -6,6 +6,18 @@ metadata:
  labels:
    {{- include "nuclei-operator.labels" . | nindent 4 }}
rules:
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
@@ -13,6 +25,20 @@ rules:
  verbs:
  - create
  - patch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
- apiGroups:
  - networking.istio.io
  resources:

@@ -0,0 +1,51 @@
{{- if .Values.scanner.enabled }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "nuclei-operator.fullname" . }}-scanner
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "nuclei-operator.labels" . | nindent 4 }}
    app.kubernetes.io/component: scanner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "nuclei-operator.fullname" . }}-scanner
  labels:
    {{- include "nuclei-operator.labels" . | nindent 4 }}
    app.kubernetes.io/component: scanner
rules:
# Scanner needs to read NucleiScan resources
- apiGroups:
  - nuclei.homelab.mortenolsen.pro
  resources:
  - nucleiscans
  verbs:
  - get
# Scanner needs to update NucleiScan status
- apiGroups:
  - nuclei.homelab.mortenolsen.pro
  resources:
  - nucleiscans/status
  verbs:
  - get
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "nuclei-operator.fullname" . }}-scanner
  labels:
    {{- include "nuclei-operator.labels" . | nindent 4 }}
    app.kubernetes.io/component: scanner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "nuclei-operator.fullname" . }}-scanner
subjects:
- kind: ServiceAccount
  name: {{ include "nuclei-operator.fullname" . }}-scanner
  namespace: {{ .Release.Namespace }}
{{- end }}

@@ -130,4 +130,36 @@ serviceMonitor:
# Network policies
networkPolicy:
  # Enable network policy
  enabled: false
# Scanner configuration
scanner:
  # Enable scanner RBAC resources
  enabled: true
  # Scanner image (defaults to operator image)
  image: ""
  # Default scan timeout
  timeout: "30m"
  # Maximum concurrent scan jobs
  maxConcurrent: 5
  # Job TTL after completion (seconds)
  ttlAfterFinished: 3600
  # Default resource requirements for scanner pods
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 1Gi
  # Default templates to use
  defaultTemplates: []
  # Default severity filter
  defaultSeverity: []