feat: implement pod-based scanning architecture

This major refactor moves from synchronous subprocess-based scanning to
asynchronous pod-based scanning using Kubernetes Jobs.

## Architecture Changes
- Scanner jobs are now Kubernetes Jobs that set ttlSecondsAfterFinished for automatic cleanup
- Jobs carry owner references so they are garbage-collected when their NucleiScan is deleted
- Configurable concurrency limits, timeouts, and resource requirements
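
As a rough illustration of the points above, the sketch below builds a scanner Job manifest as a plain map using only the standard library. The `scanRef` type, the `buildScannerJob` helper, the container name, the `generateName` suffix, and the `v1alpha1` API version are assumptions made for this example; the Job fields themselves (`ttlSecondsAfterFinished`, `activeDeadlineSeconds`, `backoffLimit`, `ownerReferences`) are standard Kubernetes fields. The actual job manager presumably uses the typed client-go structs instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scanRef identifies the NucleiScan that owns a scanner Job.
// Hypothetical helper type for this sketch.
type scanRef struct {
	Name string
	UID  string
}

// buildScannerJob returns a batch/v1 Job manifest as a generic map.
// ttlSeconds controls cleanup after the Job finishes; timeoutSeconds
// caps the total runtime of the Job.
func buildScannerJob(scan scanRef, image string, ttlSeconds, timeoutSeconds int64) map[string]any {
	return map[string]any{
		"apiVersion": "batch/v1",
		"kind":       "Job",
		"metadata": map[string]any{
			"generateName": scan.Name + "-scan-", // assumed naming scheme
			"ownerReferences": []any{map[string]any{
				"apiVersion":         "nuclei.homelab.mortenolsen.pro/v1alpha1", // assumed version
				"kind":               "NucleiScan",
				"name":               scan.Name,
				"uid":                scan.UID,
				"controller":         true,
				"blockOwnerDeletion": true,
			}},
		},
		"spec": map[string]any{
			"ttlSecondsAfterFinished": ttlSeconds,
			"activeDeadlineSeconds":   timeoutSeconds,
			"backoffLimit":            int64(0), // do not retry a failed scan pod
			"template": map[string]any{
				"spec": map[string]any{
					"restartPolicy": "Never",
					"containers": []any{map[string]any{
						"name":  "scanner", // assumed container name
						"image": image,
						"args":  []any{"--mode=scanner"},
					}},
				},
			},
		},
	}
}

func main() {
	job := buildScannerJob(scanRef{Name: "example", UID: "0000-1111"}, "scanner:dev", 3600, 1800)
	out, err := json.MarshalIndent(job, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With the owner reference in place, deleting the NucleiScan lets the Kubernetes garbage collector remove the Job and its pod; the TTL handles Jobs whose scan still exists.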

## New Features
- Dual-mode binary: --mode=controller (default) or --mode=scanner
- Annotation-based configuration for Ingress/VirtualService resources
- Operator-level configuration via environment variables
- Startup recovery for orphaned scans after operator restart
- Periodic cleanup of stuck jobs
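
The dual-mode entry point can be sketched with the standard `flag` package. This is a minimal illustration, not the operator's actual `main`; the `parseMode` helper and the flag-set name are assumptions, while the `--mode` flag and its two values come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseMode reads --mode from args and validates it.
// It defaults to "controller" when the flag is absent.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("nuclei-operator", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; the caller reports them
	mode := fs.String("mode", "controller", "run mode: controller or scanner")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "controller", "scanner":
		return *mode, nil
	default:
		return "", fmt.Errorf("unknown mode %q (want controller or scanner)", *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real binary would start the controller manager or the scanner runner here.
	fmt.Println("starting in", mode, "mode")
}
```

Shipping one binary for both roles means the scanner Job can reuse the operator image and only change its arguments.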

## New Files
- DESIGN.md: Comprehensive architecture design document
- internal/jobmanager/: Job Manager for creating/monitoring scanner jobs
- internal/scanner/runner.go: Scanner mode implementation
- internal/annotations/: Annotation parsing utilities
- charts/nuclei-operator/templates/scanner-rbac.yaml: Scanner RBAC

## API Changes
- Added ScannerConfig struct for per-scan scanner configuration
- Added JobReference struct for tracking scanner jobs
- Added ScannerConfig field to NucleiScanSpec
- Added JobRef and ScanStartTime fields to NucleiScanStatus

## Supported Annotations
- nuclei.homelab.mortenolsen.pro/enabled
- nuclei.homelab.mortenolsen.pro/templates
- nuclei.homelab.mortenolsen.pro/severity
- nuclei.homelab.mortenolsen.pro/schedule
- nuclei.homelab.mortenolsen.pro/timeout
- nuclei.homelab.mortenolsen.pro/scanner-image

## RBAC Updates
- Added Job and Pod permissions for operator
- Created separate scanner service account with minimal permissions

## Documentation
- Updated README, user-guide, api.md, and Helm chart README
- Added example annotated Ingress resources
Morten Olsen
2025-12-12 20:51:23 +01:00
parent 519ed32de3
commit 335689da22
22 changed files with 3060 additions and 245 deletions


@@ -55,6 +55,127 @@ spec:
           spec:
             description: NucleiScanSpec defines the desired state of NucleiScan
             properties:
+              scannerConfig:
+                description: ScannerConfig allows overriding scanner settings for
+                  this scan
+                properties:
+                  image:
+                    description: Image overrides the default scanner image
+                    type: string
+                  nodeSelector:
+                    additionalProperties:
+                      type: string
+                    description: NodeSelector for scanner pod scheduling
+                    type: object
+                  resources:
+                    description: Resources defines resource requirements for the scanner
+                      pod
+                    properties:
+                      claims:
+                        description: |-
+                          Claims lists the names of resources, defined in spec.resourceClaims,
+                          that are used by this container.
+
+                          This field depends on the
+                          DynamicResourceAllocation feature gate.
+
+                          This field is immutable. It can only be set for containers.
+                        items:
+                          description: ResourceClaim references one entry in PodSpec.ResourceClaims.
+                          properties:
+                            name:
+                              description: |-
+                                Name must match the name of one entry in pod.spec.resourceClaims of
+                                the Pod where this field is used. It makes that resource available
+                                inside a container.
+                              type: string
+                            request:
+                              description: |-
+                                Request is the name chosen for a request in the referenced claim.
+                                If empty, everything from the claim is made available, otherwise
+                                only the result of this request.
+                              type: string
+                          required:
+                          - name
+                          type: object
+                        type: array
+                        x-kubernetes-list-map-keys:
+                        - name
+                        x-kubernetes-list-type: map
+                      limits:
+                        additionalProperties:
+                          anyOf:
+                          - type: integer
+                          - type: string
+                          pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+                          x-kubernetes-int-or-string: true
+                        description: |-
+                          Limits describes the maximum amount of compute resources allowed.
+                          More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
+                        type: object
+                      requests:
+                        additionalProperties:
+                          anyOf:
+                          - type: integer
+                          - type: string
+                          pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+                          x-kubernetes-int-or-string: true
+                        description: |-
+                          Requests describes the minimum amount of compute resources required.
+                          If Requests is omitted for a container, it defaults to Limits if that is explicitly specified,
+                          otherwise to an implementation-defined value. Requests cannot exceed Limits.
+                          More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
+                        type: object
+                    type: object
+                  templateURLs:
+                    description: TemplateURLs specifies additional template repositories
+                      to clone
+                    items:
+                      type: string
+                    type: array
+                  timeout:
+                    description: Timeout overrides the default scan timeout
+                    type: string
+                  tolerations:
+                    description: Tolerations for scanner pod scheduling
+                    items:
+                      description: |-
+                        The pod this Toleration is attached to tolerates any taint that matches
+                        the triple <key,value,effect> using the matching operator <operator>.
+                      properties:
+                        effect:
+                          description: |-
+                            Effect indicates the taint effect to match. Empty means match all taint effects.
+                            When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
+                          type: string
+                        key:
+                          description: |-
+                            Key is the taint key that the toleration applies to. Empty means match all taint keys.
+                            If the key is empty, operator must be Exists; this combination means to match all values and all keys.
+                          type: string
+                        operator:
+                          description: |-
+                            Operator represents a key's relationship to the value.
+                            Valid operators are Exists and Equal. Defaults to Equal.
+                            Exists is equivalent to wildcard for value, so that a pod can
+                            tolerate all taints of a particular category.
+                          type: string
+                        tolerationSeconds:
+                          description: |-
+                            TolerationSeconds represents the period of time the toleration (which must be
+                            of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
+                            it is not set, which means tolerate the taint forever (do not evict). Zero and
+                            negative values will be treated as 0 (evict immediately) by the system.
+                          format: int64
+                          type: integer
+                        value:
+                          description: |-
+                            Value is the taint value the toleration matches to.
+                            If the operator is Exists, the value should be empty, otherwise just a regular string.
+                          type: string
+                      type: object
+                    type: array
+                type: object
               schedule:
                 description: |-
                   Schedule for periodic rescanning in cron format
@@ -249,6 +370,26 @@ spec:
                   - timestamp
                   type: object
                 type: array
+              jobRef:
+                description: JobRef references the current or last scanner job
+                properties:
+                  name:
+                    description: Name of the Job
+                    type: string
+                  podName:
+                    description: PodName is the name of the scanner pod (for log retrieval)
+                    type: string
+                  startTime:
+                    description: StartTime when the job was created
+                    format: date-time
+                    type: string
+                  uid:
+                    description: UID of the Job
+                    type: string
+                required:
+                - name
+                - uid
+                type: object
               lastError:
                 description: LastError contains the error message if the scan failed
                 type: string
@@ -284,6 +425,11 @@ spec:
                   RetryCount tracks the number of consecutive availability check retries
                   Used for exponential backoff when waiting for targets
                 type: integer
+              scanStartTime:
+                description: ScanStartTime is when the scanner pod actually started
+                  scanning
+                format: date-time
+                type: string
               summary:
                 description: Summary provides aggregated scan statistics
                 properties:


@@ -11,6 +11,26 @@ rules:
   verbs:
   - create
   - patch
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - batch
+  resources:
+  - jobs
+  verbs:
+  - create
+  - delete
+  - get
+  - list
+  - patch
+  - update
+  - watch
 - apiGroups:
   - networking.istio.io
   resources:


@@ -1,6 +1,43 @@
 # Example Ingress resource that would trigger NucleiScan creation
 # When this Ingress is created, the nuclei-operator will automatically
 # create a corresponding NucleiScan resource to scan the exposed endpoints.
+#
+# The operator uses a pod-based scanning architecture where each scan
+# runs in an isolated Kubernetes Job for better scalability and reliability.
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: example-ingress
+  namespace: default
+  labels:
+    app.kubernetes.io/name: example-app
+    app.kubernetes.io/managed-by: kustomize
+  annotations:
+    # Nuclei scanning configuration
+    nuclei.homelab.mortenolsen.pro/enabled: "true"
+    nuclei.homelab.mortenolsen.pro/severity: "medium,high,critical"
+    nuclei.homelab.mortenolsen.pro/schedule: "0 2 * * *"
+    # Optional: Additional scanning configuration
+    # nuclei.homelab.mortenolsen.pro/templates: "cves/,vulnerabilities/"
+    # nuclei.homelab.mortenolsen.pro/timeout: "1h"
+    # nuclei.homelab.mortenolsen.pro/scanner-image: "custom-scanner:latest"
+    # nuclei.homelab.mortenolsen.pro/tags: "cve,oast"
+    # nuclei.homelab.mortenolsen.pro/exclude-tags: "dos"
+    kubernetes.io/ingress.class: nginx
+spec:
+  rules:
+  - host: example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: example-service
+            port:
+              number: 80
+---
+# Example Ingress with TLS - endpoints will be scanned with HTTPS
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
@@ -10,9 +47,10 @@ metadata:
     app.kubernetes.io/name: example-app
     app.kubernetes.io/managed-by: kustomize
   annotations:
-    # Optional: Add annotations to customize scan behavior
-    # nuclei.homelab.mortenolsen.pro/scan-enabled: "true"
-    # nuclei.homelab.mortenolsen.pro/severity: "high,critical"
+    # Nuclei scanning configuration
+    nuclei.homelab.mortenolsen.pro/enabled: "true"
+    nuclei.homelab.mortenolsen.pro/severity: "high,critical"
+    nuclei.homelab.mortenolsen.pro/templates: "cves/,vulnerabilities/,exposures/"
     kubernetes.io/ingress.class: nginx
 spec:
   # TLS configuration - endpoints will be scanned with HTTPS
@@ -52,8 +90,8 @@ spec:
             port:
               number: 8080
 ---
-# Example Ingress without TLS (HTTP only)
-# This will be scanned with HTTP scheme
+# Example Ingress with scanning disabled
+# This will NOT trigger a NucleiScan creation
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
@@ -61,6 +99,9 @@ metadata:
   namespace: default
   labels:
     app.kubernetes.io/name: internal-app
+  annotations:
+    # Disable scanning for this internal resource
+    nuclei.homelab.mortenolsen.pro/enabled: "false"
 spec:
   rules:
   - host: internal.example.local
@@ -72,4 +113,45 @@ spec:
           service:
             name: internal-app
             port:
-              number: 80
+              number: 80
+---
+# Example Ingress with full annotation configuration
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: fully-configured-ingress
+  namespace: default
+  labels:
+    app.kubernetes.io/name: configured-app
+  annotations:
+    # Enable scanning
+    nuclei.homelab.mortenolsen.pro/enabled: "true"
+    # Severity filter - only report medium and above
+    nuclei.homelab.mortenolsen.pro/severity: "medium,high,critical"
+    # Schedule daily scans at 2 AM
+    nuclei.homelab.mortenolsen.pro/schedule: "0 2 * * *"
+    # Use specific template directories
+    nuclei.homelab.mortenolsen.pro/templates: "cves/,vulnerabilities/,misconfiguration/"
+    # Set scan timeout to 1 hour
+    nuclei.homelab.mortenolsen.pro/timeout: "1h"
+    # Include specific tags
+    nuclei.homelab.mortenolsen.pro/tags: "cve,oast,sqli,xss"
+    # Exclude certain tags
+    nuclei.homelab.mortenolsen.pro/exclude-tags: "dos,fuzz"
+    kubernetes.io/ingress.class: nginx
+spec:
+  tls:
+  - hosts:
+    - secure.example.com
+    secretName: secure-tls-secret
+  rules:
+  - host: secure.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: secure-app
+            port:
+              number: 443
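
The comments in the example manifests above say that hosts covered by a TLS entry are scanned over HTTPS and the rest over HTTP. A minimal sketch of that rule follows; `targetURLs` is a hypothetical helper, and the operator's real target extraction may also consider paths or ports.

```go
package main

import "fmt"

// targetURLs builds one scan target per rule host, using https for hosts
// listed under spec.tls and http for all others.
func targetURLs(ruleHosts, tlsHosts []string) []string {
	secure := make(map[string]bool, len(tlsHosts))
	for _, h := range tlsHosts {
		secure[h] = true
	}
	urls := make([]string, 0, len(ruleHosts))
	for _, h := range ruleHosts {
		scheme := "http"
		if secure[h] {
			scheme = "https"
		}
		urls = append(urls, scheme+"://"+h)
	}
	return urls
}

func main() {
	hosts := []string{"secure.example.com", "example.com"}
	tls := []string{"secure.example.com"}
	for _, u := range targetURLs(hosts, tls) {
		fmt.Println(u)
	}
}
```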