nuclei-operator/ARCHITECTURE.md
Morten Olsen 277fc459d5 init
2025-12-12 11:10:01 +01:00

Nuclei Operator Architecture

Overview

The Nuclei Operator is a Kubernetes operator that automates security scanning of web applications exposed through Kubernetes Ingress resources and Istio VirtualService CRDs. It uses Nuclei, a fast and customizable vulnerability scanner, to perform security assessments and stores the results in a custom Kubernetes resource.

flowchart TB
    subgraph Kubernetes Cluster
        subgraph Watched Resources
            ING[Ingress Resources]
            VS[VirtualService CRDs]
        end
        
        subgraph Nuclei Operator
            IC[Ingress Controller]
            VSC[VirtualService Controller]
            NSC[NucleiScan Controller]
            SM[Scan Manager]
        end
        
        subgraph Custom Resources
            NS[NucleiScan CRDs]
        end
        
        ING -->|watches| IC
        VS -->|watches| VSC
        IC -->|creates/updates| NS
        VSC -->|creates/updates| NS
        NS -->|reconciles| NSC
        NSC -->|triggers| SM
        SM -->|updates results| NS
    end
    
    subgraph External
        NE[Nuclei Engine]
    end
    
    SM -->|executes scans| NE
    NE -->|scan targets| ING
    NE -->|scan targets| VS

1. Project Structure

The operator follows the standard Kubebuilder project layout:

nuclei-operator/
├── api/
│   └── v1alpha1/
│       ├── groupversion_info.go      # API group/version metadata
│       ├── nucleiscan_types.go       # NucleiScan CRD type definitions
│       └── zz_generated.deepcopy.go  # Auto-generated deep copy methods
├── cmd/
│   └── main.go                       # Operator entry point
├── config/
│   ├── crd/
│   │   └── bases/
│   │       └── nuclei.homelab.mortenolsen.pro_nucleiscans.yaml
│   ├── default/
│   │   └── kustomization.yaml
│   ├── manager/
│   │   └── manager.yaml              # Deployment manifest
│   ├── production/
│   │   ├── kustomization.yaml        # Production overlay
│   │   └── manager_patch.yaml        # Production patches
│   ├── rbac/
│   │   ├── role.yaml                 # ClusterRole
│   │   ├── role_binding.yaml         # ClusterRoleBinding
│   │   └── service_account.yaml      # ServiceAccount
│   └── samples/
│       ├── nuclei_v1alpha1_nucleiscan.yaml
│       └── example-ingress.yaml
├── internal/
│   ├── controller/
│   │   ├── ingress_controller.go     # Ingress watcher/controller
│   │   ├── virtualservice_controller.go  # VirtualService watcher/controller
│   │   ├── nucleiscan_controller.go  # NucleiScan reconciler
│   │   ├── nucleiscan_controller_test.go  # Controller tests
│   │   └── suite_test.go             # Test suite setup
│   └── scanner/
│       ├── scanner.go                # Nuclei scan execution logic
│       └── parser.go                 # JSONL output parser
├── test/
│   ├── e2e/
│   │   ├── e2e_suite_test.go         # E2E test suite
│   │   └── e2e_test.go               # E2E tests
│   └── utils/
│       └── utils.go                  # Test utilities
├── docs/
│   ├── api.md                        # API reference
│   └── user-guide.md                 # User guide
├── Dockerfile
├── Makefile
├── go.mod
├── go.sum
├── PROJECT                           # Kubebuilder project metadata
├── ARCHITECTURE.md                   # This file
├── CONTRIBUTING.md                   # Contribution guidelines
└── README.md

Key Directories

| Directory | Purpose |
|-----------|---------|
| api/v1alpha1/ | CRD type definitions and API versioning |
| cmd/ | Main entry point for the operator |
| config/ | Kubernetes manifests for deployment |
| internal/controller/ | Reconciliation logic for all controllers |
| internal/scanner/ | Nuclei scan execution and result parsing |
| test/ | End-to-end tests |
| docs/ | Additional documentation |

2. CRD Design

NucleiScan Custom Resource Definition

API Group: nuclei.homelab.mortenolsen.pro
Version: v1alpha1
Kind: NucleiScan

Type Definitions

// api/v1alpha1/nucleiscan_types.go

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
)

// SourceReference identifies the Ingress or VirtualService that triggered this scan
type SourceReference struct {
    // APIVersion of the source resource
    // +kubebuilder:validation:Required
    APIVersion string `json:"apiVersion"`
    
    // Kind of the source resource - Ingress or VirtualService
    // +kubebuilder:validation:Enum=Ingress;VirtualService
    Kind string `json:"kind"`
    
    // Name of the source resource
    // +kubebuilder:validation:Required
    Name string `json:"name"`
    
    // Namespace of the source resource
    // +kubebuilder:validation:Required
    Namespace string `json:"namespace"`
    
    // UID of the source resource for owner reference
    // +kubebuilder:validation:Required
    UID string `json:"uid"`
}

// NucleiScanSpec defines the desired state of NucleiScan
type NucleiScanSpec struct {
    // SourceRef references the Ingress or VirtualService being scanned
    // +kubebuilder:validation:Required
    SourceRef SourceReference `json:"sourceRef"`
    
    // Targets is the list of URLs to scan, extracted from the source resource
    // +kubebuilder:validation:Required
    // +kubebuilder:validation:MinItems=1
    Targets []string `json:"targets"`
    
    // Templates specifies which Nuclei templates to use
    // If empty, uses default templates
    // +optional
    Templates []string `json:"templates,omitempty"`
    
    // Severity filters scan results by severity level
    // The enum applies to each list item, so it uses the items: marker form
    // +kubebuilder:validation:items:Enum=info;low;medium;high;critical
    // +optional
    Severity []string `json:"severity,omitempty"`
    
    // Schedule for periodic rescanning in cron format
    // If empty, scan runs once
    // +optional
    Schedule string `json:"schedule,omitempty"`
    
    // Suspend prevents scheduled scans from running
    // +optional
    Suspend bool `json:"suspend,omitempty"`
}

// ScanPhase represents the current phase of the scan
// +kubebuilder:validation:Enum=Pending;Running;Completed;Failed
type ScanPhase string

const (
    ScanPhasePending   ScanPhase = "Pending"
    ScanPhaseRunning   ScanPhase = "Running"
    ScanPhaseCompleted ScanPhase = "Completed"
    ScanPhaseFailed    ScanPhase = "Failed"
)

// Finding represents a single Nuclei scan finding
type Finding struct {
    // TemplateID is the Nuclei template identifier
    TemplateID string `json:"templateId"`
    
    // TemplateName is the human-readable template name
    TemplateName string `json:"templateName,omitempty"`
    
    // Severity of the finding
    Severity string `json:"severity"`
    
    // Type of the finding - http, dns, ssl, etc.
    Type string `json:"type,omitempty"`
    
    // Host that was scanned
    Host string `json:"host"`
    
    // MatchedAt is the specific URL or endpoint where the issue was found
    MatchedAt string `json:"matchedAt,omitempty"`
    
    // ExtractedResults contains any data extracted by the template
    // +optional
    ExtractedResults []string `json:"extractedResults,omitempty"`
    
    // Description provides details about the finding
    // +optional
    Description string `json:"description,omitempty"`
    
    // Reference contains URLs to additional information about the finding
    // +optional
    Reference []string `json:"reference,omitempty"`
    
    // Tags associated with the finding
    // +optional
    Tags []string `json:"tags,omitempty"`
    
    // Timestamp when the finding was discovered
    Timestamp metav1.Time `json:"timestamp"`
    
    // Metadata contains additional template metadata
    // +kubebuilder:pruning:PreserveUnknownFields
    // +optional
    Metadata *runtime.RawExtension `json:"metadata,omitempty"`
}

// ScanSummary provides aggregated statistics about the scan
type ScanSummary struct {
    // TotalFindings is the total number of findings
    TotalFindings int `json:"totalFindings"`
    
    // FindingsBySeverity breaks down findings by severity level
    FindingsBySeverity map[string]int `json:"findingsBySeverity,omitempty"`
    
    // TargetsScanned is the number of targets that were scanned
    TargetsScanned int `json:"targetsScanned"`
    
    // Duration of the scan in seconds
    DurationSeconds int64 `json:"durationSeconds,omitempty"`
}

// NucleiScanStatus defines the observed state of NucleiScan
type NucleiScanStatus struct {
    // Phase represents the current scan phase
    Phase ScanPhase `json:"phase,omitempty"`
    
    // Conditions represent the latest available observations
    // +optional
    Conditions []metav1.Condition `json:"conditions,omitempty"`
    
    // LastScanTime is when the last scan was initiated
    // +optional
    LastScanTime *metav1.Time `json:"lastScanTime,omitempty"`
    
    // CompletionTime is when the last scan completed
    // +optional
    CompletionTime *metav1.Time `json:"completionTime,omitempty"`
    
    // NextScheduledTime is when the next scheduled scan will run
    // +optional
    NextScheduledTime *metav1.Time `json:"nextScheduledTime,omitempty"`
    
    // Summary provides aggregated scan statistics
    // +optional
    Summary *ScanSummary `json:"summary,omitempty"`
    
    // Findings contains the array of scan results from Nuclei JSONL output
    // Each element is a parsed JSON object from Nuclei output
    // +optional
    Findings []Finding `json:"findings,omitempty"`
    
    // LastError contains the error message if the scan failed
    // +optional
    LastError string `json:"lastError,omitempty"`
    
    // ObservedGeneration is the generation observed by the controller
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:shortName=ns;nscan
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Findings",type=integer,JSONPath=`.status.summary.totalFindings`
// +kubebuilder:printcolumn:name="Source",type=string,JSONPath=`.spec.sourceRef.kind`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// NucleiScan is the Schema for the nucleiscans API
type NucleiScan struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   NucleiScanSpec   `json:"spec,omitempty"`
    Status NucleiScanStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// NucleiScanList contains a list of NucleiScan
type NucleiScanList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []NucleiScan `json:"items"`
}

Example NucleiScan Resource

apiVersion: nuclei.homelab.mortenolsen.pro/v1alpha1
kind: NucleiScan
metadata:
  name: myapp-ingress-scan
  namespace: default
  ownerReferences:
    - apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: myapp-ingress
      uid: abc123-def456-ghi789
      controller: true
      blockOwnerDeletion: true
spec:
  sourceRef:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: myapp-ingress
    namespace: default
    uid: abc123-def456-ghi789
  targets:
    - https://myapp.example.com
    - https://api.myapp.example.com
  severity:
    - medium
    - high
    - critical
status:
  phase: Completed
  lastScanTime: "2024-01-15T10:30:00Z"
  completionTime: "2024-01-15T10:35:00Z"
  observedGeneration: 1
  summary:
    totalFindings: 2
    findingsBySeverity:
      critical: 1
      medium: 1
    targetsScanned: 2
    durationSeconds: 300
  findings:
    - templateId: CVE-2021-44228
      templateName: Log4j RCE
      severity: critical
      type: http
      host: https://myapp.example.com
      matchedAt: https://myapp.example.com/api/login
      timestamp: "2024-01-15T10:32:00Z"
    - templateId: ssl-weak-cipher
      templateName: Weak SSL Cipher Detected
      severity: medium
      type: ssl
      host: https://api.myapp.example.com
      timestamp: "2024-01-15T10:33:00Z"
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2024-01-15T10:35:00Z"
      reason: ScanCompleted
      message: Scan completed successfully with 2 findings
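
The summary block in the example could be derived from the findings list roughly as follows. This is a minimal sketch rather than the operator's actual code: `finding` and `summary` are trimmed-down stand-ins for the `Finding` and `ScanSummary` types, and `summarize` is a hypothetical helper name.

```go
package main

// finding is a simplified stand-in for the Finding type; only the
// fields needed to build a summary are kept.
type finding struct {
	TemplateID string
	Severity   string
}

// summary mirrors the ScanSummary fields shown in the example resource.
type summary struct {
	TotalFindings      int
	FindingsBySeverity map[string]int
	TargetsScanned     int
	DurationSeconds    int64
}

// summarize counts findings per severity level and records how many
// targets were scanned and how long the scan took.
func summarize(findings []finding, targetsScanned int, durationSeconds int64) summary {
	bySeverity := make(map[string]int)
	for _, f := range findings {
		bySeverity[f.Severity]++
	}
	return summary{
		TotalFindings:      len(findings),
		FindingsBySeverity: bySeverity,
		TargetsScanned:     targetsScanned,
		DurationSeconds:    durationSeconds,
	}
}
```

Deriving `totalFindings` and `findingsBySeverity` from the same slice keeps the two values consistent with the `findings` array stored in the status.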

3. Controller Design

The operator uses three controllers working together:

flowchart TB
    subgraph Controllers
        IC[IngressController]
        VSC[VirtualServiceController]
        NSC[NucleiScanController]
    end
    
    subgraph Resources
        ING[Ingress]
        VS[VirtualService]
        NS[NucleiScan]
    end
    
    ING -->|Watch Create/Update/Delete| IC
    VS -->|Watch Create/Update/Delete| VSC
    NS -->|Watch Create/Update| NSC
    
    IC -->|Create/Update NucleiScan| NS
    VSC -->|Create/Update NucleiScan| NS
    NSC -->|Execute Scan & Update Status| NS
    
    NS -.->|OwnerRef| ING
    NS -.->|OwnerRef| VS

3.1 Ingress Controller

Purpose: Watches Ingress resources and creates/updates corresponding NucleiScan resources.

// internal/controller/ingress_controller.go

type IngressReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile handles Ingress events
func (r *IngressReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the Ingress
    // 2. Check if NucleiScan exists for this Ingress
    // 3. Extract target URLs from Ingress rules
    // 4. Create or update NucleiScan with owner reference
    // 5. Handle Ingress deletion via owner reference garbage collection
}

// SetupWithManager sets up the controller with the Manager
func (r *IngressReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&networkingv1.Ingress{}).
        Owns(&nucleiv1alpha1.NucleiScan{}).
        Complete(r)
}

Reconciliation Flow:

flowchart TD
    A[Ingress Event] --> B{Ingress Exists?}
    B -->|No - Deleted| C[NucleiScan deleted via OwnerRef GC]
    B -->|Yes| D[Extract URLs from Ingress Rules]
    D --> E{NucleiScan Exists?}
    E -->|No| F[Create NucleiScan with OwnerRef]
    E -->|Yes| G{Targets Changed?}
    G -->|No| H[No Action]
    G -->|Yes| I[Update NucleiScan Targets]
    I --> J[Reset Scan Status to Pending]
    F --> K[Done]
    J --> K
    H --> K
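
The "Targets Changed?" decision in the flow above can be sketched as an order-insensitive comparison of the existing and newly extracted target lists. `targetsChanged` is a hypothetical helper name, and the sketch assumes both lists have already been deduplicated.

```go
package main

import "sort"

// targetsChanged reports whether the desired target list differs from
// the current one, ignoring the order of entries. The inputs are copied
// before sorting so the caller's slices are left untouched.
func targetsChanged(current, desired []string) bool {
	if len(current) != len(desired) {
		return true
	}
	a := append([]string(nil), current...)
	b := append([]string(nil), desired...)
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}
```

Comparing sorted copies avoids resetting a scan to Pending merely because the Ingress rules were reordered.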

3.2 VirtualService Controller

Purpose: Watches Istio VirtualService resources and creates/updates corresponding NucleiScan resources.

// internal/controller/virtualservice_controller.go

type VirtualServiceReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile handles VirtualService events
func (r *VirtualServiceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the VirtualService
    // 2. Check if NucleiScan exists for this VirtualService
    // 3. Extract target URLs from VirtualService hosts
    // 4. Create or update NucleiScan with owner reference
    // 5. Handle VirtualService deletion via owner reference garbage collection
}

// SetupWithManager sets up the controller with the Manager
func (r *VirtualServiceReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&istionetworkingv1beta1.VirtualService{}).
        Owns(&nucleiv1alpha1.NucleiScan{}).
        Complete(r)
}

3.3 NucleiScan Controller

Purpose: Reconciles NucleiScan resources and executes Nuclei scans.

// internal/controller/nucleiscan_controller.go

type NucleiScanReconciler struct {
    client.Client
    Scheme  *runtime.Scheme
    Scanner scanner.Scanner
}

// Reconcile handles NucleiScan events
func (r *NucleiScanReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the NucleiScan
    // 2. Check current phase
    // 3. If Pending: start scan, update to Running
    // 4. If Running: check scan status, update results when complete
    // 5. If Completed/Failed: check if rescan needed based on schedule
    // 6. Update status conditions
}

// SetupWithManager sets up the controller with the Manager
func (r *NucleiScanReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&nucleiv1alpha1.NucleiScan{}).
        Complete(r)
}

Reconciliation Flow:

flowchart TD
    A[NucleiScan Event] --> B{Phase?}
    
    B -->|Pending| C[Start Nuclei Scan]
    C --> D[Update Phase to Running]
    D --> E[Requeue after 30s]
    
    B -->|Running| F{Scan Complete?}
    F -->|No| G[Requeue after 30s]
    F -->|Yes - Success| H[Parse JSONL Results]
    H --> I[Update Findings Array]
    I --> J[Update Summary]
    J --> K[Set Phase to Completed]
    F -->|Yes - Error| L[Set Phase to Failed]
    L --> M[Record Error Message]
    
    B -->|Completed| N{Schedule Defined?}
    N -->|No| O[Done]
    N -->|Yes| P{Time for Rescan?}
    P -->|No| Q[Requeue at NextScheduledTime]
    P -->|Yes| R[Reset to Pending]
    R --> C
    
    B -->|Failed| S{Retry?}
    S -->|Yes| T[Reset to Pending with Backoff]
    T --> C
    S -->|No| U[Done - Manual Intervention]
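
The "Retry?" branch for failed scans could use an exponential backoff with a fixed retry budget. The sketch below is only illustrative: the source does not specify the delays or the retry limit, so `baseDelay`, `maxDelay`, `maxRetries`, and the `retryDelay` helper are all assumptions.

```go
package main

import "time"

// Assumed tuning values; the architecture does not define them.
const (
	baseDelay  = 30 * time.Second
	maxDelay   = 30 * time.Minute
	maxRetries = 5
)

// retryDelay returns how long to wait before retry number attempt
// (1-based). The second result is false once the retry budget is
// exhausted, which corresponds to "Done - Manual Intervention".
func retryDelay(attempt int) (time.Duration, bool) {
	if attempt < 1 || attempt > maxRetries {
		return 0, false
	}
	// Double the delay on every attempt: 30s, 1m, 2m, 4m, 8m.
	delay := baseDelay << (attempt - 1)
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay, true
}
```

In a reconciler, the returned duration would typically be passed as `RequeueAfter` in the `ctrl.Result`.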

Controller Registration

// cmd/main.go

func main() {
    // ... setup code ...
    
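    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme: scheme,
        // controller-runtime v0.16+ configures metrics and webhooks through
        // option structs from pkg/metrics/server and pkg/webhook
        Metrics:                metricsserver.Options{BindAddress: metricsAddr},
        WebhookServer:          webhook.NewServer(webhook.Options{Port: 9443}),
        HealthProbeBindAddress: probeAddr,
        LeaderElection:         enableLeaderElection,
        LeaderElectionID:       "nuclei-operator.homelab.mortenolsen.pro",
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }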
    
    // Register Ingress Controller
    if err = (&controller.IngressReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "Ingress")
        os.Exit(1)
    }
    
    // Register VirtualService Controller
    if err = (&controller.VirtualServiceReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "VirtualService")
        os.Exit(1)
    }
    
    // Register NucleiScan Controller
    if err = (&controller.NucleiScanReconciler{
        Client:  mgr.GetClient(),
        Scheme:  mgr.GetScheme(),
        Scanner: scanner.NewNucleiScannerWithDefaults(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "NucleiScan")
        os.Exit(1)
    }
    
    // ... start manager ...
}

4. Scanning Strategy

4.1 URL Extraction

From Ingress Resources

// internal/util/url_extractor.go

func ExtractURLsFromIngress(ingress *networkingv1.Ingress) []string {
    var urls []string
    
    // Extract from TLS hosts - these are HTTPS
    for _, tls := range ingress.Spec.TLS {
        for _, host := range tls.Hosts {
            urls = append(urls, fmt.Sprintf("https://%s", host))
        }
    }
    
    // Extract from rules
    for _, rule := range ingress.Spec.Rules {
        if rule.Host != "" {
            // Check if host is in TLS list
            isHTTPS := false
            for _, tls := range ingress.Spec.TLS {
                for _, tlsHost := range tls.Hosts {
                    if tlsHost == rule.Host {
                        isHTTPS = true
                        break
                    }
                }
            }
            
            scheme := "http"
            if isHTTPS {
                scheme = "https"
            }
            
            // Add paths if defined
            if rule.HTTP != nil {
                for _, path := range rule.HTTP.Paths {
                    urls = append(urls, fmt.Sprintf("%s://%s%s", scheme, rule.Host, path.Path))
                }
            } else {
                urls = append(urls, fmt.Sprintf("%s://%s", scheme, rule.Host))
            }
        }
    }
    
    return deduplicateURLs(urls)
}

From VirtualService Resources

func ExtractURLsFromVirtualService(vs *istionetworkingv1beta1.VirtualService) []string {
    var urls []string
    
    // VirtualService hosts can be FQDNs or short names
    for _, host := range vs.Spec.Hosts {
        // Skip wildcard hosts
        if strings.HasPrefix(host, "*") {
            continue
        }
        
        // The scheme is not derived from the Gateway configuration here;
        // default to HTTPS for security scanning
        scheme := "https"
        
        // Remember how many URLs existed before this host so the base-URL
        // fallback below is decided per host, not for the whole list
        before := len(urls)
        
        // Extract paths from HTTP routes
        for _, httpRoute := range vs.Spec.Http {
            for _, match := range httpRoute.Match {
                // StringMatch is a oneof; the generated getters are nil-safe
                // and return "" when the variant is not set
                if prefix := match.GetUri().GetPrefix(); prefix != "" {
                    urls = append(urls, fmt.Sprintf("%s://%s%s", scheme, host, prefix))
                } else if exact := match.GetUri().GetExact(); exact != "" {
                    urls = append(urls, fmt.Sprintf("%s://%s%s", scheme, host, exact))
                }
            }
        }
        
        // If this host produced no path-specific URLs, add its base URL
        if len(urls) == before {
            urls = append(urls, fmt.Sprintf("%s://%s", scheme, host))
        }
    }
    
    return deduplicateURLs(urls)
}
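
Both extractors end by calling `deduplicateURLs`, which is not shown above. A minimal sketch of what such a helper might look like, assuming first-seen order should be kept and that a single trailing slash is not significant (so `https://host/` and `https://host` count as the same target):

```go
package main

import "strings"

// deduplicateURLs returns urls without duplicates, preserving the order
// in which each URL first appeared. One trailing slash is trimmed before
// comparison; this normalization is an assumption of the sketch.
func deduplicateURLs(urls []string) []string {
	seen := make(map[string]struct{}, len(urls))
	out := make([]string, 0, len(urls))
	for _, u := range urls {
		key := strings.TrimSuffix(u, "/")
		if key == "" {
			continue // skip empty entries
		}
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, key)
	}
	return out
}
```

Keeping the order stable means repeated reconciles of an unchanged source resource produce an identical target list.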

4.2 Scan Execution

The scanner runs synchronously: Scan blocks until the Nuclei process exits or the context times out, so each scan completes within a single reconcile call.

Location: internal/scanner/scanner.go

// Scanner defines the interface for executing Nuclei scans
type Scanner interface {
    // Scan executes a Nuclei scan against the given targets and returns the results
    Scan(ctx context.Context, targets []string, options ScanOptions) (*ScanResult, error)
}

// ScanOptions contains configuration options for a scan
type ScanOptions struct {
    Templates []string      // Nuclei templates to use
    Severity  []string      // Severity filter
    Timeout   time.Duration // Maximum scan duration
}

// ScanResult contains the results of a completed scan
type ScanResult struct {
    Findings []nucleiv1alpha1.Finding
    Summary  nucleiv1alpha1.ScanSummary
    Duration time.Duration
}

// NucleiScanner implements the Scanner interface using the Nuclei binary
type NucleiScanner struct {
    nucleiBinaryPath string
    templatesPath    string
}

// Configuration via environment variables:
// - NUCLEI_BINARY_PATH: Path to nuclei binary (default: "nuclei")
// - NUCLEI_TEMPLATES_PATH: Path to templates directory (default: uses nuclei default)
// - NUCLEI_TIMEOUT: Default scan timeout (default: 30m)

func NewNucleiScannerWithDefaults() *NucleiScanner {
    return NewNucleiScanner(DefaultConfig())
}

func (s *NucleiScanner) Scan(ctx context.Context, targets []string, options ScanOptions) (*ScanResult, error) {
    // 1. Create temporary directory for scan
    // 2. Write targets to file
    // 3. Build nuclei command arguments
    // 4. Execute nuclei with context timeout
    // 5. Parse JSONL output
    // 6. Calculate summary statistics
    // 7. Return results
}
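
`NewNucleiScannerWithDefaults` relies on `DefaultConfig()`, which is not shown above. A minimal sketch of how the documented environment variables and defaults could be read, assuming a `Config` struct with the hypothetical field names below; `configFrom` and `envOr` are also illustrative helpers, not names from the source:

```go
package main

import (
	"os"
	"time"
)

// Config is an assumed shape for the scanner settings.
type Config struct {
	NucleiBinaryPath string
	TemplatesPath    string
	DefaultTimeout   time.Duration
}

// envOr returns the value for key, or fallback when it is unset or empty.
func envOr(getenv func(string) string, key, fallback string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return fallback
}

// configFrom builds a Config from the documented variables, falling back
// to the documented defaults. An unparsable or non-positive
// NUCLEI_TIMEOUT is ignored in favour of the 30m default.
func configFrom(getenv func(string) string) Config {
	timeout := 30 * time.Minute
	if raw := getenv("NUCLEI_TIMEOUT"); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
			timeout = d
		}
	}
	return Config{
		NucleiBinaryPath: envOr(getenv, "NUCLEI_BINARY_PATH", "nuclei"),
		TemplatesPath:    envOr(getenv, "NUCLEI_TEMPLATES_PATH", ""),
		DefaultTimeout:   timeout,
	}
}

// DefaultConfig reads the settings from the process environment.
func DefaultConfig() Config {
	return configFrom(os.Getenv)
}
```

Passing the lookup function in keeps the parsing logic testable without modifying the process environment.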

Command Arguments:

args := []string{
    "-l", targetsFile,      // Target list file
    "-jsonl",               // JSON Lines output format
    "-silent",              // Suppress banner
    "-no-color",            // No ANSI colors
}

if s.templatesPath != "" {
    args = append(args, "-t", s.templatesPath)
}

if len(options.Templates) > 0 {
    for _, t := range options.Templates {
        args = append(args, "-t", t)
    }
}

if len(options.Severity) > 0 {
    args = append(args, "-severity", strings.Join(options.Severity, ","))
}
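
The `targetsFile` passed to `-l` has to be written before Nuclei starts (steps 1 and 2 of `Scan`). One way this might be done is sketched below, assuming a per-scan temporary directory that is removed afterwards; `formatTargets`, `writeTargetsFile`, and the `targets.txt` file name are hypothetical.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// formatTargets renders one target per line and drops blank entries.
func formatTargets(targets []string) string {
	var b strings.Builder
	for _, t := range targets {
		t = strings.TrimSpace(t)
		if t == "" {
			continue
		}
		b.WriteString(t)
		b.WriteByte('\n')
	}
	return b.String()
}

// writeTargetsFile creates a temporary directory, writes the targets
// into targets.txt inside it, and returns the file path together with a
// cleanup function that removes the whole directory.
func writeTargetsFile(targets []string) (path string, cleanup func(), err error) {
	dir, err := os.MkdirTemp("", "nuclei-scan-")
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { _ = os.RemoveAll(dir) }

	path = filepath.Join(dir, "targets.txt")
	if err := os.WriteFile(path, []byte(formatTargets(targets)), 0o600); err != nil {
		cleanup()
		return "", nil, err
	}
	return path, cleanup, nil
}
```

A caller would typically `defer cleanup()` right after a successful call so the directory is removed even when the scan fails.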

4.3 JSONL Parsing

Location: internal/scanner/parser.go

The parser handles Nuclei's JSONL output format, which can have varying field types (strings vs arrays).

// NucleiOutput represents the structure of Nuclei's JSONL output
type NucleiOutput struct {
    TemplateID       string     `json:"template-id"`
    TemplatePath     string     `json:"template-path"`
    Info             NucleiInfo `json:"info"`
    Type             string     `json:"type"`
    Host             string     `json:"host"`
    MatchedAt        string     `json:"matched-at"`
    Timestamp        string     `json:"timestamp"`
    ExtractedResults interface{} `json:"extracted-results,omitempty"`
    MatcherName      string     `json:"matcher-name,omitempty"`
    IP               string     `json:"ip,omitempty"`
}

type NucleiInfo struct {
    Name        string      `json:"name"`
    Author      interface{} `json:"author"`      // Can be string or []string
    Tags        interface{} `json:"tags"`        // Can be string or []string
    Description string      `json:"description,omitempty"`
    Severity    string      `json:"severity"`
    Reference   interface{} `json:"reference,omitempty"` // Can be string or []string
    Metadata    interface{} `json:"metadata,omitempty"`
}

// ParseJSONLOutput parses Nuclei's JSONL output and returns a slice of Findings
func ParseJSONLOutput(output []byte) ([]nucleiv1alpha1.Finding, error) {
    var findings []nucleiv1alpha1.Finding
    sc := bufio.NewScanner(bytes.NewReader(output))
    // Result lines can exceed bufio.Scanner's default 64 KiB token limit,
    // so allow lines of up to 1 MiB (an assumed upper bound)
    sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
    
    for sc.Scan() {
        line := bytes.TrimSpace(sc.Bytes())
        if len(line) == 0 || line[0] != '{' {
            continue // Skip empty lines and non-JSON output
        }
        
        finding, err := parseJSONLine(line)
        if err != nil {
            continue // Skip malformed lines; a warning should be logged here
        }
        findings = append(findings, finding)
    }
    
    return findings, sc.Err()
}

// parseStringSlice handles fields that can be string or []string
func parseStringSlice(v interface{}) []string {
    switch val := v.(type) {
    case string:
        // Split comma-separated values and trim surrounding whitespace,
        // so "cve, rce" yields ["cve", "rce"]
        parts := strings.Split(val, ",")
        result := make([]string, 0, len(parts))
        for _, p := range parts {
            if p = strings.TrimSpace(p); p != "" {
                result = append(result, p)
            }
        }
        return result
    case []interface{}:
        result := make([]string, 0, len(val))
        for _, item := range val {
            if s, ok := item.(string); ok {
                result = append(result, s)
            }
        }
        return result
    }
    return nil
}
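
`ParseJSONLOutput` delegates each line to `parseJSONLine`, which is not shown above. The following is a simplified sketch of that step, not the actual implementation: `rawFinding` reads only a subset of the JSONL fields, `finding` stands in for `nucleiv1alpha1.Finding`, and falling back to the current time for an unparsable timestamp is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// rawFinding holds the subset of Nuclei JSONL fields this sketch reads.
type rawFinding struct {
	TemplateID string `json:"template-id"`
	Type       string `json:"type"`
	Host       string `json:"host"`
	MatchedAt  string `json:"matched-at"`
	Timestamp  string `json:"timestamp"`
	Info       struct {
		Name     string `json:"name"`
		Severity string `json:"severity"`
	} `json:"info"`
}

// finding is a simplified stand-in for nucleiv1alpha1.Finding.
type finding struct {
	TemplateID, TemplateName, Severity, Type, Host, MatchedAt string
	Timestamp                                                 time.Time
}

// parseJSONLine decodes one JSONL line into a finding. Lines that are
// not valid JSON or lack a template-id are rejected.
func parseJSONLine(line []byte) (finding, error) {
	var raw rawFinding
	if err := json.Unmarshal(line, &raw); err != nil {
		return finding{}, fmt.Errorf("decode nuclei output: %w", err)
	}
	if raw.TemplateID == "" {
		return finding{}, errors.New("nuclei output line has no template-id")
	}
	ts, err := time.Parse(time.RFC3339Nano, raw.Timestamp)
	if err != nil {
		ts = time.Now().UTC() // assumed fallback for missing or odd timestamps
	}
	return finding{
		TemplateID:   raw.TemplateID,
		TemplateName: raw.Info.Name,
		Severity:     raw.Info.Severity,
		Type:         raw.Type,
		Host:         raw.Host,
		MatchedAt:    raw.MatchedAt,
		Timestamp:    ts,
	}, nil
}
```

Returning an error for unusable lines lets the caller decide whether to skip, log, or abort, which matches the skip-and-continue behaviour of `ParseJSONLOutput`.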

4.4 Scan Lifecycle State Machine

stateDiagram-v2
    [*] --> Pending: NucleiScan Created
    
    Pending --> Running: Scan Started
    Running --> Completed: Scan Finished Successfully
    Running --> Failed: Scan Error
    
    Completed --> Pending: Scheduled Rescan
    Failed --> Pending: Retry with Backoff
    
    Completed --> [*]: No Schedule
    Failed --> [*]: Max Retries Exceeded

Phase Transitions:

| From | To | Trigger | Actions |
|------|----|---------|---------|
| - | Pending | NucleiScan created | Initialize status |
| Pending | Running | Reconciler picks up | Start Nuclei process, set LastScanTime |
| Running | Completed | Scan finishes | Parse results, update Findings, set CompletionTime |
| Running | Failed | Scan error | Record error, increment retry count |
| Completed | Pending | Schedule triggers | Reset status, calculate NextScheduledTime |
| Failed | Pending | Retry backoff elapsed | Reset status, increment attempt |
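
The transitions listed above can be encoded as a lookup table so that the reconciler rejects any phase change the state machine does not allow. This is a sketch under that assumption: `allowedTransitions` and `canTransition` are hypothetical names, and the empty phase stands for a newly created NucleiScan without status.

```go
package main

// ScanPhase mirrors the phase type from the CRD definition.
type ScanPhase string

const (
	ScanPhasePending   ScanPhase = "Pending"
	ScanPhaseRunning   ScanPhase = "Running"
	ScanPhaseCompleted ScanPhase = "Completed"
	ScanPhaseFailed    ScanPhase = "Failed"
)

// allowedTransitions lists, for each phase, the phases it may move to.
var allowedTransitions = map[ScanPhase][]ScanPhase{
	"":                 {ScanPhasePending},
	ScanPhasePending:   {ScanPhaseRunning},
	ScanPhaseRunning:   {ScanPhaseCompleted, ScanPhaseFailed},
	ScanPhaseCompleted: {ScanPhasePending},
	ScanPhaseFailed:    {ScanPhasePending},
}

// canTransition reports whether moving from one phase to another is
// permitted by the table.
func canTransition(from, to ScanPhase) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

Keeping the table in one place makes it easy to check that status updates in the controller match the documented state machine.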

5. Owner References

Owner references ensure that NucleiScan resources are automatically garbage collected when their source Ingress or VirtualService is deleted.

Implementation

// internal/util/owner_reference.go

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/utils/ptr"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// SetOwnerReference sets the owner reference on a NucleiScan
func SetOwnerReference(owner metav1.Object, owned metav1.Object, scheme *runtime.Scheme) error {
    return controllerutil.SetControllerReference(owner, owned, scheme)
}

// CreateOwnerReference builds an owner reference by hand.
// Note: a namespaced owner must live in the same namespace as the dependent;
// cross-namespace owner references are not supported by Kubernetes.
// The owner takes client.Object because GetObjectKind comes from
// runtime.Object, not metav1.Object. Its TypeMeta must be populated,
// otherwise APIVersion and Kind end up empty.
func CreateOwnerReference(owner client.Object, blockOwnerDeletion bool) metav1.OwnerReference {
    gvk := owner.GetObjectKind().GroupVersionKind()
    return metav1.OwnerReference{
        APIVersion:         gvk.GroupVersion().String(),
        Kind:               gvk.Kind,
        Name:               owner.GetName(),
        UID:                owner.GetUID(),
        Controller:         ptr.To(true),
        BlockOwnerDeletion: ptr.To(blockOwnerDeletion),
    }
}

Usage in Controllers

// In IngressReconciler.Reconcile()
func (r *IngressReconciler) createNucleiScan(ctx context.Context, ingress *networkingv1.Ingress) error {
    nucleiScan := &nucleiv1alpha1.NucleiScan{
        ObjectMeta: metav1.ObjectMeta{
            Name:      fmt.Sprintf("%s-scan", ingress.Name),
            Namespace: ingress.Namespace,
        },
        Spec: nucleiv1alpha1.NucleiScanSpec{
            SourceRef: nucleiv1alpha1.SourceReference{
                APIVersion: "networking.k8s.io/v1",
                Kind:       "Ingress",
                Name:       ingress.Name,
                Namespace:  ingress.Namespace,
                UID:        string(ingress.UID),
            },
            Targets: util.ExtractURLsFromIngress(ingress),
        },
    }
    
    // Set owner reference - this enables garbage collection
    if err := controllerutil.SetControllerReference(ingress, nucleiScan, r.Scheme); err != nil {
        return fmt.Errorf("failed to set owner reference: %w", err)
    }
    
    return r.Create(ctx, nucleiScan)
}

Owner Reference Behavior

| Scenario | Behavior |
|----------|----------|
| Ingress deleted | NucleiScan automatically deleted by Kubernetes GC |
| VirtualService deleted | NucleiScan automatically deleted by Kubernetes GC |
| NucleiScan manually deleted | Recreated by Ingress/VS controller on next reconcile |
| Ingress updated | NucleiScan targets updated, scan re-triggered |

Important Considerations

  1. Same Namespace Requirement: A namespaced dependent can only reference an owner in its own namespace (or a cluster-scoped owner). The NucleiScan must therefore be created in the same namespace as the source Ingress/VirtualService.

  2. Controller Reference: Using SetControllerReference ensures:

    • Only one controller owns the resource
    • controller: true is set in the owner reference
    • blockOwnerDeletion: true is set, so a foreground deletion of the owner waits until the NucleiScan has been removed

  3. Finalizers: Consider adding finalizers if cleanup actions are needed before deletion:

const finalizerName = "nuclei.homelab.mortenolsen.pro/finalizer"

func (r *NucleiScanReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    nucleiScan := &nucleiv1alpha1.NucleiScan{}
    if err := r.Get(ctx, req.NamespacedName, nucleiScan); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    
    // Check if being deleted
    if !nucleiScan.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(nucleiScan, finalizerName) {
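            // Perform cleanup (cancel running scans, etc.)
            // Note: CancelScan is not part of the Scanner interface shown in
            // section 4.2; it is assumed here as an additional method
            if err := r.Scanner.CancelScan(ctx, nucleiScan.Name); err != nil {
                return ctrl.Result{}, err
            }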
            
            // Remove finalizer
            controllerutil.RemoveFinalizer(nucleiScan, finalizerName)
            if err := r.Update(ctx, nucleiScan); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }
    
    // Add finalizer if not present
    if !controllerutil.ContainsFinalizer(nucleiScan, finalizerName) {
        controllerutil.AddFinalizer(nucleiScan, finalizerName)
        if err := r.Update(ctx, nucleiScan); err != nil {
            return ctrl.Result{}, err
        }
    }
    
    // ... rest of reconciliation
}

6. RBAC Requirements

The operator requires specific permissions to watch, create, and manage resources across the cluster.

ClusterRole Definition

# config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nuclei-operator-manager-role
rules:
  # Core API - Events for recording
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  
  # Networking API - Ingress resources
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  
  # Istio API - VirtualService resources
  - apiGroups: ["networking.istio.io"]
    resources: ["virtualservices"]
    verbs: ["get", "list", "watch"]
  
  # Nuclei Operator API - NucleiScan resources
  - apiGroups: ["nuclei.homelab.mortenolsen.pro"]
    resources: ["nucleiscans"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  
  - apiGroups: ["nuclei.homelab.mortenolsen.pro"]
    resources: ["nucleiscans/status"]
    verbs: ["get", "update", "patch"]
  
  - apiGroups: ["nuclei.homelab.mortenolsen.pro"]
    resources: ["nucleiscans/finalizers"]
    verbs: ["update"]
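
In a standard Kubebuilder project, `config/rbac/role.yaml` is usually generated by `make manifests` from `+kubebuilder:rbac` markers rather than edited by hand. A sketch of markers that would correspond to the rules above follows. The stub type stands in for the real reconciler, and placing all markers on a single controller is an assumption; in practice each controller would likely carry only the markers it needs.

```go
package main

// NucleiScanReconciler is a stub standing in for the real reconciler type.
type NucleiScanReconciler struct{}

// +kubebuilder:rbac:groups="",resources=events,verbs=create;patch
// +kubebuilder:rbac:groups=networking.k8s.io,resources=ingresses,verbs=get;list;watch
// +kubebuilder:rbac:groups=networking.istio.io,resources=virtualservices,verbs=get;list;watch
// +kubebuilder:rbac:groups=nuclei.homelab.mortenolsen.pro,resources=nucleiscans,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=nuclei.homelab.mortenolsen.pro,resources=nucleiscans/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=nuclei.homelab.mortenolsen.pro,resources=nucleiscans/finalizers,verbs=update

// Reconcile is a placeholder body; controller-gen only reads the markers above.
func (r *NucleiScanReconciler) Reconcile() error {
	return nil
}
```

Keeping the markers next to the code that uses the permissions makes it easier to notice when a rule is no longer needed.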

Leader Election Role

# config/rbac/leader_election_role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nuclei-operator-leader-election-role
  namespace: nuclei-operator-system
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]

ServiceAccount and Bindings

# config/rbac/service_account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nuclei-operator-controller-manager
  namespace: nuclei-operator-system
---
# config/rbac/role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nuclei-operator-manager-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nuclei-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: nuclei-operator-controller-manager
    namespace: nuclei-operator-system
---
# config/rbac/leader_election_role_binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nuclei-operator-leader-election-rolebinding
  namespace: nuclei-operator-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nuclei-operator-leader-election-role
subjects:
  - kind: ServiceAccount
    name: nuclei-operator-controller-manager
    namespace: nuclei-operator-system

RBAC Summary Table

| Resource | API Group | Verbs | Purpose |
| --- | --- | --- | --- |
| ingresses | networking.k8s.io | get, list, watch | Watch Ingress resources for changes |
| virtualservices | networking.istio.io | get, list, watch | Watch VirtualService resources for changes |
| nucleiscans | nuclei.homelab.mortenolsen.pro | get, list, watch, create, update, patch, delete | Full control over NucleiScan resources |
| nucleiscans/status | nuclei.homelab.mortenolsen.pro | get, update, patch | Update scan status and results |
| nucleiscans/finalizers | nuclei.homelab.mortenolsen.pro | update | Manage finalizers for cleanup |
| events | core | create, patch | Record events for observability |
| configmaps | core | get, list, watch, create, update, patch, delete | Leader election coordination (operator namespace only) |
| leases | coordination.k8s.io | get, list, watch, create, update, patch, delete | Leader election coordination (operator namespace only) |

7. Deployment Architecture

Operator Deployment

# config/manager/manager.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nuclei-operator-controller-manager
  namespace: nuclei-operator-system
  labels:
    control-plane: controller-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      serviceAccountName: nuclei-operator-controller-manager
      terminationGracePeriodSeconds: 10
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: manager
          image: nuclei-operator:latest
          command:
            - /manager
          args:
            - --leader-elect
            - --health-probe-bind-address=:8081
            - --metrics-bind-address=:8080
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          ports:
            - containerPort: 8080
              name: metrics
              protocol: TCP
            - containerPort: 8081
              name: health
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: nuclei-templates
              mountPath: /nuclei-templates
              readOnly: true
            - name: scan-workspace
              mountPath: /tmp/nuclei-scans
      volumes:
        - name: nuclei-templates
          configMap:
            name: nuclei-templates
        - name: scan-workspace
          emptyDir: {}

High-Level Deployment Diagram

flowchart TB
    subgraph Kubernetes Cluster
        subgraph nuclei-operator-system namespace
            SA[ServiceAccount]
            DEP[Deployment: controller-manager]
            SVC[Service: metrics]
            CM[ConfigMap: nuclei-templates]
        end
        
        subgraph RBAC
            CR[ClusterRole]
            CRB[ClusterRoleBinding]
            R[Role: leader-election]
            RB[RoleBinding: leader-election]
        end
        
        subgraph CRDs
            CRD[NucleiScan CRD]
        end
        
        subgraph Watched Namespaces
            NS1[Namespace: app1]
            NS2[Namespace: app2]
            ING1[Ingress]
            ING2[Ingress]
            VS1[VirtualService]
            SCAN1[NucleiScan]
            SCAN2[NucleiScan]
            SCAN3[NucleiScan]
        end
    end
    
    SA --> DEP
    CR --> CRB
    CRB --> SA
    R --> RB
    RB --> SA
    
    DEP -->|watches| ING1
    DEP -->|watches| ING2
    DEP -->|watches| VS1
    DEP -->|manages| SCAN1
    DEP -->|manages| SCAN2
    DEP -->|manages| SCAN3
    
    ING1 -.->|ownerRef| SCAN1
    ING2 -.->|ownerRef| SCAN2
    VS1 -.->|ownerRef| SCAN3

8. Future Considerations

Potential Enhancements

  1. Webhook Validation: Add admission webhooks to validate NucleiScan resources before creation.

  2. Metrics and Monitoring: Expose Prometheus metrics for:

    • Scan duration histograms
    • Finding counts by severity
    • Scan success/failure rates
  3. Notification Integration: Send alerts when critical vulnerabilities are found via:

    • Slack webhooks
    • PagerDuty
    • Email
  4. Template Management: CRD for managing Nuclei templates:

    apiVersion: nuclei.homelab.mortenolsen.pro/v1alpha1
    kind: NucleiTemplate
    metadata:
      name: custom-templates
    spec:
      gitRepository: https://github.com/org/nuclei-templates
      branch: main
      syncInterval: 1h
    
  5. Scan Policies: Define scanning policies at namespace or cluster level:

    apiVersion: nuclei.homelab.mortenolsen.pro/v1alpha1
    kind: ScanPolicy
    metadata:
      name: default-policy
    spec:
      schedule: "0 2 * * *"  # Daily at 2 AM
      severity:
        - medium
        - high
        - critical
      excludeTemplates: []
      namespaceSelector:
        matchLabels:
          scan-enabled: "true"
    
  6. Multi-cluster Support: Extend to scan resources across multiple clusters using cluster federation.


Appendix A: Nuclei JSONL Output Format

Example Nuclei JSONL output that will be parsed into the Findings array:

{"template":"cves/2021/CVE-2021-44228.yaml","template-id":"CVE-2021-44228","template-path":"/nuclei-templates/cves/2021/CVE-2021-44228.yaml","info":{"name":"Apache Log4j RCE","author":["pdteam"],"tags":["cve","cve2021","rce","log4j"],"description":"Apache Log4j2 <=2.14.1 JNDI features...","severity":"critical","reference":["https://nvd.nist.gov/vuln/detail/CVE-2021-44228"]},"type":"http","host":"https://example.com","matched-at":"https://example.com/api/login","timestamp":"2024-01-15T10:32:00.000Z"}
{"template":"ssl/weak-cipher-suites.yaml","template-id":"ssl-weak-cipher","template-path":"/nuclei-templates/ssl/weak-cipher-suites.yaml","info":{"name":"Weak SSL Cipher Suites","author":["pdteam"],"tags":["ssl","tls","cipher"],"description":"Weak SSL cipher suites detected","severity":"medium"},"type":"ssl","host":"https://api.example.com","matched-at":"https://api.example.com:443","timestamp":"2024-01-15T10:33:00.000Z"}

Appendix B: Quick Reference Commands

# Initialize Kubebuilder project
kubebuilder init --domain homelab.mortenolsen.pro --repo github.com/mortenolsen/nuclei-operator

# Create API and Controller
kubebuilder create api --group nuclei --version v1alpha1 --kind NucleiScan

# Generate CRD and RBAC manifests
make manifests

# Install CRDs into cluster
make install

# Run operator locally
make run

# Build and push operator image
make docker-build docker-push IMG=<registry>/nuclei-operator:tag

# Deploy to cluster
make deploy IMG=<registry>/nuclei-operator:tag

Appendix C: Testing Strategy

Unit Tests

// internal/controller/ingress_controller_test.go

var _ = Describe("Ingress Controller", func() {
    Context("When reconciling an Ingress", func() {
        It("Should create a NucleiScan for a new Ingress", func() {
            // Test implementation
        })
        
        It("Should update NucleiScan when Ingress hosts change", func() {
            // Test implementation
        })
        
        It("Should not create duplicate NucleiScans", func() {
            // Test implementation
        })
    })
})

Integration Tests

// internal/controller/suite_test.go

var _ = BeforeSuite(func() {
    // Setup test environment with envtest
    testEnv = &envtest.Environment{
        CRDDirectoryPaths: []string{
            filepath.Join("..", "..", "config", "crd", "bases"),
        },
    }
    // ...
})

End-to-End Tests

# Deploy to kind cluster and run e2e tests
make test-e2e