Legacy Linux Edge WAN with strongSwan and FRR

Status

Superseded — this early Linux strongSwan + FRR edge pattern is no longer part of the supported HybridOps baseline. The supported path is the VyOS-based edge control plane and on-prem site-extension model.

Superseded guidance

Keep this ADR as implementation history only. For the current supported operating path, use the VyOS-based edge runbooks.

1. Context

HybridOps required a reliable early path for site-to-cloud connectivity before the current VyOS-based edge control plane and Hetzner site-extension pattern existed. This ADR records that interim Linux-based design and why it was attractive at the time.

Requirements:

  • Route-based IPsec compatible with cloud-native VPN gateways (GCP HA VPN, Azure VPN Gateway)
  • Dynamic routing via BGP for automatic failover and prefix exchange
  • Narrow traffic selectors to protect management traffic on shared hosts
  • Deterministic configuration via Ansible for repeatability and evidence

Constraints:

  • Must run on standard Linux (Debian/Ubuntu) without proprietary software
  • Must support dual-tunnel HA patterns matching cloud VPN gateway designs
  • Configuration must be auditable and version-controlled

2. Decision

The decision for that interim phase was to adopt strongSwan (configured via swanctl) for IPsec and FRR for BGP as the WAN edge stack.

This is no longer the default path for new HybridOps deployments. New operators should follow the current VyOS-based runbooks instead.

  • IPsec: strongSwan swanctl with route-based VTI interfaces
  • Routing: FRR bgpd with strict prefix-list filtering
  • Tunnels: Dual VTI interfaces per site for HA (matches GCP HA VPN / Azure active-active)
  • Traffic selectors: Narrow selectors based on advertised/imported prefixes
  • Automation: Ansible roles wan_edge (configuration) and wan_validate (verification)
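
As an illustration of the "narrow selectors based on advertised/imported prefixes" item, the sketch below derives a local and a remote traffic selector string from two prefix lists. It is not taken from the wan_edge role; the function name and the example prefixes are hypothetical, and it assumes IPv4-only input and that a default route is never an acceptable selector.

```python
import ipaddress


def build_traffic_selectors(advertised, imported):
    """Return (local_ts, remote_ts) strings derived from routed prefixes."""

    def normalise(prefixes, label):
        if not prefixes:
            raise ValueError(f"{label}: at least one prefix is required")
        # ip_network() is strict by default and rejects prefixes with host bits set.
        networks = [ipaddress.ip_network(p) for p in prefixes]
        for network in networks:
            if network.prefixlen == 0:
                raise ValueError(f"{label}: {network} is not a narrow selector")
        # Merge adjacent or overlapping networks without widening the covered range.
        collapsed = ipaddress.collapse_addresses(networks)
        return ",".join(str(network) for network in collapsed)

    return normalise(advertised, "advertised"), normalise(imported, "imported")


# Hypothetical site: two adjacent on-prem /24s, one cloud /16.
local_ts, remote_ts = build_traffic_selectors(
    advertised=["10.10.0.0/24", "10.10.1.0/24"],
    imported=["10.20.0.0/16"],
)
print(local_ts)
print(remote_ts)
```

The two strings are in the comma-separated form that swanctl accepts for local_ts and remote_ts, so a template could consume them directly.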

3. Rationale

strongSwan + swanctl over other IPsec implementations:

  • Native Linux, widely deployed, active maintenance
  • swanctl provides declarative configuration (vs legacy ipsec.conf)
  • VTI support for route-based tunnels required by cloud providers
  • Mark-based SA selection avoids policy conflicts with management traffic

FRR over other routing daemons:

  • Industry-standard BGP implementation
  • Integrated vtysh for operational familiarity
  • Prefix-list and route-map support for policy control
  • Active community, Debian/Ubuntu packages available

Dual-tunnel HA pattern:

  • Matches GCP HA VPN interface model (two tunnels, two inside /30s)
  • Provides redundancy without complex failover scripts
  • BGP handles path selection automatically
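
To make the "two tunnels, two inside /30s" model concrete, here is a minimal sketch that carves one /30 per tunnel out of a link-local pool. The pool value, the interface names, and the convention that the cloud side takes the first host address are assumptions for illustration, not settings recorded in this ADR.

```python
import ipaddress


def allocate_inside_links(pool, tunnel_count=2):
    """Assign one /30 per tunnel from the given pool, in order."""
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=30)
    links = []
    for index, subnet in zip(range(tunnel_count), subnets):
        # A /30 has exactly two usable host addresses.
        first, second = list(subnet.hosts())
        links.append(
            {
                "interface": f"vti{index}",    # hypothetical naming scheme
                "subnet": str(subnet),
                "peer_address": str(first),    # assumed: cloud router side
                "local_address": str(second),  # assumed: on-prem edge side
            }
        )
    if len(links) < tunnel_count:
        raise ValueError(f"{pool} is too small for {tunnel_count} /30 links")
    return links


for link in allocate_inside_links("169.254.10.0/29"):
    print(link)
```

Each entry gives BGP a distinct peering address pair per tunnel, which is what lets path selection happen in the routing protocol rather than in failover scripts.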

Narrow traffic selectors:

  • Prevents IPsec policies from capturing management (SSH) traffic
  • Allows shared-host deployments where tunnel and management coexist
  • Matches effective behavior in production (only routed prefixes traverse the tunnel)
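
The effect on management traffic can be checked with a small membership test. The sketch below asks whether a given flow would fall inside a pair of selector lists; it deliberately ignores marks and models only the selector match. All addresses are documentation or private ranges chosen for the example, and the function name is hypothetical.

```python
import ipaddress


def flow_matches_selectors(src, dst, local_ts, remote_ts):
    """True if src is inside a local selector and dst is inside a remote one."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    local_hit = any(src_ip in ipaddress.ip_network(n) for n in local_ts)
    remote_hit = any(dst_ip in ipaddress.ip_network(n) for n in remote_ts)
    return local_hit and remote_hit


# SSH reply traffic from the shared host back to an admin workstation.
mgmt_flow = ("203.0.113.10", "198.51.100.7")

narrow = (["10.10.0.0/23"], ["10.20.0.0/16"])
wide = (["0.0.0.0/0"], ["0.0.0.0/0"])

print(flow_matches_selectors(*mgmt_flow, *narrow))  # False: SSH stays outside the tunnel
print(flow_matches_selectors(*mgmt_flow, *wide))    # True: a catch-all selector would match it
```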

4. Consequences

4.1 Positive consequences

  • Zero licensing cost for WAN edge functionality
  • Consistent configuration across sites via Ansible
  • Cloud-provider agnostic (same pattern for GCP, Azure, AWS)
  • Full observability via standard Linux tools (ip xfrm, vtysh, journalctl)
  • Testable locally with WAN simulator before production deployment

4.2 Negative consequences / risks

  • Requires Linux networking expertise for troubleshooting
  • No vendor support; community and internal knowledge required
  • BGP misconfiguration can cause routing loops or blackholes
  • IPsec rekeying during high traffic may cause brief packet loss

Mitigations:

  • wan_validate role provides automated health checks
  • Strict prefix-lists prevent route leaks
  • Smoke tests validate configuration before production apply

5. Alternatives considered

Commercial SD-WAN (Cisco, Fortinet, Palo Alto)

  • Rejected: Licensing cost prohibitive for platform scale
  • Rejected: Vendor lock-in conflicts with multi-cloud strategy

WireGuard

  • Rejected: No native BGP integration
  • Rejected: Not supported by GCP/Azure VPN gateways for site-to-cloud

OpenVPN

  • Rejected: TLS-based, not compatible with cloud IPsec gateways
  • Rejected: Performance inferior to kernel IPsec

LibreSwan
(The project's own spelling is Libreswan.)

  • Considered: Similar capability to strongSwan
  • Rejected: Less active development, smaller community

6. Implementation notes

Ansible roles:

  • hybridops.network.wan_edge — strongSwan, VTI, FRR configuration
  • hybridops.network.wan_validate — IPsec, BGP, route, reachability checks

Key files:

  • roles/wan_edge/templates/swanctl.conf.j2 — IPsec configuration
  • roles/wan_edge/templates/frr.conf.j2 — BGP configuration
  • roles/wan_edge/defaults/main.yml — tunable defaults

Configuration flow:

  1. Packages installed (strongswan-swanctl, frr)
  2. VTI interfaces created with marks matching IPsec SA
  3. swanctl.conf rendered with narrow traffic selectors
  4. frr.conf rendered with prefix-lists and peer-group
  5. Services enabled, handlers restart on config change
  6. CHILD_SAs verified as installed before the run completes
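
Step 2 of the flow above can be sketched as the list of commands needed to create one marked VTI interface. The function only builds argument lists and does not execute anything. Interface names, addresses, and mark values are hypothetical; the assumption is that the same mark is set as mark_in and mark_out on the matching swanctl child so the kernel ties the SA to the interface.

```python
def vti_commands(name, local_ip, remote_ip, mark, inside_cidr):
    """Build the commands that create and bring up one VTI interface."""
    if not 1 <= mark <= 0xFFFFFFFF:
        raise ValueError("mark must fit in an unsigned 32-bit value and be non-zero")
    return [
        # The VTI key doubles as the mark used for IPsec SA lookup.
        ["ip", "tunnel", "add", name, "mode", "vti",
         "local", local_ip, "remote", remote_ip, "key", str(mark)],
        ["ip", "addr", "add", inside_cidr, "dev", name],
        ["ip", "link", "set", name, "up"],
        # Skip the global policy check on the tunnel interface itself.
        ["sysctl", "-w", f"net.ipv4.conf.{name}.disable_policy=1"],
    ]


# Hypothetical dual-tunnel site: one mark and one inside /30 per tunnel.
tunnels = [
    ("vti0", "198.51.100.2", "203.0.113.1", 42, "169.254.10.2/30"),
    ("vti1", "198.51.100.2", "203.0.113.2", 43, "169.254.10.6/30"),
]
for tunnel in tunnels:
    for command in vti_commands(*tunnel):
        print(" ".join(command))
```

Using a distinct mark per tunnel keeps the two SAs from being confused with each other or with unmarked host traffic.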

7. Operational impact and validation

Validation role checks:

  • CHILD_SA count matches tunnel count
  • No SAs in transient state (REKEYING, DELETING)
  • BGP neighbors established (not Active/Idle)
  • Accepted prefix count >= 1 per neighbor
  • Expected routes present in BGP table
  • End-to-end ping to remote loopbacks
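
The first four checks in this list can be expressed as a pure function over already-collected state. The sketch below is not the wan_validate role; the input shapes (a list of CHILD_SA records and a mapping of BGP neighbors) are hypothetical and are assumed to have been parsed from swanctl and vtysh output beforehand. It reads "count matches tunnel count" as the number of CHILD_SAs in the INSTALLED state.

```python
TRANSIENT_STATES = {"REKEYING", "DELETING"}


def validate_wan_state(child_sas, bgp_neighbors, expected_tunnels):
    """Return a list of human-readable failures; an empty list means healthy."""
    failures = []

    installed = [sa for sa in child_sas if sa["state"] == "INSTALLED"]
    if len(installed) != expected_tunnels:
        failures.append(
            f"expected {expected_tunnels} installed CHILD_SAs, found {len(installed)}"
        )

    for sa in child_sas:
        if sa["state"] in TRANSIENT_STATES:
            failures.append(f"CHILD_SA {sa['name']} is in transient state {sa['state']}")

    for peer, info in bgp_neighbors.items():
        if info["state"] != "Established":
            failures.append(f"BGP neighbor {peer} is {info['state']}")
        elif info["prefixes_received"] < 1:
            failures.append(f"BGP neighbor {peer} has no accepted prefixes")

    return failures


# Hypothetical snapshot: second tunnel is rekeying and its BGP session is down.
sas = [
    {"name": "tunnel0", "state": "INSTALLED"},
    {"name": "tunnel1", "state": "REKEYING"},
]
neighbors = {
    "169.254.10.1": {"state": "Established", "prefixes_received": 3},
    "169.254.10.5": {"state": "Active", "prefixes_received": 0},
}
for failure in validate_wan_state(sas, neighbors, expected_tunnels=2):
    print(failure)
```

Returning every failure rather than stopping at the first gives one report covering both the IPsec and BGP layers.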

Smoke test:

  • Local WAN simulator with two VMs
  • Exercises full IPsec + BGP + routing chain
  • Run via make test.local ROLE=wan_edge

Production monitoring:

  • swanctl --list-sas for IPsec state
  • vtysh -c "show bgp summary" for BGP state
  • Prometheus exporters are available for both; integrating them is a future enhancement

Maintainer: HybridOps
License: MIT-0 for code, CC-BY-4.0 for documentation
