Sidecars Are Dying in 2025: Ambient Mesh, eBPF, and the Gateway API Are Rewriting Kubernetes Service Networking
A pragmatic guide to life after sidecars: how Istio Ambient Mesh, eBPF-driven data planes (Cilium today, Linkerd emerging), and Kubernetes Gateway API cut latency and cost, simplify mTLS and traffic policy, and how to migrate safely—plus when you shouldn’t.
Executive summary
- Sidecars transformed Kubernetes service networking from 2017–2022, but their cost and complexity at scale are now often unjustifiable.
- Three converging trends in 2024–2025 make a sidecar-light or sidecar-free future viable:
- Istio Ambient Mesh separates L4 identity/mTLS from L7 policy with waypoint proxies—no per-pod sidecars.
- eBPF data planes (notably Cilium Service Mesh; Linkerd’s eBPF work is emerging) move more routing, policy, and telemetry into the kernel path.
- The Kubernetes Gateway API unifies north-south and east-west traffic policy, replacing bespoke CRDs with a standard model and conformance tests.
- Expect materially lower latency (fewer context switches), reduced resource spend (no per-pod proxies), simpler upgrades, and standardized policy.
- Migration can be incremental: adopt Gateway API first, enable L4 mesh/mTLS, then add L7 where it pays for itself.
- You should not migrate if you rely on niche L7 per-pod features, run on old kernels/Windows nodes, or have org/process constraints that make kernel-level datapaths too risky today.
The problem sidecars solved—and why they’re costly now
Sidecar proxies provided a uniform place to inject features that Kubernetes lacked: mTLS, retries/timeouts, circuit breaking, traffic shaping, and detailed telemetry. By 2021, “install a mesh, inject sidecars” was the default.
The bill came due:
- Latency: Each request pays for at least two extra user-space proxies (client and server) and multiple kernel/user context switches. For small RPCs, a sidecar can add 0.5–2 ms p95 per hop in steady state, higher under load.
- CPU/Memory: Common Envoy sidecars consume tens of MB of RAM and measurable CPU per pod even when idle. Across 1,000 pods, 50–100 GB RAM of proxies is routine.
- Operational drag: Upgrading proxies means touching every workload; iptables interception complicates debugging; policy CRDs proliferate; security posture depends on hardening thousands of containers.
Cloud-native adopters tolerated this for the feature gains. But as kernel and platform capabilities matured, the opportunity cost became obvious.
What changed by 2025
- Ambient Mesh decouples L4 identity from L7 features.
- Istio Ambient Mesh introduces ztunnel (per-node, L4-only, zero-trust tunnel) for identity and mTLS without a per-pod proxy. L7 features are optional and localized via “waypoint” proxies per service or namespace.
- Result: most traffic pays only the L4 cost; L7 is applied only to workloads that need it.
- eBPF moves the fast path into the kernel.
- eBPF programs attach to sockets/XDP and implement load balancing, policy, and observability with minimal overhead.
- Cilium replaces kube-proxy, adds identity-aware L3/L4/L7 policy with Envoy only when necessary, and now delivers service-mesh features without per-pod sidecars.
- Linkerd’s community has been exploring eBPF-based datapaths to reduce iptables reliance and potentially enable sidecarless modes; as of 2024–2025, this is evolving—production users should verify current status and support.
- The Gateway API standardizes traffic policy.
- Gateway API is now the de facto standard for HTTP, gRPC, TCP, and TLS routing across ingress, egress, and service-to-service traffic (the latter via the GAMMA initiative).
- Mesh vendors are converging on Gateway API for traffic splitting, timeouts, retries, header matching, etc., reducing vendor lock-in and CRD sprawl.
Architectural mental model: control plane, L4 identity, L7 policy
- Control plane: config, identity distribution, and reconciliation (Istio control plane, Cilium operator, Linkerd control plane).
- L4 identity and encryption: implemented by ztunnel (Ambient) or eBPF datapaths (Cilium) often with SPIFFE/SPIRE identities. The L4 layer should be everywhere, cheap, and uniform.
- L7 policy and features: applied only where needed via waypoint proxies (Ambient) or node-level/namespace-scoped Envoy (Cilium). L7 is expensive; use intentionally.
This model matches modern production needs: default-zero-trust, low overhead, and selective higher-level control.
Istio Ambient Mesh: L4 by default, L7 when it pays for itself
Ambient Mesh (introduced in 2022, GA as of Istio 1.24 in late 2024) rethinks Istio’s data plane.
- ztunnel: a lightweight, per-node, L4-only component establishing mTLS and identity. No per-pod proxy injection.
- Waypoint: an Envoy-based L7 proxy deployed per service, workload, or namespace for HTTP-aware features (routing, retries, traffic splitting, JWT auth, etc.).
- HBONE: an HTTP/2 CONNECT-based tunnel protocol that carries mTLS traffic between ztunnels and waypoints, providing L4 encryption everywhere and optional L7 inspection where a waypoint is present.
Benefits:
- Latency: Most traffic crosses a single L4 tunnel hop instead of two sidecars; reduced context switches.
- Cost: ztunnel instances scale with nodes, not pods. Waypoints are provisioned only for L7 workloads.
- Operability: No per-pod injection; upgrades are decoupled from application deployments. L7 blast radius is scoped to a waypoint.
Typical rollout flow:
- Enable Ambient in the Istio control plane.
- Onboard namespaces to the mesh with a label.
- Enforce mesh-wide mTLS via AuthorizationPolicy/PeerAuthentication.
- Add L7 via waypoint only where needed.
Example: enable Ambient and onboard a namespace
```yaml
# 1) Install Istio with Ambient mode enabled (operator/helm not shown here)
---
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    istio.io/dataplane-mode: ambient  # opt this namespace into ztunnel
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: shop
spec:
  mtls:
    mode: STRICT
```
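The control-plane side of step 1 above can also be driven from the CLI. A minimal sketch, assuming a recent istioctl that ships the ambient profile:

```bash
# Install (or reconfigure) the Istio control plane with the ambient profile
istioctl install --set profile=ambient -y

# Equivalent to the Namespace label in the manifest above
kubectl label namespace shop istio.io/dataplane-mode=ambient
```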
Add a waypoint for selective L7 control over the payments service

```yaml
# A waypoint is declared as a Gateway using the istio-waypoint class
# (it can also be generated with `istioctl waypoint apply`)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: payments-waypoint
  namespace: shop
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE   # L7 features are applied by the waypoint's Envoy
```

Bind traffic to the waypoint by labeling the payments Service (or the whole namespace) with istio.io/use-waypoint: payments-waypoint.
Implement L7 traffic split using Gateway API via Istio
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-traffic-split
  namespace: shop
spec:
  parentRefs:
  - group: ""
    kind: Service        # GAMMA-style: attach the route to the Service it configures
    name: payments
    port: 8080
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: "/"
    backendRefs:
    - name: payments-v1
      port: 8080
      weight: 90
    - name: payments-v2
      port: 8080
      weight: 10
    timeouts:
      request: 5s
    # Native HTTPRoute retry support is still maturing in the spec; many implementations
    # express retries via policy attachment or provider-specific fields instead.
```
Migration from sidecar-based Istio
- Keep the control plane.
- Disable auto-injection, label namespaces for Ambient.
- Create waypoints for workloads that had L7 policies; translate VirtualService/DestinationRule to HTTPRoute where possible.
- Validate mTLS with PeerAuthentication and AuthorizationPolicy.
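A minimal sketch of the relabeling step, reusing the namespace from the earlier examples (label names assume current Istio releases):

```bash
# Drop the sidecar-injection label and opt the namespace into ambient
kubectl label namespace shop istio-injection- istio.io/dataplane-mode=ambient --overwrite

# Restart workloads so existing pods come back without injected sidecars
kubectl -n shop rollout restart deployment
```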
Operational notes
- Ambient’s resource cost is dominated by ztunnel per node and any deployed waypoints—not by pod count.
- Observability remains available via Envoy filters where waypoints exist, and via ztunnel/L4 telemetry elsewhere.
Cilium Service Mesh: eBPF fast path plus Envoy only where needed
Cilium began as an eBPF-powered CNI and kube-proxy replacement. It now delivers a service mesh without sidecars by combining:
- eBPF-based L3/L4 load balancing and policy attached to sockets and network interfaces.
- Identity-aware policy: each pod gets an identity (label-derived), enforced directly in the datapath.
- L7 visibility and policy via Envoy proxies that run per node (or scoped to where they are required) instead of per-pod sidecars.
- Gateway API support, including emerging GAMMA support for east-west (mesh) scenarios.
Why eBPF matters
- Kernel-level execution avoids iptables rule bloat and user-space round trips.
- Lower tail latency and better CPU efficiency for high QPS, small-message RPCs.
- Fine-grained observability via Hubble without traffic mirroring.
Example: Cilium installed as kube-proxy replacement with Gateway API
```bash
# Example (simplified) using the cilium CLI; flags map to Helm values and vary by release
cilium install \
  --version 1.15.5 \
  --set kubeProxyReplacement=true \
  --set gatewayAPI.enabled=true
cilium hubble enable --ui
```
Identity-aware L3/L4 policy with an L7 HTTP rule
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: ratings-policy
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: ratings
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/api/v1/ratings"
```
Note: Cilium can enforce L7 HTTP rules via Envoy without sidecars by running Envoy where required. For pure L4 performance, omit L7 rules and rely on eBPF only.
Gateway API HTTPRoute for internal service-to-service
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mesh-internal
  namespace: shop
spec:
  gatewayClassName: cilium
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ratings-route
  namespace: shop
spec:
  parentRefs:
  - name: mesh-internal
  hostnames: ["ratings.shop.svc.cluster.local"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: "/api/v1"
    backendRefs:
    - name: ratings
      port: 8080
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        add:
        - name: x-mesh-user
          value: "gateway-api"
```
Observability: Hubble
```bash
# Tail L4/L7 flows
hubble observe --protocol http --namespace shop --follow

# Check Hubble health and flow counters
hubble status

# Open the service-map web UI (port-forwarded via the cilium CLI)
cilium hubble ui
```
mTLS and identity
- Cilium can integrate with SPIFFE/SPIRE or cert-manager to issue identities.
- Data plane enforcement at L4 is extremely efficient; L7 mTLS termination/reencryption is possible through Envoy where needed.
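If you want the policy-level mTLS requirement backed by SPIRE, recent Cilium releases expose it through Helm values. A hedged sketch (value names follow the Cilium mutual-authentication guide, a still-maturing feature, so verify against your version):

```bash
# Enable Cilium mutual authentication with a SPIRE server managed by the chart
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true
```

Individual CiliumNetworkPolicy rules can then opt in with an `authentication: {mode: required}` stanza.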
Operational notes
- If you enable kube-proxy replacement, validate cluster DNS and node-local services thoroughly.
- Cilium upgrades are daemonset rollouts, not pod-by-pod reinjections.
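One quick post-rollout check: confirm that the eBPF datapath, not kube-proxy/iptables, is actually serving Services. A minimal sketch, assuming the default agent DaemonSet name (cilium in kube-system):

```bash
# Ask the agent whether kube-proxy replacement and Hubble are active
kubectl -n kube-system exec ds/cilium -- cilium status | grep -E "KubeProxyReplacement|Hubble"

# Inspect the eBPF-backed service load-balancing table on that node
kubectl -n kube-system exec ds/cilium -- cilium service list
```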
Linkerd in 2025: smallest sidecar, Gateway API, and eBPF integration paths
Linkerd historically prioritized simplicity and safety with a minimal Rust proxy sidecar. In 2025, you’ll see three realities in production:
- Many teams stick with Linkerd’s sidecars because they are small, predictable, and operationally simple, and because Linkerd’s core L7 features meet their needs.
- Linkerd supports the Gateway API (including GAMMA-aligned use cases) for traffic management and policy, reducing the need for proprietary CRDs.
- There’s ongoing community and vendor research into eBPF-enhanced datapaths for Linkerd to reduce iptables dependence and potentially enable sidecarless operation. Treat this as evolving—verify maturity, support, and conformance before broad adoption.
For teams converging on the Gateway API, Linkerd can participate in the same policy model you use for ingress, with mesh enforcement by the Linkerd control plane and proxies. If you later decide to shift to a sidecarless approach, your Gateway API artifacts carry forward.
Gateway API: your long-lived traffic policy surface
Gateway API decouples policy from implementation with portable CRDs:
- GatewayClass, Gateway: define data-plane instances and listeners.
- HTTPRoute, GRPCRoute, TCPRoute, TLSRoute, UDPRoute: define routing (TCPRoute, TLSRoute, and UDPRoute remain in the experimental channel).
- Policy attachment (e.g., BackendTLSPolicy and implementation-specific policies): attach timeouts, retries, or TLS settings to routes and backends.
- GAMMA (Gateway API for Mesh Management and Administration): a community effort to apply Gateway API to service-to-service policy in meshes.
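The Gateway API CRDs ship separately from Kubernetes. A minimal sketch of installing the standard channel and listing the classes your providers register (the release URL pins an example version; use whichever your mesh supports):

```bash
# Install the standard-channel CRDs (Gateway, HTTPRoute, GRPCRoute, ...)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml

# See which GatewayClasses your ingress/mesh implementations expose
kubectl get gatewayclasses.gateway.networking.k8s.io
```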
Key benefits
- Standardization and conformance: vendors implement the same CRDs with test suites.
- Fewer bespoke CRDs: VirtualService/DestinationRule equivalents expressed as HTTPRoute plus attached policies (e.g., BackendTLSPolicy).
- Easier migration: routes outlive the choice of mesh or ingress provider.
Example: GRPCRoute with request mirroring (retries and timeouts via policy attachment)
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: internal-grpc
  namespace: shop
spec:
  gatewayClassName: cilium  # or istio, or a managed class
  listeners:
  - name: grpc
    protocol: HTTP          # GRPCRoutes attach to HTTP/HTTPS listeners
    port: 8080
---
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: reviews-grpc-route
  namespace: shop
spec:
  parentRefs:
  - name: internal-grpc
  hostnames: ["reviews.shop.svc.cluster.local"]
  rules:
  - matches:
    - method:
        service: "reviews.v1.ReviewsService"
        method: "GetReview"
    backendRefs:
    - name: reviews
      port: 8080
    filters:
    - type: RequestMirror
      requestMirror:
        backendRef:
          name: reviews-canary
          port: 8080
    # Retries and per-try timeouts are typically expressed via policy attachments
    # or implementation-specific extensions (see note below).
```
Note: exact filter and timeout fields vary by implementation and Gateway API channel; some features ship via policy attachments such as BackendTLSPolicy or extension policies provided by the implementation.
Performance and cost modeling: what you actually save
Back-of-the-envelope math for a medium cluster.
Assumptions
- 1,000 pods, average sidecar RAM 80 MiB, CPU 50 millicores baseline.
- P95 sidecar-induced latency: ~0.8 ms per proxy traversal at steady state (each hop crosses both a client and a server sidecar).
- Traffic involves 2 service-to-service hops per request.
Sidecar-based mesh cost
- Memory: ~80 GiB across the cluster for proxies alone.
- CPU: 50 vCPU total baseline across proxies.
- Latency tax per request path: ~3.2 ms at p95 (two hops, client+server proxies each).
Ambient + eBPF model
- ztunnel: roughly 50–150 MiB per node. For 50 nodes, ~2.5–7.5 GiB cluster-wide.
- Optional waypoints: provisioned only for a subset (say 10%) of services that need L7 features.
- eBPF datapath: negligible rule explosion; low CPU overhead for L4 policies.
- Latency tax: often sub-millisecond; for pure L4, tens to hundreds of microseconds. L7 costs apply only where waypoints exist.
Net effect
- 10×+ reduction in steady RAM overhead is realistic.
- Double-digit percent CPU savings under load, and lower p95/p99 latency, particularly for small gRPC/HTTP messages.
- Mesh upgrades decouple from app rollouts, reducing toil and risk.
Your mileage will vary. Always run a representative benchmark in a staging environment before committing cluster-wide.
Baseline test harness example (HTTP)
```bash
# Deploy a simple demo with and without L7 policy
kubectl -n perf apply -f demo.yaml

# Generate load (Fortio or wrk)
fortio load -c 64 -qps 0 -t 300s http://frontend.perf.svc.cluster.local:8080/

# Capture p50/p90/p99, CPU, RSS
kubectl top pod -n perf
hubble observe --protocol http --namespace perf --since 5m | tee flows.log
```
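To turn those numbers into a before/after comparison, query your metrics backend directly. A hedged example against the Prometheus HTTP API; istio_request_duration_milliseconds_bucket is Istio’s standard HTTP histogram, and the Prometheus address is an assumption for your environment:

```bash
# p95 request latency for the frontend workload over the last 5 minutes
curl -sG http://prometheus.monitoring.svc:9090/api/v1/query \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="frontend"}[5m])) by (le))'
```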
Migration blueprint: a pragmatic, low-risk path
- Preconditions
- Kernel 5.10+ recommended for mature eBPF features (5.15+ ideal). For managed Kubernetes, ensure the node image supports required bpf features.
- Plan for CNI interactions: if adopting Cilium, use kube-proxy replacement or validate compatibility when leaving kube-proxy on.
- Inventory language runtimes and TLS requirements (JVM, Go, .NET) for any app-level changes.
- Stabilize on Gateway API
- Introduce Gateway API for ingress first, using your provider’s GatewayClass.
- Start expressing internal routing and traffic-splitting policies as HTTPRoute/GRPCRoute with GAMMA where supported.
- Benefits accrue immediately: portable policy model and easier next steps.
- Turn on L4 mesh/mTLS everywhere
- Option A (Ambient): enable Ambient, label namespaces, flip PeerAuthentication to STRICT, and verify ztunnel connectivity. Add waypoints later.
- Option B (Cilium): migrate CNI to Cilium, enable identity-aware policies, integrate SPIRE/cert-manager for mTLS. Keep L7 policies off initially.
- Option C (Linkerd today): if staying with Linkerd, ensure Gateway API integration for policy and adopt CNI mode to minimize init-container/iptables costs. Revisit eBPF-sidecarless when it’s production-ready for your environment.
- Introduce L7 selectively
- Add waypoints (Ambient) or enable L7 on specific routes (Cilium) only for workloads that need retries, timeouts, JWT/OIDC auth, or complex routing.
- Keep default traffic purely L4 for minimal overhead.
- Observability and SLOs
- Use Hubble (Cilium) or ztunnel/Envoy metrics (Ambient) with OpenTelemetry. Establish golden signals per service.
- Compare before/after p95, CPU, and RSS. Make L7 policy justify its cost.
- Decommission sidecars and iptables
- Remove sidecar injection (Istio/Linkerd) for namespaces migrated to Ambient/eBPF.
- Validate emergency rollback: keep previous manifests and admission policies to re-enable sidecars if needed.
- Harden and scale out
- Enforce mesh-wide mTLS, default-deny L3/L4 policies, and progressive rollout guardrails.
- Set SLOs and error budgets around the data plane itself. Practice node drain and component failover tests.
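Before deleting the old injection configuration, verify that no injected sidecars remain in a migrated namespace. A minimal check (istio-proxy is Istio’s default sidecar container name; waypoint pods will still show it, which is expected):

```bash
# List containers per pod and flag any remaining istio-proxy sidecars
kubectl get pods -n shop \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
  | grep istio-proxy || echo "no sidecars remaining in shop"
```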
Rollout checklist
- Canary one namespace with non-critical services.
- Run chaos: cordon/drain nodes, restart ztunnel/daemonsets, simulate packet loss.
- Validate zero downtime during control-plane upgrades.
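A hedged sketch of those chaos drills; node names are placeholders, and the DaemonSet names assume default Istio Ambient and Cilium installs:

```bash
# Drain a node and confirm traffic keeps flowing
kubectl cordon node-1
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# Restart the L4 datapath components and watch for connection errors
kubectl -n istio-system rollout restart daemonset/ztunnel   # Istio Ambient
kubectl -n kube-system rollout restart daemonset/cilium     # Cilium agent

# Bring the node back
kubectl uncordon node-1
```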
Security posture: identity, keys, and blast radius
- SPIFFE/SPIRE for workload identity: decouple mTLS identity from pod IPs, with certificates bound to a trust domain (e.g., spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>).
- Key management: use SDS-like mechanisms; ensure keys never leave memory unencrypted; rotate short-lived certs (e.g., 12–24h).
- Blast radius: L7 waypoints scope policy and failures to a namespace or service instead of every pod.
- Node-level agents: harden host security (AppArmor/SELinux), restrict BPF syscalls to trusted daemons, and monitor for kernel regressions.
When you shouldn’t migrate (yet)
- You depend on per-pod L7 features that don’t map cleanly to waypoints or node-level proxies (e.g., unique TLS origination per pod, highly dynamic per-pod authz logic).
- Older kernels or Windows worker nodes where eBPF features or Ambient support are limited.
- Heavy multi-cluster or legacy network environments where your current mesh integrates with bespoke gateways or appliances that rely on sidecar semantics.
- Organizational constraints: security teams disallow kernel-level datapath changes, or you can’t perform node image updates reliably.
- You already run Linkerd sidecars efficiently, and cost/latency pressures are low. Optimization may not justify migration risk.
Anti-patterns to avoid
- Turning on L7 globally “just in case.” Apply L7 only where needed.
- Mixing overlapping policy surfaces (Gateway API + legacy CRDs + NetworkPolicies) without ownership and precedence defined.
- Skipping conformance: don’t assume Gateway API features behave identically; verify implementation-specific extensions.
- Big-bang cutovers. Always canary new datapaths and keep a rollback playbook.
Field notes: translating policies
From Istio VirtualService/DestinationRule to Gateway API
- VirtualService http.route -> HTTPRoute rules.
- DestinationRule subsets -> per-version Services selected by labels, referenced by backendRefs with weights.
- TrafficSplit (SMI) users can convert to HTTPRoute with multiple backendRefs and weights.
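For comparison, a sketch of the sidecar-era VirtualService that the earlier payments HTTPRoute replaces; the subsets assume a DestinationRule mapping v1/v2 to version labels, whereas the Gateway API version points backendRefs at per-version Services:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments
  namespace: shop
spec:
  hosts: ["payments.shop.svc.cluster.local"]
  http:
  - route:
    - destination:
        host: payments
        subset: v1   # defined in a DestinationRule (not shown)
      weight: 90
    - destination:
        host: payments
        subset: v2
      weight: 10
```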
From NetworkPolicy to CiliumNetworkPolicy
- Standard Kubernetes NetworkPolicy remains valid for L3/L4. Use CNP for identity-aware and L7 policies.
From sidecar annotations to Ambient
- Remove sidecar injection labels and annotations.
- For L7 policies, declare waypoints and attach HTTPRoutes.
Troubleshooting and observability tips
- Path tracing: with Hubble, visualize path and policy decisions. With Ambient, inspect ztunnel logs for identity issues and waypoint Envoy for L7 decisions.
- mTLS failures: check SPIRE status, certificate validity, and trust domain alignment.
- Latency spikes: ensure you didn’t unintentionally apply L7 globally; watch CPU throttling on waypoint pods.
- Kernel issues: pin to a known-good kernel; keep Cilium/Istio release notes handy for compatibility matrices.
Commands
```bash
# Check endpoint identities (run inside the Cilium agent)
kubectl -n kube-system exec ds/cilium -- cilium endpoint list

# Confirm Gateway API status
kubectl get gateways.gateway.networking.k8s.io -A
kubectl get httproutes.gateway.networking.k8s.io -A

# Ambient: inspect ztunnel
kubectl -n istio-system logs ds/ztunnel --tail=200

# Envoy config dump from a waypoint (via the Envoy admin endpoint)
kubectl -n shop exec deploy/payments-waypoint -- pilot-agent request GET config_dump
```
What about multi-cluster and egress?
- Multi-cluster: Ambient and Cilium both support multi-cluster topologies. Favor identity federation (SPIFFE Federation) and Gateway API for cross-cluster routing.
- Egress: use Gateway API’s TLSRoute/TCPRoute to define egress policies; combine with explicit egress gateways rather than transparent egress where auditability is required.
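As a sketch of the explicit-egress pattern: a dedicated egress Gateway passing TLS through to one external host. TLSRoute sits in the Gateway API experimental channel, and the class and backend names below are assumptions, so treat this as a shape rather than copy-paste config:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: egress
  namespace: infra
spec:
  gatewayClassName: istio            # assumption: any class that supports TLS passthrough
  listeners:
  - name: tls-passthrough
    protocol: TLS
    port: 443
    hostname: "api.example.com"
    tls:
      mode: Passthrough
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: external-api
  namespace: infra
spec:
  parentRefs:
  - name: egress
  hostnames: ["api.example.com"]
  rules:
  - backendRefs:
    - name: external-api             # assumption: a Service/ServiceEntry representing the external host
      port: 443
```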
The call: pick a lane, standardize policy, iterate
Pragmatic guidance:
- Standardize on the Gateway API now; it outlives your data plane choice.
- Adopt L4 mTLS everywhere with either Ambient or Cilium—low cost, high value.
- Layer in L7 per-service via waypoints/node-level Envoy only where feature ROI is clear.
- If you’re on Linkerd and happy, use Gateway API and CNI integrations today; revisit sidecarless as it matures.
Sidecars aren’t literally dead in 2025—but they are no longer the default. Treat them as a targeted tool for targeted workloads, not a cluster-wide tax.
Further reading and references
- Istio Ambient Mesh architecture and HBONE
- Cilium Service Mesh, kube-proxy replacement benchmarks, and Hubble
- Kubernetes Gateway API and GAMMA conformance
- SPIFFE/SPIRE for workload identity
Consult your vendor or upstream release notes for exact GA/alpha statuses, as implementations evolve quickly.