• Why Kubernetes EndpointSlice Delays Break Service Discovery
    Jun 15 2026
    In episode 52 of DevOps Daily with Fexingo, Lucas and Luna dig into a hidden performance gotcha in Kubernetes service discovery: the EndpointSlice controller's default sync interval. Lucas walks through a real incident at a mid-size e-commerce company where a deployment rollout caused five minutes of traffic blackholing because EndpointSlices lagged behind pod readiness. They compare EndpointSlice to the older Endpoints API, explain the kube-controller-manager flags that control sync timing, and discuss the trade-offs between faster propagation and API server load. Luna challenges Lucas on whether most teams should even tweak these defaults, and they land on a practical recommendation: monitor your service propagation latency before you tune anything. If you manage Kubernetes at scale and have ever seen 'no endpoints available' for longer than you expected, this episode explains why. #Kubernetes #EndpointSlice #ServiceDiscovery #kubeControllerManager #PodLifecycle #TrafficBlackholing #CloudNative #DevOps #SiteReliabilityEngineering #KubernetesNetworking #EndpointsAPI #ContainerOrchestration #KubernetesPerformance #RollingUpdate #FexingoTechnology #TechnologyPodcast #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    10 min
  • Why Kubernetes Service Mesh Sidecars Drain Your Memory Budget
    Jun 14 2026
    In this episode of DevOps Daily, Lucas and Luna explore a hidden cost of Kubernetes service meshes: the memory overhead of sidecar proxies. They break down how Istio's Envoy sidecars commonly consume 50-100 MB per pod, and how a 50-node cluster with 200 pods can waste 10 GB or more of RAM just on proxy overhead. They compare sidecar-less alternatives like Cilium's eBPF-based mesh and Istio's ambient mode, and share real benchmarks from large-scale deployments that saved 30-40% on compute costs by switching. The hosts also discuss practical strategies, such as tuning proxy resource limits and using selective sidecar injection, to avoid blowing your cluster's memory budget. A must-listen for any Kubernetes operator optimizing cloud costs. #Kubernetes #ServiceMesh #Istio #Envoy #Sidecar #MemoryOverhead #Cilium #eBPF #AmbientMesh #CloudCosts #ResourceOptimization #DevOps #Technology #FexingoBusiness #BusinessPodcast #Podcast #CloudNative #SRE
    7 min
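The memory figure in the description above can be checked with simple arithmetic. The following is a minimal sketch, assuming one sidecar per pod and decimal units (1 GB = 1000 MB); the function name is hypothetical and is not taken from the episode.

```python
def sidecar_overhead_gb(pod_count: int, mb_per_sidecar: float) -> float:
    """Total sidecar memory in GB, assuming one sidecar per pod (1 GB = 1000 MB)."""
    return pod_count * mb_per_sidecar / 1000


# Figures quoted in the episode description: 200 pods, 50-100 MB per sidecar.
low = sidecar_overhead_gb(200, 50)
high = sidecar_overhead_gb(200, 100)
print(f"Estimated proxy overhead: {low:.0f}-{high:.0f} GB")
```

Under these assumptions the overhead lands between 10 and 20 GB, which matches the description's lower bound.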
  • Why Kubernetes Priority Classes Create Scheduling Chaos
    Jun 14 2026
    Episode 50 of DevOps Daily with Fexingo takes a hard look at Kubernetes PriorityClasses, the feature meant to ensure critical pods run first that often backfires into scheduling chaos. Lucas and Luna unpack a real production outage at a mid-sized fintech company where misconfigured priority classes caused a cascade of evictions and resource starvation. They walk through the mechanics of preemption, the hidden cost of default priorities, and why many teams set up their priority tiers wrong. The conversation also touches on the tension between reliability engineers and cost optimizers, and offers a practical heuristic for assigning priority classes that won't surprise you during a traffic spike. If you've ever wondered why your cluster autoscaler is scaling up like crazy while pods are still pending, this episode explains the silent culprit. #Kubernetes #DevOps #PriorityClasses #Scheduling #CloudNative #Preemption #ClusterAutoscaler #PodLifecycle #Fintech #ProductionOutage #ResourceManagement #SRE #PlatformEngineering #K8sBestPractices #TechPodcast #DevOpsDaily #FexingoBusiness #BusinessPodcast
    9 min
  • How Kubernetes Node Draining Can Cause Cascading Failures
    Jun 13 2026
    In this episode of DevOps Daily with Fexingo, Lucas and Luna dive into a specific failure mode that often surprises even experienced Kubernetes operators: cascading failures caused by mishandled node draining. They walk through a real-world scenario where a routine node drain for a kernel patch triggered a chain reaction: PodDisruptionBudgets were respected on paper, but the remaining nodes couldn't handle the load, leading to resource starvation and a partial cluster outage. They break down why the Kubernetes scheduler doesn't account for combined resource pressure during draining, how default eviction timeouts can mask the problem, and what operational practices, such as pre-drain load testing and gradual cordoning, can prevent the cascade. No fluff, just one concrete angle with actionable takeaways. #Kubernetes #NodeDraining #CascadingFailures #PodDisruptionBudget #ClusterResilience #SRE #DevOps #CloudNative #Operations #ProductionIncident #KubernetesScheduler #EvictionTimeout #LoadTesting #HighAvailability #Infrastructure #Technology #FexingoBusiness #BusinessPodcast
    9 min
  • Why Kubernetes CronJob Timezones Break Production Schedules
    Jun 13 2026
    In this episode, Lucas and Luna dive into a deceptively tricky Kubernetes feature: CronJob timezone handling. They break down how CronJobs default to UTC regardless of the cluster's regional settings, leading to off-schedule jobs, missed backups, and batch processing failures. Using a real-world example from a fintech company that lost two hours of payment reconciliation data, they explain why the .spec.timeZone field, stable since Kubernetes 1.27, is still underused, and why operators relying on TZ environment variables are playing with fire. They also cover how kube-controller-manager's timezone drift can compound the problem, and share a simple audit command to check all CronJobs for implicit UTC assumptions. If you manage scheduled jobs in Kubernetes, this episode will save you from a midnight pager. #Kubernetes #CronJob #Timezone #DevOps #CloudNative #Scheduling #ProductionBug #Fintech #BatchProcessing #KubeControllerManager #UTC #SiteReliability #PlatformEngineering #TechPodcast #FexingoBusiness #BusinessPodcast #DevOpsDaily #SoftwareOperations
    8 min
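The kind of audit mentioned in the description above might look like the following. This is a minimal sketch, not the command from the episode: it assumes the JSON shape produced by `kubectl get cronjobs --all-namespaces -o json`, and the function name and sample data are hypothetical. It lists CronJobs that set no explicit `.spec.timeZone` and therefore depend on a default.

```python
import json


def cronjobs_without_timezone(cronjob_list: dict) -> list[str]:
    """Return 'namespace/name' for every CronJob with no explicit .spec.timeZone."""
    missing = []
    for item in cronjob_list.get("items", []):
        spec = item.get("spec", {})
        if not spec.get("timeZone"):
            meta = item.get("metadata", {})
            missing.append(f'{meta.get("namespace", "default")}/{meta.get("name", "?")}')
    return missing


# Hypothetical sample standing in for real kubectl output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "batch", "name": "reconcile"},
     "spec": {"schedule": "0 2 * * *"}},
    {"metadata": {"namespace": "batch", "name": "backup"},
     "spec": {"schedule": "0 3 * * *", "timeZone": "Europe/Berlin"}}
  ]
}
""")

for name in cronjobs_without_timezone(sample):
    print(f"no explicit time zone: {name}")
```

Against a live cluster, the same function could be fed the parsed output of the `kubectl` command above instead of the sample.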
  • Why Kubernetes Pod Security Standards Still Leak
    Jun 12 2026
    Lucas and Luna revisit pod security standards in Kubernetes, digging into a specific case where a restricted PSP still allowed privilege escalation through a misconfigured seccomp profile. They walk through a real-world example from a fintech startup that ran a compliance audit and discovered their 'secure' pods were running with default-deny seccomp disabled. The conversation covers why PSPs and PSS profiles are not a silver bullet, how admission controllers can be bypassed, and what operators should actually check in their cluster logs. No fluff, just the concrete gap between policy intent and runtime reality. #Kubernetes #PodSecurity #Seccomp #ContainerSecurity #DevOps #Technology #CloudNative #SecurityAudit #PSP #PSS #AdmissionController #RuntimeSecurity #Fintech #Compliance #FexingoBusiness #BusinessPodcast #Podcast #DevOpsDaily
    8 min
  • Why Kubernetes Pod Priority Preemption Wastes Cluster Resources
    Jun 12 2026
    Lucas and Luna break down a subtle but costly Kubernetes anti-pattern: using pod priority and preemption without proper resource accounting. They walk through how a mid-stage startup accidentally triggered a cascading preemption storm by misconfiguring priority classes on their CI/CD runners and ML training pods, leading to 40% wasted compute and unpredictable evictions. Lucas explains the math behind the scheduler's preemption logic, why lower-priority pods can still cause cluster thrashing, and three concrete fixes: bin packing via node selectors, using descheduler plugins, and setting priority thresholds based on actual workload criticality. Luna pushes back on the 'just set higher priority' mindset and asks whether priority classes even make sense without guaranteed QoS. The episode includes a natural donation segment around the 25% mark where the hosts reflect on the value of ad-free content. Perfect for anyone managing Kubernetes clusters who has ever wondered why their nodes look full but their applications feel empty. #Kubernetes #PodPriority #Preemption #ClusterScheduler #ResourceManagement #CloudNative #DevOps #K8sAntiPatterns #NodePressure #BinPacking #Descheduler #QualityOfService #CICD #MLTraining #ClusterThrashing #FexingoBusiness #BusinessPodcast #Technology
    8 min
  • Why Kubernetes Topology Spread Constraints Create Unbalanced Nodes
    Jun 11 2026
    Episode 45 of DevOps Daily with Fexingo digs into Kubernetes topology spread constraints, the feature meant to spread pods evenly across failure domains that can instead create severe node imbalances. Lucas and Luna explore a real incident where a three-node cluster ended up with 40 pods on one node and 4 each on the others, causing resource exhaustion and cascading failures. They explain how 'maxSkew' and 'whenUnsatisfiable' interact in surprising ways, why topology spread constraints don't play well with autoscalers, and how to avoid the 'spread paradox' where enforcing balance creates imbalance. Tune in for practical configuration advice and a breakdown of when to use topology spread instead of pod affinity. #Kubernetes #TopologySpreadConstraints #MaxSkew #PodScheduling #DevOps #K8s #ClusterBalancing #NodeImbalance #PodAffinity #CloudNative #FexingoBusiness #BusinessPodcast #Technology #CI/CD #SoftwareOperations #Scheduling #Autoscaler #KubernetesFailure
    11 min
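To make the imbalance in the description above concrete, here is a minimal sketch of how skew is measured for a topology spread constraint: the number of matching pods in a domain minus the minimum across all domains. The function names and node labels are hypothetical; the pod counts are the ones quoted in the description.

```python
def skew_per_domain(pods_per_domain: dict[str, int]) -> dict[str, int]:
    """Skew of each domain: its matching pod count minus the global minimum."""
    minimum = min(pods_per_domain.values())
    return {domain: count - minimum for domain, count in pods_per_domain.items()}


def exceeds_max_skew(pods_per_domain: dict[str, int], max_skew: int) -> bool:
    """True if any domain's skew is larger than the configured maxSkew."""
    return max(skew_per_domain(pods_per_domain).values()) > max_skew


# Distribution from the incident in the description: 40 pods on one node, 4 on each of the others.
incident = {"node-a": 40, "node-b": 4, "node-c": 4}
print(skew_per_domain(incident))
print(exceeds_max_skew(incident, max_skew=1))
```

A distribution like this can still arise in practice because the constraint is evaluated only when a pod is scheduled, and with `whenUnsatisfiable: ScheduleAnyway` it acts as a preference rather than a hard rule.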