feat: Add enhanced node repair configuration support by sapphirew · Pull Request #8512 · eksctl-io/eksctl · GitHub
Conversation

sapphirew
Contributor
@sapphirew sapphirew commented Sep 23, 2025

Description

This PR extends eksctl's existing node repair configuration to support the full range of AWS EKS node repair parameters. The current implementation supports only a boolean enabled flag, whereas AWS EKS also accepts parameters for fine-grained control over node repair behavior: unhealthy-node thresholds, parallel repair limits, and custom repair overrides. These parameters are defined in the EKS API reference: https://docs.aws.amazon.com/eks/latest/APIReference/API_NodeRepairConfig.html

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested (WIP)
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

This commit implements enhanced node repair configuration for EKS
managed nodegroups with the following features:

- Support for percentage and count-based unhealthy node thresholds
- Configurable parallel repair limits (percentage and count)
- Advanced node repair config overrides for specific conditions
- Full CLI flag support for all new parameters
- Complete YAML configuration file support
- Backward compatibility with existing configurations

Key changes:
- Extended API types with new NodeRepairConfigOverride struct
- Added CLI flags for all new parameters
- Updated CloudFormation builder for AWS EKS integration
- Added unit and integration tests
- Updated documentation and examples
- Enhanced JSON schema validation

CLI Examples:
  eksctl create cluster --enable-node-repair --node-repair-max-unhealthy-percentage=25
  eksctl create nodegroup --enable-node-repair --node-repair-max-parallel-count=2

Config Examples:
  nodeRepairConfig:
    enabled: true
    maxUnhealthyNodeThresholdPercentage: 20
    maxParallelNodesRepairedCount: 2
    nodeRepairConfigOverrides:
    - nodeMonitoringCondition: NetworkNotReady
      nodeUnhealthyReason: InterfaceNotUp
      repairAction: Restart
      minRepairWaitTimeMins: 15
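
The shape of the config example above, and the checks such a config might need, can be sketched in Go. This is a minimal illustration and not the code from this PR: the struct fields simply mirror the YAML keys shown above, and the `Validate` method, the `checkPair` and `intPtr` helpers, and every rule they enforce are assumptions. In particular, the rule that a percentage and a count cannot both be set for the same limit, and the 1 to 100 percentage range, are based on my reading of the EKS API reference and should be checked against it.

```go
package main

import "fmt"

// NodeRepairConfigOverride mirrors one entry of nodeRepairConfigOverrides.
// Hypothetical sketch: field names follow the YAML keys in the example above.
type NodeRepairConfigOverride struct {
	NodeMonitoringCondition string
	NodeUnhealthyReason     string
	RepairAction            string
	MinRepairWaitTimeMins   int
}

// NodeRepairConfig mirrors the nodeRepairConfig block. Pointer fields
// distinguish "not set" from an explicit value.
type NodeRepairConfig struct {
	Enabled                             bool
	MaxUnhealthyNodeThresholdPercentage *int
	MaxUnhealthyNodeThresholdCount      *int
	MaxParallelNodesRepairedPercentage  *int
	MaxParallelNodesRepairedCount       *int
	NodeRepairConfigOverrides           []NodeRepairConfigOverride
}

// checkPair enforces the assumed rules for one percentage/count pair:
// at most one of the two is set, and each value is within range.
func checkPair(name string, percentage, count *int) error {
	if percentage != nil && count != nil {
		return fmt.Errorf("%s: set either the percentage or the count, not both", name)
	}
	if percentage != nil && (*percentage < 1 || *percentage > 100) {
		return fmt.Errorf("%s: percentage %d is outside 1-100", name, *percentage)
	}
	if count != nil && *count < 1 {
		return fmt.Errorf("%s: count %d must be at least 1", name, *count)
	}
	return nil
}

// Validate applies the assumed rules to the whole config.
func (c *NodeRepairConfig) Validate() error {
	if err := checkPair("maxUnhealthyNodeThreshold",
		c.MaxUnhealthyNodeThresholdPercentage, c.MaxUnhealthyNodeThresholdCount); err != nil {
		return err
	}
	if err := checkPair("maxParallelNodesRepaired",
		c.MaxParallelNodesRepairedPercentage, c.MaxParallelNodesRepairedCount); err != nil {
		return err
	}
	for i, o := range c.NodeRepairConfigOverrides {
		// Assumption: an override is only meaningful with a condition and a reason.
		if o.NodeMonitoringCondition == "" || o.NodeUnhealthyReason == "" {
			return fmt.Errorf("nodeRepairConfigOverrides[%d]: condition and reason are required", i)
		}
	}
	return nil
}

// intPtr returns a pointer to v, for filling optional fields.
func intPtr(v int) *int { return &v }

func main() {
	// Same values as the YAML config example above.
	cfg := NodeRepairConfig{
		Enabled:                             true,
		MaxUnhealthyNodeThresholdPercentage: intPtr(20),
		MaxParallelNodesRepairedCount:       intPtr(2),
		NodeRepairConfigOverrides: []NodeRepairConfigOverride{{
			NodeMonitoringCondition: "NetworkNotReady",
			NodeUnhealthyReason:     "InterfaceNotUp",
			RepairAction:            "Restart",
			MinRepairWaitTimeMins:   15,
		}},
	}
	fmt.Println("example config error:", cfg.Validate())

	// Setting both forms of the same threshold is rejected in this sketch.
	cfg.MaxUnhealthyNodeThresholdCount = intPtr(3)
	fmt.Println("conflicting config error:", cfg.Validate())
}
```

Pointer fields are used here so that an omitted limit can be told apart from an explicit value, which matters when only one of each percentage/count pair may be present.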
@sapphirew sapphirew added the kind/feature New feature or request label Sep 23, 2025
Member
@cheeseandcereal cheeseandcereal left a comment

LGTM!

Labels
kind/feature New feature or request
2 participants