batch_add_cluster_nodes
- SageMaker.Client.batch_add_cluster_nodes(**kwargs)
Adds nodes to a HyperPod cluster by incrementing the target count for one or more instance groups. This operation returns a unique NodeLogicalId for each node being added, which can be used to track the provisioning status of the node. This API provides a safer alternative to UpdateCluster for scaling operations by avoiding unintended configuration changes.
Note
This API is only supported for clusters using Continuous as the NodeProvisioningMode.
See also: AWS API Documentation
Request Syntax
response = client.batch_add_cluster_nodes(
    ClusterName='string',
    ClientToken='string',
    NodesToAdd=[
        {
            'InstanceGroupName': 'string',
            'IncrementTargetCountBy': 123
        },
    ]
)
- Parameters:
ClusterName (string) –
[REQUIRED]
The name of the HyperPod cluster to which you want to add nodes.
ClientToken (string) –
A unique, case-sensitive identifier that you provide to ensure the idempotency of the request. This token is valid for 8 hours. If you retry the request with the same client token and the same parameters within this timeframe, the API returns the same set of NodeLogicalIds with their latest status.
This field is autopopulated if not provided.
NodesToAdd (list) –
[REQUIRED]
A list of instance groups and the number of nodes to add to each. You can specify up to 5 instance groups in a single request, with a maximum of 50 nodes total across all instance groups.
(dict) –
Specifies an instance group and the number of nodes to add to it.
InstanceGroupName (string) – [REQUIRED]
The name of the instance group to which you want to add nodes.
IncrementTargetCountBy (integer) – [REQUIRED]
The number of nodes to add to the specified instance group. The total number of nodes across all instance groups in a single request cannot exceed 50.
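The parameters above can be assembled and checked locally before the request is sent. The following is a minimal sketch, not part of the official API: `build_nodes_to_add` and `add_nodes` are hypothetical helper names, `client` is assumed to be an object such as `boto3.client('sagemaker')`, and the positive-count check is a local sanity check rather than a documented constraint. The limits of 5 instance groups and 50 nodes come from the NodesToAdd description.

```python
import uuid

MAX_GROUPS_PER_REQUEST = 5   # limit stated in the NodesToAdd description
MAX_NODES_PER_REQUEST = 50   # limit stated in the NodesToAdd description


def build_nodes_to_add(increments):
    """Turn {instance_group_name: count} into a NodesToAdd list.

    Hypothetical helper: it enforces the documented per-request limits
    client-side so that an oversized request fails before any API call.
    """
    if not increments:
        raise ValueError("at least one instance group is required")
    if len(increments) > MAX_GROUPS_PER_REQUEST:
        raise ValueError(f"at most {MAX_GROUPS_PER_REQUEST} instance groups per request")
    # Assumption of this sketch: a zero or negative increment is never intended.
    if any(count < 1 for count in increments.values()):
        raise ValueError("each increment must be a positive integer")
    if sum(increments.values()) > MAX_NODES_PER_REQUEST:
        raise ValueError(f"at most {MAX_NODES_PER_REQUEST} nodes in total per request")
    return [
        {"InstanceGroupName": name, "IncrementTargetCountBy": count}
        for name, count in increments.items()
    ]


def add_nodes(client, cluster_name, increments, client_token=None):
    """Call batch_add_cluster_nodes and return (client_token, response).

    The token is returned so that a retry can reuse it; reusing the same
    token with the same parameters is what makes the retry idempotent.
    """
    token = client_token or str(uuid.uuid4())
    response = client.batch_add_cluster_nodes(
        ClusterName=cluster_name,
        ClientToken=token,
        NodesToAdd=build_nodes_to_add(increments),
    )
    return token, response
```

A caller might run `token, response = add_nodes(client, 'my-cluster', {'workers': 3})` and, if the call times out, pass the same `token` back in through `client_token` on the retry. The cluster and group names here are placeholders.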
- Return type:
dict
- Returns:
Response Syntax
{
    'Successful': [
        {
            'NodeLogicalId': 'string',
            'InstanceGroupName': 'string',
            'Status': 'Running'|'Failure'|'Pending'|'ShuttingDown'|'SystemUpdating'|'DeepHealthCheckInProgress'|'NotFound'
        },
    ],
    'Failed': [
        {
            'InstanceGroupName': 'string',
            'ErrorCode': 'InstanceGroupNotFound'|'InvalidInstanceGroupStatus',
            'FailedCount': 123,
            'Message': 'string'
        },
    ]
}
Response Structure
(dict) –
Successful (list) –
A list of nodes that were successfully added to the cluster. Each entry includes a NodeLogicalId that can be used to track the node’s provisioning status (with DescribeClusterNode), the instance group name, and the current status of the node. The NodeLogicalId is unique per cluster and does not change between instance replacements.
(dict) –
Information about a node that was successfully added to the cluster.
NodeLogicalId (string) –
A unique identifier assigned to the node that can be used to track its provisioning status through the DescribeClusterNode operation.
InstanceGroupName (string) –
The name of the instance group to which the node was added.
Status (string) –
The current status of the node. Possible values include Pending, Running, Failure, ShuttingDown, SystemUpdating, DeepHealthCheckInProgress, and NotFound.
Failed (list) –
A list of errors that occurred during the node addition operation. Each entry includes the instance group name, error code, number of failed additions, and an error message.
(dict) –
Information about an error that occurred during the node addition operation.
InstanceGroupName (string) –
The name of the instance group for which the error occurred.
ErrorCode (string) –
The error code associated with the failure. Possible values include InstanceGroupNotFound and InvalidInstanceGroupStatus.
FailedCount (integer) –
The number of nodes that failed to be added to the specified instance group.
Message (string) –
A descriptive message providing additional details about the error.
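Because a single response can carry both Successful and Failed entries, callers usually need to split it before acting on it. The sketch below shows one way to do that using only the fields listed in the response structure. `summarize_batch_add` is a hypothetical helper, and the set of statuses treated as "still in progress" is an assumption of this sketch, not something the API defines.

```python
from collections import defaultdict

# Assumption of this sketch: these statuses mean the node is not yet settled
# and is worth polling again (for example with DescribeClusterNode).
IN_PROGRESS_STATUSES = {"Pending", "DeepHealthCheckInProgress", "SystemUpdating"}


def summarize_batch_add(response):
    """Split a batch_add_cluster_nodes response into three parts.

    Returns (added, to_track, failures) where:
      added    maps instance group name to the NodeLogicalIds accepted for it,
      to_track lists NodeLogicalIds whose status is still in progress,
      failures lists (group, error_code, failed_count, message) tuples.
    """
    added = defaultdict(list)
    to_track = []
    for node in response.get("Successful", []):
        added[node["InstanceGroupName"]].append(node["NodeLogicalId"])
        if node["Status"] in IN_PROGRESS_STATUSES:
            to_track.append(node["NodeLogicalId"])

    failures = [
        (
            error["InstanceGroupName"],
            error["ErrorCode"],
            error["FailedCount"],
            error.get("Message", ""),
        )
        for error in response.get("Failed", [])
    ]
    return dict(added), to_track, failures
```

An entry in `failures` does not imply that the whole request failed: nodes for other instance groups in the same request may still appear under Successful, so both lists should be checked.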
Exceptions