Customize DeepSeek-R1 671b model using Amazon SageMaker HyperPod recipes – Part 2

This post is the second part of the DeepSeek series focusing on model customization with Amazon SageMaker HyperPod recipes (or recipes for brevity). In Part 1, we demonstrated the performance and ease of fine-tuning DeepSeek-R1 distilled models using these recipes. In this post, we use the recipes to fine-tune the original DeepSeek-R1 671b parameter model. We demonstrate this through the step-by-step implementation of these recipes using both SageMaker training jobs and SageMaker HyperPod.

Business use case

After its public release, the DeepSeek-R1 model, developed by DeepSeek AI, showed impressive results across multiple evaluation benchmarks. The model follows the Mixture of Experts (MoE) architecture and has 671 billion parameters. Traditionally, large models adapt well to a wide spectrum of generalized tasks by virtue of being trained on huge amounts of data. The DeepSeek-R1 model was trained on 14.8 trillion tokens. The original R1 model demonstrates strong few-shot and zero-shot learning capabilities, allowing it to generalize to new tasks and scenarios that weren’t part of its original training.

However, many customers prefer to fine-tune or run continuous pre-training of these models to adapt them to specific business applications or to optimize them for specific tasks. A financial organization might want to customize the model with its own data to assist with data processing tasks, or a hospital network might fine-tune it with patient records so it can act as a medical assistant for its doctors. Fine-tuning can also extend the model’s generalization ability: customers can fine-tune it with a corpus of text in languages that aren’t fully represented in the original training data. For example, a model fine-tuned on an additional trillion tokens of Hindi text would extend the same generalization capabilities to Hindi.

The decision on which model to fine-tune depends on the end application as well as the available dataset. Based on the volume of proprietary data, customers can decide to fine-tune the larger DeepSeek-R1 model rather than one of the distilled versions. In addition, the R1 models have their own set of guardrails; customers might want to fine-tune the model to update or expand on those guardrails.

Fine-tuning larger models like DeepSeek-R1 requires careful optimization to balance cost, deployment requirements, and performance effectiveness. To achieve optimal results, organizations must meticulously select an appropriate environment, determine the best hyperparameters, and implement efficient model sharding strategies.

Solution architecture

SageMaker HyperPod recipes effectively address these requirements by providing a carefully curated mix of distributed training techniques, optimizations, and configurations for state-of-the-art (SOTA) open source models. These recipes have undergone extensive benchmarking, testing, and validation to provide seamless integration with the SageMaker training and fine-tuning processes.

In this post, we explore solutions that demonstrate how to fine-tune the DeepSeek-R1 model using these recipes on either SageMaker HyperPod or SageMaker training jobs. Your choice between these services will depend on your specific requirements and preferences. If you require granular control over training infrastructure and extensive customization options, SageMaker HyperPod is the ideal choice. SageMaker training jobs, on the other hand, are tailored for organizations that want a fully managed experience for their training workflows. To learn more about these service features, refer to Generative AI foundation model training on Amazon SageMaker.

The following diagram illustrates the solution architecture for training using SageMaker HyperPod. With HyperPod, users can begin the process by connecting to the login/head node of the Slurm cluster. Each step is run as a Slurm job and uses Amazon FSx for Lustre for storing model checkpoints. For DeepSeek-R1, the process consists of the following steps:

  1. Download the DeepSeek-R1 model and convert weights from FP8 to BF16 format
  2. Load the model into memory and perform fine-tuning using Quantized Low-Rank Adaptation (QLoRA)
  3. Merge QLoRA adapters with the base model
  4. Convert and load the model for batch evaluation

The following diagram illustrates the solution architecture for SageMaker training jobs. You can execute each step in the training pipeline by initiating the process through the SageMaker control plane using APIs, AWS Command Line Interface (AWS CLI), or the SageMaker ModelTrainer SDK. In response, SageMaker launches training jobs with the requested number and type of compute instances to run specific tasks. For DeepSeek-R1, the process consists of three main steps:

  1. Download and convert R1 to BF16 datatype format
  2. Load the model into memory and perform fine-tuning
  3. Consolidate and load the checkpoints into memory, then run inference and metrics to evaluate performance improvements

Prerequisites

Complete the following prerequisites before running the DeepSeek-R1 671B model fine-tuning notebook:

  1. Make the following quota increase requests for SageMaker. You need to request a minimum of two and a maximum of four ml.p5.48xlarge instances (each with 8 NVIDIA H100 GPUs), depending on the time-to-train and cost-to-train trade-offs for your use case. On the Service Quotas console, request the following SageMaker quotas. It can take up to 24 hours for the quota increase to be approved:
    • P5 instances (ml.p5.48xlarge) for training job usage: 2–4
    • P5 instances (ml.p5.48xlarge) for HyperPod cluster usage: 2–4
  2. If you choose to use HyperPod clusters to run your training, set up a HyperPod Slurm cluster, referring to Amazon SageMaker HyperPod Developer Guide. Alternatively, you can also use the AWS CloudFormation template provided in the Own Account workshop and follow the instructions to set up a cluster and a development environment to access and submit jobs to the cluster.
  3. (Optional) If you choose to use SageMaker training jobs, you can create an Amazon SageMaker Studio domain (refer to Use quick setup for Amazon SageMaker AI) to access Jupyter notebooks with the IAM role described in the following step (you can also use JupyterLab in your local setup).
    1. Create an AWS Identity and Access Management (IAM) role with managed policies AmazonSageMakerFullAccess, AmazonFSxFullAccess, and AmazonS3FullAccess to give the necessary access to SageMaker to run the examples.
  4. Clone the GitHub repository with the assets for this deployment. This repository consists of a notebook that references training assets:
git clone https://github.com/aws-samples/sagemaker-distributed-training-workshop.git
cd 18_sagemaker_training_recipes/ft_deepseek_r1_qlora

Solution walkthrough

To implement the solution, follow the steps in the following sections.

Technical considerations

The default weights provided by the DeepSeek team in their official R1 repository are of type FP8. However, we chose to disable FP8 in our recipes because we empirically found that training with BF16 enhances generalization across diverse datasets with minimal changes to the recipe hyperparameters. Therefore, to achieve stable fine-tuning for a model with 671 billion parameters, we recommend first converting the model from FP8 to BF16 using the fp8_cast_bf16.py command-line script provided by DeepSeek. Running this script copies the converted BF16 weights in safetensors format to the specified output directory. Remember to also copy the model’s config.json to the output directory so the weights load correctly. These steps are encapsulated in a prologue script and documented step by step in the Fine-tuning section.

We tested training with a sequence length of 8K on p5.48xlarge instances, each equipped with eight NVIDIA H100 GPUs. You can also choose a smaller sequence length if needed; training with a sequence length greater than 8K might lead to GPU out-of-memory issues. Converting the model weights from FP8 to BF16 also requires a p5.48xlarge instance, which is likewise recommended for training because of the model’s high host memory requirements during initialization.
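
If you need a shorter sequence length, you can pass it as a recipe override when you create the trainer. The following is a minimal sketch; the override key shown (model.max_context_width) is an assumption based on the recipe YAML, so confirm the exact name in hf_deepseek_r1_671b_seq8k_gpu_qlora.yaml before using it.

# Hypothetical recipe override lowering the training sequence length to reduce
# GPU memory pressure; verify the key names against the recipe YAML.
recipe_overrides = {
    "model": {
        "max_context_width": 4096,
    },
}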

Customers must upgrade their transformers version to transformers==4.48.2 to run the training.

Fine-tuning

Run the finetune_deepseek_r1_671_qlora.ipynb notebook to fine-tune the DeepSeek-R1 model using QLoRA on SageMaker.

Prepare the dataset

This section covers loading the FreedomIntelligence/medical-o1-reasoning-SFT dataset, tokenizing and chunking the dataset, and configuring the data channels for SageMaker training on Amazon Simple Storage Service (Amazon S3). Complete the following steps:

  1. Format the dataset by applying the prompt format for DeepSeek-R1:
def generate_prompt(data_point):
    full_prompt = f"""
Below is an instruction that describes a task, paired with an input
that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{data_point["Question"]}

### Response:
{data_point["Complex_CoT"]}

"""
    return {"prompt": full_prompt.strip()}
  2. Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset and split it into training and validation datasets:
# Load dataset from the hub
train_set = load_dataset(dataset_name, 'en', split="train[5%:]")
test_set = load_dataset(dataset_name, 'en', split="train[:5%]")

...

train_dataset = train_set.map(
    generate_and_tokenize_prompt,
    remove_columns=columns_to_remove,
    batched=False
)

test_dataset = test_set.map(
    generate_and_tokenize_prompt,
    remove_columns=columns_to_remove,
    batched=False
)
  3. Load the DeepSeek-R1 tokenizer from the Hugging Face Transformers library and generate tokens for the train and validation datasets. We use the original sequence length of 8K (a sketch of a possible tokenize helper appears after this list):
model_id = "deepseek-ai/DeepSeek-R1"
max_seq_length=8096

# Initialize a tokenizer by loading a pre-trained tokenizer configuration, using the fast tokenizer implementation if available.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

...

train_dataset = train_dataset.map(tokenize, remove_columns=["prompt"])
test_dataset = test_dataset.map(tokenize, remove_columns=["prompt"])
  4. Prepare the training and validation datasets for SageMaker training by saving them as Arrow files, required by SageMaker HyperPod recipes, and constructing the S3 paths where these files will be uploaded. This dataset will be used in both SageMaker training jobs and SageMaker HyperPod examples:
train_dataset_s3_path = f"s3://{bucket_name}/{input_path}/train"
val_dataset_s3_path = f"s3://{bucket_name}/{input_path}/test"

train_dataset.save_to_disk(train_dataset_s3_path)
test_dataset.save_to_disk(val_dataset_s3_path)
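
The map calls in the earlier step reference a tokenize helper that is elided from the snippet. The following is a minimal sketch of what such a helper could look like, assuming standard Hugging Face tokenizer behavior; the notebook's actual implementation may differ.

# Hypothetical tokenize helper assumed by the map calls above
def tokenize(example):
    tokens = tokenizer(
        example["prompt"],
        truncation=True,
        max_length=max_seq_length,
    )
    # For causal language modeling, labels typically mirror the input IDs
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens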

The next section describes how to run a fine-tuning example with SageMaker training jobs.

Option A: Fine-tune using SageMaker training jobs

Follow these high-level steps:

  1. Download DeepSeek-R1 to the FSx for Lustre mounted directory
  2. Convert DeepSeek-R1 from FP8 to BF16
  3. Fine-tune the DeepSeek-R1 model
  4. Merge the trained adapter with the base model

Define a utility function to create the ModelTrainer class for every step of the SageMaker training jobs pipeline:

# Creates and executes a model training job using SageMaker
def create_model_trainer(
    use_recipes: bool,
    compute: dict,
    network: dict,
    data_channel: dict,
    action: str,
    hyperparameters: dict = {},
    source_code: str = None,
    training_recipe: str = None,
    recipe_overrides: str = None,
    image_uri: str = None
) -> ModelTrainer:

...
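
The helper's body is omitted in the excerpt above. The following is a minimal sketch of how it could dispatch between recipe-based and script-based jobs, assuming the ModelTrainer class and its from_recipe constructor from the SageMaker Python SDK (sagemaker.modules.train); the argument names are illustrative and should be checked against the notebook and the SDK documentation.

# Illustrative sketch only; the notebook's implementation may differ.
from sagemaker.modules.train import ModelTrainer

def create_model_trainer(
    use_recipes: bool,
    compute,
    network,
    data_channel,
    action: str,
    hyperparameters: dict = None,
    source_code=None,
    training_recipe: str = None,
    recipe_overrides: dict = None,
    image_uri: str = None,
) -> ModelTrainer:
    job_prefix = f"deepseek-r1-{action}"  # hypothetical job name prefix
    if use_recipes:
        # Recipe-driven steps (for example, fine-tuning) build the trainer from a recipe.
        return ModelTrainer.from_recipe(
            training_recipe=training_recipe,
            recipe_overrides=recipe_overrides,
            compute=compute,
            base_job_name=job_prefix,
        )
    # Script-driven steps (download, convert, merge adapter) run a custom entry script.
    return ModelTrainer(
        training_image=image_uri,
        source_code=source_code,
        compute=compute,
        networking=network,
        hyperparameters=hyperparameters or {},
        base_job_name=job_prefix,
    )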

Download DeepSeek-R1 to the FSx for Lustre mounted directory

Follow these steps:

  1. Select the instance type, Amazon FSx data channel, network configuration for the training job, and source code, then create the ModelTrainer to run the training job on an ml.c5.18xlarge instance and download DeepSeek-R1 from the Hugging Face Hub:
# Create compute instance
compute = ComputeCreator.create(
    instance_type="ml.c5.18xlarge",
    instance_count=1
)

# Create FSx data channel
data_channel = FSxDataChannelCreator.create_channel(
    directory_path=fsx_mount_point
)

# Create network configuration
network = NetworkConfigCreator.create_network_config(network_config)

# Set up source code configuration
source_code = SourceCode(
    source_dir="scripts",
    entry_script="download.py"
)
...

# Create model trainer
model_trainer = create_model_trainer(
    compute=compute,
    network=network,
    data_channel=data_channel,
    action="download",
    source_code=source_code
    ...
)
  2. Initiate the training by calling the train function of the ModelTrainer class:
model_trainer.train(input_data_config=[data_channel], wait=True)

Convert DeepSeek R1 from FP8 to BF16

Use ModelTrainer to convert the downloaded DeepSeek-R1 model weights from FP8 to BF16 format for optimal PEFT training. We use the convert.sh script to run the conversion on an ml.p5.48xlarge instance.

Use SageMaker training warm pool configuration to retain and reuse provisioned infrastructure after the completion of a model download training job in the previous step:

# Define constants
FSX_MODELDIR_BF16 = "deepseek-r1-bf16"
FSX_DIR_PATH = f"{fsx_mount_point}/{fsx_dir_basemodel}"

# Create compute instance
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=1
)

...

# Set up source code configuration
source_code = SourceCode(
    source_dir="scripts",
    entry_script="convert.sh"
)

...
# Create model trainer for conversion
model_trainer = create_model_trainer(
    ..
    action="convert",
    ...
)
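
The warm pool reuse mentioned earlier works by keeping provisioned instances alive for a configurable period after a job finishes (the ResourceConfig KeepAlivePeriodInSeconds setting of a SageMaker training job). The snippet below is a sketch of how this could be expressed with the workshop's ComputeCreator helper; whether that helper exposes such a parameter is an assumption, so check the notebook for the exact mechanism it uses.

# Hypothetical: keep the instances warm for an hour after the job completes so the
# next pipeline step can reuse them instead of provisioning new capacity.
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    keep_alive_period_in_seconds=3600,  # assumed parameter; maps to ResourceConfig.KeepAlivePeriodInSeconds
)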

Fine-tune the DeepSeek-R1 model

The next phase involves fine-tuning the DeepSeek-R1 model using two ml.p5.48xlarge instances, using distributed training. You implement this through the SageMaker recipe hf_deepseek_r1_671b_seq8k_gpu_qlora, which incorporates the QLoRA methodology. QLoRA makes the large language model (LLM) trainable on limited compute by quantizing the base model to 4-bit precision while using small, trainable low-rank adapters for fine-tuning, dramatically reducing memory requirements without sacrificing model quality:

# Create compute configuration with P5 instances
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=2
)

...

# Create model trainer for fine-tuning
model_trainer = create_model_trainer(
    use_recipes=True,
    ...
    action="finetune",
    training_recipe='fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora',
    recipe_overrides=recipe_overrides
)
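
For intuition, the following standalone sketch shows how 4-bit quantization and LoRA adapters combine in plain Hugging Face PEFT. It is illustrative only: the recipe applies its own equivalent configuration, parameter names, and distributed sharding, and the sketch uses a small distilled model plus example LoRA hyperparameters purely to keep it runnable on a single GPU.

# Illustrative QLoRA setup with Hugging Face PEFT; not the recipe's internal code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in BF16
)
lora_config = LoraConfig(
    r=16,                                   # example adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# Small distilled model used only to keep the sketch runnable on one GPU.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the low-rank adapters are trainable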

Initiate the training job to fine-tune the model. SageMaker training jobs will provision two P5 instances, orchestrate the SageMaker model parallel container smdistributed-modelparallel:2.4.1-gpu-py311-cu121, and execute the recipe to fine-tune DeepSeek-R1 with the QLoRA strategy on an ephemeral cluster:

model_trainer.train(input_data_config=[data_channel], wait=True)

Merge the trained adapter with the base model

Merge the trained adapters with the base model so it can be used for inference:

# Create compute configuration with P5 instance
compute = ComputeCreator.create(
    instance_type="ml.p5.48xlarge",
    instance_count=1
)

# Configure source code location and entry point
source_code = SourceCode(
    source_dir="scripts",
    entry_script="cli-inference.sh"
)
...

# Create model trainer for adapter merging
model_trainer = create_model_trainer(
    use_recipes=False,
    ...
    action="mergeadapter",
    source_code=source_code,
)

The next section shows how you can run similar steps on HyperPod to run your generative AI workloads.

Option B: Fine-tune using SageMaker HyperPod with Slurm

To fine-tune the model using HyperPod, make sure that your cluster is up and ready by following the prerequisites mentioned earlier. To access the login/head node of the HyperPod Slurm cluster from your development environment, follow the login instructions at SSH into Cluster in the workshop.

Alternatively, you can also use AWS Systems Manager and run a command such as the following to start the session. You can find the cluster ID, instance group name, and instance ID on the Amazon SageMaker console.

aws ssm start-session --target sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id] --region region_name
  1. When you’re on the cluster’s login/head node, run the following commands to set up the environment. Run sudo su - ubuntu to run the remaining commands as the ubuntu user, unless you have a specific user ID for accessing the cluster and your POSIX user was created through a lifecycle script on the cluster. Refer to the multi-user setup for more details.
# create a virtual environment
python3 -m venv ${PWD}/venv
source venv/bin/activate

# clone the recipes repository and set up the environment
git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt
  2. Create a squash file using Enroot to run the job on the cluster. Enroot runtime offers GPU acceleration, rootless container support, and seamless integration with HPC environments, making it ideal for running workflows securely.
# create a squash file using Enroot
REGION=<region>
IMAGE="658645717510.dkr.ecr.${REGION}.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
aws ecr get-login-password --region "${REGION}" | docker login --username AWS --password-stdin 658645717510.dkr.ecr.${REGION}.amazonaws.com
enroot import -o $PWD/smdistributed-modelparallel.sqsh dockerd://${IMAGE}
  3. After you’ve created the squash file, update the recipes_collection/config.yaml file with the absolute path to the squash file (created in the preceding step), and update the instance_type if needed. The final config file should have the following parameters:
...

cluster_type: slurm
...

instance_type: p5.48xlarge
...

container: /fsx/<path-to-smdistributed-modelparallel>.sqsh
...

Also update the file recipes_collection/cluster/slurm.yaml to add container_mounts pointing to the FSx for Lustre file system used in your cluster.
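
For example, the container_mounts entry could look like the following; the /fsx path is an assumed mount point, so substitute the actual mount path of your FSx for Lustre file system.

# recipes_collection/cluster/slurm.yaml (excerpt)
container_mounts:
  - /fsx:/fsx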

Follow these high-level steps to set up, fine-tune, and evaluate the model using HyperPod recipes:

  1. Download the model and convert weights to BF16
  2. Fine-tune the model using QLoRA
  3. Merge the trained model adapter
  4. Evaluate the fine-tuned model

Download the model and convert weights to BF16

Download the DeepSeek-R1 model from the Hugging Face Hub and convert the model weights from FP8 to BF16. This conversion is required to fine-tune with QLoRA. Copy and run the following bash script:

#!/bin/bash
start=$(date +%s)
# install git lfs and download the model from huggingface
sudo apt-get install git-lfs
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-R1 \
  && cd DeepSeek-R1 && git config lfs.concurrenttransfers "$(nproc)" && git lfs pull
end=$(date +%s)
echo "Time taken to download model: $((end - start)) seconds"

start=$(date +%s)
# convert the model weights from fp8 to bf16
source venv/bin/activate
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference && pip install -r requirements.txt && \
  wget https://github.com/aws/sagemaker-hyperpod-training-adapter-for-nemo/blob/main/src/hyperpod_nemo_adapter/scripts/fp8_cast_bf16.py && \
  python fp8_cast_bf16.py --input-fp8-hf-path ./DeepSeek-R1 --output-bf16-hf-path ./DeepSeek-R1-bf16
end=$(date +%s)
echo "Time taken to convert model to BF16: $((end - start)) seconds"

Fine-tune the model using QLoRA

Download the prepared dataset that you uploaded to Amazon S3 into your FSx for Lustre volume attached to the cluster.

  1. Enter the following commands to download the files from Amazon S3:
aws s3 cp s3://{bucket_name}/{input_path}/train /fsx/ubuntu/deepseek/data/train --recursive
aws s3 cp s3://{bucket_name}/{input_path}/test /fsx/ubuntu/deepseek/data/test --recursive
  2. Update the launcher script to fine-tune the DeepSeek-R1 671B model. The launcher scripts serve as convenient wrappers for executing the training script (main.py), simplifying fine-tuning and parameter adjustment. For fine-tuning the DeepSeek-R1 671B model, you can find the specific script at:
launcher_scripts/deepseek/run_hf_deepseek_r1_671b_seq8k_gpu_qlora.sh

Before running the script, you need to modify the location of the training and validation files, update the HuggingFace model ID, and optionally the access token for private models and datasets. The script should look like the following (update recipes.trainer.num_nodes if you’re using a multi-node cluster):

#!/bin/bash

# Original Copyright (c), NVIDIA CORPORATION. Modifications © Amazon.com

#Users should setup their cluster type in /recipes_collection/config.yaml

SAGEMAKER_TRAINING_LAUNCHER_DIR=${SAGEMAKER_TRAINING_LAUNCHER_DIR:-"$(pwd)"}

HF_MODEL_NAME_OR_PATH="/fsx/ubuntu/deepseek/DeepSeek-R1-bf16" # Path to the BF16 converted model

TRAIN_DIR="/fsx/ubuntu/deepseek/data/train" # Location of training dataset
VAL_DIR="/fsx/ubuntu/deepseek/data/test" # Location of validation dataset

EXP_DIR="/fsx/ubuntu/deepseek/checkpoints" # Location to save experiment info including logging, checkpoints, etc.

HYDRA_FULL_ERROR=1 python3 "${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py" \
    recipes=fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora \
    base_results_dir="${SAGEMAKER_TRAINING_LAUNCHER_DIR}/results" \
    recipes.run.name="hf-deepseek-r1-671b-seq8k-gpu-qlora" \
    recipes.exp_manager.exp_dir="$EXP_DIR" \
    recipes.trainer.num_nodes=2 \
    recipes.model.train_batch_size=1 \
    recipes.model.data.train_dir="$TRAIN_DIR" \
    recipes.model.data.val_dir="$VAL_DIR" \
    recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH"

You can view the recipe for this fine-tuning task under recipes_collection/recipes/fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora.yaml and override additional parameters as needed.

  3. Submit the job by running the launcher script:
bash launcher_scripts/deepseek/run_hf_deepseek_r1_671b_seq8k_gpu_qlora.sh

Monitor the job using Slurm commands such as squeue and scontrol show to view the status of the job and the corresponding logs. The logs can be found in the results folder in the launch directory. When the job is complete, the model adapters are stored in the EXP_DIR that you defined in the launch. The structure of the directory should look like this:

ls -R
.:
checkpoints experiment result.json

./checkpoints:
peft_sharded

./checkpoints/peft_sharded:
step_50

./checkpoints/peft_sharded/step_50:
README.md adapter_config.json adapter_model.safetensors tp0_ep0

You can see the trained adapter weights are stored as part of the checkpointing under ./checkpoints/peft_sharded/step_N. We will later use this to merge with the base model.

Merge the trained model adapter

Follow these steps:

  1. Run a job using the smdistributed-modelparallel enroot image to merge the adapter with the base model.
  2. Download the merge_peft_checkpoint.py code from the sagemaker-hyperpod-training-adapter-for-nemo repository and store it on Amazon FSx. Modify the export variables in the following script to reflect the paths for SOURCE_DIR, ADAPTER_PATH, BASE_MODEL_BF16, and MERGE_MODEL_PATH.
#!/bin/bash
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
#SBATCH --nodes=1 # number of nodes to use
#SBATCH --job-name=deepseek_merge_adapter # name of your job
#SBATCH --exclusive # job has exclusive use of the resource, no sharing
#SBATCH --wait-all-nodes=1

set -ex;
export SOURCE_DIR=/fsx/path_to_merge_code #(folder containing merge_peft_checkpoint.py)
export ADAPTER_PATH=/fsx/path_to_adapter #( from previous step )
export BASE_MODEL_BF16=/fsx/path_to_base #( BF16 model from step 1 )
export MERGE_MODEL_PATH=/fsx/path_to_merged_model

# default variables for mounting local paths to container
: "${IMAGE:=$(pwd)/smdistributed-modelparallel.sqsh}"
: "${HYPERPOD_PATH:="/var/log/aws/clusters":"/var/log/aws/clusters"}" #this is need for validating its hyperpod cluster
: "${ADAPTER_PATH_1:=$ADAPTER_PATH:$ADAPTER_PATH}"
: "${BASE_MODEL_BF16_1:=$BASE_MODEL_BF16:$BASE_MODEL_BF16}"
: "${MERGE_MODEL_PATH_1:=$MERGE_MODEL_PATH:$MERGE_MODEL_PATH}"
: "${SOURCE_DIR_1:=$SOURCE_DIR:$SOURCE_DIR}"
############

declare -a ARGS=(
    --container-image $IMAGE
    --container-mounts $HYPERPOD_PATH,$ADAPTER_PATH_1,$BASE_MODEL_BF16_1,$MERGE_MODEL_PATH_1,$SOURCE_DIR_1
)
# Merge adapter with base model.

srun -l "${ARGS[@]}" python $SOURCE_DIR/merge_peft_checkpoint.py \
    --hf_model_name_or_path $BASE_MODEL_BF16 \
    --peft_adapter_checkpoint_path $ADAPTER_PATH \
    --output_model_path $MERGE_MODEL_PATH \
    --deepseek_v3 true

Evaluate the fine-tuned model

Use the basic testing scripts provided by DeepSeek to deploy the merged model.

  1. Start by cloning their repo:
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
pip install -r requirements.txt
  2. You need to convert the merged model to a specific format for running inference. In this case, you need four P5 instances to deploy the model because the merged model is in BF16. Enter the following command to convert the model:
python convert.py --hf-ckpt-path /fsx/ubuntu/deepseek/DeepSeek-V3-Base/ \
    --save-path /fsx/ubuntu/deepseek/DeepSeek-V3-Demo --n-experts 256 \
    --model-parallel 32
  3. When the conversion is complete, use the following sbatch script to run the batch inference, making the following adjustments:
    1. Update the ckpt-path to the converted model path from the previous step.
    2. Create a new prompts.txt file with each line containing a prompt. The job will use the prompts from this file and generate output.
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --job-name=deepseek_671b_inference
#SBATCH --output=deepseek_671b_%j.out

# Set environment variables
export MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
export MASTER_PORT=29500
source /fsx/ubuntu/alokana/deepseek/venv/bin/activate
# Run the job using torchrun
srun /fsx/ubuntu/alokana/deepseek/venv/bin/torchrun \
    --nnodes=4 \
    --nproc-per-node=8 \
    --rdzv_id=$SLURM_JOB_ID \
    --rdzv_backend=c10d \
    --rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
    ./generate.py \
    --ckpt-path /fsx/ubuntu/alokana/deepseek/DeepSeek-R1-Demo \
    --config ./configs/config_671B.json \
    --input-file ./prompts.txt

Cleanup

To clean up your resources to avoid incurring more charges, follow these steps:

  1. Delete any unused SageMaker Studio resources.
  2. (Optional) Delete the SageMaker Studio domain.
  3. Verify that your training job isn’t running anymore. To do so, on your SageMaker console, choose Training and check Training jobs.
  4. If you created a HyperPod cluster, delete the cluster to stop incurring costs. If you created the networking stack from the HyperPod workshop, delete the stack as well to clean up the virtual private cloud (VPC) resources and the FSx for Lustre volume.

Conclusion

In this post, we demonstrated how to fine-tune large models such as DeepSeek-R1 671B using either SageMaker training jobs or SageMaker HyperPod with HyperPod recipes in a few steps. This approach minimizes the complexity of identifying optimal distributed training configurations and provides a simple way to properly size your workloads with the best price-performance architecture on AWS.

To start using SageMaker HyperPod recipes, visit our sagemaker-hyperpod-recipes GitHub repository for comprehensive documentation and example implementations. Our team continually expands our recipes based on customer feedback and emerging machine learning (ML) trends, making sure you have the necessary tools for successful AI model training.


About the Authors

 Kanwaljit Khurmi is a Principal Worldwide Generative AI Solutions Architect at AWS. He collaborates with AWS product teams, engineering departments, and customers to provide guidance and technical assistance, helping them enhance the value of their hybrid machine learning solutions on AWS. Kanwaljit specializes in assisting customers with containerized applications and high-performance computing solutions.

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.

 Anoop Saha is a Sr GTM Specialist at Amazon Web Services (AWS) focusing on generative AI model training and inference. He partners with top frontier model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large corporations, primarily focusing on silicon and system architecture of AI infrastructure.

Rohith Nadimpally is a Software Development Engineer working on AWS SageMaker, where he accelerates large-scale AI/ML workflows. Before joining Amazon, he graduated with Honors from Purdue University with a degree in Computer Science. Outside of work, he enjoys playing tennis and watching movies.
