S3-compatible storage with Ray Train examples

Some of our distributed training examples require external storage so that all nodes can access the same data. The following examples show how to configure S3 or MinIO storage for your Ray Train script or interactive session.

S3 Bucket

In your Python script, set the following environment variables:

import os

os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "XXXXXXXX"

Alternatively, you can specify these variables in the runtime environment when you submit the job:

submission_id = client.submit_job(
    entrypoint=...,
    runtime_env={
        "env_vars": {
            "AWS_ACCESS_KEY_ID": os.environ.get('AWS_ACCESS_KEY_ID'),
            "AWS_SECRET_ACCESS_KEY": os.environ.get('AWS_SECRET_ACCESS_KEY'),
            "AWS_DEFAULT_REGION": os.environ.get('AWS_DEFAULT_REGION')
        },
    }
)
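Because `os.environ.get` returns `None` for an unset variable, it may help to check the credentials locally before submitting the job. The following is a minimal sketch using only the standard library; `collect_aws_env_vars` is a hypothetical helper name and not part of the Ray API.

```python
import os

# The three variables named in the section above.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def collect_aws_env_vars(environ=None):
    """Return the AWS variables as a dict, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in AWS_VARS if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Copy only the AWS variables; everything else is left out.
    return {name: environ[name] for name in AWS_VARS}
```

The returned dict could then be passed as `runtime_env={"env_vars": collect_aws_env_vars()}`, so that a missing credential fails on the submitting machine rather than inside the job.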

In your Trainer configuration, you can specify a run_config that uses your external storage:

import ray.train
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path="s3://BUCKET_NAME/SUB_PATH/", name="unique_run_name"),
)
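The `storage_path` and `name` values above are placeholders. One way to fill them in is sketched below, assuming you want a normalized `s3://` URI and a run name that differs on every launch. Both helper names are hypothetical, and the naming scheme (prefix, UTC timestamp, short random suffix) is only an illustration.

```python
import uuid
from datetime import datetime, timezone


def build_storage_path(bucket, sub_path=""):
    """Join a bucket name and optional sub-path into an s3:// URI with a trailing slash."""
    parts = [bucket.strip("/")] + [p for p in sub_path.split("/") if p]
    return "s3://" + "/".join(parts) + "/"


def make_run_name(prefix="train"):
    """Append a UTC timestamp and a short random suffix to the prefix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}-{stamp}-{uuid.uuid4().hex[:6]}"
```

For example, `build_storage_path("BUCKET_NAME", "SUB_PATH")` produces the same URI form used in the RunConfig above.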

To learn more about Amazon S3 storage, refer to the Amazon S3 documentation.

MinIO Bucket

In your Python script, add the following function to build your run_config:

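import os

import pyarrow.fs
import ray.train
import s3fs


def get_minio_run_config():
    # Connect to MinIO through its S3-compatible API.
    s3_fs = s3fs.S3FileSystem(
        key=os.getenv("MINIO_ACCESS_KEY", "XXXXX"),
        secret=os.getenv("MINIO_SECRET_ACCESS_KEY", "XXXXX"),
        endpoint_url=os.getenv("MINIO_URL", "XXXXX"),
    )
    # Wrap the fsspec filesystem so that Ray Train can use it as a pyarrow filesystem.
    custom_fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(s3_fs))
    # With a custom filesystem, storage_path is given without the s3:// prefix.
    run_config = ray.train.RunConfig(storage_path="training", storage_filesystem=custom_fs)
    return run_config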

You can adjust the run_config above to suit your needs. Finally, pass the new run_config to the Trainer:

trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=get_minio_run_config(),
)

For more information on creating a MinIO bucket compatible with RHOAI, refer to this documentation.

Note: This method requires s3fs and pyarrow to be installed in your environment.
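The MinIO function above falls back to the placeholder "XXXXX" when MINIO_URL is unset, which is not a usable endpoint. A small check like the following sketch could catch that before any connection is attempted. It assumes the endpoint should be a full http:// or https:// URL; `validate_endpoint_url` is a hypothetical helper, not part of s3fs or Ray.

```python
from urllib.parse import urlparse


def validate_endpoint_url(url):
    """Return the URL without a trailing slash, or raise if it is not http(s)://host[:port]."""
    parsed = urlparse(url)
    # A bare "host:9000" parses with the host as the scheme, so check the scheme explicitly.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected http(s)://host[:port], got {url!r}")
    return url.rstrip("/")
```

It could be applied to the value read from MINIO_URL before passing it as `endpoint_url`.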