S3 compatible storage with Ray Train examples
Some of our distributed training examples require an external storage solution so that all nodes can access the same data. The following examples show how to configure S3 or MinIO storage for your Ray Train script or interactive session.
S3 Bucket
In your Python script, add the following environment variables:
os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "XXXXXXXX"
Alternatively, you can specify these variables in the runtime environment when submitting a job:
submission_id = client.submit_job(
    entrypoint=...,
    runtime_env={
        "env_vars": {
            "AWS_ACCESS_KEY_ID": os.environ.get("AWS_ACCESS_KEY_ID"),
            "AWS_SECRET_ACCESS_KEY": os.environ.get("AWS_SECRET_ACCESS_KEY"),
            "AWS_DEFAULT_REGION": os.environ.get("AWS_DEFAULT_REGION"),
        },
    },
)
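The client in the snippet above is a Ray JobSubmissionClient. A minimal sketch for creating one, assuming your Ray cluster's dashboard address (the URL below is a placeholder):

from ray.job_submission import JobSubmissionClient

# Replace this address with your own Ray dashboard URL.
client = JobSubmissionClient("http://127.0.0.1:8265")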
In your Trainer configuration, you can specify a run_config that will use your external storage:
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=ray.train.RunConfig(storage_path="s3://BUCKET_NAME/SUB_PATH/", name="unique_run_name"),
)
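Once training runs, checkpoints and results are persisted under the configured storage_path. As a rough sketch of kicking off training and checking where the results landed (the exact path depends on your run name):

result = trainer.fit()
# The result directory lives on S3, e.g. s3://BUCKET_NAME/SUB_PATH/unique_run_name
print(result.path)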
To learn more about Amazon S3 storage, refer to the Amazon S3 documentation.
MinIO Bucket
In your Python script, add the following function to configure your run_config:
import os
import s3fs
import pyarrow.fs
import ray.train

def get_minio_run_config():
    # Build an fsspec filesystem pointed at your MinIO endpoint.
    s3_fs = s3fs.S3FileSystem(
        key=os.getenv("MINIO_ACCESS_KEY", "XXXXX"),
        secret=os.getenv("MINIO_SECRET_ACCESS_KEY", "XXXXX"),
        endpoint_url=os.getenv("MINIO_URL", "XXXXX"),
    )
    # Wrap it so PyArrow (and therefore Ray Train) can use it as a storage filesystem.
    custom_fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(s3_fs))
    run_config = ray.train.RunConfig(storage_path="training", storage_filesystem=custom_fs)
    return run_config
You can adjust the run_config above to further suit your needs.
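Before launching training, it can also help to verify that your credentials and endpoint actually reach the bucket. A minimal sketch, assuming the same MINIO_* environment variables are set and that a bucket named training exists:

import os
import s3fs

# Sanity check: list the bucket through the same fsspec filesystem Ray Train will use.
s3_fs = s3fs.S3FileSystem(
    key=os.getenv("MINIO_ACCESS_KEY"),
    secret=os.getenv("MINIO_SECRET_ACCESS_KEY"),
    endpoint_url=os.getenv("MINIO_URL"),
)
print(s3_fs.ls("training"))  # Raises an error if the credentials or endpoint are wrong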
Lastly, the new run_config must be added to the Trainer:
trainer = TorchTrainer(
    train_func_distributed,
    scaling_config=scaling_config,
    run_config=get_minio_run_config(),
)
To find more information on creating a MinIO bucket compatible with RHOAI, refer to this documentation.
Note: You must have s3fs and pyarrow installed in your environment for this method.
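Both packages are available on PyPI; for example:

pip install s3fs pyarrow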