codeflare_sdk.ray.cluster package

Submodules

codeflare_sdk.ray.cluster.cluster module

The cluster sub-module contains the definition of the Cluster object, which represents the resources requested by the user. It also contains functions for checking the cluster setup queue, listing all existing clusters, and retrieving the user’s working namespace.

class codeflare_sdk.ray.cluster.cluster.Cluster(config: ClusterConfiguration)[source]

Bases: object

An object for requesting, bringing up, and taking down resources. It can also be used to inspect the cluster’s status and details.

Note that currently, the underlying implementation is a Ray cluster.

cluster_dashboard_uri() str[source]

Returns a string containing the cluster’s dashboard URI.

cluster_uri() str[source]

Returns a string containing the cluster’s URI.

create_app_wrapper()[source]

Called upon cluster object creation, creates an AppWrapper yaml based on the specifications of the ClusterConfiguration.

details(print_to_console: bool = True) RayCluster[source]
down()[source]

Deletes the AppWrapper yaml, scaling down and deleting all resources associated with the cluster.

from_k8_cluster_object(appwrapper=True, write_to_file=False, verify_tls=True)[source]
is_dashboard_ready() bool[source]
property job_client
job_logs(job_id: str) str[source]

This method accesses the Ray head node in your cluster and returns the logs for the provided job id.

job_status(job_id: str) str[source]

This method accesses the Ray head node in your cluster and returns the job status for the provided job id.

list_jobs() List[source]

This method accesses the Ray head node in your cluster and lists the running jobs.
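Taken together, the job methods above wrap Ray’s job submission client, which is exposed directly through the job_client property. A minimal sketch, assuming a cluster named raytest is already up and an entry point script train.py exists (both names are hypothetical):

```python
from codeflare_sdk.ray.cluster.cluster import Cluster
from codeflare_sdk.ray.cluster.config import ClusterConfiguration

cluster = Cluster(ClusterConfiguration(name="raytest", namespace="default"))

# The underlying Ray JobSubmissionClient is exposed via the job_client property.
client = cluster.job_client
job_id = client.submit_job(entrypoint="python train.py")

print(cluster.job_status(job_id))  # current state of the job on the head node
print(cluster.job_logs(job_id))    # logs captured so far for this job id
print(cluster.list_jobs())         # all jobs known to the head node
```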

local_client_url()[source]
status(print_to_console: bool = True) Tuple[CodeFlareClusterStatus, bool][source]

Returns the requested cluster’s status, as well as whether or not it is ready for use.

up()[source]

Applies the Cluster yaml, pushing the resource request onto the Kueue localqueue.

wait_ready(timeout: int | None = None, dashboard_check: bool = True)[source]

Waits for the requested cluster to be ready, up to an optional timeout in seconds. Checks every five seconds.
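The methods above combine into a typical request-use-teardown lifecycle. A minimal sketch, assuming a Kueue-enabled namespace and placeholder names:

```python
from codeflare_sdk.ray.cluster.cluster import Cluster
from codeflare_sdk.ray.cluster.config import ClusterConfiguration

# Creating the Cluster object generates the cluster yaml
# (and an AppWrapper, if configured) from the ClusterConfiguration.
cluster = Cluster(ClusterConfiguration(name="raytest", num_workers=2))

cluster.up()                     # push the resource request onto the Kueue localqueue
cluster.wait_ready(timeout=300)  # block until ready, polling every five seconds

status, ready = cluster.status(print_to_console=False)
print(cluster.cluster_dashboard_uri())

cluster.down()                   # scale down and delete all associated resources
```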

codeflare_sdk.ray.cluster.cluster.get_cluster(cluster_name: str, namespace: str = 'default', write_to_file: bool = False, verify_tls: bool = True)[source]
codeflare_sdk.ray.cluster.cluster.get_current_namespace()[source]
codeflare_sdk.ray.cluster.cluster.list_all_clusters(namespace: str, print_to_console: bool = True)[source]

Returns (and prints by default) a list of all clusters in a given namespace.

codeflare_sdk.ray.cluster.cluster.list_all_queued(namespace: str, print_to_console: bool = True, appwrapper: bool = False)[source]

Returns (and prints by default) a list of all currently queued-up Ray Clusters in a given namespace.
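A short sketch of the module-level helpers, assuming clusters already exist in the current namespace (the cluster name is a placeholder):

```python
from codeflare_sdk.ray.cluster.cluster import (
    get_cluster,
    get_current_namespace,
    list_all_clusters,
    list_all_queued,
)

ns = get_current_namespace()
clusters = list_all_clusters(ns, print_to_console=False)
queued = list_all_queued(ns, print_to_console=False)

# Reconstruct a Cluster object for an existing RayCluster by name.
cluster = get_cluster("raytest", namespace=ns)
```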

codeflare_sdk.ray.cluster.config module

The config sub-module contains the definition of the ClusterConfiguration dataclass, which is used to specify resource requirements and other details when creating a Cluster object.

class codeflare_sdk.ray.cluster.config.ClusterConfiguration(name: str, namespace: str | None = None, head_info: ~typing.List[str] = <factory>, head_cpu_requests: int | str = 2, head_cpu_limits: int | str = 2, head_cpus: int | str | None = None, head_memory_requests: int | str = 8, head_memory_limits: int | str = 8, head_memory: int | str | None = None, head_gpus: int | None = None, head_extended_resource_requests: ~typing.Dict[str, str | int] = <factory>, machine_types: ~typing.List[str] = <factory>, worker_cpu_requests: int | str = 1, worker_cpu_limits: int | str = 1, min_cpus: int | str | None = None, max_cpus: int | str | None = None, num_workers: int = 1, worker_memory_requests: int | str = 2, worker_memory_limits: int | str = 2, min_memory: int | str | None = None, max_memory: int | str | None = None, num_gpus: int | None = None, template: str = '/home/runner/work/codeflare-sdk/codeflare-sdk/src/codeflare_sdk/ray/templates/base-template.yaml', appwrapper: bool = False, envs: ~typing.Dict[str, str] = <factory>, image: str = '', image_pull_secrets: ~typing.List[str] = <factory>, write_to_file: bool = False, verify_tls: bool = True, labels: ~typing.Dict[str, str] = <factory>, worker_extended_resource_requests: ~typing.Dict[str, str | int] = <factory>, extended_resource_mapping: ~typing.Dict[str, str] = <factory>, overwrite_default_resource_mapping: bool = False, local_queue: str | None = None)[source]

Bases: object

This dataclass is used to specify resource requirements and other details, and is passed in as an argument when creating a Cluster object.

Attributes:

- name: The name of the cluster.
- namespace: The namespace in which the cluster should be created.
- head_info: A list of strings containing information about the head node.
- head_cpus: The number of CPUs to allocate to the head node.
- head_memory: The amount of memory to allocate to the head node.
- head_gpus: The number of GPUs to allocate to the head node. (Deprecated, use head_extended_resource_requests)
- head_extended_resource_requests: A dictionary of extended resource requests for the head node. ex: {"nvidia.com/gpu": 1}
- machine_types: A list of machine types to use for the cluster.
- min_cpus: The minimum number of CPUs to allocate to each worker.
- max_cpus: The maximum number of CPUs to allocate to each worker.
- num_workers: The number of workers to create.
- min_memory: The minimum amount of memory to allocate to each worker.
- max_memory: The maximum amount of memory to allocate to each worker.
- num_gpus: The number of GPUs to allocate to each worker. (Deprecated, use worker_extended_resource_requests)
- template: The path to the template file to use for the cluster.
- appwrapper: A boolean indicating whether to use an AppWrapper.
- envs: A dictionary of environment variables to set for the cluster.
- image: The image to use for the cluster.
- image_pull_secrets: A list of image pull secrets to use for the cluster.
- write_to_file: A boolean indicating whether to write the cluster configuration to a file.
- verify_tls: A boolean indicating whether to verify TLS when connecting to the cluster.
- labels: A dictionary of labels to apply to the cluster.
- worker_extended_resource_requests: A dictionary of extended resource requests for each worker. ex: {"nvidia.com/gpu": 1}
- extended_resource_mapping: A dictionary of custom resource mappings to map extended resource requests to RayCluster resource names.
- overwrite_default_resource_mapping: A boolean indicating whether to overwrite the default resource mapping.
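For example, a configuration requesting two GPU-backed workers might look like the following (all values, including the image tag, are illustrative):

```python
from codeflare_sdk.ray.cluster.config import ClusterConfiguration

config = ClusterConfiguration(
    name="raytest",
    namespace="default",
    num_workers=2,
    worker_cpu_requests=1,
    worker_cpu_limits=2,
    worker_memory_requests=4,
    worker_memory_limits=6,
    # One NVIDIA GPU per worker, requested as an extended resource.
    worker_extended_resource_requests={"nvidia.com/gpu": 1},
    image="quay.io/project-codeflare/ray:latest-py39-cu118",  # placeholder image
    write_to_file=False,
)
```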

appwrapper: bool = False
envs: Dict[str, str]
extended_resource_mapping: Dict[str, str]
head_cpu_limits: int | str = 2
head_cpu_requests: int | str = 2
head_cpus: int | str | None = None
head_extended_resource_requests: Dict[str, str | int]
head_gpus: int | None = None
head_info: List[str]
head_memory: int | str | None = None
head_memory_limits: int | str = 8
head_memory_requests: int | str = 8
image: str = ''
image_pull_secrets: List[str]
labels: Dict[str, str]
local_queue: str | None = None
machine_types: List[str]
max_cpus: int | str | None = None
max_memory: int | str | None = None
min_cpus: int | str | None = None
min_memory: int | str | None = None
name: str
namespace: str | None = None
num_gpus: int | None = None
num_workers: int = 1
overwrite_default_resource_mapping: bool = False
template: str = '/home/runner/work/codeflare-sdk/codeflare-sdk/src/codeflare_sdk/ray/templates/base-template.yaml'
verify_tls: bool = True
worker_cpu_limits: int | str = 1
worker_cpu_requests: int | str = 1
worker_extended_resource_requests: Dict[str, str | int]
worker_memory_limits: int | str = 2
worker_memory_requests: int | str = 2
write_to_file: bool = False

codeflare_sdk.ray.cluster.generate_yaml module

This sub-module exists primarily to be used internally by the Cluster object (in the cluster sub-module) for AppWrapper generation.

codeflare_sdk.ray.cluster.generate_yaml.augment_labels(item: dict, labels: dict)[source]
codeflare_sdk.ray.cluster.generate_yaml.del_from_list_by_name(l: list, target: List[str]) list[source]
codeflare_sdk.ray.cluster.generate_yaml.gen_names(name)[source]
codeflare_sdk.ray.cluster.generate_yaml.generate_appwrapper(cluster: Cluster)[source]
codeflare_sdk.ray.cluster.generate_yaml.head_worker_gpu_count_from_cluster(cluster: Cluster) Tuple[int, int][source]
codeflare_sdk.ray.cluster.generate_yaml.head_worker_resources_from_cluster(cluster: Cluster) Tuple[dict, dict][source]
codeflare_sdk.ray.cluster.generate_yaml.is_kind_cluster()[source]
codeflare_sdk.ray.cluster.generate_yaml.is_openshift_cluster()[source]
codeflare_sdk.ray.cluster.generate_yaml.notebook_annotations(item: dict)[source]
codeflare_sdk.ray.cluster.generate_yaml.read_template(template)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_env(spec, env)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_image(spec, image)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_image_pull_secrets(spec, image_pull_secrets)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_names(cluster_yaml: dict, cluster: Cluster)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_nodes(ray_cluster_dict: dict, cluster: Cluster)[source]
codeflare_sdk.ray.cluster.generate_yaml.update_resources(spec, cpu_requests, cpu_limits, memory_requests, memory_limits, custom_resources)[source]
codeflare_sdk.ray.cluster.generate_yaml.wrap_cluster(cluster_yaml: dict, appwrapper_name: str, namespace: str)[source]
codeflare_sdk.ray.cluster.generate_yaml.write_user_yaml(user_yaml, output_file_name)[source]

codeflare_sdk.ray.cluster.pretty_print module

This sub-module exists primarily to be used internally by the Cluster object (in the cluster sub-module) for pretty-printing cluster status and details.

codeflare_sdk.ray.cluster.pretty_print.print_app_wrappers_status(app_wrappers: List[AppWrapper], starting: bool = False)[source]
codeflare_sdk.ray.cluster.pretty_print.print_cluster_status(cluster: RayCluster)[source]

Pretty prints the status of a passed-in cluster

codeflare_sdk.ray.cluster.pretty_print.print_clusters(clusters: List[RayCluster])[source]
codeflare_sdk.ray.cluster.pretty_print.print_no_resources_found()[source]
codeflare_sdk.ray.cluster.pretty_print.print_ray_clusters_status(app_wrappers: List[AppWrapper], starting: bool = False)[source]

codeflare_sdk.ray.cluster.status module

The status sub-module defines Enums containing information for Ray cluster states and CodeFlare cluster states, as well as dataclasses to store information for Ray clusters.

class codeflare_sdk.ray.cluster.status.CodeFlareClusterStatus(value)[source]

Bases: Enum

Defines the possible reportable states of a CodeFlare cluster.

FAILED = 5
QUEUED = 3
QUEUEING = 4
READY = 1
STARTING = 2
SUSPENDED = 7
UNKNOWN = 6
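Because the members carry integer values, callers should compare against the enum members rather than raw numbers. The standalone sketch below re-declares a subset of the documented values locally (it does not import the SDK) to show the pattern:

```python
from enum import Enum

# Local stand-in mirroring a subset of the documented
# CodeFlareClusterStatus values.
class CodeFlareClusterStatus(Enum):
    READY = 1
    STARTING = 2
    QUEUED = 3

def is_pending(status: CodeFlareClusterStatus) -> bool:
    # A starting or queued cluster has been requested but is not yet usable.
    return status in (CodeFlareClusterStatus.STARTING, CodeFlareClusterStatus.QUEUED)

print(is_pending(CodeFlareClusterStatus.QUEUED))  # True
print(is_pending(CodeFlareClusterStatus.READY))   # False
```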
class codeflare_sdk.ray.cluster.status.RayCluster(name: str, status: ~codeflare_sdk.ray.cluster.status.RayClusterStatus, head_cpu_requests: int, head_cpu_limits: int, head_mem_requests: str, head_mem_limits: str, num_workers: int, worker_mem_requests: str, worker_mem_limits: str, worker_cpu_requests: int | str, worker_cpu_limits: int | str, namespace: str, dashboard: str, worker_extended_resources: ~typing.Dict[str, int] = <factory>, head_extended_resources: ~typing.Dict[str, int] = <factory>)[source]

Bases: object

For storing information about a Ray cluster.

dashboard: str
head_cpu_limits: int
head_cpu_requests: int
head_extended_resources: Dict[str, int]
head_mem_limits: str
head_mem_requests: str
name: str
namespace: str
num_workers: int
status: RayClusterStatus
worker_cpu_limits: int | str
worker_cpu_requests: int | str
worker_extended_resources: Dict[str, int]
worker_mem_limits: str
worker_mem_requests: str
class codeflare_sdk.ray.cluster.status.RayClusterStatus(value)[source]

Bases: Enum

Defines the possible reportable states of a Ray cluster.

FAILED = 'failed'
READY = 'ready'
SUSPENDED = 'suspended'
UNHEALTHY = 'unhealthy'
UNKNOWN = 'unknown'

Module contents