codeflare_sdk.ray.cluster package
Submodules
codeflare_sdk.ray.cluster.cluster module
The cluster sub-module contains the definition of the Cluster object, which represents the resources requested by the user. It also contains functions for checking the cluster setup queue, listing all existing clusters, and retrieving the user’s working namespace.
- class codeflare_sdk.ray.cluster.cluster.Cluster(config: ClusterConfiguration)[source]
Bases:
object
An object for requesting, bringing up, and taking down resources. It can also be used to view the cluster’s status and details; a usage sketch follows the method list below.
Note that currently, the underlying implementation is a Ray cluster.
- create_app_wrapper()[source]
Called upon cluster object creation; creates an AppWrapper YAML based on the specifications of the ClusterConfiguration.
- details(print_to_console: bool = True) RayCluster [source]
- down()[source]
Deletes the AppWrapper YAML, scaling down and deleting all resources associated with the cluster.
- property job_client
- job_logs(job_id: str) str [source]
This method accesses the Ray head node in your cluster and returns the logs for the provided job id.
- job_status(job_id: str) str [source]
This method accesses the Ray head node in your cluster and returns the job status for the provided job id.
- list_jobs() List [source]
This method accesses the Ray head node in your cluster and lists the running jobs.
- status(print_to_console: bool = True) Tuple[CodeFlareClusterStatus, bool] [source]
Returns the requested cluster’s status, as well as whether or not it is ready for use.
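A minimal lifecycle sketch, assuming an authenticated Kubernetes/OpenShift context; the cluster name, namespace, and worker count are placeholders, and cluster.up() is assumed as the counterpart to down() for requesting the resources:

    from codeflare_sdk.ray.cluster.cluster import Cluster
    from codeflare_sdk.ray.cluster.config import ClusterConfiguration

    # Placeholder configuration; see the config module below for all fields.
    cluster = Cluster(ClusterConfiguration(
        name="example-cluster",
        namespace="my-namespace",
        num_workers=2,
    ))

    cluster.up()         # assumed counterpart to down(): requests the resources
    cluster.status()     # prints and returns (CodeFlareClusterStatus, ready)
    cluster.details()    # prints and returns a RayCluster summary
    cluster.list_jobs()  # lists the jobs running on the Ray head node
    cluster.down()       # scales down and deletes all associated resources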
- codeflare_sdk.ray.cluster.cluster.get_cluster(cluster_name: str, namespace: str = 'default', write_to_file: bool = False, verify_tls: bool = True)[source]
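A short sketch of reattaching to an existing cluster by name; the names below are placeholders:

    from codeflare_sdk.ray.cluster.cluster import get_cluster

    # Placeholder names; fetches an existing cluster from the given namespace.
    cluster = get_cluster(cluster_name="example-cluster", namespace="my-namespace")
    status, ready = cluster.status(print_to_console=False)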
codeflare_sdk.ray.cluster.config module
The config sub-module contains the definition of the ClusterConfiguration dataclass, which is used to specify resource requirements and other details when creating a Cluster object.
- class codeflare_sdk.ray.cluster.config.ClusterConfiguration(name: str, namespace: str | None = None, head_info: ~typing.List[str] = <factory>, head_cpu_requests: int | str = 2, head_cpu_limits: int | str = 2, head_cpus: int | str | None = None, head_memory_requests: int | str = 8, head_memory_limits: int | str = 8, head_memory: int | str | None = None, head_gpus: int | None = None, head_extended_resource_requests: ~typing.Dict[str, str | int] = <factory>, machine_types: ~typing.List[str] = <factory>, worker_cpu_requests: int | str = 1, worker_cpu_limits: int | str = 1, min_cpus: int | str | None = None, max_cpus: int | str | None = None, num_workers: int = 1, worker_memory_requests: int | str = 2, worker_memory_limits: int | str = 2, min_memory: int | str | None = None, max_memory: int | str | None = None, num_gpus: int | None = None, template: str = '/home/runner/work/codeflare-sdk/codeflare-sdk/src/codeflare_sdk/ray/templates/base-template.yaml', appwrapper: bool = False, envs: ~typing.Dict[str, str] = <factory>, image: str = '', image_pull_secrets: ~typing.List[str] = <factory>, write_to_file: bool = False, verify_tls: bool = True, labels: ~typing.Dict[str, str] = <factory>, worker_extended_resource_requests: ~typing.Dict[str, str | int] = <factory>, extended_resource_mapping: ~typing.Dict[str, str] = <factory>, overwrite_default_resource_mapping: bool = False, local_queue: str | None = None)[source]
Bases:
object
This dataclass is used to specify resource requirements and other details, and is passed in as an argument when creating a Cluster object.
Attributes:
- name: The name of the cluster.
- namespace: The namespace in which the cluster should be created.
- head_info: A list of strings containing information about the head node.
- head_cpus: The number of CPUs to allocate to the head node.
- head_memory: The amount of memory to allocate to the head node.
- head_gpus: The number of GPUs to allocate to the head node. (Deprecated, use head_extended_resource_requests)
- head_extended_resource_requests: A dictionary of extended resource requests for the head node. ex: {"nvidia.com/gpu": 1}
- machine_types: A list of machine types to use for the cluster.
- min_cpus: The minimum number of CPUs to allocate to each worker.
- max_cpus: The maximum number of CPUs to allocate to each worker.
- num_workers: The number of workers to create.
- min_memory: The minimum amount of memory to allocate to each worker.
- max_memory: The maximum amount of memory to allocate to each worker.
- num_gpus: The number of GPUs to allocate to each worker. (Deprecated, use worker_extended_resource_requests)
- template: The path to the template file to use for the cluster.
- appwrapper: A boolean indicating whether to use an AppWrapper.
- envs: A dictionary of environment variables to set for the cluster.
- image: The image to use for the cluster.
- image_pull_secrets: A list of image pull secrets to use for the cluster.
- write_to_file: A boolean indicating whether to write the cluster configuration to a file.
- verify_tls: A boolean indicating whether to verify TLS when connecting to the cluster.
- labels: A dictionary of labels to apply to the cluster.
- worker_extended_resource_requests: A dictionary of extended resource requests for each worker. ex: {"nvidia.com/gpu": 1}
- extended_resource_mapping: A dictionary of custom resource mappings to map extended resource requests to RayCluster resource names.
- overwrite_default_resource_mapping: A boolean indicating whether to overwrite the default resource mapping.
- appwrapper: bool = False
- envs: Dict[str, str]
- extended_resource_mapping: Dict[str, str]
- head_cpu_limits: int | str = 2
- head_cpu_requests: int | str = 2
- head_cpus: int | str | None = None
- head_extended_resource_requests: Dict[str, str | int]
- head_gpus: int | None = None
- head_info: List[str]
- head_memory: int | str | None = None
- head_memory_limits: int | str = 8
- head_memory_requests: int | str = 8
- image: str = ''
- image_pull_secrets: List[str]
- labels: Dict[str, str]
- local_queue: str | None = None
- machine_types: List[str]
- max_cpus: int | str | None = None
- max_memory: int | str | None = None
- min_cpus: int | str | None = None
- min_memory: int | str | None = None
- name: str
- namespace: str | None = None
- num_gpus: int | None = None
- num_workers: int = 1
- overwrite_default_resource_mapping: bool = False
- template: str = '/home/runner/work/codeflare-sdk/codeflare-sdk/src/codeflare_sdk/ray/templates/base-template.yaml'
- verify_tls: bool = True
- worker_cpu_limits: int | str = 1
- worker_cpu_requests: int | str = 1
- worker_extended_resource_requests: Dict[str, str | int]
- worker_memory_limits: int | str = 2
- worker_memory_requests: int | str = 2
- write_to_file: bool = False
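A sketch of a typical configuration using the fields documented above; all values, including the image reference, are illustrative placeholders:

    from codeflare_sdk.ray.cluster.config import ClusterConfiguration

    # Illustrative values only; adjust sizing, image, and labels for your environment.
    config = ClusterConfiguration(
        name="example-cluster",
        namespace="my-namespace",
        num_workers=2,
        worker_cpu_requests=1,
        worker_cpu_limits=2,
        worker_memory_requests=4,
        worker_memory_limits=8,
        head_cpu_requests=2,
        head_cpu_limits=2,
        head_memory_requests=8,
        head_memory_limits=8,
        worker_extended_resource_requests={"nvidia.com/gpu": 1},
        image="quay.io/example/ray:latest",  # hypothetical image reference
        appwrapper=True,
        write_to_file=False,
    )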
codeflare_sdk.ray.cluster.generate_yaml module
This sub-module exists primarily to be used internally by the Cluster object (in the cluster sub-module) for AppWrapper generation.
- codeflare_sdk.ray.cluster.generate_yaml.del_from_list_by_name(l: list, target: List[str]) list [source]
- codeflare_sdk.ray.cluster.generate_yaml.head_worker_gpu_count_from_cluster(cluster: Cluster) Tuple[int, int] [source]
- codeflare_sdk.ray.cluster.generate_yaml.head_worker_resources_from_cluster(cluster: Cluster) Tuple[dict, dict] [source]
- codeflare_sdk.ray.cluster.generate_yaml.update_image_pull_secrets(spec, image_pull_secrets)[source]
- codeflare_sdk.ray.cluster.generate_yaml.update_nodes(ray_cluster_dict: dict, cluster: Cluster)[source]
- codeflare_sdk.ray.cluster.generate_yaml.update_resources(spec, cpu_requests, cpu_limits, memory_requests, memory_limits, custom_resources)[source]
codeflare_sdk.ray.cluster.pretty_print module
This sub-module exists primarily to be used internally by the Cluster object (in the cluster sub-module) for pretty-printing cluster status and details.
- codeflare_sdk.ray.cluster.pretty_print.print_app_wrappers_status(app_wrappers: List[AppWrapper], starting: bool = False)[source]
- codeflare_sdk.ray.cluster.pretty_print.print_cluster_status(cluster: RayCluster)[source]
Pretty-prints the status of a passed-in cluster.
- codeflare_sdk.ray.cluster.pretty_print.print_clusters(clusters: List[RayCluster])[source]
- codeflare_sdk.ray.cluster.pretty_print.print_ray_clusters_status(app_wrappers: List[AppWrapper], starting: bool = False)[source]
codeflare_sdk.ray.cluster.status module
The status sub-module defines Enums containing information for Ray cluster states and CodeFlare cluster states, as well as dataclasses to store information for Ray clusters.
- class codeflare_sdk.ray.cluster.status.CodeFlareClusterStatus(value)[source]
Bases:
Enum
Defines the possible reportable states of a CodeFlare cluster.
- FAILED = 5
- QUEUED = 3
- QUEUEING = 4
- READY = 1
- STARTING = 2
- SUSPENDED = 7
- UNKNOWN = 6
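A sketch of checking a cluster’s reported state against this enum, assuming an existing cluster reachable via get_cluster (placeholder names):

    from codeflare_sdk.ray.cluster.cluster import get_cluster
    from codeflare_sdk.ray.cluster.status import CodeFlareClusterStatus

    cluster = get_cluster(cluster_name="example-cluster", namespace="my-namespace")

    # status() returns the enum value plus a boolean readiness flag.
    status, ready = cluster.status(print_to_console=False)
    if status == CodeFlareClusterStatus.READY and ready:
        print("Cluster is ready for use")
    elif status in (CodeFlareClusterStatus.QUEUED, CodeFlareClusterStatus.QUEUEING):
        print("Cluster is still waiting in the queue")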
- class codeflare_sdk.ray.cluster.status.RayCluster(name: str, status: ~codeflare_sdk.ray.cluster.status.RayClusterStatus, head_cpu_requests: int, head_cpu_limits: int, head_mem_requests: str, head_mem_limits: str, num_workers: int, worker_mem_requests: str, worker_mem_limits: str, worker_cpu_requests: int | str, worker_cpu_limits: int | str, namespace: str, dashboard: str, worker_extended_resources: ~typing.Dict[str, int] = <factory>, head_extended_resources: ~typing.Dict[str, int] = <factory>)[source]
Bases:
object
For storing information about a Ray cluster.
- dashboard: str
- head_cpu_limits: int
- head_cpu_requests: int
- head_extended_resources: Dict[str, int]
- head_mem_limits: str
- head_mem_requests: str
- name: str
- namespace: str
- num_workers: int
- status: RayClusterStatus
- worker_cpu_limits: int | str
- worker_cpu_requests: int | str
- worker_extended_resources: Dict[str, int]
- worker_mem_limits: str
- worker_mem_requests: str
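A sketch of reading these fields from the RayCluster returned by Cluster.details(), again with placeholder names:

    from codeflare_sdk.ray.cluster.cluster import get_cluster

    cluster = get_cluster(cluster_name="example-cluster", namespace="my-namespace")
    info = cluster.details(print_to_console=False)

    # All attributes below are documented fields of the RayCluster dataclass.
    print(f"{info.name} ({info.namespace}): {info.status}")
    print(f"workers: {info.num_workers}, dashboard: {info.dashboard}")
    print(f"worker CPU requests/limits: {info.worker_cpu_requests}/{info.worker_cpu_limits}")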