A sample Apache Airflow provider package built by Astronomer.
Apache Airflow Provider for SkyPilot
A provider that lets you use multiple clouds from Apache Airflow through SkyPilot.
Installation
The SkyPilot provider for Apache Airflow was developed and tested in an environment with the following dependencies installed:
- Apache Airflow >= 2.6.0
- SkyPilot >= 0.4.1
Installation can start from the Docker-based Airflow environment described in "Running Airflow in Docker".
Based on that Docker configuration, add a pip install command to the Dockerfile and build your own Docker image:
RUN pip install --user airflow-provider-skypilot
Then, make sure that SkyPilot is properly installed and initialized in the same environment. Initialization includes cloud account setup and access verification. Please refer to SkyPilot Installation for more information.
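As a quick sanity check after building the image, the installed versions can be compared against the minimums listed above. The following is only a minimal sketch: the distribution names `apache-airflow` and `skypilot` are the PyPI names, the helper names are hypothetical, and the version parsing is deliberately simple (it looks at the first three numeric components only). It does not verify cloud account setup.

```python
from importlib import metadata

# Minimum versions taken from the dependency list in this README.
MINIMUMS = {"apache-airflow": "2.6.0", "skypilot": "0.4.1"}


def version_tuple(version: str) -> tuple:
    """Turn '2.6.0' into (2, 6, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment() -> dict:
    """Map each required distribution to True/False, or None if not installed."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "not installed"}
    for name, status in check_environment().items():
        print(name, labels[status])
```

Running the script inside the Airflow container prints one line per dependency.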
Configuration
A SkyPilot provider process runs on an Airflow worker, but it stores its metadata on the Airflow master node. Sharing the metadata this way allows a set of consecutive Sky tasks to run across multiple workers.
The following settings in docker-compose.yaml define the data sharing, including cloud credentials, metadata, and workspace:
x-airflow-common:
  environment:
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    # mount cloud credentials
    - ${HOME}/.aws:/opt/airflow/sky_home_dir/.aws
    - ${HOME}/.azure:/opt/airflow/sky_home_dir/.azure
    - ${HOME}/.config/gcloud:/opt/airflow/sky_home_dir/.config/gcloud
    - ${HOME}/.scp:/opt/airflow/sky_home_dir/.scp
    # mount sky metadata
    - ${HOME}/.sky:/opt/airflow/sky_home_dir/.sky
    - ${HOME}/.ssh:/opt/airflow/sky_home_dir/.ssh
    # mount sky working dir
    - ${HOME}/sky_workdir:/opt/airflow/sky_home_dir/sky_workdir
This example mounts the cloud credentials for AWS, Azure, GCP, and SCP, which are created by the SkyPilot cloud account setup.
For SkyPilot metadata, check that .sky/ and .ssh/ exist in your ${HOME} directory, then mount them.
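Before starting the containers, it can help to confirm that the host directories named in the volume list actually exist. The sketch below is an illustration only: the split into required and optional directories is an assumption (the metadata directories are always needed, while credential directories depend on which clouds you use), and `missing_dirs` is a hypothetical helper, not part of the provider.

```python
from pathlib import Path

# Host-side directories, relative to ${HOME}, as listed in the volume mounts.
# Assumption: metadata dirs are required, the rest depend on your setup.
REQUIRED = [".sky", ".ssh"]
OPTIONAL = [".aws", ".azure", ".config/gcloud", ".scp", "sky_workdir"]


def missing_dirs(home: Path, names: list) -> list:
    """Return the entries of `names` that are not directories under `home`."""
    return [name for name in names if not (home / name).is_dir()]


if __name__ == "__main__":
    home = Path.home()
    for name in missing_dirs(home, REQUIRED):
        print(f"missing SkyPilot metadata directory: {home / name}")
    for name in missing_dirs(home, OPTIONAL):
        print(f"not found (fine if unused): {home / name}")
```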
Additionally, you can mount your own directory, such as sky_workdir/, for user resources including user code and YAML task definition files for SkyPilot execution.
Note that all Sky directories are mounted under sky_home_dir/. They are symbolically linked into ${HOME}/ on the workers where a SkyPilot provider process actually runs.
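To make the linking step concrete, here is a minimal sketch of how entries under sky_home_dir/ could be symlinked into a home directory. This is not the provider's actual implementation: the function name `link_sky_home` and the choice to skip names that already exist in the home directory are assumptions made for illustration.

```python
import os
from pathlib import Path


def link_sky_home(sky_home_dir: Path, home: Path) -> list:
    """Symlink each top-level entry of sky_home_dir into home.

    Entries whose name already exists in home are left untouched.
    Returns the names that were linked.
    """
    linked = []
    for entry in sorted(sky_home_dir.iterdir()):
        target = home / entry.name
        if target.exists() or target.is_symlink():
            continue  # never overwrite something already in the home directory
        os.symlink(entry, target, target_is_directory=entry.is_dir())
        linked.append(entry.name)
    return linked
```

For example, `link_sky_home(Path("/opt/airflow/sky_home_dir"), Path.home())` would expose .sky/, .ssh/, and the credential directories at the paths SkyPilot expects, provided nothing with those names is already present.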
Usage
The SkyPilot provider includes the following operators:
- SkyLaunchOperator
- SkyExecOperator
- SkyDownOperator
- SkySSHOperator
- SkyRsyncUpOperator
- SkyRsyncDownOperator
SkyLaunchOperator creates a cloud cluster and executes a Sky task, as shown below:
sky_launch_task = SkyLaunchOperator(
    task_id="sky_launch_task",
    sky_task_yaml="~/sky_workdir/my_task.yaml",
    cloud="cheapest",  # aws|azure|gcp|scp|ibm ...
    gpus="A100:1",
    minimum_cpus=16,
    minimum_memory=32,
    auto_down=False,
    sky_home_dir='/opt/airflow/sky_home_dir',  # set by default
    dag=dag
)
Once SkyLaunchOperator creates a Sky cluster with auto_down=False, the created cluster can be used by the other Sky operators.
Please refer to the example DAG for multiple Sky operators running on a single Sky cluster.