Creating Custom Tasks
Build your own Docker-based task images that integrate with Fabric pipelines.
Overview
A Fabric task is a Docker image that follows specific conventions. This guide explains how to create custom task images that integrate seamlessly with Fabric pipelines.
Requirements
To create a compatible Fabric task image, you need:
- A CLI command — The image must contain a command-line interface that accepts parameters as flags
- A tasks.yaml file — Describes available tasks, their inputs, and how to call them
- Workspace convention — Read from and write to /app/fabric_workspace
The tasks.yaml File
The tasks.yaml manifest tells Fabric what tasks are available in your image. This file must be copied to /tasks.yaml in your Docker image.
Example tasks.yaml
version: "1.0"
tasks:
- name: "my_custom_task"
image: "my-docker-registry/my-task-image:latest"
image_args:
- "run-cli-command"
inputs:
- name: "input-file"
type: "string"
required: true
- name: "threshold"
type: "float"
value: 0.5
- name: "verbose"
type: "bool"
value: false
Field Reference
| Field | Description |
|---|---|
| name | Unique identifier for the task within Fabric |
| image | Full Docker image path |
| image_args | Base command prepended to input flags |
| inputs | List of parameters the task accepts |
Input Fields
| Field | Description |
|---|---|
| name | Flag name (passed as --name=value) |
| type | Data type: string, int, float, bool |
| required | Whether the input must be provided |
| value | Default value if not provided |
The CLI Command
When Fabric executes a task, it runs:
docker run --rm \
  -v $(pwd):/app/fabric_workspace \
  your-image [image_args] --input-name=input-value
For example, if your image_args is ["python", "main.py"] and you have an input named threshold:
python main.py --threshold=0.5
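Any CLI framework that accepts --name=value flags will work. As a rough sketch, an argparse entry point for these inputs might look like the following; note that argparse exposes --input-file as args.input_file:

# Sketch only: a minimal argparse CLI matching the flags Fabric constructs.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input-file", required=True)            # --input-file=... -> args.input_file
parser.add_argument("--threshold", type=float, default=0.5)   # --threshold=0.5 -> args.threshold
args = parser.parse_args()

print(f"input={args.input_file} threshold={args.threshold}")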
Path Conventions
Always assume data is in /app/fabric_workspace. If a user provides data/input.txt, look for it at:
/app/fabric_workspace/data/input.txt
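In practice this means joining the user-supplied relative path onto the workspace root rather than treating it as absolute. A minimal sketch (the resolve helper is just illustrative):

from pathlib import Path

WORKSPACE = Path("/app/fabric_workspace")

def resolve(user_path: str) -> Path:
    # "data/input.txt" becomes /app/fabric_workspace/data/input.txt
    return WORKSPACE / user_path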
Boolean Flags
For bool types:
- true: Fabric passes the flag without a value (--verbose)
- false: Fabric omits the flag entirely
Ensure your CLI parser handles boolean flags correctly.
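With argparse, for example, a store_true flag matches this behaviour: the flag is valid with no value, and it simply defaults to False when omitted (sketch only; adapt to your parser):

import argparse

parser = argparse.ArgumentParser()
# --verbose present -> True; omitted -> False (mirrors how Fabric passes bools)
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args()

if args.verbose:
    print("Verbose logging enabled")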
The Dockerfile
The Dockerfile must install your CLI and place tasks.yaml at /tasks.yaml.
Example Dockerfile
# Use a base image suitable for your application
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Copy your application code
COPY . .
# Install dependencies
RUN pip install -r requirements.txt
# --- CRITICAL STEP ---
# Copy tasks.yaml to the root of the image
COPY tasks.yaml /tasks.yaml
# Set the entrypoint (optional, depending on image_args)
ENTRYPOINT ["python", "app/main.py"]
Complete Example
Let's create a task that calculates statistics on a CSV file.
Directory Structure
my-stats-task/
├── Dockerfile
├── tasks.yaml
├── requirements.txt
└── stats.py
stats.py
#!/usr/bin/env python3
import argparse
import json
from pathlib import Path

import pandas as pd


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-file", required=True)
    parser.add_argument("--output-file", default="stats.json")
    parser.add_argument("--columns", default="all")
    args = parser.parse_args()

    # Read from workspace
    workspace = Path("/app/fabric_workspace")
    df = pd.read_csv(workspace / args.input_file)

    # Calculate stats
    if args.columns != "all":
        cols = args.columns.split(",")
        df = df[cols]
    stats = df.describe().to_dict()

    # Write to workspace, creating the output directory if needed
    output_path = workspace / args.output_file
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(stats, f, indent=2)

    print(f"Statistics written to {args.output_file}")


if __name__ == "__main__":
    main()
tasks.yaml
version: "1.0"
tasks:
- name: "csv_stats"
image: "my-registry/csv-stats:latest"
image_args:
- "python"
- "stats.py"
inputs:
- name: "input-file"
type: "string"
required: true
- name: "output-file"
type: "string"
value: "stats.json"
- name: "columns"
type: "string"
value: "all"
Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY stats.py .
COPY tasks.yaml /tasks.yaml
ENTRYPOINT ["python", "stats.py"]
requirements.txt
pandas>=2.0.0
Build and Push
docker build -t my-registry/csv-stats:latest .
docker push my-registry/csv-stats:latest
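Before registering the image, it can be worth a quick local smoke test that mimics how Fabric invokes the task; the sample CSV path below is just a placeholder:

# Assumes ./data/sample.csv exists in the current directory
docker run --rm \
  -v $(pwd):/app/fabric_workspace \
  my-registry/csv-stats:latest python stats.py --input-file=data/sample.csv

If the run succeeds, stats.json (the default output-file) should appear in the current directory, since that directory is mounted as the workspace.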
Best Practices
1. Statelessness
Tasks should be stateless. Read from /app/fabric_workspace and write results back to it. Don't rely on data persisting between runs.
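For example, derive every path from the workspace rather than caching to a fixed location inside the container (illustrative sketch):

from pathlib import Path

WORKSPACE = Path("/app/fabric_workspace")

def run(input_file: str, output_file: str) -> None:
    # All state lives in the mounted workspace; nothing persists inside the container
    text = (WORKSPACE / input_file).read_text()
    (WORKSPACE / output_file).write_text(text.upper())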
2. Logging
Write logs to stdout or stderr. Fabric captures these and displays them to the user.
print("Processing file...") # Goes to stdout
import sys
print("Warning: missing column", file=sys.stderr) # Goes to stderr
3. Exit Codes
Exit with non-zero status if the task fails. Fabric uses exit codes to determine success.
import sys

if error_occurred:
    print("Error: something went wrong", file=sys.stderr)
    sys.exit(1)  # Non-zero = failure

sys.exit(0)  # Zero = success
4. Input Validation
Validate inputs early and provide clear error messages:
if not (workspace / args.input_file).exists():
    print(f"Error: Input file not found: {args.input_file}", file=sys.stderr)
    sys.exit(1)
5. Progress Updates
For long-running tasks, print progress updates:
for i, chunk in enumerate(chunks):
    process(chunk)
    print(f"Processed {i+1}/{len(chunks)} chunks")
Registering Your Task
Once your image is built and pushed:
1. Add the Image
fabric workspace add-image my-registry/csv-stats:latest
2. Pull Task Definitions
fabric workspace pull
This downloads /tasks.yaml from your image and registers the tasks locally.
3. Verify Registration
fabric list-tasks
Your task should appear in the list.
4. Use in Pipelines
tasks:
  analyze-data:
    image: my-registry/csv-stats:latest
    inputs:
      input-file: "data/measurements.csv"
      output-file: "results/stats.json"
      columns: "temperature,humidity,pressure"
Next Steps
- Writing Pipelines — Use your tasks in workflows
- Forest Health Indices — See a production task example
- Contributing — Share your tasks with the community