Creating Custom Tasks

Build your own Docker-based task images that integrate with Fabric pipelines.

Level: Advanced. Estimated time: 20 min.

Overview

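A Fabric task is a Docker image that follows a small set of conventions: it exposes a command-line interface, ships a manifest describing its tasks, and exchanges data through a shared workspace directory. This guide walks through each convention and then builds a complete example image.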

Requirements

To create a compatible Fabric task image, you need:

  1. A CLI command: the image must contain a command-line interface that accepts parameters as flags.
  2. A tasks.yaml file: a manifest that describes the available tasks, their inputs, and how to call them.
  3. A workspace convention: the task reads from and writes to /app/fabric_workspace.

The tasks.yaml File

The tasks.yaml manifest tells Fabric what tasks are available in your image. This file must be copied to /tasks.yaml in your Docker image.

Example tasks.yaml

version: "1.0"
tasks:
  - name: "my_custom_task"
    image: "my-docker-registry/my-task-image:latest"
    image_args:
      - "run-cli-command"
    inputs:
      - name: "input-file"
        type: "string"
        required: true
      - name: "threshold"
        type: "float"
        value: 0.5
      - name: "verbose"
        type: "bool"
        value: false

Field Reference

Field        Description
name         Unique identifier for the task within Fabric
image        Full Docker image path
image_args   Base command prepended to input flags
inputs       List of parameters the task accepts

Input Fields

Field      Description
name       Flag name (passed as --name=value)
type       Data type: string, int, float, bool
required   Whether the input must be provided
value      Default value if not provided
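
The field reference above can be turned into a quick sanity check to run before building an image. The sketch below is only an illustration: validate_manifest is a hypothetical helper (not part of Fabric), it takes the manifest as an already-parsed dictionary, and the checks are limited to what the tables state (task names are unique, the listed task fields are present, and input types come from the documented set).

```python
ALLOWED_TYPES = {"string", "int", "float", "bool"}
TASK_FIELDS = ("name", "image", "image_args")


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_names = set()
    for index, task in enumerate(manifest.get("tasks", [])):
        label = task.get("name", f"task #{index}")
        # Every task needs the fields from the Field Reference table.
        for field in TASK_FIELDS:
            if field not in task:
                problems.append(f"{label}: missing field '{field}'")
        # Task names must be unique.
        if "name" in task:
            if task["name"] in seen_names:
                problems.append(f"{label}: duplicate task name")
            seen_names.add(task["name"])
        # Input types must be one of the documented data types.
        for spec in task.get("inputs", []):
            if spec.get("type") not in ALLOWED_TYPES:
                problems.append(
                    f"{label}.{spec.get('name', '?')}: unknown type {spec.get('type')!r}"
                )
    return problems
```

Fabric may apply stricter or different rules when it loads a manifest; treat this as a local pre-flight check only.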

The CLI Command

When Fabric executes a task, it runs:

docker run --rm \
  -v "$(pwd)":/app/fabric_workspace \
  your-image [image_args] --input-name=input-value

For example, if image_args is ["python", "main.py"] and the task has an input named threshold with the value 0.5, the command run inside the container is:

python main.py --threshold=0.5
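
To make the composition rule concrete, here is a minimal sketch of how the arguments after the image name could be assembled from image_args and the declared inputs. build_command is a hypothetical helper, not Fabric's implementation. It assumes that defaults from tasks.yaml are passed explicitly as flags, and it applies the rule from the Boolean Flags section of this guide to bool inputs.

```python
def build_command(image_args, input_specs, provided):
    """Compose the arguments that follow the image name in `docker run`."""
    argv = list(image_args)
    for spec in input_specs:
        name = spec["name"]
        if name in provided:
            value = provided[name]
        elif "value" in spec:
            value = spec["value"]  # fall back to the manifest default
        elif spec.get("required"):
            raise ValueError(f"missing required input: {name}")
        else:
            continue  # optional input with no default: emit nothing
        if spec["type"] == "bool":
            # true -> bare flag, false -> flag omitted
            if value:
                argv.append(f"--{name}")
        else:
            argv.append(f"--{name}={value}")
    return argv


specs = [
    {"name": "input-file", "type": "string", "required": True},
    {"name": "threshold", "type": "float", "value": 0.5},
    {"name": "verbose", "type": "bool", "value": False},
]
print(build_command(["python", "main.py"], specs, {"input-file": "data/input.txt"}))
# prints ['python', 'main.py', '--input-file=data/input.txt', '--threshold=0.5']
```

Thinking about the command this way helps when debugging: whatever list this produces is what your CLI parser has to accept.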

Path Conventions

Fabric mounts the user's working directory at /app/fabric_workspace, so treat every user-supplied path as relative to that directory. If a user provides data/input.txt, look for it at:

/app/fabric_workspace/data/input.txt
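
A small helper keeps this mapping in one place. The sketch below is an assumption-laden illustration: resolve_in_workspace is a hypothetical name, and rejecting paths that point outside the workspace is an extra safety check added here, not documented Fabric behavior.

```python
import posixpath
from pathlib import PurePosixPath

WORKSPACE = PurePosixPath("/app/fabric_workspace")


def resolve_in_workspace(user_path: str) -> PurePosixPath:
    """Map a user-supplied relative path onto the mounted workspace."""
    # normpath collapses "." and ".." lexically, without touching the filesystem.
    candidate = PurePosixPath(posixpath.normpath(str(WORKSPACE / user_path)))
    # Reject absolute paths and "../" tricks that land outside the workspace.
    if candidate != WORKSPACE and WORKSPACE not in candidate.parents:
        raise ValueError(f"path escapes the workspace: {user_path}")
    return candidate


print(resolve_in_workspace("data/input.txt"))
# prints /app/fabric_workspace/data/input.txt
```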

Boolean Flags

For bool types:

  • true: Fabric passes the flag without a value (--verbose)
  • false: Fabric omits the flag entirely

Your CLI parser must therefore treat boolean inputs as value-less switches. A parser that expects --verbose=true will not match what Fabric sends.
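
With Python's argparse, the usual way to get this behavior is action="store_true". The sketch below mirrors the inputs from the example tasks.yaml (input-file, threshold, verbose); the function name build_parser is illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-cli-command")
    parser.add_argument("--input-file", required=True)
    parser.add_argument("--threshold", type=float, default=0.5)
    # store_true: flag present -> True, flag absent -> False, no value expected.
    parser.add_argument("--verbose", action="store_true")
    return parser


args = build_parser().parse_args(["--input-file=data/input.txt", "--verbose"])
print(args.input_file, args.threshold, args.verbose)
# prints data/input.txt 0.5 True
```

Avoid type=bool for these inputs: argparse would then require a value and call bool() on the raw string, so any non-empty text, including "false", parses as True.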

The Dockerfile

The Dockerfile must install your CLI and place tasks.yaml at /tasks.yaml.

Example Dockerfile

# Use a base image suitable for your application
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Copy your application code
COPY . .

# Install dependencies
RUN pip install -r requirements.txt

# --- CRITICAL STEP ---
# Copy tasks.yaml to the root of the image
COPY tasks.yaml /tasks.yaml

# Optional: set an entrypoint. Docker appends image_args and the input
# flags to it, so keep the two consistent. The path is relative to WORKDIR.
ENTRYPOINT ["python", "main.py"]

Complete Example

Let's create a task that calculates statistics on a CSV file.

Directory Structure

my-stats-task/
├── Dockerfile
├── tasks.yaml
├── requirements.txt
└── stats.py

stats.py

#!/usr/bin/env python3
import argparse
import pandas as pd
import json
from pathlib import Path

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-file", required=True)
    parser.add_argument("--output-file", default="stats.json")
    parser.add_argument("--columns", default="all")
    args = parser.parse_args()

    # Read from workspace
    workspace = Path("/app/fabric_workspace")
    df = pd.read_csv(workspace / args.input_file)

    # Calculate stats
    if args.columns != "all":
        cols = args.columns.split(",")
        df = df[cols]

    stats = df.describe().to_dict()

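    # Write to workspace, creating the output directory if it does not exist
    output_path = workspace / args.output_file
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(stats, f, indent=2)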

    print(f"Statistics written to {args.output_file}")

if __name__ == "__main__":
    main()

tasks.yaml

version: "1.0"
tasks:
  - name: "csv_stats"
    image: "my-registry/csv-stats:latest"
    image_args:
      - "python"
      - "stats.py"
    inputs:
      - name: "input-file"
        type: "string"
        required: true
      - name: "output-file"
        type: "string"
        value: "stats.json"
      - name: "columns"
        type: "string"
        value: "all"

Dockerfile

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY stats.py .
COPY tasks.yaml /tasks.yaml

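# No ENTRYPOINT here: Fabric supplies the command through image_args
# ("python stats.py"), and Docker would append those arguments to an
# ENTRYPOINT, producing "python stats.py python stats.py ...".
CMD ["python", "stats.py"]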

requirements.txt

pandas>=2.0.0

Build and Push

docker build -t my-registry/csv-stats:latest .
docker push my-registry/csv-stats:latest
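
Since Fabric reads the manifest from /tasks.yaml inside the image, it can be worth confirming that the file landed in the right place. The sketch below is one way to do that using the standard docker run --entrypoint flag. The helper names are hypothetical, and the approach assumes the image contains a cat binary (true for python:3.11-slim).

```python
import subprocess


def manifest_check_command(image: str) -> list[str]:
    """Build a `docker run` invocation that prints /tasks.yaml from the image."""
    # --entrypoint replaces any ENTRYPOINT baked into the image, so `cat`
    # runs instead of the task's own CLI.
    return ["docker", "run", "--rm", "--entrypoint", "cat", image, "/tasks.yaml"]


def read_manifest(image: str) -> str:
    """Run the check; raises CalledProcessError if the file is missing."""
    result = subprocess.run(
        manifest_check_command(image), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling read_manifest("my-registry/csv-stats:latest") should return the manifest text if the COPY step worked.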

Best Practices

1. Statelessness

Tasks should be stateless. Read from /app/fabric_workspace and write results back to it. Don't rely on data persisting between runs.

2. Logging

Write logs to stdout or stderr. Fabric captures these and displays them to the user.

import sys

print("Processing file...")  # Goes to stdout
print("Warning: missing column", file=sys.stderr)  # Goes to stderr

3. Exit Codes

Exit with a non-zero status if the task fails. Fabric uses the exit code to determine whether the task succeeded.

import sys

if error_occurred:
    print("Error: something went wrong", file=sys.stderr)
    sys.exit(1)  # Non-zero = failure

sys.exit(0)  # Zero = success
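
One way to apply this consistently is to funnel the whole task through a single wrapper that converts exceptions into an exit code. This is a sketch of that pattern, not something Fabric requires; run and example_task are illustrative names.

```python
import sys


def run(task) -> int:
    """Call task() and translate the outcome into a process exit code."""
    try:
        task()
    except Exception as exc:
        # Report the failure on stderr, then signal it through the exit code.
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0


def example_task():
    # Hypothetical task body; replace with real work.
    print("Processing file...")


# In a real entry point, finish with: sys.exit(run(example_task))
```

Keeping sys.exit at the outermost level makes the task body easy to test, because the wrapper returns the code instead of terminating the process.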

4. Input Validation

Validate inputs early and provide clear error messages:

if not (workspace / args.input_file).is_file():
    print(f"Error: Input file not found: {args.input_file}", file=sys.stderr)
    sys.exit(1)
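
When a task has several inputs, it can be friendlier to collect every problem and report them together instead of stopping at the first one. The sketch below assumes a hypothetical validate_inputs helper and an illustrative rule that threshold must lie between 0 and 1; adjust the checks to your task.

```python
from pathlib import Path


def validate_inputs(workspace: Path, input_file: str, threshold: float) -> list[str]:
    """Return all validation errors; an empty list means the inputs look usable."""
    errors = []
    if not (workspace / input_file).is_file():
        errors.append(f"Input file not found: {input_file}")
    # Illustrative range check; the valid range depends on your task.
    if not 0.0 <= threshold <= 1.0:
        errors.append(f"threshold must be between 0 and 1, got {threshold}")
    return errors
```

The caller can print each message to stderr and exit with a non-zero status if the list is not empty.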

5. Progress Updates

For long-running tasks, print progress updates:

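for i, chunk in enumerate(chunks):
    process(chunk)
    # flush=True: Python buffers stdout when it is not a terminal,
    # so without it progress lines may only appear at the end.
    print(f"Processed {i+1}/{len(chunks)} chunks", flush=True)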

Registering Your Task

Once your image is built and pushed:

1. Add the Image

fabric workspace add-image my-registry/csv-stats:latest

2. Pull Task Definitions

fabric workspace pull

This downloads /tasks.yaml from your image and registers the tasks locally.

3. Verify Registration

fabric list-tasks

Your task should appear in the list.

4. Use in Pipelines

tasks:
  analyze-data:
    image: my-registry/csv-stats:latest
    inputs:
      input-file: "data/measurements.csv"
      output-file: "results/stats.json"
      columns: "temperature,humidity,pressure"
