Intermediate

BeeGFS for AI Workloads

Set up BeeGFS as a cost-effective parallel file system for AI training with buddy mirroring, flexible striping, and GPU-direct storage integration.

Why BeeGFS?

BeeGFS (originally FhGFS) is a parallel file system developed at the Fraunhofer Center for High Performance Computing and widely used for HPC and AI workloads. Its appeal lies in simple deployment, good small-file performance, and the ability to co-locate storage and compute on the same nodes.

Key advantage: BeeGFS can run storage services on the GPU compute nodes themselves, using their local NVMe drives. This avoids the cost of dedicated storage servers and keeps data close to the GPUs, although chunks striped to other nodes are still read over the cluster network. The approach is particularly effective for small to medium GPU clusters where dedicated storage infrastructure is impractical.

BeeGFS Components

  1. Management Service (beegfs-mgmtd)

    Registry service that tracks all other BeeGFS services. Lightweight; only accessed during mount and service registration.

  2. Metadata Service (beegfs-meta)

    Handles directory and file metadata. Distributes metadata across multiple servers for parallel namespace operations.

  3. Storage Service (beegfs-storage)

    Stores file data chunks. Each server can manage multiple storage targets. Files are striped across targets for throughput.

  4. Client (beegfs-client)

    Kernel module that mounts the BeeGFS file system. Communicates directly with metadata and storage servers.

Configuration for AI Training

Bash - BeeGFS Setup
# Configure striping for training data
beegfs-ctl --setpattern --chunksize=2M --numtargets=8 /mnt/beegfs/training/

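# Enable buddy mirroring for checkpoint data
# (mirror groups must exist first, e.g. beegfs-ctl --addmirrorgroup --automatic --nodetype=storage;
#  BeeGFS 7.x spells the option --pattern=buddymirror, older releases used --buddymirror)
beegfs-ctl --setpattern --pattern=buddymirror --numtargets=4 /mnt/beegfs/checkpoints/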

# Check file distribution across storage targets
beegfs-ctl --getentryinfo /mnt/beegfs/training/dataset.bin

# Monitor storage target state and space utilization
beegfs-ctl --listtargets --nodetype=storage --state --spaceinfo
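
To make the striping settings above concrete, the following sketch computes which stripe target a given byte offset would land on. It illustrates the arithmetic only, assuming chunks are assigned to a file's targets in plain round-robin (RAID0-style) order. The function names stripe_target_index and full_stripe_bytes are hypothetical and not part of BeeGFS; the actual target IDs for a file come from beegfs-ctl --getentryinfo.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic only: assumes chunks are laid out round-robin
# across the file's stripe targets (RAID0-style). Not a BeeGFS tool.

# stripe_target_index OFFSET_BYTES CHUNK_BYTES NUM_TARGETS
# Prints the 0-based position (within the file's target list) of the
# target that holds the chunk containing OFFSET_BYTES.
stripe_target_index() {
  local offset="$1" chunk="$2" targets="$3"
  echo $(( (offset / chunk) % targets ))
}

# full_stripe_bytes CHUNK_BYTES NUM_TARGETS
# Prints how many bytes one full pass over all targets covers.
full_stripe_bytes() {
  local chunk="$1" targets="$2"
  echo $(( chunk * targets ))
}

chunk=$(( 2 * 1024 * 1024 ))   # 2 MiB, matching --chunksize=2M
targets=8                      # matching --numtargets=8

echo "full stripe: $(full_stripe_bytes "$chunk" "$targets") bytes"
echo "offset 0 -> target index $(stripe_target_index 0 "$chunk" "$targets")"
echo "offset 5 MiB -> target index $(stripe_target_index $(( 5 * 1024 * 1024 )) "$chunk" "$targets")"
echo "offset 16 MiB -> target index $(stripe_target_index $(( 16 * 1024 * 1024 )) "$chunk" "$targets")"
```

With these settings one full stripe covers 16 MiB, so a sequential read has to span at least that much before all eight targets contribute. Files much smaller than one chunk live on a single target regardless of --numtargets.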

BeeOND: BeeGFS On Demand

BeeOND creates ephemeral BeeGFS file systems from local drives on allocated compute nodes. This is ideal for AI training jobs that need fast scratch space during a training run but do not need persistent storage.
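
Because a BeeOND instance pools the local drives of the allocated nodes, it helps to check before the job starts that the data will fit. The sketch below is a rough estimate only. The function name beeond_fits, the 10% headroom default, and the example sizes are assumptions for illustration, and the calculation ignores metadata overhead and any mirroring.

```bash
#!/usr/bin/env bash
# Rough capacity estimate for an ephemeral BeeOND instance.
# All names and numbers here are illustrative assumptions.

# beeond_fits NODES PER_NODE_GIB NEEDED_GIB [HEADROOM_PERCENT]
# Returns 0 (success) if NEEDED_GIB fits into the pooled capacity after
# reserving HEADROOM_PERCENT (default 10) as free space; 1 otherwise.
beeond_fits() {
  local nodes="$1" per_node_gib="$2" needed_gib="$3" headroom="${4:-10}"
  local total_gib=$(( nodes * per_node_gib ))
  local usable_gib=$(( total_gib * (100 - headroom) / 100 ))
  echo "pooled: ${total_gib} GiB, usable: ${usable_gib} GiB, needed: ${needed_gib} GiB"
  [ "$needed_gib" -le "$usable_gib" ]
}

# Example: 4 nodes with 3500 GiB of local NVMe each, 10000 GiB of data.
if beeond_fits 4 3500 10000; then
  echo "dataset fits: stage in to BeeOND"
else
  echo "dataset too large: read from persistent storage instead"
fi
```

Remember to include checkpoints and other outputs written during the run in the needed size, not just the input dataset.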

Bash - BeeOND for Training Jobs
# Create a BeeOND file system across allocated GPU nodes
beeond start -n /path/to/nodefile -d /local/nvme -c /mnt/beeond

# Stage training data from persistent storage
beeond-cp stagein -n /path/to/nodefile \
  -g /persistent/training-data/ -l /mnt/beeond/data/

# After training, stage out results
beeond-cp stageout -n /path/to/nodefile \
  -l /mnt/beeond/results/ -g /persistent/results/

# Tear down when job completes
beeond stop -n /path/to/nodefile -L -d
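
Best practice: Use BeeOND for training jobs whose data fits within the combined local NVMe capacity of the allocated nodes. Stage data in before training starts and stage results out after completion. This keeps training I/O off the shared persistent file system and on the job's own nodes and interconnect, which helps keep the GPUs fed. Because the BeeOND instance is ephemeral, anything not staged out before beeond stop runs is lost.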
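
The start, stage-in, train, stage-out, and stop sequence above can be wrapped in a job script so that the BeeOND instance is torn down even when training fails. The following is a minimal sketch under stated assumptions: NODEFILE, the paths, and TRAIN_CMD (including the train.py script and its flags) are placeholders, and the run helper and DRY_RUN switch are hypothetical conveniences rather than BeeOND features. The beeond and beeond-cp invocations are the same ones shown above. With DRY_RUN=1, the default here, the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch of a BeeOND job wrapper. Placeholders and helper names are
# illustrative assumptions; adapt them to your scheduler and site.

NODEFILE="${NODEFILE:-/path/to/nodefile}"
# Hypothetical training command; replace with your own.
TRAIN_CMD="${TRAIN_CMD:-python train.py --data /mnt/beeond/data --out /mnt/beeond/results}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# run CMD [ARGS...]: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Stop the ephemeral file system and delete its data and logs.
teardown() {
  run beeond stop -n "$NODEFILE" -L -d
}

beeond_job() {
  run beeond start -n "$NODEFILE" -d /local/nvme -c /mnt/beeond || return 1
  # From here on, always tear down, whether or not the steps succeed.
  local rc=0
  run beeond-cp stagein -n "$NODEFILE" \
    -g /persistent/training-data/ -l /mnt/beeond/data/ || rc=$?
  if [ "$rc" -eq 0 ]; then
    # Unquoted on purpose so TRAIN_CMD splits into command and arguments.
    run $TRAIN_CMD || rc=$?
  fi
  if [ "$rc" -eq 0 ]; then
    run beeond-cp stageout -n "$NODEFILE" \
      -l /mnt/beeond/results/ -g /persistent/results/ || rc=$?
  fi
  teardown
  return "$rc"
}

beeond_job
```

This version skips stage-out when training fails. If partial checkpoints are worth keeping, stage them out before the teardown step regardless of the exit status. It also does not handle signals from the scheduler; a trap on TERM that calls teardown would be a natural extension.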