Uploading files

A few ways to move files into a running GPU instance. Pick based on file size and your workflow.

Small files — paste into the browser terminal

For a requirements.txt, a small Python script, or a short config:

  1. Open the browser terminal.
  2. Type cat > my_file.txt <<'EOF' and press Enter.
  3. Paste your content.
  4. Press Enter, then type EOF on its own line and press Enter again to finish.

Works for anything under a few KB. Awkward for larger files.
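As an illustration of the steps above, here is a minimal sketch that writes a short requirements.txt and then checks it. The package names are placeholders, not a recommendation.

```shell
# Quoting the delimiter ('EOF') stops the shell from expanding $VARS or
# backticks inside the pasted text, so the file is written exactly as pasted.
cat > requirements.txt <<'EOF'
numpy>=1.26
torch
# $HOME stays literal because the delimiter is quoted
EOF

# Sanity check: the line count and contents should match what you pasted.
wc -l < requirements.txt
cat requirements.txt
```

If the pasted text itself contains a line that is exactly EOF, pick a different delimiter word.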

Medium files — wget / curl from a public URL

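# Download a dataset, model checkpoint, or file from a public bucket
wget https://example.com/my-dataset.tar.gz
# -L follows redirects, which many bucket and CDN URLs rely on
curl -LO https://example.com/my-file.zip

# From Hugging Face. Gated repos such as meta-llama/* need approved access
# and a one-time `huggingface-cli login` before downloading.
pip install huggingface_hub
huggingface-cli login
huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./llama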

This is the fastest route for large public files, since bandwidth in our data centers is generous.
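For multi-gigabyte downloads it can help to make the transfer resumable and to check the result against a published checksum. The sketch below assumes curl and sha256sum are present in your image. The function names fetch and verify_sha256 are made up for this example.

```shell
# Resumable download: "-C -" continues a partial file instead of restarting,
# --fail turns HTTP errors into a non-zero exit code, -L follows redirects.
fetch() {
  # $1 = URL, $2 = output path
  curl -L --fail -C - -o "$2" "$1"
}

# Compare a file's SHA-256 digest with the one published by the source.
verify_sha256() {
  # $1 = file, $2 = expected hex digest
  local actual
  actual=$(sha256sum "$1" | cut -d' ' -f1)
  [ "$actual" = "$2" ]
}

# Example usage (URL from above; the digest is whatever the source publishes):
#   fetch https://example.com/my-dataset.tar.gz my-dataset.tar.gz
#   verify_sha256 my-dataset.tar.gz "<published digest>" && echo "checksum OK"
```

Resuming only works when the server supports range requests. If it does not, curl reports an error and you need to restart the download.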

From Git

apt-get update && apt-get install -y git
git clone https://github.com/you/your-repo.git
cd your-repo

If the repo is private, use a personal access token or deploy key.
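One common way to use a personal access token is to put it in the HTTPS clone URL as the password. The sketch below assumes a GitHub-style host that accepts username plus token over HTTPS. The helper authed_url and the variables GIT_USER and GIT_TOKEN are hypothetical names for this example.

```shell
# Build an HTTPS clone URL that carries a username and token.
authed_url() {
  # $1 = https:// clone URL, $2 = username, $3 = personal access token
  printf 'https://%s:%s@%s\n' "$2" "$3" "${1#https://}"
}

# Example usage (set GIT_USER and GIT_TOKEN in the terminal first):
#   git clone "$(authed_url https://github.com/you/your-repo.git "$GIT_USER" "$GIT_TOKEN")"
#
# The token ends up in .git/config as part of the remote URL, so reset the
# remote afterwards if the clone lives on a shared drive:
#   git -C your-repo remote set-url origin https://github.com/you/your-repo.git
```

Use a token with the narrowest scope that still allows cloning, and revoke it when the work is done.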

Using a Jupyter image

If you launched with a Jupyter image (quay.io/jupyter/pytorch-notebook:cuda12-latest etc.), the Jupyter UI has a file upload button in the file browser panel. Drag-and-drop small files directly into the current working directory.

Persistent files — use storage

Files in the container's default filesystem are lost when the instance terminates. For anything you want to keep between sessions:

  • Cloud drive (block storage, attached to one instance at a time): good for notebooks, model checkpoints, and datasets that belong to one box. Mount at /workspace or /mnt/data. See Storage.
  • Shared filesystem (network storage, attachable to many instances at once): good for team datasets or model weights read by multiple instances. Mount at /shared.

Typical workflow with a cloud drive

  1. Create a my-notebooks cloud drive (say, 50 GB).
  2. Launch an instance, attach the drive at /workspace.
  3. Work in /workspace: git clone, save notebooks, download datasets.
  4. Terminate the instance when done. Drive persists.
  5. Next session: launch a new instance, attach the same drive, everything's still there.
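Before saving work, it is worth confirming that the drive really is attached at the path you expect. The sketch below assumes the drive shows up as an ordinary mount inside the container and that the mountpoint utility (part of util-linux) is installed. WORKDIR and is_mounted are names chosen for this example.

```shell
# True if the given path is a mount point, i.e. something is attached there.
is_mounted() {
  mountpoint -q "$1"
}

# WORKDIR is the attach path chosen at launch; /workspace matches the steps above.
WORKDIR=${WORKDIR:-/workspace}

if is_mounted "$WORKDIR"; then
  echo "$WORKDIR is a mounted drive; files saved here should persist."
  df -h "$WORKDIR"   # shows the drive's size and free space
else
  echo "WARNING: $WORKDIR is not a mount point; files here are lost on terminate." >&2
fi
```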

Typical workflow with a shared filesystem

  1. One team member creates a team-datasets shared filesystem, uploads the dataset once.
  2. Every team member launches their own instance with team-datasets attached at /shared.
  3. Everyone reads from /shared/datasets/* — no need to re-download per user.

Very large files (> 20 GB)

  • From a public source: wget / huggingface-cli as above — bandwidth is fine, the transfer happens server-side.
  • From your laptop: slower, limited by your home/office upload. Options:
    • Upload to S3 / GCS first, then aws s3 cp or gsutil cp inside the instance. Our bandwidth is faster than yours to/from cloud providers.
    • scp is not an option because we don't offer SSH. Use the Jupyter upload or a cloud bucket as an intermediary instead.
    • Ping the #ecolink-support Slack channel if you're moving TB-scale data regularly — we can arrange bulk-transfer tooling.
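The cloud-intermediary route can be sketched as two steps, one on your laptop and one inside the instance. This assumes the AWS CLI is installed and has credentials on both sides (for example via aws configure). The bucket name and the helpers run, stage_up and stage_down are hypothetical. Setting DRY_RUN prints each command instead of running it.

```shell
# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Step 1, on your laptop: push the file to a bucket you control.
stage_up() {
  # $1 = local file, $2 = s3://bucket/prefix/
  run aws s3 cp "$1" "$2"
}

# Step 2, inside the instance: pull it onto the attached drive.
stage_down() {
  # $1 = s3://bucket/key, $2 = destination directory
  run aws s3 cp "$1" "$2"
}

# Example usage:
#   stage_up   big-dataset.tar s3://my-bucket/incoming/
#   stage_down s3://my-bucket/incoming/big-dataset.tar /workspace/
```

The same two-step pattern applies to GCS with gsutil cp in place of aws s3 cp.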

Downloading files back out

Same principle:

  • Small files: run base64 my_file in the terminal, copy the output, then decode it locally with base64 -d.
  • Medium/large: push to a cloud bucket (aws s3 cp, gcloud storage cp, or huggingface-cli upload) — outbound bandwidth is also plentiful.
  • Jupyter file browser: right-click → download.
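The base64 route for small files can be sketched as a round trip. This assumes GNU coreutils base64 in the instance. The -w0 flag is a GNU option, and older macOS versions use base64 -D rather than -d for decoding. The helper names encode_file and decode_to are made up for this example.

```shell
# In the instance: print a file as a single line of base64 text.
# -w0 disables line wrapping so the output is easy to copy in one go.
encode_file() {
  # $1 = file to encode
  base64 -w0 "$1"
  echo   # trailing newline so the prompt starts on a fresh line
}

# On your own machine: read base64 text from stdin and write the decoded file.
decode_to() {
  # $1 = output path
  base64 -d > "$1"
}

# Example usage:
#   instance:  encode_file results.json      (copy the printed text)
#   laptop:    decode_to results.json        (paste, press Enter, then Ctrl-D)
#   both:      sha256sum results.json        (the digests should match)
```

Base64 output is about a third larger than the original file, so this stays practical only for small files.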