Uploading files
A few ways to move files into a running GPU instance. Pick based on file size and your workflow.
Small files — paste into the browser terminal
For a requirements.txt, a small Python script, or a short config:
- Open the browser terminal.
- Type cat > my_file.txt <<'EOF' (end with EOF on its own line to finish).
- Paste your content.
- Hit Enter, then type EOF.
Works for anything under a few KB. Awkward for larger files.
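As a rough illustration of the paste step, the sketch below writes two placeholder lines into my_file.txt and prints them back. The file contents are made up for the example. Quoting the delimiter ('EOF') is what keeps the shell from expanding things like $HOME inside the pasted text.

```shell
# The quoted delimiter ('EOF') writes the pasted text verbatim:
# no variable or backtick expansion happens inside the block.
cat > my_file.txt <<'EOF'
numpy
# stays literal, not expanded: $HOME
EOF

# Check that the paste arrived intact before relying on the file.
wc -l < my_file.txt
cat my_file.txt
```

If the delimiter were left unquoted (<<EOF), the shell would substitute $HOME before writing, which is rarely what you want when pasting scripts or configs.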
Medium files — wget / curl from a public URL
# Download a dataset, model checkpoint, or file from a public bucket
wget https://example.com/my-dataset.tar.gz
curl -O https://example.com/my-file.zip
# From Hugging Face (gated repos such as this one need an accepted license and `huggingface-cli login` first)
pip install huggingface_hub
huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./llama
Fastest for large public files: bandwidth in our data centers is generous.
From Git
apt-get update && apt-get install -y git
git clone https://github.com/you/your-repo.git
cd your-repo
If the repo is private, use a personal access token or deploy key.
Using a Jupyter image
If you launched with a Jupyter image (quay.io/jupyter/pytorch-notebook:cuda12-latest etc.), the Jupyter UI has a file upload button in the file browser panel. Drag-and-drop small files directly into the current working directory.
Persistent files — use storage
Files in the container's default filesystem are lost when the instance terminates. For anything you want to keep between sessions:
- Cloud drive (block storage, one instance at a time): good for notebooks, model checkpoints, and datasets attached to one box. Mount at /workspace or /mnt/data. See Storage.
- Shared filesystem (network storage, multi-instance): good for shared team datasets or model weights accessed by multiple instances. Mount at /shared.
Typical workflow with a cloud drive
- Create a my-notebooks cloud drive (say, 50 GB).
- Launch an instance and attach the drive at /workspace.
- Work in /workspace: git clone, save notebooks, download datasets.
- Terminate the instance when done. The drive persists.
- Next session: launch a new instance, attach the same drive, and everything is still there.
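Before saving work in /workspace, it can be worth confirming that the drive really is attached there, since otherwise files land on the container's default filesystem and are lost on terminate. The sketch below is one way to check. is_mount_point is a hypothetical helper, not a platform tool, and it assumes a POSIX df and a mount path without spaces.

```shell
# Hypothetical helper: succeeds only when DIR is itself a mount point.
# `df -P DIR` reports, in its last column, the mount point of the
# filesystem that holds DIR; if that equals DIR, something is mounted there.
is_mount_point() {
    dir=$1
    [ -d "$dir" ] || return 1
    mounted_on=$(df -P "$dir" | awk 'NR == 2 { print $NF }')
    [ "$mounted_on" = "$dir" ]
}

# /workspace is the mount path used in the workflow above.
if is_mount_point /workspace; then
    echo "drive attached: files in /workspace will persist"
else
    echo "no drive at /workspace: files saved there are lost on terminate"
fi
```

Pass the path exactly as it was mounted, without a trailing slash, because the comparison is a plain string match.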
Typical workflow with a shared filesystem
- One team member creates a team-datasets shared filesystem and uploads the dataset once.
- Every team member launches their own instance with team-datasets attached at /shared.
- Everyone reads from /shared/datasets/*, so nobody needs to re-download per user.
Very large files (> 20 GB)
- From a public source: wget / huggingface-cli as above. Bandwidth is fine because the transfer happens server-side.
- From your laptop: slower, limited by your home/office upload. Options:
  - Upload to S3 / GCS first, then aws s3 cp or gsutil cp inside the instance. Our bandwidth to/from cloud providers is faster than yours.
  - scp is not an option because we don't offer SSH. Use the Jupyter upload or a cloud bucket as an intermediary instead.
  - Ping the #ecolink-support Slack channel if you're moving TB-scale data regularly. We can arrange bulk-transfer tooling.
Downloading files back out
Same principle:
- Small files: cat my_file | base64 in the terminal, copy the base64, decode locally.
- Medium/large: push to a cloud bucket (aws s3 cp, gcloud storage cp, or huggingface-cli upload). Outbound bandwidth is also plentiful.
- Jupyter file browser: right-click → download.
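The base64 route for small files can be sketched as a round trip. Here both halves run in one place for illustration, and the my_file contents are a made-up stand-in. In practice the encode step runs in the browser terminal and the decode step on your own machine, with copy and paste in between. The decode flag assumes GNU coreutils base64; other implementations may spell it differently.

```shell
# Stand-in for a small result file on the instance.
printf 'epoch,loss\n1,0.52\n' > my_file

# On the instance: emit the file as base64 text.
# (In practice, copy this text from the terminal instead of redirecting it.)
base64 < my_file > my_file.b64

# On your machine: paste the text into my_file.b64, then decode it.
base64 -d < my_file.b64 > my_file.restored

# The round trip should be byte-for-byte identical.
cmp my_file my_file.restored && echo "round trip OK"
```

Base64 output is roughly a third larger than the input, which is one reason this only suits small files.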
Related
- Storage — cloud drives and shared filesystems
- Jupyter access
- Duration and extending — so the instance doesn't terminate while you're mid-upload