January 7, 2020

Dockerised Workspaces: an example with Python & Scrapy

Most of my daily workflow relies heavily on Docker; and although it’s a highly respected tool in the average engineer’s arsenal, I still think it’s underappreciated in some scenarios. One of those scenarios, for me, is producing small isolated workspaces.

A core component of my backup strategy involves Dockerised “workspaces”: simply directories containing a Dockerfile, a Makefile or shell script, and a few child directories which are largely used as volumes.
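
As a purely illustrative example - the names here are hypothetical - such a workspace might look like:

    my-workspace/
        Dockerfile       # defines the environment and its dependencies
        workspace.sh     # wrapper script around docker build / docker run
        data/            # mounted into the container as a volume
        output/          # likewise, mounted as a volume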

This has a few great benefits:

  1. I can be sure that by just backing up my home folder I’m not going to lose any system configuration or packages that specific projects rely upon.
  2. I can isolate different environments, avoid dependency conflicts, and not have to worry about some of the headaches caused by tools like virtualenv.
  3. It’s trivial to configure initially and subsequently reproducible without any effort.

My Scrapy Workspace

This workspace took around 15 minutes to put together, and even that was fairly leisurely. It’s simply composed of a minimal Dockerfile in combination with a trivial shell script that wraps around the container and provides a few convenience commands.
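
The Dockerfile is small enough that you can read it straight off the build output further down; it amounts to the five steps below (the comments are mine):

    FROM python:3-buster

    # Everything runs from /scrapers/ inside the container.
    WORKDIR /scrapers/

    # Bake the pinned dependencies (Scrapy and friends) into the image.
    COPY requirements.txt requirements.txt
    RUN pip install -r requirements.txt

    # Make the container behave like the scrapy binary itself.
    ENTRYPOINT [ "scrapy" ]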

Usage is as simple as cloning the repository: just check it out - badum tish - on Github.

➜  git clone [email protected]:FergusInLondon/Scrapy-Workspace.git scrapy 
Cloning into 'scrapy'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 9 (delta 0), reused 9 (delta 0), pack-reused 0
Receiving objects: 100% (9/9), done.
➜  cd scrapy 
➜  ./scrapy.sh new example
Sending build context to Docker daemon  62.98kB
Step 1/5 : FROM python:3-buster
 ---> 1f88553e8143
Step 2/5 : WORKDIR /scrapers/
 ---> Using cache
 ---> 4b101836265e
Step 3/5 : COPY requirements.txt requirements.txt
 ---> Using cache
 ---> 937304d45fc9
Step 4/5 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> a751f4e54fc0
Step 5/5 : ENTRYPOINT [ "scrapy" ]
 ---> Using cache
 ---> 35e9fb91969b
Successfully built 35e9fb91969b
Successfully tagged scraper:latest
New Scrapy project 'example', using template directory '/usr/local/lib/python3.8/site-packages/scrapy/templates/project', created in:
    /scrapers/example

You can start your first spider with:
    cd example
    scrapy genspider example example.com
➜
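
The wrapper script itself isn’t reproduced here, but a minimal sketch of something roughly equivalent - assuming it simply builds the image and mounts the current directory into the container at /scrapers/ - would look like this (the real scrapy.sh in the repository may differ):

    #!/usr/bin/env bash
    set -e

    IMAGE="scraper:latest"

    # (Re)build the image; Docker's layer cache keeps repeat runs fast.
    build() {
        docker build -t "$IMAGE" .
    }

    # Run scrapy inside a throwaway container, with the workspace mounted at /scrapers/.
    run() {
        docker run --rm -it -v "$(pwd):/scrapers" "$IMAGE" "$@"
    }

    case "${1:-}" in
        build)
            build
            ;;
        new)
            # ./scrapy.sh new example  ->  scrapy startproject example
            build
            run startproject "$2"
            ;;
        *)
            # Anything else is passed straight through to scrapy.
            build
            run "$@"
            ;;
    esac

Because the image’s entrypoint is scrapy, anything after the image name gets handed straight to Scrapy - which is what turns ./scrapy.sh new example into scrapy startproject example.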

I originally put this workspace together on my laptop, yet have since cloned it - as in the example above - on another machine, and had an identical environment within five minutes.

Need to introduce a new dependency? Treat it like any other Python project: update requirements.txt, run ./scrapy.sh build, and push the change upstream so other machines end up with the same environment. Done.
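
In practice that’s just a few commands - the package and version below are purely illustrative:

    # 1. Pin the new dependency.
    echo "beautifulsoup4==4.8.2" >> requirements.txt

    # 2. Rebuild the image so the container picks it up.
    ./scrapy.sh build

    # 3. Push the change so every other machine gets the same environment.
    git add requirements.txt
    git commit -m "Add beautifulsoup4"
    git push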

Want to reclaim some disk space? Simply prune your Docker containers and images: no need to go rooting around in your distro’s package manager.
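
For example:

    # Remove stopped containers, dangling images, unused networks and the build cache.
    docker system prune

    # Or go further, and remove any image not used by an existing container.
    docker image prune -a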

At some point I’ll try and write up how these little Docker workspaces fit into my overall backup strategy too!
