Ramblings

Do the simplest thing that could possibly work

2010-07-28 23:35:48, by Per

I've been listening to Security Now! lately, which I like (although I'd like to hear less commercial blahblah on it). When I recommended the podcast to someone (more specifically the Portable Dog Killer episode), I got the reply that only the last 20 episodes or so came up on iTunes.

Now, on Steve Gibson's Security Now! page, all the episodes are linked, so I thought I'd make a small script to download all the episodes.

First, I saved the page locally, so I didn't have to hit the server every time I took the script for a test run:

$ wget http://www.grc.com/securitynow.htm

Then I whipped out the snake and wrote a script, sn.py:

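from BeautifulSoup import BeautifulSoup

# Parse the saved page and collect every <a> tag that has an href.
links = BeautifulSoup(open("securitynow.htm").read()).findAll("a", href=True)
for l in links:
    # Keep the full-quality mp3's; skip the low-quality "-lq" versions.
    if l["href"].endswith(".mp3") and not l["href"].endswith("-lq.mp3"):
        print l["href"]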

This gave me the links to all the mp3's, one per line. I redirected that output to a file, which I then used as input for wget again:

$ python sn.py > snpodcasts.txt
$ wget -i snpodcasts.txt

Done...

This took about 20 minutes. Wondering where my time went and contemplating the approach I took, I found a couple of things bothering me:

  1. On my desktop I didn't have BeautifulSoup yet, so I needed to install that first.
  2. When doing one of the first test runs, I had not taken into account that the page also lists the lower quality versions, which I didn't need and certainly didn't want to download too.
  3. The going back and forth between shell and Vim (wget, coding Python, testing the Python script, running the script and then wget again), although something I do often on larger coding projects, felt kinda "fiddly".
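As a sketch of how point 1 could be sidestepped, the same link filtering can be done with only the standard library's html.parser module. This is written for Python 3 (unlike sn.py above), and the names Mp3LinkCollector and mp3_links are made up for illustration; they are not part of the original script.

```python
from html.parser import HTMLParser


class Mp3LinkCollector(HTMLParser):
    """Hypothetical helper: collects hrefs of full-quality mp3 links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # href is None when the attribute is missing or has no value.
        if href and href.endswith(".mp3") and not href.endswith("-lq.mp3"):
            self.links.append(href)


def mp3_links(html):
    """Return the full-quality mp3 hrefs found in an HTML string, in page order."""
    collector = Mp3LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Feeding it the contents of securitynow.htm and printing each returned link should give the same kind of one-link-per-line list that sn.py produced, without installing anything first.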

Then I realized I had just caught myself "doing the first thing that comes to mind" and mistaking that for "doing the simplest thing that could possibly work". Because I immediately saw a possible solution, my mind automatically assumed that it was the simplest thing to do. But if I had taken a closer look at the problem at hand, at the page and at how the links were formatted, I'd probably have come up with the following one-liner:

$ wget $(python -c 'for i in range(1, 259): print "http://media.grc.com/sn/sn-%03d.mp3" % i')

Now, this might not be as generic as the first solution, but it would've been up & running in maybe half the time.
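The one-liner leans entirely on the URL pattern: a fixed prefix plus a zero-padded, three-digit episode number. Spelled out as a small Python 3 sketch, with episode_urls as a made-up name and the 1 to 258 range taken from the one-liner above, it looks roughly like this:

```python
def episode_urls(first=1, last=258):
    """Return the mp3 URL for each episode number from first to last, inclusive."""
    # %03d zero-pads the episode number to three digits: 1 -> "001".
    return ["http://media.grc.com/sn/sn-%03d.mp3" % n for n in range(first, last + 1)]


if __name__ == "__main__":
    # One URL per line, ready to be saved to a file and passed to wget -i.
    print("\n".join(episode_urls()))
```

Saved under some name of your choosing, its output can be redirected to a file and handed to wget -i just like snpodcasts.txt earlier.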

"Doing the simplest thing that could possibly work" is a programming mantra that has taken quite a while to sink in, and look, it still bites me from time to time.