Do the simplest thing that could possibly work
2010-07-28 23:35:48, by Per
I've been listening to Security Now! lately, which I like (although I'd like to hear less commercial blah-blah on it). When I recommended the podcast to someone (more specifically the "Portable Dog Killer" episode), I got the reply that only the last 20 episodes or so came up in iTunes.
Now, on Steve Gibson's Security Now! page, all the episodes are linked, so I thought I'd make a small script to download all the episodes.
First, I got the page locally, so I didn't have to hit the page every time I took the script for a test run:
$ wget http://www.grc.com/securitynow.htm
Then I whipped out the snake and wrote a script
sn.py
from BeautifulSoup import BeautifulSoup

links = BeautifulSoup(open("securitynow.htm").read()).findAll("a", href=True)
for l in links:
    if l["href"].endswith(".mp3") and not l["href"].endswith("-lq.mp3"):
        print l['href']

This gave me the links to all the mp3's, one per line. I redirected that output to a file, that I then used as input for wget again:
$ python sn.py > snpodcasts.txt
$ wget -i snpodcasts.txt
Done...
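For what it's worth, the same filter can probably be written without BeautifulSoup at all. What follows is only a rough Python 3 sketch using the standard library's html.parser; the names MP3LinkParser and extract_mp3_links are made up for illustration, and the sample HTML is invented rather than copied from the real page.

```python
from html.parser import HTMLParser


class MP3LinkParser(HTMLParser):
    """Collect href values of <a> tags that point at full-quality mp3 files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are interesting; attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep .mp3 links, but skip the lower quality "-lq.mp3" variants.
        if href and href.endswith(".mp3") and not href.endswith("-lq.mp3"):
            self.links.append(href)


def extract_mp3_links(html):
    """Return all full-quality mp3 links found in the given HTML string."""
    parser = MP3LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample input, standing in for the contents of securitynow.htm.
sample = (
    '<a href="http://media.grc.com/sn/sn-001.mp3">high quality</a>'
    '<a href="http://media.grc.com/sn/sn-001-lq.mp3">low quality</a>'
    '<a href="sn-001.htm">show notes</a>'
)
for link in extract_mp3_links(sample):
    print(link)
```

To use it on the real page, you would pass the contents of the locally saved securitynow.htm to extract_mp3_links instead of the sample string, and redirect the output to a file for wget -i as before.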
This took about 20 minutes. Wondering where my time went, I thought about the approach I took, and a couple of things bothered me:
- On my desktop I didn't have BeautifulSoup yet, so I needed to install that first
- When doing one of the first test runs, I had not taken into account that the page also lists the lower-quality versions, which I didn't need and certainly didn't want to download too.
- The going back and forth between shell and Vim (wget, coding Python, testing the Python script, running the script and then wget again), although something I do often on larger coding projects, felt kinda "fiddly".
Then I realized I had just caught myself "doing the first thing that comes to mind" and mistaking that for "doing the simplest thing that could possibly work"; because I immediately saw a possible solution, my mind automatically assumed that that was the simplest thing to do. But if I had taken a closer look at the problem at hand, at the page and at how the links were formatted, I'd probably have come up with the following one-liner:
$ wget $(python -c 'for i in range(1, 259): print "http://media.grc.com/sn/sn-%03d.mp3" % i')
Now, this might not be as generic as the first solution, but it would've been up & running in maybe half the time.
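The idea behind the one-liner is just a zero-padded episode number dropped into a fixed URL. As a small Python 3 sketch (the one-liner above uses the Python 2 print statement), it could look like this; the function name episode_urls and its parameters are made up for illustration, and it assumes every episode really follows the sn-NNN.mp3 pattern:

```python
def episode_urls(first=1, last=258):
    """Return the episode URLs for episodes first..last, inclusive."""
    # %03d pads the episode number with zeros to three digits: 1 -> 001.
    return ["http://media.grc.com/sn/sn-%03d.mp3" % i for i in range(first, last + 1)]


# Print the first few URLs as a quick sanity check before feeding them to wget.
for url in episode_urls(1, 3):
    print(url)
```

The defaults cover the same episodes as range(1, 259) in the one-liner, that is, episodes 1 through 258.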
"Doing the simplest thing that could possibly work" is a programming mantra that has taken quite a while to sink in, and look, it still bites me from time to time.