Data munging

Downloading video with Awk

August 29, 2008 Data munging No comments

Peteris Krumins, the prolific blogger and programmer, decided to explore TCP/IP networking in GNU Awk, and came up with this, a YouTube video downloader.

Subscribe to Peteris’ blog. It’s well worth reading.

Creating Excel files with Perl

August 11, 2008 CPAN, Data munging 1 comment

Linux Journal has an article on creating Excel files using Spreadsheet::WriteExcel. The module has its quirks, like creating corrupted spreadsheets if you try to populate a cell more than once, but when you need it, nothing else does what it does.

Scrabble cheating with Perl one-liners

June 2, 2008 Code craft, Data munging 2 comments

I’m in the middle of a game of Scrabulous with Christopher Humphries on Facebook, and I get “tolkien” handed to me in my tray. Good letters, and I ought to be able to make a bingo out of them. Alas, the best I could get to play on the board was “knot”, but what else could I have made? Perl to the rescue!

All I need to do is match across the contents of /usr/share/dict/words in a Perl one-liner. The -n flag means “loop over the input file, but don’t print $_”. My little program goes in -e, and it looks like this:

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
/e/ && /n/' /usr/share/dict/words
allokinetic
ankylopoietic
anticlockwise
automatonlike
bibliokleptomania
....

Lots of good words, but they’re awfully long. Let’s limit it to seven-letter bingos. We have to use the -l flag to drop the linefeed from the input lines, so the length call is accurate.

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
/e/ && /n/ && length($_)==7' /usr/share/dict/words
$

Shoot, nothing there. Let’s try eight.

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
/e/ && /n/ && length($_)==8' /usr/share/dict/words
knotlike
townlike

“knotlike”! That would have been beautiful. Oh well. 🙁

Who’s making bogus web requests?

November 28, 2007 Data munging, Web 1 comment

Yesterday I noticed in my Apache access log a lot of 404s that looked like this:

aaa.xx.65.186 - - [25/Jul/2007:05:55:05 -0500] "GET http://www.some-advertising-site.com/banner/digits HTTP/1.1" 404 305 "http://some-different-website.com/" "legitimate-looking agent"

Not only am I not hosting banner ads, the GET request is the wrong shape for my server. It should be GET /banner/digits..., without the scheme and hostname; a full URL on the request line is what a client sends to a proxy. I wondered how many hosts were sending these, and how many hits I was getting from each. A Perl one-liner to the rescue!

perl -MData::Dumper -nae'++$n{$F[0]} if /GET http/;
END{print Dumper(\%n)}' access.log
$VAR1 = {
    'aaa.xx.65.186' => 132, # Real IPs obscured
    'bb.yyy.7.60' => 48,
    'ccc.zzz.46.147' => 111,
    'dd.qq.71.82' => 33
};

So it looked like I was getting hit by a couple of 0wnz0red boxes with some sort of virus on them. I added them to my iptables DROP list and was done with it.