Recently in Data munging Category

Downloading video with Awk


Peteris Krumins, the prolific blogger and programmer, decided to explore TCP/IP networking in GNU Awk, and came up with this, a YouTube video downloader.

Subscribe to Peteris' blog. It's well worth reading.

Linux Journal has an article on creating Excel files using Spreadsheet::WriteExcel. It has its quirks, like creating corrupted spreadsheets if you try to populate a cell more than once, but when you need it, there's nothing else to do what it does.

I'm in the middle of a game of Scrabulous with Christoper Humphries on Facebook, and I get "tolkien" handed to me in my tray. Good letters, and I ought to be able to make a bingo out of them. Alas, the best I could get to play on the board was "knot", but what else could I have made? Perl to the rescue!

All I need to do is match across the contents of /usr/share/dict/words in a Perl one-liner. The -n flag means "loop over the input file, but don't print $_". My little program goes in -e, and it looks like this:

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/' /usr/share/dict/words 
allokinetic
ankylopoietic
anticlockwise
automatonlike
bibliokleptomania
....

Lots of good words, but they're awfully long. Let's limit it to seven-letter bingos. The -l flag matters here: it strips the linefeed from each input line, so the length call is accurate.

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/ && length($_)==7' /usr/share/dict/words
$

Shoot, nothing there. Let's try eight.

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/ && length($_)==8' /usr/share/dict/words
knotlike
townlike

"knotlike"! That would have been beautiful. Oh well. :-(

Yesterday I noticed in my Apache access log a lot of 404s that looked like this:

aaa.xx.65.186 - - [25/Jul/2007:05:55:05 -0500] "GET http://www.some-advertising-site.com/banner/digits HTTP/1.1" 404 305 "http://some-different-website.com/" "legitimate-looking agent"

Not only am I not hosting banner ads, the GET request isn't what a browser would send to an ordinary web server. It should be GET /banner/digits..., without the scheme and hostname; the full-URL form is what a client sends to a proxy. I wondered how many hosts were sending these, and how many hits each was making. A Perl one-liner to the rescue!

$ perl -MData::Dumper -nae'++$n{$F[0]} if /GET http/;
    END{print Dumper\%n}' access.log

$VAR1 = {
          'aaa.xx.65.186' => 132, # Real IPs obscured
          'bb.yyy.7.60' => 48,
          'ccc.zzz.46.147' => 111,
          'dd.qq.71.82' => 33
        };

So it looked like I was getting hit by a couple of 0wnz0red boxes with some sort of virus on them. I added them to my iptables DROP list and was done with it.

