Saturday, March 09, 2013

Baseball Hack 22: translate.pl and rosters.pl

I'm reviving the blog when it applies to my 35 for 35. Yes, I'd love to blog here more, but life has brought along many more important things over these past few years. That being said, here's some Windows-compatible code for hack 22 in the book Baseball Hacks.

First, translate.pl. Run this from the directory where your zipped Retrosheet event files are stored, and it will unzip them and concatenate all the play-by-play data into pbp.csv. The script expects BEVENT.EXE and all_hdr.txt to be in that same directory, and you will need to change the hard-coded path in $outfile to match your own machine. This code requires the Perl module Archive::Extract, and also relies on a bare readdir in a while loop setting $_ on each iteration, which is only available in Perl 5.12 or later.
#!/usr/bin/perl
use Archive::Extract;

$outfile = '"C:\Users\John\Desktop\Baseball Hacks\retrosheet\pbp.csv"';
print `type all_hdr.txt > $outfile`;

opendir RSDIR, "." or die "can't open directory .: $!\n";
while (readdir RSDIR) {
 if ( $_ =~ /(\d\d\d\deve)\.zip$/ ) {
   print "Unzipping $_\n";
   my $yrdir = substr($_, 0, -4);   # e.g. 2012eve.zip -> 2012eve
   my $ae = Archive::Extract->new( archive => $_ );
   $ae->extract( to => '.\\' . $yrdir ) or die "can't extract $_: " . $ae->error . "\n";
   opendir YRDIR, $yrdir or die "can't open directory $yrdir: $!\n";
   chdir($yrdir) or die "can't change to directory $yrdir: $!\n";
   while (readdir YRDIR) {
    if ( $_ =~ /(\d\d)(\d\d)(\w\w\w)\.EV[AN]$/ ) {
     $century = $1; $year = $2; $team = $3;
     print `..\\BEVENT.EXE -y $century$year -f 0-96 $_ >> $outfile`;
    }
   }
   chdir("..") or die "can't change to directory ..: $!\n";
   closedir YRDIR;
 }
}
closedir RSDIR;
print "done\n";
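If you are stuck on a Perl older than 5.12, the directory scan can be sketched with an explicit loop variable instead. This is only a rough sketch, not a drop-in replacement for translate.pl: the helper name matching_entries is made up for illustration, and it uses a lexical directory handle rather than the bareword handles above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of translate.pl): return the sorted names
# in $dir that match $pattern. Assigning readdir's result to a named
# variable and testing it with defined() does not depend on the
# Perl 5.12 behavior of a bare readdir setting $_.
sub matching_entries {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or die "can't open directory $dir: $!\n";
    my @names;
    while (defined(my $entry = readdir $dh)) {
        push @names, $entry if $entry =~ $pattern;
    }
    closedir $dh;
    return sort @names;
}

# List the zipped Retrosheet event files in the current directory.
print "$_\n" for matching_entries('.', qr/^\d{4}eve\.zip$/);
```

Using a named variable also means the inner loop over each year's directory can't clobber the outer loop's filename, which is why translate.pl has to finish with $_ before it starts reading YRDIR.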
Similarly, here is rosters.pl, which loops through the unzipped event directories and concatenates the roster files for all years into a single file. The script prints to standard output, so redirect it to the file you want on the command line, e.g. perl rosters.pl > rosters.csv
#!/usr/bin/perl

print "year,retroID,lastName,firstName,bats,throws,team,pos\n";

opendir RSDIR, "." or die "can't open directory .: $!\n";
while (readdir RSDIR) {
 if ( $_ =~ /(\d\d\d\d)eve$/ ) {
   opendir YRDIR, $_ or die "can't open directory $_: $!\n";
   chdir($_) or die "can't change to directory $_: $!\n";
   while (readdir YRDIR) {
    if ( $_ =~ /(\w{3})(\d{4})\.ROS$/ ) {
     $team = $1;
     $year = $2;
     open FILE, "<$_" or die "can't open $_: $!\n";
     while (<FILE>) {
      s/\n//;
      s/\cM//;
      s/\"//g;
      if (/[a-z]{5}\d{3}/) {
       print "$year,$_\n";
      }
     }
     close FILE;
    }
   }
   chdir("..") or die "can't change to directory ..: $!\n";
   closedir YRDIR;
 }
}
closedir RSDIR;
# Report on STDERR so "done" doesn't end up in the redirected CSV.
print STDERR "done\n";
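The per-line cleanup in rosters.pl is easier to see on its own. Here is a minimal sketch of it as a standalone function; the name roster_row and the sample line are made up for illustration, and it strips the whole line ending in one step rather than with the script's two separate substitutions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of rosters.pl): clean one roster line and
# prefix it with the year. Returns nothing if the line has no player ID.
sub roster_row {
    my ($year, $line) = @_;
    $line =~ s/[\r\n]+\z//;   # strip the line ending (LF or CRLF)
    $line =~ s/"//g;          # drop any double quotes
    # Same check as the script: five lowercase letters, then three digits.
    return unless $line =~ /[a-z]{5}\d{3}/;
    return "$year,$line";
}

# Made-up sample line, for illustration only.
my $row = roster_row(2012, qq{doexj001,Doe,"John",R,R,ANA,OF\r\n});
print "$row\n" if defined $row;
```

For the sample line this prints 2012,doexj001,Doe,John,R,R,ANA,OF, which is the year followed by the seven fields from the roster file.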
Once you've run this Perl code to create pbp.csv and rosters.csv, you can add them to your SQL database using the instructions in the book.