I'm reviving the blog when it applies to my
35 for 35. Yes, I'd love to blog here more, but life has brought along many more important things over these past few years.
That being said, here's some Windows-compatible code for hack 22 in the book
Baseball Hacks. First, translate.pl. Run this from the directory where your zipped
Retrosheet event files are stored, and it will unzip them and concatenate all the play-by-play data into pbp.csv. Before running it, change $outfile to a path on your own machine; the script also expects all_hdr.txt and BEVENT.EXE to be in that same directory. This code requires the Perl module
Archive::Extract, and also relies on a bare
readdir in a while loop setting $_, which is only available in Perl 5.12 or later.
#!/usr/bin/perl
use Archive::Extract;

# The inner double quotes protect the spaces in the path when it reaches the shell.
$outfile = '"C:\Users\John\Desktop\Baseball Hacks\retrosheet\pbp.csv"';

# Start the output file with the header row.
print `type all_hdr.txt > $outfile`;

opendir RSDIR, "." or die "can't open directory .: $!\n";
while (readdir RSDIR) {
    if ( $_ =~ /(\d\d\d\deve)\.zip$/ ) {
        # Directory name is the zip file name without ".zip".
        my $dir = substr($_, 0, -4);
        print "Unzipping $_\n";
        my $ae = Archive::Extract->new( archive => $_ );
        $ae->extract( to => '.\\' . $dir ) or die "can't extract $_: " . $ae->error . "\n";
        opendir YRDIR, $dir or die "can't open directory $dir: $!\n";
        chdir($dir) or die "can't change to directory $dir: $!\n";
        while (readdir YRDIR) {
            if ( $_ =~ /(\d\d)(\d\d)(\w\w\w)\.EV[AN]$/ ) {
                $century = $1; $year = $2; $team = $3;
                # BEVENT.EXE is expected one level up, next to the zip files.
                print `..\\BEVENT.EXE -y $century$year -f 0-96 $_ >> $outfile`;
            }
        }
        chdir("..") or die "can't change to directory ..: $!\n";
        closedir YRDIR;
    }
}
closedir RSDIR;
print "done\n";
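To see what the event-file pattern in translate.pl picks out before running it against real data, here is a minimal sketch that applies the same regular expression to a few names. The helper parse_event_name and the sample file names are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an event file name into (century, year, team) using the same
# pattern as translate.pl. Returns an empty list when the name does not match.
# parse_event_name is a hypothetical helper, not part of translate.pl.
sub parse_event_name {
    my ($name) = @_;
    return $name =~ /(\d\d)(\d\d)(\w\w\w)\.EV[AN]$/ ? ($1, $2, $3) : ();
}

# Sample names for illustration only.
for my $name ('2005BOS.EVA', '1998CHN.EVN', 'TEAM2005') {
    my @parts = parse_event_name($name);
    if (@parts) {
        print "$name: season $parts[0]$parts[1], team $parts[2]\n";
    } else {
        print "$name: skipped\n";
    }
}
```

The century and year are captured separately only so they can be glued back together for BEVENT's -y argument, as translate.pl does.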
Similarly, here is rosters.pl, which loops through the unzipped event directories and concatenates all roster files for all years into a single file. The script writes to standard output, so you must redirect it to this file on the command line, e.g.
perl rosters.pl > rosters.csv
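#!/usr/bin/perl
# Each row below is prefixed with the year, so the header names that column too.
print "year,retroID,lastName,firstName,bats,throws,team,pos\n";
opendir RSDIR, "." or die "can't open directory .: $!\n";
while (readdir RSDIR) {
    if ( $_ =~ /(\d\d\d\d)eve$/ ) {
        my $dir = $_;
        opendir YRDIR, $dir or die "can't open directory $dir: $!\n";
        chdir($dir) or die "can't change to directory $dir: $!\n";
        while (readdir YRDIR) {
            if ( $_ =~ /(\w{3})(\d{4})\.ROS$/ ) {
                $team = $1;
                $year = $2;
                open FILE, "<$_" or die "can't open file $_: $!\n";
                while (<FILE>) {
                    # Strip the line ending (LF and CR) and any double quotes.
                    s/\n//;
                    s/\cM//;
                    s/\"//g;
                    # Keep only lines that contain a player ID (five letters, three digits).
                    if (/[a-z]{5}\d{3}/) {
                        print "$year,$_\n";
                    }
                }
                close FILE;
            }
        }
        chdir("..") or die "can't change to directory ..: $!\n";
        closedir YRDIR;
    }
}
closedir RSDIR;
# Report on STDERR so the message does not end up in the redirected CSV.
print STDERR "done\n";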
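The per-line cleanup in rosters.pl can be tried on its own. This is a rough sketch, assuming a roster line shaped like the header columns; the helper clean_roster_line and the sample player are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the same substitutions as rosters.pl to one line and prefix the year.
# Returns undef when the line holds no player ID (five letters, three digits).
# clean_roster_line is a hypothetical helper, not part of rosters.pl.
sub clean_roster_line {
    my ($year, $line) = @_;
    $line =~ s/\n//;     # drop the line feed
    $line =~ s/\cM//;    # drop the Windows carriage return
    $line =~ s/\"//g;    # drop any double quotes
    return $line =~ /[a-z]{5}\d{3}/ ? "$year,$line" : undef;
}

# Made-up roster line with quotes and a CRLF ending.
my $row = clean_roster_line('2005', qq{"doexj001","Doe","John","R","R","BOS","P"\r\n});
print "$row\n";    # prints 2005,doexj001,Doe,John,R,R,BOS,P
```

Blank lines or anything else without a player ID come back as undef, which mirrors how rosters.pl silently skips them.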
Once you've run this Perl code to create pbp.csv and rosters.csv, you can add them to your SQL database using the instructions in the book.