I love Baseball-Reference.com. I mean, who doesn’t love Baseball-Reference.com, it is one of the best things on earth, besides beer and baseball. I think I look something up in that site everytime I think about baseball – be it stats, ages, history, past results…
In an anticipation for the 2013 Baseball Hack Day which is coming March 30th, I’ve installed the lahman database on my MAMP server on my mac. Here is how I did it.
Databases for sabermetricians, Part One by Colin Wyers, working with Working with the Lahman Baseball Database by Open Source Sports, Installing the SQL Version of the Lahman Database by Pitch by Pitch were all very useful. As with Add / Import .SQL file To MySQL Database Server.
After downloading the 2012 Version of SQL version lahman database from Sean Lahman’s site, I discovered that the lahman2012.sql file is 53 MB in size, and contains 516,646 lines of SQL code. Which is fine. Until when I go to MAMP’s phpMyAdmin page, click on “Import” and discover that the “file to Import” has a little note taht says “(Max: 32MiB).”
Well, it can take the .zip file (“File may be compressed (gzip, bzip2, zip) or uncompressed.
A compressed file’s name must end in .[format].[compression]. Example: .sql.zip”) so I just right click on the lahman2012.sql, select “Compress lahman2012.sql” option and, a ha, I am so smart, now the file is less than 10 MB, and problem solved.
Unless, of course, problem is not solved.
When I hit “Go” I get nothing. A blank page. No spinning wheel or “please wait” note. Nothing.
Puzzled. Ask internet. Why, internet?
This guy is having exact same problem as me (God bless Stack Overflow). He says “I figured out why, its was the zip file that was giving problems, when you use another format for example gzip, everything goes smooth :)”
A-ha. Learn new thing every day. I didn’t know this, but Mac (at least on my Moutain Lion) ships with bzip2 in the terminal.
All I needed to do was
bzip2 -z lahman2012.sql from the terminal, which produced lahman2012.sql.bz2, and import that file, and boom! I have a database with 25 tables, with ~514,573 rows of data.
I am going to play with Mat Gargano’s Baseball Tools PHP library, and such (which uses ezSQL, I didn’t know about it, but it looks like it is super useful). I am all geeking out in baseball geek heaven and it will be a lot of fun!!