I love parsing public data. I blogged about it about four years ago (wow, how time flies): http://blogs.securiteam.com/index.php/archives/328
Now there is a new set of email data from Supreme Court Justice nominee Elena Kagan, which the Sunlight Foundation folks have put into a nice Gmail-style interface here: http://elenasinbox.com/
Unfortunately, the dump from the archives appears to be in PDF format. I'm hoping there is a way to get a plain-text dump of these emails. I've contacted the Sunlight guys and hope to get a chance to run some parsing algorithms shortly.
Update: Tom Lee and Jake Brewer quickly responded and shared their methodology with me (thanks, guys!). I'm downloading now and will start parsing shortly.
Final update: After converting everything to text, I ran a series of checks for things like checking/savings account numbers, SSNs, credit card numbers, pr0n, etc. The only hits were a password to a nonexistent site and some pr0n hits in the received mail. All in all, very tame stuff.
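The post doesn't say exactly how those checks were implemented. As a rough illustration only, here is a minimal Python sketch of one common approach, assuming the emails are already plain text: regular expressions for SSN-shaped and card-shaped numbers, a Luhn checksum to weed out random digit runs, and a simple keyword search. The patterns, the `scan` and `luhn_ok` helpers, and the keyword list are all hypothetical, not the checks actually used here.

```python
import re

# Hypothetical patterns; not the checks actually run on the Kagan emails.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
# Hypothetical keyword list for account/password mentions.
KEYWORDS = ("password", "account number", "routing number")


def luhn_ok(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str) -> dict:
    """Collect SSN-like strings, Luhn-valid card numbers, and keyword hits."""
    hits = {"ssn": SSN_RE.findall(text), "card": [], "keyword": []}
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):  # drop digit runs that fail the checksum
            hits["card"].append(digits)
    lower = text.lower()
    hits["keyword"] = [k for k in KEYWORDS if k in lower]
    return hits
```

Running `scan()` over each converted email and printing any non-empty results gives a quick list of messages worth a closer manual look. Pattern matching like this produces false positives and misses oddly formatted numbers, so treat hits as leads, not findings.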
!Dmitry
dmitry.chan@gmail.com