An article I was reading mentioned research regarding shark attacks from the 16th century to present day. This momentarily caught me off guard, as a part of my mind rejected logic and decided that sharks didn't exist until the 20th century, perhaps because that's when TV and film came into existence.

I saw someone with a hat that said "DUBSTEP". I didn't find it all that odd, but were it another genre of music on that fella's head, it would be a weird hat. Dubstep!

I had to make my first sitemap, which is an XML document that tells search engines what pages to scan. The site has over 150K pages, so doing this manually was not an option. My next thought was to use an inexpensive program that periodically scans the site, and automatically generates new sitemaps. Despite the convenience, these programs were not easily customized, and required impossible-to-authorize server modifications. I'm glad these barriers existed, as the solution was writing a script that got me acquainted with SimpleXML.

Instead of scanning the entire site, I entered a small number of URLs manually and populated the rest from database results. Using database calls instead of crawling through files makes the process faster and gives me more control over what the map includes. Because a sitemap is limited to 50,000 entries, I made sure the script started an additional map before the current one reached capacity, then rendered an index that references every map that was created.
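For anyone curious what that splitting looks like, here is a rough sketch in Python rather than the PHP and SimpleXML I actually used. The function name, the `sitemap-N.xml` file names, and the example.com URLs are all made up for illustration, and the cap assumes the sitemap protocol's limit of 50,000 URLs per file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_MAP = 50000  # assumed per-file cap from the sitemap protocol


def build_sitemaps(urls, base_url, max_per_map=MAX_URLS_PER_MAP):
    """Return (index_xml, {filename: sitemap_xml}) for the given URLs."""
    urls = list(urls)
    sitemaps = {}
    # Start a new sitemap each time the current one would reach the cap.
    for start in range(0, len(urls), max_per_map):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_per_map]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        filename = "sitemap-%d.xml" % (len(sitemaps) + 1)  # hypothetical naming scheme
        sitemaps[filename] = ET.tostring(urlset, encoding="unicode")

    # The index gets one entry per sitemap that was actually generated.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename in sitemaps:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + filename
    return ET.tostring(index, encoding="unicode"), sitemaps


# A few hand-entered URLs plus a stand-in for the database results.
manual = ["https://example.com/", "https://example.com/about"]
from_db = ["https://example.com/item/%d" % i for i in range(5)]
index_xml, maps = build_sitemaps(manual + from_db, "https://example.com/", max_per_map=3)
print(sorted(maps))
```

With seven URLs and a cap of three per map, this produces three maps and an index with three entries. Building the index from the maps that were generated means it can never point at a file that doesn't exist.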

I haven't crunched the numbers yet, but after examining my samples of @ messages to the MSNBC and Fox News Twitter accounts, there appear to be significantly more negative messages to Fox, and the majority of MSNBC's negative messages are complaints about Pat Buchanan being on the network. Twitter may be more of a bubble than Ann Arbor.

I'm doing a project that measures hostility in tweets sent to Fox News vs MSNBC. Twitter's OAuth restrictions made it difficult for me to collect the data with a PHP script I was working on, but fortunately I found this program:
Every 10 minutes it performs a new search for public mentions of the two networks, and adds the results to an XML file. I can then import the XML into Excel and perform all sorts of analysis on it.
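I don't know how the program stores things internally, but the accumulate-into-one-XML-file step could look something like this Python sketch. The `mentions`/`mention` element names, the attributes, and the function name are my own guesses for illustration, not the program's real format, and the search itself is left out.

```python
import os
import xml.etree.ElementTree as ET


def append_mentions(path, mentions):
    """Add new mentions to the XML file at `path`, skipping IDs already stored."""
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("mentions")  # hypothetical root element
        tree = ET.ElementTree(root)

    # Overlapping searches return repeats, so track which IDs are already saved.
    seen = {node.get("id") for node in root.findall("mention")}
    added = 0
    for item in mentions:
        if item["id"] in seen:
            continue
        node = ET.SubElement(root, "mention", id=item["id"], network=item["network"])
        node.text = item["text"]
        seen.add(item["id"])
        added += 1

    tree.write(path, encoding="utf-8", xml_declaration=True)
    return added
```

A scheduler would call this every 10 minutes with whatever the latest search returned, each result being a dict with string `id`, `network`, and `text` values. The return value is the number of mentions that were new.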
