
Whoever first thought "If you didn't do anything wrong, what do you have to hide?" obviously didn't know SQL. We study public data because it's free, its creation is a result of our tax dollars, and its contents and insights influence our laws and policies.

That said, it's not easy to learn SQL with public data. Before the data is made publicly available, agencies can be overzealous in scrubbing it of the details that are not only interesting, but provide vital context needed to accurately analyze the data. While Menlo Park publishes police stop data, it's almost entirely lacking information about who was stopped – e.g. the age, race, and gender of the subject – while being vague about the reason for the stop and what happened during the stop.

In contrast, every law agency in Connecticut publishes detailed data about every traffic stop, including the age, gender, race, and ethnicity of the driver, the reason the stop was initiated, whether the vehicle was searched, and what, if anything, was found. But this depth of data required the state legislature to care about the problem of racial profiling, and then to pass a law and allocate resources to properly collect the data.

So how do journalists extract insights and powerful stories from even the most benign datasets? The ones who do it well are intimately knowledgeable about what's in the data, what's missing, and everything in the world that that data touches. They already know what they'll find in the data before writing an actual query. If you're new to journalism, you don't have this advantage. You haven't had the time to build a beat, and then to get the tips and scoops from the officials and folks who know where the stories are. SQL expertise can only do so much. Data analysis and SQL should feel difficult and foreign when you are working with difficult and foreign data.

Luckily for us, knowing SQL opens a vast array of opportunities for practicing analysis on interesting datasets. For example, if you even occasionally browse the web on your personal computer, then you are in complete ownership of a unique and very personal dataset, the records of which are entirely of your own making: the history of websites you've visited, which, conveniently for us, every major browser today stores in an easy-to-access SQLite database. Depending on what browser you use, you'll find that your browser has recorded a lot more information about you than what websites you've visited. But even if we limit ourselves to collating and counting the URLs you've visited, what you visit on the web reflects in part your interests, your fears, and even your sleeping habits.

Questions to ask

Whether data is personal or public, the basic questions and queries are largely the same. Start off with questions that get the general outline of the data: How many webpage visits total does the History database contain? What's the average number of sites visited per day? Then, use those general numbers to guide more specific and interesting queries: What is the peak hour for my web-visiting activity? How much do I browse on the weekends versus weekdays? Late night versus daytime?
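To make the outline questions above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes Chrome's schema, where the History file has a `visits` table whose `visit_time` column counts microseconds since 1601-01-01 UTC (the WebKit epoch); other browsers use different table names and epochs, and the function and helper names here are our own. The demo runs against a tiny mock table; to try it on your real history, point `sqlite3.connect()` at a copy of your History file (browsers lock the live one).

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: Chrome-style timestamps, i.e. microseconds since
# 1601-01-01 UTC. This offset converts them to Unix seconds.
WEBKIT_OFFSET = 11644473600  # seconds between 1601-01-01 and 1970-01-01

def history_outline(con):
    """Answer the general outline questions against a `visits` table."""
    unix = f"(visit_time / 1000000 - {WEBKIT_OFFSET})"
    # Total visits, and number of distinct days with any browsing.
    total, days = con.execute(f"""
        SELECT COUNT(*),
               COUNT(DISTINCT date({unix}, 'unixepoch'))
          FROM visits""").fetchone()
    # Peak hour: the hour of day with the most visits.
    (peak_hour,) = con.execute(f"""
        SELECT strftime('%H', {unix}, 'unixepoch') AS hour
          FROM visits
         GROUP BY hour
         ORDER BY COUNT(*) DESC
         LIMIT 1""").fetchone()
    # strftime('%w') is '0' for Sunday and '6' for Saturday.
    (weekend,) = con.execute(f"""
        SELECT COUNT(*)
          FROM visits
         WHERE strftime('%w', {unix}, 'unixepoch') IN ('0', '6')""").fetchone()
    return {"total_visits": total,
            "avg_per_day": total / days,
            "peak_hour": int(peak_hour),
            "weekend_visits": weekend,
            "weekday_visits": total - weekend}

def webkit_micros(iso_utc):
    """Demo helper: ISO timestamp string -> WebKit-epoch microseconds."""
    dt = datetime.fromisoformat(iso_utc).replace(tzinfo=timezone.utc)
    return int((dt.timestamp() + WEBKIT_OFFSET) * 1_000_000)

# Demo on a mock History database, trimmed to the one column the
# queries need.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (visit_time INTEGER)")
con.executemany("INSERT INTO visits VALUES (?)",
                [(webkit_micros(t),) for t in
                 ["2024-01-01 09:15:00", "2024-01-01 09:40:00",  # a Monday
                  "2024-01-01 14:05:00",
                  "2024-01-06 09:30:00"]])                       # a Saturday
print(history_outline(con))
# {'total_visits': 4, 'avg_per_day': 2.0, 'peak_hour': 9,
#  'weekend_visits': 1, 'weekday_visits': 3}
```

The same aggregations work as plain SQL in any SQLite client; the Python wrapper just makes the epoch conversion reusable across queries.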
