Data Analytics
After 44 years of PC rule, Albertans voted in a majority NDP government in Tuesday’s election. The result surprised many even though polls were predicting an NDP landslide. Wherever you fit on the political spectrum, it didn’t seem possible until it happened.
So how did it happen? The fall (or death) of the PC dynasty. Rachel Notley and the impression she made on Albertans. Naheed Nenshi nudging Calgary voters to be open to alternatives. Prominent businessmen hectoring Edmontonians to stick with the status quo. Alberta’s changing demographics. Or how about mirrors, downturns, hope, fear, or math?
[Read more →]
In Part 1 of this post, I looked at a few criticisms of the methodology used in the Fraser Institute’s high school rankings. Here, I’m going to explain what I think is the real problem with the rankings: they’re not necessary.
The Fraser rankings, released annually and regularly reported by the media, have largely shaped public perception of school performance. The authors have stated they want to make it possible for parents and educators to easily compare and monitor the academic performance of Alberta high schools. They could have done this by creating a better interface for Alberta diploma exam data. Comparisons made with that data would be easy to understand and evaluate. Instead, they devised a complicated (and arbitrary) formula to rate and rank schools, which essentially shifted the conversation from how schools are doing academically to how schools are doing in the Fraser rankings.
[Read more →]
In Part 1, I’ll take a closer look at a few criticisms of the Fraser Institute’s Alberta high school rankings, an annual attempt to compare the academic performance of secondary schools across the province. I’ll then explain in Part 2 what I think is the real problem with the rankings: they’re not necessary. Alberta Education achievement data can already be used to monitor academic performance at individual schools. Direct comparisons made with that data would be easy to understand and evaluate. The Fraser ratings, which combine diploma test results and other variables into a single score using an ad hoc formula, are needlessly complicated and misleading, both for parents and for administrators.
[Read more →]
TweetHeroes is a Twitter app that discovers and ranks influential users tweeting on specific topics. We built it to make sense of the conversations, players, and networks on Twitter. We know about wefollow, Klout, twitaholic, and Twitalyzer, among others. They just didn’t show us what we wanted to see, or how we wanted to see it. So we built our own social media analytics thingy. Here are three ways you can use it.
Want to know the central players in networks that are tweeting about Boston, Seattle, Ottawa or 30 other North American cities? How about political networks like the Tea Party or Gov 2.0? Check out our topic-specific ranking pages like this one for New York City:
[Read more →]
Over the past couple of months, our team has been working on TweetHeroes — a Twitter tool to discover and rank influential users on specific topics (among other things). During this time I’ve followed the #yeg stream, let’s say obsessively. And I’ve noticed that the debate over the Edmonton City Centre Airport (#ecca) is pretty much a constant, consistently trending among the influential #yeg users.
There are two sides to the debate: support the plebiscite, or call it a dead issue and move on. So where does everyone stand?
Good question. Because our team is conditioned to see everything in terms of networks, we decided to dig into the #yeg #ecca stream over the past couple of months to see how the players in this tempest are connected. We identified #yeg Twitter users who’ve tweeted at least twice about #ecca and #yeg in that roughly two-month time frame. We then built the retweet network for these users, connecting two users if either one has retweeted the other. (Twitter retweets on issue-specific tags are excellent indicators of affinity or a shared position.)
At this point, we had a confusing, densely connected graph (the #ecca tag is very popular!). What we really wanted to know, though, was where everyone stood, not just on the issue but in relation to each other. Who was on each side? Who was central? Who was supporting whom?
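The network-building step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the TweetHeroes code: each tweet is assumed to be a hypothetical `(author, text, retweeted_user)` tuple, hashtags are matched case-insensitively, and only on-topic retweets (those carrying both #yeg and #ecca) create edges. The post doesn't say how centrality was measured, so ranking users by degree (the number of distinct retweet partners) is an illustrative stand-in.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical tweet record: (author, text, user being retweeted or None).
Tweet = Tuple[str, str, Optional[str]]


def hashtags(text: str) -> Set[str]:
    """Return the lowercase hashtags in a tweet, e.g. {'#yeg', '#ecca'}."""
    return set(re.findall(r"#\w+", text.lower()))


def build_retweet_network(
    tweets: Iterable[Tweet],
    tags: Tuple[str, ...] = ("#yeg", "#ecca"),  # lowercase tags to require
    min_tweets: int = 2,
) -> Dict[str, Set[str]]:
    """Undirected retweet graph over users with >= min_tweets on-topic tweets."""
    required = set(tags)
    # A tweet is on topic only if it carries every required tag.
    on_topic = [t for t in tweets if required <= hashtags(t[1])]

    # Keep users who tweeted on topic at least min_tweets times.
    counts = Counter(author for author, _, _ in on_topic)
    users = {u for u, n in counts.items() if n >= min_tweets}

    # Connect two kept users if either one retweeted the other on topic.
    graph: Dict[str, Set[str]] = {u: set() for u in users}
    for author, _, retweeted in on_topic:
        if author in users and retweeted in users and retweeted != author:
            graph[author].add(retweeted)
            graph[retweeted].add(author)
    return graph


def rank_by_degree(graph: Dict[str, Set[str]]) -> List[str]:
    """Order users by number of distinct retweet partners (ties broken by name)."""
    return sorted(graph, key=lambda u: (-len(graph[u]), u))


if __name__ == "__main__":
    # Made-up tweets for illustration only.
    sample = [
        ("alice", "Keep #ecca open! #yeg", None),
        ("alice", "Vote in the plebiscite #yeg #ecca", None),
        ("bob", "RT @alice Keep #ecca open! #yeg", "alice"),
        ("bob", "#ecca is a dead issue #yeg", None),
        ("carol", "Move on already #yeg #ecca", None),
        ("carol", "RT @bob #ecca is a dead issue #yeg", "bob"),
        ("dave", "Nice day #yeg", None),
    ]
    graph = build_retweet_network(sample)
    print(rank_by_degree(graph))  # ['bob', 'alice', 'carol']
```

Using an undirected graph mirrors the rule in the post (an edge exists if *either* user retweeted the other); a directed graph would be the natural next step for asking who was supporting whom.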
[Read more →]
We know from last week’s post on analyzing the Edmonton census data that adjacent age groups tend to be found together in Edmonton neighbourhoods; e.g., 50-54 year-olds tend to live in neighbourhoods with relatively higher numbers of 40-49 and 55-59 year-olds. I’m going to take this idea a little further and, using some common clustering techniques, show how Edmonton neighbourhoods can be divided into 5 major age-based clusters.
Clustering in a nutshell
According to the Wikipedia article, clustering is “the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense.” The best-known clustering algorithm, and the one we used for this analysis, is called k-means. (Andrew Moore has an excellent tutorial for those interested.) K-means is a relatively simple but powerful technique that’s very useful for exploring datasets. A practitioner has quite a few details to sort out (e.g., scaling and collinearity), but the output of k-means often reveals clear and distinct patterns and helps us get our bearings, particularly with marketing data.
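To make the idea concrete, here is a bare-bones k-means (Lloyd’s algorithm) in plain Python. It’s an illustrative sketch, not the code used for this analysis: it assumes each neighbourhood is a vector of age-group counts, handles scaling by converting counts to shares of the neighbourhood’s population (one reasonable choice among several), and initialises centroids from randomly chosen data points with a fixed seed. The toy counts are invented, and with only four made-up neighbourhoods it uses k = 2 rather than the 5 clusters discussed here.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def to_proportions(counts: Sequence[float]) -> Vector:
    """Scale raw age-group counts to shares of the neighbourhood's population."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialise centroids with k data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)  # no point assigned yet
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Invented counts for four hypothetical neighbourhoods in four age bands
    # (0-14, 15-29, 30-59, 60+); not real Edmonton census figures.
    raw = {
        "N1": [400, 100, 200, 50],
        "N2": [380, 120, 210, 40],
        "N3": [50, 100, 200, 500],
        "N4": [60, 90, 180, 520],
    }
    names = list(raw)
    points = [to_proportions(raw[n]) for n in names]
    labels, _ = kmeans(points, k=2)
    for name, label in zip(names, labels):
        print(name, label)
```

On this toy data the two child-heavy neighbourhoods land in one cluster and the two senior-heavy ones in the other. Because k-means only finds a local optimum, it’s usually run from several random starts in practice, keeping the solution with the lowest within-cluster sum of squares.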
[Read more →]
The City of Edmonton recently released the 2009 Edmonton municipal census data as part of the Open Data initiative. The current catalogue doesn’t include results at the neighbourhood level (except in PDF). Edmonton blogger Mack Male recently shared a neighbourhood census file. He also gave an excellent map-based example of what could be done with such interesting data.
[Read more →]