In Part 1, I’ll take a closer look at a few criticisms of the Fraser Institute’s Alberta high school rankings, an annual attempt to compare the academic performance of secondary schools across the province. I’ll then explain in Part 2 what I think is the real problem with the rankings: they’re not necessary. Alberta Education achievement data can already be used to monitor academic performance at individual schools. Direct comparisons made with that data would be easy to understand and evaluate. The Fraser ratings, which combine diploma test results and other variables into a single score using an ad hoc formula, are needlessly complicated and misleading, both for parents and for administrators.
The Fraser High School Rankings
Every year, the Fraser Institute produces a ranking of secondary schools in Alberta called the School Report Cards. According to the authors, Peter Cowley, Stephen Easton, and Michael Thomas, these report cards collect “a variety of relevant, objective indicators of school performance into one, easily accessible public document so that anyone can analyze and compare the performance of individual schools.” Parents can use the rankings to evaluate schools for their children. Administrators can track rankings over time to improve performance at individual schools and across the system.
The authors contend that they’ve developed an accurate measure of school performance that can be used to evaluate and improve the educational system. Not everyone agrees. The rankings’ many critics vigorously attack the philosophy (Can we measure school performance with standardized tests? Should we be ranking schools at all?), the politics (The rankings are produced by a right-wing think tank with an anti-public school agenda.), and the methodology (Aside from test scores, none of the other indicators are accurate proxies for achievement.) behind the work.
As a data agency, we do a fair bit of work developing and evaluating KPIs for public and private sector clients, hence our interest in how people go about measuring performance in different settings. I thought it might be useful to take a data-driven look at the report cards, focusing narrowly on how the authors calculate a school’s rating and the effect, intended or unintended, of their choices on the actual rankings they produce.
How are the Fraser Ratings calculated?
The information here is taken directly from the Institute’s 2012 Report, which describes the authors’ methodology in detail. Overall school ratings are meant to reflect academic performance and are calculated using diploma exam results released by Alberta Education and other public sources of data. The authors base a school’s overall rating on its performance on eight indicators, which are categorized into three groups.
Three indicators of effective teaching: (1) average diploma examination mark; (2) percentage of diploma examinations failed; (3) difference between the school mark and examination mark in diploma courses.
Two indicators of consistency in teaching and assessment (gender gap): (4) difference between male and female students in the average value of their exam marks in English 30-1; (5) difference between male and female students in the average value of their exam marks in Pure Mathematics 30.
Three indicators of practical, well-informed counselling: (6) diploma courses taken per student; (7) diploma completion rate (the rate at which first-time grade 12 students receive a diploma in the school year); (8) delayed advancement rate (the rate at which students fail to graduate or advance a grade in the school year).
To calculate an overall rating, the authors first standardize the results for each of the indicators. These standardized results are then weighted and combined to produce an overall standardized score. This overall score is then converted into a rating on a 10-point scale.
Indicator weightings: average exam mark—20%, percentage of exams failed—20%, school vs exam mark—10%, English 30 gender gap—5%, Pure Math 30 gender gap—5%, courses taken per student—20%, diploma completion rate—10%, and delayed advancement rate—10%.
When the gender gap could not be calculated, the school vs exam mark difference was weighted at 20%. When the delayed advancement rate could not be calculated, the diploma completion rate was weighted at 20%.
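To make the calculation concrete, here is a minimal Python sketch of the standardize, weight, and combine steps, using the weights and substitution rules listed above. Several details are assumptions rather than the report’s published method: the indicator names are hypothetical, z-scores use the population standard deviation, indicators where a lower value is better are sign-flipped, a school missing either gender gap value is treated as missing both, and all other indicators are assumed present. The conversion to the 10-point scale is not described here, so the sketch stops at the overall standardized score.

```python
import numpy as np

# Weights from the 2012 report, as summarized above (keys are hypothetical names).
WEIGHTS = {
    "avg_exam_mark": 0.20,
    "pct_exams_failed": 0.20,
    "school_vs_exam_mark": 0.10,
    "eng_gender_gap": 0.05,
    "math_gender_gap": 0.05,
    "courses_per_student": 0.20,
    "diploma_completion_rate": 0.10,
    "delayed_advancement_rate": 0.10,
}

# Assumption: lower raw values are better for these, so their z-scores are negated.
LOWER_IS_BETTER = {"pct_exams_failed", "school_vs_exam_mark", "eng_gender_gap",
                   "math_gender_gap", "delayed_advancement_rate"}


def standardize(values):
    """z-scores; NaN marks a missing value and is ignored in the mean and std."""
    v = np.asarray(values, dtype=float)
    return (v - np.nanmean(v)) / np.nanstd(v)


def overall_scores(indicators):
    """indicators: dict of indicator name -> raw values, one per school (NaN = missing)."""
    z = {}
    for name, raw in indicators.items():
        s = standardize(raw)
        z[name] = -s if name in LOWER_IS_BETTER else s

    n_schools = len(z["avg_exam_mark"])
    scores = np.empty(n_schools)
    for i in range(n_schools):
        w = dict(WEIGHTS)
        # Missing gender gap: its 10% moves to school vs exam mark (10% -> 20%).
        if np.isnan(z["eng_gender_gap"][i]) or np.isnan(z["math_gender_gap"][i]):
            w["school_vs_exam_mark"] += w.pop("eng_gender_gap") + w.pop("math_gender_gap")
        # Missing delayed advancement rate: its 10% moves to diploma completion rate.
        if np.isnan(z["delayed_advancement_rate"][i]):
            w["diploma_completion_rate"] += w.pop("delayed_advancement_rate")
        scores[i] = sum(weight * z[name][i] for name, weight in w.items())
    return scores


# Made-up data for three schools, not taken from the report.
nan = float("nan")
demo = {
    "avg_exam_mark": [70, 60, 65],
    "pct_exams_failed": [5, 20, 10],
    "school_vs_exam_mark": [3, 10, 5],
    "eng_gender_gap": [1, 4, nan],             # third school: no gender gap data
    "math_gender_gap": [2, 3, nan],
    "courses_per_student": [3.0, 1.5, 2.5],
    "diploma_completion_rate": [95, 70, 85],
    "delayed_advancement_rate": [2, 15, nan],  # third school: no delayed advancement data
}
print(overall_scores(demo))
```

Because every school’s weights still sum to 100% after substitution, the scores remain on a comparable standardized scale.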
Some common criticisms of the Fraser Ratings
1. The ratings are not indicators of academic achievement. There is a research basis for treating test scores as a measure of achievement; there is none for the other indicators. Yet the weighting formula assigns a mere 20% to the average diploma exam mark. With test scores contributing so little, the ratings clearly don’t reflect what they’re meant to: academic achievement.
What the data says: This one’s straightforward to counter. The correlation between the average diploma mark and the overall rating is a very healthy 0.90. You can see the strength of the relationship in the graph below (each point represents one of 276 Alberta high schools).
Why is the correlation high if the weighting is low? I’ll come back to this when I look at the covariance of the indicators. For now, if we accept that exam marks are a good measure of academic achievement, then the Fraser ratings, with their high correlation with average diploma marks, should be considered a good measure of academic achievement.
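A small simulation shows how this can happen. The sketch below uses a hypothetical latent “achievement” factor and made-up noise levels, not the Fraser data: six indicators share the latent factor, the two gender gap indicators are pure noise, and all indicators are already oriented so that higher is better. Even though the exam-mark indicator carries only 20% of the weight, it ends up strongly correlated with the weighted total.

```python
import numpy as np


def simulated_rating_vs_exam_mark(n_schools=5000, seed=0):
    """Correlation between a simulated exam-mark indicator and the weighted rating."""
    rng = np.random.default_rng(seed)
    achievement = rng.standard_normal(n_schools)  # hypothetical latent factor

    def noisy_indicator():
        return achievement + 0.5 * rng.standard_normal(n_schools)

    # Six indicators driven by achievement (signs oriented so higher = better).
    aem, fail, sch, cps, dcr, dar = (noisy_indicator() for _ in range(6))
    # Two gender gap indicators, independent of achievement.
    eng, math_gap = rng.standard_normal(n_schools), rng.standard_normal(n_schools)

    rating = (0.20 * aem + 0.20 * fail + 0.10 * sch + 0.05 * eng + 0.05 * math_gap
              + 0.20 * cps + 0.10 * dcr + 0.10 * dar)
    return np.corrcoef(aem, rating)[0, 1]


print(round(simulated_rating_vs_exam_mark(), 2))
```

Under these assumptions the expected correlation works out to roughly 0.92: the other achievement-linked indicators carry the exam-mark signal into the total even though exam marks themselves are weighted lightly.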
2. Everyone knows that schools which rank the highest are almost always those from higher income neighbourhoods. Upper income neighbourhood schools have far more advantages and resources than schools in lower income neighbourhoods and will obviously do better academically. What’s the point in comparing them?
What the data says: Can most of the variation in the academic performance of schools be attributed to the economic status of a school’s students? Does economic status determine academic performance?
To examine this issue, the authors combined census data from Statistics Canada with enrollment data from Alberta Education to estimate the average parental income of the student body attending each school. (This is a common and useful technique: marketers often use census data in this manner to study the demographics of their customers.)
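The report summary doesn’t spell out how the census and enrollment data are joined. One plausible approach, sketched here purely as an assumption, is an enrollment-weighted average: if you know how many of a school’s students live in each census area, weight each area’s average income by that count. The function and area names are hypothetical.

```python
def estimate_parental_income(students_by_area, area_income):
    """Enrollment-weighted average of census-area incomes (hypothetical method).

    students_by_area: census area id -> number of the school's students living there
    area_income: census area id -> average household income reported for that area
    """
    total_students = sum(students_by_area.values())
    if total_students == 0:
        raise ValueError("school has no students mapped to census areas")
    weighted = sum(count * area_income[area] for area, count in students_by_area.items())
    return weighted / total_students


# Made-up example: 30 students from a lower-income area, 10 from a higher-income one.
print(estimate_parental_income({"area_A": 30, "area_B": 10},
                               {"area_A": 50_000, "area_B": 90_000}))  # → 60000.0
```

An estimate like this is only as good as the mapping of students to areas; a school whose students are spread across a large catchment can be misrepresented by it.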
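Assuming the data is correct, I’ve plotted average exam mark against parental income in the graph below. Parental income is only weakly to moderately correlated with average exam mark: correlation coefficients are 0.56 for Calgary and Edmonton schools, 0.24 for schools outside the cities, and 0.44 overall. An overall correlation of 0.44 means that parental income accounts for only about 19% (0.44² ≈ 0.19) of the variance in average exam marks.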
What can we learn? Although it’s true that higher income schools tend to have higher exam results than lower income schools, there is still a great deal of variation in academic performance that isn’t explained by economic status. Schools matter, and, given similar socioeconomic contexts, effective schools produce better academic results than their peers.
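One way to put “given similar socioeconomic contexts” into practice is to fit a line of exam mark on parental income and look at each school’s residual: schools well above the line outperform peers with similar incomes. This is a minimal sketch of that idea, not something the report does; the school names and numbers in the example are made up.

```python
import numpy as np


def rank_by_income_adjusted_mark(income, exam_mark, names):
    """Return (name, residual) pairs sorted from most to least outperforming.

    The residual is the school's average exam mark minus the mark predicted by a
    least-squares line of exam mark on parental income.
    """
    income = np.asarray(income, dtype=float)
    exam_mark = np.asarray(exam_mark, dtype=float)
    slope, intercept = np.polyfit(income, exam_mark, deg=1)
    residuals = exam_mark - (slope * income + intercept)
    order = np.argsort(-residuals)
    return [(names[i], round(float(residuals[i]), 2)) for i in order]


# Made-up example: two lower-income and two higher-income schools (income in $000s).
print(rank_by_income_adjusted_mark([40, 40, 80, 80], [60, 67, 68, 74],
                                   ["A", "B", "C", "D"]))
# → [('B', 3.5), ('D', 3.0), ('C', -3.0), ('A', -3.5)]
```

In the example, school B comes out on top even though it sits in the lower-income group: its raw mark is lower than D’s, but it beats the line for its income level by more.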
3. The factors that make up the ratings are clearly inter-related and reinforce each other. Larry Booi, former president of the Alberta Teachers’ Association, criticized an earlier version of the ratings that included only five indicators: “A school that scores high on the first factor (diploma exam marks) will almost certainly score high on the second factor (that is, low diploma exam failure rates)… Thus, while the rating is purportedly based on five factors, in reality it is based on five aspects of one factor—diploma examination courses.”
What the data says: The relationships between indicator variables can be seen in the correlation matrix below (AEM: average exam mark; CPS: courses taken per student; DAR: delayed advancement rate; DCR: diploma completion rate; ENG: English 30 gender gap; %FAIL: percentage of exams failed; MATH: Pure Math 30 gender gap; SCH: school vs. exam mark):
Average exam mark is correlated with all indicators except the two gender gap variables. This explains why we observe a strong correlation between the Fraser rating and exam marks despite the low weighting of exam marks in the scoring formula. We can examine the covariance structure more closely by performing and interpreting a principal components analysis (PCA) on the indicator data. (This is another very useful statistical technique for exploring and understanding relationships among attributes in datasets.) The graph of the first two principal components is shown below:
The first principal component (the x-axis) measures academic performance. Good indicators of academic achievement (average exam mark, diploma completion rate, and courses taken per student) have a positive projection onto the first component while bad indicators (delayed advancement rate, percentage of exams failed, and school vs. exam mark) have a negative projection. Academic achievement is the most important factor but not, as Booi suggested, the only factor we have to consider: the first principal component accounts for just 47% of the variance in the original data.
The second principal component (the y-axis) is interesting. Three indicators (diploma completion rate, courses taken per student, and school vs. exam mark) have a positive projection onto the second component, and they account for most of the loading. This means that the second component highlights schools with high diploma completion rates (a good thing) but also high discrepancies between school awarded marks and diploma marks (not a good thing). Let’s call it the grade inflation component.
The third component (which isn’t shown) has equally high loading from the Pure Math and English 30 gender gap indicators. All other indicators have a near-zero projection onto this component. This shows that gender gap is largely independent of the other academic achievement variables, a finding that helps explain a quirk of the ratings that I’ll get to a little later.
To sum up, PCA shows that the first component is academic achievement, the second component is grade inflation, and the third component is gender gap. Taken together, the first three components account for over 80% of the variance in the Alberta high school data set.
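For readers who want to try this on their own indicator table, here is a minimal PCA sketch using NumPy’s SVD on standardized columns, which is equivalent to PCA on the correlation matrix. The preprocessing used in the analysis above isn’t stated, so standardization is an assumption. The example data are simulated, not the Fraser data: two “good” indicators and one “bad” indicator share a latent factor, and a fourth column is independent noise, loosely mimicking the gender gap variables.

```python
import numpy as np


def pca(data):
    """PCA via SVD of standardized columns.

    data: array of shape (n_schools, n_indicators).
    Returns (loadings, explained): loadings[k] is component k's weight on each
    indicator; explained[k] is the fraction of total variance it accounts for.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each indicator
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return vt, explained


# Simulated example (not the Fraser data).
rng = np.random.default_rng(1)
latent = rng.standard_normal(2000)
good_1 = latent + 0.5 * rng.standard_normal(2000)
good_2 = latent + 0.5 * rng.standard_normal(2000)
bad = -latent + 0.5 * rng.standard_normal(2000)
unrelated = rng.standard_normal(2000)

loadings, explained = pca(np.column_stack([good_1, good_2, bad, unrelated]))
print(np.round(explained, 2))
print(np.round(loadings[0], 2))
```

The first component’s loadings on the two “good” columns share a sign and the “bad” column has the opposite sign, mirroring the positive and negative projections described above, while the unrelated column loads near zero. Note that the overall sign of each component returned by an SVD is arbitrary.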
4. The rankings dramatically distort relatively small differences in exam marks, with the result that some schools appear to perform far worse than they actually do. Schools that are separated by over 200 positions in the rankings have less than a 10% difference in their average exam marks. Most schools are separated by a few percentage points. Why place such a big emphasis on such small differences?
What the data says: The distribution of average diploma exam marks has a mean of 63.8 and a standard deviation of 5.4. It ranges from a low of 37.0 (Calgary’s International School of Excellence – a private school) to a high of 85.6 (Edmonton’s Old Scona Academic – a public school). A histogram of the exam scores (with a normal distribution superimposed) is shown below.
Do small differences in average diploma scores between schools matter? It depends on the size of the difference, the number of students taking the diploma exams in each school, and the variance of the test scores in each school. These three factors can be used to perform a difference of means test, a statistical method for comparing means to see if one is significantly different from another. The test allows us to determine when a difference in means is unlikely to be explained by chance and thus points to a real, underlying distinction in the performance of the schools.
A difference of 10 percentage points in the middle of the distribution, it turns out, is highly significant, even for smaller schools (around 30 diploma students). For larger schools (300 or more diploma students), a difference of 3 percentage points in average diploma scores is often significant.
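A minimal sketch of a difference of means test from summary statistics follows. It computes a Welch-style standard error and uses a normal approximation for the p-value, which is reasonable for groups of roughly 30 or more students (a t-distribution would be slightly more conservative for small schools). The within-school standard deviation of 12 marks is a hypothetical value chosen for illustration; the figures above report only the spread between schools, not within them.

```python
from math import sqrt
from statistics import NormalDist


def difference_of_means_p_value(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Two-sided p-value for the difference between two schools' average marks.

    Uses the Welch standard error and a normal approximation to the sampling
    distribution of the difference.
    """
    standard_error = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    z = (mean_1 - mean_2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


SD = 12.0  # hypothetical within-school standard deviation of diploma marks

print(difference_of_means_p_value(70, SD, 30, 60, SD, 30))        # 10 points, 30 students each
print(difference_of_means_p_value(66.5, SD, 300, 63.5, SD, 300))  # 3 points, 300 students each
print(difference_of_means_p_value(66.5, SD, 30, 63.5, SD, 30))    # 3 points, 30 students each
```

With these assumptions the first two p-values come out well below 0.01 and the third is around 0.3, matching the pattern described above: a small gap only stands out from chance when enough students sit the exams.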
5. Schools operate with different missions and in different contexts. They have vast differences in student populations. Some serve special needs students, some have a high proportion of ESL students, some are in small, rural communities, some are in dense, urban neighbourhoods, some select students based on high academic achievement, some serve students with no plans for higher education, some serve the general student population, some are private and exclusive. Yet the Fraser Institute ignores these differences and ranks all schools on the same scale.
What the data says: We work with national retailers operating in multiple markets. To evaluate stores, we often cluster them in terms of retail traffic, urban/rural location, and surrounding demographics. We then compare the performance of stores within each cluster. This allows management to set realistic benchmarks and create suitable programs for each individual store.
Schools, like stores, operate within contexts. Using k-means clustering on school characteristics such as enrollment, location, and parental income, we can identify distinct school segments (e.g., wealthy private schools, large urban schools, small rural schools, etc.) in the Fraser data. Some schools also have a very specific focus: Old Scona selects kids based on high academic achievement, Braemar is a school for pregnant and parenting teens, Jack James prepares students for a transition to the workforce, etc.
Segmenting schools with clustering techniques and an understanding of their specific missions, and then replacing the global ranking with a ranking within each segment, would give parents and administrators a more meaningful basis for evaluating schools.
(The percentages of special needs and ESL students, factors often cited by Fraser critics as important drivers of school achievement, are surprisingly uncorrelated with the average diploma exam mark in the Fraser data.)
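Here is a minimal sketch of the segment-then-rank idea: a plain implementation of Lloyd’s k-means on standardized school features, followed by a ranking within each segment. The features, the choice of k, and the school data are hypothetical; in practice you would want to choose k with care and add mission-based segments (such as Old Scona or Braemar) by hand.

```python
import numpy as np


def kmeans(features, k, n_iter=100, seed=0):
    """Lloyd's k-means on standardized features; returns a cluster label per school."""
    x = np.asarray(features, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each school to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its schools (keep it if the cluster is empty).
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def rank_within_segments(scores, labels):
    """Rank schools by score within their own segment (1 = best)."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores), dtype=int)
    for segment in np.unique(labels):
        idx = np.flatnonzero(labels == segment)
        ranks[idx[np.argsort(-scores[idx])]] = np.arange(1, len(idx) + 1)
    return ranks


# Made-up schools: (enrollment, average parental income in $000s).
features = [[200, 40], [220, 42], [210, 41], [1500, 90], [1600, 95], [1550, 92]]
scores = [60, 65, 55, 80, 75, 85]  # hypothetical average exam marks
labels = kmeans(features, k=2)
ranks = rank_within_segments(scores, labels)
print(labels, ranks)
```

Each school is then compared only with schools in its own segment, so a small school with a 65 can rank first among its peers even though every large school scores higher.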
6. The gender gap variables systematically bias the rankings to favour single-sex (i.e., private) schools. This argument was put forward by Michael Simmonds in his 2012 PhD thesis, Interpreting the Fraser Institute Ranking of Secondary Schools in British Columbia. Simmonds noted that after the gender gap variables were introduced, BC public schools fared significantly worse in the rankings, with a 92% decline in the number of public schools scoring between 9.0 and 10.0, the top end of the scale. By contrast, the number of top-ranking private schools (more of which are boys-only or girls-only) declined by only 32%. This redistributed schools at the top end: public schools made up just 10% of the “best” under the new scoring formula, compared with 46% before the gender gap variables were introduced.
What the data says: Gender gap data is used as an input to a school’s rating if it’s available. If it isn’t, the school vs. exam mark difference is weighted more heavily. As we’ve already seen, the school vs. exam mark difference is correlated with the other academic achievement indicators in the scoring formula, but the two gender gap variables are not.
If you subtract a negatively correlated random variable from another random variable, you increase the variance of the difference more than if you subtract an uncorrelated random variable: high values go higher, low values go lower. High-achieving schools without gender gap data will get a boost in their academic rating (compared to high-achieving schools with gender gap data) because the (negatively correlated) school vs. exam mark difference indicator is weighted more heavily to substitute for the missing (uncorrelated) gender gap variables.
(It’s as though, in a math course, two high-achieving students were asked to take one more test to determine their final score. The first student wrote a 20-question math test; the second student flipped a fair coin 20 times and counted the number of heads. Who ends up with the higher mark?)
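The argument rests on a standard variance identity. Writing X for the part of a school’s rating driven by the achievement indicators and Y for an indicator that is subtracted from it (symbols introduced here for illustration only):

$$
\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\,\operatorname{Cov}(X, Y)
$$

When Y is the school vs. exam mark difference, Cov(X, Y) < 0, so the last term adds variance that lines up with achievement, pushing high achievers higher and low achievers lower. When Y is an uncorrelated gender gap indicator, Cov(X, Y) = 0 and whatever variance Y adds is unrelated to achievement.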
This is one explanation for the improved performance of single-gender private schools in the BC rankings. It also points out the unexpected consequences that can flow from the use of complex scoring formulas.
- The Fraser ratings are highly correlated with academic achievement.
- Given similar socioeconomic contexts, some schools produce better academic results than others.
- The data used in the Fraser ratings measures three components: (1) academic achievement; (2) grade inflation; (3) gender gap.
- Relatively small differences in academic achievement between schools can be statistically significant.
- There are distinct clusters of schools; comparisons within clusters would be more meaningful than the global comparison used by the Fraser Institute.
- The gender gap variables systematically bias the ratings to favour single-gender schools and schools with no gender data.
How do you map the census data to the schools to estimate the socioeconomic status of the parents? I have always been leery of that aspect of the FI rankings: private schools draw from higher socioeconomic strata across very large catchments, much larger than the areas they are nominally “within”. This is also true of “magnet” schools such as French immersion public schools; oftentimes the French schools basically operate like little private schools, creaming off a particular stratum of families from the other schools. If a private school is assigned a large geographic “catchment”, it’ll naturally underestimate the incomes of the parental population. I’d feel better about the income/achievement graph if I knew the incomes were actually measured from the student body, not estimated from geographic principles.