I really want to understand this and I hope you can help me. Matt Townsley’s blog post and a recent direct marketing email from Carnegie Learning have got me thinking.

I’ve had an opportunity to read and share John Hattie’s research on Visible Learning in middle level endorsement classes and I do believe, overall, he’s onto something. There are hundreds of strategies or programs teachers can implement to raise student achievement. Some have greater impact than others.

Hattie’s research has taught me to look for the game changers: any strategy with an effect size of 0.40 or greater is worth implementing. I’m not a statistician, but I’m using that effect size to help me navigate the maze of educational products being touted.

There’s been a lot of PR of late regarding the results of a study on Carnegie Learning’s Algebra 1 Cognitive Tutor. According to the research, students grew from the 50th to the 58th percentile.

This is where I become confused. The research abstract states, “The estimated effect is statistically significant for high schools…” And the conclusions state, “The effect size of approximately 0.20 is educationally meaningful” (page 27).
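For what it’s worth, those two numbers are consistent with each other. If scores are roughly normally distributed (an assumption for illustration, not something the report states), an effect size of 0.20 standard deviations moves an average student from the 50th to about the 58th percentile. A minimal sketch:

```python
# Sketch: convert an effect size (in standard deviations) to a percentile,
# assuming normally distributed scores -- an illustrative assumption,
# not something stated in the RAND report.
from statistics import NormalDist

def effect_size_to_percentile(d, start_percentile=50):
    """Percentile an average student would reach after a gain of d SDs."""
    start_z = NormalDist().inv_cdf(start_percentile / 100)
    return 100 * NormalDist().cdf(start_z + d)

print(round(effect_size_to_percentile(0.20)))  # 58
```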

I’m perplexed. The RAND research says the effect is statistically significant and educationally meaningful; Hattie’s yardstick would say it has low impact.

What are your thoughts? Can you clarify my befuddlement?


At our school right now, I think we teachers would take anything in the positive growth camp. But the administration says not if it costs money, or if it’s too controversial. I like Jiji the penguin; it’s exactly what our ELL students need. But it requires spending money with a group that is not a traditional textbook company. Not gonna happen.

I tried the Carnegie course. I thought it was great for both the low-performing students and the high achievers, since they could go at their own pace. It didn’t help the students whose own pace was somewhat akin to that of a sleeping sloth. Of the textbook companies we evaluated, they were the only one honest about our chances of success if we carry on as we are. Bottom line: it was too expensive for our school. Cheaper to have kids not learn, but pass them anyway.

I liked college prep math. It was cheaper, but too risky for us to use. Not a textbook company.

As I recall, right now there is nothing that actually works to improve math learning for all students at no cost and made by a major corporation. It reminds me of my boss in 1983 who refused to buy a desktop computer until IBM made one.

I’m glad to hear you had a positive experience. Were you part of the study, or just piloting the program? At the middle school level, according to the report, there is some concern with the validity of the experiment.

We’ve been holding out on piloting textbooks until the dust settles on the Common Core. In the meantime we’ve been aligning our current resources, supplementing with new stuff, etc.

So your boss wouldn’t buy a Compaq?!

I’m one of the founders of Carnegie Learning, and I hope I can answer your questions about the study.

Characterizing the effects as “educationally meaningful” or “nearly double the expected gain” follows from a study by Lipsey and others (see http://www.unh.edu/research/sites/unh.edu.research/files/docs/Infrastructure/IES%20-%20Interpreting%20Effects%20of%20Ed%20Interventions%2012-1128%20.pdf). That study reports that year-over-year gains on standardized math tests are about .22-.25 standard deviations for 8th and 9th grade students (Table 5). Cognitive Tutor students gained about .2 standard deviations relative to the control group (i.e. on top of the “normal” gain of .22-.25), which is where the “nearly double” comes from.

Hattie does talk about the difference between a pre- to post-test effect size and an effect relative to a control group, but it isn’t clear that he accounts for this in his meta-analyses. If you assume that the control group gained .25 standard deviations over the school year, then the Cognitive Tutor group gained .45 standard deviations, relative to pretest (the study used a prealgebra test as the pretest and an algebra test as posttest, so you can’t directly measure gain). So maybe Hattie would agree that this is educationally meaningful.
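The arithmetic in the comment above can be sketched directly. Treating Lipsey’s .25 annual gain as an exact figure is an assumption for illustration (the report gives a range of .22–.25):

```python
# Sketch of the arithmetic above: a control-relative effect stacked on top
# of the "normal" year-over-year gain. The .25 figure is from Lipsey et al.;
# treating it as exact is an assumption for illustration.
normal_annual_gain = 0.25   # SDs gained by the control group in a year
effect_vs_control = 0.20    # Cognitive Tutor gain relative to control

pre_to_post_gain = normal_annual_gain + effect_vs_control
print(pre_to_post_gain)                                # 0.45 SDs vs. pretest
print(round(pre_to_post_gain / normal_annual_gain, 2)) # 1.8 -- "nearly double"
```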

Part of the point of the Lipsey work is to challenge the notion that effect size is a “common ruler” that can be consistently used across contexts. Hattie might not agree.

Thank you so much for taking the time to respond. Over the next few days I’ll read the link you’ve provided. As I stated in the post I’m not a statistician, but I’m very interested in learning how to interpret results.

Thanks again.

Going from the 50th to the 58th percentile is not such a big change. To take a more familiar example, that would be like going from a 510 to a 535 or so on the math SAT, a difference of 2 or 3 questions on a 54-question test.

One question is how big the difference was between the regular group and the one receiving the new program. Another is whether that difference was “real” or just due to random differences between the two groups.

Percentiles are a good way to think about the amount of difference. Going from the 50th to the 58th percentile is like moving from 13th place to about 11th in a class of 25 students. For SAT scores it is the difference between a 510 and a 535 on the math section. You can measure differences in terms of standard deviations as well, but I find that less meaningful. Whether you think going from the 50th to the 58th percentile is a lot is, of course, an opinion.
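Following the comment above, here is a rough conversion from percentiles back to SAT-style scores. The mean (500) and standard deviation (110) are approximate figures for the math section, not values from the study:

```python
from statistics import NormalDist

# Rough illustration only: the SAT math mean (~500) and standard
# deviation (~110) are approximate assumed figures, not from the study.
sat = NormalDist(mu=500, sigma=110)

p50 = sat.inv_cdf(0.50)  # score at the 50th percentile
p58 = sat.inv_cdf(0.58)  # score at the 58th percentile
print(round(p50), round(p58))  # roughly 500 and 522
```

A gain of about 22 points, in the same ballpark as the 25-point difference mentioned above.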

In everyday language we often say something is significantly better, meaning a lot better. Statistically significant doesn’t mean that at all. If you have large samples, your results are fairly precise, so it doesn’t take much of a difference to get “statistically significant” results; it just means you think the difference is real.
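That point can be sketched with a small example: the same tiny standardized difference (d = 0.05, a made-up number) is nowhere near significant with 100 students per group, but comfortably “significant” with 5,000 per group:

```python
from math import sqrt
from statistics import NormalDist

# Sketch: the same small standardized difference tested at two sample
# sizes. d = 0.05 is an invented number for illustration.
def two_sided_p(d, n_per_group):
    se = sqrt(2 / n_per_group)  # SE of a standardized mean difference
    z = d / se
    return 2 * (1 - NormalDist().cdf(z))

print(two_sided_p(0.05, 100))   # ~0.72 -- nowhere near significant
print(two_sided_p(0.05, 5000))  # ~0.012 -- "statistically significant"
```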

@l hodge – that’s a good description of statistical vs. practical significance. But predicting the practical effect is more complex than knowing the effect size or percentile rank. For example, the RAND study was designed as an “intent to treat” study, meaning that schools were randomly assigned to use Cognitive Tutor and then counted as using it regardless of their level or quality of implementation. Since the study averages over all of those schools, you’d expect the overall effect size to be smaller than in a study with more control over implementation quality.
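A hypothetical illustration of that intent-to-treat dilution; the implementation rates and per-group effects below are invented, not taken from the RAND study:

```python
# Hypothetical intent-to-treat arithmetic -- all numbers are invented,
# not from the RAND study.
strong_share, strong_effect = 0.6, 0.33  # schools that implemented well
weak_share, weak_effect = 0.4, 0.00      # schools that barely implemented

itt_effect = strong_share * strong_effect + weak_share * weak_effect
print(round(itt_effect, 2))  # 0.2 -- the average washes out the strong sites
```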