Last week I was racking my brain trying to understand the nuances of such research terms as “statistically significant” and “educationally meaningful” as they relate to Carnegie Learning’s Algebra 1 Cognitive Tutor.
Below are their comments: first from one of Carnegie Learning’s founders, then from John Hattie.
I’m one of the founders of Carnegie Learning, and I hope I can answer your questions about the study.
Characterizing the effects as “educationally meaningful” or “nearly double the expected gain” follows from a study by Lipsey and others. That study reports that year-over-year gains on standardized math tests are about .22-.25 standard deviations for 8th- and 9th-grade students (Table 5). Cognitive Tutor students gained about .2 standard deviations relative to the control group (i.e., on top of the “normal” gain of .22-.25), which is where the “nearly double” comes from.
Hattie does talk about the difference between a pre- to post-test effect size and an effect relative to a control group, but it isn’t clear that he accounts for this in his meta-analyses. If you assume that the control group gained .25 standard deviations over the school year, then the Cognitive Tutor group gained .45 standard deviations relative to pretest (the study used a prealgebra test as the pretest and an algebra test as the posttest, so you can’t directly measure gain). So maybe Hattie would agree that this is educationally meaningful.
Part of the point of the Lipsey work is to challenge the notion that effect size is a “common ruler” that can be consistently used across contexts. Hattie might not agree.
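Before getting to Hattie’s reply, it helps to make the arithmetic in those comments concrete. Here is a minimal sketch in Python; the figures are the ones cited above (the Lipsey year-over-year gain and the reported Cognitive Tutor effect), and the variable names are mine, not Carnegie Learning’s.

```python
# A rough sketch of the effect-size arithmetic described above.
# The figures come from the comments; everything else is illustrative.

# Lipsey et al. (Table 5): typical year-over-year gain on standardized
# math tests for 8th/9th graders, in standard deviations.
normal_gain_low, normal_gain_high = 0.22, 0.25

# Reported Cognitive Tutor effect relative to the control group.
tutor_effect = 0.20

# "Nearly double": the treatment group's total gain vs. a normal year.
total_gain = normal_gain_high + tutor_effect   # 0.45 SD relative to pretest
ratio = total_gain / normal_gain_high          # 1.8, i.e. nearly 2x
print(f"Gain relative to pretest: {total_gain:.2f} SD")
print(f"Ratio to a normal year's gain: {ratio:.1f}x")
```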
This is an intriguing argument. Of course, I would want to be careful about using d > .40 as if it applied willy-nilly to everything. It is tough out there making changes, and even changes of d > .20 can be worth striving for. Yes, doubling a control group’s gain is worth looking at. I have split many of the effects in VL into those compared to a control group and those that are more pre-post; even for the former (control vs. comparison), a contrast of .40 is average. So I would not completely agree with the comments above, but would note that they (and indeed all the effects in the VL book) are probabilities. The probability of this program having a worthwhile impact is pretty good, but the true test is your implementation, and this is what I would be focusing on. For example, you may be implementing this program with effects greater than the .20 found in the report (or not), so I would make this my focus.
1.) d > .40 is a guide, not a hard cutoff.
2.) An effect of d > .20 can be worth striving for.
3.) Focus on the implementation. Know whether you are getting effects greater or smaller than those found in the research (a sketch of that check follows below).
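Hattie’s third point, checking your own implementation against the published effect, is something you can estimate locally. Below is a hedged sketch of how one might compute a local effect size with Cohen’s d from two sets of posttest scores; the scores and function name are hypothetical, not from the study.

```python
import statistics

def cohens_d(treatment, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical posttest scores from your own classrooms.
tutor_scores = [72, 68, 81, 75, 79, 70, 84, 77]
control_scores = [65, 70, 66, 72, 68, 74, 63, 69]

d = cohens_d(tutor_scores, control_scores)
print(f"Local effect size: d = {d:.2f}")
print("Above the reported .20?", d > 0.20)
```

With a sample this small the estimate is noisy, but tracking it over time is one way to know whether your implementation is getting more or less than the effects found in the research.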