Evaluating Evaluations of Evaluations

Stark and Freishtat (2014) are pretty negative about student evaluations of teaching. Many of their criticisms, however, seem to be about the general problems of surveys, while other criticisms seem to be about poor tenure-decision processes. Are student evaluations flawed? Yes, but I’m not sure their eight recommendations are the solution.

Recommendations:
1) Drop omnibus items. True, general “teaching effectiveness” measures can be misleading and may be influenced by inappropriate factors, but what measures aren’t? The authors need to give more detail on why omnibus measures are uniquely bad.
2) Don’t average or compare averages. “Instead, report the distribution of scores, the number of respondents, and the response rate” (Stark and Freishtat 2014, page 20). Averages help simplify and are standard in commercial and academic research. Since academics regularly report averages in our own research, we can’t really complain about the use of averages in evaluating us. As for also reporting distributions, I’m all for that.
3) Low response rates aren’t good. True, but this doesn’t mean student evaluations are bad, only that surveys with low response rates are.
4) Look at student comments but understand their limitations. Agreed; hopefully people are already doing this.
5) “Avoid comparing teaching in courses of different types, levels, sizes, functions, or disciplines” (Stark and Freishtat 2014, page 20). It is true that less comparable things are harder to compare, but I’d hate to abandon all assessment of teaching. All courses differ; to assess what works best you must compare, while acknowledging that courses are never totally comparable.
6) “Use teaching portfolios as part of the review process” (Stark and Freishtat 2014, page 20). The authors worry that tenure decisions are made on average student evaluation ratings alone. I agree. Tenure is a big decision, so it deserves considerable effort; smaller decisions don’t justify as much. Is their advice about all review processes or just tenure decisions?
7) Classroom observation is recommended. This may be useful, but while academic observers usually have more subject knowledge than students, in elective subjects they often have much less than the teacher being observed, meaning flaws remain. Observers also typically spend far less time in the classroom than the students do. With fewer observations from fewer observers, idiosyncratic mistakes and biases are likely to loom larger.
8) The authors suggest evaluators spend more time looking at materials and observing during reviews. Using more information will probably improve assessment, but it has downsides. For example, extensive reviews take a lot of time, which probably explains why they sometimes aren’t being done at the moment.
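Recommendation 2 is easy to illustrate with a toy example. Here’s a minimal Python sketch (the scores, enrolment numbers, and the `summarize` helper are all invented for illustration, not from the paper) of reporting a distribution, respondent count, and response rate alongside the mean: two hypothetical classes can share a mean of 3.0 while one reflects consensus and the other polarisation.

```python
from collections import Counter

def summarize(scores, enrolled):
    """Summarize 1-5 ratings the way the paper recommends:
    distribution, number of respondents, and response rate,
    with the mean included only for comparison."""
    counts = Counter(scores)
    return {
        "distribution": {k: counts.get(k, 0) for k in range(1, 6)},
        "respondents": len(scores),
        "response_rate": len(scores) / enrolled,
        "mean": sum(scores) / len(scores),
    }

# Two invented classes of 10 students, 5 respondents each
consensus = summarize([3, 3, 3, 3, 3], enrolled=10)
polarised = summarize([1, 1, 3, 5, 5], enrolled=10)

print(consensus["mean"], polarised["mean"])  # both 3.0
print(consensus["distribution"])             # {1: 0, 2: 0, 3: 5, 4: 0, 5: 0}
print(polarised["distribution"])             # {1: 2, 2: 0, 3: 1, 4: 0, 5: 2}
```

The means are identical, so an average-only report would treat these two teachers as interchangeable; the distributions and the 50% response rate tell a very different story.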

While there are reasonable elements, I worry that these recommendations transfer power from students to senior faculty observers. I fear that academics are too insular, and student evaluations are one of the few ways outsiders can penetrate the academic bubble. Any solution that involves academics being more involved in evaluating academics makes me nervous.

Read: Philip B. Stark and Richard Freishtat (2014), An Evaluation of Course Evaluations, available here