Does better learning affect teaching evaluations? If so, how?
The answer, based on newer research approaches dating from 2010, seems to be that increased learning tends to cause lower scores on students' evaluations of teaching (SET). But this is a complicated issue that has historically been a bone of contention.
There is a huge literature on this topic, and the people who study it most intensely are psychometricians. On a number of points they seem to be in broad agreement, and much of that agreement simply reflects the consensus view of professional psychometricians about measurement in general:
The surveys used for students' evaluations of teaching (SET) should be designed by professionals, and are basically useless if created by people who lack professional expertise in psychometrics. Certain common practices, such as treating ordinal evaluation scores as if they were on a linear (interval) scale, so that they can meaningfully be averaged, show a lack of competence in measurement (see the small illustration after these points).
It's a terrible idea to use SETs as the sole measure of a teacher's effectiveness. Multiple measures are always better than a single measure. But, as is often the case, administrators tend to prefer a single measure that is cheap to administer and superficially appears impartial and scientific.
SETs are increasingly administered online rather than on paper in class. This is a disaster, because response rates for online evaluations are extremely low (usually 20-40%), so the resulting data are basically worthless.
The difficulty of a course or the workload, as measured by SET scores, has nearly zero correlation with achievement.
SET scores are multidimensional measures of multidimensional traits, but they seem to break down into two main dimensions, professional and personal, which are weighted about the same. The personal dimension is subject to biases based on sex, race, ethnicity, and sexual orientation (Calkins).
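To illustrate the point above about averaging, here is a toy sketch in Python (the ratings are invented, not data from any study): two instructors can have identical mean ratings while the underlying distributions describe very different classrooms, which is part of why treating ordinal scores as if they were interval-scaled is criticized.

```python
# Toy illustration (made-up numbers): averaging ordinal ratings hides the shape
# of the distribution. Both instructors average 3.0 on a 1-5 scale, but one is
# polarizing and the other is uniformly middling.
from collections import Counter
from statistics import mean, median

instructor_a = [1, 1, 1, 5, 5, 5]   # polarizing: students love or hate the course
instructor_b = [3, 3, 3, 3, 3, 3]   # uniformly lukewarm

for name, ratings in [("A", instructor_a), ("B", instructor_b)]:
    print(name,
          "mean:", mean(ratings),
          "median:", median(ratings),
          "distribution:", dict(Counter(ratings)))
# Identical means, very different teaching situations -- the single averaged
# number cannot distinguish them.
```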
Getting down to the main question: does better learning affect teaching evaluations?
Before 2010, the best studies on this topic were ones in which students were randomly assigned to different sections of the same course and then given an identical test at the end to measure achievement. These studies tended to show that SET ratings had correlations with achievement of about +0.30 to +0.44. But Cohen says, "There is one study finding of a strong negative relationship between ratings and [achievement]; the highest rated instructors had the lowest performing students. There is also one study finding showing the opposite, a near perfect positive relationship between ratings and achievement." This lack of consistency is not surprising, because we're talking about different fields of education and different SET forms. A typical positive correlation of +0.4 would indicate that 16% of the variance in students' performance could be attributed to differences between teachers, as measured by SETs. Although 16% isn't very high, the sign of the correlation in most of the studies is positive and statistically significant.
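For reference, the 16% figure is just the square of the correlation coefficient (the coefficient of determination):

$$ r^2 = (0.40)^2 = 0.16 \approx 16\% \ \text{of the variance explained.} $$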
But starting in 2010, new evidence arrived that turned this whole picture upside-down (Carrell, Braga). In these newer studies, students were randomly assigned to different sections of a class such as calculus, and were then followed later in their academic careers as they took required follow-on classes such as aeronautical engineering. The Carrell study was done at the US Air Force Academy, where the academy's structure meant attrition was low and students could be required to take the follow-on courses.
Carrell constructed a measure of value added for each teacher based on their students' performance on a test given at the end of the course (contemporaneous value-added), and a separate measure (follow-on course value-added) based on their performance in the later, required follow-on courses.
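To make the two measures concrete, here is a minimal sketch of how a value-added number per professor could be estimated. This is not Carrell and West's actual specification (their paper uses richer controls and corrections), and the column names (intro_score, followon_score, entrance_score, professor) are hypothetical:

```python
# Minimal sketch (not Carrell & West's actual specification): estimate a
# "value-added" number per professor as a professor fixed effect in a
# regression of student scores on a student-ability control.
import pandas as pd
import statsmodels.formula.api as smf

def value_added(df: pd.DataFrame, outcome: str) -> pd.Series:
    """Return one number per professor: the professor fixed effect in a
    regression of `outcome` on an entrance-score control."""
    fit = smf.ols(f"{outcome} ~ entrance_score + C(professor) - 1", data=df).fit()
    fx = fit.params.filter(like="C(professor)")  # keep only professor dummies
    return fx - fx.mean()  # center: effects relative to the average professor

# df = pd.read_csv("sections.csv")  # one row per student, randomly assigned sections
# contemporaneous = value_added(df, "intro_score")    # exam in the intro course itself
# follow_on = value_added(df, "followon_score")       # exams in required later courses
# print(contemporaneous.corr(follow_on))              # Carrell & West find these diverge
```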
Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value-added but positively correlated with follow-on course value-added.
In the authors' words: "We find that less experienced and less qualified professors produce students who perform significantly better in the contemporaneous course being taught, whereas more experienced and highly qualified professors produce students who perform better in the follow-on related curriculum."
Braga's study at Bocconi University in Italy produces similar findings:
[We] find that our measure of teacher effectiveness is negatively correlated with the students' evaluations: in other words, teachers who are associated with better subsequent performance receive worst evaluations from their students. We rationalize these results with a simple model where teachers can either engage in real teaching or in teaching-to-the-test, the former requiring higher students' effort than the latter.
References
Abrami, d'Apollonia, and Rosenfield, "The dimensionality of student ratings of instruction: what we know and what we do not," in The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, eds. Perry and Smart, Springer 2007 - link
Braga, Paccagnella, and Pellizzari, "Evaluating Students' Evaluations of Professors," IZA Discussion Paper No. 5620, April 2011 - link
Calkins and Micari, "Less-Than-Perfect Judges: Evaluating Student Evaluations," Thought & Action, fall 2010, p. 7 - link
Carrell and West, "Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors," J Political Economy 118 (2010) 409 - link
Marsh and Roche, "Making Students' Evaluations of Teaching Effectiveness Effective: The Critical Issues of Validity, Bias, and Utility," American Psychologist, November 1997, p. 1187 - link
Stark and Freishtat, "An Evaluation of Course Evaluations," ScienceOpen, https://www.scienceopen.com/document/vid/42e6aae5-246b-4900-8015-dc99b467b6e4?0 - link