
Prepared by Sheila Rucki

TO:  SRI Task Force
FROM:  Sheila Rucki
RE:  SRI Task Force Proposal



In the spirit of moving the discussion of this group forward and as a faculty representative to this committee, I would like to present the following research and make the following proposal for the use of written student SRI comments in the faculty review process.  I have attempted to incorporate those sentiments around which a consensus seems to be developing and to limit my analysis to peer-reviewed research.



This committee, and the faculty represented in the Faculty Senate,  are in general agreement that the written comments on the revised SRI form can provide useful formative feedback for faculty.  However, the overwhelming majority of the Faculty Senate voted to exclude those written comments from the portfolios, submitted through Digital Measures, for summative review purposes.

Furthermore, the Faculty Senate RTP Committee has expressed concerns about how those comments were used in some portfolios reviewed during the 2012-2013 review cycle along with more general concerns about the ethics and practicality of reviewing those comments.

The challenge for this committee seems to be to devise a way that the information in the written comments can be used productively and fairly.  This challenge is complicated by the absence of research-based evidence that:

  1. the written comments deviate from the numerical data in a statistically significant way and
  2. if such a deviation exists, that the data from the written comments is somehow better or a more accurate reflection of the quality of the instruction than the numerical data.

My literature search has revealed very little research on the relationship between quantitative and qualitative data in SRIs.  This lack of research is confirmed by Wongsurawat (2011).  According to Wongsurawat, the few studies that do focus on the reliability of student comments as a measure of teaching effectiveness reveal mixed results, with one finding a 0.9 correlation with the Likert-scale evaluations and another finding correlations of between 0.2 and 0.5 (Wongsurawat 2011, 68).

As I have said before, if there is no consistent evidence that the qualitative data  deviates from the quantitative data, then its inclusion only increases the amount of data that must be absorbed but does not increase the information available. 

Comparing Quantitative and Qualitative SRI Data

If there is a difference, then the problem is deciding which set of data is a more valid measure of quality of instruction.  When a reviewer uses the written comments to interpret the quantitative data, she is implicitly assuming that the qualitative data is somehow better.  So, to take an anecdote from our discussions, if an instructor receives an overall Likert score of 5 from a class but one or two students also report in the written comments that the instructor was disorganized, what is one to make of that?  Is it that the number would be better if the instructor were better organized (a formative suggestion)?  That the instructor’s disorganization had no impact on the quality of instruction (which means the comments should have no influence at all)?  That the 5 should be considered in the context of the comments and not be taken at face value (a summative conclusion)?  If we choose the final option, what we are saying is that the comments are not only different from the numbers but also better indices of quality of instruction than the numbers and, thus, should be used to discount the quantitative data.  However, I have seen no research to suggest why this might be true.

To extend the anecdote, what if this pattern of high quantitative ratings and negative comments about organization persists over many semesters, creating a “pattern” of complaints about the instructor’s disorganization?  We are still confronted with the same problem:  When the quantitative and qualitative data conflict, why should we act as if the qualitative data is a more valid, rather than just a more visceral, measure of instructional performance?

The research I have found that directly takes on the issue of the reliability and validity of written comments versus the quantitative scoring actually suggests that the test for the validity of written comments is to determine how close the quantitative scoring of each individual author is to the mean score for the course (Wongsurawat 2011).  Thus, this research concludes, in order to evaluate the validity of any comment, or any group of comments, one must know how the author evaluated the course on the Likert-scale part of the SRI.  Those commenters whose Likert score is closest to the mean of all scores should have their comments weighted more heavily and those whose Likert scores deviate from the mean should have their comments discounted.

This is an interesting mechanism for addressing the negativity bias, discussed below, but clearly requires an investment of time beyond what is practical.  Furthermore, in my experience, Digital Measures presents the comments  separately from the quantitative data, making this kind of weighting impossible as well as impractical. 
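
To make the weighting idea concrete, here is a minimal sketch in Python.  It is an illustration only and is not the procedure from Wongsurawat (2011):  the linear weighting rule, the function name comment_weights, and the sample scores are all my own assumptions.  It also assumes that each comment can be paired with its author's Likert score, which is exactly the pairing that is not currently available to reviewers.

```python
from statistics import mean


def comment_weights(likert_scores, scale_range=4.0):
    """Return one weight in [0, 1] per respondent.

    ASSUMPTION: a simple linear rule chosen for illustration, not the
    published method. A respondent whose Likert score equals the course
    mean gets weight 1.0; the weight falls off linearly with distance
    from the mean. scale_range is the width of the scale (5 - 1 = 4 on
    a five-point form).
    """
    course_mean = mean(likert_scores)
    return [
        max(0.0, 1.0 - abs(score - course_mean) / scale_range)
        for score in likert_scores
    ]


# Hypothetical class of five respondents; the last one is an outlier.
scores = [5, 5, 4, 5, 1]
weights = comment_weights(scores)
for score, weight in zip(scores, weights):
    print(f"Likert score {score}: comment weight {weight:.2f}")
```

In this made-up class the mean is 4.0, so the comment from the student who gave a 1 would count for much less than the comment from the student who gave a 4.  Even this toy version requires a reviewer to know every commenter's individual rating.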

Thinking of this issue both as an instructor and a social scientist, I see the formative value of the written comments.  From a formative perspective, the comments may provide information not already factored into the quantitative data, but there is no  evidence that they serve a summative purpose or that they are sufficiently valid and reliable to justify using them to discount the quantitative data.

Given this absence of evidence to justify their inclusion, we can now move to a discussion of the potential harms.  In addition to the issues raised by the Faculty Senate RTP Committee, there is the problem sometimes referred to as the “Amazon Effect” or negativity bias.

Negativity Bias

If there is no clear advantage to including written SRI comments in the summative evaluation of faculty, the question remains as to whether there is harm.  On this point, the psychology literature seems to be clear.  It is also impressively large.  I have attached a review article from the Review of General Psychology (Baumeister et al. 2001) that details the scope of the psychology literature dealing with the phenomenon of negativity bias.  The review of literature concludes that there is an overwhelming consensus within the discipline that the bias is real and that it occurs at an “automatic and not a fully conscious level” (Baumeister et al., 341).[1]  The conclusion is that humans express a negativity bias even if they are unaware of it, or in the case of the research cited by Baumeister et al., even when they deny that such a bias went into their decision process. 

Baumeister et al. cite research as far back as 1965 uncovering the strength of this bias and conclude that negative  information about a person “carries more weight and has a larger impact on impression than good information” (344).  Specifically in the realm of hiring and personnel decision-making, cited research shows that “personnel interviewers use unfavorable information as a basis for rejecting candidates to a greater extent than they use favorable information as a basis for hiring them” (347).  Further research establishes that we spend more time processing negative information as compared to positive information, thus creating stronger memories of the negative (340, 343-344), and that we believe negative behaviors to contain more information about a person’s character than positive behaviors (346).

If we return to our hypothetical professor, this research gives us reason to pause at the claim that a “pattern” might emerge by looking at written comments that allows the reviewer to discount or override the quantitative data.  If this research is to be given credence, we must now ask how heavily negative information has been weighted, perhaps unconsciously, and accept the possibility that a larger body of positive information in the comment data set was either not noticed or dismissed as containing less information.  As noted by Ito et al., “A growing catalog of errors, biases, and asymmetries points to the conclusion that negative information more strongly influences people’s evaluations than comparably extreme positive information…” (1998, 887).


Returning to the questions I posed at the beginning of our meetings, there is no research conclusively supporting the position that the evaluation of teaching from written comments deviates substantially from that of quantitative evaluation.  At best, the research taken as a whole is inconclusive.  However, there is substantial research that suggests that if there is a difference it will almost surely skew negative.  The only model for integrating written SRI comments into the summative evaluation of teaching that I have been able to unearth that attempts to solve this systematic bias requires resources far in excess of what we currently devote to the evaluation of teaching.  I am left to conclude that, despite good intentions, there is little scientific evidence to justify the inclusion of written comments in Digital Measures for the purpose of summative evaluation of instruction.  If we add to this the issues raised by the Faculty Senate RTP Committee, I believe there is no objective reason to continue to use those comments for summative purposes. 

Nevertheless, the discussions of this committee and my own experience suggest to me that there is a consensus that the written comments of the SRI serve two important formative purposes.  First, when the comments correlate with the scores, they provide faculty members with information that may help them identify their own weaknesses and improve upon them.  Second, direct supervisors can use the comments to help faculty members identify weaknesses and provide mentoring or recommend campus resources to help them grow as professors.  If our goal is to provide the best learning experience that we can for our students, these uses of written SRI comments seem appropriate. 

Furthermore, as a member of this faculty, I recognize that there might be colleagues   unable or unwilling to improve their teaching.  In fairness to them and to the investment that this institution makes in its tenure-track faculty, I believe this unwillingness must be demonstrated.  Documented evidence that faculty members with low SRI scores have been notified and provided with proper mentoring and resources to improve their teaching should be a minimum requirement for a negative evaluation of teaching in the tenure review.  We do a disservice to our faculty, our institution, and most importantly to our students if we allow substandard teaching to go unremarked on, with no attempt to remediate the problem, for five years. 


  1. Written comments shall not be automatically uploaded into Digital Measures unless and until they can be anonymously and objectively weighted by reviewers as per the recommendations of Wongsurawat (2011) or a similarly tested mechanism for ensuring the reliability and validity of the data.
  2. Written comments may be uploaded by individual faculty for inclusion in their summative review as per the resolution passed by the Faculty Senate on 20 March, 2013 (See highlighted section in attached Proposed Handbook Language Change).
  3. Original SRI forms, or copies of original SRI forms, will be distributed to faculty, chairs, and deans for the purposes of formative evaluation.  The written comments shall not be presented separately from the Likert ratings for each individual comment sheet.
  4. Recommendations for improvements in instruction derived from formative evaluations, along with recommendations for mentoring or other resources to improve teaching, shall be made in writing, provided to the faculty, and uploaded to Digital Measures.



Works Cited

Ansolabehere, Stephen and Shanto Iyengar.  1997.  Going Negative:  How Political Advertisements Shrink and Polarize the Electorate.  New York:  The Free Press, a Division of Simon & Schuster, Inc.

Baumeister, Roy F., Ellen Bratslavsky, Catrin Finkenauer, and Kathleen D. Vohs.  2001.  Bad Is Stronger Than Good.  Review of General Psychology 5, no. 4:  323-370.

Ito, Tiffany A., Jeff T. Larsen, N. Kyle Smith, and John T. Cacioppo.  1998.  Negative Information Weighs More Heavily on the Brain:  The Negativity Bias in Evaluative Categorizations.  Journal of Personality and Social Psychology 75, no. 4:  887-900.

Wongsurawat, Winai.  2011.  What’s a comment worth?  How to better understand student evaluations of teaching.  Quality Assurance in Education 19, no. 1:  67-83.


Proposed Handbook Change Language

Student Ratings of Instruction


Change #1

V. C. 4. (a)

Current Language



C.            Definitions

 (4) Student Ratings of Instruction:

 (a) All performance reviews shall include student ratings of instruction for all classes assigned using the approved “Student Ratings of Instruction” (SRIs) form. Exceptions include:

 (i)   Field experiences and internships as determined by the Department; and

 (ii)   Classes with fewer than five students must be evaluated according to Department Guidelines.


Proposed Language

(4) Student Ratings of Instruction:

 (a) All performance reviews shall include the quantitative summaries of student ratings of instruction for all classes assigned using the approved “Student Ratings of Instruction” (SRIs) form. Student comments from the SRIs will only be included if the faculty member includes all of them from the selected section in their ‘Additional Materials for Review.’ Exceptions include:

 (i)   Field experiences and internships as determined by the Department; and

(ii)   Classes with fewer than five students must be evaluated according to Department Guidelines.



[1] Cognizant of the age of this review, I conducted a citation search.  What I found is that subsequent to this review there has been little independent research on this issue and recent citations of this article and the articles cited within it tend to treat the reality of negativity bias as a given.  I have not found any research denying the existence of negativity bias published in a major psychology journal since 2001.  Furthermore, I have found multiple uses of this research in fields well outside psychology, including political science research into the power of negative campaigns.  This research concludes that although consumers of political advertising report their immunity to negative advertising (to the point of saying they vote against candidates who run negative campaigns), the evidence suggests that negative campaign advertising is much more effective than positive campaign messages in establishing the identity of the candidate in the eyes of the voter.  See Ansolabehere and Iyengar (1997) for example.
