I have been thinking about my response to the topic of robograders and automatic assessment as I have read what my respected colleagues in CS, math, science, and education generally -- all-around great educators -- have written on the subject.

Once again I find myself in the unpopular position of defending not the current implementation but the idea. In the article linked below, the author makes the final point, "If it is worth the time for the student to do the work, isn't it worth our time to grade it?" This is a false comparison, IMHO: the student is grokking, learning, and playing with the concept; I am initially just verifying the correctness of the solution.

Of course, when necessary I dive further into the answer to see how and why they made certain choices, but I fail to see why I couldn't have the computer make the first pass so I can focus quickly on those who need immediate help. A well-designed question tests not just whether they got the answer but how they got it and what misconceptions they may have, whether with fractions, units, definitions, or something else. Is the technology there yet to do this effectively? On the whole, no, but these things have to start somewhere, all the more so with machine learning, which depends on large data sets to improve.
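
To make "first pass" concrete, here is a minimal sketch (Python, with invented distractor values and misconception labels; a real item would get those from the question author, not from code like this) of a checker that verifies a numeric answer and, when it is wrong, compares it against the answers that known misconceptions would produce:

import math
from dataclasses import dataclass

@dataclass
class Diagnosis:
    correct: bool
    note: str          # what shows up on the teacher's dashboard

def first_pass(response, answer, distractors, rel_tol=0.01):
    """Correctness check, then a lookup against answers that known
    misconceptions would produce (values supplied with the question)."""
    if math.isclose(response, answer, rel_tol=rel_tol):
        return Diagnosis(True, "correct")
    for wrong_value, misconception in distractors.items():
        if math.isclose(response, wrong_value, rel_tol=rel_tol):
            return Diagnosis(False, "likely misconception: " + misconception)
    return Diagnosis(False, "incorrect, no known pattern -- needs a human look")

# Hypothetical item: "How far does a car travel in 30 s at 20 m/s?" (600 m)
distractors = {
    0.667: "divided speed by time instead of multiplying",
    50: "added speed and time",
    600000: "reported millimeters as meters (unit slip)",
}
print(first_pass(600000, 600, distractors))   # flags the unit slip, not just "wrong"

The point is not the code; it is that the dashboard entry can say "likely unit slip" instead of just "wrong," which tells me who to sit down with first.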

I would hate for this to be taken as an endorsement of those currently using automated grading, especially for high-stakes grading, or of those who are trying to diminish or eliminate the role of the educator; nothing is more abhorrent to me. But not spending my entire weekend grading, and instead opening a dashboard with a detailed diagnosis that I can use to drive instruction, support, and further challenge my students, sounds empowering to me as an educator.
Can mentors be replaced by robo-readers? Josh Nutzman talks about the possibility of automated grading programs.
5 comments
 
I'm with you, +Phil Wagner, stay strong. The truth is, the better we get at automating grading (designing better tests where necessary), the more opportunities to learn there are for those who wish to learn.
 
I like the Google Form Summary because it lets me see all the answers to a given question together, so I can judge my teaching and know what needs reteaching.
 
Well, I disagree. (Shocker.) But I'll disagree about the research and the tools for a reason that I haven't really articulated yet in all the various posts I've written criticizing the robo-graders.

I don't think the way in which this research is being reported -- that automated grading software now performs the same as humans -- is actually an accurate description of what's going on. Who exactly are these human graders? What do we know about how well they "read" to begin with? Are we talking about jobs like this -- http://www.ehow.com/how_4805077_work-home-scoring-tests-pearson.html? A position where you make minimum wage scoring tests, probably with a rubric beside you that is fairly mechanistic and automated to begin with?

A computer "making the first pass" -- does that mean asking students to run spell check? Asking them to run grammar check? Or what about asking to see earlier versions so you can make sure students have thought through and edited their work?

I find robo essay grading abhorrent, not only because I think it sends the message to students that what they write isn't worth being read by a human, but also because it's an admission that what we're doing is turning out tons of folks who read and write like computers.
 
+Audrey Watters I am talking about lower-level checks: does it work, is it correct? Those are things that can be caught quickly and easily. Don't get me wrong -- anything that a student writes or makes is worth reading. But are schools assigning things worth doing, or do they stop at what can be run through the machine?

How difficult is it to read and fully enjoy a student's essay when you are thinking about mechanics? Could automatic feedback help the student refine their work so that it becomes something they can be proud of? I guess not in high-stakes grading situations, but those are things I try to avoid. Plus, I don't think students going into the SAT or GRE have deep expectations beyond passing the test; whether the test is a valid measure of ability at all is another story entirely. I try to base my grading on something greater than what can be Googled, plugged into a calculator, or run through a spell checker. Besides, with the rise of the robograders comes the rise of smarter ways of cheating.
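
For the CS case of "does it work, is it correct," the quick-and-easy first pass could look something like this sketch (the exercise, function names, and test cases are all made up for illustration):

def check_submission(student_fn, cases):
    """Run a submitted function against a few test cases; return the failures."""
    failures = []
    for args, expected in cases:
        try:
            got = student_fn(*args)
        except Exception as exc:               # a crash is worth flagging too
            got = "raised " + type(exc).__name__
        if got != expected:
            failures.append((args, expected, got))
    return failures

# Imagined exercise ("write mean(xs)") and two imagined submissions:
def mean_ok(xs):
    return sum(xs) / len(xs)

def mean_buggy(xs):
    return sum(xs) / (len(xs) - 1)             # off-by-one: quickly, easily caught

cases = [(([2, 4, 6],), 4.0), (([5],), 5.0)]
for fn in (mean_ok, mean_buggy):
    fails = check_submission(fn, cases)
    print(fn.__name__, "needs a human look:" if fails else "passes the first pass", fails)

Anything that fails still gets read by a human; the pass just orders the pile.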
 
There's clearly a line there somewhere. Or at least, there's one for me. On the one hand, we've been playing around with automatic assessment for ages in both testing and instruction. Intelligent Tutors, Cognitively Based Instruction, Mastery Learning -- all of that is based on the idea that the computer can provide some guidance to the learner as they navigate the space. There's plenty of room for automatic something-or-other.

Let's look at essay scoring as a special case of the Problem (the capital-P Problem): a complex student response to a particular challenge. In some domains, this is a serious (actual) problem: proving a theorem, analyzing data, designing an experiment, designing a user interaction, creating and supporting an argument.

In the case of the Problem, I do not like any kind of real automatic scoring. To be flippant, I'd say that anything amenable to auto-coding isn't really a Problem; it's probably an exercise (writing a five-paragraph essay on the plot of a book is an exercise; arguing about the motivation of a character in the historical context of the novel is a Problem).

To be more thorough: I think that any student who works on a real Problem needs frequent and supportive formative assessment. Automatic coding might be fine for discriminating Good from Bad, but I doubt it would be able to provide substantive feedback to the student. For that reason, in the inquiry-support software I did my dissertation on (and am currently rewriting from scratch), I ask students to assess themselves and each other instead of trying to assess the quality of the hypothesis automatically. That's a digression, though, so I'll get back to the main idea.

I could imagine an auto-scorer that let you assign Problems frequently but manually grade (and provide feedback on) only the weakest students' work, plus a rotating, randomly selected sample of other students' work, so you can make sure you're giving the most help to the people who need it most.
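
As a sketch of that triage idea (all names, scores, and thresholds below are invented, and the auto-scores themselves are assumed to come from somewhere else), the queue for manual feedback might be built like this:

import random

def manual_review_queue(auto_scores, weakest_n=5, sample_n=3, rng=None):
    """auto_scores: dict mapping student -> automatic score (higher is better).
    Returns who gets hand-graded feedback this round: the weakest plus a
    rotating random sample of everyone else."""
    rng = rng or random.Random()
    ranked = sorted(auto_scores, key=auto_scores.get)     # weakest first
    weakest = ranked[:weakest_n]
    rest = ranked[weakest_n:]
    rotating = rng.sample(rest, min(sample_n, len(rest)))
    return weakest + rotating

# An invented class of twelve with invented auto-scores:
scores = {"student_%02d" % i: s for i, s in enumerate(
    [42, 91, 67, 55, 88, 73, 39, 95, 61, 70, 84, 58])}
print(manual_review_queue(scores, weakest_n=4, sample_n=2, rng=random.Random(0)))

Over enough rounds the rotating sample means everyone's work still gets a human read; the struggling students just get it every time.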

I would still rather the answer be: adjust teacher workloads so that they have a small enough total number of students and a sane enough content load that this is a non-issue. If you can have N students work through a lot of interesting Problems as they learn Foo and provide feedback to them all, that's a good class size and set of standards. If we have to cut corners on feedback, either to cover the material or to triage the especially needy, that's something that needs to change.