SIGCSE 2004 DC Application
Titus Winters (titus@cs.ucr.edu)
Department of Computer Science & Engineering
Surge Building Room 281
University of California at Riverside, 92521
Phone: 909 262 0385
Keywords: Automated grading, database, KDD, curriculum assessment
Personal Homepage:
http://www.cs.ucr.edu/~titus
Advisor: Tom Payne (thp at cs ucr edu)
The Archive
Introduction
The main thrust of the recent accreditation requirements from ABET, the Accreditation Board for
Engineering and Technology, is that accredited engineering
programs must have in place, and demonstrate use of, a
"continuous-improvement process." Much like the "Total Quality
Management" phenomenon that swept through industrial process
development in the 1990s, a continuous-improvement process means one
thing: feedback. The output from the system, in this case the
educational program, must be taken into account in the early stages of
each cycle of the system and adjustments made in an attempt to
increase the quality of the next batch of outputs.
The Plan at UCR
Many institutions (plan to) use surveys and course grades as
"evidence" of the educational process. In contrast, the CS&E
Department at UCR seeks to apply tools from data
mining, knowledge discovery, and machine learning to the problem of
measuring instructional effectiveness. The goal is to gather detailed
information on the assessment of all of our students in every course,
down to the individual question level. In our opinion, a student's
course grade, or even aggregate score on an assignment or exam, fails
to capture the student's knowledge as it pertains to our educational
program objectives. When taken individually, a question that involves
linked lists gives us some information about whether a given student
understands linked lists, while the aggregate score on an exam
that tests various data structures can only imply a student's overall
level of understanding.
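The point is easy to see in miniature. The sketch below (illustrative Python, not our actual data format; the names and scores are hypothetical) shows two students with identical exam totals, where only the question-level view reveals that one of them has not understood linked lists at all.

    # Illustrative only: per-question scores, keyed by the topic each question tests.
    exam = {
        "alice": {"linked-list-1": 0, "linked-list-2": 0, "tree-1": 5, "hash-1": 5},
        "bob":   {"linked-list-1": 3, "linked-list-2": 2, "tree-1": 3, "hash-1": 2},
    }

    for student, answers in exam.items():
        total = sum(answers.values())
        linked_lists = sum(v for q, v in answers.items() if q.startswith("linked-list"))
        # Both totals are 10, but only the question-level view exposes the gap.
        print(student, "total:", total, "linked-list points:", linked_lists)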
Archiving this information, for all questions, for all students, in
all courses in our department is a massive undertaking. However, once
the data is captured, we anticipate the use of Bayesian networks to
extract the relationships between each individual unit of assessment
and our program objectives. Once those relationships are extracted,
and given a student's performance history on units of assessment
pertaining to objective O, we can estimate, on a [0, 1] scale, the
degree to which that student has reached the objective. Aggregated
across our entire student population, this yields a number that in
some sense represents how well our students know that material. If
that estimate is too low, appropriate action must be taken.
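As a concrete illustration of the kind of estimate we have in mind, the sketch below uses a simple Beta-Bernoulli update rather than a full Bayesian network (the function and data are hypothetical Python, not part of the Archive): it turns a student's pass/fail history on the assessment items linked to an objective into a value in [0, 1], and then averages that value across the student body.

    # A minimal sketch, not the planned Bayesian network itself: a Beta-Bernoulli
    # estimate of how far a student has reached an objective, given 0/1 outcomes
    # on the assessment items linked to that objective.
    def objective_estimate(outcomes, prior_success=1.0, prior_failure=1.0):
        """Return an estimate in [0, 1] from a list of 0/1 item outcomes."""
        successes = sum(outcomes)
        return (prior_success + successes) / (prior_success + prior_failure + len(outcomes))

    # Averaged over the student body, this gives one number per objective that
    # can be tracked from one offering of the program to the next.
    students = {"alice": [1, 1, 0, 1], "bob": [0, 1, 0, 0]}
    program_level = sum(objective_estimate(h) for h in students.values()) / len(students)
    print(round(program_level, 2))   # 0.5 for this hypothetical data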
Goals of the Research
Our primary goal is to increase the effectiveness of our educational
program. Our hope is that we can use the process described above to
bring our instructional efforts out of the realm of qualitative,
irreproducible events, and into the realm of numerically verifiable,
quantitative experiment. (Our approach has been described as akin to
the difference between Aristotelian and Newtonian dynamics.) KDD on
educational data has been an active area of research for some time ([1]).
Current Status
Obviously, a project of this scope requires extensive groundwork.
Additionally, much of that groundwork is necessary in order to overcome
the distaste that research faculty have for time-consuming
modifications to their instructional techniques. I hope to enable and
encourage faculty to record data at this fine granularity by providing a
set of tools that reduce the time that instructional tasks,
such as assessment/feedback and test preparation, take, while gathering
data as a side effect.
Agar
My primary task is currently the development of Agar, a framework for
automating grading. Agar is unique and interesting because of two
main features: the tool framework and the comment system.
Agar was developed with the intent to be as general as possible, with
the idea that if the functional tests provided with Agar are
found to be insufficient, a grader or instructor should be able to
develop new tests for the assignment in question with relative
ease. To this end, rather than following the standard
high-performance route for plugin development using dynamically
linked libraries, Agar takes a much simpler approach. Tools
are written to respond to the "--help" argument with a list of
command-line parameters, which Agar then parses
to create a dynamically generated dialog box within the GUI,
allowing configuration of the functional tool in question.
Additionally, tools must respond to "--name" with a human-readable
name describing the tool (for example, "C++ Driver Tests" or "Detect
Line Wraps"). Finally, when the test is executed on the appropriate
input (either the compiled executable, if applicable, or the source
files for the assignment), the tool must exit with return code 0 for
success and 1 for failure, in which case the contents of the tool's
standard output stream will be appended to the student's results.
Other features, such as conditional execution of tests and dynamic
submission identification, add ease of use, but the main power of Agar
for functionality testing is this tool interface.
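To make the interface concrete, the sketch below shows what a minimal conforming tool might look like (hypothetical Python; the line-length check and the --max-columns parameter are illustrative, only the --help/--name/exit-code behavior is the protocol described above): it answers --name and --help, prints its feedback on standard output, and signals failure with exit code 1.

    #!/usr/bin/env python
    # Illustrative sketch of a tool conforming to Agar's interface; the check it
    # performs (flagging overlong source lines) is an example only.
    import sys

    args = sys.argv[1:]
    if "--name" in args:
        print("Detect Line Wraps")       # human-readable name shown in the GUI
        sys.exit(0)
    if "--help" in args:
        print("--max-columns N  maximum allowed line length (default 80)")
        sys.exit(0)

    max_cols, paths, i = 80, [], 0
    while i < len(args):
        if args[i] == "--max-columns":
            max_cols, i = int(args[i + 1]), i + 2
        else:
            paths.append(args[i])
            i += 1

    failed = False
    for path in paths:                   # source files passed in by Agar
        with open(path) as source:
            for number, line in enumerate(source, 1):
                if len(line.rstrip("\n")) > max_cols:
                    # Anything printed here is appended to the student's results.
                    print("%s:%d exceeds %d columns" % (path, number, max_cols))
                    failed = True

    sys.exit(1 if failed else 0)         # 0 = success, 1 = failure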
While automated grading is by no means a new idea ([4], [6], [7]),
I feel that Agar represents a significant advance over previous
attempts in that it has also shown greatly reduced effort required to
grade non-programming material such as written work and quizzes. The
automated testing features of Agar make it ideal for grading
programming work of all kinds, but the real benefit comes from the
time-savings for human graders in providing detailed
feedback. Additionally, Agar is intended to be fully open source, so
anything that isn't handled by the tool interface can be added to the
system internally.
Any grader can attest that different students make the same mistakes. A
simple CS1/CS2 example would be forgetting to write a base case for a
recursive function on a quiz. Using the Agar framework, the first
time a human grader finds such a problem, they create a new Comment,
assign a point value (positive for a bonus, zero to just write a
comment, and negative for a penalty) to the Comment, and write out a
note to the student. A drag-and-drop system within the Agar interface
then allows that Comment to be assigned to any other student who is
found to have made the same mistake. Further, since Comments are assigned
by reference, the point value or feedback can be changed later, and all
submissions that received that Comment will automatically be
updated.
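The by-reference behavior is simple to illustrate (hypothetical Python, not Agar's internal data model): submissions hold references to shared Comment objects, so a later change to a Comment's point value or text is reflected in every submission it was attached to.

    class Comment:
        def __init__(self, points, text):
            self.points = points         # positive bonus, zero note-only, negative penalty
            self.text = text

    class Submission:
        def __init__(self, base_score):
            self.base_score = base_score
            self.comments = []           # references to shared Comment objects

        def score(self):
            return self.base_score + sum(c.points for c in self.comments)

    missing_base_case = Comment(-2, "Recursive function has no base case.")
    alice, bob = Submission(10), Submission(10)
    alice.comments.append(missing_base_case)
    bob.comments.append(missing_base_case)

    missing_base_case.points = -3        # later adjustment propagates to both
    print(alice.score(), bob.score())    # 7 7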
This comment system allows for much greater feedback to be generated
for each student in a much shorter amount of time. For C++ homework
in our lower-division courses, a two-week project for a class of 60-70
students now takes about 4-6 hours to grade, record scores, and
generate and send out detailed feedback to students. Previously, lower
quality feedback and less accurate grading took upward of 10-15
hours. Similarly, 12 written problems for the same course were graded
and commented on in just under 5 hours, or slightly less than 5
minutes per student. Students in this course have expressed how
helpful they find it to get their work returned to them within 1-2
days of turning it in, with detailed comments and feedback emailed
to them while they still remember what the assignment involved.
Graders are similarly pleased in that they get to do more in less
time, and no longer have any bookkeeping to do, since Agar automatically
exports its results to a course grade book in the form of a
spreadsheet, with options to export to the campus BlackBoard system coming soon.
PACE: Program for Accelerated Creation of Exams
The second prong of our attack against instructional inertia is the
development of a tool for building question banks for exams. This has
been done many times before, including commercial efforts such as Respondus, but we need a
tool that will interface with our Archive, store student results
courtesy of Agar, and allow us a bit more freedom in how we manipulate
the question data. PACE is that tool. PACE is in its early testing
stages, but we believe that by generating question banks for each
course, instructors (especially in courses that change instructors
often) will be more inclined to teach similar material,
since the effort of assessing that material will already have been done
and provided for them by the previous instructor.
Interim Conclusions
Fine-grained information is always useful. It reveals immediately
that on most exams there are some questions that are poorly worded,
that some topics were not as well understood by the class as the
instructor might think, and possibly even that some questions were
graded with an incorrect or incomplete answer in mind.
We have also discovered that computer assisted grading, using tools
such as Agar, can greatly reduce the amount of time necessary
both to perform basic grading and evaluation and to provide detailed
feedback to the students. We have cut the time requirements to grade
programming homework by 50-60%, while increasing the detail of our
records, the quality of feedback to the students, and the consistency
of the grading and clerical reporting. More interestingly, non-programming
homework can still be graded with significant
time savings using a tool like Agar. We hope that Agar and other
user-interface tools will inspire instructors to perform the detailed score
recording that would be too tedious to do by hand, while saving them
time overall.
Open Issues
There are a number of issues that are very uncertain at this stage of
development, and a frighteningly large number of them could be
"deal-breakers" with regard to the final completion of the project as
currently envisioned. A brief list of these concerns includes:
- How can we most efficiently gather problem-level data? Can we
make assessment tools easy enough to use that professors, many of whom are
stubbornly set in their ways and don't want to "waste" time on
clerical aspects of instruction, will be willing to adopt them?
- Can Bayesian analysis be performed effectively on a matrix of
2,500-25,000 cells?
- Would unsupervised learning, such as simple clustering, provide
better information about which program objectives an item of
assessment pertains to?
- Is an individual student's performance so "noisy" that it will
invalidate the final data mining? Students are difficult
experimental subjects on an individual level, since individually they
have illnesses, personal issues, other classes, late-night parties,
and so on.
Current Stage in my Program of Study
I am currently finishing up my coursework and preparing for my Oral
Examination. I intend to advance to Candidacy for a PhD in March of
2004.
What I Hope to Gain from Participating in the Doctoral Consortium
Since there is no formal Computer Science Education program at UCR, my
focus has often been guided by the instructional problems that we are
facing at any given time. My hope is that by attending the Doctoral
Consortium I will gain exposure to more academic and formal approaches
to CSE, and also be able to publicize the work that we are undertaking here
at UCR. I feel that the DC will be a wonderful opportunity for me to
begin networking with other researchers in CSE, which is extremely
important as my work is mostly being performed in isolation.
Bibliographic References
- ABET
- BlackBoard
- The New Automated Grader Master Page
- On Automated Grading of Programming Assignments in an Academic Institution
- Respondus
- Lass, et al. Tools and Techniques for Large Scale Grading using Web-based
  Commercial Off-The-Shelf Software. SIGCSE Conference on Innovation and
  Technology in CSE, 2003.
- Using KDD To Analyze the Impact of Curriculum Revisions in a Brazilian
  University