Walking the rake: 10 critical mistakes in developing a knowledge test

Before enrolling students in the new Machine Learning Advanced course, we test them to determine their level of readiness and to understand what exactly we need to offer them to prepare for the course. But a dilemma arises: on the one hand, we have to test their knowledge of Data Science; on the other hand, we cannot arrange a full-fledged 4-hour exam.

To solve this problem, we set up a TestDev headquarters right inside the Data Science course development team (and it seems that this is just the beginning). Here is a list of 10 "rakes" that people step on when developing knowledge tests. We hope the world of online learning becomes a little better after this.

Rake 1: Do not clearly define the goals of testing

To define the goals correctly and write a test that takes them into account, we must answer a few questions for ourselves at the planning stage:

  1. What are we actually checking? 
  2. In what environment will testing take place and what mechanics are used? What are the limitations of this environment? Answering this also clarifies the technical requirements for the device on which the test will be taken and for the content (if the test is taken on phones, pictures should be readable even on a small screen, it should be possible to enlarge them, etc.).
  3. How long will the testing take? You need to think about the conditions under which the user will take the test. Might they need to interrupt the test and resume it later?
  4. Will there be feedback? How do we form and deliver it? What should the test taker receive? Is there a time gap between taking the test and getting feedback?

In our case, having answered these questions, we defined the following list of goals for the test:

  1. The test should show whether future students are ready to take the course, whether they have enough knowledge and skills.
  2. The test should give us material for feedback and indicate the topics in which students made mistakes so that they can improve their knowledge. We explain how to do this further on.

Rake 2: Do not draw up terms of reference (TOR) for the expert who compiles the test

To compose test items, it is very important to involve an expert in the field in which knowledge is being tested. The expert, in turn, needs a well-prepared TOR: a description that includes the topics of the test, the knowledge and skills being tested, and their level.

An expert will not write such a TOR for himself, because his job is to come up with tasks, not the structure of the test. Moreover, few people develop tests professionally, even among those who teach. This is taught as a separate specialty: psychometrics.

If you want a quick introduction to psychometrics, there is a summer school in Russia for anyone interested. For more in-depth study, the Institute of Education offers master's and postgraduate programs.

When preparing the TOR, we collect a detailed description of the test for the expert (or better, together with him): the topics of the tasks, the types of tasks, and their number.

How to choose the type of tasks: having decided on the topics, we decide which tasks can best check them. Classic options are open-answer tasks, multiple-choice or single-choice tasks, matching, etc. (don't forget the technical limitations of the environment in which the test is taken!). Once the task types are chosen and written down, we have a ready-made TOR for the expert. You can also call it a test specification.
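A test specification of this kind can be kept as plain structured data. The sketch below is only an illustration: the class name `SpecRow`, its fields, and the example topics are assumptions made for this example, not the specification we actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecRow:
    topic: str      # topic the tasks belong to
    skill: str      # knowledge or skill being checked
    task_type: str  # e.g. "single_choice", "multiple_choice", "open_answer"
    n_tasks: int    # how many tasks the expert should write


# Hypothetical rows; real topics come from the course prerequisites.
SPEC = [
    SpecRow("linear models", "interpret regression coefficients", "single_choice", 2),
    SpecRow("metrics", "choose a metric for imbalanced classes", "multiple_choice", 2),
    SpecRow("python", "read a short pandas snippet", "open_answer", 1),
]


def total_tasks(spec):
    """Total number of tasks the expert is asked to write."""
    return sum(row.n_tasks for row in spec)


def tasks_by_type(spec):
    """Number of tasks per task type, useful for checking platform limits."""
    counts = {}
    for row in spec:
        counts[row.task_type] = counts.get(row.task_type, 0) + row.n_tasks
    return counts
```

Keeping the specification in one place makes it easy to see at a glance how many tasks of each type the expert has to prepare.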

Rake 3: Do not involve an expert in test development

When bringing an expert into test development, it is very important not only to show him the "scope of work" but also to involve him in the development procedure itself.

How to make working with an expert as efficient as possible:

  • Prepare the expert in advance and spend some time talking about the science of test development, psychometrics.
  • Focus the expert on creating a valid and reliable assessment tool rather than a list of questions.
  • Explain that his work includes a preparatory stage, not only the development of the tasks themselves.

Some experts (by temperament) may perceive this as a check of their own work, so we explain that even excellent tasks may simply not fit the specific goals of testing.

To speed up the process, we prepare a topic coverage table (knowledge and skills) together with the expert; it is part of the test specification. This table lets us work out the questions precisely and determine what we will measure. It can be drawn up a little differently in each case. Our task is to check how well a person has mastered the knowledge and skills of the previous, basic courses, in order to understand how ready he is for the new course.

Rake 4: Thinking the expert "knows best"

The expert does know the subject better, but does not always explain it clearly. It is very important to check the wording of the tasks and to write clear instructions, for example, "Choose 1 correct option." In 90% of cases, experts prepare questions in a way that they themselves understand. And that's okay. But before the test goes to those who will take it, everything needs to be checked and polished so that test takers understand exactly what is required of them and do not make mistakes just because they misinterpreted the text of the task.

To avoid ambiguous readings of tasks, we conduct "cognitive laboratories". We ask people from the target audience to take the test while saying out loud what they are thinking, and we record it in detail. In "cognitive laboratories" you can "catch" incomprehensible questions and bad wording, and get the first feedback on the test.

Rake 5: Ignore test execution time

sarcasm mode: on
Of course, our test is the best, everyone dreams of passing it! Yes, all 4 hours.
sarcasm mode: off

When you have a list of everything that can be checked, the main thing is not to check all of it (sounds strange at first glance, doesn't it?). You need to cut ruthlessly, highlighting the key knowledge and skills together with the expert (yes, a number of skills can also be checked in a test). We look at the types of tasks and estimate the target completion time: if it still exceeds reasonable limits, we cut again!
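The time estimate can be done on the back of an envelope or with a few lines of code. The following is a minimal sketch: the per-task minutes in `MINUTES_PER_TASK` and the draft task counts are made-up numbers for illustration and should be calibrated on a pilot run.

```python
# Assumed average minutes per task type (hypothetical values).
MINUTES_PER_TASK = {
    "single_choice": 1.0,
    "multiple_choice": 1.5,
    "open_answer": 5.0,
}


def estimated_minutes(task_counts):
    """Estimated completion time for a draft given {task_type: number_of_tasks}."""
    return sum(MINUTES_PER_TASK[task_type] * n for task_type, n in task_counts.items())


def over_budget(task_counts, budget_minutes):
    """Minutes that still have to be cut to fit the budget (0.0 if the draft fits)."""
    return max(0.0, estimated_minutes(task_counts) - budget_minutes)


draft = {"single_choice": 20, "multiple_choice": 10, "open_answer": 4}
print(estimated_minutes(draft))  # 55.0
print(over_budget(draft, 30))    # 25.0
```

With a 30-minute budget, this draft would have to lose about 25 minutes of tasks, which is a useful number to bring to the conversation with the expert.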

To reduce the volume, you can also (carefully) try testing two skills in one task. In this case it is harder to understand why a person made a mistake, but if done correctly, both skills can be taken into account. It is important to make sure that the 2 skills belong to the same area of expertise.

Rake 6: Don't think through the scoring system

Assessment tests often use a classic scoring system, for example, 1 point for easy tasks and 2 points for difficult ones. But it is not universal. The sum of test scores alone tells us little: we don't know which tasks the points were earned on, and we can only determine the number of correct tasks. We need an accurate understanding of exactly which skills test participants demonstrate. In addition, we want to give them feedback on which topics need improvement.

After all, we are building a test that divides people into those who are ready and those who are not ready for the program; we will advise some of them to prepare for the course using free training. It is important to us that only those who really need it and are ready for it end up in this group.

What we do in our situation: within the working group of test developers, we determine which groups of people need to be distinguished (for example, ready to learn, partially ready) and form a table of characteristics of these groups, indicating which skills and knowledge are relevant for the group that is ready to learn. This is how the "difficulty" of tasks for such tests can be formed.
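One way to move from a single total score to groups is to count results per topic and compare them with the table of group characteristics. The sketch below only illustrates the idea: the task-to-topic mapping, the thresholds in `REQUIRED`, and the group labels are assumptions for this example, not our actual table.

```python
from collections import defaultdict

# Hypothetical mapping of task id -> topic.
TASK_TOPIC = {
    "q1": "linear models",
    "q2": "linear models",
    "q3": "metrics",
    "q4": "metrics",
    "q5": "python",
}

# Hypothetical characteristics of the "ready" group:
# minimum share of correct answers required in each topic.
REQUIRED = {"linear models": 0.5, "metrics": 0.5, "python": 1.0}


def topic_shares(correct):
    """Share of correct answers per topic; `correct` maps task id -> bool."""
    right = defaultdict(int)
    total = defaultdict(int)
    for task, topic in TASK_TOPIC.items():
        total[topic] += 1
        right[topic] += bool(correct.get(task, False))
    return {topic: right[topic] / total[topic] for topic in total}


def weak_topics(correct):
    """Topics where the participant is below the required share."""
    shares = topic_shares(correct)
    return sorted(t for t, need in REQUIRED.items() if shares[t] < need)


def group(correct):
    """Assign a readiness group based on how many topics are weak."""
    weak = weak_topics(correct)
    if not weak:
        return "ready"
    if len(weak) < len(REQUIRED):
        return "partially ready"
    return "not ready"
```

For example, a participant who answered only `q1` and `q5` correctly has one weak topic ("metrics") and lands in the "partially ready" group, and the same list of weak topics can be reused for feedback.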

Rake 7: Evaluate results only automatically

Of course, the assessment should be as objective as possible, so some of the students' answers are evaluated automatically, "by the keys", that is, by comparing them with the correct answers. Even if there is no special testing system, there are plenty of free solutions. And if you understand the principles of writing scripts, you can do anything with Google Forms and results in spreadsheets. If some of the tasks are checked by experts, we need to think through how answers are delivered to the experts without information about the test takers, and how the results of the expert check are integrated into the final assessment.
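Both ideas, checking "by the keys" and hiding the identity of test takers from experts, fit in a short script. This is a minimal sketch under stated assumptions: the answer key, the task ids, and the function names are hypothetical, and a real setup would read answers from a form or spreadsheet export.

```python
import uuid

# Hypothetical answer key: task id -> set of correct options.
ANSWER_KEY = {"q1": {"b"}, "q2": {"a", "c"}, "q3": {"d"}}


def grade_by_key(answers):
    """Compare chosen options with the key; returns task id -> True/False.

    A task counts as correct only if the chosen set matches the key exactly.
    """
    return {task: answers.get(task, set()) == key for task, key in ANSWER_KEY.items()}


def anonymize(submissions):
    """Prepare open answers for expert review without names.

    `submissions` maps participant name -> answer text.
    Returns (rows for experts, private mapping anonymous id -> name).
    """
    rows, mapping = [], {}
    for name, text in submissions.items():
        anon_id = uuid.uuid4().hex  # random id, not derived from the name
        mapping[anon_id] = name
        rows.append({"id": anon_id, "answer": text})
    return rows, mapping
```

The experts see only the rows; the mapping stays with the test team and is used to merge expert scores back into the final assessment.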

We initially wanted to include several open-ended coding tasks, with experts evaluating solutions against predefined criteria, and even prepared a system that exports individual answers of test participants to a special table for experts and then imports the results into a table that calculates the grade. But after discussing this with representatives of the target audience, the product manager, and the instructional designer, we decided that it would be much more efficient and useful for participants to hold a technical interview with instant expert feedback and discussion of the code, as well as individual questions.

Now the expert verifies the test results, clarifying some of the questions. For this, we have prepared a question guide and evaluation criteria for the technical interview. Before the technical interview, the examiner receives the test taker's response card in order to select questions to ask.

Rake 8: Don't explain test results

Presenting feedback to participants is a separate issue. We need not only to report the test score but also to help participants understand their results.
It can include:

  • Tasks in which the participant made a mistake and tasks he completed correctly.
  • Topics in which the participant made mistakes.
  • His ranking among those taking the test.
  • A description of the participant's level in accordance with, for example, a description of specialist levels (based on job descriptions).

During the pilot launch of our test, we showed those who wanted to enroll in the program a list of topics that needed work, along with their results. This is certainly not ideal; we will keep improving the feedback.
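Feedback of this kind (score plus a list of topics to work on) can be assembled automatically. The sketch below is an assumption-laden illustration: the `RECOMMENDATIONS` texts, the function name, and the message wording are invented for this example and are not the texts from our pilot.

```python
# Hypothetical recommendation texts per topic.
RECOMMENDATIONS = {
    "metrics": "review precision, recall and ROC AUC",
    "python": "practice reading pandas code",
}


def feedback(score, max_score, weak_topics):
    """Build a feedback message from the score and the list of weak topics."""
    lines = [f"Your result: {score} of {max_score}."]
    if not weak_topics:
        lines.append("No topics need extra work before the course.")
    else:
        lines.append("Topics to review before the course:")
        for topic in weak_topics:
            hint = RECOMMENDATIONS.get(topic, "ask the course team for materials")
            lines.append(f"  - {topic}: {hint}")
    return "\n".join(lines)


print(feedback(3, 5, ["metrics"]))
```

Keeping the texts in a separate mapping lets the expert edit recommendations without touching the logic.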

Rake 9: Do not discuss the test with developers

Perhaps the sharpest rake, and an especially unpleasant one to step on, is sending the test, its description, and the scoring scale to the developers "as is".
What needs to be discussed:

  • The appearance of the questions, their structure, the position of graphics, and what selecting an answer looks like.
  • How the score is calculated (if needed) and whether there are any additional conditions.
  • How feedback is formed, where to get the texts, and whether there are any additional, automatically generated blocks.
  • What additional information needs to be collected and at what point (for example, contacts).

To avoid misunderstandings, we ask our developers to code 2 or 3 different questions so that we can see what they look like before programming the test itself.

Rake 10: Upload straight to production without testing

3 times, guys: different people should check the test 3 times, or better, 3 times each. This truth was won with blood, sweat, and pixels of lines of code.

Our test is checked by the following trio:

  1. Product manager - checks the test for performance, appearance, mechanics.
  2. Test developer - checks the text of tasks, their order, the form of working with the test, types of tasks, correct answers, readability and correct display of graphics.
  3. The author of tasks (expert) - checks the test for correctness from an expert's point of view.

An example from practice: only on the third run did the author of the tasks notice that 1 task still had the old version of the wording. Edits were actively made on all the previous runs too. And once the test is coded, it may look different from what was originally imagined. Most likely something will have to be corrected, and this must be planned for.

Conclusion

Carefully avoiding all these "rakes", we created a special Telegram bot to test the knowledge of applicants. Anyone can try it while we prepare the next article, in which we will describe what happened inside the bot and what it all turned into later.


Source: habr.com
