I’ve just started reading The Lady Tasting Tea, the story of statistics in/and modern science. But one of the early examples has gotten me thinking – how would a scientist go about testing the general intelligence/retained knowledge of a group of students?

Given:

Whatever we measure is really part of a random scatter, whose probabilities are described by a mathematical function, the distribution function.

It seems unlikely that a contemporary scientist dropped onto planet B would propose the kind of one-and-done tests students generally encounter at the end of subjects, semesters, years, and school itself.

From the book:

Consider a simple example from the experience of a teacher with a particular student. The teacher is interested in finding some measure of how much the child has learned. To this end, the teacher “experiments” by giving the child a group of tests. Each test is marked on a scale from 0 to 100. Any one test provides a poor estimate of how much the child knows. It may be that the child did not study the few things that were on that test but knows a great deal about things that were not on the test. The child may have had a headache the day she took a particular test. The child may have had an argument with parents the morning of a particular test. For many reasons, one test does not provide a good estimate of knowledge. So, the teacher gives a set of tests. The average score from all those tests is taken as a better estimate of how much the child knows. How much the child knows is the outcome. The scores on individual tests are the data.
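To make the averaging intuition concrete, here’s a minimal simulation sketch in Python – with entirely made-up numbers (a “true” knowledge level of 75 and test-day scatter with a standard deviation of 10 points), since the book gives none – comparing how far a single test score and the mean of ten scores typically land from the truth:

```python
import random
import statistics

random.seed(42)

TRUE_KNOWLEDGE = 75.0   # hypothetical "how much the child knows"
NOISE_SD = 10.0         # assumed test-day scatter: headaches, lucky topics, arguments
NUM_TESTS = 10
NUM_TRIALS = 10_000

def one_test() -> float:
    """One test score: the truth plus random scatter, clamped to the 0-100 scale."""
    score = random.gauss(TRUE_KNOWLEDGE, NOISE_SD)
    return max(0.0, min(100.0, score))

# Typical error of a single test vs. the average of NUM_TESTS tests
single_errors = [abs(one_test() - TRUE_KNOWLEDGE) for _ in range(NUM_TRIALS)]
mean_errors = [
    abs(statistics.mean(one_test() for _ in range(NUM_TESTS)) - TRUE_KNOWLEDGE)
    for _ in range(NUM_TRIALS)
]

print(f"typical error, one test:         {statistics.mean(single_errors):.2f} points")
print(f"typical error, mean of {NUM_TESTS} tests:  {statistics.mean(mean_errors):.2f} points")
```

Under these assumptions the second error comes out roughly 1/√10 of the first – the usual shrinkage of the standard error of the mean – which is the teacher’s reason for giving a set of tests rather than one.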

I’m quite biased here, as I’m absolutely horrid at standardised testing – for a variety of reasons, medical ones included. But it does seem to be yet another aspect of schooling that should be updated, given our increasingly sophisticated understanding of the world. Randomness is not to be messed with.

(As usual, my emphasis)