
Statisticians are crystal clear on human variation. They know that not everyone is the same. When they speak about groups in general terms, they know that they are reducing N-dimensional reality to a 1-dimensional single parameter.

Nevertheless, statisticians permit, in their regression models, variables that take on only a handful of values, such as {0,1} for male/female or {a,b,c,d} for married/never-married/divorced/widowed. No one doing this believes that all such people are the same. And anyone who’s done the least bit of data cleaning knows that there will be NA's, wrongly coded cases, mistaken observations, ill-defined measures, and aberrances of other kinds. It can still be convenient to use binary or n-ary dummies to speak simply.

Maybe the marriages of some people coded as currently married are on the rocks, and therefore they are more like divorced, or like a new category of people in the midst of watching their lives fall apart. Yes, we know. But what are you going to do: ask respondents to rate their marriage on a scale of one to ten? That would introduce false precision and model error, and might put respondents in such a strange mood that they answer other questions strangely. Better to just live with being wrong.

Any statistician who uses the cut function in R knows that the variable didn’t turn from continuous to basketed in reality. But a facet_wrap plot is easier to interpret than a 3D wireframe or point-cloud plot.
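Here is a rough sketch of what basketing and dummy coding amount to. It is written in plain Python rather than R, so the `cut` below is only a hand-rolled analogue of R's function (right-closed intervals, `None` standing in for NA). The `dummies` helper, the heights, and the break points are all made up for illustration.

```python
from bisect import bisect_left

def cut(values, breaks, labels):
    """Assign each value to a labelled interval (breaks[i], breaks[i+1]].

    Values outside the outermost breaks get None, standing in for NA.
    """
    out = []
    for x in values:
        i = bisect_left(breaks, x) - 1
        out.append(labels[i] if 0 <= i < len(labels) else None)
    return out

def dummies(categories, levels):
    """One 0/1 column per level; a None (missing) row is 0 in every column."""
    return {lvl: [1 if c == lvl else 0 for c in categories] for lvl in levels}

# Hypothetical heights in cm and hypothetical break points.
heights_cm = [158.0, 171.5, 183.2, 165.0, 190.1]
labels = ["short", "medium", "tall"]

binned = cut(heights_cm, [150, 165, 180, 200], labels)
columns = dummies(binned, labels)

print(binned)
print(columns)
```

The person at 165.0 cm and the person at 171.5 cm land in different baskets even though they are closer to each other than either is to the rest of their own basket. That is the kind of wrongness we agree to live with. (In an actual regression one level would usually be dropped as the reference category; the sketch keeps all three columns for readability.)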

To the precise mind, there’s a world of difference between saying

  • "the mean height of men > the mean height of women", and saying
  • "men are taller than women".


Of course one can interpret the second statement to be just a vaguer, simpler inflection of the first. But some people understand statements like the second to mean “each man is taller than each woman”. Or, perniciously, they take “Blacks have lower IQ than Whites” to mean “every Black is mentally inferior to every White.”
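The gap between the two readings is easy to see with numbers. The following is a minimal sketch in Python with invented heights (not real data): the mean-level statement holds while the "each man is taller than each woman" reading fails.

```python
from statistics import mean

# Invented heights in cm, chosen only to illustrate overlapping groups.
men_cm = [165, 172, 178, 181, 190]
women_cm = [155, 160, 166, 174, 183]

# Reading 1: a statement about two summary numbers.
mean_gap = mean(men_cm) - mean(women_cm)

# Reading 2: a statement about every (man, woman) pair.
pairs = [(m, w) for m in men_cm for w in women_cm]
share_man_taller = sum(m > w for m, w in pairs) / len(pairs)
every_man_taller = min(men_cm) > max(women_cm)

print(f"difference in means: {mean_gap:.1f} cm")
print(f"share of pairs where the man is taller: {share_man_taller:.2f}")
print(f"each man taller than each woman: {every_man_taller}")
```

With these made-up numbers the men's mean is higher, yet the man is taller in only 18 of the 25 pairs, so the strict pairwise reading is false.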

I want to live somewhere between pedantry and ignorance. We can give each other a break on the precision as long as the precise idea behind the words is mutually understood.


Dummyisation is different to stereotyping because:

  • stereotypes deny variability in the group being discussed
  • dummyisation acknowledges that it’s incorrect, before even starting
  • stereotyping relies on familiar categories or groupings like skin colour
  • dummyisation can be applied to any partitioning of a set, such as one based on height or even one made at random

It’s the world of difference between taking on a hypothetical for the purpose of reaching a valid conclusion, and bludgeoning someone who doesn’t accept your version of the facts.

So this is a word I want to coin (unless a better one already exists—does it?):

  • dummyisation is assigning one value to a group or region
  • for convenience of the present discussion,
  • recognising fully that other groupings are possible
  • and that, in reality, not everyone from the group is alike.
  • Instead, we apply some ∞→1 function or operator on the truly variable, unknown, and variform distribution or manifold of reality, and talk about the results of that function.
  • We do this knowing it’s technically wrong, as a (hopefully productive) way of mulling over the facts from different viewpoints.

In other words, dummyisation is purposely doing something wrong for the sake of discussion.

  • "Adults have to deal with moral grey areas"
  • "I’m not liberal or conservative, I guess I’m somewhere in the middle"
  • "It may be helpful to think of data science and business intelligence as being on two ends of the same spectrum" (source)
  • "On a sliding scale from 1 to 10, how happy are you with life?"
  • "[S]cientific bias…is a model for separating plausible hypotheses from their opposite." (source)
  • Please rate your attitude toward the following statements from “strongly agree” to “strongly disagree”.
  • How did you like that book, movie, play, album? Please answer anywhere between ★ and ★★★★★.
  • "The truth lies somewhere in between"


People talk about “grey areas” as if [0,1] is so much more sophisticated than {0,1}. I find such rhetoric limiting. After all, the convex combinations of black and white are totally ordered, completely linear, and only one-dimensional! A painting in B&W couldn’t display much variation. (Not that it couldn’t be interesting.) We deal every day with things more complicated than “a grey area” because the world is 3-D and colour is Lab (3-D nonlinear). Add in texture and smell and you’ve increased the psychological dimensionality manyfold.


The metaphor is insufficiently rich. Adult situations don’t fall on a straight line. Political viewpoints don’t sit neatly next to each other in 1-D. Moral ambiguity is certainly more colourful and convoluted than the path from #000000 to #FFFFFF.
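To make the one-dimensionality concrete, here is a small Python sketch of the path from #000000 to #FFFFFF as a convex combination with a single parameter t. The function name `grey` and the five-step ramp are my own choices for illustration.

```python
def grey(t):
    """Return (1 - t) * black + t * white as a hex colour, for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    level = round(255 * t)  # the same level is used for R, G and B
    return "#{0:02X}{0:02X}{0:02X}".format(level)

# Five evenly spaced points along the segment.
ramp = [grey(i / 4) for i in range(5)]
print(ramp)

# A single number t pins down the whole colour, and the greys come out
# already sorted: the segment is totally ordered.
print(ramp == sorted(ramp))
```

Every grey is indexed by that one t, so any two greys can be ranked against each other. The situations I have in mind can't be ranked that way.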

Me, I’m more interested in 2.7-dimensional hornspheres, quartz crystal spires, hot-air balloons with a row of golden rings piercing the spine, and quasi-polar negatively bent inside-out tori-cum-logcabins. Or even just something as “pedestrian” as a mountaintop pine forest, which is already much more intricate than, cough cough, the unit interval [0,1].


So—back to my original point—I think moral ambiguity resembles a cell complex more than a line segment. Real situations—the layered tragedies, ironies, comedies, and lengthy mediocrities that desirous, egocentric humans instinctively generate—have a much more interesting shape than “the span between 0 and 1.”


I guess I shouldn’t be so critical. The people using the grey-area metaphor probably don’t avail themselves of the whimsical thought-gardens in which more exciting shapes live. Sorry there, I was just feeling constricted.


I hope you’ve enjoyed these drawings by Robert Ghrist from his (free) notes on homotopy.