Posts tagged with projection

## Dummyisation

Statisticians are crystal clear on human variation. They know that not everyone is the same. When they speak about groups in general terms, they know that they are reducing N-dimensional reality to a single 1-dimensional parameter.

Nevertheless, statisticians permit, in their regression models, variables that take on only a few values, such as {0,1} for male/female or {a,b,c,d} for married/never-married/divorced/widowed.

No one doing this believes that all such people are the same. And anyone who’s done the least bit of data cleaning knows that there will be NAs, wrongly coded cases, mistaken observations, ill-defined measures, and aberrances of other kinds. It can still be convenient to use binary or n-ary dummies to speak simply. Maybe the marriages of some people coded as currently married are on the rocks, and therefore they are more like divorced—or like a new category of people in the midst of watching their lives fall apart. Yes, we know. But what are you going to do—ask respondents to rate their marriage on a scale of one to ten? That would introduce false precision and model error, and might put respondents in such a strange mood that they answer other questions strangely. Better to just live with being wrong. Any statistician who uses the cut function in R knows that the variable didn’t become basketed←continuous in reality. But a facet_wrap plot is easier to interpret than a 3D wireframe or cloud-points plot.
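The cut-then-dummy move can be sketched in a few lines of pure Python (the heights and bin edges here are invented for illustration; in R this would be `cut()` followed by model.matrix or similar):

```python
from bisect import bisect_left

def cut(values, bins, labels):
    # Like R's cut(): assign each continuous value to a labelled basket.
    # Intervals are right-closed: (bins[i], bins[i+1]].
    # The lo=1 argument keeps the label index from going negative.
    return [labels[bisect_left(bins, v, 1) - 1] for v in values]

def dummies(categories, labels):
    # Dummyisation proper: one {0,1} column per basket.
    return {lab: [int(c == lab) for c in categories] for lab in labels}

heights = [152.3, 171.0, 164.8, 188.5, 177.2, 160.1]   # made-up cm values
baskets = cut(heights, [150, 165, 180, 195], ["short", "medium", "tall"])
d = dummies(baskets, ["short", "medium", "tall"])
print({lab: sum(col) for lab, col in d.items()})   # how many landed in each basket
```

Nobody believes the people inside a basket are alike; the baskets just make the conversation (and the regression) tractable.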

To the precise mind, there’s a world of difference between saying

• "the mean height of men > the mean height of women", and saying
• "men are taller than women".

Of course one can interpret the second statement as just a vaguer, simpler inflection of the first. But some people understand statements like the second to mean “each man is taller than each woman”. Or, perniciously, they take “Blacks have lower IQ than Whites” to mean “every Black is mentally inferior to every White.”
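The gap between the two readings is easy to make concrete with simulated data (the means and spread below are illustrative, not real anthropometry):

```python
import random

random.seed(0)
# Two overlapping distributions: the group means differ,
# but the pointwise reading "each man > each woman" fails constantly.
men   = [random.gauss(178, 7) for _ in range(10_000)]
women = [random.gauss(165, 7) for _ in range(10_000)]

mean_men = sum(men) / len(men)
mean_women = sum(women) / len(women)
print(mean_men > mean_women)        # the statement about means: True

print(max(women) > min(men))        # and yet some women are taller than some men: True
```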

I want to live somewhere between pedantry and ignorance. We can give each other a break on the precision as long as the precise idea behind the words is mutually understood.



Dummyisation is different to stereotyping because:

• stereotypes deny variability in the group being discussed
• dummyisation acknowledges that it’s incorrect, before even starting
• stereotyping relies on familiar categories or groupings like skin colour
• dummyisation can be applied to any partitioning of a set, such as one based on height, or even a random grouping

It’s the world of difference between taking on a hypothetical for the purpose of reaching a valid conclusion, and bludgeoning someone who doesn’t accept your version of the facts.

So this is a word I want to coin (unless a better one already exists—does it?):

• dummyisation is assigning one value to a group or region
• for convenience of the present discussion,
• recognising fully that other groupings are possible
• and that, in reality, not everyone from the group is alike.
• Instead, we apply some ∞→1 function or operator on the truly variable, unknown, and variform distribution or manifold of reality, and talk about the results of that function.
• We do this knowing it’s technically wrong, as a (hopefully productive) way of mulling over the facts from different viewpoints.

In other words, dummyisation is purposely doing something wrong for the sake of discussion.

some very small categories



Vertical cross-section of the Sierra Madres, near Oaxaca City.

You can look at physically “straight” cross-sections of a mountain (or trench), or you can take the vertical cross-section along a path γ in 2-D.

You can think about the landscape as a scalar field in 2-D (puncture the plane if you need to do this to the whole Earth) with the height being a numerical quantity assigned to any point. In that case the image above records the values the scalar field takes along the 1-dimensional γ. (If the scalar field were discontinuous, that would mean you either just biked off a cliff, or biked into a wall.)  Or you can think about it in 3-D — looking around from your bicycle as you ascend the mountain ridge.

Or you can think about it in 4-D — the Earth hurtling and whirling through spacetime, events in a light cone “always there” in the Tralfamadorian past.

Related ideas:

• projection
• statistical cross-sections (a one-time snapshot, versus cohort, longitudinal, or panel measurements over time)


## Infinite Data

Since people liked my last opinion piece on #big data, here’s another one.

Imagine there was a technology that allowed me to record the position of every atom in a small room, thereby generating some ridiculous amount of data (Avogadro’s number is 𝒪(10²³), so some prefix around that order of magnitude — eg yottabytes). And also imagine that there was a way for other scientists to decode and view all of that. (Maybe the latency and bandwidth can still be restricted even though neither capacity nor resolution nor fidelity nor coverage of the measurement are restricted — although that won’t be relevant to my thought experiment, it would seem “like today” where MapReduce is required.)

Let’s say I am running some behavioural economics experiment, because I like those. What fraction of the data am I going to make use of in building my model? I submit that the psychometric model might be exactly the same size as it is today. If I’m interested in decision theory then I’m going to be looking to verify/falsify some high-level hypothesis like “expected utility” or “Hebbian learning”. The evidence for/against that idea is going to be so far above the atomic level, so far above the neuron level, I will basically still be looking at what I look at now:

• Did the decisions they ended up making (measured by maybe 𝒪(100), maybe even 𝒪(1) numbers in a table) correspond to the theory?
• For example if I draw out their assessment of the probability and some utility ranking then did I get them to violate that?

If I’ve recorded every atom in the room, then with some work I can get up to a coarser resolution and make myself an MRI. (Imagine working with tick-level stock data when you really are only interested in monthly price movements—but in 3-D.) (I guess I wrote myself into even more of a corner here: if we have atomic-level data then it’s quantum, meaning you really have to do some work to get it to the fMRI scale!) But say I’ve gotten to fMRI-level data, then what am I going to do with them? I don’t know how brains work. I could look up some theories of what lighting-up in different areas of the brain means (and what about 16-way dynamical correlations of messages passing between brain areas? I don’t think anatomy books have gotten there yet). So I would have all this fMRI data and basically not know what to do with it. I could start my next research project to look at numerically / mathematically obvious properties of this dataset, but that doesn’t seem like it would yield up a Master Answer of the Experiment, because there’s no interplay between theories of the brain and trying different experiments to test them out — I’m just looking at “one single cross section”, which is my one behavioural econ experiment. Might squeeze some juice, but who knows.

Then let’s talk about people critiquing my research paper. I would post all the atomic-level data online of course, because that’s what Jesus would do. But would the people arguing against my paper be able to use that granular data effectively?

I don’t really think so. I think they would look at the very high level of 𝒪(100) or 𝒪(1) data that I mentioned before, where I would be looking.

• They might argue about my interpretation of the numbers or statistical methods.
• They might say that what I count as evidence doesn’t really count as evidence because my reasoning was bad.
• They couldn’t argue that the experiment isn’t replicable because I imagined a perfect-fidelity machine here.
• They could go one or two levels deeper and find that my experimental setup was imperfect—the administrator of the questions didn’t speak the questions in exactly the same tone of voice each time; her face was at a slightly different angle; she wore a different coloured shirt on the other day. But in my imaginary world with perfect instruments, those kinds of errors would be so easy to see everywhere that nobody would take such a criticism seriously. (And of course because I am the author of this fantasy, there actually aren’t significant implementation errors in the experiment.)

Now think about either the scientists 100 years after that or if we had such perfect-fidelity recordings of some famous historical experiment. Let’s say it’s Michelson & Morley. Then it would be interesting to just watch the video from all angles (full resolution still not necessary) and learn a bit about the characters we’ve talked so much about.

But even here I don’t think what you would do is run an exploratory algorithm on the atomic level and see what it finds — even if you had a bajillion processing power so it didn’t take so long. There’s just way too much to throw away. If you had a perfect-fidelity-10²⁵-zoom-full-capacity replica of something worth observing, that resolution and fidelity would be useful to make sure you have the one key thing worth observing, not because you want to look at everything and “do an algo” to find what’s going on. Imagine you have a videotape of a murder scene: the benefit is that you’ve recorded every angle and every second, and then you zoom in on the murder weapon, or the grisly act being committed, or the face of the person, or the tiny piece of hair they left — and that one little sliver of the data space is what counts.

What would you do with infinite data? I submit that, for analysis, you’d throw most of the 10²⁵ bytes away.
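The throwing-away can itself be sketched: the raw firehose goes in, and the model only ever sees an 𝒪(1)-sized summary (the million-sample Gaussian below is a stand-in for the atomic record, not real data):

```python
import random
import statistics

random.seed(1)
# Stand-in for the atomic-level firehose: 100,000 raw measurements.
raw = [random.gauss(0, 1) for _ in range(100_000)]

# Analysis collapses the whole thing to a handful of numbers.
summary = {
    "n": len(raw),
    "mean": statistics.fmean(raw),
    "sd": statistics.stdev(raw),
}
print(summary)   # everything the model downstream will ever look at
```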

Categorial decomposition of Galilean spacetime.

Sean Carroll tells us that it was Galileo who first realised that motion can be separated into:

• motion in the x direction — ẋ or x′[t]
• motion in the y direction — ẏ or y′[t]
• motion in the z direction — ż or z′[t]

and, importantly, that physical laws should be the same for all the 360° × 360° orthonormal choices of (x,y,z). It was Galileo’s idea that you can draw axes, that forces can be decomposed onto those axes, and that forces along one axis behave independently of forces along the other axes.

For example if you kick a football, it goes forward x′[t], chips up y′[t], and bends left z′[t]. If you kicked it off a cliff, it would retain its exact same forward x'[t] speed even after it dropped y<0 below the plane of the cliff at an ever increasing speed. (NB: That’s not actually true, which is why we say “in a vacuum”.)



The traditional way to talk about a path γ is talking in tuples:

• First, you have some points
• Then, you have a 3-basis.
• Then, you have an interval.
• If you want to talk about kicking the ball, you would probably call the ball a point, say “there is” a vector space tangent to the ball, and your single kick of the ball constitutes a single force-vector applied (instantaneously) to the point, I mean ball. “Then” — by which I mean “at higher values of t∈interval" — the ball "is" chipped up in the air, "then" back on the ground.
• The path γ is any member of the product (pairing) of 3-basis with interval.

path γ ∈ time × space*

* space in the geographer’s sense; the casual, not mathematical, sense of the word space. Lawvere calls mathematical space a “universe” … like the theoretical universe that the theory lives in

All of this “you have” — it’s a violation of E′. The “false subject” in English sentences that start with “There are” is repeated over, and over, and over again in mathematics (hence the invention of the symbol ∃).



Now cometh F William Lawvere, 3 centuries later, with a conceptual breakthrough.

path γ : time → space

The categoryists use labelled dots and labelled arrows to sketch concepts. So in pictures 2 and 3 you can see projection arrows splitting 3-space into a 2-plane (ground) and a 1-line (air). (Arrows sometimes seem backwards in category theory. Galileo projects 3D onto 1D + 2D, so something like “coprojection” would be the natural piecing together of independent sub-motions to get the full picture.)
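Lawvere’s γ : time → space reads naturally as a function type. A sketch (the chipped-football numbers are made up; only the shape of the types matters):

```python
from math import sin

# Lawvere's view: a path is not a pile of tuples "you have" —
# it *is* a map from time into space.
Time = float
Space = tuple[float, float, float]   # (x, y, z)

def gamma(t: Time) -> Space:
    # A kicked football, schematically: forward, up-then-down, bending left.
    return (8 * t, 10 * t - 5 * t ** 2, sin(t))

# Galileo's decomposition is just post-composition with the projections:
def y(p: Space) -> float:
    return p[1]

def height_at(t: Time) -> float:
    return y(gamma(t))   # the independent 1-D "air" component of the motion
```

The projection arrows in the category sketches are exactly these little post-compositions.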

And the Galileo example is just meant to be a shared thing we can all discuss. But this same thought-pattern — categorial decomposition — I can use on non-chalkboard things from my life as well. Gottman-style 2-eqn relationship dynamics; speculating about some economics in the news; love triangles; the deeper you plant this seed, the more places you see it.


## Why are rotations linear?

You know what’s surprising?

• Rotations are linear transformations.

I guess I knew it, but I didn’t understand it. Like, I could write you the matrix formula for a rotation by θ degrees:

$R(\theta) = \begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$

But why is that linear? Lines are straight and circles bend. When you rotate something you are moving it along a circle. So how can that be linear?

I guess 2-D linear mappings ℝ²→ℝ² surprise our natural 1-D way of thinking about “straightness”.
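The surprise dissolves a little if you check the definition directly: linearity means R(a·u + b·w) = a·R(u) + b·R(w), and a rotation passes that test even though every point travels along a circle. A numerical check (arbitrary angle and vectors):

```python
import math

def rotate(theta, v):
    # The 2-D rotation matrix applied to v = (x, y).
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

theta = 0.7
u, w = (1.0, 2.0), (-3.0, 0.5)
a, b = 2.0, -1.5

# Linearity: rotate(a·u + b·w) == a·rotate(u) + b·rotate(w), up to float error.
lhs = rotate(theta, (a * u[0] + b * w[0], a * u[1] + b * w[1]))
Ru, Rw = rotate(theta, u), rotate(theta, w)
rhs = (a * Ru[0] + b * Rw[0], a * Ru[1] + b * Rw[1])
print(max(abs(lhs[i] - rhs[i]) for i in range(2)))   # ~0: lines map to lines
```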



New answer, July 2014: Really it’s better to think of rotations (and other linear operators like derivative, certain kernels, Laplace or Fourier transform) as homomorphisms or isomorphisms rather than “linear” like a line.

Algebra works the same after a homomorphism. That’s what “homomorphism” means, and “linear” is sometimes used as a synonym for “homomorphism”.

So really saying a transform is linear is just saying that things work the same after the transform. The concept of linear transforms, then, is useful in the following way:

1. I come up with a transformation which simplifies things for my wee brain but leaves the essentials unchanged
2. I resolve whatever was the original issue in the easier-yet-essentially-the-same space
3. I transform back to where I started.

A canonical example in physics is rotating the coordinate system (basis isomorphism) to make calculations easier. Like making one dimension disappear by lining things up better.
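A tiny version of that trick, under invented numbers: rotate the axes so a vector lies along x, and its second coordinate vanishes while its length survives.

```python
import math

def rotate(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

v = (3.0, 4.0)
# Rotate the frame so v points along the x-axis: one dimension "disappears".
theta = -math.atan2(v[1], v[0])
aligned = rotate(theta, v)
print(aligned)   # ≈ (5.0, 0.0) — the length is preserved, the y-part is gone
```

Solve the problem in the aligned frame, then rotate by −θ to get back where you started.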

Quantum mechanics and string theory use different isomorphisms that are more complicated than rotating the coordinate system.

Bonus: Sean Carroll shows us (for free) how Minkowski spacetime uses the concept of rotation—across space-and-time (as if these were orthogonal dimensions as much as the typical xy plane) to explain time dilation and length contraction, the salient features of special relativity.

quasiperiodic tilings from a 15th-century Uzbekistani madrasa

Yeah, the same quasiperiodic tilings that theoretical physicist Roger Penrose wrote about in the 20th century. The same quasiperiodic tilings that Tony Robbin says are the projections of a high-dimensional cubic lattice onto 2-D.

What now, Christian culture?

ARTICLE IN: Saudi Aramco World, via Artemy Kolchinsky
PS Saudi Aramco does $233 billion in sales each year. For reference, the total value of Facebook is $25 billion. So Saudi Aramco transacts 9 Facebooks each year. What now, The Social Network?
