Posted on Monday, 9 May 2011

## Autocorrelation

A “truly” random, uniform random, completely random sequence might look like

◯◯⨯◯⨯⨯⨯⨯◯◯⨯◯◯⨯⨯◯⨯◯◯⨯⨯◯⨯⨯◯⨯◯◯⨯◯

R code: > xooooo = sample( c("◯", "⨯") , 30, replace = TRUE)

like the flips of a fair coin. But there are other “random”s as well.

### Biased

For example, biased random, like an unfair coin with 4/5 bias, might generate a sequence that looks like this:

◯◯◯◯⨯◯◯⨯◯◯⨯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯

R code: > xooooo = sample( c("◯","◯","◯","◯", "⨯") , 30, replace = TRUE)

### Self-Correlated

But there’s also autocorrelated, or serially correlated, randomness.

◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯⨯⨯⨯◯◯◯◯◯◯◯

For example you feel fine ◯ 80% of the time and 20% you’re sick ⨯ — and of course the sick days are more likely to come one after another. Or 80% of the time you don’t smoke ◯ but then you buy a pack and all of a sudden you smoke ⨯⨯⨯ three days in a row. Once you’ve broken your resolve, you’re more likely to smoke again the next day.
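Unlike the first two sequences, this one can't come from a single `sample()` call, because each draw has to look at the previous one. Here is a minimal sketch of one way to get that bunching: a two-state chain where yesterday's state sets today's odds. The function name `sick_days` and the numbers 0.6 and 0.1 are my own assumptions, picked so that the long-run share of ⨯ still works out to 20%.

```r
# Two-state chain: today's chance of being sick depends on yesterday.
# p_stay_sick and p_get_sick are illustrative assumptions, not estimates.
sick_days <- function(n, p_stay_sick, p_get_sick) {
  days <- character(n)
  days[1] <- "◯"                       # start out healthy
  for (t in seq_len(n)[-1]) {
    p_sick <- if (days[t - 1] == "⨯") p_stay_sick else p_get_sick
    days[t] <- if (runif(1) < p_sick) "⨯" else "◯"
  }
  days
}

set.seed(1)
# Long-run share sick = p_get_sick / (p_get_sick + 1 - p_stay_sick)
#                     = 0.1 / (0.1 + 0.4) = 0.2
xooooo <- sick_days(30, p_stay_sick = 0.6, p_get_sick = 0.1)
cat(paste(xooooo, collapse = ""), "\n")
```

Raising `p_stay_sick` (and lowering `p_get_sick` to keep the 20% share) makes the runs of ⨯ longer; setting both to 0.2 gets you back to the plain biased coin.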

$\mathrm{phenomenon}_t = x_t + y_t + z_t + 10\% \cdot \mathrm{phenomenon}_{t-1} + \varepsilon_t$

Equation-wise, autocorrelation amounts to adding a self-lag term to the other explanatory variables (plus unexplained residual). Besides habit and viral invasion, autocorrelation brings many things under the penumbra of randomness:

• income. The strong gets more, while the weak ones fade. If you made a lot of money at your previous job, your next employer will pay you more either to steal you away or simply because salary history determines compensation in HR’s formula.
• unemployment. Jobless today, jobless tomorrow. Those who are unemployed for more than six months are even more likely to be unemployed for the long term. Also people who take care of their own kids as their job are likely to still be doing so next week and next year rather than working for a company.
• likelihood of cancer. Back to the subject of smoking, your likelihood of getting cancer accumulates faster and faster the more you smoke. I’ve seen claims that the cumulative propensity to cancer has a kink above one pack / day.
• stock prices. Stocks don’t just jump around in a Cauchy distribution, although maybe the daily change in stock price does. Today’s price is yesterday’s price plus the daily change $\mathrm{price}_t - \mathrm{price}_{t-1}$, so the price level carries a lag term: that’s serial correlation.

Serial correlation or autocorrelation refers to things that bunch together. When it rains, it pours.
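To put a number on the bunching, here is a rough sketch that simulates the self-lag equation from earlier in the post and measures how strongly each value tracks the one before it. Treating x, y, z and ε as independent standard-normal noise is my simplifying assumption, as are the helper names `simulate_self_lag` and `lag1_cor` and the comparison value of 0.9.

```r
# Simulate phenomenon_t = x_t + y_t + z_t + phi * phenomenon_{t-1} + eps_t.
# Assumption: x, y, z and eps are independent standard-normal noise.
simulate_self_lag <- function(n, phi) {
  shocks <- rnorm(n) + rnorm(n) + rnorm(n) + rnorm(n)  # x + y + z + eps
  phenomenon <- numeric(n)
  phenomenon[1] <- shocks[1]
  for (t in 2:n) {
    phenomenon[t] <- shocks[t] + phi * phenomenon[t - 1]
  }
  phenomenon
}

# Correlation between the series and itself shifted by one step.
lag1_cor <- function(v) cor(head(v, -1), tail(v, -1))

set.seed(2011)
weak   <- simulate_self_lag(10000, phi = 0.1)  # the 10% from the equation
strong <- simulate_self_lag(10000, phi = 0.9)  # hypothetical stronger self-lag
print(c(weak = lag1_cor(weak), strong = lag1_cor(strong)))
```

Under these assumptions the lag-1 correlation should land close to the self-lag coefficient itself, so a bigger coefficient means today looks more like yesterday.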
