Posts tagged with **R**

I’ve googled *How do I find out how big my workspace is* too many times … here’s the explicit code to run and hopefully the next googler sees this post:

    for (thing in ls()) { message(thing); print(object.size(get(thing)), units='auto') }

Fin. You can stop there.

Or for a bit of context… Here’s some example code that generates objects of varying sizes, where you might not be sure how big they are:

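    require(boot)
    require(datasets)
    data(sunspot.year)

    # boot() calls the statistic with the data and a vector of resampled indices
    max.stat <- function(d, i) max(d[i])

    # the argument is spelled ncpus; a misspelt ncpu is silently passed on to the statistic
    system.time(boot.1 <- boot(sunspot.year, max.stat, R=1e3, parallel='multicore', ncpus=4))
    system.time(boot.2 <- boot(sunspot.year, max.stat, R=1e4))
    # tsboot() hands the statistic a whole resampled series, so plain max works here
    system.time(boot.3 <- tsboot(sunspot.year, max, R=1e5, parallel='multicore', ncpus=4))
    system.time(boot.4 <- boot(sunspot.year, max.stat, R=1e5, parallel='multicore', ncpus=8))
    system.time(boot.5 <- boot(sunspot.year, max.stat, R=1e6, parallel='multicore', ncpus=8))

    print(boot.1)
    plot(boot.1)
    par(col=rgb(0,0,0,.1), pch=20)
    plot(boot.2)

    # size of everything now in the workspace
    for (thing in ls()) {
      message(thing)
      print(object.size(get(thing)), units='auto')
    }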

This code is doing a few things:

- resampling the sunspot dataset to try to estimate the most sunspots we “should” see in a year (with a very stylised meaning of “should”).

  This is worth looking into because some people say global warming is caused by sunspots rather than, e.g., carbon emissions multiplying greenhouse effects.

  History only happened once, but by bootstrapping we try to overcome this.

- noodling around with multiple cores (my laptop has 8; `lscpu` reports the count on Linux). Nothing interesting happens in this case; still, multicore is an option.
- timing how long fake reprocessings of history take with various amounts of resampling and various numbers of cores.
- showing how big those bootstrap objects are. Remember, `R` runs entirely in memory, so big datasets or derived objects of any kind can cramp your home system or bork your EC2.
- printing the size of the objects, as promised.

On my system (where I didn’t run the exact code above) the output was:
    > for (obj in ls()) { message(obj); print(object.size(get(obj)), units='auto') }
    b.1
    89.1 Kb
    b.2
    792.2 Kb
    b.3
    7.6 Mb
    b.4
    7.6 Mb
    b.5
    7.6 Mb
    b.6
    792.2 Kb
    obj
    64 bytes
    sunspot.year
    2.5 Kb
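If the workspace is crowded, it can help to see the biggest objects first. Here is a minimal sketch of that idea; the helper name `workspace_sizes` is made up for this example and is not part of base R.

```r
# Hypothetical helper (the name workspace_sizes is invented for this sketch):
# returns the size in bytes of every object in an environment, largest first.
workspace_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  bytes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(bytes, decreasing = TRUE)
}

# Sizes in Mb (1 Mb = 1024^2 bytes), rounded, for the current workspace
round(workspace_sizes() / 1024^2, 2)
```

Passing a different environment (say, one created with `new.env()`) lists that environment's objects instead of the global workspace.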

**PS** To find out how much memory you have on Linux (macOS does not ship `free`; `vm_stat` is the rough equivalent there) do:

    $ free -mt
                 total       used       free     shared    buffers     cached
    Mem:         15929      12901       3028          0        214       9585
    -/+ buffers/cache:       3102      12827
    Swap:        10123          0      10123
    Total:       26053      12901      13152
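From inside R you can get a rough view of the other side of the ledger: how much memory the session itself is holding, and how many cores it can see. This is only a sketch. `gc()` reports R's own usage, not the machine's total memory, and `parallel::detectCores()` can return `NA` on some platforms.

```r
# gc() runs a garbage collection and returns a matrix; its second column is
# the memory currently used by R objects, in Mb (one row for cons cells,
# one for vectors).
mem <- gc()
r_used_mb <- sum(mem[, 2])
message("R session is using about ", r_used_mb, " Mb")

# The parallel package ships with R; detectCores() may return NA.
n_cores <- parallel::detectCores()
message("Logical cores detected: ", n_cores)
```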

*A question I’ve googled before without success. Hopefully this answer will show up for someone who needs it. I’ll also go over the better-known uses of ? just in case.*

- To get help in `R` about a function like `subset` you type `?subset`. That’s like `man subset` from the command line.
- If you only know roughly what you’re looking for, use double question marks: `??nonlinear` will lead to the package `nlme`. That’s like `apropos` on the command line.
- To get a package overview, type `?xts::xts`. There is no `?xts` help. For packages that have no overview page such as `?twitteR::twitteR`, you will need to use `??twitteR` to find help pages such as `?twitteR::status-class`, `?twitteR::dmGet`, etc.
- Finally, the question of the title.
**To get R help on punctuation such as** `(`, `{`, `[`, `` ` ``, `::`, `...`, `+`, and yes, even on `?` itself, use single quotes to ‘escape’ the meaningful symbol. Examples follow:

    ?'`'
    ?'('
    ?'['
    ?'...'
    ?'+'
    ?'%*%'
    ?'%x%'
    ?'%o%'
    ?'%%'
    ?'%/%'
    ?'$'
    ?'^'
    ?'~'
    ?'<-'
    ?'='
    ?'<<-'

All of the quotation marks `` ` ``, `'`, `"` share one help file, so `?'"'` or `` ?'`' `` will give you the same page as `?"'"` (the single quote itself has to be wrapped in double quotes).
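The same lookups can be written with the `help()` function, which is what `?` calls under the hood and which takes the symbol as an ordinary string. A small sketch follows; the variable names are just for illustration.

```r
# help() takes the topic as a string, so punctuation needs no special escaping
# beyond normal string quoting.
h_bracket <- help("[")      # same topic as ?'['
h_matmul  <- help("%*%")    # same topic as ?'%*%'
h_quote   <- help("'")      # the quotation-marks page, same topic as ?"'"

# Each result is a small object holding the path(s) of the matching help
# file(s); printing it, e.g. print(h_bracket), opens the page.
length(h_bracket)
```

Assigning the result instead of printing it is handy in scripts, where you may only want to check that a topic exists.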

**tl;dr:** If you want to be contacted for freelance R work, edit this list: https://github.com/isomorphisms/hire-an-r-programmer/blob/gh-pages/README.md.

**Background/Problem:** I was looking for a list of R-programming freelancers and realised there is no such list.

Other than famous people and people I already talk to, I don’t know even a small fraction of the R community—let alone people who do R among other things and don’t participate in the mailing lists or chatrooms I do.

This is actually a more general problem, since anyone looking to hire an R programmer will find a wall of tutorials if they search for it: http://google.com/search?q=hire+an+r+programmer.

**Solution:** I thought about making a publicly-editable website where freelancers can put their contact info, specialty areas, links to projects, preferred kind of work, rates, and so on.

Of course, I’d have to make the login system. And validate users. And fight spam. And think up some database models, change the fields if someone suggests something better…. And it would be nice to link to StackOverflow, Github, CRAN, and …

The more I thought about it the more I favoured a solution where someone else does all the work. GitHub already has a validation system, usernames, logins, and a publicly editable “wiki”. MVP. No maintenance, no vetting, no development. GitHub already shows up in google so whoever searches for “hire an R programmer” will find you if you put your details there.

It’s actually unbelievable that we’ve had R-Bloggers as a gathering place for so long, but nowhere central to list who’s looking for work.

So I committed https://github.com/isomorphisms/hire-an-r-programmer/blob/gh-pages/README.md which is a markdown file you can add your information to, if you want to be found by recruiters who are looking for R programmers. Forking is a good design pattern for this purpose as well. Add whatever information you want, and if you think I’m missing some fields you can add those as well. Suggestions/comments also welcome below.

I like this concept of “low volatility, interrupted by occasional periods of high volatility”. I think I will call it “volatility”.

Daniel Davies

via nonergodic

(PS: If you didn’t see it before, try plotting this in `R`:

    vol.of.vol <- function(x) { dpois(x, lambda=dpois(x, 5)) }

… and so on, to your heart’s content.

Fun, right?)
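One way to actually draw it, as a sketch: the grid `0:20` and the spike-style plot are my choices here, not anything prescribed above.

```r
# A Poisson density whose rate is itself a Poisson density evaluated at x
vol.of.vol <- function(x) dpois(x, lambda = dpois(x, 5))

x <- 0:20            # dpois() is defined on non-negative integers
y <- vol.of.vol(x)

# Vertical spikes suit a discrete density
plot(x, y, type = "h", xlab = "x", ylab = "vol.of.vol(x)")
```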

I was re-reading Michael Murray’s explanation of cointegration and marvelling at the calculus.

Calculus blows my mind sometimes. Like, hey guess how much we can do with subtraction.

— protëa(@isomorphisms) March 28, 2013

Of course it’s not *any* subtraction. It’s subtracting a function from a shifted version of itself. Still doesn’t sound like a universal revolution.

(But of course the observation that the lagged first-difference will be zero around an extremum (turning point), along with symbolic formulæ for (infinitesimal) first-differences of a function, made a decent splash.)

Jeff Ryan wrote some R functions that make it easy to first-difference financial time series.

Here’s how to do the first differences of Goldman Sachs’ share price:

    require(quantmod)
    getSymbols("GS")         # downloads the GS price history
    gs <- Ad(GS)             # adjusted closing prices
    plot( gs - lag(gs) )     # first differences

Look how much more structured the result is! Now all of the numbers are within a fairly narrow band. With `length(gs)` I found 1570 observations. Here are 1570 random normals, `plot(rnorm(1570, sd=10), type="l")`, for comparison:

Not perfectly similar, but very close!
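If you want to see the subtraction at work without downloading anything, here is an offline sketch. It uses a simulated random walk as a stand-in for the price series (an assumption: real prices are not built from Gaussian steps) and base R's `diff()`, which does the same value-minus-previous-value subtraction.

```r
set.seed(1)
n <- 1570

# Simulated random walk standing in for the GS price series (an assumption,
# not real market data)
walk <- cumsum(rnorm(n, sd = 10))

# First difference: each value minus the one before it
d <- diff(walk)

# diff() is the same subtraction written out by hand
stopifnot(isTRUE(all.equal(d, walk[-1] - walk[-n])))

# The wandering level on top, the narrow band of differences below
op <- par(mfrow = c(2, 1))
plot(walk, type = "l", main = "level")
plot(d, type = "l", main = "first difference")
par(op)
```

Differencing drops one observation, so `d` has `n - 1` values.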

Looking at the first differences compared to a Gaussian brings out what’s different between public equity markets and a random walk. What sticks out to me is the vol leaping up aperiodically in the $GS time series.

I think I got even a little closer by drawing the stdev’s from a Poisson process, `plot(rnorm(1570, sd=rpois(1570, lambda=5)), type="l")`, but I’ll end there with the graphical futzing.
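One rough way to put a number on "the vol leaping up" is sample kurtosis, which is 3 for a Gaussian and larger when the tails are fatter. This is only a sketch: the `kurtosis` helper is written out here rather than taken from a package, and I use many more draws than 1570 so the estimate settles down.

```r
# Sample kurtosis: about 3 for Gaussian draws, larger for fatter tails
kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2
}

set.seed(42)
n <- 1e5  # far more than 1570, to keep the estimate stable

constant.vol <- rnorm(n, sd = 10)                     # one fixed stdev
poisson.vol  <- rnorm(n, sd = rpois(n, lambda = 5))   # stdev redrawn each time

c(constant = kurtosis(constant.vol), poisson = kurtosis(poisson.vol))
```

The Poisson-driven series should come out noticeably above 3, which is the numerical shadow of those occasional bursts in the plot.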

What’s really amazing to me is how much difference a subtraction makes.

The Cauchy distribution (`?dcauchy` in `R`) nails a flashlight over the number line and swings it at a constant speed from 9 o’clock down to 6 o’clock over to 3 o’clock. (Or the other direction, from 3→6→9.) Then counts how much light shone on each number.

In other words we want to map evenly from `the circle (minus the top point)` onto `the line`. Two of the most basic, yet topologically distinct shapes related together.

You’ve probably heard of a mapping that does something close enough to this: it’s called `tan`.

Since `tan` is so familiar it’s implemented in Excel, which means you can simulate draws from a Cauchy distribution in a spreadsheet. Make a column of `=RAND()`'s *(say column A)* and then pipe them through `tan`. For example `B1=TAN(A1)`. You could even do `=TAN(RAND())` as your only column. That’s not quite it; you need to stretch and shift the `[0,1]` range of `=RAND()` so it matches `[−π/2,+π/2]` (half of the circle’s angle, which runs over `[−π,+π]`). So really the long formula (if you didn’t break it into separate columns) would be `=TAN( PI() * (RAND()-.5) )`. A stretch and a shift and you’ve matched the domains up. There’s your Cauchy draw.

In R one could draw three Cauchy’s with `rcauchy(3)` or with `tan(pi*(runif(3)-.5))`.
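A quick sanity check on the `tan` recipe, as a sketch: compare the quartiles of many simulated draws against `qcauchy()`. The sample size and the choice of quartiles are arbitrary; I use quantiles because the Cauchy has no mean or variance to compare.

```r
set.seed(1)
n <- 1e5

# Uniform draws stretched and shifted to (-pi/2, pi/2), then pushed through tan
draws <- tan(pi * (runif(n) - 0.5))

probs <- c(0.25, 0.5, 0.75)
rbind(
  simulated   = quantile(draws, probs),
  theoretical = qcauchy(probs)
)
```

The two rows should agree closely; the theoretical quartiles of the standard Cauchy are -1, 0 and 1.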

What’s happening at `tan(−3π/2)` and `tan(π/2)`? The `tan` function is putting out to ±∞.

I saw this in school and didn’t know what to make of it—I don’t think I had any further interest than finishing my problem set.

I saw as well the ±∞ in the output of `flip[x]= 1/x`. `1/−.0000...001 → −∞` whereas `1/.0000...0001 → +∞`.

It’s not immediately clear in the `flip[x]` example but in `tan[x/2]` what’s definitely going on is that the angle is circling around the top of the circle (the hole in the top) and the flashlight of the Cauchy distribution could be pointing to the right or to the left at a parallel above the line.

Why not just call this ±∞ the same thing? “Projective infinity”, or, the hole in the top of the circle.
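You can watch both one-sided blow-ups numerically in R. A small sketch; the offset `eps` is an arbitrary small number of my choosing.

```r
eps <- 1e-8

# Approaching the top of the circle from either side
from.left  <- tan( pi/2 - eps)   # enormous and positive
from.right <- tan(-pi/2 + eps)   # enormous and negative

# The same split shows up in flip[x] = 1/x on either side of zero
flip <- function(x) 1 / x
pos.side <- flip(eps)            # large positive
neg.side <- flip(-eps)           # large negative

# R keeps the sign even at zero itself: signed zeros give signed infinities
c(1 / 0, 1 / -0)
```

Floating point keeps the two infinities apart; the projective view above glues them into the single missing point at the top of the circle.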