Posts tagged with R

> plot( polyroot(choose(131,14:29)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:39)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:59)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:79)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:99)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:119)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:139)) ,pch=19,col='red')

I’ve googled “how do I find out how big my workspace is” too many times… here’s the explicit code to run, and hopefully the next googler sees this post:

for (thing in ls()) { message(thing); print(object.size(get(thing)), units='auto') }
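Or, if you want the same listing sorted biggest-first, here’s a variant (a sketch; `object.size` reports bytes, and the `x`/`y` objects are just stand-ins so there’s something to measure):

```r
x <- rnorm(1e4)          # a biggish object
y <- 1:10                # a small one
sizes <- sapply( ls(), function(nm) as.numeric(object.size(get(nm))) )
sort( sizes, decreasing=TRUE )     # biggest memory hogs first, in bytes
```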

Fin. You can stop there.


Or for a bit of context… Here’s an example code to generate objects of variable sizes where you might not be sure how big they are:

library(boot)
stat.max <- function(d, i) max(d[i])   # boot() wants a statistic of (data, indices)
system.time(boot.1 <- boot( sunspot.year, stat.max, R=1e3, parallel='multicore', ncpus=4))
system.time(boot.2 <- boot( sunspot.year, stat.max, R=1e4))
system.time(boot.3 <- tsboot( sunspot.year, max, R=1e5, parallel='multicore', ncpus=4))
system.time(boot.4 <- boot( sunspot.year, stat.max, R=1e5, parallel='multicore', ncpus=8))
system.time(boot.5 <- boot( sunspot.year, stat.max, R=1e6, parallel='multicore', ncpus=8))
par(col=rgb(0,0,0,.1), pch=20)
for (thing in ls()) {
    print(object.size(get(thing)), units='auto')
}

This code is doing a few things:

  1. resampling the sunspot dataset to try to estimate the most sunspots we “should” see in a year (with a very stylised meaning of “should”).

    This is worth looking into because some people say global warming is caused by sunspots rather than e.g. carbon emissions multiplying greenhouse effects.

    History only happened once but by bootstrapping we try to overcome this.

  2. noodling around with multiple cores (my laptop has 8; run lscpu to see yours). Nothing interesting happens in this case; still, multicore is an option.
  3. timing how long fake reprocessings of history take with various amounts of resampling and various numbers of cores
  4. showing how big those bootstrap objects are. Remember, R runs entirely in memory, so big datasets or derived objects of any kind can cramp your home system or bork your EC2.
  5. printing the size of the objects, as promised. On my system (where I didn’t run exactly the code above) the output was:
    > for (obj in ls()) { message(obj); print(object.size(get(obj)), units='auto') }
    89.1 Kb
    792.2 Kb
    7.6 Mb
    7.6 Mb
    7.6 Mb
    792.2 Kb
    64 bytes
    2.5 Kb
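If you want to reproduce the flavour of this without the long runs, here’s a scaled-down sketch (the boot package and the sunspot.year dataset both ship with a standard R install; the (data, indices) wrapper is the form boot() expects for its statistic):

```r
library(boot)
stat.max <- function(d, i) max(d[i])         # resampled maximum of sunspot counts
small <- boot(sunspot.year, stat.max, R=100)
big   <- boot(sunspot.year, stat.max, R=10000)
print(object.size(small), units='auto')
print(object.size(big),   units='auto')      # storage grows with the number of replicates R
```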


PS To find out how much memory you have (in linux or maybe Mac also) do:

$ free -mt
             total       used       free     shared    buffers     cached
Mem:         15929      12901       3028          0        214       9585
-/+ buffers/cache:       3102      12827
Swap:        10123          0      10123
Total:       26053      12901      13152

A question I’ve googled before without success. Hopefully this answer will show up for someone who needs it. I’ll also go over the better-known uses of ? just in case.

  • To get help in R about a function like subset you type ?subset. That’s like man subset from the command line.
  • If you only know roughly what you’re looking for use double question marks: so ??nonlinear will lead to the package nlme. That’s like apropos on the command line.
  • To get a package overview, type ?xts::xts; there is no plain ?xts help. Some packages lack even that (there’s no ?twitteR::twitteR), in which case use ??twitteR to find the individual help pages: ?twitteR::status-class, ?twitteR::dmGet, and so on.
  • Finally, the question of the title. To get R help on punctuation such as (, {, [, `, ::, ..., +, and yes, even on ? itself, use single quotes to ‘escape’ the meaningful symbol. Examples follow:
    • ?'`'
    • ?'('
    • ?'['
    • ?'...'
    • ?'+'
    • ?'%*%'
    • ?'%x%'
    • ?'%o%'
    • ?'%%'
    • ?'%/%'
    • ?'$'
    • ?'^'
    • ?'~'
    • ?'<-'
    • ?'='
    • ?'<<-'

All of the quotation marks `, ', " share the same help file, so ?'"' or ?'`' will give you the same page as ?"'".

tl;dr: If you want to be contacted for freelance R work, edit this list https://github.com/isomorphisms/hire-an-r-programmer/blob/gh-pages/README.md.


Background/Problem: I was looking for a list of R-programming freelancers and realised there is no such list.

Other than famous people and people I already talk to, I don’t know even a small fraction of the R community—let alone people who do R among other things and don’t participate in the mailing lists or chatrooms I do.

This is actually a more general problem, since anyone looking to hire an R programmer will find a wall of tutorials if they search http://google.com/search?q=hire+an+r+programmer.


Solution: I thought about making a publicly-editable website where freelancers can put their contact info, specialty areas, links to projects, preferred kind of work, rates, and so on.

Of course, I’d have to make the login system. And validate users. And fight spam. And think up some database models, change the fields if someone suggests something better… And it would be nice to link to StackOverflow, Github, CRAN, and …

The more I thought about it the more I favoured a solution where someone else does all the work. GitHub already has a validation system, usernames, logins, and a publicly editable “wiki”. MVP. No maintenance, no vetting, no development. GitHub already shows up in google so whoever searches for “hire an R programmer” will find you if you put your details there.

It’s actually unbelievable that we’ve had R-Bloggers as a gathering place for so long, but nowhere central to list who’s looking for work.

So I committed https://github.com/isomorphisms/hire-an-r-programmer/blob/gh-pages/README.md which is a markdown file you can add your information to, if you want to be found by recruiters who are looking for R programmers. Forking is a good design pattern for this purpose as well. Add whatever information you want, and if you think I’m missing some fields you can add those as well. Suggestions/comments also welcome below.

Two interesting ideas here:

  • "trading time"
  • price impact of a trade proportional to exp( √size )

Code follows:


(Source: finmath.stanford.edu)

playing along with Elias Wegert in R:

X <- matrix(1:100,100,100)                   #grid
X <- X * complex(imaginary=.05) + t(X)/20    #twist & shout
X <- X - complex(real=2.5,imaginary=2.5)     #recentre
plot( X, col=hcl( h=55*Arg(sin(X)), c=Mod(sin(X))*40 ), pch=46, cex=6 )

Found it was useful to define these few functions:

arg <- function(z) (Arg(z)+pi)/2/pi*360     #for HCL colour input
ring <- function(C) C[.8 < Mod(C) &   Mod(C) < 1.2]        #focus on the unit circle
lev <- function(x) ceiling(log(x)) - log(x)
m <- function(z) lev(Mod(z))
plat <- function(domain, FUN) plot( domain,
        col=hcl( h=arg(FUN(domain)), l=70+m(domain) ),
        pch=46, cex=1.5, main=substitute(FUN) )     #say it directly

NB: hcl's hue runs over [0,360], so the phase Arg(z) ∈ (−π,π] needs to be rescaled to match; that’s what arg() above does.
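Putting the helpers together on the twisted grid from above (just a sketch; the ring() subset keeps Mod(z) away from 0, where lev() would blow up on log(0)):

```r
X <- matrix(1:100,100,100)
X <- X * complex(imaginary=.05) + t(X)/20
X <- X - complex(real=2.5,imaginary=2.5)

arg  <- function(z) (Arg(z)+pi)/2/pi*360              # phase rescaled to hcl's [0,360]
ring <- function(C) C[.8 < Mod(C) & Mod(C) < 1.2]     # focus on the unit circle
lev  <- function(x) ceiling(log(x)) - log(x)
m    <- function(z) lev(Mod(z))
plat <- function(domain, FUN) plot( domain,
        col=hcl( h=arg(FUN(domain)), l=70+m(domain) ),
        pch=46, cex=1.5, main=substitute(FUN) )

plat( ring(X), sin )    # phase portrait of sin near the unit circle
plat( ring(X), exp )
```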

I like this concept of “low volatility, interrupted by occasional periods of high volatility”. I think I will call it “volatility”.

Daniel Davies

via nonergodic


(PS: If you didn’t see it before: try plotting this in R:

vol.of.vol <- function(x) {
    dpois(x, lambda=dpois(x, 5))
}

… and so on, to your heart’s content.


Fun, right?)

Whilst reading John Hempton’s post on shorting $HLF I decided to follow along in quantmod.

Long story short, HerbaLife sells weight-loss supplements through a multi-level marketing business model which Bill Ackman contends is an unsustainable, illegal pyramid scheme. Ackman calls it “the only billion-dollar brand no-one’s ever heard of” and Hempton posts some very unflattering Craigslist marketing adverts:




thus undermining its credibility.

I should mention that I got some internet ads from Pershing Square Capital Management when I googled for herbalife information. In other words the shorts are spreading the word to the common man: jump on this short! Destroy this pyramid scheme! You could read this as similar to a penny-stock email, but I view it simply as credible self-interest: I already put my shorts on for what I believe are rational reasons. It’s obviously in my self-interest to convince you to do the same, but I do in fact believe that $HLF will and should go down, and you know I believe it because I put my money where my mouth is. Whether that’s an ideal confluence of honesty, moral high ground, and selfishness—capitalism at its best—or some overpowerful hedgies using their marketing dollars to bring down a solid company, I’ll leave up to you.


Anyway, on to the quantmod stuff.

Here’s how to generate the 2007–present view:


require(quantmod); getSymbols('HLF'); setDefaults(chartSeries, up.col="gold", dn.col="#2255aa", color.vol=FALSE); chartSeries(HLF)

Now here’s the interesting part.

(…”Ackman” should read “Einhorn” in red there…)

You can notice in red that volume (shares traded per day) has risen to 10 or 20 times normal levels during 2013—which maybe we can attribute to the “buzz” generated by Pershing Square, @KidDynamite, Bronte Capital, and whoever else is calling $HLF a pyramid scheme.

median(Vo(HLF)) tells me the halfway split between “high” and “low” trading volume for this stock. It’s roughly 2 million shares per day. Then with quantmod I can plot those hi-lo subsets with chartSeries(subset(HLF, Vo(HLF)<2e6)); chartSeries(subset(HLF, Vo(HLF)>2e6)) to get a visual on “calm days” versus “heavy days”. That’s something you can’t do with Google Charts.
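If you don’t have quantmod to hand, the split itself is nothing exotic; here’s a base-R sketch of the same idea on made-up numbers (vol and close are fabricated stand-ins, not $HLF data):

```r
set.seed(42)
vol   <- rlnorm(500, meanlog=14)        # fake daily volumes
close <- 40 + cumsum(rnorm(500))        # fake closing prices
calm  <- close[ vol <  median(vol) ]    # quiet days
heavy <- close[ vol >= median(vol) ]    # busy days
c( length(calm), length(heavy) )        # roughly a 50/50 split
```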

Here’s calm (under 2 million shares/day)


upper half of heavy trading days (over 2 million shares/day)


and what I’ll call “pirate days” (over 10 million shares/day)—with plunderers swarming the stock, battling with swords between their teeth


wherein it’s visually clear that very heavy trading skewed blue over gold—i.e. $HLF closed lower than it opened on that day: the heavy trading volume was taking the price downward.

But more precisely what happened on those days? This is a good job for the stem-and-leaf plot. Notice, by the way, that reality here looks nothing like a bell curve. Sorry, pet peeve. Anyway here is the stem plot of heavy trading days:

> hi.volume <- subset(HLF, Vo(HLF)>1e7)
> stem(Cl(hi.volume)-Op(hi.volume))

  The decimal point is at the |

  -14 | 1
  -12 | 
  -10 | 
   -8 | 2
   -6 | 1554
   -4 | 542
   -2 | 430
   -0 | 988761851
    0 | 345667780388
    2 | 058699
    4 | 1
    6 | 5

I love stem plots because they give you precision and the general picture at once. From the way the ink lies you get the same pic as the kernel density plot:

plot( density( Cl(hi.volume) - Op(hi.volume) ), col="#333333", ylab="", yaxt="n", main="Volume at least 10 million $HLF", xlab="Price Movement over the Trading Day" )
polygon( density( Cl(hi.volume) - Op(hi.volume) ), col="#333333", border="#333333" )
but you can also see actual numbers in the stem plot. For example the ones to the right of +0 are pretty interesting. Slight gains on many of those pirate days, but not enough to bash back a 14-point loss on a single day.

I was re-reading Michael Murray’s explanation of cointegration:

and marvelling at the calculus.

Of course it’s not just any subtraction. It’s subtracting a function from a shifted version of itself. Still, that doesn’t sound like a universal revolution.

(But of course the observation that the lagged first-difference will be zero around an extremum (turning point), along with symbolic formulæ for (infinitesimal) first-differences of a function, made a decent splash.)

definition of derivative

Jeff Ryan wrote some R functions that make it easy to first-difference financial time series.


Here’s how to do the first differences of Goldman Sachs’ share price:

require(quantmod); getSymbols('GS')
gs <- Ad(GS)
plot(  gs - lag(gs)  )


Look how much more structured the result is! Now all of the numbers are within a fairly narrow band. With length(gs) I found 1570 observations. Here are 1570 random normals plot(rnorm(1570, sd=10), type="l") for comparison:


Not perfectly similar, but very close!
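The same point in miniature, without any market data: difference a toy random walk and you recover the i.i.d. shocks that built it (a sketch; 1570 just matches the $GS series length above):

```r
set.seed(1)
walk <- cumsum(rnorm(1570))    # a toy 'price' path: cumulative sum of standard normals
d    <- diff(walk)             # its first differences
c( sd(walk), sd(d) )           # the walk wanders widely; the differences hug sd = 1
```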

Looking at the first differences compared to a Gaussian brings out what’s different between public equity markets and a random walk. What sticks out to me is the vol leaping up aperiodically in the $GS time series.

I think I got even a little closer with drawing the stdev’s from a Poisson process plot(rnorm(1570, sd=rpois(1570, lambda=5)), type="l")


but I’ll end there with the graphical futzing.

What’s really amazing to me is how much difference a subtraction makes.

The Cauchy distribution (?dcauchy in R) nails a flashlight over the number line


and swings it at a constant speed from 9 o’clock down to 6 o’clock over to 3 o’clock. (Or the other direction, from 3→6→9.) Then counts how much light shone on each number.


In other words we want to map evenly from the circle (minus the top point) onto the line. Two of the most basic, yet topologically distinct shapes related together.


You’ve probably heard of a mapping that does something close enough to this: it’s called tan.

Since tan is so familiar it’s implemented in Excel, which means you can simulate draws from a Cauchy distribution in a spreadsheet. Make a column of =RAND()'s (say column A) and then pipe them through tan. For example B1=TAN(A1). You could even do =TAN(RAND()) as your only column. That’s not quite it, though; you need to stretch and shift the [0,1] domain of =RAND() so it matches (−π/2, +π/2), the half-circle of directions the flashlight sweeps through. So really the long formula (if you didn’t break it into separate columns) would be =TAN( PI() * (RAND()−.5) ). A stretch and a shift and you’ve matched the domains up. There’s your Cauchy draw.

In R one could draw three Cauchy’s with rcauchy(3) or with tan(pi*(runif(3)-.5)).
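A quick sanity check that the tan trick really does produce Cauchy draws: compare the sample median and quartiles against the true values 0 and ±1 (the right summaries here, since a Cauchy has no mean):

```r
set.seed(7)
via.tan <- tan( pi*(runif(1e5) - .5) )    # the tan transform of uniform angles
via.r   <- rcauchy(1e5)                   # R's built-in Cauchy sampler
median(via.tan); median(via.r)            # both near 0, the Cauchy's centre
quantile(via.tan, c(.25,.75))             # near -1 and +1, the Cauchy quartiles
```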



What’s happening at tan(−π/2) and tan(π/2)? The tan function is shooting off to ±∞.

I saw this in school and didn’t know what to make of it—I don’t think I had any further interest than finishing my problem set.

(figure: the hyperbola y = 1/x)

I saw as well the ±∞ in the output of flip[x]= 1/x.

  • 1/−.0000...001 → −∞ whereas 1/.0000...0001 → +∞.

It’s not immediately clear in the flip[x] example, but in tan[x/2] what’s definitely going on is that the angle is circling around the top of the circle (the hole in the top), and the flashlight of the Cauchy distribution is pointing either to the right or to the left, along a parallel above the line.

Why not just call this ±∞ the same thing? “Projective infinity”, or, the hole in the top of the circle.