Posts tagged with R

Hello R people.

In December of 2013 I posted a cheap-o wiki-editable (thank you github) contact list which recruiters can use to find you, if they’re looking for R programmers.

In what I consider a resounding success, within a few weeks it got onto the first page of Google (thank you github), and within a month or two it was the first result.

So I would say this is the best (also to my knowledge the only) place to put your name if you want to be found for R work.

Posting this again because some people’s situations may have changed, and others may not have seen the first notice. Also because a little bird told me about a recruiter who wants to hire people for full-time ggplot work in London asap.

I’ll be checking github for pull requests in the wake of this posting, to make sure your details don’t linger in github limbo.

√(x²−1)(x²−k²).      x,k∈ℂ

(actually just going over the unit circle, not all of ℂ)

edit: hey, are these showing up as moving gifs for you?



@tdhopper posted his self-measurements of weight loss a few months back. I recently decided that I also wanted to lose fat-weight (the infamous “I could stand to be a few kilos lighter”), and I think I came up with a more productive way of thinking about my progress: I’m not going to look at the scale at all. I’m just going to count calorie estimates from the treadmill estimator, or use online calculators for how much is burned by running / swimming. Calories burned is the only thing I will use: no attempts at eating less.


Also, instead of thinking in terms of weight I’m going to think in terms of volume. Here are some pictures of people holding 5 pounds of fat (2¼ kilos):

As you can see this is a large fraction of a person’s flesh, if their BMI is in the healthy range.

I’m not so fat that I have tens of litres of fat making up my body. Rather, if I look at myself and visually “remove 2 litres”, that “looks” like it would be very substantial: such a huge volume that of course it would take weeks of diligent exercise!

But as we know from Mr Hopper’s posts (or I know it from my own experience of weighing myself), the noise is louder than the signal.

The magnitude of daily variation swamps the magnitude of “fundamental” progress.


The goal of counting kcal burned and thinking in terms of volume is to make both the goals and the progress feel more visceral. Everybody knows how to lose weight; the problem is just that one doesn’t do it. Other than simply increasing self-discipline or increasing the mental energy I put towards this goal (neither of which I want to do), that leaves:

  1. More accurate measurement of my small-scale progress and
  2. Choosing meaningful goals in the first place: not a number grabbed out of the air (“five kilos”: why five?), but rather imagining how much volume has left my muffin-top and how much volume is left, whilst still carrying with me the “larger numbers” associated with kcal fat-loss rather than the “small numbers” which characterise litres (gallons ~ 8 lbs) of fat loss.

Here’s my mathematical model of why this is hard in the first place:

  • I take about 100 measurements at roughly, but not exactly, the same times:
      timepoints <- 1:1e2 + rnorm(1e2,sd=1)
  • the natural variation in weight, in the unit scale of [kcal stored by fat], is on the order of kilos:
      daily.variation <- 1e5 * sin( runif(1,min=-pi/2,max=pi/2) + timepoints)
  • even if I subtracted off my daily fluctuation pattern (Mr Hopper does this by weighing himself at the same time every day), there are apparently other noise factors on the order of half a kilo or perhaps .1 kilo:
      other.variation <- 1e4 * sin( runif(1,min=-pi/2,max=pi/2) + timepoints)
  • the “underlying phenomenon” I’m trying to measure is perhaps on the order of .01 kilos lost per day. Let’s say I lose 1 kilo in 3 weeks; that would be 8000 kcal if I’m good (i.e., I actually do my workouts and I don’t eat a compensatory extra 8000± kcal). I could model the underlying fat loss as a step function to be more truthful, but I’ll use a linear model, saying I lose 100 kcal per measurement (supposing I measure 3 times a day) rather than 700 kcal every time I work out, which is not once a day (that would be the step function). But the catch is, I’m not sure if I’m compensating by eating more. My statistical task is to estimate B, in other words to distinguish if I’m losing weight or not, and how fast I’m losing it (in kcal units, leaving the conversion 8000 kcal ~ 1 kilo as an afterthought), from the noise-swamped data:
      B <- rnorm(1,mean=100,sd=50); trend <- -B*timepoints
  • Now my job is to estimate B. Is it even positive? (i.e. am I actually losing weight?) In R I just made the variable so I could print(B), but the point is to model why it’s hard to do this from my real data, which is the sum:
      data <- daily.variation   +   other.variation   - B*timepoints
  • This is why I like my idea: measurements of kcal burned on the treadmill are 1000 times more precise than measurements of my bodyweight.
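
The pieces of the model above can be assembled into one runnable sketch. The names timepoints, daily.variation, other.variation and B come from the list; the fixed seed, the name observed (used instead of data, which would mask a base R function) and the choice of ordinary least squares via lm() are my assumptions. It is just one simple way to attempt the estimate, not necessarily the best one.

```r
set.seed(42)   # assumption: fixed seed so the run is repeatable

n <- 1e2
timepoints      <- 1:n + rnorm(n, sd = 1)
daily.variation <- 1e5 * sin(runif(1, min = -pi/2, max = pi/2) + timepoints)
other.variation <- 1e4 * sin(runif(1, min = -pi/2, max = pi/2) + timepoints)
B               <- rnorm(1, mean = 100, sd = 50)   # true kcal lost per measurement

# what the scale "sees": noise plus a slow downward trend
observed <- daily.variation + other.variation - B * timepoints

# naive attempt: regress the observations on time and read off the slope
fit   <- lm(observed ~ timepoints)
B.hat <- -coef(fit)[["timepoints"]]
B.se  <- summary(fit)$coefficients["timepoints", "Std. Error"]

cat("true B:", round(B), " estimate:", round(B.hat), " std. error:", round(B.se), "\n")
```

Comparing B.se with B shows the problem: with noise amplitudes of 1e5 and 1e4 against a slope of about 100, the standard error of the slope can easily be as large as B itself, so even the sign of the estimate is shaky.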

So my overall system is to do “chunks” of 7000 kcal = 1 kilo of fat or 3500 kcal =1 pound of fat. I can stand to do 500–700 kcal per cardio session—about an hour. (I also do an extra +1 kcal for every minute it took me to penalise for low speed: exercise crowds out normal metabolism.) Then it becomes a “long count” up to 3500 or up to 7000. That means 5 cardio sessions (of 770 kcal each) to get up to 1 pound of fat-loss, 7 wimped-out cardio sessions (of 550 kcal each) to reach a pound, and so on. It’s easy enough to “count to 5”. This system makes each one of the 5 be significantly large at the order of magnitude appropriate to convert kcal of exercise to litres of body volume.
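
The “long count” arithmetic is easy to sketch in R. The 3500 and 7000 kcal figures are the ones used above; the function name sessions.needed and the session log are hypothetical, made up for illustration, and the per-minute penalty is ignored here.

```r
kcal.per.pound <- 3500   # figures used in the text
kcal.per.kilo  <- 7000

# how many sessions of a given size it takes to fill one chunk
sessions.needed <- function(kcal.per.session, target = kcal.per.pound) {
  ceiling(target / kcal.per.session)
}

sessions.needed(700)                  # strong sessions per pound
sessions.needed(500)                  # wimped-out sessions per pound
sessions.needed(700, kcal.per.kilo)   # strong sessions per kilo

# running tally over a hypothetical log of treadmill estimates (kcal)
session.log <- c(650, 700, 520, 700, 610, 580)
long.count  <- cumsum(session.log)
first.pound <- which(long.count >= kcal.per.pound)[1]   # session that completes the first pound
```

The point of the tally is that the only number I have to keep in my head is a small integer: which session I am on.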

> plot( polyroot(choose(131,14:29)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:39)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:59)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:79)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:99)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:119)) ,pch=19,col='red')
> plot( polyroot(choose(131,14:139)) ,pch=19,col='red')

I’ve googled “How do I find out how big my workspace is” too many times … here’s the explicit code to run, and hopefully the next googler sees this post:

for (thing in ls()) { message(thing); print(object.size(get(thing)), units='auto') }

Fin. You can stop there.


Or for a bit of context… Here’s some example code to generate objects of various sizes, where you might not be sure how big they are:

library(boot)                          # provides boot() and tsboot()
boot.max <- function(x, i) max(x[i])   # boot() hands the statistic the data plus the resampled indices
system.time(boot.1 <- boot( sunspot.year, boot.max, R=1e3, parallel='multicore', ncpus=4))
system.time(boot.2 <- boot( sunspot.year, boot.max, R=1e4))
system.time(boot.3 <- tsboot( sunspot.year, max, R=1e5, parallel='multicore', ncpus=4))
system.time(boot.4 <- boot( sunspot.year, boot.max, R=1e5, parallel='multicore', ncpus=8))
system.time(boot.5 <- boot( sunspot.year, boot.max, R=1e6, parallel='multicore', ncpus=8))
par(col=rgb(0,0,0,.1), pch=20)
for (thing in ls()) {
    print(object.size(get(thing)), units='auto')
}

This code is doing a few things:

  1. resampling the sunspot dataset to try to estimate the most sunspots we “should” see in a year (with a very stylised meaning of “should”).

    This is worth looking into because some people say global warming is caused by sunspots rather than eg carbon emissions multiplying greenhouse effects.

    History only happened once but by bootstrapping we try to overcome this.

  2. noodling around with multiple cores (my laptop has 8; lscpu will tell you how many you have). Nothing interesting happens in this case; still, multicore is an option.
  3. timing how long fake reprocessings of history take with various amounts of resampling and various numbers of cores
  4. showing how big those bootstrap objects are. Remember, R runs entirely in memory, so big datasets or derived objects of any kind can cramp your home system or bork your EC2.
  5. printing the size of the objects, as promised. On my system (on which I didn’t run the exact code above) the output was:
    > for (obj in ls()) { message(obj); print(object.size(get(obj)), units='auto') }
    89.1 Kb
    792.2 Kb
    7.6 Mb
    7.6 Mb
    7.6 Mb
    792.2 Kb
    64 bytes
    2.5 Kb
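
If you would rather have the sizes as a sorted vector than as a stream of printed lines, something like the following works. The function name workspace.sizes and its env argument are my own invention for this sketch; object.size(), ls() and get() are the same base/utils functions used above.

```r
# sizes in bytes of every object in an environment, biggest first
workspace.sizes <- function(env = globalenv()) {
  nms   <- ls(env)
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  sort(sizes, decreasing = TRUE)
}

# small demonstration in a scratch environment, so nothing in your workspace is touched
scratch <- new.env()
assign("big",   numeric(1e5), envir = scratch)
assign("small", 1L,           envir = scratch)
sizes <- workspace.sizes(scratch)
round(sizes / 2^20, 3)   # in MiB
```

Calling workspace.sizes() with no arguments looks at the global workspace, which is the usual case.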


PS To find out how much memory you have (in linux or maybe Mac also) do:

$ free -mt
             total       used       free     shared    buffers     cached
Mem:         15929      12901       3028          0        214       9585
-/+ buffers/cache:       3102      12827
Swap:        10123          0      10123
Total:       26053      12901      13152

A question I’ve googled before without success. Hopefully this answer will show up for someone who needs it. I’ll also go over the better-known uses of ? just in case.

  • To get help in R about a function like subset you type ?subset . That’s like man subset from the command line.
  • If you only know roughly what you’re looking for use double question marks: so ??nonlinear will lead to the package nlme. That’s like apropos on the command line.
  • To get a package overview, type ?xts::xts. There is no ?xts help. For packages that don’t have an overview page of that form (there is no ?twitteR::twitteR), you will need to use ??twitteR to find help pages such as ?twitteR::status-class, ?twitteR::dmGet, etc.
  • Finally, the question of the title. To get R help on punctuation such as (, {, [, `, ::, ..., +, and yes, even on ? itself, use single quotes to ‘escape’ the meaningful symbol. Examples follow:
    • ?'`'
    • ?'('
    • ?'['
    • ?'...'
    • ?'+'
    • ?'%*%'
    • ?'%x%'
    • ?'%o%'
    • ?'%%'
    • ?'%/%'
    • ?'$'
    • ?'^'
    • ?'~'
    • ?'<-'
    • ?'='
    • ?'<<-'

All of the quotation marks `, ', " share one help file, so ?'"' or ?'`' will give you the same page as ?"'" (a single quote has to be wrapped in double quotes).
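
The same lookups can be done with help() and a quoted string, which is handy in a script. Here is a small sketch that checks, without opening any pages, whether a help topic is found for a handful of the symbols listed above. The vector name topics and the result name has.help are made up for this example, and what comes back depends on which help files are installed on your system.

```r
topics <- c("(", "[", "...", "+", "%*%", "%%", "$", "<-", "?")

# do.call() hands help() the string itself rather than the name of a variable;
# the result is only inspected, never printed, so no help page pops up
has.help <- vapply(topics,
                   function(tp) length(do.call(help, list(tp))) > 0,
                   logical(1))
has.help
```

Interactively, help("[") and ?'[' land on the same page.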

tl;dr: If you want to be contacted for freelance R work, edit this list


Background/Problem: I was looking for a list of R-programming freelancers and realised there is no such list.

Other than famous people and people I already talk to, I don’t know even a small fraction of the R community—let alone people who do R among other things and don’t participate in the mailing lists or chatrooms I do.

This is actually a more general problem since anyone looking to hire an R programmer will find a wall of tutorials if they


Solution: I thought about making a publicly-editable website where freelancers can put their contact info, specialty areas, links to projects, preferred kind of work, rates, and so on.

Of course, I’d have to make the login system. And validate users. And fight spam. And think up some database models, change the fields if someone suggests something better…. And it would be nice to link to StackOverflow, Github, CRAN, and …

The more I thought about it the more I favoured a solution where someone else does all the work. GitHub already has a validation system, usernames, logins, and a publicly editable “wiki”. MVP. No maintenance, no vetting, no development. GitHub already shows up in google so whoever searches for “hire an R programmer” will find you if you put your details there.

It’s actually unbelievable that we’ve had R-Bloggers as a gathering place for so long, but nowhere central to list who’s looking for work.

So I committed a markdown file which you can add your information to, if you want to be found by recruiters who are looking for R programmers. Forking is a good design pattern for this purpose as well. Add whatever information you want, and if you think I’m missing some fields you can add those as well. Suggestions/comments also welcome below.

Two interesting ideas here:

  • "trading time"
  • price impact of a trade proportional to exp( √size )

Code follows:

Read More


playing along with Elias Wegert in R:

X <- matrix(1:100,100,100)                  #grid
X <- X * complex(imaginary=.05) + t(X)/20    #twist & shout
X <- X - complex(real=2.5,imaginary=2.5)     #recentre
plot(X, col=hcl(h=55*Arg(sin(X)), c=Mod(sin(X))*40 ) ,        pch=46, cex=6)

Found it was useful to define these few functions:

arg <- function(z) (Arg(z)+pi)/2/pi*360     #for HCL colour input
ring <- function(C) C[.8 < Mod(C) &   Mod(C) < 1.2]        #focus on the unit circle
lev <- function(x) ceiling(log(x)) - log(x)
m <- function(z) lev(Mod(z))
plat <- function(domain, FUN) plot( domain, col= hcl( h=arg(FUN(domain)), l=70+m(domain)), pch=46, cex=1.5, main=substitute(FUN) )           #say it directly

NB: hcl's hue runs over [0, 360], so the phase (Arg returns values in (−π, π]) needs to be rescaled to match; that is what arg does.
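
A quick check of that rescaling on the four points where the unit circle crosses the axes. The arg definition is copied from above; the names z, hues and cols are just for this example, and the chroma and luminance passed to hcl() are arbitrary.

```r
arg <- function(z) (Arg(z) + pi) / 2 / pi * 360   # same definition as above

z    <- c(1, 1i, -1, -1i)
hues <- arg(z)
hues   # 180 270 360 90

cols <- hcl(h = hues, c = 40, l = 70)   # four colour strings, one per point
```

So positive reals come out at hue 180 and negative reals at 360, i.e. the colour wheel is rotated half a turn relative to the phase.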

I like this concept of “low volatility, interrupted by occasional periods of high volatility”. I think I will call it “volatility”.

Daniel Davies

via nonergodic


(PS: If you didn’t see it before: try plotting this in R:

vol.of.vol <- function(x) {
    dpois(x, lambda=dpois(x, 5))
}

… and so on, to your heart’s content.


Fun, right?)