
Posts tagged with econometrics

If the astronomical observations and other quantities on which the computation of orbits were absolutely correct, the elements also, whether deduced from three or four observations, would be strictly accurate (so far indeed as the motion is supposed to take place exactly according to the laws of Kepler), and, therefore, if other observations were used, they might be confirmed but not corrected.

But since all our measurements and observations are nothing more than approximations to the truth, the same must be true of all calculations resting upon them, and the highest aim of all computations made concerning concrete phenomena must be to approximate, as nearly as practicable, to the truth. But this can be accomplished in no other way than by a suitable combination of more observations than the number absolutely requisite for the determination of the unknown quantities. This problem can only be properly understood when an approximate knowledge of the orbit has been already attained, which is afterwards to be corrected so as to satisfy all the observations in the most accurate manner possible.

Johann Carl Friedrich Gauß, Theoria Motus Corporum Cœlestium in Sectionibus Conicis Solem Ambientium, 1809

(translation by C.H. Davis 1963)

(Source: cs.unc.edu)




notstatschat:

1. As Dan Davies observed (from memory): “The Great Depression really happened; it wasn’t just an unusually inaccurate observation of an underlying 4% return on equities.”

2. Why do we assume errors have zero mean?  …the mean of the residuals is not identifiable separately from the intercept, and we just choose the parametrization that has mean-zero residuals. In that situation it’s not an assumption and couldn’t be falsified empirically.




[In] Against Method … [Paul] Feyerabend divides his argument into an abstract critique followed by a number of historical case studies.

The abstract critique is a reductio ad absurdum of … the belief that a single methodology can produce scientific progress. Feyerabend … identifies four features of methodological monism: the principle of falsification, a demand for increased empirical content, the forbidding of ad hoc hypotheses and the consistency condition.

He then demonstrates that these features [together would] imply that science could not progress….

Wikipedia

(Source: Wikipedia)




I am a philosophical naïf with the background knowledge of the typical economist; that is, “utilitarianism is the moral framework, Rawls said some stuff that disagreed, but what can you really do with that stuff? Now tell me your R-squared.”




[T]he implication of authority in science is rather odd given that the fifteenth-century revival of science in western Europe was a reaction against argument from authority.


…Precisely what “science” denotes is … unclear, but the present mental associations of objectivity and progress ensure that … using this prestigious epithet confers an air of authority; … would anyone attend the London School of Economics and Political Alchemy?




Although partial least squares regression was not designed for classification and discrimination, it is … used for these purposes. For example, PLS has been used to:

  • distinguish Alzheimer’s, senile dementia of the Alzheimer’s type, and vascular dementia
  • discriminate between Arabica and Robusta coffee beans
  • classify waste water pollution
  • separate active and inactive compounds in a quantitative structure-activity relationship study
  • differentiate two types of hard red wheat using near-infrared analysis
  • distinguish transsexualism, borderline personality disorder and controls using a standard instrument
  • determine the year of vintage port wine
  • classify soy sauce by geographic region
  • determine emission sources in ambient aerosol studies
Matthew Barker and William Rayens

(Source: enpub.fulton.asu.edu)




The word “statistics” was a coinage of German and Italian enthusiasts for state action in the early eighteenth century….

It became a sort of insanity….[tourists]…Samuel Johnson…. By the 1850’s the conservative critics of capitalism, such as Charles Dickens, were becoming very cross about statistics….

Counting can surely be a nitwit’s, or the Devil’s, tool. Among the more unnerving exhibits in the extermination camp at Auschwitz are the books in which Hitler’s willing executioners kept records on every person they killed.

The formal and mathematical theory of statistics was largely invented in the 1880’s by eugenicists (those clever racists at the origin of so much in the social sciences) and perfected in the twentieth century by agronomists (…Iowa State…). The newly mathematised statistics became a fetish in the wannabe sciences. During the 1920’s…sociology…quantification was a way of claiming status, as it became also in economics…(political economy), and in psychology, fresh[ly] separated from philosophy. In the 1920’s and 1930’s…social anthropologists…counted coconuts.

Deirdre McCloskey

HT @AClassicLiberal




My father used to tell me that when people complimented him on his tie, it was never because of the tie—it was because of the suit. If he wore his expensive suit, people would say “Nice tie!” But they were just mis-identifying what it was that they thought was nice. Similarly if you’re interviewing candidates and accidentally doing your part to perpetuate the beauty premium to salaries, you aren’t going to think “She was really beautiful, therefore she must be more competent”. You might just notate that she was a more effective communicator, got her point across better, seemed like more of a team player, something like that.

 

Achen (2002) proposes that regression in the social sciences should stick to at most three independent variables. Schrodt (2009) uses the phrase “nibbled to death by dummies”.

I understand the gripes. These two men are talking about political analysis, where the “macro” variables are shaky to begin with. What does it mean that the Heritage Foundation rated two countries 7 versus 9 points apart on corruption or freedom? Acts of corruption are individual and localised to a geography. Even “ethnofract”, which seems like a valid integral, still maps ∼10⁷ individual variation down to 10⁰. But this is statistics with fraught macro measures trying to answer questions that are hard to quantify in the first place—like the Kantian peace or center–periphery theories of global political structure.

What about regressions on complexes in more modest settings with more definitive data measurements? Let’s say my client is a grocery store. I want to answer for them how changing the first thing you see in the store will affect the amount purchased of the other items. (In general trying to answer how store layout affects purchases of all items … this being a “first bite”.) Imagine for my benefit also that I’m assisted or directed by someone with domain knowledge: someone who understands the mechanisms that make X cause Y—whether it’s walking, smelling, typical thought patterns or reaction paths, typical goals when entering the store, whatever it is.

I swear by my very strong personal intuition that complexes are everywhere. By complexes I mean highly interdependent cause & effect entanglements. Intrafamily violence, development of sexual preference, popularity of a given song, career choice, are explained not by one variable but by a network of causes. You can’t just possess an engineering degree to make a lot of money in oil & gas. You also need to move to certain locations, give your best effort, network, not make obvious faux pas on your CV, not seduce your boss’ son, and on and on. In a broad macro picture we pick up that wealth goes up with higher degrees in the USA. Going from a G.E.D. to a Bachelor’s is associated with roughly a tripling of wealth, give or take.

I think this statistical path is worth exploring for application in any retail store. Or e-store or vending machine (both of which have a 2-D arrangement). Here, as prep, are some photos of 3-D stores:

http://4.bp.blogspot.com/_JaHRPL7dPnc/TUG8AVboDBI/AAAAAAAADBQ/OzAbFcF3xoc/s400/anthropologie-seattle.jpg

http://mariamccabe.files.wordpress.com/2010/01/anthropologie_modernenglish.jpg

http://2.bp.blogspot.com/_v-sbLptZkQw/TJN0vePbm9I/AAAAAAAAJzo/ktiWwLCty0E/s640/anthropologie1.JPG
http://dineroclub.net/wp-content/uploads/2012/09/ANTHROPOLOGYSTORE.jpg

http://www.fabulouslybroke.com/wp-content/uploads/2009/06/anthropologie-store-chicago.jpg


https://lh3.googleusercontent.com/-Zc1Dg5bkpuM/TXgQLKz-O-I/AAAAAAAAVWE/7mv74AXQ5N8/s1600/Decorator+Shop+1.jpg



http://t1.gstatic.com/images?q=tbn:ANd9GcReKL_BAz1i34ZrJbrkKDwlCmZHmcs8Ux4PE_BuOD0Ru-pScnyH
http://solaennuevayork.com/wp-content/uploads/2012/04/crit-span.jpg

And for the 2-D case (vending machine or e-store) here are some screen shots from Modcloth, marked up with potential “interaction arrows” that I speculated about.

[three Modcloth screenshots, marked up with speculative interaction arrows]

Again, I don’t have a great understanding of how item placement or characteristics really work so I am just making up some possible connections with these arrows here. Think of them as question marks.

  • purse, shoes, dress. Do you lead the (potential) customer up the path to a particular combination that looks so perfect? (As in a fashion ad—showing several pieces in combination, in context, rather than a “wide array” of the shirts she could be wearing in this scene.)
    Chanel
  • colours. Is it better to put matching colours next to each other? Or does that push customers in one direction when we’d prefer them to spread out over the products?
    http://acauseofdazzling.files.wordpress.com/2010/12/anthropologie_3.jpg
    http://i0.wp.com/www.emanuela.nl/wp-content/uploads/2011/05/anthropologie-nyc2.jpg
  • variety versus contrastability. Is it better to show “We have a marmalade orange and a Kelly green and a sky blue party dress—so much variety!” or to put three versions of the “little black dress” so the consumer can tightly specify her preferences on it?
    Prada
    Shop Window Design

    http://stylematters.us/wp/wp-content/uploads/2009/12/img_photo1.jpg
    http://1.bp.blogspot.com/_5UxblBQYTxk/TJH636wOZLI/AAAAAAAABy4/mGh7JeLcbhU/s1600/IMG_8769.JPG
    http://cathylwood.files.wordpress.com/2008/11/huntsville-102908-015.jpg
    http://4.bp.blogspot.com/_V_qSgJvLF30/TKOtmxTUCwI/AAAAAAAACbU/tp1o5hotrrA/s1600/IMG_5771.JPG
    http://www.fashion.hr/img/repository/2011/07/web_image/cn_image_size_sicgit_nyc_anthropologie.jpg

    And if you are going to put a purse or shoes along with it (now in 3-ary relations) again the same question arises. Is it better to put gold shoes and black shoes next to the “cocktail dress” to show its versatility? Or to keep it simple—just a standard shoe so you can think “Yes” or “No” and insert your own creativity independently, for example “In contrast to the black shoes they are showing me, I can visualise how my gold sparkly shoes would look in their place”? More and more issues of independence, contrast, context, and interdependence the more I think about the design challenge here.
    http://static.dezeen.com/uploads/2008/08/ferrer-store-by-arne-quinze-mg_4500.jpg

    http://static.dezeen.com/uploads/2008/08/ferrer-store-by-arne-quinze-mg_4463.jpg

  • "random" or "space" or "comparison". You put the flowers next to the shelves to make the shelves look less industrial, more rather part of a “beautiful home”. Strew “interesting books” that display some kind of character and give the shopper the good feelings of intellect or sophistication or depth.
    http://www.interiordesign.net/articles/blog/1850000585/20090304/Tjep.jpg
    http://si.wsj.net/public/resources/images/PJ-AT567_MENSTO_F_20100210145945.jpg
    http://i2.wp.com/www.emanuela.nl/wp-content/uploads/2011/05/anthropologie-bread-display1.jpg
    http://3.bp.blogspot.com/-_PZT6FOMv4A/TrxEl_pzyTI/AAAAAAAAAhE/DKJLdkU4j0o/s640/Plastic+Cups.jpg
    http://www.in-formdesign.com/Project/6/1.jpg
    http://1.bp.blogspot.com/-gyQXBb4TnW8/TkTDT7NquoI/AAAAAAAABZM/IjyVwRNpbt8/s1600/anthropologie.jpg
    http://i.telegraph.co.uk/multimedia/archive/01679/westfield_1679898c.jpg
    http://3.bp.blogspot.com/_uUplWiZ3Zjs/TBfpgLL1TFI/AAAAAAAAAyA/VAypsvgyrQQ/s1600/Modern-V2K-Nisantasi-Fashion-Store-interior-by-Autoban.jpg
    http://farm1.static.flickr.com/58/196986909_74f16e1fc8.jpg?v=1153821700
    http://www.refinery29.com/static/bin/entry/162/x/2761/roslyn-editorial-january-1.jpg
    http://takenfrom.com/Blank_Site/archive/Chlorine/High%20End%20Fashion%20Store%20Interior%20Design%20by%20Autoban.jpg
    Or, what if you just leave a blank space in the e-store array? Does it waste more time by making the shopper scroll down more? or does it create “breathing room” the way an expensive clothing store stocks few items?
  • price comparisons. You stock the really really expensive pantsuit next to the expensive pantsuit not to sell the really-really-expensive one, but to justify the price or lend even more glamour to the expensive one.

  • more obvious, direct complements, like putting carrots and pitas next to the hummous so that the hummous looks better and you will enjoy it more. Nothing sneaky in that case.

Did you ever have the experience where you buy something in the store, and it read so differently there, when you were caught up in the magic of the lifestyle they were trying to present to you, but now that it’s hanging up with your own stuff it reads so differently and doesn’t actually say what you thought it said at the time?

http://www.keeyool.com/020407_anthro_01.jpg
http://newcambridgeobserver.com/wp-content/uploads/2010/10/Anthropol-12.jpg

For me if I’m clothes shopping I’m thinking back on what else I own, what outfits I could make with this, how this is going to look on me, how its message fits in with my own personal style. And at the same time, the store is fighting me to define the context.

 

In the Modcloth example I’m talking mostly about 2- or 3-way interactions between objects. In analogy to simplicial complexes these would be the 1-faces or 2-faces of a skeleton.

But in general, in a branded store, the overall effect is closer to, let’s say, the N-cells or (N−1)-cells. Maybe it’s not as precise as the painting in http://isomorphismes.tumblr.com/post/16039994007/thoroughly-enmeshed-composition-perturbation or a perfectly crafted poem or TV advertisement, where one change would spoil the perfection.

http://www.kineda.com/photos/travel/hlam2.jpg

But clothing stores are definitely holistic to a degree. By which I mean that the whole is more than the sum of the parts. It’s about how everything works together rather than any one thing. And a good brand develops its own je ne sais quoi which, more than the elements individually, evokes some ideal lifestyle.

http://kickshawproductions.com/blog/wp-content/uploads/2010/12/09.10.22-The-Room.jpg

http://www.pitaya.com/slides/2013/slide2.jpg

Dior Balloon Ad
http://2.bp.blogspot.com/-HmIMSXLgNlI/TlKZkG9_r0I/AAAAAAAAEEc/A4wVR3KnbcY/s1600/Jo2_blog_V_22oct09_mag.jpg

http://mylifestream.net/photostream/uploaded_images/Anthropologie-store-in-Rockefeller-Center-2_12-27-2005_1-13-56_PM-792272.JPG

More on this topic after I finish my reading on Markov bases.




SETUP (CAN BE SKIPPED)

We start with data (how were they collected?) and the hope that we can compare them. We also start with a question which is of the form:

  • how much tax increase is associated with how much tax avoidance/tax evasion/country fleeing by the top 1%?
  • how much traffic does our website lose (gain) if we slow down (speed up) the load time?
  • how many of their soldiers do we kill for every soldier we lose?
  • how much do gun deaths [suicide | gang violence | rampaging multihomicide] decrease with 10,000 guns taken out of the population?
  • how much more fuel do you need to fly your commercial jet 1,000 metres higher in the sky?
  • how much famine [to whom] results when the price of low-protein wheat rises by $1?
  • how much vegetarian eating results when the price of beef rises by $5? (and again distributionally, does it change preferentially for people with a certain culture or personal history, such as having learned vegetarian meals before or having grown up unable to afford meat?) How much does the price of beef rise when the price of feed-corn rises by $1?
  • how much extra effort at work will result in how much higher bonus?
  • how many more hours of training will result in how much faster marathon time (or in how much better heart health)?
  • how much does society lose when a scientist moves to the financial sector?
  • how much does having a modern financial system raise GDP growth? (here ∵ the X ~ branchy and multidimensional, we won’t be able to interpolate in Tufte’s preferred sense)
  • how many petatonnes of carbon per year does it take to raise the global temperature by how much?
  • how much does $1000 million spent funding basic science research yield us in 30 years?
  • how much will this MBA raise my annual income?
  • how much more money does a comparable White make than a comparable Black? (or a comparable Man than a comparable Woman?)
  • how much does a reduction in child mortality decrease fecundity? (if it actually does)

  • how much can I influence your behaviour by priming you prior to this psychological experiment?
  • how much higher/lower do Boys score than Girls on some assessment? (the answer is usually “low |β|, with low p” — in other words “not very different, but because of the high volume of data, whatever difference we find comes with high statistical confidence”)

bearing in mind that this response-magnitude may differ under varying circumstances. (Raising morning-beauty-prep time from 1 minute to 10 minutes will do more than raising 110 minutes to 120 minutes of prep. Also there may be interaction terms: for example, you need both a petroleum engineering degree and to live in one of {Naija, Indonesia, Alaska, Kazakhstan, Saudi Arabia, Oman, Qatar} in order to see the income bump. Also many of these questions have a time-factor, like the MBA and the climate ones.)

building up a nonlinear function from linear parts

As Trygve Haavelmo put it: using reason alone we can probably figure out which direction each of these responses will go. But knowing just that raising the tax rate will drive away some number of rich doesn’t push the debate very far—if all you lose is a handful of symbolic Eduardo Saverins who were already on the cusp of fleeing the country, then bringing up the Laffer curve is chaff. But if the number turns out to be large then it’s really worth discussing.

In less polite terms: until we quantify what we’re debating about, you can spit bollocks all day long. Once the debate is quantified then the discussion should become way more intelligent, less derailing to irrelevant theoretically-possible-issues-which-are-not-really-worth-wasting-time-on.

So we change one variable over which we have control and measure how the interesting thing responds. Once we measure both we come to the regression stage where we try to make a statement of the form “A 30% increase in effort will result in a 10% increase in wage” or “5 extra minutes getting ready in the morning will make me look 5% better”. (You should agree from those examples that the same number won’t necessarily hold throughout the whole range. Like if I spend three hours getting ready the returns will have diminished from the returns on the first five minutes.)

Correlation

Avoiding causal language, we say that a 10% increase in (your salary) is associated with a 30% increase in (your effort).

 
MAIN PART (SKIP TO HERE IF SKIMMING)

The two numbers that jump out of any regression table output (e.g., lm in R) are p and β.

  • β is the estimated size of the linear effect
  • p is how sure we are that the estimated size is exactly β. (As in golf, a low p is better: more confident, more sure. A low p can also be stated as a high t.) The sketch just below shows where both numbers sit in R’s lm output.
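As a concrete illustration (a minimal sketch with made-up data, not anything from the original post), here is where β and p appear when you run lm in R:

```r
# Fake data: wage responds to effort, plus noise.
set.seed(1)
effort <- runif(200, 0, 10)
wage   <- 30 + 2 * effort + rnorm(200, sd = 5)

fit <- lm(wage ~ effort)
summary(fit)
# In the coefficients table, the "Estimate" for `effort` is beta (near 2 here);
# "Pr(>|t|)" is p, and "t value" is the corresponding t statistic.
```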

Bearing in mind that regression tables spit out many, many other numbers (like the Durbin–Watson statistic, the F statistic, the Akaike Information Criterion, and more) specifically to flag potential problems with interpreting β and p naïvely, here are pictures of the textbook situations where p and β can be interpreted in the straightforward way:

First, the standard cases where the regression analysis works as it should and how to read it is fairly obvious:
(NB: These are continuous variables rather than on/off switches or ordered categories. So instead of “Followed the weight-loss regimen” or “Didn’t follow the weight-loss regimen”, someone has quantified how much the regimen was followed. Again, actual measurements (how they were coded) getting in the way of our gleeful playing with numbers.)

[three plots of the textbook cases]

Second, the case I want to draw attention to: weak statistical significance (a high p) doesn’t necessarily mean nothing’s going on there.

[two plots: a high β estimated with a high p]

The code I used to generate these fake-data and plots.
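Here is a minimal sketch of the kind of fake data being described: a large β estimated with so much noise that p comes out high (my own reconstruction, not necessarily the original code).

```r
# Few observations, huge noise: the slope estimate is large but unstable.
set.seed(42)
x <- rnorm(12)
y <- 2 * x + rnorm(12, sd = 10)    # true effect is 2, but the noise swamps it

fit <- lm(y ~ x)
summary(fit)$coefficients           # sizeable beta, large standard error, high p
plot(x, y); abline(fit)
```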

If the regression measures a high β but low confidence (high p), that is still worth taking a look at. If regression picks up a big male-versus-female wage gap—let’s say double—but with wide dispersion, so we’re not so confident (high p) that it’s exactly double because it’s sometimes 95%, sometimes 180%, sometimes 310%, we’ve still picked up a significant effect.

The exact value of β would not be statistically significant or confidently precise, owing to the high p, but actually this would be a very significant finding. (Try the same with any of my other examples, or another quantitative-comparison scenario you think up. It’s either a serious opportunity, or a serious problem, that you’ve uncovered. It just needs further looking to see where the variation around “double” comes from.)

You can read elsewhere about how awful it is that p<.05 is the password for publishable science, for many reasons that require some statistical vocabulary. But I think the most intuitive problem is the one I just stated. If your Geiger counter flips out to ten times the deadly level of radiation, it doesn’t matter if it sometimes reads 8, sometimes 0, and sometimes 15—the point is, you need to be worried and get the h*** out of there. (Unless the machine is whacked—but you’d still be spooked, wouldn’t you?)

 
FOLLOW-UP (CAN BE SKIPPED)

The scale of β is the all-important thing that we are after. Small differences in βs of variables that are important to your life can make a huge difference.

  • Think about getting a 3% raise (1.03) versus a 1% wage cut (.99).
  • Think about twelve in every 1,000 births killing the mother versus four in every 1,000.
  • Think about being 5 minutes late for the meeting versus 5 minutes early.

[figure: linear maps as multiplication. Linear mappings: notice they’re ALL straight lines through the origin!]


Order-of-magnitude differences (like 20 versus 2) are the difference between fly and dog; between life in the USA and near-famine; between oil tanker and gas pump; between Tibet’s altitude and Illinois’; between driving and walking. Even the Black Death was only a tenth of an order of magnitude of reduction in human population.




Keeping in mind that calculus tells us that nonlinear functions can be approximated in a local region by linear functions (unless the nonlinear function jumps), β is an acceptable measure of how the interesting thing responds around the current level of web speed, or around the current level of taxation.



Linear response magnitudes can also be used to estimate global responses in a nonlinear function, but you will be quantifying something other than the local linear approximation.
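A small sketch of that distinction, on a made-up curve (nothing from the original post): a regression fit only near the current level recovers the local slope, while a fit over the whole range quantifies a different, global summary of the same curve.

```r
# Nonlinear truth: y = sqrt(x), with a little noise.
set.seed(7)
x <- runif(5000, 0, 100)
y <- sqrt(x) + rnorm(5000, sd = 0.1)

# Local fit around x = 4: slope should be near the derivative 1/(2*sqrt(4)) = 0.25.
local_fit  <- lm(y ~ x, subset = abs(x - 4) < 2)
# Global fit over the whole range: a much flatter slope (roughly 0.08 here),
# a global summary rather than the slope of the curve near x = 4.
global_fit <- lm(y ~ x)

coef(local_fit)["x"]
coef(global_fit)["x"]
```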

Anscombe’s quartet. The four data sets are different, yet they have the same “line of best fit” as computed by ordinary least squares regression.
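This is easy to check in R, since Anscombe’s quartet ships with base R as the data frame anscombe (a quick verification, not part of the original post):

```r
# Four x–y pairs whose scatterplots look nothing alike...
data(anscombe)
fits <- lapply(1:4, function(i)
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe))

# ...yet nearly identical OLS lines: intercept ~ 3.0 and slope ~ 0.5 in all four.
sapply(fits, coef)
```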




Tibshirani’s original paper on the lasso.

  • Breiman’s Garotte — 1993
  • Tibshirani lasso paper submitted — 1994
  • Tibshirani lasso paper revised — 1995
  • Tibshirani lasso paper accepted — 1996

This is one of those papers that I’m so excited about, I feel like “You should just read the whole thing! It’s all good!” But I realise that’s less than reasonable.

Here is a bit of summary, feel free to request other information and I’ll do my best to adapt it.

The basic question is: I have some data and I want the computer to generate (regress) a linear model of it for me. What procedure should I tell the computer to do to get a good | better | best model?

The first technique, published by Legendre and applied fruitfully by Gauss in the early 1800s (so, ok, no computers then — but nowadays we just run lm in R), was to minimise the sum of squared error (Euclidean distance)
\sum_{i=1}^{n} \big( y_i - \alpha - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \big)^2
of a given affine model of the data. (Affine being linear + one more parameter for a variable origin, to account for the average value of the data ex observable parameters. For example, to model incomes in the USA when the only observed parameters are age, race, and zip code, you would want to include the average baseline US income level, and that would be accomplished mathematically by shifting the origin, a.k.a. the alpha, or autonomous, or “vector of ones” regression-model parameter, a.k.a. the affine addition to an otherwise linear model.)
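In R terms (a small aside, my own illustration): the “vector of ones” is just the intercept column that lm builds into the design matrix by default.

```r
# The model matrix for ~ x carries an "(Intercept)" column of ones;
# that column is the affine / alpha term described above.
x <- c(1, 2, 3, 4)
model.matrix(~ x)
#   (Intercept) x
# 1           1 1
# 2           1 2
# 3           1 3
# 4           1 4
# Fitting with `y ~ x - 1` drops it and forces the line through the origin.
```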

It was noticed by several someones at various points in time that whilst the least-squares (OLS) method is provably (by the Gauss–Markov theorem) the optimal linear model given well-behaved data, real data does not always behave.

http://bookcoverarchive.com/images/books/wellbehaved_women_seldom_make_history.large.jpg

In the presence of correlation, missing data, wrong data, and other problems, the “optimal” OLS solution is overfit, meaning that the model it makes for you picks up on too many of the problems. Is there a way to pick up on more signal and less noise? More gold and less dross? More of the real stuff and fewer of the impurities?

I can think of 2 ways people have tried to scrape off the corrosion without also flaying too much of the underlying good material:

  1. Assume simpler models are better. This is the approach taken by ridge regression (a.k.a. Tikhonov regularisation a.k.a. penalisation), the lasso, and the garotte
    ridge regression with tuning parameter highlighted
  2. Compare ensembles of models, then choose one in the “middle”. Robust methods, for example, use statistical functions that in theory vary less from flawed situation to flawed situation than other statistical functions do. Subset selection and hierarchical methods generate a lot of models on the real data and choose among them.
 

That’s the backstory. Now on to what Tibshirani actually says. His original lasso paper contrasts 3 ways of penalising complicated models, plus regression on subsets.

The three formulae:

  • penalisation & restriction to subsets
    Ridge Regression: edited it again to make the lambda term specifically look like the tuning parameter for the penalty
    Tibshirani 3
    (β̂_j^o denotes the OLS coefficient magnitudes)
  • garotte
    Leo Breiman garotte
  • lasso
    Tibshirani lasso def

look superficially quite similar. Tibshirani discusses the pro’s, con’s, when’s, and wherefore’s of the different approaches.
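For reference, the standard forms of the three estimators, written from the usual textbook definitions rather than copied from the paper (so take the notation as an approximation of Tibshirani’s own):

```latex
% Ridge regression: squared-error loss plus a quadratic penalty with tuning parameter \lambda
\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2
  + \lambda \sum_j \beta_j^2

% Non-negative garotte: shrink the OLS estimates \hat\beta_j^{\,o} by factors c_j \ge 0
\hat\beta_j^{\mathrm{gar}} = c_j \, \hat\beta_j^{\,o}, \qquad
\{c_j\} = \arg\min_{c_j \ge 0}
  \sum_{i=1}^{n} \Big( y_i - \alpha - \sum_j c_j \hat\beta_j^{\,o} x_{ij} \Big)^2
  \quad \text{subject to} \quad \sum_j c_j \le t

% Lasso: the same squared-error loss, with an L1 (absolute-value) constraint on the coefficients
\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2
  \quad \text{subject to} \quad \sum_j |\beta_j| \le t
```

Ridge is written here in its penalised (Lagrangian) form and the garotte and lasso in their constrained forms; each can be stated either way.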

Tibshirani lasso 1

 

(In reading this paper I learned a new symbol, the positive-part operator ƒ(x) = (x)⁺. It means (x)⁺ = x when x > 0 and 0 otherwise; its graph is flat at zero for negative x and then follows the line y = x. In R code, ifelse(x < 0, 0, x). Like absolute value, but not exactly.)

 

Back to the lasso. How does such a small change to the penalty function change the estimated linear model we get as output?

http://i.imgur.com/VJiAh.png
http://i.imgur.com/zeBrm.png

http://i.imgur.com/tqOim.png
http://i.imgur.com/zeBrm.png
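One modern way to see the answer in practice is with the glmnet package (which postdates the paper; this is an illustration, not the paper’s own computation). The L1 penalty zeroes some coefficients out entirely, whereas the ridge penalty only shrinks them:

```r
library(glmnet)

# Fake data: 10 predictors, only 3 of which actually matter.
set.seed(3)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 3 * X[, 1] - 2 * X[, 2] + 1 * X[, 3] + rnorm(n)

ols   <- coef(lm(y ~ X))                           # keeps all ten coefficients
lasso <- coef(glmnet(X, y, alpha = 1), s = 0.5)    # alpha = 1 is the lasso penalty
ridge <- coef(glmnet(X, y, alpha = 0), s = 0.5)    # alpha = 0 is the ridge penalty
# (s = 0.5 is an arbitrary penalty strength, chosen just for the demonstration.)

# The lasso drives most of the irrelevant coefficients exactly to zero;
# ridge shrinks them toward zero but leaves them nonzero.
cbind(ols, as.matrix(lasso), as.matrix(ridge))
```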