Posts tagged with derivative

Big, long cycle = trend.


gradient descent on a 2-dimensional convex, quadratic cost function with condition number=100

  • adding momentum to the gradient speeds up convergence in these high-condition-number cases, while still using gradient descent (which scales better than Newton-Raphson in high dimensions)
  • like adding momentum in an oscillating mechanical system that vibrates too much
  • heavy ball method (Polyak)
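
As a rough sketch of the comparison in the figure, here is plain gradient descent next to Polyak's heavy ball on a 2-dimensional quadratic with condition number 100. The particular cost function ƒ(x, y) = ½(x² + 100y²), the starting point, the iteration count and the textbook step-size and momentum choices are my assumptions, not taken from the figure.

```python
import math

MU, L = 1.0, 100.0  # smallest and largest curvature; condition number L / MU = 100


def grad(x, y):
    """Gradient of the assumed cost f(x, y) = 0.5 * (MU * x**2 + L * y**2)."""
    return MU * x, L * y


def gradient_descent(start, step, iters):
    x, y = start
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def heavy_ball(start, step, beta, iters):
    x, y = start
    px, py = start  # previous iterate; the "momentum" is (current - previous)
    for _ in range(iters):
        gx, gy = grad(x, y)
        nx = x - step * gx + beta * (x - px)
        ny = y - step * gy + beta * (y - py)
        px, py = x, y
        x, y = nx, ny
    return x, y


start = (1.0, 1.0)
iters = 100

# Classic parameter choices for a quadratic with curvatures between MU and L.
gd_step = 2.0 / (L + MU)
hb_step = 4.0 / (math.sqrt(L) + math.sqrt(MU)) ** 2
hb_beta = ((math.sqrt(L) - math.sqrt(MU)) / (math.sqrt(L) + math.sqrt(MU))) ** 2

gd_err = math.hypot(*gradient_descent(start, gd_step, iters))
hb_err = math.hypot(*heavy_ball(start, hb_step, hb_beta, iters))

print("distance from the minimum after", iters, "steps")
print("  gradient descent:", gd_err)
print("  heavy ball:      ", hb_err)
```

Both methods only ever ask for the gradient; the heavy ball just remembers where it was one step ago.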





A fun exercise/problem/puzzle introducing function space.


Saying the derivative is “slope” is a nice teacher’s lie, like the Bohr atom


which misses out on a deeper and more interesting later viewpoint:

|6,4,1> Orbital Animation

|3,2,1>+|3,1,-1> Orbital Animation


The “slope” viewpoint, and what underlies it (the “charts” or “plots” view of functions as ƒ(x)–vs–x), is like training wheels: eventually it needs to come off. The “slope” metaphor fails

  • for pushforwards,
  • on surfaces,
  • on curves γ that double back on themselves
  • in my vignettes about integrals,
  • and, in my opinion, it’s harder to “see” derivatives or calculus in a statistical or business application, if you think of “derivative = slope”. Since you’re presented with reams of numbers rather than pictures of ƒ(x)–vs–x, where is the “slope” there?

“Really” it’s all about diffs. Derivatives are differences (just zoomed in; this is what lim ∆x↓0 was for) and that viewpoint works, I think, everywhere.

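A minimal sketch of “differences, just zoomed in”: take a plain difference of outputs over a difference of inputs, then shrink the input difference. The function (squaring), the point x = 3 and the list of zoom levels are my choices for illustration.

```python
def diff_quotient(f, x, dx):
    """Difference in outputs divided by difference in inputs."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    return x * x


zooms = [1.0, 0.1, 0.01, 0.001]  # smaller and smaller values of the input difference
quotients = [diff_quotient(square, 3.0, dx) for dx in zooms]

for dx, q in zip(zooms, quotients):
    print(dx, q)  # the quotient works out to 6 + dx, so it settles toward 6
```

No picture and no slope required: just numbers in, numbers out, and a difference.
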
I half-heartedly tried making the following illustrations in R with the barcode package but they came out ugly. Even uglier than my handwriting—so now enjoy the treat of my ugly handwriting.


Step back to Descartes’ definition of a function. It’s an association between two sets.


And the language we use sounds “backwards” compared with English. If I say “associate a temperature number to every point over the USA”

US temperatures

then that should be written as a function ƒ: surface → temp.,

(or we could say ƒ: ℝ²→ℝ with ℝ²=(lat,long) )

The \to arrow and the “maps to” phrasing run backwards from the way we speak.

  • “Assign a temperature to the surface” versus “Map each surface point to a temperature element from the set of possible temperatures”.

a function is an association between sets

{elf, book, Kraken, 4^π^e} … no, I’m not sure where that came from either. But I think we can agree that such a set is unstructured.

Cartesian function from non-space to weird space

Great. Above I drew a set “without other structure” as the source (domain) and a branched, partially ordered weirdy thing as the target (codomain). With some work it’s possible to come up with a calculus there, like the infinitesimal one on ℝ→ℝ functions that’s taught to many 19-year-olds. But for right now my point is to make that look ridiculous and impossible. Newton’s calculus is something we do only with a specific kind of Cartesian mapping: one where both the “from” and the “to” have Euclidean concepts of straight-line-ness, and where distance has the usual meaning from maths class. In other words, the Newtonian derivative applies only to smooth mappings from ℝ to ℝ.


Let’s stop there and think about examples of mappings.

(Not from the real world—I’ll do another post on examples of functions from the real world. For now just accept that numbers describe the world and let’s consider abstractly some mappings that associate, not arbitrarily but in a describable pattern, some numbers to other numbers.)

successor function and square function

sin function

(I didn’t have a calculator at the time but the circle values for [1,2,3,4,5,6,7] are [57°,115°,172°,229°,286°,344°,401°=41°].)
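
A quick check of those circle values, assuming the inputs 1 to 7 are read as radians and the outputs are rounded to the nearest degree:

```python
import math

radians_in = [1, 2, 3, 4, 5, 6, 7]

# Convert each input to degrees, rounded to the nearest whole degree.
degrees_out = [round(math.degrees(r)) for r in radians_in]

# Wrap around the circle: anything past 360 starts again from 0.
wrapped = [d % 360 for d in degrees_out]

print(degrees_out)  # [57, 115, 172, 229, 286, 344, 401]
print(wrapped[-1])  # 41
```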

I want to contrast the “map upwards” pictures with both the Cartesian pictures for structure-less sets


and to the normal graphical picture of a “chart” or “plot”.



Notice what’s obscured and what’s emphasised in each of the picture types. The plots certainly look better, but we lose the Cartesian sense that the “vertical” axis is no more vertical than the horizontal one. The two ℝ’s in ƒ: ℝ→ℝ are on exactly the same footing.

And if I want to compose mappings? As in the parabola picture above (first the square function, then an affine recentering). I can only show the end result of g∘ƒ rather than the intermediate result.


With the “map upwards” pictures, by contrast, I could line up a long vertical of successive transformations (like one might do in Excel, except that would be column-wise to the right) and see the results of each “input-output program”.
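
Here is one way the “line up successive transformations” idea might look in code, keeping every intermediate column instead of only the end result of g∘ƒ. The helper name `run_pipeline`, the input numbers and the shift by 4 are hypothetical, chosen only for illustration.

```python
def run_pipeline(xs, steps):
    """Return a list of columns: the input, then the output of each step in turn."""
    columns = [list(xs)]
    for step in steps:
        columns.append([step(v) for v in columns[-1]])
    return columns


def square(x):
    return x * x


def recentre(y):
    return y - 4  # a made-up affine recentering


columns = run_pipeline([-3, -2, -1, 0, 1, 2, 3], [square, recentre])

for name, col in zip(["input", "square", "recentre"], columns):
    print(name, col)
# input    [-3, -2, -1, 0, 1, 2, 3]
# square   [9, 4, 1, 0, 1, 4, 9]
# recentre [5, 0, -3, -4, -3, 0, 5]
```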

(Also, I have a languishing draft post called “How I Got to Gobbledegook” which shows how much simpler a sequence of transforms can be than “a forbidding formula from a textbook”.)

Another weakness of the “charts” approach is that whereas the “stay the same” command ought to be the simplest one (it’s a null command), it gets mapped to the 45° line:


Here’s the familiar parabola / plot “my way”: with the numbers written out so as to equalise the target space and the source space.

Parabola with the domain and codomain on the same footing.


Now that the “new” tool is in hand, let’s go back to the calculus. I’m going to say “derivative = pulse”, and that’s the main point of this essay.

linear approximations (differentials) of a parabola (x²)

Considering both the source ℝ→ and the target →ℝ on the same footing, I’ll call the length of the arrows the “mapping strength”. In a convex mapping like square the diffs are going to increase as you go to the right.
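
To see the diffs grow, here is a small sketch with a hand-rolled `diff` helper (lagged differences, in the spirit of R's `diff`). The input range 0 to 6 is just my choice.

```python
def diff(values):
    """Differences between neighbouring entries."""
    return [b - a for a, b in zip(values, values[1:])]


xs = list(range(7))            # 0, 1, ..., 6
squares = [x * x for x in xs]  # where each number lands under the square mapping

strengths = diff(squares)      # how far apart neighbouring outputs land
print(squares)                 # [0, 1, 4, 9, 16, 25, 36]
print(strengths)               # [1, 3, 5, 7, 9, 11]
print(diff(strengths))         # [2, 2, 2, 2, 2]
```

The first diffs keep increasing as you go to the right, and the diffs of the diffs are constant.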


OK, now in the middle of the piece, here is the main point I want to make about derivatives and calculus: looking at numbers written on the paper, rather than at plots, makes understanding a pushforward possible. And, in my opinion, this perspective is the more practical one, since in business gigantic databases of numbers are commoner than charts that make themselves, and in life we just experience stimuli rather than having someone make a chart to explain them to us.

differences on a scalar field (California)

I’m deliberately eliding the concepts of diff as

  • difference
  • R's diff function
  • differential (as in differential calculus or as in linear approximation)

because they’re all related.

differentials on a surface (Where is the Slope?)

a U-neighbourhood of Los Angeles

In my example of an open set around Los Angeles, a surface diff could be you measure the temperature on your rooftop in Los Feliz, and then measure the temperature down the block. Or across the city. Or, if you want to be infinitesimal and truly calculus-ish about it, the difference between the temperature of one fraction of an atom in your room and its nearby neighbour. (How could that be coherent? There are ways, but let’s just stick with the cross-city differential and pretend you could zoom in for more detail if you liked.)


I’m still not quite done with the “my style of pictures” because there’s another insight you can get from writing these mappings as a bar code rather than as a “chart”. Indeed, this is exactly what a rug plot does when it shows histograms.

a rug plot or carpet plot is like a barcode on the bottom of your plot to show the marginal (one-dimension only) distribution of data

Here are some strip plots = rug plots = carpet plots = barcode plots of nonlinear functions for comparison.



The main conclusion of calculus is that nonlinear functions can be approximated by linear functions. The approximation only works “locally” at small scales, but still if you’re engineering the screws holding a plane together, it’s nice to know that you can just use a multiple (linear function) rather than some complicated nonlineary thingie to estimate how much the screws are going to shake and come loose.

For me, at least, way too many years of solving y=mx+b obscured the fact that linear functions are just multiples. You take the space and stretch or shrink it by a constant multiple. Like converting a currency: take pesos, divide by 8, get dollars. The multiple doesn’t change if you have 10,000 pesos or 10,000,000 pesos, it’s still the same conversion rate.
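
A tiny sketch of “linear = just a multiple”, using the pesos example. The 8-to-1 rate comes from the text; the `shifted` function with its m and b values is a made-up contrast case showing that y=mx+b with b≠0 does not behave like a pure multiple.

```python
import math


def to_dollars(pesos, rate=8.0):
    """A pure multiple: the same conversion rate whatever the amount."""
    return pesos / rate


def shifted(x, m=2.0, b=3.0):
    """The schoolbook y = m*x + b, with a nonzero b (hypothetical values)."""
    return m * x + b


def respects_addition(f, a, b):
    """Does converting a sum give the same answer as summing the conversions?"""
    return math.isclose(f(a + b), f(a) + f(b))


print(to_dollars(10_000))                                  # 1250.0
print(respects_addition(to_dollars, 10_000, 10_000_000))   # True
print(respects_addition(shifted, 1.0, 2.0))                # False
```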



linear maps as multiplication

linear mappings -- notice they're ALL straight lines through the origin!

the flip function

So in a neighborhood or locality a linear approximation is enough. That means that a collection of linear functions can approximate a nonlinear one to arbitrary precision.

building up a nonlinear function from linear parts

That means we can use computers!
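
As a hedged illustration of “a collection of linear pieces approximates a nonlinear function”, the sketch below joins up straight segments under the square function on [0, 1] and measures the worst gap. The interval, the piece counts and the helper names are my assumptions; each piece is a straight line whose slope is the local multiple.

```python
def piecewise_linear(f, a, b, pieces):
    """Build an approximation of f on [a, b] from `pieces` straight segments."""
    h = (b - a) / pieces
    knots = [a + i * h for i in range(pieces + 1)]
    values = [f(k) for k in knots]

    def approx(x):
        i = min(int((x - a) / h), pieces - 1)  # which segment x falls in
        slope = (values[i + 1] - values[i]) / h
        return values[i] + slope * (x - knots[i])

    return approx


def max_error(f, approx, a, b, samples=1000):
    """Largest gap between f and its approximation on an evenly spaced grid."""
    points = [a + (b - a) * k / samples for k in range(samples + 1)]
    return max(abs(f(p) - approx(p)) for p in points)


def square(x):
    return x * x


piece_counts = [2, 4, 8, 16]
errors = [max_error(square, piecewise_linear(square, 0.0, 1.0, n), 0.0, 1.0)
          for n in piece_counts]

for n, e in zip(piece_counts, errors):
    print(n, "pieces -> worst error", e)  # shrinks each time the pieces are halved
```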

Calculus says: smooth functions can be approximated, around a local neighborhood of a point, with straight lines



I can’t use the example of self times self so many times without exploring the concept a bit. Squares to me seem so limited and boring. No squizzles, no funky shapes, just boring chalkboard and rulers.

But that’s probably too judgmental.


recursive "Square" function

After all, there’s something self-referential and almost recursive about repeated applications of the square function. And it serves as the basis for Euclidean distance (and the standard deviation formula) via the Pythagorean theorem.

How those two are connected is a mystery I still haven’t wrapped my head around. But a cool connection I have come to understand is that between:

  • a variety of inverse square laws in Nature
  • a curve that is equidistant from a point and a line
  • and the area of a rectangle which has both sides equal.

inverse square laws

what does self times self have to do with the geometric figure of a parabola?


I guess first of all one has to appreciate that “parabola” shouldn’t necessarily have anything to do with x•x. Hopefully that’s become more obvious if you read the sections above where I point out that the target ℝ isn’t any more “vertical” than is the source ℝ.


The inverse-square laws show up everywhere because our universe is 3-dimensional. The surface of a 3-dimensional ball (like an expanding wave of gravitons, or of photons, or of sound) is 2-dimensional, which means that whatever “force” or “energy” is “painted on” the surface gets spread over an area that grows as the square of the radius. So as the radius increases at a constant rate, the intensity at any one spot drops off as 1/r². Oh. Thanks, Universe, for being 3-dimensional.
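
The “painted on a sphere” argument fits in a few lines. This is a sketch assuming a source of fixed total power spread evenly over a sphere of area 4πr²; the function name and the unit power are mine.

```python
import math


def intensity(power, radius):
    """Total power spread evenly over the surface of a sphere of the given radius."""
    return power / (4 * math.pi * radius ** 2)


base = intensity(1.0, 1.0)
for r in [1.0, 2.0, 3.0]:
    # Doubling the radius quarters the intensity; tripling cuts it to a ninth.
    print(r, intensity(1.0, r) / base)
```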

inverse square laws  why, why, why, WHY?!?!

What’s most amazing about the parabola—gravity connection is that it’s a metaphor that spans across both space and time. The curvature that looks like a-plane-figure-equidistant-to-a-line-and-a-point is curving in time.

Just playing with z² / (z² + 2z + 2)


on WolframAlpha. That’s Wikipedia’s example of a function with two poles (= two singularities = two infinities). Notice how “boring” line-only pictures are compared to the 3-D ℂ→ℝ picture of the mapping (the one with the poles=holes). That’s why mathematicians say ℂ uncovers more of “what’s really going on”.

As opposed to normal differentiability, ℂ-differentiability of a function implies:

  • infinite descent into derivatives is possible: differentiable once means differentiable infinitely often, and even analytic (no strict chain C¹ ⊃ C² ⊃ C³ ⊃ … ⊃ Cω like usual)

  • nice Green’s-theorem type shortcuts make many, many ways of doing something equivalent. (So you can take a complicated real-world situation and validly do easy computations to understand it, because a squibbledy path computes the same as a straight path.)

Pretty interesting to just change things around and see how the parts work.

  • The roots of the denominator are −1+i and −1−i (of course, for a polynomial with real coefficients the conjugate of a root is always a root, since i and −i are indistinguishable)
  • you can see how the denominator twists
  • a fraction in ℂ space maps lines to circles, because lines and circles are turned inside out (they are just flips of each other: see also projective geometry)
  • if you change the z^2/ to a z/ or a 1/ you can see that.
  • then the Wikipedia picture shows the poles (infinities) 
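
For anyone without WolframAlpha to hand, here is a quick check of where the denominator z² + 2z + 2 vanishes and how the function blows up nearby. It is a sketch using Python's standard `cmath`; the helper names and the list of offsets are my own.

```python
import cmath


def f(z):
    return z ** 2 / (z ** 2 + 2 * z + 2)


def quadratic_roots(a, b, c):
    """Both roots of a*z**2 + b*z + c, allowing complex answers."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


roots = quadratic_roots(1, 2, 2)
print(roots)  # a conjugate pair: -1+1j and -1-1j

# Walk toward the first pole and watch |f| grow.
offsets = [1e-1, 1e-2, 1e-3]
near_pole = [abs(f(roots[0] + eps)) for eps in offsets]
for eps, size in zip(offsets, near_pole):
    print(eps, size)
```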

Complex ℂ→ℂ maps can be split into four parts: the input “real” ⊎ “imaginary”, and the output “real” ⊎ “imaginary”. Of course splitting them up like that hides the holistic truth of what’s going on, which comes from the perspective of a “twisted” plane where the elements z are mod z • exp(i • arg z).

a conformal map (angle-preserving map)

ℂ→ℂ mappings mess with my head…and I like it.


In gradeschool calculus I learnt that derivative = slope. That was a nice teacher’s lie (like the Bohr atom is a nice teacher’s lie) to get the essential point across. But “derivative = slope” isn’t ultimately helpful because in real life, functions aren’t drawn on a chalkboard. ℝ→ℝ drawings don’t always look like what they feel like (e.g. this parabola).

ℝ→ℝ drawings’ “slope” feels more like a pulse, a β (observed magnitude), a force, a pay rise, a spike in the price of petrol, a nasty vega wave that chokes out a hedge fund, cruising down the highway (speedometer not odometer), a basic not a derived parameter, a linear operator in the space of all functionals, a blip, a pushforward, an impression, a straight-line projection from data, a deep dive into a function’s infinite profundity, a “bite” in the words of Jan Koenderink.

A derivative “is really” a pulse. And an integral “is really” an accumulation.


This story, “Bird’s Eye View” by Radiolab (minute 12:00), nicely illustrates a differential-geometry-consistent view of derivative & integral in the pleasantly-unexpected space of rare languages.

English : Derivative :: Pormpuraaw : Integral

In the Pormpuraaw language of Cape York, Australia, people say things like “You have an ant on your south-west leg” and “Move your cup to the north-north-west a bit”. “How ya goin’?” one asks the other. "Headed east-north-east in the middle distance."

  • Little kids always know, even indoors, which cardinal direction they’re facing.
  • This is very useful when you live in the outback without a GPS.
  • American linguistics professor who was exploring there: “After about a week I developed a bird’s-eye view of myself on a map, like a video game, in the upper right corner of my mind’s eye.”



The mental map is like a running integral ∫ xᵗθᵗ dt of moves they make. (Or we could think of it decomposed into two integrals, one that tracks changes in orientation ∫ θᵗ and one that tracks accumulating changes in place ∫ x.) In other words, a bird’s-eye view.
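
A loose sketch of that running integral as dead reckoning: each move is given relative to the body (turn this much, walk this far), and the accumulated sums give the compass heading and the position on the map. The move format, the compass convention (0° = north, 90° = east, positive turn = clockwise) and the sample walk are all my assumptions.

```python
import math


def dead_reckon(moves, start=(0.0, 0.0), heading=0.0):
    """Accumulate relative (turn_degrees, distance) moves into absolute headings and positions."""
    x, y = start
    track = [(x, y, heading)]
    for turn, dist in moves:
        heading = (heading + turn) % 360  # running sum of orientation changes
        rad = math.radians(heading)
        x += dist * math.sin(rad)         # accumulated eastward displacement
        y += dist * math.cos(rad)         # accumulated northward displacement
        track.append((x, y, heading))
    return track


# Walk forward, then "turn right and walk forward" three times: a square.
walk = [(0, 1.0), (90, 1.0), (90, 1.0), (90, 1.0)]
track = dead_reckon(walk)

for x, y, h in track:
    print(round(x, 6), round(y, 6), h)
# The last entry is back at the starting point, facing west (heading 270).
```

The inputs are all “left, right, forward”; the outputs are all “north, south, east, west”.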


left right forward back : derivative :: NSEW : integral

Our English way of thinking is like a differential-geometry-consistent derivative. The time derivative “takes a bite” out of space and so is always relative to the particular moment in time. “Left” and “right” are concepts like this — relative, immediate, and having no length of their own. Just like the differential forms in Élie Cartan’s exterior algebra — tangent to our bodies.


There is a way to make this more precise and I think it would make sense to do it on  || with a twistor || spinor. (Help, anyone? David?)


Our English conception of time & space is like a (time-)derivative of our movements. The Pormpuraawans’ conception of time & space is like an integral of their movements, orientation, and location. When we think of direction it’s an immediate slice of time. When they think of direction they’ve been tracking those relative-direction derivatives and they answer with the sum.
