Posts tagged with **derivative**

gradient descent on a 2-dimensional convex, quadratic cost function with condition number=100

- adding momentum to the gradient step speeds up convergence in these high-condition-number cases, while still using gradient descent (which scales better than Newton-Raphson in high dimensions)
- like adding momentum in an oscillating mechanical system that vibrates too much
- heavy ball method (Polyak)

(Source: simons.berkeley.edu)
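To make the caption concrete, here is a minimal Python sketch comparing plain gradient descent with Polyak's heavy ball on a quadratic with condition number 100. The cost `0.5 * (x² + 100·y²)`, the starting point, the tolerance, and the textbook step-size and momentum tunings for quadratics are assumptions for illustration, not taken from the source slides.

```python
import math

MU, L = 1.0, 100.0  # smallest and largest curvature; condition number L / MU = 100


def grad(x, y):
    """Gradient of the assumed cost f(x, y) = 0.5 * (MU * x**2 + L * y**2)."""
    return MU * x, L * y


def iterations_to_converge(step, momentum, start=(1.0, 1.0), tol=1e-6, max_iter=10_000):
    """Count steps until the iterate is within `tol` of the minimum at the origin."""
    x, y = start
    vx, vy = 0.0, 0.0  # the "velocity" carried over from the previous step
    for k in range(1, max_iter + 1):
        gx, gy = grad(x, y)
        vx = momentum * vx - step * gx
        vy = momentum * vy - step * gy
        x, y = x + vx, y + vy
        if math.hypot(x, y) < tol:
            return k
    return max_iter


# Plain gradient descent: no momentum, classic step 2 / (L + MU).
plain_iters = iterations_to_converge(step=2.0 / (L + MU), momentum=0.0)

# Heavy ball: a standard tuning for a quadratic whose MU and L are known.
root_mu, root_l = math.sqrt(MU), math.sqrt(L)
heavy_iters = iterations_to_converge(
    step=4.0 / (root_l + root_mu) ** 2,
    momentum=((root_l - root_mu) / (root_l + root_mu)) ** 2,
)

print("plain gradient descent:", plain_iters, "iterations")
print("heavy ball:", heavy_iters, "iterations")
```

With these settings the heavy-ball run should need far fewer iterations than the plain one. In practice the curvatures are rarely known exactly, so the momentum value is usually tuned by hand.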


Saying derivative is “slope” is a nice pedant’s lie, like the Bohr atom

which misses out on a deeper and more interesting later viewpoint:

The “slope” viewpoint, and what underlies it (the “charts” or “plots” view of functions as `ƒ(x)–vs–x`), is like training wheels: eventually it needs to come off. The “slope” metaphor fails

- for pushforwards,
- on surfaces,
- on curves γ that double back on themselves,
- in my vignettes about integrals,
- and, in my opinion, it’s harder to “see” derivatives or calculus in a statistical or business application if you think of “derivative = slope”. When you’re presented with reams of numbers rather than pictures of `ƒ(x)–vs–x`, where is the “slope” there?

“Really” it’s all about **diffs**. Derivatives are differences (just zoomed in; this is what `lim ∆x↓0` was for) and *that* viewpoint works, I think, everywhere.
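As a small numerical illustration of “differences, zoomed in”, the sketch below takes the diff of the `square` function at one point and rescales it by ever-smaller steps. The point `x = 3` and the list of step sizes are arbitrary choices for the example.

```python
def square(x):
    return x * x


def difference_quotient(f, x, dx):
    """The diff f(x + dx) - f(x), rescaled by the size of the step dx."""
    return (f(x + dx) - f(x)) / dx


x = 3.0
steps = [1.0, 0.1, 0.01, 0.001, 0.0001]
quotients = [difference_quotient(square, x, dx) for dx in steps]

for dx, q in zip(steps, quotients):
    print(f"dx = {dx:<7} diff / dx = {q:.4f}")
```

The rescaled diffs settle towards 6, which is what the textbook derivative `2x` gives at `x = 3`. No picture of a slope is needed, only subtraction and division on numbers.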

I half-heartedly tried making the following illustrations in R with the barcode package but they came out ugly. Even uglier than my handwriting—so now enjoy the treat of my ugly handwriting.

Step back to **Descartes’** definition of a function. It’s an **association between two sets.**

And the language we use sounds “backwards” to that of English. If I say “associate a temperature number *to* every point over the USA” then that should be written as a function `ƒ: surface → temp` (or we could say `ƒ: ℝ²→ℝ` with `ℝ²=(lat,long)`).

The `\to` arrow and the *“maps to”* phrasing are backwards of the way we speak.

- “Assign a temperature to the surface” versus “Map each surface point to a temperature element from the set of possible temperatures”.

`{elf, book, Kraken, 4^π^e}`

… no, I’m not sure where that came from either. But I think we can agree that such a set is unstructured.

Great. I drew above a set “without other structure” as the source (domain) and a branched, partially ordered weirdy thing as the target (codomain). Now it’s possible, with some work, to come up with a calculus on such things like the infinitesimal one on ℝ→ℝ functions that’s taught to many 19-year-olds. But for right now my point is to make that look ridiculous and impossible. Newton’s calculus is something we do only with a specific kind of Cartesian mapping: where both the `from` and the `to` have Euclidean concepts of straight-line-ness, and distance has the usual meaning from maths class. In other words the Newtonian derivative applies only to smooth mappings between such spaces, ℝ to ℝ being the basic case.

Let’s stop there and think about **examples of mappings.**

(Not from the real world—I’ll do another post on examples of functions from the real world. For now just accept that numbers describe the world and let’s consider abstractly some mappings that associate, not arbitrarily but in a describable pattern, some numbers to other numbers.)

(I didn’t have a calculator at the time but the circle values for `[1,2,3,4,5,6,7]` are `[57°,114°,172°,229°,286°,344°,401°=41°]`.)
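The circle values can be recomputed with a few lines of Python. This sketch assumes the circle mapping sends each number, read as radians, to degrees and then wraps anything past a full turn back onto the circle.

```python
import math

angles_rad = [1, 2, 3, 4, 5, 6, 7]

# Convert each angle to degrees, then wrap past a full turn back into [0, 360).
angles_deg = [math.degrees(r) for r in angles_rad]
wrapped = [d % 360 for d in angles_deg]

for r, d, w in zip(angles_rad, angles_deg, wrapped):
    print(f"{r} rad -> {d:6.1f} deg -> {w:6.1f} deg on the circle")
```

To the nearest degree this agrees with the hand-computed list, except that 2 radians is about 114.6°, which rounds to 115° rather than 114°.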

I want to contrast the “map upwards” pictures to both the Cartesian pictures for structure-less sets and to the normal graphical picture of a “chart” or “plot”.

Notice what’s obscured and what’s emphasised in each of the picture types. The plots certainly *look* better, but we lose the Cartesian sense that the “vertical” axis is no more vertical than is the horizontal. The two ℝ’s in ƒ: ℝ→ℝ are exactly alike.

And if I want to compose mappings? As in the parabola picture above (first the `square` function, then an affine recentering), I can only show the end result of g∘ƒ rather than the intermediate result.

Whereas I could line up a long vertical of successive transformations (like one might do in Excel except that would be column-wise to the right) and see the results of each “input-output program”.
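A sketch of that “long vertical of successive transformations”, keeping every intermediate column of numbers rather than only the end result of g∘ƒ. The particular recentering (`2x − 3`) and the input values are hypothetical, chosen only for illustration.

```python
def square(x):
    return x * x


def recenter(x):
    """A hypothetical affine recentering: stretch by 2, then shift down by 3."""
    return 2 * x - 3


def run_pipeline(stages, inputs):
    """Apply each stage in turn, keeping every intermediate column of numbers."""
    columns = [list(inputs)]
    for stage in stages:
        columns.append([stage(v) for v in columns[-1]])
    return columns


columns = run_pipeline([square, recenter], [-2, -1, 0, 1, 2])
for label, column in zip(["input", "square", "recenter"], columns):
    print(f"{label:>9}: {column}")
```

Each printed row plays the part of one strip of numbers, so the effect of every “input-output program” stays visible on its own.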

(Also, I have a languishing draft post called “How I Got to Gobbledegook” which shows how much simpler a sequence of transforms can be than “a forbidding formula from a textbook”.)

Another weakness of the “charts” approach is that whereas the `"Stay the same"` command ought to be the simplest one (it’s a null command), it gets mapped to the 45° line:

Here’s the familiar parabola / `x²` plot “my way”: with the numbers written out so as to equalise the target space and the source space.

Now that the “new” tool is in hand, let’s go back to the calculus. I’m going to say **“derivative = pulse”**, and that’s the main point of this essay.

Considering both the source ℝ→ and the target →ℝ on the same footing, I’ll call the length of the arrows the “mapping strength”. In a convex mapping like `square` the diffs are going to increase as you go to the right.
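The growing diffs of `square` can be read straight off the numbers. The helper below is a plain-Python stand-in written in the spirit of R's `diff`; the input range 0 to 6 is an arbitrary choice.

```python
def diff(values):
    """Successive differences, in the spirit of R's diff()."""
    return [b - a for a, b in zip(values, values[1:])]


xs = list(range(7))            # 0, 1, ..., 6
squares = [x * x for x in xs]  # where `square` sends each x

first_diffs = diff(squares)        # how far apart neighbouring outputs land
second_diffs = diff(first_diffs)   # how fast those gaps themselves grow

print("squares:     ", squares)
print("diffs:       ", first_diffs)
print("diff of diff:", second_diffs)
```

The first diffs climb steadily (1, 3, 5, …) while the second diffs stay constant at 2, which is the discrete shadow of the derivative `2x` and the second derivative `2`.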

OK, now in the middle of the piece, here is the main point I want to make about derivatives and calculus: looking at *numbers* written on the paper rather than *plots* makes understanding a pushforward possible. And, in my opinion, since in business gigantic databases of numbers are commoner than ready-made charts, and in life we just experience stimuli rather than having someone make a chart to explain them to us, this perspective is the more practical one.

I’m deliberately eliding the concepts of diff as

- difference
- R’s `diff` function
- differential (as in differential calculus or as in linear approximation)

**Linear**

I’m still not quite done with the “my style of pictures” because there’s another insight you can get from writing these mappings as a bar code rather than as a “chart”. Indeed, this is exactly what a rug plot does when it shows histograms.

Here are some strip plots = rug plots = carpet plots = barcode plots of nonlinear functions for comparison.

The main conclusion of calculus is that nonlinear functions can be approximated by linear functions. The approximation only works “locally” at small scales, but still if you’re engineering the screws holding a plane together, it’s nice to know that you can just use a multiple (linear function) rather than some complicated nonlineary thingie to estimate how much the screws are going to shake and come loose.

For me, at least, way too many years of solving `y=mx+b` obscured the fact that *linear functions are just multiples*. You take the space and stretch or shrink it by a constant multiple. Like converting a currency: take pesos, divide by 8, get dollars. The multiple doesn’t change if you have 10,000 pesos or 10,000,000 pesos, it’s still the same conversion rate.
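A quick check of “linear functions are just multiples”, using the pesos example. The contrast with `y = mx + b` uses hypothetical values of `m` and `b`; strictly speaking the shift `b` makes that function affine rather than linear, which is one way the schoolbook formula hides the point.

```python
def to_dollars(pesos, rate=8):
    """A linear map: one fixed multiple (the 8-pesos-per-dollar example)."""
    return pesos / rate


def line(x, m=0.5, b=3.0):
    """The schoolbook y = m*x + b, with hypothetical m and b."""
    return m * x + b


small, large = 10_000, 10_000_000

# A pure multiple respects addition: converting the total equals totalling the conversions.
multiple_is_additive = to_dollars(small + large) == to_dollars(small) + to_dollars(large)

# The shifted line does not, because the offset b gets counted twice on the right-hand side.
line_is_additive = line(small + large) == line(small) + line(large)

print("pure multiple additive?", multiple_is_additive)
print("y = mx + b additive?   ", line_is_additive)
```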

So in a neighborhood or locality a linear approximation is enough. That means that a collection of linear functions, each used on its own small patch, can approximate a smooth nonlinear one to arbitrary precision.

That means we can use computers!
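Here is a rough sketch of what “using computers” looks like for this: chop an interval into pieces, use one multiple per piece, and measure the worst miss. The interval `[0, 1]`, the sampling density, and the choice of `square` as the nonlinear function are assumptions for the example.

```python
def square(x):
    return x * x


def max_chord_error(f, pieces, lo=0.0, hi=1.0, samples_per_piece=50):
    """Largest gap between f and its connect-the-dots approximation on [lo, hi]."""
    width = (hi - lo) / pieces
    worst = 0.0
    for i in range(pieces):
        left = lo + i * width
        right = left + width
        slope = (f(right) - f(left)) / width  # the single multiple used on this piece
        for j in range(samples_per_piece + 1):
            x = left + width * j / samples_per_piece
            approx = f(left) + slope * (x - left)
            worst = max(worst, abs(f(x) - approx))
    return worst


errors = {pieces: max_chord_error(square, pieces) for pieces in (1, 10, 100)}
for pieces, err in errors.items():
    print(f"{pieces:>3} linear pieces: worst error {err:.6f}")
```

For `square`, the worst error shrinks roughly a hundredfold each time the number of pieces grows tenfold, so any desired precision is a matter of using enough pieces.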

**Square**

I can’t use the example of `self times self` so many times without exploring the concept a bit. Squares to me seem so limited and boring. No squizzles, no funky shapes, just boring chalkboard and rulers.

But that’s probably too judgmental.

After all there’s something self-referential and almost recursive about repeated applications of the `square` function. And it serves as the basis for Euclidean distance (and the standard deviation formula) via the Pythagorean theorem.

How those two are connected is a mystery I still haven’t wrapped my head around. But a cool connection I have come to understand is that between:

- a variety of inverse square laws in Nature
- a curve that is equidistant from a point and a line
- and the area of a rectangle which has both sides equal.

I guess first of all one has to appreciate that “parabola” shouldn’t necessarily have anything to do with `x•x`. Hopefully that’s become more obvious if you read the sections above where I point out that the target ℝ isn’t any more “vertical” than is the source ℝ.

The inverse-square laws show up everywhere *because our universe is 3-dimensional*. The surface of a 3-dimensional ball (like an expanding wave of gravitons, or an expanding wave of photons, or an expanding sound wave) is 2-dimensional, which means that whatever “force” or “energy” is “painted on” the surface gets spread over an area that grows as the square of the radius, so its intensity drops off as the inverse square. Oh. Thanks, Universe, for being 3-dimensional.
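The “painted on a sphere” argument fits in one line of arithmetic: a fixed amount of power divided by the sphere's surface area `4πr²`. The source strength of 100 (in arbitrary units) is a made-up number for the example.

```python
import math


def intensity(power, radius):
    """Power spread evenly over a sphere of the given radius (area 4 * pi * r**2)."""
    return power / (4 * math.pi * radius ** 2)


power = 100.0  # hypothetical source strength, arbitrary units
for radius in (1, 2, 3, 4):
    print(f"radius {radius}: intensity {intensity(power, radius):.3f}")

# Doubling the radius spreads the same power over four times the area.
ratio = intensity(power, 1) / intensity(power, 2)
print("intensity at r=1 over intensity at r=2:", ratio)
```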

What’s most amazing about the parabola—gravity connection is that it’s a metaphor that spans across both space *and* time. The curvature that looks like a-plane-figure-equidistant-to-a-line-and-a-point is curving *in time*.


In gradeschool calculus I learnt that derivative = slope. That was a nice teacher’s lie (like the Bohr atom is a nice teacher’s lie) to get the essential point across. But “derivative = slope” isn’t ultimately helpful because in real life, functions aren’t drawn on a chalkboard. ℝ→ℝ drawings don’t always look like what they feel like (e.g. this parabola).

ℝ→ℝ drawings’ “slope” *feels* more like a pulse, a β (observed magnitude), a force, a pay rise, a spike in the price of petrol, a nasty vega wave that chokes out a hedge fund, cruising down the highway (speedometer not odometer), a basic not a derived parameter, a linear operator in the space of all functionals, a blip, a pushforward, an impression, a straight-line projection from data, a deep dive into a function’s infinite profundity, a “bite” in the words of Jan Koenderink.

**A derivative “is really” a pulse. And an integral “is really” an accumulation.**

This story, “Bird’s Eye View” by Radiolab (minute 12:00), nicely illustrates a differential-geometry-consistent view of derivative & integral in the pleasantly-unexpected space of rare languages.

English : Derivative :: Pormpuraaw : Integral

In the Pormpuraaw language of Cape York, Australia, people say things like “You have an ant on your south-west leg” and “Move your cup to the north-north-west a bit”. “*How ya goin’?”* one asks the other. *"Headed east-north-east in the middle distance."*

- Little kids always know, even indoors, which cardinal direction they’re facing.
- This is very useful when you live in the outback without a GPS.
- American linguistics professor who was exploring there: “After about a week I developed a bird’s-eye view of myself on a map, like a video game, in the upper right corner of my mind’s eye.”

The mental map is like a running integral ∫ xᵗθᵗ dt of the moves they make. (Or we could think of it decomposed into two integrals, one that tracks changes in orientation ∫ θᵗ and one that tracks accumulating changes in place ∫ xᵗ.) In other words, a bird’s-eye view.

**left right forward back : derivative :: NSEW : integral**

Our English way of thinking is like a differential-geometry-consistent derivative. The time derivative “takes a bite” out of space and so is always relative to the particular moment in time. “Left” and “right” are concepts like this — relative, immediate, and having no length of their own. Just like the differential forms in Élie Cartan’s exterior algebra — tangent to our bodies.

There is a way to make this more precise and I *think* it would make sense to do it on ℂ || with a twistor || spinor. (Help, anyone? David?)

Our English conception of time & space is like a (time-)derivative of our movements. The Pormpuraawans’ conception of time & space is like an integral of their movements, orientation, and location. When we think of direction it’s an immediate slice of time. When they think of direction they’ve been *tracking* those relative-direction derivatives and they answer with the sum.
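The “tracking and answering with the sum” idea can be sketched as dead reckoning: feed in body-relative moves (turn, then step forward) and keep running totals of heading and position. The move format, the clockwise-from-north convention, and the eight-point compass are assumptions made for this illustration, not a claim about how speakers actually compute.

```python
import math

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def track(moves, heading=0.0, east=0.0, north=0.0):
    """Accumulate body-relative moves: (turn in degrees clockwise, distance forward)."""
    for turn, distance in moves:
        heading = (heading + turn) % 360                      # running sum of turns
        east += distance * math.sin(math.radians(heading))    # running sum of steps
        north += distance * math.cos(math.radians(heading))
    return heading, east, north


def compass_name(heading):
    """Nearest of eight compass points for a heading measured clockwise from north."""
    return COMPASS[round(heading / 45) % 8]


# Hypothetical walk: "turn right, go forward 1" four times traces a square.
moves = [(90, 1.0)] * 4
heading, east, north = track(moves)
print("facing", compass_name(heading), "at offset", (round(east, 9), round(north, 9)))

# After only the first move, the same relative instruction has an absolute answer.
first_heading, first_east, first_north = track(moves[:1])
print("after one move: facing", compass_name(first_heading))
```

Each move on its own is a “left/right/forward” derivative, meaningful only relative to the body at that instant. The compass answer exists only because the function has been summing them all along.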