Posts tagged with calculus

gradient descent on a 2-dimensional convex, quadratic cost function with condition number=100

• adding momentum to the gradient speeds up convergence in these high-condition-number cases — still gradient descent (which scales better than Newton–Raphson in high dimensions)
• like adding momentum in an oscillating mechanical system that vibrates too much
• heavy ball method (Polyak)
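A minimal sketch of the comparison in R (my own construction, not from the talk): plain gradient descent versus Polyak's heavy ball on the quadratic ƒ(x) = ½(x₁² + 100·x₂²), whose Hessian has condition number 100. The step size 1/100 and the momentum coefficient 0.8 are illustrative choices, not tuned values.

```r
## f(x) = 0.5*(x1^2 + 100*x2^2): a 2-D convex quadratic, condition number 100.
grad <- function(x) c(1, 100) * x

## Heavy-ball iteration: x_{k+1} = x_k - eta*grad(x_k) + beta*(x_k - x_{k-1}).
## beta = 0 recovers plain gradient descent.
run <- function(beta, iters = 300, eta = 1/100) {
  x <- c(1, 1); x_prev <- x
  for (k in 1:iters) {
    x_new  <- x - eta * grad(x) + beta * (x - x_prev)
    x_prev <- x
    x      <- x_new
  }
  sqrt(sum(x^2))   # distance from the minimiser (the origin)
}

run(beta = 0)     # plain gradient descent: still visibly far from the minimum
run(beta = 0.8)   # heavy ball: orders of magnitude closer
```

The momentum term keeps the iterate moving along the shallow, low-curvature direction instead of stalling there — the "oscillating mechanical system" intuition above.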

(Source: simons.berkeley.edu)


## What should we name our band?

For me this question is a symbol of problems that people’s intuitions are good at, but their mathematical models fail at.

• We can certainly define the domain (set of possible words)
• and we can define reasonable scalar-ish codomains (number of hit records, rankings by critics, faces on the people outside your show, …)
• but how would you set up an optimisation problem to answer this question?

It doesn’t just fail because it needs to be parameterised by

• the history of other bands (“Lady Gaga”)
• puns or linguistic meaning (“The Beatles”)
• emotional tenor of the band’s songs (imagine if The BeeGees were instead called Thräsherdëth)

but also because calculus' Really Cool Idea finds no purchase: no 1-D lineup of all possible band names is going to map 𝒞¹ onto the success of the band.

Like “What should I write about today?”, “What line of business should I get into?”, “What scientific problem should I study?”, “What should I do with my life?”, and a lot of other “broad, open-ended” questions, choosing a band-name is something I don’t think can be mathematised today. It’s also a mental shorthand for me for any question that is going to be answered better by “art” than by “science”.

Double integrals `∫∫ ƒ(x,y) dA` are introduced as a “little teacher’s lie” in calculus. The “real story” requires “geometric algebra”, or “the logic of length-shape-volume relationships”. Keywords

• multilinear algebra
• Grassmann algebra / Grassmannian
• exterior calculus
• Élie Cartan’s differential-forms approach to tensors

These equivalence-classes of blobs explain how

• volumes (ahem—oriented volumes!)
• areas (ahem—oriented areas!)
• arrows (vectors)
• numbers (scalars)

“should” interface with each other. That is, Clifford algebra or Grassmann algebra or “exterior algebra” or “geometric algebra” encodes how physical quantities with these dimensionalities do interface with each other.

(First the volumes are abstracted from their original context—then they can be “attached” to something else.)
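One concrete toehold (my own numeric sketch, not from the source): in 2-D, the “oriented area” spanned by two vectors is the determinant of the matrix with those vectors as columns, and swapping the vectors flips the sign — exactly the antisymmetry a ∧ b = −(b ∧ a) that exterior algebra encodes.

```r
## Signed (oriented) area of the parallelogram spanned by 2-D vectors a and b:
## the determinant of the matrix whose columns are a and b.
oriented_area <- function(a, b) det(cbind(a, b))

a <- c(2, 0)
b <- c(0, 3)

oriented_area(a, b)   #  6: counter-clockwise orientation
oriented_area(b, a)   # -6: swapping the factors flips the sign, like a ∧ b = −(b ∧ a)
```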

` `

EDIT: user mrfractal points out that Clifford algebras can only have dimensions of 2,4,8,16,… https://en.wikipedia.org/wiki/Clifford_algebra#Basis_and_dimension Yes, that’s right. This post is not totally correct. I let it fly out of the queue without editing it and it may contain other inaccuracies. I was trying to throw out a bunch of relevant keywords that go along with these motivating pictures, and relate it to equivalence-classing, one of my favourite themes within this blog. The text here is disjointed, unedited, and perhaps wrong in other ways. Mostly just wanted to share the pictures; I’ll try to fix up the text some other time. Grazie.

(Source: arxiv.org)

Define the derivative to be the thing that makes the fundamental theorem of calculus work.

## CALCULUS: the Really Really Short Version

in <10 words: Derivative is shifted-subtraction. Integral is accumulation.

So, you never went to university…or you assiduously avoided all maths whilst at university…or you started but were frightened away by the epsilons and deltas…. But you know the calculus is one of the pinnacles of human thought, and it would be nice to know just a bit of what they’re talking about……

Both thorough and brief intro-to-calculus lectures can be found online. I think I can explain differentiation and integration—the two famous operations of calculus—even more briefly.

` `

Let’s talk about sequences of numbers. Sequences that make sense next to each other, like your child’s height at different ages—not just an unrelated assemblage of numbers which happen to be beside each other. If you have handy a sequence of numbers that’s relevant to you, that’s great.

` `

Differentiation and integration are two ways of transforming the sequence to see it differently-but-more-or-less-equivalently.

Consider the sequence 1, 2, 3, 4, 5. If I look at the differences I could rewrite this sequence as `[starting point of 1]`, +1, +1, +1, +1. All I did was look at the difference between each number in the sequence and its neighbour. If I did the same thing to the sequence 1, 4, 9, 16, 25, the differences would be `[starting point of 1]`, +3, +5, +7, +9.

That’s the derivative operation. Derivative is shifted-subtraction. It’s (first-)differencing, except in real calculus you would have an infinite, continuous thickness of decimals—more numbers between 1, 4, and 9 than you could possibly want. In R you can use the `diff` operation on a sequence of data to automate what I did above. For example do

```r
seq <- 1:5        # 1 2 3 4 5
diff(seq)         # 1 1 1 1
seq2 <- seq*seq   # 1 4 9 16 25
diff(seq2)        # 3 5 7 9
```

A couple of things you may notice:

• I could have started at a different starting point and talked about a sequence with the same changes, changing from a different initial value. For example 5, 6, 7, 8, 9 does the same +1, +1, +1, +1 but starts at 5.
• I could second-difference the numbers, differencing the first-differences: +3, +5, +7, +9 (the differences in the sequence of square numbers) gets me ++2, ++2, ++2.
• I could third-difference the numbers, differencing the second-differences: +++0, +++0.
• Every time I `diff` I lose one of the observations. This isn’t a problem in the infinitary version although sometimes even infinitely-thick sequences can only be differentiated a few times, for other reasons.
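The repeated-differencing bullets above run directly in R, using the `differences` argument of `diff`:

```r
squares <- (1:5)^2                # 1 4 9 16 25
diff(squares)                     # 3 5 7 9   (first differences)
diff(squares, differences = 2)    # 2 2 2     (second differences)
diff(squares, differences = 3)    # 0 0       (third differences)
```

Notice the lengths shrink 5 → 4 → 3 → 2: each `diff` costs one observation, as promised.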

The other famous tool for looking differently at a sequence is to look at cumulative sums: `cumsum` in R. This is integration. Looking at “total so far” in the sequence.

Consider again the sequence 1, 2, 3, 4, 5. If I added up the “total so far” at each point I would get 1, 3, 6, 10, 15. This is telling me the same information – just in a different way. The fundamental theorem of calculus says that if I `diff( cumsum( 1:5 ))` I will get back 2, 3, 4, 5 — the original sequence, minus its starting point (remember, each `diff` loses one observation). You can verify this without a calculator by subtracting neighbours—looking at differences—amongst 1, 3, 6, 10, 15. (Go ahead, try it; I’ll wait.)

Let’s look back at the square sequence 1, 4, 9, 16, 25. If I cumulatively sum I’d have 1, 5, 14, 30, 55. Pick any sequence of numbers that’s relevant to you and do `cumsum` and `diff` on it as many times as you like.
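The round trip looks like this in R (note `diff` drops the first observation, so we restore the starting point by hand):

```r
x <- 1:5
cumsum(x)                  # 1 3 6 10 15  ("total so far")
diff(cumsum(x))            # 2 3 4 5      (the original sequence, minus its first entry)
c(x[1], diff(cumsum(x)))   # 1 2 3 4 5    (restore the starting point: back where we began)
```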

` `

Those are the basics.

##### Why are people so interested in this stuff?

Why is it useful? Why did it make such a splash and why is it considered to be in the canon of human progress? Here are a few reasons:

• If the difference in a sequence goes from +, +, +, +, … to −, −, −, −, …, then the numbers climbed a hill and started going back down. In other words the sequence reached a maximum. We like to maximise things, like efficiency, profit, etc.
• A corresponding statement could be made for valley-bottoms. We like to minimise things like cost, waste, usage of valuable materials, etc.
• The `diff` verb takes you from position → velocity → acceleration, so this mathematics relates fundamental stuff in physics.
• The `cumsum` verb takes you from acceleration → velocity → position, which allows you to calculate stuff like work. Therefore you can pre-plan, for example, what the energy cost would be to do something at a large scale that’s too costly to just try.
• What’s the difference between income and wealth? Well, if you define `net income` to be what you earn less what you spend, then `wealth = cumsum(net income)` and `net income = diff(wealth)`. Another everyday relationship made absolutely crystal clear.
• In higher-dimensional or more-abstract versions of the fundamental theorem of calculus, you find out that, sometimes, complicated questions—like the sum of forces a paramecium experiences all along a sequential curved path—can be reduced to merely the start and finish (i.e., the complicatedness may be one dimension less than what you thought).
• Further-abstracted versions also allow you to optimise surfaces (including “surfaces” in phase-space) and therefore build bridges or do rocket-science.
• With the fluidity that comes with being able to `diff` and `cumsum`, you can do statistics on continuous variables like height or angle, rather than just on count variables like number of people satisfying condition X.
• At small enough scales, calculus (specifically Taylor’s theorem) tells you that "most" nonlinear functions can be linearised: i.e., approximated by repeated addition of a constant `+const+const+const+const+const+...`. That’s just about the simplest mathematical operation I can think of. It’s nice to be able to talk at least locally about a complicated phenomenon in such simple terms.
• In the infinitary version, symbolic formulae `diff` and `cumsum` to other symbolic formulae. For example `diff( x² ) = 2x` (look back at the square sequence above if you didn’t notice this the first time). This means instead of having to try (or make your computer try) a lot of stuff to see what’s going to work, you can just-plain-understand something.
• Also because of the symbolic nicety: post-calculus, if you only know how, e.g., `diff( diff( diff( x )))` relates to `x` – but don’t know a formula for `x` itself – you’re not totally up a creek. You can use calculus tools to make relationships between varying `diff` levels of a sequence, just as good as a normal formula – thus expanding the landscape of things you can mathematise and solve.
• In fact `diff( diff( x )) = − x` is the source of this, this, this, and therefore the physical properties of all materials (hardness, conductivity, density, why the sky is blue, etc) – which derive from chemistry, which derives from Schrödinger’s Equation, which is solved by the “harmonic” `diff( diff( x )) = − x`.
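You can check that harmonic relation numerically (my own sketch): the second difference of sine is proportional to minus sine, with the proportionality constant h² coming from the step size of the grid.

```r
## diff(diff(x)) = -x, checked for x = sin on a fine grid of step h:
## the second difference of sin(t) is ≈ -h^2 * sin(t).
h <- 0.001
t <- seq(0, 2*pi, by = h)
x <- sin(t)
second_diff <- diff(x, differences = 2)

## compare against -h^2 * x, dropping the two observations the diffs consumed
err <- max(abs(second_diff - (-h^2 * x[2:(length(x) - 1)])))
err   # tiny
```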

Calculus isn’t “the end” of mathematics. It’s barely even before or after other mathematical stuff you may be familiar with. For example it doesn’t come “after” trigonometry, although the two do relate to each other if you’re familiar with both. You could apply the “differencing” idea to groups, topology, imaginary numbers, or other things. Calculus is just a tool for looking at the same thing in a different way.


Transgressing boundaries, smashing binaries, and queering categories are important goals within certain schools of thought.

Reading such stuff the other week-end I noticed (a) a heap of geometrical metaphors and (b) limited geometrical vocabulary.

In my opinion functional analysis (as in, precision about mathematical functions—not practical deconstruction) points toward more appropriate geometries than just the `[0,1]` of fuzzy logic. If your goal is to escape “either/or” then I don’t think you’ve escaped very much if you just make room for an “in between”.

By contrast `ℝ→ℝ` functions (even continuous ones; even smooth ones!) can wiggle out of definitions you might naïvely try to impose on them. The space of functions naturally lends itself to different metrics that are appropriate for different purposes, rather than “one right answer”. And even trying to define a rational means of categorising things requires a lot—like, Terence Tao level—of hard thinking.

I’ll illustrate my point with the arbitrary function ƒ pictured at the top of this post. Suppose that ƒ∈𝒞². So it does make sense to talk about whether ƒ′′≷0.

But in the case I drew above, ƒ′′≹0. In fact “most” 𝒞² functions on that same interval wouldn’t fully fit into either “concave” or “convex”.
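Here's a sketch in R of that claim, with sine standing in for a generic wiggly ƒ and the discrete second difference standing in for ƒ′′:

```r
## f(t) = sin(t) on [0, 2*pi] has f'' < 0 on part of the interval (concave there)
## and f'' > 0 on another part (convex there), so neither label fits globally.
t  <- seq(0, 2*pi, length.out = 1000)
f  <- sin(t)
f2 <- diff(f, differences = 2)   # discrete stand-in for f''

any(f2 > 0)   # TRUE: convex somewhere
any(f2 < 0)   # TRUE: concave somewhere
```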

So “fits the binary” is rarer than “doesn’t fit the binary”. The “borderlands” are bigger than the staked-out lands. And it would be very strange to even think about trying to shoehorn generic 𝒞² functions into

• one type,
• the other,
• or “something in between”.

Beyond “false dichotomy”, ≶ in this space doesn’t even pass the scoff test. I wouldn’t want to call the ƒ I drew a “queer function”, but I wonder if a geometry like this isn’t more what queer theorists want than something as evanescent as “liminal”, something as thin as "boundary".


## ∂ Campbell’s

`Cylinder = line-segment × disc`

`C = | × ●`

The “product rule” from calculus works as well with the boundary operator `∂` as with the differentiation operator `∂`.

`∂C  =   ∂| × ●   +   | × ∂●`
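Reading the right-hand side off term by term (my glosses: `∂|` is the segment’s two endpoint points; `∂●` is `○`, the circle bounding the disc):

`∂C = (2 points) × ● + | × ○`

that is, the two flat end-cap discs plus the curved lateral tube — exactly the surface of a tin can.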

Oops. Typo. Sorry, I did this really late at night! `cos` and `sin` need to be swapped.

Oops. Another typo. Wrong formula for circumference.

## Subtraction Is Crazy

I was re-reading Michael Murray’s explanation of cointegration:

and marvelling at the calculus.

Of course it’s not just any subtraction. It’s subtracting a function from a shifted version of itself. Still doesn’t sound like a universal revolution.

(But of course the observation that the lagged first-difference will be zero around an extremum (turning point), along with symbolic formulæ for (infinitesimal) first-differences of a function, made a decent splash.)

$f^\prime \equiv \lim_{\mathrm{lag} \downarrow 0} \frac{\mathrm{lag}(f) - f}{|\mathrm{lag}|}$

Jeff Ryan wrote some R functions that make it easy to first-difference financial time series.

Here’s how to do the first differences of Goldman Sachs’ share price:

```r
require(quantmod)     # Jeff Ryan’s quantmod; also provides lag() for xts series
getSymbols("GS")      # loads an xts object named GS into the workspace
gs <- Cl(GS)          # extract the closing-price column
plot( gs - lag(gs) )  # first differences of the share price
```

Look how much more structured the result is! Now all of the numbers are within a fairly narrow band. With `length(gs)` I found 1570 observations. Here are 1570 random normals `plot(rnorm(1570, sd=10), type="l")` for comparison:

Not perfectly similar, but very close!

Looking at the first differences compared to a Gaussian brings out what’s different between public equity markets and a random walk. What sticks out to me is the vol leaping up aperiodically in the $GS time series.

I think I got even a little closer with drawing the stdev’s from a Poisson process `plot(rnorm(1570, sd=rpois(1570, lambda=5)), type="l")`

but I’ll end there with the graphical futzing.

What’s really amazing to me is how much difference a subtraction makes.

differential topology lecture by John W. Milnor from the 1960s: Topology from the Differentiable Viewpoint

• A function that’s problematic for analytic continuations:
$f(t) = \begin{cases} 0 & \text{if } t \le 0, \\ e^{-1/t} & \text{if } t > 0 \end{cases}$
• Definitions of smooth manifold, diffeomorphism, category of smooth manifolds
• bicontinuity condition
• two Euclidean spaces are diffeomorphic iff they have the same dimension
• torus ≠ sphere but compact manifolds are equivalence-classable by genus
• Moebius band is not compact
• Four categories of topology, which were at first thought to be the same, but by the 60’s seen to be really different (and the maps that keep you within the same category):

diffeomorphisms on smooth manifolds;

piecewise-linear maps on simplicial complexes;

homeomorphisms on sets (point-set topology)

• Those three examples of categories helped understand category and functor in general. You could work for your whole career in one category—for example if you work on fluid dynamics, you’re doing fundamentally different stuff than computer scientists working on type theory—and this would filter through to your vocabulary and the assumptions you take for granted. E.g. “maps” might mean “smooth bicontinuous maps” in fluid dynamics, but non-surjective, discontinuous maps come up all the time in logic or theoretical computer science. A functor is the comparison between the different subjects.
• The fourth, homotopy theory, was invented in the 1930’s because topology itself was too hard.

• Minute 38-40. A pretty slick proof. I often have a hard time following, but this is an exception.
• Minute 43. He misspeaks! In defining the hypercube.
• Minute 47. Homology groups relate the category of topological-spaces-with-homotopy-classes-of-mappings, to the category of groups-with-homomorphisms.

That’s the first of three lectures. Also Milnor’s thoughts almost half a century later on how differential topology had evolved since the lectures:

Hat tip to david a edwards.

What I really loved about this talk was the categorical perspective. The talks are really structured so that three categories — smooth things, piecewise things, and points/sets — are developed in parallel. Better than development of the theory of categories in the abstract, I like having these specific examples of categories and how “sameness” differs from category to category.

(Source: simonsfoundation.org)