Fourier's Cheat — On Domain Shifts and the Tricks That Made Modern Computation Possible

There is a question that cuts to the heart of how computers actually work, and it almost never gets asked: what did we give up when we chose digital over analog?

Analog computers — the kind that were serious engineering tools through the 1960s — do not calculate. They are the calculation. You wire up a circuit whose electrical behavior mirrors the physics of the problem you want to solve. A capacitor naturally integrates. A resistor-inductor pair naturally models a damped oscillator. Want to know the trajectory of an artillery shell? Build a circuit whose voltage behaves like the shell. Read the answer off a meter. The computation happens at the speed of electricity, continuously, the way nature computes things — because you are, in a real sense, running nature.

Digital computers gave all of that up. They operate on discrete numbers, one operation after another, in steps. They cannot integrate directly; they can only approximate an integral by summing many small steps. They cannot represent a continuous signal, only a sequence of snapshots. What they gained in exchange was programmability, precision, and the ability to run any problem on the same machine. What they lost was the ability to do calculus cheaply.

The entire history of computational mathematics since then has been a long negotiation with that loss. And the most important tool in that negotiation is something most people learned about in college and immediately forgot.

I. Fourier’s door.

Joseph Fourier’s insight, published in his 1822 Théorie analytique de la chaleur, was that any signal — any wave, any pattern, any reasonable function — can be described as a sum of simple sine waves at different frequencies and amplitudes. Your voice, when recorded, looks like a chaotic squiggle. But it is secretly a superposition of specific frequencies: some high, some low, each with its own strength. Fourier’s transform reveals which frequencies are present and how strong each one is. This is what your inner ear does in real time, which is why you can hear a C-major chord as three distinct notes rather than a single indecipherable pressure wave.
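To make the decomposition concrete, here is a minimal sketch in plain Python. The test signal (two sine waves at 3 and 10 cycles per window) and the 0.1 threshold are assumptions chosen for illustration, and the transform is the naive discrete version, not the fast one.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: O(n^2), fine for a short demo."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical test signal: 64 samples holding two sine waves,
# one at 3 cycles per window (amplitude 1.0), one at 10 (amplitude 0.5).
N = 64
signal = [
    1.0 * math.sin(2 * math.pi * 3 * t / N)
    + 0.5 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]

spectrum = dft(signal)

# Amplitude per frequency bin, up to the Nyquist limit.
# A real sine of amplitude A shows up with magnitude A * N / 2.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]

# Bins carrying significant energy (threshold is an arbitrary choice).
present = [k for k, a in enumerate(amplitudes) if a > 0.1]
print(present)
```

The squiggle in `signal` gives no visual hint of its ingredients, but the spectrum isolates exactly the two frequencies that were mixed in, along with their strengths.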

The reason this matters for computation is what it does to a particular problem called convolution.

Convolution sounds technical, but the concept is everywhere. Blurring a photograph is convolution. Echo in audio is convolution. The way a radar signal smears across a receiver is convolution. Computing it directly in the time domain — the natural way you would approach it, by looking at how signals overlap moment by moment — requires sliding one signal across the other and summing products at every single point. For any signal of meaningful length, this is genuinely catastrophic: double the length of the signal and you quadruple the work. On a digital computer, it is the kind of problem that should take forever.

The Fourier cheat: transform both signals into the frequency domain. There, that catastrophic sliding-window multiplication becomes ordinary element-wise multiplication. Multiply the two transformed signals together. Transform back. You have the answer. With the Fast Fourier Transform (the Cooley–Tukey algorithm, published in 1965 and arguably the most important algorithm of the twentieth century), the round trip costs on the order of n log n operations for a signal of length n, against roughly n² for the direct calculation. Double the length and the work slightly more than doubles instead of quadrupling.
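The round trip can be sketched in a few dozen lines of plain Python. This is a teaching sketch, not a production implementation: the recursive radix-2 `fft` below assumes power-of-two lengths, and the sample `signal` and `kernel` are made up for the demonstration.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two.
    The inverse is left unnormalized; the caller divides by len(x)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def direct_convolve(a, b):
    """Sliding-window convolution: O(len(a) * len(b))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def fft_convolve(a, b):
    """Same result via the frequency domain: O(n log n)."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:  # zero-pad to a power of two to avoid wrap-around
        size *= 2
    fa = fft(list(a) + [0.0] * (size - len(a)))
    fb = fft(list(b) + [0.0] * (size - len(b)))
    product = [p * q for p, q in zip(fa, fb)]  # element-wise multiply
    result = fft(product, inverse=True)
    return [(v / size).real for v in result[:out_len]]

signal = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical input
kernel = [0.25, 0.5, 0.25]           # hypothetical blur kernel
slow = direct_convolve(signal, kernel)
fast = fft_convolve(signal, kernel)
print(slow)
print([round(v, 9) for v in fast])
```

The two results agree to within floating-point rounding. The zero-padding step matters: without it the FFT computes a circular convolution, in which the tail of the result wraps around onto the head.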

This is not an approximation. By the convolution theorem it is mathematically exact; on a real machine it holds up to floating-point rounding, provided the signals are zero-padded so the result does not wrap around. You have not simplified the mathematics; you have revealed that the mathematics looks completely different depending on which direction you approach it from. The same problem that is a nightmare in one domain is trivial in another. The transform is not a shortcut through the work. It is a door into a room where the work was never hard to begin with.

This is why your phone can process audio in real time. Why JPEG compression works. Why WiFi signals can be decoded, MRI machines can reconstruct images, and noise-canceling headphones can subtract ambient sound from a microphone feed. All of them lean on Fourier transforms or close relatives (JPEG, for instance, uses the discrete cosine transform): problems that would strand a digital computer if approached directly, solved by looking sideways. 3Blue1Brown’s video makes the geometric intuition feel almost inevitable, which is the highest praise one can give a piece of mathematical exposition.

II. The GPU tells the same story.

Modern GPUs did not invent new mathematics. They invented new arrangements for doing mathematics that was already understood to be “embarrassingly parallel” — the technical term for a calculation where each result does not depend on any other. Matrix multiplication, the core operation of deep learning, is the canonical example. To multiply two large matrices, you need to compute thousands of dot products. None of them depend on each other. A CPU computes them in sequence; a GPU computes all of them at once across thousands of cores.

But parallelism alone is not the whole story. The other trick is data locality. Moving numbers from main memory to a processor is slow — orders of magnitude slower than arithmetic. The GPU’s solution is tiling: break large matrices into small blocks that fit entirely into the processor’s local memory, work exhaustively within each tile, then move to the next. The arithmetic never waits for the data. This is the same logic as the Fourier cheat: identify the actual constraint (memory bandwidth, not computing power), and redesign the approach around it.
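The loop structure of tiling can be sketched in plain Python. This shows only the access pattern; Python itself gains nothing from it, and the `tile` size of 2 and the sample matrices are arbitrary choices for illustration. On a GPU, each small block would be staged in fast on-chip memory before the inner loops run.

```python
def matmul_naive(a, b):
    """Textbook matrix multiply: one full dot product per output cell."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]

def matmul_tiled(a, b, tile=2):
    """Same arithmetic, reordered so each (tile x tile) block of a and b
    is reused many times before moving on."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of the output
        for j0 in range(0, m, tile):      # block column of the output
            for p0 in range(0, k, tile):  # block along the shared dimension
                # Everything below touches only three small blocks.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c

# Hypothetical 3x4 and 4x5 inputs with small integer entries.
A = [[i + j for j in range(4)] for i in range(3)]
B = [[i * j + 1 for j in range(5)] for i in range(4)]
print(matmul_tiled(A, B) == matmul_naive(A, B))
```

The result is identical to the naive version; only the order in which memory is visited changes, which is the whole point.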

Then there is quantization. Full-precision floating-point arithmetic is expensive. But most of what deep learning does can be done nearly as well with 8-bit integers instead of 32-bit floats. This is the “good enough” cheat: trade a small amount of precision for a large gain in speed and memory. It is one reason large language models that would otherwise require warehouse-scale hardware can run on consumer graphics cards. The numbers are slightly wrong. The outputs are, for most purposes, nearly indistinguishable.
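A minimal sketch of the idea, assuming the simplest scheme (symmetric, one scale factor for the whole list). Real frameworks use more elaborate variants; the weights and inputs here are invented for the example.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]
    using a single scale factor for the whole list."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in q_values]

# Hypothetical weights and inputs, for illustration only.
weights = [0.8137, -0.4062, 0.0519, -1.2700, 0.6288]
inputs = [1.0, 2.0, -1.5, 0.5, 3.0]

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = sum(w * x for w, x in zip(restored, inputs))

print(q_weights)               # small integers, one byte each
print(abs(exact - approx))     # the price paid in precision
```

Each weight is off by at most half a quantization step, so the dot product drifts slightly while the storage cost drops to a quarter of 32-bit floats.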

Every one of these tricks follows the same logic. You cannot solve the problem as stated. You restate the problem in terms that reveal a simpler structure. You solve it there. You return.

III. The history of AI is the same story in slow motion.

The perceptron, proposed by Frank Rosenblatt in his 1958 paper, was the first major cheat in machine learning. The world is non-linear — complicated, curved, contingent. But Rosenblatt observed that for simple classification tasks, you could pretend it was linear: multiply inputs by weights, sum the products, threshold the result. The simplification was brutal. It famously could not solve XOR — a problem a child could grasp — and Minsky and Papert’s Perceptrons (1969) made the limitation famous enough to freeze the field for over a decade. But it worked for what it worked for, and the machinery it established — weighted inputs, learned weights — became the foundation of everything that followed.
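Both the mechanism and the XOR limitation fit in a short sketch. The training loop below uses the classic perceptron update rule; the epoch limit of 50 and the learning rate of 1 are arbitrary choices, not values from Rosenblatt's paper.

```python
def train_perceptron(samples, epochs=50, lr=1):
    """Weighted sum, threshold, and an update on each mistake.
    Returns (weights, bias, converged)."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - prediction
            if delta != 0:
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: done
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
print(and_converged, xor_converged)
```

AND is linearly separable, so the weights settle after a handful of passes. XOR is not: no single straight line splits its four points correctly, so the loop never completes an error-free pass however long it runs.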

The insight behind deep learning, which Geoffrey Hinton and collaborators kept alive through the long winter and made practical with the 1986 backpropagation paper (Rumelhart, Hinton, and Williams), was that stacking layers of linear operations, with a small non-linearity between each layer, creates something that can approximate any continuous function as closely as you like, given enough units. The ReLU function in modern practice is almost insultingly simple: output zero if the input is negative, output the input otherwise. A single linear layer is almost useless, and a stack of purely linear layers is no better, because it collapses into one linear map. Put the non-linearity between them and the stack becomes a universal approximator. The non-linearity is the hinge: the thing that lets the system fold and bend the input space until any pattern can be found.
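A two-unit hidden layer with ReLU is already enough to solve the XOR problem that defeated the perceptron. In the sketch below the weights are picked by hand for illustration, not learned, and `linear_only` is a hypothetical comparison showing what happens when the ReLU is removed.

```python
def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return x if x > 0 else 0.0

def tiny_network(x1, x2):
    """Two hidden ReLU units and one linear output (hand-picked weights)."""
    h1 = relu(x1 + x2)         # fires when at least one input is on
    h2 = relu(x1 + x2 - 1.0)   # fires only when both inputs are on
    return h1 - 2.0 * h2       # subtract the "both on" case twice

def linear_only(x1, x2):
    """Same weights without the ReLU: algebraically just 2 - (x1 + x2)."""
    h1 = x1 + x2
    h2 = x1 + x2 - 1.0
    return h1 - 2.0 * h2

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [tiny_network(a, b) for a, b in corners]
linear_outputs = [linear_only(a, b) for a, b in corners]
print(xor_outputs)
print(linear_outputs)
```

With the kink in place the four corners map to the XOR pattern. Without it, the two layers reduce to a single straight-line function of the inputs, which cannot produce that pattern.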

What AlexNet demonstrated in Krizhevsky, Sutskever, and Hinton’s 2012 paper was not a new mathematical principle but that this machinery could be made to work at scale, on GPUs, on real data, for real problems. The domain shift from raw pixels to learned features is Fourier in another guise: instead of sine waves, the basis functions are learned from the data. Instead of frequency components, you have feature vectors. The network is finding the coordinate system in which the input data has simple structure — where “cat” and “not cat” occupy different regions of space, where grammatically related words cluster together, where the intent of a sentence has a direction and a magnitude.

The large language model is this idea taken to its limit. Text is transformed into high-dimensional vectors, the attention mechanism introduced by Vaswani and colleagues in Attention Is All You Need (2017) computes weighted sums across sequences, and the entire forward pass is an enormous sequence of matrix multiplications — each one a linear operation, each non-linearity a tiny kink in an otherwise flat landscape. The mathematics is not complicated. The scale is. The trick is that if you make the domain large enough — hundreds of billions of parameters, a training corpus that spans most of human writing — the approximation becomes indistinguishable from understanding.
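The "weighted sums across sequences" can be sketched as scaled dot-product attention in plain Python. This is a single head with no learned projections, masking, or batching, all of which a real transformer adds; the example vectors are invented.

```python
import math

def softmax(row):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query: score every key, normalize, blend the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs

# Hypothetical 3-token sequence with 2-dimensional vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

for row in attention(Q, K, V):
    print([round(x, 3) for x in row])
```

Everything here is a dot product or a weighted average. The only non-linear step is the exponential inside `softmax`.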

IV. The thread.

The thread connecting Fourier to Rosenblatt to Hinton to the transformer is not hardware or data or even mathematics. It is a way of thinking: when a problem is impossible as posed, change the domain. Find the angle from which the same problem reveals its easy structure. The hard part is never the arithmetic. It is knowing which direction to look.

Analog computers did this automatically, because they were the domain. Digital computers have to work for it — every trick, every transformation, every cheat is a vote against the original decision to discretize. What computational mathematics has been building, for two centuries now, is a growing library of ways to make discrete machines approximate what continuous nature does without effort.

There is a temptation, watching this history, to read it as a story of progress: we got better at finding the tricks. The honest reading is humbler. We are still digital computers pretending to do calculus. Every breakthrough — the FFT, the GPU, the perceptron, the transformer — is the same admission dressed in different clothes: we cannot solve this the obvious way, so let us look elsewhere. The cleverness is real. So is the patch.

Fourier showed us the door. We have been walking through it ever since.

Well, Professor Wilson Zuluaga et al., all those years of enduring your signal processing classes left some marks on me. Thank you.
