The undressing of the king

For the last two decades, Amazon, Microsoft, Google, and Alibaba have been amassing the world’s computing power. Now, they own 67% of it. If we are to build knock-off humans, where do we go from here?

Humans are striving to replicate humanity—some through biological methods, others through the fusion of software and hardware. Both approaches demand immense computational power.

For instance, OpenAI estimated that training GPT-3, the first model to achieve notable conversational proficiency, required roughly 3.14×10^23 FLOPs (314 zettaFLOPs). Similarly, LLaMA was trained on a cluster of 2,048 A100 GPUs, each delivering approximately 312 TFLOPS of FP16/BF16 tensor throughput. This configuration represents the upper limit for A100 clusters due to switch topology, yielding a total peak performance of 639 PFLOPS.
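As a back-of-the-envelope check of what those figures imply (a sketch only: the peak per-GPU throughput comes from the numbers above, and the utilization levels are my assumptions, since real training never runs at peak):

    # Rough training time for GPT-3's ~3.14e23 FLOPs on a 2,048-GPU A100 cluster.
    # Peak throughput figures are from the text; utilization levels are assumptions.
    GPT3_FLOPS = 3.14e23          # total training compute reported for GPT-3
    A100_TFLOPS = 312e12          # peak FP16/BF16 tensor throughput per A100
    CLUSTER_GPUS = 2048

    cluster_flops = CLUSTER_GPUS * A100_TFLOPS        # ~6.39e17 FLOP/s = 639 PFLOPS
    for utilization in (1.0, 0.5, 0.3):               # peak vs. more realistic efficiency
        days = GPT3_FLOPS / (cluster_flops * utilization) / 86400
        print(f"at {utilization:.0%} utilization: {days:.1f} days")

Even at peak, that is the better part of a week of a 2,048-GPU cluster's time for a single training run.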

This scale of computing signifies not just technological advancement but an escalating barrier to entry in AI research and development. Consider how far it sits beyond standard desktop capabilities. A typical desktop today operates at around 50 TFLOPS, metaphorically a “mouse of compute”: just 1/400th of the 20 PFLOPS “person of compute” defined below. Closing that gap would require roughly nine more doublings of Moore’s Law, about 18 years at two years per doubling. On that schedule, I estimate the technological Singularity may occur around 2038, coinciding intriguingly with the Unix timestamp rollover, a milestone I’ve long anticipated.
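The arithmetic behind that estimate, using the 50 TFLOPS desktop and 20 PFLOPS “person” figures quoted in this piece (the two-year doubling period and the ~2020 starting point are assumptions):

    import math

    # Gap between a ~50 TFLOPS desktop ("mouse of compute") and a 20 PFLOPS
    # "person of compute", expressed in Moore's Law doublings of ~2 years each.
    desktop_flops = 50e12
    person_flops = 20e15

    gap = person_flops / desktop_flops         # 400x
    doublings = math.ceil(math.log2(gap))      # log2(400) ~ 8.6, so 9 doublings
    years = doublings * 2
    print(f"{gap:.0f}x gap -> {doublings} doublings -> ~{years} years")
    # Starting from roughly 2020-era desktops, ~18 years lands around 2038,
    # the year the signed 32-bit Unix time_t overflows (2038-01-19).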

NVIDIA’s GPU advancements vividly illustrate this rapid progress:

  • GeForce GTX 1080 (2016): 11.3 TFLOPS
  • GeForce RTX 2080 (2018): 14.2 TFLOPS
  • GeForce RTX 3090 (2020): 35.6 TFLOPS
  • GeForce RTX 4090 (2022): 82.6 TFLOPS

The RTX 4090 marks a significant advancement in computational power, though with increased energy demands: roughly a 7x gain over the GTX 1080 in six years, a doubling about every two years. This progression reinforces the ongoing relevance of Moore’s Law, at least measured in FLOPS per flagship consumer card.
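Taking the FLOPS figures in the list above at face value, the implied doubling time works out as follows:

    import math

    # Implied doubling time of consumer-GPU FLOPS, GTX 1080 (2016) -> RTX 4090 (2022),
    # using the figures quoted in the list above.
    ratio = 82.6 / 11.3                       # ~7.3x over six years
    doubling_time = 6 / math.log2(ratio)      # ~2.1 years per doubling
    print(f"{ratio:.1f}x in 6 years -> doubling every {doubling_time:.1f} years")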

In practical terms, money substitutes for time: each doubling of hardware spend buys roughly two years’ worth of Moore’s Law. For example, Facebook’s 2,048-GPU cluster, measured against an eight-GPU workstation, is a 256x leap: eight Moore’s Law doublings, or roughly 16 years ahead of the consumer standard.
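The “buying Moore’s Law” claim is the same doubling arithmetic applied to cluster size (a sketch that assumes aggregate FLOPS scale linearly with GPU count):

    import math

    # How far ahead a 2,048-GPU cluster is versus an 8-GPU workstation,
    # in Moore's Law doublings and years (assuming linear scaling with GPU count).
    cluster_gpus, workstation_gpus = 2048, 8
    doublings = math.log2(cluster_gpus / workstation_gpus)   # 8 doublings (256x)
    print(f"{doublings:.0f} doublings -> ~{doublings * 2:.0f} years ahead")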

George Hotz proposes a benchmark for a “person of compute” at 20 PFLOPS, achievable with 64 A100 GPUs, roughly a single 42U server rack filled with them. Running such a rack currently draws approximately 30 kW of power, highlighting the immense resources needed to sustain such advanced systems.
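A rough sizing of that rack, using the 312 TFLOPS per-A100 figure from earlier; the 400 W per-GPU board power and the 20% overhead for host CPUs, networking, and cooling are my assumptions rather than quoted figures:

    # Rough sizing of a 20 PFLOPS "person of compute" built from A100s.
    # 400 W per GPU and the 20% system overhead are assumptions.
    PERSON_FLOPS = 20e15
    A100_TFLOPS = 312e12
    A100_WATTS = 400

    gpus = round(PERSON_FLOPS / A100_TFLOPS)       # ~64 GPUs
    power_kw = gpus * A100_WATTS * 1.2 / 1000      # ~31 kW with overhead
    print(f"{gpus} A100s, roughly {power_kw:.0f} kW at the wall")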

The true entry barrier for advanced AI development lies in an organization’s capacity to convert energy into intellect through computational resources. The dominance of a few cloud providers and chip manufacturers in supplying this infrastructure has created a highly uneven playing field. With the cost of achieving “a person of compute” estimated at $250,000, smaller entities face substantial disadvantages. This centralization of AI development raises critical concerns, including restricted access, biased outcomes, potential backdoors, and invasive telemetry.

Machine learning models, in isolation, hold limited intrinsic value; their utility stems from computational power, which in turn depends on silicon chip production. The real value lies in generating tangible resources that can be exchanged or applied, often amplified through software or scientific innovation. However, producing and maintaining silicon chips requires skilled expertise and significant capital—resources controlled by only a handful of global companies. Similarly, only a select few organizations can afford these chips, making equitable access to computational power at fair market rates essential for maintaining autonomy in AI development.

Cloud providers, having profited from the 2010s cloud revolution by catering to SMBs and startups with storage, networking, and general compute, are now reinvesting that wealth into specialized computational capabilities. Counterbalancing this dominance requires reducing reliance on these providers by developing software that does not depend on their high-level computational abstractions.

The reluctance of major players to support smaller-scale initiatives is evident in practices such as NVIDIA’s and AMD’s limited release of detailed GPU documentation. This withholding of information further entrenches their control and underscores the need for decentralizing access to computational resources.