CPU Architecture basics every developer should know
As software engineers, we usually spend a lot of time working with high-level concepts: frameworks, design patterns, clean code… But this can let us miss, or forget, what's happening under the hood of our machines. More precisely, in the CPUs. Which is a shame, considering how that knowledge can sometimes help us write better and more efficient code.
The basics: CPU, Cores, and Threads #
At its core, a CPU (Central Processing Unit) is the brain of our computer: it does all the "thinking". It's the piece of hardware that executes our programs' instructions, performing the calculations and coordinating all the other components of our system.
In the early days, CPUs were pretty simple: a single unit that could execute one instruction at a time. But just as a company can't scale with a single employee, CPUs needed a way to execute multiple tasks at the same time: cores.
Cores #
We can think of cores as independent processors within our CPU. Each core can execute its own set of instructions independently from the others. When we have a program that can be split into multiple independent tasks, having multiple cores means we can actually run these tasks truly in parallel.
For example, modern consumer CPUs like the Intel i9-13900K come with 24 cores (8 P-cores and 16 E-cores, more on these in the Modern CPU Architecture section), while some server CPUs can have up to 128 cores in a single processor!
To demonstrate this, if we’re writing a program that needs to process a large array of data, we could split it into chunks and process each chunk on a different core:
```python
from multiprocessing import Pool

def process_chunk(data):
    return sum(x * 2 for x in data)

if __name__ == "__main__":
    # Split our data into chunks that can run on different cores
    chunks_of_data = [range(i, i + 1_000) for i in range(0, 10_000, 1_000)]
    with Pool() as pool:
        result = pool.map(process_chunk, chunks_of_data)
```
Threads #
Threads are a bit different. They're like virtual cores that share the same physical core. Most modern CPUs support something called Hyper-Threading (Intel) or SMT (Simultaneous Multithreading, AMD), which allows each physical core to handle two threads simultaneously.
But here’s the catch: unlike cores that can truly run in parallel, threads share the same physical resources. They’re more useful for tasks that involve a lot of waiting (like I/O operations) than heavy computations.
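To see why threads shine at waiting, here's a small sketch where a `time.sleep` stands in for an I/O wait (like a network call or a disk read):

```python
import threading
import time

def fake_io_task():
    # time.sleep stands in for an I/O wait; Python releases the GIL
    # while sleeping, so the threads wait concurrently.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Ten 0.2 s waits finish in roughly 0.2 s total instead of 2 s,
# because the threads overlap their waiting.
print(f"{elapsed:.2f}s")
```

If `fake_io_task` did heavy computation instead of sleeping, the threads would take turns on the CPU and we'd see little or no speedup.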
To break this down further, there are actually two concepts we need to distinguish:
- Hardware threads (also called logical processors): These are what we talked about with Hyperthreading/SMT. When a physical core supports two hardware threads, it means it has duplicate registers and control units, allowing it to maintain two different execution states. While these threads do share some physical resources (like cache and execution units), they can actually execute in parallel to some extent - although usually not as efficiently as two separate cores would.
- Software threads: These are what we create in our programs. They are managed by the operating system and can be more numerous than the hardware threads available.
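We can check how many hardware threads the OS exposes straight from the standard library. Note that `os.cpu_count()` reports logical processors, not physical cores, so on a machine with Hyper-Threading/SMT it's typically twice the core count:

```python
import os

# Number of logical processors (hardware threads) visible to the OS.
# On a Hyper-Threading/SMT machine this is typically 2x the number of
# physical cores; counting physical cores portably requires a
# third-party library such as psutil.
logical_cpus = os.cpu_count()
print(logical_cpus)
```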
An example of creating software threads:
```python
import threading

def task():
    pass  # placeholder work

# This could create 100 software threads
for i in range(100):
    threading.Thread(target=task).start()
```
Modern CPU Architecture Example: Apple M Series #
Let’s take Apple’s M series chips as an interesting example of modern CPU design. These chips introduce something pretty clever. They have two different types of cores:
- Performance cores (P-cores): These are the powerful ones, designed for heavy tasks like compiling code or running complex calculations
- Efficiency cores (E-cores): These are smaller, more power-efficient cores for background tasks and less demanding operations
This is what we call a heterogeneous architecture, and it’s becoming more common. The idea is simple but smart: use the right tool for the job. Why waste energy running a background task on a powerful core when a more efficient one would do just fine?
How This Knowledge Helps Us Write Better Code #
Understanding CPU architecture can help us make better decisions when writing code:
Parallelization choices:
- Use multiple processes (cores) for CPU-intensive tasks
- Use threads for I/O-bound operations
- Be aware that creating too many threads/processes can actually hurt performance
Resource awareness:
```python
import threading
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count

# Bad - creates a software thread for each item
for item in large_list:
    threading.Thread(target=process_item, args=(item,)).start()

# Better - uses a thread pool with a reasonable number of workers
with ThreadPoolExecutor(max_workers=cpu_count() * 2) as executor:
    executor.map(process_item, large_list)
```
- Background tasks: On systems with E-cores (like M series), consider marking non-urgent background tasks as low priority so the OS can schedule them on E-cores:
```python
import os

def background_task():
    # On Unix systems, we can hint that this is a background task
    os.nice(10)  # Increases niceness (lowers priority)
    # Do background work...
```
When to Care About This? #
Let’s be honest: for many applications, especially smaller ones, you won’t need to think about CPU architecture that much. The operating system and modern programming languages already do a pretty good job at managing resources.
But understanding these concepts becomes crucial when:
- You’re working on performance-critical applications
- You need to optimize CPU usage or battery life
- You’re dealing with parallel or concurrent programming
- You’re trying to debug performance issues
And let’s remember: premature optimization is the root of all evil, but when you do need to optimize, understanding what’s happening under the hood can make a huge difference!