Python GIL: Explained Like I'm Five
So, you have heard of the Global Interpreter Lock, or GIL for short? What exactly is the GIL? In short, the GIL is a mechanism built into the CPython interpreter. It prevents multiple native threads from executing Python bytecode at the same time.
“Blah-blah-blah, stop trying to sound smart! What the hell are CPython and bytecode?!” Hopefully, I haven’t lost you yet.
Before we start talking about the GIL, let me briefly introduce CPython and bytecode, so that the typical answers about the GIL you read on the Internet make sense.
Don’t worry, it’s not as scary as it sounds! Here's my GIL ELI5 take.
What is CPython
Just like how there are many different brands of soda (Coke, Pepsi, Mountain Dew, etc.), there are also many different versions of Python that you can use to write and run your programs.
CPython is like the "original" or "standard" version of the Python programming language. You’re probably using it every day!
Written in C, CPython is called the "reference implementation" of Python, which means that it's the version that other versions of Python are compared to and tested against.
It's a bit like how Coke is the original soda that other sodas are based on or try to imitate.
Not actually compiling to C
To be clear, CPython does not translate your Python code to C. The Cython project, on the other hand, is a separate tool that lets you compile your code to C.
Several alternative implementations of Python are built to address different needs or to improve the performance of CPython in certain areas.
A non-exhaustive list of alternative implementations of Python includes:
- PyPy: a fast, compliant implementation of Python that is built using a just-in-time compiler. It is often faster than CPython, particularly for long-running programs where the compiler has time to warm up.
- Jython: an implementation of Python that is written in Java and can be used to run Python code on the Java Virtual Machine (JVM). This allows Python programs to integrate with Java programs and libraries.
- IronPython: an implementation of Python built using the .NET framework and designed to be used with other .NET languages such as C#.
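If you are curious which of these implementations is running your own code, the standard library can tell you. A minimal sketch, using only the platform and sys modules:

```python
import platform
import sys

# Name of the implementation running this script, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

print(f"Running on {implementation} {platform.python_version()}")
print(f"sys.implementation.name = {sys.implementation.name}")
```

If you installed Python from python.org or a typical package manager, this will most likely report CPython.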
What is bytecode
Are you ready to learn about the mysterious world of Python bytecodes? Buckle up!
When you write a Python program and run it, the Python interpreter first compiles the source code into bytecodes and then executes the bytecodes to produce the desired output.
Imagine that you have a recipe for making the perfect soda. The recipe is written in a language that you can understand, like English.
But when you go to make the soda, you don't actually follow the recipe in English. You translate it into a series of instructions that your soda machine can understand – these instructions are the "bytecodes".
In Python, bytecodes are a lower-level representation of the source code of a Python program.
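You can peek at these instructions yourself with the dis module from the standard library. This is only a small sketch; make_soda here is a made-up function for illustration, and the exact instruction names differ between Python versions.

```python
import dis


def make_soda(sugar, water):
    # A tiny function so the bytecode listing stays short.
    return sugar + water


# Print a human-readable listing of the function's bytecode.
dis.dis(make_soda)

# The same instructions as data; the names vary between Python versions.
opcode_names = [instruction.opname for instruction in dis.get_instructions(make_soda)]
print(opcode_names)
```

Each line of the listing is one step for the interpreter, such as loading a variable or returning a value.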
They are typically cached in files with a .pyc extension (short for "Python compiled"), which CPython keeps in a __pycache__ directory. These files are created automatically by the Python interpreter when it imports a .py module.
They speed up the startup of Python programs by avoiding the need to recompile unchanged source code each time the program is run.
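To see where such a file ends up, here is a small sketch that compiles a throwaway module by hand. The soda.py file is hypothetical and lives in a temporary directory; normally the interpreter does this step for you on import.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical module, created only for this demonstration.
    source_path = Path(tmp_dir) / "soda.py"
    source_path.write_text('print("Making a delicious soda!")\n')

    # Where the interpreter would cache the compiled bytecode for this file.
    expected_cache_path = importlib.util.cache_from_source(str(source_path))

    # Compile explicitly; this usually happens as a side effect of an import.
    compiled_path = py_compile.compile(str(source_path), doraise=True)

    pyc_exists = Path(compiled_path).exists()
    print(compiled_path)
```

The printed file name includes a tag for the interpreter version, so different Python versions can keep their caches side by side.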
So that's bytecode in a nutshell! It's a set of instructions that tells your computer what to do, kind of like a recipe for soda.
Just don't try to drink your bytecode — that would be a bit weird.
Why is GIL necessary
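Well, you see, CPython’s internals, most notably the reference counting it uses to manage memory, weren’t designed to be thread-safe. In other words, multiple threads could interfere with one another and corrupt the interpreter’s state. This kind of interference is called a “race condition”.
The GIL makes our life easier as Python developers: we can write multi-threaded programs without worrying about threads corrupting the interpreter itself. Data that our own code shares between threads can still need its own protection, such as a lock.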
How do I use GIL?
In case you’re wondering, you don't need to do anything special to use the GIL in your Python code. It is built into the CPython interpreter and is active by default.
However, you can use the threading module to create and manage threads in your Python code:

```python
import threading


def make_soda():
    print("Making a delicious soda!")


if __name__ == "__main__":
    # Create three threads that all run the same function.
    t1 = threading.Thread(target=make_soda)
    t2 = threading.Thread(target=make_soda)
    t3 = threading.Thread(target=make_soda)

    # Start them; the interpreter decides which thread runs when.
    t1.start()
    t2.start()
    t3.start()
```
Imagine that you and your friends are making a batch of soda together. You've got all the ingredients ready to go — sugar, water, caramel coloring, and some secret spices.
But you all try to add the ingredients to the same pot at the same time. This can lead to unexpected results because you don't know which ingredients will be mixed in first. This is similar to a race condition in a program, where two or more threads try to modify the same shared resource at the same time!
But because the GIL is in place, only one thread runs Python bytecode at a time, and every other thread has to wait for its turn before it can start making some fizzy goodness.
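How long is a "turn"? CPython exposes the interval at which a running thread is asked to hand over the GIL. A minimal sketch using sys.getswitchinterval (the default in CPython is 0.005 seconds, though it can be changed):

```python
import sys

# How often (in seconds) the interpreter asks the running thread to give up
# the GIL so that a waiting thread can take its turn.
switch_interval = sys.getswitchinterval()

print(f"Threads are asked to hand over the GIL about every {switch_interval} seconds")
```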
You may have already noticed that the GIL can seem like a bottleneck because of this wait. But it’s actually there to keep things running smoothly.
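One caveat is worth sketching. The GIL takes turns between threads, but a turn can end in the middle of an update such as adding 1 to a counter, which is really a read, an add, and a write. For a shared "pot" in your own code, a lock is the usual safeguard. In this sketch, Pot and add_ingredients are made-up names for illustration:

```python
import threading


class Pot:
    """A shared pot that several threads add ingredients to."""

    def __init__(self):
        self.ingredients = 0
        self.lock = threading.Lock()

    def add_ingredient(self):
        # Only the thread holding the lock may update the count, so the
        # read, add, and write steps cannot interleave with another thread.
        with self.lock:
            self.ingredients += 1


def add_ingredients(pot, times):
    for _ in range(times):
        pot.add_ingredient()


pot = Pot()
friends = [threading.Thread(target=add_ingredients, args=(pot, 10_000)) for _ in range(4)]

for friend in friends:
    friend.start()
for friend in friends:
    friend.join()  # wait until every friend is done

print(pot.ingredients)  # 40000
```

With the lock in place, four friends adding 10,000 ingredients each always end up with exactly 40,000 in the pot.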
Input/Output (I/O) Bound
It is important to note that the GIL has little effect on programs that spend most of their time on I/O, because blocking I/O operations typically release the GIL while they wait for data to be transferred.
What exactly do you mean by I/O bound?
At its most basic, I/O bound means that a program spends most of its time waiting for I/O operations to complete. This can include things like:
- Reading from or writing to a file
- Sending or receiving data over a network
- Interacting with a user through a graphical user interface (GUI), etc.
Essentially, any time a program has to wait for something to happen outside of itself, it's probably doing I/O-bound work.
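Here is a rough sketch of why waiting plays nicely with threads. It uses time.sleep as a stand-in for a blocking I/O call (sleeping also releases the GIL); wait_for_delivery is a made-up name, and the timing you see will vary a little by machine.

```python
import threading
import time


def wait_for_delivery(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released
    # while the thread waits, so other threads can run in the meantime.
    time.sleep(seconds)


start = time.perf_counter()

threads = [threading.Thread(target=wait_for_delivery, args=(0.2,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

elapsed = time.perf_counter() - start
print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
```

Because the four waits overlap, the total should land close to 0.2 seconds rather than the 0.8 seconds you would get by waiting one after another.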
A TL;DR about all the fuss about the Global Interpreter Lock, or GIL.
“The GIL is a great feature of Python! It makes it easy for us to write multi-threaded programs without having to worry about race conditions!” – say the supporters.
They argue that most Python programs aren’t CPU-intensive to begin with. Most of the time, programs are doing I/O-bound work, which the GIL doesn’t really affect. So, why all the fuss?
“The GIL is holding us back!” – say the critics with an opposite view.
With GIL, only one thread can execute Python bytecodes at a time.
As a result, this can limit the performance of CPU-bound Python programs on systems with multiple CPU cores. Because of this, GIL is known as a major limitation of Python.
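To get a feel for the critics' point, here is a rough timing sketch that runs the same CPU-bound loop twice in a row and then in two threads. The count_down function and the value of N are illustrative choices; on a standard CPython build with the GIL, the two timings tend to come out close to each other, but the exact numbers depend on your machine.

```python
import threading
import time


def count_down(n):
    # A pure-Python loop: CPU-bound work that needs the GIL to make progress.
    while n > 0:
        n -= 1


N = 2_000_000

# Run the work twice, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential_seconds = time.perf_counter() - start

# Run the same work in two threads at once.
start = time.perf_counter()
workers = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f} s")
print(f"threaded:   {threaded_seconds:.2f} s")
```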
Although most Python programs are I/O-bound, it doesn’t mean that we shouldn’t be able to write high-performance CPU-bound programs in Python.
So who's right? Well, as with most debates, there's no easy answer.
The GIL may be a necessary compromise that makes it easy for Python programmers to write multi-threaded programs. Nevertheless, it can also limit the performance of CPU-bound programs on systems with multiple CPU cores.
The debate lives on, for now, I guess. Lastly, check out this HN post and its comments if you're interested in diving a little deeper into the wonderful world of the GIL.