Multi-thread Python for loop

First, in Python, if your code is CPU-bound, multithreading won't help, because only one thread can hold the Global Interpreter Lock, and therefore run Python code, at a time. So, you need to use processes, not threads.

This is not true if your operation "takes forever to return" because it's IO-bound—that is, waiting on the network or disk copies or the like. I'll come back to that later.


Next, the way to process 5 or 10 or 100 items at once is to create a pool of 5 or 10 or 100 workers, and put the items into a queue that the workers service. Fortunately, the stdlib multiprocessing and concurrent.futures libraries both wrap up most of the details for you.

The former is more powerful and flexible for traditional programming; the latter is simpler if you need to compose future-waiting; for trivial cases, it really doesn't matter which you choose. (In this case, the most obvious implementation with each takes 3 lines with futures, 4 lines with multiprocessing.)

If you're using 2.6-2.7 or 3.0-3.1, futures isn't built in, but you can install it from PyPI (pip install futures).


Finally, it's usually a lot simpler to parallelize things if you can turn the entire loop iteration into a function call (something you could, e.g., pass to map), so let's do that first:

def try_my_operation(item):
    try:
        api.my_operation(item)
    except Exception as e:
        # catch Exception, not a bare except, so Ctrl-C still works
        print('error with item {}: {}'.format(item, e))

Putting it all together:

import concurrent.futures

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_my_operation, item) for item in items]
concurrent.futures.wait(futures)

If you have lots of relatively small jobs, the overhead of multiprocessing might swamp the gains. The way to solve that is to batch up the work into larger jobs. For example (using grouper from the itertools recipes, which you can copy and paste into your code, or get from the more-itertools project on PyPI):

def try_multiple_operations(items):
    for item in items:
        try:
            api.my_operation(item)
        except Exception as e:
            print('error with item {}: {}'.format(item, e))

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_multiple_operations, group)
           for group in grouper(5, items)]
concurrent.futures.wait(futures)
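For reference, the classic grouper recipe from the itertools docs looks roughly like this (this is the older recipe signature, grouper(n, iterable), which matches the call above; note that it pads the last batch with a fill value, so the worker should skip those entries):

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# the last group is padded: (1, 2), (3, None)
groups = list(grouper(2, [1, 2, 3]))
```

In try_multiple_operations you would add a check like `if item is not None` to skip the padding.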

Finally, what if your code is IO bound? Then threads are just as good as processes, and with less overhead (and fewer limitations, but those limitations usually won't affect you in cases like this). Sometimes that "less overhead" is enough to mean you don't need batching with threads, but you do with processes, which is a nice win.

So, how do you use threads instead of processes? Just change ProcessPoolExecutor to ThreadPoolExecutor.
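As a minimal sketch of that swap (a toy function stands in here for the real IO-bound api.my_operation, which isn't shown in the question), the thread version is structurally identical to the process version:

```python
import concurrent.futures

def fetch(item):
    # stand-in for an IO-bound call such as a network request
    return item * 2

items = range(5)

# identical structure to the process version; only the executor class changes
with concurrent.futures.ThreadPoolExecutor(10) as executor:
    futures = [executor.submit(fetch, item) for item in items]
    results = [f.result() for f in futures]

print(results)  # [0, 2, 4, 6, 8]
```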

If you're not sure whether your code is CPU-bound or IO-bound, just try it both ways.


Can I do this for multiple functions in my Python script? For example, if I had another for loop elsewhere in the code that I wanted to parallelize. Is it possible to have two multithreaded functions in the same script?

Yes. In fact, there are two different ways to do it.

First, you can share the same (thread or process) executor and use it from multiple places with no problem. The whole point of tasks and futures is that they're self-contained; you don't care where they run, just that you queue them up and eventually get the answer back.

Alternatively, you can have two executors in the same program with no problem. This has a performance cost—if you're using both executors at the same time, you'll end up trying to run (for example) 16 busy threads on 8 cores, which means there's going to be some context switching. But sometimes it's worth doing because, say, the two executors are rarely busy at the same time, and it makes your code a lot simpler. Or maybe one executor is running very large tasks that can take a while to complete, and the other is running very small tasks that need to complete as quickly as possible, because responsiveness is more important than throughput for part of your program.

If you don't know which is appropriate for your program, usually it's the first.
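A sketch of the shared-executor approach, with two hypothetical task functions standing in for the two loops:

```python
from concurrent.futures import ThreadPoolExecutor

def task_a(x):
    # hypothetical work for the first loop
    return x + 1

def task_b(x):
    # hypothetical work for the second loop
    return x * 10

# one executor, submitted to from two different places in the program;
# each future is self-contained, so the tasks can be mixed freely
with ThreadPoolExecutor(max_workers=8) as executor:
    futures_a = [executor.submit(task_a, i) for i in range(3)]
    futures_b = [executor.submit(task_b, i) for i in range(3)]
    results_a = [f.result() for f in futures_a]
    results_b = [f.result() for f in futures_b]
```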

How often do we have to run a compute-heavy operation on a list of objects? or read a list of files from a storage space like S3?

What is common between the two problems stated above? Long wait times!

What is different? One is a CPU-bound process and another is an I/O (input-output) bound process.

Yeah, I know! That is heavy. Let’s explore this in pieces.

What is a CPU?

CPU is the Central Processing Unit. That's it!

Just kidding, let me explain a bit more. A CPU's work is measured in cycles, the time required to execute one simple processor operation (though real throughput also depends on many other factors, including the architecture). If you open the task manager on your system, you will see multiple processes, such as Google Chrome, Slack, and VSCode, all sharing the same CPU by taking turns: one program might run for 10 cycles, another for the next 5, a third for the next 8, and the pattern repeats. All of this waiting around might sound wasteful for the programs that are not currently running, but the switching happens far too fast to be visible to a human.

What is multi-processing?

Utilization of multiple processors to run a task. It is generally a perfect use case for CPU-bound operations. In our case, we would create processes and give them each an object from the list that we want to process. This would lead to parallel processing and a faster response, but the same amount of CPU time. Simply put, if we had 2 items on the list and it was taking 20 seconds earlier, thanks to multiprocessing it will take 10 seconds (approx) but the same amount of CPU cycles which gets divided among the two processes.

Caveat: Processes do not share memory, so it is harder to share a global variable between two processes, but there are solutions to this problem which are beyond the scope of this article.
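A minimal multiprocessing sketch using concurrent.futures (a toy square function stands in for the real compute-heavy, per-object work):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # stand-in for a compute-heavy operation on one object
    return n * n

def parallel_squares(numbers, workers=4):
    # each worker process pulls items from the list;
    # executor.map returns results in input order
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(square, numbers))

if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The `if __name__ == "__main__"` guard matters for multiprocessing: on platforms that spawn child processes by re-importing the main module, unguarded top-level code would run again in every worker.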

What is multi-threading?

Multi-threading in the context of Python is more like distributing work to different workers within the same process, but they cannot all run Python code at the same time. So, what is the advantage? For I/O-bound work, while the first worker waits to get its file from storage, the second worker can start its own work, handing control back once the first worker's wait is over.

Let’s test out some of these concepts!

We are going to use the Python Module — concurrent.futures

It has ThreadPoolExecutor and ProcessPoolExecutor classes, which share the same interface and are subclasses of the Executor class.

The code samples were run on my local system having the following configuration.


System config: the system has 6 cores that can handle two threads each, hence 12 logical processors. Simply put, each core (working unit) can run two threads, but not simultaneously; the threads are just scheduled to share the core's cycles.
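You can check the same number from Python itself; os.cpu_count() reports logical processors (12 on the machine described above):

```python
import os

# number of logical processors visible to the OS
print(os.cpu_count())
```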

Given below are two code samples running the same compute-intensive function, consisting of multiple transformations applied to an image. Yup, I couldn't think of any other use case, so I decided to do a bunch of stuff to an image for no apparent reason. Anyway, the code sample with normal execution runs in 456.551 sec, while with multiprocessing it completes in 236.995 sec. Here, ProcessPoolExecutor is used for multiprocessing with 61 parallel workers. There are a number of other parameters that can be used to customize the process; see the official concurrent.futures documentation.

Figure: the compute-intensive function run on a list of images with normal execution, and the same function run using multiprocessing.

Multithreading is implemented in a similar manner, but is needed for a different reason altogether: an I/O-bound process, as mentioned initially. Given below is an example of a multithreading implementation in which we are trying to read images from a folder on the local system. The total time taken for the task with multithreading is 0.3212 sec, while without multithreading it is 0.3732 sec, a noticeable difference even for a very small list. Here, we are also using partial functions, which are useful when we want to pass in a function with some of its arguments prefilled.
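A sketch of that pattern, assuming a list of file paths (the image folder from the article isn't available here, so plain text files stand in); functools.partial prefills the mode argument so the mapped function takes just one parameter:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def read_file(mode, path):
    # mode is prefilled via partial; path comes from the mapped list
    with open(path, mode) as f:
        return f.read()

def read_all(paths, workers=8):
    read_text = partial(read_file, "r")  # prefill the mode argument
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # threads overlap their waits on disk; results keep input order
        return list(executor.map(read_text, paths))
```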

Another useful feature of concurrent.futures is the Future objects created by Executor.submit(). Future objects are very similar to promises in JavaScript. One of the functions in the module is concurrent.futures.as_completed, which yields futures as they finish, so the responses may not arrive in the same order the inputs were submitted.
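A small sketch of as_completed: three tasks sleep for different times, and their results come back in completion order rather than submission order:

```python
import concurrent.futures
import time

def slow_echo(delay):
    # stands in for a task whose duration varies
    time.sleep(delay)
    return delay

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_echo, d) for d in (0.3, 0.1, 0.2)]
    # as_completed yields futures as they finish, not as they were submitted
    done_order = [f.result() for f in concurrent.futures.as_completed(futures)]

print(done_order)  # typically [0.1, 0.2, 0.3]
```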

This was just a very brief overview of what is happening under the hood and how easily we can make our code run faster. While there is a lot more to this topic, I hope this helps you get started; there are good in-depth articles covering the ins and outs of ThreadPoolExecutor if you want to go further.

Hi there, thanks a lot for taking the time to read the article. I am a Machine Learning Engineer currently exploring the MLOps world but also other industries. I use Medium to jot down my thoughts about topics that piqued my interest recently. You can connect with me on LinkedIn, and I am always up for a quick chat :)

Can Python run multiple threads?

To recap, threading in Python allows multiple threads to be created within a single process, but due to GIL, none of them will ever run at the exact same time. Threading is still a very good option when it comes to running multiple I/O bound tasks concurrently.

How do you write multiple threads in Python?

To use multithreading, we need to import the threading module in the Python program:

import threading

def print_hello(n):
    print("Hello, how old are you?", n)

t1 = threading.Thread(target=print_hello, args=(18,))
t1.start()
t1.join()

Can Python threads run on multiple cores?

Python is NOT a single-threaded language, but because of the GIL, only one thread in a process can execute Python bytecode at a time. Despite the GIL, libraries that perform computationally heavy tasks, like numpy, scipy, and pytorch, use C-based implementations under the hood that can release the GIL, allowing the use of multiple cores.

How do you run a for loop concurrently in Python?

Parallel for loop in Python:
- Use the multiprocessing module to parallelize the for loop.
- Use the joblib module to parallelize the for loop.
- Use the asyncio module to parallelize the for loop.
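As a sketch of the asyncio route, which suits IO-bound loops (asyncio.sleep stands in here for a real IO wait), asyncio.gather runs all the coroutines concurrently on a single thread:

```python
import asyncio

async def work(i):
    await asyncio.sleep(0.01)  # stands in for a real IO wait
    return i * 2

async def main():
    # schedule every iteration at once and collect results in input order
    return await asyncio.gather(*(work(i) for i in range(5)))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8]
```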
