This will be a rather brief overview and benchmark of two different ways you can parallelize HTTP requests in Python. The complete code snippet can be found at the end of this article.
Part 1 - concurrent.futures & requests
Most people familiar with Python have used the requests library in one way or another; it’s one of the simplest and most elegant solutions for making HTTP requests in Python. So, naturally, when we think of multithreading HTTP calls, wrapping requests in some form of parallel execution is the first thing that comes to mind.
Let’s write a base method that makes an HTTP GET call using requests:
def http_get_with_requests(url: str, headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)

    # Not every response body is JSON; requests raises ValueError if decoding fails
    response_json = None
    try:
        response_json = response.json()
    except ValueError:
        pass

    response_content = None
    try:
        response_content = response.content
    except Exception:
        pass

    return (response.status_code, response_json, response_content)
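For reference, a single call looks like this (using the same test URL as in the benchmark below):

status, body_json, body = http_get_with_requests("https://api.myip.com/")
print(status, body_json)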
We then add parallelization on top of it, using ThreadPoolExecutor from concurrent.futures:
def http_get_with_requests_parallel(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = []
    # The context manager makes sure the pool's threads are cleaned up when we're done
    with ThreadPoolExecutor(max_workers=100) as executor:
        for result in executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)):
            results.append(result)
    t2 = time.time()
    t = t2 - t1
    return results, t
You can try out other values for max_workers. On my PC, going below 100 drops the request speed, and going above 100 doesn’t really change anything; your PC’s hardware + internet speed combination may produce other results.
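If you want to find your own sweet spot, a minimal sketch of a worker-count sweep might look like this (the worker counts and the urls list are just example values, reusing the test URL from the benchmark in Part 3):

for workers in (25, 50, 100, 200):
    urls = ["https://api.myip.com/" for _ in range(1000)]
    t1 = time.time()
    # Same mapping as above, just with a varying pool size
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(http_get_with_requests, urls, repeat({}), repeat({}), repeat(10)))
    print(workers, 'workers:', round(len(urls) / (time.time() - t1), 2), 'r/s')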
P.S. Bonus points - try out how ProcessPoolExecutor works on your system. I didn’t notice any significant differences in speed on my PC.
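If you want to try that yourself, a minimal sketch of the swap could look like this (the function name and worker count are my own choices; note that ProcessPoolExecutor pickles its arguments, so the worker function must live at module top level):

def http_get_with_processes(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    # Processes carry pickling and startup overhead, which rarely pays off for I/O-bound work
    with ProcessPoolExecutor(max_workers=8) as executor:
        results = list(executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)))
    return results, time.time() - t1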
Part 2 - asyncio & aiohttp
An alternative, newer and more robust approach is to take a dive into Python’s asyncio and make HTTP calls with aiohttp.
Same as before, we’ll write a base HTTP GET call:
async def http_get_with_aiohttp(session: ClientSession, url: str, headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = await session.get(url=url, headers=headers, proxy=proxy, timeout=timeout)

    response_json = None
    try:
        # content_type=None skips aiohttp's content-type check before decoding
        response_json = await response.json(content_type=None)
    except json.decoder.JSONDecodeError:
        pass

    response_content = None
    try:
        response_content = await response.read()
    except Exception:
        pass

    return (response.status, response_json, response_content)
And a concurrent version - asyncio runs on a single thread, but keeps all of the requests in flight at once:
async def http_get_with_aiohttp_parallel(session: ClientSession, list_of_urls: List[str], headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    # gather schedules all the coroutines at once and waits for every one of them to finish
    results = await asyncio.gather(*[http_get_with_aiohttp(session, url, headers, proxy, timeout) for url in list_of_urls])
    t2 = time.time()
    t = t2 - t1
    return results, t
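One caveat worth knowing about asyncio.gather: by default, a single failed request raises and discards all the other results. If you’d rather get a per-URL mix of results and exceptions, gather accepts a return_exceptions flag; a sketch of that variant:

results = await asyncio.gather(
    *[http_get_with_aiohttp(session, url, headers, proxy, timeout) for url in list_of_urls],
    return_exceptions=True  # failed requests come back as exception objects in the list
)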
Note that we’ll need to pass an additional session object to these methods; this ClientSession will be initialized later in the main function.
Part 3 - Benchmarking
As you may have noticed, we always measure execution time in our parallel methods; that’s because now we’ll need to compare them. To make a fair, lengthy comparison, let’s take 1000 URLs, run this batch 10 times with each approach, collect the request speeds (the number of requests made divided by the time it took to execute them all) and compare their averages.
# URL list
urls = ["https://api.myip.com/" for i in range(0, 1000)]

# Benchmark aiohttp
session = ClientSession()
speeds_aiohttp = []
for i in range(0, 10):
    results, t = await http_get_with_aiohttp_parallel(session, urls)
    v = len(urls) / t
    print('AIOHTTP: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
    speeds_aiohttp.append(v)
await session.close()

# Benchmark requests
speeds_requests = []
for i in range(0, 10):
    results, t = http_get_with_requests_parallel(urls)
    v = len(urls) / t
    print('REQUESTS: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
    speeds_requests.append(v)

# Calculate averages
avg_speed_aiohttp = sum(speeds_aiohttp) / len(speeds_aiohttp)
avg_speed_requests = sum(speeds_requests) / len(speeds_requests)
print('--------------------')
print('AVG SPEED AIOHTTP: ' + str(round(avg_speed_aiohttp, 2)) + ' r/s')
print('AVG SPEED REQUESTS: ' + str(round(avg_speed_requests, 2)) + ' r/s')
For the aiohttp part we had to initialize the ClientSession before any of the requests were made, and we closed it manually after all requests were done. Yes, Python’s async with construction would work here; I just didn’t want to add another indentation level.
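For completeness, the context-manager variant would look something like this (note that it’s async with for a ClientSession, not plain with):

async with ClientSession() as session:
    for i in range(0, 10):
        results, t = await http_get_with_aiohttp_parallel(session, urls)
        # ... same speed bookkeeping as in the benchmark above
# The session is closed automatically when the block exits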
Part 4 - Complete code and results
Well, first of all, the benchmark results. Yes, aiohttp is faster, somewhere around 2.1 - 2.3 times faster than the ThreadPoolExecutor approach.
RUN №1
AVG SPEED AIOHTTP: 501.25 r/s
AVG SPEED REQUESTS: 215.94 r/s

RUN №2
AVG SPEED AIOHTTP: 500.95 r/s
AVG SPEED REQUESTS: 221.53 r/s

RUN №3
AVG SPEED AIOHTTP: 489.21 r/s
AVG SPEED REQUESTS: 226.95 r/s
Of course, there are too many variables to cover here - your PC’s multithreading abilities, your connection speed, the URLs that you call, server load & performance, and a lot of other factors can influence these request speeds. But as we determined, in general and under similar conditions, aiohttp will work better and faster.
If you decide to experiment and compare these approaches, please send me your findings; I’d be very interested to know how they behave in different environments.
Thanks for reading! Hope this article was helpful in deciding which approach to parallel HTTP requests you should take. And here’s the complete code fragment:
import asyncio
import json
import time
from typing import Dict, Any, List, Tuple
import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from itertools import repeat
from aiohttp import ClientSession


def http_get_with_requests(url: str, headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)

    response_json = None
    try:
        response_json = response.json()
    except ValueError:
        pass

    response_content = None
    try:
        response_content = response.content
    except Exception:
        pass

    return (response.status_code, response_json, response_content)


def http_get_with_requests_parallel(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = []
    with ThreadPoolExecutor(max_workers=100) as executor:
        for result in executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)):
            results.append(result)
    t2 = time.time()
    t = t2 - t1
    return results, t


async def http_get_with_aiohttp(session: ClientSession, url: str, headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = await session.get(url=url, headers=headers, proxy=proxy, timeout=timeout)

    response_json = None
    try:
        response_json = await response.json(content_type=None)
    except json.decoder.JSONDecodeError:
        pass

    response_content = None
    try:
        response_content = await response.read()
    except Exception:
        pass

    return (response.status, response_json, response_content)


async def http_get_with_aiohttp_parallel(session: ClientSession, list_of_urls: List[str], headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = await asyncio.gather(*[http_get_with_aiohttp(session, url, headers, proxy, timeout) for url in list_of_urls])
    t2 = time.time()
    t = t2 - t1
    return results, t


async def main():
    print('--------------------')

    # URL list
    urls = ["https://api.myip.com/" for i in range(0, 1000)]

    # Benchmark aiohttp
    session = ClientSession()
    speeds_aiohttp = []
    for i in range(0, 10):
        results, t = await http_get_with_aiohttp_parallel(session, urls)
        v = len(urls) / t
        print('AIOHTTP: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
        speeds_aiohttp.append(v)
    await session.close()

    print('--------------------')

    # Benchmark requests
    speeds_requests = []
    for i in range(0, 10):
        results, t = http_get_with_requests_parallel(urls)
        v = len(urls) / t
        print('REQUESTS: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
        speeds_requests.append(v)

    # Calculate averages
    avg_speed_aiohttp = sum(speeds_aiohttp) / len(speeds_aiohttp)
    avg_speed_requests = sum(speeds_requests) / len(speeds_requests)
    print('--------------------')
    print('AVG SPEED AIOHTTP: ' + str(round(avg_speed_aiohttp, 2)) + ' r/s')
    print('AVG SPEED REQUESTS: ' + str(round(avg_speed_requests, 2)) + ' r/s')


asyncio.run(main())
In case you’d like to check my other work or contact me: