Multithreaded HTTP requests in Python

Leo Ertuna
February 28th, 2022 · 2 min read

This will be a rather brief overview and benchmark of two different ways to parallelize HTTP requests in Python. The complete code snippet can be found at the end of this article.

Part 1 - concurrent.futures & requests

Most people familiar with Python have used the requests library at some point; it's one of the simplest and most elegant ways to make HTTP requests in Python. So, naturally, when we think of multithreading HTTP calls, wrapping requests in some form of parallel execution is the first thing that comes to mind.

Let’s write a base method that makes an HTTP GET call using requests:

def http_get_with_requests(url: str, headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)

    response_json = None
    try:
        response_json = response.json()
    except:
        pass

    response_content = None
    try:
        response_content = response.content
    except:
        pass

    return (response.status_code, response_json, response_content)

We then add parallelization on top of it, using ThreadPoolExecutor from concurrent.futures:

def http_get_with_requests_parallel(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = []
    executor = ThreadPoolExecutor(max_workers=100)
    for result in executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)):
        results.append(result)
    t2 = time.time()
    t = t2 - t1
    return results, t

You can experiment with other max_workers values. On my PC I found that going below 100 drops the request speed, and going above 100 doesn't really change anything. Your hardware and internet speed combination may produce different results.
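
Tuning this by hand gets tedious, so here's a minimal sketch of a max_workers sweep built on top of the functions above (the helper name sweep_max_workers and the worker counts are my own, not part of the original benchmark):

def sweep_max_workers(urls: List[str], worker_counts=(25, 50, 100, 200)):
    # Re-run the same batch of URLs with different pool sizes and print the resulting speed
    for workers in worker_counts:
        t1 = time.time()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(http_get_with_requests, urls))
        t = time.time() - t1
        print('max_workers=' + str(workers) + ': ' + str(round(len(urls) / t, 2)) + ' r/s')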

P.S. Bonus points - try out how ProcessPoolExecutor works on your system. I didn't notice any significant differences in speed on my PC.
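
If you want to try that swap, here's a hedged sketch of what it could look like (the function name http_get_with_requests_multiprocess and the worker count are placeholders of mine, not from the original code). Keep in mind that with processes, the mapped function and all its arguments must be picklable, and on some platforms the call should sit under an if __name__ == '__main__': guard:

def http_get_with_requests_multiprocess(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    # Same logic as the thread-based version, but backed by separate processes
    t1 = time.time()
    with ProcessPoolExecutor(max_workers=8) as executor:
        results = list(executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)))
    t = time.time() - t1
    return results, t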

Part 2 - asyncio & aiohttp

An alternative, newer and more robust approach is to take a dive into Python's asyncio and make the HTTP calls with aiohttp.

Same as before, we’ll write a base HTTP GET call:

async def http_get_with_aiohttp(session: ClientSession, url: str, headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = await session.get(url=url, headers=headers, proxy=proxy, timeout=timeout)

    response_json = None
    try:
        response_json = await response.json(content_type=None)
    except json.decoder.JSONDecodeError:
        pass

    response_content = None
    try:
        response_content = await response.read()
    except:
        pass

    return (response.status, response_json, response_content)

And a concurrent version that fires off the whole batch of requests at once:

async def http_get_with_aiohttp_parallel(session: ClientSession, list_of_urls: List[str], headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = await asyncio.gather(*[http_get_with_aiohttp(session, url, headers, proxy, timeout) for url in list_of_urls])
    t2 = time.time()
    t = t2 - t1
    return results, t

Note that we need to pass an additional session object to these methods; this ClientSession will be initialized later, in the main function.

Part 3 - Benchmarking

As you may have noticed, we always measure execution time in our parallelized methods; that's because now we need to compare them. To make a fair, lengthy comparison, let's take 1000 URLs, run this batch 10 times with each approach, collect the request speeds (number of requests made divided by the time it took to execute them all) and compare the averages.

# URL list
urls = ["https://api.myip.com/" for i in range(0, 1000)]

# Benchmark aiohttp
session = ClientSession()
speeds_aiohttp = []
for i in range(0, 10):
    results, t = await http_get_with_aiohttp_parallel(session, urls)
    v = len(urls) / t
    print('AIOHTTP: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
    speeds_aiohttp.append(v)
await session.close()

# Benchmark requests
speeds_requests = []
for i in range(0, 10):
    results, t = http_get_with_requests_parallel(urls)
    v = len(urls) / t
    print('REQUESTS: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
    speeds_requests.append(v)

# Calculate averages
avg_speed_aiohttp = sum(speeds_aiohttp) / len(speeds_aiohttp)
avg_speed_requests = sum(speeds_requests) / len(speeds_requests)
print('--------------------')
print('AVG SPEED AIOHTTP: ' + str(round(avg_speed_aiohttp, 2)) + ' r/s')
print('AVG SPEED REQUESTS: ' + str(round(avg_speed_requests, 2)) + ' r/s')

For the aiohttp part we had to initialize the ClientSession before any of the requests were made, and we closed it manually after all requests were done. Yes, Python's async with construction would work here too; I just didn't want to add another indentation level.
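
For reference, the context-manager version of that aiohttp loop could look roughly like this (same logic as above, just with the session lifetime handled by async with):

# Inside an async function, with urls and the helpers above already defined
async with ClientSession() as session:
    speeds_aiohttp = []
    for i in range(0, 10):
        results, t = await http_get_with_aiohttp_parallel(session, urls)
        speeds_aiohttp.append(len(urls) / t)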

Part 4 - Complete code and results

First of all, the benchmark results. Yes, aiohttp is faster, somewhere around 2.1 to 2.3 times faster than the ThreadPoolExecutor approach.

RUN №1
AVG SPEED AIOHTTP: 501.25 r/s
AVG SPEED REQUESTS: 215.94 r/s

RUN №2
AVG SPEED AIOHTTP: 500.95 r/s
AVG SPEED REQUESTS: 221.53 r/s

RUN №3
AVG SPEED AIOHTTP: 489.21 r/s
AVG SPEED REQUESTS: 226.95 r/s

Of course there are too many variables to cover here: your PC's multithreading capabilities, your connection speed, the URLs you call, server load and performance, and many more factors can influence these request speeds. But as we determined, in general, under similar conditions aiohttp will work better and faster.

If you decide to experiment and compare these approaches, please send me your findings. I'd be very interested to know how they behave in different environments.

Thanks for reading! Hope this article was helpful in deciding which approach to multithreaded HTTP requests you should take. And here’s the complete code fragment:

import asyncio
import json
import time
from typing import Dict, Any, List, Tuple
import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from itertools import repeat
from aiohttp import ClientSession


def http_get_with_requests(url: str, headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)

    response_json = None
    try:
        response_json = response.json()
    except:
        pass

    response_content = None
    try:
        response_content = response.content
    except:
        pass

    return (response.status_code, response_json, response_content)


def http_get_with_requests_parallel(list_of_urls: List[str], headers: Dict = {}, proxies: Dict = {}, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = []
    executor = ThreadPoolExecutor(max_workers=100)
    for result in executor.map(http_get_with_requests, list_of_urls, repeat(headers), repeat(proxies), repeat(timeout)):
        results.append(result)
    t2 = time.time()
    t = t2 - t1
    return results, t


async def http_get_with_aiohttp(session: ClientSession, url: str, headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[int, Dict[str, Any], bytes]:
    response = await session.get(url=url, headers=headers, proxy=proxy, timeout=timeout)

    response_json = None
    try:
        response_json = await response.json(content_type=None)
    except json.decoder.JSONDecodeError:
        pass

    response_content = None
    try:
        response_content = await response.read()
    except:
        pass

    return (response.status, response_json, response_content)


async def http_get_with_aiohttp_parallel(session: ClientSession, list_of_urls: List[str], headers: Dict = {}, proxy: str = None, timeout: int = 10) -> Tuple[List[Tuple[int, Dict[str, Any], bytes]], float]:
    t1 = time.time()
    results = await asyncio.gather(*[http_get_with_aiohttp(session, url, headers, proxy, timeout) for url in list_of_urls])
    t2 = time.time()
    t = t2 - t1
    return results, t


async def main():
    print('--------------------')

    # URL list
    urls = ["https://api.myip.com/" for i in range(0, 1000)]

    # Benchmark aiohttp
    session = ClientSession()
    speeds_aiohttp = []
    for i in range(0, 10):
        results, t = await http_get_with_aiohttp_parallel(session, urls)
        v = len(urls) / t
        print('AIOHTTP: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
        speeds_aiohttp.append(v)
    await session.close()

    print('--------------------')

    # Benchmark requests
    speeds_requests = []
    for i in range(0, 10):
        results, t = http_get_with_requests_parallel(urls)
        v = len(urls) / t
        print('REQUESTS: Took ' + str(round(t, 2)) + ' s, with speed of ' + str(round(v, 2)) + ' r/s')
        speeds_requests.append(v)

    # Calculate averages
    avg_speed_aiohttp = sum(speeds_aiohttp) / len(speeds_aiohttp)
    avg_speed_requests = sum(speeds_requests) / len(speeds_requests)
    print('--------------------')
    print('AVG SPEED AIOHTTP: ' + str(round(avg_speed_aiohttp, 2)) + ' r/s')
    print('AVG SPEED REQUESTS: ' + str(round(avg_speed_requests, 2)) + ' r/s')


asyncio.run(main())

In case you’d like to check my other work or contact me:

https://tekleo.net/
https://github.com/jpleorx
https://medium.com/@leo.ertuna
https://www.linkedin.com/in/leo-ertuna-14b539187/
leo.ertuna@gmail.com