Hands-on Tutorials
Automatic API calls, pandas manipulations, and traveling salesman problem
Many business cases require knowing the distance between some places. It’s a key in the logistics and energy industry, but it is also important when you plan your holidays.
In this tutorial we will see how to get several types of distances:
- the distance on the surface of Earth
- the distance when traveling by car
- or using paid API for commercial purposes
- the optimal route between several points
- brute force finding the optimum
- Plotting the places on plotly geochart
You can see the complete code in this GitHub repository.
Dataset
For this tutorial, I’ll use a small dataset from kaggle containing names and locations of all the capital cities in the world.
The distance on the surface of an object
Earth is not flat like our ancestors though. It’s not even a giant sphere levitating in the Universe. It’s considered being an ellipsoid. You can define the distance between two places on the surface of such a shape mathematically. We don’t have to do the calculation ourselves though, we can get help from the popular geopy library which offers many functions to calculate the distance on a great_circle or geodesics distance on the surface of an oblate ellipsoid.
Using geopy.distance.distance[[lat_1, lon_1], [lat_2, lon_2]]
returns the distance on the surface of a space object like Earth. You can choose whether you want the distance in kilometers
, miles
, nautical miles
or feet
.
Driving Distance between places
Unfortunately, such a distance is merely academic. Even the airplanes circle around the airfields, ascend, and land thus traveling much further. We usually need a driving distance, how far the car, bike or a walking person needs to travel.
Getting this information is also easy. You can use APIs of free services like OpenStreetMap or paid API like google maps. By checking the distance between Paris and Rome you will realize the driver must journey over 1400 km to reach the city which was supposed to be 1100 km away. The car must cross the Alps and avoid the sea.
Often it’s not crucial how far the car must go, but how long does it take. The OSRM API provides that information as well.
[In]: route_1["duration"]
[Out]: 54279.6 # OSRM return the duration in second[In]: str[datetime.timedelta[seconds=route_1["duration"]]]
[Out]: '15:04:39.600000'
In order to get the route_1
variable you need to call the API using the requests
library:
import requests
import json# call the OSMR API
r = requests.get[f"//router.project-osrm.org/route/v1/car/{lon_1},{lat};{lon_2},{lat_2}?overview=false"""]# then you load the response using the json libray
# by default you get only one alternative so you access 0-th element of the `routes`routes = json.loads[r.content]
route_1 = routes.get["routes"][0]
Besides the latitude and longitude, the API call can contain other parameters described in the documentation.
profile
- car, bike, footcoordinates
- lat,lon of the first point; lon, lat of the second point, e.g. 2.333333,48.866667;12.483333,41.900000alternative
- whether to return only the first option or more alternativessteps
- whether to return route steps - e.g. at the crossroad turn leftgeometries
- how is the route returned, eitherpolyline
[default],polyline6
,geojson
overview
- how to return the route -simplified
[default],full
,false
annotations
- if additional metadata are provided for each point on the route
Driving distance using Google directions API
OSRM router is not meant for excessive use [see the API policy] so in case you plan a commercial application, consider services like google map’s direction API. In that case, you start by calling a different API endpoint. You also need the API key to access the Google API and you are billed accordingly. Google offers $200 monthly which cover 40000 basic route requests [as of Dec-2020]
# coordinations format
origin_coor = 8.866667,2.333333
destination_coor = 41.9, 12.48# google API url
url = f"//maps.googleapis.com/maps/api/directions/json?origin={origin_coor}&destination={destination_coor}&mode=driving&key={API_KEY}"
Also, the structure of the answer is a bit different.
[In]:
results = json.loads[r.content]
legs = results.get["routes"].pop[0].get["legs"]
legs[0].get["duration"], legs[0].get["distance"][Out]:
[{'text': '13 hours 44 mins', 'value': 49459},
{'text': '1,420 km', 'value': 1420430}]
The optimal route between many places
Finding the optimal route between several places is quite a complex task. First, you need to gather the distances/duration between all the cities. That’s [numberof_places choose 2
] combinations of pairs of places.
Let’s look at the central European region containing 9 states. There are 36 unique combinations of distances between the 9 capitals of these countries.
[In]:
combination = itertools.combinations[list[ce_cities["capital"]],2]
len[[c for c in combinations]][Out]:
36# since python 3.8 you can simply count the combinations using
[In]: math.comb[9,2]
[Out]: 36
In the complete notebook on the github you can see all the 36 combinations and their distances processed into a pandas data frame. Actually, all 72 combinations, because each route is there twice, for example, Prague-Vienna and Vienna-Prague. [disregarding, minor differences which can occur taking the other direction]
We’re having 9 places. We can start in any of the nice. From your starting point, you can head to any of the 8 remaining places. Then to any of the 7 from the rest. The mean that between 9 cities we have 181440 possible routes in total:
[In]:
number_of_places = 9
math.factorial[number_of_places]/2[Out]: 181440
If we would have one place more, it would be 1814400. With 11 addresses we reach almost 20 million combinations and the number skyrocket really fast. To find an optimal route from all those combinations is the task for the Travelling salesman problem.
Travelling salesman problem [TPS]
In this example, we will consider the collected duration of the car routes between our 9 central European cities. I haven’t found the ideal python implementation for this problem. Genevieve Hayes describes the use of her mlrose library in her article. Jaboc Moore suggests another approach also using genetic algorithms. There are many more TPS solvers available online.
In the gist below you can see the mlrose algorithm and because we have relatively few options [~200K] we can confirm that the results are accurate using brute force, by simply calculating the total duration of all of these twenty thousand paths.
Brute force approach
On the picture in the previous chapter, you could see that each city has an ID. 1 →Vienna, 2 →Prague etc. The dist_list
variable contains the travel time between each combination, e.g. [0,1,13949]
. A journey from Vienna to Prague takes 13949 seconds or 3:44:54.
Because we have 9 cities, we
can create all possible roads between them using itertools
.
[In]:
import itertools
perm = [[l for l in itertools.permutations[range[9], 9]]]
perm[:5] # get first 5 items[Out]:
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[0, 1, 2, 3, 4, 5, 6, 8, 7],
[0, 1, 2, 3, 4, 5, 7, 6, 8],
[0, 1, 2, 3, 4, 5, 7, 8, 6],
[0, 1, 2, 3, 4, 5, 8, 6, 7]]
To easily access the distances we turn the list of tuples [[0,1,13949],[2,1,13651],...]
into a dictionary {[0,1]:13494, [2,1]: 13651, ...}
Then we iterate over the list of all path permutations. We add the starting point to the end, to get back home.
# pick the first path
path = perm[0]# add th first element to conclude the circular path
path = path + [path[0],][In]: print[path]
[Out]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
And iterate through all the sub_paths: 0–1
, 1–2
, 2–3
…
# iterate through the path and sum all the distances
total_path_distance = 0
for i in range[len[path]-1]:
edge = [path[i], path[i+1]]
total_path_distance += distances[edge] [In]: print[total_path_distance]
[Out]: 24945
If we do it for all the possible combinations we find out, that there are 9 shortest cyclical routes that start and end at the same place. These paths are just shifted, depending on where you start. One of them is also the path found by the TPS algorithm.
[0, 5, 2, 1, 8, 4, 7, 3, 6, 0]
[5, 2, 1, 8, 4, 7, 3, 6, 0, 5]
[2, 1, 8, 4, 7, 3, 6, 0, 5, 2]
Usually, you cannot choose, because your truck or the postman must return home at the end of the shift. If you consider your vacation, you also come back home, eventually. But, if it would be possible, start in Bern and end in Prague [or v.v.] to avoid the longest route on the path.
In the github notebook you can check how to order or reverse the list of the cities in pandas. For example, if you need to start in Ljubljana.
You can also plot the chart using Plotly library.
Conclusion
We have review how to get the distance between two points or the list of places. We can decide if we need the shortest distance, the driving distance, or time. Knowing this information we can find the shortest route between the locations by solving the Travelling Salesman problem or if we don’t have many places by force.
See the complete code on github - Driving Distance between two places.ipynb
Did you like this article? Maybe you will also like:* How to turn a list of addreses into a map with geocoding
* Plotly Express comphrehensive tutorial
* How to unzip and process bunch of XML in a folder
* Unusual but useful upsampling with pandas