By Jelle Pelgrims
Introduction
In this article I have documented my high school project, which I completed in 2013. The project tried to find an approximate solution for the travelling salesman problem, which concerns a salesman trying to find the shortest route between a number of cities that he can only visit once. My solution used a genetic algorithm; a heuristic inspired by evolution.
The code for this article can be found here.
1. The travelling salesman problem
The travelling salesman problem (or TSP) can be defined as follows: given a number of cities, the distances between them, and an origin city, find the shortest route that visits each city only once and returns to the origin city. It is a subject of major importance within the logistics sector, seeing as shorter routes and faster delivery times are crucial in the process of maximizing profits and minimizing costs.
The problem seems deceptively easy at first; finding the shortest route between four or five of cities is easily done, after all. The actual difficulty of the problem lies in the exponential increase of possible routes as the number of cities increases. With five cities there are 120 possible routes, with 10 cities there are about three and a half million routes. Once more than fifteen cities are taken into consideration there are already more than a trillion possible routes. Brute-forcing a TSP quickly becomes impractical. Finding a perfect solution might even become impossible with a large enough number of cities; "good enough" will have to do. This is where the genetic algorithm comes into play.
2. Applying genetic algorithms
A genetic algorithm (or GA) is a process that looks for solutions to an unwieldy problem by mimicking evolution. The algorithm generates a large population of random solutions to start with. The solutions are then judged using a function that calculates the quality of the solutions (also known as the "fitness" function). The best solutions are selected and used to "breed" and generate a new population of solutions, only to repeat the whole process. The algorithm will halt and return the best solution once a specific threshold of solution fitness is reached.
2.1. Encoding solutions
For the algorithm to be able to generate solutions to a specific problem there needs to be a solution encoding; a manner of describing solutions that the algorithm "understands". For the travelling salesman problem this turns out to be rather easy. If we assign an ID (a number, or any other unique sort of identifier) to each city, we can then describe a route by simply making a list of cities, sorted in the order in which they will be visited. Additionally, the first and last city in this list will have to be the city of departure (so the route ends where it started, as required by the problem statement). With this solution encoding it is now possible to create random routes by randomly sorting the list of city ID's and adding the departure city ID to the front and end of the list.
2.2. Evaluating solutions
Now that we have a way to describe solutions to the TSP and a way of generating random routes to start off with, we still need a way to evaluate the solutions or routes. This is usually done with a fitness function, which takes a solution and calculates a fitness score, indicating the success of the solution. Once again this turns out to be rather easy for the travelling salesman problem. We want the shortest route, so we can calculate the fitness score by simply adding up the distances between the cities that were travelled through. This way the fitness score is synonymous with the length of a given route, and a lower score/length is better.
2.3. Selecting solutions
In order to generate a new population we will have to select enough pairs of "parent solutions" which will then be used to create offspring. There are lots of ways to do this; in this case I have used the "tournament selection" method. This method will choose a specified amount of random solutions from the population, and select the solution with the highest fitness score to serve as a parent.
There are other method to do this, but the most important thing to take away here is that they all try to select the best solutions, while still introducing some randomness. The added randomness keeps the populations from becoming a monoculture. If only the best solutions were selected, the algorithm would quickly converge on the same solution, possibly missing other and better solutions.
2.4. Generating new solutions
Once enough parent solutions are selected they will somehow have to be combined to form a new solution. This is usually done using a "crossover" method. A simple crossover example applied to our TSP algorithm could look like this: split the parent routes at the same random point, and concatenate the first part of the first parent and the second part of the other parent to form a new route.
To actually use this crossover method we will have to make things a little bit more complicated. A big issue here is the fact that solution routes can only contain each city once. If we simply combine random routes without any precautions some of the new routes will contain duplicate cities. These routes are invalid solutions for the TSP and need to be avoided somehow.
I chose to use an adapted single-point crossover method. Given two routes, the method will divide the first route at a random point. The first resulting part is used as the "front" of the new route. The method then loops over the cities in the second route and appends every city to the new route that it doesn't already contain.
It is also possible to introduce more variation in the population through the use of mutation. This can simply be done by swappping two random cities in a route.