Part 3: Trustbit Logistics Hackathon - Add a speed model to the logistics simulation
So far in this series, we have built a trivial logistics simulation runtime. At this point, it can only find the fastest route between two locations, using a form of the A* algorithm with predefined travel times.
Let’s extend the implementation and demonstrate how we can “plug” different models into the simulation runtime.
In this article, we’ll mine historical data to build a naive speed model. The model predicts the average speed on a road segment for a given time of day. The logic will live in the train function.
The trained model will then be passed to a modified route function that is very similar to the logic from the previous article.
When everything is wired together, we should be able to do something like this:
Source code for this article is in the Trustbit/logisim repository, within src/logisim3.
Mine the data
We’ll use synthetic data from Logistic Kata 2.3, which is stored in history.csv and looks like this:
The data shows a historical event log of various transports driving between the locations.
Each line tracks one trip between connected locations (a segment), including the arrival time and the average speed.
We know that there are traffic jams during the day, so we want to build a model that predicts the travel time between two adjacent locations, given the time of day at departure.
For that, we need to:
1. Compute the departure time for each record in history.csv.
2. Group all records by each (Origin -> Destination) pair.
3. For each group, train a model that predicts the speed, using the departure time as a feature.
This time, instead of writing basic logic from scratch, we are going to use popular Python libraries:
- numpy: mathematical functions
- pandas: data analysis library
- fire: a Google library that generates a command-line interface (CLI) from any function
Our imports will follow standard conventions and look like this:
Data is bundled with the source code, so we can load it into data frames:
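Since the original loading snippet is not reproduced here, the following is a minimal sketch of that step. It reads a few made-up rows from in-memory buffers instead of the real files, and the column names (A, B, Km for the map; Origin, Destination, Time, Speed for the history) are assumptions chosen to match the names used later in the text.

```python
import io

import pandas as pd

# Hypothetical toy rows standing in for the bundled CSV files.
# Column names are assumptions, not necessarily the repository's.
MAP_CSV = """A,B,Km
X,Y,60
Y,Z,30
"""

HISTORY_CSV = """Origin,Destination,Time,Speed
Y,X,2022-01-01 10:30,40.0
Y,Z,2022-01-01 12:00,60.0
"""

# With the real data you would pass file paths instead of StringIO buffers.
map_df = pd.read_csv(io.StringIO(MAP_CSV))
# parse_dates turns the arrival column into datetime64 values.
history_df = pd.read_csv(io.StringIO(HISTORY_CSV), parse_dates=["Time"])
```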
For each record in history_df, we can compute the departure time by:
1. Looking up the distance between the two locations in map_df.
2. Computing the departure datetime as Departure = Time - Km / Speed (Km / Speed is the travel time in hours).
3. Converting the datetime (which includes the date) to a time-of-day variable via departure.hour + departure.minute / 60.0.
Let’s do that.
We start by defining two helper functions. One maps a pair of locations to a segment name; the other computes the departure time of day:
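A minimal sketch of the two helpers, assuming arrival times are datetime values (pandas Timestamp values also work, since Timestamp subclasses datetime). The names segment and depart_time_of_day, and the choice to sort the two locations so that both travel directions share one segment name, are assumptions rather than the repository's code.

```python
from datetime import datetime, timedelta


def segment(origin: str, destination: str) -> str:
    """Name a road segment; sorting makes the name direction-insensitive (assumption)."""
    a, b = sorted((origin, destination))
    return f"{a}-{b}"


def depart_time_of_day(arrival: datetime, km: float, speed: float) -> float:
    """Departure = Time - Km / Speed, expressed as hours since midnight."""
    departure = arrival - timedelta(hours=km / speed)
    return departure.hour + departure.minute / 60.0
```

For example, a truck that arrived at 10:30 after driving 90 km at 60 km/h spent 1.5 hours on the road, so it departed at 9:00 and depart_time_of_day returns 9.0. Because the subtraction happens on full datetimes, trips that started before midnight land on the previous day's hour.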
Next, we join both data frames on the computed Segment column and then fill in our Depart value:
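A sketch of the join with pandas on made-up rows. Segment is derived for both frames with a direction-insensitive helper (an assumption), merge attaches the segment length Km to each history record, and the departure formula is applied column-wise through the .dt accessor instead of row by row.

```python
import pandas as pd


def segment(a: str, b: str) -> str:
    # Hypothetical helper: same name for both travel directions.
    x, y = sorted((a, b))
    return f"{x}-{y}"


# Hypothetical toy frames; column names are assumptions.
map_df = pd.DataFrame({"A": ["X", "Y"], "B": ["Y", "Z"], "Km": [60.0, 30.0]})
history_df = pd.DataFrame({
    "Origin": ["Y", "Y"],
    "Destination": ["X", "Z"],
    "Time": pd.to_datetime(["2022-01-01 10:30", "2022-01-01 12:00"]),
    "Speed": [40.0, 60.0],
})

map_df["Segment"] = [segment(a, b) for a, b in zip(map_df["A"], map_df["B"])]
history_df["Segment"] = [
    segment(o, d) for o, d in zip(history_df["Origin"], history_df["Destination"])
]

# Inner join: records on segments missing from the map are dropped.
df = history_df.merge(map_df[["Segment", "Km"]], on="Segment", how="inner")

# Departure = Time - Km / Speed, then convert to hours since midnight.
departure = df["Time"] - pd.to_timedelta(df["Km"] / df["Speed"], unit="h")
df["Depart"] = departure.dt.hour + departure.dt.minute / 60.0
```

Here the first record (60 km at 40 km/h, arriving at 10:30) gets a Depart value of 9.0, and the second (30 km at 60 km/h, arriving at 12:00) gets 11.5.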
Training
It is always a good idea to split the dataset into two partitions: train and test. We train the model on the former and measure its quality on the latter.
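One way to do the split with pandas, sketched under the assumption that the frame has a unique index: sample draws the test rows at random, and drop removes them from the training set. The function name split and the fixed seed are illustrative choices.

```python
import pandas as pd


def split(df: pd.DataFrame, test_ratio: float = 0.2, seed: int = 42):
    """Randomly hold out `test_ratio` of the rows; assumes a unique index."""
    test = df.sample(frac=test_ratio, random_state=seed)
    train = df.drop(test.index)
    return train, test
```

Fixing random_state keeps the MSE comparable between runs, since every run evaluates on the same held-out rows.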
Next, we group all records by (Origin, Destination). For each edge, we use numpy to fit a polynomial of degree 3 to the recorded points, where x is the departure time and y is the speed. This is a naive way to capture a model.
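A sketch of this training step, assuming the joined frame has Origin, Destination, Depart and Speed columns. np.polyfit returns the polynomial coefficients and np.poly1d wraps them into a callable, so each dictionary value behaves like a function of the departure time. The name train matches the article, but this body is our assumption, not the repository's code.

```python
import numpy as np
import pandas as pd


def train(df: pd.DataFrame, degree: int = 3) -> dict:
    """Fit one polynomial speed(depart) per (Origin, Destination) pair."""
    models = {}
    for (origin, destination), group in df.groupby(["Origin", "Destination"]):
        # Groups with fewer than degree + 1 points give an ill-conditioned
        # fit; numpy emits a RankWarning in that case.
        coeffs = np.polyfit(group["Depart"], group["Speed"], deg=degree)
        models[(origin, destination)] = np.poly1d(coeffs)
    return models
```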
At this point, our model is a dictionary whose keys are (Origin, Destination) tuples and whose values are functions. This can be hard to keep track of, especially in a dynamically typed language like Python, where nothing in the code states what the keys and values are.
Let’s wrap that data structure with a class that makes prediction more explicit:
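A minimal sketch of such a wrapper. The method name predict and its argument order are assumptions, not necessarily the repository's API. Any callable works as a dictionary value, including the np.poly1d objects produced by a polynomial fit.

```python
from typing import Callable, Dict, Tuple

Segment = Tuple[str, str]


class SpeedModel:
    """Wraps {(origin, destination): f(depart) -> km/h} behind an explicit API."""

    def __init__(self, fits: Dict[Segment, Callable[[float], float]]):
        self._fits = fits

    def predict(self, origin: str, destination: str, depart: float) -> float:
        """Predict the average speed (km/h) when departing at `depart` hours of the day."""
        # Raises KeyError for a segment that had no training data.
        return float(self._fits[(origin, destination)](depart))
```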
Now we can use this model to compute the Mean Squared Error (MSE) on the test dataset:
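A sketch of the MSE computation, assuming a model object with the hypothetical predict(origin, destination, depart) method and a test frame with Origin, Destination, Depart and Speed columns. MSE is the mean of the squared differences between predicted and recorded speeds.

```python
import numpy as np
import pandas as pd


def mse(model, test_df: pd.DataFrame) -> float:
    """Mean squared error of predicted vs. recorded speeds (km/h squared)."""
    predicted = np.array([
        model.predict(origin, destination, depart)
        for origin, destination, depart in zip(
            test_df["Origin"], test_df["Destination"], test_df["Depart"]
        )
    ])
    errors = predicted - test_df["Speed"].to_numpy()
    return float(np.mean(errors ** 2))
```

For instance, if the model predicts 50 km/h for two records whose recorded speeds are 48 and 53 km/h, the errors are 2 and -3, and the MSE is (4 + 9) / 2 = 6.5.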
With a test ratio of 0.2, the current implementation prints:
This is not bad, although still worse than the lowest MSE of 44.29 achieved by Daniel Weller in Transport Tycoon Kata 2.3.
Plug Model into Routing
This SpeedModel hides away the complexity of the speed calculation, so it is fairly simple to plug into the simulation code from the previous article.
We strip the comments here; they are still available in the linked source code.
We start by loading a map. This time we ignore speed and keep track of the distance between the adjacent locations:
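A stdlib sketch of such a loader, assuming a CSV with A, B and Km columns (extra columns, such as a speed, are simply ignored). Each road is stored in both directions, which assumes roads are two-way. The name load_map is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_map(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Build an adjacency list: location -> [(neighbour, km), ...]."""
    roads: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for row in csv.DictReader(lines):
        km = float(row["Km"])
        # Two-way roads (assumption): register the segment in both directions.
        roads[row["A"]].append((row["B"], km))
        roads[row["B"]].append((row["A"], km))
    return dict(roads)
```

csv.DictReader accepts an open file as well as any iterable of strings, so load_map(open("map.csv")) would also work (the file name is hypothetical).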
The start of the simulation loop stays exactly the same as before, so we skip it here.
We change the part of the simulation loop that determines the time interval between the current clock in location and the truck's arrival at destination:
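The sketch below shows where that change lands, as a hypothetical earliest-arrival search in the spirit of the previous article's routing. It is Dijkstra's algorithm (A* with a zero heuristic), not the repository's exact code, and it takes the model as a predict(origin, destination, hour_of_day) callable. The leg duration is km / speed, with the speed predicted for the hour at which the truck leaves location; clock is assumed to be in hours, so clock % 24 gives the time of day. Searching by earliest arrival like this is exact only if departing later never leads to arriving earlier (the FIFO property), which a fitted polynomial does not guarantee.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Roads = Dict[str, List[Tuple[str, float]]]
# (origin, destination, hour of day) -> predicted speed in km/h
Predict = Callable[[str, str, float], float]


def route(roads: Roads, predict: Predict, start: str, end: str,
          clock: float = 0.0) -> Optional[Tuple[float, List[str]]]:
    """Return (arrival clock, path) for the earliest arrival, or None."""
    queue = [(clock, start, [start])]
    best: Dict[str, float] = {start: clock}
    while queue:
        clock, location, path = heapq.heappop(queue)
        if location == end:
            return clock, path
        if clock > best.get(location, float("inf")):
            continue  # stale queue entry, a faster arrival was found later
        for destination, km in roads.get(location, []):
            # The changed part: travel time = distance / predicted speed.
            # Floor of 1 km/h is an assumption guarding against a poor fit
            # predicting zero or negative speed.
            speed = max(predict(location, destination, clock % 24), 1.0)
            arrival = clock + km / speed
            if arrival < best.get(destination, float("inf")):
                best[destination] = arrival
                heapq.heappush(queue, (arrival, destination, path + [destination]))
    return None
```

With a SpeedModel-style object, the predict argument could simply be the bound method model.predict.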
With that out of the way, we can now wire model training and routing into a single function:
Thanks to the fire library, we can execute train_and_route like this:
Summary
In this article, we have explored how to mine historical data for insights, capture them in a model and plug that model into the deterministic simulation.
We covered the speed model, but the approach could be applied in a similar fashion to any other simulation parameter: location profiles, incident probabilities, fuel consumption or intermodal transfer times.
Source code for this article is in the Trustbit/logisim repository, within src/logisim3.