Trustbit

Part 3: Trustbit Logistics Hackathon - Add speed model to logistic simulation

So far in the series we have built a trivial logistics simulation runtime. At this point it is only capable of finding the fastest route between two locations. This is implemented as a form of the A* algorithm that uses predefined travel times.

Let’s extend the implementation and demonstrate how we can “plug” different models into the simulation runtime.

Within this article we’ll mine historical data to build a naive speed model. This model will predict the average speed for a road segment, given the time of day. The logic will live in a train function.

The trained model will then be passed to a modified route function that is very similar to the logic from the previous article.

When wired together, we should be able to do something like this:

See this content in the original post

The source code for this article is in the Trustbit/logisim repository, under src/logisim3.

Mine the data

We’ll use synthetic data from Logistic Kata 2.3, which is stored in history.csv.

The data shows a historical event log of various transports driving between the locations.

Each line tracks one trip between two connected locations (a segment), including the arrival time and the average speed.

We know that there are traffic jams during the day, so we want to build a model that predicts travel time between two adjacent locations given the departure time of the day.

For that, we need to:

  • Compute departure time for each record in history.csv

  • Group all records for each pair of (Origin -> Destination)

  • For each group, train a model that predicts the speed, using departure time as a feature.

This time, instead of basic logic from scratch, we are going to use popular Python libraries:

  • numpy - mathematical functions

  • pandas - data analysis library

  • fire - a Google library that generates a command-line interface from any function

Our imports will follow standard conventions and look like this:

See this content in the original post

Data is bundled with the source code, so we can load it into data frames:

See this content in the original post

For each record in history_df we can compute departure time by:

  • Looking up distance between two locations in map_df

  • Computing the departure datetime as Departure = Time - (Km / Speed)

  • Converting the datetime (with date) to a time-of-day variable via departure.hour + departure.minute / 60.0

Let’s do that.

We start by defining two helper functions. One maps a pair of locations to a segment name, the other computes the departure time of day:

See this content in the original post
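A minimal sketch of two such helpers, using only the standard library. The function names, the "A->B" segment naming format, and the assumption that distance is in kilometres and speed in km/h are illustrative choices, not taken from the repository:

```python
from datetime import datetime, timedelta


def segment_name(origin: str, destination: str) -> str:
    """Map a pair of locations to a segment name (hypothetical format)."""
    return f"{origin}->{destination}"


def departure_time_of_day(arrival: datetime, km: float, speed_kmh: float) -> float:
    """Return the departure time as a fractional hour of the day."""
    # Travel time in hours is distance divided by average speed (assumed km/h).
    travel = timedelta(hours=km / speed_kmh)
    departure = arrival - travel
    # Drop the date and keep only the time of day as a single number.
    return departure.hour + departure.minute / 60.0


# A truck that arrived at 10:30 after driving 90 km at 60 km/h left at 09:00.
example = departure_time_of_day(datetime(2021, 1, 4, 10, 30), km=90.0, speed_kmh=60.0)
print(segment_name("A", "B"), example)  # A->B 9.0
```

Subtracting a timedelta from the full datetime first, and only then dropping the date, keeps trips that cross midnight correct.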

Next, we join both data frames by the computed Segment column and then fill in our Depart value:

See this content in the original post
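As a rough illustration of the join, here is a sketch on two tiny hand-made data frames. The map column names From and To, the location names, and all values are made up for the example. It also assumes that the map lists every direction of a road as its own row and that speed is in km/h:

```python
import pandas as pd

# Hypothetical miniature versions of the two data frames.
map_df = pd.DataFrame({"From": ["A", "B"], "To": ["B", "C"], "Km": [100.0, 50.0]})
history_df = pd.DataFrame({
    "Origin": ["A", "B"],
    "Destination": ["B", "C"],
    "Time": pd.to_datetime(["2021-01-04 10:00", "2021-01-04 12:30"]),  # arrival
    "Speed": [50.0, 100.0],
})

# Compute the same Segment key on both sides.
map_df["Segment"] = map_df["From"] + "->" + map_df["To"]
history_df["Segment"] = history_df["Origin"] + "->" + history_df["Destination"]

# A left join keeps every history record and attaches the segment distance.
joined = history_df.merge(map_df[["Segment", "Km"]], on="Segment", how="left")

# Departure = Time - (Km / Speed), then reduce to a fractional hour of the day.
departure = joined["Time"] - pd.to_timedelta(joined["Km"] / joined["Speed"], unit="h")
joined["Depart"] = departure.dt.hour + departure.dt.minute / 60.0

print(joined[["Segment", "Km", "Speed", "Depart"]])
```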

Training

It is always a good idea to split the dataset into two partitions: train and test. We train the model on the former and measure its quality on the latter.

See this content in the original post
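One possible way to do such a split with pandas is sketched below. The function name, the fixed seed, and the sample data are assumptions made for the example:

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, test_ratio: float = 0.2, seed: int = 42):
    """Randomly split a data frame into (train, test) partitions."""
    # Draw the test rows at random; a fixed seed keeps the split reproducible.
    test = df.sample(frac=test_ratio, random_state=seed)
    # Everything that was not drawn becomes the training set.
    # This relies on the index labels being unique.
    train = df.drop(test.index)
    return train, test


# Hypothetical records: ten departures with made-up speeds.
df = pd.DataFrame({
    "Depart": [float(h) for h in range(10)],
    "Speed": [50.0 + h for h in range(10)],
})
train_df, test_df = split_train_test(df)
print(len(train_df), len(test_df))  # 8 2
```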

Next, we group all records by Origin and Destination. For each edge, we use numpy to fit a polynomial of degree 3 to the recorded points, where x is the departure time of day and y is the speed. This is a naive way to capture a model.

See this content in the original post
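The fitting step can be sketched with numpy.polyfit and numpy.poly1d. The records below are synthetic and generated from a made-up curve, purely to show the mechanics. Plain tuples are used instead of a data frame to keep the example short:

```python
from collections import defaultdict

import numpy as np


def true_speed(hour: float) -> float:
    """Made-up speed profile used only to generate sample points."""
    return 80.0 - 0.25 * (hour - 12.0) ** 2


# Hypothetical records: (origin, destination, departure hour, speed).
records = [("A", "B", h, true_speed(h)) for h in (0, 4, 8, 12, 16, 20)]
records += [("B", "C", h, 50.0) for h in (6, 9, 12, 15)]

# Group the points by edge.
groups = defaultdict(list)
for origin, destination, depart, speed in records:
    groups[(origin, destination)].append((depart, speed))

# Fit one cubic polynomial per edge; poly1d turns coefficients into a callable.
model = {}
for edge, points in groups.items():
    x, y = zip(*points)
    coefficients = np.polyfit(x, y, 3)
    model[edge] = np.poly1d(coefficients)

# Predicted speed on A -> B for a departure at 10:00.
print(float(model[("A", "B")](10.0)))
```

A cubic needs at least four distinct departure times per edge to be well determined, so edges with very few records would need special handling.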

At this point our model is a dictionary, where the keys are (origin, destination) tuples and the values are functions. This might be hard to keep in mind, especially in a language like Python.

Let’s wrap that data structure with a class that makes prediction more explicit:

See this content in the original post
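A minimal sketch of such a wrapper follows. The method name predict and its signature are assumptions for illustration and may differ from the class in the repository:

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]


class SpeedModel:
    """Wraps a dictionary of per-edge speed functions behind an explicit API."""

    def __init__(self, functions: Dict[Edge, Callable[[float], float]]):
        self._functions = functions

    def predict(self, origin: str, destination: str, time_of_day: float) -> float:
        """Predict the average speed on an edge for a departure time of day."""
        # Raises KeyError if the edge was never seen during training.
        return float(self._functions[(origin, destination)](time_of_day))


# Hypothetical model: slower traffic on A -> B from 08:00 onwards.
model = SpeedModel({("A", "B"): lambda hour: 60.0 if hour < 8 else 40.0})
print(model.predict("A", "B", 9.5))  # 40.0
```

Callers no longer need to know that the keys are tuples or that the values are callables.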

Now we can use this model to compute Mean Squared Error (or MSE) on a test dataset:

See this content in the original post
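MSE is the average of the squared differences between recorded and predicted speeds. A small standard-library sketch with made-up numbers:

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical recorded speeds and the model's predictions for the same rows.
actual = [50.0, 60.0, 70.0]
predicted = [52.0, 59.0, 67.0]
mse = mean_squared_error(actual, predicted)
print(mse)  # (4 + 1 + 9) / 3, roughly 4.67
```

Because the errors are squared, a few badly predicted records dominate the score.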


Given a test ratio of 0.2, the current implementation prints:

See this content in the original post

This is not bad, although still worse than the lowest MSE of 44.29 achieved by Daniel Weller in Transport Tycoon Kata 2.3.

Plug Model into Routing

This SpeedModel hides away speed calculation complexity, so it is fairly simple to plug into the simulation code from the previous article.

We are going to strip the comments here. They are still available in the linked source code.

We start by loading a map. This time we ignore speed and keep track of the distance between the adjacent locations:

See this content in the original post
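For illustration, a map that only tracks distances could be loaded into an adjacency dictionary as sketched below. The CSV layout, the column names From, To and Km, and the sample rows are assumptions, since the actual map file format is not shown here:

```python
import csv
import io
from collections import defaultdict

# Hypothetical map data; in practice this would be read from a file.
MAP_CSV = """From,To,Km
X,Y,100
Y,Z,50
"""


def load_map(text: str):
    """Build {location: [(neighbour, km), ...]} from CSV text."""
    graph = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the distance; speed now comes from the model.
        graph[row["From"]].append((row["To"], float(row["Km"])))
    return graph


graph = load_map(MAP_CSV)
print(dict(graph))
```

If roads can be driven in both directions, the loader would also need to add the reverse edge for each row.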

The start of the simulation loop stays exactly the same as before, so we are going to skip it here.

We are going to change the part of the simulation loop that computes the time interval between the current clock at a location and the truck's arrival at the destination:

See this content in the original post
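The idea behind this change can be sketched as follows. It assumes the simulation clock is measured in hours since the start of the simulation, and it takes the speed lookup as a plain callable. Both are illustrative choices rather than the repository's actual code:

```python
from typing import Callable


def arrival_clock(clock: float, km: float, speed_at: Callable[[float], float]) -> float:
    """Return the clock value at which a truck departing now reaches the next location."""
    # The model works with time of day, so fold the running clock into 0..24.
    time_of_day = clock % 24.0
    speed = speed_at(time_of_day)
    # Travel time is distance divided by the predicted speed.
    return clock + km / speed


def rush_hour_speed(hour: float) -> float:
    """Hypothetical speed profile: 50 km/h between 07:00 and 09:00, else 100 km/h."""
    return 50.0 if 7.0 <= hour < 9.0 else 100.0


# Departing at 08:00 on the second day (clock = 32 h) for a 100 km segment.
print(arrival_clock(32.0, 100.0, rush_hour_speed))  # 34.0
```

The speed is sampled once at departure, so a change in traffic while the truck is already on the segment is ignored in this simplified version.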

With that out of the way, we can now wire model training and routing into a single function:

See this content in the original post

Thanks to the fire library, we can execute train_and_route like this:

See this content in the original post

Summary

Within this article we have explored how to mine historical data for insights, capture them in a model and plug that model into the deterministic simulation.

We covered the speed model, but the approach could be applied in a similar fashion to any other simulation parameter: location profiles, incident probabilities, fuel consumption or intermodal transfer times.

The source code for this article is in the Trustbit/logisim repository, under src/logisim3.