Posted by : at

nVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. In order to achieve the full benefits of the platform, a framework called TensorRT drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is the python API for TensorRT only supports x86 based architectures. This leaves us with no real easy way of taking advantage of the benefits of TensorRT. However, there is a harder way that does work: To achieve maximum inference performance we can export and convert our model to .uff format, and then load it in TensorRT’s C++ API.

1. Training and exporting to .pb

  • Train your model
  • If using Jupyter, restart the kernel you trained your model in to remove training layers from the graph
  • Reload the models weights
  • Use an export function like the one in this notebook to export the graph to a .pb file

2. Converting .pb to .uff

I suggest using the chybhao666/cuda9_cudnn7_tensorrt3.0:latest Docker container to access the script needed for converting a .pb export from Keras/Tensorflow to .uff format for TensorRT import.

cd /usr/lib/python2.7/dist-packages/uff/bin
# List Layers and manually pick out the output layer
# For most networks it will be dense_x/BiasAdd, the last one that isn't a placeholder or activation layer
python tensorflow --input-file /path/to/graph.pb -l

# Convert to .uff, replace dense_1/BiasAdd with the name of your output layer
python tensorflow -o /path/to/graph.uff --input-file /path/to/graph.pb -O dense_1/BiasAdd

More information on the .pb export and .uff conversion is available from nVidia

3. Loading the .uff into TensorRT C++ Inference API

I have created a generic class which can load the graph from a .uff file and setup TensorRT for inference while taking care of all host / device CUDA memory management behind the scenes. It supports any number of inputs and outputs and is available on my Github. It can be built with nVidia nSight Eclipse Edition using a remote toolchain (instructions here)


  • Keep in mind that many layers are not supported by TensorRT 3.0. The most obvious omission is BatchNorm, which is used in many different types of deep neural nets.
  • Concatenate only works on the channel axis and if and only if the other dimensions are the same. If you have multiple paths for convolution, you are limited to concatenating them only when they have the same dimensions.