
One of the great things released alongside the Jetson Nano is JetPack 4.2, which includes TensorRT support in Python. One of the easiest ways to get started with TensorRT is the TF-TRT interface, which lets us integrate TensorRT into a TensorFlow graph even if some layers are not supported: supported subgraphs are replaced with TensorRT-optimized ops, while unsupported layers keep running in TensorFlow. Of course, this means we can easily accelerate Keras models as well!

NVIDIA now provides a prebuilt TensorFlow for Jetson that we can install through pip, but we first need to make sure certain dependencies are satisfied:

sudo apt install python3-numpy python3-markdown python3-mock python3-termcolor python3-astor libhdf5-dev

Follow the instructions here to install tensorflow-gpu on JetPack 4.2.
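Once the install finishes, a quick sanity check can save some debugging later. The sketch below only tests whether the Python modules can be found on the import path; the list of module names is my assumption about what the apt packages and the pip wheel provide, so adjust it to your setup.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above, plus tensorflow itself.
REQUIRED = ["numpy", "markdown", "mock", "termcolor", "astor", "tensorflow"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found")
```

Note that this only checks that each module can be located, not that TensorFlow actually sees the GPU.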

Now that TensorFlow is installed on the Nano, let's load a pretrained MobileNet from Keras and compare its performance with and without TensorRT on a binary classification task.

import tensorflow.keras as keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

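# Load MobileNet (width multiplier alpha=0.25) pretrained on ImageNet, without its classification head
mobilenet = keras.applications.mobilenet.MobileNet(include_top=False, input_shape=(224, 224, 3), weights='imagenet', alpha=0.25)

# Attach a single sigmoid unit for binary classification
x = Flatten()(mobilenet.output)
new_output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=mobilenet.input, outputs=new_output)

# TODO: Train your model for the binary classification task

# Save the trained model so the inference script can load it
model.save('mobilenet.h5')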

Next, we can run inference with different settings using this script (thanks to jeng1220 for the Keras to TF-TRT code).

You will need to install plac to run the script: pip3 install --user plac
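For readers curious about what a Keras to TF-TRT conversion involves, here is a minimal sketch for TensorFlow 1.x (the version line that ships for JetPack 4.2). It is not the script referenced above, just an illustration under my own assumptions: the helper names, the 32 MB workspace size and the batch size of 1 are all hypothetical choices. INT8 is left out because it needs an extra calibration step.

```python
def trt_conversion_kwargs(precision_mode="FP16", max_batch_size=1,
                          workspace_bytes=1 << 25):
    """Build keyword arguments for TF-TRT's create_inference_graph (TF 1.x)."""
    if precision_mode not in ("FP32", "FP16"):
        raise ValueError("unsupported precision mode: %s" % precision_mode)
    return {
        "max_batch_size": max_batch_size,
        "max_workspace_size_bytes": workspace_bytes,  # assumed 32 MB budget
        "precision_mode": precision_mode,
    }


def convert_keras_to_trt(h5_path, precision_mode="FP16"):
    """Freeze a saved Keras model and return a TF-TRT optimized GraphDef."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt  # TF 1.x only

    tf.keras.backend.set_learning_phase(0)  # build the graph in inference mode
    model = tf.keras.models.load_model(h5_path)
    output_names = [out.op.name for out in model.outputs]

    # Fold variables into constants so the graph is self-contained.
    sess = tf.keras.backend.get_session()
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_names)

    # Replace TensorRT-compatible subgraphs; other ops stay in TensorFlow.
    return trt.create_inference_graph(
        input_graph_def=frozen,
        outputs=output_names,
        **trt_conversion_kwargs(precision_mode))
```

Calling `convert_keras_to_trt('mobilenet.h5', 'FP16')` on the Nano should return a graph definition that can be imported into a new session for inference.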

# TensorFlow Standard Inference
python3 -S 30 -T TF mobilenet.h5
# Time = 4.19 s
# Samples = 30
# FPS = Samples / Time = 30 / 4.19 = 7.16 FPS

# TensorRT FP32 Inference
python3 -S 30 -T FP32 mobilenet.h5
# Time = 0.96 s
# Samples = 30
# FPS = Samples / Time = 30 / 0.96 = 31.3 FPS

# TensorRT FP16 Inference
python3 -S 30 -T FP16 mobilenet.h5
# Time = 0.84 s
# Samples = 30
# FPS = Samples / Time = 30 / 0.84 = 35.7 FPS
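The FPS figures are just samples divided by elapsed time, so they are easy to reproduce and compare. The sketch below recomputes them from the timings reported above and adds the speedup relative to plain TensorFlow. The `measure_fps` helper is a hypothetical harness of my own (including the warm-up call), not necessarily how the script measures time.

```python
import time


def fps(samples, seconds):
    """Frames per second for `samples` inferences taking `seconds` in total."""
    return samples / seconds


def measure_fps(predict, samples, warmup=1):
    """Call `predict()` `samples` times and return (elapsed_seconds, fps)."""
    for _ in range(warmup):
        predict()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(samples):
        predict()
    elapsed = time.perf_counter() - start
    return elapsed, fps(samples, elapsed)


# Timings reported above: seconds taken for 30 samples in each mode.
timings = {"TF": 4.19, "FP32": 0.96, "FP16": 0.84}
baseline = timings["TF"]

for name, seconds in timings.items():
    speedup = baseline / seconds
    print("%-4s %6.2f FPS  %.2fx vs TF" % (name, fps(30, seconds), speedup))
```

To time your own model, pass a zero-argument callable, for example `measure_fps(lambda: model.predict(batch), 30)`, where `model` and `batch` are whatever you have loaded.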

It looks like TensorRT makes a significant difference versus simply running inference in TensorFlow: about 4.4x faster in FP32 and about 5x faster in FP16 on this model! Stay tuned for my next steps on the Nano: implementing and optimizing MobileNet SSD object detection to run at 30+ FPS!