Jekyll
2020-10-22T15:48:32+00:00
https://csvance.github.io/feed.xml
Carroll Vance
Ordinary Differential Equations: Limit Sets and Long Term Behavior
2019-05-09T12:00:00+00:00
https://csvance.github.io/blog/2019/05/09/ode-limit-sets-long-term-behavior
<p>I wrote my Ordinary Differential Equations term project in <a href="https://www.latex-project.org">LaTeX</a> and thought it would be fun to see if I could convert it to markdown and post it here. Apparently <a href="https://www.mathjax.org">MathJax</a> makes this easy, so here it is!</p>
<h1 id="abstract">Abstract</h1>
<p>This paper addresses the long-term behavior of systems of ordinary
differential equations (hereafter ODEs) in the one- and two-dimensional
cases. The specific behavior of interest is:</p>
\[\lim_{t\to\infty} y(t) \neq\pm\infty\]
<p>where ${y}(t)$ is a solution to
a system of ODEs. Families of solutions will be considered analytically
and numerically for behavior such as periodic orbits and fixed points.
The considered problems will be plotted and have their limit sets
enumerated.</p>
<h1 id="dimension-d--1">Dimension D = 1</h1>
<p>Consider the first order linear ODE:</p>
\[y\prime = y\]
<p>By simply looking
at the equation, it can be observed that it is autonomous (the
independent variable $t$ does not appear explicitly in the ODE). From this
we know that the slope $y\prime$ depends only on the current value of $y$,
not on $t$. Solving the differential equation by separating variables and
integrating provides the general solution:</p>
\[y(t) = C_1e^t\]
<p>Analytically it can be observed that the
initial condition $y(0) = 0$ gives $C_1 = 0$. Consequently, $y(t) = 0$
for every value of $t$:</p>
\[\lim_{t\to\infty} y(t) = 0 \textrm{ where } y(0) = 0\]
<p>For any other
initial condition, it can be observed that $y(t)$ grows without bound:</p>
<p><img src="/assets/img/posts/d1.png" alt="" class="img-fluid" /></p>
<p>Because $y\prime = y$ is a one-dimensional system, its phase space is a
line with the single coordinate $y(t)$. The limit set includes all points on
this line such that:</p>
\[\lim_{t\to\infty} y(t) \neq\pm\infty | t \in \mathbb{R}\]
<p>It directly
follows that the limit set for the ODE $y\prime = y$ is $\{(0)\}$.</p>
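<p>The claim above can be checked numerically. The sketch below is not part of the original project: it simply evaluates the closed-form solution $y(t) = y(0)e^t$ at increasing times for a few initial conditions. The helper name <code>solution</code> and the sample values are assumptions chosen for illustration.</p>

```python
import math

def solution(y0, t):
    """Closed-form solution of y' = y with initial condition y(0) = y0."""
    return y0 * math.exp(t)

# Sample a few initial conditions and watch the magnitude as t grows.
# Only y(0) = 0 stays finite; even a tiny perturbation eventually blows up.
for y0 in (-1.0, 0.0, 1e-6, 1.0):
    values = [solution(y0, t) for t in (0.0, 10.0, 50.0)]
    print(f"y(0) = {y0:>8}: " + ", ".join(f"{v:.3e}" for v in values))
```

<p>Including the small value $10^{-6}$ is a deliberate choice: it shows that the equilibrium at zero is unstable, so the limit set contains that single point and nothing near it.</p>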
<h1 id="dimension-d--2">Dimension D = 2</h1>
<h4 id="damped-vibrating-spring">Damped Vibrating Spring</h4>
<p>Consider the second order system of linear ODEs:</p>
\[\begin{array}{lcl}
y\prime=v\\
v\prime=-4y-2v
\end{array}\]
<p>This is derived from the second order unforced, damped
harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2y = 0$ where
$c = 1$ and $\omega_0 = 2$. When $0 < c < \omega_0$ the system is
under-damped. Knowing this, it should be expected that all solutions in
the phase plane oscillate with decaying amplitude and converge to a single
point as the damping removes energy from the system. Consequently, the
limit set of the system contains the point of convergence for all
solutions as $t\to\infty$.</p>
<p><img src="/assets/img/posts/d2_a_p.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_a_v.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_a_ph.png" alt="" class="img-fluid" /></p>
<p>Plotting the solution curves displays behavior typical of a spiral sink,
and the trace determinant plane confirms it: $D(A) = 4, T(A) = -2$, so
$T^2 - 4D < 0$ with $T < 0$. Due
to this behavior, all solution curves in the phase plane converge to the
single equilibrium point $(0, 0)$ as $t\to\infty$. It
follows that the limit set for this ODE is $\{(0, 0)\}$. While this
limit set resembles the limit set of the one-dimensional case, it should
be noted that the ODE in the one-dimensional case only had a finite
limit as $t\to\infty$ for a single IVP, whereas every IVP
in the two-dimensional case converges to the origin.</p>
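<p>As a rough illustration of how the trace determinant plane is read, the sketch below classifies a $2 \times 2$ linear system from its trace and determinant. The function <code>classify</code> and its labels are my own shorthand, not something from the original project, and the borderline case $T^2 = 4D$ is grouped with the nodal cases for simplicity.</p>

```python
def classify(trace, det):
    """Classify the equilibrium of a 2x2 linear system from trace and determinant."""
    if det < 0:
        return "saddle"
    if det == 0:
        return "degenerate (zero eigenvalue)"
    if trace == 0:
        return "center"
    disc = trace * trace - 4 * det  # negative means complex eigenvalues
    kind = "spiral" if disc < 0 else "nodal"  # disc == 0 is lumped in with nodal
    return f"{kind} {'sink' if trace < 0 else 'source'}"

# Coefficient matrix of y' = v, v' = -4y - 2v
A = [[0, 1], [-4, -2]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(trace, det, classify(trace, det))
```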
<h4 id="undamped-vibrating-spring">Undamped Vibrating Spring</h4>
<p>Consider the second order system of linear ODEs:</p>
\[\begin{array}{lcl}
y\prime=v\\
v\prime=-4y
\end{array}\]
<p>This is derived from the second order unforced, undamped
harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2y = 0$ where
$c = 0$ and $\omega_0 = 2$. When $c = 0$ the system is undamped. Knowing
this, all solutions $y(t)$ and
$y\prime(t)$ should be expected to be periodic, so they should trace an
ellipse-like shape in the phase plane. Consequently, the limit set of the
system should contain as many members as there are solutions to initial
value problems. A plot confirms these suspicions:</p>
<p><img src="/assets/img/posts/d2_b_p.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_b_v.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_b_ph.png" alt="" class="img-fluid" /></p>
<p>This system clearly exhibits a center behavior, which is confirmed by
the trace determinant plane: $D(A) = 4,T(A) = 0$. It follows that the
limit set contains every curve in the phase plane formed by the general
solution of the ODE and its derivative:</p>
\[\begin{array}{lcl}
y(t) = C_1\cos(2t) + C_2\sin(2t)\\
y\prime(t) = -2C_1\sin(2t) + 2C_2\cos(2t)\\
\end{array}\]
<p>It should also be noted that the origin of the phase plane
$(0, 0)$ is in the limit set. This can easily be verified by solving the
initial value problem for $y(0) = 0, y\prime(0) = 0$. Finally, the limit
set for this ODE can be characterized by the two behaviors mentioned
above:</p>
<ol>
<li>
<p>$(y=0,y\prime=0)$: Solutions that start at the origin stay at the
origin as $t\to\infty$</p>
</li>
<li>
<p>$(y, y\prime) \ne (0, 0)$: Solutions that start away from the origin
stay on a periodic solution curve as $t\to\infty$, which is determined
by the initial value problem.</p>
</li>
</ol>
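<p>One way to see why every orbit is a closed ellipse is to note that $\frac{1}{2}(y\prime)^2 + 2y^2$ stays constant along the general solution. The sketch below checks this numerically. The constants $C_1 = 1$ and $C_2 = 0.5$ and the helper names are assumptions picked for illustration.</p>

```python
import math

def state(c1, c2, t):
    """General solution of y'' + 4y = 0 and its derivative at time t."""
    y = c1 * math.cos(2 * t) + c2 * math.sin(2 * t)
    v = -2 * c1 * math.sin(2 * t) + 2 * c2 * math.cos(2 * t)
    return y, v

def energy(y, v):
    """Quantity conserved along solutions: v^2 / 2 + 2 y^2."""
    return 0.5 * v * v + 2.0 * y * y

c1, c2 = 1.0, 0.5  # illustrative constants, not taken from the project
energies = [energy(*state(c1, c2, t)) for t in (0.0, 0.7, 3.1, 12.0)]
print(energies)  # the same value at every sampled time
```

<p>Each level set of this quantity is an ellipse in the $(y, y\prime)$ plane, and different initial conditions pick out different level sets.</p>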
<h4 id="undamped-pendulum">Undamped Pendulum</h4>
<p>Consider the second order system of nonlinear ODEs:</p>
\[\begin{array}{lcl}
\theta\prime=\omega\\
\omega\prime=-\sin(\theta)\\
\end{array}\]
<p>This is derived from the undamped pendulum ODE
$\theta\prime\prime = -\sin(\theta)$. There are quite a few initial
conditions $p_0$ to consider.</p>
<ol>
<li>
<p>For $p_0\in\{ (\theta_0 = 2n\pi, \omega_0 = 0)| n\in\mathbb{Z} \}$,
the phase plane solution should be the single point $p_0$, as the
pendulum hangs at rest and has no energy.</p>
</li>
<li>
<p>For $p_0\in\{(\theta_0 \ne 0, \omega_0 = 0)\}$ the phase plane
solution should look like a center around a point
$p_c \in \{ (\theta = 2n\pi, \omega = 0)| n\in\mathbb{Z} \}$ as the
pendulum converts potential energy to kinetic energy, back to potential
energy, and then changes direction to repeat the process.</p>
</li>
<li>
<p>For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \lessapprox 2.00027)\}$ ,
the pendulum should behave like case two.</p>
</li>
<li>
<p>For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \gtrapprox 2.00027)\}$,
the pendulum no longer changes direction and
$\lim_{t\to\infty} \theta(t) = \pm\infty$</p>
</li>
<li>
<p>For
$p_0\in\{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0)| n\in\mathbb{Z} \}$,
the phase plane solution should be a single point $p_0$ as the
pendulum needs a nudge in either direction to start moving.</p>
</li>
</ol>
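<p>The boundary between cases three and four can also be located with an energy argument, which the original write-up estimates numerically. Assuming the usual conserved quantity $E = \frac{1}{2}\omega^2 - \cos(\theta)$, the pendulum goes over the top only if $E$ exceeds the energy of the inverted equilibrium, $E = 1$. At $\theta_0 = 0$ that gives $\mid\omega_0\mid = 2$, which is consistent with the numerical threshold of roughly $2.00027$. The sketch below is my own illustration of that comparison.</p>

```python
import math

def energy(theta, omega):
    """Conserved energy of theta'' = -sin(theta): kinetic plus potential."""
    return 0.5 * omega ** 2 - math.cos(theta)

# Energy of the inverted pendulum at rest; this level separates the two motions.
SEPARATRIX_ENERGY = energy(math.pi, 0.0)

def motion(theta0, omega0):
    """Classify an initial condition by comparing its energy with the separatrix."""
    e = energy(theta0, omega0)
    if e < SEPARATRIX_ENERGY:
        return "oscillates"  # swings back and forth, or rests at the bottom
    if e > SEPARATRIX_ENERGY:
        return "rotates"     # theta grows without bound
    return "separatrix"

for omega0 in (1.0, 1.99, 2.01, 3.0):
    print(omega0, motion(0.0, omega0))
```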
<p>The plots below confirm the predicted behavior of solutions. The limit
set can be broken down into three different cases:</p>
<p><img src="/assets/img/posts/d2_c_p.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_c_v.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_c_ph.png" alt="" class="img-fluid" /></p>
<ol>
<li>
<p>Solutions where $\omega$ takes both positive and negative values
behave as a center around some point
$p_c \in \{(\theta = 2n\pi, \omega = 0) | n\in\mathbb{Z} \}$.
This is confirmed by looking at the trace determinant plane for the
center equilibrium: $D(J)=1, T(J)=0$.</p>
</li>
<li>
<p>$p_0 \in \{(\theta_0 = 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$:
Solutions to this IVP stay at the angle they started at as
$t\to\infty$. These equilibrium points are shared with case one.</p>
</li>
<li>
<p>$p_0 \in \{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$:
Solutions to this IVP stay at the angle they started at as
$t\to\infty$. These equilibrium points are saddle points in the
trace determinant plane: $D(J)=-1, T(J)=0$.</p>
</li>
</ol>
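<p>The trace and determinant values quoted in the list come from the Jacobian of the system evaluated at each equilibrium. A minimal sketch of that computation follows. The helper names are my own, and the range of $n$ is an arbitrary sample.</p>

```python
import math

def jacobian(theta):
    """Jacobian of (theta' = omega, omega' = -sin(theta)) at the point (theta, 0)."""
    return [[0.0, 1.0], [-math.cos(theta), 0.0]]

def trace_det(j):
    """Trace and determinant of a 2x2 matrix."""
    trace = j[0][0] + j[1][1]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    return trace, det

for n in range(-1, 2):
    for name, theta in (("bottom", 2 * n * math.pi), ("top", math.pi + 2 * n * math.pi)):
        t, d = trace_det(jacobian(theta))
        print(f"{name:6s} theta = {theta:+.3f}: T = {t:+.1f}, D = {d:+.1f}")
```

<p>The determinant reduces to $\cos(\theta)$, so it alternates between $1$ at the hanging positions and $-1$ at the inverted ones, while the trace is always zero.</p>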
<p>It is important to keep in mind that solutions where
$\lim_{t\to\infty} \omega(t) \neq\pm\infty$ are not members of the
limit set if $\lim_{t\to\infty} \theta(t) = \pm\infty$. This excludes
any solutions for which $\omega$ is exclusively $> 0$ or $< 0$ as
$t\to\infty$. However, a definitive statement can still be made about
$\omega$ in these cases: it stays bounded and repeats with every full
revolution of the pendulum.</p>
<h4 id="damped-pendulum">Damped Pendulum</h4>
<p>Consider the second order system of nonlinear ODEs:</p>
\[\begin{array}{lcl}
\theta\prime=\omega\\
\omega\prime=-\sin(\theta) - \frac{1}{2}\omega\\
\end{array}\]
<p>This is derived from the damped pendulum ODE
$\theta\prime\prime = -\sin(\theta) - c\theta\prime$ with $c = \frac{1}{2}$. A reasonable guess
is that it behaves much like the undamped case, except
in the cases that resulted in endless oscillation. On the phase plane,
these cases should converge to
$\{(\theta = 2n\pi, \omega = 0) | n\in\mathbb{Z}\}$ instead.</p>
<p><img src="/assets/img/posts/d2_d_p.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_d_v.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_d_ph.png" alt="" class="img-fluid" /></p>
<p>Based on analysis of the system and observed behavior of the numerical
plot, a limit set can be constructed:</p>
<ol>
<li>
<p>When $p_0 \in \{(\theta_0 = \pi + 2n\pi, \omega_0=0)| n\in\mathbb{Z}\}$,
solutions remain where they started. This can be verified because
these values are in the set of equilibrium points. The trace
determinant plane shows saddle point behavior for this set of initial
conditions: $D(J)=-1, T(J)=-\frac{1}{2}$</p>
</li>
<li>
<p>All solutions not included in case one, apart from the separatrix
curves that end exactly on a saddle point, converge to a spiral sink in
the set $\{(\theta = 2n\pi, \omega = 0)| n\in\mathbb{Z}\}$,
depending on initial conditions. This can be confirmed by
recognizing that these equilibrium points
are all in the spiral sink region of the trace determinant plane:
$D(J)=1, T(J)=-\frac{1}{2}$</p>
</li>
</ol>
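<p>The convergence described above can be reproduced with a small fixed-step integrator. The sketch below uses a classical fourth-order Runge-Kutta scheme rather than whatever solver produced the original plots. The step size, duration, and initial conditions are assumptions chosen for illustration.</p>

```python
import math

def rhs(theta, omega, c=0.5):
    """Right-hand side of the damped pendulum system."""
    return omega, -math.sin(theta) - c * omega

def rk4(theta, omega, dt=0.01, steps=10000):
    """Integrate with the classical fourth-order Runge-Kutta method."""
    for _ in range(steps):
        k1 = rhs(theta, omega)
        k2 = rhs(theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1])
        k3 = rhs(theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1])
        k4 = rhs(theta + dt * k3[0], omega + dt * k3[1])
        theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        omega += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, omega

# Report the final angle in units of full turns, plus the final angular velocity.
for theta0, omega0 in ((1.0, 0.0), (0.0, 1.9), (0.0, 4.0)):
    theta, omega = rk4(theta0, omega0)
    print((theta0, omega0), "->", round(theta / (2 * math.pi), 3), "turns,", round(omega, 6))
```

<p>A fast enough initial push carries the pendulum over the top before damping traps it, so the final angle can be a nonzero multiple of $2\pi$ even though the physical resting position is the same.</p>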
<h4 id="competing-species">Competing Species</h4>
<p>Consider the second order system of nonlinear ODEs:</p>
\[\begin{array}{lcl}
x\prime = (1 - x - y)x\\
y\prime = (4 - 2x -7y)y\\
\end{array}\]
<p>In a competing species system, the populations of two
species interact with each other and/or themselves. To analyze such a
system for long term behavior, the equilibrium points should first be
considered:</p>
\[\begin{array}{lcl}
(1-x-y)x = 0\\
(4-2x-7y)y=0\\
\end{array}\]
<p>Solving for the intersections of the x-nullclines and y-nullclines
provides the set of equilibrium points
$\{(0, 0), (1, 0), (0, \frac{4}{7}), ({\frac{3}{5}, \frac{2}{5}})\}$.
Next, the system is linearized by computing the Jacobian:</p>
\[\begin{array}{lcl}
J = \begin{pmatrix}
\frac{\partial}{\partial x}(1 - x - y)x & \frac{\partial}{\partial y}(1 - x - y)x \\
\frac{\partial}{\partial x}(4 - 2x - 7y)y & \frac{\partial}{\partial y}(4 - 2x - 7y)y
\end{pmatrix}\\
J = \begin{pmatrix}
1 - y - 2x & -x\\
-2y & 4-14y-2x
\end{pmatrix}\\
\end{array}\]
<p>Next the Jacobian is used to examine the equilibrium in
the trace determinant plane:</p>
\[\begin{array}{lcl}
(0, 0):\textrm{ Nodal Source}\\
D_{0, 0} = det_{0, 0}(J) = 4\\
T_{0, 0} = tr_{0, 0}(J) = 5\\
\\
(1, 0):\textrm{ Saddle Point}\\
D_{1, 0} = det_{1, 0}(J) = -2\\
T_{1, 0} = tr_{1, 0}(J) = 1\\
\\
(0, \frac{4}{7}):\textrm{ Saddle Point}\\
D_{0, \frac{4}{7}} = det_{0, \frac{4}{7}}(J) = -\frac{12}{7}\\
T_{0, \frac{4}{7}} = tr_{0, \frac{4}{7}}(J) = -\frac{25}{7}\\
\\
(\frac{3}{5}, \frac{2}{5}):\textrm{ Nodal Sink}\\
D_{\frac{3}{5}, \frac{2}{5}} = det_{\frac{3}{5}, \frac{2}{5}}(J) = \frac{6}{5}\\
T_{\frac{3}{5}, \frac{2}{5}} = tr_{\frac{3}{5}, \frac{2}{5}}(J) = -\frac{17}{5}\\
\end{array}\]
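<p>The trace and determinant at each equilibrium can be recomputed exactly by substituting the point into the Jacobian. The sketch below does this with rational arithmetic so no rounding is involved. The helper names are my own and are not part of the original project.</p>

```python
from fractions import Fraction as F

def jacobian(x, y):
    """Jacobian of x' = (1 - x - y)x, y' = (4 - 2x - 7y)y."""
    return [[1 - 2 * x - y, -x], [-2 * y, 4 - 2 * x - 14 * y]]

def trace_det(j):
    """Trace and determinant of a 2x2 matrix."""
    return j[0][0] + j[1][1], j[0][0] * j[1][1] - j[0][1] * j[1][0]

equilibria = [(F(0), F(0)), (F(1), F(0)), (F(0), F(4, 7)), (F(3, 5), F(2, 5))]
for x, y in equilibria:
    t, d = trace_det(jacobian(x, y))
    # D < 0 marks a saddle; for D > 0 the sign of T separates sinks from sources.
    print(f"({x}, {y}): T = {t}, D = {d}, T^2 - 4D = {t * t - 4 * d}")
```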
<p>Plotting for several initial value problems, a few things
can be observed:</p>
<ol>
<li>
<p>When $x$ and $y$ both have initial values $> 0$, their solutions
gravitate towards $\frac{3}{5}$ and $\frac{2}{5}$. This is
consistent with the predicted nodal sink behavior.</p>
</li>
<li>
<p>Initial values of $x$ and $y$ that start at zero stay at zero, which
is expected when starting at the center of a nodal source.</p>
</li>
<li>
<p>When $y = 0$, initial values of $x > 0$ gravitate towards one. This
makes sense because $y = 0$ reduces the first equation to
$x\prime = x(1 - x)$, which clearly has a root at $1$. It follows
that the system has an equilibrium point there as well. Examining the trace
determinant of $(1, 0)$ places it in the region of saddle points.</p>
</li>
<li>
<p>When $x = 0$, initial values of $y > 0$ gravitate towards $\frac{4}{7}$. This makes sense
considering the second equation is reduced to $y\prime = y(4 - 7y)$,
which has a root at $\frac{4}{7}$. Examining the trace determinant of
this point, it clearly falls into the region of saddle points.</p>
</li>
</ol>
<p><img src="/assets/img/posts/d2_e_ph.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_e_1_1.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_e_1_0.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_e_0_1.png" alt="" class="img-fluid" />
<img src="/assets/img/posts/d2_e_0_0.png" alt="" class="img-fluid" /></p>
<p>Considering the above, the limit set contains individual points on the
phase plane:
$\{(0, 0), (1, 0), (0, \frac{4}{7}), ({\frac{3}{5}, \frac{2}{5}})\}$.
This system contains more non-periodic individual points in its limit
set than any previously examined system, while containing no closed
curves around a center.</p>
<h4 id="van-der-pols-equation">Van der Pol’s Equation</h4>
<p>Consider the following system of nonlinear ODEs:</p>
\[\begin{array}{lcl}
x\prime = 2x-y-x^3\\
y\prime = x\\
\end{array}\]
<p>To analyze this system, first the equilibrium points need
to be found from the intersection of the nullclines:</p>
\[\begin{array}{lcl}
2x-y-x^3 = 0\\
x = 0\\
S = \{(0, 0)\}\\
\end{array}\]
<p>Next, the system is linearized and the trace determinant
of the equilibrium point computed:</p>
\[\begin{array}{lcl}
J = \begin{pmatrix}
\frac{\partial}{\partial x}(2x-y-x^3) & \frac{\partial}{\partial y}(2x-y-x^3)\\
\frac{\partial}{\partial x}x & \frac{\partial}{\partial y} x
\end{pmatrix}\\
\\
J = \begin{pmatrix}
2 - 3x^2 & -1\\
1 & 0
\end{pmatrix}\\
\\
(0, 0):\textrm{ Special Case / Nodal Source}\\
D_{0, 0} = det_{0, 0}(J) = 1\\
T_{0, 0} = tr_{0, 0}(J) = 2\\
\end{array}\]
<p>The equilibrium point is a special case with a repeated real
eigenvalue $\lambda = 1$, since $T^2 = 4D$. Plotting the phase plane reveals atypical
behavior:</p>
<p><img src="/assets/img/posts/d2_f_ph.png" alt="" class="img-fluid" /></p>
<p>There are two clear behaviors here, both of which are clearly members of
the limit set:</p>
<ol>
<li>
<p>For $(x = 0, y = 0)$, the solution does not leave the origin of the
phase plane.</p>
</li>
<li>
<p>All other initial values seem to converge to the exact same
periodic closed curve, known as a limit cycle. This differs from the previous center
solutions, where there were infinitely many different closed paths
depending on initial conditions.</p>
</li>
</ol>
<p>Setting the initial condition to a part of the closed loop yields the
following result:</p>
<p><img src="/assets/img/posts/d2_f_loop.png" alt="" class="img-fluid" /></p>
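<p>The observation that very different starting points end up on the same closed curve can be checked with a short simulation. The sketch below is a hand-rolled fourth-order Runge-Kutta integrator, not the solver used for the plots. It records the largest $\mid x \mid$ seen late in the run, after transients have died out. The step size, time window, and starting points are assumptions.</p>

```python
def rhs(x, y):
    """Right-hand side of x' = 2x - y - x^3, y' = x."""
    return 2 * x - y - x ** 3, x

def amplitude(x, y, dt=0.01, t_end=60.0, t_measure=40.0):
    """Integrate with RK4 and return max |x| observed after t_measure."""
    peak = 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        if (i + 1) * dt >= t_measure:
            peak = max(peak, abs(x))
    return peak

print(amplitude(0.1, 0.0))  # starts near the origin and moves outward
print(amplitude(3.0, 3.0))  # starts far outside and is pulled inward
print(amplitude(0.0, 0.0))  # the equilibrium itself never moves
```

<p>If the first two values agree closely while the third stays at zero, that matches the two members of the limit set described above.</p>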
<h1 id="extra-work">Extra Work</h1>
<h4 id="approximating-van-der-pols-solution-curve">Approximating Van der Pol’s Solution Curve</h4>
<p>While I wasn’t able to find a closed-form solution to Van der Pol’s equation, I was
not content walking away without at least approximating a solution. Here
are the steps I took to approximate the curve. First, we apply a rotation
using a linear transformation with $\theta=\frac{\pi}{16}$ (Matrix $A$
is the output of ODE45):</p>
<p><img src="/assets/img/posts/fit_rotate.png" alt="" class="img-fluid" /></p>
\[\begin{array}{lcl}
R = \begin{pmatrix}
\cos(\theta) & -\sin(\theta) \\
\sin(\theta) & \cos(\theta) \\
\end{pmatrix}\\\\
X = AR
\end{array}\]
<p>Next a shear transform is applied with $k = 0.03$:</p>
<p><img src="/assets/img/posts/fit_shear.png" alt="" class="img-fluid" /></p>
\[\begin{array}{lcl}
K = \begin{pmatrix}
1 & k \\
0 & 1 \\
\end{pmatrix}\\\\
X = XK
\end{array}\]
<p><img src="/assets/img/posts/fit_pretrans.png" alt="" class="img-fluid" /></p>
<p>A polynomial of degree $n = 26$ is then fit to the transformed curve. Next we apply the inverse of
the linear transformations previously applied.</p>
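<p>The forward and inverse transformations can be sketched in a few lines. The example below treats each sample as a row vector, matching $X = AR$ followed by $X = XK$, and undoes both at once with $(RK)^{-1}$. The three sample points stand in for real solver output and are made up for illustration. The polynomial fit itself is not reproduced here.</p>

```python
import math

def matmul(a, b):
    """Multiply a list of 2-element row vectors (or a 2x2 matrix) by a 2x2 matrix."""
    return [[sum(row[k] * b[k][j] for k in range(2)) for j in range(2)] for row in a]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

theta, k = math.pi / 16, 0.03
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
K = [[1.0, k], [0.0, 1.0]]

A = [[1.6, 0.0], [0.0, 2.1], [-1.6, 0.3]]    # hypothetical samples, one (x, y) row each
X = matmul(matmul(A, R), K)                   # forward: rotate, then shear
restored = matmul(X, inverse(matmul(R, K)))   # undo both: X (R K)^{-1}
print(restored)
```

<p>Both matrices have determinant one, so the combined transformation is always invertible and no information about the curve is lost.</p>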
<p><img src="/assets/img/posts/fit_pretest.png" alt="" class="img-fluid" /></p>
<p>The moment of truth:</p>
<p><img src="/assets/img/posts/fit_test.png" alt="" class="img-fluid" /></p>
<p>While the fit may not be anywhere near perfect, it is a good first
attempt and perhaps worthy of more exploration at a later time.</p>
Installing and using Tensorflow with TF-TRT on the Jetson Nano
2019-04-08T12:00:00+00:00
https://csvance.github.io/blog/2019/04/08/installing-tensorflow-tftrt-jetson-nano
<p>One of the great things to release alongside the <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/">Jetson Nano</a> is <a href="https://developer.nvidia.com/embedded/jetpack">Jetpack 4.2</a>, which includes support for <a href="https://developer.nvidia.com/tensorrt">TensorRT</a> in python. One of the easiest ways to get started with TensorRT is using the <a href="https://github.com/tensorflow/tensorrt">TF-TRT interface</a>, which lets us seamlessly integrate TensorRT with a <a href="http://tensorflow.org">Tensorflow</a> graph even if some layers are not supported. Of course this means we can easily accelerate <a href="https://keras.io">Keras</a> models as well!</p>
<p>nVidia now provides a <a href="https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/">prebuilt Tensorflow</a> for Jetson that we can install through pip, but we also need to make sure certain dependencies are satisfied.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>apt <span class="nb">install </span>python3-numpy python3-markdown python3-mock python3-termcolor python3-astor libhdf5-dev</code></pre></figure>
<p>Follow the instructions here to install tensorflow-gpu on Jetpack 4.2: <a href="https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/">https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-</a></p>
<p>Now that Tensorflow is installed on the Nano, let’s load a pretrained MobileNet from Keras and take a look at its performance with and without TensorRT for binary classification.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">tensorflow.keras</span> <span class="k">as</span> <span class="n">keras</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">Model</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Flatten</span>
<span class="n">mobilenet</span> <span class="o">=</span> <span class="n">keras</span><span class="p">.</span><span class="n">applications</span><span class="p">.</span><span class="n">mobilenet</span><span class="p">.</span><span class="n">MobileNet</span><span class="p">(</span><span class="n">include_top</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">weights</span><span class="o">=</span><span class="s">'imagenet'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span><span class="p">)</span>
<span class="n">mobilenet</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">Flatten</span><span class="p">()(</span><span class="n">mobilenet</span><span class="p">.</span><span class="n">output</span><span class="p">)</span>
<span class="n">new_output</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'sigmoid'</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="n">mobilenet</span><span class="p">.</span><span class="nb">input</span><span class="p">,</span> <span class="n">outputs</span><span class="o">=</span><span class="n">new_output</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
<span class="c1"># TODO: Train your model for binary classification task
</span>
<span class="n">model</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'mobilenet.h5'</span><span class="p">)</span></code></pre></figure>
<p>Next we can run inference with different settings using <a href="https://gist.github.com/csvance/47ec78d67894c0d454ca98029d4d323c">this script</a> (thanks to <a href="https://github.com/jeng1220">jeng1220</a> for the <a href="https://github.com/jeng1220/KerasToTensorRT">Keras to TF-TRT code</a>).</p>
<p>You will need to install plac to run the script: <code class="language-plaintext highlighter-rouge">pip3 install --user plac</code></p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Tensorflow Standard Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> TF mobilenet.h5
<span class="c"># Time = 4.19 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 4.19 = 7.16 FPS</span>
<span class="c"># TensorRT FP32 Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> FP32 mobilenet.h5
<span class="c"># Time = 0.96 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 0.96 = 31.3 FPS</span>
<span class="c"># TensorRT FP16 Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> FP16 mobilenet.h5
<span class="c"># Time = 0.84 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 0.84 = 35.7 FPS</span></code></pre></figure>
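<p>The frame rates in the comments above are simply samples divided by elapsed time. The sketch below repeats that arithmetic from the rounded timings and adds the speedup relative to plain Tensorflow. The dictionary labels are my own shorthand.</p>

```python
def fps(samples, seconds):
    """Frames per second for a timed batch of inferences."""
    return samples / seconds

# Seconds taken for 30 samples, copied from the runs above (rounded values).
timings = {"TF": 4.19, "TF-TRT FP32": 0.96, "TF-TRT FP16": 0.84}

baseline = fps(30, timings["TF"])
for name, seconds in timings.items():
    rate = fps(30, seconds)
    print(f"{name:12s} {rate:5.1f} FPS  ({rate / baseline:.1f}x vs TF)")
```

<p>Because the timings are rounded to two decimals, the last digit of each FPS figure should be read as approximate.</p>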
<p>It looks like TensorRT makes a significant difference vs simply running the inference in Tensorflow! Stay tuned for my next steps on the Nano: implementing and optimizing MobileNet SSD object detection to run at 30+ FPS!</p>
Using the new Jetson.GPIO python library in Jetpack 4.2
2019-03-21T12:00:00+00:00
https://csvance.github.io/blog/2019/03/21/jetpack-42-jetson-gpio
<p>With the release of the new <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/">Jetson Nano</a> also comes the <a href="https://developer.nvidia.com/embedded/jetpack">4.2 release of nVidia’s Jetpack BSP for the Jetson</a>. This included a new python library called Jetson.GPIO which provides a familiar interface for anyone who has used RPi.GPIO before. However, it doesn’t seem to be installed by default, so here are the instructions for getting it loaded into python!</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Setup groups/permissions</span>
<span class="nb">sudo </span>groupadd <span class="nt">-f</span> <span class="nt">-r</span> gpio
<span class="nb">sudo </span>usermod <span class="nt">-a</span> <span class="nt">-G</span> gpio your_user_name
<span class="nb">sudo cp</span> /opt/nvidia/jetson-gpio/etc/99-gpio.rules /etc/udev/rules.d/
<span class="nb">sudo </span>udevadm control <span class="nt">--reload-rules</span> <span class="o">&&</span> <span class="nb">sudo </span>udevadm trigger
<span class="c"># Reboot required for changes to take effect</span>
<span class="nb">sudo </span>reboot</code></pre></figure>
<p>Now we need to install Jetson.GPIO:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>pip3 <span class="nb">install </span>Jetson.GPIO
<span class="nb">sudo </span>pip <span class="nb">install </span>Jetson.GPIO</code></pre></figure>
<p>After this we should be able to import the library in both versions of python:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">nvidia@tx2:~<span class="nv">$ </span>python3
Python 3.6.7 <span class="o">(</span>default, Oct 22 2018, 11:32:17<span class="o">)</span>
<span class="o">[</span>GCC 8.2.0] on linux
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for </span>more information.
<span class="o">>>></span> import Jetson.GPIO
<span class="o">>>></span>
nvidia@tx2:~<span class="nv">$ </span>python
Python 2.7.15rc1 <span class="o">(</span>default, Nov 12 2018, 14:31:15<span class="o">)</span>
<span class="o">[</span>GCC 7.3.0] on linux2
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for </span>more information.
<span class="o">>>></span> import Jetson.GPIO
<span class="o">>>></span></code></pre></figure>
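<p>Once the import works, a first test might look like the sketch below. It relies only on the RPi.GPIO-style calls that Jetson.GPIO mirrors (<code>setmode</code>, <code>setup</code>, <code>output</code>, <code>cleanup</code>). The <code>blink</code> helper, the choice of pin 12, and the timing are assumptions for illustration, so check the pinout of your board before wiring anything.</p>

```python
import time

def blink(gpio, pin, times=3, interval=0.5):
    """Toggle an output pin using an RPi.GPIO-style module passed in as `gpio`."""
    gpio.setmode(gpio.BOARD)  # address pins by physical header number
    gpio.setup(pin, gpio.OUT, initial=gpio.LOW)
    try:
        for _ in range(times):
            gpio.output(pin, gpio.HIGH)
            time.sleep(interval)
            gpio.output(pin, gpio.LOW)
            time.sleep(interval)
    finally:
        gpio.cleanup()  # release the pin even if interrupted

# On a Jetson with the library installed:
#   import Jetson.GPIO as GPIO
#   blink(GPIO, pin=12)
```

<p>Passing the module in as an argument is a design choice that lets the same function be exercised with a stand-in object on a machine without GPIO hardware.</p>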
<p>Happy hacking!</p>
TensorRT ROS Nodes for nVidia Jetson
2018-10-11T12:00:00+00:00
https://csvance.github.io/blog/2018/10/11/tensorrt-ros-nodes-nvidia-jetson
<p>During the past few months I have been working towards making high performance deep learning inferences much more accessible in <a href="http://www.ros.org">ROS</a> on the <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems-dev-kits-modules/">nVidia Jetson TX2</a>. The result is <a href="https://github.com/csvance/jetson_tensorrt">jetson_tensorrt</a>: a collection of optimized <a href="https://developer.nvidia.com/tensorrt">TensorRT</a> based nodes and nodelets specifically tailored to the Jetson platform. To start out with, classification and object detection are supported for <a href="https://developer.nvidia.com/digits">nVidia DIGITS</a> ImageNet and DetectNets.</p>
<p>Here is some example output in rviz from a single class pedestrian detector using an Intel RealSense D435:
<img src="https://csvance.github.io/assets/img/posts/tensorrt_detect.jpg" alt="detect" class="img-fluid" /></p>
<p align="center">
<b>DetectNet Object Detection</b><br />
</p>
<p>Support is planned for SegNets as well.</p>
<p>Check it out on <a href="https://github.com/csvance/jetson_tensorrt">Github</a>!</p>
Keras/Tensorflow, TensorRT, and Jetson
2018-05-23T03:00:00+00:00
https://csvance.github.io/blog/2018/05/23/keras-tensorrt-jetson
<p>nVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. In order to achieve the full benefits of the platform, a framework called TensorRT drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is the python API for TensorRT only supports x86 based architectures. This leaves us with no real easy way of taking advantage of the benefits of TensorRT. However, there is a harder way that does work: To achieve maximum inference performance we can export and convert our model to .uff format, and then load it in TensorRT’s C++ API.</p>
<h2 id="1-training-and-exporting-to-pb">1. Training and exporting to .pb</h2>
<ul>
<li>Train your model</li>
<li>If using Jupyter, restart the kernel you trained your model in to remove training layers from the graph</li>
<li>Reload the model’s weights</li>
<li>Use an export function like the one in <a href="https://github.com/csvance/keras-tensorrt-jetson/blob/master/training/training.ipynb">this notebook</a> to export the graph to a .pb file</li>
</ul>
<h2 id="2-converting-pb-to-uff">2. Converting .pb to .uff</h2>
<p>I suggest using the <a href="https://github.com/chybhao666/TensorRT">chybhao666/cuda9_cudnn7_tensorrt3.0:latest Docker container</a> to access the script needed for converting a .pb export from Keras/Tensorflow to .uff format for TensorRT import.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /usr/lib/python2.7/dist-packages/uff/bin
# List Layers and manually pick out the output layer
# For most networks it will be dense_x/BiasAdd, the last one that isn't a placeholder or activation layer
python convert_to_uff.py tensorflow --input-file /path/to/graph.pb -l
# Convert to .uff, replace dense_1/BiasAdd with the name of your output layer
python convert_to_uff.py tensorflow -o /path/to/graph.uff --input-file /path/to/graph.pb -O dense_1/BiasAdd
</code></pre></div></div>
<p>More information on the .pb export and .uff conversion is available from <a href="https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#exporttftouff">nVidia</a>.</p>
<h2 id="3-loading-the-uff-into-tensorrt-c-inference-api">3. Loading the .uff into TensorRT C++ Inference API</h2>
<p>I have created a generic class which can load the graph from a .uff file and setup TensorRT for inference while taking care of all host / device CUDA memory management behind the scenes. It supports any number of inputs and outputs and is available on my <a href="https://github.com/csvance/keras-tensorrt-jetson/blob/master/inference/">Github</a>. It can be built with <a href="https://developer.nvidia.com/nsight-eclipse-edition">nVidia nSight Eclipse Edition</a> using a remote toolchain <a href="https://devblogs.nvidia.com/remote-application-development-nvidia-nsight-eclipse-edition/">(instructions here)</a></p>
<h3 id="caveats">Caveats</h3>
<ul>
<li>Keep in mind that many layers are not supported by TensorRT 3.0. The most obvious omission is BatchNorm, which is used in many different types of deep neural nets.</li>
<li>Concatenate only works on the channel axis, and only if all other dimensions match. If you have multiple convolution paths, you can concatenate them only when their outputs have the same dimensions.</li>
</ul>
<h1>Turn Based Games and 1v1 DQNs</h1>
<p>Carroll Vance, 2018-01-09T18:04:00+00:00, <a href="https://csvance.github.io/blog/2018/01/09/turn-based-games-1v1-dqn">https://csvance.github.io/blog/2018/01/09/turn-based-games-1v1-dqn</a></p>
<h2 id="background">Background</h2>
<p>At this point, one would have to be living under a rock to have not heard of <a href="https://deepmind.com">DeepMind’s</a> success at building an agent that taught itself to play Go through self-play, without any feature engineering. However, most tutorials available online about <a href="https://deepmind.com/research/dqn/">Deep Q Networks</a> come from an entirely different angle: learning how to play various single player games in the <a href="https://github.com/openai/gym">OpenAI Gym</a>. If one simply applies these examples to turn-based games in which the AI learns by playing itself, a world of hurt is in store for several reasons:</p>
<ul>
<li>In standard DQN learning, the target reward is computed from the next state after an action is taken. However, in a turn-based dueling game the next state is used by the enemy of the agent who took the action. To further complicate matters, the next state generated from an action is in the perspective of the agent who took the action. If we attempt to implement standard DQN, we train the agent with data from the wrong game context and assign rewards for the wrong perspective.</li>
<li>Many turn based dueling games only have a win condition rather than a score which can be used for rewards. This complicates both measuring a DQN’s performance and assigning rewards.</li>
</ul>
<h2 id="state-and-perspective">State and Perspective</h2>
<p>First of all, in a game where an agent plays itself from multiple perspectives, we must be careful that the correct perspective is provided when making predictions or computing discounted future rewards for training. For example, let us consider the game <a href="https://en.wikipedia.org/wiki/Connect_Four">Connect Four</a>. Instead of viewing the game as a battle between a red agent and a black agent, we can view each state from the perspective of the agent who is acting on it. For example, when the agent who takes the second turn blocks the agent who went first, the following next state is generated:</p>
<p><img src="https://csvance.github.io/assets/img/posts/perspective_a.png" alt="perspective" class="img-fluid" /></p>
<p>However, this next state wouldn’t be used by the agent who went second to take an action. It is going to be used by the agent who went first, but it needs to be inverted to their perspective before it can be used:</p>
<p><img src="https://csvance.github.io/assets/img/posts/perspective_b.png" alt="perspective" class="img-fluid" /></p>
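<p>As a rough illustration of what inverting the perspective could look like, here is a minimal sketch. The board encoding is an assumption made for this example and is not taken from my Connect Four code: <code>1</code> marks the pieces of the agent whose perspective the board is in, <code>-1</code> marks the opponent's pieces, and <code>0</code> is an empty cell. The helper name <code>invert_perspective</code> is likewise just illustrative.</p>

```python
# Assumed encoding (hypothetical, for illustration only):
#   1 = pieces of the agent whose perspective this is
#  -1 = pieces of the opponent
#   0 = empty cell


def invert_perspective(board):
    """Return a new board as seen by the other agent."""
    return [[-cell for cell in row] for row in board]


# A small 3x3 cut-out of a board, seen by the agent who just moved.
board = [
    [0, 0, 0],
    [0, -1, 0],
    [1, 1, -1],
]

# The same position as the other agent sees it.
flipped = invert_perspective(board)
```

<p>With this encoding a single network can play both sides, because every state it sees is always expressed from the point of view of the agent about to use it.</p>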
<p>However, this is not the only tweak needed to get DQN working with a dueling turn based game. Let us recall how the discounted future reward is calculated:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">future_reward = reward + gamma * amax(predict(next_state))</code></li>
<li>gamma: discount factor, for example 0.9</li>
<li>reward: the reward the agent received for taking an action</li>
<li>next_state: the state generated from applying an action to the original state</li>
<li>amax: selects the highest value from the result</li>
</ul>
<p>Remember, next_state will be the enemy agent’s state. So if we simply implement this formula, we are predicting the discounted future reward that the enemy agent might receive, not our own. We must predict one more state into the future in order to propagate the discounted future reward:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># We must invert the perspective of next_state so it is in the perspective of the enemy of the player who took the action which resulted in next_state
</span><span class="n">next_state</span><span class="p">.</span><span class="n">invert_perspective</span><span class="p">()</span>
<span class="c1"># Predict the action the enemy is most likely to take
</span><span class="n">enemy_action</span> <span class="o">=</span> <span class="n">argmax</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">next_state</span><span class="p">))</span>
<span class="c1"># Apply the action and invert the perspective back to the original one
</span><span class="n">true_next_state</span> <span class="o">=</span> <span class="n">next_state</span><span class="p">.</span><span class="nb">apply</span><span class="p">(</span><span class="n">enemy_action</span><span class="p">)</span>
<span class="n">true_next_state</span><span class="p">.</span><span class="n">invert_perspective</span><span class="p">()</span>
<span class="c1"># Finally calculate discounted future reward
</span><span class="n">future_reward</span> <span class="o">=</span> <span class="n">reward</span> <span class="o">+</span> <span class="n">gamma</span> <span class="o">*</span> <span class="n">amax</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">true_next_state</span><span class="p">))</span></code></pre></figure>
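<p>To make the steps above concrete, here is a self-contained sketch on a toy game. Everything in it is an assumption for illustration: the state is a tuple of cells (<code>1</code> for the viewing agent, <code>-1</code> for the opponent, <code>0</code> for empty), <code>predict</code> is a stand-in for the Q-network, and the terminal-state checks are my addition, since the snippet above does not show how finished games are handled.</p>

```python
def invert(state):
    """Swap the viewpoint: own pieces become the opponent's and vice versa."""
    return tuple(-cell for cell in state)


def apply_action(state, action):
    """Place a piece for the viewing agent (always encoded as 1)."""
    cells = list(state)
    cells[action] = 1
    return tuple(cells)


def is_terminal(state):
    """Toy rule: the game ends when no empty cell is left."""
    return 0 not in state


def predict(state):
    """Stand-in for the Q-network: one value per cell, occupied cells masked low."""
    return [0.1 * (i + 1) if cell == 0 else -1.0 for i, cell in enumerate(state)]


def two_ply_target(reward, next_state, gamma=0.9):
    """Discounted future reward, looking past the enemy's predicted reply."""
    # next_state is still in the perspective of the agent that just moved.
    if is_terminal(next_state):
        return reward
    # See the board as the enemy does and pick the reply it is most likely to make.
    enemy_view = invert(next_state)
    q_values = predict(enemy_view)
    enemy_action = max(range(len(q_values)), key=q_values.__getitem__)
    after_enemy = apply_action(enemy_view, enemy_action)
    if is_terminal(after_enemy):
        return reward
    # Back to our own perspective before estimating our future reward.
    true_next_state = invert(after_enemy)
    return reward + gamma * max(predict(true_next_state))


target = two_ply_target(0.0, (1, -1, 0, 0))
```

<p>Returning the plain reward when the enemy's reply ends the game is the simplest possible choice for this sketch. A real implementation would probably want to propagate a loss there instead.</p>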
<p>I have also tried subtracting the enemy's reward from the reward of the agent that took the original action, but have not been able to measure good long or short term results with this policy.</p>
<h2 id="win-conditions-and-rewards">Win Conditions and Rewards</h2>
<p>Another problem with certain board games such as Connect Four is that they have no objective way of keeping score. There is only reward for victory and punishment for failure. I have had luck using 1.0 for victory, -1.0 for failure, and 0.0 for all other moves. Samples from ties and from games that duplicate the previous game should be discarded, as they don’t contain any useful information and will only serve to pollute our replay memory.</p>
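<p>A minimal sketch of this reward scheme follows. The data layout is hypothetical: a game is a list of <code>(player, state, action)</code> moves in play order, the last move is assumed to be the winning one unless the game is a tie, and "duplicate" is read as "identical to the game played immediately before". The function names are illustrative only.</p>

```python
WIN, LOSS, NEUTRAL = 1.0, -1.0, 0.0


def label_game(moves, tie):
    """Turn one finished game into (state, action, reward) samples."""
    if tie:
        return []  # ties carry no win/loss signal, so they are dropped
    winner = moves[-1][0]
    # Index of each player's final move; only those moves get a non-zero reward.
    last_index = {}
    for i, (player, _, _) in enumerate(moves):
        last_index[player] = i
    samples = []
    for i, (player, state, action) in enumerate(moves):
        if i == last_index[player]:
            reward = WIN if player == winner else LOSS
        else:
            reward = NEUTRAL
        samples.append((state, action, reward))
    return samples


def collect(games):
    """Label a stream of (moves, tie) games, skipping back-to-back duplicates."""
    memory, previous = [], None
    for moves, tie in games:
        if moves != previous:
            memory.extend(label_game(moves, tie))
        previous = moves
    return memory


game = [("A", "s0", 3), ("B", "s1", 3), ("A", "s2", 4)]
tied = [("A", "s0", 0), ("B", "s1", 1)]
memory = collect([(game, False), (game, False), (tied, True)])
```

<p>Here the repeated game and the tie contribute nothing, so only the three samples from the first game end up in the replay memory.</p>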
<h2 id="measuring-performance">Measuring Performance</h2>
<p>One major challenge of DQNs with only win / loss conditions is measuring the network’s performance over time. I have found a few ways to do this, including having the agent play a symbolic AI that maximizes short-term reward every N games as validation. If our agent cannot beat an agent that only thinks in the short term, then we need to continue making changes to the network structure, hyper-parameters, and feature representation. Beating this short-sighted AI consistently should be our first goal.</p>
<h2 id="network-stability">Network Stability</h2>
<p>A common mistake when creating a DQN is starting with a network that has too few dimensions. This can cause serious aliasing in our predictions, resulting in an unstable network. Generally speaking, it is better to start with a wide network and then test how much it can be slimmed down.</p>
<p>We must also make sure our training data and labels are formatted in a way to ensure stability. Rewards should be normalized in the [-1., 1.] range, and any discounted future reward which is outside of this range should be clipped.</p>
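<p>The clipping step can be sketched in a few lines. The function name and default values are assumptions for this example, and <code>next_q_max</code> stands for whatever the network predicts as the best value of the following state.</p>

```python
def clipped_target(reward, next_q_max, gamma=0.9, low=-1.0, high=1.0):
    """Discounted future reward, clipped to the range used for rewards."""
    target = reward + gamma * next_q_max
    return max(low, min(high, target))


# A winning move followed by an optimistic prediction would exceed 1.0 unclipped.
win_target = clipped_target(1.0, 0.8)
loss_target = clipped_target(-1.0, -0.5)
```

<p>Keeping the labels in the same range as the rewards means the network never has to regress toward values it could not legitimately earn in a single game.</p>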
<p>Another factor in network stability is our experience replay buffer size. Too small and our network will forget things it learned in the past, too big and it will take excessive time to learn. I find it is generally better to start smaller while testing whether the network is able to learn simple gameplay, and to increase the size as training time increases and we want to ensure network stability. People smarter than I, such as Schaul et al. (2015), have proposed ways to get more out of the experience replay buffer: <a href="https://arxiv.org/abs/1511.05952">Prioritized Experience Replay</a> replays important transitions more frequently instead of sampling uniformly, which may be worth investigating if you are unsure how to tune this.</p>
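<p>For reference, a fixed-size replay buffer with uniform sampling can be sketched with the standard library alone. The class name <code>ReplayMemory</code> and its methods are illustrative and not taken from my agent's code. Once the buffer is full, the oldest experience is dropped automatically.</p>

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size experience buffer that forgets the oldest transitions first."""

    def __init__(self, capacity):
        # deque with maxlen discards items from the left once it is full
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Uniformly sample up to batch_size transitions without replacement."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.add(step)
batch = memory.sample(2)
```

<p>Changing <code>capacity</code> is then the single knob for the trade-off described above.</p>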
<p>Another factor to consider is the optimizer learning rate. A high learning rate can create instabilities in the neural network’s state approximation behavior, resulting in all kinds of catastrophic forgetting. Starting at 0.001 is a good idea, and if you notice instabilities with this, try decreasing it from there. I find that 0.0001 works best for longer training sessions.</p>
<p>Finally, techniques used in deep neural networks such as dropout and batchnorm have a negative impact on Deep-Q Learning. I suggest watching <a href="https://www.youtube.com/watch?v=fevMOp5TDQs">Deep RL Bootcamp Lecture 3: Deep Q-Networks</a> if you are interested in more information on this.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Deep Q Learning proves to be both extremely interesting and challenging. While I am not completely happy with my own results in training a DQN for Connect Four, I think it is at least worth posting some of the things I have learned from the experience. My current agent can be found at the link below.</p>
<ul>
<li><a href="https://github.com/csvance/deep-learning-connect-four">Github: DQN AI for Connect Four</a></li>
</ul>
<h2 id="references">References</h2>
<ul>
<li><a href="https://keon.io/deep-q-learning/">Deep Q-Learning with Keras and Gym</a></li>
<li><a href="https://www.youtube.com/watch?v=fevMOp5TDQs">Deep RL Bootcamp Lecture 3: Deep Q-Networks</a></li>
</ul>
<h1>The RNN Sequence Prediction Seed Problem and How To Solve It</h1>
<p>Carroll Vance, 2017-12-29T18:04:00+00:00, <a href="https://csvance.github.io/blog/2017/12/29/RNN-seed-problem">https://csvance.github.io/blog/2017/12/29/RNN-seed-problem</a></p>
<h2 id="the-problem">The Problem</h2>
<p>There are many <a href="https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/">tutorials</a> on how to create <a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">Recurrent Neural Networks</a> and use them for sequence generation. However, most of these tutorials show an example where an initial seed value must be used to start the generation process. This is highly impractical for a query response generation scheme. Luckily, there is a fairly easy way to solve this involving how we format our training data.</p>
<h2 id="solution">Solution</h2>
<h3 id="empty-item">Empty Item</h3>
<p>First of all, we need to make sure we have a dummy value which represents an empty sequence item. It is convenient to use zero for this, because most padding functions are going to pad with zeros, and functions like np.zeros make it easy to initialize this. We will represent this value as NULL for simplicity.</p>
<h3 id="end-of-sequence-item">End of Sequence Item</h3>
<p>We need a second special value for marking the end of a sequence. Call it EOS for short, and it can be whatever value you want provided it stays consistent.</p>
<h3 id="sequence-steps">Sequence Steps</h3>
<p>Instead of feeding entire sequences for training, we are going to step through each sequence, generating subsequences starting with all empty values and adding one more item in each step. The label for each subsequence will simply be the next word in the sequence, or EOS if we reach the end of the sequence. For example, take the sequence "The quick brown fox jumps over the lazy dog". This will break down into the following sequences:</p>
<pre><code class="python">
"NULL NULL NULL NULL NULL NULL NULL NULL NULL" -> The
"The NULL NULL NULL NULL NULL NULL NULL NULL" -> quick
"The quick NULL NULL NULL NULL NULL NULL NULL" -> brown
"The quick brown NULL NULL NULL NULL NULL NULL" -> fox
"The quick brown fox NULL NULL NULL NULL NULL" -> jumps
"The quick brown fox jumps NULL NULL NULL NULL" -> over
"The quick brown fox jumps over NULL NULL NULL" -> the
"The quick brown fox jumps over the NULL NULL" -> lazy
"The quick brown fox jumps over the lazy NULL" -> dog
"The quick brown fox jumps over the lazy dog" -> EOS
</code></pre>
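<p>The breakdown above can be generated with a short function. This is a sketch under a few assumptions: items are already mapped to integer ids, <code>NULL</code> is 0 as suggested earlier, <code>EOS</code> is arbitrarily given id 1, and the sequence is no longer than <code>max_len</code>. The name <code>training_pairs</code> is illustrative.</p>

```python
NULL = 0  # empty item, also the usual padding value
EOS = 1   # assumed id for the end-of-sequence marker


def training_pairs(sequence, max_len):
    """Build (padded_prefix, label) pairs for one sequence of item ids."""
    pairs = []
    for step in range(len(sequence) + 1):
        prefix = sequence[:step]
        # Fill the unused positions with NULL so every input has max_len items.
        padded = prefix + [NULL] * (max_len - len(prefix))
        # The label is the next item, or EOS once the sequence is exhausted.
        label = sequence[step] if step != len(sequence) else EOS
        pairs.append((padded, label))
    return pairs


words = "The quick brown fox jumps over the lazy dog".split()
# Hypothetical vocabulary: ids start at 2 because 0 and 1 are reserved.
vocab = {word: index + 2 for index, word in enumerate(dict.fromkeys(words))}
ids = [vocab[word] for word in words]
pairs = training_pairs(ids, max_len=len(ids))
```

<p>The first pair is an all-NULL input labelled with the id of "The", and the last pair is the full sentence labelled with EOS, matching the first and last rows of the listing.</p>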
<h3 id="sequence-size">Sequence Size</h3>
<p>If you have a sequence longer than your maximum size, start removing the first element before you append a new one. This will have no bearing on prediction, because the network sees these truncated windows during training and learns how to handle them.</p>
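<p>A sketch of this sliding window, using the same assumptions as before (integer ids, <code>NULL</code> as 0, and an arbitrarily chosen <code>EOS</code> id of 1). The function name <code>windowed_pairs</code> is hypothetical.</p>

```python
NULL = 0  # empty item / padding value
EOS = 1   # assumed id for the end-of-sequence marker


def windowed_pairs(sequence, max_len):
    """Build (input, label) pairs, dropping the oldest item once the window is full."""
    pairs = []
    window = []
    for item in sequence + [EOS]:
        # Pad the current window with NULL up to the fixed input size.
        padded = window + [NULL] * (max_len - len(window))
        pairs.append((padded, item))
        # Remove the first element before appending when the window is full.
        if len(window) == max_len:
            window.pop(0)
        window.append(item)
    return pairs


pairs = windowed_pairs([2, 3, 4, 5], max_len=3)
```

<p>For sequences that fit inside <code>max_len</code> this produces the same pairs as stepping through the sequence normally. The window only starts sliding once it is full.</p>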
<h2 id="conclusion">Conclusion</h2>
<p>We can now train the network and start by feeding it a sequence filled with NULLs to predict the first value. Here is an example doing this using <a href="https://keras.io">Keras</a>:</p>
<ul>
<li><a href="https://github.com/csvance/armchair-expert/blob/master/models/structure.py">Preprocessing &amp; Model</a></li>
</ul>
<h1>Hello World!</h1>
<p>Carroll Vance, 2017-12-09T19:40:48+00:00, <a href="https://csvance.github.io/blog/2017/12/09/hello-world">https://csvance.github.io/blog/2017/12/09/hello-world</a></p>
<p>Hi, and welcome to my portfolio and blog. I will be chronicling my explorations in machine learning, artificial intelligence, and software engineering here.</p>