Jekyll2020-06-27T17:51:39+00:00https://csvance.github.io/feed.xmlCarroll VanceCarroll VanceOrdinary Differential Equations: Limit Sets and Long Term Behavior2019-05-09T12:00:00+00:002019-05-09T12:00:00+00:00https://csvance.github.io/blog/2019/05/09/ode-limit-sets-long-term-behavior<p>I wrote my Ordinary Differential Equations term project in <a href="https://www.latex-project.org">LaTeX</a> and thought it would be fun to see if I could convert it to markdown and post it here. Apparently <a href="https://www.mathjax.org">MathJax</a> makes this easy, so here it is!</p> <h1 id="abstract">Abstract</h1> <p>This paper addresses the long-term behavior of systems of ordinary differential equations (hereafter ODEs) in the one and two dimensional cases. The specific behavior of interest is:</p> <script type="math/tex; mode=display">\lim_{t\to\infty} y(t) \neq\pm\infty</script> <p>where ${y}(t)$ is a solution to a system of ODEs. Families of solutions will be considered analytically and numerically for behavior such as periodic orbits and fixed points. The considered problems will be plotted and have their limit sets enumerated.</p> <h1 id="dimension-d--1">Dimension D = 1</h1> <p>Consider the first order linear ODE:</p> <script type="math/tex; mode=display">y\prime = y</script> <p>By simply looking at the equation, it can be observed that it is autonomous (the independent variable $t$ does not appear explicitly in the ODE). From this we know that the direction field does not change with $t$, so the behavior of a solution depends only on its current value. Solving the differential equation through integration provides a solution:</p> <script type="math/tex; mode=display">y(t) = C_1e^t</script> <p>Analytically, the initial condition $y(0) = 0$ gives $C_1 = 0$. 
Consequently, $y(t) = 0$ for every value of $t$:</p> <script type="math/tex; mode=display">\lim_{t\to\infty} y(t) = 0 \textrm{ where } y(0) = 0</script> <p>For any other initial condition, it can be observed that $y(t)$ will grow without bound:</p> <p><img src="/assets/img/posts/d1.png" alt="" class="img-fluid" /></p> <p>Because $y\prime = y$ is a one dimensional system, its phase plane only has a single dimension $y(t)$. The limit set includes all points in this dimension such that:</p> <script type="math/tex; mode=display">\lim_{t\to\infty} y(t) \neq\pm\infty | t \in \mathbb{R}</script> <p>It directly follows that the limit set for the ODE $y\prime = y$ is $\{(0)\}$.</p> <h1 id="dimension-d--2">Dimension D = 2</h1> <h4 id="damped-vibrating-spring">Damped Vibrating Spring</h4> <p>Consider the second order system of linear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} y\prime=v\\ v\prime=-4y-2v \end{array}</script> <p>This is derived from a second order unforced and damped harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2y = 0$ where $c = 1$ and $\omega_0 = 2$. When $c &lt; \omega_0$ the system is under-damped. Knowing this, it should be expected that all solutions in the phase plane oscillate with decaying amplitude and converge to a single point as damping removes energy from the system. Consequently, the limit set of the system contains the point of convergence for all solutions as $t\to\infty$.</p> <p><img src="/assets/img/posts/d2_a_p.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_a_v.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_a_ph.png" alt="" class="img-fluid" /></p> <p>Plotting the solution curves displays behavior typical of a spiral sink, and the trace determinant plane confirms it: $D(A) = 4,T(A) = -2$, so $T^2 - 4D &lt; 0$ with $T &lt; 0$. Due to this behavior, all solution curves in the phase plane converge to a single two dimensional equilibrium point $(0, 0)$ as $t\to\infty$. It follows that the limit set for this ODE is $\{(0, 0)\}$. 
While this limit set resembles the limit set of the one dimensional case, it should be noted that the ODE in the one dimensional case only had a finite limit as $t\to\infty$ for a single IVP, whereas every IVP in the two dimensional case converges towards the origin.</p> <h4 id="undamped-vibrating-spring">Undamped Vibrating Spring</h4> <p>Consider the second order system of linear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} y\prime=v\\ v\prime=-4y \end{array}</script> <p>This is derived from a second order unforced and undamped harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2y = 0$ where $c = 0$ and $\omega_0 = 2$. When $c = 0$ the system is undamped. Knowing this, all solutions $y(t)$ and $y\prime(t)$ should be expected to be periodic, so they should form ellipse-like closed curves in the phase plane. Consequently, the limit set of the system should contain as many members as there are solutions to initial value problems. A plot confirms these suspicions:</p> <p><img src="/assets/img/posts/d2_b_p.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_b_v.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_b_ph.png" alt="" class="img-fluid" /></p> <p>This system clearly exhibits center behavior, which is confirmed by the trace determinant plane: $D(A) = 4,T(A) = 0$. It follows that the limit set contains every curve in the phase plane formed by the general solution of the ODE and its derivative:</p> <script type="math/tex; mode=display">\begin{array}{lcl} y(t) = C_1\cos(2t) + C_2\sin(2t)\\ y\prime(t) = -2C_1\sin(2t) + 2C_2\cos(2t)\\ \end{array}</script> <p>It should also be noted that the origin of the phase plane $(0, 0)$ is in the limit set. This can easily be verified by solving the initial value problem for $y(0) = 0, y\prime(0) = 0$. 
Finally, the limit set for this ODE can be characterized by the two behaviors mentioned above:</p> <ol> <li> <p>$(y=0,y\prime=0)$: Solutions that start at the origin stay at the origin as $t\to\infty$.</p> </li> <li> <p>$(y,y\prime) \ne (0,0)$: Solutions that start away from the origin stay on a periodic solution curve as $t\to\infty$, which is defined by the solution of the system for the initial value problem.</p> </li> </ol> <h4 id="undamped-pendulum">Undamped Pendulum</h4> <p>Consider the second order system of nonlinear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} \theta\prime=\omega\\ \omega\prime=-\sin(\theta)\\ \end{array}</script> <p>This is derived from the undamped pendulum ODE $\theta\prime\prime = -\sin(\theta)$. There are quite a few initial conditions $p_0$ to consider.</p> <ol> <li> <p>For $p_0\in\{ (\theta_0 = 2n\pi, \omega_0 = 0)| n\in\mathbb{Z} \}$, the phase plane solution should be a single point $p_0$ as the pendulum has no energy.</p> </li> <li> <p>For $p_0\in\{(\theta_0 \ne n\pi, \omega_0 = 0)| n\in\mathbb{Z}\}$ the phase plane solution should look like a center around a point $p_c \in \{ (\theta = 2n\pi, \omega = 0)| n\in\mathbb{Z} \}$ as the pendulum converts potential energy to kinetic energy, back to potential energy, and finally changes direction to repeat the process.</p> </li> <li> <p>For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \lessapprox 2.00027)\}$, the pendulum should behave like case two.</p> </li> <li> <p>For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \gtrapprox 2.00027)\}$, the pendulum no longer changes direction and $\lim_{t\to\infty} \theta(t) = \pm\infty$.</p> </li> <li> <p>For $p_0\in\{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0)| n\in\mathbb{Z} \}$, the phase plane solution should be a single point $p_0$ as the pendulum needs a nudge in either direction to start moving.</p> </li> </ol> <p>The plots below confirm the predicted behavior of solutions. 
The limit set can be broken down into three different cases:</p> <p><img src="/assets/img/posts/d2_c_p.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_c_v.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_c_ph.png" alt="" class="img-fluid" /></p> <ol> <li> <p>Solutions where $\omega$ takes both positive and negative values behave as a center around some point $p_c \in \{(\theta_0 = 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$. This is confirmed by looking at the trace determinant plane for the center equilibrium: $D(J)=1, T(J)=0$.</p> </li> <li> <p>$p_0 \in \{(\theta_0 = 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$: Solutions to this IVP stay at the angle they started at as $t\to\infty$. These equilibrium points are shared with case one.</p> </li> <li> <p>$p_0 \in \{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$: Solutions to this IVP stay at the angle they started at as $t\to\infty$. These equilibrium points are saddle points in the trace determinant plane: $D(J)=-1, T(J)=0$.</p> </li> </ol> <p>It is important to keep in mind that solutions where $\lim_{t\to\infty} \omega(t) \neq\pm\infty$ are not members of the limit set if $\lim_{t\to\infty} \theta(t) = \pm\infty$. This excludes any solutions for which $\omega$ is exclusively $&gt; 0$ or $&lt; 0$ as $t\to\infty$. However, we can still make a definitive statement as to the behavior of $\omega$ as $t\to\infty$: it remains bounded and never changes sign.</p> <h4 id="damped-pendulum">Damped Pendulum</h4> <p>Consider the second order system of nonlinear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} \theta\prime=\omega\\ \omega\prime=-\sin(\theta) - \frac{1}{2}\omega\\ \end{array}</script> <p>This is derived from the damped pendulum ODE $\theta\prime\prime = -\sin(\theta) - c\theta\prime$ with $c = \frac{1}{2}$. A reasonable guess is that it behaves much like the undamped case, except in the cases that resulted in endless oscillation. 
On the phase plane, these cases should converge to a point in $\{(\theta = 2n\pi, \omega = 0) | n\in\mathbb{Z}\}$ instead.</p> <p><img src="/assets/img/posts/d2_d_p.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_d_v.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_d_ph.png" alt="" class="img-fluid" /></p> <p>Based on analysis of the system and observed behavior of the numerical plot, a limit set can be constructed:</p> <ol> <li> <p>When $p_0 \in \{(\theta_0 = \pi + 2n\pi, \omega_0=0)| n\in\mathbb{Z}\}$, solutions remain where they started. This can be verified because these values are in the set of equilibrium points. The trace determinant plane shows saddle point behavior for this set of initial conditions: $D(J)=-1, T(J)=-\frac{1}{2}$.</p> </li> <li> <p>All solutions not included in case one converge to a spiral sink in the set $\{(\theta = 2n\pi, \omega = 0)| n\in\mathbb{Z}\}$ depending on initial conditions. This can be confirmed by recognizing that equilibrium points which follow the same pattern are all in the spiral sink region of the trace determinant plane: $D(J)=1, T(J)=-\frac{1}{2}$.</p> </li> </ol> <h4 id="competing-species">Competing Species</h4> <p>Consider the second order system of nonlinear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} x\prime = (1 - x - y)x\\ y\prime = (4 - 2x -7y)y\\ \end{array}</script> <p>In a competing species system, the populations of two species interact with each other and/or themselves. 
To analyze such a system for long term behavior, the equilibrium points should first be considered:</p> <script type="math/tex; mode=display">\begin{array}{lcl} (1-x-y)x = 0\\ (4-2x-7y)y=0\\ \end{array}</script> <p>Solving for the intersections of the x-nullclines and y-nullclines provides the set of equilibrium points $\{(0, 0), (1, 0), (0, \frac{4}{7}), ({\frac{3}{5}, \frac{2}{5}})\}$. Next, the system is linearized by computing the Jacobian:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{array}{lcl} J = \begin{pmatrix} \frac{\partial}{\partial x}(1 - x - y)x & \frac{\partial}{\partial y}(1 - x - y)x \\ \frac{\partial}{\partial x}(4 - 2x - 7y)y & \frac{\partial}{\partial y}(4 - 2x - 7y)y \end{pmatrix}\\ J = \begin{pmatrix} 1 - y - 2x & -x\\ -2y & 4-14y-2x \end{pmatrix}\\ \end{array} %]]></script> <p>Next the Jacobian is used to examine each equilibrium point in the trace determinant plane:</p> <script type="math/tex; mode=display">\begin{array}{lcl} (0, 0):\textrm{ Nodal Source}\\ D_{0, 0} = det_{0, 0}(J) = 4\\ T_{0, 0} = tr_{0, 0}(J) = 5\\ \\ (1, 0):\textrm{ Saddle Point}\\ D_{1, 0} = det_{1, 0}(J) = -2\\ T_{1, 0} = tr_{1, 0}(J) = 1\\ \\ (0, \frac{4}{7}):\textrm{ Saddle Point}\\ D_{0, \frac{4}{7}} = det_{0, \frac{4}{7}}(J) = -\frac{12}{7}\\ T_{0, \frac{4}{7}} = tr_{0, \frac{4}{7}}(J) = -\frac{25}{7}\\ \\ (\frac{3}{5}, \frac{2}{5}):\textrm{ Nodal Sink}\\ D_{\frac{3}{5}, \frac{2}{5}} = det_{\frac{3}{5}, \frac{2}{5}}(J) = \frac{6}{5}\\ T_{\frac{3}{5}, \frac{2}{5}} = tr_{\frac{3}{5}, \frac{2}{5}}(J) = -\frac{17}{5}\\ \end{array}</script> <p>Plotting for several initial value problems, a few things can be observed:</p> <ol> <li> <p>When $x$ and $y$ have initial values $&gt; 0$, their solutions gravitate towards $\frac{3}{5}$ and $\frac{2}{5}$. 
This is consistent with the predicted nodal sink behavior.</p> </li> <li> <p>Initial values of $x$ and $y$ that start at zero stay at zero, which is expected when starting exactly at the equilibrium point of a nodal source.</p> </li> <li> <p>When $y = 0$, initial values of $x &gt; 0$ gravitate towards one. This makes sense because $y = 0$ reduces the first equation to $x\prime = x(1 - x)$, which clearly has a root of $1$. It follows that it has an equilibrium point there as well. Examining the trace determinant of $(1, 0)$ places it in the region of saddle points.</p> </li> <li> <p>When $x = 0$, initial values of $y &gt; 0$ gravitate towards $\frac{4}{7}$. This makes sense considering the second equation is reduced to $y\prime = y(4 - 7y)$, which has a root $\frac{4}{7}$. Examining the trace determinant of this point, it clearly falls into the region of saddle points.</p> </li> </ol> <p><img src="/assets/img/posts/d2_e_ph.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_e_1_1.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_e_1_0.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_e_0_1.png" alt="" class="img-fluid" /> <img src="/assets/img/posts/d2_e_0_0.png" alt="" class="img-fluid" /></p> <p>Considering the above, the limit set contains individual points on the phase plane: $\{(0, 0), (1, 0), (0, \frac{4}{7}), ({\frac{3}{5}, \frac{2}{5}})\}$. 
This system contains more individual non periodic points in its limit set than any previously examined system, while containing no center curves.</p> <h4 id="van-der-pols-equation">Van der Pol’s Equation</h4> <p>Consider the following system of nonlinear ODEs:</p> <script type="math/tex; mode=display">\begin{array}{lcl} x\prime = 2x-y-x^3\\ y\prime = x\\ \end{array}</script> <p>To analyze this system, first the equilibrium points need to be found using the intersection of the nullclines:</p> <script type="math/tex; mode=display">\begin{array}{lcl} 2x-y-x^3 = 0\\ x = 0\\ S = \{(0, 0)\}\\ \end{array}</script> <p>Next, the system is linearized and the trace determinant of the equilibrium point computed:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{array}{lcl} J = \begin{pmatrix} \frac{\partial}{\partial x}(2x-y-x^3) & \frac{\partial}{\partial y}(2x-y-x^3)\\ \frac{\partial}{\partial x}x & \frac{\partial}{\partial y} x \end{pmatrix}\\ \\ J = \begin{pmatrix} 2 - 3x^2 & -1\\ 1 & 0 \end{pmatrix}\\ \\ (0, 0):\textrm{ Special Case / Degenerate Nodal Source}\\ D_{0, 0} = det_{0, 0}(J) = 1\\ T_{0, 0} = tr_{0, 0}(J) = 2\\ \end{array} %]]></script> <p>The equilibrium point is a special case with a repeated real eigenvalue, since $T^2 = 4D$. Plotting the phase plane reveals atypical behavior:</p> <p><img src="/assets/img/posts/d2_f_ph.png" alt="" class="img-fluid" /></p> <p>There are two clear behaviors here, both of which are clearly members of the limit set:</p> <ol> <li> <p>For $(x = 0, y = 0)$, the solution does not leave the origin of the phase plane.</p> </li> <li> <p>All other initial values seem to converge onto the exact same closed periodic curve. 
This is different from the previous center solutions, where there were infinitely many different closed paths depending on initial conditions.</p> </li> </ol> <p>Setting the initial condition to a point on the closed loop yields the following result:</p> <p><img src="/assets/img/posts/d2_f_loop.png" alt="" class="img-fluid" /></p> <h1 id="extra-work">Extra Work</h1> <h4 id="approximating-van-der-pols-solution-curve">Approximating Van der Pol’s Solution Curve</h4> <p>While I wasn’t able to find a closed form solution to Van der Pol’s equation, I was not content walking away without at least approximating a solution. Here are the steps I took to approximate the curve. First, we apply a rotation using a linear transformation with $\theta=\frac{\pi}{16}$ (matrix $A$ is the output of ODE45):</p> <p><img src="/assets/img/posts/fit_rotate.png" alt="" class="img-fluid" /></p> <script type="math/tex; mode=display">% <![CDATA[ \begin{array}{lcl} R = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \\ \end{pmatrix}\\\\ X = AR \end{array} %]]></script> <p>Next a shear transform is applied with $k = 0.03$:</p> <p><img src="/assets/img/posts/fit_shear.png" alt="" class="img-fluid" /></p> <script type="math/tex; mode=display">% <![CDATA[ \begin{array}{lcl} K = \begin{pmatrix} 1 & k \\ 0 & 1 \\ \end{pmatrix}\\\\ X = XK \end{array} %]]></script> <p><img src="/assets/img/posts/fit_pretrans.png" alt="" class="img-fluid" /></p> <p>A polynomial fit is found with $n = 26$. 
Next we apply the inverse of the linear transformations previously applied.</p> <p><img src="/assets/img/posts/fit_pretest.png" alt="" class="img-fluid" /></p> <p>The moment of truth:</p> <p><img src="/assets/img/posts/fit_test.png" alt="" class="img-fluid" /></p> <p>While the fit may not be anywhere near perfect, it is a good first attempt and perhaps worthy of more exploration at a later time.</p>Carroll VanceI wrote my Ordinary Differential Equations term project in LaTeX and thought it would be fun to see if I could convert it to markdown and post it here. Apparently MathJax makes this easy, so here it is!Installing and using Tensorflow with TF-TRT on the Jetson Nano2019-04-08T12:00:00+00:002019-04-08T12:00:00+00:00https://csvance.github.io/blog/2019/04/08/installing-tensorflow-tftrt-jetson-nano<p>One of the great things released alongside the <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/">Jetson Nano</a> is <a href="https://developer.nvidia.com/embedded/jetpack">Jetpack 4.2</a>, which includes support for <a href="https://developer.nvidia.com/tensorrt">TensorRT</a> in Python. One of the easiest ways to get started with TensorRT is using the <a href="https://github.com/tensorflow/tensorrt">TF-TRT interface</a>, which lets us seamlessly integrate TensorRT with a <a href="http://tensorflow.org">Tensorflow</a> graph even if some layers are not supported. 
Of course this means we can easily accelerate <a href="https://keras.io">Keras</a> models as well!</p> <p>nVidia now provides a <a href="https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/">prebuilt Tensorflow</a> for Jetson that we can install through pip, but we also need to make sure certain dependencies are satisfied.</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>apt <span class="nb">install </span>python3-numpy python3-markdown python3-mock python3-termcolor python3-astor libhdf5-dev</code></pre></figure> <p>Follow the instructions here to install tensorflow-gpu on Jetpack 4.2: <a href="https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/">https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-</a></p> <p>Now that Tensorflow is installed on the Nano, let’s load a pretrained MobileNet from Keras and take a look at its performance with and without TensorRT for binary classification.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">tensorflow.keras</span> <span class="k">as</span> <span class="n">keras</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">Model</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Flatten</span>

<span class="n">mobilenet</span> <span class="o">=</span> <span class="n">keras</span><span class="p">.</span><span class="n">applications</span><span class="p">.</span><span class="n">mobilenet</span><span class="p">.</span><span class="n">MobileNet</span><span class="p">(</span><span class="n">include_top</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">weights</span><span class="o">=</span><span class="s">'imagenet'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span><span class="p">)</span>
<span class="n">mobilenet</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>

<span class="n">x</span> <span class="o">=</span> <span class="n">Flatten</span><span class="p">()(</span><span class="n">mobilenet</span><span class="p">.</span><span class="n">output</span><span class="p">)</span>
<span class="n">new_output</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'sigmoid'</span><span class="p">)(</span><span class="n">x</span><span class="p">)</span>

<span class="n">model</span> <span class="o">=</span> <span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="n">mobilenet</span><span class="p">.</span><span class="nb">input</span><span class="p">,</span> <span class="n">outputs</span><span class="o">=</span><span class="n">new_output</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>

<span class="c1"># TODO: Train your model for binary classification task</span>

<span class="n">model</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'mobilenet.h5'</span><span class="p">)</span></code></pre></figure> <p>Next we can run inference with different settings using <a
href="https://gist.github.com/csvance/47ec78d67894c0d454ca98029d4d323c">this script</a> (thanks to <a href="https://github.com/jeng1220">jeng1220</a> for the <a href="https://github.com/jeng1220/KerasToTensorRT">Keras to TF-TRT code</a>).</p> <p>You will need to install plac to run the script: <code class="language-plaintext highlighter-rouge">pip3 install --user plac</code></p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Tensorflow Standard Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> TF mobilenet.h5
<span class="c"># Time = 4.19 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 4.19 = 7.16 FPS</span>

<span class="c"># TensorRT FP32 Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> FP32 mobilenet.h5
<span class="c"># Time = 0.96 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 0.96 = 31.3 FPS</span>

<span class="c"># TensorRT FP16 Inference</span>
python3 tftrt_inference.py <span class="nt">-S</span> 30 <span class="nt">-T</span> FP16 mobilenet.h5
<span class="c"># Time = 0.84 s</span>
<span class="c"># Samples = 30</span>
<span class="c"># FPS = Samples / Time = 30 / 0.84 = 35.7 FPS</span></code></pre></figure> <p>It looks like TensorRT makes a significant difference vs simply running the inference in Tensorflow! Stay tuned for my next steps on the Nano: implementing and optimizing MobileNet SSD object detection to run at 30+ FPS!</p>Carroll VanceOne of the great things released alongside the Jetson Nano is Jetpack 4.2, which includes support for TensorRT in Python. One of the easiest ways to get started with TensorRT is using the TF-TRT interface, which lets us seamlessly integrate TensorRT with a Tensorflow graph even if some layers are not supported. 
Of course this means we can easily accelerate Keras models as well!Using the new Jetson.GPIO python library in Jetpack 4.22019-03-21T12:00:00+00:002019-03-21T12:00:00+00:00https://csvance.github.io/blog/2019/03/21/jetpack-42-jetson-gpio<p>With the release of the new <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/">Jetson Nano</a> also comes the <a href="https://developer.nvidia.com/embedded/jetpack">4.2 release of nVidia’s Jetpack BSP for the Jetson</a>. This includes a new Python library called Jetson.GPIO which provides a familiar interface for anyone who has used RPi.GPIO before. However, it doesn’t seem to be installed by default, so here are the instructions for getting it loaded into Python!</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Setup groups/permissions</span>
<span class="nb">sudo </span>groupadd <span class="nt">-f</span> <span class="nt">-r</span> gpio
<span class="nb">sudo </span>usermod <span class="nt">-a</span> <span class="nt">-G</span> gpio your_user_name
<span class="nb">sudo cp</span> /opt/nvidia/jetson-gpio/etc/99-gpio.rules /etc/udev/rules.d/
<span class="nb">sudo </span>udevadm control <span class="nt">--reload-rules</span> <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>udevadm trigger
<span class="c"># Reboot required for changes to take effect</span>
<span class="nb">sudo </span>reboot</code></pre></figure> <p>Now we need to install Jetson.GPIO:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">sudo </span>pip3 <span class="nb">install </span>Jetson.GPIO
<span class="nb">sudo </span>pip <span class="nb">install </span>Jetson.GPIO</code></pre></figure> <p>After this we should be able to import the library in both versions of Python:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash">nvidia@tx2:~<span class="nv">$ </span>python3
Python 3.6.7 <span class="o">(</span>default, Oct 22 2018, 11:32:17<span class="o">)</span>
<span class="o">[</span>GCC 8.2.0] on linux
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for </span>more information.
<span class="o">&gt;&gt;&gt;</span> import Jetson.GPIO
<span class="o">&gt;&gt;&gt;</span>
nvidia@tx2:~<span class="nv">$ </span>python
Python 2.7.15rc1 <span class="o">(</span>default, Nov 12 2018, 14:31:15<span class="o">)</span>
<span class="o">[</span>GCC 7.3.0] on linux2
Type <span class="s2">"help"</span>, <span class="s2">"copyright"</span>, <span class="s2">"credits"</span> or <span class="s2">"license"</span> <span class="k">for </span>more information.
<span class="o">&gt;&gt;&gt;</span> import Jetson.GPIO
<span class="o">&gt;&gt;&gt;</span></code></pre></figure> <p>Happy hacking!</p>Carroll VanceWith the release of the new Jetson Nano also comes the 4.2 release of nVidia’s Jetpack BSP for the Jetson. This includes a new Python library called Jetson.GPIO which provides a familiar interface for anyone who has used RPi.GPIO before. However, it doesn’t seem to be installed by default, so here are the instructions for getting it loaded into Python!TensorRT ROS Nodes for nVidia Jetson2018-10-11T12:00:00+00:002018-10-11T12:00:00+00:00https://csvance.github.io/blog/2018/10/11/tensorrt-ros-nodes-nvidia-jetson<p>During the past few months I have been working towards making high performance deep learning inference much more accessible in <a href="http://www.ros.org">ROS</a> on the <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems-dev-kits-modules/">nVidia Jetson TX2</a>. The result is <a href="https://github.com/csvance/jetson_tensorrt">jetson_tensorrt</a>: a collection of optimized <a href="https://developer.nvidia.com/tensorrt">TensorRT</a> based nodes and nodelets specifically tailored to the Jetson platform. 
To start out with, classification and object detection are supported for <a href="https://developer.nvidia.com/digits">nVidia DIGITS</a> ImageNet and DetectNets.</p> <p>Here is some example output in rviz from a single class pedestrian detector using an Intel RealSense D435: <img src="https://csvance.github.io/assets/img/posts/tensorrt_detect.jpg" alt="detect" class="img-fluid" /></p> <p align="center"> <b>DetectNet Object Detection</b><br /> </p> <p>Support is planned for SegNets as well.</p> <p>Check it out on <a href="https://github.com/csvance/jetson_tensorrt">Github</a>!</p>Carroll VanceDuring the past few months I have been working towards making high performance deep learning inference much more accessible in ROS on the nVidia Jetson TX2. The result is jetson_tensorrt: a collection of optimized TensorRT based nodes and nodelets specifically tailored to the Jetson platform. To start out with, classification and object detection are supported for nVidia DIGITS ImageNet and DetectNets.Keras/Tensorflow, TensorRT, and Jetson2018-05-23T03:00:00+00:002018-05-23T03:00:00+00:00https://csvance.github.io/blog/2018/05/23/keras-tensorrt-jetson<p>nVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. To achieve the full benefits of the platform, we can use a framework called TensorRT, which drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is that the Python API for TensorRT only supports x86 based architectures. This leaves us with no easy way of taking advantage of the benefits of TensorRT. However, there is a harder way that does work: to achieve maximum inference performance we can export and convert our model to .uff format, and then load it in TensorRT’s C++ API.</p> <h2 id="1-training-and-exporting-to-pb">1. 
Training and exporting to .pb</h2> <ul> <li>Train your model</li> <li>If using Jupyter, restart the kernel you trained your model in to remove training layers from the graph</li> <li>Reload the model’s weights</li> <li>Use an export function like the one in <a href="https://github.com/csvance/keras-tensorrt-jetson/blob/master/training/training.ipynb">this notebook</a> to export the graph to a .pb file</li> </ul> <h2 id="2-converting-pb-to-uff">2. Converting .pb to .uff</h2> <p>I suggest using the <a href="https://github.com/chybhao666/TensorRT">chybhao666/cuda9_cudnn7_tensorrt3.0:latest Docker container</a> to access the script needed for converting a .pb export from Keras/Tensorflow to .uff format for TensorRT import.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /usr/lib/python2.7/dist-packages/uff/bin
# List Layers and manually pick out the output layer
# For most networks it will be dense_x/BiasAdd, the last one that isn't a placeholder or activation layer
python convert_to_uff.py tensorflow --input-file /path/to/graph.pb -l
# Convert to .uff, replace dense_1/BiasAdd with the name of your output layer
python convert_to_uff.py tensorflow -o /path/to/graph.uff --input-file /path/to/graph.pb -O dense_1/BiasAdd
</code></pre></div></div> <p>More information on the .pb export and .uff conversion is available from <a href="https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#exporttftouff">nVidia</a>.</p> <h2 id="3-loading-the-uff-into-tensorrt-c-inference-api">3. Loading the .uff into TensorRT C++ Inference API</h2> <p>I have created a generic class which can load the graph from a .uff file and set up TensorRT for inference while taking care of all host / device CUDA memory management behind the scenes. It supports any number of inputs and outputs and is available on my <a href="https://github.com/csvance/keras-tensorrt-jetson/blob/master/inference/">Github</a>. 
It can be built with <a href="https://developer.nvidia.com/nsight-eclipse-edition">nVidia nSight Eclipse Edition</a> using a remote toolchain <a href="https://devblogs.nvidia.com/remote-application-development-nvidia-nsight-eclipse-edition/">(instructions here)</a>.</p> <h3 id="caveats">Caveats</h3> <ul> <li>Keep in mind that many layers are not supported by TensorRT 3.0. The most obvious omission is BatchNorm, which is used in many different types of deep neural nets.</li> <li>Concatenate works only on the channel axis, and only if all other dimensions match. If you have multiple paths for convolution, you can concatenate them only when their remaining dimensions are the same.</li> </ul>Carroll VancenVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. To get the full benefit of the platform, nVidia provides a framework called TensorRT, which drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is that the Python API for TensorRT only supports x86 based architectures. This leaves us with no easy way of taking advantage of the benefits of TensorRT. However, there is a harder way that does work: to achieve maximum inference performance, we can export and convert our model to .uff format, and then load it with TensorRT’s C++ API.Turn Based Games and 1v1 DQNs2018-01-09T18:04:00+00:002018-01-09T18:04:00+00:00https://csvance.github.io/blog/2018/01/09/turn-based-games-1v1-dqn<h2 id="background">Background</h2> <p>At this point, one would have to be living under a rock not to have heard of <a href="https://deepmind.com">DeepMind’s</a> success at training an agent to play Go purely by playing against itself, without any feature engineering. 
However, most available tutorials online about <a href="https://deepmind.com/research/dqn/">Deep Q Networks</a> are coming from an entirely different angle: learning how to play various single player games in the <a href="https://github.com/openai/gym">OpenAI Gym</a>. If one simply applies these examples to turn based games in which the AI learns by playing itself, a world of hurt is in store for several reasons:</p> <ul> <li>In standard DQN learning, the target reward is computed using the next state after an action is taken. However, the next state in a turn based dueling game is used by the enemy of the agent who took the action. To further complicate matters, the next state generated by an action is from the perspective of the agent taking the action. If we attempt to implement standard DQN, we are training the agent with data used in incorrect game contexts and assigning rewards for the wrong perspective.</li> <li>Many turn based dueling games only have a win condition rather than a score which can be used for rewards. This complicates both measuring a DQN’s performance and assigning rewards.</li> </ul> <h2 id="state-and-perspective">State and Perspective</h2> <p>First of all, in a game where an agent plays itself from multiple perspectives, we must be careful that the correct perspective is provided when making predictions or training on discounted future rewards. For example, let us consider the game <a href="https://en.wikipedia.org/wiki/Connect_Four">Connect Four</a>. Instead of viewing the game as a battle between a red agent and a black agent, we could consider it from the viewpoint of the agent whose turn it is at the state being considered. 
For example, when the agent who takes the second turn blocks the agent who went first, the following next state is generated:</p> <p><img src="https://csvance.github.io/assets/img/posts/perspective_a.png" alt="perspective" class="img-fluid" /></p> <p>However, this next state wouldn’t be used by the agent who went second to take an action. It is going to be used by the agent who went first, but it needs to be inverted to their perspective before it can be used:</p> <p><img src="https://csvance.github.io/assets/img/posts/perspective_b.png" alt="perspective" class="img-fluid" /></p> <p>This is not the only tweak needed to get DQN working with a dueling turn based game. Let us recall how the discounted future reward is calculated:</p> <ul> <li><code class="language-plaintext highlighter-rouge">future_reward = reward + gamma * amax(predict(next_state))</code></li> <li>gamma: discount factor, for example 0.9</li> <li>reward: the reward the agent received for taking an action</li> <li>next_state: the state generated from applying an action to the original state</li> <li>amax: selects the highest value from the result</li> </ul> <p>Remember, next_state will be the enemy agent’s state. So if we simply implement this formula, we are predicting the discounted future reward that the enemy agent might receive, not our own. 
We must predict one more state into the future in order to propagate the discounted future reward:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># We must invert the perspective of next_state so it is in the perspective of the enemy of the player who took the action which resulted in next_state
</span><span class="n">next_state</span><span class="p">.</span><span class="n">invert_perspective</span><span class="p">()</span>

<span class="c1"># Predict the action the enemy is most likely to take
</span><span class="n">enemy_action</span> <span class="o">=</span> <span class="n">argmax</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">next_state</span><span class="p">))</span>

<span class="c1"># Apply the action and invert the perspective back to the original one
</span><span class="n">true_next_state</span> <span class="o">=</span> <span class="n">next_state</span><span class="p">.</span><span class="nb">apply</span><span class="p">(</span><span class="n">enemy_action</span><span class="p">)</span>
<span class="n">true_next_state</span><span class="p">.</span><span class="n">invert_perspective</span><span class="p">()</span>

<span class="c1"># Finally calculate discounted future reward
</span><span class="n">future_reward</span> <span class="o">=</span> <span class="n">reward</span> <span class="o">+</span> <span class="n">gamma</span> <span class="o">*</span> <span class="n">amax</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">true_next_state</span><span class="p">))</span></code></pre></figure> <p>I have also tried subtracting the enemy’s reward from the reward of the agent that took the original action, but have not been able to measure good long or short term results with this policy.</p> <h2 id="win-conditions-and-rewards">Win Conditions and Rewards</h2> <p>Another problem with certain board games such as Connect Four is 
that they have no objective way of keeping score. There is only reward for victory and punishment for failure. I have had luck using 1.0 for victory, -1.0 for failure, and 0.0 for all other moves. Samples from consecutive duplicate games and from ties should be discarded, as they don’t contain any useful information and will only serve to pollute our replay memory.</p> <h2 id="measuring-performance">Measuring Performance</h2> <p>One major challenge of DQNs with only win / loss conditions is measuring the network’s performance over time. I have found a few ways to do this, including having the agent play a short term reward maximizing symbolic AI every N games as validation. If our agent cannot beat an agent that only thinks in the short term, then we need to continue making changes to the network structure, hyper-parameters, and feature representation. Beating this short sighted AI consistently should be our first goal.</p> <h2 id="network-stability">Network Stability</h2> <p>A common mistake when creating a DQN is giving the network too few dimensions to begin with. This can cause serious aliasing in our predictions, resulting in an unstable network. Generally speaking, it is better to start with a wide network and then test how much the network can be slimmed down.</p> <p>We must also make sure our training data and labels are formatted in a way that ensures stability. Rewards should be normalized to the [-1., 1.] range, and any discounted future reward which is outside of this range should be clipped.</p> <p>Another factor in network stability is our experience replay buffer size. Too small and our network will forget things it learned in the past; too big and it will take excessive time to learn. I find it is generally better to start smaller while testing whether the network is able to learn simple gameplay, and to increase the size as training time increases and we want to ensure network stability. People smarter than I such as Schaul et al. 
(2015) have proposed a way to make better use of the experience replay buffer: <a href="https://arxiv.org/abs/1511.05952">Prioritized Experience Replay</a>, which may be worth investigating if you are unsure how to tune this.</p> <p>Another factor to consider is the optimizer learning rate. A high learning rate can create instabilities in the neural network’s state approximation behavior, resulting in all kinds of catastrophic forgetting. Starting at 0.001 is a good idea, and if you notice instabilities with this, try decreasing it from there. I find that 0.0001 works well for longer training sessions.</p> <p>Finally, techniques commonly used in deep neural networks, such as dropout and batchnorm, can have a negative impact on Deep Q Learning. I suggest watching <a href="https://www.youtube.com/watch?v=fevMOp5TDQs">Deep RL Bootcamp Lecture 3: Deep Q-Networks</a> if you are interested in more information on this.</p> <h2 id="conclusion">Conclusion</h2> <p>Deep Q Learning proves to be both extremely interesting and challenging. While I am not completely happy with my own results in training a DQN for Connect Four, I think it is at least worth posting some of the things I have learned from the experience. My current agent can be found at the link below.</p> <ul> <li><a href="https://github.com/csvance/deep-learning-connect-four">GitHub: DQN AI for Connect Four</a></li> </ul> <h2 id="references">References</h2> <ul> <li><a href="https://keon.io/deep-q-learning/">Deep Q-Learning with Keras and Gym</a></li> <li><a href="https://www.youtube.com/watch?v=fevMOp5TDQs">Deep RL Bootcamp Lecture 3: Deep Q-Networks</a></li> </ul>Carroll VanceBackground At this point, one would have to be living under a rock not to have heard of DeepMind’s success at training an agent to play Go purely by playing against itself, without any feature engineering. 
However, most available tutorials online about Deep Q Networks are coming from an entirely different angle: learning how to play various single player games in the OpenAI Gym. If one simply applies these examples to turn based games in which the AI learns by playing itself, a world of hurt is in store for several reasons:The RNN Sequence Prediction Seed Problem and How To Solve It2017-12-29T18:04:00+00:002017-12-29T18:04:00+00:00https://csvance.github.io/blog/2017/12/29/RNN-seed-problem<h2 id="the-problem">The Problem</h2> <p>There are many <a href="https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/">tutorials</a> on how to create <a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">Recurrent Neural Networks</a> and use them for sequence generation. However, most of these tutorials show an example where an initial seed value must be used to start the generation process. This is highly impractical for a query response generation scheme. Luckily, there is a fairly easy way to solve this involving how we format our training data.</p> <h2 id="solution">Solution</h2> <h3 id="empty-item">Empty Item</h3> <p>First of all, we need to make sure we have a dummy value which represents an empty sequence item. It is convenient to use zero for this, because most padding functions are going to pad with zeros, and functions like np.zeros make it easy to initialize this. We will represent this value as NULL for simplicity.</p> <h3 id="end-of-sequence-item">End of Sequence Item</h3> <p>We need a second special value for marking the end of a sequence. Call it EOS for short, and it can be whatever value you want provided it stays consistent.</p> <h3 id="sequence-steps">Sequence Steps</h3> <p>Instead of feeding entire sequences for training, we are going to step through each sequence, generating subsequences starting with all empty values and adding one more item in each step. 
The label for each sequence will simply be the next word in the sequence, or &lt;EOS&gt; if we reach the end of the sequence. For example, take the sequence "The quick brown fox jumps over the lazy dog". This will break down into the following sequences:</p> <pre><code class="python">
"NULL NULL NULL NULL NULL NULL NULL NULL NULL" -&gt; The
"The NULL NULL NULL NULL NULL NULL NULL NULL" -&gt; quick
"The quick NULL NULL NULL NULL NULL NULL NULL" -&gt; brown
"The quick brown NULL NULL NULL NULL NULL NULL" -&gt; fox
"The quick brown fox NULL NULL NULL NULL NULL" -&gt; jumps
"The quick brown fox jumps NULL NULL NULL NULL" -&gt; over
"The quick brown fox jumps over NULL NULL NULL" -&gt; the
"The quick brown fox jumps over the NULL NULL" -&gt; lazy
"The quick brown fox jumps over the lazy NULL" -&gt; dog
"The quick brown fox jumps over the lazy dog" -&gt; EOS
</code></pre> <h3 id="sequence-size">Sequence Size</h3> <p>If you have a sequence larger than your maximum size, start removing the first element before you append a new one. This should have little bearing on prediction, because the network will learn how to handle it.</p> <h2 id="conclusion">Conclusion</h2> <p>We can now train the network and start by feeding it a sequence filled with NULLs to predict the first value. Here is an example doing this using <a href="https://keras.io">Keras</a>:</p> <ul> <li><a href="https://github.com/csvance/armchair-expert/blob/master/models/structure.py">Preprocessing &amp; Model</a></li> </ul>Carroll VanceThe Problem There are many tutorials on how to create Recurrent Neural Networks and use them for sequence generation. However, most of these tutorials show an example where an initial seed value must be used to start the generation process. This is highly impractical for a query response generation scheme. 
Luckily, there is a fairly easy way to solve this involving how we format our training data.Hello World!2017-12-09T19:40:48+00:002017-12-09T19:40:48+00:00https://csvance.github.io/blog/2017/12/09/hello-world<p>Hi, and welcome to my portfolio and blog. I will be chronicling my explorations in machine learning, artificial intelligence, and software engineering here.</p>Carroll VanceHi, and welcome to my portfolio and blog. I will be chronicling my explorations in machine learning, artificial intelligence, and software engineering here.