Ordinary Differential Equations: Limit Sets and Long Term Behavior

I wrote my Ordinary Differential Equations term project in LaTeX and thought it would be fun to see if I could convert it to markdown and post it here. Apparently MathJax makes this easy, so here it is!

Abstract

This paper will address the long-term behavior of systems of ordinary differential equations (hereafter ODEs) in the one- and two-dimensional cases. The specific behavior of interest is the limiting behavior of $\mathbf{y}(t)$ as $t\to\infty$, where $\mathbf{y}(t)$ is a solution to a system of ODEs. Families of solutions will be considered analytically and numerically for behavior such as periodic orbits and fixed points. The considered problems will be plotted and have their limit sets enumerated.

Dimension D = 1

Consider the first order linear ODE:

$$y\prime = y$$

By simply looking at the equation, it can be observed that it is autonomous (no independent variable appears explicitly in the ODE). From this we know that the behavior of the solution does not change with the independent variable $t$. Solving the differential equation through integration provides the general solution:

$$y(t) = C_1 e^t$$

Analytically it can be observed that with an initial condition of $y(0) = 0$, $C_1 = 0$. Consequently, $y(t) = 0$ for every value of $t$.

For any other initial condition, it can be observed that $y(t)$ will grow without bound in magnitude:

$$\lim_{t\to\infty} y(t) = \lim_{t\to\infty} C_1 e^t = \pm\infty \quad (C_1 \neq 0)$$

Because $y\prime = y$ is a one dimensional system, its phase portrait has only a single dimension, $y(t)$. The limit set consists of all points on this line that solutions approach as $t\to\infty$. Since the only solution that remains bounded is $y(t) = 0$, it directly follows that the limit set for the ODE $y\prime = y$ is $\{0\}$.
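
As a quick numerical illustration (a minimal sketch; the integrator, time span, and initial conditions are arbitrary choices), every starting value other than zero blows up while the zero solution stays put:

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    return y  # y' = y

for y0 in [0.0, 0.01, -0.01, 1.0]:
    sol = solve_ivp(rhs, (0.0, 10.0), [y0])
    print(f"y(0) = {y0:6.2f} -> y(10) ~ {sol.y[0, -1]:12.4f}")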

Dimension D = 2

Damped Vibrating Spring

Consider the following system of linear ODEs:

$$\begin{aligned} y\prime &= v \\ v\prime &= -4y - 2v \end{aligned}$$

This is derived from the second order unforced and damped harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2 y = 0$ where $c = 1$ and $\omega_0 = 2$. When $0 < c < \omega_0$ the system is under-damped. Knowing this, it should be expected that all solutions in the phase plane converge to a single point as the damping gradually drains energy from the system. Consequently, the limit set of the system contains the point of convergence of all solutions as $t\to\infty$.

Plotting the solution curves displays behavior typical of a spiral sink, and the trace determinant plane confirms it: $D(A) = 4, T(A) = -2$. Due to this behavior, all solution curves in the phase plane converge to the single equilibrium point $(0, 0)$ as $t\to\infty$. It follows that the limit set for this system is $\{(0, 0)\}$. While this limit set resembles the limit set of the one dimensional case, it should be noted that the ODE in the one dimensional case only had a bounded solution as $t\to\infty$ for a single IVP, whereas every IVP in the two dimensional case converges towards the origin.
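
The spiral sink classification can be double checked numerically (a quick sketch; the matrix below is the coefficient matrix implied by $c = 1$ and $\omega_0 = 2$):

import numpy as np

# Coefficient matrix for y' = v, v' = -4y - 2v
A = np.array([[0.0, 1.0],
              [-4.0, -2.0]])

print("trace =", np.trace(A))                 # -2
print("det   =", np.linalg.det(A))            # 4
print("eigenvalues =", np.linalg.eigvals(A))  # complex pair with negative real part -> spiral sink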

Undamped Vibrating Spring

Consider the following system of linear ODEs:

$$\begin{aligned} y\prime &= v \\ v\prime &= -4y \end{aligned}$$

This is derived from the second order unforced and undamped harmonic motion ODE $y\prime\prime + 2cy\prime + \omega_0^2 y = 0$ where $c = 0$ and $\omega_0 = 2$. When $c = 0$ the system is undamped. Knowing this, it should be expected that the solutions $y(t)$ and $y\prime(t)$ are periodic, so they should trace out an ellipse-like shape in the phase plane. Consequently, the limit set of the system should contain as many members as there are solutions to initial value problems. A plot confirms these suspicions:

This system clearly exhibits center behavior, which is confirmed by the trace determinant plane: $D(A) = 4, T(A) = 0$. It follows that the limit set contains every curve in the phase plane formed by the general solution of the ODE and its derivative:

$$y(t) = C_1\cos(2t) + C_2\sin(2t), \qquad y\prime(t) = -2C_1\sin(2t) + 2C_2\cos(2t)$$

It should also be noted that the origin of the phase plane $(0, 0)$ is in the limit set. This can easily be verified by solving the initial value problem for $y(0) = 0, y\prime(0) = 0$. Finally, the limit set for this ODE can be characterized by the two behaviors mentioned above:

  1. $(y=0, y\prime=0)$: Solutions that start at the origin stay at the origin as $t\to\infty$.

  2. $(y, y\prime) \ne (0, 0)$: Solutions that start anywhere other than the origin stay on a periodic solution curve as $t\to\infty$, which is defined by the system solution for the initial value problem.
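
For completeness, the closed curves in case two can be written down explicitly with a standard energy argument (not part of the original derivation): multiplying $y\prime\prime = -4y$ by $y\prime$ and integrating yields a conserved quantity whose level curves are exactly the ellipses seen in the plot:

$$\frac{d}{dt}\left[\tfrac{1}{2}(y\prime)^2 + 2y^2\right] = y\prime\, y\prime\prime + 4y\, y\prime = 0 \quad\Longrightarrow\quad (y\prime)^2 + 4y^2 = \text{constant}$$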

Undamped Pendulum

Consider the following system of nonlinear ODEs:

$$\begin{aligned} \theta\prime &= \omega \\ \omega\prime &= -\sin(\theta) \end{aligned}$$

This is derived from the undamped pendulum ODE $\theta\prime\prime = -\sin(\theta)$. There are quite a few initial conditions $p_0$ to consider.

  1. For $p_0\in\{ (\theta_0 = 2n\pi, \omega_0 = 0) \mid n\in\mathbb{Z} \}$, the phase plane solution should be a single point $p_0$, as the pendulum has no energy.

  2. For $p_0\in\{(\theta_0 \ne n\pi, \omega_0 = 0) \mid n\in\mathbb{Z}\}$, the phase plane solution should look like a center around a point $p_c \in \{ (\theta = 2n\pi, \omega = 0) \mid n\in\mathbb{Z} \}$ as the pendulum converts potential energy to velocity, back to potential energy, and finally changes direction to repeat the process.

  3. For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \lessapprox 2.00027)\}$, the pendulum should behave like case two. (The theoretical threshold is $\mid\omega_0\mid = 2$; the value $2.00027$ was estimated numerically.)

  4. For $p_0\in\{(\theta_0=0,\mid\omega_0\mid \gtrapprox 2.00027)\}$, the pendulum no longer changes direction and $\lim_{t\to\infty} \theta(t) = \pm\infty$.

  5. For $p_0\in\{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0) \mid n\in\mathbb{Z} \}$, the phase plane solution should be a single point $p_0$, as the pendulum needs a nudge in either direction to start moving.

The plots above confirm the predicted behavior of solutions. The limit set can be broken down into three different cases:

  1. Solutions where $\omega$ contains positive and negative values behave as a center around some point $p_c \in \{(\theta_0 = 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$. This is confirmed by looking at the trace determinant plane for the center equilibrium: $D(J)=1, T(J)=0$.

  2. $p_0 \in \{(\theta_0 = 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$: Solutions to this IVP stay at the angle they started at as $t\to\infty$. These equilibrium points are shared with case one.

  3. $\{ (\theta_0 = \pi + 2n\pi, \omega_0 = 0) | n\in\mathbb{Z} \}$: Solutions to this IVP stay at the angle they started at as $t\to\infty$. These equilibrium points are saddle points in the trace determinant plane: $D(J)=-1, T(J)=0$.

It is important to keep in mind that solutions for which $\lim_{t\to\infty} \theta(t) = \pm\infty$ do not contribute any points to the limit set, even though $\omega(t)$ remains bounded along them. This excludes any solution for which $\omega$ is exclusively $> 0$ or exclusively $< 0$ as $t\to\infty$. However, we can still make a definitive statement as to the behavior of $\omega$ as $t\to\infty$.
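
The split between cases three and four can be checked numerically (a rough sketch; the time span and tolerances are arbitrary, and the exact crossover an integrator reports depends on them, which is where a numerical estimate like $2.00027$ comes from):

import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, state):
    theta, omega = state
    return [omega, -np.sin(theta)]

for omega0 in [1.9, 1.99, 2.01, 2.1]:
    sol = solve_ivp(pendulum, (0.0, 200.0), [0.0, omega0], rtol=1e-9, atol=1e-9)
    theta = sol.y[0]
    behavior = "rotates" if np.max(np.abs(theta)) > np.pi else "oscillates"
    print(f"omega_0 = {omega0:5.2f}: {behavior} (max |theta| = {np.max(np.abs(theta)):.2f})")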

Damped Pendulum

Consider the following system of nonlinear ODEs:

$$\begin{aligned} \theta\prime &= \omega \\ \omega\prime &= -\sin(\theta) - c\,\omega \end{aligned}$$

This is derived from the damped pendulum ODE $\theta\prime\prime = -\sin(\theta) - c\theta\prime$. A reasonable guess as to its behavior is that it behaves much like the undamped case, except in the cases that previously resulted in endless oscillation or rotation. On the phase plane, those solutions should instead converge to points in $\{(\theta = 2n\pi, \omega = 0) \mid n\in\mathbb{Z}\}$.

Based on analysis of the system and observed behavior of the numerical plot, a limit set can be constructed:

  1. When $p_0 \in \{(\theta_0 = \pi + 2n\pi, \omega_0=0) \mid n\in\mathbb{Z}\}$, solutions remain where they started. This can be verified because these values are in the set of equilibrium points. The trace determinant plane shows saddle point behavior for this set of equilibria: $D(J)=-1, T(J)=-\frac{1}{2}$.

  2. All solutions not included in case one converge to spiral sinks in the set $\{(\theta = 2n\pi, \omega = 0) \mid n\in\mathbb{Z}\}$, depending on initial conditions. This can be confirmed by recognizing that the equilibrium points which follow this pattern all fall in the spiral sink region of the trace determinant plane: $D(J)=1, T(J)=-\frac{1}{2}$ (both classifications are verified in the sketch below).
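
Both classifications can be reproduced with a few lines of code (a sketch; the damping coefficient $c = \frac{1}{2}$ is an assumption chosen to be consistent with the trace values quoted above):

import numpy as np

c = 0.5  # assumed damping coefficient, consistent with T(J) = -1/2 above

def jacobian(theta):
    # Jacobian of (theta' = omega, omega' = -sin(theta) - c*omega)
    return np.array([[0.0, 1.0],
                     [-np.cos(theta), -c]])

for theta_eq, label in [(0.0, "theta = 2n*pi"), (np.pi, "theta = pi + 2n*pi")]:
    J = jacobian(theta_eq)
    T, D = np.trace(J), np.linalg.det(J)
    kind = "saddle" if D < 0 else ("spiral sink" if T < 0 and T**2 < 4 * D else "other")
    print(f"{label}: T = {T:+.2f}, D = {D:+.2f} -> {kind}")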

Competing Species

Consider the following system of nonlinear ODEs:

In a competing species system, the populations of two species interact with each other and/or themselves. To analyze such a system for long term behavior, the equilibrium points should first be considered:

Solving for the intersections of the $x$-nullclines and $y$-nullclines provides the set of equilibrium points $\{(0, 0), (1, 0), (0, \frac{4}{7}), (\frac{3}{5}, \frac{2}{5})\}$. Next, the system is linearized by computing the Jacobian:

Next the Jacobian is used to examine the equilibrium in the trace determinant plane:

Plotting for several initial value problems, a few things can be observed:

  1. When $x$ and $y$ both have initial values $> 0$, their solutions gravitate towards $\frac{3}{5}$ and $\frac{2}{5}$ respectively. This is consistent with the predicted nodal sink behavior.

  2. Initial values of $x$ and $y$ that start at zero stay at zero, which is expected when starting exactly on the nodal source equilibrium.

  3. When $y = 0$, initial values of $x > 0$ gravitate towards one. This makes sense because setting $y = 0$ reduces the first equation to $x\prime = x(1 - x)$, which clearly has a root at $1$. It follows that the system has an equilibrium point there as well. Examining the trace determinant of $(1, 0)$ places it in the region of saddle points.

  4. When $x = 0$, $y$ gravitates towards $\frac{4}{7}$. This makes sense considering the second equation is reduced to $y\prime = y(4 - 7y)$, which has a root at $\frac{4}{7}$. Examining the trace determinant of this point, it clearly falls into the region of saddle points.

Considering the above, the limit set contains individual points on the phase plane: $\{(0, 0), (1, 0), (0, \frac{4}{7}), (\frac{3}{5}, \frac{2}{5})\}$. This system contains more non-periodic individual points in its limit set than any previously examined system, while containing no center curves.
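
The classifications above can also be reproduced programmatically (a sketch only; the coefficients below are an assumed reconstruction chosen to match the listed equilibria and reduced equations, not necessarily the original system):

import numpy as np

# Assumed competing species model (reconstructed to match the equilibria above):
#   x' = x(1 - x - y)
#   y' = y(4 - 7y - 2x)
def jacobian(x, y):
    return np.array([[1 - 2 * x - y, -x],
                     [-2 * y, 4 - 14 * y - 2 * x]])

for (x, y) in [(0, 0), (1, 0), (0, 4 / 7), (3 / 5, 2 / 5)]:
    J = jacobian(x, y)
    T, D = np.trace(J), np.linalg.det(J)
    kind = "saddle" if D < 0 else ("sink" if T < 0 else "source")
    print(f"({x:.2f}, {y:.2f}): T = {T:+.3f}, D = {D:+.3f} -> {kind}")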

Van der Pol’s Equation

Consider the following system of nonlinear ODEs:

To analyze this system, first the equilibrium points need to be found from the intersection of the nullclines; the only equilibrium point is the origin, $(0, 0)$.

Next, the system is linearized and the trace determinant of the equilibrium point computed:

The equilibrium point is a borderline case with a repeated real eigenvalue, since $T^2 = 4D$. Plotting the phase plane reveals atypical behavior:

There are two clear behaviors here, both of which are clearly members of the limit set:

  1. For $(x = 0, y = 0)$, the solution does not leave the origin of the phase plane.

  2. All other initial values appear to converge onto the exact same closed periodic curve, a limit cycle. This is different from the previous center-type periodic solutions, where there were infinitely many different closed paths depending on initial conditions (see the sketch after this list).
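
This convergence is easy to reproduce numerically (a sketch; the parameter $\mu = 2$ is an assumption consistent with the trace and determinant above, and the initial conditions are arbitrary):

import numpy as np
from scipy.integrate import solve_ivp

MU = 2.0  # assumed Van der Pol parameter (gives T = 2, D = 1 at the origin)

def van_der_pol(t, state):
    x, y = state
    return [y, MU * (1 - x**2) * y - x]

# Two very different starting points end up on (nearly) the same closed curve
for x0, y0 in [(0.01, 0.0), (4.0, 4.0)]:
    sol = solve_ivp(van_der_pol, (0.0, 60.0), [x0, y0], max_step=0.01)
    x_late = sol.y[0][sol.t > 40.0]
    print(f"start ({x0}, {y0}): late-time x range ~ [{x_late.min():.2f}, {x_late.max():.2f}]")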

Setting the initial condition to a part of the closed loop yields the following result:

Extra Work

Approximating Van der Pol’s Solution Curve

While I wasn’t able to find a closed-form solution to Van der Pol’s equation, I was not content walking away without at least approximating the solution curve. Here are the steps I took to approximate the curve. First, we apply a rotation using a linear transformation with $\theta=\frac{\pi}{16}$ (matrix $A$ is the output of ODE45):

Next a shear transform is applied with $k = 0.03$:
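
For reference, these are the standard forms of the two transformations (written out here for clarity; the direction of the shear is my assumption, and the matrix of ODE45 points is multiplied on the left by each in turn):

$$R = \begin{pmatrix} \cos\frac{\pi}{16} & -\sin\frac{\pi}{16} \\ \sin\frac{\pi}{16} & \cos\frac{\pi}{16} \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 0.03 \\ 0 & 1 \end{pmatrix}$$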

A polynomial fit is found with $n = 26$. Next we apply the inverses of the linear transformations previously applied, in reverse order.

The moment of truth:

While the fit may not be anywhere near perfect, it is a good first attempt and perhaps worthy of more exploration at a later time.

Installing and using Tensorflow with TF-TRT on the Jetson Nano

One of the great things released alongside the Jetson Nano is Jetpack 4.2, which includes support for TensorRT in python. One of the easiest ways to get started with TensorRT is the TF-TRT interface, which lets us seamlessly integrate TensorRT with a Tensorflow graph even if some layers are not supported. Of course this means we can easily accelerate Keras models as well!

nVidia now provides a prebuilt Tensorflow for Jetson that we can install through pip, but we also need to make sure certain dependencies are satisfied.

sudo apt install python3-numpy python3-markdown python3-mock python3-termcolor python3-astor libhdf5-dev

Follow the instructions here to install tensorflow-gpu on Jetpack 4.2: https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-

Now that Tensorflow is installed on the Nano, let’s load a pretrained MobileNet from Keras and take a look at its performance with and without TensorRT for a binary classification task.

import tensorflow.keras as keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

# Load a MobileNet feature extractor pretrained on ImageNet (width multiplier 0.25)
mobilenet = keras.applications.mobilenet.MobileNet(include_top=False, input_shape=(224, 224, 3), weights='imagenet', alpha=0.25)
mobilenet.summary()

# Replace the classification head with a single sigmoid unit for binary classification
x = Flatten()(mobilenet.output)
new_output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=mobilenet.input, outputs=new_output)
model.summary()

# TODO: Train your model for binary classification task

model.save('mobilenet.h5')

Next we can execute inferences with different settings using this script (thanks to jeng1220 for the Keras to TF-TRT code)

You will need to install plac to run the script: pip3 install --user plac
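
For reference, the core of the Keras to TF-TRT conversion inside a script like that looks roughly as follows (a sketch assuming the TF 1.x contrib API shipped with Jetpack-era Tensorflow builds; the linked script is the authoritative version):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model

# Load the trained Keras model and note its output node names
model = load_model('mobilenet.h5')
output_names = [out.op.name for out in model.outputs]

# Freeze the Keras graph into a GraphDef with constant weights
sess = K.get_session()
frozen_graph = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)

# Ask TF-TRT to replace supported subgraphs with TensorRT engines
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16')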

# Tensorflow Standard Inference
python3 tftrt_inference.py -S 30 -T TF mobilenet.h5
# Time = 4.19 s
# Samples = 30
# FPS = Samples / Time = 30 / 4.19 = 7.16 FPS

# TensorRT FP32 Inference
python3 tftrt_inference.py -S 30 -T FP32 mobilenet.h5
# Time = 0.96 s
# Samples = 30
# FPS = Samples / Time = 30 / 0.96 = 31.3 FPS

# TensorRT FP16 Inference
python3 tftrt_inference.py -S 30 -T FP16 mobilenet.h5
# Time = 0.84 s
# Samples = 30
# FPS = Samples / Time = 30 / 0.84 = 35.7 FPS

It looks like TensorRT makes a significant difference vs simply running the inference in Tensorflow! Stay tuned for my next steps on the Nano: implementing and optimizing MobileNet SSD object detection to run at 30+ FPS!

Using the new Jetson.GPIO python library in Jetpack 4.2

With the release of the new Jetson Nano also comes the 4.2 release of nVidia’s Jetpack BSP for the Jetson. This included a new python library called Jetson.GPIO which provides a familiar interface for anyone who has used RPi.GPIO before. However, it doesn’t seem to be installed by default, so here are the instructions for getting it loaded into python!

# Setup groups/permissions
sudo groupadd -f -r gpio
sudo usermod -a -G gpio your_user_name
sudo cp /opt/nvidia/jetson-gpio/etc/99-gpio.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules && sudo udevadm trigger

# Reboot required for changes to take effect
sudo reboot

Now we need to install Jetson.GPIO:

sudo pip3 install Jetson.GPIO
sudo pip install Jetson.GPIO

After this we should be able to import the library in both versions of python:

nvidia@tx2:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Jetson.GPIO
>>>
nvidia@tx2:~$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Jetson.GPIO
>>>
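
To sanity check that the library works, a minimal blink-style example like the one below can be used (the pin number is just an example and should match your wiring; Jetson.GPIO mirrors the RPi.GPIO API):

import time
import Jetson.GPIO as GPIO

LED_PIN = 12  # example physical header pin

GPIO.setmode(GPIO.BOARD)   # use physical board pin numbering
GPIO.setup(LED_PIN, GPIO.OUT)

try:
    for _ in range(10):
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()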

Happy hacking!

Covariance Matrices in ROS

A common issue when learning ROS, for those without a background in mathematics (specifically statistics and linear algebra), is how to generate a covariance matrix for various message types, and why one needs to generate such a thing in the first place. Today I will attempt to explain both!

(Figure: a covariance matrix.)

What are covariance matrices used for anyway?

In robotics, a common problem is how to estimate the robot’s pose in three dimensional space. We need to know where the robot (starting with base_link) is in order to understand where the sensor readings take place. In ROS most things are relative to base_link, so if we don’t know where base_link is, we don’t know where laser_scanner, camera, or any of our other sensors are! If we don’t know where a sensor is, we don’t know where its readings apply in space. Covariance matrices help us with robotic state estimation problems in two different ways here:

  • The diagonal entries of a covariance matrix are simply the variances of our individual sensor readings. By knowing the variance, we know a reasonable upper and lower bound on the error over time. This is important when using these readings to estimate other things.
  • State estimation algorithms can use the off-diagonal entries (the covariances) of the covariance matrix to reduce error in readings by understanding how one quantity varies with another. A positive covariance tells us that when one reading is high, the other one is likely to be high as well. A negative covariance tells us the opposite.

Something Practical

A practical example of a covariance matrix is using an IMU sensor to improve our estimation of our current position and heading. An IMU can help with this by letting the robot know how much it is accelerating and how fast it is rotating.

Let’s look at the simplest possible case: a robot that lives in the X and Y dimensions. At T=0s, the robot is at (x=0, y=0), and our accelerometer is telling us we are accelerating at (x=1, y=0) m/s^2. At T=1s we see our acceleration remains constant, and want to know how far the robot moved and how fast it is going. Newton taught us that velocity is the derivative of position, and acceleration the derivative of velocity. Since we are working with numbers, we will need to use numerical integration techniques to determine our velocity V and position P from acceleration A. Because our acceleration is constant it forms a rectangle shape, so we can just multiply our ΔT by our acceleration (1 m/s^2). We find that our velocity V increases linearly by 1 m/s over 1 second. To get to position P we can calculate the area of the triangle under the linear velocity curve (1/2 base * height). This gives us a final position P of (x=0.5, y=0) m. Now consider how the covariance matrix can help us understand the error in this process, and eventually improve our estimation by combining the results from other sensors.

Let’s start by looking at the variance of acceleration. In this particular case we found it to be (x=0.001, y=0.002). This means it has a standard deviation of (x=0.0316, y=0.0447) by taking the square root of the variance. We then add and subtract 1/2 the standard deviation from our linear accelerometer readings. This gives us a reasonable estimation of the real range of the acceleration A.

Now we repeat the process of numerical integration with both our high and low estimates.
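
Concretely, the calculation looks something like this (a sketch of the steps described above; only the x axis is shown, and the numbers follow the example):

import numpy as np

dt = 1.0                   # one second time step
a_nominal = 1.0            # measured x acceleration in m/s^2
var_a = 0.001              # variance of the x acceleration reading
sigma_a = np.sqrt(var_a)   # standard deviation, ~0.0316

# High and low estimates: nominal reading +/- half a standard deviation
for a in (a_nominal - sigma_a / 2, a_nominal, a_nominal + sigma_a / 2):
    v = a * dt             # constant acceleration -> rectangle area
    p = 0.5 * a * dt ** 2  # triangle under the linear velocity curve
    print(f"a = {a:.4f} m/s^2 -> v = {v:.4f} m/s, p = {p:.4f} m")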

(Figure: Covariance Example.)

We have now calculated a possible range of estimates for velocity V and position P for our IMU. However, we have only accounted for a single time step. Because we are uncertain about our starting state for the next time step, the possible range of values balloons quickly, especially when one starts considering the direction the robot is facing during all of this.

To improve our estimate we can use a second sensor and combine the results. Our state estimator will look at the intersection of the different sensors’ estimates and calculate the most likely value for each state we are estimating. So say we also had an estimate of our position from odometry, and there was some overlap with our IMU estimate. The true X and Y we are trying to estimate most likely lies inside the intersection of these ranges.

The concept for covariance is similar, but the details are far more complicated. Just know that state estimation systems such as Extended Kalman Filters use this information to make better predictions about the state.

How to calculate a (sampling) covariance matrix?

Let’s consider the case of an IMU sensor again. We want to measure the sampling noise of the linear acceleration while the sensor is completely still. We create a matrix A containing 3 rows (x, y, z) and 100 columns (individual readings). Next, we calculate the row means of x, y, and z and subtract each row’s mean from that row. This gives us a matrix B. Next, we matrix multiply B*B^T. Finally, we divide the resulting matrix by n, the number of samples we recorded. This gives us a covariance matrix C.

(Figure: Covariance Calculation.)

When we multiply two matrices together, the result takes on their outer dimensions, so (3 x 100) * (100 x 3) = 3 x 3. We now have a 3 x 3 covariance matrix, and we need to add it to our ROS IMU messages. The IMU message in ROS contains an array called linear_acceleration_covariance which holds 9 floating point values. We can simply reshape our matrix into a 9 element array and store the values in linear_acceleration_covariance. In our covariance array, indices 0, 4, and 8 (the diagonal) will contain variances, and the rest of the indices contain covariances between variables.
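
Here is roughly what that looks like in code (a sketch; the readings are simulated with random numbers, whereas a real node would fill A from incoming IMU samples):

import numpy as np
from sensor_msgs.msg import Imu

n = 100
# Simulated still-sensor readings: 3 rows (x, y, z), n columns of samples
A = np.array([0.0, 0.0, 9.81]).reshape(3, 1) + 0.03 * np.random.randn(3, n)

B = A - A.mean(axis=1, keepdims=True)  # subtract each row's mean
C = (B @ B.T) / n                      # 3 x 3 sampling covariance matrix

imu_msg = Imu()
imu_msg.linear_acceleration_covariance = C.flatten().tolist()  # 9 values, row-major
print(np.round(C, 6))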

Keep in mind that this process is only taking into account sampling noise from the sensor, rather than inherent inaccuracies that may be present. It would be prudent to pad the variance for certain sources such as odometry due to factors such as wheel slippage.

For an example of calculating a covariance matrix with a real sensor and data within a ROS node, see my LSM9DS0 IMU ROS node.

TensorRT ROS Nodes for nVidia Jetson

During the past few months I have been working towards making high performance deep learning inference much more accessible in ROS on the nVidia Jetson TX2. The result is jetson_tensorrt: a collection of optimized TensorRT based nodes and nodelets specifically tailored to the Jetson platform. To start out with, classification and object detection are supported for nVidia DIGITS ImageNet and DetectNet models.

Here is some example output in rviz from a single class pedestrian detector using an Intel RealSense D435:

(Figure: DetectNet Object Detection.)

Support is planned for SegNets as well.

Check it out on Github!

Keras/Tensorflow, TensorRT, and Jetson

nVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. In order to achieve the full benefits of the platform, a framework called TensorRT drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is that the python API for TensorRT only supports x86 based architectures, which leaves us with no easy way of taking advantage of TensorRT from python on the Jetson. However, there is a harder way that does work: to achieve maximum inference performance we can export and convert our model to .uff format, and then load it with TensorRT’s C++ API.

1. Training and exporting to .pb

  • Train your model
  • If using Jupyter, restart the kernel you trained your model in to remove training layers from the graph
  • Reload the model’s weights
  • Use an export function like the one in this notebook to export the graph to a .pb file

2. Converting .pb to .uff

I suggest using the chybhao666/cuda9_cudnn7_tensorrt3.0:latest Docker container to access the script needed for converting a .pb export from Keras/Tensorflow to .uff format for TensorRT import.

cd /usr/lib/python2.7/dist-packages/uff/bin
# List Layers and manually pick out the output layer
# For most networks it will be dense_x/BiasAdd, the last one that isn't a placeholder or activation layer
python convert_to_uff.py tensorflow --input-file /path/to/graph.pb -l

# Convert to .uff, replace dense_1/BiasAdd with the name of your output layer
python convert_to_uff.py tensorflow -o /path/to/graph.uff --input-file /path/to/graph.pb -O dense_1/BiasAdd

More information on the .pb export and .uff conversion is available from nVidia

3. Loading the .uff into TensorRT C++ Inference API

I have created a generic class which can load the graph from a .uff file and setup TensorRT for inference while taking care of all host / device CUDA memory management behind the scenes. It supports any number of inputs and outputs and is available on my Github. It can be built with nVidia nSight Eclipse Edition using a remote toolchain (instructions here)

Caveats

  • Keep in mind that many layers are not supported by TensorRT 3.0. The most obvious omission is BatchNorm, which is used in many different types of deep neural nets.
  • Concatenate only works on the channel axis, and only when the other dimensions are the same. If you have multiple paths of convolution, you are limited to concatenating them when their remaining dimensions match.

Turn Based Games and 1v1 DQNs

Background

At this point, one would have to be living under a rock to have not heard of DeepMind’s success in teaching a network to play Go through self-play, without any feature engineering. However, most available tutorials online about Deep Q Networks come from an entirely different angle: learning how to play various single player games in the OpenAI Gym. If one simply applies these examples to turn based games in which the AI learns by playing itself, a world of hurt is in store, for several reasons:

  • In standard DQN learning, the target reward is computed using the next state after an action is taken. However, the next state in a turn based dueling game is used by the enemy of the agent who took the action. To further complicate matters, the next state generated from an action is in the perspective of the agent taking the action. If we attempt to implement standard DQN as-is, we are training the agent with data from the wrong game context and assigning rewards to the wrong perspective.
  • Many turn based dueling games only have a win condition rather than a score which can be used for rewards. This complicates both measuring a DQN’s performance and assigning rewards.

State and Perspective

First of all, in a game where an agent plays itself from multiple perspectives, we must be careful that the correct perspective is provided when making predictions or training on discounted future rewards. For example, let us consider the game Connect Four. Instead of viewing the game as a battle between a red agent and a black agent, we can consider it from the viewpoint of the agent whose turn it is at the state being considered. For example, when the agent who takes the second turn blocks the agent who went first, the following next state is generated:

(Figure: the generated next state, shown from the perspective of the agent who just moved.)

However, this next state wouldn’t be used by the agent who went second to take an action. It is going to be used by the agent who went first, but it needs to be inverted to their perspective before it can be used:

(Figure: the same state inverted into the first agent’s perspective.)

However, this is not the only tweak needed to get DQN working with a dueling turn based game. Let us recall how the discounted future reward is calculated:

  • future_reward = reward + gamma * amax(predict(next_state))
  • gamma: the discount factor, for example 0.9
  • reward: the reward the agent received for taking an action
  • next_state: the state generated by applying an action to the original state
  • amax: selects the highest value from the result

Remember, next_state will be the enemy agent’s state. So if we simply implement this formula, we are predicting the discounted future reward that the enemy agent might receive, not our own. We must predict one more state into the future in order to propagate the discounted future reward:

# We must invert the perspective of next_state so it is in the perspective of the enemy of the player who took the action which resulted in next_state
next_state.invert_perspective()
# Predict the action the enemy is most likely to take
enemy_action = argmax(predict(next_state))
# Apply the action and invert the perspective back to the original one
true_next_state = next_state.apply(enemy_action)
true_next_state.invert_perspective()
# Finally calculate discounted future reward
future_reward = reward + gamma * amax(predict(true_next_state))

I have also tried subtracting the enemy reward from the reward that took the original action, but have not been able to measure good long or short term results with this policy.

Win Conditions and Rewards

Another problem with certain board games such as Connect Four is that they have no objective way of keeping score; there is only reward for victory and punishment for failure. I have had luck using 1.0 for victory, -1.0 for failure, and 0.0 for all other moves. Samples from repeated identical games and from ties should be discarded, as they don’t contain any useful information and will only serve to pollute our replay memory.

Measuring Performance

One major challenge of DQNs with only win / loss conditions is measuring the network’s performance over time. I have found a few ways to do this, including having the agent play a short term reward maximizing symbolic AI every N games as validation. If our agent cannot beat an agent that only thinks in the short term, then we need to continue making changes to the network structure, hyper-parameters, and feature representation. Beating this short sighted AI consistently should be our first goal.

Network Stability

A common mistake when creating a DQN is making the network too small to begin with. This can cause serious aliasing in our predictions, resulting in an unstable network. Generally speaking, it is better to start with a wide network and then test how much it can be slimmed down.

We must also make sure our training data and labels are formatted in a way to ensure stability. Rewards should be normalized in the [-1., 1.] range, and any discounted future reward which is outside of this range should be clipped.
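
As a tiny illustration (the numbers are made up, and predicted_future stands in for predict(true_next_state)), clipping keeps the training target in the same range as the rewards themselves:

import numpy as np

gamma = 0.9
reward = 0.0
predicted_future = np.array([0.3, 1.7, -0.2])  # stand-in for predict(true_next_state)

target = np.clip(reward + gamma * np.max(predicted_future), -1.0, 1.0)
print(target)  # 1.0 rather than 1.53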

Another factor in network stability is our experience replay buffer size. Too small and our network will forget things it has already learned; too big and it will take excessive time to learn. I find it is generally better to start smaller while testing whether the network is able to learn simple gameplay, and to increase it as training time increases and we want to ensure network stability. People smarter than I, such as Schaul et al. (2015), have proposed methods to make better use of the replay buffer: Prioritized Experience Replay may be worth investigating if you are unsure how to tune this.

Another factor to consider is the optimizer learning rate. A high learning rate can create instabilities in the neural network’s state approximation behavior, resulting in all kinds of catastrophic forgetting. Starting at 0.001 is a good idea, and if you note instabilities with this, try decreasing it from there. I find that 0.0001 works best for longer training sessions.

Finally, techniques used in deep neural networks such as dropout and batchnorm have a negative impact on Deep-Q Learning. I suggest watching Deep RL Bootcamp Lecture 3: Deep Q-Networks if you are interested in more information on this.

Conclusion

Deep Q Learning proves to be both extremely interesting and challenging. While I am not completely happy with my own results in training a DQN for Connect Four, I think it is at least worth posting some of the things I have learned from the experience. My current agent can be found at the link below.

References

The RNN Sequence Prediction Seed Problem and How To Solve It

The Problem

There are many tutorials on how to create Recurrent Neural Networks and use them for sequence generation. However, most of these tutorials show an example where an initial seed value must be used to start the generation process. This is highly impractical for a query response generation scheme. Luckily, there is a fairly easy way to solve this involving how we format our training data.

Solution

Empty Item

First of all, we need to make sure we have a dummy value which represents an empty sequence item. It is convenient to use zero for this, because most padding functions are going to pad with zeros, and functions like np.zeros make it easy to initialize this. We will represent this value as NULL for simplicity.

End of Sequence Item

We need a second special value for marking the end of a sequence. Call it EOS for short, and it can be whatever value you want provided it stays consistent.

Sequence Steps

Instead of feeding entire sequences in for training, we are going to step through each sequence, generating subsequences that start with all empty values and add one more item at each step. The label for each subsequence will simply be the next word in the sequence, or EOS if we have reached the end of the sequence. For example, take the sequence "The quick brown fox jumps over the lazy dog". This will break down into the following sequences:


"NULL NULL NULL NULL NULL NULL NULL NULL NULL" -> The
"The NULL NULL NULL NULL NULL NULL NULL NULL" -> quick
"The quick NULL NULL NULL NULL NULL NULL NULL" -> brown
"The quick brown NULL NULL NULL NULL NULL NULL" -> fox
"The quick brown fox NULL NULL NULL NULL NULL" -> jumps
"The quick brown fox jumps NULL NULL NULL NULL" -> over
"The quick brown fox jumps over NULL NULL NULL" -> the
"The quick brown fox jumps over the NULL NULL" -> lazy
"The quick brown fox jumps over the lazy NULL" -> dog
"The quick brown fox jumps over the lazy dog" -> EOS

Sequence Size

If you have a sequence larger than your maximum size, start removing the first element before you append a new one. This will have no bearing on prediction because the network will learn how to handle this.

Conclusion

We can now train the network, and start generation by feeding it a sequence filled with NULLs to predict the first value. Here is an example of doing this using Keras:
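
What follows is a minimal sketch of such an example (the toy corpus, model sizes, and training settings are illustrative stand-ins rather than the original code):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

text = "The quick brown fox jumps over the lazy dog"
words = text.split()

# Index 0 is reserved for NULL (the empty item); the last index is EOS
vocab = {w: i + 1 for i, w in enumerate(sorted(set(words)))}
NULL, EOS = 0, len(vocab) + 1
vocab_size = len(vocab) + 2
max_len = len(words)

# Build one training sample per step: an all-NULL seed -> first word, and so on
ids = [vocab[w] for w in words]
X, y = [], []
for i in range(len(ids) + 1):
    X.append(ids[:i] + [NULL] * (max_len - i))
    y.append(ids[i] if i < len(ids) else EOS)
X, y = np.array(X), np.array(y)

model = Sequential([
    Embedding(vocab_size, 8, input_length=max_len),
    LSTM(32),
    Dense(vocab_size, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=500, verbose=0)

# Generate from an all-NULL seed until EOS (or until max_len items)
inv_vocab = {i: w for w, i in vocab.items()}
seq, out = [NULL] * max_len, []
for i in range(max_len):
    pred = int(np.argmax(model.predict(np.array([seq]), verbose=0)[0]))
    if pred == EOS:
        break
    out.append(inv_vocab.get(pred, '?'))
    seq[i] = pred
print(' '.join(out))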

Hello World!

Hi, and welcome to my portfolio and blog. I will be chronicling my explorations in machine learning, artificial intelligence, and software engineering here.