The other day, I was presented with a challenge:
Describe how a neural network works without invoking the metaphor of the brain.
The following is my attempt to meet that challenge. I will do this through the lens of a simplified self-driving car example.
(At the urging of Jack Clark, I have also written a tutorial on how to write a simple neural network in the Appendix at the end of this post.)
Suppose you want to program a self-driving car to apply the brakes and accelerator and to steer left or right at the appropriate times. You have a number of sensors:
- front proximity radar
- rear proximity radar
- left proximity radar
- right proximity radar, and
- a speedometer.
If you knew what you were doing, you would wire the front proximity sensor to the brakes so that when something was really close in front of the car, the brakes would be applied hard. Likewise, you would wire the left and right proximity radars to the steering wheel so that the wheel turned away from things that threaten the car from the side.
Here is a diagram showing the sensors on the left and the actuators (devices that can exert change on the world). The circles are connection points for the wires that we will call nodes.
But suppose you don’t know how a car works. How would you wire up the sensors?
Just to make things more complicated, you don’t have transistors, integrated circuits, or logic gates. All you have are wires of varying degrees of conductivity. Some wires are nearly superconductors, passing electricity with almost no resistance. Some wires barely conduct electricity at all.
(Electrical engineers are scowling at me right now.)
One thing you could do is connect all the sensors to all the outputs, picking wires with different conductivities at random as you wire everything together.
Now try it out. Does your car crash? Probably. It probably crashes a lot. In fact, the behavior of the car as the sensors activate will be quite random.
Swap out some wires for different ones with different conductivity. Try it again. Maybe this time it crashes a little bit less. If it crashes less, we are on the right track. Maybe it crashes a little bit more.
If you do this long enough, you will eventually find a configuration that doesn’t crash at all.
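This trial-and-error loop can be sketched in code. Here conductivities are just numbers (weights), a wiring is a list of weights, and the crash-counting function is entirely made up for illustration:

```python
import random

def crashes(weights, scenarios):
    """Count scenarios where the wiring makes the wrong braking decision.
    A scenario is (front_proximity, should_brake); this wiring brakes when
    the weighted signal exceeds 0.5. Purely illustrative."""
    bad = 0
    for front, should_brake in scenarios:
        brake_signal = weights[0] * front
        if (brake_signal > 0.5) != should_brake:
            bad += 1
    return bad

# Made-up scenarios: brake when something is close in front.
scenarios = [(0.9, True), (0.8, True), (0.1, False), (0.2, False)]

random.seed(0)
weights = [random.uniform(-1, 1)]   # start with a randomly chosen wire
best = crashes(weights, scenarios)
for _ in range(1000):               # keep swapping wires...
    candidate = [random.uniform(-1, 1)]
    score = crashes(candidate, scenarios)
    if score < best:                # ...keeping changes that crash less
        weights, best = candidate, score

print(best)  # eventually a wiring that doesn't crash at all
```

This is random search, the crudest possible form of training; real libraries do something much smarter, but the keep-what-crashes-less spirit is the same.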
Maybe something like the following will work, where the darker lines are higher-conductivity wires and the lighter lines are higher-resistance wires.
Even a simple network like this can do a lot. This one can apply pressure to the brakes depending on how close something is in front of the car and on how fast it is going. (For illustration purposes only; it’s not actually a good idea to have the speedometer pass a lot of electrical signal directly to the brakes.)
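In code terms, that simple network is nothing more than a weighted sum of two sensor readings. The particular weights below are invented for illustration, not tuned values:

```python
def brake_pressure(front_proximity, speed):
    # Conductivity of each wire = weight on each sensor signal.
    # These numbers are made up for illustration.
    w_front, w_speed = 0.8, 0.3
    signal = w_front * front_proximity + w_speed * speed
    return min(signal, 1.0)  # the brakes can only be pressed so hard

print(brake_pressure(0.9, 0.5))  # something close in front, moderate speed
```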
However, it might be that you can’t find any combination of wires that makes the car do what you want because the desired car behavior is too complicated. In this case you add more wires, arranged in layers. The intermediate nodes aggregate electrical signals and combine them before passing more electrical signal on.
For example, you might want the car to brake if something is in front of the car unless there is something very close behind the car.
Adding the speedometer and steering back in…
It is not immediately obvious why this would work. It works because of details I haven’t explained yet. Bear with me.
Training the Network
Training is what we call the process of figuring out what types of wires to use for each connection between nodes. Without going into details, we run some sample inputs (different driving scenarios) through the network and then check whether the car did the right thing. A scenario is just a combination of signals from the sensors (e.g., nothing close on the front radar, something close on the rear radar, speedometer reading 60mph, etc.).
We measure how wrong each actuator was. If an actuator was mostly right, we try wires with slightly different conductivity; if it was very wrong, we try wires with very different conductivity. Try the network on a lot of scenarios, tweaking it after each one. Of course, later tweaks might undo earlier tweaks, so you run all the scenarios again on the network you ended up with after applying all the tweaks. Do this many times and the network gets better and better at all of the scenarios.
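Sketched in code, one round of this "tweak in proportion to how wrong you were" idea might look like the following. This is a hand-rolled caricature for illustration, not what real training algorithms do:

```python
import random

def tweak(weights, error, step=0.5):
    """Nudge each weight by an amount proportional to how wrong we were:
    small error -> small random nudges, big error -> big random nudges."""
    return [w + random.uniform(-1, 1) * step * error for w in weights]

def actuator_error(weights, scenario):
    sensors, target = scenario
    output = sum(w * s for w, s in zip(weights, sensors))
    return abs(output - target)

# One made-up scenario: front sensor hot, rear sensor quiet -> brake hard.
scenario = ([0.9, 0.1], 0.9)

random.seed(1)
weights = [0.0, 0.0]
for _ in range(2000):
    err = actuator_error(weights, scenario)
    candidate = tweak(weights, err)
    if actuator_error(candidate, scenario) < err:  # keep tweaks that help
        weights = candidate

print(actuator_error(weights, scenario))  # should end up close to zero
```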
The Metaphor Breaks Down
The metaphor breaks down pretty quickly. For example, it is possible for some wires to convert electrons into anti-electrons by multiplying the strength of the electrical current by -1. Anti-electrons, seriously? This sounds like it would screw things up. However, there is something I haven’t told you about the nodes; we need to dig deeper into what happens inside them. They do more than just aggregate electrical current: if the total amount of electrical signal coming into a node is negative, a circuit breaker trips, preventing any negative electricity from passing out.
(This is called a rectified linear unit, or ReLU. Traditional neural networks use a sigmoid, but I find that the ReLU fits the electrical circuit metaphor better. Real networks often use ReLUs because they are cheaper to compute than sigmoids.)
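The circuit breaker is one line of code. Here is a ReLU alongside the sigmoid it often replaces:

```python
import math

def relu(x):
    # The "circuit breaker": negative totals are cut off at zero.
    return max(0.0, x)

def sigmoid(x):
    # The traditional alternative: squashes any total into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
```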
Furthermore, every node except the ones on the far left (the input nodes) has a special wire connected to a negative electricity source. This wire sends negative energy into the node, trying to break the circuit. If the wire connecting the node to the negative electricity source has high conductivity, the node requires more positive electricity to keep the circuit breaker from tripping. If the wire has a lot of resistance, the node is more likely to pass electricity along. Again, you don’t need to figure out ahead of time which wires to use; just keep randomly trying different combinations until you find one that makes the car do what you want.
These connections to the negative electricity source, in conjunction with the wires that turn electrons into anti-electrons, are the reason we can create a network that triggers the brakes when the front proximity radar activates unless the rear proximity radar is also active.
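Putting the pieces together, the "brake for something in front unless something is close behind" behavior needs exactly these two tricks: a sign-flipping (anti-electron) wire from the rear radar and the circuit breaker. A minimal hand-wired sketch, with made-up weights:

```python
def relu(x):
    return max(0.0, x)  # circuit breaker: no negative signal passes

def brake_node(front, rear):
    # front radar:  ordinary positive wire
    # rear radar:   "anti-electron" wire (negative weight)
    # bias:         the always-on negative electricity source
    w_front, w_rear, bias = 1.0, -1.2, -0.1
    return relu(w_front * front + w_rear * rear + bias)

print(brake_node(0.9, 0.0))  # close in front, nothing behind -> brakes on
print(brake_node(0.9, 0.9))  # close in front AND behind -> brakes stay off
```

When the rear radar is quiet, the positive front signal survives the breaker; when the rear radar is active, its negative contribution drags the total below zero and the breaker cuts the brakes off.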
I don’t know if this is better than the brain-and-neuron metaphor. In some sense the brain metaphor is more apt, because neurons in the brain were the inspiration for neural networks in the first place. But modern neural networks frequently bear little resemblance to brains because of their highly engineered architectures and nodes. Because of that complex engineering and super-specialization, the electrical circuit metaphor may be a better way of understanding what these networks really are: programs. Regardless, this primer on neural networks probably didn’t make you think of brains. Or at least I hope you are now convinced that you don’t need to fall back on the brain metaphor to explain what a neural network does and how it works.
Appendix: Code a Neural Net
Let’s get our hands dirty and write the neural net for that stupidly simple car described above. You can grab the code from https://github.com/markriedl/StupidlySimpleCar. I will also walk you through writing your own.
Writing a neural net is non-trivial, but there are APIs that make basic neural nets relatively trivial. If you aren’t trying to do anything too unusual, these toolkits should be fine. Google’s Tensorflow is perhaps one of the most popular. TFlearn is an API on top of Tensorflow that hides a lot of the complexity. We will use TFlearn because of its simplicity. We will use Python as our programming language.
- Install Python 2.7 if you don’t already have it.
- Install Numpy, a python numerical processing package.
- Install TFlearn. Follow the instructions, which will first ask you to install Tensorflow.
Before we can do anything we need a dataset. Grab “make-dataset.py” from https://github.com/markriedl/StupidlySimpleCar and run it:
python make-dataset.py data.csv 100000
This will create a comma-separated data file, “data.csv”, containing 100,000 synthetically produced sensor inputs and the desired responses from our car. We do this because I don’t actually have a car that works exactly like this. In reality, I would record the sensor values while humans drive around, along with whether the human driver pushes the brakes, accelerator, etc. Neural networks work best with large amounts of data.
The data set contains the following information, which is a little bit different from the examples above:
- Front proximity: a value from 0.0 to 1.0 indicating how close another car is in front.
- Rear proximity: a value from 0.0 to 1.0 indicating how close another car is in back.
- Left proximity: a value from 0.0 to 1.0 indicating how close another car (or roadside barrier) is to the left.
- Right proximity: a value from 0.0 to 1.0 indicating how close another car (or roadside barrier) is to the right.
- Brakes: How hard the brakes are pressed, from 0.0 (not at all) to 1.0 (fully engaged).
- Accelerator: How hard the accelerator is pressed, from 0.0 (not at all) to 1.0 (fully engaged).
- Steer left: How far the steering wheel is turned to the left, from 0.0 (not at all) to 1.0 (90 degrees).
- Steer right: How far the steering wheel is turned to the right, from 0.0 (not at all) to 1.0 (90 degrees).
Some things to note: the steering wheel cannot be turned right and left at the same time. Also, the brakes and accelerator cannot be engaged at the same time.
The other reason synthetic data is good is that we can look at the “make-dataset.py” program to see exactly how the data is generated. For example, we can see that the brakes are only engaged if a car in front is closer than 0.5, and we can see the formula for how hard the brakes are pressed based on that proximity. But if there is also a car close behind (closer than 0.5 on the rear proximity sensor), then neither the brakes nor the accelerator are pressed. Since we can see the program that created the data, we are basically asking the neural net to learn the wiring that reconstructs this program. Remember: circuits are programs.
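Roughly, that rule might look like the following. The exact formulas in the data-generation script may well differ; this is a guess at the shape of the logic, assuming a proximity of 1.0 means "touching" and the thresholds from the text:

```python
def braking_rule(front, rear):
    """Brake in proportion to front proximity, but only when something is
    closer than 0.5 in front and nothing is close behind. The 0.5 thresholds
    come from the text; the proportionality formula is an assumption."""
    if front > 0.5 and rear <= 0.5:
        return (front - 0.5) * 2.0  # maps 0.5..1.0 proximity onto 0..1 braking
    return 0.0

print(braking_rule(0.9, 0.1))  # close in front, clear behind -> brake hard
print(braking_rule(0.9, 0.9))  # boxed in front and back -> no brakes
```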
(If the data came from human drivers, the neural net would be learning the “program” that human drivers use to make decisions. How cool is that?)
Next we need to write a program that sets up and trains a neural net. Using TFlearn, this is going to be very straightforward because all of the mathematics will be hidden in the API. Call the file “car-tutorial.py”.
First, we need to write code to load the data set.
In this code snippet, we import the tflearn and numpy packages. We use tflearn’s data loading functions to read in the CSV file. We do this twice: first we get the data in the first 4 columns (the sensor information), and second we get the “labels”, a description of what we want the car to do when it sees those sensor inputs.
The neural network will set up the connections to recreate the labels from the sensor information. We do this so that we can put new sensor information into the network that we have never seen before and hopefully still get correct responses.
The last lines convert the Python arrays into numpy arrays, the format the training code requires.
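The loading snippet itself was lost in formatting. The original uses tflearn's CSV loader plus numpy; a plain standard-library sketch of the same splitting logic looks like this (shown on an inline stand-in instead of the real data.csv, with made-up rows):

```python
import csv
import io

# A tiny stand-in for data.csv: 4 sensor columns, then 4 label columns.
sample = """0.39,0.77,0.09,0.80,0.2,0.0,0.0,0.81
0.90,0.10,0.00,0.00,0.8,0.0,0.0,0.0
"""

data, labels = [], []
for row in csv.reader(io.StringIO(sample)):
    values = [float(v) for v in row]
    data.append(values[:4])     # first 4 columns: sensor readings
    labels.append(values[4:])   # last 4 columns: desired actuator outputs

print(data[0])    # [0.39, 0.77, 0.09, 0.8]
print(labels[0])  # [0.2, 0.0, 0.0, 0.81]
```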
Next, we need to set up a neural net and train it.
The first line tells the neural network how many input nodes it has. The second and third lines create two layers of intermediate nodes with 16 nodes per layer. Each node in these layers has a connection to every node in the layer before it and the layer after it. (I added one more layer than in the example above because, through trial and error, I found it worked better.)
The fourth line sets up 4 output nodes. These four nodes correspond to the (1) brakes, (2) accelerator, (3) left steering, and (4) right steering.
The fifth line tells the neural network how to measure how wrong the outputs are. We are telling it to compare the activation of the output nodes to the labels using the mean squared difference:

loss = (1/n) * Σ (output_i − label_i)²

summed over each of the n nodes in the output layer.
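That loss is a one-liner. Here it is on a made-up output/label pair:

```python
def mean_squared_difference(outputs, labels):
    # Average of the squared per-node differences.
    n = len(outputs)
    return sum((o - t) ** 2 for o, t in zip(outputs, labels)) / n

outputs = [0.2, 0.0, 0.0, 0.7]   # what the network produced (made up)
targets = [0.0, 0.0, 0.0, 0.8]   # what it should have produced (made up)
print(mean_squared_difference(outputs, targets))
```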
We then tell tflearn that this network is a “deep neural network” (DNN). So a deep neural network is technically any network with 3 or more layers of nodes. (Disappointing, I know.)
The last line invokes tflearn’s training algorithm. TFlearn is told to sweep over the data 10 times and print out stats. The most important stat is the loss: the mean squared difference between what the neural network decides and what the correct activations should be. This number gets smaller and smaller as the training algorithm tweaks the network over and over.
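The tflearn snippet these paragraphs describe sets up a 4-16-16-4 network and trains it; the library hides the arithmetic, but what each fully-connected layer computes is just the weighted sums from earlier in this post. A standard-library sketch of a single forward pass through that shape, with random untrained placeholder weights:

```python
import random

def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases, activation):
    # Each node: weighted sum of all inputs, plus its bias, through the breaker.
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def random_layer(n_in, n_out):
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [random.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

random.seed(3)
sizes = [4, 16, 16, 4]               # same shape as the tutorial network
params = [random_layer(a, b) for a, b in zip(sizes, sizes[1:])]

sensors = [0.39, 0.77, 0.09, 0.80]   # one row of sensor readings
signal = sensors
for i, (w, b) in enumerate(params):
    act = relu if i < len(params) - 1 else (lambda x: x)  # linear outputs
    signal = layer(signal, w, b, act)

print(len(signal))  # 4 actuator outputs (meaningless until trained)
```

Training is then the business of adjusting all those weights and biases until the 4 outputs match the labels; that is the part tflearn does for us.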
If we run the program now…
…we should see something like the following:
We see loss (green) going down, which tells us that the neural network is getting better and better.
Next we need a way to test our neural network.
This asks the user to input values for the front, rear, left, and right sensors. It then puts these values into the trained neural network and reports on what the car should do.
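The testing snippet reads a line of four numbers and hands them to the trained model. The parsing half is plain Python; the prediction call is tflearn's `model.predict`, shown commented out so this sketch stays runnable without a trained model:

```python
def parse_sensors(line):
    """Turn input like '0.39, 0.77, 0.09, 0.80' into a sensor vector."""
    values = [float(v) for v in line.split(",")]
    if len(values) != 4:
        raise ValueError("expected front, rear, left, right proximities")
    return values

sensors = parse_sensors("0.39, 0.77, 0.09, 0.80")
print(sensors)  # [0.39, 0.77, 0.09, 0.8]

# With a trained tflearn model, the rest is one call:
# prediction = model.predict([sensors])
# print("brakes, accelerator, left, right:", prediction[0])
```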
If you put in values from data.csv (for example, in my data.csv the first row has 0.39, 0.77, 0.09, and 0.80), you should see numbers more or less similar to those in the 5th through 8th columns (e.g., ~0.2, ~0.0, ~0.0, ~0.81). The output probably won’t be exact, but that is okay. In most cases it should be close.
You can also put in combinations of values that are not in your CSV file and you should see outputs that look plausible most of the time. This is equivalent to the car getting into situations it has never seen before and still acting properly.
The code above is available at https://github.com/markriedl/StupidlySimpleCar as “car-tutorial.py”.
A slightly more complicated version is there as “car.py”. This version allows you to specify the name of the data file. It also saves the neural network model and allows you to load it so that you don’t have to go through training every time the program is run.
Appendix 2: Looking Inside Tensorflow
One of the fun things Tensorflow and TFlearn can do is visualize the neural network underlying the code you just wrote, using a tool called TensorBoard.
This is what the neural network in “car-tutorial.py” looks like when visualized:
This is a standard way of visualizing a neural network. It doesn’t look like the sketches above, but only because it would be too much trouble to draw all those nodes and connections. Instead, we draw a bubble for each layer in the network and a single line between bubbles to denote all the connections from every node in one layer to every node in the next. This is how the visualization relates to my earlier sketches (“car-tutorial.py” has one extra layer, but you should get the idea):
We can even go farther and look inside the bubbles. Below I have expanded one of the bubbles — a layer of nodes.
Now we can see all the mathematical operations that happen inside each layer of nodes.
The take-away is that everything that is happening in a neural network is the result of a bunch of numbers — originating from the data — being successively multiplied and added together. You are basically looking at the code inside TFlearn and seeing the additions and multiplications.
(Neural network training involves some calculus to figure out how to adjust the numbers in the W and b matrices. I’ll spare you the details.)
The point is: there is no magic under the hood of neural network code, even when using fancy APIs like Tensorflow and TFlearn.