1 - Interface to FANN

$ cd src/tictactoe ; pushd `bundle list ruby-fann`
$ cd ext/ruby_fann

Pick up changes to FANN from ruby-fann

$ make clean ; make

Training Artificial Neural Network (ANN)

ruby_fann.c::Init_ruby_fann()
  Provides the Ruby wrapper around the C FANN library

fann_train_data.c::fann_train_on_data()
  fann_train_epoch()
    fann_train_epoch_irpropm()

      Cycles through each training sample

      fann_train.c::fann_compute_MSE()
        Calculates the difference between each output neuron and the expected output given in the training sample
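
The error calculation above can be sketched in Ruby. This is only an illustration of the idea (squared difference between actual and expected outputs, averaged over all output neurons and samples); it is not FANN's C code, and the method name and sample values are made up for the example.

```ruby
# Illustrative only: mean squared error over a set of training samples.
# `outputs` and `expected` are arrays of arrays, one inner array per sample.
# The method name is hypothetical and not part of FANN or ruby-fann.
def mean_squared_error(outputs, expected)
  total = 0.0
  count = 0
  outputs.each_with_index do |sample_out, i|
    sample_out.each_with_index do |value, j|
      diff = expected[i][j] - value # difference for one output neuron
      total += diff * diff
      count += 1
    end
  end
  total / count
end

# Made-up network outputs and targets for two training samples.
outputs  = [[0.9, 0.1], [0.2, 0.8]]
expected = [[1.0, 0.0], [0.0, 1.0]]
puts mean_squared_error(outputs, expected)
```

A training loop would typically stop once this value drops below a chosen threshold.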

2 - Reinforcement learning

Software agents taking actions in an environment so as to maximise some notion of cumulative reward
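
One common notion of cumulative reward is the discounted sum of the rewards received over time. The sketch below assumes a discount factor "gamma" between 0 and 1; neither the factor nor the method name comes from these notes.

```ruby
# Illustrative only: discounted cumulative reward
#   r0 + gamma*r1 + gamma^2*r2 + ...
# `gamma` (the discount factor) is an assumption made for this example.
def discounted_return(rewards, gamma)
  total = 0.0
  rewards.each_with_index do |reward, t|
    total += (gamma**t) * reward
  end
  total
end

# With gamma = 1.0 this is the plain sum of rewards; smaller values of
# gamma weight near-term rewards more heavily than distant ones.
puts discounted_return([1.0, 1.0, 1.0], 0.5)
```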

In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality

Typically formulated as a Markov decision process (MDP)

An MDP is a mathematical framework for modeling decision making in situations where outcomes are partly random and
partly under the control of the decision maker

An MDP is a discrete-time stochastic control process
  At each time step the process is in some state "s" and the decision maker may choose any action "a" that is 
  available in state "s"

  The process responds at the next time step by randomly moving into a new state "s'" and giving the decision
  maker a corresponding reward R_a(s, s')

  Given "s" and "a", the next state "s'" is conditionally independent of all previous states and actions
  (the Markov property)

  An MDP extends a Markov chain with actions and rewards: if only one action exists in each state and all
  rewards are the same (e.g. zero), the MDP reduces to a Markov chain
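
The time-step behaviour described above can be sketched as a small Ruby simulation. The two states, two actions, probabilities and rewards are invented purely for illustration, and the table layout is just one possible encoding.

```ruby
# Hypothetical two-state MDP. TRANSITIONS[state][action] is a list of
# [probability, next_state, reward] triples whose probabilities sum to 1.
TRANSITIONS = {
  s0: {
    stay: [[1.0, :s0, 0.0]],
    go:   [[0.8, :s1, 1.0], [0.2, :s0, 0.0]]
  },
  s1: {
    stay: [[1.0, :s1, 0.0]],
    go:   [[1.0, :s0, 0.0]]
  }
}.freeze

# One time step: the decision maker picks `action` in `state`; the process
# responds by randomly moving to a new state and giving the matching reward.
# Only the current state and action are consulted, never the history.
def step(state, action, rng = Random.new)
  roll = rng.rand
  cumulative = 0.0
  outcomes = TRANSITIONS.fetch(state).fetch(action)
  outcomes.each do |prob, next_state, reward|
    cumulative += prob
    return [next_state, reward] if roll < cumulative
  end
  # Guard against floating-point rounding: fall back to the last outcome.
  _, next_state, reward = outcomes.last
  [next_state, reward]
end

# Walk a few steps, choosing a random available action each time.
rng = Random.new(42)
state = :s0
5.times do
  action = TRANSITIONS[state].keys.sample(random: rng)
  next_state, reward = step(state, action, rng)
  puts "#{state} --#{action}--> #{next_state} (reward #{reward})"
  state = next_state
end
```

If every state offered only a single action and every reward were zero, the table would collapse to plain state-to-state transition probabilities, which is the Markov chain case.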

$ cd ~/Documents/ann ; wget -r -np -k https://webdocs.cs.ualberta.ca/~sutton/book/ebook/
  -r = recursive
  -np = don't follow links to parent directories
  -k = make links in downloaded HTML/CSS point to local files