Deep-Learning-Course/SecondSunday/Notes.md

Activation Function (AKA Transfer function)

In a neural network, the activation function introduces non-linearity. Common types (a short NumPy sketch follows the list):

  1. Sigmoid (Logistic): used mostly for the output layer, since its output looks like a probability.
  2. ReLU (Rectified Linear Unit): an important discovery for NNs and the most-used choice for hidden layers; not suitable for an output layer that is supposed to produce a probability. Leaky ReLU adds a small slope on the negative part.
  3. tanh (hyperbolic tangent): maps to (-1, 1). ArcTan (inverse tangent): maps to (-Pi/2, Pi/2).
  4. Linear (Identity): used for output layers; best for regression.
  5. Softmax: classification, giving probabilities (they behave like probabilities because the outputs sum to 1).
  6. Square root
  7. Exponential
  8. Sine
  9. Ramp
  10. Step (Binary)
  11. Unit Sum
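
A minimal NumPy sketch of a few of the activations listed above (sigmoid, ReLU, leaky ReLU, tanh, softmax). The function names and test values are my own, not from the course; frameworks like PyTorch/TensorFlow provide these built in.

```python
import numpy as np

def sigmoid(x):
    # Squashes to (0, 1); often read as a probability at the output layer.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Zero for negative inputs, identity for positive; common in hidden layers.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Like ReLU, but with a small slope on the negative part.
    return np.where(x > 0, x, slope * x)

def tanh(x):
    # Squashes to (-1, 1).
    return np.tanh(x)

def softmax(x):
    # Outputs are non-negative and sum to 1, so they behave like class probabilities.
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))           # [0. 0. 0. 1. 3.]
print(softmax(x).sum())  # 1.0
```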

If the computation the network has to learn is multiplicative, use log as the activation so that the product becomes a sum (log(ab) = log a + log b).
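
A minimal sketch of that idea, assuming a hypothetical target that is a product of its inputs: taking logs turns the product into a sum, which an additive (linear) combination can then represent.

```python
import numpy as np

# Hypothetical multiplicative relationship: y = a * b
a = np.array([2.0, 3.0, 5.0])
b = np.array([4.0, 0.5, 10.0])
y = a * b

# In log space the product becomes a sum: log(y) = log(a) + log(b),
# so a simple additive/linear model can capture it.
log_y = np.log(a) + np.log(b)
print(np.allclose(np.exp(log_y), y))  # True
```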

Constrained Optimization: optimize in such a way that the output is constrained to some value (e.g., Softmax constrains the outputs to be non-negative and sum to 1).

Steps => number of batch iterations (one step = one batch). Epoch => one full pass through the entire dataset.
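
A quick worked example of how the two relate (the numbers are hypothetical):

```python
# Hypothetical numbers: 1,000 training samples, batch size 100.
num_samples = 1000
batch_size = 100

steps_per_epoch = num_samples // batch_size  # 10 steps = one full pass over the data
epochs = 5
total_steps = steps_per_epoch * epochs       # 50 batch iterations in total
print(steps_per_epoch, total_steps)          # 10 50
```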