CNN Concepts
Computational View of a DNN
It's Just Matrix Math
In deep neural networks (DNNs), matrix math is the foundation for propagating data through layers, as depicted in the diagram. Each layer starts with input features (\( \mathbf{x}_1, \mathbf{x}_2, \dots \)) that are multiplied by a weight matrix (\( \mathbf{W} \)) and offset by a bias vector (\( \mathbf{b} \)) to compute scores:

\[ \mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b} \]
The scores are then passed through an activation function \( f(\cdot) \), such as the one shown in the diagram, to introduce non-linearity:

\[ \mathbf{y} = f(\mathbf{z}) \]
The result, \( \mathbf{y} \), represents the output of hidden units in each layer, which then serve as inputs to the next layer. This process is repeated across multiple layers, from the first layer (weights, scores, and activation functions) through the hidden layers to the final layer, which produces the output units.
Each hidden layer transforms its inputs into a new representation using the same sequence: weights, scores, activation functions, and outputs. These operations are computationally intensive but highly parallelizable, making GPUs and TPUs essential for efficiently handling the large-scale computations in deep learning.
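The weights-scores-activation sequence described above can be sketched in a few lines of NumPy. The layer sizes, random weights, and the ReLU activation here are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """A common activation function f(z) = max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

# Hypothetical sizes: 3 input features -> 4 hidden units -> 2 output units.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

x = np.array([1.0, 2.0, 3.0])  # input features x1, x2, x3
h = relu(W1 @ x + b1)          # hidden layer: weights -> scores -> activation
y = W2 @ h + b2                # output layer scores
```

Each `@` is a matrix-vector product, which is exactly the highly parallelizable operation that GPUs and TPUs accelerate.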
Training the DNN
Training is the process of learning the optimal weights and biases for a DNN. It involves:
- Forward Pass: Input features are propagated through the network to compute predictions.
- Loss Calculation: The error between predictions and targets is measured using a loss function.
- Backpropagation: Gradients of the loss with respect to the weights are calculated.
- Weight Update: Weights are adjusted using an optimizer (e.g., SGD) to minimize the error.
This process repeats over many iterations (epochs) until the model converges.
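The four training steps above can be sketched with a minimal example: a single linear layer trained by gradient descent on toy data. The data, learning rate, and epoch count are illustrative assumptions, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the targets follow y = 2*x1 - 3*x2, so the optimal
# weights the network should learn are [2, -3].
X = rng.standard_normal((100, 2))
y_true = X @ np.array([2.0, -3.0])

w = np.zeros(2)   # weights to learn
lr = 0.1          # learning rate

for epoch in range(200):              # repeat over many epochs
    y_pred = X @ w                    # 1. forward pass
    err = y_pred - y_true
    loss = np.mean(err ** 2)          # 2. loss calculation (mean squared error)
    grad = 2 * X.T @ err / len(X)     # 3. backpropagation (gradient of loss w.r.t. w)
    w -= lr * grad                    # 4. weight update (plain SGD step)

# w approaches [2.0, -3.0] as the loss converges toward zero.
```

A real DNN repeats step 3 layer by layer via the chain rule, but the loop structure is the same.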
Executing the DNN (Inference)
Inference uses the trained weights to make predictions. It involves:
- Forward Pass Only: Input features propagate through the network to compute outputs.
- Prediction: The final output represents the model's decision.
Unlike training, inference is faster and less computationally intensive because it skips backpropagation and weight updates.
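Inference is just the forward pass with frozen weights. A minimal sketch, using made-up trained weights for illustration:

```python
import numpy as np

# Assume these weights were learned during training (hypothetical values).
W = np.array([[0.5, -1.2],
              [0.3,  0.8]])
b = np.array([0.1, -0.2])

def predict(x):
    """Inference: a single forward pass; no gradients, no weight updates."""
    z = W @ x + b          # compute scores
    return int(np.argmax(z))  # prediction: index of the highest score

x_new = np.array([1.0, 2.0])
predict(x_new)  # -> 1
```

Because nothing but the forward pass runs, deployments often optimize this path further (quantization, fused kernels, batching) for speed.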
Key Differences
| Aspect | Training | Inference |
|---|---|---|
| Purpose | Learn weights. | Make predictions. |
| Operations | Forward pass + backpropagation. | Forward pass only. |
| Resource needs | High (GPUs/TPUs). | Lower; optimized for speed. |