Deep Learning
Build neural networks with Keras and TensorFlow — CNNs for images, RNNs for sequences, transfer learning, and model optimization techniques.
Keras and TensorFlow
Keras is a high-level API that runs on top of TensorFlow. It simplifies building neural networks with an intuitive interface while TensorFlow handles the low-level computations.
# Building a simple neural network with Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# X_train: inputs of shape (n_samples, 784)
# y_train: one-hot labels of shape (n_samples, 10), as categorical_crossentropy expects
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Hold out the last 20% of the training data for validation
model.fit(X_train, y_train, epochs=20, batch_size=32,
          validation_split=0.2)
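As a quick sanity check on the size of the network above, the trainable parameters of each Dense layer can be counted by hand: one weight per input/unit pair plus one bias per unit. The sketch below is plain Python rather than Keras, and the helper name `dense_params` is introduced here only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units

# Layer widths of the model above: 784 inputs -> 128 -> 64 -> 10 outputs.
# Dropout layers have no trainable parameters, so they are skipped.
layer_sizes = [784, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # parameters in each Dense layer
print(total)      # total trainable parameters
```

The total should agree with the figure that `model.summary()` reports for this architecture.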
Key Concepts
- Sequential model — A linear stack of layers. The simplest way to build a neural network.
- Dense layer — A fully connected layer: each neuron receives input from every output of the previous layer.
- Activation functions — ReLU (hidden layers), sigmoid (binary output), softmax (multi-class output).
- Dropout — Randomly disables neurons during training to prevent overfitting.
- Optimizer — Adam is a common default choice. It adapts the step size for each parameter using running averages of past gradients.
- Loss function — categorical_crossentropy for multi-class with one-hot labels (sparse_categorical_crossentropy for integer labels), binary_crossentropy for binary, MSE for regression.
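To make the concepts above concrete, here is a minimal plain-Python sketch of the three activation functions, the categorical cross-entropy loss, and dropout. It is meant to show the arithmetic, not how Keras implements these internally. The function names and the example logits are assumptions chosen for illustration. The dropout shown is the "inverted" variant, which scales surviving values by 1/(1 - rate) during training so that nothing needs rescaling at inference.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-12):
    # Only the true class contributes, because the other targets are 0.
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))

def dropout(values, rate, rng):
    # Inverted dropout: keep each value with probability (1 - rate)
    # and scale the survivors so the expected sum is unchanged.
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

probs = softmax([2.0, 1.0, 0.1])                    # illustrative logits for 3 classes
loss = categorical_crossentropy([1, 0, 0], probs)   # true class is index 0
dropped = dropout([1.0] * 10, rate=0.3, rng=random.Random(0))

print(probs, sum(probs))
print(loss)
print(dropped)
```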
Convolutional Neural Networks (CNNs)
CNNs are designed for image data. They automatically learn spatial features like edges, textures, and shapes through convolutional layers.
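from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Two convolution/pooling stages extract features from 28x28 grayscale images
    Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    MaxPooling2D((2,2)),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    # Classifier head: flatten the feature maps, then predict one of 10 classes
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])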
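It can help to trace how the tensor shape changes through the model above. The sketch below assumes the Keras defaults for these layers (stride-1 convolutions with 'valid' padding, and pooling with a stride equal to the pool size); the helper names are introduced here for illustration.

```python
def conv2d_shape(shape, filters, kernel=3):
    """Output shape of a stride-1 convolution with 'valid' padding."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)

def maxpool_shape(shape, pool=2):
    """Output shape of non-overlapping pooling; leftover rows/columns are dropped."""
    h, w, c = shape
    return (h // pool, w // pool, c)

# The layer sequence of the model above, up to Flatten.
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None)]

shape = (28, 28, 1)
trace = [shape]
for kind, filters in layers:
    if kind == "conv":
        shape = conv2d_shape(shape, filters)
    else:
        shape = maxpool_shape(shape)
    trace.append(shape)

# Flatten multiplies the three dimensions into a single vector length.
flat = shape[0] * shape[1] * shape[2]

for s in trace:
    print(s)
print("flattened length:", flat)
```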
CNN Components
- Conv2D — Convolutional layer that applies filters to detect features (edges, corners, textures)
- MaxPooling2D — Reduces spatial dimensions by taking the max value in each region. Reduces computation and adds translation invariance.
- Flatten — Converts the stacked feature maps (height × width × channels) into a 1D vector for the Dense layers
- Filters — Number of feature detectors. More filters = more features detected.
- Kernel size — Size of the filter window (commonly 3x3 or 5x5)
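The following plain-Python sketch illustrates what a single filter and a pooling step do, using a tiny hand-made image and a hand-picked vertical-edge kernel. In a real CNN the kernel values are learned during training; the image, kernel, and function names here are assumptions for illustration. Like Conv2D, the sketch slides the kernel over the image without flipping it.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Take the maximum of each non-overlapping size x size region."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

# 6x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d(image, kernel)   # 4x4: strong response only near the edge
pooled = max_pool(feature_map)        # 2x2 summary of the feature map

for row in feature_map:
    print(row)
print(pooled)
```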
Recurrent Neural Networks (RNNs)
RNNs process sequential data by maintaining a hidden state that captures information from previous time steps.
- Simple RNN — Basic recurrent layer. Suffers from vanishing gradients for long sequences.
- LSTM (Long Short-Term Memory) — Uses gates (forget, input, output) to control information flow. Handles long-term dependencies.
- GRU (Gated Recurrent Unit) — Simplified version of LSTM with fewer parameters. Often performs similarly.
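The vanishing-gradient problem mentioned above can be seen in a one-unit simple RNN. In this sketch the hidden state is updated as h_t = tanh(w * h_{t-1} + u * x_t), so the derivative of the final state with respect to the initial state is a product of factors w * (1 - h_t^2), one per time step. The weights, inputs, and function name are made-up values for illustration, not taken from any trained model.

```python
import math

def rnn_gradient_through_time(inputs, w=0.5, u=1.0):
    """Run a scalar RNN and track d(h_T)/d(h_0) via the chain rule."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Each step multiplies the gradient by w * tanh'(.) = w * (1 - h^2).
        grad *= w * (1.0 - h * h)
    return h, grad

_, short_grad = rnn_gradient_through_time([0.1] * 5)
_, long_grad = rnn_gradient_through_time([0.1] * 50)

print("gradient through 5 steps: ", short_grad)
print("gradient through 50 steps:", long_grad)
```

Because every factor has magnitude below 1 here, the gradient shrinks roughly exponentially with sequence length, which is why early time steps barely influence learning. The gates in LSTM and GRU layers are designed to give gradients a more direct path through time.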
RNN Applications
- Text classification and sentiment analysis
- Time series forecasting
- Machine translation
- Speech recognition
Transfer Learning
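Transfer learning uses a pre-trained model (trained on millions of images) as a starting point and adapts it to your specific task, either by training a new classification head on top of the frozen pre-trained layers or by fine-tuning some of those layers as well.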
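from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

num_classes = 10  # placeholder; set to the number of classes in your dataset

# Load pre-trained VGG16 (without top classification layers)
base_model = VGG16(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))

# Freeze the pre-trained layers so only the new head is trained
base_model.trainable = False

# Add custom classification head
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(num_classes, activation='softmax')
])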
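To see what freezing buys you, the sketch below counts parameters by hand: the frozen VGG16 convolutional base versus the new trainable head. It is plain Python and assumes the standard VGG16 layout (five blocks of 3x3 convolutions, each followed by 2x2 pooling) and a hypothetical `num_classes = 10`; the helper names are introduced here for illustration.

```python
# (filters, number of 3x3 conv layers) for each of the five VGG16 blocks.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def conv_base_params(blocks, in_channels=3, kernel=3):
    """Weights plus biases of all convolution layers in the base."""
    total = 0
    for filters, n_layers in blocks:
        for _ in range(n_layers):
            total += kernel * kernel * in_channels * filters + filters
            in_channels = filters
    return total

def head_params(flat_features, hidden_units, n_classes):
    """Parameters of Dense(hidden_units) followed by Dense(n_classes)."""
    hidden = flat_features * hidden_units + hidden_units
    output = hidden_units * n_classes + n_classes
    return hidden + output

num_classes = 10  # hypothetical; use the number of classes in your dataset

frozen = conv_base_params(VGG16_BLOCKS)

# Each block ends with 2x2 pooling, so 224 is halved five times.
side = 224 // 2 ** len(VGG16_BLOCKS)
flat_features = side * side * VGG16_BLOCKS[-1][0]

trainable = head_params(flat_features, 256, num_classes)

print("frozen base parameters:   ", frozen)
print("flattened feature length: ", flat_features)
print("trainable head parameters:", trainable)
```

Note that most of the trainable parameters sit in the first Dense layer after Flatten, so with a very small dataset it may be worth trying a smaller head.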
Model Optimization
- Early stopping — Stop training when validation loss stops improving to prevent overfitting
- Learning rate scheduling — Reduce learning rate over time for finer convergence
- Batch normalization — Normalize layer inputs to accelerate training
- Data augmentation — Apply random transformations to training images to increase effective dataset size
- Regularization (L1/L2) — Add penalties for large weights to prevent overfitting
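The logic behind the first two techniques can be sketched in a few lines of plain Python. This is an illustration of the idea only, with made-up validation losses and hypothetical function names; in practice Keras provides the EarlyStopping and ReduceLROnPlateau callbacks for this purpose.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. stop_epoch is None
    if that never happens.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

# Made-up validation losses: improving at first, then drifting upward.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.60, 0.65]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)

print("stop at epoch:", stop_epoch, "best epoch:", best_epoch)
print([step_decay(0.001, e) for e in (0, 10, 25)])
```

Restoring the weights from the best epoch, rather than keeping those from the stopping epoch, is usually what you want once training halts.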
Practice Questions
Which activation function is typically used in the output layer for multi-class classification?
A) ReLU
B) Sigmoid
C) Softmax
D) Tanh
Answer: C) Softmax. Softmax converts raw outputs into probabilities that sum to 1, making it ideal for multi-class classification. Sigmoid is for binary classification (a single probability). ReLU and tanh are typically used in hidden layers, not in the output layer of a classifier.
What is the main purpose of Dropout?
A) Speed up training
B) Reduce overfitting by randomly disabling neurons
C) Increase model accuracy
D) Normalize input data
Answer: B) Reduce overfitting by randomly disabling neurons. Dropout randomly sets a fraction of neuron outputs to zero during each training step. This forces the network to learn redundant representations, preventing it from relying too heavily on any single neuron. It is only active during training, not inference.
You have only 500 images to train an image classifier. Which approach is best?
A) Train a deep CNN from scratch
B) Use transfer learning with a pre-trained model like VGG16
C) Use K-Means clustering
D) Use logistic regression on raw pixels
Answer: B) Use transfer learning with a pre-trained model like VGG16. With only 500 images, a deep CNN trained from scratch would likely overfit severely. Transfer learning starts from a model pre-trained on millions of images (ImageNet) and adapts it to your small dataset, which usually gives much better performance.
Which layer type is designed to detect spatial features in images?
A) Dense
B) LSTM
C) Conv2D
D) Flatten
Answer: C) Conv2D. Convolutional layers (Conv2D) apply learnable filters across the image to detect features like edges, textures, and patterns. Dense layers are fully connected (no spatial awareness). LSTM is for sequences. Flatten converts multi-dimensional feature maps to 1D vectors.
Which layer type is best suited to learning long-range dependencies in sequential data?
A) Simple RNN
B) Dense layer
C) LSTM
D) Conv1D
Answer: C) LSTM (Long Short-Term Memory). LSTM networks use three gates (forget, input, output) to selectively remember or forget information over long sequences. This largely mitigates the vanishing gradient problem that makes simple RNNs struggle to learn long-range dependencies.