Intermediate

Deep Learning

Build neural networks with Keras and TensorFlow — CNNs for images, RNNs for sequences, transfer learning, and model optimization techniques.

Keras and TensorFlow

Keras is a high-level API that runs on top of TensorFlow. It simplifies building neural networks with an intuitive interface while TensorFlow handles the low-level computations.

# Building a simple neural network with Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

model = Sequential([
    Input(shape=(784,)),              # e.g. a flattened 28x28 grayscale image
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax')   # one probability per class
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # expects one-hot labels; use sparse_categorical_crossentropy for integer labels
    metrics=['accuracy']
)

# X_train: shape (n_samples, 784); y_train: one-hot, shape (n_samples, 10)
model.fit(X_train, y_train, epochs=20, batch_size=32,
          validation_split=0.2)
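As a quick check on the architecture, the parameter count can be worked out by hand: a Dense layer with n inputs and m units holds n × m weights plus m biases, and Dropout layers hold none. A minimal sketch in plain Python (the helper `dense_params` is a hypothetical name, not part of Keras):

```python
def dense_params(n_inputs, n_units):
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers; Dropout adds no parameters
dense_layers = [(784, 128), (128, 64), (64, 10)]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in dense_layers]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

These per-layer numbers should line up with the "Param #" column that `model.summary()` prints for the model above. Nearly all of the weights sit in the first layer, because each of the 784 inputs connects to each of its 128 units.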

Key Concepts

Sequential model — A linear stack of layers. The simplest way to build a neural network.
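Dense layer — A fully connected layer: each neuron receives input from every neuron (or input feature) in the previous layer.
Activation functions — ReLU (hidden layers), sigmoid (binary output), softmax (multi-class output).
Dropout — Randomly sets a fraction of neuron outputs to zero during training to reduce overfitting.
Optimizer — Adam is a common default. It adapts the step size for each parameter from running averages of past gradients, though its base learning rate (0.001 by default in Keras) can still be tuned.
Loss function — categorical_crossentropy for multi-class with one-hot labels (sparse_categorical_crossentropy for integer labels), binary_crossentropy for binary, MSE for regression.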
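The activation and loss functions listed above are short formulas. Below is a minimal NumPy sketch of each; these are plain functions written for illustration, not the Keras implementations, which also handle clipping and batching details internally.

```python
import numpy as np

# Activation functions
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the row max avoids overflow and leaves the result unchanged
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# Loss functions (averaged over the samples in a batch); eps guards against log(0)
def categorical_crossentropy(y_onehot, probs, eps=1e-12):
    return float(np.mean(-np.sum(y_onehot * np.log(probs + eps), axis=-1)))

def binary_crossentropy(y, p, eps=1e-12):
    return float(np.mean(-(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))

def mse(y, y_pred):
    return float(np.mean((y - y_pred) ** 2))

# Three raw scores (logits) for one sample; the true class is index 0
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)                 # non-negative, sums to 1 per row
loss = categorical_crossentropy(np.array([[1.0, 0.0, 0.0]]), probs)
print(probs, loss)
```

With a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the true class, so the loss shrinks as the softmax output for that class approaches 1.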

Convolutional Neural Networks (CNNs)

CNNs are designed for image data. They automatically learn spatial features like edges, textures, and shapes through convolutional layers.

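from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Input(shape=(28, 28, 1)),                # 28x28 grayscale images
    Conv2D(32, (3, 3), activation='relu'),   # -> (26, 26, 32)
    MaxPooling2D((2, 2)),                    # -> (13, 13, 32)
    Conv2D(64, (3, 3), activation='relu'),   # -> (11, 11, 64)
    MaxPooling2D((2, 2)),                    # -> (5, 5, 64)
    Flatten(),                               # -> (1600,)
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])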
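Tracing the tensor shapes through this CNN shows how the layers fit together. Assuming Conv2D's default 'valid' padding with stride 1, and MaxPooling2D's default stride equal to its pool size, the plain-Python sketch below (helper names are hypothetical) computes each output size and parameter count:

```python
def conv2d_output(h, w, k, stride=1):
    """Output height/width of a Conv2D with 'valid' padding (the Keras default)."""
    return (h - k) // stride + 1, (w - k) // stride + 1

def pool_output(h, w, pool=2):
    """Output height/width of MaxPooling2D when stride equals the pool size."""
    return h // pool, w // pool

def conv2d_params(k, in_channels, filters):
    """k x k x in_channels weights per filter, plus one bias per filter."""
    return k * k * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

h, w = conv2d_output(28, 28, 3)   # Conv2D(32): (26, 26, 32)
h, w = pool_output(h, w)          # MaxPooling2D: (13, 13, 32)
h, w = conv2d_output(h, w, 3)     # Conv2D(64): (11, 11, 64)
h, w = pool_output(h, w)          # MaxPooling2D: (5, 5, 64)
flat = h * w * 64                 # Flatten: 1600 values

# Pooling and Flatten layers have no trainable parameters
params = {
    "conv1": conv2d_params(3, 1, 32),    # 320
    "conv2": conv2d_params(3, 32, 64),   # 18496
    "dense1": dense_params(flat, 128),   # 204928
    "dense2": dense_params(128, 10),     # 1290
}
print(flat, sum(params.values()))        # 1600 225034
```

Most of the parameters end up in the Dense layer right after Flatten (204,928 of 225,034), not in the convolutions, because each 3x3 filter is reused at every position in the image.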

CNN Components

Conv2D — Convolutional layer that applies filters to detect features (edges, corners, textures)
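MaxPooling2D — Reduces spatial dimensions by taking the max value in each region. Reduces computation and makes the output more tolerant of small shifts in the input.
Flatten — Converts the stack of feature maps (height × width × channels) into a 1D vector for the Dense layers
Filters — Number of feature detectors in the layer. More filters let the layer learn more distinct patterns, at the cost of more parameters and computation.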
Kernel size — Size of the filter window (commonly 3x3 or 5x5)
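To make the Conv2D and MaxPooling2D descriptions concrete, here is a minimal NumPy sketch for a single input channel and a single filter. It computes a 'valid' cross-correlation (what deep learning libraries, Keras included, call convolution) followed by 2x2 max pooling. The 6x6 image and the hand-written edge filter are made-up inputs; a trained Conv2D learns its filter values instead.

```python
import numpy as np

def conv2d_single(image, kernel):
    """'Valid' 2D cross-correlation of one channel with one filter (no bias, no activation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the filter window and take the weighted sum of the pixels under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h = fmap.shape[0] // size
    w = fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 image with a vertical edge: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge detector
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = conv2d_single(image, kernel)   # shape (4, 4); each row is [0, 3, 3, 0]
pooled = max_pool(fmap)               # shape (2, 2); every entry is 3
print(fmap)
print(pooled)
```

The filter responds only where its window straddles the dark-to-bright boundary, and pooling keeps that strong response while halving each spatial dimension.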

Recurrent Neural Networks (RNNs)

RNNs process sequential data by maintaining a hidden state that captures information from previous time steps.
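The hidden-state update can be written as h_t = tanh(x_t W_x + h_{t-1} W_h + b), which is the computation Keras's SimpleRNN layer performs with its default tanh activation. A minimal NumPy sketch, with random weights and a random input sequence standing in for trained values and real data:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, timesteps = 3, 4, 5

# Random stand-in weights; a trained layer would have learned these
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))   # input -> hidden
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the recurrence)
b = np.zeros(hidden_dim)

x_seq = rng.normal(size=(timesteps, input_dim))  # one input sequence

h = np.zeros(hidden_dim)   # initial hidden state
states = []
for x_t in x_seq:
    # Same weights at every step; h carries a summary of everything seen so far
    h = np.tanh(x_t @ W_x + h @ W_h + b)
    states.append(h)

states = np.stack(states)  # one hidden state per time step
print(states.shape)        # (5, 4)
```

By default Keras's SimpleRNN returns only the final state (the last row here); `return_sequences=True` makes it return all of them.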

Simple RNN — Basic recurrent layer. Suffers from vanishing gradients for long sequences.
LSTM (Long Short-Term Memory) — Uses gates (forget, input, output) to control information flow. Handles long-term dependencies far better than a simple RNN.
GRU (Gated Recurrent Unit) — Simplified version of LSTM with two gates (update and reset) instead of three, so it has fewer parameters. Often performs comparably.
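The LSTM gating can be sketched in a few lines of NumPy. This is a minimal illustration with random weights, assuming Keras's ordering of the stacked weight blocks (input gate, forget gate, candidate, output gate). The parameter-count helpers at the end assume Keras's default GRU setting `reset_after=True`, which adds a second bias vector. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,),
    stacked in the order input gate, forget gate, candidate, output gate."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates take values in (0, 1)
    g = np.tanh(g)                                # candidate values to write
    c = f * c_prev + i * g                        # forget some old memory, add new content
    h = o * np.tanh(c)                            # output gate controls what is exposed
    return h, c

# Parameter counts for a layer with `units` units reading `input_dim` features
def simple_rnn_params(units, input_dim):
    return units * (input_dim + units + 1)

def lstm_params(units, input_dim):
    return 4 * simple_rnn_params(units, input_dim)   # four stacked blocks

def gru_params(units, input_dim):
    return 3 * units * (input_dim + units + 2)        # three blocks, two biases (reset_after=True)

rng = np.random.default_rng(0)
units, input_dim = 4, 3
W = rng.normal(scale=0.5, size=(input_dim, 4 * units))
U = rng.normal(scale=0.5, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(6, input_dim)):   # a random 6-step sequence
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)                                                  # (4,) (4,)
print(simple_rnn_params(16, 8), lstm_params(16, 8), gru_params(16, 8))   # 400 1600 1248
```

The cell state c is updated additively (forget some, add some) rather than being squashed through a nonlinearity at every step, which is what helps gradients survive over long sequences. With 8 input features and 16 units, the counts come out to 400 for SimpleRNN, 1,600 for LSTM, and 1,248 for GRU.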

RNN Applications

Text classification and sentiment analysis
Time series forecasting
Machine translation
Speech recognition

Transfer Learning

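Transfer learning uses a model pre-trained on a large dataset (such as ImageNet, with over a million labeled images) as a starting point: you keep its learned feature extractor, train a new classification head for your specific task, and optionally fine-tune some of the pre-trained layers afterward.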

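from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

num_classes = 5  # hypothetical: set to the number of classes in your dataset

# Load pre-trained VGG16 (without top classification layers)
# Input images should be prepared with tensorflow.keras.applications.vgg16.preprocess_input
base_model = VGG16(weights='imagenet', include_top=False,
                   input_shape=(224, 224, 3))

# Freeze the pre-trained layers
base_model.trainable = False

# Add custom classification head
model = Sequential([
    base_model,                     # outputs (7, 7, 512) feature maps
    Flatten(),
    Dense(256, activation='relu'),
    Dense(num_classes, activation='softmax')
])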
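Setting `base_model.trainable = False` means only the new head is trained. Counting parameters shows how much that saves. A plain-Python sketch, assuming VGG16's standard convolutional configuration (thirteen 3x3 conv layers, whose output for a 224x224 input is a 7x7x512 feature map) and a hypothetical five-class problem:

```python
# (input channels, filters) for each 3x3 conv layer in VGG16's convolutional base
vgg16_convs = [
    (3, 64), (64, 64),                       # block 1
    (64, 128), (128, 128),                   # block 2
    (128, 256), (256, 256), (256, 256),      # block 3
    (256, 512), (512, 512), (512, 512),      # block 4
    (512, 512), (512, 512), (512, 512),      # block 5
]
# Each conv layer: 3*3*c_in weights per filter, plus one bias per filter
frozen = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in vgg16_convs)

num_classes = 5                  # hypothetical number of target classes
flat = 7 * 7 * 512               # base output for a 224x224 input, after Flatten
trainable = (flat * 256 + 256) + (256 * num_classes + num_classes)

print(f"frozen: {frozen:,}  trainable: {trainable:,}")
# frozen: 14,714,688  trainable: 6,424,069
```

Flattening the 7x7x512 map produces 25,088 features, so the first Dense layer of the head alone has about 6.4 million weights. A common alternative is to replace Flatten with GlobalAveragePooling2D, which averages each of the 512 feature maps to a single value and shrinks that layer to 131,328 parameters.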

💡

When to use transfer learning: When your dataset is small (as a rough rule of thumb, fewer than 10,000 images), transfer learning usually outperforms training from scratch. Common pre-trained models: VGG16, ResNet50, InceptionV3, MobileNet.

Model Optimization

Early stopping — Stop training when validation loss stops improving to prevent overfitting
Learning rate scheduling — Reduce learning rate over time for finer convergence
Batch normalization — Normalize each layer's inputs using the mean and variance of the current mini-batch to stabilize and accelerate training
Data augmentation — Apply random transformations to training images to increase effective dataset size
Regularization (L1/L2) — Add penalties for large weights to prevent overfitting
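Each technique in this list is simple arithmetic underneath. The NumPy sketch below shows simplified versions of early stopping, a step-decay learning-rate schedule, training-mode batch normalization, an L2 penalty, and a random-flip augmentation. These are illustrative stand-ins, not Keras's implementations (in Keras you would reach for the EarlyStopping and LearningRateScheduler callbacks, the BatchNormalization layer, regularizers.l2, and preprocessing layers such as RandomFlip); the function names and default values here are assumptions.

```python
import numpy as np

def should_stop(val_losses, patience=3):
    """Early stopping: True once `patience` epochs have passed without a new best loss."""
    best_epoch = int(np.argmin(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Learning-rate schedule: multiply the rate by `drop` every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Batch normalization in training mode: standardize each feature over the batch,
    then apply a learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def l2_penalty(weight_arrays, lam=1e-4):
    """L2 regularization: lam * sum of squared weights, added to the training loss."""
    return lam * sum(float(np.sum(w ** 2)) for w in weight_arrays)

def random_flip(images, rng):
    """Augmentation: flip each (H, W, C) image left-right with probability 0.5."""
    out = images.copy()
    flip = rng.random(len(images)) < 0.5
    out[flip] = out[flip][:, :, ::-1, :]
    return out

rng = np.random.default_rng(0)

val_losses = [0.90, 0.70, 0.62, 0.63, 0.64, 0.65]
print(should_stop(val_losses))            # True: no improvement since epoch 2
print(step_decay(0.001, epoch=25))        # 0.001 * 0.5**2

x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # a batch of 32 samples, 4 features
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(4), y.std(axis=0).round(4))   # roughly 0 and 1 per feature

print(l2_penalty([np.array([1.0, 2.0]), np.array([[3.0]])], lam=0.1))   # 0.1 * 14

images = rng.random((8, 4, 4, 1))         # a batch of 8 tiny single-channel images
augmented = random_flip(images, rng)
```

At inference time, batch normalization uses running averages of the mean and variance collected during training rather than the statistics of the current batch; the sketch covers only the training-mode computation.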

Practice Questions

📝

Q1: Which activation function should you use in the output layer for a 10-class image classification problem?

A) ReLU
B) Sigmoid
C) Softmax
D) Tanh

Answer

C) Softmax. Softmax converts raw outputs into probabilities that sum to 1, making it ideal for multi-class classification. Sigmoid is for binary classification (single probability). ReLU and tanh are for hidden layers, not output layers in classification tasks.

📝

Q2: What is the purpose of the Dropout layer in a neural network?

A) Speed up training
B) Reduce overfitting by randomly disabling neurons
C) Increase model accuracy
D) Normalize input data

Answer

B) Reduce overfitting by randomly disabling neurons. Dropout randomly sets a fraction of neurons to zero during each training step. This forces the network to learn redundant representations, preventing it from relying too heavily on any single neuron. It is only active during training, not inference.
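A minimal NumPy sketch of "inverted" dropout, the variant Keras's Dropout layer uses: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so at inference the layer can simply pass its input through unchanged. The function name and test data are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale the survivors by 1 / (1 - rate); pass inputs through unchanged at inference."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate        # True with probability 1 - rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 100))

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)

print((train_out == 0).mean())   # fraction dropped, close to 0.3
print(train_out.mean())          # close to 1.0 thanks to the rescaling
print(np.array_equal(infer_out, activations))
```

Scaling during training, rather than at inference, keeps the expected activation the same in both modes, so no adjustment is needed when the trained model is used for predictions.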

📝

Q3: You have a dataset of only 500 images and need to build an accurate image classifier. Which approach is most appropriate?

A) Train a deep CNN from scratch
B) Use transfer learning with a pre-trained model like VGG16
C) Use K-Means clustering
D) Use logistic regression on raw pixels

Answer

B) Use transfer learning with a pre-trained model like VGG16. With only 500 images, training a deep CNN from scratch would very likely overfit severely. Transfer learning starts from a model pre-trained on over a million images (ImageNet) and adapts it to your small dataset, typically achieving much better performance.

📝

Q4: Which layer type is specifically designed to detect spatial features in images?

A) Dense
B) LSTM
C) Conv2D
D) Flatten

Answer

C) Conv2D. Convolutional layers (Conv2D) apply learnable filters across the image to detect features like edges, textures, and patterns. Dense layers are fully connected (no spatial awareness). LSTM is for sequences. Flatten converts 2D maps to 1D vectors.

📝

Q5: Which RNN variant uses gates to handle long-term dependencies and solve the vanishing gradient problem?

A) Simple RNN
B) Dense layer
C) LSTM
D) Conv1D

Answer

C) LSTM (Long Short-Term Memory). LSTM networks use three gates (forget, input, output) to selectively remember or forget information over long sequences. This largely overcomes the vanishing gradient problem that makes simple RNNs struggle to learn long-range dependencies.
