Beginner

Generative AI Fundamentals (24%)

Domain 2 of the AIF-C01 exam — foundation models, large language models, prompt engineering techniques, retrieval-augmented generation (RAG), and Amazon Bedrock.

What Is Generative AI?

Generative AI refers to AI systems that can create new content — text, images, code, audio, video — rather than just analyzing or classifying existing content. Unlike traditional ML that predicts a label or number, generative AI produces entirely new outputs.

Examples include chatbots that write essays, tools that generate images from text descriptions, code assistants that write functions, and systems that summarize documents. This domain is heavily tested because generative AI is the fastest-growing area of the AWS AI ecosystem.

Foundation Models

A foundation model (FM) is a large AI model trained on massive, broad datasets that can be adapted to a wide range of downstream tasks. Instead of training a model from scratch for each use case, you start with a foundation model and customize it.

Key Characteristics

Pre-trained on massive data — Trained on billions of data points (text, images, code) from the internet and other sources
General-purpose — Can perform many tasks without task-specific training (summarization, translation, Q&A, code generation)
Adaptable — Can be fine-tuned or prompted for specific use cases
Expensive to train — Training from scratch costs millions of dollars and requires massive compute. Most organizations use pre-trained FMs rather than building their own.

💡

Exam tip: The exam distinguishes between using a foundation model as-is (via prompting), fine-tuning it with your own data, and training a model from scratch. Know when each approach is appropriate. Prompting is cheapest and fastest. Fine-tuning adds domain-specific knowledge. Training from scratch is rarely needed.

Large Language Models (LLMs)

LLMs are a type of foundation model specifically designed for natural language tasks. They are trained on enormous text datasets and can generate, understand, translate, and summarize text.

How LLMs Work (Simplified)

LLMs predict the next token (word or sub-word) in a sequence, one at a time. Given "The cat sat on the," the model predicts "mat" as the most likely next token. By chaining these predictions, the model generates coherent paragraphs, essays, and code.

Key LLM Concepts for the Exam

Tokens — The units LLMs process. A token can be a word, part of a word, or a punctuation mark. "Unbelievable" might be split into ["un", "believ", "able"].
Context window — The maximum number of tokens the model can process at once. Larger context windows allow processing longer documents.
Temperature — Controls randomness. Low temperature (0.0-0.3) = deterministic, factual. High temperature (0.7-1.0) = creative, varied.
Top-p (nucleus sampling) — Controls diversity by limiting the pool of next-token candidates. Lower top-p = more focused output.
Hallucination — When the model generates plausible-sounding but factually incorrect information. A major challenge with LLMs.

Prompt Engineering

Prompt engineering is the practice of crafting effective inputs (prompts) to get the best outputs from a foundation model. It is the most cost-effective way to customize model behavior without fine-tuning.

Prompt Engineering Techniques

Zero-shot prompting — Ask the model to perform a task with no examples. "Classify this review as positive or negative: ..."
Few-shot prompting — Provide a few examples in the prompt before the actual task. The model learns the pattern from the examples.
Chain-of-thought prompting — Ask the model to "think step by step" to improve reasoning on complex problems.
System prompts — Set the model's role and behavior. "You are a helpful medical assistant. Always recommend consulting a doctor."

⚠

Exam focus: Know the difference between zero-shot, few-shot, and chain-of-thought prompting. The exam presents scenarios and asks which technique is most appropriate. Few-shot is best when you want consistent output format. Chain-of-thought is best for reasoning tasks.

Retrieval-Augmented Generation (RAG)

RAG is a technique that combines a foundation model with an external knowledge base. Instead of relying solely on the model's training data, RAG retrieves relevant information from your own documents and includes it in the prompt.

How RAG Works

Index — Your documents are chunked, converted to vector embeddings, and stored in a vector database
Retrieve — When a user asks a question, the system finds the most relevant document chunks using similarity search
Augment — The retrieved chunks are added to the prompt as context
Generate — The foundation model generates an answer based on both the question and the retrieved context

Why RAG Matters

Reduces hallucination — The model answers from your actual documents, not from potentially outdated or incorrect training data
No fine-tuning needed — You can add company-specific knowledge without retraining the model
Always up-to-date — Update the knowledge base with new documents at any time
Cost-effective — Much cheaper than fine-tuning or training a custom model

💡

Exam tip: RAG is a very commonly tested topic. The exam often asks: "A company wants their chatbot to answer questions about internal company documents. The chatbot should not make up answers. What approach should they use?" The answer is RAG. Amazon Bedrock Knowledge Bases is the AWS service that implements RAG.

Amazon Bedrock

Amazon Bedrock is the AWS fully managed service for building generative AI applications. It is the most important service for Domain 2 and frequently tested across the entire exam.

Key Bedrock Features

Multiple foundation models — Access models from Amazon (Titan), Anthropic (Claude), Meta (Llama), Cohere, Stability AI, and others through a single API
Bedrock Knowledge Bases — Managed RAG service. Connect your data sources (S3, web pages) and Bedrock handles chunking, embedding, vector storage, and retrieval
Bedrock Agents — Create AI agents that can plan multi-step tasks, call APIs, and access knowledge bases to complete complex workflows
Fine-tuning — Customize foundation models with your own labeled data for improved domain-specific performance
Guardrails — Define content filters, denied topics, and PII redaction to control model outputs and ensure responsible AI
Model evaluation — Compare models using automatic and human evaluation metrics

Amazon Titan Models

Amazon's own family of foundation models, available exclusively through Bedrock:

Titan Text — Text generation, summarization, Q&A
Titan Embeddings — Convert text to vector embeddings for RAG and semantic search
Titan Image Generator — Generate and edit images from text prompts
Titan Multimodal Embeddings — Embeddings for both text and images

Fine-Tuning vs Prompting vs RAG

The exam frequently asks when to use each approach. Here is the decision framework:

Prompting (Zero/Few-Shot)

When: General tasks, quick experimentation, no custom data needed. Cost: Lowest. Time: Instant. Example: "Summarize this article."

RAG

When: Need answers from specific documents, up-to-date information, reduce hallucination. Cost: Low-medium. Time: Hours to set up. Example: Q&A over company policies.

Fine-Tuning

When: Need domain-specific style, tone, or terminology. Model must "think" differently, not just access new data. Cost: Medium-high. Time: Days. Example: Medical report generation in a specific format.

Practice Questions

📝

Q1: A company wants its customer service chatbot to answer questions using the company's internal product documentation. The documentation is updated weekly. What is the MOST cost-effective approach?

A) Fine-tune a foundation model on the documentation
B) Train a custom model from scratch
C) Use retrieval-augmented generation (RAG) with Amazon Bedrock Knowledge Bases
D) Increase the model's temperature to improve creativity

Show Answer

C) Use RAG with Amazon Bedrock Knowledge Bases. RAG is ideal here: the company needs answers from specific documents that change weekly. RAG does not require retraining, and knowledge bases can be updated easily. Fine-tuning (A) is more expensive and would need to be repeated with each documentation update. Training from scratch (B) is overkill. Temperature (D) affects randomness, not knowledge.

📝

Q2: Which prompt engineering technique provides example input-output pairs before the actual task?

A) Zero-shot prompting
B) Few-shot prompting
C) Chain-of-thought prompting
D) Temperature tuning

Show Answer

B) Few-shot prompting. Few-shot prompting includes a few examples (input-output pairs) in the prompt to show the model the expected pattern. Zero-shot (A) provides no examples. Chain-of-thought (C) asks the model to reason step by step. Temperature tuning (D) is a parameter setting, not a prompting technique.

📝

Q3: When using an LLM, what does the "temperature" parameter control?

A) The size of the context window
B) The speed of inference
C) The randomness and creativity of the output
D) The number of tokens generated

Show Answer

C) The randomness and creativity of the output. Temperature controls how random the model's next-token selection is. Low temperature (near 0) produces deterministic, focused outputs. High temperature (near 1) produces more diverse, creative outputs. It does not affect context window size (A), speed (B), or output length (D).

📝

Q4: What is the PRIMARY benefit of using Amazon Bedrock over building your own LLM infrastructure?

A) Bedrock models are always more accurate
B) Bedrock provides fully managed access to multiple foundation models without managing infrastructure
C) Bedrock is the only way to use generative AI on AWS
D) Bedrock models never hallucinate

Show Answer

B) Bedrock provides fully managed access to multiple foundation models without managing infrastructure. Bedrock is a managed service that removes the need to provision, manage, and scale GPU infrastructure. It offers choice across multiple model providers through a single API. A is not guaranteed, C is incorrect (SageMaker JumpStart also offers FMs), and D is false (all LLMs can hallucinate).

📝

Q5: A foundation model generates a detailed but completely fabricated answer about a topic. What is this behavior called?

A) Overfitting
B) Underfitting
C) Hallucination
D) Data drift

Show Answer

C) Hallucination. Hallucination is when an LLM generates plausible-sounding but factually incorrect or entirely made-up information. It is one of the biggest challenges with generative AI. Mitigation strategies include RAG (grounding responses in real data), lower temperature, and guardrails.

← Previous AI/ML Fundamentals (20%) Next → AWS AI/ML Services (36%)