Study Notes — Certification Prep

Prompt Engineering
Study Guide

Comprehensive overview of prompt engineering frameworks, tactics, taxonomy, and operationalization.

Updated: April 2026
Version: 1.0
Category: Prompting
Reading Time: ~8 min
Author: Michaël Bettan
01

Definitions & Scope

Prompt Engineering

A rigorous engineering discipline focused on strategic design and iterative optimization, shifting AI from a static text predictor into a dynamic, agentic reasoning engine.

Large Language Models (LLMs)
Text generation models that utilize advanced algorithms and transformer architectures to predict the next token based on probabilistic likelihood.
Tokens
The fundamental linguistic unit in NLP (subwords, characters, words). Roughly 100 tokens equate to 75 words.
02

TCREI Framework

An industry-standard framework for transforming vague requests into high-performance, production-ready prompts. The first three steps (Task, Context, References) give the model direction and examples; the final two (Evaluate, Iterate) assess output quality and refine the prompt.

01

Task

Be explicit. Define the Action, Role, and Format (e.g., use a precise verb like "Refactor" rather than "Write").

02

Context

Provide background details and situational awareness to prevent generic answers and ungrounded "hallucinations."

03

References

Provide 3–5 diverse examples (Few-shot learning) to steer style and logic reliably.

04

Evaluate

Review output against success criteria. Use "Chain-of-Thought" prompting to reduce logic errors, and track results against gold-standard answers.

05

Iterate

Tweak the prompt using Evaluation results. Apply Positive Framing (tell the model what it should do).
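The three prompt-side steps above (Task, Context, References) can be sketched as a small prompt assembler; the field names and example strings below are illustrative, not a standard API, and Evaluate/Iterate happen afterwards on the resulting outputs.

```python
# Minimal TCREI-style prompt assembler (section names are illustrative).

def build_tcrei_prompt(task, context, references):
    """Assemble Task, Context, and References into one prompt.
    Evaluate and Iterate happen outside the prompt, on its outputs."""
    lines = [f"Task: {task}", f"Context: {context}", "References:"]
    lines += [f"- {ref}" for ref in references]
    return "\n".join(lines)

prompt = build_tcrei_prompt(
    task="Refactor the function below for readability.",
    context="Python 3.12 service; style guide forbids single-letter names.",
    references=["Before: def f(x): ...", "After: def parse_row(row): ..."],
)
print(prompt)
```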

03

Prompt Anatomy & Structure

Type of Prompts

  • System Prompt: Defines fundamental capabilities and overarching rules (e.g., "You strictly output JSON").
  • Role Prompt: Frames output style, voice, and personality (e.g., "Act as a senior DevOps engineer").
  • Contextual Prompt: Provides immediate, task-specific background data needed to process the query.

"Lost in the Middle"

LLMs pay the most attention to the very beginning and very end of a long prompt, but tend to skim the middle.

  • Top of Prompt: Place large context documents and reference data here.
  • Bottom of Prompt: Place critical constraints, task descriptions, and output formatting rules at the very end.
Delimiters
Use clear markers (###, """, or <xml>) to separate instructions from data.
Leading Words
End your prompt with the exact first character of the expected response (e.g., "The valid JSON response is: {").
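A minimal sketch combining the three tactics above (placement, delimiters, leading words); the document text and the choice of ### as the marker are arbitrary.

```python
# Assemble a prompt with the data first, delimiters around it, and the
# critical instruction plus a leading character at the very end.
document = "Quarterly revenue grew 12% year over year."

prompt = (
    "###\n" + document + "\n###\n"
    "Summarize the report between the ### markers in one sentence, "
    "as a JSON object with a single 'summary' key.\n"
    "The valid JSON response is: {"
)
print(prompt)
```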
04

Optimization & Technical Tactics

Basic Optimization Tactics

Positive Framing: Tell the model what it should do rather than what it shouldn't. LLMs respond better to "Include X" than "Avoid Y."
Prompt Versioning: Prompt engineering is highly iterative. Treat prompts like code. Document your attempts tracking Prompt text, Temperature, Top-P, Top-K, and Output.

Prompt Caching
Place large static documents, system instructions, and few-shot examples at the very beginning of the prompt. Caching cuts costs by up to 90% and drastically speeds up inference.
Prompt Security (Guardrails)
Defend against Prompt Injection and System Leakage by utilizing input/output moderation models to intercept toxic or out-of-scope requests before reaching the main LLM.
Variables in Prompts (Templating)
For reusable enterprise prompts, use bracketed variables (e.g., {city}, {document}) to programmatically inject data into the exact same structure.
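The bracketed-variable pattern maps directly onto Python's built-in string formatting; the template text below is an invented example.

```python
# Reusable prompt template with {city} and {document} slots filled
# programmatically, so every call shares the exact same structure.
TEMPLATE = (
    "Using only the facts in {document}, list three attractions in {city}."
)

prompt = TEMPLATE.format(city="Lyon", document="the attached 2025 city guide")
print(prompt)
```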
Automatic Prompt Engineering (APE)
Using an LLM to generate 10 variations of an instruction, test them all, and select the one yielding the best factual accuracy.
Class Mixing (Few-Shot)
When providing examples for classification tasks, you must randomize their order. Listing all "Positive" examples first causes the model to overfit to the ordering pattern rather than the content, and to hallucinate labels.
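Shuffling the example order before building a few-shot classification prompt is a one-liner; the labels and review texts below are invented.

```python
import random

# Randomize class order so no single label clusters at the top of the prompt.
examples = [
    ("Great battery life", "Positive"),
    ("Arrived broken", "Negative"),
    ("Does the job, nothing more", "Neutral"),
    ("Love the design", "Positive"),
    ("Stopped working after a week", "Negative"),
]
random.shuffle(examples)

few_shot_block = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in examples)
print(few_shot_block)
```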
05

Prompt Chaining (Pipeline Generation)

Avoid the "Mega-Prompt"

Asking the model to perform five complex tasks simultaneously overloads attention and drastically increases hallucinations. Instead, break multifaceted problems into sequential, isolated prompts.
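A chained pipeline can be sketched with a stand-in for the model call; `call_llm` below is a hypothetical stub, not a real client API, so the pipeline shape is runnable.

```python
# Three isolated prompts run in sequence; each step's output feeds the next.

def call_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a real model call.
    return f"<model output for: {prompt.splitlines()[0]}>"

def summarize_pipeline(document: str) -> str:
    facts = call_llm(f"Extract the key facts from this text:\n{document}")
    draft = call_llm(f"Write a one-paragraph summary using only these facts:\n{facts}")
    return call_llm(f"Polish the tone of this summary for executives:\n{draft}")
```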

Positive Instructions vs. Strict Constraints

LLMs perform better with positive framing (what to do). Reserve strict negative constraints (ONLY, NEVER, DO NOT) exclusively for output formatting (e.g., strictly JSON), safety guardrails, or preventing specific hallucinations.

06

Advanced Prompting Tactics

Inner Monologue
Hiding reasoning steps from the final user output using specific formats (e.g., wrapping thought loops in """).
Meta Prompting
Using one AI model to generate, analyze, or refine a prompt for another AI model.
Least to Most
Sequentially generating increasingly detailed knowledge on a topic, feeding previous outputs into the next prompt.
Ask for Context
Allowing the LLM to ask clarifying questions before answering a user query.
Text Style Unbundling
Extracting features (tone, length, vocabulary) from a document to replicate its exact writing style.
07

Agentic Frameworks (Autonomous)

ReAct (Reason + Act)
A prompting paradigm interleaving "Thoughts" with "Actions." The model generates a reasoning trace, executes an action (like an API call or web search), and observes the result before continuing.
Reflexion (Self-Correction Loop)
Forcing the model to act as its own QA engineer. 1) Actor (generates draft), 2) Evaluator (scores draft against rules), 3) Self-Reflection (analyzes failures and writes corrected final output).
Role Prompting
Assigning a specific persona to narrow the model's probabilistic output space to domain-specific vocabulary.
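The Reflexion loop above can be sketched as three functions; the actor/evaluator stubs and the "sources cited" rule are invented so the loop is self-contained and runnable.

```python
# Reflexion sketch: Actor drafts, Evaluator scores, Self-Reflection feeds
# failures back into the next draft.

def actor(task: str, feedback: str = "") -> str:
    return f"Draft for {task}. {feedback}".strip()

def evaluator(draft: str) -> str:
    # Hypothetical rule: every draft must cite its sources.
    return "ok" if "sources cited" in draft else "missing citations"

def reflexion(task: str, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        draft = actor(task, feedback)
        if evaluator(draft) == "ok":
            return draft
        feedback = f"Fix previous failure ({evaluator(draft)}): sources cited."
    return draft
```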
08

Advanced Parameter Tuning

Beyond the text, engineers must control the mathematical parameters governing token selection.

Temperature
Controls randomness. Use a low temperature (0.1–0.3) for Logic, Math, and strict Formatting. Use a high one (0.7+) for Creativity.
Top-P (Nucleus Sampling)
Controls vocabulary diversity. Rule of thumb: Alter Temperature OR Top-P, but rarely both.
Frequency Penalty
Penalizes words based on how often they have already appeared in the text. Prevents repetitive loops.
Presence Penalty
Penalizes words based on whether they have appeared at all. Forces the model to introduce new topics.
Top-K
Restricts next-token prediction to the top K most likely tokens. A lower Top-K keeps the model more factual and deterministic.
Max Token Length
Restricts output length. Warning: Hitting the limit abruptly stops generation (causing broken JSON); it does not force the model to write more concisely.
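Top-K and Top-P can be illustrated in pure Python over a toy next-token distribution, to show what each knob removes from the candidate pool (the tokens and probabilities are made up):

```python
# Toy next-token distribution, sorted filtering only; no real sampling.
probs = {"the": 0.50, "a": 0.25, "cat": 0.15, "zebra": 0.07, "qux": 0.03}

def top_k(dist, k):
    """Keep only the K most likely tokens."""
    return dict(sorted(dist.items(), key=lambda kv: -kv[1])[:k])

def top_p(dist, p):
    """Keep the smallest set of tokens whose cumulative probability covers p."""
    kept, total = {}, 0.0
    for tok, pr in sorted(dist.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return kept

print(top_k(probs, 2))   # → {'the': 0.5, 'a': 0.25}
print(top_p(probs, 0.9)) # keeps tokens until 90% of the mass is covered
```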

The Repetition Loop Bug

Extreme or clashing sampling settings (like high temperature with low Top-P) can cause the model to get "stuck" generating the same filler word, phrase, or sentence structure repeatedly until the context window fills.

09

Prompting Taxonomy

Zero & Few-Shot

  • Zero-Shot Prompting: Relying purely on precise semantic phrasing without prior examples.
  • Few-Shot Prompting (In-Context): Providing a limited set of demonstration pairs (input-output) to leverage pattern-matching.

Thought Generation

  • Thought Generation: Prompting the model to explicitly articulate internal reasoning (e.g., Chain-of-Thought).
  • Decomposition: Breaking highly complex, multi-faceted problems into smaller, manageable sub-problems to prevent context loss.

Evaluation

  • Ensembling: Generating multiple parallel outputs from identical/varied prompts and aggregating them for a robust consensus.
  • Self-Criticism: Requiring the model to independently evaluate, critique, and refine its own initial outputs.
10

Advanced Thought & Structural Patterns

Contrastive Chain-of-Thought (CCoT)
Embedding both valid and invalid reasoning pathways within the prompt. Showing miscalculations explicitly maps the boundaries of correct reasoning.
Step-Back Prompting
Forcing the LLM to pause and abstract a granular problem into a high-level conceptual inquiry before calculating the final answer. Highly synergistic with RAG.
Self-Consistency (Ensembling)
Running the exact same prompt multiple times (using temperature-based decoding) to generate diverse reasoning paths and feed answers back into the LLM for consensus.
Tree of Thoughts (ToT)
Generalizing CoT into a dynamic tree structure. The model generates multiple candidate thoughts, self-evaluates them, and uses search algorithms (BFS/DFS) to look ahead or backtrack.
Recursive Self-Improvement (RSIP)
Treating the LLM as a continuous reasoning loop. The model critiques itself through shifting "evaluation lenses" (e.g., logic, structural flow, tone), revising specific flaws each pass.
Context-Aware Decomposition (CAD)
Breaking a massive task into 3–5 macro-components, but forcing the model to explicitly define how the localized sub-task interacts with global system dependencies.
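Self-Consistency reduces to a majority vote over sampled answers; `sample_llm` below is a deterministic stand-in for temperature-based sampling, with the disagreement hard-coded for illustration.

```python
from collections import Counter

# Sample the same prompt several times, then take the majority answer.

def sample_llm(prompt: str, seed: int) -> str:
    # Stub: pretend some reasoning paths disagree.
    answers = ["42", "42", "41"]
    return answers[seed % len(answers)]

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    votes = Counter(sample_llm(prompt, seed) for seed in range(n))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))  # → 42 (majority vote)
```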
11

Task-Specific Prompt Blueprints

Information Extraction (NER)
Concept: Forcing the model to pull exact data points without adding conversational filler.
Formula: Extract the following entities from the text below: [Entity 1, Entity 2]. Output strictly as a JSON array.
Classification & Sentiment
Concept: Restricting the model's output to predefined categories.
Formula: Classify the following text into exactly one of these categories: [Cat A, Cat B, Cat C]. Do not provide explanations.
Anti-Hallucination Summarization
Concept: Grounding the model to only use provided context.
Formula: Summarize the following text. You must ONLY use the facts provided in the text. If a fact is not in the text, do not include it. Cite your claims using [Paragraph Number].
Structured Input Schemas
Concept: Using JSON schemas to format your input data (not just outputs) acts as a strict blueprint for attention.
Formula: Evaluate the following product data to answer the user's query: { "name": "Headphones", "price": 99.99 }
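The Information Extraction blueprint above can be wrapped as a reusable function; the entity names and sample sentence are illustrative.

```python
# Build the extraction formula programmatically from an entity list.

def extraction_prompt(entities: list[str], text: str) -> str:
    return (
        f"Extract the following entities from the text below: {entities}. "
        "Output strictly as a JSON array.\n\n" + text
    )

prompt = extraction_prompt(["Person", "Date"], "Ada Lovelace was born in 1815.")
print(prompt)
```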
12

Emergent Roles & Enterprise LLMOps

5-Stage Maturity Model

  • Ad-hoc Experimentation
  • Template Standardization
  • Systematic Evaluation
  • Production Observability
  • Continuous Optimization

Prompt vs. Tuning

  • Prompt Design: Relies entirely on base model weights. Fast to build, but lengthy prompts increase per-call token costs.
  • Supervised Fine-Tuning (SFT): Physically alters underlying weights. Allows for much shorter prompts, significantly lowering latency and cost.
  • LoRA: Updates only a minute fraction of the model's parameters via small adapter layers. Highly cost-effective customization.
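The latency/cost trade-off between prompt design and fine-tuning comes down to token arithmetic; the per-token price and token counts below are invented round numbers, not real pricing.

```python
# Illustrative arithmetic: why shorter post-tuning prompts lower cost.
price_per_1k_tokens = 0.01    # hypothetical $/1K input tokens
long_prompt_tokens = 3000     # prompt design: lengthy instructions + examples
short_prompt_tokens = 300     # fine-tuned model: short trigger prompt
calls_per_day = 10_000

daily_saving = (
    (long_prompt_tokens - short_prompt_tokens) / 1000
    * price_per_1k_tokens
    * calls_per_day
)
print(f"${daily_saving:.2f} saved per day")  # → $270.00 saved per day
```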
Prompt Engineer
Tasked exclusively with pattern design, template governance, and assessing prompt sensitivity across model updates.
Knowledge Engineer
Responsible for curating, structuring, and maintaining vast RAG sources and vector schemas.
AI QA/Evaluation Engineer
Focused on building automated testing suites, managing red-team operations, and continuously probing prompts for jailbreaks, hallucinations, and bias.
AIOps Specialist
Manages overarching infrastructure, oversees complex cost tuning (token length vs. inference speed), and ensures seamless observability/rollback capabilities.

Self-Assessment Questions

Q1. What does the "Lost in the Middle" phenomenon refer to in LLMs?

LLMs pay the most attention to the very beginning and very end of a long prompt but skim the middle, so critical constraints should be placed at the end.

Q2. Why is "Prompt Chaining" preferred over creating a single "Mega-Prompt"?

Asking the model to perform multiple complex tasks simultaneously overloads attention and increases hallucinations. Breaking it into sequential prompts improves accuracy.

Q3. What is the difference between Zero-Shot and Few-Shot prompting?

Zero-Shot relies purely on precise phrasing without examples. Few-Shot provides a limited set of demonstration pairs (input-output) to leverage pattern-matching.