If you’re interested in a brief overview of AI architecture and the basics, I highly recommend this YouTube video. Also check out the five-day course published by Kaggle, available here.
A
Activation Function: A mathematical function applied to a node’s output in a neural network to introduce nonlinearity and help the model learn complex patterns.
Adapters: Small, trainable modules inserted into a pretrained model to allow it to adapt to specific tasks without modifying the entire model. Adapters enable efficient fine-tuning, as only the adapter modules are updated, leaving the core model intact.
Adapter-Based Fine-Tuning: A method where small neural network modules, called adapters, are inserted into a pretrained model. Only the adapters are trained for new tasks, while the original model weights stay mostly unchanged. This makes fine-tuning more efficient and allows multiple tasks to be handled with minimal extra parameters.
Here’s an entry for AI Models along with the list of top 10 models used by prompt engineers:
AI Models
An AI model refers to a mathematical model that has been trained on large amounts of data to recognize patterns, make decisions, or generate content. These models are the backbone of AI systems and vary in complexity depending on the tasks they are designed to perform. The models are typically built using machine learning or deep learning algorithms, and once trained, they can be used for a wide range of applications, including text generation, image recognition, translation, and more.
Top 10 AI Models Used by Prompt Engineers
- GPT-4.1 (OpenAI)
Definition: GPT-4.1 is a highly advanced language model optimized for tasks like coding, instruction-following, and long-context understanding. It can process millions of tokens and is renowned for its versatile prompt engineering capabilities.
Use: Ideal for complex tasks, including technical and multi-step problem solving. - Gemini 2.0 (Google DeepMind)
Definition: Gemini 2.0 excels at generating engaging and creative content, particularly for platforms requiring high-energy and punchy responses.
Use: Perfect for content creation in marketing, social media, and entertainment. - Claude 2 (Anthropic)
Definition: Claude 2 focuses on strong reasoning and ethical AI usage, making it a trusted model for sensitive and nuanced tasks.
Use: Preferred for tasks that require robust safety features and logical reasoning. - Mistral 7B (Mistral AI)
Definition: An open-weight model offering a balance between performance and resource efficiency, Mistral 7B is effective for both research and practical applications.
Use: Great for cost-effective, efficient models in various applications. - LLaMA 2 (Meta)
Definition: LLaMA 2 is an open-source, research-oriented model series developed by Meta, optimized for scalability and versatility in AI tasks.
Use: Frequently used for research and custom AI development due to its flexibility. - GPT-4o (OpenAI)
Definition: GPT-4o is an earlier version of GPT-4 that focuses on high-quality content generation with flexible handling of various tasks.
Use: Previously popular for conversational agents and general-purpose AI applications. - GPT-3.5 (OpenAI)
Definition: GPT-3.5 is widely recognized for its solid performance and cost-effectiveness, often chosen for less complex tasks or large-scale deployments.
Use: Used for many standard NLP tasks, including chatbots and simple content generation. - Gemini 1.5 (Google DeepMind)
Definition: Gemini 1.5 is an earlier version of Google’s Gemini models, balancing performance with lower resource requirements.
Use: Suitable for general AI applications, offering a good balance of cost and performance. - Claude 1 (Anthropic)
Definition: Claude 1 laid the foundation for the Claude series, focusing on safety and reliability in AI systems.
Use: Early adoption for trusted conversational AI and ethical applications. - LLaMA (Meta)
Definition: The original LLaMA model series from Meta set the stage for subsequent developments in AI research, focusing on high-quality, open-source models.
Use: Primarily used in research environments, particularly in academia and for specialized AI projects.
Artificial General Intelligence (AGI): A hypothetical AI system that possesses general intelligence and can perform intellectual tasks as well as or better than a human. AGI does not currently exist.
Attention: A technique used in transformers and other neural networks that allows models to focus on relevant parts of the input when generating text.
AutoRater: An AI model trained to automatically evaluate and score the quality of other AI-generated outputs. AutoRaters are often used in reinforcement learning systems to replace human feedback, making it faster and cheaper to improve AI models.
B
Batching: A technique used in machine learning where multiple input data samples are processed together in a single step or “batch” to improve efficiency. By processing multiple inputs simultaneously, batching speeds up both training and inference by taking advantage of parallel processing capabilities of modern hardware.
BERT: Bidirectional Encoder Representations from Transformers. A popular natural language processing technique developed by Google.
Bias: In artificial intelligence and machine learning, bias refers to systematic errors or unfair tendencies in a model’s predictions, often reflecting imbalances or prejudices present in the training data. Bias can lead to inaccurate, unfair, or harmful outputs, and addressing it is a key challenge in building ethical AI systems.
Example: If a language model trained on biased data consistently associates certain professions with specific genders, it is exhibiting bias learned from its training sources.
C
Chain of Thought Prompting: A technique where a model is guided to generate intermediate reasoning steps before reaching a final answer. By encouraging the model to “think out loud,” chain of thought prompting improves performance on complex tasks like math, logic, and problem-solving. From the Google Whitepaper on prompting: “Chain of thought prompting is based on greedy decoding, predicting the next word in a sequence based on the highest probability assigned by the language model. Generally speaking, when using reasoning, to come up with the final answer, there’s likely one single answer. Therefore, the temperature should always be set to zero. https://www.kaggle.com/whitepaper-prompt-engineering
Example: For a question like, “If a train leaves at 3 PM and travels for 2 hours, what time does it arrive?”, a chain of thought response would be: “The train leaves at 3 PM. It travels for 2 hours. Adding 2 hours to 3 PM gives 5 PM. Therefore, the train arrives at 5 PM.”
Chatbot: A computer program designed to simulate conversation with human users, especially over the internet. Chatbots like ChatGPT are a type of conversational agent.
ChatGPT: A large language model chatbot created by Anthropic and launched in November 2022. It is built on GPT technology.
Chunk Settings: Parameters or configurations that control how large blocks of data (chunks) are split, processed, or fed into an AI model. Chunking is especially important when working with large text, documents, or datasets that exceed a model’s token limit. Good chunk settings help maintain context, coherence, and performance during input processing or generation tasks.
Why Chunk Settings Matter in AI:
- Token Limits: Models have a maximum token capacity (e.g., 8K, 32K tokens). Chunking ensures inputs stay within this limit.
- Memory Management: Smaller chunks make it easier for the model to process data without losing important information.
- Context Preservation: Proper chunk sizing helps the AI maintain understanding across larger documents by overlapping or intelligently dividing text.
Typical Chunk Settings:
- Chunk Size: How large each chunk is, usually measured in tokens or words (e.g., 512 tokens per chunk).
- Chunk Overlap: How much content overlaps between chunks to preserve context between them (e.g., 50-token overlap).
- Maximum Chunks: Limits the number of chunks processed to avoid exceeding system limits.
Codebase: The entire collection of source code used to build a software application, system, or project. It includes all the files, libraries, and resources that developers use to create, maintain, and update a project. A codebase can be stored in version control systems like Git to track changes and manage collaboration. Example: In a machine learning project, the codebase would include scripts for data preprocessing, model training, evaluation, and deployment.
Coherence: The quality of being logical, consistent, and connected in writing or speech. In the context of AI, coherence refers to how well a model’s output flows naturally and maintains clear, understandable relationships between ideas across sentences, paragraphs, or conversational turns.
Why Coherence Matters in AI:
- User Experience: Coherent responses are easier for humans to read and understand.
- Task Performance: High coherence improves the effectiveness of tasks like summarization, storytelling, and multi-step reasoning.
- Trustworthiness: A coherent model output feels more reliable and intelligent to users.
Example:
Incoherent Response:
“Cats are pets. Sometimes people like dogs. Pets have fur. Independent is sometimes.”
Coherent Response:
“Cats are popular pets because they are independent and affectionate. They require less attention than dogs but still enjoy human companionship.”
Constraint: A rule, limit, or condition placed on an AI model’s output or behavior to guide how it responds. Constraints help ensure that the generated content meets specific requirements, such as length, style, format, tone, or topic relevance.
Example: A prompt might include a constraint like “Respond in exactly 100 words” or “Use only formal language appropriate for business communication.”
Context Window:
A context window refers to a fixed-size sliding window that moves across a sequence of tokens (such as words or characters) in a text. This window captures the surrounding context of a specific token, allowing models to consider nearby information when making predictions or understanding the meaning of that token.
For example, in language modeling tasks, a context window helps neural networks process words in context. The size of the window determines how many preceding and subsequent tokens are considered. Larger context windows can capture more distant dependencies, but they also increase computational complexity.
In other words, a context window provides the necessary context for AI models to make informed decisions based on the surrounding words or characters in a given sequence.
Contextual Prompting: A method of providing a model with specific context or background information within the prompt itself, enabling the model to generate more relevant and accurate responses. This approach helps the model understand the nuances or requirements of the task by embedding context directly into the input.
Example: If you want a model to summarize a scientific article, you might provide a contextual prompt like:
“Given the following scientific article about climate change, summarize the key findings:” followed by the article text. This helps the model understand that it should summarize, not just generate random text.
Cross-validation: A technique for assessing how a model performs on unseen data by dividing the dataset into training and testing subsets multiple times.
Curriculum learning: Training an AI system on progressively more difficult tasks, similar to how students progress through grade levels.
D
DALL-E: A generative AI system created by OpenAI that can create realistic images and art from text descriptions.
Data Preprocessing: The process of preparing raw data for analysis or machine learning by cleaning, transforming, and organizing it into a usable format. This step often involves removing noise, handling missing values, normalizing or scaling data, and encoding categorical variables to ensure that the model can learn effectively from the data. Example: In a machine learning project, data preprocessing might involve converting text data into numerical features (e.g., using TF-IDF or word embeddings) and normalizing numerical data to ensure consistent scales before feeding it into a model.
Decision Trees: a type of machine learning algorithm used for both classification and regression tasks. They model decisions and their possible consequences, including outcomes, resource costs, and utility. A decision tree works by splitting data into subsets based on different attributes, and it continues to split these subsets until it reaches a decision or prediction.
Deterministic Output: The result produced by a model or system that is consistent and predictable every time it is given the same input. In deterministic systems, there is no randomness or variation in the output, meaning that the model always produces the same response or outcome for identical inputs.
Example: In a rule-based system or a model with no randomness (like a fixed algorithm), given the same input data, it will always output the same result, such as calculating “5 + 3” always producing “8.”
Dialogflow: A natural language understanding (NLU) platform developed by Google that enables developers to build conversational interfaces, such as chatbots and voice-powered apps. Dialogflow interprets user input, matches it to intents, and generates structured responses, helping create interactive, human-like conversations across different platforms like websites, apps, and messaging services.
Example: A business might use Dialogflow to build a chatbot that answers customer service questions on their website or integrates with Google Assistant.
Distillation: A process in machine learning where a smaller, more efficient model (the student) is trained to mimic the behavior of a larger, more complex model (the teacher). This allows for faster and more resource-efficient models without significantly sacrificing performance.
E
ELIZA: An early natural language processing computer program created in the 1960s to simulate conversation.
Embeddings: Representing words, phrases or items as numeric vectors that encode semantic meaning based on context. Allows AI models to understand language.
Epoch: One complete pass through the entire training dataset during the learning process of a machine learning model.
F
F1 Score: A metric used to evaluate a model’s accuracy by balancing precision and recall. It is the harmonic mean of precision and recall, providing a single score that reflects both false positives and false negatives. A higher F1 score indicates better overall performance, especially on imbalanced datasets.
Feature Engineering: The process of selecting, modifying, or creating input variables (features) to improve a machine learning model’s performance.
Federated learning: A distributed machine learning approach where models are trained across decentralized devices or servers holding local data samples, without exchanging their training data.
Few-shot Prompting: A method where a model is provided with a few examples of a task before it is asked to perform the same task on new input. This allows the model to generalize and perform the task more effectively, leveraging the few examples as context for understanding the task.
Example: In few-shot prompting, you might provide a model with a few examples of sentence translations, such as:
- “Translate ‘Good morning’ into French: ‘Bonjour.'”
- “Translate ‘How are you?’ into French: ‘Comment ça va?'”
Then, you ask it to translate a new sentence: “Translate ‘Thank you’ into French.”
Fine-tuning: The process of taking a pretrained machine learning model and customizing it with additional data and training to perform a specific task. This is done with large language models like GPT-3.
Flash Attention: A highly optimized attention mechanism designed to speed up the self-attention process in transformer models. By improving memory and computational efficiency, Flash Attention allows models to handle longer sequences and larger datasets faster while using less GPU memory.
G
Generative AI: AI systems capable of generating new content like text, images, video, and audio from scratch. Large language models like GPT-3 are a type of generative AI.
General Principles: Fundamental rules, guidelines, or concepts that govern behavior, decision-making, or problem-solving across various contexts. These principles are typically broad in nature and apply to a wide range of tasks or domains, providing a foundational understanding that can be adapted to specific scenarios.
Example: In machine learning, general principles might include the importance of data preprocessing, model validation, and regularization techniques, which are applied across different types of tasks, whether it’s classification, regression, or clustering.
Generative Pretrained Transformer (GPT): A series of natural language processing models developed by OpenAI using the transformer technique. GPT-3 is the third version.
Gradient Descent: An optimization algorithm that adjusts a model’s parameters to minimize the error by moving step-by-step toward the lowest point of a loss function.
Greedy Decoding: A method of text generation where the model selects the token with the highest probability at each step, without considering any alternatives. This approach aims for the most likely sequence but can lead to repetitive or less creative outputs because it doesn’t explore less probable options.
Example: In generating a sentence, if the model predicts “the” with the highest probability followed by “cat,” “sat,” and so on, greedy decoding will always choose the most probable word at each step, resulting in a straightforward and deterministic output.
H
Hallucination: When AI systems like generative chatbots produce false information, make up facts or exhibit inconsistent responses. An ongoing challenge.
Human-in-the-loop: A technique in AI where humans work together with AI systems to enhance performance. Used to improve generative models.
Hyperparameters: The variables that govern the training process and model architecture for machine learning algorithms. Must be tuned for optimal results.
I
Inference Process: The phase where a trained machine learning model uses its learned patterns to make predictions or generate outputs based on new input data. Inference is what happens when the model is deployed and put to use, as opposed to being trained.
Information Extraction: The process of automatically identifying and extracting structured information from unstructured text. This can include extracting entities (e.g., names, dates), relationships, and other relevant data points to transform raw text into useful insights.
Instruction: A clear and direct command or request given to an AI model within a prompt to tell it exactly what task to perform. Instructions can specify actions like summarizing, generating, analyzing, or formatting information, helping to shape the model’s response to match user expectations.
Example: “Write a five-sentence story about a cat who becomes a mayor.”
Intermediate Reasoning Steps: The individual, logical steps that a model takes between the initial input and the final answer when solving a problem. Laying out these steps helps make the model’s thought process more transparent, improves accuracy, and reduces errors in complex tasks.
Example: In solving “What is (5 + 3) × 2?”, the intermediate reasoning steps would be:
- First add 5 + 3 to get 8.
- Then multiply 8 × 2 to get 16.
Interpretability: The ability to explain how and why an AI model makes decisions. Important for understanding generative models.
J
JSON Repair: The process of identifying and correcting errors or issues in a JSON (JavaScript Object Notation) file to ensure it is properly formatted and can be processed correctly by applications or systems. Common issues include missing commas, mismatched brackets, or incorrect data types. JSON is widely used in AI for data exchange, configuration files, and API responses. Inaccurate or improperly formatted JSON can lead to failures in data processing or communication between AI systems, hindering model training, deployment, or integration tasks. Tools like json-repair (available on PyPy) are invaluable when it comes to JSON repair.
JSON Schema for Input: A blueprint or specification that defines the structure, format, and validation rules for the input data in JSON format. It helps ensure that the data sent to a system or API adheres to expected structures, such as required fields, data types, and constraints. In AI, JSON schemas are often used to ensure the proper format of training data, API requests, or configuration files.
Example 1:
Schema for User Data
{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer", "minimum": 18 },
"email": { "type": "string", "format": "email" }
},
"required": ["name", "email"]
}
This schema ensures that input data for a user has a name, an email (valid email format), and an age of at least 18.
Example 2:
Schema for AI Model Configuration
{
"type": "object",
"properties": {
"model_type": { "type": "string" },
"max_tokens": { "type": "integer", "minimum": 100 },
"temperature": { "type": "number", "minimum": 0, "maximum": 1 }
},
"required": ["model_type", "max_tokens"]
}
This schema ensures that any input configuration for the AI model includes the model type and a minimum number of tokens, while also restricting the temperature setting between 0 and 1.
L
Large language model: AI models like GPT-3 that are trained on massive text datasets to generate human-like text. Key to generative AI.
LLaMA: Short for “Large Language Model Meta AI,” LLaMA is a series of open-weight language models developed by Meta (Facebook’s parent company). Designed for research and practical applications, LLaMA models are smaller and more efficient than some traditional large models while still delivering strong performance.
LLM (Large Language Model): A type of AI model trained on massive amounts of text data to understand and generate human-like language. LLMs can perform a wide range of tasks such as answering questions, writing content, translating languages, and more.
LoRA (Low-Rank Adaptation): A method used for efficient fine-tuning of large pretrained models by adding low-rank layers to the model. LoRA reduces the number of parameters that need to be adjusted, making the fine-tuning process faster and more memory-efficient without sacrificing performance.
M
Machine learning: The study of computer algorithms that can improve automatically through experience and data. Powers modern AI like generative models.
Model: In machine learning, a model is a mathematical representation that learns patterns from data to make predictions or decisions. Once trained on a dataset, the model can be used to infer outcomes on new, unseen data based on its learned patterns.
Model Garden: A collection or catalog of machine learning models available within a platform (such as Google Cloud’s Vertex AI). Model Gardens provide users with access to a variety of pre-trained, fine-tuned, or customizable models for different tasks like text generation, image classification, translation, and more. Users can browse, compare, and deploy models directly from the garden.
Example: A developer might visit the Model Garden to quickly find a large language model for text summarization or a vision model for image recognition, without having to train one from scratch.
Model Performance: A measure of how well an AI model accomplishes its intended task, typically evaluated using metrics like accuracy, speed, efficiency, coherence, and robustness. Performance assessments help developers understand a model’s strengths, weaknesses, and areas for improvement.
Key Aspects of Model Performance:
- Accuracy: How correctly the model makes predictions or generates outputs.
- Speed/Latency: How quickly the model can produce a response.
- Throughput: The number of tasks or queries a model can handle over a period of time.
- Coherence and Relevance: How logically consistent and on-topic the outputs are.
- Robustness: How well the model handles noisy, unexpected, or adversarial inputs.
- Efficiency: How well the model uses computational resources (like memory and processing power).
When evaluating a chatbot:
- High performance: Quickly gives accurate, relevant, and coherent answers to customer questions.
- Low performance: Takes too long, gives wrong or confusing answers, or misunderstands the question.
Multimodal LLM (Large Language Model): A type of AI model that can process and generate multiple types of data, such as text, images, and audio, within the same model framework. Multimodal LLMs enable more complex and flexible interactions by integrating different modalities to understand and respond in richer ways.
Multimodal Prompting: A technique where a model is prompted using multiple types of input data—such as text, images, audio, or video—rather than just text alone. Multimodal models can understand and generate responses that combine information from different sources, enabling richer and more complex interactions.
Example: A user might upload an image of a dog and ask the model, “Describe the breed and suggest a good name,” combining visual and text-based inputs.
N
Natural language processing (NLP): The ability of a computer program to understand, interpret, and manipulate human language. Allows chatbots to converse.
Neural network: A computing system inspired by the human brain’s neurons. Neural nets power deep learning algorithms used to create generative AI.
Nucleus Sampling: A method of text generation where the model selects the next token from a dynamic subset of the vocabulary, which includes the smallest number of tokens whose cumulative probability exceeds a threshold p. This helps to generate more coherent and diverse outputs compared to traditional methods like top-k sampling by focusing on the most probable options while avoiding the extremes. Example: If p = 0.9, the model will consider only the smallest set of tokens that together have a 90% cumulative probability, giving it more flexibility and diversity in generating text.
O
One-shot Prompting: A type of input to a model where a single example is provided to guide the model’s response or task execution. The model is expected to understand the task and generate appropriate outputs based on the single example, allowing it to generalize to similar tasks without extensive training. Example: Asking a language model, “Translate the sentence ‘Hello, how are you?’ into French” after showing it one example translation, such as “Translate ‘Good morning’ into French: ‘Bonjour,'” is a one-shot prompt.
OpenAI: A San Francisco AI research company that created important generative AI models like GPT-3 and DALL-E. https://openai.com/
Output Format: The structure or arrangement of the information produced by an AI model in response to a prompt. Output format defines how the AI organizes and presents the results, such as text, tables, lists, code, or other structured data. Specifying the output format in a prompt helps ensure that the model’s response is delivered in a way that meets the user’s needs.
Example: “Provide the list of cities in a table with columns for Name, Country, and Population.”
Common Output Formats:
- Text: Standard paragraph or sentence-based output, suitable for general responses.
Example: “The Eiffel Tower is located in Paris, France.” - List: A series of items presented in a bullet-point or numbered list format.
Example:- Apples
- Bananas
- Oranges
- Table: A structured format with rows and columns to display data.
Example: Name Age Occupation Alice 30 Engineer Bob 25 Designer - JSON (JavaScript Object Notation): A data format that organizes information into key-value pairs, often used for data exchange.
Example:{ "name": "Alice", "age": 30, "occupation": "Engineer" }
- Code: Structured output in the form of programming code, used for technical or developer tasks.
Example:def greet(name): return f"Hello, {name}!"
- Bullet Points: A concise way of presenting information, typically used for summarizing key points.
Example:- Quick processing
- Cost-effective
- High accuracy
- CSV (Comma-Separated Values): A simple text format used for representing tabular data where each value is separated by a comma.
Example:Name, Age, Occupation Alice, 30, Engineer Bob, 25, Designer
- Markdown: A lightweight markup language used to format text in plain text documents, commonly used for documentation.
Example:# This is a heading - Item 1 - Item 2
- XML (Extensible Markup Language): A format used for storing and transporting data in a hierarchical structure, often used for configuration or data exchange.
Example:<person> <name>Alice</name> <age>30</age> <occupation>Engineer</occupation> </person>
- Natural Language: Simple conversational text or explanations, often used when the model is expected to output answers in human-readable form.
Example: “The Eiffel Tower is located in Paris, France.”
Overfitting: When a machine learning model performs very well on its training data but fails to generalize well to new, unseen data. Models need to avoid overfitting.
P
Parameter: The internal variables or “knobs” which machine learning models learn from training data in order to make predictions and decisions.
PEFT (Parameter-Efficient Fine-Tuning): A technique for adapting large pretrained models by adjusting only a small subset of their parameters instead of retraining the entire model. PEFT makes fine-tuning faster, cheaper, and more practical for specialized tasks without needing massive computing resources.
Predicted Distribution: The probability distribution over all possible tokens or outcomes that a model generates as a response to a given input. It represents the likelihood of each token being the next in the sequence based on the model’s understanding of the context. Example: When generating the next word in a sentence, the model might output a predicted distribution where “cat” has a probability of 0.6, “dog” has 0.3, and “fish” has 0.1, indicating how likely each word is to come next.
Predicted Token Probabilities: The likelihood values assigned by a language model to each possible token (word, character, etc.) as the next step in text generation. These probabilities represent how confident the model is about each token being the correct or most appropriate choice based on the context of the input. Example: In a sentence generation task, if the model predicts “cat” with a probability of 0.6 and “dog” with 0.4, it means the model is more confident that “cat” is the next word.
Prefix Caching: A technique used to speed up language model inference by storing and reusing the results of previously processed token sequences (prefixes). Instead of recalculating outputs from scratch each time, the model can quickly retrieve cached results for parts of the input that have already been processed.
Pre-training: The initial phase where a machine learning model learns general patterns from a large dataset before being fine-tuned for a specific task. It builds foundational knowledge, allowing the model to adapt faster and perform better on specialized tasks.
Prompt engineering: The crafting of text prompts to get the best results from large language models like GPT-3. More of an art than science.
Q
QLoRA (Quantized Low-Rank Adaptation): A variant of LoRA that combines low-rank adaptation with quantization, reducing the memory and computational requirements even further. By quantizing the low-rank matrices, QLoRA allows fine-tuning large models with even fewer resources while maintaining performance.
Question Prompt: A type of prompt where the user poses a direct question to the AI model to elicit a specific answer, explanation, or action. Question prompts help focus the model’s response on providing information, solving a problem, or offering an opinion based on the question asked.
Example: “What are the main causes of climate change?”
Types of Question Prompts:
Comparative Questions: Request a comparison between two or more things.
Example: “Which is more fuel-efficient, electric cars or hybrid cars?”
Open-ended Questions: Encourage detailed, thoughtful responses.
Example: “How can renewable energy impact global economies?”
Closed-ended Questions: Seek short, specific, or yes/no answers.
Example: “Is the Earth the third planet from the Sun?”
Clarifying Questions: Ask for more information or explanation.
Example: “Can you explain what you mean by ‘energy efficiency’?”
R
ReAct: A framework used in reinforcement learning and language models that combines reasoning and action. ReAct encourages the model to perform reasoning steps before taking an action, which helps it make more informed and effective decisions, especially in complex tasks or problem-solving scenarios.
Regularization: Techniques used to reduce overfitting by adding constraints or penalties to a machine learning model’s complexity.
Repetition Loop Bug: An issue that occurs during text generation where the model continuously repeats a specific sequence or token, leading to an endless loop or a highly repetitive output. This bug typically arises from the model’s failure to properly handle token selection, often caused by the improper tuning of parameters like temperature or the model’s inability to escape a local probability peak. Example: In a chatbot, a repetition loop bug might cause the model to repeatedly say, “Hello, how can I help you?” over and over without progressing the conversation.
Retrieval Augmented Generation (RAG): Retrieval-Augmented Generation (RAG) is an innovative approach in the field of natural language processing (NLP) that combines two powerful techniques: retrieval-based methods and generative models. In essence, RAG systems leverage external knowledge sources—such as databases, documents, or the web—along with generative models like GPT (Generative Pretrained Transformers) to improve the accuracy, relevance, and diversity of generated content.
Reinforcement learning: An AI technique where models learn through trial and error and positive/negative feedback without labeled training data. Promising for advancing generative AI.
Reinforcement Learning Algorithms: A class of machine learning methods where an agent learns by interacting with an environment, receiving rewards or penalties based on its actions. The goal is to discover strategies that maximize cumulative rewards over time.
RLAIF (Reinforcement Learning with AI Feedback): A training method where an AI model improves its behavior based on feedback generated by another AI, rather than relying solely on human evaluations. RLAIF helps scale up reinforcement learning processes by reducing the need for constant human supervision.
Role Prompting: A technique where a model is assigned a specific role or persona within the prompt to guide its behavior and response style. By setting the role explicitly, the model can tailor its output to fit that role, whether it’s as a teacher, assistant, writer, or any other defined character.
Example: A role prompt might be: “You are a professional travel guide. Please recommend a vacation destination for someone interested in history and culture.” This helps the model respond with relevant advice aligned with the role of a travel guide.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): A set of metrics used to evaluate the quality of text generation by comparing the model’s output to human-written reference texts. ROUGE measures overlaps in n-grams, word sequences, and word pairs, helping assess how well the generated text captures the key ideas of the original.
S
Sampling Controls: Parameters used in machine learning and natural language processing models to manage the randomness and diversity of generated outputs. Common sampling controls include temperature, top-k sampling, and top-p (nucleus) sampling, which affect the probability distribution of token selection, thus influencing how predictable or creative a model’s responses are.
Self-Consistency: A technique used in prompting or model evaluation where multiple reasoning paths or outputs are generated for the same input, and the final answer is selected based on the most common or consistent result. This approach helps improve reliability by reducing the chance of errors from a single, possibly flawed, reasoning chain.
Example: In solving a math problem, a model might generate five different chains of thought; if four of them conclude the answer is “12,” self-consistency would choose “12” as the final answer.
Soft Prompt: A learnable set of embeddings (vectors) that guide a language model’s behavior without changing its original parameters. Unlike traditional text prompts, soft prompts are optimized during training and exist purely in the model’s internal representation space.
Step-back Prompting: A prompting technique where the model is asked to first reflect or reason about a problem before attempting to answer it directly. This approach encourages the model to “step back,” think through the task logically, and produce more accurate and thoughtful responses.
Example: Instead of asking a model, “What’s the answer to this math problem?” you might prompt it with, “Before solving, explain how you would approach this math problem step-by-step,” helping it reason before giving the final answer.
Step-by-Step Reasoning: A prompting technique where a model is encouraged to break down a problem into smaller logical steps before arriving at a final answer. This method helps improve the accuracy and reliability of complex outputs by mimicking human-like problem-solving processes.
Example: Instead of immediately answering “What is 24 divided by 6 plus 3?”, a step-by-step reasoning prompt would guide the model to first divide 24 by 6 (getting 4) and then add 3 to reach the final answer of 7.
Supervised learning: AI algorithms trained on labeled datasets mapping inputs to desired outputs. Supervision makes models more specialized.
Synthetic Data: Artificially generated data that mimics real-world data but is created through algorithms, simulations, or models instead of being collected from real events or users. Synthetic data is often used to augment training datasets, protect privacy, or simulate rare scenarios in machine learning.
Example: Creating a set of fake customer reviews using a language model to train a sentiment analysis system without using real customer data.
System Prompting: A technique where a model is provided with explicit instructions or constraints in the form of a prompt before it begins generating responses. System prompts are typically used to guide the model’s behavior, ensuring that the output aligns with specific requirements or instructions.
Example: In a conversation with a language model, a system prompt might be something like, “You are a helpful assistant. Please respond concisely and professionally,” to set the tone and behavior for the model’s responses.
T
Temperature: A setting that controls the randomness of a language model’s output during text generation. Lower temperatures make the model’s responses more focused and predictable, while higher temperatures make them more creative and diverse.
Text Summarization: The process of reducing a large body of text to a shorter version that preserves the key ideas and meaning. It can be done using either extractive methods (selecting key sentences or phrases) or abstractive methods (generating new sentences to summarize the content in a more natural way).
Thought-Action Loop: A continuous cycle where a model generates a thought or decision (e.g., reasoning about a problem), takes an action based on that thought (e.g., generating a response or solving a task), and then evaluates the result. This iterative process allows the model to refine its output and adapt its reasoning through repeated cycles of thinking and acting.
Example: In a conversational agent, the model might think through a question (“What is the capital of France?”), generate a response (“Paris”), and then check if the response makes sense before adjusting or expanding on it for clarity or further accuracy.
Throughput: A measure of how much data or how many tasks a system can process in a given amount of time. In AI and machine learning, higher throughput means a model can handle more inferences, training steps, or data points efficiently.
Tiebreaking: A technique used in machine learning and natural language processing to resolve situations where multiple tokens or options have the same probability or score. Tiebreaking ensures that a single output is selected by applying additional rules or randomness to choose between equally likely candidates. Example: In text generation, if two tokens have the same highest probability, a tiebreaking rule might randomly select one or prefer a token based on additional factors like context or frequency.
Tokenization: Splitting text into smaller parts like words, phrases or sentences called tokens when processing natural language with AI.
Token Limit: The maximum number of tokens (words, subwords, characters, or other units) that a model can process in a single input or output sequence. Token limits are important because models have a fixed capacity for handling data, and exceeding the limit can cause truncation, loss of information, or errors. Example: If a model has a token limit of 512 tokens, any input text that exceeds this number may be truncated, meaning only the first 512 tokens are processed, and the remaining text is ignored.
Top-k Sampling: A method for controlling randomness in text generation by limiting the selection of the next token to the top k most likely options. Instead of considering the entire vocabulary, the model only chooses from the top k candidates, which helps balance between creativity and coherence in generated text.
Top-p Sampling: Also known as nucleus sampling, this method selects the next token based on a dynamic probability threshold. Instead of choosing from a fixed number of top candidates (like top-k), the model considers the smallest set of tokens whose cumulative probability mass is greater than or equal to p. This allows for more flexible and diverse outputs while maintaining coherence.
Traditional Prompt: A straightforward input or instruction given to a model without the use of advanced techniques like few-shot, chain-of-thought, or role prompting. In a traditional prompt, the model is simply asked to perform a task directly, often relying on its general training to understand and respond.
Example: A traditional prompt might be, “Write a short story about a dragon,” without providing examples, step-by-step reasoning, or assigning a role.
Transformer: A neural network architecture particularly effective for natural language processing. First introduced in 2017.
Transfer learning: A technique where a model pretrained on one machine learning task is reused as the starting point for a related task. Enables training complex AI with less data.
Tree of Thoughts: A reasoning framework where a model explores multiple possible paths of thinking (like branches of a tree) before selecting the best solution. Instead of following just one linear chain of thought, the model considers several options at each decision point, allowing for deeper exploration and better problem-solving.
Example: When planning a story, a model might first branch into different plot ideas (adventure, mystery, romance), then for each, branch into possible character choices, and so on, ultimately picking the most promising storyline after evaluating several paths.
Turing test: A test conceived by Alan Turing to assess whether a machine can exhibit intelligent behavior indistinguishable from a human. Smart chatbots approach this.
U
Underfitting: When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and new data.
Unsupervised learning: AI models that learn patterns from unlabeled, unclassified data. Key technique that enabled development of generative AI.
V
Variables in Prompts: Placeholders or dynamic elements within a prompt that can be filled with different values at runtime. Variables make prompts more flexible and reusable by allowing the same structure to adapt to different inputs or contexts.
Example: In the prompt, “Summarize the article titled [Article_Title],” the [Article_Title] is a variable that can be replaced with the title of any specific article.
Variational autoencoder (VAE): A deep learning technique used to generate new content like images. Important component of models like DALL-E.
Here’s a helpful list of verbs commonly used to describe actions in prompts, especially when designing instructions for AI models:
Verbs for Prompt Actions
Here’s a helpful list of verbs commonly used to describe actions in prompts, especially when designing instructions for AI models:
- Generate — “Generate a list of ideas.”
- Summarize — “Summarize this article in three sentences.”
- Classify — “Classify the following emails as spam or not spam.”
- Translate — “Translate this paragraph into Spanish.”
- Rewrite — “Rewrite this sentence to sound more professional.”
- Explain — “Explain the concept of gravity to a 10-year-old.”
- Compare — “Compare the advantages of solar and wind energy.”
- List — “List five reasons to visit Japan.”
- Analyze — “Analyze the tone of the following review.”
- Expand — “Expand this idea into a full paragraph.”
- Correct — “Correct any grammatical errors in this text.”
- Suggest — “Suggest a creative title for this story.”
- Predict — “Predict the outcome of the next election based on current data.”
- Design — “Design a workout plan for beginners.”
- Outline — “Outline the steps needed to start a business.”
- Critique — “Critique this essay for clarity and coherence.”
- Summon — “Summon historical facts about ancient Egypt.”
- Emulate — “Emulate the writing style of Ernest Hemingway.”
- Optimize — “Optimize this code for better performance.”
Vertex AI: A managed machine learning platform from Google Cloud that provides tools for building, deploying, and scaling AI models. Vertex AI integrates various Google Cloud services and AI tools to streamline the entire machine learning lifecycle, from data preparation to model training and deployment.
W
Weight: A numerical value in a machine learning model that determines the strength or importance of a connection between two elements (like neurons in a neural network). During training, weights are adjusted to help the model make better predictions.
Word embedding: Representing words as numeric vectors that encode semantic meaning based on context. Allows NLP models to understand language.
Z
Zero-shot Prompt: A type of input given to a model in which it is expected to perform a task without any prior examples or specific training on that task. In zero-shot learning, the model relies on its pre-existing knowledge and generalization abilities to generate a response or solve a problem. Example: Asking a language model, “Translate this sentence into French,” without providing any example translations is a zero-shot prompt. The model applies its understanding of languages to perform the task without prior examples.