The process of running new inputs through a trained AI model to generate predictions, text, classifications, or other outputs. Training builds the model; inference uses it. Every time you send a prompt to ChatGPT, that's inference. Understanding inference matters because it's where costs accumulate: training cost is incurred when the model is built, while inference cost recurs with every request. Optimizing inference speed and efficiency is therefore critical for production AI applications, especially at scale.
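The split between training and inference can be illustrated with a minimal sketch. It assumes a toy one-variable linear model fitted by ordinary least squares; the function names `train` and `infer` and the sample data are hypothetical and chosen only for illustration. They are not drawn from any particular library or from how a large language model is actually served.

```python
def train(xs, ys):
    """Training: fit slope w and intercept b of y = w*x + b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def infer(model, x):
    """Inference: apply the already-fitted parameters to a new input."""
    w, b = model
    return w * x + b


# Training runs once, on known examples (toy data that follows y = 2x + 1).
model = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Inference runs once per new input, so its total cost grows with traffic.
predictions = [infer(model, x) for x in [4.0, 10.0]]
print(predictions)  # prints [9.0, 21.0]
```

In this sketch `train` is called a single time while `infer` is called once per request, which is why per-call inference cost tends to dominate as request volume grows.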