Inference

Using a trained AI model to generate outputs

What is Inference?

Inference is the process of running new inputs through a trained AI model to generate predictions, text, classifications, or other outputs. Training builds the model; inference uses it. Every time you send a prompt to ChatGPT, that's inference. Understanding inference matters because it's where costs accumulate: every query has a price, so optimizing inference speed and efficiency is critical for production AI applications, especially at scale.

In plain words

"Think of it like asking a trained chef to cook — the training is done, now it's live performance."

How it works

Key takeaways

Running the model on new inputs = inference
This is where your per-query cost lives
Latency and throughput are key metrics
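
To make the takeaways above concrete, here is a minimal sketch of how latency, throughput, and per-query cost might be measured. Everything named here is an assumption for illustration: `fake_model` is a hypothetical stand-in for a real trained model, and `query_cost` assumes token-based pricing with made-up prices, not any provider's actual rates.

```python
import time


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a trained model (a real one runs a forward pass)."""
    return prompt.upper()


def measure(prompts, model=fake_model):
    """Return (mean latency in seconds per query, throughput in queries per second)."""
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        model(prompt)  # one inference call
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    mean_latency = sum(latencies) / len(latencies)
    throughput = len(prompts) / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput


def query_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimated cost of one query, assuming prices are quoted per 1,000 tokens."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


if __name__ == "__main__":
    latency, throughput = measure(["hello", "what is inference?"] * 100)
    print(f"mean latency: {latency:.6f} s, throughput: {throughput:.0f} queries/s")
    # Assumed prices, for illustration only
    print(f"cost per query: {query_cost(1000, 500, 0.5, 2.0)}")
```

Latency is how long one query takes; throughput is how many queries finish per unit of time. The two are measured separately here because batching or parallel serving can raise throughput without lowering latency.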

Real-world example

Every time ChatGPT generates a response, that's inference. The model's weights are frozen — it's running a forward pass on your input. Inference is what costs money per query, and optimizing it is a whole discipline.
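
A toy sketch of "frozen weights plus a forward pass" might look like the following. It is a tiny two-class classifier, nowhere near a real LLM, and the weight values are hypothetical numbers standing in for whatever training would have produced. The point is only that inference reads the weights and never changes them.

```python
import math

# Hypothetical weights standing in for the result of training.
# Stored as tuples so they cannot be modified during inference.
WEIGHTS = ((0.5, -1.0), (-0.25, 0.75))
BIAS = (0.1, -0.1)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x):
    """One forward pass: input -> weighted sums -> class probabilities."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(WEIGHTS, BIAS)
    ]
    return softmax(scores)


def predict(x):
    """Inference: pick the most probable class for a new input."""
    probs = forward(x)
    return probs.index(max(probs))


if __name__ == "__main__":
    print(predict([1.0, 0.0]))
    print(predict([0.0, 1.0]))
```

There is no gradient computation or weight update anywhere in the sketch, which is the difference from training: each call costs only the arithmetic of the forward pass.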

Related terms

LLM (Large Language Model)

AI models that understand and generate text

Fine-Tuning

Training an AI model on your specific data

Tokens

The basic units AI uses to process text

Open Weights

AI models anyone can use and modify
