The term “Generative” in “Generative Models” refers to the ability of these models to generate new data that is similar to the training data they were exposed to. In the context of machine learning, generative models aim to capture the underlying patterns and structures of a dataset in order to generate new samples that resemble the original data.
Generative models can be broadly categorized into two types: explicit and implicit. Explicit generative models, such as Variational Autoencoders (VAEs) and autoregressive models, define a probability density for the data that can be evaluated (exactly or approximately). Implicit generative models, such as Generative Adversarial Networks (GANs), learn to produce samples from the data distribution without ever writing down an explicit density.
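To make the explicit/implicit distinction concrete, here is a minimal sketch with made-up probabilities: the explicit model stores a density it can both evaluate and sample from, while the implicit model only exposes a black-box sampler, like a GAN generator transforming noise.

```python
import random

# Explicit model (hypothetical toy example): the density over a tiny
# vocabulary is written down, so we can query p(x) directly.
explicit_p = {"a": 0.5, "b": 0.3, "c": 0.2}

def explicit_density(x):
    """Evaluate p(x) -- only possible because the density is explicit."""
    return explicit_p[x]

def explicit_sample(rng):
    """Sample from the explicit distribution."""
    outcomes, weights = zip(*explicit_p.items())
    return rng.choices(outcomes, weights=weights, k=1)[0]

# Implicit model: a black-box sampler, standing in for a GAN-style
# generator G(z). It can produce samples, but there is no density
# function available to query.
def implicit_sample(rng):
    noise = rng.random()                 # latent noise z
    return "a" if noise < 0.5 else "b"  # generator output G(z)

rng = random.Random(0)
print(explicit_density("a"))                # 0.5 -- density query works
print(explicit_sample(rng) in explicit_p)   # True
print(implicit_sample(rng) in {"a", "b"})   # True
```

The asymmetry is the point: asking `implicit_sample` for the probability of a particular outcome has no direct answer, which is exactly the situation with GANs.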
While these models are called “generative,” it’s essential to note that they don’t possess true creativity or understanding. They learn statistical patterns and correlations present in the training data and use that knowledge to generate new samples. The term “generative” implies the ability to produce new examples, but it doesn’t imply creativity, consciousness, or true understanding.
Generative models are considered generative in the sense that they can generate new data instances resembling the training data, but their generation is based on learned statistical patterns rather than a true understanding of the data.
The Confusing Terminology of “Generative” vs “Generation” Models in AI
There is some confusion and debate in AI around the terminology used for models that generate outputs such as text and images. Specifically, is it accurate to call them “generative” models?
The traditional definition of a generative model is that it models the joint probability distribution p(x,y). In contrast, a discriminative model models the conditional probability p(y|x). Models used for generation tasks, such as text and image generation, condition on some input, so strictly speaking they fit the definition of a discriminative model.
I think the terminology arose because:
- These models are often trained with generative-style objectives: maximum likelihood over the data distribution, or approximations to distribution matching (e.g. the adversarial objective in GANs)
- Even when conditioning on inputs, their key use case is to generate/sample realistic outputs such as text and images, and that ability to “generate” led to them being termed “generative models”
But terminology-wise, “conditional generation models” or simply “generation models” would be more precise than calling them “generative.”
So in summary:
- Generative model: Models joint p(x,y)
- Discriminative model: Models conditional p(y|x)
- Generation model: Discriminative model that conditions on input and generates output samples
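A tiny worked example makes the joint/conditional split in the summary above concrete. The numbers below are invented for illustration: a joint distribution p(x,y) over a weather input x and an activity label y. A model that knows the full joint can recover any marginal or conditional; a discriminative model learns only p(y|x) and cannot answer questions about p(x) or p(x,y).

```python
# Hypothetical joint distribution p(x, y) over x in {sunny, rainy}
# and y in {walk, bus}; the four probabilities sum to 1.
joint = {
    ("sunny", "walk"): 0.40,
    ("sunny", "bus"):  0.10,
    ("rainy", "walk"): 0.05,
    ("rainy", "bus"):  0.45,
}

def p_x(x):
    """Marginal p(x), recoverable only because we have the joint."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y_given_x(y, x):
    """Conditional p(y | x) = p(x, y) / p(x)."""
    return joint[(x, y)] / p_x(x)

# A discriminative model would learn p_y_given_x directly and would
# have no access to p_x or to the joint table itself.
print(p_x("sunny"))                  # 0.5
print(p_y_given_x("walk", "sunny"))  # 0.8
```

This is why the joint is the stronger object: from p(x,y) you can derive p(y|x), but not the other way around without also knowing p(x).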
The Difference Between “Generative” and “Generation” Models
Originally, in machine learning the term “generative model” had a precise, technical meaning: a model that could learn and recreate the full joint distribution of the data. For example, a generative model of images would learn the distribution over all images that look real, and could sample entirely new ones from it.
On the flip side, a “discriminative model” learned the conditional distribution, predicting outputs given inputs. So a model fed an image caption as input could produce a matching picture as output, all while modeling only a conditional.
Here’s where things get muddy. The latest wave of AI systems, chatbots like ChatGPT and image generators like DALL-E, is discriminative by nature: you give them an input such as a prompt, and they output a tailored response by modeling the conditional distribution.
But because these systems are so adept at producing remarkably human-like text, images etc., they got labeled as “generative models.” Their generation capabilities led to associating them with that term.
However, technically it’s a misnomer: they are discriminative models doing generation. Hence the debate over calling them “generation models” instead.
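The idea of a “discriminative model doing generation” can be sketched in a few lines. The toy corpus below is invented: a bigram model learns only the conditional p(next word | previous word), which is discriminative in the technical sense, yet it generates text by sampling that conditional one token at a time, exactly the loop a chatbot runs at a much larger scale.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus for estimating bigram transitions.
corpus = "the cat sat on the mat the cat ran".split()

# Count transitions to estimate p(next | prev).
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev, rng):
    """Sample one token from the conditional p(next | prev)."""
    nxts = list(counts[prev])
    weights = [counts[prev][n] for n in nxts]
    return rng.choices(nxts, weights=weights, k=1)[0]

def generate(prompt, n_tokens, rng):
    """Generate by repeatedly sampling the conditional distribution."""
    out = [prompt]
    for _ in range(n_tokens):
        if out[-1] not in counts:  # no observed continuation: stop
            break
        out.append(sample_next(out[-1], rng))
    return " ".join(out)

rng = random.Random(42)
print(generate("the", 4, rng))
```

Nothing here models the joint distribution over whole sentences and prompts together; the model only ever evaluates a conditional, which is the article’s point about systems like ChatGPT.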
In the end, the terminology has drifted from its original technical meaning in the AI community. The popularity of systems that can “generate” content has led to that term becoming shorthand for any model doing generation, even if they are not modeling the complete joint distribution.
So while experts may cringe at the loose use of these terms, the naming has stuck. But as AI literacy increases, it is worth revisiting this terminology to prevent confusion among new practitioners.