Meta has just dropped a hefty 90-page paper on its latest creation, Meta Movie Gen, a “cast of media foundation models.” Though still locked in the lab, Meta announced its ambition to combine audio, image, video, and text into one seamless generative AI model.
Today’s AI can already generate videos from text or images, sync soundtracks, and even align lip movements.
But generating sound effects from words alone that match a video’s contents and atmosphere? That’s been elusive.
Meta Movie Gen claims to do all that, and more.
You can edit an existing video by typing a simple prompt, add an audio element to a generated video, or transform an everyday selfie into an animated scene. Such an integration of all content modalities reminds us of the title of the film Everything Everywhere All at Once.
Trained on vast multimedia datasets, Meta Movie Gen cranked up the heat on competitors like Runway, Pika Labs, Luma AI, and OpenAI’s Sora, all racing to redefine video generation.
The Meta Movie Gen research team proposed new approaches to foundation models, including joint training on images and videos, spatio-temporal compression in the latent space, rich text embeddings, bi-directional attention in place of causal attention, and efficient inference via temporal tiling.
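Two of these ideas can be illustrated with toy sketches. These are purely illustrative assumptions, not Meta's actual code: a bidirectional attention mask lets every frame attend to every other frame, unlike a causal mask that only looks backward, and temporal tiling splits a long frame sequence into overlapping chunks so it can be decoded piece by piece.

```python
import numpy as np

def causal_mask(t: int) -> np.ndarray:
    """Lower-triangular mask: position i attends only to positions <= i."""
    return np.tril(np.ones((t, t), dtype=bool))

def bidirectional_mask(t: int) -> np.ndarray:
    """Full mask: every position attends to every other position."""
    return np.ones((t, t), dtype=bool)

def temporal_tiles(num_frames: int, tile: int, overlap: int) -> list[list[int]]:
    """Split frame indices into overlapping tiles (hypothetical helper)
    so a long video can be decoded chunk by chunk instead of all at once."""
    step = tile - overlap
    tiles, start = [], 0
    while start < num_frames:
        tiles.append(list(range(start, min(start + tile, num_frames))))
        if start + tile >= num_frames:
            break
        start += step
    return tiles

# A 4-position causal mask permits 10 attended pairs; bidirectional permits all 16.
print(int(causal_mask(4).sum()), int(bidirectional_mask(4).sum()))  # 10 16
# 10 frames decoded in tiles of 4 with a 1-frame overlap -> 3 tiles.
print(temporal_tiles(10, tile=4, overlap=1))
```

The overlap between tiles gives adjacent chunks shared frames, which is one simple way to keep motion consistent across chunk boundaries.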
Though Meta asserts that its models are trained on “licensed or publicly available data,” there’s still the lingering question: what proprietary troves of video and image data does Meta have access to that others don’t?
Mark Zuckerberg confidently declared on Instagram that by 2025, Instagram users will be wielding this new app.
Likely, social media will experience the first wave of change.
Picture a travel post capturing the Northern Lights, but you’re dressed for a Hawaiian beach instead of bundled in a winter coat—all changed with a single text input.
Or imagine promoting your bakery with a cozy video of customers savoring cupcakes, without filming a single scene.
One can even fabricate a career highlight—delivering a keynote speech or accepting a prestigious award.
Soon, these reality-“reinventing” digital manipulations might not be seen as “deepfakes.” Instead, they’ll be dressed up as creative advertising and polished personal branding.
As Meta and others scale up their computing power, we’ll see media models spinning out longer, more complex content.
Competitors will scramble to craft their own versions, reshaping industries from film to animation to gaming.
Still, even Meta’s sample videos show where the technology stumbles.
The koala surfing, the fire-dancer twirling, the child flying a kite—everything looks superficially convincing.
The lighting feels unnatural, the shading is heavy-handed, the movement is stiff, and the joy of play disappears into a homogenized portrayal of a child running with their back turned to us.
Faces are locked in frozen smiles; textures are too smooth and too shiny.
With AI videos flooding social media, reality will soon be edited into a surreal perfection—flawless fruits, wrinkle-free faces, overly groomed dogs.
Imagine a generation growing up in these machine-made fantasies, where the messiness of life is polished away.
Why read Romeo and Juliet when an AI can whip up a custom video adaptation?
Why learn to draw when AI can animate your vision with a single textual command?
Why practice piano when AI can compose a flawless performance, complete with your likeness?
What’s missed in the pixels?
Even as AI advances, it struggles to replicate the nuance of real-life experience, the subtle shifts of emotion, and the internal tensions of human struggle.
Meta Movie Gen, much like the capitalist production and reproduction of media that the German philosopher Walter Benjamin critiqued in his influential essay “The Work of Art in the Age of Mechanical Reproduction,” accelerates the decay of the “aura” of our lives and artistic expressions.
As Benjamin argued, capital-controlled mass reproduction strips art of its authenticity and uniqueness. Meta’s AI-generated videos will offer easily customizable, endlessly replicable spectacles, turning personal memories and identities into commodified illusions.
The result is a world where the original, genuine experience fades in favor of reproductions crafted for public consumption.
But the issue runs deeper. Benjamin’s critique of the film industry’s exploitation of the masses is mirrored here.
Media foundation models, controlled by profit-driven corporations, do not just create—they manipulate. The allure of personalized, AI-generated realities is a product designed to benefit tech companies and their shareholders, drawing users into a cycle of producing and consuming polished, curated versions of themselves.
This commodification of identity, under the guise of creativity, exploits genuine self-expression.
Meta’s grip on these models may mean that users lose control over their own representations, further alienating individuals from their authentic lives.
In a world awash with generated content, it’s crucial we hold onto what makes us human—our voices, our time, our presence interacting with others in the physical world.
Because no matter how vivid these digital creations may seem, they’ll never replace the authenticity of being there for our loved ones.