Q&A: Papercup’s Jesse Shemen on AI dubbing
Jesse Shemen, co-founder & CEO at Papercup, talks to DTVE about the company’s AI dubbing technology and the increasing demand for it.
How has AI technology changed over the years?
AI has been around virtually since the introduction of computing. The idea of creating computers that are as intelligent as humans, or more so, has always been there, and there have always been efforts to achieve it. You can argue every new technology is a form of artificial intelligence, because it’s accomplishing something that previously wasn’t possible with existing technology.
But one of the tectonic shifts that we’ve seen is the introduction of these large language models, which make the translation task much simpler for us, because you can basically feed them a lot of the context they need to translate appropriately and to generate the right translation in the target language. Before, if you used things like Google Translate, as great as they were for helping you navigate directions to a corner shop in Italy, they weren’t super effective at translating conversation, complex video or audio content, or even documents.
But large language models change the nature of how you can tackle the translation question. I think that’s been a really big shift and it has only helped us, because it has reduced the amount of human input that we need to correct an output, so it’s readily available for end viewers to consume.
How has the increasing production of foreign language content impacted the market?
The popularity of a lot of these foreign titles, which a lot of people didn’t anticipate, certainly invoked a lot of demand for dubbing, even among Americans, who were notoriously known for claiming they would never consume content dubbed into English. Then Netflix said, we’re not really sure that hypothesis is right, and exported a lot of these main titles like Money Heist and Squid Game, which became a billion dollar franchise.
I think that was very telling to the industry: there’s probably a lot more demand internationally for every piece of content we create that we’re not even uncovering. And if you can have a mechanism that allows you to test and probe the quality of that content in other territories, you can unlock vast oceans of audiences that you actually weren’t engaging with in the past.
What does Papercup currently have in the works?
We’ve signed a partnership with the World Poker Tour, which recently launched and has been exciting because it’s a foray into sports.
Papercup dubbing is at least 50% cheaper than the alternatives you’d see, though you do see a wide range of different cost buckets across the market. You have this major cost and turnaround saving because we are combining human with AI; we’re not solely reliant on humans to actually deliver the output. Our system is technology first, human second, not the reverse. So our voices are generated synthetically and then edited, annotated and optimised by humans.
Our main aim is to try and figure out how to constantly improve our modelling of human speech so that we can reflect its inherent complexity. Human expression is so complex because there are so many different ways in which even one individual can articulate the same utterance. So that’s a never-ending project, and we’re focusing on doing that across languages.
How will dubbing progress in the next five years?
I think in the next five years, we will live in a world where all videos are consumable in any language at a high quality level. Not just your Netflix and Amazon Prime Video libraries, but also live TV and sports, and web conference calls on Zoom and FaceTime. I think all forms of audio and video will be instantaneously translatable into any language but, for the first time, at a high quality level and low latency. I also still think theatrical releases, and some of the top 1% of content, will be the purview of human professional voice artists who can do even more than what a human can do in this very artistic format.