AI search and synthetic voices in ads and other creative content may be the future of advertising, but OpenAI has revealed some oddities in its GPT-4o technology, such as the ability to clone a voice and finish a user's thoughts and sentences.
OpenAI last week published a report detailing "key areas of risk" for the company's latest large language model, GPT-4o, and how its executives hope to mitigate them. Many of the concerns have since been addressed, but left unchecked they could have enabled deepfakes, as well as copyright infringement and complications with licensing deals.
“Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s Advanced Voice Mode,” OpenAI wrote in a report. “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice.”
The technology can also imitate voices and produce "nonverbal vocalizations" and sound effects such as erotic moans, violent screams, and gunshots. OpenAI updated certain text-based filters to detect and block audio containing music, and the limited alpha of ChatGPT's Advanced Voice Mode was instructed not to sing.
The audio clip OpenAI shared in the blog post demonstrates how the technology can continue the sentence in the same voice it began with after the word “no.”
It’s an eerie example of how advertisers and creators could redirect content without the involvement of its original author.
Look back to the fiasco in July, when Elon Musk shared a video spoofing Vice President Kamala Harris without identifying it as a parody. Christopher Kohls, a YouTuber better known online as Mr. Reagan, created the parody and posted it to his YouTube channel before Musk shared it.
Ironically, when asked to name the most advanced GPT model available today, Google Gemini states that "GPT-4o is considered the most advanced GPT model available."
Voice generation at OpenAI can occur in other situations, such as generating voices for ChatGPT’s Advanced Voice Mode. During testing, OpenAI noticed rare instances where the model unintentionally generated output that copied or emulated the user’s voice.
OpenAI developers addressed voice generation-related risks by allowing only preset voices, created in collaboration with voice actors, to be used. Those selected voices were used to post-train the audio model.
Standalone output classifiers were built to detect whether GPT-4o's output uses a voice that differs from OpenAI's approved list. The classifiers run on the audio stream during generation and block the output if the speaker doesn't match the predetermined voice.
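OpenAI has not published how its classifier is implemented. Purely as an illustration of the general idea, here is a minimal sketch in Python that compares a hypothetical speaker embedding for each streamed audio chunk against an approved preset-voice embedding using cosine similarity, and stops the stream at the first mismatch. All names, embeddings, and threshold values here are invented for the example, not OpenAI's actual system.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class VoiceOutputClassifier:
    """Toy streaming classifier: checks each chunk's speaker embedding
    against one approved preset-voice embedding (hypothetical setup)."""

    def __init__(self, preset_embedding, threshold=0.85):
        self.preset = np.asarray(preset_embedding, dtype=float)
        self.threshold = threshold  # invented value, not OpenAI's

    def matches_preset(self, chunk_embedding):
        # True if the chunk still sounds like the approved voice.
        emb = np.asarray(chunk_embedding, dtype=float)
        return cosine_similarity(emb, self.preset) >= self.threshold

    def stream(self, chunks):
        # chunks: iterable of (speaker_embedding, audio_chunk) pairs.
        # Yield audio until the speaker deviates, then block (stop).
        for embedding, audio in chunks:
            if not self.matches_preset(embedding):
                return  # block output: voice no longer matches the preset
            yield audio

# Example: the second chunk's embedding drifts away from the preset voice,
# so only the first chunk is passed through.
classifier = VoiceOutputClassifier(preset_embedding=[1.0, 0.0, 0.0])
chunks = [([0.99, 0.05, 0.0], "chunk1"), ([0.0, 1.0, 0.0], "chunk2")]
delivered = list(classifier.stream(chunks))
```

In this toy run, `delivered` contains only `"chunk1"`: the second chunk's embedding no longer matches the approved voice, so the stream is cut off rather than allowing the deviating audio out.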
Toward the end of the report, OpenAI notes the risk of unintentional voice replication remains "minimal."
“Our system currently catches 100% of meaningful deviations from the system voice based on our internal evaluations, which includes samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples,” the company wrote in a post.