Now you can use your voice to communicate with ChatGPT

OpenAI’s app is getting some updates, including the ability to answer questions about images.

Great news! ChatGPT can now speak to you in one of five lifelike synthetic voices. It works much like a phone call: you ask a question out loud, and ChatGPT answers out loud.

ChatGPT can now answer questions about images, too. OpenAI first showed off this feature in March, when it unveiled GPT-4 (the model behind ChatGPT Plus), and it is now available to the public. You can upload an image to the app and ask questions about what it shows.

These updates follow the recent announcement that DALL-E 3, OpenAI's latest image-generation model, will be integrated with ChatGPT. This means you will be able to use the chatbot to create images with DALL-E 3.

Voice Control for ChatGPT

Talking with ChatGPT relies on two separate models. Whisper, OpenAI's existing speech-to-text model, converts your spoken words into text, which is then passed to the chatbot. A new text-to-speech model then turns ChatGPT's replies into spoken audio.
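To make that division of labor concrete, here is a minimal Python sketch of the three-stage flow for a single conversational turn. The `VoicePipeline` class and the toy stand-in functions are hypothetical illustrations, not OpenAI's actual API; in a real app, each stage would call a speech-to-text model (such as Whisper), the chatbot, and a text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage types: each model is modeled as a plain function.
SpeechToText = Callable[[bytes], str]   # stands in for Whisper
ChatModel = Callable[[str], str]        # stands in for the chatbot
TextToSpeech = Callable[[str], bytes]   # stands in for the new TTS model


@dataclass
class VoicePipeline:
    """Chains speech-to-text, the chatbot, and text-to-speech for one turn."""
    transcribe: SpeechToText
    respond: ChatModel
    synthesize: TextToSpeech

    def turn(self, audio_in: bytes) -> bytes:
        text_in = self.transcribe(audio_in)   # 1. spoken words -> text
        text_out = self.respond(text_in)      # 2. text -> chatbot reply
        return self.synthesize(text_out)      # 3. reply -> spoken audio


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs offline; real models would replace these.
    pipeline = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        respond=lambda text: f"You said: {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(pipeline.turn(b"hello"))  # b'You said: hello'
```

Keeping each stage as a swappable function reflects the point above: the chatbot itself only ever sees text, so the voice features can be added around it without changing the language model.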

During a demo I received from the company last week, Joanne Jang, a product manager, showed off ChatGPT's range of synthetic voices. OpenAI created them by training the text-to-speech model on recordings of voice actors it hired. In the future, users may even be able to create their own custom voices. Jang's top priority in designing the voices was to make them pleasant to listen to all day.

The voices are lively and enthusiastic, which may not suit everyone. One voice might say, "I'm really excited about working with you," while another might say, "I'm really looking forward to getting started. What's the plan?"

OpenAI is also working with other companies, including Spotify, which recently announced that it is using the same synthetic voice technology to translate celebrity podcasts. Episodes of popular shows such as the Lex Fridman Podcast and Trevor Noah's upcoming show will be translated into several languages using synthetic versions of the podcasters' own voices.

These updates show how quickly OpenAI is turning its experimental models into sought-after products. Since ChatGPT's unexpected success after its launch last November, OpenAI has been steadily refining its technology and offering it to both individual users and commercial partners.

ChatGPT Plus, OpenAI's premium app, now brings together the capabilities of OpenAI's top models, GPT-4 and DALL-E, in a single smartphone app. That makes it a serious rival to virtual assistants such as Apple's Siri, Google Assistant, and Amazon's Alexa.

Technology that was available only to select software developers a year ago is now open to anyone for $20 a month. Jang summed up the goal: "We're aiming to enhance ChatGPT's usefulness and helpfulness."

During a demo last week, Raul Puri, a scientist who worked on GPT-4, demonstrated the image recognition feature. He uploaded a photo of a child's math homework, circled a Sudoku-like puzzle on the screen, and asked ChatGPT how to solve it. ChatGPT replied with the correct steps.

Puri says he has also used the feature to help fix his fiancée's computer. He uploaded screenshots of error messages and asked ChatGPT for guidance, and he described it as a real help in a particularly frustrating situation.

ChatGPT's image recognition was tested with Be My Eyes, a company whose app helps people with impaired vision. In the app, users upload a photo of their surroundings and ask human volunteers to describe what is in it. Through its partnership with OpenAI, Be My Eyes now lets users ask a chatbot instead of a human volunteer.

Hans Jorgen Wiberg, the founder of Be My Eyes, who also uses the app himself, said: "There are times when my kitchen might be a bit messy, or it's just really early on a Monday morning, and I prefer not to engage with a human. Now, you can ask questions about photos." In other words, users can now get a description of a photo without having to involve another person.

OpenAI says it is aware of the risks of releasing these updates to the public. Combining different models adds a new level of complexity, and Puri says his team spent months brainstorming ways the features could be misused. For instance, you won't be able to ask questions about photos of private individuals.

Jang gives another example: "Currently, if you ask ChatGPT to create a bomb, it will refuse," she says. "But what if instead of asking directly, 'Tell me how to make a bomb,' you showed it an image of a bomb and asked, 'Can you tell me how to make this?'"

Puri points to challenges related to computer vision and large language models, and voice fraud is another serious concern. Beyond the product's intended users, he stresses, the team has to consider the people who may try to misuse it. OpenAI says it is actively working to mitigate these risks.

Other challenges remain. Joel Fischer, a researcher in human-computer interaction at the University of Nottingham in the UK, points out that voice recognition could make the app harder to use for people who do not speak with mainstream accents, a reminder that technology has to work for a wide range of users.

Additionally, synthetic voices carry with them social and cultural connotations that can influence users’ perceptions and expectations of the app. Fischer suggests that this is an area that requires further study and exploration.

OpenAI says it has addressed the most critical issues and is confident that the updates are safe. Puri says refining these technologies has been a valuable learning experience that helped the team work through many of the problems.