Olufemi Adeyemi 


Microsoft retired the enthusiastic office assistant Clippy approximately 17 years ago, but the concept of a friendly and supportive AI assistant appears to have resurfaced. The company is revamping Copilot, its text-based AI tool integrated with Windows and other applications, by introducing features such as vision, voice capabilities, and enhanced problem-solving skills, all while adopting a more “encouraging” demeanor.

“We are truly at an incredible turning point,” states Mustafa Suleyman, CEO of Microsoft AI. “AI companions can now perceive what we see, hear what we hear, and communicate in the same language we use.”

The reception of Copilot has been mixed, with some users reporting problems such as lag or unclear responses. Nevertheless, Microsoft hopes the tool can become a vital part of Windows, Office, and more. By integrating OpenAI’s models into widely used software, the company is at the forefront of exploring how AI might boost productivity in the office. Google, a major competitor, is also building AI into its office applications, including Gmail and Google Docs.

The updated Copilot will be able to hold conversations in several humanlike voices and handle interruptions and pauses smoothly. “You can interject at any point, and it can actively listen,” Suleyman explains. “That’s the essence of effective conversation.”

Suleyman further notes that Copilot has been refined to provide greater emotional support to users. “It’s on your side, supporting you, acting as your cheerleader,” he remarks. Copilot Voice will be launched today in English for users in Australia, Canada, New Zealand, the United Kingdom, and the United States, with plans to expand to additional countries soon.

Microsoft’s Clippy, a personified paper clip, became infamous for popping up in Word, often greeting users with the line, “It looks like you’re writing a letter …” The feature was widely disliked because it did not live up to its promise of humanlike intelligence, often forgetting user preferences and repeating itself. Large language models are far better at simulating human intelligence, yet their behavior can still be strange and unpredictable, which could limit how readily users embrace Copilot.

Copilot Voice will be included in the free version of Copilot for Windows, which is also accessible through a standalone mobile app and online.

Additionally, Microsoft is rolling out experimental enhancements to Copilot, available exclusively to subscribers of the $20-per-month Copilot Pro plan. An opt-in feature named Copilot Vision will enable the AI assistant to view users’ screens and respond to items they highlight with their cursor. Suleyman notes that users can point to a product and request Copilot's opinion based on online reviews.

Suleyman mentions that a frequent request involves aesthetic advice, particularly when users are browsing fashion websites and inquire about specific patterns or dress styles. He further explains that Copilot may eventually provide critiques of web pages, offering qualitative assessments tailored to a user’s interests. “It could read the entire page in an instant and then engage in a discussion about it,” he states. “For instance, you could ask, ‘Do you think this is an article I would enjoy?’ which presents a unique experience.”

Microsoft says text interactions with Copilot are retained for 18 months, although users can delete their conversations. Copilot Vision, however, will not store user inquiries; Microsoft says that data will be erased at the end of each session. The feature will work only on specific websites and will not have access to copyrighted or adult content. Microsoft says it will become available to Copilot Pro users in the US on a date yet to be announced. The company also says that no data is shared with OpenAI.

Another experimental feature, Think Deeper, enables Copilot to tackle more intricate problems using a method that resembles step-by-step reasoning. The technology is partly based on OpenAI o1, a new AI model that OpenAI introduced earlier this month. Think Deeper will be available to select Copilot Pro users in the US starting today.

These updates show Microsoft pushing to make its AI tools more capable and more appealing. They also reflect how quickly AI technology is advancing: many of the leading large language models that power chatbots can now process audio and images as well as text. Recently, OpenAI, Google, and other companies have given their models the ability to hold natural conversations in a range of humanlike voices.

Microsoft faces both stiff competition and some underlying uncertainties.

The company has reportedly invested $13 billion in OpenAI and holds a license that gives it access to OpenAI’s AI models. While OpenAI is still regarded as a frontrunner in the field, it has recently been through significant upheaval, including the resignations of CTO Mira Murati and two senior engineers involved in research. Suleyman declined to comment on the current situation at OpenAI.

Suleyman joined Microsoft in March following a $650 million licensing agreement with his startup, Inflection AI. He is a co-founder of the British AI firm DeepMind, which Google acquired in 2014. Last year, DeepMind merged with Google Brain to form Google DeepMind, which is led by another DeepMind co-founder, Demis Hassabis.

Microsoft created Copilot after the success of GitHub Copilot, a tool for programmers launched in 2021 that autocompletes blocks of code and answers programming questions.

Shane Greenstein, a Harvard Business School professor who has analyzed Microsoft’s AI strategy, says that building a practical general-purpose assistant will be harder than building a coding tool. He stresses that the company’s new experimental features must prove their value to users.

“It took five to 10 years of experimentation with web interfaces to learn how to attract more than just tech-savvy individuals to shop online,” Greenstein states. “I anticipate a similar timeframe for iteration in this area.”