The context of this story

People: Steve Jobs
Products: iPhone

Siri: intelligent personal assistant controlled by voice

Siri is one of the features that Steve Jobs had little influence on. At the time, he was struggling with health issues, and people involved with iOS and iPhone were aware of the lead Google was gaining with its voice technology. However, even Google has not gotten very far yet. It is capable of converting voice commands into text to a certain extent, and translating speech from one language to another is very impressive, but that’s about it. The service in Android is mainly hampered by problems with voice recording quality, which is not so much a problem with the system itself as with specific phones and how they handle noise suppression. When this is not very successful, the system has difficulty recognizing voice commands.

In addition, Google’s voice recognition didn’t actually do much in terms of magic or functionality. Voice input served more as a replacement for the keyboard; Android attempted to convert the dictated word into text, but it didn’t recognize its meaning. And so it couldn’t tell the subsequent program anything more than that word. Google only began to tackle text semantics around 2009 and did not make any significant progress on its search engine either. Semantic search became an internet buzzword at the turn of the decade, but apart from a few flashes in the pan, nothing really succeeded in achieving a fundamentally better state than what Google had achieved by further improving its search engine algorithms outside the semantic zone. In short, for a search engine to search well, it didn’t actually need to recognize the meaning of the word it was searching for, but rather to serve up documents that contained it and were heavily linked.

Google addressed the situation by acquiring Aardvark, a company focused on “social search,” for $50 million in February 2010. In this case, social search involves analyzing the meaning and context of the phrase entered.

At that time, Apple did not yet know that Google would not make a killing with Aardvark and would cancel the service in September 2011, moving the remnants of the disbanded development team onto routine improvements to search based on the searcher’s social circle. What Apple did already know was that it faced a long-term dispute with Google and that it was important to break away from Google’s services wherever possible, especially in search. Apple has enough sense not to take hasty steps such as replacing Google with the poorly performing Bing, but it wants to keep its options open and not become dependent on its enemy. Competing with Google head-on in web search, however, would be a very naive plan. Even Apple, with all its resources, cannot count on success in something so complex and extensive. The Google search engine is synonymous with today’s way of searching on the internet, and the chances of someone creating a web service that would fundamentally surpass it are slim. Google’s search results are currently relevant enough that users have no reason to change.

But Apple realizes that Google can be beaten in other ways. It can do what it has already done successfully several times: with the iPod, the iPhone, and soon with the iPad. How did Apple become the largest music retailer? Not by opening more CD stores with better prices or buying the fastest silver disc pressing plant, but by creating a new market, dominating it, and displacing the old market. And that’s a strategy Apple has mastered well. The key was to identify areas where such leverage could be applied: places where it was possible to establish a foothold in time, fortify and secure it, and then just shoot at the competition from a distance, from behind a wall of patents and other defensive measures.

Apple has identified semantic voice analysis, i.e., the recognition of voice commands, their conversion into meaning, and their resolution, as the key to breaking into the search market. This is something that generations of futurologists call artificial intelligence. Why? Because users need to ask a question and get an answer. It is also often more convenient for users to skip the keyboard and ask questions using their voice. Keep in mind that in the US, people spend a lot of time in cars or other means of transportation, where typing is not possible or even allowed. And it would be great if they could take care of their basic agenda there. Find a restaurant, schedule a meeting there. In other words, everything you can do with Google. But… why couldn’t you do it differently? If you have a phone, time, and no way to type, why not use your voice?

In April 2010, Apple acquired the virtually unknown company Siri, whose only visible product was the Siri Assistant program in the App Store. The amount Apple paid for Siri was not disclosed, but it is estimated to be between $100 million and $200 million. What excites experts more, however, is what Apple might need Siri for. At first glance, it’s just another app riding on attempts at voice recognition and processing.

But Siri is more than just a fairground attraction. The company was founded in 2007 by three researchers who had previously been involved in DARPA projects and university research into natural language recognition: Dag Kittlaus is the company’s CEO, Adam Cheyer is head of development, and Tom Gruber is the CTO.

Siri was capable of quite a lot. It used speech recognition from Nuance, so the developers didn’t have to deal with voice recognition itself, but rather with processing meaning and combining information sources. We could ask Siri to find a good Italian restaurant nearby, and it would combine several data sources to recommend the right one. It could even book a table through OpenTable.

The iPhone 3GS already had voice control in 2009, and it was nothing special. But Siri is something more. It is a true digital assistant that responds to what you say and is not just a voice trigger for predefined functions. It can combine sources, has access to the extensive knowledge libraries of Wolfram Alpha and Wikipedia, and uses Google, Bing, and Yahoo. You can ask it how many days are left until Christmas, and Siri will respond with the exact number.

The launch of Siri at the end of 2011 was very successful. The truth is that there was some skepticism about Siri at first. Mobile phones already had a number of “voice assistants” available, but they were always simple functions, limited to finding a contact and calling them. Siri was in a completely different league.

However, iPhone 4 owners were greatly disappointed that it was not available for them in iOS 5. Apple only made Siri available on the new 4S. At first, this was seen as a business policy on the part of the company, with Apple trying to sell more new phones, and the company itself commented very vaguely on “insufficient performance.” This was somewhat strange, because although the A4 processor is slower, Siri still sent most of the data to a server for processing, so most of the work was done by a remote computer, not the iPhone itself. In the end, it turned out that the difference really was in the processor. Apple agreed with Audience to integrate their EarSmart noise cancellation technology directly into the A5 chip. This significantly reduced recognition errors and improved the quality of Siri’s output, which is probably the main reason why Apple won’t release Siri on lower-end iPhones, which have simpler noise reduction circuits.

Siri directly encouraged a number of experiments. First of all, a whole bunch of people tried to decipher its protocol for communicating with the server. They succeeded, and a whole range of alternatives emerged for jailbroken phones. It is even possible to run Siri on older iPhones, although voice recognition is not as accurate. The solution is called Spire, but it relies on good old jailbreaking, with all the pros and cons of that procedure.

Hacking Siri has led to interesting experiments, such as setting the heating thermostat, starting the car remotely with a voice command, or selecting a suitable TV program and launching it on a set-top box. However, it is worth noting that these controls involved programmable devices, such as a computer-controlled thermostat, rather than a thermostatic head on a radiator. The latter is still best controlled by hand. Apple benefits greatly from the hacker community and has hired a number of developers from the jailbreak community. Although Apple does not support or facilitate jailbreaking, it does not actively block it unless it relies on a security flaw.

Unofficial extensions are also coming onto the market, the first of which was Lingual in January 2012, again only working under jailbreak. Lingual allows Siri to translate into thirty languages.

There have also been attempts to see whether Siri can pass the Turing test, i.e., whether it is the first true artificial intelligence. It should be added, however, that Siri is a very submissive artificial intelligence, focused on fulfilling the commands of its master or mistress, so it did not take much effort to expose it in the Turing test. Otherwise, its answers are very sophisticated and the subject of frequent jokes and surprises.

Siri is a very young program that has been enjoying fame since Apple “reintroduced” it in October 2011. This also means that it is still finding its position and response, but it has been warmly received. It has become the subject of a number of mostly well-intentioned jokes and ideas and has even made it onto television: in The Big Bang Theory, Raj Koothrappali falls hopelessly in love with Siri on his phone…

And by the way, it is interesting that Siri is considered exclusively feminine, even though the British English version is voiced by a man, while the Australian and American English versions are female.

Siri was originally launched in English, German, and French, with other language versions in the works. In fact, information has appeared on the internet that Siri revealed (during questioning, of course) that Japanese, Russian, and Mandarin Chinese versions will be available in the summer of 2012.

It is also worth mentioning the animosity Siri has earned at Microsoft. In an interview, Microsoft’s chief strategy and research officer Craig Mundie got angry, saying that Siri was just good PR, that Microsoft had long been supplying the same kind of program, called TellMe, for its mobile operating system, and that there had been no such interest in it. Mundie apparently did not verify his claims in advance, or hoped that journalists would not do so. A video immediately became famous on the internet, showing two phones side by side, one with Siri and the other with TellMe. While Siri smoothly performs the tasks assigned to it, TellMe suggests scheduling a meeting in a teenager’s butt, because it misrecognized virtually every command, did not understand them at all, and just mindlessly passed them on to a search. Microsoft only embarrassed itself. Android boss Andy Rubin responded much better, praising Apple for the timing of the product, but adding in the same breath that he personally did not think a phone should be an assistant.

What does the future hold for Siri? According to Apple’s job advertisements, the company is looking for developers to work on API development, so it is very likely that it will release an interface for third parties, who will then be able to connect their services and information sources to Siri. Currently, Siri has difficulty fulfilling your requests outside the US—it struggles to find a restaurant in England, for example, even though it supports British English. This is a problem with resources and their optimization, which Apple could help with through an API.

In addition, Siri is expected to be introduced in the new version of the iPad tablet, which will only strengthen Siri’s position. Another theoretical place where we could soon encounter Siri is Apple TV: either the planned television “reinvented” by Apple, a project still officially unconfirmed as of early 2012, to which voice control could give new impetus, or the existing Apple TV set-top box, which would also benefit from voice control.

However, much will depend on how Apple manages to further improve Siri and respond to market demands. Once again, it is not an easy task, as it is breaking new ground where it is ahead of the rest and has no reference points to guide it.

In the US, for example, Siri’s position is weakened by the fact that it does not speak Spanish, thus ignoring 12% of the population. Integrating new languages and new information sources, expanding rapidly into other regions, and possibly acquiring Wolfram Alpha, whose analytics and expertise in processing unsorted data underlie some of Siri’s high-quality results: these are the tasks Apple faces if it wants to fundamentally transform the world of search. Is this foolish? Perhaps. But small is the one who has a small goal, and Apple has never been able to aim low.

Steve Jobs did not live to see Siri’s success. He died in Palo Alto surrounded by his family on October 5, 2011, after a long battle with pancreatic cancer, just one day after the iPhone 4S was unveiled. As US President Barack Obama said in his statement of condolence: “The world has lost a visionary. And there may be no greater tribute to Steve’s success than the fact that much of the world learned of his passing on a device he invented.”

