On October 14, 2011, Apple released the new iPhone 4S. One of its major new features was Siri, a personal assistant application. Siri uses natural language processing to interact with the user.
Interestingly, Apple explained that Siri works by sending data to a remote server (that’s probably why Siri only works over 3G or WiFi). As soon as we could get our hands on the new iPhone 4S, we decided to take a sneak peek at how it really works.
Today, we managed to crack open Siri’s protocol. As a result, we are able to use Siri’s recognition engine from any device. Yes, that means anyone could now write an Android app that uses the real Siri! Or use Siri on an iPad! And we’re going to share this know-how with you.
The best demo is probably Siri’s speech-to-text feature. We made a simple recording of us saying “Applidium vous souhaite une bonne journée” (“Applidium wishes you a good day”), and got a perfect result!
This sound sample never went through any iPhone, but nonetheless we got Siri to analyze it for us.
Understanding the protocol – A brief technical history
At Applidium we’re used to building mobile applications. The best way to chat with a remote server is HTTP, as it’s the protocol that is the most likely to work in any situation.
The easiest way to sniff HTTP traffic is to set up a proxy server, configure your iPhone to use it, and look at what goes through the proxy. Surprisingly, when we did, we didn’t capture any traffic when using Siri. So we resorted to using tcpdump on a network gateway, and we realised Siri’s traffic was TCP, on port 443, to a server at 220.127.116.11. When we opened https://18.104.22.168/ on a desktop machine, we noticed that this server was presenting a certificate for guzzoni.apple.com. So it seemed like Siri was communicating with a server named guzzoni.apple.com over HTTPS.
As you know, the “S” in HTTPS stands for “secure”: all traffic between a client and an HTTPS server is encrypted, so we couldn’t read it using a sniffer. In that case, the simplest solution is to set up a fake HTTPS server, use a fake DNS server, and see what the incoming requests are. Unfortunately, the people behind Siri did things right: they check that guzzoni’s certificate is valid, so you cannot fake it. Well… they did check that it was valid, but the thing is, you can add your own “root certificate”, which lets you mark any certificate you want as valid.
So basically all we had to do was set up a custom SSL certification authority, add it to our iPhone 4S, and use it to sign our very own certificate for a fake “guzzoni.apple.com”. And it worked: Siri was sending commands to our own HTTPS server! Seems like someone at Apple missed something!
That’s when we realised how opaque Siri’s protocol is. Let’s have a look at a Siri HTTP request. The request’s body is binary (we’ll get into that later), and here are the headers:
ACE /ace HTTP/1.0
User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0
A few interesting things:
- The request uses a custom “ACE” method instead of a more usual GET.
- The URL requested is “/ace”.
- The Content-Length is nearly 2GB, which obviously does not conform to the HTTP standard.
- X-Ace-host is some form of GUID. After trying with several iPhone 4Ses, it seems to be tied to the actual device (pretty much like a UDID).
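Because of the custom method, a stock HTTP library may refuse to parse such a request, so it can be easier to split the head by hand. Here is a minimal Python sketch of that idea. The request line and User-Agent come from the capture above; the function name `parse_request_head`, the Content-Length value and the all-zero X-Ace-host value are made-up placeholders for illustration, not captured data.

```python
def parse_request_head(raw: bytes):
    """Split a raw HTTP-style request head into (method, path, version, headers)."""
    # Everything before the first blank line is the head; the rest is the body.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


# Placeholder request: the last two header values are hypothetical.
SAMPLE = (
    b"ACE /ace HTTP/1.0\r\n"
    b"User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0\r\n"
    b"Content-Length: 2000000000\r\n"
    b"X-Ace-host: 00000000-0000-0000-0000-000000000000\r\n"
    b"\r\n"
)

STANDARD_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                    "OPTIONS", "TRACE", "CONNECT", "PATCH"}

method, path, version, headers = parse_request_head(SAMPLE)
print(method, path, version)
print("custom method:", method not in STANDARD_METHODS)
print("declared length:", int(headers["content-length"]))
```

The declared length is only read here, never trusted: with a value this large it cannot be used to decide how many body bytes to wait for.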
Now let’s move on to the body. The body is some raw binary content. When we first looked at it with a hex editor, we noticed it started with 0xAACCEE. Oh, seems like a header! Unfortunately, we couldn’t understand anything of what came after that.
That’s when we took some time to think. As people who are used to designing mobile applications, we know there’s one thing which is very important when talking over a network: compression. Bandwidth is often limited, so it’s usually a very good idea to compress your data. And what is the most ubiquitous compression library around? zlib (http://zlib.net/). It’s a very solid library, really efficient and powerful (makes sense, it’s half French!). So we tried to pipe that binary data through zlib. But nothing came out: we were missing a zlib header. That’s when we thought “hmm, so there’s already this AACCEE header in the request body. Maybe there’s some more?”. We developers like to keep things packed. 3 bytes is not a good length for a header. 4 would be. So we tried unzipping after the 4th byte. And it worked!
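The trick of skipping four bytes and inflating the rest can be sketched in a few lines of Python with the standard zlib module. This is only an illustration under stated assumptions: the name `inflate_ace_body` is ours, the body is synthetic (built on the spot rather than captured), and the fourth header byte is set to an arbitrary placeholder of 0x00.

```python
import zlib

# The three bytes seen at the start of the request body.
ACE_MAGIC = bytes.fromhex("aaccee")


def inflate_ace_body(body: bytes) -> bytes:
    """Skip the 4-byte header and inflate whatever zlib data follows."""
    if not body.startswith(ACE_MAGIC):
        raise ValueError("body does not start with 0xAACCEE")
    # A decompressobj returns whatever it can decode so far, which suits
    # a stream that may not have been finished yet.
    return zlib.decompressobj().decompress(body[4:])


# Synthetic body: magic + one placeholder byte + zlib-compressed payload.
payload = b"bplist00 (stand-in payload)"
fake_body = ACE_MAGIC + b"\x00" + zlib.compress(payload)

print(inflate_ace_body(fake_body))
```

Feeding the same function the body from offset 3 instead of 4 fails inside zlib, which matches the “nothing came out” experience described above.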
Now that we had unzipped the content, we got some new binary data. Not very understandable either, but some parts were text. Among them, one caught our attention: bplist00. Hurray, it seems like the data is some binary plist. After fiddling a little bit with that binary stream, we figured out it was made out of chunks:
- Chunks starting with 0x020000xxxx are “plist” packets, xxxx being the size of the binary plist data that follows the header.
- Chunks starting with 0x030000xxxx are “ping” packets, sent by the iPhone to Siri’s servers to keep the connection alive. Here xxxx is the ping sequence number.
- Chunks starting with 0x040000xxxx are “pong” packets, sent by Siri’s server as a reply to ping packets. Unsurprisingly, xxxx is the pong sequence number.
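The chunk layout above can be walked with a small Python generator. This is a sketch, not our actual tooling: it assumes each chunk header is 5 bytes, one type byte followed by a 4-byte big-endian integer (which gives the same value as xxxx whenever the two middle bytes are zero). The names `iter_chunks` and `make_plist_chunk` and the sample dictionary are hypothetical, and the stream is synthetic.

```python
import plistlib
import struct

PLIST, PING, PONG = 0x02, 0x03, 0x04
HEADER = ">BI"  # 1 type byte + 4-byte big-endian value (assumed layout)


def iter_chunks(stream: bytes):
    """Yield (kind, value) pairs from an already-inflated byte stream."""
    offset = 0
    # Stop quietly if fewer than 5 bytes remain (an incomplete header).
    while offset + 5 <= len(stream):
        kind, value = struct.unpack_from(HEADER, stream, offset)
        offset += 5
        if kind == PLIST:
            end = offset + value  # value is the plist size in bytes
            if end > len(stream):
                raise ValueError("truncated plist chunk")
            # plistlib detects the binary "bplist00" format by itself.
            yield "plist", plistlib.loads(stream[offset:end])
            offset = end
        elif kind == PING:
            yield "ping", value  # value is the sequence number
        elif kind == PONG:
            yield "pong", value
        else:
            raise ValueError(f"unknown chunk type 0x{kind:02x}")


def make_plist_chunk(obj) -> bytes:
    """Build a synthetic plist chunk for testing the parser."""
    data = plistlib.dumps(obj, fmt=plistlib.FMT_BINARY)
    return struct.pack(HEADER, PLIST, len(data)) + data


stream = (
    struct.pack(HEADER, PING, 1)
    + make_plist_chunk({"class": "Example"})
    + struct.pack(HEADER, PONG, 1)
)
chunks = list(iter_chunks(stream))
for kind, value in chunks:
    print(kind, value)
```

Raising on an unknown type byte is a deliberate choice: once the parser loses track of a chunk boundary, everything after it would be misread anyway.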
Deciphering the content of binary plists is very easy: you can do it on Mac OS X with the “plutil” command-line tool, or in Ruby with the CFPropertyList gem on any platform.
What we learned
We really did learn a few interesting things about how the iPhone 4S talks to Apple’s servers:
The audio data
The iPhone 4S really sends raw audio data. It’s compressed using the Speex audio codec, which makes sense as it’s a codec specifically tailored for VoIP.
The iPhone 4S sends identifiers everywhere. So if you want to use Siri on another device, you still need the identifier of at least one iPhone 4S. Of course we’re not publishing ours, but it’s very easy to retrieve one using the tools we’ve written. Apple could always blacklist an identifier, but as long as you’re keeping it for personal use, that should be alright!
The actual content
The protocol is actually very, very chatty. Your iPhone sends tons of things to Apple’s servers, and those servers reply with an incredible amount of information. For example, when you’re using speech-to-text, Apple’s servers even reply with a confidence score and a timestamp for each word.
What’s next?
Here’s a collection of tools we wrote to help us understand the protocol. They’re written mostly in Ruby (because that’s a wonderfully simple language), some parts are in C and some in Objective-C.
Technical resources are created with a specific intent, and are potentially captured and reused for other purposes. A little history lesson: for as long as services have been made available to the public, people have been finding ways to repurpose those services or use them without paying.
- Digital Cable
- Digital Satellite
- Phone (phreaking)
- Power (leeching)
Most of the time these services were regional or geographically isolated, and people didn’t have as much access to information as they do today. Just a few days ago, hackers took control of a satellite (http://www.pakistantoday.com.pk/2011/11/hackers-take-command-of-us-satellites/). Or how about the foreigners with a Russian address who damaged a water plant (http://www.theverge.com/2011/11/18/2572079/springfield-water-plant-scada-hacked-us-russia)? At the same time, another person got into a system at a Texas plant.
Story after story, it is the same thing over and over again. In our lifetime, we are never going to stop this behavior. That is the key to this discussion: this is a behavior problem.
I don’t believe that we can protect the internet. We can protect technical assets that are disconnected from the network, but protecting something connected would be like trying to protect your hand from your brain. If there is a connection and there is INTENT, there will be a result. People who are curious or driven, and who have unlimited access, can and will find ways to reach these resources. What I am suggesting is that we focus on education and on identifying behaviors to help work on these challenges.
Recently, I watched the movie Starship Troopers (you know, mindless sci-fi). During the movie, the leader, the Sky Marshal, decided to attack the enemy head on. When the troopers attacked, they were overwhelmed by the sheer number of enemies. There was another aspect as well: the enemy was smarter than expected. Attacking something you don’t understand is not likely to produce the desired result. At some point the leadership decided that it must understand the enemy to achieve success.
Cyber threats are no different. We are dealing with thousands of everyday people who have the power of the most up-to-date and relevant information at their command. Some of them work together, some of them work alone, some are destructive, and some are simply curious or just want to solve a puzzle they are told is unsolvable.
What do you do when the enemy is you? If we start to pay attention to our culture and recognize our actual connectivity with the global community, we can start to find ways to limit our damages. We are not moving to cloud computing or moving towards a cloud paradigm; as long as we are connected by a logical and physical connection, we are IN A CLOUD. We need to focus on behavioral science with predictive gaming algorithms to identify the greatest risks based on technological trends. This will help us mitigate the damage that will certainly occur.
File: archives/7/p7_0x03_Hacker's Manifesto_by_The Mentor.txt
Volume One, Issue 7, Phile 3 of 10
The following was written shortly after my arrest...
\/\The Conscience of a Hacker/\/
Written on January 8, 1986
Another one got caught today, it's all over the papers. "Teenager
Arrested in Computer Crime Scandal", "Hacker Arrested after Bank Tampering"...
Damn kids. They're all alike.
But did you, in your three-piece psychology and 1950's technobrain,
ever take a look behind the eyes of the hacker? Did you ever wonder what
made him tick, what forces shaped him, what may have molded him?
I am a hacker, enter my world...
Mine is a world that begins with school... I'm smarter than most of
the other kids, this crap they teach us bores me...
Damn underachiever. They're all alike.
I'm in junior high or high school. I've listened to teachers explain
for the fifteenth time how to reduce a fraction. I understand it. "No, Ms.
Smith, I didn't show my work. I did it in my head..."
Damn kid. Probably copied it. They're all alike.
I made a discovery today. I found a computer. Wait a second, this is
cool. It does what I want it to. If it makes a mistake, it's because I
screwed it up. Not because it doesn't like me...
Or feels threatened by me...
Or thinks I'm a smart ass...
Or doesn't like teaching and shouldn't be here...
Damn kid. All he does is play games. They're all alike.
And then it happened... a door opened to a world... rushing through
the phone line like heroin through an addict's veins, an electronic pulse is
sent out, a refuge from the day-to-day incompetencies is sought... a board is
found.
"This is it... this is where I belong..."
I know everyone here... even if I've never met them, never talked to
them, may never hear from them again... I know you all...
Damn kid. Tying up the phone line again. They're all alike...
You bet your ass we're all alike... we've been spoon-fed baby food at
school when we hungered for steak... the bits of meat that you did let slip
through were pre-chewed and tasteless. We've been dominated by sadists, or
ignored by the apathetic. The few that had something to teach found us will-
ing pupils, but those few are like drops of water in the desert.
This is our world now... the world of the electron and the switch, the
beauty of the baud. We make use of a service already existing without paying
for what could be dirt-cheap if it wasn't run by profiteering gluttons, and
you call us criminals. We explore... and you call us criminals. We seek
after knowledge... and you call us criminals. We exist without skin color,
without nationality, without religious bias... and you call us criminals.
You build atomic bombs, you wage wars, you murder, cheat, and lie to us
and try to make us believe it's for our own good, yet we're the criminals.
Yes, I am a criminal. My crime is that of curiosity. My crime is
that of judging people by what they say and think, not what they look like.
My crime is that of outsmarting you, something that you will never forgive me
for.
I am a hacker, and this is my manifesto. You may stop this individual,
but you can't stop us all... after all, we're all alike.