This article is the first part of a three-part series that outlines in detail some basic architectural components that could be used to build an ‘Android Virtual Assistant’. Apple has a virtual assistant called Siri. The underlying architecture of a sophisticated application like Siri involves a powerful backend cluster application that handles semantic language understanding and scheduling. On the mobile device, this backend interacts with client-side voice APIs. The role of an intelligent virtual assistant in the mobile context is to let you use your voice to send messages, schedule meetings, place phone calls, and more.
The reward of researching the architectural basis of the emerging virtual assistant space is that it pulls together, in a single context, many diverse but important domains: ontologies, inference engines, big data query engines, data structures, and dialog models from natural language systems.
Overview of possible technologies for an Android Virtual Assistant
The three main architectural components you could use for this system are:
1) A transport system with a Voice Over Internet Protocol (VOIP) server backend.
2) An Android or Android Wear client.
3) A Natural Language Processing (NLP) parser and semantic engine for the Natural Language Interface (NLI).
Android has had a well-developed speech recognizer with speech-to-text conversion since Android 4.1 Jelly Bean. However, this feature on its own is not really suitable for a sophisticated ‘Android Virtual Assistant’: we will need an NLP engine running as a backend server, and we will need to pipe the speech data to that engine.
The glue that binds everything together will be VOIP. That is, we need a way to send the user’s voice data to the NLP engine. The NLP engine works out the request and then, depending on the request, either creates a voice response and sends it back, or processes further queries and then sends the voice response and data back to the Android client. Android already provides a powerful voice API that can be exploited both to cut down the calls to the NLP engine and to help deliver the NLP response.
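The idea of using the on-device voice API to cut down calls to the NLP engine can be sketched roughly as follows. This is a minimal illustration in plain Java, not code from Android or from any NLP library: the `VoiceRequestRouter` class, its list of locally handled phrases, and the function standing in for the remote NLP engine are all hypothetical. It also assumes the utterance has already been converted to text on the device.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: decide whether an utterance can be handled by a
// local voice action or must be forwarded to the remote NLP backend.
final class VoiceRequestRouter {
    // Phrase prefix -> local action. The entries are assumed examples only.
    private final Map<String, String> localActions = new LinkedHashMap<>();
    // Stands in for the remote NLP engine: takes text, returns a response.
    private final Function<String, String> nlpBackend;
    private int backendCalls = 0;

    VoiceRequestRouter(Function<String, String> nlpBackend) {
        this.nlpBackend = nlpBackend;
        localActions.put("call", "LOCAL: open dialer");
        localActions.put("set alarm", "LOCAL: open alarm clock");
    }

    String route(String utterance) {
        String normalized = utterance.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> action : localActions.entrySet()) {
            String key = action.getKey();
            // Match the whole phrase or the phrase followed by a space,
            // so that "calling ..." is not mistaken for "call ...".
            if (normalized.equals(key) || normalized.startsWith(key + " ")) {
                return action.getValue();
            }
        }
        // Anything not recognised locally goes to the backend.
        backendCalls++;
        return nlpBackend.apply(utterance);
    }

    int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        VoiceRequestRouter router = new VoiceRequestRouter(u -> "REMOTE: " + u);
        System.out.println(router.route("Call Alice"));
        System.out.println(router.route("Book me a hotel in Paris"));
    }
}
```

In a real assistant the fallback branch would be the expensive one, since it involves shipping voice data over the network, so keeping simple requests on the device is where the saving would come from.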
This series of articles is an introduction to the technologies you would employ to build an Android Virtual Assistant. Along the way it will bring together a lot of useful Android and related protocol information, which could be applied in a range of applications.
Creating a Virtual Assistant as outlined previously requires the integration and implementation of several technologies in addition to VOIP. In the end, everything reduces to a use case with its associated service APIs, such as a booking service for transport or accommodation.
Session Initiation Protocol
The Android media APIs allow us to record voice and send it as real-time voice data, and the application-level voice actions in Android’s voice API will simplify and reduce the amount of code we write. For a meaningful user experience, however, we will need a level of voice processing, analysis and learning that requires more CPU and system resources than current mobile hardware provides. We will need a powerful remote process such as an NLP engine, as well as a way to get the voice data there for processing. One possible solution for this is VOIP.
Session Initiation Protocol (SIP) is the Internet standard signaling protocol for real-time voice and video communications. It’s a fundamental building block for many popular consumer VOIP products that you may have used. From Android 2.3 (Gingerbread) onward, Android provides a SIP API with a base-level SIP service you can use to set up voice calls without having to manage sessions, transport-level communication, or audio record or playback directly. However, the application would possibly need to hook into the audio record and playback methods of the media API and combine this with the voice calls. The Android SIP API is fairly high level and easy to use.
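To give a feel for what a SIP service handles on the application’s behalf, here is a sketch of the text of a SIP REGISTER request, the message a client sends to tell a server where it can be reached. The layout follows the general shape defined in RFC 3261, but this is only an illustration in plain Java: it does not use the Android SIP API, the class name `SipRegisterSketch` is made up, and the account name, domain, address and identifiers are placeholder values.

```java
import java.util.StringJoiner;

// Sketch only: builds the text of a SIP REGISTER request in the general
// shape defined by RFC 3261. All account values used here are placeholders.
final class SipRegisterSketch {

    static String buildRegister(String user, String domain,
                                String localHost, int localPort,
                                String callId, String branchSuffix,
                                String fromTag) {
        // The address-of-record identifies the account being registered.
        String aor = "sip:" + user + "@" + domain;
        // Lines are separated by CRLF; a blank line ends the headers.
        StringJoiner msg = new StringJoiner("\r\n", "", "\r\n\r\n");
        msg.add("REGISTER sip:" + domain + " SIP/2.0");
        // "z9hG4bK" is the fixed prefix RFC 3261 requires on branch ids.
        msg.add("Via: SIP/2.0/UDP " + localHost + ":" + localPort
                + ";branch=z9hG4bK" + branchSuffix);
        msg.add("Max-Forwards: 70");
        msg.add("From: <" + aor + ">;tag=" + fromTag);
        msg.add("To: <" + aor + ">");
        msg.add("Call-ID: " + callId);
        msg.add("CSeq: 1 REGISTER");
        // Contact tells the server where this device can be reached.
        msg.add("Contact: <sip:" + user + "@" + localHost + ":" + localPort + ">");
        msg.add("Expires: 3600");
        msg.add("Content-Length: 0");
        return msg.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRegister("alice", "example.org", "192.0.2.10",
                5060, "a84b4c76e66710", "776asdhds", "1928301774"));
    }
}
```

A real server would normally challenge a request like this and expect credentials in a follow-up request; that exchange, along with retransmission and session handling, is the kind of detail a SIP stack takes care of.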
There are two existing open source Android SIP clients: Sipdroid and CSipSimple.
With CSipSimple it is possible to work with Android’s dialer and call logs, and with some work you can set up a dialog that prompts the user to choose SIP or GSM when dialing out.
With Sipdroid you do not have access to this functionality. CSipSimple is much easier to set up and work with; however, Sipdroid has better integration with IP PBX.
PBX and IP PBX
A private branch exchange (PBX) is a private telephone switch that also acts as a gateway to the public telephone system. Combining a PBX with a VOIP gateway produces an IP PBX, which carries calls over TCP/IP. There are companies offering free hosted cloud-based IP PBX solutions. As well as the cloud-based IP PBX solutions, there are many VOIP options for the budget conscious. Budget-friendly options include Google Voice, which is soon to be merged with Google Hangouts. This is a good option, as you can link your Google Voice account to a VOIP service such as GrooVe IP or Talkatone.
We will use one such free service with the free open source Android SIP application Sipdroid to illustrate both the architectural principles of a VOIP gateway and how to configure a SIP client application. Either CSipSimple or Sipdroid could be used to manage the boilerplate involved in implementing an Android SIP client.
In this example we will go through the steps to connect Sipdroid to a free account at PBXes.com. By doing this we will get used to the terminology of the SIP protocol and see how an Android SIP client is implemented and configured.
PBXes.com provides a virtual PBX (Private Branch Exchange) service. Basically, by setting up what are known as Extensions, we can pipe voice data between the Android client and the NLP backend.
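Conceptually, an extension is a short number that the PBX maps to a registered endpoint. The sketch below illustrates that mapping in plain Java. It is not how PBXes.com is implemented and uses no PBXes API: the `ExtensionTable` class, the extension number 200 for the backend, and the `example.org` addresses are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of what a PBX extension table does conceptually:
// map a short extension number to the SIP endpoint registered for it.
final class ExtensionTable {
    private final Map<Integer, String> endpoints = new HashMap<>();

    void register(int extension, String sipUri) {
        if (extension <= 0) {
            throw new IllegalArgumentException("extension must be positive");
        }
        if (!sipUri.startsWith("sip:")) {
            throw new IllegalArgumentException("expected a sip: URI");
        }
        endpoints.put(extension, sipUri);
    }

    // Returns the endpoint for a dialed extension, if one is registered.
    Optional<String> resolve(int extension) {
        return Optional.ofNullable(endpoints.get(extension));
    }

    public static void main(String[] args) {
        ExtensionTable table = new ExtensionTable();
        table.register(100, "sip:android-phone@example.org");
        table.register(200, "sip:nlp-backend@example.org");
        System.out.println(table.resolve(200).orElse("no such extension"));
        System.out.println(table.resolve(999).orElse("no such extension"));
    }
}
```

With one extension per Android device and one for the backend, a call from the device to the backend’s extension is what would carry the voice data in this architecture.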
Create the extensions for your VOIP network by selecting ‘Extensions’ from the left menu, then ‘SIP’ under ‘Add an Extension.’ Each extension will be specific to a unique Android device in the network.
If successful, you will see your new extension listed as below.
3) Set up the Android Sipdroid Application for the new PBXes account
First, download and install Sipdroid from the source here.
To set up the Sipdroid source as an Eclipse project, open it as an existing project in Eclipse with the ADT plugin, then configure the AndroidManifest.xml file with the appropriate values for your Android SDK setup, for example:
<uses-sdk android:minSdkVersion="19" android:targetSdkVersion="22"/>
Once you have the Sipdroid source building in Eclipse, run it in an emulator. The screenshot below shows Sipdroid running in the emulator before any configuration steps.
Within the Account Settings section of Sipdroid, set up your extension:
Authorization Username and Password will be the username and password provided at account registration.
The screenshot below shows the successful configuration options.
You should see a green dot appear in your notifications tray on successful configuration as shown in the screenshot below.
Now we are set up to call extension 100. However, this will fail in the emulator; you will need to install your APK file on an Android phone connected over USB via the Android Debug Bridge (adb).
Android SIP API
As well as outlining the key technologies employed in developing virtual assistants, this practical exercise served to illustrate the principles of SIP and the configuration parameters needed to bootstrap a SIP application. The Sipdroid application uses native libraries for its SIP implementation that are separate from the native Android SIP implementation. These include Skype’s super wideband audio codec SILK, SpanDSP and Speex. You have to be careful when using libraries that are not native to Android, as they are often encumbered by patents, which limits their inclusion in open source SIP applications.
In Part 2 we will step through the components of a much simpler Android SIP client using the pure Android SIP API.