
Android Virtual Assistant Part 2: Android SIP and Voice APIs

This article is part two of a series in which we survey technologies that could be used to construct what we loosely term an Android Virtual Assistant. In part one we reviewed SIP (Session Initiation Protocol), which is used to control VoIP calls, and we set up an extension on the IP-PBX provider as the endpoint for the final destination of a VoIP call. We are not restricted to a phone: the endpoint could equally be a software application running on a remote device or a cloud server. In other words, a remote SIP endpoint is any device or cloud server that is VoIP-enabled with SIP.

We saw that this endpoint has a unique address, a SIP URI, similar in function to an email address or a phone number. From the demo code we saw that the generic form of this URI is:

<protocol>:<endpoint>@<domain>, e.g. ‘sip:[email protected]’

If we have issues resolving the domain name we can always swap it out for the endpoint’s actual IP address.

We will now extend our toolset for bootstrapping SIP-enabled applications, looking at ways to implement the native Android SIP API and then integrate it with the Android Voice Actions API. The source code for this article can be found here

Configuring the native Android SIP API

To run the source in parts one and two you need an unlocked phone running Android 2.3 or later. It also makes little sense to try to run the code on an Android tablet without cellular voice capabilities.
In the provided Android source we can see a typical way to configure the native Android SIP API. The settings for the SIP configuration can be found in this XML file. The key point is that, unlike when we configured the SipDroid application, here we do not provide a SIP port; the Android API defaults to the standard SIP port of 5060. You set the values that you created when you configured the extension on the IP-PBX provider in part one.

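As a sketch of what such a settings file might look like (the resource names and values below are illustrative placeholders, not the demo’s actual file; substitute the username, domain and password you created for your extension in part one):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical SIP settings resource; note there is no port entry,
     since the Android SIP API defaults to port 5060. -->
<resources>
    <string name="sip_username">extension-100</string>
    <string name="sip_domain">your-ip-pbx.example.com</string>
    <string name="sip_password">your-secret</string>
</resources>
```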

As well as the basic parameters for the native SIP API, we also need to enable the correct Android SIP permissions.

Configuring the Android SIP Permissions

In the AndroidManifest.xml you will see the following permissions and features

<uses-permission android:name="android.permission.USE_SIP" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.VIBRATE" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.WAKE_LOCK" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

<uses-feature android:name="android.hardware.sip.voip" android:required="true" />
<uses-feature android:name="android.hardware.wifi" android:required="true" />
<uses-feature android:name="android.hardware.microphone" android:required="true" />

Setting the uses-feature required attribute to true declares that the application cannot function without the declared feature. However, uses-feature elements are informational only: they are used by stores such as Google Play to filter listings, but the system itself does not enforce them, so the application can still be installed (for example, by sideloading) on devices that do not support SIP. This means we have more configuration work to do at runtime.

Bootstrapping the Android SIP Application

The following code fragment from the entry-point activity of the application shows how we can use the static methods in the SipManager class from the SIP API to check, as the application boots, that the required hardware and system support is available, and to inform the user before exiting if it is not. We do this by passing the Android context to the static SipManager methods isVoipSupported() and isApiSupported(), combining the two checks.
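A minimal sketch of those combined checks follows. SipManager.isVoipSupported() and SipManager.isApiSupported() are the real static helpers in android.net.sip; the surrounding activity scaffolding and the toast message are illustrative:

```java
import android.app.Activity;
import android.net.sip.SipManager;
import android.os.Bundle;
import android.widget.Toast;

public class EntryPointActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Check the device hardware supports VoIP and the platform ships
        // the SIP stack before touching any other SIP classes.
        if (!SipManager.isVoipSupported(this) || !SipManager.isApiSupported(this)) {
            Toast.makeText(this, "SIP is not supported on this device",
                    Toast.LENGTH_LONG).show();
            finish();   // gracefully exit the application
            return;
        }
        // ... safe to continue bootstrapping the SIP interface ...
    }
}
```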
If either of these calls returns false, we can inform the user in some way and implement a flow in the code that gracefully exits the application.

As well as the static methods in the SipManager class, there are two other high-level classes we need to set up to obtain a minimal SIP interface: SipAudioCall and BroadcastReceiver. The SIP API is well designed, and the SipAudioCall type lets you painlessly instantiate a call through SipManager and then make Internet audio calls over SIP using the makeAudioCall() and takeAudioCall() methods.
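As a sketch of this (the profile fields, callee URI and listener body are hypothetical; mSipManager and mSipProfile are assumed to have been created from the settings shown earlier), placing a call with makeAudioCall() might look like:

```java
// Hypothetical outgoing-call flow; mSipManager = SipManager.newInstance(this)
// and mSipProfile were set up beforehand.
SipAudioCall.Listener listener = new SipAudioCall.Listener() {
    @Override
    public void onCallEstablished(SipAudioCall call) {
        call.startAudio();           // route audio once the call connects
        call.setSpeakerMode(true);
    }
    @Override
    public void onCallEnded(SipAudioCall call) {
        call.close();
    }
};
SipAudioCall call = mSipManager.makeAudioCall(
        mSipProfile.getUriString(),       // our local SIP profile
        "sip:[email protected]",      // example callee URI
        listener,
        30);                              // timeout in seconds
```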
Things get a little trickier when we consider the use case of a user and their phone: we need a way to listen for incoming SIP calls, which can happen at any time. We do this by using the Android BroadcastReceiver as a base class. You can see this in the demo source code here. Because our application needs to receive these broadcast events (SIP calls), we must flag to the system that we want them, which we do by declaring the receiver in our AndroidManifest.xml as shown below

<!-- android:name should reference your BroadcastReceiver subclass -->
<receiver android:name=".IncomingCallReceiver"
          android:label="Call Receiver" />
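To actually route incoming calls to that receiver, the local profile is opened with a PendingIntent whose action matches what the receiver listens for. A sketch, assuming the action string "android.SipDemo.INCOMING_CALL" and the same mSipManager/mSipProfile as above:

```java
// Open the local profile for receiving calls; incoming SIP INVITEs are
// then delivered to our BroadcastReceiver via this PendingIntent.
Intent intent = new Intent("android.SipDemo.INCOMING_CALL"); // hypothetical action
PendingIntent pendingIntent = PendingIntent.getBroadcast(
        this, 0, intent, Intent.FILL_IN_DATA);
mSipManager.open(mSipProfile, pendingIntent, null);
```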

With the steps outlined above for configuring and bootstrapping the application, we have set up a minimal interface to the native Android SIP API.

Android Voice API

Android Voice Actions are a base API and service that has existed in some form since 2010, starting with Android 2.2 (Froyo). The capabilities of the API have evolved rapidly since then, first with Voice Search in Android 4.1 (Jelly Bean) and then notably with the advent of Android wearables and their obvious need for a voice-initiated interface.

Voice Actions in the Froyo sense are a series of spoken commands that let you control your phone with your voice: call businesses and contacts, send texts and email, listen to music, browse the web, and complete common tasks, all just by speaking into your phone.

The Android Wear platform provides several voice intents that are based on user actions such as “Take a note” or “Set an alarm”. This allows users to say what they want to do and let the system figure out the best activity to start.

In addition to using voice actions to launch activities, you can also call the system’s built-in speech recognizer activity to obtain speech input from users. This is useful when you want to capture input and then process it, for example to run a search or send it as a message.
Voice Actions, by contrast, are essentially hard-coded commands.

The example code for this section can be found here. It consists of two activities: EduonixVoiceActivity simply uses the platform-provided voice intents.


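For instance, firing a platform voice intent such as ACTION_WEB_SEARCH takes only a line or two (this fragment assumes it runs inside an activity):

```java
// Launch the system's voice web search; the platform listens,
// transcribes the user's speech and runs the search for us.
startActivity(new Intent(RecognizerIntent.ACTION_WEB_SEARCH));
```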
EduonixVoiceRecognition goes further and calls the system’s built-in speech recognizer activity to obtain speech input from users.

Bootstrapping the Voice Application

The voice demo requires the following permissions in its AndroidManifest.xml

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

The following code fragment from the EduonixVoiceActivity activity shows how we can use the PackageManager class to check that the required voice intents are available to the application

PackageManager pm = getPackageManager();
List<ResolveInfo> activities = pm.queryIntentActivities(
        new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
if (activities.size() == 0) {
    // No activity can handle speech recognition on this device.
    Toast.makeText(this, "Voice recognizer not present",
            Toast.LENGTH_SHORT).show();
}

The following code fragment from the EduonixVoiceRecognition activity shows how we can perform the same check using the static methods in the SpeechRecognizer class from the Voice API.


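A sketch of that equivalent check, using the real static helper SpeechRecognizer.isRecognitionAvailable() from android.speech (the toast message and exit flow are illustrative):

```java
// isRecognitionAvailable() returns true only when at least one
// RecognitionService implementation is installed on the device.
if (!SpeechRecognizer.isRecognitionAvailable(this)) {
    Toast.makeText(this, "Speech recognition is not available",
            Toast.LENGTH_LONG).show();
    finish();   // gracefully exit the application
}
```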
With the preceding permissions and code we can check that the required hardware and system support is available to the application as it boots.

Both of the provided activities build on the Android RecognizerIntent class. This is the gateway for voice input and actions in the application. In EduonixVoiceActivity you can see how to use RecognizerIntent to invoke Voice Action activities via intents. These supplied Voice Actions, such as ACTION_WEB_SEARCH, provide speech-to-text input and searching.
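A minimal round trip through RecognizerIntent looks like the following sketch (the request code and prompt text are arbitrary, and the result handling is illustrative):

```java
private static final int REQ_SPEECH = 100;   // arbitrary request code

private void startListening() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now");
    startActivityForResult(intent, REQ_SPEECH);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == REQ_SPEECH && resultCode == RESULT_OK) {
        // Candidate transcriptions, best match first.
        List<String> matches =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        // ... classify or filter matches against our vocabulary ...
    }
}
```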

EduonixVoiceRecognition shows how we can extend the platform-provided language model by using SpeechRecognizer to call an external speech-recognition engine with a dedicated language model. This quickly becomes complex: using EXTRA_CALLING_PACKAGE we can create a flow that calls out to a commercial speech company such as VLingo, Dragon, or Microsoft’s Bing (which has a large free quota).

Moving Forward

The Android RecognizerIntent class creates Voice Actions that support the text and language models needed to perform functions such as word search. We obtain results using these recognizer models and then classify or filter them to find the best matches against a vocabulary. The techniques for doing this range from simple parsing to complex statistical models (called language models).
Using the SpeechRecognizer class we can stream audio to remote servers to perform speech recognition with extended language models. To implement a non-trivial virtual assistant we need to go beyond this, with a speech engine that combines language models, natural language processing, and higher-level artificial intelligence. In our architecture for creating an ‘Android Digital Assistant’, the SIP endpoint will be a Natural Language Processing (NLP) and AI engine.
SIP is not restricted to audio; like Skype, it is capable of video streaming. To test video streaming over SIP you can use one of the many video-streaming servers that offer a trial period. SIP is a robust protocol, and SIP video-streaming APIs are available in a number of languages.

