
Speech to Text Conversion Using JavaScript

Speech recognition lets us perform tasks using spoken words as input. It is gradually becoming part of everyday life through voice assistants such as Alexa, Google Assistant, and Siri. Whether you are dictating a document, running a web search by voice, or controlling your computer with speech, speech to text conversion makes these tasks faster and more comfortable. It has the potential to replace traditional input devices such as keyboards. A future in which people interact with machines using only speech and body movements may not be far off.

The Web Speech API

The Web Speech API can perform two types of functions:

Speech recognition (speech to text): this feature checks for words and phrases in the speech input and provides the identified words as output text.

Speech synthesis (text to speech): this feature synthesizes text and converts it into speech.
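For contrast with the recognition code used in the rest of this article, here is a minimal sketch of the synthesis half. The helper name `speak` and the `'en-US'` language tag are assumptions made for illustration; `speechSynthesis` and `SpeechSynthesisUtterance` are the browser-provided pieces.

```javascript
// Hypothetical helper: speak `text` using a window-like object.
// Returns false when speech synthesis is not available.
function speak(globalObj, text) {
  if (!globalObj.speechSynthesis || !globalObj.SpeechSynthesisUtterance) {
    return false;
  }
  var utterance = new globalObj.SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US'; // assumed language; change as needed
  globalObj.speechSynthesis.speak(utterance);
  return true;
}

// In a browser, pass the real window object.
if (typeof window !== 'undefined') {
  speak(window, 'Hello from the Web Speech API');
}
```

Passing the global object in as a parameter keeps the helper easy to try outside a browser with a stand-in object.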

A basic web application for speech to text conversion using JavaScript:

Like any other web app, this one needs the following:

• The index.html file, which contains the HTML code for the web app
• The style.css file, which contains the CSS styles used in the web app
• The script.js file, which contains the JavaScript code of the web app
• A web server for running the web app
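Any static web server will do for the last item. As one possible option, assuming Node.js is installed, the sketch below serves the three files from a directory. The names `createStaticServer`, `resolveRequestPath`, and `contentTypeFor`, and the port 8080, are hypothetical choices for this example.

```javascript
var http = require('http');
var fs = require('fs');
var path = require('path');

// Content types for the three kinds of file the demo uses.
var CONTENT_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8'
};

function contentTypeFor(filePath) {
  var ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] || 'application/octet-stream';
}

// Map a request URL to a file inside rootDir.
// Returns null if the URL is malformed or points outside rootDir.
function resolveRequestPath(rootDir, requestUrl) {
  var pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (err) {
    return null; // malformed percent-encoding
  }
  if (pathname === '/') {
    pathname = '/index.html';
  }
  var root = path.resolve(rootDir);
  var filePath = path.resolve(root, '.' + pathname);
  if (filePath.indexOf(root + path.sep) !== 0) {
    return null;
  }
  return filePath;
}

function createStaticServer(rootDir) {
  return http.createServer(function (req, res) {
    var filePath = resolveRequestPath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, function (err, data) {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// Usage (uncomment to run), then open http://localhost:8080/ in the browser:
// createStaticServer(__dirname).listen(8080);
```

Serving from localhost rather than opening the file directly tends to make the browser's microphone permission prompt behave more predictably.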

Speech recognition can be implemented in the browser using the JavaScript Web Speech API. The API lets a web app accept speech through the device’s microphone and convert it into text by matching the spoken words against the words in its vocabulary.

Along with the SpeechRecognition interface, the API defines a number of closely related interfaces for representing results, grammars, and so on, such as SpeechRecognitionResult and SpeechGrammarList. The recognized text can then be passed to other APIs to perform tasks.

A speech to text demo app is used as an example here. The user taps the Start button on the screen and speaks, and the web page displays the spoken words as text.

Demo Web App

index.html:

<!DOCTYPE html>

<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
        <meta http-equiv="Pragma" content="no-cache" />
        <meta http-equiv="Expires" content="0" />
        <title>Speech to text conversion using JavaScript</title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <link rel="stylesheet" href="style.css">
        <link href="https://fonts.googleapis.com/css?family=Shadows+Into+Light" rel="stylesheet">

    </head>

    <body>
        <div class="mycontainer">

            <h1>Speech to text conversion using JavaScript</h1>

            <div class="mywebapp"> 
                <div class="input">
                    <textarea id="textbox" rows="6"></textarea>
                </div>         
                <button id="start-btn" title="Start">Start</button>
                <p id="instructions">Press the Start button</p>
            </div>
        </div>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
        <script src="script.js"></script>
    </body>
</html>

style.css:

body {
  background: #1e2440;
  color: #f2efe2;
  font-size: 16px;
}


button:focus {
    outline: 0;
}

.mycontainer {
    max-width: 500px;
    margin: 0 auto;
    padding: 150px 100px;
    text-align: center;
}


.mywebapp {
    margin: 50px auto;
}

#textbox {
    margin: 30px 0;
}

@media (max-width: 768px) {
    .mycontainer {
        width: 85vw;
        max-width: 85vw;
    }

    button {
        margin-bottom: 10px;
    }
}

script.js:

// Chrome exposes the recognizer under a vendor-prefixed name.
var SpeechRecognition = window.webkitSpeechRecognition;

var recognition = new SpeechRecognition();

// jQuery handles for the output textarea and the status line.
var Textbox = $('#textbox');
var instructions = $('#instructions');

// Text recognized so far.
var Content = '';

// Keep listening across pauses instead of stopping after the first result.
recognition.continuous = true;

// Append each newly recognized phrase to the textarea.
recognition.onresult = function(event) {
  var current = event.resultIndex;
  var transcript = event.results[current][0].transcript;

  Content += transcript;
  Textbox.val(Content);
};

recognition.onstart = function() {
  instructions.text('Voice recognition is ON.');
};

recognition.onspeechend = function() {
  instructions.text('No activity.');
};

recognition.onerror = function(event) {
  if (event.error == 'no-speech') {
    instructions.text('Try again.');
  }
};

// Start listening when the button is clicked.
$('#start-btn').on('click', function(e) {
  if (Content.length) {
    Content += ' ';
  }
  recognition.start();
});

// Keep Content in sync when the user edits the textarea by hand.
Textbox.on('input', function() {
  Content = $(this).val();
});
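The onerror handler in script.js reacts only to the no-speech error. A sketch of a slightly fuller mapping follows. The error codes are ones defined by the Web Speech API, while the helper name `describeRecognitionError` and the message wording are made up for this example.

```javascript
// Messages for a few of the error codes the recognizer can report.
// The wording is illustrative, not part of the API.
var ERROR_MESSAGES = {
  'no-speech': 'No speech was detected. Try again.',
  'audio-capture': 'No microphone was found.',
  'not-allowed': 'Microphone permission was denied.',
  'network': 'A network error interrupted recognition.'
};

// Hypothetical helper: turn an error code into text for the status line.
function describeRecognitionError(errorCode) {
  if (Object.prototype.hasOwnProperty.call(ERROR_MESSAGES, errorCode)) {
    return ERROR_MESSAGES[errorCode];
  }
  return 'Recognition error: ' + errorCode;
}

// Possible use in the demo, assuming the `recognition` and `instructions`
// variables from script.js:
// recognition.onerror = function (event) {
//   instructions.text(describeRecognitionError(event.error));
// };
```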

Line by Line Explanation of the JavaScript Code

var SpeechRecognition = window.webkitSpeechRecognition;

At the time of writing, the Web Speech API is only fully supported by Chrome for desktop and Chrome for Android. In Chrome, the speech recognition interface is exposed on the window object as webkitSpeechRecognition.
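Because the prefixed name is browser-specific, a common defensive pattern is to check for the unprefixed name first and fall back to the prefixed one. A minimal sketch, where `getSpeechRecognition` is a hypothetical helper name:

```javascript
// Hypothetical helper: return the recognition constructor from a
// window-like object, or null if the browser does not provide one.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser, warn instead of failing when recognition is missing.
if (typeof window !== 'undefined') {
  var Recognition = getSpeechRecognition(window);
  if (!Recognition) {
    console.warn('Speech recognition is not supported in this browser.');
  }
}
```

Checking for null before calling `new` avoids a TypeError on page load in browsers that lack the interface.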


var recognition = new SpeechRecognition();

Here we create an instance of the speech recognition interface.

var Textbox = $('#textbox');

This is a jQuery reference to the textarea that displays the text once the speech has been converted.

recognition.continuous = true;

This tells the recognizer to treat the speech as continuous: it keeps listening and delivers a result for each phrase as it is recognized, rather than stopping after the first result. A pause in speech does not end the session.
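The continuous flag is one of several settings on the recognizer. The sketch below sets it together with interimResults and lang, which are also properties of SpeechRecognition. The helper name `configureRecognition` and the defaults it picks are assumptions for illustration.

```javascript
// Hypothetical helper: apply a few common settings to a recognizer.
function configureRecognition(recognition, options) {
  options = options || {};
  // Keep listening across pauses unless explicitly turned off.
  recognition.continuous = options.continuous !== false;
  // Interim (not yet final) results are off unless requested.
  recognition.interimResults = Boolean(options.interimResults);
  // BCP 47 language tag; 'en-US' is an assumed default.
  recognition.lang = options.lang || 'en-US';
  return recognition;
}

// Possible use in the demo, assuming `recognition` from script.js:
// configureRecognition(recognition, { lang: 'en-GB' });
```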

recognition.onresult = function(event)

The event passed to onresult holds every result recognized so far in event.results, but we only want to display the newest one. event.resultIndex gives the index of the first result that changed, and [0] selects the first alternative for that result. That phrase is extracted into the variable transcript and appended to the content to be displayed.
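A single event can carry more than one new result, and each result has an isFinal flag. As a sketch of a more general handler body, the hypothetical helper `collectFinalTranscript` below walks every result from event.resultIndex onward and keeps only the final ones.

```javascript
// Hypothetical helper: join the top alternative of every final result
// delivered by this event, starting at event.resultIndex.
function collectFinalTranscript(event) {
  var text = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      text += result[0].transcript;
    }
  }
  return text;
}

// Possible use in the demo, assuming `Content` and `Textbox` from script.js:
// recognition.onresult = function (event) {
//   Content += collectFinalTranscript(event);
//   Textbox.val(Content);
// };
```

With interim results switched off, as in the demo, every delivered result is expected to be final, so the flag check only matters once interimResults is enabled.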

$('#start-btn').on('click', function(e) {
  if (Content.length) {
    Content += ' ';
  }
  recognition.start();
});

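This starts listening when the button is clicked. If there is already text in the textbox, a space is appended first so that the new words do not run into the old ones.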
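One caveat with the click handler: calling start() on a recognizer that is already listening throws an InvalidStateError, so a second click on Start can raise an exception. A minimal sketch of a guard, where `safeStart` is a hypothetical helper name:

```javascript
// Hypothetical helper: start the recognizer, ignoring the error raised
// when it is already running. Returns true if start() succeeded.
function safeStart(recognition) {
  try {
    recognition.start();
    return true;
  } catch (err) {
    if (err && err.name === 'InvalidStateError') {
      return false; // already listening; ignore the extra click
    }
    throw err; // anything else is unexpected, so let it surface
  }
}

// Possible use in the demo, assuming `recognition` from script.js:
// $('#start-btn').on('click', function () { safeStart(recognition); });
```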

Conclusion

The speech recognition feature in its current form is free to use, well developed, and gives reasonably accurate results. For wider acceptance it needs broader adoption and support from more devices and browsers. A lot of open source development is happening in this field, with new use cases being envisioned. It is also held back by the lack of standardization across speech recognition libraries and by the need for browsers to ask the user's permission before listening to the microphone because of privacy concerns.

There are a lot of developments happening in speech recognition. Voice assistants backed by machine learning are even being used to mimic human speech. There are also projects to build universal translators that take speech in any human language as input and translate it into another language of the user’s choice.

