

Updated: Jul 31, 2021

Over the last few years, voice assistants have gained enormous popularity. They are no longer confined to mobile phones; they are widely used in computers, cars, hospitals, the public sector, and more. The technology relies on several components: various APIs to fetch information, natural language processing (NLP) to respond intelligently, and machine learning and deep learning models. Python modules can load system resources and applications so the assistant can handle our daily tasks; deep learning can be used for image classification and object/text detection, while NLP lets the assistant reply sensibly to what it hears. Combining these technologies makes the assistant perform better and makes it more reliable for day-to-day use.

Nowadays, typing commands into our phones or computers is a chore, and it adds friction to a busy schedule. Switching to a voice assistant for feeding in commands and basic tasks might be the best solution. Voice assistants like Google Assistant, Bixby, Alexa, and Siri have become favorites and are revolutionizing the notion of smart homes and smart devices. We can build our own voice assistant using Python modules and basic artificial intelligence techniques. This article demonstrates how you can build your own Alexa, Siri, or Jarvis from Iron Man (or Nebula from GOTG xD).

The entire implementation is divided into 3 parts:

  • Step 1: A non-NLP-based assistant.

  • Step 2: Identifying similar words with our voice assistant and giving similarity scores.

  • Step 3: NLP-based responses.

Before we get started, here is the data flow diagram for the project fig 0.1:

Fig 0.1

Step 1: First, we import pyttsx3, which converts text/string commands into speech (the assistant's voice), and SpeechRecognition, which converts our voice input into strings for processing. Through pyttsx3 we use SAPI5, a Microsoft Speech API, to load the available voices.

import pyttsx3
import speech_recognition as sr
import datetime
import wikipedia
import webbrowser
import os
import smtplib

Now, for using the microphone resource from our system, we’ll use:

r = sr.Recognizer()  # the speech_recognition module was imported as sr above

The voice assistant greets you according to the time of day using the datetime module. When asked for its name, it replies as shown in fig 1.1. We have used the webbrowser module to open our default web browser for searching queries or websites.
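A time-based greeting can be sketched with the datetime module (the function name and exact phrasing here are illustrative, not the project's literal code):

```python
import datetime

def wish_me(now=None):
    """Return a greeting that matches the current hour of the day."""
    hour = (now or datetime.datetime.now()).hour
    if hour < 12:
        return "Good morning! I am your assistant. How may I help you?"
    elif hour < 18:
        return "Good afternoon! I am your assistant. How may I help you?"
    else:
        return "Good evening! I am your assistant. How may I help you?"
```

Calling wish_me() with no argument uses the real clock; passing a datetime makes the behavior easy to test.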

Fig 1.1

Similarly, we can run Google searches directly: the spoken query is appended to the Google search URL, fig 1.2.

url = '' + query
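The base URL is blank in the snippet above; a minimal sketch, assuming the standard Google search endpoint, looks like this (quote_plus escapes spaces and special characters in the spoken query):

```python
import webbrowser
from urllib.parse import quote_plus

GOOGLE_SEARCH = "https://www.google.com/search?q="  # assumed endpoint

def build_search_url(query):
    """Append the URL-encoded query to the search endpoint."""
    return GOOGLE_SEARCH + quote_plus(query)

def search(query):
    """Open the default browser on the results page for the query."""
    webbrowser.open(build_search_url(query))
```

webbrowser is part of the standard library, so this works with whatever default browser the system has configured.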

We have used the wikipedia module to fetch a summary directly from the Wikipedia page for the given input. Tasks like opening an application or searching for a location can be done as shown in fig 1.3 and fig 1.4.

query = query.replace("wikipedia", "")  # strip the trigger word from the spoken query
results = wikipedia.summary(query, sentences=2)  # first two sentences of the article

Fig 1.3

Fig 1.4

smtplib is used to send emails. You need to grant third-party app access in your email account and feed the app password into your code. The code and output are shown in fig 1.5.

server = smtplib.SMTP('', 587)        # SMTP host left blank here; port 587 uses STARTTLS
server.starttls()                     # upgrade the connection before logging in
server.login('', '#######')           # sender address and app password
server.sendmail('', to, content)      # from-address, recipient, message body

Fig 1.5
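The send itself needs real credentials and a live SMTP host, but the message construction can be sketched with the standard-library email package (the function names and addresses below are placeholders, not the project's code):

```python
import smtplib
from email.message import EmailMessage

def compose_email(sender, to, subject, body):
    """Build a MIME message the assistant can hand to smtplib."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_email(msg, host, password, port=587):
    """Connect over STARTTLS and send; requires an app password."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()              # encrypt before sending credentials
        server.login(msg["From"], password)
        server.send_message(msg)
```

Separating compose from send keeps the network-free part easy to test and reuse.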

Step 2:

This part of the implementation uses computer-vision algorithms to identify faces and smiles in a given dataset of images; it can also detect faces and smiles live from your camera. WordNet and spaCy are then used to determine the similarity score between two words that we give it.

Here, we have used the OpenCV library for face detection, with the haarcascade_faces and haarcascade_smile cascade files. First, the image is converted to grayscale.

image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
detected_faces = face_cascade.detectMultiScale(image=image, scaleFactor=1.3, minNeighbors=4)
for (x, y, width, height) in detected_faces:  # draw a green box around each detected face
    cv2.rectangle(original_image, (x, y), (x + width, y + height), (0, 255, 0), thickness=2)

That is because the algorithm works better on grayscale images than on colored ones. The code and output are illustrated in fig 2.1.

Fig 2.1

To prepare for part 3, we check for similarity between words. That way, the assistant can look for similar words in its dataset and reply with the best possible match.

import spacy

nlp = spacy.load('en_core_web_md')  # medium English model, includes word vectors
token1 = nlp(word1)
token2 = nlp(word2)
score = token1.similarity(token2)   # cosine similarity of the two word vectors

Here we have used spaCy, a library whose word vectors let us calculate similarity scores, as shown in fig 2.2.

Fig 2.2

Step 3:

For this stage, we have used WordNet with Wu-Palmer similarity, plus cosine similarity, to recommend synonyms and find the words closest to the given voice command with the best similarity score.

As a dataset, we have a chatbot.txt file. When a voice command is given to the assistant, it searches for keywords and compares them with the words in chatbot.txt. For the words with the highest similarity, it reads out the corresponding lines from our dataset.

For example, say we ask, “Tell me a joke.” Here “joke” is the keyword, so the algorithm searches for words similar to “joke” and, once it finds them, reads out the related lines, as shown in fig 3.1.

TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
tfidf = TfidfVec.fit_transform(sent_tokens)            # the last row is the user's query
vals = cosine_similarity(tfidf[-1], tfidf)             # similarity of the query vs. every sentence
flat = vals.flatten()
flat.sort()                                            # ascending; the top score is the query itself
req_tfidf = flat[-2]                                   # best match, skipping the query

Fig 3.1
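The core of that matching can be sketched without scikit-learn: represent each sentence as a bag of word counts and pick the corpus sentence with the highest cosine similarity to the query. This is a simplified stand-in for the TF-IDF pipeline above (no lemmatization or IDF weighting), with an illustrative toy corpus:

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Lower-case bag-of-words counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_response(query, corpus):
    """Return the corpus sentence most similar to the spoken query."""
    q = vectorize(query)
    return max(corpus, key=lambda s: cosine(q, vectorize(s)))
```

Swapping raw counts for TF-IDF weights, as the scikit-learn version does, down-weights common words like "the" so content words dominate the match.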

Along the way we have studied a variety of AI techniques that make everyday life easier. Voice assistants will surely endure and play a great part in the coming era of smart homes, smart cars, and smart hospital services.

GitHub Code:






