Offline on-Device ML – Text Classification

Machine learning has proven to be a clear improvement over simple rule-based systems. However, it comes with its own complexities, such as training the model, model size, and computation cost. As a result, it is challenging to use machine learning in mobile applications, where users expect a quick response.

With the release of TensorFlow Lite by Google, it is now possible to ship a deep learning model and run it directly on the device using Firebase ML Kit.


Before delving deeper into this, let’s first understand the key advantages of having an ML model on the device:

  • No server communication and hence reduced hosting cost
  • Offline support – Will work without Internet
  • Speed – Inference is faster because everything runs locally, with no network round trip
  • Privacy – Data stays on the user’s device

We will use Python to train a model and convert it to the TFLite format. Below is an overview of the topics we will cover:

  • Data preparation and preprocessing
  • Building a word tokenizer
  • Building a text classifier model in Keras with bag-of-words features
  • Converting the Keras model (.h5) to the TFLite format
  • Creating an Android application to run inference on the offline model

Data Preparation

We first need a dataset for text classification. For simplicity, we can use the SNIPS intent classification dataset, where each sentence is labelled with an intent class.

You can download the dataset from here. 

Building a Word Tokenizer

Since machine learning models work only on numbers, we first need to transform sentences into a fixed-size numeric representation. For this, we will create a word_index dictionary that maps each word to a unique integer id.

Here we will read the unique words from a list of sentences and assign each one a unique index. This mapping is then used to convert sentences into lists of numbers.
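A minimal sketch of such a tokenizer, assuming sentences are lowercased and stripped of punctuation before being split on whitespace. The function names (`clean_text`, `build_word_index`, `sentence_to_ids`, `save_word_index`) and the 0-based ids are illustrative choices, not taken from the original code.

```python
import json
import re


def clean_text(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def build_word_index(sentences):
    """Map each unique word to an integer id, in order of first appearance."""
    word_index = {}
    for sentence in sentences:
        for word in clean_text(sentence).split():
            if word not in word_index:
                word_index[word] = len(word_index)
    return word_index


def sentence_to_ids(sentence, word_index):
    """Convert a sentence to a list of word ids, skipping unknown words."""
    words = clean_text(sentence).split()
    return [word_index[w] for w in words if w in word_index]


def save_word_index(word_index, path):
    """Write the mapping to JSON so the mobile app can load the same file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_index, f)


if __name__ == "__main__":
    # Made-up example sentences, only to show the calls.
    sentences = ["Play some jazz music", "Book a table for two"]
    word_index = build_word_index(sentences)
    print(word_index)
    print(sentence_to_ids("play a table", word_index))
```

Words that never appeared in the training sentences are simply dropped here; a real pipeline might reserve an id for unknown words instead.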

Building a Text Classifier Model

We will build the text classifier as a DNN (a feed-forward neural network) that takes bag-of-words vectors as its input feature.
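As a rough sketch of what such a model could look like, assuming TensorFlow 2.x with its bundled Keras and a word_index whose ids run from 0 to vocabulary size minus 1. The layer sizes, dropout rate, epoch count, and the helper names `to_bag_of_words`, `build_model`, and `train` are assumptions for illustration, not the original article's settings.

```python
def to_bag_of_words(sentence, word_index):
    """Return a 0/1 vector with a 1 at the index of every known word.

    Assumes the sentence has already been cleaned of punctuation.
    """
    vector = [0.0] * len(word_index)
    for word in sentence.lower().split():
        if word in word_index:
            vector[word_index[word]] = 1.0
    return vector


def build_model(vocab_size, num_classes):
    """Small feed-forward classifier over bag-of-words vectors."""
    # Imported inside the function so the helper above works without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(vocab_size,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integer class ids
        metrics=["accuracy"],
    )
    return model


def train(sentences, labels, word_index, num_classes, epochs=20):
    """Vectorize the sentences, fit the model, and save it as model.h5."""
    import numpy as np

    x = np.array([to_bag_of_words(s, word_index) for s in sentences], dtype="float32")
    y = np.array(labels, dtype="int32")
    model = build_model(len(word_index), num_classes)
    model.fit(x, y, epochs=epochs, batch_size=32, verbose=0)
    model.save("model.h5")
    return model
```

A bag-of-words input ignores word order, which keeps the model small and the on-device preprocessing trivial.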

To test your model, call the prediction method with the model path and the word_index path.
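One possible shape for such a test method, assuming the model was saved with Keras as an .h5 file and word_index was stored as JSON with 0-based ids. `predict_intent` and `argmax` are hypothetical names introduced here.

```python
import json


def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_intent(text, model_path, word_index_path):
    """Load the saved model and vocabulary, then return (label, confidence)."""
    # Heavy imports are kept local so argmax can be used on its own.
    import numpy as np
    import tensorflow as tf

    with open(word_index_path, encoding="utf-8") as f:
        word_index = json.load(f)

    # Bag-of-words vector of shape (1, vocab_size) for a single sentence.
    vector = np.zeros((1, len(word_index)), dtype="float32")
    for word in text.lower().split():
        if word in word_index:
            vector[0, word_index[word]] = 1.0

    model = tf.keras.models.load_model(model_path)
    scores = model.predict(vector, verbose=0)[0].tolist()
    label = argmax(scores)
    return label, scores[label]
```

A call might look like `predict_intent("play some music", "model.h5", "word_index.json")`, where both paths are placeholders for your own files.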

Converting the Keras Model (.h5) to TFLite Format

We need to convert the above model file to the TFLite format, which we will then ship to ML Kit and the Android device.
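A hedged sketch of the conversion, assuming TensorFlow 2.x, where the converter is created with `tf.lite.TFLiteConverter.from_keras_model`. On TensorFlow 1.x the equivalent entry point was `tf.lite.TFLiteConverter.from_keras_model_file`. The function name and file paths are placeholders.

```python
def convert_h5_to_tflite(h5_path, tflite_path):
    """Convert a saved Keras .h5 model to a .tflite file; return its size in bytes."""
    import tensorflow as tf

    model = tf.keras.models.load_model(h5_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()  # serialized model as bytes

    with open(tflite_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

Calling `convert_h5_to_tflite("model.h5", "model.tflite")` would produce the file that is later added to the Android assets or uploaded to Firebase.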

Creating the Device Application 

Given below is the basic flow of how the ML model works on the device.

Let’s now discuss step-by-step the process we will be following to run inference.

Starting your project

  1. Add word_index.json and model.tflite to the assets folder of your Android project.
  2. Add the dependencies for the ML Kit Android libraries to your module (app-level) Gradle file (usually app/build.gradle):

Also, in the same app-level build.gradle, add an aaptOptions block with noCompress "tflite" inside the android section, so that .tflite files are not compressed when the app is packaged.

Hosting Models on Firebase

Follow the steps below to host your model.tflite file in the ML Kit console.

  1. In the ML Kit section of the Firebase console, click the Custom tab.
  2. Click Add custom model (or Add another model).
  3. Specify a name that will be used to identify your model in your Firebase project, then upload the TensorFlow Lite model file (usually ending in .tflite or .lite).
  4. In your app’s manifest, declare that the INTERNET permission is required by adding <uses-permission android:name="android.permission.INTERNET" />.

Defining the Constant Values Used for the Model

Creating Model Input for Given Text

This method returns a list of integers in the shape expected by the model. Here are the steps involved:

  1. Read word_index file from assets.
  2. Clean the text by removing punctuation, extra spaces, etc.
  3. Create a list of zeros with the same size as the model input shape.
  4. Split the text into words. For each word that is present in word_index, look up its index and set the value at that index in the list of zeros to 1.

Code for the above implementation is given below:
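As a reference for the four steps, here is how they might look in Python; the Android code would mirror the same logic. This sketch assumes word_index ids run from 0 to vocabulary size minus 1, that the model takes a single row of shape [1, vocabulary size], and it uses floats rather than integers because a Keras Dense model normally expects float32 input. All function names are hypothetical.

```python
import json
import re


def load_word_index(path):
    """Step 1: read the word -> id mapping (on Android, from assets)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def clean_text(text):
    """Step 2: lowercase, drop punctuation, collapse extra spaces."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def create_model_input(text, word_index):
    """Steps 3 and 4: build a [1][vocab_size] nested list of 0.0/1.0 values."""
    vector = [0.0] * len(word_index)  # step 3: zeros of the input size
    for word in clean_text(text).split():  # step 4: mark known words
        index = word_index.get(word)
        if index is not None:
            vector[index] = 1.0
    return [vector]  # outer list is the batch dimension
```

Comparing this output with what the app produces for the same sentence is a quick way to confirm that training and on-device preprocessing agree.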

Run Classification 

Call the run-inference method with the model input created above. It returns the label (int) with the maximum confidence score.
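Before wiring this up in the app, the same step can be tried on a desktop with the TensorFlow Lite Python interpreter. This is a sketch under assumptions: TensorFlow 2.x is installed, the model has one float32 input of shape [1, vocabulary size] and one output row of class scores, and `run_tflite_inference` is a name made up for this example.

```python
def run_tflite_inference(tflite_path, model_input):
    """Run one bag-of-words row through a .tflite model; return (label, confidence)."""
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # model_input is a nested list of shape [1][vocab_size].
    data = np.array(model_input, dtype=np.float32)
    interpreter.set_tensor(input_details[0]["index"], data)
    interpreter.invoke()

    scores = interpreter.get_tensor(output_details[0]["index"])[0]
    label = int(np.argmax(scores))  # class with the highest score
    return label, float(scores[label])
```

If the label returned here matches the Keras prediction for the same sentence, the conversion step most likely preserved the model's behaviour.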

 

I hope this helps you get started with on-device ML. Please try it out and let us know if you have any feedback. We will share more details in a follow-up blog post.


Written by:

Data Scientist having 3+ years of experience. Solved the data problems in domains like Finance, Social Media, NLP, etc. I strongly believe in "Your I-can attitude is more important than your IQ".