Emoji Modelling - Building an Emoji Autocomplete through Deep Neural Nets

Large Language Models have taken the world by storm, and it’s important to understand how they work internally. In this edition I will focus on embeddings and how language models use word embeddings to represent words and capture the similarities (or relationships) between them.

LLMs, like other NLP models, use a technique called word embeddings (or more generally, token embeddings). These embeddings are dense vector representations of words (or sub-word units called tokens). The key idea is that words with similar meanings or that appear in similar contexts are located closer to each other in this high-dimensional vector space.
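As a toy illustration (separate from the model we build below), here is roughly what a token embedding table looks like in PyTorch. The vocabulary size, embedding dimension, and token ids are made-up values, and the similarity is only meaningful once the table has been trained.

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary of 10,000 tokens, each mapped to a 64-dimensional vector.
embedding = torch.nn.Embedding(num_embeddings=10_000, embedding_dim=64)

# Suppose a tokenizer assigned "happy" the id 42 and "joyful" the id 108.
happy = embedding(torch.tensor(42))
joyful = embedding(torch.tensor(108))

# After training, tokens that appear in similar contexts end up with a higher
# cosine similarity; at initialization this value is essentially random.
print(F.cosine_similarity(happy, joyful, dim=0).item())
```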

Keeping this idea in mind, I will apply the same concept, not to textual data but to emojis. Yes, emojis! 🙂

We use emojis heavily while chatting, and if we look at texts or tweets that contain emojis we can see clear patterns. A text with a happy or positive sentiment is often followed by a smiley or a heart emoji, while a text or tweet with a negative or outraged sentiment tends to carry a sad or angry face emoji.

We will use this idea to predict or autocomplete the next set of emojis given an initial emoji.




Setting up the Problem Statement

Let’s set up a problem statement for our neural network to work on so that we can build our autocomplete engine. A neural network learns through backpropagation: we feed it input data along with the desired outcome, and at every iteration we measure how far the network’s prediction is from that outcome. Backpropagation then uses this loss value at every step to update the weights, making the network’s predictions more accurate and general.

Now let’s frame the problem for our neural network: given a set of emojis as input, it should predict the emojis that come next. For this, the network needs to be trained on a dataset consisting of series of emojis.

We pick this data from a public dataset of tweets. When we extract emojis from tweets, their context is preserved, something we cannot achieve by picking emojis from a standard tabular dataset. By "context is preserved" I mean that, in tweets, positive emojis follow positive emojis; it is very unlikely that we will see positive emojis following negative ones (that wouldn’t make sense).



This way we preserve the order of the emojis so that the sequences make sense, and a neural network trained on them can predict meaningful emojis. Following up on that, I picked a tweet dataset containing emojis and extracted all the emojis out of it while preserving their order, roughly as sketched below.
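A rough sketch of that extraction step is shown here. The regex only covers the most common emoji Unicode blocks (it ignores skin-tone modifiers and ZWJ sequences), and the file name `tweets.txt` is a placeholder; the original pipeline may differ.

```python
import re

# Simplified pattern covering the most common emoji Unicode blocks
# (symbols & pictographs, emoticons, transport, supplemental symbols, misc symbols).
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001F5FF"
    "\U0001F600-\U0001F64F"
    "\U0001F680-\U0001F6FF"
    "\U0001F900-\U0001F9FF"
    "\u2600-\u27BF]+"
)

def extract_emojis(tweet: str) -> list[str]:
    """Return the emojis of a tweet, one per element, in their original order."""
    return [ch for chunk in EMOJI_PATTERN.findall(tweet) for ch in chunk]

# Hypothetical usage over a file with one tweet per line.
with open("tweets.txt", encoding="utf-8") as f:
    emoji_sequences = [extract_emojis(line) for line in f]
emoji_sequences = [seq for seq in emoji_sequences if seq]  # drop tweets without emojis
```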

Below are some sample tweets from the dataset used to train the neural net.

[Image: sample tweets containing emojis from the dataset]

From the content above we can see that, by preserving the order of the emojis, we get multiple groups of emojis clubbed together that express a sentiment: happy, sad, enraged, angry, fearful, loving, and many more.


Once the series of emojis are extracted from the dataset, we turn them into 3:1 pairs: 3 consecutive emojis form the input and the 4th one becomes the target. In this way we slide over the entire series of emojis to build the training dataset for our network, as sketched below.
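Here is a minimal sketch of that windowing step, assuming the `emoji_sequences` list from the extraction sketch above and a context size of 3.

```python
import torch

CONTEXT_SIZE = 3

# Build the emoji vocabulary (the article's dataset ends up with 839 distinct emojis).
vocab = sorted({e for seq in emoji_sequences for e in seq})
stoi = {e: i for i, e in enumerate(vocab)}
itos = {i: e for e, i in stoi.items()}

# Slide a window of 3 emojis over every sequence; the 4th emoji is the target.
xs, ys = [], []
for seq in emoji_sequences:
    ids = [stoi[e] for e in seq]
    for i in range(len(ids) - CONTEXT_SIZE):
        xs.append(ids[i : i + CONTEXT_SIZE])
        ys.append(ids[i + CONTEXT_SIZE])

X = torch.tensor(xs)  # shape: (num_examples, 3)
Y = torch.tensor(ys)  # shape: (num_examples,)
```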


After processing the tweets, extracting the emojis, and building the input dataset for our neural network, we end up with approximately 90K sets of emojis with their targets.


Neural Network Architecture

With our dataset in place, it’s time to build our neural network architecture. I will be using a fairly simple architecture, shown below.

Sequential(
  (0): Linear(in_features=30, out_features=100, bias=True)
  (1): Tanh()
  (2): Linear(in_features=100, out_features=50, bias=True)
  (3): Tanh()
  (4): Linear(in_features=50, out_features=839, bias=True)
)



I have used a 3-layer neural network with a Tanh activation after each of the first two layers. The last layer returns logits, which give the probability of each emoji being the next one in the sequence. By comparing the predicted probability of the target emoji against the truth, we can train the network to improve through backprop.

We have also used an emoji embedding: a table that represents every emoji in our dataset by a 10-dimensional vector. The three input emojis are looked up in this table and concatenated, which is where the 30 input features in the first layer come from. Training the network also trains these embedding vectors so that the network can make meaningful predictions.
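Putting the embedding table and the MLP together, a sketch of the full model could look like this. The layer sizes match the printout above; the variable names and the way the lookup is wired into the forward pass are my own.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 839   # number of distinct emojis
EMB_DIM = 10       # each emoji is a 10-dimensional vector
CONTEXT_SIZE = 3   # 3 input emojis -> 3 * 10 = 30 input features

emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
mlp = nn.Sequential(
    nn.Linear(CONTEXT_SIZE * EMB_DIM, 100),
    nn.Tanh(),
    nn.Linear(100, 50),
    nn.Tanh(),
    nn.Linear(50, VOCAB_SIZE),   # logits over all emojis
)

def forward(x):
    """x: (batch, 3) tensor of emoji indices -> (batch, 839) logits."""
    e = emb(x)                          # (batch, 3, 10)
    return mlp(e.view(x.shape[0], -1))  # flatten to (batch, 30), then the MLP
```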


Training the Neural Network

We trained the neural net defined above on the emoji dataset, with batches of 1024 emoji sets picked at random, for about 50K iterations, and reached a loss of approximately 2.5.
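A sketch of that training loop, assuming the `X`, `Y` tensors and the `emb`, `mlp`, `forward` pieces from the sketches above; the optimiser and learning rate are assumptions, since the article does not state them.

```python
import torch
import torch.nn.functional as F

# Assumed optimiser and learning rate; the original training setup is not specified.
optimizer = torch.optim.Adam(list(emb.parameters()) + list(mlp.parameters()), lr=1e-3)

for step in range(50_000):
    # Pick a random batch of 1024 (context, target) pairs.
    idx = torch.randint(0, X.shape[0], (1024,))
    logits = forward(X[idx])                # (1024, 839)
    loss = F.cross_entropy(logits, Y[idx])  # compare against the true next emoji

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 5_000 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```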



Inference

Now it’s time to use the trained network to predict the next set of emojis from a starter emoji. Let’s see how well our network does.
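Here is a sketch of the inference loop: the starter emoji is repeated to fill the 3-emoji context (that padding strategy is my assumption), and the next emoji is sampled from the predicted distribution one step at a time.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def autocomplete(start_emoji: str, length: int = 5) -> str:
    # Pad the context with the starter emoji so the first input has 3 entries (an assumption).
    context = [stoi[start_emoji]] * CONTEXT_SIZE
    out = [start_emoji]
    for _ in range(length):
        logits = forward(torch.tensor([context]))      # (1, 839)
        probs = F.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1).item()
        out.append(itos[nxt])
        context = context[1:] + [nxt]                  # slide the 3-emoji window
    return "".join(out)

print(autocomplete("😀"))
```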

These were some predictions made by our network.

[Image: sample predictions made by the network]

You can see our model predicting some decent, related emojis!

A more sophisticated model trained on a denser, cleaner dataset could yield an even better emoji autocomplete.


Conclusion

You can check out my YouTube channel for more technical content like this.

Meanwhile, you can Like and Share this edition with your peers, and Subscribe to this newsletter to get notified when I publish more content in the future.

Until next time, Dive Deep and Keep Learning!


