What is tokenization in nltk

This recipe explains what is tokenization in nltk

Recipe Objective

What is Tokenization? Tokenization is the task of chopping the text into smaller peices which are called tokens, here the tokens can be either words, characters or subwords. There are different tokenizers with different functionality lets understand them one by one.

List of Classification Algorithms in Machine Learning

Step 1 - Sentence Tokenization, Import the sent_tokenize

from nltk.tokenize import sent_tokenize

These tokenizer Splitt the Sentences into Paragraphs.

Step 2 - Take a simple text and apply sentence tokenization on that

My_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!" sent_tokenize(My_text)

['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !', '!']

Here we can see that, the sentence has been converted into a paragraph.

Step 3 - Word Tokenization, Import the word_tokenize

from nltk.tokenize import word_tokenize

Step 4 - Apply word tokenization on simple text

word_tokenize(My_text)

['Hello',
 'everyone',
 ',',
 'Welcome',
 'to',
 'the',
 'session',
 '.',
 'Now',
 'your',
 'going',
 'to',
 'study',
 'about',
 'tokenization',
 '!',
 '!']

From the above we can see that the sentence has been converted into words

What Users are saying..

profile image

Jingwei Li

Graduate Research assistance at Stony Brook University
linkedin profile url

ProjectPro is an awesome platform that helps me learn much hands-on industrial experience with a step-by-step walkthrough of projects. There are two primary paths to learn: Data Science and Big Data.... Read More

Relevant Projects

Build Real Estate Price Prediction Model with NLP and FastAPI
In this Real Estate Price Prediction Project, you will learn to build a real estate price prediction machine learning model and deploy it on Heroku using FastAPI Framework.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Linear Regression Model Project in Python for Beginners Part 1
Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Build Regression (Linear,Ridge,Lasso) Models in NumPy Python
In this machine learning regression project, you will learn to build NumPy Regression Models (Linear Regression, Ridge Regression, Lasso Regression) from Scratch.

Recommender System Machine Learning Project for Beginners-3
Content Based Recommender System Project - Building a Content-Based Product Recommender App with Streamlit

Build a Graph Based Recommendation System in Python-Part 2
In this Graph Based Recommender System Project, you will build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search.

PyTorch Project to Build a GAN Model on MNIST Dataset
In this deep learning project, you will learn how to build a GAN Model on MNIST Dataset for generating new images of handwritten digits.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Build Deep Autoencoders Model for Anomaly Detection in Python
In this deep learning project , you will build and deploy a deep autoencoders model using Flask.