Machine-Learning-approach-to-Bengali-POS-Tagging-using-BNLP

Machine Learning approach to Bengali Parts of Speech Tagging

About the Project:

This project was carried out as part of the Minor Project submission at Heritage Institute of Technology, under the mentorship of Prof. Sandipan Ganguly (HIT-K).

Introduction to BNLP (Bengali Natural Language Processing) Toolkit:

BNLP is a library with pre-trained models for POS tagging, word embedding, Named Entity Recognition, FastText, Bengali stop words, Bengali Corpus Class recognition, and more.

Installation

pip install bnlp_toolkit

Or upgrade an existing installation:

pip install -U bnlp_toolkit

Methodology:

Raw Text -> Tokenization -> POS Tagging
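
The three stages can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's actual code: the `tokenize`, `pos_tag`, and `run_pipeline` functions, the `TOY_LEXICON` table, and the tag names are all hypothetical, and the dictionary lookup stands in for BNLP's pre-trained tagger.

```python
import re

# Hypothetical toy lexicon used only for illustration.
# In the real project this lookup is replaced by BNLP's pre-trained POS model.
TOY_LEXICON = {
    "আমি": "PRON",
    "ভাত": "NOUN",
    "খাই": "VERB",
    "।": "PUNCT",
}

# Punctuation split off as separate tokens, including the Bengali danda.
_TOKEN_PATTERN = re.compile(r"[^\s।,!?;:]+|[।,!?;:]")


def tokenize(raw_text):
    """Stage 2: split raw text into word and punctuation tokens."""
    return _TOKEN_PATTERN.findall(raw_text)


def pos_tag(tokens, lexicon=TOY_LEXICON, default_tag="UNK"):
    """Stage 3: attach a tag to each token; unknown tokens get default_tag."""
    return [(token, lexicon.get(token, default_tag)) for token in tokens]


def run_pipeline(raw_text):
    """Raw Text -> Tokenization -> POS Tagging."""
    return pos_tag(tokenize(raw_text))


if __name__ == "__main__":
    for token, tag in run_pipeline("আমি ভাত খাই।"):
        print(token, tag)
```

Keeping tokenization and tagging as separate functions mirrors the methodology above and makes it easy to swap the toy lookup for a trained model.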


[Figure: pie chart]

[Figure: pie chart of the evaluated BNLP results]


Confusion Matrix:

The tagger also produced false positives, so we computed confusion matrices and derived Precision, Recall, and F1 values from them.

We used a dataset from NLTR and obtained 90% accuracy.
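
As a minimal sketch of how such metrics are usually derived, the snippet below counts (gold, predicted) tag pairs and computes per-tag precision, recall, and F1, plus overall accuracy. The function names and the example tag sequences are hypothetical and are not taken from the project's notebook or dataset.

```python
from collections import Counter


def confusion_counts(gold_tags, predicted_tags):
    """Count how often each (gold, predicted) tag pair occurs."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have equal length")
    return Counter(zip(gold_tags, predicted_tags))


def per_tag_scores(gold_tags, predicted_tags):
    """Return precision, recall, and F1 for every tag seen in either sequence."""
    counts = confusion_counts(gold_tags, predicted_tags)
    scores = {}
    for tag in sorted(set(gold_tags) | set(predicted_tags)):
        tp = counts[(tag, tag)]
        # False positives: predicted as `tag` but the gold tag differs.
        fp = sum(n for (g, p), n in counts.items() if p == tag and g != tag)
        # False negatives: gold tag is `tag` but something else was predicted.
        fn = sum(n for (g, p), n in counts.items() if g == tag and p != tag)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[tag] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    counts = confusion_counts(gold_tags, predicted_tags)
    total = sum(counts.values())
    correct = sum(n for (g, p), n in counts.items() if g == p)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical example sequences, for illustration only.
    gold = ["NOUN", "VERB", "NOUN", "PRON", "NOUN"]
    pred = ["NOUN", "NOUN", "NOUN", "PRON", "VERB"]
    print(per_tag_scores(gold, pred))
    print(accuracy(gold, pred))
```

Reporting per-tag scores alongside accuracy is helpful because a single accuracy figure can hide tags that are frequently confused with one another.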

Tools:

  1. Jupyter Notebook/Google Colab
  2. BNLP library by Prof. Sagor Sarker (Bangladesh), available on GitHub.
  3. Research papers on Bengali POS tagging, used as references.

Mentor: Prof. Sandipan Ganguly (HIT-K).

Developers:

  1. Rajdeep Das (LinkedIn)
  2. Arghyadeep Banerjee
  3. Soham Chakraborty
  4. Tanmay Guchhait
  5. Debabrata Maity
  6. Alik Sarkar
  7. Sanju Manna

Read the publication on ResearchGate:

DOI: http://dx.doi.org/10.13140/RG.2.2.35358.41287/1

Subject: Project Technical Report (Publication no. 359257508)

References taken from:

  1. https://bnlp.readthedocs.io/en/latest/
  2. https://github.com/sagorbrur/bnlp
  3. https://www.researchgate.net/publication/348957805_BNLP_Natural_language_processing_toolkit_for_Bengali_language
  4. https://medium.com/analytics-vidhya/bengali-pos-part-of-speech-tagging-using-indian-corpus-e85f47d3ad65
  5. https://nltr.itewb.gov.in/

BNLP Developer Credit: Prof. Sagor Sarker (https://github.com/sagorbrur)

Thank you for visiting.

© Rajdeep Das