Named Entity Recognition of an Oversampled and Preprocessed Manufacturing Data Corpus

Nurul Hannah Mohd Yusof; Nurul Adilla Mohd Subha; Nurulaqilla Khamis; Norikhwan Hamzah

doi:10.37934/araset.36.1.203216

Authors

Nurul Hannah Mohd Yusof, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Nurul Adilla Mohd Subha, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Nurulaqilla Khamis, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Norikhwan Hamzah, Faculty of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

DOI:

https://doi.org/10.37934/araset.36.1.203216

Keywords:

Named Entity Recognition, Hidden Markov Model, Factory Reports

Abstract

In today's manufacturing industry, improving the manufacturing process is of paramount importance. One area with great potential for improvement is the use of maintenance data. By leveraging this data effectively, manufacturers can optimize maintenance schedules, leading to increased efficiency, reduced costs, and minimized downtime. The challenge lies in handling vast amounts of maintenance data that often come in various formats, which makes it difficult to extract valuable insights. Without proper analysis, this unprocessed data can result in unforeseen issues, costly disruptions, and extended downtime. To overcome this obstacle, modern manufacturing companies are turning to advanced technologies such as language modelling, text classification, machine translation, and Named Entity Recognition (NER). To the best of our knowledge, no investigation has assessed the impact of text preprocessing on NER performance. Improving the initial stages of NER, such as text preprocessing, can enhance NER performance and, in turn, the efficiency of the trained model. In this study, a Hidden Markov Model (HMM) is employed for NER, and oversampling and text preprocessing techniques are used to improve its performance. The study is performed without IOB labelling and considers seven specific entities. The text preprocessing tasks include tokenization, lemmatization, punctuation removal, stop word removal, and elimination of long and short words. As a result, the HMM for NER with oversampling and with preprocessed text outperformed the model without either technique by 20.10% and 27.59%, respectively, because the significant classes and words among the entity classes in the preprocessed factory reports were taken into account. This finding highlights the importance of selecting text preprocessing methods in NER and its potential to help optimize maintenance schedules and reduce downtime.
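The preprocessing steps and the oversampling idea named in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the stop-word list, the tiny lemma table, the length thresholds, the entity labels (`EQUIPMENT`, `FAULT`), and the choice of simple random oversampling at the token level are all assumptions, since the abstract does not specify them.

```python
import random
import re

# Hypothetical stop-word list and lemma table, kept tiny for illustration.
# A real pipeline would rely on a full lexicon or an NLP library.
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "of", "to", "in"}
LEMMAS = {"bearings": "bearing", "replaced": "replace", "motors": "motor"}


def preprocess(text, min_len=2, max_len=15):
    """Apply the preprocessing tasks listed in the abstract to one report.

    min_len and max_len are assumed thresholds for "short" and "long" words.
    """
    # Tokenization and punctuation removal: keep lowercase alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Lemmatization via the lookup table (unknown words pass through unchanged).
    tokens = [LEMMAS.get(tok, tok) for tok in tokens]
    # Stop word removal.
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    # Elimination of short and long words.
    return [tok for tok in tokens if min_len <= len(tok) <= max_len]


def oversample(samples, seed=0):
    """Random oversampling of (token, label) pairs.

    Minority classes are topped up with randomly repeated samples until
    every class is as large as the largest one.
    """
    rng = random.Random(seed)
    by_label = {}
    for token, label in samples:
        by_label.setdefault(label, []).append((token, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced
```

For example, `preprocess("The bearings of motor M1 were replaced.")` yields `["bearing", "motor", "m1", "replace"]`. Note that duplicating individual tokens changes the sequence structure an HMM learns from, so oversampling whole labeled reports may be a more suitable variant in practice.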

Author Biographies

Nurul Hannah Mohd Yusof, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

nhannahmyusof@gmail.com

Nurul Adilla Mohd Subha, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

nuruladilla@utm.my

Nurulaqilla Khamis, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

nurulaqilla@utm.my

Norikhwan Hamzah, Faculty of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

norikhwan@utm.my
