Named Entity Recognition Using Word-Embedding Techniques for ArabicWeb16: An Empirical Study
Sharefah Al-Ghamdi; Mashael Al-Duwais; Hend Al-Khalifa; and Abdulmalik Al-Salman. 2018
The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools introduces the ArabicWeb16 Data Challenge track. The challenge is about experimenting with the ArabicWeb16 dataset, the largest publicly available Arabic Web dataset, with about 150M Arabic Web pages. In this paper, we explore the ArabicWeb16 dataset and use it to build word-embedding models for the Named Entity Recognition (NER) task. Word-embedding models are a powerful building block for many Natural Language Processing (NLP) tasks, including NER. We tried two word-embedding models: the Google Word2Vec model and the Stanford GloVe model. The two models were used to retrieve words similar to each named entity type. The ArabicWeb16 dataset was somewhat hard to pre-process; however, the final results were promising.
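The idea of using an embedding model to find similar words for each named entity type can be illustrated with a minimal sketch. It is not the authors' pipeline: the vectors below are hand-made three-dimensional toys rather than Word2Vec or GloVe vectors trained on ArabicWeb16, and the names `TOY_VECTORS`, `most_similar`, and `expand_entity_type` are hypothetical. The sketch assumes that similarity is measured by cosine similarity, which is the usual choice for both models.

```python
import math

# Hypothetical toy embeddings (NOT trained on ArabicWeb16); a real model
# would supply vectors with hundreds of dimensions.
TOY_VECTORS = {
    "الرياض": [0.90, 0.10, 0.00],   # Riyadh (location)
    "جدة": [0.80, 0.20, 0.10],      # Jeddah (location)
    "القاهرة": [0.85, 0.15, 0.05],  # Cairo (location)
    "محمد": [0.10, 0.90, 0.00],     # Muhammad (person)
    "فاطمة": [0.15, 0.85, 0.10],    # Fatima (person)
    "كتاب": [0.00, 0.10, 0.90],     # book (not an entity)
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(seed, vectors, topn=3):
    """Return the topn (word, score) pairs closest to the seed word."""
    seed_vec = vectors[seed]
    scored = [(w, cosine(seed_vec, v)) for w, v in vectors.items() if w != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def expand_entity_type(seeds, vectors, topn=2):
    """Collect neighbours of every seed word as candidates for the same type."""
    best = {}
    for seed in seeds:
        for word, score in most_similar(seed, vectors, topn):
            if word not in seeds:
                best[word] = max(score, best.get(word, 0.0))
    # Candidates ordered from most to least similar to any seed.
    return sorted(best, key=best.get, reverse=True)


if __name__ == "__main__":
    for word, score in most_similar("الرياض", TOY_VECTORS, topn=2):
        print(word, round(score, 3))
    print(expand_entity_type(["محمد"], TOY_VECTORS, topn=1))
```

With trained vectors, the same lookup would start from a few seed words per entity type (person, location, organization) and inspect their nearest neighbours, so the quality of the candidates depends on how well the corpus was pre-processed.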