Sindhi Text Corpus using XML and Custom Tags

Authors

  • Zeeshan Bhatti
  • Majid Shah, University of Sindh, Jamshoro

DOI:

https://doi.org/10.30537/sjcms.v2i2.215

Keywords:

Corpus; Sindhi; Sindhi Corpus; Natural Language Processing; XML

Abstract

Sindhi, one of the oldest languages in the world, still has very limited use in the digital age because of a lack of digital content. A corpus is essential for facilitating natural language processing of any language and its script. This research work addresses the issue of building a corpus for the Sindhi language using XML-based tagging. A tree-based XML tag structure is designed for the Sindhi corpus, with two main nodes: metadata, and sindhi Document, which contains the main text.
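
As a rough illustration of the two-node tree described in the abstract, the following minimal Python sketch builds one corpus entry with the standard library's xml.etree.ElementTree. The abstract names only the two main nodes, so every concrete name here is an assumption made for illustration: the root tag corpusEntry, the spelling sindhiDocument, the metadata fields title and author, and the helper functions themselves are hypothetical, not the authors' actual schema.

```python
import xml.etree.ElementTree as ET


def build_corpus_entry(title, author, text):
    """Build one corpus entry with a metadata node and a document node.

    All tag names are illustrative assumptions, not the published schema.
    """
    root = ET.Element("corpusEntry")            # hypothetical root tag

    # First main node: descriptive information about the text.
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "title").text = title    # hypothetical field
    ET.SubElement(metadata, "author").text = author  # hypothetical field

    # Second main node: the main Sindhi text itself.
    document = ET.SubElement(root, "sindhiDocument")
    document.text = text
    return root


def to_xml_string(root):
    """Serialize the tree to a Unicode string so Sindhi script stays readable."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    entry = build_corpus_entry("Sample title", "Sample author", "سنڌي")
    print(to_xml_string(entry))
```

Serializing with encoding="unicode" keeps the Sindhi characters as-is in the output string rather than converting them to numeric character references.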

Published

2018-12-31