Sindhi Text Corpus using XML and Custom Tags

Zeeshan Bhatti; Majid Shah

doi:10.30537/sjcms.v2i2.215

Authors

Zeeshan Bhatti
Majid Shah, University of Sindh, Jamshoro

DOI:

https://doi.org/10.30537/sjcms.v2i2.215

Keywords:

Corpus; Sindhi; Sindhi Corpus; Natural Language Processing; XML

Abstract

Although Sindhi is one of the oldest languages in the world, its use in the digital age remains very limited because of a lack of digital content. A corpus is an essential resource for the natural language processing of any language and its script. This research work addresses the problem of building a corpus for the Sindhi language using XML-based tagging. A tree-based XML tag structure is designed for the Sindhi corpus, with two main nodes: metadata, and sindhi Document, which contains the main text.
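
As a rough illustration of the two-node tree structure described in the abstract, the following Python sketch builds one corpus entry with the standard library's xml.etree.ElementTree. It is not the authors' schema: the root name `corpusEntry`, the `title` and `author` fields under `metadata`, and the spelling `sindhiDocument` (XML names cannot contain spaces) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_corpus_entry(title, author, text):
    """Build one corpus entry with a metadata node and a document node."""
    # Hypothetical root name; the abstract names only the two child nodes.
    root = ET.Element("corpusEntry")

    # First main node: descriptive information about the text.
    # The 'title' and 'author' children are illustrative assumptions.
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "author").text = author

    # Second main node: the Sindhi text itself.
    # 'sindhiDocument' is an assumed spelling of "sindhi Document".
    document = ET.SubElement(root, "sindhiDocument")
    document.text = text
    return root


entry = build_corpus_entry("مثال", "Unknown", "سنڌي ٻولي")

# Serialize as UTF-8 so the Arabic-script Sindhi text is written directly.
xml_bytes = ET.tostring(entry, encoding="utf-8")

# Parse the bytes back to confirm the text round-trips unchanged.
parsed = ET.fromstring(xml_bytes)
print([child.tag for child in parsed])
print(parsed.find("sindhiDocument").text)
```

Keeping descriptive fields in one node and the running text in another would let a tool read or filter entries by metadata without touching the text body, which is one plausible reason for such a split.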

Published

2018-12-31

Issue

Vol. 2 No. 2 (2018): Sukkur IBA Journal of Computing and Mathematical Sciences

Section

Research Articles

License

SJCMS holds the rights to all published papers. Authors are required to transfer copyright to the journal to ensure that the paper is published solely in SJCMS; however, authors and readers can freely read, download, copy, distribute, print, search, or link to the full texts of its articles and use them for any other lawful purpose.

SJCMS is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
