Part-of-speech Tagging
Part-of-speech tagging (POS tagging) is the task of assigning each word in a text its part of speech. A part of speech is a category of words with similar grammatical properties. Common English parts of speech include noun, verb, adjective, adverb, pronoun, preposition, and conjunction.
Example:
| Vinken | , | 61 | years | old | 
|---|---|---|---|---|
| NNP | , | CD | NNS | JJ | 
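
The example above pairs each token with one tag (here, Penn Treebank-style tags such as `NNP` and `CD`). As a minimal sketch of what a tagger consumes and produces, the snippet below stores the sentence as `(token, tag)` pairs and builds a most-frequent-tag lookup baseline. The function names `train_most_frequent_tag` and `tag_tokens` and the fallback tag `NN` are illustrative assumptions, not part of any dataset or library listed here.

```python
from collections import Counter, defaultdict

# Tokens and tags copied from the example table above.
tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["NNP", ",", "CD", "NNS", "JJ"]

# A tagged sentence is a sequence of (token, tag) pairs.
tagged_sentence = list(zip(tokens, tags))


def train_most_frequent_tag(tagged_sentences):
    """Count tags per token and keep the most frequent tag for each token."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {
        token: tag_counts.most_common(1)[0][0]
        for token, tag_counts in counts.items()
    }


def tag_tokens(tokens, lexicon, default_tag="NN"):
    """Tag tokens by lookup; unseen tokens get default_tag (an assumption)."""
    return [(token, lexicon.get(token, default_tag)) for token in tokens]


lexicon = train_most_frequent_tag([tagged_sentence])
print(tag_tokens(["61", "years", "old", "Vinken"], lexicon))
```

A lookup baseline like this ignores context, so it cannot resolve words whose tag depends on the surrounding sentence; it is only meant to make the input and output shapes of the task concrete.
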

| Title | Description |
|---|---|
| Uppsala Persian Corpus: UPC [website] [download] | Uppsala Persian Corpus (UPC) is a large, freely available Persian corpus. It is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization. It contains 2,704,028 tokens and is annotated with 31 part-of-speech tags. The part-of-speech tags are listed with explanations in this table. |
| Large-Scale Colloquial Persian [website] | Large-Scale Colloquial Persian Dataset (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with dependency relations (syntactic annotation), part-of-speech tags, sentiment polarity, and automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI). |