farsinlp.github.io

Datasets for Farsi (Persian) Natural Language Processing (NLP)

Sentiment Analysis

Sentiment analysis is the task of classifying the polarity of a given text.

Title Description
MirasOpinion
[website]
MirasOpinion was crawled from Digikala, one of the largest e-commerce websites in Iran. A total of 2.5 million comments were crawled and, after pre-processing, reduced to one million. The corpus was then labeled through crowd-sourcing: a Telegram bot sent the unlabeled comments to several users and asked them to label each one as positive, negative, or neutral.
PerSent
[website]
[download]
This dataset provides real-valued polarity labels, ranging from -1 to 1, for thousands of Persian words and expressions.
LexiPers
[website]
[download]
An ontology-based sentiment lexicon for Persian.
LSCP
[website]
[download]
Enhanced Large Scale Colloquial Persian Language Understanding: a dataset of 27M casual Persian (Farsi) tweets with derivation trees, part-of-speech tags, sentiment polarity, and parallel sentences in English, German, Czech, Italian, and Hindi.