Datasets for Farsi (Persian) Natural Language Processing (NLP)

Question Answering

Question answering (QA) is the task of answering a question posed in natural language. The datasets listed here are typically extractive: the answer is a span of text taken from the passage that accompanies the question.

Title Description
Persian Question Answering (PersianQA) Dataset is a reading comprehension dataset built on Persian Wikipedia. The crowd-sourced dataset contains more than 9,000 entries. Each entry is either an unanswerable question or a question with one or more answers, each of which is a span of the passage (the context) from which the questioner wrote the question. As in SQuAD2.0, the unanswerable questions can be used to build a system that “knows that it doesn’t know the answer”.
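
To make the entry structure concrete, here is a minimal Python sketch of how answerable and unanswerable entries could be handled. It assumes a SQuAD2.0-style layout; the field names (`question`, `context`, `answers`, `answer_start`, `is_impossible`) and the helper `extract_answer` are illustrative assumptions, not confirmed details of the PersianQA release, and the sample entries are invented English text used only for readability.

```python
from typing import Optional

# Hypothetical entries in an assumed SQuAD2.0-style layout.
# Neither the field names nor the texts come from the actual dataset.
entries = [
    {
        "question": "What is the capital of Iran?",
        "context": "Tehran is the capital of Iran.",
        "answers": [{"text": "Tehran", "answer_start": 0}],
        "is_impossible": False,
    },
    {
        "question": "What is the population of Tehran?",
        "context": "Tehran is the capital of Iran.",
        "answers": [],
        "is_impossible": True,
    },
]


def extract_answer(entry: dict) -> Optional[str]:
    """Return the first answer span, or None if the question is unanswerable."""
    if entry["is_impossible"] or not entry["answers"]:
        return None
    answer = entry["answers"][0]
    start = answer["answer_start"]        # character offset into the context
    end = start + len(answer["text"])
    span = entry["context"][start:end]
    # In extractive QA the stored answer must be recoverable from the context.
    if span != answer["text"]:
        raise ValueError("answer_start does not point at the answer text")
    return span


results = [extract_answer(e) for e in entries]
```

Returning `None` for unanswerable entries keeps the "no answer" case explicit, which is the behaviour a system trained on such data is meant to learn.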