farsinlp.github.io

Datasets for Farsi (Persian) Natural Language Processing (NLP)

Title	Description
Persian Wikipedia word2vec [website] [download_cbow] [download_skipgram]	Word2vec models trained on the Persian Wikipedia corpus, in both CBOW and skip-gram variants.
Persian-Wikipedia-glove [website] [download]	GloVe model trained on the Persian Wikipedia corpus.
fastText [website] [download_bin] [download_vec]	Pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText. These models were trained using CBOW with position weights, in dimension 300, with character n-grams of length 5, a window of size 5, and 10 negative samples.
Word2vec model for Farsi literature [website] [download]	A word2vec model trained on Farsi poems by 48 poets.
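
Several of the resources above are distributed as plain-text vector files. The following is a minimal sketch of reading such a file, assuming the common text layout used by fastText `.vec` files: a header line with the vocabulary size and dimension, followed by one word and its space-separated components per line. The helper names `load_vec` and `cosine` and the three-word sample are illustrative assumptions, not part of any listed project, and the other models in the table may use a different format.

```python
import io
import math


def load_vec(stream, limit=None):
    """Parse word vectors from a text stream in the assumed .vec layout.

    Returns (dimension, {word: [float, ...]}). `limit` caps the number of
    rows read, which helps with very large files.
    """
    header = stream.readline().split()
    dim = int(header[1])  # header is "<vocab_size> <dimension>"
    vectors = {}
    for i, line in enumerate(stream):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Tiny in-memory stand-in for a real file (made-up 2-dimensional vectors).
SAMPLE = "3 2\nکتاب 1.0 0.0\nدفتر 0.8 0.6\nآب 0.0 1.0\n"

dim, vectors = load_vec(io.StringIO(SAMPLE))
similarity = cosine(vectors["کتاب"], vectors["دفتر"])
print(dim, len(vectors), round(similarity, 3))
```

For a downloaded and decompressed file, the same function can be given a handle opened with `encoding="utf-8"`; passing a `limit` keeps memory use manageable, since the full vocabularies are large.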