farsinlp.github.io

Datasets for Farsi (Persian) Natural Language Processing (NLP)

Pre-trained Embeddings

| Title | Description |
| --- | --- |
| Persian Wikipedia word2vec [website] [download_cbow] [download_skipgram] | Word2vec models (CBOW and Skipgram) trained on the Persian Wikipedia corpus. |
| Persian-Wikipedia-glove [website] [download] | GloVe model trained on the Persian Wikipedia corpus. |
| Fasttext [website] [download_bin] [download_vec] | Pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia with fastText. The models use CBOW with position-weights, dimension 300, character n-grams of length 5, a window of size 5, and 10 negatives. A loading sketch follows the table. |
| Word2vec model for Farsi literature [website] [download] | Word2vec model trained on the Farsi poems of 48 poets. |
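
The embeddings above come in a few different formats, and all of them can be read with the gensim library. The snippet below is a minimal sketch, not part of any of the linked projects: the file names are placeholders (substitute whatever the download links actually provide), and it assumes gensim is installed.

```python
# Minimal sketch, assuming `pip install gensim` and placeholder file names.
from gensim.models import KeyedVectors
from gensim.models.fasttext import load_facebook_vectors

# Word2vec binaries, e.g. the Persian Wikipedia CBOW/Skipgram models
wv = KeyedVectors.load_word2vec_format("fa_wiki_cbow.bin", binary=True)

# fastText ".vec" files are plain-text word2vec format
ft_text = KeyedVectors.load_word2vec_format("cc.fa.300.vec", binary=False)

# fastText ".bin" files also store the character n-gram matrix,
# so out-of-vocabulary words still receive a vector
ft_full = load_facebook_vectors("cc.fa.300.bin")

# Note: GloVe text files lack the word2vec header line; with gensim >= 4.0
# they can be read via KeyedVectors.load_word2vec_format(path, no_header=True).

print(wv.most_similar("کتاب", topn=5))  # nearest neighbours of "book"
print(ft_full["کتاب‌ها"].shape)          # 300-dimensional vector, even for rare or unseen forms
```

The practical difference between the two fastText downloads is that the `.vec` file only contains whole-word vectors, while the `.bin` file keeps the subword n-grams, which is what makes lookups for out-of-vocabulary words possible.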