
Download the database of Farsi stemming datasets for evaluation

Short description and download link
In this post we have prepared for you, dear users of the Magic File website, a database of Persian stemming datasets for evaluation purposes.


Full description of the file

The Persian stemming dataset is a set of Persian words paired with their stems, obtained through morphological analysis. The words are provided as a text file and are typically used in natural language processing or for building machine learning models. It is regarded as one of the important datasets in the field of Persian language processing.

About stemming

Stemming is a natural language processing technique that uses linguistic rules and algorithms to reduce words to their base form, or stem. It is typically used in text analysis to map different words that share a common root or meaning onto the same form.

For example, a stemmer maps the Persian verb forms glossed in English as "I am going", "I have gone" and "we went" to a single stem, glossed here as "went". This is useful for text processing and analysis: collapsing inflected forms reduces the number of distinct words, which makes patterns in the text easier to identify and can make analysis faster and more accurate.

Stemming can be done with different algorithms. Some of these are Porter's algorithm, lemmatization and the closest path algorithm. These algorithms convert words into their base form or stem according to linguistic rules and lexical patterns.
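To make the idea of rule-based stemming concrete, here is a minimal suffix-stripping sketch in Python. It is not one of the algorithms named above and not a usable Persian stemmer: the suffix list, the function name `toy_stem` and the minimum stem length are all assumptions chosen for illustration.

```python
# Toy suffix-stripping stemmer: an illustration only, not a real Persian stemmer.

ZWNJ = "\u200c"  # zero-width non-joiner, often written between a word and its suffix

# Hypothetical, deliberately tiny suffix list (superlative, plural, comparative),
# with longer suffixes listed first.
SUFFIXES = ["ترین", "های", "ها", "تر"]


def toy_stem(word: str, min_stem_len: int = 2) -> str:
    """Strip at most one known suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            # Drop the suffix and any ZWNJ that joined it to the stem.
            stem = word[: -len(suffix)].rstrip(ZWNJ)
            if len(stem) >= min_stem_len:
                return stem
    return word  # no rule applied


# "books" written with a ZWNJ before the plural suffix; the stem is "book".
books = "کتاب" + ZWNJ + "ها"
print(toy_stem(books) == "کتاب")
```

A real stemmer needs far more rules plus an exception list, which is why a gold-standard evaluation set such as the one offered on this page is useful.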

There is no standard dataset for evaluating the accuracy of Persian stemming algorithms. To evaluate the correctness of stems, we need a set of words along with their stems. The datasets offered here were extracted automatically from two manually stemmed corpora. The first dataset consists of words and their stems extracted from the PerTreeBank collection [1]. It contains 4,689 distinct words. To allow a better evaluation, a larger text collection was selected for the second dataset: its words and stems are extracted from the Persian Dependency Treebank [2]. It contains 26,913 distinct words. Both datasets cover a good variety of part-of-speech tags.

Each stemming dataset consists of three columns: the first is the inflected word, the second is its stem, and the third is its part of speech. Add the stems produced by your own stemmer in a fourth column. Then you can use the following command.
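The four-column layout described here lends itself to a simple accuracy check. The sketch below is not the evaluation command this page refers to; it is a minimal illustration that assumes the columns are tab-separated (the page does not state the delimiter) and uses made-up English placeholder rows instead of real dataset entries. The name `stemming_accuracy` is hypothetical.

```python
import csv
import io


def stemming_accuracy(lines) -> float:
    """Fraction of rows whose predicted stem (column 4) equals the gold stem (column 2)."""
    # Assumption: columns are tab-separated and contain no quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    correct = 0
    for row in reader:
        if len(row) < 4:
            continue  # skip blank lines and rows without a prediction
        gold, predicted = row[1].strip(), row[3].strip()
        total += 1
        if gold == predicted:
            correct += 1
    return correct / total if total else 0.0


# Made-up placeholder rows: word, gold stem, POS tag, predicted stem.
SAMPLE = (
    "books\tbook\tN\tbook\n"
    "walked\twalk\tV\twalked\n"
    "bigger\tbig\tADJ\tbig\n"
)

print(f"accuracy = {stemming_accuracy(io.StringIO(SAMPLE)):.2f}")  # accuracy = 0.67
```

To run it on a downloaded file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) in place of the `io.StringIO` object, after checking which delimiter the file actually uses.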

Click on the link below to download the database of Persian stemming datasets for evaluation purposes.

Click here to download
