Magic file website - magicfile.ir

Download the source code for text language recognition, written in VB.NET

Short description and download link
We have prepared the source code for recognizing the language of a text, written in VB.NET, for you, dear users of the Magic File website.


Short link: https://en.magicfile.ir/?p=2453
Full description of the file

Download the source code for detecting the language of a text, written in VB.NET

We have prepared the source code for recognizing the language of a text, written in VB.NET, for you, dear users of the Magic File website. The language recognition solution is based on comparing n-gram and word occurrences. It is suitable for any language that uses words (which is not actually true of all languages). Depending on the model and the length of the input text, the accuracy ranges from 70% (only for short Norwegian, Swedish, and Danish texts classified with the "all" model) to 99.8% with the "default" model.

Background

Recognizing the language of a written text is probably one of the most fundamental tasks in natural language processing (NLP). For any language-dependent processing of an unknown text, the first thing you need to know is which language the text is written in. Fortunately, this is one of the easier NLP challenges. The approach I have chosen to implement is widely known and very simple. The idea is that each language has a unique set of character (co-)occurrences.

Sample of runtime images

The first step is to collect those statistics for all the languages that should be recognized. This is not as easy as it may seem at first. The problem is collecting a large set of test data (plain text) that contains only one language and is not domain-specific. (Newspaper articles alone may lack the word "I" and direct speech. Shakespeare's plays would not be the best basis for recognizing contemporary texts. Medical articles usually contain many domain-specific terms that are not even language-specific (major, minor, arterial, etc.). And if that is not hard enough, the texts should not be copyrighted.) I chose Wikipedia as my main source. I had to do some filtering of the data. Wikipedia contains many proper names (i.e. names of groups) that often contain a "the" or an "and". This is why those words show up in the statistics of many languages even if they are not part of the language. This is not necessarily a disadvantage, as anglicisms have spread widely across many languages. I created three statistics for each language:

  • Character set
    • Some languages have a very specific character set (such as Chinese, Japanese, and Russian). For others, some characters give a good hint of the target language (e.g., German umlauts).
  • N-grams
    • After the text was split into words (if necessary), the occurrences of 1-, 2-, and 3-grams were counted. Some n-grams are very language-specific (e.g., "TH" in English).
  • Word list
    • A final source of disambiguation is the words that are actually used. Some languages (such as Portuguese and Spanish) are almost identical in the characters used as well as in the occurrence of certain n-grams. However, different words are used at different frequencies.
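The three statistics above can be sketched in a few lines. This is only an illustration in Python, not the VB.NET code from the download: the function names `rank` and `build_statistics`, the letter-only tokenization, and the decision to count n-grams inside single words are all assumptions made for this example.

```python
import re
from collections import Counter


def rank(counter):
    """Return the items sorted by descending count (ties broken alphabetically)."""
    return [item for item, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))]


def build_statistics(text, max_n=3):
    """Build ranked character, n-gram and word tables for one text.

    Assumption: words are runs of letters; digits, punctuation and
    underscores act as separators.
    """
    words = [w for w in re.split(r"[\W\d_]+", text.lower()) if w]

    # 1) Character set: every letter that occurs, with its count.
    chars = Counter(ch for w in words for ch in w)

    # 2) N-grams: all 1-, 2- and 3-grams found inside each word.
    ngrams = Counter()
    for w in words:
        for n in range(1, max_n + 1):
            for i in range(len(w) - n + 1):
                ngrams[w[i:i + n]] += 1

    # 3) Word list: the words themselves, with their counts.
    word_counts = Counter(words)

    return {"chars": rank(chars), "ngrams": rank(ngrams), "words": rank(word_counts)}


stats = build_statistics("the cat and the hat")
print(stats["words"][:3])
print(stats["ngrams"][:5])
```

A real model would be built from a large corpus per language rather than from one sentence; the sentence here only keeps the example short.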

The set of statistics is called a model. I have created subsets of the "all" model that best meet my needs (see table below). The "common" model includes the 10 most spoken languages in the world. "Small" and "default" are based on my usage scenarios. If you are from another part of the world, your preferences may be different, so please do not take offense at my choice of which languages are in which model.

All statistics are sorted and ranked by number of occurrences. In the demo program, all models can be studied in detail. Classifying an unknown text is simple. The text is tokenized and the same three statistics tables are generated for it. The resulting tables are compared with the tables of every model and a distance is calculated. The language whose model tables have the smallest distance to the unknown text is most likely the language of the text.
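A minimal sketch of this classification step follows, again in Python for illustration rather than the VB.NET of the download. The text above does not say which distance is used; the sketch assumes the common "out-of-place" measure for ranked n-gram profiles, with a fixed penalty for n-grams missing from a model. The names `ranked_ngrams`, `out_of_place_distance`, `classify`, `MISSING_PENALTY`, and the two toy models built from single sentences are hypothetical, and only the n-gram table is compared.

```python
from collections import Counter

MISSING_PENALTY = 300  # assumed cost for an n-gram the model has never seen


def ranked_ngrams(text, max_n=3, top=300):
    """Rank the 1- to 3-grams of a text by descending frequency."""
    counts = Counter()
    for word in text.lower().split():
        for n in range(1, max_n + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ordered[:top]]


def out_of_place_distance(sample, model):
    """Sum of rank differences between the sample table and a model table."""
    position = {gram: r for r, gram in enumerate(model)}
    return sum(
        abs(r - position[gram]) if gram in position else MISSING_PENALTY
        for r, gram in enumerate(sample)
    )


def classify(text, models):
    """Return the language whose model is closest to the text."""
    sample = ranked_ngrams(text)
    return min(models, key=lambda lang: out_of_place_distance(sample, models[lang]))


# Toy models from one sentence each; real models need a large corpus.
TOY_MODELS = {
    "en": ranked_ngrams("the quick brown fox jumps over the lazy dog and then the dog sleeps"),
    "de": ranked_ngrams("der schnelle braune fuchs springt über den faulen hund und dann schläft der hund"),
}

print(classify("the dog and the fox", TOY_MODELS))
print(classify("der hund und der fuchs", TOY_MODELS))
```

A fuller version would compute one distance per table (characters, n-grams, words) and combine them; how the downloadable code weights the three tables is not stated here.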

Language code Language Quality Default Common Big Small
nl Dutch 13 x x
en English 13 x x x x
ca Catalan 13
fr French 13 x x x x
es Spanish 13 x x x x
no Norwegian 13 x x
da Danish 13 x x
it Italian 13 x x
sv Swedish 13 x x
de German 13 x x x x
pt Portuguese 13 x x x
ro Romanian 13
vi Vietnamese 13
tr Turkish 13 x
fi Finnish 12 x
hu Hungarian 12 x
cs Czech 12 x
pl Polish 12 x
el Greek 12 x
fa Persian 12
he Hebrew 12
sr Serbian 12
sl Slovenian 12
ar Arabic 12 x
nn Norwegian, Nynorsk (Norway) 12
ru Russian 11 x x
et Estonian 11
ko Korean 10
hi Hindi 10 x
is Icelandic 10
th Thai 9
bn Bengali (Bangladesh) 9 x
ja Japanese 9 x
zh Chinese (Simplified) 8 x
se Sami (Northern) (Sweden) 5

Dear user, this file is available for download

To download the VB.NET source code for text language recognition, click the link below

Click here to download

Files that you may need

Download the source code of an English-to-Persian (and vice versa) dictionary in C# with an SQLite database

Download the source code of software for converting Visual Basic code to C# and vice versa

Senior registration management system using the Bunifu framework, with full VB.NET source code and a MySQL database

Download the source code of sign language fingerspelling test software for use in the classroom

User comments


Comment sent by Mozhgan - 2/21/2023 8:57:59 pm
Hello, I downloaded it, it is really wonderful.

Reply from Magic File support
Hello, thank you.

Comment sent by Mehbod - 1/25/2023 1:58:26 am
Hello, I am very happy. I downloaded the file and it is the one I was looking for. I wanted to thank you.

Reply from Magic File support
Hello, you're welcome.

Comment sent by Mahyar - 2022/12/17 12:52:04 am
Thank you for your efforts. I downloaded the file and it was the one I was looking for.

Reply from Magic File support
Hello, you're welcome.
 

List of website special files

Download software for changing the language of Visual Studio source code (form design elements)

The best push notification service: a script for managing notifications and building site push notifications

Download software to convert a text file to vcf (mobile contacts)

Download software for fully automatic translation of po and pot files into all languages, including Persian