Sunday, 4 September 2011

Zetar, a free spell-checker dictionary builder, is reborn

This is a new old project of mine. It's new because I have just started coding it, and it's old because I have had the idea for a very long time (I think it first came to me back in 2006).

The problem:
I believe that there is no good free spell-checking dictionary available for Romanian, so I decided to build one.

The solution:
Build a dictionary from published text, based on the frequency of words. The idea is that a misspelled form of a word will appear less often than the correct form.
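
To make the idea concrete, here is a minimal sketch of the filtering step. It is not code from the project: the class name FrequencyFilter, the method likelyCorrect and the minCount threshold are all made up for illustration, and a plain cut-off is only one possible way to separate good forms from bad ones.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FrequencyFilter {

    // Keep only the words seen at least minCount times.
    // minCount is a hypothetical tuning knob, not something the project defines.
    public static Set<String> likelyCorrect(Map<String, Integer> counts, int minCount) {
        Set<String> accepted = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Invented counts: one common word, one rare misspelling of it, one other word.
        Map<String, Integer> counts = Map.of(
                "biblioteca", 120,
                "bibiloteca", 2,
                "carte", 95);
        System.out.println(likelyCorrect(counts, 10));
    }
}
```

A fixed threshold will also throw away rare but valid words, so in practice the cut-off would probably need to depend on the size of the corpus.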

What is available:

I have committed a small Spring-based project that can break text into words and store the words in a database. The schema is very simple: I count the number of times each word appears. I plan to develop this further and provide a web interface for accessing the database. For now, all it does is scan an XML file (I tested it with the Romanian XML dump of Wikipedia).
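
The counting step can be sketched roughly like this. This is an illustration and not the code in the repository: the class name WordCounter is made up, an in-memory map stands in for the database table, and treating every run of Unicode letters as a word is my assumption about the tokenizing rule.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {

    // Assumption: a "word" is any run of Unicode letters.
    // This splits hyphenated forms such as "intr-o" into two words.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Returns a map from lower-cased word to the number of times it appears.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.ROOT);
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Casa este mare. Casa este veche."));
    }
}
```

In the real project each count would end up as a row in the database instead of a map entry.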

It's not very efficient, but it was done in a few hours.
The project uses Spring, Hibernate and Firebird SQL, but you can easily switch to another DB if you like.

You can get the sources from my repo: https://bitbucket.org/ieugen/zetar/
