This is a new old project of mine. It's new because I just started coding it, and it's old because I have had the idea for a very long time (I think since 2006).
The problem:
I believe there is no good, free spell-checking dictionary available for Romanian, so I decided to build one.
The solution:
Build a dictionary from published text, based on word frequency. The idea is that a misspelled form of a word will appear less often than the correct form.
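To make the frequency idea concrete, here is a minimal sketch of how a word list could be filtered once the counts exist. It is not code from the project: the class name `FrequencyFilter`, the method `frequentWords` and the `minCount` threshold are all hypothetical, and a fixed cutoff is only the simplest possible rule.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FrequencyFilter {

    // Keeps only the words seen at least minCount times.
    // minCount is an assumed, hand-picked threshold, not a value from the project.
    public static Set<String> frequentWords(Map<String, Integer> counts, int minCount) {
        Set<String> accepted = new TreeSet<>(); // sorted, so the output is stable
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> counts = Map.of("este", 120, "esste", 1, "casa", 40);
        System.out.println(frequentWords(counts, 5));
    }
}
```

A fixed threshold will also throw away rare but correct words, so a real dictionary would probably need something smarter, such as comparing a rare form against similar, more frequent forms.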
What is available:
I have committed a small Spring-based project that breaks text into words and stores the words in a database. The schema is very simple: I am counting the number of times each word appears. I plan to develop this further and provide a web interface to access the database. For now, all it does is scan an XML file (I tested it with the Romanian XML dump of Wikipedia).
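The word-splitting and counting step can be sketched in plain Java as below. This is an illustration under my own assumptions, not the code in the repository: the class `WordCounter` and the method `countWords` are hypothetical names, the counts live in an in-memory map instead of the database, and a "word" is taken to be any run of Unicode letters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {

    // Assumption: a word is a run of Unicode letters, which keeps Romanian
    // letters with diacritics but splits hyphenated forms into their parts.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");
    private static final Locale ROMANIAN = Locale.forLanguageTag("ro");

    // Returns how many times each lower-cased word appears in the text.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(ROMANIAN);
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Casa este mare. Casa este noua."));
    }
}
```

In the real project the map would be replaced by rows in the database, with one row per word and a counter column.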
It's not very efficient, but it was done in a few hours.
The project uses Spring, Hibernate and Firebird SQL, but you can easily switch to another DB if you like.
You can get the sources from my repo: https://bitbucket.org/ieugen/zetar/