Check the spelling of a word or find prosecutions:
How to use the Command Line Version
How to use this tool in your own program
A description of how to install a module can be found on the main page of the ASV Toolbox project.
The line you have to copy into the toolbox.start file looks like this:
de.uni_leipzig.asv.toolbox.levenshtein.LevenshteinModul
This tool verifies the spelling of a word and finds prosecutions for it, i.e. longer words that begin with the entered word (for example, "halfway" for "half"). It uses directed acyclic word graphs, so-called DAWGs.
Before you can start, you have to make 2 simple configuration choices. The first one is to choose a DAWG. You have 2 options: use an integrated DAWG from the drop-down menu (see figure 1) or load your own DAWG from a file (see figure 2).
This is the drop-down menu for choosing an integrated DAWG for the language you want to use.
figure 1
Click this button to open the file open dialog below. Select a DAWG file and open it; the DAWG is then loaded from this file. Afterwards, "own DAWG" will be selected in the language drop-down menu.
figure 2
The second configuration is selecting a distance (see figure 3). The distance describes how much a word may differ from your word and still be listed in the Did you mean text area on the panel.
Use the arrow buttons to increase or decrease the distance. For example, a distance of 3 means that every word in the Did you mean text area differs from your entered word by at most 3 edits (insertions, deletions or substitutions of a single character).
figure 3
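The distance used here is the Levenshtein distance (the tool's command-line option -l is named after it): the minimum number of single-character insertions, deletions and substitutions needed to turn one word into the other. The sketch below computes it with the standard dynamic-programming recurrence, keeping only two rows of the table. EditDistance is a hypothetical class name, not part of the toolbox. The comparison is case-sensitive, so "half" and "Half" are at distance 1.

```java
// Hypothetical helper: standard two-row dynamic-programming Levenshtein distance.
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> b[0..j): j insertions
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // a[0..i) -> "": i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,   // delete a[i-1]
                                            curr[j - 1] + 1), // insert b[j-1]
                                   prev[j - 1] + cost);     // substitute or match
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("half", "calf"));   // 1
        System.out.println(levenshtein("half", "Half"));   // 1
        System.out.println(levenshtein("half", "behalf")); // 2
        System.out.println(levenshtein("half", "hello"));  // 3
    }
}
```

With a distance of 2, "calf" and "behalf" would therefore be listed for "half", but "hello" would not.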
Now you can enter your word in the text field at the top of the panel. Press Enter or use the go button to start the spell checking and the search for prosecutions (see figure 4).
Enter the word here. Press Enter on your keyboard or use the go button to start.
Here you find the result of the spell checking.
Here you find all prosecutions found for your word.
Use this button to save all words from the spell checking to a file. The file has the following format: the first line contains the word you entered, and each following line contains one word from the spell checking.
Use this button to save all prosecutions to a file. The file has the following format: the first line contains the word you entered, and each following line contains one prosecution.

figure 4
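The save format described above (entered word on the first line, one result per following line) can be produced and read back with java.nio.file. The class ResultFile and its methods below are hypothetical, and UTF-8 is an assumed encoding; the manual does not say which encoding the tool uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the "entered word, then one result per line" file format.
final class ResultFile {
    static void write(Path file, String enteredWord, List<String> results) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(enteredWord); // first line: the word you entered
        lines.addAll(results);  // following lines: one result each
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Element 0 is the entered word, the remaining elements are the results.
    static List<String> read(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("levenshtein", ".txt");
        write(tmp, "half", List.of("halfway", "halftime"));
        System.out.println(read(tmp)); // [half, halfway, halftime]
        Files.delete(tmp);
    }
}
```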
You will find this option on the Options panel. Fill in all text fields with the correct data and click on connect to Database. All tables and their columns are then loaded into the tool (see figure 5).
Choose the tables and columns you need for the DAWG. The table should contain at least 2 columns: one with the words and one with ids for the words.
Click on this button to connect to your database and load the table and column names into this tool.
Fill in the text fields (the first 7, starting with Driver Class and ending with Database) with the correct information for the database you want to use.
figure 5
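Connecting and listing tables and columns can be done with standard JDBC. The sketch below assumes that the connection fields combine into a URL of the form jdbc:&lt;protocol&gt;://&lt;host&gt;:&lt;port&gt;/&lt;database&gt;, which is the layout the MySQL driver uses; whether the tool builds its URL this way is an assumption. The class DbTableLister is hypothetical. The values in main are the defaults of the command-line version (mysql, localhost, 3306) plus the database name de1M from its example.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the toolbox.
final class DbTableLister {
    // Assumed URL layout jdbc:<protocol>://<host>:<port>/<database> (MySQL's format).
    static String jdbcUrl(String protocol, String host, int port, String database) {
        return "jdbc:" + protocol + "://" + host + ":" + port + "/" + database;
    }

    // Connects and returns each table name with its column names, using JDBC metadata.
    static Map<String, List<String>> tablesAndColumns(String driverClass, String url,
            String user, String password) throws ClassNotFoundException, SQLException {
        Class.forName(driverClass); // loads the driver; JDBC 4+ drivers also register themselves
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (Connection con = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData meta = con.getMetaData();
            String catalog = con.getCatalog();
            try (ResultSet tables = meta.getTables(catalog, null, "%", new String[] {"TABLE"})) {
                while (tables.next()) result.put(tables.getString("TABLE_NAME"), new ArrayList<>());
            }
            for (Map.Entry<String, List<String>> table : result.entrySet()) {
                try (ResultSet cols = meta.getColumns(catalog, null, table.getKey(), "%")) {
                    while (cols.next()) {
                        // The table name argument is a LIKE pattern, so re-check for an exact match.
                        if (table.getKey().equals(cols.getString("TABLE_NAME"))) {
                            table.getValue().add(cols.getString("COLUMN_NAME"));
                        }
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("mysql", "localhost", 3306, "de1M"));
        // prints jdbc:mysql://localhost:3306/de1M
    }
}
```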
Before you start training, choose how many words you want to use (see figure 6).
Id of the first word that should be used in the DAWG.
Number of words that should be used in the DAWG.
figure 6
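The manual does not state exactly how these two values select the words. One plausible reading, sketched below purely as an assumption, is: order the rows by id, start at the first id that is greater than or equal to the given id, and take the given number of words. The in-memory version uses a NavigableMap in place of the database table; the class WordSelection and the SQL in the comment (which reuses the table and column names from the command-line example) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch; the selection rule is an assumption, not documented toolbox behavior.
// A database equivalent might be (MySQL syntax):
//   SELECT word FROM words WHERE w_id >= 101 ORDER BY w_id LIMIT 5000
final class WordSelection {
    // Takes up to 'count' words in ascending id order, starting at the first id >= firstId.
    static List<String> select(NavigableMap<Integer, String> wordsById, int firstId, int count) {
        List<String> out = new ArrayList<>();
        for (String word : wordsById.tailMap(firstId, true).values()) {
            if (out.size() >= count) break;
            out.add(word);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> table = new TreeMap<>();
        table.put(100, "der");
        table.put(101, "die");
        table.put(102, "und");
        table.put(105, "in");
        System.out.println(select(table, 101, 2));  // [die, und]
        System.out.println(select(table, 101, 10)); // [die, und, in]
    }
}
```

Note that gaps in the ids (here 103 and 104) do not reduce the number of words under this reading.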
Now start the training by clicking on the Load Words.. button (see figure 7).
Load Words.. button: click it, and the save dialog opens for choosing the file for the trained DAWG. The button is not accessible until the DAWG is saved.
Save dialog for choosing the file for the trained DAWG. After you choose the file, the training begins.
figure 7
To train a DAWG from a file, you need a word list with one word per line, as in figure 8.

figure 8
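A word list in this format can be read with a few lines of java.nio.file code. The sketch assumes UTF-8 encoding and skips blank lines and surrounding whitespace; whether the toolbox does the same is not documented here, and the class name WordListReader is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reader for a one-word-per-line list; UTF-8 is an assumed encoding.
final class WordListReader {
    static List<String> read(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .map(String::trim)               // drop surrounding whitespace
                .filter(line -> !line.isEmpty()) // skip blank lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wordlist", ".txt");
        Files.write(tmp, List.of("Baum", "", "  Haus  ", "Wald"), StandardCharsets.UTF_8);
        System.out.println(read(tmp)); // [Baum, Haus, Wald]
        Files.delete(tmp);
    }
}
```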
To start the training, click on the Choose file.. button on the Options panel and choose the training file (see figure 9).
Choose file.. button: opens the file dialog below. It is not accessible until the DAWG is saved.
Select the training file and open it.
figure 9
In the second file dialog, which opens after you choose the training file, choose the file for the trained DAWG. After you press save DAWG, the training of the DAWG begins (see figure 10).
After a click on this button, the training of the DAWG begins.
figure 10
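How the toolbox trains its DAWG internally is not documented here. One standard way to build a minimal DAWG from a word list is the incremental algorithm for sorted input by Daciuk et al.: words are added in lexicographic order, and whenever the next word leaves a branch, the finished nodes of that branch are replaced by equivalent nodes that already exist. The class DawgBuilder below is a hypothetical sketch of that technique, not the toolbox's implementation; the word list has to be sorted first (for example with Collections.sort).

```java
import java.util.*;

// Hypothetical sketch of incremental minimal-DAWG construction from sorted input
// (Daciuk et al.); not the ASV toolbox's own training code.
final class DawgBuilder {
    static final class Node {
        final int id;
        final TreeMap<Character, Node> edges = new TreeMap<>();
        boolean isFinal;

        Node(int id) { this.id = id; }

        // Equivalence key: finality plus every (label, target id) pair. Targets are
        // already canonical when this is called, so comparing their ids is enough.
        String signature() {
            StringBuilder sb = new StringBuilder(isFinal ? "1" : "0");
            for (Map.Entry<Character, Node> e : edges.entrySet()) {
                sb.append('_').append(e.getKey()).append(':').append(e.getValue().id);
            }
            return sb.toString();
        }
    }

    // An edge on the previous word's path whose target has not been minimized yet.
    private static final class Pending {
        final Node parent; final char label; final Node child;
        Pending(Node parent, char label, Node child) { this.parent = parent; this.label = label; this.child = child; }
    }

    private int nextId = 0;
    private final Node root = new Node(nextId++);
    private final Map<String, Node> register = new HashMap<>();
    private final List<Pending> pending = new ArrayList<>();
    private String previous = "";

    void insert(String word) {
        if (word.compareTo(previous) < 0) throw new IllegalArgumentException("input must be sorted");
        int common = 0;
        while (common < word.length() && common < previous.length()
                && word.charAt(common) == previous.charAt(common)) common++;
        minimize(common); // nodes below the common prefix can never change again
        Node node = pending.isEmpty() ? root : pending.get(pending.size() - 1).child;
        for (int i = common; i < word.length(); i++) {
            Node next = new Node(nextId++);
            node.edges.put(word.charAt(i), next);
            pending.add(new Pending(node, word.charAt(i), next));
            node = next;
        }
        node.isFinal = true;
        previous = word;
    }

    // Deepest first: replace each pending node below 'downTo' by an equivalent registered node.
    private void minimize(int downTo) {
        for (int i = pending.size() - 1; i >= downTo; i--) {
            Pending p = pending.remove(i);
            Node existing = register.putIfAbsent(p.child.signature(), p.child);
            if (existing != null) p.parent.edges.put(p.label, existing);
        }
    }

    // Call once after the last insert.
    Node finish() {
        minimize(0);
        return root;
    }

    static boolean contains(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.edges.get(c);
            if (node == null) return false;
        }
        return node.isFinal;
    }

    // Counts distinct nodes reachable from the root (shared nodes are counted once).
    static int countNodes(Node root) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (seen.add(n)) n.edges.values().forEach(stack::push);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        DawgBuilder builder = new DawgBuilder();
        for (String w : List.of("cat", "cats", "fat", "fats")) builder.insert(w); // sorted input
        Node root = builder.finish();
        System.out.println(countNodes(root));       // 5 (a plain trie would have 9 nodes)
        System.out.println(contains(root, "fats")); // true
        System.out.println(contains(root, "fa"));   // false
    }
}
```

For cat, cats, fat and fats, the "at", "ats" suffix structure is shared by both first letters, which is why the graph needs only 5 nodes instead of 9.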
To start the command line version of this tool, use the following command:
java -Xmx500M -classpath .;./lib/ASV_Levenshtein.jar -Djava.ext.dirs=.;.lib de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein option ...
The following options are available:
-? Print this information.
-C Create a word graph from file (-i) or from a database.
-g Start gui mode.
-w Specify the word to check. (Use with -f)
-f Specify the dawg file to use.
-i Specify the file containing words. (Use with -C).
-o Save output to the specified file.
-l Levenshtein distance to use. (default is 1)
-D Specify the driver (default is com.mysql.jdbc.Driver).
-P Specify the protocol (default is mysql).
-h Specify the database host (default is localhost).
-x Set the port to use. (default is 3306).
-d Specify the database.
-u Database user name.
-p The user's password.
-t Specify the table.
-W Specify the table's column which contains the words.
-c Specify the table's column which contains the word ids.
-I Specify the lowest word id. (default is 101)
-O Specify the number of words. (default is 2000)
Examples:
· Create a DAWG from a MySQL database:
java -Xmx500M -classpath .;./lib/ASV_Levenshtein.jar -Djava.ext.dirs=.;.lib de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein -C -D com.mysql.jdbc.Driver -P mysql -h localhost -x 3306 -d de1M -u root -p root -t words -W word -c w_id -I 101 -O 5000 -o ./examples/test.dawg
· Create a DAWG from a word list file:
java -Xmx500M -classpath .;./lib/ASV_Levenshtein.jar -Djava.ext.dirs=.;.lib de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein -C -i ./resources/levenshtein/plain/wordlist_de.txt -o ./examples/de_cli.dawg
· Check the word "Baum" (needs the DAWG built in the second example):
java -Xmx500M -classpath .;./lib/ASV_Levenshtein.jar -Djava.ext.dirs=.;.lib de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein -w Baum -f ./examples/de_cli.dawg
· Check the word "Baum" with a Levenshtein distance of 3 and save the output to a file (needs the DAWG built in the second example):
java -Xmx500M -classpath .;./lib/ASV_Levenshtein.jar -Djava.ext.dirs=.;.lib de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein -w Baum -f ./examples/de_cli.dawg -o ./examples/Levenshtein_CLOutput_Baum.txt -l 3
It is easy to use Levenshtein in your own program. You only need the 3 classes Levenshtein, Dawg and DawgFactory, which you can find in the package de.uni_leipzig.asv.toolbox.levenshtein.
| Class | Description |
|---|---|
| Dawg | This class represents the DAWG. You have to create an instance of this class using the class DawgFactory. |
| DawgFactory | This class creates an instance of the class Dawg. To do this, use the method LoadGraph(String filename), which takes as its parameter a string representing the path to the DAWG file. |
| Levenshtein | This class provides the algorithms to find alternatives and prosecutions. There are 2 methods and 1 attribute you need to know: the methods FindAlternatives and FindProsecutions and the attribute WordGraphs, all of which are used in the example below. |
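How FindAlternatives works internally is not described in this manual. A common technique for finding all words within a given distance in a word graph is to walk the graph while computing one row of the Levenshtein table per edge, and to abandon a branch as soon as every entry of the row exceeds the limit. The sketch below applies this technique to a plain trie; the class FuzzyTrieSearch and its methods are hypothetical and are not the toolbox's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: alternatives within a Levenshtein distance, found by walking a
// plain trie and computing one row of the distance table per edge. Not the toolbox's code.
final class FuzzyTrieSearch {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) node = node.children.computeIfAbsent(c, k -> new Node());
        node.isWord = true;
    }

    List<String> findWithin(String query, int maxDistance) {
        List<String> results = new ArrayList<>();
        int[] firstRow = new int[query.length() + 1]; // distance from "" to each query prefix
        for (int j = 0; j <= query.length(); j++) firstRow[j] = j;
        if (root.isWord && query.length() <= maxDistance) results.add("");
        StringBuilder prefix = new StringBuilder();
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            prefix.append(e.getKey());
            search(e.getValue(), e.getKey(), prefix, query, firstRow, maxDistance, results);
            prefix.setLength(prefix.length() - 1);
        }
        return results;
    }

    private void search(Node node, char letter, StringBuilder prefix, String query,
                        int[] prevRow, int maxDistance, List<String> results) {
        int[] row = new int[prevRow.length];
        row[0] = prevRow[0] + 1;
        int rowMin = row[0];
        for (int j = 1; j < row.length; j++) {
            int cost = query.charAt(j - 1) == letter ? 0 : 1;
            row[j] = Math.min(Math.min(row[j - 1] + 1, prevRow[j] + 1), prevRow[j - 1] + cost);
            rowMin = Math.min(rowMin, row[j]);
        }
        // Last entry = distance between the whole prefix and the whole query.
        if (node.isWord && row[row.length - 1] <= maxDistance) results.add(prefix.toString());
        // The row minimum never decreases as the prefix grows, so this branch can be skipped.
        if (rowMin > maxDistance) return;
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            prefix.append(e.getKey());
            search(e.getValue(), e.getKey(), prefix, query, row, maxDistance, results);
            prefix.setLength(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        FuzzyTrieSearch trie = new FuzzyTrieSearch();
        for (String w : List.of("half", "calf", "halo", "behalf", "hello", "halftime")) trie.add(w);
        System.out.println(trie.findWithin("half", 1)); // [calf, half, halo]
        System.out.println(trie.findWithin("half", 2)); // [behalf, calf, half, halo]
    }
}
```

The pruning is what makes this cheaper than computing the distance to every word separately: whole subtrees that start with a hopeless prefix are never visited.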
Here is an example of a Java class (LevenshteinTest.java) that uses the Levenshtein tool. You can find the class LevenshteinTest.java in the package de.uni_leipzig.asv.toolbox.tests.
package de.uni_leipzig.asv.toolbox.tests;

import java.util.Vector;

import de.uni_leipzig.asv.toolbox.levenshtein.Dawg;
import de.uni_leipzig.asv.toolbox.levenshtein.DawgFactory;
import de.uni_leipzig.asv.toolbox.levenshtein.Levenshtein;

public class LevenshteinTest {

    public static void main(String[] args) {
        // DAWG file
        String dawgFile = "./resources/levenshtein/top50000en.dawg";
        // load the DAWG
        Dawg graph = DawgFactory.LoadGraph(dawgFile);
        // register the graph in Levenshtein under the key dawgFile
        Levenshtein.WordGraphs.put(dawgFile, graph);
        // word to check
        String word = "half";
        // maximum Levenshtein distance for alternatives
        int distance = 2;
        // calculate alternatives and prosecutions
        Vector<String> alternatives = Levenshtein.FindAlternatives(dawgFile, word, distance, true);
        Vector<String> prosecutions = Levenshtein.FindProsecutions(dawgFile, word);
        System.out.println("word:\n" + word);
        System.out.println("alternatives:");
        for (String alternative : alternatives) System.out.print(alternative + " ");
        System.out.println("\nprosecutions:");
        for (String prosecution : prosecutions) System.out.print(prosecution + " ");
        System.out.println();
    }
}
You can run this test. Below you see its output.
Loading graph. This may take a while..
word:
half
alternatives:
half Half calf halo hall halt Zale Val
Vale Valu Khalq Khalaf Kalb Ghali Gal Gale Gala Gall Gulf Golf Lal Elf Rolf Daf
Dal Dali Dale Daly Pal Palo Pall Palm Falk Fall Cal Calfa Call Cali Calif Chalk
Sal Salt Salk Sale Self Shale Shall Shelf Yale Wolf Whale Wharf Wald Walk Walt
Wall Nall pal palm pale pals pall Alf gulf gale gala gall golf Hal Hale Hall
chalk calm call Bali Bala Ball mall malt male Maly Male Mali Mall hulk hull
holy hole hold hilt hill ha hajj hawk hack haul hauls ham hams hat hats hate
haze hazy hair hail hails hang hand harp harm hard halls halve halts had hadn
have has hasn hash hay heal held helm hell help bale balm bald balk ball behalf
whale wharf wolf walk wall Tal Tale Talb Tall Talk Taif fall self shale shall
shelf salt sale al ale all tall tale talk
prosecutions:
halfway halftime half-way half-year
half-time half-empty half-staff half-inch half-interest half-brother half-point
half-price half-mile half-million half-day half-dozen half-hour half-hearted
half-century