Loading and saving a configuration:
Configuring the tool on your own:
How to use the Command Line Version
How to use Namerec in your own program
A description of how to install a module is available on the main page of
the ASV Toolbox project. The line you have to copy into the toolbox.start
file looks like this:
de.uni_leipzig.asv.toolbox.namerec.gui.RecognizerPanel
This tool tries to
recognize names in sentences. It needs some initially given names (a gazetteer)
and some rules for learning new names.
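The interplay of gazetteer and learning rules can be pictured with a small sketch. It is not Namerec's algorithm: the class GazetteerSketch, the single rule (an unknown capitalized word directly after a known first name is learned as a last name) and the placeholder tag "?" are assumptions made for illustration. The tags VN and NN appear in the sample output of this manual; reading them as first name and last name is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this is not Namerec's actual algorithm.
class GazetteerSketch {

    // Tags tokens found in the gazetteer. An unknown capitalized token that
    // directly follows a first name (VN) is learned as a last name (NN).
    static List<String> tag(String[] tokens, Map<String, String> gazetteer) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String tag = gazetteer.get(tokens[i]);
            boolean afterFirstName = i > 0 && "VN".equals(tags.get(i - 1));
            if (tag == null && afterFirstName
                    && Character.isUpperCase(tokens[i].charAt(0))) {
                tag = "NN";                     // hypothetical learning rule
                gazetteer.put(tokens[i], tag);  // remember the new name
            }
            tags.add(tag == null ? "?" : tag);  // "?" marks unknown tokens
        }
        return tags;
    }

    public static void main(String[] args) {
        Map<String, String> gazetteer = new HashMap<>();
        gazetteer.put("George", "VN");          // initially given name
        String[] tokens = {"George", "Bush", "ist", "gesucht", "."};
        System.out.println(tag(tokens, gazetteer));        // [VN, NN, ?, ?, ?]
        System.out.println(gazetteer.containsKey("Bush")); // true
    }
}
```

After the run, "Bush" is part of the gazetteer and would be recognized in later sentences even without a preceding first name.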
Before you start the tool, you have to configure it. You can do this on
your own or load an existing configuration.
Choose the File Management panel. At the bottom you will find two buttons,
one for loading configurations and one for saving them (see figure 1).
Save button: opens a file-save dialog in which you can save the current
configuration to a file in any directory.
Load button: opens a file-open dialog in which you can choose the
configuration file from any directory.

figure 1
There are six panels for configuring the tool. Let us start with the first
one, the File Management panel. Here you can configure which output files
you want and where you want to save them (see figure 2).
One output file receives all complex names that were found. Another
receives items that were found but were too rare, or were too rarely
classified as the same. Rule Context: specifies why an item was classified.


figure 2
Log file for Namerec: contains information about everything Namerec does.
The second
panel is the Parameters and Settings panel. Here you can specify the parameters
for the algorithm and some settings for the database input and the usage of the
internal tokenizer (see figure 3).
One field holds the version id.
Here you can also decide whether to use the internal tokenizer to tokenize
your text and whether to replace all numbers with %N%.
To configure the database input, choose the first and the last id of the
sentences you want to analyse with the tool.

Here you can specify the number of verification threads, the number of
sentences used for verification, the threshold for accepting items, and
the number of sentences between time estimations.
figure 3
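What replacing numbers with %N% amounts to can be sketched as follows. The definition of a number that Namerec uses is not given in this manual, so the regular expression below (runs of digits, optionally joined by '.' or ',') and the class name NumberReplaceSketch are assumptions.

```java
import java.util.regex.Pattern;

// Illustrative sketch; Namerec's own number detection may differ.
class NumberReplaceSketch {

    // Assumption: a number is a run of digits, optionally continued by
    // groups such as ".500" or ",5".
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)*");

    // Replaces every number in the sentence with the placeholder %N%.
    static String replaceNumbers(String sentence) {
        return NUMBER.matcher(sentence).replaceAll("%N%");
    }

    public static void main(String[] args) {
        // prints: Im Jahr %N% kamen %N% Besucher.
        System.out.println(replaceNumbers("Im Jahr 2004 kamen 1.500 Besucher."));
    }
}
```

Mapping all numbers to one placeholder lets a single rule or pattern cover every numeric value.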
The next panel is the Tag System panel. Here you can configure the tag
encoding and the regular expression tagging (see figure 4).
To add a new entry to the table, fill out the fields. The table shows the
content that will be used by the algorithm. There are buttons for saving
the content of the table to a file, for deleting the complete content of
the table, and for loading the content of the table from a file.


There is also a button for deleting the selected entry of the table and an
Auto Fill button for the tag encoding.
figure 4
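One way to picture regular expression tagging is a table that maps regular expressions to tags, where a token receives the tag of the first expression that matches it completely. The sketch below follows that reading; it is an assumption about the mechanism, not Namerec's implementation. The class RegexTaggerSketch and the tag NUM are hypothetical, while PU is the tag shown for "." in the sample output of this manual.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of a regex-to-tag table; not Namerec's implementation.
class RegexTaggerSketch {

    // Insertion order is kept, so earlier rules win over later ones.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    void addRule(String regex, String tag) {
        rules.put(Pattern.compile(regex), tag);
    }

    // Returns the tag of the first rule matching the whole token, or null.
    String tag(String token) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(token).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexTaggerSketch tagger = new RegexTaggerSketch();
        tagger.addRule("\\p{Punct}+", "PU"); // punctuation
        tagger.addRule("%N%", "NUM");        // hypothetical tag for replaced numbers
        for (String token : new String[] {".", "%N%", "Laden"}) {
            System.out.println(token + " -> " + tagger.tag(token));
        }
    }
}
```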
The next panel is the Rules and Patterns panel (see figure 5). It works
like the Tag System panel. It holds the patterns for finding names and the
rules for classifying unknown words in a context.

figure 5
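The sample output at the end of this manual reports a match as pattern="VN ZN NN", which suggests that a pattern for finding names is a sequence of tags. Under that assumption, applying such a pattern to a tagged sentence can be sketched as below. The class ExtractionPatternSketch and the placeholder tag "?" for untagged tokens are hypothetical, and the real pattern syntax may be richer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; assumes a pattern is a plain sequence of tags.
class ExtractionPatternSketch {

    // Returns every token span whose tags equal the space-separated pattern.
    static List<String> extract(String[] tokens, String[] tags, String pattern) {
        String[] wanted = pattern.split(" ");
        List<String> names = new ArrayList<>();
        for (int start = 0; start + wanted.length <= tags.length; start++) {
            int end = start + wanted.length;
            if (Arrays.equals(Arrays.copyOfRange(tags, start, end), wanted)) {
                names.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String[] tokens = {"Osama", "Bin", "Laden", "ist", "gesucht", "."};
        String[] tags = {"VN", "ZN", "NN", "?", "?", "PU"};
        // prints: [Osama Bin Laden]
        System.out.println(extract(tokens, tags, "VN ZN NN"));
    }
}
```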
The next panel is the Known names panel. Here you can list all names that
are already known (see figure 6). It works like the Tag System panel.
The table lists all known names.

figure 6
The last panel you have to configure is the Database Settings panel (see
figure 7).
A switch turns the write-back to the database on or off. If it is on, no
other output is possible. The panel also holds the database settings for
database input and output, the table for database input, and the output
table for the write-back result.


figure 7
Another switch turns the verification by database on or off. The database
settings for verification need a table with words, a table with sentences,
and a table that connects both tables by id.
On the Run panel you can start the tool (see figure 8).
The panel contains a text area for the output result, an option for
writing the output to a file, a Stop button to stop the algorithm while it
is running, an option for running NE recognition, a field for entering a
sentence, and a button for choosing an input file. Three buttons start the
algorithm from a sentence, from a file, or from the database.

figure 8
To start the command line version of this tool, use the following command:
java -Xmx500M -classpath .;./lib/ASV_Namerec.jar -Djava.ext.dirs=.;./lib
de.uni_leipzig.asv.toolbox.namerec.Recognizer configfile [-t -rn] [-o outfile]
db|file|sentence [filename|sentence]
configfile - path to a configuration file of this tool containing the settings for this run
-t - use the tokenizer
-rn - replace numbers
-o outfile - write output to the file outfile; if not specified, output is written to the console
db - use sentences from the database for the run (configured in configfile)
file - use sentences from the file filename for the run (the filename must follow, separated by a space)
sentence - use the specified sentence for the run (the sentence must follow, separated by a space)
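If the command line version is to be launched from another Java program, the argument list can be assembled as sketched below. The helper class NamerecCommandSketch and its method buildCommand are hypothetical. The sketch uses File.pathSeparator because the ';' in the command above is the Windows path separator, while Unix-like systems use ':'. Passing the input as a single argument is an assumption, which matters for sentence input.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the documented command line.
class NamerecCommandSketch {

    // outFile and input may be null; mode is "db", "file" or "sentence".
    static List<String> buildCommand(String configFile, boolean tokenize,
            boolean replaceNumbers, String outFile, String mode, String input) {
        String sep = File.pathSeparator; // ";" on Windows, ":" on Unix-like systems
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx500M");
        cmd.add("-classpath");
        cmd.add("." + sep + "./lib/ASV_Namerec.jar");
        cmd.add("-Djava.ext.dirs=." + sep + "./lib");
        cmd.add("de.uni_leipzig.asv.toolbox.namerec.Recognizer");
        cmd.add(configFile);
        if (tokenize) cmd.add("-t");
        if (replaceNumbers) cmd.add("-rn");
        if (outFile != null) {
            cmd.add("-o");
            cmd.add(outFile);
        }
        cmd.add(mode);
        if (input != null) cmd.add(input); // filename or sentence
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("./config/namerec/NameRec_noWriteback.cfg",
                true, false, "./examples/NameRec/namerecdb.txt", "db", null);
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```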
· Run Namerec with the configuration ./config/namerec/NameRec_noWriteback.cfg,
database input, and output to the file ./examples/NameRec/namerecdb.txt:
java -Xmx500M -classpath .;./lib/ASV_Namerec.jar -Djava.ext.dirs=.;./lib
de.uni_leipzig.asv.toolbox.namerec.Recognizer
./config/namerec/NameRec_noWriteback.cfg -o ./examples/NameRec/namerecdb.txt db
· Run Namerec with the configuration ./config/namerec/NameRec_noWriteback.cfg,
sentence input, and output to the file ./examples/NameRec/namerecdb.txt:
java -Xmx500M -classpath .;./lib/ASV_Namerec.jar -Djava.ext.dirs=.;./lib
de.uni_leipzig.asv.toolbox.namerec.Recognizer
./config/namerec/NameRec_noWriteback.cfg -o ./examples/NameRec/namerecdb.txt
sentence George Bush ist gesucht.
· Run Namerec with the configuration ./config/namerec/NameRec_noWriteback.cfg,
file input, and output to the file ./examples/NameRec/namerecdb.txt:
java -Xmx500M -classpath .;./lib/ASV_Namerec.jar -Djava.ext.dirs=.;./lib
de.uni_leipzig.asv.toolbox.namerec.Recognizer
./config/namerec/NameRec_noWriteback.cfg -o ./examples/NameRec/namerecdb.txt
file ./examples/NameRec/Namerec.txt
To use Namerec in your own program, you only need to know a few classes.
· Recognizer: this class runs the algorithm. In addition, it provides some
methods for initialising your rules.
· SatzDatasource: interface that provides access to any data source.
· Config: class for handling your configuration file.
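Going by the anonymous implementation in NamerecTest.java, a SatzDatasource hands out sentences one at a time through getNextSentence() and reports their count through getNumOfSentences(); that example returns the string "END" once its only sentence has been served. A list-backed variant could look like the sketch below. To keep it runnable without the Namerec jar it uses a stand-in interface, SentenceSource, which is hypothetical and only mirrors those two methods.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of a list-backed sentence source.
class ListDatasourceSketch {

    // Hypothetical stand-in mirroring the two SatzDatasource methods
    // used in NamerecTest.java.
    interface SentenceSource {
        String getNextSentence();
        int getNumOfSentences();
    }

    static SentenceSource fromList(final List<String> sentences) {
        return new SentenceSource() {
            private final Iterator<String> it = sentences.iterator();

            public String getNextSentence() {
                // "END" marks exhaustion, as in the NamerecTest example
                return it.hasNext() ? it.next() : "END";
            }

            public int getNumOfSentences() {
                return sentences.size();
            }
        };
    }

    public static void main(String[] args) {
        SentenceSource ds = fromList(Arrays.asList(
                "Osama Bin Laden ist gesucht.", "George Bush ist gesucht."));
        for (String s = ds.getNextSentence(); !"END".equals(s); s = ds.getNextSentence()) {
            System.out.println(s);
        }
    }
}
```

With the real library, the anonymous class would implement SatzDatasource instead and be assigned to rec.ds.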
Here is an example of a Java class (NamerecTest.java) that uses the Namerec
tool. You can find the class NamerecTest.java in the package
de.uni_leipzig.asv.toolbox.tests.
package de.uni_leipzig.asv.toolbox.tests;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Observable;
import java.util.Observer;
import java.util.Scanner;
import java.util.Vector;

import javax.swing.SwingUtilities;

import de.uni_leipzig.asv.toolbox.namerec.NameTable;
import de.uni_leipzig.asv.toolbox.namerec.Pattern;
import de.uni_leipzig.asv.toolbox.namerec.Recognizer;
import de.uni_leipzig.asv.toolbox.namerec.SatzDatasource;
import de.uni_leipzig.asv.toolbox.namerec.util.Config;
import de.uni_leipzig.asv.toolbox.namerec.util.SwingWorker;

public class NamerecTest {

    private static Vector<Pattern> extraPats;

    public static void main(String[] args) {
        final boolean tokenize = true;
        SwingWorker sw = new SwingWorker() {
            public Object construct() {
                // read the configuration
                String configFile = "./config/namerec/completedatabasetestconfig.cfg";
                Config cfg2 = null;
                try {
                    cfg2 = new Config(configFile);
                } catch (FileNotFoundException e1) {
                    e1.printStackTrace();
                    return null; // cannot continue without a configuration
                } catch (IOException e1) {
                    e1.printStackTrace();
                    return null; // cannot continue without a configuration
                }
                String parent = new File(configFile).getParent();

                // load the classification rules
                Vector<Pattern> classRules = new Vector<Pattern>();
                Scanner insc;
                try {
                    insc = new Scanner(new File(parent + "/"
                            + cfg2.getString("IN.PATFILE", "")));
                    classRules = Recognizer.loadClassRules(insc);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                    return null;
                }

                // load the extraction patterns
                extraPats = new Vector<Pattern>();
                try {
                    insc = new Scanner(new File(parent + "/"
                            + cfg2.getString("IN.PATFILENE", "")));
                    extraPats = Recognizer.loadExtractionPattern(insc);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                    return null;
                }
                cfg2.set("DB.WRITEBACK", "false");

                // prefix the file paths with the configuration directory
                cfg2.set("IN.REGEXP", parent + "/" + cfg2.getString("IN.REGEXP", ""));
                cfg2.set("IN.PATFILE", parent + "/" + cfg2.getString("IN.PATFILE", ""));
                cfg2.set("IN.PATFILENE", parent + "/"
                        + cfg2.getString("IN.PATFILENE", ""));
                cfg2.set("IN.CLASSNAMES", parent + "/"
                        + cfg2.getString("IN.CLASSNAMES", ""));
                cfg2.set("IN.KNOWLEDGE", parent + "/"
                        + cfg2.getString("IN.KNOWLEDGE", ""));

                // create the Recognizer
                Recognizer.makePatternMap(classRules, extraPats);
                Recognizer rec;
                try {
                    rec = new Recognizer(cfg2, null, classRules, extraPats, tokenize);
                } catch (IOException e1) {
                    e1.printStackTrace();
                    return null;
                }
                Recognizer.cfg2 = cfg2;
                final String sentence = "Osama Bin Laden ist gesucht.";

                // set the data source: it serves one sentence, then "END"
                rec.ds = new SatzDatasource() {
                    boolean isDone = false;

                    public String getNextSentence() {
                        if (!this.isDone) {
                            this.isDone = true;
                            return sentence;
                        }
                        return "END";
                    }

                    public int getNumOfSentences() {
                        return 1;
                    }
                };
                rec.addObserver(getObserver());
                try {
                    rec.doTheRecogBoogie(false, tokenize);
                } catch (Exception e) {
                    e.printStackTrace();
                    return null;
                }
                return rec;
            }

            public void finished() {
                System.out.println("finished");
            }

            // prints each result the Recognizer reports
            private Observer getObserver() {
                return new Observer() {
                    public void update(Observable o, Object arg) {
                        final Object[] arr = (Object[]) arg;
                        SwingUtilities.invokeLater(new Runnable() {
                            public void run() {
                                System.out.println("\n\n\nResult:"
                                        + Recognizer.outputSentenceCL((NameTable) arr[0],
                                                (String) arr[1], extraPats));
                            }
                        });
                    }
                };
            }
        };
        sw.start();
    }
}
You can run this test. Its output is shown below.
Confg:Einstellungen:
-------------
Klassen: .\config\namerec//additional/completedatabasetestconfig.cfg.tagsystem
Wissen Items: .\config\namerec//additional/completedatabasetestconfig.cfg.knowledgeItems
Wissen Regexp: .\config\namerec//additional/completedatabasetestconfig.cfg.regex
Wissen Regeln: .\config\namerec//additional/completedatabasetestconfig.cfg.rules
Regeln für NEs .\config\namerec//additional/completedatabasetestconfig.cfg.extraPats
Anzahl Sätze zur Kandidatenüberprüfung 30
Threshhold Anerkennung Item 0.15
Beginne bei Satz: 0
Ende bei Satz: -1
Datei für neue Items:
Datei für eventuelle Items: maybes-de.txt
Datei für Kontexte, wenn Regeln irgendwie zuschlagen:
Datei für komplett bekannte Namen: NEs-de.txt
Anzahl der Verifikationsthreads: 10
Could not connect! You can not use database option.
Could not connect! You can not use database option.
Initializing basetagger...
Number of Rules: 65
0: Osama Bin Laden ist gesucht.
Knowledge: Bin=ZN
Knowledge: Osama=VN
Knowledge: .=PU
Knowledge: Laden=NN
verification done!
finished
Result:<person pattern="VN ZN NN">Osama Bin Laden</person> ist gesucht .
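The Result line marks each recognized name with an element that carries the matching pattern. If the output is to be processed further, the names can be pulled out with a regular expression, as in the sketch below. The class ResultParserSketch is hypothetical, and the expression assumes that every result element looks like the single example above (one pattern attribute, no nesting).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; assumes the result format shown in the sample output.
class ResultParserSketch {

    // group 1: element name, group 2: pattern attribute, group 3: marked text
    private static final Pattern ENTITY =
            Pattern.compile("<(\\w+) pattern=\"([^\"]*)\">(.*?)</\\1>");

    // Returns one {type, pattern, name} triple per marked name in the line.
    static List<String[]> parse(String line) {
        List<String[]> entities = new ArrayList<>();
        Matcher m = ENTITY.matcher(line);
        while (m.find()) {
            entities.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return entities;
    }

    public static void main(String[] args) {
        String line = "<person pattern=\"VN ZN NN\">Osama Bin Laden</person> ist gesucht .";
        for (String[] e : parse(line)) {
            // prints: person: Osama Bin Laden (VN ZN NN)
            System.out.println(e[0] + ": " + e[2] + " (" + e[1] + ")");
        }
    }
}
```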