Documentation of Pendulum


 

Installation

Introduction

How to use the Gui Version

Loading and saving a configuration from/to file

Configure the tool by your own

Starting the tool

How to use the Command Line Version

Command

Options

Examples

 

Installation

A description of how to install a module is available on the main page of the ASV Toolbox project.

The line you have to copy into the toolbox.start file looks like this:

de.uni_leipzig.asv.toolbox.pendel.PendelPanel

Introduction

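This tool finds named entities by bootstrapping: starting from a set of already classified items, it repeatedly uses the known items to find and classify new ones.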
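Bootstrapping in general can be pictured as a worklist loop: begin with the already classified start items, use each known item to find further items, and feed every newly accepted item back into the loop until nothing new turns up. The Java sketch below illustrates only that general shape under simplifying assumptions. It is not Pendulum's implementation, and the `findCandidates` function is a hypothetical stand-in for the corpus search, rule application and verification the tool actually performs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class BootstrapSketch {

    /**
     * Generic bootstrapping loop (illustration only): starting from the already
     * classified start items, ask findCandidates which items each known item leads to,
     * add every item not seen before, and repeat until nothing new appears.
     * findCandidates is a hypothetical placeholder, not part of the tool.
     */
    static Map<String, String> bootstrap(Map<String, String> startItems,
                                         Function<String, Map<String, String>> findCandidates) {
        Map<String, String> known = new LinkedHashMap<>(startItems);    // item -> class
        Deque<String> worklist = new ArrayDeque<>(startItems.keySet()); // items still to use
        while (!worklist.isEmpty()) {
            String item = worklist.poll();
            for (Map.Entry<String, String> candidate : findCandidates.apply(item).entrySet()) {
                // putIfAbsent returns null only for items that were not known yet.
                if (known.putIfAbsent(candidate.getKey(), candidate.getValue()) == null) {
                    worklist.add(candidate.getKey());
                }
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Made-up toy data: Angela leads to Gerhard and Gerhard back to Angela.
        Map<String, Map<String, String>> leadsTo = Map.of(
            "Angela", Map.of("Gerhard", "VN"),
            "Gerhard", Map.of("Angela", "VN"));
        Map<String, String> result = bootstrap(
            Map.of("Angela", "VN"),
            item -> leadsTo.getOrDefault(item, Map.of()));
        System.out.println(result); // {Angela=VN, Gerhard=VN}
    }
}
```

The set of known items is what makes the loop terminate: an item is queued only the first time it is found, so a cycle in which Angela leads to Gerhard and Gerhard back to Angela stops by itself.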

How to use the Gui Version

To use the tool you need a database containing three tables: a table of words, a table of sentences, and a table linking the words to the sentences.

Before starting the bootstrapping process you have to configure the tool. You can either load a configuration file or set the configuration yourself.

Loading and saving a configuration from/to file:

The File Management panel contains two buttons: one for loading a configuration and one for saving the current configuration (see figure 1). After loading a configuration you can still change it; see “Configure the tool by your own”.

This button saves the current configuration. Clicking it opens a file save dialog in which you can save the configuration to a file in any directory.

 

This button loads a configuration file. Clicking it opens a file open dialog in which you can choose a configuration file from any directory.

 
File Management panel: configuration files

figure 1

 

Configure the tool by your own:

There are five panels in which you can change settings. Let us begin with the first one, the File Management panel. Here you choose which output files the bootstrapping process should produce. Select the check box in front of each file you want; for every selected file a file dialog opens in which you choose the output file (figures 2 and 3).

Example of one of the file dialogs, here the one for the log file. The file is named logpendel.txt and is placed in the main directory of the toolbox. (You do not have to use this name or a directory of the toolbox; other file names and directories are also possible.)

 
example file dialog

New Items file: This file stores all named entities found during the bootstrapping process. An entry looks like this: Gerhard      VN    11/29 Angela

(Gerhard = new item, VN = classification of the new item, 11/29 = classified this way in 11 of 29 cases, Angela = the item that led to finding the new item)

 
figure 2
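If you want to post-process the New Items file, a small parser helps. The sketch below assumes, based only on the example entry above, that each line holds four whitespace-separated fields (item, class, hits/total, source item) and that the item itself contains no spaces. The names `NewItemEntry`, `Entry` and `ratio` are hypothetical and not part of the tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NewItemEntry {

    /** One parsed line of the New Items file (field names are hypothetical). */
    record Entry(String item, String itemClass, int hits, int total, String sourceItem) {
        /** Share of cases in which the item was classified this way, e.g. 11/29. */
        double ratio() {
            return (double) hits / total;
        }
    }

    // Assumed layout: item, class, hits/total, source item, separated by whitespace.
    private static final Pattern LINE =
        Pattern.compile("(\\S+)\\s+(\\S+)\\s+(\\d+)/(\\d+)\\s+(\\S+)");

    static Entry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected entry: " + line);
        }
        return new Entry(m.group(1), m.group(2),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)), m.group(5));
    }

    public static void main(String[] args) {
        Entry e = parse("Gerhard      VN    11/29 Angela");
        System.out.println(e.item() + " " + e.itemClass() + " " + e.hits() + "/" + e.total()
            + " via " + e.sourceItem()); // prints "Gerhard VN 11/29 via Angela"
    }
}
```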

This file lists the extraction pattern that turned classified words into a named entity. An entry looks like this: Angela Merkel VN NN->name

(Angela Merkel = named entity, VN NN->name = extraction pattern that was used)

 

This file records the rule that caused an item to be classified the way it is. An entry looks like this: Angela(VN) Merkel(?NN) VN GR*->NN

(item(class) = item already in the knowledge base with this classification, item(?class) = item found by the rule of this entry, VN GR*->NN = rule that caused the classification)

 
example: all output files selected

This file stores the log. Everything concerning new items, maybe items, … is logged, so you can trace why an item was found or why it did not become a new item.

 

This file is very similar to the new items file, and its entries have the same format. However, it lists only maybe items, that is, items for which there is not enough data to be sure that the classification is correct.

 
figure 3
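Entries of the rule file can be split in a similar way. This hedged sketch assumes, from the single example entry Angela(VN) Merkel(?NN) VN GR*->NN, that an entry starts with one or more `word(class)` or `word(?class)` tokens and that all remaining tokens form the rule. `RuleLogEntry` and its record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleLogEntry {

    /** A word and its class; inferred is true for "(?class)" words found by the rule. */
    record ClassifiedWord(String word, String wordClass, boolean inferred) {}

    /** One parsed entry: the classified words followed by the rule that was applied. */
    record Entry(List<ClassifiedWord> words, String rule) {}

    // Matches tokens such as "Angela(VN)" or "Merkel(?NN)".
    private static final Pattern WORD = Pattern.compile("(.+)\\((\\?)?([^()?]+)\\)");

    static Entry parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<ClassifiedWord> words = new ArrayList<>();
        int i = 0;
        // Leading word(class) tokens are the items; the remaining tokens form the rule.
        while (i < tokens.length) {
            Matcher m = WORD.matcher(tokens[i]);
            if (!m.matches()) {
                break;
            }
            words.add(new ClassifiedWord(m.group(1), m.group(3), m.group(2) != null));
            i++;
        }
        String rule = String.join(" ", Arrays.copyOfRange(tokens, i, tokens.length));
        return new Entry(List.copyOf(words), rule);
    }

    public static void main(String[] args) {
        Entry e = parse("Angela(VN) Merkel(?NN) VN GR*->NN");
        for (ClassifiedWord w : e.words()) {
            System.out.println(w.word() + " " + w.wordClass() + (w.inferred() ? " (new)" : " (known)"));
        }
        System.out.println("rule: " + e.rule());
    }
}
```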

 

 

 

 

The next panel is the Parameters and Settings panel. Here you configure the database and the parameters of the bootstrapping algorithm (see figures 4 and 5).

Configure the settings for your database here.

Configure the settings for the word table here: in the first drop-down menu choose the table name, in the second the ID column of the words, in the third the column containing the word, and in the last the column containing the frequency of the word in the corpus you use.

These are the sentence table settings. Choose the table containing the sentences (first drop-down menu), the column with the sentence IDs (second drop-down menu) and the column containing the sentences (last drop-down menu).

The last table to configure is the table linking the word and sentence tables. Choose, in this order, the table, the column with the word IDs and the column with the sentence IDs.

database settings

figure 4
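To make the three-table layout concrete, the following Java sketch models one row type per table and follows a word through the linking table to the sentences that contain it, which is what an SQL join over the configured columns would do. All names (`WordRow`, `SentenceRow`, `LinkRow`, `sentencesFor`) are hypothetical, the data is made up, and the `maxSentences` limit only loosely mirrors the tool's "maximum number of sentences" parameters. Pendulum itself reads these tables from your database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

class CorpusTables {

    // One row type per table; field names are hypothetical, in the tool you pick the real columns.
    record WordRow(int id, String word, long frequency) {}
    record SentenceRow(int id, String sentence) {}
    record LinkRow(int wordId, int sentenceId) {}

    /**
     * Returns at most maxSentences sentences containing the given word by following
     * word table -> linking table -> sentence table (an in-memory join).
     */
    static List<String> sentencesFor(String word, int maxSentences, List<WordRow> words,
                                     List<SentenceRow> sentences, List<LinkRow> links) {
        Optional<WordRow> row = words.stream().filter(w -> w.word().equals(word)).findFirst();
        if (row.isEmpty()) {
            return List.of();
        }
        int wordId = row.get().id();
        Map<Integer, String> sentenceById = new HashMap<>();
        for (SentenceRow s : sentences) {
            sentenceById.put(s.id(), s.sentence());
        }
        return links.stream()
            .filter(l -> l.wordId() == wordId)
            .map(l -> sentenceById.get(l.sentenceId()))
            .filter(Objects::nonNull) // skip links pointing to missing sentences
            .limit(maxSentences)
            .toList();
    }

    public static void main(String[] args) {
        List<WordRow> words = List.of(new WordRow(1, "Angela", 120), new WordRow(2, "Merkel", 95));
        List<SentenceRow> sentences = List.of(
            new SentenceRow(10, "Angela Merkel spoke in Berlin."),
            new SentenceRow(11, "Angela smiled."),
            new SentenceRow(12, "Merkel left early."));
        List<LinkRow> links = List.of(
            new LinkRow(1, 10), new LinkRow(2, 10), new LinkRow(1, 11), new LinkRow(2, 12));
        System.out.println(sentencesFor("Angela", 5, words, sentences, links));
        // [Angela Merkel spoke in Berlin., Angela smiled.]
    }
}
```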

 

Use this button to get the default settings for the parameters.

 
 

 


Minimum frequency a word must have in the corpus to become a new item.

 

Threshold that must be exceeded for an item to be accepted.

 

Maximum number of sentences used for verification.

 

Select the check box if you want to use the internal tokenizer.

 

Maximum number of sentences in which a word is searched for.

 
parameter settings

figure 5
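How these parameters interact is not specified here. As one plausible reading, the sketch below assumes that a candidate must reach the minimum corpus frequency and that the threshold is compared with the share of supporting classifications, such as the 11/29 shown in the New Items file. The parameter values are made up (they are not the tool's defaults), and `AcceptanceCheck` and `Parameters` are hypothetical names.

```java
class AcceptanceCheck {

    /** Hypothetical holder for two of the panel's parameters (names are our own). */
    record Parameters(long minWordCount, double threshold) {}

    /**
     * Assumed acceptance test: the word must be frequent enough in the corpus and the
     * share of supporting classifications (hits out of total) must exceed the threshold.
     */
    static boolean accept(Parameters p, long corpusFrequency, int hits, int total) {
        if (corpusFrequency < p.minWordCount() || total == 0) {
            return false;
        }
        return (double) hits / total > p.threshold();
    }

    public static void main(String[] args) {
        Parameters p = new Parameters(5, 0.3);        // made-up values
        System.out.println(accept(p, 40, 11, 29));    // true: 11/29 is about 0.38 > 0.3
        System.out.println(accept(p, 40, 11, 58));    // false: 11/58 is about 0.19
        System.out.println(accept(p, 3, 11, 29));     // false: word too rare in the corpus
    }
}
```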

The next panel is the Rules and Patterns panel. Here you can add, delete and save rules and patterns (see figures 6 and 7).

Button to add rules from a file. A file dialog opens in which you choose the file. When you save the configuration, the rules are saved in a file ending in .rules.

Select a rule in the table above and click on this button to delete this rule.

 

Here you can add a new rule: enter the rule and click the add button.

 

Table containing all rules that will be used.

 

Button to save the rules to file.

 

Button to delete all rules.

 
configure the classification rules

figure 6
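The rule syntax is not documented here, so the following is only an interpretation of the example rule VN GR*->NN from the rule file: we assume the left side is a sequence of tags that must appear on consecutive words, the `*` marks the word that receives the class on the right side, and GR is a tag assigned to capitalized words. Under these assumptions, Merkel in "Angela Merkel" would be classified NN because Angela is a known VN. The Java sketch (`ClassRuleSketch`, `Rule`, `Token`) is hypothetical and not Pendulum's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ClassRuleSketch {

    /** A rule such as "VN GR*->NN": tag sequence, index of the starred tag, new class. */
    record Rule(List<String> tags, int target, String newClass) {

        static Rule parse(String text) {
            String[] sides = text.split("->");
            if (sides.length != 2) {
                throw new IllegalArgumentException("Missing '->': " + text);
            }
            List<String> tags = new ArrayList<>();
            int target = -1;
            String[] parts = sides[0].trim().split("\\s+");
            for (int i = 0; i < parts.length; i++) {
                if (parts[i].endsWith("*")) { // assumed: '*' marks the word to classify
                    target = i;
                    tags.add(parts[i].substring(0, parts[i].length() - 1));
                } else {
                    tags.add(parts[i]);
                }
            }
            if (target < 0) {
                throw new IllegalArgumentException("No starred position: " + text);
            }
            return new Rule(List.copyOf(tags), target, sides[1].trim());
        }
    }

    /** A word together with all tags or classes currently known for it. */
    record Token(String word, Set<String> tags) {}

    /** Returns the words that the rule would classify as rule.newClass() in this sentence. */
    static List<String> apply(Rule rule, List<Token> sentence) {
        List<String> found = new ArrayList<>();
        int n = rule.tags().size();
        for (int start = 0; start + n <= sentence.size(); start++) {
            boolean matches = true;
            for (int k = 0; k < n && matches; k++) {
                matches = sentence.get(start + k).tags().contains(rule.tags().get(k));
            }
            if (matches) {
                found.add(sentence.get(start + rule.target()).word());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Rule rule = Rule.parse("VN GR*->NN");
        List<Token> sentence = List.of(
            new Token("Kanzlerin", Set.of("GR")),
            new Token("Angela", Set.of("VN", "GR")),
            new Token("Merkel", Set.of("GR")),
            new Token("sagte", Set.of()));
        System.out.println(apply(rule, sentence)); // [Merkel]
    }
}
```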

 

This is the extraction pattern part. It works like the classification rules part above.

 
configure the extraction pattern

figure 7
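Extraction patterns such as VN NN->name (compare the example entry for Angela Merkel in the output files) can be read in a similar way. Assuming that the left side lists the classes of consecutive words and the right side names the kind of entity, applying one pattern to a sentence could look like the sketch below. The names `ExtractionPatternSketch` and `ExtractionPattern` and the example sentence are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ExtractionPatternSketch {

    /** A pattern such as "VN NN->name": class sequence and the label of the entity. */
    record ExtractionPattern(List<String> classes, String label) {

        static ExtractionPattern parse(String text) {
            String[] sides = text.split("->");
            if (sides.length != 2) {
                throw new IllegalArgumentException("Missing '->': " + text);
            }
            return new ExtractionPattern(List.of(sides[0].trim().split("\\s+")), sides[1].trim());
        }
    }

    /**
     * Returns every run of consecutive words whose known classes (from the classes map)
     * match the pattern, joined into one entity string.
     */
    static List<String> extract(ExtractionPattern p, List<String> words, Map<String, String> classes) {
        List<String> entities = new ArrayList<>();
        int n = p.classes().size();
        for (int start = 0; start + n <= words.size(); start++) {
            boolean matches = true;
            for (int k = 0; k < n && matches; k++) {
                // Unclassified words have no entry, so get() returns null and never matches.
                matches = p.classes().get(k).equals(classes.get(words.get(start + k)));
            }
            if (matches) {
                entities.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        ExtractionPattern p = ExtractionPattern.parse("VN NN->name");
        Map<String, String> classes = Map.of("Angela", "VN", "Merkel", "NN");
        List<String> words = List.of("Heute", "sprach", "Angela", "Merkel", "in", "Berlin");
        for (String entity : extract(p, words, classes)) {
            System.out.println(entity + " -> " + p.label()); // prints "Angela Merkel -> name"
        }
    }
}
```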

The next panel is the Input Items panel. Here you can add start items, i.e. items that are already classified. Additionally, you can add some background knowledge items; these are used for classification and extraction but are not listed in the item list at the end. This panel works like the classification rules and extraction pattern parts. The panel may look like figure 8.

Here you can configure the background knowledge items.

 

Here you can configure the start items.

 
example of the input item panel

figure 8
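The difference between start items and background knowledge items can be modelled as two collections that are both consulted during classification and extraction, with only one of them reported at the end. This is a hypothetical sketch of that behaviour (`KnowledgeBase` and its methods are our own names), not the tool's data structure.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class KnowledgeBase {

    private final Map<String, String> background = new LinkedHashMap<>(); // item -> class
    private final Map<String, String> listed = new LinkedHashMap<>();     // start and new items

    void addBackgroundItem(String item, String itemClass) {
        background.put(item, itemClass);
    }

    void addListedItem(String item, String itemClass) {
        listed.put(item, itemClass);
    }

    /** Lookup used for classification and extraction: background items count too. */
    Optional<String> classOf(String item) {
        String c = listed.get(item);
        return Optional.ofNullable(c != null ? c : background.get(item));
    }

    /** Items shown in the final item list: background items are left out. */
    Map<String, String> reportedItems() {
        return Collections.unmodifiableMap(listed);
    }

    public static void main(String[] args) {
        KnowledgeBase kb = new KnowledgeBase();
        kb.addListedItem("Angela", "VN");       // start item
        kb.addBackgroundItem("Helmut", "VN");   // background knowledge item
        System.out.println(kb.classOf("Helmut")); // Optional[VN]
        System.out.println(kb.reportedItems());   // {Angela=VN}
    }
}
```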

The next panel is the Tag System panel. Here you configure the tag encoding and the regular expression tagging. Regular expression tagging means that you can use a regular expression to find candidates for new items. The panel may look like figure 9; it works like the Rules and Patterns panel.

Auto Fill button to fill out the table automatically.

 

tag encode panel

 

Regular expression panel

 
tag encode panel

figure 9
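Regular expression tagging can be sketched with `java.util.regex`: every token whose full text matches an expression receives that expression's tag, which marks it as a possible candidate. The tag names GR and NUM and the expressions are hypothetical examples, not entries from the tool's configuration, and `RegexTagger` is our own name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class RegexTagger {

    /** One tagging rule: tokens fully matching the pattern receive the tag. */
    record TagRule(Pattern pattern, String tag) {}

    private final List<TagRule> rules = new ArrayList<>();

    RegexTagger add(String regex, String tag) {
        rules.add(new TagRule(Pattern.compile(regex), tag));
        return this;
    }

    /** Returns, for each token, the tags of all expressions that match the whole token. */
    List<Set<String>> tag(List<String> tokens) {
        List<Set<String>> result = new ArrayList<>();
        for (String token : tokens) {
            Set<String> tags = new LinkedHashSet<>();
            for (TagRule r : rules) {
                if (r.pattern().matcher(token).matches()) {
                    tags.add(r.tag());
                }
            }
            result.add(tags);
        }
        return result;
    }

    public static void main(String[] args) {
        RegexTagger tagger = new RegexTagger()
            .add("\\p{Lu}\\p{Ll}+", "GR") // capitalized word (hypothetical tag name)
            .add("\\d+", "NUM");          // number (hypothetical tag name)
        System.out.println(tagger.tag(List.of("Angela", "Merkel", "sagte", "2002")));
        // [[GR], [GR], [], [NUM]]
    }
}
```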

Starting the tool:

After configuring the tool you can start searching for named entities (see figure 10).

Button to stop the algorithm.

 

All items that have been found or are in the start item list.

 

Button to pause the algorithm.

 

Table containing the unused items.

 

Button to start the algorithm.

 
started algorithm

figure 10

 

How to use the Command Line Version

Command:

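To start the command line version of this tool, use the following command (the path separator ; in -classpath and -Djava.ext.dirs is the Windows one; on Linux and macOS use : instead):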
java -Xmx500M -classpath .;./lib/ASV_Pendulum.jar -Djava.ext.dirs=.;./lib de.uni_leipzig.asv.toolbox.pendel.PendelCL configfile -o outputfile [-t]

Options:

configfile path to the configuration file which should be used for the run

-o outputfile path to the file to which the output is written

-t use the internal tokenizer
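The following Java sketch shows how the documented arguments fit together: a configuration file, -o followed by the output file, and the optional -t. It is not PendelCL's own argument handling, and the file names pendel.conf and result.txt are made up.

```java
import java.util.List;

class PendelArgs {

    /** Parsed form of "configfile -o outputfile [-t]" (record name is hypothetical). */
    record Options(String configFile, String outputFile, boolean internalTokenizer) {}

    static Options parse(List<String> args) {
        String config = null;
        String output = null;
        boolean tokenizer = false;
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            switch (arg) {
                case "-o" -> {
                    if (i + 1 >= args.size()) {
                        throw new IllegalArgumentException("-o needs an output file");
                    }
                    output = args.get(++i); // consume the file name following -o
                }
                case "-t" -> tokenizer = true;
                default -> {
                    if (config != null) {
                        throw new IllegalArgumentException("Unexpected argument: " + arg);
                    }
                    config = arg; // the single positional argument is the config file
                }
            }
        }
        if (config == null || output == null) {
            throw new IllegalArgumentException("usage: configfile -o outputfile [-t]");
        }
        return new Options(config, output, tokenizer);
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("pendel.conf", "-o", "result.txt", "-t")));
        // Options[configFile=pendel.conf, outputFile=result.txt, internalTokenizer=true]
    }
}
```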

 

Examples:

 

 

Reference

This is an implementation of the bootstrapping method described in

 

Quasthoff, U.; Biemann, Chr.: Named entity learning and verification: EM in large corpora. In: Proceedings of CoNLL-2002, The Sixth Workshop on Computational Language Learning, 31 August and 1 September 2002, in association with COLING 2002, Taipei, Taiwan.

 
