How to use the Command Line Version
How you can use the Zipfel-Tool in your own program
A description of how to install a module is available on the main page of the ASV Toolbox project.
The line you have to copy into the toolbox.start file looks like this:
de.uni_leipzig.asv.toolbox.zipfel.ZipfelPlugin
This tool demonstrates Zipf's law. For a better understanding, here is the definition of Zipf's law:
“Originally, Zipf's law stated that,
in a corpus of natural language utterances, the frequency of any word is
roughly inversely proportional to its rank in the frequency table. So, the most
frequent word will occur approximately twice as often as the second most
frequent word, which occurs twice as often as the fourth most frequent word,
etc.” wiki[01]
The tool takes its input from a file, a database, or plain text input. It extracts the words from the given input and calculates their frequencies.
It then shows you the result in a table or a diagram.
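The word counting step described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the tool's actual implementation: the class name `WordFrequencySketch`, the rule that a word is a run of letters, and the lower-casing are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordFrequencySketch {

    /** Counts how often each lower-cased word occurs in the text. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: a word is a maximal run of letters; the tool's tokenizer may differ.
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the entries sorted by descending count; list index + 1 is the rank. */
    static List<Map.Entry<String, Integer>> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> ranked =
                rank(countWords("the cat and the dog and the bird"));
        // Print rank, word and count, separated by tabs.
        for (int i = 0; i < ranked.size(); i++) {
            Map.Entry<String, Integer> entry = ranked.get(i);
            System.out.println((i + 1) + "\t" + entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Words with the same count keep the order in which they first appeared, because the sort is stable.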
Here you can find a direct link to the help text you are reading now. You can also see the current version of the Zipfel program and its authors.

On the input panel, you have three options for input and one option for the settings.
If you have a file containing the text you want to analyze, go to this panel and click on the “browse” button. Browse for the file on your local machine.

If everything is OK with your file, you can click on the start button. A popup is shown for as long as the program analyzes your input.
The data can also be taken from a database. On this panel, you have to specify the parameters for the database connection.
The standard settings for LCC databases are shown below:

First, fill out the fields “host”, “port”, “name”, “password” and “database”. After that, click on the connect button. The plug-in loads the names of all available tables, including their columns. It assumes a table with word forms and frequencies.
If the program succeeded in loading this information, you can choose the proper table and the word and frequency columns.
Once you have chosen the right parameters, you can click on the start button. A popup is shown for as long as the program analyzes your input.
You can copy and paste any text you want into this field. Then click on the start button. A popup is shown for as long as the program analyzes your input.

On this panel you can change the settings for the tool.

Here you can choose the following parameters:
- show straight line: if this value is set, the straight line is shown
- show boundaries: if this value is set, the boundaries are shown in the diagram panel
- lower boundary: here you can choose the lower boundary (if the value is set, you can change it by dragging the arrow or by clicking the “Input” button and typing the value directly)
- upper boundary: here you can choose the upper boundary (if the value is set, you can change it by dragging the arrow or by clicking the “Input” button and typing the value directly)
On this panel you see the statistics after your input has been analyzed.

The output in detail:
- number of types (different words): how many different words (types) are in the input
- number of tokens (including frequency): how many tokens are in the input
- text dependent constant k: the mean of the text dependent constants k, which are calculated for every word with the formula k = r * n, where r is the rank of the word and n is the count of the word
- language dependent constant c: the mean of the language dependent constants c, which are calculated with the formula c = r_n * n / N, where n is the count of the word, N is the number of all words, and r_n is the highest rank among all words with frequency n
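The first three statistics above can be written out as a small Java sketch. It is a hedged illustration, not the tool's code: the class and method names are made up for this example, the input is assumed to be a non-empty array of word counts sorted in descending order, and the rank r of a word is assumed to be its 1-based position in that array (how the tool ranks words with equal counts is not stated here).

```java
class ZipfStatisticsSketch {

    /** Number of types: one array entry per different word. */
    static int types(int[] sortedCounts) {
        return sortedCounts.length;
    }

    /** Number of tokens: the sum of all word counts. */
    static long tokens(int[] sortedCounts) {
        long sum = 0;
        for (int count : sortedCounts) {
            sum += count;
        }
        return sum;
    }

    /** Mean of k = r * n over all words, with r = 1-based position (assumption). */
    static double meanK(int[] sortedCounts) {
        double sum = 0;
        for (int i = 0; i < sortedCounts.length; i++) {
            sum += (double) (i + 1) * sortedCounts[i];
        }
        return sum / sortedCounts.length;
    }

    public static void main(String[] args) {
        // Counts for "the cat and the dog and the bird": the=3, and=2, cat=dog=bird=1.
        int[] counts = {3, 2, 1, 1, 1};
        System.out.println("types  = " + types(counts));
        System.out.println("tokens = " + tokens(counts));
        System.out.println("mean k = " + meanK(counts));
    }
}
```

For the sample counts, the per-word values of k are 3, 4, 3, 4 and 5, so the mean is 19 / 5.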
On this panel you see the result as a list of all words in the input. The words are ordered by their count.

The output in detail:
- relative commonness: the relative frequency of the word, calculated with the formula n / N, where n is the count of the word and N is the number of all words
- text dependent constant k: the text dependent constant k of the word, calculated with the formula k = r * n, where r is the rank of the word and n is the count of the word
- language dependent constant c: the language dependent constant c of the word, calculated with the formula c = r_n * n / N, where n is the count of the word, N is the number of all words, and r_n is the highest rank among all words that occur n times
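The role of r_n is easiest to see in code. The following Java sketch computes, for a descending list of word counts, the highest rank shared by each count and from it the relative commonness and c. The names are hypothetical, N is assumed to be the total number of tokens, and ranks are assumed to be 1-based list positions; the tool itself may handle these details differently.

```java
class ZipfTableSketch {

    /** For each position, the highest 1-based rank among words with the same count. */
    static int[] highestRanks(int[] sortedCounts) {
        int[] rn = new int[sortedCounts.length];
        int last = sortedCounts.length;
        // Walk from the lowest count upwards; a change of count starts a new group.
        for (int i = sortedCounts.length - 1; i >= 0; i--) {
            if (i < sortedCounts.length - 1 && sortedCounts[i] != sortedCounts[i + 1]) {
                last = i + 1;
            }
            rn[i] = last;
        }
        return rn;
    }

    /** Relative commonness n / N. */
    static double relativeCommonness(int n, long total) {
        return (double) n / total;
    }

    /** Language dependent constant c = r_n * n / N. */
    static double c(int rn, int n, long total) {
        return (double) rn * n / total;
    }

    public static void main(String[] args) {
        int[] counts = {3, 2, 1, 1, 1};
        long total = 0;
        for (int n : counts) {
            total += n;
        }
        int[] rn = highestRanks(counts);
        // One row per word: count, r_n, relative commonness, c.
        for (int i = 0; i < counts.length; i++) {
            System.out.println(counts[i] + "\t" + rn[i] + "\t"
                    + relativeCommonness(counts[i], total) + "\t"
                    + c(rn[i], counts[i], total));
        }
    }
}
```

In the sample, the three words that occur once all share r_n = 5, so they receive the same value of c.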
This panel shows you a selection of some words, down to the lowest rank.


Here you can export the results. There are three options:
- Export as open document table: exports the result to an OpenDocument table
- Export as CSV file: if you click on this button, the result is written to a CSV file (comma-separated values)
- Diagram as .png: if you click on this button, the diagram is saved as a PNG file (a graphics format)

This panel shows you the result as a diagram. The x axis represents the rank of a word and the y axis represents the count n of a word. Both axes are scaled logarithmically.
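On logarithmic axes, text that follows Zipf's law lies close to a straight line with a slope near -1. The Java sketch below fits such a line to the points (log10 rank, log10 count) with ordinary least squares. It is an assumption that a fit of this kind resembles what the tool draws when “show straight line” is set; the help text does not say how the line is computed, and the class name is hypothetical.

```java
class LogLogFitSketch {

    /**
     * Fits y = slope * x + intercept through the points
     * (log10(rank), log10(count)) and returns {slope, intercept}.
     * Expects at least two strictly positive counts in descending order.
     */
    static double[] fit(int[] sortedCounts) {
        int m = sortedCounts.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log10(i + 1);           // rank is the 1-based position
            double y = Math.log10(sortedCounts[i]);
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumXY += x * y;
        }
        double slope = (m * sumXY - sumX * sumY) / (m * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / m;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        // Idealized counts of the form 120 / rank.
        double[] line = fit(new int[] {120, 60, 40, 30, 24});
        System.out.println("slope = " + line[0] + ", intercept = " + line[1]);
    }
}
```

For the idealized counts in `main`, the fitted slope is -1 and the intercept is log10(120), up to floating-point rounding.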

java -classpath .;./lib/ASV_Zipfel.jar -Djava.ext.dirs=.;./lib de.uni_leipzig.toolbox.zipfel.ZipfelCmdLine [parameter …]
-host <host>
    host of your database
-port <port; default=3306>
    port of the database
-user <user>
    your user name for the database
-pw <password>
    your password for the database
-db <database>
    database containing the data for Zipfel
-table <database table>
    table on which Zipfel should run
-wordcol <column containing words or word numbers>
    column containing the words or their word numbers for Zipfel
-countcol <column containing frequency>
    column containing the frequency of the words
-where <restriction using where; embed in quotation marks>
    this option allows you to specify a WHERE clause that restricts the data
-filenamediagram <output file for diagram in png format>
    absolute path of the output file in PNG format
-filenametablecsv <output file for table in csv format>
    absolute path of the output file in CSV format
-filenametableods <output file for table in open document format>
    absolute path of the output file in OpenDocument format
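To make the database options more concrete, the following Java sketch shows how such parameters could be combined into a JDBC URL and a SELECT statement. Everything here is an assumption made for illustration: the default port 3306 suggests MySQL, but this help text does not show the tool's real connection code or query; the class, the table name `words` and the column names `word` and `freq` are hypothetical; and the `-where` value is assumed to be the condition without the WHERE keyword.

```java
class ZipfelQuerySketch {

    /** Builds a MySQL-style JDBC URL from -host, -port and -db (assumed format). */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    /**
     * Builds a SELECT over the word and count columns of one table.
     * Identifiers are concatenated as given, so this is only suitable
     * for trusted input such as your own command line arguments.
     */
    static String selectStatement(String table, String wordCol, String countCol, String where) {
        String sql = "SELECT " + wordCol + ", " + countCol + " FROM " + table;
        if (where != null && !where.trim().isEmpty()) {
            sql += " WHERE " + where.trim();
        }
        return sql;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for -host, -port, -db, -table, -wordcol, -countcol, -where.
        System.out.println(jdbcUrl("localhost", 3306, "corpus"));
        System.out.println(selectStatement("words", "word", "freq", "freq > 1"));
    }
}
```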
The command line output has the following format:
[restriction of the selection] TAB types TAB tokens TAB k TAB c TAB a TAB b
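A program that calls the command line version can split such a line at the tab characters. The Java sketch below does this under a few assumptions: the class `ZipfelOutputLine` is hypothetical, types and tokens are assumed to be printed as integers, the remaining values are assumed to use a decimal point, and the fields a and b are kept under the names used in the format above because this help text does not explain them. The sample line in `main` is invented for the example and is not real tool output.

```java
class ZipfelOutputLine {
    final String restriction;
    final long types;
    final long tokens;
    final double k;
    final double c;
    final double a;
    final double b;

    private ZipfelOutputLine(String restriction, long types, long tokens,
                             double k, double c, double a, double b) {
        this.restriction = restriction;
        this.types = types;
        this.tokens = tokens;
        this.k = k;
        this.c = c;
        this.a = a;
        this.b = b;
    }

    /** Parses one output line with seven tab-separated fields. */
    static ZipfelOutputLine parse(String line) {
        // A limit of -1 keeps empty fields, e.g. an empty restriction.
        String[] f = line.split("\t", -1);
        if (f.length != 7) {
            throw new IllegalArgumentException(
                    "expected 7 tab-separated fields, got " + f.length);
        }
        return new ZipfelOutputLine(f[0],
                Long.parseLong(f[1].trim()), Long.parseLong(f[2].trim()),
                Double.parseDouble(f[3].trim()), Double.parseDouble(f[4].trim()),
                Double.parseDouble(f[5].trim()), Double.parseDouble(f[6].trim()));
    }

    public static void main(String[] args) {
        // Invented sample line, only to show the field order.
        ZipfelOutputLine out = parse("freq > 1\t5\t8\t3.8\t0.5\t-1.0\t2.0");
        System.out.println(out.types + " types, " + out.tokens + " tokens, k = " + out.k);
    }
}
```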
This example shows how you can use the tool in your own program:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Hashtable;

import de.uni_leipzig.asv.toolbox.zipfel.Wort;
import de.uni_leipzig.asv.toolbox.zipfel.Zipfel;
import de.uni_leipzig.asv.toolbox.zipfel.ZipfelErgebnis;

/**
 * Counts the words of input.txt with the Zipfel classes and prints the result.
 *
 * @author madaxe
 */
public class ZipfelTest {

    /**
     * Opens the text file to analyze.
     *
     * @return a stream over input.txt in the working directory
     * @throws FileNotFoundException if input.txt does not exist
     */
    private InputStream getInput() throws FileNotFoundException {
        return new FileInputStream(new File("input.txt"));
    }

    /**
     * Processes the input, prints every word with its count, and then
     * prints the total reported by ZipfelErgebnis.
     */
    private void run() {
        Zipfel z = new Zipfel();
        // A missing file is reported by the catch block below
        // instead of passing null to process().
        try (InputStream in = getInput()) {
            Hashtable hs = new Hashtable(z.process(in));
            Enumeration enumer = hs.keys();
            Wort[] words = new Wort[hs.size()];
            int b = 0;
            while (enumer.hasMoreElements()) {
                Object key = enumer.nextElement();
                Wort w = (Wort) hs.get(key);
                System.out.println(w.getWort() + " : " + w.getAnzahl());
                words[b] = w;
                b++;
            }
            ZipfelErgebnis ze = new ZipfelErgebnis(words);
            System.out.println(ze.getGesamtzahl());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param args not used
     */
    public static void main(String[] args) {
        ZipfelTest zt = new ZipfelTest();
        zt.run();
    }
}
Links:
wiki[01] - http://en.wikipedia.org/wiki/Zipf%27s_law