Objective

The objective of FindLinks is to procure the data for NextLinks. To this end, as many web pages as possible are loaded and the links within these pages are detected.

Project Status

The system is still in a beta test phase. The FindLinks client is not yet stable but will be available soon.

Architecture

FindLinks has a client-server architecture. The FindLinks server distributes the URLs to the clients. The FindLinks clients process the URLs and send the analyzed results back to the server. The clients are platform independent and can run on any computer connected to the internet.

Technical Realization

The FindLinks server holds a list of several million URLs that have to be evaluated successively. Each client receives its own package of 500 URLs and tries to download these 500 pages. The URLs contained in each downloaded page are extracted, and only the list of these URLs is sent back to the server. The client then receives the next package of 500 URLs, and so on.

For Webmasters: Load Balancing and robots.txt

A reasonable URL ordering should prevent individual servers from being overloaded by a large number of requests within a short time. The FindLinks server takes the file robots.txt (see http://www.robotstxt.org/) into account. Changes to that file are noticed after 30 days at the latest.

Imprint

FindLinks is a project of the Automated Speech Processing Group (see http://www.asv.informatik.uni-leipzig.de/) at the Institute of Computer Science at Leipzig University.

Contact: wort@informatik.uni-leipzig.de
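
Example: link extraction and URL packages

As an illustration of the client step described under Technical Realization, the following minimal Python sketch extracts the URLs contained in an already downloaded page and splits a URL list into packages of 500. It is not the FindLinks implementation: the names `LinkExtractor`, `extract_links`, and `make_packages` are hypothetical, only the Python standard library is assumed, and the rules applied here (keeping only `http`/`https` links, dropping fragments, removing duplicates) are assumptions rather than documented FindLinks behaviour.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

PACKAGE_SIZE = 500  # package size stated in the text


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and
                # drop the fragment part ("#...").
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Assumption: only web links are of interest.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the distinct absolute URLs linked from one page, in order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


def make_packages(urls, size=PACKAGE_SIZE):
    """Split a URL list into consecutive packages of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

In this sketch a client would call `extract_links` once per downloaded page and send back only the resulting URL lists, while `make_packages` mirrors how a server could cut its URL list into packages of 500.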