Top Document: freeWAIS-sf Frequently Asked Questions [FAQ] with answers
Previous Document: 5.4) How can I index HTML files?
Next Document: 5.6) How can I index my ftp server?
Send corrections/additions to the FAQ Maintainer: pfeifer@ls6.informatik.uni-dortmund.de (Ulrich Pfeifer)

See question 'How can I index HTML files?' first.

Let's assume your server's pages reside in the directory '/home/robots/www/pages', your server's URL is 'http://myserver/', and the database will be named 'www-pages'. A simple format file (www-pages.fmt) would be:

    record-sep: /\n\n/    # never matches
    layout:
    headline: /<[Tt][Ii][Tt][Ll][Ee]>/ /<\/[Tt][Ii][Tt][Ll][Ee]>/ 80 /<[Tt][Ii][Tt][Ll][Ee]> *./
    end:
    region: /<[Hh][Tt][Mm][Ll]>/
            stemming TEXT GLOBAL
    end: /<.[Bb][Oo][Dd][Yy]>/

Then call:

    waisindex -t URL /home/robots/www/pages http://myserver \
              -d www-pages -t fields \
              `find /home/robots/www/pages -type f -name "*.html" -print`

If you do not have the modified URL handling compiled in, the headline always contains the URL. With the modified handling, the headline contains the title string of the HTML document, if there is one.

An example database is running at http://ls6-www.informatik.uni-dortmund.de/SFgate/www-pages or wais://ls6-www.informatik.uni-dortmund.de/www-pages.
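Before running waisindex, it can help to preview which files the find expression selects and which title string each one carries. The following is only a rough sketch, not part of freeWAIS-sf: the function name preview_titles is made up for illustration, and the sed expression only approximates the headline patterns from the format file (it assumes the title opens and closes on a single line).

```shell
#!/bin/sh
# Hypothetical helper (not part of freeWAIS-sf): print each *.html file under
# a directory together with the text of its <title> element, tab-separated.
preview_titles() {
    dir=$1
    # Same selection as the find call passed to waisindex, sorted for stable output.
    find "$dir" -type f -name "*.html" -print | sort | while IFS= read -r file; do
        # Case-insensitive <title>...</title> match, mirroring the character
        # classes used in the format file. Only single-line titles are handled.
        title=$(sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee]> *\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' "$file" | head -n 1)
        # Files without a title get a placeholder.
        [ -n "$title" ] || title='(no title)'
        printf '%s\t%s\n' "$file" "$title"
    done
}

# Example call, using the directory assumed in this answer:
#   preview_titles /home/robots/www/pages
```

Files reported as '(no title)' are the ones for which no title string can be extracted for the headline.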
Last Update March 27 2014 @ 02:12 PM