Last Update: October 26 2005


Changes :

version 0.3 (Jul 14 1997)
* enhanced X Window user interface - now supports keyboard focus traversal
  between widgets (does not work perfectly yet)
* most widgets were modified
* new feature added - updating remote URL references in local tree to local in
  HTML documents
* it is now possible to enter multiple starting URLs
* many bug fixes

version 0.3pl1 (Aug 6 1997)
* avoid changing the modification time of files (I want to implement document
  tree synchronisation soon)
* removed bug which resulted in a hang when trying to transfer a moved
  robots.txt file
* now moved URLs are correctly rewritten in HTML document (broken in 0.3)
* more verbose reporting about moved documents

version 0.5 (Sep 25 1997)
* now every host name is converted to lower case to prevent redundancy
* some changes in widget library
* implemented transparent "reget" with FTP or HTTP protocol. Not every HTTP
  server supports reget (Apache 1.2, Netscape, MS IIS, and every HTTP/1.1
  compliant server do).
* all files are now first stored under a temporary name (so reget can be used
  in another run of the program). When the download is finished, the file gets
  its real filename.
* new mode "resume regets" is implemented
* code restructuring
* functions to convert date string to internal format (synchronisation ...)
* new mode "singlepage" added - download only one HTML document with all inline
  objects (pictures, ...)
* server-side maps are now handled correctly
* repaired bug where anchor names were not written to local URLs when rewriting
  (broken in 0.3 and 0.3pl1; earlier versions were fine)
* changes in file naming rules (each directory index is now stored in _._.html
  file not in index.html or ftp_dir_index.html) == better reverse transformation
  from filename to URL.
* implemented HTTP and FTP synchronization
* added new mode to SButton widget and its successors to emulate on/off button
* Toggle implemented transparently (mixed use of SButton > , CheckButton ,
* asynchronous connect when running in X Window mode
* !!!!!!!!!!!! changed name for subdirectory where www documents are stored from
  !!!!!!!!!!!! "www" to "http" (this made one of my colleagues very sick :-))
* timeouts are now handled via "select()"
* each URL is now also added to a hash table for better performance in the
  was_before() function - this means a little more work for each URL, but when
  working on a big set of URLs it will save a lot of CPU time.
* simple SSL support using SSLeay
* removed some bugs
* added FTP proxy support
* update X Window interface and scheduler to reflect all changes
* updated documentation

version 0.5pl1 (Sep 30 1997)
* removed bug which prevented use of the X Window interface when compiled
  without SSL
* started rewriting some of the widgets
* all modes which scan the local document tree now scan only the desired
  directories
* removed bug when pavuk sometimes hangs for long period if you try to schedule

version 0.6 (Nov 11 1997)
* all command line parameters are handled transparently via param table
* each parameter can now also be set in the "pavukrc" file
* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
* WOW WOW WOW I finally solve that problem with that dirty TreeWidget !!!!
* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
* keyboard control for TreeWidget (ScrollTreeWidget)
* removed one big memory leak in get_abs_file_path()
* Combo widget
* Configuration management via so called scenarios
* many bug fixes in X window interface
* more command line switches (opposites for booleans)
* removed bug in file_is_parsable() while checking if file successfully opened
* removed bug in close_socket() -> "if (sock < 0) close(sock)"
  ^^^^.. I love you strace.
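
Note on the close_socket() fix in 0.6: a sketch of what a corrected guard
might look like, assuming the condition quoted above is the buggy one (it only
ever reaches close() with an invalid descriptor). The body below is an
illustration, not the actual pavuk function.

```c
#include <unistd.h>

/* Close a descriptor only when it is valid.
 * Returns -1 for an invalid descriptor, otherwise the result of close(). */
int close_socket(int sock)
{
    if (sock < 0)
        return -1;          /* nothing to close */
    return close(sock);     /* 0 on success, -1 on error */
}
```

With the inverted test, strace would show close() being called with a
negative descriptor, which is probably how the bug was spotted.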

version 0.6pl1 (Nov 13 1997)
* removed mistake with list parameters ( -asite , -dsite , -ddomain ...)
* removed bugs in -v -h parameters checking

version 0.6pl2 (Nov 16 1997)
* repaired some bugs - scenario loading, Domain Allow/disallow switch ...
* extended scenario loader/saver to allow scenario dir selection
* repaired html parser - \n or \r inside parsed tag results in buggy result
* command-line scenario saver

version 0.6pl3 (Dec 2 1997)
* limitation for size of transferred document added (-maxsize)
* limitation for MIME type of transferred document via HTTP/HTTPS
* authorization for HTTP proxy added
* repaired bug - standard X Toolkit parameters were not recognized
* repaired bug - when the parent document was not successfully processed,
  it stayed locked
* repaired bug - when using HTTP proxy && connecting to SSL server
* added SSL proxy support
* added Gopher proxy support
* added gatewaying FTP and Gopher via HTTP proxy
* better FTP data connection handling
* progress meter on terminal (-progres)
* Log widget implemented

version 0.7 (Dec 30 1997)
* rewritten message reporting system for X Window - now based on Log widget
* added NLS support via GNU gettext
* created Slovak message catalog (so far without diacritics)
* implemented removal of improper files and directories (in sync mode)
* bug in FTP synchronization removed - buggy reply code check
* some needless FTP commands are not sent while retrieving directory list -
* ftp data connection is established before REST while restarting FTP transfer -
  sometimes FTP server starts transfer from beginning instead of from given
  position (I don't know why)
* checking of file size when synchronizing (FTP only)
* better FTP control connection handling
* some bug fixes
* logging messages to file
* solved problems with FTP synchronization

version 0.7pl1 (Jan 13 1998)
* added support for HTTP/HTTPS URLs with authentication information :
* sync mode now uses standard UTC time instead of local time - gmtime()
* ftp command MDTM sent only when required
* handling of HTML tag <META HTTP-EQUIV="Refresh" Content="..; URL=...">
* added authentication information stored in a file (read the manual for the
  authinfo file format)
* added more entries into mime type selection dialog
  (from apache mime.types file)
* now pavuk sets return code of program to number of failed transfers
* now you can optionally omit some directory levels from local doc tree
  (try set -base_level $nr at command line and you will see what this means)
* checking of write() fail
* progress is now reported correctly when restarting transfer
* changed some of widgets to have translatable strings
* repaired bug in ScrollWin widget code , when TreeList or Log widget sometimes
  jumps up
* asynchronous DNS name resolving via external process
  (breakable in X11 interface)
* dirty fix for error in Col and Row widgets when a resizable widget gets zero
  size
* German message catalog by Joergen Grieb
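
Note on -base_level in 0.7pl1: a rough sketch of what "omitting directory
levels" could mean for a local path, assuming levels are counted from the
start of the path. skip_levels() and the example path are hypothetical; see
the manual for the real behaviour.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer into `path` just past its first `levels` directory
 * components. If the path has fewer components, the last one is returned.
 * Hypothetical helper, not taken from the pavuk sources. */
const char *skip_levels(const char *path, int levels)
{
    assert(path != NULL);
    while (levels-- > 0) {
        const char *slash = strchr(path, '/');
        if (slash == NULL)
            break;              /* no more directory levels to drop */
        path = slash + 1;
    }
    return path;
}
```

For example, dropping 2 levels from "http/www.example.org_80/docs/a.html"
leaves "docs/a.html".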

version 0.7pl2 (Jan 15 1998)
* repaired compile bug in update_links.c (when compiling without X Window
  interface support)
* implemented buffered DNS requests in dns_gethostbyname()
* repaired bug when downloading FTP directory via HTTP gateway and gateway
  returns HTML document with local nor remote URLs
* implemented so called dirty ftp proxy (-ftp_dirtyproxy) using CONNECT
  request to HTTP proxy.
* repaired bug in filename_to_url() - http.password and http.user were not
  initialised to NULL
* synchronisation with FTP<->HTTP gateway is now possible
* added window geometry to the translatable message catalog

version 0.7pl3 (Jan 26 1998)
* sync mode now correctly reports that a document is up to date
* implemented active FTP data connection
* new Slovak message catalog in ISO-8859-2 encoding by me
* you can now specify the directory from which the message catalog will be
  loaded (-msgcat or NLSMessageCatalogDir:)
* rewritten passing of X-attributes to be more easily translatable
* now each command line switch can have its own help text ==> easier management
  of message catalogs && self documenting switches
* rewritten all interface dependent stuff to make GTK support easier
* some initial GTK things done

version 0.8 (Feb 27 1998)
* automake/autoconf compilation-configuration scripts == very easy
* GTK interface
* gnu-win32 portability
* rewritten HTML parsing code + HTML4.0 support
* fcntl locking on systems where flock is not supported
* some bugs in X-interface solved
* GTK Calendar widget
* minor bug fixes
* restriction on document creation time implemented
* rewritten parts of X-toolkit interface to look similar to the GTK interface
* Czech message catalog by Petr Vyhnalek

version 0.8pl1 (Mar 25 1998)
* some memory leaks removed
* URL based synchronisation
* command line scheduling (-schedule)
* repaired configure script : don't fail configuring GTK interface when Xpm or
  Xext libraries not successfully checked, gettext in glibc2
* cyclic rescheduling (-reschedule)
* limit set of documents only on starting site (-dont_leave_site/-leave_site)
* limit set of documents only on starting directory on starting site
* updated GTK interface for GTK+-0.99.4 =<
* inline objects are on the same tree level as their parent when checking the
  depth limit
* new option (-leave_level) to limit number of levels outside from starting
* you can now disable compiling of URL tree preview (big memory save)
  run configure script with --disable-tree
* solved bug in xinterface.c , which causes segfault in sprintf with some
  versions of libc.
* man page is installable via make install
* solved problems in widgets, which refuse to run Xt interface in some

version 0.8pl2 (Mar 30 1998)
* repaired bug in url_to_absolute_url() , when relative URL start with / ,
  was oddly rewritten.
* localedir in configure script now points to the right place
* added pavuk.spec to distribution (for building RPMS)
* repaired configure script to detect right Xext,Xt library in some i
* extended set of unsafe characters in URL for encoding
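
Note on URL encoding in 0.8pl2: a minimal sketch of percent-encoding unsafe
characters. The character set below is an assumption (this log does not list
the set pavuk uses), and encode_unsafe() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed set of unsafe characters; pavuk's real list may differ. */
static const char UNSAFE_CHARS[] = " \"<>{}|\\^`";

/* Percent-encode unsafe and non-printable bytes of `src` into `dst`.
 * Returns 0 on success, -1 if `dst` (of size `dstlen`) is too small;
 * on failure the contents of `dst` are unspecified. */
int encode_unsafe(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;
    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unsafe = c <= 0x20 || c >= 0x7f || strchr(UNSAFE_CHARS, c) != NULL;
        size_t need = unsafe ? 3 : 1;
        if (o + need + 1 > dstlen)      /* keep room for the final NUL */
            return -1;
        if (unsafe)
            snprintf(dst + o, 4, "%%%02X", (unsigned)c);
        else
            dst[o] = (char)c;
        o += need;
    }
    dst[o] = '\0';
    return 0;
}
```

The '%' character itself is left alone here so that already encoded URLs are
not encoded twice; that is a design choice of this sketch.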

version 0.8pl3 (Jun 9 1998)
* repaired bug when pavuk seg faults if redirecting to unsupported protocol
* repaired bug where pavuk missed part of tag between attribute name and value
  of attribute while rewriting links inside HTML document
* repaired bug in GTK interface - reading of uninitialised values

version 0.8pl4 (Jul 19 1998)
* added function CardBoxSwitchTo() to allow switching of Tabs in CardBox widget
* added "Open URL" dialog to File menu
* new mode "dontstore" implemented, for fetching files to proxy-cache
* added logo to About dialog

version 0.9 (Aug 5 1998)
* repaired bug in HTTP proxy code
* totally rewritten internal handling of URL tree !!!!!!
  (thanks to Marc David Rovner's base idea and my long hard work :-) )
* icons now work in tree preview with GTK interface as in Xt interface
* updated Czech message catalog
* window delete event is now handled right in GTK interface

version 0.9pl1 (Aug 9 1998)
* solved problems while compiling v0.9 without GUI
* repaired bugs excellently reported by Dmitry Semenov
  - HTTP reget doesn't work in sync mode
  - -preserve_time doesn't work with FTP and only in sync mode
* I have got the menu with Tree preview working in GTK interface :-) as in Xt
* it is now possible to disable processing of some URLs by using of Tree

version 0.9pl2 (Sep 6 1998)
* minor bug fixes reported by some users
* repaired bug where -cdir ending with '/' combined with the -base_level switch
  resulted in broken filenames
* implemented interactive downloading using URL tree preview dialog
* solved problem in GTK URL tree preview with more starting URLs
* URL tree preview dialog in Xt interface is now not modal
* basic support for sending and receiving HTTP cookies (writing to cookie file
  not supported yet, GUI can't handle cookie parameters - only via cmd-line)

version 0.9pl3 (Sep 20 1998)
* intelligent updating of cookie file implemented (the same file may be updated
  by several processes concurrently without losing cookies)
* GUI interface for cookies setup
* HTML file on FTP server is processed right
* repaired rewriting of redirected url with fragment name specification
* you can now download from URL tree preview manually files which were broken or

version 0.9pl4 (Jan 6 1999)
* cookie file may contain comments starting with '#'
  (not saved back after update)
* host name translation errors are now reported correctly
* buffered IO implemented
* some minor bug fixes
* repaired some segfaults
* new & more icons for URL tree preview
* HTML tag & attribute restrictions for selection of URL's from HTML docs
* checking cookies if source domain is equal with domain attribute of
  Set-Cookie MIME entry
* cookie file is now right ordered (not reversed each time :-)
* new Czech message catalog in ISO8859-2 encoding by Petr Vyhnalek
* added new switch -gui_font , which allows you to set font used in
  GUI interface
* added new switch -language, used to set the language of messages when
  compiled with GNU gettext support
* added very simple SOCKS(4/5) support (not tested yet)
* -pattern accepts comma-separated list of documentname matching patterns
* new option -url_pattern to enter comma-separated list of url matching
* -user_condition options added to provide option for user to specify by
  external script or program if URL should be processed or not
* repaired bug when extra space characters in scenario file are not removed
* repaired seg-fault while doing HTTP reget (thank to Orestes Sanchez Benavente)
* added -disabled_cookie_domains option
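
Note on comma-separated pattern lists in 0.9pl4: a sketch of how such a list
could be checked, assuming shell-style wildcards and POSIX fnmatch().
match_any() is a hypothetical helper, not the routine used by pavuk.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Return 1 if `name` matches any wildcard pattern in the comma-separated
 * list `patterns`, else 0. Empty entries and patterns longer than 255
 * bytes are skipped. The input string is not modified. */
int match_any(const char *patterns, const char *name)
{
    char buf[256];
    assert(patterns != NULL && name != NULL);
    while (*patterns != '\0') {
        size_t len = strcspn(patterns, ",");    /* length of next entry */
        if (len > 0 && len < sizeof buf) {
            memcpy(buf, patterns, len);
            buf[len] = '\0';
            if (fnmatch(buf, name, 0) == 0)
                return 1;
        }
        patterns += len;
        if (*patterns == ',')
            patterns++;                         /* step over the comma */
    }
    return 0;
}
```

With the list "*.html,*.gif" the name "logo.gif" is accepted and "a.jpg" is
rejected.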

version 0.9pl5 (Jan 28 1999)
* you can now immediately change communication language from GTK GUI
* added gtk-config script to configure script for GTK configuration checking
* added client certification stuff for HTTPS (SSL) (not tested yet)
* some segfaults repaired in GUI code
* repaired time handling bugs
* added realm info to authinfo file
* HTTP authorization schemes are now handled properly
* HTTP digest access authorization implemented (it works with my Apache server)

version 0.9pl6 (Feb 28 1999)
* when compiling with the SSLeay lib, the MD5 routines from libcrypto.a are
  used instead of Apache's md5c.c
* reuse of HTTP digest access nonce in more following requests is now
* digest authorization with proxy server
* added QueryGeometry to all Nws widgets for windows autosizing
  (finally - I am so lazy :-))
* filename conversion routines for changing local filename
  (delete set of characters , change string to string , tr like char to char)
* language change now also works if some files were already processed
  (Tree preview not destroyed)
* while changing language all visible windows stay visible
* menu entry labels are GNOME compliant
* beautified xinterface.c
* rewritten Xt interface to support language change from GUI
* each file selection entry now have browse button
* send the QUIT signal while running in text mode and pavuk will exit safely
* added sample of Xt resources file for Pavuk
* thanks to Håvard Skinnemoen added some features from gtk+-1.1.*
        - new style of adding childs to scrolled windows
        - parsing of ~/.pavuk-gtkrc
* solved win32/cygwin32/unix file path madness

version 0.9pl7 (Mar 30 1999)
* changes to support GTK+-1.2.0
* removed sk and cs ASCII message catalogs from distribution
* repaired command line time parameter scanning routine
* all labels in GTK interface are now left justified
* scheduling now works well
* solved problems when compiling without GNU gettext support and with GUI
* a lot of GTK improvements
* better processing of some stupid HTML constructions
* HTML comments and inline scripts are not parsed && processed
* default location of system pavukrc changed from $(prefix)/lib/pavukrc
  to $(prefix)/etc/pavukrc
* added a lot of new HTML tags for processing

version 0.9pl8 (Apr 12 1999)
* now compile with gettext support on systems without LC_MESSAGES defined
* checking of robots.txt now works again (thanks to Stefan Stidl)
  - checking disabled in many previous versions because of oddly written
    condition :-(
* better detection of cyclic HTTP redirections
* repaired SEG fault while in GUI and HTTP redirection to already processed
  document occurs
* new icons for buttons added from Andreas Kraska . If you want old buttons,
  execute configure script with --disable-new_buttons option.
* accelerated menubar with GTK+-1.2<
* using putenv on system where setenv & unsetenv not found
* a lot of minor bug fixes

version 0.9pl9 (Apr 18 1999)
* repaired bug, when all documents downloaded over HTTP/HTTPS were processed as
  HTML documents (a lot of rewriting operations on binary files :-()
* repaired implementation of setenv/unsetenv on systems where not implemented
  (thank to Orestes Sanchez Benavente)
* timeout on connect() call
* pavuk now works on filesystems where the link() call doesn't work (FAT)
* better detection of already downloaded directories
* not buffered read while reading document data from net
* new Action menu
* enhanced use of GTK+-1.2 < features (GTK 1.0.x compatibility preserved)

version 0.9pl10 (Apr 25 1999)
* repaired bugs in net_connect() function
* repaired bug while using active ftp connection
* you can now minimize (iconify) the main pavuk window (GTK+ only)
* !!!!! -progres option repaired to -progress
* new option -runX (you can immediately start downloading files after GUI
  interface is started)
* simple support for CSS
* a lot of bugs fixed

version 0.9pl11 (May 2 1999)
* new -index_name option used to change default name of directory index
* new -store_name option used to set filename for document downloaded with
  -mode singlepage
* changed version of used autoconf (1.3) and automake (1.4)
* support for processing standalone CSS files
* doesn't get SIGPIPE when decoding encoded file (not fork-ing in GUI)
* using CTree widget instead of Tree with GTK+-1.2

version 0.9pl12 (May 5 1999)
* new option -ftplist to use wide listing of FTP directories (using LIST
  ftp cmd instead of NLST) (only unix style of list supported)
* new option -preserve_perm to preserve permissions of ftp files
  (assumes -ftplist option)
* now pavuk saves ftp symbolic links as symbolic links not normal files
* new option -preserve_slinks to leave symbolic links pointing to the same
  location as on the remote server.
* Go Bg button now works properly with GTK+ (thanks to Jan Kratochvil)
* new option -FTPhtml/-noFTPhtml to enable/disable processing of files
  downloaded over FTP protocol
* anchor names for FTP urls now parsed right

version 0.9pl13 (May 16 1999)
* pavuk now removes empty directories in local document tree
* directories are now processed right
* new option -min_size to eliminate transfer of small documents
* new options -skip_url_pattern and -skip_pattern
* repaired bug in document time preservation (thank to Tomas Dobrovolny)
* while updating parent document links, and it is locked, pavuk will wait
  until lock will be released
* locked document is always rescheduled

version 0.9pl14 (May 23 1999)
* thank to Steffen Kern added dropping of URL's to url list and pavuk main window
  (for example from netscape)
* thank to Tomas Dobrovolny fixed some minor bugs in script
* new HTML tags for table backgrounds added (thank to Szabolcs Szakacsits)
* new -htDig option for cooperation with htDig web indexing program
* new option -check_size/-nocheck_size for enabling/disabling checking of
  document size (some HTTP servers report bad Content-length: header)
* minor bug fixes

version 0.9pl15 (Jun 21 1999)
* many fixes and changes in HTML parser code
* better support for Cascading Style Sheets
* a lot of patches from Szabolcs Szakacsits and Steffen Kern added
* fetching of URLs from clipboard implemented for GTK and Xt GUI
* repaired encoding of URLs (thank to Marc Haber and Szabolcs Szakacsits)
* new option -urls_file (for reading URLs from file or stdin)
* get SSL stuff working again (was broken because of non-blocking IO)
* updated Czech message catalog (by Petr Cech)
* new icons in icons/ directory
* a lot of changes / bug fixes

version 0.9pl16 (Jun 29 1999)
* checking for zero size of file
* fixed bug with using -store_name option (thank to Marc Haber)
* new type of log file added (option -slogfile)
* -mode resumeregets now recurse through links
* removed many memory leaks inside new HTML and CSS parser code
* removed some random crashes with Xt GUI

version 0.9pl17 (Jul 06 1999)
* bigger read buffer -> better read performance on fast connections
* new option -identity for specifying User-Agent: HTTP request field
* new option -nosend_from to suppress sending the From: field with HTTP requests
* new option -nostore_index used to tell pavuk not to store documents
  referenced with directory URLs
* new option -acharset used to specify set of preferred document encodings
  for HTTP protocol
* changed selection retrieving with GTK+ GUI
* better native language switching in internationalized environment
* bug fixes

version 0.9pl18 (Jul 26 1999)
* support for EPLF format listing of FTP directories
* support for Novell format listing of FTP directories
* repaired one typo which breaks compilation without GUI
* automatic preferences saving/loading to file ~/.pavuk_prefs
* loading & saving of menu accelerator keys to prefs file
* fixed type casting bug in html/css parser code (thank to Robert Gasch)
* support for newer openssl versions (0.9.3<)
* better & nicer progress meter
* limitation of transfer speed (max/min)
* my CERN HTTP/proxy server is somehow odd - synchronization of WWW pages
  won't work if you specify the port number in the URL (curious), so the port
  number is now removed from the URL if it is the default one.
* sync mode now works well when spanning to another server
* sync mode works again with servers which do not respond with a correct
  304 code (mea culpa)
* added Apply button to configuration dialogs
* fixed lot of bugs in net_connect function
* installation of pavuk icons to $(prefix)/share/icons/
* new quota options (quota for file size, transfer amount and free space on
* solved bug where Gtk+ URL list did not show its contents
* solved bug, when pavuk crashes on redirection to unsupported URL
* corrected fetching of URI: header content for redirected URLs
* several bug fixes and improvements
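
Note on default port removal in 0.9pl18: a small sketch of the idea, assuming
the usual well-known ports. default_port() and format_url() are hypothetical
helpers written for this note.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Well-known default port for a scheme, or -1 if the scheme is unknown. */
int default_port(const char *scheme)
{
    assert(scheme != NULL);
    if (strcmp(scheme, "http") == 0)   return 80;
    if (strcmp(scheme, "https") == 0)  return 443;
    if (strcmp(scheme, "ftp") == 0)    return 21;
    if (strcmp(scheme, "gopher") == 0) return 70;
    return -1;
}

/* Write scheme://host[:port]path into `out`, leaving the port out when it
 * is the default for the scheme. Returns 0 on success, -1 on truncation. */
int format_url(char *out, size_t outlen, const char *scheme,
               const char *host, int port, const char *path)
{
    int n;
    if (port == default_port(scheme))
        n = snprintf(out, outlen, "%s://%s%s", scheme, host, path);
    else
        n = snprintf(out, outlen, "%s://%s:%d%s", scheme, host, port, path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

So port 80 on an http URL is dropped, while port 8080 is kept in the URL.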

version 0.9pl19 (Sep 06 1999)
* changed URL equivalence checking from filename based to URL based
* internal URL representation now contains its local filename  , this means
  lower memory footprint, but bigger memory consumption
* several minor memory leaks removed
* implemented universal & flexible mapping mechanism URL -> local filename
  based on RE or wildcard patterns and simple rules (see manual,
  option -fnrules) (thanks to James Feeney for the base idea)
* implemented optional saving of info files for each document (each info file
  contains the source URL of the document, and for documents downloaded via
  HTTP/HTTPS also the whole HTTP header)
* repaired parsing of standalone CSS files
* if storing of info files is enabled and you change the default local tree
  layout (with -fnrules or -base_level or -tr_* options), URLs will now never
  overlap
* new option -all_to_local used to force rewriting all URLs in HTML document,
  to point to expected location
* new reminder mode for checking if any URL was modified in given period
* code cleanups
* new option -sel_to_local used to force rewriting all URLs in HTML document,
  which satisfy the limits, to point to expected location
* many corrections in messages (thank to Colin Marquardt)
* repaired bug in removing BASE tag from HTML code, and now is not removed, but
  commented out (thank for bug report and idea to Jan Tomasek)
* added icons to OK && Cancel buttons in Gtk interface (GTK+ only)
* changed all GtkList widgets to GtkCList
* added Clear & Modify buttons to each editlist dialog (GTK+ only)
* you can now optionally change pixmaps for buttons from pavukrc file
  (see all Btn*Icon*: statements)
* fixed bug in ftp directory translation to HTML when using passwords with
* finally I fixed that bug which randomly puts trash to pattern options in GUI
  interface. strtok() is really bad function :-(
* fstatfs emulation on SYSV systems using fstatvfs
* better detection of header files where fstatfs is declared
* repaired Seg Fault when using cookies (thank to Andrew Hall)
* added more icons to GTK+ dialogs (thank to Frederic Toussaint)
* each dialog window can be closed with ESC key (GTK+1.2 only)
* each menu entry can have now assigned shortcut (GTK+1.2 only)
* make uninstall now work well (thank to Colin Marquardt)
* option -lmax now work properly with inline objects
  (thank to Bernd Lutkenhoner)
* removed old_buttons
* updated German message catalog (thanks to Colin Marquardt); if you speak
  German, please check it and report possible errors to Colin
* new option -check_cookie for enabling checking if cookie is set for from
  which commes
* fixed bug in cookie handling code
* collections of button icons for pavuk in button_icons/
* a bit fixed URL redirection code for nonabsolute URLs
* fixed detection of base URL of document for documents with URL with search
* new French message catalog (many thanks to Frederic Toussaint); if you
  speak French, please check it and report possible corrections to the author
* updated Czech message catalog (thanks to Petr Cech)

version 0.9pl20 (Sep 29 1999)
* new option -all_to_remote used to leave all links inside HTML document to
  remote location (proposed by Diego Antona Archilla)
* fixed incompatibility with GTK+-1.0
* for starting HTTP URLs pavuk now optionally sends the URL itself as the
  Referer: field, see option -auto_referer (proposed by Sergey Taranenko)
* fixed segfault in cookie modification code
* numbering of documents with overlaying local names for different URLs
* new better HTML tag handling routines
* removed a lot of memory leaks
* URL downloading order strategies implemented (idea by Sergey Taranenko)
* replaced GtkText widget with GtkCList widget in log window
* now works limiting of length of log in GTK+ interface
* fetching files from Netscape browser cache directory
  (great idea by Sergey Taranenko)
* new Spanish message catalog by Javier Comeron

version 0.9pl21 (Oct 13 1999)
* support for removing advertisement banners from HTML pages
  (base idea by Mika Joukainen)
* timestamps are written to regular log file when starting and ending log
  (proposed by Jan Tomasek)
* support for Bell V8 implementation of regular expressions (as used in cygwin)
* fixed SegFault which occurs while loading scenarios during downloading
  progress (thank to Sergey Taranenko)
* authorization info editor (only for GTK+ GUI)
* new option -check_bg/-nocheck_bg used to detect if we run as background job,
  if so don't write any messages to screen
* fixed some errors in Xt interface
* fixed bug when stdout isn't flushed before _exit()
  (thank to Szabolcs Szakacsits)
* new option -send_if_range/-nosend_if_range. This option should be used when
  HTTP server supports reget, but sometimes generates different Etag field
  for not changed document (if Etag and If-Range field differs reget will start
  from beginning of file)
* locking of log file
* optional numbering of log file when log file locked (option -unique_log)
  (proposed by Sergey Taranenko)
* several messages fixes (thank to Colin Marquardt)
* running of post processing command after successful download of document
  see option -post_cmd (proposed by Sergey Taranenko)
* counting of fatal errors
* fixed core dump in lfname structure cleanup when using fnmatch patterns
  (thanks to Kevin Gamiel's report)
* fixed bug which causes some broken links
* fixed bug which broke compiling the Xt version of the interface with
  support for loading files from Netscape browser cache
  (thank to Niraj Sachdeva)
* portability to HPUX solved (thank to Niraj Sachdeva)
* fixed bugs and oddities in sync mode code (thank to Szabolcs Szakacsits)
* fixed typo which causes problems using mode linkupdate from command line
  (thank to Szabolcs Szakacsits)
* fixed bug when using -store_info, pavuk leaves opened some of lock
  files, this causes Too many open files error (thank to Dawit Yimam)
* significant speedup of sync mode
* some internationalization fixes (thank to Javier Comeron)
* several bug fixes in local name assigning code (when using -fnrules option)
* fixed possible problems with timeout detection in GTK+ interface
* now is possible to specify template of scheduling command
  (look for -sched_cmd option)
* fixed bad behavior with "" urls inside HTML documents
* fixed bug in URL parsing when contains both anchor and searchstr

version 0.9pl22 (Nov ?? 1999)
* fixed portability to systems which don't declare h_errno
* got rid of all dirty strtok()s (I hope without mistakes)
* removed all configuration environment values !!!!!!!!
* fixed problems with loading files from NS cache on big endian machines
* more properties for URL displayed in URL tree preview (GTK only)
* added UI configuration for -stime option
* fixed some bugs in base URL of document handling in HTML parser (thank to
  Laurent Salles report)
* fixed functionality of -min_size option (thank to Frank Baumgart)
* fixed segfault when running user condition script (thank to Frank Baumgart)
* added support for BSD regular expressions
* added support for GNU regular expressions
* started debug levels implementation
* selection of SSL client methods version implemented, option -ssl_version
  (thanks to Ian's idea)
* handling of &amp; and &#38; inside URLs (thanks to Matt's note)
* fixed typo in configure script which causes misconfiguration in some cases
* fixed handling of URLs with \n \r \t characters
* repaired handling of nonblocking IOs (thank to Szabolcs Szakacsits solution)
* fixed buggy behaviour of get_abs_file_path() function
* optional unique SSL ID with all SSL sessions (thank to Jeff Roberson howto)
* added handling of starting urls in form server:[port]/...
* added new Append URL dialog for appending URLs within downloading progress
  (GTK only)
* added proxy authorization with CONNECT request
* fixed handling of \ and " characters inside quoted strings
* added new option -httpad to be able to add some user defined HTTP headers
  in HTTP requests
* implemented statistical reports for downloading progress (can be saved to
  file - -statfile option, or previewed inside GTK UI window)
* fixed limits checking (prefix,postfix,patterns) for HTTP URLs with search
  string part
* changed debug mode controlling with -debug_level option
* new WIN32 specific option -ewait, to enable user to control if console
  will disappear after pavuk has finished (proposed by Jan Tomasek)
* started writing NEWS document, to enable users briefly know new pavuk
  features in particular pavuk versions without reading huge ChangeLog file
* new chance to save URL tree structure from URL tree preview dialog
  window (GTK+-1.2 only)
* .pavuk_info directories are now omitted, when scanning local document tree
  in linkupdate,resumeregets and local tree based sync mode
* fixed pavuk's behavior with option -check_bg on systems where getpgrp() needs
  PID parameter
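
Note on the &amp; / &#38; handling in 0.9pl22: a minimal sketch that turns
both entity forms back into a plain '&' inside a URL taken from HTML.
decode_amp() is a hypothetical helper; other entities are left untouched.

```c
#include <assert.h>
#include <string.h>

/* Replace every "&amp;" and "&#38;" in `url` with a plain '&', in place.
 * The result is never longer than the input, so no extra buffer is needed. */
char *decode_amp(char *url)
{
    char *r, *w;
    assert(url != NULL);
    for (r = w = url; *r != '\0'; ) {
        if (strncmp(r, "&amp;", 5) == 0 || strncmp(r, "&#38;", 5) == 0) {
            *w++ = '&';
            r += 5;             /* skip the whole entity */
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return url;
}
```

For example "a.cgi?x=1&amp;y=2&#38;z=3" becomes "a.cgi?x=1&y=2&z=3".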

version 0.9pl23 (Dec 20 1999)
* huge internal rewrite, changed handling of some globals - big step to
  MT version, cleanup of internal algorithms
* implemented new mode (ftpdir) for listing contents of FTP directories
  (proposed by Niraj Sachdeva)
* added new macro %m (domain name) to -fnrules option
* changed handling of encoded documents - now are decoded only HTML and
  plain text documents all others will be stored encoded
* fixed corruption of cookies.txt file after user break
* completely changed handling of refresh META tag - broken in several
  previous releases
* fixed portability to FreeBSD (thanks to Holdrich Kristian)
* new options -aip_pattern & -dip_pattern for specifying allowed IP
  addresses with regular patterns (proposed by Samuel Laker)
* fixed bug in option -debug_level setting to "all" (thank to Andreas Mohr)
* fixed logging to nonanonymous FTP servers through HTTP gateway proxy
  (thank to Andreas Mohr)
* new option -site_level for limiting how many site levels to leave from
  starting site
* TOS settings for FTP data and control connection
* introduced new protocol FTPS for making SSL connection to FTP servers
  with SSL support
* if you set the environment variable PAVUKRC_FILE, pavuk will read this
  file as the user pavukrc file instead of ~/.pavukrc (proposed by
  Andreas Mohr)
* fixed SSL reading function, which could in some cases cause loss of data
  at end of file or a hang in select()
* fixed problems with makealldirs() on WIN32 platform
* added additional information (size, processing time) to structured log
  file (proposed by Dave Becket)
* fixed problems with restarting in GUI interfaces
* fixed problem with URLs with slashes at end of query string (thanks to
  Dave Becket's report)
* fixed problem with naming of local copies of FTP directories when
  downloading through HTTP gateway
* added new HTML tag for URL processing CSOBJ/HT
* added new URL schemes for processing (tel,fax,modem,sms - from IETF drafts)
* automatic handling of unsafe characters inside filenames (currently handled
  only on Windows - \:*?"<>|) (proposed by Jan Tomasek)
* configure script now detects if msgfmt supports --statistics option
  (proposed by Dave Becket)
* fixed hangup after blocking locking inside document read loop
* implemented much cleaner blocking locking
* fixed several odd behaviours when generating localname of document
* implemented simple adjusting of too long filenames
* partially implemented HTTP/1.1 protocol with persistent connections !!!
* new options -use_http11/-nouse_http11 for enabling or disabling HTTP/1.1
  protocol support
* many many bug fixes
* extended URL based sync mode. Now you can specify subdirectory which
  contains mirrored documents (with option -subdir) and that directory is
  scanned before for documents, and after URL based synchronization is finished
  pavuk starts checking URLs from local tree, which were not checked in URL
  based synchronization.
* get rid of most of unsafe static buffers
* support for deflate encoding method via zlib
* handling of 1xx HTTP response codes
* bit changed behaviour with -site_level & -leave_level when processing
  moved URLs
* more automatic scan for OpenSSL || SSLeay libraries location
* fixed bug which caused segfault if BASE URL is unknown or unsupported
  (thanks to Jeff Roberson's report)
* applied patch from Jeff Roberson which makes it possible to use a specified
  local network interface for communication (useful for multihomed hosts);
  uses new option -local_ip
* improved quality of manual (thanks to Colin Marquardt)
* fixed linkupdate to work properly again (thanks to Jaydeep Desai's report)
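
The PAVUKRC_FILE entry above describes a simple lookup order: use the file
named by the environment variable if it is set, otherwise fall back to
~/.pavukrc. A minimal Python sketch of that order follows. It is only an
illustration, not pavuk's code: the function name user_pavukrc_path and the
choice to treat an empty value as "not set" are assumptions.

```python
import os


def user_pavukrc_path(environ=None):
    """Return the per-user rc file path, honouring PAVUKRC_FILE.

    PAVUKRC_FILE and ~/.pavukrc come from the changelog entry; everything
    else here (name, empty-value handling) is a hypothetical illustration.
    """
    env = os.environ if environ is None else environ
    override = env.get("PAVUKRC_FILE")
    if override:  # assumption: an empty value counts as "not set"
        return override
    # Default location named in the changelog entry.
    return os.path.join(os.path.expanduser("~"), ".pavukrc")
```

Passing a plain dict instead of the real environment makes the lookup easy to
try out, for example user_pavukrc_path({"PAVUKRC_FILE": "/tmp/test.rc"}).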

version 0.9pl24 (Feb 09 2000)
* implemented parsing of VMS style FTP directory listings
* solved problems with FTP control connections, when pavuk breaks data
  transfer before finished
* rewrote URL parser from scratch - it is now cleaner, more easily extensible,
  faster, has a lower memory footprint and, I hope, conforms to RFC 2396
* new routine for comparing URLs based on url structure instead of URL
  string - means faster and with lower memory footprint
* bit better internal handling of query strings
* fixed segfault with decoding nonHTML documents
* fixed handling of FTP list processing on FTP servers which don't include
  "total xxx" line on top of directory listing
* added support for parsing old style BSD directory listings
* removed some random memory leaks introduced in previous release
* fixed closing of several unhandled HTTP/1.1 persistent connections with
  remaining unrequired data
* fixed again handling of moved URLs with -leave_level option
* fixed ftpdir mode behaviour with some of HTTP gateways for FTP (for example
  Squid) (thanks to Niraj Sachdeva)
* implemented HTTP POST requests (see option -request)
* implemented parsing of DOS/Windows style FTP directory listings
* fixed handling of oddly detected persistent connections when using HTTP/1.0
  and talking to HTTP/1.1 server which doesn't respond with Connection: close
* fixed "Zero size" possible error reporting only for cases when we don't know
  exact size or size is non zero
* implemented dialog for editing HTML forms (GTK+ only)
* new option -hash_size for performance tuning when mirroring large amount
  of URLs
* now supports FTP URLs as defined in RFC (ftp://serv.dom/path for relative
  path to login directory and ftp://serv.dom//path for absolute path from FTP
  server root directory)
* changed behavior when doing FTP directory listings (CWD path + NLST/LIST
  changed to NLST/LIST /path)
* rejection of UNIX special files (sockets, devices, fifos) in FTP directory
* fixed segfault on empty FTP directory listings
* fixed segfault in document info storing code
* rewritten document locking routine, because of possible race conditions and
  errors in previous implementation
* enhancement for -fnrules option, which allows much higher flexibility in
  local name assignment to document (undocumented and not well tested yet)
* fixed non-functional -store_name option
* fixed h_errno test in configure script, to work on SYSV systems (thanks to
  Marc Chantome)
* implemented dropping of URLs to URL Append dialog
* implemented option to be able to follow downloading process inside
  URL tree preview window (GTK+-1.2 only) (proposed by Francois RicharC)
* fixed odd behavior of FTP URL parser on WIN32 platform with FTP URLs in
  form ftp://ftp.server.dom//absolute/path/...
* fixed bug in new FTP directory processing routines when listing directories
  on MS FTP servers (thanks to LE FAUCHEUR Frederic)
* fixed bug in routine which computes the difference between GMT and local
  time (on some platforms localtime() and gmtime() return the same statically
  allocated buffer for their result)
* updated Properties view in URL Tree preview to show POST request info
* support for inserting POST request inside URL tree from Form editor
* repaired URL parser to support URLs in form http://www.server.dom?xxxx
* fixed possible segfault in FTP code, which may occur, when pavuk is not
  able to establish data connection
* fixed bugs in scenario saving code (thanks to Peter Erbak, Bill Miller)
* fixed cookies handling with moved documents
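
The FTP URL entry above distinguishes ftp://serv.dom/path (relative to the
login directory) from ftp://serv.dom//path (absolute from the server root).
The following Python sketch shows one way to make that distinction. It is an
illustration under stated assumptions, not pavuk's parser: the function name
ftp_path and the returned tuple layout are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def ftp_path(url):
    """Split an FTP URL into (host, path, is_absolute).

    Follows the convention from the changelog entry: a single slash after
    the host means "relative to the login directory", a double slash means
    "absolute from the server root". Name and return shape are assumptions.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an FTP URL: %r" % (url,))
    # The first "/" only separates the host from the path, so drop it.
    rest = parts.path[1:] if parts.path.startswith("/") else parts.path
    # A slash that survives is part of the path itself: absolute path.
    is_absolute = rest.startswith("/")
    return parts.hostname, unquote(rest), is_absolute
```

With this sketch, ftp_path("ftp://serv.dom/path") reports the relative path
"path", while ftp_path("ftp://serv.dom//path") reports the absolute path
"/path".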

version 0.9pl25 (Mar ?? 1999)
* get rid of all Xt GUI code
* fixed bug in code which handles filesystem unsafe characters in Win32
* fixed bug in sync mode which stopped crawling when the starting document is
  up to date (thanks to Dave Becket)
* fixed minor bug in handling of ; character inside URL
* implemented support for multiple HTTP proxy servers with intelligent round
  robin scheduling
* fixed segfault when using ftp/gopher HTTP gateway and cookies are enabled
  for sending
* fixed bug in url_compare() function which gave wrong results when comparing
  URLs with different schemes (thanks to Niraj Sachdeva)
* fixed uninitialized HOME environment variable checking (thank to Andreas
* added check for db_185.h to configure script when looking for Berkeley DB1
  header files (thanks to Roar Bergheim)
* fixed checking of start/end time limits in sync mode (thanks to Peter Thalman)
* fixed segfault with moved robots.txt files (thanks to Bill Miller)
* fixed bug in function filename_to_url() which caused odd behavior mostly
  in sync mode (thanks to Peter Thalman)
* fixed HTTP proxy Digest authorization code
* added possibility to use authinfo file to store proxy authorization
* implemented optional multithreading support (currently works only in the
  console version; the GTK version needs further changes and testing)
* changed URL encoding/decoding handling, now user must enter regularly
  encoded URLs
* several simplification changes in files (thanks to aldomel)
* fixes to script files to get 'make distcheck' working (thanks to aldomel)
* simplified recomputation of GMT time from local time on systems with
  tm_gmtoff inside struct tm (thanks to Robert Brennecke)
* corrected pavuk behaviour when -request contains some unpredictable request
  specifications (thanks to aldomel)
* fixed compilation with --disable-tree
* fixed SSL read/write error handling (thanks to Jeff Roberson)
* split gui code to more modules
* fixed segfault when trying to preview document properties in URL tree
  preview dialog
* fixed scheduling from UI
* bit changed statusbar in UI
* zillion miscellaneous changes to get the GUI working with multithreading
* workaround for HP-UX NAME_MAX/PATH_MAX settings to disable automatic
  adjusting of long filenames to 14/255 limits (thanks to Niraj Sachdeva)
* got -store_name option working again (thanks to Orestes Sanchez Benavente
  and Jan Tomasek)
* fixed possible problems with reading and writing via SSL on nonblocking
* fixed functionality of -local_ip option when you change it in GUI
* fixed rewriting of URLs in HTML form action tags
* optimized header files dependencies - faster compilation
* removed minor memory leaks in HTML forms processing code
* corrected parsing of FTP response to PASV command to be able to cooperate
  with publicfile FTP server (thanks to Felix von Leitner)
* fixed implementation of html_tag_co_elem() function
* implemented ability to fill in HTML forms noninteractively when a matching
  form is found (many thanks to Jeff Roberson for the idea and first
  implementation)
* implemented dumping of documents to any supplied file descriptor (thanks to
  Honza Tomasek)
* corrected pavuk process exit value computation (redirected documents are
  not counted as failed yet) (thanks to Thomas Coppock)
* fixed bug in function url_to_absolute_url() which caused bad behaviour with
  URLs ending with -index_name. (thanks to Antoine Martin)
* --------- released testing version 0.9pl25c
* implemented code for saving session data to ~/.pavuk_keys in GTK interface
* corrected handling of multiline lists in HTML form filling dialog
* corrected several bugs in HTML forms parsing code
* fixed hangup on exit when using language switching from GUI menu
* fixed possible segfault when HTTP server responds with improper response
* --------- released testing version 0.9pl25d
* added several sample identity strings to combobox in GUI
* added files for integration to Gnome menu
* fixed bug with -fnrules F ... caused by FNM_PATHNAME flag passed to
  fnmatch() with some libc implementations (thanks to Nicolay Mausz)
* corrected bad behaviour of function get_abs_file_path_oss() which expanded
  relative paths to absolute paths the wrong way
* changed behaviour of 'Load scenario' which now resets configuration before
  loading scenario and added new function 'Add scenario' which behaves same
  as 'Load scenario' before
* fixed bug introduced in 0.9pl25a which damaged url structure and caused
  cycling of download and hangups or segfaults on exit
* adjusted NS cache directory access routines to be safe when accessing from
  multiple threads
* ---------- released testing version 0.9pl25e
* fixed segfault caused by wrong call to tl_str_concat() in doc_download()
* fixed GUI compilation without NLS support (thanks to Gabor Z. Papp)
* fixed Toggle toolbar functionality
* minor corrections in Makefiles (thanks to Petr Cech)
* fixed pavuk.spec file to properly build RPMs
* updated Slovak, Czech, Spanish message catalogs (thanks to all authors)

version 0.9pl26 (Aug 31 2000)
* added new Italian message catalog by Antonio Fragola
* updated German message catalog (thanks to Colin Marquardt)
* fixed sending of HTTP Content-type: request header with POST requests
* implemented optional deleting of remote FTP documents after successful
  transfer (idea by Gabor Z. Papp)
* you can now optionally disable the numbering of overlaying documents to
  achieve unique name using option -nounique_name (idea by Nicolay Mausz)
* added patch from Nicolay Mausz which implements new rmpar function in
  -fnrules option syntax
* fixed bug in SSL reading code which raised an error when the session was
  regularly closed on the other side (thanks to Martijn van Oosterhout's patch)
* fixed cooperation with SSL FTP servers which indicate a successful switch to
  SSL mode with 234 response code (thanks to Martijn van Oosterhout's patch)
* fixed opening of FTP data connections. The old code could cause deadlocks in
  communication with some proxy servers. (thanks to Martijn van Oosterhout)
* fixed typo in config.h which refuses compilation on HP-UX (thanks to Niraj
* ---------- released testing version 0.9pl26a
* better checking for pthreads support in configure script
* added option --with-gtk-config to configure script, to allow easier
  configuration on system with such weird renaming of libs/scripts as
  on FreeBSD
* added handling of HTTP server response fields Content-Location:,
  Content-Base:, Base: for setting base URL of document (thanks to Robo
* warning Zero length ... will no longer appear with HTTP documents which
  don't contain Content-Length: response field
* fixed total document size computation of partially transferred documents
  if server doesn't provide Content-Length: header but only Content-Range:
* fixed broken robots.txt parser
* support for extended robots.txt standard with new Allow: statement
* -request option was extended to also allow specifying the destination
  filename of the document in the local filesystem
* -debug_level user now also shows the filename where the document is stored
* fixed bug in robots.c when host name field in robots structure was
  deallocated without discarding data when restarting
* added MT locking of robots data; without locking should cause unpredictable
* now it is possible to enter empty values for form data in POST request
  specification dialog
* form editor dialog now properly extracts also hidden fields
* corrected handling of HTTP response code 303 with POST requests, now pavuk
  correctly redirects to GET request as it should
* ---------- released testing version 0.9pl26b
* added support for PCRE regular expression in -*rpattern options and in
  -fnrules option
* -amime -dmime options now also accept wildcard patterns
* added TLSv1 support for HTTPS/FTPS communication
* added new option in configure script --with-regex, which allows selecting
  the preferred regular expression type (one of
  none/auto/posix/gnu/v8/bsd/pcre)
* fixed compilation error in lfname.c when none of the supported regular
  expression types was configured
* enabled substring substitution in -lfname option when using Bell V8 regular
  expressions and regsub() function is available (cygwin b20 doesn't export it)
* added new option -dump_urlsfd to enable outputting URLs from downloaded HTML
  documents to selected file descriptor - usable for scripting
* adjusted filename handling in WIN32 version to support new style of mapping
  win32 paths to POSIX paths in newer cygwin-1.x.y versions
* corrected comparing of URLs in -formdata option (thanks to Jeff Roberson)
* ---------- released testing version 0.9pl26c
* fixed segfault on parsing supported URLs with missing scheme-dependent
  part of URL string (thanks to Marc Tooley).
* fixed problem with sleep() implementations which use SIGALRM for wake up
  in multithreaded version (thanks to Antoine Martin)
* new option -dont_leave_site_enter_dir/-leave_site_enter_dir which allows to
  limit leaving of directory which we entered first on the site
* enabled option -store_name to work also in other modes than just singlepage
* wrote small document wget-pavuk.HOWTO for wget users who are starting to
  use pavuk
* updated manual page
* -h option works now properly when -bg option is also used (thanks to
  Artem Frolov)
* attempt for workaround signal handling inconsistency in multithreading
  environment (thanks to Antoine Martin)
* define DB_LIBRARY_COMPATIBILITY_API in nscache.c before including db_185.h
  to force reading 1.8x Berkeley DB format with 3.xx library
* updated Slovak message catalog
* ---------- released testing version 0.9pl26d
* fixed problems with frozen threads on Solaris when starting download (thanks
  to Antoine Martin)
* added call to FreeConsole when running pavuk with -bg option on Win32
  systems (thanks to Andreas Mohr)
* added some gdk_flush() calls to status list modification code to force
  better updates
* added new option -singlepage/-nosinglepage to overcome limits of -mode
  singlepage (thanks to Jo? Savignon)
* sync mode now also checks the size of documents downloaded over HTTP
  (thanks to Raun Nohavitza)
* added check for ssize_t type, without it won't compile on Ultrix
* ---------- released testing version 0.9pl26e
* added support for using network paths on WIN32 with cygwin-1.1 =<
* fixed broken -dont_leave_site_dir option
* added commandline passwords hiding feature (thanks to Steven Haryanto)
* fixed behaviour of -dont_leave_site_dir with moved site enter URLs
* updated German and Spanish translations (thanks to Javier and Colin)
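
The 0.9pl26 entry above about HTTP response code 303 says that a POST request
answered with 303 is followed up with a GET request. A small Python sketch of
that rule follows. It only illustrates the 303 case named in the entry: the
function name next_request is hypothetical, and leaving the method untouched
for every other status code is a simplification made for this example, not a
statement about pavuk's behaviour.

```python
def next_request(status, method, body=None):
    """Return (method, body) for the request that follows a redirect.

    Only the 303 rule from the changelog entry is modelled: the follow-up
    request becomes a GET without a body. Other status codes are passed
    through unchanged, which is an assumption of this sketch.
    """
    if status == 303 and method != "HEAD":
        # 303 See Other: fetch the referenced resource with GET.
        return "GET", None
    return method, body
```

For example, next_request(303, "POST", "a=1") yields ("GET", None), so the
form data is not sent again to the redirect target.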

version 0.9pl27 (Dec 13 2000)
* fixed infinite loop bug when both -store_name && -request options are used
  (thanks to Matthew)
* add new menu to GUI for selecting starting URLs from opened documents inside
* fixed bug which caused nearly all HTML documents to be reloaded in sync mode
  because of size comparison
* fixed bug in parsing FnameRules: scenario field (thanks to Le Faucheur
* fixed freeze on scenario loading from GUI in multithreaded version (thanks
  to Le Faucheur Frederic)
* query strings from HTTP/HTTPS URLs are no longer decoded when generating
  local names
* new naming convention for local documents downloaded via POST request
  name#query (thanks to mda)
* fixed bug which caused hangs or segfaults when using -formdata option,
  because of double-freeing a memory chunk (thanks to Matthew)
* added two new patterns (<script , <style) to routine for guessing HTML files
* fixed dumping of wrong ENCODING: fields in -formdata, -request infos to
  scenario file (thanks to Matthew)
* ---------- released testing version 0.9pl27a
* -disable_html_tag all and -enable_html_tag all now work to disable/enable
  all HTML tags
* fixed fast spawning loop in multithreaded version caused by bad use of
  pthread_cond_timedwait() (thanks to Bjorn R. Bjornsson)
* fixed progress display bug showing size in bytes instead of kilobytes
  (thanks to Andreas Mohr)
* fixed bug in FTP code where pavuk opened the data connection twice for
  directory listings (thanks to Raun Nohavitza)
* fixed stupid bug where pavuk used short int type instead of unsigned short
  for storing port numbers (thanks to Raun Nohavitza)
* fixed checking of HTML document types with added encoding after MIME type
  (thanks to Brunie-Taton Alain)
* repaired broken site level computing on sites with moved starting documents
  in -site_level option
* implemented functions for launching commands on WIN32 with system()-like
  function when cygwin not installed (thanks to Thierry R?nier)
* added support for loading files from MSIE cache on Win32, and added options
  -ie_cache/-noie_cache to enable/disable this feature
* backported improvements to gaccel code from chbg. Now it is much more
* added new macro %q to -fnrules option, which will be replaced with urlencoded
  query string from POST/GET request specification
* fixed big memory leak in old style fnrules evaluation function caused by bad
  block nesting
* added four new functions (sif, !, &, |) to -fnrules option. ! is logical NOT
  for numeric values. & is logical AND for numeric values, | is logical OR for
  numeric values. sif is a decision between two strings by condition.
  (sif (cond) (val_if_cond_true) (val_if_cond_false)) is equivalent to the C
  expression (cond) ? (val_if_cond_true) : (val_if_cond_false)
* added checks to reject compilation of NS cache reading code with BerkeleyDB
  2.0 and above because of incompatible database format. NScache uses 1.8x hash.
* corrected support for reading NS cache on big endian platforms based on patch
  for my NScache program from ...
* made HTTP/1.1 default (still possible to switch to HTTP/1.0 with option
* changed handling of parent urls in URL structure. A linked list is now used
  instead of a NUL terminated array. It is much safer for handling in MT.
* fixed segfault on redirection of robots.txt when HTTP/1.1 enabled, caused by
  bad handling of persistent connections
* fixed bug in robots.txt file parsing code which causes infinite loops with
  some robots.txt files
* fixed memory leaks on robots.txt redirections
* fixed segfault when using -mode dontstore in multithreaded mode, caused by
  allocating shorter buffer for storing temporary unique name :-(
* fix to be able to compile with gtk-1.3 (aka gtk-2.0)
* added support for HTTP redirection on 307 response code
* added description messages for all HTTP/1.1 response codes which may occur
  and cause unknown errors just with numeric description
* fixed bug in processing of HTTP/1.1 chunked transfer encoding types
  after moved URLs because of oddly initialized trailer reading flags :-(
* it is now possible to enter on the commandline options unsupported in the
  current compile time configuration; pavuk now only displays a warning instead
  of raising an error and exiting (thanks to Bjorn R. Bjornsson)
* fixed compilation when threads are enabled and support for regular
  expressions is disabled or not present
* added locking of robots.txt info structure to prevent downloading it
  concurrently with multiple threads when compiled with MT support
* ---------- released testing version 0.9pl27b
* fixed compilation bug when compiling without SSL support (thanks to
  Le Faucheur Frederic)
* fixed bug made in previous testing release which always caused a segfault
  when opening Limits config dialog because of use of an uninitialized pointer
* added support for long/short commandline options with GNU getopt like syntax
  and compatibility with old format of pavuk options (no short options defined
* changed handling of scenarios from commandline. A scenario is now loaded at
  the time the --scenario option is processed by the commandline parser instead
  of prior to commandline parsing as before.
* now it is not mandatory to specify --scndir option before loading scenario.
* ---------- released testing version 0.9pl27c
* more reliable implementation of asynchronous DNS client/server for GUI
  version. It now guarantees atomicity of reads/writes, so there is no
  possibility of protocol inconsistency after a user break in the middle of
  communication.
* internal restructuring of code (hope not, but may lead to problems)
* fixed bug in preserving of persistent connections on robots.txt redirects
* fixed unnecessary closures of persistent connections in sync mode after
  304 response code
* added new options -dump_after/-nodump_after for use with -dumpfd option.
  These options control when the document is dumped to output (immediately or
  after download&processing)
* added new options -dump_response/-nodump_response for dumping also HTTP
  responses to -dumpfd
* fixed bug in parsing CSS inside HTML tags
* removed support for extracting destination URL from HTML after HTTP
  redirects. It must be broken server which doesn't send Location: header
  after redirect ... not worth to add workarounds for this problem
* rewrote from scratch the HTML parser (this means I've got rid of the
  oldest, worst written code in pavuk). It seems it should be a bit faster
  and is much more extensible and maintainable.
* removed few small memory leaks
* added simple support for javascript patterns in DOM event attributes of tags,
  based on regular expressions
* ---------- released testing version 0.9pl27d
* fixed several memory leaks
* fixed bug in base64 encoding routine which was failing with non ASCII
  characters above 127
* changed the way Digest authorization is handled
* implemented NTLM authorization
* implemented NTLM proxy authorization
* now -auth_scheme & -http_proxy_auth options accept also textual parameters
  "user" "Basic" "Digest" "NTLM" besides numeric 1 2 3 4
* total restructuring and cleanup of HTTP handling code. I was careful,
  but it may lead to problems.
* NTLM and Digest authorization now also work with CONNECT requests
* minor changes in common settings dialog
* fixed bug in processing js patterns caused by bad tag attributes
* added new option -js_patterns to allow parsing of custom javascript patterns
  inside HTML documents
* added support for also parsing the script body and looking for patterns line
  by line (works also for files referenced by <SCRIPT SRC=...>)
* implemented handling of proxy redirects (305 HTTP response)
* fixed compilation bug caused by undeclared _mt_dumpfd_lock_ mutex (thanks
  to Le Faucheur Frederic)
* fixed bug in handling locales in national environment (thanks to Milan
* added Czech translation to Gnome desktop entry for pavuk (thanks to Milan
* ---------- released testing version 0.9pl27e
* implemented detection of broken HTTP/1.0 proxies which don't handle properly
  downgrading to HTTP/1.0 when communicating with server which use newer HTTP
  protocol version (this causes bug when trying to use persistent connections)
* more paranoia checking of reading/writing sockets in HTTP code
* automatic request repeat after premature closure of persistent HTTP
* added support for robots excluding with <META NAME="robots" content="...">
  (thanks to Markus Mayer)
* fixed compilation bug with OpenSSL-0.9.6 because of new MD4 implementation
  in this OpenSSL version (thanks to Le Faucheur Frederic)
* fixed bug in new html parsing engine which fails to parse properly rest of
  document after <script>...</script>
* added support for HTTP/1.0 Keep-Alive proxy connections
* ---------- released testing version 0.9pl27f
* added install script for NSIS win32 installer
* fixed compilation bugs when building without GUI
* portability fixes to QNX RtP
* updated auth info edit dialog for NTLM support
* fixed possible MT race condition in gopher directory parsing routine
* fixed confusion of ftp code with -remove_old & -ftplist when in sync mode
  files disappeared from server were processed like directories which failed
  (thanks to galanga)
* ported to BeOS 5 PE (works fine except file locking)
* added support for javascript parsing in javascript:... URLs inside any
  supported HTML attribute
* fixed ftp directory listing when using active ftp data connections
* added option -follow_cmd which allows you to execute some script which
  can decide if pavuk should follow links from current document (thanks to
  Georg Rehm and hashao)
* adjusted establishment of active ftp data connections to properly handle
  states where the server is unable or unwilling to connect before
  sending response
* leading/trailing spaces are removed from attributes before processing it
  as URL to support broken sites ...
* ---------- released testing version 0.9pl27g
* fixed segfault when Location: contains relative URL after redirect
* fixed broken timestamping of HTML files in sync mode (thanks to Le Faucheur
* fixed segfault on broken HTML tags with leading spaces and unclosed quotes
* if -store_info is active also rejected URLs contain stored MIME header
  (thanks to Georg Rehm)
* don't apply limiting conditions (minsize/maxsize/mimet) on robots.txt
* fixed segfault when -norelocate option is activated (thanks to Markus Mayer)
* added O_BINARY to several open calls to prevent possible problems on Win32
* added new options -retrieve_symlink/-noretrieve_symlink to enable
  downloading of symbolic links from FTP server as regular files (thanks to
  Petr Cech & Andras Korn)
* fixed segfault in robots info cleanup code
* implemented new -js_transform option to allow a bit more powerful support
  for js patterns. No rewriting supported now (thanks to Mark D. Anderson)
* fixed problems when compiling with PCRE support
* ---------- released testing version 0.9pl27h
* fixed segfault on broken meta refresh tag (thanks to Georg Rehm)
* fixed bug in removing of trailing spaces from URLs (thanks to Le Faucheur
* added support for access authorization to FTP proxy server (thanks to Beno
* added GUI config for -js_transform option
* fixed bug in processing javascript bodies enclosed between <script></script>,
  which causes breaking of ending </script> tag
* -js_pattern patterns without substrings are now omitted
* fixed broken behaviour where pavuk, on receiving an empty response while
  regetting a file, processed it as a proper HTTP/0.9 response and stopped
  regetting the file (thanks to Christian Axbrink)
* simplified those horrible dialogs for adding preferred languages, charsets
  and mime types
* added new debug level "limits" for debugging limiting conditions
* updated manual page
* fixed deadlock on closing log file
* ---------- released testing version 0.9pl27i
* updated Czech message catalog (thanks to Petr Cech)
* added initialization of GTK locales
* added possibility to generate message catalogs in UTF-8 encoding for
  use with future versions of GTK+
* fixed problems with switching language multiple times in GUI window
* updated documentation
* updated German message catalog (thanks to Colin Marquardt)
* fixed retrieving of URLs from selection and via DND to omit illegal CRLF
  characters (thanks to Aleksander Adamowski)
* adjusted win32 installer script to support installing message catalogs
* added support for setting message catalog path on WIN32 to install directory
* better handling of WIN32 paths in GUI
* added window icon to WIN32 version
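
The 0.9pl27 entry above on the -fnrules functions sif, !, & and | gives their
meaning in terms of C expressions. The Python sketch below mirrors those
semantics for illustration only. The Python names (sif, fn_not, fn_and, fn_or)
and the convention that results are 0 or 1 with any nonzero value counting as
true are assumptions borrowed from C, not taken from pavuk's source. Unlike
the C ternary, this sif receives both strings already evaluated.

```python
def sif(cond, val_if_true, val_if_false):
    """Choose between two strings, like (cond) ? a : b in C."""
    return val_if_true if cond else val_if_false


def fn_not(a):
    """Logical NOT for numeric values ("!" in the changelog entry)."""
    return 0 if a else 1


def fn_and(a, b):
    """Logical AND for numeric values ("&" in the changelog entry)."""
    return 1 if (a and b) else 0


def fn_or(a, b):
    """Logical OR for numeric values ("|" in the changelog entry)."""
    return 1 if (a or b) else 0
```

A rule built from these pieces could, for instance, pick one of two file name
suffixes: sif(fn_and(1, fn_not(0)), ".html", ".bin") selects ".html".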

version 0.9pl28 (Aug ?? 2001)
* added new option (-limit_inlines/-dont_limit_inlines) to disable checking
  of limiting options for inline objects (thanks to Olivier Sirol)
* fixed bug with special characters in filenames on FTP servers (thanks to
  Jo? GRONDIN), same for Gopher directories
* FTP directory listings are now transferred in ASCII mode (thanks to Jo?
* removed MT race condition in calling inet_ntoa()
* added new option -ftp_list_options to allow passing options to FTP LIST/NLST
* support for multiple WWW-Authenticate: and Proxy-Authenticate: in HTTP
  response (thanks to Monika Nowotnik)
* ported to AtheOS
* fixed improperly handled rewriting of links in HTML documents pointing to
  itself (thanks to Nicolay Mausz)
* added new function (getval) to -fnrules option extended syntax rule for
  getting values of query parameters of URL (thanks to Nicolay Mausz)
* added initialization of OpenSSL PRNG randomizer to prevent message
  "PRNG not seeded" on some platforms (thanks to Albert Chin)
* ---------- released testing version 0.9pl28a
* compilation fixes for nongcc compilers and bigendian architectures (thanks
  to Albert Chin)
* fixed segfault which occurred always when used unknown long option
* added forgotten gdk options to option table
* fixed compilation without NTLM support enabled (thanks to Georg Rehm)
* added option --disable-ntlm to configure script to be able to compile
  pavuk without NTLM authorization support (thanks to Albert Chin)
* fixed segfault which occurs when closing Common config dialog (thanks to
  Georg Rehm)
* fixed all non-working options using regular patterns when pavuk is compiled
  as multithreaded program (thanks to Mirko)
* fixed NTLM implementation to be able to work properly on bigendian machines,
  with non GCC compilers and on 64bit platforms
* fixed leaking of file descriptors after "File redirect" when a persistent
  connection was previously open
* improved URL queue handling and downloading threads management
* changed internally handling of filename assignments (not well tested yet,
  can cause instability or deadlocks in MT)
* fixed segfault when no URL is specified in -request or -formdata options
  (thanks to Andrew Price)
* fixed segfault when using -formdata option caused by freeing already freed
  memory chunk (thanks to Andrew Price)
* removed several minor memory leaks
* added checking of BerkeleyDB implementation in libc in configure script
* updated French message catalog (thanks to Le Faucheur Frederic and
  Pascal Adoux)
* added new option -fix_wuftpd, to fix broken wuftpd behaviour, when it
  doesn't raise error when listing not existing directory (thanks to
* ---------- released testing version 0.9pl28b
* added new option -post_update/-nopost_update to force pavuk's URL updating
  engine to update in parent documents only the URL currently downloaded
* %o macro is supported now also in simple -fnrules macros
* added two new macros to -fnrules option - %M == mime type of document,
  %E == standard extension of document MIME type. This two new macros work
  properly only when used with -post_update options. (thanks to Majkel
* in sync mode, links from the directory scan (if -subdir was specified) are
  now processed first, and only then the other links.
* added two new functions to -fnrules option rules (getext - gets extension
  from path , seq - string equal)
* fixed scheduling, broken by changes to support long options
* fixed commandline parser, so it again supports --long-opt=val style of
* using mkstemp instead of tmpnam when available (thanks to Frédéric
  L. W. Meunier)
* type icons in tree view were replaced with smaller icons
* new option -info_dir which allows you to store pavuk_info files outside
  of document tree
* fixed bug, when after reget of document also unnecessary documents are
  loaded to memory, this can cause out of memory situations with big
  documents (thanks to Jinghua Liu)
* added new option -js_transform2 which has a similar function to
  -js_transform, but also allows rewriting of matched URLs. This is also very
  suitable for adding tags/attributes which are not supported by pavuk by
  default.
* added forgotten handling of GUI configuration of -js_transform option
* new faster growing hash function to allow bigger size hashes when downloading
  huge amount of documents
* ---------- released testing version 0.9pl28c
* fixed resources leaking after reopening of netscape cache index
* better handling of netscape cache index file after modifying with some
  other program
* added support for loading files from mozilla browser cache directory
* fixed broken saving of document infos for rejected files (thanks to Georg
* changed a bit logic of lists when cleaning lists and deleting fields (thanks
  to Marco Strack)
* implemented new options -aport/-dport to allow/deny downloading of documents
  from servers at specified ports (thanks to Georg Rehm)
* fixed bug in handling patterns in GUI (thanks to Georg Rehm)
* added to configure script checking of POSIX regex in libregex (as on recent
  cygwin versions)
* fixed compilation of MT version (thanks to Jeremy P. Campbell)
* ---------- released testing version 0.9pl28d
* fixed problems with -preserve_time on win2000 (thanks to Andreas Schiling)
* added new option -hack_add_index/-nohack_add_index useful for more extensive
  site mirroring when for each URL taken from HTML documents also directory
  of the document is added to queue (thanks to stvictor)
* better handling of unsafe characters in HTTP requests
* updated manual page
* after unexpected error while regeting, the .in_ file now will be always
* ftp directories are not inserted into queue twice when doing directory
  based synchronization (thanks to Jo? GRONDIN)
* no more problems with duplicating FTP directory indexes in sync mode
  (thanks to Jo? GRONDIN)
* on error in scenario file pavuk now exits with error instead of continuing
  (thanks to Jo? GRONDIN)
* when processing symlink from FTP server which points to directory, pavuk
  will make link to directory not to directory index file (thanks to Jo?
* if HTTP server sends Content-Length: in response and option -check_size is
  active, then pavuk now reads exactly this size without waiting on
  connection close even when not using persistent connections. This (thanks to
  Glen Stewart)
* ---------- released testing version 0.9pl28e
* fixed SSL library detection on SYSV systems with libsock (thanks to Eun-Mok)
* added new option -default_prefix to simplify mirroring when -base_level
  option is used
* -max_time option now allows to specify subminute times
* in GUI it is now possible to enter subminute communication timeout
* added right button menu to log widget
* ---------- released testing version 0.9pl28f
* new function "ud" for -fnrules option used for decoding URL encoded
  strings (thanks to Tony Gale)
* applied patch from Albert Chin
   -  new -egd_socket <path> command-line option
   -  new --egd-socket=<path> autoconf option to provide a hard-coded
      compile-time path for the EGD socket
   -  use RAND_file_name to get the pathname of the EGD socket if RANDFILE
      env variable is set instead of RAND_EGD_SOCKET_PATH env variable
   -  new --with-zlib-includes=DIR and --with-zlib-libraries=DIR autoconf
      options to specify location of zlib library
  (many thanks to Albert Chin)
* fixed bug in URL rewriting engine (thanks to Nicolay Mausz)
* fixed broken -mode reminder (thanks to Andrea Tasso)
* fixed bug in parsing ftp URLs with transfer type specified (thanks to
  Richard Ems)
* replaced old config.sub, config.guess files with new versions from
  automake-2.50 and adapted for atheos (thanks to Petr Cech)
* in -formdata and -request options it is now possible to specify requests
  without any field entered (thanks to Dima Nemchenko)
* fixed broken behaviour of -limit_inlines/-dont_limit_inlines option
* fixed sync mode with mirrors with changed layout of local tree
* rewritten limiting conditions checking engine
* ---------- released testing version 0.9pl28g
* fixed msgfmt detection in configure script (thanks to Richard Ems)
* fixed compilation without SSL support (thanks to Richard Ems)
* updated Spanish Message catalog for 0.9pl27 (thanks to Francisco Javier
  Comer? Gayoso)
* rewritten limiting conditions checking engine again
* implemented JavaScript bindings to enable users to use more flexible
  conditions for excluding URLs from download (new option -js_script_file)
* implemented new function "jsf" for -fnrules option which allows execution
  of JavaScript functions by name
* ---------- released testing version 0.9pl28h
* implemented JavaScript console dialog
* fixed segfault which occurred always after unexpected HTTP response when
  regeting files (thanks to ha shao)
* implemented workaround for ftp servers which understand REST command but
  always restart from scratch (greeting MS :-)) (thanks to Raun Nohavitza)
* exported new attribute of url in Javascript bindings (html_tag) which holds
  source HTML tag of particular URL when level == 0
* new method "get_sub" of PavukFnrules class in JS bindings for getting
  subpatterns from -fnrules patterns
* more enhancements for JS bindings classes
* fixed hangup in http_throw_message_body()
* fixed possible race condition when using url_set_path()
* added new option -ftp_login_handshake to enable customizing of FTP server
  login procedure (thanks to Marko Daris)
* added new option -rsleep for randomizing sleep time between transfers in
  interval 0 -> -sleep (thanks to Christian Canella)
* added new Japanese message catalog by SATO Satoru (thanks)
* ---------- released testing version 0.9pl28i
* rewrote detection of BerkeleyDB 1.8x in configure script
* updated French message catalog (thanks to Frederic Le Faucher)
* fixed compilation with Gtk+-1.0
* applied IRIX portability patch from Albert Chin (thanks)
* fixed compilation on newest version of cygwin (thanks to Pablo Blasco)

version 0.9pl29 (??? ?? 2001)
* redesigned SSL implementation
* FTPS now works perfectly over proxy
* added support for Netscape NSS as replacement for OpenSSL SSL layer
* fixed detection of BerkeleyDB 1.8x header files
* ---------- released testing version 0.9pl29a
* FTP active mode now uses address from getsockname() instead of gethostname()
* applied changes made between 0.9pl28i and 0.9pl28
* added IPv6 networking support including FTP support by RFC1639 and RFC2428
* added new options -dont_touch_url_pattern -dont_touch_url_rpattern to deny
  download and rewrite of particular URLs in HTML tags
* added clause into COPYING file about linking pavuk with OpenSSL
* applied part of patch from Albert Chin (thanks!)
  - fixed prototype declarations in htmlparser code
  - added include arpa/inet.h in http_proxy.c
  - fixed declarations in html_proxy.c, ftp.c to make some compilers happy
  - rewrite of configure script to use config file instead of horrible looking
    infinite compilation commandlines
* updated win32 installer config file - pavuk.nsi
* added README.win32 file to source distribution
* added new Polish message catalog by Przemyslaw Sulek (thanks!)
* ---------- released testing version 0.9pl29b
* updated Czech message catalog by Petr Cech (thanks!)
* updated Polish message catalog by Przemyslaw Sulek (thanks!)
* added new Ukrainian message catalog by Dmytro O. Redchuk (thanks!)
* fixed typing of variables in ntl_auth.c (thanks to Petr Cech)
* added new options -dont_touch_tag_rpattern to deny download and rewrite
  of URLs in particular HTML tags
* dropped GTK+-1.0.x GUI support (sorry)
* fixed switching of languages in GUI
* much better handling of GoBg function ... GUI cleanup is now done mostly
  immediately without waiting for the transfer to end
* added two new properties to PavukUrl class in JS bindings to allow writing
  of content based limiting options (both are defined when level == 0)
  - .html_doc - full content of parent document of URL
  - .html_doc_offset - offset of current HTML tag in parent document of URL
* fixed compilation without IPv6 support
* po/Makefile now uses generated list of catalogs instead of hand written
* LINGUAS support in configure script
* fixed initialization NSS library after config changes
* -unique_sslid option is now supported when using NSS as SSL library
* new options -nss_domestic_policy/-nss_export_policy to allow selection
  of SSL ciphers suites in NSS for U.S. Domestic or for Export ciphers
* support for libmcrypt/libgcrypt DES in ntlm code to allow not to use
  non GPL compatible libcrypto from OpenSSL
* removed default javascript patterns
* applied patch from Harald Forster (thanks!)
  - fixed *printf formats for shorts and chars
  - fixed handling of va_list in xvaprintf
  - fixed bugs in dllist* routines
* using safe vsnprintf instead of vsprintf when available
* fixed IPv6 support to work with FreeBSD
* replaced libc ctype functions with own functions to be not dependent on
  proper locale
* fixed get_1qstr() to not fail when last char is \ (back slash)
* added GUI config for -dont_touch* options
* added support for no EOL closed strings in gui_xprint()
* hopefully fixed the reason for crashing randomly in multithreaded mode
  caused by trashing URL structures temporarily linked inside hash tables
  (thanks to all who reported the MT crashes)
* when loading preferences (-prefs) always reset config to prevent adding
  cumulative options loaded from rc files (thanks to Harald Forster)
* loading scenario and resetting config from GUI no longer leaks memory
* updated Japanese message catalog (thanks to Sato Satoru)
* fixed bug with -max_time option causing total pavuk confusion (thanks to
  Gema Pizana)
* config changes entered but not yet applied in the common and limits dialogs
  no longer disappear when trying to pop up already visible dialogs (thanks to Harald
* ---------- released testing version 0.9pl29c
* added new spec file for building multiple RPMS of pavuk with different
  configurations (thanks to Rami El-Charif )
* added support for new format of Mozilla cache
* implemented new options -tag_pattern & -tag_rpattern to allow precise
  matching of URLs inside HTML tags based on matching of HTML tag, HTML
  attribute and URL patterns (thanks to Huaxin Wang)
* updated man page
* updated Slovak message catalog
* switched to use newer autoconf(2.50) & automake(1.4-p4) versions
* processing of HTML files downloaded over gopher is now supported
* retry for document transfer is now performed always when it is
  clever to do so. Increased default number of retries to 2.
* fixed storing of local name for URL into scenario (thanks to Stephen Sweigart)
* when you specify the LNAME: field in a -formdata specification, it
  will be used as the local name for the request
* !!! changed exit values of pavuk process. Now 0 means everything was OK,
  1 means configuration error and 2 means that there were some problems
  with some documents
* MacOSX portability fixes
* fixed routine for adding starting URLs to allow entering file: URLs
* fixed segfaulting in cookie expiration code (thanks to Mark D. Anderson)
* fixed compilation with disabled regexp support
* fixed segfaulting when using -asite/-dsite/-adomain/-ddomain options and
  file: URL appears in the html documents (thanks to farquat)
* ---------- released testing version 0.9pl29d
* several random fixes
* fixed NTLM nonce decoding
* fixed getval and rmpar functions of fnrules option (thanks to Alexey Morozov)
* fixed ssl_write functions which sometimes hang printing error message forever
  (thanks to Robert Dobozy)
* fixed optional sending of WC/ASCII type of NTLM T3 messages
* ---------- released testing version 0.9pl29e
* Netli measurement code added
* SSL rewrite
* First Sourceforge release
* ---------- released testing version 0.9pl29f (Jun 3 2003)
* bug fixes
* ---------- released testing version 0.9pl30a (Jul 12 2003)
* fixed build system for translation files
* added -referer/-noreferer option
* updated German translations
* updated autoconf stuff and rebuild all Makefiles using autoconf 2.57
* included lots of updates and newer files floating around in the net
* ---------- released testing version 0.9pl30b (2004-07-05)
* fixed buffer overflow (BUG #984898)
* added AREA tag onClick event in htmltags.c to make javascript work.
* added a number of mimetype extensions to mimetype.h
* fixed OPTION element default value for certain common case.
* made POST the default form method in get_data_socket()
* fixed buffer overflows in digest authentication code
* fixed crash for META-Refresh URL's
* introduced new source-code design to get rid of tabs:
  indent --no-space-after-function-call-names
  indent -npcs -nprs -npsl -nsaf -nsai -nsaw -nut -bli0 -nlp
  Sources will be modified step by step. Care is necessary, as indent fails
  on the MT-macros! A new target reindent in source reindents the whole sources
* migrated large parts of code to ANSI-C, fixed lots of warning messages
* added --disabled-gtk2 option to autoconfig and GTK_FACE define now holds the
  GTK version number, some fixes are necessary for GTK2
* security fixes preventing possible buffer overflows
* cleanup of build system
* fixed wrong name building (BUG #1012746)
* ---------- released version 0.9.31 (2004-11-08)
* security fixes preventing possible buffer overflows
* cleanup build (language installation works again)
* added more const statements all over the source
* fixed HTML entity decoding error (thanks Michal Toma for the report)
* compiles with GTK2, but still brings run-time warnings (BUG #1068224)
* fixed handling of local anchors (<A HREF="#link">)
* fixed handling of path separators in search strings (BUG #1064453)
* read support for KDE2 cookie file (~/.kde/share/apps/kcookiejar/cookies)
* Added --enable-utf-8 option to configure, which produces all locale files
  in UTF-8 encoding.
* ---------- released version 0.9.32 (2005-03-17)
* slovak locale updated
* dont_leave_site condition no longer differentiates between protocols (HTTP,
  HTTPS, ...)
* fixed bug in case there are quoting characters inside a quoted string
* fixed strange URL's in the form <a href="?...."> to use the parent document
  instead of no document
* security patches
* fixed .pavukrc error (BUG #1247202)
* ---------- released version 0.9.33 (2005-09-27)
* fixed 64bit problems (BUG #1226863)
* updated German locale, fixes done by Debian developers (Hey, please inform
  us about errors. Scanning the net and all distributions for possible fixes
  is not very helpful.)
* ---------- released version 0.9.34 (2006-01-09)
* security fixes
* some minor bug fixes
* reworked build system a lot, fixed RPM spec file
* now builds fine using most of the possibilities pavuk provides
* RPM builds on openSUSE build service for SUSE since version 9.3, Fedora
  since version 4 and Mandriva since version 2006
* RPM packages can be found here:
* ---------- released version 0.9.35 (2007-02-21)
* added -persistent/-nopersistent option

2007-april-30 [notes taken from old work back in 2005/2006 merged into pavuk mainstream source tree]

* bufio has seen a MAJOR overhaul. It is now capable of pushing text &
  binary data to the file system at unprecedented rates. This is done by
  adding a variable sized (and possibly large) memory cache, resulting in
  large size I/O operations. These perform very much faster than the regular
  RTL I/O calls. (tested on quad CPU UNIX Dell servers)

  the new bufio was required as I needed to log/track a huge amount of data
  in the shortest possible time / lowest possible CPU load.

* cookie handling has been fixed/augmented. pavuk can now have the initial
  cookie values that go with a certain web request preconfigured on the
  commandline. Also, several bugs in handling the cookies have been fixed.
  (tested on a wicked ASP.NET intranet site which 'assumed' the use of a
  special web client (a TV set top box) which would transmit its serial #
  as a client-side created(!) cookie to the web server. This site/client
  combo thus actually transmitted cookies which would first show up in a web
  _request_ instead of the usual: a server-side _response_.)

* several portability items have been changed (h_errno, ...) to make the
  code compile and work on the odd-flavored UNIX box. A native Win32 port is
  under way: it now works, including zlib and OpenSSL, though the latter has
  not been tested recently.

  Note that the changes may have broken GTK support, as I was not able to
  build the code with GTK on my UNIX boxes.

* socket I/O (IP traffic) has been fixed to properly cope with user breaks
  (a user hitting Ctrl+C). Several locations in the software where the
  unexpected signal would cause an infinite loop have been identified and

* added several lines of DEBUG_xxx to aid both developer and user in
  tracking down hard to diagnose issues inside pavuk while scanning a site.

* Accept-Encoding (more specifically: the handling of x-gzip/gzip/
  x-compress/compress encoding) has been changed to allow for better
  portability: data is expanded in-memory, without the need for an external
  'gzip' tool and/or OS-specific forks & pipes.

  (Win32 wouldn't know a fork if ever it saw one.)

* ALL stdio is now handled through the new bufio system. This not only
  improves performance when you've got -debug and -debuglevel dialed all the
  way up, but also corrected several spots where, depending on your C RTL,
  stdout/stderr traffic would arrive at different moments on your console
  (some of it was written through the FILE I/O, some through direct I/O,
  causing blurbs of output to pass one another along the way to the actual

* buffer overrun protection has been improved. Note also that every
  snprintf() and derivative thereof is now 'augmented' by an additional line
  of code which ensures that the last character in the buffer is guaranteed
  to be a NUL sentinel, thus ensuring that the buffer will always present
  data in correct C string format (NUL-terminated). (This is an old habit of
  mine as some C RTLs have shown to be kinda flaky on the subject of NUL
  sentinels when snprintf() et al are writing data up to the edge of their
  output buffers: some C RTLs 'forget' to put a NUL there under particular
  circumstances (some commercial Watcom compiler releases come to mind)).
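
  The pattern described above can be sketched roughly as follows. This is
  only an illustration: safe_snprintf() is a hypothetical helper name, not a
  function taken from the pavuk sources, and the sources may well place the
  extra line at each call site instead of in a wrapper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a pavuk function): format into buf, then force a
 * NUL into the last byte of the buffer regardless of what the C runtime did. */
static int safe_snprintf(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(buf != NULL && size > 0);

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);

  buf[size - 1] = '\0';         /* the additional NUL sentinel line */
  return n;
}
```

  With an 8 byte buffer and a 10 character argument, the buffer ends up
  holding the first 7 characters plus the sentinel, while a C99 runtime
  still reports 10 as the length it would have written.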

* multithreading pavuk has been tested on a high perf MP UNIX box and it
  was like the documentation/notes state somewhere: unstable. The thread
  interlocking has now been fixed; one of the hardest to fix proved to be
  the lockup at the end of a pavuk run. The fix also includes the use of
  semaphores and some additional code changes to make the code thread safe;
  critical sections are now handled as such. This includes placing several
  non-threadsafe C RTL calls (e.g. ctime()) inside critical sections!

* auto-form-filling (the feature which led me to select pavuk over wget et
  al when I started the hammer/chunky project) has been fixed for those
  special pages where you have an empty form to submit: the site I had to
  test included such a form, which was submitted using javascript, but did
  not contain _any_ input fields (but cookies were expected to come with
  that request, thank you). Before, pavuk crashed on such a page. This has
  now been fixed.

* added a 'reindent' target to the makefile, using GNU indent to reformat
  the code. (When you're working several weeks on end in crunch time, you
  want to see some proper and consistent looking source code, even when you
  just made it a mess yourself...)

  Also extended the cleanup makefile target to help me in cleaning up any
  backup and/or temporary files created by vi and some log diagnostic

  [edit may/2007: wasn't this already in the makefiles before - see
  ChangeLog entry in 2003?]

* added several commandline parameter types, which allow you to instruct
  pavuk to use OS file handles or file names for logging activity, while you
  can now also specify whether a log file should be overwritten (default) or
  appended to (new feature) by adding another '@' prefix to the file path.

  TODO: document this properly.
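
  Until that documentation exists, here is a rough sketch of how such a
  value could be classified. It assumes the same convention that the
  -dumpfd entry further down describes (no prefix: OS file handle, one '@':
  file path to overwrite, two '@': file path to append to). The enum and
  the function name are hypothetical and not taken from the pavuk sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum log_target_kind
{
  TARGET_FD,                    /* value is an OS file handle number */
  TARGET_FILE_OVERWRITE,        /* "@path":  open path, truncate */
  TARGET_FILE_APPEND            /* "@@path": open path, append */
};

/* Classify an option value and point *rest at the part after the prefix. */
static enum log_target_kind classify_log_target(const char *value,
                                                const char **rest)
{
  assert(value != NULL && rest != NULL);

  if (value[0] == '@' && value[1] == '@')
  {
    *rest = value + 2;
    return TARGET_FILE_APPEND;
  }
  if (value[0] == '@')
  {
    *rest = value + 1;
    return TARGET_FILE_OVERWRITE;
  }
  *rest = value;
  return TARGET_FD;
}
```

  A caller would then either convert the remainder to a handle number or
  open it as a file in the matching mode.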

* added hammer/chunky modes: several ways to scan a web site and then
  rescan it. The higher (later) hammer mode has been specifically written to
  use pavuk as a 'replay attack' based DoS tool for testing high performance
  web servers. (bufio was overhauled to allow us to log all I/O data +
  diagnostics to disc while hammering the server while the pavuk system
  _must_ perform better (= faster) than the web server when running both on
  equivalent hardware.)

* The native Win32 port has been overhauled (previous code was never
  released to the public) to make sure I did not have to look for
  OS-specific path elements _everywhere_ in the code (it was becoming a
  code-wise maintenance nightmare while fixing up/down all those 'absolute
  path' and 'path expansion' code sections to handle Win32 drive letters:
  root is '[A-Z]:[\\/]' instead of simply '/').

  This has been fixed by using the cygwin 'path hack' for the native Win32
  port too: root is '/cygdrive/[a-z]/' so it looks exactly like a UNIX path.

  Any places in the code which need to address the OS while passing an
  OS-specific path are now handled almost invisibly: all relevant C RTL calls
  (fopen/open/stat/lstat/symlink/link/unlink/rename/mkdir/rmdir/opendir) are
  now encapsulated in tl_[sysname] wrapper functions where these
  /cygdrive/[x]/ paths are converted back to native Win32 paths before the
  actual C RTL function is called. Also any debug/print statement, which is
  used to report a file path, is fixed to convert file paths to the native
  representation with a minimum of fuss: see the new tl_native() call for a
  description how this was done. This code has not been tested in a UNIX/MP
  environment, but the design is such that this should not cause any trouble
  (pthread port for Win32 is in progress ATM).
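
  As an illustration of the conversion step only, a minimal sketch could
  look like the code below. The function name cygdrive_to_native() is
  hypothetical; the real tl_native() and tl_[sysname] wrappers may differ in
  signature, buffer handling and in how they treat paths without the
  /cygdrive/ prefix (here they are simply copied unchanged).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical converter: "/cygdrive/c/tmp/x" becomes "c:\tmp\x".
 * Returns 0 on success, -1 when the output buffer is too small. */
static int cygdrive_to_native(const char *path, char *out, size_t size)
{
  static const char prefix[] = "/cygdrive/";
  const size_t plen = sizeof(prefix) - 1;
  int n;

  assert(path != NULL && out != NULL && size > 0);

  if (strncmp(path, prefix, plen) == 0 &&
      isalpha((unsigned char) path[plen]) &&
      (path[plen + 1] == '/' || path[plen + 1] == '\0'))
  {
    const char *rest = path + plen + 1; /* "/tmp/x" or "" */
    char *p;

    n = snprintf(out, size, "%c:%s", path[plen], *rest ? rest : "/");
    if (n < 0 || (size_t) n >= size)
      return -1;

    for (p = out; *p; p++)      /* switch to native separators */
      if (*p == '/')
        *p = '\\';
    return 0;
  }

  /* no /cygdrive/ prefix: pass the path through untouched */
  n = snprintf(out, size, "%s", path);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```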

* added -debug_level modes: all/trace/dev/bufio/cookie/htmlform. Also added
  a feature where you can now specify a set of debug levels and have some of
  those levels _removed_, e.g. 'all,!dev' will show anything _except_ 'dev'
  level debug output: note the new '!' prefix.
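
  A minimal sketch of how such a level list might be turned into a bitmask
  is shown below. The DBG_* bit values, the table and parse_debug_levels()
  are hypothetical; only the level names and the '!' prefix come from the
  entry above. The sketch also assumes the items are applied left to right.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit assignments, for illustration only. */
#define DBG_TRACE    (1u << 0)
#define DBG_DEV      (1u << 1)
#define DBG_BUFIO    (1u << 2)
#define DBG_COOKIE   (1u << 3)
#define DBG_HTMLFORM (1u << 4)
#define DBG_ALL      (DBG_TRACE | DBG_DEV | DBG_BUFIO | DBG_COOKIE | DBG_HTMLFORM)

static const struct { const char *name; unsigned mask; } level_names[] = {
  {"all", DBG_ALL}, {"trace", DBG_TRACE}, {"dev", DBG_DEV},
  {"bufio", DBG_BUFIO}, {"cookie", DBG_COOKIE}, {"htmlform", DBG_HTMLFORM}
};

/* Parse e.g. "all,!dev". Returns 0 on success, -1 on an unknown level. */
static int parse_debug_levels(const char *spec, unsigned *result)
{
  unsigned mask = 0;
  const char *p = spec;

  assert(spec != NULL && result != NULL);

  while (*p)
  {
    const char *end = strchr(p, ',');
    size_t len = end ? (size_t) (end - p) : strlen(p);
    int negate = 0, found = 0;
    unsigned bits = 0;
    size_t i;

    if (len > 0 && *p == '!')   /* '!' removes the level again */
    {
      negate = 1;
      p++;
      len--;
    }
    for (i = 0; i < sizeof(level_names) / sizeof(level_names[0]); i++)
    {
      if (strlen(level_names[i].name) == len &&
          strncmp(level_names[i].name, p, len) == 0)
      {
        bits = level_names[i].mask;
        found = 1;
        break;
      }
    }
    if (!found)
      return -1;

    mask = negate ? (mask & ~bits) : (mask | bits);
    p += len;
    if (*p == ',')
      p++;
  }
  *result = mask;
  return 0;
}
```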

* -debug_level output is now prefixed with its level in caps and square
  brackets, e.g. '[PROCE]' to aid in filtering the debug output (for
  instance by piping it through sed/grep).

* unified debug output handling in the code: -debug_levels are now only
  active when you specify -debug too.

* inflate_decode() and gzip_decode() have been fixed to suit a multithreaded
  environment. gzip_decode() now has an in-memory implementation, using the
  zlib library, for those systems which do not support UNIX pipes/forks.

* Fixed deflate/compress handling: the MJF Accept-Encoding deflate hack has
  been removed and the request header extended. (tested on a Wikipedia
  HTTP/1.1 compliant server)

  You may wish to permanently disable the code within

  in decode.c if you do not wish to depend on the external gzip tool any

* _all_ system header file #include's have been removed from the sources and
  integrated into config.h to allow for better portable source code. and have been extended to include several more OS-
  dependent system call and header file checks.

  A separate native Win32 version of the header file is also provided (used
  by the MSVC2005 native Win32 build).

* several hardcoded buffer sizes in the software have been made configurable
  (but remain hardcoded). See for instance dinfo.c: 12 -->
  PAVUK_INFO_DIRNAME and 1024-and-other-fixed-buf-sizes -->

* fixed several cases where dangling (i.e. free()d but not NULL-ed) pointers
  caused havoc. Code has been quickly reviewed to locate and fix additional
  spots that did not yet cause pavuk to go 'crazy Ivan' (Hunt for the Red
  October, anyone? ;-) )
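
  One common way to avoid this class of bug is to reset the pointer in the
  same statement that frees it. The macro below is a hypothetical sketch of
  that idiom and is not claimed to be what the pavuk sources use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro: free the block and reset the pointer in one step, so
 * a later free() or NULL check on the same variable stays harmless. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Small demonstration: releasing the same variable twice is safe, because
 * the second call ends up as free(NULL), which is a no-op. */
static int release_twice_is_safe(void)
{
  char *buf = malloc(32);

  if (buf == NULL)
    return 0;

  FREE_AND_NULL(buf);
  FREE_AND_NULL(buf);
  assert(buf == NULL);
  return buf == NULL;
}
```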

* hardcoded lock filenames have been converted to #define's to allow these
  to be changed in a single spot (config.h), improving portability. e.g.:

    '._lock' --> PAVUK_LOCK_FILENAME

* UNIX-specific octal privs have been changed to their proper #define's to
  allow for maximum portability (Win32 doesn't know '0644' but can cope with


  though maybe in an odd way).

* fixed quite a few spots where an unidentified form encoding method would
  lead to _very_ unstable behaviour, including crashes/core dumps. Look for

    fi->method = FORM_M_UNKNOWN

  assignments and additional FORM_M_UNKNOWN checks.

* added -no_dns support for those who have to work in an environment with
  flaky or no DNS support (I had to as I was working on a box in a specially
  configured, partially walled-off DMZ zone while developing and testing
  pavuk against a web server.)

* fixed typos in the text as I came across them.

* the bufio overhaul also led to an overhaul of the -dumpxxx code,
  removing/fixing several spots in the code which caused incorrect/unstable
  behaviour. (e.g. code in doc.c)

* Fixed handling of compressed data for any text-based server response;
  pavuk now correctly handles any gzipped/deflated text, including, for
  instance, any 'text/javascript' content sent over the wire in compressed
  form (tested on a Wikipedia-based HTTP/1.1 compliant server).

* added -progress_mode: several choices in progress verbosity.

* added -no_disc_io: test a grab/scan without writing anything to disc.
  Mostly useful in combination with the earlier -hammer modes.

* fixed/updated HTTP error response handling in accordance with RFC2616 so I
  can better see what an HTTP/1.1 compliant target is reporting back to
  pavuk. (errcode.c et al)

* unified timing units to fix a few timing oddities: instead of minutes,
  etc. the code uses seconds everywhere (apart, of course, from the few
  locations where we use milliseconds ;-) )

  -timeout is now in milliseconds!

* Added -rtimeout and -wtimeout command line parameters.
  (unit: milliseconds)
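
  For readers scripting around these options: a millisecond value maps onto
  the struct timeval used by select() and friends as sketched below. The
  helper name is hypothetical, and how pavuk applies its timeouts
  internally is not stated in these notes.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: split a millisecond timeout into seconds and
 * microseconds, as expected by select(). */
static struct timeval ms_to_timeval(long ms)
{
  struct timeval tv;

  assert(ms >= 0);

  tv.tv_sec = ms / 1000;
  tv.tv_usec = (ms % 1000) * 1000;
  return tv;
}
```

  A value of 2500 therefore means 2 seconds plus 500000 microseconds, not
  2500 seconds.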

* added -allow_persistent / -noallow_persistent commandline arguments to
  allow/disallow the use of HTTP/1.1 persistent connections.

* added -dumpcmd and -dumpdir commandline arguments.

* added -bad_content commandline argument for use with the hammer/chunky

* added -report_url_on_err commandline argument: report the URL which was
  processed while the error occurred.

* added -test_id commandline argument: this is included in the timing report
  so reports can be better automatically processed / combined.

* added -page_sfx commandline argument to help pavuk identify what suffixes
  are to be considered web pages (useful for scanning ASP and ASP.NET sites
  which present unusual mime types with their pages).

* added -tlogfile4sum commandline argument: specify a log file where timing
  info is stored. Handy when pavuk is not only used to grab the info off a
  site but also scan & report site performance.

* added -encode commandline parameter as the counterpart of -noencode.

* added -nohtDig, -noquiet and -noverbose commandline parameters as
  counterparts of -htDig, -quiet and -verbose respectively.

* added filepath support to -dumpfd and -dump_urlfd: by specifying the
  option prefixed with a '@' character, pavuk will treat the option value as
  filepath specification instead of an OS file handle and subsequently open
  the specific file internally. Note that adding yet another '@' character
  as a prefix signals pavuk to _append_ to the specified file, instead of
  _overwriting_ it.

  This is useful when you wish to have those dumps but are working in an
  environment where you cannot pass valid file handles through the

* added -dump_request and -nodump_request commandline arguments for use
  with -dumpfd: when -dump_request is specified, the log file will include
  a complete dump of each request sent to the server by pavuk. Thus you can
  produce a complete audit trail of the exchange.

* replaced the DUMP_URLLIST macros in stats.c by two functions. Code is a
  bit cleaner that way.

* fixed times.c which barfed on timestamps beyond 2037 (signed int wrap
  around for time_t).

* added assert() checks at several locations in the code to help track down
  unexpected behaviour which could lead to crashes (like it did till now).

* unified the proliferation of HEX2ASC-alike macros with and without off-by-
  one offsets inside. Now there's one macro for each of 'em in tools.h.

* changed the option to --disable-threads to keep the pattern
  consistent (--disable-xxx series of options in configure), but the default
  behaviour remains the same.

* as --disable-debug removes any debug-_related_ features from
  the pavuk build, these options have been added: --disable-debugging will
  create a default build with all debugging removed from the compiled
  binaries. --disable-prof and --disable-gprof have been added to remove any
  profile info from the default compiled binaries.

* added checks in for socklen_t, pid_t and a bunch of system
  calls and header files that do not live in each environment.


* included pthreads-Win32 based multithreading support in the native Win32

* included EXPERIMENTAL tre (regex) support in the native Win32 build.

* fixed several lurking bugs (buffer overruns, etc.) which only showed in a
  multithreaded environment.

* fixed locking bugs in the new bufio implementation.

* added Win32 memory leak + heap checking for the DEBUG build: many memory
  leaks have been tracked and fixed. (MSVC <crtdbg.h> based)

* fixed memory leak due to wrong scope in report_error() code.

* added DBGxxx macros to aid heap tracking for the debug build. See
  DBGdecl/DBGpass/DBGvars usage.

* removed a very nasty memleak in html_parser_get_url() which would leak at
  least 3 blocks for each rejected local anchor URL - and those come quite a
  few! Took me a day to track it down. :-(

* added filtering so gzipped/compressed files on the server are not
  decompressed unintentionally while the server supports Accept-
  Encoding:gzip or compress.

  ( doc_download_helper() in doc.c )


* renamed function should_leave_persistent() to the more appropriately named

* Updated 'chunky' source to the state of the latest pavuk CVS contents (as
  of today) as this code has not yet been merged into CVS itself.

* fixed bugs in -scenario handling, when scenario files produced by pavuk are
  re-used in the Win32 environment

* fixed bugs in path & file type commandline arguments for the native Win32

* fixed bug in retrying/resuming download for RFC2616 (HTTP/1.1) 'chunked'
  content download handling.

* merged -allow_persistent / -noallow_persistent commandline arguments with
  the equivalent -persistent/-nopersistent feature from the official pavuk
  CVS sources.

  Also improved the code a bit: added the 'Connection: close' header for
  requests over -nopersistent connections, so the server will close the
  connection for us.
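
  In terms of the request text this amounts to one extra header line. The
  sketch below builds a bare-bones HTTP/1.1 GET request either way;
  build_get_request() and its parameters are hypothetical and greatly
  simplified compared to a real request built by pavuk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request builder: adds "Connection: close" only when the
 * connection is not meant to be persistent.
 * Returns the request length, or -1 when the buffer is too small. */
static int build_get_request(char *buf, size_t size, const char *host,
                             const char *path, int persistent)
{
  int n;

  assert(buf != NULL && size > 0 && host != NULL && path != NULL);

  n = snprintf(buf, size,
               "GET %s HTTP/1.1\r\n"
               "Host: %s\r\n"
               "%s"
               "\r\n",
               path, host, persistent ? "" : "Connection: close\r\n");
  if (n < 0 || (size_t) n >= size)
    return -1;
  return n;
}
```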

* added the -ignore_chunk_bug commandline argument to allow pavuk to handle
  RFC2616 'chunked' downloads from buggy (IIS) web servers.

  ( See also:


* recompiled in 64-bit Linux (SuSe 10.2) and fixed a few items in the, and files. Also added the tests\
  and www\ directories to the distro.

* fixed a few 64-bit compile warnings; at least the test cases in tests\
  perform OK now on a 64-bit Linux system.

* updated the man page a bit; still a lot more to do. Where is that 'nroff
  for dummies' cheatsheet when you need it?  ;-(

* listed -use_http11 as 'on' by default now.

* moved MODE_MIRROR unescape code section up in url.c to line 1682 in
  url_get_local_name_real() as this code would otherwise have no effect at
  all in any environment where the '%' percent character is included in the
  FS_UNSAFE_CHARACTERS charset (for example: Win32).

* PARAM_DOUBLE default values are now fixed point values in 'long' integer
  format; the current values in the program (all 0.0) are clearly within
  range _and_ it 'saves' on compiler warnings quite a bit. (We've still some
  way to go before we get anywhere near an '[almost-]zero-warning'
  cross-platform portable build: a few int to pointer and vice versa casts
  remain.)

* fixed bug in cfg_get_num_params() which would access uninitialized memory
  out there in NirvanaLand when a PARAM_UNSUPPORTED option was passed to

* Fixed to include 'debug' build handling for KDevelop (which
  would pass '--enable-debug=full' to ./configure).

* updated the script to increase portability (opendir/closedir:
  dirent.h et al)

* included a few autoconf macros in the m4 directory for easier/proper
  portability support using autoconf et al.

* bugs fixed from BUGS list: multithreaded mode is not as stable as single
  threaded (fixed at least for the CLI version of pavuk; the GTK GUI version
  is in a rather bad shape)

* bugs fixed from BUGS list: signal handling / timeout does not really work
  (at least not in multithreaded downloads; after a SIGINT pavuk just
  hangs). This has also been fixed for the CLI version of pavuk at least.

* Win32 port now includes JavaScript support (using the statically linked
  Mozilla js library).

* fixed short option definitions in options.h: -tp / -tsp et al

* 'fixed' GUI for Javascript enabled builds (GTK2) - WARNING: it compiles
  now, but has NOT been tested, so expect bugs here!

* merged the 'chunky' code with the pavuk main source tree. Now 'chunky' is
  equivalent to building pavuk with './configure --enable-hammer'.

* set default from -leave_site to -dont_leave_site to prevent 'blown up' web
  crawls when this filter parameter has not been specified.

  This change includes a fix for the cfg/command line handling of pavuk for
  the conditions section (see condition.h + config.c) as pavuk assumed
  sizeof(long)==sizeof(int) in these code sections.

* Now the proper GPL license (GPL, not LGPL) is included in the file


* fixed processing of zero byte length files (robot.txt at,
  etc.): no more crash/assertion failure due to NULLed docu->contents.

* fixed a few memleaks.

* added extra error checking for file rename operations as some issues were
  found with the Win32 build when using a SAMBA-shared filesystem for
  storing the spidered data/files. (It turned out that the same issues
  existed when using native (NTFS, FAT32) filesystems.)

* dialed down the number of default threads from 3 to 1 (see BUGS) to
  prevent a hail of (legitimate) rename error reports.

* added flock() implementation for Win32: when built with multithreading
  support, having no valid flock() implementation is very dangerous!

* changed to detect both flock() and fcntl() file locking
  mechanisms so pavuk will be able to support writing spidered content to
  network shares on both Win32 and UNIX systems: flock() does not support
  network shares locks, fcntl() does, at least on the latest Linux kernels,
  see man flock(2)
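
  For readers porting this: a minimal sketch of the fcntl() style of
  advisory locking referred to above. This is NOT the pavuk code; the
  helper name lock_whole_file() is made up for illustration and only the
  standard POSIX fcntl(F_SETLK) call is assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take or release an advisory lock on a whole file.
 * 'type' is F_RDLCK, F_WRLCK or F_UNLCK.
 * Returns 0 on success, -1 on failure (errno is set by fcntl()). */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* offsets are relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means: up to end of file */

    /* F_SETLK fails at once instead of waiting when the lock is taken */
    return fcntl(fd, F_SETLK, &fl);
}
```

  A caller would lock with F_WRLCK before writing a spidered file and
  release with F_UNLCK afterwards; F_SETLKW is the blocking variant.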

* added error reporting/checking for undesirable use of invalid flock()
  implementation. (Useful when porting pavuk to other non-Unix platforms.)

* Fixed content/file size treatment code for items which are already
  available locally (i.e. pavuk finds the item at the remote has not changed
  from when the last time it fetched the item into local cache).

* Fixed the conditions for when to display certain informational messages:
  less screen clutter when not running in '-verbose' mode OR when running in
  '-progress' modes.

* Fixed several error/info messages in the code section for decompressing
  gzip/compress transmitted HTTP content.

* Fixed handling of gzip/compress transmitted content when retrieved from
  local store instead (when pavuk discovers that the file at the remote site
  has not changed since the last time it was fetched and stored on your
  local disc).

* Fixed a few memleaks.

* Changed the DBGvars/DBGpass/DBGargs macros used for tracing memory
  allocations in debug mode to make these macros look more like regular 'C'
  functions to 'demented' code formatters and analysis tools. The drawback
  is that these still look 'weird' in function prototypes, but that causes
  quite a few less errors/warnings than the old style.

* Fixed bugs in get_abs_file_path() directory detection and Win32 abs path

  Also fixed code which produced double slashes in file paths on occasion,
  causing trouble on Win32 platforms. (Fix applied generally.)

* Fixed mk_native() allocated string management pool to support printf() et
  al where up to 3 mk_native() calls are made in the argument list. This is
  important to prevent spurious crashes in multithreaded mode when the worst
  case scenario for mk_native() applies: all threads are executing a
  printf()-style statement which has multiple calls to mk_native() in the
  argument

  Currently overdimensioned a bit as the actual code only has two
  simultaneous calls while the pool now is dimensioned to tolerate 3
  simultaneous calls per thread.
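
  The idea of such a pool can be sketched as a small per-thread ring of
  buffers. This is an illustration only, not the real mk_native(): the
  function to_native_demo(), the slot count and the slot size are all
  assumptions, and C11 _Thread_local is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 3     /* results that may be alive at once, per thread */
#define SLOT_SIZE  256   /* arbitrary buffer size for this sketch */

/* Hypothetical stand-in for mk_native(): converts '/' to '\\' and returns
 * a pointer into a per-thread ring of buffers. A result stays valid until
 * POOL_SLOTS further calls have been made by the same thread. */
static const char *to_native_demo(const char *path)
{
    static _Thread_local char pool[POOL_SLOTS][SLOT_SIZE];
    static _Thread_local unsigned next_slot;
    char *out;
    size_t i;

    assert(path != NULL);
    out = pool[next_slot];
    next_slot = (next_slot + 1) % POOL_SLOTS;

    /* copy with conversion; overlong input is truncated in this sketch */
    for (i = 0; path[i] != '\0' && i < SLOT_SIZE - 1; i++)
        out[i] = (path[i] == '/') ? '\\' : path[i];
    out[i] = '\0';
    return out;
}
```

  With three slots, a statement like printf("%s %s %s", f(a), f(b), f(c))
  sees three distinct buffers; a fourth call would overwrite the first.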

* No more _strfindnchr() and strfindnchr(): strfindnchr() - and its use -
  has now been fixed to match the (proper working) _strfindnchr().
  [fnmatch.c/tools.c et al]

* Fixed const-correctness of several functions.

* Added '-mime_type_file' commandline option to help pavuk support an up-to-
  date list of mime types and their filename extensions, using, for example,
  the UNIX mime.types(5) config file as a source of MIME type information.

  If the user does not specify the '-mime_type_file' option, the original
  built-in defaults will be used instead.

  This feature has been added to provide better support for the pavuk
  -fnrules %M macro: this macro now will use this configuration to produce a
  suitable filename extension for each MIME type: the first extension listed
  in the '-mime_type_file' config file for the given MIME type will be used
  as extension for the %M macro.

* Changed the GTK GUI macros to become functions for ease of debugging. The
  added (tiny) call overhead won't be a performance hit anyway.

* Fixed -fnrules handling: the generated path is cleaned up before it is
  returned to pavuk for use.

  Cleanup actions:
  - duplicate '/' slashes are removed
  - filenames and directory names which end in a '.' dot, get the dot

* Added '%X' to the -fnrules formatted processing to allow reformatting of
  filenames using an optional mimetype-derived extension. This is useful
  when grabbing Wiki (MediaWiki et al) sites when you'd like to store the
  grabbed content using default mimetype-related filename extensions, so
  instead of storing a file like


  that would transform into


  while pages like


  would remain as is.

  (Note: this might be considered shorthand for a -fnrules (...) expression
   which compares both %e and %E. The intent of %X, however, is to only
   allow %e extensions to pass which are 'valid' for the given MIME type and
   force the %E mimetype based extension for all other cases.)

  CAVEAT: %e/%E/%X/%Y will print the extension WITHOUT the leading '.' dot in
          both simple mode and extended LISP mode.

* Added '%Y', '%A' and '%B' to the -fnrules macros: '%Y' uses the MIME type
  preferred filename extension if the URL/filename doesn't have an extension
  yet (while the rather similar '%X' will OVERRIDE the existing extension if
  it is not listed with the specified MIME type).

  '%B' prints the 'basic MIME type', i.e. the MIME type without the ';'
  semicolon separated MIME attributes such as language, etc., while '%A' will
  print these extensions (if they were passed to us by the server).

  CAVEAT: %e/%E/%X/%Y will print the extension WITHOUT the leading '.' dot in
          both simple mode and extended LISP mode.

  All this allows for pavuk -fnrules commandline arguments like this:

    -fnrules F '*' '%h:%r/%d/%b%s.%Y'
    -mime_types_file ./mime.types
    -tr_chr_chr ':\\!&=?' '_'

  so we'll be able to grab a [Media]Wiki site while storing those pages as
  regular 'abc_php_xyz.html', instead of 'abc.php?xyz' page/filenames.
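
  The difference between '%X' and '%Y' as described above can be sketched
  like this. It is a reading of the description, not the pavuk source: the
  function names and the NULL-terminated extension list (first entry = the
  preferred extension for the MIME type) are assumptions, as is the
  case-sensitive comparison.

```c
#include <string.h>

/* Is 'ext' one of the extensions listed for the MIME type? */
static int ext_listed(const char *ext, const char *const *mime_exts)
{
    for (; *mime_exts != NULL; mime_exts++)
        if (strcmp(ext, *mime_exts) == 0)
            return 1;
    return 0;
}

/* '%X'-like: keep the URL extension only when it is valid for the MIME
 * type, otherwise force the preferred (first listed) extension. */
static const char *expand_X(const char *url_ext, const char *const *mime_exts)
{
    if (mime_exts[0] == NULL)
        return url_ext;                /* nothing known: leave as is */
    if (url_ext[0] != '\0' && ext_listed(url_ext, mime_exts))
        return url_ext;
    return mime_exts[0];
}

/* '%Y'-like: only supply an extension when the URL has none yet. */
static const char *expand_Y(const char *url_ext, const char *const *mime_exts)
{
    if (url_ext[0] == '\0' && mime_exts[0] != NULL)
        return mime_exts[0];
    return url_ext;
}
```

  Like the macros themselves, both helpers return the extension without
  the leading '.' dot.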

* Added -fnrules 'fnseq' operator to the extended rules: compares a
  wildcard pattern and a string a la fnmatch(3).

* Checked and updated manpage for the -fnrules operators (added 'ud' and
  'sp' operators to the manpage).

* Added -fnrules 'sn' operator to the extended rules as counterpart of 'ns'.
  'sn' uses strtol() to convert a string to a number, while 'ns' uses
  printf() to format a number to a string. (See the man page.)

* Updated the man page a bit regarding '-fnrules'.

* sanitized escape_str(); a quick code review led us to a lurking bug in
  uconfig.c@309, which has been fixed implicitly.

* Added/updated source code documentation: tools.c/tr.c source code comments.

* Added some sanity checks in the code (tools.c/tr.c/lfname.c)

* Added debug_level 'rules' to allow debugging of both simple and 'extended'
  -fnrules expressions and '-fnrules' URL F/R matching.

* Different boxes exhibit different mktime() behaviour, especially when
  handling out of range tm value sets. Besides, mktime() works in 'local
  time' while some parts of the code require a robust UTC mkgmtime() (not
  available on many boxes) --> ripped & introduced as tl_mkgmtime(). A local
  time-aware equivalent with excellent out-of-range handling is available as

* Added additional error handling around calls which try to parse time
  stamps using tl_mkgmtime() and tl_mktime() (times.c).

  Basically, now both HTTP and FTP benefit from the new code which should
  now process timestamps like the UTC timestamps they are, while 'out of UNIX
  time_t bounds' timestamps (beyond the range 1970..2038 A.D.) are handled
  in a more sane manner:

  - out of bounds timestamps are reported by pavuk

  - out of bounds timestamps are then 'sanitized', i.e. restricted to the
    1/1/1970..31/12/2037 date range, i.e. a timestamp beyond the horizon,
    like '1/4/2051' will be 'sanitized' (= restricted) to the upper bound:
    31/12/2037. The same goes for timestamps from antiquity like '11/3/1969'
    (the birthday of a certain person), which will be 'sanitized' towards
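
  A rough sketch of a UTC conversion with this kind of clamping follows.
  It is NOT tl_mkgmtime(): the names days_from_civil() and
  utc_seconds_clamped() are invented here, the day count uses the well
  known civil-date arithmetic, and tm_mon is assumed to be in range 0..11
  (out-of-range days, hours, minutes and seconds simply carry over).

```c
#include <stdint.h>
#include <time.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (m = 1..12). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    int64_t era, yoe, doy, doe;

    y -= (m <= 2);                       /* treat Jan/Feb as months 11/12 */
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;                                   /* 0..399 */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* 0..365 */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* 0..146096 */
    return era * 146097 + doe - 719468;
}

/* Interpret 'tm' as UTC and restrict the result to
 * 1970-01-01 00:00:00 .. 2037-12-31 23:59:59.
 * *was_clamped is set to 1 when the input was out of bounds. */
static int64_t utc_seconds_clamped(const struct tm *tm, int *was_clamped)
{
    const int64_t lower = 0;
    const int64_t upper = days_from_civil(2038, 1, 1) * 86400 - 1;
    int64_t secs = days_from_civil(tm->tm_year + 1900,
                                   tm->tm_mon + 1, tm->tm_mday) * 86400
                 + (int64_t)tm->tm_hour * 3600
                 + (int64_t)tm->tm_min * 60
                 + tm->tm_sec;

    *was_clamped = (secs < lower || secs > upper);
    if (secs < lower)
        return lower;
    if (secs > upper)
        return upper;
    return secs;
}
```

  The caller can use *was_clamped to report the out of bounds timestamp
  before continuing with the restricted value.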

* Split up DEBUG into developer related stuff, such as memory/heap checking,
  ASSERT/VERIFY, etc. and user related stuff (the -debug and -debug_level
  command line arguments): ./configure is now fitted with an extra


  which will turn on/off -debug/-debug_level user level debugging support in
  pavuk, while the existing


  adds/removes additional developer checks, such as heap allocated checks
  and ASSERT and VERIFY macros.

  In the code, -debug/-debug_level related code is located within the
  'HAVE_DEBUG_FEATURES' sections, while the developer debug/release builds
  are still related to the standard 'DEBUG' #define.

  This now results in three ./configure options that determine the (debug)
  feature set of your binary:

  --enable/disable-debugging --> compile a binary with source level debug
                                 info included and all optimizations
                                 DISabled for improved debugging (by using
                                 gdb or another debugger of your choice)

  --enable/disable-debug     --> include/exclude additional run time checks
                                 in your binary. Most important are the
                                 ASSERT and VERIFY pre/post-condition
                                 validation methods located throughout the
                                 code. The use of these is advised, though
                                 these may cause a performance hit.

                             --> include/exclude user level -debug/
                                 -debug_level command line features, which
                                 help you as a pavuk user to 'debug' pavuk
                                 during the run. Using -debug, pavuk will be
                                 EXTREMELY verbose, which can be toned down
                                 by applying a -debug_level restriction
                                 filter. For example:

                                   -debug -debug_level all,!devel

                                 will be VERY verbose, but will NOT log any
                                 DEVEL level debug info, while:

                                   -debug -debug_level !all,rules

                                 will ONLY produce additional output for the
                                 RULES level, i.e. when pavuk processes
                                 -fnrules and/or JavaScript macros.
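
  How a flag list such as 'all,!devel' or '!all,rules' can be evaluated is
  sketched below. This only illustrates the left-to-right set/clear
  semantics shown in the examples above; the bit values, the table and the
  function parse_debug_levels() are assumptions, not the pavuk parser.

```c
#include <stddef.h>
#include <string.h>

enum {
    DBG_DEVEL = 1 << 0,
    DBG_RULES = 1 << 1,
    DBG_NET   = 1 << 2,
    DBG_ALL   = DBG_DEVEL | DBG_RULES | DBG_NET
};

struct level_name { const char *name; unsigned bits; };

static const struct level_name level_names[] = {
    { "all", DBG_ALL }, { "devel", DBG_DEVEL },
    { "rules", DBG_RULES }, { "net", DBG_NET }
};

/* Apply a comma separated list of level names to *mask, left to right.
 * A leading '!' clears the named bits instead of setting them.
 * Returns 0 on success, -1 when a name is not recognised. */
static int parse_debug_levels(const char *spec, unsigned *mask)
{
    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");
        const char *tok = spec;
        size_t toklen = len, i;
        int negate = 0, found = 0;

        if (toklen > 0 && tok[0] == '!') {
            negate = 1;
            tok++;
            toklen--;
        }
        for (i = 0; i < sizeof(level_names) / sizeof(level_names[0]); i++) {
            const char *name = level_names[i].name;
            if (strlen(name) == toklen && strncmp(name, tok, toklen) == 0) {
                if (negate)
                    *mask &= ~level_names[i].bits;
                else
                    *mask |= level_names[i].bits;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return 0;
}
```

  Because '-debug' alone implies 'all', the second example has to start
  with '!all' to clear everything before enabling 'rules'.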

* Fixed crash when non-RFC compliant website was grabbed: see testcase 7a.

* Added targeted help: when options cannot be parsed correctly,
  short_usage() will try to help the user by printing the full help for the
  offending commandline option only. (Of course, I screwed up while using
  debug_level flag sets _again_ :-( [Ger])

* Some improvements for network connectivity error handling and reporting.
  (xvherror() added.) This is the result of some FTP tests with pavuk (tests

* Don't yak about 'Checking "robots.txt"' anymore when doing a FTP grab when
  robots.txt is NOT applicable anyway.

* FTP: added crude 'autodetect/retry' mechanism for FTP servers which do not
  like NLST (==> response code 550) but report correct directory content for
  LIST (or vice versa). (ftp.c)

* FTP/HTTP: at debug level 'protoD' pavuk will now dump RAW data/content
  received from the server before preprocessing (i.e. converting to HTML or

* Added command line option integer sizing support: byte sizes can now be
  specified in K, M or G. Other integer values can also be postfixed with K,
  M or G, but then these will be treated like the ISO values 1000, 1E6 and

* Additional memory leak fixes in case pavuk is fed an invalid commandline.

* NTLM support code: fixed a few glaring bugs.

* Added O_SHORT_LIVED to lock file open() flags for better Win32 behaviour.

* Fixed code to load the pavuk configuration settings from, in order of


  which matches the description in the manual.
  (see also man page)


* Added 'js' flag to '-debug_level', which is used to dump a lot of detail
  about the pattern matching and transformation applied to JavaScript code
  using the '-js_pattern' and '-js_transform / -js_transform2' commandline

* Added sanity check for '-js_pattern' and '-js_transform[2]' regexes, which
  MUST contain a subexpression for them to 'work' as expected.

* removed re_pmatch_sub() and changed the code where it was used to work
  with the available re_pmatch_subs() call, which allows for more elaborate
  validation anyway. See htmlparser.c.

* Removed a regex handling bug in the -js_transform[2] code, which would
  crash pavuk when using regexes where the first subexpression might be

  The crash is due to the fact that the regex parser would return indexes
  '-1' for these empty subexpression(s), resulting in out-of-bounds memory
  writes in the rewrite code. This in turn would nuke the heap, so after
  that it was only a matter of time for pavuk to fail dramatically.
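
  The '-1' index case is easy to reproduce with plain POSIX regexec(). The
  sketch below is not the pavuk rewrite code; copy_submatch() is a made-up
  helper which shows the check that must precede any use of the offsets.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 4

/* Copy subexpression 'idx' of the first match of 'pattern' in 'text'.
 * Returns the copied length, -1 when there is no match (or on error) and
 * -2 when the pattern matched but the subexpression took no part in it. */
static int copy_submatch(const char *pattern, const char *text,
                         size_t idx, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int result = -1;

    if (idx >= MAX_GROUPS || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_GROUPS, m, 0) == 0) {
        if (m[idx].rm_so == -1) {
            /* unused subexpression: offsets are -1, do NOT index with them */
            out[0] = '\0';
            result = -2;
        } else {
            size_t len = (size_t)(m[idx].rm_eo - m[idx].rm_so);
            if (len >= outsz)
                len = outsz - 1;
            memcpy(out, text + m[idx].rm_so, len);
            out[len] = '\0';
            result = (int)len;
        }
    }
    regfree(&re);
    return result;
}
```

  For the pattern '(a)?b' and the input 'b' the match succeeds while the
  first subexpression is unused, which is exactly the situation above.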

2008 feb 04

* Added DEBUG_MISC() lines to solve issue: [ 1852885 ] to
  improve manipulation by locally stored files

* Included provisional fix (I don't have a working sample run to reproduce
  the issue (yet)) for issue: [ 1852884 ] infinite loop on
  unexpected responses

* Cleaned up the mess that was -progress_mode.

* Cleaned up several DEBUG_xxx macro mistakes

* Added a little description to the 'hidden' -htDig commandline option,
  which can be used to dump the server-transmitted MIME headers for each
  URL, similar to the htdig tool.

* Added a bit of documentation for the -rollback option (which was

2008 mar 20

* GNU gettext tools don't like '\r' in i18n strings --> fixed by changing
  the related printf() statements in src/doc.c

* started update of configure scripts to the latest autoconf/automake.

  Also reordered the NEWS file so it will work with the new, stricter

    ./bootstrap && ./configure && make distcheck

  distro test cycle.

2008 jul 10

* fixed ';' semicolon bug in http.c near line 2074 which caused incorrect
  decoding of the HTTP/1.x response code header.

* fixed gzip/compress/... content compression support (HTTP/1.1
  Accept-Encoding); the previous code was a valiant attempt to 'fix' the
  client side (pavuk) to cope with buggy web servers which send the wrong
  encoding type for already compressed files, but this would screw up
  particular responses by *well-behaving* web servers. Of course this would
  only happen in rare circumstances so it was kinda hard to track down.

  Documentation for -Enc/-noEnc has been updated to reflect this situation
  and the code now (hopefully properly) finally supports compressed data
  transmission for RFC2616-compliant web servers.

  If you find that your 'downloaded' compressed files are already
  /incorrectly/ DEcompressed by pavuk, this is NOT the fault of the client
  (pavuk) but evidence that your server is behaving inappropriately and the
  proper remedy for this is the use of the option '-noEnc' which turns this
  feature off so the server is not allowed to screw up in this way any more.

  Also made sure one can check if pavuk has been built with compression
  support by calling 'pavuk --version' and looking at the feature list.

* autoconf/configure script: using the highly undocumented v_cflags or other
  x_* variables as environment variables to hack the configure script (you
  could do that, especially with v_cflags) has been obsoleted while the
  configure and m4/* scripts have been upgraded to support autoconf
  2.62/automake 1.10 and use ONLY *documented* AC.*/etc. macros from now on.

  Note: thanks to the JavaScript library issues on SuSe10.2/AMD64 (older JS
        lib version and seemingly partial header install), I may have failed
        to eradicate all undocumented macros.

* Extra note: bash, at least on SuSe10.2/64-bit, handles
  'if eval test ...' just ever so slightly differently than 'if test ...',
  especially when it comes to 'test -n'. As these styles were mixed rather
  arbitrarily before, the 'if eval test ...' style has been completely
  removed from the configure script, as this would sometimes render quite
  unexpected (and incorrect!) results.

* has been updated to ensure important Microsoft Visual Studio
  files are not damaged by having their CRLF sequences converted to UNIX LF
  line endings: this kind of thing will make MSVC spit you in the face and
  reject everything you try until you give it back those CRLF line endings
  in there. So much for XML as project file format and MSVC...

* extra fixes to ensure 'make distcheck' does not barf up a hairball. This
  includes enforcing the permanent inclusion of the 'po' subdirectory in the
  Makefile set for multilingual support.

* configure/Makefile(s): if you don't have one or more of the
  archiving/compression tools compress/lzma/gzip/tar/7z(7zip) installed on
  your system, we don't go belly up at config ~ nor at 'make dist' time
  anymore. This, of course, includes correct behaviour at 'make distcheck'
  time: only use/test those 'GNU standard' formats, which can be created on
  your box.

* Added the 'bootstrap' shell script, next to ''. I know they
  serve the (almost) same purpose, but 'bootstrap' is far more sophisticated
  than and I didn't wish to overwrite ''. Besides, IDEs
  on UNIX boxen expect either the one or the other (there's no single
  'standard' for this), so we might as well provide both.

  At a later time, we might probably point to bootstrap.

* Updated the mime.types MIME 'hint' file: currently, it's a mix of

  1) all properly registered MIME types ( )

  2) the mime.types file provided with the latest Apache/XAMPP

  3) my (Ger Hobbelt) additional file extension hints as used on my own
     servers. This is mostly about professional graphics ~ and modern
     'scene' audio/video container formats, such as Matroska. This only adds
     extensions for otherwise already existing MIME types.

* Updated the DocBook-based documentation for several options (-Enc/-noEnc, ...)

* 'pavuk --version' now also reports if ZLIB support is included in the
  binary. This is important for '-Enc'.

* Fixed the '-Enc' compressed transmission and HTTP header processing code
  to act properly with fully RFC2616-compliant web servers, discarding the
  old 'hack/fix' attempt to solve a non-compliant server issue at the
  client, as this would break things for fully compliant servers in the rare
  (but extremely annoying) use case:

  - pavuk with '-Enc' option

  - webserver is fully RFC2616 compliant

  - pavuk issues request for file in a .tar.Z or other gzip/compress
    compressed format, where the file on the server is only slightly
    compressed (fastest compression).

  - webserver will transmit file to pavuk, but due to pavuk reporting it is
    able to handle compressed transmission AND the server discovering that
    the content can be compressed quite some more than it already was, the
    file will be transmitted after a server-side just-in-time compression

  - pavuk receives the data. The old hacked code would NOT decompress the
    data. However it SHOULD because the server PROPERLY reported
    'Content-Encoding: gzip' to pavuk. End result: grabbed data which you
    cannot process nor trust to be in the same format as stored on the
    server as it all 'depends' on arbitrary conditions which you cannot
    control: is the web server able to compress the data before
    transmission? Is the web server configured to allow compression? Etc.

  This use case has now been fixed.

  The effect of BADLY behaving web servers (which send 'Content-Encoding:
  gzip' for any .Z, .z or .gz files: IIS x.x and other servers which are not
  configured to /properly/ handle files and MIME types) is described in the
  DocBook manual page now, including the fix for this (specify the '-noEnc'
  commandline with pavuk).

* active FTP: timeout and stop/break handling slightly improved: now pavuk
  should always terminate under all circumstances while a break or stop has
  been signalled.

* Changed the default for '-url_strategy' from 'level' to 'leveli' to make
  pavuk behave more like your regular web browser (with a user clicking
  through web pages).

* Initial fix for NTLM support for 64-bit Windows. (Only lightly tested.)

  This includes converting that bit of code to support the C99 intNN_t types
  (where NN e {8,16,32}), while the configure script takes care of
  providing the proper types for not-fully-C99-compliant environments.

* The TRE regex package would barf up a hairball due to the incorrect header
  file being loaded. ./configure now recognizes TRE specifics a bit better
  and the code now loads the proper header file (<tre/regex.h> instead of
  <regex.h>). This is important on systems which have multiple, ever so
  slightly incompatible regex processing libraries installed.

* Improved diagnostics a little bit by adding reporting support for
  URL_PARENT_REWRITING, i.e. the situation where a parent page of a grabbed
  page is loaded for the sake of adjusting (rewriting) the URLs in its

* Fixed code so it would compile in full (-DDEBUG) debug mode on UNIX.

* autoconf/configure: ran into some weird issues due to inconsistent M4 []
  quoting: quite a few lines did without it. Turns out that this is a BIG
  No!No! as adding the AX_ADD_OPTION() macro turned this lurking mess into a
  true disaster.

  Fixed by applying [] quoting throughout. The only place where I didn't do
  it, is in the first and second args of AC_DEFINE() -- which should be used
  instead of AC_DEFINE_UNQUOTED when you don't need the latter's extra
  functionality anyway -- and the first arg of AC_DEFINE_UNQUOTED(). Any
  other spot where [] quotes are missing in the M4 macros and/or Consider that a bug and please report so I can fix it.

* Finally got the configure system to recognize my JavaScript libraries and
  all. Tugged and tweaked a few items in the bindings to allow maximum
  flexibility for the JS code when it is used to filter URLs (e.g.
  JavaScript pavuk_url_cond_check() function).

* Updated jsbind.c to use latest SpiderMonkey 1.8.x (tested on Win32)

* Changed man/Makefile to ensure HTML is not recreated every 'make' run, but
  only when manpage changes. This should really copy the results from
  ./doc/, but that's for later...

* DocBook documentation: tweaked man page generation to mimic original
  manpage title exactly.

* DocBook documentation: updated '-version' info (important to see at
  run-time what abilities you've got with /your/ pavuk).

* Win32/MSVC: all project files have been updated to produce next to
  Win32/x86: Win64/AMD64 and Win64/Itanium binaries. These project files
  assume the existence of all optional libraries: OpenSSL, SpiderMonkey
  (JavaScript), zlib.

  Where to get those, preferred directory layout, etc. to be published, so
  others can build from source on Win32/64 too and get the same results.

2008 jul 20

* tweaked configure+makefiles so that a 'make dist' from CVS becomes
  possible: there were quite a few references to yet unpublishable files in
  my makefiles (Ger Hobbelt).

* config section: improved adherence to C standards: no more potentially
  dangerous mixed use of function and data pointers by typecasting function
  pointers into data pointers and vice versa.

  This has been resolved by an added layer of indirection, which makes it
  all very legal C again. It goes somewhat like this:

    function_pointer_type ptr = &function;
    data_pointer_type d = &ptr;

  then use (d[0])(...) to call the function.
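
  Spelled out as a compilable sketch (all names below are invented for
  illustration; this is not the pavuk config table): the data pointer
  refers to an object which holds the function pointer, so no function
  pointer is ever cast to a data pointer or back.

```c
typedef int (*handler_fn)(int);

static int twice(int x)
{
    return 2 * x;
}

/* The function pointer lives in an ordinary object ... */
static handler_fn twice_ptr = twice;

/* ... so a generic table entry can refer to it through a data pointer. */
struct option_entry {
    const char *name;
    void *data;          /* points at a handler_fn object, not at code */
};

static const struct option_entry demo_entry = { "twice", &twice_ptr };

static int call_entry(const struct option_entry *e, int arg)
{
    /* data pointer -> pointer to function pointer -> function call */
    handler_fn *slot = (handler_fn *)e->data;
    return (*slot)(arg);
}
```

  The cost is one extra pointer object per function and one extra
  dereference per call.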

  This contrasts the old code:

    data_pointer_type d = (data_pointer_type)&function;

  and function invocation using:


* Added support for parsing 'hidden' CSS and JavaScript in HTML. The support
  is also extended to generally parse inside HTML comments PLUS Microsoft IE
  CC's (Conditional Comments): <!--[if...]><![endif]-->


  These are all enabled by default; documentation has been updated for these
  as well.

* Fixed CSS and [Java]Script handling in the HTML tokenizer/parser, which
  was feeding the filters and URL extractors (htmlparser.c).

  Now the code can cope better with incorrectly formatted pages / files.

* Reordered the HTML tags in htmltags.c in a preparatory move to check the
  list for missing attributes (onXXX JavaScript items for one! several are
  missing) and HTML 3/4 tags. (htmltags.c)

2008 aug 13

* updated the -debug_level related code; DEBUG_DEVEL() and a few others now
  'automagically' report the sourcefile+lineno without the need to specify
  these explicitly + some DEVEL_*() calls have been shifted to other
  '-debug_level' levels (net, mtthr, htmlform, ...)

* completed the -debug_level tracing for multithreaded runs: now all
  semaphore accesses can be traced using the -debug_level mtthr

* Major fix for bufio+socket code: no more lockup for pavuk due to delayed
  reception of response data (tl_selectr() would incorrectly lock
  indefinitely -- which proved to be a generic coding mistake in both
  tl_selectr() and tl_selectw()) PLUS better error condition handling in
  an attempt to improve handling of all sorts of 'spurious error conditions'
  which may occur when your network suffers from packet loss or other
  undesirable effects.
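
  For reference, a select() based wait which cannot block forever might
  look like the sketch below. It is not the tl_selectr() code; the names
  wait_readable() and read_with_timeout() are made up, and a POSIX system
  with clock_gettime(CLOCK_MONOTONIC) is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until 'fd' is readable. Returns 1 when readable, 0 on timeout and
 * -1 on error. An interrupted select() is retried with the REMAINING
 * time only, so the total wait stays bounded by timeout_ms. */
static int wait_readable(int fd, long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    for (;;) {
        long long left = deadline - now_ms();
        struct timeval tv;
        fd_set rfds;
        int rc;

        if (left < 0)
            left = 0;
        tv.tv_sec = (time_t)(left / 1000);
        tv.tv_usec = (suseconds_t)((left % 1000) * 1000);
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -1;
    }
}

/* read() guarded by the wait above; returns -2 on timeout. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready == 0)
        return -2;
    if (ready < 0)
        return -1;
    return read(fd, buf, len);
}
```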

* -mode remind code fix for multithreaded use to make it match recurse and
  other modes better; not severely tested so YMMV! (The old code wouldn't
  work anyway, so it's an improvement anyhow).

* few code cleanups (#if 0 ... #endif)

* DocBook manual updated: now all return codes from pavuk are documented.

* minor code fixes for SSL/SFTP.

* updated configure and code to assist in compiling with both latest
  SpiderMonkey and older Mozilla JavaScript libraries (Win32/64 and UNIX
* Some unused error checks replaced by ASSERT() and some ASSERT()s replaced
  by error reports as those errors /can/ happen in actual use (though

* Fix for parsing malformed URLs (with multiple '#' and/or '?'): bookmarks
  and query string parts would not be stripped/detached correctly as the
  last '#'/'?' instead of the FIRST occurrence of '#'/'?' would be picked as
  a separation point.
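
  Splitting at the FIRST occurrence can be sketched as follows. This is an
  illustration, not the url.c code: the struct and the function split_url()
  are invented, and the rule that the query is only searched for in front
  of the first '#' follows the generic URI syntax (RFC 3986).

```c
#include <stddef.h>
#include <string.h>

struct url_parts {
    size_t path_len;        /* length of the part before '?' / '#' */
    const char *query;      /* text after the first '?', or NULL */
    size_t query_len;       /* query length, fragment excluded */
    const char *fragment;   /* text after the first '#', or NULL */
};

static void split_url(const char *url, struct url_parts *out)
{
    /* the fragment (bookmark) starts at the first '#' */
    const char *hash = strchr(url, '#');
    size_t head_len = hash ? (size_t)(hash - url) : strlen(url);

    /* the query starts at the first '?' in front of the fragment */
    const char *qmark = memchr(url, '?', head_len);

    out->fragment = hash ? hash + 1 : NULL;
    out->query = qmark ? qmark + 1 : NULL;
    out->query_len = qmark ? head_len - (size_t)(qmark + 1 - url) : 0;
    out->path_len = qmark ? (size_t)(qmark - url) : head_len;
}
```

  Any further '?' ends up inside the query and any further '#' inside the
  fragment, so nothing of the original URL is lost.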

* Ran the gettext files through pot/pox/po again. Lots of 'fuzzies'... These
  need to be fixed.

* EXPERIMENTAL: added preliminary code for extended JavaScript support:
  hooks to process HTML and CSS just like you can process embedded <SCRIPT>s
  now. The new hooks are still 'nulls', i.e. do not have any effect.

  This is a work in progress; it compiles & runs (tested on UNIX and Win32
  in multithreaded mode) but the new hooks still need to be implemented.

  The goal here is that all grabbed (parsable) content should be processable
  by custom JavaScript script functions AND when more than one URL is found,
  the JavaScript code should be allowed to add those extra URLs to the pavuk
  queue (using the new url.queue() JavaScript PavukUrl object method --
  currently a 'nil' member function as it still must be fully implemented).

* isatty() fixes which check for error conditions and do /not/ provide
  special 'console oriented' features when isatty(0) produces an error (may
  happen on Win32/UNIX).

* Checked and updated all header files (after I ran into a cyclic dependency
  when changing a bit of code): no .h files will #include "config.h"; all .c
  files /do/ #include "config.h" as the first header.

  System-dependent stuff (TRUE/FALSE definitions and a few other bits) have
  been moved to config.h (where they belong IMO) and removed from tools.h

  This is a change required for the gzip fix [SF bug #2050527].

* Preliminary fix for CSS url grabbing and rewriting bug [SF bug #2050537].

  The new code will now try to keep these four styles of <url> formatting
  in CSS intact -- this is done so as to keep particular CSS browser hacks
  intact as much as possible:

    @import "<url>"
    @import url(<url>)
    @import url('<url>')
    @import url("<url>")

  and of course the use of 'url()' elsewhere in any CSS is treated like the
  examples above, i.e. NONE of these should be changed regarding <url>
  delimiters (quotes or braces) when rewritten by pavuk.

  The ONLY situation where pavuk will CHANGE the quotes is when a <url> is
  found to contain the delimiter quote itself: in that case the quotes are
  changed from ' to " and vice versa.
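
  The quote handling for the url(...) forms can be sketched like this. It
  is not the pavuk rewriter; format_css_url() is a made-up helper which
  only shows the 'keep the delimiter unless the URL contains it' rule.

```c
#include <stdio.h>
#include <string.h>

/* Write a CSS url(...) token for 'url' into 'out'.
 * 'quote' is the delimiter found in the original CSS: '\'' or '"', or
 * '\0' for the unquoted url(<url>) form. The delimiter is kept, except
 * when the URL contains that very quote: then ' and " are swapped.
 * Returns the snprintf() result. */
static int format_css_url(char *out, size_t outsz, const char *url, char quote)
{
    if (quote != '\0' && strchr(url, quote) != NULL)
        quote = (quote == '\'') ? '"' : '\'';

    if (quote == '\0')
        return snprintf(out, outsz, "url(%s)", url);
    return snprintf(out, outsz, "url(%c%s%c)", quote, url, quote);
}
```

  A URL which contains both quote characters is not handled by this
  sketch.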

2008 aug 18

* minor fixes to the included mime.types file

* configure: added support/auto-detection for the GNU GDB extended debug
  output (-ggdb -g3) for when building a debug build.

* NTLM: fixed code for Win64 and other 64-bit platforms which do or do not
  support structure packing.

* documentation update: -[no]chunk_bug commandline argument finally
  documented (was in there already for a longer time; it is a special fix
  for badly behaving IIS web servers which transmit data in 'chunked mode').

  Also upgraded the documentation for the -tr_str_str/tr_chr_chr options so
  one can finally read how to use [:print:] and other definitions in there
  for -tr_chr_chr and be able to determine up front what the bugger will do
  for you.

  For example:

  Why does -tr_chr_chr '[:hexnum:]' '0123456789abcdef' *not* do what you
  expect when the filename has any of the a..f characters? (Answer: they all
  become 'f' as [:hexnum:] actually expands to


  itself, so it is longer than the destination set and by definition any
  'overflow' will be replaced by the last character in the target set.)
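
  The 'overflow' rule can be illustrated with a small sketch. tr_chars() is
  a hypothetical stand-in written for this note, not pavuk's tr() code, and
  the character sets in the usage remark are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a tr-style translation: every character of s that occurs in
 * 'from' is replaced by the character at the same index in 'to'.
 * When 'from' is longer than 'to', the surplus ("overflow") characters
 * all map to the LAST character of 'to'.
 */
void tr_chars(char *s, const char *from, const char *to)
{
    size_t to_len = strlen(to);

    if (to_len == 0)
        return;                 /* nothing to map to: leave s untouched */

    for (; *s != '\0'; s++) {
        const char *hit = strchr(from, *s);
        if (hit != NULL) {
            size_t idx = (size_t)(hit - from);
            *s = to[idx < to_len ? idx : to_len - 1];
        }
    }
}
```

  With from = "abcdef" and to = "123", the input "abcdef-x" becomes
  "123333-x": 'd', 'e' and 'f' have no partner in the target set and all
  collapse onto '3'.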

* HTML/CSS/JavaScript parent rewriting was sometimes flaky; this has been
  fixed by fixing several bits of antiquated code in pavuk: now all code
  sections are equally aware of URL_ISHTML, URL_ISSTYLE and/or URL_ISSCRIPT.

  Several functions have been adapted to mirror the new awareness:

  ext_is_html() has been enhanced and has been renamed to actually show its
  intended function: ext_is_parsable() -- which can be a HTML, CSS *or*
  JavaScript file! (Not only HTML can be the parent of other URLs and need
  updating, i.e. 'URL parent rewriting'.)

  [ SF bug #2050537 ] CSS @import bad / HTML corrupted --> fixed

* On SuSe10.2/AMD64 glibc6 dumped core when running pavuk in full-out
  '-debug -debug_level all' (the latter is implicit when you use '-debug')
  mode. This was caused by glibc's printf() functions *sensibly* executing
  a strlen() operation on the data fed to one of several '%.*s' printf()
  formatting parameters, while those data series had NOT been
  NUL-terminated.

  This would happen when debugging pavuk while fetching data from a
  gzip-enabled web server: the gzip/inflate code would NOT append a new NUL
  terminator.

* Several other '%.*s' and '%s' related core dump spots in the DEBUG_XYZ()
  code which would dump downloaded content have been fixed by feeding the
  data through an enhanced asciidump function -- which will switch to HEX
  dumping when the content to be shown for scrutiny contains a large amount
  of non-ASCII data (> 10% is the current heuristic to switch over).
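
  A minimal sketch of such a switch-over heuristic is shown below.
  wants_hex_dump() is a hypothetical name, and what exactly counts as
  'non-ASCII' here (anything outside printable ASCII plus tab, CR and LF)
  is an assumption made for the example, not taken from the pavuk sources.

```c
#include <stddef.h>

/*
 * Return 1 when MORE than 10% of the bytes in buf are not plain text,
 * i.e. when a hex dump is probably more readable than an ASCII dump.
 * Assumption: "plain text" means printable ASCII plus tab, CR and LF.
 */
int wants_hex_dump(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t odd = 0;             /* number of non-text bytes seen */

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int is_text = (c >= 0x20 && c < 0x7f)
                      || c == '\n' || c == '\r' || c == '\t';
        if (!is_text)
            odd++;
    }

    /* same as odd * 10 > len, written so it cannot overflow */
    return odd > len / 10;
}
```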

* glibc6 on SuSe10.2/AMD64 would also dump core when being fed a 110K string
  to a printf '%s' statement. This has been fixed by always limiting the
  amount of content to be displayed when debug-printing downloaded data
  (various '-debug_level's)
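
  One way to both limit the amount of data and guarantee NUL termination
  before handing a buffer to printf() is to clip it into a bounded scratch
  buffer first. The helper below is a sketch under that assumption;
  debug_clip() is a hypothetical name and the scratch buffer size is up to
  the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dst_size - 1 bytes of the (possibly unterminated) data in
 * src to dst and always NUL-terminate dst.
 * Returns the number of data bytes copied.
 */
size_t debug_clip(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    size_t n;

    if (dst_size == 0)
        return 0;               /* no room, not even for the terminator */

    n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

  The clipped copy can then be printed with a plain '%s' regardless of how
  large or how (un)terminated the downloaded content was.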

* gzip/inflate would fail to perform on 'non-parsable' content, i.e. plain
  text files downloaded from a gzip-enabled web server. This has been fixed.

  CAVEAT: The current gzip/inflate code does not deliver when it is fed very
          large files. Hence, when downloading VMware images and/or multi-GB
          ISO files, a workaround is to specify -noEnc. This will be fixed
          at a later date.

  [SF bug #2050527] nonparsed files saved in (wrong) compressed when using
                    HTTP --> fixed

* Parent rewriting would try to treat all parents as HTML, which is VERY
  wrong when the actual parent is a CSS stylesheet or a JavaScript script
  file. Fixed.

* unified variable names for 'struct doc' variables: it is *QUITE*
  irritating to lose your display of 'docu' contents just because this call
  uses 'docp' for the same (or 'html_doc') while trying to track down
  lurking parent rewriting and file URL parsing bugs.

  Updated all sourcefiles to the use of varname 'docu' for the current
  document. 'docp' and 'html_doc' have been renamed.

* two bugfixes for the tr() code: (1) when using X-Y character ranges, the
  size estimator would allocate far too little space. This has been fixed.
  (2) the documentation says it well: you cannot include a NUL in a tr()
  character set. In one case (a range at the start of the spec, like this:
  '-z') pavuk would actually attempt to insert such a NUL anyhow, causing a
  subtle bug. Fixed. And a minor code cleanup.
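
  To illustrate what a size estimator for X-Y ranges has to account for,
  here is a small sketch. tr_expanded_len() is a hypothetical function, and
  its treatment of a leading or trailing '-' (and of reversed ranges) as
  literal characters is an assumption of this example, not a description of
  pavuk's actual parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count how many characters a tr-style set spec expands to.
 * "X-Y" with X <= Y contributes Y - X + 1 characters; everything else,
 * including a '-' without a character on both sides, counts as one
 * literal character.
 */
size_t tr_expanded_len(const char *spec)
{
    size_t len = strlen(spec);
    size_t i = 0;
    size_t n = 0;

    while (i < len) {
        unsigned char lo = (unsigned char)spec[i];

        if (i + 2 < len && spec[i + 1] == '-') {
            unsigned char hi = (unsigned char)spec[i + 2];
            if (hi >= lo) {
                n += (size_t)(hi - lo) + 1;   /* whole range X..Y */
                i += 3;
                continue;
            }
        }
        n++;                                  /* single literal character */
        i++;
    }
    return n;
}
```

  A buffer sized with strlen(spec) alone would be far too small for a spec
  such as "a-z", which is three bytes long but expands to 26 characters.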

* fixed argument quoting for external app invocation, which is particularly
  important for Windows machines: they treat '-quoting quite differently from
  "-quoting. Fixed by using "-quotes instead of the original '-quotes.

* -enable_js is now turned ON by default - just like the documentation
  already said.

  KNOWN ISSUE: empty lines in JavaScript code and files get stripped by
               pavuk on rewriting; this will be fixed at a later date.

* fix in mime.types file for CVS file extension + added mime types for 
  Microsoft Office 2007

* fixed heap corruption in ainterface.c when calling append_starting_url()
  when url has been specified in the extended '-request' format, including
  a predefined local filename. (Would dump core on some systems.)

* moved the url2diag and info2diag functions from recurse.c to where they should
  have been: url.c -- to resolve a cyclic dependency.

* fixed up the '-request' format url parser/decoder url_parse() call: several
  types of input specification error would be silently rejected (now pavuk
  prints a suitable error message to tell the user what [s]he did wrong and what
  was expected) + a few tugs & tweaks to fix behavior for parsing extended 
  URL specifications (including cookies, predefined local filenames, etc.) and
  an extra '-debug' (level: URL) line to help you diagnose how the '-request's
  have been parsed/decoded.

* now you can use the extended '-request' URL format anywhere on the
  commandline and/or your pavuk configuration files -- as long as you keep
  it within quotes on the commandline of course, e.g.

   pavuk "URL: LFNAME:example.html"

* fix: config files generated by pavuk now properly select the 'short format'
  (URL:....) instead of the 'long url spec format' (Request:....): previously
  pavuk would lose information about web forms, cookies, local filenames, etc.
  for some types of requested url.

* quickfix for issue reported on the mailing list regarding JavaScript
  interface functions causing the build to fail - which happened when no
  JavaScript library could be found.

  NOTE: on Linux, the JS libraries and headerfiles seem to get installed in
        various places. The current ./configure script looks for the
        header file in a default directory
        unless you specify the '--with-js-includes=<dir>' option when running
        ./configure.

        The same goes for the js library itself: the current configure script
        looks for either libjs or libmozjs in a number of default directories
        unless you specify the ./configure --with-js-libraries=<dir> option
        to point to your specific libjs.a / libmozjs.a

* added an advanced example of use to the pavuk DocBook documentation
  which will end up in the manpage (where it's a bit too much, but then
  at least the users have an extended example of actual use) -- example
  shows how to grab the up-to-date content from a MediaWiki-based web
  site.

* added S/M/H/D unit support for the time argument decoder function

* Updated the manual regarding:

  - all missing 'hammer mode' options

  - the missing -rtimeout and -wtimeout options

  - checked first few options in options.h and made sure those were all
    documented. (This is a work in progress...)

* All timeouts are now in milliseconds, except the -max_time one, which is
  in minutes.

  All timeout arguments (except -max_time) now recognize the alternative
  units for specifying time: s/m/h/d/S/M/H/D: second, minute, hour, day.

  When no unit has been specified, the unit 'milliseconds' is assumed.
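
  The unit handling described above could look roughly like the sketch
  below. parse_timeout_ms() is a hypothetical name and the -1 error
  convention is an assumption of this example; whether pavuk accepts
  fractional values or whitespace is not covered here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse "<number>[s|m|h|d|S|M|H|D]" into milliseconds.
 * Without a unit suffix the number is taken as milliseconds.
 * Returns -1 on malformed input or overflow.
 */
long parse_timeout_ms(const char *text)
{
    char *end;
    long value;
    long scale;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || value < 0 || errno == ERANGE)
        return -1;              /* no digits, negative, or out of range */

    switch (*end) {
    case '\0':
        return value;           /* no unit: already milliseconds */
    case 's': case 'S':
        scale = 1000L;
        break;
    case 'm': case 'M':
        scale = 60L * 1000L;
        break;
    case 'h': case 'H':
        scale = 60L * 60L * 1000L;
        break;
    case 'd': case 'D':
        scale = 24L * 60L * 60L * 1000L;
        break;
    default:
        return -1;              /* unknown unit */
    }

    if (end[1] != '\0')
        return -1;              /* trailing garbage after the unit */
    if (value > LONG_MAX / scale)
        return -1;              /* multiplication would overflow */

    return value * scale;
}
```

  Under these assumptions "1500" yields 1500, "30s" yields 30000 and "2M"
  yields 120000.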

* Fix for bug report #2158794: now all DEBUG_*() functions are called 
  using the proper number of arguments.

  The code has been further enhanced for all printf()-like functions
  (such as the DEBUG_*() and x*printf() functions) to enable GCC and MSVC
  to check the format specification strings and parameter count and
  type (GCC).
  This led to the discovery of a multitude of errors, which have been
  fixed (wrong integer sizes, etc.).
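
  For GCC, this kind of checking is typically enabled with the 'format'
  function attribute. The sketch below shows the idea on a hypothetical
  wrapper; debug_format() and the PRINTF_LIKE macro are names invented for
  this example and are not the declarations used in pavuk. The MSVC side is
  not shown.

```c
#include <stdarg.h>
#include <stdio.h>

/* Only GCC-compatible compilers understand this attribute. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Argument 3 is the format string, the variadic arguments start at 4. */
int debug_format(char *dst, size_t dst_size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Format into dst; returns what vsnprintf() returns. */
int debug_format(char *dst, size_t dst_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(dst, dst_size, fmt, ap);
    va_end(ap);
    return n;
}
```

  With the attribute in place, a call such as debug_format(buf, sizeof buf,
  "%s", 42) is reported by GCC when format warnings are enabled (they are
  part of -Wall), instead of failing at run time.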

* Preliminary code move to allow downloading extremely large entities
  (larger than 2GB) such as DVD ISO images: this has been done by more
  judicious use of the size_t and ssize_t types instead of simply 'int'.
  On 64-bit platforms, size_t/ssize_t can handle 64-bit sizes, while
  'int' cannot (as GCC still uses 32-bit ints on most common hardware
  64-bit architectures (Intel, ...)).  Further effort will need to be
  spent to adapt the system (and OpenSSL) calls to enable the complete
  datapath for >2GB entity sizes (at least when compiled on 64-bit).

* Small documentation fix: regex overview of characterset changed in DocBook
  source so it appears as a simple list, instead of just one long paragraph
  full of concatenated items --> improved readability.

* const-ified the source code and fixed a few comment typos and a
  lurking bug in FTP (found thanks to constification): filename
  for directory index urls could be damaged in particular circumstances.

* fixed makefiles for environments without any DocBook tools. Also fixed
  configure script to help detect the absence of mandatory DocBook template
  files. Plus added the generated DocBook output to the distro as we cannot
  expect everyone to have the DocBook tools; nevertheless, everybody /should/
  receive a full set of documentation.

* Bugfix in GET_NUMLIST(): now original numlist is properly removed (would only
  be noticeable before when specifying multiple port numbers).

* memleak fix for _free_httphdr(): now also the httphdr struct itself gets
  freed.

* Fixed lockups in debug logging code when running in '-x' GUI mode;
  overhauled the 'recursive invocation' detection code within, which is
  mandatory to prevent recursive calls to debug/log functions from blowing up
  the stack and dumping core while running in ultra verbose debug/diag mode
  (-debug -debug_level all). This is the second part of the fix for bug
  #2184196.
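
  The general shape of a 'recursive invocation' guard is sketched below.
  All names are hypothetical, and the sketch is deliberately
  single-threaded: a multithreaded program would need a per-thread depth
  counter or a lock around it, which is not shown.

```c
/* > 0 while a log call is already in progress */
static int log_depth;
/* number of nested log calls that were refused */
static unsigned long log_dropped;

typedef void (*log_sink_fn)(const char *msg);

/*
 * Pass msg to sink unless we are already inside a log call, in which
 * case the nested message is dropped instead of recursing further.
 * Returns 1 when the message was delivered, 0 when it was dropped.
 */
int guarded_log(log_sink_fn sink, const char *msg)
{
    if (log_depth > 0) {
        log_dropped++;
        return 0;
    }

    log_depth++;
    sink(msg);
    log_depth--;
    return 1;
}

unsigned long guarded_log_dropped(void)
{
    return log_dropped;
}

/*
 * Demo sink that itself tries to log, the way a GUI callback might.
 * Without the guard this would recurse until the stack is exhausted.
 */
void reentrant_sink(const char *msg)
{
    (void)msg;
    guarded_log(reentrant_sink, "nested message");
}
```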

* Bugfix for #2023089: new code is introduced for '-lmax' depth level checks:
  the 'depth' (a.k.a. 'level') will always be taken from the non-inline parent URL
  which has the lowest level.

  This should fix situations where 'inline' URLs have 'inline' *parent* URLs,
  such as style sheets, which are referenced by non-inline URLs (HTML files).

  Seeking out the lowest level non-inline parent should also take care of
  situations where multiple HTML files, themselves at different levels, all
  (directly!) reference the same stylesheet/inline URL.
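
  The lookup described above can be sketched as a walk over the parent
  links. The struct and function below are hypothetical simplifications
  made up for this note (pavuk's real url structure differs), and the hop
  limit is an assumption added to keep the walk finite if parent links
  ever form a cycle.

```c
#include <stddef.h>

/* Hypothetical, simplified URL record; not pavuk's real 'url' struct. */
struct url_node {
    int level;                  /* depth of this URL in the crawl */
    int is_inline;              /* 1 for inline objects (CSS, images, ...) */
    struct url_node **parents;  /* URLs that reference this one */
    size_t parent_count;
};

/*
 * Return the lowest level found among the non-inline ancestors of u,
 * looking through chains of inline parents, or -1 when there is none.
 */
int lowest_noninline_level(const struct url_node *u, int max_hops)
{
    int best = -1;
    size_t i;

    if (max_hops <= 0)
        return -1;

    for (i = 0; i < u->parent_count; i++) {
        const struct url_node *p = u->parents[i];
        int cand = p->is_inline
                 ? lowest_noninline_level(p, max_hops - 1)
                 : p->level;

        if (cand >= 0 && (best < 0 || cand < best))
            best = cand;
    }
    return best;
}
```

  For an image referenced by a stylesheet which is in turn referenced by
  HTML pages at levels 3 and 1, the function yields 1.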

* Attempt at fixing a GUI semaphore lockup, caused by LOCK_CFG_URLSTACK being
  used for different purposes (was a quick hack once to create a 'critical
  section' there) in recurse.c @ 1129. Same hack, but now we use LOCK_GHBN,
  which should cause much less trouble.

* Bit of code cleanup.

* Code review checks to see if URLT_FTPS and URLT_GOPHER are used consistently
  where you'd expect them, just as you would expect URLT_HTTPS next to
  URLT_HTTP.

* Code review checks and fixes to prevent spurious damage to url->parent
  structures: now the access to this element is critical-sectioned
  /everywhere/ using LOCK_URL(u); this existed in 95% of the places already,
  now all code has been checked.

* Several fixes for multithreaded GTK GUI use. Most important thing which
  was missing: a call to gdk_threads_init().

* JavaScript: updated HTML tag/attribute tables to recognize all
  onXYZ=... JavaScript event attributes in HTML + added the full
  set of attributes to the url pattern class/object which is
  available in pavuk's own JavaScript extension.