Pavuk


Last Update: February 27 2006

 
 

JavaScript Documentation:

Pavuk provides JavaScript bindings for tasks that need more flexibility than a non-scriptable program can offer. First you need the JavaScript library from the Mozilla project installed on your system. You can download the sources from the Mozilla FTP site; compilation is very simple. The use of JavaScript scripts within pavuk is not yet documented in the manual pages.

You can load one JavaScript file into pavuk with the -js_script_file option. Currently there are two places in pavuk where users can plug in their own JavaScript functions. The first is the routine that decides whether a particular URL should be downloaded. To supply your own JavaScript decision function, you must name it pavuk_url_cond_check. Its prototype looks like this:
function pavuk_url_cond_check(url, level)
{
}
Arguments:
  • level is an integer indicating from which place in the pavuk code the pavuk_url_cond_check function was called:
    level  place
    0      Condition checking is called from the HTML parsing routine. At this point you can use all conditions except -dmax, -min_time, -max_time, -max_size, -min_size, -amimet, -dmimet.
    1      Condition checking is called from the routine that queues URLs into the download queue. At this point you can use all conditions available at level 0, plus -dmax.
    2      Condition checking is called when a URL is taken from the download queue; the URL is transferred only if this check succeeds. At this point you can use the same set of conditions as at level 1.
    3      Condition checking is called after pavuk has sent the download request and detected the document size, modification time and MIME type. At this level you can use all conditions.

  • url is an instance of the PavukUrl class. It contains all information about the particular URL and wraps the parsed URL that pavuk stores internally in a structure of type url.
It has the following attributes:
read-write attributes
    status              (int32, always defined) bit field holding various flags (see url.h for details)
read-only attributes, always defined
    protocol            one of "http", "https", "ftp", "ftps", "file", "gopher", "unknown"; the type of the URL
    level               level in the document tree at which this URL lies
    ref_cnt             number of parent documents which reference this URL
    urlstr              full URL string
read-only attributes defined when protocol is "http" or "https"
    http_host           host name or IP address
    http_port           port number
    http_document       HTTP document
    http_searchstr      query string when available (the part of the URL after ?)
    http_anchor_name    anchor name when available (the part of the URL after #)
    http_user           user name for authorization when available
    http_password       password for authorization when available
read-only attributes defined when protocol is "ftp" or "ftps"
    ftp_host            host name or IP address
    ftp_port            port number
    ftp_user            user name for authorization when available
    ftp_password        password for authorization when available
    ftp_path            path to file or directory
    ftp_anchor_name     anchor name when available (the part of the URL after #)
    ftp_dir             flag indicating whether this URL points to a directory
read-only attributes defined when protocol is "file"
    file_name           path to file or directory
    file_searchstr      query string when available (the part of the URL after ?)
    file_anchor_name    anchor name when available (the part of the URL after #)
read-only attributes defined when protocol is "gopher"
    gopher_host         host name or IP address
    gopher_port         port number
    gopher_selector     selector string
read-only attributes defined when protocol is "unknown"
    unsupported_urlstr  full URL string
read-only attributes available while conditions are being checked
    check_level         equivalent to the level parameter of the pavuk_url_cond_check function
    mime_type           MIME type of this URL (defined when available)
    doc_size            size of the document (defined when available)
    modification_time   modification time of the document (defined when available)
    doc_number          number of the document in the download queue (defined when available)
    html_doc            full content of the parent document of the current URL (defined when level is 0)
    html_doc_offset     offset of the current HTML tag in the parent document of the URL (defined when level is 0)
    moved_to            URL to which this URL was moved (defined when available)
    html_tag            full HTML tag, including <>, from which the current URL was taken (defined when level is 0)
    tag                 name of the HTML tag from which the current URL was taken (defined when level is 0)
    attrib              name of the HTML tag attribute from which the current URL was taken (defined when level is 0)
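
Because the host attribute has a different name for each protocol, a decision function usually has to look at protocol first. The following is only a minimal sketch of that idea: url_host is a hypothetical helper (not part of pavuk), www.example.org is a placeholder host name, and the sketch assumes that returning false means the URL is not downloaded.

```javascript
// Hypothetical helper (not part of pavuk): pick the host attribute that
// matches the URL's protocol, or return null when there is none.
function url_host(url)
{
  switch (url.protocol)
  {
    case "http":
    case "https":
      return url.http_host;
    case "ftp":
    case "ftps":
      return url.ftp_host;
    case "gopher":
      return url.gopher_host;
    default:
      // "file" and "unknown" URLs carry no host attribute
      return null;
  }
}

function pavuk_url_cond_check(url, level)
{
  var host = url_host(url);

  // Hypothetical policy: URLs without a host pass; all others must be
  // on www.example.org (a placeholder host name).
  if (host === null)
    return true;

  return host == "www.example.org";
}
```

The sketch reads only attributes that the list above describes as defined for the given protocol, so it never touches, for example, http_host on an FTP URL.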

And the following methods:
    get_parent(n)          get the URL of the n-th parent document
    check_cond(name, ...)  check the condition whose option name is name. If you provide no additional parameters, pavuk uses the parameters from the command line or scenario file for the check; otherwise it uses the parameters you list.

Here is an example of what a pavuk_url_cond_check function can look like:
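function pavuk_url_cond_check(url, level)
{
  // Level 0: called from the HTML parsing routine
  if(level == 0)
  {
    // Return false for URLs deeper than level 3 for which the
    // -asite condition with www.host.com holds
    if(url.level > 3 && url.check_cond("-asite", "www.host.com"))
      return false;

    // Return false when both the -url_rpattern and -dsfx conditions hold
    if(url.check_cond("-url_rpattern",
         "http://www.pavuk.org/", "http://www.pavuk.org/~pic/") &&
       url.check_cond("-dsfx", ".jar", ".tgz", ".png"))
      return false;
  }

  // Level 2: called when the URL is taken from the download queue
  if(level == 2)
  {
    var par = url.get_parent();

    if(par && par.get_moved())
      return false;
  }

  return true;
}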
The example is not useful in practice, but it shows how to use this feature.
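
The attributes doc_size and mime_type only become meaningful at level 3, after pavuk has seen the response. A minimal sketch of a level 3 check follows. The size limit MAX_DOC_SIZE and the "video/" prefix are hypothetical choices for illustration, and the sketch assumes that doc_size is exposed as a number and mime_type as a string.

```javascript
// Hypothetical size limit in bytes; not a pavuk default.
var MAX_DOC_SIZE = 1048576;

function pavuk_url_cond_check(url, level)
{
  // Document size and MIME type are only known at level 3,
  // so let every earlier level pass.
  if (level != 3)
    return true;

  // These attributes are "defined when available", so test before use.
  // Assumption: doc_size is exposed as a number.
  if (typeof url.doc_size == "number" && url.doc_size > MAX_DOC_SIZE)
    return false;

  // Assumption: mime_type is exposed as a string such as "text/html".
  if (typeof url.mime_type == "string" &&
      url.mime_type.indexOf("video/") == 0)
    return false;

  return true;
}
```

Checking typeof before reading each attribute keeps the function safe when the server did not report a size or a MIME type.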

The second possible use of JavaScript in pavuk is in the -fnrules option, which generates local file names. The extended -fnrules syntax provides a special function called jsf, which takes one parameter: the name of the JavaScript function to call. That function must return a string, and its prototype looks something like this:
function some_jsf_func(fnrule)
{
}
The fnrule parameter is an instance of the PavukFnrules class. It has one attribute, url, of the PavukUrl type described above, and one method, get_macro(macro), which returns the value of the %x macros used in the -fnrules option.
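
As a rough illustration, a naming function could build the local name from the attributes of fnrule.url alone. This is only a sketch under assumptions: the directory prefixes "by_host" and "other" and the helper strip_leading_slash are hypothetical, and whether http_document and ftp_path begin with a slash is not stated here, so the helper handles both cases. The sketch does not call get_macro because the exact form of its argument is not specified above.

```javascript
// Hypothetical helper (not part of pavuk): drop one leading "/" if present.
function strip_leading_slash(s)
{
  return s.charAt(0) == "/" ? s.substring(1) : s;
}

function some_jsf_func(fnrule)
{
  var url = fnrule.url;

  if (url.protocol == "http" || url.protocol == "https")
    return "by_host/" + url.http_host + "/" +
           strip_leading_slash(url.http_document);

  if (url.protocol == "ftp" || url.protocol == "ftps")
    return "by_host/" + url.ftp_host + "/" +
           strip_leading_slash(url.ftp_path);

  // Fallback for other protocols: derive a flat name from the full URL
  // string, replacing everything except letters, digits, ".", "_" and "-".
  return "other/" + url.urlstr.replace(/[^A-Za-z0-9._-]/g, "_");
}
```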

You can then use it with something like -fnrules F "*" '(jsf "some_jsf_func")'