pavuk-0.9pl28-2mdk.i586.rpm

Wget-Pavuk.HOWTO
****************

This file contains basic information for wget users who are starting to use
pavuk.

I am not a very authoritative person to write this kind of document. I do not
use wget regularly, so some of the information in this document may be
inaccurate. I was simply asked to provide this kind of information, so I am
trying to write it down.

I would appreciate any corrections and contributions to this document.

* Stevo Ondrejicka <ondrej@idata.sk>


Mapping of wget options to pavuk options
----------------------------------------

It is not possible to map all options exactly, because wget has some features
that pavuk lacks and vice versa. Also, some options do not behave exactly the
same way in pavuk.

_________________________________________________________________________
| Wget                             | Pavuk                              |
|-----------------------------------------------------------------------|
| * version information                                                 |
| -V, --version                    | -v                                 |
|-----------------------------------------------------------------------|
| * options help                                                        |
| -h, --help                       | -h                                 |
|-----------------------------------------------------------------------|
| * putting job in the background                                       |
| -b, --background                 | -bg                                |
|-----------------------------------------------------------------------|
| * log messages                                                        |
| -a, --append-file=FILE           | -log_file FILE                     |
| -o, --output-file=FILE           | no option to force overwrite of    |
|                                  | log file, always appending         |
|-----------------------------------------------------------------------|
| * debugging                                                           |
| -d, --debug                      | -debug -debug_level LEVEL          |
|                                  | pavuk can limit output to selected |
|                                  | message classes with -debug_level:|
|                                  |   html,protos,protoc,protod,procs, |
|                                  |   locks,net,misc,user,mtlock,mtthr |
|-----------------------------------------------------------------------|
| * verbosity of messages                                               |
| -q, --quiet                      | -quiet                             |
| -v, --verbose                    | -verbose                           |
| -nv, --non-verbose               | this option has no meaning for     |
|                                  | pavuk, because pavuk has finer     |
|                                  | tuning of verbosity through the    |
|                                  | -debug_level option                |
|-----------------------------------------------------------------------|
| * input of starting URLs from file                                    |
| -i, --input-file=FILE            | -urls_file FILE                    |
|-----------------------------------------------------------------------|
| * force processing of all files as HTML file                          |
| -F, --force-html                 | no similar option, pavuk relies on |
|                                  | proper MIME types sent by servers. |
|                                  | For local files pavuk tries to     |
|                                  | guess HTML files by extensions or  |
|                                  | by contents                        |
|-----------------------------------------------------------------------|
| * number of attempts to get a document after a failure                |
| -t, --tries=NUMBER               | -retry NUMBER                      |
| 0 means unlimited                | 0 means unlimited                  |
|-----------------------------------------------------------------------|
| * writing document to specified file                                  |
| -O, --output-document=FILE       | -store_name FILE                   |
|-----------------------------------------------------------------------|
| * reusing of already downloaded file                                  |
| -nc, --no-clobber                | this is default pavuk behavior     |
|                                  | and can't be turned off            |
|-----------------------------------------------------------------------|
| * resuming of broken downloads                                        |
| -c, --continue                   | this is pavuk's default behaviour; |
|                                  | pavuk always prefixes document     |
|                                  | names with ".in_" and renames the  |
|                                  | file to its regular name only after|
|                                  | the download has finished; this way|
|                                  | it can properly detect and restart |
|                                  | broken downloads                   |
|-----------------------------------------------------------------------|
| * progress meter                                                      |
| --dot-style=STYLE                | pavuk has only one style of        |
|                                  | progress (numeric) and it can only |
|                                  | be turned on/off with the -progress|
|                                  | and -noprogress options (default is|
|                                  | turned off)                        |
|-----------------------------------------------------------------------|
| * synchronizing                                                       |
| -N, --timestamping               | -mode sync                         |
|-----------------------------------------------------------------------|
| * output of protocol level communication with servers                 |
| -S, --server-response            | -debug -debug_level LEVEL          |
|                                  | LEVEL protos - for server responses|
|                                  | LEVEL protoc - for pavuk requests  |
|                                  | LEVEL protod - POST data sent in   |
|                                  |                HTTP POST request   |
|                                  | you can activate multiple debug    |
|                                  | levels, for example:               |
|                                  |  -debug_level protos,protoc        |
|-----------------------------------------------------------------------|
| * checking of URL validity                                            |
| --spider                         | something similar can be achieved  |
|                                  | with the -mode remind option; it   |
|                                  | reports only broken and modified   |
|                                  | links, not everything. You must    |
|                                  | also use the -remind_cmd option to |
|                                  | set the program that processes the |
|                                  | output (default is mailx)          |
|-----------------------------------------------------------------------|
| * communication timeout                                               |
| -T, --timeout=SECONDS            | -timeout MINUTES                   |
|-----------------------------------------------------------------------|
| * delaying of transfers                                               |
| -w, --wait=SECONDS               | -sleep SECONDS                     |
|-----------------------------------------------------------------------|
| * enabling/disabling connections through proxy                        |
| -Y, --proxy=on/off               | no similar option, if proxy is     |
|                                  | specified, it is always used       |
|-----------------------------------------------------------------------|
| * retrieval quota                                                     |
| -Q, --quota=NUMBER               | -trans_quota NUMBER                |
| wget stops after the quota is    | pavuk stops as soon as the transfer|
| exceeded, but only once the      | quota is exceeded (even in the     |
| current file has finished        | middle of a transfer)              |
|-----------------------------------------------------------------------|
| * adjusting of local directory layout of downloaded files             |
| -nd, --no-directories            | -fnrules F "*" "%n"                |
| -x, --force-directories          | this is default for pavuk          |
| -nH, --no-host-directories       | -base_level 2                      |
|                                  | or -fnrules F "*" "%d/%n"          |
| -P, --directory-prefix=PREFIX    | -cdir PREFIX                       |
| --cut-dirs=NUMBER                | -base_level NUMBER+1               |
|-----------------------------------------------------------------------|
| * authorization for accessing documents                               |
| --http-user=USER                 | -auth_name USER                    |
| --http-passwd=PASS               | -auth_passwd PASS                  |
|-----------------------------------------------------------------------|
| * caching of documents on proxy server                                |
| -C, --cache=on/off               | -cache -nocache                    |
|-----------------------------------------------------------------------|
| * workaround for buggy HTTP servers with bad Content-Length header    |
| --ignore-length                  | -check_size/-nocheck_size          |
|-----------------------------------------------------------------------|
| * adding custom fields to HTTP request header                         |
| --header=STRING                  | -httpad STRING                     |
|-----------------------------------------------------------------------|
| * proxy authorization                                                 |
| --proxy-user=USER                | -http_proxy_user USER              |
| --proxy-passwd=PASS              | -http_proxy_pass PASS              |
|-----------------------------------------------------------------------|
| * storing HTTP response headers                                       |
| -s, --save-headers               | pavuk never stores HTTP responses  |
|                                  | within documents; instead it offers|
|                                  | -store_info to store this          |
|                                  | information in separate files in   |
|                                  | the .pavuk_info directory          |
|-----------------------------------------------------------------------|
| * spoofing of User-Agent: request field                               |
| -U, --user-agent=AGENT           | -identity AGENT                    |
|-----------------------------------------------------------------------|
| * preserving of symbolic links when transferring through FTP          |
| --retr-symlinks                  | -preserve_slinks/-nopreserve_slinks|
|-----------------------------------------------------------------------|
| * globbing in FTP transfers                                           |
| -g, --glob=on/off                | no exact equivalent; pavuk lets you|
|                                  | specify wildcard patterns and      |
|                                  | regular expressions with options   |
|                                  | -pattern, -rpattern, -skip_pattern,|
|                                  | -skip_rpattern, -url_pattern,      |
|                                  | -url_rpattern, -skip_url_pattern,  |
|                                  | -skip_url_rpattern                 |
|                                  | these patterns are applied to all  |
|                                  | URLs, not just FTP URLs            |
|-----------------------------------------------------------------------|
| * setting of FTP data connection type                                 |
| --passive-ftp                    | -ftp_active/-ftp_passive           |
|-----------------------------------------------------------------------|
| * recursing through WWW                                               |
| -r, --recursive                  | this is default pavuk behaviour    |
|-----------------------------------------------------------------------|
| * limiting recursion level                                            |
| -l, --level=NUMBER               | -lmax NUMBER                       |
| 0 unlimited                      | 0 unlimited                        |
|-----------------------------------------------------------------------|
| * prefetching files to proxy                                          |
| --delete-after                   | -mode dontstore                    |
|-----------------------------------------------------------------------|
| * converting links in HTML documents                                  |
| -k, --convert-links              | default pavuk behaviour. By default|
|                                  | pavuk keeps links inside HTML      |
|                                  | documents consistent, so all links |
|                                  | are always valid: when a document  |
|                                  | is stored in the local tree, pavuk |
|                                  | rewrites the links in all HTML     |
|                                  | documents that point to it.        |
|                                  | The default behaviour can be       |
|                                  | changed with the options:          |
|                                  | -all_to_local, -sel_to_local,      |
|                                  | -all_to_remote                     |
|-----------------------------------------------------------------------|
| * mirroring                                                           |
| -m, --mirror                     | -mode sync                         |
|-----------------------------------------------------------------------|
| * directory listings handling                                         |
| -nr, --dont-remove-listing       | -store_index/-nostore_index        |
|                                  | by default pavuk converts all      |
|                                  | directory listings to HTML files   |
|                                  | and, also by default, stores them. |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified suffixes (file extensions)        |
| -A, --accept=LIST                | -asfx LIST                         |
| -R, --reject=LIST                | -dsfx LIST                         |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified domains                           |
| -D, --domains=LIST               | -adomain LIST                      |
| --exclude-domains=LIST           | -ddomain LIST                      |
|-----------------------------------------------------------------------|
| * following of relative only URLs                                     |
| -L, --relative                   | no similar option                  |
|-----------------------------------------------------------------------|
| * enabling of transfers from FTP servers                              |
| --follow-ftp                     | -FTP/-noFTP                        |
|-----------------------------------------------------------------------|
| * spanning to other servers                                           |
| -H, --span-hosts                 | this is default pavuk behaviour    |
|                                  | it is possible to change it with   |
|                                  | -dont_leave_site/-leave_site       |
|-----------------------------------------------------------------------|
| * allowing/disallowing of specified directories (prefixes)            |
| -I, --include-directories=LIST   | -aprefix LIST                      |
| -X, --exclude-directories=LIST   | -dprefix LIST                      |
|-----------------------------------------------------------------------|
| * DNS lookup                                                          |
| -nh, --no-host-lookup            | no similar option;                 |
|                                  | pavuk always caches DNS lookups in |
|                                  | a local hash table                 |
|-----------------------------------------------------------------------|
| * disable ascending to parent directories                             |
| -np, --no-parent                 | -dont_leave_dir/-leave_dir         |
|-----------------------------------------------------------------------|
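
Most rows of the table are plain renames, but a few need the value converted:
-T takes seconds in wget while -timeout takes minutes in pavuk, and
--cut-dirs=NUMBER becomes -base_level NUMBER+1. The Python sketch below shows
this for a handful of rows only. The helper name wget_to_pavuk and the
(option, value) input format are made up for this example and belong to
neither tool; rounding the timeout up to whole minutes is an assumption.

```python
import math

# Direct renames copied from the table: wget long option -> pavuk option.
RENAMES = {
    "--tries": "-retry",
    "--level": "-lmax",
    "--wait": "-sleep",
    "--input-file": "-urls_file",
    "--directory-prefix": "-cdir",
    "--user-agent": "-identity",
    "--accept": "-asfx",
    "--reject": "-dsfx",
}


def wget_to_pavuk(options):
    """Translate (wget_long_option, value) pairs into a pavuk argument list.

    Hypothetical helper for illustration. Options not handled here raise
    KeyError instead of being guessed.
    """
    args = []
    for name, value in options:
        if name == "--timeout":
            # wget counts seconds, pavuk counts minutes. Rounding up to
            # whole minutes is an assumption made for this sketch.
            args += ["-timeout", str(math.ceil(int(value) / 60))]
        elif name == "--cut-dirs":
            # The table maps --cut-dirs=NUMBER to -base_level NUMBER+1.
            args += ["-base_level", str(int(value) + 1)]
        else:
            args += [RENAMES[name], str(value)]
    return args


print(wget_to_pavuk([("--tries", 3), ("--timeout", 90), ("--cut-dirs", 2)]))
```

Options that pavuk enables by default (recursion, link conversion, resuming)
are left out on purpose, since they need no pavuk argument at all.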

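As the -c row says, pavuk stores a document under a name prefixed with ".in_"
until its download has finished. Assuming that convention, unfinished
downloads in a local tree could be listed with a short Python sketch like the
following. The function name find_partial_downloads is hypothetical and is
not part of pavuk.

```python
import os


def find_partial_downloads(root):
    """Return sorted paths below root whose file name starts with ".in_"."""
    partial = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Per the table, ".in_" marks a document still being fetched.
            if name.startswith(".in_"):
                partial.append(os.path.join(dirpath, name))
    return sorted(partial)


# List leftovers below the current directory.
for path in find_partial_downloads("."):
    print(path)
```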

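The -d and -S rows both rely on -debug together with -debug_level and a
comma-separated list of levels. A small Python sketch that builds these
arguments and rejects names missing from the table might look like this. The
helper debug_args is an illustration only; the level names are the ones
listed in the table.

```python
# Debug level names as listed in the table.
KNOWN_LEVELS = (
    "html", "protos", "protoc", "protod", "procs",
    "locks", "net", "misc", "user", "mtlock", "mtthr",
)


def debug_args(*levels):
    """Build the pavuk debug arguments for the given level names."""
    unknown = [level for level in levels if level not in KNOWN_LEVELS]
    if unknown:
        raise ValueError("unknown debug level(s): " + ", ".join(unknown))
    # Multiple levels are joined with commas, as in the -S row.
    return ["-debug", "-debug_level", ",".join(levels)]


# Roughly what wget -S shows: server responses plus pavuk's own requests.
print(debug_args("protos", "protoc"))
```
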
I wrote this document by looking at the output of "wget --help" and
occasionally checking the wget info documentation. The version of wget I have
is "GNU Wget 1.5.3" as distributed with RedHat 6.2.
If you find any bugs/mistakes/typos/... please tell me and I will try to
correct them.