
Site Retriever

Site Retriever was conceived and implemented by David Wallace Croft for Analytic Services, Inc. (ANSER) while he was working on the Technology for Identifying Missing Children project. He later developed it into a full web spider that would download images and pass them to a third-party face recognition engine.

This product may be downloaded and used under the terms of the license agreement below.

Function
  Downloads an entire web site and copies its content hierarchy to a local directory.

Applications
  Applications include evidence gathering, mirroring, and caching.

Known Competitors
  GetBot: a shareware GUI product for Wintel

Current Features
  Command-line version only.
  Respects the Robots Exclusion Protocol.

Future Features
  Needs a Graphical User Interface (GUI).
  Needs to be able to start from other than the specified branch root so that sibling branches can be downloaded even if the common parent cannot.
  Needs an option to convert absolute links to relative links during the download.

Known Bugs
  None.
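
As a rough illustration of the Function described above (copying a site's content hierarchy to a local directory), the following sketch maps a page URL to a file beneath a destination directory. It is not Site Retriever's code: the class name `LocalPathSketch`, the method `toLocalPath`, and the choice of `index.html` as the file name for directory URLs are all assumptions made for this example, and it uses the modern `java.nio.file` API rather than anything available in Java 2.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Site Retriever.
final class LocalPathSketch {

    // Maps pageUrl (which must lie under rootUrl) to a file under destDir,
    // mirroring the URL path below the root.
    static Path toLocalPath(URI rootUrl, URI pageUrl, Path destDir) {
        String rootPath = rootUrl.getPath();
        String pagePath = pageUrl.getPath();
        if (!pagePath.startsWith(rootPath)) {
            throw new IllegalArgumentException("not under root: " + pageUrl);
        }
        String relative = pagePath.substring(rootPath.length());
        // Drop leading slashes so the path cannot resolve as absolute.
        while (relative.startsWith("/")) {
            relative = relative.substring(1);
        }
        // Assumption: directory URLs are stored as index.html.
        if (relative.isEmpty() || relative.endsWith("/")) {
            relative += "index.html";
        }
        Path base = destDir.normalize();
        Path target = base.resolve(relative).normalize();
        // Refuse paths such as "../x" that would escape the destination.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("escapes destination: " + pageUrl);
        }
        return target;
    }

    public static void main(String[] args) {
        URI root = URI.create("http://www.anser.org/");
        Path dest = Paths.get("mirror");
        System.out.println(toLocalPath(root, URI.create("http://www.anser.org/docs/a.html"), dest));
        System.out.println(toLocalPath(root, URI.create("http://www.anser.org/docs/"), dest));
    }
}
```

The final `startsWith` check is a defensive choice: a mirroring tool writes files named by a remote server, so it seems prudent to reject any path that would land outside the destination directory.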

Version

The latest version is 1.0a1, 1999-02-05.

Requires Java 2 (1.2+).

License


Site Retriever

Version 1.0a1, 1999-02-05

Binary Code License



This binary code license ("License") contains rights and
restrictions associated with use of the accompanying

Site Retriever Version 1.0a1, 1999-02-05

software and documentation ("Software"). Read the License
carefully before using the Software. By using the Software
you agree to the terms and conditions of this License.



1.  License to Distribute. Licensee is granted a royalty-free
right to reproduce and distribute the Software provided that
Licensee: (i) distributes the Software complete and
unmodified; (ii) does not distribute additional software
intended to replace any component(s) of the Software; (iii)
does not remove or alter any proprietary legends or notices
contained in the Software; (iv) only distributes the Program
subject to a license agreement that protects ANSER's
interests consistent with the terms contained herein; and
(v) agrees to indemnify, hold harmless, and defend ANSER and
its licensors from and against any claims or lawsuits,
including attorneys' fees, that arise or result from the use
or distribution of the Program.



2.  Restrictions. (a) Software is confidential copyrighted
information of ANSER and title to all copies is retained by
ANSER and/or its licensors. Except as otherwise provided by
law for purposes of decompilation of the Software, Licensee
shall not translate, reverse engineer, disassemble,
decompile, or otherwise attempt to derive the source code of
Software. Software may not be leased, assigned, or
sublicensed, in whole or in part, except as specifically
authorized in Section 1. (b) Software is not designed or
intended, and ANSER expressly disclaims any representations or
warranties (either expressed or implied), for use (i) in
online control of aircraft, air traffic, aircraft navigation
or aircraft communications; or (ii) in the design,
construction, operation or maintenance of any nuclear
facility.



3.  Trademarks and Logos. This License does not authorize
Licensee to use any ANSER name, trademark or logo.



4.  Disclaimer of Warranty. Software is provided "AS IS,"
without a warranty of any kind. ALL EXPRESS OR IMPLIED
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED
WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE OR NON-INFRINGEMENT, ARE HEREBY EXCLUDED.



5.  Limitation of Liability. IN NO EVENT WILL ANSER OR ITS
LICENSORS BE LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR
FOR DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR
PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE
THEORY OF LIABILITY, RELATING TO THE USE, DOWNLOAD,
DISTRIBUTION OF OR INABILITY TO USE SOFTWARE, EVEN IF ANSER
HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.



6.  Termination. Licensee may terminate this License at any
time by destroying all copies of Software. This License will
terminate immediately without notice from ANSER if Licensee
fails to comply with any provision of this License. Upon
such termination, Licensee must destroy all copies of
Software.



7.  Maintenance and Support. No upgrades or support are to
be provided to Licensee under the terms of this License.



8.  Export Regulations. Software, including technical data,
is subject to U.S. export control laws, including the U.S.
Export Administration Act and its associated regulations,
and may be subject to export or import regulations in other
countries. Licensee agrees to comply strictly with all such
regulations and acknowledges that it has the responsibility
to obtain licenses to export, re-export, or import Software.
Software may not be downloaded, or otherwise exported or
re-exported (i) into, or to a national or resident of, Cuba,
Iraq, Iran, North Korea, Libya, Sudan, Syria or any country
to which the U.S. has embargoed goods; or (ii) to anyone on
the U.S. Treasury Department's list of Specially Designated
Nations or the U.S. Commerce Department's Table of Denial
Orders.



9.  Restricted Rights. Use, duplication or disclosure by the
United States government is subject to the restrictions as
set forth in the Rights in Technical Data and Computer
Software Clauses in DFARS 252.227-7013(c)(1)(ii) and FAR
52.227-19(c)(2) as applicable.



10. Governing Law. Any action related to this License will
be governed by West Virginia law and controlling U.S. federal
law. No choice of law rules of any jurisdiction will apply.



11. Severability. If any of the above provisions are held to
be in violation of applicable law, void, or unenforceable in
any jurisdiction, then such provisions are herewith waived
or amended to the extent necessary for the License to be
otherwise enforceable in such jurisdiction. However, if in
ANSER's opinion deletion or amendment of any provisions of the
License by operation of this paragraph unreasonably
compromises the rights or increases the liabilities of ANSER or
its licensors, ANSER reserves the right to terminate the
License and refund the fee paid by Licensee, if any, as
Licensee's sole and exclusive remedy.

Installation and Execution

  1. Download siteretriever.zip.
    Extract SiteRetriever.jar from the zip file.

  2. Install Java on your computer if you do not already have it.

  3. To run it from the command line, enter

         java -jar SiteRetriever.jar

    and a help message similar to the following is displayed:

    SiteRetriever Copyright 1998-1999 Analytic Services, Inc.
    Version 1.0a1, 1999-02-05
    http://nexos.anser.org:8080/java/app/siteretriever/
    David Wallace Croft, croftd@anser.org

    Downloads an entire web site and copies its content hierarchy
    to a local directory.

    Command-line Arguments:
      1.  your e-mail address;
      2.  root of web site branch to download; and
      3.  local destination directory.

    Example:
    java -jar SiteRetriever.jar croftd@anser.org http://www.anser.org/ C:\mirror\

Usage Notes

Site Retriever abides by the Robots Exclusion Protocol, so some pages may not be downloaded if the webmaster has chosen to block them from softbot access.
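
To make the effect of the Robots Exclusion Protocol concrete, here is a minimal sketch of how a robot might read a site's `robots.txt` and decide whether a path is blocked. It is not Site Retriever's implementation: the class `RobotsSketch`, its method names, and the agent name `SiteRetriever` are assumptions for illustration. It handles only `User-agent` and `Disallow` lines with simple prefix matching, and it merges rules for `*` with rules for the named agent rather than letting the more specific record take precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Site Retriever.
final class RobotsSketch {

    // Collects the Disallow prefixes from records that name "*" or the agent.
    static List<String> disallowedPrefixes(String robotsTxt, String agent) {
        List<String> prefixes = new ArrayList<>();
        String agentLower = agent.toLowerCase(Locale.ROOT);
        boolean applies = false;      // current record applies to this agent
        boolean inAgentLines = false; // still reading consecutive User-agent lines
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!inAgentLines) { // a new record starts here
                    applies = false;
                    inAgentLines = true;
                }
                if (value.equals("*") || (!value.isEmpty()
                        && agentLower.contains(value.toLowerCase(Locale.ROOT)))) {
                    applies = true;
                }
            } else {
                inAgentLines = false;
                // An empty Disallow value means nothing is blocked.
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/ # scratch\n";
        List<String> rules = disallowedPrefixes(robots, "SiteRetriever");
        System.out.println(isAllowed("/index.html", rules));
        System.out.println(isAllowed("/cgi-bin/search", rules));
    }
}
```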

Site Retriever downloads only pages that appear to be in the same branch as the root URL specified on the command line. For example, if


https://www.croftsoft.com:8080/people/david/

were specified on the command line, only pages that match the following criteria would be downloaded:
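
The exact rules Site Retriever applies are not shown here. Purely as an illustration, a same-branch test could be sketched as follows, under the assumption that a page belongs to the branch when its scheme, host, and port equal those of the root URL and its path begins with the root path. The class `BranchSketch` and the method `inBranch` are hypothetical names, and the real criteria may differ.

```java
import java.net.URI;

// Hypothetical helper; the matching rules here are an assumption,
// not Site Retriever's documented behavior.
final class BranchSketch {

    static boolean inBranch(URI root, URI candidate) {
        // Both URLs must be absolute and name a host (rules out mailto:, relative links).
        if (!root.isAbsolute() || !candidate.isAbsolute()
                || root.getHost() == null || candidate.getHost() == null) {
            return false;
        }
        return root.getScheme().equalsIgnoreCase(candidate.getScheme())
                && root.getHost().equalsIgnoreCase(candidate.getHost())
                // Ports are compared as written; a default port is not normalized.
                && root.getPort() == candidate.getPort()
                && candidate.getPath().startsWith(root.getPath());
    }

    public static void main(String[] args) {
        URI root = URI.create("https://www.croftsoft.com:8080/people/david/");
        System.out.println(inBranch(root,
                URI.create("https://www.croftsoft.com:8080/people/david/pics/a.gif")));
        System.out.println(inBranch(root,
                URI.create("https://www.croftsoft.com:8080/people/")));
    }
}
```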

Your e-mail address is provided on the command line so that webmasters may contact you if they have problems or concerns about your use of Site Retriever on their site. It is passed to the web server with each file request as part of the standard request headers.
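
HTTP defines a `From` request header for exactly this purpose, so one plausible reading is that the address travels in that header. The sketch below shows how a request could be prepared that way with `java.net.HttpURLConnection`. Whether Site Retriever uses `From` specifically is an assumption, and the class `FromHeaderSketch`, the method `prepare`, and the `User-Agent` value are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Site Retriever.
final class FromHeaderSketch {

    // Builds (but does not send) a request that identifies the operator by e-mail.
    static HttpURLConnection prepare(URI page, String email) {
        try {
            // openConnection() only creates the object; no network traffic yet.
            HttpURLConnection conn = (HttpURLConnection) page.toURL().openConnection();
            conn.setRequestProperty("From", email);                      // assumed header
            conn.setRequestProperty("User-Agent", "SiteRetrieverSketch"); // made-up name
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
                prepare(URI.create("http://www.anser.org/"), "croftd@anser.org");
        System.out.println("From: " + conn.getRequestProperty("From"));
    }
}
```

The request is sent only when the caller later reads from the connection, for example through `getInputStream()`.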

Version History

First release.