Wild Web Wanderer

Stage: development

Current plan is available as {3}

Requirements

Business requirements

Web application development requires versatile testing. Testing with predefined tests alone is often not enough: users are far more creative than any requirements analysis can anticipate and can trigger completely unexpected scenarios.

Thus, we need a tool that traverses a web application randomly or semi-randomly, logging each action. It should work in two modes:

  • follow only links and
  • follow links and fill out forms.

Such a tool should be able

  • to run from an automated testing suite (e.g. in CI),
  • to log each action in the form of an executable test,
  • to limit traversed links/forms to the predefined site (regexp?) and
  • to limit its execution by time or by the number of visited pages.

General architecture

Plain script.

Users

Single user.

Data

  • Web page
  • Link
  • Form
  • Starting URL
  • Allowed URL regexps
  • Error regexps
  • Logscript - a script that records all actions performed by RWT
  • Random seed
  • Input generators
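
Since the configuration files are Python scripts, most of the data items above could be plain module-level names in those files. The following is a minimal sketch of how such files might be merged, assuming hypothetical setting names (START_URL, ALLOWED_URL_REGEXPS, ERROR_REGEXPS, RANDOM_SEED, INPUT_GENERATORS) and a hypothetical helper load_settings; none of these names are fixed by this specification. It is written for current Python 3 rather than the Python 2.4.x listed under Environment.

```python
import os

# Hypothetical setting names; the specification does not fix them.
DEFAULTS = {
    "START_URL": None,
    "ALLOWED_URL_REGEXPS": [],
    "ERROR_REGEXPS": [],
    "RANDOM_SEED": None,
    "INPUT_GENERATORS": {},
}


def load_settings(paths, overrides=None):
    """Merge settings from Python config files; later sources win."""
    settings = dict(DEFAULTS)
    for path in paths:
        if not os.path.exists(path):
            continue  # assumed behaviour: a missing file is silently skipped
        namespace = {}
        with open(path) as handle:
            # Config files are Python scripts, so they are compiled and run.
            # A file that is not valid Python raises SyntaxError here.
            exec(compile(handle.read(), path, "exec"), namespace)
        for name, value in namespace.items():
            if name.isupper():  # only UPPER_CASE names count as settings
                settings[name] = value
    # Command line options (already parsed into a dict) are applied last.
    settings.update(overrides or {})
    return settings
```

Passing the files in the order system-wide, user, project, followed by the command line overrides, gives the later sources priority over the earlier ones.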

Functions, screens, performance

No screens.

Performance is usually limited by web application.

test link traversal

Event flow:

  1. System starts up and reads its configuration from the following sources, in this order:
    • system-wide configuration file ('/etc/rwt/config.py')
    • user configuration file ('$HOME/.rwt/config.py')
    • project configuration file ('$cwd/rwt-config.py')
    • command line options
  2. If a random seed is specified, System initializes the pseudo-random number generator from it. Otherwise, it initializes the generator from the system time.
  3. System sets starting web page as the current one.
  4. System performs the following procedure for the current web page:
    1. download the page content
    2. if the output contains text matching any error regexp, notify the user and stop the execution
    3. get all links from the page
    4. limit the set of links to those matching the allowed URL regexps
    5. select a random link and set it as the current one
    6. log the action as executable Python code
    7. repeat the procedure
  5. System continues the previous cycle until the earliest of the following happens: the time limit is reached or the limit on the number of traversed links is exceeded.
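
The event flow above can be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: the names wander and LinkCollector are hypothetical, the page download is passed in as a fetch(url) callable so that no particular HTTP library is implied, and the loop simply stops on a page without allowed links because that behaviour is not defined yet. It targets current Python 3 rather than the Python 2.4.x listed under Environment.

```python
import random
import re
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def wander(start_url, fetch, allowed, errors, seed=None,
           max_pages=10, max_seconds=60.0):
    """Randomly follow links and return the logscript lines."""
    rng = random.Random(seed)  # seed=None lets Python choose its own seed
    deadline = time.monotonic() + max_seconds
    log = []
    url = start_url
    for _ in range(max_pages):          # limit by number of visited pages
        if time.monotonic() > deadline:  # limit by time
            break
        body = fetch(url)
        log.append("fetch(%r)" % url)   # each entry is a line of Python code
        if any(re.search(pattern, body) for pattern in errors):
            log.append("# error regexp matched at %r" % url)
            break
        parser = LinkCollector()
        parser.feed(body)
        links = [urljoin(url, href) for href in parser.hrefs]
        links = [link for link in links
                 if any(re.search(pattern, link) for pattern in allowed)]
        if not links:
            break  # dead end: the desired behaviour is still undefined
        url = rng.choice(links)
    return log
```

Because all random choices go through one generator seeded at startup, running the sketch twice with the same seed against an unchanged site visits the same pages in the same order.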

Tests:

  1. System-wide configuration file doesn't exist
  2. System-wide configuration file is not a valid python script
  3. User configuration file doesn't exist
  4. User configuration file is not a valid python script
  5. Project configuration file doesn't exist
  6. Project configuration file is not a valid python script
  7. Command line option is broken (define CLI syntax!)
  8. Random seed is not a number
  9. A page is unavailable (returns anything but a 2* or 3* HTTP code)
  10. No error regexps.
  11. Invalid error regexp (e.g. not a regexp at all)
  12. No links on the page (define behaviour! e.g. return to the previous page and select another link)
  13. Time limit is not a number
  14. Traversed links limit is not a number.
  15. Success test of a simple static local website.
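
Several of the tests above depend on a command line syntax that is not defined yet. One possible shape, given purely as a starting point for discussion, uses the standard argparse module. All option names (--seed, --time-limit, --max-pages, --allow, --error) and the helper names regexp and build_parser are assumptions, and the sketch targets current Python 3 rather than Python 2.4.x.

```python
import argparse
import re


def regexp(text):
    """argparse type: compile a regexp or report a usage error."""
    try:
        return re.compile(text)
    except re.error as exc:
        raise argparse.ArgumentTypeError(
            "invalid regexp %r: %s" % (text, exc))


def build_parser():
    """Hypothetical CLI; the option names are placeholders."""
    parser = argparse.ArgumentParser(prog="rwt")
    parser.add_argument("start_url")
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--time-limit", type=float, default=None,
                        help="stop after this many seconds")
    parser.add_argument("--max-pages", type=int, default=None,
                        help="stop after this many visited pages")
    parser.add_argument("--allow", type=regexp, action="append",
                        default=[], metavar="REGEXP")
    parser.add_argument("--error", type=regexp, action="append",
                        default=[], metavar="REGEXP")
    return parser
```

With this approach a non-numeric seed or limit, or a string that is not a regexp, makes argparse print a usage message and exit with status 2 before any page is downloaded.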

Tickets: #1

test link traversal and form posts

Event flow:

  1. System starts up and reads its configuration from the following sources, in this order:
    • system-wide configuration file ('/etc/rwt/config.py')
    • user configuration file ('$HOME/.rwt/config.py')
    • project configuration file ('$cwd/rwt-config.py')
    • command line options
  2. If a random seed is specified, System initializes the pseudo-random number generator from it. Otherwise, it initializes the generator from the system time.
  3. System sets starting web page as the current one.
  4. System performs the following procedure for the current web page:
    1. if the current item is a link, download the page content; otherwise (a form), download the HTTP POST result
    2. if the output contains text matching any error regexp, notify the user and stop the execution
    3. get all links and all forms from the page
    4. limit the set of links and form actions to those matching the allowed URL regexps
    5. select a random link or form and set it as the current one
    6. if the current item is a form post, generate random input for it using the configured generators
    7. log the action as executable Python code
    8. repeat the procedure
  5. System continues the previous cycle until the earliest of the following happens: the time limit is reached or the limit on the number of traversed links is exceeded.
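
The form-specific steps (finding forms and generating random input for them) might look like the sketch below. It is only an illustration: FormCollector, GENERATORS and fill_form are hypothetical names, only input and textarea fields are handled, and a field type without a generator simply raises KeyError, which is one possible answer to the "generator is not available" test case. It targets current Python 3 rather than Python 2.4.x.

```python
import random
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect forms as dicts with "action", "method" and "fields"."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],  # list of (name, type) pairs
            }
            self.forms.append(self._current)
        elif (tag in ("input", "textarea") and self._current is not None
                and attrs.get("name")):
            if tag == "input":
                kind = (attrs.get("type") or "text").lower()
            else:
                kind = "textarea"
            self._current["fields"].append((attrs["name"], kind))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical generators keyed by field type; each receives the seeded RNG.
GENERATORS = {
    "text": lambda rng: "".join(rng.choice("abcxyz ") for _ in range(8)),
    "textarea": lambda rng: "\n".join(
        "line %d" % rng.randint(0, 99) for _ in range(3)),
    "checkbox": lambda rng: rng.choice(["on", None]),
}


def fill_form(form, rng, generators=GENERATORS):
    """Return {field name: value}; an unknown field type raises KeyError."""
    data = {}
    for name, kind in form["fields"]:
        value = generators[kind](rng)
        if value is not None:  # None means "omit", e.g. an unchecked box
            data[name] = value
    return data
```

Passing the same seeded generator that drives link selection keeps the generated form input reproducible from the random seed.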

Tests:

  1. System-wide configuration file doesn't exist
  2. System-wide configuration file is not a valid python script
  3. User configuration file doesn't exist
  4. User configuration file is not a valid python script
  5. Project configuration file doesn't exist
  6. Project configuration file is not a valid python script
  7. Command line option is broken (define CLI syntax!)
  8. Random seed is not a number
  9. A page (through a link) is unavailable (returns anything but a 2* or 3* HTTP code)
  10. A page (through an HTTP POST) is unavailable
  11. No error regexps.
  12. Invalid error regexp (e.g. not a regexp at all)
  13. No links and no forms on the page (define behaviour! e.g. return to the previous page and select another link)
  14. No links on the page.
  15. No forms on the page.
  16. Generator for a form field is not available.
  17. Generator for a form field throws an exception.
  18. Time limit is not a number
  19. Traversed links limit is not a number.
  20. Success test of a simple local website.
  21. Success test of text generator
  22. Success test of radiobutton generator
  23. Success test of select generator
  24. Success test of checkbox generator
  25. Success test of textarea generator
  26. Success test of date generator

Tickets: #2

Environment

Other hardware and software integration

  • colo.kds.com.ua VE
  • ALT Linux Sisyphus 4.0

Development language and style

  • Python 2.4.x
  • PEP 8
  • 100% test coverage with nose
  • nose unit tests
  • pylint score 8 or better
  • all generated html should be valid xhtml and valid css

Licensing and license compatibility

GPL v2 or later

Risk profile

The identified project risks are listed below.

  • Actual development velocity lower than initially estimated
  • Non-development delays
  • Software defects in the developed Product
  • Software defects in 3rd party products

Things in development

None yet

Future development

None yet

Work progress

2007.08.01

Revived from a dead Friday project

-- akhavr