Improving Software Quality

Since 1968 end users have come to depend more on software, and their expectations for product quality have risen dramatically. Moreover, the pace of development has accelerated in the new millennium, thanks to the Internet, competition and the tools developers use. It is easier to write Java code and port it than it ever was to write C code and port it. The crop of rapid prototyping languages—scripting languages—like Python, Perl and Ruby makes it easy to build Web sites quickly. Databases have become commodities and don’t need to be reinvented each time.

“QA is still a challenge, still generally left to the end, and the staff is treated as second-class citizens,” said Ed Hirgelt, manager of services development for Quest Software. However, because of the speed of development and time-to-market requirements, QA is becoming more visible. Test-driven development moves testing to earlier in the life cycle. Tools like JUnit and Ant make it easier to run tests as part of the nightly build process. The concept of a continuous build is helping produce reliable software.

Hirgelt characterizes a continuous build process as one in which a build is initiated when a developer commits code back to the source repository. The product is built and tests run automatically. Problems are caught sooner rather than later.

QA also has been changing as the result of such factors as the wind-down following Y2K and the subsequent business decline. As software companies faced hard times, one solution was to improve efficiency by hiring the most skilled testers available and automating as much testing as possible, according to Elfriede Dustin, internal SQA consultant for global security services at Symantec.

The loss of jobs following the dot-com implosion meant software companies went from having to hire practically anyone with a pulse in the late 1990s to the luxury of choosing from only the most highly qualified candidates. That change has affected who is being hired for QA positions. In some large companies, “coding skills are what you are judged and hired by, with testing skills coming in a distant second,” said Duri Price of Exceed Training, who has worked in the software QA field since 1992. Jeff Feldstein, who manages a team of 35 test engineers at Cisco Systems, concurred. He hires software engineers exclusively, and then sells them on test engineering.

“Testers need to be involved from the beginning of the development life cycle,” said Symantec’s Dustin. More important, however, is that so much depends on the developers’ skills. The most efficient and knowledgeable testers cannot succeed if developers implement bad software or if ineffective software development life cycles and processes are in place. If testing is the only quality phase implemented as part of the QA process, it can at most be considered a Band-Aid, often applied too late in the development life cycle to make much of a quality difference. Testing is only one piece of the quality puzzle.
Agile Methods

The growing influence of agile processes has had direct and indirect consequences on quality. With the advent of Extreme Programming (XP) and the agile movement, testing has become more of a developer activity, said Dustin. Agile methodologies have provided a model for moving testing forward and putting more of the responsibility in the hands of developers.

“With the onset of agile methodologies comes the concept of test-driven development, which was introduced with Extreme Programming,” said Bob Galen, a senior QA manager at Thomson-Dialog. The principle is to design tests before designing code, usually using a unit-testing framework such as JUnit or xUnit to help support the practice. Test-driven development has gone beyond XP to become a mainstream development practice, he noted. “It has also sensitized a new generation of software developers on the importance of and skills required to properly test software.”
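The same test-first rhythm can be sketched in any xUnit-style framework. The fragment below is a minimal illustration using Python's built-in unittest; the discount() function and its clamping rule are hypothetical, invented for the example, and in true test-driven development the test class would be written, and would fail, before discount() exists.

```python
import unittest

# Hypothetical function under test; in TDD it is written only after the tests below fail.
def discount(price: float, rate: float) -> float:
    """Apply a percentage discount, clamping the rate to the 0..100 range."""
    rate = max(0.0, min(rate, 100.0))
    return round(price * (1 - rate / 100), 2)

class DiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(discount(200.0, 25), 150.0)

    def test_rate_is_clamped(self):
        self.assertEqual(discount(100.0, 150), 0.0)    # rates above 100% clamp to 100%
        self.assertEqual(discount(100.0, -10), 100.0)  # negative rates clamp to 0%

if __name__ == "__main__":
    unittest.main()
```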

Testers are getting better code and can’t simply repeat the same tests as the development team. Galen said they must find other value areas, such as getting involved at the front end with requirement definition and acceptance test development, working in parallel with the development teams, and at the back end working with the customer to run the acceptance testing and doing performance and load testing or usability testing.


Not everyone is convinced that agile processes are the answer. Dustin said she doubts that Extreme Programming can work in a development effort larger than 10 or so developers. Feldstein’s group hasn’t embraced agile methodologies (though he acknowledged that they have been used successfully elsewhere in Cisco) because he doesn’t see them as a way to get high-quality software, but as a way to get software fast. “It’s not clear that agile puts quality at the forefront,” he said. “Getting it out quickly isn’t a priority for us. Getting it right is.”

At the Cisco facility where Feldstein works, testers become involved during the development of the product requirements document. “Marketing, development and test are all equal in the team, and they all get involved early on,” he explained. The whole team owns the quality, and the whole team decides when to ship. The process is requirements-driven, he said, and they don’t need test-driven development. He also noted that the processes are constantly being refined.

“Once developers have committed to a schedule, which occurs when the functionality spec is complete,” he said, “they tell you what the API looks like. We can start coding simultaneously. We do unit testing on the test software.” When a stand-alone component is complete, it’s handed off for functional and performance testing. A stand-alone component is complete after developers have done unit testing, and integration testing between components, and have established baseline performance.
Automation

Another agile community influence has been to drive testers toward automation, according to Thomson-Dialog’s Galen. “The notion of complete automated unit-testing capabilities is carrying over into an expectation of general automated testing. Why have automated unit testing and the increased flexibility and safety of executing them, when you have to manually test the application from a QA point of view? It simply doesn’t make sense.” He said that there is pressure for testers to create high degrees of automation leveraging agile development practices.

The tools are evolving from capture/playback methods toward alternative methods of driving tests that are longer lived and require less maintenance, said Galen. Concepts like “keyword driven,” “model driven,” “coverage driven” and “database driven” are coming into play.

The advantage of manual testing is that it’s not done the same way every time. In classic automation, you execute the same set of test cases every time. Model-based testing, a more recent development, is a form of automated testing that adds randomness by inserting random behavior into the test automation software. You run through tests in a different order so that you exercise different areas of the code.
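A minimal sketch of that randomized-ordering idea follows; the three check_* steps are hypothetical placeholders, and logging the seed is what lets a failing order be replayed.

```python
import random

# Hypothetical, independent test steps; each exercises a different area of the code.
def check_login(): pass
def check_search(): pass
def check_checkout(): pass

STEPS = [check_login, check_search, check_checkout]

def run_randomized(seed: int, iterations: int = 10) -> None:
    """Run the steps in a different order on every iteration; the seed makes a failure reproducible."""
    rng = random.Random(seed)
    for i in range(iterations):
        order = STEPS[:]
        rng.shuffle(order)
        print(f"iteration {i}, seed {seed}: " + " -> ".join(step.__name__ for step in order))
        for step in order:
            step()

run_randomized(seed=42)
```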

Automation is not altogether a panacea. Many companies are spending lots of time and effort on automation that doesn’t return the investment, said Exceed Training’s Price. “I’ve seen huge batches of automated tests that merely informed us that what we thought should work, did work. They verified. But they didn’t usually do a great job at testing.”

Price said that a competent tester with or without coding skills could often break the system manually. Coding skills aren’t the only skills needed, or even the most important ones, he insisted. The most important thing a tester needs to know how to do is figure out what to test, why to test it and how to test it. The implementation of the test will vary, but first you have to figure out your target. That skill set is getting less and less attention.

Exploratory Testing

Test automation, by executing a large number of planned tests, frees up resources to do more free-form exploratory testing, according to Price. Exploratory, or context-based, testing is a movement spearheaded by James Bach and articulated in the book he co-authored with Cem Kaner and Bret Pettichord, “Lessons Learned in Software Testing” (Wiley, 2001). Dion Johnson, an independent consultant who focuses on QA, QC, requirements analysis and process improvement, said that exploratory testing supports less initial test planning and documentation and more execution of spur-of-the-moment testing concepts based on a tester’s intuition about what is happening with the application in real time.

Galen characterized it as testing within the context presented for a given effort or project. For example, in a schedule-driven context, management might give a team two days in which to test a release of the product. That’s the context. If the team is operating in a plan and scripted model, it picks which test cases to run that will fit within the two days. The team might try some prioritization, but it stays within the bounds of plan and then test.

Context-based and exploratory testing leverage the background, skills and experience of the testers in order to make better decisions under real conditions, rather than trying vainly to create a never-changing set of tests that anyone can run.

In an exploratory model, the team might first look at the intent of the release. If it is a beta test for a mortgage broker customer, the team might choose to test customer-centric and known problem areas in the allotted two days, using an intimate knowledge of the customer and of the product, to define the focus of testing. The team would spend little time planning but much more time reacting to the specific need.
Security

The Internet explosion has led to a new focus on security, usability and performance testing, said Johnson. Security testing offers a new set of challenges to testers since security flaws don’t necessarily affect how a system works from an application perspective. Functional testing will typically ferret out functional problems, but security vulnerabilities have nothing to do with application functionality.

Roger Thornton, founder and CTO of Fortify Software, said that application expertise doesn’t translate into security expertise. Security expertise comes from the operations side of the company. Even recognizing that a security flaw exists can be difficult. There is a tendency to blame hackers rather than accept the fact that the software is vulnerable and needs to be fixed. Security flaws may be features of the software. “The hacker’s best friend used to be the sys admin. Now it’s the programmer,” he said, citing SQL injection as a way that theoretically inaccessible information can be retrieved.

Security testing involves testing for conditions that could lead to security breaches, and that means you have to know where to look. “Security bugs are hard to tease out,” said John Viega, co-author with Gary McGraw of “Building Secure Software: How to Avoid Security Problems the Right Way” (Addison-Wesley Professional, 2001). “You need to find vulnerabilities early.” He advocates using a static approach and examining source code to find potential vulnerabilities, such as buffer overflows. But he said that right now static testing suffers from too many false positives.
Performance Testing

Usability design and testing also have gained in importance as an approach for ensuring not only that applications meet customer expectations, but also that customers can navigate through them. And performance testing is important to ensure that the system can support the expected load on the system, given the potentially high traffic on Internet applications. Feldstein runs a performance test bed in parallel with functional testing, and he monitors resources all the time.

When performance testing specialist Scott Barber, chief technology officer at PerfTestPlus, started doing performance testing five years ago, it was seen as a minor add-on service after the completion of functional testing to validate the maximum supported load on the way to production. Today, performance testing is typically not considered an optional add-on service tacked on at the end, although Barber said it is still five years behind functional testing in terms of industry acceptance and maturity. While performance testing is considered early in the process and is thought to be important, it is still generally outside the overall development process and doesn’t start until a beta release.
Open-Source Tools

Not only have processes and methods evolved, but the tools landscape has changed rapidly over the past few years as well, particularly with the availability of open-source tools. “Open source is invading the test space as aggressively as it is within mainstream development,” said Galen, “and we need to be adapting towards the evolution.”

Open-source tools range from development tools to new scripting languages to competitive frameworks to off-the-shelf automation development tools. Besides the xUnit tools, there are tools for acceptance testing, such as FitNesse. Scripting languages such as Python, Jython and Ruby are gaining ground on Perl. “As a tester, it’s not good enough any longer to know a little Perl,” he said. “You must understand a variety of scripting languages to leverage in your day-to-day tasks.”

Testing frameworks that are becoming available in open source rival the traditional automation tool vendor offerings in breadth and capabilities. Barber said that vendors of commercial test tools are “about to be in for a shock, as they find that the new tools are significantly cheaper, and as they learn that the open-source tools are competitive.”
A Glance Forward

Beyond the impact of open-source tools, the test-tool market is on the verge of a major overhaul, according to Barber. Testing tools are being integrated into development environments, a confirmation that the industry is beginning to acknowledge that developers need to be the first to test and that testers need to work hand-in-hand with developers.

Parallel Testing

    Usage:

  • To ensure that the processing of the new application version is consistent with the processing of the previous application version.
    Objective:

  • Conduct redundant processing to ensure that the new version of the application performs correctly.
  • Demonstrate consistency or expose inconsistency between two versions of the application.
    How to Use

  • Run the same input data through two versions of the same application system.
  • Parallel testing can be done on the whole system or on a part (segment) of the system.
    When to Use

  • When there is uncertainty about the correctness of processing in the new application and the new and old versions are similar.
  • In financial applications such as banking, where there are many similar applications, the processing of the old and new versions can be verified through parallel testing.
    Example

  • Operating the new and old versions of a payroll system to determine that the paychecks from both systems are reconcilable.
  • Running the old version of the application to confirm that functions showing problems in the new system worked correctly in the old version.
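A minimal sketch of the reconciliation step follows. The two compute_pay functions are hypothetical stand-ins for the old and new payroll logic; in this contrived example the overtime rule is a deliberate difference, whereas in real parallel testing any unexplained mismatch is what gets investigated.

```python
# Feed the same inputs to both versions and report every disagreement.
def compute_pay_v1(hours: float, rate: float) -> float:          # old version
    return round(hours * rate, 2)

def compute_pay_v2(hours: float, rate: float) -> float:          # new version: 1.5x overtime
    overtime = max(0.0, hours - 40.0)
    return round((hours - overtime) * rate + overtime * rate * 1.5, 2)

def reconcile(inputs):
    """Return every input on which the two versions disagree."""
    mismatches = []
    for hours, rate in inputs:
        old, new = compute_pay_v1(hours, rate), compute_pay_v2(hours, rate)
        if old != new:
            mismatches.append((hours, rate, old, new))
    return mismatches

print(reconcile([(38, 20.0), (45, 20.0)]))   # only the 45-hour paycheck differs
```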

Manual Testing

    Usage:

  • It involves testing all the functions performed by people while preparing data for, and using data from, the automated system.
    Objective:

  • Verify that manual support documents and procedures are correct.
  • Determine that manual support responsibility has been assigned correctly.
  • Determine that manual support people are adequately trained.
  • Determine that the manual support and automated segments are properly interfaced.
    How to Use

  • The process is evaluated in all segments of the SDLC.
  • Execution of the test can be done in conjunction with normal system testing.
  • Instead of preparing, executing and entering actual test transactions, the clerical and supervisory personnel can use the results of processing from the application system.
  • Testing the people requires testing the interface between the people and the application system.
    When to Use

  • Verification that manual systems function properly should be conducted throughout the SDLC.
  • It should not be left to the later stages of the SDLC.
  • It is best done at the installation stage so that the clerical people do not get used to the actual system just before the system goes into production.
    Example

  • Provide input personnel with the type of information they would normally receive from their customers and then have them transcribe that information and enter it in the computer.
  • Users can be provided a series of test conditions and then asked to respond to those conditions. Conducted in this manner, manual support testing is like an examination in which the users are asked to obtain the answer from the procedures and manuals available to them.

Error Handling Testing

    Usage:

  • It determines the ability of the application system to process incorrect transactions properly.
  • Errors encompass all unexpected conditions.

  • In some systems, approximately 50% of the programming effort is devoted to handling error conditions.

    Objective:

  • Determine that the application system recognizes all expected error conditions.
  • Determine that accountability for processing errors has been assigned and that procedures provide a high probability that errors will be properly corrected.
  • Determine that reasonable control is maintained over errors during the correction process.
    How to Use

  • A group of knowledgeable people is required to anticipate what can go wrong in the application system.
  • The people knowledgeable about the application should assemble to integrate their knowledge of the user area, auditing and error tracking.
  • Logical test error conditions should then be created based on this pooled information.
    When to Use

  • Throughout the SDLC.
  • The impact of errors should be identified and corrected to reduce errors to an acceptable level.
  • It is used to assist in the error-management process of system development and maintenance.
    Example

  • Create a set of erroneous transactions and enter them into the application system, then find out whether the system is able to identify the problems.
  • Using iterative testing, enter transactions and trap errors. Correct them. Then enter transactions with errors that were not present in the system earlier.
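A small sketch of the first example follows. The validate_transaction() routine and its rules are invented for illustration; the point is that every deliberately erroneous transaction must be rejected with an identifiable reason rather than processed silently.

```python
def validate_transaction(txn: dict):
    """Hypothetical validator: returns (accepted, reason)."""
    if "account" not in txn:
        return False, "missing account number"
    if txn.get("amount", 0) <= 0:
        return False, "amount must be positive"
    return True, "ok"

erroneous = [
    {"amount": 100},                      # no account number
    {"account": "A-17", "amount": -50},   # negative amount
    {"account": "A-17", "amount": 0},     # zero amount
]

for txn in erroneous:
    accepted, reason = validate_transaction(txn)
    assert not accepted, f"erroneous transaction was accepted: {txn}"
    print(f"rejected {txn}: {reason}")
```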

Regression Testing

    Usage:

  • To verify that all aspects of the system remain functional after changes have been made.
  • To verify that a change in one segment does not change the functionality of other segments.

    Objective:

  • Determine that system documents remain current.
  • Determine that system test data and test conditions remain current.
  • Determine that previously tested system functions perform properly even though changes have been made in other segments of the application system.
    How to Use

  • Test cases that were previously used for the already tested segment are re-run to ensure that the results of the segment tested now match the results of the same segment tested earlier.
  • Test automation is needed to carry out the test transactions (test condition execution); otherwise the process is very time-consuming and tedious.
  • The cost/benefit of this testing should be carefully evaluated; otherwise the effort spent on testing will be high and the payback minimal.
    When to Use

  • When there is a high risk that new changes may affect unchanged areas of the application system.
  • In the development process: regression testing should be carried out after the pre-determined changes are incorporated into the application system.
  • In the maintenance phase: regression testing should be carried out if there is a high risk that loss may occur when changes are made to the system.
    Example

  • Re-running previously conducted tests to ensure that the unchanged portions of the system function properly.
  • Reviewing previously prepared system documents (manuals) to ensure that they are not affected by changes made to the application system.
    Disadvantage

  • Time-consuming and tedious if test automation is not done.
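A minimal sketch of the automated re-run follows. tax_due() is a hypothetical function under test, and the baseline pairs stand in for results captured from an earlier, accepted run (normally loaded from a file rather than hard-coded).

```python
def tax_due(income: float) -> float:
    """Hypothetical function under test in the current build."""
    return round(income * 0.2, 2)

baseline = [                # (input, expected result) from a previously accepted run
    (10_000.0, 2_000.0),
    (55_500.0, 11_100.0),
    (0.0, 0.0),
]

failures = [(inp, exp, tax_due(inp)) for inp, exp in baseline if tax_due(inp) != exp]
for inp, exp, got in failures:
    print(f"REGRESSION: tax_due({inp}) = {got}, expected {exp}")
if not failures:
    print("all regression checks passed")
```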

Requirements Testing

    Usage:

  • To ensure that the system performs correctly.
  • To ensure that correctness can be sustained for a considerable period of time.

  • The system can be tested for correctness throughout all phases of the SDLC, but in the case of reliability the programs must be in place so that the system is operational.
    Objective:

  • Successful implementation of user requirements.
  • Correctness maintained over a considerable period of time.
  • Processing of the application complies with the organization’s policies and procedures.
    Secondary users’ needs are fulfilled:
  • Security officer
  • DBA
  • Internal auditors
  • Record retention
  • Comptroller
    How to Use

    Test conditions are created.
  • These test conditions are generalized ones that become test cases as the SDLC progresses, until the system is fully operational.
  • Test conditions are more effective when created from the user’s requirements.
  • If test conditions are created from the documents, then any errors in those documents will be incorporated into the test conditions, and testing will not be able to find those errors.
  • If test conditions are created from sources other than the documents, error trapping is more effective.
  • A functional checklist is created.
    When to Use

  • Every application should be requirements tested.
  • Testing should start at the requirements phase and progress through to the operations and maintenance phase.
  • The method used to carry out requirements testing, and the extent to which it is done, are important.
    Example

  • Creating a test matrix to prove that the system requirements as documented are the requirements desired by the user.
  • Creating a checklist to verify that the application complies with the organization’s policies and procedures.

Unit Testing

In computer programming, a unit test is a method of testing the correctness of a particular module of source code.

The idea is to write test cases for every non-trivial function or method in the module so that each test case is separate from the others if possible. This type of testing is mostly done by the developers.

Benefits

The goal of unit testing is to isolate each part of the program and show that the individual parts are correct. It provides a written contract that the piece must satisfy. This isolated testing provides four main benefits:

Encourages change

Unit testing allows the programmer to refactor code at a later date, and make sure the module still works correctly (regression testing). This provides the benefit of encouraging programmers to make changes to the code since it is easy for the programmer to check if the piece is still working properly.

Simplifies Integration

Unit testing helps eliminate uncertainty in the pieces themselves and can be used in a bottom-up testing style approach. Testing the parts of a program first and then testing the sum of its parts makes integration testing easier.

Documents the code

Unit testing provides a sort of "living document" for the class being tested. Clients looking to learn how to use the class can look at the unit tests to determine how to use the class to fit their needs.

Separation of Interface from Implementation

Because some classes may have references to other classes, testing a class can frequently spill over into testing another class. A common example of this is classes that depend on a database; in order to test the class, the tester finds herself writing code that interacts with the database. This is a mistake, because a unit test should never go outside of its own class boundary. As a result, the software developer abstracts an interface around the database connection, and then implements that interface with their own Mock Object. This results in loosely coupled code, thus minimizing dependencies in the system.
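A short sketch of that separation follows; UserRepository, newsletter_recipients and MockRepository are illustrative names, not part of any particular framework, and the mock keeps the unit test inside the class boundary by standing in for the database.

```python
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """Abstracted interface around the database connection."""
    @abstractmethod
    def fetch_email(self, user_id: int): ...

def newsletter_recipients(repo: UserRepository, user_ids):
    """Code under test: collects the e-mail addresses that actually exist."""
    return [email for uid in user_ids if (email := repo.fetch_email(uid)) is not None]

class MockRepository(UserRepository):
    """Test double: answers from an in-memory dict instead of touching a database."""
    def __init__(self, data):
        self.data = data
    def fetch_email(self, user_id):
        return self.data.get(user_id)

repo = MockRepository({1: "a@example.com", 3: "c@example.com"})
assert newsletter_recipients(repo, [1, 2, 3]) == ["a@example.com", "c@example.com"]
```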

Limitations

It is important to realize that unit-testing will not catch every error in the program. By definition, it only tests the functionality of the units themselves. Therefore, it will not catch integration errors, performance problems and any other system-wide issues. In addition, it may not be trivial to anticipate all special cases of input the program unit under study may receive in reality. Unit testing is only effective if it is used in conjunction with other software testing activities.


White box testing

White box testing is a test case design method that uses the control structure of the procedural design to derive test cases. Test cases can be derived that


1. guarantee that all independent paths within a module have been exercised at least once,
2. exercise all logical decisions on their true and false sides,
3. execute all loops at their boundaries and within their operational bounds, and
4. exercise internal data structures to ensure their validity.

The Nature of Software Defects

Logic errors and incorrect assumptions are inversely proportional to the probability that a program path will be executed. General processing tends to be well understood while special case processing tends to be prone to errors.


We often believe that a logical path is not likely to be executed when it may be executed on a regular basis. Our unconscious assumptions about control flow and data lead to design errors that can only be detected by path testing.

Typographical errors are random.

Basis Path Testing

This method enables the designer to derive a logical complexity measure of a procedural design and use it as a guide for defining a basis set of execution paths. Test cases that exercise the basis set are guaranteed to execute every statement in the program at least once during testing.


Flow Graphs

Flow graphs can be used to represent control flow in a program and can help in the derivation of the basis set. Each flow graph node represents one or more procedural statements. The edges between nodes represent flow of control. An edge must terminate at a node, even if the node does not represent any useful procedural statements. A region in a flow graph is an area bounded by edges and nodes. Each node that contains a condition is called a predicate node. Cyclomatic complexity is a metric that provides a quantitative measure of the logical complexity of a program. It defines the number of independent paths in the basis set and thus provides an upper bound for the number of tests that must be performed.



The Basis Set

An independent path is any path through a program that introduces at least one new set of processing statements (must move along at least one new edge in the path). The basis set is not unique. Any number of different basis sets can be derived for a given procedural design. Cyclomatic complexity, V(G), for a flow graph G is equal to

1. The number of regions in the flow graph.
2. V(G) = E - N + 2 where E is the number of edges and N is the number of nodes.
3. V(G) = P + 1 where P is the number of predicate nodes.
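As a worked illustration (the function and the node/edge counts below reflect only one reasonable way to draw the graph), the three formulas agree on a small two-decision routine:

```python
def classify(x: int) -> str:
    if x < 0:          # predicate node 1
        return "negative"
    if x == 0:         # predicate node 2
        return "zero"
    return "positive"

# Drawing the flow graph with a single exit node gives N = 6 nodes and E = 7 edges:
#   V(G) = E - N + 2 = 7 - 6 + 2 = 3
#   V(G) = P + 1     = 2 + 1     = 3   (two predicate nodes)
#   and the graph has 3 regions.
# Three linearly independent paths therefore form a basis set, one input per path:
assert [classify(x) for x in (-1, 0, 5)] == ["negative", "zero", "positive"]
```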

Deriving Test Cases
1. From the design or source code, derive a flow graph.
2. Determine the cyclomatic complexity of this flow graph. Even without a flow graph, V(G) can be determined by counting the number of conditional statements in the code and adding 1.
3. Determine a basis set of linearly independent paths. Predicate nodes are useful for determining the necessary paths.
4. Prepare test cases that will force execution of each path in the basis set. Each test case is executed and compared to the expected results.

Automating Basis Set Derivation
The derivation of the flow graph and the set of basis paths is amenable to automation. A software tool to do this can be developed using a data structure called a graph matrix. A graph matrix is a square matrix whose size is equivalent to the number of nodes in the flow graph. Each row and column correspond to a particular node and the matrix corresponds to the connections (edges) between nodes. By adding a link weight to each matrix entry, more information about the control flow can be captured. In its simplest form, the link weight is 1 if an edge exists and 0 if it does not. But other types of link weights can be represented:

• the probability that an edge will be executed,
• the processing time expended during link traversal,
• the memory required during link traversal, or
• the resources required during link traversal.

Graph theory algorithms can be applied to these graph matrices to help in the analysis necessary to produce the basis set.
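A minimal sketch of such a matrix follows; the five-node flow graph is invented for illustration, and the final line uses the standard shortcut that each row's entries minus one, summed and incremented by one, reproduces the cyclomatic complexity.

```python
nodes = ["entry", "decision", "then", "else", "exit"]           # illustrative flow graph
edges = {("entry", "decision"), ("decision", "then"), ("decision", "else"),
         ("then", "exit"), ("else", "exit")}

# Square graph matrix: link weight 1 where an edge exists, 0 otherwise.
matrix = [[1 if (src, dst) in edges else 0 for dst in nodes] for src in nodes]

complexity = 1 + sum(max(0, sum(row) - 1) for row in matrix)
print(complexity)   # 2: one predicate node ("decision") with two outgoing edges
```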

Loop Testing

This white box technique focuses exclusively on the validity of loop constructs. Four different classes of loops can be defined:

1. simple loops,
2. nested loops,
3. concatenated loops, and
4. unstructured loops.

Simple Loops

The following tests should be applied to simple loops where n is the maximum number of allowable passes through the loop:

1. skip the loop entirely,
2. only pass once through the loop,
3. m passes through the loop where m < n,
4. n - 1, n, n + 1 passes through the loop.
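The sketch below walks a hypothetical loop (a running total capped at n = 5 values) through exactly those pass counts; the function and the cap are invented for the example.

```python
def capped_total(values, n=5):
    """Loop under test: sums at most the first n values."""
    total = 0.0
    for v in values[:n]:
        total += v
    return total

n = 5
for passes in (0, 1, 2, n - 1, n, n + 1):        # skip, once, m < n, n-1, n, n+1
    data = [1.0] * passes
    expected = float(min(passes, n))             # passes beyond n must be ignored
    assert capped_total(data, n) == expected, f"failed at {passes} passes"
print("simple-loop boundary checks passed")
```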

Nested Loops

The testing of nested loops cannot simply extend the technique of simple loops since this would result in a geometrically increasing number of test cases. One approach for nested loops:

1. Start at the innermost loop. Set all other loops to minimum values.
2. Conduct simple loop tests for the innermost loop while holding the outer loops at their minimums. Add tests for out-of-range or excluded values.
3. Work outward, conducting tests for the next loop while keeping all other outer loops at minimums and other nested loops to typical values.
4. Continue until all loops have been tested.

Concatenated Loops

Concatenated loops can be tested as simple loops if each loop is independent of the others. If they are not independent (e.g. the loop counter for one is the loop counter for the other), then the nested approach can be used.

Unstructured Loops

This type of loop should be redesigned, not tested!

Other White Box Techniques

Other white box testing techniques include:

1. Condition testing, which exercises the logical conditions in a program.
2. Data flow testing, which selects test paths according to the locations of definitions and uses of variables in the program.

Black box testing

Black box testing attempts to derive sets of inputs that will fully exercise all the functional requirements of a system. It is not an alternative to white box testing. This type of testing attempts to find errors in the following categories:

1. incorrect or missing functions,
2. interface errors,
3. errors in data structures or external database access,
4. performance errors, and
5. initialization and termination errors.

Tests are designed to answer the following questions:

1. How is the function's validity tested?
2. What classes of input will make good test cases?
3. Is the system particularly sensitive to certain input values?
4. How are the boundaries of a data class isolated?
5. What data rates and data volume can the system tolerate?
6. What effect will specific combinations of data have on system operation?

White box testing should be performed early in the testing process, while black box testing tends to be applied during later stages. Test cases should be derived which

1. reduce the number of additional test cases that must be designed to achieve reasonable testing, and
2. tell us something about the presence or absence of classes of errors, rather than an error associated only with the specific test at hand.

Equivalence Partitioning

This method divides the input domain of a program into classes of data from which test cases can be derived. Equivalence partitioning strives to define a test case that uncovers classes of errors and thereby reduces the number of test cases needed. It is based on an evaluation of equivalence classes for an input condition. An equivalence class represents a set of valid or invalid states for input conditions.

Equivalence classes may be defined according to the following guidelines:

1. If an input condition specifies a range, one valid and two invalid equivalence classes are defined.
2. If an input condition requires a specific value, then one valid and two invalid equivalence classes are defined.
3. If an input condition specifies a member of a set, then one valid and one invalid equivalence class are defined.
4. If an input condition is boolean, then one valid and one invalid equivalence class are defined.
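As a small illustration of guideline 1 (the accept_age rule and its 18..65 range are invented for the example), a range input yields one valid and two invalid classes, and one representative is drawn from each:

```python
def accept_age(age: int) -> bool:
    return 18 <= age <= 65          # hypothetical input condition: a range

partitions = {
    "valid: 18..65": (40, True),    # one representative of the valid class
    "invalid: < 18": (10, False),
    "invalid: > 65": (80, False),
}

for name, (value, expected) in partitions.items():
    assert accept_age(value) == expected, f"{name} failed for {value}"
print("one representative per equivalence class checked")
```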

Boundary Value Analysis

This method leads to a selection of test cases that exercise boundary values. It complements equivalence partitioning since it selects test cases at the edges of a class. Rather than focusing on input conditions solely, BVA derives test cases from the output domain also. BVA guidelines include:

1. For input ranges bounded by a and b, test cases should include values a and b and just above and just below a and b respectively.
2. If an input condition specifies a number of values, test cases should be developed to exercise the minimum and maximum numbers and values just above and below these limits.
3. Apply guidelines 1 and 2 to the output.
4. If internal data structures have prescribed boundaries, a test case should be designed to exercise the data structure at its boundary.
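Continuing the same invented 18..65 range, guideline 1 adds the boundary values themselves plus the values just outside them:

```python
def accept_age(age: int) -> bool:
    return 18 <= age <= 65

a, b = 18, 65
bva_cases = [(a - 1, False), (a, True), (a + 1, True),
             (b - 1, True), (b, True), (b + 1, False)]

for value, expected in bva_cases:
    assert accept_age(value) == expected, f"boundary case {value} failed"
print("boundary values at and around both edges checked")
```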

Cause-Effect Graphing Techniques

Cause-effect graphing is a technique that provides a concise representation of logical conditions and corresponding actions. There are four steps:

1. Causes (input conditions) and effects (actions) are listed for a module and an identifier is assigned to each.
2. A cause-effect graph is developed.
3. The graph is converted to a decision table.
4. Decision table rules are converted to test cases.
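A tiny end-to-end sketch of steps 3 and 4 follows: two invented causes (valid account, sufficient funds) and two effects (approve, error) give a four-rule decision table, and each rule becomes a test case.

```python
def process_withdrawal(valid_account: bool, sufficient_funds: bool) -> str:
    """Hypothetical module under test."""
    if valid_account and sufficient_funds:
        return "approve"
    return "error"

decision_table = [
    # (cause: valid account, cause: sufficient funds) -> expected effect
    ((True,  True),  "approve"),
    ((True,  False), "error"),
    ((False, True),  "error"),
    ((False, False), "error"),
]

for (valid, funds), expected in decision_table:
    assert process_withdrawal(valid, funds) == expected
print("every decision-table rule exercised as a test case")
```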

What Is a Good Test Case?


Abstract

Designing good test cases is a complex art. The complexity comes from three sources:
. Test cases help us discover information. Different types of tests are more effective for different classes of information.

. Test cases can be “good” in a variety of ways. No test case will be good in all of them.

. People tend to create test cases according to certain testing styles, such as domain testing or risk-based testing. Good domain tests are different from good risk-based tests.

What’s a Test Case?

Let’s start with the basics. What’s a test case?

IEEE Standard 610 (1990) defines test case as follows:
“(1) A set of test inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement. “(2) (IEEE Std 829-1983) Documentation specifying inputs, predicted results, and a set of execution conditions for a test item.”

According to Ron Patton (2001, p. 65),
“Test cases are the specific inputs that you’ll try and the procedures that you’ll follow when you test the software.”

Boris Beizer (1995, p. 3) defines a test as “A sequence of one or more subtests executed as a sequence because the outcome and/or final state of one subtest is the input and/or initial state of the next. The word ‘test’ is used to include subtests, tests proper, and test suites.

“A test case specifies the pretest state of the IUT and its environment, the test inputs or conditions, and the expected result. The expected result specifies what the IUT should produce from the test inputs. This specification includes messages generated by the IUT, exceptions, returned values, and resultant state of the IUT and its environment. Test cases may also specify initial and resulting conditions for other objects that constitute the IUT and its environment.”

In practice, many things are referred to as test cases even though they are far from being fully documented.

Brian Marick uses a related term to describe the lightly documented test case, the test idea:

“A test idea is a brief statement of something that should be tested. For example, if you're testing a square root function, one idea for a test would be ‘test a number less than zero’. The idea is to check if the code handles an error case.”

In my view, a test case is a question that you ask of the program. The point of running the test is to gain information, for example whether the program will pass or fail the test.

It may or may not be specified in great procedural detail, as long as it is clear what the idea of the test is and how to apply that idea to some specific aspect (feature, for example) of the product. If the documentation is an essential aspect of a test case, in your vocabulary, please substitute the term “test idea” for “test case” in everything that follows.

An important implication of defining a test case as a question is that a test case must be reasonably capable of revealing information.

. Under this definition, the scope of test cases changes as the program gets more stable. Early in testing, when anything in the program can be broken, trying the largest “legal” value in a numeric input field is a sensible test. But weeks later, after the program has passed this test several times over several builds, a standalone test of this one field is no longer a test case because there is only a minuscule probability of failure. A more appropriate test case at this point might combine boundaries of ten different variables at the same time or place the boundary in the context of a long-sequence test or a scenario.

. Also, under this definition, the metrics that report the number of test cases are meaningless. What do you do with a set of 20 single-variable tests that were interesting a few weeks ago but now should be retired or merged into a combination? Suppose you create a combination test that includes the 20 tests. Should the metric report this one test, 20 tests, or 21? What about the tests that you run only once?

What about the tests that you design and implement but never run because the program design changes in ways that make these tests uninteresting?

Another implication of the definition is that a test is not necessarily designed to expose a defect. The goal is information. Very often, the information sought involves defects, but not always. (I owe this insight to Marick, 1997.) To assess the value of a test, we should ask how well it provides the information we’re looking for.

. Find defects. This is the classic objective of testing. A test is run in order to trigger failures that expose defects. Generally, we look for defects in all interesting parts of the product.

. Maximize bug count. The distinction between this and “find defects” is that total number of bugs is more important than coverage. We might focus narrowly, on only a few high-risk features, if this is the way to find the most bugs in the time available.

. Block premature product releases. This tester stops premature shipment by finding bugs so serious that no one would ship the product until they are fixed. For every release-decision meeting, the tester’s goal is to have new showstopper bugs.

. Help managers make ship / no-ship decisions. Managers are typically concerned with risk in the field. They want to know about coverage (maybe not the simplistic code coverage statistics, but some indicators of how much of the product has been addressed and how much is left), and how important the known problems are. Problems that appear significant on paper but will not lead to customer dissatisfaction are probably not relevant to the ship decision.

. Minimize technical support costs. Working in conjunction with a technical support or help desk group, the test team identifies the issues that lead to calls for support. These are often peripherally related to the product under test--for example, getting the product to work with a specific printer or to import data successfully from a third party database might prevent more calls than a low-frequency, data-corrupting crash.

. Assess conformance to specification. Any claim made in the specification is checked. Program characteristics not addressed in the specification are not (as part of this objective) checked.

. Conform to regulations. If a regulation specifies a certain type of coverage (such as, at least one test for every claim made about the product), the test group creates the appropriate tests. If the regulation specifies a style for the specifications or other
documentation, the test group probably checks the style. In general, the test group is focusing on anything covered by regulation and (in the context of this objective) nothing that is not covered by regulation.

. Minimize safety-related lawsuit risk. Any error that could lead to an accident or injury is of primary interest. Errors that lead to loss of time or data or corrupt data, but that don’t carry a risk of injury or damage to physical things are out of scope.

. Find safe scenarios for use of the product (find ways to get it to work, in spite of the bugs). Sometimes, all that you’re looking for is one way to do a task that will consistently work--one set of instructions that someone else can follow that will reliably deliver the benefit they are supposed to lead to. In this case, the tester is not looking for bugs. He is trying out, empirically refining and documenting, a way to do a task.

. Assess quality. This is a tricky objective because quality is multi-dimensional. The nature of quality depends on the nature of the product. For example, a computer game that is rock solid but not entertaining is a lousy game. To assess quality --to measure and report back on the level of quality -- you probably need a clear definition of the most important quality criteria for this product, and then you need a theory that relates test results to the definition. For example, reliability is not just about the number of bugs in the product. It is (or is often defined as being) about the number of reliability-related failures that can be expected in a period of time or a period of use. (Reliability-related? In measuring reliability, an organization might not care, for example, about misspellings in error messages.) To make this prediction, you need a mathematically and empirically sound model that links test results to reliability. Testing involves gathering the data needed by the model. This might involve extensive work in areas of the product believed to be stable as well as some work in weaker areas. Imagine a reliability model based on counting bugs found (perhaps weighted by some type of severity) per N lines of code or per K hours of testing. Finding the bugs is important. Eliminating duplicates is important. Troubleshooting to make the bug report easier to understand and more likely to fix is (in the context of assessment) out of scope.

. Verify correctness of the product. It is impossible to do this by testing. You can prove that the product is not correct or you can demonstrate that you didn’t find any errors in a given period of time using a given testing strategy. However, you can’t test exhaustively, and the product might fail under conditions that you did not test. The best you can do (if you have a solid, credible model) is assessment--test-based estimation of the probability of errors. (See the discussion of reliability, above).

. Assure quality. Despite the common title, quality assurance, you can’t assure quality by testing. You can’t assure quality by gathering metrics. You can’t assure quality by setting standards. Quality assurance involves building a high quality product and for that, you need skilled people throughout development who have time and motivation and an appropriate balance of direction and creative freedom. This is out of scope for a test organization. It is within scope for the project manager and associated executives. The test organization can certainly help in this process by performing a wide range of technical investigations, but those investigations are not quality assurance.

Given a testing objective, the good test series provides information directly relevant to that objective.

Tests Intended to Expose Defects

Let’s narrow our focus to the test group that has two primary objectives:

. Find bugs that the rest of the development group will consider relevant (worth reporting) and

. Get these bugs fixed.

Even within these objectives, tests can be good in many different ways. For example, we might say that one test is better than another if it is:

. More powerful. I define power in the usual statistical sense as more likely to expose a bug if the bug is there. Note that Test 1 can be more powerful than Test 2 for one type of bug and less powerful than Test 2 for a different type of bug.

. More likely to yield significant (more motivating, more persuasive) results. A problem is significant if a stakeholder with influence would protest if the problem is not fixed. (A stakeholder is a person who is affected by the product. A stakeholder with influence is someone whose preference or opinion might result in change to the product.)

. More credible. A credible test is more likely to be taken as a realistic (or reasonable) set of operations by the programmer or another stakeholder with influence.
“Corner case” is an example of a phrase used by programmers to say that a test or bug is non-credible: “No one would do that.” A test case is credible if some (or all) stakeholders agree that it is realistic.

. Representative of events more likely to be encountered by the customer. A population of tests can be designed to be highly credible. Set up your population to reflect actual usage probabilities. The more frequent clusters of activities are more likely to be covered or covered more thoroughly. (I say cluster of activities to suggest that many features are used together and so we might track which combinations of features are used and in what order, and reflect this more specific information in our analysis.) For more details, read Musa's (1998) work on software reliability engineering.

. Easier to evaluate. The question is, did the program pass or fail the test? The tester should be able to determine, quickly and easily, whether the program passed or failed the test. It is not enough that it is possible to tell whether the program passed or failed. The harder evaluation is, or the longer it takes, the more likely it is that failures will slip through unnoticed. Faced with time-consuming evaluation, the tester will take shortcuts and find ways to less expensively guess whether the program is OK or not. These shortcuts will typically be imperfectly accurate (that is, they may miss obvious bugs or they may flag correct code as erroneous).

. More useful for troubleshooting. For example, high volume automated tests will often crash the system under test without providing much information about the relevant test conditions needed to reproduce the problem. They are not useful for troubleshooting. Tests that are harder to repeat are less useful for troubleshooting. Tests that are harder to perform are less likely to be performed correctly the next time, when you are troubleshooting a failure that was exposed by this test.

. More informative. A test provides value to the extent that we learn from it. In most cases, you learn more from the test that the program passes than the one the program fails, but the informative test will teach you something (reduce your uncertainty) whether the program passes it or fails.

o For example, if we have already run a test in several builds, and the program reliably passed it each time, we will expect the program to pass this test again. Another "pass" result from the reused test doesn't contribute anything to our mental model of the program.
o The notion of equivalence classes provides another example of information value. Behind any test is a set of tests that are sufficiently similar to it that we think of the other tests as essentially redundant with this one. In traditional jargon, this is the "equivalence class" or the "equivalence partition." If the tests are sufficiently similar, there is little added information to be obtained by running the second one after running the first.
o This criterion is closely related to Karl Popper’s theory of value of experiments (See Popper 1992). Good experiments involve risky predictions. The theory predicts something that many people would expect not to be true. Either your favorite theory is false or lots of people are surprised. Popper’s analysis of what makes for good experiments (good tests) is a core belief in a mainstream approach to the philosophy of science. Perhaps the essential consideration here is that the expected value of what you will learn from this test has to be balanced against the opportunity cost of designing and running the test. The time you spend on this test is time you don't have available for some other test or other activity.

. Appropriately complex. A complex test involves many features, or variables, or other attributes of the software under test. Complexity is less desirable when the program has changed in many ways, or when you’re testing many new features at once. If the program has many bugs, a complex test might fail so quickly that you don’t get to run much of it. Test groups that rely primarily on complex tests complain of blocking bugs. A blocking bug causes many tests to fail, preventing the test group from learning the other things about the program that these tests are supposed to expose. Therefore, early in testing, simple tests are desirable. As the program gets more stable, or (as in eXtreme Programming or any evolutionary development lifecycle) as more stable features are incorporated into the program, greater complexity becomes more desirable.

. More likely to help the tester or the programmer develop insight into some aspect of the product, the customer, or the environment. Sometimes, we test to understand the product, to learn how it works or where its risks might be. Later, we might design tests to expose faults, but especially early in testing we are interested in learning what it is and how to test it. Many tests like this are never reused. However, in a test-first design environment, code changes are often made experimentally, with the expectation that the (typically, unit) test suite will alert the programmer to side effects. In such an environment, a test might be designed to flag a performance change, a difference in rounding error, or some other change that is not a defect. An unexpected change in program behavior might alert the programmer that her model of the code or of the impact of her code change is incomplete or wrong, leading her to additional testing and troubleshooting. (Thanks to Ward Cunningham and Brian Marick for suggesting this example.)

. Function testing

. Domain testing

. Specification-based testing

. Risk-based testing

. Stress testing

. Regression testing

. User testing

. Scenario testing

. State-model based testing

. High volume automated testing

. Exploratory testing

Bach and I call these "paradigms" of testing because we have seen time and again that one or two of them dominate the thinking of a testing group or a talented tester. An analysis we find intriguing goes like this:

If I was a "scenario tester" (a person who defines testing primarily in terms of application of scenario tests), how would I actually test the program? What makes one scenario test better than another? Why types of problems would I tend to miss, what would be difficult for me to find or interpret, and what would be particularly easy?
Here are thumbnail sketches of the styles, with some thoughts on how test cases are “good”
within them.

Function Testing

Test each function / feature / variable in isolation.

Most test groups start with fairly simple function testing but then switch to a different style, often involving the interaction of several functions, once the program passes the mainstream function tests. Within this approach, a good test focuses on a single function and tests it with middle-of-the-road values. We don’t expect the program to fail a test like this, but it will if the algorithm is fundamentally wrong, the build is broken, or a change to some other part of the program has fouled this code. These tests are highly credible and easy to evaluate but not particularly powerful.

Some test groups spend most of their effort on function tests. For them, testing is complete when every item has been thoroughly tested on its own. In my experience, the tougher function tests look like domain tests and have their strengths.

Domain Testing

The essence of this type of testing is sampling. We reduce a massive set of possible tests to a small group by dividing (partitioning) the set into subsets (subdomains) and picking one or two representatives from each subset.

In domain testing, we focus on variables, initially one variable at a time. To test a given variable, the set includes all the values (including invalid values) that you can imagine being assigned to the variable. Partition the set into subdomains and test at least one representative from each subdomain. Typically, you test with a "best representative", that is, with a value that is at least as likely to expose an error as any other member of the class. If the variable can be mapped to the number line, the best representatives are typically boundary values.
Most discussions of domain testing are about input variables whose values can be mapped to the number line. The best representatives of partitions in these cases are typically boundary cases. A good set of domain tests for a numeric variable hits every boundary value, including the minimum, the maximum, a value barely below the minimum, and a value barely above the maximum.


. Whittaker (2003) provides an extensive discussion of the many different types of variables we can analyze in software, including input variables, output variables, results of intermediate calculations, values stored in a file system, and data sent to devices or other programs.

. Kaner, Falk & Nguyen (1993) provided a detailed analysis of testing with a variable (printer type, in configuration testing) that can’t be mapped to a number line.

These tests are higher power than tests that don’t use “best representatives” or that skip some of the subdomains (e.g. people often skip cases that are expected to lead to error messages).

The first time these tests are run, or after significant relevant changes, these tests carry a lot of information value because boundary / extreme-value errors are common.

Bugs found with these tests are sometimes dismissed, especially when you test extreme values of several variables at the same time. (These tests are called corner cases.) They are not necessarily credible, they don’t necessarily represent what customers will do, and thus they are not necessarily very motivating to stakeholders.

Specification-Based Testing

Check the program against every claim made in a reference document, such as a design specification, a requirements list, a user interface description, a published model, or a user manual.

These tests are highly significant (motivating) in companies that take their specifications seriously. For example, if the specification is part of a contract, conformance to the spec is very important. Similarly products must conform to their advertisements, and life-critical products must conform to any safety-related specification. Specification-driven tests are often weak, not particularly powerful representatives of the class of tests that could test a given specification item.

Some groups that do specification-based testing focus narrowly on what is written in the document. To them, a good set of tests includes an unambiguous and relevant test for each claim made in the spec.
Other groups look further, for problems in the specification. They find that the most informative tests in a well-specified product are often the ones that explore ambiguities in the spec or examine aspects of the product that were not well-specified.

Risk-Based Testing

Imagine a way the program could fail and then design one or more tests to check whether the program will actually fail in that way.

A “complete” set of risk-based tests would be based on an exhaustive risk list, a list of every way the program could fail.

A good risk-based test is a powerful representative of the class of tests that address a given risk.

To the extent that the tests tie back to significant failures in the field or well known failures in a competitor’s product, a risk-based failure will be highly credible and highly motivating. However, many risk-based tests are dismissed as academic (unlikely to occur in real use). Being able to tie the “risk” (potential failure) you test for to a real failure in the field is very valuable, and makes tests more credible.

Risk-based tests tend to carry high information value because you are testing for a problem that you have some reason to believe might actually exist in the product. We learn a lot whether the program passes the test or fails it.

Stress Testing

There are a few different definitions of stress tests.

. Under one common definition, you hit the program with a peak burst of activity and see it fail.

. IEEE Standard 610.12-1990 defines it as "Testing conducted to evaluate a system or component at or beyond the limits of its specified requirements with the goal of causing the system to fail."

. A third approach involves driving the program to failure in order to watch how the program fails. For example, if the test involves excessive input, you don’t just test near the specified limits. You keep increasing the size or rate of input until either the program finally fails or you become convinced that further increases won’t yield a failure. The fact that the program eventually fails might not be particularly surprising or motivating. The interesting thinking happens when you see the failure and ask what vulnerabilities have been exposed and which of them might be triggered under less extreme circumstances. Jorgensen (2003) provides a fascinating example of this style of work.

I work from this third definition.

These tests have high power.

Some people dismiss stress test results as not representative of customer use, and therefore not credible and not motivating. Another problem with stress testing is that a failure may not be useful unless the test provides good troubleshooting information, or the lead tester is extremely familiar with the application. A good stress test pushes the limit you want to push, and includes enough diagnostic support to make it reasonably easy for you to investigate a failure once you see it. Some testers, such as Alberto Savoia (2000), use stress-like tests to expose failures that are hard to see if the system is not running several tasks concurrently. These failures often show up well within the theoretical limits of the system and so they are more credible and more motivating. They are not necessarily easy to troubleshoot.

Regression Testing

Design, develop and save tests with the intent of regularly reusing them. Repeat the tests after making changes to the program.

This is a good point (consideration of regression testing) to note that this is not an orthogonal list of test types. You can put domain tests or specification-based tests or any other kinds of tests into your set of regression tests.

So what’s the difference between these and the others? I’ll answer this by example:

Suppose a tester creates a suite of domain tests and saves them for reuse. Is this domain testing or regression testing?

. I think of it as primarily domain testing if the tester is primarily thinking about partitioning variables and finding good representatives when she creates the tests.

. I think of it as primarily regression testing if the tester is primarily thinking about building a set of reusable tests.

Regression tests may have been powerful, credible, and so on, when they were first designed. However, after a test has been run and passed many times, it’s not likely that the program will fail it the next time, unless there have been major changes or changes in part of the code directly involved with this test. Thus, most of the time, regression tests carry little information value.

A good regression test is designed for reuse. It is adequately documented and maintainable. (For suggestions that improve maintainability of GUI-level tests, see Graham & Fewster, 1999; Kaner, 1998; Pettichord, 2002, and the papers at www.pettichord.com in general). A good regression test is designed to be likely to fail if changes induce errors in the function(s) or area(s) of the program addressed by the regression test.

User Testing

User testing is done by users. Not by testers pretending to be users. Not by secretaries or executives pretending to be testers pretending to be users. By users. People who will make use of the finished product.

User tests might be designed by the users or by testers or by other people (sometimes even by lawyers, who included them as acceptance tests in a contract for custom software). The set of user tests might include boundary tests, stress tests, or any other type of test.

Some user tests are designed in such detail that the user merely executes them and reports whether the program passed or failed them. This is a good way to design tests if your goal is to provide a carefully scripted demonstration of the system, without much opportunity for wrong things to show up as wrong. If your goal is to discover what problems a user will encounter in real use of the system, your task is much more difficult. Beta tests are often described as cheap, effective user tests but in practice they can be quite expensive to administer and they may not yield much information. For some suggestions on beta tests, see Kaner, Falk & Nguyen (1993).

A good user test must allow enough room for cognitive activity by the user while providing enough structure for the user to report the results effectively (in a way that helps readers understand and troubleshoot the problem).

Failures found in user testing are typically credible and motivating. Few users run particularly powerful tests. However, some users run complex scenarios that put the program through its paces.

Scenario Testing

A scenario is a story that describes a hypothetical situation. In testing, you check how the program copes with this hypothetical situation.

The ideal scenario test is credible, motivating, easy to evaluate, and complex.

In practice, many scenarios will be weak in at least one of these attributes, but people will still call them scenarios. The key message of this pattern is that you should keep these four attributes in mind when you design a scenario test and try hard to achieve them.


An important variation of the scenario test involves a harsher test. The story will often involve a sequence, or data values, that would rarely be used by typical users. They might arise, however, out of user error or in the course of an unusual but plausible situation, or in the behavior of a hostile user. Hans Buwalda (2000a, 2000b) calls these "killer soaps" to distinguish them from normal scenarios, which he calls "soap operas." Such scenarios are common in security testing or other forms of stress testing.

In the Rational Unified Process, scenarios come from use cases. (Jacobson, Booch, & Rumbaugh, 1999). Scenarios specify actors, roles, business processes, the goal(s) of the actor(s), and events that can occur in the course of attempting to achieve the goal. A scenario is an instantiation of a use case. A simple scenario traces through a single use case, specifying the data values and thus the path taken through the case. A more complex use case involves concatenation of several use cases, to track through a given task, end to end. (See also Bittner & Spence, 2003; Cockburn, 2000; Collard, 1999; Constantine & Lockwood, 1999; Wiegers, 1999.) For a cautionary note, see Berger (2001).

However they are derived, good scenario tests have high power the first time they’re run.

Groups vary in how often they run a given scenario test.

. Some groups create a pool of scenario tests as regression tests.
. Others (like me) run a scenario once or a small number of times and then design another scenario rather than sticking with the ones they’ve used before.

Testers often develop scenarios to develop insight into the product. This is especially true early in testing and again late in testing (when the product has stabilized and the tester is trying to understand advanced uses of the product.)

State-Model-Based Testing

In state-model-based testing, you model the visible behavior of the program as a state machine and drive the program through the state transitions, checking for conformance to predictions from the model. This approach to testing is discussed extensively at www.model-basedtesting.org. In general, comparisons of software behavior to the model are done using automated tests and so the failures that are found are found easily (easy to evaluate).

In general, state-model-based tests are credible, motivating and easy to troubleshoot. However, state-based testing often involves simplifications, looking at transitions between operational modes rather than states, because there are too many states (El-Far 1995). Some abstractions to operational modes are obvious and credible, but others can seem overbroad or otherwise odd to some stakeholders, thereby reducing the value of the tests. Additionally, if the model is oversimplified, failures exposed by the model can be difficult to troubleshoot (Houghtaling, 2001). Talking about his experiences in creating state models of software, Harry Robinson (2001) reported that much of the bug-finding happens while doing the modeling, well before the automated tests are coded. Elisabeth Hendrickson (2002) trains testers to work with state models as an exploratory testing tool--her models might never result in automated tests, their value is that they guide the analysis by the tester. El-Far, Thompson & Mottay (2001) and El-Far (2001) discuss some of the considerations in building a good suite of model-based tests. There are important tradeoffs, involving, for example, the level of detail (more detailed models find more bugs but can be much harder to read and maintain). For much more, see the papers at www.model-based-testing.org.

High-Volume Automated Testing

High-volume automated testing involves massive numbers of tests, comparing the results against one or more partial oracles.

. The simplest partial oracle is running versus crashing. If the program crashes, there must be a bug. See Nyman (1998, 2002) for details and experience reports.

. State-model-based testing can be high volume if the stopping rule is based on the results of the tests rather than on a coverage criterion. For the general notion of stochastic state-based testing, see Whittaker (1997). For discussion of state-model-based testing ended by a coverage stopping rule, see Al-Ghafees & Whittaker (2002).

. Jorgensen (2002) provides another example of high-volume testing. He starts with a file that is valid for the application under test. Then he corrupts it in many ways, in many places, feeding the corrupted files to the application. The application rejects most of the bad files and crashes on some. Sometimes, some applications lose control when handling these files. Buffer overruns or other failures allow the tester to take over the application or the machine running the application. Any program that will read any type of data stream can be subject to this type of attack if the tester can modify the data stream before it reaches the program. (A minimal sketch of this file-corruption approach appears after this list.)
. Kaner (2000) describes several other examples of high-volume automated testing approaches. One classic approach repeatedly feeds random data to the application under test and to another application that serves as a reference for comparison, an oracle. Another approach runs an arbitrarily long random sequence of regression tests, tests that the program has shown it can pass one by one. Memory leaks, stack corruption, wild pointers or other garbage that accumulates over time finally causes failures in these long sequences. Yet another approach attacks the program with long sequences of activity and uses probes (tests built into the program that log warning or failure messages in response to unexpected conditions) to expose problems.
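The file-corruption style Jorgensen describes can be sketched roughly as follows; the file names and the number of corrupted bytes are invented for illustration, and this is a sketch of the general technique, not his actual tooling:

/* One iteration of file-corruption (fuzz-style) testing: read a known-good
   input file, flip a few randomly chosen bytes, and write the corrupted copy
   out so it can be fed to the application under test. File names are
   hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    FILE *in = fopen("good_input.dat", "rb");
    if (!in) return 1;
    fseek(in, 0, SEEK_END);
    long size = ftell(in);
    rewind(in);
    unsigned char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, in) != (size_t)size) return 1;
    fclose(in);

    srand((unsigned)time(NULL));
    for (int i = 0; i < 10 && size > 0; i++)        /* corrupt 10 random bytes */
        buf[rand() % size] = (unsigned char)(rand() % 256);

    FILE *out = fopen("corrupted_input.dat", "wb");
    if (!out) return 1;
    fwrite(buf, 1, size, out);
    fclose(out);
    free(buf);
    /* The corrupted file would then be opened with the application under
       test, watching for crashes, hangs, or loss of control. */
    return 0;
}

In practice this would run in a loop that generates and tries thousands of variants, logging any crash or hang for later human review.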

High-volume testing is a diverse grouping. The essence of it is that the structure of this type of testing is designed by a person, but the individual test cases are developed, executed, and interpreted by the computer, which flags suspected failures for human review. The almost-complete automation is what makes it possible to run so many tests.

. The individual tests are often weak. They make up for low power with massive numbers.

. Because the tests are not handcrafted, some tests that expose failures may not be particularly credible or motivating. A skilled tester often works with a failure to imagine a broader or more significant range of circumstances under which the failure might arise, and then crafts a test to prove it.

. Some high-volume test approaches yield failures that are very hard to troubleshoot. It is easy to see that the failure occurred in a given test, but one of the necessary conditions that led to the failure might have been set up thousands of tests before the one that actually failed. Building troubleshooting support into these tests is a design challenge that some test groups have tackled more effectively than others.

Exploratory Testing

Exploratory testing is “any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests” (Bach 2003a). Bach points out that tests span a continuum between purely scripted (the tester does precisely what the script specifies and nothing else) to purely exploratory (none of the tester’s activities are pre-specified and the tester is not required to generate any test documentation beyond bug reports). Any given testing effort falls somewhere on this continuum. Even predominantly pre-scripted testing can be exploratory when performed by a skilled tester. “In the prototypic case (what Bach calls “freestyle exploratory testing”), exploratory testers continually learn about the software they’re testing, the market for the product, the various ways in which the product could fail, the weaknesses of the product (including where problems have been found in the application historically and which developers tend to make which kinds of errors), and the best ways to test the software. At the same time that they’re doing all this learning, exploratory testers also test the software, report the problems they find, advocate for the problems they found to be fixed, and develop new tests based on the information they’ve obtained so far in their learning.” (Tinkham & Kaner, 2003)

An exploratory tester might use any type of test--domain, specification-based, stress, risk-based, any of them. The underlying issue is not what style of testing is best but what is most likely to reveal the information the tester is looking for at the moment.

Exploratory testing is not purely spontaneous. The tester might do extensive research, such as studying competitive products, failure histories of this and analogous products, interviewing programmers and users, reading specifications, and working with the product.

What distinguishes skilled exploratory testing from other approaches and from unskilled exploration, is that in the moments of doing the testing, the person who is doing exploratory testing well is fully engaged in the work, learning and planning as well as running the tests. Test cases are good to the extent that they advance the tester’s knowledge in the direction of his information-seeking goal. Exploratory testing is highly goal-driven, but the goal may change quickly as the tester gains new knowledge.

Concluding Notes

There’s no simple formula or prescription for generating “good” test cases. The space of interesting tests is too complex for this. There are tests that are good for your purposes, for bringing forth the type of information that you’re seeking.

Many test groups, most of the ones that I’ve seen, stick with a few types of tests. They are primarily scenario testers or primarily domain testers, etc. As they get very good at their preferred style(s) of testing, their tests become, in some ways, excellent. Unfortunately, no style yields tests that are excellent in all of the ways we wish for tests. To achieve the broad range of value from our tests, we have to use a broad range of techniques.

Correlation

Correlation? What’s that?

If you think correlation has something to do with the fit of data points to a function curve on a graph, and the word has no meaning to you in the context of LoadRunner, then this document is for you. It explains what correlation in LoadRunner is, why you have to do it, how to do it, and what to do when it goes wrong. If this is the first time you have used LoadRunner, or if you have been using it a little but are not yet a guru, then read on.



Introduction

When recording a script, LoadRunner simply listens to the client (browser) talking to the server (web server) and writes it all down. The complete transcript of everything that was said (dates/times, content, requests and replies) can be found in the Recording Log (View-> Output Window-> Recording Log). The script is essentially an easier-to-read version of this transcript. The main difference is that the script contains only the client's side of the communication.

If you imagine that LoadRunner is an impersonator pretending to be the client (browser), the script is LoadRunner's set of notes telling it what to say to the server to successfully fool it. We want the server to believe that LoadRunner is a real client, and so send it the information requested.

This script has the hard coded information of the original conversation (browser session) that occurred between the client and server. This hard coded information may not be enough to fool the server during replay, however. It may have to be correlated.


What is correlation?

Correlation is where the script is modified so that some of the hard coded values in the script are no longer hard coded. Rather than have LoadRunner send the original value to the server, we may need to send different values.

For example, the original recorded script may have included the server sending the client a session identification number, something to identify the client during that particular session. This session ID was hard coded into the script during recording.

During replay, the server will send LoadRunner a new session ID. We need to capture this value, and incorporate it into the script so we can send it back to the server to correctly identify ourselves for this new session. If we leave the script unmodified, we will send the old hard coded session ID to the server. The server will look at it and think it invalid, or unknown, and so will not send us the pages we have requested. LoadRunner will not have successfully fooled the server into believing it is a client.

Correlation is the capturing of dynamic values passed from the server to the client and back. We save this captured value into a LoadRunner parameter, and then use this parameter in the script in place of the original value. During replay, LoadRunner will now listen to what the server sends to it, and when it makes requests of the server, send this new, valid value back to the server. Thus fooling the server into believing it is talking to a real client.

Why do I have to correlate?

If you try to replay a script without correlating first, then most likely the script will fail. The requests it sends to the server will not be replied to. Either the session ID is invalid, so the server won’t allow you into the site, or it won’t allow you to create new records because they are the same as existing ones, or the server won’t understand your request because it isn’t what it is expecting.

Any value which changes every time you connect to the server is a candidate for correlation. A correlated script will send the server the information it is looking for, and so allow the script to replay. This will allow many Vusers to replay the script many times, and so place load on your server.



What errors mean I have to correlate?

There are no specific errors that are associated with correlation, but there are errors that could be caused because a value hasn’t been correlated. For example, a session ID. If an invalid session ID is sent to a web server, how that server responds depends on the implementation of that server. It might send a page specifically stating the Session ID is invalid and ask you to log in again. It might send an HTTP 404 Page not found error because the requesting user didn’t have permissions for the specified page, and so the server couldn’t find the page.

In general any error message returned from the server after LoadRunner makes a request that complains about permissions can point to a hard coded value that needs to be correlated.


The tools (functions) used to correlate.

In LoadRunner 7.X there are four functions that you can use for correlation. A list of them, along with documentation and examples, can be found in the on-line documentation. From VuGen, go to Help-> Function reference-> Contents-> Web and Wireless Vuser Functions-> Correlation Functions.

The first two functions are essentially the same, and I will talk about them together. The third function, web_reg_save_param, differs in implementation and in the parameters it takes, but does the same job and is used in much the same way. The last function is associated with the first two; it isn't directly a correlation function, but rather a LoadRunner setting. It will be talked about later in a different section.

Web_create_html_param :-

This is the standard correlation function in LoadRunner 6.X and 7.X. This function takes three parameters.

web_create_html_param ( "Parameter Name", "Left Boundary", "Right Boundary" );

Each of these parameters is a pointer to a string. That means that if they are entered as literal text, they need to be enclosed in "quotes". Each parameter is separated by a comma.

Parameter Name:- This is the name of the parameter, or place holder variable that LoadRunner will save the captured value into. After successfully capturing the value, the parameter name is used in the script in place of the original value. LoadRunner will identify the parameter / placeholder, and substitute the captured value for the placeholder during replay. This name should have no spaces, but apart from that limitation, it is entirely up to you what name you give.

Left Boundary:- This is where we tell LoadRunner how to find the dynamic value that we are looking for. In the Left Boundary we specify the text that will appear to the left of the changing value.

Right Boundary:- This is where we tell LoadRunner how to identify the end of the dynamic value we are looking for. Here we place the text that will appear after the value we are looking for.
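For example (the parameter name and boundaries here are invented, though they match the hypothetical session ID used later in this document), a statement to capture a session ID returned in a hidden form field could look like this:

web_create_html_param ( "UserSession", "input type=hidden name=userSession value=", ">" );

Read it as: in the next reply from the server, find the text input type=hidden name=userSession value=, then find the next >, and save whatever lies between the two into the parameter UserSession.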

web_create_html_param_ex

web_create_html_param_ex ( "Parameter Name", "Left Boundary", "Right Boundary", "Instance" );

This function is the same as the web_create_html_param function, except it doesn't look for the first instance of the boundaries, but rather the nth instance of those boundaries. The first three parameters are the same (name, left boundary, right boundary); the last parameter is the number of the occurrence to look for. It is also a pointer to a string, so it must be enclosed in double quotes. If you place the number one here (i.e. "1"), the function behaves exactly like web_create_html_param: it looks for the first occurrence. If you put the number three here (i.e. "3"), it will look for the 3rd occurrence of the left and right boundaries and place what appears in between into the parameter.
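For example (boundaries again invented for illustration), to capture what lies between the third occurrence of the boundaries rather than the first:

web_create_html_param_ex ( "ThirdValue", "value=", ">", "3" );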

web_reg_save_param

web_reg_save_param ( "Parameter Name", <List of Attributes>, LAST );

The first thing to note about this function, as different from the web_create_html_param functions, is that the number of parameters it takes can vary. The first one is still the name, but after that there are different attributes that can be used. These attributes can appear in any order, because each one identifies itself: for example, the attribute that identifies the left boundary is "LB=" followed by the text of the left boundary. I won't be talking about all of the options for this function; they are listed in the documentation. Please have a look at it (Help-> Function Reference).

The first parameter is the name, then the list of attributes or parameters, then the keyword LAST. This identifies the end of the function. The keyword is not enclosed in quotes, all parameters are. All parameters and keywords are separated by commas.
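Using the same hypothetical session ID boundaries as before, an equivalent web_reg_save_param statement could look like this; the "ORD=1" attribute asks for the first occurrence and is shown only to illustrate the attribute style:

web_reg_save_param ( "UserSession", "LB=input type=hidden name=userSession value=", "RB=>", "ORD=1", LAST );

Like the other two functions, it must appear in the script before the request whose reply contains the value, as described in the steps below.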



Identifying values to correlate.

So we have the tools, and we know why we need to use them, but how do we know what to use them on? What values in the script need to be correlated? The simplest answer is, "Any value that changes between sessions and is required for the script to replay."

A hypothetical example. We are logging onto a web site. When we send the server our user name and password, it replies to us with a session ID that is good for that session. The session ID needs to be correlated for replay. We need to capture this value during replay to use in the script in place of the hard coded value.

To identify values to correlate, record the script and save it. Open a new script and record the same actions and business process again. As much as possible, enter the same values in both scripts during recording: for example, the user ID, password, and field entries and selections. Save the second script, and then run it with Extended log (Vuser-> Run time settings-> Log-> Extended log; check all three options).

Go to Tools-> Compare with Vuser, and choose the first recorded script. WinDiff will open and display the two scripts side by side. Lines with differences in them will be highlighted in yellow. Differences within the line will be in red.




If WinDiff gives an error here, dismiss the error; WinDiff will be minimized in the task bar. Right-click on it and choose Restore. Then go to File-> Select Files/Directories and manually select the Action sections of the two scripts.

Differences like "lr_think_time" can be ignored. They are LoadRunner pacing functions and don't represent data sent to the server.

Locate the first difference, take note of it, and search the script open in VuGen for that difference. That is the original value, hard coded into the script, that was different in the second script. Highlight it and copy it.



Go to the Recording log and place your cursor at the top. Hit Control F (Ctrl+F) to do a search and paste in the original value. We are looking for the first occurrence of this value in the recording log. If you don't find the value in the recording log, check that you are looking in the right script's recording log. Remember, you have two almost identical scripts here.


If you find the value, scroll up in the log and make sure the value was sent as part of a response from the server. The first header you come across while scrolling up should be for a receiving response; this indicates that the value was sent by the server to the client. If the value first appears as part of a sending request, then the value originated on the client side and doesn't need to be correlated, but rather parameterized, which is a different topic altogether. The response will have a comment before it that looks like this:

*** [tid=640 Action1 2] Receiving response ( 10/8/2001 12:10:26 )



So, we have a value that is different between subsequent recordings, and it was sent from the server to the client. This value most likely needs to be correlated. If the value you were looking for doesn't meet these criteria:

1. Different between recordings
2. Originated first on the server and sent to the client

It probably doesn’t need to be correlated.



Now that we know why and what, how do we correlate?

Step 1.
After confirming that the first occurrence was part of a received response from the server, we now need to figure out where to place the web_create_html_param( ) function. The web_create_html_param statement needs to go immediately before the request that fetched the dynamic value from the server. In order to find this request or URL in the script, we need to replay the script once with extended log and all three options (in Vuser->Runtime Settings->Log) turned on.

In the recording log, pick out the text that appears before the dynamic value. This text should remain constant no matter how many times you replay the script. Highlight it and copy it. This is the text that will identify to LoadRunner where to find the start of the value we are capturing.




Now, go to the execution log and search for the text that you just copied from the recording log.




You should see a corresponding Action1.c() at the beginning of that line with a number in the brackets. That is the number of the line in the script where you need to put the web_create_html_param( ) function. The function should go right above that line in the script.



So, add a couple of blank lines to your script before the function at that line, and then type in web_create_html_param("UserSession", but give it a name that means more to you than UserSession.



Step 2.
Go back to the execution log and highlight the text to the left of the dynamic value and copy it. This should be some of the same text we searched for in the Execution log.

The amount of text you highlight should be sufficient so that it is unique in this reply from the server. I would suggest copying as much as possible without copying any special characters. These show in the execution log as black squares, and the actual character they represent is uncertain. After selecting a boundary, go to the top of the server's reply, hit Ctrl+F, and do a search for that boundary. You want to make certain that what you have selected is the first occurrence in the server's reply. If it isn't, select more text to make it unique, or consider using the web_create_html_param_ex function, or the web_reg_save_param function with its ORD attribute.

Once you have finalized the static text that represents the left boundary, copy it into the web_create_html_param (or web_reg_save_param) statement. If it contains any carriage returns, place it all on one line. If there are any double quote (") characters in the text, place the escape character (\) before each one so LoadRunner doesn't incorrectly treat it as the end of the parameter, but rather as a character to search for. For example, if the left boundary was 'input type=hidden name=userSession value=' without the single quotes, and we are using the web_create_html_param statement, then the function we have so far would be

web_create_html_param("UserSession", "input type=hidden name=userSession value=",


Step 3.
We are now going to tell LoadRunner how to identify the end of the value we are trying to capture. That is the right boundary of what we are looking for. Again, look in the execution log and copy the static text that appears to the right of the dynamic value we are looking for. For example, let's say the execution log contained the following

… userSession value=75893.0884568651DQADHfApHDHfcDtccpfAttcf>…

Then the example so far to save the number into the parameter UserSession would be

web_create_html_param("UserSession", "input type=hidden name=userSession value=", ">");



In choosing a right boundary, make sure you choose enough static text to specify the end of the value. If the boundary you specify appears in the value you are trying to capture, then you will not capture the whole value.

Recap:-
That was a lot of looking through the recording and execution logs and checking of values. Let's just recap what we have done. We have identified a value that we think needs to be correlated. We then identified in the script where to place the statement that would ultimately capture and save the value into a parameter. We then placed the statement and gave LoadRunner the text strings that appear on either side of the value we are looking for so that it can find it.

The flow of logic is this: the correlation function tells LoadRunner what to look for in the next set of replies from the server. LoadRunner makes a request of the server. The server replies. LoadRunner looks through the replies for the left and right boundaries. If it finds them, what is in between is saved to a parameter with the name specified.

Remember, the parameter can't have a value till AFTER the next statement is executed. The correlation statement only tells LoadRunner what to look for. It doesn't assign a value to the parameter. Assignment of a value to the parameter doesn't happen until after LoadRunner makes a request of the server and looks in the reply. If a correlation statement in your script is immediately followed by a function that attempts to use the parameter, the statement is in the wrong place, and the script will fail.
This is always incorrect:-

web_create_html_param(…);
web_submit(… {Parameter} …);

There needs to be in-between the two, the request of the server that causes it to reply with the value we are trying to capture.
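A correct ordering, using the hypothetical UserSession parameter and made-up URLs purely for illustration, would be:

web_create_html_param("UserSession", "input type=hidden name=userSession value=", ">");

/* This request causes the server to reply with the page containing the
   session ID, so this is where the capture actually happens. */
web_url("login",
    "URL=http://server/login.jsp",
    LAST);

/* Only now does {UserSession} hold a valid value that can be sent back. */
web_submit_data("next_page",
    "Action=http://server/next.jsp",
    "Method=POST",
    ITEMDATA,
    "Name=userSession", "Value={UserSession}", ENDITEM,
    LAST);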

Replacing the hard coded value in the script with the parameter.

Once we have created the parameter, the next step is to replace the hard coded occurrences with the parameter. Look through the script for the original value. Where you find it, delete the value and replace it with the parameter. Note: only the value we want replaced is deleted. The characters around it remain.

i.e.

Change :
..... .....userSession=75893.0884568651DQADHfApHDHfcDtccpfAttcf&username=test........
.....


To :
.....
.....userSession={UserSession}&username=test........
.....
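Inside an actual request function the same substitution might look like this (the URL and step name are invented for illustration); everything around the value stays exactly as recorded:

Change :

web_url("account",
    "URL=http://server/account.jsp?userSession=75893.0884568651DQADHfApHDHfcDtccpfAttcf&username=test",
    LAST);

To :

web_url("account",
    "URL=http://server/account.jsp?userSession={UserSession}&username=test",
    LAST);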



At this point, you are ready to run the script to test if it works, or if it needs further correlation, or more work on this correlation.



Common errors when correlating.

When LoadRunner fails to find the boundaries for a web_create, it will print a warning message in the execution log like this:-

Warning: No match found for the requested parameter "Name". If the data you want to save exceeds 256 bytes, use web_set_max_html_param_len to increase the parameter size

Firstly, this is a warning, not an error. There are times when you might want to use the web_create_html_param function for purposes other than correlation. These require the function not to cause an error, so this is a warning.

Secondly, the advice the warning message gives is good, but I recommend thinking about it first. Was the value you were trying to capture more than 256 characters long? In the above example it was only about 40 characters long. Have a look at the recording log and see how long the original value is. Have a look at the second recording made earlier and see how long it was in that script. Turn on the extended log (Run-Time settings-> Log-> Extended log-> All data returned from server) and have a look at how long it is in the execution log. If at any time any of these values was close to, say, 200 characters, then yes, add a web_set_max_html_param_len statement to the start of the script to make the maximum longer than 256 characters. If all occurrences of the value were much shorter than the maximum parameter length, then the problem is either that the web_create_html_param is in the wrong place or that the boundaries are incorrect. Go back and look at the boundaries that you have selected, and look at the placement of the web_create_html_param function. Is it immediately before the statement that causes the server to reply with the data you are looking for?

The Parameter length is longer than the current maximum.
(The web_set_max_html_param_len function)

web_set_max_html_param_len ( "length" );

This statement tells LoadRunner to allow longer matches between the left and right boundaries. When it finds the left boundary, it will look ahead up to the maximum parameter length for the right boundary. This setting is script wide and takes effect from when it is executed. It only needs to appear in the script once. Having LoadRunner look for longer matches uses more memory and CPU to search through the text returned from the server. For this reason, don't set it too high, or you will make your script less scalable; that is, you will reduce the number of Vusers that can run it on a given machine. Try to keep the maximum parameter length no more than 100 characters greater than what you are expecting.
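For example, if the value being captured could be up to roughly 900 characters long (a number invented for illustration), the statement would go near the top of the script, before the correlation statement:

web_set_max_html_param_len("1024");    /* allow captured values up to 1024 characters */
web_create_html_param("UserSession", "input type=hidden name=userSession value=", ">");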



Special cases for the boundaries:-

There are some special characters and cases to watch for when specifying the boundaries. Double quotes should be preceded by a \ so LoadRunner recognizes them as part of the string to look for. If your text includes any carriage returns that are part of the HTTP data, and not just part of the wrap-around in the recording log, these need to be specified as the \r\n character sequence. If the \ character is part of the text, it too needs to be preceded by a \ to indicate it is a literal.

Recording log text              Left boundary                   Right boundary

Value="57685"                   "Value=\""                      "\""

Value_"\item\"value'7875'       "Value_\"\\item\\\"value'"      "'"

Value=                          "Value=\r\n\""                  "\""
"7898756"
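Putting the first row of the table above into an actual statement (the parameter name Val is invented for illustration), the escaped boundaries would be used like this, capturing 57685 into the parameter:

web_create_html_param("Val", "Value=\"", "\"");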


Debug help

Sometimes you want to print out the value that was assigned to a parameter. To do this, use the lr_eval_string function and the lr_output_message function. For example, to print the value of the parameter to the execution log:

lr_output_message("Value Captured = %s", lr_eval_string("{Name}"));

If you find that the value being substituted is too long, too short, or completely wrong, printing out the value will help identify the changes you need to make to the correlation function. If you have extra characters at the start of the value, you need to add them to the end of the left boundary; if you have extra characters at the end of the parameter value, you need to add them to the start of the right boundary. If you are getting the wrong value altogether, do searches in the recording log for the left boundary, and make sure that you have a unique boundary and that LoadRunner isn't picking up an earlier occurrence. You can then use the web_create_html_param_ex function, or add to the boundaries to make them unique.



Other Correlation help resources.

The Function Reference contains a lot of detail and examples on how to use these functions. I would recommend looking over them.

The Customer Support site has a video for download that goes over correlation. You can get it from

http://support.merc-int.com

After logging in, go to Downloads-> Browse. Enter LoadRunner in the product selection box, select the Mercury Interactive downloads radio button, and click on Retrieve. Under Training, click on the LoadRunner Web Script Correlation Training link.