[Gambas-user] Questions about gb.test and gb.test.tap

Tobias Boege taboege at gmail.com
Mon Feb 24 23:20:58 CET 2020


On Mon, 24 Feb 2020, Benoît Minisini wrote:
> Hi,
> 
> This mail is mainly for Christof and Tobias...
> 
> What is the difference between gb.test and gb.test.tap?
> 

Thanks for asking this before I had a chance to post a long essay about
something nobody asked for :-)

> I mean: for me gb.test should run the test modules and generate output in
> the TAP format, but what does gb.test.tap do?
> 

My take is that gb.test is for running tests in Gambas projects.
gb.test.tap is a reusable component for reading and writing TAP.

Let me explain a few problems I have with the current gb.test and why I
wrote gb.test.tap, which is not intended as an alternative test framework
but as a dependency for gb.test, to make part of it reusable and enforce
proper encapsulation.

TAP is a two-party protocol. A "test" (think: a test module inside the
project) is a process which loads and uses the "system under test", makes
assertions about function return values, errors happening or not happening,
etc. Whether the assertions turn out to be true at runtime is reflected by
"ok" / "not ok" lines printed on the test process's stdout, together with
other control or diagnostic messages that may be helpful to the other side.
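To make the test side concrete, here is a minimal sketch in Python (used purely for illustration; TapWriter and its methods are hypothetical names, not the gb.test.tap API) of everything a test process needs to print: a plan line "1..N", one numbered "ok" / "not ok" line per assertion, and "#" diagnostic lines.

```python
import sys


class TapWriter:
    """Hypothetical minimal TAP producer: all a test process should emit."""

    def __init__(self, out=sys.stdout):
        self.out = out
        self.counter = 0

    def plan(self, n):
        # The plan "1..N" announces how many assertions will follow.
        print(f"1..{n}", file=self.out)

    def ok(self, condition, description=""):
        # Each assertion becomes exactly one numbered "ok" / "not ok" line.
        self.counter += 1
        line = f"{'ok' if condition else 'not ok'} {self.counter}"
        if description:
            line += f" - {description}"
        print(line, file=self.out)
        return bool(condition)

    def diag(self, message):
        # Diagnostics are comment lines; they don't affect the verdict.
        print(f"# {message}", file=self.out)


if __name__ == "__main__":
    t = TapWriter()
    t.plan(3)
    t.ok(1 + 1 == 2, "addition")
    t.ok("abc".upper() == "ABC", "upper")
    if not t.ok(len([]) == 1, "deliberately failing"):
        t.diag("expected length 1, got 0")
```

Run as a script, this prints the plan, "ok 1 - addition", "ok 2 - upper", "not ok 3 - deliberately failing" and one diagnostic line, then exits. Judging the result is left entirely to whoever reads that output.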

The "test harness" is another process (think gbt3 or the IDE) which plans
a series of tests by picking test modules and methods, then runs each test,
collects its TAP output and analyzes whether the test was successful.
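On the harness side, the core job is parsing. The Python sketch below is an illustration, not TestHarness from gb.test.tap (TapResult and parse_tap are hypothetical names). It reads a TAP stream and decides success: a plan must be present, exactly that many tests must have run, none may have failed, and there must be no "Bail out!". As a simplification, TODO/SKIP directives are ignored.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

PLAN_RE = re.compile(r"^1\.\.(\d+)")
RESULT_RE = re.compile(r"^(not ok|ok)\b\s*(\d+)?")


@dataclass
class TapResult:
    planned: Optional[int] = None
    passed: list = field(default_factory=list)
    failed: list = field(default_factory=list)
    bailed_out: bool = False

    @property
    def ran(self):
        return len(self.passed) + len(self.failed)

    @property
    def success(self):
        # Success requires a plan, a matching number of results,
        # no failures and no bail-out.
        return (self.planned is not None and self.ran == self.planned
                and not self.failed and not self.bailed_out)


def parse_tap(text):
    result = TapResult()
    for line in text.splitlines():
        if line.startswith("Bail out!"):
            result.bailed_out = True
            break
        m = PLAN_RE.match(line)
        if m:
            result.planned = int(m.group(1))
            continue
        m = RESULT_RE.match(line)
        if m:
            # Test numbers are optional in TAP; default to the next one.
            number = int(m.group(2)) if m.group(2) else result.ran + 1
            target = result.passed if m.group(1) == "ok" else result.failed
            target.append(number)
        # Diagnostics ("# ...") and unknown lines are ignored here.
    return result
```

Because the plan is checked against what actually ran, a test process that stops early fails even if every line it printed was "ok".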

gb.test currently mixes these two concerns. It does output TAP, but instead
of having the other party parse it, it prints a "self-summary" at the end,
for inspection by I don't know who. *Typically*, in my experience, the test
process is only concerned with outputting honest TAP and then terminates,
while the harness reads that TAP and rehashes it for the user. gb.test
does not follow this separation, and at the moment it cannot, because it
does not include a general TAP parser.

Since TAP is a language-agnostic specification, there is a universal test
harness called "prove" (it ships with Perl), which you probably have
installed on your system. I have a couple of tests here, of which I select
three to run:

  $ prove -e '' t/readme.t t/like.t t/except.t
  t/readme.t .. ok
  t/like.t .... ok
  t/except.t .. ok
  All tests successful.
  Files=3, Tests=24,  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
  Result: PASS

It ran the three executables (they are ELF binaries, so no interpreter is
needed; hence -e ''), read their TAP output, 24 assertions in total, and
summarized it briefly: apparently everything is fine. Had I wanted the
entire TAP output, I could have asked for --verbose.

At the moment, gb.test tries to do all of that in one process, and this
lack of separation lets it do "impure" things. Concretely, I mean this:

  https://gitlab.com/gambas/gambas/-/blob/3752088d37/comp/src/gb.test/.src/TestSuite/TestCase.class#L48

  Try Object.Call($MyTestModule, Me.name)
  If Error Then
    Inc Track.Counter
    Track.NOKs.Add(Track.Counter)
    Print "not ok " & Track.Counter & " " &
    Track.TestModuleName & ":" & Track.TestName & " Raised error: " & Error.Text
    Error.Clear
  Endif

If a *test method* raises an uncaught error, then a piece of code acting
like a *test harness* catches it and spoofs a test failure in the TAP
stream. Arguably that is semantically incorrect: if an error is raised and
not caught, then the *test code* is buggy, not the *tested* code.
Furthermore, this trick isn't always helpful: it inserts only one test
failure when the test method aborts due to an error at any point. What if
the test method had a couple more tests planned? Then the rest of the TAP
stream is shifted, parts of it are missing, and yet the run continues.

In the architecture I outlined above this would not be possible: tester
and testee are two separate processes, so the harness can't catch Gambas
errors raised in the test code. And the TAP protocol is designed so that
tests crashing because of uncaught exceptions or even segfaults are
ordinary, detectable error scenarios which can be handled gracefully, but
of course only if whoever analyzes the test does so from another process,
one which didn't just crash.
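A minimal Python sketch of that separation follows. The child program stands in for a hypothetical test process, and run_test is an illustrative name. The harness only observes stdout and the exit status, so a crash in the test code becomes an ordinary, reportable result.

```python
import subprocess
import sys

# Hypothetical test process: plans 3 tests, then crashes after the first.
CRASHING_TEST = (
    'print("1..3", flush=True)\n'
    'print("ok 1 - first", flush=True)\n'
    'raise RuntimeError("uncaught error in test code")\n'
)


def run_test(source):
    # The test runs in its own interpreter; its crash cannot take the
    # harness down with it.
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True)
    planned, ran, failed = None, 0, 0
    for line in proc.stdout.splitlines():
        if line.startswith("1.."):
            planned = int(line[3:])
        elif line.startswith("not ok"):
            ran += 1
            failed += 1
        elif line.startswith("ok"):
            ran += 1
    problems = []
    if proc.returncode != 0:
        problems.append(f"exited with status {proc.returncode}")
    if planned is None:
        problems.append("no plan")
    elif ran != planned:
        problems.append(f"planned {planned} tests but ran {ran}")
    if failed:
        problems.append(f"{failed} failed")
    return problems


print(run_test(CRASHING_TEST))
```

Here the uncaught exception shows up as a non-zero exit status plus a plan mismatch. On POSIX, a child killed by a signal (e.g. a segfault) would show up in the same check as a negative returncode.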

In summary, my qualm with the current gb.test is that it lacks the
separation I know from other TAP systems. I encapsulated TAP printing and
parsing in gb.test.tap (also because TAP printing was scattered over
different parts of gb.test in an ad-hoc fashion: a "not ok" here and a few
"bail out"s there). Test code should only use Assert to print TAP, and
harness code should use TestHarness to parse TAP.

I see gb.test as a sort of Trojan horse component that gets loaded by the
project and, within the current limitations, can find test modules and
invoke their test methods. These are part of the project, so they can test
even the project's internals as thoroughly as desired.

But IMO this testing should be directed from a separate process, as in
this flowchart (start at the top left):

    User selects              Results are aggregated and
    test modules <----------- reported to the user with
    and methods               appropriate detail (IDE or gbt3)
        |                             ^
        |                             |
        |     Test harness casts      |
        `---> magic at gbx3 to run ---+
              selected tests          |
                 (IDE or gbt3)        |
                   |                  |
                   |                  |
                   |    Many instances of gbx3 set up
                   `--> and run the specified tests,
                        print their TAP and exit
                          (gb.test w/ test modules)
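The right-hand half of the flowchart can be sketched in Python as follows. The SELECTED programs are hypothetical stand-ins for "gbx3 running one selected test module", and the summary only loosely imitates prove's output. Each selection gets its own process, possibly running concurrently, and the harness aggregates the results.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: each "module" is a tiny program printing TAP.
SELECTED = {
    "ModA": 'print("1..2"); print("ok 1"); print("ok 2")',
    "ModB": 'print("1..1"); print("not ok 1 - broken")',
}


def run_one(item):
    name, source = item
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True)
    lines = proc.stdout.splitlines()
    plan = next((line for line in lines if line.startswith("1..")), None)
    planned = int(plan[3:]) if plan else None
    results = [line for line in lines if line.startswith(("ok", "not ok"))]
    failed = sum(line.startswith("not ok") for line in results)
    good = proc.returncode == 0 and planned == len(results) and failed == 0
    return name, len(results), good


def harness(selected):
    # One independent process per selected module, run concurrently.
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(run_one, selected.items()))
    for name, _, good in outcomes:
        print(f"{name} .. {'ok' if good else 'FAILED'}")
    total = sum(n for _, n, _ in outcomes)
    all_good = all(good for _, _, good in outcomes)
    print(f"Files={len(outcomes)}, Tests={total}")
    print(f"Result: {'PASS' if all_good else 'FAIL'}")
    return all_good


harness(SELECTED)
```

With the two stand-ins above, this reports "ModA .. ok", "ModB .. FAILED", "Files=2, Tests=3" and "Result: FAIL".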

> Otherwise, as for running the tests from 'gb.test': As Christof said, the
> problem is that 'gb.test' cannot see the test modules if they are not
> exported. Which is somewhat logical.
> 
> But I don't want the test modules to be exported, they will pollute the
> global symbol table.
> 

Agreed on the second point.

> So I bring the following suggestions:
> 
> 1) Add a test-running option to the interpreter. (Possibly use '-t' and
> find another name for the tracing option.)
> 
> 2) The running test option will have an argument that specifies which test
> modules to run (everything by default).
> 
> 3) The interpreter in test-running mode will not run the startup class, but
> will load every test module, and will run a global public method in a
> hardcoded global public class that must be implemented by the 'gb.test'
> component. For example "Test.Main". It will receive the class objects as
> arguments. This is how it will be able to access non-exported test modules
> and run them.
> 
> 4) The '.test' file created by the compiler is still needed. It will be used
> by the interpreter to find the test modules.
> 
> 5) That way anyone can implement another test system, provided that it
> implements Test.Main().
> 
> 6) I guess the test modules should have common public methods to be used by
> 'gb.test'. Can it be made standard? If not, it's not a big deal, as a
> different test system will have different assertion methods.
> 
> What do you think?
> 

Yes, this is exactly the system I had in mind, give or take a few
implementation details. Special treatment in the interpreter had
to come into play at some point.
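Points 2 and 3 of the proposal can be sketched as a Python analogue. Everything here is hypothetical: interpreter_test_mode plays the interpreter in test mode, Test.Main mirrors the suggested entry point, and the rule "every public static method of a test module is a test" is only an assumption standing in for point 6. The key idea is that the framework never looks test modules up by global name. It receives the class objects directly.

```python
import inspect


class TestModuleA:
    # Hypothetical non-exported test module: never looked up by name,
    # only handed over as a class object.
    @staticmethod
    def Addition(t):
        t.ok(1 + 1 == 2, "addition")

    @staticmethod
    def Strings(t):
        t.ok("a" * 3 == "aaa", "repeat")


class _Tap:
    def __init__(self):
        self.counter = 0

    def ok(self, cond, desc):
        self.counter += 1
        print(f"{'ok' if cond else 'not ok'} {self.counter} - {desc}")

    def diag(self, msg):
        print(f"# {msg}")


class Test:
    """Hypothetical framework entry point standing in for Test.Main."""

    @staticmethod
    def Main(*modules):
        t = _Tap()
        for module in modules:
            # Assumption: every public static method is a test method.
            for name, method in inspect.getmembers(module, inspect.isfunction):
                if not name.startswith("_"):
                    t.diag(f"{module.__name__}:{name}")
                    method(t)
        print(f"1..{t.counter}")  # TAP allows the plan at the end


def interpreter_test_mode(all_test_modules, selected=None):
    # Pick the requested modules (all by default) and pass the class
    # objects to the hardcoded entry point instead of the startup class.
    chosen = [m for m in all_test_modules
              if selected is None or m.__name__ in selected]
    Test.Main(*chosen)


interpreter_test_mode([TestModuleA])
```

Any other framework could be swapped in by providing a different Test.Main, which is the openness point 5 asks for.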

Keeping it open to alternative test frameworks is definitely a
bonus -- though making one is not my intention at the moment.

Regards,
Tobias

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk

