[Gambas-user] Coming back to gb.test

Tobias Boege taboege at gmail.com
Fri Apr 3 01:30:01 CEST 2020


Hi Christof, Benoît and whoever cares about testing,

sorry for my absence while the gb.test situation is broken. I've always
been a single-tasker, slow to context-switch -- even more so since late
2018 (Meltdown?) -- and working from home now just makes it worse.

Christof has reminded me what the critical points are:

On Thu, 02 Apr 2020, Christof Thalhofer wrote:
> ----------------------------------------------------------------
> 1) Assert is now in gb.test.tap and not in gb.test. But this makes no
> sense because asserts must not be associated with the output format. In
> addition, the flow of testing and assertions cannot be debugged inside
> gb.test any more.
> 

I agree with this. gb.test must contain the Assert module, which provides
"high-level" assertions like comparing strings and printing diagnostics
when the test fails ("expected this string but got that string").

But gb.test.tap should still export a *minimal* Assert module that just
contains Ok, Diag, BailOut, Todo, Skip and Subtest functionality.
gb.test builds its assertions on top of these primitives by extending
the Assert module. This allows TAP to be switched out for another format
as long as the other format's Assert module provides the above primitives.
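
To make this concrete, here is a minimal sketch of what one of gb.test's
high-level assertions could look like on top of those primitives. The
method name is made up and I assume that Ok() returns whether the
assertion passed; this is not actual gb.test code:

--8<-- hypothetical sketch
' In gb.test, built on the minimal Assert primitives from gb.test.tap.
Public Sub EqualsString(Got As String, Expected As String, Optional Description As String)

  If Not Ok(Got = Expected, Description) Then
    ' On failure, emit diagnostics through the Diag primitive.
    Diag("expected: " & Expected)
    Diag("but got:  " & Got)
  Endif

End
--------------------------------------------------------------------------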

I think Benoît also suggested something along those lines, so it seems we
are on the same page. Agreed?

> 2) gb.test.tap does not output TAP line by line (+/- summary) as it did
> before. This is necessary so that, if something in the test system itself
> breaks, the last state can be seen.
> 

This would be a bug. TapPrinter uses the Print instruction followed
by a Flush, so every line of output should be immediately visible.
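
In other words, for every TAP line the printer is supposed to do no more
than this (a minimal sketch, not the actual TapPrinter code):

--8<-- hypothetical sketch
' Print the TAP line, then flush so it cannot get stuck in a buffer
' if the test process crashes right afterwards.
Print "ok 1 - some test"
Flush
--------------------------------------------------------------------------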

An exception used to be gb.test's self-tests: those had to be buffered
inside the process producing the TAP in order to summarize them. I think
I changed that by adding a RawTap option, but I'm seeing the problem you
describe: the output is indeed not showing.

So yes, I acknowledge that bug and it needs to be fixed. I think I knew
last month what was causing it...

> 3) TestFailures.test is broken and does not run any more. It was
> necessary to ensure that crashes and failures were handled and reported
> well by gb.test.
> ----------------------------------------------------------------

More specifically:

On Tue, 10 Mar 2020, Christof Thalhofer wrote:
> In gb.test TestFailures.test has 11 Asserts that have to report "not ok".
> 

On this I'm sceptical.

This might be a good occasion to share a doodle [1] of a more elaborate
testing flowchart than the one Christof posted earlier [2].
I hope it is understandable; it is more detailed and specifies the process
boundary and what the concerns of the components are. Notice, in comparison
with [2], that gb.test.tap is used in two places in [1]: once on the
producer's end and once on the analyzer's end. gb.test, which lives in
the process emitting TAP, does not have access to the test summary in my
model. The TAP goes straight back to the test harness. gb.test's only job
is to set up and tear down the environment and to invoke the test code that
the user wants to run, from inside the process that contains that code.
gbt3 and the IDE are two alternative implementations of a test harness:
gbt3 is suitable for automated testing in a continuous integration or
test-on-installation context, while the IDE allows you to do test-driven
development.
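
Roughly, as described above, the flow in [1] looks like this:

  [test process]                             [harness: gbt3 or IDE]
  gb.test: set up, invoke tests, tear down
  gb.test.tap: emit TAP      --- TAP --->    gb.test.tap: parse, summarize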

In light of this flowchart, first of all: should we continue to catch and
handle errors in test code in gb.test? For convenience, here's the wrapping
we are talking about, slightly simplified:

--8<-- https://gitlab.com/gambas/gambas/-/blob/3c09d70986a95/comp/src/gb.test/.src/TestSuite/TestCase.class#L48
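' Catch any error the test method raises, report it as a failed
' assertion in the TAP stream, then continue with the next test: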
Try Object.Call(<test method>)
If Error Then
  Print "not ok " & ... & TestModuleName & ":" & Track.TestName & " Raised error: " & Error.Text
  Error.Clear
Endif
--------------------------------------------------------------------------

The effect, when a test method raises a Gambas error, is that the error is
caught and cleared, the message is inserted into the TAP stream as a failed
assertion with some data about where it happened, and testing continues.

Now, suppose we don't have gb.test catch errors from test methods. What will
happen is that the interpreter just dies. It will print the error message
and where it was raised to stderr and terminate with a non-zero exit code.
This is pretty much the information that you inject into the TAP stream
(except that the error location in the interpreter's message is where the
error happened exactly, not necessarily the test module/method name).
The test harness gets this information in this scenario as well. The only
thing that doesn't happen is that testing continues with the next test
method, which is alright, I would say -- *except* when you want to run a
test like your TestFailures.test, which must fail and must raise errors
to check that gb.test catches them appropriately. But if gb.test doesn't
include error-catching logic, this test can simply be removed!
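
To spell out what "the interpreter just dies" means, a made-up test method
like this one would, without the Try wrapper, print its error and location
to stderr and end the whole test process with a non-zero exit code:

--8<-- made-up example
Public Sub TestSomething()

  ' Uncaught, this takes down the interpreter instead of producing
  ' a "not ok" line in the TAP stream.
  Error.Raise("boom")

End
--------------------------------------------------------------------------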

What are the downsides of not catching errors? When it dies, the interpreter
prints by itself all the information that we assemble by hand anyway, so
that's not a problem. And even better: when the IDE runs our test process
and the error
is uncaught, the debugger kicks in and lets us examine the stack frame and
local variables where the error happened (subject to the usual constraints:
you can't debug code that is in a different component, but we will be able
to catch errors in the project being tested). This would not be possible
when you catch the error and convert it into a failed assertion. Debugging
ability is improved by *not* catching those errors.

As a historical side note: I think at some point we agreed that inserting
a runtime error as a failed assertion is lying to some extent. Then we
converted the «Print "not ok"» part into a «BailOut», which better follows
the semantics of TAP. The only reason we now need to go back to the old
way is to make TestFailures.test work. Is that right?

I know that you expended much effort on making sure that errors are caught
properly by gb.test and that everything is reported gracefully, but with the
multi-process architecture laid out in my flowchart, this issue can be
solved differently: by honestly crashing. N.B. that my flowchart specifies
a hypothetical future way for testing to work. It is not possible without
introducing the test mode and teaching the IDE about testing so that we
can do test mode *in* debug mode. I think it's worth it and we should
implement that flow.

Anyway, should we still decide to keep the Try-If-Error part -- which I
think is doable, too -- then what makes this test a bit weird to handle
is that it must produce all "not ok"s in order to count as a success.
I would like it to be rewritten, if possible, so that it produces "ok"s
when gb.test does its job properly. This way a normal TAP parser (instead
of two specially instructed "this is good if it all fails" human eyes)
will tell us whether gb.test behaves correctly.

Regards,
Tobi

PS: I have had a local branch (with no commits but quite a few staged
changes) towards the flowchart [1] since early March. Unless the plan is
overturned, I will polish those changes up and push them for review over
the weekend.

[1] https://gitlab.com/snippets/1960494
[2] https://gitlab.com/gambas/gambas/-/blob/3c09d70986a95/comp/src/gb.test/.hidden/flowchart.svg

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk

