[Gambas-bugtracker] Bug #1520: HtmlDocument confused by two doctypes in same document

bugtracker at gambaswiki.org
Fri Feb 1 12:00:53 CET 2019


http://gambaswiki.org/bugtracker/edit?object=BUG.1520&from=L21haW4-

Comment #1 by Tobias BOEGE:

Fixed in 579c9e1fc. The code did anticipate a lowercase <!doctype>, but it searched for the uppercase form first and
(unintentionally) prioritised any <!DOCTYPE> it found, even one further into the document than the lowercase one.
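To illustrate the ordering problem, here is a minimal Python sketch. It is not the Gambas code from 579c9e1fc, and the function names are hypothetical. The buggy variant looks for the uppercase form first and returns it even when a lowercase doctype appears earlier. The fixed variant does a single case-insensitive search, so the earliest declaration wins regardless of case.

```python
import re

# Matches the start of a doctype declaration in any letter case,
# e.g. "<!DOCTYPE", "<!doctype" or "<!DocType".
DOCTYPE_RE = re.compile(r"<!doctype\b", re.IGNORECASE)


def first_doctype_offset(html: str) -> int:
    """Return the offset of the earliest doctype declaration, or -1."""
    match = DOCTYPE_RE.search(html)
    return match.start() if match else -1


def first_doctype_offset_buggy(html: str) -> int:
    """Hypothetical reproduction of the bug: the uppercase form is
    searched for first and wins even if a lowercase one comes earlier."""
    upper = html.find("<!DOCTYPE")
    if upper != -1:
        return upper
    return html.find("<!doctype")


doc = "<!doctype html>\n<html><body>\n<!DOCTYPE html>\n</body></html>"
print(first_doctype_offset(doc))        # 0  (the lowercase one at the top)
print(first_doctype_offset_buggy(doc))  # 29 (the later uppercase one)
```

A single case-insensitive pass avoids having to merge the results of two separate searches.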

The bad news is that this fix is not enough to parse a Google search results page correctly. One huge
<script> tag on that page contains an embedded "</script>", probably inside a string literal, which throws the parser off.
It seems the HTML parser would have to learn the quoting rules of JavaScript (and of CSS, for that matter,
and of whatever else can be embedded directly) for that endeavour to succeed...
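The idea floated above can be sketched in Python as follows. The function names are hypothetical, and the sketch handles only '...' and "..." strings with backslash escapes. It ignores template literals, comments and regex literals. It illustrates the quote-aware approach suggested in the comment; the HTML5 specification's tokenizer does not itself follow JavaScript quoting rules inside <script>.

```python
def find_script_end_naive(html: str, start: int) -> int:
    """Offset of the first "</script" at or after start, ignoring quotes."""
    return html.lower().find("</script", start)


def find_script_end_quote_aware(html: str, start: int) -> int:
    """Offset of the first "</script" at or after start that is not inside
    a JavaScript '...' or "..." string literal (simplified, see above)."""
    lower = html.lower()
    quote = None  # quote character of the string literal we are inside, if any
    i = start
    while i < len(html):
        ch = html[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if ch == quote:
                quote = None  # the string literal ends here
        elif ch in ("'", '"'):
            quote = ch  # a string literal starts here
        elif lower.startswith("</script", i):
            return i
        i += 1
    return -1


page = '<script>var s = "</script>"; run(s);</script><p>after</p>'
body_start = len("<script>")
print(find_script_end_naive(page, body_start))        # 17: inside the string
print(find_script_end_quote_aware(page, body_start))  # 36: the real end tag
```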

Tobias BOEGE changed the state of the bug to: Fixed.



