From bugtracker at gambaswiki.org Fri Feb 1 01:13:56 2019
From: bugtracker at gambaswiki.org (bugtracker at gambaswiki.org)
Date: Fri, 01 Feb 2019 00:13:56 GMT
Subject: [Gambas-bugtracker] Bug #1520: HtmlDocument confused by two doctype's in same document
Message-ID:

http://gambaswiki.org/bugtracker/edit?object=BUG.1520&from=L21haW4-

T. Lee DAVIDSON reported a new bug.

Summary
-------
HtmlDocument confused by two doctype's in same document

Type : Bug
Priority : Medium
Gambas version : 3.12
Product : XML components

Description
-----------
A search on Google yields a page whose source contains two DOCTYPE tags. The one at the top of the page is in lowercase letters. Another one, in uppercase letters, is embedded within a Javascript element further down in the page. HtmlDocument seems to prefer the uppercase DOCTYPE tag and therefore truncates the top of the document.

System information
------------------
[System]
Gambas=3.12.90 e467d664c (master)
OperatingSystem=Linux
Kernel=4.4.165-81-default
Architecture=x86_64
Distribution=openSUSE Leap 42.3
Desktop=KDE5
Theme=QtCurve
Language=en_US.UTF-8
Memory=3951M

[Libraries]
Cairo=/usr/lib64/libcairo.so.2.11502.0
Curl=/usr/lib64/libcurl.so.4.3.0
DBus=/lib64/libdbus-1.so.3.8.14
GStreamer=/usr/lib64/libgstreamer-0.10.so.0.30.0
GStreamer=/usr/lib64/libgstreamer-1.0.so.0.803.0
GTK+2=/usr/lib64/libgtk-x11-2.0.so.0.2400.31
GTK+3=/usr/lib64/libgtk-3.so.0.2000.10
OpenGL=/usr/lib64/libGL.so.1.2.0
Poppler=/usr/lib64/libpoppler.so.60.0.0
QT4=/usr/lib64/libQtCore.so.4.8.7
QT5=/usr/lib64/libQt5Core.so.5.6.2
SDL=/usr/lib64/libSDL-1.2.so.0.11.4
SQLite=/usr/lib64/libsqlite3.so.0.8.6

[Environment]
ALSA_CONFIG_PATH=/etc/alsa-pulse.conf AUDIODRIVER=pulseaudio COLORTERM=1 CONFIG_SITE=/usr/share/site/x86_64-unknown-linux-gnu CPU=x86_64 CSHEDIT=emacs CVS_RSH=ssh DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-zBWiqk4h0x,guid=252f96c0cc2313f8de42943f5c532800 DESKTOP_SESSION=/usr/share/xsessions/plasma5 DISPLAY=:0 FROM_HEADER= GB_GUI=gb.qt5 GOARCH=amd64
GOOS=linux GOPATH=/go:/usr/share/go/1.9/contrib GOROOT=/usr/lib64/go/1.9 GPG_AGENT_INFO=/tmp/gpg-oLEFoq/S.gpg-agent:2400:1 GPG_TTY=not a tty GS_LIB=/.fonts GTK2_RC_FILES=/etc/gtk-2.0/gtkrc:/.gtkrc-2.0 GTK_IM_MODULE=cedilla GTK_MODULES=canberra-gtk-module G_BROKEN_FILENAMES=1 G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-15,CP1252 HISTSIZE=1000 HOME= HOST= HOSTNAME= HOSTTYPE=x86_64 INPUTRC=/.inputrc JAVA_BINDIR=/usr/lib64/jvm/java/bin JAVA_HOME=/usr/lib64/jvm/java JAVA_ROOT=/usr/lib64/jvm/java JDK_HOME=/usr/lib64/jvm/java JRE_HOME=/usr/lib64/jvm/java/jre KDE_FULL_SESSION=true KDE_SESSION_UID=1000 KDE_SESSION_VERSION=5 KOTLIN_HOME=/.sdkman/candidates/kotlin/current LANG=en_US.UTF-8 LESS=-M -I -R LESSCLOSE=lessclose.sh %s %s LESSKEY=/etc/lesskey.bin LESSOPEN=lessopen.sh %s LESS_ADVANCED_PREPROCESSOR=no LE_WORKING_DIR=/.acme.sh LOGNAME= MACHTYPE=x86_64-suse-linux MAIL=/var/spool/mail/ MANPATH=/usr/local/man:/usr/share/man MINICOM=-c on MORE=-sl NNTPSERVER=news OSTYPE=linux PAGER=less PATH=/.sdkman/candidates/kotlin/current/bin:/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games PROFILEREAD=true PULSE_PROP_OVERRIDE_application.icon_name=plasma PULSE_PROP_OVERRIDE_application.name=Plasma PULSE_PROP_OVERRIDE_application.version=5.8.7 PWD= PYTHONSTARTUP=/etc/pythonstart QEMU_AUDIO_DRV=pa QMLSCENE_DEVICE= QSG_RENDER_LOOP= QT_AUTO_SCREEN_SCALE_FACTOR=0 QT_IM_MODULE=xim QT_IM_SWITCHER=imsw-multi QT_SYSTEM_DIR=/usr/share/desktop-data SDKMAN_CANDIDATES_DIR=/.sdkman/candidates SDKMAN_CURRENT_API=https://api.sdkman.io/2 SDKMAN_DIR=/.sdkman SDKMAN_LEGACY_API=https://api.sdkman.io/1 SDKMAN_PLATFORM=Linux64 SDKMAN_VERSION=5.5.13+272 SDK_HOME=/usr/lib64/jvm/java SDL_AUDIODRIVER=pulse SESSION_MANAGER=local/:@/tmp/.ICE-unix/2468,unix/:/tmp/.ICE-unix/2468 SHELL=/bin/bash SHLVL=2 SSH_AGENT_PID=2399 SSH_ASKPASS=/usr/lib/ssh/ksshaskpass SSH_AUTH_SOCK=/tmp/ssh-Ob3ovMoxbShy/agent.2282 TERM=xterm TZ=:/etc/localtime USER= VDPAU_DRIVER=va_gl WINDOWMANAGER=/usr/bin/startkde 
XAUTHLOCALHOSTNAME= XAUTHORITY=/.Xauthority XCURSOR_SIZE=0 XCURSOR_THEME=breeze_cursors XDG_CONFIG_DIRS=/etc/xdg XDG_CURRENT_DESKTOP=KDE XDG_DATA_DIRS=/usr/share XDG_RUNTIME_DIR=/run/user/1000 XDG_SEAT=seat0 XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0 XDG_SESSION_CLASS=user XDG_SESSION_DESKTOP=KDE XDG_SESSION_ID=2 XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session2 XDG_SESSION_TYPE=x11 XDG_VTNR=7 XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB XMODIFIERS=@im=local XNLSPATH=/usr/share/X11/nls XSESSION_IS_UP=yes _=/usr/bin/kstart

From bugtracker at gambaswiki.org Fri Feb 1 11:01:30 2019
From: bugtracker at gambaswiki.org (bugtracker at gambaswiki.org)
Date: Fri, 01 Feb 2019 10:01:30 GMT
Subject: [Gambas-bugtracker] Bug #1515: gb.report2
In-Reply-To:
References:
Message-ID:

http://gambaswiki.org/bugtracker/edit?object=BUG.1515&from=L21haW4-

Comment #5 by Michael ALTROGGE:

Aaaah ... good to know that I can submit bug reports in my native language ...

From bugtracker at gambaswiki.org Fri Feb 1 12:00:53 2019
From: bugtracker at gambaswiki.org (bugtracker at gambaswiki.org)
Date: Fri, 01 Feb 2019 11:00:53 GMT
Subject: [Gambas-bugtracker] Bug #1520: HtmlDocument confused by two doctype's in same document
In-Reply-To:
References:
Message-ID:

http://gambaswiki.org/bugtracker/edit?object=BUG.1520&from=L21haW4-

Comment #1 by Tobias BOEGE:

Fixed in 579c9e1fc. The code did anticipate a lowercase doctype, but it (unintentionally) prioritised any uppercase one it found, even if that one was further into the document.

The bad news is that this fix is not enough to correctly parse a Google search results page. That one huge ", probably in a string or so, which throws the parsing off. It seems like the HTML parser has to learn about quoting rules in Javascript (and CSS for that matter, and whatever else you can embed directly) for that endeavour to succeed...

Tobias BOEGE changed the state of the bug to: Fixed.
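
Editorial illustration of the doctype behaviour described in Comment #1. This is only a minimal sketch in Python, not the gb.xml code and not the change made in 579c9e1fc; the function names and the sample page are hypothetical. It contrasts a lookup that lets an uppercase doctype win wherever it sits with one that takes the earliest doctype regardless of letter case.

```python
import re

# Hypothetical helpers for illustration only; they are not part of gb.xml.
_DOCTYPE = re.compile(r"<!doctype\b", re.IGNORECASE)


def uppercase_first_offset(html: str) -> int:
    """Mimic the reported behaviour: an uppercase doctype wins if one exists."""
    upper = html.find("<!DOCTYPE")
    if upper != -1:
        return upper
    return html.find("<!doctype")


def earliest_doctype_offset(html: str) -> int:
    """Return the offset of the first doctype in any letter case, or -1."""
    match = _DOCTYPE.search(html)
    return match.start() if match else -1


# Assumed sample: a lowercase doctype at the top and an uppercase one
# inside a Javascript string further down, as in the bug description.
page = (
    "<!doctype html><html><head><script>"
    "var s = '<!DOCTYPE html>';"
    "</script></head><body></body></html>"
)

if __name__ == "__main__":
    print("uppercase-first lookup:", uppercase_first_offset(page))
    print("earliest lookup:", earliest_doctype_offset(page))
```

With the sample page, the uppercase-first lookup lands inside the script element, which would cut off everything before it, while the case-insensitive lookup stays at the start of the document. Neither lookup addresses the separate problem of markup-like text inside embedded Javascript.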
From bugtracker at gambaswiki.org Fri Feb 1 16:23:24 2019
From: bugtracker at gambaswiki.org (bugtracker at gambaswiki.org)
Date: Fri, 01 Feb 2019 15:23:24 GMT
Subject: [Gambas-bugtracker] Bug #1520: HtmlDocument confused by two doctype's in same document
In-Reply-To:
References:
Message-ID:

http://gambaswiki.org/bugtracker/edit?object=BUG.1520&from=L21haW4-

Comment #2 by T. Lee DAVIDSON:

Thank you, Tobi. Just FYI: I had saved the source from three separate Google search pages and just searched through all of them for the string: script . I did not find any embedded (i.e. within a string)