[Gambas-devel] Some new littles things

Ron Onstenk ronstk at ...124...
Fri Nov 7 16:57:28 CET 2008


On Friday 07 November 2008, birchy wrote:
> 
> Fabien Bodard-4 wrote:
> > This function allows reading a web page directly in one shot and
> > returns it as a string

Not really. What about the required cookies, or passing variables to the site, as in
http://www.google.fr/?q=gambas+HttpClient&l=fr_FR
with cookies carrying the adult-content safe-search setting from your preferences?
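
To make the point concrete, here is a minimal sketch in Python (used only for illustration, this is not Gambas code) of what a one-shot fetch would also have to carry: the query variables and a cookie header. The helper name build_request and the cookie name "safe" are made up for the example; the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, cookies):
    """Build (but do not send) a GET request with query variables and cookies."""
    # urlencode escapes spaces as '+', giving the same form as the URL above.
    url = f"{base_url}?{urlencode(params)}"
    # Cookies travel in a single "Cookie" header as name=value pairs.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


# "safe" is a hypothetical cookie name standing in for the search preference.
req = build_request(
    "http://www.google.fr/",
    {"q": "gambas HttpClient", "l": "fr_FR"},
    {"safe": "on"},
)
print(req.full_url)
print(req.get_header("Cookie"))
```

A function that only takes a URL and returns a string has no place for the second and third arguments, which is the objection.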
> 
> You have read my mind! This is exactly the kind of tool that I would use in
> a LOT of my projects.

Since everything is a file in the Linux world, the file component's main use is
storing files on the hard disk.

> At the moment I am using the HttpClient of 
> gb.web.curl. 

Correct, since the gb.net.curl component is built for handling this kind
of file.
It accepts fragmented files from the server, which the file component cannot do.

> An HTML parsing library would be excellent, but I don't know how 
> easy it would be to write one as HTML is not as uniform as XML.

XHTML is XML, but XML is not HTML; classic HTML is SGML-based and only resembles XML.

When the file declares a <!DOCTYPE>, you should use the referenced *.dtd to parse the HTML file.
If the declaration is missing, as on many hobby sites, you pick one of the 4 or 5 existing ones.
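
As a rough sketch of that selection step (in Python for illustration only): read the public identifier from the DOCTYPE and fall back to a default when none is declared. The function name declared_dtd and the choice of HTML 4.01 Transitional as the fallback are my assumptions, not a rule.

```python
import re

# Assumed fallback for pages without a usable DOCTYPE (a choice, not a rule).
FALLBACK_DTD = "-//W3C//DTD HTML 4.01 Transitional//EN"

# Captures the public identifier in: <!DOCTYPE html PUBLIC "..." "...">
_DOCTYPE = re.compile(r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]+)")?', re.IGNORECASE)


def declared_dtd(html):
    """Return the DTD public identifier declared in the page, or the fallback."""
    match = _DOCTYPE.search(html)
    if match and match.group(1):
        return match.group(1)
    return FALLBACK_DTD


strict_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><html></html>'
)
print(declared_dtd(strict_page))
print(declared_dtd("<html></html>"))
```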

The biggest problem with HTML files is elements used in places where
they are not allowed,
e.g. text decoration tags placed outside the container that holds the text.

<u>
<b>
<p>
underlined bold text
</p>
</u>
</b>

Although it works almost everywhere, the U and B tags must be inside the P;
secondly, the closing U and B tags should come in the reverse order of the opening ones.
In this example strict.dtd should fail, but transitional may work.
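
The ordering half of that complaint can be checked mechanically. Below is a minimal Python sketch (illustration only) that keeps a stack of open tags and reports end tags that arrive out of order. The class and function names are mine, and it does not check the DTD content model, so it will not notice that P sits inside B.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NestingChecker(HTMLParser):
    """Collects end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif tag in self.open_tags:
            self.problems.append(f"</{tag}> closes before </{self.open_tags[-1]}>")
            # Drop the innermost open element with this name and carry on.
            index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[index]
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


bad = "<u><b><p>underlined bold text</p></u></b>"
good = "<p><u><b>underlined bold text</b></u></p>"
print(check_nesting(bad))
print(check_nesting(good))
```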

In fact the problem is not that HTML is not uniform, nor the parser;
the website developer is the bad guy :)
I've also seen this kind of code in web pages made with non-web-page editors
such as MS Word (play a little with colour, U and B and you will see the garbage that results).

As for easy: it will not be, given all the variations you will hit and have to handle.
A good example of how complex it can get is the Firefox source.
What you want is the DOM representation.

The only option I see is using the KHTML (gecko) library available in the web component.
What we need is either a multi-dimensional array containing the DOM or
a Gambas interface to request objects from the DOM; the latter is already partially available in the xml
component.
In VB this is the WebBrowser and WebBrowser2 component.
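
To show what "requesting objects from the DOM" could look like, here is a small Python sketch (for illustration only, not a Gambas or KHTML API). It builds a tree of nodes from a page and offers one lookup function. Node, TreeBuilder, parse, find_all and text_of are names invented for this example, and the error recovery is far simpler than what a real browser engine does.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements without an end tag never become the parent of later content.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node objects or text strings


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_nodes = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.open_nodes[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.open_nodes.append(node)

    def handle_endtag(self, tag):
        # Close up to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.open_nodes) - 1, 0, -1):
            if self.open_nodes[i].tag == tag:
                del self.open_nodes[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.open_nodes[-1].children.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Return every descendant element with the given tag name, in document order."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                found.append(child)
            found.extend(find_all(child, tag))
    return found


def text_of(node):
    """Join the text of a node and all of its descendants."""
    return " ".join(c if isinstance(c, str) else text_of(c) for c in node.children)


doc = parse('<html><body><p class="x">Hello <b>world</b></p></body></html>')
paragraphs = find_all(doc, "p")
print(paragraphs[0].attrs, text_of(paragraphs[0]))
```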

Just my 2 cents

Ron 1st




