<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<pre class="moz-quote-pre" wrap="">What you have there is JSON embedded in a &lt;script&gt; tag inside HTML.
<blockquote type="cite"><pre class="moz-quote-pre" wrap="">Thankfully, this data has its own &lt;script&gt; tag which is appropriately
type'd as application/ld+json, so it is not so hard to extract using
gb.xml.html and gb.web:
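' Parse the saved page (HtmlDocument comes from gb.xml.html)
Dim h As New HtmlDocument(your file here...)
Dim el As XmlElement
' One decoded JSON value per JSON-LD block (JSON comes from gb.web)
Dim data As New Collection[]
' Keep only the script tags whose type marks them as JSON-LD
For Each el In h.GetElementsByTagName("script")
  If el.Attributes["type"] &lt;&gt; "application/ld+json" Then Continue
  data.Add(JSON.Decode(el.TextContent))
Next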
</pre></blockquote>
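To see which keys the decoded JSON-LD actually contains, a small recursive dump can help. This is an untested sketch: DumpJson is a hypothetical helper name, and it assumes JSON.Decode returns a Collection for each JSON object and a Variant[] for each JSON array.

```gambas
' Untested sketch: print the key structure of a value returned by JSON.Decode.
' Assumption: JSON objects decode to Collection, JSON arrays to Variant[].
' DumpJson is a hypothetical helper name.
Public Sub DumpJson(vValue As Variant, Optional iDepth As Integer)

  Dim vItem As Variant
  Dim cObj As Collection

  If TypeOf(vValue) = gb.Object Then
    If vValue Is Collection Then
      ' JSON object: print each key, then dump its value one level deeper
      cObj = vValue
      For Each vItem In cObj
        Print Space$(iDepth * 2); cObj.Key
        DumpJson(vItem, iDepth + 1)
      Next
    Else
      ' Assumed to be an enumerable array, e.g. a Variant[] from a JSON array
      For Each vItem In vValue
        DumpJson(vItem, iDepth + 1)
      Next
    Endif
  Else
    ' Plain value (string, number, boolean or null)
    Print Space$(iDepth * 2); "= "; Str$(vValue)
  Endif

End
```

For example, calling DumpJson(data[0]) after the loop above would list the keys of the first JSON-LD block, which shows where the programme details live before writing code to read them.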
Thanks, Tobias, for your code. I tried it (with the Gambas components gb.net, gb.net.curl, gb.web, gb.xml and gb.xml.html included; was that overkill?) and it produced a large file named HTMLandXML.txt. I've extracted part of it into the attached file PartOfHTMLandXML.txt. However, I have no idea what to do with it. There is a largish block of 'code' for each TV programme. Some sample lines, with the data I want to obtain shown in bold, are:
line 212: &lt;h3 class="programme__titles"&gt;&lt;a href="<b><a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/m00049tf">https://www.bbc.co.uk/programmes/m00049tf</a></b>"
lines 214-215: &gt;&lt;span class="programme__title delta"&gt;&lt;span&gt;<b>Have I Got a Bit More News for You</b>&lt;/span&gt;&lt;/span&gt;&lt;span class="hidden"&gt;—&lt;/span&gt;&lt;span class="programme__subtitle centi"&gt;&lt;span&gt;Series 57&lt;/span&gt;, &lt;span&gt;<b>Episode 2</b>&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/h3&gt;
lines 219-220: &lt;abbr title="<b>Episode 2 of 9</b>"&gt;&lt;span datatype="xsd:int"&gt;2&lt;/span&gt;/&lt;span class="programme__groupsize"&gt;9&lt;/span&gt;&lt;/abbr&gt; &lt;span&gt;<b>Guest host Alan Johnson joins Ian Hislop and Paul Merton for the topical news quiz.</b>&lt;/span&gt;
I have no idea how to extract this data from the file. Which Gambas component should I use? Is there an example, tutorial or book on extracting this kind of data in Gambas? I am a real newbie to HTML, XML and so on!
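One possible approach is to skip the JSON and read the fields straight from the HTML by the class names visible in the sample lines. This is an untested sketch: it assumes gb.xml.html's HtmlDocument.GetElementsByClassName and gb.xml's XmlElement.GetChildrenByTagName and GetAttribute behave as their names suggest, and "schedule.html" is a hypothetical file name. The "Episode 2 of 9" and synopsis lines are not covered, because in the lines shown they sit outside the heading and the synopsis span has no class to search for.

```gambas
' Untested sketch: print the programme URL, title and subtitle from a saved
' BBC schedule page, located by the CSS class names seen in the page source.
' "schedule.html" is a hypothetical file name.
' Needs the gb.xml and gb.xml.html components.
Public Sub Main()

  Dim hDoc As New HtmlDocument("schedule.html")
  Dim hHeading As XmlElement
  Dim hSpan As XmlElement
  Dim aLinks As XmlElement[]

  ' Assumption: one h3 heading with class "programme__titles" per programme
  For Each hHeading In hDoc.GetElementsByClassName("programme__titles")

    ' The programme page URL is the href of the link inside the heading
    aLinks = hHeading.GetChildrenByTagName("a")
    If aLinks.Count = 0 Then Continue
    Print "URL:      "; aLinks[0].GetAttribute("href")

    ' The link holds the title span and the subtitle span
    For Each hSpan In aLinks[0].GetChildrenByTagName("span")
      Select Case hSpan.GetAttribute("class")
        Case "programme__title delta"
          Print "Title:    "; Trim(hSpan.TextContent)
        Case "programme__subtitle centi"
          ' Text of the whole subtitle, e.g. series and episode together
          Print "Subtitle: "; Trim(hSpan.TextContent)
      End Select
    Next

    ' Blank line between programmes
    Print

  Next

End
```

Matching the full class attribute string keeps the sketch simple but is fragile: if the BBC adds or reorders a class on those spans, the Case lines would need updating.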
</pre>
</body>
</html>