<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<div class="moz-cite-prefix">On 20/08/2020 11:12,
<a class="moz-txt-link-abbreviated" href="mailto:user-request@lists.gambas-basic.org">user-request@lists.gambas-basic.org</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:mailman.0.1597918322.23951.user@lists.gambas-basic.org">
<table class="header-part1" width="100%" cellspacing="0"
cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="headerdisplayname" style="display:inline;">Subject:
</div>
Re: [Gambas-user] How to Disassemble XML/HTML</td>
</tr>
<tr>
<td>
<div class="headerdisplayname" style="display:inline;">From:
</div>
            T Lee Davidson <a class="moz-txt-link-rfc2396E" href="mailto:t.lee.davidson@gmail.com">&lt;t.lee.davidson@gmail.com&gt;</a></td>
</tr>
<tr>
<td>
<div class="headerdisplayname" style="display:inline;">Date:
</div>
19/08/2020, 15:18</td>
</tr>
</tbody>
</table>
<table class="header-part2" width="100%" cellspacing="0"
cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="headerdisplayname" style="display:inline;">To:
</div>
<a class="moz-txt-link-abbreviated" href="mailto:user@lists.gambas-basic.org">user@lists.gambas-basic.org</a></td>
</tr>
</tbody>
</table>
<br>
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-unicode">On 8/19/20 2:31 AM, John Rose
wrote:
<br>
<blockquote type="cite" style="color: #000000;">I have some
questions:
<br>
<br>
1. Can you recommend a printed book and/or online tutorial to
help me understand your coding (in the routines processing the
        HTML and XML data, such as what the "@graph" element is) and the
        concepts behind it? Please remember that I'm a newbie to
        HTML &amp; XML etc.
<br>
<br>
2. Are all of the following Gambas Components required in the
attached httpClientExtra app (your modified httpClient app
        slightly changed by me): gb.net, gb.net.curl, gb.web, gb.xml,
gb.xml.html?
<br>
<br>
3. Is there a Gambas component and/or standard coding to
extract values from the Episodes information? I'm thinking of
the identifier, episodeNumber, description, datePublished,
name & url fields. I'd like to extract them into the
corresponding aIdentifier, aEpisodeNumber, aDescription,
aDatePublished, aName & aURL Gambas string arrays, for
each Episode's set of data. Obviously I could code this
myself, but it would be nice if there are already routine(s)
written to do this kind of thing.
<br>
<br>
4. What coding is required to put the partOfSeries &
partOfSeason sections (from the Prettified JSON data)
immediately after the episode data for each Episode in the
Episodes text & file?
<br>
Similar to 3, I would like to also extract some fields
(description & name) in the partOfSeries section and some
data (name) in the partOfSeason section for each TVEpisode in
the Prettified JSON. For example, the values from the lines:
<br>
        description -> Series exploring behind the scenes at
        Longleat Estate and Safari Park
        <br>
        name -> Animal Park
        <br>
        name -> Summer 2020
        <br>
        in this part of the Prettified JSON:
        <br>
        @type -> TVEpisode
        <br>
        identifier -> m000lwqj
        <br>
        episodeNumber -> 1
        <br>
        description -> Kate and Ben return to Longleat just
        as the Covid-19 pandemic forces the park to close.
        <br>
        datePublished -> 2020-08-17
        <br>
        image -> <a class="moz-txt-link-freetext"
          href="https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg"
          moz-do-not-send="true">https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg</a>
        <br>
        name -> Episode 1
        <br>
        url -> <a class="moz-txt-link-freetext"
          href="https://www.bbc.co.uk/programmes/m000lwqj"
          moz-do-not-send="true">https://www.bbc.co.uk/programmes/m000lwqj</a>
        <br>
        partOfSeries:
        <br>
        @type -> TVSeries
        <br>
        image -> <a class="moz-txt-link-freetext"
          href="https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg"
          moz-do-not-send="true">https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg</a>
        <br>
        description -> Series exploring behind the scenes
        at Longleat Estate and Safari Park
        <br>
        identifier -> b006w6ns
        <br>
        name -> Animal Park
        <br>
        url -> <a class="moz-txt-link-freetext"
          href="https://www.bbc.co.uk/programmes/b006w6ns"
          moz-do-not-send="true">https://www.bbc.co.uk/programmes/b006w6ns</a>
        <br>
        partOfSeason:
        <br>
        @type -> TVSeason
        <br>
        position -> 29
        <br>
        identifier -> m000lwk9
        <br>
        name -> Summer 2020
<br>
</blockquote>
<br>
To answer your #2 question: gb.net.curl provides httpClient
(which the app uses) and requires gb.net. So both are required.
gb.xml.html provides HtmlDocument (which the app uses) and
requires gb.xml. So both those are also required. gb.web
provides the JSON.Decode function (which the app uses).
gb.util.web also provides the JSON.Decode function. So one, or
the other, is required.
<br>
<br>
For your question #1: The code Tobi provided loads the web page
      into an HtmlDocument object, extracts the embedded JSON data,
      and then uses JSON.Decode to convert it into a Gambas
      representation (i.e. Gambas datatypes) of the JSON data. So,
you're no longer working with HTML/XML. You're working with
Gambas datatypes representing the JSON data from the web page.
<br>
<br>
Therefore, you should focus on understanding JSON.
<br>
<a class="moz-txt-link-freetext"
href="https://www.json.org/json-en.html"
moz-do-not-send="true">https://www.json.org/json-en.html</a>
<br>
<a class="moz-txt-link-freetext"
href="https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON"
moz-do-not-send="true">https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON</a>
<br>
<br>
#3: The app already uses the component(s) necessary to extract
the values you're wanting. As for standard coding to do that, it
depends on exactly what you mean by that. You need to determine
the actual structure of the data so you know how to reference
whatever particular element contains the info you want to
extract. Then you can use the Gambas representation of that JSON
      data to retrieve the info from a standard Gambas datatype, which
      in this case is hData, a multidimensional array of
      collections. Not clear? See #4.
<br>
<br>
#4: The reason I 'prettified' the data with indentation is to
show the additional (sub-)dimensions. To directly access, for
example, the type and description of a partOfSeries we would
use:
<br>
hData[1]["@graph"]["partOfSeries"]["@type"] , and
<br>
hData[1]["@graph"]["partOfSeries"]["description"]
<br>
<br>
Since we can see that partOfSeries is a single-dimensional
collection containing only string values, we can easily
enumerate over it:
<br>
For Each sElement as String in
hData[1]["@graph"]["partOfSeries"]
<br>
Print sElement
<br>
Next
<br>
<br>
For one more example, to directly access the broadcaster's legal
name, we would use:
<br>
hData[1]["@graph"]["publication"]["publishedOn"]["broadcaster"]["legalName"]
<br>
<br>
Now since the "publication" element is multidimensional,
enumerating over it and printing the value of its elements would
cause an error when trying to print the value of "publishedOn"
which is itself another Collection[]. This error could be
prevented if we check the type of each element [with TypeOf(),
Object.Type(), or Object.Is()] and do not try to print anything
that is not a string.
<br>
<br>
It may be easier for you to see the distinction of the
sub-dimensions if you set iTabWidth at line 101 to 4.
<br>
</div>
</blockquote>
  <p>I've tried to obtain the various field values for some @type
    elements and some partOfSeries elements. However, it now stops, I
    think, at the line shown below. I think this might be because the
    first Episode 'extracted' has no partOfSeries section. How do I
    test for that?</p>
<p>aSeriesName.Add(cEpisode["partOfSeries"]["name"])</p>
  <p>All the aEpisode... and aSeries... variables are defined as global
    arrays of strings, e.g. Private aSeriesName As String[]<br>
</p>
  <p>I still don't fully understand this extraction of JSON field
    values, but I will take a look at the two JSON links given
    above.<br>
</p>
<p>Private Procedure ExtractEpisodes()<br>
Dim caEpisodes As Collection[]<br>
Dim cEpisode As Collection<br>
Dim sTextContent As String<br>
sTextContent = ""<br>
If hData.Count = 0 Then <br>
QuitAfterError("No Episodes in Week " & sWeekNumber, "for
" & sConnectMedium & " " & sConnectChannel)<br>
End If<br>
caEpisodes = hData[1]["@graph"]<br>
For Each cEpisode In caEpisodes<br>
aEpisodeName.Add(cEpisode["name"])<br>
aEpisodeDescription.Add(cEpisode["description"])<br>
aEpisodeDatePublished.Add(cEpisode["datePublished"])<br>
aEpisodeIdentifier.Add(cEpisode["identifier"])<br>
aSeriesName.Add(cEpisode["partOfSeries"]["name"])<br>
aSeriesDescription.Add(cEpisode["partOfSeries"]["description"])<br>
If Left(UCase(cEpisode["partOfSeries"]["name"]), 12) = "LINE
OF DUTY" Then<br>
Print "Line of Duty"<br>
Print "EpisodeName=" & cEpisode["name"]<br>
Print "Episode Description=" & cEpisode["description"]<br>
Print "DatePublished=" & cEpisode["datePublished"]<br>
Print "Identifier=" & cEpisode["identifier"]<br>
Print "SeriesName=" & cEpisode["partOfSeries"]["name"]<br>
Print "SeriesDescription=" &
cEpisode["partOfSeries"]["description"]<br>
Endif<br>
For Each sInfo As Variant In cEpisode<br>
If TypeOf(sInfo) <> gb.String Then Continue<br>
sTextContent &= cEpisode.Key & " -> " & sInfo
& "\n"<br>
Next<br>
sTextContent &= "\n"<br>
Next<br>
</p>
<pre class="moz-signature" cols="72">--
John
0044 1902 331266
0044 7476 041418</pre>
</body>
</html>