<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html;

      charset=windows-1252">

  </head>

  <body>

    <div class="moz-cite-prefix">On 20/08/2020 11:12,

      <a class="moz-txt-link-abbreviated" href="mailto:user-request@lists.gambas-basic.org">user-request@lists.gambas-basic.org</a> wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:mailman.0.1597918322.23951.user@lists.gambas-basic.org">

      <table class="header-part1" width="100%" cellspacing="0"

        cellpadding="0" border="0">

        <tbody>

          <tr>

            <td>

              <div class="headerdisplayname" style="display:inline;">Subject:

              </div>

              Re: [Gambas-user] How to Disassemble XML/HTML</td>

          </tr>

          <tr>

            <td>

              <div class="headerdisplayname" style="display:inline;">From:

              </div>

              T Lee Davidson <a class="moz-txt-link-rfc2396E" href="mailto:t.lee.davidson@gmail.com"><t.lee.davidson@gmail.com></a></td>

          </tr>

          <tr>

            <td>

              <div class="headerdisplayname" style="display:inline;">Date:

              </div>

              19/08/2020, 15:18</td>

          </tr>

        </tbody>

      </table>

      <table class="header-part2" width="100%" cellspacing="0"

        cellpadding="0" border="0">

        <tbody>

          <tr>

            <td>

              <div class="headerdisplayname" style="display:inline;">To:

              </div>

              <a class="moz-txt-link-abbreviated" href="mailto:user@lists.gambas-basic.org">user@lists.gambas-basic.org</a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <div class="moz-text-flowed" style="font-family: -moz-fixed;

        font-size: 12px;" lang="x-unicode">On 8/19/20 2:31 AM, John Rose

        wrote:

        <br>

        <blockquote type="cite" style="color: #000000;">I have some

          questions:

          <br>

          <br>

          1. Can you recommend a printed book and/or online tutorial to
          help me understand your code (in the routines processing the
          HTML and XML data, such as what the "@graph" element is) and the
          concepts behind it? Please remember that I'm a newbie to
          HTML, XML, etc.

          <br>

          <br>

          2. Are all of the following Gambas components required in the
          attached httpClientExtra app (your modified httpClient app,
          slightly changed by me): gb.net, gb.net.curl, gb.web, gb.xml,
          gb.xml.html?

          <br>

          <br>

          3. Is there a Gambas component and/or standard coding to

          extract values from the Episodes information? I'm thinking of

          the identifier, episodeNumber, description, datePublished,

          name & url fields. I'd like to extract them into the

          corresponding aIdentifier, aEpisodeNumber, aDescription,

          aDatePublished, aName & aURL Gambas string arrays, for

          each Episode's set of data. Obviously I could code this

          myself, but it would be nice if there are already routine(s)

          written to do this kind of thing.

          <br>

          <br>

          4. What coding is required to put the partOfSeries &

          partOfSeason sections (from the Prettified JSON data)

          immediately after the episode data for each Episode in the

          Episodes text & file?

          <br>

          Similar to 3, I would also like to extract some fields
          (description & name) from the partOfSeries section and some
          data (name) from the partOfSeason section for each TVEpisode in
          the Prettified JSON. For example, the values from the lines:
          <br>
          description -> Series exploring behind the scenes at
          Longleat Estate and Safari Park
          <br>
          name -> Animal Park
          <br>
          name -> Summer 2020
          <br>
          in this part of the Prettified JSON:
          <br>

          <pre>@type -> TVEpisode
    identifier -> m000lwqj
    episodeNumber -> 1
    description -> Kate and Ben return to Longleat just as the Covid-19 pandemic forces the park to close.
    datePublished -> 2020-08-17
    image -> <a class="moz-txt-link-freetext" href="https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg" moz-do-not-send="true">https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg</a>
    name -> Episode 1
    url -> <a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/m000lwqj" moz-do-not-send="true">https://www.bbc.co.uk/programmes/m000lwqj</a>
    partOfSeries:
      @type -> TVSeries
      image -> <a class="moz-txt-link-freetext" href="https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg" moz-do-not-send="true">https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg</a>
      description -> Series exploring behind the scenes at Longleat Estate and Safari Park
      identifier -> b006w6ns
      name -> Animal Park
      url -> <a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/b006w6ns" moz-do-not-send="true">https://www.bbc.co.uk/programmes/b006w6ns</a>
    partOfSeason:
      @type -> TVSeason
      position -> 29
      identifier -> m000lwk9
      name -> Summer 2020
</pre>

        </blockquote>

        <br>

        To answer your #2 question: gb.net.curl provides HttpClient
        (which the app uses) and requires gb.net, so both are required.
        gb.xml.html provides HtmlDocument (which the app uses) and
        requires gb.xml, so both of those are also required. gb.web
        provides the JSON.Decode function (which the app uses), and
        gb.util.web also provides JSON.Decode, so one or the other is
        required.

        <br>

        <br>

        For your question #1: The code Tobi provided loads the web page
        into an HtmlDocument object, extracts the embedded JSON data and,
        with JSON.Decode, converts it into a Gambas representation (i.e.
        Gambas datatypes) of the JSON data. So you're no longer working
        with HTML/XML; you're working with Gambas datatypes representing
        the JSON data from the web page.
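        <br>
        <br>
        As a small illustration of that conversion, here is a sketch only.
        It assumes JSON.Decode from gb.util.web (or gb.web), and the JSON
        string is a made-up fragment shaped like the episode data, not the
        real page. A JSON object becomes a Collection and a JSON array
        becomes an array, and you then index them like any other Gambas
        value:
        <pre>' Sketch: requires gb.util.web (or gb.web) for JSON.Decode.
' The JSON text below is a made-up fragment, not the real BBC data.
Public Sub Main()

  Dim sJson As String
  Dim vData As Variant

  sJson = "{\"name\": \"Episode 1\", \"partOfSeries\": {\"name\": \"Animal Park\"}, \"tags\": [\"a\", \"b\"]}"
  vData = JSON.Decode(sJson)

  Print Object.Type(vData)              ' class used for the JSON object
  Print vData["name"]                   ' a plain string value
  Print vData["partOfSeries"]["name"]   ' a value one level down
  Print Object.Type(vData["tags"])      ' class used for the JSON array
  Print vData["tags"][0]                ' arrays are indexed from 0

End</pre>
        Printing Object.Type() like this is a quick way to find out what
        kind of value you are holding at each level before indexing into it.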

        <br>

        <br>

        Therefore, you should focus on understanding JSON.

        <br>

        <a class="moz-txt-link-freetext"

          href="https://www.json.org/json-en.html"

          moz-do-not-send="true">https://www.json.org/json-en.html</a>

        <br>

        <a class="moz-txt-link-freetext"

href="https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON"

          moz-do-not-send="true">https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON</a>

        <br>

        <br>

        #3: The app already uses the component(s) necessary to extract

        the values you're wanting. As for standard coding to do that, it

        depends on exactly what you mean by that. You need to determine

        the actual structure of the data so you know how to reference

        whatever particular element contains the info you want to

        extract. Then you can use the Gambas representation of that JSON

        data to retrieve the info from a standard Gambas datatype which

        in this case is hData as a multidimensional array of

        collections. Not clear? See #4.
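        <br>
        <br>
        To make #3 a little more concrete, here is a hedged sketch. The sub
        name CollectEpisodes and its parameters are hypothetical, not part
        of the app. It copies a few fields from each entry of "@graph" into
        parallel String[] arrays, and uses Collection.Exist() so that an
        entry without a partOfSeries section does not stop the loop:
        <pre>' Hypothetical helper: copy selected fields from each "@graph" entry
' into parallel String[] arrays (one array element per entry).
Private Sub CollectEpisodes(vGraph As Variant, aIdentifier As String[], aName As String[], aSeriesName As String[])

  Dim vEntry As Variant
  Dim cEntry As Collection

  For Each vEntry In vGraph
    ' Skip anything that is not a JSON object (Collection)
    If TypeOf(vEntry) <> gb.Object Then Continue
    If Not Object.Is(vEntry, "Collection") Then Continue
    cEntry = vEntry
    ' A missing key gives Null, which is stored as an empty string
    aIdentifier.Add(cEntry["identifier"])
    aName.Add(cEntry["name"])
    ' partOfSeries is itself a Collection: check that it exists first
    If cEntry.Exist("partOfSeries") Then
      aSeriesName.Add(cEntry["partOfSeries"]["name"])
    Else
      aSeriesName.Add("")   ' keep the arrays the same length
    Endif
  Next

End</pre>
        It would be called with something like
        CollectEpisodes(hData[1]["@graph"], aEpisodeIdentifier, aEpisodeName, aSeriesName),
        assuming "@graph" holds the list of episodes as in your code.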

        <br>

        <br>

        #4: The reason I 'prettified' the data with indentation is to

        show the additional (sub-)dimensions. To directly access, for

        example, the type and description of a partOfSeries we would

        use:

        <br>

        hData[1]["@graph"]["partOfSeries"]["@type"], and

        <br>

        hData[1]["@graph"]["partOfSeries"]["description"]

        <br>

        <br>

        Since we can see that partOfSeries is a single-dimensional

        collection containing only string values, we can easily

        enumerate over it:

        <br>

        <pre>For Each sElement As String In hData[1]["@graph"]["partOfSeries"]
  Print sElement
Next</pre>
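        <br>
        <br>
        If you also want each key printed next to its value, the
        Collection's Key property holds the key of the element currently
        being enumerated. A minimal sketch (PrintCollection is a
        hypothetical name), meant only for a collection of plain values
        such as partOfSeries:
        <pre>' Hypothetical helper: print "key -> value" for every element.
' Only suitable when every value is a plain value, not a Collection.
Private Sub PrintCollection(cData As Collection)

  Dim vValue As Variant

  For Each vValue In cData
    ' cData.Key is the key of the element currently being enumerated
    Print cData.Key & " -> " & vValue
  Next

End</pre>
        Usage would be PrintCollection(hData[1]["@graph"]["partOfSeries"]).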

        <br>

        <br>

        For one more example, to directly access the broadcaster's legal

        name, we would use:

        <br>

hData[1]["@graph"]["publication"]["publishedOn"]["broadcaster"]["legalName"]
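        <br>
        <br>
        Every step in a chain like that must exist: if an intermediate key
        is missing, that step returns Null and indexing the Null raises an
        error. A hedged sketch of a helper (GetPath is a hypothetical name)
        that follows a list of keys and returns Null instead of failing:
        <pre>' Hypothetical helper: follow aKeys through nested Collections.
' Returns Null as soon as a key is missing or a step is not a Collection.
Private Function GetPath(vRoot As Variant, aKeys As String[]) As Variant

  Dim vNode As Variant
  Dim sKey As String

  vNode = vRoot
  For Each sKey In aKeys
    If TypeOf(vNode) <> gb.Object Then Return Null
    If Not Object.Is(vNode, "Collection") Then Return Null
    vNode = vNode[sKey]   ' a missing key gives Null
  Next
  Return vNode

End</pre>
        The result can then be tested with IsNull() before it is used, e.g.
        vName = GetPath(cEpisode, ["partOfSeries", "name"]).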

        <br>

        <br>

        Now, since the "publication" element is multidimensional,
        enumerating over it and printing the value of each of its elements
        would cause an error when trying to print the value of
        "publishedOn", which is itself another Collection[]. This error
        could be prevented by checking the type of each element [with
        TypeOf(), Object.Type(), or Object.Is()] and not trying to print
        anything that is not a string.
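        <br>
        <br>
        A minimal sketch of that check, which descends into nested elements
        instead of printing them. DumpNode is a hypothetical name, and it
        assumes the decoded data uses Collections for JSON objects and
        arrays for JSON arrays:
        <pre>' Hypothetical helper: print nested JSON data, indenting each level.
' Plain values are printed; Collections and arrays are walked recursively.
Private Sub DumpNode(vNode As Variant, iIndent As Integer)

  Dim vItem As Variant
  Dim cNode As Collection

  ' Strings, numbers, booleans and Null have nothing to descend into
  If TypeOf(vNode) <> gb.Object Then Return

  If Object.Is(vNode, "Collection") Then
    cNode = vNode
    For Each vItem In cNode
      If TypeOf(vItem) = gb.Object Then
        Print Space$(iIndent) & cNode.Key & ":"
        DumpNode(vItem, iIndent + 2)
      Else
        Print Space$(iIndent) & cNode.Key & " -> " & vItem
      Endif
    Next
  Else If Object.Is(vNode, "Array") Then
    For Each vItem In vNode
      If TypeOf(vItem) = gb.Object Then
        DumpNode(vItem, iIndent + 2)
      Else
        Print Space$(iIndent) & "- " & vItem
      Endif
    Next
  Endif

End</pre>
        Calling DumpNode(hData, 0) should print something close to the
        prettified listing, without stopping at "publishedOn".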

        <br>

        <br>

        It may be easier for you to see the distinction of the

        sub-dimensions if you set iTabWidth at line 101 to 4.

        <br>

      </div>

    </blockquote>

    <p>I've tried to obtain the various field values for some @type
      elements and some partOfSeries elements. However, it now stops, I
      think, at the line shown below. I think this might be because the
      first Episode 'extracted' has no partOfSeries section. How do I
      test for that?</p>

    <pre>aSeriesName.Add(cEpisode["partOfSeries"]["name"])</pre>

    <p>All the aEpisode... and aSeries... variables are declared as
      global string arrays, e.g. Private aSeriesName As String[]<br>
    </p>

    <p>I still don't fully understand how JSON field values are
      extracted, but I will take a look at the two JSON URLs above.<br>
    </p>

    <pre>Private Procedure ExtractEpisodes()

  Dim caEpisodes As Collection[]
  Dim cEpisode As Collection
  Dim sTextContent As String

  sTextContent = ""
  If hData.Count = 0 Then
    QuitAfterError("No Episodes in Week " & sWeekNumber, "for " & sConnectMedium & " " & sConnectChannel)
  End If
  caEpisodes = hData[1]["@graph"]
  For Each cEpisode In caEpisodes
    aEpisodeName.Add(cEpisode["name"])
    aEpisodeDescription.Add(cEpisode["description"])
    aEpisodeDatePublished.Add(cEpisode["datePublished"])
    aEpisodeIdentifier.Add(cEpisode["identifier"])
    ' The program stops here, I think when this episode has no partOfSeries section
    aSeriesName.Add(cEpisode["partOfSeries"]["name"])
    aSeriesDescription.Add(cEpisode["partOfSeries"]["description"])
    If Left(UCase(cEpisode["partOfSeries"]["name"]), 12) = "LINE OF DUTY" Then
      Print "Line of Duty"
      Print "EpisodeName=" & cEpisode["name"]
      Print "Episode Description=" & cEpisode["description"]
      Print "DatePublished=" & cEpisode["datePublished"]
      Print "Identifier=" & cEpisode["identifier"]
      Print "SeriesName=" & cEpisode["partOfSeries"]["name"]
      Print "SeriesDescription=" & cEpisode["partOfSeries"]["description"]
    Endif
    ' Append every string-valued field of the episode as "key -> value"
    For Each sInfo As Variant In cEpisode
      If TypeOf(sInfo) <> gb.String Then Continue
      sTextContent &= cEpisode.Key & " -> " & sInfo & "\n"
    Next
    sTextContent &= "\n"
  Next

End</pre>

    <pre class="moz-signature" cols="72">-- 

John

0044 1902 331266

0044 7476 041418</pre>

  </body>

</html>