<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html;

      charset=windows-1252">

  </head>

  <body>

    <div class="moz-cite-prefix">On 10/08/2020 18:09,

      <a class="moz-txt-link-abbreviated" href="mailto:user-request@lists.gambas-basic.org">user-request@lists.gambas-basic.org</a> wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:mailman.0.1597079342.21172.user@lists.gambas-basic.org">On

      8/10/20 3:48 AM, John Rose wrote:

      <br>

      <blockquote type="cite" style="color: #000000;">What you have

        there is JSON embedded in a &lt;script&gt; tag inside HTML.

        <br>

        <blockquote type="cite" style="color: #000000;">Thankfully, this

          data has its own &lt;script&gt; tag which is appropriately

          <br>

          type'd as application/ld+json, so it is not so hard to extract

          using

          <br>

          gb.xml.html and gb.web:

          <br>

          <br>

             Dim h As New HtmlDocument(your file here...)

          <br>

             Dim el As XmlElement

          <br>

             Dim data As New Collection[]

          <br>

             For Each el In h.GetElementsByTagName("script")

          <br>

               If el.Attributes["type"] &lt;&gt; "application/ld+json"

          Then Continue

          <br>

               data.Add(JSON.Decode(el.TextContent))

          <br>

             Next

          <br>

          <br>

        </blockquote>

        <br>

        Thanks, Tobias, for your code. I tried the above code (with the

        Gambas Components gb.net, gb.net.curl, gb.web, gb.xml,

        gb.xml.html included: was that overkill?) and it produced a

        large file named HTMLandXML.txt. I've extracted some of it to an

        attached file named PartOfHTMLandXML.txt. However, I have no idea

        of what to do with it. There are largish blocks of 'code' for

        each TV programme. Some sample lines which show the (emboldened)

        data that I want to obtain are:

        <br>

        line 212:  &lt;h3 class="programme__titles"&gt;&lt;a href="<b

          class="moz-txt-star"><span class="moz-txt-tag">*</span><a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/m00049tf">https://www.bbc.co.uk/programmes/m00049tf</a><span

            class="moz-txt-tag">*</span></b>"

        <br>

        lines 214-215:   &gt;&lt;span class="programme__title

        delta"&gt;&lt;span&gt;<b class="moz-txt-star"><span

            class="moz-txt-tag">*</span>Have I Got a Bit More News for

          You<span class="moz-txt-tag">*</span></b>&lt;/span&gt;&lt;/span&gt;&lt;span

        class="hidden"&gt;&amp;mdash;&lt;/span&gt;&lt;span

        class="programme__subtitle centi"&gt;&lt;span&gt;Series

        57&lt;/span&gt;, &lt;span&gt;*Episode

        2*&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/h3&gt;

        <br>

        lines 219-220:  &lt;abbr title="*Episode 2 of 9*"&gt;&lt;span

        datatype="xsd:int"&gt;2&lt;/span&gt;/&lt;span

        class="programme__groupsize"&gt;9&lt;/span&gt;&lt;/abbr&gt;

        &lt;span&gt;*Guest host Alan Johnson joins Ian Hislop and Paul

        Merton for the topical news quiz.*&lt;/span&gt;

        <br>

        I have no idea how to extract this data from the file. Which

        Gambas Component should I use? Is there an example/tutorial/book

        on how to extract this data in Gambas? I am a real newbie to

        HTML & XML etc!

        <br>

      </blockquote>

      <br>

      What Tobi gave you is a very simple method of extracting the data.

      All that is needed now is to figure out exactly how to retrieve

      the relevant data.

      <br>

      <br>

      I've attached a modified version of your program. There are now

      three buttons along with their event handlers and two additional

      subroutines. Plus, I have modified the Extract subroutine to show

      the decoded JSON data. For convenience, I moved hData to the top

      of the class file as a global variable.

      <br>

      <br>

      When you click the Extract button, you will see that there are two

      collections that have been added to the hData Collection[] (array

      of collections). The second one appears to contain the data you

      want in the "@graph" element which is another array of

      collections.

      <br>

      <br>

      The episode data can be accessed from hData by specifying the

      second element and then the "@graph" element, i.e.:

      <br>

      hData[1]["@graph"]

    </blockquote>

    <p>Thanks, Lee, for the above <b>invaluable</b> help. I don't

      pretend to fully understand it but it works.</p>

    <p> I have some questions:</p>

    <p>1. Can you recommend a printed book and/or online tutorial to

      help me understand your code (the routines that process the

      HTMLandXML data, and what the "@graph" element is) and the

      concepts behind it? Please remember that I'm a newbie to HTML,

      XML etc.<br>

    </p>

    <p>2. Are all of the following Gambas Components required in the

      attached httpClientExtra app (your modified httpClient app

      slightly changed by me): gb.net, gb.net.curl, gb.web, gb.xml,

      gb.xml.html?<br>

    </p>

    <p>3. Is there a Gambas component and/or standard coding to extract

      values from the Episodes information? I'm thinking of the

      identifier, episodeNumber, description, datePublished, name &

      url fields. I'd like to extract them into the corresponding

      aIdentifier, aEpisodeNumber, aDescription, aDatePublished, aName

      & aURL Gambas string arrays, for each Episode's set of data.

      Obviously I could code this myself, but it would be nice if

      routines already exist to do this kind of thing.<br>

    </p>

    <p>4. What coding is required to put the partOfSeries &

      partOfSeason sections (from the Prettified JSON data) immediately

      after the episode data for each Episode in the Episodes text &

      file?<br>

      Similar to 3, I would like to also extract some fields

      (description & name) in the partOfSeries section and some data

      (name) in the partOfSeason section for each TVEpisode in the

      Prettified JSON. For example, the values from the lines:<br>

      <i>description -> Series exploring behind the scenes at

        Longleat Estate and Safari Park</i><i><br>

      </i><i>name -> Animal Park</i><i><br>

      </i><i>name -> Summer 2020</i><br>

      in this part of the Prettified JSON:<br>

      <i>@type -> TVEpisode</i><i><br>

      </i><i>    identifier -> m000lwqj</i><i><br>

      </i><i>    episodeNumber -> 1</i><i><br>

      </i><i>    description -> Kate and Ben return to Longleat just

        as the Covid-19 pandemic forces the park to close.</i><i><br>

      </i><i>    datePublished -> 2020-08-17</i><i><br>

      </i><i>    image ->

        <a class="moz-txt-link-freetext" href="https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg">https://ichef.bbci.co.uk/images/ic/480xn/p08n899w.jpg</a></i><i><br>

      </i><i>    name -> Episode 1</i><i><br>

      </i><i>    url -> <a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/m000lwqj">https://www.bbc.co.uk/programmes/m000lwqj</a></i><i><br>

      </i><i>    partOfSeries:</i><i><br>

      </i><i>      @type -> TVSeries</i><i><br>

      </i><i>      image ->

        <a class="moz-txt-link-freetext" href="https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg">https://ichef.bbci.co.uk/images/ic/480xn/p07jtz7g.jpg</a></i><i><br>

      </i><i>      description -> Series exploring behind the scenes

        at Longleat Estate and Safari Park</i><i><br>

      </i><i>      identifier -> b006w6ns</i><i><br>

      </i><i>      name -> Animal Park</i><i><br>

      </i><i>      url -> <a class="moz-txt-link-freetext" href="https://www.bbc.co.uk/programmes/b006w6ns">https://www.bbc.co.uk/programmes/b006w6ns</a></i><i><br>

      </i><i>    partOfSeason:</i><i><br>

      </i><i>      @type -> TVSeason</i><i><br>

      </i><i>      position -> 29</i><i><br>

      </i><i>      identifier -> m000lwk9</i><i><br>

      </i><i>      name -> Summer 2020</i><i><br>

      </i><br>

      <br>

    </p>
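    <p>To make questions 3 and 4 concrete, here is a rough sketch of
      the kind of routine I have in mind. It rests on assumptions I have
      not verified: that hData is the global Collection[] filled by the
      Extract routine, that hData[1]["@graph"] holds one Collection per
      TVEpisode, and that partOfSeries and partOfSeason are nested
      Collections which may be absent. The names ListEpisodes,
      aSeriesName and aSeasonName are made up for illustration; the
      other fields (description, datePublished and so on) would follow
      the same pattern.</p>

    <pre>
' Sketch only: assumes hData was filled by the Extract routine and that
' hData[1]["@graph"] is an array of Collections, one per TVEpisode.
' ListEpisodes, aSeriesName and aSeasonName are hypothetical names.
Public Sub ListEpisodes()

  Dim cEpisode As Collection
  Dim cSeries As Collection
  Dim cSeason As Collection
  Dim aIdentifier As New String[]
  Dim aName As New String[]
  Dim aURL As New String[]
  Dim aSeriesName As New String[]
  Dim aSeasonName As New String[]

  For Each cEpisode In hData[1]["@graph"]

    ' Top-level episode fields are looked up by their JSON key
    aIdentifier.Add(cEpisode["identifier"])
    aName.Add(cEpisode["name"])
    aURL.Add(cEpisode["url"])

    ' Nested sections: store an empty string when the section is missing
    ' so that every array keeps one entry per episode
    cSeries = cEpisode["partOfSeries"]
    If cSeries Then
      aSeriesName.Add(cSeries["name"])
    Else
      aSeriesName.Add("")
    Endif

    cSeason = cEpisode["partOfSeason"]
    If cSeason Then
      aSeasonName.Add(cSeason["name"])
    Else
      aSeasonName.Add("")
    Endif

  Next

End
    </pre>

    <p>With the sample above, I would expect the first entries to be
      m000lwqj, Episode 1, Animal Park and Summer 2020, if my
      assumptions about the structure hold.</p>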

  </body>

</html>