Extract data from a GET Google Document output bundle

I am trying to extract, from the output bundle of the Get Google Document module, the startIndex and endIndex of every element whose textStyle link equals a specified URL. Part of the structure looks like this:

In this example, I want to get startIndex 1827 and endIndex 1867 by identifying the link URL as gotomarket.global/events/innovaud-life-sciences-uk-mission-432-433/

I am guessing there may be a way to parse the Get Document output, or perhaps I could use a basic Docs API get-document request and specify a filter in the query? But I don't really know the syntax, and even if I get a smaller file, it will still need parsing to extract the required data.

BTW, the ultimate aim is that, having got the start and end indexes, I can then use the batchUpdate method to change the URL. That's the end goal.
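
For context, the batchUpdate step would look roughly like the sketch below. It only builds the request body for the Docs API `documents.batchUpdate` method, using an `updateTextStyle` request that sets the link on a range; it does not send anything. The function name `build_relink_request` and the replacement URL are made up for illustration.

```python
def build_relink_request(start_index: int, end_index: int, new_url: str) -> dict:
    """Build a documents.batchUpdate body that points a text range at new_url."""
    return {
        "requests": [
            {
                "updateTextStyle": {
                    # Range of the existing link text, as found in the document.
                    "range": {"startIndex": start_index, "endIndex": end_index},
                    # Only the link is changed; other styling is left alone.
                    "textStyle": {"link": {"url": new_url}},
                    # Field mask: restrict the update to the link property.
                    "fields": "link",
                }
            }
        ]
    }


# Indexes from the example above; the new URL is a placeholder.
body = build_relink_request(1827, 1867, "https://example.com/new-event")
```

The `fields` mask matters here: without it restricted to `link`, the request would not be limited to the link property.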

Would greatly appreciate any help.

Thx
Ian

Hi @dude,

I am assuming you want to get the startIndex and endIndex of every element that contains a URL - is that correct?

Then I’d approach it along the lines of this:

  1. Get the output from your screenshot
  2. Iterate through the elements array (iterator)
    → Filter elements.textRun.textStyle.link.url exists
  3. Array aggregator (startIndex & endIndex)

I hope it helps! :slight_smile:

Richard, you are a star, thank you. I get the approach but have a few issues, if you could possibly advise?

Firstly, with the iterator set like this, there are no output bundles:

Then with the filter, I can’t seem to travel down the branches far enough. I set the iterator to run through the content array to get some data, and then when setting a filter I hit the buffers here:

What do you reckon?

Thx Ian

Oh, and actually I don't want all links, only those where link.url is www.abc.com

This is the structure of the google doc output bundle:

You might sometimes see a little popup in the bottom right corner saying “Refreshing Metadata”; that means the output of the modules is being updated. In this example it is indeed rather odd that textStyle etc. isn’t being parsed. Maybe it’s because you chose paragraph instead of elements? Or do you need to go through the paragraphs and then, within each paragraph, go through every element? → In that case you’ll need a second iterator.

To specify the URL manually you need to use {{ }}
I think it’s {{37.textRun.textStyle.link.url}} (if you iterate through the elements array!)

You added “only links with URL www.abc.com” → adjust the filter to “url = www.abc.com” instead of “url exists”.

Given the output of the module, you’ll probably need 3 iterators:

  1. iterate through the content array
  2. iterate through the paragraphs of a content
  3. iterate through the elements of a paragraph
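
Outside of Make, the same nested walk plus filter can be sketched in Python against the raw Docs API JSON. This is only an illustration, assuming the `documents.get` shape (`body.content[]`, then `paragraph.elements[]`, with the link at `textRun.textStyle.link.url`); the Make module's bundle may nest things slightly differently. The function name `find_link_ranges` and the sample document are made up for the example.

```python
from typing import Any


def find_link_ranges(document: dict[str, Any], target_url: str) -> list[dict[str, int]]:
    """Collect startIndex/endIndex of every text run whose link URL equals target_url."""
    matches = []
    # Outer loop: structural elements of the document body.
    for block in document.get("body", {}).get("content", []):
        paragraph = block.get("paragraph")
        if not paragraph:
            continue  # e.g. section breaks or tables carry no paragraph
        # Inner loop: the elements of each paragraph.
        for element in paragraph.get("elements", []):
            url = (
                element.get("textRun", {})
                .get("textStyle", {})
                .get("link", {})
                .get("url")
            )
            # Filter: keep only exact matches, not every link.
            if url == target_url:
                matches.append(
                    {"startIndex": element["startIndex"], "endIndex": element["endIndex"]}
                )
    return matches


# Tiny hand-written sample, not real module output.
TARGET = "gotomarket.global/events/innovaud-life-sciences-uk-mission-432-433/"
sample_doc = {
    "body": {
        "content": [
            {"sectionBreak": {}},
            {
                "paragraph": {
                    "elements": [
                        {"startIndex": 1800, "endIndex": 1827,
                         "textRun": {"content": "plain text ", "textStyle": {}}},
                        {"startIndex": 1827, "endIndex": 1867,
                         "textRun": {"content": "link text",
                                     "textStyle": {"link": {"url": TARGET}}}},
                    ]
                }
            },
        ]
    }
}

ranges = find_link_ranges(sample_doc, TARGET)
```

Because the filter runs before anything is collected, elements without the matching link never reach the result list, which is the same effect as filtering before the array aggregator.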

Very neat, I’ll try now. TY so much.

I think I have this working now, other than the fact that the array aggregator shows 23 entries when there is only one match:

Is this expected behaviour?

In the array aggregator you need to choose your first iterator as the “source module”.

I assume you have 22 empty output bundles and only 1 which contains the link? This should be fixed with this simple change

Hi Richard, this works perfectly now. I can’t thank you enough for your support. Let me know if I can help you in any way. My work supports companies looking to internationalise.

BTW, is there a way I can share the scenario for others to access and learn from? And do I now close this and mark it as SOLVED, as I have seen in other community apps? If so, how?

Many thx once again.

Hello there @dude :wave:

Awesome to see that you managed to get this up and running with the help of @Richard_Johannes. Great job :clap:


Just FYI: you mark the answer as a solution by clicking the :white_check_mark: box
