Extract data from a GET Google Document output bundle

dude · September 7, 2023, 9:31am

I am trying to extract, from the Get Google Document module’s output bundle, the startIndex and endIndex of every element whose link textStyle matches a specified URL. The structure (part of it) looks like this:

In this example, I want to get startIndex 1827 and endIndex 1867 by identifying the link URL as gotomarket.global/events/innovaud-life-sciences-uk-mission-432-433/

I am guessing there may be a way to parse the Get Document output, or perhaps I could use a basic Docs API get-document request and specify a filter in the query? But I don’t really know the syntax, and even if I get a smaller file, it will still need parsing to extract the required data.
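If the raw JSON is available (for example from a plain HTTP call to documents.get), the parsing is a short walk. As far as I know, documents.get cannot filter by link URL: the standard `fields` parameter only trims which fields come back, so the matching still has to happen on your side. Below is a minimal Python sketch, assuming the response follows the Docs API layout body.content[] → paragraph.elements[] → textRun.textStyle.link.url. The function name and the sample data are hypothetical, apart from the indices and URL from the post. The URL may actually be stored with an https:// prefix, so check the real value first.

```python
from typing import Any


def find_link_ranges(document: dict[str, Any], target_url: str) -> list[tuple[int, int]]:
    """Return (startIndex, endIndex) of every text run whose link URL equals target_url.

    Walks body.content -> paragraph.elements, the layout of a documents.get
    response. Tables are not visited in this sketch.
    """
    ranges: list[tuple[int, int]] = []
    for structural in document.get("body", {}).get("content", []):
        paragraph = structural.get("paragraph")
        if paragraph is None:
            continue  # e.g. sectionBreak or table elements carry no "paragraph" key
        for element in paragraph.get("elements", []):
            text_style = element.get("textRun", {}).get("textStyle", {})
            url = text_style.get("link", {}).get("url")
            if url == target_url:
                ranges.append((element["startIndex"], element["endIndex"]))
    return ranges


# Hypothetical fragment: only 1827, 1867 and the URL come from the post above.
URL = "gotomarket.global/events/innovaud-life-sciences-uk-mission-432-433/"
sample_doc = {
    "body": {
        "content": [
            {"endIndex": 1, "sectionBreak": {}},
            {
                "startIndex": 1800,
                "endIndex": 1900,
                "paragraph": {
                    "elements": [
                        {"startIndex": 1800, "endIndex": 1827,
                         "textRun": {"content": "...", "textStyle": {}}},
                        {"startIndex": 1827, "endIndex": 1867,
                         "textRun": {"content": "...", "textStyle": {"link": {"url": URL}}}},
                        {"startIndex": 1867, "endIndex": 1900,
                         "textRun": {"content": "...\n", "textStyle": {}}},
                    ]
                },
            },
        ]
    }
}

print(find_link_ranges(sample_doc, URL))  # [(1827, 1867)]
```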

BTW, the ultimate aim is, once I have the start and end indexes, to use the batchUpdate method to change the URL. That’s the end goal.
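For the batchUpdate step, the Docs API request that changes a link without touching the text is updateTextStyle with a field mask of `link`. The sketch below only builds the JSON body. Sending it (a POST to https://docs.googleapis.com/v1/documents/{documentId}:batchUpdate from whichever HTTP-capable module you use) is left out. The function name and the replacement URL are hypothetical.

```python
import json
from typing import Any


def build_link_update_body(ranges: list[tuple[int, int]], new_url: str) -> dict[str, Any]:
    """Build a documents.batchUpdate body that re-points each range's link to new_url."""
    requests = [
        {
            "updateTextStyle": {
                "range": {"startIndex": start, "endIndex": end},
                "textStyle": {"link": {"url": new_url}},
                # Field mask: only the link is overwritten; other styling is left alone.
                "fields": "link",
            }
        }
        for start, end in ranges
    ]
    return {"requests": requests}


# Hypothetical replacement URL; the range is the one from the example above.
body = build_link_update_body([(1827, 1867)], "https://www.example.com/new-page")
print(json.dumps(body, indent=2))
```

updateTextStyle does not insert or delete characters, so the indexes of other matches do not shift. All ranges can therefore go into a single batchUpdate call, in any order.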

Would greatly appreciate any help.

Thx
Ian

Richard_Johannes · September 7, 2023, 5:17pm

Hi @dude,

I am assuming you want to get the start and end index of every element that contains a URL. Is that correct?

Then I’d approach it along the lines of this:

1. Get the output from your screenshot
2. Iterate through the elements array (Iterator)
3. → Filter: elements.textRun.textStyle.link.url exists
4. Array aggregator (startIndex & endIndex)
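The Iterator → Filter → Array aggregator chain maps onto a plain generator pipeline. Here is a minimal Python sketch of the data flow only, not of how Make runs it internally. The function names and sample elements are hypothetical, and the indices echo the example from the question.

```python
from typing import Any, Iterable, Iterator


def iterator(elements: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Iterator module: turn the elements array into one bundle per element."""
    yield from elements


def link_url_exists(bundle: dict[str, Any]) -> bool:
    """Filter: pass only bundles where textRun.textStyle.link.url is present."""
    link = bundle.get("textRun", {}).get("textStyle", {}).get("link", {})
    return link.get("url") is not None


def array_aggregator(bundles: Iterable[dict[str, Any]]) -> list[dict[str, int]]:
    """Array aggregator: collect startIndex and endIndex of the bundles that got through."""
    return [{"startIndex": b["startIndex"], "endIndex": b["endIndex"]} for b in bundles]


def run_scenario(elements: list[dict[str, Any]]) -> list[dict[str, int]]:
    return array_aggregator(b for b in iterator(elements) if link_url_exists(b))


sample_elements = [
    {"startIndex": 1800, "endIndex": 1827, "textRun": {"textStyle": {}}},
    {"startIndex": 1827, "endIndex": 1867,
     "textRun": {"textStyle": {"link": {"url": "https://www.abc.com/"}}}},
    {"startIndex": 1867, "endIndex": 1900, "textRun": {"textStyle": {"bold": True}}},
]
print(run_scenario(sample_elements))  # [{'startIndex': 1827, 'endIndex': 1867}]
```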

I hope it helps!

dude · September 7, 2023, 5:28pm

Richard, you are a star, thank you. I get the approach but have a few issues; could you possibly advise?

Firstly, with the iterator set like this, there are no output bundles:

Then, with the filter, I can’t seem to travel down the branches far enough. I set the iterator to run through the content array to get some data, but when setting up the filter I hit the buffers here:

What do you reckon?

Thx Ian

dude · September 7, 2023, 5:34pm

Oh, and actually I don’t want all links, only those where link.url is www.abc.com.

dude · September 7, 2023, 5:36pm

This is the structure of the google doc output bundle:

Richard_Johannes · September 7, 2023, 5:37pm

You may sometimes see a small “Refreshing Metadata” popup in the bottom-right corner; that’s when the modules’ output is being updated. In this example it is indeed rather odd that textStyle etc. isn’t being parsed. Maybe it’s because you chose paragraph instead of elements? Or do you need to go through the paragraphs and then, within each paragraph, through every element? In that case you’ll need a second iterator.

To map the URL manually, you need to use {{ }}.
I think it’s {{37.textRun.textStyle.link.url}} (if you iterate through the elements array!)

Since you only want links with URL www.abc.com, adjust the filter to “url = www.abc.com” instead of “url exists”.
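Here is a Python sketch of the difference between the two filter conditions. It uses a small dotted-path lookup that mirrors the textRun.textStyle.link.url mapping. One thing to check is that an equality test is literal. If the document stores the link as https://www.abc.com/ (an assumption here, since the stored form depends on how the link was inserted), “url = www.abc.com” will not match, and a contains-style comparison may be safer. The helper names and sample bundles are hypothetical.

```python
from typing import Any

URL_PATH = "textRun.textStyle.link.url"  # same path as the {{37.textRun.textStyle.link.url}} mapping


def get_path(bundle: dict[str, Any], path: str) -> Any:
    """Follow a dotted path; return None as soon as a key is missing."""
    value: Any = bundle
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def url_exists(bundle: dict[str, Any]) -> bool:
    return get_path(bundle, URL_PATH) is not None


def url_equals(bundle: dict[str, Any], target: str) -> bool:
    # Exact string comparison: "www.abc.com" does not equal "https://www.abc.com/".
    return get_path(bundle, URL_PATH) == target


def url_contains(bundle: dict[str, Any], fragment: str) -> bool:
    url = get_path(bundle, URL_PATH)
    return isinstance(url, str) and fragment in url


linked = {"textRun": {"textStyle": {"link": {"url": "https://www.abc.com/"}}}}
plain = {"textRun": {"textStyle": {}}}
print(url_exists(linked), url_exists(plain))  # True False
print(url_equals(linked, "www.abc.com"))      # False
print(url_contains(linked, "www.abc.com"))    # True
```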

Given the output of the module, you’ll probably need 3 iterators:

1. Iterate through the content array
2. Iterate through the paragraphs of a content element
3. Iterate through the elements of a paragraph
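The nested iterators correspond to nested loops. As I understand the raw Docs API JSON, paragraph is a single object on each content element rather than an array, so the walk is really content → paragraph.elements. Make’s module output may present an extra level, which could explain needing three iterators there. Links inside tables sit deeper still (table.tableRows[].tableCells[].content[]), so a fixed chain of iterators may miss them. The Python sketch below recurses under those assumptions. The function name and the sample data are hypothetical.

```python
from typing import Any, Iterator


def iter_paragraph_elements(content: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every paragraph element under a list of structural elements.

    Each loop corresponds roughly to one Iterator module; tables are handled
    by recursing into each cell's own content list.
    """
    for structural in content:  # iterate through the content array
        if "paragraph" in structural:
            # iterate through the elements of the paragraph
            yield from structural["paragraph"].get("elements", [])
        elif "table" in structural:
            for row in structural["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    yield from iter_paragraph_elements(cell.get("content", []))


cell_content = [
    {"startIndex": 14, "endIndex": 58, "paragraph": {"elements": [
        {"startIndex": 14, "endIndex": 40,
         "textRun": {"content": "...", "textStyle": {"link": {"url": "https://www.abc.com/"}}}},
        {"startIndex": 40, "endIndex": 58, "textRun": {"content": "...", "textStyle": {}}},
    ]}},
]
sample_content = [
    {"startIndex": 1, "endIndex": 12, "paragraph": {"elements": [
        {"startIndex": 1, "endIndex": 12, "textRun": {"content": "...", "textStyle": {}}},
    ]}},
    {"startIndex": 12, "endIndex": 60,
     "table": {"tableRows": [{"tableCells": [{"content": cell_content}]}]}},
]

linked = [
    (e["startIndex"], e["endIndex"])
    for e in iter_paragraph_elements(sample_content)
    if e.get("textRun", {}).get("textStyle", {}).get("link")
]
print(linked)  # [(14, 40)]
```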

dude · September 7, 2023, 5:39pm

Very neat, I’ll try now. TY so much.

dude · September 7, 2023, 6:04pm

I think I have this working now, other than the fact that the array aggregator shows 23 entries when there is only one match:

Is this expected behaviour?

Richard_Johannes · September 7, 2023, 6:47pm

In the array aggregator you need to choose your first iterator as the “source module”.

I assume you have 22 empty output bundles and only 1 that contains the link? This simple change should fix it.
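As I understand Make’s array aggregator, it emits one aggregated bundle for each bundle that entered the chosen source module. With the innermost iterator as the source, that gives one (mostly empty) array per paragraph, which would fit 23 outputs for a single match. With the first iterator as the source, everything lands in one array. The Python model below shows the two groupings. The names and sample data are hypothetical and use three paragraphs instead of 23.

```python
from typing import Any

Range = dict[str, int]


def linked_ranges(paragraph: dict[str, Any]) -> list[Range]:
    """Filter + collect: ranges of the paragraph's elements that carry a link URL."""
    return [
        {"startIndex": e["startIndex"], "endIndex": e["endIndex"]}
        for e in paragraph.get("elements", [])
        if e.get("textRun", {}).get("textStyle", {}).get("link", {}).get("url")
    ]


def aggregate_per_paragraph(paragraphs: list[dict[str, Any]]) -> list[list[Range]]:
    """Source module = innermost iterator: one aggregated array per paragraph."""
    return [linked_ranges(p) for p in paragraphs]


def aggregate_per_document(paragraphs: list[dict[str, Any]]) -> list[Range]:
    """Source module = first iterator: a single array for the whole document."""
    return [r for group in aggregate_per_paragraph(paragraphs) for r in group]


paragraphs = [
    {"elements": [{"startIndex": 1700, "endIndex": 1750, "textRun": {"textStyle": {}}}]},
    {"elements": [{"startIndex": 1827, "endIndex": 1867,
                   "textRun": {"textStyle": {"link": {"url": "https://www.abc.com/"}}}}]},
    {"elements": [{"startIndex": 1900, "endIndex": 1950, "textRun": {"textStyle": {}}}]},
]
print(aggregate_per_paragraph(paragraphs))  # [[], [{'startIndex': 1827, 'endIndex': 1867}], []]
print(aggregate_per_document(paragraphs))   # [{'startIndex': 1827, 'endIndex': 1867}]
```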

dude · September 8, 2023, 10:14am

Hi Richard, this works perfectly now. I can’t thank you enough for your support. Let me know if I can help you in any way; my work supports companies looking to internationalise.

BTW, is there a way I can share the scenario so others can access it and learn from it? And do I now close this and mark it as SOLVED, as I have seen in other community apps? If so, how?

Many thx once again.

Michaela · September 8, 2023, 10:21am

Hello there @dude

Awesome to see that you managed to get this up and running with the help of @Richard_Johannes. Great job!

Just FYI: you mark an answer as the solution by clicking the box.
