Help extracting text from plain text (.txt) file

Hello all,
I am trying to extract text from a plain text file to populate into a Google Sheet. The problem with this file is that there are no uniform delimiters so I am left using a regex to try to extract this data. (or am I over looking some other way?)

Below is a screen shot of this file. The black circle indicates a data set. The red circles indicate the data we want to extract from the data set. As you can see, these data sets repeat down through the sheet.

Any suggestion on how I can successfully extract this data into a usable array?

Welcome to the Make community!

Yes you could possibly use a Text Parser Match Pattern module to extract the data you need.

If you need further assistance, please provide the following:

A.

Can you please provide the text as text, not an image of text?

You can paste the formatted text in this manner:

  • Either add three backticks ``` before and after the pasted text, like this:

    ```
    text content goes here
    ```

  • Or use the format code button in the editor:
    Screenshot_2023-10-02_191027

This will allow others to better assist you. Thanks!

2 Likes

Thanks for your reply. I have tried the text parser and am having a difficult time building a regex to extract it correctly. Here is a snippet of the file in formatted text.

MACUNGIE           ULSHO    ULSHO AD
17:00 Change:     -0.1000  -0.1000
09/18/23  Price:   3.2650   3.2900

MACUNGIE-1         ULSD DY  ULSDDYAD ULSK     ULSK DY
17:00 Change:     -0.1000  -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.4320   3.4570   4.1200   4.1250

MACUNGIE-2         87 Con   89 Con   90 Con   87CNE10  89CNE10  93CNE10
17:00 Change:     -0.0300  -0.0300  -0.0300  -0.0300  -0.0300  -0.0300
09/18/23  Price:   3.4355   3.5255   3.7355   2.7740   2.9800   3.3450
** 9/20 0800 - 9/21 1600 ETHANOL GAS NOT AVAIL AT TERMINAL II

MACUNGIE-4         B2DF     B3DF     B5DF     B10DF    B20DF
17:00 Change:     -0.1000  -0.1000  -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.4300   3.4280   3.4260   3.4190   3.4050

MACUNGIE-5 BIO AD  B2DF AD  B3DF AD  B5DF AD  B10DF AD B20DF AD
17:00 Change:     -0.1000  -0.1000  -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.4550   3.4530   3.4510   3.4440   3.4300

MACUNGIE-6 BIO DY  B2DF DY  B3DF DY  B5DF DY  B10DF DY B20DF DY
17:00 Change:     -0.1000  -0.1000  -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.4350   3.4330   3.4310   3.4240   3.4100

MALVERN - BUCKEYE  ULSHO    ULSD DY  ULSDDYAD
17:00 Change:     -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.2160   3.3910   3.4160

MALVERN BIO        B2DF     B5DF     B20DF
17:00 Change:     -0.1000  -0.1000  -0.1000
09/18/23  Price:   3.3900   3.3870   3.3720

Welcome to the Make community!

You can use a Text Parser “Match Pattern” module with this regular expression pattern

(?<=^|\n|\s+)(?<label>\w(?:[\w -])+?(?=\s{2,}))(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))?(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))?(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))?(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))?(?:\s+(\w{1,7}(?:\s\w{1,4}|(?:AD|DY))?(?=\s+|\n)))?[\w\W]+?Price:(?:\s+([\d.]+)(?=\s+|\n|$))(?:\s+([\d.]+)(?=\s+|\n|$))?(?:\s+([\d.]+)(?=\s+|\n|$))?(?:\s+([\d.]+)(?=\s+|\n|$))?(?:\s+([\d.]+)(?=\s+|\n|$))?(?:\s+([\d.]+)(?=\s+|\n|$))?

Regex test/proof: https://regex101.com/r/HNZgLx

Important Info

  • :warning: Global match must be set to YES!

Screenshot

Output


For more information, see Text Parser in the Make Help Center:

Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. The search pattern is a regular expression (aka regex or regexp), which is a sequence of characters in which each character is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.

Hope this helps!

2 Likes

You can copy and paste this module export into your scenario. This will paste the modules shown in my screenshots above.

  1. Copy the code below by clicking the copy button when you mouseover the top-right of the code block
    Screenshot_2024-01-17_200117

  2. Enter your scenario editor. Press ESC to close any dialogs. Press CTRLV to paste in the canvas.

  3. Click on each imported module and save it. You may need to remap some variables.

Modules JSON Export

{
    "subflows": [
        {
            "flow": [
                {
                    "id": 19,
                    "module": "regexp:Parser",
                    "version": 1,
                    "parameters": {
                        "pattern": "(?<=^|\\n|\\s+)(?<label>\\w(?:[\\w -])+?(?=\\s{2,}))(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))?(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))?(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))?(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))?(?:\\s+(\\w{1,7}(?:\\s\\w{1,4}|(?:AD|DY))?(?=\\s+|\\n)))?[\\w\\W]+?Price:(?:\\s+([\\d.]+)(?=\\s+|\\n|$))(?:\\s+([\\d.]+)(?=\\s+|\\n|$))?(?:\\s+([\\d.]+)(?=\\s+|\\n|$))?(?:\\s+([\\d.]+)(?=\\s+|\\n|$))?(?:\\s+([\\d.]+)(?=\\s+|\\n|$))?(?:\\s+([\\d.]+)(?=\\s+|\\n|$))?",
                        "global": true,
                        "sensitive": true,
                        "multiline": false,
                        "singleline": false,
                        "continueWhenNoRes": false,
                        "ignoreInfiniteLoopsWhenGlobal": false
                    },
                    "mapper": {
                        "text": "{{17.value}}"
                    },
                    "metadata": {
                        "designer": {
                            "x": 408,
                            "y": -705,
                            "messages": [
                                {
                                    "category": "last",
                                    "severity": "warning",
                                    "message": "A transformer should not be the last module in the route."
                                }
                            ]
                        },
                        "restore": {
                            "parameters": {
                                "sensitive": {
                                    "collapsed": true
                                },
                                "multiline": {
                                    "collapsed": true
                                },
                                "singleline": {
                                    "collapsed": true
                                },
                                "continueWhenNoRes": {
                                    "collapsed": true
                                }
                            }
                        },
                        "parameters": [
                            {
                                "name": "pattern",
                                "type": "text",
                                "label": "Pattern",
                                "required": true
                            },
                            {
                                "name": "global",
                                "type": "boolean",
                                "label": "Global match",
                                "required": true
                            },
                            {
                                "name": "sensitive",
                                "type": "boolean",
                                "label": "Case sensitive",
                                "required": true
                            },
                            {
                                "name": "multiline",
                                "type": "boolean",
                                "label": "Multiline",
                                "required": true
                            },
                            {
                                "name": "singleline",
                                "type": "boolean",
                                "label": "Singleline",
                                "required": true
                            },
                            {
                                "name": "continueWhenNoRes",
                                "type": "boolean",
                                "label": "Continue the execution of the route even if the module finds no matches",
                                "required": true
                            },
                            {
                                "name": "ignoreInfiniteLoopsWhenGlobal",
                                "type": "boolean",
                                "label": "Ignore errors when there is an infinite search loop",
                                "required": true
                            }
                        ],
                        "expect": [
                            {
                                "name": "text",
                                "type": "text",
                                "label": "Text"
                            }
                        ],
                        "interface": [
                            {
                                "type": "text",
                                "name": "label",
                                "label": "label"
                            },
                            {
                                "type": "text",
                                "name": "$2",
                                "label": "$2"
                            },
                            {
                                "type": "text",
                                "name": "$3",
                                "label": "$3"
                            },
                            {
                                "type": "text",
                                "name": "$4",
                                "label": "$4"
                            },
                            {
                                "type": "text",
                                "name": "$5",
                                "label": "$5"
                            },
                            {
                                "type": "text",
                                "name": "$6",
                                "label": "$6"
                            },
                            {
                                "type": "text",
                                "name": "$7",
                                "label": "$7"
                            },
                            {
                                "type": "text",
                                "name": "$8",
                                "label": "$8"
                            },
                            {
                                "type": "text",
                                "name": "$9",
                                "label": "$9"
                            },
                            {
                                "type": "text",
                                "name": "$10",
                                "label": "$10"
                            },
                            {
                                "type": "text",
                                "name": "$11",
                                "label": "$11"
                            },
                            {
                                "type": "text",
                                "name": "$12",
                                "label": "$12"
                            },
                            {
                                "type": "text",
                                "name": "$13",
                                "label": "$13"
                            },
                            {
                                "type": "uinteger",
                                "name": "i",
                                "label": "i"
                            },
                            {
                                "type": "any",
                                "name": "__IMTMATCH__",
                                "label": "Fallback Match"
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "metadata": {
        "version": 1
    }
}
2 Likes

Worked! Thanks for your help.