I’m trying to extract data from an HTML table that arrives in an email. I first tried a regex against the text version of the email, but it doesn’t work there even though the pattern validates on regex testers.
I’m now using the text parser’s deprecated “extract from html table”, which returns the data as nested tables, rows, and columns that are hard for me to use. I’m trying to run that output through the Iterator module, but I haven’t been able to set it up correctly because it’s a nested array, and that’s what is puzzling me.
Here’s an example of the HTML I’m trying to work with. Specifically, I want to extract the values of “order number”, “customer email”, “customer name”, “comments”, and “star rating”.
<table border="0" width="100%" cellspacing="0" cellpadding="0" bgcolor="#f6f6f6">
<tbody>
<tr>
<td>A customer has left you feedback from their order.</td>
<td width="580">
<div>
<table border="0" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
</tr>
<tr>
<td>
<h1>Customer 2494157 sent feedback</h1>
<table border="0" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><img src="https://s3-us-west-2/logoEmail.png" alt="" width="70" /></td>
</tr>
</tbody>
</table>
<table border="0" width="80%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>Order Number</td>
<td>6651</td>
</tr>
<tr>
<td>Customer Email</td>
<td>family98@gmail.com</td>
</tr>
<tr>
<td>Customer Name</td>
<td>John Smith</td>
</tr>
<tr>
<td>Comments</td>
</tr>
<tr>
<td>Star Rating</td>
<td>5</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
<tr>
<td>
<p>- Laundry Team</p>
</td>
</tr>
</tbody>
</table>
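To make the goal concrete, here is a minimal sketch of the label/value pairing I’m after. It is written in Python with the standard-library `html.parser`, not in the scenario itself. The names `RowCollector`, `extract_fields`, and `WANTED` are made up for illustration, the sample string is a trimmed copy of the HTML above, and it assumes each label sits in the first cell of a row with its value (if any) in the second.

```python
from html.parser import HTMLParser

# Labels to pull out; assumed to match the first cell of a row exactly.
WANTED = ("Order Number", "Customer Email", "Customer Name", "Comments", "Star Rating")


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by its innermost <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._open_rows = []    # stack of rows still open (nested tables)
        self._open_cells = []   # stack of text fragments for open <td>s

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._open_rows.append([])
        elif tag == "td":
            self._open_cells.append([])

    def handle_data(self, data):
        # Text belongs to the innermost open cell only.
        if self._open_cells:
            self._open_cells[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._open_cells:
            # Collapse runs of whitespace and trim the cell text.
            text = " ".join("".join(self._open_cells.pop()).split())
            if self._open_rows:
                self._open_rows[-1].append(text)
        elif tag == "tr" and self._open_rows:
            self.rows.append(self._open_rows.pop())


def extract_fields(html, wanted=WANTED):
    """Return {label: value}; a label with no value cell maps to ''."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    fields = {}
    for cells in parser.rows:
        if cells and cells[0] in wanted:
            fields[cells[0]] = cells[1] if len(cells) > 1 else ""
    return fields


# Trimmed copy of the email HTML above (nested table, empty Comments row).
SAMPLE = """
<table><tbody><tr><td>
<h1>Customer 2494157 sent feedback</h1>
<table><tbody>
<tr><td>Order Number</td><td>6651</td></tr>
<tr><td>Customer Email</td><td>family98@gmail.com</td></tr>
<tr><td>Customer Name</td><td>John Smith</td></tr>
<tr><td>Comments</td></tr>
<tr><td>Star Rating</td><td>5</td></tr>
</tbody></table>
</td></tr></tbody></table>
"""

if __name__ == "__main__":
    for label, value in extract_fields(SAMPLE).items():
        print(f"{label}: {value!r}")
```

Note that the “Comments” row in the sample has no value cell, so the sketch maps it to an empty string instead of skipping it.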
I’d appreciate help. I’m pretty lost at this point.