HTML Parsing/Web Scraping using Alteryx

Parkrun Ireland (the free weekly timed 5 km run) recently passed the 1,000,000 parkruns milestone. I wanted to take a look at how these runs had accumulated over the years. To do this, I needed the event history for each parkrun course.

Each course displays its event history in an HTML table.
I decided the quickest way to get this history for all 75 courses was to use Alteryx.

Step 1: Pass the URL to the Download tool which will output all the HTML.
Step 2: Use the RegEx tool to isolate and extract just the HTML for the event history table.
Step 3: Use the RegEx and Text to Columns tools to transform the HTML table into a pipe-delimited table.
Step 4: Use the Text to Columns tool to split the rows into columns at the pipe (|) delimiter.
Step 5: Use the Interface tools to turn the workflow into a batch macro that takes a list of courses as its input.
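
For readers without Alteryx, the same five steps can be roughly sketched in Python. This is only an illustration, not the workflow itself: the function names are made up for this example, the regular expressions are not the ones used in the RegEx tool, and the sketch assumes the event history is the first table on the page and that no cell contains a pipe character. The User-Agent header is also an assumption, since some sites reject requests that do not send one.

```python
import html
import re
from urllib.request import Request, urlopen


def fetch_html(url):
    """Step 1: download the page and return its HTML as text."""
    # Assumption: a browser-like User-Agent is sent with the request.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_table_html(page_html):
    """Step 2: isolate the first <table>...</table> block with a regex."""
    match = re.search(
        r"<table\b.*?</table>", page_html, flags=re.DOTALL | re.IGNORECASE
    )
    return match.group(0) if match else ""


def table_to_pipe_rows(table_html):
    """Step 3: turn each <tr> into one pipe-delimited line of cell text."""
    rows = []
    for row_html in re.findall(
        r"<tr\b.*?</tr>", table_html, flags=re.DOTALL | re.IGNORECASE
    ):
        cells = re.findall(
            r"<t[dh]\b[^>]*>(.*?)</t[dh]>",
            row_html,
            flags=re.DOTALL | re.IGNORECASE,
        )
        # Strip nested tags (links, spans) and decode entities such as &amp;.
        texts = [html.unescape(re.sub(r"<[^>]+>", "", c)).strip() for c in cells]
        if texts:
            rows.append("|".join(texts))
    return rows


def split_rows(pipe_rows):
    """Step 4: split each pipe-delimited line into a list of column values."""
    return [row.split("|") for row in pipe_rows]


def scrape_courses(urls, fetch=fetch_html):
    """Step 5: repeat steps 1-4 for every course URL, like a batch macro."""
    results = {}
    for url in urls:
        table_html = extract_table_html(fetch(url))
        results[url] = split_rows(table_to_pipe_rows(table_html))
    return results
```

Calling `scrape_courses` with a list of event history URLs returns a dictionary that maps each URL to its table rows, header row first. The `fetch` parameter exists only so the download step can be swapped out, for example to read saved HTML files instead of hitting the site. For anything beyond a quick sketch, a real HTML parser is more robust than regular expressions.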
The Tableau dashboard built from this data is available on Tableau Public.
