Read HTML Table - Extension Operator
I was expecting the following URL to parse properly:
https://www.hockey-reference.com/leagues/NHL_2019_skaters.html
However, the operator did not find any tables on the page. The tutorial process parses tables from Wikipedia correctly, but fails on the page above.
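Before changing operator settings, it may help to check what the raw HTML actually contains. Some sites put tables inside HTML comments and reveal them with JavaScript, so a parser that skips comments would report no tables. Whether that applies to this page, or to this operator's parser, is an assumption to verify, not a confirmed cause. Below is a stdlib-only sketch that counts `<table>` tags in the live markup and inside comments. `TableCounter` and `count_tables` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> tags in the markup and '<table' occurrences inside HTML comments."""

    def __init__(self):
        super().__init__()
        self.visible = 0    # tables a comment-ignoring parser would see
        self.commented = 0  # tables hidden inside <!-- ... --> blocks

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.visible += 1

    def handle_comment(self, data):
        # Comment bodies are not parsed as markup, so search their text directly
        self.commented += data.lower().count("<table")


def count_tables(html: str) -> tuple[int, int]:
    """Return (visible_tables, commented_tables) for an HTML string."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.commented
```

To try it on the real page, download the HTML first, for example with `urllib.request.urlopen(url).read().decode("utf-8")`, and pass the string to `count_tables`. A zero in the first position would suggest that the served markup itself, not the operator's configuration, is the issue.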
That said, this is my go-to reference for my students because the tables parse easily in R and Python. For example:
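import pandas as pd

# read_html returns a list of DataFrames, one per <table> it finds on the page
tables = pd.read_html("https://www.hockey-reference.com/leagues/NHL_2019_skaters.html")
skaters = tables[0]  # take the first table found
print(skaters.head())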
Yes, the columns and data types need some cleanup, but that is part of the exercise and one reason I like this reference. I figured it would make an even more powerful training exercise in RM, given how much data prep is needed.
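Here is a minimal pandas sketch of the kind of cleanup involved. It assumes three things about the table's layout: it comes back with a two-row header (a MultiIndex whose missing group labels pandas fills with `Unnamed: ...`), its header row repeats inside the body, and a rank column named `Rk` identifies those repeats. These layout details, the helper names `flatten_column` and `clean_skaters`, and the tiny sample frame are illustrative assumptions, not taken from the actual page.

```python
import pandas as pd


def flatten_column(col) -> str:
    """Join a (group, name) header pair; drop pandas' 'Unnamed: ...' placeholder groups."""
    if not isinstance(col, tuple):
        return str(col)
    top, bottom = col[0], col[-1]
    if str(top).startswith("Unnamed"):
        return str(bottom)
    return f"{top}_{bottom}"


def clean_skaters(raw: pd.DataFrame, rank_col: str = "Rk") -> pd.DataFrame:
    df = raw.copy()
    # Two-row headers come back as a MultiIndex; flatten them to unique string names
    df.columns = [flatten_column(c) for c in df.columns]
    # Drop header rows repeated inside the body (their rank cell holds the header text)
    df = df[df[rank_col].astype(str) != rank_col].copy()
    # Convert a column to numeric only if every non-missing value parses as a number
    for col in df.columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().sum() == df[col].notna().sum():
            df[col] = converted
    return df.reset_index(drop=True)


# Hypothetical sample mimicking a two-level header and one repeated header row
cols = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Scoring", "G"),
    ("Scoring", "A"),
])
raw = pd.DataFrame(
    [["1", "Player One", "10", "20"],
     ["Rk", "Player", "G", "A"],
     ["2", "Player Two", "5", "7"]],
    columns=cols,
)
clean = clean_skaters(raw)
print(clean)
```

Joining the group and column names keeps grouped columns distinct even when different groups reuse the same short column label.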
Any help or tips on how to configure this operator would be much appreciated!