Import data from a current web session?
In 11.3 you can do:
ExternalEvaluate[session, "JavascriptExecute" ->
"return document.documentElement.outerHTML;"]
In 12.0 and up, the syntax has changed a little:
session = StartWebSession[];
WebExecute[session, "OpenWebPage" ->
"https://weather.com/weather/tenday/l/New+York+NY+10017:4:US"]
html = WebExecute[ session, "JavascriptExecute" ->
"return document.documentElement.outerHTML;"]
Note that the Wolfram Language comes with a WeatherData function built-in, so you don't need to scrape data from a web page.
Also, the National Weather Service has a public API, which might give you a more structured way to get this sort of data.
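As a rough sketch of the built-in route: the location string "New York" is an assumption here (a station code or a city entity may resolve more reliably), and the call needs access to Wolfram's data servers.

```wolfram
(* Current temperature for the location; "New York" as a location spec is an assumption *)
WeatherData["New York", "Temperature"]
```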
You can import the html above with this:
ImportString[ html, "XMLObject" ]
This gives you a Wolfram Language expression that you can traverse with Part and Take, and use with functions like Cases.
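For instance, a minimal sketch of pulling link targets out of the symbolic XML with Cases. The sampleHtml string is a made-up stand-in for the html obtained from the session, and the format is given explicitly as "HTML" rather than left to auto-detection:

```wolfram
(* sampleHtml is a hypothetical stand-in for the html string from the web session *)
sampleHtml = "<html><body><p><a href=\"https://example.com/a\">A</a></p><p><a href=\"https://example.com/b\">B</a></p></body></html>";

(* Parse as HTML and return the symbolic XML tree (XMLObject / XMLElement) *)
xml = ImportString[sampleHtml, {"HTML", "XMLObject"}];

(* Collect the href attribute of every <a> element at any depth *)
Cases[xml, XMLElement["a", {___, "href" -> url_, ___}, _] :> url, Infinity]
```

The same pattern should carry over to other tags by changing the element name and the attribute rule.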
If your actual interest is stocks, then you should probably be aware of the FinancialData function.
This works in Mathematica 11.3 but not in 12, as websession[] seems to have changed.
Module[
{
session = StartExternalSession["WebDriver-Chrome"],
iws, chromedo, img, links
},
chromedo[cmd_] := ExternalEvaluate[session, cmd];
Pause[1];
iws = ExternalEvaluateWebDriver`Private`websession[];
Pause[1];(*Time to load chrome*)
chromedo[
"OpenWebPage" ->
"https://www.barchart.com/stocks/quotes/SPY/options?moneyness=allRows"
];
Pause[15];(*Time to load the page*)
Echo@WebUnit`GetURL[iws];
html = WebUnit`GetPageHtml[iws];
DeleteObject[session];
]
TableForm[ImportString[html, {"HTML", "Data"}][[1, 2, 2, 2, 1, 2]]]
Here is a fairly stupid workaround to solve the above problem.
As suggested by rhermans, we can first obtain the html text of the webpage after it has finished loading:
session = StartExternalSession["WebDriver-Chrome"];
ffoxdo[cmd_] := ExternalEvaluate[session, cmd];
iws = ExternalEvaluateWebDriver`Private`websession[];
ffoxdo[ "OpenWebPage" -> "https://weather.com/weather/tenday/l/New+York+NY+10017:4:US"];
Pause[3];
html=WebUnit`GetPageHtml[iws];
DeleteObject[session];
Then, since Import cannot be used directly on a string of html text, we save it to disk and load from there:
Export["my.txt", html];
RenameFile["my.txt", "my.html"];
data = Import["my.html", "Data"];
DeleteFile["my.html"];
Now data indeed contains the output I was hoping for. But the workaround of writing to disk first is kind of unsatisfactory.
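The round trip through disk may be avoidable: ImportString, already used in the earlier answer above, takes the string directly. A minimal sketch, where sampleHtml is a hypothetical stand-in for the html string returned by the session:

```wolfram
(* sampleHtml is a made-up stand-in for the page source held in html *)
sampleHtml = "<html><body><table><tr><td>Mon</td><td>75</td></tr><tr><td>Tue</td><td>72</td></tr></table></body></html>";

(* Same "Data" element as Import["my.html", "Data"], but read from the string in memory *)
data = ImportString[sampleHtml, {"HTML", "Data"}]
```

With the real page, ImportString[html, {"HTML", "Data"}] would take the place of the Export, RenameFile, Import and DeleteFile steps, assuming the "Data" element is all that is needed.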