Get web page using HtmlAgilityPack.NETCore
Use HttpClient, the newer way to interact with remote resources over HTTP.
As for your solution, you should probably use the async methods here so that you do not block your thread, instead of calling .Result.
Also note that HttpClient, available since .NET 4.5, was meant to be shared across threads, so you should not recreate it for each request:
    // instance or static variable, created once and reused
    HttpClient client = new HttpClient();

    // get the response without blocking the thread
    using (var response = await client.GetAsync(url))
    {
        using (var content = response.Content)
        {
            // read the body without blocking the thread
            var result = await content.ReadAsStringAsync();
            var document = new HtmlDocument();
            document.LoadHtml(result);
            // "Your nodes" is a placeholder: put your XPath expression here
            var nodes = document.DocumentNode.SelectNodes("Your nodes");
            // Some work with the page...
        }
    }
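For context, the fragment above could be placed in a complete program roughly as follows. This is only a sketch under some assumptions: the class name PageFetcher, the method name LoadPageAsync, the example.com URL and the //a[@href] XPath are made up for illustration, and an async Main needs C# 7.1 or later (on an older compiler you would call LoadPageAsync(...).GetAwaiter().GetResult() from a synchronous Main instead).

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageFetcher
{
    // One shared instance for the whole application, reused for every request.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: downloads a page and returns the parsed document.
    public static async Task<HtmlDocument> LoadPageAsync(string url)
    {
        using (var response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException on a non-success status code.
            response.EnsureSuccessStatusCode();

            var html = await response.Content.ReadAsStringAsync();
            var document = new HtmlDocument();
            document.LoadHtml(html);
            return document;
        }
    }

    public static async Task Main()
    {
        // Placeholder URL and XPath, chosen only for illustration.
        var document = await LoadPageAsync("http://example.com");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = document.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (var link in links)
        {
            Console.WriteLine(link.GetAttributeValue("href", string.Empty));
        }
    }
}
```

Keeping the client in a static readonly field is one way to follow the "do not recreate it each time" advice; an instance field on a long-lived object would work as well.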
A great article about async/await: "Async/Await - Best Practices in Asynchronous Programming" by Stephen Cleary (MSDN Magazine, March 2013).
I had the same problem in Visual Studio Code with netcoreapp1.0. I ended up using HtmlAgilityPack version 1.5.0-beta5 instead.
Remember to add:
    using HtmlAgilityPack;
    using System.Net.Http;
    using System.IO;
I did it like this:
    HttpClient hc = new HttpClient();
    HttpResponseMessage result = await hc.GetAsync("http://somewebsite.com");
    Stream stream = await result.Content.ReadAsStreamAsync();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(stream);
    // SelectNodes returns null when nothing matches the XPath expression
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='whateverclassyouarelookingfor']");