HtmlAgilityPack: HTML parsing in Xamarin.Forms (C# - XAML) | SubramanyamRaju Xamarin & Windows App Dev Tutorials

Friday 1 March 2019

HtmlAgilityPack: Html parsing in Xamarin.Forms (C# - Xaml)

Introduction:
Sometimes we want to parse an HTML page and extract information from it instead of rendering it. For that we can use the HtmlAgilityPack NuGet package, which offers a powerful API that is easy to use. In this post, we will set up a Xamarin.Forms project that uses HtmlAgilityPack and extract information from an HTML string.

Requirements:
  • The source code for this article was prepared with Visual Studio Community for Mac (7.6.9). It is best to install the latest Visual Studio updates.
  • The sample is a Xamarin.Forms .NET Standard library project, tested on the Android emulator and iOS simulators.
  • The HtmlAgilityPack NuGet package version used is 1.9.1.
Description:
To follow this article, we will take a sample HTML string that contains an <img> tag, then extract the src value of that <img> tag and display the image on a ContentPage.
The sample HTML string is:
    <!DOCTYPE html>
    <html>
    <body>
    <h2>HTML Image Parsing</h2>
    <img src='https://www.w3schools.com/html/img_girl.jpg' alt='Girl in a jacket' width='500' height='600'>
    </body>
    </html>
We need to display the image at "https://www.w3schools.com/html/img_girl.jpg" in our app by parsing that URL out of the HTML string above with HtmlAgilityPack.

Follow the steps below to parse the HTML:
Step 1: Add the HtmlAgilityPack NuGet package to all three projects (Core, Android, iOS)
Right-click each of the Core, Android, and iOS projects => Add => Add NuGet Packages => search for "HtmlAgilityPack" and add the package.



Step 2: Create a method to parse the HTML string

Parsing HTML is straightforward. First load the HTML string into an HtmlDocument, then look up the node you need (for example, img) and read its attribute value, as shown below.

    // Requires: using System.Linq; using HtmlAgilityPack;
    private string HtmlParsing(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);
        // Get the src value of the first <img>; the result is null if there is no <img> or no src attribute
        var imgUrl = document.DocumentNode.Descendants("img").FirstOrDefault()?.GetAttributeValue("src", null);
        return imgUrl;
    }
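
As a quick check of the method above, the following sketch runs the same parsing logic against the sample HTML string from the Description section, and against a string with no <img> tag. The class name HtmlParsingDemo and the method name FirstImageSrc are hypothetical names chosen for this illustration; the HtmlAgilityPack calls are the ones already used in this article.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlParsingDemo
{
    // Hypothetical helper that mirrors the article's HtmlParsing method.
    public static string FirstImageSrc(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // FirstOrDefault() returns null when the document has no <img>,
        // so the ?. operator avoids a NullReferenceException in that case.
        return document.DocumentNode
            .Descendants("img")
            .FirstOrDefault()
            ?.GetAttributeValue("src", null);
    }

    public static void Main()
    {
        const string html =
            "<!DOCTYPE html><html><body>" +
            "<h2>HTML Image Parsing</h2>" +
            "<img src='https://www.w3schools.com/html/img_girl.jpg' alt='Girl in a jacket' width='500' height='600'>" +
            "</body></html>";

        // prints: https://www.w3schools.com/html/img_girl.jpg
        Console.WriteLine(FirstImageSrc(html));

        // prints: True (no <img> tag, so the helper returns null)
        Console.WriteLine(FirstImageSrc("<p>no image here</p>") == null);
    }
}
```

Returning null for a missing image keeps the parsing method free of UI concerns; the caller decides what to show when nothing was found.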

If your HTML string contains several images, you can iterate over every img node with a foreach loop, as shown below.

    // Requires: using System.Collections.Generic; using HtmlAgilityPack;
    // Same name and parameter list as the single-image version, so keep only one of the two in a class.
    private List<string> HtmlParsing(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);
        // Fetch every <img> node in the document
        var imgs = document.DocumentNode.Descendants("img");
        var imageList = new List<string>();
        foreach (var node in imgs)
        {
            // Collect each src value (null when an <img> has no src attribute)
            imageList.Add(node.GetAttributeValue("src", null));
        }
        return imageList;
    }
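
The loop above adds a null entry for any <img> tag that has no src attribute. If that is not wanted, one possible variant is to build the list with LINQ and skip empty values. This is only a sketch: the class name ImageSrcExtractor and the method name AllImageSrcs are hypothetical, and dropping images without a src is an assumption about what the caller needs.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageSrcExtractor
{
    // Hypothetical helper: returns the src of every <img> that actually has one.
    public static List<string> AllImageSrcs(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        return document.DocumentNode
            .Descendants("img")
            // GetAttributeValue returns the supplied default (null) when src is absent.
            .Select(node => node.GetAttributeValue("src", null))
            // Drop missing or blank src values so callers only see usable entries.
            .Where(src => !string.IsNullOrWhiteSpace(src))
            .ToList();
    }
}
```

For example, AllImageSrcs("<img src='a.png'><img alt='x'><img src='b.png'>") yields a list with two entries, "a.png" and "b.png", in document order.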

Step 3: Displaying Image in ContentPage

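Now we can use the single-image method above to display the parsed imgUrl in a ContentPage with the C# code below. Assigning the string works because Xamarin.Forms converts a string to an ImageSource implicitly.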

    myImage.Source = HtmlParsing(html); // Displays the image from https://www.w3schools.com/html/img_girl.jpg
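
The parsed value may be missing or may not be a web URL (for example, a relative path such as "img_girl.jpg"). A more defensive variant of the assignment above could validate the value first. This is a minimal sketch under those assumptions: the class name ParsedImage and the method name ToImageSource are hypothetical, while Uri.TryCreate and ImageSource.FromUri are standard .NET and Xamarin.Forms calls.

```csharp
using System;
using Xamarin.Forms;

public static class ParsedImage
{
    // Hypothetical helper: returns null instead of throwing when the parsed
    // value is missing or is not an absolute http/https URL.
    public static ImageSource ToImageSource(string imgUrl)
    {
        // TryCreate returns false for null or malformed input. The scheme check
        // also rejects values that some platforms parse as file paths.
        if (Uri.TryCreate(imgUrl, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return ImageSource.FromUri(uri);
        }

        return null;
    }
}
```

With this helper, the assignment becomes myImage.Source = ParsedImage.ToImageSource(HtmlParsing(html)); and the Image simply stays empty when no usable URL was parsed.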

Output: the ContentPage displays the image loaded from the parsed src URL.


You can also work directly with the sample source code below to follow this article.

HtmlParsingXFSample



Feedback note: Please share your thoughts on this post. Was it helpful to you? Comments are always welcome.

Follow me always at @Subramanyam_B
Have a nice day :)
