
Review: Web Content Extractor’s Online/Cloud-Based Scraping Platform

Scraping data, that is, extracting data from a website, can be a daunting task if the site holds a lot of data. If you scrape a website without automation software for web scraping, like Web Content Extractor, you are likely to spend a great deal of time extracting the data. With automation software, you can finish the work sooner and with very little effort.

This tool certainly makes tedious, repetitive tasks easy, but it has another plus point: accuracy. If you extract data from an HTML website without an automated tool, errors are likely to creep in. This can happen for several reasons, but the most prominent one is human error. With this tool, that error is minimized: once you specify the type of data and set its fields, all the similar data is scraped for you, without the hassle of doing it manually, and with far fewer mistakes.

Newprosoft’s Web Content Extractor has been a player in web scraping for many years, since before some of the current online/cloud-based scraping platforms entered the market. Its desktop scraping software has long been a solid choice, and now it offers an online counterpart: you don’t have to download Web Content Extractor to extract your data. You can use the online version and scrape without leaving your browser. This has certainly increased the productivity of Web Content Extractor and also opened the door to integrating scraping with other tools via APIs.

Introduction: Web Content Extractor is very helpful for obtaining data from a website without any hassle. With this tool you can extract product pricing data and real estate data; scrape eCommerce products and song or movie information; gather news and articles on a given topic; and collect information from dating sites, job boards, or any other web resource. This is only a short list of what Web Content Extractor can do; you are not limited to the applications above, as it works with virtually any kind of web information and can be customized to handle any kind of website data. It is highly automated: you design your scraper in the browser, using a point-and-click interface to define the scraper’s navigation/crawling and the data fields to capture.

As for pricing, it is very economical and affordable for anyone in need of its services.

pricing plan

If you are a light user who needs, say, 1,000 pages per month, you ought to choose the Entry Plan, which costs $30 per month for scraping up to 10,000 pages.

If you are a medium user with a website that requires crawling around 50K pages, you will need the Standard Plan. It costs $60 for 50,000 pages per month.

If your web scraping needs are very high and you need more than 50,000 pages scraped, you should go for the Enterprise Plan. It offers 500,000 pages per month for $150 per month.

They also provide a web scraping service and include the first two agents for free with every plan.
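To put the plans in perspective, here is a quick back-of-the-envelope calculation of the cost per 1,000 pages, using only the prices and quotas listed above:

```python
# Cost per 1,000 pages for each Web Content Extractor plan,
# based on the prices and page quotas quoted in this review.
plans = {
    "Entry": (30, 10_000),        # ($ per month, pages per month)
    "Standard": (60, 50_000),
    "Enterprise": (150, 500_000),
}

for name, (price, pages) in plans.items():
    per_thousand = price / (pages / 1_000)
    print(f"{name}: ${per_thousand:.2f} per 1,000 pages")
# Entry: $3.00 per 1,000 pages
# Standard: $1.20 per 1,000 pages
# Enterprise: $0.30 per 1,000 pages
```

As usual with tiered pricing, the per-page cost drops sharply as you move up the plans.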

You have the choice to go all out or stay conservative with your scraping. Now go and scrape the hell out of those web pages.

How to do Web Scraping using this platform?

If you are wondering how to use this online version, you don’t have to worry. I have prepared small, basic steps that scrape one simple level of data from the Playground example “Detail Page With Hyper Link”. Let’s get to it.

  • First of all, go to http://www.webcontentextractor.com/
  • Then scroll down to “Get started with a free account”.
  • Create an account if you don’t have one already.
  • Once logged in, click on New Project. A new window will open; enter the project name and click Create to get a window like the one below.
  • Enter your start URL, or a list of start URLs, in the next step.

project-configuration

Once you enter the URL and click the “Next” button, you will see the layout shown in the screenshot below, where you can define the crawling/navigation pattern and depth. You can use the point-and-click interface to highlight the clickable links that the scraper needs to crawl repeatedly.

navigation pattern

 

In our case there is only one level of depth, so we choose the clickable links of the profile names; when we select two similar links, the platform automatically detects them and prompts us to crawl all similar links.
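As a rough sketch of what “crawl all similar links” amounts to under the hood, the snippet below collects every hyperlink whose URL matches a detail-page pattern. The HTML snippet and the `/profile/` pattern are invented for illustration; the platform does all of this visually, without code.

```python
# Minimal sketch: collect all "similar" links, i.e. hyperlinks whose
# href matches a common detail-page URL pattern.
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Keep only <a> tags whose href matches the pattern.
        if tag == "a":
            href = dict(attrs).get("href", "")
            if self.pattern.search(href):
                self.links.append(href)

page = """
<ul>
  <li><a href="/profile/alice">Alice</a></li>
  <li><a href="/profile/bob">Bob</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

collector = LinkCollector(r"^/profile/")
collector.feed(page)
print(collector.links)  # ['/profile/alice', '/profile/bob']
```

The `/about` link is skipped because it does not fit the detail-page pattern, which is exactly the behavior you get when the platform generalizes from two selected links.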

Similar-Navigation-Links

After defining the crawling pattern, it’s time to define the data fields that need to be scraped from the profile page. As shown in the screenshot below, we can select a specific element in the browser and give it a name.

fields

Here we have defined fields such as Name, Address, City, State, Email, Zipcode, Phone, URL, and Timestamp. On the left side you can see the list of defined fields, and on the right side an HTML tree view of the page’s DOM elements.
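Conceptually, each point-and-click field definition boils down to locating an element on the detail page and pulling out its text. Here is a hedged sketch of that idea with an invented sample page and regex patterns of my own; the platform itself requires no code for this:

```python
# Sketch: what a set of field definitions roughly translates to.
# The sample HTML and the patterns are made up for illustration.
import re

detail_page = """
<div class="profile">
  <h1 class="name">Jane Doe</h1>
  <span class="city">Springfield</span>
  <a href="mailto:jane@example.com">jane@example.com</a>
</div>
"""

fields = {
    "Name": r'<h1 class="name">([^<]+)</h1>',
    "City": r'<span class="city">([^<]+)</span>',
    "Email": r'mailto:([^"]+)"',
}

# One scraped record: field name -> extracted value.
record = {name: re.search(pat, detail_page).group(1)
          for name, pat in fields.items()}
print(record)
# {'Name': 'Jane Doe', 'City': 'Springfield', 'Email': 'jane@example.com'}
```

The scraper repeats this extraction on every crawled detail page, producing one such record per page.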

The platform also provides basic transformation functionality. When you define a field, or edit an existing one, you can apply built-in transformation functions like the ones below.

Text Transformation

  • Add String at the Beginning:
    Adds a string to the beginning of the field value.
  • Add String at the End:
    Adds a string to the end of the field value.
  • Extract Email:
    Extracts email addresses from a string using pattern matching (a regular expression).
  • Extract Phone:
    Extracts phone numbers from a string using pattern matching (a regular expression).
  • Left String:
    Extracts a substring from the left side of the string.
  • Replace String:
    Replaces the substring specified in the find-what parameter with the substring specified in the replace-with parameter.
  • Replace String Regex:
    Replaces all substrings that match a regular expression pattern with a specified replacement string. The pattern parameter specifies the expression; the replacement parameter specifies the replacement string.
  • Right String:
    Extracts a substring from the right side of the string.
  • Split String:
    Splits the text and extracts a part of it.
  • Strip HTML:
    Removes all HTML tags from the string.
  • Sub String:
    Extracts part of the string (a substring) from the whole string.
  • Sub String RegEx:
    Extracts a substring from the string using a regular expression (regex).
  • Trim String:
    Removes whitespace, tab, and line-break characters from both sides of the string.
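To make the list concrete, here are rough Python equivalents of a few of these transformations. The regex patterns are simplified approximations of mine, not the platform’s actual implementations:

```python
# Simplified Python equivalents of some built-in transformations.
# Patterns are illustrative approximations, not the platform's own.
import re

def add_prefix(s, prefix):   # "Add String at the Beginning"
    return prefix + s

def strip_html(s):           # "Strip HTML"
    return re.sub(r"<[^>]+>", "", s)

def extract_email(s):        # "Extract Email" (simplified pattern)
    m = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", s)
    return m.group(0) if m else ""

def trim_string(s):          # "Trim String"
    return s.strip()

raw = "  <b>Contact:</b> jane@example.com  \n"
print(extract_email(strip_html(raw)))           # jane@example.com
print(add_prefix(trim_string("  Doe  "), "Dr. "))  # Dr. Doe
```

In the platform these functions can be chained on a field in the same way the calls are composed here.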

Now it’s time to execute the agent we have created. Click the green “Run Scraper” button at the top, then relax: the scraper will run and extract the data. Below is a screenshot of the data extracted by this simple scraping agent. You can download the extracted data in CSV and JSON format.
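Once you have downloaded the results, loading them for further processing is straightforward with the Python standard library. The sample below uses invented records shaped like the export; the field names are assumptions based on the fields we defined earlier:

```python
# Loading downloaded results: CSV and JSON exports carry the same records.
# The sample data and column names are invented for illustration.
import csv
import io
import json

csv_export = (
    "Name,City,Email\n"
    "Jane Doe,Springfield,jane@example.com\n"
    "John Roe,Shelbyville,john@example.com\n"
)

# In practice you would open the downloaded .csv file instead of a string.
rows = list(csv.DictReader(io.StringIO(csv_export)))

json_export = json.dumps(rows)      # the JSON download holds equivalent records
records = json.loads(json_export)

print(rows[0]["Email"])   # jane@example.com
print(len(records))       # 2
```

From here the records can be fed into a spreadsheet, a database, or any other tool in your pipeline.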

Scraper Data Result

The platform also provides a Log tab, where you can view the scraper’s log, and a Scraper URL Graph tab, which lists the URLs the scraper crawled and shows whether each one executed successfully with a 200 OK response or failed.

URL Graph

 

Conclusion:

Overall, Web Content Extractor Online has made data scraping so easy that even a newbie can use it to extract the desired data without any problem. All you need is basic knowledge of how to use a browser and the ability to read; no coding is required to extract data from a big website full of images and data. This tool turns hours of work into a few minutes and a few clicks. So, to avoid the hassle and wasted time of the tedious, boring task of manual data extraction, use Web Content Extractor.

That said, this scraping platform is best suited for simple data extraction jobs, as it does not have as rich a set of commands as other scraping platforms: conditional flow, JavaScript-based custom interaction, CAPTCHA solving, JSON and XML parsing, and many other features are missing.

 
