Dec 16, 2014

A Comparison of Web Scraping Software: Import.io, Visual Web Ripper, Newprosoft and Mozenda

The Set Up: One of our clients wanted to extract retail store locations from the Canadian Tire store locator (http://www.canadiantire.ca). He provided us with a file containing the postal codes of the cities for which he wanted store locations scraped.
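
Everything in this job depends on that input file, so it is worth cleaning it before loading it into any tool. The following is a minimal Python sketch for illustration only; it is not part of the tool-based workflow described in this post. It assumes the file is a CSV with a header column named `postal_code` (a hypothetical name, since the client's file layout is not described here), normalizes each entry to the `A1A 1A1` form, and drops blank, malformed and duplicate codes. The regex is a simplified format check, not a full Canadian postal code validator.

```python
import csv
import io
import re

# Simplified Canadian postal code pattern: letter-digit-letter, optional space, digit-letter-digit.
# It does not exclude letters that never appear in real postal codes (D, F, I, O, Q, U).
POSTAL_RE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")


def load_postal_codes(text, column="postal_code"):
    """Return unique, normalized postal codes from CSV text, preserving file order.

    `column` is a hypothetical header name; adjust it to match the real file.
    """
    seen = set()
    codes = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get(column) or "").strip().upper()
        match = POSTAL_RE.match(raw)
        if not match:
            continue  # skip blank and malformed entries
        code = f"{match.group(1)} {match.group(2)}"  # normalize to "A1A 1A1"
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


if __name__ == "__main__":
    sample = "postal_code\nm5v 3l9\nM5V3L9\n12345\nH3B 2Y5\n"
    print(load_postal_codes(sample))  # ['M5V 3L9', 'H3B 2Y5']
```

For a file on disk, read it with `open(path, newline="")` and pass the contents to `load_postal_codes`.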

Import.io

For this requirement, we first tried the Import.io web scraping service (https://import.io/) to scrape the retail store locations. Import.io does not let the user feed input values from a list into a form and submit it, so it did not work for this requirement.

Newprosoft.com

Next we tried Web Content Extractor (http://www.newprosoft.com), a web scraping tool from Newprosoft. It also failed to meet our requirement, because it does not support filling out forms from a file containing a list of values.

Visual Web Ripper – Web Scraping Software

We then started hunting for other web extraction software that better suited our requirement and came across Visual Web Ripper, a tool for automated web scraping. It has an “Input Data Source” option for supplying a list of input values to a data extraction project, but setting it up was not something a non-programmer could do, so it was back to the drawing board with our project.

Mozenda – Screen Scraper

We eventually landed on the Mozenda website (www.mozenda.com). We registered for Mozenda’s free trial, started working with their tool, and quickly found that Mozenda was the best fit for our requirements. Mozenda provides a web automation feature that automatically fills out web forms and submits queries using static or dynamic inputs. Moreover, in our experience Mozenda was the easiest and fastest way to scrape data from the web: it provides a simple point-and-click interface, the scraper projects run in its cloud computing environment, and it supports a wide range of export file formats.

One can extract data from a website using Mozenda by following three easy steps illustrated below:

  1. Build a data extraction project (an agent) using the Mozenda Agent Builder Windows application, which lets you build agents that extract specific information from a website.
    1. Type in the URL of the target website (http://www.canadiantire.ca/en/store-locator.html in our case) and navigate to the webpage you want to start gathering information from.
    2. Click “Start a new agent from this page” on the left pane.
    3. Click the Address, City or Postal Code search text box to create a List of Inputs, where you choose the source of the input values.
    4. There are three options for the input source: import a file, use a collection, or enter inputs manually. Since we already had a file containing all the Canadian postal codes, we selected the import a file option and browsed to the CSV file. The file is uploaded and a collection is created. Then select the field from the file that the agent will use in the search.

    5. Click on the search button.
    6. Click the “Create a list of items” action on the left pane, then click each label you want to extract and specify a field name for it. Repeat this step for every field you want to extract.

    7. Finally, test the agent. If the test is successful, save the agent.

  2. Run the agent. Clicking the “Run Now” option runs the agent from the Mozenda Web Console application, which starts it in the cloud. While the agent is running, you can view the extracted results in the Web Console, which also offers a number of options for managing the results and other tasks. You can also schedule the agent to run at specific times and intervals.
  3. Manage the extracted results. After the agent has successfully completed its run, you can export the results to various file formats such as CSV and Excel.
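
In Mozenda the List of Inputs, the agent run and the export are all configured through the GUI, so there is no code to show for the tool itself. For readers who want to see the pattern those steps automate, here is a minimal Python sketch: run one search per postal code, collect the returned store records, drop stores already found through a nearby code, and write the results to CSV. The `fetch_stores` callable and the output fields (`name`, `address`, `phone`) are hypothetical stand-ins; this post does not describe the Canadian Tire store locator's request format or response fields, so a real fetcher would have to be written against the actual site.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, List

# Hypothetical output columns; the real store locator may return different fields.
FIELDS = ["postal_code", "name", "address", "phone"]


def scrape_stores(
    postal_codes: Iterable[str],
    fetch_stores: Callable[[str], List[Dict[str, str]]],
    delay_seconds: float = 1.0,
) -> List[Dict[str, str]]:
    """Run one search per postal code and tag each store with the code that found it.

    `fetch_stores` is a hypothetical callable that submits the search form for one
    postal code and returns a list of store dicts.
    """
    results = []
    seen = set()
    for code in postal_codes:
        for store in fetch_stores(code):
            # Nearby postal codes often return the same store; keep only the first hit.
            key = (store.get("name"), store.get("address"))
            if key in seen:
                continue
            seen.add(key)
            results.append({"postal_code": code, **store})
        time.sleep(delay_seconds)  # pause between searches so requests are not sent in a burst
    return results


def export_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text with a fixed header, ignoring unexpected keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Fake fetcher standing in for a real form submission.
    fake_data = {
        "M5V 3L9": [{"name": "Store A", "address": "1 Main St", "phone": "555-0100"}],
        "M5V 2T6": [
            {"name": "Store A", "address": "1 Main St", "phone": "555-0100"},
            {"name": "Store B", "address": "2 King St", "phone": "555-0101"},
        ],
    }
    rows = scrape_stores(list(fake_data), lambda code: fake_data.get(code, []), delay_seconds=0)
    print(export_csv(rows))
```

Passing the fetcher in as a parameter keeps the input loop, de-duplication and export testable without touching the network.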
