Feb 27, 2017

Content Grabber V2 – With New Features

Content Grabber, a web scraping tool developed by Sequentum, is one of the most advanced web scraping products on the market, and it keeps getting better.

I am excited to highlight several enhancements in Version 2: a Chrome-based web browser, edge selections, data retention options, change tracking, export to a single database table, export script templates, export settings, file downloads, screenshots, retry errors, simplified action configurations, multiple command selections, group commands, an improved XPath editor, and new self-contained agents. These features help web scrapers extract data from websites more efficiently and effectively.

Here is what’s new in Content Grabber v2 in more detail:

Web Browser

Chrome-based Web Browser:

According to the latest data released by web tracker Net Market Share, Google Chrome is still the world’s most popular internet browser, with a market share of 48.65 percent in June. Content Grabber v1 depends entirely on Microsoft Internet Explorer being installed on the computer, whereas Content Grabber v2 has an embedded web browser based on Google Chrome and is completely self-contained. No existing web browser is required on the computer where Content Grabber is running, and agent behavior no longer depends on the existing browser configuration on that computer.

Edge Selections:

Content Grabber provides an easy way to select an element to extract on the website. One can easily click on the edge of a web element to select its parent. This is especially convenient while selecting web elements such as table rows, which cannot be selected directly in the web browser because they are completely covered by table cells.

Agents

Data retention:

New data retention options have been introduced in Content Grabber 2. Old data can now be removed, exported, retained, or retained for duplicate checks only. New default duplicate scripts can be used to copy old data to the current data set, so you end up with the complete current data set. In the old version (Content Grabber 1), when using duplicate scripts, it was impossible to know when data had been removed from the website.

Change tracking:

It is often important to know which data has changed. You may want to monitor changes on websites, for example to follow Amazon deals or to be notified of the latest jobs of interest on Upwork, or to track updates on a competitor’s website that doesn’t provide an RSS feed. In Content Grabber 1, it was not possible to track changes in extracted data without the use of scripts. Now an agent can monitor the latest changes made to the extracted data. The agent marks extracted data as deleted, modified or added. If data is deleted but later returns, it is marked as “returned”, or as “returned modified” if it returns in a modified state. An agent can be configured to export only data that has changed since the last successful run, or only data that has changed within a specified number of days.
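To make the status labels concrete, here is a minimal Python sketch of how such a classification could work. It is not Content Grabber's implementation: the function names, the record layout (a key mapped to `data` and `status`) and the extra `unchanged` label are assumptions made purely for illustration.

```python
def track_changes(previous, current):
    """Label each record by comparing the last run with the current run.

    previous: dict of key -> {"data": ..., "status": ...} from the last run
    current:  dict of key -> data extracted in this run
    (Hypothetical layout, chosen only for this sketch.)
    """
    tracked = {}
    for key, data in current.items():
        old = previous.get(key)
        if old is None:
            status = "added"
        elif old["status"] == "deleted":
            # The record disappeared earlier and is back again.
            status = "returned" if old["data"] == data else "returned modified"
        elif old["data"] != data:
            status = "modified"
        else:
            status = "unchanged"
        tracked[key] = {"data": data, "status": status}

    # Records seen before but missing now keep their old data, marked deleted.
    for key, old in previous.items():
        if key not in current:
            tracked[key] = {"data": old["data"], "status": "deleted"}
    return tracked


def changed_only(tracked):
    """Keep only records whose status is not 'unchanged' (for a changes-only export)."""
    return {k: v for k, v in tracked.items() if v["status"] != "unchanged"}
```

Calling `changed_only(track_changes(previous, current))` would then correspond roughly to exporting only the data that has changed since the last run.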

Success criteria:

A user can now define success criteria, a set of rules that determine whether the agent completed successfully or not. This can be used to monitor data changes and control notifications.
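As a rough illustration of rule-based success criteria, the Python sketch below evaluates a run against two thresholds. The `RunStats` fields and the threshold names are hypothetical and are not taken from Content Grabber; in the product, the criteria are configured on the agent.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    # Hypothetical run statistics, invented for this sketch.
    rows_extracted: int
    page_errors: int


def evaluate_run(stats, min_rows=1, max_page_errors=0):
    """Return (succeeded, reasons), where reasons lists every failed rule."""
    reasons = []
    if stats.rows_extracted < min_rows:
        reasons.append(
            f"extracted {stats.rows_extracted} rows, expected at least {min_rows}"
        )
    if stats.page_errors > max_page_errors:
        reasons.append(
            f"{stats.page_errors} page errors, allowed at most {max_page_errors}"
        )
    return (not reasons, reasons)
```

A notification step could then fire only when `evaluate_run(...)` reports a failure, passing along the list of reasons.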

Export from Multiple Agents:

Sometimes multiple agents need to export their data to a single set of database tables. This was not possible in the older version of Content Grabber, but now you can export the data from multiple agents to the same database tables. Example: a content aggregator, where multiple agents extract data from various websites and export it to the same database tables.
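The content aggregator idea can be sketched in Python with the standard `sqlite3` module. The table name `articles`, its columns and the agent names are assumptions made for illustration. Content Grabber handles the export itself, so this only shows the shape of the resulting data.

```python
import sqlite3


def export_rows(conn, agent_name, rows):
    """Write one agent's rows into the shared 'articles' table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               agent TEXT NOT NULL,
               url   TEXT NOT NULL,
               title TEXT,
               PRIMARY KEY (agent, url)
           )"""
    )
    # The (agent, url) key lets a re-run replace an agent's rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (agent, url, title) VALUES (?, ?, ?)",
        [(agent_name, row["url"], row["title"]) for row in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
export_rows(conn, "news_site_a", [{"url": "https://a.example/1", "title": "First"}])
export_rows(conn, "news_site_b", [{"url": "https://b.example/1", "title": "Second"},
                                  {"url": "https://b.example/2", "title": "Third"}])
total = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Storing the agent name alongside each row is one simple way to keep track of which website a record came from once everything lands in the same table.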

Export script templates:

Script templates are pre-written snippets of code provided by Content Grabber. Content Grabber provides several kinds of script templates that help with writing common code while developing web scraping agents. Export script templates have been added in Content Grabber 2, which makes it easier to build export scripts for agents. An export script template contains the C# / VB code required to write the data.

Export Settings:

Export Settings have been simplified. Data exported from a container can now be separated by setting a single option on the container. Exported data from list containers is now never separated by default, which should make the export data format more predictable.

File downloads:

File downloads are now more efficient and reliable when using the “Click to Download” option.

Screenshots:

Agents running in a service are now able to take screenshots, which was not possible in the previous version.

Retry errors:

The current version of an agent can now be used when continuing a data extraction or retrying errors. The current version of the agent can differ from the version at the time the data extraction started. An error occurs if the agent has been modified in a way that requires a new internal data structure. It’s still possible to use a version of the agent that was current when the data extraction started.

User Interface

Multiple command selections:

In Content Grabber 2, it is possible to select multiple commands at a time in Agent Explorer. Selected commands can be easily deleted, copied, moved or grouped together.

Group Commands:

A user can now select multiple commands and move them inside a Group command, which makes an agent much easier to navigate and maintain.

Improved XPath editor:

Content Grabber provides a point-and-click interface to select any element and automatically gets the XPath of the selected element. If you need to edit the XPath, you can use the XPath editor. The improved XPath editor now has color highlighting and has been moved from the Ribbon menu to a docking window.
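For readers who have not edited XPath by hand before, the short Python sketch below shows the kind of expressions involved, using the limited XPath subset of the standard `xml.etree.ElementTree` module. The sample table and its class names are made up for illustration, and Content Grabber's own XPath engine may support more syntax than this module does.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup; it must be well-formed XML for ElementTree to parse it.
PAGE = """<table>
  <tr class="header"><th>Name</th><th>Price</th></tr>
  <tr class="item"><td>Widget</td><td>9.99</td></tr>
  <tr class="item"><td>Gadget</td><td>19.50</td></tr>
</table>"""

root = ET.fromstring(PAGE)

# Select only the data rows by attribute value.
item_rows = root.findall(".//tr[@class='item']")

# Within each row, pick the first cell by position (XPath positions start at 1).
names = [row.find("td[1]").text for row in item_rows]

# Step from the cells up to their parent rows with '..'.
rows_with_cells = root.findall(".//td/..")
```

The last expression mirrors selecting a table row through its cells: it starts at the `td` elements and moves up to the enclosing `tr`.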

Self-Contained Agents

In Content Grabber 2, the self-contained agent’s user interface has been redesigned. The user interface is now completely based on HTML and CSS. A user can customize an agent’s look and feel with a Premium Edition Content Grabber license.

API

The Content Grabber 2 API now supports pagination, which is essential when there is a large number of records: a large data set is divided into smaller data sets. Previously this was not possible. Pagination improves user experience, reduces the server load and enables faster retrieval.
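The general pattern of consuming a paginated data source looks roughly like the Python sketch below. It does not use the actual Content Grabber API: `fetch_page` is a hypothetical callback standing in for whatever call returns one page of records, and the in-memory fake exists only so the example runs on its own.

```python
def iter_records(fetch_page, page_size=100):
    """Yield all records by requesting one page at a time.

    fetch_page(page, page_size) is a hypothetical callback that returns a
    list of at most page_size records for the given 1-based page number.
    """
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        # A short (or empty) page means there is nothing left to fetch.
        if len(batch) < page_size:
            break
        page += 1


def make_fake_api(records):
    """Build an in-memory stand-in for a paginated endpoint (illustration only)."""
    def fetch_page(page, page_size):
        start = (page - 1) * page_size
        return records[start:start + page_size]
    return fetch_page


fetch = make_fake_api(list(range(250)))
all_records = list(iter_records(fetch, page_size=100))
```

Because only one page is held in memory at a time, the caller can start processing records before the whole data set has been retrieved.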

 
