Most organizations depend on the web to collect data that is important to their decision-making process. Automating data collection from websites can help businesses reduce time, costs and manual errors.
Content Grabber, an advanced web scraping application, can help businesses automatically harvest data from the web, and it requires no programming. One of its distinctive features is the “self-contained agent”, which provides a way to run web scraping through agent-assisted automation.
The self-contained agent feature is available only in the Professional and Premium versions of Content Grabber.
What is a Self-Contained Agent?
A self-contained agent is a wrapper for your Content Grabber project that makes it independent of what the user has installed. It is a single executable containing all resources and the runtime, and it can be executed on any machine running the Windows operating system.
A self-contained web scraping agent can be distributed royalty free, and it runs without the Content Grabber software being present on the target computer.
New business opportunity
Not only can self-contained agents be distributed royalty free, they can also be white-labelled (branded) with your own company logos. This means businesses can create and package their own web scraping agents and sell them, which is a good opportunity for new revenue. Some example applications include:
- Brand monitoring – extract text when a company name is mentioned on social media sites
- Executive recruiting – monitor sites for changes in executive profile status
- Prospecting – scan web email for prospective customer details
Content Grabber Self-Contained Agent in Action
Working with Content Grabber feels natural. Its visual editor includes a point-and-click, wizard-based tool that can be used to create a self-contained web scraping agent.
How to create a self-contained web scraping agent?
To create a self-contained agent, open the main menu and select File->Export Agent.
The Export Agent dialog box appears.
Check the “Create Self-Contained Agent” option in the Export Agent dialog box.
The Export Agent dialog box offers several options, such as the agent filename, whether to include internal data, whether to include external data, and Customize Design. Let us explore Customize Design in detail.
Customize Agent Design:
The “Customize Design” option lets you customize the look and feel of the agent being created. Clicking the Customize Design button opens the Customize Design dialog box.
The Customize Design dialog box has three tabs:
- Agent Details Tab: You can customize a self-contained agent by providing agent details such as name, logo, icon and description. This information is displayed on the Startup screen of the agent.
- Company Details Tab: You can specify branding details such as company name, logo, website, email, address and description. This information is displayed on the About screen of the standalone agent.
- Templates Tab: You can specify custom template files for white-labelling your agent. The template files are HTML files that control the look and feel of the agent layout. You can also choose which options are enabled or disabled in the agent interface.
After making the necessary customizations, click the Save button.
Finally, click the Export button. That’s it. You have created a Content Grabber self-contained web scraping agent.
Example:
We will work through a sample agent that scrapes Zillow, an online real estate database. We have a set of addresses as input values. For each address, we want to look up the Zestimate, number of baths, number of beds and square footage (Sqft) on the Zillow website, and we want the extracted data written to a CSV file as output.
On the example agent's “Input Values” screen, we pasted the list of addresses, one address per line.
To make the input value visible on this screen, set the Public Provider property to True and assign a name to the Public Provider Name property. Here we assigned the name “Property Addresses”.
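If the addresses come from another system, it can help to tidy them before pasting them into the input screen. The following is a minimal Python sketch of that preparation step, run outside Content Grabber. The function name and the sample addresses are hypothetical placeholders, not part of the product.

```python
def build_input_block(addresses):
    """Return addresses as newline-separated text, trimmed and de-duplicated.

    This helper is hypothetical; it only prepares text to paste into the
    agent's input screen and does not talk to Content Grabber itself.
    """
    seen = set()
    cleaned = []
    for raw in addresses:
        address = " ".join(raw.split())  # collapse stray whitespace
        key = address.lower()            # compare case-insensitively
        if address and key not in seen:
            seen.add(key)
            cleaned.append(address)
    return "\n".join(cleaned)


# Placeholder addresses for illustration only.
sample = [
    "100 Example St, Springfield",
    "  100 example st,  Springfield ",  # duplicate with messy spacing
    "",                                 # blank entry is skipped
    "200 Sample Ave, Springfield",
]
print(build_input_block(sample))
```

The printed text contains one address per line and can be pasted into the input box as-is.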
After the agent runs successfully, the extracted results appear on the “View Data” screen.
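Once the results are exported to CSV, they can be consumed by other tools. The sketch below reads such a file in Python and converts the numeric fields. The column names (Address, Zestimate, Beds, Baths, Sqft) and the sample values are assumptions made for illustration; the actual headers depend on how the agent's data fields are named.

```python
import csv
import io


def parse_number(text):
    """Convert strings such as '$250,000' or '1.5' to float; None if blank or non-numeric."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def read_results(csv_file):
    """Read exported rows from an open text file into a list of dictionaries.

    Column names here are assumed, not taken from Content Grabber.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        rows.append({
            "address": record["Address"].strip(),
            "zestimate": parse_number(record["Zestimate"]),
            "beds": parse_number(record["Beds"]),
            "baths": parse_number(record["Baths"]),
            "sqft": parse_number(record["Sqft"]),
        })
    return rows


# Placeholder data standing in for an exported file; values are not real.
demo_csv = io.StringIO(
    'Address,Zestimate,Beds,Baths,Sqft\n'
    '"100 Example St, Springfield","$250,000",3,2,"1,500"\n'
    '"200 Sample Ave, Springfield",,2,1.5,900\n'
)
rows = read_results(demo_csv)
for row in rows:
    print(row)
```

For a real export, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.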
The following options can be configured in the agent interface by the end user:
- Scheduling: The scheduling option runs the agent automatically at predefined intervals, such as every hour, day, month or year.
- Notifications: You can enable or disable email notifications on completion of an agent run, on errors and on a low number of page loads.
- Input Values: A self-contained agent can use either a database or files as public data providers. It can also have more than one public data provider, in which case each data provider can be chosen from a drop-down box.
- Export target/data: Extracted data can be exported to file formats only. Exporting directly to a database is not supported; for that, you need to use an export script.
- Proxies: The end user can set proxies for anonymous scraping.
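Because the agent interface exports to files only, one way to get the data into a database is to load the exported CSV in a separate step. The following Python sketch does this with SQLite. It is not a Content Grabber export script, just an illustration of a post-processing step; the table name, column names and sample values are assumptions.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, connection, table="results"):
    """Create a table from the CSV header and insert every complete row.

    All columns are stored as TEXT to keep the sketch simple. Header names
    are quoted as identifiers, so they must not contain double quotes.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Skip rows whose field count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', rows
    )
    connection.commit()
    return len(rows)


# Placeholder data standing in for an exported file; values are not real.
demo_csv = io.StringIO(
    'Address,Beds\n'
    '"100 Example St, Springfield",3\n'
    '"200 Sample Ave, Springfield",2\n'
)
conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
inserted = load_csv_into_sqlite(demo_csv, conn)
print(f"Inserted {inserted} rows")
```

A script like this could be run by the operating system's task scheduler after each agent run, so that the database stays in step with the latest export.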
Upgrading a self-contained agent:
If a Content Grabber agent breaks because the target website has changed, you can fix it by making the necessary changes to the agent and then exporting it again. The end user can overwrite the existing self-contained agent with the new one. Any configuration changes the end user has made to the agent are preserved unless the configuration folder is deleted.
Drawbacks of a self-contained agent:
- Upfront cost of the development software license.
- A self-contained agent can export data only to file formats such as CSV, XML or Excel spreadsheets. It cannot export data to a database.
- Self-contained agents require IE8+ and .NET 4+ to run. These are standard on most Windows PCs, but it means that a self-contained agent can be executed only on machines running Windows, not on other operating systems such as Linux or Mac.
Summary:
Content Grabber self-contained agents enable business users and developers with no coding skills to design and run web scraping agents. Content Grabber provides agent-assisted automation through an easy-to-use, Visio-like graphical interface. An agent can be packaged as a standalone executable, branded as your own and distributed royalty free, which makes it a good business opportunity for developers.