Oct 17, 2016

Octoparse Review – An Automated Web Scraping Tool

Octoparse is an automated web scraping tool with an easy-to-use point-and-click interface. It lets users apply different extraction patterns to pull data from different websites.

It provides advanced functions such as Smart Mode, Cloud Extraction, and API Access that help users capture data from static or dynamic websites without any programming knowledge. Export formats include CSV, Excel, HTML, and TXT. Extracted data can also be exported to databases such as MySQL, SQL Server, and Oracle.

Octoparse offers three editions to meet your data extraction needs: Free, Standard, and Professional. It is one of the best free web scraping tools available on the market. The two paid editions add a cloud platform with multiple cloud servers for web scraping.

Distinct Features of Octoparse

  1. Visual Workflow Designer: Octoparse provides a simple, user-friendly Visual Workflow Designer for extracting data in bulk. Users configure an extraction rule that tells the program which web pages to crawl, which data fields to collect, and so on.

  1. No coding needed: All you need to do is follow a few simple steps to configure an extraction rule. Octoparse also has a rich set of tutorials on how to extract data.
  2. Smart Mode: This feature turns web pages into Excel data with one click: enter your target URL in the text box and click “SMART”. The program creates the extraction rule automatically, which lowers the barrier to entry for anyone who needs data. It works best on list or table pages such as category pages and search results pages. It usually takes less than a minute to get the data for one page.

  1. Cloud Extraction: Cloud Extraction allows users to run data extraction tasks on the cloud platform. A task run with Cloud Extraction is 4 to 10 times faster than the same task run with Local Extraction.

For example, if it takes around 1 second to load a web page, a single scraping task running on 4 cloud servers can scrape up to 4*7*24*3600 = 2,419,200 pages per week. When 2 extraction tasks run at the same time, 2 cloud servers are assigned to each task, so each task can scrape up to 2*7*24*3600 = 1,209,600 pages per week.
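
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `pages_per_week` is made up for illustration, and it assumes every server loads pages back to back with no idle time, so the result is an upper bound rather than a measured figure.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds in a week


def pages_per_week(servers: int, seconds_per_page: float = 1.0) -> int:
    """Upper-bound page count for one task spread over `servers` cloud servers.

    Assumes each server loads pages one after another without pauses.
    """
    return int(servers * SECONDS_PER_WEEK / seconds_per_page)


# One task using all 4 cloud servers.
single_task = pages_per_week(servers=4)

# Two tasks running at once, each assigned 2 of the 4 servers.
each_of_two_tasks = pages_per_week(servers=2)

print(single_task, each_of_two_tasks)
```

Changing `seconds_per_page` shows how quickly the estimate drops for slower sites.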

  1. Deal with Complex Websites: Octoparse can handle dynamic websites built with heavy JavaScript and AJAX. It also copes with hard-to-crawl ASP websites. Users can use it to:
  • Scrape data from behind a login.
  • Scrape data from a website with infinite scroll like Twitter or Facebook.
  • Scrape a website with pagination.
  1. XPath Tool and RegEx Tool: These tools help you pinpoint exactly the data you want. They make it much easier to define an XPath or write a regular expression. You can also modify the XPath in Octoparse to locate the data on the web page precisely before extracting it.
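
For readers who have not used XPath or regular expressions before, here is a minimal Python sketch of the two ideas working together, outside of Octoparse. The HTML snippet, the class names, and the function `extract_items` are invented for illustration. Note that the standard-library `xml.etree.ElementTree` module only supports a small subset of XPath and needs well-formed markup, so real-world pages usually call for a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a product listing page.
PAGE = """
<html><body>
  <div class="item"><span class="name">Widget</span><span class="price">Price: $19.99</span></div>
  <div class="item"><span class="name">Gadget</span><span class="price">Price: $5.00</span></div>
</body></html>
"""

# Regular expression that pulls the numeric part out of text like "Price: $19.99".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_items(page: str) -> list:
    """Locate each item with an XPath expression, then clean the price with a regex."""
    root = ET.fromstring(page.strip())
    rows = []
    # XPath: every <div> anywhere in the page whose class attribute is "item".
    for item in root.findall(".//div[@class='item']"):
        name = item.findtext("span[@class='name']")
        raw_price = item.findtext("span[@class='price']", default="")
        match = PRICE_RE.search(raw_price)
        rows.append((name, float(match.group(1)) if match else None))
    return rows


rows = extract_items(PAGE)
print(rows)
```

The XPath expression decides which element to read, and the regular expression trims the text of that element down to the value you actually want.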

  1. Incremental Extraction: This function lets you extract updated data without having to configure another rule. Updated data is identified by the new URLs that new pages generate.
  2. Ad Blocking: This feature removes ads such as banners and pop-ups while Octoparse scrapes a website. To use it, choose the Ad Blocking option in the Basic Information step when setting up a task. Ad blocking shortens loading time and reduces the number of web requests, which boosts extraction speed.
  3. API Access: Octoparse has APIs available for accessing your data. Users can create an API that connects their own systems to the scraped data in real time. To use the Octoparse APIs, you need the task ID of an extraction task. The easiest way to get the task ID is to right-click a task and select “Create an API”.
  4. Schedule Data Extraction: Octoparse lets users run an extraction task at a scheduled time. Once the schedule is set, the program runs the task automatically at that time.
  5. Various Exporting Capabilities: Octoparse provides export formats such as CSV, Excel, HTML, and TXT. It can also export extracted data to databases (MySQL, SQL Server, and Oracle).
  6. Proxies & IP Rotation: Octoparse can scrape websites through rotating anonymous proxy servers to prevent your IP address from being blacklisted. The cloud platform comes with a large pool of proxy servers, so users don’t have to connect to individual proxies manually. Alternatively, you can add a list of external proxy servers and configure automatic rotation.
  7. Support: The website has plenty of tutorials for both beginners and experienced users. For technical support, users can reach the support team through Skype, Facebook Messenger, and email.
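
The incremental extraction idea from the list above (updated data is identified by new URLs) can be sketched in a few lines of Python. This is not Octoparse's implementation, just an illustration of the concept: the function `new_urls` and the example.com addresses are hypothetical.

```python
def new_urls(discovered, seen):
    """Return the URLs not seen in earlier runs, in order, and remember them.

    `seen` is a set of URLs from previous runs and is updated in place.
    """
    fresh = []
    for url in discovered:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# URLs collected by earlier runs (hypothetical).
seen = {"https://example.com/p/1", "https://example.com/p/2"}

# URLs found on the current run; /p/3 is new and appears twice.
this_run = [
    "https://example.com/p/2",
    "https://example.com/p/3",
    "https://example.com/p/3",
]

fresh = new_urls(this_run, seen)
print(fresh)
```

Only the pages behind the returned URLs need to be scraped again, which is why an incremental run can be much shorter than a full one.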

Cons: For now, Octoparse cannot handle CAPTCHAs. Smart Mode cannot deal with complex websites that require users to log in. It also lacks fine-grained logging and error handling facilities.

In conclusion, Octoparse is a feature-rich visual scraping application and is worth a try. It can help you collect public web data easily and efficiently.
