Scraping Tools

Apr 02, 2018, 0 Responses

Content Grabber – Useful Custom Scripts

Content Grabber has powerful custom scripting that lets you customize its behavior and build web scraping agents that can crawl and scrape data from simple to very complex websites.

Below are a few examples of custom scripts written in C#. They show how to connect to a database and how to modify an XPath dynamically at run time while the scraper is running.  Read More…

Dec 09, 2017, 0 Responses

Puppeteer – Web Scraping using Headless Chrome Node API

Puppeteer is a Node library developed by the Google Chrome team that provides an API for headless browsing and browser automation with Chrome. A headless browser is a web browser without a graphical user interface (GUI), meaning it has no visual components. Headless browsers let you control a web page programmatically, without human intervention.

Browser automation helps you automate repetitive tasks and web application testing. Examples include monitoring product pricing over a period of time, submitting forms, and automatically logging in to a web app, performing some task, and logging out.

There are many tools for browser automation and web scraping, such as PhantomJS and Selenium IDE. However, Puppeteer runs faster and uses less memory. Puppeteer only works with the Google Chrome browser. Puppeteer can be used for:

Read More…

Oct 17, 2016, 0 Responses

Octoparse Review – An Automated Web Scraping Tool

Octoparse is a powerful automated web scraping tool with an easy-to-use point-and-click interface that lets users apply different patterns to extract data from different websites.

It provides advanced functions such as Smart Mode, Cloud Extraction, and API Access that help users capture data from static or dynamic websites without any programming knowledge. Export formats include CSV, Excel, HTML, and TXT. Extracted data can also be exported to databases such as MySQL, SQL Server, and Oracle. Read More…

Jul 08, 2016, 4 Responses

Xpath Generator – Free tool for making Xpath Expression

XPath is a query language for selecting nodes from an HTML or XML document. It is used to navigate through the elements and attributes of such a document, which makes it an essential part of web scraping: to extract a web element, you must know its XPath. Most web scraping software comes with built-in functionality for generating XPath expressions, and some browsers can also inspect XPath, but these facilities lack some advanced functionality. With these limitations in mind, we have made a dedicated tool for XPath selection named “Xpath Generator”.
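To make the idea of selecting nodes with XPath concrete, here is a minimal sketch in Python. It is not part of the Xpath Generator tool; it uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, and the sample markup, the `extract_products` name, and the field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
SAMPLE = """
<html>
  <body>
    <div class="product">
      <h2>Blue Widget</h2>
      <span class="price">9.99</span>
      <a href="/widgets/blue">Details</a>
    </div>
    <div class="product">
      <h2>Red Widget</h2>
      <span class="price">12.50</span>
      <a href="/widgets/red">Details</a>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return one dict per <div class="product"> found in the markup."""
    root = ET.fromstring(markup.strip())
    products = []
    # ".//div[@class='product']" selects every div, at any depth,
    # whose class attribute equals "product".
    for node in root.findall(".//div[@class='product']"):
        products.append({
            # Paths below are relative to the current product node.
            "name": node.findtext("h2"),
            "price": float(node.findtext("span[@class='price']")),
            "link": node.find("a").get("href"),
        })
    return products


if __name__ == "__main__":
    for product in extract_products(SAMPLE):
        print(product)
```

Real pages are rarely well-formed XML, so in practice a dedicated HTML parser with fuller XPath support is usually needed. The expressions themselves keep the same shape.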

Read More…

Mar 22, 2015, 1 Response

Python code to connect with MySQL and SQL Server database in Fminer

Fminer is a powerful web scraping tool as well as a browser automation tool, and it supports many of the features that web scraping software needs. Some of Fminer’s key features are multithreading support, captcha solving, browser automation actions (entering text into a text field, selecting an option from a drop-down, choosing radio buttons and check boxes), project scheduling and email, custom proxy support, and many more. You can read about each feature in detail in my previous post, Fminer – Visual Web Scraping Software With Macro Recorder and Diagram Designer. Read More…

Jan 04, 2014

How to do data scraping from PDF files using PHP?

PDF Scraping using PHP

Situations arise when you want to scrape data from PDF files or search them for matching text. Suppose you have a website where users upload PDF files, and you want to offer a search feature that looks through the content of every uploaded PDF and shows all the files containing the search keywords.

Or you might have the details of all London real estate properties in PDF reports and want to quickly scrape data from them. In either case you might need a PDF scraping library. Read More…

Nov 28, 2013

How to use Web Content Extractor(WCE) as Email Scraper?

Web Content Extractor is a great web scraping tool developed by the Newprosoft team. It has an easy-to-use project wizard for creating a scraping configuration and scraping data from websites.

One day I came across Visual Email Extractor, which is also a Newprosoft product and is similar to Web Content Extractor, but its primary use is to scrape email addresses by crawling the websites you feed to it. I noticed that with a small modification to the Web Content Extractor project configuration, you can use it just like Visual Email Extractor to extract email addresses.
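The core of email scraping, independent of either Newprosoft product, is pattern matching over page content. The sketch below illustrates that general idea in Python and does not show how Web Content Extractor or Visual Email Extractor work internally. The `extract_emails` function, the regular expression, and the sample page are assumptions made for illustration, and the pattern is deliberately simple rather than a full address validator.

```python
import re
from html import unescape

# Simplified email pattern (an assumption for this sketch, not a full
# RFC 5322 validator): local part, "@", domain, dot, alphabetic TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_html):
    """Return the unique, lowercased email addresses found in page_html."""
    # Decode HTML entities so addresses written with &#64; are also matched.
    text = unescape(page_html)
    seen = set()
    emails = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:  # keep first-seen order, drop duplicates
            seen.add(email)
            emails.append(email)
    return emails


# Hypothetical page content for demonstration only.
SAMPLE_PAGE = """
<p>Sales: <a href="mailto:sales@example.com">sales@example.com</a></p>
<p>Support: Support@Example.org</p>
"""

if __name__ == "__main__":
    print(extract_emails(SAMPLE_PAGE))
```

A crawler would call a function like this on every page it fetches and merge the results.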

Read More…