Text Parsing

Jun 11, 2018

Google OpenRefine: Open-Source and Free Tool to Work With Messy Data

In data-intensive fields such as retail, banking, telecom, and insurance, keeping data free of errors is a challenging task. Data cleaning therefore becomes vital: it modifies or removes records in a database that are duplicated, incomplete, incorrect, or poorly formatted. Every data wrangler wants to clean up raw data and transform it into other formats quickly, and spends a great deal of effort refining and analysing it. This practice is widely referred to as data wrangling, and sometimes as data munging or data cleansing.

Data quality is an important factor in the overall success of decision making. Inaccurate data leads to wrong assumptions and flawed analysis, which in turn can cause a campaign or project to fail. Redundant data causes problems of its own: it slows loading, increases inconsistency, and decreases efficiency. A good data cleaning tool addresses these problems by clearing your database of redundant data, incorrect information, and bad entries. Read More…

Jan 04, 2014

How to do data scraping from PDF files using PHP?

PDF Scraping using PHP

Situations arise when you want to scrape data from a PDF or search PDF files for matching text. Suppose you have a website where users upload PDF files, and you want to offer a search feature that scans the content of every uploaded PDF and lists all the PDFs containing the search keywords.

Or you might have the details of all London real estate properties in PDF report files and want to quickly scrape the data from those reports; in that case you need a PDF scraping library. Read More…
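
To make the search scenario above concrete, here is a minimal sketch of the matching step only. It is written in Python for brevity rather than PHP, and it assumes the text of each PDF has already been extracted by some PDF library; the function name `find_matching_pdfs`, the file names, and the sample text are all hypothetical and are not taken from the original post.

```python
def find_matching_pdfs(extracted_text, query):
    """Return the names of PDFs whose text contains every keyword in the query.

    extracted_text: dict mapping a file name to text already extracted
                    from that PDF (the extraction step is assumed, not shown).
    query:          free-text search string; keywords are split on whitespace.
    """
    keywords = query.lower().split()
    if not keywords:
        return []  # an empty query matches nothing

    matches = []
    for name, text in extracted_text.items():
        haystack = text.lower()  # case-insensitive comparison
        if all(keyword in haystack for keyword in keywords):
            matches.append(name)
    return sorted(matches)


# Hypothetical sample data: file name -> text pulled out of each uploaded PDF.
uploaded = {
    "camden-report.pdf": "Two-bedroom flat in Camden, London. Asking price on request.",
    "invoice.pdf": "Invoice for consulting services.",
    "hackney-report.pdf": "Terraced house in Hackney, London.",
}

print(find_matching_pdfs(uploaded, "london flat"))
```

This uses plain substring matching, so a keyword such as "flat" would also match inside a longer word like "inflate". A real search feature would likely tokenize the text or use a full-text index instead.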