Talend Open Studio

Mar 21, 2019

Talend Introduction and Tutorial: Merging Files with the Same Schema

What is ETL?

Extract, Transform, Load (ETL) is the process of extracting data from various sources, consolidating it, and storing it in a single database for later use, such as decision making and business insights. ETL used to be done through manual coding in SQL or .NET, but today many ETL tools are available that simplify the process. ETL is generally used for data migration, data replication, operational processes, data transformation, and data synchronization.

ETL Process

Extract

Extract is the first step in the ETL process and the most important one. Source data is stored in various formats, such as raw text files, Excel or CSV files, RDBMS databases, or JSON or XML files. The extract step reads these different data sources and passes the data to the next step, Transform.
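To make the extract step concrete outside of any particular tool, here is a minimal sketch in plain Python. It assumes two hypothetical sources, one CSV and one JSON, held as in-memory strings so the example runs without any files. The function names and sample data are illustrative only and are not part of Talend.

```python
import csv
import io
import json


def extract_csv(text):
    """Parse CSV text with a header row into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Parse a JSON array of objects into a list of dict records."""
    return json.loads(text)


# Hypothetical sample data standing in for real source files.
CSV_SOURCE = "id,name,price\n1,Widget,9.99\n2,Gadget,19.50\n"
JSON_SOURCE = '[{"id": "3", "name": "Gizmo", "price": "4.25"}]'

# Both sources end up in one common shape: a list of dicts.
records = extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
```

Bringing every source into the same in-memory shape is what lets the next step treat all records uniformly, whatever format they came from.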

Transform

The second step transforms the data into the required format. It includes operations such as joining, sorting, filtering, type conversion, lookups, and validation, which prepare the data for the next step.
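A few of these operations (validation, type conversion, filtering, and sorting) can be sketched in plain Python as follows. The field names, the validation rules, and the sample rows are assumptions made up for illustration. A real job would apply whatever rules its target schema requires.

```python
def transform(rows):
    """Validate, convert types, filter, and sort a list of dict records."""
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip()
        try:
            # Type conversion: source values arrive as strings.
            record_id = int(row.get("id"))
            price = float(row.get("price"))
        except (TypeError, ValueError):
            continue  # Validation: skip rows with a non-numeric id or price.
        if not name or price <= 0:
            continue  # Filtering: drop unnamed or non-positive-price rows.
        cleaned.append({"id": record_id, "name": name, "price": price})
    # Sorting: order the surviving rows by price, lowest first.
    return sorted(cleaned, key=lambda r: r["price"])


# Hypothetical raw rows, as they might look straight after extraction.
raw_rows = [
    {"id": "1", "name": " Widget ", "price": "9.99"},
    {"id": "2", "name": "Gadget", "price": "n/a"},
    {"id": "3", "name": "Gizmo", "price": "4.25"},
    {"id": "4", "name": "", "price": "1.00"},
]

clean_rows = transform(raw_rows)
```

Here the row with price "n/a" and the row with an empty name are dropped, and the remaining two rows come back with numeric types, ordered by price.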

Load

In the last step, the processed data is loaded into its final destination. This can be a raw file, an Excel or CSV file, or a database system such as MySQL, Access, or PostgreSQL, among many other options.
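As a rough illustration of the load step, the sketch below writes already-processed records into a database using Python's built-in sqlite3 module. SQLite is used here only because it needs no server. The table name, columns, and sample rows are hypothetical, and loading into MySQL or PostgreSQL would use a different driver.

```python
import sqlite3


def load(rows, connection):
    """Insert processed records into a 'products' table, creating it if needed."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
    )
    # The 'with' block commits on success and rolls back on error.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO products (id, name, price) "
            "VALUES (:id, :name, :price)",
            rows,
        )


# Hypothetical processed records, as they might leave the transform step.
processed = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 3, "name": "Gizmo", "price": 4.25},
]

conn = sqlite3.connect(":memory:")  # In-memory database, for the example only.
load(processed, conn)
load(processed, conn)  # Re-running replaces rows instead of duplicating them.
row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Using INSERT OR REPLACE keyed on the primary key is one simple way to make the load repeatable. Whether replacing existing rows is the right behavior depends on the job.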

ETL Tools and Software

Many ETL tools are available on the market, both commercial and open source, such as Informatica PowerCenter, IBM InfoSphere Information Server, Oracle Data Integrator, Microsoft SQL Server Integration Services (SSIS), Ab Initio, Sybase ETL, and many more.

ETL plays a big role in the web scraping process. Data scraped from public websites or other sources is not always well formatted, and is sometimes messy. ETL tools such as Talend help transform the data into the required format, validate it, merge it, and load it into a database such as MySQL, SQLite, Oracle, or a NoSQL store, or into a storage target such as Amazon S3, FTP, Azure, or Dropbox.