Data Integration

Data integration is the process of transferring data between different storage types and locations. It typically includes extraction, cleaning, loading into the target data repository, and verification.

Data integration can be classified into different types of activities, depending on the objective to accomplish. The objective of a data integration project might be:

   1. Data Migration

Typically: data on old servers that will soon be refurbished needs to be transferred to a new system.
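A minimal migration can be sketched as a table-by-table copy from the old system to the new one. The sketch below uses Python's `sqlite3` module, with two in-memory databases standing in for the legacy server and the new system; the `customers` table and its rows are purely hypothetical:

```python
import sqlite3

def migrate_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy every row of `table` from the source database to the target."""
    cols = [r[1] for r in src.execute(f"PRAGMA table_info({table})")]
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    dst.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()
    dst.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)
    dst.commit()
    return len(rows)

# Hypothetical usage: old server data -> new system.
old_db = sqlite3.connect(":memory:")   # stands in for the legacy server
new_db = sqlite3.connect(":memory:")   # stands in for the new system
old_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
old_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
moved = migrate_table(old_db, new_db, "customers")
print(moved)  # 2
```

A real migration would also carry over indexes, constraints, and batch the copy; this sketch only shows the core extract-and-load step.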

   2. Data Consolidation

Very often, after two companies merge, their data is distributed amongst many different systems. The process of consolidation moves remote data into one central consolidated repository.
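The consolidation step can be sketched as follows: rows from each remote system are moved into one central table, tagged with their system of origin so that nothing is lost in the merge. The company names, record ids, and the central schema below are illustrative assumptions:

```python
import sqlite3

# Two post-merger systems, each with its own customer list (hypothetical data).
company_a = [("a-1", "Ann Smith"), ("a-2", "Bob Jones")]
company_b = [("b-9", "Carla Diaz")]

central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE customers (source TEXT, source_id TEXT, name TEXT)")

# Consolidation: move each remote system's rows into the central repository,
# keeping track of which system every row came from.
for source, rows in (("company_a", company_a), ("company_b", company_b)):
    central.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(source, rid, name) for rid, name in rows],
    )

total = central.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(total)  # 3
```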

   3. Data Federation (ETL for Business Intelligence and Data Warehousing)

The process of data federation moves data from many different sources into one central data repository, where different kinds of analysis become possible (creating OLAP reports, building predictive or segmentation models, or any other statistical activity). Data federation is mostly used in combination with business-intelligence tools (such as predictive analytics tools, data-warehousing tools, and OLAP tools).
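A toy ETL pipeline in this spirit might extract rows from several sources, tag them, and load them into one warehouse table that then supports cross-source analysis. The source names, channels, and figures below are invented for illustration:

```python
import sqlite3

# Hypothetical source extracts: sales figures from a CRM and from a web shop.
crm_sales = [("2024-01", 120.0), ("2024-02", 80.0)]
web_sales = [("2024-01", 200.0)]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (month TEXT, channel TEXT, amount REAL)")

# ETL: extract from each source, tag with its channel, load centrally.
for channel, rows in (("crm", crm_sales), ("web", web_sales)):
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(month, channel, amount) for month, amount in rows],
    )

# The central table now supports analysis across all sources,
# e.g. an OLAP-style rollup of sales per month:
rollup = warehouse.execute(
    "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
).fetchall()
print(rollup)  # [('2024-01', 320.0), ('2024-02', 80.0)]
```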

   4. Data Synchronization

Process that ensures that two different data repositories contain the same up-to-date data.
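One-way synchronization can be sketched as a diff-and-apply pass: insert new records, update stale ones, and delete records that no longer exist at the source. The sketch below models each repository as a Python dictionary keyed by record id (an assumption for simplicity; real repositories would be databases):

```python
def sync(source: dict, target: dict) -> dict:
    """One-way synchronization: make `target` contain the same
    up-to-date data as `source` (inserts, updates, deletes)."""
    for key, value in source.items():
        if target.get(key) != value:
            target[key] = value          # insert new or update stale records
    for key in list(target):
        if key not in source:
            del target[key]              # remove records deleted at the source
    return target

# Hypothetical repositories keyed by record id.
repo_a = {"1": "Ann", "2": "Bob", "3": "Carla"}
repo_b = {"1": "Ann", "2": "Robert", "4": "Dave"}   # stale and orphaned entries
sync(repo_a, repo_b)
print(repo_b == repo_a)  # True
```

Two-way synchronization additionally needs timestamps or version numbers to decide which side's change wins on a conflict.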

   5. Master Data Management (MDM)

Processes and tools to define and manage non-transactional data. MDM provides for collecting, aggregating, matching, consolidating, quality-assuring, persisting, and distributing data throughout an organization to ensure consistency and control.
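The matching-and-consolidating part of MDM can be sketched as follows: records from several systems are grouped by a matching rule and reduced to one "golden record" per real-world entity. Both the matching rule (same normalized e-mail) and the survivorship rule (keep the cleanest name) are assumptions made for this sketch:

```python
# Hypothetical customer records from several systems; the same real-world
# customer may appear more than once with inconsistent formatting.
records = [
    {"name": "Ann Smith ", "email": "ANN@EXAMPLE.COM"},
    {"name": "ann smith", "email": "ann@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

def match_key(rec: dict) -> str:
    # Matching rule (an assumption for this sketch): records with the same
    # normalized e-mail address describe the same master entity.
    return rec["email"].strip().lower()

# Consolidate: keep one "golden record" per entity, preferring the
# cleanest-looking name (here simply the longest stripped variant).
golden: dict[str, dict] = {}
for rec in records:
    key = match_key(rec)
    cleaned = {"name": rec["name"].strip(), "email": key}
    if key not in golden or len(cleaned["name"]) > len(golden[key]["name"]):
        golden[key] = cleaned

print(len(golden))  # 2
```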

For example, here is a screenshot of a simple data-load script made with Anatella that imports a text file into a database:
  [Screenshot: Anatella data-load script]

This Anatella script:

  • Loads the text file (which can be compressed as RAR, ZIP, GZ, or LZO; the decompression occurs in RAM "on the fly").
  • Checks the names of the fields:
    • Field names like "key" or "age" are usually forbidden inside a relational database
    • Field names with special characters (like the quote or the minus sign) are forbidden
    …and Anatella automatically corrects the field names so that they are "accepted" by the database.
  • Creates the target table inside the database (using a "CREATE TABLE" SQL statement). The field types are automatically detected based on the content of the text file.
  • Uploads the text file inside the database ("INSERT" type of operation).
The nice thing about this script is that it works for any source text file and any target database!
This is the easiest solution if you need to quickly upload a large text file into a database.
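The steps above can be sketched in plain Python, as a rough illustration of what such a data-load script does internally (this is not Anatella's implementation). The sketch decompresses a gzip file in RAM on the fly, cleans the field names, infers column types from the content, creates the target table, and inserts the rows; the reserved-word list, file content, and type-inference rule are simplifying assumptions:

```python
import csv
import gzip
import io
import re
import sqlite3

RESERVED = {"key", "age", "order", "group"}   # illustrative reserved words

def clean_name(name: str) -> str:
    """Rewrite a field name so the database accepts it (step 2)."""
    name = re.sub(r"\W", "_", name)           # drop quotes, minus signs, etc.
    if name.lower() in RESERVED:
        name += "_"                           # avoid reserved words
    return name

def infer_type(values: list) -> str:
    """Pick a column type from the file's content (step 3)."""
    try:
        for v in values:
            float(v)
        return "REAL"
    except ValueError:
        return "TEXT"

# Hypothetical compressed source file, decompressed in RAM on the fly (step 1).
raw = gzip.compress(b"key,first-name\n1,Ann\n2,Bob\n")
with gzip.open(io.BytesIO(raw), mode="rt", newline="") as fh:
    rows = list(csv.reader(fh))
header, data = rows[0], rows[1:]

cols = [clean_name(c) for c in header]
types = [infer_type([r[i] for r in data]) for i in range(len(cols))]

db = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} {t}" for c, t in zip(cols, types))
db.execute(f"CREATE TABLE target ({col_defs})")                                   # step 3
db.executemany(f"INSERT INTO target VALUES ({','.join('?' * len(cols))})", data)  # step 4

print(cols)  # ['key_', 'first_name']
```

Because the field names, types, and row count are all derived from the file itself, the same logic works for any source text file, which is the property the paragraph above highlights.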