Data quality

With Anatella, you can easily perform any data quality or data cleaning task.

For example, you can (non-exhaustive list):

  • Check the validity of character fields
    For example, check for the right formats using powerful regular expressions.
    You can use the following Anatella operator to perform this task:
          regex

  • Check the validity of numeric fields
    For example, you can compute means, count unique values, find the highest and lowest values, count missing values, etc.
    You can use the following Anatella operator to perform this task:
          regex

  • Check for missing values
    You can use the following Anatella operator to perform this task:
          regex

  • Check dates
    For example: is the date in the right format, and is it in range?
    You can use the following Anatella operators to perform this task:
          regex and regex

  • Look for duplicates
    Anatella contains a “box” to remove duplicates.
    You can use the following Anatella operator to perform this task:
          regex

  • Check the consistency of a set of (primary) keys across different data sources
    You can use the following Anatella operator to perform this task:
          regex and regex

  • Compare two datasets
    You can compare a selection of the character fields and numeric fields in the two datasets.
    You can use the following Anatella operator to perform this task:
          regex

  • Design any complex test that you want using the powerful JavaScript scripting engine included in Anatella.
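
To make the checks listed above more concrete, here is a minimal sketch in plain Python. It does not use Anatella or its operators: the sample rows, the field names (`id`, `email`, `age`, `birth`), the second key set and all function names are hypothetical and serve only to illustrate what each check computes.

```python
import re
from datetime import date, datetime
from statistics import mean

# Hypothetical sample data; field names are illustrative only.
rows = [
    {"id": 1, "email": "ann@example.com", "age": 34, "birth": "1990-02-11"},
    {"id": 2, "email": "bob(at)example.com", "age": None, "birth": "2090-01-01"},
    {"id": 2, "email": "bob(at)example.com", "age": None, "birth": "2090-01-01"},
    {"id": 3, "email": "cy@example.org", "age": 51, "birth": "11/02/1973"},
]
other_source_ids = {1, 3, 4}  # keys found in a second, hypothetical data source

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple pattern


def invalid_format(rows, field, pattern):
    """Character field check: values that do not fully match the pattern."""
    return [r[field] for r in rows if not pattern.fullmatch(r[field] or "")]


def numeric_summary(rows, field):
    """Numeric field check: basic statistics plus the number of missing values."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "unique": len(set(values)),
        "missing": len(rows) - len(values),
    }


def bad_dates(rows, field, lo, hi, fmt="%Y-%m-%d"):
    """Date check: values in the wrong format or outside the range [lo, hi]."""
    bad = []
    for r in rows:
        try:
            d = datetime.strptime(r[field], fmt).date()
        except ValueError:
            bad.append(r[field])  # wrong format
            continue
        if not lo <= d <= hi:
            bad.append(r[field])  # out of range
    return bad


def duplicates(rows):
    """Duplicate check: rows identical to a row seen earlier."""
    seen, dups = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dups.append(r)
        else:
            seen.add(key)
    return dups


def key_mismatch(keys_a, keys_b):
    """Key consistency check: keys present in only one of the two sources."""
    a, b = set(keys_a), set(keys_b)
    return {"only_in_a": a - b, "only_in_b": b - a}


print(invalid_format(rows, "email", EMAIL_RE))
print(numeric_summary(rows, "age"))
print(bad_dates(rows, "birth", date(1900, 1, 1), date(2024, 12, 31)))
print(duplicates(rows))
print(key_mismatch((r["id"] for r in rows), other_source_ids))
```

Each function returns the offending values rather than a simple pass/fail flag, so the result can be reviewed or fed into a correction step.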

Once an error has been detected, you can easily correct it, for example using the replace string Anatella operator.
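
As an illustration of this kind of string-replacement correction, the following plain-Python sketch applies an ordered list of replacement rules to a value. It is not the Anatella operator itself; the function name `apply_replacements` and the rules are assumptions chosen for the example.

```python
import re


def apply_replacements(value, rules):
    """Apply each (pattern, replacement) rule in order, then trim whitespace."""
    for pattern, replacement in rules:
        value = re.sub(pattern, replacement, value)
    return value.strip()


# Hypothetical correction rules.
rules = [
    (r"\(at\)", "@"),  # repair an obfuscated e-mail separator
    (r"\s+", " "),     # collapse runs of whitespace into one space
]

cleaned = [apply_replacements(v, rules)
           for v in ["bob(at)example.com", "  RIO   DE JANEIRO "]]
print(cleaned)
```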

Automated text-spelling correction

Anatella also includes a unique operator that checks and corrects spelling mistakes in any text field. For example, let’s assume that your database contains a field named “City of Birth”. This field will usually contain many different spellings of the same city. For example, the city "RIO DE JANEIRO" can be misspelled in a number of different ways (this is a real-world example):

  • RIO DXE JANEIRO, RIO DE JAEIRO, RIOP DE JANEIRO, RIO NDE JANEIRO, RIO DEJANEIRO, `RIO DE JANEIRO, RIO DE JANIRO, RIO DE JANEI RO, RIO DE JANEIRIO, RI0 DE JANEIRO, RIO DE JNEIRO, RIO DE JANEEIRO, RIO DE JANEIROO, RIO DE JANAEIRO, RIO DE JANEIROR, RIO DE JANEIRO RJ
The Anatella automatic spelling-correction operator ( regex ) will detect and correct all these misspellings.
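
The general idea behind this kind of correction can be sketched in plain Python by matching each value against a list of known-good spellings. This is only an illustration under stated assumptions: it uses the standard library's `difflib` similarity matching, which is not necessarily the technique Anatella uses, and the reference list, the function name `correct_city` and the `cutoff` value are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical list of known-good spellings.
REFERENCE_CITIES = ["RIO DE JANEIRO", "SAO PAULO", "SALVADOR"]


def correct_city(value, reference=REFERENCE_CITIES, cutoff=0.8):
    """Return the closest reference spelling, or the value unchanged if none is close enough."""
    matches = get_close_matches(value.strip().upper(), reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value


for raw in ["RIO DXE JANEIRO", "RI0 DE JANEIRO", "RIO DE JANEIRO RJ", "BRASILIA"]:
    print(raw, "->", correct_city(raw))
```

Values with no sufficiently similar reference spelling (here "BRASILIA") are returned unchanged, so that unknown cities are not silently rewritten. The `cutoff` threshold trades missed corrections against wrong ones and would need tuning on real data.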