Data Quality

Data quality can be described simply as the fitness of data for use. More specifically, every piece of data has to be accurate enough to clearly represent its intended value. This matters as much for the clarity of individual records as for the correlation between massive databases; without common standards, such databases would collapse.

There are a number of different approaches to the subject of data quality, but regardless of the approach, data has to meet a few objectives concerning its correctness, consistency, completeness and validity. To achieve this, data is put through a process that extracts the very core of the data, cleans it of any unnecessary information, conforms it to our requirements, and delivers high-quality data at the end. The most important steps are cleaning the data and conforming it; the final quality of the data largely depends on them.
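The extract–clean–conform–deliver flow above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation; the record fields and cleaning rules are hypothetical examples.

```python
# A sketch of the extract -> clean -> conform -> deliver steps described
# above. Field names and normalization rules are hypothetical examples.

RAW_RECORDS = [
    {"name": "  Alice ", "email": "ALICE@EXAMPLE.COM", "age": "34", "note": "call later"},
    {"name": "Bob", "email": "bob@example.com", "age": "", "note": ""},
]

CORE_FIELDS = ("name", "email", "age")  # the "core" we extract; "note" is dropped


def extract(record):
    """Keep only the core fields we care about."""
    return {field: record.get(field, "") for field in CORE_FIELDS}


def clean(record):
    """Strip noise such as stray whitespace from every value."""
    return {key: value.strip() for key, value in record.items()}


def conform(record):
    """Normalize values to one agreed representation."""
    record["email"] = record["email"].lower()
    record["age"] = int(record["age"]) if record["age"].isdigit() else None
    return record


def deliver(records):
    """Run each record through the whole pipeline."""
    return [conform(clean(extract(r))) for r in records]


print(deliver(RAW_RECORDS))
```

Each step is deliberately small and single-purpose, which makes it easy to tune the cleaning and conforming stages independently, as the text suggests.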


Data Quality as a process

A procedure that delivers the mentioned objectives has to be fast (significant volumes of data processed within an allotted time frame), corrective (improving the information), thorough (a source of trustworthy data), and transparent (the data source must reveal all its flaws before the whole process starts, or the results will be affected dramatically). These objectives have to be balanced against each other in order to obtain maximum efficiency.

Sometimes completeness will be more important than the speed of the process, and sometimes quite the opposite. It is impossible to achieve both at the highest possible standard; the key is to find a well-balanced point between the two. Data quality suffers most when the process is rushed at the expense of thoroughness.

The corrective role of the process is highly desirable because of the amount of irrelevant information, known as dirty data, but it can also slow the procedure down when applied overzealously. This objective is balanced by the transparency of the data source, which has to be controlled in the same fashion. Revealing too much produces a situation where insight overshadows results, which is unacceptable, while insufficient transparency leads to generating dirty data, which degrades data quality.

Procedures that keep a database at high quality must be run periodically to ensure that the desired quality standards are maintained. Records cannot be duplicated, out of date, or unsynchronized (especially important across multiple disparate databases). This imperative task is carried out by data stewards, who are in charge of data quality. The information-quality leader is responsible for balancing the main objectives mentioned above and conducting audits to keep control over the whole process, intervening wherever an improvement in data quality can be made. There is also a dimension manager, who is in charge of creating conformed dimensions in a consistent fashion and publishing them. The data warehouse manager is also worth mentioning: this is the person responsible for keeping the data warehouse running without any friction between the warehouse and the data quality assurance team.
This means that business people are accountable for the value of the data (it is not an IT responsibility). The commitment to keeping data quality high should be permanent, not a temporary measure for better economic results.
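The periodic checks described above, such as catching duplicated and out-of-date records, can be sketched as follows. The records, key field, and cutoff date are hypothetical examples chosen for illustration.

```python
# A minimal sketch of a periodic quality check for duplicated and
# out-of-date records. Records and field names are hypothetical.
from datetime import date

RECORDS = [
    {"id": 1, "email": "alice@example.com", "last_updated": date(2024, 1, 5)},
    {"id": 2, "email": "alice@example.com", "last_updated": date(2023, 3, 1)},
    {"id": 3, "email": "bob@example.com", "last_updated": date(2024, 6, 9)},
]


def find_duplicates(records, key="email"):
    """Group records sharing the same key value; flag groups larger than one."""
    seen = {}
    for rec in records:
        seen.setdefault(rec[key], []).append(rec["id"])
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


def find_stale(records, cutoff):
    """Records not updated since the cutoff date are considered out of date."""
    return [rec["id"] for rec in records if rec["last_updated"] < cutoff]


print(find_duplicates(RECORDS))
print(find_stale(RECORDS, date(2024, 1, 1)))
```

In practice such checks would be scheduled to run regularly, with the flagged record IDs handed to the data stewards for resolution.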

Data quality applies to many different areas, from plain databases through intelligence gathering to consumption requirements and supply management. This shows how broad data quality assurance is, providing huge amounts of information and supporting many fields.

Data Quality challenges

The challenges addressed by data quality platforms can be broken down into business and technical requirements.
The technical data quality problems are usually caused by data entry errors, system field limitations, mergers and acquisitions, and system migrations.

    Technical DQ challenges:
  • Inconsistent standards and discrepancies in data format, structure and values
  • Missing data, fields filled with default values or nulls
  • Spelling errors
  • Data in wrong fields
  • Buried information
  • Data anomalies

To be able to measure Data Quality, data should be divided into quantifiable units (data fields and rules) that can be tested for completeness and validity.
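The idea above, dividing data into quantifiable units and testing each field against rules for completeness and validity, can be sketched like this. The fields and validation rules are hypothetical examples, not a prescribed rule set.

```python
# A minimal sketch of measuring data quality per field: completeness
# (field is populated) and validity (field passes its rule).
# Fields and rules below are hypothetical examples.
import re

RECORDS = [
    {"email": "alice@example.com", "age": 34},
    {"email": "", "age": 200},
    {"email": "not-an-email", "age": 28},
]

# One rule per quantifiable unit (data field).
RULES = {
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
    "age": lambda v: v is not None and 0 <= v <= 130,
}


def score(records, rules):
    """Return completeness and validity ratios for each ruled field."""
    total = len(records)
    report = {}
    for field, rule in rules.items():
        complete = sum(1 for r in records if r.get(field) not in (None, ""))
        valid = sum(1 for r in records if rule(r.get(field)))
        report[field] = {"completeness": complete / total, "validity": valid / total}
    return report


print(score(RECORDS, RULES))
```

Expressing quality as per-field ratios makes it measurable and trackable over time, which is exactly what dividing data into quantifiable units is meant to enable.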

    Some business DQ challenges:
  • Reports are accurate and credible
  • Data-driven business processes work flawlessly
  • Shipments go out on time
  • Invoices are accurate
  • Data quality should be a driver for successful ERP, CRM or DSS implementations