Technical Knowledge Sharing Session
Data Profiling & Data Quality
Data Profiling
Data profiling is the process of examining the data available from a source
and collecting statistics or summary information about that data.
What do you get?
• Frequency
• Additional metadata, such as:
  • data types
  • length
  • discrete values
  • uniqueness
  • occurrence of null values
  • typical string patterns
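As a rough illustration of the statistics listed above, the following is a minimal sketch that profiles a single column using only the Python standard library. The function names `profile_column` and `string_pattern` and the sample postcode data are assumptions made up for this example; they are not part of OSDQ or Talend.

```python
from collections import Counter
import re


def string_pattern(value):
    """Map a string to a coarse pattern: letters become A, digits become 9."""
    pattern = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", pattern)


def profile_column(values):
    """Collect simple profiling statistics for one column (a list of values)."""
    non_null = [v for v in values if v is not None]
    frequency = Counter(non_null)
    lengths = [len(str(v)) for v in non_null]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "data_types": sorted({type(v).__name__ for v in non_null}),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "distinct_values": len(frequency),
        "is_unique": len(frequency) == len(non_null),
        "frequency": frequency.most_common(),
        "patterns": Counter(string_pattern(str(v)) for v in non_null).most_common(),
    }


# Hypothetical sample column, invented for illustration only.
postcodes = ["AB1 2CD", "EF3 4GH", None, "AB1 2CD", "12345"]
profile = profile_column(postcodes)

for name, value in profile.items():
    print(f"{name}: {value}")
```

In this sample, the pattern counts would flag "12345" as the odd one out against the dominant "AA9 9AA" shape, which is the kind of finding a profiling tool surfaces per column.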
Benefits of data profiling
• Improve data quality
• Shorten the implementation cycle of major projects
• Improve users' understanding of the data
Data Quality
Tooling Demo
Open Source Data Quality [OSDQ]
• Java-based free tool
• Connects to a few databases
• Ad hoc runs; does not save your connections
• Quick analysis and reports for profiling
• Freedom to write your own logic and update data directly in databases
• Colourful charts for each column you profile, which can be saved in various formats
Talend Data Quality
• Community-based free tool
• Lots of data connectors
• Quick analysis and reports for profiling, with loads of built-in capabilities
• Freedom to write your own logic and update data directly in databases
• Bar charts for each column you profile; these cannot be saved in the free version
Questions, Feedback