As the need for data grows everywhere, data quality is becoming just as important, because it allows everyone in an organization to arrive at a correct understanding of its affairs. Some of the challenges organizations face when dealing with data are:
1. Ever-changing data demands due to the fast evolution of the ICT sector
2. Multiple sources of data
3. Data providers that do not respond to queries
4. Inadequate data processing tools
What is needed is a systematic mechanism for the ongoing identification of quality problems: an approach that helps develop and evolve a data culture within the organization. A framework approach helps streamline the collection, processing and dissemination of statistical data.
Larger organizations are data-heavy, so the framework has to be scaled to the size of the organization. This article gives a brief description of ITU's Data Quality Assurance Framework (DQAF).
Components of the framework:
1. A set of underlying principles that form the basis of the DQAF
2. Quality dimensions highlighting the various aspects of data and process quality
3. Quality guidelines comprising good practices for assuring quality
4. A quality assessment and improvement program
Underlying Principles
“It is important that decision makers and the public have confidence that, irrespective of the organization that has responsibility for the statistics, they are compiled in accordance with accepted data quality standards.”
Each principle is listed below with the good practices that support it:
High-quality statistics accessible to all
a. Regular consultations with key users
b. Periodic review to ensure relevance
c. Access for all, by making decisions and reports publicly available
Ensure impartiality and the highest professional standards
a. Follow a published professional code of conduct.
b. If a different standard is developed, publish it.
c. Follow accepted methodologies, terminology and data presentation formats.
Stay scientific and adhere to best practices
a. Continuously aim to improve systems, methodology and quality in data-related activities.
b. Continuously enhance the skills of the officials concerned through training, seminars, publication of scientific papers, etc.
c. Consistently document current processes for data collection, definitions, classifications, processing, quality assessment and publication.
Choose timeliness and other aspects of data quality so as to stay cost-efficient and minimize the reporting burden
a. Be systematic: identify the critical categories of data.
b. Agree on timelines jointly with the entities concerned.
c. Contribute to an integrated data and statistics presentation plan that makes gaps or overlaps visible to all concerned.
Maintain a process to ensure the confidentiality of personally identifiable information (PII)
a. Enforce checks through a defined framework and an approval procedure.
Check erroneous interpretations of data immediately
a. Build a process to respond quickly to reported errors.
b. Develop educational material that helps users understand the published data or statistics.
Adhere to professional standards that are practical and feasible
a. Network with experts to ensure that decisions are impartial, that implementation is monitored and that good practices are promulgated.
b. Participate in international conferences, meetings and discussions.
c. Coordinate and organize participation among experts to encourage complementarity and synergy. Cooperation may take the form of joint projects, advocacy and empowerment through knowledge sharing.
Statistical Output Quality Dimensions
1. Relevance
The data sought must be relevant to its users. To measure relevance, we must ascertain:
a. Who are the users of data?
b. What are their needs?
c. Are processes in place to determine users' views and the uses they make of the data?
2. Accuracy
Accuracy can be measured by estimating how close the released data values are to the true values, as defined for the released dataset. Data is only as accurate as its source and the process used to extract it. It is therefore recommended to have documented, agreed expectations for data collection and processing.
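As an illustration, the sketch below compares released values with reference values and reports the relative error for each indicator. True values are rarely known in practice, so the reference figures are assumed to stand in for them (for example, a benchmark or audited source). The function names, indicator names and the 3% tolerance are hypothetical and are not part of the ITU framework.

```python
def relative_errors(released: dict[str, float], reference: dict[str, float]) -> dict[str, float]:
    """Relative error |released - reference| / |reference| for each shared indicator.

    The reference values are an assumed stand-in for the (usually unknown) true values.
    """
    errors = {}
    for indicator, true_value in reference.items():
        if indicator not in released:
            continue  # nothing released for this indicator, so nothing to compare
        if true_value == 0:
            raise ValueError(f"reference value for {indicator!r} is zero")
        errors[indicator] = abs(released[indicator] - true_value) / abs(true_value)
    return errors


def flag_inaccurate(errors: dict[str, float], tolerance: float = 0.03) -> list[str]:
    """Indicators whose relative error exceeds a hypothetical agreed tolerance."""
    return sorted(name for name, err in errors.items() if err > tolerance)


if __name__ == "__main__":
    # Hypothetical indicators and figures, for illustration only.
    released = {"mobile_subscriptions": 102.0, "fixed_broadband": 48.0}
    reference = {"mobile_subscriptions": 100.0, "fixed_broadband": 50.0}
    errors = relative_errors(released, reference)
    print(errors)
    print(flag_inaccurate(errors))  # ['fixed_broadband']
```

Here the 2% error on the first indicator stays within the assumed tolerance, while the 4% error on the second is flagged; in a real process the tolerance would come from the documented, agreed expectations mentioned above.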
3. Credibility
Credibility comes from the trust users place in the source producing the data (its brand image) and in the processes and quality controls in place during data production.
4. Coherence
Coherence refers to the logical consistency of data within and across datasets, over time and across classifications. For example, data definitions should not change from one classification to another, or imply different interpretations at different times, without explanation; and the sum of the parts should equal the total. Creating and managing metadata is a fundamental element of ensuring coherence.
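The "sum of the parts equals the total" rule is easy to check mechanically. The sketch below flags periods whose components do not add up to the published total. The data layout (totals and component values keyed by period), the function name and the tolerance are assumptions made for illustration.

```python
import math


def incoherent_periods(
    totals: dict[str, float],
    parts: dict[str, dict[str, float]],
    rel_tol: float = 1e-9,
) -> list[str]:
    """Return the periods whose components do not sum to the published total.

    A period with no components listed sums to 0 and is flagged unless its total is 0.
    """
    failures = []
    for period, total in totals.items():
        # fsum reduces floating-point rounding error when adding many components
        component_sum = math.fsum(parts.get(period, {}).values())
        if not math.isclose(component_sum, total, rel_tol=rel_tol):
            failures.append(period)
    return failures


if __name__ == "__main__":
    # Hypothetical figures: the 2023 components add up to 155.0, not 160.0.
    totals = {"2022": 150.0, "2023": 160.0}
    parts = {
        "2022": {"urban": 100.0, "rural": 50.0},
        "2023": {"urban": 105.0, "rural": 50.0},
    }
    print(incoherent_periods(totals, parts))  # ['2023']
```

A small relative tolerance is used instead of exact equality so that harmless floating-point rounding is not reported as an incoherence.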
5. Timeliness
Data has value for a limited time only, so data released late may no longer be relevant. Releasing data on time helps data consumers plan and add value to the ecosystem. A warning system, combined with agreed service levels, helps contain undue delays.
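A warning system of this kind can be sketched as a comparison between a release schedule and actual release dates. In the sketch below, the service level is assumed to be the scheduled date plus a grace period, and unreleased datasets within a warning window of that deadline raise an early warning. The dataset names, grace period and warning window are hypothetical parameters, not values set by the framework.

```python
from datetime import date, timedelta


def release_status(
    schedule: dict[str, date],
    released: dict[str, date],
    today: date,
    grace_days: int = 0,
    warn_days: int = 7,
) -> dict[str, str]:
    """Classify each scheduled dataset against a hypothetical release service level.

    The deadline is the scheduled date plus `grace_days`. Unreleased datasets within
    `warn_days` of the deadline are marked "warning".
    """
    status = {}
    for dataset, due in schedule.items():
        deadline = due + timedelta(days=grace_days)
        if dataset in released:
            status[dataset] = "late" if released[dataset] > deadline else "on time"
        elif today > deadline:
            status[dataset] = "overdue"
        elif today >= deadline - timedelta(days=warn_days):
            status[dataset] = "warning"
        else:
            status[dataset] = "scheduled"
    return status


if __name__ == "__main__":
    schedule = {
        "dataset_a": date(2024, 3, 1),
        "dataset_b": date(2024, 3, 10),
        "dataset_c": date(2024, 3, 1),
        "dataset_d": date(2024, 4, 30),
    }
    released = {"dataset_a": date(2024, 3, 4)}
    for name, state in release_status(schedule, released, date(2024, 3, 6), grace_days=2).items():
        print(name, state)
```

With today set to 6 March 2024 and a two-day grace period, dataset_a is reported as late, dataset_b as a warning, dataset_c as overdue and dataset_d as scheduled.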
6. Accessibility
Is your data readily discoverable and available in a ready-to-consume format? Accessibility also depends on affordability, the availability of metadata and user support.
7. Interpretability
The interpretability of data is largely determined by its users. It is therefore important to include adequate data definitions and metadata, and to identify target user groups. Clear objectives for releasing data, and an understanding of its target consumers, help data producers address interpretability.
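One practical way to support interpretability is to check, before release, that every indicator carries its definitions and other metadata. The sketch below lists the missing fields per indicator. The required field names are a hypothetical minimum chosen for illustration, not a list prescribed by the framework.

```python
# Hypothetical minimum set of metadata fields; adapt to the organization's own standard.
REQUIRED_FIELDS = ("definition", "unit", "source", "target_users")


def missing_metadata(metadata: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """For each indicator, list the required metadata fields that are absent or blank."""
    gaps = {}
    for indicator, fields in metadata.items():
        missing = [f for f in REQUIRED_FIELDS if not fields.get(f, "").strip()]
        if missing:
            gaps[indicator] = missing
    return gaps


if __name__ == "__main__":
    metadata = {
        "indicator_a": {
            "definition": "Example definition",
            "unit": "per 100 inhabitants",
            "source": "Example survey",
            "target_users": "Policy makers",
        },
        "indicator_b": {"definition": "Example definition", "unit": " "},
    }
    print(missing_metadata(metadata))  # {'indicator_b': ['unit', 'source', 'target_users']}
```

Blank values are treated the same as missing ones, since an empty unit or definition does not help users interpret the data.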
Process Quality Dimensions
8. Sound Methodology
Sound methodology means following best practices and international standards at every stage of the statistical process, from data requirements through design, data collection, processing, analysis, dissemination, archiving and evaluation. It also includes sufficient documentation and staff training.
9. Sound Systems
Because every stage of data collection is technology-driven, it is important to adhere to best practices and international methodologies when employing IT systems for data production. These systems should be used efficiently, with minimal waste of time and storage.
10. Cost Efficiency
Data costs money not only to prepare, organize and analyse, but also to procure. It is important to optimize costs at all stages, from procurement to final dissemination.
Quality Guidelines
For every stage, from data collection to final delivery, a well-defined scope, general guidelines, detailed guidelines, monitoring mechanisms and reference documentation form an important roadmap towards assuring quality. Each guideline covers:
1. Scope: a short description of the activity to which the guideline applies.
2. General guidelines: a statement of the best practices reflecting the aims of the guideline.
3. Detailed guidelines: all aspects of quality and performance to be addressed.
4. Monitoring mechanisms: the methods by which adherence to the guidelines can be monitored, including quality and performance indicators and quality assessments.
5. Reference documentation: documents that elaborate on the guidelines and methodologies in place.