        Data Provenance

        Data Provenance: On the origin of Data

        by Ernest Field

        The term Data Provenance refers to tracing the origins of data and its transfer between datasets. Determining where the data in scientific databases came from is a major concern.

        Data Provenance (DP)

        DP is defined as:

        Meta-data that describes the history of a data set, starting from its original sources.

        A web search revealed scores of articles on this subject. All the examples concerned large datasets whose provenance was difficult to assess, because they arose from workflows and from transfers out of other datasets. Some datasets may have indeterminate origins, and the complexity of the generation process can make it difficult to assign a quality rating to many of them.
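        The definition can be made concrete with a small sketch. It assumes, purely for illustration, that each dataset records the process that produced it and the datasets it was derived from. The `Dataset` class, the `original_sources` and `history` functions and the example dataset names are hypothetical and do not correspond to any of the provenance languages discussed in the literature. Following the parent links backwards recovers the history down to the original sources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset with a minimal provenance record (hypothetical structure)."""
    name: str
    # How this dataset was produced, e.g. a survey for an original source
    # or a merge for a dataset derived from others.
    process: str
    # Datasets this one was derived from; empty for an original source.
    parents: list[Dataset] = field(default_factory=list)


def original_sources(ds: Dataset) -> set[str]:
    """Follow parent links back to datasets that have no parents."""
    if not ds.parents:
        return {ds.name}
    sources: set[str] = set()
    for parent in ds.parents:
        sources |= original_sources(parent)
    return sources


def history(ds: Dataset, depth: int = 0) -> list[str]:
    """Return the derivation history as indented lines, newest first."""
    lines = ["  " * depth + f"{ds.name} ({ds.process})"]
    for parent in ds.parents:
        lines.extend(history(parent, depth + 1))
    return lines


# Hypothetical example: a forecast built from two operational datasets.
orders = Dataset("orders_2009", "extracted from order system")
returns = Dataset("returns_2009", "manual survey")
net_sales = Dataset("net_sales", "merge and clean", [orders, returns])
forecast = Dataset("forecast", "time-series model", [net_sales])

print(sorted(original_sources(forecast)))
print("\n".join(history(forecast)))
```

        Even this toy structure shows why large datasets are hard to assess: every merge adds branches to the history, and a dataset whose parents were never recorded simply looks like an original source.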

        Several articles report on the role of DP in validating scientific databases. One issue they discuss is the development of languages for describing how data was generated. One article introduces the idea of provenance stores, or vaults, that hold quality-assured data. Another reports a major study of DP undertaken by IBM in collaboration with the EU. Many more applications of DP to specific subjects have been reported.

        OR Supports Data Provenance

        The quality of data is vital in all practical OR projects and underpins the quality of the final results. Data generators are increasingly being used to develop new techniques and databases, but good-quality data is still required. The many datasets that appear in OR projects fall into five main categories, each with its own DP characteristics:

        1. Existing data on an organisation’s operations.
        2. Similar data on other organisations.
        3. New data produced by observation of the operations under study.
        4. Data produced by analysis and modelling.
        5. The results of literature surveys to check on other people’s relevant work.

        There is a Yorkshire proverb that says:

        “Never believe anything th’as not seen for thee sen, and then don’t be too sure.”

        This cautionary principle should be followed in assessing the quality of data.

        A more formal approach to DP would attach meta-data to each data element, such as the date it was last updated. In a database, this would mean adding new DP fields.
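        As a hedged illustration of adding DP fields to a database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `stock` table, its `source` and `last_updated` columns, the `stale_or_unknown` function, the sample rows and the cut-off date are all hypothetical; a real scheme would depend on the database and on the organisation's own conventions.

```python
from __future__ import annotations

import sqlite3
from datetime import date

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# An operational table as it might exist before any provenance is recorded.
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("bolts", 1200), ("nuts", 950), ("washers", 3000)],
)

# Add DP fields: where each value came from and when it was last updated.
conn.execute("ALTER TABLE stock ADD COLUMN source TEXT")
conn.execute("ALTER TABLE stock ADD COLUMN last_updated TEXT")  # ISO 8601 date

conn.executemany(
    "UPDATE stock SET source = ?, last_updated = ? WHERE item = ?",
    [
        ("warehouse count", "2009-08-28", "bolts"),
        ("supplier report", "2009-03-02", "nuts"),
        # "washers" is left without provenance: its origin is indeterminate.
    ],
)


def stale_or_unknown(conn: sqlite3.Connection, cutoff: date) -> list[str]:
    """Items whose provenance is missing or was last updated before cutoff."""
    rows = conn.execute(
        "SELECT item FROM stock "
        "WHERE last_updated IS NULL OR last_updated < ? ORDER BY item",
        (cutoff.isoformat(),),
    )
    return [item for (item,) in rows]


print(stale_or_unknown(conn, date(2009, 6, 1)))
```

        Storing dates as ISO 8601 text keeps the comparison simple, because such strings sort in date order. Rows with no recorded provenance are reported alongside stale ones, since an unknown origin is itself a reason for caution.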

        Value of Provenance

        The benefits of DP are likely to be application-specific, although the development of new DP languages will have general benefits. DP helps to identify datasets that are unreliable because of missing or inaccurate data. Most OR studies find that existing datasets need updating or upgrading before the study can use them. The benefits the organisation gains from the improved data should be claimed as an OR result.
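        One simple way to flag an unreliable dataset is to measure how much of it is missing or implausible. The sketch below is an illustrative assumption, not an established method. The `reliability_report` function, the plausible range and both thresholds are hypothetical choices that would need to be set for each application. It also approximates "inaccurate" only as "outside a plausible range".

```python
from __future__ import annotations

from typing import Optional


def reliability_report(
    values: list[Optional[float]],
    low: float,
    high: float,
    max_missing: float = 0.05,
    max_invalid: float = 0.01,
) -> dict[str, object]:
    """Summarise missing and implausible values and flag the dataset.

    A value counts as invalid when it lies outside [low, high]. The
    default thresholds are illustrative, not recommended standards.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    missing = n - len(present)
    invalid = sum(1 for v in present if not (low <= v <= high))
    # An empty dataset is treated as wholly missing.
    missing_rate = missing / n if n else 1.0
    invalid_rate = invalid / n if n else 1.0
    return {
        "records": n,
        "missing_rate": missing_rate,
        "invalid_rate": invalid_rate,
        "unreliable": missing_rate > max_missing or invalid_rate > max_invalid,
    }


# Hypothetical daily demand figures: one gap and one negative entry.
demand = [120.0, 135.0, None, 128.0, -4.0, 131.0, 140.0, 126.0, 133.0, 129.0]
print(reliability_report(demand, low=0.0, high=1000.0))
```

        Here one missing value and one negative value in ten records give missing and invalid rates of 0.1 each, so the dataset is flagged as unreliable under the assumed thresholds.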

        To Summarise

        • Data provenance is important to all users of large datasets.
        • The complexity of DP requires new languages to record the metadata.
        • OR must continue to use high-provenance data.
        • The subject of data provenance should be kept under review.

        September 2009: Inside OR
