On the Origin of Metadata
Abstract
1. Introduction
2. Proposition I: Species Have an Unlimited Reproductive Capacity
2.1. Darwin’s View
2.2. The (Meta)Data Counterpart
3. Proposition II: In Fact, Numbers of Each Species are Limited
3.1. Darwin’s View
3.2. The (Meta)Data Counterpart
4. Proposition III: Individual Variation & Differences are Hereditary
4.1. Darwin’s View
4.2. The (Meta)Data Counterpart
- Binary metadata describe the data at the bit level. Bitstreams are the actual data in a file. Binary metadata, e.g., file system information and file header information, keep the enclosed information accessible by indicating how the bits should be transformed into a representation of the data, e.g., in a certain compression format.
- Technical metadata describe the data at the file level. Data formats and their derivatives evolve quickly. As both container and compression formats age, it becomes hard to find software that can still interpret old formats. The only way to keep this kind of information accessible is to support migration and/or emulation, for which the technical metadata, e.g., coder-decoder (codec) information, are key.
- Structural metadata describe the relationships between a set of files that together correspond to a possible representation of the intellectual content of certain data. A book, for instance, might be an aggregation of chapters whose pages appear in a specific order identified by the table of contents. These structural metadata are necessary to describe the complete book as a correct ordering of these pages.
- Descriptive metadata describe extra data, e.g., author, title, location, date, etc., that make the original data easier to find and locate. When exchanging digital multimedia content between different industries/institutions (be it broadcasters, libraries, cultural institutions, or archives), an additional problem concerning descriptive metadata arises: many industries/institutions already describe, control, and save their descriptive metadata according to their own (standardized) schemes. As such, some file these extra descriptions as metadata, others file them as real data. Both strategies have their pros and cons. If a coordinating institution wants to file these extra descriptions as metadata, it is forced to choose one metadata standard to do so, which is not an obvious choice, as most metadata schemes are domain specific. To guarantee lossless filing of all descriptive metadata, the coordinating institution must opt for a scheme that covers every element of all descriptive metadata schemes used by the partnering industries/institutions, which would lead to an enormous, unmaintainable metadata scheme. It is therefore best to archive the descriptive metadata (in their original metadata format) together with their original data, so that no information is ever lost.
- Preservation metadata describe essential extra data that support and document the digital preservation process. No digital storage device is perfect and perpetually reliable, as bit preservation is still an unsolved problem. The simplest model of these failures is analogous to the decay of radioactive atoms. Each bit in a data file is independently subject to a random process that has a constant small probability per unit time of causing its value to flip. The time after which there is a 50% probability that a bit will have flipped is the bit half-life. The requirement of a 50% chance that 1 petabyte of data will survive for a century translates into a required bit half-life of 8 × 10^17 years. To put things into perspective, the current estimate of the age of the universe U is 1.4 × 10^10 years, so this is a bit half-life of approximately 6 × 10^7 U [10]. As stated earlier, information in a digital form is a conceptual object. This information can be altered and copied quite easily without anyone noticing it in its visible representation. Compared to analogue information, it is indeed much harder to preserve the authenticity of digital information. This too can be solved by adding tenability metadata to the preservation package of the archived essence. Such metadata include checksums, digital signatures, certificates, encryption, and cyclic redundancy checks, which indicate that the data have not been altered without the change being documented. Furthermore, an archived dataset also needs its provenance documented. This type of preservation metadata (e.g., encoding software, version history, references to the original sources, etc.) describes the genesis of the intrinsic information, i.e., the original owners of the data, the processes determining the current form of that data, and all of its available intermediate versions, as this information is vital in verifying all changes the data have undergone from their genesis to date.
In addition, context-aware metadata (e.g., related data sets, help files, original language on first publication, etc.) must be retained, as these describe possible relationships of the intrinsic data with other data that are not included in its own information package.
- Lastly, rights metadata describe the rights on digital objects (e.g., copyright statements, (changing) licenses, and any grants that are given). As this information is also vital to guarantee long-term access to the data, it must be saved too.
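The bit half-life figures quoted under preservation metadata can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the decay model described in the text, not code from the cited source: the function name `required_bit_half_life` is hypothetical, and it assumes a decimal petabyte (10^15 bytes, i.e., 8 × 10^15 bits) and that the data "survive" only if no single bit flips.

```python
import math


def required_bit_half_life(num_bits: float, years: float,
                           survival_prob: float = 0.5) -> float:
    """Half-life (in years) each bit needs so that ALL bits stay
    unflipped for `years` with probability `survival_prob`.

    Under the radioactive-decay model, one bit survives a time t with
    probability 0.5 ** (t / half_life); independent bits multiply, so
        P(all survive) = 0.5 ** (num_bits * years / half_life).
    Solving for half_life gives the expression returned below.
    """
    return num_bits * years * math.log(0.5) / math.log(survival_prob)


# Assumption: 1 petabyte = 10**15 bytes = 8 * 10**15 bits.
PETABYTE_BITS = 8 * 10**15
# Age of the universe U as quoted in the text.
AGE_OF_UNIVERSE_YEARS = 1.4e10

half_life = required_bit_half_life(PETABYTE_BITS, years=100)
ratio = half_life / AGE_OF_UNIVERSE_YEARS

print(f"required bit half-life: {half_life:.1e} years")
print(f"in units of U:          {ratio:.1e}")
```

For a 50% survival target the logarithms cancel and the required half-life is simply the number of bits times the number of years, which is where the 8 × 10^17 years comes from. Dividing by the age of the universe gives roughly 5.7 × 10^7, which the text rounds to 6 × 10^7 U.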
5. Proposition IV: Survival of the Fittest
5.1. Darwin’s View
5.2. The (Meta)Data Counterpart
- The big, proprietary player buys the agile open solution and incorporates it within its own software solution. The big player embraces the obviously fit characteristics of the open agile solution and continues its world domination, e.g., Google bought Freebase and now improves its search results using a semantic knowledge graph [12].
- The big, proprietary player buys the agile open solution, puts it aside, and hopes to maintain its world domination without embracing evolution, e.g., SUN Microsystems bought Netscape’s web server suite, at that time superior to SUN’s own suite, but failed to incorporate it and finally stopped developing it further as it tried to win the Java battle [13]. Within two years, Microsoft’s IIS server took over, together with an incumbent agile open source solution, the Apache HTTP Server of the Apache Software Foundation [14]. He who dares to stand still will, compared to his opponents, automatically go backwards and be eradicated.
- The small player goes for world domination, becomes big itself, and then the use of “open standards” becomes a subtle game changer in getting on top of one another. Before 2005, when Apple was far smaller than Microsoft, it invested heavily in “open standards” through W3C, i.e., HTML, CSS, and JavaScript, and stated on numerous occasions that Microsoft was holding back evolution by not adopting these standards as proposed and instead continuing to build its own proprietary versions of them [15]. Only five years later, the situation had completely reversed. Now Apple is more dominant than Microsoft, pushing its own end-to-end ecosystem without adhering to all “open standards” out there. Microsoft took up standardization again and had one of the first browsers implementing most of the new HTML5 features.
6. Proposition V: Environment is a Selecting Mechanism
6.1. Darwin’s View
6.2. The (Meta)Data Counterpart
7. Proposition VI: Sexual Selection
7.1. Darwin’s View
7.2. The (Meta)Data Counterpart
8. Proposition VII: Mutability of the Genome
8.1. Darwin’s View
8.2. The (Meta)Data Counterpart
9. Proposition VIII: Cooperation of Mutations and Natural Selection Leads to Adaptation
9.1. Darwin’s View
9.2. The (Meta)Data Counterpart
10. Proposition IX: Mankind and Its Behavior are Products of an Evolutionary Process
10.1. Darwin’s View
10.2. The (Meta)Data Counterpart
11. Discussion
12. Conclusions
Acknowledgments
References
- Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Sci. Am. 2001, 284, 34–43.
- Darwin, C. On the Origin of Species; Murray: London, UK, 1859.
- Darwin, C. Descent of Man, and Selection in Relation to Sex; Murray: London, UK, 1871.
- Darwin, C. The Expression of the Emotions in Man and Animals; Murray: London, UK, 1872.
- de Laender, J. Het Verdriet van Darwin. Over de Pijn en de Troost van het Rationalisme; ACCO: Leuven, Belgium, 2004.
- Weinberger, D. Metadata and understanding. KMWorld Magazine. 29 September 2006. Available online: http://www.kmworld.com/Articles/News/News-Analysis/Metadata-and-understanding-18278.aspx (accessed on 7 December 2012).
- Raup, D.M. Extinction from a paleontological perspective. Eur. Rev. 1993, 1, 207–216.
- ISO/IEC. Space Data and Information Transfer Systems—Open Archival Information System—Reference Model. Available online: http://www.iso.org/iso/catalogue_detail.htm?csnumber=24683 (accessed on 4 December 2012).
- Mendel, G.J. Versuche über pflanzenhybriden. Verh. Naturforschenden Ver. Brünn 1866, 4, 3–47.
- Rosenthal, D. Bit preservation: A solved problem? In Proceedings of the 5th International Conference on Preservation of Digital Objects, London, UK, 29–30 September 2008; pp. 1–7.
- van Valen, L. Molecular evolution as predicted by natural selection. J. Mol. Evol. 1974, 3, 89–101.
- Singhal, A. Introducing the knowledge graph: Things, not strings. Available online: http://googleblog.blogspot.be/2012/05/introducing-knowledge-graph-things-not.html (accessed on 4 December 2012).
- Garud, R.; Jain, S.; Kumaraswamy, A. Institutional entrepreneurship in the sponsorship of common technological standards: The case of SUN microsystems and JAVA. Acad. Manag. J. 2002, 45, 196–214.
- The Apache Software Foundation Home Page. Available online: http://www.apache.org (accessed on 4 December 2012).
- Züger, M.; Poltier, S.; Volkart, A. Economic challenges of standardization. Internet Econ. V 2010, 1, 31–54.
- Schema.org Documentation. Available online: http://www.schema.org/docs/documents.html (accessed on 4 December 2012).
- Cusumano, M.; Mylonadis, Y.; Rosenbloom, R. Strategic maneuvering and mass-market dynamics: The triumph of VHS over betamax. Bus. Hist. Rev. 1992, 66, 51–94.
- Angelides, M.; Agius, H. The Handbook of MPEG Applications: Standards in Practice; Wiley: West Sussex, UK, 2011.
- ISO/IEC. Information Technology—Multimedia Content Description Interface—Part 1: Systems (MPEG-7). 2002. Available online: http://mpeg.chiariglione.org/tutorials/papers/IEEEMM_mp7overview_withcopyrigth.pdf (accessed on 4 December 2012).
- Dublin Core Metadata Initiative. DCMI Metadata Terms. 2008. Available online: http://dublincore.org/specifications/ (accessed on 4 December 2012).
- Weagley, J.; Gelches, E.; Park, J.-R. Interoperability and metadata quality in digital video repositories: A study of Dublin core. J. Libr. Metadata 2010, 10, 37–57.
- Shadbolt, N.; Hall, W.; Berners-Lee, T. The semantic web revisited. IEEE Intell. Syst. 2006, 21, 96–101.
- Klyne, G.; Carroll, J. Resource Description Framework (RDF): Concepts and abstract syntax. W3C Recommendation. Available online: http://www.w3.org/TR/rdf-concepts (accessed on 4 December 2012).
- Hillmann, D.; Phipps, J. Application profiles: Exposing and enforcing metadata quality. In Proceedings of the 7th International Conference on Dublin Core and Metadata Applications, Singapore, 27–31 August 2007; pp. 53–62.
- Miles, A.; Bechhofer, S. SKOS simple knowledge organization system reference. W3C Recommendation. Available online: http://www.w3.org/TR/skos-reference/ (accessed on 4 December 2012).
- van Hooland, S.; Verborgh, R.; de Wilde, M.; Hercher, J.; Mannens, E.; van de Walle, R. Evaluating the success of vocabulary reconciliation for cultural heritage collections. J. Am. Soc. Inf. Sci. Technol. 2012, in press.
- Free Your Metadata Initiative. Publish and polish your metadata using google refine. 2012. Available online: http://freeyourmetadata.org/ (accessed on 4 December 2012).
- Google Refine. A power tool for working with messy data. 2012. Available online: http://code.google.com/p/google-refine/ (accessed on 4 December 2012).
- Cyganiak, R.; Jentzsch, A. Linking open data cloud diagram. 2011. Available online: http://lod-cloud.net/ (accessed on 4 December 2012).
- Open Calais Home Page. Available online: http://www.opencalais.com/ (accessed on 4 December 2012).
- GeoNames Home Page. Available online: http://www.geonames.org/ (accessed on 4 December 2012).
- DBPedia Home Page. Available online: http://dbpedia.org/ (accessed on 4 December 2012).
- Freebase Home Page. Available online: http://www.freebase.com/ (accessed on 4 December 2012).
- de Sutter, R.; Braeckman, K.; Mannens, E.; van de Walle, R. Integrating audiovisual feature extraction tools in media annotation production systems. In Proceedings of the 13th IASTED International Conference on Internet and Multimedia Systems and Applications, Honolulu, HI, USA, 17–19 August 2009; pp. 76–81.
- TheDatatank Open Source Framework. Available online: http://www.thedatatank.com/ (accessed on 4 December 2012).
- Semantifier Tool. Available online: http://datatank.demo.ibbt.be/The-Semantifier/ (accessed on 4 December 2012).
- van de Sompel, H.; Sanderson, R.; Nelson, M.; Balakireva, L.; Shankar, H.; Ainsworth, S. Memento: Time travel for the web. 2009. Available online: http://arxiv.org/abs/0911.1112 (accessed on 4 December 2012).
- Coppens, S.; Mannens, E.; Vandeursen, D.; Hochstenbach, P.; Janssens, B.; van de Walle, R. Publishing provenance information on the web using the memento datetime content negotiation. In Proceedings of the 5th WWW Linked Data on the Web Workshop, Hyderabad, India, March 2011; pp. 6–15.
- PREMIS—PREservation Metadata: Implementation Strategies. 2012. Available online: http://www.loc.gov/standards/premis/ (accessed on 4 December 2012).
- Bizer, C.; Heath, T.; Berners-Lee, T. Linked data—The story so far. Int. J. Semant. Web Inf. Syst. 2009, 5, 1–22.
- Berners-Lee, T. Linked data—5 star scheme. 2006. Available online: http://www.w3.org/DesignIssues/LinkedData.html (accessed on 4 December 2012).
- Nelson, T. Computer Lib/Dream Machines; Microsoft Press: Redmond, WA, USA, 1987.
- Logan, R. What is information? Why is it relativistic and what is its relationship to materiality, meaning and organization. J. Inf. 2012, 3, 68–91.
- Mannens, E. Interoperability of Semantics in News Production. Ph.D. Thesis, Ghent University, Ghent, Belgium, March, 2011.
© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Mannens, E.; Verborgh, R.; Van Hooland, S.; Hauttekeete, L.; Evens, T.; Coppens, S.; Van de Walle, R. On the Origin of Metadata. Information 2012, 3, 790-808. https://doi.org/10.3390/info3040790