2013 Scholarly Publishing Brown Bag Series
Dataset Metadata
Sai Deng
University of Central Florida Libraries

o Dataset Metadata Service at UCF
o Understanding data, research data and datasets
o What do the funding agencies say: data- and metadata-related requirements
o Data types, formats, and documentation
o ResearcherID, Scopus and ORCID
o DOIs and data citation
o Dataset record examples, their associated standards, and data repositories
o Curation tools for datasets

[Word cloud of session keywords: Data, Research data, Dataset, Data set, Dataset Metadata Service, Research data types, Research data formats, Data documentation, Metadata, Metadata standards, Metadata schemas, Controlled vocabularies, Thesauri, Funding agencies, ResearcherID, ORCID, DataCite, DOI, ARKs, Handles, Data publication, Data archiving, Data sharing, Data repository, Data citation, Data curation]
* Word cloud generated using Tagxedo.

o “Data Set (also called ‘Dataset’) Metadata” offers researchers consultation on:
  o Project and dataset documentation;
  o Acquiring DOIs for your datasets;
  o Metadata standards (common and domain-specific);
  o Metadata schema customization;
  o Controlled vocabularies and thesauri;
  o Data curation tools and practices.
o Assists in describing basic properties of your data and enriching metadata for your datasets;
o Supports applying controlled vocabularies or optimizing keywords to make your datasets easier to find;
o Helps to prepare your metadata and data for deposit and preservation.

o Data are numerical quantities or other factual attributes derived from observation, experiment or calculation.
  - National Research Council, 1992a. "Setting priorities for space research: Opportunities and imperatives."
o Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors.
Data in a database may be characterized as predominantly word oriented (e.g., as in a text, bibliography, directory, dictionary), numeric (e.g., properties, statistics, experimental values), image (e.g., fixed or moving video, such as a film of microbes under magnification or time-lapse photography of a flower opening), or sound (e.g., a sound recording of a tornado or a fire)… Data can also be referred to as raw, processed, or verified.
  - Committee for a Study on Promoting Access to Scientific and Technical Data for the Public Interest, National Research Council. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (1999). Available at: http://www.nap.edu/openbook.php?record_id=9692&page=15

o In the context of these Principles and Guidelines [Principles and Guidelines for Access to Research Data from Public Funding], “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings.
  - Organisation for Economic Co-operation and Development (OECD, 2007). OECD Principles and Guidelines for Access to Research Data from Public Funding. P.13. Available at: http://www.oecd.org/dataoecd/9/61/38500813.pdf
o Research data is often defined as the information (e.g. data sets, microarray, numerical data, clinical trial information, textual records, images, sound, etc.) generated or used as quantitative evidence in primary biomedical research. This research data is distinguished by the fact that it is accepted by the research community as a means to validate research findings, observations and hypotheses.
  - HLWIKI Canada (2011). http://hlwiki.slais.ubc.ca/index.php/Data_curation
o Research data, unlike other types of information, is collected, observed, or created, for purposes of analysis to produce original research results.
  - University of Edinburgh. http://www.ed.ac.uk/schools-departments/information-services/services/researchsupport/data-library/research-data-mgmt/2.2360/research-data-definition

o A logically meaningful collection or grouping of similar or related data, usually assembled as a matter of record or for research, for example, the American FactFinder Data Sets provided online by the U.S. Census Bureau or the National Elevation Dataset available from the U.S. Geological Survey.
  - Online Dictionary for Library and Information Science (ODLIS). http://www.abc-clio.com/ODLIS/odlis_A.aspx
o A research data set constitutes a systematic, partial representation of the subject being investigated.
  - Organisation for Economic Co-operation and Development (OECD, 2007). http://www.oecd.org/dataoecd/9/61/38500813.pdf
o "Over the life course of a survey that results in a data set – from initial conceptualization to data publication and beyond – a huge amount of metadata is typically produced. These metadata can be recorded in DDI format and re-used as the data collection, processing, tabulation, and reporting/dissemination take place."
  - Arofan Gregory, Open Data Foundation (2011). The Data Documentation Initiative (DDI): An Introduction for National Statistical Institutes. Available at: http://odaf.org/papers/DDI_Intro_forNSIs.pdf

o The term “data” is used in this report to refer to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means including observation, computation, or experiment.
  - National Science Foundation (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. P.9.
    Available at: http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
o As stated in NSF’s “Information about the Data Management Plan Required for all Proposals” for Biological Sciences, the Federal government defines data (OMB Circular A-110) as: “…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” This definition includes both original data (observations, measurements etc.) as well as metadata (e.g., experimental protocols, software code for statistical analysis etc.).

o The NSF Grant Proposal Guide recommends the inclusion of a “data management plan” that explains how your proposal will comply with NSF’s data sharing policies. The data management plan may include:
  o The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  o The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  o Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  o Policies and provisions for re-use, re-distribution, and the production of derivatives; and
  o Plans for archiving data, samples, and other research products, and for preservation of access to them.
o See NSF's Grant Proposal Guide for more information.
o NIH Data Sharing Policy and Implementation Guidance (http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm)
  o “Investigators seeking $500,000 or more in direct costs in any year should include a description of how final research data will be shared, or explain why data sharing is not possible…”
o Data Sharing Plan (to follow immediately after the Research Plan Section)
  o “The precise content of the data-sharing plan will vary, depending on the data being collected and how the investigator is planning to share the data. Applicants who are planning to share data may wish to describe briefly the expected schedule for data sharing, the format of the final dataset, the documentation to be provided, whether or not any analytic tools also will be provided, whether or not a data-sharing agreement will be required…”
o Data Documentation
  o “Regardless of the mechanism used to share data, each dataset will require documentation. (Some fields refer to data documentation by other terms, such as metadata or codebooks)… Documentation provides information about the methodology and procedures used to collect the data, details about codes, definitions of variables, variable field locations, frequencies, and the like. The precise content of documentation will vary by scientific area, study design, the type of data collected, and characteristics of the dataset.”
  o “It is appropriate for scientific authors to acknowledge the source of data upon which their manuscript is based. Many investigators include this information in the methods and/or reference sections of their manuscripts. Journals generally include an acknowledgement section… Most journals now expect that DNA and amino acid sequences that appear in articles will be submitted to a sequence database before publication.”
o Changes to Public Access Policy Compliance Efforts Apply to All Awards with Anticipated Start Dates on or after July 1, 2013 (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-13-042.html)
o For non-competing continuation grant awards with a start date of July 1, 2013 or beyond:
  o NIH will delay processing of an award if publications arising from it are not in compliance with the NIH public access policy (which requires papers to be posted to PubMed Central within a year of publication, and PMCIDs to be included in citations);
  o Investigators will need to use My NCBI to enter papers onto progress reports. Papers can be associated electronically using the RPPR, or included in the PHS 2590 using the My NCBI generated PDF report.

o Data Management Plans for NEH Office of Digital Humanities: Proposals and Awards (http://www.neh.gov/files/grants/data_management_plans_2013.pdf)
  o “‘Data’ is defined as materials generated or collected during the course of conducting research.
  o Examples of humanities data could include citations, software code, algorithms, digital tools, documentation, databases, geospatial coordinates (for example, from archaeological digs), reports, and articles.
  o Excluded, however, are things such as preliminary analyses, drafts of papers, plans for future research, peer-review assessments, communications with colleagues, materials that must remain confidential until they are published, and information whose release would result in an invasion of personal privacy (for example, information that could be used to identify a particular person who was one of the subjects of a research study).
  o Many variables govern what constitutes “data” and the management of data, and each discipline has its own culture regarding data…”
o Contents of the DMP
  o "Expected data. The DMP should describe the types of data, samples, physical collections, software, curriculum materials, or other materials to be produced in the course of the project.
    It should then describe the expected types of data to be retained…"
  o "Data formats and dissemination. The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others. Policies for public access and sharing should be described, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, and other rights or requirements…"
  o "Final Performance Reports. Final performance reports are required for all NEH awards. The final performance report must discuss the execution and any updating of the original DMP. This discussion should describe
    o data produced during the grant period;
    o data to be retained after the grant period expires;
    o verification that data will be available for sharing;
    o discussion of community standards for data format;
    o the plan to disseminate the data;
    o the format that will be used to make data available to others, including any metadata; and
    o the archival location of data."

o NASA Earth Science Data & Information Policy (http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/)
  o "…NASA has adopted the following data policy (in this context the term 'data' includes observation data, metadata, products, information, algorithms, including scientific source code, documentation, models, images, and research results):
    o NASA will plan and follow data acquisition policies that ensure the collection of long-term data sets needed to satisfy the research requirements of NASA's Earth science program.
    o NASA commits to the full and open sharing of Earth science data obtained from NASA Earth observing satellites, sub-orbital platforms and field campaigns with all users as soon as such data become available.
    o There will be no period of exclusive access to NASA Earth science data...
    o All NASA Earth science missions, projects, and grants and cooperative agreements shall include data management plans to facilitate the implementation of these data principles.
    o [More on data access…]
    o NASA will make available all NASA-generated standard products along with the source code for algorithm software, coefficients, and ancillary data used to generate these products. … Data archives will include easily accessible information about the data holdings, including quality assessments, supporting relevant information, and guidance for locating and obtaining data.
    o [More on partnerships and metrics…]
o NASA Guidebook for Proposers Responding to a NASA Research Announcement (NRA) or Cooperative Agreement Notice (CAN) (http://www.hq.nasa.gov/office/procurement/nraguidebook/proposer2010.pdf)

o USDA Scientific Integrity Policy Handbook (Guidance for Implementation of DR 1074-001), July 10, 2013 (http://www.usda.gov/documents/usda-scientific-integrity-policy-handbook.pdf)
o USDA Forest Service
  o Forest Inventory & Analysis and National Research Data Archive (http://www.fs.fed.us/research/products/data/)
  o The Forest Service Metadata Users Guide (http://www.fs.fed.us/gac/metadata/index.html)
    o "The steps to get from the REAL WORLD to a GIS product are detailed and many. With each step, information must be gathered in the form of METADATA or information about data. Metadata describes the overall history of our data, including content, quality, condition, and other characteristics..."
    o 6 steps to create metadata: Gathering Metadata, Preparation, Creating FGDC Metadata, Publishing, Using Metadata, Maintaining Metadata (http://www.fs.fed.us/gac/metadata/step1.html)

o USGS Data Management (http://www.usgs.gov/datamanagement/index.php)
  o How do I Create Metadata? (http://www.usgs.gov/datamanagement/describe/metadata.php)
  o "Metadata describe information about a dataset, such that a dataset can be understood, re-used, and integrated with other datasets.
    Information described in a metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Metadata generally follow a standard format, making it easier to compare datasets and to transfer files electronically."
  o Standard: "Make sure your record is compliant with FGDC standards."
  o Validation tool: Metadata Parser (http://geo-nsdi.er.usgs.gov/validation/)
  o Tools: Online Metadata Editor (OME); Metavist; FGDC Metadata Editor for ArcGIS 10; Morpho…
  o USGS Core Science Metadata Clearinghouse (http://mercury.ornl.gov/clearinghouse)

o Research data can be generated for different purposes and through different processes. In general, it can include the following types of data:
  o Observational: data captured in real time, usually irreplaceable. For example, sensor data, survey data, sample data, neuroimages.
  o Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
  o Simulation: data generated from test models where model and metadata are more important than output data. For example, climate models, economic models.
  o Derived or compiled: data is reproducible but expensive. For example, text and data mining, compiled database, 3D models.
  o Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.

o Text - flat text files, Word, PDF, RTF, XML.
o Numerical - Statistical Package for the Social Sciences (SPSS), Stata, Excel.
o Multimedia - jpeg, tiff, dicom, mpeg, quicktime.
o Models - 3D, statistical.
o Software - Java, C programs.
o Discipline specific - Flexible Image Transport System (FITS) in astronomy, Crystallographic Information File (CIF) in chemistry.
o Instrument specific - Olympus Confocal Microscope Data Format, Carl Zeiss Digital Microscopic Image Format (ZVI).

o DOE generates scientific research data in many forms, both text and non-text. Much of the Department's text-based R&D results are readily available via OSTI databases. OSTI has broadened efforts to make non-text scientific and technical information (STI) available as well, providing access to underlying non-text data such as numeric files, computer simulations and interactive maps, as well as multimedia and scientific images.
  - Department of Energy (DOE). http://www.osti.gov/data/index.shtml

o Documents (text, Word), spreadsheets
o Laboratory notebooks, field notebooks, diaries
o Questionnaires, transcripts, codebooks
o Audiotapes, videotapes
o Photographs, films
o Test responses
o Slides, artifacts, specimens, samples
o Collection of digital objects acquired and generated during the process of research
o Data files
o Database contents (video, audio, text, images)
o Models, algorithms, scripts
o Contents of an application (input, output, log files for analysis software, simulation software, schemas)
o Methodologies and workflows
o Standard operating procedures and protocols

Other research records:
o Correspondence
o Project files
o Grant applications
o Ethics applications
o Technical reports
o Research reports
o Master lists
o Signed consent forms

o To keep your data easy to understand and analyze throughout the research lifecycle and in the long term, it is good practice to document your data. Data documentation is part of the data curation process.
o Research data can be documented at various levels: project level, file or database level, and variable or item level.
o Documentation is meant to be read by humans; some metadata is designed more for machine processing than for human readability.
o Documentation and metadata are different things. However, metadata can be regarded as a type of documentation.
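The three documentation levels named above (project, file or database, variable or item) can be sketched as a small data model that renders a plain-text README. This is only an illustration: the class names, the `render_readme` helper, and every value in the example are hypothetical placeholders, not part of any standard or of the UCF service.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VariableDoc:
    """Variable or item level: what one column or field means."""
    name: str
    label: str
    units: str = ""
    codes: Dict[int, str] = field(default_factory=dict)  # e.g. {1: "Yes"}


@dataclass
class FileDoc:
    """File or database level: what one file contains."""
    filename: str
    file_format: str
    description: str
    variables: List[VariableDoc] = field(default_factory=list)


@dataclass
class ProjectDoc:
    """Project level: who collected the data and how."""
    title: str
    creators: List[str]
    methodology: str
    files: List[FileDoc] = field(default_factory=list)


def render_readme(project: ProjectDoc) -> str:
    """Render all three levels as an indented, human-readable text block."""
    lines = [
        f"PROJECT: {project.title}",
        f"Creators: {'; '.join(project.creators)}",
        f"Methodology: {project.methodology}",
    ]
    for data_file in project.files:
        lines.append(
            f"  FILE: {data_file.filename} ({data_file.file_format}) - "
            f"{data_file.description}"
        )
        for var in data_file.variables:
            unit = f" [{var.units}]" if var.units else ""
            lines.append(f"    VARIABLE: {var.name}{unit} - {var.label}")
            for code, meaning in sorted(var.codes.items()):
                lines.append(f"      {code} = {meaning}")
    return "\n".join(lines)


# Placeholder values for illustration only; not a real dataset.
project = ProjectDoc(
    title="Example survey project",
    creators=["Researcher, A."],
    methodology="Placeholder: describe sampling and collection here",
    files=[
        FileDoc(
            filename="responses.csv",
            file_format="CSV",
            description="One row per respondent",
            variables=[
                VariableDoc("q1", "Respondent agrees (placeholder item)",
                            codes={1: "Yes", 2: "No"}),
                VariableDoc("height", "Height", units="cm"),
            ],
        )
    ],
)
readme = render_readme(project)
print(readme)
```

Keeping the levels as separate records means the same information can later be mapped to a formal metadata schema instead of being retyped.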
o This session explains an important metadata element, the identifier, including identifiers for researchers and for datasets; it covers some dataset record examples, their related standards, and data repositories.
o More metadata information will be covered in the Metadata Services session.

o ResearcherID (www.researcherid.com)
o ResearcherID provides a solution to the author ambiguity problem within the scholarly research community.
o Each member is assigned a unique identifier to enable researchers to:
  o Manage their publication lists;
  o Track their times cited counts and h-index;
  o Identify potential collaborators and avoid author misidentification.
o ResearcherID information integrates with Thomson Reuters’ Web of Knowledge and is ORCID compliant, allowing you to:
  o Claim and showcase your publications from a single account;
  o Search the registry to find collaborators;
  o Review publication lists and explore how research is used around the world.
o Penny Beile
  o ResearcherID: I-5179-2013
  o Scopus Author ID: 6508098666
  http://www.researcherid.com/rid/I-5179-2013

o Scopus is the largest abstract and citation database of peer-reviewed literature and features smart tools to track, analyze and visualize research.
o Scopus Author IDs are system generated.
o Penny Beile
  o ResearcherID: I-5179-2013
  o Scopus Author ID: 6508098666
  o ORCID: 0000-0003-3502-4865
  http://www.scopus.com/authid/detail.url?authorId=6508098666

o Sai Deng
  o RID: C-3066-2013
  o Scopus Author ID: 23979305900
  o ORCID: 0000-0002-3681-4888
o Exchange Data with ORCID
o ORCID: Combination of ResearcherID (Thomson Reuters) and Contributor ID (CrossRef);
o Independent of publishers.
  http://orcid.org/0000-0002-3681-4888
o Penny Beile
  o ResearcherID: I-5179-2013
  o Scopus Author ID: 6508098666
  o ORCID: 0000-0003-3502-4865
  http://orcid.org/0000-0003-3502-4865

o Digital Object Identifier (DOI)
  o e.g. http://dx.doi.org/10.3886/ICPSR20363.v1
o Archival Resource Keys (ARKs)
  o e.g. http://ark.cdlib.org/ark:/13030/tf5p30086k?
o Handles
  o e.g. http://soar.wichita.edu/handle/10057/3031
o Persistent URLs (PURLs)
o All can be resolved to an internet location.

o Digital Object Identifier (DOI): an identifier scheme administered by the International DOI Foundation. It is built on the Handle System.
o Example: Dataset: Experience of Violence in the Lives of Homeless Persons: The Florida Four City Study, 2003-2004 (ICPSR 20363)
  http://dx.doi.org/10.3886/ICPSR20363.v1

  http://dx.doi.org/    10.3886                   ICPSR20363.v1
  resolver service      prefix (assigning body)   suffix (resource)

o DataCite: A global citation framework for data, with member institutions offering services and advice to researchers.
o Individuals wishing to register a DOI for their dataset normally do so via their data repository, rather than directly through DataCite.
o Any repository wishing to register DOIs needs to obtain a username and password from DataCite to gain access to the registration service.
o Alternatively, the organization can manage its DOIs through a third-party service such as EZID. (http://www.dcc.ac.uk/resources/how-guides/cite-datasets#x1-17000)
o DataCite Membership
  o Current US members: CDL, Purdue, Harvard, IEEE, ICPSR, Dept. of Energy, Microsoft Research
  o Affiliated membership fee: 1700 p.a.
  o An institution does not need to be a member; it can work with a DataCite member or a third-party service to get DOIs for datasets.

o ICPSR (Inter-university Consortium for Political and Social Research): an associate member of DataCite.
o ICPSR’s “How to prepare citation”:
  o Basic elements required in a citation:
    o Identifier
    o Creator
    o Title
    o Publisher
    o Publication Year
  o For example:
    o Wright, James D., Jana L. Jasinski, Elizabeth Mustaine, and Jennifer Wesely. Experience of Violence in the Lives of Homeless Persons: The Florida Four City Study, 2003-2004. ICPSR20363-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-11-22.
      doi:10.3886/ICPSR20363.v1
  o Persistent URL: http://dx.doi.org/10.3886/ICPSR20363.v1
  o Can be exported as RIS (generic format for RefWorks, EndNote, etc.) or EndNote XML (EndNote X4.0.1 or higher)

o DataCite Metadata Schema 3.0 (released 2013-07-24; preferred) (http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.0.pdf)
  FIELDS: resource, creator, title, publisher, publicationYear, subject, date, resourceType, alternativeIdentifier, version, description …
  http://www.icpsr.umich.edu/icpsrweb/ICPSR/datacite/studies/20363

o Social Science Dataset
o Humanities Dataset
o Biological Science Dataset
o Biotechnology Dataset
o Geospatial and Earth Science Dataset

ICPSR: Inter-university Consortium for Political and Social Research.
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels:
  Title
  Principal investigator(s)
  Summary
  Access notes
  Dataset(s)
  Study description
    Citation
    Funding
    Scope of study
      • Subject terms
      • Smallest geographic unit
      • Geographic coverage
      • Time period
      • Date of collection
      • Unit of observation
      • Universe
      • Data types
      • Data collection notes
    Methodology
      • Study purpose
      • Study design
      • Sample
      • Mode of data collection
      • Description of variables
      • Response rates
      • Presence of common scales
      • Extent of processing
    Version(s)
  Related publications
  Variables
  Utilities
    • Metadata exports
    • Download statistics

o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at: http://www.ddialliance.org/
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repository:
  o ICPSR
  o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
  o UCF is a member of ICPSR.
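The DOI anatomy (resolver service, prefix, suffix) and the five basic citation elements (Identifier, Creator, Title, Publisher, Publication Year) can be sketched in a few lines of Python. The function and class names below are hypothetical, and the citation layout is just one simple arrangement of the five elements; it is not presented as ICPSR's or DataCite's official style, so check your repository's preferred format.

```python
from dataclasses import dataclass
from typing import List, Tuple

RESOLVER = "http://dx.doi.org/"  # resolver service used in the slides


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI (bare, 'doi:'-prefixed, or resolver URL) into
    (prefix, suffix). The prefix identifies the assigning body and the
    suffix identifies the resource."""
    for lead in (RESOLVER, "doi:"):
        if doi.startswith(lead):
            doi = doi[len(lead):]
    prefix, slash, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return prefix, suffix


@dataclass
class DatasetCitation:
    """Holds the five basic citation elements."""
    creators: List[str]
    title: str
    publisher: str
    publication_year: int
    identifier: str  # a DOI in any form accepted by split_doi

    def format(self) -> str:
        # Illustrative layout only: Creator (Year): Title. Publisher. DOI
        prefix, suffix = split_doi(self.identifier)
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.publication_year}): {self.title}. "
                f"{self.publisher}. doi:{prefix}/{suffix}")


citation = DatasetCitation(
    creators=["Wright, James D.", "Jasinski, Jana L.",
              "Mustaine, Elizabeth", "Wesely, Jennifer"],
    title=("Experience of Violence in the Lives of Homeless Persons: "
           "The Florida Four City Study, 2003-2004"),
    publisher="Inter-university Consortium for Political and Social Research",
    publication_year=2010,
    identifier="http://dx.doi.org/10.3886/ICPSR20363.v1",
)
print(split_doi(citation.identifier))
print(citation.format())
```

Storing the identifier once and deriving both the resolver URL and the `doi:` form from it avoids the two drifting apart in a record.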
docDscr (Document Description)
  The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
  Included fields: citation (titlStmt, prodStmt, verStmt, holdings)
  http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363

stdyDscr (Study Description)
  The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
  Included fields:
    citation (titlStmt, rspStmt, prodStmt, fundAg, grantNo, distStmt, biblCit, holdings)
    stdyInfo (subject, abstract, sumDscr)
    method (dataColl, notes, anlyInfo)
    dataAccs (setAvail, useStmt)

fileDscr (Data Files Description)
  Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
  Included fields: fileDscr, fileTxt, fileName

OAI_DC
  OAI DC is an XML format for the serialisation of Simple Dublin Core metadata descriptions. It is used within the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH requires that data providers support the oai_dc metadata format.
  Fields: oai_dc:dc, dc:title, dc:creator, dc:subject, dc:description, dc:date, dc:type, dc:identifier, dc:coverage, dc:rights …
  http://www.icpsr.umich.edu/icpsrweb/ICPSR/dc/studies/20363

MARC21 XML
  MARC21 encoded in XML.
  Fields: collection, record, leader, controlfield, datafield, subfield
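As a rough illustration of the oai_dc serialisation described above, the following sketch builds a Simple Dublin Core record with Python's standard `xml.etree.ElementTree`. The `build_oai_dc` helper is hypothetical, the field values are taken from the ICPSR 20363 citation, and the output is not claimed to match the record ICPSR actually serves (a harvested record would normally also carry schema-location attributes, omitted here).

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the oai_dc container and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(fields):
    """Build an <oai_dc:dc> element from (name, value) pairs.

    Elements may repeat (e.g. several dc:creator entries), so the input
    is a list of pairs rather than a dict."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = build_oai_dc([
    ("title", "Experience of Violence in the Lives of Homeless Persons: "
              "The Florida Four City Study, 2003-2004"),
    ("creator", "Wright, James D."),
    ("creator", "Jasinski, Jana L."),
    ("creator", "Mustaine, Elizabeth"),
    ("creator", "Wesely, Jennifer"),
    ("date", "2010-11-22"),
    ("identifier", "doi:10.3886/ICPSR20363.v1"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Passing the fields as ordered pairs keeps repeated elements in the order given, which matters for creator lists.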
MARC21 XML example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/marc/studies/20363

o Charles Brockden Brown Electronic Archive and Scholarly Edition
  “The Charles Brockden Brown Electronic Archive and Scholarly Edition is an editorial collective that is preparing an MLA-CSE approved print edition of Charles Brockden Brown's (1771-1810) personal letters, poetry, and selected periodical writings to be published by Bucknell University Press (7 volumes). A searchable archive of all of Brown's works (unedited) is also being developed.” - From project website.
  o This archive uses “The Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange” (TEI P5, http://www.tei-c.org/Guidelines/P5/) for text encoding;
  o It uses eXtensible Text Framework (XTF) to index bibliographical and primary-document materials.
  o All information is from the project website: http://www.brockdenbrown.ucf.edu
o Charles Brockden Brown Electronic Archive and Scholarly Edition
  * Shown here is the document display front end. TEI XML files do not appear to be available for download.
  http://www.brockdenbrown.cah.ucf.edu/xtf3/search?browsetitle=ss;sort=title
  http://www.brockdenbrown.cah.ucf.edu/xtf3/view?docId=179806167.xml&chunk.id=d1084e134&toc.id=&brand=default

ENRICH: European Networking Resources and Information concerning Cultural Heritage.
• Below is the overall structure of an ENRICH-conformant XML document.
• Examples from “The ENRICH Schema — A Reference Guide,” a conformant subset of Release 1.4 of TEI P5.
  http://projects.oucs.ox.ac.uk/ENRICH/Deliverables/referenceManual_en.html

Overall structure:

  <TEI>
    <teiHeader>
      <!-- ... metadata describing the manuscript -->
    </teiHeader>
    <facsimile>
      <!-- ... metadata describing the digital images -->
    </facsimile>
    <text>
      <!-- (optional) transcription of the manuscript -->
    </text>
  </TEI>

teiHeader (TEI header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.

The minimal required structure for teiHeader:

  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[Title of manuscript]</title>
      </titleStmt>
      <publicationStmt>
        <distributor>[name of data provider]</distributor>
        <idno>[project-specific identifier]</idno>
      </publicationStmt>
      <sourceDesc>
        <msDesc xml:id="ex5" xml:lang="en">
          <!-- [full manuscript description ]-->
        </msDesc>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2008-01-01">
        <!-- [revision information] -->
      </change>
    </revisionDesc>
  </teiHeader>

msDesc (manuscript description) provides detailed information about a single manuscript. (http://projects.oucs.ox.ac.uk/ENRICH/Deliverables/referenceManual_en.html)

  <msDesc xml:id="ex1" xml:lang="en">
    <msIdentifier>
      <settlement>Oxford</settlement>
      <repository>Bodleian Library</repository>
      <idno>MS. Add. A. 61</idno>
      <altIdentifier type="former">
        <idno>28843</idno>
      </altIdentifier>
    </msIdentifier>
    <msContents>
      <p> <quote xml:lang="lat">Hic incipit Bruitus Anglie,</quote> the <title xml:lang="lat">De origine et gestis Regum Angliae</title> of Geoffrey of Monmouth (Galfridus Monumetensis): beg. <quote xml:lang="lat">Cum mecum multa &amp; de multis.</quote> In Latin.</p>
    </msContents>
    <physDesc>
      <p> <material>Parchment</material>: written in more than one hand: 7¼ x 5⅜ in., i + 55 leaves, in double columns: with a few coloured capitals.</p>
    </physDesc>
    <history>
      <p>Written in <origPlace>England</origPlace> in the <origDate>13th cent.</origDate> On fol. 54v very faint is <quote xml:lang="lat">Iste liber est fratris guillelmi de buria de ... Roberti ordinis fratrum Pred[icatorum],</quote> 14th cent. (?): <quote>hanauilla</quote> is written at the foot of the page (15th cent.). Bought from the rev. W. D.
      Macray on March 17, 1863, for £1 10s.</p>
    </history>
  </msDesc>

Examples from ENRICH
Fields: msDesc, msIdentifier, settlement, repository, idno, altIdentifier, msContents, p, quote, title, physDesc, p, material, history, p, origPlace, origDate, quote

The official TEI P5 guideline is at: http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf
More TEI projects and examples are available at the TEI website: http://www.tei-c.org/Activities/Projects/

dc.contributor.author      Crawford, Nicholas G.
dc.contributor.author      Faircloth, Brant C.
dc.contributor.author      McCormack, John E.
dc.contributor.author      Brumfield, Robb T.
dc.contributor.author      Winker, Kevin
dc.contributor.author      Glenn, Travis C.
dc.date.accessioned        2012-05-18T15:48:08Z
dc.date.available          2012-05-18T15:48:08Z
dc.date.issued             2012-05-16
dc.identifier              doi:10.5061/dryad.75nv22qj
dc.identifier.citation     Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC (2012) More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biology Letters 8(5): 783-786.
dc.identifier.uri          http://hdl.handle.net/10255/dryad.38214
dc.description             We present the first genomic-scale analysis addressing the phylogenetic position of turtles, using over 1,000 loci from representatives of all major reptile lineages including tuatara…
dc.relation.haspart        doi:10.5061/dryad.75nv22qj/1
dc.relation.haspart        doi:10.5061/dryad.75nv22qj/2
dc.relation.haspart        …

Dryad (https://datadryad.org/)
o A digital repository for data underlying the international scientific and medical literature. This is an example of the full metadata view.
http://www.datadryad.org/handle/10255/dryad.38214?show=full

dc.relation.isreferencedby         doi:10.1098/rsbl.2012.0331
dc.relation.isreferencedby         PMID:22593086
dc.subject                         ultraconserved elements
dc.subject                         phylogenomic
dc.subject                         phylogenetics
dc.subject                         reptiles
dc.subject                         turtles
dc.subject                         evolution
dc.subject                         archosaurs
dc.title                           Data from: More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs
dc.type                            Article
dwc.ScientificName                 Pantherophis guttata
dwc.ScientificName                 Pelomedusa subrufa
dwc.ScientificName                 Chrysemys picta
dwc.ScientificName                 Alligator mississippiensis
dwc.ScientificName                 Crocodylus porosus
dwc.ScientificName                 Sphenodon tuatara
dwc.ScientificName                 Gallus gallus
dwc.ScientificName                 Taeniopygia guttata
dwc.ScientificName                 Anolis carolinensis
dwc.ScientificName                 Homo sapiens
dc.contributor.correspondingAuthor Faircloth, Brant C.
prism.publicationName              Biology Letters

Dryad (https://datadryad.org/)
o It is built upon the open-source DSpace repository software;
o It utilizes a combination of Dublin Core (DC) and Darwin Core (DwC) metadata standards.
o Digital Object Identifiers (DOIs) are provided by DataCite through EZID.

Files in this package: Title, Downloaded, Description, Download, Details …
o Clicking View File Details displays: Simple View
o Fees: Starting Sept. 1, 2013, Dryad will charge fees upon submission. The submitter is asked to pay this fee at the time of submission unless:
  o the associated journal, or another organization, has already contracted with Dryad to cover the submission fee, or
  o the submitter is based in a country classified by the World Bank as a low-income or lower-middle-income economy.
o Additional submission fees will apply to data packages in excess of 10 GB and from journals without integrated submission.
o Pricing Plan Comparison Tool (http://www.datadryad.org/pages/pricing)

DSpace: A sample herbarium record from “Virtual Herbarium Collection” in Shocker Open Access Repository (SOAR)
dc.contributor.author Pluenneke, David [Collector] en_US
dc.contributor.author Pluenneke, David [Cataloger] en_US
dc.date.accessioned 2010-09-14T19:48:27Z
dc.date.available 2010-09-14T19:48:27Z
dc.date.created 1987-07-15 en_US
dc.date.issued 1987-07-15 en_US
dc.identifier 3647 en_US
dc.identifier.uri http://library.wichita.edu/techserv/herbarium/browse.asp?id=3647 en_US
dc.identifier.uri http://hdl.handle.net/10057/3031
dc.format.extent 1034821 bytes
dc.format.mimetype image/jpeg
dc.language.iso en_US
dc.publisher Wichita State University. Dept. of Biological Sciences en_US
dc.relation U.S. Dept. of Agriculture Natural Resources Conservation Service en_US
dc.relation.uri http://plants.usda.gov/ en_US
dc.rights Copyright Wichita State University, 2010 en_US
dc.source WSU herbarium en_US
dc.subject Asteraceae en_US
dc.subject Achillea lanulosa Nutt. en_US
dc.subject Yarrow en_US
dc.subject United States -- Colorado -- Teller Co. en_US
dc.title Achillea lanulosa Nutt. [Taxon] en_US
dc.title.alternative Yarrow en_US
dc.type Image en_US
dc.coverage.spacial North America en_US
dc.coverage.spacial United States en_US
dc.coverage.spacial Colorado en_US
dc.coverage.spacial Teller Co. en_US
dc.coverage.spacial NW1/4, SE1/4, Sec 32, T13S, R69W; Woodland Pk Quad en_US
dwc.family Asteraceae en_US
dwc.verbatimElevation 11600' en_US
dwc.habitat Alpine en_US

Files in this item
Name: HERBARIUM_58.jpg
Size: 1010.Kb
Format: image/jpeg
View/Open View in Browser
http://soar.wichita.edu/handle/10057/3031?show=full

National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)

GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2011 Jan;39(Database issue):D32-7).
http://www.ncbi.nlm.nih.gov/nucleotide?cmd=Retrieve&dopt=GenBank&list_uids=1293613
FIELDS: LOCUS, DEFINITION, ACCESSION, NID, KEYWORD, SOURCE, REFERENCE, FEATURES, BASE COUNT, ORIGIN …

PubChem
The leading freely available, small compound database, part of the National Center for Biotechnology Information/National Library of Medicine's Entrez suite of databases.
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5760

MeSH (Medical Subject Headings) is the NLM controlled vocabulary thesaurus used for indexing articles for PubMed.

Geospatial Platform: An Internet-based capability providing shared and trusted geospatial data, services, and applications for use by the public and by government agencies and partners to meet their mission needs.
o Web display: Data and Resources
   Web Page
   XML File
   Web Page
   …
   Metadata Source
   ISO-19139 Metadata
   Original FGDC Metadata
http://www.geoplatform.gov/node/243/bf5a5c64-085e-4c68-a489-93e8608d3ad1

Content Standard for Digital Geospatial Metadata (CSDGM) (http://www.fgdc.gov/metadata/geospatial-metadata-standards)
It is maintained by the Federal Geographic Data Committee (FGDC). Often referred to as the “FGDC Metadata Standard.”

Biological data of field activity 08CRD01 (B-1-08-VI) in U.S. Virgin Islands from 05/30/2008 to 06/13/2008
Metadata
File Identifier:
Metadata Language: eng; USA: utf8
FGDC/CSDGM Metadata
Resource Type: Dataset
Responsible Party:
   Individual Name: Clint Steele <http://walrus.wr.usgs.gov/staff/csteele.html>
   Organisation Name: U.S.
Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
   Position Name: InfoBank Group Leader <http://walrus.wr.usgs.gov/staff/csteele.html>
   Role: Point Of Contact
   Contact Info: …
Metadata Date: 2013-03-03
Metadata Standard Name: ISO 19115-2 Geographic Information - Metadata - Part 2: Extensions for Imagery and Gridded Data
Metadata Standard Version: ISO 19115-2:2009(E)
http://catalog.data.gov/harvest/object/dfe4a33f-0fd2-4135-ae33-a1c526bd7a73/html

Data Identification
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
FGDC/CSDGM
Language: eng; USA
Metadata Citation:
   Title: Biological data of field activity 08CRD01 (B-1-08-VI) in U.S. Virgin Islands from 05/30/2008 to 06/13/2008
   Date:
      Date: 2013-03-03
      Date Type: Publication Date
   Organisation Name: U.S. Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
   Role: Publisher
   Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category:
Keyword Collection:
   Keyword: EARTH SCIENCE > OCEANS
   Associated Thesaurus: Global Change Master Directory (GCMD)
   Keyword: Marine Geology
   Associated Thesaurus: USGS CMG InfoBank
Spatial Extent:
   West Bounding Longitude: -65.75000
   East Bounding Longitude: -63.25000
   North Bounding Latitude: 18.75000
   South Bounding Latitude: 17.25000
Constraints: Please recognize the U.S. Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints:
   Use Constraints: Other Restrictions
   Other Constraints: Use Constraints: Please recognize the U.S. Geological Survey (USGS) as the source of this information.
Physical materials are under controlled on-site access… …
FGDC/CSDGM
Distribution Metadata
Distribution Format:
   Format Name: ASCII
   Format Version:
   File Decompression Technique: No compression applied
Transfer Options:
   URL: http://walrus.wr.usgs.gov/infobank/b/b108vi/html/b-1-08-vi.nav.html
Distributor:
   Distributor Contact: …
Quality Scope: Dataset

Content Standard for Digital Geospatial Metadata (CSDGM)
Top level elements:
o idinfo: Identification Information;
o dataqual: Data Quality Information;
o spdoinfo: Spatial Data Organization Information;
o spref: Spatial Reference Information;
o eainfo: Entity and Attribute Information;
o distinfo: Distribution Information;
o metainfo: Metadata Reference Information.
CSDGM Fields (under idinfo): Idinfo, Citation, citeinfo, Origin, Pubdate, Title, Pubinfo, Onlink, Descript, Abstract, Purpose, Supplinf, Timeperd, Status, Spdom, Keywords, Accconst, Useconst, Ptcontac, Native, Crossref
Record in XML View
http://catalog.data.gov/harvest/object/dfe4a33f-0fd2-4135-ae33-a1c526bd7a73/original

NASA Atmospheric Science Data Center (ASDC)
Labels: Summary; Related URL; Geographic Coverage; Spatial coordinates; Temporal Coverage; …
http://gcmd.gsfc.nasa.gov/KeywordSearch/Metadata.do?Portal=langley&KeywordPath=Parameters%7CATMOSPHERE%7CAIR+QUALITY%7CCARBON+MONOXIDE&OrigMetadataNode=GCMD&EntryId=MOP034&MetadataView=Full&MetadataType=0&lbnode=mdlb1
Labels: Location Keywords; Science Keywords; ISO Topic category; Platform; Instrument; Project; Ancillary Keywords; Data Set Progress; Data Center; Personnel; Extended Metadata Properties; Creation and Review Dates; …

Directory Interchange Format (DIF): a descriptive and standardized format for exchanging information about scientific data sets.
The DIF Writer’s Guide: http://gcmd.gsfc.nasa.gov/User/difguide/difman.html.
Origin: DIF was the product of an Earth Science and Applications Data Systems Workshop (ESADS) held February 24-26, 1987 on catalog interoperability (CI). (http://gcmd.gsfc.nasa.gov/add/difguide/whatisadif.html)

Dublin Core Metadata Standard and the corresponding DIF fields
o Title
   Entry_Title
o Creator
   Data_Set_Citation: Dataset_Creator
   Personnel: Role: Investigator: Last_Name
   Personnel: Role: Investigator: First_Name
   Personnel: Role: Investigator: Middle_Name
o Subject and Keywords
   Keyword
   Parameters: Category
   Parameters: Topic
   Parameters: Term
   Parameters: Variable
   Parameters: Detailed_Variable
   Source_Name
   Sensor_Name
   Project
   Location
o Description
   Summary
o Publisher
   Data_Set_Citation: Dataset_Publisher
   Data_Center: Data_Center_Name
   Data_Center: Data_Center_URL
   Data_Center: Data Center Contact, Last_Name
   Data_Center: Data Center Contact, First_Name
   Data_Center: Data Center Contact, Middle_Name
o Contributor
   Personnel: Role:
   Personnel: Last_Name
   Personnel: First_Name
   Personnel: Middle_Name
o Date
   Data_Set_Citation: Dataset_Release_Date
o Resource Type
   Data_Set_Citation: Data_Presentation_Form
o Format
   Group: Distribution
   Distribution_Media
   Distribution_Size
   Distribution_Format
   Fees
o Resource Identifier
   Data Center: Data_Set_ID
   Data_Set_Citation: Online_Resource
   Related_URL: URL_Content_Type
   Related_URL: URL
o Source
   Related_URL: URL_Content_Type
   Related_URL: URL
   Source_Name
o Language
   Data_Set_Language
o Relation
   Parent_DIF
   Data_Set_Citation: Online_Resource
   Related_URL: URL_Content_Type
   Related_URL: URL
   Reference
o Coverage
   Location
   Spatial_Coverage: Southernmost_Latitude
   Spatial_Coverage: Northernmost_Latitude
   Spatial_Coverage: Easternmost_Longitude
   Spatial_Coverage: Westernmost_Longitude
   Temporal_Coverage: Start_Date
   Temporal_Coverage: Stop_Date
   Paleo_Temporal_Coverage: Paleo_Start_Date
   Paleo_Temporal_Coverage: Paleo_Stop_Date
   Paleo_Temporal_Coverage: Chronostratigraphic_Unit
o Rights Management
   Use_Constraints
   Access_Constraints

National Space Science Data Center (NSSDC)
NASA's permanent archive for space science mission data.
“Many applications have been created to support the SPDF Common Data Format (CDF) data standard.”
Screen snap shot from the CDF Windows Imaging Tool (CWIT).
From Examples of CDF Applications (http://cdf.gsfc.nasa.gov/html/examples.html)

DataUp: An open source tool helping researchers document, manage, and archive their tabular data. DataUp operates within the scientist's workflow and integrates with Microsoft® Excel.
http://dataup.cdlib.org/

Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation.
http://www.colectica.com/software/colecticaforexcel

QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts.
https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp

OpenRefine (ex-Google Refine) is a powerful tool for working with messy data, cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
http://openrefine.org/

Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It's DDI compliant.
http://www.nesstar.com/software/publisher.html

<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies including XML databases, XProc pipelines, and web services.
http://www.oxygenxml.com/

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath.
http://xml.ascc.net/resource/schematron/schematron.html

Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies.
http://www.altova.com/xmlspy.html

LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system by integrating with a lab’s data-producing instruments; researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact.
http://www.labtrove.org/

Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research.
https://kepler-project.org/

DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
https://dmp.cdlib.org/

DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation.
http://www.datacite.org

OpenDOAR: An authoritative worldwide directory of academic open access repositories.
http://www.opendoar.org/countrylist.php

DataBib: Databib is a community-driven, annotated bibliography of research data repositories.
http://databib.org/

Open Access Directory: Data Repositories. A list of repositories and databases for open data. It is part of the Open Access Directory maintained by Simmons College.
http://oad.simmons.edu/oadwiki/Data_repositories

CDL Curation and Publishing Services
http://www.cdlib.org
o Create, edit, share, and save data management plans
o Open source add-in for Microsoft Excel as a data collection tool
o Create and manage persistent identifiers
o Curation repository: store, manage, and share research data
o Open access scholarly publishing services: papers, journals, books, seminars & more
o An infrastructure to publish and get credit for sharing research data
Data Publication
* This slide is by Joan Starr, California Digital Library. http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1

Data Set Related Services
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf

o Data Set (Dataset) Metadata Service:
o Provides researchers consultation on:
   o Project and dataset documentation;
   o Acquiring DOIs for your datasets;
   o Metadata standards (Common and Domain Specific);
   o Metadata schemas customization;
   o Controlled vocabularies and thesauri;
   o Data curation tools and practices.
o Assists in describing basic properties of your data and enriching metadata for your datasets;
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets;
o Helps to prepare your metadata and data for deposit and preservation.
o Will work with the library Scholarly Communication team and subject librarians to:
   o Introduce ORCid to researchers;
   o Promote data curation tools;
   o Provide data repositories information.
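As a rough illustration of what "describing basic properties of your data" can look like in practice, the sketch below holds a few field/value pairs copied from the Dryad record shown earlier and checks them against a short checklist. The checklist (BASIC_FIELDS) and the helper names are assumptions made for this example only; they are not a Dryad, DataCite, or UCF requirement.

```python
from collections import defaultdict

# Field/value pairs copied from the Dryad record shown earlier in this deck.
RECORD = [
    ("dc.title", "Data from: More than 1000 ultraconserved elements provide "
                 "evidence that turtles are the sister group of archosaurs"),
    ("dc.contributor.author", "Crawford, Nicholas G."),
    ("dc.contributor.author", "Faircloth, Brant C."),
    ("dc.date.issued", "2012-05-16"),
    ("dc.identifier", "doi:10.5061/dryad.75nv22qj"),
    ("dc.subject", "turtles"),
    ("dc.subject", "phylogenetics"),
    ("dwc.ScientificName", "Chrysemys picta"),
]

# Hypothetical checklist of "basic properties"; an assumption for this sketch,
# not a rule taken from any repository or standard.
BASIC_FIELDS = ("dc.title", "dc.contributor.author",
                "dc.date.issued", "dc.identifier")


def group_by_field(pairs):
    """Collect repeatable fields (authors, subjects) into lists per field name."""
    grouped = defaultdict(list)
    for field, value in pairs:
        grouped[field].append(value)
    return dict(grouped)


def missing_basic_fields(grouped, required=BASIC_FIELDS):
    """Return the checklist fields that have no value in the record."""
    return [field for field in required if not grouped.get(field)]


def schemas_used(grouped):
    """Return the schema prefixes in use, e.g. 'dc' and 'dwc'."""
    return sorted({field.split(".", 1)[0] for field in grouped})


record = group_by_field(RECORD)
print("Missing basic fields:", missing_basic_fields(record))
print("Schemas used:", schemas_used(record))
```

Keeping repeatable fields as lists mirrors how the record examples repeat dc.contributor.author and dc.subject once per value, and the prefix check makes it visible when a record mixes Dublin Core with a domain schema such as Darwin Core.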
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o Metadata Services (http://library.ucf.edu/ScholarlyCommunication/Metadata.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata guide (http://guides.ucf.edu/metadata)
o UCF Library Digital Collections (http://library.ucf.edu/Systems/DigitalCollections/)
o Research and Information Services (http://library.ucf.edu/Reference/)
o Subject Librarians (http://library.ucf.edu/SubjectLibrarians/)
o More information on metadata standards, controlled vocabularies, data curation tools and repositories will be covered in the Metadata Services session.

Contact: Sai Deng, Metadata Librarian
sai.deng@ucf.edu
407-823-4312 (Office)
