DSpace Manual PDF
DSpace 4.x Documentation
Table of Contents
1 Introduction
1.1 Release Notes
1.1.1 4.8 Release Notes
1.1.2 4.7 Release Notes
1.1.3 4.6 Release Notes
1.1.4 4.5 Release Notes
1.1.5 4.4 Release Notes
1.1.6 4.3 Release Notes
1.1.7 4.2 Release Notes
1.1.8 4.1 Release Notes
1.1.9 4.0 Release Notes
1.2 Functional Overview
1.2.1 Online access to your digital assets
1.2.2 Metadata Management
1.2.3 Licensing
1.2.4 Persistent URLs and Identifiers
1.2.5 Getting content into DSpace
1.2.6 Getting content out of DSpace
1.2.7 User Management
1.2.8 Access Control
1.2.9 Usage Metrics
1.2.10 Digital Preservation
1.2.11 System Design
2 Installing DSpace
2.1 For the Impatient
2.2 Hardware Recommendations
2.3 Prerequisite Software
2.3.1 UNIX-like OS or Microsoft Windows
2.3.2 Oracle Java JDK 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 7
2.3.3 Apache Maven 3.x (Java build tool)
2.3.4 Apache Ant 1.8 or later (Java build tool)
2.3.5 Relational Database: (PostgreSQL or Oracle)
2.3.6 Servlet Engine (Apache Tomcat 7 or later, Jetty, Caucho Resin or equivalent)
2.3.7 Perl (only required for [dspace]/bin/dspace-info.pl)
2.4 Installation Instructions
2.4.1 Overview of Install Options
2.4.2 Overview of DSpace Directories
2.4.3 Installation
2.5 Advanced Installation
1 Introduction
DSpace is an open source software platform that enables organisations to:
capture and describe digital material using a submission workflow module, or a variety of programmatic
ingest options
distribute an organisation's digital assets over the web through a search and retrieval system
preserve digital assets over the long term
This system documentation includes a functional overview of the system, which is a good introduction to the
capabilities of the system, and should be readable by non-technical folk. Everyone should read this section first
because it introduces some terminology used throughout the rest of the documentation.
For people actually running a DSpace service, there is an installation guide, and sections on configuration and
the directory structure.
Finally, for those interested in the details of how DSpace works, and those potentially interested in modifying
the code for their own purposes, there is a detailed architecture and design section.
The DSpace Public API Javadocs. Build these with the command mvn javadoc:javadoc
The DSpace Wiki contains stacks of useful information about the DSpace platform and the work people
are doing with it. You are strongly encouraged to visit this site and add information about your own work.
Useful Wiki areas are:
A list of DSpace resources (Web sites, mailing lists etc.)
Technical FAQ
A list of projects using DSpace
Guidelines for contributing back to DSpace
www.dspace.org has announcements and contains useful information about bringing up an instance of
DSpace at your organization.
The DSpace Community List. Join DSpace-Community to ask questions or join discussions about non-
technical aspects of building and running a DSpace service. It is open to all DSpace users. Ask
questions, share news, and spark discussion about DSpace with people managing other DSpace sites.
Watch DSpace-Community for news of software releases, user conferences, and announcements from
the DSpace Federation.
The DSpace Technical List. DSpace developers help answer installation and technology questions,
share information and help each other solve technical problems through the DSpace-Tech mailing list.
Post questions or contribute your expertise to other developers working with the system.
The DSpace Development List. Join discussions among DSpace developers. The DSpace-Devel listserv
is for DSpace developers working on the DSpace platform to share ideas and discuss code changes to
the open source platform. Join other developers to shape the evolution of the DSpace software. The
DSpace community depends on its members to frame functional requirements and high-level
architecture, and to contribute programming, testing, and documentation to the project.
Additional support options are available in the DSpace Support Guide.
This documentation was produced with Confluence software. A PDF version was generated directly
from Confluence. An online, updated version of this 4.x Documentation is also available at:
https://wiki.duraspace.org/display/DSDOC4x
Welcome to Release 4.8, a security release for the DSpace 4.x platform. For information on upgrading to
DSpace 4, please see Upgrading DSpace.
DSpace 4.8 contains security fixes for both the XMLUI and JSPUI. To ensure your 4.x site is secure,
we highly recommend ALL DSpace 4.x users upgrade to DSpace 4.8.
DSpace 4.8 is a security fix release to resolve issues located in DSpace 4.x XMLUI and JSPUI. As it only
provides security and bug fixes, DSpace 4.8 should constitute an easy upgrade from DSpace 4.x for most
users. No additional configuration changes should be necessary when upgrading from DSpace 4.x to 4.8. It is
necessary to run a database update script. See Upgrading From 4.0 to 4.x for details.
This release addresses the following security issues discovered in DSpace 4.x and below:
In addition, this release fixes minor bugs in the 4.x releases. For more information, see the Changes in 4.x
page.
4.8 Acknowledgments
The 4.8 release was led by the Committers.
The following individuals provided code or bug fixes to the 4.8 release: Pascal-Nicolas Becker (pnbecker), Tim
Donohue (tdonohue), Samuel Cambien (samuelcambien), Jonas Van Goolen (Jonas VG (atmire)), Mark Wood
(mwood).
DSpace 4.7 contains a security fix for both the XMLUI and JSPUI. To ensure your 4.x site is secure, we
highly recommend ALL DSpace 4.x users upgrade to DSpace 4.7.
DSpace 4.7 is a security fix release to resolve an issue located in DSpace 4.x XMLUI and JSPUI. As it only
provides a security fix, DSpace 4.7 should constitute an easy upgrade from DSpace 4.x for most users. No
database changes or additional configuration changes should be necessary when upgrading from DSpace 4.x
to 4.7.
This release addresses the following security issues discovered in DSpace 4.x and below:
In addition, this release fixes minor bugs in the 4.x releases. For more information, see the Changes in 4.x
page.
4.7 Acknowledgments
The 4.7 release was led by Andrea Pascarelli (4Science) and the Committers.
The following individuals provided code or bug fixes to the 4.7 release: Pascal-Nicolas Becker (pnbecker),
Andrea Bollini (abollini), Andrea Pascarelli (lap82)
DSpace 4.6 contains a security fix for both the XMLUI and JSPUI. To ensure your 4.x site is secure, we
highly recommend ALL DSpace 4.x users upgrade to DSpace 4.6.
DSpace 4.6 is a security fix release to resolve an issue located in DSpace 4.x XMLUI and JSPUI. As it only
provides a security fix, DSpace 4.6 should constitute an easy upgrade from DSpace 4.x for most users. No
database changes or additional configuration changes should be necessary when upgrading from DSpace 4.x
to 4.6.
This release addresses the following security issues discovered in DSpace 4.x and below:
Security fix:
[MEDIUM SEVERITY] XML External Entity (XXE) vulnerability in pdfbox (DS-3309; requires a
JIRA account to access). Reported by Seth Robbins.
Other fixes:
It was not possible to send mail over an SSL-secured connection to the mail server (DS-2702).
In addition, this release fixes minor bugs in the 4.x releases. For more information, see the Changes in 4.x
page.
4.6 Acknowledgments
The 4.6 release was led by Andrea Pascarelli (4Science) and the Committers.
The following individuals provided code or bug fixes to the 4.6 release: Pascal-Nicolas Becker (pnbecker),
Andrea Bollini (abollini), Roeland Dillen (rradillen), Tim Donohue (tdonohue), Bram Luyten (bram-atmire),
Andrea Pascarelli (lap82), Mark Wood (mwoodiupui)
DSpace 4.5 contains security fixes for both the XMLUI and JSPUI. To ensure your 4.x site is secure,
we highly recommend ALL DSpace 4.x users upgrade to DSpace 4.5.
DSpace 4.5 is a security fix release to resolve several issues located in DSpace 4.x XMLUI and JSPUI. As it
only provides security fixes, DSpace 4.5 should constitute an easy upgrade from DSpace 4.x for most users. No
database changes or additional configuration changes should be necessary when upgrading from DSpace 4.x
to 4.5.
This release addresses the following security issues discovered in DSpace 4.x and below:
4.5 Acknowledgments
The 4.5 release was led by Tim Donohue (DuraSpace) and the Committers.
The following individuals provided code or bug fixes to the 4.5 release: Andrea Bollini (abollini), Tim Donohue
(tdonohue), Ivan Masar (helix84), Mark Wood (mwoodiupui)
DSpace 4.4 contains security fixes for the JSPUI only. To ensure your 4.x site is secure, we highly
recommend that DSpace 4.x JSPUI users upgrade to DSpace 4.4.
DSpace 4.4 is a security fix release to resolve several issues located in DSpace 4.x JSPUI. As it only provides
security fixes, DSpace 4.4 should constitute an easy upgrade from DSpace 4.x for most users. No database
changes or additional configuration changes should be necessary when upgrading from DSpace 4.x to 4.4.
This release addresses the following security issues discovered in DSpace 4.x and below:
4.4 Acknowledgments
The 4.4 release was led by Tim Donohue (DuraSpace), Andrea Schweer (U of Waikato) and the Committers.
The following individuals provided code or bug fixes to the 4.4 release: Pascal-Nicolas Becker (pnbecker), CTU
Developers (ctu-developers), Roeland Dillen (rradillen), Tim Donohue (tdonohue), Àlex Magaz Graça (rivaldi8),
Bram Luyten (bram-atmire), Ivan Masar (helix84), Christian Scheible (christian-scheible), Andrea Schweer
(aschweer), Jonas Van Goolen (jonas-atmire), Mark Wood (mwoodiupui)
DSpace 4.3 contains security fixes for both the XMLUI and JSPUI. To ensure your 4.x site is secure,
we highly recommend all DSpace 4.x users upgrade to DSpace 4.3.
We also highly recommend removing any "allowLinking=true" settings from your Tomcat's <Context>
configuration. Previously our installation documentation erroneously listed examples which included
"allowLinking=true", while the Tomcat documentation lists it as a possible security concern. The
XMLUI Directory Traversal Vulnerability (see below) is also exacerbated by this setting.
DSpace 4.3 is a security fix release to resolve several issues located in DSpace 4.x. As it only provides security
fixes, DSpace 4.3 should constitute an easy upgrade from DSpace 4.x for most users. No database changes or
additional configuration changes should be necessary when upgrading from DSpace 4.x to 4.3.
This release addresses the following security issues discovered in DSpace 4.x and below:
4.3 Acknowledgments
The 4.3 release was led by Tim Donohue (DuraSpace) and the Committers.
The following individuals provided code or bug fixes to the 4.3 release: Terry Brady (terrywbrady), Roeland
Dillen (rradillen), Tim Donohue (tdonohue), Ivan Masar (helix84), Andrea Pascarelli (lap82), Avi Romanoff
(aroman), Christian Scheible (christian-scheible), Robin Taylor (robintaylor)
Fixed occasional "Out of Memory" errors when indexing large bitstreams/files in Discovery (DS-1958)
Fixed issue where REST API was not releasing "context" and ignored database pooling (DS-1986)
Fixed Solr commit delays when "did you mean" functionality is enabled in Discovery (DS-2060)
Fixed the "dspace classpath" command (DS-1998)
Fixed issue where thumbnails were not displayed when using JSPUI + Oracle database (DS-2013)
Fixed validation of OAI-PMH response (DS-1928)
Fixed several Oracle database upgrade script errors (DS-2036, DS-2038, DS-2056, and DS-1957)
Fixed Maven build issue on Windows operating systems (DS-1940)
Other minor fixes. See Changes in 4.x section for a list of all fixes.
4.2 Acknowledgments
The 4.2 release was led by Robin Taylor (U of Edinburgh) and the Committers.
The following individuals provided code or bug fixes to the 4.2 release: Pascal-Nicolas Becker (pnbecker), Peter
Dietz (peterdietz), Roeland Dillen (rradillen), Tim Donohue (tdonohue), Denis Fdz, Keith Gilbertson (keithgee),
Panagiotis Koutsourakis (kutsurak), Bram Luyten (bram-atmire), Mini-Pillai, Ivan Masar (helix84), Thomas Misilo
(misilot), Hardy Pottinger (hardyoyo), Antoine Snyers (antoine-atmire), Robin Taylor (robintaylor), Kevin Van de
Velde (KevinVdV), Mark Wood (mwoodiupui)
Fixed issue where having a period (.) in your handle prefix generated incorrect identifiers (DS-1536)
Fixed broken quick build (from [dspace-src]/dspace) (DS-1867)
Fixed a crash of DSpace during CSV import via BTE (DS-1857)
Fixed collection harvesting to DSpace via ORE (DS-1848)
Fixed deposit of new items via SWORD (DS-1846)
Fixed search hit highlighting in XMLUI (DS-1907)
Fixed broken 'stat-initial' script (DS-1795)
Other minor fixes. See Changes in 4.x section for a list of all fixes.
4.1 Acknowledgments
The 4.1 release was led by Mark Wood (IUPUI) and the Committers.
The following individuals provided code or bug fixes to the 4.1 release: Andrea Bollini (abollini), Roeland Dillen
(rradillen), Tim Donohue (tdonohue), Àlex Magaz Graça (rivaldi8), Panagiotis Koutsourakis (kutsurak),
marsaoua, Ivan Masar (helix84), João Melo (lyncodev), Thomas Misilo (misilot), Andrea Pascarelli (lap82),
Adán Román Ruiz, Andrea Schweer (aschweer), Kim Shepherd (kshepherd), Kostas Stamatis (kstamatis),
Kevin Van de Velde (KevinVdV), Mark Wood (mwoodiupui)
DSpace 4.0 ships with a number of new features. Certain features are automatically enabled by default while
others require deliberate activation.
The following non-exhaustive list contains the major new features in 4.0 that are enabled by default:
Discovery: Search & Browse is now enabled by default in both XMLUI and JSPUI.
Note: The Lucene/DB-based search & browse backend is still supported, but is
deprecated and might be removed in a future release. Any new features should use the
Discovery API instead of tying directly to Lucene, Solr or Elasticsearch.
Contributors:
lap - Luigi Andrea Pascarelli with the support of CINECA
ab - Andrea Bollini with the support of CINECA
kv - Kevin Van de Velde with the support of @mire
im - Ivan Masár
A new Bootstrap-based default look and feel for JSPUI (see DS-1675 for screenshots)
Kindly contributed by Andrea Bollini & Luigi Andrea Pascarelli with the support of
CINECA
Kindly contributed by Keiji Suzuki & Luigi Andrea Pascarelli with the support of CINECA
some general bug fixes including: bitstream URL construction, config options,
context management and connection pool, ORIGINAL bundle problem (DS-1149)
proper METSDSpaceSIP support in both deposit and update
proper authentication for accessing actionable bitstreams (i.e. those that can be
replaced via SWORD), tightened security options around mediated actions, and
extra security on access to descriptive documents (deposit receipts,
statements)
more configuration options: bundles to expose in Statements, DepositMO
extensions (for individual files), and many more
some general refactoring
addition of 404 responses where necessary
better support for add/replace of metadata, and how metadata updates are
handled on archived items
update to latest version of Java Server library
new bitstream formats in the bitstream registry
Kindly contributed by Mark H. Wood with the support of IUPUI University Library
Kindly contributed by Ivan Masár and Terry Brady with the support of Georgetown
University
Filtering of web spiders from statistics can now match by the spider host's DNS name
or the spider's User-Agent string.
Kindly contributed by Mark H. Wood with the support of IUPUI University Library
Several improvements to help Google Scholar better index your content (requested by
Google Scholar team). See also Search Engine Optimization recommendations, for
ways to further enhance Google Scholar (and other search engine) findability.
DS-1482: Add a way for harvesters to find recently added items (request from Google)
Kindly contributed by several members of the DSpace Committer team (see individual
tickets for more details).
The following list contains all features that are included in the DSpace 4.0 release, but need to be enabled
manually.
Review the documentation for these features carefully, especially if you are upgrading from an older version
of DSpace.
DOI Support
Kindly contributed by Pascal-Nicolas Becker & Mark Wood with the support of TU Berlin
and IUPUI University Library
Kindly contributed by Pascal-Nicolas Becker, Andrea Bollini & Mark Wood with the
support of TU Berlin and CINECA
New feature:
Documentation
Kindly contributed by Elias Tzoc and James Russell with the support of Miami
University
Kindly contributed by Ivan Masár and Sam Ottenhoff of Longsight for Allegheny College
(DS-1078).
Kindly contributed by Jason Sherman with the support of University of Science and Arts
of Oklahoma
For items with restricted access, allows users to ask the original author for a
copy of the item
Original contribution of Adán Román Ruiz (Arvo Consultores). JSPUI version adapted
from the Universidade do Minho. XMLUI version funded by Instituto Oceanográfico de
España. Additional improvements/bug fixes by Andrea Bollini (CINECA).
A new REST web service API module based on Jersey (a JAX-RS 1.0 implementation)
(DS-1696)
Provides:
Kindly contributed by Peter Dietz with the support of Ohio State University Libraries
A full list of all changes / bug fixes in 4.x is available in the Changes in 4.x section.
The following individuals have contributed directly to this release of DSpace: Adan Roman, Alan Orth, Alexey
Maslov, Àlex Magaz Graça, Andrea Bollini, Andrea Schweer, Andrew Waterman, Anja Le Blanc, Bavo Van Geit
(@mire), Bram Luyten (@mire), Brian Freels-Stendel, Cedric Devaux, Christian Scheible, Christos
Rodosthenous, Claudia Jürgen, Clint Bellanger, Denis Fdz, DSpace @ Lyncode, Elias Tzoc, Fabio Bolognesi,
Hardy Pottinger, Hélder Silva, Hilton Gibson, Ian Boston, Ivan Masár, james bardin, James Halliday, Jason
Sherman, João Melo, Jonathan Blood, Jose Blanco, Juan Corrales Correyero, Keiji Suzuki, Kevin Van de
Velde, Kim Shepherd, Kostas Maistrelis, Kostas Stamatis, LifeH2O, Luigi Andrea Pascarelli, Marco Fabiani,
Marco Weiss, Marina Muilwijk, Mark Diggory, Mark H. Wood, Michael White, Moises A., Moises Alvarez,
Onivaldo Rosa Junior, Pascal-Nicolas Becker, Peter Dietz, Rania Stathopoulou, Raul Ruiz, Richard Jones,
Richard Rodgers, Robert Ruiz, Robin Taylor, Roeland Dillen, Samuel Ottenhoff, Sara Amato, Sean Carte,
Stuart Lewis, Terry Brady, Thomas Autry, Thomas Misilo, Tiago Murakami, Tim Donohue, Toni Prieto, usha
sharma, and others who reviewed and commented on their work. Many of these could not do this work without
the support (release time and financial) of their associated institutions. We offer thanks to those institutions for
supporting their staff to take time to contribute to the DSpace project.
A big thank you also goes out to the DSpace Community Advisory Team (DCAT), who helped the developers to
prioritize and plan out several of the new features that made it into this release. The current DCAT members
include: Amy Lana, Augustine Gitonga, Bram Luyten, Ciarán Walsh, Claire Bundy, Dibyendra Hyoju, Elena
Feinstein, Elin Stangeland, Iryna Kuchma, Jim Ottaviani, Leonie Hayes, Maureen Walsh, Michael Guthrie,
Sarah Molloy, Sarah Shreeves, Sue Kunda, Valorie Hollister and Yan Han.
We apologize to any contributor accidentally left off this list. DSpace has such a large, active development
community that we sometimes lose track of all our contributors. Our ongoing list of all known people/institutions
that have contributed to DSpace software can be found on our DSpace Contributors page. Acknowledgments to
those left off will be made in future releases.
Want to see your name appear in our list of contributors? All you have to do is report an issue, fix a bug,
improve our documentation or help us determine the necessary requirements for a new feature! Visit our Issue
Tracker to report a bug, or join the dspace-devel mailing list to take part in development work. If you'd like to help
improve our current documentation, please get in touch with one of our Committers with your ideas. You don't
even need to be a developer! Repository managers can also get involved by volunteering to join the DSpace
Community Advisory Team and helping our developers to plan new features.
The Release Team consisted of Mark H. Wood, Hardy Pottinger and Andrea Bollini.
Additional thanks to Tim Donohue from DuraSpace for keeping all of us focused on the work at hand, for
calming us when we got excited, and for the general support for the DSpace project.
Full-text search
DSpace can process uploaded text-based content for full-text searching. This means that not only the
metadata you provide for a given file will be searchable, but all of its contents will be indexed as well. This
allows users to search for specific keywords that only appear in the actual content and not in the provided
description.
Navigation
DSpace allows users to find their way to relevant content in a number of ways, including:
Another important mechanism for discovery in DSpace is the browse. This is the process whereby the user
views a particular index, such as the title index, and navigates around it in search of interesting items. The
browse subsystem provides a simple API for achieving this by allowing a caller to specify an index, and a
subsection of that index. The browse subsystem then discloses the portion of the index of interest. Indices that
may be browsed are item title, item issue date, item author, and subject terms. Additionally, the browse can be
limited to items within a particular collection or community.
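The browse call described above (an index, a starting point within it, and an optional collection or community scope) can be pictured with a small sketch. This is an illustration only and does not use the DSpace browse API: the function name `browse`, the tuple layout, and the sample titles are all assumptions made for the example.

```python
from bisect import bisect_left

def browse(index, start_value="", limit=3, scope=None):
    """Return up to `limit` entries of a sorted browse index.

    index: list of (sort_value, item_id, collection) tuples, sorted by sort_value.
    start_value: the browse starts at the first entry whose sort value is >= this.
    scope: optional collection name; when given, only that collection is browsed.
    """
    if scope is not None:
        index = [entry for entry in index if entry[2] == scope]
    keys = [entry[0] for entry in index]
    start = bisect_left(keys, start_value)
    return index[start:start + limit]

# Hypothetical title index; a real index would be maintained by the repository.
title_index = sorted([
    ("annual report 2012", "item-1", "Reports"),
    ("deep sea survey", "item-2", "Theses"),
    ("marine biology notes", "item-3", "Theses"),
    ("zoology primer", "item-4", "Reports"),
])

page = browse(title_index, start_value="d", limit=2)   # subsection starting at "d"
scoped = browse(title_index, scope="Reports")          # limited to one collection
```

The caller only names the index, the starting point and the scope; the browse function decides which portion of the index to disclose.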
Files that have been uploaded to DSpace are often referred to as "Bitstreams". The reason for this is mainly
historic and tracks back to the technical implementation. After ingestion, files in DSpace are stored on the file
system as a stream of bits without the file extension.
OpenURL Support
DSpace supports the OpenURL protocol from SFX, in a rather simple fashion. If your institution has an SFX
server, DSpace will display an OpenURL link on every item page, automatically using the Dublin Core
metadata. Additionally, DSpace can respond to incoming OpenURLs. Presently it simply passes the information
in the OpenURL to the search subsystem. A list of results is then displayed, which usually gives the relevant
item (if it is in DSpace) at the top of the list.
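As a rough sketch of the two directions described above, the following builds an outgoing link from Dublin Core values and turns an incoming OpenURL into a plain search query. It is not DSpace code: the SFX server address, the function names, and the mapping from Dublin Core fields to query keys are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

def build_openurl(sfx_base_url, dublin_core):
    """Build an OpenURL-style link from an item's Dublin Core values."""
    # Illustrative mapping of Dublin Core fields to query keys (an assumption).
    key_map = [("title", "title"), ("creator", "aulast"), ("date", "date")]
    params = [(ou_key, dublin_core[dc_key])
              for dc_key, ou_key in key_map if dc_key in dublin_core]
    return sfx_base_url + "?" + urlencode(params)

def search_query_from_openurl(url):
    """Pass the information in an incoming OpenURL on as a simple search query."""
    pairs = parse_qsl(urlsplit(url).query)
    return " ".join(value for _key, value in pairs)

# Hypothetical SFX server and item metadata.
link = build_openurl("http://sfx.example.edu/sfx",
                     {"title": "Deep Sea Survey", "creator": "Smith", "date": "2013"})
query = search_query_from_openurl(link)
```

Handing the decoded values to the search subsystem, rather than trying to match fields exactly, mirrors the simple behaviour described above, where the relevant item usually appears near the top of the result list.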
Metadata
Broadly speaking, DSpace holds three sorts of metadata about archived content:
Descriptive Metadata: DSpace can support multiple flat metadata schemas for describing an item. A
qualified Dublin Core metadata schema loosely based on the Library Application Profile set of elements
and qualifiers is provided by default. The set of elements and qualifiers used by MIT Libraries comes pre-
configured with the DSpace source code. However, you can configure multiple schemas and select
metadata fields from a mix of configured schemas to describe your items. Other descriptive metadata
about items (e.g. metadata described in a hierarchical schema) may be held in serialized bitstreams.
Communities and collections have some simple descriptive metadata (a name, and some descriptive
prose), held in the DBMS.
Administrative Metadata: This includes preservation metadata, provenance and authorization policy
data. Most of this is held within DSpace's relational DBMS schema. Provenance metadata (prose) is
stored in Dublin Core records. Additionally, some other administrative metadata (for example, bitstream
byte sizes and MIME types) is replicated in Dublin Core records so that it is easily accessible outside of
DSpace.
Structural Metadata: This includes information about how to present an item, or bitstreams within an
item, to an end-user, and the relationships between constituent parts of the item. As an example,
consider a thesis consisting of a number of TIFF images, each depicting a single page of the thesis.
Structural metadata would include the fact that each image is a single page, and the ordering of the TIFF
images/pages. Structural metadata in DSpace is currently fairly basic; within an item, bitstreams can be
arranged into separate bundles as described above. A bundle may also optionally have a primary
bitstream. This is currently used by the HTML support to indicate which bitstream in the bundle is the first
HTML file to send to a browser. In addition to some basic technical metadata, a bitstream also has a
'sequence ID' that uniquely identifies it within an item. This is used to produce a 'persistent' bitstream
identifier for each bitstream. Additional structural metadata can be stored in serialized bitstreams, but
DSpace does not currently understand this natively.
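To make the idea of flat, qualified descriptive metadata drawn from several schemas more concrete, here is a minimal sketch. The record type, the dotted field naming, and the `local` schema are assumptions made for the example, not the DSpace data model itself.

```python
from collections import namedtuple

# Hypothetical record for one descriptive metadata value on an item.
MetadataValue = namedtuple("MetadataValue", ["schema", "element", "qualifier", "value"])

def field_name(mv):
    """Render a field as schema.element[.qualifier]."""
    parts = [mv.schema, mv.element]
    if mv.qualifier:
        parts.append(mv.qualifier)
    return ".".join(parts)

def values_for(item_metadata, schema, element, qualifier=None):
    """Return all values of one field; the metadata is flat, so this is a simple filter."""
    return [mv.value for mv in item_metadata
            if (mv.schema, mv.element, mv.qualifier) == (schema, element, qualifier)]

# An item described with fields from two schemas ("local" is an invented schema).
item_metadata = [
    MetadataValue("dc", "title", None, "Deep Sea Survey"),
    MetadataValue("dc", "contributor", "author", "Smith, John"),
    MetadataValue("dc", "date", "issued", "2013"),
    MetadataValue("local", "department", None, "Oceanography"),
]

author_field = field_name(item_metadata[1])
issued = values_for(item_metadata, "dc", "date", "issued")
```

Hierarchical descriptions do not fit this flat shape, which is why the text above notes that they may be held in serialized bitstreams instead.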
Definitions
Choice Management
This is a mechanism that generates a list of choices for a value to be entered in a given metadata field.
Depending on your implementation, the exact choice list might be determined by a proposed value or query, or
it could be a fixed list that is the same for every query. It may also be closed (limited to choices produced
internally) or open, allowing the user-supplied query to be included as a choice.
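The difference between query-driven and fixed choice lists, and between closed and open ones, can be sketched as follows. The function names and the sample values are hypothetical; a real DSpace installation configures this behaviour through its own plug-ins.

```python
def query_choices(query, known_values, closed=True):
    """Choices determined by the proposed value: case-insensitive substring match.

    closed=True  -> only internally produced choices are offered.
    closed=False -> the user-supplied query is also offered as a choice.
    """
    matches = [v for v in known_values if query.lower() in v.lower()]
    if not closed and query not in matches:
        matches.append(query)
    return matches

def fixed_choices(_query, known_values):
    """A fixed list: the same choices are returned for every query."""
    return list(known_values)

names = ["Smith, John", "Smithson, Ann", "Jones, Bob"]

closed_result = query_choices("smi", names)
open_result = query_choices("Smythe, J.", names, closed=False)
fixed_result = fixed_choices("anything", names)
```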
Authority Control
This works in addition to choice management to supply an authority key along with the chosen value, which is
also assigned to the Item's metadata field entry. Any authority-controlled field is also inherently choice-
controlled.
1. There is a simple and positive way to test whether two values are identical, by comparing authority
keys.
Comparing plain text values can give false positive results e.g. when two different people have a
name that is written the same.
It can also give false negative results when the same name is written different ways, e.g. "J.
Smith" vs. "John Smith".
2. Help in entering correct metadata values. The submission and admin UIs may call on the authority to
check a proposed value and list possible matches to help the user select one.
3. Improved interoperability. By sharing a name authority with another application, your DSpace can
interoperate more cleanly with other applications.
For example, a DSpace institutional repository sharing a naming authority with the campus social
network would let the social network construct a list of all DSpace Items matching the shared
author identifier, rather than by error-prone name matching.
When the name authority is shared with a campus directory, DSpace can look up the email
address of an author to send automatic email about works of theirs submitted by a third party.
That author does not have to be an EPerson.
4. Authority keys are normally invisible in the public web UIs. They are only seen by administrators editing
metadata. The value of an authority key is not expected to be meaningful to an end-user or site visitor.
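The first point above, that comparing authority keys avoids both false positives and false negatives, can be shown with a short sketch. The record type and the key values are invented for the example; real authority keys come from whatever authority source is configured.

```python
from collections import namedtuple

# Hypothetical pairing of a displayed value with its authority key.
AuthorValue = namedtuple("AuthorValue", ["text", "authority_key"])

def same_by_text(a, b):
    """Naive comparison on the plain text value."""
    return a.text == b.text

def same_by_authority(a, b):
    """Comparison on authority keys; both values must carry a key."""
    if a.authority_key is None or b.authority_key is None:
        raise ValueError("both values need an authority key")
    return a.authority_key == b.authority_key

john = AuthorValue("Smith, John", "auth-001")
john_short = AuthorValue("J. Smith", "auth-001")       # same person, written differently
other_john = AuthorValue("Smith, John", "auth-002")    # different person, same spelling

false_negative_avoided = (same_by_text(john, john_short), same_by_authority(john, john_short))
false_positive_avoided = (same_by_text(john, other_john), same_by_authority(john, other_john))
```

In both pairs the text comparison gives the wrong answer and the key comparison gives the right one, which is the practical reason for storing the key alongside the value.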
Authority control is different from the controlled vocabulary of keywords already implemented in the
submission UI:
1. Authorities are external to DSpace. The source of authority control is typically an external database or
network resource.
Plug-in architecture makes it easy to integrate new authorities without modifying any core code.
2. This authority proposal impacts all phases of metadata management.
The keyword vocabularies are only for the submission UI.
Authority control is asserted everywhere metadata values are changed, including unattended
/batch submission, LNI and SWORD package submission, and the administrative UI.
Some Terminology
Authority         An authority is a source of fixed values for a given domain, each unique value identified by a key.
Authority Record  The information associated with one of the values in an authority; may include alternate spellings and equivalent forms of the value, etc.
Authority Key     An opaque, hopefully persistent, identifier corresponding to exactly one record in the authority.
1.2.3 Licensing
DSpace offers support for licenses on different levels.
Handles
Researchers require a stable point of reference for their works. The simple evolution from sharing of citations to
emailing of URLs broke when Web users learned that sites can disappear or be reconfigured without notice,
and that their bookmark files containing critical links to research results couldn't be trusted in the long term. To
help solve this problem, a core DSpace feature is the creation of a persistent identifier for every item, collection
and community stored in DSpace. To persist identifiers, DSpace requires a storage- and location-independent
mechanism for creating and maintaining identifiers. DSpace uses the CNRI Handle System for creating these
identifiers. The rest of this section assumes a basic familiarity with the Handle system.
DSpace uses Handles primarily as a means of assigning globally unique identifiers to objects. Each site running
DSpace needs to obtain a unique Handle 'prefix' from CNRI, so we know that if we create identifiers with that
prefix, they won't clash with identifiers created elsewhere.
Presently, Handles are assigned to communities, collections, and items. Bundles and bitstreams are not
assigned Handles, since over time, the way in which an item is encoded as bits may change, in order to allow
access with future technologies and devices. Older versions may be moved to off-line storage as a new
standard becomes de facto. Since it's usually the item that is being preserved, rather than the particular bit
encoding, it only makes sense to persistently identify and allow access to the item, and allow users to access
the appropriate bit encoding from there.
Of course, it may be that a particular bit encoding of a file is explicitly being preserved; in this case, the
bitstream could be the only one in the item, and the item's Handle would then essentially refer just to that
bitstream. The same bitstream can also be included in other items, and thus would be citable as part of a
greater item, or individually.
The Handle system also features a global resolution infrastructure; that is, an end-user can enter a Handle into
any service (e.g. Web page) that can resolve Handles, and the end-user will be directed to the object (in the
case of DSpace, community, collection or item) identified by that Handle. In order to take advantage of this
feature of the Handle system, a DSpace site must also run a 'Handle server' that can accept and resolve
incoming resolution requests. All the code for this is included in the DSpace source code bundle.
hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567
The above represent the same Handle. The first is possibly more convenient to use only as an identifier;
however, by using the second form, any Web browser becomes capable of resolving Handles. An end-user
need only access this form of the Handle as they would any other URL. It is possible to enable some browsers
to resolve the first form of Handle as if they were standard URLs using CNRI's Handle Resolver plug-in, but
since the first form can always be simply derived from the second, DSpace displays Handles in the second
form, so that it is more useful for end-users.
It is important to note that DSpace uses the CNRI Handle infrastructure only at the 'site' level. For example, in
the above example, the DSpace site has been assigned the prefix '1721.123'. It is still the responsibility of the
DSpace site to maintain the association between a full Handle (including the '4567' local part) and the
community, collection or item in question.
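As an illustration of the two Handle forms and of the split between the CNRI-assigned prefix and the site-maintained local part, here is a minimal Python sketch. The function names are hypothetical and are not part of DSpace; the resolver URL is the one used in the example above.

```python
HANDLE_PROXY = "http://hdl.handle.net/"  # resolver URL from the example above


def split_handle(handle):
    """Split a Handle written in either form into (prefix, local_part)."""
    # Strip whichever form-specific lead-in is present.
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    elif handle.startswith(HANDLE_PROXY):
        handle = handle[len(HANDLE_PROXY):]
    # The prefix is everything before the first slash; the remainder is
    # the local part that the DSpace site itself maintains.
    prefix, sep, local_part = handle.partition("/")
    if not sep or not prefix or not local_part:
        raise ValueError("not a Handle: %r" % handle)
    return prefix, local_part


def to_hdl_form(handle):
    """Return the compact 'hdl:' form of a Handle."""
    return "hdl:%s/%s" % split_handle(handle)


def to_url_form(handle):
    """Return the form that any Web browser can resolve."""
    return HANDLE_PROXY + "%s/%s" % split_handle(handle)


print(to_url_form("hdl:1721.123/4567"))
print(split_handle("http://hdl.handle.net/1721.123/4567"))
```

Because either form can be derived from the other, a site only needs to store the prefix and local part.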
Each bitstream has a sequence ID, unique within an item. This sequence ID is used to create a persistent ID, of
the form:
[dspace url]/bitstream/[handle]/[sequence ID]/[filename]
For example:
https://dspace.myu.edu/bitstream/123.456/789/24/foo.html
The above refers to the bitstream with sequence ID 24 in the item with the Handle hdl:123.456/789. The foo.html
is really just there as a hint to browsers: Although DSpace will provide the appropriate MIME type, some
browsers only function correctly if the file has an expected extension.
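The structure of such a bitstream URL can be illustrated with a short Python sketch that takes the example above apart. The function name is hypothetical, and the sketch assumes the path segments after "bitstream" are always the Handle prefix, the Handle local part, the sequence ID and the file name, in that order.

```python
from urllib.parse import urlsplit


def parse_bitstream_url(url):
    """Return (handle, sequence_id, filename) for a bitstream URL."""
    parts = urlsplit(url).path.split("/")
    # Locate the "bitstream" segment so that a servlet context path in
    # front of it (if any) does not matter.
    marker = parts.index("bitstream")
    prefix, local_part, sequence_id, filename = parts[marker + 1:marker + 5]
    return "hdl:%s/%s" % (prefix, local_part), int(sequence_id), filename


handle, sequence_id, filename = parse_bitstream_url(
    "https://dspace.myu.edu/bitstream/123.456/789/24/foo.html")
print(handle, sequence_id, filename)
```

Only the Handle and the sequence ID identify the bitstream; the file name is carried along as the browser hint described above.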
The batch item importer is an application, which turns an external SIP (an XML metadata document with some
content files) into an "in progress submission" object. The Web submission UI is similarly used by an end-user
to assemble an "in progress submission" object.
Depending on the policy of the collection to which the submission is targeted, a workflow process may be
started. This typically allows one or more human reviewers or 'gatekeepers' to check over the submission and
ensure it is suitable for inclusion in the collection.
When the Batch Ingester or Web Submit UI completes the InProgressSubmission object, and invokes the next
stage of ingest (be that workflow or item installation), a provenance message is added to the Dublin Core which
includes the filenames and checksums of the content of the submission. Likewise, each time a workflow
changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows
us to track how the item has changed since a user submitted it.
Once any workflow process is successfully and positively completed, the InProgressSubmission object is
consumed by an "item installer", that converts the InProgressSubmission into a fully blown archived item in
DSpace. The item installer:
Workflow Steps
A collection's workflow can have up to three steps. Each collection may have an associated e-person group for
performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-
person groups associated with any step, submissions to that collection are installed straight into the main
archive.
In other words, the sequence is this: The collection receives a submission. If the collection has a group
assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is
skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to
those steps.
When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of
that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where
several people in the group may be performing the same task without realizing it.
The member of the group who has taken the task from the pool may then perform one of three actions:
Workflow step 2: Can edit metadata provided by the user with the submission, but cannot change the submitted
files. Can accept submission for inclusion, or reject submission.
Workflow step 3: Can edit metadata provided by the user with the submission, but cannot change the submitted
files. Must then commit to archive; may not reject submission.
If a submission is 'accepted', it is passed to the next step in the workflow. If there are no more workflow steps
with associated groups, the submission is installed in the main archive.
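The step-skipping rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not DSpace's workflow code; the function names and the dictionary layout are assumptions made for the example.

```python
def steps_to_run(step_groups):
    """Return the workflow steps (1 to 3) that will be invoked, in order.

    step_groups maps a step number to the name of its e-person group,
    or to None (or no entry) when no group is assigned to that step.
    """
    return [step for step in (1, 2, 3) if step_groups.get(step)]


def next_stage(step_groups, current_step=0):
    """Where an accepted submission goes after current_step.

    current_step 0 means the submission has just been received.
    """
    for step in steps_to_run(step_groups):
        if step > current_step:
            return "workflow step %d" % step
    # No further step has a group: install straight into the archive.
    return "install in main archive"


groups = {1: "reviewers", 2: None, 3: "editors"}
print(steps_to_run(groups))
print(next_stage(groups, current_step=1))
```

A collection with no groups at all yields an empty step list, which corresponds to submissions being installed directly.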
One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished
using the administration UI.
The reason for this apparently arbitrary design is that it was the simplest case that covered the needs of the
early adopter communities at MIT. The functionality of the workflow system will no doubt be extended in the
future.
DSpace also includes various package importer tools, which support many common content packaging formats
like METS. For more information see Package Importer and Exporter.
OAI Support
The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to
programmatically retrieve or 'harvest' the metadata from several sources, and offer services using that
metadata, such as indexing or linking services. Such a service could allow users to access information from a
large number of sites from one place.
DSpace exposes the Dublin Core metadata for items that are publicly (anonymously) accessible. Additionally,
the collection structure is also exposed via the OAI protocol's 'sets' mechanism. OCLC's open source OAICat
framework is used to provide this functionality.
You can also configure the OAI service to make use of any crosswalk plugin to offer additional metadata
formats, such as MODS.
DSpace's OAI service does support the exposing of deletion information for withdrawn items, but not for items
that are 'expunged' (see above). DSpace also supports OAI-PMH resumption tokens.
SWORD Support
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items
into repositories. SWORD was further developed in SWORD version 2 to add the ability to retrieve, update, or
delete deposits. DSpace supports the SWORD protocol via the 'sword' web application and SWORD v2 via the
swordv2 web application. The specification and further information can be found at http://swordapp.org.
DSpace also includes various package exporter tools, which support many common content packaging formats
like METS. For more information see Package Importer and Exporter.
Packager Plugins
Packagers are software modules that translate between DSpace Item objects and a self-contained external
representation, or "package". A Package Ingester interprets, or ingests, the package and creates an Item. A
Package Disseminator writes out the contents of an Item in the package format.
A package is typically an archive file such as a Zip or "tar" file, including a manifest document which contains
metadata and a description of the package contents. The IMS Content Package is a typical packaging standard.
A package might also be a single document or media file that contains its own metadata, such as a PDF
document with embedded descriptive metadata.
Package ingesters and package disseminators are each a type of named plugin (see Plugin Manager), so it is
easy to add new packagers specific to the needs of your site. You do not have to supply both an ingester and
disseminator for each format; it is perfectly acceptable to just implement one of them.
Most packager plugins call upon Crosswalk Plugins to translate the metadata between DSpace's object model
and the package format.
More information about calling Packagers to ingest or disseminate content can be found in the Package
Importer and Exporter section of the System Administration documentation.
Crosswalk Plugins
Crosswalks are software modules that translate between DSpace object metadata and a specific external
representation. An Ingestion Crosswalk interprets the external format and crosswalks it to DSpace's internal
data structure, while a Dissemination Crosswalk does the opposite.
For example, a MODS ingestion crosswalk translates descriptive metadata from the MODS format to the
metadata fields on a DSpace Item. A MODS dissemination crosswalk generates a MODS document from the
metadata on a DSpace Item.
Crosswalk plugins are named plugins (see Plugin Manager), so it is easy to add new crosswalks. You do not
have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement
one of them.
There is also a special pair of crosswalk plugins which use XSL stylesheets to translate the external metadata
to or from an internal DSpace format. You can add and modify XSLT crosswalks simply by editing the DSpace
configuration and the stylesheets, which are stored in files in the DSpace installation directory.
The Packager plugins and OAI-PMH server make use of crosswalk plugins.
This functionality could also be used in situations where researchers wish to collaborate on a particular
submission, although there is no particular collaborative workspace functionality.
E-mail address
Subscriptions
As noted above, end-users (e-people) may 'subscribe' to collections in order to be alerted when new items
appear in those collections. Each day, end-users who are subscribed to one or more collections will receive an
e-mail giving brief details of all new items that appeared in any of those collections the previous day. If no new
items appeared in any of the subscribed collections, no e-mail is sent. Users can unsubscribe themselves at
any time. RSS feeds of new items are also available for collections and communities.
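The daily notification rule can be illustrated with a small Python sketch. The function name and data shapes are assumptions for the example; the actual subscription e-mails are produced by DSpace itself.

```python
def daily_digests(subscriptions, new_items):
    """Work out which subscribers receive an e-mail, and what it lists.

    subscriptions: e-mail address -> set of subscribed collection names
    new_items:     collection name -> titles that appeared the previous day
    """
    digests = {}
    for email, collections in subscriptions.items():
        titles = [title
                  for collection in sorted(collections)
                  for title in new_items.get(collection, [])]
        # No new items in any subscribed collection: no e-mail is sent.
        if titles:
            digests[email] = titles
    return digests


subscriptions = {"a@example.edu": {"Theses"}, "b@example.edu": {"Reports"}}
print(daily_digests(subscriptions, {"Theses": ["Thesis 1", "Thesis 2"]}))
```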
Groups
Groups are another kind of entity that can be granted permissions in the authorization system. A group is
usually an explicit list of E-People; anyone identified as one of those E-People also gains the privileges granted
to the group.
However, an application session can be assigned membership in a group without being identified as an E-
Person. For example, some sites use this feature to identify users of a local network so they can read restricted
materials not open to the whole world. Sessions originating from the local network are given membership in the
"LocalUsers" group and gain the corresponding privileges.
Administrators can also use groups as "roles" to manage the granting of privileges more efficiently.
Authentication
Authentication is when an application session positively identifies itself as belonging to an E-Person and/or
Group. In DSpace 1.4 and later, it is implemented by a mechanism called Stackable Authentication: the DSpace
configuration declares a "stack" of authentication methods. An application (like the Web UI) calls on the
Authentication Manager, which tries each of these methods in turn to identify the E-Person to which the session
belongs, as well as any extra Groups. The E-Person authentication methods are tried in turn until one
succeeds. Every authenticator in the stack is given a chance to assign extra Groups. This mechanism offers the
following advantages:
Separates authentication from the Web user interface so the same authentication methods are used for
other applications such as non-interactive Web Services
Improved modularity: The authentication methods are all independent of each other. Custom
authentication methods can be "stacked" on top of the default DSpace username/password method.
Cleaner support for "implicit" authentication where username is found in the environment of a Web
request, e.g. in an X.509 client certificate.
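The stacking behaviour can be sketched in Python as follows. This is a simplified model under stated assumptions, not the DSpace Authentication Manager: the class names, the method names identify and special_groups, and the dictionary used as a "request" are all hypothetical.

```python
class PasswordAuth:
    """Hypothetical explicit method: identifies an E-Person by password."""
    USERS = {"alice@example.edu": "secret"}

    def identify(self, request):
        email = request.get("email")
        if email in self.USERS and self.USERS[email] == request.get("password"):
            return email
        return None

    def special_groups(self, request):
        return set()


class LocalNetworkAuth:
    """Hypothetical implicit method: grants a group by network address."""

    def identify(self, request):
        return None  # never identifies an E-Person on its own

    def special_groups(self, request):
        if request.get("ip", "").startswith("10."):
            return {"LocalUsers"}
        return set()


def authenticate(stack, request):
    """Try each method in turn for the E-Person; collect groups from all."""
    eperson = None
    for method in stack:
        eperson = method.identify(request)
        if eperson is not None:
            break  # the first method that succeeds wins
    groups = set()
    for method in stack:
        groups |= method.special_groups(request)  # every method may add groups
    return eperson, groups


stack = [LocalNetworkAuth(), PasswordAuth()]
print(authenticate(stack, {"ip": "10.0.0.5"}))
```

An anonymous session from the local network ends up with no E-Person but still gains the "LocalUsers" group, matching the Groups example given earlier.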
Authorization
DSpace's authorization system is based on associating actions with objects and the lists of EPeople who can
perform them. The associations are called Resource Policies, and the lists of EPeople are called Groups. There
are two built-in groups: 'Administrators', who can do anything in a site, and 'Anonymous', which is a list that
contains all users. Assigning a policy for an action on an object to anonymous means giving everyone
permission to do that action. (For example, most objects in DSpace sites have a policy of 'anonymous' READ.)
Permissions must be explicit - lack of an explicit permission results in the default policy of 'deny'. Permissions
also do not 'commute'; for example, if an e-person has READ permission on an item, they might not necessarily
have READ permission on the bundles and bitstreams in that item. Currently Collections, Communities and
Items are discoverable in the browse and search systems regardless of READ authorization.
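A minimal Python sketch of these rules (explicit grants, default deny, no inheritance from an item to its contents) might look like the following. The class, its methods and the object labels are hypothetical and only model the behaviour described above.

```python
ADMIN, ANONYMOUS = "Administrators", "Anonymous"


class Policies:
    """A toy store of resource policies: (group, action, object) triples."""

    def __init__(self):
        self._policies = set()

    def grant(self, group, action, obj):
        self._policies.add((group, action, obj))

    def is_allowed(self, user_groups, action, obj):
        # Every user, identified or not, belongs to Anonymous.
        groups = set(user_groups) | {ANONYMOUS}
        if ADMIN in groups:
            return True  # administrators can do anything in a site
        # Default policy is 'deny': only an explicit grant on this exact
        # object counts; nothing is inherited from containing objects.
        return any((group, action, obj) in self._policies for group in groups)


policies = Policies()
policies.grant(ANONYMOUS, "READ", "item:1")
print(policies.is_allowed([], "READ", "item:1"))
print(policies.is_allowed([], "READ", "bitstream:1"))
```

In the usage above, READ on the item does not imply READ on a bitstream inside it, because no policy was granted on the bitstream itself.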
Collection
DEFAULT_BITSTREAM_READ: inherited as READ by Bitstreams of all submitted items. Note: only affects
Bitstreams of an item at the time it is initially submitted. If a Bitstream is
added later, it does not get the same default read policy.
COLLECTION_ADMIN: collection admins can edit items in a collection, withdraw items, and map other
items into this collection.
Item
Bundle
Bitstream
Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must
have REMOVE permission on all objects (in this case, collection) that contain it. The 'orphaned' item is
automatically deleted.
*File Downloads information is only displayed for item-level statistics. Note that downloads from separate
bitstreams are also recorded and represented separately. DSpace is able to capture and store File Download
information, even when the bitstream was downloaded from a direct link on an external website.
System Statistics
Various statistical reports about the contents and use of your system can be automatically generated by the
system. These are generated by analyzing DSpace's log files. Statistics can be broken down monthly.
Checksum Checker
The purpose of the checker is to verify that the content in a DSpace repository has not become corrupted or
been tampered with. The functionality can be invoked on an ad-hoc basis from the command line, or configured
via cron or similar. Options exist to support large repositories that cannot be entirely checked in one run of the
tool. The tool is extensible to new reporting and checking priority approaches.
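The idea behind the checker can be illustrated with a short Python sketch that recomputes checksums and compares them with recorded values. It assumes MD5 checksums and a plain dictionary of recorded values; the function names and the limit parameter (standing in for "check only part of the repository in one run") are hypothetical, and the real tool is invoked through DSpace's own command line.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Compute a file's checksum without loading it into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(recorded, limit=None):
    """Compare stored checksums with the files on disk.

    recorded maps a file path to the checksum stored when the file was
    ingested. limit caps how many files are examined in one run.
    Returns a list of (path, status) pairs.
    """
    results = []
    for path in sorted(recorded)[:limit]:
        try:
            actual = file_checksum(path)
        except OSError:
            results.append((path, "missing"))
            continue
        status = "ok" if actual == recorded[path] else "changed"
        results.append((path, status))
    return results
```

A scheduled job could call check() with a limit on each run and report any path whose status is not "ok".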
Data Model
The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace
system. Each DSpace site is divided into communities, which can be further divided into sub-communities
reflecting the typical university structure of college, department, research center, or laboratory.
Communities contain collections, which are groupings of related content. A collection may appear in more than
one community.
Each collection is composed of items, which are the basic archival elements of the archive. Each item is owned
by one collection. Additionally, an item may appear in additional collections; however every item has one and
only one owning collection.
Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams
of bits, usually ordinary computer files. Bitstreams that are somehow closely related, for example HTML files
and images that compose a single HTML document, are organized into bundles.
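The containment relationships described so far can be summarized in a small Python sketch. The classes below are a hypothetical model for illustration only and do not mirror DSpace's Java classes; the bundle name used in the example is invented.

```python
class Bitstream:
    def __init__(self, name):
        self.name = name


class Bundle:
    """A named group of closely related bitstreams."""

    def __init__(self, name, bitstreams=()):
        self.name = name
        self.bitstreams = list(bitstreams)


class Item:
    def __init__(self, title, bundles=()):
        self.title = title
        self.bundles = list(bundles)
        self.owning_collection = None  # exactly one, set on first add


class Collection:
    def __init__(self, name):
        self.name = name
        self.items = []

    def add(self, item):
        # The first collection an item is added to becomes its owner;
        # later collections merely include the item as well.
        if item.owning_collection is None:
            item.owning_collection = self
        self.items.append(item)


class Community:
    def __init__(self, name):
        self.name = name
        self.sub_communities = []
        self.collections = []  # a collection may appear in several communities


reports, theses = Collection("Reports"), Collection("Theses")
page = Bundle("html-document", [Bitstream("index.html"), Bitstream("logo.png")])
item = Item("A technical report", [page])
reports.add(item)
theses.add(item)
print(item.owning_collection.name)
```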
Each bitstream is associated with one Bitstream Format. Because preservation services may be an important
aspect of the DSpace service, it is important to capture the specific formats of files that users submit. In
DSpace, a bitstream format is a unique and consistent way to refer to a particular file format. An integral part of
a bitstream format is an either implicit or explicit notion of how material in that format can be interpreted. For
example, the interpretation for bitstreams encoded in the JPEG standard for still image compression is defined
explicitly in the Standard ISO/IEC 10918-1. The interpretation of bitstreams in Microsoft Word 2000 format is
defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be more
specific than MIME types or file suffixes. For example, application/ms-word and .doc span multiple versions of
the Microsoft Word application, each of which produces bitstreams with presumably different characteristics.
Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be
able to preserve content in the format in the future. There are three possible support levels that bitstream
formats may be assigned by the hosting institution. The host institution should determine the exact meaning of
each support level, after careful consideration of costs and requirements. MIT Libraries' interpretation is shown
below:
Supported: The format is recognized, and the hosting institution is confident it can make bitstreams of
this format usable in the future, using whatever combination of techniques (such as migration,
emulation, etc.) is appropriate given the context of need.
Known: The format is recognized, and the hosting institution will promise to preserve the bitstream as-
is, and allow it to be retrieved. The hosting institution will attempt to obtain enough
information to enable the format to be upgraded to the 'supported' level.
Unsupported: The format is unrecognized, but the hosting institution will undertake to preserve the
bitstream as-is and allow it to be retrieved.
Each item has one qualified Dublin Core metadata record. Other metadata might be stored in an item as a
serialized bitstream, but we store Dublin Core for every item for interoperability and ease of discovery. The
Dublin Core may be entered by end-users as they submit content, or it might be derived from other metadata as
part of an ingest process.
Items can be removed from DSpace in one of two ways: They may be 'withdrawn', which means they remain in
the archive but are completely hidden from view. In this case, if an end-user attempts to access the withdrawn
item, they are presented with a 'tombstone,' that indicates the item has been removed. For whatever reason, an
item may also be 'expunged' if necessary, in which case all traces of it are removed from the archive.
Object       Example
Item         A technical report; a data set with accompanying description; a video recording of a lecture
Bitstream    A single HTML file; a single image file; a source code file
SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system.
Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially
unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or
remote storage resources.
2 Installing DSpace
For the Impatient
Hardware Recommendations
Prerequisite Software
UNIX-like OS or Microsoft Windows
Oracle Java JDK 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 7
Apache Maven 3.x (Java build tool)
Configuring a Proxy
Apache Ant 1.8 or later (Java build tool)
Relational Database: (PostgreSQL or Oracle)
Servlet Engine (Apache Tomcat 7 or later, Jetty, Caucho Resin or equivalent)
Perl (only required for [dspace]/bin/dspace-info.pl)
Installation Instructions
Overview of Install Options
Overview of DSpace Directories
Installation
Advanced Installation
'cron' jobs / scheduled tasks
Multilingual Installation
DSpace over HTTPS
Enabling the HTTPS support in Tomcat 7.0
Using SSL on Apache HTTPD with mod_jk
The Handle Server
Updating Existing Handle Prefixes
Google and HTML sitemaps
Statistics
Windows Installation
Checking Your Installation
Known Bugs
Common Problems
Common Installation Issues
General DSpace Issues
Only experienced UNIX admins should attempt the following without consulting the detailed
Installation Instructions.
useradd -m dspace
tar xzf dspace-4.x-src-release.tar.gz
createuser --username=postgres --no-superuser --pwprompt dspace
createdb --username=postgres --owner=dspace --encoding=UNICODE dspace
cd [dspace-source]
vi build.properties
mkdir [dspace]
chown dspace [dspace]
su - dspace
cd [dspace-source]
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build
ant fresh_install
cp -r [dspace]/webapps/* [tomcat]/webapps
/etc/init.d/tomcat start
[dspace]/bin/dspace create-administrator
Also, please note that the configuration and installation guidelines relating to a particular tool below are here for
convenience. You should refer to the documentation for each individual component for complete and up-to-date
details. Many of the tools are updated on a frequent basis, and the guidelines below may become out of date.
Microsoft Windows: After verifying all prerequisites below, see the Windows Installation section for
Windows-tailored instructions.
2.3.2 Oracle Java JDK 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 7
Oracle's Java can be downloaded from the following location:
http://www.oracle.com/technetwork/java/javase/downloads/index.html. Again, you can just download the Java SE JDK version.
Configuring a Proxy
You can configure a proxy to use for some or all of your HTTP requests in Maven. The username and password
are only required if your proxy requires basic authentication. Note that later releases may support storing your
passwords in a secured keystore; in the meantime, please ensure your settings.xml file (usually
${user.home}/.m2/settings.xml) is secured with permissions appropriate for your operating system.
Example:
<settings>
  .
  .
  <proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.somewhere.com</host>
      <port>8080</port>
      <username>proxyuser</username>
      <password>somepassword</password>
      <nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
    </proxy>
  </proxies>
  .
  .
</settings>
Oracle 10g or greater: Details on acquiring Oracle can be found at the following location:
http://www.oracle.com/database/. You will need to create a database for DSpace. Make sure that the
character set is one of the Unicode character sets. DSpace uses UTF-8 natively, and it is suggested that
the Oracle database use the same character set. You will also need to create a user account for DSpace
(e.g. dspace) and ensure that it has permissions to add and remove tables in the database. Refer to the
Quick Installation for more details.
NOTE: If the database server is not on the same machine as DSpace, you must install the Oracle
client on the DSpace server and point the tnsnames.ora and listener.ora files to the database on the
Oracle server.
NOTE: DSpace uses sequences to generate unique object IDs — beware Oracle sequences,
which are said to lose their values when doing a database export/import, say restoring from a
backup. Be sure to run the script etc/oracle/update-sequences.sql after importing.
For people interested in switching from Postgres to Oracle, I know of no tools that would do this
automatically. You will need to recreate the community, collection, and eperson structure in the
Oracle system, and then use the item export and import tools to move your content over.
Tomcat 7 Version
If you are using Tomcat 7, we recommend running Tomcat 7.0.30 or above. Tomcat 7.0.29 and lower
versions suffer from a memory leak. As a result, those versions of Tomcat require an unusually high
amount of memory to run DSpace. This has been resolved as of Tomcat 7.0.30. More information can
be found in DS-1553.
Apache Tomcat 7 or higher. Tomcat can be downloaded from the following location: http://tomcat.apache.org.
Note that DSpace will need to run as the same user as Tomcat, so you might want to install and
run Tomcat as a user called 'dspace'. Set the environment variable TOMCAT_USER
appropriately.
You need to ensure that Tomcat has a) enough memory to run DSpace and b) uses UTF-8 as its
default file encoding for international character support. So ensure in your startup scripts (etc) that
the following environment variable is set: JAVA_OPTS="-Xmx512M -Xms64M -Dfile.encoding=UTF-8"
Modifications in [tomcat]/conf/server.xml: You also need to alter Tomcat's default
configuration to support searching and browsing of multi-byte UTF-8 correctly. You need to add a
configuration option to the <Connector> element in [tomcat]/conf/server.xml: URIEncoding="UTF-8".
For example, if you're using the default Tomcat config, it should read:
You may change the port from 8080 by editing it in the file above, and by setting the variable
CONNECTOR_PORT in server.xml.
Jetty or Caucho Resin: DSpace will also run on an equivalent servlet engine, such as Jetty
(http://www.mortbay.org/jetty/index.html) or Caucho Resin (http://www.caucho.com/). Jetty and Resin are configured
for correct handling of UTF-8 by default.
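For Tomcat, the URIEncoding requirement described above can be checked mechanically. The following Python sketch scans the text of a server.xml file and lists the ports of any <Connector> elements that do not declare URIEncoding="UTF-8". The function name and the sample XML are illustrative assumptions; only the Connector element and the URIEncoding attribute come from the instructions above.

```python
import xml.etree.ElementTree as ET


def connectors_missing_utf8(server_xml_text):
    """Return the ports of Connector elements lacking URIEncoding="UTF-8"."""
    root = ET.fromstring(server_xml_text)
    missing = []
    for connector in root.iter("Connector"):
        if connector.get("URIEncoding", "").upper() != "UTF-8":
            missing.append(connector.get("port"))
    return missing


# A cut-down, hypothetical server.xml used only to exercise the check.
sample = """
<Server>
  <Service name="Catalina">
    <Connector port="8080" URIEncoding="UTF-8"/>
    <Connector port="8009" protocol="AJP/1.3"/>
  </Service>
</Server>
"""
print(connectors_missing_utf8(sample))
```

To check a real installation, read the contents of [tomcat]/conf/server.xml and pass them to the function.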
It is important to note that the strategies are identical in terms of the list of procedures required to complete the
build process, the only difference being that the Source Release includes "more modules" that will be built given
their presence in the distribution package.
DSpace uses three separate directory trees. Although you don't need to know all the details of them in order to
install DSpace, you do need to know they exist and also know how they're referred to in this document:
1. The installation directory, referred to as [dspace]. This is the location where DSpace is installed and
running. It is the location that is defined in the dspace.cfg as "dspace.dir". It is where all the DSpace
configuration files, command line scripts, documentation and webapps will be installed.
2. The source directory, referred to as [dspace-source] . This is the location where the DSpace
release distribution has been unpacked. It usually has the name of the archive that you expanded such
as dspace-<version>-release or dspace-<version>-src-release. Normally it is the directory
where all of your "build" commands will be run.
3. The web deployment directory. This is the directory that contains your DSpace web application(s). In
DSpace 1.5.x and above, this corresponds to [dspace]/webapps by default. However, if you are using
Tomcat, you may decide to copy your DSpace web applications from [dspace]/webapps/ to
[tomcat]/webapps/ (with [tomcat] being wherever you installed Tomcat, also known as
$CATALINA_HOME).
For details on the contents of these separate directory trees, refer to directories.html. Note that the
[dspace-source] and [dspace] directories are always separate!
If you ever notice that many files seem to have duplicates under [dspace-source]/dspace/target, do not
worry about it. This "target" directory will be used by Maven for the build process and you should not change
any file in it unless you know exactly what you are doing.
2.4.3 Installation
This method gets you up and running with DSpace quickly and easily. It is identical in both the Default Release
and Source Release distributions.
1. Create the DSpace user. This needs to be the same user that Tomcat (or Jetty etc.) will run as. e.g. as
root run:
useradd -m dspace
2. Download the latest DSpace release. There are two versions available with each release of DSpace
(dspace-n.x-release.zzz and dspace-n.x-src-release.zzz); you only need to choose one. If you want a copy of
all underlying Java source code, you should download the dspace-n.x-src-release.zzz version. Within each
version, you have a choice of compressed file format. Choose the one that best fits your environment.
a. Alternatively, you may choose to check out the latest release from the DSpace GitHub Repository.
In this case, you'd be checking out the full Java source code. You'd also want to be sure to
checkout the appropriate tag or branch. For more information on using / developing from the
GitHub Repository, see: Development with Git
3. Unpack the DSpace software. After downloading the software, based on the compression file format,
choose one of the following methods to unpack your software:
a. Zip file. If you downloaded dspace-4.x-release.zip do the following:
unzip dspace-4.x-release.zip
For ease of reference, we will refer to the location of this unzipped version of the DSpace release
as [dspace-source] in the remainder of these instructions. After unpacking the file, the user may
wish to change the ownership of the dspace-4.x-release to the "dspace" user. (And you may need
to change the group).
4. Database Setup
PostgreSQL:
Create a dspace database user. This is entirely separate from the dspace operating-system
user created above (you are still logged in as "root"):
createuser --username=postgres --no-superuser --pwprompt dspace
You will be prompted (twice) for a password for the new dspace user. Then you'll be
prompted for the password of the PostgreSQL superuser (postgres).
Create a dspace database, owned by the dspace PostgreSQL user (you are still logged in
as "root"):
createdb --username=postgres --owner=dspace --encoding=UNICODE dspace
You will be prompted for the password of the PostgreSQL superuser (postgres).
Oracle:
Setting up DSpace to use Oracle is a bit different now. You will still need to get a copy
of the Oracle JDBC driver, but instead of copying it into a lib directory you will need to
install it into your local Maven repository. (You'll need to download it first from this location:
http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html.)
Run the following command (all on one line):
mvn install:install-file
-Dfile=ojdbc6.jar
-DgroupId=com.oracle
-DartifactId=ojdbc6
-Dversion=11.2.0.3.0
-Dpackaging=jar
-DgeneratePom=true
You need to compile DSpace with an Oracle driver (ojdbc6.jar) corresponding to your
Oracle version; update the version in [dspace-source]/pom.xml, e.g.:
<dependency>
  <groupId>com.oracle</groupId>
  <artifactId>ojdbc6</artifactId>
  <version>11.2.0.3.0</version>
</dependency>
Create a database for DSpace. Make sure that the character set is one of the Unicode
character sets. DSpace uses UTF-8 natively, and it is required that the Oracle database
use the same character set. Create a user account for DSpace (e.g. dspace) and ensure
that it has permissions to add and remove tables in the database.
Uncomment and edit the Oracle database settings in [dspace-source]/build.properties (see
below for more information on the build.properties file):
db.name = oracle
db.driver = oracle.jdbc.OracleDriver
db.url = jdbc:oracle:thin:@host:port/SID
Where SID is the SID of your database defined in tnsnames.ora, default Oracle port is 1521.
Alternatively, you can use a full SID definition, e.g.:
db.url = jdbc:oracle:thin:@(description=(address_list=(address=(protocol=TCP)(host=localhost)(port=1521)))(connect_data=(service_name=DSPACE)))
Later, during the Maven build step, don't forget to specify: mvn -Ddb.name=oracle package
5.
When you edit the "build.properties" file (or a custom *.properties file), take care not to
remove or comment out any settings. Doing so may cause your final "dspace.cfg" file to
be misconfigured with regard to that particular setting. Instead, if you wish to remove or
disable a particular setting, just clear out its value. For example, if you don't want to be
notified of new user registrations, ensure the "mail.registration.notify" setting has no
value, e.g.
mail.registration.notify=
6. DSpace Directory: Create the directory for the DSpace installation (i.e. [dspace]). As root (or a user
with appropriate permissions), run:
mkdir [dspace]
chown dspace [dspace]
7. Build the Installation Package: As the dspace UNIX user, generate the DSpace installation package.
cd [dspace-source]/dspace/
mvn package
In the DSpace 4.0 release, the above "mvn package" command must be run from the root
source directory (i.e. [dspace-source]), otherwise you will receive build errors. This was a
small (but annoying) bug in our Maven build process, which is fixed in the 4.1 release (see
DS-1867).
Without any extra arguments, the DSpace installation package is initialized for PostgreSQL. If
you want to use Oracle instead, you should build the DSpace installation package as follows:
mvn -Ddb.name=oracle package
Without any extra arguments, the DSpace installation package will be initialized using the
settings in the [dspace-source]/build.properties file. However, if you want it to build
using a custom properties file, you may specify the "-Denv" (environment) flag as follows:
mvn -Denv=test package (would build the installation package using a custom
[dspace-source]/test.properties file)
mvn -Denv=local package (would build the installation package using a custom
[dspace-source]/local.properties file)
See the General Configuration section for more details.
8. Install DSpace and Initialize Database: As the dspace UNIX user, initialize the DSpace database and
install DSpace to [dspace]:
cd [dspace-source]/dspace/target/dspace-[version]-build
ant fresh_install
To see a complete list of build targets, run: ant help
The most likely thing to go wrong here is the database connection. See the Common Problems Section.
<?xml version='1.0'?>
<Context
docBase="[dspace]/webapps/xmlui"
reloadable="true"
cachingAllowed="false"/>
<?xml version='1.0'?>
<Context
docBase="[dspace]/webapps/jspui"
reloadable="true"
cachingAllowed="false"/>
<?xml version='1.0'?>
<Context
docBase="[dspace]/webapps/oai"
reloadable="true"
cachingAllowed="false"/>
DEFINE ADDITIONAL CONTEXT PATHS FOR OTHER DSPACE WEB APPLICATIONS (SOLR, SWORD, LNI,
etc.): [app].xml
<?xml version='1.0'?>
<!-- CHANGE THE VALUE OF "[app]" FOR EACH APPLICATION YOU WISH TO ADD -->
<Context
docBase="[dspace]/webapps/[app]"
reloadable="true"
cachingAllowed="false"/>
The name of the file (not including the suffix ".xml") will be the name of the context, so for example
xmlui.xml defines the context at http://host:8080/xmlui. To define the root context
(http://host:8080/), name that context's file ROOT.xml.
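The naming rule above can be sketched as a small shell helper that writes one context fragment per web application. This is only an illustration: the function name write_contexts is hypothetical, and the context directory shown in the usage comment ($CATALINA_BASE/conf/Catalina/localhost) is an assumption about a default Tomcat layout that you should check against your own installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): write one Tomcat context
# fragment per web application.
# Usage: write_contexts DSPACE_DIR CONTEXT_DIR APP [APP ...]
write_contexts() {
    dspace_dir=$1
    context_dir=$2
    shift 2
    for app in "$@"; do
        # The file name (minus ".xml") becomes the context path, e.g. /xmlui
        cat > "$context_dir/$app.xml" <<EOF
<?xml version='1.0'?>
<Context
    docBase="$dspace_dir/webapps/$app"
    reloadable="true"
    cachingAllowed="false"/>
EOF
    done
}

# Example (both paths are placeholders / assumptions):
# write_contexts /dspace "$CATALINA_BASE/conf/Catalina/localhost" xmlui jspui oai
```

Passing ROOT as an application name would produce ROOT.xml, but its docBase would then point at [dspace]/webapps/ROOT, so the root context is better written by hand.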
The above Tomcat Context Settings show adding the following to each <Context>
element: reloadable="true" cachingAllowed="false"
These settings are extremely useful to have when you are first getting started with
DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and
see your changes get automatically reloaded by Tomcat (without having to restart
Tomcat). However, it is worth noting that the Apache Tomcat documentation
recommends that Production sites leave the default values in place
(reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all
changes may result in "significant runtime overhead".
It is entirely up to you whether to keep these Tomcat settings in place. We just
recommend beginning with them, so that you can more easily customize your site
without having to require a Tomcat restart. Smaller DSpace sites may not notice any
performance issues with keeping these settings in place in Production. Larger DSpace
sites may wish to ensure that Tomcat performance is more streamlined.
Technique B. Simple and complete. Copy only the DSpace web application(s) you wish to use
(or all of them) from the [dspace]/webapps directory to the appropriate directory in your
Tomcat/Jetty/Resin installation. For example:
cp -R [dspace]/webapps/* [tomcat]/webapps/ (This will copy all the web applications to
Tomcat.)
cp -R [dspace]/webapps/jspui [tomcat]/webapps/ (This will copy only the jspui web
application to Tomcat.)
10. Administrator Account: Create an initial administrator account:
[dspace]/bin/dspace create-administrator
11. Initial Startup! Now the moment of truth! Start up (or restart) Tomcat/Jetty/Resin. Visit the base URL(s)
of your server, depending on which DSpace web applications you want to use. You should see the
DSpace home page. Congratulations! Base URLs of DSpace Web Applications:
JSP User Interface - (e.g.) http://dspace.myu.edu:8080/jspui
XML User Interface (aka. Manakin) - (e.g.) http://dspace.myu.edu:8080/xmlui
OAI-PMH Interface - (e.g.) http://dspace.myu.edu:8080/oai/request?verb=Identify
(Should return an XML-based response)
In order to set up some communities and collections, you'll need to log in as your DSpace Administrator (which
you created with create-administrator above) and access the administration UI in either the JSP or XML
user interface.
the e-mail subscription feature that alerts users of new items being deposited;
the 'media filter' tool, that generates thumbnails of images and extracts the full-text of documents for
indexing;
the 'checksum checker' that tests the bitstreams in your repository for corruption;
the sitemap generator, which enhances the ability of major search engines to index your content and
make it findable;
the curation system queueing feature, which allows administrators to "queue" tasks (to run at a later
time) from the Admin UI;
and Discovery (search & browse), OAI-PMH and Usage Statistics all receive performance benefits from
regular re-optimization
For much more information on recommended scheduled tasks, please see Scheduled Tasks via Cron.
Depending on the languages you wish to support, make sure that all the i18n-related files are
available. See the Configuring MultiLingual Support section (Multilingual User Interface) for the JSPUI, or the
Multilingual Support for XMLUI section, in the configuration documentation.
The solution is to use HTTPS (HTTP over SSL, i.e. Secure Socket Layer, an encrypted transport), which
protects your passwords against being captured. You can configure DSpace to require SSL on all
"authenticated" transactions so it only accepts passwords on SSL connections.
The following sections show how to set up the most commonly-used Java Servlet containers to support HTTP
over SSL.
1. For Production use: Follow this procedure to set up SSL on your server. Using a "real" server certificate
ensures your users' browsers will accept it without complaints. In the examples below,
$CATALINA_BASE is the directory under which your Tomcat is installed.
a. Create a Java keystore for your server with the password changeit, and install your server
certificate under the alias "tomcat". This assumes the certificate was put in the file server.pem:
b. Install the CA (Certifying Authority) certificate for the CA that granted your server cert, if
necessary. This assumes the server CA certificate is in ca.pem:
c. Optional – ONLY if you need to accept client certificates for the X.509 certificate stackable
authentication module. See the configuration section for instructions on enabling the X.509
authentication method. Load the keystore with the CA (certifying authority) certificates for the
authorities of any clients whose certificates you wish to accept. For example, assuming the client
CA certificate is in client1.pem:
d. Now add another Connector tag to your server.xml Tomcat configuration file, like the example
below. (You may wish to change some details such as the port, pathnames, and keystore
password.)
<Connector port="8443"
maxThreads="150" minSpareThreads="25"
maxSpareThreads="75"
enableLookups="false"
disableUploadTimeout="true"
acceptCount="100" debug="0"
scheme="https" secure="true" sslProtocol="TLS"
keystoreFile="conf/keystore" keystorePass="changeit"
clientAuth="true"
truststoreFile="conf/keystore" truststorePass="changeit" />
Set clientAuth="true" ONLY if you are using client X.509 certificates for authentication.
Also, check that the default Connector is set up to redirect "secure" requests to the same port as
your SSL connector, e.g.:
<Connector port="8080"
maxThreads="150" minSpareThreads="25"
maxSpareThreads="75"
enableLookups="false"
redirectPort="8443"
acceptCount="100" debug="0" />
2. Quick-and-dirty Procedure for Testing: If you are just setting up a DSpace server for testing, or to
experiment with HTTPS, then you don't need to get a real server certificate. You can create a "self-
signed" certificate for testing; web browsers will issue warnings before accepting it but they will function
exactly the same after that as with a "real" certificate. In the examples below, $CATALINA_BASE is the
directory under which your Tomcat is installed.
a. Optional – ONLY if you don't already have a server certificate. Follow this sub-procedure to
request a new, signed server certificate from your Certifying Authority (CA):
Create a new key pair under the alias name "tomcat". When generating your key, give the
Distinguished Name fields the appropriate values for your server and institution. CN should
be the fully-qualified domain name of your server host. Here is an example:
Then, create a CSR (Certificate Signing Request) and send it to your Certifying Authority.
They will send you back a signed Server Certificate. This example command creates a
CSR in the file tomcat.csr
Before importing the signed certificate, you must have the CA's certificate in your keystore
as a trusted certificate. Get their certificate, and import it with a command like this (for the
example mitCA.pem):
Finally, when you get the signed certificate from your CA, import it into the keystore with a
command like the following example: (cert is in the file signed-cert.pem)
Since you now have a signed server certificate in your keystore, you can, obviously, skip
the next steps of installing a signed server certificate and the server CA's certificate.
b. Create a Java keystore for your server with the password changeit, and install your server
certificate under the alias "tomcat". This assumes the certificate was put in the file server.pem:
When answering the questions to identify the certificate, be sure to respond to "First and last
name" with the fully-qualified domain name of your server (e.g. test-dspace.myuni.edu). The other
questions are not important.
c. Optional – ONLY if you need to accept client certificates for the X.509 certificate stackable
authentication module. See the configuration section for instructions on enabling the X.509
authentication method. Load the keystore with the CA (certifying authority) certificates for the
authorities of any clients whose certificates you wish to accept. For example, assuming the client
CA certificate is in client1.pem:
d. Follow the procedure in the section above to add another Connector tag, for the HTTPS port, to
your server.xml file.
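The certificate import steps above (server certificate, server CA certificate, optional client CA certificate) all follow one pattern, which can be sketched with the JDK keytool utility. This is a sketch under stated assumptions, not the manual's original commands: the function name import_cert, the variables KEYTOOL, KEYSTORE, STOREPASS and DRY_RUN, and the aliases ServerCA and client1 are all illustrative. Note that importing a certificate does not add a private key, so the server certificate import only yields a usable HTTPS key entry if the key pair already exists under the alias "tomcat".

```shell
#!/bin/sh
# Sketch only: builds the keytool invocation used for each import step.
# KEYTOOL, KEYSTORE and STOREPASS are illustrative variable names.
KEYTOOL=${KEYTOOL:-keytool}
KEYSTORE=${KEYSTORE:-$CATALINA_BASE/conf/keystore}
STOREPASS=${STOREPASS:-changeit}

# Usage: import_cert ALIAS PEM_FILE
# With DRY_RUN=1 the command is printed instead of executed.
import_cert() {
    alias_name=$1
    pem_file=$2
    set -- "$KEYTOOL" -import -noprompt -trustcacerts \
        -alias "$alias_name" -file "$pem_file" \
        -keystore "$KEYSTORE" -storepass "$STOREPASS"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Examples (aliases other than "tomcat" are hypothetical):
# import_cert tomcat server.pem      # server certificate
# import_cert ServerCA ca.pem        # CA that issued the server certificate
# import_cert client1 client1.pem    # client CA, X.509 authentication only
```

Running with DRY_RUN=1 first lets you review the exact command line before anything is written to the keystore.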
When using Apache 2.4.2 (and lower) in front of a DSpace webapp deployed in Tomcat,
mod_proxy_ajp and possibly mod_proxy_http break the connection to the back end (Tomcat)
prematurely, leading to response mixups. This is reported as Apache bug CVE-2012-3502
(http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-3502) and fixed in Apache 2.4.3
(see http://www.apache.org/dist/httpd/CHANGES_2.4). Only the 2.4.x branch has shown this
problem; the 2.2.x branch has not.
If you choose Apache HTTPD as your primary HTTP server, you can have it forward requests to the Tomcat
servlet container via Apache Jakarta Tomcat Connector. This can be configured to work over SSL as well. First,
you must configure Apache for SSL; for Apache 2.0 see Apache SSL/TLS Encryption for information about
using mod_ssl.
If you are using X.509 Client Certificates for authentication: add these configuration options to the
appropriate httpd configuration file, e.g. ssl.conf, and be sure they are in force for the virtual host and
namespace locations dedicated to DSpace:
Now consult the Apache Jakarta Tomcat Connector documentation to configure the mod_jk (note: NOT mod_jk2)
module. Select the AJP 1.3 connector protocol. Also follow the instructions there to configure your Tomcat
server to respond to AJP.
To use SSL on Apache HTTPD with mod_webapp consult the DSpace 1.3.2 documentation. Apache have
deprecated the mod_webapp connector and recommend using mod_jk.
To use Jetty's HTTPS support consult the documentation for the relevant tool.
You don't have to use CNRI's Handle system. At the moment, you need to change the code a little to use
something else (e.g. PURLs), but that should change soon.
You'll notice that while you've been playing around with a test server, DSpace has apparently been
creating handles for you looking like hdl:123456789/24 and so forth. These aren't really Handles, since
the global Handle system doesn't actually know about them, and lots of other DSpace test installs will
have created the same IDs. They're only really Handles once you've registered a prefix with CNRI (see
below) and have correctly set up the Handle server included in the DSpace distribution. This Handle
server communicates with the rest of the global Handle infrastructure so that anyone that understands
Handles can find the Handles your DSpace has created.
If you want to use the Handle system, you'll need to set up a Handle server. This is included with
DSpace. Note that this is not required in order to evaluate DSpace; you only need one if you are running
a production service. You'll need to obtain a Handle prefix from the central CNRI Handle site.
A Handle server runs as a separate process that receives TCP requests from other Handle servers, and issues
resolution requests to a global server or servers if a Handle entered locally does not correspond to some local
content. The Handle protocol is based on TCP, so it will need to be installed on a server that can send and
receive TCP on port 2641.
1. To configure your DSpace installation to run the handle server, run the following command:
Ensure that [dspace]/handle-server matches whatever you have in dspace.cfg for the handle.dir property.
a. If you are using Windows, the proper command is:
Ensure that [dspace]/handle-server matches whatever you have in dspace.cfg for the handle.dir
property.
2. Edit the resulting [dspace]/handle-server/config.dct file to include the following lines in the "server_config"
clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
This tells the Handle server to get information about individual Handles from the DSpace code.
3. Once the configuration file has been generated, you will need to go to http://hdl.handle.net/4263537/5014
to upload the generated sitebndl.zip file. The upload page will ask you for your contact information. An
administrator will then create the naming authority/prefix on the root service (known as the Global Handle
Registry), and notify you when this has been completed. You will not be able to continue the handle
server installation until you receive further information concerning your naming authority.
4. When CNRI has sent you your naming authority prefix, you will need to edit the config.dct file. The file
will be found in /[dspace]/handle-server. Look for "300:0.NA/YOUR_NAMING_AUTHORITY". Replace
YOUR_NAMING_AUTHORITY with the assigned naming authority prefix sent to you.
5. Now start your handle server (as the dspace user):
[dspace]/bin/start-handle-server
a. If you are using Windows, the proper command is (please replace "[dspace]\handle-server" with
the full path of the handle-server directory):
Ensure that [dspace]/handle-server matches whatever you have in dspace.cfg for the handle.dir
property.
Note that since the DSpace code manages individual Handles, administrative operations such as Handle
creation and modification aren't supported by DSpace's Handle server.
This script will change any handles currently assigned prefix 123456789 to prefix 1303, so for example handle
123456789/23 will be updated to 1303/23 in the database.
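The effect of the prefix change on a single handle can be illustrated with plain shell string handling. This is only a demonstration of the mapping described above; it is not the DSpace script and does not touch the database, and the function name rewrite_handle is hypothetical.

```shell
#!/bin/sh
# Illustration only: shows how one handle string maps to its new form.
# Usage: rewrite_handle HANDLE OLD_PREFIX NEW_PREFIX
rewrite_handle() {
    handle=$1
    old=$2
    new=$3
    case $handle in
        # Only handles that start with "OLD_PREFIX/" are rewritten
        "$old"/*) echo "$new/${handle#"$old"/}" ;;
        # Anything else is passed through unchanged
        *)        echo "$handle" ;;
    esac
}

# Example:
# rewrite_handle 123456789/23 123456789 1303
```

Matching on the prefix followed by a slash avoids rewriting a handle whose prefix merely begins with the same digits.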
Sitemaps allow DSpace to expose its content without the crawlers having to index every page. HTML sitemaps
provide a list of all items, collections and communities in HTML format, whilst Google sitemaps provide the
same information in gzipped XML format.
To generate the sitemaps, you need to run [dspace]/bin/dspace generate-sitemaps. This creates the sitemaps in
[dspace]/sitemaps/.
When running [dspace]/bin/dspace generate-sitemaps, the script informs Google that the sitemaps have been
updated. For this update to register correctly, you must first register your Google sitemap index page
(/dspace/sitemap) with Google at http://www.google.com/webmasters/sitemaps/. If your DSpace server requires
the use of an HTTP proxy to connect to the Internet, ensure that you have set http.proxy.host and
http.proxy.port in [dspace]/config/dspace.cfg.
The URL for pinging Google, and in future, other search engines, is configured in [dspace]/config/dspace.cfg
using the sitemap.engineurls setting where you can provide a comma-separated list of URLs to 'ping'.
You can generate the sitemaps automatically every day using an additional cron job:
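As a rough sketch, a daily crontab entry for this job could be generated as follows. The function name sitemap_cron_entry and the default run time of 06:00 are assumptions chosen for illustration; pick a time that suits your site.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs the sitemap
# generator once a day.
# Usage: sitemap_cron_entry DSPACE_DIR [HOUR]
sitemap_cron_entry() {
    dspace_dir=$1
    hour=${2:-6}   # assumed default: 06:00 every day
    # crontab fields: minute hour day-of-month month day-of-week command
    printf '0 %s * * * %s/bin/dspace generate-sitemaps\n' "$hour" "$dspace_dir"
}

# Example: append the entry to the current user's crontab (review first):
# ( crontab -l 2>/dev/null; sitemap_cron_entry /dspace ) | crontab -
```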
More information on why we highly recommend enabling sitemaps can be found at Search Engine
Optimization (SEO).
2.5.6 Statistics
DSpace uses Apache Solr as the underlying application for statistics. There is no need to download any separate
software; all the necessary software is included. To understand all of the configuration property keys, refer
to DSpace Statistic Configuration for detailed information.
Download the DSpace source from SourceForge and unzip it (WinZip will do this)
If you install PostgreSQL, it's recommended to select to install the pgAdmin III tool. It provides a nice
User Interface for interacting with PostgreSQL databases.
For all path separators use forward slashes (e.g. "/"). For example: "C:/dspace" is a valid Windows path.
But, be warned that "C:\dspace" IS INVALID and will cause errors.
System is up and running. User can see the DSpace home page. [Tomcat/Jetty, firewall, IP assignment,
DNS]
Database is running and working correctly. Attempt to create a user, community or collection
[PostgreSQL, Oracle]. Run the test database command to see if other issues are being reported:
[dspace]/bin/dspace test-database
Email subsystem is running. The user can issue the following command to test the email system. It
attempts to send a test email to the email address that is set in dspace.cfg (mail.admin). If it fails, you will
get messages informing you as to why, which will refer you to the DSpace documentation:
[dspace]/bin/dspace test-email
The known bugs in a release are documented in the KNOWN_BUGS file in the source package.
Please see the DSpace bug tracker for further information on current bugs, and to find out if the bug has
subsequently been fixed. This is also where you can report any further bugs you find.
it usually means you haven't yet added the relevant configuration parameter to your PostgreSQL
configuration (see above), or perhaps you haven't restarted PostgreSQL after making the change.
Also, make sure that the db.username and db.password properties are correctly set in
[dspace]/config/dspace.cfg. An easy way to check that your DB is working OK over TCP/IP is to try this on
the command line:
Enter the dspace database password, and you should be dropped into the psql tool with a
dspace=> prompt.
Another common error looks like this:
This means that the PostgreSQL JDBC driver is not present in [dspace]/lib. See above.
GeoLiteCity Database file fails to download or install, when you run ant fresh_install: There
are two common errors that may occur:
If your error looks like this:
BUILD FAILED
/dspace-release/dspace/target/dspace-1.8.0-build/build.xml:931: java.net.ConnectException: Connection timed out
it means that you likely either (a) don't have an internet connection to download the necessary
GeoLite Database file (used for DSpace Statistics), or (b) the GeoLite Database file's URL is no
longer valid. You should be able to resolve this issue by following the "Manually
Installing/Updating GeoLite Database File" instructions above.
Another common message looks like this:
Again, this means the GeoLite Database file cannot be downloaded or is unavailable for some
reason. You should be able to resolve this issue by following the "Manually Installing/Updating
GeoLite Database File" instructions above.
Database connections don't work, or accessing DSpace takes forever: If you find that when you try
to access a DSpace Web page and your browser sits there connecting, or if the database connections
fail, you might find that a 'zombie' database connection is hanging around preventing normal operation.
To see if this is the case, try running: ps -ef | grep postgres
You might see some processes like this:
This is normal. DSpace maintains a 'pool' of open database connections, which are re-used to
avoid the overhead of constantly opening and closing connections. If they're 'idle' it's OK; they're
waiting to be used.
However sometimes, if something went wrong, they might be stuck in the middle of a query, which
seems to prevent other connections from operating, e.g.:
This means the connection is in the middle of a SELECT operation, and if you're not using
DSpace right that instant, it's probably a 'zombie' connection. If this is the case, try running kill
on the process, and stopping and restarting Tomcat.
3 Upgrading DSpace
This section describes how to upgrade a DSpace installation from one version to the next. Details of the
differences between the functionality of each version are given in the Version History section.
In order to minimize downtime, it is always recommended to first perform a DSpace upgrade using a
Development or Test server. You should note any problems you may have encountered (and also how
to resolve them) before attempting to upgrade your Production server. It also gives you a chance to
"practice" at the upgrade. Practice makes perfect, and minimizes problems and downtime.
Additionally, if you are using a version control system, such as subversion or git, to manage your
locally developed features or modifications, then you can do all of your upgrades in your local version
control system on your Development server and commit the changes. That way your Production
server can just checkout your well tested and upgraded code.
You should perform all of the steps of each upgrade between the version from which you are starting
and the version to which you are upgrading. You do not need to install each intervening version, but
you do need to carry out all of the configuration changes and additions, and all of the database
updates, for each one. For example, when upgrading from 1.6.x to 1.8.x, you need to perform the
configuration & database upgrade steps detailed in Upgrading From 1.6.x to 1.7.x followed by those
detailed in Upgrading From 1.7.x to 1.8.x.
These instructions are valid for any of the following upgrade paths:
For more information about specific fixes released in each 4.x version, please refer to the Release
Notes.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 4.x. Whenever you see these path references,
be sure to replace them with the actual path names on your local system.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
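For the database backup item above, a dated pg_dump command line might be assembled as in the following sketch. The function name backup_cmd and the dump file naming scheme are assumptions; the function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print a pg_dump command that writes a dated dump file.
# Usage: backup_cmd DB_USER DB_NAME BACKUP_DIR [DATE_STAMP]
backup_cmd() {
    db_user=$1
    db_name=$2
    backup_dir=$3
    # Default stamp is today's date, e.g. 20240131
    stamp=${4:-$(date +%Y%m%d)}
    echo "pg_dump -U $db_user -f $backup_dir/$db_name-$stamp.sql $db_name"
}

# Example (user, database and directory are placeholders):
# backup_cmd dspace dspace /var/backups
```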
cd [dspace-source]/dspace/
mvn -U clean package
6. Update DSpace.
a. Update the DSpace installed directory with the new code and libraries. Issue the following
commands:
cd [dspace-source]/dspace/target/dspace-[version]-build
ant update
b. Updating to 4.7 database schema: The database schema has minor updates in 4.7, so you will
need to update your existing DSpace 4.x database. Please use the appropriate command and
SQL script to update your database:
PostgreSQL: psql --user [dspace-dbms-user] -f [dspace-source]/dspace/etc/postgres/database_schema_4-47.sql [dspace-database]
You should be prompted for the database password.
Oracle: sqlplus [dspace-dbms-user]/[database password] @[dspace-source]/dspace/etc/oracle/database_schema_4-47.sql
NOTE: [dspace-dbms-user] will be the value of db.username in config/dspace.cfg.
The database password will be the value of db.password. [dspace-database] will be
the part of db.url following the last slash.
c. Updating to 4.8 database schema: The database schema has minor updates in 4.8, so you will
need to update your existing DSpace 4.7 database. (NOTE: ensure your database has been
upgraded to 4.7 prior to updating to 4.8.) Please use the appropriate command and SQL script to
update your database:
PostgreSQL: psql --user [dspace-dbms-user] -f [dspace-source]/dspace/etc/postgres/database_schema_4-48.sql [dspace-database]
You should be prompted for the database password.
Oracle: sqlplus [dspace-dbms-user]/[database password] @[dspace-source]/dspace/etc/oracle/database_schema_4-48.sql
NOTE: [dspace-dbms-user] will be the value of db.username in config/dspace.cfg.
The database password will be the value of db.password. [dspace-database] will be
the part of db.url following the last slash.
7. Check whether your DSpace instance is affected by either of the below bugs. There were a few
database level bugs resolved in DSpace 4.1 and 4.2, which may require some institutions to run a script
on their database content to resolve them. These do NOT affect all institutions, but you should be aware
of them:
a. Fixing the effects of DS-1536 - If your institution uses a Handle prefix which contains a period (e.g.
123.456/x), then you should run the recommended scripts (see below) on your database.
b. Fixing the effects of DS-2036 - If your institution uses an Oracle database backend with Discovery
(for search/browse), then you should run the recommended script (see below) on your database.
8. Update your DSpace Configurations (if needed). There are no new required configurations in DSpace
4.1 or 4.2. So, your existing DSpace 4.x configurations should work fine.
9. Refresh Browse and Search Indexes. Though there are not any database changes, it is a good policy
to rebuild your search and browse indexes when upgrading to a new release. To do this, run the
following command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-discovery -f
a. If you're still using Lucene (you disabled Discovery): If you are using Lucene for
search/browse, you will also need to refresh Lucene indexes by running the following command:
[dspace]/bin/dspace index-lucene-init
10. Deploy Web Applications. If necessary, copy the web applications files from your
[dspace]/webapps directory to the subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
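The schema update commands in the steps above take [dspace-dbms-user] and [dspace-database] from dspace.cfg. A minimal sketch of reading those values in shell follows; the function names cfg_value and db_name_from_url are hypothetical, and the parsing assumes simple "key = value" lines with no trailing whitespace or comments on the value.

```shell
#!/bin/sh
# Hypothetical helpers for reading database settings from dspace.cfg.

# Print the value of the first "KEY = value" line in FILE.
# Usage: cfg_value FILE KEY
# Note: dots in KEY are treated as regex wildcards, which is harmless here.
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Print the part of a JDBC URL that follows the last slash.
# Usage: db_name_from_url URL
db_name_from_url() {
    echo "${1##*/}"
}

# Example (the path is a placeholder):
# cfg=/dspace/config/dspace.cfg
# db_user=$(cfg_value "$cfg" db.username)
# db_name=$(db_name_from_url "$(cfg_value "$cfg" db.url)")
```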
The following code is tested to work with Postgres. The Oracle code is believed to work, but hasn't
been tested on an affected instance. Make sure to have a proper database backup before trying either
version and to verify whether it fixed the problem before moving on.
UPDATE metadatavalue
SET text_value = 'http://hdl.handle.net/'||handle
FROM metadatafieldregistry, handle
WHERE text_value = 'http://hdl.handle.net/XXXXX'
AND metadatafieldregistry.metadata_field_id = metadatavalue.metadata_field_id
AND metadatafieldregistry.element = 'identifier'
AND metadatafieldregistry.qualifier = 'uri'
AND handle.resource_type_id = 2
AND handle.resource_id = metadatavalue.item_id;
/* NOTE, you'll need to run this code in two stages: in SQL Developer, first run the query below,
then cut/paste the results back into a new query, and run that query */
These instructions are valid for any of the following upgrade paths:
For more information about specific fixes released in each 4.x version, please refer to the appropriate
release notes:
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 4.0. Whenever you see these path references,
be sure to replace them with the actual path names on your local system.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
cd [dspace-source]/dspace/
mvn -U clean package
In the DSpace 4.0 release, the above "mvn -U clean package" command must be run from the
root source directory (i.e. [dspace-source]), otherwise you will receive build errors. This was
a small (but annoying) bug in our Maven build process, which is fixed in the 4.1 release (see
DS-1867)
6. Update DSpace.
a. Update the DSpace installed directory with the new code and libraries. Issue the following
commands:
cd [dspace-source]/dspace/target/dspace-[version]-build
ant update
b. The database schema has changed in 4.0. So, you will need to update your existing DSpace 3.x
database. Please use the appropriate command and SQL script to update your database:
PostgreSQL:
Upgrade database to 4.0: psql --user [dspace-dbms-user] -f [dspace-source]/dspace/etc/postgres/database_schema_3-4.sql [dspace-database]
You should be prompted for the database password.
Upgrade database to 4.7: psql --user [dspace-dbms-user] -f [dspace-source]/dspace/etc/postgres/database_schema_4-47.sql [dspace-database]
You should be prompted for the database password.
Upgrade database to 4.8: psql --user [dspace-dbms-user] -f [dspace-source]/dspace/etc/postgres/database_schema_4-48.sql [dspace-database]
You should be prompted for the database password.
Oracle:
Upgrade database to 4.0: sqlplus [dspace-dbms-user]/[database password] @[dspace-source]/dspace/etc/oracle/database_schema_3-4.sql
Upgrade database to 4.7: sqlplus [dspace-dbms-user]/[database password] @[dspace-source]/dspace/etc/oracle/database_schema_4-47.sql
Upgrade database to 4.8: sqlplus [dspace-dbms-user]/[database password] @[dspace-source]/dspace/etc/oracle/database_schema_4-48.sql
[dspace-dbms-user] will be the value of db.username in config/dspace.cfg. The database
password will be the value of db.password. [dspace-database] will be the part of db.url
following the last slash.
7. Update your DSpace Configurations. You should review your configuration for new and changed
configurations in DSpace 4.0. In the specific case of dspace.cfg it is recommended to start with a fresh
copy of the file from the new version and copy your site-specific settings from the old file. Read the new
file carefully to see if you need (or want) other alterations. Please notice that the default search and
browse support has changed from the old Lucene/DBMS-based method to Discovery.
8. Deploy Web Applications. If necessary, copy the web applications files from your
[dspace]/webapps directory to the subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
wget http://localhost:8080/solr/<core>/update?optimize=true
Depending on your set-up, some or all of the following cores may exist: search, statistics, oai. To see
what solr cores exist in your set-up, look for the directories in [dspace]/solr - each directory
corresponds to a solr core.
If you have been through several DSpace upgrades and have not done this, there is a chance that your
indexes are in a format too old for the most recent Solr to convert:
Format error
You may be able to use your older DSpace installation before upgrading it, to upgrade your indexes
enough that a second optimization after upgrading DSpace will succeed. You can also use an external
tool with its own version of the required libraries, such as Luke. You will need a version of Lucene (the
indexing library used by Solr) which can convert the earlier version into the later one. So, for example,
you could use the Solr webapp in DSpace 1.8.x or 3.x, or Luke 3.5, to upgrade version 2 indexes to
version 3. Once you have version 3 index files, you should be able to upgrade to DSpace 4.x, and the
next optimization you do will upgrade the indexes again to version 4.
11. Refresh Browse and Search Indexes. DSpace 4 relies on Solr-based Discovery for both search and
browse purposes. To update the Discovery indexes, run the following command from your DSpace install
directory as the dspace user.
[dspace]/bin/dspace index-discovery -f
12. Check your cron / Task Scheduler jobs. The index maintenance commands' names have changed to
make them clearer. You will need to update your scripts.
These instructions are valid for any of the following upgrade paths:
For more information about specific fixes released in each 3.x version, please refer to the appropriate
release notes:
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 3.x. Whenever you see these path references,
be sure to replace them with the actual path names on your local system.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
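The backup items listed above could be scripted roughly as follows. This is a sketch only: it prints the commands instead of running them, and the function name, the /backup target directory and the dspace database and user names are assumptions for illustration.

```shell
#!/bin/sh
# Print (not run) backup commands for a DSpace instance on PostgreSQL.
# All names below (function, paths, database, user) are illustrative assumptions.
print_backup_commands() {
    # $1 = DSpace install dir, $2 = database name, $3 = database user, $4 = backup dir
    echo "pg_dump -U $3 -f $4/$2-db.sql $2"
    echo "tar -czf $4/assetstore.tar.gz -C $1 assetstore"
    echo "tar -czf $4/config.tar.gz -C $1 config"
}

print_backup_commands /dspace dspace dspace /backup
```

Additional assetstores configured through "assetstore.dir.#" and any local customizations are not covered by this sketch and need their own archive commands.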
cd [dspace-source]/dspace/
mvn -U clean package
5. Stop Tomcat. Take down your servlet container. For Tomcat, use the $CATALINA_HOME/bin/shutdown.sh
script. (Many Unix-based installations will have a startup/shutdown script in the /etc/init.d or
/etc/rc.d directories.)
6. Update DSpace.
a. Update the DSpace installed directory with the new code and libraries. Issue the following
commands:
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant update
b. No database changes have been made in DSpace 3.1 or 3.2. So, there is no need to update your
existing DSpace 3.x database.
7. Update your DSpace Configurations (if needed). There are no new required configurations in DSpace
3.1 or 3.2. So, your existing DSpace 3.x configurations should work fine. However, there are a few minor,
optional updates to be aware of:
a. In DSpace 3.0, the OAI Harvester (XMLUI only) settings were accidentally removed from the
[dspace]/config/modules/oai.cfg file. They have been reinstated in DSpace 3.1 or
above. Please see DS-1461 for more info.
b. In DSpace 3.2, it is again possible to customize the <description> tag of the OAI-PMH "Identify"
response. This is achieved through a new (optional) configuration "description.file" within the
[dspace]/config/modules/oai.cfg file. Please see DS-1479 for more info.
8. Refresh Browse and Search Indexes. Though there are not any database changes, it is a good policy
to rebuild your search and browse indexes when upgrading to a new release. To do this, run the
following command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-init
a. Refresh Discovery: If you are using Discovery (Solr) for search/browse, you will also need to
refresh Discovery indexes by running the following command:
[dspace]/bin/dspace update-discovery-index -f
9. Deploy Web Applications. If necessary, copy the web applications files from your [dspace]/webapps
directory to the subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 3.0. Whenever you see these path references,
be sure to replace them with the actual path names on your local system. You should also check the
DSpace Release 3.0 Notes to see what changes are in this version.
In DSpace 3.0 there have been a few significant changes to how you upgrade and configure DSpace.
Notably:
A build.properties file has been introduced: This file provides a convenient place to set the
most commonly used configuration properties held in dspace.cfg. For a more detailed
explanation please refer to the Installing DSpace and Configuration Reference sections.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
cd [dspace-source]/dspace/
mvn -U clean package
6. Update DSpace.
a. Update the DSpace installed directory with the new code and libraries. Issue the following
commands:
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant update
[dspace]/bin/dspace index-init
a. Refresh Discovery: If you are using Discovery (Solr) for search/browse, you will also need to
refresh Discovery indexes by running the following command:
[dspace]/bin/dspace update-discovery-index -f
9. Update OAI-PMH indexes. DSpace 3.0 comes with a brand new OAI 2.0 Server which uses a Solr
backend by default. As such, it needs to have its indexes updated on a regular basis. To update the OAI
2.0 indexes, you should run the following command:
This same 'dspace oai import' command should also be run on a regular basis (e.g. via
cron) to keep the OAI 2.0 indexes in sync. For more information, see the Scheduled Tasks
section of the OAI 2.0 documentation.
If you are using OAI-PMH, but do not yet have the Solr webapp ([dspace]/webapps/solr/)
installed, you will need to:
(1) EITHER Modify the default OAI 2.0 config file ([dspace]/config/modules/oai.cfg) to
use a database backend. See: OAI 2.0 Server#UsingDatabase
(2) OR install/enable Tomcat to use the DSpace Solr webapp ([dspace]/webapps/solr/),
and optionally configure the OAI settings in [dspace]/config/modules/oai.cfg
search.anonymous = true
A new feature in 3.0 is that you can now put users into DSpace groups based on a part of their DN in
LDAP. See the new login.groupmap.* options in Authentication
Plugins#ConfiguringLDAPAuthentication.
11. Deploy Web Applications. If necessary, copy the web applications files from your [dspace]/webapps
directory to the subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.8. Whenever you see these path references,
be sure to replace them with the actual path names on your local system. You should also check the
DSpace Release 1.8.0 Notes to see what changes are in this version.
In DSpace 1.8.0, there have been a few significant changes to how you upgrade and configure
DSpace. Notably:
The dspace.cfg has been "split up": Many "module" configurations have now been moved
out of the 'dspace.cfg' and into separate configuration files in the [dspace]/config/modules/
directory.
Authentication Configurations are now in [dspace]/config/modules/authentication*.cfg files
Batch Metadata Editing Configurations are now in the [dspace]/config/modules/bulkedit.cfg file
Discovery Configurations are now in the [dspace]/config/modules/discovery.cfg file
OAI-PMH / OAI-ORE Configurations are now in the [dspace]/config/modules/oai.cfg file
Solr Statistics Configurations are now in the [dspace]/config/modules/solr-statistics.cfg file
SWORD Configurations are now in [dspace]/config/modules/sword*.cfg files
All other DSpace configurations are still in the dspace.cfg configuration file.
Behavior of 'ant update' has changed: The ant update upgrade command now defaults to
replacing any existing configuration files (though the existing configuration files will first be
backed up to a file with the suffix *.old).
In prior versions of DSpace (before 1.8.0), this ant update command would leave
existing configuration files intact (and you would have to manually merge in new
configuration settings, which would be in a file with the suffix *.new). If you prefer this
previous behavior, you can still achieve the same result by running:
ant -Doverwrite=false update
WARNING: If you choose to run ant -Doverwrite=false update please be aware
that this will not auto-upgrade any of your configuration files. This means you must
closely watch the output of this command, and ensure you manually upgrade all
configuration files in the [dspace]/config/ directory as well as all Solr configurations/schemas
in the [dspace]/solr/search/conf/ and [dspace]/solr/statistics/conf/ directories.
The structure of the source release has now been changed: Please see Advanced
Customisation for more details.
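When ant -Doverwrite=false update is used, the files that still need manual merging can be located with a short script. This is a minimal sketch, assuming the new settings are written next to each existing file with the .new suffix as described above; list_pending_configs and CONFIG_DIR are hypothetical names.

```shell
#!/bin/sh
# List every *.new file under a config directory next to the file it would replace.
list_pending_configs() {
    # $1 = configuration directory (e.g. [dspace]/config)
    find "$1" -type f -name '*.new' | sort | while IFS= read -r new; do
        # Strip the ".new" suffix to get the path of the existing file.
        printf '%s -> %s\n' "$new" "${new%.new}"
    done
}

# Example usage (CONFIG_DIR is an assumed variable; adjust it to your installation).
CONFIG_DIR="${CONFIG_DIR:-/dspace/config}"
if [ -d "$CONFIG_DIR" ]; then
    list_pending_configs "$CONFIG_DIR"
fi
```

Each pair can then be compared with diff -u before the new settings are merged by hand. The same approach applies to the Solr conf directories mentioned in the warning.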
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
cd [dspace-source]/dspace/
mvn -U clean package
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
The ant update script has changed slightly as of DSpace 1.8. It now defaults to
replacing your existing configuration files (after backing them up first). See the Changes
to the DSpace 1.8 Upgrade / Configuration Process note at the top of this page for more
details.
b. No database changes have been made in either 1.8.1 or 1.8.2. So there is no need to update your
existing 1.8 database.
6. Update your DSpace Configurations.
a. There are no new configurations for the 1.8.1 or 1.8.2 releases. Your existing 1.8 configuration
files should work fine.
7. Refresh Browse and Search Indexes. Though there are not any database changes, it is a good policy
to rebuild your search and browse indexes when upgrading to a new release. To do this, run the
following command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-init
8. Deploy Web Applications. If necessary, copy the web applications files from your [dspace]/webapps
directory to the subdirectory of your servlet container (e.g. tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.8. Whenever you see these path references,
be sure to replace them with the actual path names on your local system. You should also check the
DSpace Release 1.8.0 Notes to see what changes are in this version.
In DSpace 1.8.0, there have been a few significant changes to how you upgrade and configure
DSpace. Notably:
The dspace.cfg has been "split up": Many "module" configurations have now been moved
out of the 'dspace.cfg' and into separate configuration files in the [dspace]/config/modules/
directory.
Authentication Configurations are now in [dspace]/config/modules/authentication*.cfg files
Batch Metadata Editing Configurations are now in the [dspace]/config/modules/bulkedit.cfg file
Discovery Configurations are now in the [dspace]/config/modules/discovery.cfg file
OAI-PMH / OAI-ORE Configurations are now in the [dspace]/config/modules/oai.cfg file
Solr Statistics Configurations are now in the [dspace]/config/modules/solr-statistics.cfg file
SWORD Configurations are now in [dspace]/config/modules/sword*.cfg files
All other DSpace configurations are still in the dspace.cfg configuration file.
Behavior of 'ant update' has changed: The ant update upgrade command now defaults to
replacing any existing configuration files (though the existing configuration files will first be
backed up to a file with the suffix *.old).
In prior versions of DSpace (before 1.8.0), this ant update command would leave
existing configuration files intact (and you would have to manually merge in new
configuration settings, which would be in a file with the suffix *.new). If you prefer this
previous behavior, you can still achieve the same result by running:
ant -Doverwrite=false update
WARNING: If you choose to run ant -Doverwrite=false update please be aware
that this will not auto-upgrade any of your configuration files. This means you must
closely watch the output of this command, and ensure you manually upgrade all
configuration files in the [dspace]/config/ directory as well as all Solr configurations/schemas
in the [dspace]/solr/search/conf/ and [dspace]/solr/statistics/conf/ directories.
The structure of the source release has now been changed: Please see Advanced
Customisation for more details.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
cd [dspace-source]/dspace/
mvn -U clean package
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
The ant update script has changed slightly as of DSpace 1.8.0. It now defaults to
replacing your existing configuration files (after backing them up first). See the Changes
to the DSpace 1.8 Upgrade / Configuration Process note at the top of this page for more
details.
b. Apply database changes to your database by running one of the following database schema
upgrade scripts.
Applying a database change will alter your database! The database upgrade scripts
have been tested, however, there is always a chance something could go wrong. So, do
yourself a favor and create a backup of your database before you run a script that will
alter your database.
i. PostgreSQL: [dspace-source]/dspace/etc/postgres/database_schema_17-18.sql
ii. Oracle: [dspace-source]/dspace/etc/oracle/database_schema_17-18.sql
6.
b. Set New Configurations: There are new configuration settings in the new release that add or
change functionality. You should review these new settings and ensure that they are set according
to your needs.
i. New settings for Creative Commons licensing in dspace.cfg
ii. New settings for RSS feeds (see "webui.feed.podcast.*") in dspace.cfg which now
support richer features, such as iTunes podcast and publishing to iTunesU
iii. Several major configuration sections have now been removed from the dspace.cfg
and separated into their own config files. Configuration sections which have been
moved include Authentication settings, Batch Metadata Editing settings, Discovery settings,
OAI-PMH/OAI-ORE settings, Statistics settings and SWORD settings. So, any
configurations from these sections should be removed from your existing dspace.cfg file, as
they will be ignored. For more information, see the Changes to the DSpace 1.8 Upgrade /
Configuration Process note at the top of this page.
iv. Several new configurations files have been created in the [dspace]/config/modules/
directory. Each of these corresponds to a new feature in 1.8.0 (or a configuration section
which has now been moved out of the dspace.cfg file):
authentication-*.cfg files : new location for Authentication Configurations.
bulkedit.cfg : new location for Batch Metadata Editing Configurations.
discovery.cfg : new location for Discovery Configurations.
fetchccdata.cfg : configuration for new "Fetch CC Data" Curation Task.
oai.cfg : new location for OAI-PMH / OAI-ORE Configurations.
solr-statistics.cfg : new location for Solr Statistics Configurations.
spring.cfg : configuration file for DSpace Service Manager (should not need
modification).
submission-curation.cfg - configuration file for new Virus Scanning on
Submission feature.
sword-client.cfg : configuration file for new SWORDv1 Client feature.
sword-server.cfg : new location for SWORDv1 Server Configurations.
swordv2-server.cfg : configuration file for new SWORDv2 Server feature.
translator.cfg : configuration for new "Microsoft Translator" Curation Task.
workflow.cfg : configuration for new Configurable Workflow feature.
v. Finally, there is a new [dspace]/config/spring/ directory which holds Spring
Framework configuration files. The vast majority of users should never need to modify
these settings, but they are available for hardcore developers who wish to add new
features via the DSpace Services Framework (based on Spring Framework).
7. Generate Browse and Search Indexes. The search mechanism has been updated in 1.8, so you must
perform a full reindex of your site for searching and browsing to work. To do this, run the following
command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-init
8. Deploy Web Applications. If necessary, copy the web applications files from your [dspace]/webapps
directory to the subdirectory of your servlet container (e.g. tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
Updating the file statistics will ensure that old file download statistics can also be filtered using the filter
bundle feature. The benefit of upgrading is that only files within, for example, the "ORIGINAL" bundle are shown,
as opposed to also showing statistics from the LICENSE bundle. More information about this feature can be
found at Statistics differences between DSpace 1.7.x and 1.8.0.
Applying this change will involve dumping all the old file statistics into a file and re-loading them.
Therefore it is wise to create a backup of the [dspace]/solr/statistics/data directory. It is best to create
this backup when the Tomcat/Jetty/Resin server program isn't running.
When a backup has been made, start the Tomcat/Jetty/Resin server program.
The update script has one option (-r) which will, if given, not only update the broken file statistics but also
delete statistics for files that were removed from the system. If this option isn't active, these statistics will receive
the "BITSTREAM_DELETED" bundle name.
#The -r is optional
[dspace]/bin/dspace stats-util -b -r
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.7.2. Whenever you see these path
references, be sure to replace them with the actual path names on your local system. Additionally, be
sure to backup your configs, source code modifications, and database before doing a step that could
destroy your instance.
cd [dspace-source]/dspace/
mvn -U clean package
6. Update DSpace. Update the DSpace installed directory with the new code and libraries. Issue the
following commands:
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
7. Generate Browse and Search Indexes. Though there are no database changes between the 1.7 and
1.7.1 releases, it is good policy to rebuild your search and browse indexes when upgrading to a new
release. To do this, run the following command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-init
8. Deploy Web Applications. Copy the web applications files from your [dspace]/webapps directory to
the subdirectory of your servlet container (e.g. tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
9. Restart servlet container. Now restart your Tomcat/Jetty/Resin server program and test out the
upgrade.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.7.x. Whenever you see these path references,
be sure to replace them with the actual path names on your local system.
Upgrade Steps
Before upgrading you need to check you are using the current recommended minimum versions of
Java (1.6), Maven (2.0.8 or above) and ant (1.7 or above). For more details, see the current listing of
Prerequisite Software
1. Backup Your DSpace. First, and foremost, make a complete backup of your system, including:
A snapshot of the database. To have a "snapshot" of the PostgreSQL database, you need to shut
it down during the backup. You should also have your regular PostgreSQL Backup output (using
Postgres' pg_dump command).
The asset store ([dspace]/assetstore by default)
Your configuration files and customizations to DSpace (including any customized scripts).
2. Download DSpace 1.7.x. Retrieve the new DSpace 1.7.x source code either as a download from
DSpace.org or check it out directly from the SVN code repository. If you downloaded DSpace, do not
unpack it on top of your existing installation. Refer to Installation Instructions, Step 3 for unpacking
directives.
3. Stop Tomcat. Take down your servlet container. For Tomcat, use the $CATALINA_HOME/bin/shutdown.sh
script. (Many Unix-based installations will have a startup/shutdown script in the /etc/init.d
or /etc/rc.d directories).
4. Apply any customizations. If you have made any local customizations to your DSpace installation they
will need to be migrated over to the new DSpace. These are normally housed in one of the following
places:
JSPUI modifications: [dspace-source]/dspace/modules/jspui/src/main/webapp/
XMLUI modifications: [dspace-source]/dspace/modules/xmlui/src/main/webapp/
5. Update Configuration Files. Some parameters have changed and some are new. You can either
attempt to make these changes in your current 1.6.x dspace.cfg file, or you can start with a new 1.7
dspace.cfg and re-modify it as needed. Configuration changes are noted below:
*CORRECTION* There was a missing hyphen "-" in the property key for mail character set:
# Set the default mail character set. This may be over ridden by providing a line
# inside the email template "charset: <encoding>", otherwise this default is used.
#mail.charset = UTF-8
*CORRECTION* This was moved from the end of the solr configuration section to just under
Logging Configurations:
# If enabled, the logging and the solr statistics system will look for
# an X-Forward header. If it finds it, it will use this for the user IP Address
# useProxies = true
*CHANGE* The MediaFilter is now able to process PowerPoint files (PowerPoint Text Extractor)
*CHANGE* The Crosswalk Plugin Configuration has changed with additional lines. Edit your file
accordingly:
plugin.selfnamed.org.dspace.content.crosswalk.IngestionCrosswalk = \
org.dspace.content.crosswalk.XSLTIngestionCrosswalk, \
org.dspace.content.crosswalk.QDCCrosswalk
plugin.named.org.dspace.content.crosswalk.StreamIngestionCrosswalk = \
org.dspace.content.crosswalk.NullStreamIngestionCrosswalk = NULLSTREAM, \
org.dspace.content.crosswalk.CreativeCommonsRDFStreamIngestionCrosswalk = DSPACE_CCRDF, \
org.dspace.content.crosswalk.LicenseStreamIngestionCrosswalk = DSPACE_DEPLICENSE
plugin.named.org.dspace.content.crosswalk.DisseminationCrosswalk = \
org.dspace.content.crosswalk.AIPDIMCrosswalk = DIM, \
org.dspace.content.crosswalk.AIPTechMDCrosswalk = AIP-TECHMD, \
org.dspace.content.crosswalk.SimpleDCDisseminationCrosswalk = DC, \
org.dspace.content.crosswalk.SimpleDCDisseminationCrosswalk = dc, \
org.dspace.content.crosswalk.PREMISCrosswalk = PREMIS, \
org.dspace.content.crosswalk.METSDisseminationCrosswalk = METS, \
org.dspace.content.crosswalk.METSDisseminationCrosswalk = mets, \
org.dspace.content.crosswalk.METSRightsCrosswalk = METSRIGHTS, \
org.dspace.content.crosswalk.OREDisseminationCrosswalk = ore, \
org.dspace.content.crosswalk.DIMDisseminationCrosswalk = dim, \
org.dspace.content.crosswalk.RoleCrosswalk = DSPACE-ROLES
*NEW*
plugin.named.org.dspace.content.crosswalk.StreamDisseminationCrosswalk = \
org.dspace.content.crosswalk.CreativeCommonsRDFStreamDisseminationCrosswalk = DSPACE_CCRDF, \
org.dspace.content.crosswalk.CreativeCommonsTextStreamDisseminationCrosswalk = DSPACE_CCTEXT, \
org.dspace.content.crosswalk.LicenseStreamDisseminationCrosswalk = DSPACE_DEPLICENSE
*CHANGE* The Packager Plugin Configuration has changed considerably. Carefully revise your
configuration file:
plugin.named.org.dspace.content.packager.PackageIngester = \
org.dspace.content.packager.DSpaceAIPIngester = AIP, \
org.dspace.content.packager.PDFPackager = Adobe PDF, PDF, \
org.dspace.content.packager.DSpaceMETSIngester = METS, \
org.dspace.content.packager.RoleIngester = DSPACE-ROLES
*CHANGE* The METS Ingester configuration has changed. Carefully edit:
# Default Option to make use of collection templates when using the METS ingester (default is false)
mets.default.ingest.useCollectionTemplate = false
# Locally cached copies of METS schema documents to save time on ingest. This
# will often speed up validation & ingest significantly. Before enabling
# these settings, you must manually cache all METS schemas in
*NEW* A new property has been added to control the discovery index for the Event System
Configuration:
*NEW* License bundle display is now configurable. You are able to either display or suppress.
# whether to display the contents of the licence bundle (often just the deposit
# licence in standard DSpace installation)
webui.licence_bundle.show = false
*CORRECTION* Thumbnail generation. The width and height of generated thumbnails had a
missing equal sign.
*CORRECTION and ADDITION* Authority Control Settings have changed. Formerly called
ChoiceAuthority, it is now referred to as DCInputAuthority.
#plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
# org.dspace.content.authority.DCInputAuthority, \
# org.dspace.content.authority.DSpaceControlledVocabulary
*NEW* You are now able to order your bitstreams by sequence id or file name.
*NEW* DSpace now includes a metadata mapping feature that makes repository content
discoverable by Google Scholar:
# Enabling this property will concatenate CSS, JS and JSON files where possible.
# CSS files can be concatenated if multiple CSS files with the same media attribute
# are used in the same page. Links to the CSS files are automatically referring to the
# concatenated resulting CSS file.
# The theme sitemap should be updated to use the ConcatenationReader for all js, css and json
# files before enabling this property.
#xmlui.theme.enableConcatenation = false
# Enabling this property will minify CSS, JS and JSON files where possible.
# The theme sitemap should be updated to use the ConcatenationReader for all js, css and json
# files before enabling this property.
#xmlui.theme.enableMinification = false
*NEW* XMLUI Mirage Theme. This is a new theme with its own configuration:
# DSpace by default uses 100 records as the limit for the oai responses.
# This can be altered by enabling the oai.response.max-records parameter
# and setting the desired amount of results.
oai.response.max-records = 100
----
cd [dspace-source]/dspace/
mvn -U clean package
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
8. Update the Database. You will need to run the 1.6.x to 1.7.x database upgrade script.
For PostgreSQL:
For Oracle: Execute the upgrade script, e.g. with sqlplus, recording the output:
a. Start SQL*Plus with sqlplus [connect args]
b. Record the output: SQL> spool 'upgrade.lst'
c. Run the upgrade script SQL> @[dspace-source]/dspace/etc/oracle/database_schema_16-17.sql
d. Turn off recording of output: SQL> spool off
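The four SQL*Plus steps above can also be collected into a small command file so that the output is always recorded. The following is a sketch, not an official DSpace script: the function name and the file names in the usage line are assumptions, and [connect args] still has to be supplied by you.

```shell
#!/bin/sh
# Emit a SQL*Plus command file that spools output, runs one upgrade script, and exits.
sqlplus_upgrade_script() {
    # $1 = path to the upgrade SQL file, $2 = spool (log) file
    printf "spool '%s'\n@%s\nspool off\nexit\n" "$2" "$1"
}

# Example usage: print the command file for review (the path is a placeholder).
sqlplus_upgrade_script /dspace-src/dspace/etc/oracle/database_schema_16-17.sql upgrade.lst
```

The printed lines can be redirected to a file such as run_upgrade.sql and started with sqlplus [connect args] @run_upgrade.sql. Check the spool file afterwards for errors.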
9. Generate Browse and Search Indexes. It's always good policy to rebuild your search and browse
indexes when upgrading to a new release. To do this, run the following command from your DSpace
install directory (as the 'dspace' user):
[dspace]/bin/dspace index-init
10. Deploy Web Applications. If your servlet container (e.g. Tomcat) is not configured to look for new web
applications in your [dspace]/webapps directory, then you will need to copy the web applications files
into the appropriate subdirectory of your servlet container. For example:
cp -R [dspace]/webapps/* [tomcat]/webapps/
11. Restart servlet container. Now restart your Tomcat/Jetty/Resin server program and test out the
upgrade.
12. Add a new crontab entry, or add to your system's scheduler, the following, run as the DSpace user, to
enable routine maintenance of your Solr indexes. If you do not run this command daily, it is likely your
production instances of DSpace will exhaust the available memory in your servlet container:
[dspace]/bin/dspace stats-util -o
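A daily crontab line for this command might be built as follows. This is a sketch: the function name, the /dspace install path and the 04:00 run time are assumptions, chosen only to illustrate the five-field cron format (minute, hour, day of month, month, day of week).

```shell
#!/bin/sh
# Build a crontab line that runs the Solr statistics optimization once a day.
stats_cron_line() {
    # $1 = DSpace install dir, $2 = hour (0-23), $3 = minute (0-59)
    printf '%s %s * * * %s/bin/dspace stats-util -o\n' "$3" "$2" "$1"
}

stats_cron_line /dspace 4 0
# prints: 0 4 * * * /dspace/bin/dspace stats-util -o
```

Add the resulting line to the crontab of the DSpace user (for example with crontab -e) so that the command runs with the correct permissions.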
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.6.1. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
1. Backup Your DSpace. First, and foremost, make a complete backup of your system, including:
A snapshot of the database. To have a "snapshot" of the PostgreSQL database, you need to shut
it down during the backup. You should also have your regular PostgreSQL Backup output (using
pg_dump commands).
The asset store ([dspace]/assetstore by default)
Your configuration files and customizations to DSpace (including any customized scripts).
2. Download DSpace 1.6.2. Retrieve the new DSpace 1.6.2 source code either as a download from
DSpace.org or check it out directly from the SVN code repository. If you downloaded DSpace, do not
unpack it on top of your existing installation. Refer to Installation Instructions, Step 3 for unpacking
directives.
3. Stop Tomcat. Take down your servlet container. For Tomcat, use the $CATALINA_HOME/bin/shutdown.sh script.
(Many installations will have a startup/shutdown script in the /etc/init.d or /etc/rc.d directories.)
4. Apply any customizations. If you have made any local customizations to your DSpace installation they
will need to be migrated over to the new DSpace. These are housed in one of the following places:
JSPUI modifications: [dspace-source]/dspace/modules/jspui/src/main/webapp/
XMLUI modifications: [dspace-source]/dspace/modules/xmlui/src/main/webapp/
5. Update Configuration Files. There are no additions to this release. So you do not have to update the
configuration files.
6. Build DSpace. Run the following commands to compile DSpace:
cd [dspace-source]/dspace/
mvn -U clean package
7. Update DSpace. Update the DSpace installed directory with the new code and libraries. Issue the
following commands:
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
8. Run Registry Format Update for CC License. Creative Commons licenses have been assigned the
wrong mime-type in past versions of DSpace. Even if you are not currently using CC Licenses, you
should update your Bitstream Format Registry to include a new entry with the proper mime-type. To
update your registry, run the following command:
[dspace]/bin/dspace registry-loader -bitstream [dspace]/etc/upgrades/15-16/new-bitstream-formats.xml
9. Update the Database. If you are using Creative Commons Licenses in your DSpace submission
process, you will need to run the 1.5.x to 1.6.x database upgrade script again. In 1.6.0 the improper
mime-type was being assigned to all CC Licenses. This has now been resolved, and rerunning the
upgrade script will now assign the proper mime-type to all existing CC Licenses in your DSpace
installation. NOTE: You will receive messages that most of the script additions already exist. This is
normal, and nothing to be worried about.
For PostgreSQL: psql -U [dspace-user] -f [dspace-source]/dspace/etc/postgres/database_schema_15-16.sql [database name] (Your database name is by default 'dspace').
Example:
psql -U dspace -f /dspace-1.6-1-src-release/dspace/etc/postgres/database_schema_15-16.sql dspace
For Oracle: Execute the upgrade script, e.g. with sqlplus, recording the output:
a. Start SQL*Plus with sqlplus [connect args]
b. Record the output: SQL> spool 'upgrade.lst'
c. Run the upgrade script: SQL> @[dspace-source]/dspace/etc/oracle/database_schema_15-16.sql
d. Turn off recording of output: SQL> spool off
e. Please note: The final few statements WILL FAIL. That is because you have to run some
queries and use the results to construct the statements to remove the constraints manually;
Oracle doesn't have any easy way to automate this (unless you know PL/SQL). So, look
for the comment line beginning:
and follow the instructions in the actual SQL file. Refer to the contents of the spool file
"upgrade.lst" for the output of the queries you'll need.
10. Generate Browse and Search Indexes. Although there are no database changes in the 1.6 to 1.6.1
release, it is good practice to rebuild your search and browse indexes when upgrading to a new
release. To do this, run the following command from your DSpace install directory (as the dspace user):
[dspace]/bin/dspace index-init
11. Deploy Web Applications. Copy the web application files from your [dspace]/webapps directory to the
webapps subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
12. Restart servlet. Now restart your Tomcat/Jetty/Resin server program and test out the upgrade.
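The PostgreSQL command in Step 9 has to be entered as a single line. As a minimal sketch, the helper below assembles that line from its three variable parts. The function name psql_upgrade_command is hypothetical and only prints the command; the example arguments are the ones used in Step 9.

```sh
#!/bin/sh
# Print the PostgreSQL schema-upgrade command from Step 9 as one line.
# Hypothetical helper: it only prints the command, it does not run psql.
#   $1  database user
#   $2  DSpace source directory
#   $3  database name ('dspace' by default)
psql_upgrade_command() {
    db_user=$1
    source_dir=$2
    db_name=$3
    printf 'psql -U %s -f %s/dspace/etc/postgres/database_schema_15-16.sql %s\n' \
        "$db_user" "$source_dir" "$db_name"
}

# Example values taken from Step 9.
psql_upgrade_command dspace /dspace-1.6-1-src-release dspace
```

Review the printed line before pasting it into a shell as the database user.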
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.6. Whenever you see these path references, be
sure to replace them with the actual path names on your local system.
Upgrade Steps
1. Backup Your DSpace. First and foremost, make a complete backup of your system, including:
A snapshot of the database. To have a "snapshot" of the PostgreSQL database, you need to
shut it down during the backup. You should also have your regular PostgreSQL backup output
(using pg_dump commands).
The asset store ([dspace]/assetstore by default)
Your configuration files and customizations to DSpace (including any customized scripts).
2. Download DSpace 1.6.x. Retrieve the new DSpace 1.6.x source code either as a download from
DSpace.org or check it out directly from the SVN code repository. If you downloaded DSpace, do not
unpack it on top of your existing installation. Refer to Installation Instructions, Step 3 for unpacking
directives.
3. Stop Tomcat. Take down your servlet container. For Tomcat, use the $CATALINA_HOME/bin/shutdown.sh script.
(Many installations will have a startup/shutdown script in the /etc/init.d or /etc/rc.d directories.)
4. Apply any customizations. If you have made any local customizations to your DSpace installation they
will need to be migrated over to the new DSpace. These are housed in one of the following places:
JSPUI modifications: [dspace-source]/dspace/modules/jspui/src/main/webapp/
XMLUI modifications: [dspace-source]/dspace/modules/xmlui/src/main/webapp/
5. Update Configuration Files. Some of the parameters have changed and some are new. The changes are
noted below:
**CHANGE** The base url and oai urls property keys are set differently
# DSpace host name - should match base URL. Do not include port number
dspace.hostname = localhost
# DSpace base URL. Include port number etc., but NOT trailing slash
# Change to xmlui if you wish to use the xmlui as the default, or remove
# "/jspui" and set webapp of your choice as the "ROOT" webapp in
# the servlet engine.
dspace.url = ${dspace.baseUrl}/xmlui
# The base URL of the OAI webapp (do not include /request).
dspace.oai.url = ${dspace.baseUrl}/oai
**NEW** New email options (Add these at the end of the "Email Settings" sub-section):
# Pass extra settings to the Java mail library. Comma separated, equals sign
# between the key and the value.
#mail.extraproperties = mail.smtp.socketFactory.port=465, \
# mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory, \
# mail.smtp.socketFactory.fallback=false
# COLLECTION ADMIN
#core.authorization.collection-admin.policies = true
#core.authorization.collection-admin.template-item = true
#core.authorization.collection-admin.submitters = true
#core.authorization.collection-admin.workflows = true
#core.authorization.collection-admin.admin-group = true
# item owned by his collection
#core.authorization.collection-admin.item.delete = true
#core.authorization.collection-admin.item.withdraw = true
#core.authorization.collection-admin.item.reinstatiate = true
#core.authorization.collection-admin.item.policies = true
# also bundle...
#core.authorization.collection-admin.item.create-bitstream = true
#core.authorization.collection-admin.item.delete-bitstream = true
#core.authorization.collection-admin.item-admin.cc-license = true
# ITEM ADMIN
#core.authorization.item-admin.policies = true
# also bundle...
#core.authorization.item-admin.create-bitstream = true
#core.authorization.item-admin.delete-bitstream = true
#core.authorization.item-admin.cc-license = true
**CHANGE** METS ingester has been revised. (Modify In "Crosswalk and Packager Plugin
Settings")
# Option to make use of collection templates when using the METS ingester (default is false)
mets.submission.useCollectionTemplate = false
# Crosswalk Plugins:
plugin.named.org.dspace.content.crosswalk.IngestionCrosswalk = \
org.dspace.content.crosswalk.PREMISCrosswalk = PREMIS \
org.dspace.content.crosswalk.OREIngestionCrosswalk = ore \
org.dspace.content.crosswalk.NullIngestionCrosswalk = NIL \
org.dspace.content.crosswalk.QDCCrosswalk = qdc \
org.dspace.content.crosswalk.OAIDCIngestionCrosswalk = dc \
org.dspace.content.crosswalk.DIMIngestionCrosswalk = dim
plugin.selfnamed.org.dspace.content.crosswalk.IngestionCrosswalk = \
org.dspace.content.crosswalk.XSLTIngestionCrosswalk
plugin.named.org.dspace.content.crosswalk.DisseminationCrosswalk = \
org.dspace.content.crosswalk.SimpleDCDisseminationCrosswalk = DC \
org.dspace.content.crosswalk.SimpleDCDisseminationCrosswalk = dc \
org.dspace.content.crosswalk.PREMISCrosswalk = PREMIS \
org.dspace.content.crosswalk.METSDisseminationCrosswalk = METS \
org.dspace.content.crosswalk.METSDisseminationCrosswalk = mets \
org.dspace.content.crosswalk.OREDisseminationCrosswalk = ore \
org.dspace.content.crosswalk.QDCCrosswalk = qdc \
org.dspace.content.crosswalk.DIMDisseminationCrosswalk = dim
**CHANGE** Event Settings have had the following revision with the addition of 'harvester'
(modify in "Event System Configuration"):
also:
**NEW** New option for using the Batch Editing capabilities. See Batch Metadata Editing
Configuration and also System Administration : Batch Metadata Editing
# Metadata elements to exclude when exporting via the user interfaces, or when
# using the command line version and not using the -a (all) option.
# bulkedit.ignore-on-export = dc.date.accessioned, dc.date.available, \
# dc.date.updated, dc.description.provenance
**NEW** Ability to hide metadata fields is now available. (Look for "JSPUI & XMLUI Configurations"
Section)
**NEW** Choice Control and Authority Control options are available (Look for "JSPUI & XMLUI
Configurations" Section):
And also:
#plugin.named.org.dspace.content.authority.ChoiceAuthority = \
# org.dspace.content.authority.SampleAuthority = Sample, \
# org.dspace.content.authority.LCNameAuthority = LCNameAuthority, \
# org.dspace.content.authority.SHERPARoMEOPublisher = SRPublisher, \
# org.dspace.content.authority.SHERPARoMEOJournalTitle = SRJournalTitle
##
## This sets the default lowest confidence level at which a metadata value is included
**REPLACE** RSS Feeds now support Atom 1.0. Replace its previous configuration with the one
below:
# enable syndication feeds - links display on community and collection home pages
# (This setting is not used by XMLUI, as you enable feeds in your theme)
webui.feed.enable = false
# number of DSpace items per feed (the most recent submissions)
webui.feed.items = 4
# maximum number of feeds in memory cache
# value of 0 will disable caching
webui.feed.cache.size = 100
# number of hours to keep cached feeds before checking currency
# value of 0 will force a check with each request
webui.feed.cache.age = 48
# which syndication formats to offer
# use one or more (comma-separated) values from list:
# rss_0.90, rss_0.91, rss_0.92, rss_0.93, rss_0.94, rss_1.0, rss_2.0
webui.feed.formats = rss_1.0,rss_2.0,atom_1.0
# URLs returned by the feed will point at the global handle server (e.g. http://hdl.handle.net/123456789/1)
# Set to true to use local server URLs (i.e. http://myserver.myorg/handle/123456789/1)
webui.feed.localresolve = false
# Customize the metadata fields to show in the feed for each item's description.
# Elements will be displayed in the order that they are specified here.
#
# The form is <schema prefix>.<element>[.<qualifier>|.*][(date)], ...
#
# Similar to the item display UI, the name of the field for display
# in the feed will be drawn from the current UI dictionary,
# using the key:
# "metadata.<field>"
#
# e.g. "metadata.dc.title"
# "metadata.dc.contributor.author"
# "metadata.dc.date.issued"
webui.feed.item.description = dc.title, dc.contributor.author, \
dc.contributor.editor, dc.description.abstract, \
dc.description
# name of field to use for authors (Atom only) - repeatable
webui.feed.item.author = dc.contributor.author
# Customize the extra namespaced DC elements added to the item (RSS) or entry
# (Atom) element. These let you include individual metadata values in a
# structured format for easy extraction by the recipient, instead of (or in
# addition to) appending these values to the Description field.
## dc:creator value(s)
#webui.feed.item.dc.creator = dc.contributor.author
**NEW** Exposure of METS metadata can now be hidden. (See "OAI-PMH SPECIFIC
CONFIGURATIONS" in the dspace.cfg file)
# When exposing METS/MODS via OAI-PMH all metadata that can be mapped to MODS
# is exported. This includes description.provenance which can contain personal
# email addresses and other information not intended for public consumption. To
# hide this information set the following property to true
oai.mets.hide-provenance = true
**NEW** SWORD has added the following to accept MIME/types. (See "SWORD Specific
Configurations" Section)
**NEW** New OAI Harvesting Configuration settings are now available. (See "OAI Harvesting
Configurations")
#---------------------------------------------------------------#
#--------------OAI HARVESTING CONFIGURATIONS--------------------#
#---------------------------------------------------------------#
# These configs are only used by the OAI-ORE related functions #
#---------------------------------------------------------------#
# Amount of time subtracted from the from argument of the PMH request to account
# for the time taken to negotiate a connection. Measured in seconds. Default value is 120.
#harvester.timePadding = 120
# How frequently the harvest scheduler checks the remote provider for updates,
# measured in minutes. The default value is 12 hours (or 720 minutes)
#harvester.harvestFrequency = 720
# The heartbeat is the frequency at which the harvest scheduler queries the local
# database to determine if any collections are due for a harvest cycle (based on
# the harvestFrequency) value. The scheduler is optimized to then sleep until the
# next collection is actually ready to be harvested. The minHeartbeat and
# maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds.
# Default minHeartbeat is 30. Default maxHeartbeat is 3600.
#harvester.minHeartbeat = 30
#harvester.maxHeartbeat = 3600
# How many harvest process threads the scheduler can spool up at once. Default value is 3.
#harvester.maxThreads = 3
# How much time passes before a harvest thread is terminated. The termination process
# waits for the current item to complete ingest and saves progress made up to that point.
# Measured in hours. Default value is 24.
#harvester.threadTimeout = 24
# When harvesting an item that contains an unknown schema or field within a schema what
# should the harvester do? Either add a new registry item for the field or schema, ignore
# the specific field or schema (importing everything else about the item), or fail with
# an error. The default value if undefined is: fail.
# Possible values: 'fail', 'add', or 'ignore'
harvester.unknownField = add
harvester.unknownSchema = fail
# The webapp responsible for minting the URIs for ORE Resource Maps.
# If using oai, the dspace.oai.uri config value must be set.
# The URIs generated for ORE ReMs follow the following convention for both cases.
# format: [baseURI]/metadata/handle/[theHandle]/ore.xml
# Default value is oai
#ore.authoritative.source = oai
# A harvest process will attempt to scan the metadata of the incoming items
# (dc.identifier.uri field, to be exact) to see if it looks like a handle.
# If so, it matches the pattern against the values of this parameter.
# If there is a match the new item is assigned the handle from the metadata value
# instead of minting a new one. Default value: hdl.handle.net
#harvester.acceptedHandleServer = hdl.handle.net, handle.myu.edu
# Pattern to reject as an invalid handle prefix (known test string, for example)
# when attempting to find the handle of harvested items. If there is a match with
# this config parameter, a new handle will be minted instead. Default value: 123456789.
#harvester.rejectedHandlePrefix = 123456789, myTestHandle
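The comments on harvester.acceptedHandleServer and harvester.rejectedHandlePrefix describe a decision that can be sketched in a few lines of shell. This is only one reading of those comments, not the harvester's actual code: the function names, the "scheme://server/prefix/suffix" test for whether a value looks like a handle, and the order of the two checks are all assumptions made for illustration.

```sh
#!/bin/sh
# Return success when $1 appears in the comma-separated list $2.
in_list() {
    needle=$1
    list=$(printf '%s' "$2" | tr -d ' ')   # drop the blanks after commas
    case ",$list," in
        *",$needle,"*) return 0 ;;
    esac
    return 1
}

# Print "keep <handle>" when the dc.identifier.uri value would be reused,
# or "mint" when a new handle would be created (hypothetical helper).
#   $1  dc.identifier.uri value of the harvested item
#   $2  value of harvester.acceptedHandleServer
#   $3  value of harvester.rejectedHandlePrefix
harvest_handle_decision() {
    uri=$1
    accepted_servers=$2
    rejected_prefixes=$3

    # Assumed "looks like a handle" test: scheme://server/prefix/suffix
    case $uri in
        *://*/*/*) ;;
        *) echo mint; return 0 ;;
    esac

    rest=${uri#*://}      # server/prefix/suffix
    server=${rest%%/*}
    handle=${rest#*/}     # prefix/suffix
    prefix=${handle%%/*}

    if in_list "$server" "$accepted_servers" && ! in_list "$prefix" "$rejected_prefixes"; then
        echo "keep $handle"
    else
        echo mint
    fi
}

# Example using the sample values from the configuration comments.
harvest_handle_decision 'http://hdl.handle.net/123456789/1' \
    'hdl.handle.net, handle.myu.edu' '123456789, myTestHandle'
```

With the sample values, an item whose identifier uses the test prefix 123456789 gets a newly minted handle even though hdl.handle.net is an accepted server.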
**NEW** SOLR Statistics Configurations. For more detail on these settings, refer to
DSpace SOLR Statistics Configuration; for installation
procedures, refer to Advanced Installation: DSpace Statistics.
#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to #
# track usage statistics. #
#---------------------------------------------------------------#
statistics.item.authorization.admin=true
6. Build DSpace. Run the following commands to compile DSpace:
cd [dspace-source]/dspace/
mvn -U clean package
7. Update the database. The database schema needs to be updated to accommodate changes in this
release. SQL files containing the relevant updates are provided. Please note that if you have made any
local customizations to the database schema, you should consult these updates and make sure they will
work for you.
For PostgreSQL: psql -U [dspace-user] -f [dspace-source]/dspace/etc/postgres/database_schema_15-16.sql [database name] (Your database name is by
default 'dspace').
Example:
psql -U dspace -f /dspace-1.6-1-src-release/dspace/etc/postgres/database_schema_15-16.sql dspace
For Oracle: Execute the upgrade script, e.g. with sqlplus, recording the output:
a. Start SQL*Plus with sqlplus [connect args]
b. Record the output: SQL> spool 'upgrade.lst'
c. Run the upgrade script: SQL> @[dspace-source]/dspace/etc/oracle/database_schema_15-16.sql
d. Turn off recording of output: SQL> spool off
e. Please note: The final few statements WILL FAIL. That is because you have to run some
queries and use the results to construct the statements to remove the constraints
manually; Oracle doesn't have any easy way to automate this (unless you know PL/SQL).
So, look for the comment line beginning:
and follow the instructions in the actual SQL file. Refer to the contents of the spool file
"upgrade.lst" for the output of the queries you'll need.
8. Update DSpace. Update the DSpace installed directory with the new code and libraries. Issue the
following commands:
cd [dspace-source]/dspace/target/dspace-[version]-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
9. Update Registry for the CC License. If you use the CC License, an incorrect mime-type is being
assigned. You will need to run the following command:
[dspace]/bin/dspace registry-loader -bitstream [dspace]/etc/upgrades/15-16/new-bitstream-formats.xml
10. Generate Browse and Search Indexes. It is good practice to rebuild your search and browse
indexes when upgrading to a new release. Almost every release has database changes and indexes can
be affected by this. The DSpace 1.6 release adds Authority Control features, and those need the
indexes to be regenerated. To do this, run the following command from your DSpace install directory (as
the dspace user):
[dspace]/bin/dspace index-init
11. Deploy Web Applications. Copy the web application files from your [dspace]/webapps directory to the
webapps subdirectory of your servlet container (e.g. Tomcat):
cp -R [dspace]/webapps/* [tomcat]/webapps/
12. Restart servlet. Now restart your Tomcat/Jetty/Resin server program and test out the upgrade.
13. Rolling Log Appender Upgrade. You will want to upgrade your logs to the new format to use the SOLR
Statistics now included with DSpace. The commands are documented in Chapter 8; the steps to
perform are:
[dspace]/bin/dspace stats-log-converter -i [input file name] -o [output file name] -m
(use -m if you have more than one dspace.log file)
[dspace]/bin/dspace stats-log-importer -i [input file name] -m
(the input file is probably the output file from the converter above)
It is highly recommended that you read the System Administration : DSpace Log Converter
documentation.
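The two commands in Step 13 form a small pipeline: the converter writes an intermediate file, and the importer reads that same file. The sketch below prints both commands for a single log file so the pairing is explicit. The function name and the example paths are hypothetical, and the -m option, which Step 13 describes for installations with more than one dspace.log file, is left out here.

```sh
#!/bin/sh
# Print (without running) the two log-migration commands from Step 13.
# Hypothetical helper for a single log file; -m is deliberately omitted.
#   $1  DSpace install directory
#   $2  existing log file to convert
#   $3  intermediate file written by the converter and read by the importer
print_log_migration() {
    dspace_dir=$1
    input_log=$2
    converted=$3
    printf '%s/bin/dspace stats-log-converter -i %s -o %s\n' \
        "$dspace_dir" "$input_log" "$converted"
    printf '%s/bin/dspace stats-log-importer -i %s\n' \
        "$dspace_dir" "$converted"
}

# Example with placeholder paths; substitute your own locations.
print_log_migration /dspace /dspace/log/dspace.log /tmp/dspace.log.converted
```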
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.5. Whenever you see these path references, be
sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in DSpace 1.5.2 do not include any database schema upgrades, and the upgrade should be
straightforward.
1. Backup your DSpace. First and foremost, make a complete backup of your system, including:
A snapshot of the database
The asset store ([dspace]/assetstore by default)
Your configuration files and customizations to DSpace
Your statistics scripts ([dspace]/bin/stat*) which contain customizable dates
2. Download DSpace 1.5.2. Get the new DSpace 1.5.2 source code either as a download from DSpace.org
or check it out directly from the SVN code repository. If you downloaded DSpace, do not unpack it on top
of your existing installation.
3. Build DSpace. Run the following commands to compile DSpace:
cd [dspace-source]/dspace/
mvn package
You will find the result in [dspace-source]/dspace/target/dspace-1.5.2-build.dir/; inside this directory is the
compiled binary distribution of DSpace.
4. Stop Tomcat. Take down your servlet container; for Tomcat, use the bin/shutdown.sh script.
5. Apply any customizations. If you have made any local customizations to your DSpace installation they
will need to be migrated over to the new DSpace. Commonly these modifications are made to "JSP"
pages located inside the [dspace 1.4.2]/jsp/local directory. These should be moved to
[dspace-source]/dspace/modules/jspui/src/main/webapp/ in the new build structure. See Customizing the JSP Pages for
more information.
6. Update DSpace. Update the DSpace installed directory with new code and libraries. Inside the
[dspace-source]/dspace/target/dspace-1.5-build.dir/ directory run:
cd [dspace-source]/dspace/target/dspace-1.5-build.dir/
ant -Dconfig=[dspace]/config/dspace.cfg update
7. Update configuration files. This ant target preserves existing files in [dspace]/config and will copy any
new configuration files in place. If an existing file prevents copying the new file in place, the new file will
have the suffix .new, for example [dspace]/local/dspace.cfg.new. Note: there is also a configuration
option -Doverwrite=true which does the opposite: it copies the conflicting target files to *.old suffixes and
then overwrites the target file with the new file. This is beneficial for developers and those who
use [dspace-source]/dspace/config to maintain their changes.
cd [dspace-source]/dspace/target/dspace-1.5-build.dir/
ant -Dconfig=[dspace]/config/dspace.cfg update_configs
You must then verify that you have compared the .new files under [dspace]/config with your existing files and merged them into your
configuration. Some of the new parameters you should look out for in dspace.cfg include:
New option to restrict the exposure of private items. The following needs to be added to dspace.cfg:
# If required, a group name can be given here, and all users who log in
# using the DSpace password system will automatically become members of
# this group. This is useful if you want a group made up of all internal
# authenticated users.
#password.login.specialgroup = group-name
# If required, a group name can be given here, and all users who log in
# to LDAP will automatically become members of this group. This is useful
# if you want a group made up of all internal authenticated users.
#ldap.login.specialgroup = group-name
MARC 21 ordering should now be used as default. Unless you have it set already, or you have it
set to a different value, the following should be set:
plugin.named.org.dspace.sort.OrderFormatDelegate = org.dspace.sort.OrderFormatTitleMarc21=title
# This is the search scope value for the LDAP search during
# autoregistering. This will depend on your LDAP server setup.
# This value must be one of the following integers corresponding
# to the following values:
# object scope : 0
# one level scope : 1
# subtree scope : 2
#ldap.search_scope = 2
# The full DN and password of a user allowed to connect to the LDAP server
# and search for the DN of the user trying to log in. If these are not specified,
# the initial bind will be performed anonymously.
#ldap.search.user = cn=admin,ou=people,o=myu.edu
#ldap.search.password = password
# If your LDAP server does not hold an email address for a user, you can use
# the following field to specify your email domain. This value is appended
# to the netid in order to make an email address. E.g. a netid of 'user' and
# ldap.netid_email_domain as '@example.com' would set the email of the user
# to be 'user@example.com'
#ldap.netid_email_domain = @example.com
# this option below specifies that the email comes from the mentioned header.
# The value is CASE-Sensitive.
authentication.shib.email-header = MAIL
# this option below forces the software to acquire the email from Tomcat.
authentication.shib.email-use-tomcat-remote-user = true
# when user is fully authN on IdP but would not like to release
# his/her roles to DSpace (for privacy reason?), what should be
# the default roles be given to such users?
# The values are separated by semi-colon or comma
# authentication.shib.default-roles = Staff, Walk-ins
# The following mappings specify role mapping between IdP and Dspace.
# the left side of the entry is IdP's role (prefixed with
#---------------------------------------------------------------#
#--------------SWORD SPECIFIC CONFIGURATIONS--------------------#
#---------------------------------------------------------------#
# These configs are only used by the SWORD interface #
#---------------------------------------------------------------#
# The base URL of the SWORD deposit. This is the URL from
# which DSpace will construct the deposit location urls for
# collections.
#
# The default is {dspace.url}/sword/deposit
#
# In the event that you are not deploying DSpace as the ROOT
# application in the servlet container, this will generate
# incorrect URLs, and you should override the functionality
# by specifying in full as below:
#
# sword.deposit.url = http://www.myu.ac.uk/sword/deposit
# The base URL of the SWORD media links. This is the URL
# which DSpace will use to construct the media link urls
# for items which are deposited via sword
#
# The default is {dspace.url}/sword/media-link
#
# In the event that you are not deploying DSpace as the ROOT
# application in the servlet container, this will generate
# incorrect URLs, and you should override the functionality
# by specifying in full as below:
#
# sword.media-link.url = http://www.myu.ac.uk/sword/media-link
# Should the server offer as the default the list of all Communities
# to a Service Document request. If false, the server will offer
# the list of all collections, which is the default and recommended
# behavior at this stage.
#
# NOTE: a service document for Communities will not offer any viable
# deposit targets, and the client will need to request the list of
# Collections in the target before deposit can continue
#
sword.expose-communities = false
# The bundle name that SWORD should store incoming packages under if
# sword.keep-original-package is set to true. The default is "SWORD"
# if no value is set
#
# sword.bundle.name = SWORD
8. Restart Tomcat. Restart your servlet container; for Tomcat, use the bin/startup.sh script.
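Step 7 leaves files with a .new suffix next to any configuration file it could not replace, and asks you to merge them by hand. The following sketch lists those files so none is overlooked. The function name and the REVIEW/ORPHAN labels are illustrative only; the script inspects the directory you pass in and changes nothing.

```sh
#!/bin/sh
# List every "*.new" file under a configuration directory.
# Hypothetical helper; it only reads the directory tree.
#   REVIEW <file>  an existing file has a pending <file>.new to merge
#   ORPHAN <file>  a .new file with no existing counterpart
list_new_configs() {
    config_dir=$1
    find "$config_dir" -type f -name '*.new' | sort |
    while IFS= read -r new_file; do
        current_file=${new_file%.new}    # strip the .new suffix
        if [ -f "$current_file" ]; then
            printf 'REVIEW %s\n' "$current_file"
        else
            printf 'ORPHAN %s\n' "$new_file"
        fi
    done
}

# Usage (replace the path with your own [dspace]/config directory):
#   list_new_configs /path/to/dspace/config
# For each REVIEW line, compare the pair before merging, for example:
#   diff -u /path/to/dspace/config/dspace.cfg /path/to/dspace/config/dspace.cfg.new
```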
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 1.5. Whenever you see these path references, be
sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in DSpace 1.5 are significant and widespread, involving database schema upgrades, code
restructuring, completely new user and programmatic interfaces, and a new build system.
1. Backup your DSpace. First and foremost, make a complete backup of your system, including:
A snapshot of the database
The asset store ([dspace]/assetstore by default)
Your configuration files and customizations to DSpace
Your statistics scripts ([dspace]/bin/stat*) which contain customizable dates
2. Download DSpace 1.5.x. Get the new DSpace 1.5 source code either as a download from SourceForge
or check it out directly from the SVN code repository. If you downloaded DSpace, do not unpack it on top
of your existing installation.
3. Build DSpace. The build process has radically changed for DSpace 1.5. With this new release the build
system has moved to a maven-based system, allowing the various components (JSPUI, XMLUI, OAI, and
Core API) to be split into separate projects. See the Installing DSpace section for more information on building
DSpace using the new maven-based build system. Run the following commands to compile DSpace:
cd [dspace-source]/dspace/
mvn package
You will find the result in [dspace-source]/dspace/target/dspace-1.5-build.dir/; inside this directory is the
compiled binary distribution of DSpace.
4. Stop Tomcat. Take down your servlet container; for Tomcat, use the bin/shutdown.sh script.
5. Update dspace.cfg. Several new parameters need to be added to your [dspace]/config/dspace.cfg.
While it is advisable to start with a fresh DSpace 1.5 dspace.cfg configuration file, here is the
minimum set of parameters that need to be added to an old DSpace 1.4.2 configuration.
org.dspace.app.webui.util.CollectionStyleSelection
# Browse indexes
webui.browse.index.1 = dateissued:item:dateissued
webui.browse.index.2 = author:metadata:dc.contributor.*:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.*:text
# Sorting options
webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
# Recent submissions
recent.submissions.count = 5
Item+Create|Modify|Modify_Metadata:Collection+Add|Remove
6. Add 'xmlui.xconf' Manakin configuration. The new Manakin user interface available with DSpace 1.5
requires an extra configuration file that you will need to copy manually to your configuration
directory.
cp [dspace-source]/dspace/config/xmlui.xconf [dspace]/config/xmlui.xconf
cp [dspace-source]/dspace/config/item-submission.xml [dspace]/config/item-submission.xml
cp [dspace-source]/dspace/config/item-submission.dtd [dspace]/config/item-submission.dtd
8. Add new 'input-forms.xml' and 'input-forms.dtd' configurable submission configuration. The
input-forms.xml now has an included dtd reference to support validation. You'll need to merge in your changes
to both files and/or copy them into place.
cp [dspace-source]/dspace/config/input-forms.xml [dspace]/config/input-forms.xml
cp [dspace-source]/dspace/config/input-forms.dtd [dspace]/config/input-forms.dtd
cp [dspace-source]/dspace/config/crosswalks/sword-swap-ingest.xsl [dspace]/config/crosswalks/sword-swap-ingest.xsl
cp [dspace-source]/dspace/config/crosswalks/xhtml-head-item.properties [dspace]/config/crosswalks/xhtml-head-item.properties
10. Add 'registration_notify' email files. A new configuration option (registration.notify = you@your-email.com)
can be set to send a notification email whenever a new user registers to use your DSpace. The
email template for this email needs to be copied.
cp [dspace-source]/dspace/config/emails/registration_notify [dspace]/config/emails/registration_notify
11. Update the database. The database schema needs updating. SQL files containing the relevant updates are
provided. Note that if you have made any local customizations to the database schema, you should consult
these updates and make sure they will work for you.
For PostgreSQL: psql -U [dspace-user] -f [dspace-source]/dspace/etc/database_schema_14-15.sql [database-name]
For Oracle: [dspace-source]/dspace/etc/oracle/database_schema_142-15.sql contains the
commands necessary to upgrade your database schema on Oracle.
12. Apply any customizations. If you have made any local customizations to your DSpace installation they
will need to be migrated over to the new DSpace. Commonly these modifications are made to "JSP"
pages located inside the [dspace 1.4.2]/jsp/local directory. These should be moved to
[dspace-source]/dspace/modules/jspui/src/main/webapp/ in the new build structure. See Customizing the JSP Pages for
more information.
13. Update DSpace. Update the DSpace installed directory with new code and libraries. Inside the
[dspace-source]/dspace/target/dspace-1.5-build.dir/ directory run:
cd [dspace-source]/dspace/target/dspace-1.5-build.dir/
ant -Dconfig=[dspace]/config/dspace.cfg update
14. Update the Metadata Registry. New Metadata Registry updates are required to support SWORD.
cp [dspace-source]/dspace/config/registries/sword-metadata.xml [dspace]/config/registries/sword-metadata.xml
[dspace]/bin/dsrun org.dspace.administer.MetadataImporter -f [dspace]/config/registries/sword-metadata.xml
15. Rebuild browse and search indexes. One of the major new features of DSpace 1.5 is the browse
system which necessitates that the indexes be recreated. To do this run the following command from
your DSpace installed directory:
[dspace]/bin/index-init
16. Update statistics scripts. The statistics scripts have been rewritten for DSpace 1.5. Prior to 1.5 they
were written in Perl, but have been rewritten in Java to avoid having to install Perl. First, make a note of
the dates you have specified in your statistics scripts for the statistics to run from. You will find these in
[dspace]/bin/stat-initial, as $start_year and $start_month. Note down these values. Copy the new stats
scripts:
cp [dspace-source]/dspace/bin/stat* [dspace]/bin/
Then edit your statistics configuration file with the start details. Add the following to [dspace]/config/dstat.cfg:
# the year and month to start creating reports from
# - year as four digits (e.g. 2005)
# - month as a number (e.g. January is 1, December is 12)
start.year = 2005
start.month = 1
Replace '2005' and '1' with the values you noted down. dstat.cfg also used to contain the hostname and service name as
displayed at the top of the statistics. These values are now taken from dspace.cfg so you can remove
host.name and host.url from dstat.cfg if you wish. The values now used are dspace.hostname and
dspace.name from dspace.cfg.
17. Deploy web applications Copy the web applications files from your [dspace]/webapps directory to the
subdirectory of your servlet container (e.g. Tomcat):
cp [dspace]/webapps/* [tomcat]/webapps/
18. Restart Tomcat Restart your servlet container; for Tomcat, use the bin/startup.sh script.
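The dstat.cfg edit in step 16 can be scripted. The following is a minimal sketch, not part of DSpace: add_stat_start is a hypothetical helper name, and the path in the usage comment is a placeholder. It appends the two start properties and refuses to add them a second time.

```shell
#!/bin/sh
# Sketch: append the report start date to a dstat.cfg file.
# add_stat_start is a hypothetical helper name, not a DSpace script.
add_stat_start() {
    cfg_file=$1   # path to dstat.cfg
    year=$2       # four digits, e.g. 2005
    month=$3      # 1 (January) to 12 (December)

    # Refuse to add a second copy if the property is already set.
    if grep -q '^start\.year' "$cfg_file" 2>/dev/null; then
        echo "start.year already present in $cfg_file" >&2
        return 1
    fi

    {
        echo "# the year and month to start creating reports from"
        echo "start.year = $year"
        echo "start.month = $month"
    } >> "$cfg_file"
}

# Example (placeholder path; use the values noted from stat-initial):
# add_stat_start /path/to/dstat.cfg 2005 1
```

Use the year and month you noted down from the old stat-initial script rather than the sample values.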
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.4.x-source] to the source directory for DSpace 1.4.x. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in 1.4.x releases are only code and configuration changes so the update is simply a matter of
rebuilding the wars and slight changes to your config file.
1. Get the new DSpace 1.4.x source code from the DSpace page on SourceForge and unpack it
somewhere. Do not unpack it on top of your existing installation!!
2. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.4.x-source]/lib
3. Note: Licensing conditions for the handle.jar file have changed. As a result, the latest version of the
handle.jar file is not included in this distribution. It is recommended you read the new license
conditions (http://www.handle.net/upgrade_6-2_DSpace.html) and decide whether you wish to update
your installation's handle.jar. If you decide to update, you should replace the existing handle.jar in
[dspace-1.4.x-source]/lib with the new version.
4. Take down Tomcat (or whichever servlet container you're using).
5. A new configuration item webui.html.max-depth-guess has been added to avoid infinite URL spaces. Add
the following to the dspace.cfg file:
If webui.html.max-depth-guess is not present in dspace.cfg the default value is used. If archiving entire
web sites or deeply nested HTML documents it is advisable to change the default to a higher value more
suitable for these types of materials.
6. Your 'localized' JSPs (those in jsp/local) now need to be maintained in the source directory. If you have
locally modified JSPs in your [dspace]/jsp/local directory, you will need to merge the changes in the new
1.4.x versions into your locally modified ones. You can use the diff command to compare your JSPs
against the 1.4.x versions to do this. You can also check against the DSpace CVS.
7. In [dspace-1.4.x-source] run:
8. Copy the .war Web application files in [dspace-1.4.x-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.4.x-source]/build/*.war [tomcat]/webapps
If you're using Tomcat, you need to delete the directories corresponding to the old .war files. For
example, if dspace.war is installed in [tomcat]/webapps/dspace.war, you should delete the
[tomcat]/webapps/dspace directory. Otherwise, Tomcat will continue to use the old code in that directory.
9. Restart Tomcat.
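The cleanup described in step 8 (deleting the unpacked directory that corresponds to each .war file) can be sketched as a small shell function. This is an illustration only: remove_expanded_wars is a hypothetical helper name, and it assumes the servlet container is stopped and that each unpacked directory has the same name as its .war file minus the extension.

```shell
#!/bin/sh
# Sketch: for every .war file in a webapps directory, remove the unpacked
# directory of the same name so the container expands the new archive.
# remove_expanded_wars is a hypothetical helper name.
remove_expanded_wars() {
    webapps_dir=$1
    for war in "$webapps_dir"/*.war; do
        [ -f "$war" ] || continue      # no .war files matched the pattern
        expanded=${war%.war}           # e.g. .../webapps/dspace
        if [ -d "$expanded" ]; then
            echo "Removing $expanded"
            rm -rf "$expanded"
        fi
    done
}

# Example (placeholder path):
# remove_expanded_wars /path/to/tomcat/webapps
```

Directories without a matching .war file are left alone, so other web applications deployed in the same container are not affected.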
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.4.x-source] to the source directory for DSpace 1.4.x. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
1. First and foremost, make a complete backup of your system, including:
A snapshot of the database
The asset store ([dspace]/assetstore by default)
Your configuration files and localized JSPs
2. Download the latest DSpace 1.4.x source bundle and unpack it in a suitable location (not over your
existing DSpace installation or source tree!)
3. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.4.x-source]/lib
4. Note: Licensing conditions for the handle.jar file have changed. As a result, the latest version of the
handle.jar file is not included in this distribution. It is recommended you read the new license conditions
and decide whether you wish to update your installation's handle.jar. If you decide to update, you should
replace the existing handle.jar in [dspace-1.4.x-source]/lib with the new version.
5. Take down Tomcat (or whichever servlet container you're using).
6. Your DSpace configuration will need some updating:
In dspace.cfg, paste in the following lines for the new stackable authentication feature, the new
method for managing Media Filters, and the Checksum Checker.
#authentication.x509.keystore.password = changeit
plugin.sequence.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.PDFFilter, org.dspace.app.mediafilter.HTMLFilter, \
org.dspace.app.mediafilter.WordFilter, org.dspace.app.mediafilter.JPEGFilter
# to enable branded preview: remove last line above, and uncomment 2 lines below
# org.dspace.app.mediafilter.WordFilter, org.dspace.app.mediafilter.JPEGFilter, \
# org.dspace.app.mediafilter.BrandedPreviewJPEGFilter
plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
# Standard interface implementations. You shouldn't need to tinker with these.
plugin.single.org.dspace.checker.ReporterDAO=org.dspace.checker.ReporterDAOImpl
If you have customized advanced search fields (search.index.n fields), note that you now need to
include the schema in the values. Dublin Core is specified as dc. So for example, if in 1.3.2 you
had:
search.index.1 = title:title.alternative
that now needs to be changed to:
search.index.1 = title:dc.title.alternative
9. The database schema needs updating. SQL files containing the relevant commands are provided. If you've
modified the schema locally, you may need to check over this and make alterations.
For PostgreSQL: [dspace-1.4.x-source]/etc/database_schema_13-14.sql contains the SQL
commands to achieve this for PostgreSQL. To apply the changes, go to the source directory, and
run:
psql -f etc/database_schema_13-14.sql [DSpace database name] -h localhost
For Oracle: [dspace-1.4.x-source]/etc/oracle/database_schema_13-14.sql should be run on the
DSpace database to update the schema.
10. Rebuild the search indexes: [dspace]/bin/index-all
11. Copy the .war Web application files in [dspace-1.4-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.4-source]/build/*.war [tomcat]/webapps
If you're using Tomcat, you need to delete the directories corresponding to the old .war files. For
example, if dspace.war is installed in [tomcat]/webapps/dspace.war, you should delete the
[tomcat]/webapps/dspace directory. Otherwise, Tomcat will continue to use the old code in that directory.
12. Restart Tomcat.
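The search.index change in step 6 (adding the dc schema to each value) can be applied mechanically. The sketch below is an assumption-laden illustration: add_dc_schema is a hypothetical helper name, and it assumes every existing search.index.n value is an unqualified Dublin Core field, as it would be in an unmodified 1.3.2 configuration. Lines that already carry the dc. prefix are left unchanged.

```shell
#!/bin/sh
# Sketch: read dspace.cfg lines on stdin and add the "dc." schema prefix
# to search.index.n values that do not have it yet.
# add_dc_schema is a hypothetical helper name.
add_dc_schema() {
    sed -E 's/^(search\.index\.[0-9]+[[:space:]]*=[[:space:]]*[^:]+:)(dc\.)?/\1dc./'
}

# Example: preview the result before replacing the real file.
# add_dc_schema < /path/to/dspace.cfg > /tmp/dspace.cfg.new
# diff /path/to/dspace.cfg /tmp/dspace.cfg.new
```

Reviewing the diff before replacing dspace.cfg is advisable, since any field that belongs to a schema other than dc would need to be corrected by hand.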
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.3.2-source] to the source directory for DSpace 1.3.2. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in 1.3.2 are only code changes so the update is simply a matter of rebuilding the wars.
1. Get the new DSpace 1.3.2 source code from the DSpace page on SourceForge and unpack it
somewhere. Do not unpack it on top of your existing installation!!
2. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.3.2-source]/lib
4. Your 'localized' JSPs (those in jsp/local) now need to be maintained in the source directory. If you have
locally modified JSPs in your [dspace]/jsp/local directory, you will need to merge the changes in the new
1.3.2 versions into your locally modified ones. You can use the diff command to compare the 1.3.1 and
1.3.2 versions to do this.
5. In [dspace-1.3.2-source] run:
6. Copy the .war Web application files in [dspace-1.3.2-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.3.2-source]/build/*.war [tomcat]/webapps
If you're using Tomcat, you need to delete the directories corresponding to the old .war files. For
example, if dspace.war is installed in [tomcat]/webapps/dspace.war, you should delete the
[tomcat]/webapps/dspace directory. Otherwise, Tomcat will continue to use the old code in that directory.
7. Restart Tomcat.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.3.x-source] to the source directory for DSpace 1.3.x. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
1. Step one is, of course, to back up all your data before proceeding!! Include all of the contents of
[dspace] and the PostgreSQL database in your backup.
2. Get the new DSpace 1.3.x source code from the DSpace page on SourceForge and unpack it
somewhere. Do not unpack it on top of your existing installation!!
3. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.3.x-source]/lib
4. Take down Tomcat (or whichever servlet container you're using).
5. Remove the old version of xerces.jar from your installation, so it is not inadvertently used later:
rm [dspace]/lib/xerces.jar
6. Install the new config files by moving dstat.cfg and dstat.map from [dspace-1.3.x-source]/config/ to
[dspace]/config
8. Build and install the updated DSpace 1.3.x code. Go to the [dspace-1.3.x-source] directory, and run:
ant -Dconfig=[dspace]/config/dspace.cfg update
9. You'll need to make some changes to the database schema in your PostgreSQL database.
[dspace-1.3.x-source]/etc/database_schema_12-13.sql contains the SQL commands to achieve this. If you've modified
the schema locally, you may need to check over this and make alterations. To apply the changes, go to
the source directory, and run:
psql -f etc/database_schema_12-13.sql [DSpace database name] -h localhost
10. Customize the statistics generation as per the instructions in System Statistical Reports
11. Initialize the statistics using:
[dspace]/bin/stat-initial
[dspace]/bin/stat-general
[dspace]/bin/stat-report-initial
[dspace]/bin/stat-report-general
12. Rebuild the search indexes: [dspace]/bin/index-all
13. Copy the .war Web application files in [dspace-1.3.x-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.3.x-source]/build/*.war [tomcat]/webapps
14. Restart Tomcat.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.2.2-source] to the source directory for DSpace 1.2.2. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in 1.2.2 are only code and config changes so the update should be fairly simple.
1. Get the new DSpace 1.2.2 source code from the DSpace page on SourceForge and unpack it
somewhere. Do not unpack it on top of your existing installation!!
2. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.2.2-source]/lib
6. In [dspace-1.2.2-source] run:
7. Copy the .war Web application files in [dspace-1.2.2-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.2.2-source]/build/*.war [tomcat]/webapps
If you're using Tomcat, you need to delete the directories corresponding to the old .war files. For
example, if dspace.war is installed in [tomcat]/webapps/dspace.war, you should delete the
[tomcat]/webapps/dspace directory. Otherwise, Tomcat will continue to use the old code in that directory.
8. To finalize the install of the new configurable submission forms you need to copy the file
[dspace-1.2.2-source]/config/input-forms.xml into [dspace]/config.
9. Restart Tomcat.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.2.1-source] to the source directory for DSpace 1.2.1. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
The changes in 1.2.1 are only code changes so the update should be fairly simple.
1. Get the new DSpace 1.2.1 source code from the DSpace page on SourceForge and unpack it
somewhere. Do not unpack it on top of your existing installation!!
2. Copy the PostgreSQL driver JAR to the source tree. For example:
cd [dspace]/lib
cp postgresql.jar [dspace-1.2.1-source]/lib
5. You need to add a few new parameters to your [dspace]/dspace.cfg for browse/search and item
thumbnails display, and for configurable DC metadata fields to be indexed.
search.index.1 = author:contributor.*
search.index.2 = author:creator.*
search.index.3 = title:title.*
search.index.4 = keyword:subject.*
search.index.5 = abstract:description.abstract
search.index.6 = author:description.statementofresponsibility
search.index.7 = series:relation.ispartofseries
search.index.8 = abstract:description.tableofcontents
search.index.9 = mime:format.mimetype
search.index.10 = sponsor:description.sponsorship
search.index.11 = id:identifier.*
6. In [dspace-1.2.1-source] run:
7. Copy the .war Web application files in [dspace-1.2.1-source]/build to the webapps sub-directory of your
servlet container (e.g. Tomcat). e.g.:
cp [dspace-1.2.1-source]/build/*.war [tomcat]/webapps
If you're using Tomcat, you need to delete the directories corresponding to the old .war files. For
example, if dspace.war is installed in [tomcat]/webapps/dspace.war, you should delete the
[tomcat]/webapps/dspace directory. Otherwise, Tomcat will continue to use the old code in that directory.
8. Restart Tomcat.
This document refers to the install directory for your existing DSpace installation as [dspace], and to
the source directory for DSpace 1.2 as [dspace-1.2-source]. Whenever you see these path references
below, be sure to replace them with the actual path names on your local system.
The process for upgrading to 1.2 from either 1.1 or 1.1.1 is the same. If you are running DSpace 1.0 or
1.0.1, you need to follow the instructions for Upgrading From 1.0.1 to 1.1 before following these
instructions.
Note also that if you've substantially modified DSpace, these instructions apply to an unmodified 1.1.1
DSpace instance, and you'll need to adapt the process to any modifications you've made.
Upgrade Steps
1. Step one is, of course, to back up all your data before proceeding!! Include all of the contents of
[dspace] and the PostgreSQL database in your backup.
2. Get the new DSpace 1.2 source code from the DSpace page on SourceForge and unpack it somewhere.
Do not unpack it on top of your existing installation!!
3. Copy the required Java libraries that we couldn't include in the bundle to the source tree. For example:
cd [dspace]/lib
cp activation.jar servlet.jar mail.jar [dspace-1.2-source]/lib
5. It's a good idea to upgrade all of the various third-party tools that DSpace uses to their latest versions:
Java (note that now version 1.4.0 or later is required)
Tomcat (Any version after 4.0 will work; symbolic links are no longer an issue)
PostgreSQL (don't forget to build/download an updated JDBC driver .jar file! Also, back up the
database first.)
Ant
6. You need to add the following new parameters to your [dspace]/dspace.cfg:
There are one or two other, optional extra parameters (for controlling the pool of database connections).
See the version history for details. If you leave them out, defaults will be used. Also, to avoid future
confusion, you might like to remove the following property, which is no longer required:
config.template.oai-web.xml = [dspace]/oai/WEB-INF/web.xml
7. The layout of the installation directory (i.e. the structure of the contents of [dspace]) has changed
somewhat since 1.1.1. First up, your 'localized' JSPs (those in jsp/local) now need to be maintained in
the source directory. So make a copy of them now! Once you've done that, you can remove [dspace]/jsp
and [dspace]/oai; these are no longer used (.war Web application archive files are used instead). Also, if
you're using the same version of Tomcat as before, you need to remove the lines from Tomcat's
conf/server.xml file that enable symbolic links for DSpace. These are the <Context> elements you added
to get DSpace 1.1.1 working, looking something like this:
Be sure to remove the <Context> elements for both the Web UI and the OAI Web applications.
8. Build and install the updated DSpace 1.2 code. Go to the DSpace 1.2 source directory, and run:
cp [dspace-1.2-source]/config/news-* \
[dspace-1.2-source]/config/mediafilter.cfg \
[dspace-1.2-source]/config/dc2mods.cfg \
[dspace]/config
10. You'll need to make some changes to the database schema in your PostgreSQL database.
[dspace-1.2-source]/etc/database_schema_11-12.sql contains the SQL commands to achieve this. If you've modified
the schema locally, you may need to check over this and make alterations. To apply the changes, go to
the source directory, and run:
11. A tool supplied with the DSpace 1.2 codebase will then update the actual data in the relational database.
Run it using:
[dspace]/bin/dsrun org.dspace.administer.Upgrade11To12
[dspace]/bin/index-all
13. Delete the existing symlinks from your servlet container's (e.g. Tomcat's) webapp sub-directory. Copy the
.war Web application files in [dspace-1.2-source]/build to the webapps sub-directory of your servlet
container (e.g. Tomcat). e.g.:
cp [dspace-1.2-source]/build/*.war [tomcat]/webapps
You might also wish to run it now to generate thumbnails and index full text for the content already in
your system.
16. Note 1: This update process has effectively 'touched' all of your items. Although the dates in the Dublin
Core metadata won't have changed (accession date and so forth), the 'last modified' date in the database
for each will have been changed. This means the e-mail subscription tool may be confused, thinking that
all items in the archive have been deposited that day, and could thus send a rather long email to lots of
subscribers. So, it is recommended that you turn off the e-mail subscription feature for the next day,
by commenting out the relevant line in DSpace's cron job, and then re-activating it the next day. Say you
performed the update on 08-June-2004 (UTC), and your e-mail subscription cron job runs at 4am (UTC).
When the subscription tool runs at 4am on 09-June-2004, it will find that everything in the system has a
modification date of 08-June-2004, and accordingly send out huge emails. So, immediately after the
update, you would edit DSpace's 'crontab' and comment out the /dspace/bin/subs-daily line. Then, after
4am on 09-June-2004 you'd un-comment it, so that things proceed normally. Of course this means that
any real new deposits on 08-June-2004 won't get e-mailed; however, if you're updating the system it's
likely to be down for some time so this shouldn't be a big problem.
17. Note 2: After consultation with the OAI community, various OAI-PMH changes have occurred:
The OAI-PMH identifiers have changed (they're now of the form oai:hostname:handle as opposed
to just Handles)
The set structure has changed, due to the new sub-communities feature.
The default base URL has changed
As noted in note 1, every item has been 'touched' and will need re-harvesting. The above means
that, if already registered and harvested, you will need to re-register your repository, effectively as
a 'new' OAI-PMH data provider. You should also consider posting an announcement to the OAI
implementers e-mail list so that harvesters know to update their systems. Also note that your site
may, over the next few days, take quite a big hit from OAI-PMH harvesters. The resumption token
support should alleviate this a little, but you might want to temporarily increase the database
connection pool parameters in [dspace]/config/dspace.cfg. See the dspace.cfg distributed with the
source code to see what these parameters are and how to use them. (You need to stop and
restart Tomcat after changing them.) I realize this is not ideal; for discussion as to the reasons
behind this please see relevant posts to the OAI community: post one, post two. If you really can't
live with updating the base URL like this, you can fairly easily have things proceed more-or-less as
they are, by doing the following:
Change the value of OAI_ID_PREFIX at the top of the org.dspace.app.oai.DSpaceOAICatalog
class to hdl:
Change the servlet mapping for the OAIHandler servlet back to / (from /request)
Rebuild and deploy oai.war
However, note that in this case, all the records will be re-harvested
by harvesters anyway, so you still need to brace for the associated DB activity; also note that the
set spec changes may not be picked up by some harvesters. It's recommended you read the
above-linked mailing list posts to understand why the change was made.
Now, you should be finished!
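The crontab edit described in Note 1 can be sketched as a pair of text filters. This is an illustration under stated assumptions: the function names are hypothetical, the cron entry is assumed to contain the path fragment /bin/subs-daily as in the note, and the usage comment assumes a cron implementation where "crontab -l" prints the current table and "crontab -" installs one from standard input.

```shell
#!/bin/sh
# Sketch: disable and later re-enable the subs-daily cron entry.
# Both functions are hypothetical helpers that filter crontab text on stdin.

# Prefix any active subs-daily line with '#'.
comment_out_subs_daily() {
    sed -E 's,^([^#].*/bin/subs-daily.*)$,#\1,'
}

# Remove the leading '#' from a commented-out subs-daily line.
uncomment_subs_daily() {
    sed -E 's,^#(.*/bin/subs-daily.*)$,\1,'
}

# Example, run as the user that owns the DSpace crontab:
# crontab -l | comment_out_subs_daily | crontab -
# ... and after the next scheduled run has passed:
# crontab -l | uncomment_subs_daily | crontab -
```

Other cron entries pass through both filters unchanged, and running the first filter twice does not add a second comment marker.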
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-1.1.1-source] to the source directory for DSpace 1.1.1. Whenever you see these path
references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
Fortunately the changes in 1.1.1 are only code changes so the update is fairly simple.
4. If you have locally modified versions of the following JSPs in your [dspace]/jsp/local directory, you might like
to merge the changes in the new 1.1.1 versions into your locally modified ones. You can use the diff
command to compare the 1.1 and 1.1.1 versions to do this. The changes are quite minor.
collection-home.jsp
admin/authorize-collection-edit.jsp
admin/authorize-community-edit.jsp
admin/authorize-item-edit.jsp
admin/eperson-edit.jsp
5. Restart Tomcat.
To upgrade from DSpace 1.0.1 to 1.1, follow the steps below. Your dspace.cfg does not need to be
changed. In the notes below [dspace] refers to the install directory for your existing DSpace
installation, and [dspace-1.1-source] to the source directory for DSpace 1.1. Whenever you see these
path references, be sure to replace them with the actual path names on your local system.
Upgrade Steps
1. Take down Tomcat (or whichever servlet container you're using).
2. We recommend that you upgrade to the latest version of PostgreSQL (7.3.2). Included are some notes to
help you do this (see the postgres-upgrade-notes.txt file). Note you will also have to upgrade Ant
to version 1.5 if you do this.
3. Make the necessary changes to the DSpace database. These include a couple of minor schema
changes, and some new indexes which should improve performance. Also, the names of a couple of
database views have been changed since the old names were so long they were causing problems. First
run psql to access your database (e.g. psql -U dspace -W and then enter the password), and enter these
SQL commands:
4. Fix your JSPs for Unicode. If you've modified the site 'skin' (jsp/local/layout/header-default.jsp) you'll
need to add the Unicode header, i.e.:
to the <HEAD> element. If you have any locally-edited JSPs, you need to add this page directive to the
top of all of them:
(If you haven't modified any JSPs, you don't have to do anything.)
5. Copy the required Java libraries that we couldn't include in the bundle to the source tree. For example:
cd [dspace]/lib
cp *.policy activation.jar servlet.jar mail.jar [dspace-1.1-source]/lib
6. Compile up the new DSpace code, replacing [dspace]/config/dspace.cfg with the path to your current,
LIVE configuration. (The second line, touch `find .`, is a precaution, which ensures that the new code has
a current datestamp and will overwrite the old code. Note that those are back quotes.)
cd [dspace-1.1-source]
touch `find .`
ant
ant -Dconfig=[dspace]/config/dspace.cfg update
7. Update the database tables using the upgrader tool, which sets up the new last_modified date in the
item table:
[dspace]/bin/dsrun org.dspace.administer.Upgrade101To11
[dspace]/bin/dsrun org.dspace.authorize.FixDefaultPolicies
9. Fix the OAICat properties file. Edit [dspace]/config/templates/oaicat.properties. Change the line that says
Identify.deletedRecord=yes
To:
Identify.deletedRecord=persistent
This is needed to fix the OAI-PMH 'Identify' verb response. Then run [dspace]/bin/install-configs.
10. Re-run the indexing to index abstracts and fill out the renamed database views:
[dspace]/bin/index-all
11. Restart Tomcat. Tomcat should be run with the following environment variable set, to ensure that
Unicode is handled properly. Also, the default JVM memory heap sizes are rather small. Adjust
-Xmx512M (512Mb maximum heap size) and -Xms64M (64Mb initial heap size) to suit your
hardware.
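The oaicat.properties change in step 9 can also be made with a short script. The following is a minimal sketch: set_deleted_record_persistent is a hypothetical helper name, and it assumes the property appears on its own line exactly as shown in step 9, with no surrounding whitespace.

```shell
#!/bin/sh
# Sketch: switch Identify.deletedRecord from "yes" to "persistent"
# in an OAICat properties file, leaving all other lines untouched.
# set_deleted_record_persistent is a hypothetical helper name.
set_deleted_record_persistent() {
    props_file=$1
    tmp_file="$props_file.tmp.$$"
    # Write to a temporary file first, then replace the original
    # only if sed succeeded.
    sed 's/^Identify\.deletedRecord=yes$/Identify.deletedRecord=persistent/' \
        "$props_file" > "$tmp_file" && mv "$tmp_file" "$props_file"
}

# Example (placeholder path), followed by the install-configs run from step 9:
# set_deleted_record_persistent /path/to/dspace/config/templates/oaicat.properties
```

As step 9 notes, [dspace]/bin/install-configs still has to be run afterwards so that the edited template takes effect.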
This page provides some simple tips/tricks you can use to upgrade DSpace over multiple versions
(e.g. 1.5.x -> 1.6.x -> 1.7.x -> 1.8.x -> 3.x -> 4.x).
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and
[dspace-source] to the source directory for DSpace 4.x. Whenever you see these path references,
be sure to replace them with the actual path names on your local system.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres'
pg_dump command. For example:
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores
configured in the [dspace]/config/dspace.cfg "assetstore.dir" and "assetstore.dir.#" settings)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will
want to back them up to a safe location.
As an example, if you wanted to upgrade from DSpace 1.5.x to DSpace 4.x, you'd perform each of these
upgrades, one-by-one, in order:
Likely, you'd want to perform some local testing between each upgrade to ensure that each upgrade is
successful. Otherwise, if there are later issues, you may not be able to easily tell which upgrade caused the
problems.
It is also recommended to read the Release Notes for each major version.
1. Install the version of DSpace you wish to upgrade to by following the latest installation instructions:
a. For example: to upgrade to DSpace 4.x follow the instructions at Installing DSpace.
b. At this point, you'll have a fresh copy of DSpace, but no data
2. Make a backup copy of your old DSpace database. You'll use this to upgrade your data to the latest
version of DSpace:
a. For example, in PostgreSQL use the "pg_dump" command to backup all your old data
3. Create an exact replica of your old DSpace database (this is mostly just for safety purposes – you don't
want to lose any of your Production data!)
a. For example, in PostgreSQL use the "createdb" and "psql" commands:
# Create a replica database. I've called this one "dspace-upgrade" just as an example
createdb -U [database-user] -E UNICODE dspace-upgrade
4. Now, upgrade this replica database to be compatible with the version of DSpace you wish to upgrade to.
This is done utilizing the "database_schema-*.sql" upgrade scripts provided in the
[dspace-source]/dspace/etc/postgres/ folder.
a. For example, suppose you are upgrading from DSpace 1.5.x to 4.x. That means you'll be
upgrading your database from being 1.5.x compatible to being 4.x compatible. In order to do that,
you will run all these database upgrade scripts in order (again this example is just for
PostgreSQL):
5. You now have a 4.x compatible database (in "dspace-upgrade")! So, you can dump its data out to a file:
a. For example, to dump "dspace-upgrade" to a file named "my-dspace-upgraded-db.sql":
6. Remember your "fresh" installation of the newer version of DSpace (in this case DSpace 4.x)? Well, now
you will just move your old data over to that fresh installation.
a. First, you need to move over your upgraded Database data. That's done by removing the default
"fresh installation" database, and replacing it with your upgraded version:
# Delete the default (MAKE SURE IT IS EMPTY) "fresh install" Database on the NEW DSpace
dropdb -U [database-user] [database-name]
# Recreate an empty DB
createdb -U [database-user] -E UNICODE [database-name]
b. Next, copy over your old DSpace "Assetstore" directory into the newly installed version of
DSpace. This assetstore directory contains all the actual content files
cp -R [path-to-old-dspace]/assetstore/* [path-to-new-dspace]/assetstore/
c. Finally, reindex all of your DSpace content in the newly installed version of DSpace. For example,
assuming you are using Discovery faceted/filtered search/browse (the default in DSpace 4.x),
you'd run:
[dspace]/bin/dspace index-discovery -f
7. At this point, you should have a fresh installation of a newer version of DSpace with your content
migrated to it. As a final step, you may wish to re-configure or re-customize the fresh installation (based
on your old settings).
a. Also be sure to review all the Upgrade Instructions for the versions you "skipped over".
Sometimes there are important notes or details in there!
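Step 4 above depends on running the schema upgrade scripts strictly in order and not continuing past a failure. One way to sketch that, without assuming any particular script names, is a helper that takes the scripts as arguments. apply_in_order and the DRY_RUN variable are hypothetical; the psql options used (-U, -f, and -v ON_ERROR_STOP=1, which makes psql stop and return a non-zero status when a statement fails) are standard psql options.

```shell
#!/bin/sh
# Sketch: apply SQL upgrade scripts to one database in the order given,
# stopping at the first script that fails.
# apply_in_order is a hypothetical helper; set DRY_RUN=1 to print the
# psql commands instead of running them.
apply_in_order() {
    db_user=$1
    db_name=$2
    shift 2
    for script in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "psql -U $db_user -v ON_ERROR_STOP=1 -f $script $db_name"
        else
            psql -U "$db_user" -v ON_ERROR_STOP=1 -f "$script" "$db_name" || {
                echo "Upgrade script failed: $script" >&2
                return 1
            }
        fi
    done
}

# Example (script names are placeholders; list the real upgrade scripts
# for the versions you are crossing, oldest first):
# DRY_RUN=1 apply_in_order dspace dspace-upgrade first.sql second.sql
```

Running with DRY_RUN=1 first gives a chance to confirm the order against the upgrade instructions for each intermediate version before touching the replica database.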
4 Using DSpace
This page offers access to all aspects of the documentation relevant to using DSpace after it has been properly
installed or upgraded. These pages assume that DSpace is functioning properly. Please refer to the section on
System Administration if you are looking for information on diagnosing DSpace issues and measures you can
take to restore your DSpace to a state in which it functions properly.
Additions module
Maven WAR Overlays
DSpace Source Release
For more details on Maven WAR Overlays and how they relate to DSpace, see this presentation from Fall 2009:
Making DSpace XMLUI Your Own
(Please note that this presentation was made for DSpace 1.5.x and 1.6.x, but much of it still applies to current
versions of DSpace.)
Which build option you need to use will depend on your local development practices. If you have been careful to
utilize Maven WAR Overlays for your local code/changes (putting everything under
[dspace-source]/dspace/modules/*), then the Quick Build option may be the best way for you to recompile & reapply your
local modifications. However, if you have made direct changes to code within a subdirectory of
[dspace-source] (e.g. /dspace-api, /dspace-xmlui, /dspace-jspui, etc.) then you will need to utilize the Full
Build option in order to ensure those modifications are included in the final WAR files.
Introduction
The DSpace Spring Service Manager supports overriding configuration at many levels.
Configuration
This latter method requires the addon to implement a SpringLoader to identify the location to look for Spring
configuration and to place configuration files into that location. This can be seen inside the current
[dspace-source]/config/modules/spring.cfg
Configuration Priorities
The ordering of the loading of Spring configuration is the following:
api: when placed in this module the Spring files will always be processed into services (since all of the
DSpace modules are dependent on the API).
discovery: when placed in this module the Spring files will only be processed when the discovery library
is present (in the case of discovery in the xmlui & in the command line interface).
The reason why there is a separate directory is that if a service cannot be loaded, which would be the case for
the configurable workflow (the JSPUI would not be able to retrieve the XMLUI interface classes), the kernel will
crash and DSpace will not start.
So you need to create a new directory in [dspace]/config/spring. Next you need to create a class that
implements "org.dspace.kernel.config.SpringLoader". It contains only one method, named
getResourcePaths(). Currently, this is implemented in the following manner:
@Override
public String[] getResourcePaths(ConfigurationService configurationService) {
    // Build the path [dspace.dir]/config/spring/{module.name}/
    StringBuffer filePath = new StringBuffer();
    filePath.append(configurationService.getProperty("dspace.dir"));
    filePath.append(File.separator);
    filePath.append("config");
    filePath.append(File.separator);
    filePath.append("spring");
    filePath.append(File.separator);
    filePath.append("{module.name}"); // Fill in the module name in this string
    filePath.append(File.separator);
    try {
        // Appending XML_SUFFIX means only the active files are loaded;
        // a leftover file such as spring.xml.old in this directory is ignored.
        return new String[]{new File(filePath.toString()).toURI().toURL().toString() + XML_SUFFIX};
    } catch (MalformedURLException e) {
        return new String[0];
    }
}
After the class has been created, you will also need to add it to the "springloader.modules" property located in
[dspace]/config/modules/spring.cfg.
The Spring service manager checks this property to ensure that only the interface implementations whose
classes it can find are loaded.
This approach gives developers the flexibility to create their own Spring modules without Spring crashing
when it cannot find a certain class.
Architectural Overview
Please see Architectural Overview here: DSpace Services Framework
Tutorials
Several good Spring / DSpace Services Tutorials are already available:
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.PasswordAuthentication
1. A request is received from an end-user's browser that, if fulfilled, would lead to an action requiring
   authorization taking place.
2. If the end-user is already authenticated:
   If the end-user is allowed to perform the action, the action proceeds.
   If the end-user is NOT allowed to perform the action, an authorization error is displayed.
3. If the end-user is NOT authenticated, i.e. is accessing DSpace anonymously:
   a. The parameters etc. of the request are stored.
   b. The Web UI's startAuthentication method is invoked.
   c. First it tries all the authentication methods which do implicit authentication (i.e. they work with just
      the information already in the Web request, such as an X.509 client certificate). If one of these
      succeeds, it proceeds from Step 2 above.
   d. If none of the implicit methods succeed, the UI responds by putting up a "login" page to collect
      credentials for one of the explicit authentication methods in the stack. The servlet processing that
      page then gives the proffered credentials to each authentication method in turn until one succeeds,
      at which point it retries the original operation from Step 2 above.
Please see the source files AuthenticationManager.java and AuthenticationMethod.java for
more details about this mechanism.
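The ordering described above (implicit methods first, explicit methods only afterwards) can be sketched as a small, self-contained simulation. The StackableMethod interface, its isImplicit() flag and the StackDemo class are hypothetical stand-ins invented for this illustration; they are not the actual AuthenticationMethod API, for which the source files named above remain the reference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stackable authentication method (NOT the real DSpace interface).
interface StackableMethod {
    boolean isImplicit();                               // true: works from the request alone
    String authenticate(Map<String, String> request);   // returns a user name, or null on failure
}

final class StackDemo {
    // Try each method of the requested kind in stack order; the first success wins.
    static String tryMethods(List<StackableMethod> stack, boolean implicit, Map<String, String> request) {
        for (StackableMethod method : stack) {
            if (method.isImplicit() == implicit) {
                String user = method.authenticate(request);
                if (user != null) {
                    return user;
                }
            }
        }
        return null;
    }

    // Implicit methods are tried first; explicit ones only if none of the implicit ones succeeds.
    static String authenticate(List<StackableMethod> stack, Map<String, String> request) {
        String user = tryMethods(stack, true, request);
        return user != null ? user : tryMethods(stack, false, request);
    }

    // Example stack: a certificate-style implicit method followed by a password-style explicit one.
    static List<StackableMethod> exampleStack() {
        StackableMethod certificate = new StackableMethod() {
            public boolean isImplicit() { return true; }
            public String authenticate(Map<String, String> r) { return r.get("clientCertUser"); }
        };
        StackableMethod password = new StackableMethod() {
            public boolean isImplicit() { return false; }
            public String authenticate(Map<String, String> r) {
                return "secret".equals(r.get("password")) ? r.get("email") : null;
            }
        };
        return Arrays.asList(certificate, password);
    }
}
```

In this sketch a request carrying a certificate identity never reaches the password method, which mirrors why the order of the plugin sequence matters.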
Authentication by Password
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.PasswordAuthentication
Use of inbuilt e-mail address/password-based log-in. This is achieved by forwarding a request that is
attempting an action requiring authorization to the password log-in servlet, /password-login. The
password log-in servlet (org.dspace.app.webui.servlet.PasswordServlet) contains code that
will resume the original request if authentication is successful, as per step 3 described above.
Users can register themselves (i.e. add themselves as e-people without needing approval from the
administrators), and can set their own passwords when they do this.
Users are not members of any special (dynamic) e-person groups.
You can restrict the domains from which new users are able to register. To enable this feature,
uncomment the following line from dspace.cfg: authentication.password.domain.valid = example.com
Example options might be '@example.com' to restrict registration to users with addresses ending in
@example.com, or '@example.com, .ac.uk' to restrict registration to users with addresses ending in
@example.com or with addresses in the .ac.uk domain.
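The suffix rule described above can be illustrated with a short sketch. The DomainFilter class and its method names are hypothetical, and the case-insensitive comparison and the "empty value means no restriction" behaviour are assumptions of this example rather than documented DSpace behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating a comma-separated list of allowed address suffixes.
final class DomainFilter {
    // Split a value such as "@example.com, .ac.uk" into trimmed, lower-cased suffixes.
    static List<String> parse(String configured) {
        List<String> suffixes = new ArrayList<String>();
        if (configured != null) {
            for (String part : configured.split(",")) {
                String suffix = part.trim().toLowerCase(Locale.ROOT);
                if (!suffix.isEmpty()) {
                    suffixes.add(suffix);
                }
            }
        }
        return suffixes;
    }

    // Assumption: no configured suffix means registration is unrestricted.
    static boolean mayRegister(String email, String configured) {
        List<String> suffixes = parse(configured);
        if (suffixes.isEmpty()) {
            return true;
        }
        String address = email.trim().toLowerCase(Locale.ROOT);
        for (String suffix : suffixes) {
            if (address.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the value '@example.com, .ac.uk', an address such as jane@example.com or bob@cam.ac.uk passes the check, while eve@example.org does not.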
Configuration File: [dspace]/config/modules/authentication-password.cfg

Property: domain.valid
Informational Note: This option allows you to limit self-registration to email addresses ending in a particular
domain value. The above example would limit self-registration to individuals with "@mit.edu" email
addresses and all ".ac.uk" email addresses.

Property: login.specialgroup
Informational Note: This option allows you to automatically add all password authenticated users to a specific
DSpace Group (the group must exist in DSpace) for the remainder of their logged in session.

Property: digestAlgorithm
Informational Note: This option specifies the hashing algorithm to be used in converting plain-text passwords
to more secure password digests. The example value is the default. You may select any digest algorithm
available through java.security.MessageDigest on your system. At least MD2, MD5, SHA-1, SHA-256,
SHA-384, and SHA-512 should be available, but you may have installed others. Most sites will not need to
adjust this.
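To check which digest algorithms your Java installation offers before changing digestAlgorithm, a small probe along the following lines may help. The DigestDemo class is hypothetical and only exercises java.security.MessageDigest; it is not DSpace's password-hashing code and does not reproduce how DSpace stores password digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical probe for java.security.MessageDigest algorithms.
final class DigestDemo {
    // True when the named algorithm can be instantiated on this JVM.
    static boolean isAvailable(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    // Hex-encode the digest of a UTF-8 string; unknown algorithms raise an unchecked exception.
    static String hexDigest(String algorithm, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Digest algorithm not available: " + algorithm, e);
        }
    }
}
```

A SHA-512 digest is 64 bytes long, so hexDigest("SHA-512", ...) always yields 128 hexadecimal characters.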
Shibboleth Authentication
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.ShibAuthentication
Before DSpace will work with Shibboleth, you must have the following:
1. An Apache web server with the "mod_shib" module installed. As mentioned, this mod_shib module acts
as a proxy for all HTTP requests for your servlet container (typically Tomcat). Any requests to DSpace
that require authentication via Shibboleth should be redirected to 'shibd' (the Shibboleth daemon) by this
"mod_shib" module. Details on installing/configuring mod_shib in Apache are available at:
https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig
We also have a sample Apache + mod_shib configuration provided below.
2. An external Shibboleth IdP (Identity Provider). Using mod_shib, DSpace will only act as a Shibboleth SP
(Service Provider). The actual Shibboleth Authentication & Identity information must be provided by an
external IdP. If you are using Shibboleth at your institution already, then there already should be a
Shibboleth IdP available. More information about Shibboleth IdPs versus SPs is available at:
https://wiki.shibboleth.net/confluence/display/SHIB2/UnderstandingShibboleth
For more information on installing and configuring a Shibboleth Service Provider see:
https://wiki.shibboleth.net/confluence/display/SHIB2/Installation
When configuring your Shibboleth Service Provider there are two Shibboleth paradigms you may use: Active or
Lazy Sessions. With Active Sessions, the mod_shib module is configured to protect an entire URL space.
No one will be able to access that URL space without first authenticating with Shibboleth. Using this method
you will need to configure Shibboleth to protect the URL: "/shibboleth-login". The alternative, Lazy Sessions,
does not protect any specific URL. Instead Apache will allow access to any URL, and the application may
initiate an authenticated session when it wants to.
The Lazy Session method is preferable for most DSpace installations, as you usually want to provide public
access to (most) DSpace content, while restricting access to only particular areas (e.g. administration UI/tools,
private Items, etc.). When Active Sessions are enabled your entire DSpace site will be access restricted. In
other words, when using Active Sessions, Shibboleth will require everyone to first authenticate before they can
access any part of your repository (which essentially results in a "dark archive", as anonymous access will not
be allowed).
The Shibboleth setting "ShibUseHeaders" is no longer required to be set to "On", as DSpace will
correctly utilize attributes instead of headers.
When "ShibUseHeaders" is set to "Off" (which is recommended in the mod_shib documentation),
proper configuration of Apache to pass attributes to Tomcat (via either mod_jk or mod_proxy) can
be a bit tricky. SWITCH has some great documentation on exactly what you need to do. We will
eventually paraphrase/summarize this documentation here, but for now, the SWITCH page will
have to do.
When initially setting up Apache & mod_shib, https://www.testshib.org/ provides a great testing ground
for your configurations. This site provides a sample/demo Shibboleth IdP (as well as a sample Shibboleth
SP) which you can test against. It acts as a "sandbox" to get your configurations working properly, before
you point DSpace at your production Shibboleth IdP.
Below, we have provided a sample Apache configuration. However, as every institution has its own specific
Apache setup/configuration, it is highly likely that you will need to tweak this configuration in order to get it
working properly. Again, see the official mod_shib documentation for much more detail about each of these
settings: https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig
These configurations are meant to be added to an Apache <VirtualHost> which acts as a proxy to your Tomcat
(or other servlet container) running DSpace. More information on Apache VirtualHost settings can be found at:
https://httpd.apache.org/docs/2.2/vhosts/
#### SAMPLE MOD_SHIB CONFIGURATION FOR APACHE2 (it may require local modifications based on your Apache setup) ####
# While this sample VirtualHost is for HTTPS requests (recommended for Shibboleth, obviously),
# you may also need/want to create one for HTTP (*:80)
<VirtualHost *:443>
   ...
   # PLEASE NOTE: We have omitted many Apache settings (ServerName, LogLevel, SSLCertificateFile, etc)
   # which you may need/want to add to your VirtualHost

   # Only apply the Shibboleth settings below when the mod_shib module is loaded
   <IfModule mod_shib>

      # Most DSpace instances will want to use Shibboleth "Lazy Session", which ensures that users
      # can access DSpace without first authenticating via Shibboleth.
      # This section turns on Shibboleth "Lazy Session". Also ensures that once they have authenticated
      # (by accessing /Shibboleth.sso/Login path), then their Shib session is kept alive
      <Location />
         AuthType shibboleth
         ShibRequireSession Off
         require shibboleth
         # If your "shibboleth2.xml" file specifies an <ApplicationOverride> setting for your
         # DSpace Service Provider, then you may need to tell Apache which "id" to redirect Shib requests to.
         # Just uncomment this and change the value "my-dspace-id" to the associated @id attribute value.
         #ShibRequestSetting applicationId my-dspace-id
      </Location>

      # If a user attempts to access the DSpace shibboleth login page, force them to authenticate via Shib
      <Location "/shibboleth-login">
         AuthType shibboleth
         ShibRequireSession On
         # Please note that setting ShibUseHeaders to "On" is a potential security risk.
         # You may wish to set it to "Off". See the mod_shib docs for details about this setting:
         # https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig#NativeSPApacheConfig-AuthConfigOptions
         # Here's a good guide to configuring Apache + Tomcat when this setting is "Off":
         # https://www.switch.ch/de/aai/support/serviceproviders/sp-access-rules.html#javaapplications
         ShibUseHeaders On
         require valid-user
      </Location>

      # Finally, you may need to ensure requests to /Shibboleth.sso are NOT redirected
      # to Tomcat (as they need to be handled by mod_shib instead).
      # NOTE: THIS SETTING IS LIKELY ONLY NEEDED IF YOU ARE USING mod_proxy TO REDIRECT
      # ALL REQUESTS TO TOMCAT (e.g. ProxyPass / ajp://localhost:8009/)
      # ProxyPass /Shibboleth.sso !
   </IfModule>
   ...
</VirtualHost>
DSpace supports authentication using a NetID or an email address. A user's NetID is a unique identifier from
the IdP that identifies a particular user. The NetID can take almost any form, such as a unique integer or a
string; with Shibboleth 2.0 you can also use "targeted ids". You will need to coordinate with your Shibboleth
federation or identity provider. There are three ways to supply identity information to DSpace:
1. NetID from a Shibboleth header. The NetID-based method is superior because users may change their
   email address with the identity provider. When this happens DSpace will not be able to associate their
   new address with their old account.
2. Email address from a Shibboleth header. In the case where a NetID header is not available or not found,
   DSpace will fall back to identifying a user based upon their email address.
3. Tomcat's remote user. In the event that neither Shibboleth header is found, then as a last resort DSpace
   will look at Tomcat's remote user field. This is the least attractive option because Tomcat has no way to
   supply additional attributes about a user. Because of this the autoregister option is not supported if this
   method is used.
If you are currently using email-based authentication (either 2 or 3) and want to upgrade to NetID-based
authentication then there is an easy path. Simply enable Shibboleth to pass the NetID attribute and set the
netid-header below to the correct value. When a user attempts to log in, DSpace will first look for an
EPerson with the passed NetID; when this fails, DSpace will fall back to email-based authentication.
DSpace will then update the user's EPerson account record to set their NetID, so all future authentications
for this user will be based upon the NetID. One thing to note is that DSpace will prevent an account from
switching NetIDs. If an account already has a NetID set and the user then tries to authenticate with a
different NetID, the authentication will fail.
EPerson Metadata:
One of the primary benefits of using Shibboleth based authentication is receiving additional attributes about
users such as their names, telephone numbers, and possibly their academic department or graduation
semester if desired. DSpace treats the first and last name attributes differently because they (along with email
address) are the three pieces of minimal information required to create a new user account. For both first and
last name, supply direct mappings to the Shibboleth headers. In addition to the first and last name, DSpace
supports other metadata fields such as phone, or really anything you want to store on an EPerson object.
Beyond the phone field, which is accessible in the user's profile screen, none of these additional metadata fields
will be used by DSpace out of the box. However, if you develop any local modification you may access these
attributes from the EPerson object. The Vireo ETD workflow system utilizes this to aid students when submitting
an ETD.
Role-based Groups:
DSpace is able to place users into pre-defined groups based upon values received from Shibboleth. Using this
option you can place all faculty members into a DSpace group when the correct affiliation attribute is provided.
Groups assigned this way are considered 'special groups': they are real DSpace groups, but the user's
membership within them is not recorded in the database. Each time a user authenticates they are
automatically placed within the pre-defined DSpace group, so if the user loses their affiliation then the next time
they log in they will no longer be in the group.
Depending upon the Shibboleth attribute used in the role-header, it may be scoped. "Scoped" is Shibboleth
terminology for identifying where an attribute originated from. For example, a student's affiliation may be
encoded as "student@tamu.edu". The part after the @ sign is the scope, and the part preceding it is the value.
You may use the whole attribute, only the value, or only the scope. Using this you could generate one role for
students at one institution and a different role for students at another institution. Or, if you turn on
ignore-scope, you could ignore the institution and place all students into one group.
The values extracted (a user may have multiple roles) will be used to look up which groups to place the user
into. The groups are defined as "role.<role-name>", whose value is a comma separated list of DSpace groups.
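The scope handling and group lookup described above can be sketched as follows. The ScopedRole class is hypothetical, and rejecting the combination of both ignore flags is an assumption of this example; consult the ShibAuthentication source for the actual behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for scoped affiliation values such as "student@tamu.edu".
final class ScopedRole {
    // Reduce a possibly scoped attribute according to the two ignore flags.
    static String normalize(String raw, boolean ignoreScope, boolean ignoreValue) {
        if (ignoreScope && ignoreValue) {
            // Assumption: ignoring both parts leaves nothing to map, so treat it as a config error.
            throw new IllegalArgumentException("ignore-scope and ignore-value cannot both be enabled");
        }
        int at = raw.indexOf('@');
        if (at < 0) {
            return raw;                       // unscoped attribute: nothing to strip
        }
        if (ignoreScope) {
            return raw.substring(0, at);      // keep the value, e.g. "student"
        }
        if (ignoreValue) {
            return raw.substring(at + 1);     // keep the scope, e.g. "tamu.edu"
        }
        return raw;                           // keep the whole attribute
    }

    // Look up "role.<role-name>" and split its comma separated list of group names.
    static List<String> groupsFor(Map<String, String> config, String roleName) {
        List<String> groups = new ArrayList<String>();
        String value = config.get("role." + roleName);
        if (value != null) {
            for (String group : value.split(",")) {
                if (!group.trim().isEmpty()) {
                    groups.add(group.trim());
                }
            }
        }
        return groups;
    }
}
```

For example, with ignore-scope enabled, "student@tamu.edu" reduces to "student", which would then be looked up as role.student.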
Configuration File: [dspace]/config/modules/authentication-shibboleth.cfg

Property: lazysession
Informational Note: Whether to use lazy sessions or active sessions. For most DSpace instances, you will
likely want to use lazy sessions. Active sessions will force every user to authenticate via Shibboleth before
they can access your DSpace (essentially resulting in a "dark archive").

Property: lazysession.loginurl
Informational Note: The URL to start a Shibboleth session (only for lazy sessions). Generally this setting will
be "/Shibboleth.sso/Login".

Property: lazysession.secure
Informational Note: Force HTTPS when authenticating (only for lazy sessions). Generally this is recommended
to be "true".

Property: netid-header
Informational Note: The HTTP header where Shibboleth will supply a user's NetID. This HTTP header should be
specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.

Property: email-header
Informational Note: The HTTP header where Shibboleth will supply a user's email address. This HTTP header
should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.

Property: email-use-tomcat-remote-user
Informational Note: When neither the NetID nor the email header is available, should Shibboleth
authentication fall back to using Tomcat's remote user feature? Generally this is not recommended. See the
"Authentication Methods" section above.

Property: reconvert.attributes
Informational Note: Shibboleth attributes are by default UTF-8 encoded. Some servlet containers automatically
convert the attributes from ISO-8859-1 (latin-1) to UTF-8. As the attributes already were UTF-8 encoded it
may be necessary to reconvert them. If you set this property to true, DSpace converts all Shibboleth attributes
retrieved from the servlet container from UTF-8 to ISO-8859-1 and uses the result as if it were UTF-8. This
procedure restores the Shibboleth attributes if the servlet container wrongly converted them from ISO-8859-1
to UTF-8. Set this to true if you notice character encoding problems within Shibboleth attributes.
This property was added in DSpace version 4.2 and is not available in earlier DSpace versions.
Property: autoregister
Property: sword.compatibility
Informational Note: SWORD compatibility will allow this authentication method to work when using SWORD.
SWORD relies on username and password based authentication and is entirely incapable of supporting
Shibboleth. This option allows you to authenticate usernames and passwords for SWORD sessions without
adding another authentication method onto the stack. You will need to ensure that a user has a password.
One way to do that is to create the user via the create-administrator command line command and then edit
their permissions.
WARNING: If you enable this option while ALSO having "PasswordAuthentication" enabled, then you should
ensure that "PasswordAuthentication" is listed prior to "ShibAuthentication" in your authentication.cfg file.
Otherwise, ShibAuthentication will be used to authenticate all of your users INSTEAD OF
PasswordAuthentication.
Property: firstname-header
Informational Note: The HTTP header where Shibboleth will supply a user's given name. This HTTP header
should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.

Property: lastname-header
Informational Note: The HTTP header where Shibboleth will supply a user's surname. This HTTP header
should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.
Property: eperson.metadata
Example Value:
eperson.metadata = \
    SHIB-telephone => phone, \
    SHIB-cn => cn
Informational Note: Additional user attribute mappings; multiple attributes may be stored for each user. The
left side is the Shibboleth-based metadata header and the right side is the eperson metadata field to map the
attribute to.
Property: eperson.metadata.autocreate
Informational Note: If the eperson metadata field is not found, should it be automatically created?

Property: role-header
Informational Note: The Shibboleth header to use for role-based mappings (see the "Role-based Groups"
section above).

Property: role-header.ignore-scope
Informational Note: Whether to ignore the attribute's scope (everything after the @ sign for scoped attributes).

Property: role-header.ignore-value
Informational Note: Whether to ignore the attribute's value (everything before the @ sign for scoped attributes).

Property: role.[affiliation-attribute]
Example Value:
role.faculty = Faculty, Member
role.staff = Staff, Member
role.student = Students, Member
Informational Note: Mapping of affiliation values to DSpace groups. See the "Role-based Groups" section above
for more info.
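The round trip performed when reconvert.attributes is enabled (listed among the properties above) can be illustrated with plain JDK calls. The Reconvert class and its method names are hypothetical; the misdecode() method merely simulates a servlet container that reads UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of repairing a wrongly decoded attribute value.
final class Reconvert {
    // Simulate a container that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: recover the original bytes via ISO-8859-1, then decode them as UTF-8.
    static String reconvert(String fromContainer) {
        return new String(fromContainer.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

A name such as "M\u00fcller" (Müller) arrives from such a container as "M\u00c3\u00bcller", and reconvert() restores the original string. Applying the conversion to values that were decoded correctly would corrupt them, which is presumably why the behaviour is switchable.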
LDAP Authentication
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.LDAPAuthentication
If you want to give any special privileges to LDAP users, create a stackable authentication method to
automatically put people who have a netid into a special group. You might also want to give certain email
addresses special privileges. Refer to the Custom Authentication Code section below for more information
about how to do this.
Configuration File: [dspace]/config/modules/authentication-ldap.cfg

Property: enable
Informational Note: This setting will enable or disable LDAP authentication in DSpace. With the setting off,
users will be required to register and log in with their email address. With this setting on, users will be able
to log in and register with their LDAP user IDs and passwords.

Property: autoregister
Informational Note: This will turn LDAP autoregistration on or off. With this on, a new EPerson object will be
created for any user who successfully authenticates against the LDAP server when they first log in. With this
setting off, the user must first register to get an EPerson object by entering their LDAP username and
password and filling out the forms.

Property: provider_url
Informational Note: This is the URL of your institution's LDAP server. You may or may not need the
/o=myu.edu part at the end. Your server may also require the ldaps:// protocol.

Property: id_field
Explanation: This is the unique identifier field in the LDAP directory where the username is stored.

Property: object_context
Informational Note: This is the object context used when authenticating the user. It is appended to the
id_field and username. For example uid=username,ou=people,o=myu.edu. You will need to modify this
to match your LDAP configuration.

Property: search_context
Informational Note: This is the search context used when looking up a user's LDAP object to retrieve their
data for autoregistering. With autoregister turned on, when a user authenticates without an EPerson object
we search the LDAP directory to get their name and email address so that we can create one for them. So
after we have authenticated against uid=username,ou=people,o=myu.edu we then search in ou=people,
filtering on [uid=username]. Often the search_context is the same as the object_context parameter, but
again this depends on your LDAP server configuration.

Property: email_field
Informational Note: This is the LDAP object field where the user's email address is stored. "mail" is the
default and the most common for LDAP servers. If the mail field is not found, the username will be used as
the email address when creating the eperson object.
Property: surname_field
Example Value: surname_field = sn
Informational Note: This is the LDAP object field where the user's last name is stored. "sn" is the default and
is the most common for LDAP servers. If the field is not found, the field will be left blank in the new eperson
object.

Property: givenname_field
Informational Note: This is the LDAP object field where the user's given names are stored. The givenName
field may not be present in every LDAP instance. If the field is not found, the field will be left blank in the new
eperson object.

Property: phone_field
Informational Note: This is the field where the user's phone number is stored in the LDAP directory. If the
field is not found, the field will be left blank in the new eperson object.
Property: login.specialgroup
Informational Note: If required, a group name can be given here, and all users who log into LDAP will
automatically become members of this group. This is useful if you want a group made up of all internal
authenticated users. (Remember to log on as the administrator, add this to the "Groups" with read rights).

Property: login.groupmap.*
that user would get assigned to the ALL_STUDENTS DSpace group on login.
Note 2: This option can be used independently from the login.specialgroup option, which will put all LDAP
users into a single DSpace group. Both options may be used together.
Please note, that DSpace 3.0 doesn't contain the LDAPHierarchicalAuthentication class
anymore. This functionality is now supported by LDAPAuthentication, which uses the same
configuration options. See Upgrading From 1.8.x to 3.x for information about upgrading.
If your users are spread out across a hierarchical tree on your LDAP server, you may wish to have DSpace
search for the user name in your tree. Here's how it works:
You can optionally specify the search scope. If anonymous access is not enabled on your LDAP server, you will
need to specify the full DN and password of a user that is allowed to bind in order to search for the users.
Configuration File: [dspace]/config/modules/authentication-ldap.cfg

Property: search_scope
Example Value: search_scope = 2
Informational Note: This is the search scope value for the LDAP search during autoregistering. This will
depend on your LDAP server setup. This value must be one of the following integers corresponding to the
following values:
    object scope : 0
    one level scope : 1
    subtree scope : 2

Property: search.user
          search.password
Informational Note: The full DN and password of a user allowed to connect to the LDAP server and search for
the DN of the user trying to log in. If these are not specified, the initial bind will be performed anonymously.

Property: netid_email_domain
Informational Note: If your LDAP server does not hold an email address for a user, you can use this field to
specify your email domain. This value is appended to the netid in order to make an email address. E.g. a
netid of 'user' and netid_email_domain as @example.com would set the email of the user to be
user@example.com
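How id_field, object_context and netid_email_domain combine can be sketched with simple string handling. The LdapNames class is hypothetical; the order of the email fallbacks is an assumption drawn from the property notes above, and the sketch omits the escaping of special characters that a real DN would require.

```java
// Hypothetical helper showing how the LDAP settings above fit together.
final class LdapNames {
    // id_field=username followed by the object_context, e.g. uid=username,ou=people,o=myu.edu
    static String userDn(String idField, String username, String objectContext) {
        return idField + "=" + username + "," + objectContext;
    }

    // Assumed fallback order: directory mail value, then netid + netid_email_domain, then the netid itself.
    static String email(String ldapMail, String netid, String netidEmailDomain) {
        if (ldapMail != null && !ldapMail.isEmpty()) {
            return ldapMail;
        }
        if (netidEmailDomain != null && !netidEmailDomain.isEmpty()) {
            return netid + netidEmailDomain;
        }
        return netid;
    }
}
```

With id_field = uid and object_context = ou=people,o=myu.edu, the user "username" is bound as uid=username,ou=people,o=myu.edu.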
IP Authentication
Enabling IP Authentication
To enable IP Authentication, you must ensure the org.dspace.authenticate.IPAuthentication class
is listed as one of the AuthenticationMethods in the following configuration:
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.IPAuthentication
Configuring IP Authentication
Configuration File: [dspace]/config/modules/authentication-ip.cfg
Once enabled, you are then able to map DSpace groups to IP addresses in authentication-ip.cfg by
setting ip.GROUPNAME = iprange[, iprange ...], e.g:
Negative matches can be set by prepending the entry with a '-'. For example, if you want to include all of a
class B network except for users of a contained class C network, you could use: 111.222,-111.222.333.
Notes:
If the Groupname contains blanks you must escape the spaces, e.g. "Department\ of\ Statistics"
If your DSpace installation is hidden behind a web proxy, remember to set the useProxies
configuration option within the 'Logging' section of dspace.cfg to use the IP address of the user rather
than the IP address of the proxy server.
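The interaction of positive and negative entries can be sketched with simple octet-prefix matching. The IpPrefixRule class is hypothetical and only handles partial addresses such as "192.168"; the real IPAuthentication class may accept other range notations that this sketch does not cover.

```java
// Hypothetical sketch of "include this range, except that sub-range" matching.
final class IpPrefixRule {
    // True when every octet of the prefix equals the corresponding octet of the address.
    static boolean prefixMatches(String prefix, String address) {
        String[] p = prefix.split("\\.");
        String[] a = address.split("\\.");
        if (p.length > a.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals(a[i])) {
                return false;
            }
        }
        return true;
    }

    // rule is a comma separated list of prefixes; entries starting with '-' veto a match.
    static boolean inGroup(String rule, String address) {
        boolean included = false;
        for (String entry : rule.split(",")) {
            String e = entry.trim();
            if (e.startsWith("-")) {
                if (prefixMatches(e.substring(1), address)) {
                    return false;             // a negative match always wins
                }
            } else if (prefixMatches(e, address)) {
                included = true;
            }
        }
        return included;
    }
}
```

Comparing whole octets rather than raw string prefixes matters here: with the rule "192.168,-192.168.5", the address 192.168.50.20 stays in the group while 192.168.5.20 is excluded.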
1. See the HTTPS installation instructions to configure your Web server. If you are using HTTPS with
Tomcat, note that the <Connector> tag must include the attribute clientAuth="true" so the server
requests a personal Web certificate from the client.
2. Add the org.dspace.authenticate.X509Authentication plugin first to the list of stackable
authentication methods in the value of the configuration key
plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.X509Authentication, \
org.dspace.authenticate.PasswordAuthentication
3. You must also configure DSpace with the same CA certificates as the web server, so it can accept and
interpret the clients' certificates. It can share the same keystore file as the web server, or a separate one,
or a CA certificate in a file by itself. Configure it by one of these methods, either the Java keystore
4. Choose whether to enable auto-registration: If you want users who authenticate successfully to be
automatically registered as new E-Persons if they are not already, set the autoregister configuration
property to true. This lets you automatically accept all users with valid personal certificates. The default
is false.
By keeping this code in a separate method, we can customize the authentication process for MIT by simply
adding it to the stack in the DSpace configuration. None of the code has to be touched.
You can create your own custom authentication method and add it to the stack. Use the most similar existing
method as a model, e.g. org.dspace.authenticate.PasswordAuthentication for an "explicit" method
(with credentials entered interactively) or org.dspace.authenticate.X509Authentication for an
implicit method.
4.3.1 Introduction
DSpace provides a batch metadata editing tool. The batch editing tool is able to produce a comma delimited file
in the CSV format. The batch editing tool lets the user perform the following:
For information about configuration options for the Batch Metadata Editing tool, see Batch Metadata Editing
Configuration.
Out of the box, the batch metadata editing features do not support the DSpace versioning system.
Changes are applied straight on the item metadata and no versions of these items are being
generated and stored as part of these edit operations. Be careful when using these features.
Whenever you are on a collection, you will have the possibility to export the metadata of that specific collection.
You just have to click "Export Metadata" in the Context menu
After you have altered the metadata, you can import it back into the repository quite simply. You just need to go
to the homepage.
On this page you can see which changes you have made within the CSV-file. You can now either accept these
changes and click "Apply changes" or not, in that case click "Return".
Whenever you are on a collection, you will have the possibility to export the metadata of that specific collection.
You just have to click "Export Metadata" in the Admin tools.
After you have altered the metadata, you can import it back into the repository quite simply. On (almost) every
page of the repository you can access the administrator tools.
Once you are in the administrator tools, just click on "content" and then you only have to select "import
metadata" from the list that drops down.
On this page you can see which changes you have made within the CSV-file. You can now either accept these
changes and click "Apply changes" or not, in that case click "Return".
Export parameters
The following table summarizes the basics.
Arguments (short and (long) forms)   Description
-i or --id                           The Item, Collection, or Community handle or Database ID to export. If not
                                     specified, all items will be exported.
-a or --all                          Include all the metadata fields that are not normally changed (e.g.
                                     provenance) or those fields you configured in the
                                     [dspace]/config/modules/bulkedit.cfg to be ignored on export.
Example commands
Example:
In the above example, we have requested that the collection assigned handle '1989.1/24' be exported in its
entirety to the file 'col_14.csv' in the '/batch_export' directory.
Import parameters
The following table summarizes the basics.
-s or --silent Silent mode. The import function does not prompt you to make sure you wish to
make the changes.
-e or --email The email address of the user. This is only required when adding new items.
-w or --workflow When adding new items, the program will queue the items up to use the
Collection Workflow processes.
-n or --notify When adding new items using a workflow, send notification emails.
-t or --template When adding new items, use the Collection template, if it exists.
Silent Mode should be used carefully. It is possible (and probable) that you can overlay the wrong data and
cause irreparable damage to the database.
Example commands
Example
If you wish to upload new metadata without bitstreams, at the command line:
In the above example we threw in all the arguments. This would add the metadata and engage the workflow,
notification, and templates to all be applied to the items that are being added.
It is not recommended to import CSV files of more than 1,000 lines. When importing files larger than
this, it is hard to accurately verify the changes that the import tool states it will make, and large files
may cause 'Out Of Memory' errors part way through the process.
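One way to stay under that size is to split a large export into smaller files before importing them one at a time. The following is a minimal sketch, not part of DSpace: the function name split_csv and the max_rows parameter are hypothetical, and max_rows counts data rows, so each chunk also carries one header row.

```python
import csv
import io


def split_csv(csv_text, max_rows=1000):
    """Split CSV text into chunks of at most max_rows data rows, repeating the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), max_rows):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)  # every chunk needs the header row
        writer.writerows(data[start:start + max_rows])
        chunks.append(out.getvalue())
    return chunks
```

Each returned string can be written to its own file and imported separately, which keeps the list of reported changes short enough to review.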
File Structure. The first row of the CSV must define the metadata fields that the rest of the CSV represents. The
first column must always be "id", which refers to the item's ID. All other columns are optional. The other columns
name the Dublin Core metadata fields in which the data is to reside.
id,collection,dc.title,dc.contributor,dc.date.issued,etc,etc,etc.
Subsequent rows in the csv file relate to items. A typical row might look like:
If you want to store multiple values for a given metadata element, they can be separated with the double-pipe '||'
(or another character that you defined in your modules/bulkedit.cfg file). For example:
Horses||Dogs||Cats
Elements are stored in the database in the order that they appear in the csv file. You can use this to order
elements where order may matter, such as authors, or controlled vocabulary such as Library of Congress
Subject Headings.
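When preparing a CSV outside DSpace, the multi-value convention above can be handled with two small helpers. This is an illustrative sketch only: the function names are hypothetical, the separator defaults to the double pipe, and stripping whitespace around each value is a convenience of this sketch rather than documented DSpace behaviour.

```python
def split_values(cell, separator="||"):
    """Split one CSV cell into its ordered metadata values."""
    if cell == "":
        return []
    # Whitespace stripping is an assumption of this sketch.
    return [value.strip() for value in cell.split(separator)]


def join_values(values, separator="||"):
    """Join ordered metadata values back into a single CSV cell."""
    return separator.join(values)


subjects = split_values("Horses||Dogs||Cats")
print(subjects)               # ['Horses', 'Dogs', 'Cats']
print(join_values(subjects))  # Horses||Dogs||Cats
```

Because the list order is preserved in both directions, reordering the list before joining is enough to change the order in which values are stored.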
When importing a csv file, the importer will overlay the data onto what is already in the repository to determine
the differences. It only acts on the contents of the csv file, rather than on the complete item metadata. This
means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows
(items) or Columns (metadata elements) can be removed and will be ignored. For example, if you only want to
edit item abstracts, you can remove all of the other columns and just leave the abstract column. (You do need
to leave the ID column intact. This is mandatory).
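Trimming an exported file down to the columns you intend to edit can be scripted. The sketch below is an assumption-laden illustration: keep_columns is a hypothetical helper, and the column name dc.description.abstract is used only as an example (the headers in a real export may differ, for instance by carrying a language qualifier).

```python
import csv
import io


def keep_columns(csv_text, wanted):
    """Return CSV text containing only the 'id' column plus the wanted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "id" not in reader.fieldnames:
        raise ValueError("the CSV must have an 'id' column")
    # 'id' is mandatory, so it is always kept and always comes first.
    columns = ["id"] + [n for n in reader.fieldnames if n in wanted and n != "id"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


exported = ("id,collection,dc.title,dc.description.abstract\n"
            "1,1989.1/24,A title,Old abstract\n")
print(keep_columns(exported, {"dc.description.abstract"}))
```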
If you are using the web user interface for adding metadata-only items, any activated collection workflow steps
are effectively bypassed. As a result, these added items are immediately archived into the repository.
1. 'expunge' This permanently deletes an item. Use with care! This action must be enabled by setting
'allowexpunge = true' in modules/bulkedit.cfg
2. 'withdraw' This withdraws an item from the archive, but does not delete it.
3. 'reinstate' This reinstates an item that has previously been withdrawn.
If an action makes no change (for example, asking to withdraw an item that is already withdrawn) then, just like
metadata that has not changed, this will be ignored.
1. Insert a new column. The first row should be the new metadata element. (We will refer to it as the
TARGET)
2. Select the column/rows of the data you wish to change. (We will refer to it as the SOURCE)
3. Cut and paste this data into the new column (TARGET) you created in Step 1.
4. Leave the column (SOURCE) you just cut and pasted from empty. Do not delete it.
Common Issues
Metadata values in CSV export seem to have duplicate columns
Configuration File: [dspace]/config/modules/bulkedit.cfg
Property: valueseparator
Example Value: valueseparator = ||
Informational note: The delimiter used to separate values within a single field. For example, this will place the
double pipe between multiple authors appearing in one record (Smith, William || Johannsen,
Susan). This applies to any metadata field that appears more than once in a record. The
user can change this to another character.
Property: fieldseparator
Example Value: fieldseparator = ,
Informational note: The delimiter used to separate fields (defaults to a comma for CSV). Again, the user could
change it to something like '$'. If you wish to use a tab, semicolon, or hash (#) sign as the
delimiter, set the value to be tab, semicolon or hash, for example:
fieldseparator = tab
Property: gui-item-limit
Example Value: gui-item-limit = 20
Informational note: When using the web UI, this sets the maximum number of items that can be edited in
one operation. There is no limit when using the CLI.
Property: ignore-on-export
Example Value:
ignore-on-export = dc.date.accessioned, \
dc.date.available, \
dc.date.updated, dc.description.provenance
Informational note: Metadata elements to exclude when exporting via the user interfaces, or when using the
command line version and not using the -a (all) option.
For ease of use, the Configuration documentation is broken into several parts:
General Configuration - addresses general conventions used with configuring not only the dspace.cfg
file, but other configuration files which use similar conventions.
The build.properties Configuration Properties File - specifies the basic build.properties file settings
(these basic settings are used when building/installing/upgrading DSpace)
The dspace.cfg Configuration Properties File - specifies the basic dspace.cfg file settings (these
settings are used when DSpace is actually running)
Optional or Advanced Configuration Settings - contain other more advanced settings that are optional in
the dspace.cfg configuration file.
As of version 1.8 much of the DSpace configuration has been moved to discrete configuration
files related to specific functionality and is documented in subsequent sections of this
document.
General Configuration
Input Conventions
Update Reminder
The build.properties Configuration Properties File
The dspace.cfg Configuration Properties File
Main DSpace Configurations
DSpace Database Configuration
DSpace Email Settings
Wording of E-mail Messages
File Storage
SRB (Storage Resource Brokerage) File Storage
Logging Configuration
Configuring the Search Engine
Delegation Administration: Authorization System Configuration
Login as feature
Restricted Item Visibility Settings
Proxy Settings
Configuring Media Filters
Crosswalk and Packager Plugin Settings
Configurable MODS Dissemination Crosswalk
XSLT-based Crosswalks
Testing XSLT Crosswalks
Configurable Qualified Dublin Core (QDC) dissemination crosswalk
Configuring Crosswalk Plugins
Configuring Packager Plugins
In general, most of the configuration files, namely dspace.cfg and xmlui.xconf will provide a good source
of information not only with configuration but also with customization (cf. Customization chapters)
Input Conventions
We will use the dspace.cfg as our example for input conventions used throughout the system. It is a basic Java
properties file, where lines are either comments, starting with a '#', blank lines, or property/value pairs of the
form:
Some property defaults are "commented out". That is, they have a "#" preceding them, and the DSpace
software ignores the config property. This may cause the feature not to be enabled, or, cause a default property
to be used when the software is compiled and updated.
The property value may contain references to other configuration properties, in the form ${property.name}.
This follows the ant convention of allowing references in property files. A property may not refer to itself.
Examples:
Property values can include other, previously defined values, by enclosing the property name in ${...}. For
example, if your dspace.cfg contains:
dspace.dir = /dspace
dspace.history = ${dspace.dir}/history
Then the value of dspace.history property is expanded to be /dspace/history. This method is especially useful
for handling commonly used file paths.
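The expansion rule can be illustrated with a short sketch. This is not DSpace's own implementation: the function expand and the dictionary config are hypothetical, and the sketch simply rejects any circular chain of references, which covers the stated rule that a property may not refer to itself.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def expand(name, properties, _seen=()):
    """Return the value of `name` with every ${other.name} reference expanded."""
    if name in _seen:
        raise ValueError("circular reference: " + " -> ".join(_seen + (name,)))
    value = properties[name]  # an undefined property raises KeyError
    return _REFERENCE.sub(
        lambda match: expand(match.group(1), properties, _seen + (name,)),
        value,
    )


config = {
    "dspace.dir": "/dspace",
    "dspace.history": "${dspace.dir}/history",
}
print(expand("dspace.history", config))  # /dspace/history
```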
Update Reminder
Things you should know about editing dspace.cfg files.
It is important to remember that there are multiple dspace.cfg files in several places after an installation
of DSpace. The only two you should notice are:
To keep the two files in synchronization, you can edit your files in [dspace-source]/dspace/config/ and
then you would run the following commands:
cd [dspace-source]/dspace/
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant update_configs
This will copy the source dspace.cfg (along with other configuration files) into the runtime ([dspace]
/config) directory.
Please note that there are in fact two options available; choose whichever you prefer:
"ant update_configs" ==> Moves existing configs in [dspace]/config/ to *.old files and replaces them
with what is in [dspace-source]/dspace/config/
"ant -Doverwrite=false update_configs" ==> Leaves existing configs in [dspace]/config/ intact. Just
copies new configs from [dspace-source]/dspace/config/ over to *.new files.
Users/Developers may also choose to copy the build.properties under a different name for different
environments (e.g. development, test & production), and choose which environment to build DSpace for by
passing a "-Denv" (environment) flag to the Maven build process (e.g. "mvn package -Denv=test" would build
DSpace using a custom "test.properties" file).
Here's a basic example of how build.properties (or any *.properties) file may be used to simplify installation &
development:
It is worth noting that the [dspace-source]/build.properties file (or custom properties file) is
ONLY used in the act of building/installing/upgrading DSpace. During that build/install/upgrade
process, any setting currently available in the build.properties will be inherited (copied) to the
dspace.cfg file. However, if you need to add new settings to your build.properties file, you will
need to modify your dspace.cfg file in order for it to be inherited (see the note below titled "You may
add new settings to your build.properties or custom *.properties").
Once DSpace is installed, the system only uses the settings in your [dspace]/config/dspace.cfg
file. So, the build.properties file will not be copied into your installation directory, and all runtime
configurations are pulled from the final dspace.cfg file.
When you edit the "build.properties" file (or a custom *.properties file), take care not to remove or
comment out any settings. Doing so may cause your final "dspace.cfg" file to be misconfigured with
regard to that particular setting. Instead, if you wish to remove/disable a particular setting, just clear
out its value. For example, if you don't want to be notified of new user registrations, ensure the
"mail.registration.notify" setting has no value, e.g.
mail.registration.notify=
Based on your institution's needs, you may wish to add settings to your own build.properties (or
custom *.properties) file. This is actually a relatively easy process.
Any existing DSpace configuration (any config in dspace.cfg or in any configuration file under
[dspace-src]/dspace/config/modules/*.cfg) can be "moved" into your local build.properties
file via the following process:
1. First, copy the existing configuration from the *.cfg file into your local build.properties file. You
can actually choose to rename this configuration in build.properties, if it makes more sense.
Essentially, the name of the new configuration in build.properties is entirely up to you.
a. For example, if you want to copy the LDAP "provider_url" from
[dspace-src]/dspace/config/modules/authentication-ldap.cfg to your build.properties,
you may wish to rename it to "ldap.provider_url" within build.properties
b. You can also choose to keep the name of the configuration the same in build.properties.
For example, if you wish to move the "xmlui.google.analytics.key" (from
dspace.cfg) to your build.properties, you could keep the name the same.
2. Second, you will need to modify the corresponding configuration file (the config file you copied
the setting from) so that it now references your newly added build.properties setting. This is
achieved by using the "${setting-in-build.properties}" placeholder.
a. For example, to reference a new "ldap.provider_url" setting in build.properties
(mentioned in 1.a above), just modify the
[dspace-src]/dspace/config/modules/authentication-ldap.cfg file to have a line that says
provider_url=${ldap.provider_url}
(The first part is the name of the actual config in authentication-ldap.cfg,
and the second part is the name of the config in build.properties)
b. Another example: To reference a new "xmlui.google.analytics.key" setting in
build.properties (mentioned in 1.b above), just modify the
[dspace-src]/dspace/config/dspace.cfg file to have a line that says
xmlui.google.analytics.key=${xmlui.google.analytics.key}
(The first part is the name of the actual config in dspace.cfg, and the second part is
the name of the config in build.properties)
3. Finally, rebuild DSpace (using Maven), and redeploy (using Ant). The new settings in your
build.properties file will automatically be copied into your configuration file during the rebuild process.
In ordinary use, this file is assumed to be [dspace]/config/dspace.cfg. If you define a system property
-Ddspace.configuration=/some/path/to/a/file then that file will be used instead.
Example Value: /dspace
Informational Note: Root directory of DSpace installation. Omit the trailing slash '/'. Note that if you change this,
there are several other parameters you will probably want to change to match, e.g.
assetstore.dir.
(On Windows be sure to use forward slashes for the directory path! For example: "C:/dspace"
is a valid path for Windows.)
Property: dspace.hostname
Property: dspace.baseUrl
Example http://dspacetest.myu.edu:8080
Value:
Informational Main URL at which DSpace Web UI webapp is deployed. Include any port number, but do not
Note: include the trailing '/'.
Property: dspace.url
Informational DSpace base URL. URL that determines whether JSPUI or XMLUI will be loaded by default.
note Include port number etc., but NOT trailing slash. Change to /xmlui if you wish to use the
xmlui (Manakin) as the default, or remove "/jspui" and set webapp of your choice as the
"ROOT" webapp in the servlet engine.
Property: dspace.oai.url
Informational The base URL of the OAI webapp (do not include /request).
note:
Property: dspace.name
Informational Short and sweet site name, used throughout Web UI, e-mails and elsewhere (such as OAI
Note: protocol)
Property: db.name
Property: db.url
Informational Note: The above value is the default value when configuring with PostgreSQL. When using Oracle,
use this value: jdbc:oracle:thin:@//host:port/dspace
Property: db.username
Informational In the installation directions, the administrator is instructed to create the user "dspace" who
Note: will own the database "dspace".
Property: db.password
Informational This is the password that was prompted during the installation process (cf. 3.2.3. Installation)
Note:
Property: db.schema
Informational If your database contains multiple schemas, you can avoid problems with retrieving the
Note: definitions of duplicate objects by specifying the schema name here that is used for DSpace
by uncommenting the entry. This property is optional.
Property: db.maxconnections
Example db.maxconnections = 30
Value:
Property: db.maxwait
Informational Maximum time to wait before giving up if all connections in pool are busy (in milliseconds).
Note:
Property: db.maxidle
Example db.maxidle = -1
Value:
Property: db.statementpool
Property: db.poolname
Informational Specify a name for the connection pool. This is useful if you have multiple applications
Note: sharing Tomcat's database connection pool. If nothing is specified, it will default to
'dspacepool'
Property: db.jndi
Informational Specify the name of a configured connection pool to be fetched from a directory using JNDI. If
Note: this property is not configured or no such pool can be retrieved, then DSpace will fall back to
creating its own pool using the other db.* properties. db.name must still be specified.
DSpace will look up a javax.mail.Session object in JNDI and, if found, will use that to send email. Otherwise it
will create a Session using some of the properties detailed below.
Property: mail.server
Informational The address on which your outgoing SMTP email server can be reached.
Note:
Property: mail.server.username
Informational SMTP mail server authentication username, if required. This property is optional.
Note:
Property: mail.server.password
Informational Note: SMTP mail server authentication password, if required. This property is optional.
Property: mail.server.port
Example mail.server.port = 25
Value:
Informational The port on which your SMTP mail server can be reached. By default, port 25 is used.
Note: Change this setting if your SMTP mailserver is running on another port. This property is
optional.
Property: mail.from.address
Informational The "From" address for email. Change the 'myu.edu' to the site's host name.
Note:
Property: feedback.recipient
Informational When a user clicks on the feedback link/feature, the information will be sent to the email
Note: address of choice. This configuration is currently limited to only one recipient. Since DSpace
4.0, this is also the email address displayed on the contacts page.
Property: mail.admin
Property: alert.recipient
Informational Enter the recipient for server errors and alerts. This property is optional.
Note:
Property: registration.notify
Informational Enter the recipient that will be notified when a new user registers on DSpace. This property is
Note: optional.
Property: mail.charset
Informational Set the default mail character set. This may be over-ridden by providing a line inside the email
Note: template 'charset: <encoding>', otherwise this default is used.
Property: mail.allowed.referrers
Informational A comma separated list of hostnames that are allowed to refer browsers to email forms.
Note: Default behavior is to accept referrals only from dspace.hostname. This property is optional.
Property: mail.extraproperties
Example
Value: mail.extraproperties = mail.smtp.socketFactory.port=465, \
mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory, \
mail.smtp.socketFactory.fallback=false
Informational If you need to pass extra settings to the Java mail library. Comma separated, equals sign
Note: between the key and the value. This property is optional.
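The "comma separated, equals sign between key and value" format of mail.extraproperties can be sketched as follows. This only illustrates the format and is not how DSpace itself reads the setting; parse_extra_properties is a hypothetical helper, and it assumes that neither keys nor values contain commas.

```python
def parse_extra_properties(value):
    """Parse 'key=value, key=value' into a dict, ignoring empty entries."""
    settings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, setting = entry.partition("=")
        settings[key.strip()] = setting.strip()
    return settings


raw = ("mail.smtp.socketFactory.port=465, "
       "mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory, "
       "mail.smtp.socketFactory.fallback=false")
for key, setting in parse_extra_properties(raw).items():
    print(key, "=", setting)
```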
Property: mail.server.disabled
Informational An option is added to disable the mailserver. By default, this property is set to 'false'. By
Note: setting value to 'true', DSpace will not send out emails. It will instead log the subject of the
email which should have been sent. This is especially useful for development and test
environments where production data is used when testing functionality. This property is
optional.
Property: mail.session.name
Informational Specifies the name of a javax.mail.Session object stored in JNDI under java:comp/env
Note: /mail. The default value is "Session".
Property: default.language
Informational If no other language is explicitly stated in the input-forms.xml, the default language will be
Note: attributed to the metadata values.
File Storage
DSpace supports two distinct options for storing your repository bitstreams (uploaded files). The files are not
stored in the database in which metadata, user information, etc. are stored. An assetstore is a directory on your
server, in which the bitstreams are stored and consulted afterwards. The usage of different assetstore
directories is the default "technique" in DSpace. The parameters below define which assetstores are present,
and which one should be used for newly incoming items. As an alternative, DSpace can also use SRB (Storage
Resource Brokerage). See SRB File Storage for details regarding SRB.
Property: assetstore.dir
Informational Note: This is Asset (bitstream) store number 0 (zero). You need not place your assetstore under the
/dspace directory, but may want to place it on a different logical volume on the server where
DSpace resides. So, you might have something like this: assetstore.dir = /storevgm/assetstore
Property:
assetstore.dir.1
assetstore.dir.2
Example
Value: assetstore.dir.1 = /second/assetstore
assetstore.dir.2 = /third/assetstore
Informational This property specifies extra asset stores like the one above, counting from one (1) upwards.
Note: This property is commented out (#) until it is needed.
Property: assetstore.incoming
Example assetstore.incoming = 1
Value:
Informational Note: Specify the number of the store to use for new bitstreams with this property. The default is 0
(zero), which corresponds to the 'assetstore.dir' above. As the asset store number is stored in
the item metadata (in the database), always keep the assetstore numbering consistent and
don't change the asset store number in the item metadata.
Be Careful
In the examples above, you can see that your storage does not have to be under the /dspace
directory. For the default installation it needs to reside on the same server (unless you plan to
configure SRB (see below)). So, if you added storage space to your server, and it has a different
logical volume/name/directory, you could have the following as an example:
assetstore.dir = /storevgm/assetstore
assetstore.dir.1 = /storevgm2/assetstore
assetstore.incoming = 1
Please Note: When adding additional storage configuration, you will then need to uncomment and declare
assetstore.incoming = 1
The same framework is used to configure SRB storage. That is, the asset store number (0..n) can reference a
file system directory as above or it can reference a set of SRB account parameters. But any particular asset
store number can reference one or the other but not both. This way traditional and SRB storage can both be
used but with different asset store numbers. The same cautions mentioned above apply to SRB asset stores as
well. The particular asset store a bitstream is stored in is held in the database, so don't move bitstreams
between asset stores, and do not renumber them.
Property: srb.hosts.1
Property: srb.port.1
Property: srb.mcatzone.1
Informational Your SRB Metadata Catalog Zone. An SRB Zone (or zone for short) is a set of SRB servers
Note: 'brokered' or administered through a single MCAT. Hence a zone consists of one or more
SRB servers along with one MCAT-enabled server. Any existing SRB system (version 2.x.x
and below) can be viewed as an SRB zone. For more information on zones, please check
http://www.sdsc.edu/srb/index.php/Zones.
Property: srb.mdasdomainname.1
Informational Your SRB domain. This domain should be created under the same zone, specified in srb.
Note: mcatzone. Information on domains is included here http://www.sdsc.edu/srb/index.php/Zones.
Property: srb.defaultstorageresource.1
Property: srb.username.1
Property: srb.password.1
Property: srb.homedirectory.1
Example
Value: srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain
Property: srb.parentdir.1
Informational Note: Several of the terms, such as mcatzone, have meaning only in the SRB context and will be
familiar to SRB users. The last, srb.parentdir.n, can be used for additional (SRB) upper
directory structure within an SRB account. This property value could be blank as well.
The 'assetstore.incoming' property is an integer that references where new bitstreams will be stored. The
default (say the starting reference) is zero. The value will be used to identify the storage where all new
bitstreams will be stored until this number is changed. This number is stored in the Bitstream table
(store_number column) in the DSpace database, so older bitstreams that may have been stored when
'assetstore.incoming' had a different value can be found.
In the simple case in which DSpace uses local (or mounted) storage, the number can refer to different
directories (or partitions). This gives DSpace some level of scalability. The number links to another set of
properties 'assetstore.dir', 'assetstore.dir.1' (remember zero is default), 'assetstore.dir.2', etc., where the values
are directories.
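The numbering scheme for local storage can be summarised in a few lines. This sketch only mirrors the mapping described above (0 maps to 'assetstore.dir', n maps to 'assetstore.dir.n'); the function names and the config dictionary are hypothetical, and SRB stores are not modelled.

```python
def assetstore_directory(store_number, properties):
    """Map a store number to its directory: 0 -> assetstore.dir, n -> assetstore.dir.n."""
    key = "assetstore.dir" if store_number == 0 else "assetstore.dir.%d" % store_number
    if key not in properties:
        raise KeyError("no asset store configured for number %d" % store_number)
    return properties[key]


def incoming_directory(properties):
    """Directory used for new bitstreams; assetstore.incoming defaults to 0."""
    number = int(properties.get("assetstore.incoming", "0"))
    return assetstore_directory(number, properties)


config = {
    "assetstore.dir": "/storevgm/assetstore",
    "assetstore.dir.1": "/storevgm2/assetstore",
    "assetstore.incoming": "1",
}
print(incoming_directory(config))       # /storevgm2/assetstore
print(assetstore_directory(0, config))  # /storevgm/assetstore
```

Older bitstreams keep resolving through their recorded store number, which is why existing stores should never be renumbered.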
To support the use of SRB, DSpace uses the same scheme but broadened to support:
If SRB is chosen from the first install of DSpace, it is suggested that 'assetstore.dir' (no integer appended) be
retained to reference a local directory (as above under File Storage) because build.xml uses this value to do a
mkdir. In this case, 'assetstore.incoming' can be set to 1 (i.e. uncomment the line in File Storage above) and
the 'assetstore.dir' will not be used.
Logging Configuration
Property: log.init.config
Informational This is where your logging configuration file is located. You may override the default log4j
Note: configuration by providing your own. Existing alternatives are:
log.init.config = ${dspace.dir}/config/log4j.properties
log.init.config = ${dspace.dir}/config/log4j-console.properties
Property: log.dir
Informational This is where to put the logs. (This is used for initial configuration only)
Note:
Property: useProxies
Informational If your DSpace instance is protected by a proxy server, in order for log4j to log the correct IP
Note: address of the user rather than of the proxy, it must be configured to look for the X-Forwarded-
For header. This feature can be enabled by ensuring this setting is set to true. This also
affects IPAuthentication, and should be enabled for that to work properly if your installation
uses a proxy server.
Since DSpace 4.0 the advanced search module named Discovery (based on Apache SOLR) is the
default search provider. It provides up-to-date features, such as filtering/faceting, hit highlighting,
search snippets, etc.
Please refer to Legacy methods for re-indexing content if you want to re-enable and customize the
"legacy" DSpace search engine (based on Apache Lucene).
The CNRI Handle system is a third-party service for maintaining persistent URLs. For a nominal fee, you can
register a handle prefix for your repository. As a result, your repository items will also be available under the
links http://handle.net/<<handle prefix>>/<<item id>>. As the base URL of your repository might change or evolve,
the persistent handle.net URLs keep links to your repository items consistent. For complete
information regarding the Handle server, the user should consult The Handle Server section of Installing
DSpace.
Property: handle.canonical.prefix
Informational Note: Canonical Handle URL prefix. By default, DSpace is configured to use http://hdl.handle.net/
as the canonical URL prefix when generating dc.identifier.uri during submission, and
in the 'identifier' displayed in item record pages. If you do not subscribe to CNRI's handle
service, you can change this to match the persistent URL service you use, or you can force
DSpace to use your site's URL, e.g. handle.canonical.prefix = ${dspace.url}/handle/.
Note that this will not alter dc.identifier.uri metadata for existing items (only
for subsequent submissions).
Property: handle.prefix
Informational The default installed by DSpace is 123456789 but you will replace this upon receiving a
Note: handle from CNRI.
Property: handle.dir
Informational Note: The default location, as shown in the Example Value, is where DSpace will install the files
used for the Handle Server.
The ADMIN of an object is authorized to execute every function that is allowed to a user with WRITE permission
on that object (e.g. a community/collection admin will always be allowed to edit the
metadata of the object). The default is "true" for all the configurations.
Property: core.authorization.community-admin.create-subelement
Property: core.authorization.community-admin.delete-subelement
Property: core.authorization.community-admin.policies
Property: core.authorization.community-admin.admin-group
Property: core.authorization.community-admin.collection.policies
Property: core.authorization.community-admin.collection.template-item
Property: core.authorization.community-admin.collection.submitters
Property: core.authorization.community-admin.collection.workflows
Property: core.authorization.community-admin.collection.admin-group
Property: core.authorization.community-admin.item.delete
Property: core.authorization.community-admin.item.withdraw
Property: core.authorization.community-admin.item.reinstate
Property: core.authorization.community-admin.item.policies
Property: core.authorization.community-admin.item.create-bitstream
Property: core.authorization.community-admin.item.delete-bitstream
Property: core.authorization.community-admin.item.cc-license
Community Administration:
The properties for collection administrators work similarly to those of community administrators,
with respect to collection administration.
core.authorization.collection-admin.policies
core.authorization.collection-admin.template-item
core.authorization.collection-admin.submitters
core.authorization.collection-admin.workflows
core.authorization.collection-admin.admin-group
Collection Administration: Item owned by the above Collection
The properties for collection administrators work similarly to those of community administrators,
with respect to administration of items in underlying collections.
core.authorization.collection-admin.item.delete
core.authorization.collection-admin.item.withdraw
core.authorization.collection-admin.item.reinstatiate
core.authorization.collection-admin.item.policies
Collection Administration: Bundles of bitstreams, related to items owned by collections in the above Community
The properties for collection administrators work similarly to those of community administrators,
with respect to administration of bitstreams related to items in underlying collections.
core.authorization.collection-admin.item.create-bitstream
core.authorization.collection-admin.item.delete-bitstream
core.authorization.collection-admin.item-admin.cc-license
Item Administration: Bundles of bitstreams, related to items owned by collections in the above Community
The properties for item administrators work similarly to those of community and collection administrators,
with respect to administration of bitstreams related to items in underlying collections.
core.authorization.item-admin.create-bitstream
core.authorization.item-admin.delete-bitstream
core.authorization.item-admin.cc-license
Login as feature
Property: webui.user.assumelogin
Informational Note: Determine if super administrators (those who are in the Administrators group) can log in as
another user from the "edit eperson" page. This is useful for debugging problems in a running
DSpace instance, especially in the workflow process. The default value is false, i.e., no one
may assume the login of another user.
Please note that this configuration parameter has changed name in DSpace 4.0 from
xmlui.user.assumelogin to webui.user.assumelogin as it is now supported also in the
JSP UI
Property: harvest.includerestricted.rss
Informational Note: When set to 'true' (default), items that do not have the READ permission for the ANONYMOUS
user will be included in RSS feeds anyway.
Property: harvest.includerestricted.subscription
Informational When set to true (default), items that haven't got the READ permission for the ANONYMOUS
Note: user, will be included in Subscription emails anyway.
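As a minimal sketch, a site that does not want restricted items advertised could turn both settings off. The choice of false here is illustrative, not a recommendation from this manual.

```properties
# Illustrative: keep items without anonymous READ out of feeds and emails.
harvest.includerestricted.rss = false
harvest.includerestricted.subscription = false
```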
Proxy Settings
These proxy settings are commented out by default. Uncomment and specify both properties if a proxy
server is required for external HTTP requests.
Property: http.proxy.host
Informational Note: Enter the regular host name without the port number.
Property: http.proxy.port
Informational Note: Enter the port number for the proxy server.
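For illustration, the two properties might look like this once uncommented. The host name and port are placeholders, not defaults shipped with DSpace.

```properties
# Hypothetical proxy; replace with the values for your own network.
http.proxy.host = proxy.example.edu
http.proxy.port = 8080
```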
Media Filters are configured as Named Plugins, with each filter also having a separate configuration setting (in
dspace.cfg) indicating which formats it can process. The default configuration is shown below.
Property: filter.plugins
Example Value: filter.plugins = PDF Text Extractor, HTML Text Extractor, \
    Word Text Extractor, JPEG Thumbnail
Informational Note: List the names of the enabled MediaFilter or FormatFilter plugins. To enable
Branded Preview, comment out the line above and then uncomment the two corresponding lines found in
dspace.cfg:
Property: plugin.named.org.dspace.app.mediafilter.FormatFilter
Example Value: plugin.named.org.dspace.app.mediafilter.FormatFilter = \
    org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor, \
    org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
    org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
    org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
    org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
Property: filter.org.dspace.app.mediafilter.PDFFilter.inputFormats
Property: filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats
Property: filter.org.dspace.app.mediafilter.WordFilter.inputFormats
Property: filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats
Property: filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats
Example Value:
filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, \
    image/png
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, \
    GIF, JPEG, image/png
Property: pdffilter.largepdfs
Informational Note: If this value is set to "true", all PDF extractions are written to temp files as
they are indexed. This is slower, but helps to ensure that the PDFBox software DSpace uses does not
consume all of your memory.
Property: pdffilter.skiponmemoryexception
Informational Note: If this value is set to "true", PDFs which still result in an "Out of Memory"
error from PDFBox are skipped. These problematic PDFs will not be indexed until memory usage can be
decreased in the PDFBox software.
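A minimal sketch of a memory-conscious setup, assuming both options are wanted together (they can also be set independently):

```properties
# Write PDF extractions to temp files: slower, but lighter on memory.
pdffilter.largepdfs = true
# Skip PDFs that still trigger an Out of Memory error in PDFBox.
pdffilter.skiponmemoryexception = true
```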
Finally, the appropriate filter.<class path>.inputFormats property defines the valid input formats
to which each filter can be applied. These format names must match the short description field of
the Bitstream Format Registry.
You can also implement more dynamic or configurable Media/Format Filters which extend
SelfNamedPlugin.
For more information on Media/Format Filters, see the section on Mediafilters for Transforming
DSpace Content.
For more information on using Packagers and Crosswalks, see the section on Importing and Exporting
Content via Packages.
The value of this property is a path to a separate properties file containing the configuration for this crosswalk.
The pathname is relative to the DSpace configuration directory, i.e. the config subdirectory of the DSpace
install directory. Example from the dspace.cfg file:
Properties: crosswalk.mods.properties.MODS
Properties: crosswalk.mods.properties.mods
Informational Note: This defines a crosswalk named MODS whose configuration comes from the file
[dspace]/config/crosswalks/mods.properties. (In the above example, the lower-case name was added
for OAI-PMH.)
The MODS crosswalk properties file is a list of properties describing how DSpace metadata elements are to be
turned into elements of the MODS XML output document. The property name is a concatenation of the
metadata schema, element name, and optionally the qualifier. For example, the contributor.author element in
the native Dublin Core schema would be: dc.contributor.author. The value of the property is a line
containing two segments separated by the vertical bar ("|"): the first part is an XML fragment which
is copied into the output document; the second is an XPath expression describing where in that
fragment to put the value of the metadata element. For example, in this property:
dc.contributor.author = <mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>%s</mods:namePart>
</mods:name>
Some of the examples include the string "%s" in the prototype XML where the text value is to be
inserted. It can be disregarded: it is an artifact that the crosswalk ignores. For example, given an
author named Jack Florey, the crosswalk will insert
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Jack Florey</mods:namePart>
</mods:name>
into the output document. Read the example configuration file for more details.
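The property shown earlier displays only the XML prototype. A sketch of a complete two-segment entry follows, written on a single line with the vertical bar and an XPath at the end. The XPath segment here is an assumption and should be confirmed against the example file config/crosswalks/mods.properties.

```properties
# Segment 1: XML prototype. Segment 2 (after "|"): XPath locating the value.
# The XPath "mods:namePart/text()" is assumed, not quoted from this manual.
dc.contributor.author = <mods:name><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role><mods:namePart>%s</mods:namePart></mods:name> | mods:namePart/text()
```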
XSLT-based Crosswalks
The XSLT crosswalks use XSL stylesheet transformation (XSLT) to transform an XML-based external metadata
format to or from DSpace's internal metadata. XSLT crosswalks are much more powerful and flexible than the
configurable MODS and QDC crosswalks, but they demand some esoteric knowledge (XSL stylesheets). Given
that, you can create all the crosswalks you need just by adding stylesheets and configuration lines, without
touching any of the Java code.
Properties: crosswalk.submission.MODS.stylesheet
As shown above, there are four (4) dot-separated parts that make up the properties "key":
crosswalk.submission.PluginName.stylesheet =
    1         2          3          4
You can make two different plugin names point to the same crosswalk, by adding two configuration entries with
the same path:
crosswalk.submission.MyFormat.stylesheet = crosswalks/myformat.xslt
crosswalk.submission.almost_DC.stylesheet = crosswalks/myformat.xslt
The dissemination crosswalk must also be configured with an XML Namespace (including prefix and URI) and
an XML schema for its output format. This is configured on additional properties in the DSpace configuration:
crosswalk.dissemination.PluginName.namespace.Prefix = namespace-URI
crosswalk.dissemination.PluginName.schemaLocation = schemaLocation value
For example:
crosswalk.dissemination.qdc.namespace.dc = http://purl.org/dc/elements/1.1/
crosswalk.dissemination.qdc.namespace.dcterms = http://purl.org/dc/terms/
crosswalk.dissemination.qdc.schemaLocation = http://purl.org/dc/elements/1.1/ \
http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd
For example, you can test the marc plugin on the handle 123456789/3 with:
Information from the script is printed to stderr, while the XML output of the dissemination
crosswalk is printed to stdout. You can give a third parameter containing a filename to write the
output into a file, but be careful: the file will be overwritten if it exists.
Testing a submission crosswalk works in much the same way. The following command-line utility calls
the crosswalk plugin to translate an XML document you submit, and displays the resulting
intermediate XML (DIM). Invoke it with:
[dspace]/bin/dspace dsrun
org.dspace.content.crosswalk.XSLTIngestionCrosswalk [-l] <plugin name> <input-file>
where <plugin name> is the name of the crosswalk plugin to test (e.g. "LOM"), and <input-file> is a file
containing an XML document of metadata in the appropriate format.
Add the -l option to pass the ingestion crosswalk a list of elements instead of a whole document, as if the List
form of the ingest() method had been called. This is needed to test ingesters for formats like DC that get called
with lists of elements instead of a root element.
Properties: crosswalk.qdc.namespace.qdc.dc
Properties: crosswalk.qdc.namespace.qdc.dcterms
Properties: crosswalk.qdc.schemaLocation.QDC
Example Value: crosswalk.qdc.schemaLocation.QDC = http://www.purl.org/dc/terms \
    http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd \
    http://purl.org/dc/elements/1.1 \
    http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd
Properties: crosswalk.qdc.properties.QDC
Informational Note: Configuration of the QDC Crosswalk dissemination plugin for Qualified DC. (Add a
lower-case name for OAI-PMH. That is, change QDC to qdc.)
In the property key "crosswalk.qdc.properties.QDC", the value is a path to a separate properties
file containing the configuration for this crosswalk. The pathname is relative to the DSpace
configuration directory [dspace]/config. Referring back to the "Example Value" for this property
key, one has crosswalks/qdc.properties, which defines a crosswalk named QDC whose configuration
comes from the file [dspace]/config/crosswalks/qdc.properties.
You will also need to configure the namespaces and schema location strings for the XML output
generated by this crosswalk. The namespace property names are formatted as:
crosswalk.qdc.namespace.prefix = uri
where prefix is the namespace prefix and uri is the namespace URI. See the Property and Example
Value entries above for how the default dspace.cfg is configured.
The QDC crosswalk properties file is a list of properties describing how DSpace metadata elements are to be
turned into elements of the Qualified DC XML output document. The property name is a concatenation of the
metadata schema, element name, and optionally the qualifier. For example, the contributor.author
element in the native Dublin Core schema would be: dc.contributor.author. The value of the property
is an XML fragment: the element whose value will be set to the value of the metadata field named in
the property key. The generated XML in the output document would look like, e.g.:
<dcterms:temporal>Fall, 2005</dcterms:temporal>
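The property that produces the fragment above is not shown in this section. A plausible entry, assuming the source field is dc.coverage.temporal, would be an empty element that the crosswalk fills with the field value:

```properties
# Assumed mapping: the field name dc.coverage.temporal is a guess based on
# the dcterms:temporal output; check config/crosswalks/qdc.properties.
dc.coverage.temporal = <dcterms:temporal />
```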
You can add names for existing crosswalks, add new plugin classes, and add new configurations for the
configurable crosswalks as noted below.
You can add names for the existing plugins, and add new plugins, by altering these configuration properties.
See the Plugin Manager architecture for more information about plugins.
Property: event.dispatcher.default.class
Informational Note: This is the default synchronous dispatcher (same behavior as traditional
DSpace).
Property: event.dispatcher.default.consumers
Informational Note: This is the default synchronous dispatcher (same behavior as traditional
DSpace).
Property: event.dispatcher.noindex.class
Informational Note: The noindex dispatcher will not create search or browse indexes (useful for
batch item imports).
Property: event.dispatcher.noindex.consumers
Informational Note: The noindex dispatcher will not create search or browse indexes (useful for
batch item imports).
Property: event.consumer.search.class
Property: event.consumer.search.filters
Example Value: event.consumer.search.filters = Community | Collection | Item | Bundle+Add | Create | Modify | Modify_Metadata | Delete | Remove
Property: event.consumer.browse.class
Property: event.consumer.browse.filters
Example Value: event.consumer.browse.filters = Community | Collection | Item | Bundle+Add | Create | Modify | Modify_Metadata | Delete | Remove
Property: event.consumer.eperson.class
Property: event.consumer.eperson.filters
Property: event.consumer.test.class
Informational Note: Test consumer for debugging and monitoring. Commented out by default.
Property: event.consumer.test.filters
Informational Note: Test consumer for debugging and monitoring. Commented out by default.
Property: testConsumer.verbose
Informational Note: Set this to true to enable testConsumer messages to standard output. Commented
out by default.
Embargo
DSpace embargoes utilize standard metadata fields to hold both the "terms" and the "lift date". Which fields you
use are configurable, and no specific metadata element is dedicated or predefined for use in embargo. Rather,
you specify exactly what field you want the embargo system to examine when it needs to find the terms or
assign the lift date.
Property: embargo.field.terms
Informational Note: Embargo terms will be stored in the item metadata. This property determines in
which metadata field these terms will be stored. An example could be dc.embargo.terms
Property: embargo.field.lift
Informational Note: The embargo lift date will be stored in the item metadata. This property
determines in which metadata field the computed embargo lift date will be stored. You may need to
create a DC metadata field in your Metadata Format Registry if it does not already exist. An example
could be dc.embargo.liftdate
Property: embargo.terms.open
Informational Note: You can determine your own values for the embargo.field.terms property (see
above). This property determines the string value that, when found in the terms field, indicates an
indefinite embargo.
Property: plugin.single.org.dspace.embargo.EmbargoSetter
Informational Note: To implement the business logic to set your embargoes, you need to override the
EmbargoSetter class. If you use the value DefaultEmbargoSetter, the default implementation will be
used.
Property: plugin.single.org.dspace.embargo.EmbargoLifter
Informational Note: To implement the business logic to lift your embargoes, you need to override the
EmbargoLifter class. If you use the value DefaultEmbargoLifter, the default implementation will be
used.
More details on Embargo configuration, including specific examples can be found in the Embargo
section of the documentation.
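Putting the embargo properties together, a minimal sketch might look like the following. The field names are the examples mentioned above. The "forever" marker and the org.dspace.embargo package prefix on the default classes are assumptions to verify against your own dspace.cfg.

```properties
# Example field names from the notes above; create them in the metadata
# registry first if they do not already exist.
embargo.field.terms = dc.embargo.terms
embargo.field.lift = dc.embargo.liftdate
# Assumed marker string for an indefinite embargo.
embargo.terms.open = forever
# Default implementations (fully qualified class names are assumed).
plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter
plugin.single.org.dspace.embargo.EmbargoLifter = org.dspace.embargo.DefaultEmbargoLifter
```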
Property: plugin.single.org.dspace.checker.BitstreamDispatcher
Property: checker.retention.default
Informational Note: This option specifies the default time frame after which all checksum checks are
removed from the database (defaults to 10 years). This means that after 10 years, all successful or
unsuccessful matches are removed from the database.
Property: checker.retention.CHECKSUM_MATCH
Example Value: checker.retention.CHECKSUM_MATCH = 8w
Informational Note: This option specifies the time frame after which a successful match will be
removed from your DSpace database (defaults to 8 weeks). This means that after 8 weeks, all
successful matches are automatically deleted from your database (in order to keep that database
table from growing too large).
For more information on using DSpace's built-in Checksum verification system, see the section on
Validating CheckSums of Bitstreams.
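As a sketch of the two retention settings side by side: the 8w value comes from the example above, while 10y is assumed by analogy to express the stated 10-year default.

```properties
# Keep all check results for 10 years ("10y" is an assumed spelling).
checker.retention.default = 10y
# Prune successful matches after 8 weeks.
checker.retention.CHECKSUM_MATCH = 8w
```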
Property: org.dspace.app.itemexport.work.dir
Informational Note: The directory where exports are assembled and compressed.
Property: org.dspace.app.itemexport.download.dir
Informational Note: The directory where the compressed files will reside and be read by the
downloader.
Property: org.dspace.app.itemexport.life.span.hours
Example Value: org.dspace.app.itemexport.life.span.hours = 48
Informational Note: The length of time, in hours, that each archive should be kept. When new
archives are created, this entry is used to delete old ones.
Property: org.dspace.app.itemexport.max.size
Informational Note: The maximum size, in megabytes (MB), that the export may be. This is enforced
before compression. The sizes of all bitstreams in all items being exported are added up; if the
cumulative size is more than this entry, the export is not started.
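A minimal sketch of the four export settings together. Only the 48-hour life span is taken from the example above. The directory paths and the size limit are illustrative assumptions.

```properties
# Illustrative paths; any directories writable by DSpace should do.
org.dspace.app.itemexport.work.dir = ${dspace.dir}/exports
org.dspace.app.itemexport.download.dir = ${dspace.dir}/exports/download
# Archives older than 48 hours are removed when new ones are created.
org.dspace.app.itemexport.life.span.hours = 48
# Illustrative limit in megabytes, checked before compression.
org.dspace.app.itemexport.max.size = 200
```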
Subscription Emails
DSpace, through some advanced installation and setup, is able to send emails to users about the
collections to which they have subscribed. A user who is subscribed to a collection is emailed each
time an item is added or modified. The following property key controls whether or not a user should
be notified of a modification.
Property: eperson.subscription.onlynew
Informational Note: For backwards compatibility, the subscription emails by default include any
modified items. The property key is COMMENTED OUT by default.
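To limit notifications to newly added items, the property would be uncommented and enabled. This sketch infers the behavior from the property name and the note above:

```properties
# Assumed effect: report only new items, not modifications.
eperson.subscription.onlynew = true
```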
Hiding Metadata
Metadata fields can be hidden from public view so that they are visible only to Administrators.
Property: metadata.hide.dc.description.provenance
Informational Note: Hides the metadata field named in the property key above from everyone except
Administrators. Fields named here are hidden in the following places UNLESS the logged-in user is an
Administrator:
1. XMLUI metadata XML view, and Item splash pages (long and short views).
2. JSPUI Item splash pages.
3. OAI-PMH server, "oai_dc" format. (Note: Other formats are *not* affected.)
To designate a field as hidden, add a property here in the form:
metadata.hide.SCHEMA.ELEMENT.QUALIFIER = true. This default configuration hides the
dc.description.provenance field, since that usually contains email addresses which ought to be kept
private and is mainly of interest to administrators.
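For illustration, the default entry plus one hypothetical additional field, following the metadata.hide.SCHEMA.ELEMENT.QUALIFIER pattern:

```properties
# Default: hide provenance from non-administrators.
metadata.hide.dc.description.provenance = true
# Hypothetical extra field, shown only to illustrate the pattern.
metadata.hide.dc.description.sponsorship = true
```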
Property: webui.submit.blocktheses
Property: webui.submit.upload.required
Informational Note: Whether or not a file is required to be uploaded during the "Upload" step in the
submission process. The default is true. If set to "false", the submitter has the option to skip
uploading a file.
Property: webui.submit.upload.progressbar
Example Value: webui.submit.upload.progressbar = true
Informational Note: Whether or not to show a progress bar during file upload. Note that this feature
requires a JSON endpoint (json/uploadProgress), which is enabled by default. See the named plugin
for the interface org.dspace.app.webui.json.JSONRequest:
org.dspace.app.webui.json.UploadProgressJSON = uploadProgress
This property is currently supported only by the JSPUI; the XMLUI does not yet provide a progress
bar indicator for file upload.
Currently this feature is available only for the JSP UI. Nonetheless, the integration is mainly
developed as an independent service at the dspace-api level.
Property: webui.submission.sherparomeo-policy-enabled
Example Value: webui.submission.sherparomeo-policy-enabled = true
Informational Note: Controls whether or not the UI submission should try to use the Sherpa/RoMEO
Publishers Policy Database Integration (default true).
Property: sherpa.romeo.url
Informational Note: The Sherpa/RoMEO endpoint. Shared with the authority control feature for Journal
Title autocomplete; see AuthorityControlSettings.
Property: sherpa.romeo.apikey
Example Value: sherpa.romeo.apikey = YOUR-API-KEY
Informational Note: Allows a specific API key to be used to raise the usage limit (500 calls/day for
unregistered users).
You can register for a free API access key at http://www.sherpa.ac.uk/news/romeoapikeys.htm
The functionality relies on determining which journal (ISSN) the item being submitted relates to.
Out of the box, this is done by looking at certain item metadata, but a different strategy can be
used, for example looking up a metadata authority when the Sherpa/RoMEO autocomplete for Journal is
used (see AuthorityControlSettings). The strategy used to discover the journal related to the
submitted item is defined in the Spring file /config/spring/api/sherpa.xml
<bean class="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService"
      id="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService">
    <property name="issnItemExtractors">
        <list>
            <bean class="org.dspace.app.sherpa.submit.MetadataValueISSNExtractor">
                <property name="metadataList">
                    <list>
                        <value>dc.identifier.issn</value>
                    </list>
                </property>
            </bean>
            <!-- Use the following if you have the SHERPARoMEOJournalTitle enabled
            <bean class="org.dspace.app.sherpa.submit.MetadataAuthorityISSNExtractor">
                <property name="metadataList">
                    <list>
                        <value>dc.title.alternative</value>
                    </list>
                </property>
            </bean> -->
        </list>
    </property>
</bean>
Creative Commons licensing is optionally available and may be configured for any given collection that has a
defined submission sequence, or be part of the "default" submission process. This process is described in the
Submission User Interface section of this manual. There is a Creative Commons step already defined (step 5),
but it is commented out, so enabling Creative Commons licensing is typically just a matter of uncommenting the
CC License step.
In the JSPUI, an "iframe" is opened to the Creative Commons site. When a Creative Commons license is
selected from that site, information about the CC license is stored in a series of internal bitstreams:
  The URL of the CC License is stored in a bitstream named "license_url" in the CC-LICENSE bundle
  The full (HTML) text of the CC License is stored in a bitstream named "license_txt" in the CC-LICENSE bundle
  The RDF version of the CC License is stored in a bitstream named "license_rdf" in the CC-LICENSE bundle
In the XMLUI, the Creative Commons REST API is utilized. This allows the XMLUI to also store
metadata references to the selected CC license, while also storing the CC License as a bitstream.
In the XMLUI, the following CC License information is captured:
  The URL of the CC License is stored in the "dc.rights.uri" metadata field (or whatever field is configured in the "cc.license.uri" setting below)
  The name of the CC License is stored in the "dc.rights" metadata field (or whatever field is configured in the "cc.license.name" setting below). This only occurs if "cc.submit.setname=true" (default value)
  The RDF version of the CC License is stored in a bitstream named "license_rdf" in the CC-LICENSE bundle (as long as "cc.submit.addbitstream=true", which is the default value)
The following configurations (in dspace.cfg) relate to the XMLUI Creative Commons license process
ONLY:
Property: cc.api.rooturl
Informational Note: The base URL of the Creative Commons service API. You will generally never need
to assign a different value.
Property: cc.license.uri
Informational Note: The field that holds the Creative Commons license URI. If you change from the
default value (dc.rights.uri), you will have to reconfigure the XMLUI for proper display of license
data.
Property: cc.license.name
Informational Note: The field that holds the Creative Commons license name. If you change from the
default value (dc.rights), you will have to reconfigure the XMLUI for proper display of license
data.
Property: cc.submit.setname
Informational Note: If true, the license assignment will populate the field configured in
"cc.license.name" with the name of the CC license; if false, only the "cc.license.uri" field is
added.
Property: cc.submit.addbitstream
Informational Note: If true, the license assignment will add a bitstream with the CC license RDF; if
false, only metadata field(s) are added.
Property: cc.license.classfilter
Informational Note: This list defines the values that will be excluded from the license (class)
selection list, as defined by the web service at the URL:
http://api.creativecommons.org/rest/1.5/classes
Property: cc.license.jurisdiction
Example Value: cc.license.jurisdiction = nz
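A minimal sketch of the XMLUI Creative Commons settings using the defaults stated in the notes above. The root URL is an assumption derived from the classes URL quoted for cc.license.classfilter.

```properties
# Assumed base URL, inferred from .../rest/1.5/classes above.
cc.api.rooturl = http://api.creativecommons.org/rest/1.5
# Default metadata fields for the license URI and name.
cc.license.uri = dc.rights.uri
cc.license.name = dc.rights
# Store the license name and the RDF bitstream (both default to true).
cc.submit.setname = true
cc.submit.addbitstream = true
# Example jurisdiction from this section.
cc.license.jurisdiction = nz
```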
Property: webui.licence_bundle.show
Informational Note: Sets whether to display the contents of the license bundle (often just the
deposit license in the standard DSpace installation).
Property: webui.browse.thumbnail.show
Informational Note: Controls whether to display thumbnails on browse and search result pages. If you
have customized the Browse columnlist, then you must also include a "thumbnail" column in your
configuration. (This configuration property key is not used by XMLUI. To show thumbnails using
XMLUI, you need to create a theme which displays them.)
Property: webui.browse.thumbnail.maxheight
Example Value: webui.browse.thumbnail.maxheight = 80
Informational Note: This property determines the maximum height of the browse/search thumbnails in
pixels (px). This only needs to be set if the thumbnails are required to be smaller than the
dimensions of thumbnails generated by MediaFilter.
Property: webui.browse.thumbnail.maxwidth
Example Value: webui.browse.thumbnail.maxwidth = 80
Informational Note: This determines the maximum width of the browse/search thumbnails in pixels
(px). This only needs to be set if the thumbnails are required to be smaller than the dimensions of
thumbnails generated by MediaFilter.
Property: webui.item.thumbnail.show
Informational Note: This determines whether or not to display the thumbnail against each bitstream.
(This configuration property key is not used by XMLUI. To show thumbnails using XMLUI, you need to
create a theme which displays them.)
Property: webui.browse.thumbnail.linkbehavior
Informational Note: This determines where clicks on the thumbnail in browse and search screens
should lead. The only values currently supported are "item" or "bitstream", which will either take
the user to the item page, or directly download the bitstream.
Property: thumbnail.maxwidth
Example Value: thumbnail.maxwidth = 80
Informational Note: This property sets the maximum width of generated thumbnails that are being
displayed on item pages.
Property: thumbnail.maxheight
Example Value: thumbnail.maxheight = 80
Informational Note: This property sets the maximum height of generated thumbnails that are being
displayed on item pages.
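The thumbnail settings can be read together as one block. This is a sketch using the example sizes above; the true values and the "item" link behavior are illustrative choices.

```properties
# Browse/search result thumbnails (not used by XMLUI).
webui.browse.thumbnail.show = true
webui.browse.thumbnail.maxheight = 80
webui.browse.thumbnail.maxwidth = 80
# Clicking a thumbnail opens the item page rather than the bitstream.
webui.browse.thumbnail.linkbehavior = item
# Thumbnails beside each bitstream on item pages (not used by XMLUI).
webui.item.thumbnail.show = true
# Size of thumbnails generated for item pages.
thumbnail.maxwidth = 80
thumbnail.maxheight = 80
```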
Property: webui.preview.enabled
Property: webui.preview.maxwidth
Informational Note: This property sets the maximum width for the preview image.
Property: webui.preview.maxheight
Informational Note: This property sets the maximum height for the preview image.
Property: webui.preview.brand
Informational Note: This is the brand text that will appear with the image.
Property: webui.preview.brand.abbrev
Informational Note: An abbreviated form of the full Branded Name. This will be used when the preview
image cannot fit the normal text.
Property: webui.preview.brand.height
Example Value: webui.preview.brand.height = 20
Property: webui.preview.brand.font
Informational Note: This property sets the font for your Brand text that appears with the image.
Property: webui.preview.brand.fontpoint
Example Value: webui.preview.brand.fontpoint = 12
Informational Note: This property sets the font point (size) for your Brand text that appears with
the image.
Property: webui.preview.dc
Informational Note: The Dublin Core field that will display along with the preview. This field is
optional.
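A sketch of a branded preview configuration. Only the brand height (20) and font point (12) come from the examples above. The dimensions, brand strings, font name, and DC field are placeholders chosen for illustration.

```properties
webui.preview.enabled = true
# Placeholder maximum dimensions, in pixels.
webui.preview.maxwidth = 600
webui.preview.maxheight = 600
# Placeholder brand text and its abbreviated form.
webui.preview.brand = My Institution Name
webui.preview.brand.abbrev = MyOrg
webui.preview.brand.height = 20
# Placeholder font name.
webui.preview.brand.font = SansSerif
webui.preview.brand.fontpoint = 12
# Optional Dublin Core field shown with the preview (placeholder).
webui.preview.dc = rights
```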
Property: webui.strengths.show
Informational Note: Determines if communities and collections should display item counts when
listed. The default behavior, if omitted, is false.
Property: webui.strengths.cache
Informational Note: When showing the strengths, determines whether they are counted in real time or
fetched from the cache. Counts fetched in real time perform an actual count of the database contents
every time a page with this feature is requested, which will not scale. If you set this property to
use the cache ("true"), you must run the following command periodically to update the count:
[dspace]/bin/dspace itemcounter. The default is to count in real time (set to "false").
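For a larger repository, the two settings would typically be enabled together, as in this sketch:

```properties
# Show item counts next to communities and collections.
webui.strengths.show = true
# Read counts from the cache instead of counting on every page request.
webui.strengths.cache = true
```

With the cache enabled, the counts only change when [dspace]/bin/dspace itemcounter is run, so that command needs to be scheduled at whatever interval suits the site.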
Informational Note: This is an example of how one "Defines the Indexes". See Defining the Indexes in
the next sub-section.
Informational Note: This is an example of how one "Defines the Sort Options". See Defining Sort
Options in the following sub-section.
Starting from DSpace 3.0 you can configure which implementation to use for the Browse DAOs, both for
create/update operations and for read operations. This allows you to customize which browse engine
is utilized in your DSpace. Options include:
  SOLR Browse Engine (SOLR DAOs), default since DSpace 4.0 - This enables Apache Solr to be utilized as a backend for all browsing of DSpace. This option requires that you have Discovery (Solr search/browse engine) enabled in your DSpace.
  PostgreSQL Browse Engine (PostgreSQL DAOs) - This enables all browsing to be done via PostgreSQL database tables. (This is the traditional browsing option for users who have PostgreSQL installed.)
  Oracle Browse Engine (Oracle DAOs) - This enables all browsing to be done via Oracle database tables. (This is the traditional browsing option for users who have Oracle installed.)
Property: browseDAO.class
Informational Note: This property configures the Java class that is used for READ operations by the
Browse System. You need to have Discovery enabled (this is the default since DSpace 4.0) to use the
Solr Browse DAOs.
Property: browseCreateDAO.class
Informational Note: This property configures the Java class that is used for WRITE operations by the
Browse System. You need to have Discovery enabled (this is the default since DSpace 4.0) to use the
Solr Browse DAOs.
If you want to re-enable the legacy DBMS Browse Engine, please refer to Legacy methods for
re-indexing content.
If you make changes in this section, be sure to update your SOLR indexes by running the Discovery
Maintenance Script; see Discovery.
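As a sketch, selecting a browse engine amounts to pointing both properties at a matching pair of DAO classes. The class names below are given from memory of the 4.x code base and are assumptions; confirm them against the commented examples in your own dspace.cfg before use.

```properties
# Solr-backed browse (requires Discovery). Class names assumed.
browseDAO.class = org.dspace.browse.SolrBrowseDAO
browseCreateDAO.class = org.dspace.browse.SolrBrowseCreateDAO

# Legacy PostgreSQL alternative, left commented out. Class names assumed.
#browseDAO.class = org.dspace.browse.BrowseDAOPostgres
#browseCreateDAO.class = org.dspace.browse.BrowseCreateDAOPostgres
```

The read and write classes should come from the same engine so that browsing reads the indexes that indexing actually writes.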
DSpace comes with four default indexes pre-defined: author, title, date issued, and subjects. Users
may also define additional indexes or re-configure the current indexes for different levels of
specificity. The default entries in the dspace.cfg of a default installation are:
webui.browse.index.1 = dateissued:metadata:dc.date.issued:date:full
webui.browse.index.2 = author:metadata:dc.contributor.*:text
webui.browse.index.3 = title:metadata:dc.title:title:full
webui.browse.index.4 = subject:metadata:dc.subject.*:text
#webui.browse.index.5 = dateaccessioned:item:dateaccessioned
webui.browse.index.<n>
    n is the index number. The index numbers must start from 1 and increment continuously by 1
    thereafter. Deviation from this will cause an error during install or a configuration update.
    So anytime you add a new browse index, remember to increase the number. (Commented out index
    numbers may be used over again.)
<index name>
    The name by which the index will be identified. You will need to update your
    Messages.properties file to match this field. (The form used in the Messages.properties file
    is: browse.type.metadata.<index name>.)
<schema prefix>
    The schema used for the field to be indexed. The default is dc (for Dublin Core).
<element>
    The schema element. In Dublin Core, for example, the author element is referred to as
    "Contributor". The user should consult the default Dublin Core Metadata Registry table in
    Appendix A.
<qualifier>
    This is the qualifier to the <element> component. The user has two choices: an asterisk "*" or
    a proper qualifier of the element. The asterisk is a wildcard and causes DSpace to index all
    types of the schema element. For example, if you have the element "contributor" and the
    qualifier "*" then you would index all contributor data regardless of the qualifier. As another
    example, the element "subject" with the qualifier "lcsh" would cause the indexing of only those
    fields that have the qualifier "lcsh". (This means you would only index Library of Congress
    Subject Headings and not all data elements that are subjects.)
<index display>
    Choose full or single. This refers to the way that the index will be displayed in the browse
    listing. "Full" will be the full item list as specified by webui.itemlist.columns; "single"
    will be a single list of only the indexed term.
If you are customizing this list beyond the default, you will need to insert the text you wish to
appear in the navigation and on links and buttons. You need to edit the Messages.properties file.
The form of the parameter(s) in the file is:
browse.type.<index name>
If you make changes in this section, be sure to update your SOLR indexes by running the Discovery
Maintenance Script; see Discovery.
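As an illustration of adding an index, the sketch below reuses the commented-out slot 5 for a hypothetical index over Library of Congress Subject Headings only, together with its label. The index name "lcsubject" and the label text are invented for this example, and the two lines belong in different files.

```properties
# In dspace.cfg: hypothetical fifth index restricted to dc.subject.lcsh.
webui.browse.index.5 = lcsubject:metadata:dc.subject.lcsh:text

# In Messages.properties: display label for the new index (hypothetical).
browse.type.metadata.lcsubject = LC Subject
```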
Sort options will be available when browsing a list of items (i.e. only in "full" mode, not "single"
mode). You can define an arbitrary number of fields to sort on, irrespective of which fields you
display using webui.itemlist.columns. The default entries in the dspace.cfg of a default
installation are:
webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
<option name> The name by which the sort option will be identified. This may be used in later configuration
or to locate the message key (found in the Messages.properties file) for this index.
<schema prefix> The schema used for the field to be indexed. The default is dc (for Dublin Core).
<element> The schema element. In Dublin Core, for example, the author element is referred to as
"Contributor". The user should consult the default Dublin Core Metadata Registry table in
Appendix A.
<qualifier> This is the qualifier to the <element> component. The user has two choices: an asterisk "*"
or a proper qualifier of the element.
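As a rough illustration of how a sort option value breaks down into the components described above, the following Python sketch splits one value into its parts. The function and class names are hypothetical and are not part of DSpace; the third component is labelled "datatype" here on the assumption that it names the sort datatype (such as title, date or text).

```python
from typing import NamedTuple, Optional


class SortOption(NamedTuple):
    """Hypothetical container for one webui.itemlist.sort-option value."""
    name: str
    schema: str
    element: str
    qualifier: Optional[str]
    datatype: str  # assumed meaning of the third component


def parse_sort_option(value: str) -> SortOption:
    """Split '<option name>:<schema>.<element>[.<qualifier>]:<datatype>'."""
    parts = [part.strip() for part in value.split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated parts: {value!r}")
    name, field, datatype = parts
    pieces = field.split(".")
    if len(pieces) not in (2, 3):
        raise ValueError(f"unexpected metadata field: {field!r}")
    qualifier = pieces[2] if len(pieces) == 3 else None
    return SortOption(name, pieces[0], pieces[1], qualifier, datatype)


if __name__ == "__main__":
    print(parse_sort_option("dateissued:dc.date.issued:date"))
```

A value without a qualifier, such as title:dc.title:title, yields a qualifier of None in this sketch.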
If you make changes in this section, be sure to update your SOLR indexes by running the Discovery
Maintenance Script (see Discovery).
Normalization rules make it possible for the indexes to intermix entries without regard to
case. By default, the display of metadata in the browse indexes is case-sensitive. In the example
below, you retrieve separate entries:
Twain, Mark
twain, mark
TWAIN, MARK
However, clicking through from any of these will result in the same set of items (i.e., any item that contains
any of these representations in the correct field).
Property: webui.browse.metadata.case-insensitive
Informational Note: This controls the normalization of the index entry. Uncommenting the option (which is
commented out by default) will make the metadata items case-insensitive. This will result in a
single entry in the example above. However, the value displayed may be any one of the above,
depending on what representation was present in the first item indexed.
At the present time, you would need to edit your metadata to clean up the index presentation.
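The effect of case-insensitive normalization can be pictured with a small Python sketch. It is only an illustration of the behaviour described above, not DSpace's implementation: the function name is hypothetical, and the three spellings are one reading of the example entries.

```python
def browse_entries(values, case_insensitive=False):
    """Return distinct browse entries in first-seen order.

    With case_insensitive=True, values that differ only by case collapse
    into one entry, displayed as the first representation encountered.
    """
    seen = {}
    for value in values:
        key = value.casefold() if case_insensitive else value
        seen.setdefault(key, value)  # keep the first representation only
    return list(seen.values())


if __name__ == "__main__":
    values = ["Twain, Mark", "twain, mark", "TWAIN, MARK"]
    print(browse_entries(values))                         # three entries
    print(browse_entries(values, case_insensitive=True))  # one entry
```

Which spelling is displayed depends on the order in which items were indexed, which is why cleaning up the metadata itself is still the only way to control the presentation.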
Informational Note: This enables or disables the display of frequencies (counts) in metadata browse; <n> refers to the
browse configuration. By default, frequencies are shown for all metadata browses.
Property: webui.browse.value_columns.max
Informational Note: This sets the size (number of characters) of the fields stored in the database.
The default is 0, which is unlimited size for fields holding indexed data. Some database
implementations (e.g. Oracle) will enforce their own limit on this field size. Reducing the field
size will decrease the potential size of your database and increase the speed of the browse,
but it will also increase the chance of mis-ordering of similar fields. The values are
commented out, but the proposed values offer a reasonable balance of performance versus result
quality. This affects the size of the field for the browse value (this will affect display and value sorting).
Property: webui.browse.sort_columns.max
Informational Note: Size of field for hidden sort columns (this will affect only sorting, not display). Commented out
by default.
Property: webui.browse.value_columns.omission_mark
Informational Note: Omission mark to be placed after truncated strings in display. The default is "...".
Property: plugin.named.org.dspace.sort.OrderFormatDelegate
Example Value: plugin.named.org.dspace.sort.OrderFormatDelegate = \
org.dspace.sort.OrderFormatTitleMarc21=title
Informational Note: This sets the option for how the indexes are sorted. All sort normalizations are carried out by
the OrderFormatDelegate. The plugin manager can be used to specify your own delegates for
each datatype. The default datatypes (and delegates) are:
author = org.dspace.sort.OrderFormatAuthor
title = org.dspace.sort.OrderFormatTitle
text = org.dspace.sort.OrderFormatText
If you redefine a default datatype here, the configuration will be used in preference to the
default. However, if you do not explicitly redefine a datatype, then the default will still be used
in addition to the datatypes you do specify. As of DSpace release 1.5.2, the multi-lingual
MARC21 title ordering is configured as default, as shown in the example above. To use the
previous title ordering (before release 1.5.2), comment out the configuration in your dspace.cfg
file.
Property: webui.browse.author-field
Informational Note: This defines which field is used for the author/editor, etc. listing.
Replace dc.contributor.* with another field if appropriate. The field should be listed in the configuration for
webui.itemlist.columns, otherwise you will not see its effect. It must also be defined in
webui.itemlist.columns as being of the datatype text, otherwise the functionality will be overridden by the specific
data type feature. (This setting is not used by the XMLUI, as it is controlled by your theme.)
Now that we know which field is our author or other multiple metadata value field we can provide the option to
truncate the number of values displayed by default. We replace the remaining list of values with "et al" or the
language pack specific alternative. Note that this is just for the default, and users will have the option of
changing the number displayed when they browse the results. See the following table:
Property: webui.browse.author-limit
Informational Note: Where <n> is an integer number of values to be displayed. Use -1 for unlimited (the
default value).
Property: webui.browse.link.<n>
Informational Note: This is used to configure which fields should link to other browse listings. It should
associate the name of one of the browse indexes (webui.browse.index.n) with a
metadata field listed in webui.itemlist.columns above. If this condition is not fulfilled,
cross-linking will not work. Note also that cross-linking only works for metadata fields not
tagged as title in webui.itemlist.columns.
The format of the property key is webui.browse.link.<n> = <index name>:<display column metadata>. Please
note the punctuation used between the elements.
<index name> This needs to match your entry for the index name from the webui.browse.index
property key.
webui.browse.link.1 = author:dc.contributor.*
Creates a link for all types of contributors (authors, editors, illustrators, others, etc.)
webui.browse.link.2 = subject:dc.subject.lcsh
Creates a link to subjects that are Library of Congress only. In this case, you have a browse index that contains
only LC Subject Headings
webui.browse.link.3 = series:dc.relation.ispartofseries
Creates a link for the browse index "Series". Please note this is again, a customized browse index and not part
of the DSpace distributed release.
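The condition for cross-linking (the index name must be a configured browse index and the field must be a displayed column) can be checked with a short Python sketch. This is a hypothetical helper for reviewing a configuration by hand, not a DSpace utility; it ignores the additional rule about columns tagged as title.

```python
def link_is_usable(link_value, browse_index_names, itemlist_columns):
    """Check '<index name>:<display column metadata>' against the config.

    browse_index_names: names taken from the webui.browse.index.<n> entries.
    itemlist_columns:   the raw webui.itemlist.columns value.
    """
    index_name, sep, field = link_value.partition(":")
    if not sep:
        return False  # the ':' between the two elements is required
    # Strip rendering hints such as "(date)" from each listed column.
    columns = [col.strip().split("(")[0] for col in itemlist_columns.split(",")]
    return index_name.strip() in browse_index_names and field.strip() in columns


if __name__ == "__main__":
    columns = "thumbnail, dc.date.issued(date), dc.title, dc.contributor.*"
    indexes = {"dateissued", "author", "title", "subject"}
    print(link_is_usable("author:dc.contributor.*", indexes, columns))
    print(link_is_usable("subject:dc.subject.lcsh", indexes, columns))
```

In this example the second link would not work until dc.subject.lcsh is added to the displayed columns.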
Recent Submissions
Since DSpace 4.0 this applies by default only to JSPUI. XMLUI uses a new way to configure
recent submissions that does not rely on the Browse System. See Discovery.
This allows us to define which index to base the Recent Submissions display on, and how many we should show at
any one time. This uses the PluginManager to automatically load the relevant plugin for the Community and
Collection home pages. Values given in examples are the defaults supplied in dspace.cfg.
Property: recent.submission.sort-option
Informational Note: Define the sort name (from webui.browse.sort-options) to use for displaying recent
submissions.
Property: recent.submissions.count
Informational Note: Defines how many recent submissions should be displayed at any one time.
The processors that the PluginManager loads to perform the recent submissions query on the relevant
pages must also be set up. This is already configured in the default dspace.cfg, so there should be
no need for the administrator/programmer to worry about this.
plugin.sequence.org.dspace.plugin.CommunityHomeProcessor = \
org.dspace.app.webui.components.RecentCommunitySubmissions
plugin.sequence.org.dspace.plugin.CollectionHomeProcessor = \
org.dspace.app.webui.components.RecentCollectionSubmissions
Example Value: plugin.named.org.dspace.content.license.LicenseArgumentFormatter = \
org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = collection, \
org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = item, \
org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = eperson
Informational Note: It is possible to include contextual information in the submission license using substitution
variables. The text substitution is driven by a plugin implementation.
Property: webui.feed.enable
Informational Note: By default, RSS feeds are set to true (on). Change the value to "false" to disable them.
Property: webui.feed.items
Example Value: webui.feed.items = 4
Informational Note: Defines the number of DSpace items per feed (the most recent submissions).
Property: webui.feed.cache.size
Informational Note: Defines the maximum number of feeds in memory cache. A value of "0" will disable caching.
Property: webui.feed.cache.age
Example Value: webui.feed.cache.age = 48
Informational Note: Defines the number of hours to keep cached feeds before checking currency. A value of "0"
will force a check with each request.
Property: webui.feed.formats
Informational Note: Defines which syndication formats to offer. You can use more than one; use a
comma-separated list. The available values are: rss_0.90, rss_0.91, rss_0.92, rss_0.93,
rss_0.94, rss_1.0, rss_2.0, atom_1.0.
Property: webui.feed.localresolve
Informational Note: By default (set to false), URLs returned by the feed will point at the global handle resolver
(e.g. http://hdl.handle.net/123456789/1). If set to true, the local server URLs are used (e.g.
http://myserver.myorg/handle/123456789/1).
Property: webui.feed.item.title
Informational Note: This property customizes each single-value field displayed in the feed information for each
item. Each of the fields takes a single metadata field. The form of the key is
<scheme prefix>.<element>.<qualifier>. In place of the qualifier, one may leave it blank to exclude any qualifiers
or use the wildcard "*" to include all qualifiers for a particular element.
Property: webui.feed.item.date
Informational Note: This property customizes each single-value field displayed in the feed information for each
item. Each of the fields takes a single metadata field. The form of the key is
<scheme prefix>.<element>.<qualifier>. In place of the qualifier, one may leave it blank to exclude any qualifiers
or use the wildcard "*" to include all qualifiers for a particular element.
Property: webui.feed.item.description
Example Value: webui.feed.item.description = dc.title, dc.contributor.author, \
dc.contributor.editor, dc.description.abstract, \
dc.description
Informational Note: One can customize the metadata fields to show in the feed for each item's description.
Elements are displayed in the order they are specified in dspace.cfg. Like other property keys,
the format of this property key is: webui.feed.item.description = <scheme prefix>.<element>.<qualifier>.
In place of the qualifier, one may leave it blank to exclude any qualifiers or use the
wildcard "*" to include all qualifiers for a particular element.
Property: webui.feed.item.author
Informational The name of field to use for authors (Atom only); repeatable.
Note:
Property: webui.feed.logo.url
Informational Customize the image icon included with the site-wide feeds. This must be an absolute URL.
Note:
Property: webui.feed.item.dc.creator
Informational This optional property adds structured DC elements as XML elements to the feed description.
Note: They are not the same thing as, for example, webui.feed.item.description. Useful when a
program or stylesheet will be transforming a feed and wants separate author, description,
date, etc.
Property: webui.feed.item.dc.date
Informational This optional property adds structured DC elements as XML elements to the feed description.
Note: They are not the same thing as, for example, webui.feed.item.description. Useful when a
program or stylesheet will be transforming a feed and wants separate author, description,
date, etc.
Property: webui.feed.item.dc.description
Informational This optional property adds structured DC elements as XML elements to the feed description.
Note: They are not the same thing as, for example, webui.feed.item.description. Useful when a
program or stylesheet will be transforming a feed and wants separate author, description,
date, etc.
Property: webui.feed.podcast.collections
Informational This optional property enables Podcast Support on the RSS feed for the specified collection
Note: handles. The podcast is iTunes compatible and will expose the bitstreams in the items for
viewing and download by the podcast reader. Multiple values are separated by commas. For
more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media
RSS Feeds
Property: webui.feed.podcast.communities
Informational This optional property enables Podcast Support on the RSS feed for the specified community
Note: handles. The podcast is iTunes compatible and will expose the bitstreams in the items for
viewing and download by the podcast reader. Multiple values are separated by commas. For
more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media
RSS Feeds
Property: webui.feed.podcast.mimetypes
Informational This optional property for Podcast Support, allows you to choose which MIME types of
Note: bitstreams are to be enclosed in the podcast feed. Multiple values are separated by commas.
For more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable
Media RSS Feeds
Property: webui.feed.podcast.sourceuri
Informational This optional property for the Podcast Support will allow you to use a value for a metadata
Note: field as a replacement for actual bitstreams to be enclosed in the RSS feed. A use case for
specifying the external sourceuri would be if you have a non-DSpace media streaming server
that has a copy of your media file that you would prefer to have the media streamed from. For
more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media
RSS Feeds
OpenSearch Support
OpenSearch is a small set of conventions and documents for describing and using "search engines", meaning
any service that returns a set of results for a query. See extensive description in the Business Layer section of
the documentation.
Please note that for result data formatting, OpenSearch uses Syndication Feed Settings (RSS). So, even if
Syndication Feeds are not enabled, they must be configured to enable OpenSearch. OpenSearch uses all the
configuration properties for DSpace RSS to determine the mapping of metadata fields to feed fields. Note that a
new field for authors has been added (used in Atom format only).
Property: websvc.opensearch.enable
Informational Whether or not OpenSearch is enabled. By default, the feature is disabled. Change the
Note: property key to "true" to enable.
Property: websvc.opensearch.uicontext
Informational Context for HTML request URLs. Change only for non-standard servlet mapping.
Note: IMPORTANT: If you are using XMLUI and have Discovery enabled, this property's value
should be changed to discover.
Property: websvc.opensearch.svccontext
Informational Context for RSS/Atom request URLs. Change only for non-standard servlet mapping.
Note: IMPORTANT: If you are using XMLUI and have Discovery enabled, this property's value
should be changed to open-search/discover.
Property: websvc.opensearch.autolink
Property: websvc.opensearch.validity
Example websvc.opensearch.validity = 48
Value:
Informational Number of hours to retain results before recalculating. This applies to the Manakin interface
Note: only.
Property: websvc.opensearch.shortname
Informational A short name used in browsers for search service. It should be sixteen (16) or fewer
Note: characters.
Property: websvc.opensearch.longname
Property: websvc.opensearch.description
Property: websvc.opensearch.faviconurl
Informational Note: Location of favicon for service, if any. It must be 16 x 16 pixels. You can provide your own
local favicon instead of the default.
Property: websvc.opensearch.samplequery
Informational Sample query. This should return results. You can replace the sample query with search
Note: terms that should actually yield results in your repository.
Property: websvc.opensearch.tags
Property: websvc.opensearch.formats
Informational Result formats offered. Use one or more comma-separated from the list: html, atom, rss.
Note: Please note that html is required for auto discovery in browsers to function, and must be the
first in the list if present.
Property: webui.content_disposition_threshold
Informational The default value is set to 8MB. This property key applies to the JSPUI interface.
Note:
Property: xmlui.content_disposition_threshold
Informational The default value is set to 8MB. This property key applies to the XMLUI (Manakin)
Note: interface.
Property: webui.html.max-depth-guess
Example Value: webui.html.max-depth-guess = 3
Informational Note: When serving up composite HTML items in the JSP UI, how deep can the request be for us to
serve up a file with the same name? For example, if one receives a request for "foo/bar/index.html"
and one has a bitstream called just "index.html", DSpace will serve up that
bitstream for the request if webui.html.max-depth-guess is 2 or greater. If
webui.html.max-depth-guess is 1 or less, then DSpace would not serve that bitstream, as the
depth of the file is greater. If webui.html.max-depth-guess is zero, the request filename and
path must always exactly match the bitstream name. The default is set to 3.
Property: xmlui.html.max-depth-guess
Example Value: xmlui.html.max-depth-guess = 3
Informational Note: When serving up composite HTML items in the XMLUI, how deep can the request be for us to
serve up a file with the same name? For example, if one receives a request for "foo/bar/index.html"
and one has a bitstream called just "index.html", DSpace will serve up that
bitstream for the request if xmlui.html.max-depth-guess is 2 or greater. If
xmlui.html.max-depth-guess is 1 or less, then DSpace would not serve that bitstream, as the
depth of the file is greater. If xmlui.html.max-depth-guess is zero, the request filename and
path must always exactly match the bitstream name. The default is set to 3.
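The depth-guess behaviour described above can be sketched in Python as follows. This is an illustrative model under the assumption that leading directories are dropped one at a time up to the configured depth; the function names are hypothetical and this is not the code DSpace runs.

```python
def candidate_names(request_path, max_depth_guess):
    """List bitstream names to try for a request, most specific first."""
    parts = request_path.strip("/").split("/")
    candidates = ["/".join(parts)]  # the exact path is always tried first
    # Drop one leading directory at a time, up to max_depth_guess levels.
    for depth in range(1, min(max_depth_guess, len(parts) - 1) + 1):
        candidates.append("/".join(parts[depth:]))
    return candidates


def resolve(request_path, bitstream_names, max_depth_guess=3):
    """Return the bitstream name that would be served, or None."""
    for name in candidate_names(request_path, max_depth_guess):
        if name in bitstream_names:
            return name
    return None


if __name__ == "__main__":
    names = {"index.html"}
    for depth in (0, 1, 2, 3):
        print(depth, resolve("foo/bar/index.html", names, depth))
```

With a depth of zero only the exact path is tried, which matches the statement that the request filename and path must then match the bitstream name exactly.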
Sitemap Settings
To help web crawlers index the content within your repository, you can make use of sitemaps.
Property: sitemap.dir
Property: sitemap.engineurls
Informational Note: Comma-separated list of search engine URLs to "ping" when a new Sitemap has been
created. Include everything except the Sitemap URL itself (which will be URL-encoded and
appended to form the actual URL "pinged"). Add the following to the above parameter if you
have an application ID with Yahoo:
http://search.yahooapis.com/SiteExplorererService/V1/updateNotification?appid=REPLACE_ME?url=
(replace REPLACE_ME with your application ID). There is no known "ping" URL for MSN/Live search.
For an in-depth description of this feature, please consult: Authority Control of Metadata Values
Property: plugin.named.org.dspace.content.authority.ChoiceAuthority
Example
Value: plugin.named.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.SampleAuthority = Sample, \
org.dspace.content.authority.LCNameAuthority = LCNameAuthority, \
org.dspace.content.authority.SHERPARoMEOPublisher = SRPublisher, \
org.dspace.content.authority.SHERPARoMEOJournalTitle = SRJournalTitle
Informational --
Note:
Property: plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority
Example
Value: plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.DCInputAuthority
Property: lcname.url
Informational Note: Please refer to the Sherpa/RoMEO Publishers Policy Database Integration section for details
about such properties. See Configuring the Sherpa/RoMEO Publishers Policy Database
Integration.
Property: authority.minconfidence
Informational This sets the default lowest confidence level at which a metadata value is included in an
Note: authority-controlled browse (and search) index. It is a symbolic keyword, one of the following
values (listed in descending order): accepted, uncertain, ambiguous, notfound, failed,
rejected, novalue, unset. See org.dspace.content.authority.Choices source for
descriptions.
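To make the threshold concrete, the following Python sketch ranks the keywords in the descending order listed above and tests whether a value's confidence meets a configured minimum. The function name is hypothetical and the comparison is only an assumed reading of "lowest confidence level at which a metadata value is included"; consult org.dspace.content.authority.Choices for the authoritative definitions.

```python
# Confidence keywords in descending order, as listed in the note above.
CONFIDENCE_ORDER = [
    "accepted", "uncertain", "ambiguous", "notfound",
    "failed", "rejected", "novalue", "unset",
]
RANK = {name: position for position, name in enumerate(CONFIDENCE_ORDER)}


def meets_min_confidence(confidence, minimum):
    """True if 'confidence' is at least as high as 'minimum'."""
    return RANK[confidence] <= RANK[minimum]


if __name__ == "__main__":
    for level in CONFIDENCE_ORDER:
        print(level, meets_min_confidence(level, "ambiguous"))
```

With a minimum of ambiguous, values marked accepted, uncertain or ambiguous would be included in this model and all others left out.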
Property: xmlui.lookup.select.size
Example xmlui.lookup.select.size = 12
Value:
Informational This property sets the number of selectable choices in the Choices lookup popup
Note:
Property: upload.temp.dir
Informational This property sets where DSpace temporarily stores uploaded files.
Note:
Property: upload.max
Informational Note: Maximum size of uploaded files in bytes. A negative setting will result in no limit being set.
The default is set to 512MB.
Property: webui.itemdisplay.default
Example Value: webui.itemdisplay.default = dc.title, dc.title.alternative, \
dc.contributor.*, dc.subject, dc.date.issued(date), \
dc.publisher, dc.identifier.citation, \
dc.relation.ispartofseries, dc.description.abstract, \
dc.description, dc.identifier.govdoc, \
dc.identifier.uri(link), dc.identifier.isbn, \
dc.identifier.issn, dc.identifier.ismn, dc.identifier
Informational Note: This is used to customize the DC metadata fields that display in the item display (the brief
display) when pulling up a record. The format is: <schema>.<element>.<optional qualifier>.
In place of the qualifier, one can use the wildcard "*" to include
all fields of the same element, or leave it blank for unqualified elements. Two
additional options are available for behavior/rendering: (date) and (link). See the following
examples:
Property:
webui.resolver.1.urn
webui.resolver.1.baseurl
webui.resolver.2.urn
webui.resolver.2.baseurl
Example
Value: webui.resolver.1.urn = doi
webui.resolver.1.baseurl = http://dx.doi.org/
webui.resolver.2.urn = hdl
webui.resolver.2.baseurl = http://hdl.handle.net/
Informational Note: When using "resolver" in webui.itemdisplay to render identifiers as resolvable links, the base
URL is taken from webui.resolver.<n>.baseurl where webui.resolver.<n>.urn
matches the urn specified in the metadata value. The value is appended
to the "baseurl" as is, so the baseurl needs to end with a forward slash in almost every case.
If no urn is specified in the value, it will be displayed as simple text. For the doi and hdl urns,
default values are provided: http://dx.doi.org and http://hdl.handle.net, respectively.
If a metadata value with style "doi", "handle" or "resolver" is already a URL, it is simply
rendered as a link with no other manipulation.
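The lookup described in this note can be sketched in Python. The sketch assumes the urn appears as a prefix separated from the identifier by a colon (for example doi:10.1000/182); that value format, the function name and the sample identifiers are assumptions for illustration, not taken from the DSpace source.

```python
# urn -> baseurl pairs, mirroring the example webui.resolver.<n> entries.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "hdl": "http://hdl.handle.net/",
}


def resolve_identifier(value, resolvers=RESOLVERS):
    """Return a link for the value, or None if it would be shown as text."""
    if value.startswith(("http://", "https://")):
        return value  # already a URL: rendered as a link unchanged
    urn, sep, identifier = value.partition(":")  # assumed 'urn:identifier' form
    base = resolvers.get(urn) if sep else None
    if base is None:
        return None  # no recognised urn: displayed as simple text
    return base + identifier  # appended as is, hence the trailing slash


if __name__ == "__main__":
    print(resolve_identifier("doi:10.1000/182"))
    print(resolve_identifier("hdl:123456789/1"))
    print(resolve_identifier("10.1000/182"))
```

Because the identifier is appended without modification, a baseurl that lacks the trailing slash would produce a malformed link in this model.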
Property: plugin.single.org.dspace.app.webui.util.StyleSelection
Example Value: plugin.single.org.dspace.app.webui.util.StyleSelection = \
org.dspace.app.webui.util.CollectionStyleSelection
#org.dspace.app.webui.util.MetadataStyleSelection
Informational Note: Specify which strategy to use for selecting the style for an item.
Property: webui.itemdisplay.thesis.collections
Property:
webui.itemdisplay.metadata-style
webui.itemdisplay.metadata-style
Example
Value: webui.itemdisplay.metadata-style = schema.element[.qualifier|.*]
webui.itemdisplay.metadata-style = dc.type
Property: webui.itemlist.columns
Example
Value: webui.itemlist.columns = thumbnail, dc.date.issued(date), dc.title, \
dc.contributor.*
Informational Note: Customize the DC fields to use in the item listing page. Elements will be displayed left to right
in the order they are specified here. The form is <schema prefix>.<element>[.<qualifier> | .*][(date)], ...
Although not a requirement, it makes sense to include among the listed fields at least
the date and title fields as specified by the webui.browse.index configuration options
mentioned in the next section.
If you have enabled thumbnails (webui.browse.thumbnail.show), you must also include a
'thumbnail' entry in your columns; this is where the thumbnail will be displayed.
Property: webui.itemlist.widths
Informational Note: You can customize the width of each column with this property, using numbers
(pixels) or percentages. For the 'thumbnail' column, a setting of '*' will use the max width
specified for browse thumbnails (cf. webui.browse.thumbnail.maxwidth, thumbnail.maxwidth).
Property:
webui.itemlist.browse.<index name>.sort.<sort name>.columns
webui.itemlist.sort.<sort name>.columns
webui.itemlist.browse.<browse name>.columns
webui.itemlist.<sort or index name>.columns
Example
Value:
Informational Note: You can override the DC fields used on the listing page for a given browse index and/or sort
option. As a sort option or index may be defined on a field that isn't normally included in the
list, this allows you to display the fields that have been indexed/sorted on. There are a
number of forms the configuration can take, and the order in which they are listed above is the
priority in which they will be used (so a combination of an index name and sort name will take
precedence over just the browse name). In the last case, a sort option name will always take
precedence over a browse index name. Note also that for any additional columns you list,
you will need to ensure there is an itemlist.<field name> entry in the messages file.
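The priority order among these property forms can be modelled with a short Python sketch. It is a hypothetical illustration of the lookup order stated in the note, with the further assumption that webui.itemlist.columns is used when no override matches; the sample column lists are illustrative values.

```python
def columns_for(config, index_name=None, sort_name=None):
    """Pick the column list for a browse page, most specific key first."""
    keys = []
    if index_name and sort_name:
        keys.append(f"webui.itemlist.browse.{index_name}.sort.{sort_name}.columns")
    if sort_name:
        keys.append(f"webui.itemlist.sort.{sort_name}.columns")
    if index_name:
        keys.append(f"webui.itemlist.browse.{index_name}.columns")
    # Last form: a sort option name wins over a browse index name.
    if sort_name:
        keys.append(f"webui.itemlist.{sort_name}.columns")
    if index_name:
        keys.append(f"webui.itemlist.{index_name}.columns")
    keys.append("webui.itemlist.columns")  # assumed fallback to the default
    for key in keys:
        if key in config:
            return config[key]
    return None


if __name__ == "__main__":
    config = {
        "webui.itemlist.columns":
            "thumbnail, dc.date.issued(date), dc.title, dc.contributor.*",
        "webui.itemlist.dateaccessioned.columns":
            "thumbnail, dc.date.accessioned(date), dc.title, dc.contributor.*",
    }
    print(columns_for(config, index_name="title", sort_name="dateaccessioned"))
    print(columns_for(config, index_name="title"))
```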
Property: webui.itemlist.dateaccessioned.columns
Informational This would display the date of the accession in place of the issue date whenever the
Note: dateaccessioned browsed index or sort option is selected. Just like webui.itemlist.columns,
you will need to include a 'thumbnail' entry to display the thumbnails in the item list.
Property: webui.itemlist.dateaccessioned.widths
Informational As in the aforementioned property key, you can customize the width of the columns for each
Note: configured column list, substituting ".widths" for ".columns" in the property name. See the
setting for webui.itemlist.widths for more information.
Property: webui.itemlist.tablewidth
Informational You can also set the overall size of the item list table with the following setting. It can lead to
Note: faster table rendering when used with the column widths above, but not generally
recommended.
Property: webui.session.invalidate
Informational Enable or disable session invalidation upon login or logout. This feature is enabled by default
Note: to help prevent session hijacking but may cause problems for shibboleth, etc. If omitted, the
default value is "true". [Only used for JSPUI authentication].
Property: jspui.google.analytics.key
Informational If you would like to use Google Analytics to track general website statistics then use the
Note: following parameter to provide your Analytics key.
Example Value: default.locale = en
Informational Note: The default language for the application is set with this property key. This is a locale
according to i18n and might consist of language, language_country or
language_country_variant. If no default locale is defined, then the server default locale will be
used. The format of a locale specifier is described here:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html
Changes in dspace.cfg
Property: webui.supported.locales
Informational All the locales that are supported by this instance of DSpace. Comma separated list.
Note:
Related Files
If you set webui.supported.locales, make sure that all the related additional files for each language are available.
LOCALE should correspond to the locale set in webui.supported.locales, e.g. for webui.supported.locales =
en, de, fr, there should be:
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages.properties
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_en.properties
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_de.properties
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_fr.properties
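As a quick way to derive the list of message files expected for a given webui.supported.locales value, a sketch along the following lines may help. The helper name is hypothetical; it only builds the file names shown above and does not check whether the files exist.

```python
def expected_message_files(supported_locales):
    """Build the Messages file paths expected for a locale list."""
    base = "[dspace-source]/dspace/modules/jspui/src/main/resources"
    locales = [loc.strip() for loc in supported_locales.split(",") if loc.strip()]
    files = [f"{base}/Messages.properties"]  # the default bundle
    files += [f"{base}/Messages_{loc}.properties" for loc in locales]
    return files


if __name__ == "__main__":
    for path in expected_message_files("en, de, fr"):
        print(path)
```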
Files to be localized:
[dspace-source]/dspace/modules/jspui/src/main/resources/Messages_LOCALE.properties
[dspace-source]/dspace/config/input-forms_LOCALE.xml
[dspace-source]/dspace/config/default_LOCALE.license - should be pure ASCII
[dspace-source]/dspace/config/news-top_LOCALE.html
[dspace-source]/dspace/config/news-side_LOCALE.html
[dspace-source]/dspace/config/emails/change_password_LOCALE
[dspace-source]/dspace/config/emails/feedback_LOCALE
[dspace-source]/dspace/config/emails/internal_error_LOCALE
[dspace-source]/dspace/config/emails/register_LOCALE
[dspace-source]/dspace/config/emails/submit_archive_LOCALE
[dspace-source]/dspace/config/emails/submit_reject_LOCALE
[dspace-source]/dspace/config/emails/submit_task_LOCALE
[dspace-source]/dspace/config/emails/subscription_LOCALE
[dspace-source]/dspace/config/emails/suggest_LOCALE
[dspace]/webapps/jspui/help/collection-admin_LOCALE.html - in html keep the jump link as original; must be
copied to [dspace-source]/dspace/modules/jspui/src/main/webapp/help
[dspace]/webapps/jspui/help/index_LOCALE.html - must be copied to
[dspace-source]/dspace/modules/jspui/src/main/webapp/help
Define the index name (from webui.browse.index) to use for displaying items by author.
Property: itemmap.author.index
Informational If you change the name of your author browse field, you will also need to update this
Note: property key.
Informational Note: To display group membership set to "true". If omitted, the default behavior is false.
Property: sfx.server.url
sfx.server.url = http://worldcatlibraries.org/registry/gateway?
Informational SFX query is appended to this URL. If this property is commented out or omitted, SFX
Note: support is switched off.
All the parameter mappings are defined in the [dspace]/config/sfx.xml file. The program will check the
parameters in sfx.xml and retrieve the correct metadata of the item. It will then pass the string to your
resolver.
For the following example, the program will search the first query-pair which is DOI of the item. If there is a DOI
for that item, your retrieval results will be, for example:
http://researchspace.auckland.ac.nz/handle/2292/5763
<query-pairs>
<field>
<querystring>rft_id=info:doi/</querystring>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>doi</dc-qualifier>
</field>
</query-pairs>
If there is no DOI for that item, it will search the next query-pair based on [dspace]/config/sfx.xml,
and so on.
<querystring>rft_id=info:doi/</querystring>
The program assumes the string for an item will never be empty, as there will be at least an author and
a title for the item to pass to the resolver.
For the contributor author, the program maintains the original DSpace SFX function of extracting the
author's first and last name.
<field>
<querystring>rft.aulast=</querystring>
<dc-schema>dc</dc-schema>
<dc-element>contributor</dc-element>
<dc-qualifier>author</dc-qualifier>
</field>
<field>
<querystring>rft.aufirst=</querystring>
<dc-schema>dc</dc-schema>
<dc-element>contributor</dc-element>
<dc-qualifier>author</dc-qualifier>
</field>
Informational Show a link to the item recommendation page from item display page.
Note:
Property: webui.suggest.loggedinusers.only
Informational Note: Enable only if the user is logged in. If this key is commented out, the default value is false.
Property: webui.controlledvocabulary.enable
Informational Enable or disable the controlled vocabulary add-on. WARNING: This feature is not compatible
Note: with WAI (it requires JavaScript to function).
The need for a limited set of keywords is important since it eliminates the ambiguity of a free description
system, consequently simplifying the task of finding specific items of information.
The controlled vocabulary add-on allows the user to choose from a defined set of keywords organized in a tree
(taxonomy) and then use these keywords to describe items while they are being submitted.
We have also developed a small search engine that displays the classification tree (or taxonomy) allowing the
user to select the branches that best describe the information that he/she seeks.
The taxonomies are described in XML following this (very simple) structure:
You are free to use any application you want to create your controlled vocabularies. A simple text editor should
be enough for small projects. Bigger projects will require more complex tools. You may use Protégé to create
your taxonomies, save them as OWL and then use an XSLT stylesheet to transform your documents to
the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas
such as OWL or RDF.
In order to make DSpace compatible with WAI 2.0, the add-on is turned off by default (the add-on relies
strongly on JavaScript to function). It can be activated by setting the following property in dspace.cfg:
webui.controlledvocabulary.enable = true
Vocabularies need to be associated with the corresponding DC metadata fields. Edit the file
[dspace]/config/input-forms.xml and place a "vocabulary" tag under the "field" element that you want to control.
Set the value of the "vocabulary" element to the name of the file that contains the vocabulary, leaving out the
extension (the add-on will only load files with extension "*.xml"). For example:
<field>
<dc-schema>dc</dc-schema>
<dc-element>subject</dc-element>
<dc-qualifier></dc-qualifier>
<!-- An input-type of twobox MUST be marked as repeatable -->
<repeatable>true</repeatable>
<label>Subject Keywords</label>
<input-type>twobox</input-type>
<hint> Enter appropriate subject keywords or phrases below. </hint>
<required></required>
<vocabulary [closed="false"]>nsi</vocabulary>
</field>
The vocabulary element has an optional boolean attribute closed that can be used to force input only through the
JavaScript of the controlled vocabulary add-on. The default behavior (i.e. without this attribute) is the same as
closed="false", which also allows the user to enter values as free text.
Property: webui.session.invalidate
Informational Note: Enable or disable session invalidation upon login or logout. This feature is enabled by
default to help prevent session hijacking, but it may cause problems for Shibboleth, etc. If omitted, the
default value is 'true'.
Property: xmlui.force.ssl
Informational Note: Force all authenticated connections to use SSL; only non-authenticated connections are
allowed over plain HTTP. If set to true, you need to ensure that the "dspace.hostname" parameter is set
correctly.
Property: xmlui.user.registration
Informational Note: Determine whether new users should be allowed to register. This parameter is useful in
conjunction with Shibboleth, where you want to disallow registration because Shibboleth will automatically
register the user. Default value is true.
Property: xmlui.user.editmetadata
Informational Note: Determine whether users should be able to edit their own metadata. This parameter is
useful in conjunction with Shibboleth, where you want to disable the user's ability to edit their metadata
because it came from Shibboleth. Default value is true.
Property: xmlui.user.loginredirect
Informational Note: The URL to which users should be directed after they have logged into the system. Leave
this parameter blank or undefined to direct users to the homepage, set it to /profile for the user's profile,
or choose another reasonable target such as /submissions to show whether the user has any tasks awaiting their
attention. The default is the repository home page.
Property: xmlui.theme.allowoverrides
Informational Note: Allow the user to override which theme is used to display a particular page. When
submitting a request, add the HTTP parameter "themepath", which corresponds to a particular theme; the
specified theme will be used instead of any other configured theme. Note that this is a potential security
hole allowing execution of unintended code on the server. This option is only for development and debugging
and should be turned off for any production repository. The default value unless otherwise specified is
"false".
Property: xmlui.bundle.upload
Informational Note: Determine which bundles administrators and collection administrators may upload into an
existing item through the administrative interface. If the user does not have the appropriate privileges
(add and write) on the bundle, then that bundle will not be shown to the user as an option.
Property: xmlui.community-list.render.full
Informational Note: Determine whether all the metadata about a community/collection should be available to the
theme on the community-list page. This parameter defaults to true, but if you are experiencing performance
problems on the community-list page you should experiment with turning this option off.
Property: xmlui.community-list.cache
Informational Note: Normally, Manakin will fully verify any cached pages before using a cached copy. This
means that when the community-list page is viewed, the database is queried for each community/collection to
see if its metadata has been modified. This can be expensive for repositories with a large community tree. To
help solve this problem, you can set the cache to be assumed valid for a specific period of time. The downside
is that new or edited communities/collections may not show up on the website for a period of time.
Property: xmlui.bitstream.mods
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream.
The MODS metadata file must be inside the "METADATA" bundle and named MODS.xml. If this option is set to
"true" and the bitstream is present, then it is made available to the theme for display.
Property: xmlui.bitstream.mets
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a bitstream.
The METS metadata file must be inside the "METADATA" bundle and named METS.xml. If this option is set to
"true" and the bitstream is present, then it is made available to the theme for display.
Property: xmlui.google.analytics.key
Informational Note: If you would like to use Google Analytics to track general website statistics, use this
parameter to provide your analytics key. First sign up for an account at http://analytics.google.com, then
create an entry for your repository's website. Google Analytics will give you a snippet of JavaScript code to
place on your site. Inside that snippet is your Google Analytics key, usually found in the line:
_uacct = "UA-XXXXXXX-X". Take this key (just the UA-XXXXXXX-X part) and place it in this parameter.
Property: xmlui.controlpanel.activity.max
Informational Note: Assign how many page views will be recorded and displayed in the control panel's activity
viewer. The activity tab allows an administrator to debug problems in a running DSpace by showing who is
currently using it and how. The default value is 250.
Property: xmlui.controlpanel.activity.ipheader
Informational Note: Determine where the control panel's activity viewer receives an event's IP address from.
If your DSpace is in a load balanced environment or otherwise behind a context-switch, then you will need to
set the parameter to the HTTP header that records the original IP address.
In order to change the registries, you may adjust the XML files before the first installation of DSpace. On an
already running instance it is recommended to change bitstream registries via the DSpace admin UI, but the
metadata registries can be loaded again at any time from the XML files without difficulty. Changes made via the
admin UI are not reflected in the XML files.
There is a set of Dublin Core Elements, which is used by the system and should not be removed or moved to
another schema, see Appendix: Default Dublin Core Metadata registry.
Note: altering a Metadata Registry has no effect on corresponding parts, e.g. item submission interface, item
display, item import and vice versa. Every metadata element used in submission interface or item import must
be registered before using it.
Note also that deleting a metadata element will delete all its corresponding values.
If you wish to add more metadata elements, you can do this in one of two ways. Via the DSpace admin UI you
may define new metadata elements in the different available schemas. But you may also modify the XML file (or
provide an additional one), and re-import the data as follows:
<dspace-dc-types>
<dc-type>
<schema>dc</schema>
<element>contributor</element>
<qualifier>advisor</qualifier>
<scope_note>Use primarily for thesis advisor.</scope_note>
</dc-type>
</dspace-dc-types>
Unknown
License
Deleting a format will cause any existing bitstreams of this format to be reverted to the unknown
bitstream format.
XPDF Filter
This is an alternative suite of MediaFilter plugins that offers faster and more reliable text extraction from PDF
Bitstreams, as well as thumbnail image generation. It replaces the built-in default PDF MediaFilter.
If this filter is so much better, why isn't it the default? The answer is that it relies on external executable
programs which must be obtained and installed for your server platform. This would add too much complexity to
the installation process, so it is left out as an optional "extra" step.
Installation Overview
Here are the steps required to install and configure the filters:
1. Install the xpdf tools for your platform, from the downloads at http://www.foolabs.com/xpdf
2. Acquire the Sun Java Advanced Imaging Tools and create a local Maven package.
3. Edit DSpace configuration properties to add location of xpdf executables, reconfigure MediaFilter plugins.
4. Build and install DSpace, adding -Pxpdf-mediafilter-support to Maven invocation.
You may be able to download a binary distribution for your platform, which simplifies installation. Xpdf is readily
available for Linux, Solaris, MacOSX, Windows, NetBSD, HP-UX, AIX, and OpenVMS, and is reported to work
on AIX, OS/2, and many other systems.
For AIX, Sun support has the following: "JAI has native acceleration for the above but it also works in pure Java
mode. So as long as you have an appropriate JDK for AIX (1.3 or later, I believe), you should be able to use it.
You can download any of them, extract just the jars, and put those in your $CLASSPATH."
curl -O http://download.java.net/media/jai-imageio/builds/release/1.1/jai_imageio-1_1-lib-linux-i586.tar.gz
tar xzf jai_imageio-1_1-lib-linux-i586.tar.gz
curl -O http://download.java.net/media/jai/builds/release/1_1_2_01/jai-1_1_2_01-lib-linux-i586.tar.gz
tar xzf jai-1_1_2_01-lib-linux-i586.tar.gz
The preceding example leaves the JAR in jai_imageio-1_1/lib/jai_imageio.jar . Now install it in your local
Maven repository, e.g.: (changing the path after file= if necessary)
mvn install:install-file \
-Dfile=jai_imageio-1_1/lib/jai_imageio.jar \
-DgroupId=com.sun.media \
-DartifactId=jai_imageio \
-Dversion=1.0_01 \
-Dpackaging=jar \
-DgeneratePom=true
You may have to repeat this procedure for the jai_core.jar library, as well, if it is not available in any of the public
Maven repositories. Once acquired, this command installs it locally: e.g.: (changing the path after file= if
necessary)
Now, add the absolute paths to the XPDF tools you installed. In this example they are installed under
/usr/local/bin (a logical place on Linux and MacOSX), but they may be anywhere.
xpdf.path.pdftotext = /usr/local/bin/pdftotext
xpdf.path.pdftoppm = /usr/local/bin/pdftoppm
xpdf.path.pdfinfo = /usr/local/bin/pdfinfo
Change the MediaFilter plugin configuration to remove the old org.dspace.app.mediafilter.PDFFilter and add the
new filters, e.g: (New sections are in bold)
filter.plugins = \
PDF Text Extractor, \
PDF Thumbnail, \
HTML Text Extractor, \
Word Text Extractor, \
JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
Then add the input format configuration properties for each of the new filters, e.g.:
Finally, if you want PDF thumbnail images, don't forget to add that filter name to the filter.plugins property, e.g.:
Changes in 1.8
Tasks
Activation
Writing your own tasks
Task Invocation
On the command line
In the admin UI
In workflow
In arbitrary user code
Asynchronous (Deferred) Operation
Task Output and Reporting
Status Code
Result String
Reporting Stream
Task Properties
Task Annotations
Scripted Tasks
Interface
performDso() vs. performId()
Bundled Tasks
MetadataWebService Task
ISSN to Publisher Name
HTTP Headers
Transformations
Result String Programatic Use
Limits and Use
NoOp Curation Task
Bitstream Format Profiler
Required Metadata
Virus Scan
Setup the service from the ClamAV documentation.
DSpace Configuration
Task Operation from the Administrative user interface
Task Operation from the Item Submission user interface
Task Operation from the curation command line client
Table 1 – Virus Scan Results Table
Link Checkers
Basic Link Checker
Metadata Value Link Checker
Microsoft Translator
Configure Microsoft Translator
4.5.2 Tasks
The goal of the curation system ("CS") is to provide a simple, extensible way to manage routine content
operations on a repository. These operations are known to CS as "tasks", and they can operate on any
DSpaceObject (i.e. subclasses of DSpaceObject) - which means the entire Site, Communities, Collections, and
Items - viz. core data model objects. Tasks may elect to work on only one type of DSpace object - typically an
Item - and in this case they may simply ignore other data types (tasks have the ability to "skip" objects for any
reason). The DSpace core distribution will provide a number of useful tasks, but the system is designed to
encourage local extension - tasks can be written for any purpose, and placed in any Java package. This gives
DSpace sites the ability to customize the behavior of their repository without having to alter - and therefore
manage synchronization with - the DSpace source code. What sorts of activities are appropriate for tasks?
Some examples:
apply a virus scan to item bitstreams (this will be our example below)
profile a collection based on format types - good for identifying format migrations
ensure a given set of metadata fields are present in every item, or even that they have particular values
call a network service to enhance/replace/normalize an item's metadata or content
ensure all item bitstreams are readable and their checksums agree with the ingest values
Since tasks have access to, and can modify, DSpace content, performing tasks is considered an administrative
function to be available only to knowledgeable collection editors, repository administrators, sysadmins, etc. No
tasks are exposed in the public interfaces.
4.5.3 Activation
For CS to run a task, the code for the task must of course be included with other deployed code (in
[dspace]/lib, the WAR, etc.), but it must also be declared and given a name. This is done via a configuration
property in [dspace]/config/modules/curate.cfg as follows:
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.ctask.general.NoOpCurationTask = noop, \
org.dspace.ctask.general.ProfileFormats = profileformats, \
org.dspace.ctask.general.RequiredMetadata = requiredmetadata, \
org.dspace.ctask.general.ClamScan = vscan, \
org.dspace.ctask.general.MicrosoftTranslator = translate, \
org.dspace.ctask.general.MetadataValueLinkChecker = checklinks
For each activated task, a key-value pair is added. The key is the fully qualified class name and the value is the
taskname used elsewhere to configure the use of the task, as will be seen below. Note that the curate.cfg
configuration file, while in the config directory, is located under "modules". The intent is that tasks, as well as
any configuration they require, will be optional "add-ons" to the basic system configuration. Adding or removing
tasks has no impact on dspace.cfg.
For many tasks, this activation configuration is all that is required to use them. Other tasks need
specific configuration of their own. A concrete example is described below, but note that these task-specific
configuration property files also reside in [dspace]/config/modules.
First, it must provide a no argument constructor, so it can be loaded by the PluginManager. Thus, all tasks are
'named' plugins, with the taskname being the plugin name.
The CurationTask interface is almost a "tagging" interface, and only requires a few very high-level methods be
implemented. The most significant is:
If a task extends the AbstractCurationTask class, that is the only method it needs to define.
As with other command-line tools, these invocations could be placed in a cron table and run on a fixed
schedule, or run on demand by an administrator.
In the admin UI
In the UI, there are several ways to execute configured Curation Tasks:
1. From the "Curate" tab/button that appears on each "Edit Community/Collection/Item" page: this
tab allows an Administrator, Community Administrator or Collection Administrator to run a Curation Task
on that particular Community, Collection or Item. When running a task on a Community or Collection, that
task will also execute on all its child objects, unless the Task itself states otherwise (e.g. running a task
on a Collection will also run it across all Items within that Collection).
NOTE: Community Administrators and Collection Administrators can only run Curation Tasks on
the Community or Collection which they administer, along with any child objects of that
Community or Collection. For example, a Collection Administrator can run a task on that specific
Collection, or on any of the Items within that Collection.
2. From the Administrator's "Curation Tasks" page: This option is only available to DSpace
Administrators, and appears in the Administrative side-menu. This page allows an Administrator to run a
Curation Task across a single object, or all objects within the entire DSpace site.
In order to run a task from this interface, you must enter in the handle for the DSpace object. To
run a task site-wide, you can use the handle: [your-handle-prefix]/0
Each of the above pages exposes a drop-down list of configured tasks, with a button to 'perform' the task, or
queue it for later operation (see section below). Not all activated tasks need appear in the Curate tab - you filter
them by means of a configuration property. This property also permits you to assign to the task a more
user-friendly name than the PluginManager taskname. The property resides in
[dspace]/config/modules/curate.cfg:
ui.tasknames = \
profileformats = Profile Bitstream Formats, \
requiredmetadata = Check for Required Metadata
When a task is selected from the drop-down list and performed, the tab displays both a phrase interpreting the
"status code" of the task execution, and the "result" message if any has been defined. When the task has been
queued, an acknowledgement appears instead. You may configure the words used for status codes in
curate.cfg (for clarity, language localization, etc):
ui.statusmessages = \
-3 = Unknown Task, \
-2 = No Status Set, \
-1 = Error, \
0 = Success, \
1 = Fail, \
2 = Skip, \
other = Invalid Status
As the number of tasks configured for a system grows, a simple drop-down list of all tasks may become too
cluttered or large. DSpace 1.8+ provides a way to address this issue, known as task groups. A task group is a
simple collection of tasks that the Admin UI will display in a separate drop-down list. You may define as many or
as few groups as you please. If no groups are defined, then all tasks that are listed in the ui.tasknames property
will appear in a single drop-down list. If at least one group is defined, then the admin UI will display two
drop-down lists. The first is the list of task groups, and the second is the list of task names associated with the
selected group. A few key points to keep in mind when setting up task groups:
The configuration of groups follows the same simple pattern as tasks, using properties in
[dspace]/config/modules/curate.cfg. The group is assigned a simple logical name, but also a localizable name that
appears in the UI. For example:
# ui.taskgroups contains the list of defined groups, together with a pretty name for UI display
ui.taskgroups = \
replication = Backup and Restoration Tasks, \
integrity = Metadata Integrity Tasks, \
.....
# each group membership list is a separate property, whose value is a comma-separated list of
# logical task names
ui.taskgroup.integrity = profileformats, requiredmetadata
....
In workflow
CS provides the ability to attach any number of tasks to standard DSpace workflows. Using a configuration file
[dspace]/config/workflow-curation.xml, you can declaratively (without coding) wire tasks to any step
in a workflow. An example:
<taskset-map>
<mapping collection-handle="default" taskset="cautious" />
</taskset-map>
<tasksets>
<taskset name="cautious">
<flowstep name="step1">
<task name="vscan">
<workflow>reject</workflow>
<notify on="fail">$flowgroup</notify>
<notify on="fail">$colladmin</notify>
<notify on="error">$siteadmin</notify>
</task>
</flowstep>
</taskset>
</tasksets>
This markup would cause a virus scan to occur during step one of workflow for any collection, and automatically
reject any submissions with infected files. It would further notify (via email) both the reviewers (step 1 group),
and the collection administrators, if either of these are defined. If it could not perform the scan, the site
administrator would be notified.
The notifications use the same procedures that other workflow notifications do - namely email. There is a new
email template defined for curation task use: [dspace]/config/emails/flowtask_notify. This may be
language-localized or otherwise modified like any other email template.
Tasks wired in this way are normally performed as soon as the workflow step is entered, and the outcome
action (defined by the 'workflow' element) immediately follows. It is also possible to delay the performance of
the task - which will ensure a responsive system - by queuing the task instead of directly performing it:
...
<taskset name="cautious">
<flowstep name="step1" queue="workflow">
...
This attribute (which must always follow the "name" attribute in the flowstep element) will cause all tasks
associated with the step to be placed on the queue named "workflow" (or any queue you wish to use, of
course), and further has the effect of suspending the workflow. When the queue is emptied (meaning all tasks
in it have been performed), the workflow is restarted. Each workflow step may be separately configured.
As with configurable submission, you can assign these task rules per collection, as well as having a default for
any collection.
would do approximately what the command line invocation did. The method "curate" just performs all the tasks
configured (you can add multiple tasks to a curator).
would place a request on a named queue "monthly" to virus scan the collection. To read (and process) the
queue, we could for example:
use the command-line tool, but we could also read the queue programmatically. Any number of queues can be
defined and used as needed.
In the administrative UI curation "widget", it is possible both to perform a task immediately and to place it on
a queue for later processing.
Status Code
This was mentioned above. This is returned to CS whenever a task is called. The complete list of values:
In the administrative UI, this code is translated into the word or phrase configured by the ui.statusmessages
property (discussed above) for display.
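The translation from status code to display phrase can be sketched as follows. This is a minimal illustration under stated assumptions, not the DSpace implementation: the class and method names (StatusMessageSketch, parse, messageFor) are hypothetical, and the parsing of the property value is a simplification. The codes and phrases are the ones shown in the ui.statusmessages example above, including the "other" fallback entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the DSpace admin UI code.
class StatusMessageSketch {

    // Value of ui.statusmessages as shown earlier, joined onto one line.
    static final String EXAMPLE =
        "-3 = Unknown Task, -2 = No Status Set, -1 = Error, 0 = Success, 1 = Fail, 2 = Skip, other = Invalid Status";

    // Splits "code = phrase" pairs on commas (assumes phrases contain no commas).
    static Map<String, String> parse(String propertyValue) {
        Map<String, String> messages = new LinkedHashMap<>();
        for (String entry : propertyValue.split(",")) {
            String[] pair = entry.split("=", 2);
            if (pair.length == 2) {
                messages.put(pair[0].trim(), pair[1].trim());
            }
        }
        return messages;
    }

    // Looks up the phrase for a status code, falling back to the "other" entry.
    static String messageFor(Map<String, String> messages, int status) {
        String phrase = messages.get(Integer.toString(status));
        return (phrase != null) ? phrase : messages.getOrDefault("other", "Invalid Status");
    }

    public static void main(String[] args) {
        Map<String, String> messages = parse(EXAMPLE);
        System.out.println(messageFor(messages, 0));
        System.out.println(messageFor(messages, 7));
    }
}
```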
Result String
The task may define a string indicating details of the outcome. This result is displayed, in the "curation widget"
described above:
CS does not interpret or assign result strings; the task does. A task is not required to assign a result, but
the "best practice" for tasks is to assign one whenever possible.
Reporting Stream
For very fine-grained information, a task may write to a reporting stream. This stream is sent to standard out, so
is only available when running a task from the command line. Unlike the result string, there is no limit to the
amount of data that may be pushed to this stream.
The status code and the result string are accessed (or set) by methods on the Curator object:
and similar. But tasks are supposed to be written by anyone in the community and shared around (without prior
coordination), so if another task uses the same configuration file name, there is a name collision here that can't
be easily fixed, since the reference is hard-coded in each task. In this case, if we wanted to use both at a given
site, we would have to alter the source of one of them - which introduces needless code localization and
maintenance.
Task properties give us a simple solution. Here is how it works: suppose that both colliding tasks instead use
this method provided by AbstractCurationTask in their task implementation code (e.g. in the virus scanner):
host = taskProperty("service.host");
Note that the name of the configuration file is not mentioned at all, just the name of the property whose value
we want. At runtime, the curation system resolves this call to a configuration file, using the name under which
the task has been configured as the name of the config file. So, for example, if both were installed (in
curate.cfg) as:
org.dspace.ctask.general.ClamScan = vscan,
org.community.ctask.ConflictTask = virusscan,
....
Another use of task properties is to support multiple task profiles. Suppose we have a task that we want to
operate in one of two modes. A good example would be a mediafilter task that produces a thumbnail. We can
either create one if it doesn't exist, or run with "-force" which will create one regardless. Suppose this behavior
was controlled by a property in a config file. If we configured the task as "thumbnail", then we would have in
[dspace]/config/modules/thumbnail.cfg:
...other properties...
thumbnail.maxheight = 80
thumbnail.maxwidth = 80
forceupdate=false
Then, following the pattern above, the thumbnail generating task code would look like:
if (taskBooleanProperty("forceupdate")) {
// do something
}
But an obvious use case would be to want to run force mode and non-force mode from the admin UI on
different occasions. To do this, one would have to stop Tomcat, change the property value in the config file,
restart, and so on. However, task properties can rescue us here. All we need to do is go into the
config/modules directory and create a new file called thumbnail.force.cfg. In this file, we put only one
property:
forceupdate=true
Then we add a new task (really just a new name, no new code) in curate.cfg:
org.dspace.ctask.general.ThumbnailTask = thumbnail,
org.dspace.ctask.general.ThumbnailTask = thumbnail.force
Consider what happens: when we perform the task "thumbnail" (using task properties), it reads the config file
thumbnail.cfg and operates in the "non-force" profile (since the value is false), but when we run the task
"thumbnail.force" the curation system first reads thumbnail.cfg, then reads thumbnail.force.cfg,
which overrides the value of the "forceupdate" property. Notice that we did all this via local configuration -
we have not had to touch the source code at all to obtain as many "profiles" as we would like.
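The layering of a profile file over its base file can be sketched as follows. This is a minimal illustration, not the AbstractCurationTask code: the class TaskProfileSketch, the in-memory FILES map (standing in for the files under [dspace]/config/modules) and the rule of taking the base name from the text before the first dot are all assumptions made for the example. The property names and values are those used in the thumbnail example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only; not the DSpace task property resolver.
class TaskProfileSketch {

    // Hypothetical stand-in for the files in [dspace]/config/modules, keyed by file name.
    static final Map<String, String> FILES = Map.of(
        "thumbnail.cfg", "thumbnail.maxheight = 80\nthumbnail.maxwidth = 80\nforceupdate=false\n",
        "thumbnail.force.cfg", "forceupdate=true\n");

    // Loads the base file first, then the profile file, so profile values win.
    static Properties load(String taskName) {
        Properties props = new Properties();
        int dot = taskName.indexOf('.');
        if (dot > 0) {
            read(props, taskName.substring(0, dot) + ".cfg");
        }
        read(props, taskName + ".cfg");
        return props;
    }

    static void read(Properties props, String fileName) {
        String body = FILES.get(fileName);
        if (body == null) {
            return; // a missing file simply contributes nothing
        }
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same name as the method used in the text, but with an explicit task name argument.
    static boolean taskBooleanProperty(String taskName, String key) {
        return Boolean.parseBoolean(load(taskName).getProperty(key, "false"));
    }

    public static void main(String[] args) {
        System.out.println(taskBooleanProperty("thumbnail", "forceupdate"));
        System.out.println(taskBooleanProperty("thumbnail.force", "forceupdate"));
    }
}
```

Because the profile file is read last, it only needs to contain the properties that differ; everything else is inherited from the base file.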
@Distributive
public class MyTask implements CurationTask
A related issue concerns how non-distributive tasks report their status and results: the status will normally
reflect only the last invocation of the task in the container, so important outcomes could be lost. If a task
declares itself @Suspendable, however, the CS will cease processing when it encounters a FAIL status. When
used in the UI, for example, this would mean that if our virus scan is running over a collection, it would stop and
return the status (and result) as soon as it encounters the first infected item. You can even tune @Suspendable
tasks more precisely by annotating which invocations you want to suspend on. For example:
@Suspendable(invoked=Curator.Invoked.INTERACTIVE)
public class MyTask implements CurationTask
would mean that the task would suspend if invoked in the UI, but would run to completion if run on the
command-line.
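The suspend-on-FAIL behavior can be sketched as follows. This is a self-contained illustration, not the curation system's code: the Suspendable annotation defined here is a local stand-in for the DSpace annotation (without its "invoked" attribute), and the Task interface, the two task classes and the run method are hypothetical. The status values 0 (success) and 1 (fail) match the codes listed in ui.statusmessages.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.List;

// Illustrative sketch only; not the DSpace Curator.
class SuspendSketch {

    static final int SUCCESS = 0;
    static final int FAIL = 1;

    // Local stand-in annotation; must be retained at runtime to be visible via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Suspendable {}

    interface Task {
        int perform(String item);
    }

    @Suspendable
    static class StoppingScan implements Task {
        public int perform(String item) {
            return item.contains("infected") ? FAIL : SUCCESS;
        }
    }

    static class ContinuingScan implements Task {
        public int perform(String item) {
            return item.contains("infected") ? FAIL : SUCCESS;
        }
    }

    // Runs the task over every item in a container and returns how many items were visited.
    static int run(Task task, List<String> items) {
        boolean suspendable = task.getClass().isAnnotationPresent(Suspendable.class);
        int visited = 0;
        for (String item : items) {
            visited++;
            int status = task.perform(item);
            if (suspendable && status == FAIL) {
                break; // stop at the first failure for suspendable tasks
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> items = List.of("clean-1", "infected-2", "clean-3");
        System.out.println(run(new StoppingScan(), items));
        System.out.println(run(new ContinuingScan(), items));
    }
}
```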
Only a few annotation types have been defined so far, but as the number of tasks grows, we can look for
common behavior that can be signaled by annotation. For example, there is a @Mutative type, which tells CS that
the task may alter (mutate) the object it is working on.
The procedure to set up curation tasks in Jython is described on a separate page: Curation tasks in
Jython
DSpace 1.8 includes limited (and somewhat experimental) support for deploying and running tasks written in
languages other than Java. Since version 6, Java has provided a standard way (API) to invoke so-called
scripting or dynamic language code that runs on the Java virtual machine (JVM). Scripted tasks are those
written in a language accessible from this API. The exact number of supported languages will vary over time,
and the degree of maturity of each language, or suitability of the language for curation tasks will also vary
significantly. However, preliminary work indicates that Ruby (using the JRuby runtime) and Groovy may prove
viable task languages.
Support for scripted tasks does not include any DSpace pre-installation of the scripting language itself - this
must be done according to the instructions provided by the language maintainers, and typically only requires a
few additional jars on the DSpace classpath. Once one or more languages have been installed into the DSpace
deployment, task support is fairly straightforward. One new property must be defined in
[dspace]/config/modules/curate.cfg:
script.dir = ${dspace.dir}/scripts
This merely defines the directory location (usually relative to the deployment base) where task script files should
be kept. This directory will contain a "catalog" of scripted tasks named task.catalog that contains
information needed to run scripted tasks. Each task has a 'descriptor' property with value syntax:
<engine>|<relFilePath>|<implClassCtor>
An example property for a link checking task written in Ruby might be:
linkchecker = ruby|rubytask.rb|LinkChecker.new
This descriptor means that a "ruby" script engine will be created, a script file named "rubytask.rb" in the
directory <script.dir> will be loaded, and the resolver will expect that evaluating "LinkChecker.new" provides
a correct implementation object. Note that the task must be configured in all other ways just like Java
tasks (in ui.tasknames, ui.taskgroups, etc).
Script files may embed their descriptors to facilitate deployment. To accomplish this, a script must include the
descriptor string with the syntax $td=<descriptor> somewhere on a comment line. For example:
# My descriptor $td=ruby|rubytask.rb|LinkChecker.new
For reasons of portability, the <relFilePath> component may be omitted in this context. Thus,
"$td=ruby||LinkChecker.new" will be expanded to a descriptor with the name of the embedding file.
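The descriptor syntax, including the embedded $td= form with an omitted file path, can be sketched as follows. This is a minimal illustration, not the DSpace task resolver: the class TaskDescriptorSketch and its methods are hypothetical, and it assumes the descriptor runs from "$td=" to the end of the comment line.

```java
// Illustrative sketch only; not the DSpace task resolver.
class TaskDescriptorSketch {

    final String engine;
    final String relFilePath;
    final String implClassCtor;

    TaskDescriptorSketch(String engine, String relFilePath, String implClassCtor) {
        this.engine = engine;
        this.relFilePath = relFilePath;
        this.implClassCtor = implClassCtor;
    }

    // Parses "<engine>|<relFilePath>|<implClassCtor>"; an empty path falls back to the embedding file.
    static TaskDescriptorSketch parse(String descriptor, String embeddingFile) {
        String[] parts = descriptor.split("\\|", -1); // -1 keeps empty components
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three '|'-separated parts: " + descriptor);
        }
        String path = parts[1].isEmpty() ? embeddingFile : parts[1];
        return new TaskDescriptorSketch(parts[0], path, parts[2]);
    }

    // Extracts an embedded descriptor from a script comment line, or returns null if none is present.
    static TaskDescriptorSketch fromCommentLine(String line, String embeddingFile) {
        int at = line.indexOf("$td=");
        if (at < 0) {
            return null;
        }
        return parse(line.substring(at + "$td=".length()).trim(), embeddingFile);
    }

    public static void main(String[] args) {
        TaskDescriptorSketch d =
            fromCommentLine("# My descriptor $td=ruby||LinkChecker.new", "rubytask.rb");
        System.out.println(d.engine + " " + d.relFilePath + " " + d.implClassCtor);
    }
}
```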
Interface
Scripted tasks must implement a slightly different interface than the CurationTask interface used for Java tasks.
The appropriate interface for scripting tasks is ScriptedTask and has the following methods:
The difference is that ScriptedTask has separate perform methods for DSO and identifier. The reason for that is
that some scripting languages (e.g. Ruby) don't support method overloading.
There is a class of use cases in which we want to construct or create new DSOs (DSpaceObjects) from an
identifier in a task. In these cases, there may be no live DSO to pass to the task.
You can actually get the curation system to call performId() by queuing a task and then processing the queue:
when reading the queue, all the CLI has is the handle to pass to the task.
MetadataWebService Task
DSpace item metadata can contain any number of identifiers or other field values that participate in networked
information systems. For example, an item may include a DOI which is a controlled identifier in the DOI registry.
Many web services exist to leverage these values, by using them as 'keys' to retrieve other useful data. In the
DOI case for example, CrossRef provides many services that given a DOI will return author lists, citations, etc.
The MetadataWebService task enables the use of such services, and allows you to obtain and (optionally) add
to DSpace metadata the results of any web service call to any service provider. You simply need to describe
what service you want to call, and what to do with the results. Using the task code, you can create as many
distinct tasks as you have services you want to call. Each description lives in a configuration file in 'config
/modules', and is a simple properties file, like all other DSpace configuration files. The name of the configuration
file is the task name you assign to it in config/modules/curate.cfg. There are a few required properties you must
configure for any service, and for certain services, a few additional ones. An example will illustrate best.
template=http://www.sherpa.ac.uk/romeo/api29.php?issn={dc.identifier.issn}
When the task runs, it will replace '{dc.identifier.issn}' with the value of that field in the item. If the field has
multiple values, the first one will be used. As a web service, the call to the above URL will return an XML
document containing information (including the publisher name) about that ISSN. We need to describe what to
do with this response document, i.e. what elements we want to extract, and what to do with the extracted
content. This description is encoded in a property called the 'datamap'. Using the example service above we
might have:
datamap=//publisher/name=>dc.publisher,//romeocolor
Each separate instruction is separated by a comma, so there are 2 instructions in this map. The first instruction
essentially says: find the XML element 'publisher name' and assign the value or values of this element to the
'dc.publisher' field of the item. The second instruction says: find the XML element 'romeocolor', but do not add it to
the DSpace item metadata - simply add it to the task result string (so that it can be seen by the person running
the task). You can have as many instructions as you like in a datamap, which means that you can retrieve
multiple values from a single web service call. A little more formally, each instruction consists of one to three
parts. The first (mandatory) part identifies the desired data in the response document. The syntax (here
'//publisher/name') is an XPath 1.0 expression, which is the standard language for navigating XML trees. If the
value is to be assigned to the DSpace item metadata, then 2 other parts are needed. The first is the 'mapping
symbol' (here '=>'), which is used to determine how the assignment should be made. There are 3 possible
mapping symbols, shown here with their meanings:
'->' mapping will add to any existing value(s) in the item field
'=>' mapping will replace any existing value(s) in the item field
'~>' mapping will add *only if* item field has no existing value(s)
The third part (here 'dc.publisher') is simply the name of the metadata field to be updated. These two mandatory
properties (template and datamap) are sufficient to describe a large number of web services. All that is required
to enable this task is to edit 'config/modules/curate.cfg' and add 'issn2pubname' to the list of tasks:
plugin.named.org.dspace.curate.CurationTask = \
... other defined tasks
org.dspace.ctask.general.MetadataWebService = issn2pubname, \
... other metadata web service tasks
org.dspace.ctask.general.MetadataWebService = doi2crossref, \
If you wish the task to be available in the Admin UI, see the Invocation from the Admin UI documentation
(above) about how to configure it. The remaining sections describe some more specialized needs using the
MetadataWebService task.
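To make the datamap rules above concrete, here is a minimal sketch of how the instructions could be split and how the three mapping symbols behave. It is plain Python for illustration only and is not the task's actual implementation: the function names, the action labels and the dict-of-lists item model are hypothetical. It also splits naively on commas, so an XPath expression containing a comma would not survive it.

```python
# Mapping symbols as listed above, with the behaviour each one selects.
SYMBOLS = {"->": "add", "=>": "replace", "~>": "add_if_empty"}


def parse_datamap(datamap):
    """Split a datamap into (xpath, action, field) instructions."""
    instructions = []
    for raw in datamap.split(","):
        for symbol, action in SYMBOLS.items():
            if symbol in raw:
                xpath, field = raw.split(symbol, 1)
                instructions.append((xpath.strip(), action, field.strip()))
                break
        else:
            # No mapping symbol: the value only goes to the result string.
            instructions.append((raw.strip(), "report", None))
    return instructions


def apply_instruction(item, action, field, values):
    """Update a dict-of-lists item according to one mapping action."""
    existing = item.setdefault(field, [])
    if action == "replace":
        item[field] = list(values)
    elif action == "add" or (action == "add_if_empty" and not existing):
        existing.extend(values)


item = {"dc.publisher": ["Old Name"]}
for xpath, action, field in parse_datamap("//publisher/name=>dc.publisher,//romeocolor"):
    if field is not None:
        # A real task would evaluate the XPath against the service response;
        # here a fixed value stands in for that result.
        apply_instruction(item, action, field, ["New Publisher"])
print(item)
```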
HTTP Headers
For some web services, protocol and other information is expressed not in the service URL, but in HTTP
headers. Examples might be HTTP basic auth tokens, or requests for a particular media type response. In
these cases, simply add a property to the configuration file (our example was 'issn2pubname.cfg') containing all
headers you wish to transmit to the service:
You can specify any number of headers, just separate them with a 'double-pipe' ('||').
Transformations
One potential problem with the simple parameter substitutions performed by the task is that the service might
expect a different format or expression of a value than the way it is stored in the item metadata. For example, a
DOI service might expect a bare prefix/suffix notation ('10.000/12345'), whereas the DSpace metadata field
might have a URI representation ('http://dx.doi.org/10.000/12345'). In these cases one can declare a
'transformation' of a value in the template. For example:
template=http://www.crossref.org/openurl/?id={doi:dc.relation.isversionof}&format=unixref
The 'doi:' prepended to the metadata field name declares that the value of the 'dc.relation.isversionof' field
should be transformed before the substitution into the template using a transformation named 'doi'. The
transformation is itself defined in the same configuration file as follows:
This would be read as: exclude the value string up to the occurrence of '10.', then truncate any characters after
length 60. You may define as many transformations as you want in any task, although generally 1 or 2 will
suffice. The keywords 'match', 'trunc', etc. are names of 'functions' to be applied (in the order entered). The
currently available functions are:
When the task is run, if the transformation results in an invalid state (e.g. cutting more characters than there are
in the value), the un-transformed value will be used and the condition will be logged. Transformations may also
be applied to values returned from the web service. That is, one can apply the transformation to a value before
assigning it to a metadata field. In this case, the declaration occurs in the datamap property, not the template:
datamap=//publisher/name=>shorten:dc.publisher,//romeocolor
Here the task will apply the 'shorten' transformation (which must be defined in the same config file) before
assigning the value to 'dc.publisher'.
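The template substitution and the 'doi' transformation described in this section can be sketched as follows. This is an illustration in plain Python, not the task's code: the names fill_template and doi_transform are hypothetical, and the transformation is written directly as a Python function rather than parsed from the configuration file.

```python
import re

# "{field}" or "{transform:field}" placeholders, as in the templates above.
PLACEHOLDER = re.compile(r"\{(?:(\w+):)?([\w.]+)\}")


def doi_transform(value):
    """Drop everything before the first '10.' and cap the length at 60."""
    start = value.find("10.")
    if start < 0:
        raise ValueError("no '10.' in value")
    return value[start:][:60]


def fill_template(template, item, transforms):
    """Substitute the first value of each referenced field into the template."""
    def substitute(match):
        name, field = match.groups()
        value = item[field][0]  # multiple values: the first one is used
        if name is not None:
            try:
                value = transforms[name](value)
            except ValueError:
                pass  # invalid state: fall back to the un-transformed value
        return value
    return PLACEHOLDER.sub(substitute, template)


template = "http://www.crossref.org/openurl/?id={doi:dc.relation.isversionof}&format=unixref"
item = {"dc.relation.isversionof": ["http://dx.doi.org/10.000/12345"]}
print(fill_template(template, item, {"doi": doi_transform}))
```

The sketch performs no URL encoding of the substituted value and does not log the fallback case; both are simplifications.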
separator=||
for example, it becomes easy to parse the result string and preserve spaces in the values. This use of the result
string can be very powerful, since you are essentially creating a map of returned values, which can then be
used to populate a user interface, or any other way you wish to exploit the data (drive a workflow, etc).
where the left column is the count of bitstreams of the named format and the letter in parentheses is an
abbreviation of the repository-assigned support level for that format:
U Unsupported
K Known
S Supported
The profiler will operate on any DSpace object. If the object is an item, then only that item's bitstreams are
profiled; if a collection, all the bitstreams of all the items; if a community, all the items of all the collections of the
community.
Required Metadata
The "requiredmetadata" task examines item metadata and determines whether fields that the web
submission (input-forms.xml) marks as required are present. It sets the result string to indicate either that
all required fields are present, or constructs a list of metadata elements that are required but missing. When the
task is performed on an item, it will display the result for that item. When performed on a collection or
community, the task will be performed on each item, and will display the last item result. If all items in the
community or collection have all required fields, that will be the result for the last item in the collection. If the
task fails for any item (i.e. the item does not have all required fields), the process is halted. This way the results
for the 'failed' items are not lost.
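The halting behaviour can be illustrated with a short sketch. It is plain Python for illustration only: the function names, the (handle, metadata) item model and the wording of the result strings are hypothetical, and the list of required fields is passed in directly rather than read from input-forms.xml.

```python
def missing_required(item, required_fields):
    """Return the required fields that have no value in the item."""
    return [field for field in required_fields if not item.get(field)]


def check_items(items, required_fields):
    """Check (handle, metadata) pairs in order, halting at the first failure."""
    result = "no items checked"
    for handle, metadata in items:
        missing = missing_required(metadata, required_fields)
        if missing:
            # Halt here so the failing item's result is the one reported.
            return False, "%s missing required fields: %s" % (handle, ", ".join(missing))
        result = "%s has all required fields" % handle
    return True, result


items = [
    ("123/1", {"dc.title": ["A"], "dc.date.issued": ["2013"]}),
    ("123/2", {"dc.title": ["B"]}),
    ("123/3", {"dc.title": ["C"], "dc.date.issued": ["2014"]}),
]
print(check_items(items, ["dc.title", "dc.date.issued"]))
```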
Virus Scan
The "vscan" task performs a virus scan on the bitstreams of items using the ClamAV software product.
Clam AntiVirus is an open source (GPL) anti-virus toolkit for UNIX. A port for Windows is also available. The
virus scanning curation task interacts with the ClamAV virus scanning service to scan the bitstreams contained
in items, reporting on infection(s). Like other curation tasks, it can be run against a container or item, in the GUI
or from the command line. ClamAV should be installed according to the documentation at http://www.clamav.net. It
should not be installed in the DSpace installation directory. You may install it on the same machine as your
DSpace installation, or on another machine which has been configured properly.
NOTICE: The following directions assume there is a properly installed and configured clamav daemon. Refer to
links above for more information about ClamAV.
The Clam anti-virus database must be updated regularly to maintain the most current level of anti-virus
protection. Please refer to the ClamAV documentation for instructions about maintaining the anti-virus
database.
DSpace Configuration
In [dspace]/config/modules/curate.cfg, activate the task:
Optionally, add the vscan friendly name to the configuration to enable it in the
administrative user interface.
ui.tasknames = \
profileformats = Profile Bitstream Formats, \
requiredmetadata = Check for Required Metadata, \
vscan = Scan for Viruses
service.host = 127.0.0.1
Change if not running on the same host as your DSpace installation.
service.port = 3310
Change if not using standard ClamAV port
socket.timeout = 120
Change if longer timeout needed
scan.failfast = false
Change only if items have large numbers of bitstreams
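For orientation, the sketch below shows how a client could talk to a clamd daemon over TCP using clamd's INSTREAM command (a command name, then length-prefixed chunks, then a zero-length chunk). It is plain Python written for illustration and is not the DSpace task's code. The default host and port mirror the settings above; the unit of socket.timeout is not stated above, so the timeout argument here is simply Python's timeout in seconds, and the chunk size is an arbitrary assumption.

```python
import socket
import struct


def frame_instream(data, chunk_size=2048):
    """Build the byte sequence for clamd's INSTREAM command."""
    out = [b"zINSTREAM\0"]
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is preceded by its length as a 4-byte big-endian integer.
        out.append(struct.pack("!I", len(chunk)) + chunk)
    out.append(struct.pack("!I", 0))  # a zero-length chunk ends the stream
    return b"".join(out)


def scan_bytes(data, host="127.0.0.1", port=3310, timeout=120):
    """Send data to a running clamd and return its one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame_instream(data))
        reply = conn.recv(4096)
    return reply.rstrip(b"\0").decode("ascii", "replace")
```

Calling scan_bytes requires a reachable clamd daemon; frame_instream can be exercised on its own.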
Finally, if desired, virus scanning can be enabled as part of the submission process upload file step. In
[dspace]/config/modules, edit configuration file submission-curation.cfg:
virus-scan = true
Command Line
Container T Report on 1st infected bitstream within an item/Scan all contained Items
Link Checkers
Two link checker tasks, BasicLinkChecker and MetadataValueLinkChecker can be used to check for broken or
unresolvable links appearing in item metadata.
This task is intended as a prototype / example for developers and administrators who are new to the curation
system.
Microsoft Translator
Microsoft Translator uses the Microsoft Translate API to translate metadata values from one source language
into one or more target languages.
This task can be configured to process particular fields, and to use a default language if no authoritative language
for an item can be found. A Bing API v2 key is needed.
MicrosoftTranslator extends the more generic AbstractTranslator. This now seems wasteful, but a
GoogleTranslator had also been written to extend AbstractTranslator. Unfortunately, Google has announced
that it is decommissioning its free Translate API service, so that task hasn't been included in DSpace's general set
of curation tasks.
Translated fields are added in addition to any existing fields, with the target language code in the 'language'
column. This means that running a task multiple times over one item with the same configuration could result in
duplicate metadata.
This task is intended as a prototype / example for developers and administrators who are new to the curation
system.
#---------------------------------------------------------------#
#----------TRANSLATOR CURATION TASK CONFIGURATIONS--------------#
#---------------------------------------------------------------#
# Configuration properties used solely by MicrosoftTranslator #
# Curation Task (uses Microsoft Translation API v2) #
#---------------------------------------------------------------#
## Translation field settings
##
## Authoritative language field
## This will be read to determine the original language an item was submitted in
## Default: dc.language
translate.field.language = dc.language
Note: Installation location doesn't matter, this is not necessary for DSpace. You can safely delete
it after you retrieve jython.jar and Lib.
3. Install Jython to DSpace classpaths (step 2a already did this for you):
The goal is to put jython.jar and the jython Lib/ directory into every DSpace classpath you intend to
use, so it must be installed in both [dspace]/lib and the webapp that deploys to Tomcat (if you want
to run from the UI) - [dspace]/webapps/xmlui/WEB-INF/lib/. You can use symlinks if you wish.
There are no special maven/pom extensions - just copy in the jar and Lib/.
Note: Older versions of Jython mention the need for jython-engine.jar to implement JSR-223. Don't worry
about that; newer Jython versions (e.g. 2.5.3) don't require it.
Notes:
don't put spaces around the pipe character or you'll get an error similar to this one:
ERROR org.dspace.curate.TaskResolver @ Script engine: 'python ' is
not installed
The "script engine name" is whatever name (or alias) Jython registers in the JVM. You can
use both "python" and "jython" as the engine name (tested on Jython 2.5.3).
The logical task name can't conflict with existing (java) task names, but otherwise any
single-word token can be used.
The file name is just the script file name in the script.dir directory
"constructor invocation" is the language specific way to create an object that implements
the task interface - it's ClassName() for Python
c. If you want pretty names in the UI, configure other curate.cfg properties - see " ui.tasknames"
(or groups etc)
5. Write your task.
In the directory configured above, create your task (with the name configured in "task.catalog").
The basic requirement of any scripted task is that it implements the ScriptedTask Java interface.
So for our example, the mytask.py file might look like this:
from org.dspace.curate import ScriptedTask

class MyTask(ScriptedTask):
    def init(self, curator, taskName):
        print "initializing with Jython"
Note: "-r -" means that the script's standard output will be directed to the console. You can read more
details in the "On the command line" chapter of the Curation System page.
See also
Curation System page in the official documentation
Nailgun - for speeding up repeated runs of a dspace command from the command line
4.6 Discovery
What is DSpace Discovery
What is a Sidebar Facet
What is a Search Filter
Discovery Changelist
DSpace 4.0
DSpace 3.0
DSpace 1.8
DSpace 1.7
Enabling Discovery
Configuration files
General Discovery settings (config/modules/discovery.cfg)
Although these techniques are new in DSpace, they might feel familiar from other platforms like Aquabrowser or
Amazon, where facets help you to select the right product according to facets like price and brand. DSpace
Discovery offers very powerful browse and search configurations that were only possible with code
customization in the past.
Since DSpace 4.0 Discovery is the default Search and Browse infrastructure for both XMLUI and
JSPUI.
When you have successfully enabled Discovery in your DSpace, you will notice that the different enabled facets
are visualized in a "Discover" section in your sidebar, by default, right below the Browse options.
In this example, there are 3 Sidebar Facets, Author, Subject and Date Issued. It's important to know that
multiple metadata fields can be included in one facet. For example, the Author facet above includes values from
both dc.contributor.author as well as dc.creator.
Another important property of Sidebar Facets is that their contents are automatically updated to the context of
the page. On collection homepages or community homepages it will include information about the items
included in that particular collection or community.
In a faceted search, a user can modify the list of displayed search results by specifying additional "filters" that
will be applied on the list of search results. In DSpace, a filter is a "contains" condition applied to specific facets. In
the example below, a user started with the search term "health", which yielded 500 results. After applying the
filter "public" on the facet "Subject", only 227 results remain. Each time a user selects a sidebar facet it will be
added as a filter. Active filters can be altered or removed in the 'filters' section of the search interface.
Another example: Using the standard search, a user would search for something like [wetland +
"dc.author=Mitsch, William J" + dc.subject="water quality" ]. With filtered search, they can start by searching
for [wetland ], and then filter the results by the other attributes, author and subject.
DSpace 4.0
Starting from DSpace 4.0, Discovery is the default search and browse solution for DSpace.
General improvements:
Browse interfaces now also use the Discovery index (rather than the legacy Lucene index)
"Did you mean" spell check aid for search
DSpace 3.0
General improvements:
Authority control & variants awareness (homonyms are shown separately in a facet if they have different
authority ID). All variant forms as recognized by the authority framework are indexed. See Authority
Framework
XMLUI-only:
Auto-complete functionality has been removed in XMLUI from search queries due to performance issues.
JSPUI still supports auto-complete functionality without performance issues.
DSpace 1.8
Configuration moved from dspace.cfg into config/modules/discovery.cfg and
config/spring/api/discovery.xml
Individual communities and collections can have their own Discovery configuration.
Tokenization for Auto-complete values (see SearchFilter)
Alphanumeric sorting for sidebar facets
Possibility to avoid indexing specific metadata fields.
Grouping of multiple metadata fields under the same SidebarFacet
DSpace 1.7
Sidebar browse facets that can be configured to use contents from any metadata field
Dynamically generated timespans for dates
Customizable "recent submissions" view on the repository homepage, collection and community pages
Hit highlighting & search snippets
Property: search.server
Example Value: search.server=http://localhost:8080/solr/search
Informational Note: Discovery relies on a Solr index for storage and retrieval of its information. This parameter
determines the location of the Solr index.
Property: index.ignore
Example Value: index.ignore=dc.description.provenance,dc.language
Informational Note: By default, Discovery will include all of the DSpace metadata in its search index. In cases
where specific metadata is confidential, repository managers can exclude those fields from the index by
adding them to this comma-separated list.
Property: index.authority.ignore[.field]
Example Value: index.authority.ignore=true
index.authority.ignore.dc.contributor.author=false
Informational Note: By default, Discovery will use the authority information in the metadata to disambiguate
homonyms. Setting this property to true will make the indexing process behave as if the
metadata doesn't include authority information. The configuration can be set on a per-field
(<schema>.<element>.<qualifier>) basis; the property without a field sets the default value.
Property: index.authority.ignore-prefered[.field]
Example Value: index.authority.ignore-prefered=true
index.authority.ignore-prefered.dc.contributor.author=false
Informational Note: By default, Discovery will use the authority information in the metadata to query the authority
for the preferred label. Setting this property to true will make the indexing process behave as if
the metadata doesn't include authority information (i.e. the preferred form is the one
recorded in the metadata value). The configuration can be set on a per-field
(<schema>.<element>.<qualifier>) basis; the property without a field sets the default value. If the authority
is a remote service, disabling this feature can greatly improve performance.
Property: index.authority.ignore-variants[.field]
Example Value: index.authority.ignore-variants=true
index.authority.ignore-variants.dc.contributor.author=false
Informational Note: By default, Discovery will use the authority information in the metadata to query the authority
for variants. Setting this property to true will make the indexing process behave as if the
metadata doesn't include authority information. The configuration can be set on a per-field
(<schema>.<element>.<qualifier>) basis; the property without a field sets the default value.
If the authority is a remote service, disabling this feature can greatly improve performance.
Structure Summary
This file is in XML format; you should be familiar with XML before editing it. The configurations are
organized together in beans, depending on the purpose these properties are used for.
This purpose can be derived from the class of the beans. Here's a short summary of classes you will encounter
throughout the file and what the corresponding properties in the bean are used for.
Download the configuration file and review it together with the following parameters
Class: DiscoveryConfigurationService
Purpose: Defines the mapping between separate Discovery configurations and individual
collections/communities
Default: All communities, collections and the homepage (key=default) are mapped to defaultConfiguration
Class: DiscoveryConfiguration
Purpose: Groups configurations for sidebar facets, search filters, search sort options and recent
submissions
Class: DiscoverySearchFilter
Purpose: Defines that specific metadata fields should be enabled as a search filter
Default: dc.title, dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued are defined as search
filters
Class: DiscoverySearchFilterFacet
Purpose: Defines which metadata fields should be offered as contextual sidebar browse options. Each of
these facets must also be a search filter
Class: HierarchicalSidebarFacetConfiguration
Purpose: Defines which metadata fields contain hierarchical data and should be offered as a contextual
sidebar option
Class: DiscoverySortConfiguration
Default: dc.title and dc.date.issued are defined as alternatives for sorting, other than Relevance
(hard-coded)
Class: DiscoveryHitHighlightingConfiguration
Purpose: Defines which metadata fields can contain hit highlighting & search snippets
Default: dc.title, dc.contributor.author, dc.subject, dc.description.abstract & full text from text files.
Default settings
In addition to the summarized descriptions of the default values, the following details help you to better understand
these defaults. If you haven't already done so, download the configuration file and review it together with the
following parameters.
The file contains one default configuration that defines the following sidebar facets, search filters, sort fields and
recent submissions display:
Sidebar facets
searchFilterAuthor: groups the metadata fields dc.contributor.author & dc.creator with a facet
limit of 10, sorted by occurrence count
searchFilterSubject: groups all subject metadata fields (dc.subject.*) with a facet limit of 10,
sorted by occurrence count
searchFilterIssued: contains the dc.date.issued metadata field, which is identified with the type
"date" and sorted by specific date values
Search filters
searchFilterTitle: contains the dc.title metadata field
searchFilterAuthor: contains the dc.contributor.author & dc.creator metadata fields
searchFilterSubject: contains the dc.subject.* metadata fields
searchFilterIssued: contains the dc.date.issued metadata field with the type "date"
Sort fields
sortTitle: contains the dc.title metadata field
sortDateIssued: contains the dc.date.issued metadata field, this sort has the type date
configured.
defaultFilterQueries
The default configuration contains no defaultFilterQueries
The default filter queries are disabled by default but there is an example in the default
configuration in comments which allows discovery to only return items (as opposed to also
communities/collections).
Recent Submissions
The recent submissions are sorted by dc.date.accessioned, which is a date, and a maximum
number of 5 recent submissions are displayed.
Hit highlighting
The fields dc.title, dc.contributor.author & dc.subject can contain hit highlighting.
The dc.description.abstract & full text field are used to render search snippets.
Many of the properties contain lists that use references to point to the configuration elements. This way a
certain configuration type can be used in multiple discovery configurations so there is no need to duplicate
them.
The id & class attributes are mandatory for this type of bean. The properties that it contains are discussed
below.
indexFieldName (Required): A unique search filter name, the metadata will be indexed in Solr under this
field name.
metadataFields (Required): A list of the metadata fields that need to be included in the facet.
Sidebar facets extend the search filter and add some extra properties to it, below is an example of a search
filter that is also used as a sidebar facet.
Note that the class has changed from DiscoverySearchFilter to SidebarFacetConfiguration; this is needed to
support the extra properties.
facetLimit (optional): The maximum number of values to be shown. This property is optional, if none is
specified the default value "10" will be used. If the filter has the type date, this property will not be used
since dates are automatically grouped together.
sortOrder (optional): The sort order for the sidebar facets; it can be either COUNT or VALUE. The
default value is COUNT.
COUNT Facets will be sorted by the number of times they appear in the repository
VALUE Facets will be sorted alphabetically
type (optional): The type of the sidebar facet; it can be either "date" or "text". "text" is the default value.
text: The facets will be treated as is
date: Only the year will be stored in the Solr index. These years are automatically displayed in
ranges that get smaller when you select one.
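The effect of facetLimit and sortOrder on a text facet can be sketched as follows. This is plain Python for illustration, not how Discovery or Solr computes facets: the function name facet_values and the alphabetical tie-break under COUNT are assumptions.

```python
from collections import Counter


def facet_values(values, facet_limit=10, sort_order="COUNT"):
    """Return up to facet_limit (value, count) pairs for a sidebar facet."""
    counts = Counter(values)
    if sort_order == "COUNT":
        # Most frequent first; ties broken alphabetically for stable output.
        ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    elif sort_order == "VALUE":
        ordered = sorted(counts.items())
    else:
        raise ValueError("sort_order must be COUNT or VALUE")
    return ordered[:facet_limit]


subjects = ["water", "health", "water", "soil", "health", "water"]
print(facet_values(subjects, facet_limit=2))
print(facet_values(subjects, sort_order="VALUE"))
```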
The id & class attributes are mandatory for this type of bean. The properties that it contains are discussed
below.
DiscoveryConfiguration
The DiscoveryConfiguration groups configurations for sidebar facets, search filters, search sort options and
recent submissions. If you want to show the same sidebar facets, use the same search filters, search options
and recent submissions everywhere in your repository, you will only need one DiscoveryConfiguration and you
might as well just edit the defaultConfiguration.
The DiscoveryConfiguration makes it very easy to use custom sidebar facets, search filters, ... on specific
community or collection homepages. This is particularly useful if your collections are heterogeneous. For
example, in a collection with conference papers, you might want to offer a sidebar facet for conference date,
which might be more relevant than the actual issued date of the proceedings. In a collection with papers, you
might want to offer a facet for funding bodies or publisher, while these fields are irrelevant for items like learning
objects.
After modifying sidebarFacets and searchFilters, don't forget to reindex existing items by running
[dspace]/bin/dspace index-discovery -b, otherwise the changes will not appear.
Below is an example of how one of these lists can be configured. It's important that each of the bean references
corresponds to the exact name of the earlier defined facets, filters or sort options.
Each sidebar facet must also occur in the list of the search filters.
<property name="sidebarFacets">
    <list>
        <ref bean="sidebarFacetAuthor" />
        <ref bean="sidebarFacetSubject" />
        <ref bean="sidebarFacetDateIssued" />
    </list>
</property>
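The rule that every sidebar facet must also be listed as a search filter can be checked mechanically. The following is a minimal sketch in plain Python, not a DSpace tool: the function names are hypothetical, and the sample bean is a simplified fragment without the Spring namespace declarations that a real discovery.xml carries (with namespaces, the element names in the path expression would need to be qualified).

```python
import xml.etree.ElementTree as ET


def bean_refs(config_bean, property_name):
    """Collect the bean names referenced by one list property."""
    xpath = "./property[@name='%s']/list/ref" % property_name
    return [ref.get("bean") for ref in config_bean.findall(xpath)]


def facets_missing_from_filters(config_bean):
    """Return sidebar facet refs that are not also listed as search filters."""
    filters = set(bean_refs(config_bean, "searchFilters"))
    return [name for name in bean_refs(config_bean, "sidebarFacets")
            if name not in filters]


SAMPLE = """
<bean id="exampleConfiguration">
  <property name="sidebarFacets">
    <list>
      <ref bean="searchFilterAuthor" />
      <ref bean="searchFilterSubject" />
    </list>
  </property>
  <property name="searchFilters">
    <list>
      <ref bean="searchFilterTitle" />
      <ref bean="searchFilterAuthor" />
    </list>
  </property>
</bean>
"""

print(facets_missing_from_filters(ET.fromstring(SAMPLE)))
```

In the sample, searchFilterSubject is reported because it is used as a sidebar facet without being listed under searchFilters.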
<property name="searchSortConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoverySortConfiguration">
        <!--<property name="defaultSort" ref="sortDateIssued"/>-->
        <!--DefaultSortOrder can either be desc or asc (desc is default)-->
        <property name="defaultSortOrder" value="desc"/>
        <property name="sortFields">
            <list>
                <ref bean="sortTitle" />
                <ref bean="sortDateIssued" />
            </list>
        </property>
    </bean>
</property>
The property name & the bean class are mandatory. The property field names are discussed below.
defaultSort (optional): The default field on which the search results will be sorted, this must be a
reference to an existing search sort field bean. If none is given relevance will be the default. Sorting
according to the internal relevance algorithm is always available, even though it's not explicitly mentioned
in the sortFields section.
defaultSortOrder (optional): The default sort order can either be asc or desc.
sortFields (mandatory): The list of available sort options, each element in this list must link to an existing
sort field configuration bean.
<property name="defaultFilterQueries">
    <list>
        <value>query1</value>
        <value>query2</value>
    </list>
</property>
This property contains a simple list which in turn contains the queries. Some examples of possible queries:
search.resourcetype:2
dc.subject:test
dc.contributor.author: "Van de Velde, Kevin"
...
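As a rough illustration of how such queries narrow a search, the sketch below builds a Solr query string in which the user's search goes into the q parameter and each default filter query is added as an fq parameter. The q and fq parameters are standard Solr; the assumption that Discovery passes defaultFilterQueries in exactly this way, and the function name solr_query_string, are hypothetical.

```python
from urllib.parse import urlencode


def solr_query_string(user_query, default_filter_queries, user_filters=()):
    """Build a Solr query string with one fq parameter per filter query."""
    params = [("q", user_query)]
    for filter_query in list(default_filter_queries) + list(user_filters):
        params.append(("fq", filter_query))
    return urlencode(params)


print(solr_query_string("health", ["search.resourcetype:2"]))
```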
If the "Anonymous" group has "READ" access on the Item, then anonymous/public users will be able to view
that Item's metadata and locate that Item via DSpace's search/browse system. In addition, search engines will
also be able to index that Item's metadata. However, even with Anonymous READ set at the Item-level, you
may still choose to access-restrict the downloading/viewing of files within the Item. To do so, you would restrict
"READ" access on individual Bitstream(s) attached to the Item.
If the "Anonymous" group does NOT have "READ" access on the Item, then anonymous users will never see
that Item appear within their search/browse results (essentially the Item is "invisible" to them). In addition, that
Item will be invisible to search engines, so it will never be indexed by them. However, any users who have been
given READ access will be able to find/locate the item after logging into DSpace. For example, if a "Staff" group
was provided "READ" access on the Item, then members of that "Staff" group would be able to locate the item
via search/browse after logging into DSpace.
If you prefer to allow all access-restricted or embargoed Items to be findable within your DSpace, you can
choose to turn off Access Rights Awareness. However, please be aware that this means that restricting "READ"
access on an Item will not really do anything – the Item metadata will be available to the public no matter what
group(s) were given READ access on that Item.
The Browse Engine only supports the "Access Rights Awareness" if the Solr/Discovery backend is
enabled (see Defining the Storage of the Browse Data). However, it is enabled by default for DSpace
3.x and above.
When searching in Discovery, all the groups the user belongs to will be added as a filter query, as well as the
user's identifier. If the user is an admin, all items will be returned, since an admin has read rights on everything.
This paragraph only applies to XMLUI. JSPUI relies on the Browse Engine to show "recent
submissions". This requires that the Solr/Discovery backend is enabled (see Defining the Storage of
the Browse Data).
The recent submissions configuration element contains all the configuration settings to display the list of
recently submitted items on the home page or community/collection page. Because the recent submission
configuration is in the discovery configuration block, it is possible to show 10 recently submitted items on the
home page but 5 on the community/collection pages.
<property name="recentSubmissionConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryRecentSubmissionsConfiguration">
        <property name="metadataSortField" value="dc.date.accessioned"/>
        <property name="type" value="date"/>
        <property name="max" value="5"/>
    </bean>
</property>
The property name & the bean class are mandatory. The property field names are discussed below.
metadataSortField (mandatory): The metadata field to sort on to retrieve the recent submissions
max (mandatory): The maximum number of results to be displayed as recent submissions
type (optional): the type of the search filter. It can either be date or text, if none is defined text will be
used.
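The interplay of metadataSortField and max can be sketched in a few lines. This is plain Python for illustration and not how Discovery queries Solr: the function name, the dict-based item model and the assumption that dates are ISO 8601 strings (which sort correctly as text) are all hypothetical.

```python
def recent_submissions(items, metadata_sort_field="dc.date.accessioned", max_items=5):
    """Return the newest items, at most max_items of them."""
    # Items without a value in the sort field cannot be ordered and are skipped.
    dated = [item for item in items if item.get(metadata_sort_field)]
    dated.sort(key=lambda item: item[metadata_sort_field], reverse=True)
    return dated[:max_items]


items = [
    {"handle": "123/1", "dc.date.accessioned": "2013-01-05T10:00:00Z"},
    {"handle": "123/2", "dc.date.accessioned": "2013-03-01T09:00:00Z"},
    {"handle": "123/3"},
]
print([item["handle"] for item in recent_submissions(items)])
```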
This paragraph only applies to XMLUI. JSPUI does not currently support "highlighting & search
snippets".
The hit highlighting configuration element contains all settings necessary to display search snippets & enable hit
highlighting.
Changes made to the configuration will not automatically be displayed in the user interface. By default,
only the following fields are displayed: dc.title, dc.contributor.author, dc.creator, dc.contributor,
dc.date.issued, dc.publisher, dc.description.abstract and fulltext.
<property name="hitHighlightingConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
        <property name="metadataFields">
            <list>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.title"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.contributor.author"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.subject"/>
                    <property name="snippets" value="5"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="dc.description.abstract"/>
                    <property name="maxSize" value="250"/>
                    <property name="snippets" value="2"/>
                </bean>
                <bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
                    <property name="field" value="fulltext"/>
                    <property name="maxSize" value="250"/>
                    <property name="snippets" value="2"/>
                </bean>
            </list>
        </property>
    </bean>
</property>
The property name & the bean class are mandatory. The property field names are:
field (mandatory): The metadata field to be highlighted (can also be * if all the metadata fields should be
highlighted).
maxSize (optional): Limit the number of characters displayed to only the relevant part (use metadata
field as search snippet).
snippets (optional): The maximum number of snippets that can be found in one metadata field.
The rendering of search results is no longer handled by the METS format but uses a special type of list named
"TYPE_DSO_LIST". Each metadata field (& fulltext if configured) is added to the DRI, and if the field contains
hit highlighting, the Java code will split up the string & add DRI highlights to the list. The XSL for the themes
also contains special rendering XSL for the DRI; for Mirage, the changes are located in the discovery.xsl file.
For themes based on the old structural.xsl, look for the template matching "dri:list[@type='dsolist']".
This paragraph only applies to XMLUI. The JSPUI does not currently support the "More like this"
feature.
The "more like this"-configuration element contains all the settings for displaying related items on an item
display page.
Below is an example of the "more like this" configuration.
<property name="moreLikeThisConfiguration">
<bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
<property name="similarityMetadataFields">
<list>
<value>dc.title</value>
<value>dc.contributor.author</value>
<value>dc.creator</value>
<value>dc.subject</value>
</list>
</property>
<!--The minimum number of matching terms across the metadata fields above before an item
is found as related -->
<property name="minTermFrequency" value="5"/>
<!--The maximum number of related items displayed-->
<property name="max" value="3"/>
<!--The minimum word length below which words will be ignored-->
<property name="minWordLength" value="5"/>
</bean>
</property>
The property name & the bean class are mandatory. The property field names are discussed below.
The feature currently needs only one line of configuration in discovery.xml. Changing the value from true to
false will disable the feature.
http://wiki.apache.org/solr/SpellCheckComponent
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
-c clean existing index removing any documents that no longer exist in the db
-r <item handle> remove an Item, Collection or Community from index based on its handle
[dspace]/bin/dspace index-discovery -o
solr
    search
        conf
            admin-extra.html
            elevate.xml
            protwords.txt
            schema.xml
            scripts.conf
            solrconfig.xml
            spellings.txt
            stopwords.txt
            synonyms.txt
            xslt
                DRI.xsl
                example.xsl
                example_atom.xsl
                example_rss.xsl
                luke.xsl
        conf2
    solr.xml
    statistics
        conf
            admin-extra.html
            elevate.xml
            protwords.txt
            schema.xml
            scripts.conf
            solrconfig.xml
            spellings.txt
            stopwords.txt
            synonyms.txt
            xslt
                example.xsl
                example_atom.xsl
                example_rss.xsl
                luke.xsl
DOIs are Persistent Identifiers like Handles are, but as many big publishing companies use DOIs they are quite
well-known to scientists. Some journals ask for DOIs to link supplemental material whenever an article is
submitted. Beginning with DSpace 4.0 it is possible to use DOIs in parallel to the Handle System within
DSpace. By "using DOIs" we mean automatic generation, reservation and registration of DOIs for every item
that enters the repository. These newly registered DOIs will not be used as a means to build URLs to DSpace
items. Items will still rely on Handle assignment for their URLs.
DataCite is an international initiative to promote science and research, and a member of the International DOI
Foundation. The members of DataCite act as registration agencies for DOIs. Some DataCite members provide
their own APIs to reserve and register DOIs; others let their clients use the DataCite API directly. Starting with
version 4.0 DSpace supports the administration of DOIs by using the DataCite API directly or by using the API
from EZID (which is a service of the University of California Digital Library). This means you can administer
DOIs with DSpace if your registration agency allows you to use the DataCite API directly or if your registration
agency is EZID.
To use DOIs within DSpace you have to configure several parts of DSpace:
enter your DOI prefix and the credentials to use the API from DataCite in dspace.cfg,
configure the script which generates some metadata,
dspace.cfg
After you enter into a contract with a DOI registration agency, they'll provide you with user credentials and a
DOI prefix. You have to enter these in dspace.cfg. Here is a list of DOI configuration options in dspace.cfg:
Configuration File: [dspace]/config/dspace.cfg

Property: identifier.doi.user
Example Value: identifier.doi.user = user123
Informational Note: Username to log in to the API of the DOI registration agency. You'll get it from your DOI
registration agency.

Property: identifier.doi.password
Example Value: identifier.doi.password = top-secret
Informational Note: Password to log in to the API of the DOI registration agency. You'll get it from your DOI
registration agency.

Property: identifier.doi.prefix
Example Value: identifier.doi.prefix = 10.5072
Informational Note: The prefix you got from the DOI registration agency. All your DOIs start with this prefix,
followed by a slash and a suffix generated by DSpace. The prefix can be compared with a namespace within the
DOI system.

Property: identifier.doi.namespaceseparator
Example Value: identifier.doi.namespaceseparator = dspace-
Informational Note: This property is optional. If you want to use the same DOI prefix in several DSpace
installations or with other tools that generate and register DOIs, it is necessary to use a namespace
separator. All the DOIs that DSpace generates will start with the DOI prefix, followed by a slash, the
namespace separator and some number generated by DSpace. For example, if your prefix is 10.5072 and you want
all DOIs generated by DSpace to look like 10.5072/dspace-1023, you have to set this as in the example value
above.
Please don't use the test prefix 10.5072 with DSpace. The test prefix 10.5072 differs from other
prefixes: it answers GET requests for all DOIs, even for DOIs that are unregistered. DSpace checks
that it mints only unused DOIs, so it will report an error: "Register DOI ... failed:
DOI_ALREADY_EXISTS". Your registration agency can provide you with an individual test prefix that you
can use for tests.
Metadata conversion
To reserve or register a DOI, DataCite requires that metadata be supplied which describe the object that the
DOI addresses. The file [dspace]/config/crosswalks/DIM2DataCite.xsl controls the conversion of metadata from
the DSpace internal format into the DataCite format. You have to add the name of your institution to this file:
[dspace]/config/crosswalks/DIM2DataCite.xsl
<!--
Document : DIM2DataCite.xsl
Created on : January 23, 2013, 1:26 PM
Author : pbecker, ffuerste
Description: Converts metadata from DSpace Intermediat Format (DIM) into
metadata following the DataCite Schema for the Publication and
Citation of Research Data, Version 2.2
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dspace="http://www.dspace.org/xmlns/dspace/dim"
xmlns="http://datacite.org/schema/kernel-2.2"
version="1.0">
<!-- The content of the following variable will be used as element publisher. -->
<xsl:variable name="publisher">My University</xsl:variable>
<!-- The content of the following variable will be used as element contributor with
contributorType datamanager. -->
<xsl:variable name="datamanager"><xsl:value-of select="$publisher" /></xsl:variable>
<!-- The content of the following variable will be used as element contributor with
contributorType hostingInstitution. -->
<xsl:variable name="hostinginstitution"><xsl:value-of select="$publisher" /></xsl:variable>
<!-- Please take a look into the DataCite schema documentation if you want to know how to use
these elements.
http://schema.datacite.org -->
<!-- DO NOT CHANGE ANYTHING BELOW THIS LINE EXCEPT YOU REALLY KNOW WHAT YOU ARE DOING! -->
...
If you want to know more about the DataCite Schema, have a look at the documentation. If you change this file
in a way that is not compatible with the DataCite schema, you won't be able to reserve and register DOIs
anymore. Do not change anything if you're not sure what you're doing.
Identifier Service
The Identifier Service manages the generation, reservation and registration of identifiers within DSpace. You
can configure it using the config file located in [dspace]/config/spring/api/identifier-service.xml. In the file you
should already find the code to configure DSpace to register DOIs. Just read the comments and remove the
comment signs around the two appropriate beans.
After removing the comment signs, the file should look something like this (the comments have been removed to
make the listing shorter):
[dspace]/config/spring/api/identifier-service.xml
<!--
    Copyright (c) 2002-2010, DuraSpace. All rights reserved
    Licensed under the DuraSpace License.
-->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
    <bean id="org.dspace.identifier.IdentifierService"
          class="org.dspace.identifier.IdentifierServiceImpl"
          autowire="byType"
          scope="singleton"/>
    <bean id="org.dspace.identifier.DOIIdentifierProvider"
          class="org.dspace.identifier.DOIIdentifierProvider"
          scope="singleton">
        <property name="configurationService"
                  ref="org.dspace.services.ConfigurationService" />
        <property name="DOIConnector"
                  ref="org.dspace.identifier.doi.DOIConnector" />
    </bean>
    <bean id="org.dspace.identifier.doi.DOIConnector"
          class="org.dspace.identifier.doi.DataCiteConnector"
          scope="singleton">
        <property name='DATACITE_SCHEME' value='https'/>
        <property name='DATACITE_HOST' value='mds.datacite.org'/>
        <property name='DATACITE_DOI_PATH' value='/doi/' />
        <property name='DATACITE_METADATA_PATH' value='/metadata/' />
        <property name='disseminationCrosswalkName' value="DataCite" />
    </bean>
</beans>
If you use other IdentifierProviders besides the DOIIdentifierProvider, there will be more beans in this file.
Please take care to configure the property DATACITE_HOST. By default it is set to the DataCite test
server. To reserve real DOIs you will probably have to change it to mds.datacite.org. Ask your registration
agency if you're not sure about the correct address. Unfortunately the test and the production server have
different paths to the API. For the test server you have to set the DATACITE_DOI_PATH to "/mds/doi/" and the
DATACITE_METADATA_PATH to "/mds/metadata/"; for the production server you have to remove the leading /mds
from both properties.
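To see how the connector properties fit together, the following sketch joins them into request URLs. It assumes that scheme, host, path and DOI are simply concatenated; this is an illustration in Python, not the DataCiteConnector code. The function name datacite_url is hypothetical, and the test server host is a placeholder, since the real address should come from your registration agency.

```python
def datacite_url(scheme: str, host: str, path: str, doi: str = "") -> str:
    """Join the connector settings into a request URL (assumed composition)."""
    return f"{scheme}://{host}{path}{doi}"


# Values taken from the production listing above.
production_doi_url = datacite_url(
    "https", "mds.datacite.org", "/doi/", "10.5072/dspace-1023")
production_metadata_url = datacite_url(
    "https", "mds.datacite.org", "/metadata/", "10.5072/dspace-1023")

# Placeholder host: NOT the real address of the test server.
test_doi_url = datacite_url(
    "https", "test-server.example.org", "/mds/doi/", "10.5072/dspace-1023")
```

Note that every path starts and ends with a slash, so the DOI can be appended directly.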
DSpace should send updates to DataCite whenever the metadata of an item changes. To do so you have to
change the dspace.cfg again. You should remove the comments in front of the two following properties or add
them to the dspace.cfg:
[dspace]/config/dspace.cfg
event.consumer.doi.class = org.dspace.identifier.doi.DOIConsumer
event.consumer.doi.filters = Item+Modify_Metadata
Then you should add 'doi' to the property event.dispatcher.default.consumers. After adding it, this
property may look like this:
[dspace]/config/dspace.cfg
The command line interface in general is documented here: Command Line Operations. The command used for
DOIs is 'doi-organiser'. You can use the following options:
-d, --delete-all: Transmit information to the DOI registration agency about all DOIs that were deleted.
--delete-doi DOI: Transmit information to the DOI registration agency that the specified DOI was deleted. The
DOI must already be marked for deletion; you cannot use this command to delete a DOI for an existing item.
-l, --list: List all DOIs whose changes have not yet been committed to the registration agency.
-q, --quiet: The doi-organiser sends error reports to the mail address configured in the property
alert.recipient in dspace.cfg. If you use this option, no output should be given to stdout. If you do not use
this option, the doi-organiser writes information about successful and unsuccessful operations to stdout and
stderr. You can, of course, also find this information in dspace.log.
--register-doi DOI | ItemID | handle: If a DOI is marked for registration, you can trigger the registration at
the DOI registration agency with this command. Specify either the DOI, the ID of the item, or its handle.
-s, --reserve-all: Transmit to the DOI registration agency information about all DOIs that should be reserved.
--reserve-doi DOI | ItemID | handle: If a DOI is marked for reservation, you can trigger the reservation at the
DOI registration agency with this command. Specify either the DOI, the ID of the item, or its handle.
-u, --update-all: If a DOI is reserved for an item, the metadata of the item will be sent to DataCite. This
command transmits new metadata for items whose metadata were changed since the DOI was reserved.
--update-doi DOI | ItemID | handle: If a DOI needs an update of the metadata of the item it belongs to, you can
trigger this update with this command. Specify either the DOI, the ID of the item, or its handle.
Currently you cannot generate new DOIs with this tool. You can only send information about changes in your
local DSpace database to the registration agency.
Update the metadata of all items that have changed since their DOI was reserved.
Reserve all DOIs marked for reservation
Register all DOIs marked for registration
Delete all DOIs marked for deletion
In DSpace, a DOI can have the state "registered", "reserved", "to be reserved", "to be registered", "needs
update", "to be deleted", or "deleted". After updating an item's metadata the state of its assigned DOI is set
back to the last state it had before. So, e.g., if a DOI has the state "to be registered" and the metadata of its
item changes, it will be set to the state "needs update". After the update is performed its state is set to "to be
registered" again. Because of this behavior the order of the commands above matters: the update command
must be executed before all of the other commands above.
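The state handling described above can be modelled in a few lines. This is a minimal illustrative sketch in Python, not DSpace's implementation: the class DoiRecord and its methods are hypothetical, and only the state names are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoiRecord:
    """Hypothetical model of a DOI and its state; not a DSpace class."""
    doi: str
    state: str
    previous_state: Optional[str] = None

    def mark_metadata_changed(self) -> None:
        # Remember the state the DOI had before the item's metadata changed.
        if self.state != "needs update":
            self.previous_state = self.state
            self.state = "needs update"

    def perform_update(self) -> None:
        # After the update has been sent, return to the remembered state.
        if self.state == "needs update" and self.previous_state is not None:
            self.state = self.previous_state
            self.previous_state = None


record = DoiRecord(doi="10.5072/dspace-1023", state="to be registered")
record.mark_metadata_changed()
state_after_change = record.state
record.perform_update()
state_after_update = record.state
```

In this model, a DOI whose item changed is in the state "needs update" and no longer in "to be registered", so a registration run would not pick it up until the update has been performed. That is one way to read why the update command has to come first.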
The cron job should perform the following commands with the rights of the user your DSpace installation runs
as:
[dspace]/bin/dspace doi-organiser -u -q
[dspace]/bin/dspace doi-organiser -s -q
[dspace]/bin/dspace doi-organiser -r -q
[dspace]/bin/dspace doi-organiser -d -q
The doi-organiser sends error messages as email and logs some additional information. The option -q tells
DSpace to be quiet. If you don't use this option the doi-organiser will print messages to stdout about every DOI
it successfully reserved, registered, updated or deleted. Using a cron job these messages would be sent as
email.
In case of an error, consult the log messages. If there is an outage of the API of your registration agency,
DSpace will not change the state of the DOIs so that it will do everything necessary when the cron job starts the
next time and the API is reachable again.
The frequency the cron job runs depends on your needs and your hardware. The more often you run the cron
job the faster your new DOIs will be available online. If you have a lot of submissions and want the DOIs to be
available really quickly, you probably should run the cron job every fifteen minutes. If there are just one or two
submissions per day, it should be enough to run the cron job twice a day.
To set up the cron job, you just need to run the following command as the dspace UNIX user:
crontab -e
The following line tells cron to run the necessary commands twice a day, at 1am and 1pm. Please note that
the line starting with the numbers is one line, even if it is shown as multiple lines in your browser.
# Send information about new and changed DOIs to the DOI registration agency:
0 1,13 * * * [dspace]/bin/dspace doi-organiser -u -q ; [dspace]/bin/dspace doi-organiser -s -q ; [dspace]/bin/dspace doi-organiser -r -q ; [dspace]/bin/dspace doi-organiser -d -q
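If you prefer to call a single wrapper from cron instead of chaining four commands, the sequence can be sketched as below. This is an assumption-laden example, not a script shipped with DSpace: the installation path /dspace and the function names are hypothetical, and only the doi-organiser options and their order come from the text above. Like the cron line, it runs all four commands even if one of them fails.

```python
import subprocess
from typing import Callable, List, Sequence

# Order matters: update first, then reserve, register and delete.
DOI_ORGANISER_OPTIONS = ["-u", "-s", "-r", "-d"]


def build_commands(dspace_dir: str) -> List[List[str]]:
    """Return the doi-organiser invocations in the order described above."""
    launcher = f"{dspace_dir}/bin/dspace"
    return [[launcher, "doi-organiser", option, "-q"]
            for option in DOI_ORGANISER_OPTIONS]


def run_all(dspace_dir: str,
            run: Callable[[Sequence[str]], int] = subprocess.call) -> List[int]:
    """Run every command in order and return the exit codes."""
    return [run(command) for command in build_commands(dspace_dir)]


# Dry run with a fake runner that only records what would be executed.
recorded: List[List[str]] = []


def fake_run(command: Sequence[str]) -> int:
    recorded.append(list(command))
    return 0


exit_codes = run_all("/dspace", run=fake_run)
```

Calling run_all("/dspace") without the run argument would execute the real commands through subprocess.call.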
Every DSpace installation expects to be the only application that generates DOIs which start
with the prefix and the namespace separator you configured. DSpace does not check whether
a DOI it generates is reserved or registered already.
That means if you want to use other applications or even more than one DSpace installation to register DOIs
with the same prefix, you'll have to use a unique namespace separator for each of them. Also you should not
generate DOIs manually with the same prefix and namespace separator you configured within DSpace. For
example, if your prefix is 10.5072 you can configure one DSpace installation to generate DOIs starting with
10.5072/papers-, a second installation to generate DOIs starting with 10.5072/data- and another application to
generate DOIs starting with 10.5072/results-.
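The naming scheme above can be illustrated with a small sketch. It assumes that a DOI is the prefix, a slash, the namespace separator and a locally generated number, as described for identifier.doi.namespaceseparator; the function names build_doi and owning_namespace are hypothetical and not part of DSpace.

```python
from typing import List


def build_doi(prefix: str, namespace_separator: str, local_id: int) -> str:
    """Compose a DOI as prefix + "/" + namespace separator + local number."""
    return f"{prefix}/{namespace_separator}{local_id}"


def owning_namespace(doi: str, prefix: str, separators: List[str]) -> str:
    """Return the namespace separator a DOI belongs to, or "" if none matches."""
    for separator in separators:
        if doi.startswith(f"{prefix}/{separator}"):
            return separator
    return ""


separators = ["papers-", "data-", "results-"]
example = build_doi("10.5072", "dspace-", 1023)
owner = owning_namespace("10.5072/data-42", "10.5072", separators)
```

As long as no separator is the beginning of another one, every DOI can be traced back to exactly one application, which is what keeps the installations from minting the same DOI twice.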
DOIs will be used in addition to Handles. This implementation does not replace Handles with DOIs in DSpace.
That means that DSpace will still generate Handles for every item, every collection and every community, and
will use those Handles as part of the URL of items, collections and communities.
DSpace currently generates DOIs for items only. There is no support to generate DOIs for Communities and
collections yet.
When using DSpace's support for the DataCite API, probably not all information will be restored when using
the AIP Backup and Restore (see DS-1836, "doi_seq in update-sequences.sql missing").
The DOIs included in the metadata of Items will be restored, but DSpace won't update the metadata of those
items at DataCite anymore. You may even run into problems when minting new DOIs after you have restored older
ones using AIP.
In config/dspace.cfg you will find a small block of settings whose names begin with identifier.doi.ezid.
You should uncomment these properties and give them appropriate values. Sample values for a test
account are supplied.
name: meaning
identifier.doi.ezid.shoulder: The "shoulder" is the DOI prefix issued to you by the EZID service. DOIs minted
by this instance of DSpace will be the concatenation of the "shoulder" and a locally unique token.
identifier.doi.ezid.password:
identifier.doi.ezid.publisher: You may specify a default value for the required datacite.publisher metadatum,
for use when the Item has no publisher.
metadata fields to EZID fields, and can be extended or changed. The key of each entry is the name of an
EZID metadata field; the value is the name of the corresponding DSpace field, from which the EZID metadata
will be populated.
You can also supply transformations to be applied to field values using the crosswalkTransform property.
Each key is the name of an EZID metadata field, and its value is the name of a Java class which will convert
the value of the corresponding DSpace field to its EZID form. The only transformation currently provided is one
which converts a date to the year of that date, named org.dspace.identifier.ezid.DateToYear. In the
configuration as delivered, it is used to convert the date of issue to the year of publication. You may create new
Java classes with which to supply other transformations, and map them to metadata fields here. If an EZID
metadatum is not named in this map, the default mapping is applied: the string value of the DSpace field is
copied verbatim.
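The interplay of the crosswalk and the transformations can be sketched as follows. This Python model only illustrates the behavior described above and is not the Java implementation: the function names are hypothetical, the two field mappings are example entries, and the handling of values that do not start with a four-digit year is an assumption of this sketch.

```python
import re
from typing import Callable, Dict


def date_to_year(value: str) -> str:
    """Reduce a date such as 2013-01-23 to its year."""
    match = re.match(r"\d{4}", value)
    # Assumption: values without a leading four-digit year are left unchanged.
    return match.group(0) if match else value


def to_ezid_metadata(dspace_values: Dict[str, str],
                     crosswalk: Dict[str, str],
                     transforms: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Map DSpace field values to EZID fields, applying transforms where set."""
    ezid = {}
    for ezid_field, dspace_field in crosswalk.items():
        if dspace_field not in dspace_values:
            continue
        # Default mapping: the value is copied verbatim.
        transform = transforms.get(ezid_field, lambda value: value)
        ezid[ezid_field] = transform(dspace_values[dspace_field])
    return ezid


# Example entries only: key = EZID field, value = DSpace field.
crosswalk = {
    "datacite.title": "dc.title",
    "datacite.publicationyear": "dc.date.issued",
}
transforms = {"datacite.publicationyear": date_to_year}

result = to_ezid_metadata(
    {"dc.title": "A Study", "dc.date.issued": "2013-01-23"},
    crosswalk,
    transforms,
)
```

Both maps are keyed by the EZID field name, which mirrors the description above: a field that is absent from the transform map simply falls through to the verbatim copy.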
Normally, you should not change the values of the EZID_SCHEME and EZID_HOST properties of the
EZIDRequestFactory bean.
After the introduction of the SOLR Statistics logging in DSpace 1.6, every pageview and file download is logged
in a dedicated SOLR statistics core.
In addition to the already existing logging of pageviews and downloads, DSpace 3.0 now also logs search
queries users enter in the DSpace search dialog and workflow events.
Due to the very recent addition of Discovery for search & faceted browsing in JSPUI, these search
queries are not yet logged. Regular (non-discovery) search queries are being logged in JSP UI.
Only workflow events, initiated and executed by a physical user are being logged. Automated workflow
steps or ingest procedures are currently not being logged by the workflow events logger.
The logging happens on the server side and, unlike Google Analytics, doesn't require JavaScript to provide
usage data. The fields to be stored are defined in the file dspace/solr/statistics/conf/schema.xml.
Although they are stored in the same index, the stored fields for views, search queries and workflow events are
different. A new field, statistics_type determines which kind of a usage event you are dealing with. The three
possible values for this field are view, search and workflow.
The combination of type and id determines which resource (either community, collection, item page or file
download) has been requested.
If you are not seeing these links or buttons, it's likely that they are only enabled for administrators in your
installation. Change the configuration parameter "authorization.admin.usage" in usage-statistics.cfg to false in
order to make statistics visible for all repository visitors.
Home page
Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire
repository.
If you are not seeing the link labelled "search statistics", it is likely that they are only enabled for administrators
in your installation. Change the configuration parameter "authorization.admin.search" in usage-statistics.cfg to
false in order to make statistics visible for all repository visitors.
The dropdown on top of the page allows you to modify the time frame for the displayed statistics.
The Pageviews/Search column tracks the number of pages visited after a particular search term. Therefore, a
zero in this column means that after executing a search for a specific keyword, not a single user has clicked a
single result in the list.
If you are using Discovery, note that clicking the facets also counts as a search, because clicking a facet sends
a search query to the Discovery index.
If you are not seeing the link labelled "Workflow statistics", it is likely that they are only enabled for
administrators in your installation. Change the configuration parameter "authorization.admin.workflow" in
usage-statistics.cfg to false in order to make statistics visible for all repository visitors.
The dropdown on top of the page allows you to modify the time frame for the displayed statistics.
4.8.3 Architecture
The DSpace Statistics Implementation is a client/server architecture based on Solr for collecting usage events
in the JSPUI and XMLUI user interface applications of DSpace. Solr runs as a separate web application, and an
instance of Apache HttpClient is used to allow parallel requests to log statistics events into this Solr instance.
Property: server
Informational Note: Is used by the SolrLogger Client class to connect to the Solr server over HTTP and perform
updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1).
To determine the correct path, you can use a tool like wget to see where Solr is responding
on your server. For example, you'd want to send a query to Solr like the following:
wget "http://127.0.0.1/solr/statistics/select?q=*:*"
Assuming you get an HTTP 200 OK response, then you should set solr.log.server to
the '/statistics' URL of 'http://127.0.0.1/solr/statistics' (essentially removing the "/select?q=*:*"
query off the end of the responding URL).

Property: query.filter.bundles
Example Value: query.filter.bundles=ORIGINAL
Informational Note: A comma-separated list that contains the bundles for which the file statistics will be
displayed.

Property: solr.statistics.query.filter.spiderIp
Example Value: solr.statistics.query.filter.spiderIp = false
Informational Note: If true, statistics queries will filter out spider IPs. Use with caution, as this often
results in extremely long query strings.

Property: solr.statistics.query.filter.isBot
Informational Note: If true, statistics queries will filter out events flagged with the "isBot" field. This is
the recommended method of filtering spiders from statistics.

Property: spiderips.urls
Example Value:
spiderips.urls = http://iplists.com/google.txt, \
                 http://iplists.com/inktomi.txt, \
                 http://iplists.com/lycos.txt, \
                 http://iplists.com/infoseek.txt, \
                 http://iplists.com/altavista.txt, \
                 http://iplists.com/excite.txt, \
                 http://iplists.com/misc.txt, \
                 http://iplists.com/non_engines.txt
Informational Note: List of URLs to download spiders files into [dspace]/config/spiders. These files contain
lists of known spider IPs and are utilized by the SolrLogger to flag usage events with an "isBot" field,
or ignore them entirely.
The "stats-util" command can be used to force an update of spider files, regenerate "isBot"
fields on indexed events, and delete spiders from the index. For usage, run:
dspace stats-util -h
In the {dspace.dir}/config/modules/usage-statistics.cfg file review the following fields to make sure they are
uncommented:
Property: dbfile
Informational Note: This refers to the GeoLiteCity database file utilized by the LocationUtils to calculate
the location of client requests based on IP address. During the Ant build process (both fresh_install and
update) this file will be downloaded from http://www.maxmind.com/app/geolitecity if a new version has been
published or it is absent from your [dspace]/config directory.

Property: resolver.timeout
Informational Note: Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high
may result in Solr exhausting your connection pool.

Property: useProxies
Informational Note: Will cause Statistics logging to look for the X-Forwarded-For header to detect the IP
address of clients that have accessed DSpace through a proxy service (e.g. the Apache mod_proxy).
[Note: This setting is found in the DSpace Logging section of dspace.cfg]

Property: authorization.admin.usage
Informational Note: When set to true, only general administrators, collection and community administrators are
able to access the pageview and download statistics from the web user interface. As a result, the links to
access statistics are hidden for users who are not logged in as administrators. Setting this property to
"false" will display the links to access statistics to anyone, making them publicly available.

Property: authorization.admin.search
Example Value: authorization.admin.search = true
Informational Note: When set to true, only system, collection or community administrators are able to access
statistics on search queries.

Property: authorization.admin.workflow
Informational Note: When set to true, only system, collection or community administrators are able to access
statistics on workflow events.

Property: logBots
Informational Note: When this property is set to false and an IP is detected as a spider, the event is not
logged. When this property is set to true, the event will be logged with the "isBot" field set to true
(see solr.statistics.query.filter.* for query filter options).
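The effect of logBots can be summarised in a small decision function. This is a simplified Python sketch, not the SolrLogger code: the function name event_to_log is hypothetical, spider detection is reduced to a lookup in a set of exact IP addresses, and the stored event is reduced to two fields.

```python
from typing import Optional, Set


def event_to_log(ip: str, spider_ips: Set[str], log_bots: bool) -> Optional[dict]:
    """Return the event to store, or None when the event should be dropped."""
    is_bot = ip in spider_ips
    if is_bot and not log_bots:
        # logBots = false: events from detected spiders are not logged at all.
        return None
    # Otherwise the event is logged and flagged with the isBot field.
    return {"ip": ip, "isBot": is_bot}


spider_ips = {"192.0.2.10"}  # documentation-range address used as an example

dropped = event_to_log("192.0.2.10", spider_ips, log_bots=False)
flagged = event_to_log("192.0.2.10", spider_ips, log_bots=True)
regular = event_to_log("198.51.100.7", spider_ips, log_bots=False)
```

Keeping flagged events (logBots set to true) leaves the choice to query time, where solr.statistics.query.filter.isBot can exclude them.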
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = false
These fields are not used by the new 1.6 Statistics, but are only related to the Statistics from previous DSpace
releases
cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
cp -R [dspace]/webapps/* [TOMCAT]/webapps
The last step is only used if you do not follow the recommended practice of configuring [dspace]/webapps as
location for webapps in your servlet container (Tomcat, Resin or Jetty). If you only need to build the statistics,
and don't make any changes to other web applications, you can replace the copy step above with:
cp -R [dspace]/webapps/solr [TOMCAT]/webapps
Again, only if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the
recommended practice)
Applying this change will involve dumping all the old file statistics into a file and re uploading these.
Therefore it is wise to create a backup of the {dspace.dir}/solr/statistics/data directory. It is best to
create this backup when the Tomcat/Jetty/Resin server program isn't running.
When a backup has been made start the Tomcat/Jetty/Resin server program.
The update script has one optional argument (-r). If given, the script will not only update the broken file
statistics but also delete file statistics for files that were removed from the system (if this option isn't
active, these statistics will receive the "BITSTREAM_DELETED" bundle name).
#The -r is optional
[dspace]/bin/dspace stats-util -b -r
{dspace.dir}/bin/stats-util -o
More information on how these Solr server optimizations work can be found here:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations.
SOLR Autocommit
In DSpace 1.6.x, each solr event was committed to the solr server individually. For high load DSpace
installations, this would result in a huge load of small solr commits resulting in a very high load on the solr
server.
This has been resolved in DSpace 1.7 by only committing usage events to the Solr server every 15 minutes. As a
result, the storage of a usage event can be delayed by up to 15 minutes. If required, this value can be
altered by changing the maxTime property in
{dspace.dir}/solr/statistics/conf/solrconfig.xml
https://github.com/DSpace/DSpace/blob/dspace-3_x/dspace-xmlui/src/main/java/org/dspace/app/xmlui/aspect/statistics/StatisticsTransformer.java#L205
-6 is the default setting, displaying the past 6 months of statistics. When this is reduced to a smaller
natural number, fewer months are displayed.
Resources
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
http://my.safaribooksonline.com/9781847195883/Cover
Examples
http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
Explained:
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="epersonid">
<int name="66">1167</int>
<int name="117">251</int>
<int name="52">42</int>
<int name="19">36</int>
<int name="88">20</int>
<int name="112">18</int>
<int name="110">9</int>
<int name="96">0</int>
</lst>
</lst>
</lst>
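The facet section of the response lists, for each value of the faceted field (here epersonid), how many matching events were found. A short Python sketch shows how such a fragment could be read; the function name facet_counts is hypothetical, and the XML is the fragment shown above rather than a live Solr response.

```python
import xml.etree.ElementTree as ET

# The facet fragment from the example response above.
FACET_XML = """
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="epersonid">
      <int name="66">1167</int>
      <int name="117">251</int>
      <int name="52">42</int>
      <int name="19">36</int>
      <int name="88">20</int>
      <int name="112">18</int>
      <int name="110">9</int>
      <int name="96">0</int>
    </lst>
  </lst>
</lst>
"""


def facet_counts(xml_text: str, field: str) -> dict:
    """Return {facet value: number of matching events} for one facet field."""
    root = ET.fromstring(xml_text)
    counts = {}
    for lst in root.iter("lst"):
        if lst.get("name") == field:
            for entry in lst.findall("int"):
                counts[entry.get("name")] = int(entry.text)
    return counts


counts = facet_counts(FACET_XML, "epersonid")
total_events = sum(counts.values())
```

Each key in counts is an epersonid and each value is the number of events recorded for it.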
In most cases, this file is installed automatically when you run ant fresh_install. However, if the file
cannot be downloaded & installed automatically, you may need to manually install it.
As this file is also sometimes updated by MaxMind.com, you may also wish to update it on occasion.
1. Attempt to re-run the automatic installer from your DSpace Source Directory ([dspace-source]). This will
   attempt to automatically download the database file, unzip it and install it into the proper location:
   ant update_geolite
   NOTE: If the location of the GeoLite Database file is known to have changed, you can also run
   this auto-installer by passing it the new URL of the GeoLite Database File:
   ant -Dgeolite=[full-URL-of-geolite] update_geolite
2. OR, you can manually install the file by performing these steps yourself:
   First, download the latest GeoLite Database file from
   http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
   Next, unzip that file to create a file named GeoLiteCity.dat
   Finally, move or copy that file to your DSpace installation, so that it is located at
   [dspace]/config/GeoLiteCity.dat.
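The unzip and copy steps of the manual installation can be sketched in Python as follows. This is only an illustration: it assumes the archive has already been downloaded, and the function name install_geolite and its arguments are hypothetical. The download itself is left out.

```python
import gzip
import shutil
from pathlib import Path


def install_geolite(archive_path, dspace_dir):
    """Unzip a downloaded GeoLiteCity.dat.gz into [dspace]/config/ (hypothetical helper)."""
    target = Path(dspace_dir) / "config" / "GeoLiteCity.dat"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream-decompress so the whole database is never held in memory.
    with gzip.open(archive_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    # Replace both paths with the real locations on your system.
    print(install_geolite("GeoLiteCity.dat.gz", "/path/to/dspace"))
```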
The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted
into Solr.
Arguments (short and long forms)    Description
-o or --out                         Output file
-m or --multiple                    Adds a wildcard at the end of input and output, so it would mean if
                                    -i dspace.log -m was specified, dspace.log* would be converted. (i.e. all
                                    of the following: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)
-n or --newformat                   If the log files have been created with DSpace 1.6 or newer
-h or --help                        Help
The command loads the intermediate log files that have been created by the aforementioned script into Solr.
Java class: org.dspace.statistics.util.StatisticsImporter

Arguments (short and long forms)    Description
-m or --multiple                    Adds a wildcard at the end of the input, so it would mean dspace.log* would
                                    be imported
-s or --skipdns                     To skip the reverse DNS lookups that work out where a user is from. (The DNS
                                    lookup finds the information about the host from its IP address, such as
                                    geographical location, etc. This can be slow, and wouldn't work on a server
                                    not connected to the internet.)
-l or --local                       For developers: allows you to import a log file from another system, so
                                    because the handles won't exist, it looks up random items in your local
                                    system to add hits to instead.
-h or --help                        Help
Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from
complete. Please refer to Filtering and Pruning Spiders for spider removal operations, after converting your old
logs.
-r or --remove-deleted-bitstreams   While indexing the bundle names remove the statistics about deleted bitstreams
-f or --delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag. Will prune out all records that have
                                    isBot:true
-i or --delete-spiders-by-ip        Delete Spiders in Solr By IP Address, DNS name, or Agent name. Will prune out
                                    all records that match spider identification patterns.
-m or --mark-spiders                Update isBot Flag in Solr. Marks any records currently stored in statistics
                                    that have IP addresses matched in spiders files
Notes:
How you use these options is up to you. If you want to keep spider entries in your repository,
you can just mark them using "-m"; they will then be excluded from statistics queries when
"solr.statistics.query.filter.isBot = true" is set in dspace.cfg. If you want to keep the spiders out of the
Solr repository, just use the "-i" option and they will be removed immediately.
Spider IPs are specified in files containing one pattern per line. A line may be a comment (starting with "#" in
column 1), empty, or a single IP address or DNS name. If a name is given, it will be resolved to an address.
Unresolvable names are discarded and will be noted in the log.
There are guards in place to control what can be defined as an IP range for a bot. In
[dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (for
example, 123.123.123), and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255].
If not, loading that row will cause exceptions in the DSpace logs and exclude that IP entry.
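To make the line rules for spider IP files concrete, here is a small Python sketch that classifies a single line. It is not the parser DSpace uses; the function name classify_spider_line and the category labels are illustrative assumptions, and the sketch does not check that each section lies between 0 and 255.

```python
import re

# Three or four dot-separated numeric sections, e.g. "123.123.123" or "123.123.123.45".
IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){2,3}$")


def classify_spider_line(line):
    """Classify one line of a spider IP file (hypothetical helper).

    Returns "skip" for comments and empty lines, "ip" for an address or a
    three-section range prefix, "invalid" for a numeric entry that has the
    wrong number of sections, and "name" for anything else (a DNS name).
    """
    # A comment must start with "#" in column 1.
    if line.startswith("#") or not line.strip():
        return "skip"
    entry = line.strip()
    if IP_PATTERN.match(entry):
        return "ip"
    # Purely numeric but not three or four sections: too broad or malformed.
    if re.match(r"^[\d.]+$", entry):
        return "invalid"
    return "name"


if __name__ == "__main__":
    for sample in ["# known crawlers", "", "123.123.123", "123.123", "crawler.example.com"]:
        print(repr(sample), classify_spider_line(sample))
```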
Spiders may also be excluded by DNS name or Agent header value. Place one or more files of patterns in the
directories [dspace]/config/spiders/domains and/or [dspace]/config/spiders/agents. Each
line in a pattern file should be either empty, a comment starting with "#" in column 1, or a regular expression
which matches some names to be recognized as spiders.
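The pattern files for domains and agents can be pictured with the following Python sketch, which compiles one regular expression per usable line and tests a value against them. The helper names are hypothetical, and matching anywhere in the value (search rather than a full match) is an assumption made for the illustration; DSpace's own matching rules may differ.

```python
import re


def load_patterns(lines):
    """Compile one regular expression per non-empty, non-comment line."""
    patterns = []
    for line in lines:
        # Comments start with "#" in column 1; blank lines are ignored.
        if line.startswith("#") or not line.strip():
            continue
        patterns.append(re.compile(line.strip()))
    return patterns


def is_spider(value, patterns):
    """Return True if any pattern matches somewhere in the value (assumed semantics)."""
    return any(p.search(value) for p in patterns)


if __name__ == "__main__":
    agent_patterns = load_patterns(["# sample agent patterns", "", "^Googlebot", "slurp$"])
    print(is_spider("Googlebot/2.1", agent_patterns))
    print(is_spider("Mozilla/5.0", agent_patterns))
```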
-o or --optimize Run maintenance on the SOLR index. Recommended to run daily, to prevent your
servlet container from running out of memory
Notes:
The use of this option is strongly recommended: run this script daily (from crontab or your
system's scheduler) to prevent your servlet container from running out of memory.
-s or --shard-solr-index            Splits the data in the main core into a separate Solr core for each year;
                                    this improves Solr performance.
Notes:
Yearly Solr sharding is a routine that can drastically improve the performance of your DSpace SOLR statistics.
It was introduced in DSpace 3.0 and is not backwards compatible. The routine decreases the load created by
the logging of new usage events by reducing the size of the SOLR Core in which new usage data are being
logged. By running the script, you effectively split your current SOLR core, containing all of your usage events,
into different SOLR cores that each contain the data for one year. In case your DSpace has been logging usage
events for less than one year, you will see no notable performance improvements until you run the script after
the start of a new year. Both writing new usage events as well as read operations should be more performant
over several smaller SOLR Shards instead of one monolithic one.
It is highly recommended that you execute this script once at the start of every year. To ensure this is not
forgotten, you can include it in your crontab or other system scheduling software. Here's an example cron entry
(just replace [dspace] with the full path of your DSpace installation):
# At 12:00AM on January 1, "shard" the DSpace Statistics Solr index. Ensures each year has its
# own Solr index - this improves performance.
0 0 1 1 * [dspace]/bin/dspace stats-util -s
The actual sharding of the original Solr core into individual cores by year is done in the shardSolrIndex
method in the org.dspace.statistics.SolrLogger class. The sharding is done by first running a facet on the time to
get the facets split by year. Once we have the years from our logs, we query the main Solr data server for all
information on each year and download it as CSV. When we have all data for one year, we upload it to the
newly created core of that year by using the update CSV handler. Once all data of one year has been uploaded,
that data is removed from the main Solr core (by doing it this way, if Solr crashes we do not need to start
from scratch).
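The order of operations described above (find the years, copy one year, then delete that year from the main core) can be sketched in Python with plain lists standing in for Solr cores. This is not the shardSolrIndex implementation: the function name shard_by_year and the record layout are assumptions, there is no Solr or CSV involved, and the sketch moves every year it finds without deciding how the current year should be treated.

```python
from collections import defaultdict


def shard_by_year(main_core):
    """Move records out of main_core into one list per year (illustrative only).

    main_core is a list of dicts with an ISO 8601 "time" string, standing in
    for the documents of the statistics core. Returns {year: [records]}.
    """
    # Step 1: find which years occur (the real routine facets on the time field).
    years = sorted({record["time"][:4] for record in main_core})
    shards = defaultdict(list)
    for year in years:
        # Step 2: collect every record for this year (a CSV export in DSpace).
        batch = [r for r in main_core if r["time"][:4] == year]
        # Step 3: load the batch into that year's core.
        shards[year].extend(batch)
        # Step 4: only now delete the year from the main core, so an
        # interruption never removes data that has not been copied yet.
        main_core[:] = [r for r in main_core if r["time"][:4] != year]
    return dict(shards)


if __name__ == "__main__":
    core = [
        {"id": 1, "time": "2012-05-01T10:00:00Z"},
        {"id": 2, "time": "2013-01-02T00:00:00Z"},
    ]
    print(shard_by_year(core))
```

Deleting a year only after its copy is complete is what lets an interrupted run be restarted without starting from scratch.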
Page). Elastic Search Statistics is bundled with DSpace and requires no additional installation of software; it
just needs to be enabled. Elastic Search is only available for use with XMLUI.
IP Address
Time of Request
DNS / Hostname
User Agent
isBot, a flag indicating whether DSpace thinks the user is a robot
Geographical Information about where the user is located:
    Continent
    Country
    Country Code
    City
    Geographical Latitude/Longitude
DSpace Object ID
DSpace Object Type: (Item, Bitstream, Collection, or Community)
If it is relevant, we also store the hierarchy of where this object exists within DSpace:
    Owning Community
    Owning Collection
    Owning Item
<!--
If you prefer to use "Elastic Search" Statistics, you can uncomment the below
aspect and COMMENT OUT the default "Statistics" aspect above.
You must also enable the ElasticSearchLoggerEventListener.
-->
Enable ElasticSearchLoggerEventListener
After making these two changes, you will then need to rebuild and restart DSpace.
From the (Windows / Linux) terminal, you will need to use the DSpace Command Launcher to convert the
dspace.log files to a statistics log format. Then you will need to import the statistics log format files into DSpace
Statistics.
The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted
into Elastic Search Statistics.
-i or --in           Input file
-o or --out          Output file
-m or --multiple     Adds a wildcard at the end of input and output, so it would mean if -i dspace.log -m was
                     specified, dspace.log* would be converted. (i.e. all of the following: dspace.log,
                     dspace.log.1, dspace.log.2, dspace.log.3, etc.)
-n or --newformat    If the log files have been created with DSpace 1.6 or newer
-h or --help         Help
The Log Importer program takes the intermediate format data produced in the previous step, and imports it into
Elastic Search Statistics.
-i or --in           Input file
-m or --multiple     Adds a wildcard at the end of input and output, so it would mean if -i statistics.log -m
                     was specified, statistics.log* would be imported. (i.e. all of the following:
                     statistics.log, statistics.log.1, statistics.log.2, statistics.log.3, etc.)
-s or --skipdns      To skip the reverse DNS lookups that work out where a user is from. (The DNS lookup
                     finds the information about the host from its IP address, such as geographical location,
                     etc. This can be slow, and wouldn't work on a server not connected to the internet.)
-h or --help         Help
This data is presented as either a Table or Line Graph, and requires JavaScript to draw the graphics.
4.9 Embargo
What is an Embargo?
DSpace 3.0 New Embargo Functionality
Configuring and using Embargo in DSpace 3.0+
Introduction
Database
dspace.cfg
Submission Process
item-submission.xml
Simple Embargo Settings
AccessStep
UploadWithEmbargoStep
Advanced Embargo Settings
AccessStep
UploadWithEmbargoStep
Restrict list of displayed groups to specific (sub)groups
Private/Public Item
Pre-3.0 Embargo Migration Routine
Technical Specifications
Introduction
ResourcePolicy
Item
Item.inheritCollectionDefaultPolicies(Collection c)
AuthorizeManager
Withdraw Item
Reinstate Item
Pre-DSpace 3.0 Embargo Compatibility
Pre-DSpace 3.0 Embargo
Embargo model and life-cycle
Terms assignment
Terms interpretation/imposition
Embargo period
Embargo lift
Post embargo
Configuration
Operation
Extending embargo functionality
Setter
Lifter
As a DSpace administrator, you can choose to integrate either Simple or Advanced dialog screens as part of
the item submission process. These are outlined in detail in the sections Simple Embargo Settings and
Advanced Embargo Settings.
Please note that the configuration parameter name has been changed in DSpace 4.0 from
xmlui.submission.restrictstep.enableAdvancedForm to webui.submission.restrictstep.enableAdvancedForm
At the level of an individual item, a new Private/Public state has been introduced to control the visibility of
item metadata in the different indexes serving the DSpace web interface (search, browse, discovery), as well as
machine interfaces (REST-API, OAI-PMH, …).
Edit Item
Edit Bitstream
Wildcard Policy Admin Tool
Introduction
The following sections describe the steps needed to configure and use the new Embargo functionality in
DSpace 3.0.
Note: when an embargo is set at item level or bitstream level, a new ResourcePolicy is added.
JSP UI support
Database
As a first step, the following script needs to be executed to ensure that your DSpace database gets extended
with 3 new fields, required by the new embargo. This is part of the normal process of upgrading from DSpace
1.8.x to 3.0.
- dspace/etc/[postgres-oracle]/database_schema_18-3.sql
dspace.cfg
As already mentioned the user will be given the opportunity to choose between:
To switch between the two, you need to set the following property in the dspace.cfg file. A value of false (the
default) enables the simple settings while a value of true enables the advanced settings.
webui.submission.restrictstep.enableAdvancedForm=false
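As a rough illustration of how this switch behaves, the Python sketch below reads simple key=value lines and reports which form would be used. It is not how DSpace loads its configuration: the parser ignores line continuations and other features of the properties format, and the helper names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def embargo_form(props):
    """Return "advanced" or "simple" based on enableAdvancedForm."""
    key = "webui.submission.restrictstep.enableAdvancedForm"
    # The documented default is false, which selects the simple settings.
    enabled = props.get(key, "false").lower() == "true"
    return "advanced" if enabled else "simple"


if __name__ == "__main__":
    sample = "webui.submission.restrictstep.enableAdvancedForm=false"
    print(embargo_form(parse_properties(sample)))
```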
Submission Process
item-submission.xml
To enable the new embargo, changes are required to the item-submission.xml file, located in your config
directory. This file determines which steps are executed in the submission of a new item.
Two new submission steps have been introduced in the file. By default, they are not activated yet:
AccessStep: the step in which the user can set the embargo at item level, effectively restricting access to
the item metadata.
UploadWithEmbargoStep: the step in which the user can set the embargo at bitstream level. If this step
is enabled, the old UploadStep must be disabled. Leaving both steps enabled will result in a
system failure.
<!-- Step 4 Upload Item with Embargo Features (not supported in JSPUI)
to enable this step, please make sure to comment-out the previous step "UploadStep"
<step>
<heading>submit.progressbar.upload</heading>
<processing-class>org.dspace.submit.step.UploadWithEmbargoStep</processing-class>
<jspui-binding>org.dspace.app.webui.submit.step.JSPUploadWithEmbargoStep</jspui-binding>
<xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.UploadWithEmbargoStep</xmlui-binding>
<workflow-editable>true</workflow-editable>
</step>
-->
To enable the new Embargo, ensure that the new steps are uncommented and the old UploadStep is
commented out.
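Because leaving both upload steps enabled results in a system failure, it can be worth checking the file before restarting. The Python sketch below is one hedged way to do that: it lists the processing classes of all steps that are not commented out and flags the conflicting combination. The fully qualified name org.dspace.submit.step.UploadStep is an assumption modelled on the package of UploadWithEmbargoStep shown above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed class name for the old step; the new one is taken from the snippet above.
UPLOAD = "org.dspace.submit.step.UploadStep"
UPLOAD_EMBARGO = "org.dspace.submit.step.UploadWithEmbargoStep"


def active_step_classes(xml_text):
    """Return the processing classes of all steps that are not commented out."""
    root = ET.fromstring(xml_text)
    # ElementTree drops XML comments by default, so commented-out steps
    # never show up in this list.
    return [el.text.strip() for el in root.iter("processing-class") if el.text]


def upload_steps_conflict(xml_text):
    """True if both the old and the new upload step are enabled."""
    classes = active_step_classes(xml_text)
    return UPLOAD in classes and UPLOAD_EMBARGO in classes


if __name__ == "__main__":
    sample = (
        "<submission-process>"
        "<!-- <step><processing-class>" + UPLOAD + "</processing-class></step> -->"
        "<step><processing-class>" + UPLOAD_EMBARGO + "</processing-class></step>"
        "</submission-process>"
    )
    print(active_step_classes(sample), upload_steps_conflict(sample))
```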
AccessStep
The simple AccessStep Embargo form renders three options for the user:
Private item: to hide an item's metadata from all search and browse indexes, as well as external
interfaces such as OAI-PMH.
Embargo Access until Specific Date: to indicate a date until which the item will be embargoed.
Reason: to elaborate on the specific reason why an item is under embargo.
When Embargo is set, it applies to Anonymous or to any other Group that is indicated to have default read
access for that specific collection.
This shows how the Access step is rendered, using the simple embargo settings:
UploadWithEmbargoStep
The simple UploadWithEmbargoStep form renders two new fields for the user:
Embargo Access until Specific Date: to indicate a date until which the bitstream will be embargoed.
When left empty, no embargo will be applied.
Reason: to elaborate on the specific reason why the bitstream is under embargo.
These fields will be preloaded with the values set in the AccessStep.
The following picture shows the form for the Upload step, rendered using the simple embargo settings with
preloaded values:
AccessStep
The Advanced AccessStep Embargo step allows the users to manage more fine-grained resource policies to
attach to the item.
The last two fields will be enabled only when Embargoed has been selected.
This step gives the opportunity to the user to manage the policy manually, so that combinations such as the
following will be possible:
Here is a screenshot of the Access step form that will be rendered for the advanced embargo settings:
UploadWithEmbargoStep
UploadWithEmbargoStep for Advanced Embargo settings displays an additional Policies button next to Edit in
the list of uploaded files.
Clicking it brings you to a page where you can edit existing policies on the bitstream and add new ones.
When the button is pushed, a form similar to the one in the AccessStep will be rendered, making it possible to
manage the policies at bitstream level.
When advanced embargo settings are enabled, you can limit the list of groups displayed to the submitters to
subgroups of a particular group.
To use this feature, assign the super group name to the following configuration value in dspace.cfg:
webui.submission.restrictstep.groups=name_of_the_supergroup
Please note that the configuration parameter name has been changed in DSpace 4.0 from
xmlui.submission.restrictstep.groups to webui.submission.restrictstep.groups
Once a specific group is configured as supergroup here, only the groups belonging to the indicated group will
be loaded in the selection dialogs. By default, all groups are loaded.
Private/Public Item
It is also possible to adjust the Private/Public state of an item after it has been archived in the repository.
Private items are not retrievable through the DSpace search, browse or Discovery indexes.
Therefore, an admin-only view has been created to browse all private items. Here is a screenshot of this new
form:
./dspace migrate-embargo -a
Introduction
The following sections illustrate the technical changes that have been made to the back-end to add the new
Advanced Embargo functionality.
ResourcePolicy
When an embargo is set at item level or bitstream level, a new ResourcePolicy will be added.
While rpname and rpdescription are fields manageable by users, the rptype is managed by DSpace itself. It
represents a type that a resource policy can assume, among the following:
TYPE_SUBMISSION: all the policies added automatically during the submission process
TYPE_WORKFLOW: all the policies added automatically during the workflow stage
TYPE_CUSTOM: all the custom policies added by users
TYPE_INHERITED: all the policies inherited from the enclosing object (for Item, a Collection; for
Bitstream, an Item).
policy_id: 4847
resource_type_id: 2
resource_id: 89
action_id: 0
eperson_id:
epersongroup_id: 0
start_date: 2013-01-01
end_date:
rpname: Embargo Policy
rpdescription: Embargoed through 2012
rptype: TYPE_CUSTOM
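One way to read the example record is as a read policy for the group with ID 0 that only takes effect on its start date. The Python sketch below models that reading with a small data class. It is an interpretation for illustration, not DSpace's ResourcePolicy class: the method name is_in_effect is hypothetical, and treating both date bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ResourcePolicy:
    """Subset of the fields shown in the example record above."""
    action_id: int
    epersongroup_id: int
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    rptype: str = "TYPE_CUSTOM"

    def is_in_effect(self, today):
        """True if today falls inside the optional start/end window (assumed inclusive)."""
        if self.start_date is not None and today < self.start_date:
            return False
        if self.end_date is not None and today > self.end_date:
            return False
        return True


if __name__ == "__main__":
    policy = ResourcePolicy(action_id=0, epersongroup_id=0, start_date=date(2013, 1, 1))
    print(policy.is_in_effect(date(2012, 12, 31)))
    print(policy.is_in_effect(date(2013, 1, 1)))
```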
Item
To manage Private/Public state a new boolean attribute has been added to the Item:
isDiscoverable
When an Item is private, the attribute will assume the value false.
Item.inheritCollectionDefaultPolicies(Collection c)
This method has been adjusted to leave custom policies, added by the users, in place and add the default
collection policies only if there are no custom policies.
AuthorizeManager
Some methods have been changed on AuthorizeManager to manage the new fields and some convenience
methods have been introduced:
Withdraw Item
The feature to withdraw an item from the repository has been modified to keep all the custom policies in place.
Reinstate Item
The feature to reinstate an item in the repository has been modified to preserve existing custom policies.
These terms are interpreted by the embargo system to yield a specific date on which the embargo can be
removed (or "lifted"), and a specific set of access policies. Obviously, some terms are easier to interpret than
others (the absolute date really requires none at all), and the default embargo logic understands only the most
basic terms (the first and third examples above). But as we will see below, the embargo system provides you
with the ability to add your own interpreters to cope with any terms expressions you wish to have. This date that
is the result of the interpretation is stored with the item. The embargo system detects when that date has
passed, and removes the embargo ("lifts it"), so the item bitstreams become available. Here is a more detailed
life-cycle for an embargoed item:
Terms assignment
The first step in placing an embargo on an item is to attach (assign) "terms" to it. If these terms are missing, no
embargo will be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so
assigning terms just means assigning a value to a metadata field. This can be done in a web submission user
interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace.
The terms are not immediately acted upon, and may be revised, corrected, removed, etc, up until the next stage
of the life-cycle. Thus a submitter could enter one value, and a collection editor replace it, and only the last
value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values, but in
the default implementation only one is recognized.
Terms interpretation/imposition
In DSpace terminology, when an Item has exited the last of any workflow steps (or if none have been defined
for it), it is said to be "installed" into the repository. At this precise time, the interpretation of the terms occurs,
and a computed "lift date" is assigned, which like the terms is recorded in a configurable metadata field. It is
important to understand that this interpretation happens only once, (just like the installation), and cannot be
revisited later. Thus, although an administrator can assign a new value to the metadata field holding the terms
after the item has been installed, this will have no effect on the embargo, whose "force" now resides entirely in
the "lift date" value. For this reason, you cannot embargo content already in your repository (at least using
standard tools). The other action taken at installation time is the actual imposition of the embargo. The default
behavior here is simply to remove the read policies on all the bundles and bitstreams except for the "LICENSE"
or "METADATA" bundles. See the Extending embargo functionality section below for how to alter this
behavior. Also note that since these policy changes occur before installation, there is no time during which
embargoed content is "exposed" (accessible by non-administrators). The terms interpretation and imposition
together are called "setting" the embargo, and the component that performs them both is called the embargo
"setter".
Embargo period
After an embargoed item has been installed, the policy restrictions remain in effect until removed. This is not an
automatic process, however: a "lifter" must be run periodically to look for items whose "lift date" has passed.
Note that this means the effective removal of an embargo does not occur on the lift date, but on the earliest
date after the lift date that the lifter is run. Typically, a nightly cron-scheduled invocation of the lifter is more than
adequate, given the granularity of embargo terms. Also note that during the embargo period, all metadata of the
item remains visible. This default behavior can be changed. One final point to note is that the "lift date", although
it was computed and assigned during the previous stage, is in the end a regular metadata field. That means, if
there are extraordinary circumstances that require an administrator (or collection editor - anyone with edit
permissions on metadata) to change the lift date, this can be done. Thus, one can "revise" the lift date without
reference to the original terms. This date will be checked the next time the "lifter" is run. One could immediately
lift the embargo by setting the lift date to the current day, or change it to "forever" to indefinitely postpone lifting.
Embargo lift
When the lifter discovers an item whose lift date is in the past, it removes ("lifts") the embargo. The default
behavior of the lifter is to add the resource policies that would have been added had the embargo not been
imposed. That is, it replicates the standard DSpace behavior, in which an item inherits its policies from its
owning collection. As with all other parts of the embargo system, you may replace or extend the default
behavior of the lifter (see Extending embargo functionality below). You may wish, e.g., to send an email to an
administrator or other interested parties when an embargoed item becomes available.
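The selection step of the lifter can be sketched in Python as a filter over lift dates. This is an illustration only, not the DSpace lifter: the function name items_to_lift and the sample handles are hypothetical, date.max is used here as a stand-in for an open-ended embargo, and treating a lift date equal to the current day as liftable follows the remark above that setting the lift date to the current day lifts the embargo immediately.

```python
from datetime import date


def items_to_lift(lift_dates, today):
    """Return the handles whose lift date has been reached as of today.

    lift_dates maps an item handle to its lift date. date.max stands in for
    an open-ended embargo, so such items are never selected in practice.
    """
    return sorted(handle for handle, lift in lift_dates.items() if lift <= today)


if __name__ == "__main__":
    embargoed = {
        "123456789/1": date(2024, 1, 1),
        "123456789/2": date(2024, 6, 1),
        "123456789/3": date.max,
    }
    print(items_to_lift(embargoed, date(2024, 3, 1)))
```

Because the lifter only acts when it runs, an item whose date passed between two runs is picked up by the next run.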
Post embargo
After the embargo has been lifted, the item ceases to respond to any of the embargo life-cycle events. The
values of the metadata fields reflect essentially historical or provenance values. With the exception of the
additional metadata fields, the item is indistinguishable from items that were never subject to embargo.
Configuration
DSpace embargoes utilize standard metadata fields to hold both the "terms" and the "lift date". Which fields you
use are configurable, and no specific metadata element is dedicated or pre-defined for use in embargo. Rather,
you must specify exactly what field you want the embargo system to examine when it needs to find the terms or
assign the lift date.
You replace the placeholder values with real metadata field names. If you only need the "default" embargo
behavior - which essentially accepts only absolute dates as "terms" - this is the only configuration required,
except as noted below.
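To illustrate what the default behavior accepts as terms, here is a minimal Python sketch of the interpretation step. It assumes the rules given in the Setter section (a literal yyyy-mm-dd date, or an open-ended string whose default is "forever") plus the check that the date is not in the past. The function name interpret_terms, the use of date.max for an open-ended embargo, and accepting the current day as a valid date are assumptions of the sketch, not DSpace's Java implementation.

```python
from datetime import date, datetime


def interpret_terms(terms, today, open_ended="forever"):
    """Turn embargo terms into a lift date (illustrative only).

    Returns None when there are no terms (no embargo is imposed), date.max
    as a stand-in for an open-ended embargo, or the parsed lift date.
    Raises ValueError for unparseable terms or a date in the past.
    """
    if terms is None or not terms.strip():
        return None
    terms = terms.strip()
    if terms == open_ended:
        return date.max
    # strptime raises ValueError for anything that is not a calendar date.
    lift_date = datetime.strptime(terms, "%Y-%m-%d").date()
    if lift_date < today:
        raise ValueError("lift date is in the past: " + terms)
    return lift_date


if __name__ == "__main__":
    print(interpret_terms("2030-06-01", date(2024, 1, 1)))
    print(interpret_terms("forever", date(2024, 1, 1)))
```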
You are free to use existing metadata fields, or create new fields. If you choose the latter, you must understand
that the embargo system does not create or configure these fields: i.e. you must follow all the standard
documented procedures for actually creating them (i.e. adding them to the metadata registry, or to display
templates, etc) - this does not happen automatically. Likewise, if you want the field for "terms" to appear in
submission screens and workflows, you must follow the documented procedure for configurable submission
(basically, this means adding the field to input-forms.xml). The flexibility of metadata configuration makes it easy
for you to restrict embargoes to specific collections, since configurable submission can be defined per
collection.
Key recommendations:
1. Use a local metadata schema. Breaking compliance with the standard Dublin Core in the default
metadata registry can create a problem for the portability of data to and from your repository.
2. If using existing metadata fields, avoid any that are automatically managed by DSpace. For example,
fields like "date.issued" or "date.accessioned" are normally automatically assigned, and thus must not be
recruited for embargo use.
3. Do not place the field for "lift date" in submission screens. This can potentially confuse submitters
because they may feel that they can directly assign values to it. As noted in the life-cycle above, this is
erroneous: the lift date gets assigned by the embargo system based on the terms. Any pre-existing value
will be over-written. But see next recommendation for an exception.
4. As the life-cycle discussion above makes clear, after the terms are applied, that field is no longer
actionable in the embargo system. Conversely, the "lift date" field is not actionable until the application.
Thus you may want to consider configuring both the "terms" and "lift date" to use the same metadata
field. In this way, during workflow you would see only the terms, and after item installation, only the lift
date. If you wish the metadata to retain the terms for any reason, use 2 distinct fields instead.
Operation
After the fields defined for terms and lift date have been assigned in dspace.cfg, and created and configured
wherever they will be used, you can begin to embargo items simply by entering data (dates, if using the default
setter) in the terms field. They will automatically be embargoed as they exit workflow. For the embargo to be
lifted on any item, however, a new administrative procedure must be added: the "embargo lifter" must be
invoked on a regular basis. This task examines all embargoed items, and if their "lift date" has passed, it
removes the access restrictions on the item. Good practice dictates automating this procedure using cron jobs
or the like, rather than manually running it.
The lifter is available as a target of the 1.6 DSpace launcher - see launcher documentation for details.
Setter
The default setter recognizes only two expressions of terms: either a literal, non-relative date in the fixed format
"yyyy-mm-dd" (known as ISO 8601), or a special string used for open-ended embargo (the default configured
value for this is "forever", but this can be changed in dspace.cfg to "toujours", "unendlich", etc). It will perform a
minimal sanity check that the date is not in the past. Similarly, the default setter will only remove all read
policies as noted above, rather than applying more nuanced rules (e.g. allow access to certain IP groups, deny
the rest). Fortunately, the setter class itself is configurable and you can "plug in" any behavior you like, provided
it is written in Java and conforms to the setter interface. The dspace.cfg property:
Lifter
The default lifter behavior as described above - essentially applying the collection policy rules to the item - might
also not be sufficient for all purposes. It also can be replaced with another class:
-c or --check         ONLY check the state of embargoed Items, do NOT lift any embargoes
-i or --identifier    Process ONLY this handle identifier(s), which must be an Item. Can be repeated.
-l or --lift          Only lift embargoes, do NOT check the state of any embargoed items.
-v or --verbose       Print a line describing the action taken for each embargoed item found.
You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift them from
being embargoed. For example, to check the status, at the CLI:
[dspace]/bin/dspace embargo-lifter -c
To lift the actual embargoes on those items that meet the time criteria, at the CLI:
[dspace]/bin/dspace embargo-lifter -l
First, you should export the DSpace Item(s) into the Simple Archive Format, as detailed at: Importing and
Exporting Items via Simple Archive Format. Be sure to use the --migrate option, which removes fields that
would be duplicated on import. Then import the resulting files into the other instance.
For more information see Harvesting Items from XMLUI via OAI-ORE or OAI-PMH
4.11.1 OAI
OAI Interfaces
OAI-PMH Server
OAI-PMH Server Activation
OAI-PMH / OAI-ORE Harvester (Client)
Harvesting from another DSpace
OAI-PMH / OAI-ORE Harvester Configuration
OAI-PMH Server
In the following sections and subpages, you will learn how to configure the OAI-PMH server and activate
additional OAI-PMH crosswalks. For more detail on the program, refer to OAI-PMH Data Provider.
The OAI-PMH Interface may be used by other systems to harvest metadata records from your DSpace.
If you're using a recent browser, you should see an HTML page describing your repository. What you're getting
from the server is in fact an XML file with a link to an XSLT stylesheet that renders this HTML in your browser
(client-side). Any browser that cannot interpret XSLT will display pure XML. The default stylesheet is located in
[dspace]/webapps/oai/static/style.xsl and can be changed by configuring the stylesheet
attribute of the Configuration element in [dspace]/config/crosswalks/oai/xoai.xml.
Relevant Links
OAI 2.0 Server - basic information needed to configure and use the OAI Server in DSpace
OAI-PMH Data Provider 2.0 (Internals) - information on how it's implemented
http://www.openarchives.org/pmh/ - information on the OAI-PMH protocol and its usage (not
DSpace-specific)
Relevant Links
For information on activating & using the OAI-PMH / OAI-ORE Harvester to harvest content into your
DSpace, see Harvesting Items from XMLUI via OAI-ORE or OAI-PMH
First, that external DSpace must be running both the OAI-PMH interface and the XMLUI interface to support
harvesting content from it via OAI-ORE.
You can verify that OAI-ORE harvesting option is enabled by following these steps:
1. First, check to see if the external DSpace reports that it will support harvesting ORE via the OAI-PMH
   interface. Send the following request to the DSpace's OAI-PMH interface:
   http://[full-URL-to-OAI-PMH]/request?verb=ListRecords&metadataPrefix=ore
   The response should be an XML document containing ORE, similar to the response from the
   DSpace Demo Server: http://demo.dspace.org/oai/request?verb=ListRecords&metadataPrefix=ore
2. Next, you can verify that the XMLUI interface supports OAI-ORE (it should, as long as it's a current
   version of DSpace). First, find a valid Item Handle. Then, send the following request to the DSpace's
   XMLUI interface: http://[full-URL-to-XMLUI]/metadata/handle/[item-handle]/ore.xml
   The response should be an OAI-ORE (XML) document which describes that specific Item. It
   should look similar to the response from the DSpace Demo Server:
   http://demo.dspace.org/xmlui/metadata/handle/10673/3/ore.xml
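The two verification requests above follow fixed URL patterns, which the following Python sketch assembles from a base URL and an item handle. It only builds the URLs and performs no network access; the function names are hypothetical.

```python
from urllib.parse import urlencode


def ore_listrecords_url(oai_base):
    """URL that asks an OAI-PMH endpoint to list records in ORE format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "ore"})
    return f"{oai_base.rstrip('/')}/request?{query}"


def ore_item_url(xmlui_base, handle):
    """URL of the OAI-ORE document the XMLUI exposes for one item handle."""
    return f"{xmlui_base.rstrip('/')}/metadata/handle/{handle}/ore.xml"


if __name__ == "__main__":
    print(ore_listrecords_url("http://demo.dspace.org/oai"))
    print(ore_item_url("http://demo.dspace.org/xmlui", "10673/3"))
```

Opening either URL in a browser, or fetching it with any HTTP client, should return the XML documents described in the steps above.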
Configuration File:  [dspace]/config/modules/oai.cfg

Property:            harvester.eperson
Informational Note:  The EPerson under whose authorization automatic harvesting will be performed. This field
                     does not have a default value and must be specified in order to use the harvest scheduling
                     system. This will most likely be the DSpace admin account created during installation.

Property:            dspace.oai.url
Informational Note:  The base url of the OAI-PMH disseminator webapp (i.e. do not include the /request on the
                     end). This is necessary in order to mint URIs for ORE Resource Maps. The default value of
Configuration File:  [dspace]/config/modules/oai.cfg

Property:            ore.authoritative.source
Informational Note:  The webapp responsible for minting the URIs for ORE Resource Maps. If using oai, the
                     dspace.oai.url config value must be set.
                     When set to 'oai', all URIs in ORE Resource Maps will be relative to the OAI-PMH
                     URL (configured by dspace.oai.url above)
                     When set to 'xmlui', all URIs in ORE Resource Maps will be relative to the DSpace
                     Base URL (configured by dspace.url in the dspace.cfg file)
                     The URIs generated for ORE ReMs follow the following convention for either setting:
                     http://[base-URL]/metadata/handle/[item-handle]/ore.xml
Property: harvester.autoStart
Informational Note: Determines whether the harvest scheduler process starts up automatically when the XMLUI
webapp is redeployed.

Property: harvester.oai.metadataformats.PluginName
Example Value: harvester.oai.metadataformats.PluginName = \
http://www.openarchives.org/OAI/2.0/oai_dc/, Simple Dublin Core
Informational Note: This field can be repeated and serves as a link between the metadata formats supported by
the local repository and those supported by the remote OAI-PMH provider. It follows the
form harvester.oai.metadataformats.PluginName = NamespaceURI, Optional Display Name.
The PluginName designates the metadata schemas that the
harvester "knows" the local DSpace repository can support. Consequently, the PluginName
must correspond to a previously declared ingestion crosswalk. The namespace value is
used during negotiation with the remote OAI-PMH provider, matching it against a list

Property: harvester.oai.oreSerializationFormat.OREPrefix
Example Value: harvester.oai.oreSerializationFormat.OREPrefix = \
http://www.w3.org/2005/Atom
Property: harvester.timePadding
Informational Note: Amount of time subtracted from the "from" argument of the PMH request to account for the
time taken to negotiate a connection. Measured in seconds. Default value is 120.

Property: harvester.harvestFrequency
Informational Note: How frequently the harvest scheduler checks the remote provider for updates. Should
always be longer than timePadding. Measured in minutes. Default value is 720.

Property: harvester.minHeartbeat
Example Value: harvester.minHeartbeat = 30
Informational Note: The heartbeat is the frequency at which the harvest scheduler queries the local database to
determine if any collections are due for a harvest cycle (based on the harvestFrequency
value). The scheduler is optimized to then sleep until the next collection is actually ready to
be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this
timeframe. Measured in seconds. Default value is 30.

Property: harvester.maxHeartbeat
Informational Note: The heartbeat is the frequency at which the harvest scheduler queries the local database to
determine if any collections are due for a harvest cycle (based on the harvestFrequency
value). The scheduler is optimized to then sleep until the next collection is actually ready to
be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this
timeframe. Measured in seconds. Default value is 3600 (1 hour).
Property: harvester.maxThreads
Example Value: harvester.maxThreads = 3
Informational Note: How many harvest process threads the scheduler can spool up at once. Default value is 3.

Property: harvester.threadTimeout
Example Value: harvester.threadTimeout = 24
Informational Note: How much time passes before a harvest thread is terminated. The termination process
waits for the current item to complete ingest and saves progress made up to that point.
Measured in hours. Default value is 24.

Property: harvester.unknownField
Informational Note: You have three (3) choices. When a harvest process completes for a single item and it has
been passed through ingestion crosswalks for ORE and its chosen descriptive metadata
format, it might end up with DIM values that have not been defined in the local repository.
This setting determines what should be done in the case where those DIM values belong to
an already declared schema. Fail will terminate the harvesting task and generate an error.
Ignore will quietly omit the unknown fields. Add will add the missing field to the local
repository's metadata registry. Default value: fail.

Property: harvester.unknownSchema
Informational Note: When a harvest process completes for a single item and it has been passed through
ingestion crosswalks for ORE and its chosen descriptive metadata format, it might end up
with DIM values that have not been defined in the local repository. This setting determines
what should be done in the case where those DIM values belong to an unknown schema.
Fail will terminate the harvesting task and generate an error. Ignore will quietly omit the
unknown fields. Add will add the missing schema to the local repository's metadata registry,
using the schema name as the prefix and "unknown" as the namespace. Default value: fail.
Property: harvester.acceptedHandleServer
Example Value: harvester.acceptedHandleServer = \
hdl.handle.net, handle.test.edu
Informational Note: A harvest process will attempt to scan the metadata of the incoming items (identifier.uri
field, to be exact) to see if it looks like a handle. If so, it matches the pattern against the
values of this parameter. If there is a match, the new item is assigned the handle from the
metadata value instead of minting a new one. Default value: hdl.handle.net.

Property: harvester.rejectedHandlePrefix
Informational Note: Pattern to reject as an invalid handle prefix (known test string, for example) when attempting
to find the handle of harvested items. If there is a match with this config parameter, a new
handle will be minted instead. Default value: 123456789.
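The interplay of harvester.acceptedHandleServer and harvester.rejectedHandlePrefix described above can be sketched as follows. This is an illustration only, not the DSpace implementation: the function name, the regular expression and the hard-coded lists are assumptions, with the values taken from the example and default values in the notes above.

```python
import re
from typing import Optional

# Assumed settings, mirroring the example/default values documented above.
ACCEPTED_HANDLE_SERVERS = ["hdl.handle.net", "handle.test.edu"]
REJECTED_HANDLE_PREFIXES = ["123456789"]


def handle_from_identifier_uri(uri: str) -> Optional[str]:
    """Return the handle to reuse for a harvested item, or None to mint a new one.

    Hypothetical helper: it only mimics the behaviour described in the notes.
    """
    for server in ACCEPTED_HANDLE_SERVERS:
        # Does the identifier.uri value look like a handle on an accepted server?
        pattern = r"^https?://" + re.escape(server) + r"/([^/\s]+)/(\S+)$"
        match = re.match(pattern, uri)
        if match is None:
            continue
        prefix, suffix = match.groups()
        if prefix in REJECTED_HANDLE_PREFIXES:
            # Known test prefix: ignore it so that a new handle is minted.
            return None
        return f"{prefix}/{suffix}"
    return None


print(handle_from_identifier_uri("http://hdl.handle.net/1721.1/1234"))
print(handle_from_identifier_uri("http://hdl.handle.net/123456789/345"))
```

The first call reuses the incoming handle, while the second falls back to minting a new one because the prefix matches the rejected test prefix.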
Scheduled Tasks
Using Database
OAI Manager (Database Data Source)
Scheduled Tasks
Client-side stylesheet
Metadata Formats
Encoding problems
Configuration
Basic Configuration
Advanced Configuration
General options
Add/Remove Metadata Formats
Add/Remove Metadata Fields
Driver/OpenAIRE compliance
Driver Compliance
OpenAIRE compliance
Introduction
Open Archives Initiative Protocol for Metadata Harvesting is a low-barrier mechanism for repository
interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service
Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or
services that are invoked within HTTP.
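As a rough illustration, the six verbs defined by the OAI-PMH 2.0 protocol can be turned into request URLs as sketched below. The helper name build_oai_request is hypothetical, and the base URL is only the example host used in this section.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH 2.0 protocol.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL (hypothetical helper, for illustration)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


print(build_oai_request("http://www.example.com/oai/request", "Identify"))
print(build_oai_request("http://www.example.com/oai/request",
                        "ListRecords", metadataPrefix="oai_dc"))
```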
http://www.example.com/oai/<context>
Contexts can be seen as distinct virtual OAI interfaces, so one could have, for example:
http://www.example.com/oai/request
http://www.example.com/oai/driver
http://www.example.com/oai/openaire
With these ingredients it is possible to build a robust solution that fulfills all requirements of Driver, OpenAIRE and
also other project-specific requirements. As shown in Figure 1, a context can select a subset of all
available items in the data source. So when entering the OpenAIRE context, all OAI-PMH requests will be
restricted to that subset of items.
At this stage, contexts could be seen as sets (also defined in the basic OAI-PMH protocol). The strength of XOAI
shows when one needs a specific metadata format in each context. The metadata requirements of
Driver differ slightly from those of OpenAIRE, so each context must define its own transformer.
Contexts can therefore be seen as an extension of the concept of sets.
To implement an OAI interface with the XOAI core library, one only needs to implement the datasource interface.
OAI 2.0
OAI 2.0 is a separate webapp which is a complete substitute for the old "oai" webapp. OAI 2.0 has a
configurable data source. By default it does not query the DSpace SQL database at the time of the OAI-PMH
request; instead, it keeps the required metadata in its Solr index (currently in a separate "oai" Solr core) and
serves it from there. It is also possible to set OAI 2.0 to use only the database for querying if
necessary, but this decreases performance significantly. Furthermore, it caches requests, so repeating the
same query is very fast. In addition, it compiles DSpace items to make uncached
responses much faster.
Using Solr
OAI 2.0 uses the Solr data source by default.
The Solr index can be updated at your convenience, depending on how fresh you need the information to be.
Typically, the administrator sets up a nightly cron job to update the Solr index from the SQL database.
Syntax
Actions
import Imports DSpace items into OAI Solr index (also cleans OAI cache)
clean-cache Cleans the OAI cache
Parameters
Scheduled Tasks
In order to refresh the OAI Solr index, it is required to run the [dspace]/bin/dspace oai import
command periodically. You can add the following task to your crontab:
Note that [dspace] should be replaced by the correct value, that is, the value defined in dspace.cfg
parameter dspace.dir.
Using Database
OAI 2.0 can also use the database for querying. To configure this, change the "storage" parameter in the
[dspace]/config/modules/oai.cfg file to "database". This
decreases performance significantly and likely has no benefit other than removing Solr as a dependency.
Syntax
Actions
Parameters
-v Verbose output
-h Shows help text
Scheduled Tasks
In order to refresh the OAI cache and compile DSpace items (for fast responses), it is required to run the
[dspace]/bin/dspace xoai compile-items command periodically. You can add the following task to
your crontab:
Note that [dspace] should be replaced by the correct value, that is, the value defined in dspace.cfg
parameter dspace.dir.
Client-side stylesheet
The OAI-PMH response is an XML file. While OAI-PMH is primarily used by harvesting tools and usually not
directly by humans, it can sometimes be useful to look at the OAI-PMH responses directly, usually when setting
it up for the first time or to verify any changes you make. For these cases, XOAI provides an XSLT stylesheet that
transforms the response XML into human-readable, interactive HTML. The stylesheet is linked
from the XML response and the transformation takes place in the user's browser (this requires a recent
browser; older browsers will only display the XML directly). Most automated tools are interested only in the XML
file itself and will not perform the transformation. If you want, you can change which stylesheet is used by
placing it into the [dspace]/webapps/oai/static directory (or into
[dspace-src]/dspace-xoai/dspace-xoai-webapp/src/main/webapp/static, after which you have to rebuild DSpace), modifying
the "stylesheet" attribute of the "Configuration" element in [dspace]/config/crosswalks/oai/xoai.xml
and restarting your servlet container.
Metadata Formats
By default OAI 2.0 provides 12 metadata formats within the /request context:
1. OAI_DC
2. DIDL
3. DIM
4. ETDMS
5. METS
6. MODS
7. OAI-ORE
8. QDC
9. RDF
10. MARC
11. UKETD_DC
12. XOAI
1. OAI_DC
2. DIDL
3. METS
1. OAI_DC
2. METS
Encoding problems
There are two main potential sources of encoding problems:
a) The servlet connector port has to use the correct encoding. E.g. for Tomcat, this would be <Connector
port="8080" ... URIEncoding="UTF-8" />, where the port attribute specifies the port of the connector that
DSpace is configured to access Solr on (this is usually 8080, 80 or, in the case of AJP, 8009).
b) The system locale of the dspace command line script that is used to do the oai import. Make sure the user
account launching the script (usually from cron) has the correct locale set (e.g. en_US.UTF-8). Also make sure
the locale is actually present on your system.
Configuration
Basic Configuration
Configuration File: [dspace]/config/modules/oai.cfg

Property: storage
Informational Note: Selects the OAI data source, either solr or database.

Property: solr.url

Property: identifier.prefix

Property: config.dir
Informational Note: Configuration directory, used by XOAI (core library). Contains xoai.xml, metadata format
XSLTs and transformer XSLTs.

Property: cache.dir
Advanced Configuration
OAI 2.0 allows you to configure following advanced options:
Contexts
Transformers
Metadata Formats
Filters
Sets
General options
These options influence the OAI interface globally. "Per page" means per request; the next page (if there is one)
can be requested using the resumptionToken provided in the current page.
identation [boolean] - whether the output XML should be indented to make it human-readable
maxListIdentifiersSize [integer] - how many identifiers to show per page (verb=ListIdentifiers)
Their location and default values are shown in the following fragment:
<Configuration xmlns="http://www.lyncode.com/XOAIConfiguration"
identation="false"
maxListIdentifiersSize="100"
maxListRecordsSize="100"
maxListSetsSize="100"
stylesheet="static/style.xsl">
<Context baseurl="request">
<Format refid="oaidc" />
<Format refid="mets" />
<Format refid="xoai" />
<Format refid="didl" />
<Format refid="dim" />
<Format refid="ore" />
<Format refid="rdf" />
<Format refid="etdms" />
<Format refid="mods" />
<Format refid="qdc" />
<Format refid="marc" />
<Format refid="uketd_dc" />
</Context>
<Context baseurl="request">
<Format refid="oaidc" />
<Format refid="mets" />
<Format refid="didl" />
<Format refid="dim" />
<Format refid="ore" />
<Format refid="rdf" />
<Format refid="etdms" />
<Format refid="mods" />
<Format refid="qdc" />
<Format refid="marc" />
<Format refid="uketd_dc" />
</Context>
It is also possible to create a new metadata format by writing a specific XSLT for it. All XSLTs already defined for
DSpace can be found in the [dspace]/config/crosswalks/oai/metadataFormats directory. After producing
a new one, add the following information (replacing the bracketed placeholders) inside the <Formats> element in
[dspace]/config/crosswalks/oai/xoai.xml:
<Format id="[IDENTIFIER]">
<Prefix>[PREFIX]</Prefix>
<XSLT>metadataFormats/[XSLT]</XSLT>
<Namespace>[NAMESPACE]</Namespace>
<SchemaLocation>[SCHEMA_LOCATION]</SchemaLocation>
</Format>
where:
Parameter Description
IDENTIFIER The identifier used within context configurations to reference this specific format,
must be unique within all Metadata Formats available.
Therefore exposing any DSpace metadata field in any OAI format is just a matter of modifying the
corresponding output format stylesheet (This assumes the general knowledge of how XSLT works. For a
tutorial, see e.g. http://www.w3schools.com/xsl/).
For example, if you have a DC field "local.note.librarian" that you want to expose in oai_dc as <dc:note> (please
note that this is not a valid DC field and thus breaks compatibility), then edit oai_dc.xsl and add the following
lines just above the closing tag </oai_dc:dc>:
<xsl:for-each select="doc:metadata/doc:element[@name='local']/doc:element[@name='note']/doc:element/doc:element/doc:field[@name='librarian']">
<dc:note><xsl:value-of select="." /></dc:note>
</xsl:for-each>
If you need to add or remove metadata fields, you are changing the output format. It is therefore recommended to
create a new metadata format as a copy of the one you want to modify. This way the old format will remain
available along with the new one, and any changes to the original format during DSpace upgrades will not
overwrite your customizations. If you need the format to have the same name as the original format (e.g. the
default oai_dc format), you can create a new context in xoai.xml containing your modified format with the original
name, which will be available as /oai/context-name.
NOTE: Please keep in mind that the OAI provider caches the transformed output, so you have to run
[dspace]/bin/dspace oai clean-cache after any .xsl modification and reload the OAI page for the
changes to take effect. When adding or removing metadata formats, changes made in
[dspace]/config/crosswalks/oai/xoai.xml require reloading or restarting the servlet container.
Driver/OpenAIRE compliance
The default OAI 2.0 installation provides two new contexts. They are:
However, in order to be exposed, DSpace items must comply with the Driver/OpenAIRE guidelines.
Driver Compliance
The DRIVER Guidelines for Repository Managers and Administrators describe how to expose digital scientific resources
using OAI-PMH and Dublin Core Metadata, creating interoperability by homogenizing the repository output. The
OAI-PMH set "driver" is based on DRIVER Guidelines 2.0 (see the English version of the document).
This set is used to expose items of the repository that are available for open access. It is not necessary for all
the items of the repository to be available for open access.
To have items in this set, you must configure your input-forms.xml file in order to comply with the DRIVER
Guidelines:
As DRIVER guidelines use Dublin Core, all the needed items are already registered in DSpace. You just need
to configure the deposit process.
OpenAIRE compliance
The OpenAIRE Guidelines 2.0 describe how repositories and aggregators can become OpenAIRE compatible. By
implementing these Guidelines, repository managers help authors who deposit their publications
in the repository to comply with the EC Open Access requirements. For developers of repository platforms,
the Guidelines provide guidance on adding supportive functionality for authors of EC-funded research in future
versions.
The name of the set in OAI-PMH is "ec_fundedresources"; it exposes the items of the repository that
comply with these guidelines. These guidelines build on the DRIVER guidelines. See version 2.0 of the
Guidelines.
These are the OpenAIRE metadata values only; for these and the DRIVER metadata values, see page 11 of
the OpenAIRE Guidelines 2.0.
Optionally:
dc:date with the embargo end date (recommended for embargoed items)
<dc:date>info:eu-repo/date/embargoEnd/2011-05-12</dc:date>
Have a dc:relation field in input-forms.xml with a list of the projects. You can also use the OpenAIRE
Authority Control Addon to facilitate the process of finding the project.
Just use a combo-box for dc:rights to input the 4 options:
info:eu-repo/semantics/closedAccess
info:eu-repo/semantics/embargoedAccess
info:eu-repo/semantics/restrictedAccess
info:eu-repo/semantics/openAccess
Use an input-box for dc:date to insert the embargo end date
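The dc:rights and dc:date values listed above follow a fixed pattern, which the following sketch reproduces. The function names are hypothetical and serve only to show how the strings are composed; they are not part of DSpace.

```python
from datetime import date

# The four access levels listed above for dc:rights.
ACCESS_LEVELS = ("closedAccess", "embargoedAccess", "restrictedAccess", "openAccess")


def rights_value(level: str) -> str:
    """Return the dc:rights value for one of the four access levels."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level}")
    return f"info:eu-repo/semantics/{level}"


def embargo_end_value(end: date) -> str:
    """Return the dc:date value announcing the end of an embargo."""
    return f"info:eu-repo/date/embargoEnd/{end.isoformat()}"


print(rights_value("embargoedAccess"))
print(embargo_end_value(date(2011, 5, 12)))
```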
Relevant Links
The DSpace build process builds a Web application archive ([dspace-source]/build/oai.war) in much the same
way as the Web UI build process described above. The only difference is that the JSPs are not included.
This "webapp" is deployed to receive and respond to OAI-PMH requests via HTTP. In a typical configuration,
it is deployed at oai and contains the request, driver and openaire contexts, for example:
http://dspace.myu.edu/oai/request?verb=Identify
http://dspace.myu.edu/oai/request
http://dspace.myu.edu/oai/driver
http://dspace.myu.edu/oai/openaire
Sets
OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. A record can be in
zero or more sets.
Each community and collection has a corresponding OAI set, discoverable by harvesters via the ListSets verb.
The setSpec is based on the community/collection handle, with the "/" converted to underscore to form a legal
setSpec. The setSpec is prefixed by "com_" or "col_" for communities and collections, respectively (this is a
change in set names in DSpace 3.0 / OAI 2.0). For example:
col_1721.1_1234
Naturally enough, the community/collection name is also the name of the corresponding set.
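The setSpec rule described above (prefix "com_" or "col_", then the handle with "/" replaced by an underscore) can be sketched as follows. The function name is hypothetical.

```python
def set_spec(handle: str, is_community: bool) -> str:
    """Derive an OAI setSpec from a community or collection handle."""
    prefix = "com_" if is_community else "col_"
    # "/" is not legal in a setSpec, so it is replaced by an underscore.
    return prefix + handle.replace("/", "_")


print(set_spec("1721.1/1234", is_community=False))
```

For the collection handle 1721.1/1234 this yields the setSpec shown in the example above.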
Unique Identifier
Every item in an OAI-PMH data repository must have a unique identifier, which must conform to the URI syntax.
As of DSpace 1.2, Handles are not used as OAI identifiers; this is because in OAI-PMH, the OAI identifier identifies the metadata
record associated with the resource. The resource is the DSpace item, whose resource identifier is the Handle.
In practical terms, using the Handle for the OAI identifier may cause problems in the future if DSpace instances
share items with the same Handles; the OAI metadata record identifiers should be different, as the different
DSpace instances would need to be harvested separately and may have different metadata for the item.
OAI identifiers take the following form:
oai:PREFIX:handle
For example:
oai:dspace.myu.edu:123456789/345
If you wish to use a different scheme, this can easily be changed by editing the value of identifier.prefix in the
[dspace]/config/modules/oai.cfg file.
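A minimal sketch of how an identifier of the form oai:PREFIX:handle is composed and taken apart again, assuming the prefix is the value of identifier.prefix. Both function names are hypothetical.

```python
from typing import Tuple


def oai_identifier(prefix: str, handle: str) -> str:
    """Compose an OAI identifier from the configured prefix and an item handle."""
    return f"oai:{prefix}:{handle}"


def split_oai_identifier(identifier: str) -> Tuple[str, str]:
    """Return (prefix, handle) for an identifier of the form oai:PREFIX:handle."""
    scheme, prefix, handle = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an OAI identifier: {identifier}")
    return prefix, handle


print(oai_identifier("dspace.myu.edu", "123456789/345"))
print(split_oai_identifier("oai:dspace.myu.edu:123456789/345"))
```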
Access control
OAI provides no authentication/authorisation details, although these could be implemented using standard
HTTP methods. It is assumed that all access will be anonymous for the time being.
A question is, "is all metadata public?" Presently the answer to this is yes; all metadata is exposed via
OAI-PMH, even if the item has restricted access policies. The reasoning behind this is that people who do actually
have permission to read a restricted item should still be able to use OAI-based services to discover the content.
However, the exposed data can be changed by modifying the XSLT files in
[dspace]/config/crosswalks/oai/metadataFormats.
"About" Information
As part of each record given out to a harvester, there is an optional, repeatable "about" section which can be
filled out in any (XML-schema conformant) way. Common uses are for provenance and rights information, and
there are schemas in use by OAI communities for this. Presently DSpace does not provide any of this
information, but the XOAI core library allows it to be defined. Doing so requires changes to the
code.
Deletions
DSpace keeps track of deletions (withdrawals). These are exposed via OAI, which has a specific mechanism for
dealing with this. Since DSpace keeps a permanent record of withdrawn items, in the OAI-PMH sense DSpace
supports deletions "persistently". This is as opposed to "transient" deletion support, which would mean that
deleted records are forgotten after a time.
Once an item has been withdrawn, OAI-PMH harvests of the date range in which the withdrawal occurred will
find the "deleted" record header. Harvests of a date range prior to the withdrawal will not find the record, despite
the fact that the record did exist at that time.
As an example of this, consider an item that was created on 2002-05-02 and withdrawn on 2002-10-06. A
request to harvest the month 2002-10 will yield the "record deleted" header. However, a harvest of the month
2002-05 will not yield the original record.
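The date-range behaviour in this example can be modelled with a few lines of code. This sketch assumes that a record's datestamp is the date of its most recent change (creation or withdrawal); the function name and signature are hypothetical and do not reflect the DSpace implementation.

```python
from datetime import date
from typing import Optional


def harvest_result(created: date, withdrawn: Optional[date],
                   from_date: date, until_date: date) -> Optional[str]:
    """Return what a selective harvest of [from_date, until_date] yields for an item.

    Assumption: the datestamp of a record is the date of its latest change.
    """
    datestamp = withdrawn if withdrawn is not None else created
    if not (from_date <= datestamp <= until_date):
        return None  # the record is not part of this harvest
    return "deleted" if withdrawn is not None else "record"


created, withdrawn = date(2002, 5, 2), date(2002, 10, 6)
print(harvest_result(created, withdrawn, date(2002, 10, 1), date(2002, 10, 31)))
print(harvest_result(created, withdrawn, date(2002, 5, 1), date(2002, 5, 31)))
```

Under that assumption, the October harvest sees the "deleted" header, and the May harvest returns nothing because the withdrawal moved the datestamp out of that range.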
Note that presently, the deletion of "expunged" items is not exposed through OAI.
DSpace supports resumption tokens for "ListRecords", "ListIdentifiers" and "ListSets" OAI-PMH requests.
Each OAI-PMH ListRecords request returns at most 100 records by default; this limit can be configured in the
[dspace]/config/crosswalks/oai/xoai.xml file.
When a resumption token is issued, the optional completeListSize and cursor attributes are included. OAI 2.0
resumption tokens are persistent: they do not expire, so the expirationDate of the resumption token is left undefined.
A resumption token contains all the state information required to continue a request, encoded in Base64.
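From the harvester's side, paging works by reading the resumptionToken element from each response and sending it back in the next request. A minimal sketch of the extraction step, using a hand-written sample response in which the token value "abc123" is invented for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a list response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element means the list is complete.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hand-written sample response; the token value is hypothetical.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords>"
    '<resumptionToken completeListSize="250" cursor="0">abc123</resumptionToken>'
    "</ListRecords>"
    "</OAI-PMH>"
)
print(next_resumption_token(SAMPLE))
```

A harvester would repeat the request with verb=ListRecords and resumptionToken set to the returned value until the function returns None.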
At present this functionality has only been developed for the XMLUI and is disabled by default.
Configuration File: [dspace]/config/modules/sword-client.cfg

Property: targets
Example Value:
targets = http://localhost:8080/sword/servicedocument, \
http://client.swordapp.org/client/servicedocument, \
http://dspace.swordapp.org/sword/servicedocument, \
http://sword.eprints.org/sword-app/servicedocument, \
http://sword.intralibrary.com/IntraLibrary-Deposit/service, \
http://fedora.swordapp.org/sword-fedora/servicedocument
Informational Note: List of remote SWORD servers. Used to build the drop-down list of selectable SWORD
targets.

Property: file-types
Informational Note: List of file types from which the user can select. If a type is not supported by the remote
server, it will not appear in the drop-down list.

Property: package-formats
Example Value:
package-formats = http://purl.org/net/sword-types/METSDSpaceSIP
Informational Note: List of package formats from which the user can select. If a format is not supported by the
remote server, it will not appear in the drop-down list.
1. "Select Collection" step: If not already selected, the user must select a collection to deposit the Item into.
2. "Describe" step: This is where the user may enter descriptive metadata about the Item. This step may
consist of one or more pages of metadata entry. By default, there are two pages of metadata-entry. For
information on modifying the metadata entry pages, please see Custom Metadata-entry Pages for
Submission section below.
3. "Upload" step: This is where the user may upload one or more files to associate with the Item. For more
information on file upload, also see Configuring the File Upload step below.
4. "Review" step: This is where the user may review all previous information entered, and correct anything
as needed.
5. "License" step: This is where the user must agree to the repository distribution license in order to
complete the deposit. This repository distribution license is defined in the
[dspace]/config/default.license file. It can also be customized per-collection from the Collection Admin UI.
6. "Complete" step: The deposit is now completed. The Item will either become immediately available or
undergo a workflow approval process (depending on the Collection policies). For more information on the
workflow approval process see: Configurable Workflow.
You can also choose to have different submission processes for different DSpace Collections. For more details,
please see the section below on Assigning a custom Submission Process to a Collection.
Prior to DSpace 4.0, the "Initial Questions" step preceded all "Describe" steps. However, it was
removed by default in DSpace 4.0.
You may still choose to re-enable the "Initial Questions" step, as needed. However, please note the
warning below about the auto-assigning of Dates in the "Initial Questions" step.
Optional Steps
DSpace also ships with several optional steps which you may choose to enable if you wish. In no particular
order:
"Access" step: This step allows the user to (optionally) modify access rights or set an embargo during the
deposit of an Item. For more information on this step, and Embargo options in general, please see the
Embargo documentation.
"CC License" step: This step allows the user to (optionally) assign a Creative Commons license to a
particular Item. Please see the Configuring Creative Commons License section of the Configuration
documentation for more details.
"Start Submission Lookup" step: This step allows the user to search or load metadata from an external
service (arXiv online, bibtex file, etc.) and prefill the submission form. For more information on enabling
and using it, please see the section on Configuring StartSubmissionLookupStep below.
"Initial Questions" step: This step asks users a simple set of "initial questions" which help to determine
which metadata fields are displayed in the "Describe" step (see above). These initial questions include:
Multiple Titles: The item has more than one title, e.g. a translated title (If selected, then users will
be asked for an alternative title in the Describe step)
Published Before: The item has been published or publicly distributed before (If selected, then
users will be asked for a publication date and publisher in the Describe step).
Please note, if you enable Initial Questions, and your users do NOT select "Published
Before" option, then DSpace will auto-assign a publication date (dc.date.issued) to that
particular Item.
It may be entirely accurate for some types of content (e.g. for gray literature or even
theses/dissertations) to auto-assign this publication date. As such, you may wish to still
enable "Initial Questions" if your repository is mainly for previously unpublished content.
You may also choose to only enable it for specific Collections – see Assigning a custom
Submission Process to a Collection section below.
However, if the Item actually was published in some other location, this will result in an
incorrect publication date being reported by DSpace. This tendency for an incorrect
publication date has been reported by Google Scholar to DSpace developers (see:
DS-1481), which is why the "Initial Questions" are now disabled by default (see DS-1655).
To enable any of these optional submission steps, just uncomment the step definition within the
[dspace]/config/item-submission.xml file. Please see the section below on Reordering/Removing/Adding
Submission Steps.
You can also choose to enable certain steps only for specific DSpace Collections. For more details, please see
the section below on Assigning a custom Submission Process to a Collection.
<item-submission>
<!-- Where submission processes are mapped to specific Collections -->
<submission-map>
<name-map collection-handle="default" submission-name="traditional" /> ...
</submission-map>
<!-- Where "steps" which are used across many submission processes can be defined in a
single place. They can then be referred to by ID later. -->
<step-definitions>
<step id="collection">
<processing-class>org.dspace.submit.step.SelectCollectionStep</processing-class>
<workflow-editable>false</workflow-editable>
</step>
...
</step-definitions>
<!-- Where actual submission processes are defined and given names. Each <submission-process> has
many <step> nodes which are in the order that the steps should be in.-->
<submission-definitions>
<submission-process name="traditional">
...
<!-- Step definitions appear here! -->
</submission-process>
...
</submission-definitions>
</item-submission>
Because this file is in XML format, you should be familiar with XML before editing this file. By default, this file
contains the "traditional" Item Submission Process for DSpace, which consists of the following Steps (in this
order):
Select Collection -> Describe -> Upload -> Verify -> License -> Complete
If you would like to customize the steps used or the ordering of the steps, you can do so within the
<submission-definitions> section of the item-submission.xml file.
In addition, you may also specify different Submission Processes for different DSpace Collections. This can be
done in the <submission-map> section. The item-submission.xml file itself documents the syntax required to
perform these configuration changes.
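To illustrate how the <submission-map> selects a Submission Process, the following sketch looks up the process name for a collection and falls back to the "default" entry. It is not DSpace code: the sample XML is a reduced fragment, and the collection handle 123456789/42 and the process name "custom" are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Reduced sample of item-submission.xml; the second name-map entry is hypothetical.
SAMPLE_CONFIG = """
<item-submission>
  <submission-map>
    <name-map collection-handle="default" submission-name="traditional" />
    <name-map collection-handle="123456789/42" submission-name="custom" />
  </submission-map>
</item-submission>
"""


def submission_name_for(config_xml: str, collection_handle: str) -> Optional[str]:
    """Return the submission process name mapped to a collection handle."""
    root = ET.fromstring(config_xml.strip())
    mapping = {
        node.get("collection-handle"): node.get("submission-name")
        for node in root.iter("name-map")
    }
    # Collections without their own entry use the "default" mapping.
    return mapping.get(collection_handle, mapping.get("default"))


print(submission_name_for(SAMPLE_CONFIG, "123456789/42"))
print(submission_name_for(SAMPLE_CONFIG, "123456789/99"))
```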
<step-definitions>
<step id="custom-step">
...
</step>
...
</step-definitions>
The above step definition could then be referenced from within a <submission-process> as simply
<step id="custom-step"/>
2. Within a specific <submission-process> definition
This is for steps which are specific to a single <submission-process> definition.
For example:
<submission-process>
<step>
...
</step>
</submission-process>
For example, the following defines a Submission Process where the License step directly precedes the Initial
Questions step (more information about the structure of the information under each <step> tag can be found in
the section on Structure of the <step> Definition below):
<submission-process>
<!--Step 1 will be to Sign off on the License-->
<step>
<heading>submit.progressbar.license</heading>
<processing-class>org.dspace.submit.step.LicenseStep</processing-class>
<jspui-binding>org.dspace.app.webui.submit.step.JSPLicenseStep</jspui-binding>
<xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.LicenseStep</xmlui-binding>
<workflow-editable>false</workflow-editable>
</step>
<!--Step 2 will be to Ask Initial Questions-->
<step>
<heading>submit.progressbar.initial-questions</heading>
<processing-class>org.dspace.submit.step.InitialQuestionsStep</processing-class>
<jspui-binding>org.dspace.app.webui.submit.step.JSPInitialQuestionsStep</jspui-binding>
<xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.InitialQuestionsStep</xmlui-binding>
<workflow-editable>true</workflow-editable>
</step>
...[other steps]...
</submission-process>
<step>
<heading>submit.progressbar.describe</heading>
<processing-class>org.dspace.submit.step.DescribeStep</processing-class>
<jspui-binding>org.dspace.app.webui.submit.step.JSPDescribeStep</jspui-binding>
<xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.DescribeStep</xmlui-binding>
<workflow-editable>true</workflow-editable>
</step>
Each step contains the following elements. The required elements are so marked:
heading: Partial I18N key (defined in Messages.properties for JSPUI or messages.xml for XMLUI) which
corresponds to the text that should be displayed in the submission Progress Bar for this step. This partial
I18N key is prefixed within either the Messages.properties or messages.xml file, depending on the
interface you are using. Therefore, to find the actual key, you will need to search for the partial key with
the following prefix:
XMLUI: prefix is xmlui.Submission. (e.g. "xmlui.Submission.submit.progressbar.describe" for
'Describe' step)
JSPUI: prefix is jsp. (e.g. "jsp.submit.progressbar.describe" for 'Describe' step)
The 'heading' need not be defined if the step should not appear in the progress bar (e.g. steps which
perform automated processing, i.e. non-interactive steps, should not appear in the progress bar).
processing-class (Required): Full Java path to the Processing Class for this Step. This Processing
Class must perform the primary processing of any information gathered in this step, for both the XMLUI
and JSPUI. All valid step processing classes must extend the abstract
org.dspace.submit.AbstractProcessingStep class (or alternatively, extend one of the pre-existing step
processing classes in org.dspace.submit.step.*).
jspui-binding: Full Java path of the JSPUI "binding" class for this Step. This "binding" class should
initialize and call the appropriate JSPs to display the step's user interface. A valid JSPUI "binding" class
must extend the abstract org.dspace.app.webui.submit.JSPStep class. This property need not
be defined if you are using the XMLUI interface, or for steps which only perform automated processing,
i.e. non-interactive steps.
xmlui-binding: Full Java path of the XMLUI "binding" class for this Step. This "binding" class should
generate the Manakin XML (DRI document) necessary to generate the step's user interface. A valid
XMLUI "binding" class must extend the abstract
org.dspace.app.xmlui.submission.AbstractSubmissionStep class. This property need not be defined if you are using the JSPUI
interface, or for steps which only perform automated processing, i.e. non-interactive steps.
workflow-editable: Defines whether or not this step can be edited during the Edit Metadata process with
the DSpace approval/rejection workflow process. Possible values include true and false. If undefined,
defaults to true (which means that workflow reviewers would be allowed to edit information gathered
during that step).
Reordering steps
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If
you are unsure which Submission Process you are using, it's likely the one with name="traditional",
since this is the traditional DSpace submission process.
2. Reorder the <step> tags within that <submission-process> tag. Be sure to move the entire <step>
tag (i.e. everything between and including the opening <step> and closing </step> tags).
Hint #1: The <step> defining the Review/Verify step only allows the user to review information
from steps which appear before it. So, it's likely you'd want this to appear as one of your last few
steps
Hint #2: If you are using it, the <step> defining the Initial Questions step should always appear
before the Upload or Describe steps since it asks questions which help to set up those later
steps.
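As a rough sketch of step 2, the fragment below moves Upload ahead of Describe while leaving the remaining steps (including Verify) after both, so that the review page still covers them. The heading key submit.progressbar.upload is an assumption modeled on the submit.progressbar.describe key used in this section, and the step bodies are elided; copy the complete <step> elements from your own item-submission.xml.

```xml
<submission-process name="traditional">
  <!-- Upload now comes first (heading key assumed; rest of the step elided) -->
  <step>
    <heading>submit.progressbar.upload</heading>
    ...
  </step>
  <!-- Describe has been moved below Upload as a whole <step> element -->
  <step>
    <heading>submit.progressbar.describe</heading>
    ...
  </step>
  <!-- Verify and the later steps stay after the steps they review -->
  ...[other steps]...
</submission-process>
```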
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If
you are unsure which Submission Process you are using, it's likely the one with name="traditional",
since this is the traditional DSpace submission process.
2. Comment out (i.e. surround with <!-- and -->) the <step> tags which you want to remove from that
<submission-process> tag. Be sure to comment out the entire <step> tag (i.e. everything between and
including the opening <step> and closing </step> tags).
Hint #1: You cannot remove the Select a Collection step, as a DSpace Item cannot exist without
belonging to a Collection.
Hint #2: If you decide to remove the <step> defining the Initial Questions step, you should be
aware that this may affect your Describe and Upload steps! The Initial Questions step asks
questions which help to initialize these later steps. If you decide to remove the Initial Questions
step you may wish to create a custom, automated step which will provide default answers for the
questions asked!
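A minimal sketch of a commented-out step, using the Initial Questions step definition shown earlier in this section. XML comments cannot be nested and must not contain a double hyphen, so any comment that sits inside the <step> has to be removed or moved out before the step is wrapped.

```xml
<submission-process name="traditional">
  ...
  <!-- Initial Questions step disabled: the whole step element is commented out
  <step>
    <heading>submit.progressbar.initial-questions</heading>
    <processing-class>org.dspace.submit.step.InitialQuestionsStep</processing-class>
    <jspui-binding>org.dspace.app.webui.submit.step.JSPInitialQuestionsStep</jspui-binding>
    <xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.InitialQuestionsStep</xmlui-binding>
    <workflow-editable>true</workflow-editable>
  </step>
  -->
  ...[other steps]...
</submission-process>
```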
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If
you are unsure which Submission Process you are using, it's likely the one with name="traditional",
since this is the traditional DSpace submission process.
2. Uncomment (i.e. remove the <!-- and -->) the <step> tag(s) which you want to add to that
<submission-process> tag. Be sure to uncomment the entire <step> tag (i.e. everything between
and including the opening <step> and closing </step> tags).
Each name-map element within submission-map associates a collection with the name of a submission
definition. Its collection-handle attribute is the Handle of the collection. Its submission-name attribute is the
submission definition name, which must match the name attribute of a submission-process element (in the
submission-definitions section of item-submission.xml).
For example, the following fragment shows how the collection with handle "12345.6789/42" is assigned the
"custom" submission process:
<submission-map>
<name-map collection-handle="12345.6789/42" submission-name="custom" />
...
</submission-map>
<submission-definitions>
<submission-process name="custom">
...
</submission-process>
</submission-definitions>
It's a good idea to keep the definition of the default name-map from the example item-submission.xml so there is
always a default for collections which do not have a custom submission process.
http://myhost.my.edu/dspace/handle/12345.6789/42
The handle is the part of the URL that follows /handle/ (here, 12345.6789/42). It should look familiar to any
DSpace administrator. That is what goes in the collection-handle attribute of your name-map element.
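As an illustration, a submission map that keeps a default entry alongside the custom one might look like the sketch below. It assumes that the submission map uses the special collection handle "default" in the same way as the form map in input-forms.xml, and that the default process is the one named "traditional".

```xml
<submission-map>
  <!-- Fallback for every collection without its own entry -->
  <name-map collection-handle="default" submission-name="traditional" />
  <!-- Collection 12345.6789/42 uses the "custom" submission process -->
  <name-map collection-handle="12345.6789/42" submission-name="custom" />
</submission-map>
```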
Introduction
This section explains how to customize the Web forms used by submitters and editors to enter and modify the
metadata for a new item. These metadata web forms are controlled by the Describe step within the Submission
Process. However, they are also configurable via their own XML configuration file (input-forms.xml).
You can customize the "default" metadata forms used by all collections, and also create alternate sets of
metadata forms and assign them to specific collections. In creating custom metadata forms, you can choose:
NOTE: The cosmetic and ergonomic details of metadata entry fields remain the same as the fixed metadata
pages in previous DSpace releases, and can only be altered by modifying the appropriate stylesheet and JSP
pages.
All of the custom metadata-entry forms for a DSpace instance are controlled by a single XML file,
input-forms.xml, in the config subdirectory under the DSpace home. DSpace comes with a sample configuration that
implements the traditional metadata-entry forms, which also serves as a well-documented example. The rest of
this section explains how to create your own sets of custom forms.
To set up one of your DSpace collections with customized submission forms, first you make an entry in the
form-map. This is effectively a table that relates a collection to a form set, by connecting the collection's Handle to
the form name. Collections are identified by handle because their names are mutable and not necessarily
unique, while handles are unique and persistent.
A special map entry, for the collection handle "default", defines the default form set. It applies to all collections
which are not explicitly mentioned in the map. In the example XML this form set is named traditional (for the
"traditional" DSpace user interface) but it could be named anything.
<input-forms>
...
<form-value-pairs>
<value-pairs value-pairs-name="...">
...
</value-pairs>
...
</form-value-pairs>
</input-forms>
For example, the following fragment shows how the collection with handle "12345.6789/42" is attached to the
"TechRpt" form set:
<form-map>
<name-map collection-handle="12345.6789/42" form-name="TechRpt"/>
...
</form-map>
<form-definitions>
<form name="TechRpt">
...
</form>
</form-definitions>
It's a good idea to keep the definition of the default name-map from the example input-forms.xml so there is
always a default for collections which do not have a custom form set.
http://myhost.my.edu/dspace/handle/12345.6789/42
The handle is the part of the URL that follows /handle/ (here, 12345.6789/42). It should look familiar to any
DSpace administrator. That is what goes in the collection-handle attribute of your name-map element.
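Putting the pieces together, a form map that keeps the default entry and adds one custom mapping might look like this sketch. The "default" handle and the "traditional" and "TechRpt" form names are the ones used in this section.

```xml
<form-map>
  <!-- Applies to all collections not explicitly mentioned in the map -->
  <name-map collection-handle="default" form-name="traditional" />
  <!-- Collection 12345.6789/42 gets the TechRpt form set -->
  <name-map collection-handle="12345.6789/42" form-name="TechRpt" />
</form-map>
```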
The content of the form is a sequence of page elements. Each of these corresponds to a Web page of forms for
entering metadata elements, presented in sequence between the initial "Describe" page and the final "Verify"
page (which presents a summary of all the metadata collected).
A form must contain at least one and at most six pages. They are presented in the order they appear in the
XML. Each page element must include a number attribute, that should be its sequence number, e.g.
<page number="1">
The page element, in turn, contains a sequence of field elements. Each field defines an interactive dialog where
the submitter enters one of the Dublin Core metadata items.
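The nesting described above can be sketched as follows. The form name is the "TechRpt" example used earlier, and the field contents are elided.

```xml
<form-definitions>
  <form name="TechRpt">
    <!-- Pages are presented in the order they appear -->
    <page number="1">
      <field>
        ...
      </field>
      ...
    </page>
    <page number="2">
      <field>
        ...
      </field>
    </page>
  </form>
</form-definitions>
```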
Composition of a Field
Each field contains the following elements, in the order indicated. The required sub-elements are so marked:
dc-schema (Required) : Name of metadata schema employed, e.g. dc for Dublin Core. This value must
match the value of the schema element defined in dublin-core-types.xml
dc-element (Required) : Name of the Dublin Core element entered in this field, e.g. contributor.
dc-qualifier: Qualifier of the Dublin Core element entered in this field, e.g. when the field is
contributor.advisor the value of this element would be advisor. Leaving this out means the input is for an
unqualified DC element.
repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark
a field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additional
values. Intended to be used for arbitrarily-repeating fields such as subject keywords, when it is
impossible to know in advance how many input boxes to provide.
label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's
Name".
input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core
value. Content must be one of the following keywords:
onebox – A single text-entry box.
twobox – A pair of simple text-entry boxes, used for repeatable values such as the DC subject
item. Note: The 'twobox' input type is rendered the same as a 'onebox' in the XML-UI, but both
allow for ease of adding multiple values.
textarea – Large block of text that can be entered on multiple lines, e.g. for an abstract.
name – Personal name, with separate fields for family name and first name. When saved they are
appended in the format 'LastName, FirstName'
date – Calendar date. When required, demands that at least the year be entered.
series – Series/Report name and number. Separate fields are provided for series name and
series number, but they are appended (with a semicolon between) when saved.
dropdown – Choose value(s) from a "drop-down" menu list. Note: You must also include a value
for the value-pairs-name attribute to specify a list of menu entries from which to choose. Use this
to make a choice from a restricted set of options, such as for the language item.
qualdrop_value – Enter a "qualified value", which includes both a qualifier from a drop-down
menu and a free-text value. Used to enter items like alternate identifiers and codes for a submitted
item, e.g. the DC identifier field. Note: As for the dropdown type, you must include the value-pairs-
name attribute to specify a menu choice list.
list – Choose value(s) from a checkbox or radio button list. If the repeatable attribute is set to true,
a list of checkboxes is displayed. If the repeatable attribute is set to false, a list of radio buttons is
displayed. Note: You must also include a value for the value-pairs-name attribute to specify a list
of values from which to choose.
hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields.
Can be left empty, but it must be present.
required: When this element is included with any content, it marks the field as a required input. If the
user tries to leave the page without entering a value for this field, that text is displayed as a warning
message. For example: <required>You must enter a title.</required>
Note that leaving the required element empty will not mark a field as required, e.g. <required></required>
visibility: When this optional element is included with a value, it restricts the visibility of the field to the
scope defined by that value. If the element is missing or empty, the field is visible in all scopes. Currently
supported scopes are:
workflow : the field will only be visible in the workflow stages of submission. This is good for
hiding difficult fields for users, such as subject classifications, thereby easing the use of the
submission system.
submit : the field will only be visible in the initial submission, and not in the workflow stages.
In addition, you can decide which type of restriction applies outside the chosen scope, either read-only or
fully hidden (the default behaviour), using the otherwise attribute of the visibility XML element. For
example: <visibility otherwise="readonly">workflow</visibility>
Note that it is considered a configuration error to limit a field's scope while also requiring it; an
exception will be generated when this combination is detected.
Look at the example input-forms.xml and experiment with a trial custom form to learn this
specification language thoroughly. It is a very simple way to express the layout of data-entry
forms, but the only way to learn all its subtleties is to use it.
For the use of controlled vocabularies see the Configuring Controlled Vocabularies section.
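The two field definitions below are a sketch of how the elements described above fit together, in the order listed. It assumes that the value-pairs-name attribute is placed on the input-type element, and the list name common_iso_languages is a placeholder that must match a value-pairs definition in your own file. Since limiting a field's scope while also requiring it is an error, the second field leaves required empty.

```xml
<!-- Required, non-repeatable single-line field -->
<field>
  <dc-schema>dc</dc-schema>
  <dc-element>title</dc-element>
  <dc-qualifier></dc-qualifier>
  <repeatable>false</repeatable>
  <label>Title</label>
  <input-type>onebox</input-type>
  <hint>Enter the main title of the item.</hint>
  <required>You must enter a title.</required>
</field>

<!-- Drop-down field editable only during workflow, read-only elsewhere -->
<field>
  <dc-schema>dc</dc-schema>
  <dc-element>language</dc-element>
  <dc-qualifier>iso</dc-qualifier>
  <repeatable>false</repeatable>
  <label>Language</label>
  <input-type value-pairs-name="common_iso_languages">dropdown</input-type>
  <hint>Select the language of the main content of the item.</hint>
  <required></required>
  <visibility otherwise="readonly">workflow</visibility>
</field>
```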
Item type Based Metadata Collection
This feature is available for use with the XMLUI since DSpace 3.0 and with JSPUI since 3.1. A field can be
made visible depending on the value of dc.type. A new field element, <type-bind>, has been introduced to
facilitate this. In this example the field will only be visible if a value of "thesis" or "ebook" has been entered into
dc.type on an earlier page:
<field>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>isbn</dc-qualifier>
<label>ISBN</label>
<type-bind>thesis,ebook</type-bind>
</field>
When a user initiates a submission, DSpace first displays what we'll call the "initial-questions page". By default,
it contains three questions with check-boxes:
1. The item has more than one title, e.g. a translated title: controls the title.alternative field.
2. The item has been published or publicly distributed before: controls DC fields:
date.issued
publisher
identifier.citation
3. The item consists of more than one file: does not affect any metadata input fields.
The answers to the first two questions control whether inputs for certain of the DC metadata fields will be
displayed, even if they are defined as fields in a custom page. Conversely, if the metadata fields controlled by a
checkbox are not mentioned in the custom form, the checkbox is omitted from the initial page to avoid confusing
or misleading the user.
The two relevant checkbox entries are "The item has more than one title, e.g. a translated title", and "The item
has been published or publicly distributed before". The checkbox for multiple titles triggers the display of the field
with dc-element equal to "title" and dc-qualifier equal to "alternative". If the controlling collection's form set does
not contain this field, then the multiple titles question will not appear on the initial questions page.
The taxonomies are described in XML following this (very simple) structure:
<node id="..." label="...">
<isComposedBy>
...
</isComposedBy>
</node>
You are free to use any application you want to create your controlled vocabularies. A simple text editor should
be enough for small projects. Bigger projects will require more complex tools. You may use Protégé to create
your taxonomies, save them as OWL and then use an XML Stylesheet (XSLT) to transform your documents to
the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas
such as OWL or RDF.
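For a small project written by hand, a vocabulary file built from nested node and isComposedBy elements might look like the sketch below. The id and label attribute names and all of the terms are assumptions for illustration; compare with the sample vocabulary files shipped with DSpace before relying on them.

```xml
<node id="subjects" label="Subjects">
  <isComposedBy>
    <!-- A term with narrower terms nests another isComposedBy -->
    <node id="science" label="Science">
      <isComposedBy>
        <node id="physics" label="Physics" />
        <node id="chemistry" label="Chemistry" />
      </isComposedBy>
    </node>
    <!-- A leaf term has no children -->
    <node id="arts" label="Arts" />
  </isComposedBy>
</node>
```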
Vocabularies need to be associated with the corresponding DC metadata fields. Edit the file
[dspace]/config/input-forms.xml and place a "vocabulary" tag under the "field" element that you want to
control. Set the value of the "vocabulary" element to the name of the file that contains the vocabulary, leaving
out the extension (the add-on will only load files with extension "*.xml"). For example:
<field>
<dc-schema>dc</dc-schema>
<dc-element>subject</dc-element>
<dc-qualifier></dc-qualifier>
<repeatable>true</repeatable>
<label>Subject Keywords</label>
<input-type>onebox</input-type>
<hint>Enter appropriate subject keywords or phrases below.</hint>
<required></required>
<vocabulary>srsc</vocabulary>
</field>
The vocabulary element has an optional boolean attribute closed that can be used to force values to be entered
only through the JavaScript controlled-vocabulary add-on. The default behaviour (i.e. without this attribute) is the
same as closed="false", which also allows the user to enter a value as free text.
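As a short sketch, the same subject field restricted to vocabulary terms only would set the closed attribute to true. The hint text is illustrative.

```xml
<field>
  <dc-schema>dc</dc-schema>
  <dc-element>subject</dc-element>
  <dc-qualifier></dc-qualifier>
  <repeatable>true</repeatable>
  <label>Subject Keywords</label>
  <input-type>onebox</input-type>
  <hint>Select subject terms from the vocabulary.</hint>
  <required></required>
  <!-- closed="true": only terms picked from the srsc vocabulary are accepted -->
  <vocabulary closed="true">srsc</vocabulary>
</field>
```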
Adding Value-Pairs
Finally, your custom form description needs to define the "value pairs" for any fields with input types that refer to
them. Do this by adding a value-pairs element to the contents of form-value-pairs. It has the following required
attributes:
Each value-pairs element contains a sequence of pair sub-elements, each of which in turn contains two
elements:
displayed-value – Name shown (on the web page) for the menu entry.
stored-value – Value stored in the DC element when this entry is chosen. Unlike the HTML select tag,
there is no way to indicate one of the entries should be the default, so the first entry is always the default
choice.
Example
Here is a menu of types of common identifiers:
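A value-pairs definition consistent with the HTML output shown in this example could be sketched as follows. The pair contents are taken from that HTML; the value-pairs-name and dc-term attribute values are assumptions for illustration.

```xml
<form-value-pairs>
  <!-- Attribute values here are illustrative assumptions -->
  <value-pairs value-pairs-name="common_identifiers" dc-term="identifier">
    <pair>
      <displayed-value>Gov't Doc #</displayed-value>
      <stored-value>govdoc</stored-value>
    </pair>
    <pair>
      <displayed-value>URI</displayed-value>
      <stored-value>uri</stored-value>
    </pair>
    <pair>
      <displayed-value>ISBN</displayed-value>
      <stored-value>isbn</stored-value>
    </pair>
  </value-pairs>
</form-value-pairs>
```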
It generates the following HTML, which results in the menu widget below. (Note that there is no way to indicate
a default choice in the custom input XML, so it cannot generate the HTML SELECTED attribute to mark one of
the options as a pre-selected default.)
<select name="identifier_qualifier_0">
<option VALUE="govdoc">Gov't Doc #</option>
<option VALUE="uri">URI</option>
<option VALUE="isbn">ISBN</option>
</select>
You must always restart Tomcat (or whatever servlet container you are using) for changes made to the
input-forms.xml file to take effect.
Any mistake in the syntax or semantics of the form definitions, such as poorly formed XML or a reference to a
nonexistent field name, will cause a fatal error in the DSpace UI. The exception message (at the top of the stack
trace in the dspace.log file) usually has a concise and helpful explanation of what went wrong. Don't forget to
stop and restart the servlet container before testing your fix to a bug.
upload.max - The maximum size of a file (in bytes) that can be uploaded from the JSPUI (not applicable
for the XMLUI). It defaults to 536870912 bytes (512MB). You may set this to -1 to disable any file size
limitation.
Note: Increasing this value or setting to -1 does not guarantee that DSpace will be able to
successfully upload larger files via the web, as large uploads depend on many other factors
including bandwidth, web server settings, internet connection speed, etc.
webui.submit.upload.required - Whether or not all users are required to upload a file when they submit an
item to DSpace. It defaults to 'true'. When set to 'false' users will see an option to skip the upload step
when they submit a new item.
That being said, at a higher level, creating a new Submission Step requires the following (in this relative order):
3. (For steps using XMLUI) Create an XMLUI "binding" Step Transformer which will generate the DRI XML
which Manakin requires.
The Step Transformer must extend and implement all necessary methods within the abstract class
org.dspace.app.xmlui.submission.AbstractSubmissionStep
It is useful to use the existing classes in org.dspace.app.xmlui.submission.submit.* as
references
4. (Required) Add a valid Step Definition to the item-submission.xml configuration file.
This may also require that you add an I18N (Internationalization) key for this step's heading. See
the sections on Configuring Multilingual Support for JSPUI or Configuring Multilingual Support for
XMLUI for more details.
For more information on <step> definitions within the item-submission.xml, see the section
above on Defining Steps (<step>) within the item-submission.xml.
1. Create the required Step Processing class, which extends the abstract
org.dspace.submit.AbstractProcessingStep class. In this class add any processing which this step will perform.
2. Add your non-interactive step to your item-submission.xml at the place where you wish this step to be
called during the submission process. For example, if you want it to be called immediately after the
existing 'Upload File' step, then place its configuration immediately after the configuration for that 'Upload
File' step. The configuration should look similar to the following:
<step>
<processing-class>org.dspace.submit.step.MyNonInteractiveStep</processing-class>
<workflow-editable>false</workflow-editable>
</step>
Note: Non-interactive steps will not appear in the Progress Bar! Therefore, your submitters will not even know
they are there. However, because they are not visible to your users, you should make sure that your
non-interactive step does not take a large amount of time to finish its processing and return control to the next
step (otherwise there will be a visible time delay in the user interface).
Configuring StartSubmissionLookupStep
StartSubmissionLookupStep is a new submission step, contributed by CINECA and available since DSpace 4.0. It
extends the basic SelectCollectionStep, allowing the user to search or load metadata from an external service
(arXiv online, BibTeX file, etc.) and prefill the submission form. Thanks to work by EKT, it is underpinned by the
Biblio Transformation Engine ( https://github.com/EKT/Biblio-Transformation-Engine ) framework.
To enable the StartSubmissionLookupStep you only need to change the configuration of the id="collection" step
to match the following:
item-submission.xml excerpt
<step id="collection">
<heading></heading> <!--can specify heading, if you want it to appear in Progress Bar-->
<processing-class>org.dspace.submit.step.StartSubmissionLookupStep</processing-class>
<jspui-binding>org.dspace.app.webui.submit.step.JSPStartSubmissionLookupStep</jspui-binding>
<xmlui-binding>org.dspace.app.xmlui.aspect.submission.submit.SelectCollectionStep</xmlui-binding>
<workflow-editable>false</workflow-editable>
</step>
UI compatibility
The new step is available only for the JSP UI. Nonetheless, if you run both UIs and want the JSP UI to
benefit from the new step, you can also configure it as the processing class for the XML UI, as it degrades
gracefully to the standard SelectCollectionStep logic.
The basic idea behind the BTE is a standard workflow that consists of three steps: a data loading step, a
processing step (record filtering and modification) and an output generation step. A data loader provides the system
with a set of Records, the processing step is responsible for filtering or modifying these records and the output
generator outputs them in the appropriate format.
The standard BTE version offers several predefined Data Loaders as well as Output Generators for basic
bibliographic formats. However, Spring Dependency Injection can be utilized to load custom data loaders,
filters, modifiers and output generators.
StartSubmissionLookupStep in action!
When StartSubmissionLookupStep is enabled, the user is presented with the following screen when a new
submission is initiated:
There are four accordion tabs (default configuration hides the third tab):
1) Search for identifier: In this tab, the user can search for an identifier in the supported online services
(currently, arXiv, PubMed, CrossRef and CiNii are supported). The publication results are presented in the tab
"Results" in which the user can select the publication to proceed with. This means that a new submission form
will be initiated with the form fields prefilled with metadata from the selected publication.
Currently, four identifiers are supported: DOI, PubMed ID, arXiv ID and NAID (CiNii ID). More can be
added; refer to the following paragraph regarding the SubmissionLookup service configuration file.
The user can fill in any of the four identifiers, although DOI is preferable. Keep in mind that the service can
integrate results for the same publication from the different providers, so filling in any one of the four
identifiers is usually enough. If identifiers for different publications are provided, the service will return a list of
publications from which the user can select. The selected publication is carried into the submission form, in
which some fields will be pre-filled with the publication metadata. The mapping from the input metadata (from
arXiv, PubMed, CrossRef or CiNii) to the DSpace metadata schema (and thus, the submission form) is
configured in the Spring XML file that is discussed later on; you can see a table at the very end of this chapter.
Through the same file, you can also extend the list of providers that the SubmissionLookup service can search
for publications.
2) Upload a file: In this tab, the user can upload a file, select the type (bibtex, csv, etc.), see the publications in
the "Results" tab and then either select one to proceed with the submission or make all of them "Workspace
Items" that can be found in the "Unfinished Submissions" section in the "My DSpace" page.
The "preview mode" in the figure above has the following functionality:
"ON": The list of the publications in the uploaded file will be shown to the user to select the one for the
submission. The selected publication's metadata will pre-fill the submission form's fields according to the
configuration in the Spring XML configuration file.
"OFF": All the publications of the uploaded file will be imported in the user's MyDSpace page as "Unfinished
Submissions" while the first one will go through the submission process.
(Regarding the pubmed, crossref and arxiv file upload, you can find the attached file named "sample-files.zip"
that contains samples of these three file types)
3) Free search: In this tab, the user can freely search by Title, Author and Year in the four supported providers
(PubMed, CrossRef, arXiv and CiNii). By default, the four providers are configured with free search disabled,
but you can enable it via the configuration file. Thus, initially this accordion tab is not shown to the user unless
a data loader is declared as a "search provider"; refer to the following paragraphs.
The process is the same as in the previous cases. A list of publications is presented to the user, who selects the
one to proceed with for the submission.
4) Default mode submission: In this tab, the user can proceed to the default manual submission. The
SubmissionLookup service will not run and the submission form will be empty for the user to start filling it.
The basic idea behind BTE is that the system holds the metadata in an internal format using a specific key for
each metadata field. DataLoaders load the record using the aforementioned keys, while the output generator
needs to map these keys to DSpace metadata fields.
The BTE configuration file is located in path: [dspace]/config/spring/api/bte.xml and it's a Spring
XML configuration file that consists of Java beans. (If these terms are unknown to you, please refer to Spring
Dependency Injection web site for more information.)
The service is broken down into two phases. In the first phase, the imported publications' metadata are
converted to an intermediate format while in the second phase, the intermediate format is converted to DSpace
metadata schema.
Explanation of beans:
This is the top level bean that describes the service of the SubmissionLookup. It accepts three properties:
c) detailFields: A list of the keys that the user wants to display in the detailed form of a publication. That is,
when the results are shown, user can see the details of each one. In the detailed form, some fields appear.
These fields are configured by this property. Refer to the table at the very end of this chapter to see the
available values. This property is disabled by default while the list that is shown commented out is the default
list for the detailed form.
The transformation engine for the first phase of the service (from external service to intermediate format)
a) dataLoader : The data loader that will be used for the loading of the data
b) workflow : This property refers to the bean that describes the processing steps of the BTE. If no processing
steps are listed there all records loaded by the data loader will pass to the output generator, unfiltered and
unmodified.
Normally, you do not need to touch any of these three properties. You can edit the reference beans instead.
This bean declares the data loader to be used to load publications from. It has one property "dataloadersMap",
a map that declares key-value pairs, that is a unique key and the corresponding data loader to be used. Here is
the point where a new data loader can be added, in case the ones that are already supported do not meet your
needs.
In such a case, your data loader key will appear in the drop-down menu of data types in the "Upload a file"
accordion tab.
In such a case, your data loader key will appear as a provider in the "Search for identifier" accordion tab.
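As a sketch of where a new data loader would be registered, the Spring bean below declares a dataloadersMap with a few entries. Only the property name dataloadersMap comes from the description above; the bean id, class name, keys and referenced bean ids are assumptions for illustration, so compare them with the beans in your own [dspace]/config/spring/api/bte.xml.

```xml
<!-- Bean id, class and loader ids are assumptions; compare with your own bte.xml -->
<bean id="multipleDataLoader"
      class="org.dspace.submit.lookup.MultipleSubmissionLookupDataLoader">
  <property name="dataloadersMap">
    <map>
      <!-- key: unique name shown to the user; value-ref: the data loader bean -->
      <entry key="bibtex" value-ref="bibTeXDataLoader" />
      <entry key="csv" value-ref="csvDataLoader" />
      <entry key="pubmed" value-ref="pubmedOnlineDataLoader" />
      <!-- Hypothetical custom loader added by a site -->
      <entry key="myformat" value-ref="myFormatDataLoader" />
    </map>
  </property>
</bean>
```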
These beans are the actual data loaders that are used by the service. They are either "FileDataLoaders" or
"SubmissionLookupDataLoaders" as mentioned previously.
a) fieldMap : a map that specifies the mapping between the keys that hold the metadata in the input format
and the ones that we want to use internally in the BTE. At the end of this article there is a table that summarises
the fields that are used from the three online services (pubmed, arXiv and crossRef) - which are the ones that
the submission lookup step is capable of reading from the online services - and the keys used internally in the
BTE.
CSV and TSV (which is actually a CSV loader if you look carefully at the class value of the bean) loaders have
some more properties:
a) skipLines: A number that specifies the first line of the file from which the loader will start reading data. For
example, if you have a CSV file in which the first row contains the column names and the second row is empty,
then the value of this property must be 2 so that the loader starts reading from row 2 (rows are counted from 0).
The default value for this property is 0.
b) separator: A value to specify the separator between the values in the same row in order to make the
columns. For example, in a TSV data loader this value is "\u0009" which is the "Tab" character. The default
value is "," and that is why the CSV data loader doesn't need to specify this property.
c) quoteChar: This property specifies the quote character used in the CSV file. The default value is the double
quote character (").
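The effect of these three properties can be illustrated with a minimal Python sketch. It is not the BTE loader itself (which is a Java class configured as a Spring bean); the function read_rows and its parameter names are assumptions that simply mirror skipLines, separator and quoteChar as described above.

```python
import csv
import io


def read_rows(text, skip_lines=0, separator=",", quote_char='"'):
    """Illustrative only: parse delimited text the way the properties describe.

    skip_lines is the zero-based index of the first row that is read,
    mirroring the skipLines property.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote_char)
    rows = list(reader)
    return rows[skip_lines:]


# Row 0 holds the column names, row 1 is empty, so reading starts at row 2.
csv_text = 'title,author\n\n"Deep, Blue",Smith\n'
csv_rows = read_rows(csv_text, skip_lines=2)

# A TSV file only differs in the separator (the Tab character, "\u0009").
tsv_rows = read_rows("Deep Blue\tSmith\n", separator="\u0009")
```

Here the quoted value keeps its embedded comma because the quote character is the default double quote.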
a) searchProvider: if it is set to true, the data loader supports free search by title, author or year. If at least one of
these data loaders is declared as a search provider, the accordion tab "Free search" appears. Otherwise, it
stays hidden.
a) apiKey/appId respectively: Both of these services require you to obtain an API key (for free) in order to access
their online services. For CrossRef, visit: http://www.crossref.org/requestaccount/ and for CiNii visit:
https://portaltools.nii.ac.jp/developer/en/
b) maxResults: the maximum number of results that these services will return for your search. By default, this
property is commented out, in which case the default value of 10 applies to both services.
(Regarding the file dataloaders, you can find the attached file named "sample-files.zip" that contains samples of
all the file types that the corresponding data loaders can handle)
This bean specifies the processing steps to be applied to the records' metadata before they proceed to the
output generator of the transformation engine. Currently, three steps are supported, but you can add your own as
well.
These beans are the processing steps that are supported by the first phase of the transformation engine. The first
two map an incoming value to another one specified in a properties file. The last one removes
the last dot from the incoming value.
All of them have the property "fieldKeys" which is a list of keys where the step will be applied.
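What these steps do to a record can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the BTE classes: the function names, the record layout (a dictionary of key to list of values) and the sample mapping are all assumptions. The dictionary passed to map_values stands in for the properties file.

```python
def map_values(record, field_keys, value_map):
    """Replace each value of the listed keys by its mapped value, if one exists."""
    for key in field_keys:
        if key in record:
            record[key] = [value_map.get(v, v) for v in record[key]]
    return record


def remove_last_dot(record, field_keys):
    """Strip one trailing dot from each value of the listed keys."""
    for key in field_keys:
        if key in record:
            record[key] = [v[:-1] if v.endswith(".") else v for v in record[key]]
    return record


# Hypothetical record and mapping, for illustration only.
record = {"title": ["A study of repositories."], "language": ["English"]}
map_values(record, ["language"], {"English": "en"})
remove_last_dot(record, ["title"])
```

In both functions the field_keys argument plays the role of the "fieldKeys" property: keys that are not listed are left untouched.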
If you need to create your own filters and modifiers, follow the instructions below:
To create a new filter, you need to extend the following BTE abstract class:
gr.ekt.bte.core.AbstractFilter
Return false if the specified record needs to be filtered, otherwise return true.
To create a new modifier, you need to extend the following BTE abstract class:
gr.ekt.bte.core.AbstractModifier
Within it, you can make any changes you like to the record. You can use the Record methods to get the values for
a specific key and load new ones (for the latter, you need to make the Record mutable).
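The contract of a filter and a modifier can be sketched as follows. This is a Python illustration under stated assumptions, not the Java API of gr.ekt.bte.core: the Record class, the function names and the run_steps helper are hypothetical and only show the "return false to filter out" rule and in-place modification.

```python
class Record(dict):
    """Hypothetical stand-in for a BTE record: key -> list of values."""


def has_title(record):
    # Filter contract from the text: False means the record is filtered out.
    return bool(record.get("title"))


def trim_values(record):
    # Modifier: change the record in place and hand it back.
    for key in list(record):
        record[key] = [v.strip() for v in record[key]]
    return record


def run_steps(records, filters, modifiers):
    """Drop filtered records, then apply every modifier to the rest."""
    kept = [r for r in records if all(f(r) for f in filters)]
    for r in kept:
        for m in modifiers:
            m(r)
    return kept


records = [Record(title=["  Open Access  "]), Record(author=["Smith"])]
result = run_steps(records, [has_title], [trim_values])
```

The second record has no title, so the filter returns False for it and only the first record reaches the modifier.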
After you create your own filters or modifiers you need to add them in the Spring XML configuration file as in the
following example:
The transformation engine for the second phase of the service (from the intermediate format to DSpace
metadata schema)
Normally, you do not need to touch any of these three properties. You can edit the reference beans instead.
This bean specifies the processing steps to be applied to the records' metadata before they proceed to the
output generator of the transformation engine. Currently, two steps are supported, but you can add your own as
well.
These beans are the processing steps that are supported by the second phase of the transformation engine. The first
merges the values of multiple keys into a new key. The second one concatenates the values of a specific key into a
single value. The third one translates the three-letter language code to the two-letter one (i.e. eng to en).
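The three operations described above can be sketched in Python. The sketch only illustrates the behaviour; it is not the BTE implementation, and the function names, the key names (keywords, mesh, allkeywords), the separator and the language table are assumptions chosen for the example.

```python
def merge_keys(record, source_keys, target_key):
    """Collect the values of several keys under one new key."""
    merged = []
    for key in source_keys:
        merged.extend(record.get(key, []))
    if merged:
        record[target_key] = merged
    return record


def concatenate_values(record, key, separator=", "):
    """Join all values of one key into a single value."""
    if key in record:
        record[key] = [separator.join(record[key])]
    return record


def to_two_letter_language(record, key, codes):
    """Translate three-letter language codes to two-letter ones (eng -> en)."""
    if key in record:
        record[key] = [codes.get(v, v) for v in record[key]]
    return record


# Hypothetical record, for illustration only.
record = {"keywords": ["dspace", "bte"], "mesh": ["Libraries"], "language": ["eng"]}
merge_keys(record, ["keywords", "mesh"], "allkeywords")
concatenate_values(record, "allkeywords")
to_two_letter_language(record, "language", {"eng": "en"})
```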
This bean declares the output generator to be used which is, in this case, a DSpaceWorkspaceItem generator.
It accepts two properties:
a) outputMap: A map from the intermediate keys to the DSpace metadata schema fields. The table below
displays the default output mapping. As you can see, some fields, while they are read from the input source, are
not output in DSpace since there are no default metadata schema fields to host them. However, if you create
the corresponding metadata field in the registry, you can come back to this configuration to add a mapping between the
input field key and the DSpace metadata field.
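The role of the output map can be sketched in Python as follows. This is a simplified illustration, not the DSpaceWorkspaceItem generator: the function name and the two sample map entries are assumptions, and the real bean is configured in Spring XML rather than in code.

```python
def apply_output_map(record, output_map):
    """Keep only mapped keys and rename them to DSpace metadata fields."""
    item_metadata = {}
    for key, values in record.items():
        field = output_map.get(key)
        if field is None:
            continue  # no DSpace field configured: the value is not output
        item_metadata.setdefault(field, []).extend(values)
    return item_metadata


# Hypothetical map entries and record, for illustration only.
output_map = {"title": "dc.title", "authors": "dc.contributor.author"}
record = {"title": ["Open Access"], "authors": ["Smith, J."], "naid": ["12345"]}
metadata = apply_output_map(record, output_map)
```

The naid value is read from the input but is dropped, because the sample map has no entry for it. Adding an entry for that key would make it appear in the output.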
The following table presents the available keys from the online services, the keys that BTE uses in phase1 and
the final output map to DSpace metadata fields.
by BTE
(phase 2)
id url
comment note
pdfUrl fulltextUrl
authorWithAffiliation authorsWithAffiliation
primaryCategory arxivCategory
category arxivCategory
pubmedID pubmedID
publicationStatus publicationStatus
pubModel
printISBN pisbn
electronicISBN eisbn
editionNumber editionnumber
seriesTitle seriestitle
volumeTitle volumetitle
publicationType
editors editors
translators translators
chairs chairs
naid naid
ncid ncid
publisher publisher
I can see more beans in the configuration file that are not explained above. Why is this?
The configuration file hosts options for two services: the BatchImport service and the SubmissionLookup
service. Some beans are used only by the other service and are therefore not mentioned in this
documentation. However, since both services are based on the BTE, some beans are used by both
services.
Introduction
Configurable Workflows are an optional feature that may be enabled for use only within DSpace XMLUI.
The primary focus of the workflow framework is to create a more flexible solution for the administrator to
configure, and even to allow an application developer to implement custom steps, which may be configured in
the workflow for the collection through a simple configuration file. The concept behind this approach was
modeled on the configurable submission system already present in DSpace.
Please note that enabling the Configurable Reviewer Workflow makes changes to the structure of your
database that currently cannot be reversed gracefully, so please back up your database in
advance to allow you to restore to that point should you wish to do so. It should also be noted that only
the XMLUI has been changed to cope with the database changes. The JSPUI will no longer work if the
Configurable Reviewer Workflow is enabled.
dspace/config/xmlui.xconf
The submission aspect has been split up into multiple aspects: one submission aspect for the submission
process, one workflow aspect containing the code for the original workflow and one xmlworkflow aspect
containing the code for the new XML configurable workflow framework. To choose between the two
workflow frameworks, either the workflow or the xmlworkflow aspect should be enabled in the
[dspace]/config/xmlui.xconf configuration file. This means that the xmlui.xconf configuration for the original
workflow is the following:
And the xmlui.xconf configuration for the new XML configurable workflow is the following:
dspace/config/modules/workflow.cfg
Besides that, a workflow configuration file has been created that specifies the workflow that will be used in the
back-end of the DSpace code. It is important that the option selected in this configuration file matches the
aspect that was enabled. The workflow configuration file is available in
[dspace]/config/modules/workflow.cfg. This configuration file has been added because it is important that a CLI
import process uses the correct workflow and this should not depend on the UI configuration. The workflow.cfg
configuration file contains the following property:
# Original Workflow
#workflow.framework: originalworkflow
#XML configurable workflow
workflow.framework: xmlworkflow
You will also need to follow the Data Migration Procedure below.
Data Migration
Please note that enabling the Configurable Reviewer Workflow makes changes to the structure of your
database that currently cannot be reversed gracefully, so please back up your database in
advance to allow you to restore to that point should you wish to do so. It should also be noted that only
the XMLUI has been changed to cope with the database changes. The JSPUI will no longer work if the
Configurable Reviewer Workflow is enabled.
[dspace]/etc/oracle/xmlworkflow/xml_workflow.sql
[dspace]/etc/oracle/xmlworkflow/workflow_migration.sql
or
[dspace]/etc/postgres/xmlworkflow/xml_workflow.sql
[dspace]/etc/postgres/xmlworkflow/workflow_migration.sql
You need to run both scripts (xml_workflow.sql and then workflow_migration.sql) for the DBMS you are using.
Configuration
DSpace.cfg configuration
The workflow configuration file is available in [dspace]/config/modules/workflow.cfg. This
configuration file has been added because it is important that a CLI import process uses the correct workflow
and this should not depend on the UI configuration. The workflow.cfg configuration file contains the following
property:
# Original Workflow
#workflow.framework: originalworkflow
#XML configurable workflow
workflow.framework: xmlworkflow
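<wf-config>
    <workflow-map>
        <!-- collection to workflow mapping -->
        <name-map collection="default" workflow="{workflow.id}"/>
        <name-map collection="123456789/0" workflow="{workflow.id2}"/>
    </workflow-map>
    <workflow start="{start.step.id}" id="{workflow.id}">
        <!-- roles and steps of this workflow; see the "roles" and "step" sections below -->
        <step id="{start.step.id}" nextStep="{next.step.id}" userSelectionMethod="{user.selection.action.id}" role="{role.id}">
            <!-- actions of this step -->
        </step>
    </workflow>
    <workflow start="{start.step.id2}" id="{workflow.id2}">
        <!-- Another workflow configuration -->
    </workflow>
</wf-config>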
workflow-map
The workflow map contains a mapping between collections in DSpace and a workflow configuration. Similar to
the configuration of the submission process, the mapping can be done based on the handle of the collection.
The mapping with "default" as the value for the collection attribute will be used for the collections that do not
occur in any other mapping tag. Each mapping is defined by a "name-map" tag with two attributes:
collection: the handle of the collection, or "default"
workflow: the id of the workflow that is used for that collection
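The lookup rule of the workflow map can be sketched in Python. This is an illustration of the fallback to the "default" mapping as described above, not the DSpace implementation. The function name and the workflow identifiers are hypothetical.

```python
def resolve_workflow(collection_handle, name_map):
    """Return the workflow id mapped to a collection, else the default one."""
    return name_map.get(collection_handle, name_map["default"])


# Hypothetical workflow ids; the keys mirror the "collection" attribute.
name_map = {"default": "defaultWorkflow", "123456789/0": "scoreReviewWorkflow"}

mapped = resolve_workflow("123456789/0", name_map)
fallback = resolve_workflow("123456789/42", name_map)
```

A collection whose handle has its own name-map entry gets that workflow; every other collection falls back to the workflow mapped to "default".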
workflow
The workflow element is a repeatable XML element and the configuration between two "workflow" tags
represents one workflow process. It requires the following 2 attributes:
id: a unique identifier used for the identification of the workflow and used in the workflow to collection
mapping
start: the identifier of the first step of the workflow. This will be the entry point of this workflow process.
When a new item has been committed to a collection that uses this workflow, the step configured in the
"start" attribute will be the first step the item will go through.
roles
Each workflow process has a number of roles defined between the "roles" tags. A role represents one or more
DSpace EPersons or Groups and can be used to assign them to one or more steps in the workflow process.
One role is represented by one "role" tag and has the following attributes:
id: a unique identifier (in one workflow process) for the role
description: optional attribute to describe the role
scope: optional attribute that is used to find the group and must have one of the following values:
collection: The collection value specifies that the group will be configured at the level of the
collection. This type of group is the same as the type that existed in the original workflow system.
In case no value is specified for the scope attribute, the workflow framework assumes the role is a
collection role.
repository: The repository scope uses groups that are defined at repository level in DSpace. The
name attribute should exactly match the name of a group in DSpace.
item: The item scope assumes that a different action in the workflow will assign a number of
EPersons or Groups to a specific workflow-item in order to perform a step. These assignees can
be different for each workflow item.
name: The name specified in the name attribute of a role will be used to look up the group in DSpace. The
lookup will depend on the scope specified in the "scope" attribute:
collection: The workflow framework will look for a group containing the name specified in the name
attribute and the ID of the collection for which this role is used.
repository: The workflow framework will look for a group with the same name as the name
specified in the name attribute
item: in case the item scope is selected, the name of the role attribute is not required
internal: optional attribute which isn't really used at the moment, false by default
<roles>
<role id="{unique.role.id}" description="{role.description}" scope="{role.scope}" name="{role.name}" internal="true/false"/>
</roles>
step
The step element represents one step in the workflow process. A step represents a number of actions that must
be executed by one specified role. In case no role attribute is specified, the workflow framework assumes that
the DSpace system is responsible for the execution of the step and that no user interface will be available for
each of the actions in this step. The step element has the following attributes in order to further configure it:
id: The id attribute specifies a unique identifier for the step, this id will be used when configuring other
steps in order to point to this step. This identifier can also be used when configuring the start step of the
workflow item.
nextStep: This attribute specifies the step that will follow once this step has been completed under
normal circumstances. If this attribute is not set, the workflow framework will assume that this step is an
endpoint of the workflow process and will archive the item in DSpace once the step has been completed.
userSelectionMethod: This attribute defines the UserSelectionAction that will be used to determine how
to attach users to this step for a workflow item. The value of this attribute must refer to the identifier of
an action bean in workflow-actions.xml. Examples of attaching users to a step are the currently
used system of a task pool or, as an alternative, directly assigning a user to a task.
role: optional attribute that must point to the id attribute of a role element specified for the workflow. This
role will be used to define the epersons and groups used by the userSelectionMethod.
RequiredUsers
Each step contains a number of actions that the workflow item will go through. In case the action has a user
interface, the users responsible for the execution of this step will have to execute these actions before the
workflow item can proceed to the next action or the end of the step.
There is also an optional subsection that can be defined for a step, called "alternativeOutcome". This can be
used to define outcomes for the step that differ from the one specified in the nextStep attribute. Each action
returns an integer depending on the result of the action. The default value is "0" and will make the workflow item
proceed to the next action or to the end of the step.
In case an action returns a different outcome than the default "0", the alternative outcomes will be used to
look up the next step. The alternativeOutcome element contains a number of steps, each having a status
attribute. This status attribute defines the return value of an action. The value of the element will be used to
look up the next step the workflow item will go through in case an action returns that specified status.
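The routing rule described above can be sketched in Python. The sketch only models the transition between steps (not the progression from one action to the next inside a step), and it is not the DSpace implementation: the function name, the dictionary layout and the step identifiers are assumptions for illustration.

```python
DEFAULT_OUTCOME = 0


def next_step_id(step, outcome):
    """Pick the step that follows once an action of `step` returns `outcome`.

    Returns None when the step is an endpoint and the item is archived.
    """
    if outcome == DEFAULT_OUTCOME:
        return step.get("nextStep")  # None: no nextStep, the workflow ends here
    # Any other status is looked up in the alternative outcomes of the step.
    return step["alternativeOutcome"][outcome]


# Hypothetical step: status 1 sends the item to another step.
review_step = {
    "id": "reviewstep",
    "nextStep": "editstep",
    "alternativeOutcome": {1: "assignstep"},
}

normal = next_step_id(review_step, 0)
alternative = next_step_id(review_step, 1)
endpoint = next_step_id({"id": "finalstep"}, 0)
```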
API configuration
The workflow actions configuration is located in the [dspace]/config/spring/api/ directory and is named
"workflow-actions.xml". This configuration file describes the different Action Java classes that are used by the
workflow framework. Because the workflow framework uses Spring framework for loading these action classes,
this configuration file contains Spring configuration.
This file contains the beans for the actions and user selection methods referred to in the workflow.xml. In order
for the workflow framework to work properly, each of the required actions must be part of this configuration.
<!-- Below the class identifiers come the declarations for out actions/userSelectionMethods -->
User selection action: This type of action is always the first action of a step and is responsible for the
user selection process of that step. In case a step has no role attached, no user will be selected and the
NoUserSelectionAction is used.
Processing action: This type of action is used for the actual processing of a step. Processing actions
contain the logic required to execute the required operations in each step. Multiple processing actions
can be defined in one step. The user and the workflow item will go through these actions in the order
they are specified in the workflow configuration unless an alternative outcome is returned by one of them.
Each user selection action that is used in the workflow config refers to a bean definition in this
workflow-actions.xml configuration. In order to create a new user selection action bean, the following XML code is used:
This bean defines a new UserSelectionActionConfig and the following child tags:
constructor-arg: This is a constructor argument containing the ID of the task. This is the same as the id
attribute of the bean and is used by the workflow config to refer to this action.
property processingAction: This tag refers to the ID of the API bean responsible for the implementation
of the API side of this action. This bean should also be configured in this XML.
property requiresUI: In case this property is true, the workflow framework will expect a user interface for
the action. Otherwise the framework will automatically execute the action and proceed to the next one.
Processing Action
Processing actions are configured in the same way as the user selection actions. The only difference is that these
processing action beans are implementations of the WorkflowActionConfig class instead of the
UserSelectionActionConfig class.
Authorizations
Currently, the authorizations are always granted and revoked based on the tasks that are available for certain
users and groups. The types of authorization policies that are granted for each of these are always the same:
READ
WRITE
ADD
DELETE
Database
The workflow uses a separate metadata schema named workflow. The fields this schema contains can be
found in the file workflow-types.xml in the [dspace]/config/registries directory. This schema
is only used by the score reviewing system at the moment, but one could always use this schema if
metadata is required for custom workflow steps.
The changes made to the database can always be found in the
[dspace]/etc/[database-type]/xmlworkflow/ directory in the file xml_workflow.sql. The following tables have
been added to the DSpace database. All tables are prefixed with 'cwf_' to avoid any confusion with the existing
workflow-related database tables:
cwf_workflowitem
The cwf_workflowitem table contains the different workflowitems in the workflow. This table has the following
columns:
workflowitem_id: The identifier of the workflowitem and primary key of this table
item_id: The identifier of the DSpace item to which this workflowitem refers.
collection_id: The collection to which this workflowitem is submitted.
multiple_titles: Specifies whether the submission has multiple titles (important for submission steps)
published_before: Specifies whether the submission has been published before (important for
submission steps)
multiple_files: Specifies whether the submission has multiple files attached (important for submission
steps)
cwf_collectionrole
The cwf_collectionrole table represents a workflow role for one collection. This type of role is the same as the
roles that existed in the original workflow, meaning that for each collection a separate group is defined to
describe the role. The cwf_collectionrole table has the following columns:
collectionrole_id: The identifier of the collectionrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the collectionrole
collection_id: The collection identifier for which this collectionrole has been defined
group_id: The group identifier of the group that defines the collection role
cwf_workflowitemrole
The cwf_workflowitemrole table represents roles that are defined at the level of an item. These roles are
temporary roles and only exist during the execution of the workflow for that specific item. Once the item is
archived, the workflowitemrole is deleted. Multiple rows can exist for one workflowitem with e.g. one row
containing a group and a few containing epersons. All these rows together make up the workflowitemrole. The
cwf_workflowitemrole table has the following columns:
workflowitemrole_id: The identifier of the workflowitemrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the workflowitemrole
workflowitem_id: The cwf_workflowitem identifier for which this workflowitemrole has been defined
group_id: The group identifier of the group that defines the workflowitemrole role
eperson_id: The eperson identifier of the eperson that defines the workflowitemrole role
cwf_pooltask
The cwf_pooltask table represents the different task pools that exist for a workflowitem. These task pools can
be available at the beginning of a step and contain all the users that are allowed to claim a task in this step.
Multiple rows can exist for one task pool containing multiple groups and epersons. The cwf_pooltask table has
the following columns:
pooltask_id: The identifier of the pooltask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task pool exists
workflow_id: The identifier of the workflow configuration used for this workflowitem
step_id: The identifier of the step for which this task pool was created
action_id: The identifier of the action that needs to be displayed/executed when the user selects the task
from the task pool
eperson_id: The identifier of an eperson that is part of the task pool
group_id: The identifier of a group that is part of the task pool
cwf_claimtask
The cwf_claimtask table represents a task that has been claimed by a user. Claimed tasks can be assigned to
users or can be the result of a claim from the task pool. Because a step can contain multiple actions, the
claimed task defines the action at which the user has arrived in a particular step. This makes it possible to stop
working halfway through the step and continue later. The cwf_claimtask table contains the following columns:
claimtask_id: The identifier of the claimtask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task exists
workflow_id: The id of the workflow configuration that was used for this workflowitem
step_id: The step that is currently processing the workflowitem
action_id: The action that should be executed by the owner of this claimtask
owner_id: References the eperson that is responsible for the execution of this task
cwf_in_progress_user
The cwf_in_progress_user table keeps track of the different users that are performing a certain step. This table
is used because some steps might require multiple users to perform the step before the workflowitem can
proceed. The cwf_in_progress_user table contains the following columns:
in_progress_user_id: The identifier of the in progress user and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which the user is performing or has performed the
step.
user_id: The identifier of the eperson that is performing or has performed the task
finished: Keeps track of whether the user has finished the step or is still executing it
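How such rows could be used to decide whether a workflowitem may proceed can be sketched in Python. This is an assumption-laden illustration, not the DSpace logic: the function name, the required_users parameter and the sample rows are hypothetical, and the rows are plain dictionaries that reuse the column names listed above.

```python
def step_is_complete(in_progress_rows, workflowitem_id, required_users):
    """True once the required number of users has finished the step."""
    finished = [
        row for row in in_progress_rows
        if row["workflowitem_id"] == workflowitem_id and row["finished"]
    ]
    return len(finished) >= required_users


# Hypothetical rows: one user has finished, another is still busy.
rows = [
    {"workflowitem_id": 7, "user_id": 101, "finished": True},
    {"workflowitem_id": 7, "user_id": 102, "finished": False},
]

one_needed = step_is_complete(rows, 7, 1)
two_needed = step_is_complete(rows, 7, 2)
```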
AssignStep: During the assignstep, a user has the ability to select a responsible user to review the
workflowitem. This means that for each workflowitem, a different user can be selected. Because a user is
assigned, the task pool is no longer required.
ReviewStep: The start of the reviewstep is different from the typical task pool. Instead of having a task
pool, the user will be automatically assigned to the task. However, the user still has the option to reject
the task (in case he or she is not responsible for the assigned task) or review the item. In case the user
rejects the task, the workflowitem will be sent to another step in the workflow as an alternative to the
default outcome.
ScoreReviewStep: The group of responsible users for the score reviewing will be able to claim the task
from the task pool. Depending on the configuration, a different number of users can be required to
execute the task. This means that the task will be available in the task pool until at least the required number of
users has claimed the task. Once every one of them has finished the task, the next (automatic)
processing step is activated.
EvaluationStep: During the evaluationstep, no user interface is required. The workflow system will
automatically execute the step that evaluates the different scores. In case the average score is more
than a configurable percentage, the item is approved, otherwise it is rejected.
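The decision taken in the EvaluationStep can be sketched in Python. This is a minimal illustration of the rule described above (approve when the average score is more than a configurable percentage), not the DSpace action class. The function name, the parameter name and the treatment of an average that equals the threshold as a rejection are assumptions.

```python
def evaluate_scores(scores, minimum_percentage):
    """Approve when the average score is more than the configured percentage."""
    average = sum(scores) / len(scores)
    return "approved" if average > minimum_percentage else "rejected"


# Two reviewers scored the item; the threshold of 50 is a hypothetical setting.
high = evaluate_scores([80, 60], 50)
low = evaluate_scores([40, 50], 50)
```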
Known Issues
Curation System
The DSpace 1.7 version of the curation system integration into the original DSpace workflow only exists in the
WorkflowManager.advance() method. Before advancing to the next workflow step or archiving the Item, a check
is performed to see whether any curation tasks need to be executed/scheduled. The problem is that this check
is based on the hardcoded workflow steps that exist in the original workflow. These hardcoded checks are done
in the CurationManager and will need to be changed.
Existing issues
This mode also displays a list of the names of package ingestion and dissemination plugins that are currently
installed in your DSpace. Each Packager plugin also may allow for custom options, which may provide you
more control over how a package is imported or exported. You can see a listing of all specific packager options
by invoking --help (or -h) with the --type (or -t) option:
The above example will display the normal help message, while also listing any additional options available to
the "METS" packager plugin.
AIP - Ingests content which is in the DSpace Archival Information Package (AIP) format. This is used as
part of the DSpace AIP Backup and Restore process
DSPACE-ROLES - Ingests DSpace users/groups in the DSPACE-ROLES XML Schema. This is primarily
used by the DSpace AIP Backup and Restore process to ingest/replace DSpace Users & Groups.
METS - Ingests content which is in the DSpace METS SIP format
PDF - Ingests a single PDF file (where basic metadata is extracted from the file properties in the PDF
Document).
AIP - Exports content which is in the DSpace Archival Information Package (AIP) format. This is used as
part of the DSpace AIP Backup and Restore process
DSPACE-ROLES - Exports DSpace users/groups in the DSPACE-ROLES XML Schema. This is
primarily used by the DSpace AIP Backup and Restore process to export DSpace Users & Groups.
METS - Exports content in the DSpace METS SIP format
For a list of all package ingestion and dissemination plugins that are currently installed in your DSpace, you can
execute:
Some package ingestion and dissemination plugins also have custom options/parameters. For example, to
see a listing of the custom options for the "METS" plugin, you can execute:
Ingesting
1. Submit/Ingest Mode (-s option, default) – submit package to DSpace in order to create a new object(s)
2. Restore Mode (-r option) – restore pre-existing object(s) in DSpace based on package(s). This also
attempts to restore all handles and relationships (parent/child objects). This is a specialized type of
"submit", where the object is created with a known Handle and known relationships.
3. Replace Mode (-r -f option) – replace existing object(s) in DSpace based on package(s). This also
attempts to restore all handles and relationships (parent/child objects). This is a specialized type of
"restore" where the contents of existing object(s) is replaced by the contents in the AIP(s). By default, if a
normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which
object already exists.
Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [parent-handle] is the
Handle of the Parent Object into which the package is ingested, [packager-name] is the plugin name of the
package ingester to use, and /full/path/to/package is the path to the file to ingest (or "-" to read from the
standard input).
Here is an example that loads a PDF file with internal metadata as a package:
This example takes the result of retrieving a URL and ingests it:
For a Site-based package - this would ingest all Communities, Collections & Items based on the located
package files
For a Community-based package - this would ingest that Community and all SubCommunities,
Collections and Items based on the located package files
For a Collection - this would ingest that Collection and all contained Items based on the located package
files
For an Item – this just ingests the Item (including all Bitstreams & Bundles) based on the package file.
For example:
The above command will ingest the package named "collection-aip.zip" as a child of the specified Parent Object
(handle="4321/12"). The resulting object is assigned a new Handle (since -s is specified). In addition, any child
packages directly referenced by "collection-aip.zip" are also recursively ingested (a new Handle is also
assigned for each child AIP).
Because the packager plugin must know how to locate all child packages from an initial package file,
not all plugins can support bulk ingest. Currently, in DSpace the following Packager Plugins support
bulk ingest capabilities:
1. Default Restore Mode (-r) = Attempt to restore object (and optionally children). Rollback all changes if
any object is found to already exist.
2. Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is
found to already exist, skip over it (and all children objects), and continue to restore all other non-existing
objects.
3. Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing
objects in DSpace. Therefore, if an object is found to already exist in DSpace, its contents are replaced
by the contents of the package. WARNING: This mode is potentially dangerous as it will permanently
destroy any object contents that do not currently exist in the package. You may want to first perform a
backup, unless you are sure you know what you are doing!
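The difference between the three restore modes comes down to what happens when an object already exists. The following Python sketch summarizes that decision. It is not the packager code. The function name, its parameters and the returned labels are assumptions that mirror the -k and -f flags described above.

```python
def restore_action(object_exists, keep_existing=False, force_replace=False):
    """Describe what a restore (-r) does with one object from a package."""
    if not object_exists:
        return "restore"      # nothing in the way: the object is restored
    if force_replace:         # -r -f
        return "replace"      # existing contents are overwritten
    if keep_existing:         # -r -k
        return "skip"         # object (and its children) is left untouched
    return "rollback"         # plain -r: back out all changes


default_mode = restore_action(object_exists=True)
keep_mode = restore_action(object_exists=True, keep_existing=True)
force_mode = restore_action(object_exists=True, force_replace=True)
```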
For example:
Notice that unlike -s option (for submission/ingesting), the -r option does not require the Parent Object (-p
option) to be specified if it can be determined from the package itself.
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
If the object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace)
Restore, Keep Existing Mode
When the "Keep Existing" flag (-k option) is specified, the restore will attempt to skip over any objects found to
already exist. It will report to the user that the object was found to exist (and was not modified or changed). It
will then continue to restore all objects which do not already exist. This flag is most useful when attempting a
bulk restore (using the --all (or -a) option).
One special case to note: If a Collection or Community is found to already exist, its child objects are also
skipped over. So, this mode will not auto-restore items to an existing Collection.
For example:
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
In addition, any child packages referenced by "aip4567.zip" are also recursively restored (the -a option
specifies to also restore all child packages). They are also restored with the Handles & Parent Objects provided
with their package. If any object is found to already exist, it is skipped over (child objects are also skipped). All
non-existing objects are restored.
Force Replace Mode
When the "Force Replace" flag (-f option) is specified, the restore will overwrite any objects found to already
exist in DSpace. In other words, existing content is deleted and then replaced by the contents of the
package(s).
Because this mode actually destroys existing content in DSpace, it is potentially dangerous and may
result in data loss! It is recommended to always perform a full backup (assetstore files & database)
before attempting to replace any existing object(s) in DSpace.
For example:
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
In addition, any child packages referenced by "aip4567.zip" are also recursively ingested. They are also
restored with the Handles & Parent Objects provided with their package. If any object is found to already exist,
its contents are replaced by the contents of the appropriate package.
If any error occurs, the script attempts to roll back the entire replacement process.
Disseminating
Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [handle] is the
Handle of the Object to disseminate; [packager-name] is the plugin name of the package disseminator to use;
and [file-path] is the path to the file to create (or "-" to write to the standard output). For example:
The above code will export the object of the given handle (4321/4567) into a METS file named "4567.zip".
For example:
The above code will export the object of the given handle (4321/4567) into a METS file named "4567.zip". In
addition it would export all children objects to the same directory as the "4567.zip" file.
This feature came out of a requirement for DSpace to better integrate with DuraCloud (http://www.duracloud.org)
and other backup storage systems. One of these requirements is to be able to essentially "back up" local
DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.
Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships
between Communities/Collections/Items) into a relatively standard format (a METS-based, AIP format). This
entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in
the same or different DSpace installation).
For more information, see the section on AIP backup & Restore for DSpace.
METS packages
Since the DSpace 1.4 release, the software includes a package disseminator and matching ingester for the DSpace
METS SIP (Submission Information Package) format. They were created to help end users prepare sets of
digital resources and metadata for submission to the archive using well-defined standards such as METS,
MODS, and PREMIS. The plugin name is METS by default, and it uses MODS for descriptive metadata.
archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml   -- metadata in another schema, the prefix is the name of the schema as
                                   registered with the metadata registry
        contents                -- text file containing one line per filename
        file_1.doc              -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...
The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata
element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the
<dcvalue> tagset:
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
    <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
(Note the optional language tag attribute which notifies the system that the optional title is in French.)
Every metadata field used must first be registered via the metadata registry of the DSpace instance; see
Metadata and Bitstream Format Registries.
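As an illustration of the format above, the following minimal Python sketch generates a dublin_core.xml document from a list of values. The function name build_dublin_core and the tuple layout are assumptions made for this example and are not part of DSpace; the optional schema argument corresponds to the schema attribute used for metadata_[prefix].xml files.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields, schema=None):
    """Build a <dublin_core> element from (element, qualifier, language, value) tuples."""
    root = ET.Element("dublin_core")
    if schema is not None:
        # Files for other schemas carry the schema prefix on the root element.
        root.set("schema", schema)
    for element, qualifier, language, value in fields:
        # An unqualified field is written with qualifier="none", as in the example above.
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if language:
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = value
    return root


fields = [
    ("title", None, None, "A Tale of Two Cities"),
    ("date", "issued", None, "1990"),
    ("title", "alternative", "fr", "J'aime les Printemps"),
]
print(ET.tostring(build_dublin_core(fields), encoding="unicode"))
```

The resulting text can be written to item_000/dublin_core.xml; the fields themselves still have to exist in the metadata registry before import.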
Recommended Metadata
The contents file simply enumerates, one file per line, the bitstream file names. See the following example:
file_1.doc
file_2.pdf
license
Please note that the license is optional. If you wish to include one, you can place the file in the
.../item_001/ directory, for example.
\tbundle:BUNDLENAME
\tpermissions:PERMISSIONS
\tdescription:DESCRIPTION
\tprimary:true
'BUNDLENAME' is the name of the bundle to which the bitstream should be added. Without specifying the
bundle, items will go into the default bundle, ORIGINAL.
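The tab-separated options on a contents line can be read mechanically. The sketch below is a hypothetical helper (parse_contents_line is not a DSpace function) that splits a line into the file name and its options and applies the ORIGINAL default described above.

```python
def parse_contents_line(line):
    """Split one contents-file line into a file name and its tab-separated options."""
    parts = line.rstrip("\n").split("\t")
    # Without an explicit bundle option the bitstream goes to ORIGINAL.
    entry = {"filename": parts[0], "bundle": "ORIGINAL"}
    for option in parts[1:]:
        # Each option has the form key:value; split on the first colon only.
        key, _, value = option.partition(":")
        entry[key.strip()] = value.strip()
    return entry


print(parse_contents_line("file_1.doc"))
print(parse_contents_line("file_2.pdf\tbundle:THUMBNAIL\tdescription:Cover page\tprimary:true"))
```

Values are kept as plain strings, so primary:true is stored as the text "true" rather than a boolean.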
1. Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is
replaced with the schema's prefix.
2. Inside the XML file use the same Dublin Core syntax, but on the <dublin_core> element include the
attribute schema=[prefix].
3. Here is an example for ETD metadata, which would be in the file metadata_etd.xml:
Importing Items
Before running the item importer over items previously exported from a DSpace instance, please first refer to
Transferring Items Between DSpace Instances.
-m or --mapfile Where the mapfile for items can be found (name and directory)
-n or --notify Sends an email alert that the item(s) have been imported
The item importer can batch import any number of items into a particular collection using a
simple CLI command with arguments.
eperson
Collection ID (either Handle (e.g. 123456789/14) or Database ID (e.g. 2))
Source directory where the items reside
Mapfile. Since you don't have one, you need to determine where it will be (e.g. /Import/Col_14/mapfile)
At the command line:
The above command would cycle through the archive directory's items, import them, and then generate a map
file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You will need the map
file later to replace or delete (unimport) the imported items.
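Because the map file is needed again for replace and delete operations, it can be useful to inspect it. The sketch below assumes that each map file line holds an item directory name and a handle separated by whitespace; that layout is an assumption of this example, so check it against a map file produced by your own installation. The function name parse_mapfile is hypothetical.

```python
def parse_mapfile(lines):
    """Return {item_directory: handle} from an iterable of map file lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumed layout: "<item directory> <handle>".
        item_dir, handle = line.split(None, 1)
        mapping[item_dir] = handle
    return mapping


sample = ["item_000 123456789/15\n", "item_001 123456789/16\n"]
print(parse_mapfile(sample))
```

To read a real file, pass an open file object, for example parse_mapfile(open("/Import/Col_14/mapfile")).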
Testing. You can add --test (or -t) to the command to simulate the entire import process without actually
doing the import. This is extremely useful for verifying your import files before doing the actual import.
eperson
Collection ID (either Handle (e.g. 123456789/14) or Database ID (e.g. 2))
Source directory where your zipfile containing the items resides
Zipfile
Mapfile. Since you don't have one, you need to determine where it will be (e.g. /Import/Col_14/mapfile)
At the command line:
The above command would unpack the zipfile, cycle through the archive directory's items, import them, and
then generate a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE.
You will need the map file later to replace or delete (unimport) the imported items.
Testing. You can add --test (or -t) to the command to simulate the entire import process without actually
doing the import. This is extremely useful for verifying your import files before doing the actual import.
Long form:
In long form:
Other Options
Workflow. The importer usually bypasses any workflow assigned to a collection. Adding the --workflow (-w)
argument routes the imported items through the workflow system.
Templates. If you have templates that have constant data and you wish to apply that data during batch
importing, add the --template (-p) argument.
Resume. If an error aborts the import, fix the error and then use the --resume (-R) flag to try to
resume the import where you left off.
Exporting Items
The item exporter can export a single item or a collection of items, and creates a DSpace simple archive
according to the aforementioned format for each item to be exported. The items are exported in the
order in which they are retrieved from the database. As a consequence, the sequence numbers of the item
subdirectories (item_000, item_001) are not related to DSpace handles or item IDs.
Arguments (short and long forms): Description
-t or --type Type of export. COLLECTION exports the whole collection; ITEM exports only the specified item.
(Key in these keywords in all caps. See examples below.)
-d or --dest Destination where you want the exported items to be placed. Include the path if necessary.
-n or --number Sequence number to begin exporting the items with. The number you give becomes the name of the
first directory created for your export. The layout of the export is the same as the layout for an import.
-m or --migrate Export the item/collection for migration. This removes the handle and the metadata that will be
re-created in the new instance of DSpace.
-h or --help Brief help.
Exporting a Collection
Short form:
The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the
database ID or the handle. The exporter will begin numbering the simple archives with the sequence number
that you supply. To export a single item use the keyword ITEM and give the item ID as an argument:
Short form:
Each exported item will have an additional file in its directory, named 'handle'. This will contain the handle that
was assigned to the item, and this file will be read by the importer so that items exported and then imported to
another machine will retain the item's original handle.
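To check which handles an export contains before importing it elsewhere, the per-item 'handle' files can be collected with a few lines of Python. This is a minimal sketch; read_exported_handles is a hypothetical helper, and the demo builds a throwaway directory with made-up handles rather than reading a real export.

```python
import tempfile
from pathlib import Path


def read_exported_handles(export_dir):
    """Map each exported item subdirectory name to the handle stored in its 'handle' file."""
    handles = {}
    for item_dir in sorted(Path(export_dir).iterdir()):
        handle_file = item_dir / "handle"
        if item_dir.is_dir() and handle_file.is_file():
            handles[item_dir.name] = handle_file.read_text(encoding="utf-8").strip()
    return handles


# Demo on a temporary directory that mimics an export of two items.
with tempfile.TemporaryDirectory() as tmp:
    for name, handle in [("item_000", "123456789/15"), ("item_001", "123456789/16")]:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "handle").write_text(handle + "\n", encoding="utf-8")
    print(read_exported_handles(tmp))
```

Item directories without a 'handle' file are skipped rather than reported as errors.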
The -m Argument
Using the -m argument will export the item/collection and also perform the migration step. It will perform the
same process that the next section, Exchanging Content Between Repositories, describes. We recommend reading
that section before using this flag.
The procedures below will not import the actual bitstreams into DSpace. They will merely inform
DSpace of an existing location where these Bitstreams can be found. Please refer to Importing and
Exporting Items via Simple Archive Format for information on importing metadata and bitstreams.
Overview
Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by
taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be that
there is a repository for existing digital assets. Rather than using the normal interactive ingest process or the
batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the
metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish
registration.
Accessible Storage
To register an item its bitstreams must reside on storage accessible to DSpace and therefore referenced by an
asset store number in dspace.cfg. The configuration file dspace.cfg establishes one or more asset stores
through the use of an integer asset store number. This number relates to a directory in the DSpace host's file
system or a set of SRB account parameters. This asset store number is described in The dspace.cfg
Configuration Properties File section and in the dspace.cfg file itself. The asset store number(s) used for
registered items should generally not be the value of the assetstore.incoming property since it is unlikely that
you will want to mix the bitstreams of normally ingested and imported items and registered items.
The DSpace Simple Archive Format for registration does not include the actual content files (bitstreams) being
registered. The format is however a directory full of items to be registered, with a subdirectory per item. Each
item directory contains a file for the item's descriptive metadata (dublin_core.xml) and a file listing the item's
content files (contents), but not the actual content files themselves.
The dublin_core.xml file for item registration is exactly the same as for regular item import.
The contents file, like that for regular item import, lists the item's content files, one content file per line, but each
line has one of the following formats:
-r -s n -f filepath
-r -s n -f filepath\tbundle:bundlename
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text
where
The command line for registration is just like the one for regular import:
The --workflow and --test flags will function as described in Importing Items.
The --delete flag will function as described in Importing Items but the registered content files will not be
removed from storage. See Deleting Registered Items.
The --replace flag will function as described in Importing Items but care should be taken to consider different
cases and implications. With old items and new items being registered or ingested normally, there are four
combinations or cases to consider. Foremost, an old registered item deleted from DSpace using --replace
will not be removed from the storage where it resides. See Deleting Registered Items. A new item added to
DSpace using --replace will be ingested normally or will be registered depending on whether or not it is
marked in the contents files with the -r.
First, the randomly generated internal ID is not used because DSpace does not control the file path and name
of the bitstream. Instead, the file path and name are that specified in the contents file.
Second, the store_number column of the bitstream database row contains the asset store number specified in
the contents file.
Third, the internal_id column of the bitstream database row contains a leading flag (-R) followed by the
registered file path and name. For example, -Rfilepath where filepath is the file path and name relative to
the asset store corresponding to the asset store number. The asset store could be traditional storage in the
DSpace server's file system or an SRB account.
Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file
is in remote storage (say, SRB) a checksum is calculated on just the file name! This is an efficiency choice
since registering a large number of large files that are in SRB would consume substantial network resources
and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's
metadata catalog (MCAT) for rapid retrieval. SRB offers such an option but it's not yet in production release.
Registered items and their bitstreams can be retrieved transparently just like normally ingested items.
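The checksum rule in the fourth point can be illustrated as follows. This is a sketch of the described behavior, not DSpace's implementation: the function name registration_checksum is hypothetical, and hashing the path string exactly as given for remote files is an assumption about what "just the file name" means.

```python
import hashlib
import os
import tempfile


def registration_checksum(file_path, is_local):
    """Return an MD5 hex digest following the local/remote rule described above."""
    digest = hashlib.md5()
    if is_local:
        # Local storage: read the registered file and hash its content.
        with open(file_path, "rb") as stream:
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
    else:
        # Remote storage (e.g. SRB): the file is not read, only its name is hashed.
        digest.update(file_path.encode("utf-8"))
    return digest.hexdigest()


# Demo: a temporary local file is hashed by content, a "remote" path by name only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
local_digest = registration_checksum(tmp.name, is_local=True)
os.remove(tmp.name)
remote_digest = registration_checksum("collection/file_1.pdf", is_local=False)
print(local_digest)
print(remote_digest)
```

A checksum computed from the name alone says nothing about the file's content, so it cannot be used to detect corruption of remotely stored bitstreams.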
This functionality is an extension of that provided by Importing and Exporting Items via Simple Archive Format,
so please read that section before continuing. It is underpinned by the Biblio Transformation Engine
(https://github.com/EKT/Biblio-Transformation-Engine).
The basic idea behind the BTE is a standard workflow that consists of three steps: a data loading step, a
processing step (record filtering and modification) and an output generation step. A data loader provides the system
with a set of Records, the processing step is responsible for filtering or modifying these records, and the output
generator outputs them in the appropriate format.
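The three-step workflow can be pictured with a small, language-neutral sketch. The Python below does not use the BTE API (BTE is a Java library); every name in it is hypothetical and records are plain dictionaries. It only shows the shape of the idea: load, run the processing steps in order, then generate output.

```python
def run_workflow(load, steps, generate):
    """Load records, pass them through each processing step in order, then generate output."""
    records = load()
    for step in steps:
        records = step(records)
    return generate(records)


def load_sample():
    # Data loading step: provide a set of records.
    return [{"title": "a tale of two cities"}, {"title": ""}]


def keep_with_title(records):
    # Filter step: drop records that have no title.
    return [r for r in records if r.get("title")]


def uppercase_title(records):
    # Modifier step: return changed copies, leaving the input records untouched.
    return [{**r, "title": r["title"].upper()} for r in records]


def to_lines(records):
    # Output generation step: emit one line per record.
    return [r["title"] for r in records]


result = run_workflow(load_sample, [keep_with_title, uppercase_title], to_lines)
print(result)  # ['A TALE OF TWO CITIES']
```

With an empty list of steps, every loaded record reaches the output generator unchanged.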
The standard BTE version offers several predefined Data Loaders as well as Output Generators for basic
bibliographic formats. However, Spring Dependency Injection can be utilized to load custom data loaders,
filters, modifiers and output generators.
BTE in DSpace
The functionality of batch importing items in DSpace using the BTE has been incorporated in the
"import" script already used in DSpace for years.
In the import script, there is a new option (option "-b") to import using the BTE and an option -i to declare the
type of the input format. All the other options are the same apart from option "-s" that in this case points to a file
(and not a directory as it used to) that is the file of the input data. However, in the case of batch BTE import, the
option "-s" is not obligatory since you can configure the input from the Spring XML configuration file discussed
later on. Keep in mind, that if option "-s" is defined, import will take that option into consideration instead of the
one defined in the Spring XML configuration.
Thus, to import metadata from the various input formats use the following commands:
Input Command
Keep in mind that the value of the "-e" option must be a valid email of a DSpace user and value of the "-c"
option must be the target collection handle. Attached, you can find a .zip file (sample-files.zip) that includes
examples of the file formats that are mentioned above.
BTE Configuration
The basic idea behind BTE is that the system holds the metadata in an internal format using a specific key for
each metadata field. DataLoaders load the record using the aforementioned keys, while the output generator
needs to map these keys to DSpace metadata fields.
The BTE configuration file is located in path: [dspace]/config/spring/api/bte.xml and it's a Spring
XML configuration file that consists of Java beans. (If these terms are unknown to you, please refer to Spring
Dependency Injection web site for more information.)
Explanation of beans:
This is the top level bean that describes the service of the batch import from the various external metadata
formats. It accepts three properties:
a) dataLoaders: a list of all the possible data loaders that are supported. Keep in mind that for each data loader
we specify a key that can be used as the value of option "-i" in the import script that we mentioned earlier. Here
is the point where you would add a new custom DataLoader in case the default ones do not match your needs.
b) outputMap: a Map between the internal keys that BTE service uses to hold metadata and the DSpace
metadata fields. (See later on, how data loaders specify the keys that BTE uses to hold the metadata)
c) transformationEngine: the BTE transformation engine that actually consists of the processing steps that
will be applied to metadata during their import to DSpace
This bean is instantiated when the batch import takes place. It deploys a new BTE transformation engine that
will do the transformation from one format to the other. It needs one input argument, the workflow (the
processing step mentioned before) that will run when transformation takes place. Normally, you don't need to
modify this bean.
This bean describes the processing steps. Currently, there are no processing steps meaning that all records
loaded by the data loader will pass to the output generator, unfiltered and unmodified. ( See next section "Case
studies" for info about how to add a filter or a modifier )
These data loaders are of two types: "file" data loaders and "online" data loaders. The first 8 of them belong to
file data loaders while the last one (OAI data loader) is an online one.
a) filename: it is a String that specifies the filepath to the file that the loader will read data from. If you specify
this property, you do not need to give the option "-s" to the import script in the command prompt. If you,
however, specify this property and you also provide a "-s" option in the command line, the option "-s" will be
taken into consideration by the data loader.
b) fieldMap: it is a map that specifies the mapping between the keys that hold the metadata in the input file and
the ones that we want to have internally in the BTE. This mapping is very important because the internal keys
need to be declared in the "outputMap" of the "DataLoaderService" bean. Be aware that each data loader has
its own input file keys. For example, the RIS loader uses the keys "T1, AU, SO ... " while the TSV and CSV loaders
use the index number of the column in which the value resides.
CSV and TSV (which is actually a CSV loader if you look carefully at the class value of the bean) loaders have
some more properties:
a) skipLines: A number that specifies the first line of the file from which the loader will start reading data. For
example, if you have a CSV file in which the first row contains the column names and the second row is empty, the
value of this property must be 2 so that the loader starts reading from row 2 (counting rows from 0). The default
value for this property is 0.
b) separator: A value to specify the separator between the values in the same row in order to make the
columns. For example, in a TSV data loader this value is "\u0009" which is the "Tab" character. The default
value is "," and that is why the CSV data loader doesn't need to specify this property.
c) quoteChar: This property specifies the quote character used in the CSV file. The default value is the double
quote character (").
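The effect of these three properties can be tried out with Python's standard csv module. This is only an analogy to make the semantics concrete; it is not how the BTE loader is implemented, and the function and parameter names below are chosen for this example.

```python
import csv
import io


def load_rows(text, skip_lines=0, separator=",", quote_char='"'):
    """Read delimited text, ignoring the first skip_lines lines."""
    # Rows are counted from 0, so skip_lines=2 starts reading at row 2.
    lines = io.StringIO(text).readlines()[skip_lines:]
    reader = csv.reader(lines, delimiter=separator, quotechar=quote_char)
    return list(reader)


# Row 0 holds the column names and row 1 is empty, so reading starts at row 2.
csv_text = 'title,author\n\n"Cities, A Tale of Two",Dickens\n'
print(load_rows(csv_text, skip_lines=2))

# A TSV file only differs in its separator, the Tab character.
tsv_text = "Title One\tAuthor One\n"
print(load_rows(tsv_text, separator="\t"))
```

The quoted first value shows why the quote character matters: the comma inside the quotes is kept as part of the title instead of starting a new column.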
a) fieldMap: Same as above, the mapping between the input keys holding the metadata and the ones that we
want to have internal in BTE.
b) serverAddress: The base address of the OAI provider (server). The base address can also be specified in the "-s"
option of the command prompt. If it is specified in both places, the one specified on the command line is
preferred.
Since DSpace administrators may have incorporated their own metadata schema within DSpace (apart from the
default Dublin Core schema), they may need to configure BTE to match their custom schemas.
So, in case you need to process more metadata fields than those that are specified by default, you need to
change the data loader configuration and the output map.
I can see more beans in the configuration file that are not explained above. Why is this?
The configuration file hosts options for two services. BatchImport service and SubmissionLookup
service. Thus, some beans that are not used for the latter, are not mentioned in this documentation.
However, since both services are based on the BTE, some beans are used by both services.
UI for administrators
Batch import of files can be done via the administrative UI. While logged in as an administrator, visit the "Administer"
link and then, under the "Content" drop-down menu, choose "Batch import". You can find more information here
Keep in mind that the type drop-down menu includes the Simple Archive Format discussed earlier and all
the supported data loaders declared in the configuration XML file that are of type "file". Thus, the OAI data loader is
not included in this list, and in case you need to create your own data loader you are advised to extend the
"FileDataLoader" abstract class rather than implement the "DataLoader" interface, as mentioned in previous
paragraph.
The whole procedure can take a long time to complete for large input files, so it runs in
the background in a separate thread. When the thread completes (either successfully or with an error), the
user is informed of the status of the import via email.
Case Studies
1) I have my data in a format different from the ones that are supported by this functionality. What can I
do?
You can either transform your data into one of the supported formats or create a new data
loader. To do this, create a new Java class that implements the following Java interface from BTE:
gr.ekt.bte.core.DataLoader
in which you have to create records - most probably you will need to create your own Record class (by
implementing the gr.ekt.bte.core.Record interface) and fill a RecordSet. Feel free to add whatever code you like
in this method, even to read data from multiple sources. All you need is just to return a RecordSet of Records.
gr.ekt.bte.core.dataloader.FileDataLoader
if you want to create a "file" data loader in which you need to pass a filepath to the file that the loader will read
the data from. Normally, a simple data loader is enough for the system to work, but file data loaders are also
utilized in the administration UI discussed later in this documentation.
After that, you will need to declare the new DataLoader in the Spring XML configuration file (in the bean with
id="org.dspace.app.itemimport.BTEBatchImportService") using your own unique key. Use this key as a value
for option "-i" in the batch import in order to specify that the specific data loader must run.
2) I need to filter some of the input records or modify some value from records before outputting them
In this case you will need to create your own filters and modifiers.
To create a new filter, you need to extend the following BTE abstract class:
gr.ekt.bte.core.AbstractFilter
Return false if the specified record needs to be filtered, otherwise return true.
To create a new modifier, you need to extend the following BTE abstract class:
gr.ekt.bte.core.AbstractModifier
within you can make any changes you like in the record. You can use the Record methods to get the values for
a specific key and load new ones. (For the latter, you need to make the Record mutable.)
After you create your own filters or modifiers you need to add them in the Spring XML configuration file as in the
following example:
You can add as many filters and modifiers as you like to batchImportLinearWorkflow; they will run one after the
other in the specified order.
Usage
Command used: [dspace]/bin/dspace structure-builder
Argument: short and long (if available) forms: Description of the argument
<import_structure>
    <community>
        <name>Community Name</name>
        <description>Descriptive text</description>
        <intro>Introductory text</intro>
        <copyright>Special copyright notice</copyright>
        <sidebar>Sidebar text</sidebar>
        <community>
            <name>Sub Community Name</name>
            <community> ...[ad infinitum]...
            </community>
        </community>
        <collection>
            <name>Collection Name</name>
            <description>Descriptive text</description>
            <intro>Introductory text</intro>
            <copyright>Special copyright notice</copyright>
            <sidebar>Sidebar text</sidebar>
            <license>Special licence</license>
            <provenance>Provenance information</provenance>
        </collection>
    </community>
</import_structure>
<import_structure>
    <community identifier="123456789/1">
        <name>Community Name</name>
        <description>Descriptive text</description>
        <intro>Introductory text</intro>
        <copyright>Special copyright notice</copyright>
        <sidebar>Sidebar text</sidebar>
        <community identifier="123456789/2">
            <name>Sub Community Name</name>
            <community identifier="123456789/3"> ...[ad infinitum]...
            </community>
        </community>
        <collection identifier="123456789/4">
            <name>Collection Name</name>
            <description>Descriptive text</description>
            <intro>Introductory text</intro>
            <copyright>Special copyright notice</copyright>
            <sidebar>Sidebar text</sidebar>
            <license>Special licence</license>
            <provenance>Provenance information</provenance>
        </collection>
    </community>
</import_structure>
This command-line tool gives you the ability to import a community and collection structure directly from a
source XML file. It is executed as follows:
This will examine the contents of source.xml, import the structure into DSpace while logged in as the supplied
administrator, and then output the same structure to the output file, but including the handle for each imported
community and collection as an attribute.
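Since the output file carries the assigned handle of every community and collection in an identifier attribute, it can be post-processed to produce a simple listing. The following is a minimal sketch; collect_handles is a hypothetical helper and the embedded XML is a shortened, made-up sample in the output format shown in this section.

```python
import xml.etree.ElementTree as ET


def collect_handles(xml_text):
    """Return (kind, name, handle) for every community and collection in the output XML."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root.iter():  # document order, parents before children
        if node.tag in ("community", "collection"):
            found.append((node.tag, node.findtext("name"), node.get("identifier")))
    return found


sample_output = """
<import_structure>
    <community identifier="123456789/1">
        <name>Community Name</name>
        <community identifier="123456789/2">
            <name>Sub Community Name</name>
        </community>
        <collection identifier="123456789/4">
            <name>Collection Name</name>
        </collection>
    </community>
</import_structure>
"""

for kind, name, handle in collect_handles(sample_output):
    print(kind, name, handle)
```

The collection handles found this way are the values that a later batch item import expects as its target collection.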
Limitations
Currently this does not export community and collection structures, although it should only be a small
modification to make it do so.
SWORD is based on the Atom Publishing Protocol and allows service documents to be requested which describe
the structure of the repository, and packages to be deposited.
Configuration File: [dspace]/config/modules/sword-server.cfg
Property: mets-ingester.package-ingester
Informational Note: The property key tells the SWORD METS implementation which package ingester to use to
install deposited content. This should refer to one of the classes configured for:
plugin.named.org.dspace.content.packager.PackageIngester
The value of sword.mets-ingester.package-ingester tells the system which named plugin for
this interface should be used to ingest SWORD METS packages.
Properties: mets.default.ingest.crosswalk.EPDCX
mets.default.ingest.crosswalk.*
(NOTE: These configs are in the dspace.cfg file as they are used by many interfaces)
Informational Note: Define the metadata types which can be accepted/handled by SWORD during ingest of a
package. Currently, EPDCX (EPrints DC XML) is the recommended default metadata format, but others are
supported.
Property: crosswalk.submission.EPDCX.stylesheet
(NOTE: This configuration is in the dspace.cfg file)
Informational Note: Define the stylesheet which will be used by the self-named XSLTIngestionCrosswalk class
when asked to load the SWORD configuration (as specified above). This will use the specified stylesheet to
crosswalk the incoming SWAP metadata to the DIM format for ingestion.
Property: deposit.url
Example Value: deposit.url = http://www.myu.ac.uk/sword/deposit
Informational Note: The base URL of the SWORD deposit. This is the URL from which DSpace will construct
the deposit location URLs for collections. The default is ${dspace.baseUrl}/sword/deposit (where
dspace.baseUrl is defined in your dspace.cfg file). In the event that you are not deploying DSpace as the
ROOT application in the servlet container, this will generate incorrect URLs, and you should override the
functionality by specifying in full as shown in the example value.
Property: servicedocument.url
Example Value: servicedocument.url = http://www.myu.ac.uk/sword/servicedocument
Informational Note: The base URL of the SWORD service document. This is the URL from which DSpace will
construct the service document location URLs for the site, and for individual collections. The default is
${dspace.baseUrl}/sword/servicedocument (where dspace.baseUrl is defined in your dspace.cfg file). In the
event that you are not deploying DSpace as the ROOT application in the servlet container, this will generate
incorrect URLs, and you should override the functionality by specifying in full as shown in the example value.
Property: media-link.url
Example Value: media-link.url = http://www.myu.ac.uk/sword/media-link
Informational Note: The base URL of the SWORD media links. This is the URL which DSpace will use to
construct the media link URLs for items which are deposited via sword. The default is
${dspace.baseUrl}/sword/media-link (where dspace.baseUrl is defined in your dspace.cfg file). In the event
that you are not deploying DSpace as the ROOT application in the servlet container, this will generate
incorrect URLs, and you should override the functionality by specifying in full as shown in the example value.
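The three URL properties above follow the same default pattern. The sketch below only restates that pattern so the defaults can be compared with the explicit values you may need to configure; default_sword_urls is a hypothetical helper and the base URL is an example value.

```python
def default_sword_urls(base_url):
    """Return the default SWORD URLs derived from dspace.baseUrl, as described above."""
    base = base_url.rstrip("/")  # avoid a double slash if the base URL ends with one
    return {
        "deposit.url": base + "/sword/deposit",
        "servicedocument.url": base + "/sword/servicedocument",
        "media-link.url": base + "/sword/media-link",
    }


for key, value in default_sword_urls("http://www.myu.ac.uk").items():
    print(key, "=", value)
```

If the printed defaults do not match the addresses at which your SWORD web application is actually reachable, set the three properties explicitly in sword-server.cfg.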
Property: generator.url
Example Value: generator.url = http://www.dspace.org/ns/sword/1.3.1
Informational Note: The URL which identifies the SWORD software which provides the sword interface. This is
the URL which DSpace will use to fill out the atom:generator element of its atom documents. The default is:
http://www.dspace.org/ns/sword/1.3.1. If you have modified your SWORD software, you should change this URI
to identify your own version. If you are using the standard 'dspace-sword' module you will not, in general,
need to change this setting.
Property: updated.field
Informational Note: The metadata field in which to store the updated date for items deposited via SWORD.
Property: slug.field
Informational Note: The metadata field in which to store the value of the slug header if it is supplied.
Properties: accept-packaging.METSDSpaceSIP.identifier
accept-packaging.METSDSpaceSIP.q
Example Value: accept-packaging.METSDSpaceSIP.identifier = http://purl.org/net/sword-types/METSDSpaceSIP
accept-packaging.METSDSpaceSIP.q = 1.0
Informational Note: The accept packaging properties, along with their associated quality values where
appropriate. This is a Global Setting; these will be used on all DSpace collections.
Property: accepts
Informational Note: A comma separated list of MIME types that SWORD will accept.
Properties: accept-packaging.[handle].METSDSpaceSIP.identifier
accept-packaging.[handle].METSDSpaceSIP.q
Example Value: accept-packaging.[handle].METSDSpaceSIP.identifier = http://purl.org/net/sword-types/METSDSpaceSIP
accept-packaging.[handle].METSDSpaceSIP.q = 1.0
Informational Note: Collection Specific settings: these will be used on the collections with the given handles.
Property: expose-items
Informational Note: Should the server offer up items in collections as sword deposit targets. This will be
effected by placing a URI in the collection description which will list all the allowed items for the
depositing user in that collection on request. NOTE: this will require an implementation of deposit onto
items, which will not be forthcoming for a short while.
Property: expose-communities
Informational Note: Should the server offer as the default the list of all Communities to a Service Document
request. If false, the server will offer the list of all collections, which is the default and recommended
behavior at this stage. NOTE: a service document for Communities will not offer any viable deposit targets,
and the client will need to request the list of Collections in the target before deposit can continue.
Property: max-upload-size
Example Value: max-upload-size = 0
Informational Note: The maximum upload size of a package through the sword interface, in bytes. This will be the combined size of all the files, the metadata and any manifest data. It is NOT the same as the maximum size set for an individual file upload through the user interface. If not set, or set to 0, the sword service will default to no limit.
Property: keep-original-package
Informational Note: Whether or not DSpace should store a copy of the original sword deposit package. NOTE: this will cause the deposit process to run slightly slower, and will accelerate the rate at which the repository consumes disk space. BUT, it will also mean that the deposited packages are recoverable in their original form. It is strongly recommended, therefore, to leave this option turned on. When set to "true", this requires that the configuration option upload.temp.dir (in dspace.cfg) is set to a valid location.
Property: bundle.name
Informational Note: The bundle name that SWORD should store incoming packages under if sword.keep-original-package is set to true. The default is "SWORD" if no value is set.
Properties: keep-package-on-fail
failed-package.dir
Example Value: keep-package-on-fail=true
failed-package.dir=${dspace.dir}/upload
Informational Note: In the event of package ingest failure, provide an option to store the package on the file system. The default is false.
Property: identify-version
Informational Note: Should the server identify the sword version in a deposit response. It is recommended to leave this unchanged.
Property: on-behalf-of.enable
Informational Note: Should mediated deposit via sword be supported. If enabled, this will allow users to deposit content packages on behalf of other users.
Property: restore-mode.enable
Informational Note: Should the sword server enable restore-mode when ingesting new packages. If this is enabled the item will be treated as a previously deleted item from the repository. If the item had previously been assigned a handle then that same handle will be restored to activity. If that item had not previously been assigned a handle, then a new handle will be assigned.
Property: plugin.named.org.dspace.sword.SWORDIngester
Example Value: plugin.named.org.dspace.sword.SWORDIngester = \
org.dspace.sword.SWORDMETSIngester = http://purl.org/net/sword-types/METSDSpaceSIP \
org.dspace.sword.SimpleFileIngester = SimpleFileIngester
Informational Note: Configure the plugins to process incoming packages. The form of this configuration is as per the Plugin Manager's Named Plugin documentation: plugin.named.[interface] = [implementation] = [package format identifier] (see dspace.cfg).
Package ingesters should implement the SWORDIngester interface, and will be loaded when a package of the format specified above in: accept-packaging.[package format].identifier = [package format identifier] is received. In the event that this is a simple file deposit, with no package format, then the class named by "SimpleFileIngester" will be loaded and executed where appropriate. This case will only occur when a single file is being deposited into an existing DSpace Item.
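As a rough illustration of the selection rule described above, the following Java sketch maps package format identifiers to ingester class names and falls back to the entry named "SimpleFileIngester" when a deposit carries no package format. The class IngesterSelectionSketch and its methods are hypothetical and exist only for this example; DSpace itself resolves these names through its Plugin Manager, which this sketch does not use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; DSpace resolves ingesters through its Plugin Manager.
final class IngesterSelectionSketch {

    // Name under which the simple file ingester is registered in the example value above.
    static final String SIMPLE_FILE_KEY = "SimpleFileIngester";

    private IngesterSelectionSketch() {
    }

    /** Builds the name-to-class mapping shown in the example configuration value. */
    static Map<String, String> exampleMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("http://purl.org/net/sword-types/METSDSpaceSIP",
                "org.dspace.sword.SWORDMETSIngester");
        mapping.put(SIMPLE_FILE_KEY, "org.dspace.sword.SimpleFileIngester");
        return mapping;
    }

    /**
     * Returns the ingester class name for a package format identifier. When no
     * package format was supplied, the simple file ingester is returned instead.
     * Returns null if the supplied format has no configured ingester.
     */
    static String ingesterFor(Map<String, String> mapping, String packageFormat) {
        if (packageFormat == null || packageFormat.isEmpty()) {
            return mapping.get(SIMPLE_FILE_KEY);
        }
        return mapping.get(packageFormat);
    }
}
```

Calling ingesterFor(exampleMapping(), null) in this sketch yields the simple file ingester, mirroring the "no package format" case in the note.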
SWORD is based on the Atom Publish Protocol and allows service documents to be requested which describe
the structure of the repository, and packages to be deposited.
Property: url
Example Value: url = http://www.myu.ac.uk/swordv2
Configuration File: [dspace]/config/modules/swordv2-server.cfg
Informational Note: The base url of the SWORD 2.0 system. This defaults to ${dspace.baseUrl}/swordv2 (where dspace.baseUrl is defined in your dspace.cfg file).
Property: collection.url
Example Value: collection.url = http://www.myu.ac.uk/swordv2/collection
Informational Note: The base URL of the SWORD collection. This is the URL from which DSpace will construct the deposit loca[...] defaults to ${dspace.baseUrl}/swordv2/collection (where dspace.baseUrl is defined in your dspace.cfg file).
Property: servicedocument.url
Example Value: servicedocument.url = http://www.myu.ac.uk/swordv2/servicedocument
Informational Note: The base URL of the SWORD service document. This is the URL from which DSpace will construct the service document location urls for the site, and for individual collections. This defaults to ${dspace.baseUrl}/swordv2/servicedocument (where dspace.baseUrl is defined in your dspace.cfg file).
Property: accept-packaging.collection
Example Value: accept-packaging.collection.METSDSpaceSIP = http://purl.org/net/sword/package/METSDSpaceSIP
accept-packaging.collection.SimpleZip = http://purl.org/net/sword/package/SimpleZip
accept-packaging.collection.Binary = http://purl.org/net/sword/package/Binary
Informational Note: The accept packaging properties, along with their associated quality values where appropriate.
Property: accept-packaging.item
Example Value: accept-packaging.item.METSDSpaceSIP = http://purl.org/net/sword/package/METSDSpaceSIP
accept-packaging.item.SimpleZip = http://purl.org/net/sword/package/SimpleZip
accept-packaging.item.Binary = http://purl.org/net/sword/package/Binary
Informational Note: The accept packaging properties for items. It is possible to configure this for specific collections by adding th[...] the setting, for example accept-packaging.collection.[handle].METSDSpaceSIP = http://pu[...]/METSDSpaceSIP
Property: accepts
Example Value: accepts = application/zip, image/jpeg
Informational Note: A comma-separated list of MIME types that SWORD will accept. To accept all MIME types, the value can be [...]
Property: expose-communities
Example Value: expose-communities = false
Informational Note: Whether or not the server should expose a list of all the communities to a service document request. As dep[...] collection, it is recommended to leave this set to false.
Property: max-upload-size
Example Value: max-upload-size = 0
Informational Note: The maximum upload size of a package through the SWORD interface (measured in bytes). This will be the [...] metadata, and manifest file in a package - this is different to the maximum size of a single bitstream.
Property: keep-original-package
Example Value: keep-original-package = true
Informational Note: This will cause the deposit process to be slightly slower and for more disk to be used, however original files [...] recommended to leave this option enabled.
Property: bundle.name
Example Value: bundle.name = SWORD
Informational Note: The bundle name that SWORD should store incoming packages within if keep-original-package is se[...]
Property: bundle.deleted
Example Value: bundle.deleted = DELETED
Informational Note: The bundle name that SWORD should use to store deleted bitstreams if versions.keep is set to true. Th[...] individual files are updated or removed via SWORD. If the entire Media Resource (files in the ORIGINAL bu[...] backed up in its entirety in a bundle of its own
Property: keep-package-on-fail
Example Value: keep-package-on-fail = false
Informational Note: In the event of package ingest failure, provide an option to store the package on the file system. The default [...] set using the failed-package-dir setting.
Property: failed-package-dir
Example Value: failed-package-dir = /dspace/upload
Informational Note: If keep-package-on-fail is set to true, this is the location where the package would be stored.
Property: on-behalf-of.enable
Example Value: on-behalf-of.enable = true
Informational Note: Should DSpace accept mediated deposits? See the SWORD specification for a detailed explanation of depo[...]
Property: on-behalf-of.update.mediators
Example Value: on-behalf-of.update.mediators = admin@mydspace.edu, mediator@mydspace.edu
Informational Note: Which user accounts are allowed to do updates on items which already exist in DSpace, on-behalf-of other [...]
If this is left blank, or omitted, then all accounts can mediate updates to items, which could be a security risk [...] checking that the authenticated user is a "legitimate" mediator
Property: verbose-description.receipt.enable
Example Value: verbose-description.receipt.enable = false
Informational Note: Should the deposit receipt include a verbose description of the deposit? For use by developers - recommen[...] systems
Property: verbose-description.error.enable
Example Value: verbose-description.error.enable = true
Informational Note: Should the error document include a verbose description of the error? For use by developers, although you [...] to "true" for production systems
Property: error.alternate.url
Example Value: error.alternate.url = http://mydspace.edu/xmlui/contact
Informational Note: The error document can contain an alternate url, which the client can use to follow up any issues. For exam[...] Contact-Us page on the XMLUI
Property: error.alternate.content-type
Example Value: error.alternate.content-type = text/html
Informational Note: The error.alternate.url may have an associated content type, such as text/html if it points to a w[...] indicate to the client what content type it can expect if it follows that url.
Property: generator.url
Example Value: generator.url = http://www.dspace.org/ns/sword/2.0/
Informational Note: The URL which identifies DSpace as the software that is providing the SWORD interface.
Property: generator.version
Example Value: generator.version = 2.0
Property: auth-type
Example Value: auth-type = Basic
Informational Note: Which form of authentication to use. Normally this is set to Basic in order to use HTTP Basic.
Property: upload.tempdir
Example Value: upload.tempdir = /dspace/upload
Informational Note: The location where uploaded files and packages are stored while being processed.
Property: updated.field
Example Value: updated.field = dc.date.updated
Informational Note: The metadata field in which to store the updated date for items deposited via SWORD.
Property: slug.field
Example Value: slug.field = dc.identifier.slug
Informational Note: The metadata field in which to store the value of the slug header if it is supplied.
Property: author.field
Example Value: author.field = dc.contributor.author
Informational Note: The metadata field in which to store the value of the atom entry author if it is supplied.
Property: title.field
Example Value: title.field = dc.title
Informational Note: The metadata field in which to store the value of the atom entry title if it is supplied.
Property: disseminate-packaging
Example Value: disseminate-packaging.METSDSpaceSIP = http://purl.org/net/sword/package/METSDSpaceSIP
disseminate-packaging.SimpleZip = http://purl.org/net/sword/package/SimpleZip
Property: statement.bundles
Example Value: statement.bundles = ORIGINAL, SWORD, LICENSE
Informational Note: Which bundles should the Statement include in its list of aggregated resources? The Statement will automa[...] which are in the bundle identified by the ${bundle.name} property, provided that bundle is also listed her[...] Deposits to be listed in the Statement then you should add the SWORD bundle to this list)
Property: plugin.single.org.dspace.sword2.WorkflowManager
Example Value: plugin.single.org.dspace.sword2.WorkflowManager = org.dspace.sword2.WorkflowManagerDefault
Property: workflowmanagerdefault.always-update-metadata
Example Value: workflowmanagerdefault.always-update-metadata = true
Informational Note: Should the WorkflowManagerDefault plugin allow updates to the item's metadata to take place on items wh[...] workspace (e.g. in the workflow, archive, or withdrawn)?
Property: workflowmanagerdefault.file-replace.enable
Example Value: workflowmanagerdefault.file-replace.enable = false
Property: mets-ingester.package-ingester
Example Value: mets-ingester.package-ingester = METS
Property: restore-mode.enable
Example Value: restore-mode.enable = false
Informational Note: Should the SWORD server enable restore-mode when ingesting new packages. If this is enabled the item w[...] deleted item from the repository. If the item has previously been assigned a handle then that same handle w[...]
Property: simpledc.*
Example Value: simpledc.abstract = dc.description.abstract
simpledc.date = dc.date
simpledc.rights = dc.rights
Property: atom.*
Example Value: atom.author = dc.contributor.author
Property: metadata.replaceable
Example Value: metadata.replaceable = dc.description.abstract, dc.rights, dc.title.alternative
Informational Note: Used by SimpleDCEntryIngester: Which metadata fields can be replaced during a PUT to the Item of an Ato[...] listed here are the ones which will be removed when a new PUT comes through (irrespective of whether the[...] replace them)
Property: multipart.entry-first
Example Value: multipart.entry-first = false
Informational Note: The order of precedence for importing multipart content. If this is set to true then metadata in the package [...] atom entry, otherwise the metadata in the atom entry will override that from the package.
Property: workflow.notify
Example Value: workflow.notify = true
Informational Note: If the workflow gets started (the collection being deposited into has a workflow configured), should a notifica[...]
Property: versions.keep
Example Value: versions.keep = true
Informational Note: When content is replaced, should the old version be kept? This creates a copy of the ORIGINAL bundle with [...] where YYYY-MM-DD is the date the copy was created, and X is an integer from 0 upwards.
Property: state.*
Example Value: state.workspace.uri = http://localhost:8080/xmlui/state/inprogress
state.workspace.description = The item is in the user workspace
state.workflow.uri = http://localhost:8080/xmlui/state/inreview
state.workflow.description = The item is undergoing review prior to acceptance in the archive
Informational Note: Pairs of states (URI and description) that items can be in. Typical states are workspace, workflow, arch[...]
Property: workspace.url-template
Example Value: workspace.url-template = http://mydspace.edu/xmlui/submit?workspaceID=#wsid#
Informational Note: URL template for links to items in the workspace (items in the archive will use the handle). The #wsid# url [...] the workspace id of the item. The example above shows how to construct this URL for XMLUI.
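To make the #wsid# placeholder concrete, here is a minimal Java sketch of how such a template could be expanded into a link for one workspace item. The class WorkspaceUrlTemplateSketch and its expand method are hypothetical names invented for this example, and plain string replacement is an assumption about the substitution, not a description of the DSpace implementation.

```java
// Hypothetical helper for illustration; not part of DSpace.
final class WorkspaceUrlTemplateSketch {

    // Placeholder token used in the workspace.url-template example value.
    private static final String PLACEHOLDER = "#wsid#";

    private WorkspaceUrlTemplateSketch() {
    }

    /** Replaces every #wsid# placeholder in the template with the workspace item id. */
    static String expand(String template, int workspaceId) {
        return template.replace(PLACEHOLDER, Integer.toString(workspaceId));
    }
}
```

With the example value above as the template, expand(template, 42) produces a URL ending in workspaceID=42.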
Other configuration options exist that define the mapping between mime types, ingesters, and disseminators. A
typical configuration looks like this:
plugin.named.org.dspace.sword2.SwordContentIngester = \
org.dspace.sword2.SimpleZipContentIngester = http://purl.org/net/sword/package/SimpleZip, \
org.dspace.sword2.SwordMETSIngester = http://purl.org/net/sword/package/METSDSpaceSIP, \
org.dspace.sword2.BinaryContentIngester = http://purl.org/net/sword/package/Binary
plugin.single.org.dspace.sword2.SwordEntryIngester = \
org.dspace.sword2.SimpleDCEntryIngester
plugin.single.org.dspace.sword2.SwordEntryDisseminator = \
org.dspace.sword2.SimpleDCEntryDisseminator
# note that we replace ";" with "_" as ";" is not permitted in the PluginManager names
plugin.named.org.dspace.sword2.SwordContentDisseminator = \
org.dspace.sword2.SimpleZipContentDisseminator = http://purl.org/net/sword/package/SimpleZip, \
org.dspace.sword2.FeedContentDisseminator = application/atom+xml, \
org.dspace.sword2.FeedContentDisseminator = application/atom+xml_type_feed
# note that we replace ";" with "_" as ";" is not permitted in the PluginManager names
plugin.named.org.dspace.sword2.SwordStatementDisseminator = \
org.dspace.sword2.AtomStatementDisseminator = atom, \
org.dspace.sword2.OreStatementDisseminator = rdf, \
org.dspace.sword2.AtomStatementDisseminator = application/atom+xml_type_feed, \
org.dspace.sword2.OreStatementDisseminator = application/rdf+xml
Web pages tend to consist of several files – one or more HTML files that contain references to each
other, and stylesheets and image files that are referenced by the HTML files.
Web pages also link to or include content from other sites, often imperceptibly to the end-user. Thus, in a
few years' time, when someone views the preserved Web site, they will probably find that many links are
now broken or refer to other sites that are now out of context. In fact, it may be unclear to an end-user
when they are viewing content stored in DSpace and when they are seeing content included from
another site, or have navigated to a page that is not stored in DSpace. This problem can manifest when
a submitter uploads some HTML content. For example, the HTML document may include an image from
an external Web site, or even their local hard drive. When the submitter views the HTML in DSpace, their
browser is able to use the reference in the HTML to retrieve the appropriate image, and so to the
submitter, the whole HTML document appears to have been deposited correctly. However, later on,
when another user tries to view that HTML, their browser might not be able to retrieve the included image
since it may have been removed from the external server. Hence the HTML will seem broken.
Often Web pages are produced dynamically by software running on the Web server, and represent the
state of a changing database underneath it.
Dealing with these issues is the topic of much active research. Currently, DSpace bites off a small,
tractable chunk of this problem. DSpace can store and provide on-line browsing capability for self-
contained, non-dynamic HTML documents. In practical terms, this means:
AIP Backup & Restore functionality only works with the Latest Version of Items
If you are using the AIP Backup and Restore functionality to back up / restore / migrate DSpace
Content, you must be aware that the "Item Level Versioning" feature is not yet compatible with AIP
Backup & Restore. Using them together may result in accidental data loss. Currently the AIPs that
DSpace generates only store the latest version of an Item. Therefore, past versions of Items will
always be lost when you perform a restore / replace using AIP tools. See DS-1382.
If you enable versioning, the name and email of the submitter are shown to all users by default in
Version history. The only way to circumvent this is to make Version history visible only to admins by
setting item.history.view.admin=true in [dspace]/config/modules/versioning.cfg.
See DS-1349 for ongoing work on a better solution.
Starting from DSpace 4.0, Item Level Versioning is also supported in JSPUI.
<!-- =====================
Item Level Versioning
===================== -->
<!-- To enable Item Level Versioning features, uncomment this aspect. -->
<aspect name="Versioning Aspect" path="resource://aspects/Versioning/" />
#---------------------------------------------------#
#------------ VERSIONING CONFIGURATIONS ------------#
#---------------------------------------------------#
# These configs are used by the versioning system #
#---------------------------------------------------#
#Parameter 'enabled' is used only by JSPUI
enabled=false
A new version can only be created from the latest available version.
When a new version has been created and still needs to pass certain steps of the workflow, it is temporarily
impossible to create another new version until the workflow steps are finished and the new version has
replaced the previous one.
1. Click "Create a new version" from the Context Menu in the navigation bar.
2. Provide the reason for creating a new version. It will later be stored and displayed in the version
summary.
3. Your new version is now created as a new Item in your Workspace. It requires you to go through the
submission and workflow steps like you would do for a normal, new submission to the collection. The
rationale behind this is that if you are adding new files or metadata, you will also need to accept the
license for them. In addition to this, the versioning functionality does not bypass any quality control
embedded in the workflow steps.
After the submission steps and the execution of subsequent workflow steps, the new version becomes available
in the repository.
4.13.6 Architecture
Versioning model
For every new Version a separate DSpace Item will be created that replicates the metadata, bundle and
bitstream records. The bitstream records will point to the same file on the disk.
The Cleanup method has been modified to retain the file if another Bitstream record points to it (the dotted lines
in the diagram represent a bitstream deleted in the new version), in other words the file will be deleted only if
the Bitstream record processed is the only one to point to the file (count(INTERNAL_ID)=1).
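The retention rule can be sketched in a few lines of Java. This is only an illustration of the count(INTERNAL_ID)=1 condition over an in-memory list; the class BitstreamCleanupSketch and its method are hypothetical, and the real Cleanup method works against the database rather than a list.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the retention rule; not the DSpace Cleanup code.
final class BitstreamCleanupSketch {

    private BitstreamCleanupSketch() {
    }

    /**
     * Decides whether the stored file may be removed.
     *
     * @param internalIds the INTERNAL_ID value of every Bitstream record, one entry per record
     * @param internalId  the INTERNAL_ID of the record currently being processed
     * @return true only when exactly one record points to the file
     */
    static boolean fileCanBeDeleted(List<String> internalIds, String internalId) {
        return Collections.frequency(internalIds, internalId) == 1;
    }
}
```

If two versions of an Item share a file, its internal id appears twice in the list, so the file is retained until only one record refers to it.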
Versioning Service
The Versioning Service will be responsible for the replication of one or more Items when a new version is
requested. The new version will not be preserved in the Repository until the database's transactional
window is completed. Thus, when errors arise in the versioning process, the database will be
kept in its original state and the application will report that an exception has occurred and needs
correction.
The Versioning Service will rely on a generic IdentifierService that is described below for minting and registering
any identifiers that are required to track the revision history of the Items.
Identifier Service
The Identifier Service maintains an extensible set of IdentifierProvider services that are responsible for three
important activities in Identifier management:
1. Resolution: IdentifierService acts in a manner similar to the existing HandleManager in DSpace, allowing
for resolution of DSpace Items from provided identifiers.
2. Minting: Minting is the act of reserving and returning an identifier that may be used with a specific
DSpaceObject.
3. Registering: Registering is the act of recording the existence of a minted identifier with an external
persistent resolver service. These services may reside on the local machine (HandleManager) or exist as
external services (PURL or EZID DOI registration services).
/**
 * Looks up the identifier of the requested type for the given DSpace object.
 *
 * @param context
 * @param dso
 * @param identifier the class of identifier to look up
 * @return
 */
String lookup(Context context, DSpaceObject dso, Class<? extends Identifier> identifier);

/**
 * This will resolve a DSpaceObject based on a provided Identifier. The Service will
 * interrogate the providers in no particular order and return the first successful
 * result discovered. If no resolution is successful, the method will return null.
 *
 * TODO: Verify null is returned.
 *
 * @param context
 * @param identifier
 * @return
 * @throws IdentifierNotFoundException
 * @throws IdentifierNotResolvableException
 */
DSpaceObject resolve(Context context, String identifier)
        throws IdentifierNotFoundException, IdentifierNotResolvableException;

/**
 * Reserves any identifiers necessary based on the capabilities of all providers in the
 * service.
 *
 * @param context
 * @param dso
 * @throws org.dspace.authorize.AuthorizeException
 * @throws java.sql.SQLException
 * @throws IdentifierException
 */
void reserve(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

/**
 * Used to reserve a specific identifier (for example a Handle, hdl:1234.5/6). The
 * provider is responsible for detecting and processing the appropriate identifier.
 * All providers are interrogated; multiple providers can process the same identifier.
 *
 * @param context
 * @param dso
 * @param identifier
 * @throws org.dspace.authorize.AuthorizeException
 * @throws java.sql.SQLException
 * @throws IdentifierException
 */
void reserve(Context context, DSpaceObject dso, String identifier)
        throws AuthorizeException, SQLException, IdentifierException;

/**
 * Registers identifiers for the given DSpace object.
 *
 * @param context
 * @param dso
 * @throws org.dspace.authorize.AuthorizeException
 * @throws java.sql.SQLException
 * @throws IdentifierException
 */
void register(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

/**
 * Delete (unbind) all identifiers registered for a specific DSpace item. Identifiers
 * are "unbound" across all providers in no particular order.
 *
 * @param context
 * @param dso
 * @throws org.dspace.authorize.AuthorizeException
 * @throws java.sql.SQLException
 * @throws IdentifierException
 */
void delete(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

/**
 * Used to delete a specific identifier (for example a Handle, hdl:1234.5/6). The
 * provider is responsible for detecting and processing the appropriate identifier.
 * All providers are interrogated; multiple providers can process the same identifier.
 *
 * @param context
 * @param dso
 * @param identifier
 * @throws org.dspace.authorize.AuthorizeException
 * @throws java.sql.SQLException
 * @throws IdentifierException
 */
void delete(Context context, DSpaceObject dso, String identifier)
        throws AuthorizeException, SQLException, IdentifierException;
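The "interrogate every provider and return the first successful result" behaviour documented for resolve can be sketched as follows. Everything in this block is a simplified, hypothetical stand-in: the Provider interface, the prefix-based matching, and the use of plain strings in place of DSpaceObject are assumptions made for illustration and do not reproduce the real IdentifierProvider API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of provider-based resolution; not the DSpace API.
final class IdentifierResolutionSketch {

    /** Minimal stand-in for an identifier provider. */
    interface Provider {
        /** Whether this provider recognises the identifier's form (for example an "hdl:" prefix). */
        boolean supports(String identifier);

        /** Returns a label for the resolved object, or null when the identifier is unknown. */
        String resolve(String identifier);
    }

    private IdentifierResolutionSketch() {
    }

    /** Creates a provider that handles one identifier prefix and resolves from a fixed map. */
    static Provider prefixProvider(final String prefix, final Map<String, String> known) {
        return new Provider() {
            @Override
            public boolean supports(String identifier) {
                return identifier.startsWith(prefix);
            }

            @Override
            public String resolve(String identifier) {
                return known.get(identifier);
            }
        };
    }

    /** Asks each provider in turn and returns the first successful result, or null if none succeeds. */
    static String resolveFirst(List<Provider> providers, String identifier) {
        for (Provider provider : providers) {
            if (provider.supports(identifier)) {
                String resolved = provider.resolve(identifier);
                if (resolved != null) {
                    return resolved;
                }
            }
        }
        return null;
    }
}
```

The sketch returns null when nothing resolves, matching the documented intent; the actual interface also declares IdentifierNotFoundException and IdentifierNotResolvableException, which are left out here.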
4.13.7 Configuration
[dspace_installation_dir]/config/spring/api/versioning-service.xml
In this file, you can specify which metadata fields are automatically "reset" (i.e. cleared out) during the creation
of a new item version. By default, all metadata values (and bitstreams) are copied over to the newly created
version, with the exception of dc.date.accessioned and dc.description.provenance. You may specify
additional metadata fields to reset by adding them to the "ignoredMetadataFields" property in the "versioning-
service.xml" file:
[dspace_installation_dir]/config/spring/api/identifier-service.xml
No changes to this file are required to enable Versioning. This file is currently only relevant if you aim to develop
your own implementation of versioning.
By default, all users will be able to see the version history. To ensure that only administrators can see the
Version History, set item.history.view.admin to true in the following configuration file (the default value, false, is shown):
[dspace_installation_dir]/config/modules/versioning.cfg
item.history.view.admin=false
One possible solution would be to present an end user with aggregated statistics across all viewers, and give
administrators the possibility to view statistics per version.
Therefore, discussion has illustrated that there is a use case for an intermediate exposure of version history that
hides the Editor column.
4.13.9 Credits
The initial contribution of Item Level Versioning to DSpace 3.0 was implemented by @mire with kind support
from:
MBLWHOI Library
Woods Hole Oceanographic Institution
Marine Biology Laboratory, Center for Library and Informatics, History and Philosophy of Science
program
Arizona State University, Center for Biology and Society
Dryad
Configuration
Customizing the JSP pages
4.14.1 Configuration
The user will need to refer to the extensive WebUI/JSPUI configurations that are contained in JSP Web
Interface Settings.
To make it even easier, DSpace allows you to 'override' the JSPs included in the source distribution with
modified versions, that are stored in a separate place, so when it comes to updating your site with a new
DSpace release, your modified versions will not be overwritten. It should be possible to dramatically change the
look of DSpace to suit your organization by just changing the CSS style file and the site 'skin' or 'layout' JSPs in
jsp/layout; if possible, it is recommended you limit local customizations to these files to make future upgrades
easier.
You can also easily edit the text that appears on each JSP page by editing the Messages.properties file.
However, note that unless you change the entry in all of the different language message files, users of other
languages will still see the default text for their language. See Internationalization in Application Layer.
Note that the data (attributes) passed from an underlying Servlet to the JSP may change between versions, so
you may have to modify your customized JSP to deal with the new data.
Thus, if possible, it is recommended you limit your changes to the 'layout' JSPs and the stylesheet.
If you wish to modify a particular JSP, place your edited version in the [dspace-source]/dspace/modules/jspui
/src/main/webapp/ directory (this is the replacement for the pre-1.5 /jsp/local directory), with the same path as
the original. If they exist, these will be used in preference to the default JSPs. For example:
[jsp.dir]/community-list.jsp [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/community-list.jsp
[jsp.dir]/mydspace/main.jsp [jsp.custom-dir]/dspace/modules/jspui/src/main/webapp/mydspace/main.jsp
Heavy use is made of a style sheet, styles.css. If you make edits, copy the local version to [jsp.custom-dir]
/dspace/modules/jspui/src/main/webapp/styles.css, and it will be used automatically in preference to the default,
as described above.
Fonts and colors can be easily changed using the stylesheet. The stylesheet is a JSP so that the user's browser
version can be detected and the stylesheet tweaked accordingly.
The 'layout' of each page, that is, the top and bottom banners and the navigation bar, are determined by the
JSPs /layout/header-*.jsp and /layout/footer-*.jsp. You can provide modified versions of these (in [jsp.custom-dir]
/dspace/modules/jspui/src/main/webapp/layout), or define more styles and apply them to pages by using the
"style" attribute of the dspace:layout tag.
1. Rebuild the DSpace installation package by running the following command from your [dspace-source]/dspace/ directory:
mvn package
2. Update all DSpace webapps to [dspace]/webapps by running the following command from your [dspace-source]/dspace/target/dspace-[version]-build.dir directory:
[...]
3. Copy the updated webapps to Tomcat:
cp -R /[dspace]/webapps/* /[tomcat]/webapps
4. Restart Tomcat
When you restart the web server you should see your customized JSPs.
Introduction
Common areas of localization
Enabling additional locales
Localization of email messages
Metadata localization
XMLUI specific localization
Message catalog
Where to find the message catalog
Where to edit
Difference with JSPUI
JSPUI specific localization
Message catalog
Where to find the message catalog
Where to edit
Localization of input-forms.xml and license.default
4.15.1 Introduction
DSpace ships with a number of interface translations. This page provides information on areas that can be
localized by means of configuration or customization. By default, DSpace will look at the user's browser
language. If it has a language file in the user's language, it will render the interface in that language. If not, it will
default to English or another default that you have configured.
webui.supported.locales
default.locale
You can change default.locale to a different one than English after adding it to webui.supported.locales.
[dspace]/config/emails
Metadata localization
DSpace associates each metadata field value with a language code (though it may be left empty, e.g. for
numeric values).
Message catalog
XMLUI supports multiple languages through the use of internationalization catalogues as defined by the Cocoon
Internationalization Transformer. Each catalog contains the translation of all user-displayed strings into a
particular language or variant. Each catalog is a single xml file whose name is based upon the language it is
designated for, thus:
messages_language_country_variant.xml
messages_language_country.xml
messages_language.xml
messages.xml
The interface will automatically determine which file to select based upon the user's browser and system
configuration. For example, if the user's browser is set to Australian English then first the system will check if
messages_en_au.xml is available. If this translation is not available it will fall back to messages_en.xml, and
finally if that is not available, messages.xml.
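The fallback order described above can be sketched in a few lines of Java. This is only an illustration of the naming scheme, not the code Cocoon or DSpace actually runs; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists the catalog file names to try, most specific first.
class CatalogFallback {
    static List<String> candidates(Locale locale) {
        List<String> names = new ArrayList<>();
        String lang = locale.getLanguage();
        // Lower-cased so the result matches the messages_en_au.xml example in the text.
        String country = locale.getCountry().toLowerCase(Locale.ROOT);
        String variant = locale.getVariant();
        if (!lang.isEmpty() && !country.isEmpty() && !variant.isEmpty()) {
            names.add("messages_" + lang + "_" + country + "_" + variant + ".xml");
        }
        if (!lang.isEmpty() && !country.isEmpty()) {
            names.add("messages_" + lang + "_" + country + ".xml");
        }
        if (!lang.isEmpty()) {
            names.add("messages_" + lang + ".xml");
        }
        names.add("messages.xml"); // default catalog, always the last resort
        return names;
    }

    public static void main(String[] args) {
        // Australian English, as in the example above.
        System.out.println(candidates(Locale.forLanguageTag("en-AU")));
        // prints [messages_en_au.xml, messages_en.xml, messages.xml]
    }
}
```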
Where to find the message catalog
[dspace-source]/dspace-xmlui/src/main/webapp/i18n/messages.xml
The translations of this message catalog are managed separately from the DSpace core
project, so that updates to these files can be released more frequently than the DSpace software itself. Visit the
dspace-xmlui-lang project on GitHub.
Where to edit
In some cases you may want to add keys to the message catalog or change the particular wording
of DSpace concepts. For example, you may want to change "Communities" into "Departments". These kinds of
changes may be overwritten automatically when you upgrade to the newest version of DSpace. It is
therefore advised to keep such changes isolated in the following location:
[dspace-source]/dspace/modules/xmlui/src/main/webapp/i18n/
After rebuilding DSpace, any messages files placed in this directory will be automatically included in the XMLUI
web application. Files of the same name will override any default files. By default, this full directory path may not
exist or may be empty. If it does not exist, you can simply create it. You can place any number of translation
catalogues in this directory. To add additional translations, just add another copy of the messages.xml file
translated into the specific language and country variant you need.
After building and deploying, DSpace will finally read the files from the following location:
[dspace]/webapps/xmlui/i18n/messages.xml
Again, note that you will need to rebuild DSpace for these changes to take effect in your installed XMLUI web
application!
While it seems like a fast option to change your messages straight in the deployed dspace directory,
these changes are very volatile. If you rebuild and redeploy DSpace, these changes will get lost.
For more information about the [dspace-source]/dspace/modules/ directory, and how it may be used to
"overlay" (or customize) the default XMLUI interface, classes and files, please see: Advanced Customisation
Message catalog
The JSP Standard Tag Library (JSTL) v1.0 is used to specify messages in the JSPs like this:
<H1><fmt:message key="jsp.search.results.title"/></H1>
This message can be changed using the config/language-packs/Messages.properties file. This must be done at
build-time: Messages.properties is placed in the dspace.war Web application file.
Phrases may have parameters to be passed in, to make the job of translating easier, reduce the number of
'keys' and to allow translators to make the translated text flow more appropriately for the target language. Here
is an example of a phrase in which two parameters are passed in:
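As a hedged illustration of how positional parameters work in such a phrase, the sketch below uses java.text.MessageFormat, the standard Java mechanism for {0}, {1} placeholders. The phrase itself is hypothetical and is not taken from Messages.properties.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Illustrative only: the phrase below is a made-up example, not a real DSpace key.
class ParameterizedMessage {
    // {0} and {1} are filled in at render time, so a translator can reorder them freely.
    static final String PATTERN = "Results {0}-{1} of the search";

    static String render(Object... args) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(args);
    }

    public static void main(String[] args) {
        System.out.println(render(1, 10)); // prints "Results 1-10 of the search"
    }
}
```

A translation can place {1} before {0} if the target language calls for a different word order, which is why parameters reduce the number of keys needed.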
Multiple Messages.properties files can be created for different languages (see ResourceBundle.getBundle). For example, you
can add German and Canadian French translations:
Messages_de.properties
Messages_fr_CA.properties
The end user's browser settings determine which language is used by default. The user can change the
language by clicking a link in the UI. These links are visible if more than one language is configured in DSpace.
The English language file Messages.properties (or the default server locale) will be used as a fallback if there is
no language bundle for the end user's preferred language. Note that the English file is not called
Messages_en.properties. This is because it is always available as a fallback, regardless of server configuration.
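The lookup order that the standard Java ResourceBundle mechanism derives for a requested locale can be inspected directly. The sketch below only lists candidate file names for the base name "Messages"; it does not load any DSpace files, and the class name is hypothetical. If only the base bundle is found, ResourceBundle.getBundle additionally tries the server's default locale before settling on the base file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper that prints the property files ResourceBundle would look for.
class BundleLookupOrder {
    static List<String> fileNames(Locale requested) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        // Candidate locales run from most specific to the root (base) bundle.
        for (Locale candidate : control.getCandidateLocales("Messages", requested)) {
            names.add(control.toBundleName("Messages", candidate) + ".properties");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fileNames(Locale.CANADA_FRENCH));
        // prints [Messages_fr_CA.properties, Messages_fr.properties, Messages.properties]
    }
}
```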
Where to find the message catalog
[dspace-source]/dspace-api/src/main/resources/Messages.properties
The translations of this message catalog are managed separately from the DSpace core
project, so that updates to these files can be released more frequently than the DSpace software itself. Visit the
dspace-api-lang project on GitHub.
Where to edit
In some cases you may want to add keys to the message catalog or change the particular wording
of DSpace concepts. For example, you may want to change "Communities" into "Departments". These kinds of
changes may be overwritten automatically when you upgrade to the newest version of DSpace. It is
therefore advised to keep such changes isolated in the following location:
[dspace-source]/dspace/modules/jspui/src/main/resources/
After rebuilding DSpace, any messages files placed in this directory will be automatically included in the JSPUI
web application. Files of the same name will override any default files. By default, this full directory path may not
exist or may be empty. If it does not exist, you can simply create it. You can place any number of translation
catalogues in this directory. To add additional translations, just add another copy of the Messages.properties file
translated into the specific language and country variant you need.
After building and deploying, DSpace will finally read the files from the dspace-api-4.0.jar file in your
[tomcat]/webapps/jspui/WEB-INF/lib directory.
Again, note that you will need to rebuild DSpace for these changes to take effect in your installed JSPUI web
application!
For more information about the [dspace-source]/dspace/modules/ directory, and how it may be used to
"overlay" (or customize) the default JSPUI interface, classes and files, please see: Advanced Customisation
Overview
DSpace can apply filters or transformations to files/bitstreams, creating new content. Filters are included that
extract text for full-text searching, and create thumbnails for items that contain images. The media filters are
controlled by the dspace filter-media script which traverses the asset store, invoking all configured
MediaFilter or FormatFilter classes on files/bitstreams (see Configuring Media Filters for more
information on how they are configured).
Name | Java class | Function | Enabled by default
HTML Text Extractor | org.dspace.app.mediafilter.HTMLFilter | Extracts the full text of HTML documents for full text indexing. (Uses Swing's HTML Parser.) | true
Branded Preview JPEG | org.dspace.app.mediafilter.BrandedPreviewJPEGFilter | Creates a branded preview image for GIF, JPEG and PNG files. | false
PDF Text Extractor | org.dspace.app.mediafilter.PDFFilter | Extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the Apache PDFBox tool.) | true
XPDF Text Extractor | org.dspace.app.mediafilter.XPDF2Text | Extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full text indexing. (Uses the XPDF command line tools available for Unix.) See XPDF Filter Configuration for details on installing/enabling. | false
Word Text Extractor | org.dspace.app.mediafilter.WordFilter | Extracts the full text of Microsoft Word or Plain Text documents for full text indexing. (Uses the "Microsoft Word Text Mining" tools.) | true
PowerPoint Text Extractor | org.dspace.app.mediafilter.PowerPointFilter | Extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full text indexing. (Uses the Apache POI tools.) | true
Please note that the filter-media script will automatically update the DSpace search index by default (see
Legacy methods for re-indexing content). This is the recommended way to run the script. Should you
wish to disable the index update, you can pass the -n flag to the script (see Executing (via Command Line) below).
Enabling/Disabling MediaFilters
The media filter plugin configuration filter.plugins in dspace.cfg contains a list of all enabled
media/format filter plugins (see Configuring Media Filters for more information). By modifying the value of
filter.plugins you can disable or enable MediaFilter plugins.
[dspace]/bin/dspace filter-media
With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that
have already been filtered.
Alternatively, you could extend the org.dspace.app.mediafilter.MediaFilter class, which just defaults to
performing no pre/post-processing of bitstreams before or after filtering.
You must give your new filter a "name", by adding it and its name to the
plugin.named.org.dspace.app.mediafilter.FormatFilter field in dspace.cfg. In addition to naming your filter, make sure to specify its input
formats in the filter.<class path>.inputFormats config item. Note that the input formats must match the short
description field in the Bitstream Format Registry (i.e. the bitstreamformatregistry table).
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
org.dspace.app.mediafilter.MySimpleMediaFilter = My Simple Text Filter, \ ...
filter.org.dspace.app.mediafilter.MySimpleMediaFilter.inputFormats = Text
If you neglect to define the inputFormats for a particular filter, the MediaFilterManager will never call that filter,
since it will never find a bitstream which has a format matching that filter's input format(s).
If you have a complex Media Filter class, which actually performs different filtering for different formats (e.g.
conversion from Word to PDF and conversion from Excel to CSV), you should define this as described in
Chapter 13.3.2.2.
Since SelfNamedPlugins are self-named (as stated), they must provide the various names the plugin uses by
defining a getPluginNames() method. Generally speaking, each "name" the plugin uses should correspond to a
different type of filter it implements (e.g. "Word2PDF" and "Excel2CSV" are two good names for a complex
media filter which performs both Word to PDF and Excel to CSV conversions).
Self-Named Media/Format Filters are also configured differently in dspace.cfg. Below is a general template for a
Self Named Filter (defined by an imaginary MyComplexMediaFilter class, which can perform both Word to PDF
and Excel to CSV conversions):
As shown above, each Self-Named Filter class must be listed in the plugin.selfnamed.org.dspace.app.
mediafilter.FormatFilter item in dspace.cfg. In addition, each Self-Named Filter must define the
input formats for each named plugin defined by that filter. In the above example the MyComplexMediaFilter
class is assumed to have defined two named plugins, Word2PDF and Excel2CSV. So, these two valid plugin
names ("Word2PDF" and "Excel2CSV") must be returned by the getPluginNames() method of the
MyComplexMediaFilter class.
These named plugins take different input formats as defined above (see the corresponding inputFormats
setting).
If you neglect to define the inputFormats for a particular named plugin, the MediaFilterManager
will never call that plugin, since it will never find a bitstream which has a format matching that plugin's
input format(s).
For a particular Self-Named Filter, you are also welcome to define additional configuration settings in dspace.cfg.
To continue with our current example, each of our imaginary plugins actually results in a different output format
(Word2PDF creates "Adobe PDF", while Excel2CSV creates "Comma Separated Values"). To allow this
complex Media Filter to be even more configurable (especially across institutions, with potential different
"Bitstream Format Registries"), you may wish to allow for the output format to be customizable for each named
plugin. For example:
Any custom configuration fields in dspace.cfg defined by your filter are ignored by the MediaFilterManager, so it
is up to your custom media filter class to read those configurations and apply them as necessary. For example,
you could use the following sample Java code in your MyComplexMediaFilter class to read these custom
outputFormat configurations from dspace.cfg:
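A minimal, self-contained sketch of the idea follows. It is not the manual's sample: java.util.Properties stands in for DSpace's own configuration API so that the example runs without DSpace on the classpath, and the key layout (filter.<class>.<plugin name>.outputFormat) is an assumption patterned on the inputFormats setting described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stand-in sketch: a real filter would read dspace.cfg through DSpace's configuration classes.
class FilterConfigSketch {
    static final String FILTER_CLASS = "org.dspace.app.mediafilter.MyComplexMediaFilter";

    // Assumed dspace.cfg entries for the two imaginary named plugins.
    static final String CFG =
            "filter." + FILTER_CLASS + ".Word2PDF.outputFormat = Adobe PDF\n"
          + "filter." + FILTER_CLASS + ".Excel2CSV.outputFormat = Comma Separated Values\n";

    // The names a self-named filter would return from getPluginNames().
    static String[] getPluginNames() {
        return new String[] {"Word2PDF", "Excel2CSV"};
    }

    static Properties load(String text) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cfg;
    }

    // Builds the per-plugin key and looks up the configured output format.
    static String outputFormat(Properties cfg, String pluginName) {
        return cfg.getProperty("filter." + FILTER_CLASS + "." + pluginName + ".outputFormat");
    }

    public static void main(String[] args) {
        Properties cfg = load(CFG);
        for (String name : getPluginNames()) {
            System.out.println(name + " -> " + outputFormat(cfg, name));
        }
    }
}
```

Keying the setting on the plugin name lets one class serve several conversions while each institution maps the output to whatever its own Bitstream Format Registry calls that format.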
Title (dc.title)
When submitting an Item via the DSpace web user interface, this field is required.
If you add an Item to DSpace through another means (SWORD, etc), it is recommended to specify a
title for the Item. Without a title, the Item will show up in DSpace as "Untitled".
Publication Date (dc.date.issued)
When submitting an Item via the DSpace web user interface, this field is required (by default).
However, your System Administrator can choose to enable the "Initial Questions" step
within the Submission User Interface. Enabling this step will cause the following to occur: If
the item is said to be "published", then the Publication Date will be required. If the item is
said to be "unpublished" then the Publication Date will be auto-set to today's date (date of
submission). WARNING: Google Scholar has recommended against automatically
assigning this "dc.date.issued" field to the date of submission as it often results in incorrect
dates in Google Scholar results. See DS-1481 and DS-1745 for more details.
If you add an Item to DSpace through another means (SWORD, etc), it is recommended to
specify the date on which the Item was published, in ISO-8601 (e.g. 2007, 2008-01, or 2011-03-04).
This ensures DSpace can accurately report the publication date to services like Google Scholar. If
an item is unpublished, you can either choose to leave this blank, or pass in the literal string "today"
(which will tell DSpace to automatically set it to the date of ingest).
As of DSpace 4.0, the system will not assign a "dc.date.issued" when unspecified.
Previous versions of DSpace (3.0 or below) would set "dc.date.issued" to the date of
accession (dc.date.accessioned), if it was unspecified during ingest.
If you are adding content to DSpace without using the DSpace web user interface, there
are two recommended options for assigning "dc.date.issued"
If the item was previously published, please set "dc.date.issued" to the date
of publication in ISO-8601 (e.g. 2007, 2008-01, or 2011-03-04).
If the item has never been published, you may set "dc.date.issued='today'"
(the literal string "today"). This will cause DSpace to automatically
assign "dc.date.issued" to the date of accession (dc.date.accessioned), as it did
previously.
You can also choose to leave "dc.date.issued" unspecified, but then the
new Item will have an empty date within DSpace.
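When preparing content outside the web user interface, a pre-ingest check along the following lines can catch malformed dates early. This is a sketch under stated assumptions: the class is hypothetical, it accepts only the three ISO-8601 shapes shown above plus the literal "today", and it does not verify that the day exists in the given month.

```java
import java.util.regex.Pattern;

// Hypothetical pre-ingest check for dc.date.issued values; not part of DSpace.
class IssuedDateCheck {
    // Matches YYYY, YYYY-MM or YYYY-MM-DD with a month of 01-12 and a day of 01-31.
    private static final Pattern ISO_PARTIAL =
            Pattern.compile("\\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\\d|3[01]))?)?");

    static boolean isAcceptable(String value) {
        if (value == null) {
            return false;
        }
        return "today".equals(value) || ISO_PARTIAL.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2007", "2008-01", "2011-03-04", "today", "03/04/2011"}) {
            System.out.println(v + " -> " + isAcceptable(v));
        }
    }
}
```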
Obviously, we recommend specifying as much metadata as you can about a new Item. For a full list of
supported metadata fields, please see: Metadata and Bitstream Format Registries
Using any of the fields from the default schema to store information that does not correspond to the
field's description is generally discouraged. This is especially true if you are considering
opening up your repository metadata for external harvesting.
4.18.1 Introduction
The Item Mapper is a tool in the DSpace web user interface allowing repository managers to display the same
item in multiple collections at once. Thanks to this feature, a repository manager is not forced to duplicate items
to display them in different collections.
In the JSP User Interface, the item mapper can be accessed from the "Admin Tools" menu on the right side of a
collection homepage.
The item mapper offers an interface to search for items in the repository with the goal of mapping them to the
collection from where you accessed the Item Mapper. While the JSPUI only offers a search for author names,
the XMLUI Item Mapper offers a broader search.
The list of items mapped into the current collection can be consulted through the Item Mapper page. While
JSPUI immediately shows the list of mapped items, the XMLUI requires you to click "Browse mapped items" in
order to access the list.
The list of mapped items provides the functionality to remove the mapping for selected items.
4.18.3 Implications
If you wish for the item to take on the default authorizations of the destination collection, tick the 'Inherit default
policies of destination collection' checkbox. This is useful if you are moving an item from a private collection to a
public collection, or from a public collection to a private collection.
Note: When selecting the 'Inherit default policies of destination collection' option, ensure that this will not
override system-managed authorizations such as those imposed by the embargo system.
The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be
either a 'parent' community, meaning it has at least one sub-community, or a 'child' community, meaning it is a
sub-community of another community, or both or neither. In these terms, an 'orphan' is a community that lacks a
parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user
interface, since there is no parent community 'above' them. The first operation, establishing a parent/child
relationship, can take place between any community and an orphan. The second operation, removing a
parent/child relationship, will make the child an orphan.
where '-s' or '-set' means establish a relationship whereby the community identified by the '-p' parameter
becomes the parent of the community identified by the '-c' parameter. Both the 'parentID' and 'childID' values
may be handles or database IDs.
where '-r' or '-remove' means dis-establish the current relationship in which the community identified by
'parentID' is the parent of the community identified by 'childID'. The outcome will be that the 'childID' community
will become an orphan, i.e. a top-level community.
If the required constraints of operation are violated, an error message will appear explaining the problem, and
no change will be made. An example in a removal operation, where the stated child community does not have
the stated parent community as its parent: "Error, child community not a child of parent community".
It is possible to effect arbitrary changes to the community hierarchy by chaining the basic operations together.
For example, to move a child community from one parent to another, simply perform a 'remove' from its current
parent (which will leave it an orphan), followed by a 'set' to its new parent.
It is important to understand that when any operation is performed, all the sub-structure of the child community
follows it. Thus, if a child has itself children (sub-communities), or collections, they will all move with it to its new
'location' in the community tree.
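The two operations and their constraints can be modelled with a simple child-to-parent map. The sketch below is an illustration of the rules described above, not the DSpace implementation; the class is hypothetical, and the cycle check is an extra safeguard assumed here rather than stated in the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the community hierarchy: child ID -> parent ID.
// A community with no entry in the map is an orphan (top-level community).
class CommunityTreeSketch {
    private final Map<String, String> parentOf = new HashMap<>();

    // 'set': only an orphan can be given a parent.
    void set(String parent, String child) {
        if (parentOf.containsKey(child)) {
            throw new IllegalStateException("Error, child community already has a parent");
        }
        // Assumed safeguard: the new parent must not be the child itself or one of its descendants.
        for (String c = parent; c != null; c = parentOf.get(c)) {
            if (c.equals(child)) {
                throw new IllegalStateException("Error, relationship would create a cycle");
            }
        }
        parentOf.put(child, parent);
    }

    // 'remove': the stated parent must really be the child's parent.
    void remove(String parent, String child) {
        if (!parent.equals(parentOf.get(child))) {
            throw new IllegalStateException("Error, child community not a child of parent community");
        }
        parentOf.remove(child); // the child becomes an orphan
    }

    String parent(String community) {
        return parentOf.get(community);
    }

    public static void main(String[] args) {
        CommunityTreeSketch tree = new CommunityTreeSketch();
        tree.set("A", "B");
        tree.set("B", "C");    // C is a sub-community of B
        tree.remove("A", "B"); // B is now top-level
        tree.set("D", "B");    // move completed: B now sits under D
        System.out.println("parent of B: " + tree.parent("B")); // D
        System.out.println("parent of C: " + tree.parent("C")); // still B, so C moved along
    }
}
```

Because only the link between the moved community and its parent changes, everything below the moved community follows it without any further bookkeeping.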
Please note that when a user has submitted content, his EPerson record cannot be deleted because there are
references to it from the submitted item(s). If it is necessary to prevent further use of such an account, it can be
marked "cannot log in".
One of the options --email or --netid is required to name the record. The complete options are:
-a --add required
To list accounts:
-L --list required
To modify an account:
-M --modify required
To delete an account:
-d --delete required
Introduction
Registered users can subscribe to collections in DSpace. After subscribing, users will receive a daily email
containing the new and modified items in the collections they are subscribed to.
In the XML User Interface, new subscriptions are added on the user's Profile page.
In the JSP User Interface, a specific dialog "Receive Email Updates" is available from the dropdown in the top
right corner.
This script can be run with a parameter -t for testing purposes. When this parameter is passed, the log level is
set to DEBUG to ensure that more diagnostic information will be added to the dspace logfile.
4.22.1 Introduction
The request a copy functionality was added to DSpace as a measure to facilitate access in those cases when
uploaded content cannot be openly shared with the entire world immediately after submission into DSpace. It
gives users an efficient way to request access from the original submitter of the item, who can approve this access
with the click of a button. This practice complies with most applicable policies as the submitter interacts directly
with the requester on a case by case basis.
After clicking request copy at the bottom of this form, the original submitter of the item will receive an email
containing the details of the request. The email also contains a link with a token that brings the original
submitter to a page where he or she can either grant or reject access. If the original submitter cannot evaluate
the request, he or she can forward this email to the right person, who can use the link containing the token
without having to log into DSpace.
Each of these buttons registers the choice of the submitter, displaying the following form in which an additional
reason for granting or rejecting the access can be added.
After hitting send, the contents of this form will be sent together with the associated files to the email address of
the requester. In case the access is rejected, only the reason will be sent to the requester.
After responding positively to a request for copy, the person who approved is presented with an optional form to
ask the repository administrator to alter the access rights of the item, allowing unrestricted open access to
everyone.
After clicking request copy at the bottom of this form, the original submitter of the item will receive an email
containing the details of the request. The email also contains a link with a token that brings the original
submitter to a page where he or she can either grant or reject access. If the original submitter cannot evaluate
the request, he or she can forward this email to the right person, who can use the link containing the token
without having to log into DSpace.
After approving or rejecting the request for a copy, the contents of the form will be sent together with the
associated files to the email address of the requester. In case the access is rejected, only the reason will be
sent to the requester.
request_item.admin: template for the message that will be sent to the administrator of the repository, after the
original submitter requests to have the permissions changed for this item.
request_item.author: template for the message that will be sent to the original submitter of an item with the
request for copy.
The templates for emails that the requester receives, that could have been customized by the approver in the
aforementioned dialog are not managed as separate email template files. These defaults are stored in the
Messages.properties file under the keys
Property: request.item.type
Informational Note: This parameter manages who can file a request for an item. The parameter is optional. When
it is empty or commented out, request a copy is disabled across the entire repository. When
set to all, any user can file a request for a copy. When set to logged, only registered users
can file a request for copy.
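The three cases of request.item.type can be summarized as a small decision function. This is a sketch of the documented behaviour only; the class is hypothetical, and treating unrecognized values as "disabled" is an assumption, since the text does not say what happens for them.

```java
// Hypothetical illustration of the request.item.type rules; not DSpace code.
class RequestCopyPolicy {
    static boolean mayRequest(String requestItemType, boolean loggedIn) {
        // Empty or commented out: request a copy is disabled for everyone.
        if (requestItemType == null || requestItemType.trim().isEmpty()) {
            return false;
        }
        switch (requestItemType.trim()) {
            case "all":
                return true;      // any user may file a request
            case "logged":
                return loggedIn;  // registered (logged-in) users only
            default:
                return false;     // assumption: unknown values behave as disabled
        }
    }

    public static void main(String[] args) {
        System.out.println(mayRequest("all", false));    // true
        System.out.println(mayRequest("logged", false)); // false
        System.out.println(mayRequest("logged", true));  // true
        System.out.println(mayRequest("", true));        // false
    }
}
```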
Property: mail.helpdesk
Informational Note: In JSPUI, the email address assigned to this parameter will receive the emails both for
granting or rejecting request a copy requests, as well as requests to change item policies. In
XMLUI, this address will also receive the requests to change item policies. However, the
actual requests for copy in XMLUI will initially be directed at the email address of the original
submitter. When this email address cannot be retrieved, the address in mail.helpdesk will be
used as a fallback.
The DSpace 4.0 REST API (Jersey) allows for data in DSpace to be re-used by external systems to make new
uses of your data. The DSpace 4.0 REST API provides READ-ONLY access via JSON or XML to publicly
accessible Communities, Collections, Items and Bitstreams. Only non-hidden item metadata (e.g. provenance is
hidden by default) are exposed at the Item endpoint. We intend that future DSpace releases will grow and
evolve the REST API to support a greater set of features, based on community input and support. This Jersey
implementation of a REST API for DSpace is not related to other add-on modules providing REST-API support
for DSpace, such as GSOC REST API, Wijiti REST API, Hedtek REST API, or SimpleREST.
Note: You must set your request header's "Accept" property to either JSON (application/json) or XML
(application/xml) depending on the format you prefer to work with.
Example usage from command line in XML format with pretty printing:
For this documentation, we will assume that the URL to the "REST" webapp is http://localhost:8080/rest/.
For production systems, this address will be slightly different, such as http://demo.dspace.org/rest/. The path to
an endpoint goes after /rest/, such as /rest/communities. Altogether this is:
http://localhost:8080/rest/communities
Another thing to note is that there are Query Parameters that you can tack on to the end of an endpoint to do
extra things. The most commonly used one in this API is "?expand". Instead of every API call defaulting to
giving you every possible piece of information about it, it only gives a most commonly used set by default and
gives the more "expensive" information when you deliberately request it. Each endpoint will provide a list of
available expands in the output, but for getting started, you can start with ?expand=all, to make the endpoint
provide all of its information (parent objects, metadata, child objects). You can include multiple expands, such
as: ?expand=collections,subCommunities .
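For a client written in Java, the Accept header and the expand parameter described above could be combined as in the sketch below. It assumes the local base URL used in this documentation and the java.net.http client available since Java 11; the class and method names are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper for building REST API requests.
class RestRequestSketch {
    static final String BASE = "http://localhost:8080/rest"; // assumed local installation

    static HttpRequest build(String endpoint, String accept, String... expands) {
        String url = BASE + endpoint;
        if (expands.length > 0) {
            // Multiple expands are passed as one comma-separated value.
            url += "?expand=" + String.join(",", expands);
        }
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", accept) // application/json or application/xml
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                build("/communities", "application/json", "collections", "subCommunities");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
        // To execute it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```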
Communities
Communities in DSpace are used for organization and hierarchy, and are containers that hold sub-Communities
and Collections. (ex: Department of Engineering)
Collections
Collections in DSpace are containers of Items. (ex: Engineering Faculty Publications)
You can access all the collections in a specific community through: /communities/:communityID?expand=all
Items
Items in DSpace represent a "work" and combine metadata and files, known as Bitstreams.
Specific Item: /items/:itemID
You can access all the items in a specific collection through: /collections/:collectionID?expand=items
Bitstreams
Bitstreams are files. They have a filename, size (in bytes), and a file format. Typically in DSpace, the Bitstream
will be the "full text" article, or some other media. Some files are the actual file that was uploaded (tagged with
bundleName:ORIGINAL), others are DSpace-generated files that are derivatives or renditions, such as
text extractions or thumbnails. You can download files/bitstreams. DSpace doesn't really limit the type of files that it
takes in, so this could be PDF, JPG, audio, video, zip, or other. The logo for a Collection or a Community
is also a Bitstream.
You can access all the Bitstreams in a specific Item through: /items/:itemID?expand=bitstreams
You can access the parent object of a Bitstream (normally an Item, but possibly a Collection or Community
when it is its logo) through: /bitstreams/:bitstreamID?expand=parent
Below is some sample Jersey code showing how to wire up resources, how to choose between serializing to HTML, JSON
or XML, and how to distinguish between displaying a single entity and displaying a list of entities.
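import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/collections")
public class CollectionsResource {
    // List of all collections, rendered as HTML
    @GET
    @Path("/")
    @Produces(MediaType.TEXT_HTML)
    public String listHTML() {...}
    // List of all collections, serialized to JSON or XML depending on the Accept header
    @GET
    @Path("/")
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public org.dspace.rest.common.Collection[] list(@QueryParam("expand") String expand) {...}
    // A single collection, identified by the ID in the path
    @GET
    @Path("/{collection_id}")
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public org.dspace.rest.common.Collection getCollection(@PathParam("collection_id") Integer collection_id,
            @QueryParam("expand") String expand) {...}
}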
There is no central ProviderRegistry in which you have to declare your path. You are free to use annotations to get
your code to respond to requests. There are helper annotations to extract request parameters into Java
variables.
Example Value: true
Informational Note: Boolean value that indicates whether statistics should be recorded for access via the REST API.
Defaults to 'false'.
http://localhost:8080/rest/items/:ID?userIP=ip&userAgent=userAgent&xforwarderfor=xforwarderfor
If no parameters are given, the details of the HTTP request's sender are used in statistics. Passing these parameters enables tools to
record the details of their user rather than themselves.
For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadata elements. For
bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run.
ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as
with ItemImport. However, unlike ItemImport, there is no resume feature for incomplete processing. There is
more extensive logging with a summary statement at the end with counts of successful and unsuccessful items
processed.
One probable scenario for using this tool is where there is an external primary data source for which the
DSpace instance is a secondary or down-stream system. Metadata and/or bitstream content changes in the
primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the
changes.
A note on terminology: item refers to a DSpace item. metadata element refers generally to a qualified or
unqualified element in a schema in the form [schema].[element].[qualifier] or [schema].[element],
and occasionally in a more specific way to the second part of that form. metadata field refers to a
specific instance pairing a metadata element to a value.
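To make the [schema].[element].[qualifier] form concrete, the following sketch splits such a string into its parts. The class is hypothetical and is not part of ItemUpdate; it simply illustrates the two accepted shapes.

```java
// Hypothetical parser for metadata element names such as dc.description or dc.identifier.uri.
class MetadataElementSketch {
    final String schema;
    final String element;
    final String qualifier; // null for an unqualified element

    MetadataElementSketch(String schema, String element, String qualifier) {
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
    }

    static MetadataElementSketch parse(String name) {
        String[] parts = name.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected schema.element or schema.element.qualifier: " + name);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty component in: " + name);
            }
        }
        return new MetadataElementSketch(parts[0], parts[1], parts.length == 3 ? parts[2] : null);
    }

    public static void main(String[] args) {
        MetadataElementSketch e = parse("dc.identifier.uri");
        System.out.println(e.schema + " / " + e.element + " / " + e.qualifier);
    }
}
```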
The user is referred to the previous section DSpace Simple Archive Format.
Additionally, a delete_contents file can now be used. This file lists the bitstreams to be deleted, one
bitstream ID per line. Currently, no other identifiers for bitstreams are usable for this function. This file is an
addition to the Archive format specifically for ItemUpdate.
The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file
is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the
Archive format specifically for ItemUpdate.
ItemUpdate Commands
Command used: [dspace]/bin/dspace itemupdate
-a or --addmetadata [metadata element]
    Repeatable for multiple elements. The metadata element should be in the form dc.x
    or dc.x.y. The mandatory argument indicates the metadata fields in the dublin_core.xml
    file to be added unless already present (multiple fields should be separated by a
    semicolon ';'). However, duplicate fields will not be added to the item metadata,
    and this happens without warning or error.
-d or --deletemetadata [metadata element]
    Repeatable for multiple elements. All metadata fields matching the element will be
    deleted.
-A or --addbitstreams
    Adds bitstreams listed in the contents file with the bitstream metadata cited there.
-D or --deletebitstreams [filter plug classname or alias]
    Not repeatable. With no argument, this operation deletes bitstreams listed in the
    delete_contents file. Only bitstream IDs are recognized identifiers for this
    operation. The optional filter argument is the classname of an implementation of the
    org.dspace.app.itemupdate.BitstreamFilter class to identify files for
    deletion, or one of the aliases (e.g. ORIGINAL, ORIGINAL_AND_DERIVATIVES,
    TEXT, THUMBNAIL) which reference existing filters based on membership in a
    bundle of that name. In this case, the delete_contents file is not required for any
    item. The filter properties file contains properties pertinent to the particular filter
    used. Multiple filters are not allowed.
-i or --itemfield
    Specifies the metadata field that contains the item's identifier. Default value is
    "dc.identifier.uri". (Optional)
-t or --test
    Runs the process in test mode with logging, but no changes are applied to the DSpace
    instance. (Optional)
-P or --provenance
    Prevents any changes to the provenance field to represent changes in the bitstream
    content resulting from an Add or Delete. In other words, when this flag is specified,
    no new provenance information is added to the DSpace Item when adding/deleting a
    bitstream. No provenance statements are written for thumbnails or text derivative
    bitstreams, in keeping with the practice of MediaFilterManager. (Optional)
-F or --filter-properties
    The filter properties file to be used by the delete bitstreams action. (Optional)
CLI Examples
Adding Metadata:
This will update all DSpace Items listed in your archive directory, adding a new dc.description metadata
field. Items will be located in DSpace based on the handle found in 'dc.identifier.uri' (since the -i argument
wasn't used, the default metadata field, dc.identifier.uri, from the dublin_core.xml file in the archive folder, is
used).
Introducing Manakin (XMLUI) - Provides an overview of what XMLUI is and how it works.
Learning to Use Manakin (XMLUI) - Overview of how to use Manakin and how it works. Based
on DSpace 1.5, but also valid for current versions.
Making DSpace XMLUI Your Own - Concentrates on using Maven to build Overlays in the
XMLUI (Manakin). Also has very basic examples for JSPUI. Based on DSpace 1.6.x but also
valid for current versions.
The XMLUI (aka Manakin) is built on the Apache Cocoon framework. The XMLUI uses Cocoon to provide a
modular, extendable, tiered interface framework.
The XMLUI essentially consists of three main tiers, in increasing order of complexity:
1. Style Tier - allows one to use CSS and simple XHTML to stylize an existing XMLUI Theme
2. Theme Tier - allows one to use XSLT, XHTML and CSS to create new, more complex XMLUI Theme(s)
3. Aspect Tier - allows one to use the Cocoon framework and Java (or XSLT) to create new features
(aspects), and generate new content into DRI.
These tiers are very important and powerful because of their modularity. For example, based on your local
expertise with these technologies, your institution may decide to only modify the XMLUI at the "Style Tier" (by
just modifying CSS & images in an existing theme). As you learn more about themes & aspects, you may
decide to slowly venture into the more complex "Theme Tier" and finally into the "Aspect Tier". Other institutions
may determine that all they really need to ever do is make "Style Tier" changes.
Digging in a little deeper, there are three main XMLUI components that are unique to the XMLUI and used
throughout the system. These main components are:
DRI Schema - Digital Repository Interface (DRI) XML schema, which is the "abstract representation of a
single repository page". The DRI document is XML that contains all of the information (metadata)
available for display on a given page within the XMLUI. This information includes:
Metadata elements (described in METS, MODS, DSpace Internal Metadata (DIM), Qualified
Dublin Core, etc.)
Structural elements (described in TEI light)
For more specific information about DRI Schema along with examples, see DRI Schema
Reference.
Aspects - One or more aspects are enabled at a given time. Generally speaking, an aspect implements
a set of related features within the XMLUI. More specifically, the enabled aspects are what build the DRI
document. So, Aspects are the only things that can change the structure of the DRI document (or
add/remove content to/from DRI).
Aspects apply to all pages across your entire DSpace site. Each aspect must take a valid DRI
document as its input, and also output a valid DRI document.
Aspects usually are written in Java (and controlled by a Cocoon "sitemap.xmap"). However,
Aspects can also be written in XSLT (provided that the input and output are both valid DRI
documents)
Themes - One or more themes are enabled at a given time. Themes are in charge of stylizing content
into a particular look & feel. More specifically, a theme is what transforms a DRI document into XHTML
(and adds any CSS, JavaScript, images, etc.).
A single Theme may apply to your entire DSpace site, just a specific Community or Collection
(and all members of that Community/Collection), or just a specific page.
A Theme may consist of one or more of the following: an XSLT stylesheet, CSS stylesheets,
images, other static resources.
More information on creating a theme is available at: Creating a New Theme
Additional Theme Resources include:
Manakin theme tutorial
Manakin Themes and Recipes
Create a new theme (Manakin)
Before getting started, it's worth mentioning that this request flow is controlled via a series of Cocoon Sitemap
files (named sitemap.xmap, themes.xmap and aspects.xmap). These Sitemap files are Cocoon's way of
defining the flow. More information about Cocoon Sitemaps is available at:
http://cocoon.apache.org/2.1/userdocs/concepts/sitemap.html
The following explanation provides a high level overview of how a request is processed, how a DRI document is
generated (via Aspects), and then how it is transformed into XHTML (via Themes). As this is a high level
overview, some details are likely left out, but the overarching flow is what is most important.
i. The themes.xmap file will then load all "matching" themes which are configured in your
[dspace]/config/xmlui.xconf file (see Themes below).
ii. If more than one theme matches the current URL path, then the first match wins.
iii. Once a matching theme is located, that theme's sitemap.xmap file (located in its theme
directory) is loaded and processed.
1. The theme's sitemap.xmap is in charge of actually loading the theme's XSLT, CSS,
etc. However, before it does that, you'll notice it makes a call to generate the DRI
document for the current page as follows:
2. This DRI call generates a brand new, internal Cocoon request. This request is then
processed back in the root sitemap.xmap (remember how we said that this
sitemap is the main entry point for all requests).
3. Back in the root sitemap, the "DRI/**" call is matched. This causes the [xmlui]/aspects/aspects.xmap
file to be loaded. As the name suggests, this file controls all the Aspects.
a. The aspects.xmap file will then load all enabled Aspects which are configured in your
[dspace]/config/xmlui.xconf file (see Aspects below).
b. Each aspect is loaded in the order that it appears. However, multiple aspects may be loaded for
the same URL path. Remember, aspects can build upon each other (we call this an "aspect
chain") as they work together to generate the final DRI document.
c. When an Aspect is loaded, its sitemap.xmap is loaded and processed.
NOTE: An aspect's sitemap.xmap is actually compiled into the dspace-xmlui-api.jar
file. However, if you have a copy of DSpace source handy, it can be found in:
[dspace-src]/dspace-xmlui/dspace-xmlui-api/src/main/resources/aspects/[name-of-aspect]/
d. Each aspect is processed one-by-one (again in the order they are listed in xmlui.xconf). Each
aspect may add, remove or change content within the DRI document. After the final aspect is
finished processing, the DRI document is complete.
HINT: In the XMLUI you can always view the final DRI document by adding "?XML" or
"&XML" on to the end of the current URL in your web browser.
4. Once the final DRI document is complete (all aspects are done processing), the flow will return back to
your Theme's sitemap.xmap (remember, this is the same location that triggered the loading of the
Aspects in the first place).
5. At this point, your Theme's sitemap.xmap will continue its processing. Generally speaking, most
themes will then perform one or more XSLT transformations (to transform the final DRI document into
XHTML). They also may load up one or more CSS files to help stylize the final XHTML.
6. Finally, once the Theme has completed its processing (remember, only one theme is ever processed for
a single request), the final generated XHTML document is displayed to the user.
Again, the above flow is a slightly simplified version of what is going on underneath the XMLUI. As you can see,
Cocoon Sitemaps are what control most of the XMLUI processing (and the loading of the Aspects and Theme).
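The aspect chain described above can be modelled as a small pipeline. The following is only an illustrative Python sketch, not DSpace or Cocoon code: the function names (view_artifacts, e_person, run_aspect_chain, apply_theme) and the dictionary standing in for a DRI document are assumptions made for this example. It shows each aspect receiving the current document and returning a new one, in configuration order, before a single theme renders the result.

```python
from typing import Callable, Dict, List

# A DRI document is modelled here as a plain dict; real DRI is XML.
DriDocument = Dict[str, List[str]]
Aspect = Callable[[DriDocument], DriDocument]


def view_artifacts(doc: DriDocument) -> DriDocument:
    """Hypothetical aspect: adds item content to the page body."""
    return {**doc, "body": doc["body"] + ["item-view"]}


def e_person(doc: DriDocument) -> DriDocument:
    """Hypothetical aspect: adds a login link to the options."""
    return {**doc, "options": doc["options"] + ["login"]}


def run_aspect_chain(aspects: List[Aspect]) -> DriDocument:
    """Pass an empty document through each aspect in the listed order."""
    doc: DriDocument = {"body": [], "options": [], "meta": []}
    for aspect in aspects:
        doc = aspect(doc)  # each aspect takes a document and returns a document
    return doc


def apply_theme(doc: DriDocument) -> str:
    """Stand-in for the theme's XSLT step: DRI in, XHTML out."""
    body = "".join(f"<div>{part}</div>" for part in doc["body"])
    return f"<html><body>{body}</body></html>"


final_dri = run_aspect_chain([view_artifacts, e_person])
xhtml = apply_theme(final_dri)
```

Because every aspect has the same input and output type, aspects can be added, removed or reordered without touching the theme step, which is the property the tiered design relies on.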
Property: xmlui.supportedLocales
Informational Note: A list of supported locales for Manakin. Manakin will look at a user's browser
configuration for the first language that appears in this list to make available in the interface.
This parameter is a comma-separated list of Locales. All types of Locales are supported: country,
country_language, country_language_variant. Note that if the appropriate files are not present (i.e.
Messages_XX_XX.xml) then Manakin will fall back to a more general language.
Property: xmlui.force.ssl
Informational Note: Force all authenticated connections to use SSL; only non-authenticated connections
are allowed over plain http. If set to true, then you need to ensure that the 'dspace.hostname'
parameter is set correctly.
Property: xmlui.user.registration
Informational Note: Determines if new users should be allowed to register. This parameter is useful in
conjunction with Shibboleth where you want to disallow registration because Shibboleth will
automatically register the user. Default value is true.
Property: xmlui.user.editmetadata
Informational Note: Determines if users should be able to edit their own metadata. This parameter is
useful in conjunction with Shibboleth where you want to disable the user's ability to edit their
metadata because it came from Shibboleth. Default value is true.
Property: webui.user.assumelogin
Informational Note: Determines if super administrators (those who are in the Administrators group) can
log in as another user from the "edit eperson" page. This is useful for debugging problems in a running
DSpace instance, especially in the workflow process. The default value is false, i.e., no one
may assume the login of another user.
Property: xmlui.user.loginredirect
Informational Note: After a user has logged into the system, to which URL should they be directed? Leave
this parameter blank or undefined to direct users to the homepage, or use /profile for the user's
profile. Another reasonable choice is /submissions, to see if the user has any tasks awaiting their
attention. The default is the repository home page.
Property: xmlui.theme.allowoverrides
Informational Note: Allow the user to override which theme is used to display a particular page. When
submitting a request, add the HTTP parameter "themepath", which corresponds to a particular theme; that
specified theme will be used instead of any other configured theme. Note that this is a potential
security hole allowing execution of unintended code on the server. This option is only for development
and debugging; it should be turned off for any production repository. The default value, unless
otherwise specified, is "false".
Property: xmlui.bundle.upload
Informational Note: Determines which bundles administrators and collection administrators may upload
into an existing item through the administrative interface. If the user does not have the appropriate
privileges (add and write) on the bundle then that bundle will not be shown to the user as an
option.
Property: xmlui.community-list.render.full
Informational Note: Determines whether all the metadata about a community/collection should be available
to the theme on the community-list page. This parameter defaults to true, but if you are experiencing
performance problems on the community-list page you should experiment with turning this
option off.
Property: xmlui.community-list.cache
Informational Note: Normally, Manakin will fully verify any cached pages before using a cached copy.
This means that when the community-list page is viewed the database is queried for each
community/collection to see if its metadata has been modified. This can be expensive for repositories
with a large community tree. To help solve this problem you can set the cache to be assumed
valid for a specific period of time. The downside of this is that new or edited
communities/collections may not show up on the website for a period of time.
Property: xmlui.bitstream.mods
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a
bitstream. The MODS metadata file must be inside the "METADATA" bundle and named MODS.xml. If
this option is set to "true" and the bitstream is present then it is made available to the theme
for display.
Property: xmlui.bitstream.mets
Informational Note: Optionally, you may configure Manakin to take advantage of metadata stored as a
bitstream. The METS metadata file must be inside the "METADATA" bundle and named METS.xml. If
this option is set to "true" and the bitstream is present then it is made available to the theme
for display.
Property: xmlui.google.analytics.key
Informational Note: If you would like to use Google Analytics to track general website statistics then
use this parameter to provide your analytics key. First sign up for an account at
http://analytics.google.com, then create an entry for your repository's website. Google
Analytics will give you a snippet of JavaScript code to place on your site. Inside that snippet is
your Google Analytics key, usually found in the line: _uacct = "UA-XXXXXXX-X". Take this key
(just the UA-XXXXXXX-X part) and place it here in this parameter.
Property: xmlui.controlpanel.activity.max
Informational Note: Assigns how many page views will be recorded and displayed in the control panel's
activity viewer. The activity tab allows an administrator to debug problems in a running DSpace by
understanding who is currently using their DSpace and how. The default value is 250.
Property: xmlui.controlpanel.activity.ipheader
Informational Note: Determines where the control panel's activity viewer receives an event's IP address
from. If your DSpace is in a load-balanced environment or otherwise behind a context-switch then you
will need to set the parameter to the HTTP header that records the original IP address.
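To illustrate why xmlui.controlpanel.activity.ipheader matters behind a load balancer, here is a minimal Python sketch of the general technique. It is not DSpace code: the function name resolve_client_ip, the use of "X-Forwarded-For" as the header, and the rule of taking the first comma-separated entry are all assumptions made for this example. Check which header your own proxy actually sets.

```python
from typing import Mapping, Optional


def resolve_client_ip(headers: Mapping[str, str],
                      remote_addr: str,
                      ip_header: Optional[str]) -> str:
    """Return the original client IP for an activity-log entry (hypothetical helper).

    headers     -- HTTP request headers
    remote_addr -- address of the directly connected peer (often the proxy)
    ip_header   -- configured header name, or None when not behind a proxy
    """
    if ip_header:
        wanted = ip_header.lower()
        for name, value in headers.items():
            # HTTP header names are case-insensitive.
            if name.lower() == wanted and value.strip():
                # Assumption: a proxy chain appends addresses, so the
                # first entry is the original client.
                return value.split(",")[0].strip()
    # No header configured or present: fall back to the connection address.
    return remote_addr
```

Without the configured header, every event would appear to come from the load balancer's own address, which makes the activity viewer far less useful for debugging.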
The repository administrator is able to define which aspects and themes are installed for the particular
repository by editing the [dspace]/config/xmlui.xconf configuration file. The xmlui.xconf file consists of two major
sections: Aspects and Themes.
Aspects
The <aspects> section defines the "Aspect Chain", or the linear set of aspects that are installed in the
repository. For each aspect that is installed in the repository, the aspect makes available new features to the
interface. For example, if the "submission" aspect were to be commented out or removed from the xmlui.xconf,
then users would not be able to submit new items into the repository (even the links and language prompting
users to submit items are removed). Each <aspect> element has two attributes, name and path. The name is
used to identify the Aspect, while the path determines the directory where the aspect's code is located. Here is
the default aspect configuration:
<aspects>
ViewArtifacts - The ViewArtifacts Aspect is responsible for displaying individual item metadata.
BrowseArtifacts - The BrowseArtifacts Aspect is responsible for displaying different browse options.
SearchArtifacts - The SearchArtifacts Aspect displays the different search boxes. Shouldn't be
activated together with DSpace Discovery.
Administrative - The Administrative Aspect is responsible for administrating DSpace, such as creating,
modifying and removing all communities, collections, e-persons, groups, registries and authorizations.
E-Person - The E-Person Aspect is responsible for logging in, logging out, registering new users, dealing
with forgotten passwords, editing profiles and changing passwords.
Submission - The Submission Aspect is responsible for submitting new items to DSpace, determining the
workflow process and ingesting the new items into the DSpace repository.
Statistics - The Statistics Aspect is responsible for displaying statistics information.
Workflow - The original Workflow Aspect is responsible for displaying workflow tasks. Shouldn't be
activated together with the new workflow, called XMLWorkflow.
XMLWorkflow - This Aspect was added in DSpace 1.8 and is responsible for the new configurable
workflow system. Shouldn't be activated together with the Workflow aspect.
Discovery - The Discovery Aspect replaces the standard search with faceted searching. It also takes care
of the faceted browse options. Shouldn't be activated together with SearchArtifacts.
SwordClient - The SwordClient aspect displays options that allow you to "push" DSpace content to
another SWORD-server enabled system.
XMLTest - An aspect to assist developers in creating themes, as it displays different debugging options.
ArtifactBrowser - This aspect has been split up into ViewArtifacts, BrowseArtifacts and SearchArtifacts in
DSpace 1.7.0.
Themes
The <themes> section defines a set of "rules" that determine where themes are installed in the repository. Each
rule is processed in the order that it appears, and the first rule that matches determines the theme that is
applied (so order is important). Each rule consists of a <theme> element with several possible attributes:
name (always required): The name attribute is used to document the theme's name.
path (always required): The path attribute determines where the theme is located relative to the themes/
directory and must either contain a trailing slash or point directly to the theme's sitemap.xmap file.
regex (either regex and/or handle is required): The regex attribute determines which URLs the theme
should apply to.
handle (either regex and/or handle is required): The handle attribute determines which community,
collection, or item the theme should apply to.
If you use the "handle" attribute, the effect is cascading, meaning if a rule is established for a community
then all collections and items within that community will also have this theme apply to them as well. Here
is an example configuration:
<themes>
<theme name="Theme 1" handle="123456789/23" path="theme1/"/>
<theme name="Theme 2" regex="community-list" path="theme2/"/>
<theme name="Reference Theme" regex=".*" path="Reference/"/>
</themes>
In the example above three themes are configured: "Theme 1", "Theme 2", and the "Reference Theme".
The first rule specifies that "Theme 1" will apply to all communities, collections, or items that are
contained under the parent community "123456789/23". The next rule specifies any URL containing the
string "community-list" will get "Theme 2". The final rule, using the regular expression ".*", will match
anything, so all pages which have not matched one of the preceding rules will be matched to the
Reference Theme.
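The "first matching rule wins" behaviour of the example configuration above can be sketched in Python. This is an illustrative model only, not the XMLUI implementation: the ThemeRule class, the select_theme function and the ancestor_handles argument (the handle of the object being viewed plus those of its parent collection and communities) are hypothetical names, and requiring both attributes to match when a rule sets both regex and handle is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThemeRule:
    name: str
    path: str
    regex: Optional[str] = None
    handle: Optional[str] = None


# The three rules from the example configuration, in file order.
RULES = [
    ThemeRule(name="Theme 1", handle="123456789/23", path="theme1/"),
    ThemeRule(name="Theme 2", regex="community-list", path="theme2/"),
    ThemeRule(name="Reference Theme", regex=".*", path="Reference/"),
]


def select_theme(rules: List[ThemeRule],
                 url_path: str,
                 ancestor_handles: List[str]) -> Optional[ThemeRule]:
    """Return the first rule that matches the request, or None."""
    for rule in rules:
        # A regex rule matches when the pattern occurs anywhere in the URL path.
        regex_ok = rule.regex is None or re.search(rule.regex, url_path) is not None
        # A handle rule cascades: it matches the object itself or any ancestor.
        handle_ok = rule.handle is None or rule.handle in ancestor_handles
        if regex_ok and handle_ok:
            return rule
    return None


# An item inside community 123456789/23 picks up "Theme 1" by cascading.
item_theme = select_theme(RULES, "handle/123456789/45", ["123456789/45", "123456789/23"])
```

Reordering RULES so that the ".*" rule comes first would make every request resolve to the Reference Theme, which is why the catch-all rule belongs at the end.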
messages_language_country_variant.xml
messages_language_country.xml
messages_language.xml
messages.xml
The interface will automatically determine which file to select based upon the user's browser and system
configuration. For example, if the user's browser is set to Australian English then first the system will check if
messages_en_au.xml is available. If this translation is not available it will fall back to messages_en.xml, and
finally if that is not available, messages.xml.
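The fallback order described above can be sketched as follows. This is a minimal Python illustration rather than the actual lookup code: the function names candidate_files and pick_catalogue are hypothetical, and the set of available files is passed in explicitly instead of being read from the i18n directory.

```python
from typing import List, Optional, Set


def candidate_files(language: str,
                    country: Optional[str] = None,
                    variant: Optional[str] = None) -> List[str]:
    """List message catalogue names from most to least specific."""
    names = []
    if country and variant:
        names.append(f"messages_{language}_{country}_{variant}.xml")
    if country:
        names.append(f"messages_{language}_{country}.xml")
    names.append(f"messages_{language}.xml")
    names.append("messages.xml")  # the default catalogue is always the last resort
    return names


def pick_catalogue(available: Set[str],
                   language: str,
                   country: Optional[str] = None,
                   variant: Optional[str] = None) -> Optional[str]:
    """Return the most specific catalogue that actually exists."""
    for name in candidate_files(language, country, variant):
        if name in available:
            return name
    return None


# Australian English with only generic English and the default installed.
chosen = pick_catalogue({"messages.xml", "messages_en.xml"}, "en", "au")
```

With only messages.xml and messages_en.xml installed, an Australian English browser ends up with messages_en.xml, matching the example in the text.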
DSpace XMLUI supplies an English only translation of the interface, which can be found in the XMLUI web
application ([dspace]/webapps/xmlui/i18n/messages.xml), after you first build DSpace.
If you wish to add other translations to the system, or make customizations to the existing messages.xml file,
you can place them in the following directory:
[dspace-source]/dspace/modules/xmlui/src/main/webapp/i18n/
After rebuilding DSpace, any messages files placed in this directory will be automatically included in the XMLUI
web application (and files of the same name will override any default files). By default this full directory path
may not exist (if not, just create it) or may be empty. You can place any number of translation catalogues in this
directory. To add additional translations, just add alternative versions of the messages.xml file in specific
language and country variants as needed for your installation.
To set a language other than English as the default language for the repository's interface, you can simply
rename the translation catalogue for the new default language to "messages.xml".
Again, note that you will need to rebuild DSpace for these changes to take effect in your installed XMLUI web
application!
For more information about the [dspace-source]/dspace/modules/ directory, and how it may be used to
"overlay" (or customize) the default XMLUI interface, classes and files, please see: Advanced Customisation
<global-variables>
<theme-path>[your theme's directory]</theme-path>
<theme-name>[your theme's name]</theme-name>
</global-variables>
Update both the theme's path, to the directory name you created in step one, and the theme's name. The
theme's name is used only for documentation.
3) Add your CSS stylesheets
The base theme template will produce a repository interface without any style - just plain XHTML with no color
or formatting. To make your theme useful you will need to supply a CSS Stylesheet that creates your desired
look-and-feel. Add your new CSS stylesheets:
[your theme's directory]/lib/style.css (The base style sheet used for all browsers)
1. Rebuild the DSpace installation package by running the following command from your
[dspace-source]/dspace/ directory:
mvn package
2. Update all DSpace webapps to [dspace]/webapps by running the following command from your
[dspace-source]/dspace/target/dspace-[version]-build.dir directory:
cp -R /[dspace]/webapps/* /[tomcat]/webapps
4. Restart Tomcat
This will ensure the theme has been installed as described in the previous section "Configuring Themes
and Aspects".
Its (the News document) exact rendering in the XHTML UI depends, of course, on the theme. The default
content is designed to operate with the reference themes, so when you modify it, be sure to preserve the tag
structure and e.g. the exact attributes of the first DIV tag. Also note that the text is DRI, not HTML, so you must
use only DRI tags, such as figure to construct an image or xref tag to construct a link.
<document>
  <body>
    <div id="file.news.div.news" n="news" rend="primary">
      <head> TITLE OF YOUR REPOSITORY HERE </head>
      <p>
        INTRO MESSAGE HERE
        Welcome to my wonderful repository etc etc ...
        A service of <xref target="http://myuni.edu/">My University</xref>
      </p>
    </div>
  </body>
  <options/>
  <meta>
    <userMeta/>
    <pageMeta/>
    <repositoryMeta/>
  </meta>
</document>
<document>
  <body>
    <div id="file.news.div.news" n="news" rend="primary">
      <head><i18n:text>myuni.repo.title</i18n:text></head>
      <p>
        <i18n:text>myuni.repo.intro</i18n:text>
        <i18n:text>myuni.repo.a.service.of</i18n:text>
        <xref target="http://myuni.edu/"><i18n:text>myuni.name</i18n:text></xref>
      </p>
    </div>
  </body>
  <options/>
  <meta>
    <userMeta/>
    <pageMeta/>
    <repositoryMeta/>
  </meta>
</document>
Any static HTML content you add to this directory may also reference static content (e.g. CSS, Javascript,
Images, etc.) from the same [dspace-source]/dspace/modules/xmlui/src/main/webapp/static/ directory. You may
reference other static content from your static HTML files similar to the following:
This section will give the necessary steps to set up the OAI-ORE/OAI-PMH Harvester from the XMLUI
(Manakin). This feature is currently not available in the JSPUI.
5. The list of radio buttons labeled "Content being harvested" allows you to select the level of harvest.
These harvesting options include:
Harvest Metadata Only - will only harvest item metadata from the source DSpace (or any OAI-
PMH source)
Harvest metadata and references to bitstreams (requires ORE support) - will harvest item
metadata and create links to files/bitstreams (stored remotely) from the source DSpace (requires
OAI-ORE)
Harvest metadata and bitstreams (requires ORE support) - performs a full local replication.
Harvests both item metadata and files/bitstreams (requires OAI-ORE).
6. Select the appropriate option based on your needs, and click Save
At this point the settings are saved and the menu changes to provide three options:
Change Settings : takes you back to the edit screen (see above instructions)
Import Now: performs a single harvest from the remote collection into the local one. Success, notes, and
errors encountered in the process will be reflected in the "Last Harvest Result" entry. More detailed
information is available in the DSpace log.
Note that the whole harvest cycle is executed within a single HTTP request and will time out for
large collections. For this reason, it is advisable to use the automatic harvest scheduler set up
either in XMLUI or from the command line. If the scheduler is running, "Import Now" will handle
the harvest task as a separate thread.
Reset and Reimport Collection : will perform the same function as "Import Now", but will clear the
collection of all existing items before doing so.
Stop : the "full stop"; waits for the current item to finish harvesting, and aborts further
execution.
Reset Harvest Status : since stopping in the middle of a harvest is likely to result in
collections getting "stuck" in the queue, the button is available to clear all states.
Making DSpace XMLUI Your Own - Concentrates on using Maven to build Overlays in the XMLUI
(Manakin). Also has very basic examples for JSPUI. Based on DSpace 1.6.x.
Learning to Use Manakin (XMLUI) - Overview of how to use Manakin and how it works. Based on
DSpace 1.5, but also valid for 1.6.
Introducing Manakin (XMLUI)
Introduction
Mirage is a new XMLUI theme, added in DSpace 1.7 by @mire. The code was mainly developed by Art Lowel.
The main benefits of Mirage are:
Configuration Parameters
Property: xmlui.theme.mirage.item-list.emphasis
Informational Note: Determines which style should be used to display item lists. Allowed values:
metadata: includes item abstracts in the listing and is suited for scientific articles.
file: immediately shows you whether files are attached to the items, by displaying a
large thumbnail icon for each of the items.
metadata is the default value.
Property: xmlui.theme.enableConcatenation
Informational Note: Enables concatenation for .js and .css files. Enhances performance when enabled by
lowering the number of files that need to be sent to the client per page request (as multiple
files will be concatenated together and sent as one file). Value can be true or false. False by
default.
Property: xmlui.theme.enableMinification
Informational Note: Enables minification for .js and .css files. Enhances performance when enabled by
removing unnecessary whitespace and other characters, thus reducing the size of files to be
sent. Value can be true or false. False by default.
Technical Features
The Simple Item Display underwent a full redesign to provide visitors with a clearer overview of available
metadata and associated files.
Item list views can now be displayed in two distinct styles. Switching between these styles is
possible with the new dspace.cfg parameter 'xmlui.theme.mirage.item-list.emphasis'.
The 'metadata' list style includes item abstracts in the listing and is suited for scientific articles.
The 'file' list style immediately shows you whether files are attached to the items, by displaying a
large thumbnail icon for each of the items.
Based on the new restructured dri2xhtml base templates. Templates in the theme, overriding the
new base templates, are located in the same folder hierarchy to ensure full transparency.
Automated browser feature detection for improved browser compatibility.
In other themes, user agent detection is used to identify which browser version your user is using.
Based on the result of this detection, the theme would use a different cascaded style sheet (CSS)
to render a compatible page for the visitor. This approach has two major issues:
User agent detection isn't very reliable
Maintaining these different CSS files is a maintenance nightmare for developers, especially
when using features from newer browsers.
Mirage applies two novel techniques to resolve these issues:
For compatibility with older Internet Explorer browsers, conditional comments give the body
tag a class corresponding to the version of IE.
Modernizr is used to detect which CSS features are available in the user's browser. This way
you can target all browsers that support a certain feature using CSS classes, and rules
affecting the same element can be put together in the same place for all browsers.
CSS files are now split up according to function instead of browser. style.css will now fit most needs
for customization. The following additional CSS files are included, but will rarely need to be changed:
reset.css ensures that browser-specific initializations are being reset.
base.css contains a few base styles
helper.css contains helper classes to deal with specific functionality.
handheld.css and print.css enable you to define styles for handheld devices and printing of
pages.
jQuery and jQueryUI are included by default. To avoid conflicts the authority control javascript has
been rewritten to use jQuery instead of Prototype and Script.aculo.us.
Enhanced Performance
Caution: when minification is enabled, all code-comments will be removed. This could be a
problem for comments containing copyright notices, so for files with those comments you should
disable minification by adding '?nominify' after the url e.g.
<map:parameter name="javascript" value="lib/js/jquery-ui-1.8.5.custom.min.js?nominify"/>
Disabled by default, these features need to be enabled in the configuration using the properties
'xmlui.theme.enableConcatenation' and 'xmlui.theme.enableMinification'
These features can be enabled for other themes as well, but will require an alteration of the
theme's sitemap.
Javascript references are included at the bottom of the page instead of the top. This optimizes page load
times in general.
Troubleshooting
with
<script type="text/javascript">
    <xsl:text disable-output-escaping="yes">var JsHost = (("https:" == document.location.protocol) ? "https://" : "http://");
    document.write(unescape("%3Cscript src='" + JsHost + "ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js' type='text/javascript'%3E%3C/script%3E"));</xsl:text>
</script>
Thanks to Peter Dietz for providing this fix. Note: This issue is resolved in 1.7.1.
There are two main base templates you can use when creating an XMLUI Theme:
dri2xhtml - used in the generation of default Reference, Classic and Kubrick themes
dri2xhtml-alt - used in the generation of default Mirage theme
You should use only one of these two templates, based on which seems easier to you.
dri2xhtml
Template Structure
dri2xhtml-alt
Configuration and Installation
Features
Template Structure
dri2xhtml
The dri2xhtml base template is the original template for creating XMLUI themes. It attempts to provide generic
XSLT templates which are then applied across the entire DSpace site, thus making it easier to make site-wide
changes.
Template Structure
The dri2xhtml base template consists of five main XSLTs:
dri2xhtml-alt
The dri2xhtml-alt base template is an alternative template for creating XMLUI themes. It contains the same
XSLT templates from dri2xhtml, but they are divided into multiple files and folders. Each file attempts to group
XSLT templates together based on their function, in order to make it easier to find the templates related to the
feature you're trying to modify.
Mirage
<xsl:stylesheet xmlns:i18n="http://apache.org/cocoon/i18n/2.1"
xmlns:dri="http://di.tamu.edu/DRI/1.0/"
xmlns:mets="http://www.loc.gov/METS/"
xmlns:xlink="http://www.w3.org/TR/xlink/"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:dim="http://www.dspace.org/xmlns/dspace/dim"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="i18n dri mets xlink xsl dim xhtml mods dc">
<!--
comment out original dri2xhtml
<xsl:import href="../dri2xhtml.xsl"/>
and enable dri2xhtml-alt
-->
<xsl:import href="../dri2xhtml-alt/dri2xhtml.xsl"/>
<xsl:output indent="yes"/>
Because the contents of dri2xhtml-alt are identical to the current dri2xhtml.xsl and its derivatives, updating any of
the existing themes to reference the new dri2xhtml-alt should not impose any changes in the rendering of the
pages.
Features
Templates divided out into files so they can be more easily located, divided by Aspect, Page and
Functionality
Template Structure
/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes/dri2xhtml-alt/
    aspect
        administrative
            harvesting.xsl
        artifactbrowser
            COinS.xsl
            ORE.xsl
            artifactbrowser.xsl
            collection-list.xsl
            collection-view.xsl
            common.xsl
            community-list.xsl
            community-view.xsl
            item-list.xsl
            item-view.xsl
        general
            choice-authority-control.xsl
    core
        attribute-handlers.xsl
        elements.xsl
        forms.xsl
        global-variables.xsl
        navigation.xsl
        page-structure.xsl
        utils.xsl
    dri2xhtml.xsl
Table of Contents:
Introduction
The Purpose of DRI
The Development of DRI
DRI in Manakin
Themes
Aspect Chains
Common Design Patterns
Localization and Internationalization
Standard attribute triplet
Structure-oriented markup
Schema Overview
Merging of DRI Documents
Version Changes
Changes from 1.0 to 1.1
Element Reference
BODY
cell
div
DOCUMENT
field
figure
head
help
hi
instance
item
label
list
META
metadata
OPTIONS
p
pageMeta
params
reference
referenceSet
repository
repositoryMeta
row
table
trail
userMeta
value
xref
Introduction
This manual describes the Digital Repository Interface (DRI) as it applies to the DSpace digital repository and
its Manakin-based XMLUI interface. DSpace XML UI is a comprehensive user interface system. It is centralized
and generic, allowing it to be applied to all DSpace pages, effectively replacing the JSP-based interface system.
Its ability to apply specific styles to arbitrarily large sets of DSpace pages significantly eases the task of
adapting the DSpace look and feel to that of the adopting institution. This also allows for several levels of
branding, lending institutional credibility to the repository and collections.
Manakin, the second version of DSpace XML UI, consists of several components, written using Java, XML, and
XSL, and is implemented in Cocoon. Central to the interface is the XML Document, which is a semantic
representation of a DSpace page. In Manakin, the XML Document adheres to a schema called the Digital
Repository Interface (DRI) Schema, which was developed in conjunction with Manakin and is the subject of this
guide. For the remainder of this guide, the terms XML Document, DRI Document, and Document will be used
interchangeably.
This reference document explains the purpose of DRI, provides a broad architectural overview, and explains
common design patterns. The appendix includes a complete reference for elements used in the DRI Schema, a
graphical representation of the element hierarchy, and a quick reference table of elements and attributes.
Popular schemas such as XHTML suffer from the problem of not relating elements together explicitly. For
example, if a heading precedes a paragraph, the heading is related to the paragraph not because it is encoded
as such but because it happens to precede it. Translating such structures into formats where these relationships
are explicit becomes tedious and potentially problematic. More structured schemas, like TEI or DocBook, are
domain specific (much like DRI itself) and
therefore not suitable for our purposes.
We also decided that the schema should natively support a metadata standard for encoding artifacts. Rather
than encoding artifact metadata in structural elements, like tables or lists, the schema would include artifacts as
objects encoded in a particular standard. The inclusion of metadata in native format would enable the Theme to
choose the best method to render the artifact for display without being tied to a particular structure.
Ultimately, we chose to develop our own schema. We have constructed the DRI schema by incorporating other
standards when appropriate, such as Cocoon's i18n schema for internationalization, DCMI's Dublin Core, and
the Library of Congress's METS schema. The design of structural elements was derived primarily from TEI, with
some of the design patterns borrowed from other existing standards such as DocBook and XHTML. While the
structural elements were designed to be easily translated into XHTML, they preserve the semantic relationships
for use in more expressive languages.
DRI in Manakin
The general process for handling a request in DSpace XML UI consists of two parts. The first part builds the
XML Document, and the second part stylizes that Document for output. In Manakin, the two parts are not
discrete and are instead wrapped within two processes: Content Generation, which builds an XML representation of
the page, and Style Application, which stylizes the resulting Document. Content Generation is performed by
Aspect chaining, while Style Application is performed by a Theme.
Themes
A Theme is a collection of XSL stylesheets and supporting files like images, CSS styles, translations, and help
documents. The XSL stylesheets are applied to the DRI Document to convert it into a readable format and give it
structure and basic visual formatting in that format. The supporting files are used to provide the page with a
specific look and feel, insert images and other media, translate the content, and perform other tasks. The
currently used output format is XHTML and the supporting files are generally limited to CSS, images, and
JavaScript. More output formats, like PDF or SVG, may be added in the future.
A DSpace installation running Manakin may have several Themes associated with it. When applied to a page, a
Theme determines most of the page's look and feel. Different themes can be applied to different sets of DSpace
pages allowing for both variety of styles between sets of pages and consistency within those sets. The
xmlui.xconf configuration file determines which Themes are applied to which DSpace pages (see the XMLUI
Configuration and Customization section for more information on installing and configuring themes). Themes
may be configured to apply to all pages of a specific type, like browse-by-title, to all pages of one particular
community or collection or sets of communities and collections, and to any mix of the two. They can also be
configured to apply to a single arbitrary page or handle.
Aspect Chains
Manakin Aspects are arrangements of Cocoon components (transformers, actions, matchers, etc.) that
implement a new set of coupled features for the system. These Aspects are chained together to form all the
features of Manakin. Five Aspects exist in the default installation of Manakin, each handling a particular set of
features of DSpace, and more can be added to implement extra features. All Aspects take a DRI Document as
input and generate one as output. This allows Aspects to be linked together to form an Aspect chain. Each
Aspect in the chain takes a DRI Document as input, adds its own functionality, and passes the modified
Document to the next Aspect in the chain.
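The chaining idea can be pictured with a small sketch. This is not how Manakin is implemented (Aspects are Cocoon components, not Python functions); it only illustrates "Document in, Document out". The function names, ids, and the simplified document skeleton are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def new_document():
    """Build an empty, simplified DRI-like document."""
    doc = ET.Element("document", {"version": "1.1"})
    for name in ("meta", "body", "options"):
        ET.SubElement(doc, name)
    return doc

def browse_aspect(doc):
    """Hypothetical Aspect: adds a browse division to the body."""
    ET.SubElement(doc.find("body"), "div",
                  {"id": "example.div.browse", "n": "browse"})
    return doc

def search_aspect(doc):
    """Hypothetical Aspect: adds a search list to the options."""
    ET.SubElement(doc.find("options"), "list",
                  {"id": "example.list.search", "n": "search"})
    return doc

def run_chain(doc, aspects):
    """Pass the document through each Aspect in order."""
    for aspect in aspects:
        doc = aspect(doc)
    return doc

result = run_chain(new_document(), [browse_aspect, search_aspect])
print(ET.tostring(result, encoding="unicode"))
```

Because every step has the same input and output type, an extra Aspect can be appended to the list without changing the others.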
Common Design Patterns
Localization and Internationalization
When the Content Generation process produces a DRI Document, some of the textual content may be marked
up with i18n elements to signify that translations are available for that content. During the Style Application
process, the Theme can also introduce new textual content, marking it up with i18n tags. As a result, after the
Theme's XSL templates are applied to the DRI Document, the final output consists of a DSpace page marked
up in the chosen display format (like XHTML) with i18n elements from both DSpace and XSL content. This final
document is sent through Cocoon's i18n transformer that translates the marked up text.
Standard attribute triplet
Identification is important because it allows elements to be separated from their peers for sorting, special case
rendering, and other tasks. The first attribute, id, is the global identifier and it is unique to the entire document.
Any element that contains an id attribute can thus be uniquely referenced by it. The id attribute of an element
can be either assigned explicitly, or generated from the Java Class Path of the originating object if no name is
given. While all elements that can be uniquely identified can carry the id attribute, only those that are
independent of their context are required to do so. For example, tables are required to have an id since they
retain meaning regardless of their location in the document, while table rows and cells can omit the attribute
since their meaning depends on the parent element.
The name attribute n is simply the name assigned to the element, and it is used to distinguish an element from
its immediate peers. In the example of a particular list, all items in that list will have different names to
distinguish them from each other. Other lists in the document, however, can also contain items whose names
will be different from each other, but identical to those in the first list. The n attribute of an element is therefore
unique only in the scope of that element's parent and is used mostly for sorting purposes and special rendering
of a certain class of elements, like, for example, all first items in lists, or all items named "browse". The n
attribute follows the same rules as id when determining whether or not it is required for a given element.
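The different scopes of id and n can be illustrated with a short check. This is a sketch under assumptions: the sample fragment, its ids, and the helper names are invented for the example, and the n check simply compares all children of one parent.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root):
    """Return id values that occur more than once anywhere in the document."""
    counts = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    return sorted(i for i, c in counts.items() if c > 1)

def duplicate_sibling_names(root):
    """Return (parent tag, n) pairs where two children of one parent share n."""
    clashes = []
    for parent in root.iter():
        counts = Counter(c.get("n") for c in parent if c.get("n") is not None)
        clashes.extend((parent.tag, n) for n, c in counts.items() if c > 1)
    return clashes

# Hypothetical fragment: "browse" is reused as n in two different lists,
# which is allowed because n only has to be unique among siblings.
SAMPLE = """
<body>
  <div id="example.div.a" n="a">
    <list id="example.list.first" n="links">
      <item n="browse">Browse</item>
      <item n="search">Search</item>
    </list>
    <list id="example.list.second" n="more-links">
      <item n="browse">Browse again</item>
    </list>
  </div>
</body>
"""
root = ET.fromstring(SAMPLE)
print(duplicate_ids(root))
print(duplicate_sibling_names(root))
```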
The last attribute in the standard triplet is rend. Unlike id and n, the rend attribute can consist of several space
delimited values and is optional for all elements that can contain it. Its purpose is to provide a rendering hint
from the middle layer component to the styling theme. How that hint is interpreted, and whether it is used at all
when provided, is completely up to the theme. There are several cases, however, where the content of the rend
attribute is outlined in detail and its use is encouraged. Those cases are the emphasis element hi, the division
element div, and the list element. Please refer to the Element Reference for more detail on these elements.
Structure-oriented markup
The final design pattern is the use of structure-oriented markup for content carried by the XML Document. Once
generated by Cocoon, the Document contains two major types of information: metadata about the repository
and its contents, and the actual content of the page to be displayed. A complete overview of metadata and
content markup and their relationship to each other is given in the next section. An important thing to note here,
however, is that the markup of the content is oriented towards explicitly stating structural relationships between
the elements rather than focusing on the presentational aspects. This makes the markup used by the Document
more similar to TEI or DocBook than to HTML. For this reason, XSL templates are used by the themes to
convert structural DRI markup to XHTML. Even then, an attempt is made to create XHTML as structural as
possible, leaving presentation entirely to CSS. This allows the XML Document to be generic enough to
represent any DSpace page without dictating how it should be rendered.
Schema Overview
The DRI XML Document consists of the root element document and three top-level elements that contain two
major types of elements. The three top-level containers are meta, body, and options. The two types of elements
they contain are metadata and content, carrying metadata about the page and the contents of the page,
respectively. Figure 1 depicts the relationship between these six components.
Figure 1: The two content types across three major divisions of a DRI page.
The document element is the root for all DRI pages and contains all other elements. It bears only one attribute,
version, that contains the version number of the DRI system and the schema used to validate the produced
document. At the time of writing the working version number is "1.1".
The meta element is a top-level element under document and contains all metadata information about the
page, the user that requested it, and the repository it is used with. It contains no structural elements, instead
being the only container of metadata elements in a DRI Document. The metadata stored by the meta element is
broken up into three major groups: userMeta, pageMeta, and objectMeta, each storing metadata information
about their respective component. Please refer to the reference entries for more information about these
elements.
The options element is another top-level element that contains all navigation and action options available to the
user. The options are stored as items in list elements, broken up by the type of action they perform. The five
types of actions are: browsing, search, language selection, actions that are always available, and actions that
are context dependent. The last two types, the action types, also contain sub-lists of actions available to users of
varying degrees of access to the system. The options element contains no metadata elements and can only
make use of a small set of structural elements, namely the list element and its children.
The last major top-level element is the body element. It contains all structural elements in a DRI Document,
including the lists used by the options element. Structural elements are used to build a generic representation of
a DSpace page. Any DSpace page can be represented with a combination of the structural elements, which will
in turn be transformed by the XSL templates into another format. This is the core mechanism that allows
DSpace XML UI to apply uniform templates and styling rules to all DSpace pages and is the fundamental
difference from the JSP approach currently used by DSpace.
The body element directly contains only one type of element: div. The div element serves as a major division of
content and any number of them can be contained by the body. Additionally, divisions are recursive, allowing
divs to contain other divs. It is within these elements that all other structural elements are contained. Those
elements include tables, paragraph elements p, and lists, as well as their various children elements. At the
lower levels of this hierarchy lie the character container elements. These elements, namely paragraphs p, table
cells, list items, and the emphasis element hi, contain the textual content of a DSpace page, optionally
modified with links, figures, and emphasis. If the division within which the character class is contained is tagged
as interactive (via the interactive attribute), those elements can also contain interactive form fields. Divisions
tagged as interactive must also provide method and action attributes for their fields to use.
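A few of the structural rules from this overview can be written as a small checker. This is a minimal sketch, not a substitute for validating against the DRI schema; the function name, the sample documents, and the choice of which rules to check are assumptions.

```python
import xml.etree.ElementTree as ET

def check_structure(root):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if root.tag != "document" or root.get("version") is None:
        problems.append("root must be a document element with a version")
    if sorted(child.tag for child in root) != ["body", "meta", "options"]:
        problems.append("document must contain exactly meta, body, and options")
    body = root.find("body")
    if body is not None:
        # The body may directly contain only div elements.
        for child in body:
            if child.tag != "div":
                problems.append(f"body may only contain div, found {child.tag}")
        # Interactive divisions, at any depth, need action and method.
        for div in body.iter("div"):
            if div.get("interactive") == "yes":
                for attr in ("action", "method"):
                    if div.get(attr) is None:
                        problems.append(
                            f"interactive div {div.get('id')} lacks {attr}")
    return problems

GOOD = ('<document version="1.1"><meta/><body>'
        '<div id="d1" n="d1" interactive="yes" action="/submit" method="post"/>'
        '</body><options/></document>')
BAD = ('<document version="1.1"><meta/><body><p>loose</p>'
       '<div id="d2" n="d2" interactive="yes"/></body><options/></document>')

print(check_structure(ET.fromstring(GOOD)))
print(check_structure(ET.fromstring(BAD)))
```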
Merging of DRI Documents
When merging two DRI Documents, one is considered to be the main document, and the other a feeder
document that is added in. The three top level containers (meta, body and options) of both documents are then
individually analyzed and merged. In the case of the options and meta elements, the children tags are taken
individually as well and treated differently from their siblings.
The body elements are the easiest to merge: their respective div children are preserved along with their
ordering and are grouped together under one element. Thus, the new body tag will contain all the divs of the
main document followed by all the divs of the feeder. However, if two divs have the same n and rend attributes
(and in case of an interactive div the same action and method attributes as well), those divs will be merged into
one. The resulting div will bear the id, n, and rend attributes of the main document's div and contain all the divs
of the main document followed by all the divs of the feeder. This process continues recursively until all the divs
have been merged. It should be noted that two divisions with separate pagination rules cannot be merged
together.
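The body merge described above can be sketched as a short recursive function. This is an illustration rather than Manakin's actual merge code: the ids are invented, the pagination restriction is not modeled, and appending unmatched non-div children after the main document's content is an assumption, since the text only describes divs.

```python
import copy
import xml.etree.ElementTree as ET

MATCH_ATTRS = ("n", "rend", "action", "method")

def same_div(a, b):
    """Two divs match when n, rend, and (if present) action and method agree."""
    return all(a.get(k) == b.get(k) for k in MATCH_ATTRS)

def merge_children(main, feeder):
    """Merge feeder's children into main in place, recursing into matching divs."""
    for child in feeder:
        match = None
        if child.tag == "div":
            match = next((d for d in main.findall("div") if same_div(d, child)),
                         None)
        if match is not None:
            merge_children(match, child)       # main's id, n, and rend are kept
        else:
            main.append(copy.deepcopy(child))  # unmatched content follows main's

main = ET.fromstring(
    '<body><div id="m.a" n="a"><div id="m.a1" n="a1"/></div></body>')
feeder = ET.fromstring(
    '<body><div id="f.a" n="a"><div id="f.a2" n="a2"/></div>'
    '<div id="f.b" n="b"/></body>')
merge_children(main, feeder)
print(ET.tostring(main, encoding="unicode"))
```

In this sample the two divs named "a" collapse into one that keeps the main document's id, while the feeder's unmatched div "b" is appended at the end.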
Merging the options elements is somewhat different. First, list elements under options of both documents are
compared with each other. Those unique to either document are simply added under the new options element,
just like divs under body. In case of duplicates, that is list elements that belong to both documents and have the
same n attribute, the two lists will be merged into one. The new list element will consist of the main document's
head element, followed by the label-item pairs from the main document, and then finally the label-item pairs of the
feeder, provided they are different from those of the main.
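The options merge can be sketched in the same spirit. The helper names, the sample lists, and the way two entries are judged "different" (by comparing their serialized markup) are assumptions made for this example, not the rule Manakin necessarily applies.

```python
import copy
import xml.etree.ElementTree as ET

def entries(lst):
    """Split a list's children into (label or None, item) pairs, skipping head."""
    pairs, label = [], None
    for child in lst:
        if child.tag == "label":
            label = child
        elif child.tag == "item":
            pairs.append((label, child))
            label = None
    return pairs

def fingerprint(elem):
    """Serialize an element without trailing text so equal entries compare equal."""
    if elem is None:
        return None
    clone = copy.deepcopy(elem)
    clone.tail = None
    return ET.tostring(clone, encoding="unicode")

def merge_options(main, feeder):
    """Merge feeder's option lists into main in place, matching lists by n."""
    for flist in feeder.findall("list"):
        mlist = next((l for l in main.findall("list")
                      if l.get("n") == flist.get("n")), None)
        if mlist is None:
            main.append(copy.deepcopy(flist))   # list unique to the feeder
            continue
        seen = {tuple(fingerprint(e) for e in pair) for pair in entries(mlist)}
        for pair in entries(flist):
            if tuple(fingerprint(e) for e in pair) not in seen:
                mlist.extend([copy.deepcopy(e) for e in pair if e is not None])

main = ET.fromstring(
    '<options><list id="m.browse" n="browse"><head>Browse</head>'
    '<item><xref target="/browse-title">Titles</xref></item></list></options>')
feeder = ET.fromstring(
    '<options><list id="f.browse" n="browse"><head>Browse (feeder)</head>'
    '<item><xref target="/browse-title">Titles</xref></item>'
    '<item><xref target="/browse-author">Authors</xref></item></list>'
    '<list id="f.account" n="account"><head>My Account</head></list></options>')
merge_options(main, feeder)
print(ET.tostring(main, encoding="unicode"))
```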
Finally, the meta elements are merged much like the elements under body. The three children of meta -
userMeta, pageMeta, and objectMeta - are individually merged, adding the contents of the feeder after the
contents of the main.
Version Changes
The DRI schema will continue to evolve over time as the needs of interface design require. The version attribute
on the document will indicate which version of the schema the document conforms to. At the time Manakin was
incorporated into the standard distribution of DSpace, the current version was "1.1"; however, earlier versions of
the Manakin interface may use "1.0".
Element Reference
Element         Attribute                 Required?
BODY
cell            cols
                id
                rend
                role
                rows
div             behaviorSensitiveFields
                currentPage
                firstItemIndex
                id                        required
                interactive
                itemsTotal
                lastItemIndex
                n                         required
                nextPage
                pagesTotal
                pageURLMask
                pagination
                previousPage
                rend
field           disabled
                id                        required
                n                         required
                rend
                required
                type                      required
figure          rend
                source
                target
head            id
                rend
help
hi              rend                      required
instance
item            id
                rend
label           id
                rend
list            id                        required
                n                         required
                rend
                type
META
metadata        element                   required
                language
                qualifier
OPTIONS
p               id
                rend
pageMeta
params          cols
                maxlength
                multiple
                operations
                rows
                size
reference       url                       required
                repositoryID              required
                type
referenceSet    id                        required
                n                         required
                orderBy
                rend
                type                      required
repository      repositoryID              required
                url                       required
repositoryMeta
row             id
                rend
                role                      required
table           cols                      required
                id                        required
                n                         required
                rend
                rows                      required
trail           rend
                target
value           optionSelected
                optionValue
                type                      required
BODY
Top-Level Container
The body element is the main container for all content displayed to the user. It contains any number of div
elements that group content into interactive and display blocks.
Parent
document
Children
div (any)
Attributes
None
<document version="1.0">
    <meta> ... </meta>
    <body>
        <div n="division-example1"
             id="XMLExample.div.division-example1">
            ...
        </div>
        <div n="division-example2" id="XMLExample.div.division-example2"
             interactive="yes" action="www.DRItest.com"
             method="post">
            ...
        </div>
        ...
    </body>
    <options> ... </options>
</document>
cell
Rich Text Container
Structural Element
The cell element contained in a row of a table carries content for that table. It is a character container, just like p,
item, and hi, and its primary purpose is to display textual data, possibly enhanced with hyperlinks, emphasized
blocks of text, images and form fields. Every cell can be annotated with a role (the most common being
"header" and "data") and can stretch across any number of rows and columns. Since cells cannot exist outside
their container, row, their id attribute is optional.
Parent
row
Children
hi (any)
xref (any)
figure (any)
field (any)
Attributes
div
Structural Element
The div element represents a major section of content and can contain a wide variety of structural elements to
present that content to the user. It can contain paragraphs, tables, and lists, as well as references to artifact
information stored in artifactMeta, repositoryMeta, collections, and communities. The div element is also
recursive, allowing it to be further divided into other divs. Divs can be of two types: interactive and static. The
two types are set by the use of the interactive attribute and differ in their ability to contain interactive content.
Children elements of divs tagged as interactive can contain form fields, with the action and method attributes of
the div serving to resolve those fields.
Parent
body
div
Children
Attributes
action: (required for interactive) The form action attribute determines where the form information should
be sent for processing.
behavior: (optional for interactive) The acceptable behavior options that may be used on this form. The
only possible value defined at this time is "ajax" which means that the form may be submitted multiple
times for each individual field in this form. Note that if the form is submitted multiple times it is best for the
behaviorSensitiveFields to be updated as well.
behaviorSensitiveFields: (optional for interactive) A space separated list of field names that are
sensitive to behavior. These fields must be updated each time a form is submitted without a complete
refresh of the page (i.e. ajax).
currentPage: (optional) For paginated divs, the currentPage attribute indicates the index of the page
currently displayed for this div.
firstItemIndex: (optional) For paginated divs, the firstItemIndex attribute indicates the index of the first
item included in this div.
id: (required) A unique identifier of the element.
interactive: (optional) Accepted values are "yes", "no". This attribute determines whether the div is
interactive or static. Interactive divs must provide action and method and can contain field elements.
itemsTotal: (optional) For paginated divs, the itemsTotal attribute indicates how many items exist across
all paginated divs.
lastItemIndex: (optional) For paginated divs, the lastItemIndex attribute indicates the index of the last
item included in this div.
method: (required for interactive) Accepted values are "get", "post", and "multipart". Determines the
method used to pass gathered field values to the handler specified by the action attribute. The multipart
method should be used for uploading files.
n: (required) A local identifier used to differentiate the element from its siblings.
nextPage: (optional) For paginated divs the nextPage attribute points to the URL of the next page of the
div, if it exists.
pagesTotal: (optional) For paginated divs, the pagesTotal attribute indicates how many pages the
paginated div spans.
pageURLMask: (optional) For paginated divs, the pageURLMask attribute contains the mask of a url to a
particular page within the paginated set. The destination page's number should replace the {pageNum}
string in the URL mask to generate a full URL to that page.
pagination: (optional) Accepted values are "simple", "masked". This attribute determines whether the div
is spread over several pages. Simple paginated divs must provide previousPage, nextPage, itemsTotal,
firstItemIndex, lastItemIndex attributes. Masked paginated divs must provide currentPage, pagesTotal,
pageURLMask, itemsTotal, firstItemIndex, lastItemIndex attributes.
previousPage: (optional) For paginated divs the previousPage attribute points to the URL of the
previous page of the div, if it exists.
rend: (optional) A rendering hint used to override the default display of the element. In the case of the div
tag, it is also encouraged to label it as either "primary" or "secondary". Divs marked as primary contain
content, while secondary divs contain auxiliary information or supporting fields.
<body>
    <div n="division-example"
         id="XMLExample.div.division-example">
        <head> Example Division </head>
        <p> This example shows the use of divisions. </p>
        <table ...>
            ...
        </table>
        <referenceSet ...>
            ...
        </referenceSet>
        <list ...>
            ...
        </list>
        <div n="sub-division-example"
             id="XMLExample.div.sub-division-example">
            <p> Divisions may be nested </p>
            ...
        </div>
        ...
    </div>
    ...
</body>
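The pagination attributes described above lend themselves to a short worked example. This is a sketch only: the helper names and the sample attribute values are hypothetical, and a div's attributes are represented as a plain dictionary.

```python
# Attributes each pagination style must provide, as listed for the
# pagination attribute.
REQUIRED = {
    "simple": ("previousPage", "nextPage", "itemsTotal",
               "firstItemIndex", "lastItemIndex"),
    "masked": ("currentPage", "pagesTotal", "pageURLMask", "itemsTotal",
               "firstItemIndex", "lastItemIndex"),
}

def missing_pagination_attrs(attrs):
    """Return the pagination attributes a div's attribute dict still lacks."""
    kind = attrs.get("pagination")
    return [name for name in REQUIRED.get(kind, ()) if name not in attrs]

def page_url(mask, page):
    """Fill the {pageNum} placeholder of a pageURLMask with a page number."""
    return mask.replace("{pageNum}", str(page))

# Hypothetical attributes of a masked paginated div.
div_attrs = {
    "pagination": "masked", "currentPage": "2", "pagesTotal": "5",
    "pageURLMask": "browse?page={pageNum}", "itemsTotal": "98",
    "firstItemIndex": "21", "lastItemIndex": "40",
}
print(missing_pagination_attrs(div_attrs))
print(page_url(div_attrs["pageURLMask"], 3))
```

With masked pagination a theme can build a link to any page from the mask alone, whereas simple pagination only exposes the previous and next page URLs.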
DOCUMENT
Document Root
The document element is the root container of an XML UI document. All other elements are contained within it
either directly or indirectly. The only attribute it carries is the version of the Schema to which it conforms.
Parent
none
Children
meta (one)
body (one)
options (one)
Attributes
version: (required) Version number of the schema this document adheres to. At the time of writing the
only valid version numbers are "1.0" or "1.1". Future iterations of this schema may increment the version
number.
<document version="1.1">
    <meta>
        ...
    </meta>
    <body>
        ...
    </body>
    <options>
        ...
    </options>
</document>
field
Text Container
Structural Element
The field element is a container for all information necessary to create a form field. The required type attribute
determines the type of the field, while the children tags carry the information on how to build it. Fields can only
occur in divisions tagged as "interactive".
Parent
cell
p
hi
item
Children
params (one)
help (zero or one)
error (any)
option (any - only with the select type)
value (any - only available on fields of type: select, checkbox, or radio)
field (one or more - only with the composite type)
valueSet (any)
Attributes
disabled: (optional) Accepted values are "yes", "no". Determines whether the field allows user input.
Rendering of disabled fields may vary with implementation and display media.
id: (required) A unique identifier for a field element.
n: (required) A non-unique local identifier used to differentiate the element from its siblings within an
interactive division. This is the name of the field used when data is submitted back to the server.
rend: (optional) A rendering hint used to override the default display of the element.
required: (optional) Accepted values are "yes", "no". Determines whether the field is a required
component of the form and thus cannot be left blank.
type: (required) A required attribute to specify the type of value. Accepted types are:
button: A button input control that when activated by the user will submit the form, including all
the fields, back to the server for processing.
checkbox: A boolean input control which may be toggled by the user. A checkbox may have
several fields which share the same name and each of those fields may be toggled independently.
This is distinct from a radio button where only one field may be toggled.
file: An input control that allows the user to select files to be submitted with the form. Note that a
form which uses a file field must use the multipart method.
hidden: An input control that is not rendered on the screen and hidden from the user.
password: A single-line text input control where the input text is rendered in such a way as to
hide the characters from the user.
radio: A boolean input control which may be toggled by the user. Multiple radio button fields may
share the same name. When this occurs only one field may be selected to be true. This is distinct
from a checkbox where multiple fields may be toggled.
select: A menu input control which allows the user to select from a list of available options.
text: A single-line text input control.
textarea: A multi-line text input control.
composite: A composite input control combines several input controls into a single field. The only
fields that may be combined together are: checkbox, password, select, text, and textarea. When
fields are combined together they can possess multiple combined values.
<p>
    <hi> ... </hi>
    <xref> ... </xref>
    <figure> ... </figure>
    ...
    <field id="XMLExample.field.name" n="name" type="text"
           required="yes">
        <params size="16" maxlength="32"/>
        <help>Some help text with <i18n>localized content</i18n>.</help>
        <value type="raw">Default value goes here</value>
    </field>
</p>
figure
Text Container
Structural Element
The figure element is used to embed a reference to an image or a graphic element. It can be mixed freely with
text, and any text within the tag itself will be used as an alternative descriptor or a caption.
Parent
cell
p
hi
item
Children
none
Attributes
rend: (optional) A rendering hint used to override the default display of the element.
source: (optional) The source for the image, using either a URL or a pre-defined XML entity.
target: (optional) A target for an image used as a link, using either a URL or an id of an existing element
as a destination.
<p>
    <hi> ... </hi>
    ...
    <xref> ... </xref>
    ...
    <field> ... </field>
    ...
    <figure source="www.example.com/fig1"> This is a static image. </figure>
    <figure source="www.example.com/fig1"
            target="www.example.net">
        This image is also a link.
    </figure>
    ...
</p>
head
Text Container
Structural Element
The head element is primarily used as a label associated with its parent element. The rendering is determined
by its parent tag, but can be overridden by the rend attribute. Since there can only be one head element
associated with a particular tag, the n attribute is not needed, and the id attribute is optional.
Parent
div
table
list
referenceSet
Children
none
Attributes
<div ...>
    <head> This is a simple header associated with its div element. </head>
    <div ...>
        <head rend="green"> This header will be green. </head>
        <p>
            <head> A header with <i18n>localized content</i18n>. </head>
            ...
        </p>
    </div>
    <table ...>
        <head> ... </head>
        ...
    </table>
    <list ...>
        <head> ... </head>
        ...
    </list>
    ...
</div>
help
Text Container
Structural Element
The optional help element is used to supply help instructions in plain text and is normally contained by the field
element. The method used to render the help text in the target markup is up to the theme.
Parent
field
Children
none
Attributes
None
<p>
<hi> ... </hi>
...
<xref> ... </xref>
...
<figure> ... </figure>
...
hi
Rich Text Container
Structural Element
The hi element is used for emphasis of text and occurs inside character containers like p and list item. It can be
mixed freely with text, and any text within the tag itself will be emphasized in a manner specified by the required
rend attribute. Additionally, the hi element is the only text container component that is a rich text container itself,
meaning it can contain other tags in addition to plain text. This allows it to contain other text containers,
including other hi tags.
Parent
cell
p
item
hi
Children
hi (any)
xref (any)
figure (any)
field (any)
Attributes
rend: (required) A required attribute used to specify the exact type of emphasis to apply to the contained
text. Common values include but are not limited to "bold", "italic", "underline", and "emph".
<p>
    This text is normal, while <hi rend="bold">this text is bold and
    this text is <hi rend="italic">bold and italic.</hi></hi>
</p>
instance
Structural Element
The instance element contains the value associated with a form field's multiple instances. Fields encoded as an
instance should also include the values of each instance as a hidden field. The hidden field should be appended
with the index number for the instance. Thus if the field is "firstName" each instance would be named
"firstName_1", "firstName_2", "firstName_3", etc.
Parent
field
Children
value
Attributes
Example needed.
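Pending a markup example, the naming convention for the hidden fields can at least be sketched. The helper name below is hypothetical; only the "name, underscore, index starting at 1" pattern comes from the description above.

```python
def instance_field_names(field_name, count):
    """Names of the hidden fields for `count` instances, numbered from 1."""
    return [f"{field_name}_{index}" for index in range(1, count + 1)]

print(instance_field_names("firstName", 3))
# prints ['firstName_1', 'firstName_2', 'firstName_3']
```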
item
Rich Text Container
Structural Element
The item element is a rich text container used to display textual data in a list. As a rich text container it can
contain hyperlinks, emphasized blocks of text, images and form fields in addition to plain text.
The item element can be associated with a label that directly precedes it. The Schema requires that if one item
in a list has an associated label, then all other items must have one as well. This mitigates the problem of loose
connections between elements that is commonly encountered in XHTML, since every item in a particular list has
the same structure.
Parent
list
Children
hi (any)
xref (any)
figure (any)
field (any)
list (any)
Attributes
<list n="list-example"
      id="XMLExample.list.list-example">
    <head> Example List </head>
    <item> This is the first item </item>
    <item> This is the second item with <hi ...>highlighted text</hi>,
        <xref ...> a link</xref> and an <figure ...>image</figure>.</item>
    ...
    <list n="list-example2"
          id="XMLExample.list.list-example2">
        <head> Example List </head>
        <label>ITEM ONE:</label>
        <item> This is the first item </item>
        <label>ITEM TWO:</label>
        <item> This is the second item with <hi ...>highlighted text</hi>,
            <xref ...> a link</xref> and an <figure ...>image</figure>.</item>
        <label>ITEM THREE:</label>
        <item> This is the third item with a <field ...> ... </field> </item>
        ...
    </list>
    <item> This is the third item in the list </item>
    ...
</list>
label
Text Container
Structural Element
The label element is associated with an item and annotates that item with a number, a textual description of
some sort, or a simple bullet.
Parent
item
Children
none
Attributes
<list n="list-example"
      id="XMLExample.list.list-example">
    <head>Example List</head>
    <label>1</label>
    <item> This is the first item </item>
    <label>2</label>
    <item> This is the second item with <hi ...>highlighted text</hi>,
        <xref ...> a link</xref> and an <figure ...>image</figure>.</item>
    ...
    <list n="list-example2"
          id="XMLExample.list.list-example2">
        <head>Example Sublist</head>
        <label>ITEM ONE:</label>
        <item> This is the first item </item>
        <label>ITEM TWO:</label>
        <item> This is the second item with <hi ...>highlighted text</hi>,
            <xref ...> a link</xref> and an <figure ...>image</figure>.</item>
        <label>ITEM THREE:</label>
        <item> This is the third item with a <field ...> ... </field> </item>
        ...
    </list>
    <item> This is the third item in the list </item>
    ...
</list>
list
Structural Element
The list element is used to display sets of sequential data. It contains an optional head element, as well as any
number of item and list elements. Items contain textual information, while sublists contain other item or list
elements. An item can also be associated with a label element that annotates an item with a number, a textual
description of some sort, or a simple bullet. The list type (ordered, bulleted, gloss, etc.) is then determined either
by the content of labels on items or by an explicit value of the type attribute. Note that if labels are used in
conjunction with any items in a list, all of the items in that list must have a label. It is also recommended to avoid
mixing label styles unless an explicit type is specified.
Parent
div
list
Children
Attributes
<div ...>
...
<list n="list-example"
id="XMLExample.list.list-example">
<head>Example List</head>
<item> ... </item>
<item> ... </item>
...
<list n="list-example2"
id="XMLExample.list.list-example2">
<head>Example Sublist</head>
<label> ... </label>
<item> ... </item>
<label> ... </label>
<item> ... </item>
<label> ... </label>
<item> ... </item>
...
</list>
<label> ... </label>
<item> ... </item>
...
</list>
</div>
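The all-or-none labelling rule described for list and item can be checked mechanically. The following is a minimal sketch, not part of DSpace: it assumes a namespace-free DRI fragment (real documents may declare a namespace), and the function names labels_consistent and check_lists are hypothetical.

```python
import xml.etree.ElementTree as ET


def labels_consistent(list_element):
    """True if every item in this list is directly preceded by a label, or none is."""
    labelled = []
    previous_tag = None
    for child in list_element:  # direct children only; sublists are checked separately
        if child.tag == "item":
            labelled.append(previous_tag == "label")
        previous_tag = child.tag
    return all(labelled) or not any(labelled)


def check_lists(xml_text):
    """Map each list's n attribute to the result of the labelling check."""
    root = ET.fromstring(xml_text)
    return {lst.get("n"): labels_consistent(lst) for lst in root.iter("list")}


# Hypothetical sample: the outer list mixes labelled and unlabelled items.
SAMPLE = """
<div>
  <list n="list-example">
    <head>Example List</head>
    <item>first</item>
    <list n="list-example2">
      <head>Example Sublist</head>
      <label>1</label><item>one</item>
      <label>2</label><item>two</item>
    </list>
    <label>stray</label><item>second</item>
  </list>
</div>
"""

print(check_lists(SAMPLE))
```

In this sample the outer list fails the check because only one of its two items has a label, while the sublist passes.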
META
Top-Level Container
The meta element is a top level element and exists directly inside the document element. It serves as a
container element for all metadata associated with a document broken up into categories according to the type
of metadata they carry.
Parent
document
Children
userMeta (one)
pageMeta (one)
repositoryMeta (one)
Attributes
None
<document version="1.0">
<meta>
<userMeta> ... </userMeta>
<pageMeta> ... </pageMeta>
<repositoryMeta> ... </repositoryMeta>
</meta>
...
</document>
metadata
Text Container
Structural Element
The metadata element carries generic metadata information in the form of an attribute-value pair. The type of
information it contains is determined by two attributes: element, which specifies the general type of metadata
stored, and an optional qualifier attribute that narrows the type down. The standard representation for this
pairing is element.qualifier. The actual metadata is contained in the text of the tag itself. Additionally, a language
attribute can be used to specify the language used for the metadata entry.
Parent
userMeta
pageMeta
Children
none
Attributes
<meta>
<userMeta>
<metadata element="identifier" qualifier="firstName"> Bob </metadata>
<metadata element="identifier" qualifier="lastName"> Jones </metadata>
<metadata ...> ... </metadata>
...
</userMeta>
<pageMeta>
<metadata element="rights" qualifier="accessRights">user</metadata>
<metadata ...> ... </metadata>
...
</pageMeta>
</meta>
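The element.qualifier representation described above can be derived directly from the two attributes. This is a minimal sketch under stated assumptions: the fragment is namespace-free, the language attribute is named language as in the description, the title entry is invented sample data, and metadata_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def metadata_pairs(container):
    """Return (element.qualifier, value, language) tuples for each metadata child."""
    pairs = []
    for md in container.findall("metadata"):
        key = md.get("element")
        qualifier = md.get("qualifier")
        if qualifier:  # the qualifier is optional
            key = f"{key}.{qualifier}"
        pairs.append((key, (md.text or "").strip(), md.get("language")))
    return pairs


# Sample data modelled on the example above; the title entry is hypothetical.
SAMPLE = """
<meta>
  <userMeta>
    <metadata element="identifier" qualifier="firstName"> Bob </metadata>
    <metadata element="identifier" qualifier="lastName"> Jones </metadata>
  </userMeta>
  <pageMeta>
    <metadata element="rights" qualifier="accessRights">user</metadata>
    <metadata element="title" language="en">Example page</metadata>
  </pageMeta>
</meta>
"""

meta = ET.fromstring(SAMPLE)
for section in meta:
    print(section.tag, metadata_pairs(section))
```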
OPTIONS
Top-Level Container
The options element is the main container for all actions and navigation options available to the user. It consists
of any number of list elements whose items contain navigation information and actions. While any list of
navigational options may be contained in this element, it is suggested that at least the following 5 lists be
included.
Parent
document
Children
list (any)
Attributes
None
<document version="1.0">
<options>
<list n="navigation-example1"
id="XMLExample.list.navigation-example1">
<item><xref target="/link/to/option">Option One</xref></item>
<item><xref target="/link/to/option">Option two</xref></item>
...
</list>
<list n="navigation-example2"
id="XMLExample.list.navigation-example2">
<item><xref target="/link/to/option">Option One</xref></item>
<item><xref target="/link/to/option">Option two</xref></item>
...
</list>
...
</options>
</document>
p
Rich Text Container
Structural Element
The p element is a rich text container used by divs to display textual data in a paragraph format. As a rich text
container it can contain hyperlinks, emphasized blocks of text, images and form fields in addition to plain text.
Parent
div
Children
hi (any)
xref (any)
figure (any)
field (any)
Attributes
<div n="division-example"
id="XMLExample.div.division-example">
</div>
pageMeta
Metadata Element
The pageMeta element contains metadata associated with the document itself. It contains generic metadata
elements to carry the content, and any number of trail elements to provide information on the user's current
location in the system. Required and suggested values for metadata elements contained in pageMeta include
but are not limited to:
browser (suggested): The user's browsing agent as reported to the server in the HTTP request.
browser.type (suggested): The general browser family as derived from the browser metadata field.
Possible values may include "MSIE" (for Microsoft Internet Explorer), "Opera" (for the Opera browser),
"Apple" (for Apple web kit based browsers), "Gecko" (for Netscape, Mozilla, and Firefox based
browsers), or "Lynx" (for text based browsers).
browser.version (suggested): The browser version as reported by the HTTP request.
contextPath (required): The base URL of the Digital Repository system.
redirect.time (suggested): The time that must elapse before the page is redirected to an address
specified by the redirect.url metadata element.
redirect.url (suggested): The URL destination of a redirect page.
title (required): The title of the document/page that the user is currently browsing.
See the metadata and trail tag entries for more information on their structure.
Parent
meta
Children
metadata (any)
trail (any)
Attributes
None
<meta>
<pageMeta>
<metadata
element="contextPath">/xmlui/</metadata>
...
...
</pageMeta>
</meta>
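A consumer of pageMeta may want to confirm that the two required values (contextPath and title) are present and to read the trail elements in order. The sketch below is illustrative only: it assumes a namespace-free fragment, and the helper names missing_required and breadcrumb, along with the sample data, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values marked "required" in the pageMeta description above.
REQUIRED_ELEMENTS = ("contextPath", "title")


def missing_required(page_meta):
    """List required metadata elements that do not appear (unqualified) in pageMeta."""
    present = {
        md.get("element")
        for md in page_meta.findall("metadata")
        if md.get("qualifier") is None
    }
    return [name for name in REQUIRED_ELEMENTS if name not in present]


def breadcrumb(page_meta):
    """Return (text, target) for each trail element, in document order."""
    return [((t.text or "").strip(), t.get("target")) for t in page_meta.findall("trail")]


# Hypothetical sample: title is deliberately left out.
SAMPLE = """
<pageMeta>
  <metadata element="contextPath">/xmlui/</metadata>
  <metadata element="browser" qualifier="type">Gecko</metadata>
  <trail target="/xmlui/">Home</trail>
  <trail>Browse</trail>
</pageMeta>
"""

page_meta = ET.fromstring(SAMPLE)
print(missing_required(page_meta))
print(breadcrumb(page_meta))
```

A trail element without a target yields None as its target, which a theme could render as plain text rather than as a link.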
params
Structural Component
The params element identifies extra parameters used to build a form field. There are several attributes that may
be available for this element depending on the field type.
Parent
field
Children
none
Attributes
cols: (optional) The default number of columns that the text area should span. This applies only to
textarea field types.
maxlength: (optional) The maximum length that the theme should accept for form input. This applies to
text and password field types.
multiple: (optional) yes/no value. Determines whether the field can accept multiple values. This
applies only to select lists.
operations: (optional) The possible operations that may be performed on this field. The possible values
are "add" and/or "delete". If both operations are possible then they should be provided as a space
separated list. The "add" operation indicates that there may be multiple values for this field and the user
may add to the set one at a time. The front-end should render a button that enables the user to add more
fields to the set. The button must be named the field name appended with the string "_add", thus if the
field's name is "firstName" the button must be called "firstName_add". The "delete" operation indicates
that there may be multiple values for this field, each of which may be removed from the set. The front-end
should render a checkbox by each field value, except for the first. The checkbox must be named the field
name appended with the string "_selected", thus if the field's name is "firstName" the checkbox must be
called "firstName_selected", and the value of each successive checkbox should be the field name. The
front-end must also render a delete button. The delete button name must be the field's name appended
with the string "_delete".
rows: (optional) The default number of rows that the text area should span. This applies only to textarea
field types.
size: (optional) The default size for a field. This applies to text, password, and select field types.
<p>
<field ...>
<params size="16"
maxlength="32"/>
</field>
</p>
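The naming rules for the operations attribute can be summarized in a few lines. This is a sketch of the convention as described above, not DSpace code; the function name control_names and the dictionary keys are hypothetical.

```python
def control_names(field_name, operations):
    """Derive the control names a front-end must render for a field's operations.

    `operations` is the space-separated attribute value, e.g. "add delete".
    """
    ops = operations.split()
    unknown = [op for op in ops if op not in ("add", "delete")]
    if unknown:
        raise ValueError(f"unsupported operations: {unknown}")

    names = {}
    if "add" in ops:
        # Button that lets the user add another value to the set.
        names["add_button"] = field_name + "_add"
    if "delete" in ops:
        # Checkbox shown beside each value, and the button that removes the checked ones.
        names["select_checkbox"] = field_name + "_selected"
        names["delete_button"] = field_name + "_delete"
    return names


print(control_names("firstName", "add delete"))
```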
reference
Metadata Reference Element
reference is a reference element used to access information stored in an external metadata file. The url attribute
is used to locate the external metadata file. The type attribute provides a short limited description of the
referenced object's type.
reference elements can be both contained by referenceSet elements and contain referenceSets themselves, making
the structure recursive.
Parent
referenceSet
Children
Attributes
<referenceSet n="browse-list"
id="XMLTest.referenceSet.browse-list">
<reference url="/metadata/handle/123/4/mets.xml"
repositoryID="123" type="DSpace Item"/>
<reference url="/metadata/handle/123/5/mets.xml"
repositoryID="123" />
...
</referenceSet>
referenceSet
Metadata Reference Element
Parent
div
reference
Children
Attributes
detailList: Indicates that the metadata from referenced artifacts or repository objects should be
used to build a list representation that provides a complete, or near complete, view of the
referenced objects. Whether such a view is possible or different from summaryView depends
largely on the repository at hand and the implementing theme.
detailView: Indicates that the metadata from referenced artifacts or repository objects should be
used to display complete information about the referenced object. Rendering of several references
included under this type is up to the theme.
<div ...>
<head> Example Division </head>
<p> ... </p>
<table> ... </table>
<list>
...
</list>
<referenceSet n="browse-list"
id="XMLTest.referenceSet.browse-list" type="summaryView"
informationModel="DSpace">
<head>A header for the referenceSet</head>
<reference
url="/metadata/handle/123/34/mets.xml"/>
<reference
url="/metadata/handle/123/34/mets.xml"/>
</referenceSet>
...
</div>
repository
Metadata Element
The repository element is used to describe the repository. Its principal component is a set of structural metadata
that carries information on how the repository's objects under objectMeta are related to each other. The principal
method of encoding these relationships at the time of this writing is a METS document, although other formats,
like RDF, may be employed in the future.
Parent
repositoryMeta
Children
none
Attributes
repositoryID: (required) A unique identifier assigned to a repository. It is referenced by the object element
to signify the repository that assigned its identifier.
url: (required) A URL to the external METS metadata file for the repository.
<repositoryMeta>
<repository repositoryID="123456789"
url="/metadata/handle/1234/4/mets.xml" />
</repositoryMeta>
repositoryMeta
Metadata Element
The repositoryMeta element contains metadata references about the repositories used or
referenced in the document. It can contain any number of repository elements.
See the repository tag entry for more information on the structure of repository elements.
Parent
meta
Children
repository (any)
Attributes
None
<meta>
<repositoryMeta>
</repositoryMeta>
</meta>
row
Structural Element
The row element is contained inside a table and serves as a container of cell elements. A required role attribute
determines how the row and its cells are rendered.
Parent
table
Children
cell (any)
Attributes
<table ...>
<row role="head">
...
</row>
<row>
...
</row>
...
</table>
table
Structural Element
The table element is a container for information presented in tabular format. It consists of a set of row elements
and an optional header.
Parent
div
Children
Attributes
<div n="division-example"
id="XMLExample.div.division-example">
<table ...>
<row role="head">
...
</row>
<row>
...
</row>
...
</table>
...
</div>
trail
Text Container
Metadata Element
The trail element carries information about the user's current location in the system relative to the repository's
root page. Each instance of the element serves as one link in the path from the root to the current page.
Parent
pageMeta
Children
none
Attributes
rend: (optional) A rendering hint used to override the default display of the element.
target: (optional) An optional attribute to specify a target URL for a trail element serving as a hyperlink.
The text inside the element will be used as the text of the link.
<pageMeta>
<metadata
element="contextPath">/xmlui/</metadata>
...
...
</pageMeta>
userMeta
Metadata Element
The userMeta element contains metadata associated with the user that requested the document. It contains
generic metadata elements, which in turn carry the information. Required and suggested values for metadata
elements contained in userMeta include but are not limited to:
See the metadata tag entry for more information on the structure of metadata elements.
Parent
meta
Children
metadata (any)
Attributes
authenticated: (required) Accepted values are "yes", "no". Determines whether the user has been
authenticated by the system.
<meta>
<userMeta>
...
...
</userMeta>
</meta>
value
Rich Text Container
Structural Element
The value element contains the value associated with a form field and can serve a different purpose for various
field types. The value element is comprised of two subelements: the raw element, which stores the unprocessed
value directly from the user or other source, and the interpreted element, which stores the value in a format
appropriate for display to the user, possibly including rich text markup.
Parent
field
Children
hi (any)
xref (any)
figure (any)
Attributes
optionSelected: (optional) An optional attribute for select, checkbox, and radio fields to determine if the
value is to be selected or not.
optionValue: (optional) An optional attribute for select, checkbox, and radio fields to determine the value
that should be returned when this value is selected.
type: (required) A required attribute to specify the type of value. Accepted types are:
raw: The raw type stores the unprocessed value directly from the user or other source.
interpreted: The interpreted type stores the value in a format appropriate for display to the user,
possibly including rich text markup.
default: The default type stores a value supplied by the system, used when no other values are
provided.
<p>
<hi> ... </hi>
<xref> ... </xref>
<figure> ... </figure>
<field id="XMLExample.field.name" n="name" type="text"
required="yes">
<params size="16" maxlength="32"/>
<help>Some help text with <i18n>localized content</i18n>.</help>
<value type="default">Author, John</value>
</field>
</p>
xref
Text Container
Structural Element
The xref element is a reference to an external document. It can be mixed freely with text, and any text within the
tag itself will be used as part of the link's visual body.
Parent
cell
p
item
hi
Children
none
Attributes
target: (required) A target for the reference, using either a URL or an id of an existing element as a
destination for the xref.
<p>
<xref target="/url/link/target">This text is shown as a link.</xref>
</p>
4.26.2 Introduction
With DSpace you can describe digital objects such as text files, audio, video or data to facilitate easy retrieval and
high quality search results. These descriptions are organized into metadata fields that each have a specific
designation; for example, dc.title stores the title of an object, while dc.subject is reserved for subject keywords.
For many of these fields, including title and abstract, free text entry is the de facto choice as the values are
likely to be unique. Other fields are likely to be associated with values that can occur across different items.
Such fields include unique names, subject keywords, document types and other classifications. For those kinds
of fields the overall quality of the repository metadata increases if values with the same meaning are normalized
across all items. Additional benefits can be gained if unique identifiers are associated with a particular metadata
field in addition to canonical text values.
This page covers features included in the DSpace submission forms that allow repository managers to enforce
the usage of normalized terms for those fields where this is required in their institutional use cases. DSpace
offers simple and straightforward features, such as definitions of simple text values for dropdowns, as well as
more elaborate integrations with external vocabularies such as the Library of Congress Naming Authority.
Example
It generates the following HTML, which results in the menu widget below.
<select name="identifier_qualifier_0">
<option VALUE="govdoc">Gov't Doc #</option>
<option VALUE="uri">URI</option>
<option VALUE="isbn">ISBN</option>
</select>
Each value-pairs element contains a sequence of pair sub-elements, each of which in turn contains two
elements:
displayed-value – Name shown (on the web page) for the menu entry.
stored-value – Value stored in the DC element when this entry is chosen. Unlike the HTML select tag,
there is no way to indicate one of the entries should be the default, so the first entry is always the default
choice.
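The relationship between the pair elements and the generated menu can be sketched as follows. This is an illustration rather than the DSpace rendering code: the value-pairs fragment is reconstructed to match the HTML shown above, its value-pairs-name and dc-term attributes are assumptions, and read_pairs and render_select are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed source fragment corresponding to the HTML example above.
VALUE_PAIRS = """
<value-pairs value-pairs-name="common_identifiers" dc-term="identifier">
  <pair>
    <displayed-value>Gov't Doc #</displayed-value>
    <stored-value>govdoc</stored-value>
  </pair>
  <pair>
    <displayed-value>URI</displayed-value>
    <stored-value>uri</stored-value>
  </pair>
  <pair>
    <displayed-value>ISBN</displayed-value>
    <stored-value>isbn</stored-value>
  </pair>
</value-pairs>
"""


def read_pairs(xml_text):
    """Return (stored-value, displayed-value) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (pair.findtext("stored-value"), pair.findtext("displayed-value"))
        for pair in root.findall("pair")
    ]


def render_select(name, pairs):
    """Render a select menu; the first pair is the default because none is marked selected."""
    lines = [f'<select name="{escape(name)}">']
    for stored, displayed in pairs:
        lines.append(
            f'<option value="{escape(stored)}">{escape(displayed, quote=False)}</option>'
        )
    lines.append("</select>")
    return "\n".join(lines)


pairs = read_pairs(VALUE_PAIRS)
print(render_select("identifier_qualifier_0", pairs))
```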
As you can see, each node element has an id and label attribute. It can contain the isComposedBy element,
which in its turn, consists of a list of other nodes.
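The nested node / isComposedBy structure lends itself to a simple recursive walk. The sketch below uses an invented taxonomy (the ids and labels are hypothetical), assumes a namespace-free file, and joins labels with "::" purely as an illustrative separator.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary using the node / isComposedBy structure described above.
VOCABULARY = """
<node id="root" label="Example Taxonomy">
  <isComposedBy>
    <node id="n1" label="Science">
      <isComposedBy>
        <node id="n1.1" label="Physics"/>
        <node id="n1.2" label="Chemistry"/>
      </isComposedBy>
    </node>
    <node id="n2" label="Arts"/>
  </isComposedBy>
</node>
"""


def label_paths(node, prefix=()):
    """Yield the full label path of this node, then of each descendant (depth-first)."""
    path = prefix + (node.get("label"),)
    yield "::".join(path)
    composed = node.find("isComposedBy")
    if composed is not None:
        for child in composed.findall("node"):
            yield from label_paths(child, path)


paths = list(label_paths(ET.fromstring(VOCABULARY)))
for p in paths:
    print(p)
```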
You are free to use any application you want to create your controlled vocabularies. A simple text editor should
be enough for small projects. Bigger projects will require more complex tools. You may use Protégé to create
your taxonomies, save them as OWL and then use an XML Stylesheet (XSLT) to transform your documents to
the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas
such as OWL or RDF.
<field>
<dc-schema>dc</dc-schema>
<dc-element>subject</dc-element>
<dc-qualifier></dc-qualifier>
<repeatable>true</repeatable>
<label>Subject Keywords</label>
<input-type>onebox</input-type>
<hint>Enter appropriate subject keywords or phrases below.</hint>
<required></required>
<vocabulary>srsc</vocabulary>
</field>
The vocabulary element has an optional boolean attribute closed that can be used to force input only through the
Javascript of the controlled-vocabulary add-on. The default behaviour (i.e. without this attribute) is the same as setting
closed="false". This also allows the user to enter values as free text, without selecting them from the controlled vocabulary.
Authority: An authority is an external source of fixed values for a given domain, each unique value
identified by a key. For example, the OCLC LC Name Authority Service, ORCID or VIAF.
Authority Record: The information associated with one of the values in an authority; may include
alternate spellings and equivalent forms of the value, etc.
Authority Key: An opaque, hopefully persistent, identifier corresponding to exactly one record in the
authority.
The fact that this functionality deals with external sources of authority makes it inherently different from the
functionality for controlled vocabularies. Another difference is that the authority control is asserted everywhere
metadata values are changed, including unattended/batch submission, LNI and SWORD package submission,
and the administrative UI.
How it works
TODO
Original source:
Authority Control of Metadata Values original development proposal for DSpace 1.6
5 System Administration
This top level node intends to hold all system administration aspects of DSpace including but not limited to:
Installation
Upgrading
Troubleshooting system errors
Managing Dependencies
In this context System administration is defined as all technical tasks required to get DSpace in a state in which
it operates properly so its behaviour is predictable and can be used according to all the guidelines under "Using
DSpace".
Below is the "Command Help Table". This table explains what data is contained in the individual command/help
tables in the sections that follow.
With DSpace Release 1.6, the many commands and scripts have been replaced with a simple
[dspace]/bin/dspace <command> command. See the Application Layer chapter for the details of
the DSpace Command Launcher.
the e-mail subscription feature that alerts users of new items being deposited;
the 'media filter' tool, that generates thumbnails of images and extracts the full-text of documents for
indexing;
the 'checksum checker' that tests the bitstreams in your repository for corruption;
the sitemap generator, which enhances the ability of major search engines to index your content and
make it findable;
the curation system queueing feature, which allows administrators to "queue" tasks (to run at a later
time) from the Admin UI;
and Discovery (search & browse), OAI-PMH and Usage Statistics all receive performance benefits from
regular re-optimization.
These regularly scheduled tasks should be set up via either cron (for Linux/Mac OSX) or Windows Task
Scheduler (for Windows).
crontab -e
While every DSpace installation is unique, in order to get the most out of DSpace, we highly recommend
enabling these basic cron settings (the settings are described in the comments):
## but this should give you an idea of what you likely wish to schedule via cron.
##
## NOTE: You may also need to add additional sysadmin related tasks to your crontab
## (e.g. zipping up old log files, or even removing old logs, etc).
#-----------------
# GLOBAL VARIABLES
#-----------------
# Full path of your local DSpace Installation (e.g. /home/dspace or /dspace or similar)
# MAKE SURE TO CHANGE THIS VALUE!!!
DSPACE = [dspace]
# Shell to use
SHELL=/bin/sh
#--------------
# HOURLY TASKS (Recommended to be run multiple times per day, if possible)
# At a minimum these tasks should be run daily.
#--------------
#----------------
# DAILY TASKS
# (Recommended to be run once per day. Feel free to tweak the scheduled times below.)
#----------------
# Update the OAI-PMH index with the newest content (and re-optimize that index) at midnight every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING OAI-PMH
# (This ensures new content is available via OAI-PMH and ensures the OAI-PMH index is optimized for
# better performance)
0 0 * * * $DSPACE/bin/dspace oai import -o > /dev/null
# Note, you may want to consider rebuilding your entire OAI index every night... it doesn't take
# very long, and it ensures the feed is current
# to do so, just comment out the oai import line above and uncomment the one below
# 0 23 * * * dspace $DSPACE/bin/dspace oai import -c -o clean-cache
# Cleanup Web Spiders from DSpace Statistics Solr Index at 01:00 every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING SOLR STATISTICS
# (This removes any known web spiders from your usage statistics)
0 1 * * * $DSPACE/bin/dspace stats-util -i
# Run any Curation Tasks queued from the Admin UI at 04:00 every day
# (Ensures that any curation task that an administrator "queued" from the Admin UI is executed
# asynchronously behind the scenes)
0 4 * * * $DSPACE/bin/dspace curate -q admin_ui
#----------------
# WEEKLY TASKS
# (Recommended to be run once per week, but can be run more or less frequently, based on your
# local needs/policies)
#----------------
# Run the checksum checker at 04:00 every Sunday
# By default it runs through every file (-l) and also prunes old results (-p)
# (This re-verifies the checksums of all files stored in DSpace. If any files have been
# changed/corrupted, checksums will differ.)
0 4 * * 0 $DSPACE/bin/dspace checker -l -p
# NOTE: LARGER SITES MAY WISH TO USE DIFFERENT OPTIONS. The above "-l" option tells DSpace to
# check *everything*.
# If your site is very large, you may need to only check a portion of your content per week. The
# below commented-out task would instead check all the content it can within *one hour*. The next
# week it would start again where it left off.
#0 4 * * 0 $DSPACE/bin/dspace checker -d 1h -p
# Mail the results of the checksum checker (see above) to the configured "mail.admin" at 05:00
# every Sunday.
# (This ensures the system administrator is notified whether any checksums were found to be
# different.)
0 5 * * 0 $DSPACE/bin/dspace checker-emailer
#----------------
# MONTHLY TASKS
# (Recommended to be run once per month, but can be run more or less frequently, based on your
# local needs/policies)
#----------------
# Permanently delete any bitstreams flagged as "deleted" in DSpace, on the first of every month at
# 01:00
# (This ensures that any files which were deleted from DSpace are actually removed from your local
# filesystem.
# By default they are just marked as deleted, but are not removed from the filesystem.)
0 1 1 * * $DSPACE/bin/dspace cleanup > /dev/null
#----------------
# YEARLY TASKS (Recommended to be run once per year)
#----------------
# At 2:00AM every January 1, "shard" the DSpace Statistics Solr index.
# This ensures each year has its own Solr index, which improves performance.
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING SOLR STATISTICS
# NOTE: This is scheduled here for 2:00AM so that it happens *after* the daily cleaning &
# re-optimization of this index.
0 2 1 1 * $DSPACE/bin/dspace stats-util -s
The DSpace command launcher or CLI interface offers the execution of different maintenance operations. As
most of these are already documented in related parts of the documentation, this page is mainly intended to
provide an overview of all available CLI operations, with links to the appropriate documentation.
Examples:
bin/dspace -h
bin/dspace cleanup -h
bin/dspace cleanup
General use
checker: Run the checksum checker
checker-emailer: Send emails related to the checksum checker
classpath: Calculate and display the DSpace classpath
clean-database: Remove the database tables. Malfunctions for this script have been reported: DS-1480
([dspace]/bin/dspace clean_database doesn't do anything).
cleanup: Remove deleted bitstreams from the assetstore
community-filiator: Tool to manage community and sub-community relationships
create-administrator: Create a DSpace administrator account
curate: Perform curation tasks on DSpace objects
doi-organiser: Transmit information about DOIs to the registration agency.
dsprop: View a DSpace property from dspace.cfg
dsrun: Run a class directly
embargo-lifter: Pre DSpace 3.0 embargo manager tool used to check, list and lift embargoes
export: Export items or collections
filter-media: Perform the media filtering to extract full text from documents and to create thumbnails
generate-sitemaps: Generate search engine and html sitemaps
harvest: Manage the OAI-PMH harvesting of external collections
import: Import items into DSpace
index-db-browse: General index command (requires extra parameters)
index-discovery: Update Discovery Solr Search Index
index-lucene-init: Initialise the search and browse indexes
index-lucene-update: Update the search and browse indexes
itemcounter: Update the item strength counts in the user interface
itemupdate: Item update tool for altering metadata and bitstream content in items
make-handle-config: Run the handle server simple setup command
metadata-export: Export metadata for batch editing
metadata-import: Import metadata after batch editing
migrate-embargo: Embargo manager tool used to migrate the old version of Embargo to the new one
included in DSpace 3
oai: OAI script manager
packager: Execute a packager
read: Execute a stream of commands from a file or pipe
registry-loader: Load entries into a registry
setup-database: Create the database tables
structure-builder: Build DSpace community and collection structure
Legacy statistics
stat-general: Compile the general statistics
stat-initial: Compile the initial statistics
stat-monthly: Compile the monthly statistics
stat-report-general: Create the general statistics report
stat-report-initial: Create the initial statistics report
stat-report-monthly: Create the monthly statistics report
SOLR Statistics
Scripts for the statistics that are stored in SOLR, added in DSpace 1.6.
stats-log-converter: Convert dspace.log files ready for import into solr statistics
stats-log-importer: Import previously converted log files into solr statistics
stats-log-importer-elasticsearch: Import solr-format converted log files into Elastic Search statistics
stats-util: Statistics Client for Maintenance of Solr Statistics Indexes
Test Database
This command can be used at any time to test for Database connectivity. It will assist in troubleshooting
PostgreSQL and Oracle connection issues with the database.
5.4.1 Options
DSpace allows three property values to be set using the -D<property>=<value> option. They may be used in
other contexts than noted below, but take care to understand how a particular property will affect a target's
outcome.
overwrite
Whether to overwrite configuration files in [dspace]/config. If true, files from [dspace]/config and
subdirectories are backed up with .old extension and new files are installed from [dspace-src]/dspace/config
and subdirectories; if false, existing config files are untouched, and new files are written beside them with
.new extension.
Default: true
config
If a path is specified, ant uses values from the specified file and installs it in [dspace]/config in the
appropriate contexts.
Default: [dspace-src]/config/dspace.cfg
wars
Default: true
5.4.2 Targets
Target Effect
update Creates backup copies of the [dspace]/bin, /etc, /lib, and /webapps directories with the
form /<directory>.bak-<date-time>. Creates new copies of [dspace]/config, /etc, and /lib
directories. Does not affect data files or the database. (See overwrite, config, war
options.)
update_configs Updates the [dspace]/config directory with new configuration files. (See config option.)
update_code Creates backup copies of the [dspace]/bin, /etc, and /lib directories with the form
/<directory>.bak-<date-time>. Creates new copies of [dspace]/config, /etc, and /lib
directories. (See config option.)
install_code Deletes existing [dspace]/bin, /lib, and /etc directories, and installs new copies;
overwrites /solr application files, leaving data intact. (See config option.)
fresh_install Performs a fresh installation of the software, including the database & config. (See
config, war options.)
test_database Tests database connection using parameters specified in dspace.cfg. (See config
option.)
setup_database Creates database tables. Database schema must exist and relevant parameters
specified in dspace.cfg. (See config option.)
load_registries Loads metadata & file format registries into the database. (See config option.)
clean_backups Removes [dspace]/bin, /etc, /lib, and /webapps directories with .bak* extensions.
clean_database Drops all DSpace database tables, destroying all data. (See config option.)
AIP Backup & Restore functionality only works with the Latest Version of Items
If you are using the new XMLUI-only Item Level Versioning functionality (disabled by default), you
must be aware that this "Item Level Versioning" feature is not yet compatible with AIP Backup &
Restore. Using them together may result in accidental data loss. Currently the AIPs that DSpace
generates only store the latest version of an Item. Therefore, past versions of Items will always be lost
when you perform a restore / replace using AIP tools.
Additional background information available in the Open Repositories 2010 Presentation entitled
Improving DSpace Backups, Restores & Migrations
As of DSpace 1.7, DSpace now can backup and restore all of its contents as a set of AIP Files. This includes all
Communities, Collections, Items, Groups and People in the system.
This feature came out of a requirement for DSpace to better integrate with DuraCloud, and other backup
storage systems. One of these requirements is to be able to essentially "backup" local DSpace contents into the
cloud (as a type of offsite backup), and "restore" those contents at a later time.
Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships
between Communities/Collections/Items) into a relatively standard format (a METS-based, AIP format). This
entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in
the same or different DSpace installation).
Allows one to more easily move entire Communities or Collections between DSpace instances.
Allows for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own
local backup system), rather than relying on synchronizing a backup of your Database (stores metadata
/relationships) and assetstore (stores files/bitstreams).
Provides a way for people to more easily get their data out of DSpace (whatever the purpose may be).
Provides a relatively standard format for people to migrate entire hierarchies (Communities/Collections)
from one DSpace to another (or from another system into DSpace).
How does this differ from traditional DSpace Backups? Which Backup route is better?
Traditionally, it has always been recommended to back up and restore DSpace's database and files (also known
as the "assetstore") separately. This is described in more detail in the Storage Layer section of the DSpace
System Documentation. The traditional backup and restore route is still a recommended and supported option.
However, the new AIP Backup & Restore option seeks to resolve many of the complexities of a
traditional backup and restore. The table below details some of the differences between these two valid Backup
and Restore options.
Supported Backup/Restore Types

Can Backup & Restore all DSpace Content easily
    Traditional Backup & Restore: Yes (requires two backups/restores: one for Database and one for Files)
    AIP Backup & Restore: Yes (though it will not back up/restore items which are not officially "in archive")

Can Backup & Restore Item Versions
    Traditional Backup & Restore: Yes (requires two backups/restores: one for Database and one for Files)
    AIP Backup & Restore: No (AIP Backup & Restore can only back up/restore the latest version of an Item)

Supported Object Types During Backup & Restore

Supports backup/restore of Item Mappings between Collections
    Traditional Backup & Restore: Yes
    AIP Backup & Restore: Yes (During restore, the AIP Ingester may throw a false "Could not find a parent
    DSpaceObject" error (see Common Issues or Error Messages) if it tries to restore an Item Mapping to a
    Collection that it hasn't yet restored. This error can be safely bypassed using the 'skipIfParentMissing'
    flag; see Additional Packager Options for more details.)

Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow)
    Traditional Backup & Restore: Yes
    AIP Backup & Restore: No (AIPs are only generated for objects which are completed and considered
    "in archive")

Supports backup/restore of all local DSpace Configurations and Customizations
    Traditional Backup & Restore: Yes (if you back up your entire DSpace directory as part of backing up
    your files)
    AIP Backup & Restore: Not by default (unless you also back up parts of your DSpace directory; note that
    you would not need to back up the '[dspace]/assetstore' folder again, as those files are already
    included in AIPs)
Based on your local institution's needs, you will want to choose the backup & restore process which is most
appropriate for you. You may also find it beneficial to use both types of backups on different time schedules, in
order to minimize the likelihood of losing your DSpace installation settings or its contents. For
example, you may choose to perform a Traditional Backup once per week (to back up your local system
configurations and customizations) and an AIP Backup on a daily basis. Alternatively, you may choose to
perform daily Traditional Backups and only use the AIP Backup as a "permanent archives" option (perhaps
performed on a weekly or monthly basis).
If you choose to use the AIP Backup and Restore option, do not forget to also back up your local
DSpace configurations and customizations. Depending on how you manage your own local DSpace,
these configurations and customizations are likely in one or more of the following locations:
[dspace] - The DSpace installation directory (Please note, if you also use the AIP Backup &
Restore option, you do not need to back up your [dspace]/assetstore directory, as those
files already exist in your AIPs).
[dspace-source] - The DSpace source directory
In the initial DuraCloud work, the DuraCloud team is working on a way to "synchronize" DuraCloud with a local
file folder. So, DuraCloud can be configured to "watch" a given folder and automatically replicate its contents
into the cloud.
Therefore, moving content from DSpace to DuraCloud would currently be a two-step process:
1. First, export AIPs describing that content from DSpace to a filesystem folder
2. Second, enable DuraCloud to watch that same filesystem folder and replicate it into the cloud.
Similarly, moving content from DuraCloud back into DSpace would also be a two-step process:
1. First, you'd tell DuraCloud to replicate the AIPs from the cloud to a folder on your file system
2. Second, you'd ingest those AIPs back into DSpace
(These backup/restore processes may change as we go forward and investigate more use cases. This is just
the initial plan.)
Collection or Community AIPs do not include all child objects (e.g. Items in those Collections or
Communities), as each AIP only describes one object. However, these container AIPs do contain
references (links) to all child objects. These references can be used by DSpace to automatically
restore all referenced AIPs when restoring a Collection or Community.
AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This
means that in-progress, uncompleted submissions are not described in AIPs and cannot be
restored after a disaster. Permanently removed objects will also no longer be exported as AIPs
after their removal. However, withdrawn objects will continue to be exported as AIPs, since they
are still considered under the "in archive" status.
AIPs with identical contents will always have identical checksums. This provides a basic means of
validating whether the contents within an AIP have changed. For example, if a Collection's AIP
has the same checksum at two different points in time, it means that Collection has not changed
during that time period.
The AIP profile favors completeness and accuracy rather than presenting the semantics of an object in
a standard format. It conforms to the quirks of DSpace's internal object model rather than
attempting to produce a universally understandable representation of the object. When possible,
an AIP tries to use common standards to express objects.
An AIP can serve as a DIP (Dissemination Information Package) or SIP (Submission Information
Package), especially when transferring custody of objects to another DSpace implementation.
In contrast to SIP or DIP, the AIP should include all available DSpace structural and administrative
metadata, and basic provenance information. AIPs also describe some basic system level
information (e.g. Groups and People).
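The checksum property noted above can be used to detect whether an object changed between two exports. The following is a minimal sketch, not part of DSpace: the helper name `aip_unchanged` is hypothetical, the two temporary files stand in for two exports of the same object, and `cksum` is just one possible checksum tool.

```shell
# Returns success (0) when both files have the same checksum.
# cksum reads from stdin so the file name is not part of its output.
aip_unchanged() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Stand-in files; in practice these would be two exports of the same object.
old=$(mktemp) && new=$(mktemp)
printf 'same AIP bytes' > "$old"
printf 'same AIP bytes' > "$new"

if aip_unchanged "$old" "$new"; then
    RESULT="unchanged"
else
    RESULT="changed"
fi
echo "$RESULT"
rm -f "$old" "$new"
```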
For more specific details of AIP format / structure, along with examples, please see DSpace AIP Format.
Exporting AIPs
Single AIP (default, using -d option) - Exports just an AIP describing a single DSpace object. So, if you
ran it in this default mode for a Collection, you'd just end up with a single Collection AIP (which would not
include AIPs for all its child Items)
Hierarchy of AIPs (using the -d --all or -d -a option) - Exports the requested AIP describing an
object, plus the AIP for all child objects. Some examples follow:
For a Site - this would export all Communities, Collections & Items within the site into AIP files (in
a provided directory)
For a Community - this would export that Community and all SubCommunities, Collections and
Items into AIP files (in a provided directory)
For a Collection - this would export that Collection and all contained Items into AIP files (in a
provided directory)
For an Item – this just exports the Item into an AIP as normal (as it already contains its Bitstreams
/Bundles by default)
for example:
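A minimal sketch of a single-AIP export, written as a dry run that only prints the command. It assumes the packager's `-t AIP` (package type), `-e` (acting EPerson) and `-i` (object Handle) options; the e-mail address `admin@myu.edu` is a hypothetical placeholder.

```shell
# Dry run: build the command and print it; run the printed line to export.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail

# -d = disseminate (export) a single object, without its children
CMD="$DSPACE packager -d -t AIP -e $EPERSON -i 4321/4567 aip4567.zip"
echo "$CMD"
```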
The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip".
This will not include any child objects for Communities or Collections.
for example:
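A minimal sketch of a hierarchy export, again as a dry run. It differs from a single-AIP export only by the `-a` flag; the `-t`, `-e` and `-i` options and the e-mail address are assumptions for illustration.

```shell
# Dry run: build the command and print it; run the printed line to export.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail

# -d -a = disseminate the object plus the AIPs of all child objects
CMD="$DSPACE packager -d -a -t AIP -e $EPERSON -i 4321/4567 aip4567.zip"
echo "$CMD"
```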
The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". In
addition, it will export all child objects to the same directory as the "aip4567.zip" file. The child AIP files are
all named using the following format:
This general file naming convention ensures that you can easily locate an object to restore by its
name (assuming you know its Object Type and Handle).
Alternatively, if an object doesn't have a Handle, it uses this file name format:
<Obj-Type>@internal-id-<DSpace-ID>.zip (e.g. ITEM@internal-id-234.zip)
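The naming convention can be sketched as two small helpers for locating a child AIP by name. The Handle-based pattern used here, `<Obj-Type>@<Handle-with-dashes>.zip`, is an assumption; check it against the file names your own export produces. Only the internal-id pattern is taken directly from the text above. Both function names are hypothetical.

```shell
# aip_file_name TYPE HANDLE
# ASSUMED format: <Obj-Type>@<Handle-with-dashes>.zip
aip_file_name() {
    printf '%s@%s.zip\n' "$1" "$(printf '%s' "$2" | tr '/' '-')"
}

# aip_internal_name TYPE ID
# Format stated in the text for objects without a Handle.
aip_internal_name() {
    printf '%s@internal-id-%s.zip\n' "$1" "$2"
}

aip_file_name COLLECTION 4321/4567
aip_internal_name ITEM 234
```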
AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This means that in-
progress, uncompleted submissions are not described in AIPs and cannot be restored after a disaster.
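A minimal sketch of a full-site export, as a dry run. It assumes the Site is addressed by the Handle `<site-handle-prefix>/0` (here with the example prefix 4321) and uses a hypothetical administrator e-mail.

```shell
# Dry run: build the command and print it; run the printed line to export.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail
SITE_HANDLE="4321/0"           # <site-handle-prefix>/0 with an example prefix

# -d -a on the Site exports every Community, Collection and Item
CMD="$DSPACE packager -d -a -t AIP -e $EPERSON -i $SITE_HANDLE sitewide-aip.zip"
echo "$CMD"
```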
Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for all
Communities, Collections and Items into the same directory as the Site AIP.
1. Submit/Ingest Mode (-s option, default) – submit AIP(s) to DSpace in order to create new object(s) (i.e.
the AIP is treated like a SIP – Submission Information Package)
2. Restore Mode (-r option) – restore pre-existing object(s) in DSpace based on AIP(s). This also attempts
to restore all handles and relationships (parent/child objects). This is a specialized type of "submit",
where the object is created with a known Handle and known relationships.
3. Replace Mode (-r -f option) – replace existing object(s) in DSpace based on AIP(s). This also
attempts to restore all handles and relationships (parent/child objects). This is a specialized type of
"restore" where the contents of existing object(s) is replaced by the contents in the AIP(s). By default, if a
normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which
object already exists.
Again, like export, there are two types of AIP Ingestion you can perform (using any of the above modes):
Single AIP (default) - Ingests just an AIP describing a single DSpace object. So, if you ran it in this
default mode for a Collection AIP, you'd just create a DSpace Collection from the AIP (but not ingest any
of its child objects)
Hierarchy of AIPs (by including the --all or -a option after the mode) - Ingests the requested AIP
describing an object, plus the AIP for all child objects. Some examples follow:
For a Site - this would ingest all Communities, Collections & Items based on the located AIP files
For a Community - this would ingest that Community and all SubCommunities, Collections and
Items based on the located AIP files
For a Collection - this would ingest that Collection and all contained Items based on the located
AIP files
For an Item – this just ingests the Item (including all Bitstreams & Bundles) based on the AIP file.
Submission Mode (-s mode) - creates a new object (AIP is treated like a SIP)
By default, a new Handle is always assigned
However, you can force it to use the handle specified in the AIP by specifying -o
ignoreHandle=false as one of your parameters
By default, a new Parent object must be specified (using the -p parameter). This is the location
where the new object will be created.
However, you can force it to use the parent object specified in the AIP by specifying -o
ignoreParent=false as one of your parameters
By default, will respect a Collection's Workflow process when you submit an Item to a Collection
However, you can specifically skip any workflow approval processes by specifying -w
parameter.
Always adds a new Deposit License to Items
Always adds new DSpace System metadata to Items (includes new "dc.date.accessioned",
"dc.date.available", "dc.date.issued" and "dc.description.provenance" entries)
WARNING: Submission mode may not be able to maintain Item Mappings between Collections.
Because these mappings are recorded via the Collection Handles, mappings may be restored
improperly if the Collection handle has changed when moving content from one DSpace instance
to another.
Restore / Replace Mode (-r mode) - restores a previously existing object (as if from a backup)
By default, the Handle specified in the AIP is restored
However, for restores, you can force a new handle to be generated by specifying -o
ignoreHandle=true as one of your parameters. (NOTE: Doesn't work for replace mode
as the new object always retains the handle of the replaced object)
Although a Restore/Replace does restore Handles, it will not necessarily restore the
same internal IDs in your Database.
By default, the object is restored under the Parent specified in the AIP
However, for restores, you can force it to restore under a different parent object by using
the -p parameter. (NOTE: Doesn't work for replace mode, as the new object always retains
the parent of the replaced object)
Always skips any Collection workflow approval processes when restoring/replacing an Item in a
Collection
Never adds a new Deposit License to Items (rather it restores the previous deposit license, as
long as it is stored in the AIP)
Never adds new DSpace System metadata to Items (rather it just restores the metadata as
specified in the AIP)
It is possible to change some of the default behaviors of both the Submission and Restore/Replace
Modes. Please see the Additional Packager Options section below for a listing of command-line
options that allow you to override some of the default settings described above.
This option allows you to essentially use an AIP as a SIP (Submission Information Package). The
default settings will create a new DSpace object (with a new handle and a new parent object, if
specified) from your AIP.
To ingest a single AIP and create a new DSpace object under a parent of your choice, specify the -p (or --
parent) package parameter to the command. Also, note that you are running the packager in -s (submit)
mode.
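A minimal sketch of submitting a single AIP under a chosen parent, as a dry run. The `-t AIP` and `-e` options, the e-mail address, the parent Handle and the file name are illustrative assumptions.

```shell
# Dry run: build the command and print it; run the printed line to ingest.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail
PARENT="4321/12"               # hypothetical Handle of the new parent object

# -s = submit (a new Handle is assigned), -p = parent to create the object under
CMD="$DSPACE packager -s -t AIP -e $EPERSON -p $PARENT aip4567.zip"
echo "$CMD"
```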
NOTE: This only ingests the single AIP specified. It does not ingest all children objects.
If you leave out the -p parameter, the AIP package ingester will attempt to install the AIP under the same
parent it had before. As you are also specifying the -s (submit) parameter, the packager will assume you want
a new Handle to be assigned (as you are effectively specifying that you are submitting a new object). If you
want the object to retain the Handle specified in the AIP, you can specify the -o ignoreHandle=false
option to force the packager to not ignore the Handle specified in the AIP.
This option allows you to essentially use a set of AIPs as SIPs (Submission Information Packages).
The default settings will create a new DSpace object (with a new handle and a new parent object, if
specified) from each AIP
To ingest an AIP hierarchy from a directory of AIPs, use the -a (or --all) package parameter.
for example:
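A minimal sketch of a hierarchy submission, as a dry run. The `-t AIP` and `-e` options and the e-mail address are assumptions; the parent Handle and file name are the ones discussed in the surrounding text.

```shell
# Dry run: build the command and print it; run the printed line to ingest.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail

# -s -a = submit the AIP and, recursively, every child AIP it references
CMD="$DSPACE packager -s -a -t AIP -e $EPERSON -p 4321/12 aip4567.zip"
echo "$CMD"
```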
The above command will ingest the package named "aip4567.zip" as a child of the specified Parent Object
(handle="4321/12"). The resulting object is assigned a new Handle (since -s is specified). In addition, any child
AIPs referenced by "aip4567.zip" are also recursively ingested (a new Handle is also assigned for each child
AIP).
Another example – Ingesting a Top-Level Community (by using the Site Handle, <site-handle-prefix>/0):
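A minimal sketch of ingesting a top-level Community, as a dry run. The parent is the Site Handle (example prefix 4321); the `-t AIP` and `-e` options and the e-mail address are assumptions.

```shell
# Dry run: build the command and print it; run the printed line to ingest.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail
SITE_HANDLE="4321/0"           # <site-handle-prefix>/0 with an example prefix

# Using the Site as parent makes the ingested object a top-level Community
CMD="$DSPACE packager -s -a -t AIP -e $EPERSON -p $SITE_HANDLE community-aip.zip"
echo "$CMD"
```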
The above command will ingest the package named "community-aip.zip" as a top-level community (i.e. the
specified parent is "4321/0" which is a Site Handle). Again, the resulting object is assigned a new Handle. In
addition, any child AIPs referenced by "community-aip.zip" are also recursively ingested (a new Handle is also
assigned for each child AIP).
Please note: If you are submitting a larger amount of content (e.g. multiple Communities/Collections)
to your DSpace, you may want to tell the 'packager' command to skip over any existing Collection
approval workflows by using the -w flag. By default, all Collection approval workflows will be
respected. This means if the content you are submitting includes a Collection with an enabled
workflow, you may see the following occur:
Therefore, if this content has already received some level of approval, you may want to submit
it using the -w flag, which will skip any workflow approval processes. For more information, see
Submitting AIP(s) while skipping any Collection Approval Workflows.
When an Item is mapped to one or more Collections, this mapping is recorded in the AIP using the
mapped Collection's handle. Unfortunately, since the submission mode (-s) assigns new handles to
all objects in the hierarchy, this may mean that the mapped Collection's handle will have changed (or
even that a different Collection will be available at the original mapped Collection's handle). DSpace
does not have a way to uniquely identify Collections other than by handle, which means that item
mappings are only able to be retained when the Collection handle is also retained.
1. Use the restore/replace mode (-r) instead, as it will retain existing Collection Handles.
Unfortunately though, this may not work if the content is being moved from a Test DSpace to a
Production DSpace, as these existing handles may not be valid.
2. OR, use the submission mode with the -o ignoreHandle=false option. This will also retain existing
Collection Handles. Unfortunately though, this may not work if the content is being moved from
a Test DSpace to a Production DSpace, as these existing handles may not be valid.
3. OR, remove all existing Item Mappings and re-export AIPs (without Item Mappings). Then,
import the hierarchy into the new DSpace instance (again without Item Mappings). Finally,
recreate the necessary Item Mappings using a different tool, e.g. the Batch Metadata Editing
tool supports bulk editing of Collection memberships/mappings.
Please note, if you are using AIPs to move an entire Community or Collection from one DSpace to
another, there is a known issue (see DS-1105) that the new DSpace instance will be unable to (re-)create
any DSpace Groups or EPeople which are referenced by a Community or Collection AIP. The
reason is that the Community or Collection AIP itself doesn't contain enough information to create
those Groups or EPeople (rather, that information is stored in the SITE AIP, for usage during Full Site
Restores).
However, there are two possible ways to get around this known issue:
EITHER, you can manually recreate all referenced Groups/EPeople in the new DSpace that
you are submitting the Community or Collection AIP into.
Note that if you are using Groups named with DSpace Database IDs (e.g.
COMMUNITY_1_ADMIN, COLLECTION_2_SUBMIT), you may first need to rename
those groups to no longer include Database IDs (e.g. MY_SUBMITTERS). The reason is
that Database IDs will likely change when you move a Community or Collection to a new
DSpace installation.
OR, you can temporarily disable the import of Group/EPeople information when submitting the
Community or Collection AIP to the new DSpace. This would mean that after you submit the
AIP to the new DSpace, you'd have to manually go in and add in any special permissions (as
needed). To disable the import of Group/EPeople information, add these settings to your
dspace.cfg file, and re-run the submission of the AIP with these settings in place:
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
Don't forget to remove these settings after you import your Community or Collection AIP.
Leaving them in place will mean that every time you import an AIP, all of its
Group/EPeople/Permissions would be ignored.
However, if you'd like to skip all workflow approval processes you can use the -w flag to do so. For example,
the following command will skip any Collection approval workflows and immediately add the Item to a
Collection.
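A minimal sketch of a submission that skips workflow approval, as a dry run. The Collection Handle and the file name `item-aip.zip` are hypothetical, as are the `-t AIP` and `-e` values.

```shell
# Dry run: build the command and print it; run the printed line to ingest.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail
COLLECTION="4321/12"           # hypothetical Handle of the target Collection

# -w = skip any Collection approval workflow and archive the Item immediately
CMD="$DSPACE packager -s -w -t AIP -e $EPERSON -p $COLLECTION item-aip.zip"
echo "$CMD"
```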
This -w flag may also be used when Submitting an AIP Hierarchy. For example, if you are migrating one or
more Collections/Communities from one DSpace to another, you may choose to submit those AIPs with the -w
option enabled. This will ensure that, if a Collection has a workflow approval process enabled, all its Items are
available immediately rather than being all placed into the workflow approval process.
1. Default Restore Mode (-r) = Attempt to restore object (and optionally children). Rollback all changes if
any object is found to already exist.
2. Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is
found to already exist, skip over it (and all children objects), and continue to restore all other non-existing
objects.
3. Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing
objects in DSpace. Therefore, if an object is found to already exist in DSpace, its contents are replaced
by the contents of the AIP. WARNING: This mode is potentially dangerous as it will permanently destroy
any object contents that do not currently exist in the AIP. You may want to perform a secondary backup,
unless you are sure you know what you are doing!
Restore a Single AIP: Use this 'packager' command template to restore a single object from an AIP (not
including any child objects):
Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along
with all child objects (from their AIPs):
For example:
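A minimal sketch of the two restore templates and one concrete example, as a dry run. The helper `restore_cmd` is hypothetical, and the `-t AIP` and `-e` options and the e-mail address are assumptions.

```shell
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory

# restore_cmd FLAGS EPERSON FILE: prints a restore command line (dry run)
restore_cmd() {
    printf '%s packager %s -t AIP -e %s %s\n' "$DSPACE" "$1" "$2" "$3"
}

restore_cmd "-r"    "<eperson>"     "<AIP-file-path>"   # single AIP template
restore_cmd "-r -a" "<eperson>"     "<AIP-file-path>"   # hierarchy template
restore_cmd "-r -a" "admin@myu.edu" "aip4567.zip"       # concrete example
```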
Notice that unlike the -s option (for submission/ingesting), the -r option does not require the Parent Object (-p
option) to be specified if it can be determined from the package itself.
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (the -a option specifies to
also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their
package. If any object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace).
In some cases, when you restore a large amount of content to your DSpace, the internal database
counts (called "sequences") may get out of sync with the Handles of the content you just restored. As
a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on
your DSpace database after a larger scale restore. This database script should be run while DSpace
is stopped (you may either stop Tomcat or just the DSpace webapps). PostgreSQL/Oracle must be
running. The script can be found in the following locations for PostgreSQL and Oracle, respectively:
[dspace]/etc/postgres/update-sequences.sql
[dspace]/etc/oracle/update-sequences.sql
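For PostgreSQL, running the script might look like the following dry-run sketch. The database user and database name (both `dspace` here) are assumptions; substitute the values from your own installation.

```shell
# Dry run: prints the psql invocation. Run the printed line only while
# DSpace is stopped and PostgreSQL is running.
DB_USER="dspace"   # assumed database user
DB_NAME="dspace"   # assumed database name
SQL_FILE="[dspace]/etc/postgres/update-sequences.sql"

# psql: -U = connect as this user, -f = execute commands from this file
CMD="psql -U $DB_USER -f $SQL_FILE $DB_NAME"
echo "$CMD"
```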
Using the Default Restore Mode without the -a option will only restore the metadata for that
specific Community or Collection. No child objects will be restored.
Using the Default Restore Mode with the -a option will only successfully restore a Community
or Collection if that object along with any child objects (Sub-Communities, Collections or Items)
do not already exist. In other words, if any objects belonging to that Community or Collection
already exist in DSpace, the Default Restore Mode will report an error that those object(s) could
not be recreated. If you encounter this situation, you will need to perform the restore using
either the Restore, Keep Existing Mode or the Force Replace Mode (depending on whether you
want to keep or replace those existing child objects).
One special case to note: If a Collection or Community is found to already exist, its child objects are also
skipped over. So, this mode will not auto-restore items to an existing Collection.
Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along
with all child objects (from their AIPs):
For example:
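A minimal sketch of a "Restore, Keep Existing" run over a hierarchy, as a dry run. The `-t AIP` and `-e` options and the e-mail address are assumptions.

```shell
# Dry run: build the command and print it; run the printed line to restore.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory
EPERSON="admin@myu.edu"        # hypothetical administrator e-mail

# -r -a -k = restore the hierarchy, skipping objects that already exist
CMD="$DSPACE packager -r -a -k -t AIP -e $EPERSON aip4567.zip"
echo "$CMD"
```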
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the -a option specifies to
also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their
package. If any object is found to already exist, it is skipped over (child objects are also skipped). All non-
existing objects are restored.
This mode may also be used to restore missing objects which refer to existing objects. For example, if
you are restoring a missing Collection which had existing Items linked to it, you can use this mode to
auto-restore the Collection and update those existing Items so that they again link back to the newly
restored Collection.
Because this mode actually destroys existing content in DSpace, it is potentially dangerous and may
result in data loss! You may wish to perform a secondary full backup (assetstore files & database)
before attempting to replace any existing object(s) in DSpace.
Replace using a Single AIP: Use this 'packager' command template to replace a single object from an AIP
(not including any child objects):
Replace using a Hierarchy of AIPs: Use this 'packager' command template to replace an object from an AIP
along with all child objects (from their AIPs):
For example:
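A minimal sketch of the two replace templates and one concrete example, as a dry run. The helper `replace_cmd` is hypothetical, and the `-t AIP` and `-e` options and the e-mail address are assumptions. Because replace mode overwrites existing content, printing the command before running it is a deliberate precaution here.

```shell
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory

# replace_cmd FLAGS EPERSON FILE: prints a force-replace command line (dry run)
replace_cmd() {
    printf '%s packager %s -t AIP -e %s %s\n' "$DSPACE" "$1" "$2" "$3"
}

replace_cmd "-r -f"    "<eperson>"     "<AIP-file-path>"   # single AIP template
replace_cmd "-r -a -f" "<eperson>"     "<AIP-file-path>"   # hierarchy template
replace_cmd "-r -a -f" "admin@myu.edu" "aip4567.zip"       # concrete example
```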
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle
provided within the package itself (and added as a child of the parent object specified within the package itself).
In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested. They are also restored
with the Handles & Parent Objects provided with their package. If any object is found to already exist, its
contents are replaced by the contents of the appropriate AIP.
If any error occurs, the script attempts to rollback the entire replacement process.
1. Install a completely "fresh" version of DSpace by following the Installation instructions in the DSpace
Manual
At this point, you should have a completely empty, but fully-functional DSpace installation. You will
need to create an initial Administrator user in order to perform this restore (as a full-restore can
only be performed by a DSpace Administrator).
2. Once DSpace is installed, run the following command to restore all its contents from AIPs
Notice that you are running this command in "Force Replace" mode (-r -f). This is necessary as your
empty DSpace install will already include a few default groups (Administrators and Anonymous) and your
initial administrative user. You need to replace these groups in order to restore your prior DSpace
contents completely.
<eperson> should be replaced with the Email Address of the initial Administrator (who you created
when you reinstalled DSpace).
<site-handle-prefix> should be replaced with your DSpace site's assigned Handle Prefix. This is
equivalent to the handle.prefix setting in your dspace.cfg
/full/path/to/your/site-aip.zip is the full path to the AIP file which represents your DSpace
SITE. This file will be named whatever you named it when you actually exported your entire site. All other
AIPs are assumed to be referenced from this SITE AIP (in most cases, they should be in the same
directory as that SITE AIP).
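The restore command described in step 2 can be sketched as the following dry run. It keeps the three placeholders explained above unresolved, and assumes the packager's `-t AIP`, `-e` and `-i` options.

```shell
# Dry run: prints the full-site restore template with its placeholders.
DSPACE="[dspace]/bin/dspace"   # [dspace] stands for your installation directory

# -r -a -f = Force Replace mode over the whole hierarchy, starting at the Site
CMD="$DSPACE packager -r -a -f -t AIP -e <eperson> -i <site-handle-prefix>/0 /full/path/to/your/site-aip.zip"
echo "$CMD"
```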
In some cases, when you restore a large amount of content to your DSpace, the internal database
counts (called "sequences") may get out of sync with the Handles of the content you just restored. As
a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on
your DSpace database after a larger scale restore. This database script should be run while DSpace
is stopped (you may either stop Tomcat or just the DSpace webapps). PostgreSQL/Oracle must be
running. The script can be found in the following locations for PostgreSQL and Oracle, respectively:
[dspace]/etc/postgres/update-sequences.sql
[dspace]/etc/oracle/update-sequences.sql
createMetadataFields=[value]
    Applies to: ingest only
    Default: true
    Description: Tells the AIP ingester to automatically create any metadata fields which are found to be
    missing from the DSpace Metadata Registry. When 'true', this means as each AIP is ingested, new fields
    may be added to the DSpace Metadata Registry if they don't already exist. When 'false', an AIP ingest
    will fail if it encounters a metadata field that doesn't exist in the DSpace Metadata Registry. (NOTE:
    This will not create missing DSpace Metadata Schemas. If a schema is found to be missing, the ingest
    will always fail.)

filterBundles=[value]
    Applies to: export only
    Default: exports all Bundles
    Description: This option can be used to limit the Bundles which are exported to AIPs for each DSpace
    Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit
    the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs cannot
    be restored. This option can be run in two ways:

ignoreHandle=[value]
    Applies to: ingest only
    Default: 'false' in Restore/Replace Mode, 'true' in Submit Mode
    Description: If 'true', the AIP ingester will ignore any Handle specified in the AIP itself, and
    instead create a new Handle during the ingest process (this is the default when running in Submit
    mode, using the -s flag). If 'false', the AIP ingester attempts to restore the Handles specified in
    the AIP (this is the default when running in Restore/Replace mode, using the -r flag).

ignoreParent=[value]
    Applies to: ingest only
    Default: 'false' in Restore/Replace Mode, 'true' in Submit Mode
    Description: If 'true', the AIP ingester will ignore any Parent object specified in the AIP itself,
    and instead ingest under a new Parent object (this is the default when running in Submit mode, using
    the -s flag). The new Parent object must be specified via the -p flag (run dspace packager -h for more
    help). If 'false', the AIP ingester attempts to restore the object directly under its old Parent (this
    is the default when running in Restore/Replace mode, using the -r flag).

includeBundles=[value]
    Applies to: export only
    Default: "all"
    Description: This option can be used to limit the Bundles which are exported to AIPs for each DSpace
    Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit
    the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs cannot
    be restored. This option expects a comma separated list of bundle names (e.g.
    "ORIGINAL,LICENSE,CC_LICENSE,METADATA"), or "all" if all bundles should be included.

manifestOnly=[value]
    Applies to: both import and export
    Default: false
    Description: If 'true', the AIP Disseminator will only import/export a METS Manifest XML file (i.e.
    the result will be an unzipped 'mets.xml' file), instead of a full AIP. This METS Manifest contains
    URI references to all content files, but does not contain any content files. This option is
    experimental and is meant for debugging purposes only. It should never be set to 'true' if you want
    to be able to restore content files. Again, please note that when you use this option, the final
    result will be an XML file, NOT the normal ZIP-based AIP format.

skipIfParentMissing=[value]
    Applies to: import only
    Default: false
    Description: If 'true', ingestion will skip over any "Could not find a parent DSpaceObject" errors
    that are encountered during the ingestion process (Note: those errors will still be logged as
    "warning" messages in your DSpace log file). If you are performing a full site restore (or a restore
    of a larger Community/Collection hierarchy), you may encounter these errors if you have a larger
    number of Item mappings between Collections (i.e. Items which are mapped into several collections at
    once). When you are performing a recursive ingest, skipping these errors should not cause any
    problems. Once the missing parent object is ingested it will automatically restore the Item mapping
    that caused the error. For more information on this "Could not find a parent DSpaceObject" error see
    Common Issues or Error Messages.

unauthorized=[value]
    Applies to: export only
    Default: unspecified
    Description: If 'skip', the AIP Disseminator will skip over any unauthorized Bundle or Bitstream
    encountered (i.e. it will not be added to the AIP). If 'zero', the AIP Disseminator will add a
    zero-length "placeholder" file to the AIP when it encounters an unauthorized Bitstream. If unspecified
    (the default value), the AIP Disseminator will throw an error if an unauthorized Bundle or Bitstream
    is encountered.

validate=[value]
    Applies to: both import and export
    Default: 'true' on export, 'false' on ingest
    Description: If 'true', every METS file in an AIP will be validated before ingesting or exporting. By
    default, DSpace will validate everything on export, but will skip validation during import. Validation
    on export will ensure that all exported AIPs properly conform to the METS profile (and will throw
    errors if any do not). Validation on import will ensure every METS file in every AIP is first
    validated before importing into DSpace (this will cause the ingestion processing to take longer, but
    tips on speeding it up can be found in the "AIP Configurations To Improve Ingestion Speed while
    Validating" section below). DSpace recommends, at a minimum, validating AIPs on export. Ideally, you
    should validate both on export and import, but import validation is disabled by default in order to
    increase the speed of AIP restores.
From the command-line, you can add the option to your command by using the -o or --option parameter.
For example:
As a basic example:
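As a rough illustration of the -o syntax described above, the following Python sketch assembles a
packager command line with one -o flag per option. The helper name build_packager_command is
hypothetical, and the e-mail address, Handle and file name are placeholder values. The -d, -t, -e
and -i flags are assumed to behave as in the standard packager launcher; check them against your
own installation before relying on the result.

```python
def build_packager_command(options, eperson, handle, out_file, dspace_dir="[dspace]"):
    """Assemble an AIP export command with one -o flag per packager option.

    Assumption: -d selects dissemination (export), -t the packager type,
    -e the acting EPerson and -i the Handle of the object to export.
    """
    cmd = [f"{dspace_dir}/bin/dspace", "packager", "-d", "-t", "AIP",
           "-e", eperson, "-i", handle]
    for name, value in options.items():
        # Each option is passed as its own "-o name=value" pair.
        cmd += ["-o", f"{name}={value}"]
    cmd.append(out_file)
    return cmd


# Placeholder values, for illustration only.
command = build_packager_command(
    {"validate": "true", "unauthorized": "skip"},
    eperson="admin@myu.edu", handle="123456789/1", out_file="aip.zip")
print(" ".join(command))
```

Building the command as a list (rather than a single string) keeps each option as a separate
argument, which is the form expected by most process-launching APIs.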
aip.disseminate.techMD - Lists the DSpace Crosswalks (by name) which should be called to
populate the <techMD> section of the METS file within the AIP (Default: PREMIS, DSPACE-ROLES)
The PREMIS crosswalk generates PREMIS metadata for the object specified by the AIP
The DSPACE-ROLES crosswalk exports DSpace Group / EPerson information into AIPs in a
DSpace-specific XML format. Using this crosswalk means that AIPs can be used to recreate
Groups & People within the system. (NOTE: The DSPACE-ROLES crosswalk should be used
alongside the METSRights crosswalk if you also wish to restore the permissions that
Groups/People have within the System. See below for more info on the METSRights crosswalk.)
aip.disseminate.sourceMD - Lists the DSpace Crosswalks (by name) which should be called to
populate the <sourceMD> section of the METS file within the AIP (Default: AIP-TECHMD)
The AIP-TECHMD Crosswalk generates technical metadata (in DIM format) for the object
specified by the AIP
aip.disseminate.digiprovMD - Lists the DSpace Crosswalks (by name) which should be called to
populate the <digiprovMD> section of the METS file within the AIP (Default: None)
aip.disseminate.rightsMD - Lists the DSpace Crosswalks (by name) which should be called to
populate the <rightsMD> section of the METS file within the AIP (Default: DSpaceDepositLicense:
DSPACE_DEPLICENSE, CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:
DSPACE_CCTEXT, METSRights)
The DSPACE_DEPLICENSE crosswalk ensures the DSpace Deposit License is referenced/stored
in the AIP
The DSPACE_CCRDF crosswalk ensures any Creative Commons RDF Licenses are referenced/stored
in the AIP
The DSPACE_CCTEXT crosswalk ensures any Creative Commons Textual Licenses are
referenced/stored in the AIP
The METSRights crosswalk ensures that Permissions/Rights on DSpace Objects (Communities,
Collections, Items or Bitstreams) are referenced/stored in AIP. Using this crosswalk means that
AIPs can be used to restore permissions that a particular Group or Person had on a DSpace
Object. (NOTE: The METSRights crosswalk should always be used in conjunction with the
DSPACE-ROLES crosswalk (see above) or a similar crosswalk. The METSRights crosswalk can
only restore permissions, and cannot re-create Groups or EPeople in the system. The
DSPACE-ROLES crosswalk can actually re-create the Groups or EPeople as needed.)
aip.disseminate.dmd - Lists the DSpace Crosswalks (by name) which should be called to populate
the <dmdSec> section of the METS file within the AIP (Default: MODS, DIM)
The MODS crosswalk translates the DSpace descriptive metadata (for this object) into MODS. As
MODS is a relatively "standard" metadata schema, it may be useful to include a copy of MODS
metadata in your AIPs if you should ever want to import them into another (non-DSpace) system.
The DIM crosswalk just translates the DSpace internal descriptive metadata into an XML format.
This XML format is proprietary to DSpace, but stores the metadata in a format similar to Qualified
Dublin Core.
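To make the shape of these settings concrete, here is a minimal Python sketch that splits a
crosswalk list such as the rightsMD default into pairs. It assumes that an entry written as
"Label:CROSSWALK" pairs a metadata-type label with a crosswalk name, and that a bare entry uses
the same name for both; the function name parse_crosswalk_list is hypothetical and is not part of
DSpace.

```python
def parse_crosswalk_list(value):
    """Split a comma-separated crosswalk setting into (label, crosswalk) pairs.

    Assumption: "Label:CROSSWALK" means a metadata-type label plus a crosswalk
    name; a bare entry such as "MODS" is treated as using one name for both.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas or blank entries
        label, sep, crosswalk = entry.partition(":")
        pairs.append((label, crosswalk if sep else label))
    return pairs


# Default rightsMD value as listed above.
rights_md = ("DSpaceDepositLicense:DSPACE_DEPLICENSE, CreativeCommonsRDF:DSPACE_CCRDF, "
             "CreativeCommonsText:DSPACE_CCTEXT, METSRights")
for label, crosswalk in parse_crosswalk_list(rights_md):
    print(f"{label} -> {crosswalk}")
```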
mets.dspaceAIP.ingest.crosswalk.<mdType> = <DSpace-crosswalk-name>
<mdType> is the type of metadata as specified in the METS file. This corresponds to the value of
the @MDTYPE attribute (of that metadata section in the METS). When the @MDTYPE attribute is
"OTHER", then the <mdType> corresponds to the @OTHERMDTYPE attribute value.
<DSpace-crosswalk-name> specifies the name of the DSpace Crosswalk which should be used to
ingest this metadata into DSpace. You can specify the "NULLSTREAM" crosswalk if you
specifically want this metadata to be ignored (and skipped over during ingestion).
mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM
The above settings tell the ingester to ignore any metadata sections which reference DSpace Deposit Licenses
or Creative Commons Licenses. These metadata sections can be safely ignored as long as the "LICENSE" and
"CC_LICENSE" bundles are included in AIPs (which is the default setting). As the Licenses are included in
those Bundles, they will already be restored when restoring the bundle contents.
If unspecified in the above settings, the AIP ingester will automatically use the Crosswalk which is
named the same as the @MDTYPE or @OTHERMDTYPE attribute for the metadata section. For
example, a metadata section with an @MDTYPE="PREMIS" will be processed by the DSpace
Crosswalk named "PREMIS".
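The lookup rule described above can be sketched in a few lines of Python. This is only an
illustration of the documented behaviour (explicit setting first, same-named crosswalk as the
fallback, NULLSTREAM meaning "skip"), not the ingester's actual code; the function name and the
dictionary standing in for dspace.cfg are hypothetical.

```python
PREFIX = "mets.dspaceAIP.ingest.crosswalk."


def resolve_ingest_crosswalk(mdtype, othermdtype, config):
    """Return the crosswalk name for a METS metadata section, or None to skip it."""
    # When @MDTYPE is "OTHER", the real type is carried by @OTHERMDTYPE.
    md_name = othermdtype if mdtype == "OTHER" else mdtype
    # Fall back to the crosswalk named after the metadata type itself.
    crosswalk = config.get(PREFIX + md_name, md_name)
    return None if crosswalk == "NULLSTREAM" else crosswalk


# Stand-in for the three dspace.cfg settings shown above.
config = {
    PREFIX + "DSpaceDepositLicense": "NULLSTREAM",
    PREFIX + "CreativeCommonsRDF": "NULLSTREAM",
    PREFIX + "CreativeCommonsText": "NULLSTREAM",
}

print(resolve_ingest_crosswalk("PREMIS", None, config))
print(resolve_ingest_crosswalk("OTHER", "DSpaceDepositLicense", config))
```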
mets.dspaceAIP.ingest.createSubmitter = false
To speed up validation, you can pull down a local copy of all schemas. Validation will then use
this local cache, which can sometimes increase the speed by up to 10x.
To use a local cache of XML schemas when validating, use the following settings in 'dspace.cfg'. The general
format is:
The default settings are all commented out, but they provide a full listing of all schemas currently
used during validation of AIPs. To use them, uncomment the settings, download the appropriate
schema file, and save it to your [dspace]/config/schemas/ directory (by default this directory does
not exist, so you will need to create it) using the specified file name:
Ingest/Restore Error: "Group Administrator already exists"
If you receive this problem, you are likely attempting to Restore an Entire Site, but are not
running the command in Force Replace Mode (-r -f). Please see the section on Restoring an
Entire Site for more details on the flags you should be using.
Ingest/Restore Error: "Unknown Metadata Schema encountered (mycustomschema)"
If you receive this problem, one or more of your Items is using a custom metadata schema which
DSpace is currently not aware of (in the example, the schema is named "mycustomschema").
Because DSpace AIPs do not contain enough details to recreate the missing Metadata Schema,
you must create it manually via the DSpace Admin UI. Please note that you only need to create
the Schema. You do not need to manually create all the fields belonging to that schema, as
DSpace will do that for you as it restores each AIP. Once the schema is created in DSpace,
re-run your restore command. DSpace will automatically re-create all fields belonging to that
custom metadata schema as it restores each Item that uses that schema.
Ingest Error: "Could not find a parent DSpaceObject referenced as 'xxx/xxx'"
When you encounter this error message it means that an object could not be ingested/restored
as it belongs to a parent object which doesn't currently exist in your DSpace instance. During a
full restore process, this error can be skipped over and treated as a warning by specifying the
'skipIfParentMissing=true' option (see Additional Packager Options). If you have a larger number
of Items which are mapped to multiple Collections, the AIP Ingester will sometimes attempt to
restore an item mapping before the Collection itself has been restored (thus throwing this error).
Luckily, this is not anything to be concerned about. As soon as the Collection is restored, the
Item Mapping which caused the error will also be automatically restored. So, if you encounter
this error during a full restore, it is safe to bypass this error message using the
'skipIfParentMissing=true' option. All your Item Mappings should still be restored correctly.
Submit Error: PSQLException: ERROR: duplicate key value violates unique constraint
"handle_handle_key"
This error means that while submitting one or more AIPs, DSpace encountered a Handle conflict.
This is a general error that may occur in DSpace if your Handle sequence has somehow become
out-of-date. However, it's easy to fix. Just run the [dspace]/etc/postgres/update-sequences.sql
script (or if you are using Oracle, run: [dspace]/etc/oracle/update-sequences.sql).
If you are using the new XMLUI-only Item Level Versioning functionality (disabled by default), you
must be aware that this "Item Level Versioning" feature is not yet compatible with AIP Backup &
Restore. Using them together may result in accidental data loss. Currently the AIPs that DSpace
generates only store the latest version of an Item. Therefore, past versions of Items will always be lost
when you perform a restore / replace using AIP tools.
restored after a disaster. Permanently removed objects will also no longer be exported as AIPs
after their removal. However, withdrawn objects will continue to be exported as AIPs, since they
are still considered under the "in archive" status.
AIPs with identical contents will always have identical checksums. This provides a basic means of
validating whether the contents within an AIP have changed. For example, if a Collection's AIP
has the same checksum at two different points in time, it means that Collection has not changed
during that time period.
The AIP profile favors completeness and accuracy rather than presenting the semantics of an object in
a standard format. It conforms to the quirks of DSpace's internal object model rather than
attempting to produce a universally understandable representation of the object. When possible,
an AIP tries to use common standards to express objects.
An AIP can serve as a DIP (Dissemination Information Package) or SIP (Submission Information
Package), especially when transferring custody of objects to another DSpace implementation.
In contrast to SIP or DIP, the AIP should include all available DSpace structural and administrative
metadata, and basic provenance information. AIPs also describe some basic system level
information (e.g. Groups and People).
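The checksum property mentioned above (identical contents give identical checksums) suggests a
simple change check between two exports of the same object. The sketch below is one possible way
to do that in Python; the function names are hypothetical, SHA-256 is an arbitrary choice of
digest, and the two temporary files merely stand in for AIP files exported at different times.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def aip_unchanged(old_path, new_path):
    """True if two AIP files have the same checksum (i.e. the same contents)."""
    return file_checksum(old_path) == file_checksum(new_path)


# Demonstration with two throwaway files standing in for AIP exports.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "export-1.zip")
    second = os.path.join(tmp, "export-2.zip")
    for path in (first, second):
        with open(path, "wb") as fh:
            fh.write(b"same AIP bytes")
    unchanged = aip_unchanged(first, second)
print(unchanged)
```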
Notes:
Bitstreams and Bundles are second-class archival objects; they are recorded in the context of an Item.
BitstreamFormats are not even second-class; they are described implicitly within Item technical
metadata, and reconstructed from that during restoration.
EPeople are only defined in the Site AIP, but may be referenced from Community or Collection AIPs.
Groups may be defined in the Site AIP, Community AIP or Collection AIP. Where they are defined
depends on whether the Group relates specifically to a single Community or Collection, or is just a
general site-wide group.
DSpace Site configurations ([dspace]/config/ directory) or customizations (themes, stylesheets, etc.)
are not described in AIPs.
The DSpace Database model (or customizations therein) is not described in AIPs.
Any objects which are not currently in the "In Archive" state are not described in AIPs. This means that in-
progress, unfinished submissions are never included in AIPs.
AIP Recommendations
It is recommended to minimally use the default settings when generating AIPs. DSpace can only
restore information that is included within an AIP. Therefore, if you choose to no longer include some
information in an AIP, DSpace will no longer be able to restore that information from an AIP backup.
1. You can customize your dspace.cfg settings pertaining to AIP generation. These configurations will
allow you to specify exactly which DSpace Crosswalks will be called when generating the AIP METS
manifest.
2. You can export your AIPs using one of the special options/flags.
This METS Structure is based on the structure decided for the original AipPrototype, developed as
part of the MIT & UCSD PLEDGE project.
mets element
@PROFILE fixed value="http://www.dspace.org/schema/aip/1.0/mets.xsd" (this is how we identify
an AIP manifest)
@OBJID URN-format persistent identifier (i.e. Handle) if available, or else a unique identifier. (e.g.
"hdl:123456789/1")
@LABEL title if available
@TYPE DSpace object type, one of "DSpace ITEM", "DSpace COLLECTION", "DSpace
COMMUNITY" or "DSpace SITE".
@ID is a globally unique identifier, built using the Handle and the Object type (e.g. dspace-
COLLECTION-hdl:123456789/3).
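As a small illustration of these root attributes, the following Python sketch reads them from a
METS manifest with the standard library. The PROFILE value is the one quoted in the text above
and should be checked against a real manifest; the function name describe_manifest and the
sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# PROFILE value as quoted in the text above (verify against a real AIP).
AIP_PROFILE = "http://www.dspace.org/schema/aip/1.0/mets.xsd"


def describe_manifest(xml_text):
    """Summarize the identifying attributes of a METS root element."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{METS_NS}}}mets":
        raise ValueError("root element is not a METS 'mets' element")
    return {
        "is_aip": root.get("PROFILE") == AIP_PROFILE,
        "objid": root.get("OBJID"),
        # @TYPE looks like "DSpace COLLECTION"; keep only the object type.
        "object_type": (root.get("TYPE") or "").replace("DSpace ", "", 1),
        "id": root.get("ID"),
        "label": root.get("LABEL"),
    }


# Hypothetical, heavily trimmed manifest for illustration.
sample = """<mets xmlns="http://www.loc.gov/METS/"
      PROFILE="http://www.dspace.org/schema/aip/1.0/mets.xsd"
      OBJID="hdl:123456789/3" TYPE="DSpace COLLECTION"
      ID="dspace-COLLECTION-hdl:123456789/3" LABEL="Sample Collection"/>"""

info = describe_manifest(sample)
print(info["object_type"], info["objid"])
```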
mets/metsHdr element
@LASTMODDATE last-modified date for a DSpace Item, or nothing for other objects.
agent element:
@ROLE = "CUSTODIAN",
@TYPE = "OTHER",
@OTHERTYPE = "DSpace Archive",
name = Site handle. (Note: The Site Handle is of the format [handle_prefix]/0, e.g.
"123456789/0")
agent element:
@ROLE = "CREATOR",
@TYPE = "OTHER",
@OTHERTYPE = "DSpace Software",
name = "DSpace [version]" (Where "[version]" is the specific version of DSpace software
which created this AIP, e.g. "1.7.0")
mets/dmdSec element(s)
By default, two dmdSec elements are included for all AIPs:
1. object's descriptive metadata crosswalked to MODS (specified by
mets/dmdSec/mdWrap@MDTYPE="MODS"). See the MODS Schema section below for more information.
2. object's descriptive metadata in DSpace native DIM intermediate format, to serve as a
complete and precise record for restoration or ingestion into another DSpace. Specified by
mets/dmdSec/mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DIM". See the DIM (DSpace
Intermediate Metadata) Schema section below for more information.
For Collection AIPs, additional dmdSec elements may exist which describe the Item Template for
that Collection. Since an Item template is not an actual Item (i.e. it only includes metadata), it is
stored within the Collection AIP. The Item Template's dmdSec elements will be referenced by a
div @TYPE="DSpace ITEM Template" in the METS structMap.
When the mdWrap @MDTYPE value is OTHER, the element MUST include a value for the @OTHERMDTYPE
attribute which names the crosswalk that produced (or interprets) that metadata, e.g. DIM.
mets/amdSec element(s)
One or more amdSec elements are included for all AIPs. The first amdSec element contains
administrative metadata (technical, source, rights, and provenance) for the entire archival object.
Additional amdSec elements may exist to describe parts of the archival object (e.g. Bitstreams or
Bundles in an Item).
techMD elements. By default, two types of techMD elements may be included:
PREMIS metadata about an object may be included here (currently only specified for
Bitstreams (files)). Specified by mdWrap@MDTYPE="PREMIS". See the PREMIS
Schema section below for more information.
DSPACE-ROLES metadata may appear here to describe the Groups or EPeople
related to this object (currently only specified for Site, Community and Collection).
Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DSPACE-ROLES". See
the DSPACE-ROLES Schema section below for more information.
rightsMD elements. By default, there are four possible types of rightsMD elements
which may be included:
METSRights metadata may appear here to describe the permissions on this object.
Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="METSRIGHTS". See the
METSRights Schema section below for more information.
DSpaceDepositLicense if the object is an Item and it has a deposit license, it is
contained here. Specified by mdWrap@MDTYPE="OTHER",
@OTHERMDTYPE="DSpaceDepositLicense".
CreativeCommonsRDF If the object is an Item with a Creative Commons license
expressed in RDF, it is included here. Specified by mdWrap@MDTYPE="OTHER",
@OTHERMDTYPE="CreativeCommonsRDF".
CreativeCommonsText If the object is an Item with a Creative Commons license
in plain text, it is included here. Specified by mdWrap@MDTYPE="OTHER",
@OTHERMDTYPE="CreativeCommonsText".
sourceMD element. By default, there is only one type of sourceMD element which may
appear:
AIP-TECHMD metadata may appear here. This stores basic technical/source
metadata about an object in a DSpace native format. Specified by
mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="AIP-TECHMD". See the AIP
Technical Metadata Schema (AIP-TECHMD) section below for more information.
digiprovMD element.
Not used at this time.
mets/fileSec element
For ITEM objects:
Each distinct Bundle in an Item goes into a fileGrp. The fileGrp has a @USE attribute
which corresponds to the Bundle name.
Bitstreams in bundles become file elements under fileGrp.
mets/fileSec/fileGrp/file elements
Set @SIZE to length of the bitstream. There is a redundant value in the <techMD>
but it is more accessible here.
Set @MIMETYPE, @CHECKSUM, @CHECKSUMTYPE to corresponding bitstream values.
There is redundant info in the <techMD>. (For DSpace, the @CHECKSUMTYPE="MD5"
at all times)
Set @SEQ to bitstream's SequenceID if it has one.
Set @ADMID to the list of <amdSec> element(s) which describe this bitstream.
For COLLECTION and COMMUNITY objects:
Only if the object has a logo bitstream, there is a fileSec with one fileGrp child of
@USE="LOGO".
The fileGrp contains one file element, representing the logo Bitstream. It has the
same @MIMETYPE, @CHECKSUM, @CHECKSUMTYPE attributes as the Item content
bitstreams, but does NOT include metadata section references (e.g. @ADMID) or a @SEQ
attribute.
See the main structMap for the fptr reference to this logo file.
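The fileSec layout described above can be read back with a few lines of Python. The sketch below
lists every file element together with the @USE value of its fileGrp (the Bundle name, or "LOGO"
for a Community or Collection). The function name list_files is hypothetical, and the sample
manifest, including its checksum values, is invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def list_files(xml_text):
    """Return one dict per mets:file, tagged with the @USE of its fileGrp."""
    root = ET.fromstring(xml_text)
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for file_el in group.findall("mets:file", NS):
            size = file_el.get("SIZE")
            files.append({
                "bundle": group.get("USE"),
                "size": int(size) if size is not None else None,
                "mimetype": file_el.get("MIMETYPE"),
                "checksum": file_el.get("CHECKSUM"),
                "checksum_type": file_el.get("CHECKSUMTYPE"),
                "seq": file_el.get("SEQ"),  # absent for logo bitstreams
            })
    return files


# Hypothetical, trimmed Item manifest; the checksums are placeholders.
sample = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ID="bitstream_1" SIZE="1024" MIMETYPE="application/pdf"
            CHECKSUM="0123456789abcdef0123456789abcdef" CHECKSUMTYPE="MD5" SEQ="1"/>
    </fileGrp>
    <fileGrp USE="LICENSE">
      <file ID="bitstream_2" SIZE="512" MIMETYPE="text/plain"
            CHECKSUM="fedcba9876543210fedcba9876543210" CHECKSUMTYPE="MD5" SEQ="2"/>
    </fileGrp>
  </fileSec>
</mets>"""

for entry in list_files(sample):
    print(entry["bundle"], entry["mimetype"], entry["size"])
```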
mets/structMap - Primary structure map, @LABEL="DSpace Object", @TYPE="LOGICAL"
For ITEM objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Bitstream in Item it contains a div with @TYPE="DSpace BITSTREAM".
Each Bitstream div has a single fptr element which references the bitstream
location.
If Item has primary bitstream, put it in structMap/div/fptr (i.e. directly under the div
with @TYPE="DSpace Object Contents")
For COLLECTION objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Item in the Collection, it contains a div with @TYPE="DSpace ITEM".
Each Item div has up to two child mptr elements:
a. One linking to the Handle of that Item. Its @LOCTYPE="HANDLE", and
@xlink:href value is the raw Handle.
b. (Optional) one linking to the location of the local AIP for that Item (if known).
Its @LOCTYPE="URL", and @xlink:href value is a relative link to the AIP
file on the local filesystem.
If Collection has a Logo bitstream, there is an fptr reference to it in the very first div.
If the Collection includes an Item Template, there will be a div with @TYPE="DSpace
ITEM Template" within the very first div. This div @TYPE="DSpace ITEM
Template" must have a @DMDID specified, which links to the dmdSec element(s) that
contain the metadata for the Item Template.
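To show how the Collection structMap described above might be consumed, the following Python
sketch collects the Handle of every Item referenced by a Collection manifest. It relies only on
the div @TYPE and mptr @LOCTYPE values named in the text; the function name item_handles, the
sample Handles and the relative file name are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def item_handles(xml_text):
    """Return the Handle of each Item div found in the structMap."""
    root = ET.fromstring(xml_text)
    handles = []
    for div in root.iter(f"{{{METS_NS}}}div"):
        if div.get("TYPE") != "DSpace ITEM":
            continue
        for mptr in div.findall(f"{{{METS_NS}}}mptr"):
            # The optional LOCTYPE="URL" pointer to a local AIP file is ignored.
            if mptr.get("LOCTYPE") == "HANDLE":
                handles.append(mptr.get(XLINK_HREF))
    return handles


# Hypothetical, trimmed Collection manifest for illustration.
sample = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <structMap LABEL="DSpace Object" TYPE="LOGICAL">
    <div TYPE="DSpace Object Contents">
      <div TYPE="DSpace ITEM">
        <mptr LOCTYPE="HANDLE" xlink:href="123456789/10"/>
        <mptr LOCTYPE="URL" xlink:href="item-10.zip"/>
      </div>
      <div TYPE="DSpace ITEM">
        <mptr LOCTYPE="HANDLE" xlink:href="123456789/11"/>
      </div>
    </div>
  </structMap>
</mets>"""

print(item_handles(sample))
```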
Metadata in METS
The following tables describe how various metadata schemas are populated (via DSpace Crosswalks) in the
METS file for an AIP.
is extendable to any number of schemas, elements, qualifiers). These custom fields/schemas may or may not
be able to be translated into normal Qualified Dublin Core. So, the DIM Schema must be able to express
metadata schemas, elements or qualifiers which may or may not exist within Qualified Dublin Core.
In the METS structure, DIM metadata always appears within a dmdSec inside an <mdWrap MDTYPE="OTHER"
OTHERMDTYPE="DIM"> element. For example:
<dmdSec ID="dmdSec_2190">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM">
...
</mdWrap>
</dmdSec>
By default, DIM metadata is always included in AIPs. It is controlled by the following configuration in your
dspace.cfg:
aip.disseminate.dmd = MODS, DIM
MODS Schema
By default, all DSpace descriptive metadata (DIM) is also translated into the MODS Schema by utilizing
DSpace's MODSDisseminationCrosswalk. DSpace's DIM to MODS crosswalk is defined within your
[dspace]/config/crosswalks/mods.properties configuration file. This file allows you to customize the
MODS that is included within your AIPs.
In the METS structure, MODS metadata always appears within a dmdSec inside an
<mdWrap MDTYPE="MODS"> element. For example:
<dmdSec ID="dmdSec_2189">
<mdWrap MDTYPE="MODS">
...
</mdWrap>
</dmdSec>
By default, MODS metadata is always included in AIPs. It is controlled by the following configuration in your
dspace.cfg:
aip.disseminate.dmd = MODS, DIM
The MODS metadata is included within your AIP to support interoperability. It provides a way for other systems
to interact with or ingest the AIP without needing to understand the DIM Schema. You may choose to disable
MODS if you wish; however, this may decrease the likelihood that you'd be able to easily ingest your AIPs into a
non-DSpace system (unless that non-DSpace system is able to understand the DIM schema). When
restoring/ingesting AIPs, DSpace will always first attempt to restore DIM descriptive metadata. The MODS
metadata is used during a restore only if no DIM metadata is found.
In the METS structure, AIP-TECHMD metadata always appears within a sourceMD inside an <mdWrap
MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD"> element. For example:
<amdSec ID="amd_2191">
...
<sourceMD ID="sourceMD_2198">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD">
...
</mdWrap>
</sourceMD>
...
</amdSec>
By default, AIP-TECHMD metadata is always included in AIPs. It is controlled by the following configuration in
your dspace.cfg:
aip.disseminate.sourceMD = AIP-TECHMD
dc.relation.isReferencedBy - All other Collections this Item is linked to (Handle URN of each
non-owner)
dc.format.supportlevel - System Support Level for Format (necessary to recreate Format during
restore, if the format isn't known to DSpace by default)
dc.format.internal - Whether Format is internal (necessary to recreate Format during restore, if the
format isn't known to DSpace by default)
Outstanding Question: Why are we recording the file format support status? That's a DSpace property,
rather than an Item property. Do DSpace instances rely on objects to tell them their support status?
Possible answer (from Larry Stone): Format support and other properties of the BitstreamFormat
are recorded here in case the Item is restored in an empty DSpace that doesn't have that format
yet, and the relevant bits of the format entry have to be reconstructed from the AIP. --lcs
dc.relation.isReferencedBy - All other Communities this Collection is linked to (Handle URN of each
non-owner)
PREMIS Schema
At this point in time, the PREMIS Schema is only used to represent technical metadata about DSpace
Bitstreams (i.e. Files). The PREMIS metadata is generated by DSpace's PREMISCrosswalk. Only the
PREMIS Object Entity Schema is used.
In the METS structure, PREMIS metadata always appears within a techMD inside an
<mdWrap MDTYPE="PREMIS"> element. PREMIS metadata is always wrapped within a <premis:premis>
element. For example:
<amdSec ID="amd_2209">
...
<techMD ID="techMD_2210">
<mdWrap MDTYPE="PREMIS">
<premis:premis>
...
</premis:premis>
</mdWrap>
</techMD>
...
</amdSec>
Each Bitstream (file) has its own amdSec within a METS manifest. So, there will be a separate PREMIS techMD
for each Bitstream within a single Item.
By default, PREMIS metadata is always included in AIPs. It is controlled by the following configuration in your
dspace.cfg:
aip.disseminate.techMD = PREMIS, DSPACE-ROLES
DSPACE-ROLES Schema
All DSpace Groups and EPeople objects are translated into a custom DSPACE-ROLES XML Schema. This XML
Schema is a very simple representation of the underlying DSpace database model for Groups and EPeople.
The DSPACE-ROLES Schema is generated by DSpace's RoleCrosswalk.
Only the following DSpace Objects utilize the DSPACE-ROLES Schema in their AIPs:
Site AIP – all Groups and EPeople are represented in DSPACE-ROLES Schema
Community AIP – only Community-based groups (e.g. COMMUNITY_1_ADMIN) are represented in
DSPACE-ROLES Schema
Collection AIP – only Collection-based groups (e.g. COLLECTION_2_ADMIN, COLLECTION_2_SUBMIT,
etc.) are represented in DSPACE-ROLES Schema
In the METS structure, DSPACE-ROLES metadata always appears within a techMD inside an <mdWrap
MDTYPE="OTHER" OTHERMDTYPE="DSPACE-ROLES"> element. For example:
<amdSec ID="amd_2068">
...
<techMD ID="techMD_2070">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="DSPACE-ROLES">
...
</mdWrap>
</techMD>
...
</amdSec>
By default, DSPACE-ROLES metadata is always included in AIPs. It is controlled by the following configuration
in your dspace.cfg:
aip.disseminate.techMD = PREMIS, DSPACE-ROLES
<DSpaceRoles>
<Groups>
<Group ID="1" Name="Administrator">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
<Group ID="0" Name="Anonymous" />
<Group ID="70" Name="COLLECTION_hdl:123456789/57_ADMIN">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
<Group ID="75" Name="COLLECTION_hdl:123456789/57_DEFAULT_READ">
<MemberGroups>
<MemberGroup ID="0" Name="Anonymous" />
</MemberGroups>
</Group>
<Group ID="71" Name="COLLECTION_hdl:123456789/57_SUBMIT">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
<Group ID="72" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_1">
<MemberGroups>
<MemberGroup ID="1" Name="Administrator" />
</MemberGroups>
</Group>
<Group ID="73" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_2">
<MemberGroups>
<MemberGroup ID="1" Name="Administrator" />
</MemberGroups>
</Group>
<Group ID="8" Name="COLLECTION_hdl:123456789/6703_DEFAULT_READ" />
<Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
</Groups>
<People>
<Person ID="1">
<Email>bsmith@myu.edu</Email>
<Netid>bsmith</Netid>
<FirstName>Bob</FirstName>
<LastName>Smith</LastName>
<Language>en</Language>
<CanLogin />
</Person>
<Person ID="2">
<Email>jjones@myu.edu</Email>
<FirstName>Jane</FirstName>
<LastName>Jones</LastName>
<Language>en</Language>
<CanLogin />
<SelfRegistered />
</Person>
</People>
</DSpaceRoles>
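For readers who want to inspect such a document programmatically, here is a minimal Python sketch
that lists, for each Group, its direct members and its member groups. It assumes the elements
carry no XML namespace, exactly as printed in the example above; the function name group_members
is hypothetical and the sample is a shortened copy of that example.

```python
import xml.etree.ElementTree as ET


def group_members(xml_text):
    """Map each Group name to its member people and member groups."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.findall("Groups/Group"):
        people = [m.get("Name") for m in group.findall("Members/Member")]
        subgroups = [g.get("Name") for g in group.findall("MemberGroups/MemberGroup")]
        result[group.get("Name")] = {"people": people, "groups": subgroups}
    return result


# Shortened copy of the SITE example above.
sample = """<DSpaceRoles>
  <Groups>
    <Group ID="1" Name="Administrator">
      <Members><Member ID="1" Name="bsmith@myu.edu" /></Members>
    </Group>
    <Group ID="0" Name="Anonymous" />
    <Group ID="75" Name="COLLECTION_hdl:123456789/57_DEFAULT_READ">
      <MemberGroups><MemberGroup ID="0" Name="Anonymous" /></MemberGroups>
    </Group>
  </Groups>
</DSpaceRoles>"""

for name, members in group_members(sample).items():
    print(name, members)
```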
You may have noticed several odd looking group names in the above example, where a Handle is
embedded in the name (e.g. "COLLECTION_hdl:123456789/57_SUBMIT"). This is a translation of a
Group name which included a Community or Collection Internal ID (e.g.
"COLLECTION_45_SUBMIT"). Since you are exporting these Groups outside of DSpace, the Internal
ID may no longer be valid or be understandable. Therefore, before export, these Group names are all
translated to include an externally understandable identifier, in the form of a Handle. If you use this
AIP to restore your groups later, they will be translated back to the normal DSpace format (i.e. the
handle will be translated back to the new Internal ID).
In 1.8.2 and above, the Group is renamed using the following format:
"ORPHANED_[object-type]_GROUP_[obj-id]_[group-type]" (e.g. "ORPHANED_COLLECTION_GROUP_10_ADMIN").
Prior to 1.8.2, the Group was renamed with a random key:
"GROUP_[random-hex-key]_[object-type]_[group-type]" (e.g. "GROUP_123eb3a_COLLECTION_ADMIN").
This old format was discontinued as giving the groups a randomly generated name caused the SITE
AIP to have a different checksum every time it was regenerated (see DS-1120).
The reasoning is that we were unable to translate an Internal ID into an External ID (i.e. Handle). If we
are unable to do that translation, re-importing or restoring a group with an old internal ID could cause
conflicts or instability in your DSpace system. In order to avoid such conflicts, these groups are
renamed using a random, unique key.
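The two naming rules described above (Handle-based names on export, and the ORPHANED format used
from 1.8.2 onward when no Handle can be found) can be sketched as follows. This is an
illustration of the naming scheme only, not DSpace's implementation; the function name and the
handle_by_id dictionary, which stands in for a lookup from internal ID to Handle, are hypothetical.

```python
import re

# Matches names such as "COLLECTION_45_SUBMIT" or "COMMUNITY_1_ADMIN".
GROUP_NAME = re.compile(r"^(COMMUNITY|COLLECTION)_(\d+)_(.+)$")


def export_group_name(name, handle_by_id):
    """Translate an internal-ID group name into its exported form."""
    match = GROUP_NAME.match(name)
    if not match:
        return name  # e.g. "Administrator": nothing to translate
    obj_type, internal_id, group_type = match.groups()
    handle = handle_by_id.get((obj_type, int(internal_id)))
    if handle is None:
        # No Handle available for this internal ID: use the ORPHANED format.
        return f"ORPHANED_{obj_type}_GROUP_{internal_id}_{group_type}"
    return f"{obj_type}_hdl:{handle}_{group_type}"


# Hypothetical mapping from (object type, internal ID) to Handle.
handles = {("COLLECTION", 45): "123456789/57"}

print(export_group_name("COLLECTION_45_SUBMIT", handles))
print(export_group_name("COLLECTION_10_ADMIN", handles))
```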
This specific example is for a Collection, which has associated Administrator, Submitter, and Workflow approver
groups. In this very simple example, each group only has one Person as a member. Please notice that the
Person's information (Name, NetID, etc.) is NOT contained in this content (however, it is available in the
DSPACE-ROLES example for a SITE, as shown above).
<DSpaceRoles>
<Groups>
<Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN" Type="ADMIN">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
<Group ID="13" Name="COLLECTION_hdl:123456789/2_SUBMIT" Type="SUBMIT">
<Members>
<Member ID="2" Name="jjones@myu.edu" />
</Members>
</Group>
<Group ID="10" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1" Type="WORKFLOW_STEP_1">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
<Group ID="11" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2" Type="WORKFLOW_STEP_2">
<Members>
<Member ID="2" Name="jjones@myu.edu" />
</Members>
</Group>
<Group ID="12" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3" Type="WORKFLOW_STEP_3">
<Members>
<Member ID="1" Name="bsmith@myu.edu" />
</Members>
</Group>
</Groups>
</DSpaceRoles>
METSRights Schema
All DSpace Policies (permissions on objects) are translated into the METSRights schema. This is different than
the above DSPACE-ROLES schema, which only represents Groups and People objects. Instead, the
METSRights schema is used to translate the permission statements (e.g. a group named "Library Admins" has
Administrative permissions on a Community named "University Library"). But the METSRights schema doesn't
represent who is a member of a particular group (that is defined in the DSPACE-ROLES schema, as described
above).
The METSRights Schema must be used in conjunction with the DSPACE-ROLES Schema for Groups,
People and Permissions to all be restored properly. As mentioned above, the METSRights metadata
can only be used to restore permissions (i.e. DSpace policies). The DSPACE-ROLES metadata must
also exist if you wish to restore the actual Group or EPeople objects to which those permissions apply.
The AIPs of all DSpace Objects (except for the SITE AIP) utilize the METSRights Schema in order to define
what permissions people and groups have on that object. Although there are several sections to the
METSRights Schema, DSpace AIPs only use the <RightsDeclarationMD> section, as this is what is used to
describe rights on an object.
In the METS structure, METSRights metadata always appears within a rightsMD inside an <mdWrap
MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS"> element. For example:
<amdSec ID="amd_2068">
...
<rightsMD ID="rightsMD_2074">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS">
...
</mdWrap>
</rightsMD>
...
</amdSec>
By default, METSRights metadata is always included in AIPs. It is controlled by the following configuration in
your dspace.cfg:
aip.disseminate.rightsMD = DSpaceDepositLicense:DSPACE_DEPLICENSE, \
CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:DSPACE_CCTEXT, METSRIGHTS
Below is an example of a METSRights section for a publicly visible Bitstream, Bundle or Item. Notice it
specifies that the "GENERAL PUBLIC" has the permission to DISCOVER or DISPLAY this object.
<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/"
RIGHTSCATEGORY="LICENSED">
<rights:Context CONTEXTCLASS="GENERAL PUBLIC">
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
</rights:RightsDeclarationMD>
As of DSpace 3, DSpace policies/permissions may also have a "start-date" or "end-date" (to support Embargo
functionality). Such a policy on an Item may look like this. Notice it specifies that the "GENERAL PUBLIC" has
the permission to DISCOVER or DISPLAY this object starting on 2015-01-01, while the Group "Staff" has
permission to DISCOVER or DISPLAY this object until 2015-01-01.
<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/"
RIGHTSCATEGORY="LICENSED">
<rights:Context CONTEXTCLASS="GENERAL PUBLIC" start-date="2015-01-01" in-effect="false">
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP" end-date="2015-01-01" in-effect="true">
<rights:UserName USERTYPE="GROUP">Staff</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
</rights:RightsDeclarationMD>
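To check what a METSRights section actually grants, it can help to list each context with its permissions and dates. The following is a minimal sketch using only the Python standard library; the function name summarize_rights and the dictionary layout are illustrative assumptions, not part of DSpace. It reads the embargo example above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the METSRights examples in this section.
NS = {"rights": "http://cosimo.stanford.edu/sdr/metsrights/"}

SAMPLE = """<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/"
    RIGHTSCATEGORY="LICENSED">
  <rights:Context CONTEXTCLASS="GENERAL PUBLIC" start-date="2015-01-01" in-effect="false">
    <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
  </rights:Context>
  <rights:Context CONTEXTCLASS="MANAGED_GRP" end-date="2015-01-01" in-effect="true">
    <rights:UserName USERTYPE="GROUP">Staff</rights:UserName>
    <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
  </rights:Context>
</rights:RightsDeclarationMD>"""


def summarize_rights(xml_text):
    """Return one dict per <rights:Context> (hypothetical helper, not a DSpace API)."""
    root = ET.fromstring(xml_text)
    summary = []
    for ctx in root.findall("rights:Context", NS):
        user = ctx.find("rights:UserName", NS)
        perms = ctx.find("rights:Permissions", NS)
        if perms is None:
            continue  # nothing to report for a context without permissions
        summary.append({
            # Use the group name when present, otherwise the context class.
            "who": user.text if user is not None else ctx.get("CONTEXTCLASS"),
            # Keep only the permission attributes whose value is "true".
            "granted": sorted(k for k, v in perms.attrib.items() if v == "true"),
            "start": ctx.get("start-date"),
            "end": ctx.get("end-date"),
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_rights(SAMPLE):
        print(entry)
```

For the sample, the first entry describes the "GENERAL PUBLIC" context with a start date and the second describes the "Staff" group with an end date.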
Below is an example of a METSRights section for a publicly visible Collection, which also has an Administrator
group, a Submitter group, and a group for each of the three DSpace workflow approval steps. Notice that
each group is given very specific permissions within the Collection. Submitters and Workflow
approvers can "ADD CONTENTS" to a collection (but cannot delete the collection). Administrators have full
rights.
<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/"
RIGHTSCATEGORY="LICENSED">
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_SUBMIT</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_ADMIN</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" COPY="true" DUPLICATE="true" MODIFY="true"
DELETE="true" PRINT="true" OTHER="true" OTHERPERMITTYPE="ADMIN" />
</rights:Context>
<rights:Context CONTEXTCLASS="GENERAL PUBLIC">
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
</rights:RightsDeclarationMD>
Below is an example of a METSRights section for a publicly visible Community, which also has an
Administrator group. This content looks very similar to the Collection METSRights section
described above.
<rights:RightsDeclarationMD xmlns:rights="http://cosimo.stanford.edu/sdr/metsrights/"
RIGHTSCATEGORY="LICENSED">
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COMMUNITY_hdl:123456789/10_ADMIN</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" COPY="true" DUPLICATE="true" MODIFY="true"
DELETE="true" PRINT="true" OTHER="true" OTHERPERMITTYPE="ADMIN" />
</rights:Context>
<rights:Context CONTEXTCLASS="GENERAL PUBLIC">
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
</rights:RightsDeclarationMD>
The software DSpace relies on does not come out of the box optimized for large repositories. Here are
some tips to make it all run faster.
At the time of writing, DSpace recommends you should give Tomcat >= 512MB of Java Heap Memory
to ensure optimal DSpace operation. Most larger sized or highly active DSpace installations however
tend to allocate more like 1024MB to 2048MB of Java Heap Memory.
Performance tuning in Java basically boils down to memory. If you are seeing
"java.lang.OutOfMemoryError: Java heap space" errors, this is a sure sign that Tomcat isn't being provided with
enough Heap Memory.
Tomcat is especially memory hungry, and will benefit from being given lots of RAM. To set the amount of
memory available to Tomcat, use either the JAVA_OPTS or CATALINA_OPTS environment variable, e.g:
CATALINA_OPTS=-Xmx512m -Xms512m
OR
JAVA_OPTS=-Xmx512m -Xms512m
The above example sets both the initial (-Xms) and the maximum (-Xmx) Java Heap memory to 512MB.
You can use either environment variable. JAVA_OPTS is also used by other Java programs (besides
just Tomcat). CATALINA_OPTS is only used by Tomcat. So, if you only want to tweak the memory
available to Tomcat, it is recommended that you use CATALINA_OPTS. If you set both
CATALINA_OPTS and JAVA_OPTS, Tomcat will default to using the settings in CATALINA_OPTS.
If the machine is dedicated to DSpace, a decent rule of thumb is to give Tomcat half of the memory on your
machine. At a minimum, you should give Tomcat >= 512MB of memory for optimal DSpace operation.
(NOTE: As your DSpace instance gets larger in size, you may need to increase this number to the several GB
range.) The latest guidance is to also set -Xms to the same value as -Xmx for server applications such as
Tomcat.
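The rule of thumb above (half of the machine's memory, never less than 512MB, with -Xms equal to -Xmx) can be written down as a small calculation. This is only an illustrative sketch; the function name tomcat_heap_opts is hypothetical, and the result is a starting point to tune, not an official DSpace recommendation.

```python
def tomcat_heap_opts(total_ram_mb, minimum_mb=512):
    """Build heap flags for a machine dedicated to DSpace (hypothetical helper)."""
    # Half of the machine's memory, but never below the documented minimum.
    heap_mb = max(minimum_mb, total_ram_mb // 2)
    # -Xms is set to the same value as -Xmx, as suggested for server applications.
    return f"-Xmx{heap_mb}m -Xms{heap_mb}m"


if __name__ == "__main__":
    # Example: a dedicated machine with 4GB of RAM.
    print("CATALINA_OPTS=" + tomcat_heap_opts(4096))
```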
At the time of writing, DSpace recommends you should give Tomcat >= 128MB of PermGen Space to
ensure optimal DSpace operation.
If you are seeing "java.lang.OutOfMemoryError: PermGen space" errors, this is a sure sign that
Tomcat is running out of PermGen Memory. (More info on PermGen Space:
http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java)
To increase the amount of PermGen memory available to Tomcat (default=64MB), use either the JAVA_OPTS
or CATALINA_OPTS environment variable, e.g:
CATALINA_OPTS=-XX:MaxPermSize=128m
OR
JAVA_OPTS=-XX:MaxPermSize=128m
You can use either environment variable. JAVA_OPTS is also used by other Java programs (besides
just Tomcat). CATALINA_OPTS is only used by Tomcat. So, if you only want to tweak the memory
available to Tomcat, it is recommended that you use CATALINA_OPTS. If you set both
CATALINA_OPTS and JAVA_OPTS, Tomcat will default to using the settings in CATALINA_OPTS.
Please note that you can obviously set both Tomcat's Heap space and PermGen Space together
similar to:
CATALINA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
On an Ubuntu machine (10.04) at least, the file /etc/default/tomcat6 appears to be the best
place to put these environment variables.
By default, DSpace only provides 256MB of maximum heap memory to its command-line tools.
If you'd like to provide more memory to command-line tools, you can do so via the JAVA_OPTS environment
variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as above:
JAVA_OPTS=-Xmx512m -Xms512m
This is especially useful for big batch jobs, which may require additional memory.
You can also edit the [dspace]/bin/dspace script and add the environment variables to the
script directly.
Give the Command Line Tools More Java PermGen Space Memory
Similar to Tomcat, you may also need to give the DSpace Java-based command-line tools more PermGen
Space. If you are seeing "java.lang.OutOfMemoryError: PermGen space" errors, when running a
command-line tool, this is a sure sign that it isn't being provided with enough PermGen Space.
If you'd like to provide more PermGen Space to command-line tools, you can do so via the JAVA_OPTS
environment variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as
above:
JAVA_OPTS=-XX:MaxPermSize=128m
This is especially useful for big batch jobs, which may require additional memory.
Please note that you can obviously set both Java's Heap space and PermGen Space together similar
to:
JAVA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
For more hints/tips with PostgreSQL configurations and performance tuning, see also:
PostgresPerformanceTuning
PostgresqlConfiguration
Note that the Auto Commit method is already integrated in DSpace 1.7 and above.
DSpace comes with tools that ensure major search engines (Google, Bing, Yahoo, Google Scholar) are able to
easily and effectively index all your content. However, many of these tools require some basic setup. Here's
how to ensure your site is indexed.
1. Keep your DSpace up to date. We are constantly adding new indexing improvements in new releases.
2. Ensure your DSpace is visible to search engines.
3. Enable the sitemaps feature – this does not require e.g. registering with Google Webmaster tools.
4. Ensure your robots.txt allows access to item "splash" pages and full text.
5. Ensure item metadata appears in HTML headers correctly.
6. Avoid redirecting file downloads to Item landing pages.
7. As an aside, it's worth noting that OAI-PMH is generally not useful to search engines. OAI-PMH has its
own uses, but do not expect search engines to use it.
As of DSpace 4.0, DSpace has provided several enhancements, which were requested by the Google
Scholar team. These included providing users (and web indexers) a way to browse content by the date it
was added to DSpace (see DS-1482), ensuring the "dc.date.issued" field is set more accurately (see
DS-1481), and enhancing the logic behind the "citation_pdf_url" HTML <meta> tag (see DS-1483).
As of DSpace 1.7, DSpace has improved how its Item-level metadata is made available to Google
Scholar. For the 1.7.0 release, the DSpace Developers worked directly with the Google Scholar
developers, to ensure DSpace is generating the "citation_*" HTML "<meta>" tags (i.e. Highwire Press
tags) that Google Scholar recommends in their Indexing Guidelines.
As of DSpace 1.5, DSpace has support for sitemaps (both simple HTML pages of links, as well as the
sitemaps.org protocol). It also includes item metadata in the HTML HEAD element of item display pages,
ensuring that the metadata can be effectively indexed no matter what changes you might have made to
your DSpace's layout or style.
As of DSpace 1.4, DSpace has support for the "if-modified-since" HTTP header. This basically means
that if an item (or bitstream therein) has not changed since the last time a search engine's crawler
indexed it, that item/bitstream does not have to be re-retrieved, sparing your server.
Additional minor improvements / bug fixes have been made to more recent releases of DSpace.
If your site is not indexed at all, all search engines have a way to add your URL, e.g.:
Google: http://www.google.com/addurl
Yahoo: http://siteexplorer.search.yahoo.com/submit
Bing: http://www.bing.com/docs/submit.aspx
HTML sitemaps provide a list of all items, collections and communities in HTML format, whilst Google sitemaps
provide the same information in gzipped XML format.
To enable sitemaps, all you need to do is run [dspace]/bin/dspace generate-sitemaps once a day,
for example from a daily cron job (or a scheduled task in Windows).
Once you've enabled your sitemaps, they will be accessible under your DSpace URL: the HTML sitemaps at
/htmlmap and the sitemaps.org (XML) sitemaps at /sitemap. So, for example, if your "dspace.cfg" configuration
file contains "dspace.url = http://mysite.org/xmlui", then the HTML Sitemaps would be at:
"http://mysite.org/xmlui/htmlmap"
1. Provide a hidden link to the sitemaps in your DSpace's homepage. If you've customized your site's
look and feel (as most have), ensure that there is a link to /htmlmap in your DSpace's front or home
page. By default, both the JSPUI and XMLUI provide this link in the footer:
<a href="/htmlmap"></a>
2. Announce your sitemap in your robots.txt. Most major search engines will also automatically discover
your sitemap if you announce it in your robots.txt file. For example:
Sitemap: http://my.dspace.url/sitemap
Sitemap: http://my.dspace.url/htmlmap
a. NOTE that you need to replace "http://my.dspace.url" in the lines above with the full URL of your
DSpace instance (this should correspond to the "dspace.url" setting in your dspace.cfg file).
b. These "Sitemap:" lines can be placed anywhere in your robots.txt file. You can also specify multiple
"Sitemap:" lines, so that search engines can locate both formats. For more information, see:
http://www.sitemaps.org/protocol.html#informing
Search engines will now look at your XML and HTML sitemaps, which serve pre-generated (and thus served
with minimal impact on your hardware) XML or HTML files linking directly to items, collections and communities
in your DSpace instance. Crawlers will not have to work their way through any browse screens, which are
intended more for human consumption, and more expensive for the server.
If you have restricted content on your site, search engines will not be able to access it; they access all pages as
an anonymous user.
Ensure that your robots.txt file is at the top level of your site: i.e. at http://repo.foo.edu/robots.txt, and NOT e.g.
http://repo.foo.edu/dspace/robots.txt. If your DSpace instance is served from e.g. http://repo.foo.edu/dspace/,
you'll need to add /dspace to all the paths in the examples below (e.g. /dspace/browse-subject).
DSpace 1.5 and 1.5.1 ship with a bad robots.txt file. Delete it, or specifically the line that says
Disallow: /browse. If you do not, your site will not be correctly indexed.
/bitstream
/browse (UNLESS USING SITEMAPS)
/*/browse (UNLESS USING SITEMAPS)
/browse-date (UNLESS USING SITEMAPS)
/*/browse-date (UNLESS USING SITEMAPS)
/community-list (UNLESS USING SITEMAPS)
/handle
/html
/htmlmap
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse
# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search
Note that for your additional Disallow statements to be recognized under the User-agent: * group, they cannot
be separated from the declared User-agent: * block by blank lines. A blank line indicates the start of a new user
agent block. Without a leading User-agent declaration on the first line, a block is ignored. Comment lines are
allowed and will not break the user-agent block.
This is OK:
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
This is not OK, as the two lines at the bottom will be completely ignored.
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search
To identify if a specific user agent has access to a particular URL, you can use this handy robots.txt tester.
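The blank-line rule described above can also be checked locally. The sketch below uses urllib.robotparser from the Python standard library, whose parser likewise treats a blank line as the end of a user-agent block. The helper name is_allowed and the agent name "ExampleBot" are assumptions made for illustration, and real crawlers may be more lenient than this parser.

```python
from urllib.robotparser import RobotFileParser

# All four Disallow lines belong to the "User-agent: *" block.
GOOD = """User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /displaystats
Disallow: /advanced-search
"""

# Same rules, but a blank line cuts the last two off from the block.
BROKEN = """User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter

Disallow: /displaystats
Disallow: /advanced-search
"""


def is_allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, text in (("GOOD", GOOD), ("BROKEN", BROKEN)):
        print(name, "/displaystats allowed:", is_allowed(text, "/displaystats"))
```

With this parser, /displaystats is blocked by the first file but allowed by the second, because the orphaned Disallow lines are dropped. The comment line does not break the block in either case.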
If you have heavily customized your metadata fields away from Dublin Core, you can modify the crosswalk that
generates these elements by modifying [dspace]/config/crosswalks/xhtml-head-item.properties.
These meta tags are the "Highwire Press tags" which Google Scholar recommends. If you have heavily
customized your metadata fields, or wish to change the default "mappings" to these Highwire Press tags, they
are configurable in [dspace]/config/crosswalks/google-metadata.properties
Much more information is available in the Configuration section on Google Scholar Metadata Mappings.
While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google
Scholar and other major search engines. This may hurt your site's search engine ranking or even cause your
entire site to be flagged for removal from the search engine.
If you have these URL redirects in place, it is highly recommended to remove them immediately. If you created
these redirects to facilitate capturing download statistics in Google Analytics, you should consider upgrading to
DSpace 5.0 or above, which is able to automatically record bitstream downloads in Google Analytics (see
DS-2088) without the need for any URL redirects.
As of DSpace 1.7, there is a mapping facility to connect metadata fields with these citation fields in HTML. In
order to enable this functionality, the switch needs to be flipped in dspace.cfg:
google-metadata.enable = true
Once the feature is enabled, the mapping is configured by a separate configuration file located here:
[dspace]/config/crosswalks/google-metadata.properties
Please note that the file location changed between DSpace 1.7 and 1.8. It's now in the "crosswalks"
directory, so check that the google-metadata.config configuration property points to the right file:
google-metadata.config = ${dspace.dir}/config/crosswalks/google-metadata.properties
This file contains name/value pairs linking meta-tags with DSpace metadata fields. For example:
google.citation_title = dc.title
google.citation_publisher = dc.publisher
google.citation_authors = dc.author | dc.contributor.author | dc.creator
There is further documentation in this configuration file explaining proper syntax in specifying which metadata
fields to use. If a value is omitted for a meta-tag field, the meta-tag is simply not included in the HTML output.
The values for each item are interpolated when the item is viewed, and the appropriate meta-tags are included
in the HTML head tag, on both the Brief Item Display and the Full Item Display. This is implemented in the
XMLUI and JSPUI.
-p <prune> Prune old results (optionally using specified properties file for
configuration)
There are three aspects of the Checksum Checker's operation that can be configured:
Unless a particular bitstream or handle is specified, the Checksum Checker will always check bitstreams in
order of the least recently checked bitstream. (Note that this means that the most recently ingested bitstreams
will be the last ones checked by the Checksum Checker.)
s Seconds
m Minutes
h Hours
d Days
w Weeks
y Years
The checker will keep starting new bitstream checks for the specific durations, so actual execution
duration will be slightly longer than the specified duration. Bear this in mind when scheduling checks.
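When planning schedules it can be useful to convert these unit suffixes into seconds. The sketch below is not DSpace's own parser; it assumes a value is a whole number followed by a single unit letter from the table above, and it counts a year as 365 days. The function name duration_to_seconds is hypothetical.

```python
import re

# Seconds per unit letter, following the table above.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,  # assumption: a year is counted as 365 days
}


def duration_to_seconds(text):
    """Convert a value such as '2h' or '8w' into seconds (illustrative only)."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text.strip())
    if match is None:
        raise ValueError(f"not a duration like '2h' or '8w': {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    for value in ("2h", "8w", "10y"):
        print(value, "=", duration_to_seconds(value), "seconds")
```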
Specific Bitstream mode: [dspace]/bin/dspace checker -b
Checker will only look at the internal bitstream IDs. Example: [dspace]/bin/dspace checker -b 112 113 4567
Checker will only check bitstream IDs 112, 113 and 4567.
Specific Handle mode: [dspace]/bin/dspace checker -a
Checker will only check bitstreams within the Collection, Community or the item itself. Example:
[dspace]/bin/dspace checker -a 123456/999
Checker will only check this handle. If it is a Collection or Community, it will run through the
entire Collection or Community.
Looping mode: [dspace]/bin/dspace checker -l or [dspace]/bin/dspace checker -L
There are two modes. The lowercase 'el' (-l) checks every bitstream in the repository once.
This is recommended for smaller repositories that are able to loop through all their content in just a few
hours maximum. An uppercase 'L' (-L) loops continuously through the repository. This is not
recommended for most repository systems.
Cron Jobs. For large repositories that cannot be completely checked in a couple of hours, we recommend
the -d option in cron.
Pruning mode: [dspace]/bin/dspace checker -p
The Checksum Checker will store the result of every check in the checksum_history table. By default,
successful checksum matches that are eight weeks old or older will be deleted when the -p option is used.
(Unsuccessful ones will be retained indefinitely.) Without this option, the retention settings are ignored
and the database table may grow rather large!
1. Editing the retention policies in [dspace]/config/dspace.cfg. See Chapter 5 Configuration for the
property keys. OR
2. Pass in a properties file containing retention policies when using the -p option. To do this, create a file
with the following two property keys:
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w
You can use the table above for your time units. At the command line: [dspace]/bin/dspace
checker -p retention_file_name <ENTER>
Checker Reporting
Checksum Checker uses log4j to report its results. By default it will report to a log called
[dspace]/log/checker.log, and it will report only on bitstreams for which the newly calculated checksum does not match
the stored checksum. To report on all bitstreams checked regardless of outcome, use the -v (verbose)
command line option:
[dspace]/bin/dspace checker -l -v (This will loop through the repository once and report in detail
about every bitstream checked.)
To change the location of the log, or to modify the prefix used on each line of output, edit the
[dspace]/config/templates/log4j.properties file and run [dspace]/bin/install_configs.
Unix, Linux, or Mac OS. You can schedule it by adding a cron entry similar to the following to the crontab for
the user who installed DSpace:
0 4 * * 0 [dspace]/bin/dspace checker -d 2h -p
The above cron entry would schedule the checker to run every Sunday at 0400 (4:00 a.m.) for 2
hours. It also specifies to 'prune' the database based on the retention settings in dspace.cfg.
Windows OS. You will be unable to use the checker shell script. Instead, you should use Windows Scheduled
Tasks to schedule the following command to run at the appropriate times:
-d or --Deleted         Send E-mail report for all bitstreams set as deleted for today.
-m or --Missing         Send E-mail report for all bitstreams not found in assetstore for today.
-c or --Changed         Send E-mail report for all bitstreams where checksum has been changed for today.
-n or --Not Processed   Send E-mail report for all bitstreams set to no longer be processed for today.
-h or --help            Help
You can also combine options (e.g. -m -c) for combined reports.
Cron. Follow the same steps as you would to run the checker in cron. Change the time but match the
regularity. Remember to schedule this after Checksum Checker has run. For an example cron setup, see
Scheduled Tasks via Cron.
Please note that, as of DSpace 4.0, the Solr-based Discovery search is on by default in both
JSPUI and XMLUI. This page describes the older Lucene-based search and DBMS browse indices.
Neither the DBMS browse tables nor the Lucene search indices are used anymore (unless you
explicitly disable SolrBrowseDAO and enable search artifacts). This page was previously called
ReIndexing Content with the old legacy providers (DBMS for Browse or Lucene for Search)
Overview
Re-Enabling the legacy Lucene Search and/or DBMS Browse providers
Configure the browse engine to use Oracle
Creating the Browse & Search Indexes
Running the Indexing Programs
Complete Index Regeneration
Updating the Indexes
Destroy and Rebuild Browse Tables
Indexing Customization
Browse Index Customization
Search Index Customization
Configuring Lucene Search Indexes
Customize the advanced search form
5.9.1 Overview
DSpace offers two options to index content for Browsing & Searching:
1. Faceted/Filtered Search & Browse (via Solr & DSpace Discovery) - enabled by default since DSpace 4.0
2. Traditional Browse & Search (via Lucene & Database tables) - this is disabled by default
This particular page only describes the "Traditional Browse & Search" indexing processes. For more information
on Faceted/Filtered Browse & Search, please see DSpace Discovery, in particular Discovery Solr Index
Maintenance.
TO BE COMPLETED
If a DAOs configuration is not provided, the system will use the Solr Browse Engine.
This option enables the browse engine to store its indexes in PostgreSQL database tables. All browsing is then
performed via queries to those database tables. This is the traditional browsing option for users of PostgreSQL.
The configuration is as follows:
browseDAO.class = org.dspace.browse.BrowseDAOPostgres
browseCreateDAO.class = org.dspace.browse.BrowseCreateDAOPostgres
For Oracle, the corresponding configuration is:
browseDAO.class = org.dspace.browse.BrowseDAOOracle
browseCreateDAO.class = org.dspace.browse.BrowseCreateDAOOracle
-r or -rebuild    Should we rebuild all the indexes, which removes old tables and creates new ones.
                  For use with -f. Mutually exclusive with -d.
-s or -start      -s <int> start from this index number and work upwards (mostly only useful for
                  debugging). For use with -t and -f.
-x or -execute    Execute all the remove and create SQL against the database. For use with -t and -f.
-o or -out        -o <filename> write the remove and create SQL to the given file. For use with -t
                  and -f.
-p or -print      Write the remove and create SQL to the stdout. For use with -t and -f.
-t or -tables     Create the tables only, do not attempt to index. Mutually exclusive with -f and -i.
-f or -full       Make the tables, and do the indexing. This forces -x. Mutually exclusive with -t and -i.
-v or -verbose    Print extra information to the stdout. If used in conjunction with -p, you cannot use the
                  stdout to generate your database structure.
-d or -delete     Delete all the indexes, but do not create new ones. For use with -f. This is mutually
                  exclusive with -r.
If you are using the Solr Browse DAOs, which are the default since DSpace 4.0, you do not need to run
this script: the data are stored in the Solr search core, which needs to be recreated using the Discovery
maintenance script.
Because this command actually deletes existing Browse Index tables, you must stop Tomcat (or your
Servlet Container of choice) before executing index-lucene-init. After the indexing command
completes, you can restart Tomcat.
In many Oracle based DSpace installations, index-lucene-init often malfunctions because of Oracle
specific permissions. It is therefore advised to stick to index-lucene-update instead
[dspace]/bin/dspace index-lucene-init
[dspace]/bin/dspace index-lucene-update
If you are using the Solr Browse DAOs, which are the default since DSpace 4.0, you don't need to run this
script as the data are stored in the Solr search core. You need to recreate the indexes using the
Discovery maintenance script.
This is really not recommended unless you know what you are doing.
You can destroy and rebuild the database, but do not do the indexing. Output the SQL to do this to the screen
and a file, as well as executing it against the database, while being verbose.
Add new browse indexes besides the four that are delivered upon installation. Examples:
Series
Specific subject fields (Library of Congress Subject Headings). (It is possible to create a browse
index based on a controlled vocabulary or thesaurus.)
Examples of new browse indexes that are possible. (The system administrator is reminded to read the
section on Browse Index Configuration )
Add a Series Browse. You want to add a new browse using a previously unused metadata element.
webui.browse.index.6 = series:metadata:dc.relation.ispartofseries:text:single
Note: the index number needs to be adjusted to fit the browse stanza in your dspace.cfg file. Also, you
will need to update your Messages.properties file.
Combine more than one metadata field into a browse. You may have other title fields used in your
repository. You may only want one or two of them added, not all title fields. And/or you may want your
series to file in there.
webui.browse.index.3 = title:metadata:dc.title,dc.title.uniform,dc.relation.ispartofseries:title:full
Separate subject browse. You may want to have a separate subject browse limited to only one type of
subject.
webui.browse.index.7 = lcsubject:metadata:dc.subject.lcsh:text:single
As one can see, the choices are limited only by your metadata schema, the metadata, and your imagination.
Because Browse Indexes are stored in database tables, remember to run index-lucene-init after adding
any new definitions in the dspace.cfg to have the indexes created and the data indexed.
Since DSpace 4.0 the Solr DAOs implementation of the browse engine is used by default, so you don't
need to run the script described in this page unless you have re-enabled the legacy DBMS provider.
Instead use the Discovery maintenance script. Browse indexing in Solr is done within the Search
Indexing process.
Please note that, as of DSpace 4.0, the Solr-based Discovery search is on by default in both
JSPUI and XMLUI. If you want to customize the search behavior in a normal DSpace you should refer to
the Discovery documentation.
Property: search.dir
Property: search.max-clauses
Informational Note: Setting higher values of search.max-clauses will enable prefix searches to work on larger
repositories.
Property: search.index.delay
Informational Note: It is possible to create a 'delayed index flusher'. If a web application pushes multiple search
requests (i.e. a barrage of SWORD deposits, or multiple quick edits in the user interface), then
this will combine them into a single index update. You set the property key to the number of
milliseconds to wait for an update. The example value will hold a Lucene update in a queue
for up to 5 seconds. After 5 seconds all waiting updates will be written to the Lucene index.
Property: search.analyzer
Informational Note: Which Lucene Analyzer implementation to use. If this is omitted or commented out, the
standard DSpace analyzer (designed for English) is used by default. This standard DSpace
analyzer removes common stopwords, lowercases all words and performs stemming
(removing common word endings, like "ing", "s", etc).
Property: search.analyzer
Informational Note: Instead of the standard DSpace Analyzer (DSAnalyzer), use an analyzer which doesn't "stem"
words/terms. When using this analyzer, a search for "wellness" will always return items
matching "wellness" and not "well". However, similarly a search for "experiments" will only
return objects matching "experiments" and not "experiment" or "experimenting". When using
this analyzer, you may still use WildCard searches like "experiment*" to match the beginning
of words.
Property: search.analyzer
Informational Note: Instead of the standard English analyzer, the Chinese analyzer is used.
Property: search.operator
Example Value: search.operator = OR
Informational Note: Boolean search operator to use. The currently supported values are OR and AND. If this
configuration item is missing or commented out, OR is used. AND requires all the search
terms to be present. OR requires one or more search terms to be present.
Property: search.maxfieldlength
Informational Note: This is the maximum number of terms indexed for a single field in Lucene. The default is
10,000 words, which is often not enough for full-text indexing. If you change this, you will need to
re-index for the change to take effect on previously added items. -1 = unlimited (Integer.MAX_VALUE)
Property: search.index.n
Informational Note: This property determines which of the metadata fields are being indexed for search. As an
example, if you do not include the title field here, searching for a word in the title will not be
matched with the titles of your items.
For example, the following entries appear in the default DSpace installation:
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = id:dc.identifier.*
search.index.12 = language:dc.language.iso
<search index name> is the identifier for the search field this index will correspond to.
<schema> is the schema used. Dublin Core (DC) is the default. Others are possible.
<index type> can be used to specify how to manipulate the values before indexing.
Example: search.index.12 = language:dc.language.iso:inputform
text - default, no special treatment. Metadata values are passed to Lucene as text.
timestamp - the values are interpreted as dates with second granularity. An additional index
postfixed with .year is created with year granularity.
date - the values are interpreted as dates with day granularity. An additional index postfixed
with .year is created with year granularity.
inputform - in addition to the values stored in the metadata, the displayed forms of these values
as derivable from the input-form (in any of the available languages) are stored.
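To make the layout of a search.index.n entry concrete, the sketch below splits one line into its parts. It is an illustration only and not how DSpace reads its configuration; the function name parse_search_index and the returned keys are assumptions. It treats a missing index type as "text", the documented default.

```python
def parse_search_index(line):
    """Split a 'search.index.<n> = <name>:<schema>.<element>[.<qualifier>][:<type>]' line."""
    key, _, value = line.partition("=")
    parts = value.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected search.index value: {value.strip()!r}")
    index_name, field = parts[0], parts[1]
    # The index type is optional; "text" is the documented default.
    index_type = parts[2] if len(parts) == 3 else "text"
    schema, _, rest = field.partition(".")
    element, _, qualifier = rest.partition(".")
    return {
        "number": int(key.strip().rsplit(".", 1)[1]),
        "index_name": index_name,
        "schema": schema,
        "element": element,
        "qualifier": qualifier or None,  # "*" is the wildcard qualifier
        "index_type": index_type,
    }


if __name__ == "__main__":
    print(parse_search_index("search.index.12 = language:dc.language.iso:inputform"))
    print(parse_search_index("search.index.1 = author:dc.contributor.*"))
```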
In the example above, search.index.1, search.index.2 and search.index.6 are configured as the
author search field. The author index is created by Lucene indexing all dc.contributor.*, dc.creator.*
and dc.description.statementofresponsibility metadata fields.
While the indexes are created, this only affects the search results and has no effect on the search components
of the user interface.
In the above examples, notice the asterisk (*). The metadata field (at least for Dublin Core) is made up of the
"element" and the "qualifier". The asterisk is used as the "wildcard". So, for example, keyword:dc.subject.*
will index all subjects regardless of whether the term resides in a qualified field (subject versus subject.lcsh).
One could customize the search and only index LCSH (Library of Congress Subject Headings) with the entry
keyword:dc.subject.lcsh instead of keyword:dc.subject.*
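The entry format and the wildcard rule described above can be sketched in a few lines of Java. This is only an illustration: the class SearchIndexEntry and its methods are hypothetical and are not the parser DSpace itself uses. It assumes that a wildcard entry also covers the unqualified field, as the subject versus subject.lcsh example suggests.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not part of the DSpace code base.
final class SearchIndexEntry {
    final String indexName;  // e.g. "author"
    final String schema;     // e.g. "dc"
    final String element;    // e.g. "contributor"
    final String qualifier;  // e.g. "lcsh", "*" for the wildcard, or null if absent
    final String type;       // "text" unless a third part such as "inputform" is given

    private SearchIndexEntry(String indexName, String schema, String element,
                             String qualifier, String type) {
        this.indexName = indexName;
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
        this.type = type;
    }

    // Parses a value such as "language:dc.language.iso:inputform".
    static SearchIndexEntry parse(String value) {
        String[] parts = value.trim().split(":");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected name:field[:type] but got: " + value);
        }
        String[] field = parts[1].split("\\.");
        if (field.length < 2 || field.length > 3) {
            throw new IllegalArgumentException("Expected schema.element[.qualifier] but got: " + parts[1]);
        }
        String qualifier = field.length == 3 ? field[2] : null;
        String type = parts.length == 3 ? parts[2] : "text";
        return new SearchIndexEntry(parts[0], field[0], field[1], qualifier, type);
    }

    // True if the given metadata field (qualifier may be null) is covered by this entry.
    boolean covers(String schema, String element, String qualifier) {
        if (!this.schema.equals(schema) || !this.element.equals(element)) {
            return false;
        }
        if ("*".equals(this.qualifier)) {
            return true; // wildcard: any qualifier, or none at all
        }
        return Objects.equals(this.qualifier, qualifier);
    }

    public static void main(String[] args) {
        SearchIndexEntry all = parse("keyword:dc.subject.*");
        SearchIndexEntry lcshOnly = parse("keyword:dc.subject.lcsh");
        System.out.println(all.covers("dc", "subject", "lcsh"));
        System.out.println(lcshOnly.covers("dc", "subject", null));
    }
}
```

With the wildcard entry both dc.subject and dc.subject.lcsh are covered; with the lcsh-only entry the unqualified dc.subject field is not.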
Although DSIndexer automatically builds a separate index for the authority keys of any index that contains
authority-controlled metadata fields, the "Advanced Search" UIs do not allow direct access to it. Perhaps it will
be added in the future. Fortunately, the OpenSearch API lets you submit a query directly to the Lucene search
engine, and this may include the authority-controlled indexes.
The XML UI requires manual coding of the templates involved, whereas the JSP UI provides specific configuration to
set the indexes shown in the advanced search dropdown. The configuration parameters are listed below.
Property: jspui.search.index.display.<n>
Example Value: jspui.search.index.display.1 = ANY
Informational Note: Set the N-value of the index dropdown in the advanced search form. The value must match
one of the defined indexes.
$ bin/dspace version
DSpace version: 4.0-SNAPSHOT
SCM revision: da53991b6b7e9f86c2a7f5292e3c2e9606f9f44c
SCM branch: UNKNOWN
OS: Linux(amd64) version 3.7.10-gentoo
Discovery enabled.
Lucene search enabled.
6 DSpace Reference
Directories and Files
Metadata and Bitstream Format Registries
Architecture
Application Layer
Business Logic Layer
DSpace Services Framework
Storage Layer
History
Changes in 4.x
Changes in 3.x
Changes in 1.8.x
Changes in 1.7.x
Changes in 1.6.x
Changes in 1.5.x
Changes in 1.4.x
Changes in 1.3.x
Changes in 1.2.x
Changes in 1.1.x
DSpace Item State Definitions
6.1.1 Overview
A complete DSpace installation consists of three separate directory trees:
The source directory: This is where (surprise!) the source code lives. Note that the config files here are
used only during the initial install process. After the install, config files should be changed in the install
directory. It is referred to in this document as [dspace-source].
The install directory: This directory is populated during the install process and also by DSpace as it
runs. It contains config files, command-line tools (and the libraries necessary to run them), and usually --
although not necessarily -- the contents of the DSpace archive (depending on how DSpace is
configured). After the initial build and install, changes to config files should be made in this directory. It is
referred to in this document as [dspace].
The web deployment directory: This directory is generated by the web server the first time it finds a
dspace.war file in its webapps directory. It contains the unpacked contents of dspace.war, i.e. the JSPs
and java classes and libraries necessary to run DSpace. Files in this directory should never be edited
directly; if you wish to modify your DSpace installation, you should edit files in the source directory and
then rebuild. The contents of this directory aren't listed here since its creation is completely automatic. It
is usually referred to in this document as [tomcat]/webapps/dspace.
postgres/ - Versions of the database schema and updater SQL scripts for
PostgreSQL.
oracle/ - Versions of the database schema and updater SQL scripts for Oracle.
modules/ - The Web UI modules "overlay" directory. DSpace uses Maven to automatically
look here for any customizations you wish to make to DSpace Web interfaces.
jspui - Contains all customizations for the JSP User Interface.
src/main/resources/ - The overlay for JSPUI Resources. This is the location to
place any custom Messages.properties files. (Previously this file had been
stored at: [dspace-source]/config/language-packs/Messages.properties)
src/main/webapp/ - The overlay for JSPUI Web Application. This is the
location to place any custom JSPs to be used by DSpace.
lni - Contains all customizations for the Lightweight Network Interface.
oai - Contains all customizations for the OAI-PMH Interface.
sword - Contains all customizations for the SWORD (Simple Web-service Offering
Repository Deposit) Interface.
xmlui - Contains all customizations for the XML User Interface (aka Manakin).
src/main/webapp/ - The overlay for XMLUI Web Application. This is the
location to place custom Themes or Configurations.
i18n/ - The location to place a custom version of the XMLUI's
messages.xml (You have to manually create this folder)
themes/ - The location to place custom Themes for the XMLUI (You
have to manually create this folder).
src/ - Maven configurations for DSpace System. This directory contains the Maven and Ant
build files for DSpace.
target/ - (Only exists after building DSpace) This is the location Maven uses to build your
DSpace installation package.
dspace-[version].dir - The location of the DSpace Installation Package (which can
then be installed by running ant update)
[dspace]
assetstore/ - asset store files
bin/ - shell and Perl scripts
config/ - configuration, with sub-directories as above
handle-server/ - Handle server files
history/ - stored history files (generally RDF/XML)
lib/ - JARs, including dspace.jar, containing the DSpace classes
log/ - Log files
reports/ - Reports generated by statistical report generator
search/ - Lucene search index files
upload/ - temporary directory used during file uploads etc.
webapps/ - location where DSpace installs all Web Applications
[dspace]/log/dspace.log.yyyy-mm-dd - Main DSpace log file. This is where the DSpace code writes a simple log of
events and errors that occur within the DSpace code. You can control the verbosity of this by editing the
[dspace-source]/config/templates/log4j.properties file and then running "ant init_configs".
[dspace]/log/cocoon.log.yyyy-mm-dd - Apache Cocoon log file for the XMLUI. This is where the DSpace XMLUI logs
all of its events and errors.
[tomcat]/logs/catalina.out - This is where Tomcat's standard output is written. Many errors that occur within the
Tomcat code are logged here. For example, if Tomcat can't find the DSpace code (dspace.jar), it would be logged
in catalina.out.
[tomcat]/logs/hostname_log.yyyy-mm-dd.txt - If you're running Tomcat stand-alone (without Apache), it logs some
information and errors for specific Web applications to this log file. hostname will be your host name (e.g.
dspace.myu.edu) and yyyy-mm-dd will be the date.
[tomcat]/logs/apache_log.yyyy-mm-dd.txt - If you're using Apache, Tomcat logs information about Web applications
running through Apache (mod_webapp) in this log file (yyyy-mm-dd being the date).
[apache]/error_log - Apache logs to this file. If there is a problem with getting mod_webapp working, this is a
good place to look for clues. Apache also writes to several other log files, though error_log tends to contain
the most useful information for tracking down problems.
[dspace]/log/handle-plug.log - The Handle server runs as a separate process from the DSpace Web UI (which runs
under Tomcat's JVM). Due to a limitation of log4j's 'rolling file appenders', the DSpace code running in the
Handle server's JVM must use a separate log file. The DSpace code that is run as part of a Handle resolution
request writes log information to this file. You can control the verbosity of this by editing
[dspace-source]/config/templates/log4j-handle-plugin.properties.
[dspace]/log/handle-server.log - This is the log file for CNRI's Handle server code. If a problem occurs within
the Handle server code, before DSpace's plug-in is invoked, this is where it may be logged.
[dspace]/handle-server/error.log - On the other hand, a problem with CNRI's Handle server code might be logged
here.
PostgreSQL log - PostgreSQL also writes a log file. This one doesn't seem to have a default location; you
probably had to specify it yourself at some point during installation. In general, this log file rarely contains
pertinent information: PostgreSQL is pretty stable, and you're more likely to encounter problems with connecting
via JDBC, which will be logged in dspace.log.
log4j.properties File
The file [dspace]/config/log4j.properties controls how and where log files are created. There are three sets of
configurations in that file, called A1, A2, and A3. These are used to control the logs for DSpace, the checksum
checker, and the XMLUI respectively. The important settings in this file are:
log4j.rootCategory=INFO,A
log4j.logger.org.dspace=INFO,A1
These lines control what level of logging takes place. Normally they should be set to INFO, but if you need to
see more information in the logs, set them to DEBUG and restart your web server.
log4j.appender.A1=org.dspace.app.util.DailyFileAppender
This is the name of the log file creation method used. The DailyFileAppender creates a new date-stamped file
every day or month.
log4j.appender.A1.File=${log.dir}/dspace.log
This sets the filename and location of where the log file will be stored. It will have a date stamp appended to
the file name.
log4j.appender.A1.DatePattern=yyyy-MM-DD
This defines the format for the date stamp that is appended to the log file names. If you wish to have log files
created monthly instead of daily, change this to yyyy-MM.
log4j.appender.A1.MaxLogs=0
This defines how many log files will be created. You may wish to define a retention period for log files. If you
set this to 365, logs older than a year will be deleted. By default this is set to 0 so that no logs are ever
deleted. Ensure that you monitor the disk space used by the logs to make sure that you have enough space for
them. It is often important to keep the log files for a long time in case you want to rebuild your statistics.
contributor editor
contributor illustrator
contributor other
date available¹ Date or date range item became available to the public.
description
¹
description provenance ¹ The history of custody of the item since its creation, including any
changes successive custodians made to it.
language iso ² Current ISO standard for language of intellectual content, including
country codes (e.g. "en_US").
relation¹ ispartofseries Series name and number within that series, if available.
subject classification Catch-all for value from local classification system. Global
classification systems will receive specific qualifier
subject other Local controlled vocabulary; global vocabularies will receive specific
qualifier.
title alternative² Varying (or substitute) form of title proper appearing in item, e.g.
abbreviation or translation
¹ Used by several functional areas of DSpace. DO NOT REMOVE WITHOUT INVESTIGATING THE
CONSEQUENCES
² This field is included in the default DSpace Submission User Interface. Removing this field from your registry
will break the default DSpace submission form.
The main advantage of the DCTERMS schema is that no field-name detail is lost during harvesting, as
opposed to harvesting of so-called "simple" Dublin Core, where the qualifiers from the above schema are omitted
during harvesting.
accessRights Information about who can access the resource or an indication of its security status.
May include information regarding access or restrictions based on privacy, security, or
other policies.
available Date (often a range) that the resource became or will become available.
coverage The spatial or temporal topic of the resource, the spatial applicability of the resource,
or the jurisdiction under which the resource is relevant.
date A point or period of time associated with an event in the lifecycle of the resource.
hasFormat A related resource that is substantially the same as the pre-existing described
resource, but in another format.
hasPart A related resource that is included either physically or logically in the described
resource.
hasVersion A related resource that is a version, edition, or adaptation of the described resource.
instructionalMethod A process, used to engender knowledge, attitudes and skills, that the described
resource is designed to support.
isFormatOf A related resource that is substantially the same as the described resource, but in
another format.
isPartOf A related resource in which the described resource is physically or logically included.
isReferencedBy A related resource that references, cites, or otherwise points to the described
resource.
isReplacedBy A related resource that supplants, displaces, or supersedes the described resource.
isRequiredBy A related resource that requires the described resource to support its function,
delivery, or coherence.
isVersionOf A related resource of which the described resource is a version, edition, or adaptation.
license A legal document giving official permission to do something with the resource.
mediator An entity that mediates access to the resource and for whom the resource is intended
or useful.
provenance A statement of any changes in ownership and custody of the resource since its
creation that are significant for its authenticity, integrity, and interpretation.
references A related resource that is referenced, cited, or otherwise pointed to by the described
resource.
requires A related resource that is required by the described resource to support its function,
delivery, or coherence.
application/pdf Adobe PDF Adobe Portable Document Format Known false pdf
application/sgml SGML SGML application (RFC 1874) Known false sgm, sgml
audio/x-aiff AIFF Audio Interchange File Format Known false aif, aifc, aiff
image/jpeg JPEG Joint Photographic Experts Group/JPEG File Interchange Format (JFIF) Known false jpeg, jpg
image/tiff TIFF Tag Image File Format Known false tif, tiff
¹ Used by several functional areas of DSpace. DO NOT REMOVE WITHOUT INVESTIGATING THE
CONSEQUENCES
6.3 Architecture
Overview
DSpace System Architecture
6.3.1 Overview
The DSpace system is organized into three layers, each of which consists of a number of components.
The application layer contains components that communicate with the world outside of the individual DSpace
installation, for example the Web user interface and the Open Archives Initiative protocol for metadata
harvesting service.
Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for
example. Each component in the storage and business logic layers has a defined public API. The union of the
APIs of those components is referred to as the Storage API (in the case of the storage layer) and the DSpace
Public API (in the case of the business logic layer). These APIs are in-process Java classes, objects and
methods.
It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic
layer, the system relies on individual applications in the application layer to correctly and securely authenticate
e-people. If a 'hostile' or insecure application were allowed to invoke the Public API directly, it could very easily
perform actions as any e-person in the system.
The reason for this design choice is that authentication methods will vary widely between different applications,
so it makes sense to leave the logic and responsibility for that in these applications.
The source code is organized to adhere very strictly to this three-layer architecture. Also, only methods in a
component's public API are given the public access level. This means that the Java compiler helps ensure that
the source code conforms to the architecture.
The storage and business logic layer APIs are extensively documented with Javadoc-style comments. Generate
the HTML version of these by entering the [dspace-source]/dspace directory and running:
mvn javadoc:javadoc
Storage Layer
RDBMS
Bitstream Store
It also features an administration section, consisting of pages intended for use by central administrators.
Presently, this part of the Web UI is not particularly sophisticated; users of the administration section need to
know what they are doing! Selected parts of this may also be used by collection administrators.
Web UI Files
The Web UI-related files are located in a variety of directories in the DSpace source tree. Note that as of
DSpace version 1.5, the deployment has changed. The build system has moved to a Maven-based system,
allowing the various components (JSPUI, XMLUI, etc.) to be split into separate projects. The system still uses
the familiar 'Ant' to deploy the webapps in later stages.
Location Description
[dspace-source]/dspace/modules/jspui/src/main/resources - This is where you can place your customized version
of the Messages.properties file.
All the DSpace source code is compiled, and/or automatically downloaded from the Maven Central code/libraries
repository.
A full DSpace "installation template" folder is built in [dspace-source]/dspace/target/dspace-
[version]-build.dir/
This DSpace "installation template" folder has a structure identical to the Installed Directory Layout
In order to then install & deploy DSpace from this "installation template" folder, you must run the following from
[dspace-source]/dspace/target/dspace-[version]-build.dir/ :
Please see the Installing DSpace instructions for more details about the Installation process.
All of the processing is done before the JSP is invoked, so any error or problem that occurs does not
occur halfway through HTML rendering
The JSPs contain as little code as possible, so they can be customized without having to delve into Java
code too much
The org.dspace.app.webui.servlet.LoadDSpaceConfig servlet is always loaded first. This is a very simple
servlet that checks the dspace-config context parameter from the DSpace deployment descriptor, and
uses it to locate dspace.cfg. It also loads up the Log4j configuration. It's important that this servlet is
loaded first, since if another servlet is loaded up, it will cause the system to try and load DSpace and
Log4j configurations, neither of which would be found.
All DSpace servlets are subclasses of the DSpaceServlet class. The DSpaceServlet class handles some basic
operations such as creating a DSpace Context object (opening a database connection etc.), authentication and
error handling. Instead of overriding the doGet and doPost methods as one normally would for a servlet,
DSpace servlets implement doDSGet or doDSPost which have an extra context parameter, and allow the
servlet to throw various exceptions that can be handled in a standard way.
The DSpace servlet processes the contents of the HTTP request. This might involve retrieving the results of a
search with a query term, accessing the current user's eperson record, or updating a submission in progress.
According to the results of this processing, the servlet must decide which JSP should be displayed. The servlet
then fills out the appropriate attributes in the HttpRequest object that represents the HTTP request being
processed. This is done by invoking the setAttribute method of the javax.servlet.http.HttpServletRequest object
that is passed into the servlet from Tomcat. The servlet then forwards control of the request to the appropriate
JSP using the JSPManager.showJSP method.
The JSPManager.showJSP method uses the standard Java servlet forwarding mechanism to forward the HTTP
request to the JSP. The JSP is processed by Tomcat and the results sent back to the user's
browser.
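The division of labour described above (a base class that sets up the context and handles errors, a subclass that does all processing, fills in request attributes and only then forwards to a JSP) can be sketched without the servlet API. Everything in this sketch is a stand-in written for illustration: ServletFlowSketch, its nested classes and the JSP paths are hypothetical, and the real DSpaceServlet and JSPManager signatures differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustration only: stand-ins, NOT the real DSpace or javax.servlet classes.
final class ServletFlowSketch {

    // Stands in for the DSpace Context (which would hold a database connection).
    static final class Context implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Stands in for the HTTP request: attributes plus the JSP that was chosen.
    static final class Request {
        final Map<String, Object> attributes = new HashMap<>();
        String forwardedTo;
    }

    // Plays the role the text assigns to DSpaceServlet.
    abstract static class BaseServlet {
        final void handleGet(Request request) {
            try (Context context = new Context()) {
                doDSGet(context, request);
            } catch (Exception e) {
                // Errors are handled in one standard place (hypothetical error page).
                request.forwardedTo = "/error/internal.jsp";
            }
        }

        // Subclasses implement this instead of doGet and receive the extra context.
        protected abstract void doDSGet(Context context, Request request) throws Exception;

        // Plays the role of JSPManager.showJSP: hand the request over to a JSP.
        protected static void showJSP(Request request, String jsp) {
            request.forwardedTo = jsp;
        }
    }

    // A concrete servlet: process first, set attributes, then forward.
    static final class SearchServlet extends BaseServlet {
        @Override
        protected void doDSGet(Context context, Request request) {
            request.attributes.put("query", "dspace");
            request.attributes.put("results", Arrays.asList("item-1", "item-2"));
            showJSP(request, "/search/results.jsp");
        }
    }

    public static void main(String[] args) {
        Request request = new Request();
        new SearchServlet().handleGet(request);
        System.out.println(request.forwardedTo + " " + request.attributes.keySet());
    }
}
```

Because the attributes are set before the forward, a failure during processing leads to the error page rather than to a half-rendered JSP.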
There is an exception to this servlet/JSP style: index.jsp, the 'home page', receives the HTTP request directly
from Tomcat without a servlet being invoked first. This is because in the servlet 2.3 specification, there is no
way to map a servlet to handle only requests made to '/'; such a mapping results in every request being directed
to that servlet. By default, Tomcat forwards requests to '/' to index.jsp. To try and make things as clean as
possible, index.jsp contains some simple code that would normally go in a servlet, and then forwards to home.
jsp using the JSPManager.showJSP method. This means localized versions of the 'home page' can be created
by placing a customized home.jsp in [dspace-source]/jsp/local, in the same manner as other JSPs.
At the top of each JSP file, right after the license and copyright header, the attributes that a servlet must
fill out prior to forwarding to that JSP are documented. No validation is performed; if the servlet does not fill
out the necessary attributes, it is likely that an internal server error will occur.
Many JSPs containing forms will include hidden parameters that tell the servlets which form has been filled out.
The submission UI servlet (SubmissionController) is a prime example of a servlet that deals with the input from
many different JSPs. The step and page hidden parameters (written out by the
SubmissionController.getSubmissionParameters() method) are used to inform the servlet which page of which step
has just been filled out (i.e. which page of the submission the user has just completed).
Below is a detailed, scary diagram depicting the flow of control during the whole process of processing and
responding to an HTTP request. The authentication mechanism is described in more detail in
the configuration section.
layout: Just about every JSP uses this tag. It produces the standard HTML header and <BODY> tag.
Thus the content of each JSP is nested inside a <dspace:layout> tag. The (XML-style) attributes of this
tag are slightly complicated--see dspace-tags.tld. The JSPs in the source code bundle also provide
plenty of examples.
sidebar: Can only be used inside a layout tag, and can only be used once per JSP. The content between
the start and end sidebar tags is rendered in a column on the right-hand side of the HTML page. The
contents can contain further JSP tags and Java 'scriptlets'.
date: Displays the date represented by an org.dspace.content.DCDate object. Just the one
representation of date is rendered currently, but this could use the user's browser preferences to display
a localized date in the future.
include: Obsolete, simple tag, similar to jsp:include. In versions prior to DSpace 1.2, this tag would use
the locally modified version of a JSP if one was installed in jsp/local. As of 1.2, the build process now
performs this function, however this tag is left in for backwards compatibility.
item: Displays an item record, including Dublin Core metadata and links to the bitstreams within it. Note
that the displaying of the bitstream links is simplistic, and does not take into account any of the bundling
structure. This is because DSpace does not have a fully-fledged dissemination architectural piece yet.
Displaying an item record is done by a tag rather than a JSP for two reasons: Firstly, it happens in
several places (when verifying an item record during submission or workflow review, as well as during
standard item accesses), and secondly, displaying the item turns out to be mostly code-work rather than
HTML anyway. Of course, the disadvantage of doing it this way is that it is slightly harder to customize
exactly what is displayed from an item record; it is necessary to edit the tag code
(org.dspace.app.webui.jsptag.ItemTag). Hopefully a better solution can be found in the future.
itemlist, collectionlist, communitylist: These tags display ordered sequences of items, collections and
communities, showing minimal information but including a link to the page containing full details. These
need to be used in HTML tables.
popup: This tag is used to render a link to a pop-up page (typically a help page.) If Javascript is
available, the link will either open or pop to the front any existing DSpace pop-up window. If Javascript is
not available, a standard HTML link is displayed that renders the link destination in a window named
'dspace.popup'. In graphical browsers, this usually opens a new window or re-uses an existing window of
that name, but if a window is re-used it is not 'raised' which might confuse the user. In text browsers,
following this link will simply replace the current page with the destination of the link. This obviously
means that Javascript offers the best functionality, but other browsers are still supported.
selecteperson: A tag which produces a widget analogous to HTML <SELECT>, that allows a user to
select one or multiple e-people from a pop-up list.
sfxlink: Using an item's Dublin Core metadata DSpace can display an SFX link, if an SFX server is
available. This tag does so for a particular item if the sfx.server.url property is defined in dspace.cfg.
XMLUI Internationalization
For information about XMLUI Internationalization please see: XMLUI Multilingual Support.
The Java Standard Tag Library v1.0 is used to specify messages in the JSPs like this:
OLD:
<H1>Search Results</H1>
NEW:
<H1><fmt:message key="jsp.search.results.title"/></H1>
This message can now be changed using the config/language-packs/Messages.properties file. (This must be
done at build-time: Messages.properties is placed in the dspace.war Web application file.)
Phrases may have parameters to be passed in, to make the job of translating easier, reduce the number of
'keys' and to allow translators to make the translated text flow more appropriately for the target language.
OLD:
NEW:
<fmt:message key="jsp.search.results.text">
<fmt:param><%= r.getFirst() %></fmt:param>
<fmt:param><%= r.getLast() %></fmt:param>
<fmt:param><%= r.getTotal() %></fmt:param>
</fmt:message>
(Note: JSTL 1.0 does not seem to allow JSP <%= %> expressions to be passed in as attribute values in
<fmt:param value=""/>)
Introducing number parameters that should be formatted according to the locale used makes no difference in
the message key compared to string parameters:
In the JSP, this key can be used as shown below:
<fmt:message key="jsp.submit.show-uploaded-file.size-in-bytes">
<fmt:param><fmt:formatNumber><%= bitstream.getSize()%></fmt:formatNumber></fmt:param>
</fmt:message>
(Note: JSTL offers a way to include numbers in the message keys as jsp.foo.key = {0,number} bytes. Setting
the parameter as <fmt:param value="${variable}" /> works when variable is a single variable name and doesn't
work when trying to use a method's return value instead: bitstream.getSize(). Passing the number as string (or
using the <%= %> expression) also does not work.)
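The {0}-style placeholders used in these message keys follow the java.text.MessageFormat pattern syntax. The following sketch shows the substitution outside of a JSP. The two patterns are hypothetical stand-ins for entries in Messages.properties, and the class name MessageParamsSketch is invented for the example.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Illustration only: the patterns below are assumed, not copied from Messages.properties.
final class MessageParamsSketch {
    // Three positional parameters, as in the search results example.
    static final String RESULTS_PATTERN = "Results {0}-{1} of {2}";
    // A parameter explicitly declared as a number, formatted according to the locale.
    static final String SIZE_PATTERN = "{0,number} bytes";

    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(format(RESULTS_PATTERN, Locale.US, 1, 10, 42));
        System.out.println(format(SIZE_PATTERN, Locale.US, 1234567L));
        System.out.println(format(SIZE_PATTERN, Locale.GERMANY, 1234567L));
    }
}
```

Keeping whole phrases in one pattern, rather than concatenating fragments, is what lets a translator reorder the parameters for the target language.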
Multiple Messages.properties can be created for different languages. See ResourceBundle.getBundle. e.g. you
can add German and Canadian French translations:
Messages_de.properties
Messages_fr_CA.properties
The end user's browser settings determine which language is used. The English language file
Messages.properties (or the default server locale) will be used as a default if there's no language bundle for
the end user's preferred language. (Note that the English file is not called Messages_en.properties; this is so
it is always available as a default, regardless of server configuration.)
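The order in which bundle files are considered for a requested locale can be inspected with the standard ResourceBundle.Control class. The sketch below only lists candidate file names for the base name "Messages"; it does not load any files, and the class name BundleLookupSketch is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Illustration only: lists the .properties files ResourceBundle would look for, most specific first.
final class BundleLookupSketch {
    static List<String> candidateFiles(String baseName, Locale requested) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, requested)) {
            // The root locale maps to the bare base name, i.e. Messages.properties.
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(candidateFiles("Messages", Locale.CANADA_FRENCH));
        System.out.println(candidateFiles("Messages", Locale.GERMAN));
    }
}
```

For Canadian French the candidates are Messages_fr_CA.properties, Messages_fr.properties and finally Messages.properties. When only the base file matches, ResourceBundle.getBundle first tries the candidates for the server's default locale before settling on the base file, which is the "default server locale" case mentioned above.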
The dspace:layout tag has been updated to allow dictionary keys to be passed in for the titles. It now has two
new parameters: titlekey and parenttitlekey. So where before you'd do:
<dspace:layout title="Here"
parentlink="/mydspace"
parenttitle="My DSpace">
You can now do:
<dspace:layout titlekey="jsp.page.title"
parentlink="/mydspace"
parenttitlekey="jsp.mydspace">
And so the layout tag itself gets the relevant stuff out of the dictionary. title and parenttitle still work as before for
backwards compatibility, and the odd spot where that's preferable.
For text in JSPs use the complete path + filename of the JSP, then a one-word name for the message. e.g. for
the title of jsp/mydspace/main.jsp use:
jsp.mydspace.main.title
Some common words (e.g. "Help") can be brought out into keys starting jsp. for ease of translation, e.g.:
jsp.admin = Administer
Other common words/phrases are brought out into 'general' parameters if they relate to a set (directory) of
JSPs, e.g.
jsp.tools.general.delete = Delete
Phrases that relate strongly to a topic (e.g. MyDSpace) but are used in many JSPs outside the particular directory
are more conveniently cross-referenced. For example, one could use the key below in jsp/submit/saved.jsp
to provide a link back to the user's MyDSpace:
jsp.mydspace.general.goto-mydspace = Go to My DSpace
(Cross-referencing of keys in general is not a good idea as it may make maintenance more difficult. But in
some cases it has more advantages as the meaning is obvious.)
For text in servlet code, in custom JSP tags or wherever applicable use the fully qualified classname + a one-
word name for the message. e.g.
org.dspace.app.webui.jsptag.ItemListTag.title = Title
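The two naming rules above (JSP path plus a one-word name, and fully qualified class name plus a one-word name) are mechanical enough to express as code. The helper below is hypothetical and exists only to make the convention concrete; DSpace does not generate keys this way.

```java
// Illustration only: a hypothetical helper that spells out the key naming convention.
final class MessageKeySketch {
    // "jsp/mydspace/main.jsp" + "title" -> "jsp.mydspace.main.title"
    static String keyForJsp(String jspPath, String name) {
        String path = jspPath.endsWith(".jsp")
                ? jspPath.substring(0, jspPath.length() - ".jsp".length())
                : jspPath;
        return path.replace('/', '.') + "." + name;
    }

    // Fully qualified class name + "." + one-word name.
    static String keyForClass(String qualifiedClassName, String name) {
        return qualifiedClassName + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(keyForJsp("jsp/mydspace/main.jsp", "title"));
        System.out.println(keyForClass("org.dspace.app.webui.jsptag.ItemListTag", "title"));
    }
}
```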
contents.html
chapter1.html
chapter2.html
chapter3.html
figure1.gif
figure2.jpg
figure3.gif
figure4.jpg
figure5.gif
figure6.gif
The Bundle's primary bitstream field would point to the contents.html Bitstream, which we know is HTML (check
the format MIME type) and so we know which to serve up first.
The HTML servlet employs a trick to serve up HTML documents without actually modifying the HTML or other
files themselves. Say someone is looking at contents.html from the above example, the URL in their browser
will look like this:
https://dspace.mit.edu/html/1721.1/12345/contents.html
If there's an image called figure1.gif in that HTML page, the browser will do HTTP GET on this URL:
https://dspace.mit.edu/html/1721.1/12345/figure1.gif
The HTML document servlet can work out which item the user is looking at, and then which Bitstream in it is
called figure1.gif, and serve up that bitstream. Similar for following links to other HTML pages. Of course all the
links and image references have to be relative and not absolute.
HTML documents must be "self-contained", as explained here. Provided that full path information is known by
DSpace, any depth or complexity of HTML document can be served subject to those constraints. This is usually
possible with some kind of batch import. If, however, the document has been uploaded one file at a time using
the Web UI, the path information has been stripped. The system can cope with relative links that refer to a
deeper path, e.g.
<IMG SRC="images/figure1.gif">
If the item has been uploaded via the Web submit UI, in the Bitstream table in the database we have the 'name'
field, which will contain the filename with no path (figure1.gif). We can still work out what images/figure1.gif is by
making the HTML document servlet strip any path that comes in from the URL, e.g.
https://dspace.mit.edu/html/1721.1/12345/images/figure1.gif
^^^^^^^
Strip this
BUT all the filenames (regardless of directory names) must be unique. For example, this wouldn't work:
contents.html
chapter1.html
chapter2.html
chapter1_images/figure.gif
chapter2_images/figure.gif
since the HTML document servlet wouldn't know which bitstream to serve up for:
https://dspace.mit.edu/html/1721.1/12345/chapter1_images/figure.gif
https://dspace.mit.edu/html/1721.1/12345/chapter2_images/figure.gif
To prevent "infinite URL spaces" appearing (e.g. if a file foo.html linked to bar/foo.html, which would link to
bar/bar/foo.html...) this behavior can be configured by setting the configuration property
webui.html.max-depth-guess.
For example, if we receive a request for foo/bar/index.html, and we have a bitstream called just index.html, we
will serve up that bitstream for the request if webui.html.max-depth-guess is 2 or greater. If
webui.html.max-depth-guess is 1 or less, we would not serve that bitstream, as the depth of the file is greater.
If webui.html.max-depth-guess is zero, the request filename and path must always exactly match the bitstream
name. The default value (if that property is not present in dspace.cfg) is 3.
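The depth rule can be read as "strip at most max-depth-guess leading directories from the requested path until a bitstream name matches". The sketch below implements that reading. It is not the HTML servlet's actual code: DepthGuessSketch and resolve are hypothetical names, and the list of names stands in for the item's bitstreams.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: one reading of the max-depth-guess rule, not the servlet's own code.
final class DepthGuessSketch {
    // Returns the bitstream name to serve, or null if nothing matches within maxDepth.
    static String resolve(String requestPath, List<String> bitstreamNames, int maxDepth) {
        String candidate = requestPath;
        for (int stripped = 0; ; stripped++) {
            if (bitstreamNames.contains(candidate)) {
                return candidate;               // exact match at the current depth
            }
            int slash = candidate.indexOf('/');
            if (slash < 0 || stripped >= maxDepth) {
                return null;                    // nothing left to strip, or limit reached
            }
            candidate = candidate.substring(slash + 1); // drop one leading directory
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("contents.html", "index.html", "figure1.gif");
        System.out.println(resolve("foo/bar/index.html", names, 2));
        System.out.println(resolve("foo/bar/index.html", names, 1));
        System.out.println(resolve("images/figure1.gif", names, 3));
    }
}
```

With a limit of 2 the request for foo/bar/index.html resolves to index.html; with a limit of 1 it does not. Bounding the number of stripped directories is what keeps a chain such as bar/bar/bar/foo.html from resolving indefinitely.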
Thesis Blocking
The submission UI has an optional feature that came about as a result of MIT Libraries policy. If the block.theses
parameter in dspace.cfg is true, an extra checkbox is included in the first page of the submission UI. This asks
the user if the submission is a thesis. If the user checks this box, the submission is halted (deleted) and an error
message displayed, explaining that DSpace should not be used to submit theses. This feature can be turned off
and on, and the message displayed (/dspace/jsp/submit/no-theses.jsp) can be localized as necessary.
Older Versions
Prior to Release 1.6, there were various scripts written that masked a more manual approach to running CLI
programs. The user had to issue [dspace]/bin/dsrun followed by the Java class that ran that program. With release
1.5, scripts were written to mask the [dspace]/bin/dsrun command. We have left the Java class in the System
Administration section since it has value for debugging purposes and for those who wish to learn about DSpace
programming or wish to customize the code at any time.
[dspace]/bin/dsrun org.dspace.browse.IndexBrowse -f -r
[dspace]/bin/dsrun org.dspace.browse.ItemCounter
[dspace]/bin/dsrun org.dspace.search.DSIndexer
In release 1.5 a script was written and in release 1.6 the command [dspace]/bin/dspace index-init
replaces the script. The stanza from launcher.xml shows us how one can build more commands if needed:
<command>
<name>index-update</name>
<description>Update the search and browse indexes</description>
<step passuserargs="false">
<class>org.dspace.browse.IndexBrowse</class>
<argument>-i</argument>
</step>
<step passuserargs="false">
<class>org.dspace.browse.ItemCounter</class>
</step>
<step passuserargs="false">
<class>org.dspace.search.DSIndexer</class>
</step>
</command>
Other Classes
Modifications
What's In Memory?
Dublin Core Metadata
Support for Other Metadata Schemas
Packager Plugins
Plugin Manager
Concepts
Using the Plugin Manager
Types of Plugin
Self-Named Plugins
Obtaining a Plugin Instance
Lifecycle Management
Getting Meta-Information
Implementation
PluginManager Class
SelfNamedPlugin Class
Errors and Exceptions
Configuring Plugins
Configuring Singleton (Single) Plugins
Configuring Sequence of Plugins
Configuring Named Plugins
Configuring the Reusable Status of a Plugin
Validating the Configuration
Use Cases
Managing the MediaFilter plugins transparently
A Singleton Plugin
Plugin that Names Itself
Stackable Authentication
Workflow System
Administration Toolkit
E-person/Group Manager
Authorization
Special Groups
Miscellaneous Authorization Notes
Handle Manager/Handle Plugin
Search
Current Lucene Implementation
Indexed Fields
Harvesting API
Browse API
Using the API
Index Maintenance
Caveats
Checksum checker
OpenSearch Support
Embargo Support
What is an Embargo?
Embargo Model and Life-Cycle
Core Classes
The org.dspace.core package provides some basic classes that are used throughout the DSpace code.
The system is configured by editing the relevant files in [dspace]/config, as described in the configuration
section.
When editing configuration files for applications that DSpace uses, such as Apache Tomcat, you may
want to edit the copy in [dspace-source] and then run ant update or ant overwrite_configs
rather than editing the 'live' version directly! This will ensure you have a backup copy of your modified
configuration files, so that they are not accidentally overwritten in the future.
Constants
This class contains constants that are used to represent types of object and actions in the database. For
example, authorization policies can relate to objects of different types, so the resourcepolicy table has columns
resource_id, which is the internal ID of the object, and resource_type_id, which indicates whether the object is
an item, collection, bitstream etc. The value of resource_type_id is taken from the Constants class, for example
Constants.ITEM.
Here are some of the most commonly used constants you might come across:
DSpace types
Bitstream: 0
Bundle: 1
Item: 2
Collection: 3
Community: 4
Site: 5
Group: 6
Eperson: 7
DSpace actions
Read: 0
Write: 1
Delete: 2
Add: 3
Remove: 4
Context
The Context class is central to the DSpace operation. Any code that wishes to use any API in the business
logic layer must first create a Context object. This is akin to opening a connection to a database (which is
in fact one of the things that happens).
A context object is involved in most method calls and object constructors, so that the method or object has
access to information about the current operation. When the context object is constructed, the following
information is automatically initialized:
A connection to the database. This is a transaction-safe connection, i.e. the 'auto-commit' flag is set to
false.
A cache of content management API objects. Each time a content object is created (for example Item or
Bitstream) it is stored in the Context object. If the object is then requested again, the cached copy is
used. Apart from reducing database use, this addresses the problem of having two copies of the same
object in memory in different states.
The following information is also held in a context object, though it is the responsibility of the application
creating the context object to fill it out correctly:
Typical use of the context object will involve constructing one, and setting the current user if one is
authenticated. Several operations may be performed using the context object. If all goes well, complete
is called to commit the changes and free up any resources used by the context. If anything has gone
wrong, abort is called to roll back any changes and free up the resources.
You should always abort a context if any error happens during its lifespan; otherwise the data in the system
may be left in an inconsistent state. You can also commit a context, which means that any changes are written
to the database, and the context is kept active for further use.
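The complete/abort discipline described above can be sketched with a toy stand-in. ToyContext below is a hypothetical model written for this illustration, not org.dspace.core.Context; it only mimics the behavior the text describes (commit keeps the context usable, complete commits and releases, abort discards and releases).

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a context object; NOT the real org.dspace.core.Context.
final class ToyContext {
    private final List<String> store;                         // plays the role of the database
    private final List<String> pending = new ArrayList<>();   // uncommitted changes
    private boolean valid = true;

    ToyContext(List<String> store) { this.store = store; }

    void write(String change) { requireValid(); pending.add(change); }

    /** Writes pending changes and keeps the context active for further use. */
    void commit() { requireValid(); store.addAll(pending); pending.clear(); }

    /** Writes pending changes and frees the context. */
    void complete() { commit(); valid = false; }

    /** Rolls back pending changes and frees the context. */
    void abort() { pending.clear(); valid = false; }

    boolean isValid() { return valid; }

    private void requireValid() {
        if (!valid) throw new IllegalStateException("context is no longer valid");
    }
}

final class ContextUsageSketch {
    /** Performs one change; aborts the context if anything goes wrong. */
    static void run(List<String> store, boolean fail) {
        ToyContext context = new ToyContext(store);
        try {
            context.write("rename bitstream");
            if (fail) throw new IllegalStateException("simulated failure");
            context.complete();
        } finally {
            if (context.isValid()) context.abort(); // only reached with a live context on error
        }
    }
}
```

The try/finally shape is the point of the sketch: whatever happens inside the block, the context ends up either completed or aborted, never left open.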
Email
Sending e-mails is pretty easy. Just use the configuration manager's getEmail method, set the arguments and
recipients, and send.
The e-mail texts are stored in [dspace]/config/emails. They are processed by the standard
java.text.MessageFormat. At the top of each e-mail are listed the appropriate arguments that should be filled out
by the sender. Example usage is shown in the org.dspace.core.Email Javadoc API documentation.
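As a rough illustration of how java.text.MessageFormat fills numbered arguments into a template, consider the sketch below. The template text and the class name are made up for this example; the real templates are the files in [dspace]/config/emails.

```java
import java.text.MessageFormat;

// Hypothetical template and helper; only MessageFormat itself is standard Java.
final class EmailTemplateSketch {
    // {0} and {1} are positional arguments supplied by the sender.
    static final String TEMPLATE = "A new item, {0}, was submitted to {1}.";

    static String fill(Object... arguments) {
        return MessageFormat.format(TEMPLATE, arguments);
    }
}
```

Calling EmailTemplateSketch.fill("My Report", "Technical Reports") yields "A new item, My Report, was submitted to Technical Reports." Note that MessageFormat treats single quotes as escape characters, so a literal apostrophe in a template has to be doubled.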
LogManager
The log manager consists of a method that creates a standard log header, and returns it as a string suitable for
logging. Note that this class does not actually write anything to the logs; the log header returned should be
logged directly by the sender using an appropriate Log4J call, so that information about where the logging is
taking place is also stored.
The level of logging can be configured on a per-package or per-class basis by editing
[dspace]/config/log4j.properties. You will need to stop and restart Tomcat for the changes to take effect.
Action view_item
The above format allows the logs to be easily parsed and analyzed. The [dspace]/bin/log-reporter
script is a simple tool for analyzing logs. Try:
[dspace]/bin/log-reporter --help
It's a good idea to 'nice' this log reporter to avoid an impact on server performance.
Utils
Utils contains miscellaneous utility methods that are required in a variety of places throughout the code, and
thus have no particular 'home' in a subsystem.
Classes corresponding to the main elements in the DSpace data model (Community, Collection, Item, Bundle
and Bitstream) are sub-classes of the abstract class DSpaceObject. The Item object handles the Dublin Core
metadata record.
Each class generally has one or more static find methods, which are used to instantiate content objects.
Constructors do not have public access and are just used internally. The reasons for this are:
"Constructing" an object may be misconstrued as the action of creating an object in the DSpace system,
for example one might expect something like:
to construct a brand new item in the system, rather than simply instantiating an in-memory instance of an
object in the system.
find methods may often be called with invalid IDs, and return null in such a case. A constructor would
have to throw an exception in this case. A null return value from a static method can in general be dealt
with more simply in code.
If an instantiation representing the same underlying archival entity already exists, the find method can
simply return that same instantiation to avoid multiple copies and any inconsistencies which might result.
Collection, Bundle and Bitstream do not have create methods; rather, one has to create an object using the
relevant method on the container. For example, to create a collection, one must invoke createCollection on the
community that the collection is to appear in:
The primary reason for this is for determining authorization. In order to know whether an e-person may create
an object, the system must know which container the object is to be added to. It makes no sense to create a
collection outside of a community, and the authorization system does not have a policy for that.
In the previous chapter there is an overview of the item ingest process which should clarify the previous
paragraph. Also see the section on the workflow system.
Community and BitstreamFormat do have static create methods; one must be a site administrator to have
authorization to invoke these.
Other Classes
Classes whose names begin with DC are for manipulating Dublin Core metadata, as explained below.
The FormatIdentifier class attempts to guess the bitstream format of a particular bitstream. Presently, it does
this simply by looking at any file extension in the bitstream name and matching it up with the file extensions
associated with bitstream formats. Hopefully this can be greatly improved in the future!
The ItemIterator class allows items to be retrieved from storage one at a time, and is returned by methods that
may return a large number of items, more than would be desirable to have in memory at once.
The ItemComparator class is an implementation of the standard java.util.Comparator that can be used to
compare and order items based on a particular Dublin Core metadata field.
Modifications
When creating, modifying or for whatever reason removing data with the content management API, it is
important to know when changes happen in-memory, and when they occur in the physical DSpace storage.
Primarily, one should note that no change made using a particular org.dspace.core.Context object will actually
be made in the underlying storage unless complete or commit is invoked on that Context. If anything should go
wrong during an operation, the context should always be aborted by invoking abort, to ensure that no
inconsistent state is written to the storage.
Additionally, some changes made to objects only happen in-memory. In these cases, invoking the update
method lines up the in-memory changes to occur in storage when the Context is committed or completed. In
general, methods that change any metadata field only make the change in-memory; methods that involve
relationships with other objects in the system line up the changes to be committed with the context. See
individual methods in the API Javadoc.
The new name will not be stored since update was not invoked:
Context context = new Context();
Bitstream b = Bitstream.find(context, 1234);
b.setName("newfile.txt");
context.complete();
The bitstream will be included in the bundle, since update doesn't need to be called:
Context context = new Context();
Bitstream bs = Bitstream.find(context, 1234);
Bundle bnd = Bundle.find(context, 5678);
bnd.add(bs);
context.complete();
What's In Memory?
Instantiating some content objects also causes other content objects to be loaded into memory.
Instantiating a Bitstream object causes the appropriate BitstreamFormat object to be instantiated. Of course the
Bitstream object does not load the underlying bits from the bitstream store into memory!
Instantiating a Bundle object causes the appropriate Bitstream objects (and hence BitstreamFormats) to be
instantiated.
Instantiating an Item object causes the appropriate Bundle objects (etc.) and hence BitstreamFormats to be
instantiated. All the Dublin Core metadata associated with that item are also loaded into memory.
The reasoning behind this is that for the vast majority of cases, anyone instantiating an item object is going to
need information about the bundles and bitstreams within it, and this methodology allows that to be done in the
most efficient way and is simple for the caller. For example, in the Web UI, the servlet (controller) needs to pass
information about an item to the viewer (JSP), which needs to have all the information in-memory to display the
item without further accesses to the database which may cause errors mid-display.
You do not need to worry about multiple in-memory instantiations of the same object, or any inconsistencies
that may result; the Context object keeps a cache of the instantiated objects. The find methods of classes in
org.dspace.content will use a cached object if one exists.
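The interplay between static find methods and the per-context cache can be modeled with a small toy. The classes below are hypothetical stand-ins written for this illustration, not the org.dspace.content classes; they show the two behaviors described in this section: an invalid ID yields null, and repeated lookups through the same context return the same in-memory instance.

```java
import java.util.HashMap;
import java.util.Map;

// Toy context holding a fake storage layer and an object cache; NOT DSpace code.
final class ToyCacheContext {
    final Map<Integer, String> storage;                        // id -> stored name
    final Map<Integer, ToyBitstream> cache = new HashMap<>();  // id -> in-memory object

    ToyCacheContext(Map<Integer, String> storage) { this.storage = storage; }
}

final class ToyBitstream {
    final int id;
    String name;

    // Not public: callers are expected to go through find().
    private ToyBitstream(int id, String name) { this.id = id; this.name = name; }

    /** Returns the cached instance if present, loads it otherwise, or null for an unknown ID. */
    static ToyBitstream find(ToyCacheContext context, int id) {
        ToyBitstream cached = context.cache.get(id);
        if (cached != null) {
            return cached;
        }
        String storedName = context.storage.get(id);
        if (storedName == null) {
            return null; // invalid ID: no exception, just null
        }
        ToyBitstream loaded = new ToyBitstream(id, storedName);
        context.cache.put(id, loaded);
        return loaded;
    }
}
```

Because both lookups return the same object, a change made through one reference is visible through the other, which is the inconsistency problem the cache is meant to avoid.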
It may be that in enough cases this automatic instantiation of contained objects reduces performance in
situations where it is important; if this proves to be true the API may be changed in the future to include a
loadContents method or somesuch, or perhaps a Boolean parameter indicating what to do will be added to the
find methods.
When a Context object is completed, aborted or garbage-collected, any objects instantiated using that context
are invalidated and should not be used (in much the same way an AWT button is invalid if the window
containing it is destroyed).
classes assume that the values will be in a certain syntax, which will be true for all data generated within the
DSpace system, but since Dublin Core does not always define strict syntax, this may not be true for Dublin Core
originating outside DSpace.
Below is the specific syntax that DSpace expects various fields to adhere to:
Element: date. Qualifier: any or unqualified. Helper class: DCDate.
Syntax: ISO 8601 in the UTC time zone, with either year, month, day, or second precision.
Examples: "2000", "2002-10", "2002-08-14", "1999-01-01T14:35:23Z"
Element: contributor. Qualifier: any or unqualified. Helper class: DCPersonName.
Syntax: In general last name, then a comma, then first names, then any additional information like "Jr.". If the
contributor is an organization, then simply the name.
Examples: "Doe, John", "Smith, John Jr.", "van Dyke, Dick", "Massachusetts Institute of Technology"
Element: language. Qualifier: iso. Helper class: DCLanguage.
Syntax: A two letter code taken from ISO 639, followed optionally by a two letter country code taken from ISO 3166.
Examples: "en", "fr", "en_US"
Element: relation. Qualifier: ispartofseries. Helper class: DCSeriesNumber.
Syntax: The series name, followed by a semicolon, followed by the number in that series. Alternatively, just free
text.
Examples: "MIT-TR; 1234", "My Report Series; ABC-1234", "NS1234"
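To make the four date precisions concrete, the sketch below classifies a value using the standard java.time parsers. It is an illustration only and does not use or reproduce the DCDate class; the class and method names are hypothetical, and java.time is somewhat more lenient than the syntax above (Instant.parse also accepts fractional seconds, for instance).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeParseException;

// Hypothetical helper; classifies an ISO 8601 value by the precision it carries.
final class DatePrecisionSketch {
    static String precisionOf(String value) {
        try { Year.parse(value); return "year"; } catch (DateTimeParseException ignored) { }
        try { YearMonth.parse(value); return "month"; } catch (DateTimeParseException ignored) { }
        try { LocalDate.parse(value); return "day"; } catch (DateTimeParseException ignored) { }
        try { Instant.parse(value); return "second"; } catch (DateTimeParseException ignored) { }
        return "unrecognized";
    }
}
```

Each parser rejects input with trailing text, so the first parser that succeeds identifies the precision.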
The MetadataField class describes a metadata field by schema, element and optional qualifier. The value of a
MetadataField is described by a MetadataValue which is roughly equivalent to the older DCValue class. Finally
the MetadataSchema class is used to describe supported schemas. The DC schema is supported by default.
Refer to the javadoc for method details.
Packager Plugins
The Packager plugins let you ingest a package to create a new DSpace Object, and disseminate a content
Object as a package. A package is simply a data stream; its contents are defined by the packager plugin's
implementation.
To ingest an object, which is currently only implemented for Items, the sequence of operations is:
Plugin Manager
The PluginManager is a very simple component container. It creates and organizes components (plugins), and
helps select a plugin in the cases where there are many possible choices. It also gives some limited control
over the life cycle of a plugin.
Concepts
The following terms are important in understanding the rest of this section:
Plugin Interface: A Java interface, the defining characteristic of a plugin. The consumer of a plugin asks
for its plugin by interface.
Plugin: a.k.a. Component, this is an instance of a class that implements a certain interface. It is
interchangeable with other implementations, so that any of them may be "plugged in", hence the name. A
Plugin is an instance of any class that implements the plugin interface.
Implementation class: The actual class of a plugin. It may implement several plugin interfaces, but must
implement at least one.
Name: Plugin implementations can be distinguished from each other by name, a short String meant to
symbolically represent the implementation class. They are called "named plugins". Plugins only need to
be named when the caller has to make an active choice between them.
SelfNamedPlugin class: Plugins that extend the SelfNamedPlugin class can take advantage of
additional features of the Plugin Manager. Any class can be managed as a plugin, so it is not necessary,
just possible.
Reusable: Reusable plugins are only instantiated once, and the Plugin Manager returns the same
(cached) instance whenever that same plugin is requested again. This behavior can be turned off if
desired.
Types of Plugin
The Plugin Manager supports three different patterns of usage:
1. Singleton Plugins: There is only one implementation class for the plugin. It is indicated in the
configuration. This type of plugin chooses an implementation of a service, for the entire system, at
configuration time. Your application just fetches the plugin for that interface and gets the configured-in
choice. See the getSinglePlugin() method.
2. Sequence Plugins: You need a sequence or series of plugins, to implement a mechanism like Stackable
Authentication or a pipeline, where each plugin is called in order to contribute its implementation of a
process to the whole. The Plugin Manager supports this by letting you configure a sequence of plugins
for a given interface. See the getPluginSequence() method.
3. Named Plugins: Use a named plugin when the application has to choose one plugin implementation out
of many available ones. Each implementation is bound to one or more names (symbolic identifiers) in the
configuration. The name is just a string to be associated with the combination of implementation class
and interface. It may contain any characters except for comma (,) and equals (=). It may contain
embedded spaces. Comma is a special character used to separate names in the configuration entry.
Names must be unique within an interface: No plugin classes implementing the same interface may have
the same name. Think of plugin names as a controlled vocabulary – for a given plugin interface, there is
a set of names for which plugins can be found. The designer of a Named Plugin interface is responsible
for deciding what the name means and how to derive it; for example, names of metadata crosswalk
plugins may describe the target metadata format. See the getNamedPlugin() method and the
getPluginNames() methods.
Self-Named Plugins
Named plugins can get their names either from the configuration or, for a variant called self-named plugins,
from within the plugin itself.
Self-named plugins are necessary because one plugin implementation can be configured itself to take on many
"personalities", each of which deserves its own plugin name. It is already managing its own configuration for
each of these personalities, so it makes sense to allow it to export them to the Plugin Manager rather than
expecting the plugin configuration to be kept in sync with its own configuration.
An example helps clarify the point: There is a named plugin that does crosswalks, call it CrosswalkPlugin. It has
several implementations that crosswalk some kind of metadata. Now we add a new plugin which uses XSL
stylesheet transformation (XSLT) to crosswalk many types of metadata – so the single plugin can act like many
different plugins, depending on which stylesheet it employs.
This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet – it has to, since
of course the Plugin Manager doesn't know anything about stylesheets. It becomes a self-named plugin, so that
it reads its configuration data, gets the list of names to which it can respond, and passes those on to the Plugin
Manager.
When the Plugin Manager creates an instance of the XSLT-crosswalk, it records the Plugin Name that was
responsible for that instance. The plugin can look at that Name later in order to configure itself correctly for the
Name that created it. This mechanism is all part of the SelfNamedPlugin class which is part of any self-named
plugin.
A sequence plugin is returned as an array of Objects since it is actually an ordered list of plugins.
Lifecycle Management
When PluginManager fulfills a request for a plugin, it checks whether the implementation class is reusable; if so,
it creates one instance of that class and returns it for every subsequent request for that interface and name. If it
is not reusable, a new instance is always created.
For reasons that will become clear later, the manager actually caches a separate instance of an implementation
class for each name under which it can be requested.
You can ask the PluginManager to forget about (decache) a plugin instance, by releasing it. See the
PluginManager.releasePlugin() method. The manager will drop its reference to the plugin so the garbage
collector can reclaim it. The next time that plugin/name combination is requested, it will create a new instance.
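The caching behavior described above (one cached instance per interface and name for reusable plugins, a fresh instance otherwise, and release to drop a cached instance) can be sketched as follows. This is a simplified model for illustration, not the PluginManager implementation; the class name, the key format and the use of a Supplier as factory are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of per-name plugin caching; NOT the DSpace PluginManager.
final class ReusableCacheSketch {
    // Key combines interface and plugin name, so each name gets its own instance.
    private final Map<String, Object> cache = new HashMap<>();

    Object get(String interfaceName, String pluginName, boolean reusable, Supplier<Object> factory) {
        if (!reusable) {
            return factory.get(); // non-reusable: always a new instance
        }
        return cache.computeIfAbsent(interfaceName + "|" + pluginName, key -> factory.get());
    }

    /** Forgets the given instance so the next request creates a new one. */
    void release(Object plugin) {
        cache.values().removeIf(cached -> cached == plugin);
    }
}
```

Requesting the same interface and name twice returns the same object until it is released; requesting the same implementation under a different name produces a separate instance.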
Getting Meta-Information
The PluginManager can list all the names of the Named Plugins which implement an interface. You may need
this, for example, to implement a menu in a user interface that presents a choice among all possible plugins.
See the getPluginNames() method.
Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (i.e. a key
into the I18N message catalog) then you should add a method to the plugin itself to return that.
Implementation
Note: The PluginManager refers to interfaces and classes internally only by their names whenever possible, to
avoid loading classes until absolutely necessary (i.e. to create an instance). As you'll see below, self-named
classes still have to be loaded to query them for names, but for the most part it can avoid loading classes. This
saves a lot of time at start-up and keeps the JVM memory footprint down, too. As the Plugin Manager gets used
for more classes, this will become a greater concern.
The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away.
The solution is to call the checkConfiguration() method after making any changes to the configuration.
PluginManager Class
The PluginManager class is your main interface to the Plugin Manager. It behaves like a factory class that never
gets instantiated, so its public methods are static.
getSinglePlugin(): Returns an instance of the singleton (single) plugin implementing the given interface. There
must be exactly one single plugin configured for this interface, otherwise the PluginConfigurationError is thrown.
Note that this is the only "get plugin" method which throws an exception. It is typically used at
initialization time to set up a permanent part of the system so any failure is fatal. See the plugin.single
configuration key for configuration details.
getPluginSequence(): Returns instances of all plugins that implement the interface intface, in an Array. Returns
an empty array if there are no matching plugins. The order of the plugins in the array is the same as their class
names in the configuration's value field. See the plugin.sequence configuration key for configuration
details.
getNamedPlugin(): Returns an instance of a plugin that implements the interface intface and is bound to a name
matching name. If there is no matching plugin, it returns null. The names are matched by String.equals(). See the
plugin.named and plugin.selfnamed configuration keys for configuration details.
releasePlugin(): Tells the Plugin Manager to let go of any references to a reusable plugin, to prevent it from
being given out again and to allow the object to be garbage-collected. Call this when a plugin instance must be
taken out of circulation.
getPluginNames(): Returns all of the names under which a named plugin implementing the interface intface can be
requested (with getNamedPlugin()). The array is empty if there are no matches. Use this to populate a
menu of plugins for interactive selection, or to document what the possible choices are. The names are
NOT returned in any predictable order, so you may wish to sort them first. Note: Since a plugin may be
bound to more than one name, the list of names this returns does not represent the list of plugins. To get
the list of unique implementation classes corresponding to the names, you might have to eliminate
duplicates (i.e. create a Set of classes).
checkConfiguration(): Validates the keys in the DSpace ConfigurationManager pertaining to the Plugin Manager and
reports any errors by logging them. This is intended to be used interactively by a DSpace administrator, to check
the configuration file after modifying it. See the section about validating configuration for details.
SelfNamedPlugin Class
A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See Self-
Named Plugins for why this is sometimes necessary.
An error of this type means the caller asked for a single plugin, but either there was no single plugin configured
matching that interface, or there was more than one. Either case causes a fatal configuration error.
This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something
unexpected happens in the course of instantiating a plugin, e.g. an access error, class not found, etc. Simply
not finding a class in the configuration is not an exception.
This is a RuntimeException so it doesn't have to be declared, and can be passed all the way up to a
generalized fatal exception handler.
Configuring Plugins
All of the Plugin Manager's configuration comes from the DSpace Configuration Manager, which is a Java
Properties map. You can configure these characteristics of each plugin:
1. Interface: Classname of the Java interface which defines the plugin, including package name. e.g.
org.dspace.app.mediafilter.FormatFilter
2. Implementation Class: Classname of the implementation class, including package. e.g.
org.dspace.app.mediafilter.PDFFilter
3. Names: (Named plugins only) There are two ways to bind names to plugins: listing them in the value of a
plugin.named.interface key, or configuring a class in plugin.selfnamed.interface which extends the
SelfNamedPlugin class.
4. Reusable option: (Optional) This is declared in a plugin.reusable configuration line. Plugins are reusable
by default, so you only need to configure the non-reusable ones.
plugin.single.interface = classname
For example, this configures the class org.dspace.checker.SimpleDispatcher as the plugin for interface
org.dspace.checker.BitstreamDispatcher:
plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
For example, this entry configures Stackable Authentication with three implementation classes:
plugin.sequence.org.dspace.eperson.AuthenticationMethod = \
org.dspace.eperson.X509Authentication, \
org.dspace.eperson.PasswordAuthentication, \
edu.mit.dspace.MITSpecialGroup
1. Plugins Named in the Configuration: A named plugin which gets its name(s) from the configuration is
listed in this kind of entry:
plugin.named.interface = classname = name [ , name.. ] [ classname = name.. ]
The syntax of the configuration value is: classname, followed by an equal-sign and then at least one
plugin name. Bind more names to the same implementation class by adding them here, separated by
commas. Names may include any character other than comma (,) and equal-sign (=).
For example, this entry creates one plugin with the names GIF, JPEG, and image/png, and another with the
name TeX:
plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.JPEGFilter = GIF, JPEG, image/png \
org.dspace.app.mediafilter.TeXFilter = TeX
This example shows a plugin name with an embedded whitespace character. Since comma (,) is the
separator character between plugin names, spaces are legal (between words of a name; leading and
trailing spaces are ignored). This plugin is bound to the names "Adobe PDF", "PDF", and "Portable
Document Format".
plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.TeXFilter = TeX \
org.dspace.app.mediafilter.PDFFilter = Adobe PDF, PDF, Portable Document Format
NOTE: Since there can only be one key with plugin.named. followed by the interface name in the
configuration, all of the plugin implementations must be configured in that entry.
2. Self-Named Plugins: Since a self-named plugin supplies its own names through a static method call, the
configuration only has to include its interface and classname:
plugin.selfnamed.interface = classname [ , classname.. ]
The following example first demonstrates how the plugin class XsltDisseminationCrosswalk is configured to
implement its own names "MODS" and "DublinCore". These come from the keys starting with the prefix
crosswalk.dissemination.stylesheet. (with its trailing dot). The value is a stylesheet file. The class is then
configured as a self-named plugin:
crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.MODSDisseminationCrosswalk, \
org.dspace.content.metadata.XsltDisseminationCrosswalk
NOTE: Since there can only be one key with plugin.selfnamed. followed by the interface name in the
configuration, all of the plugin implementations must be configured in that entry. The
MODSDisseminationCrosswalk class is only shown to illustrate this point.
For example, this marks the PDF plugin from the example above as non-reusable:
plugin.reusable.org.dspace.app.mediafilter.PDFFilter = false
To validate the Plugin Manager configuration, call the PluginManager.checkConfiguration() method. It looks for
the following mistakes:
Eventually, someone should develop a general configuration-file sanity checker for DSpace, which would just
call PluginManager.checkConfiguration().
Use Cases
Here are some usage examples to illustrate how the Plugin Manager works.
A Singleton Plugin
This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin:
Configuration:
plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
The following code fragment shows how dispatcher, the service object, is initialized and used:
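BitstreamDispatcher dispatcher =
    (BitstreamDispatcher) PluginManager.getSinglePlugin(BitstreamDispatcher.class);
int id = dispatcher.next();
// Keep fetching bitstream IDs until the dispatcher signals that none are left.
while (id != BitstreamDispatcher.SENTINEL)
{
    // process the bitstream with this ID here
    id = dispatcher.next();
}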
NOTE: Remember how getPlugin() caches a separate instance of an implementation class for every name
bound to it? This is why: the instance can look at the name under which it was invoked and configure itself
specifically for that name. Since the instance for each name might be different, the Plugin Manager has to
cache a separate instance for each name.
Here is the configuration file listing both the plugin's own configuration and the PluginManager config line:
crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.XsltDisseminationCrosswalk
This look into the implementation shows how it finds configuration entries to populate the array of plugin names
returned by the getPluginNames() method. Also note, in the getStylesheet() method, how it uses the plugin
name that created the current instance (returned by getPluginInstanceName()) to find the correct stylesheet.
while (pe.hasMoreElements())
{
String key = (String)pe.nextElement();
if (key.startsWith(prefix))
aliasList.add(key.substring(prefix.length()));
}
return (String[])aliasList.toArray(new
String[aliasList.size()]);
}
Stackable Authentication
The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the
order of configuration, since order is significant. It gets a Sequence Plugin from the Plugin Manager. Refer to
the Configuration Section on Stackable Authentication for further details.
Workflow System
The primary classes are:
org.dspace.eperson.Group: people who can perform workflow tasks are defined in EPerson Groups
The workflow system models the states of an Item in a state machine with 5 states (SUBMIT, STEP_1,
STEP_2, STEP_3, ARCHIVE). STEP_1, STEP_2, and STEP_3 are the three optional steps where the item can be
viewed and corrected by different groups of people. In practice there are 8 states, because each step also has a
pooled state (STEP_1_POOL, STEP_2_POOL, and STEP_3_POOL) in which the item waits to enter the
corresponding primary state.
The WorkflowManager is invoked by events. While an Item is being submitted, it is held by a WorkspaceItem.
Calling the start() method in the WorkflowManager converts a WorkspaceItem to a WorkflowItem, and begins
processing the WorkflowItem's state. Since all three steps of the workflow are optional, if no steps are defined,
then the Item is simply archived.
Workflows are set per Collection, and steps are defined by creating corresponding entries in the List named
workflowGroup. If you wish the workflow to have a step 1, use the administration tools for Collections to create
a workflow Group whose members you want to be able to view and approve the Item; workflowGroup[0] is then
set to the ID of that Group.
If a step is defined in a Collection's workflow, then the WorkflowItem's state is set to that step's pooled state
(STEP_x_POOL). In this state the WorkflowItem is waiting for an EPerson in that group to claim the step's task.
The WorkflowManager emails the members of that Group notifying them that there is a task to be performed (the
text is defined in config/emails). When an EPerson goes to their 'My DSpace' page to claim the task, the
WorkflowManager is invoked with a claim event, and the WorkflowItem's state advances from STEP_x_POOL to
STEP_x (where x is the corresponding step). The EPerson can also generate an 'unclaim' event, returning the
WorkflowItem to STEP_x_POOL.
Another event the WorkflowManager handles is advance(), which advances the WorkflowItem to the next state.
If there are no further states, then the WorkflowItem is removed, and the Item is then archived. An EPerson
performing one of the tasks can reject the Item, which stops the workflow, rebuilds the WorkspaceItem for it and
sends a rejection note to the submitter. More drastically, an abort() event is generated by the admin tools to
cancel a workflow outright.
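The state transitions described in this section can be summarized in a minimal sketch. It is an illustrative model only, not the WorkflowManager code: the class WorkflowSketch, its constructor flags, and the choice to reject only from a claimed step are assumptions made for the example.

```java
/** Hypothetical model of the workflow states described above; not the WorkflowManager code. */
final class WorkflowSketch {
    enum State { SUBMIT, STEP_1_POOL, STEP_1, STEP_2_POOL, STEP_2, STEP_3_POOL, STEP_3, ARCHIVE }

    private final boolean[] stepDefined; // stepDefined[n - 1]: does the collection define step n?
    private State state = State.SUBMIT;

    WorkflowSketch(boolean step1, boolean step2, boolean step3) {
        this.stepDefined = new boolean[] { step1, step2, step3 };
    }

    State state() { return state; }

    /** start: enter the pool of the first defined step, or archive if no step is defined. */
    void start() {
        require(state == State.SUBMIT, "start");
        state = poolOfNextDefinedStep(0);
    }

    /** claim: STEP_x_POOL -> STEP_x. */
    void claim() {
        require(isPool(state), "claim");
        state = State.values()[state.ordinal() + 1];
    }

    /** unclaim: STEP_x -> STEP_x_POOL. */
    void unclaim() {
        require(isStep(state), "unclaim");
        state = State.values()[state.ordinal() - 1];
    }

    /** advance: leave STEP_x for the pool of the next defined step, or ARCHIVE. */
    void advance() {
        require(isStep(state), "advance");
        state = poolOfNextDefinedStep(state.ordinal() / 2);
    }

    /** reject: stop the workflow and hand the item back to the submitter (assumed: from a claimed step). */
    void reject() {
        require(isStep(state), "reject");
        state = State.SUBMIT;
    }

    private State poolOfNextDefinedStep(int currentStep) {
        for (int step = currentStep + 1; step <= 3; step++) {
            if (stepDefined[step - 1]) {
                return State.values()[2 * step - 1]; // STEP_<step>_POOL
            }
        }
        return State.ARCHIVE;
    }

    private static boolean isPool(State s) { return s != State.ARCHIVE && s.ordinal() % 2 == 1; }

    private static boolean isStep(State s) { return s != State.SUBMIT && s.ordinal() % 2 == 0; }

    private static void require(boolean ok, String event) {
        if (!ok) throw new IllegalStateException(event + " is not valid in the current state");
    }
}
```

With steps 1 and 3 defined, an item goes SUBMIT, STEP_1_POOL, STEP_1, STEP_3_POOL, STEP_3, ARCHIVE; with no steps defined, start() archives it directly.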
Administration Toolkit
The org.dspace.administer package contains some classes for administering a DSpace system that are not
generally needed by most applications.
The CreateAdministrator class is a simple command-line tool, executed via [dspace]/bin/dspace
create-administrator, that creates an administrator e-person with information entered from standard input. This is
generally used only once when a DSpace system is initially installed, to create an initial administrator who can
then use the Web administration UI to further set up the system. This script does not check for authorization,
since it is typically run before there are any e-people to authorize! Since it must be run as a command-line tool
on the server machine, generally this shouldn't cause a problem. A possibility is to have the script only operate
when there are no e-people in the system already, though in general, someone with access to command-line
scripts on your server is probably in a position to do what they want anyway!
The DCType class is similar to the org.dspace.content.BitstreamFormat class. It represents an entry in the
Dublin Core type registry, that is, a particular element and qualifier, or unqualified element. It is in the administer
package because it is only generally required when manipulating the registry itself. Elements and qualifiers are
specified as literals in org.dspace.content.Item methods and the org.dspace.content.DCValue class. Only
administrators may modify the Dublin Core type registry.
The org.dspace.administer.RegistryLoader class contains methods for initializing the Dublin Core type registry
and bitstream format registry with entries in an XML file. Typically this is executed via the command line during
the build process (see build.xml in the source). To see examples of the XML formats, see the files in
config/registries in the source directory. There is no XML schema, so the files are not strictly validated when loaded.
E-person/Group Manager
DSpace keeps track of registered users with the org.dspace.eperson.EPerson class. The class has methods to
create and manipulate an EPerson such as get and set methods for first and last names, email, and password.
(Actually, there is no getPassword() method; an MD5 hash of the password is stored, and can only be verified
with the checkPassword() method.) There are find methods to find an EPerson by email (which is assumed to
be unique), or to find all EPeople in the system.
The EPerson object should probably be reworked to allow for easy expansion; the current EPerson object
tracks pretty much only what MIT was interested in tracking - first and last names, email, phone. The access
methods are hardcoded and should probably be replaced with methods to access arbitrary name/value pairs for
institutions that wish to customize what EPerson information is stored.
Groups are simply lists of EPerson objects. Other than membership, Group objects have only one other
attribute: a name. Group names must be unique, so we have adopted naming conventions where the role of the
group is its name, such as COLLECTION_100_ADD. Groups add and remove EPerson objects with
addMember() and removeMember() methods. One important thing to know about groups is that they store their
membership in memory until the update() method is called - so when modifying a group's membership don't
forget to invoke update() or your changes will be lost! Since group membership is used heavily by the
authorization system a fast isMember() method is also provided.
Another kind of Group is also implemented in DSpace: special Groups. The Context object for each session
carries around a List of Group IDs that the user is also a member of. Currently the MITUser Group ID is added to
the list of a user's special groups if certain IP address or certificate criteria are met.
Authorization
The primary classes are:
The authorization system is based on the classic 'police state' model of security; no action is allowed unless it is
expressed in a policy. The policies are attached to resources (hence the name ResourcePolicy,) and detail who
can perform that action. The resource can be any of the DSpace object types listed in
org.dspace.core.Constants (BITSTREAM, ITEM, COLLECTION, etc.). The 'who' is made up of EPerson groups. The actions are
also in Constants.java (READ, WRITE, ADD, etc.) The only non-obvious actions are ADD and REMOVE, which
are authorizations for container objects. To be able to create an Item, you must have ADD permission in a
Collection, which contains Items. (Communities, Collections, Items, and Bundles are all container objects.)
Currently most of the read policy checking is done with items: communities and collections are assumed to be
openly readable, but items and their bitstreams are checked. Separate policy checks for items and their
bitstreams enable policies under which an item is publicly readable while parts of its content are restricted to
certain groups.
Three new attributes have been introduced in the ResourcePolicy class as part of the DSpace 3.0 Embargo
Contribution:
While rpname and rpdescription are fields manageable by users, rptype is a field managed by the
system. It represents the type of a resource policy, which is one of the following:
TYPE_SUBMISSION: all the policies added automatically during the submission process
TYPE_WORKFLOW: all the policies added automatically during the workflow stage
TYPE_CUSTOM: all the custom policies added by users
TYPE_INHERITED: all the policies inherited from the parent DSO.
A custom policy, created for the purpose of imposing an embargo, could look like:
policy_id: 4847
resource_type_id: 2
resource_id: 89
action_id: 0
eperson_id:
epersongroup_id: 0
start_date: 2013-01-01
end_date:
rpname: Embargo Policy
rpdescription: Embargoed through 2012
rptype: TYPE_CUSTOM
ResourcePolicies are very simple, and there are quite a lot of them. Each can only list a single group, a single
action, and a single object. So each object will likely have several policies, and if multiple groups share
permissions for actions on an object, each group will get its own policy. (It's a good thing they're small.)
Special Groups
All users are assumed to be part of the public group (ID=0.) DSpace admins (ID=1) are automatically part of all
groups, much like super-users in the Unix OS. The Context object also carries around a List of special groups,
which are also first checked for membership. These special groups are used at MIT to indicate membership in
the MIT community, something that is very difficult to enumerate in the database! When a user logs in with an
MIT certificate or with an MIT IP address, the login code adds this MIT user group to the user's Context.
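The default-deny rule and the group handling described in this section can be sketched as follows. This is an illustration only, not the DSpace authorization code: the class PolicyCheckSketch, the Policy holder, and the isAuthorized method are hypothetical, and date ranges (start_date, end_date) are deliberately left out.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical default-deny check over resource policies; not the DSpace authorization code. */
final class PolicyCheckSketch {
    static final int PUBLIC_GROUP_ID = 0; // every user is assumed to be a member
    static final int ADMIN_GROUP_ID = 1;  // administrators are treated as members of all groups

    /** One policy names exactly one object, one action, and one group. */
    static final class Policy {
        final int resourceType;
        final int resourceId;
        final int action;
        final int groupId;

        Policy(int resourceType, int resourceId, int action, int groupId) {
            this.resourceType = resourceType;
            this.resourceId = resourceId;
            this.action = action;
            this.groupId = groupId;
        }
    }

    /**
     * userGroupIds holds the user's regular and special groups.
     * Nothing is allowed unless some policy grants it.
     */
    static boolean isAuthorized(List<Policy> policies, int resourceType, int resourceId,
                                int action, Set<Integer> userGroupIds) {
        if (userGroupIds.contains(ADMIN_GROUP_ID)) {
            return true;
        }
        for (Policy p : policies) {
            boolean sameTarget = p.resourceType == resourceType
                    && p.resourceId == resourceId
                    && p.action == action;
            boolean groupApplies = p.groupId == PUBLIC_GROUP_ID || userGroupIds.contains(p.groupId);
            if (sameTarget && groupApplies) {
                return true;
            }
        }
        return false; // no matching policy: the action is denied
    }
}
```

Because each policy carries a single group, granting the same action to two groups means two Policy objects for the same object and action.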
Handles are stored internally in the handle database table in the form:
1721.123/4567
Typically when they are used outside of the system they are displayed in either URI or "URL proxy" forms:
hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567
It is the responsibility of the caller to extract the basic form from whichever displayed form is used.
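As a minimal sketch of what that extraction might look like on the caller's side, the helper below strips the two display prefixes shown above. The class and method names (HandleForms, toBasicForm, and so on) are hypothetical and are not part of the DSpace API.

```java
/** Hypothetical helper for converting between Handle display forms; not part of the DSpace API. */
final class HandleForms {
    private static final String URI_PREFIX = "hdl:";
    private static final String PROXY_PREFIX = "http://hdl.handle.net/";

    /** Returns the basic form (for example 1721.123/4567) from any of the three forms. */
    static String toBasicForm(String displayed) {
        String value = displayed.trim();
        if (value.startsWith(URI_PREFIX)) {
            return value.substring(URI_PREFIX.length());
        }
        if (value.startsWith(PROXY_PREFIX)) {
            return value.substring(PROXY_PREFIX.length());
        }
        return value; // assumed to be in basic form already
    }

    static String toUri(String basicForm) {
        return URI_PREFIX + basicForm;
    }

    static String toProxyUrl(String basicForm) {
        return PROXY_PREFIX + basicForm;
    }

    public static void main(String[] args) {
        System.out.println(toBasicForm("hdl:1721.123/4567"));
        System.out.println(toProxyUrl("1721.123/4567"));
    }
}
```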
The handle table maps these Handles to resource type/resource ID pairs, where resource type is a value from
org.dspace.core.Constants and resource ID is the internal identifier (database primary key) of the object. This
allows Handles to be assigned to any type of object in the system, though as explained in the functional
overview, only communities, collections and items are presently assigned Handles.
Creating a Handle
Finding the Handle for a DSpaceObject, though this is usually only invoked by the object itself, since
DSpaceObject has a getHandle method
Retrieving the DSpaceObject identified by a particular Handle
Obtaining displayable forms of the Handle (URI or "proxy URL").
HandlePlugin is a simple implementation of the Handle Server's net.handle.hdllib.HandleStorage
interface. It only implements the basic Handle retrieval methods, which get information from the handle
database table. The CNRI Handle Server is configured to use this plug-in via its config.dct file.
Note that since the Handle server runs as a separate JVM to the DSpace Web applications, it uses a separate
'Log4J' configuration, since Log4J does not support multiple JVMs using the same daily rolling logs. This
alternative configuration is located at [dspace]/config/log4j-handle-plugin.properties. The
[dspace]/bin/start-handle-server script passes in the appropriate command line parameters so that
the Handle server uses this configuration.
Search
DSpace's search code is a simple API which currently wraps the Lucene search engine. The first half of the
search task is indexing. org.dspace.search.DSIndexer is the indexing class; its indexContent() method, if
passed an Item, Community, or Collection, adds that content's fields to the index. The methods
unIndexContent() and reIndexContent() remove and update content's index information. The DSIndexer class
also has a main() method which will rebuild the index completely. This can be invoked by the
dspace/bin/index-init (complete rebuild) or dspace/bin/index-update (update) script. The intent was for the main()
method to be invoked on a regular basis to avoid index corruption, but we have had no problem with that so far.
Which fields are indexed by DSIndexer? These fields are defined in dspace.cfg in the section "Fields to index
for search" as name-value pairs. The name must be unique and take the form search.index.i (where i is an
arbitrary positive number). The value on the right side starts with an index name, which must also be unique and
can be referenced in the search form (e.g. title, author). After the colon comes the metadata element which is
indexed. '*' is a wildcard which includes all sub elements. For example:
search.index.4 = keyword:dc.subject.*
tells the indexer to create a keyword index containing all dc.subject element values. Since the wildcard ('*')
character was used in place of a qualifier, all subject metadata fields will be indexed (e.g. dc.subject.other,
dc.subject.lcsh, etc.).
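To make the value syntax concrete, here is a small sketch that splits such a value into its index name and metadata field and tests whether a given field falls under it. It is not the DSIndexer code; the class SearchIndexEntry is hypothetical, and it assumes that the wildcard also covers the unqualified element.

```java
import java.util.Objects;

/** Hypothetical parser for a search.index.i value such as "keyword:dc.subject.*". */
final class SearchIndexEntry {
    final String indexName;
    final String schema;
    final String element;
    final String qualifier; // null means unqualified, "*" means any qualifier

    private SearchIndexEntry(String indexName, String schema, String element, String qualifier) {
        this.indexName = indexName;
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
    }

    /** Parses "<index name>:<schema>.<element>[.<qualifier>]". */
    static SearchIndexEntry parse(String value) {
        int colon = value.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected <index>:<field>, got: " + value);
        }
        String[] parts = value.substring(colon + 1).trim().split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("expected schema.element[.qualifier], got: " + value);
        }
        String qualifier = parts.length == 3 ? parts[2] : null;
        return new SearchIndexEntry(value.substring(0, colon).trim(), parts[0], parts[1], qualifier);
    }

    /** True when the given metadata field would be added to this index. */
    boolean matches(String schema, String element, String qualifier) {
        if (!this.schema.equals(schema) || !this.element.equals(element)) {
            return false;
        }
        // Assumption: the wildcard matches every qualifier, including none.
        return "*".equals(this.qualifier) || Objects.equals(this.qualifier, qualifier);
    }
}
```

Parsing "keyword:dc.subject.*" gives an entry named keyword that matches dc.subject.lcsh and dc.subject.other but not dc.title.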
By default, the fields shown in the Indexed Fields section below are indexed. These are hardcoded in the
DSIndexer class. If any search.index.i items are specified in dspace.cfg these are used rather than these
hardcoded fields.
The query class DSQuery contains the three flavors of doQuery() methods: one searches the DSpace site, and
the other two restrict searches to Collections and Communities. The results from a query are returned as three
lists of handles; each list represents a type of result. One list is a list of Items with matches, and the other two
are Collections and Communities that match. This separation allows the UI to handle the types of results
gracefully without resolving all of the handles first to see what kind of content the handle points to. The DSQuery
class also has a main() method for debugging via command-line searches.
Indexed Fields
The DSIndexer class shipped with DSpace indexes the Dublin Core metadata in the following way:
Authors      contributor.*, creator.*, description.statementofresponsibility
Titles       title.*
Keywords     subject.*
Abstracts    description.abstract, description.tableofcontents
Series       relation.ispartofseries
Sponsors     description.sponsorship
Identifiers  identifier.*
Harvesting API
The org.dspace.search package also provides a 'harvesting' API. This allows callers to extract information
about items modified within a particular timeframe, and within a particular scope (all of DSpace, or a community
or collection.) Currently this is used by the Open Archives Initiative metadata harvesting protocol application,
and the e-mail subscription code.
The Harvest.harvest method is invoked with the required scope and start and end dates. Either date can be omitted.
The dates should be in the ISO8601, UTC time zone format used elsewhere in the DSpace system.
HarvestedItemInfo objects are returned. These objects are simple containers with basic information about the
items falling within the given scope and date range. Depending on parameters passed to the harvest method,
the containers and item fields may have been filled out with the IDs of communities and collections containing
an item, and the corresponding Item object respectively. Electing not to have these fields filled out means the
harvest operation executes considerably faster.
In case it is required, Harvest also offers a method for creating a single HarvestedItemInfo object, which might
make things easier for the caller.
Browse API
The browse API maintains indexes of dates, authors, titles and subjects, and allows callers to extract parts of
these:
Title: Values of the Dublin Core element title (unqualified) are indexed. These are sorted in a case-
insensitive fashion, with any leading article removed. For example: "The DSpace System" would appear
under 'D' rather than 'T'.
Author: Values of the contributor (any qualifier or unqualified) element are indexed. Since contributor
values typically are in the form 'last name, first name', a simple case-insensitive alphanumeric sort is
used which orders authors in last name order. Note that this is an index of authors, and not items by
author. If four items have the same author, that author will appear in the index only once. Hence, the
index of authors may be greater or smaller than the index of titles; items often have more than one
author, though the same author may have authored several items. The author indexing in the browse API
does have limitations:
Ideally, a name that appears as an author for more than one item would appear in the author
index only once. For example, 'Doe, John' may be the author of tens of items. However, in
practice, authors' names often appear in slightly different forms, for example:
Doe, John
Currently, the above three names would all appear as separate entries in the author index even
though they may refer to the same author. In order for an author of several papers to
correctly appear once in the index, each item must specify exactly the same form of their name, which
doesn't always happen in practice.
Another issue is that two authors may have the same name, even within a single institution. If this
is the case they may appear as one author in the index. These issues are typically resolved in
libraries with authority control records, which keep a 'preferred' form of the author's name
together with extra information (such as date of birth/death) in order to distinguish between authors of the
same name. Maintaining such records is a huge task with many issues, particularly when
metadata is received from faculty directly rather than trained library catalogers.
Date of Issue: Items are indexed by date of issue. This may be different from the date that an item
appeared in DSpace; many items may have been originally published elsewhere beforehand. The Dublin
Core field used is date.issued. The ordering of this index may be reversed so 'earliest first' and 'most
recent first' orderings are possible. Note that the index is of items by date, as opposed to an index of
dates. If 30 items have the same issue date (say 2002), then those 30 items all appear in the index
adjacent to each other, as opposed to a single 2002 entry. Since dates in DSpace Dublin Core are in
ISO8601, all in the UTC time zone, a simple alphanumeric sort is sufficient to sort by date, including
dealing with varying granularities of date reasonably. For example:
2001-12-10
2002
2002-04
2002-04-05
2002-04-09T15:34:12Z
2002-04-09T19:21:12Z
2002-04-10
Date Accessioned: In order to determine which items most recently appeared, rather than using the
date of issue, an item's accession date is used. This is the Dublin Core field date.accessioned. In other
aspects this index is identical to the date of issue index.
Items by a Particular Author: Another operation the browse API can perform is to extract items by a particular
author. They do not have to be the primary author of an item for that item to be extracted. You can specify a
scope, too; that is, you can ask for items by author X in collection Y, for example. This particular flavor of browse
is slightly simpler than the others. You cannot presently specify a particular subset of results to be
returned. The API call will simply return all of the items by a particular author within a certain scope. Note
that the author of the item must exactly match the author passed in to the API; see the explanation about
the caveats of the author index browsing to see why this is the case.
Subject: Values of the Dublin Core element subject (both unqualified and with any qualifier) are
indexed. These are sorted in a case-insensitive fashion.
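The claim in the Date of Issue entry, that a plain alphanumeric sort orders ISO8601 dates of mixed granularity, can be checked with a short sketch. The class IssueDateSort is a hypothetical name used for illustration and is not part of the browse API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: ISO8601 date strings sort chronologically as plain strings. */
final class IssueDateSort {

    /** Returns a sorted copy; mostRecentFirst reverses the order, as the date index allows. */
    static List<String> sortDates(List<String> dates, boolean mostRecentFirst) {
        List<String> sorted = new ArrayList<>(dates);
        Collections.sort(sorted); // natural String ordering, no date parsing needed
        if (mostRecentFirst) {
            Collections.reverse(sorted);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "2002-04-10", "2002", "2002-04-09T19:21:12Z", "2001-12-10",
                "2002-04", "2002-04-09T15:34:12Z", "2002-04-05");
        for (String date : sortDates(dates, false)) {
            System.out.println(date);
        }
    }
}
```

A shorter value such as 2002 sorts before any longer value that starts with it, which is why a year-only date lands ahead of the months and days of that year.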
The results of invoking Browse.getItemsByTitle with the above parameters might look like this:
Note that in the case of title and date browses, Item objects are returned as opposed to actual titles. In these
cases, you can specify the 'focus' to be a specific item, or a partial or full literal value. In the case of a literal
value, if no entry in the index matches exactly, the closest match is used as the focus. It's quite reasonable to
specify a focus of a single letter, for example.
Being able to specify a specific item to start at is particularly important with dates, since many items may have
the same issue date. Say 30 items in a collection have the issue date 2002. To be able to page through the
index 20 items at a time, you need to be able to specify exactly which item's 2002 is the focus of the browse,
otherwise each time you invoked the browse code, the results would start at the first item with the issue date
2002.
Author browses return String objects with the actual author names. You can only specify the focus as a full or
partial literal String.
Another important point to note is that presently, the browse indexes contain metadata for all items in the main
archive, regardless of authorization policies. This means that all items in the archive will appear to all users
when browsing. Of course, should the user attempt to access a non-public item, the usual authorization
mechanism will apply. Whether this approach is ideal is under review; implementing the browse API such that
the results retrieved reflect a user's level of authorization may be possible, but rather tricky.
Index Maintenance
The browse API contains calls to add and remove items from the index, and to regenerate the indexes from
scratch. In general the content management API invokes the necessary browse API calls to keep the browse
indexes in sync with what is in the archive, so most applications will not need to invoke those methods.
If the browse index becomes inconsistent for some reason, the InitializeBrowse class is a command line tool
(generally invoked using the [dspace]/bin/dspace index-init command) that causes the indexes to be
regenerated from scratch.
Caveats
Presently, the browse API is not tremendously efficient. 'Indexing' takes the form of simply extracting the
relevant Dublin Core value, normalizing it (lower-casing and removing any leading article in the case of titles),
and inserting that normalized value with the corresponding item ID in the appropriate browse database table.
Database views of this table include collection and community IDs for browse operations with a limited scope.
When a browse operation is performed, a simple SELECT query is performed, along the lines of:
There are two main drawbacks to this: Firstly, LIMIT and OFFSET are PostgreSQL-specific keywords.
Secondly, the database is still actually performing dynamic sorting of the titles, so the browse code as it stands
will not scale particularly well. The code does cache BrowseInfo objects, so that common browse operations are
performed quickly, but this is not an ideal solution.
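The normalization step for titles mentioned above (lower-casing and removing a leading article) might look roughly like the following. This is a sketch, not the browse code: the class TitleSortKey is hypothetical, and the list of articles is an assumption limited to English.

```java
import java.util.Locale;

/** Hypothetical title normalization for a browse sort key; not the DSpace browse code. */
final class TitleSortKey {
    // Assumption: only English articles are stripped.
    private static final String[] LEADING_ARTICLES = { "the ", "an ", "a " };

    /** Lower-cases the title and removes one leading article, if present. */
    static String normalize(String title) {
        String key = title.trim().toLowerCase(Locale.ROOT);
        for (String article : LEADING_ARTICLES) {
            if (key.startsWith(article)) {
                return key.substring(article.length()).trim();
            }
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(normalize("The DSpace System"));
    }
}
```

Each article includes its trailing space, so a title such as "Theory of Archives" is left alone rather than being truncated.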
Checksum checker
The checksum checker is used to verify every item within DSpace. Because DSpace calculates and records the
checksum of every file submitted to it, the checker can determine whether a file has been changed. The idea
is that the earlier you can identify that a file has changed, the more likely you are to be able to record it
(assuming it was not a wanted change).
The org.dspace.checker.CheckerCommand class is the class for the checksum checker tool. It calculates a
checksum for each bitstream whose ID is in the most_recent_checksum table, and compares it against the last
calculated checksum for that bitstream.
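The comparison at the heart of the checker can be sketched as below. This is not the CheckerCommand code; the class ChecksumSketch is hypothetical, and MD5 is assumed as the algorithm purely for illustration since this section does not name one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical checksum comparison; an illustration, not the CheckerCommand implementation. */
final class ChecksumSketch {

    /** Returns the MD5 digest of the content as lower-case hex (MD5 is an assumption). */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    /** True when the freshly computed checksum equals the one recorded earlier. */
    static boolean isUnchanged(byte[] currentContent, String recordedChecksum) {
        return md5Hex(currentContent).equalsIgnoreCase(recordedChecksum);
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        String recorded = md5Hex(original);
        byte[] altered = "hellp".getBytes(StandardCharsets.UTF_8);
        System.out.println("original unchanged: " + isUnchanged(original, recorded));
        System.out.println("altered unchanged: " + isUnchanged(altered, recorded));
    }
}
```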
OpenSearch Support
DSpace is able to support OpenSearch. For those not acquainted with the standard, here is a very brief
introduction, with emphasis on the possibilities it holds for current use and future development.
OpenSearch is a small set of conventions and documents for describing and using 'search engines', meaning
any service that returns a set of results for a query. It is nearly ubiquitous, but also nearly invisible, in modern
web sites with search capability. If you look at the page source of Wikipedia, Facebook, CNN, etc., you will find
buried in it a link element declaring OpenSearch support. It is very much a lowest-common-denominator
abstraction (think Google box), but does provide a means to extend its expressive power. This first
implementation for DSpace supports none of these extensions, many of which are of potential value, so it should
be regarded as a foundation, not a finished solution. So the short answer is that DSpace appears as a
'search-engine' to OpenSearch-aware software.
Another way to look at OpenSearch is as a RESTful web service for search, very much like SRW/U, but
considerably simpler. This comparative loss of power is offset by the fact that it is widely supported by web tools
and players: browsers understand it, as do large metasearch tools.
Browser Integration: Many recent browsers (IE7+, FF2+) can detect, or 'autodiscover', links to the
document describing the search engine. Thus you can easily add your own or other DSpace instances to the
drop-down list of search engines in your browser. This list typically appears in the upper right corner of
the browser, with a search box. In Firefox, for example, when you visit a site supporting OpenSearch, the
drop-down list widget changes color, and if you open it to show the list of search engines,
you are offered an opportunity to add the site to the list. IE works nearly the same way but instead labels
the web sites 'search providers'. When you select a DSpace instance as the search engine and enter a
search, you are simply sent to the regular search results page of the instance.
Flexible, interesting RSS Feeds: Because one of the formats that OpenSearch specifies for its results is
RSS (or Atom), you can turn any search query into an RSS feed. So if there are keywords highly
discriminative of content in a collection or repository, these can be turned into a URL that a feed reader
can subscribe to. Taken to the extreme, one could take any search a user makes, and dynamically
compose an RSS feed URL for it in the page of returned results. To see an example, if you have a
DSpace with OpenSearch enabled, try:
http://dspace.mysite.edu/open-search/?query=<your query>
The default format returned is Atom 1.0, so you should see an Atom document containing your search
results.
You can extend the syntax with a few other parameters, as follows:
Parameter  Values
rpp        number indicating the number of results per page (i.e. per request)
sort_by    number indicating sorting criteria (same as DSpace advanced search values)
Multiple parameters may be specified on the query string, using the "&" character as the delimiter, e.g.:
http://dspace.mysite.edu/open-search/?query=<your query>&format=rss&scope=123456789/1
Configuration is through the dspace.cfg file. See OpenSearch Support for more details.
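For callers that assemble such URLs in code, a minimal sketch follows. The class OpenSearchUrl and its build method are hypothetical helpers, not part of DSpace; the host name is the placeholder used in the examples above, and the parameter names are taken from those examples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles an OpenSearch request URL; not part of DSpace. */
final class OpenSearchUrl {

    /** Appends the query and any extra parameters, joined with "&" and URL-encoded. */
    static String build(String baseUrl, String query, Map<String, String> extraParameters) {
        StringBuilder url = new StringBuilder(baseUrl);
        url.append("/open-search/?query=").append(encode(query));
        for (Map.Entry<String, String> parameter : extraParameters.entrySet()) {
            url.append('&').append(encode(parameter.getKey()))
               .append('=').append(encode(parameter.getValue()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("format", "rss");
        extra.put("rpp", "20");
        System.out.println(build("http://dspace.mysite.edu", "digital library", extra));
    }
}
```

Note that URLEncoder writes a space as "+" and a slash as "%2F", so a scope such as 123456789/1 appears percent-encoded in the result; a server decodes both spellings to the same value.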
Embargo Support
What is an Embargo?
An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or
duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions.
For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or
access based on license-driven or other IP-based requirements that limit access to institutionally affiliated
users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace,
typically by attaching specific policies to Items or Collections, Bitstreams, etc. The embargo functionality
introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed
timeframes.
that is the result of the interpretation is stored with the item and the embargo system detects when that date has
passed, and removes the embargo ("lifts it"), so the item bitstreams become available. Here is a more detailed
life-cycle for an embargoed item:
1. Terms Assignment. The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If
these terms are missing, no embargo will be imposed. As we will see below, terms are carried in a
configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata
field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch
import, etc. - anywhere metadata is passed to DSpace. The terms are not immediately acted upon, and
may be revised, corrected, removed, etc, up until the next stage of the life-cycle. Thus a submitter could
enter one value, and a collection editor replace it, and only the last value will be used. Since metadata
fields are multivalued, theoretically there can be multiple terms values, but in the default implementation
only one is recognized.
2. Terms interpretation/imposition. In DSpace terminology, when an item has exited the last of any
workflow steps (or if none have been defined for it), it is said to be 'installed' into the repository. At this
precise time, the 'interpretation' of the terms occurs, and a computed 'lift date' is assigned, which like the
terms is recorded in a configurable metadata field. It is important to understand that this interpretation
happens only once, (just like the installation), and cannot be revisited later. Thus, although an
administrator can assign a new value to the metadata field holding the terms after the item has been
installed, this will have no effect on the embargo, whose 'force' now resides entirely in the 'lift date' value.
For this reason, you cannot embargo content already in your repository (at least using standard tools).
The other action taken at installation time is the actual imposition of the embargo. The default behavior
here is simply to remove the read policies on all the bundles and bitstreams except for the "LICENSE" or
"METADATA" bundles. See the section on Extending Embargo Functionality for how to alter this
behavior. Also note that since these policy changes occur before installation, there is no time during
which embargoed content is 'exposed' (accessible by non-administrators). The terms interpretation and
imposition together are called 'setting' the embargo, and the component that performs them both is called
the embargo 'setter'.
3. Embargo Period. After an embargoed item has been installed, the policy restrictions remain in effect
until removed. This is not an automatic process, however: a 'lifter' must be run periodically to look for
items whose 'lift date' is past. Note that this means the effective removal of an embargo is not the lift
date, but the earliest date after the lift date that the lifter is run. Typically, a nightly cron-scheduled
invocation of the lifter is more than adequate, given the granularity of embargo terms. Also note that
during the embargo period, all metadata of the item remains visible. This default behavior can be
changed. One final point to note is that the 'lift date', although it was computed and assigned during the
previous stage, is in the end a regular metadata field. That means, if there are extraordinary
circumstances that require an administrator (or collection editor, that is, anyone with edit permissions on
metadata) to change the lift date, they can do so. Thus, they can 'revise' the lift date without reference to
the original terms. This date will be checked the next time the 'lifter' is run. One could immediately lift the
embargo by setting the lift date to the current day, or change it to 'forever' to indefinitely postpone lifting.
4. Embargo Lift. When the lifter discovers an item whose lift date is in the past, it removes (lifts) the
embargo. The default behavior of the lifter is to add the resource policies that would have been added
had the embargo not been imposed. That is, it replicates the standard DSpace behavior, in which an item
inherits its policies from its owning collection. As with all other parts of the embargo system, you may
replace or extend the default behavior of the lifter (see the section on Extending Embargo Functionality). You
may wish, e.g., to send an
email to an administrator or other interested parties, when an embargoed item becomes available.
5. Post Embargo. After the embargo has been lifted, the item ceases to respond to any of the embargo
life-cycle events. The values of the metadata fields reflect essentially historical or provenance values. With
the exception of the additional metadata fields, such items are indistinguishable from items that were never
subject to embargo.
More details on Embargo configuration, including specific examples can be found in the Embargo
section of the documentation.
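The lifter's date test described in steps 3 and 4 can be sketched as follows. This is a minimal illustration and not the DSpace implementation: the class and method names are hypothetical, the lift date is assumed to be stored as an ISO-8601 date (yyyy-MM-dd), and treating any non-date value such as 'forever' as "never due" is likewise an assumption.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical helper, not part of the DSpace API. */
final class EmbargoLiftCheck {
    private EmbargoLiftCheck() {}

    /**
     * Returns true when the lift date is today or earlier, so that setting
     * the lift date to the current day lifts the embargo on the next run.
     * Missing or non-date values (for example "forever") are never due.
     */
    static boolean isLiftDue(String liftDateValue, LocalDate today) {
        if (liftDateValue == null) {
            return false;
        }
        try {
            // Assumption: the metadata value is an ISO-8601 date.
            return !LocalDate.parse(liftDateValue.trim()).isAfter(today);
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

A nightly run would apply such a test to every embargoed item and lift those for which it returns true.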
The DSpace Services Framework is a backporting of the DSpace 2.0 Development Group's work in creating a
reasonable and abstractable "Core Services" layer for DSpace components to operate within. The Services
Framework represents a "best practice" for new DSpace architecture and implementation of extensions to the
DSpace application. DSpace Services are best described as a "Simple Registry" where plugins can be "looked
up" or located. The DS2 (DSpace 2.0) core services are the main services that make up a DS2 system. These
include services for things like user and permissions management, storage, and caching. These services
can be used by any developer writing DS2 plugins (e.g. statistics), providers (e.g. authentication), or user
interfaces (e.g. JSPUI).
Architectural Overview
DSpace Kernel
The DSpace Kernel manages the startup of, and access to, the services in the DSpace Services framework. It is
meant to provide a simple way to control the core parts of DSpace and to allow flexible ways to start up the kernel.
For example, the kernel can be run inside a single webapp along with a frontend piece (like JSPUI) or it can be
started as part of the servlet container so that multiple webapps can use a single kernel (this increases speed
and efficiency). The kernel is also designed to happily allow multiple kernels to run in a single servlet container
using identifier keys.
Kernel registration
The kernel will automatically register itself as an MBean when it starts up so that it can be managed via JMX. It
allows startup and shutdown and provides direct access to the ServiceManager and the ConfigurationService.
All the other core services can be retrieved from the ServiceManager by their APIs.
Service Manager
The ServiceManager abstracts the concepts of service lookups and lifecycle control. It also manages the
configuration of services by allowing properties to be pushed into the services as they start up (mostly from the
ConfigurationService). The ServiceManagerSystem abstraction allows the DSpace ServiceManager to use
different systems to manage its services. The current implementations include Spring and Guice. This allows
DSpace 2 to have very little service management code but still be flexible and not tied to specific technology.
Developers who are comfortable with those technologies can consume the services from a parent Spring
ApplicationContext or a parent Guice Module. The abstraction also means that we can replace Spring/Guice or
add other dependency injection systems later without requiring developers to change their code. The interface
provides simple methods for looking up services by interface type for developers who do not want to have to
use or learn a dependency injection system or are using one which is not currently supported.
The DS2 kernel is compact so it can be completely started up in a unit test (technically integration test)
environment. (This is how we test the kernel and core services currently). This allows developers to execute
code against a fully functional kernel while developing and then deploy their code with high confidence.
Basic Usage
To use the Framework you must begin by instantiating and starting a DSpaceKernel. The kernel will give you
references to the ServiceManager and the ConfigurationService. The ServiceManager can be used to get
references to other services and to register services which are not part of the core set.
Access to the kernel is provided via the Kernel Manager through the DSpace object, which will locate the kernel
object and allow it to be used.
Standalone Applications
For standalone applications, access to the kernel is provided via the Kernel Manager and the DSpace object
which will locate the kernel object and allow it to be used.
bin/dspace
Web Applications
In web applications, the kernel can be started and accessed through the use of Servlet Filter/ContextListeners
which are provided as part of the DSpace 2 utilities. Developers don't need to understand what is going on
behind the scenes and can simply write their applications and package them as webapps and take advantage
of the services which are offered by DSpace 2.
Activators
Developers can provide an activator to allow the system to startup their service or provider. It is a simple
interface with 2 methods which are called by the ServiceManager to startup the provider(s) and later to shut
them down. These simply allow a developer to run some arbitrary code in order to create and register services if
desired. It is the method provided to add plugins directly to the system via configuration as the activators are
just listed in the configuration file and the system starts them up in the order it finds them.
Provider Stacks
Utilities are provided to assist with stacking and ordering providers. Ordering is handled via a priority number
such that 1 is the highest priority and something like 10 would be lower. 0 indicates that priority is not important
for this service and can be used to ensure the provider is placed at or near the end without having to set some
arbitrarily high number.
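The ordering rule above can be sketched as a comparator. This is an illustration only: the Provider class and the method names are hypothetical and are not the utilities shipped with the framework, and treating negative numbers like 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of priority ordering; not the DSpace utility classes. */
final class ProviderOrdering {
    private ProviderOrdering() {}

    /** A provider name plus the priority number described in the text. */
    static final class Provider {
        final String name;
        final int priority;

        Provider(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** 0 means "priority not important" and sorts last; otherwise a lower number sorts first. */
    private static int sortKey(Provider p) {
        // Assumption: negative values are treated like 0.
        return p.priority <= 0 ? Integer.MAX_VALUE : p.priority;
    }

    /** Returns provider names from highest to lowest priority. */
    static List<String> orderedNames(List<Provider> providers) {
        List<Provider> sorted = new ArrayList<>(providers);
        // List.sort is stable, so providers with equal priority keep their input order.
        sorted.sort(Comparator.comparingInt(ProviderOrdering::sortKey));
        List<String> names = new ArrayList<>();
        for (Provider p : sorted) {
            names.add(p.name);
        }
        return names;
    }
}
```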
Core Services
The core services are all behind APIs so that they can be reimplemented without affecting developers who are
using the services. Most of the services have plugin/provider points so that customizations can be added into
the system without touching the core services code. For example, let's say a deployer has a specialized
authentication system and wants to manage the authentication calls which come into the system. The
implementor can simply implement an AuthenticationProvider and then register it with the DS2 kernel's
ServiceManager. This can be done at any time and does not have to be done during Kernel startup. This allows
providers to be swapped out at runtime without disrupting the DS2 service if desired. It can also speed up
development by allowing quick hot redeploys of code during development.
Caching Service
Provides for a centralized way to handle caching in the system and thus a single point for configuration and
control over all caches in the system. Provider and plugin developers are strongly encouraged to use this rather
than implementing their own caching. The caching service has the concept of scopes so even storing data in
maps or lists is discouraged unless there are good reasons to do so.
Configuration Service
The ConfigurationService controls the external and internal configuration of DSpace 2. It reads Properties files
when the kernel starts up and merges them with any dynamic configuration data which is available from the
services. This service allows settings to be updated as the system is running, and also defines listeners which
allow services to know when their configuration settings have changed and take action if desired. It is the
central point to access and manage all the configuration settings in DSpace 2.
Manages the configuration of the DSpace 2 system. Can be used to manage configuration for providers and
plugins also.
EventService
Handles events and provides access to listeners for consumption of events.
RequestService
In DS2 a request is an atomic transaction in the system. It is likely to be an HTTP request in many cases but it
does not have to be. This service provides the core services with a way to manage atomic transactions so that
when a request comes in which requires multiple things to happen they can either all succeed or all fail without
each service attempting to manage this independently. In a nutshell this simply allows identification of the
current request and the ability to discover if it succeeded or failed when it ends. Nothing in the system will
enforce usage of the service, but we encourage developers who are interacting with the system to make use of
this service so they know if the request they are participating in has succeeded or failed and can take
appropriate actions.
SessionService
In DS2 a session is like an HttpSession (and generally is actually one) so this service is here to allow
developers to find information about the current session and to access information in it. The session identifies
the current user (if authenticated) so it also serves as a way to track user sessions. Since we use HttpSession
directly it is easy to mirror sessions across multiple servers in order to allow for no-interruption failover for users
when servers go offline.
Examples
In Spring:
<beans>
    <!-- Namespace declarations and the definition of the "dspace" bean are not shown. -->
    <bean id="dspace.eventService"
          factory-bean="dspace"
          factory-method="getEventService"/>
    <bean class="org.my.EventListener">
        <property name="eventService">
            <ref bean="dspace.eventService"/>
        </property>
    </bean>
</beans>
(org.my.EventListener will need to register itself with the EventService, for which it is passed a reference to that
service via the eventService property.)
or in Java:
(This registers the listener externally – the listener code assumes it is registered.)
Tutorials
Several tutorials on Spring / DSpace Services are available:
1.8 Schema
This database schema is not fully up to date with DSpace 4.0. Two additional tables that store item-level
versioning information were added in DSpace 3 and are currently not represented in this diagram.
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports
transactions. Presently, the browse indices use some features specific to PostgreSQL and Oracle, so some
modification to the code would be needed before DSpace would function fully with an alternative database
back end.
The org.dspace.storage.rdbms package provides access to an SQL database in a somewhat simpler form than
using JDBC directly. The main class is DatabaseManager, which executes SQL queries and returns TableRow
or TableRowIterator objects. The InitializeDatabase class is used to load SQL into the database via JDBC, for
example to set up the schema.
All calls to the Database Manager require a DSpace Context object. Example use of the database manager API
is given in the org.dspace.storage.rdbms package Javadoc.
The database schema used by DSpace is created by SQL statements stored in a directory specific to each
supported RDBMS platform:
The DSpace database code uses an SQL function getnextid to assign primary keys to newly created rows. This
SQL function must be safe to use if several JVMs are accessing the database at once; for example, the Web UI
might be creating new rows in the database at the same time as the batch item importer. The PostgreSQL-
specific implementation of the method uses SEQUENCES for each table in order to create new IDs. If an
alternative database backend were to be used, the implementation of getnextid could be updated to operate
with that specific DBMS.
The etc directory in the source distribution contains two further SQL files. clean-database.sql contains the SQL
necessary to completely clean out the database, so use with caution! The Ant target clean_database can be
used to execute this. update-sequences.sql contains SQL to reset the primary key generation sequences to
appropriate values. You'd need to do this if, for example, you're restoring a backup database dump which
creates rows with specific primary keys already defined. In such a case, the sequences would allocate primary
keys that were already used.
Versions of the .sql files for Oracle are stored in [dspace-source]/dspace/etc/oracle. These need to be copied
over their PostgreSQL counterparts in [dspace-source]/dspace/etc prior to installation.
The DSpace database can be backed up and restored using usual methods, for example with pg_dump and
psql. However when restoring a database, you will need to perform these additional steps:
The fresh_install target loads up the initial contents of the Dublin Core type and bitstream format
registries, as well as two entries in the epersongroup table for the system anonymous and administrator
groups. Before you restore a raw backup of your database you will need to remove these, since they will
already exist in your backup, possibly having been modified. For example, use:
After restoring a backup, you will need to reset the primary key generation sequences so that they do not
produce already-used primary keys. Do this by executing the SQL in [dspace-source]/dspace/etc/update-
sequences.sql, for example with:
Future updates of DSpace may involve minor changes to the database schema. Specific instructions on
how to update the schema whilst keeping live data will be included. The current schema also contains a
few currently unused database columns, to be used for extra functionality in future releases. These
unused columns have been added in advance to minimize the effort required to upgrade.
db.url: The JDBC URL to use for accessing the database. This should not point to a connection pool,
since DSpace already implements a connection pool.
db.driver: The JDBC driver class name. Since DSpace presently uses PostgreSQL-specific features, this
should be org.postgresql.Driver.
Bitstream Store
DSpace offers two means for storing content. The first is in the file system on the server. The second is using
SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API.
SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system.
Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially
unlimited storage and straightforward means to replicate (in simple terms, backup) the content on other local or
remote storage resources.
The terms "store", "retrieve", "in the system", "storage", and so forth, used below can refer to storage in the file
system on the server ("traditional") or in SRB.
The BitstreamStorageManager provides low-level access to bitstreams stored in the system. In general, it
should not be used directly; instead, use the Bitstream object in the content management API, since that
encapsulates authorization and other metadata to do with a bitstream that are not maintained by the
BitstreamStorageManager.
The bitstream storage manager provides three methods that store, retrieve and delete bitstreams. Bitstreams
are referred to by their 'ID'; that is the primary key bitstream_id column of the corresponding row in the
database.
As of DSpace version 1.1, there can be multiple bitstream stores. Each of these bitstream stores can be
traditional storage or SRB storage. This means that the potential storage of a DSpace system is not bound by
the maximum size of a single disk or file system and also that traditional and SRB storage can be combined in
one DSpace installation. Both traditional and SRB storage are specified by configuration parameters. Also see
Configuring the Bitstream Store below.
Stores are numbered, starting with zero, then counting upwards. Each bitstream entry in the database has a
store number, used to retrieve the bitstream when required.
At the moment, the store in which new bitstreams are placed is decided using a configuration parameter, and
there is no provision for moving bitstreams between stores. Administrative tools for manipulating bitstreams and
stores will be provided in future releases. Right now you can move a whole store (e.g. you could move store
number 1 from /localdisk/store to /fs/anotherdisk/store), but it would still have to be store number 1 and have the
exact same contents.
Bitstreams also have a 38-digit internal ID, different from the primary key ID of the bitstream table row. This is
not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative
to the relevant store directory) that the bitstream is stored in traditional or SRB storage. The first three pairs of
digits are the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal
ID as the filename.
(assetstore dir)/12/34/56/12345678901234567890123456789012345678
Using a randomly-generated 38-digit number means that the 'number space' is less cluttered than simply
using the primary keys, which are allocated sequentially and are thus close together. This means that the
bitstreams in the store are distributed around the directory structure, improving access efficiency.
The internal ID is used as the filename partly to avoid requiring an extra lookup of the filename of the
bitstream, and partly because bitstreams may be received from a variety of operating systems. The
original name of a bitstream may be an illegal UNIX filename.
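The mapping from internal ID to storage location described above can be sketched like this. The class and method names are hypothetical and are not taken from BitstreamStorageManager; only the layout (first three pairs of digits as directories, full ID as filename) comes from the text.

```java
/** Hypothetical helper showing the directory layout; not the BitstreamStorageManager code. */
final class AssetStorePaths {
    private AssetStorePaths() {}

    /**
     * Builds the path relative to the asset store directory: the first three
     * pairs of digits become nested directories and the full ID is the filename.
     */
    static String relativePath(String internalId) {
        if (internalId == null || !internalId.matches("\\d{38}")) {
            throw new IllegalArgumentException("expected a 38-digit internal ID");
        }
        return internalId.substring(0, 2) + "/"
                + internalId.substring(2, 4) + "/"
                + internalId.substring(4, 6) + "/"
                + internalId;
    }
}
```

For the internal ID in the example above, this yields 12/34/56/12345678901234567890123456789012345678.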
When storing a bitstream, the BitstreamStorageManager DOES set the following fields in the
corresponding database table row:
bitstream_id
size
checksum
checksum_algorithm
internal_id
deleted
store_number
The remaining fields are the responsibility of the Bitstream content management API class.
The bitstream storage manager is fully transaction-safe. In order to implement transaction-safety, the following
algorithm is used to store bitstreams:
1. A database connection is created, separately from the currently active connection in the current DSpace
context.
2. A unique internal identifier (separate from the database primary key) is generated.
3. The bitstream DB table row is created using this new connection, with the deleted column set to true.
4. The new connection is committed, so the 'deleted' bitstream row is written to the database.
5. The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and
filename derived from the internal ID
6. The deleted flag in the bitstream row is set to false. This will occur (or not) as part of the current DSpace
Context.
This means that should anything go wrong before, during or after the bitstream storage, only one of the
following can be true:
Similarly, when a bitstream is deleted for some reason, its deleted flag is set to true as part of the overall
transaction, and the corresponding file in storage is not deleted.
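The effect of the store algorithm can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not DSpace code: two maps stand in for the bitstream table and the asset store, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the bitstream table and asset store; all names are hypothetical. */
final class StoreSketch {
    /** Internal ID mapped to the value of the 'deleted' column. */
    final Map<String, Boolean> deletedFlag = new HashMap<>();
    /** Internal ID mapped to the stored file contents. */
    final Map<String, byte[]> files = new HashMap<>();

    /** Steps 3 to 5: the row is committed with deleted=true before the file is written. */
    void store(String internalId, byte[] data, boolean failWhileWriting) {
        deletedFlag.put(internalId, Boolean.TRUE);   // steps 3 and 4, on the separate connection
        if (failWhileWriting) {
            throw new IllegalStateException("simulated I/O failure");
        }
        files.put(internalId, data);                 // step 5
    }

    /** Step 6: happens only if the caller's own transaction commits. */
    void commitCallerTransaction(String internalId) {
        deletedFlag.put(internalId, Boolean.FALSE);
    }

    /** A bitstream is usable only when its row is not flagged deleted and its file exists. */
    boolean isUsable(String internalId) {
        return Boolean.FALSE.equals(deletedFlag.get(internalId)) && files.containsKey(internalId);
    }
}
```

In this model a failure at any point leaves at most a row flagged as deleted, possibly with a file beside it; a row that is visible to the rest of the system always has its file.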
Cleanup
The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream
database table and file store may contain a number of 'deleted' bitstreams. The cleanup method of
BitstreamStorageManager goes through these deleted rows, and actually deletes them along with any
corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just
in case cleanup is happening in the middle of a storage operation.
This cleanup can be invoked from the command line via the Cleanup class, which can in turn be easily
executed from a shell on the server machine using /dspace/bin/dspace cleanup. You might like to have this run
regularly by cron, though since DSpace is read-lots, write-not-so-much it doesn't need to be run very often.
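The one-hour safety margin used by cleanup can be expressed as a simple predicate. This is a sketch, not the Cleanup class itself: the names are hypothetical, and measuring age from a single timestamp per bitstream is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical predicate for the cleanup rule; not the DSpace Cleanup class. */
final class CleanupRule {
    /** Minimum age before a 'deleted' bitstream may be physically removed. */
    static final Duration MIN_AGE = Duration.ofHours(1);

    private CleanupRule() {}

    /** True when the row is flagged deleted and is more than one hour old at 'now'. */
    static boolean isEligible(boolean deleted, Instant lastModified, Instant now) {
        return deleted && Duration.between(lastModified, now).compareTo(MIN_AGE) > 0;
    }
}
```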
Backup
The bitstreams (files) in traditional storage may be backed up very easily by simply 'tarring' or 'zipping' the
assetstore directory (or whichever directory is configured in dspace.cfg). Restoring is as simple as extracting
the backed-up compressed file in the appropriate location.
Similar means could be used for SRB, but SRB offers many more options for managing backup.
It is important to note that, since the bitstream storage manager holds the bitstreams in storage and information
about them in the database, a database backup and a backup of the files in the bitstream store must be
made at the same time; the bitstream data in the database must correspond to the stored files.
Of course, it isn't really ideal to 'freeze' the system while backing up to ensure that the database and files match
up. Since DSpace uses the bitstream data in the database as the authoritative record, it's best to back up the
database before the files. This is because it's better to have a bitstream in storage but not the database
(effectively non-existent to DSpace) than a bitstream record in the database but not storage, since people would
be able to find the bitstream but not actually get the contents.
With DSpace 1.7 and above, there is also the option to backup both files and metadata via the AIP Backup and
Restore feature.
assetstore.dir = [dspace]/assetstore
(Remember that [dspace] is a placeholder for the actual name of your DSpace install directory).
assetstore.dir = [dspace]/assetstore_0
assetstore.dir.1 = /mnt/other_filesystem/assetstore_1
The above example specifies two asset stores. assetstore.dir specifies the asset store number 0 (zero); after
that use assetstore.dir.1, assetstore.dir.2 and so on. The particular asset store a bitstream is stored in is held in
the database, so don't move bitstreams between asset stores, and don't renumber them.
By default, newly created bitstreams are put in asset store 0 (i.e. the one specified by the assetstore.dir
property.) This allows backwards compatibility with pre-DSpace 1.1 configurations. To change this, for example
when asset store 0 is getting full, add a line to dspace.cfg like:
assetstore.incoming = 1
Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1,
which is /mnt/other_filesystem/assetstore_1 in the above example.
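How the store number maps to a property name, and how assetstore.incoming selects the target directory, can be sketched as below. The property names come from the examples above; the class and method names are hypothetical, and this is not how DSpace itself reads dspace.cfg.

```java
import java.util.Properties;

/** Hypothetical lookup of asset store directories from dspace.cfg-style properties. */
final class AssetStoreConfig {
    private AssetStoreConfig() {}

    /** Store 0 uses assetstore.dir; store n (n > 0) uses assetstore.dir.n. */
    static String dirKey(int storeNumber) {
        return storeNumber == 0 ? "assetstore.dir" : "assetstore.dir." + storeNumber;
    }

    /** Directory that receives new bitstreams; assetstore.incoming defaults to 0 when absent. */
    static String incomingDir(Properties cfg) {
        int incoming = Integer.parseInt(cfg.getProperty("assetstore.incoming", "0").trim());
        String dir = cfg.getProperty(dirKey(incoming));
        if (dir == null) {
            throw new IllegalStateException("no directory configured for asset store " + incoming);
        }
        return dir;
    }
}
```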
The same caution applies to SRB asset stores as well: the particular asset store a bitstream is stored in is held in the
database, so don't move bitstreams between asset stores, and don't renumber them.
For example, let's say asset store number 1 will refer to SRB. Then there will be a set of SRB account
parameters like this:
srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
srb.mcatzone.1 = mysrbzone
srb.mdasdomainname.1 = mysrbdomain
srb.defaultstorageresource.1 = mydefaultsrbresource
srb.username.1 = mysrbuser
srb.password.1 = mysrbpassword
srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain
srb.parentdir.1 = mysrbdspaceassetstore
Several of the terms, such as mcatzone, have meaning only in the SRB context and will be familiar to SRB
users. The last, srb.parentdir.n, can be used for additional (SRB) upper directory structure within an SRB
account. This property value could be blank as well.
(If asset store 0 referred to SRB, the properties would be srb.host = ..., srb.port = ..., and so on (.0 omitted), to be
consistent with the traditional storage configuration above.)
The similar use of assetstore.incoming to reference asset store 0 (default) or 1..n (explicit property) means that
new bitstreams will be written to traditional or SRB storage determined by whether a file system directory on the
server is referenced or a set of SRB account parameters are referenced.
There are comments in dspace.cfg that further elaborate the configuration of traditional and SRB storage.
6.4 History
Changes in 4.x
Changes in 3.x
Changes in 1.8.x
Changes in 1.7.x
Changes in 1.6.x
Changes in 1.5.x
Changes in 1.4.x
Changes in 1.3.x
Changes in 1.2.x
Changes in 1.1.x
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-3415 | administrative.js doEditCommunity wrong parameter name | Dec 07, 2016 | Feb 21, 2017 | Unassigned | Samuel Cambien (Atmire) | Closed | Fixed
DS-3431 | BasicWorkflow system is vulnerable to unauthorized manipulations | Dec 28, 2016 | Jul 12, 2017 | Pascal-Nicolas Becker | Pascal-Nicolas Becker | Closed | Fixed
DS-3519 | PasswordAuthentication getSpecialGroups empty exception catchblock | Mar 03, 2017 | Mar 04, 2017 | Unassigned | Jonas Van Goolen (Atmire) | Closed | Fixed
DS-3520 | Apache Commons Collections vulnerability (COLLECTIONS-580) | Mar 06, 2017 | Jun 07, 2017 | Tim Donohue | Alan Orth | Closed | Fixed
DS-3584 | when editing an eperson, trying to change its email address is ignored if another user already has that email address | Apr 26, 2017 | Jul 05, 2017 | Unassigned | Samuel Cambien (Atmire) | Closed | Fixed
DS-3647 | BasicWorkflow system is vulnerable to unauthorized | Jul 12, 2017 | Jul 12, 2017 | Pascal-Nicolas Becker | Pascal-Nicolas Becker | Closed | Fixed
6 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-2895 | Any registered user can modify inprogress submission | Nov 18, 2015 | Oct 13, 2016 | Andrea Bollini (4Science) | Andrea Bollini (4Science) | Closed | Fixed
DS-3097 | Bitstreams of embargoed and/or withdrawn items can be accessed by anyone | Mar 10, 2016 | Dec 08, 2016 | Andrea Bollini (4Science) | Mark H. Wood | Closed | Fixed
2 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-2702 | Cannot send email using SSL | Aug 13, 2015 | Oct 13, 2016 | Bram Luyten (Atmire) | Roeland Dillen | Closed | Fixed
 | XML External Entity (XXE) vulnerability in pdfbox | | | | | |
DS-3328 | ItemTest and CollectionTest fails in dspace-4_x branch | Sep 16, 2016 | Sep 16, 2016 | Pascal-Nicolas Becker | Pascal-Nicolas Becker | Closed | Fixed
DS-3330 | Test fails into dspace-4_x | Sep 19, 2016 | Sep 19, 2016 | Luigi Andrea Pascarelli | Luigi Andrea Pascarelli | Closed | Fixed
4 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-3063 | JSPUI Edit News feature can be used to view/edit other files readable to Tomcat user | Feb 15, 2016 | May 16, 2016 | Andrea Bollini (4Science) | Tim Donohue | Closed | Fixed
DS-3094 | XMLUI Directory Traversal Vulnerability in Themes | Mar 07, 2016 | May 16, 2016 | Tim Donohue | Tim Donohue | Closed | Fixed
2 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-2736 | XSS in JSPUI search form | Sep 02, 2015 | May 16, 2016 | Tim Donohue | Genaro Contreras | Closed | Fixed
DS-2737 | Expression language Injection in JSPUI search form | Sep 02, 2015 | May 16, 2016 | Tim Donohue | Genaro Contreras | Closed | Fixed
2 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-1702 | Cross-site scripting (XSS injection) is possible in JSPUI Recent Submissions listings | Oct 15, 2013 | Mar 25, 2015 | Luigi Andrea Pascarelli | Sean Xiao | Closed | Fixed
DS-1896 | XMLUI returns 500 response for most invalid "/static" URLs | Jan 30, 2014 | Feb 24, 2015 | Tim Donohue | Tim Donohue | Closed | Fixed
DS-2044 | Cross-site scripting (XSS injection) is possible in JSPUI Discovery search form | Jun 30, 2014 | Mar 25, 2015 | Luigi Andrea Pascarelli | Gabriela Mircea | Closed | Fixed
DS-2130 | XMLUI allows access to theme XSL files | Sep 03, 2014 | Aug 01, 2016 | Tim Donohue | Hardy Pottinger | Closed | Fixed
DS-2445 | XMLUI Directory Traversal Vulnerability | Feb 05, 2015 | Mar 07, 2016 | Tim Donohue | Tim Donohue | Closed | Fixed
DS-2448 | JSPUI Path Traversal Vulnerability | Feb 09, 2015 | Mar 25, 2015 | Pascal-Nicolas Becker | Pascal-Nicolas Becker | Closed | Fixed
6 issues
Improvements in 4.2
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-913 | Ukrainian translation for Manakin web interface | May 30, 2011 | Jul 16, 2014 | Ivan Masár | Parhomenko Yaroslav | Closed | Fixed
DS-1906 | Shibboleth attributes may need to be reconverted | Feb 07, 2014 | Aug 01, 2016 | Hardy Pottinger | Pascal-Nicolas Becker | Closed | Fixed
DS-1932 | Complete Update translation of JSPUI pt_BR | Mar 01, 2014 | Mar 06, 2014 | Ivan Masár | Washington Ribeiro | Closed | Fixed
DS-1943 | Language Selection _ Turkish Option | Mar 11, 2014 | May 29, 2014 | Mark H. Wood | Sonmez CELIK | Closed | Fixed
DS-2047 | zh_TW language for dspace 4.1 jspui | Jul 02, 2014 | Jul 16, 2014 | Ivan Masár | Chunmin Tai | Closed | Fixed
5 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-1411 | uncaught NPE in stats-log-converter -m | Dec 03, 2012 | Mar 26, 2014 | Mark H. Wood | Ivan Masár | Closed | Fixed
DS-1584 | XMLUI "Browse by" sorting Bug | Jun 20, 2013 | Apr 23, 2014 | Bram Luyten (Atmire) | Denis Fdz | Closed | Fixed
DS-1919 | Solr Search Empty FilterQuery bug | Feb 20, 2014 | Aug 01, 2016 | Hardy Pottinger | Denis Fdz | Closed | Fixed
DS-1928 | OAI-PMH Identify response well-formed but invalid | Feb 27, 2014 | Jul 17, 2014 | Ivan Masár | Ondřej Košarko | Closed | Fixed
DS-1940 | DS-1867 caused error running "mvn package" | Mar 08, 2014 | Jul 28, 2014 | Tim Donohue | Mohsen | Closed | Fixed
DS-1944 | http://mobile.demo.dspace.org/xmlui/ | Mar 11, 2014 | Aug 01, 2016 | Hardy Pottinger | Thomas Misilo | Closed | Fixed
DS-1946 | Solr floods catalina.out with unwanted messages | Mar 12, 2014 | Apr 09, 2014 | Mark H. Wood | Mark H. Wood | Closed | Fixed
DS-1947 | Missing m-tweaks.js for mobile theme | Mar 13, 2014 | Jul 16, 2014 | Tim Donohue | Thomas Misilo | Closed | Fixed
DS-1957 | incorrect xml workflow script for oracle | Apr 01, 2014 | Jul 09, 2014 | Mark H. Wood | Roeland Dillen | Closed | Fixed
DS-1958 | Discovery OutOfMemoryError when indexing Large Bitstreams | Apr 02, 2014 | Jul 17, 2014 | Mark H. Wood | Mark Diggory | Closed | Fixed
DS-1961 | Use HTTPS with oss.sonatype.org repository | Apr 04, 2014 | Jul 28, 2014 | Mark H. Wood | Mark H. Wood | Closed | Fixed
DS-1970 | to many open files exception when update lucene index | Apr 15, 2014 | Aug 01, 2016 | Hardy Pottinger | Roeland Dillen | Closed | Fixed
DS-1971 | 'bte-io' (v 0.9.2.3) dependency from EKT has an invalid SNAPSHOT dependency in its POM | Apr 15, 2014 | Aug 01, 2016 | Hardy Pottinger | Tim Donohue | Closed | Fixed
DS-1986 | REST API holds on to context for too long, should use DB pool | Apr 27, 2014 | Jul 16, 2014 | Peter Dietz | Peter Dietz | Closed | Fixed
DS-1998 | "dspace classpath" CLI command does nothing, throws error | May 11, 2014 | Jun 05, 2014 | Mark H. Wood | Ivan Masár | Closed | Fixed
DS-2004 | Catalan translation of Discovery strings | May 15, 2014 | May 16, 2014 | Ivan Masár | Àlex Magaz Graça | Closed | Fixed
DS-2013 | JSPUI with Oracle DB - Browse items with THUMBNAILS unimplemented | May 22, 2014 | Jun 05, 2014 | Mark H. Wood | Denis Fdz | Closed | Fixed
DS-2035 | dim crosswalk has missing values | Jun 24, 2014 | Jul 16, 2014 | Tim Donohue | Antoine Snyers (Atmire) | Closed | Fixed
DS-2036 | DSpace upgrade with oracle database, no discovery results | Jun 24, 2014 | May 08, 2015 | Kevin Van de Velde (Atmire) | Kevin Van de Velde (Atmire) | Closed | Fixed
DS-2038 | Oracle dspace-schema_3-4.sql upgrade script contains a minor error | Jun 24, 2014 | Aug 01, 2016 | Unassigned | Hardy Pottinger | Closed | Fixed
Improvements in 4.1
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-1860 | Community-list doesn't show all collections | Jan 13, 2014 | Jul 28, 2014 | Ivan Masár | Denis Fdz | Closed | Fixed
DS-1866 | "Consuming Web Services" curation task is missing documentation | Jan 15, 2014 | Jan 21, 2014 | Richard Rodgers | Tim Donohue | Closed | Fixed
DS-1911 | Update translation of JSPUI pt_BR | Feb 14, 2014 | Feb 14, 2014 | Ivan Masár | Tiago Murakami | Closed | Fixed
3 issues
Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution
DS-1531 | bug in current DSpace (3.1) with log importing | Apr 06, 2013 | Feb 05, 2014 | Mark H. Wood | James Halliday | Closed | Fixed
DS-1536 | having a DOT in handle prefix causes identifier.uri to be cut off when being created | Apr 17, 2013 | Sep 22, 2014 | Ivan Masár | Jose Blanco | Closed | Fixed
DS-1744 | Upgrade to latest log4j | Oct 30, 2013 | Dec 20, 2013 | Mark H. Wood | Mark H. Wood | Closed | Fixed
DS-1756 | Wrongly aligned text on item view in mobile theme | Nov 05, 2013 | Jan 29, 2014 | Ivan Masár | Marina Muilwijk | Closed | Fixed
DS-1757 | Missing images in mobile theme | Nov 05, 2013 | Jan 22, 2014 | Ivan Masár | Marina Muilwijk | Closed | Fixed
DS-1779 | Pagination link error in JSPUI discovery search | Nov 11, 2013 | Feb 20, 2014 | Kim Shepherd | Raul Ruiz | Closed | Fixed
DS-1795 | When run command dspace "dspace stat-initial" | Nov 17, 2013 | Feb 26, 2014 | Mark H. Wood | Anonymous (No Reply) | Closed | Fixed
DS-1816 | Incorrect label for search in community in navigation section. | Dec 02, 2013 | Jan 27, 2014 | Ivan Masár | Bavo Van Geit | Closed | Fixed
DS-1832 | Proxy configuration set in system properties if empty | Dec 11, 2013 | Dec 21, 2013 | Kevin Van de Velde (Atmire) | Kevin Van de Velde (Atmire) | Closed | Fixed
DS-1833 | Collection content source harvesting test does not work with ORE | Dec 11, 2013 | Jan 30, 2014 | Kevin Van de Velde (Atmire) | Kevin Van de Velde (Atmire) | Closed | Fixed
DS-1834 | Collection content source harvesting test does not check sets properly | Dec 11, 2013 | Feb 17, 2014 | Kevin Van de Velde (Atmire) | Kevin Van de Velde (Atmire) | Closed | Fixed
DS-1835 | Invalid bootstrap CSS | Dec 11, 2013 | Jan 29, 2014 | Mark H. Wood | Mark H. Wood | Closed | Fixed
DS-1846 | Cannot deposit new item via SWORD | Dec 18, 2013 | Jul 28, 2014 | Andrea Schweer | Àlex Magaz Graça | Closed | Fixed
DS-1848 | OAI harvest issues when starting from control panel/command line | Dec 20, 2013 | Feb 18, 2014 | Kevin Van de Velde (Atmire) | Kevin Van de Velde (Atmire) | Closed | Fixed
DS-1857 | Unhandled exception in BTE batch import when uploading CSV files with misconfiguration options | Jan 08, 2014 | Jan 31, 2014 | Kostas Stamatis | Kostas Stamatis | Closed | Fixed
DS-1863 | JSPUI eperson and group selection should | Jan 15, 2014 | Feb 20, 2014 | Ivan Masár | Denis Fdz | Closed | Fixed
DS-1867 | Maven build issues from [src]/dspace/ , error finding target/build.properties | Jan 15, 2014 | Apr 09, 2015 | Tim Donohue | Tim Donohue | Closed | Fixed
DS-1873 | CheckSum Checker Emailer sends emails for 0 issues, doesn't specify any site info | Jan 21, 2014 | Apr 09, 2015 | Tim Donohue | Tim Donohue | Closed | Fixed
DS-1878 | XMLUI mobile theme front page search doesn't recognize Discovery | Jan 23, 2014 | Jan 23, 2014 | Ivan Masár | Ivan Masár | Closed | Fixed
DS- Request Copy Feb 24, Sep 08, Ivan Masár Brian Closed Fixed
824 function for 2011 2015 Freels-
XMLUI and Stendel
JSPUI
DS- Recent items Mar 02, Aug 15, Keiji Suzuki João Melo Closed Fixed
831 addon : Listing 2011 2013
of most
recently added
items to
DSpace
DS- Create new Nov 28, Jun 13, Mark H. Stuart Closed Fixed
1083 users from the 2011 2013 Wood Lewis
command line
DS- (JSP)UI Import Aug 29, Nov 13, Andrea Andrea Closed Fixed
1252 from 2012 2013 Bollini Bollini
bibliographics (4Science) (4Science)
database
/formats
DS- EmailService Sep 20, Aug 23, Mark H. Mark H. Closed Fixed
1269 to encapsulate 2012 2013 Wood Wood
the sending of
mail
DS- Creative Oct 17, May 02, Ivan Masár Juan Closed Fixed
1336 Commons 2012 2013 Corrales
Locale Correyero
DS- "dspace Jan 17, Jan 23, Mark H. Ivan Closed Fixed
1456 version" 2013 2016 Wood Masár
command-line
script
DS- Add a way for Feb 08, Feb 06, Unassigned Tim Closed Fixed
1482 harvesters to 2013 2017 Donohue
find recently
added items
(request from
Google)
DS- Store link to Feb 08, Apr 13, Andrea Tim Closed Fixed
1483 "primary 2013 2016 Schweer Donohue
bitstream" in
citation_pdf_url
for Google
Scholar
(request from
Google)
DS- DOI support Apr 10, Mar 15, Mark H. Pascal- Closed Fixed
1535 for dspace-api 2013 2016 Wood Nicolas
Becker
DS- Stream May 30, Jun 13, Mark H. Mark H. Closed Fixed
1567 multiple 2013 2013 Wood Wood
commands into
one invocation
of bin/dspace
DS- Porting Aug 01, Dec 12, Andrea Keiji Closed Fixed
1613 curation task 2013 2013 Bollini Suzuki
administrative (4Science)
UI to JSPUI
DS- Porting of the Aug 12, Nov 03, Andrea Andrea Closed Fixed
1622 Login As 2013 2013 Bollini Bollini
feature to (4Science) (4Science)
JSPUI
DS- Upgrade Aug 12, Oct 06, Andrea Andrea Closed Fixed
1623 DSpace-SOLR 2013 2015 Bollini Bollini
to SOLR 4 (4Science) (4Science)
DS- Reset Aug 12, Jul 28, Andrea Andrea Closed Fixed
1624 password in 2013 2014 Bollini Bollini
edit eperson (4Science) (4Science)
for
administrator
(as in XMLUI)
DS- Sherpa/Romeo Aug 17, Jul 08, Andrea Andrea Closed Fixed
1633 integration in 2013 2014 Bollini Bollini
the submission (4Science) (4Science)
upload step
DS- AJAX progress Aug 25, Apr 30, Andrea Andrea Closed Fixed
1639 bar for file 2013 2014 Bollini Bollini
upload in (4Science) (4Science)
JSPUI
DS- Curation Task Sep 03, May 08, Richard Richard Closed Fixed
1647 for Consuming 2013 2015 Rodgers Rodgers
Web Services
DS- Adopt/Create Sep 19, Nov 01, Peter Dietz Tim Closed Fixed
1657 an official 2013 2013 Donohue
DSpace REST
API
DS- New JSPUI Sep 25, Jul 28, Andrea Andrea Closed Fixed
1675 look & feel 2013 2014 Bollini Bollini
(4Science) (4Science)
Improvements in 4.0
DS- Remove dspace Aug 24, Jan 31, Mark H. Stuart Closed Fixed
286 /bin 2009 2013 Wood Lewis
/dspace_migrate
script
DS- SOLR - Spider Dec 22, May 08, Mark H. Peter Closed Fixed
790 detection to 2010 2015 Wood Dietz
match on
hostname or
useragent
DS- When 'mail. Dec 27, Aug 13, Mark H. usha Closed Fixed
792 server.disabled 2010 2013 Wood sharma
= true' put text
of email in log
file?
DS- Language Mar 16, May 08, Ivan Masár Claudia Closed Fixed
842 switch for xmlui 2011 2015 Jürgen
and some basic
i18n stuff
DS- EPerson Dec 01, May 21, Mark H. Mark H. Closed Fixed
1085 last_active field 2011 2014 Wood Wood
is defined but
never filled
DS- Show a single May 09, Jan 15, Unassigned Àlex Closed Fixed
1168 search box in 2012 2014 Magaz
the front page Graça
DS- use better Sep 07, May 08, Bram Ivan Closed Fixed
1259 image 2012 2015 Luyten Masár
downscaling (Atmire)
method in filter-
media
DS- Enable Oct 01, May 08, Kevin Van Mark Closed Fixed
1272 Discovery By 2012 2015 de Velde Diggory
Default in XMLUI (Atmire)
DS- batch-create Oct 29, May 30, Mark H. Ivan Closed Fixed
1355 users from 2012 2013 Wood Masár
command line
DS- A porting Oct 30, Nov 03, Andrea Keiji Closed Fixed
1360 advanced 2012 2013 Bollini Suzuki
embargo (4Science)
function to
JSPUI
DS- Discovery Dec 03, Feb 06, Ivan Masár Ivan Closed Fixed
1409 should obsolete 2012 2014 Masár
webui.strengths.
cache
DS- Testing of Jan 22, Jan 23, Ivan Masár Pascal- Closed Fixed
1459 dissemination 2013 2013 Nicolas
crosswalks Becker
DS- Add SOLR Jan 22, Oct 06, Mark H. Hilton Closed Fixed
1460 logging config 2013 2015 Wood Gibson
file
DS- Fix Jan 30, Nov 01, Ivan Masár Thomas Closed Fixed
1472 Capitalization of 2013 2013 Misilo
Submissions &
workflow tasks
in xmlui
messages.xml
DS- Improvement of Feb 04, Oct 23, Ivan Masár Thomas Closed Fixed
1475 Collection 2013 2013 Misilo
Dropdown
DS- "dc.date.issued" Feb 08, May 08, Unassigned Tim Closed Fixed
1481 is often 2013 2015 Donohue
incorrectly set
(reported from
Google)
DS- I18n in default. Feb 11, Nov 28, Bram Onivaldo Closed Fixed
1484 license and 2013 2013 Luyten Rosa
input-forms.xml (Atmire) Junior
DS- Stop ehcache Sep 14, Oct 29, Ivan Masár Mark H. Closed Fixed
1492 new-version 2012 2013 Wood
check
DS- make current May 01, Oct 23, Ivan Masár Ivan Closed Fixed
1542 interface 2013 2013 Masár
language
accessible in
DRI
DS- Item mapper search Dec 07, Aug 19, Keiji Suzuki Claudia Closed Fixed
402 case sensitive (jspui 2009 2013 Jürgen
only)
DS- Command line utility org. Jan 05, Nov 11, Mark H. Toni Closed Fixed
449 dspace.app.harvest. 2010 2013 Wood Prieto
Harvest -S throws
AuthorizeException
DS- 'dspace harvest -g' Jan 19, Aug 22, Mark H. Mark H. Closed Fixed
803 (ping) doesn't 2011 2013 Wood Wood
DS- file description at May 01, Aug 01, Mark H. Kostas Closed Fixed
888 UploadStep 2011 2013 Wood Maistrelis
DS- Monthly stats report Jul 11, Dec 04, Andrea Andrea Closed Fixed
951 ignores items archived 2011 2013 Schweer Schweer
on first and last day of
the month
DS- Browse by author or Aug 16, Nov 06, Unassigned Cedric Closed Fixed
992 subject with special 2011 2013 Devaux
characters
DS- xmlui "wildcard policy Feb 07, Aug 21, Ivan Masár james Closed Fixed
1119 admin tool" does nothing 2012 2013 bardin
DS- ItemImport BitStream Feb 24, Dec 04, Kostas Thomas Closed Fixed
1132 Registration does not 2012 2013 Stamatis Autry
properly set the
Description
DS- BinaryContentIngester in Mar 29, May 08, Richard Marco Closed Fixed
1149 SWORDv2 creates a 2012 2015 Jones Fabiani
new ORIGINALS bundle
every time a bitstream is
ingested to an Item
DS- collection view doesn't Jun 07, Jul 28, Kevin Van Ivan Closed Fixed
1188 show content by default 2012 2014 de Velde Masár
(Atmire)
DS- DSpace org.dspace.core. Jul 02, May 08, Unassigned DSpace Closed Fixed
1205 Context caching problem 2012 2013 @
Lyncode
DS- Only collections are Jul 18, Oct 11, Keiji Suzuki Àlex Closed Fixed
1212 exported when exporting 2012 2013 Magaz
a community Graça
IP authentication
configuration does not
apply netmask and
CIDR ranges correctly
DS- Provide a link to More Oct 10, Nov 05, Kevin Van Samuel Closed Fixed
1278 Submissions at the 2012 2013 de Velde Ottenhoff
bottom of Recent (Atmire)
Submissions
DS- Item without Title Oct 15, Dec 04, Kostas Claudia Closed Fixed
1322 inaccessible via the UI 2012 2013 Stamatis Jürgen
unless for the admin via
ID or Handle
DS- Clarify documentation: Oct 16, Dec 01, Kevin Van Tim Closed Fixed
1335 versioned items will re- 2012 2013 de Velde Donohue
enter Collection (Atmire)
workflow approval?
DS- Mobile XMLUI theme Oct 29, Nov 27, Ivan Masár Moises Closed Fixed
1357 fails to load when 2012 2013 A.
reloading item view page
DS- Adding supervisor order Nov 26, Nov 23, Keiji Suzuki Jonathan Closed Fixed
1399 bug 2012 2013 Blood
DS- ShibbolethAuthentication Dec 03, Aug 01, Hardy Ian Closed Fixed
1410 has multiple NPE and 2012 2016 Pottinger Boston
Findbugs issues
DS- Duplicate Headers when Dec 12, Dec 09, Ivan Masár Jonathan Closed Fixed
1422 bitstream has a comma 2012 2014 Blood
in the title (Chrome)
DS- XMLUI Mar 07, May 16, Tim Tim Closed Fixed
3094 Directory 2016 2016 Donohue Donohue
Traversal
Vulnerability in
Themes
1 issue
DS- XSS in JSPUI Sep 02, May 16, Tim Genaro Closed Fixed
2736 search form 2015 2016 Donohue Contreras
DS- Expression Sep 02, May 16, Tim Genaro Closed Fixed
2737 language 2015 2016 Donohue Contreras
Injection in
JSPUI search
form
2 issues
DS-1702 - Cross-site scripting (XSS injection) is possible in JSPUI Recent Submissions listings
DS-2044 - Cross-site scripting (XSS injection) is possible in JSPUI Discovery search form
DS-2445 - XMLUI Directory Traversal Vulnerabilities
Also resolves related, minor theme access issues DS-1896 and DS-2130.
DS-2448 - JSPUI Directory Traversal Vulnerability
DS- having a DOT in Apr 17, Sep 22, Ivan Jose Closed Fixed
1536 handle prefix 2013 2014 Masár Blanco
causes identifier.
uri to be cut off
when being created
DS- Unable to remove Aug 10, Jan 16, Andrea Andrea Closed Fixed
1619 items after 2013 2015 Bollini Bollini
enabling (4Science) (4Science)
SOLRBrowseDAOs
DS- Collection content Dec 11, Feb 17, Kevin Van Kevin Van Closed Fixed
1834 source harvesting 2013 2014 de Velde de Velde
test does not (Atmire) (Atmire)
check sets properly
DS- Get page refresh Jan 30, Jan 30, Kevin Van Kevin Van Closed Fixed
1893 after adding a 2014 2014 de Velde de Velde
value in the (Atmire) (Atmire)
submission forms
clears all metadata
in XMLUI
DS- OAI not always Jan 31, Jul 28, Kevin Van Kevin Van Closed Fixed
1898 closing contexts 2014 2014 de Velde de Velde
(Atmire) (Atmire)
DS- Discovery Apr 02, Jul 17, Mark H. Mark Closed Fixed
1958 OutOfMemoryError 2014 2014 Wood Diggory
when indexing
Large Bitstreams
DS- Use HTTPS with Apr 04, Jul 28, Mark H. Mark H. Closed Fixed
1961 oss.sonatype.org 2014 2014 Wood Wood
repository
DS- "dspace classpath" May 11, Jun 05, Mark H. Ivan Closed Fixed
1998 CLI command 2014 2014 Wood Masár
does nothing,
throws error
DS- JSPUI with Oracle May 22, Jun 05, Mark H. Denis Fdz Closed Fixed
2013 DB - Browse items 2014 2014 Wood
with
THUMBNAILS
unimplemented
9 issues
1 issue
DS- "clean_backups" removed from help section of build.xml Ivan Brian Freels-
1123 Masár Stendel
DS- In OAI-PMH "Identify" response, the <description> is no longer João Tim Donohue
1479 configurable Melo
DS- OAI 2.0 Bug (set & from/until parameters) João João Melo
1507 Melo
DS- DSpace's .gitignore wrongly ignores all *.properties files Tim Tim Donohue
1540 Donohue
DS- dspace-lni-client is detached from the project tree and won't build Mark H. Mark H.
1550 Wood Wood
DS- discovery.cfg doesn't use the "solr.server" setting from build.properties, Tim Tim Donohue
1593 hardcodes its own URL Donohue
DS- DSpace 3.2 OAI-PMH Functionality needs JDK 1.7 (Java 7) João Samuel
1609 Melo Ottenhoff
12 issues
DS- Porting the document type-based submission (DS-1127) to Ivan Masár Keiji
1361 JSPUI Suzuki
DS- Refactor SOLR Statistics to use OpenCSV or Apache Kevin Van de Velde Tim
1407 Commons CSV (Atmire) Donohue
DS- In OAI src for jquery uses an http only João Melo Thomas
1457 Misilo
3 issues
DS- NPE when removing roles from Collection workflow steps Kevin Van de Ian Boston
1416 Velde (Atmire)
DS- Thumbnails in discovery search results do not point to the item Ivan Masár Elvi S. Nemiz
1417
DS- OAI Harvester settings missing from oai.cfg João Melo Tim Donohue
1461
DS- StatisticsServlet attempts to show JSP twice when there are no Ivan Masár Bram Luyten
1464 reports (Atmire)
14 issues
DS- OAI Extended Addon : Adding filter and modifying capacities to the João Melo João Melo
829 OAI interface
DS- Created a DSpace API module to contain api changes Kevin Van Kevin Van
981 de Velde de Velde
(Atmire) (Atmire)
DS- DSpace Shibboleth authentication module needs to support Lazy Scott Phillips Scott Phillips
1012 Authentication, NetID based authentication, and additional EPerson
metadata
DS- Ensure that DSpace can run on java 7 Robin Taylor Kevin Van
1081 de Velde
(Atmire)
DS- Create controlled vocabulary support for the XMLUI. Kevin Van Kevin Van
1130 de Velde de Velde
(Atmire) (Atmire)
DS- New config setting to skip IP checks when authenticating a user Sands Fish Samuel
1192 Ottenhoff
DS- Batch import from basic bibliographic formats (Endnote, BibTex, RIS, Robin Taylor Kostas
1226 TSV, CSV) Stamatis
15 issues
DS- On login screen, keyboard input focus should be set to the first field (E- Peter Dietz Andrea
722 mail Address) so you don't have to use the mouse (XMLUI) Bollini
(4Science)
DS- Embargo Overhaul: Utilize ResourcePolicy Start and Stop datestamps for Unassigned Mark
908 enforcing embargo in DSpace Diggory
DS- Samuel
1078 Ottenhoff
DS- Handle authority and confidence fields in Bulk Editing Ivan Masár Keiji
1084 Suzuki
DS- Increase the default upload limit to the maximum allowed by cocoon (2GB) Scott Scott
1124 Phillips Phillips
DS- Refactor Browse related code out of InitializeDatabase into Robin Robin
1156 InitializeBrowseDatabase Taylor Taylor
DS- Cleanup various code comment typos and whitespace issues Tim Ivan
1158 Donohue Masár
DS- Refactor class InitializeDatabase to use Configuration Service rather than Robin Robin
1160 ConfigurationManager Taylor Taylor
DS- LDAP: if no adminUser is set, build the DN using the object_context Ivan Masár Samuel
1180 Ottenhoff
DS- Input Form Fields, fields with restricted visibility can't be made Unassigned Claudia
334 mandatory Jürgen
DS- When a "qualdrop_value" is set to "required", submit form always Kevin Van de Onivaldo
851 fails Velde (Atmire) Rosa
Junior
DS- DSpace Test supporting files get quickly out of date Mark H. Wood Mark
859 Diggory
DS- DSpaceControlledVocabulary always returns an empty list Mark H. Wood Ariel Lira
886 (SEDICI)
DS- Last modified timestamp doesn't trigger on bitstream delete Kevin Van de Bram
899 Velde (Atmire) Luyten
(Atmire)
DS- Concurrent task claiming and editing of metadata possible for same Kevin Van de Bill Hays
918 item in submission workflow Velde (Atmire)
DS- Mirage theme authority control popup (choice-support.js) breaks on Tim Donohue David
972 more results Chandek-
Stark
DS- Item view in Mirage theme broken when ORIGINAL or CONTENT Kevin Van de Jennifer
1039 bundle present and empty Velde (Atmire) Whitney
DS- 'ant help' refers to 'install_code' target which does not exist Mark H. Wood Mark H.
1043 Wood
DS- Select Collection step limits length of collection name, leading to Peter Dietz Peter Dietz
1044 difficulty in picking the correct collection.
DS- Items without date.accessioned are permanently sorted to the top of Scott Phillips Scott
1052 all date based searches. Phillips
DS- References to bitstreams not from the 'ORIGINAL' bundle are Scott Phillips Àlex Magaz
1055 shown in harvested items Graça
DS- When multiple authentication methods are enabled the Scott Phillips Scott
1056 LoginChooser will place a blank div prior to logging in Phillips
DS- Filenames and BitstreamFormat detection break on filenames with Kevin Van de Mark
1061 equal signs in them Velde (Atmire) Diggory
DS-1587 Update 1.7.x and 1.8.x branches for Git/GitHub Tim Donohue Tim Donohue
1 issue
DS-949 Curation needs to document queueing with workflow configuration Richard Wendy
Rodgers Bossons
DS- Increase the default upload limit to the maximum allowed by Scott Phillips Scott Phillips
1124 cocoon (2GB)
2 issues
DS- Concurrent task claiming and editing of metadata possible for same Kevin Van de Bill Hays
918 item in submission workflow Velde (Atmire)
DS- System-wide Curation Task UI is missing a "Task" label Tim Donohue Tim Donohue
1107
DS- AIP Backup & Restore doesn't restore a Bitstream's "Sequence ID" Tim Donohue Tim Donohue
1108
DS- AIP Backup & Restore : SITE AIP has a different checksum Tim Donohue Tim Donohue
1120 every time when orphaned Collection/Community groups exist
DS- When adding a Bitstream to a Bundle, the 'bitstream_order' is Tim Donohue Tim Donohue
1122 always set to the 'sequence_id'
DS- Edit Harvesting Collection Content Source tab broken Kevin Van de Kevin Van de
1129 Velde (Atmire) Velde (Atmire)
8 issues
DS-1099 Bulgarian Translation for DSpace 1.8.1 Claudia Jürgen Vladislav Zhivkov
1 issue
DS- Last modified timestamp doesn't trigger on bitstream delete Kevin Van de Bram Luyten
899 Velde (Atmire) (Atmire)
DS- Items without date.accessioned are permanently sorted to the top Scott Phillips Scott Phillips
1052 of all date based searches.
DS- References to bitstreams not from the 'ORIGINAL' bundle are Scott Phillips Àlex Magaz
1055 shown in harvested items Graça
DS- When multiple authentication methods are enabled the Scott Phillips Scott Phillips
1056 LoginChooser will place a blank div prior to logging in
DS- Subscription email reports new items twice, sometimes. Scott Phillips Scott Phillips
1062
DS- Authentication error with external login in JSPUI Kevin Van de Kevin Van de
1064 Velde (Atmire) Velde (Atmire)
DS- Removing a metadata field from an item does not update the Scott Phillips Scott Phillips
1068 browse sorting indexes.
DS- Visualisation of static pages is broken in 1.8 Peter Dietz Àlex Magaz
1076 Graça
DS- XMLUI & CLI always show a NullPointerException after running a Tim Donohue Tim Donohue
1077 Site-wide Curation Task
DS- CC license process fails with java.lang. Peter Dietz Dan Ishimitsu
1090 NegativeArraySizeException.
DS- Potential NPE error when mapping items Scott Phillips Scott Phillips
1094
14 issues
DS- RSS feeds to support richer features, such as iTunes Peter Dietz Peter Dietz
528 Podcast or Media RSS
DS- Marker ticket for developing a Sword client for DSpace. Robin Taylor Robin Taylor
602
DS- check files on input for viruses, and verify file format Robin Taylor Jose Blanco
638
DS- Make launcher's classpath calculation available to external Mark H. Wood Mark H. Wood
737 scripts
DS- allow for bitstream display order to be changed Kevin Van de Velde Jose Blanco
749 (Atmire)
DS- Delete / withdraw items via bulk csv editing Stuart Lewis Stuart Lewis
811
DS- Add MARCXML crosswalk for OAI-PMH Robin Taylor Timo Aalto
848
DS-
984
DS- DSpace 1.8: Add Curation Task Groups to GUI Richard Rodgers Wendy
1001 Bossons
13 issues
DS- Need to remove all release repository and pluginRepository entries from Mark Mark
514 Maven poms. Diggory Diggory
DS- Ability to perform maintenance on SOLR with solr.optimize Ben Peter Dietz
615 Bosman
DS- Tidy up URL mapping for DisplayStatisticsServlet (JSPUI servlet that Kim Kim
690 handles solr statistics) Shepherd Shepherd
DS- Add ability to disable the building of particular DSpace modules Tim Tim
791 /interfaces from source code Donohue Donohue
DS- Adding Field to Choice Authority to allow Authorities to be able to know Mark Fabio
839 field being required Diggory Bolognesi
DS- Add Ability to create Top Level Community in at the home page. Mark Mark
840 Diggory Diggory
DS- Split the Creative Commons and Licence steps into two separate steps. Robin Robin
852 Taylor Taylor
DS- Licenses on non-DSpace files have been replaced by DSpace Tim Peter Dietz
854 boilerplate license Donohue
DS- CHANGES file now obsolete in SVN - point at online History Tim Tim
857 Donohue Donohue
DS- Improve Logging & XMLUI Error Handling of Curation Tools Tim Tim
896 Donohue Donohue
DS- Withdrawn items displayed as "restricted" rather than withdrawn Robin Taylor Tim
135 Donohue
DS- Single-argument Item.getMetadata does not work with mixed- Stuart Lewis Nicholas
215 case metadata Riley
DS- UI cosmetics, "My Exports" displayed in navigation bar, when no Robin Taylor Claudia
435 user is logged in Jürgen
DS- SOLR statistics file download displays all files and not only Kevin Van de Claudia
599 those in the Bundle Original Velde (Atmire) Jürgen
DS- Unfinished submissions see cc-rdf file instead of their uploaded Richard Rodgers Peter Dietz
612 PDF in the uploads step.
DS- Exceed maximum while uploading files got the user stuck should Peter Dietz Claudia
620 lead to a friendly error page Jürgen
DS- Wrong Parameter Name in web.xml comment Mark H. Wood Andy Smith
631
DS- IPAuthentication doesn't work with IPv6 addresses Mark H. Wood Stuart Lewis
642
DS- MetadataSchema: cache out of sync after calling delete() Claudia Jürgen Janne
761 Pietarila
DS- OAI-PMH ListRecords false no result answer and missing Unassigned Claudia
764 resumptionToken Jürgen
DS- All XMLUI Error Pages respond with 200 OK, instead of 404 Not Kim Shepherd Tim
768 Found Donohue
DS- SWORD deposits fail when ingest events are fired if Discovery Kim Shepherd Kim
785 event consumer is configured Shepherd
DS- HTTPS renders with errors due to a hardcoded HTTP link Peter Dietz Bram Luyten
789 (Atmire)
DS- Item.match() incorrect logic for schema testing Stuart Lewis Stuart Lewis
806
DS- jqueryUI javascript gets imported without corresponding CSS Ben Bosman Bram Luyten
808 (Atmire)
DS-1587 Update 1.7.x and 1.8.x branches for Git/GitHub Tim Donohue Tim Donohue
DS-1588 Update 1.7.x branch to build properly with Maven 3 Tim Donohue Tim Donohue
2 issues
Bug Fixes
DS- 'IllegalArgumentException: No such column rnum' error in DSpace 1.7.x Peter Hardy
841 XMLUI admin eperson (with Oracle backend) Dietz Pottinger
DS- XMLUI caches community / collection page which doesn't show a Peter Dietz Peter
871 recently submitted item immediately Dietz
DS- DSpace Configuration service error when using "dspace" script. Mark Kevin Van de
875 Diggory Velde
(Atmire)
3 issues
DS- Solr statistics documentation in DSpace manual and DSDOC is out-of-date, Kim Kim
720 wrong, and inconsistent with dspace.cfg Shepherd Shepherd
DS- Adding Field to Choice Authority to allow Authorities to be able to know field Mark Fabio
839 being required Diggory Bolognesi
DS- Add Ability to create Top Level Community in at the home page. Mark Mark
840 Diggory Diggory
DS- CHANGES file now obsolete in SVN - point at online History Tim Tim
857 Donohue Donohue
11 issues
DS- Single-argument Item.getMetadata does not work with mixed-case Stuart Nicholas
215 metadata Lewis Riley
DS- UI cosmetics, "My Exports" displayed in navigation bar, when no user is Robin Claudia
435 logged in Taylor Jürgen
DS- Exceed maximum while uploading files got the user stuck should lead to a Peter Dietz Claudia
620 friendly error page Jürgen
DS- Mirage theme - lists of unfinished submission/workflow task wrong link in Claudia Claudia
758 collection column Jürgen Jürgen
DS- MetadataSchema: cache out of sync after calling delete() Claudia Janne
761 Jürgen Pietarila
DS- Collection admin cannot add bitstreams unless there is at least one bundle Peter Dietz Eija Airio
776
DS- SWORD deposits fail when ingest events are fired if Discovery event Kim Kim
785 consumer is configured Shepherd Shepherd
DS- DSpace 1.7.0 only builds properly for Maven 2.2.0 or above Unassigned Tim
788 Donohue
DS- HTTPS renders with errors due to a hardcoded HTTP link Peter Dietz Bram
789 Luyten
(Atmire)
DS- jqueryUI javascript gets imported without corresponding CSS Ben Bram
808 Bosman Luyten
(Atmire)
DS- Empty dc.abstract dim field (in mets XML) creates an empty span tag, Ben Bram
809 causing page display errors in all Internet Explorer version Bosman Luyten
(Atmire)
DS- AbstractMETSIngester creates an item before adding descriptive metadata Tim Stuart
821 Donohue Lewis
DS- Autocomplete in authority control contains small errors in Mirage Ben Ben
843 Bosman Bosman
DS- Multicore SOLR needs prevent remote access to solr cores Mark Kim
858 Diggory Shepherd
DS- SWORD still uses dspace.url rather than dspace.baseUrl Kim Stuart
860 Shepherd Lewis
20 issues
DS- Provide metatags used by Google Scholar for enhanced indexing Sands Sarah
396 Fish Shreeves
DS- Add ability to export/import entire Community/Collection/Item structure (for Tim Tim
466 easier backups, migrations, etc.) Donohue Donohue
DS- Move item - inherit default policies of destination collection Stuart Stuart
525 Lewis Lewis
DS- PowerPoint Text Extraction for DSpace Media Filter Keith Keith
714 Gilbertson Gilbertson
12 issues
DS- Add ability for various Packager plugins to report their custom "options" Tim Tim
387 via command line Donohue Donohue
DS- Consider making the JSPUI styles.css.jsp a static file Stuart Lewis Stuart
467 Lewis
DS- Upgrade to latest Google Analytics tracking code Stuart Lewis Stuart
550 Lewis
DS- LC Authority Names - Lookup Feature - names w/o dates Kim Mark
557 Shepherd Diggory
DS- On login screen, keyboard input focus should be set to the first field (E- Andrea Oleksandr
561 mail Address) so you don't have to use the mouse (JSPUI) Bollini Sytnyk
(4Science)
DS- Use modified Cocoon Servlet Service Impl in place of existing to support Mark Mark
577 proper Cocoon Block addition. Diggory Diggory
DS- Error Handling in the XMLUI interface after section expired Tim Antero
613 Donohue Neto
DS- Make the timeout for the extended resolver dnslookup configurable Jeffrey Claudia
628 Trimble Jürgen
DS- Need Help Testing LNI refactoring changes in AIP Backup/Restore Work Unassigned Tim
647 Donohue
DS- Modern Browsers are not identified in XMLUI main sitemap.xmap Tim Tim
648 Donohue Donohue
DS- Brazilian Portuguese (pt_BR) translation for XML-UI 1.6.2 Claudia Erick
653 Jürgen Rocha
Fonseca
DS- xmlui hardcoded string in AuthenticationUtil.java - ID: 2088360 Mark H. Andrea Bollini
63 Wood (4Science)
DS- xmlui browse in empty collection displays "Now showing items 1-0" of 0 Scott Keith
123 - incorrect numbering Phillips Gilbertson
DS- Special groups shown for logged in user rather than for user being Stuart Stuart Lewis
242 examined Lewis
DS- XMLUI Item Mapper cannot handle multiple words in search box Stuart Tim Donohue
268 Lewis
DS- Item's submission license accessible without being configured to be Claudia Claudia
426 public Jürgen Jürgen
DS- Restricted Bitstream prompts for login, then forwards user to MyDSpace Kim Tim Donohue
431 Shepherd
DS- Accessing site-level 'mets.xml' in XMLUI doesn't work properly for Kim Tim Donohue
471 handle prefixes with periods (e.g. 2010.1) Shepherd
DS- Broken link in the documentation section 8.2.3. Jeffrey Robin Taylor
495 Trimble
DS- Date month and day get default values when user returns to describe Robin Gabriela
497 form Taylor Mircea
DS- Retrieving country names in SOLR can return ArrayIndexOutOfBounds Peter Peter Dietz
509 when country code is unchecked Dietz
DS- Malformed Japanese option values in the authority lookup window Kim Keiji Suzuki
537 Shepherd
DS- restricted items are being returned in OAI GetRecord method while Ben Ben Bosman
538 using harvest.includerestricted.oai Bosman
1 issue
DS- DSRUN does not start Service Manager Stuart Lewis Mark
516 Diggory
DS- Errors in 1.5.x -> 1.6.x and 1.6.0 - 1.6.1 upgrade steps Jeffrey Kim
604 Trimble Shepherd
DS- Batch metadata import missing item headers Stuart Lewis Stuart Lewis
608
7 issues
DS- Documentation for "schema" attribute in metadata xml files Jeffrey Keith
534 Trimble Gilbertson
DS- Use modified Cocoon Servlet Service Impl in place of existing to support Mark Mark
577 proper Cocoon Block addition. Diggory Diggory
7 issues
DS- Special groups shown for logged in user rather than for user being Stuart Lewis Stuart Lewis
242 examined
DS- CC License being assigned incorrect Mime Type during submission. Jeffrey Steven
295 Trimble Williams
DS- Accessing site-level 'mets.xml' in XMLUI doesn't work properly for Kim Shepherd Tim Donohue
471 handle prefixes with periods (e.g. 2010.1)
DS- Url in browser is incorrect after login Ben Bosman Ben Bosman
493
DS- Date month and day get default values when user returns to Robin Taylor Gabriela
497 describe form Mircea
DS- embargo-lifter command missing from launcher.xml Stuart Lewis Stuart Lewis
506
DS- Log Converter difference between docs (log-converter) and launcher Jeffrey Peter Dietz
507 (stats-log-converter) Trimble
DS- Retrieving country names in SOLR can return Peter Dietz Peter Dietz
509 ArrayIndexOutOfBounds when country code is unchecked
DS- Connection leak in SWORD authentication process Andrea Bollini Andrea Bollini
513 (4Science) (4Science)
DS- DSRUN does not start Service Manager Stuart Lewis Mark Diggory
516
DS- Reordering of 1.5 -> 1.6 upgrade steps in DSpace manual Jeffrey Stuart Lewis
523 Trimble
DS- Withdrawn items not shown as deleted in OAI Kim Shepherd John
527
DS- Malformed Japanese option values in the authority lookup window Kim Shepherd Keiji Suzuki
537
DS- restricted items are being returned in OAI GetRecord method while Ben Bosman Ben Bosman
538 using harvest.includerestricted.oai
DS- Give METS ingester configuration option to make use of Stuart Lewis Stuart Lewis
194 collection templates
DS- New -zip option for item exporter and importer Unassigned Stuart Lewis
204
DS- Creative Commons - option to set legal jurisdiction Unassigned Stuart Lewis
205
DS- Community Admin XMLUI: Delegated Admins Patch Andrea Bollini Tim Donohue
228 (4Science)
DS- Authority Control, and plug-in choice control for Metadata Jeffrey Trimble Larry Stone
236 Fields
DS- Contribution of @MIRE Solr Based Statistics Engine to Mark Diggory Mark Diggory
247 DSpace.
DS- Hide metadata from full item view Larry Stone Claudia Jürgen
288
DS- ItemUpdate - new feature to batch update metadata and Jeffrey Trimble Richard Rodgers
323 bitstreams (OLD acct)
DS- Add support for OpenSearch syndicated search Jeffrey Trimble Richard Rodgers
324 conventions (OLD acct)
DS- Create new session on login / invalidate sessions on Stuart Lewis Stuart Lewis
330 logout
DS- Add alternate file appender for log4j Graham Triggs Graham Triggs
359
DS- JSPUI tags/views for @mire Solr statistics module Kim Shepherd Kim Shepherd
363
DS- Item importer - new option to enable workflow notification Jeffrey Trimble Stuart Lewis
388 emails
20 issues
DS- Factor out common webapp installation - ID: 2042160 Mark H. Wood Charles
52 Kiplagat
DS- METS exposed via OAI-PMH includes description.provenance Stuart Lewis Stuart Lewis
196 information
DS- handle.jar 6.2 needs adding to DSpace Maven repository Mark Diggory Stuart Lewis
201
DS- IPAuthentication extended to allow negative matching Stuart Lewis Stuart Lewis
213
DS- Internal Server error - include login details of user Stuart Lewis Vanessa
219 Newton-Wade
DS- XMLUI 'current activity' recognises Google Chrome as Safari Stuart Lewis Stuart Lewis
221
DS- Configurable passing of Javamail parameter settings Stuart Lewis Stuart Lewis
234
DS- Bulk Metadata Editing: XMLUI aspect and forms Kim Shepherd Kim Shepherd
251
DS- Interpolate variables in the Subject: line of email templates as well Stuart Lewis Larry Stone
252
DS- Community Admin JSPUI: porting of the DS-228 patch Andrea Bollini Andrea Bollini
261 (4Science) (4Science)
DS- Make delegate admin permissions configurable Jeffrey Trimble Andrea Bollini
270 (4Science)
DS- README update for top level of dspace 1.6.0 package directory Stuart Lewis Van Ly
291
DS- Refactor SQL source and Ant script to avoid copying Oracle Larry Stone Larry Stone
297 versions over PostgreSQL
DS- Allow long values to be specified for the max upload request (for Graham Triggs Stuart Lewis
299 uploading files greater than 2Gb)
DS- Offer access in AbstractSearch to QueryResults for subclasses Ben Bosman Ben Bosman
307
DS- documentation on an added optional configuration parameter Jeffrey Trimble Ben Bosman
308
DS- Monthly statistics skip first and last of month - ID: 2541435 Stuart Lewis Charles
44 Kiplagat
DS- Links not working due to trailing white space in dspace.url Claudia Jürgen Claudia Jürgen
114
DS- File preview link during submission leads to page not found Claudia Jürgen Claudia Jürgen
118
DS- XMLUI Feedback form breaks with multiple hostnames Kim Shepherd Keith
121 Gilbertson
DS- xmlui browse in empty collection displays "Now showing items 1- Scott Phillips Keith
123 0" of 0 - incorrect numbering Gilbertson
DS- metadataschemaregistry_seq is not initialized correctly under Stuart Lewis Larry Stone
191 Oracle
DS- OAI RDF crosswalk fails when DC value is null Stuart Lewis Larry Stone
193
DS- Deleting a primary bitstream does not clear the Claudia Jürgen Graham Triggs
197 primary_bitstream_id on the bundle table
DS- File descriptions can not be removed/cleared in XMLUI Unassigned Kim Shepherd
198
DS- SWORD module doesn't accept X-No-Op header (dry run) Unassigned Claudio
199 Venturini
DS- SWORD module requires the X-Packaging header Stuart Lewis Claudio
200 Venturini
DS- Input form visibility restriction doesn't work properly Andrea Bollini Andrea Bollini
206 (4Science) (4Science)
DS- Context.java turnOffAuthorisationSystem() can throw a NPE Stuart Lewis Stuart Lewis
209
DS- NPE thrown during Harvest of non-items when visibility Stuart Lewis Stuart Lewis
212 restriction is enabled
DS- Migrating items that use additional metadata schemas causes Unassigned Stuart Lewis
216 an NPE
DS- Hardcoded String in the license bitstream Andrea Bollini Andrea Bollini
217 (4Science) (4Science)
DS- Email alerts due to internal errors are not sent, if context is Claudia Jürgen Claudia Jürgen
222 missing
DS- Usage event (statistics) Plugin hook for 1.5 (SF Mark H. Bradley McLean
108 2025998) Wood
DS-4 | Refactor LDAPServlet to use Stackable Authentication - ID: 2057231 | Stuart Lewis | Charles Kiplagat
DS-11 | 'My Account' disappears following exports - ID: 2495728 | Stuart Lewis | Charles Kiplagat
DS-13 | Fix for bug [1774958] Nested folders do not export correctly - ID: 2513300 | Stuart Lewis | Charles Kiplagat
DS-19 | Feature Request #1896717 Registration notification missin - ID: 2041754 | Stuart Lewis | Charles Kiplagat
DS-21 | Fix for hardcoded metadata language qualifiers - ID: 2433387 | Claudia Jürgen | Charles Kiplagat
DS-30 | Hardcoded String in jspui browse - ID: 2526153 | Claudia Jürgen | Charles Kiplagat
DS-31 | Bug 2512868 Double quote problem in some fields of JSPUI - ID: 2525942 | Claudia Jürgen | Charles Kiplagat
DS-34 | Add File Format Descriptions to XMLUI 1.5.x - ID: 2433852 | Unassigned | Charles Kiplagat
DS-35 | Enable Google Sitemaps for XMLUI - ID: 2462293 | Unassigned | Charles Kiplagat
DS-36 | DSpace 1.5 XMLUI - Enable METS <amdSec> using crosswalks - ID: 2477820 | Unassigned | Charles Kiplagat
DS-39 | Fix for toDate method in DCDate - ID: 2385187 | Stuart Lewis | Charles Kiplagat
DS-45 | Messages_th.properties for DSpace 1.5.1 JSPUI - ID: 2540683 | Unassigned | Charles Kiplagat
DS-46 | Bug 1617889 Years < 1000 do not display in simple item view - ID: 2524083 | Andrea Bollini (4Science) | Charles Kiplagat
DS-47 | Add support for rendering DOI links in JSPUI (1.4, 1.5) - ID: 2521493 | Andrea Bollini (4Science) | Charles Kiplagat
DS-85 | XMLUI Cocoon logs should not be stored under [xmlui-webapp]/WEB-INF/logs/ | Unassigned | Tim Donohue
DS-87 | XMLUI file download links break in Google search results if file 'sequence' number changes. | Tim Donohue | Tim Donohue
DS-94 | Verify Configuration Options are still applicable with the Cocoon User community. | Mark Diggory | Mark Diggory
DS-2 | "Not found" page returns 200 OK instead of 404 Not Found - ID: 2002866 | Mark H. Wood | Charles Kiplagat
DS-5 | DSpace1.5.1(XML) problem with Login to restricted bitstreams - ID: 2164955 | Stuart Lewis | Charles Kiplagat
DS-6 | XHTML Head Dissimination Crosswalk exposes provenance info - ID: 2343281 | Stuart Lewis | Charles Kiplagat
DS-7 | HTML tags not stripped in statistics display - ID: 1896225 | Stuart Lewis | Charles Kiplagat
DS-8 | DSpace Home link style in breadcrumb trail - ID: 1951859 | Stuart Lewis | Charles Kiplagat
DS-9 | Restricted Items metadata exposed via OAI - ID: 1730606 | Stuart Lewis | Charles Kiplagat
DS-10 | Implicit group for all registered users - ID: 1587270 | Stuart Lewis | Charles Kiplagat
DS-12 | Exception handling for deleting a metadata field - ID: 1606439 | Stuart Lewis | Charles Kiplagat
DS-14 | xmlui Administrative log in as another eperson - ID: 2086481 | Stuart Lewis | Charles Kiplagat
DS-15 | Submission verify page handles dc.identifier.* incorrectly - ID: 2155479 | Unassigned | Charles Kiplagat
DS-17 | DSpace 1.5 Controlled Vocab (edit-metadata.jsp) - ID: 1931796 | Stuart Lewis | Charles Kiplagat
DS-18 | DSpace 1.5.1(XMLUI) Wrong dir usage of StatisticsLoader - ID: 2137425 | Stuart Lewis | Charles Kiplagat
DS-20 | 2 Authentications with LoginPage cause connection exhaust - ID: 2352146 | Claudia Jürgen | Charles Kiplagat
DS-22 | News stored not language dependend - ID: 2125833 | Unassigned | Charles Kiplagat
DS-23 | DSQuery invalid check for empty query string - ID: 2343849 | Unassigned | Charles Kiplagat
DS-24 | Error in authorization to submit when you add collection. - ID: 1725817 | Unassigned | Charles Kiplagat
DS-25 | SWORD Service Document fails if Collection is untitled - ID: 1968082 | Stuart Lewis | Charles Kiplagat
(Scott Phillips) Fixed bug where users could not finish registering nor reset
their password because the authentication method signatures were changed.
Jay Paz (SF#1898241) Additional fixes to patch to enable reuse of methods.
Added the ability to manage sessions with site wide alerts to prevent users from authenticating.
Fixes a bug where the ability to edit an item during workflow step 2 is not displayed.
Jay Paz (SF#1898241) Add item Export from jspui and xmlui.
Added easy support for Google Analytics statistics
(Mark Diggory)
(Claudia Juergen)
Fix for SF bug #2090761 Statistics wrong use of dspace.dir for log location
Fix for SF bug #2081930 xmlui hardcoded strings in EditGroupForm.java
Fix for SF bug #2080319 jspui hardcoded strings in browse
Fix for SF bug #2078305 xmlui hardcoded strings used in UI in xmlui-api
Fix for SF bug #2078324 xmlui hardcoded strings used in UI in General-Handler.xsl
SF patch #2076066 Review in jspui submission non-dc metadata
SF Bug #1983859 added Foreign Lucene Analyzers to poms
SF Bug #1989916 - missing LDAP authentication key
(Stuart Lewis)
#1947036 Patch for SF Bug1896960 SWORD authentication and LDAP + 1989874 LDAPAuthentication
pluggable method broken for current users
Added copying of registration email template to 1.4 to 1.5 upgrade instructions
Fix for SF bug #2055941 LDAP authentication fails for new users in SWORD and Manakin
#1990660 SWORD Service Document are malformed / Corrected Atom publishing MIME types
Updated installation and configuration documents for new statistics script, and removed references to
Perl
(Tim Donohue)
Fix for SF bug #2095402 - Non-interactive Submission Steps don't work in JSPUI 1.5
Fix for SF bug #2013921 - Movement in Submission Workflow Causes Skipped Steps
Fix for SF bug #2015988 - Configurable Submission bug in SubmissionController
Fix for SF bug #2034372 - Resorting Search Results in JSPUI always gives no results
Updates to Community/Collection Item Counts (i.e. strengths) for XMLUI.
1.5 upgrade instructions were missing Metadata Registry updates necessary to support SWORD.
(Graham Triggs)
Fix various problems with resources potentially not being freed, and other minor fixes suggested by
FindBugs
Replace URLEncoder with StringEscapeUtils for better fix of escaping the hidden query field
Fix #2034372 - Resorting in JSPUI gives no results
Fix #1714851 - set eperson.subscription.onlynew in dspace.cfg to only include items that are new to the
repository
Fix issue where the browse and search indexes will not be updated correctly if you move an Item
Fix problem with SWORD not accepting multiple concurrent submissions
Fix #1963060 Authors listed in reverse order
Fix #1970852 - XMLUI: Browse by Issue Date "Type in Year" doesn't work
Statistics viewer for XMLUI, based on existing DStat. Note that this generates
the view from the analysis files (.dat) and does not require HTML report generation.
Fixed incorrect downloading of bitstream on withdrawn item
Add JSPUI compatible log messages to XMLUI transformers
Clean up use of ThreadLocal
Improved cleanup of database resources when web
application is unloaded
Fix bug #1931799 - duplicate "FROM metadatavalue"
Fixed Oracle bugs with ILIKE operators and LIMIT/OFFSET clauses
LNI (Lightweight Network Interface) service. Allows programmatic ingest of content via WebDAV or
SOAP.
SWORD (Simple Web-service Offering Repository Deposit): repository-standard ingest service using
Atom Publishing Protocol.
Highly configurable item web submission system. All submission steps are configurable not just
metadata pages.
Browse functionality allowing customisation of the available indexes via dspace.cfg and pluggable
normalisation of the sort strings. Integration with both JSP-UI and XML-UI included.
Extensible content event notification service.
Generation of Google and HTML sitemaps
Error pages now return appropriate HTTP status codes (e.g. 404 not found)
Bad filenames in /bitstream/ URLs now result in 404 error – prevents infinite URL spaces confusing
crawlers and bad "persistent" bitstream IDs circulating
Prevent infinite URL spaces in HTMLServlet
InstallItem no longer sets dc.format.extent, dc.format.mimetype; no longer sets default value for
dc.language.iso if one is not present
Empty values in drop-down submit fields are not added as empty metadata values
API methods for searching epeople and groups
Support stats from both 1.3 and 1.4
[dspace]/bin/update-handle-prefix now runs index-all
Remove cases of System.out from code executed in webapp
Change "View Licence" to "View License" in Messages.properties
dspace.cfg comments changed to indicate what default.language actually means
HandleServlet and BitstreamServlet support If-Modified-Since requests
Improved sanity-checking of XSL-based ingest crosswalks
Remove thumbnail filename from alt-text
Include item title in HTML title element
Improvements to help prevent spammers and sploggers
Make cleanup() commit outstanding work every 100 iterations
Better handling where email send failed due to wrong address for new user
Include robots.txt to limit bots navigating author, date and browse by subject pages
Add css styles for print media
RSS made more configurable and provide system-wide RSS feed, also moves text to
Messages.properties
Jar file updates (includes required code changes for DSIndexer and DSQuery and new jars fontbox.jar
and serializer.jar)
Various documentation additions and cleanups
XHTML compliance improvements
Move w3c valid xhtml boiler image into local repository
Remove unnecessary Log4j Configuration in CheckerCommand
Include Windows CLASSPATH in dsrun.bat
1604037 - UIUtil.encodeBitstream() now correctly encodes URLs (no longer incorrectly substitutes '+' for
spaces in non-query segment)
1592984 - Date comparisons strip time in org.dspace.harvest.Harvest
1589902 - Duplicate field checking error on input-forms.xml
1596952 - Collection Wizard create Template missing schema
1596978 - View unfinished submissions - collection empty
1588625 - Incorrect text on item mapper screen
1597805 - DIDL Crosswalk: wrong resource management
1605635 - NPE in Utils.java
1597504 - Search result page shows shortened query string
1532389 - Item Templates do not work for non-dc fields
1066771 - Metadata edit form dropping DC qualifier
1548738 - Multiple Metadata Schema, schema not shown on edit item page
1589895 - Not possible to add unqualified Metadata Field
1543853 - Statistics do not work in 1.4
1541381 - Browse-by-date and browse-by-title not working
1556947 - NullPointerException when no user selected to del/edit
1554064 - Fix exception handling for ClassCastException in BitstreamServlet
1548865 - Browse errors on withdrawn item
1554056 - Community/collection handle URL with / redirects to homepage
1571490 - UTF-8 encoded characters in licence
1571519 - UTF-8 in statistics
1544807 - Browse-by-Subject/Author paging mechanism broken
1543966 - "Special" groups inside groups bug
1480496 - Cannot turn off "ignore authorization" flag!
1515148 - Community policies not deleting correctly
1556829 - Docs mention old SiteAuthenticator class
1606435 - Workflow text out of context
Fix for bitstream authorization timeout
Fix to make sure cleanup() doesn't fail with NullPointerException
Fix for removeBitstream() failing to update primary bitstream
Fix for Advanced Search ignoring conjunctions for arbitrary number of queries
Fix minor bug in Harvest.java for Oracle users
Fix missing title for news editor page
Small Messages.properties modification (change of DSpace copyright text)
Fix PDFBox tmp file issue
Fix HttpServletRequest encoding issues
Fix bug in TableRow toString() method where NPE is thrown if tablename not set
Update DIDL license and change coding style to DSpace standard
Initial i18n Support for JSPs - Note: the implementation of this feature required changes to almost all JSP
pages
LDAP authentication support
Log file analysis and report generation
Configurable item licence viewing
Supervision order/collaborative workspace administrative tools
collection-home.jsp: changed
community-home.jsp: changed
community-list.jsp: changed
home.jsp: changed
dspace-admin/list-formats.jsp: changed
dspace-admin/wizard-questions.jsp: changed
search/results.jsp: changed
submit/cancel.jsp: changed
submit/change-file-description.jsp: changed
submit/choose-file.jsp: changed
submit/complete.jsp: changed
submit/creative-commons.jsp: changed
submit/edit-metadata.jsp: new
submit/get-file-format.jsp: changed
submit/initial-questions.jsp: changed
submit/progressbar.jsp: changed
submit/review.jsp: changed
submit/select-collection.jsp: changed
submit/show-license.jsp: changed
submit/show-uploaded-file.jsp: changed
submit/upload-error.jsp: changed
submit/upload-file-list.jsp: changed
collection-home: changed
community-home: changed
display-item: changed
dspace-admin/confirm-delete-collection: moved to tools/ and changed
dspace-admin/confirm-delete-community: moved to tools/ and changed
dspace-admin/edit-collection: moved to tools/ and changed
dspace-admin/edit-community: moved to tools/ and changed
dspace-admin/index: changed
dspace-admin/upload-logo: changed
dspace-admin/wizard-basicinfo: changed
dspace-admin/wizard-default-item: changed
dspace-admin/wizard-permissions: changed
dspace-admin/wizard-questions: changed
help/formats.html: removed
help/formats: changed
index: changed
layout/navbar-admin: changed
Administration
If you are logged in as administrator, you see admin buttons on item, collection, and community pages
New collection administration wizard
Can now administer collection's submitters from collection admin tool
Delegated administration - new 'collection editor' role - edits item metadata, manages submitters list,
edits collection metadata, links to items from other collections, and can withdraw items
Admin UI moved from /admin to /dspace-admin to avoid conflict with Tomcat /admin JSPs
New EPerson selector popup makes Group editing much easier
'News' section is now editable using admin UI (no more mucking with JSPs)
Import/Export/OAI
New tool that exports DSpace content in AIPs that use METS XML for metadata (incomplete)
OAI - sets are now collections, identified by Handles ('safe' with /, : converted to _)
OAI - contributor.author now mapped to oai_dc:creator
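The 'safe' set identifiers mentioned above can be illustrated with a minimal sketch. The function name and the sample Handle are hypothetical and are not taken from the DSpace code base; the sketch only shows the character substitution described in the list.

```python
def handle_to_set_spec(handle: str) -> str:
    """Replace '/' and ':' with '_' so a Handle can serve as an OAI set identifier."""
    return handle.replace("/", "_").replace(":", "_")


# Hypothetical sample Handle, for illustration only
print(handle_to_set_spec("hdl:123456789/42"))  # hdl_123456789_42
```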
Miscellaneous
Build process streamlined with use of WAR files, symbolic links no longer used, friendlier to later
versions of Tomcat
MIT-specific aspects of UI removed to avoid confusion
Item metadata now rendered to avoid interpreting as HTML (displays as entered)
Forms now have no-cache directive to avoid trouble with browser 'back' button
Bundles now have 'names' for more structure in item's content
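To illustrate what rendering metadata "as entered" means, the following sketch escapes HTML-significant characters before display. It uses Python's standard library rather than the JSP code DSpace actually uses, and the function name is an assumption made for this example.

```python
import html


def render_metadata_value(value: str) -> str:
    """Escape HTML-significant characters so the browser shows the value as typed."""
    return html.escape(value, quote=True)


print(render_metadata_value("<b>Title</b> & more"))
# &lt;b&gt;Title&lt;/b&gt; &amp; more
```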
Changed: dspace/jsp/collection-home.jsp
Changed: dspace/jsp/community-home.jsp
Changed: dspace/jsp/community-list.jsp
Changed: dspace/jsp/display-item.jsp
Changed: dspace/jsp/index.jsp
Changed: dspace/jsp/home.jsp
Changed: dspace/jsp/styles.css.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-advanced.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-collection-edit.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-community-edit.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-item-edit.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-main.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/authorize-policy-edit.jsp
Moved to dspace-admin: dspace/jsp/admin/collection-select.jsp
Moved to dspace-admin: dspace/jsp/admin/community-select.jsp
Moved to dspace-admin: dspace/jsp/admin/confirm-delete-collection.jsp
Moved to dspace-admin: dspace/jsp/admin/confirm-delete-community.jsp
Moved to dspace-admin: dspace/jsp/admin/confirm-delete-dctype.jsp
Moved to dspace-admin: dspace/jsp/admin/confirm-delete-eperson.jsp
Moved to dspace-admin: dspace/jsp/admin/confirm-delete-format.jsp
Moved to dspace/jsp/tools: dspace/jsp/admin/confirm-delete-item.jsp
Moved to dspace/jsp/tools: dspace/jsp/admin/confirm-withdraw-item.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/edit-collection.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/edit-community.jsp
Moved to dspace/jsp/tools and changed: dspace/jsp/admin/edit-item-form.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/eperson-browse.jsp
Moved to dspace-admin: dspace/jsp/admin/eperson-confirm-delete.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/eperson-edit.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/eperson-main.jsp
Moved to dspace/jsp/tools and changed: dspace/jsp/admin/get-item-id.jsp
Moved to dspace/jsp/tools and changed: dspace/jsp/admin/group-edit.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/group-eperson-select.jsp
Moved to dspace/jsp/tools and changed: dspace/jsp/admin/group-list.jsp
Moved to dspace-admin: dspace/jsp/admin/index.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/item-select.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/list-communities.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/list-dc-types.jsp
Removed: dspace/jsp/admin/list-epeople.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/list-formats.jsp
Moved to dspace/jsp/tools: dspace/jsp/admin/upload-bitstream.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/upload-logo.jsp
Moved to dspace-admin: dspace/jsp/admin/workflow-abort-confirm.jsp
Moved to dspace-admin and changed: dspace/jsp/admin/workflow-list.jsp
Changed: dspace/jsp/browse/authors.jsp
Changed: dspace/jsp/browse/items-by-author.jsp
Changed: dspace/jsp/browse/items-by-date.jsp
Changed: dspace/jsp/browse/no-results.jsp
New: dspace-admin/eperson-deletion-error.jsp
New: dspace/jsp/dspace-admin/news-edit.jsp
New: dspace/jsp/dspace-admin/news-main.jsp
New: dspace/jsp/dspace-admin/wizard-basicinfo.jsp
New: dspace/jsp/dspace-admin/wizard-default-item.jsp
New: dspace/jsp/dspace-admin/wizard-permissions.jsp
New: dspace/jsp/dspace-admin/wizard-questions.jsp
Changed: dspace/jsp/components/contact-info.jsp
Changed: dspace/jsp/error/internal.jsp
New: dspace/jsp/help/formats.jsp
Changed: dspace/jsp/layout/footer-default.jsp
Changed: dspace/jsp/layout/header-default.jsp
Changed: dspace/jsp/layout/navbar-admin.jsp
Changed: dspace/jsp/layout/navbar-default.jsp
Changed: dspace/jsp/login/password.jsp
Changed: dspace/jsp/mydspace/main.jsp
Changed: dspace/jsp/mydspace/perform-task.jsp
Changed: dspace/jsp/mydspace/preview-task.jsp
Changed: dspace/jsp/mydspace/reject-reason.jsp
Changed: dspace/jsp/mydspace/remove-item.jsp
Changed: dspace/jsp/register/edit-profile.jsp
Changed: dspace/jsp/register/inactive-account.jsp
Changed: dspace/jsp/register/new-password.jsp
Changed: dspace/jsp/register/registration-form.jsp
Changed: dspace/jsp/search/advanced.jsp
Changed: dspace/jsp/search/results.jsp
Changed: dspace/jsp/submit/cancel.jsp
New: dspace/jsp/submit/cc-license.jsp
Changed: dspace/jsp/submit/choose-file.jsp
New: dspace/jsp/submit/creative-commons.css
New: dspace/jsp/submit/creative-commons.jsp
Changed: dspace/jsp/submit/edit-metadata-1.jsp
Changed: dspace/jsp/submit/edit-metadata-2.jsp
Changed: dspace/jsp/submit/get-file-format.jsp
Changed: dspace/jsp/submit/initial-questions.jsp
Changed: dspace/jsp/submit/progressbar.jsp
Changed: dspace/jsp/submit/review.jsp
Changed: dspace/jsp/submit/select-collection.jsp
Changed: dspace/jsp/submit/show-license.jsp
Changed: dspace/jsp/submit/show-uploaded-file.jsp
Changed: dspace/jsp/submit/upload-error.jsp
Changed: dspace/jsp/submit/upload-file-list.jsp
Changed: dspace/jsp/submit/verify-prune.jsp
New: dspace/jsp/tools/edit-item-form.jsp
New: dspace/jsp/tools/eperson-list.jsp
New: dspace/jsp/tools/itemmap-browse.jsp
New: dspace/jsp/tools/itemmap-info.jsp
New: dspace/jsp/tools/itemmap-main.jsp
Improvements in 1.1.1
bin/dspace-info.pl now checks jsp and asset store files for zero-length files
make-release-package now works with SourceForge CVS
eperson editor now doesn't display the spurious text 'null'
item exporter now uses Jakarta's cli command line arg parser (much cleaner)
item importer improvements:
now uses Jakarta's cli command line arg parser (much cleaner)
imported items can now be routed through a workflow
more validation and error messages before import
can now use email addresses and handles instead of just database IDs
can import an item to a collection with the workflow suppressed
An item that is under submission and active edit by an authorized user. The workspace item is visible only to
the submitter and the system administrators. (Currently there is no simple way to find/browse such items other
than with the direct item ID or to use the supervisor functionality). Using the supervisor functionality, a system
admin can allow other authorized users to see/edit the item in the workspace state.
Self deposit
Collaboration over an in-progress submission for a small group of researchers. (This use case is
implemented only with major limitations, using the supervision feature – concurrency, lack of delegation:
supervision must be defined by the system administrators, etc.)
Workflow Item
An item that is under review for quality control and policy compliance. The workflow item is visible to the original
submitter (currently only basic metadata are visible out-of-box in the mydspace summary list), users assigned to
the specific workflow step where the item resides, and system administrators. (Currently there is no simple way
to find/browse such items other than with the direct item ID or to use the abort workflow functionality).
Quality control
Improvements to the bibliographic record (metadata available in workflow can be different than those
asked of the submitter)
Check of policy / copyright
Withdrawn item
It is a logical deletion. The Item can be restored and it can be used to keep track of what has been available for
a while on the public site.
Staging area for item to be removed when copyright issues arise with publisher. If the copyright issue is
confirmed, the item will be permanently deleted or kept in the withdrawn state for future reference.
Logical deletion delegated to community/collection admin, where permanent deletion is reserved to
system administrators
Logical deletion, where permanent deletion is not an option for an organization
Removal of an old version of an item, forcing redirect to a new up-to-date version of the item (this use
case is not currently implemented out-of-box in DSpace, see )
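The use cases above can be sketched as a small state model. This is not DSpace code: the class, the state names and the role names are hypothetical, and allowing delegated administrators to reinstate an item is an assumption of the sketch.

```python
from enum import Enum


class State(Enum):
    ARCHIVED = "archived"
    WITHDRAWN = "withdrawn"
    DELETED = "deleted"


ADMIN_ROLES = {"collection_admin", "system_admin"}  # hypothetical role names


class Item:
    def __init__(self):
        self.state = State.ARCHIVED

    def withdraw(self, role):
        # Logical deletion: delegated admins may do this as well as system admins.
        if role not in ADMIN_ROLES:
            raise PermissionError("only administrators may withdraw an item")
        if self.state is not State.ARCHIVED:
            raise ValueError("only an archived item can be withdrawn")
        self.state = State.WITHDRAWN

    def reinstate(self, role):
        # A withdrawn item can be restored (who may do so is assumed here).
        if role not in ADMIN_ROLES:
            raise PermissionError("only administrators may reinstate an item")
        if self.state is not State.WITHDRAWN:
            raise ValueError("only a withdrawn item can be reinstated")
        self.state = State.ARCHIVED

    def delete(self, role):
        # Permanent deletion is reserved to system administrators.
        if role != "system_admin":
            raise PermissionError("only system administrators may delete an item")
        self.state = State.DELETED
```

In this model a withdrawal never destroys anything, so it can be delegated safely, while the irreversible step stays with the system administrators.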
Private item
This state should only refer to the discoverable nature of the item. A private item will not be included in any
system that aims to help users to find items. So it will not appear in:
Browse
Recent submission
Search result
OAI-PMH (at least for the ListRecords and ListIdentifiers verb; though the OAI-PMH specification is not
clear about inconsistent implementation of the ListRecords and GetRecord verb)
REST list and search methods
It should be accessible under the actual ACL rules of DSpace using direct URL or query method such as:
Provide a light rights awareness feature where discovery is not enabled for search and/or browse
Hide “special items” such as repository presentations, guides or support materials
Hide an old version of an Item in cases where real versioning is not appropriate or desired
Hide specific types of item such as “Item used to record Journal record: Journal Title, ISSN, Publisher
etc.” used as authority file for metadata (dc.relation.ispartof) of “normal item”
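The difference between discoverability and access control can be sketched as follows. The data model, the function names and the sample handles are hypothetical and only illustrate the behaviour described above: a private item is left out of listings but can still be reached directly if the ACL allows it.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    handle: str
    discoverable: bool = True  # False models the "private" flag
    read_groups: set = field(default_factory=lambda: {"Anonymous"})


def list_items(items):
    """Browse/search style listing: private items are left out."""
    return [item for item in items if item.discoverable]


def get_item(items, handle, user_groups):
    """Direct access by handle: only the ACL decides, not discoverability."""
    for item in items:
        if item.handle == handle:
            return item if item.read_groups & user_groups else None
    return None


items = [Item("123456789/1"), Item("123456789/2", discoverable=False)]
print([item.handle for item in list_items(items)])           # ['123456789/1']
print(get_item(items, "123456789/2", {"Anonymous"}).handle)  # 123456789/2
```

For brevity the listing function ignores the ACL; a rights-aware listing would apply both checks.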
Archived/Published item
An item that is in a stable state, available in the repository under the defined ACL rule. Changes to these items
are possible only for a restricted group of users (administrators) and should produce versioning according to the
Institution's policy.
Embargoed items are a special case of Archived/Published Item. The item has some time-based access policy attached to it and/or
the underlying bitstreams. Specifically, read permission for someone (EPerson Group) starting from a
defined date. Typically embargo is applied to the bitstreams so that "fulltext" has initially very limited access
(normally administrators or other "repository staff" groups) and only after a defined date will the fulltext become
visible to all users (Anonymous group). This scenario is used to implement typical "embargo requirements" from
publishers -- see Delayed Open Access.
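A time-based read policy of this kind can be sketched as below. The `ReadPolicy` class, the `can_read` function and the sample dates are assumptions made for illustration and do not reproduce the DSpace authorization API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReadPolicy:
    group: str
    start_date: Optional[date] = None  # None means the policy applies immediately


def can_read(policies, user_groups, today):
    """True if any policy for one of the user's groups is already in effect."""
    return any(
        p.group in user_groups and (p.start_date is None or p.start_date <= today)
        for p in policies
    )


# Staff can read at once; everyone else only once the embargo date is reached.
policies = [
    ReadPolicy("Administrator"),
    ReadPolicy("Anonymous", start_date=date(2015, 1, 1)),
]
print(can_read(policies, {"Anonymous"}, date(2014, 12, 31)))  # False
print(can_read(policies, {"Anonymous"}, date(2015, 1, 1)))    # True
```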
If the metadata of the item should be visible only to a specific group of users, it is possible to define an embargo
policy also for the ITEM itself. A READ policy for a specific group will mean that only the users in that group will
be able to access the item splash page. Note that currently only some UIs (JSPUI/XMLUI) and in a very specific
configuration (discovery enabled as search provider, and the SOLRBrowseDAOs is used for the Browse
system) are fully rights aware. This means that in different UIs or with different configurations (legacy lucene
search or DBMS browse) some metadata of a restricted item could be exposed to unauthorized users. When
you need to work with UIs not fully rights aware, a workaround can be to use the "Private Item" flag to make the
item undiscoverable so that metadata will not be exposed to unauthorized users. Please note that this
workaround has several major limitations:
No one, not even authorized users, is able to find the item by browsing or searching the repository.
You need to externally manage a schedule that alerts you when the embargo has expired so that you may
re-enable the discoverable nature of the item.