Digitization Workflow Management System for Massive Digitization Projects

2000

Mohamed Yakout* (mohamed.yakout@bibalex.org), Noha Adly*† (noha.adly@bibalex.org), Magdy Nagi*† (magdy.nagi@bibalex.org)
* Bibliotheca Alexandrina, El Shatby 21526, Alexandria, Egypt
† Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt

Abstract: The Digitization Workflow Management System (DWMS) is a system developed at Bibliotheca Alexandrina (BA), the Library of Alexandria, to manage the whole process of digitization, including its various phases, system users, file movement, archiving, and integration with the ILS and the library digital repository. The system supports dynamic workflow evolution and deviation to allow for exception handling. It provides history tracking of actions and the flexibility to manage multiple projects with a diversity of materials simultaneously. Moreover, it supports ingesting a job in the middle of the workflow and allows easy integration of the tools used to perform functions of the workflow.

Index Terms: Digitization, Workflow, Digital Library.

I. INTRODUCTION

When an end user accesses images, PDFs, audio, video, or any other multimedia document through the Internet, the primary purpose of the entire digitization effort has been met. From start to finish, the production of digital multimedia documents is a manufacturing and delivery process that can be likened to an assembly line. To date, the digitization process in many libraries has concentrated on the input (scanning) side and on fulfillment via the Internet and Content Management Software. What has not received sufficient attention is automating, tracking, and managing the entire workflow, with particular emphasis on what happens between scanning and delivery. BA accepted this challenge as part of its Digital Assets Repository (DAR) project, aiming for a truly device-independent, integrated and automated workflow.
The result was the Digitization Workflow Management System (DWMS), a highly reliable digitization workflow management system that can be customized for large and challenging digitization projects or used out-of-the-box. In either case, the BA workflow system improves productivity and therefore reduces both the cost of production and the time it takes to complete a project.

A Digitization Laboratory requires an efficient and highly integrated digitization system consisting of hardware, software and workflow management processes that can fully exploit the unique capabilities of the Digital Lab. Several experiences with large digitization projects taught us the need for a highly integrated system that manages the whole process of digitization with its phases, system users, exception handling, history tracking of actions, file movement, archiving, and integration with the ILS and the library digital repository. Such a system needs to be flexible enough to manage multiple projects simultaneously with a diversity of materials covering books, journals, newspapers, manuscripts, unbound materials, audio, video, and slides. The system also needs to feed content seamlessly to the library's digital repository to ensure the preservation of the content for years to come.

Like any workflow, the digitization workflow is a description of a business process in sufficient detail that it can be directly executed by a workflow management system [1]. A digitization workflow is composed of a sequence of phases. The phases are undertaken by the digital lab resources, such as digital lab operators and devices (scanners or encoding servers). Production data are information objects, for instance TIFF, PDF, DJVU, ZIP or any other digital files, whose existence does not depend on workflow management. Several tools, such as image processing and OCR suites, are used to execute elementary activities within a digitization phase.
Workflow management systems are used to configure and control structured business processes from which well-defined workflow models and instances can be derived [2, 3]. However, the proprietary process definition frameworks they impose make it difficult to support (i) dynamic evolution (i.e. modifying process definitions during execution) following unexpected or developmental change in the business processes being modeled [4]; and (ii) deviations from the prescribed process model at runtime [5, 6, 7]. Without support for dynamic evolution, the occurrence of a process deviation requires either suspension of execution while the deviation is handled manually, or an entire process abort. However, since most processes are long and complex, neither manual intervention nor process termination is a satisfactory solution [8]. Manual handling incurs an added penalty: the corrective actions undertaken are not added to the system history or transaction log [9, 10], and so natural process evolution is not incorporated into future iterations of the process. Other evolution issues include problems of migration, synchronization and version control [5, 11].

In our experience, digitization is one of the business processes affected by the above limitations. A digitization process is hard to map to a rigid modeling structure [12], because such a framework, by definition, lacks flexibility. As a result, users are forced to work outside of the system and/or constantly revise the static process model in order to support their activities, thereby negating the efficiency gains sought by implementing a workflow solution in the first place. It is therefore desirable to extend the capabilities of workflow systems by developing an approach to dynamic flexibility based on natural work practices [15].
DWMS avoids the above limitations by letting the administrator design a fixed default workflow while still allowing dynamic evolution and deviations. This helps in handling exceptions: a job can be forwarded, without manual intervention, to an appropriate phase that is not next in the defined sequence. For example, when digitizing books, a printed book passes through scanning, processing, OCRing, archiving, and then encoding to PDF. A handwritten book, however, cannot go to the OCR phase and hence should be forwarded directly to the archiving and then the encoding phase. Jobs can also be redirected back to a previous phase following a quality assurance decision.

The rest of the paper is organized as follows. Section II gives an overview of related work on digitization workflow systems. The data model of DWMS is discussed in Section III, followed by a description of the system architecture in Section IV. Section V describes the life cycle of a job, and Section VI discusses implementation details. Finally, Section VII presents conclusions and future work.

II. RELATED WORK

In this section, we provide an overview of efforts and directions in digitization workflow systems. Currently, most existing digitization labs follow one of three directions: (1) manual workflow management using several software packages, (2) a simple tracking workflow system with limited capabilities, or (3) a single software application that integrates several digitization activities to perform all the digitization phases.

Manual workflow management is performed using tools such as Excel spreadsheets, which can be adequate when both the number of digital lab operators and the digitization projects are small. When the number of operators increases, some labs use, in addition to Excel sheets, tools such as Microsoft SharePoint and Microsoft Project for managing larger digitization projects.
Other labs may be able to hire programmers to develop a customized tracking system for their digitization workflow. In this case, the developed software deals with a rigidly defined workflow, where exceptions are handled manually and history tracking is very limited. The developed system depends on the lab environment and on the other software tools used within the digitization process. Any file handling and checking is tied to the tools currently in use, so if the tools change, the system needs extra programming and development.

In addition to the above directions, several commercial software companies have tried to develop digitization workflow systems by integrating several activities, for example digital capturing, image processing and OCRing, in one software package. DOCWorks from CCS [16] integrates several useful activities for digitizing textual documents such as books and newspapers. It provides tools for image processing, layout analysis, OCR and metadata extraction. It depends on the OCR software to analyze and extract metadata from a document. If the OCR results are poor, the software acts only as an image processor and PDF encoder. DOCWorks is limited to textual documents. BookRestorer from i2s [17] integrates the whole process of digitizing a printed book until it is encoded in PDF. It allows for integration with Photoshop and the currently installed OCR software. OUPS from the Academic Imaging Associates [18] integrates image capturing by supporting various scanners, image processing tools, OCRing and metadata extraction. It also provides metadata templates that can be customized.

The above-mentioned software tools can serve as a step or a phase in a bigger digitization workflow system. However, they suffer from several limitations. First, they are tightly coupled with certain tools and do not easily allow other tools to be integrated.
For example, the OCR engine in DOCWorks does not work well for Arabic, and the image processing tools in BookRestorer do not produce the best quality. Second, the systems do not manage digital lab resources such as Workstations and users, and they lack management of projects and collections. Third, all file handling between the storage server and the clients is done manually. Finally, the systems lack handling of workflow exceptions and do not allow for dynamic evolution and deviations except through manual intervention.

III. SYSTEM DATA MODEL

To achieve a truly device-independent, integrated and automated digitization workflow, the system introduces a data model capable of defining different workflows for various types of objects. Each type of object can have its workflow defined by what we call a Phase Sequence. The data model, shown in Fig. 1, consists of six types of entities involved in managing the digitization workflow.

Fig. 1. BA Workflow tracking system data model.

A. The Job

The Job is the main entity and represents the object being digitized, for example a printed book by Naguib Mahfouz, photos of an event, a map of Alexandria, a music sheet by Omar Khayrat, or a video film about the High Dam. A Job can be of any Job Type in the system and should pass through all the digitization Phases required for its Job Type. Each Job is identified by a unique ID and can also be identified by an External ID, which is the ID of the document in an external source, such as the ILS ID and/or a barcode. The Job has a priority and a lifetime in the workflow; if it exceeds this lifetime, it is reported as a Late Job.

B. The Job Type

The Job Type entity represents the types of materials that can be digitized. The Job Type can be book, map, audio, video, or any other type of document that needs a special digitization workflow.

C. The Phase

The Phase entity represents a task or a unit of work that should be applied to a specific Job Type in its digitization workflow. Each Job Type has its own sequence of Phases, defined a priori in a Phase Sequence, to obtain a digitized version of the Job. Each Job passes through several Phases according to its Job Type. For example, for a Printed Book Job Type, the digitization Phases could be Scanning, Processing, OCRing, Archiving and PDF Encoding, while for a Maps Job Type, the digitization Phases could be Scanning, Processing, and JPG Internet derivatives.

The same Phase can be performed on several Workstations in the system. Workstations are assigned to each Phase according to their capabilities: Scanning is done on Workstations with a book scanner, Processing is done on Workstations with image processing software, and so on. A time period is attached to each Phase, so that if the Phase takes longer, the Job is reported as a Late Job.

After finishing any Phase, Users are able to provide information about it. This information is divided into Phase-specific information, general comments, and file-level information. It helps the operator working on the next Phase, whether that is a new Phase or a previous one being revisited.

D. The User

The User entity represents the system Users, i.e. the Digital Lab operators. Several roles can be defined for Users to manage their access to Jobs. A User can perform several types of Phases of a specific Job Type. A User can also be assigned to work on some Collections in the system.

E. The Collection

The Collection entity represents a logical grouping of Jobs. It may represent a digitization project or a private collection. A Collection may contain several documents of different Job Types. A group of Users can be assigned to work on a Collection, and several Workstations in the system can be allocated to a Collection.

F. The Workstation

The Workstation entity represents the computer where the execution of the Phases is performed. Several Phases can be performed on one Workstation, and a Workstation can be allocated to several Collections.

IV. SYSTEM ARCHITECTURE

Fig. 2 shows the architecture of DWMS. The system provides all its services through five main modules: Check-In, Phase Manager, Reporting, Archiving and Administration. All modules provide their services after passing through an Authentication and Authorization Handler. The XML Phases Definition Handler accepts the requests related to applying the necessary checks and actions before and after performing a digitization Phase. The File Handler is used mainly by the XML Phases Definition Handler to manage file checks, copying and movement. All system configurations, parameterizations and transactions are stored in a database and managed through the Database Handler.

Fig. 2. DWMS system architecture.

A. The System Handlers

All the interfaces and services provided by the system are accessible through the Authentication and Authorization Handler. This handler is responsible for customizing the application interface for the logged-in User. Moreover, it authorizes each action or request submitted by the User.

The XML Phases Definition Handler is responsible for interpreting and applying the XML definition of the Phases. Each Phase has its own XML Phases Definition specifying the prerequisites and actions that need to be carried out before and after the Phase. The XML definition contains two main sections, Pre-Phase and Post-Phase. Each of these sections is composed of three subsections: Physical, Database and Reflection Call.
• Physical section: In the Pre-Phase section, the Physical section describes the folder and file structure required to start work in the Phase, and which of those folders and files should be copied to the client's working folder to execute the Phase. For example, it is possible to state that the OCR Phase cannot start unless there is an OTIFF folder with TIFF files and a PTIFF folder with TIFF files on the main file server, and that only the PTIFF folder needs to be copied from the file server to the client's working folder to perform the OCR Phase. In the Post-Phase section, the Physical section describes the file and folder structure required to complete the Phase. It also defines which of the folders and files should be returned to the file server. For example, when finishing the Processing Phase there should be a PTIFF folder with a number of TIFF files equal to the number in the OTIFF folder.
• Database section: This section is usually used in the Post-Phase section. It defines the structure of the database information that should be submitted after finishing the Phase. It lists and names the fields that should be filled in by the operator during work in the Phase. These fields are saved as XML text with the Phase information in the transaction log, for later reference and querying using XPath.
• Reflection Call section: In this section, DWMS allows the administrator to specify a Java function that should be executed either in the Pre-Phase or the Post-Phase. The function can start any process, including file management, data entry, zipping, or encoding the files. In this section, it is also possible to write the code needed to ingest the objects into the digital document repository.

An example of a Phase definition is shown in Fig. 3.
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      . .
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restircted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
    <Database>
      <Field Name="Font" DisplayName="Font Family: " />
      <Field Name="LrnPage" DisplayName="Learn Page : "/>
      . .
    </Database>
    <ReflectionCall Method="packageName.doSomething" />
  </PostPhase>
</Phase>

Fig. 3: XML Phase definition example

The File Handler component is used by the XML Phases Definition Handler to manage file copying and movement. It is responsible for the necessary FTP handling with the file server and for local file handling on the clients. All database interactions go through the Database Handler, which is responsible for interfacing with the database stored procedures.

B. The System Modules

The Check-In Module is responsible for creating a Job in the system and starting it in the workflow. Although the Check-In Module determines the first Phase of a Job from its Phase Sequence, the system can handle an exception and allow the Job to start from an intermediate Phase within the workflow, as long as the prerequisites of that Phase are met. For example, although the Phase Sequence of a printed book is defined to pass through the Phases Scanning, Processing, OCRing, Archiving, and Encoding, it is also possible to assign the book to the Processing Phase as long as the TIFF files are ready in the working folder. This allows the system to accommodate scanned books received from external sources.
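The Pre-Phase Physical checks, and the rule that a Job may enter at an intermediate Phase as long as that Phase's prerequisites are met, can be illustrated with a minimal sketch. It is written in Python for brevity (DWMS itself is implemented in Java) and is not the authors' code. The trimmed definition below is hypothetical, modeled on Fig. 3, and the reading of the attributes is an assumption: Type is treated as a file extension, Count="+" as "one or more files", and a numeric Count as an exact number.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Hypothetical, trimmed Phase definition modeled on Fig. 3.
PHASE_XML = """
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="PTIFF" Create="false" ToDestination="true" NewName="PTIFF">
        <File Name="ProcessedFiles" Type="tif" Count="+" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
  </PrePhase>
</Phase>
"""


def count_ok(rule: str, found: int) -> bool:
    """Assumed semantics: "+" means one or more files, a number means exactly that many."""
    if rule == "+":
        return found >= 1
    return found == int(rule)


def missing_prerequisites(phase_xml: str, job_root: Path) -> List[str]:
    """Check the Pre-Phase Physical section against a job folder.

    Returns a list of problems; an empty list means the Phase may start.
    """
    problems = []
    phase = ET.fromstring(phase_xml)
    for folder in phase.findall("./PrePhase/Physical/Folder"):
        name = folder.get("Name")
        folder_path = job_root / name
        if not folder_path.is_dir():
            problems.append(f"missing folder {name}")
            continue
        for file_rule in folder.findall("File"):
            ext = file_rule.get("Type")
            rule = file_rule.get("Count")
            found = len(list(folder_path.glob(f"*.{ext}")))
            if not count_ok(rule, found):
                problems.append(f"{name}: expected {rule} .{ext} file(s), found {found}")
    return problems
```

Under these assumptions, a book scanned elsewhere could be checked in directly at a later Phase whenever `missing_prerequisites` returns an empty list for that Phase's definition.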
The DWMS check-in has been designed and built to allow for integration with any metadata source, such as an Integrated Library System (ILS), a document registry, or MARC or MODS files. This flexible integration is achieved by building the Check-In Module on a plug-in basis, as shown in Fig. 4. The system allows each library to write its own check-in plug-in for its metadata source.

The Check-Out Module is responsible for ingesting the digital objects into the repository. It is implemented to allow for integration with the institution's repository, so that the objects can be ingested into it. The check-out logic can be written in the Reflection Call section of the XML Phases Definition explained earlier (see Fig. 3). This makes it possible to customize the Check-Out Module for integration with the digital document repository.

Fig. 4. DWMS Check-in and Check-out

The Phase Manager provides the interface for the digitization laboratory operator. It allows the operator to request a new Job to work on, download the working files, and submit the Job back to the system to continue in its workflow. Moreover, the Phase Manager allows the operator to reject a Job after starting work on it. In this case, the operator has to submit a rejection reason. The system then automatically assigns the Job to the administrator, who reviews the rejection reason and takes the necessary actions. The operator can also redirect the Job to another Phase that is not on its default path. The redirection is confirmed automatically if the operator has sufficient permission. Otherwise, the Job is automatically assigned to the administrator as pending until the administrator accepts or denies the redirection. The redirection may be due to a problem in a previous Phase.
For instance, while performing the OCR Phase, the operator might discover that some pages need further processing, or that some pages are missing and need to be scanned; the Job then needs to be returned to the Processing or Scanning Phase.

To simplify solving problems that originate in previous Phases of digitization, DWMS allows the operator to add information at the level of the produced files. The information contains the file numbers, the Phase that should be revisited, and the reason for the problem. This information is saved in the database. Once the Job revisits a previous Phase, the operator is informed of the files that require reprocessing and of the reasons. DWMS allows the administrator to define a list of reasons for revisiting each Phase, so that the operator can select the reason from a drop-down list. This information is propagated to the subsequent Phases so that the necessary actions are taken on the newly produced files. For example, suppose that in the OCR Phase files 20 to 25 are found to be missing and require re-scanning. The Job is redirected to the Scanning Phase with file information about the page numbers that require scanning. After re-scanning, the Job is forwarded to the Processing Phase with file-level information indicating that files 20 to 25 require processing, and so on.

The Administration Module is responsible for the necessary system parameterization and settings. It allows the administrator to define and manage the Job Types with their digitization workflows and Phases, the Roles of the Users, the Workstations, and the Collections. It also provides the facility to control the matrix covering the relations between Users, Workstations, Job Types, and Collections. For example, suppose the BA collection contains two Job Types, J1 and J2. The collection is handled on Workstations W1 and W2 by the Users U1 and U2. U1 works on Job Type J1, while U2 works on Job Types J1 and J2.
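The relation matrix in the example above can be sketched as a small data structure. This is an illustrative Python sketch only (DWMS itself is a Java system backed by a database); the class name `Collection` mirrors the entity in the data model, but the field names and the `may_work` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Collection:
    """One row group of the Users / Workstations / Job Types / Collections matrix."""
    name: str
    job_types: Set[str]
    workstations: Set[str]
    # Maps each User to the Job Types that User handles in this Collection.
    user_job_types: Dict[str, Set[str]] = field(default_factory=dict)

    def may_work(self, user: str, job_type: str, workstation: str) -> bool:
        """True if this user may work on this job type at this workstation."""
        return (
            job_type in self.job_types
            and workstation in self.workstations
            and job_type in self.user_job_types.get(user, set())
        )


# The BA collection from the example: J1 and J2 on W1 and W2,
# U1 handles J1 only, U2 handles both J1 and J2.
ba = Collection(
    name="BA",
    job_types={"J1", "J2"},
    workstations={"W1", "W2"},
    user_job_types={"U1": {"J1"}, "U2": {"J1", "J2"}},
)
```

With this setup, `ba.may_work("U1", "J2", "W1")` is false because U1 is only assigned Job Type J1, while `ba.may_work("U2", "J2", "W2")` is true.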
The Reporting Module provides the reports needed to manage the Jobs within the workflow. The module provides four types of reports:
• Workflow Tracking reports: provide the status of the Jobs in the system. For each Job Type, they give the number of Jobs pending, started, and finished within each Phase. They can also separate out the Jobs that are revisiting a Phase, to distinguish between the new and old Jobs in the system.
• Pending Items report: lists the Jobs redirected or rejected by the operators. The administrator can grant or deny a redirection through this report.
• Late Jobs report: lists the Jobs that have exceeded their due time, either within a Phase or in the whole workflow, since there is a due time for each Job and a due time for each Phase.
• Operators Rate report: helps supervisors and higher-level management to assess the overall production of the laboratory, and provides a tool for evaluating the operators.

In addition to the above reports, the module provides a query builder, with which the User can design custom reports on the Jobs. The module also provides search capabilities for Jobs by various attributes: Job ID, External ID, External ID type, title, creator, Collection, Job Type, and language. This search can be considered an access point to a Job, either to assign it or to retrieve it from an archive.

The Archiving Module is responsible for the archiving process. DWMS allows archiving on online storage, CDs, tapes, or all of these. It also allows the definition of a new media type with a specific capacity. The archiving information is ingested into DAR in the archiving metadata section. The integration between DWMS and DAR allows DAR to keep track of the different archived versions of the digitized objects.

C. The Quality Assurance Handling

In this section, we discuss Quality Assurance (QA), which is an important step in a digitization workflow. DWMS supports QA in two stages.
First, it allows QA information and decisions to be provided during each Phase. Second, a QA Phase is defined and configured to allow for a complete inspection of the produced digitized objects and all output files.

The first stage is achieved by giving operators the chance to provide QA information and decisions while the Job moves from one Phase to another in the digitization Phase Sequence. QA decisions are applied and the Job is automatically forwarded to the appropriate Phases. During each Phase, DWMS provides an interface to specify the erroneous files, the recommended Phase to revisit, and a possible reason for the problem.

The second stage is a QA Phase defined as a separate Phase in the digitization Phase Sequence. During this Phase, complete quality assurance is applied to the produced digitized objects and all output files. Fig. 5 shows an example of the XML Phase Definition of the Books QA Phase. The Physical subsection of the Pre-Phase section applies the necessary checks on the file and folder structures, confirming that the Phase Sequence has been completed successfully. The Database subsection of the Post-Phase section defines fields that help the operator specify whether the PDF is Image on Text or not, whether there is a wrong text and image pairing, whether pages are in the wrong order, and whether there is an error in the PDF file.

After the objects have been inspected, the Job is automatically forwarded to the earliest Phase in the Phase Sequence that needs to be redone. Once the Job has started in the redo Phase, DWMS can display two types of information to help the operator: first, the file-level QA information, which lists the erroneous files with a possible reason for each; and second, the QA information entered in the fields defined in the XML Phase Definition. This information speeds up the redoing of the Phase. The file-level QA information is propagated to the subsequent Phases in the digitization Phase Sequence so that the necessary actions are taken.
For example, suppose that in the QA of a printed book, files 10 to 15 require a re-scan. The QA file information records this. After the re-scan is finished, the QA file information indicates that files 10 to 15 require re-processing, then that files 10 to 15 require re-OCRing, and so on.

<Phase Name="Book QA">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      <Folder Name="PTIFF" Create="false" ToDestination="false" NewName="PTIFF" Mode="Restircted">
        <File Name="ProcessedFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      <Folder Name="TXT" Create="false" ToDestination="false" NewName="TXT" Mode="Restricted">
        <File Name="OCRedFiles" Type="afn" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      <File Name="" Type="pdf" Count="1" ToDestination="true" Compare=""/>
    </Physical>
  </PrePhase>
  <PostPhase>
    <Database>
      <Field Name="isIOT" DisplayName="Image on Text:"/>
      <Field Name="WrongTextImage" DisplayName="Wrong Text and Image Pairing: "/>
      <Field Name="WrongOrder" DisplayName="Pages in Wrong Order: "/>
      <Field Name="PDFerror" DisplayName="PDF Error: "/>
    </Database>
  </PostPhase>
</Phase>

Fig. 5: Books QA phase definition in BA Digital Lab.

D. Achieving flexibility using DWMS

Workflow management systems provide support for business processes that are generally predictable and repetitive. However, the prescriptive, assembly-line frameworks imposed by workflow systems limit the ability to model and enact flexible work practices where deviations are a normal part of every work activity [13]. For these environments, formal representations of business processes may be said to provide merely a contingency around which tasks can be formulated dynamically [14], rather than a prescriptive blueprint that must be strictly adhered to.
In this sense, a workflow process model may be considered a resource that mediates activities towards their objective. Rather than continuing to force digitization workflow processes into inflexible frameworks with limited success, a more adaptable approach is needed, based on accepted ideas of how people actually work. DWMS has been developed and implemented to be a flexible workflow management system that:
• Considers the digitization workflow defined for a Job Type as a Phase Sequence to be a guide, rather than a prescription;
• Provides the facility to define a list of Phases that may or may not be included in the default digitization Phase Sequence. The operator can assign the Job to any of these Phases;
• Provides the ability to forward Jobs dynamically to another Phase in the default Phase Sequence; and
• Allows the digitization Phase Sequence to be changed by adding or removing Phases. The new sequence is applied to the current and new Jobs in the system, leading to natural process evolution.

V. LIFE CYCLE OF A JOB

Of particular interest to a digital lab operator is the manner in which Jobs are advertised and ultimately bound to a specific operator for execution. Fig. 6 illustrates the life cycle of a Job in the form of a state transition diagram, from the time the Job is created by the Check-In Module to its final completion. Each node in Fig. 6 represents a possible state of a Job. Each edge is labeled with text describing how the transition is initiated, and indicates whether the transition requires a permission granted to the operator.
• Initially, a Job comes into existence in the Assign state to start its execution in a digitization Phase. A Job can be assigned to a specific operator, to a group of operators, or to any operator.
• The Job moves to the Start state once the operator has used the DWMS to download the Job's required files and folders into the operator's working folder. The required files are defined in the XML Phase Definition. Started Jobs can be either rejected or completed. If the operator rejects the Job because of a problem, the Job is automatically assigned to the administrator and listed in the Pending Jobs report; the administrator investigates and assigns the Job again to the appropriate Phase.
• If the operator completes the Phase, the Job may be submitted back to the system, where it returns to the Assign state for the next Phase in the Phase Sequence. If it is the last Phase, the Job is ingested into the repository and checked out of the DWMS. If the operator recommends a Phase other than the normal flow, the Job moves to the Redirect state, where it is automatically assigned to the administrator, who approves or denies the action and puts the Job in the Assign state, either to the system or to a specific operator.

Fig. 6: Life Cycle of a Job

VI. IMPLEMENTATION

The system was implemented on an open source platform. It is written in Java using the Eclipse 3.1 IDE and requires JRE 1.5. The system can run on any operating system with a Java Virtual Machine. The database is MySQL 5.1, which supports stored procedures and XPath queries. MySQL stored procedures are used for all database interactions. The Database Handler interfaces with the database through the MySQL JDBC driver 5.0. The system uses Java Reflection calls so that special code can be written to handle difficult Pre-Phase and Post-Phase actions. A Check-in plug-in has been implemented to import document metadata from Virtua VTLS, the ILS used in BA, and a Check-out plug-in has been implemented using Java Reflection to ingest digital objects into DAR, the Digital Assets Repository of BA.

VII. CONCLUSION AND FUTURE WORK

In this paper, we presented the DWMS implemented in the Bibliotheca Alexandrina. The system introduces a data model capable of defining different workflows for various types of objects. Flexible integration with both a source of metadata, such as an ILS, and a library digital document repository has been achieved by building the Check-in and Check-out modules of the system as plug-ins, allowing each library to write its own modules. Moreover, the Check-in module supports ingesting a Job in the middle of the workflow. The system adopts a flexible Job life cycle with history tracking of actions to support dynamic evolutions and deviations, allowing for exception handling.

DWMS provides all the tools required to manage the whole process of digitization, including its various Phases, system Users, files movement, and archiving. It provides the flexibility to manage multiple projects simultaneously with a diversity of materials covering books, journals, newspapers, manuscripts, unbound materials, audio, video, and slides, and allows easy integration of the tools used to perform functions of the workflow. DWMS is a highly reliable digitization workflow management system that can be customized for large and challenging digitization projects or used out-of-the-box. The system is based on an open source platform and is fully deployed in the BA digitization laboratory.

Future work includes:

• Check-out plug-in for Fedora. The system is currently integrated with the DAR repository system of BA. We are planning to build plug-in modules for integration with popular repositories, especially Fedora.
• Check-in plug-ins will be implemented to support various metadata standard formats such as MODS, DC, VAR, etc.
• Enhancing the software interface with graphical tools to help design and follow the digitization process.
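As an illustration of the Job life cycle described in Section V, the following Java sketch models the states and transitions of Fig. 6 as a small in-memory state machine. It is not the DWMS code. The class and method names, the PENDING and CHECKED_OUT state names, the phase names in the demo, the restriction of redirect targets to Phases in the default Phase Sequence, and the rule that a denied redirect falls back to the normal flow are all assumptions made for this example; the actual system keeps Jobs in MySQL and checks operator permissions, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Job states; ASSIGN, START and REDIRECT follow the text, the other two names are assumed. */
enum JobState { ASSIGN, START, REDIRECT, PENDING, CHECKED_OUT }

/** Hypothetical in-memory Job that records every transition in a history list. */
final class Job {
    private final List<String> phaseSequence;               // default Phase Sequence
    private final List<String> history = new ArrayList<String>();
    private int phaseIndex = 0;
    private JobState state = JobState.ASSIGN;               // a new Job starts in Assign
    private String requestedPhase;                          // set only while in REDIRECT

    Job(List<String> phaseSequence) {
        if (phaseSequence.isEmpty()) throw new IllegalArgumentException("empty Phase Sequence");
        this.phaseSequence = new ArrayList<String>(phaseSequence);
    }

    JobState state() { return state; }
    String currentPhase() { return phaseSequence.get(phaseIndex); }
    List<String> history() { return Collections.unmodifiableList(history); }

    /** Operator downloads the Job's files: Assign -> Start. */
    void start() { require(JobState.ASSIGN); moveTo(JobState.START, "start"); }

    /** Operator reports a problem: Start -> Pending (handled by the administrator). */
    void reject() { require(JobState.START); moveTo(JobState.PENDING, "reject"); }

    /** Operator finishes the Phase: next Phase in Assign, or check-out after the last Phase. */
    void complete() { require(JobState.START); advance("complete"); }

    /** Operator recommends a Phase outside the normal flow: Start -> Redirect. */
    void recommend(String phase) {
        require(JobState.START);
        indexOf(phase);                                      // validates the Phase name
        requestedPhase = phase;
        moveTo(JobState.REDIRECT, "recommend " + phase);
    }

    /** Administrator approves or denies the redirect; denial follows the normal flow (assumed). */
    void decideRedirect(boolean approved) {
        require(JobState.REDIRECT);
        if (approved) {
            phaseIndex = indexOf(requestedPhase);
            moveTo(JobState.ASSIGN, "redirect approved");
        } else {
            advance("redirect denied");
        }
        requestedPhase = null;
    }

    /** Administrator re-assigns a rejected Job to the appropriate Phase: Pending -> Assign. */
    void reassign(String phase) {
        require(JobState.PENDING);
        phaseIndex = indexOf(phase);
        moveTo(JobState.ASSIGN, "reassign");
    }

    private void advance(String action) {
        if (phaseIndex == phaseSequence.size() - 1) {
            moveTo(JobState.CHECKED_OUT, action);
        } else {
            phaseIndex++;
            moveTo(JobState.ASSIGN, action);
        }
    }

    private int indexOf(String phase) {
        int i = phaseSequence.indexOf(phase);
        if (i < 0) throw new IllegalArgumentException("unknown Phase: " + phase);
        return i;
    }

    private void require(JobState expected) {
        if (state != expected)
            throw new IllegalStateException("expected " + expected + " but was " + state);
    }

    /** Records the transition; the logged Phase is the one the Job is in after the action. */
    private void moveTo(JobState next, String action) {
        history.add(action + ": " + state + " -> " + next + " [" + currentPhase() + "]");
        state = next;
    }
}

class JobLifeCycleDemo {
    public static void main(String[] args) {
        // Phase names are illustrative only.
        Job job = new Job(Arrays.asList("Scanning", "Processing", "OCR"));
        job.start();
        job.complete();               // Scanning done, Job waits in Assign for Processing
        job.start();
        job.recommend("Scanning");    // operator asks for a rescan
        job.decideRedirect(true);     // administrator approves
        for (String line : job.history()) System.out.println(line);
    }
}
```

Calling a transition from the wrong state raises an IllegalStateException, which mirrors the idea that each edge in the diagram is only available from a specific node. The history list stands in for the system's history tracking of actions.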
ACKNOWLEDGEMENT

Many thanks go to Fadi Edward, Shehab Kamal, and Mohammed Abuouda for their efforts in the design and implementation phases of the system.