
© 2024 SAP SE or an SAP affiliate company. All rights reserved.

PUBLIC
2024-05-13

Document Information Extraction



Content

1 What Is Document Information Extraction? [page 5]
2 What's New for Document Information Extraction [page 8]
  2.1 2023 What's New for Document Information Extraction (Archive) [page 15]
  2.2 2022 What's New for Document Information Extraction (Archive) [page 37]
  2.3 2021 What's New for Document Information Extraction (Archive) [page 52]
  2.4 2020 What's New for Document Information Extraction (Archive) [page 66]
  2.5 2019 What's New for Document Information Extraction (Archive) [page 74]
3 Concepts [page 76]
4 Service Plans [page 77]
5 Metering and Pricing [page 79]
  5.1 Blocks of 100 Documents for Base Edition [page 79]
  5.2 Compute Hours for Base Edition [page 80]
  5.3 Blocks of 100 Documents for Premium Edition [page 82]
6 Supported Document Types and File Formats [page 84]
7 Supported Languages and Countries/Regions [page 86]
  7.1 Business Card: Languages [page 86]
  7.2 Invoice: Languages and Countries/Regions [page 87]
  7.3 Payment Advice: Languages and Countries/Regions [page 90]
  7.4 Purchase Order: Languages and Countries/Regions [page 91]
  7.5 Extraction Using Template: Languages [page 92]
  7.6 Extraction Using Generative AI: Languages [page 94]
8 Initial Setup [page 96]
  8.1 Enabling the Service in the Cloud Foundry Environment [page 97]
  8.2 Enabling the Service in the Kyma Environment [page 97]
9 Enable X.509 Authentication [page 98]
10 Run the Service in a Multitenant Application [page 100]
11 Tutorials [page 101]
12 Development [page 102]
  12.1 API Reference [page 102]
    Get Access Token [page 103]
    Capabilities API [page 104]
    Client API [page 106]
    Identifier API [page 110]
    Configuration API [page 115]
    Document API [page 127]
    Enrichment Data API [page 166]
    Schema API [page 184]
    Template API [page 211]
    Common Request Headers [page 226]
    Common Status and Error Codes [page 226]
  12.2 Notifications [page 227]
    Enabling Destination Service for Notifications [page 228]
    Creating Destination Configuration for Notifications [page 229]
    Supported Authentication Methods [page 231]
    Callback Request Examples [page 231]
    Callback Response Status [page 233]
13 Using the Document Information Extraction UI [page 234]
  13.1 Subscribing to the Document Information Extraction UI [page 234]
    Role Collections [page 236]
  13.2 Using the Key Features of the Document Information Extraction UI [page 237]
    Set Screen Language [page 237]
    Built-In Support [page 238]
    Document [page 239]
    Schema Configuration [page 245]
    Template [page 252]
14 Best Practices [page 258]
  14.1 Optical Character Recognition (OCR): Best Practices [page 258]
  14.2 Schema Configuration: Best Practices [page 259]
    Standard Document Types [page 260]
    Custom Document Types [page 262]
  14.3 Template: Best Practices [page 265]
    General Recommendations and Limitations [page 266]
    Standard and Custom Tables [page 267]
  14.4 Document: Best Practices [page 270]
  14.5 Data Enrichment: Best Practices [page 271]
  14.6 Extraction Using Generative AI: Best Practices [page 273]
15 Technical Constraints [page 275]
  15.1 Free Tier Option and Trial Account Technical Constraints [page 276]
16 Extracted Header Fields [page 278]
  16.1 Barcode Header Field in Invoice Documents [page 285]
17 Extracted Line Items [page 286]
18 Security [page 288]
  18.1 Data Protection and Privacy [page 288]
  18.2 Auditing and Logging Information [page 291]
  18.3 Front-End Security [page 293]
19 Accessibility Features in Document Information Extraction [page 294]
20 Monitoring and Troubleshooting [page 295]
  20.1 Getting Support [page 295]
  20.2 Troubleshooting [page 296]
    Problem: You Receive Status Code 4** [page 296]
    Problem: You Receive Status Code 400 [page 297]
    Problem: You Receive Status Code 401 [page 297]
    Problem: You Receive Status Code 413 [page 298]
    Problem: You Receive Status Code 415 [page 298]
    Problem: You Receive Status Code 422 [page 299]
    Problem: You Receive Status Code 429 [page 299]
    Problem: You Receive Status Code 500 [page 300]

1 What Is Document Information Extraction?

Automate your document information extraction processes.

Document Information Extraction helps you to process large amounts of business documents that have
content in headers and tables. You can use the extracted information, for example, to automatically process
payables, invoices, or payment notes while making sure that invoices and payables match. After you upload a
document file to the service, it returns the extraction results from header fields and line items.

Tip

• See Supported Document Types and File Formats [page 84].
• See also Supported Languages and Countries/Regions [page 86].

The service performs the following steps to extract information from the uploaded document file:

1. It runs optical character recognition (OCR) on the document.
2. It extracts the information from the document into a JSON file, which the user can query.

For more information, see API Reference [page 102].
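The second step above produces JSON that a client can query. The following minimal Python sketch shows one way to read such a result after it has been parsed into a dictionary. It assumes a layout in which `extraction.headerFields` is a list of field objects and `extraction.lineItems` is a list of rows, where each row is a list of field objects. It also assumes that each field carries `name`, `value`, and `confidence` keys. Verify these names against Get Result [page 138]. The helper functions and the sample values are hypothetical.

```python
from typing import Any

# Assumed result layout (verify against Get Result [page 138]):
# {"extraction": {"headerFields": [field, ...], "lineItems": [[field, ...], ...]}}
# where each field is a dict with at least "name", "value", and "confidence".


def header_values(result: dict[str, Any], min_confidence: float = 0.0) -> dict[str, Any]:
    """Map header field names to values, dropping fields below min_confidence."""
    fields = result.get("extraction", {}).get("headerFields", [])
    return {
        field["name"]: field["value"]
        for field in fields
        if field.get("confidence", 0.0) >= min_confidence
    }


def line_item_rows(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each line item (a list of field dicts) into one {name: value} dict."""
    rows = result.get("extraction", {}).get("lineItems", [])
    return [{field["name"]: field["value"] for field in row} for row in rows]


# Hypothetical sample shaped like the assumed layout above.
sample = {
    "extraction": {
        "headerFields": [
            {"name": "documentNumber", "value": "INV-001", "confidence": 0.95},
            {"name": "grossAmount", "value": 119.0, "confidence": 0.6},
        ],
        "lineItems": [
            [
                {"name": "description", "value": "Widget", "confidence": 0.9},
                {"name": "quantity", "value": 2, "confidence": 0.9},
            ],
        ],
    }
}

print(header_values(sample, min_confidence=0.8))  # {'documentNumber': 'INV-001'}
print(line_item_rows(sample))  # [{'description': 'Widget', 'quantity': 2}]
```

Filtering on confidence is only one possible client-side policy. The threshold of 0.8 is arbitrary, and a real integration might instead route low-confidence fields to manual review.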

You can also use the Document Information Extraction UI to consume the service. See Using the Document
Information Extraction UI [page 234] to find out how to subscribe, access, and use the user interface
application for the service.

With Document Information Extraction you can:

• Process more documents efficiently, with fewer errors and difficulties.
• Improve quality and compliance mechanisms.
• Reduce the time required to process a document.
• Allow the members of your organization to focus on more relevant tasks that are in their field of expertise.

Features

Automate information extraction: Automate the extraction of relevant information from business documents. The Document API takes document files as input and returns header fields and line items as structured data.

Automate data enrichment: Match a business document to enrichment data records based on the information extracted from the document. The Enrichment Data API takes document files as input and returns the ID of the matching enrichment data records.

Benefit from multitenancy support: Use this service in tenant-aware (multitenant) applications. Run them on a shared compute unit that can be used by multiple consumers (tenants).

Note

SAP may continuously improve the core features listed above and their functionalities provided as part of the Document Information Extraction cloud service, including automation, transaction processing, and machine learning on behalf of the customer.

Tip

Use the data feedback collection feature to allow confirmed documents to be used to improve the
Document Information Extraction service.

SAP uses the identity and position of the document-specific fields (see Extracted Header Fields [page 278]
and Extracted Line Items [page 286]) as a feedback signal to continuously retrain the machine learning
models of the service. With this approach, SAP is able to reduce errors over time when predicting field
values from documents.

This is a platform functionality reused by other applications. SAP reserves the right to reject documents
submitted for retraining.

For more information, see Create Configuration [page 115], Confirm Document [page 157] and Data
Protection and Privacy [page 288].

Environment

This service is available in the following environments:

• Cloud Foundry environment
• Kyma environment

Multitenancy Support

This service supports multitenancy. It can be used in tenant-aware applications.

For information on multitenancy support, see Run the Service in a Multitenant Application [page 100].

Prerequisites

See Initial Setup [page 96].

Technical Constraints

For information on technical limits, see Technical Constraints [page 275].

Regional Availability
Get an overview of the availability of Document Information Extraction by region, infrastructure provider, and release status in the Pricing tab of the SAP Discovery Center.

Trial Scope
Document Information Extraction is available for trial use. A trial account lets you try out SAP Business
Technology Platform (SAP BTP) for free and is open to everyone. Trial accounts are intended for personal
exploration, and not for productive use or team development. They allow restricted use of the platform
resources and services.

To activate your trial account, go to Welcome to SAP BTP Trial.

Note

See also the following information: Trial Accounts and Free Tier.

In the Cloud Foundry environment, you get a free trial account for Document Information Extraction, subject to the constraints described in Free Tier Option and Trial Account Technical Constraints [page 276].

2 What's New for Document Information Extraction

Technical Component | Environment | Title | Description | Action | Lifecycle | Type | Line of Business | Modular Business Process | Product | Latest Revision | Available as of

Enrichment Data API
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Recommended; Lifecycle: Deprecated; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-05-13; Available as of: 2024-05-13
The Enrichment Data API endpoint Delete Enrichment Data (Synchronous) - Deprecated [page 181] is now deprecated and scheduled for decommissioning in November 2024. After that, the endpoint will no longer be available.
Use the endpoint Delete Enrichment Data (Asynchronous) [page 182] to delete data records.

Better Models for the Extraction of Standard Document Types
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-05-13; Available as of: 2024-05-13
The machine learning models for the extraction of invoice, paymentAdvice, and purchaseOrder documents have been improved.

Better Extraction of rawValue for Standard Document Types and Fields
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-05-13; Available as of: 2024-05-13
The extraction of the rawValue response field has been improved for the standard document types and fields.
See Get Result [page 138].

Extracted Line Items
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-03-11; Available as of: 2024-03-11
You can now extract purchase order numbers that are available at line item field level from invoice documents.
See Extracted Line Items [page 286].

Configuration API
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-03-11; Available as of: 2024-03-11
You can now use the client scope configuration for the dataFeedbackCollection configuration key.
See Configuration Keys [page 117].

Post Catalog
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-03-11; Available as of: 2024-03-11
You can now filter documents based on schemaId.
See Post Catalog [page 134].

New Invoice Supported Language - Japanese
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-03-11; Available as of: 2024-03-11
The Document Information Extraction service now supports the Japanese language for invoice documents.
See Invoice: Languages and Countries/Regions [page 87].

Better Models for the Extraction of Standard Document Types
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Technology; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-03-11; Available as of: 2024-03-11
The machine learning models for the extraction of invoice, paymentAdvice, and purchaseOrder documents have been improved.

Better Extraction of Line Items from Multipage Documents with Template
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Technology; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-20; Available as of: 2024-02-20
The template algorithm has been enhanced. Document Information Extraction now delivers better results when extracting line items from multipage documents with a table header that appears only on the first page.

Combine Different Setup Types When Adding Data Fields to Schemas
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-20; Available as of: 2024-02-20
You can now combine header fields with different setup types in the same schema.
You can add header fields with the following setup types to a schema created for a standard document type:
• auto (with and without a default extractor)
• manual
You can add header fields with the following setup types to a schema created for a custom document type:
• auto (without a default extractor)
• manual
Restriction
The setup type auto without a default extractor (extraction using generative AI) is available only for schemas with the service plan Document Information Extraction, premium edition (premium_edition). See Service Plans [page 77] and Metering and Pricing [page 79].
See also Add Fields to Schema Version [page 199], Add Data Fields [page 247], and Setup Types [page 249].

Invoices - Conversion of Country Specific Unit of Measure Values to ISO Format
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-20; Available as of: 2024-02-20
The conversion of country-specific unit of measure values to ISO format for invoice documents has been improved.

Support for businessCard Documents in AWS Region Australia (Sydney)
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-20; Available as of: 2024-02-20
businessCard documents are now supported in the AWS region Australia (Sydney).
See Supported Document Types and File Formats [page 84].

Download Troubleshooting Data for Documents
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
You can now download data about documents added to the Document Information Extraction UI for use in troubleshooting any issues.
See Download Troubleshooting Data [page 241].

Model Used for Extraction
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
The Document API now includes information about the model used for extraction. As a result, you can see whether Document Information Extraction used a template or AI to extract information from a particular field.
See Get Result [page 138].

New Location for Schema Configuration Feature on UI
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
You can now open the Schema Configuration feature of the Document Information Extraction UI directly from the navigation bar on the left of the screen.
See Create Schema [page 246].

Extraction of Descriptions from Columns
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
We've fixed an issue with extracting description values from columns. Document Information Extraction now extracts the complete content of large column cells, for example, cells containing descriptions of numbers or quantities.

Extraction of Line Items
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
We've fixed an issue with extracting line items. Suppose the template returns an invalid extraction result for a line item, but the AI returns a valid extraction result for the same line item. In that case, the final result is now valid when Document Information Extraction merges the two results.

Get Templates Endpoint
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
The limit parameter of the Get Templates endpoint is now independent of the order parameter. To apply the limit parameter, you no longer need to specify a value for order.
See Get Template [page 213].

Display Description for Fields in Extraction Results
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
You can now display the description text for fields in the Extraction Results pane on the Document Information Extraction UI.
To view the description, open the Extraction Results pane and hover over the name of a header field or line item. A tooltip appears, which includes the description text.
See View and Edit Extraction Results [page 242].

Extracted Line Items - materialNumber and senderMaterialNumber Deprecation in SAP_purchaseOrder_schema
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: Deprecated; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
The line items materialNumber and senderMaterialNumber were replaced by supplierMaterialNumber and customerMaterialNumber, respectively, in the list of fields that you can extract from purchaseOrder documents when using the SAP_purchaseOrder_schema.
The legacy line items materialNumber and senderMaterialNumber are now deprecated and no longer available for purchaseOrder documents.
See Extracted Line Items [page 286].

Extracted Line Items - currencyCode Deprecation
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: Deprecated; Type: Changed; Line of Business: Intelligent Technologies; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2024-02-05; Available as of: 2024-02-05
We updated the list of line items that you can extract from purchaseOrder documents. The currencyCode line item is now deprecated and no longer available for extraction.
See Extracted Line Items [page 286].
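The setup-type rules in the 2024-02-20 entry on combining different setup types can be expressed as a small client-side check. Such a check could, for example, flag schema fields before they are added. The following is a minimal Python sketch that encodes only the rules stated in that entry. The setup-type labels, function names, and field names are hypothetical and are not identifiers from the Schema API.

```python
# Hypothetical labels for illustration only; they are not Schema API values.
AUTO_WITH_DEFAULT_EXTRACTOR = "auto (with default extractor)"
AUTO_WITHOUT_DEFAULT_EXTRACTOR = "auto (without default extractor)"  # generative AI
MANUAL = "manual"


def allowed_setup_types(standard_document_type: bool, premium_edition: bool) -> set[str]:
    """Return the header field setup types a schema accepts under the stated rules.

    - Standard document types: auto (with or without default extractor), manual.
    - Custom document types: auto (without default extractor), manual.
    - auto without a default extractor requires the premium_edition plan.
    """
    allowed = {MANUAL}
    if standard_document_type:
        allowed.add(AUTO_WITH_DEFAULT_EXTRACTOR)
    if premium_edition:
        allowed.add(AUTO_WITHOUT_DEFAULT_EXTRACTOR)
    return allowed


def check_schema(fields: dict[str, str], standard_document_type: bool,
                 premium_edition: bool) -> list[str]:
    """Return the names of fields whose setup type the schema would not accept."""
    allowed = allowed_setup_types(standard_document_type, premium_edition)
    return [name for name, setup in fields.items() if setup not in allowed]


# Hypothetical schema fields mapped to their setup types.
fields = {
    "documentNumber": AUTO_WITH_DEFAULT_EXTRACTOR,
    "projectCode": AUTO_WITHOUT_DEFAULT_EXTRACTOR,
    "remarks": MANUAL,
}

# Standard document type on a non-premium plan: the generative AI field is rejected.
print(check_schema(fields, standard_document_type=True, premium_edition=False))  # ['projectCode']
```

The sketch treats "auto without a default extractor" as a separate setup type because the entry's restriction applies only to that variant. A real integration should rely on the errors returned by the service rather than on a local copy of these rules.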

2.1 2023 What's New for Document Information Extraction (Archive)

Technical Component | Environment | Title | Description | Action | Lifecycle | Type | Line of Business | Modular Business Process | Product | Latest Revision | Available as of

Prefilled Setup Types for Schema Fields
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Technology; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2023-12-11; Available as of: 2023-12-11
When you add data fields to schemas, the service now prefills the Setup Type field with default values.
Depending on whether you use Document Information Extraction, premium edition or base edition, the default values are as follows:
• Premium edition
  • Schemas for standard and custom document types: auto
• Base edition
  • Schemas for standard document types: auto
  • Schemas for custom document types: manual
See Setup Types [page 249].

Support for X.509 Authentication
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Technology; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2023-12-11; Available as of: 2023-12-11
The Document Information Extraction APIs now support X.509 authentication.
See Enable X.509 Authentication [page 98].

Auditing and Logging Information
Technical Component: Document Information Extraction; Environment: Cloud Foundry; Action: Info only; Lifecycle: General Availability; Type: New; Line of Business: Technology; Modular Business Process: Not applicable; Product: SAP Business Technology Platform; Latest Revision: 2023-12-11; Available as of: 2023-12-11
New client-related events have been created.
See Auditing and Logging Information [page 291].

Title: Template API
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-12-11 | Available as of: 2023-12-11
Description: From now on, you can't download documents that are part of the template export package but haven't been malware-scanned during upload. You can download malware-scanned documents only.

See Export Template [page 223].
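As an illustration of this download restriction, the following sketch filters an export package down to the documents that may still be downloaded. The `ExportedDocument` record and its `malware_scanned` flag are hypothetical; the sketch does not reflect the service's actual export format.

```python
from dataclasses import dataclass


@dataclass
class ExportedDocument:
    """Hypothetical record for a document in a template export package."""

    document_id: str
    malware_scanned: bool  # True if the document was scanned during upload


def downloadable_documents(documents: list[ExportedDocument]) -> list[ExportedDocument]:
    """Keep only the documents that were malware-scanned during upload."""
    return [doc for doc in documents if doc.malware_scanned]


if __name__ == "__main__":
    package = [ExportedDocument("doc-1", True), ExportedDocument("doc-2", False)]
    print([doc.document_id for doc in downloadable_documents(package)])
```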

Title: Document Information Extraction UI
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-12-11 | Available as of: 2023-12-11
Description: There have been several security improvements in the Document Information Extraction UI.

See Using the Document Information Extraction UI [page 234].


Title: New Service Plan: Document Information Extraction, premium edition (premium_edition)
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-12-06 | Available as of: 2023-12-06
Description: The service plan Document Information Extraction, premium edition (premium_edition) is now generally available.

The premium_edition service plan allows you to use generative AI to automate use cases for business document processing with large language models (LLMs). Use generative AI to process business documents in more than 40 languages, and implement new business document use cases with shorter time to value.

You can also use an SAP BTP trial account to try out document information extraction using generative AI. Follow the tutorial: Use Trial to Extract Information from Custom Documents with Generative AI and Document Information Extraction.

See Service Plans [page 77] and Metering and Pricing [page 79].

See also Extraction Using Generative AI: Languages [page 94], Add Fields to Schema Version [page 199], Setup Types [page 249], and Extraction Using Generative AI: Best Practices [page 273].


Title: Template API
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: The Template API [page 211] is now generally available. You can now use the Template API endpoints to create, reuse, edit, and delete templates based on schemas and document types.

Title: Machine Translation Available for the Document Information Extraction SAP Help Portal Documentation
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: For your convenience, machine translation from the original and official English language is now available for the Document Information Extraction documentation on SAP Help Portal in the following languages:

• Chinese Simplified
• French
• German
• Italian
• Japanese
• Korean
• Portuguese
• Spanish

Title: Configuration API and Notifications
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2024-01-08 | Available as of: 2023-11-27
Description: In addition to the already available instance and tenant scopes, you can now also use the activateDocumentNotifications configuration key on client scope level to enable the Notifications [page 227] functionality and get notifications about the status of your processed documents.

See Configuration Keys [page 117].


Title: Better Model for the Extraction of invoice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: The machine learning model for the extraction of invoice documents has been improved. The improvements include better extraction results for currency, country, and date fields. Additionally, the service now supports the following countries/regions (and their corresponding languages) for invoice documents:

• Hungary (Hungarian)
• Romania (Romanian)
• Türkiye (Turkish)

See Invoice: Languages and Countries/Regions [page 87].

Title: Better Model for the Extraction of paymentAdvice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: The machine learning model for the extraction of paymentAdvice documents has been improved. The improvements include better extraction results for currency and country fields.

Title: Better Model for the Extraction of purchaseOrder Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: The machine learning model for the extraction of purchaseOrder documents has been improved. The improvements include better extraction results for currency, country, and date fields. Additionally, the service now supports the extraction of quantities with multipliers, for example, "2x5".
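The release note does not say how a quantity such as "2x5" is normalized. A minimal sketch, assuming that the multiplier form should be read as a product (2 × 5 = 10), could look like this; the function name and the accepted separators are hypothetical.

```python
import re
from decimal import Decimal

# Matches "<number> x <number>", e.g. "2x5" or "1.5 X 4".
# The accepted separators are an assumption made for this sketch.
_MULTIPLIER = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[xX×*]\s*(\d+(?:\.\d+)?)\s*$")


def normalize_quantity(raw: str) -> Decimal:
    """Return a quantity, multiplying out values written as 'AxB'."""
    match = _MULTIPLIER.match(raw)
    if match:
        return Decimal(match.group(1)) * Decimal(match.group(2))
    # Plain quantities are parsed as-is; invalid input raises InvalidOperation.
    return Decimal(raw.strip())


if __name__ == "__main__":
    for raw in ("2x5", "1.5 X 4", "12"):
        print(raw, "->", normalize_quantity(raw))
```

Using `Decimal` instead of `float` avoids binary rounding artifacts when quantities are later multiplied by unit prices.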


Title: Enrichment Data API
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: Deprecated | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-27 | Available as of: 2023-11-27
Description: The orderby parameter was replaced by order in December 2022.

The legacy orderby parameter is now deprecated and no longer available.

See List Data-Persistence Jobs [page 174].
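Clients that still send the legacy parameter need to switch to order. A minimal sketch of such a migration, assuming query parameters are held in a Python dict before being URL-encoded, is shown below; the helper name and the sample value `createdAt` are hypothetical, and the sketch does not validate the parameter's value format.

```python
from urllib.parse import urlencode


def migrate_query_params(params: dict) -> dict:
    """Return a copy of the query parameters with legacy 'orderby' renamed to 'order'."""
    migrated = dict(params)
    if "orderby" in migrated:
        if "order" in migrated:
            # Refuse to guess which of two conflicting sort settings is intended.
            raise ValueError("both 'orderby' and 'order' given; keep only 'order'")
        migrated["order"] = migrated.pop("orderby")
    return migrated


if __name__ == "__main__":
    legacy = {"orderby": "createdAt", "limit": 10}  # hypothetical values
    print(urlencode(migrate_query_params(legacy)))
```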

Title: New Generative AI Tutorial
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-10 | Available as of: 2023-11-10
Description: The tutorial Use Trial to Extract Information from Custom Documents with Generative AI and Document Information Extraction is now available.

Learn how to use Document Information Extraction with generative AI to automate the extraction of information from custom document types using large language models (LLMs).

Title: Data Feedback Collection for Model Improvement
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-05 | Available as of: 2023-11-05
Description: You can now use the feedback collection feature in the Document Information Extraction UI to consent to the use of confirmed documents to retrain the service's machine learning models.

See Confirm Documents [page 244].

Title: Document Information Extraction UI
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-29 | Available as of: 2023-11-05
Description: The look and feel of the Document Information Extraction UI has been updated to provide the latest SAP Fiori user experience.


Title: Edit Template
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-29 | Available as of: 2023-10-23
Description: You can now edit templates. In addition to changing the name and description, you can choose a different schema for the template. Changing the schema makes a new set of extraction fields available for the template.

If you've already edited extraction results for sample documents associated with your template, these edits are preserved following the change of schema if the relevant fields appear in both the old and the new schema.

See Edit Template [page 255].
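The preservation rule for edits can be modeled as a set intersection over field names. This sketch assumes that edits are keyed by field name; the helper and the example field names and values are hypothetical.

```python
def preserved_edits(
    edits: dict[str, str], old_fields: set[str], new_fields: set[str]
) -> dict[str, str]:
    """Keep only the edits whose field exists in both the old and the new schema."""
    shared = old_fields & new_fields
    return {name: value for name, value in edits.items() if name in shared}


if __name__ == "__main__":
    edits = {"senderName": "Example Supplier Ltd.", "customField": "X-42"}
    old_schema = {"senderName", "customField"}
    new_schema = {"senderName", "grossAmount"}
    print(preserved_edits(edits, old_schema, new_schema))
```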

Title: Field Label
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-23 | Available as of: 2023-10-23
Description: In Schema Configuration, you can now optionally enter a field label in the Add Data Field dialog. These labels enable you to give user-friendly names to some or all of the header fields and line item fields that you add to schemas.

Field labels that you define in this way are displayed instead of the technical field names under Extraction Results in the Document feature of the Document Information Extraction UI.

See Add Data Fields [page 247].
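The display rule (the label if one is defined, otherwise the technical name) amounts to a simple fallback. The `SchemaField` type below is a hypothetical stand-in for illustration, not the service's schema model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaField:
    """Hypothetical schema field with an optional user-friendly label."""

    name: str  # technical field name
    label: Optional[str] = None  # optional label entered in the dialog


def display_name(field: SchemaField) -> str:
    """Return the label if one was defined, otherwise the technical name."""
    return field.label if field.label else field.name


if __name__ == "__main__":
    fields = [SchemaField("documentNumber", "Invoice Number"), SchemaField("grossAmount")]
    print([display_name(f) for f in fields])
```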


Title: Better Model for the Extraction of invoice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-23 | Available as of: 2023-10-23
Description: The machine learning model for the extraction of invoice documents has been improved. The improvements include better extraction results for date fields.

Title: Better Model for the Extraction of paymentAdvice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-23 | Available as of: 2023-10-23
Description: The machine learning model for the extraction of paymentAdvice documents has been improved. The improvements include better extraction results for date fields and for amount fields in line items.

Title: Built-In Support
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: You can now use the integrated Built-In Support tool to quickly find answers to your support-related questions.

Built-In Support is an embedded digital assistant that allows you to search for support-related information without leaving the UI.

If you have an S-user ID and the associated authorizations, Built-In Support also allows you to report issues, review cases, and chat with an expert or a chatbot.

See Built-In Support [page 238].


Title: Configuration API
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: The enrichmentConfidenceThreshold configuration key is now available. You can now adjust the similarity confidence threshold for the enrichment.

See Create Configuration [page 115], Configuration Keys [page 117], and Enrichment Data API [page 166].
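The release note does not define how the threshold is applied. A minimal sketch, assuming that each enrichment candidate carries a confidence score between 0 and 1 and that candidates at or above the threshold are kept, could look like this; the candidate structure, the IDs, and the value 0.8 are hypothetical.

```python
def filter_candidates(candidates: list[dict], threshold: float) -> list[dict]:
    """Keep enrichment candidates whose confidence reaches the threshold (assumed semantics)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [c for c in candidates if c["confidence"] >= threshold]


if __name__ == "__main__":
    candidates = [
        {"id": "supplier-001", "confidence": 0.92},
        {"id": "supplier-002", "confidence": 0.55},
    ]
    print(filter_candidates(candidates, threshold=0.8))
```

A higher threshold yields fewer but more reliable matches; a lower one returns more candidates at the cost of precision.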

Title: New Autosave Feature for Editing Extraction Results
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: You can now have the Document Information Extraction UI save your edits to extraction results automatically.

When you choose Autosave on the Edit Extraction Results pane in the Documents feature, the service saves your work automatically at 10-second intervals.

See View and Edit Extraction Results [page 242].
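A fixed-interval autosave like the one described above can be sketched with an injectable clock, which keeps the logic testable. This is not the UI's implementation: the `Autosaver` class is hypothetical, and it assumes that a save is only needed when there are unsaved edits.

```python
import time
from typing import Callable


class Autosaver:
    """Hypothetical autosave helper: saves unsaved edits at most every `interval` seconds."""

    def __init__(
        self,
        save: Callable[[], None],
        interval: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._save = save
        self._interval = interval
        self._clock = clock
        self._last_save = clock()
        self.dirty = False  # True while there are unsaved edits

    def mark_dirty(self) -> None:
        """Record that the user changed an extraction result."""
        self.dirty = True

    def tick(self) -> bool:
        """Call periodically; saves and returns True once the interval has elapsed."""
        now = self._clock()
        if self.dirty and now - self._last_save >= self._interval:
            self._save()
            self._last_save = now
            self.dirty = False
            return True
        return False
```

Passing a fake clock (for example, a lambda that reads a mutable value) lets tests advance time without sleeping.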

Title: New Schema Field Setup Types
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: The setup types auto and manual are now available when you add data fields to new schemas.

See Add Fields to Schema Version [page 199] and Add Data Fields [page 247].

Title: Technical Constraints
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: You can now associate a maximum of 5 documents with a template.

See Technical Constraints [page 275], Free Tier Option and Trial Account Technical Constraints [page 276], and Add Documents and Activate/Deactivate Template [page 253].
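A client that associates documents with templates could check the 5-document limit before calling the service. The helper below is a hypothetical sketch that models a template's associated documents as a list of document IDs.

```python
MAX_DOCUMENTS_PER_TEMPLATE = 5  # limit stated in the release note


def add_document(associated: list[str], document_id: str) -> list[str]:
    """Return a new list with document_id associated, enforcing the limit."""
    if document_id in associated:
        return list(associated)  # already associated: nothing to do
    if len(associated) >= MAX_DOCUMENTS_PER_TEMPLATE:
        raise ValueError(
            f"a template can have at most {MAX_DOCUMENTS_PER_TEMPLATE} documents"
        )
    return [*associated, document_id]
```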


Title: Associate Confirmed Documents with Templates
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: You can now associate documents that have the status "CONFIRMED" with templates.

If you edit the extraction results for a document and then confirm the document, you can use the Add to Template function to associate the document with a template.

See Add Documents and Activate/Deactivate Template [page 253].

Title: Better Model for the Extraction of purchaseOrder Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: The machine learning model for the extraction of purchaseOrder documents has been improved. The improvements include better extraction results for date fields and better formatting of amounts.

Title: Role Collections
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: Deprecated | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-10-09 | Available as of: 2023-10-09
Description: The role collection Document_Information_Extraction_UI_Admin_User has been deprecated.

To create or delete schemas and templates, use the role collection Document_Information_Extraction_UI_Templates_Admin.

See Role Collections [page 236].


Title: Use Generative AI to Extract Information from Standard and Custom Document Types
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: Restricted Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-11-10 | Available as of: 2023-10-05
Description: You now have the option of using generative AI to extract information from standard and custom document types.

To use generative AI, select the setup type auto without a default extractor when adding data fields to a schema for a standard or custom document type.

Restriction: This option is currently available in SAP BTP trial accounts only.

If you don't want to use generative AI with standard or custom document types, select the setup type manual when adding fields to schemas. With standard document types, you can also avoid using generative AI by selecting auto with a suitable default extractor.

See Add Fields to Schema Version [page 199] and Setup Types [page 249].


Title: Schema API - Add Schema Fields
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-09-04 | Available as of: 2023-09-04
Description: You can now optionally use the label property to enter field labels. These labels enable you to give user-friendly names to some or all of the headerFields and lineItemFields that you include in the Add Fields to Schema Version [page 199] payload.

Field labels that you define in this way are displayed instead of the technical field names under Extraction Results in the Document feature of the Document Information Extraction UI.

Title: Free Tier Option and Trial Account Technical Constraints
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-09-04 | Available as of: 2023-09-04
Description: Free tier and trial account users can now:

• Upload up to 50 document pages per tenant in a rolling period of 30 days.
• Create up to 1000 schemas per client.

See Free Tier Option and Trial Account Technical Constraints [page 276].
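The page quota uses a rolling 30-day window rather than a calendar month. A minimal sketch of such a check follows; it assumes that uploads exactly 30 days old fall outside the window, which the release note does not specify, and the function name and history format are hypothetical.

```python
from datetime import datetime, timedelta

PAGE_LIMIT = 50  # pages per tenant, from the release note
WINDOW = timedelta(days=30)  # rolling period, from the release note


def can_upload(history: list[tuple[datetime, int]], now: datetime, pages: int) -> bool:
    """Return True if uploading `pages` more pages stays within the rolling quota.

    history: (upload time, page count) pairs for earlier uploads in the tenant.
    Uploads at or before now - 30 days no longer count (assumed boundary).
    """
    window_start = now - WINDOW
    used = sum(count for uploaded_at, count in history if uploaded_at > window_start)
    return used + pages <= PAGE_LIMIT


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    history = [(now - timedelta(days=31), 40), (now - timedelta(days=10), 30)]
    print(can_upload(history, now, 20), can_upload(history, now, 21))
```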

Title: Extraction Results Saved Automatically when Documents Associated with Templates
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-08-18 | Available as of: 2023-08-18
Description: You no longer need to save extraction results manually before associating documents with templates. The Document Information Extraction UI now saves these results automatically.


Title: Schema API
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-08-18 | Available as of: 2023-08-18
Description: The Schema API [page 184] is now generally available. You can now use the Schema API endpoints to create, list, update, and delete schemas and schema versions.

Title: Technical Constraints
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-08-18 | Available as of: 2023-08-18
Description: The maximum total number of header fields and line items you can add per schema is now 500.

See Technical Constraints [page 275].

Title: Better Model for the Extraction of invoice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-08-18 | Available as of: 2023-08-18
Description: The machine learning model for the extraction of invoice documents has been improved. The improvements include better extraction results for bank account numbers, amounts with non-standard formats, and numerical dates that contain whitespace.

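Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-08-18 | Available as of: 2023-08-18
Description: There have been several code and stability improvements.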


Title: Delete Bounding Boxes
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-07-26 | Available as of: 2023-07-26
Description: When editing extraction results with the Document Information Extraction UI, you can now delete bounding boxes together with their coordinates.

See View and Edit Extraction Results [page 242].

Title: Display and Edit Bounding Boxes
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Available as of: 2023-07-26
Description: When editing extraction results with the Document Information Extraction UI, you can now open the Assign Field dialog for bounding boxes by choosing the relevant tooltip in the page preview pane.

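Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-07-26 | Available as of: 2023-07-26
Description: There have been several code and stability improvements.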

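Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-07-17 | Available as of: 2023-07-17
Description: There have been several code and stability improvements.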


Title: Technical Constraints
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-30 | Available as of: 2023-06-30
Description: The maximum number of templates you can create has been increased from 1000 templates per tenant to 1000 templates per schema.

See Technical Constraints [page 275].

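Title: Support for Country Code Conversion in Template
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-22 | Available as of: 2023-06-22
Description: The Template [page 252] feature now supports country code conversion.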

Title: New Data Type country/region for Schema Fields
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-22 | Available as of: 2023-06-22
Description: The new data type country/region is now available for schema fields.

See Add Data Fields [page 247].

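Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-13 | Available as of: 2023-06-13
Description: There have been several code and stability improvements.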


Title: Issues with Units of Measure in purchaseOrder Documents Corrected
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-13 | Available as of: 2023-06-13
Description: Some issues with codes for units of measure in purchaseOrder documents have now been resolved.

Title: Support for Bounding Boxes around Parts of Fields
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-30 | Available as of: 2023-06-13
Description: When you edit extraction results, you can now draw bounding boxes around parts of header field entries, instead of around the entire entry.

As a result, you can eliminate unwanted or irrelevant elements, such as punctuation, from strings and ensure that they include only the values that you need.

See View and Edit Extraction Results [page 242].

Title: Better Model for the Extraction of paymentAdvice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-06-13 | Available as of: 2023-06-13
Description: The machine learning model for the extraction of paymentAdvice documents has been improved.

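Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-05-23 | Available as of: 2023-05-23
Description: There have been several code and stability improvements.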


Title: Setup Type Field on Add Data Field Dialog for Schemas
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-05-08 | Available as of: 2023-05-08
Description: The Add Data Field dialog for schema configuration now includes a new field: Setup Type.

See the updated procedure in Add Data Fields [page 247].

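Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-05-08 | Available as of: 2023-05-08
Description: There have been several code and stability improvements.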

Title: Response Field clientId in Get Result Endpoint
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-05-08 | Available as of: 2023-05-08
Description: The Document API endpoint Get Result [page 138] includes a new response field: clientId. You can now identify the client that submitted the extraction request using the Upload Document [page 127] endpoint.

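Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-20 | Available as of: 2023-04-20
Description: There have been several code and stability improvements.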


Title: Fixed Values in Template Extraction Fields
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: You can now include fixed values for selected extraction fields in a template. If you intend to use a template with documents from only one supplier, for example, you can define the supplier's name as the fixed value for the senderName field.

See Add Template [page 253].
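Conceptually, fixed values take precedence over whatever is extracted for the same fields. The sketch below assumes a simple dictionary merge in which fixed values overwrite extracted values; the helper name and the supplier names and amounts are invented for the example.

```python
def apply_fixed_values(extracted: dict[str, str], fixed_values: dict[str, str]) -> dict[str, str]:
    """Return extraction results in which template fixed values override extracted ones."""
    result = dict(extracted)  # leave the original results untouched
    result.update(fixed_values)
    return result


if __name__ == "__main__":
    extracted = {"senderName": "Exmple Suppl1er", "grossAmount": "100.00"}  # OCR noise
    fixed = {"senderName": "Example Supplier Ltd."}  # single-supplier template
    print(apply_fixed_values(extracted, fixed))
```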

Title: Scene Text Recognition Schema
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: You can now extract text from images using the OCR engine for scene text recognition. When you create a schema with the document type Custom, you can choose between two types of OCR engine (Document or Scene Text), depending on whether the text you want to extract is in an image or not.

See Schema Configuration [page 245] and Create Schema [page 246].

Title: Filtering, Ordering, and Pagination
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: The new Document API endpoint Post Catalog [page 134] is now available. You can use the following catalog options to get a list of all document processing jobs in a JSON file:

• Filtering
• Ordering
• Pagination

The Document Information Extraction UI also supports document filtering, ordering, and pagination.


Title: Configuration API and Notifications
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: The activateDocumentNotifications configuration key is now available. You can now enable the Notifications [page 227] functionality to get notifications about the status of your processed documents.

See Create Configuration [page 115].

Title: New Procedure for Associating Documents with Templates
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: There's now a new procedure for adding documents to templates in the Document Information Extraction UI. In the past, you selected these documents when creating the template or added them later using the Template feature. Now, you select documents using the new Add to Template function in the Document feature.

See Add Documents and Activate/Deactivate Template [page 253] and View and Edit Extraction Results [page 242].

Title: Better Model for the Extraction of invoice Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: The machine learning model for the extraction of invoice documents has been improved.


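Title: Overall Improvements
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-04-04 | Available as of: 2023-04-04
Description: There have been several code and stability improvements.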

Title: Get Templates Endpoint
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-03-14 | Available as of: 2023-03-14
Description: The new Document API endpoint Get Templates Associated with Document [page 164] is now available. You can get all the templates associated with the specified document ID.

Title: New Template Feature Supported Language - Greek
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: New | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-03-14 | Available as of: 2023-03-14
Description: The Template [page 252] feature of the Document Information Extraction UI now supports the Greek language.

See Extraction Using Template: Languages [page 92].

Title: Better Model for the Extraction of purchaseOrder Documents
Component: Document Information Extraction | Technical Environment: Cloud Foundry | Action: Info only | Lifecycle: General Availability | Type: Changed | Line of Business: Technology | Modular Business Process: Not applicable | Product: SAP Business Technology Platform | Latest Revision: 2023-03-14 | Available as of: 2023-03-14
Description: The machine learning model for the extraction of purchaseOrder documents has been improved.


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2023-03-14 | 2023-03-14

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. The performance of the Template [page 252] feature has been improved. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2023-03-01 | 2023-03-01

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2023-02-17 | 2023-02-17

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code, security, and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2023-02-06 | 2023-02-06


Document Information Extraction | Cloud Foundry | Barcode Header Field Symbology | The response from Get Result [page 138] now includes the symbology field, which shows the type of each extracted barcode header field. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2023-01-30 | 2023-01-30
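The symbology field from the 2023-01-30 entry can be read from a Get Result response on the client side. The following Python sketch assumes a response layout in which header fields are listed under extraction.headerFields and each barcode field carries a symbology key. That layout, the sample values, and the helper name barcode_symbologies are illustrative assumptions, not the documented response schema.

```python
from typing import Any


def barcode_symbologies(result: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Collect (name, value, symbology) for each barcode header field.

    Assumed layout: result["extraction"]["headerFields"] is a list of dicts
    with "name" and "value" keys. A field is treated as a barcode field only
    if it also has a "symbology" key.
    """
    fields = result.get("extraction", {}).get("headerFields", [])
    return [
        (field.get("name", ""), str(field.get("value", "")), field["symbology"])
        for field in fields
        if "symbology" in field
    ]


# Hand-written sample response; all values (including "ean_13") are hypothetical.
sample_result = {
    "extraction": {
        "headerFields": [
            {"name": "documentNumber", "value": "INV-001"},
            {"name": "barcode", "value": "4006381333931", "symbology": "ean_13"},
        ]
    }
}

for name, value, symbology in barcode_symbologies(sample_result):
    print(f"{name}: {value} ({symbology})")
```

Filtering on the presence of symbology, rather than on a specific field name, keeps the sketch independent of how barcode fields are named in a given schema.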

2.2 2022 What's New for Document Information Extraction (Archive)

Technical Component | Environment | Title | Description | Action | Lifecycle | Type | Line of Business | Modular Business Process | Product | Latest Revision | Available as of

Document Information Extraction | Cloud Foundry | Configuration API | The coordinateFormat configuration key is now available. You can now choose the format of the bounding box coordinates in the extraction results. See Create Configuration [page 115]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-12-19 | 2022-12-19


Document Information Extraction | Cloud Foundry | Enrichment Data API | The orderby parameter has been replaced by order. Note: The legacy orderby parameter will still be supported for a limited time. Please start using the new parameter (order) as soon as possible. See List Data-Persistence Jobs [page 174]. | Recommended | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-12-19 | 2022-12-19

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-12-19 | 2022-12-19

Document Information Extraction | Cloud Foundry | Document Information Extraction UI | The Document Information Extraction UI and associated in-app help are now available in the following new languages: Chinese Simplified, Chinese Traditional, French, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. See Set Screen Language [page 237]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-12-07 | 2022-12-07


Document Information Extraction | Cloud Foundry | Enrichment Data Method | The response from Get Result [page 138] now includes the method field, which shows the match strategy used for each matched enrichment data record. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-12-07 | 2022-12-07

Document Information Extraction | Cloud Foundry | Change Service Instance by Name | You can now switch service instances in the Document Information Extraction UI by entering the service instance name. See Subscribing to the Document Information Extraction UI [page 234]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-12-07 | 2022-12-07

Document Information Extraction | Cloud Foundry | Better Model for the Extraction of purchaseOrder Documents | The machine learning model for the extraction of purchaseOrder documents has been improved. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-12-07 | 2022-12-07

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-12-07 | 2022-12-07


Document Information Extraction | Cloud Foundry | Document Information Extraction UI | The Document Information Extraction UI and associated in-app help are now available in German. See Set Screen Language [page 237]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-11-15 | 2022-11-15

Document Information Extraction | Cloud Foundry | SAP Schemas | The preconfigured SAP schema SAP_OCROnly_schema is now available for custom documents and OCR (Optical Character Recognition) output only. See Upload Document [page 127], Get Result [page 138], and Add Document [page 240]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-11-09 | 2022-11-09

Document Information Extraction | Cloud Foundry | Configuration API | You can now use the client scope configuration for the documentRetentionTimeDays configuration key. You can now use the optional parameters clientId and tenantId to create, get, and delete configurations. See Create Configuration [page 115]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-11-09 | 2022-11-09

Document Information Extraction | Cloud Foundry | Free Service Plan | The Template [page 252] feature is now also available to Free service plan users. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-11-09 | 2022-11-09


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-11-09 | 2022-11-09

Document Information Extraction | Cloud Foundry | Role Collections | The role collection Document_Information_Extraction_UI_Templates_Admin now includes permissions for reading and writing documents. See Role Collections [page 236]. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-10-04 | 2022-10-04

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-10-04 | 2022-10-04

Document Information Extraction | Cloud Foundry | Enrichment Data API | The following paymentAdvice fields now support enrichment: taxId, senderAddress, and senderName. See Extracted Header Fields [page 278]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-09-13 | 2022-09-13


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-09-13 | 2022-09-13

Document Information Extraction | Cloud Foundry | Role Collections | The role collection Document_Information_Extraction_UI_Document_Viewer is now available. This new collection allows users to read documents in the UI application. See Role Collections [page 236]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-30 | 2022-08-30

Document Information Extraction | Cloud Foundry | Client Segregation | You can now restrict user access to specified clients. See Create Configuration [page 115] and Add Document [page 240]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-30 | 2022-08-30

Document Information Extraction | Cloud Foundry | Free Service Plan | The Free service plan is now available for Document Information Extraction. See Service Plans [page 77], Tutorials [page 101], and Free Tier Option and Trial Account Technical Constraints [page 276]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-30 | 2022-08-30


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-08-30 | 2022-08-30

Document Information Extraction | Cloud Foundry | Extracted Header Fields | You can now extract the following header fields from paymentAdvice documents: senderAddress and taxId. See Extracted Header Fields [page 278]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-04 | 2022-08-04

Document Information Extraction | Cloud Foundry | New Business Card Supported Language: Hebrew | Document Information Extraction now supports businessCard documents in Hebrew. See Business Card: Languages [page 86]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-04 | 2022-08-04

Document Information Extraction | Cloud Foundry | Accessibility Features | Documentation on Accessibility Features in Document Information Extraction [page 294] is now available. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-08-04 | 2022-08-04


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-08-04 | 2022-08-04

Document Information Extraction | Cloud Foundry | Technical Constraints | The maximum number of clients you can create in one API call has increased from 10 to 5000. The maximum number of schemas per client and templates per tenant has increased from 100 to 1000. See Technical Constraints [page 275]. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-06-23 | 2022-06-23

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-06-23 | 2022-06-23

Document Information Extraction | Cloud Foundry | Handwriting Detection | The handwriting detection feature is now available. Currently, it detects handwriting in English only. See Optical Character Recognition (OCR): Best Practices [page 258]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-06-23 | 2022-06-23


Document Information Extraction | Cloud Foundry | Barcode Supported Countries/Regions and Extracted Fields for Invoice Documents | The list of supported countries/regions and extracted fields for barcodes in invoice documents is now available. See Invoice: Languages and Countries/Regions [page 87]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-06-03 | 2022-06-03

Document Information Extraction | Cloud Foundry | New Supported Countries/Regions for Invoice Documents | Document Information Extraction now supports the following new countries/regions for invoice documents (see Invoice: Languages and Countries/Regions [page 87]): Austria, Belgium, Czech Republic, Denmark, Finland, Norway, Poland, Portugal, Slovakia, Slovenia, and Sweden. Note: To support the new languages, the machine learning models have been extended. Consequently, predictions (field extractions and corresponding confidence scores) may differ from previous releases. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-06-03 | 2022-06-03


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-06-03 | 2022-06-03

Document Information Extraction | Cloud Foundry | Document API | You can now see all matched enrichment data records in the Get Result [page 138] response. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-05-06 | 2022-05-06

Document Information Extraction | Cloud Foundry | Enrichment Data API | The Create Data Activation [page 179] endpoint now includes the optional parameters type and subtype. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-05-06 | 2022-05-06

Document Information Extraction | Cloud Foundry | Deskew | The service now automatically rotates document images to compensate for skewing. See Supported Document Types and File Formats [page 84]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-05-06 | 2022-05-06


Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-05-06 | 2022-05-06

Document Information Extraction | Cloud Foundry | Document API | The Upload Document [page 127] endpoint now includes a schemaId parameter. This parameter is required in payloads that include templateId. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-04-22 | 2022-04-22

Document Information Extraction | Cloud Foundry | Enrichment Data API | You can now use variants to create multiple versions of the same data record. See Create Enrichment Data [page 167], Data Variants [page 172], and Data Duplicates [page 173]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-04-22 | 2022-04-22

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-04-22 | 2022-04-22


Document Information Extraction | Cloud Foundry | Template | You can now use templates to extract multiple tables from the same page, provided the tables all have a standard structure and the same table headers. See General Recommendations and Limitations [page 266]. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-03-31 | 2022-03-31

Document Information Extraction | Cloud Foundry | Global Accounts | You can now move subaccounts between your global accounts. See Initial Setup [page 96]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-03-31 | 2022-03-31

Document Information Extraction | Cloud Foundry | Trial Account Technical Constraints | The Free Tier Option and Trial Account Technical Constraints [page 276] documentation has been updated. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-03-31 | 2022-03-31

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-03-31 | 2022-03-31


Document Information Extraction | Cloud Foundry | Support for Multiple Service Instances | If you create more than one service instance, the Document Information Extraction UI now allows you to switch between instances. See Subscribing to the Document Information Extraction UI [page 234]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-03-17 | 2022-03-17

Document Information Extraction | Cloud Foundry | Document Feature | You can now select folders containing multiple documents for upload. The Document Information Extraction UI now displays thumbnails of documents and allows you to rename documents before uploading them. See Add Document [page 240]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-03-17 | 2022-03-17

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. Metering and pricing details for the Compute Hours for Base Edition [page 80] have been updated. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-03-17 | 2022-03-17

Document Information Extraction | Cloud Foundry | Document Extraction Results | You can now download extraction values before and after you edit and save them. See View and Edit Extraction Results [page 242]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-02-03 | 2022-02-03


Document Information Extraction | Cloud Foundry | Document Extraction Results | You can now view the raw values for extraction results. Raw values are the original field values before postprocessing, which can differ from the corresponding extraction results. See View and Edit Extraction Results [page 242]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-02-03 | 2022-02-03

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code and stability improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-02-03 | 2022-02-03

Document Information Extraction | Cloud Foundry | SAP Schemas | The SAP schemas for standard document types now have the status ACTIVE. As a result, you no longer need to create copies of these schemas before using them to upload documents or create templates. See Schema Configuration [page 245]. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-01-18 | 2022-01-18

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-01-18 | 2022-01-18


Document Information Extraction | Cloud Foundry | Enrichment Data API | The new Enrichment Data API endpoint List Data-Persistence Jobs [page 174] is now available. The new enrichment data entity type Product [page 172] is now available. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-01-10 | 2022-01-10
ction

Document Information Extraction | Cloud Foundry | Configuration API | The performPIICheck subconfiguration is now available. See Create Configuration [page 115]. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-01-10 | 2022-01-10

Document Information Extraction | Cloud Foundry | Mass Deletion of Documents | The Document [page 239] feature has been enhanced: you can now select multiple documents for simultaneous deletion. | Info only | General Availability | New | Technology | Not applicable | SAP Business Technology Platform | 2022-01-10 | 2022-01-10

Document Information Extraction | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | General Availability | Changed | Technology | Not applicable | SAP Business Technology Platform | 2022-01-10 | 2022-01-10
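The 2022-04-22 entry makes schemaId mandatory whenever an Upload Document payload includes templateId, so a client can catch a missing schemaId before it sends the request. The Python sketch below assumes the upload options are sent as a JSON object with clientId, documentType, schemaId, and templateId keys; the helper name build_upload_options, the exact key set, and the placeholder IDs are hypothetical and are not taken from the API reference.

```python
import json
from typing import Optional


def build_upload_options(
    client_id: str,
    document_type: str,
    schema_id: Optional[str] = None,
    template_id: Optional[str] = None,
) -> str:
    """Serialize a hypothetical options object for an Upload Document request.

    Enforces the rule from the 2022-04-22 release note: a payload that
    includes templateId must also include schemaId.
    """
    if template_id is not None and schema_id is None:
        raise ValueError("schemaId is required when templateId is set")

    options = {"clientId": client_id, "documentType": document_type}
    # Only include the optional keys that were actually provided.
    if schema_id is not None:
        options["schemaId"] = schema_id
    if template_id is not None:
        options["templateId"] = template_id
    return json.dumps(options)


# Placeholder IDs for illustration only.
print(build_upload_options("default", "invoice",
                           schema_id="my-schema-id", template_id="my-template-id"))
```

Failing early on the client side is only a convenience; the service remains the authority on which payloads it accepts.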

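With the 2022-06-23 limit of 5000 clients per API call, a longer list of clients has to be split across several calls. This Python sketch shows only the batching step. The client record keys (clientId, clientName) are hypothetical, and sending each batch to the client-creation endpoint is deliberately left out.

```python
from typing import Iterator, Sequence

# Limit taken from the 2022-06-23 release note: at most 5000 clients per call.
MAX_CLIENTS_PER_CALL = 5000


def client_batches(
    clients: Sequence[dict], batch_size: int = MAX_CLIENTS_PER_CALL
) -> Iterator[list[dict]]:
    """Yield consecutive slices of `clients`, each small enough for one call."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(clients), batch_size):
        yield list(clients[start:start + batch_size])


# Hypothetical client records; the key names are assumptions.
clients = [{"clientId": f"c{i}", "clientName": f"Client {i}"} for i in range(12001)]
sizes = [len(batch) for batch in client_batches(clients)]
print(sizes)  # [5000, 5000, 2001]
```

Keeping the limit in a named constant makes it easy to adjust if the technical constraints change again.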
2.3 2021 What's New for Document Information Extraction (Archive)

Technical Component | Capability | Environment | Title | Description | Action | Type | Available as of

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-12-06

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-11-23

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template Feature | Improved template extraction results for header fields in multipage documents. See Template [page 252]. | Info only | Changed | 2021-11-23

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Using the Document Information Extraction UI | The documentation has been updated to include the requirement to use a schema when creating templates based on document extraction results. See Document [page 239] and Template [page 252]. | Info only | Changed | 2021-11-23


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Tutorials | The following tutorial missions are now available for Document Information Extraction: Shape Machine Learning to Process Standard Business Documents, and Shape Machine Learning to Process Custom Business Documents. See Tutorials [page 101]. | Info only | New | 2021-11-23

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-11-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Enrichment Data API | The matching accuracy for the bankAccount businessEntity key has been improved. See BusinessEntity [page 170] and Data Enrichment: Best Practices [page 271]. | Info only | Changed | 2021-11-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Technical Constraints | The 3510 x 3510 pixel maximum resolution limit for single-page JPEG, PNG, and TIFF documents has been removed. You can now upload documents of any resolution to the service, as long as the file size does not exceed 50 MB. See Optical Character Recognition (OCR): Best Practices [page 258] and Technical Constraints [page 275]. | Info only | Deleted | 2021-11-05


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-10-15

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New AWS Region | Document Information Extraction is now available in the AWS region Australia (Sydney). | Info only | New | 2021-10-15

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Support for Business Card Documents | Document Information Extraction now supports businessCard as one of the standard document types, at API level only. See Supported Document Types and File Formats [page 84], Supported Languages and Countries/Regions [page 86], and Extracted Header Fields [page 278]. | Info only | New | 2021-10-15

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-09-30


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Role Collections | The role collection Document_Information_Extraction_UI_Admin_User is now available. This new collection provides access to all the features of the UI application. See Role Collections [page 236]. | Info only | New | 2021-09-30

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Best Practices | Best practices covering all stages of processing documents in the Document Information Extraction UI are now available. See Document: Best Practices [page 270], Template: Best Practices [page 265], and Schema Configuration: Best Practices [page 259]. | Info only | New | 2021-09-30

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-09-10

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template Feature | You can now import and export templates, and create templates from extracted documents. See Template [page 252]. | Info only | New | 2021-09-10


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Supported File Formats | Single-page document files in TIFF format are now supported. See Supported Document Types and File Formats [page 84] and Technical Constraints [page 275]. | Info only | New | 2021-09-10

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-08-31

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Technical Constraints | The technical constraints for the number of schemas are now available. See Technical Constraints [page 275] and Free Tier Option and Trial Account Technical Constraints [page 276]. | Info only | Changed | 2021-08-31

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-08-12


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Document API | The Get Result [page 138] endpoint now returns two new response fields: languageCodes and pageCount. | Info only | New | 2021-08-12

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Configuration API | All Configuration API [page 115] keys now have tenant scope by default. Service instance scope is now also available for the dataFeedbackCollection and documentRetentionTimeDays keys. The documentRetentionTimeDays configuration key is now available. See Create Configuration [page 115]. | Info only | New | 2021-08-12

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Python Client Library | A Python client library is now available for Document Information Extraction. It provides easy access to the REST API and the UI application, and facilitates the service onboarding process. Go to Python Client Library. | Info only | New | 2021-07-26

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Barcode Header Field | Decoded information is now available for barcode fields from India invoices. See Extracted Header Fields [page 278] and Barcode Header Field in Invoice Documents [page 285]. | Info only | New | 2021-07-26


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template Feature | The Template [page 252] feature is now also available to all SAP BTP Trial users. See Free Tier Option and Trial Account Technical Constraints [page 276]. | Info only | Changed | 2021-07-26

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. The Service Guide documentation has been updated: Capabilities API [page 104], Save Ground Truth [page 154], Extracted Header Fields [page 278], and Extracted Line Items [page 286]. | Info only | Changed | 2021-07-26

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template Feature | The Template [page 252] feature is now generally available to all Document Information Extraction UI application users. | Info only | New | 2021-07-20

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Extension Capabilities Service Plan | The new Compute Hours for Base Edition [page 80] service plan is now available. | Info only | New | 2021-07-20


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | New | 2021-07-07

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Document API | You can now use the Get Document File [page 159] endpoint to get the original document file you uploaded to the service. | Info only | New | 2021-07-07

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Security Guide | Auditing and logging information is now available in Security [page 288]. | Info only | New | 2021-07-07

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Schema Feature and Support for Custom Documents and Fields | The Schema Configuration [page 245] feature is now available in the Document Information Extraction UI application. Document Information Extraction now supports custom documents and fields. See Supported Document Types and File Formats [page 84]. | Info only | New | 2021-06-28


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Support for Purchase Order Documents | Document Information Extraction now supports purchaseOrder documents for all users. The list of line items you can extract from purchaseOrder documents has been updated. See Extracted Line Items [page 286]. See also Supported Document Types and File Formats [page 84] and Extracted Header Fields [page 278]. | Info only | Changed | 2021-06-28

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Configuration API | The dataFeedbackCollection Configuration API [page 115] key is now available. | Info only | Changed | 2021-06-28

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template API (Beta) | The Template API (Beta) and its endpoints are no longer exposed to users at API level. The Template [page 252] feature remains available from the Document Information Extraction UI application. | Info only | Changed | 2021-06-28

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Security Guide | The Security [page 288] documentation has been updated. | Info only | Changed | 2021-06-28


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-05-24

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Barcode Header Field | The barcode header field can now be extracted from TicketBAI invoices for the three Basque provincial councils (Álava, Vizcaya and Guipúzcoa) and the Basque government. See Extracted Header Fields [page 278] and Barcode Header Field in Invoice Documents [page 285]. | Info only | New | 2021-05-24

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-05-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Barcode Header Field | The barcode header field can now be extracted from Brazil PIX (instant payments) and from Argentina, Colombia and Uruguay invoices. See Extracted Header Fields [page 278] and Barcode Header Field in Invoice Documents [page 285]. | Info only | New | 2021-05-05


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-03-29

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Support for Factur-X and ZUGFeRD Standards | Document Information Extraction now supports the Factur-X and ZUGFeRD standards (all versions) for e-invoice document files in PDF and XML hybrid format. See Supported Document Types and File Formats [page 84]. | Info only | New | 2021-03-29

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Document Information Extraction UI | The Document Information Extraction UI application now features activation and deactivation of templates (see Template [page 252]), field-level confidence visualization (see Document [page 239]), and Web Assistant. | Info only | New | 2021-03-29

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template API (Beta) | The following Template API (Beta) endpoints are now available: Activate Template (Beta) and Deactivate Template (Beta). | Info only | New | 2021-03-29


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Data Feedback Collection for Model Improvement | The data feedback collection feature is now available. See Get Result [page 138] and Confirm Document [page 157]. | Info only | New | 2021-03-29

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-03-22

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Supported Languages and Countries/Regions | The list of supported countries/regions for purchaseOrder (controlled availability) documents, and the list of supported languages for the Template API (Beta) and the Document Information Extraction UI Template (Beta) feature, are now available. See Supported Languages and Countries/Regions [page 86]. | Info only | Changed | 2021-03-22

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Barcode Header Field | Barcode header field extraction has been improved. See Extracted Header Fields [page 278] and Barcode Header Field in Invoice Documents [page 285]. | Info only | Changed | 2021-03-22


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-03-01

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Enrichment Data API | You can now set data activation to manual instead of using the default automatic refresh of enrichment data, which takes place every 4 hours. See Create Data Activation [page 179] and Get Data Activation Details [page 180]. | Info only | Changed | 2021-03-01

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Configuration API | The Configuration API [page 115] is now available. | Info only | New | 2021-03-01

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Identifier API | The Identifier API [page 110] is now available. | Info only | New | 2021-03-01


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Supported Document Types and File Formats | Document Information Extraction now supports paymentAdvice document files in Excel format. See Supported Document Types and File Formats [page 84]. | Info only | New | 2021-03-01

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Document API | The rawValue response field is now available for the Get Result [page 138] endpoint. | Info only | Changed | 2021-02-15

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | SAP API Business Hub | Document Information Extraction is now available in the SAP API Business Hub. See Document Information Extraction. | Info only | New | 2021-02-15

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Enrichment Data API | You can now delete large numbers of data records for all clients per data type (employee or businessEntity). See Delete Enrichment Data (Asynchronous) [page 182]. | Info only | Changed | 2021-02-01


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-02-01

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template (Beta) Feature | The Document Information Extraction UI Template [page 252] feature has been updated. See Add Documents and Activate/Deactivate Template [page 253]. The role collection Document_Information_Extraction_UI_Templates_Admin is now available. See Subscribing to the Document Information Extraction UI [page 234]. | Info only | Changed | 2021-01-18

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Info only | Changed | 2021-01-18

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Extracted Header Fields | The list of header fields you can extract from purchaseOrder documents has been updated. See Extracted Header Fields [page 278]. | Info only | Changed | 2021-01-04
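The 2021-08-12 Configuration API entry describes a simple scoping rule: every key has tenant scope by default, and only dataFeedbackCollection and documentRetentionTimeDays can additionally be set at service instance scope. The following is a minimal client-side Python sketch of that rule, for example to validate configuration before sending it. The Scope enum and its values are hypothetical labels, not the Configuration API's actual parameter names, and the precedence in effective_value (an allowed service-instance value overrides the tenant value) is an assumption; check Create Configuration [page 115] for the service's real behavior.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical labels for the two configuration scopes."""
    TENANT = "tenant"
    SERVICE_INSTANCE = "serviceInstance"


# Keys that, per the 2021-08-12 entry, may also be set at service instance scope.
INSTANCE_SCOPE_KEYS = frozenset({"dataFeedbackCollection", "documentRetentionTimeDays"})


def check_scope(key: str, scope: Scope) -> None:
    """Raise ValueError if `key` cannot be configured at `scope`.

    Every key is allowed at tenant scope (the default); only the keys in
    INSTANCE_SCOPE_KEYS are also allowed at service instance scope.
    """
    if scope is Scope.SERVICE_INSTANCE and key not in INSTANCE_SCOPE_KEYS:
        raise ValueError(f"{key!r} can only be configured at tenant scope")


def effective_value(key: str, tenant_cfg: dict, instance_cfg: dict, default=None):
    """Resolve the value of `key` for one service instance.

    ASSUMPTION: where a key allows instance scope, an instance-scope value
    takes precedence over the tenant-scope value. Verify against the service.
    """
    if key in INSTANCE_SCOPE_KEYS and key in instance_cfg:
        return instance_cfg[key]
    return tenant_cfg.get(key, default)
```

Keeping the allowed keys in one frozenset means that, if further keys gain service instance scope in later releases, only that set needs to change.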

2.4 2020 What's New for Document Information Extraction (Archive)

Technical Component | Capability | Environment | Title | Description | Type | Available as of

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-12-21

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Template (Beta) Feature | The Document Information Extraction UI Template [page 252] feature now supports purchaseOrder documents (only for a previously selected group of beta customers). See Using the Document Information Extraction UI [page 234]. | New | 2020-12-21

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-12-03

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New SAP Cloud Platform Cockpit Booster | You can now use the Set up account for Document Information Extraction booster to automate the onboarding steps on the SAP Cloud Platform cockpit, and quickly consume the service and its UI application. See Initial Setup [page 96] and Subscribing to the Document Information Extraction UI [page 234]. | New | 2020-11-20

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New Beta Features | Document Information Extraction now supports purchaseOrder documents (only for a previously selected group of beta customers); see Supported Document Types and File Formats [page 84], Extracted Header Fields [page 278] and Extracted Line Items [page 286]. The Template [page 252] feature is now available in the Document Information Extraction UI (only for a previously selected group of beta customers); see Using the Document Information Extraction UI [page 234]. | New | 2020-11-20


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. The Feature Scope Description for Document Information Extraction has been updated. The Technical Constraints [page 275] have been updated. The Document Information Extraction Tutorials [page 101] have been updated. | Changed | 2020-11-20

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New AWS Region | Document Information Extraction is now available in the AWS region Europe (Frankfurt) EU-ONLY (access from Europe only). | New | 2020-10-27

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. The discount and dueDate header fields can now be extracted from invoices; see Extracted Header Fields [page 278]. To get better extraction and enrichment results with Document Information Extraction, see Optical Character Recognition (OCR): Best Practices [page 258]. | Changed | 2020-10-27

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Metering and Pricing | A new service plan is available for Document Information Extraction. See Metering and Pricing [page 79]. | New | 2020-10-21

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements: the barcode header field can now be extracted from India invoices (see Extracted Header Fields [page 278]), and the new returnNullValues request parameter is now available for the Get Result endpoint (see Get Result [page 138]). | Changed | 2020-10-16


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Extracted Line Items | The unitOfMeasure line item can now be extracted from invoices. See Extracted Line Items [page 286]. | Changed | 2020-10-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | UI Application | The Document Information Extraction UI is now generally available to all SAP Cloud Platform customers. See Using the Document Information Extraction UI [page 234]. | New | 2020-10-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Supported Document Types and File Formats | The Service Guide documentation has been updated with a new section: Supported Document Types and File Formats [page 84]. | New | 2020-09-16

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements: the barcode header field can now be extracted from invoices (see Extracted Header Fields [page 278]), and the fileType response field is now available for the Get Result [page 138] endpoint. | Changed | 2020-09-16

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Using the Document Information Extraction UI (Beta) | A new version of the Document Information Extraction UI is now available (only for a previously selected group of beta customers). See details on the possible document statuses and the Confirm document functionality in Using the Key Features of the Document Information Extraction UI [page 237]. | Changed | 2020-08-28


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-08-28

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Document API | The clientId request parameter is no longer needed to send a request to the following Document API [page 127] endpoints: Get Result [page 138], Save Ground Truth [page 154], Get All Pages Text [page 159], Get Single Page Text [page 161], and Get Request Options [page 163]. | Changed | 2020-08-17

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-08-17

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New SAP Cloud Platform Trial Cockpit Booster | You can now use the Set up account for Document Information Extraction booster to automatically create your Document Information Extraction service key on SAP Cloud Platform Trial. Follow the steps described in the tutorial Set Up Account for Document Information Extraction. | New | 2020-08-17

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New AWS Region | Document Information Extraction is now available in the AWS region US East (VA). | New | 2020-07-31


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New Document File Formats for paymentAdvice | Single-page PNG and JPEG paymentAdvice files are now supported. See Upload Document [page 127] and Technical Constraints [page 275]. | New | 2020-07-31

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Using the Document Information Extraction UI (Beta) | A new version of the Document Information Extraction UI is now available (only for a previously selected group of beta customers). See Using the Document Information Extraction UI [page 234]. | Changed | 2020-07-31

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements. | Changed | 2020-07-31

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code and usability improvements: enrichment data upload performance (see Create Enrichment Data [page 167]) and the document confirmation feature (see the new Document API endpoint Confirm Document [page 157]). | Changed | 2020-07-14

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code improvements: the deliveryDate, paymentTerms and senderBankAccount header fields can now be extracted from invoices (see Extracted Header Fields [page 278]), and the list of supported character types for the IDs of clients, enrichment data records, system and company codes has been updated (see Technical Constraints [page 275]). | Changed | 2020-06-15


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code and usability improvements: single-page PNG and JPEG invoice files are now supported (see Upload Document [page 127] and Technical Constraints [page 275]); new Document API [page 127] endpoints are now available; the Enrichment Data API [page 166] endpoints have also been updated, and Delete Enrichment Data (Asynchronous) [page 182] is now available; the deliveryNoteNumber header field can now be extracted from invoices (see Extracted Header Fields [page 278]); and you can now use the Capabilities API [page 104] to get the list of document fields and enrichment data you can process by document type. | Changed | 2020-06-02

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New Beta Features | The following beta features are now available (only for a previously selected group of beta customers): template-based information extraction (see Template API (Beta) and Technical Constraints [page 275]) and the Document Information Extraction UI (see Using the Document Information Extraction UI [page 234]). | New | 2020-06-02

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several code and usability improvements: higher model accuracy; an updated Supported Languages and Countries/Regions [page 86] list; and the new tutorial mission Use Machine Learning to Enrich Data Extracted from Documents (see Tutorials [page 101]). | Changed | 2020-05-18

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New Notifications Functionality | The Notifications [page 227] functionality is now available. | New | 2020-05-18


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several stability and usability improvements, including improved model accuracy. The Service Guide documentation has been updated: Technical Constraints [page 275] and Free Tier Option and Trial Account Technical Constraints [page 276]. | Changed | 2020-04-20

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Overall Improvements | There have been several stability and usability improvements: some field value types have been updated (see Capabilities API [page 104]); the enrichment parameter top property now has a maximum possible value of 50 (see Enrichment Parameter [page 132]); and if no value is detected for fields in header or line items, they no longer appear in the response JSON file (see Get Result [page 138]). | Changed | 2020-03-30

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | API Reference | The API Reference [page 102] documentation has been updated with the following new sections: Get Access Token [page 103], Capabilities API [page 104], and Technical Constraints [page 275]. | Changed | 2020-03-30

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Tutorials | A new tutorial mission is now available for Document Information Extraction. See Use Machine Learning to Process Business Documents. | New | 2020-03-02

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Client API | The new clientIdStartsWith request parameter is now available for the Get Client endpoint. See Get Client [page 108]. | New | 2020-03-02
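Two 2020 entries interact: since 2020-03-30, header and line item fields with no detected value are left out of the Get Result response, and on 2020-10-16 the returnNullValues request parameter was added to that endpoint, presumably so such fields can be requested explicitly (see Get Result [page 138] for its exact behavior). A client that does not use that parameter but expects a fixed set of header fields can fill the gaps itself. The minimal Python sketch below does that; it assumes each header field arrives as a dict with at least a "name" key, which is an assumption about the response shape rather than a documented contract, and the list of expected field names is chosen by the caller.

```python
from typing import Any


def fill_missing_fields(
    header_fields: list[dict[str, Any]], expected_names: list[str]
) -> list[dict[str, Any]]:
    """Return a copy of `header_fields` with a {"name": ..., "value": None}
    entry appended for each expected field that the response left out.

    ASSUMPTION: each header field is a dict with at least a "name" key.
    """
    present = {field["name"] for field in header_fields}
    filled = list(header_fields)  # shallow copy; the caller's list is not modified
    for name in expected_names:
        if name not in present:
            filled.append({"name": name, "value": None})
    return filled
```

For example, with a response that only contains dueDate, calling fill_missing_fields(fields, ["dueDate", "deliveryNoteNumber"]) adds a deliveryNoteNumber entry whose value is None, so downstream code can treat every expected field uniformly.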

2.5 2019 What's New for Document Information Extraction (Archive)

Technical Component | Capability | Environment | Title | Description | Type | Available as of

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | New AWS Region | Document Information Extraction is now available in the AWS region Japan (Tokyo). | New | 2019-12-19

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Trial Account | You can now try out Document Information Extraction on SAP Cloud Platform Trial. See Get a Trial Account. | New | 2019-12-05

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | API Reference | Enrichment Data API documentation is now available (see Enrichment Data API [page 166]); Document API documentation has also been updated (see Document API [page 127]); and the documentNumber, documentDate, discountAmount, deductionAmount, and grossAmount fields can now be extracted from line items (see Extracted Line Items [page 286]). | Changed | 2019-11-04

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Getting Support | CA-ML-BDP is now the BCP component for Document Information Extraction. See Getting Support [page 295]. | Changed | 2019-11-04


Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Security Guide | The Security Guide has been updated with Enrichment Data API details. See Security [page 288]. | Changed | 2019-11-04

Document Information Extraction | Extension Suite - Development Efficiency | Cloud Foundry | Troubleshooting | The Troubleshooting section is now available. See Troubleshooting [page 296]. | New | 2019-11-04

3 Concepts

See AI & ML Glossary for definitions of artificial intelligence (AI), machine learning (ML), and Document Information Extraction concepts. In the Filter column (the third column), select Document Information Extraction.

4 Service Plans

Learn more about the different types of service plans for Document Information Extraction.

Document Information Extraction provides different types of service plans. The type you choose determines
pricing, conditions of use, resources, available services, and hosts.

Whether you choose a free or a paid service plan depends on your use case. If you plan to use your global account in productive mode, you must purchase a paid enterprise account. Be aware of the differences between the plans when you plan and set up your account model. See Initial Setup [page 96].

The following service plans are currently available:

• For enterprise and trial accounts: Base Edition (blocks_of_100)
• For enterprise accounts: Premium Edition (premium_edition)
• For enterprise accounts: Free (free)

For more details about the available service plans, see the following table:

Service Plan | Account Type | Details

Base Edition (blocks_of_100) | Enterprise |
• Base Edition service plan that includes all core features but doesn't include document information extraction using generative AI.
• Service plan intended for productive usage.
• Inference requests in blocks of 100 documents and compute hours.
• You can upload up to 2000 documents per hour per tenant to the service (each document can have up to 100 pages).
See Metering and Pricing [page 79] and Technical Constraints [page 275].

Base Edition (blocks_of_100) | Trial |
• Service plan intended for personal exploration. Access is open to everyone after registration.
• It includes document information extraction using generative AI.
• You can upload up to 50 document pages per tenant to the service in a rolling period of 30 days.
See Free Tier Option and Trial Account Technical Constraints [page 276].


Free (free) | Enterprise |
• Service plan intended for development and try-out purposes on your enterprise account.
• It doesn't include document information extraction using generative AI.
• You can upload up to 50 document pages per tenant to the service in a rolling period of 30 days.
See Free Tier Option and Trial Account Technical Constraints [page 276] and the tutorial Get an Account on SAP BTP to Try Out Free Tier Service Plans.

Premium Edition (premium_edition) | Enterprise |
• Premium Edition service plan that includes document information extraction using generative AI.
• Service plan intended for productive usage.
• Inference requests in blocks of 100 documents.
• You can upload up to 2000 documents per hour per tenant to the service (each document can have up to 100 pages).
See Metering and Pricing [page 79] and Technical Constraints [page 275].

Remember

• If you first activated the Free service plan, you can update the same service instance to switch to Base
Edition or Premium Edition for enterprise accounts.
• When you switch from Free to Base Edition or Premium Edition for enterprise accounts, both metadata and
transaction data are transferred to the new service plan.
• If you don't want your Free data to be combined with your Base Edition or Premium Edition data, you can keep
them apart by subscribing to the service plans in separate subaccounts.

5 Metering and Pricing

Learn more about the different types of metering and pricing for Document Information Extraction by service
plan.

 Tip

The metering and pricing details listed here are relevant only to users of the service plans Base Edition
(blocks_of_100) and Premium Edition (premium_edition) for enterprise accounts. See Service Plans
[page 77].

The service plan Document Information Extraction, base edition (blocks_of_100) is metered based on the
following metrics:

• Blocks of 100 Documents for Base Edition [page 79]
• Compute Hours for Base Edition [page 80]

The service plan Document Information Extraction, premium edition (premium_edition) is metered based
on the following metrics:

• Blocks of 100 Documents for Premium Edition [page 82]

 Tip

Use the pricing estimator tool.

Related Information

SAP Discovery Center
SAP Business Technology Platform Service Description Guide

5.1 Blocks of 100 Documents for Base Edition

Usage Metric

The service plan Document Information Extraction, base edition (blocks_of_100) is metered based on the
usage of documents, defined as unique records processed by the cloud service. A document can consist of a
maximum of 3 pages. If a document consists of more than 3 pages, each additional 3 pages are charged as an
additional document.

Block Size

1 block = 100 documents. The final price is based on the total number of documents uploaded to the service.
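The page and block rules above can be turned into a small volume estimate. The following Python sketch is illustrative only: it assumes that every started set of 3 pages counts as one document (for example, a 4-page file counts as 2 documents) and that a partially used block counts as a full block. The function and constant names are hypothetical and not part of any SAP API.

```python
import math

PAGES_PER_DOCUMENT = 3      # Base Edition: 1 document = up to 3 pages
DOCUMENTS_PER_BLOCK = 100   # 1 block = 100 documents


def billable_documents(page_counts: list[int]) -> int:
    """Estimate billable documents for uploaded files, given each file's page count.

    Assumption: each started set of 3 pages is charged as one document.
    """
    return sum(math.ceil(pages / PAGES_PER_DOCUMENT) for pages in page_counts)


def blocks_needed(documents: int) -> int:
    """Estimate blocks of 100 documents, assuming a partial block counts as a full one."""
    return math.ceil(documents / DOCUMENTS_PER_BLOCK)


# Four uploads with 1, 3, 4, and 7 pages -> 1 + 1 + 2 + 3 = 7 billable documents.
documents = billable_documents([1, 3, 4, 7])
print(documents, blocks_needed(documents))  # prints: 7 1
```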

Basic Service

 Caution

The price rates listed below might be outdated. Find updated price rates in the Pricing tab of the SAP
Discovery Center.

Document Information Extraction does not allow users to train and deploy customizable models. For this
service, the charged amount depends on the number of inference requests.

Metric: Blocks of 100 documents (1 document = 3 pages)

Tiers and block price per month:
• 1 to 300 blocks: EUR 20.00
• 301 to 600 blocks: EUR 17.00
• 601 blocks or more: EUR 14.00

Example

Cost for 7 blocks = 7 * EUR 20.00 = EUR 140.00.

Cost for 310 blocks = 310 * EUR 17.00 = EUR 5,270.00.

Cost for 610 blocks = 610 * EUR 14.00 = EUR 8,540.00.
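In these examples, the block price of the tier reached by the total volume applies to all blocks, rather than each tier band being priced separately. The following Python sketch reproduces that arithmetic with the rates listed above, which might be outdated; the function name is hypothetical.

```python
def base_edition_monthly_price(blocks: int) -> float:
    """Estimated monthly price in EUR for Blocks of 100 Documents (Base Edition).

    The tier reached by the total number of blocks sets the block price
    for all blocks, as in the examples above. Rates might be outdated.
    """
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    if blocks <= 300:
        block_price = 20.00
    elif blocks <= 600:
        block_price = 17.00
    else:
        block_price = 14.00
    return blocks * block_price


for blocks in (7, 310, 610):
    print(f"{blocks} blocks: EUR {base_edition_monthly_price(blocks):,.2f}")
```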

5.2 Compute Hours for Base Edition

Usage Metric

The service plan Document Information Extraction, base edition (blocks_of_100) is also metered based on
consumed compute hours defined as one hour, or portion thereof, consumed by the cloud service to process
one or several documents with a custom model.

Piece Size

 Caution

The price rate listed below might be outdated. Find updated price rates in the Pricing tab of the SAP
Discovery Center.

1 piece = 1 compute hour. 1 compute hour = EUR 1.00.

1 template activation = 5 compute hours.

The costs are associated with the usage of templates. See Template API [page 211] and the Template
[page 252] UI feature.

Example

 Note

The following calculation examples are based on current experiments. In actual usage of the service, the
exact numbers can vary slightly.

Basic Service Calculation

• Metric = compute hours (usage of templates)
• 1 compute hour = EUR 1.00
• 1 template activation = 5 compute hours (EUR 5.00)
• 1 template transaction = 1 second

Estimated processing time in compute hours and estimated costs in euro, by number of documents (per month):

• 500 documents: 500 seconds = 1 compute hour; EUR 1.00
• 1000 documents: 1000 seconds = 1 compute hour; EUR 1.00
• 5000 documents: 5000 seconds = 2 compute hours; EUR 2.00
• 10,000 documents: 10,000 seconds = 3 compute hours; EUR 3.00
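The figures above follow one rule: total template transaction time in seconds, divided by 3600 and rounded up to a whole compute hour. A minimal Python sketch of that rule, assuming 1 second per template transaction as stated above (the function and constant names are hypothetical, and the rates might be outdated):

```python
import math

SECONDS_PER_TRANSACTION = 1        # 1 template transaction = 1 second (estimate)
EUR_PER_COMPUTE_HOUR = 1.00        # 1 compute hour = EUR 1.00
COMPUTE_HOURS_PER_ACTIVATION = 5   # 1 template activation = 5 compute hours


def usage_compute_hours(transactions: int) -> int:
    """Compute hours for template usage; a started hour counts as a full hour."""
    return math.ceil(transactions * SECONDS_PER_TRANSACTION / 3600)


def activation_cost_eur(templates: int) -> float:
    """One-time cost in EUR of activating the given number of templates."""
    return templates * COMPUTE_HOURS_PER_ACTIVATION * EUR_PER_COMPUTE_HOUR


for documents in (500, 1000, 5000, 10_000):
    hours = usage_compute_hours(documents)
    print(f"{documents} documents: {hours} compute hour(s), EUR {hours * EUR_PER_COMPUTE_HOUR:.2f}")
```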

Human Resources (HR) Onboarding Business Scenario Calculation

• HR Shared Service Center onboards 200 new employees each month.
• For each new employee, the HR agent needs to extract information from five different document types.
• The HR agent needs to:
  • Process 1000 documents per month
  • Create and activate five custom templates
  • Use each template 200 times per month

Cost components, consumption, and price:

• Document upload: 1000 documents (10 blocks of 100 documents); EUR 200.00 per month
• Template activation: 5 templates x 5 compute hours; EUR 25.00 one time
• Template usage: 1000 transactions (5 templates x 200 transactions) x 1 second = 0.3 compute hours (rounded
up to 1 compute hour); EUR 1.00 per month

In this example, the total cost is EUR 201.00 per month, plus a one-time cost of EUR 25.00 when the five
templates are activated.
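The scenario totals can be reproduced by combining the document-block price with the compute-hour rules. The Python sketch below is a hypothetical calculator for this example only. It assumes that every document has at most 3 pages (so each upload is one billable document), that the volume stays within the 1 to 300 block tier at EUR 20.00 per block, and 1 second per template transaction.

```python
import math


def hr_onboarding_costs(new_employees: int = 200, document_types: int = 5) -> tuple[float, float]:
    """Return (monthly cost, one-time cost) in EUR for the HR onboarding example."""
    documents = new_employees * document_types          # 1000 documents per month
    blocks = math.ceil(documents / 100)                 # 10 blocks of 100 documents
    upload_cost = blocks * 20.00                        # EUR 20.00 per block (1 to 300 blocks)
    transactions = documents                            # each document is processed with one template
    usage_hours = math.ceil(transactions * 1 / 3600)    # 1000 seconds -> 1 compute hour
    usage_cost = usage_hours * 1.00                     # EUR 1.00 per compute hour
    activation_cost = document_types * 5 * 1.00         # 5 compute hours per template activation
    return upload_cost + usage_cost, activation_cost


monthly, one_time = hr_onboarding_costs()
print(f"EUR {monthly:.2f} per month, EUR {one_time:.2f} one time")  # prints: EUR 201.00 per month, EUR 25.00 one time
```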

5.3 Blocks of 100 Documents for Premium Edition

Usage Metric

The service plan Document Information Extraction, premium edition (premium_edition) is metered based
on the usage of documents, defined as unique records processed by the cloud service.

A document can consist of a maximum of 1 page. If a document consists of more than 1 page, each additional
page is charged as an additional document.

You can extract a maximum of 50 fields per document. If you extract more than 50 fields per document, every
additional 50 fields are charged as an additional document. As a technical limit, you can add up to 500 header
fields and line items per schema. For more information, see Technical Constraints [page 275].

Block Size

1 block = 100 documents. The final price is based on the total number of documents uploaded to the service.

Basic Service

 Caution

The price rates listed below might be outdated. Find updated price rates in the Pricing tab of the SAP
Discovery Center.

Document Information Extraction does not allow users to train and deploy customizable models. For this
service, the charged amount depends on the number of inference requests.

Metric: Blocks of 100 documents (1 document = 1 page)

Tiers and price per month:
• 1 to 5 blocks: EUR 300.00 (fixed price)
• More than 5 blocks: EUR 60.00 (block price)

Example

Cost for 1 block = EUR 300.00.

Cost for 3 blocks = EUR 300.00.

Cost for 5 blocks = EUR 300.00.

Cost for 10 blocks = 10 * EUR 60.00 = EUR 600.00.
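Unlike the Base Edition, the Premium Edition has a fixed price that covers 1 to 5 blocks; once you exceed 5 blocks, every block is charged at the block price (10 blocks = 10 x EUR 60.00). The following Python sketch mirrors the examples above. The rates might be outdated, counting one document per page with a partial block rounded up is an assumption based on the usage metric described earlier, and the function names are hypothetical.

```python
import math


def premium_blocks(pages: int) -> int:
    """Blocks of 100 documents for a page count (1 document = 1 page).

    Assumption: a partial block counts as a full block. The extra charge for
    more than 50 extracted fields per document is not modeled here.
    """
    return math.ceil(pages / 100)


def premium_edition_monthly_price(blocks: int) -> float:
    """Estimated monthly price in EUR for the Premium Edition (rates might be outdated)."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    if blocks <= 5:
        return 300.00        # fixed price for 1 to 5 blocks
    return blocks * 60.00    # above 5 blocks, the block price applies to every block


for blocks in (1, 3, 5, 10):
    print(f"{blocks} blocks: EUR {premium_edition_monthly_price(blocks):.2f}")
```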

6 Supported Document Types and File Formats

Document Types

Document Information Extraction supports the following document types as input:

• Standard document types: document types for which SAP provides pre-trained machine learning models that
allow out-of-the-box extraction of information (without prior training) based on default extractors managed
directly by SAP:
  • businessCard

     Note

    • For now, businessCard documents are only supported at API level.
    • businessCard documents with more than one contact person are not supported.
    • For businessCard documents, the service extracts only the information (contact details) from the first
    page of any submitted document, but all pages are counted for metering purposes. Submit only single-page
    documents to avoid additional charges. See Metering and Pricing [page 79].

  • invoice
  • paymentAdvice
  • purchaseOrder
• Custom document types: document types for which there are no pre-trained machine learning models managed
by SAP. Use the Template [page 252] and Schema Configuration [page 245] features to extract information from
custom documents that differ from the standard document types listed above. See also Schema API [page 184]
and Template API [page 211].

File Formats

Document Information Extraction supports the following document file formats as input:

• Document files in PDF format
• Single-page document files in JPEG, PNG, and TIFF format
• Image files that include scene text in JPEG, PNG, and TIFF format
• E-invoice document files in PDF and XML hybrid format, and in Factur-X and ZUGFeRD standards (all versions)
• paymentAdvice document files in Excel format

 Note

• The endpoint Upload Document [page 127] accepts only multipart-encoded files with a file name and a
content type.
• The file name must contain a file extension. For example, “invoice” without a file extension is not a
valid file name.
• The file name cannot be empty, even if a file extension is provided. For example, “.pdf” is not a valid
file name.
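Before calling the Upload Document endpoint, a client can check file names against these rules. The following Python sketch is a hypothetical client-side check, not part of the service: it rejects names without an extension or with an empty name in front of the extension, and guesses a content type for the multipart part from the extension with the standard mimetypes module. Treating a trailing dot (for example, “invoice.”) as a missing extension is an assumption.

```python
import mimetypes


def check_upload_file_name(file_name: str) -> str:
    """Validate a file name for upload and return a content type for it.

    Raises ValueError for names such as "invoice" (no extension) or
    ".pdf" (empty name in front of the extension).
    """
    base, dot, extension = file_name.rpartition(".")
    if not dot or not extension:
        raise ValueError(f"{file_name!r} has no file extension")
    if not base:
        raise ValueError(f"{file_name!r} has an empty file name")
    content_type, _encoding = mimetypes.guess_type(file_name)
    # Fall back to a generic binary type if the extension is unknown.
    return content_type or "application/octet-stream"


print(check_upload_file_name("invoice.pdf"))  # prints: application/pdf
for bad_name in ("invoice", ".pdf"):
    try:
        check_upload_file_name(bad_name)
    except ValueError as error:
        print(error)
```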

 Tip

The Document Information Extraction service handles distorted and asymmetrical images rotated by multiples of
90 degrees, as well as small rotations of up to 15 degrees. In both cases, the images are deskewed
automatically.

7 Supported Languages and Countries/Regions

Explore the Document Information Extraction supported languages and countries/regions by document type
and extraction approach.

• Business Card: Languages [page 86]
• Invoice: Languages and Countries/Regions [page 87]
• Payment Advice: Languages and Countries/Regions [page 90]
• Purchase Order: Languages and Countries/Regions [page 91]
• Extraction Using Template: Languages [page 92]
• Extraction Using Generative AI: Languages [page 94]

The supported languages and countries/regions have been validated with Document Information Extraction.
You may also get similar accuracy with documents in other languages and from other countries/regions that
use the Latin-1 (ISO-8859-1) character set.
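If you want a quick indication of whether a document's text stays within the Latin-1 (ISO-8859-1) character set, you can try to encode it. The following Python sketch is only a rough heuristic on text you already have (for example, from your own source system); it says nothing about the accuracy you will actually get, and the function name is hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as ISO-8859-1 (Latin-1)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("Rechnung für Müller & Söhne"))  # prints: True
print(fits_latin1("Τιμολόγιο"))                    # prints: False (Greek script)
```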

To check whether Document Information Extraction fulfills your business needs, you can use a trial account to
upload a document in any language and from any country/region to the service, and get the results by
following the tutorial mission Use Machine Learning to Process Business Documents.

7.1 Business Card: Languages

See the list of supported languages for businessCard documents.

 Restriction

For now, businessCard documents are only supported at API level.

Language

Document Information Extraction supports the following languages for businessCard documents:

Language Language Code

Chinese Simplified zh-Hans

Chinese Traditional zh-Hant

Dutch nl


English en

French fr

German de

Hebrew he

Italian it

Japanese ja

Korean ko

Polish pl

Portuguese pt

Russian ru

Spanish es

7.2 Invoice: Languages and Countries/Regions

See the list of supported languages and countries/regions for invoice documents. See also the supported
countries/regions, and extracted fields for barcodes in invoice documents.

Language

Document Information Extraction supports the following languages for invoice documents:

Language Language Code

Czech cs

Danish da

Dutch nl

English en

Finnish fi

French fr

German de

Hungarian hu

Italian it

Japanese ja


Norwegian no

Polish pl

Portuguese pt

Romanian ro

Slovak sk

Slovenian sl

Spanish es

Swedish sv

Turkish tr

Country/Region

Document Information Extraction supports the following countries/regions for invoice documents:

• Australia
• Austria
• Belgium
• Canada
• Czech Republic
• Denmark
• Finland
• France
• Germany
• Hungary
• Italy
• Japan
• Mexico
• Netherlands
• New Zealand
• Norway
• Poland
• Portugal
• Romania
• Slovakia
• Slovenia
• Spain
• Sweden
• Switzerland

• Türkiye
• United Kingdom
• United States

 Note

Document Information Extraction does not guarantee support for all country/region-specific fields of the
countries/regions listed above, even if these fields are legally required in a country/region.

Barcode Country/Region and Extracted Fields

Document Information Extraction supports the following countries/regions, and extracted fields for barcodes
in invoice documents:

Barcode Country/Region Extracted Fields

Argentina • currencyCode
• documentDate
• documentNumber
• grossAmount

Basque • documentNumber
• grossAmount

Brazil • currencyCode
• grossAmount
• senderName

China • documentDate
• documentNumber
• netAmount

Colombia • documentDate
• documentNumber
• grossAmount
• netAmount
• receiverTaxId
• taxAmount

EPC QR code (European Payments Council Quick Response Code, including Austria, • currencyCode
Belgium, Finland, Germany, and the Netherlands) • grossAmount
• senderName


India • documentDate
• documentNumber
• grossAmount
• receiverTaxId
• taxId

Mexico • grossAmount
• taxId

Switzerland • currencyCode
• documentNumber
• grossAmount
• senderAddress
• senderBankAccount
• senderName
• receiverAddress
• receiverName

Uruguay • documentNumber
• grossAmount

 Note

The supported barcode countries/regions have been validated with Document Information Extraction. You may
also get similar accuracy with documents from other countries/regions that contain the most common types of
1D and 2D barcodes, as described in Barcode Header Field in Invoice Documents [page 285].
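For the EPC QR code row in the table above, the three extracted fields correspond to lines of the EPC QR payload defined by the European Payments Council (guideline EPC069-12): the beneficiary name, which on an invoice is the sender, and the amount line, which starts with the currency code. The following Python sketch only illustrates that mapping under this assumption; it is not how Document Information Extraction decodes barcodes, the function name is hypothetical, and the sample payload is made up.

```python
from decimal import Decimal


def parse_epc_qr(payload: str) -> dict:
    """Map an EPC QR payload to the invoice fields listed in the table.

    Line layout (0-based): 0 "BCD", 1 version, 2 character set, 3 "SCT",
    4 BIC, 5 beneficiary name, 6 IBAN, 7 amount such as "EUR123.45".
    """
    lines = payload.splitlines()
    if len(lines) < 7 or lines[0] != "BCD" or lines[3] != "SCT":
        raise ValueError("not an EPC QR payload")
    fields = {"senderName": lines[5]}      # beneficiary = party that issued the invoice
    amount = lines[7] if len(lines) > 7 else ""
    if amount:                              # the amount line is optional
        fields["currencyCode"] = amount[:3]
        fields["grossAmount"] = Decimal(amount[3:])
    return fields


# Hypothetical payload; the IBAN is a placeholder, not a real account.
sample = "BCD\n002\n1\nSCT\n\nExample Supplier GmbH\nDE00000000000000000000\nEUR123.45\n\n\nInvoice 4711"
print(parse_epc_qr(sample))
```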

7.3 Payment Advice: Languages and Countries/Regions

See the list of supported languages and countries/regions for paymentAdvice documents.

Language

Document Information Extraction supports the following languages for paymentAdvice documents:

Language Language Code

English en


German de

Country/Region

Document Information Extraction supports the following countries/regions for paymentAdvice documents:

• Germany
• United Kingdom

7.4 Purchase Order: Languages and Countries/Regions

See the list of supported languages and countries/regions for purchaseOrder documents.

Language

Document Information Extraction supports the following languages for purchaseOrder documents:

Language Language Code

English en

German de

Country/Region

Document Information Extraction supports the following countries/regions for purchaseOrder documents:

• Germany
• United Kingdom
• United States

7.5 Extraction Using Template: Languages

See the list of languages supported when using a template to extract information from custom and standard
documents.

 Note

When you use templates to extract information from standard documents, accuracy is usually higher for
documents in the languages and countries/regions listed in Business Card: Languages [page 86], Invoice:
Languages and Countries/Regions [page 87], Payment Advice: Languages and Countries/Regions [page 90], and
Purchase Order: Languages and Countries/Regions [page 91].

Language

Extraction using a template supports the following Latin-script languages:

Language Language Code

Albanian sq

Bosnian bs

Catalan ca

Croatian hr

Czech cs

Danish da

Dutch nl

English en

Estonian et

Finnish fi

French fr

Galician gl

German de

Hungarian hu

Icelandic is

Indonesian id

Italian it

Irish ga


Latvian lv

Lithuanian lt

Malay ms

Montenegrin cnr

Norwegian no

Polish pl

Portuguese pt

Serbian sr

Slovak sk

Slovenian sl

Spanish es

Swedish sv

Turkish tr

Welsh cy

Extraction using a template supports the following non-Latin-script languages:

Language Language Code

Arabic ar

Chinese Simplified zh-Hans

Chinese Traditional zh-Hant

Greek el

Hebrew he

Japanese ja

Korean ko

Russian ru

Thai th

Related Information

Template API [page 211]
Template [page 252]

7.6 Extraction Using Generative AI: Languages

See the list of languages supported when using generative AI to extract information from custom and
standard documents.

 Restriction

Extraction using generative AI is available with the service plan Document Information Extraction,
premium edition (premium_edition) only. See Service Plans [page 77] and Metering and Pricing [page
79].

You can also use an SAP BTP trial account to try out extraction using generative AI. Follow the tutorial:
Use Trial to Extract Information from Custom Documents with Generative AI and Document Information
Extraction.

Language

Extraction using generative AI supports the following Latin-script languages:

Language Language Code

Albanian sq

Bosnian bs

Catalan ca

Croatian hr

Czech cs

Danish da

Dutch nl

English en

Estonian et

Finnish fi

French fr

Galician gl

German de

Hungarian hu

Icelandic is

Indonesian id

Italian it


Irish ga

Latvian lv

Lithuanian lt

Malay ms

Montenegrin cnr

Norwegian no

Polish pl

Portuguese pt

Serbian sr

Slovak sk

Slovenian sl

Spanish es

Swedish sv

Turkish tr

Welsh cy

Extraction using generative AI supports the following non-Latin-script languages:

Language Language Code

Arabic ar

Chinese Simplified zh-Hans

Chinese Traditional zh-Hant

Greek el

Hebrew he

Japanese ja

Korean ko

Russian ru

Thai th

Related Information

Add Fields to Schema Version [page 199]
Setup Types [page 249]
Extraction Using Generative AI: Best Practices [page 273]

8 Initial Setup

Get started with Document Information Extraction using the standard procedures for the SAP BTP Cloud Foundry
environment or the Kyma environment.

 Tip

See Tutorials [page 101] to find out how to use a trial account or the free tier option for Document
Information Extraction to try out the service.

Prerequisites

You have set up your global account and at least one subaccount on SAP BTP. For an overview of the required
steps, see Getting Started in the Cloud Foundry Environment or Getting Started in the Kyma Environment.

 Note

Document Information Extraction allows you to move subaccounts between your global accounts. For more
information, see Relationship Between Global Accounts, Subaccounts, and Directories [Feature Set B].

Related Information

Enabling the Service in the Cloud Foundry Environment [page 97]
Enabling the Service in the Kyma Environment [page 97]
Subscribing to the Document Information Extraction UI [page 234]

8.1 Enabling the Service in the Cloud Foundry Environment

Enable Document Information Extraction using the standard procedures for the SAP BTP Cloud Foundry
environment.

Context

 Tip

You can also use the booster Set up account for Document Information Extraction to automate the steps
described below on the SAP BTP cockpit. See Boosters and the tutorials:

• Use Free Tier to Set Up Account for Document Information Extraction and Get Service Key
• Use Free Tier to Set Up Account for Document Information Extraction and Go to Application

Procedure

1. Create a service instance in the Cloud Foundry environment. See Creating Service Instances.
2. You can then bind the service instance to your application, or you can create a service key to communicate
directly with the service instance. See Binding Service Instances to Applications and Creating Service Keys.

8.2 Enabling the Service in the Kyma Environment

Enable Document Information Extraction using the standard procedures for the Kyma environment.

Procedure

1. Create a service instance in the Kyma environment.
2. You can then bind the service instance to your application, or you can create a service key to communicate
directly with the service instance. See Using SAP BTP Services in the Kyma Environment.

9 Enable X.509 Authentication

Find out how to enable your service instance for authentication with an X.509 client certificate.

The Document Information Extraction service supports X.509 authentication with certificates that are either
managed by the SAP Authorization and Trust Management service or self-managed.

 Restriction

X.509 authentication is currently available for Document Information Extraction at API level only.

Enable Service Instance to Use X.509 Secrets

To enable your service instance for authentication with an X.509 client certificate, enter the following
instance parameters in JSON format in the Parameters step of the New Instance or Subscription wizard:

{
  "xs-security":{
    "xsappname":"<app-name>",
    "oauth2-configuration":{
      "credential-types":[
        "x509"
      ]
    }
  }
}

 Note

In the sample code, "<app-name>" is a name of your choice.

Create Service Key or Service Binding Additional Parameters

To use X.509 secrets, you need to set additional parameters when you create your service key or service
binding. We support the following two scenarios:

• The SAP Authorization and Trust Management service generates certificates for you. In this case, the
parameters you need to set when you create your service key or service binding are the following in JSON
format:

{
  "xsuaa":{
    "credential-type":"x509",
    "x509":{
      "key-length":2048,
      "validity":8,
      "validity-type":"DAYS"
    }
  }
}

For a detailed description of the parameters, see Parameters for X.509 Certificates Managed by SAP
Authorization and Trust Management Service.
• You already have your own public key infrastructure (PKI), with certificates issued by one of the trusted
Certificate Authorities (CAs). In this case, the parameters you need to set when you create your service key
or service binding are the following in JSON format:

{
  "xsuaa":{
    "credential-type":"x509",
    "x509":{
      "certificate":"-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----",
      "ensure-uniqueness":false,
      "certificate-pinning":true,
      "hide-certificate":true
    }
  }
}

For a detailed description of the parameters, see Parameters for Self-Managed X.509 Certificates. See also
Trusted Certificate Authentication.

Get an Authorization Token with X.509 Certificate

To get an authorization token with an X.509 certificate, send the token request to the “certurl” value of the
service key. When the certificates have already been generated for you, also use the “key” and “certificate”
values from the service key.

Example of a request using curl:

curl --cert <path to certificate.pem> --key <path to key.pem> \
  --request POST <value of "uaa.certurl">/oauth/token \
  -d 'grant_type=client_credentials&client_id=<value of "uaa.clientid">'
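The same token request can be sent from Python using only the standard library. This sketch sends the same form parameters as the curl example and authenticates through mutual TLS. It assumes the client certificate and private key are available as PEM files (for example, written from the “certificate” and “key” values of the service key); the file paths, the client ID, and the function names are placeholders.

```python
import json
import ssl
import urllib.parse
import urllib.request


def build_x509_token_request(certurl: str, client_id: str) -> urllib.request.Request:
    """Build the POST <certurl>/oauth/token request used with an X.509 client certificate."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "client_id": client_id})
    return urllib.request.Request(
        certurl.rstrip("/") + "/oauth/token",
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_x509_token(certurl: str, client_id: str, cert_file: str, key_file: str) -> str:
    """Send the request over mutual TLS and return the access token."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)  # present the client certificate
    request = build_x509_token_request(certurl, client_id)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)["access_token"]
```

For example, with uaa as the uaa section of the service key: fetch_x509_token(uaa["certurl"], uaa["clientid"], "certificate.pem", "key.pem").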

See also the blog post: X.509 certificate-based authentication(mTLS) – Generating X.509 certificates of BTP
managed services.

10 Run the Service in a Multitenant Application

Find out how to run Document Information Extraction in a multitenant application.

In the Cloud Foundry environment, you can develop and run multitenant applications, and share them with
multiple consumers simultaneously on SAP BTP.

Document Information Extraction supports this scenario and can be declared as a dependency of a multitenant
application. This means that Document Information Extraction gets provisioned automatically for every
consumer that subscribes to the multitenant application. Different consumers are independently provisioned
and data from these consumers is isolated inside Document Information Extraction.

 Tip

See Developing Multitenant Applications in the Cloud Foundry Environment for more details on how to
declare Document Information Extraction as a dependency of a multitenant application using the SAP
SaaS Provisioning service.

11 Tutorials

Follow our tutorials to get familiar with the Document Information Extraction UI application, APIs, and
functionalities.

Tutorial missions:

• Use Generative AI to Process Business Documents: Find out how to use the SAP Business Technology Platform
service Document Information Extraction with generative AI to automate the extraction of information from any
type of document using large language models (LLMs).
• Use Machine Learning to Process Business Documents: Try out the Document Information Extraction Trial UI to
process business documents that have content in headers and tables.
• Use Machine Learning to Extract Information from Business Documents and Enrich Data: Process business
documents that have content in headers and tables, and enrich the information extracted with your own master
data records, using machine learning and Swagger UI.
• Shape Machine Learning to Process Standard Business Documents: Create your own header and line item fields,
and edit extraction results for documents associated with templates to automate the extraction of information
from standard business documents such as invoices and purchase orders.
• Shape Machine Learning to Process Custom Documents: Create your own header and line item fields, and edit
extraction results for documents associated with templates to automate the extraction of information from
custom documents (not supported out of the box) such as résumés and power of attorney.

 Tip

See also the following onboarding tutorials that use the free tier option for Document Information
Extraction:

• Use Free Tier to Set Up Account for Document Information Extraction and Get Service Key
• Use Free Tier to Set Up Account for Document Information Extraction and Go to Application

Related Information

Tutorial Navigator

12 Development

Explore the sections listed below to get started with the Document Information Extraction APIs and the
Notifications feature.

• API Reference [page 102]
• Notifications [page 227]

12.1 API Reference

Explore the Document Information Extraction APIs.

Before using the Document Information Extraction APIs listed below, you need to retrieve your OAuth access
token as described in Get Access Token [page 103].

• Capabilities API [page 104]
• Client API [page 106]
• Identifier API [page 110]
• Configuration API [page 115]
• Document API [page 127]
• Enrichment Data API [page 166]
• Schema API [page 184]
• Template API [page 211]

To display the comprehensive specification of the Document Information Extraction APIs in Swagger UI, add
the URL path extension /document-information-extraction/v1 to the Document Information Extraction
base URL (that is, the url value from outside the uaa section of your service key).
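Both the API base path and the token endpoint come from the service key. The following Python sketch derives them from a service key that has already been loaded as a dictionary (for example, with json.load); the key values shown are placeholders, and the function name is hypothetical.

```python
API_PATH = "/document-information-extraction/v1"


def endpoints_from_service_key(service_key: dict) -> dict:
    """Derive commonly used URLs from a Document Information Extraction service key."""
    api_base = service_key["url"].rstrip("/") + API_PATH   # url outside the uaa section
    return {
        "swagger_ui": api_base,
        "capabilities": api_base + "/capabilities",
        "clients": api_base + "/clients",
        "token": service_key["uaa"]["url"].rstrip("/") + "/oauth/token",  # url inside the uaa section
    }


# Placeholder service key with only the fields used above.
key = {"url": "https://dox.example.com", "uaa": {"url": "https://tenant.authentication.example.com"}}
for name, url in endpoints_from_service_key(key).items():
    print(name, url)
```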

Related Information

Common Request Headers [page 226]
Common Status and Error Codes [page 226]
Best Practices [page 258]
Technical Constraints [page 275]
Extracted Header Fields [page 278]
Extracted Line Items [page 286]

12.1.1 Get Access Token

Retrieve your OAuth access token, which will grant you access to the Document Information Extraction APIs.

 Note

The token is valid for 12 hours. After that, you need to generate a new one.

 Tip

Alternatively, you can follow the steps in the tutorial Get OAuth Access Token for Document Information
Extraction via Web Browser.

Request

Base URL: url value from inside the uaa section of the service key

URL Path: /oauth/token

HTTP Method: POST

Request Headers

Header Required Values

Content-Type Yes application/x-www-form-urlencoded

Request Parameters

Parameter Required Data Type Description

client_id Yes String The clientid value from the service key.

client_secret Yes String The clientsecret value from the service key.

grant_type Yes String Token grant type. Set it to client_credentials.

response_type Yes String Token response type. Set it to token.

Response

The response is given as a status (200 or 401). See Common Status and Error Codes [page 226].

Response Example
200 “Success”

{
  "access_token": "<< your access token >>",
  "token_type": "bearer",
  "expires_in": 43199,
  "scope": "uaa.resource",
  "jti": "8d00c157058949daab714a44c04c416b"
}
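The request described above can be sketched in Python with the standard library. This sketch sends the four parameters as a form-encoded body, which matches the Content-Type header listed above, and computes the expiry time from expires_in so that a caller knows when to request a new token. The function names are hypothetical, and the values must come from your own service key.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional


def build_token_request(service_key: dict) -> urllib.request.Request:
    """Build POST <uaa.url>/oauth/token with the client credentials from the service key."""
    uaa = service_key["uaa"]
    body = urllib.parse.urlencode({
        "client_id": uaa["clientid"],
        "client_secret": uaa["clientsecret"],
        "grant_type": "client_credentials",
        "response_type": "token",
    })
    return urllib.request.Request(
        uaa["url"].rstrip("/") + "/oauth/token",
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(body: str, now: Optional[float] = None) -> tuple[str, float]:
    """Return the access token and the Unix time at which it expires."""
    data = json.loads(body)
    issued_at = time.time() if now is None else now
    return data["access_token"], issued_at + data["expires_in"]


# To send the request: urllib.request.urlopen(build_token_request(service_key))
```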

12.1.2 Capabilities API

See the list of document fields and enrichment data for each document type you can process with Document
Information Extraction.

 Tip

• See Supported Document Types and File Formats [page 84].
• See also Supported Languages and Countries/Regions [page 86].

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /capabilities

HTTP Method: GET

Response

Response Fields

JSON Field Description

documentTypes List of document types you can submit

enrichment List of enrichment data you can match

extraction Object that contains the headerFields and lineItemFields lists

headerFields List of header fields you can extract

lineItemFields List of line items you can extract

The response is given as a status (200, 401, or 500) and JSON file. See Common Status and Error Codes [page
226].

Response Example
200 “Success”

{
"extraction":{
"headerFields":[
{
"name":"documentNumber",
"type":"string",
"category":"document",
"supportedDocumentTypes":[
"invoice",
"paymentAdvice",
"purchaseOrder"
]
},
{
"name":"taxId",
"type":"string",
"category":"amounts",
"supportedDocumentTypes":[
"invoice",
"purchaseOrder"
]
},
{
"name":"taxName",
"type":"string",
"category":"amounts",
"supportedDocumentTypes":[
"invoice"
]
},
{
"name":"purchaseOrderNumber",
"type":"string",
"category":"details",
"supportedDocumentTypes":[
"invoice"
]
},
{
"name":"shippingAmount",
"type":"number",
"category":"amounts",
"supportedDocumentTypes":[
"invoice"
]
},
"..."
],
"lineItemFields":[
{
"name":"description",
"type":"string",
"category":"details",
"supportedDocumentTypes":[
"invoice",
"purchaseOrder"
]
},
{
"name":"netAmount",
"type":"number",
"category":"amounts",
"supportedDocumentTypes":[
"invoice",

"paymentAdvice",
"purchaseOrder"
]
},
{
"name":"quantity",
"type":"number",
"category":"details",
"supportedDocumentTypes":[
"invoice",
"purchaseOrder"
]
},
{
"name":"unitPrice",
"type":"number",
"category":"details",
"supportedDocumentTypes":[
"invoice",
"purchaseOrder"
]
},
{
"name":"materialNumber",
"type":"string",
"category":"details",
"supportedDocumentTypes":[
"invoice"
]
},
"..."
]
},
"enrichment":{
"employee":{
"dataTypes":[
"employee"
]
},
"sender":{
"dataTypes":[
"businessEntity"
]
},
"receiver":{
"dataTypes":[
"businessEntity"
]
}
},
"documentTypes":[
"invoice",
"paymentAdvice",
"purchaseOrder",
"businessCard"
]
}
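To find out which header fields are available for a document type, filter the extraction.headerFields list by supportedDocumentTypes. The following Python sketch does this on an already parsed response; it skips the "..." placeholders used in the example above, and the function name and the shortened sample response are illustrative.

```python
import json


def header_fields_for(capabilities: dict, document_type: str) -> list[str]:
    """Names of header fields whose supportedDocumentTypes include document_type."""
    return [
        field["name"]
        for field in capabilities["extraction"]["headerFields"]
        if isinstance(field, dict)  # skip placeholders such as "..."
        and document_type in field.get("supportedDocumentTypes", [])
    ]


response_body = """
{"extraction": {"headerFields": [
   {"name": "documentNumber", "supportedDocumentTypes": ["invoice", "paymentAdvice", "purchaseOrder"]},
   {"name": "taxName", "supportedDocumentTypes": ["invoice"]},
   "..."
 ], "lineItemFields": []},
 "documentTypes": ["invoice", "paymentAdvice", "purchaseOrder", "businessCard"]}
"""
capabilities = json.loads(response_body)
print(header_fields_for(capabilities, "purchaseOrder"))  # prints: ['documentNumber']
```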

12.1.3 Client API


Calls to Document Information Extraction require a client. A default client is created after tenant
provisioning, so you can use the service immediately.

The Client API consists of the following endpoints:

• Create Client [page 107]
• Get Client [page 108]
• Delete Client [page 109]

12.1.3.1 Create Client

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /clients

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

payload Yes JSON Object List of clients containing clientId and clientName

Request Example

Single client:

{
"value":[
{
"clientId":"c_00",
"clientName":"client 00"
}
]
}

Multiple clients:

{
"value":[
{
"clientId":"c_00",
"clientName":"tyler"
},
{
"clientId":"c_01",
"clientName":"jlaix"
}
]
}
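The following is a minimal sketch of building the Create Client request with Python's standard library. It assumes OAuth bearer-token authentication with an access token obtained through the uaa section of the service key (fetching the token is not shown), and it only builds the request without sending it. The function name `build_create_clients_request`, the host `https://example.invalid`, and `<token>` are placeholders.

```python
import json
import urllib.request

API_PATH = "/document-information-extraction/v1"


def build_create_clients_request(base_url, access_token, clients):
    """Build (but don't send) a POST /clients request.

    base_url: the url value from outside the uaa section of the service key.
    access_token: OAuth token from the uaa credentials (assumed Bearer auth).
    clients: iterable of (clientId, clientName) pairs.
    """
    body = {"value": [{"clientId": cid, "clientName": name} for cid, name in clients]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + API_PATH + "/clients",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same payload as the "Multiple clients" example above.
req = build_create_clients_request(
    "https://example.invalid", "<token>", [("c_00", "tyler"), ("c_01", "jlaix")]
)
# urllib.request.urlopen(req) would send the request; that step is omitted here.
```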

Response

Response Fields

JSON Field Description

inserted Number of inserted entries

modified Number of modified entries

The response is given as a status (201, 400, 401, 429, or 500) and JSON file. See Common Status and Error
Codes [page 226] and Technical Constraints [page 275].

Response Example

201 “Created”

{
"inserted":1,
"modified":2
}

12.1.3.2 Get Client

Retrieve all the client names and IDs.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /clients

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientIdStartsWith No String Filters the list of clients by the characters the clientId starts with. For example: c

limit Yes Integer Number of clients to retrieve. For example: 10. See Technical Constraints [page 275].

offset No Integer Index of the first client to be retrieved. For example: 10

Response

Response Fields

JSON Field Description

id Tenant ID

payload List of all clients, including their zoneId, clientId, and clientName

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"id": "1234",
"payload": [
{
"clientId": "c_00",
"clientName": "client 00"
},
{
"clientId": "c_01",
"clientName": "client 01"
},
{
"clientId": "c_02",
"clientName": "client 02"
},
{
"clientId": "c_03",
"clientName": "client 03"
},
{
"clientId": "c_04",
"clientName": "client 04"
}
]
}
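Because limit is required and offset selects the first client, long client lists can be read page by page. The following is a minimal sketch of that loop, assuming the parameters are passed as URL query parameters. `fetch_json` is a hypothetical caller-supplied function that sends an authenticated GET and returns the decoded body; the in-memory `fake_fetch` below only stands in for the service.

```python
import urllib.parse

API_PATH = "/document-information-extraction/v1"


def client_list_url(base_url, limit, offset=0, starts_with=None):
    """Build a GET /clients URL from the documented request parameters."""
    params = {"limit": limit, "offset": offset}
    if starts_with:
        params["clientIdStartsWith"] = starts_with
    return f"{base_url.rstrip('/')}{API_PATH}/clients?{urllib.parse.urlencode(params)}"


def iter_clients(fetch_json, base_url, limit=10, starts_with=None):
    """Yield all clients, advancing offset until a short page comes back."""
    offset = 0
    while True:
        page = fetch_json(client_list_url(base_url, limit, offset, starts_with))["payload"]
        yield from page
        if len(page) < limit:  # last page reached
            return
        offset += limit


# Usage with an in-memory stand-in shaped like the response example above.
_clients = [{"clientId": f"c_{i:02d}", "clientName": f"client {i:02d}"} for i in range(5)]


def fake_fetch(url):
    query = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    start, size = int(query["offset"][0]), int(query["limit"][0])
    return {"id": "1234", "payload": _clients[start:start + size]}


all_ids = [c["clientId"] for c in iter_clients(fake_fetch, "https://example.invalid", limit=2)]
```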

12.1.3.3 Delete Client

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /clients

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

payload Yes JSON Object List of client IDs

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"message": "Successfully deleted 1 client(s)."
}

12.1.4 Identifier API

Create, list, and delete identifiers for client mappings.

 Restriction

The Identifier API is only available for paymentAdvice documents in Excel format.

The Identifier API consists of the following endpoints:

• Create Identifier [page 111]
• Get Identifier [page 113]
• Delete Identifier [page 114]

12.1.4.1 Create Identifier

Create identifiers for client mappings.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /identifier

HTTP Method: POST

Request Parameters

 Note

In a single POST call, you can create aliases for only one documentType and fileType.

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00.

options Yes JSON Object Options for processing the document. See the
Options Payload table below.

Options Payload
Option Required Data Type Description

documentType Yes String Type of the document submitted. For now, only paymentAdvice is supported.

fileType Yes String Type of the file submitted. For now, only Excel is supported.

headerFields Yes String List of header fields with the aliases of supported capabilities you want to extract from a specific documentType and fileType. See the list of fields that can be extracted from header fields in Extracted Header Fields [page 278].

language Yes String Language of the aliases


lineItemFields Yes String List of line item fields with the aliases of supported capabilities you want to extract from a specific documentType and fileType. See the list of fields that can be extracted from line items in Extracted Line Items [page 286].

Request Example: Options Payload

{
"documentType":"paymentAdvice",
"fileType":"Excel",
"headerFields":[
{
"language":"en",
"capabilities":{
"documentNumber":[
"Payment Number"
],
"documentDate":[
"Payment Date"
],
"currencyCode":[
"Invoice Currency"
],
"grossAmount":[
"Amount in Invoice Currency",
"Document currency"
]
}
},
{
"language":"de",
"capabilities":{
"documentNumber":[
"Beleg-Nr."
],
"documentDate":[
"RE-Datum"
]
}
}
],
"lineItemFields":[
{
"language":"en",
"capabilities":{
"documentNumber":[
"Invoice Number",
"Document Number"
],
"documentDate":[
"Invoice Date",
"Document Date"
],
"discountAmount":[
"Cash disc. amt LC"
],
"netAmount":[
"Amount Paid",

"Amount in doc. curr."
]
}
},
{
"language":"de",
"capabilities":{
"documentNumber":[
"Beleg-Nr."
],
"documentDate":[
"RE-Datum"
],
"netAmount":[
"Gesamt-OP"
]
}
}
]
}
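The options payload groups aliases by language for each capability. The following is a minimal sketch that builds the same shape from plain dictionaries, fixing documentType and fileType to paymentAdvice and Excel because those are the only supported values and a single call may cover only one pair. The function name `build_identifier_options` is hypothetical, and the sketch only builds the payload; how clientId and options are transmitted is not shown here.

```python
import json


def build_identifier_options(header_aliases, line_item_aliases):
    """Build the options payload for POST /identifier.

    Both arguments map a language code to {capability name: [alias, ...]}.
    """
    def to_entries(aliases_by_language):
        return [
            {"language": language, "capabilities": capabilities}
            for language, capabilities in aliases_by_language.items()
            if capabilities  # skip languages that define no aliases
        ]

    return {
        "documentType": "paymentAdvice",  # only supported document type
        "fileType": "Excel",              # only supported file type
        "headerFields": to_entries(header_aliases),
        "lineItemFields": to_entries(line_item_aliases),
    }


options = build_identifier_options(
    header_aliases={"en": {"documentNumber": ["Payment Number"]},
                    "de": {"documentNumber": ["Beleg-Nr."]}},
    line_item_aliases={"en": {"netAmount": ["Amount Paid", "Amount in doc. curr."]}},
)
print(json.dumps(options, indent=2))
```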

Response

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

12.1.4.2 Get Identifier

Retrieve all identifiers for client mappings by fileType, documentType, and clientId.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /identifier

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00.

documentType Yes String Type of the document submitted. For now, only
paymentAdvice is supported.


fileType Yes String Type of the file submitted. For now, only Excel is
supported.

Response

Response Fields

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

12.1.4.3 Delete Identifier

Delete identifiers for client mappings.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /identifier

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00.

documentType No String Type of the document submitted. For now, only paymentAdvice is supported.

fileType No String Type of the file submitted. For now, only Excel is
supported.

 Note

If you want to delete aliases for a specific documentType and fileType, all parameter fields are required.
If the documentType and fileType are not provided, all aliases are deleted.
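Because omitting documentType and fileType deletes all aliases, it can help to guard against sending only one of them by accident. The following is a minimal sketch of such a guard. It assumes the parameters are sent as URL query parameters, which this section doesn't state, and the function name `delete_identifier_query` is hypothetical.

```python
import urllib.parse


def delete_identifier_query(client_id, document_type=None, file_type=None):
    """Return the query string for DELETE /identifier.

    Both document_type and file_type: delete aliases for that pair only.
    Neither: delete all aliases. Only one of them: rejected, because a
    targeted delete requires all parameter fields.
    """
    if (document_type is None) != (file_type is None):
        raise ValueError("provide both documentType and fileType, or neither")
    params = {"clientId": client_id}
    if document_type is not None:
        params["documentType"] = document_type
        params["fileType"] = file_type
    return urllib.parse.urlencode(params)


targeted = delete_identifier_query("c_00", "paymentAdvice", "Excel")
delete_everything = delete_identifier_query("c_00")
```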

Response

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

12.1.5 Configuration API

Create, update, list, and delete configurations on tenant scope by default, or optionally, on instance or client
scope.

The Configuration API consists of the following endpoints:

• Create Configuration [page 115]
• Get Configuration [page 120]
• Get Configuration with Key [page 122]
• Delete Configuration [page 124]

Related Information

Configuration Keys [page 117]

12.1.5.1 Create Configuration

Create or update configurations according to the given payload.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /configuration

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client you want to set the configura-
tion for. For example: c_00. This parameter is only
used for client scope configurations.

payload Yes JSON Object List of configuration key-value pairs. For more information, see Configuration Keys [page 117].

scope No String Choose the scope of the configuration from the following values:

• client
• instance
• tenant

Tip
If you leave this parameter empty, the tenant scope is used.

tenantId No String The ID of the tenant you want to set the configura-
tion for.

 Tip
If you leave this parameter empty, the
tenantId sending the request is used.

Response

Response Fields

JSON Field Description

inserted Number of inserted entries

modified Number of modified entries

The response is given as a status (201, 401, or 500) and JSON file. See Common Status and Error Codes [page
226].

Response Example
201 “Created”

{
"inserted":1,
"modified":0
}

12.1.5.1.1 Configuration Keys

Explore the available configuration keys for the Document Information Extraction service.

activateDocumentNotifications

Default value: false
Possible values: true or false
Scope: client, instance, tenant

Use this configuration key to enable or disable the Notifications [page 227] functionality. Set activateDocumentNotifications to true to get notifications about the status of your processed documents.

Create Configuration [page 115] request payload example:

{
"value":{
"activateDocumentNotifications":"true"
}
}

clientSegregation

Default value: false
Possible values: true or false
Scope: instance, tenant

Use this configuration key to restrict user access to specified clients. See also Add Document [page 240].

Create Configuration [page 115] request payload example:

{
"value":{
"clientSegregation":"true"
}
}

coordinateFormat

Default value: default
Possible values: default, absolute, or normalized
Scope: instance, tenant

Use this configuration key to choose the format of the bounding box coordinates in the extraction results.

Create Configuration [page 115] request payload example:

{
"value":{
"coordinateFormat":"normalized"
}
}

dataFeedbackCollection

Default value: false
Possible values: true or false
Scope: client, instance, tenant

Use this configuration key to make use of the data feedback collection feature. See also Confirm Document [page 157]. If set to false, all documents already uploaded to the service for retraining by this tenant (or instance) are deleted, and all documents uploaded from that moment onwards are no longer used to retrain the service's machine learning models. See also Delete Configuration [page 124].

Remember
As Document Information Extraction learns from data, enabling data feedback collection may help the service to become more accurate in extracting information from your documents. Conversely, deleting data may make extraction results less accurate. Deletion of data is irreversible.

Create Configuration [page 115] request payload example:

{
"value":{
"dataFeedbackCollection":"true"
}
}

documentRetentionTimeDays

Default value: 7 days
Possible values: 1 to 30 days
Scope: client, instance, tenant

Use this configuration key to set the retention period for inference documents uploaded to the service.

Create Configuration [page 115] request payload example:

{
"value":{
"documentRetentionTimeDays":"10"
}
}

enrichmentConfidenceThreshold

Default value: low
Possible values: low, medium, or high
Scope: client, instance, tenant

Use this configuration key to adjust the similarity confidence threshold for the enrichment. The low value results in more matches, with a higher possibility of false-positive matches. The high value returns only very confident matches and has a lower tolerance for differences between document content and master data. The medium value is a balanced adjustment.

This configuration can alter the behavior of the enrichment. If you don't get good enrichment results, we recommend testing the different values for this configuration. Use a lower value if you want to get more matches, or if the expected master data doesn't match the document. Use a higher value if you get incorrect or unexpected matches.

Create Configuration [page 115] request payload example:

{
"value":{
"enrichmentConfidenceThreshold":"medium"
}
}

manualDataActivation

Default value: false
Possible values: true or false
Scope: tenant

Use this configuration key to set data activation to manual, instead of using the default automatic refresh of enrichment data, which takes place every 4 hours. See also Create Data Activation [page 179] and Get Data Activation Details [page 180].

Create Configuration [page 115] request payload example:

{
"value":{
"manualDataActivation":"true"
}
}

performPIICheck

Default value: true
Possible values: true or false
Scope: tenant

This is a subconfiguration of the dataFeedbackCollection configuration key. To use this subconfiguration, set the dataFeedbackCollection configuration key to true. The performPIICheck subconfiguration is set to true by default. If set to true, the service automatically scans documents for Personally Identifiable Information (PII) and excludes any document with PII from being used for improving the service. If you set performPIICheck to false, all documents may be used for improving the service. See also Confirm Document [page 157].

Create Configuration [page 115] request payload example:

{
"value":{
"performPIICheck":"false"
}
}
 Note

Before setting the dataFeedbackCollection configuration key to true, and the performPIICheck
subconfiguration to false, review the subsection Deletion of Personal Data in Data Protection and Privacy
[page 288].

 Restriction

The documentRetentionTimeDays and dataFeedbackCollection configuration keys, and the performPIICheck subconfiguration, are only available in production environments for enterprise accounts. These keys are not available for trial account users.
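The payload examples above wrap each key in a "value" object and send every value as a string. The following is a minimal sketch, transcribed from the possible values listed above, that checks settings locally before they are sent to Create Configuration. The function and constant names are hypothetical; scope, clientId, and tenantId are separate request parameters and are not handled here, and the service remains the authority on what it accepts (for example, the trial-account restriction above isn't checked).

```python
import json

# Possible values, transcribed from the Configuration Keys descriptions.
BOOLEAN_KEYS = {
    "activateDocumentNotifications", "clientSegregation",
    "dataFeedbackCollection", "manualDataActivation", "performPIICheck",
}
ENUM_KEYS = {
    "coordinateFormat": {"default", "absolute", "normalized"},
    "enrichmentConfidenceThreshold": {"low", "medium", "high"},
}


def build_configuration_payload(**settings):
    """Validate settings and wrap them as {"value": {key: "string value"}}."""
    value = {}
    for key, raw in settings.items():
        if key in BOOLEAN_KEYS:
            if not isinstance(raw, bool):
                raise ValueError(f"{key} expects true or false")
            text = "true" if raw else "false"
        elif key in ENUM_KEYS:
            if raw not in ENUM_KEYS[key]:
                raise ValueError(f"{key} must be one of {sorted(ENUM_KEYS[key])}")
            text = raw
        elif key == "documentRetentionTimeDays":
            if isinstance(raw, bool) or not isinstance(raw, int) or not 1 <= raw <= 30:
                raise ValueError("documentRetentionTimeDays must be 1 to 30")
            text = str(raw)
        else:
            raise ValueError(f"unknown configuration key: {key}")
        value[key] = text  # the documented examples send values as strings
    return {"value": value}


print(json.dumps(build_configuration_payload(documentRetentionTimeDays=10,
                                             coordinateFormat="normalized")))
# {"value": {"documentRetentionTimeDays": "10", "coordinateFormat": "normalized"}}
```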

12.1.5.2 Get Configuration

Retrieve all configurations already created for the requested scope.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /configuration

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client you want to get the configura-
tion for. For example: c_00. This parameter is only
used for client scope configurations.

scope No String Choose the scope of the configuration from the following values:

• active (all configurations already created, including the ones on client, instance, and tenant scope)
• client
• instance
• tenant

Tip
If you leave this parameter empty, the active scope is used.

tenantId No String The ID of the tenant you want to get the configura-
tion for.

 Tip
If you leave this parameter empty, the
tenantId sending the request is used.

Response

Response Fields

JSON Field Description

results Object containing the key-value pairs of all configurations for the requested scope

The response is given as a status (200, 401, or 500) and JSON file. See Common Status and Error Codes [page
226].

Response Example
200 “Success”

{
"results":{
"documentRetentionTimeDays": "10",
"manualDataActivation": "true",
"dataFeedbackCollection": "true",
"performPIICheck": "true"
}
}

12.1.5.3 Get Configuration with Key

Retrieve all configurations already created for a given key for the requested scope.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /configuration/<key>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client you want to get the configura-
tion for. For example: c_00. This parameter is only
used for client scope configurations.

key Yes String One of the available Configuration Keys [page 117].


scope No String Choose the scope of the configuration from the following values:

• active (all configurations already created, including the ones on client, instance, and tenant scope)
• client
• instance
• tenant

Tip
If you leave this parameter empty, the active scope is used.

tenantId No String The ID of the tenant you want to get the configura-
tion for.

 Tip
If you leave this parameter empty, the
tenantId sending the request is used.

Response

Response Fields

JSON Field Description

results Object containing the key-value pair of the requested configuration key

The response is given as a status (200, 400, 401, 404, or 500) and JSON file. See Common Status and Error
Codes [page 226].

Response Example
200 “Success”

{
"results":{
"documentRetentionTimeDays": "10"
}
}

{
"results":{
"manualDataActivation": "true"
}

}

{
"results":{
"dataFeedbackCollection": "true"
}
}

{
"results":{
"performPIICheck": "true"
}
}
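In this endpoint the configuration key is part of the path. The following is a minimal sketch of building the request URL. It assumes that scope, clientId, and tenantId are passed as URL query parameters, which this section doesn't state explicitly; the function name `configuration_key_url` is hypothetical.

```python
import urllib.parse

API_PATH = "/document-information-extraction/v1"
CONFIGURATION_KEYS = {
    "activateDocumentNotifications", "clientSegregation", "coordinateFormat",
    "dataFeedbackCollection", "documentRetentionTimeDays",
    "enrichmentConfidenceThreshold", "manualDataActivation", "performPIICheck",
}
SCOPES = {"active", "client", "instance", "tenant"}


def configuration_key_url(base_url, key, scope=None, client_id=None, tenant_id=None):
    """Build GET /configuration/<key>; if scope is omitted, active is used."""
    if key not in CONFIGURATION_KEYS:
        raise ValueError(f"unknown configuration key: {key}")
    if scope is not None and scope not in SCOPES:
        raise ValueError(f"unsupported scope: {scope}")
    params = [(name, value) for name, value in
              (("scope", scope), ("clientId", client_id), ("tenantId", tenant_id))
              if value is not None]
    url = f"{base_url.rstrip('/')}{API_PATH}/configuration/{key}"
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url


client_url = configuration_key_url("https://example.invalid", "coordinateFormat",
                                   scope="client", client_id="c_00")
```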

12.1.5.4 Delete Configuration

Delete configurations according to the given payload.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /configuration

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client you want to delete the config-
uration for. For example: c_00. This parameter is
only used for client scope configurations.


payload Yes JSON Object List of configuration keys. All configurations are
deleted if payload is empty. Possible configuration
key and subconfiguration values:

• activateDocumentNotifications
• clientSegregation
• coordinateFormat
• dataFeedbackCollection

 Note
After you send the DELETE request with the dataFeedbackCollection configuration key, all documents already uploaded to the service for retraining by this tenant (or service instance) are deleted, and documents uploaded from that moment onwards are no longer used to retrain the service's machine learning models. This matches the behavior described for setting the key to false in Configuration Keys [page 117].

• documentRetentionTimeDays

 Note
After sending the DELETE request using
the documentRetentionTimeDays
configuration key, the default retention
period of 7 days is used again.

• enrichmentConfidenceThreshold
• manualDataActivation
• performPIICheck

scope No String Choose the scope of the configuration from the following values:

• client
• instance
• tenant

Tip
If you leave this parameter empty, the tenant scope is used.


tenantId No String The ID of the tenant you want to delete the config-
uration for.

 Tip
If you leave this parameter empty, the
tenantId sending the request is used.

Request Examples

{
"value":[
"documentRetentionTimeDays"
]
}

{
"value":[
"manualDataActivation"
]
}

{
"value":[
"dataFeedbackCollection"
]
}

{
"value":[
"performPIICheck"
]
}

{
"value":[
"documentRetentionTimeDays",
"manualDataActivation",
"dataFeedbackCollection",
"performPIICheck"
]
}
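Because an empty payload deletes all configurations, and deleting dataFeedbackCollection irreversibly removes retraining documents, a client-side guard can make destructive calls explicit. The following is a minimal sketch of such a guard. It assumes an empty payload is expressed as {"value": []}, which this section doesn't show; the function name and the delete_all flag are hypothetical.

```python
CONFIGURATION_KEYS = {
    "activateDocumentNotifications", "clientSegregation", "coordinateFormat",
    "dataFeedbackCollection", "documentRetentionTimeDays",
    "enrichmentConfidenceThreshold", "manualDataActivation", "performPIICheck",
}


def build_delete_configuration_payload(keys, delete_all=False):
    """Build {"value": [...]} for DELETE /configuration.

    An empty list deletes ALL configurations, so it is only produced when
    delete_all=True is passed explicitly.
    """
    keys = list(keys)
    if not keys and not delete_all:
        raise ValueError("empty payload deletes all configurations; pass delete_all=True")
    unknown = sorted(set(keys) - CONFIGURATION_KEYS)
    if unknown:
        raise ValueError(f"unknown configuration keys: {unknown}")
    return {"value": keys}


single = build_delete_configuration_payload(["manualDataActivation"])
everything = build_delete_configuration_payload([], delete_all=True)
```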

Response

Response Fields

JSON Field Description

deleted Total number of configurations deleted with this request

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
"deleted": 1
}

12.1.6 Document API

The core functionality of Document Information Extraction is extracting structured information from
documents automatically using machine learning. The Document API provides endpoints to upload documents
for processing and also to get the results.

The Document API consists of the following endpoints:

• Upload Document [page 127]
• Post Catalog [page 134]
• List Documents [page 137]
• Get Result [page 138]
• Save Ground Truth [page 154]
• Confirm Document [page 157]
• Get Document File [page 159]
• Get All Pages Text [page 159]
• Get Single Page Text [page 161]
• Get Request Options [page 163]
• Get Templates Associated with Document [page 164]
• Delete Document [page 165]

12.1.6.1 Upload Document

Upload a document file to the service to get the extraction results from header fields and line items in JSON
format.

 Tip

• See Supported Document Types and File Formats [page 84].
• See also Supported Languages and Countries/Regions [page 86].

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

file Yes File Document file you want to process. See Supported
Document Types and File Formats [page 84].

options Yes JSON Object Options for processing the document. See the
Options Payload table below.

Options Payload
Option Required Data Type Description

candidateTemplateIds No String IDs of templates from which the service automatically detects the appropriate templateId.

clientId Yes String The ID of the client. For example: c_00

customLabel No String The label you want to use for the document. If used, you can query the corresponding document ID using the Post Catalog [page 134] endpoint.

documentType No String The type of the document you uploaded. For example: invoice or paymentAdvice

enrichment No String See Enrichment Parameter [page 132].

headerFields No (Yes if payload doesn't include schemaId) String Comma-separated list of header fields you want to extract. When you include schemaId in the payload, don't include a list of headerFields.

See the list of fields that can be extracted from header fields in Extracted Header Fields [page 278].


lineItemFields No (Yes if payload doesn't include schemaId) String Comma-separated list of line item fields you want to extract. When you include schemaId in the payload, don't include a list of lineItemFields.

See the list of fields that can be extracted from line items in Extracted Line Items [page 286].

receivedDate No String The date when the document was received. For example: 2020-02-17

templateId No String The ID of the template to be used for this document.

To detect templateId automatically, use the value "detect" instead of the ID string. You can also optionally use the candidateTemplateIds option to restrict detection to specified templates.

Caution
schemaId isn’t always a required option. However, if your payload includes templateId, it must also include schemaId. In such cases, don’t include headerFields or lineItemFields in the payload to avoid conflicts.


schemaId No (Yes if payload includes templateId or doesn't include a list of headerFields and / or a list of lineItemFields) String The ID of the schema to be used for this document.

To use one of the preconfigured SAP schemas, consider the following schema IDs and document types:

• SAP_OCROnly_schema: "schemaId":"09e6c9e4-d7b0-414f-bd85-cfee6fbb2add" for custom documents
• SAP_invoice_schema: "schemaId":"cf8cc8a9-1eee-42d9-9a3e-507a61baac23" for invoice documents
• SAP_purchaseOrder_schema: "schemaId":"fbab052e-6f9b-4a5f-b42f-29a8162eb1bf" for purchaseOrder documents
• SAP_paymentAdvice_schema: "schemaId":"b7fdcfac-7853-42bb-89d2-ede2ba1ce803" for paymentAdvice documents

schemaVersion No (Yes if payload includes schemaId and you don't want to use the default schemaVersion, version 1) String The version number of the schema you want to use for this document. If you include schemaVersion in the payload, schemaId must also be provided. If schemaVersion isn't provided, default version 1 is used.
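The table above contains several interdependent rules (templateId requires schemaId, schemaId excludes field lists, and so on). The following is a minimal sketch that checks an options dictionary against those rules before upload and returns a list of problems. The function name `check_upload_options` is hypothetical, and the service's own validation remains authoritative. Field lists are also looked up under "extraction" because the "without Template" request example nests them there.

```python
def check_upload_options(options: dict) -> list:
    """Return the rule violations found in an Upload Document options payload."""
    problems = []
    if not options.get("clientId"):
        problems.append("clientId is required")
    has_schema = "schemaId" in options
    # The documented "without Template" example nests the field lists
    # under "extraction", so both locations are checked.
    extraction = options.get("extraction", {})
    has_header = "headerFields" in options or "headerFields" in extraction
    has_items = "lineItemFields" in options or "lineItemFields" in extraction
    if "templateId" in options and not has_schema:
        problems.append("templateId requires schemaId")
    if has_schema and (has_header or has_items):
        problems.append("don't combine schemaId with headerFields/lineItemFields")
    if not has_schema and not (has_header and has_items):
        problems.append("without schemaId, provide headerFields and lineItemFields")
    if "schemaVersion" in options and not has_schema:
        problems.append("schemaVersion requires schemaId")
    return problems


ocr_only = {"schemaId": "09e6c9e4-d7b0-414f-bd85-cfee6fbb2add",
            "clientId": "c_10", "documentType": "custom"}
template_without_schema = {"clientId": "c_00", "templateId": "detect"}
```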

Request Example: Options Payload for Autodetecting templateId

{
"clientId":"c_00",
"documentType":"invoice",
"receivedDate":"2020-02-17",
"schemaId":"10c10bd2-082b-47c8-851d-e58827828637",

"templateId":"detect",
"candidateTemplateIds":[
"0ebcd5c4-7843-4e6e-867a-1e5c997e4e4c",
"98ee6ff3-30bf-4e22-8579-0f0bde462c53",
"d6f62ef3-551a-454d-bfa4-fc334af30bf2"
],
"enrichment":{

}
}

Request Example: Options Payload with Template

{
"clientId":"c_00",
"documentType":"invoice",
"receivedDate":"2020-02-17",
"schemaId":"10c10bd2-082b-47c8-851d-e58827828637",
"templateId":"0ebcd5c4-7843-4e6e-867a-1e5c997e4e4c",
"enrichment":{

}
}

Request Example: Options Payload without Template

{
"extraction":{
"headerFields":[
"documentNumber",
"taxId",
"purchaseOrderNumber",
"shippingAmount",
"netAmount",
"senderAddress",
"senderName",
"grossAmount",
"currencyCode",
"receiverContact",
"documentDate",
"taxAmount",
"taxRate",
"receiverName",
"receiverAddress"
],
"lineItemFields":[
"description",
"netAmount",
"quantity",
"unitPrice",
"materialNumber"
]
},
"clientId": "c_00",
"documentType": "invoice",
"receivedDate": "2020-02-17",
"enrichment": {}
}

Request Example: Options Payload with SAP_OCROnly_schema

{
"schemaId":"09e6c9e4-d7b0-414f-bd85-cfee6fbb2add",
"clientId":"c_10",

"documentType":"custom"
}
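The file and options request parameters are sent together in one POST. The following is a minimal sketch that builds such a request as multipart/form-data with Python's standard library. Assumptions: the two parameters are sent as form parts named file and options, options is sent as JSON text, and authentication uses an OAuth bearer token from the uaa credentials. The function name, host, token, and file contents are placeholders; the request is built but not sent.

```python
import json
import urllib.request
import uuid

API_PATH = "/document-information-extraction/v1"


def build_upload_request(base_url, access_token, file_name, file_bytes, options,
                         file_content_type="application/pdf"):
    """Build (but don't send) a multipart POST /document/jobs request."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"
    body = crlf.join([
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode("utf-8"),
        f"Content-Type: {file_content_type}".encode("ascii"),
        b"",  # blank line ends the part headers
        file_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="options"',
        b"",
        json.dumps(options).encode("utf-8"),
        delimiter + b"--",  # closing delimiter
        b"",
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + API_PATH + "/document/jobs",
        data=body,
        headers={"Authorization": "Bearer " + access_token,
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Options taken from the SAP_OCROnly_schema example above.
req = build_upload_request(
    "https://example.invalid", "<token>", "invoice.pdf", b"%PDF-1.4 placeholder",
    {"schemaId": "09e6c9e4-d7b0-414f-bd85-cfee6fbb2add", "clientId": "c_10",
     "documentType": "custom"},
)
```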

Response

Response Fields

JSON Field Description

id Request ID

processedTime Timestamp in RFC format

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (201, 400, 401, 415, 429, 500, or 503). See Common Status and Error Codes
[page 226].

Response Example
201 “Created”

{
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"status":"PENDING",
"processedTime":"2020-03-26T17:00:00.000000+00:00"
}
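The upload returns immediately with status PENDING, so results are typically fetched by polling until the status changes. The following is a minimal sketch of such a loop with a timeout. `fetch_status` is a hypothetical callable that returns the current status string for a request ID (for example, by calling Get Result [page 138]), and the interval and timeout values are arbitrary choices, not service recommendations.

```python
import time


def wait_for_job(fetch_status, job_id, interval_s=2.0, timeout_s=120.0, sleep=time.sleep):
    """Poll until the job leaves PENDING and return its final status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status != "PENDING":
            return status  # for example "DONE" or "FAILED"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still PENDING after {timeout_s} s")
        sleep(interval_s)


# Usage with a stand-in that reports PENDING twice, then DONE.
_statuses = iter(["PENDING", "PENDING", "DONE"])
final_status = wait_for_job(lambda _job_id: next(_statuses),
                            "484b6e1c-501c-4a07-85cb-84554656a175",
                            sleep=lambda _seconds: None)
```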

12.1.6.1.1 Enrichment Parameter

Use the enrichment parameter to retrieve matches between enrichment data and extracted header fields. See Create Enrichment Data [page 167]. The value must be a JSON object containing the properties listed in the table below, depending on the enrichment data you want to match.

Example

"enrichment":{
"sender":{
"top":5,
"type":"businessEntity",
"subtype":"supplier"
},
"employee":{
"type":"employee"
},
"product":{
"type":"product"
}

}

Property Required Description

employee No To match the receiverContact extracted header field to enrichment data, the employee property should be present in enrichment.

product No To match the product line items found on the document to enrichment
data, the product property should be present in enrichment.

receiver No To match the extracted visual information from the document to the
receiver enrichment data, the receiver property should be present in
enrichment.

sender No To match the extracted visual information from the document to the
sender enrichment data, the sender property should be present in
enrichment.

type Yes The type of enrichment data entities used for matching. Available values:
businessEntity, employee, and product. See Entities [page
170] for details about the available enrichment data entity types.

subtype No The subtype of enrichment data entities used for matching with
type businessEntity. Available values: supplier, customer, and
companyCode.

top No The top property specifies the maximum number of matched enrichment data entities returned.

Note
If the top property is not defined, the default value is 1. The maximum possible value of the property is 50. If you enter a value higher than 50, you get an error message stating the maximum possible value.

 Note

The following properties are optional, but, in case you want to match enrichment data, at least one of them
is required:

• sender
• receiver
• employee
• product
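The following is a minimal sketch that builds an enrichment object and checks it against the table above: type is required, subtype is only valid with businessEntity, and top must not exceed 50. The function name `build_enrichment` is hypothetical, and so is the lower bound of 1 for top, which the table doesn't state. Calling it with no properties returns an empty object, matching the "enrichment":{} used in the upload examples when no matching is requested.

```python
ENRICHMENT_PROPERTIES = ("sender", "receiver", "employee", "product")
ENRICHMENT_TYPES = {"businessEntity", "employee", "product"}
BUSINESS_ENTITY_SUBTYPES = {"supplier", "customer", "companyCode"}
MAX_TOP = 50


def build_enrichment(**properties):
    """Validate each property specification and return the enrichment object."""
    enrichment = {}
    for name, spec in properties.items():
        if name not in ENRICHMENT_PROPERTIES:
            raise ValueError(f"unsupported enrichment property: {name}")
        if spec.get("type") not in ENRICHMENT_TYPES:
            raise ValueError(f"{name}: type must be one of {sorted(ENRICHMENT_TYPES)}")
        if "subtype" in spec and (spec["type"] != "businessEntity"
                                  or spec["subtype"] not in BUSINESS_ENTITY_SUBTYPES):
            raise ValueError(f"{name}: subtype needs businessEntity and a known subtype")
        top = spec.get("top", 1)  # the service default is 1
        if isinstance(top, bool) or not isinstance(top, int) or not 1 <= top <= MAX_TOP:
            raise ValueError(f"{name}: top must be between 1 and {MAX_TOP}")
        enrichment[name] = dict(spec)
    return enrichment


# Reproduces the Example shown above.
example = build_enrichment(
    sender={"top": 5, "type": "businessEntity", "subtype": "supplier"},
    employee={"type": "employee"},
    product={"type": "product"},
)
```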

Related Information

Extracted Header Fields [page 278]

12.1.6.2 Post Catalog

Post a search or filter request to get the current status of document processing jobs. Returns a list with all
document processing jobs in a JSON file.

Optionally, the jobs can be filtered based on the client ID and a filter query. You have the following catalog
options:

• Filtering using the filter and likeFilter parameters
• Ordering using the order parameter
• Pagination using the limit and offset parameters

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/catalog

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

options Yes JSON Object Catalog options used when searching for docu-
ments. See the Options Payload table below.

Options Payload
Option Required Data Type Description

clientId No String The ID of the client used while submitting the document. For example: “c_00”.


filter No String Filter query for retrieving documents. The filter query must be an expression in the format "fieldName op value <AND/OR> fieldName op value". Supported fields: documentType, created, schemaId, status, customLabel, reviewStatus, or tenantId. The possible operators (op) depend on the field. For example: “status eq done”.

likeFilter No String Filter query for retrieving documents that uses the LIKE operator. The expression should follow the format “fieldName like value”. Supported field: fileName. For example: “fileName like \"test receipt\"”.

limit No Integer Number of documents to retrieve (maximum allowed value: 50). For example: 10.

offset No Integer Index of the first document to be retrieved. For example: 20.

order No String Order criteria for the retrieved documents. Possible values: created, fileName, documentType, or status. For example: “created asc” (sorts by creation date in ascending order).

Request Example: Options Payload

{
"clientId":"c_00",
"limit":10,
"offset":2,
"order":"created desc",
"likeFilter":"fileName like \"test receipt\"",
"filter":"status eq failed or documentType eq invoice"
}
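Because limit is capped at 50, retrieving every matching job means advancing offset across several requests. The following is a minimal sketch of that loop. `post_catalog` is a hypothetical callable that POSTs the options to /document/catalog and returns the decoded JSON. The sketch assumes totalDocumentCount counts all matching jobs, which is consistent with the response example (offset 2 plus 3 results equals 5), and it flattens the extra list level that wraps the jobs in that example.

```python
def iter_catalog(post_catalog, base_options, page_size=50):
    """Yield every job matching base_options, page_size jobs per request."""
    if not 1 <= page_size <= 50:
        raise ValueError("limit must be between 1 and 50")
    offset = base_options.get("offset", 0)
    while True:
        response = post_catalog(dict(base_options, limit=page_size, offset=offset))
        page = response["results"]
        # The documented example wraps the jobs in an extra list; flatten it.
        if page and isinstance(page[0], list):
            page = [job for group in page for job in group]
        yield from page
        offset += len(page)
        if not page or offset >= response["totalDocumentCount"]:
            return


# Usage with an in-memory stand-in for the service.
_jobs = [{"id": str(i), "status": "DONE"} for i in range(7)]


def fake_post_catalog(options):
    start, size = options["offset"], options["limit"]
    return {"results": [_jobs[start:start + size]], "totalDocumentCount": len(_jobs)}


all_job_ids = [job["id"] for job in iter_catalog(fake_post_catalog, {"clientId": "c_00"}, page_size=3)]
```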

Response

Response Fields

JSON Field Description

results List containing all document processing jobs

totalDocumentCount Total number of document processing jobs returned by the request options

usedOptions Options used in the filtering and/or ordering of document processing jobs

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
"results":[
[
{
"status":"DONE",
"id":"c4f25368-d3e6-43f7-a0b4-55adf7f54e95",
"fileName":"test receipt_invoice1.pdf",
"documentType":"invoice",
"created":"2020-03-26 17:00:00.000000+00:00",
"clientId":"c_00",
"finished":"2020-03-26 17:01:30.000000+00:00"
},
{
"status":"PENDING",
"id":"50199d80-c742-453b-830d-8e6ce14568e2",
"fileName":"test receipt invoice2.pdf",
"documentType":"invoice",
"created":"2020-03-26 18:00:00.000000+00:00",
"clientId":"c_00"
},
{
"status":"FAILED",
"id":"50199d80-c742-453b-830d-8e6ce14568e2",
"fileName":"test receipt pa.pdf",
"documentType":"paymentAdvice",
"created":"2020-03-26 19:00:00.000000+00:00",
"clientId":"c_00",
"finished":"2020-03-26 19:01:30.000000+00:00"
}
]
],
"usedOptions":{
"clientId":"c_00",
"limit":10,
"offset":2,
"order":"created desc",
"likeFilter":"fileName like \"test receipt\"",
"filter":"status eq failed or documentType eq invoice"
},
"totalDocumentCount":5
}

12.1.6.3 List Documents

Get a list of up to 200 documents in a JSON file.

Tip

Use the endpoint Post Catalog [page 134] to page through lists of more than 200 documents in a JSON file.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client. For example: c_00
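The Python snippet below sketches how the base URL, path extension, endpoint path, and clientId combine into a request. It uses only the standard library and builds the GET call without sending it. The function name is hypothetical. Sending an OAuth access token, obtained with the credentials in the uaa section, as a Bearer token in the Authorization header is an assumption. The 401 examples elsewhere in this section only show that an Authorization header is expected.

```python
import urllib.parse
import urllib.request
from typing import Optional

API_PREFIX = "/document-information-extraction/v1"


def build_list_documents_request(base_url: str, token: str,
                                 client_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) GET /document/jobs.

    base_url is the url value outside the uaa section of the service key;
    token is assumed to be an OAuth access token (hypothetical setup).
    """
    url = base_url.rstrip("/") + API_PREFIX + "/document/jobs"
    if client_id:
        url += "?" + urllib.parse.urlencode({"clientId": client_id})
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )
```

To send the request, pass it to urllib.request.urlopen and parse the body with json.load.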

Response

Response Fields

JSON Field Description

clientId ID of the client this document was processed for

created Time when the document was submitted for processing

finished Time when the document status changed to “DONE” or “FAILED”

id Document ID

results List containing information about all processed documents

status Processing status of the document. Possible values: “PENDING”, “DONE”, “CONFIRMED”,
or “FAILED”

The response is given as a status (200, 401, or 500) and JSON file. See Common Status and Error Codes [page
226].

Response Example
200 “Success”

{
"results":[

[
{
"id":"c4f25368-d3e6-43f7-a0b4-55adf7f54e95",
"clientId":"c1",
"created":"2020-05-08T10:39:59.916359+00:00",
"finished":"2020-05-08T10:40:50.467719+00:00",
"status":"DONE"
},
{
"id":"50199d80-c742-453b-830d-8e6ce14568e2",
"clientId":"c1",
"created":"2020-05-12T08:30:04.718730+00:00",
"status":"PENDING"
},
{
"id":"47299d80-c742-453b-830d-8e6ce14568e2",
"clientId":"c1",
"created":"2020-05-12T08:23:06.938779+00:00",
"finished":"2020-05-12T08:23:21.765680+00:00",
"status":"FAILED"
}
]
]
}

12.1.6.4 Get Result

The Document Information Extraction service takes document files as input and returns a JSON file that
contains the information that has been extracted from the header fields and line items of the specified
document. See Supported Document Types and File Formats [page 84].

Remember

Document Information Extraction generally provides extraction results for an average document in 30
seconds.

However, processing can take longer if the task involved is more complex – for example, if the documents
processed are large.

Before you use the service for important or time-sensitive tasks, we strongly recommend running mass
tests to assess the performance of the service and make sure it meets your requirements.
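Extraction can take longer than 30 seconds, so a client can poll Get Result until the job leaves the “PENDING” status. The minimal Python sketch below assumes a hypothetical get_result callable that performs the GET request for one job ID and returns the parsed JSON. The timeout and polling interval are arbitrary example values, not service recommendations. The sleep and clock parameters exist only so that the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Dict


def wait_for_result(get_result: Callable[[], Dict], timeout_s: float = 300.0,
                    interval_s: float = 5.0, sleep=time.sleep,
                    clock=time.monotonic) -> Dict:
    """Poll until the status is no longer PENDING or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        result = get_result()
        if result.get("status") != "PENDING":
            return result  # for example DONE or FAILED
        if clock() >= deadline:
            raise TimeoutError("document is still PENDING")
        sleep(interval_s)
```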

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

extractedValues No Boolean Set to true to get the extracted values. Set to false to get the ground truth values, if available. If ground truth values are not available, extracted values are returned in any case. The default value for this parameter is false.

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff.

returnNullValues No Boolean Set to true to get all requested fields in the document results, even if they could not be extracted. For fields that could not be extracted (for example, because they are not available in the document or because the service was not able to identify the field), the value is null.

If no value can be extracted, both value and rawValue are null, the prediction confidence score is null, and the x-coordinate, y-coordinate, width, and height are set to 0. If a value is extracted, the corresponding rawValue can still be an empty string.

By default, this parameter is set to false and fields that were not extracted are not returned.
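The Python sketch below builds the Get Result URL with the optional parameters above. It assumes that both parameters are passed as query parameters with lowercase true/false values, since this section does not state how they are encoded. The function name build_get_result_url is hypothetical.

```python
import urllib.parse

API_PREFIX = "/document-information-extraction/v1"


def build_get_result_url(base_url: str, job_id: str,
                         extracted_values: bool = False,
                         return_null_values: bool = False) -> str:
    """Build the GET /document/jobs/<id> URL (assumed query encoding)."""
    query = urllib.parse.urlencode({
        # Render Python booleans as lowercase "true"/"false".
        "extractedValues": str(extracted_values).lower(),
        "returnNullValues": str(return_null_values).lower(),
    })
    path = f"{API_PREFIX}/document/jobs/{urllib.parse.quote(job_id)}"
    return f"{base_url.rstrip('/')}{path}?{query}"
```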

Response

Response Fields

JSON Field Description

attributes Dictionary containing either the match method of a matched enrichment data record or the symbology of an extracted barcode header field.

bocrVersion The version number of the Optical Character Recognition (OCR) service.

category Category of the field. For example: document or receiver.

clientId ID of the client that submitted the extraction request using the Upload Document [page 127] endpoint.

confidence Prediction confidence score for a field or enrichment data. The possible values are between
0.0 and 1.0.

coordinates Bounding box coordinates for this field (not present if value is null).

country Country/Region code of the document submitted.

created Time when the document was submitted for processing.

dataForRetrainingStatus Retraining status. Possible values: “notUsedForTraining”, “rejectedDueToPII”, “inProcess”, “acceptedForTraining”, or “usedForTraining”.

documentType Type of the document submitted.

doxVersion The version number of the Document Information Extraction service.

employee Employee enrichment data. For example: employee name.

enrichment Dictionary containing enrichment data.

extraction Dictionary containing all the extracted header fields and line items.

fileName Full name of the document submitted.

fileType File format of the document submitted. For example: PDF, PNG, JPEG.

finished Time when the document status changed to “DONE” or “FAILED”.

group Group this field belongs to.

headerFields Dictionary containing all extracted header fields.

height Page height of the document

id Document or enrichment data ID.

label User-friendly names for header fields and line items. See Add Fields to Schema Version
[page 199].

languageCodes Array containing strings of language codes. For example: "en" for English and "de" for
German.

lineItems Dictionary containing all extracted line items.

method Match strategy for each matched enrichment data record. Possible values: “exactTaxId”,
“exactBankAccount”, “exactMaterialNumber”, or “similarity”.

model The model used to extract information from the specified field. Possible values: “ai” or “template”. “ai” denotes the machine learning models of the Document Information Extraction service.

name Name of the field.

page Page number of the document where the field was found (not present if value is null).

pageCount Total number of pages a document contains. For example: 2.

rawValue Value extracted for this field by the Document Information Extraction service as displayed in
the document.

schemaId The ID of the schema used when you uploaded the document.

schemaVersion The version number of the schema used when you uploaded the document.

sender Sender enrichment data. For example: sender name and sender address.

status Processing status of the document. Possible values: “PENDING”, “DONE”, or “FAILED”.

symbology Type of the extracted barcode. For example: QR.

templateId The ID of the template associated with the document

type Data type of the extracted header fields and line items.

value Value extracted for this field by the Document Information Extraction service in standardized format.

values Dictionary containing all matched enrichment data records.

variant See Data Variants [page 172].

width Page width of the document

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].
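The confidence field can drive a simple review step. The Python sketch below collects header fields and line item fields whose confidence is below a threshold, so that a person can check them before the ground truth is saved or the document is confirmed. The 0.8 threshold and the helper names are illustrative assumptions. Fields with a null confidence, which can occur when returnNullValues is set to true, are flagged as well.

```python
from typing import Dict, List


def _is_low(field: Dict, threshold: float) -> bool:
    confidence = field.get("confidence")
    return confidence is None or confidence < threshold


def fields_needing_review(result: Dict, threshold: float = 0.8) -> List[Dict]:
    """Return fields from a Get Result response whose confidence is below threshold."""
    extraction = result.get("extraction", {})
    flagged = []
    for field in extraction.get("headerFields", []):
        if _is_low(field, threshold):
            flagged.append({"where": "header", **field})
    # lineItems is a list of rows; each row is a list of fields.
    for row_index, row in enumerate(extraction.get("lineItems", [])):
        for field in row:
            if _is_low(field, threshold):
                flagged.append({"where": f"lineItem[{row_index}]", **field})
    return flagged
```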

Response Example
200 “Success” with SAP_OCROnly_schema ("schemaId":"09e6c9e4-d7b0-414f-bd85-cfee6fbb2add")

{
"status":"DONE",
"id":"2acc2040-f956-4178-9cf4-d02f020626a6",
"fileName":"sample-power_of_attorney-3.pdf",
"documentType":"custom",
"created":"2022-10-04T07:46:03.412498+00:00",
"finished":"2022-10-04T07:46:56.834313+00:00",
"clientId":"c_00",
"languageCodes":[
"en"
],
"pageCount":1,
"schemaId":"09e6c9e4-d7b0-414f-bd85-cfee6fbb2add",
"country":null,
"extraction":{
"headerFields":[

],
"lineItems":[

]
},
"bocrVersion":"2.7.1",
"doxVersion":"local",
"fileType":"pdf",
"dataForRetrainingStatus":"notUsedForTraining"
}

Response Example
200 “Success” without schemaId

{
"status":"DONE",
"id":"a712375f-0b6d-4550-83fb-2271a2301aad",
"fileName":"demo_taxid.pdf",
"documentType":"invoice",
"created":"2022-04-27T09:46:20.090953+00:00",
"finished":"2022-04-27T09:46:45.151654+00:00",
"clientId":"c_00",
"languageCodes":[
"xx"
],
"pageCount":1,

"country":"MX",
"extraction":{
"headerFields":[
{
"name":"receiverContact",
"category":"receiver",
"value":"FELISHIA ICE SOUTH",
"rawValue":"FELISHIA ICE SOUTH",
"type":"string",
"page":1,
"confidence":0.9431540966033936,
"coordinates":{
"x":0.0792156862745098,
"y":0.19575757575757577,
"w":0.13568627450980392,
"h":0.009090909090909066
},
"model":"ai",
"label":"receiverContact"
},
{
"name":"receiverName",
"category":"receiver",
"value":"JOERNS ICE HEALTH (EST)",
"rawValue":"JOERNS ICE HEALTH (EST)",
"type":"string",
"page":1,
"confidence":0.8918973803520203,
"coordinates":{
"x":0.05215686274509804,
"y":0.45696969696969697,
"w":0.18823529411764706,
"h":0.011818181818181839
},
"model":"ai",
"label":"receiverName"
},
{
"name":"shippingAmount",
"category":"amounts",
"value":0.0,
"rawValue":"0.00",
"type":"number",
"page":1,
"confidence":0.9837643504142761,
"coordinates":{
"x":0.907843137254902,
"y":0.7975757575757576,
"w":0.03215686274509799,
"h":0.009393939393939399
},
"model":"ai",
"label":"shippingAmount"
},
{
"name":"taxAmount",
"category":"amounts",
"value":7.07,
"rawValue":"7.07",
"type":"number",
"page":1,
"confidence":0.9896121621131897,
"coordinates":{
"x":0.9078431129455566,
"y":0.8166666626930237,
"w":0.0313725471496582,
"h":0.008787870407104492
},

"model":"ai",
"group":1,
"label":"taxAmount"
},
{
"name":"senderAddress",
"category":"sender",
"value":"12345 NETWORK PLACE CHICAGO, IL 60000-1234",
"rawValue":"12345 NETWORK PLACE CHICAGO, IL 60000-1234",
"type":"string",
"page":1,
"confidence":0.6106114352383017,
"coordinates":{
"x":0.3184313725490196,
"y":0.1087878787878788,
"w":0.15725490196078434,
"h":0.022727272727272707
},
"model":"ai",
"label":"senderAddress"
},
{
"name":"receiverAddress",
"category":"receiver",
"value":"12345 DEARBORN STREET CHATSWORTH, CA 12345 UNITED STATES",
"rawValue":"12345 DEARBORN STREET CHATSWORTH, CA 12345 UNITED STATES",
"type":"string",
"page":1,
"confidence":0.5784785588744978,
"coordinates":{
"x":0.07882352941176471,
"y":0.21363636363636362,
"w":0.21607843137254903,
"h":0.04545454545454547
},
"model":"ai",
"label":"receiverAddress"
},
{
"name":"senderName",
"category":"sender",
"value":"GLOBAL ICE COMPANY INC.",
"rawValue":"GLOBAL ICE COMPANY INC.",
"type":"string",
"page":1,
"confidence":0.602843187909389,
"coordinates":{
"x":0.343921568627451,
"y":0.2875757575757576,
"w":0.2984313725490196,
"h":0.009393939393939399
},
"model":"ai",
"label":"senderName"
},
{
"name":"taxId",
"category":"amounts",
"value":"11-3584699",
"rawValue":"11-3584699",
"type":"string",
"page":1,
"confidence":0.950018584728241,
"coordinates":{
"x":0.5015686274509804,
"y":0.3409090909090909,
"w":0.08784313725490189,

"h":0.009393939393939399
},
"model":"ai",
"group":1,
"label":"taxId"
},
{
"name":"currencyCode",
"category":"amounts",
"value":"USD",
"rawValue":"",
"type":"string",
"page":1,
"confidence":0.9978113174438477,
"coordinates":{
"x":0.0,
"y":0.0,
"w":0.0,
"h":0.0
},
"model":"ai",
"label":"currencyCode"
},
{
"name":"documentNumber",
"category":"document",
"value":"112857784",
"rawValue":"112857784",
"type":"string",
"page":1,
"confidence":0.9963446855545044,
"coordinates":{
"x":0.5862745098039216,
"y":0.08757575757575757,
"w":0.0811764705882353,
"h":0.008787878787878789
},
"model":"ai",
"label":"documentNumber"
},
{
"name":"documentDate",
"category":"document",
"value":"2018-06-29",
"rawValue":"06-29-2018",
"type":"date",
"page":1,
"confidence":0.9906787872314453,
"coordinates":{
"x":0.7003921568627451,
"y":0.08757575757575757,
"w":0.08470588235294119,
"h":0.008484848484848484
},
"model":"ai",
"label":"documentDate"
},
{
"name":"grossAmount",
"category":"amounts",
"value":108.13,
"rawValue":"108.13",
"type":"number",
"page":1,
"confidence":0.9433890581130981,
"coordinates":{
"x":0.8913725490196078,
"y":0.8357575757575758,

"w":0.05058823529411771,
"h":0.009696969696969648
},
"model":"ai",
"label":"grossAmount"
},
{
"name":"netAmount",
"category":"amounts",
"value":101.06,
"rawValue":"101.06",
"type":"number",
"page":1,
"confidence":0.9396025538444519,
"coordinates":{
"x":0.8901960784313725,
"y":0.7778787878787878,
"w":0.04980392156862745,
"h":0.010000000000000009
},
"model":"ai",
"label":"netAmount"
},
{
"name":"purchaseOrderNumber",
"category":"details",
"value":"14035740",
"rawValue":"14035740",
"type":"string",
"page":1,
"confidence":0.7348883748054504,
"coordinates":{
"x":0.5905882352941176,
"y":0.12818181818181817,
"w":0.0725490196078431,
"h":0.009090909090909094
},
"model":"ai",
"label":"purchaseOrderNumber"
}
],
"lineItems":[
[
{
"name":"description",
"category":"details",
"value":"PIP Ambi-Dex&#174; 63-331PF Industrial Grade Nitrile Gloves, Powder-Free, Tex- tured, Blue, L, 100/Box - Tracking#: 1ZX647100300080084",
"rawValue":"PIP Ambi-Dex&#174; 63-331PF Industrial Grade Nitrile Gloves, Powder-Free, Tex- tured, Blue, L, 100/Box - Tracking#: 1ZX647100300080084",
"type":"string",
"page":1,
"confidence":0.8756256103515625,
"coordinates":{
"x":0.2988235294117647,
"y":0.6545454545454545,
"w":0.30980392156862746,
"h":0.05363636363636359
},
"model":"ai",
"label":"description"
},
{
"name":"materialNumber",
"category":"details",
"value":"B676817",

"rawValue":"B676817",
"type":"string",
"page":1,
"confidence":0.982785165309906,
"coordinates":{
"x":0.18313725490196078,
"y":0.6548484848484849,
"w":0.06627450980392158,
"h":0.009393939393939288
},
"model":"ai",
"label":"materialNumber"
},
{
"name":"netAmount",
"category":"amounts",
"value":88.0,
"rawValue":"88.00",
"type":"number",
"page":1,
"confidence":0.8774160146713257,
"coordinates":{
"x":0.779607843137255,
"y":0.6551515151515152,
"w":0.04117647058823526,
"h":0.009393939393939399
},
"model":"ai",
"label":"netAmount"
},
{
"name":"quantity",
"category":"details",
"value":10.0,
"rawValue":"10",
"type":"number",
"page":1,
"confidence":0.9688798189163208,
"coordinates":{
"x":0.08627450980392157,
"y":0.6551515151515152,
"w":0.016470588235294126,
"h":0.009090909090909038
},
"model":"ai",
"label":"quantity"
},
{
"name":"unitPrice",
"category":"details",
"value":8.8,
"rawValue":"8.80",
"type":"number",
"page":1,
"confidence":0.9341872334480286,
"coordinates":{
"x":0.6862745098039216,
"y":0.6551515151515152,
"w":0.03176470588235292,
"h":0.009393939393939399
},
"model":"ai",
"label":"unitPrice"
}
],
[
{
"name":"description",

"category":"details",
"value":"Ergodyne&#174; ProFlex&#174; 812 Standard Utility Glove, Black, Large, 17174 - Tracking#: 1ZX647100300081467",
"rawValue":"Ergodyne&#174; ProFlex&#174; 812 Standard Utility Glove, Black, Large, 17174 - Tracking#: 1ZX647100300081467",
"type":"string",
"page":1,
"confidence":0.7070900797843933,
"coordinates":{
"x":0.296078431372549,
"y":0.7233333333333334,
"w":0.323921568627451,
"h":0.040303030303030285
},
"model":"ai",
"label":"description"
},
{
"name":"materialNumber",
"category":"details",
"value":"B2139393",
"rawValue":"B2139393",
"type":"string",
"page":1,
"confidence":0.9847809076309204,
"coordinates":{
"x":0.17882352941176471,
"y":0.7233333333333334,
"w":0.07529411764705879,
"h":0.009696969696969648
},
"model":"ai",
"label":"materialNumber"
},
{
"name":"netAmount",
"category":"amounts",
"value":13.06,
"rawValue":"13.06",
"type":"number",
"page":1,
"confidence":0.8856437802314758,
"coordinates":{
"x":0.7803921568627451,
"y":0.7233333333333334,
"w":0.040784313725490184,
"h":0.010303030303030258
},
"model":"ai",
"label":"netAmount"
},
{
"name":"quantity",
"category":"details",
"value":1.0,
"rawValue":"1",
"type":"number",
"page":1,
"confidence":0.9791963696479797,
"coordinates":{
"x":0.08901960784313726,
"y":0.723939393939394,
"w":0.010196078431372546,
"h":0.008484848484848428
},
"model":"ai",
"label":"quantity"
},

{
"name":"unitPrice",
"category":"details",
"value":13.06,
"rawValue":"13.06",
"type":"number",
"page":1,
"confidence":0.8986196517944336,
"coordinates":{
"x":0.6772549019607843,
"y":0.7233333333333334,
"w":0.04117647058823526,
"h":0.010303030303030258
},
"model":"ai",
"label":"unitPrice"
}
]
]
},
"bocrVersion":null,
"doxVersion":"local",
"fileType":"pdf",
"enrichment":{
"sender":[
{
"id":"demo-match",
"confidence":0.7157895,
"values":{
"name":"GLOBAL ICE COMPANY INC.",
"bankAccount":"de23672700030136040305",
"email":"example@sap.com",
"address1":"12345 NETWORK PLACE CHICAGO, IL 60673-1298 2",
"countryCode":"US",
"state":"Illinois",
"city":"Chicago",
"postalCode":"60007"
},
"attributes":{
"method":"similarity"
}
}
],
"employee":[

],
"product":[

]
},
"dataForRetrainingStatus":"notUsedForTraining"
}

If the document is processed successfully, Document Information Extraction provides predictions for the requested fields, that is, the fields requested when the document was submitted using the Upload Document [page 127] endpoint. Header fields and line item fields for which no value can be detected do not appear in the response JSON file, unless returnNullValues is set to true.
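Fields without a detected value are omitted, and a field name can occur more than once (the barcode example below shows this). It therefore helps to read the extraction defensively. The minimal Python sketch below uses hypothetical helper names. It maps each header field name to a list of values and turns each line item into a name-to-value dictionary.

```python
from collections import defaultdict
from typing import Any, Dict, List


def header_values(result: Dict) -> Dict[str, List[Any]]:
    """Map each header field name to the list of its extracted values.

    Missing fields are simply absent, so use .get() instead of indexing.
    """
    values: Dict[str, List[Any]] = defaultdict(list)
    for field in result.get("extraction", {}).get("headerFields", []):
        values[field["name"]].append(field.get("value"))
    return dict(values)


def line_item_rows(result: Dict) -> List[Dict[str, Any]]:
    """Turn each line item (a list of fields) into a name -> value dict."""
    return [
        {field["name"]: field.get("value") for field in row}
        for row in result.get("extraction", {}).get("lineItems", [])
    ]
```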

Response Example
200 “Success” with barcode header field extraction

{
"status":"DONE",
"id":"2853a32c-9cf9-415f-9585-82c63c2fa699",
"fileName":"qr_three_codes.pdf",
"documentType":"invoice",

"created":"2023-01-27T09:57:26.160906+00:00",
"finished":"2023-01-27T09:58:20.383827+00:00",
"clientId":"c_00",
"languageCodes":[
"de"
],
"pageCount":1,
"width":2480,
"height":3507,
"country":"MX",
"bocrVersion":"1.7.0",
"doxVersion":"local",
"fileType":"pdf",
"enrichment":{
"sender":[

],
"employee":[

],
"product":[

]
},
"dataForRetrainingStatus":"notUsedForTraining",
"extraction":{
"headerFields":[
{
"name":"barcode",
"category":"details",
"value":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"rawValue":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"type":"string",
"page":1,
"confidence":1.0,
"coordinates":{
"x":0.14717741935483872,
"y":0.262617621899059,
"w":0.07782258064516129,
"h":0.05503279155973767
},
"model":"ai",
"group":1,
"attributes":{
"symbology":"QR"
},
"label":"barcode"
},
{
"name":"barcode",
"category":"details",
"value":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"rawValue":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"type":"string",
"page":1,
"confidence":1.0,

"coordinates":{
"x":0.3294354838709677,
"y":0.6854861705161106,
"w":0.21129032258064517,
"h":0.1497005988023952
},
"model":"ai",
"group":2,
"attributes":{
"symbology":"QR"
},
"label":"barcode"
},
{
"name":"barcode",
"category":"details",
"value":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"rawValue":"https://verificacfdi.facturaelectronica.sat.gob.mx/default.aspx?id=706220d0-3b0b-4801-82b8-5f771f8af9c1&re=CSA080218TQ8&rr=NME140730ME0&tt=0000099576.720000&fe=ZI/I4A==",
"type":"string",
"page":1,
"confidence":1.0,
"coordinates":{
"x":0.7411290322580645,
"y":0.47619047619047616,
"w":0.20725806451612902,
"h":0.14656401482748788
},
"model":"ai",
"group":3,
"attributes":{
"symbology":"QR"
},
"label":"barcode"
},
{
"name":"currencyCode",
"category":"amounts",
"value":"CHF",
"rawValue":"",
"type":"string",
"page":1,
"confidence":0.992719292640686,
"coordinates":{
"x":0.0,
"y":0.0,
"w":0.0,
"h":0.0
},
"model":"ai",
"label":"currencyCode"
},
{
"name":"documentDate",
"category":"document",
"value":"2019-11-18",
"rawValue":"19-11-18",
"type":"date",
"page":1,
"confidence":0.9978566765785217,
"coordinates":{
"x":0.294758064516129,
"y":0.16737952666096378,

"w":0.08548387096774196,
"h":0.00883946392928428
},
"model":"ai",
"label":"documentDate"
},
{
"name":"documentNumber",
"category":"document",
"value":"10101010",
"rawValue":"10101010",
"type":"string",
"page":1,
"confidence":0.947092592716217,
"coordinates":{
"x":0.3100806451612903,
"y":0.18106643855146848,
"w":0.07379032258064516,
"h":0.00855431993156544
},
"model":"ai",
"label":"documentNumber"
},
{
"name":"grossAmount",
"category":"amounts",
"value":99576.72,
"rawValue":"0000099576.720000",
"type":"number",
"page":1,
"confidence":1.0,
"coordinates":{
"x":0.14717741935483872,
"y":0.262617621899059,
"w":0.07782258064516129,
"h":0.05503279155973767
},
"model":"ai",
"label":"grossAmount"
},
{
"name":"receiverAddress",
"category":"receiver",
"value":"Max M\u00fcller, Teststral\u00dfe 99, 19012 Testort Rene Teststral\u00dfe 100 19012 Testort",
"rawValue":"Max M\u00fcller, Teststral\u00dfe 99, 19012 Testort Rene Teststral\u00dfe 100 19012 Testort",
"type":"string",
"page":1,
"confidence":0.7233287231620119,
"coordinates":{
"x":0.567741935483871,
"y":0.20786997433704021,
"w":0.2270161290322581,
"h":0.053607071571143444
},
"model":"ai",
"label":"receiverAddress"
},
{
"name":"receiverContact",
"category":"receiver",
"value":"Rene M\u00fcller",
"rawValue":"Rene M\u00fcller",
"type":"string",
"page":1,
"confidence":0.5884397625923157,
"coordinates":{

"x":0.5689516129032258,
"y":0.22783005417735958,
"w":0.08185483870967736,
"h":0.00855431993156544
},
"model":"ai",
"label":"receiverContact"
},
{
"name":"receiverName",
"category":"receiver",
"value":"Max M\u00fcller,",
"rawValue":"Max M\u00fcller,",
"type":"string",
"page":1,
"confidence":0.6223656535148621,
"coordinates":{
"x":0.5677419304847717,
"y":0.2078699767589569,
"w":0.06330645084381104,
"h":0.008269175887107849
},
"model":"ai",
"label":"receiverName"
},
{
"name":"senderAddress",
"category":"sender",
"value":"Teststra\u00df\u00dfe 99 19012 Testort",
"rawValue":"Teststra\u00df\u00dfe 99 19012 Testort",
"type":"string",
"page":1,
"confidence":0.9818366663199499,
"coordinates":{
"x":0.15201612903225806,
"y":0.08639863130881095,
"w":0.10443548387096777,
"h":0.02281151981750784
},
"model":"ai",
"label":"senderAddress"
},
{
"name":"senderBankAccount",
"category":"sender",
"value":"CHS1 5112 3453 5444 44",
"rawValue":"CHS1 5112 3453 5444 44",
"type":"string",
"page":1,
"confidence":0.6283774228323075,
"coordinates":{
"x":0.06411290322580646,
"y":0.682919874536641,
"w":0.1411290322580645,
"h":0.007128599942971103
},
"model":"ai",
"group":1,
"label":"senderBankAccount"
},
{
"name":"senderName",
"category":"sender",
"value":"Max M\u00fcller",
"rawValue":"Max M\u00fcller",
"type":"string",
"page":1,
"confidence":0.8665437601845373,

"coordinates":{
"x":0.15201612903225806,
"y":0.0718562874251497,
"w":0.08225806451612905,
"h":0.009409751924721987
},
"model":"ai",
"label":"senderName"
},
{
"name":"taxId",
"category":"amounts",
"value":"CSA080218TQ8",
"rawValue":"CSA080218TQ8",
"type":"string",
"page":1,
"confidence":1.0,
"coordinates":{
"x":0.14717741935483872,
"y":0.262617621899059,
"w":0.07782258064516129,
"h":0.05503279155973767
},
"model":"ai",
"group":1,
"label":"taxId"
}
],
"lineItems":[

]
}
}
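In these examples, the coordinates values lie between 0 and 1, which suggests that they are fractions of the page size. The top-level width and height fields give the page size (2480 and 3507 in the barcode example above). The Python sketch below converts a field's bounding box to pixels under two assumptions: the coordinates are fractions of the page size, and x and y mark the top-left corner. The helper name is hypothetical. Fields such as currencyCode above, with an empty rawValue and all coordinates set to 0, have no location on the page.

```python
from typing import Dict, Tuple


def to_pixel_box(coordinates: Dict[str, float], page_width: int,
                 page_height: int) -> Tuple[int, int, int, int]:
    """Convert relative x/y/w/h into (left, top, right, bottom) pixels.

    Assumes x, y are the top-left corner and all values are fractions of
    the page size given by the top-level width and height fields.
    """
    left = coordinates["x"] * page_width
    top = coordinates["y"] * page_height
    right = left + coordinates["w"] * page_width
    bottom = top + coordinates["h"] * page_height
    return round(left), round(top), round(right), round(bottom)
```

For the first barcode in the example above, this yields (365, 921, 558, 1114).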

Extracted Header Fields and Extracted Line Items Categories

Fields can belong to a category, which is indicated by the category property of a field in the response JSON. For example, taxes consist of multiple fields and are returned in the form of a category with the fields taxName, taxRate, and taxAmount. See all field categories in Extracted Header Fields [page 278] and Extracted Line Items [page 286].
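To see which categories a particular response contains, you can group header fields by their category property. A small Python sketch with a hypothetical helper name:

```python
from collections import defaultdict
from typing import Dict, List


def fields_by_category(result: Dict) -> Dict[str, List[str]]:
    """Group header field names by their category property."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for field in result.get("extraction", {}).get("headerFields", []):
        # Fields without a category are collected under a placeholder key.
        grouped[field.get("category", "uncategorized")].append(field["name"])
    return dict(grouped)
```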

Response Example

400 “Bad Request”

{
"code": "E93",
"message": "Required parameters not provided.",
"details": "string"
}

Response Example

401 “Unauthorized”

{
"message": "No Authorization given in the request header"
}

Response Example
500 “Internal server error”

{
"message": "Internal server error"
}

12.1.6.5 Save Ground Truth

Save the ground truth (correct values for document fields) for the specified document job ID.

This endpoint takes the job ID of a previously submitted document and returns a status message, or an error if the given ID isn't found.

In the payload, add extraction (list of all the extracted header fields and line items) and enrichment (list of the matched enrichment data).

For the fields, the following attributes are part of the ground truth:

• name (required)
• value (required)
• rawValue (optional)
• page (optional)
• coordinates (optional)

For enrichment data, the following attribute is part of the ground truth: id (required).

Caution

It's technically possible to add other attributes to the ground truth payload (for example, confidence), but
they have no impact on the stored values and are ignored.

Note

After saving the ground truth of a document, the prediction confidence score of all header fields and line
items is automatically set to 1.0 (100%). The service assumes that all field values are correct or have been
manually corrected. Only save the ground truth of documents that have been reviewed and don't contain
incorrect extraction results.

Caution

It isn't possible to save ground truth if you used the SAP_OCROnly_schema for the document extraction. See the second “Bad Request” response example in the Response section below.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

payload Yes JSON Object Fields of the document (header fields and line items) and enrichment data

Note
The payload has the same structure as the response returned by the Get Result [page 138] endpoint. However, while Get Result [page 138] returns the top-N enrichment matches, for the Save Ground Truth endpoint the enrichment list must not contain more than one (ground truth) match for each sender and employee.

Request Example

{
"extraction":{
"headerFields":[
{
"name":"documentDate",
"value":"2019-02-18"
},
{
"name":"grossAmount",
"value":200
}
],
"lineItems":[
[
{
"name":"description",
"value":"Professional Services"
},
{
"name":"netAmount",
"value":200
},
{
"name":"unitPrice",
"value":200
},
{
"name":"materialNumber",
"value":"007"
}
]

]
},
"enrichment":{
"sender":[
{
"id":"BE0001"
}
],
"employee":[
{
"id":"E0001"
}
]
}
}
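One way to produce this payload is to start from a Get Result response whose values have already been reviewed and corrected, then strip each field down to the ground truth attributes listed above. The Python sketch below does that; the helper names are hypothetical. It keeps at most one enrichment match each for sender and employee, as the note on the payload parameter requires.

```python
from typing import Dict, List, Optional

# Attributes that are part of the ground truth for a field; others are ignored.
GROUND_TRUTH_KEYS = ("name", "value", "rawValue", "page", "coordinates")


def _trim(field: Dict) -> Dict:
    if "name" not in field or "value" not in field:
        raise ValueError("name and value are required for every field")
    return {key: field[key] for key in GROUND_TRUTH_KEYS if key in field}


def ground_truth_payload(result: Dict, sender_id: Optional[str] = None,
                         employee_id: Optional[str] = None) -> Dict:
    """Build a Save Ground Truth payload from a reviewed Get Result response."""
    extraction = result.get("extraction", {})
    payload: Dict = {
        "extraction": {
            "headerFields": [_trim(f) for f in extraction.get("headerFields", [])],
            "lineItems": [[_trim(f) for f in row]
                          for row in extraction.get("lineItems", [])],
        }
    }
    enrichment: Dict[str, List[Dict]] = {}
    if sender_id:
        enrichment["sender"] = [{"id": sender_id}]  # at most one match
    if employee_id:
        enrichment["employee"] = [{"id": employee_id}]  # at most one match
    if enrichment:
        payload["enrichment"] = enrichment
    return payload
```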

Response

Response Fields

JSON Field Description

message Status message with information about the request

status Status of the ground truth upload. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
201 “Created”

{
"status": "DONE",
"message": "Ground truth / corrected values uploaded successfully"
}

Response Example
400 “Bad Request”

{
"code": "E93",
"message": "Required parameters not provided.",
"details": "string"
}

Response Example
400 “Bad Request” (with SAP_OCROnly_schema)

{
"error":{
"code":"ES068",
"message":"Posting ground truth is not allowed for SAP_OCROnly_schema.",
"details":[

]
}
}

Response Example
401 “Unauthorized”

{
"message": "No Authorization given in the request header"
}

Response Example
500 “Internal server error”

{
"message":"Internal server error"
}

12.1.6.6 Confirm Document

Change the status of a document from “DONE” to “CONFIRMED”. After confirmation, the document status is permanent, and neither the status nor the document extraction values can be changed. You can also use this endpoint to enable the data feedback collection feature, which allows documents to be used for retraining.

Note

SAP reserves the right to use confirmed documents in the reporting of accuracy values and for analytics.

If you set the parameter dataForRetraining to true, you allow the use of confirmed documents to
retrain the machine learning models and improve the service.

Submitting retraining data and documents to SAP does not guarantee that SAP will actually use the data to improve the service, or that potential errors will be fixed in future versions of the service.

The prediction confidence score of all header fields and line items is automatically set to 1.0 (100%)
for confirmed documents. The service assumes that all field values are correct or have been manually
corrected. Only confirm documents that have been reviewed and don't contain incorrect extraction results.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/confirm

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

dataForRetraining No Boolean Set to true to allow confirmed documents to be used to retrain the service's machine learning models. Set to false if you do not want to use the data feedback collection feature.

Note
The data feedback collection feature is only available on production environments for enterprise accounts. This feature is not available for trial account users.

SAP reserves the right to reject documents submitted by customers for retraining.

To use the data feedback collection feature, use the Create Configuration [page 115] endpoint to set the dataFeedbackCollection configuration key to true.

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff.
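Confirmation is permanent and only applies to documents in the “DONE” status, so a client may want to check the status before sending the request. The Python sketch below builds the confirm request without sending it. The function name is hypothetical, and sending an OAuth access token as a Bearer token in the Authorization header is an assumption. The sketch deliberately leaves out dataForRetraining, because this section does not say whether that parameter travels in the query string or in the request body.

```python
import urllib.parse
import urllib.request
from typing import Dict

API_PREFIX = "/document-information-extraction/v1"


def build_confirm_request(base_url: str, token: str,
                          job: Dict) -> urllib.request.Request:
    """Build POST /document/jobs/<id>/confirm for a job returned by Get Result."""
    # Only DONE documents can be confirmed, and confirmation cannot be undone,
    # so refuse to build the request for any other status.
    if job.get("status") != "DONE":
        raise ValueError(f"cannot confirm a document in status {job.get('status')!r}")
    job_id = urllib.parse.quote(job["id"])
    url = f"{base_url.rstrip('/')}{API_PREFIX}/document/jobs/{job_id}/confirm"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="POST"
    )
```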

Response

Response Fields

JSON Field Description

message Status message with information about the request

status Document confirmation status. Possible value: “CONFIRMED”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"status": "CONFIRMED",
"message": "Document confirmed successfully."
}

Response Example
400 “Bad Request” (dataFeedbackCollection configuration key is not set to true)

{
"message": "Data feedback collection is only possible with the correct tenant configuration. Please set dataFeedbackCollection to true."
}

12.1.6.7 Get Document File

Get the original document file you uploaded to the service.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/file

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff.

Response

The response is given as a status (200, 400, 401, 404, or 500) and document file in the format previously
uploaded using the Upload Document [page 127] endpoint. See Common Status and Error Codes [page 226].

12.1.6.8 Get All Pages Text

Get the text of all pages of a document.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/pages/text

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

results List containing the text and the corresponding bounding boxes (specified by the returned
coordinates) of all pages of a document

The response is given as a status (200, 400, 401, 404, or 500) and JSON file. See Common Status and Error
Codes [page 226].

Response Example
200 “Success”

{
  "results": {
    "1": [
      {
        "word_boxes": [
          {
            "bbox": [[890, 141], [1028, 174]],
            "content": "Rocket"
          },
          {
            "bbox": [[1049, 141], [1275, 182]],
            "content": "Enterprises"
          },
          {
            "bbox": [[1297, 143], [1365, 183]],
            "content": "Pty"
          },
          {
            "bbox": [[1383, 140], [1443, 174]],
            "content": "Ltd"
          }
        ],
        "bbox": [[890, 140], [1443, 184]]
      }
    ]
  }
}
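The results object in the example maps each page number, given as a string, to a list of lines, and each line holds its words in word_boxes. The following Python sketch rebuilds plain text per page from a response of that shape. It assumes that joining the words of a line with single spaces is acceptable for your use case (the service does not state how spacing between words should be reconstructed); the function name page_lines is hypothetical.

```python
from typing import Dict, List


def page_lines(response: dict) -> Dict[int, List[str]]:
    """Rebuild the text of each line, page by page, from a Get All Pages Text response.

    Assumes the response shape shown in the example: "results" maps page
    numbers (as strings) to lists of line objects that contain "word_boxes".
    """
    pages: Dict[int, List[str]] = {}
    # Page keys are strings, so sort them numerically ("10" must follow "9").
    for page_key in sorted(response["results"], key=int):
        lines = []
        for line in response["results"][page_key]:
            # Join the words of one line with single spaces (an assumption).
            lines.append(" ".join(box["content"] for box in line["word_boxes"]))
        pages[int(page_key)] = lines
    return pages
```

For the example response above, page_lines returns {1: ["Rocket Enterprises Pty Ltd"]}.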

12.1.6.9 Get Single Page Text

Get the text of a single page of a document.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/pages/<no>/text

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

no Yes String The page number of the document

Response

Response Fields

JSON Field Description

value List containing the text and the corresponding bounding boxes (specified by the returned
coordinates) of a single page of a document

The response is given as a status (200, 400, 401, 404, or 500) and JSON file. See Common Status and Error
Codes [page 226].

Response Example
200 “Success”

{
  "value": [
    {
      "word_boxes": [
        {
          "bbox": [[890, 141], [1028, 174]],
          "content": "Rocket"
        },
        {
          "bbox": [[1049, 141], [1275, 182]],
          "content": "Enterprises"
        },
        {
          "bbox": [[1297, 143], [1365, 183]],
          "content": "Pty"
        },
        {
          "bbox": [[1383, 140], [1443, 174]],
          "content": "Ltd"
        }
      ],
      "bbox": [[890, 140], [1443, 184]]
    }
  ]
}
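Each bbox in the example is a pair of points. Assuming the first point is the top-left corner and the second point is the bottom-right corner (the example values are consistent with this, but the reference does not define the coordinate system), the following Python sketch converts word boxes into x, y, width, and height, the form many drawing and layout tools expect. The names Box, to_box, and word_boxes are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def to_box(bbox: List[List[int]]) -> Box:
    """Convert [[x1, y1], [x2, y2]] into Box(x, y, width, height).

    Assumes (x1, y1) is the top-left and (x2, y2) the bottom-right corner.
    """
    (x1, y1), (x2, y2) = bbox
    return Box(x1, y1, x2 - x1, y2 - y1)


def word_boxes(page_value: List[dict]) -> List[Tuple[str, Box]]:
    """Flatten the "value" list of a Get Single Page Text response into (word, Box) pairs."""
    return [
        (word["content"], to_box(word["bbox"]))
        for line in page_value
        for word in line["word_boxes"]
    ]
```

For the word “Rocket” in the example, to_box returns Box(x=890, y=141, width=138, height=33).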

12.1.6.10 Get Request Options

Get the request options for a document.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/request

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

documentType Type of the document submitted

extraction Dictionary containing all the extracted header fields and line items

receivedDate The date when the document was received, for example, 2020-02-17.

The response is given as a status (200, 400, 401, 404, or 500) and JSON file. See Common Status and Error
Codes [page 226].

Response Example

200 “Success”

{
  "extraction": "...",
  "documentType": "invoice",
  "receivedDate": "2020-02-17"
}

12.1.6.11 Get Templates Associated with Document

Get all the templates associated with the specified document ID.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs/<id>/template

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00

id Yes String The ID returned by the Upload Document [page 127] endpoint. For example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

templateId The ID of the template associated with the document ID.

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
  "templateId": [
    "4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff"
  ]
}

12.1.6.12 Delete Document

Delete one or more documents.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /document/jobs

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

payload Yes JSON Object List of document IDs

Response

Response Fields

JSON Field Description

message Status message with information about the request

processedTime Timestamp in RFC format

status Deletion status of the document. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
  "status": "DONE",
  "message": "Documents deleted successfully.",
  "processedTime": "2020-03-26T17:00:00.000000+00:00"
}

12.1.7 Enrichment Data API

Document Information Extraction can also enrich the information extracted from documents with your existing
structured data (typically master data records).

In this context, enrichment means adding information to a document that the document does not contain
directly, but that can be inferred by combining information on the document with other, external data.

For example, you can infer the proprietary ID of a customer from another SAP system based on the sender
address on an invoice document. Even though the customer ID does not appear on the invoice, it can be inferred
by matching the address data on the invoice against the relevant master data.

The service matches enrichment data entities with the Extracted Header Fields [page 278] and Extracted Line
Items [page 286] from processed documents.

The Enrichment Data API provides functions to create, update, get, and delete enrichment data. Once
enrichment data entities have been maintained, use the enrichment property described in Upload Document
[page 127] to have the service match enrichment data to the extracted fields.

The Enrichment Data API consists of the following endpoints:

• Create Enrichment Data [page 167]
• List Data-Persistence Jobs [page 174]
• Get Enrichment Data [page 175]
• Get Enrichment Data Creation or Deletion Status [page 177]
• Create Data Activation [page 179]
• Get Data Activation Details [page 180]
• Delete Enrichment Data (Synchronous) - Deprecated [page 181]
• Delete Enrichment Data (Asynchronous) [page 182]

Related Information

Data Enrichment: Best Practices [page 271]
Enrichment Parameter [page 132]

12.1.7.1 Create Enrichment Data

Create or update one or more enrichment data entities.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/jobs

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00.

payload Yes JSON Object List containing enrichment data entities in the value property. The entities can be:

• BusinessEntity [page 170]
• Employee [page 171]
• Product [page 172]

See the request examples below. See also Data Variants [page 172].

type Yes String The type of enrichment data entities used for
matching specified in the JSON string of the
payload. Available values: businessEntity,
employee, and product.

subtype No String The subtype of enrichment data entities used for matching with type businessEntity specified in the JSON string of the payload. Available values: supplier, customer, and companyCode.

Request Examples
Create BusinessEntity [page 170] entities:

payload:
{
  "value": [
    {
      "id": "BE0001",
      "name": "Emma Dowerg",
      "accountNumber": "SK2421",
      "address1": "Amalie-Klemm-Platz 0/9, 48581, Geithain",
      "address2": "Near city church",
      "city": "Geithain",
      "countryCode": "DE",
      "postalCode": "48581",
      "state": "Schleswig-Holstein",
      "email": "e.dowerf@mustermail.com",
      "phone": "+49(0) 909979463",
      "bankAccount": "DE345982837402",
      "taxId": "DE435531312"
    },
    {
      "id": "BE0002",
      "name": "Ioannis Kruschwitz",
      "accountNumber": "393H292",
      "address1": "Alina-Reichmann-Allee 73, 63228, Staßfurt",
      "city": "Staßfurt",
      "countryCode": "DE",
      "postalCode": "63228",
      "state": "Hessen",
      "email": "Ioannis.Kruschwitz@mustermail.com",
      "phone": "+49(0) 818172710",
      "bankAccount": "DE1093628093743",
      "taxId": "DE593029048"
    }
  ]
}
type: businessEntity
clientId: c_00
subtype: supplier

Create Employee [page 171] entities:

payload:
{
"value":[
{
"id":"E0001",
"email":"john.will.doe@mustermail.com",
"firstName":"John",
"middleName":"William",
"lastName":"Doe"
},
{
"id":"E0002",
"email":"m.gierschner@mustermail.com",
"firstName":"Maren",
"middleName":"Volkhard",
"lastName":"Gierschner"
}
]
}
type: employee
clientId: c_00

Create Product [page 172] entities:

payload:
{
"value": [
{
"id": "12342",
"description": "Glycerin Retinol 80 ML",
"materialNumber": "B676817",
"unitPrice": "1000,0 €",
"unitOfMeasure": "LTR"
}
]
}
type: product
clientId: c_00

Response

Response Fields

JSON Field Description

id Request ID

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (201, 400, 401, 422, 429, or 500) and JSON file. See Common Status and
Error Codes [page 226].

Response Example
201 “Success”

{
  "id": "484b6e1c-501c-4a07-85cb-84554656a175",
  "status": "PENDING"
}

Related Information

Entities [page 170]
Data Variants [page 172]
Data Duplicates [page 173]

12.1.7.1.1 Entities

Entities are the actors that a business document can refer to. A business entity can be, for example, a
customer or a supplier. The employee entity represents an employee in the company. The product entity
represents a specific good or service available in a catalog or system.

Related Information

BusinessEntity [page 170]
Employee [page 171]
Product [page 172]

12.1.7.1.1.1 BusinessEntity

A businessEntity can represent different kinds of organizations that you deal with as a company, for example,
suppliers and customers.

See Create Enrichment Data [page 167] to create businessEntity entities.

Key Type Length (maximum length of the string) Description Example

accountNumber String 100 Account number of the business entity. This refers to a business account number and not a bank account number. 1213414

address1 String 150 Complete address fields of the business entity. Use a comma (“,”) to separate each individual field of the address. Musterstraße 21, 13123, Musterstadt

address2 String 100 Any additional fields or landmarks that are part of the address. Near Stadt Dom

bankAccount String 100 Bank account number of the business entity. Enter the bank account number as a continuous string without spaces. DE32245443233323

city String 100 City of the business entity. Musterstadt

countryCode String 100 Country/Region of the business entity. Deutschland

id String 100 Unique identifier of the business entity in the user system. BE21e112

email String 100 Email address of the business entity. mustermann@mustermail.com

name String 256 Name of the business entity. Muster Mann GmbH

phone String 100 Phone number of the business entity. Add the country/region code with the “+” symbol before the phone number. +49131231331

postalCode String 100 Postal code of the business entity. 12323

state String 100 State of the business entity. Rhineland Palatinate

taxId String 100 Tax ID of the business entity. DE123456789
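Because the maximum lengths in the table are fixed per key, you can check records on the client side before sending them with Create Enrichment Data [page 167]. The following Python sketch is one way to do this: the limits are copied from the table, while the function names and the choice to reject (rather than trim) invalid records are illustrative assumptions. Keys that are not listed in the table, such as variant, are not checked.

```python
import json

# Maximum string lengths copied from the BusinessEntity table.
BUSINESS_ENTITY_MAX_LENGTHS = {
    "accountNumber": 100, "address1": 150, "address2": 100,
    "bankAccount": 100, "city": 100, "countryCode": 100,
    "id": 100, "email": 100, "name": 256, "phone": 100,
    "postalCode": 100, "state": 100, "taxId": 100,
}


def check_business_entity(record: dict) -> list:
    """Return a list of problems found in a single businessEntity record."""
    problems = []
    for key, value in record.items():
        limit = BUSINESS_ENTITY_MAX_LENGTHS.get(key)
        if limit is not None and len(str(value)) > limit:
            problems.append(f"{key}: longer than {limit} characters")
    # The table asks for the bank account number as one string without spaces.
    if any(ch.isspace() for ch in str(record.get("bankAccount", ""))):
        problems.append("bankAccount: contains whitespace")
    return problems


def build_business_entity_payload(records: list) -> str:
    """Validate all records, then serialize them as {"value": [...]}.

    Raises ValueError for the first record that has a problem.
    """
    for record in records:
        problems = check_business_entity(record)
        if problems:
            raise ValueError(f"record {record.get('id')!r}: {'; '.join(problems)}")
    return json.dumps({"value": records})
```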

12.1.7.1.1.2 Employee

An employee is a specific employee within the company.

See Create Enrichment Data [page 167] to create employee entities.

Key Type Length (maximum length of the string) Description Example

email String 256 Email address of the employee. m.gierschner@mustermail.com

firstName String 100 First name of the employee. Maren

id String 128 Unique identifier of the employee in the user system. E0002

lastName String 100 Last name of the employee. Gierschner

middleName String 100 Middle name of the employee. Volkhard

12.1.7.1.1.3 Product

A product is a specific good or service available in a catalog or system.

See Create Enrichment Data [page 167] to create product entities.

Key Type Length (maximum length of the string) Description Example

description String 100 Description of the product. Glycerin Retinol 80 ML

id String 128 Unique identifier of the product in the user system. 12342

materialNumber String 100 Unique code that identifies a specific good or service in a supplier catalog or system. B676817

unitOfMeasure String 100 UN/CEFACT code of the unit of measure. LTR for liter and KGM for kilogram.

unitPrice String 100 Price for a single unit of the product. 1000,0 €

12.1.7.1.2 Data Variants

Use variants to create multiple versions of the same data record, which all point to the same record ID.

To create a data record variant, add the variant key to the Create Enrichment Data [page 167] payload:

payload:
{
  "value": [
    {
      "id": "BE0001",
      "name": "Emma Dowerg",
      "address1": "Amalie-Klemm-Platz 0/9, 48581, Geithain",
      "bankAccount": "DE345982837402",
      "taxId": "DE435531312",
      "variant": "2"
    }
  ]
}
type: businessEntity
clientId: c_00
subtype: supplier

All the variants are used for the enrichment. If a data record match is associated with a variant ID, the matched
variant ID is returned by Get Result [page 138] alongside the usual enrichment result information. For example:

enrichment: {
  "id": "BE0001",
  "confidence": 98.647,
  "variant": 2
}

The variant ID is optional. If it is absent, the data record is not associated with any variant. If it is used,
the variant ID must be a number in the inclusive range 1 - 9. Any other variant ID is invalid and results in an
error.

Creating another master data record with the same ID and variant ID does not result in an error. Instead, it is
handled in the same way as creating a data record with an already existing ID when neither record has a variant
ID. See Data Duplicates [page 173].

Note

A single invalid variant ID value (for example, a variant that is not a number in the inclusive range 1 - 9) will
cause the whole batch (API request) to fail.
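Because one invalid variant ID makes the whole request fail, it can be worth checking a batch before you send it. The Python sketch below accepts records without a variant, and variants from 1 to 9 given either as a string (as in the payload example) or as an integer. Whether the service accepts both forms is an assumption, as are the function names.

```python
# Variant IDs allowed by the service: the numbers 1 to 9 (inclusive).
VALID_VARIANTS = {str(n) for n in range(1, 10)}


def invalid_variant_ids(records: list) -> list:
    """Return the IDs of records whose variant would make the whole batch fail.

    A missing "variant" key is valid. Otherwise the variant must be 1-9,
    written either as a string ("2") or as an integer (2).
    """
    bad = []
    for record in records:
        if "variant" in record and str(record["variant"]) not in VALID_VARIANTS:
            bad.append(record.get("id"))
    return bad


def check_batch(records: list) -> None:
    """Raise before sending if any record in the batch has an invalid variant ID."""
    bad = invalid_variant_ids(records)
    if bad:
        raise ValueError(f"invalid variant IDs in records {bad}; the whole request would fail")
```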

Tip

You can create multiple variants of the same data record (all sharing the same ID) but in different
languages.

12.1.7.1.3 Data Duplicates

Find out how the Document Information Extraction service handles the upload of duplicated master data
records.

What are data duplicates?

A master data record “X” is considered a duplicate by the Document Information Extraction service if there is
another existing record “Y” that fulfills all of the following conditions:

• “X” has the exact same ID as “Y”.
• “X” has the exact same variant ID as “Y”. If neither record has a variant ID, they are also considered
equal in this respect.
• “X” and “Y” are created from the same tenant, client, and service instance.

How does Document Information Extraction handle duplicates?

The service filters out duplicate records as part of the automatic or manual data activation. If one or more
duplicates are identified, the following update rule is applied to all of them: the most recently created record
replaces all previously created versions of that record.

This process optimizes the service experience and results for most common use cases in which duplicated
records are not intended. If duplicated records are required as part of an individual use case, this can be
achieved using variant IDs.
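The update rule can be illustrated with a short Python sketch. It groups records by tenant, client, service instance, record ID, and variant ID (treating a missing variant as its own value) and keeps only the most recently created record in each group. This only models the documented rule and is not the service's implementation; the MasterDataRecord type is hypothetical, and resolving ties on the creation time in favor of the record that appears later in the input is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class MasterDataRecord:
    tenant: str
    client_id: str
    service_instance: str
    record_id: str
    variant: Optional[int]  # None when the record has no variant ID
    created: datetime
    data: dict = field(default_factory=dict)


def deduplicate(records: List[MasterDataRecord]) -> List[MasterDataRecord]:
    """Keep only the most recently created record for each duplicate group."""
    latest: Dict[Tuple, MasterDataRecord] = {}
    for record in records:
        # Two records are duplicates only if all of these values match.
        key = (record.tenant, record.client_id, record.service_instance,
               record.record_id, record.variant)
        current = latest.get(key)
        if current is None or record.created >= current.created:
            latest[key] = record
    return list(latest.values())
```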

12.1.7.2 List Data-Persistence Jobs

Returns a list of all data-persistence jobs for this tenant.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/jobs

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client. For example: c_00

limit No Integer Items per page. Defines a maximum limit. For example: 10. See Technical Constraints [page 275].

offset No Integer Offset of the first item to be returned. For example: 10

order No String Order criteria for the retrieved data-persistence jobs. Possible values: created, client, or status. For example: created asc (sorts by creation date in ascending order)

status No String The status of this data-persistence job. Possible values: “PENDING”, “SUCCESS”, or “FAILED”
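The parameters above are optional filters for a GET request. As a sketch, assuming they are passed as URL query parameters (the reference does not say so explicitly), the following Python function builds the request URL from the base URL in your service key and checks order and status against the values listed in this table. The function name and the example base URL are hypothetical.

```python
from urllib.parse import urlencode

API_PATH = "/document-information-extraction/v1/data/jobs"
ORDER_FIELDS = {"created", "client", "status"}
JOB_STATUSES = {"PENDING", "SUCCESS", "FAILED"}


def list_jobs_url(base_url, client_id=None, limit=None, offset=None,
                  order=None, status=None):
    """Build the List Data-Persistence Jobs URL, omitting unset parameters."""
    # order is "<field>" or "<field> asc|desc"; only the field is checked here.
    if order is not None and (order.split() or [""])[0] not in ORDER_FIELDS:
        raise ValueError(f"order must start with one of {sorted(ORDER_FIELDS)}")
    if status is not None and status not in JOB_STATUSES:
        raise ValueError(f"status must be one of {sorted(JOB_STATUSES)}")
    params = {"clientId": client_id, "limit": limit, "offset": offset,
              "order": order, "status": status}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + API_PATH
    return f"{url}?{query}" if query else url
```

For example, list_jobs_url("https://example.com", client_id="c_00", limit=10, order="created asc") returns https://example.com/document-information-extraction/v1/data/jobs?clientId=c_00&limit=10&order=created+asc.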

Response

Response Fields

JSON Field Description

clientId ID of the client this data-persistence job was created for

created Time when the data-persistence job was created

id Data-persistence job ID

status The status of this data-persistence job. Possible values: “PENDING”, “SUCCESS”, or
“FAILED”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
  "value": {
    "id": "c4f25368-d3e6-43f7-a0b4-55adf7f54e95",
    "status": "PENDING",
    "clientId": "c_00",
    "created": "2020-05-08T10:39:59.916359+00:00"
  }
}

12.1.7.3 Get Enrichment Data

Retrieve one or more enrichment data entities.

Note

Enrichment data is refreshed automatically every 4 hours. It might take up to 4 hours until the enrichment
data prediction is available in the Get Result [page 138] response. Manual data activation is also available
and is the recommended process. You can set data activation to manual using the following endpoints:

• Create Configuration [page 115]
• Create Data Activation [page 179]

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00

companyCode No String The company code of a single entry

id No String The data ID of a single entry

limit No Integer Items per page. Defines a maximum limit. For example: 10. See Technical Constraints [page 275].

offset No Integer Offset of the first item to be returned. For example: 10

type Yes String The type of enrichment data entities used for matching. Available values: businessEntity, employee, and product.

subtype No String The subtype of enrichment data entities used for matching with type businessEntity. Available values: supplier, customer, and companyCode.

system No String The system of a single entry
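Unlike List Data-Persistence Jobs, this endpoint requires clientId and type, and subtype only applies to businessEntity. The Python sketch below encodes those rules while building the request URL. It assumes that the parameters are passed as URL query parameters and that sending subtype with another type is a mistake worth rejecting locally; the function name and the example base URL are hypothetical.

```python
from urllib.parse import urlencode

API_PATH = "/document-information-extraction/v1/data"
ENTITY_TYPES = {"businessEntity", "employee", "product"}
BUSINESS_ENTITY_SUBTYPES = {"supplier", "customer", "companyCode"}


def enrichment_data_url(base_url, client_id, entity_type, *, subtype=None,
                        record_id=None, company_code=None, system=None,
                        limit=None, offset=None):
    """Build a Get Enrichment Data URL after checking the documented constraints."""
    if not client_id:
        raise ValueError("clientId is required")
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"type must be one of {sorted(ENTITY_TYPES)}")
    if subtype is not None:
        if entity_type != "businessEntity":
            raise ValueError("subtype applies only to type businessEntity")
        if subtype not in BUSINESS_ENTITY_SUBTYPES:
            raise ValueError(f"subtype must be one of {sorted(BUSINESS_ENTITY_SUBTYPES)}")
    params = {"clientId": client_id, "type": entity_type, "subtype": subtype,
              "id": record_id, "companyCode": company_code, "system": system,
              "limit": limit, "offset": offset}
    # Leave out every optional parameter that was not set.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}{API_PATH}?{query}"
```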

Response

Response Fields

JSON Field Description

accountNumber Account number of the enrichment data entity

address1 Address of the enrichment data entity

address2 Additional address of the enrichment data entity

bankAccount Bank account number of the enrichment data entity

city City name of the enrichment data entity

companyCode Company code of the enrichment data entity

countryCode Country/Region code of the enrichment data entity

email Email address of the enrichment data entity

id ID of the enrichment data entity

name Name of the enrichment data entity

phone Phone number of the enrichment data entity

postalCode Postal code of the enrichment data entity

state State code of the enrichment data entity

system System of the enrichment data entity

taxId Tax ID of the enrichment data entity

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
  "value": [
    {
      "id": "BE0001",
      "name": "A",
      "accountNumber": "12345",
      "address1": "A street 5",
      "address2": "",
      "city": "Heidelberg",
      "countryCode": "DE",
      "postalCode": "69117",
      "state": "BW",
      "email": "a@a.com",
      "phone": "",
      "bankAccount": "000001",
      "taxId": "999",
      "companyCode": "4711",
      "system": "System A"
    }
  ]
}

12.1.7.4 Get Enrichment Data Creation or Deletion Status

Provide a data-persistence job ID to retrieve information about that data-persistence job.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/jobs/<id>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Create Enrichment Data [page 167] or Delete Enrichment Data (Asynchronous) [page 182] endpoints. For example: 29812f26-464e-4ee6-be63-731859cf99f3.

Response

Response Fields

JSON Field Description

id Request ID.

processedTime Amount of time it took to process the request.

refreshedAt Date and time in extended ISO 8601 format (for example, "2021-01-16T13:36:29.453713+00:00") when the enrichment data job was last refreshed. A value of null means that the enrichment data has not yet been refreshed.

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”.

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
  "value": {
    "id": "b89645b4-605b-45cd-bf69-1147875e75f5",
    "status": "SUCCESS",
    "processedTime": "0:00:00.063022",
    "refreshedAt": "2021-01-16T13:36:29.453713+00:00"
  }
}

Response Example

400 “Bad Request”

{
  "code": "E5",
  "message": "Failed to retrieve data.",
  "details": "string"
}

Response Example
401 “Unauthorized”

{
  "message": "No Authorization given in the request header"
}
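Because creation and deletion jobs start in the PENDING state, clients typically poll this endpoint until the job finishes. The Python sketch below does this with an injected fetch_status callable, so it does not depend on any particular HTTP library. The field description lists “DONE” while the example response shows “SUCCESS”, so the sketch treats any status other than “PENDING” as final and leaves checking the actual status (and HTTP errors) to the caller. The polling interval, timeout, and function name are assumptions, not values recommended by the service.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict], interval_s: float = 2.0,
                 timeout_s: float = 300.0, sleep=time.sleep,
                 clock=time.monotonic) -> dict:
    """Poll a data job until its status is no longer PENDING.

    fetch_status is expected to return the parsed JSON body of
    GET /data/jobs/<id>; the job may be wrapped in a "value" object.
    """
    deadline = clock() + timeout_s
    while True:
        body = fetch_status()
        job = body.get("value", body)
        if job.get("status") != "PENDING":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still PENDING after {timeout_s} seconds")
        sleep(interval_s)
```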

12.1.7.5 Create Data Activation

Create a data activation job record to see new or updated enrichment data in the extraction results if you
are using the manual data activation process. Only activated enrichment data will be added to the extraction
results.

Remember

Before creating an enrichment data activation job record, you need to Create Configuration [page 115].

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/activation

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

type No String The type of enrichment data entities used for matching. Available values: businessEntity, employee, and product.

subtype No String The subtype of enrichment data entities used for matching with type businessEntity. Available values: supplier, customer, and companyCode.

Response

Response Fields

JSON Field Description

id ID of the enrichment data activation job record

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

201 “Success”

{
  "id": "484b6e1c-501c-4a07-85cb-84554656a175",
  "status": "PENDING"
}

12.1.7.6 Get Data Activation Details

Provide an enrichment data activation job record ID to retrieve information about that data activation
job.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/activation/<id>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

id Yes String The ID returned by the Create Data Activation [page 179] endpoint. For example: 484b6e1c-501c-4a07-85cb-84554656a175.

Response

Response Fields

JSON Field Description

created Time when the enrichment data was submitted for processing

finished Time when the enrichment data status changed to “DONE” or “FAILED”

id ID of the enrichment data activation job record

processedTime Timestamp in RFC format

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
  "value": {
    "id": "484b6e1c-501c-4a07-85cb-84554656a175",
    "status": "DONE",
    "processedTime": "0:01:00",
    "created": "2019-07-04T15:20:37.668873+00:00",
    "finished": "2019-07-04T15:21:37.668873+00:00"
  }
}
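Although processedTime is described as a timestamp, the example value "0:01:00" reads like a duration in hours, minutes, and seconds, and it matches the gap between created and finished. The Python sketch below parses both under that assumption. It does not handle durations of a day or longer, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_processed_time(value: str) -> timedelta:
    """Parse an "H:MM:SS" or "H:MM:SS.ffffff" string such as "0:01:00"."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def activation_duration(job: dict) -> timedelta:
    """Return finished - created for a completed data activation job."""
    created = datetime.fromisoformat(job["created"])
    finished = datetime.fromisoformat(job["finished"])
    return finished - created
```

For the example response, both functions return a duration of one minute.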

12.1.7.7 Delete Enrichment Data (Synchronous) - Deprecated

Perform synchronous deletion of existing data records for specified fields.

Caution

This endpoint has been deprecated and is scheduled for decommissioning in November 2024. Please use
the endpoint Delete Enrichment Data (Asynchronous) [page 182] to delete data records.

Note

To delete large numbers of data records, use only the endpoint Delete Enrichment Data (Asynchronous)
[page 182].

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. For example: c_00

payload Yes JSON Object Comma-separated list of data record IDs that you
want to delete

type Yes String The type of enrichment data entities used for
matching specified in the JSON string of the
payload. Available values: businessEntity,
employee, and product.

subtype No String The subtype of enrichment data entities used for matching with type businessEntity specified in the JSON string of the payload. Available values: supplier, customer, and companyCode.

Response

Response Fields

JSON Field Description

deleted Total number of data records deleted with this request

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
  "deleted": "2"
}

12.1.7.8 Delete Enrichment Data (Asynchronous)

Perform asynchronous deletion of existing data records for specified fields.

This endpoint accepts an array of the data record IDs that you want to delete. If the array in the payload is
empty, all entries are deleted.

You can also delete large numbers of data records for all clients per data type (businessEntity, employee, or
product) by entering only the type parameter in your request. If you do not specify clientId and type, you
will delete all data records for a tenant.

Tip

Regularly delete outdated and unused data records. This improves the performance of the data enrichment
feature when it matches a business document to an enrichment data record based on the information
extracted from the document.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /data/jobs

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId No String The ID of the client. For example: c_00

payload Yes JSON Object Comma-separated list of data record IDs that you
want to delete. All data records are deleted if
payload is empty.

type No String The type of enrichment data entities used for matching specified in the JSON string of the payload. Available values: businessEntity, employee, and product.

subtype No String The subtype of enrichment data entities used for matching with type businessEntity specified in the JSON string of the payload. Available values: supplier, customer, and companyCode.

Request Examples
Delete all data records:

payload:
{
  "value": []
}

Delete all BusinessEntity [page 170] data records:

payload:
{
  "value": []
}
type: businessEntity

Delete all Employee [page 171] data records:

payload:
{
  "value": []
}
type: employee

Delete all Product [page 172] data records:

payload:
{
  "value": []
}
type: product
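Since an empty value array deletes every record in scope (and omitting both clientId and type widens that scope to the whole tenant), a small client-side guard can prevent accidental mass deletion. The Python sketch below builds the payload and refuses an empty ID list unless the caller opts in. It assumes that specific IDs are listed as plain strings inside value; the examples above only show the empty form. The function name and the allow_delete_all flag are hypothetical.

```python
import json
from typing import Iterable


def delete_payload(record_ids: Iterable[str], *, allow_delete_all: bool = False) -> str:
    """Build the payload for Delete Enrichment Data (Asynchronous).

    An empty list deletes all records in scope, so it must be requested explicitly.
    """
    ids = list(record_ids)
    if not ids and not allow_delete_all:
        raise ValueError("empty ID list would delete all records; pass allow_delete_all=True")
    return json.dumps({"value": ids})
```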

Response

Response Fields

JSON Field Description

id Request ID

status Status of the request. Possible values: “PENDING”, “DONE”, or “FAILED”

The response is given as a status (201, 400, 401, 422, or 500) and JSON file. See Common Status and Error
Codes [page 226].

Response Example
201 “Success”

{
  "id": "484b6e1c-501c-4a07-85cb-84554656a175",
  "status": "PENDING"
}

12.1.8 Schema API

Create schemas containing data fields found in standard or custom document types. You can use these
schemas as a basis for creating templates. You can select schemas and associated templates when adding
documents. The Schema API provides endpoints to create, list, update, and delete schemas and schema
versions.

The Schema API consists of the following endpoints:

• Create Schema [page 185]
• Get Schema [page 186]
• Get Schema Capabilities [page 188]
• Update Schema [page 190]
• Get Schema Details [page 191]
• Delete Schema [page 195]
• Create Schema Version [page 196]
• Activate Schema Version [page 197]
• Deactivate Schema Version [page 198]
• Add Fields to Schema Version [page 199]
• Get Schema Versions [page 206]
• Get Schema Version Details [page 208]
• Delete Schema Versions [page 210]

12.1.8.1 Create Schema

Create one or more schemas for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

payload Yes JSON Object List containing clientId, name, schemaDescription, documentType, and documentTypeDescription.

Request Example

{
  "clientId": "c_00",
  "name": "Custom_Payment_Advice_Schema",
  "schemaDescription": "Schema For Accounts Department Payment Advices",
  "documentType": "paymentAdvice",
  "documentTypeDescription": "Payment Advice with Order Number"
}

Response

Response Fields

JSON Field Description

created Time when the schema was created.

id ID of the schema

The response is given as a status (201, 400, 401, 429, or 500) and JSON file. See Common Status and Error
Codes [page 226] and Technical Constraints [page 275].

Response Example
201 “Success”

{
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"created":"2020-03-26T17:00:00.000000+00:00"
}
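A minimal sketch of the Create Schema call using only the Python standard library. `BASE_URL` and the bearer token are placeholders: the URL would come from the url value of the service key, and the token from an OAuth flow against its uaa section, which is assumed here rather than shown. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder for the service key's url value
API_PREFIX = "/document-information-extraction/v1"


def build_create_schema_request(token: str, client_id: str, name: str,
                                document_type: str, schema_description: str = "",
                                document_type_description: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST /schemas request."""
    payload = {
        "clientId": client_id,
        "name": name,
        "schemaDescription": schema_description,
        "documentType": document_type,
        "documentTypeDescription": document_type_description,
    }
    return urllib.request.Request(
        BASE_URL + API_PREFIX + "/schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_schema_request("<token>", "c_00", "Custom_Payment_Advice_Schema",
                                  "paymentAdvice")
# urllib.request.urlopen(req) would send it; a 201 body carries "id" and "created".
```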

12.1.8.2 Get Schema

Retrieve all schemas for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

documentType No String The type of the document used when creating the schema. For example: custominvoice or paymentAdvice

limit No Integer Maximum number of schemas to be returned (maximum allowed value: 1000). For example: 10


offset No Integer Index of the first schema to be retrieved. For example: 20

order No String Order criteria of schemas to be returned. For example: “name asc” (sorts by name in ascending order)

predefined No Boolean Set to true for standard documents or false for custom documents.

Response

Response Fields

JSON Field Description

created Time when the schema was created

documentType Type of the document submitted

documentTypeDescription Description of the document submitted

id ID of the schema

name Name of the schema

predefined True for standard documents, false for custom documents

schemaDescription Description of the schema

state State of the schema

updated Time when the schema was updated

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"schemas":[
[
{
"name":"Basic Involve FormatSchema",
"schemaDescription":"SAP Invoice Schema",
"documentType":"Invoice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"predefined":"True",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"draft"
},
{

"name":"Daimier Payment Advice Schema",
"schemaDescription":"Payment Advice Schema",
"documentType":"Payment Advice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a189",
"predefined":"False",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"active"
}
]
]
}
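The sketch below builds the Get Schema query string and enforces the documented maximum of 1000 for limit. It also flattens the response. The example response wraps the schema objects in an extra array ("schemas": [[...]]), and it's unclear whether the live service does the same, so the helper accepts both nested and flat shapes. Because the example returns predefined as the strings "True"/"False", the check is case-insensitive. Passing clientId as a query parameter and these parsing choices are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

API_PREFIX = "/document-information-extraction/v1"
MAX_LIMIT = 1000  # documented maximum for the limit parameter


def get_schemas_path(client_id: str, document_type: Optional[str] = None,
                     limit: Optional[int] = None, offset: Optional[int] = None,
                     order: Optional[str] = None,
                     predefined: Optional[bool] = None) -> str:
    """Build the path and query string for GET /schemas."""
    if limit is not None and limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    params = {"clientId": client_id}
    if document_type is not None:
        params["documentType"] = document_type
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if order is not None:
        params["order"] = order  # e.g. "name asc"
    if predefined is not None:
        params["predefined"] = "true" if predefined else "false"
    return f"{API_PREFIX}/schemas?{urlencode(params)}"


def flatten_schemas(response: dict) -> list:
    """Return a flat list of schema objects from a Get Schema response."""
    items = []
    for entry in response.get("schemas", []):
        # The documented example nests the schemas in an inner list.
        items.extend(entry if isinstance(entry, list) else [entry])
    return items


def is_predefined(schema: dict) -> bool:
    """Accept both JSON booleans and the "True"/"False" strings of the example."""
    return str(schema.get("predefined")).lower() == "true"
```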

12.1.8.3 Get Schema Capabilities

Retrieve all schema capabilities.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/capabilities

HTTP Method: GET

Response

Response Fields

JSON Field Description

documentTypes List of the document types that are relevant to schemas

formatting List of the possible formatting for schemas

setupTypes List of the possible setup types for schemas

state List of the possible statuses for schemas

The response is given as a status (200, 401, or 500) and JSON file. See Common Status and Error Codes [page
226].

Response Example
200 “Success”

{
"documentTypes":[

"invoice",
"paymentAdvice",
"purchaseOrder",
"custom",
"businessCard"
],
"state":[
"active",
"inactive",
"draft"
],
"setupTypes":[
{
"name":"static",
"properties":[

]
},
{
"name":"ml",
"properties":[
"x",
"y",
"w",
"z"
]
},
{
"name":"...",
"properties":"[]"
}
],
"formatting":[
{
"name":"string",
"properties":[
{
"name":"length",
"values":[
"number"
]
}
]
},
{
"name":"number",
"properties":[
{
"name":"length",
"values":[
"number"
]
},
{
"name":"thousandSeparator",
"values":[
".",
",",
" "
]
},
{
"name":"decimalSeparator",
"values":[
".",
",",
" "
]

}
]
},
{
"name":"...",
"properties":"[]"
}
]
}
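One use of the capabilities response is to look up the allowed values of a formatting property, such as the permitted decimal separators for number fields, before defining schema fields. The sketch below does that lookup. The example response contains "..." placeholder entries whose properties value is the string "[]", so the helper ignores properties that aren't lists. The function name is hypothetical.

```python
def formatting_values(capabilities: dict, formatting_name: str, property_name: str) -> list:
    """Return the allowed values of one property of one formatting type, or []."""
    for fmt in capabilities.get("formatting", []):
        if fmt.get("name") != formatting_name:
            continue
        props = fmt.get("properties")
        if not isinstance(props, list):  # e.g. the "[]" placeholder string
            return []
        for prop in props:
            if isinstance(prop, dict) and prop.get("name") == property_name:
                return list(prop.get("values", []))
    return []


caps = {
    "documentTypes": ["invoice", "paymentAdvice"],
    "formatting": [
        {"name": "number", "properties": [
            {"name": "decimalSeparator", "values": [".", ",", " "]}]},
        {"name": "...", "properties": "[]"},
    ],
}
print(formatting_values(caps, "number", "decimalSeparator"))
```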

12.1.8.4 Update Schema

Update existing schemas for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>

HTTP Method: PUT

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

payload Yes JSON Object List containing name, schemaDescription, and documentTypeDescription.

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Request Example

{
"name":"Custom_Payment_Advice_Schema",
"schemaDescription":"Schema For Accounts Department Payment Advices",
"documentTypeDescription":"Payment Advice with Order Number"
}

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

201 “Success”

{
"message":"Schema has been updated successfully."
}
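A minimal sketch of building the Update Schema request with the standard library. This section lists clientId as a parameter separate from the payload but doesn't say where it goes, so the sketch assumes a query parameter. `BASE_URL` and the token are placeholders. The schema ID is percent-encoded before it is placed in the path.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://example.invalid"  # placeholder for the service key's url value
API_PREFIX = "/document-information-extraction/v1"


def build_update_schema_request(token: str, client_id: str, schema_id: str, name: str,
                                schema_description: str,
                                document_type_description: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /schemas/<schemaId> request."""
    # Assumption: clientId travels as a query parameter; the rest is the JSON body.
    url = (f"{BASE_URL}{API_PREFIX}/schemas/{quote(schema_id, safe='')}"
           f"?{urlencode({'clientId': client_id})}")
    body = {
        "name": name,
        "schemaDescription": schema_description,
        "documentTypeDescription": document_type_description,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
```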

12.1.8.5 Get Schema Details

Retrieve schema details for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

created Time when the schema was created

documentType Type of the document used for the schema

documentTypeDescription Description of the document used for the schema

headerFields List of header fields that are part of the schema

id ID of the schema

lineItemFields List of line item fields that are part of the schema

name Name of the schema

predefined True for standard documents, false for custom documents

schemaDescription Description of the schema

state State of the schema

updated Time when the schema was updated

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"name":"Basic Involve FormatSchema",
"schemaDescription":"SAP Invoice Schema",
"documentType":"Invoice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"predefined":"FALSE",
"state":"draft",
"headerFields":[
{
"name":"GrossAmountValue",
"description":"TotalAmountValue",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{
"type":"default",
"priority":1,
"filter":[
{
"key":"language",
"value":"EN"
},
{

"key":"language",
"value":"DE"
}
],
"properties":[
{
"key":"deploymentID",
"value":"123e4567-e89b-12d3-a456-426614174000."
},
{
"key":"fieldName",
"value":"GrossAmount"
}
]
},
"formattingType":"number",
"formatting":{
"length":"64",
"precision":"3",
"decimalSeparator":".",
"thousandSeparator":","
},
"formattingTypeVersion":"1.0.0"
},
{
"name":"sendersFullName",
"description":"Name of Sender",
"defaultExtractor":{
"fieldName":"senderName"
},
"setup":{
"type":"default",
"priority":1,
"filter":[
{
"key":"language",
"value":"EN"
},
{
"key":"language",
"value":"DE"
}
],
"properties":[
{
"key":"deploymentID",
"value":"123e4567-e89b-12d3-a456-426614174000."
},
{
"key":"fieldName",
"value":"senderName"
}
]
},
"setupTypeVersion":"",
"setupType":"",
"formattingType":"",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
],
"lineItemFields":[
{
"name":"Amount",
"description":"TotalAmountValue",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{
"type":"default",
"priority":1,
"filter":[
{
"key":"language",
"value":"EN"
},
{
"key":"language",
"value":"DE"
}
],
"properties":[
{
"key":"deploymentID",
"value":"123e4567-e89b-12d3-a456-426614174000."
},
{
"key":"fieldName",
"value":"NetAmount"
}
]
},
"formattingType":"number",
"formatting":{
"length":"64",
"precision":"3",
"decimalSeparator":".",
"thousandSeparator":","
},
"formattingTypeVersion":"1.0.0"
},
{
"name":"WithdrawalDate",
"description":"Date of Withdrawal",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{
"type":"default",
"priority":1,
"filter":[
{
"key":"language",
"value":"EN"
},
{
"key":"language",
"value":"DE"
}
],
"properties":[
{
"key":"deploymentID",
"value":"123e4567-e89b-12d3-a456-426614174000."
},
{
"key":"fieldName",
"value":"DocumentDate"
}
]

},
"formattingType":"date",
"formatting":{
"dateformat":"dd/mm/yy"
},
"formattingTypeVersion":"1.0.0"
}
]
}
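When reviewing a schema, it can help to flatten the details response into one row per field. The sketch below lists the section, name, formattingType, and setup type of every header and line item field. It returns a list rather than a dict, so a header field and a line item field with the same name both appear. Missing or empty values are kept as empty strings, as in the sendersFullName field of the example. The function name is hypothetical.

```python
def summarize_fields(details: dict) -> list:
    """List (section, name, formattingType, setup type) for every field of a schema."""
    rows = []
    for section in ("headerFields", "lineItemFields"):
        for field in details.get(section, []):
            setup = field.get("setup") or {}
            rows.append((section, field["name"],
                         field.get("formattingType", ""), setup.get("type", "")))
    return rows


details = {
    "headerFields": [
        {"name": "GrossAmountValue", "formattingType": "number", "setup": {"type": "default"}},
        {"name": "sendersFullName", "formattingType": "", "setup": {"type": "default"}},
    ],
    "lineItemFields": [
        {"name": "WithdrawalDate", "formattingType": "date", "setup": {"type": "default"}},
    ],
}
for row in summarize_fields(details):
    print(row)
```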

12.1.8.6 Delete Schema

Delete one or more schemas for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

payload Yes JSON Object List of the schemaIds you want to delete, passed in the value array.

Request Example

{
"value":[
"4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff"
]
}

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
"message":"Schemas deleted successfully."
}
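A sketch of building the Delete Schema request, with the schema IDs in the value array as in the request example. This section doesn't say what an empty list does. In the data records API shown at the top of this section, an empty value list deletes all records, so as a precaution this helper refuses an empty list. Sending clientId as a query parameter, `BASE_URL`, and the token are assumptions or placeholders.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://example.invalid"  # placeholder for the service key's url value
API_PREFIX = "/document-information-extraction/v1"


def build_delete_schemas_request(token: str, client_id: str,
                                 schema_ids: list) -> urllib.request.Request:
    """Build (but do not send) a DELETE /schemas request with a JSON body."""
    if not schema_ids:
        # Precaution only: the behavior of an empty list is not documented here.
        raise ValueError("pass at least one schema ID")
    url = f"{BASE_URL}{API_PREFIX}/schemas?{urlencode({'clientId': client_id})}"
    body = json.dumps({"value": list(schema_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="DELETE",
    )
```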

12.1.8.7 Create Schema Version

Create a new version for a schema.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

created Time when the schema version was created.

id ID of the schema

version Version of the schema

The response is given as a status (201, 400, 401, 429, or 500) and JSON file. See Common Status and Error
Codes [page 226] and Technical Constraints [page 275].

Response Example
201 “Success”

{
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"version":"2",
"created":"2020-03-26T17:00:00.000000+00:00"
}

12.1.8.8 Activate Schema Version

Activate a particular version of a schema.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions/<version>/activate

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff


version Yes String The version returned by the endpoint Create Schema Version [page 196]. Example: 2

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226] and Technical Constraints [page 275].

Response Example
201 “Success”

{
"message":"Schema version activated successfully."
}

12.1.8.9 Deactivate Schema Version

Deactivate a particular version of a schema.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions/<version>/deactivate

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00


schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

version Yes String The version returned by the endpoint Create Schema Version [page 196]. Example: 2

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226] and Technical Constraints [page 275].

Response Example

201 “Success”

{
"message":"Schema version deactivated successfully."
}
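The activate and deactivate endpoints differ only in their last path segment. The sketch below builds both paths and plans a switch from one version to another. Deactivating the old version before activating the new one is an ordering we chose; this section doesn't say whether two versions of a schema can be active at the same time. Both function names are hypothetical.

```python
from urllib.parse import quote

API_PREFIX = "/document-information-extraction/v1"


def version_action_path(schema_id: str, version: str, action: str) -> str:
    """Build the activate or deactivate path for one schema version."""
    if action not in ("activate", "deactivate"):
        raise ValueError("action must be 'activate' or 'deactivate'")
    return (f"{API_PREFIX}/schemas/{quote(schema_id, safe='')}"
            f"/versions/{quote(str(version), safe='')}/{action}")


def switch_active_version(schema_id: str, old_version: str, new_version: str) -> list:
    """Plan the POST calls that retire one version and activate another (order assumed)."""
    return [
        ("POST", version_action_path(schema_id, old_version, "deactivate")),
        ("POST", version_action_path(schema_id, new_version, "activate")),
    ]
```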

12.1.8.10 Add Fields to Schema Version

Add fields to a schema version for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions/<version>/fields

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00


payload Yes JSON Object List containing the headerFields and lineItemFields that you want to add to the schema version.

You can also optionally use the label property to give user-friendly names to some or all of the headerFields and lineItemFields that you enter in the payload.

Remember
Each label can have a maximum length of 200 characters.

Use the setup type to control how each schema field is extracted from your documents. The following values are available:

• auto
• manual

In schemas created for standard document types, auto supports extraction in the following ways:

• Using the service’s machine learning models: In this case, select an appropriate default extractor.
• Using generative AI: In this case, don’t select a default extractor.

In schemas created for custom document types, auto supports extraction using generative AI. In this case, no default extractor is available.

Restriction
The setup type auto without a default extractor is available only for schemas with the service plan Document Information Extraction, premium edition (premium_edition). See Service Plans [page 77] and Metering and Pricing [page 79].

Caution
Always validate information extracted using generative AI before using it for critical applications.

If you prefer not to use generative AI to extract information from documents, select the setup type auto with a default extractor (standard document types only) or the setup type manual (standard and custom document types) when adding data fields to your schema.

The setup type manual supports extraction using a template. It's available in schemas created for standard or custom document types.

Note
To use the setup types "auto" and "manual", set setupTypeVersion to 2.0.0.

The setupTypeVersion 1.0.0 and the setup type "default" are still supported. As of October 9, 2023, 2.0.0 is the recommended setupTypeVersion.

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

version Yes String The version returned by the endpoint Create Schema Version [page 196]. Example: 2

Request Example: Payload with label and setupTypeVersion 2.0.0

{
"headerFields":[
{
"name":"documentDate",
"label":"Document Date",
"description":"Document Date",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"manual",
"priority":1
},
"formattingType":"date",
"formatting":{
"dateformat":"dd/mm/yy"
},
"formattingTypeVersion":"1.0.0"
}
],
"lineItemFields":[
{

"name":"netAmount",
"label":"Net Amount",
"description":"Net Amount",
"defaultExtractor":{
"fieldName":"netAmount"
},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"auto",
"priority":1
},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
},
{
"name":"discountAmount",
"label":"Discount Amount",
"description":"Discount Amount",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"manual",
"priority":1
},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
]
}

Request Example: Payload with label, setupType auto without defaultExtractor, and
setupTypeVersion 2.0.0

{
"headerFields":[
{
"name":"documentDate",
"label":"Document Date",
"description":"Document Date",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"auto",
"priority":1
},
"formattingType":"date",
"formatting":{
"dateformat":"dd/mm/yy"
},
"formattingTypeVersion":"1.0.0"
},
{
"name":"documentNumber",
"label":"Document Number",

"description":"Document Number",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"auto",
"priority":1
},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
],
"lineItemFields":[
{
"name":"netAmount",
"label":"Net Amount",
"description":"Net Amount",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"auto",
"priority":1
},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
},
{
"name":"discountAmount",
"label":"Discount Amount",
"description":"Discount Amount",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"2.0.0",
"setup":{
"type":"auto",
"priority":1
},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
]
}

Request Example: Payload with setupTypeVersion 1.0.0

{
"headerFields":[
{
"name":"DocumentNumber",
"description":"",
"defaultExtractor":{

"fieldName":"documentNumber"
},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{

},
"formattingType":"string",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
},
{
"name":"TaxId",
"description":"",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{

},
"formattingType":"string",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
],
"lineItemFields":[
{
"name":"Quantity",
"description":"",
"defaultExtractor":{
"fieldName":"quantity"
},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{

},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
},
{
"name":"netAmount",
"description":"",
"defaultExtractor":{

},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{

},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
},
{
"name":"UnitPrice",

"description":"",
"defaultExtractor":{
"fieldName":"unitPrice"
},
"setupType":"static",
"setupTypeVersion":"1.0.0",
"setup":{

},
"formattingType":"number",
"formatting":{

},
"formattingTypeVersion":"1.0.0"
}
]
}

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226] and Technical Constraints [page 275].

Response Example
201 “Success”

{
"message":"Schema fields have been uploaded successfully."
}
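The payload rules above (setup types auto and manual with setupTypeVersion 2.0.0, labels of at most 200 characters, and generative AI extraction when auto has no default extractor) can be captured in a small field builder. The sketch below mirrors the shape of the 2.0.0 request examples. The function names are hypothetical, and the fixed "setupType": "static", priority 1, and formattingTypeVersion "1.0.0" are simply copied from those examples.

```python
from typing import Optional

SETUP_TYPES_V2 = {"auto", "manual"}  # setup types available with setupTypeVersion 2.0.0
MAX_LABEL_LENGTH = 200


def make_field(name: str, formatting_type: str, setup_type: str,
               label: Optional[str] = None, default_extractor: Optional[str] = None,
               description: str = "", formatting: Optional[dict] = None) -> dict:
    """Build one headerFields/lineItemFields entry in the 2.0.0 shape."""
    if setup_type not in SETUP_TYPES_V2:
        raise ValueError(f"setup type must be one of {sorted(SETUP_TYPES_V2)}")
    field = {
        "name": name,
        "description": description,
        "defaultExtractor": {"fieldName": default_extractor} if default_extractor else {},
        "setupType": "static",
        "setupTypeVersion": "2.0.0",
        "setup": {"type": setup_type, "priority": 1},
        "formattingType": formatting_type,
        "formatting": formatting or {},
        "formattingTypeVersion": "1.0.0",
    }
    if label is not None:
        if len(label) > MAX_LABEL_LENGTH:
            raise ValueError(f"label exceeds {MAX_LABEL_LENGTH} characters")
        field["label"] = label
    return field


def uses_generative_ai(field: dict) -> bool:
    """auto without a default extractor means generative AI (premium edition only)."""
    return field["setup"]["type"] == "auto" and not field["defaultExtractor"]


payload = {
    "headerFields": [make_field("documentDate", "date", "auto", label="Document Date",
                                formatting={"dateformat": "dd/mm/yy"})],
    "lineItemFields": [make_field("netAmount", "number", "auto", label="Net Amount",
                                  default_extractor="netAmount")],
}
```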

12.1.8.11 Get Schema Versions

Retrieve all versions for a schema.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Response

Response Fields

JSON Field Description

created Time when the schema was created

documentType Type of the document submitted

documentTypeDescription Description of the document submitted

id ID of the schema

name Name of the schema

predefined True for standard documents, false for custom documents

schemaDescription Description of the schema

state State of the schema

updated Time when the schema was updated

version Version of the schema

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"schemas":[
[
{
"name":"Basic Involve FormatSchema",
"schemaDescription":"SAP Invoice Schema",
"documentType":"Invoice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"version":"1",
"predefined":"True",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"draft"
},

{
"name":"Basic Involve FormatSchema",
"schemaDescription":"SAP Invoice Schema",
"documentType":"Invoice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"version":"2",
"predefined":"True",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"draft"
}
]
]
}
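Versions are returned as strings ("1", "2", ...), so picking the newest one with plain string comparison would rank "9" above "10". The sketch below compares them numerically and can optionally restrict the search to one state such as "active". Like the Get Schema sketch, it accepts both the nested "schemas": [[...]] shape of the example and a flat list. The function name is hypothetical.

```python
from typing import Optional


def latest_version(response: dict, state: Optional[str] = None) -> Optional[str]:
    """Return the highest version, optionally only among versions in a given state."""
    versions = []
    for entry in response.get("schemas", []):
        for item in (entry if isinstance(entry, list) else [entry]):
            if state is None or item.get("state") == state:
                versions.append(item["version"])
    # Compare numerically so that "10" ranks above "9".
    return max(versions, key=int) if versions else None
```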

12.1.8.12 Get Schema Version Details

Retrieve version details of a schema for a client.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions/<version>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

version Yes String The version returned by the endpoint Create Schema Version [page 196]. Example: 2

Response

Response Fields

JSON Field Description

created Time when the schema was created

documentType Type of the document submitted

documentTypeDescription Description of the document submitted

id ID of the schema

name Name of the schema

predefined True for standard documents, false for custom documents

schemaDescription Description of the schema

state State of the schema

updated Time when the schema was updated

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
"schemas":[
[
{
"name":"Basic Involve FormatSchema",
"schemaDescription":"SAP Invoice Schema",
"documentType":"Invoice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a175",
"predefined":"True",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"draft"
},
{
"name":"Daimier Payment Advice Schema",
"schemaDescription":"Payment Advice Schema",
"documentType":"Payment Advice",
"documentTypeDescription":"Payment Advice with Order Number",
"id":"484b6e1c-501c-4a07-85cb-84554656a189",
"predefined":"False",
"created":"2020-03-26T17:00:00.000000+00:00",
"updated":"2020-04-26T17:00:00.000000+00:00",
"state":"active"
}
]
]
}

12.1.8.13 Delete Schema Versions

Delete versions associated with a schema.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /schemas/<schemaId>/versions

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client used when creating the
schema. Example: c_00

payload Yes JSON Object List of the schema versions you want to delete, passed in the version array. If the payload is empty, the schema and all its versions are deleted. You can't delete version "1".

schemaId Yes String The ID returned by the endpoint Create Schema [page 185]. Example: 4476cc01-72f3-4b64-9eb0-cdd9c1cb27ff

Request Example

{
"version":[
"5"
]
}

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
201 “Success”

{
"message":"Schema versions deleted successfully."
}
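Two rules above are easy to trip over: version "1" can't be deleted, and an empty payload deletes the whole schema. The sketch below builds the payload and requires an explicit flag before producing an empty one. Whether "empty" means {} or {"version": []} isn't stated here; the sketch sends {} as an assumption. The function and flag names are hypothetical.

```python
def delete_versions_payload(versions: list, *, delete_entire_schema: bool = False) -> dict:
    """Build the Delete Schema Versions payload, guarding the documented edge cases."""
    versions = [str(v) for v in versions]
    if not versions:
        if not delete_entire_schema:
            raise ValueError("an empty payload deletes the schema and all its versions; "
                             "pass delete_entire_schema=True to confirm")
        return {}  # assumption: {} counts as an empty payload
    if "1" in versions:
        raise ValueError('version "1" cannot be deleted')
    return {"version": versions}
```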

12.1.9 Template API

Create, reuse, edit, and delete templates based on schemas and document types. You can select templates
together with a corresponding schema to extract information from business documents of the appropriate
type and structure. The Template API provides endpoints to create, update, list, import, export, activate,
deactivate, and delete templates. You can also associate documents with a template and dissociate documents
from a template using the Template API endpoints.

The Template API consists of the following endpoints:

• Create or Update Template [page 211]
• Get Template [page 213]
• Import Template [page 215]
• Get Template Details [page 216]
• Delete Template [page 218]
• Activate Template [page 219]
• Deactivate Template [page 220]
• Associate Document with Template [page 221]
• Dissociate Document from Template [page 222]
• Export Template [page 223]
• Create Template Metadata [page 224]
• Get Template Metadata [page 225]

12.1.9.1 Create or Update Template

Create or update a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

payload Yes JSON Object List containing id, name, description, clientId, schemaId, and schemaVersion.

Note
If id is not provided, a template ID is generated and returned.

If id is provided but doesn't exist in the system, a new template is created with the provided ID.

If id is provided and already exists in the system, the template with that ID is updated.

Request Example

{
"id":"37c8a59b-b210-48c1-9002-19ec989066eb",
"name":"Test_Template",
"description":"Test description",
"clientId":"c_00",
"schemaId":"37c8a59b-b210-48c1-9002-19ec989066eb",
"schemaVersion":"1"
}

Response

Response Fields

JSON Field Description

id Template ID

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226] and Technical Constraints [page 275].

Response Example

201 “Created”

{
"id":"31516520-b4c9-40a6-b9ba-94d1800d472d"
}
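The three id rules in the note above amount to an upsert. The in-memory model below reproduces those rules so client code can be tested without calling the service. It isn't the service's implementation. In particular, whether an update replaces the stored record or merges into it isn't stated, and this model replaces it. The class name is hypothetical.

```python
import uuid


class TemplateStore:
    """In-memory model of the documented create-or-update rules for templates."""

    def __init__(self) -> None:
        self._templates: dict = {}

    def create_or_update(self, payload: dict) -> tuple[str, str]:
        """Return (template ID, "created" or "updated")."""
        template_id = payload.get("id") or str(uuid.uuid4())  # no id: generate one
        action = "updated" if template_id in self._templates else "created"
        self._templates[template_id] = dict(payload, id=template_id)  # replace, not merge
        return template_id, action
```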

12.1.9.2 Get Template

Get templates for a client ID.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

includeHeader No Boolean Whether the result includes header fields

includeLineItems No Boolean Whether the result includes line item fields

limit No Integer Maximum number of records to be returned. All records are returned if limit = 0. Example: 10

offset No Integer Index of the first record to be returned. Example: 0

order No String Order the results. Example: name asc

Response

Response Fields

JSON Field Description

clientId ID of the client this template was created for

creationDate Date when you created this template

documentAssociations ID of the documents associated with this template

description Template description

documentType Type of the document this template was created for

extraction Dictionary containing all the extracted header fields and line items

headerFields Dictionary containing all extracted header fields


id Template ID

isActive true if the template has been activated; false if it has not been activated or has been deactivated

language Template language

lastUpdatedDate Date when you last updated this template

lineItemFields Dictionary containing all extracted line items

name Template name

results List containing information on all templates for the clientId

schemaId Schema ID

status Template status. Possible values: “NO_SAMPLES”, “NO_ANNOTATIONS”, or “READY”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example

200 “Success”

{
"results":[
{
"id":"5fb6279a-1bb9-4e37-b3bc-95ffb0e3d220",
"schemaId":"3e048fac-7799-45dc-a360-ff921d8ef152",
"name":"Test Template",
"description":"Test Description",
"language":"en",
"documentType":"invoice",
"clientId":"c_00",
"status":"NO_SAMPLES",
"isActive":true,
"creationDate":"2023-11-14T07:39:23.536547+00:00",
"lastUpdatedDate":"2023-11-14T07:39:23.536547+00:00",
"documentAssociations":[
{
"id":"sample_id"
}
],
"extraction":{
"headerFields":[
{
"name":"documentNumber",
"label":"Document Number:",
"type":"number"
}
]
}
},
{
"id":"1213723c-bdff-4b2a-b821-93f051966b0c",
"schemaId":"0f68b9c8-1e10-467d-a01a-23ffae9b5e4e",
"name":"Test Template 2",
"description":"Test Description 2",
"language":"en",
"documentType":"invoice",
"clientId":"c_00",

"status":"NO_SAMPLES",
"isActive":false,
"creationDate":"2023-11-14T07:39:23.536547+00:00",
"documentAssociations":[
{
"id":"sample_id"
}
],
"extraction":{
"headerFields":[
{
"name":"documentNumber",
"type":"number"
}
]
}
}
]
}
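To list many templates in pages rather than with limit = 0, a client can advance offset by limit until a page comes back short. The sketch below does that with a caller-supplied `fetch_page(limit, offset)` callable (hypothetical) that returns the parsed Get Template response. Stopping on a short page is our heuristic, since the response shown carries no total count.

```python
from typing import Callable, Iterator


def iter_templates(fetch_page: Callable[[int, int], dict],
                   page_size: int = 10) -> Iterator[dict]:
    """Yield templates page by page using limit/offset."""
    if page_size <= 0:
        raise ValueError("page_size must be positive (limit = 0 returns everything at once)")
    offset = 0
    while True:
        results = fetch_page(page_size, offset).get("results", [])
        yield from results
        if len(results) < page_size:  # short or empty page: nothing left
            return
        offset += page_size
```

Filtering on the returned isActive field, for example, then becomes a plain generator expression over `iter_templates(...)`.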

12.1.9.3 Import Template

Create or update a template by importing a template file.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/import

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

file Yes File The template file you want to import.

Response

The response is given as a status (201, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
201 “Created”
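Because the request carries a file, a multipart/form-data body is the natural encoding. The sketch below builds one with the standard library only. Sending clientId as a form field next to the file is an assumption (it might instead be a query parameter). So is the generic application/octet-stream content type, because the template file format isn't described in this section. The filename isn't escaped, so it must not contain double quotes.

```python
import uuid


def build_multipart(client_id: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart template upload."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="clientId"',
        b"",
        client_id.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",  # yields the trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The body and header value could then be passed to `urllib.request.Request(..., data=body, headers={"Content-Type": ctype}, method="POST")`.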

12.1.9.4 Get Template Details

Get template details for a template ID. You can only get template details that belong to the same zone_id and
client_id.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

clientId ID of the client this template was created for

creationDate Date when you created this template

documentAssociations IDs of the documents associated with this template

description Template description

documentType Type of the document this template was created for

extraction Dictionary containing all the extracted header fields and line items

headerFields Dictionary containing all extracted header fields

id Template ID

isActive Set to true if the template has been activated. Set to false if the template has not been activated or
has been deactivated.

language Template language

lastUpdatedDate Date when you last updated this template


lineItemFields Dictionary containing all extracted line items

name Template name

schemaId Schema ID

schemaName Schema Name

status Template status. Possible values: “NO_SAMPLES”, “NO_ANNOTATIONS”, or “READY”

The response is given as a status (200, 400, 401, or 500) and JSON file. See Common Status and Error Codes
[page 226].

Response Example
200 “Success”

{
"id":"37c8a59b-b210-48c1-9002-19ec989066eb",
"schemaId":"608aa59c-4895-4308-bcae-905f8f343acc",
"name":"Test Template",
"description":"Test Template Description",
"language":"en",
"documentType":"invoice",
"clientId":"c_00",
"status":"NO_SAMPLES",
"isActive":true,
"creationDate":"2023-11-14",
"lastUpdatedDate":"2023-11-14T07:39:23.536547+00:00",
"schemaName":"SAP_Schema",
"documentAssociations":[
{
"id":"f58f7e0b-a1a8-449c-aa4b-6c71e256cd3e"
}
],
"extraction":{
"headerFields":[
{
"name":"string",
"label":"string",
"type":"string"
}
],
"lineItemFields":[
{
"name":"string",
"label":"string",
"type":"string"
}
]
}
}
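Get Template Details and Delete Template share the path /templates/<template_id> and differ only in the HTTP method. The following minimal Python sketch builds either request without sending it. It assumes clientId is sent as a query parameter. BASE_URL and template_request are hypothetical names.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dox.example.invalid"  # hypothetical; use the url value from your service key
API_PREFIX = "/document-information-extraction/v1"


def template_request(template_id: str, client_id: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for /templates/<template_id> without sending it.

    Assumption: clientId is passed as a query parameter.
    """
    path = f"{API_PREFIX}/templates/{urllib.parse.quote(template_id, safe='')}"
    query = urllib.parse.urlencode({"clientId": client_id})
    return urllib.request.Request(
        f"{BASE_URL}{path}?{query}",
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


details = template_request("4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd", "c_00", "<access-token>")
print(details.get_method(), details.full_url)
```

Passing method="DELETE" produces the Delete Template request for the same template.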

12.1.9.5 Delete Template

Delete the template with the specified template ID, together with its links to the associated documents.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example
200 “Success”

{
"message":"Successfully deleted 1 template."
}

12.1.9.6 Activate Template

Activate a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/activate

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example
200 “Success”

{
"message":"Successfully activated the template"
}

12.1.9.7 Deactivate Template

Deactivate a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/deactivate

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example
200 “Success”

{
"message":"Successfully deactivated the template"
}
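Activate Template and Deactivate Template differ only in the last path segment. The following Python sketch therefore uses one helper with a boolean flag. As before, passing clientId as a query parameter is an assumption, and BASE_URL and template_activation_request are hypothetical names.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dox.example.invalid"  # hypothetical; use the url value from your service key
API_PREFIX = "/document-information-extraction/v1"


def template_activation_request(template_id: str, client_id: str, token: str, active: bool) -> urllib.request.Request:
    """Build a POST to /templates/<template_id>/activate or .../deactivate.

    Assumption: clientId is passed as a query parameter.
    """
    action = "activate" if active else "deactivate"
    query = urllib.parse.urlencode({"clientId": client_id})
    return urllib.request.Request(
        f"{BASE_URL}{API_PREFIX}/templates/{urllib.parse.quote(template_id, safe='')}/{action}?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
```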

12.1.9.8 Associate Document with Template

Associate a document with a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/documents/<document_id>

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

document_id Yes String Document ID. Example: 5146ce05-7tf3-4g64-9eb0-vaa9f9cb22af

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

validateDocumentStatus No Boolean Set to false to skip document status validation when associating the document
with the template. The default value is true.

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example

200 “Success”

{
"message":"Successfully added document to the template."
}

12.1.9.9 Dissociate Document from Template

Dissociate a document from a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/documents/<document_id>

HTTP Method: DELETE

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

document_id Yes String Document ID. Example: 5146ce05-7tf3-4g64-9eb0-vaa9f9cb22af

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

message Status message with information about the request

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example

200 “Success”

{
"message":"Successfully removed document from the template."
}
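Associate Document with Template and Dissociate Document from Template use the same path and differ only in the HTTP method. The following Python sketch builds both requests. It assumes that clientId and validateDocumentStatus are query parameters. BASE_URL and document_association_request are hypothetical names. Python's urlencode would render the boolean False as "False", so the sketch sends the lowercase literal false explicitly. That the service expects this spelling is also an assumption.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dox.example.invalid"  # hypothetical; use the url value from your service key
API_PREFIX = "/document-information-extraction/v1"


def document_association_request(
    template_id: str,
    document_id: str,
    client_id: str,
    token: str,
    associate: bool = True,
    validate_document_status: bool = True,
) -> urllib.request.Request:
    """POST associates the document with the template; DELETE dissociates it.

    Assumption: clientId and validateDocumentStatus are query parameters.
    """
    params = {"clientId": client_id}
    if associate and not validate_document_status:
        # urlencode would turn the Python value False into "False";
        # send the lowercase literal instead (the default, true, is omitted).
        params["validateDocumentStatus"] = "false"
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{BASE_URL}{API_PREFIX}/templates/{template_id}/documents/{document_id}?{query}",
        method="POST" if associate else "DELETE",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
```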

12.1.9.10 Export Template

Export a template.

Note

You can download malware-scanned documents only. You can't download documents that are part of the
template export package but haven't been malware-scanned during upload.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/export

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

The response is given as a status (200, 400, 401, 410, or 500). See Common Status and Error Codes [page 226].

Response Example
200 “Success”
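The export is a file download, so it is best written to disk in chunks rather than read into memory. The following Python sketch does that with the standard library. It assumes that clientId is a query parameter. BASE_URL, export_request, and download_export are hypothetical names. The Response section lists 410 among the possible statuses without describing its cause. The sketch reports 410 separately and does not guess at the reason.

```python
import shutil
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://dox.example.invalid"  # hypothetical; use the url value from your service key
API_PREFIX = "/document-information-extraction/v1"


def export_request(template_id: str, client_id: str, token: str) -> urllib.request.Request:
    """Build a GET for /templates/<template_id>/export (clientId as query parameter is assumed)."""
    query = urllib.parse.urlencode({"clientId": client_id})
    return urllib.request.Request(
        f"{BASE_URL}{API_PREFIX}/templates/{template_id}/export?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def download_export(request: urllib.request.Request, destination_path: str) -> None:
    """Stream the export response to a local file without loading it fully into memory."""
    try:
        # urlopen runs first, so no empty file is created if the request fails
        with urllib.request.urlopen(request) as response, open(destination_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        if err.code == 410:
            raise RuntimeError(f"Template export returned HTTP 410: {err.reason}") from err
        raise
```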

12.1.9.11 Create Template Metadata

Set certain fields of a template to be fixed-value fields.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/metadata

HTTP Method: POST

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

payload Yes JSON Object List containing all fixed-value fields of a template

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

id Metadata ID

The response is given as a status (201, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example

201 “Created”

{
"id":"b6e6ddaf-ceb0-4245-ab07-6ced50b18807"
}

12.1.9.12 Get Template Metadata

Get all fixed-value fields of a template.

Request

Base URL: url value from outside the uaa section of the service key

URL Path Extension: /document-information-extraction/v1

URL Endpoint Path: /templates/<template_id>/metadata

HTTP Method: GET

Request Parameters

Parameter Required Data Type Description

clientId Yes String The ID of the client. Example: c_00

template_id Yes String Template ID. Example: 4176cc01-71f3-4b64-9eb0-cdd9c0cb27fd

Response

Response Fields

JSON Field Description

metadata Dictionary containing all fixed-value fields of a template

name Metadata name

value Metadata value

The response is given as a status (200, 400, 401, or 500). See Common Status and Error Codes [page 226].

Response Example

200 “Success”

{
"metadata":[
{
"name":"name",
"value":"value"
}
]
}
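The metadata response is a list of name/value pairs. A dictionary is often easier to work with. The following minimal Python sketch converts in both directions; the function names are hypothetical. Using the same name/value shape for the payload of Create Template Metadata is an assumption: the parameter table only describes that payload as a list of fixed-value fields.

```python
def metadata_to_dict(response: dict) -> dict:
    """Turn {"metadata": [{"name": ..., "value": ...}]} into {name: value}."""
    return {item["name"]: item["value"] for item in response.get("metadata", [])}


def dict_to_metadata(fields: dict) -> list[dict]:
    """Turn {name: value} into [{"name": ..., "value": ...}].

    Assumption: the Create Template Metadata payload uses the same
    name/value shape as the Get Template Metadata response.
    """
    return [{"name": name, "value": value} for name, value in fields.items()]


response = {"metadata": [{"name": "name", "value": "value"}]}
print(metadata_to_dict(response))  # {'name': 'value'}
```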

12.1.10 Common Request Headers

Header Required Description

Authorization Yes Access token used to access the service.

Content-Type Yes Indicates the media type of the request body consumed by this service. Set the Content-Type
header to application/json.

Accept Yes Indicates the media type that the client accepts for the response body. Set the Accept header
to application/json.

tenantName Yes Specifies the tenant name used to access this service.
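The following minimal Python sketch collects these headers in one place. It assumes the access token is an OAuth 2.0 bearer token obtained with the credentials from the uaa section of the service key. The table itself only says "access token". The function name common_headers is hypothetical.

```python
def common_headers(access_token: str, tenant_name: str) -> dict[str, str]:
    """Return the headers listed in Common Request Headers.

    Assumption: the access token is an OAuth 2.0 bearer token obtained with
    the credentials from the uaa section of the service key.
    """
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
        "tenantName": tenant_name,
    }
```

File uploads, such as Import Template, presumably need multipart/form-data as the Content-Type instead. Check the API reference for those endpoints.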

12.1.11 Common Status and Error Codes

Code Reason

200 The request was successful.

201 The request was successful, for example, client creation, document upload, or deletion of enrichment
data or of an uploaded document.

400 Bad request. The Document Information Extraction process could not be submitted or completed,
possibly due to a parameter error.

401 Unauthorized, for example, because the token is missing or invalid.

413 The request you are making is too large. Either you are sending a file that is too large or you are
trying to process too many objects in a single request. See Technical Constraints [page 275].

415 Unsupported document file format. See Supported Document Types and File Formats [page 84].


422 Unprocessable entity. Your request payload references a clientId, senderName, or documentNumber that
does not exist. For example, you get this error if you try to create a document for a client that does
not exist. You may also get this error if the document you upload cannot be parsed.

429 Application quota limit exceeded

500 Internal server error. The Document Information Extraction process could not be submitted or
completed, possibly due to an internal error.

503 System is temporarily unavailable
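If you handle these codes programmatically, a small lookup like the following Python sketch can turn them into readable log messages. The reasons are condensed from the table. Treating 429 and 503 as worth retrying later is an assumption based on their descriptions, not documented retry guidance. All names are hypothetical.

```python
# Reasons condensed from the Common Status and Error Codes table.
STATUS_REASONS = {
    200: "The request was successful",
    201: "The request was successful (for example, client creation or document upload)",
    400: "Bad request, possibly due to a parameter error",
    401: "Unauthorized, for example, missing or invalid token",
    413: "Request too large",
    415: "Unsupported document file format",
    422: "Unprocessable entity",
    429: "Application quota limit exceeded",
    500: "Internal server error",
    503: "System is temporarily unavailable",
}

# Assumption: only these two codes describe conditions that may clear on their own.
RETRY_LATER = {429, 503}


def describe_status(code: int) -> str:
    """Return a one-line description of an HTTP status code returned by the service."""
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    reason = STATUS_REASONS.get(code, "not listed in Common Status and Error Codes")
    return f"{code} ({kind}): {reason}"


def may_retry_later(code: int) -> bool:
    return code in RETRY_LATER


print(describe_status(415))  # 415 (client error): Unsupported document file format
```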

12.2 Notifications

Discover the notifications functionality.

Use this functionality to get notifications about the status of your processed documents without having to
constantly poll the Document Information Extraction service. Document Information Extraction notifies an
endpoint through a callback URL that you specify in a destination named document-information-extraction-callback.
The notification callback request is sent only once document processing has either completed or failed.

Note

To enable the notifications functionality, set the configuration key activateDocumentNotifications to true
as described in Create Configuration [page 115].

Document Information Extraction sends only one notification per document and does not retry failed deliveries.

Restriction

The notifications functionality is available from 2020-05-18. Service instances created before this date
don't include this functionality. To use this functionality with an existing instance, subscribe to the
Document Information Extraction UI in SAP Business Technology Platform, as described in Subscribing to the
Document Information Extraction UI [page 234] (procedure steps 1 to 6).

Related Information

Enabling Destination Service for Notifications [page 228]


Creating Destination Configuration for Notifications [page 229]

Supported Authentication Methods [page 231]
Callback Request Examples [page 231]
Callback Response Status [page 233]

12.2.1 Enabling Destination Service for Notifications

Prerequisites

You have subscribed to the Document Information Extraction UI in SAP Business Technology Platform.

Tip

In Subscribing to the Document Information Extraction UI [page 234], observe the prerequisites and follow
procedure steps 1 to 4.

To use the notifications functionality, you must enable the Cloud Foundry Destination Service at subaccount
level in the Entitlements. Destinations then appears in the left navigation pane.

See Consuming the Destination Service.

12.2.2 Creating Destination Configuration for Notifications

Create a new destination configuration that includes the callback URL, the authentication credentials, and
the ProxyType.

Name the destination document-information-extraction-callback. You can have only one destination with this
name per subaccount. The callback URL in this destination configuration must point to an endpoint that is
reachable from the Internet.

See Create HTTP Destinations.

Example

NoAuthentication Destination Configuration:

Example

BasicAuthentication Destination Configuration:

Example

OAuth2 Client Credentials Destination Configuration:
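As a rough sketch of the properties such a destination contains, the following Python function assembles them as a dictionary for each supported authentication type. The property names are taken from the standard SAP BTP HTTP destination properties: Name, Type, URL, ProxyType, Authentication, User, Password, clientId, clientSecret, and tokenServiceURL. They are assumptions here, not taken from this section, so verify them against Create HTTP Destinations. The function name callback_destination is hypothetical.

```python
DESTINATION_NAME = "document-information-extraction-callback"

# Credential properties assumed per authentication type (see Supported Authentication Methods).
REQUIRED_PROPERTIES = {
    "NoAuthentication": (),
    "BasicAuthentication": ("User", "Password"),
    "OAuth2ClientCredentials": ("clientId", "clientSecret", "tokenServiceURL"),
}


def callback_destination(url: str, authentication: str = "NoAuthentication", **credentials: str) -> dict[str, str]:
    """Assemble destination properties for the notification callback (illustrative only)."""
    if authentication not in REQUIRED_PROPERTIES:
        raise ValueError(f"unsupported authentication type: {authentication}")
    missing = [key for key in REQUIRED_PROPERTIES[authentication] if key not in credentials]
    if missing:
        raise ValueError(f"missing properties for {authentication}: {missing}")
    destination = {
        "Name": DESTINATION_NAME,  # the service only calls a destination with exactly this name
        "Type": "HTTP",
        "URL": url,  # must be reachable from the Internet
        "ProxyType": "Internet",
        "Authentication": authentication,
    }
    destination.update({key: credentials[key] for key in REQUIRED_PROPERTIES[authentication]})
    return destination
```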

12.2.3 Supported Authentication Methods

The following authentication types are currently supported:

• NoAuthentication
• BasicAuthentication
• OAuth2 Client Credentials

See Create HTTP Destinations.

12.2.4 Callback Request Examples

The Document Information Extraction callback sends a POST request to the URL specified in the destination
configuration with the name document-information-extraction-callback.

Example

Payload

The payload is sent with the POST request to the callback URL specified in the customer's destination
configuration.

The payload includes the ID of the uploaded document and its status. These two fields are consistent with the
corresponding fields of the other Document Information Extraction APIs:

• The id field is a string containing the UUID of the document.
• The status field is a string containing the processing status, which is either “DONE” or “FAILED”.

This payload indicates that the document was processed successfully:

{
"id": "d7c08124-d852-408f-8d46-0466312f6007",
"status": "DONE"
}
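On the receiving side, validate the payload before acting on it. The following minimal Python sketch accepts only the two documented fields and the two documented status values. The function name parse_callback is hypothetical.

```python
import json
import uuid

VALID_STATUSES = ("DONE", "FAILED")


def parse_callback(body: bytes) -> tuple[str, str]:
    """Validate a notification payload and return (document_id, status).

    Raises ValueError (json.JSONDecodeError is a subclass) if the payload
    does not match the documented shape.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict) or not isinstance(payload.get("id"), str):
        raise ValueError("payload must be an object with a string id")
    document_id = str(uuid.UUID(payload["id"]))  # uuid.UUID raises ValueError for malformed IDs
    status = payload.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return document_id, status


print(parse_callback(b'{"id": "d7c08124-d852-408f-8d46-0466312f6007", "status": "DONE"}'))
# ('d7c08124-d852-408f-8d46-0466312f6007', 'DONE')
```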

Example

NoAuthentication

The following curl command represents the POST request without authentication to the customer's callback URL:

curl --location --request POST 'https://callback.url/notify-callback' \
--header 'Content-Type: application/json' \
--data-raw '{
"id": "d7c08124-d852-408f-8d46-0466312f6007",
"status": "DONE"
}'

Example

BasicAuthentication

The following curl command represents the POST request with basic authentication to the customer's callback URL:

curl --location --request POST 'https://callback.url/notify-callback' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic dGVzdC11c2VyOnRlc3QtcGFzc3dvcmQ=' \
--data-raw '{
"id": "d7c08124-d852-408f-8d46-0466312f6007",
"status": "FAILED"
}'
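If the destination uses BasicAuthentication, your endpoint receives the configured user and password in the Authorization header. The header in the example above decodes to test-user:test-password. The following minimal Python sketch shows the check on the receiving side. The function name check_basic_auth is hypothetical.

```python
import base64
import hmac
from typing import Optional


def check_basic_auth(header: Optional[str], user: str, password: str) -> bool:
    """Return True if an Authorization header carries the expected Basic credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True)
    except ValueError:  # binascii.Error (invalid Base64) is a subclass of ValueError
        return False
    expected = f"{user}:{password}".encode("utf-8")
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(decoded, expected)


# The header from the example above encodes "test-user:test-password".
print(check_basic_auth("Basic dGVzdC11c2VyOnRlc3QtcGFzc3dvcmQ=", "test-user", "test-password"))  # True
```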

Example

OAuth2 Client Credentials

The following curl command represents the POST request with OAuth2 client credentials to the customer's callback URL:

curl --location --request POST 'https://callback.url/notify-callback' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer
eyJhbGciOiJSUzI1NiIsImprdSI6Imh0dHBzOi8vc2FwLXByb3Zpc2lvbmluZy5hdXRoZW50aWNhdGlvb
i5zYXAuaGFuYS5vbmRlbWFuZC5jb20vdG9rZW5fa2V5cyIsImtpZCI6ImtleS1pZC0xIiwidHlwIjoiSl
dUIn0.eyJqdGkiOiIxNzBhYzY2jU0YmQwOTE0NDhkNjBhZDcyMDQzNyIsImV4dF9hdHRyIjp7ImVuaGFu
Y2VyIjoiWFNVQUEiLCJ6ZG4iOiJzYXAtcHJvdmlzaW9uaW5nI6InNiLXRlbmFudC1vbmJvYXJkaW5nIXQ
xMyIsInNjb3BlIjpbImRveC14c3VhYS1pbnQtdCFiOTM4MC5DYWxsYmFjayJdLCJjbGllbnRfaWQiOiJz
Yi10ZW5hbnQtb25ib2FyZGluZyF0MTMiLCJjaWQiOiJzYi10ZW5hbnQtb25ib2FyZGluZyF0MTMiLCJhe
nAiOiJzYi10ZW5hbnQtb25ib2FyZGluZyF0MTMiLCJncmFudF90eXBlIjoiY2xpZW50X2NyZWRlbnRpYW
xzIiwicmV2X3NpZyI6Ijc3MWQ1DDFmIiwiaWF0Njk4LCJleHAiOjE1ODUxNzQ4OTgsImlzcyI6Imh0dHA
6Ly9zYXAtcHJvdmlzaW9uaW5nLmxvY2FsaG9zdDo4MDgwL3VhYS9vYXV0aC90b2tlbiIsInppZCI6InNh
cC1wcm92aXNpb25pbmciLCJhdWQiOlsic2ItdGVuYW50LW9uYm9hcmRpbmchdDEzIiwiZG94LXhzdWFhL
WludC10IWI5MzgwIl19.ROCb2LQZOGTFE7ZKQVC8T-
kuvzb8DtMjetY8vqeJUt9GC1UA24siGkiagTGPYNzalvlBwLW2b1Thx7WA3OkVIMLiWwG_7AHm6ONjoUz
Ew8v35NMlHALrY97oRPgSZOSCWFzhzKnL6t1Y0G0m83ctQAJaml-wd5NdDSbHyoIkJ3i5qhXC-
rVaNsAnfX9eerJtjYwxvqvIYi9rEewTg-EcRBdWndvB962RFDGDZco_92ZNP4uYN238_0-
ylFKYFF8mdlSivwc8SNscXCojlCAgk_4kYqiM_3ai5FkuXwyZunoPtrNnr77yK5HUyuZUuYmhzy7F6GJI
59VCrPYnELJPiw' \
--data-raw '{
"id": "d7c08124-d852-408f-8d46-0466312f6007",
"status": "DONE"
}'

12.2.5 Callback Response Status

The callback response should have the status 200 “OK”, as shown in the curl response below. Any other status
code below 400 is also accepted.

Request

Callback request from the Document Information Extraction service:

curl --verbose --location --request POST 'https://callback.url/notify-callback' \
--header 'Authorization: Basic dGVzdC11c2VyOnRlc3QtcGFzc3dvcmQ=' \
--header 'Content-Type: application/json' \
--data-raw '{
"id": "d7c08124-d852-408f-8d46-0466312f6007",
"status": "DONE"
}'

Response

< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Date: Thu, 16 Apr 2020 06:55:41 GMT
<
{}

Note

The Document Information Extraction service ignores the body of the callback response and evaluates only the
response status.
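A callback endpoint therefore only needs to accept the POST and answer with a status below 400. The following Python sketch does this with the standard library's http.server and stores the payloads in a list. It is meant for local experiments, not production: there is no TLS, no authentication, and only a single process. CallbackHandler, start_in_background, and send_test_callback are hypothetical names.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[dict] = []  # payloads collected by this sketch


class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal receiver for POST notifications from Document Information Extraction."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)  # hand off to your own processing here
        body = b"{}"  # the service ignores the body; only the status code matters
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet; remove to get default request logging


def start_in_background(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the receiver in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_callback(url: str, payload: dict) -> int:
    """Imitate the service's POST locally and return the response status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxy settings
    with opener.open(request) as response:
        return response.status
```

In a real setup, the destination URL must be reachable from the Internet, so such a receiver would typically run behind an HTTPS endpoint. It would also need an authentication check that matches the destination's authentication type.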

13 Using the Document Information Extraction UI

Find out how to subscribe to, access, and use the Document Information Extraction UI.

Related Information

Subscribing to the Document Information Extraction UI [page 234]


Using the Key Features of the Document Information Extraction UI [page 237]
Best Practices [page 258]

13.1 Subscribing to the Document Information Extraction UI

To use the Document Information Extraction UI and other features, you need to subscribe to the service UI in
SAP Business Technology Platform (SAP BTP).

Prerequisites

• You have an SAP BTP global account and a Cloud Foundry subaccount.
• You’re a global account administrator.
• You’ve created a service instance for Document Information Extraction.
• You’ve created business users and user groups in your identity provider (IdP). SAP ID Service is the default
IdP, but you can also add your instance of the Identity Authentication service or a different IdP.

Note

If you use the Identity Authentication service, see Establish Trust and Federation Between UAA and
Identity Authentication.

If you use a different IdP, see Establish Trust and Federation with UAA Using Any SAML Identity
Provider.

Context

Tip

You can also use the Set up account for Document Information Extraction booster in the SAP BTP cockpit
to automate the process. In this case, you don’t need to perform the steps for subscribing to the Document
Information Extraction UI described here. See Boosters and the tutorial Use Free Tier to Set Up Account for
Document Information Extraction and Go to Application.

Note

You can create multiple service instances for Document Information Extraction. However, we recommend
creating only one, unless there’s a compelling reason for having more.

If you do use more than one instance, you can switch between instances by choosing Settings (cogwheels icon)
and then Change Instance on the Document Information Extraction UI. You can specify the instance by entering
its name or its ID.

To subscribe to the Document Information Extraction UI, do the following.

Procedure

1. Open the SAP BTP cockpit and go to your subaccount.


2. Click Service Marketplace under Services on the left navigation pane.
3. Search for Document Information Extraction and click the tile.

The Overview page appears.


4. Click Create.

The New Instance or Subscription dialog appears.

Remember

Before proceeding, check whether you’ve created an instance for Document Information Extraction. If
you haven’t, create the service instance before continuing with the following steps. Creating a service
instance is a prerequisite for using the Document Information Extraction UI.

5. Choose the default Subscription plan.


6. Click Create.
7. Click Users under Security on the left navigation pane.
8. Click the arrow under Actions in the row with your user.

The Overview page appears.


9. Click Assign Role Collection.
10. Select the role collection that you wish to assign. See Role Collections [page 236].
11. Click Assign Role Collection. For more information, see Assign Users to Role Collections.
12. Click Instances and Subscriptions on the left navigation pane.

13. Click the three dots at the right end of the row with the Document Information Extraction application and
select Go to Application from the dropdown.

The logon screen appears.


14. Enter your User and Password, previously created in your identity provider (IdP), to log on.

Note

You may not have to log on explicitly at this point if either of the following is true:
• You’ve configured your user to log in with a certificate.
• Your user already has an active session on your IdP.

The Document Information Extraction UI appears.

13.1.1 Role Collections

Find out about the role collections you can use with the Document Information Extraction UI.

Document Information Extraction provides default role collections that you can assign to users. These role
collections determine which actions a user can carry out on the Document Information Extraction UI.

The default role collections and associated actions are as follows:

Role Collection Actions

Document_Information_Extraction_UI_Templates_Admin Manage the template and schema lifecycle. View documents
and edit extraction results.

Document_Information_Extraction_UI_End_User View documents, edit extraction results, and work with schemas
and templates.

Document_Information_Extraction_UI_Document_Viewer View documents in the UI application.

The default role collections grant users the following read/write permissions:

Role Collection Permissions

Document_Information_Extraction_UI_Document_Viewer Read documents

Document_Information_Extraction_UI_End_User Read and write documents; read templates and schemas

Document_Information_Extraction_UI_Templates_Admin Read and write documents; read and write templates and
schemas

Remember

The role collection Document_Information_Extraction_UI_Admin_User has been deprecated.

Assign the Document_Information_Extraction_UI_Templates_Admin role collection to any administrators who
previously used the deprecated role collection to manage the template and schema lifecycle.

13.2 Using the Key Features of the Document Information Extraction UI

Find out how to use the Document Information Extraction UI features for documents, schemas, and templates.

Use the following features to handle a wide range of tasks:

• Document [page 239]


• Schema Configuration [page 245]
• Template [page 252]

Note

For recommendations on getting better extraction results, see Optical Character Recognition (OCR): Best
Practices [page 258].

For instructions on how to set the language of the Document Information Extraction UI, see Set Screen
Language [page 237].

For information about how to use the integrated digital assistant to find answers to support-related questions,
see Built-In Support [page 238].

13.2.1 Set Screen Language

Select the screen language for the Document Information Extraction UI.

Context

The Document Information Extraction UI is currently available in the following languages:

Language Language Code

German de

English en

Spanish es

French fr

Italian it

Japanese ja

Korean ko

Portuguese pt

Russian ru

Chinese Simplified zh_CN

Chinese Traditional zh_TW

Note

The SAP Companion in-app help is also available in the language you select for the UI. Display this help by
choosing the question mark icon at the top right of the screen.

To set the screen language, do the following:

Procedure

1. Open the dropdown for your user name at the top-right of the screen.
2. Select Languages.
3. Select your preferred language.
4. Complete your entries by choosing Apply.

13.2.2 Built-In Support

Use the integrated digital assistant on the Document Information Extraction UI to quickly find answers to your
support-related questions.

Context

The Document Information Extraction UI includes Built-In Support, an embedded digital assistant that allows
you to search for support-related information without leaving the UI.

Note

If you have an S-user ID and the associated authorizations, Built-In Support also allows you to report issues,
review cases, and chat with an expert or a chatbot.

Procedure

1. Choose Built-In Support (headset icon).

The Built-In Support initial screen appears. This screen gives you access to the basic support functions
that are available to all users. Here, you can enter keywords in the intelligent search field to find
relevant information in the documentation for the Document Information Extraction UI. You can also call up
recommended information about the service directly via the links provided.
2. Choose the Help Information (hint icon).

The Contextual Help screen appears. Here, you can access information, including tutorial videos, the
Built-In Support documentation, the privacy statement, and the terms of use.
3. Choose the person icon to view system context information.

If you have an S-user ID, you can sign in to access more Built-In Support functions. These functions allow
you to report issues by creating a case or by chatting with an expert. In addition, you can review your cases.

13.2.3 Document

Use this Document Information Extraction UI feature to upload documents to the service and get machine
learning predictions for the extracted header fields and line items.

Context

Use this feature to do the following:

• Add Document [page 240]


• View and Edit Extraction Results [page 242]
• Delete Documents [page 244]

For additional information on working with documents, see the best practices under Document: Best Practices
[page 270].

13.2.3.1 Add Document

Procedure

1. Open the Document Information Extraction UI, as described in Subscribing to the Document Information
Extraction UI [page 234].
2. Click the Document icon in the left navigation pane.
3. Click Upload a new document (add icon) at the top right of the screen.

The Select Document area appears. Here, you can upload a maximum of 50 files. Add files individually or
select a folder containing multiple files. Each file can have a maximum size of 50 MB and a maximum of 100
pages. The service supports the following document types: invoice, payment advice, purchase order, and
custom documents, in PDF, JPG, PNG, and TIFF format.
4. Select the document type.
5. Choose a schema and a template, making sure that both match the document type you selected in the
preceding step. You can also use the Detect automatically function to get the service to search for the
correct template. These entries are optional.

The Document Information Extraction UI includes preconfigured SAP schemas for the following standard
document types: purchase order, payment advice, and invoice. In addition, there’s an SAP schema for
custom documents (SAP_OCROnly_schema). Templates are available only if your administrator has
created and activated them.

Tip

For best extraction results, we strongly recommend using a schema whenever you upload documents.
For further details, see Schema Configuration: Best Practices [page 259].

Note

If you later want to create a template based on your document extraction results, you must choose a
schema here. See Create Template from Document Extraction Results [page 256].

Also, if you later want to add the document to a template, you must choose a schema here. Documents
and the templates they’re associated with must share the same schema. See Add Documents and
Activate/Deactivate Template [page 253].

6. Upload one or more document files by dragging and dropping them or by clicking the add icon.
7. Click Step 2.

The Select Header Fields area contains the header fields for extraction from the uploaded documents. If
you didn’t choose a schema in the Select Document step, you can select fields from the list. If you did
choose a schema, the fields are selected automatically and can’t be changed.
8. Click Step 3.

The Select Line Item Columns area shows the line items for extraction from the documents you uploaded.
Here, too, if you didn’t choose a schema in the Select Document step, you can select fields from the list. If
you did choose a schema, the fields are selected automatically and can’t be changed.

9. Click Review.
10. Review your selection. Click Edit if you want to change anything. If you chose a schema in the Select
Document step, you can’t edit the header fields and line item columns here. When you’ve completed your
entries, click Confirm.

You now see the documents you’ve uploaded, with Document Name, Upload Date, and Status. When the
selected header fields and line items have been extracted, the document status changes from “PENDING”
to “READY”. You can now review the extraction results and make any corrections required. If an error
occurs during document processing, the status changes from “PENDING” to “FAILED”. In this case, you
must upload the document again.
11. In the top right of the screen, you see the clientId (c_00, for example) of the listed uploaded documents.
Click Change Client and select another clientId (c_01, for example) to see the list of uploaded
documents that have a different clientId.

Before you can change clients, there must be at least one client in addition to Default. You can’t create
clients on the Document Information Extraction UI. To add new clients, use Swagger UI and follow the
steps in Create Client [page 107].

Note

You can restrict user access to specified clients using the clientSegregation configuration key. For
more details and guidance, see Create Configuration [page 115] and Client Segregation in Document
Information Extraction: A Brief Guide.
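The upload limits from step 3 can be checked locally before uploading: at most 50 files, at most 50 MB and 100 pages per file, and PDF, JPG, PNG, or TIFF format. The following Python sketch is a convenience only; the service remains the authority. It treats 50 MB as 50 MiB and accepts .jpeg and .tif as spellings of JPG and TIFF. Both are assumptions. It does not check the page count. The function name check_upload_batch is hypothetical.

```python
from pathlib import PurePath

MAX_FILES = 50
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB
# Assumption: .jpeg and .tif are accepted as spellings of JPG and TIFF.
ALLOWED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_upload_batch(files: list[tuple[str, int]]) -> list[str]:
    """Check (file name, size in bytes) pairs against the UI upload limits.

    The 100-page limit is not checked; that would require parsing each file.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files selected; at most {MAX_FILES} are allowed")
    for name, size in files:
        if PurePath(name).suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{name}: unsupported file format")
        elif size > MAX_BYTES:
            problems.append(f"{name}: larger than 50 MB")
    return problems


print(check_upload_batch([("invoice.pdf", 120_000), ("scan.bmp", 5_000)]))
# ['scan.bmp: unsupported file format']
```

For files on disk, you can obtain the sizes with os.path.getsize before calling the function.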

13.2.3.1.1 Download Troubleshooting Data

Find out how to download data needed to troubleshoot issues with adding documents to the Document
Information Extraction UI.

Context

For each document that you add to the Document Information Extraction UI, you can download a zip folder
with files for troubleshooting.

Procedure

1. Choose the Document icon in the navigation on the left of the screen.
2. Now, choose a document to display its details.

The details pane appears on the right of the screen.
3. Choose (Download Troubleshooting Data) after the document status at the top of the details pane.

The Document Information Extraction UI downloads a zip folder to your local machine. The files in the
folder include the document that you uploaded as well as details of the document, template, and schema.

You can either upload this data to an SAP support incident or use it to do your own troubleshooting.
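If you want to check what a troubleshooting download contains before you attach it to a support incident, you can list the files in the zip folder. The following Python sketch uses only the standard library. It makes no assumptions about the names of the files inside the folder, and the path troubleshooting_data.zip is a hypothetical example.

```python
import zipfile
from pathlib import Path


def list_troubleshooting_files(zip_path: str) -> list[str]:
    """Return the sorted names of all files (not directories) in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


if __name__ == "__main__":
    # Hypothetical location of a downloaded troubleshooting zip folder.
    path = Path("troubleshooting_data.zip")
    if path.exists():
        for name in list_troubleshooting_files(str(path)):
            print(name)
```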

13.2.3.2 View and Edit Extraction Results

Context

 Remember

Document Information Extraction generally provides extraction results within 1 hour for documents
uploaded to the service. The actual processing time can be much shorter.

Procedure

1. Click the Document icon in the left navigation pane.
2. Choose the row on the Documents screen with the document that you want to view. You now see the page
preview of the document file.

 Note

If your device has a small screen, and you have difficulty checking the fields in the page preview,
download the PDF document for full-screen display.

3. Click Extraction Results to see the results for header fields and line items. Colors indicate the
extraction confidence range of the machine learning model: red (confidence between 0% and 50%), yellow
(confidence between 51% and 79%), and green (confidence between 80% and 100%). To view the prediction
confidence score, field name, and description for each extracted header field and line item, hover over a
field name, for example Invoice Number.

Hovering over a field name also displays the raw value for that field – in other words, the value before
postprocessing. Raw values can differ from extraction results. For example, if the Delivery Date field of a
purchase order contains “ASAP”, Document Information Extraction can’t convert this text into a date and
therefore returns a null value. Viewing raw values enables you to identify the content of fields that couldn’t
be extracted.

 Tip

If the label property is defined for schema fields, user-friendly names for header fields and line items
are displayed in the extraction results. For further information, see Add Fields to Schema Version [page
199].

4. If corrections are required, and the document status is “READY”, you can edit the extraction results under
Header Fields and Line Items.

To download the unedited results, click (download icon) and choose csv, json, or txt.
5. Click Edit.

 Tip

To avoid losing your work if there’s an outage, activate Autosave. The service then saves your edits
automatically every 10 seconds.

You can edit extracted values manually on the right of the screen. You can also select them from the
page preview in the middle of the screen. To do the latter, hover your mouse over the page preview. The
mouse pointer changes to a crosshair cursor. Position the cursor at the corner of the value that you wish to
select. Then, hold down the left mouse button. Move the cursor diagonally to the opposite corner to draw a
bounding box around the value you want to select. Select the appropriate header or line item field from the
Field dropdown in the Assign Field dialog. Add or change the value, as necessary. If you choose a line item,
set the number in the Row Index field. Make sure the number that you enter here matches the appropriate
line item in the Label column on the right of the screen. Click Apply in the Assign Field dialog to confirm
your edits.

 Note

To prevent Document Information Extraction from extracting unwanted or irrelevant characters, you
can also draw bounding boxes around parts of the field values. In this case, you must edit the value so
that it includes only the values in the bounding box. If you associate documents edited in this way with
templates, the templates extract only those characters in the part of the field defined by the bounding
box. This approach can be useful if you want to exclude punctuation from the extraction, for example.

 Tip

If you’ve uploaded your documents using a schema but without a template, you can create a template
here using the extraction values you’ve edited.

For instructions on how to do so, see Create Template from Document Extraction Results [page 256].

Note that this option is no longer available after you confirm the document.

Alternatively, you can associate the document with an existing template by choosing Add to Template.

 Remember

If you associate a document with a template and then use that template to extract information from
the same document, the extraction values can differ from the ones you entered and confirmed during
editing.

The technical reason for differences of this kind is that the Document Information Extraction UI
extracts data based on heuristics and not on exact matching of bounding boxes.

6. Delete any bounding boxes that you don’t need. In Edit mode, hover over the tooltip for the relevant
bounding box in the page preview. Double-click the tooltip to display the Assign Field dialog and then
choose Delete to remove the bounding box and its coordinates.
7. Save your changes.

To download your edited results, click (download icon) and choose csv, json, or txt.
8. You can also confirm the document here. To do so, choose Edit again and then choose Confirm. When you
confirm documents, the prediction confidence score of all header and line item fields is set to 1.0 (100%).

 Caution

Do not confirm documents that haven’t been reviewed and may have incorrect extraction results.
Once the document status changes from “READY” to “CONFIRMED”, you can no longer change the
extraction results.

For additional considerations when you confirm documents, see Confirm Documents [page 244].
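The following Python sketch models how the color bands on the UI and document confirmation relate to the prediction confidence score. It’s illustrative only: the ExtractedField class is hypothetical, scores are assumed to be fractions between 0.0 and 1.0, and the published bands are assumed to be contiguous (up to 50% red, above 50% and below 80% yellow, 80% and above green).

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    # Hypothetical representation of one extracted header field or line item.
    name: str
    value: object
    confidence: float  # 0.0 to 1.0


def confidence_color(confidence: float) -> str:
    """Map a confidence score to a color band.

    Assumption: the bands 0-50%, 51-79%, and 80-100% are treated as
    contiguous, so anything above 0.50 and below 0.80 is yellow.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.80:
        return "green"
    if confidence > 0.50:
        return "yellow"
    return "red"


def confirm(fields: list[ExtractedField]) -> list[ExtractedField]:
    """Model confirmation: every field's confidence becomes 1.0 (100%)."""
    return [ExtractedField(f.name, f.value, 1.0) for f in fields]
```

After confirmation, every field therefore falls into the green band.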

13.2.3.2.1 Confirm Documents

There are a few points to bear in mind when you confirm documents.

• SAP reserves the right to use confirmed documents in the reporting of accuracy values and for analytics.
• By default, Document Information Extraction doesn’t use your documents to retrain the service’s
machine learning models. To allow SAP to use your documents for this purpose, set the
dataFeedbackCollection configuration key at API level to true. A checkbox appears on the UI
requesting your consent each time you confirm documents.
• If you allow SAP to use your documents for retraining, Document Information Extraction automatically
checks them for any Personally Identifiable Information (PII). If a document contains PII data, it isn’t used
for retraining. You can deactivate these checks by setting the performPIICheck subconfiguration at API
level to false.

For further details of API-level settings, see Create Configuration [page 115] and Configuration Keys [page 117].
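The rules for retraining described above can be summarized as a single decision. This Python sketch is illustrative only: its parameter names mirror the configuration keys dataFeedbackCollection and performPIICheck but aren’t part of any API, and it assumes that the consent checkbox must be selected for a document to be used.

```python
def used_for_retraining(
    data_feedback_collection: bool = False,  # mirrors dataFeedbackCollection (default: off)
    user_consented: bool = False,            # consent checkbox shown on confirmation
    perform_pii_check: bool = True,          # mirrors performPIICheck (default: on)
    contains_pii: bool = False,
) -> bool:
    """Decide whether a confirmed document could be used for retraining."""
    # Retraining is opt-in: the key must be true and consent given (assumption).
    if not (data_feedback_collection and user_consented):
        return False
    # With the PII check active, documents containing PII are excluded.
    if perform_pii_check and contains_pii:
        return False
    return True
```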

13.2.3.3 Delete Documents

Procedure

1. In the left navigation pane, click the Document icon.
2. On the Documents overview screen, select the documents you want to delete by choosing the relevant
checkboxes.

To select all the documents in the list, choose the checkbox above the table.
3. Click Delete and then click OK to delete the documents you selected. These documents are then removed
from the Documents list.

You can also delete individual documents by choosing Delete on the document detail screen.

 Remember

You can’t delete documents that are associated with templates. In such cases, you must first navigate
to the Template overview screen and dissociate the document from the template. For further details,
see Add Documents and Activate/Deactivate Template [page 253].

13.2.4 Schema Configuration

Use this Document Information Extraction UI feature to create schemas containing data fields found in
standard or custom document types. As an administrator, you can use these schemas as a basis for creating
templates. End users can select schemas and corresponding templates when adding documents.

Context

 Note

This feature is available only to users with the administrator role (role collection
Document_Information_Extraction_UI_Templates_Admin).

For additional information on using schemas, see the best practices under Schema Configuration: Best
Practices [page 259].

A schema contains a list of header fields and line item fields representing the target information you want to
extract from a particular type of document.

 Tip

The Document Information Extraction UI provides preconfigured SAP schemas for the following standard
document types: purchase order, payment advice, and invoice. You can use these schemas unchanged to
upload documents.

You can’t edit original SAP schemas. Always create a copy and then change the default fields, as required.

 Note

To extract text from images captured by camera, create a schema for a custom document type and use the
OCR engine type Scene Text.

Extraction results for scene text appear in the API, not on the Document Information Extraction UI.

For details of extracted header fields and line items, see the following sections of the Document Information
Extraction documentation:

• Extracted Header Fields [page 278]
• Extracted Line Items [page 286]

For information about limitations on extraction from tables, see Technical Constraints [page 275].

Use this feature to do the following:

• Create Schema [page 246]
• Edit Schema [page 246]
• Create Copy of Schema [page 247]
• Add Data Fields [page 247]
• Activate/Deactivate Schema [page 251]
• Delete Schema [page 251]

13.2.4.1 Create Schema

Procedure

1. Open the Document Information Extraction UI, as described in Subscribing to the Document Information
Extraction UI [page 234].
2. In the left navigation pane, choose Schema Configuration.
3. In the top right of the screen, click Create.
4. Enter a name and optionally a description for the new schema.
5. Select the appropriate type of document.

If you select Custom here, you must also select an OCR engine type. To extract text from images, select
Scene Text; otherwise, select Document.

 Remember

Extraction results for scene text recognition appear in the API, not on the Document Information
Extraction UI.

6. Choose Create.
7. Choose the row containing your new schema to display the details pane. Here, you can add data fields and
also edit, copy, activate/deactivate, or delete the schema, as described in the following sections.

 Restriction

You can’t add data fields to schemas created with document type Custom and OCR engine type Scene
Text.

In schemas created using document type Custom and OCR engine type Document, you can add data
fields. In this case, no default extractors are available.
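The relationship between document type, OCR engine type, and data fields can be sketched as a small lookup. This Python function is hypothetical and only restates the rules in this section; document type names are written in lowercase for simplicity.

```python
from typing import Optional

STANDARD_TYPES = {"invoice", "payment advice", "purchase order"}
OCR_ENGINES = {"Document", "Scene Text"}


def schema_capabilities(document_type: str, ocr_engine: Optional[str] = None) -> dict:
    """Summarize what a new schema allows, per the rules in this section."""
    if document_type in STANDARD_TYPES:
        return {"can_add_fields": True, "default_extractors": True}
    if document_type != "custom":
        raise ValueError(f"unknown document type: {document_type}")
    # Custom schemas also need an OCR engine type.
    if ocr_engine not in OCR_ENGINES:
        raise ValueError("custom schemas need OCR engine 'Document' or 'Scene Text'")
    if ocr_engine == "Scene Text":
        # Results appear only in the API; no data fields can be added.
        return {"can_add_fields": False, "default_extractors": False}
    return {"can_add_fields": True, "default_extractors": False}
```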

13.2.4.2 Edit Schema

Procedure

1. In the left navigation pane, choose Schema Configuration.
2. On the Configurations screen, choose the row containing the schema you want to edit. You now see the
schema details.
3. To change the schema, click the Edit button.
4. In the Edit Schema dialog, you can change the name of your schema and add, remove, or edit the
description.

 Restriction

If a schema is currently active, deactivate it before editing. When you deactivate a schema, its status on
the Configurations screen changes to “INACTIVE”.

You can’t deactivate schemas that provide the basis for templates. Otherwise, any changes to the
schema would affect the field definitions for the relevant templates.

Before deactivating a schema of this kind, first deactivate all templates based on it and then delete
them.

Once you’ve completed your changes, activate the schema again.

13.2.4.3 Create Copy of Schema

Use this feature to copy SAP or custom schemas. SAP schemas support standard document types. You can
use these preconfigured schemas unchanged to add documents and create templates. You can also copy and
edit SAP schemas as a basis for configuring schemas of your own.

Procedure

1. In the left navigation pane, choose Schema Configuration.
2. Click (copy icon) in the row of the schema you want to copy on the Configurations screen.
3. In the Copy Schema dialog, the original schema name, followed by “_copy”, appears automatically. Edit the
name as required and add an optional description. Click the Copy button.

The copy you’ve created now appears in the Schemas list, with the status “INACTIVE”.
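Copying a schema can be pictured as follows. The Schema class and copy_schema function in this Python sketch are hypothetical; the sketch assumes that the copy keeps the original’s fields and counts as your own schema rather than an SAP schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Schema:
    name: str
    description: str = ""
    status: str = "INACTIVE"
    is_sap_schema: bool = False
    header_fields: tuple = ()
    line_item_fields: tuple = ()


def copy_schema(original: Schema, name: Optional[str] = None, description: str = "") -> Schema:
    """Model Copy Schema: default name '<original>_copy', status INACTIVE."""
    return replace(
        original,
        name=name or f"{original.name}_copy",
        description=description,
        status="INACTIVE",
        is_sap_schema=False,  # assumption: a copy is treated as your own schema
    )
```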

13.2.4.4 Add Data Fields

Procedure

1. In the left navigation pane, choose Schema Configuration.
2. On the Configurations screen, choose the row containing the schema you want to add data fields to.
You now see the schema details.
3. If the schema has the status “ACTIVE”, you must deactivate it before you can add data fields. In this case,
click Deactivate.

 Restriction

You can’t deactivate schemas that provide the basis for templates. Otherwise, any changes to the
schema would affect the field definitions for the relevant templates.

Before deactivating a schema of this kind, first deactivate all templates based on it and then delete
them.

4. To add a header field to the schema, click the Add button for Header Fields.
5. In the Add Data Field dialog, enter the name of the header field you want to extract, an optional field label,
and an optional description. Next, select the data type – either country/region, currency, discount, date,
number, or string.

 Tip

Use the Field Label option to define user-friendly names for header and line item fields. Any field labels
that you enter here replace the technical field names under Extraction Results in the Documents feature
of the Document Information Extraction UI.

 Remember

The data type country/region returns values as two-letter ISO 3166-1 alpha-2 codes. For example, DE for
Germany, FR for France, GB for United Kingdom, and US for United States.

6. In the Setup Type dropdown, use the prefilled value (auto or manual) or change it in line with your needs.

 Note

Which setup type you select here depends on a number of factors, including document type, preferred
extraction method, and which service plan you’re using.

For details of setup types and associated factors, see Setup Types [page 249].

7. Click Add.
On the Configurations panel on the left of the screen, the status of the schema changes to “DRAFT”.
8. If you want to edit the data field, click (edit icon) in the Action column for the field on the right of the
screen.
9. To add line item fields to the schema, click the Add button for Line Item Fields.
10. Enter the data for the new line item field in the same way as you did for the header field.
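A data field definition combines a technical name, an optional label, a data type, and a setup type. The following Python sketch validates such a definition against the values listed above and shows how a field label replaces the technical name for display. The DataField class and looks_like_alpha2 helper are hypothetical; looks_like_alpha2 checks only the format of a country/region value, not whether the code is actually assigned in ISO 3166-1.

```python
from dataclasses import dataclass
from typing import Optional

# Values offered in the Add Data Field dialog (hypothetical constants).
DATA_TYPES = {"country/region", "currency", "discount", "date", "number", "string"}
SETUP_TYPES = {"auto", "manual"}


@dataclass
class DataField:
    name: str                    # technical field name
    data_type: str
    setup_type: str
    label: Optional[str] = None  # optional user-friendly name
    description: str = ""

    def __post_init__(self) -> None:
        # Reject definitions the dialog wouldn't accept.
        if not self.name:
            raise ValueError("field name is required")
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type}")
        if self.setup_type not in SETUP_TYPES:
            raise ValueError(f"unsupported setup type: {self.setup_type}")

    def display_name(self) -> str:
        # A field label, if defined, replaces the technical name in the UI.
        return self.label or self.name


def looks_like_alpha2(value: str) -> bool:
    """Format check only: two uppercase ASCII letters, such as DE or US."""
    return len(value) == 2 and value.isascii() and value.isalpha() and value.isupper()
```

For example, DataField("deliveryDate", "date", "auto", label="Delivery Date").display_name() returns "Delivery Date".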

Related Information

Setup Types [page 249]

13.2.4.4.1 Setup Types

Learn about the setup types available when you add data fields to schemas. Find out how these setup types
relate to document types, extraction methods, and default extractors.

Available Setup Types

When you add data fields to a schema on the Document Information Extraction UI, you can select one of the
following setup types:

• auto
• manual

These setup types support extraction using different methods, depending on whether the schema was created
for a standard or for a custom document type.

Default Values

When you first call up the Add Data Field dialog, the service prefills the Setup Type field. The default values
depend on the document type and which edition of Document Information Extraction you use:

• Premium edition
  • Schemas for standard and custom document types: auto
• Base edition
  • Schemas for standard document types: auto
  • Schemas for custom document types: manual

You can change these prefilled values in line with your needs.
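The prefilled defaults can be expressed as a small function. This Python sketch is hypothetical; the edition and document type arguments are simplified strings rather than service identifiers.

```python
def default_setup_type(edition: str, document_type_kind: str) -> str:
    """Return the prefilled Setup Type.

    edition: "premium" or "base"; document_type_kind: "standard" or "custom".
    """
    if edition not in {"premium", "base"}:
        raise ValueError(f"unknown edition: {edition}")
    if document_type_kind not in {"standard", "custom"}:
        raise ValueError(f"unknown document type kind: {document_type_kind}")
    # Only custom document types in the base edition default to manual.
    if edition == "premium" or document_type_kind == "standard":
        return "auto"
    return "manual"
```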

Document Types, Setup Types, Extraction Methods, and Default Extractors

The following table shows the various combinations of document type and setup type and how they relate to
the extraction method and the use of default extractors:

Document Type for Schema | Setup Type | Extraction Method | Select Default Extractor?
Standard | auto | Service’s machine learning models | Yes
Standard | auto | Generative AI | Not applicable
Standard | manual | Template | Not applicable
Custom | auto | Generative AI | Not applicable
Custom | manual | Template | Not applicable
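The table can also be read as a lookup from document type, setup type, and default extractor choice to extraction method. In this hypothetical Python sketch, None stands for “Not applicable”. For standard document types with setup type auto, the flag distinguishes extraction with a default extractor (machine learning models) from extraction without one (generative AI), following the restriction and caution in this section.

```python
from typing import Optional

# (document type kind, setup type, default extractor?) -> extraction method.
# None for the default-extractor flag means "not applicable".
EXTRACTION_METHODS = {
    ("standard", "auto", True): "machine learning models",
    ("standard", "auto", False): "generative AI",
    ("standard", "manual", None): "template",
    ("custom", "auto", None): "generative AI",
    ("custom", "manual", None): "template",
}


def extraction_method(kind: str, setup_type: str, default_extractor: Optional[bool] = None) -> str:
    key = (kind, setup_type, default_extractor)
    if key not in EXTRACTION_METHODS:
        raise ValueError(f"unsupported combination: {key}")
    return EXTRACTION_METHODS[key]
```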

 Restriction

The setup type auto without default extractor (extraction method: generative AI) is available only with
the service plan Document Information Extraction, premium edition (premium_edition). See Service
Plans [page 77] and Metering and Pricing [page 79].

However, if you want to try out extraction using generative AI, you can do so with an SAP BTP trial account.
Simply follow the steps in the tutorial Use Trial to Extract Information from Custom Documents with
Generative AI and Document Information Extraction.

 Remember

You can use different extraction types for header fields in the same schema. However, you can’t combine
different extraction types for line items in the same schema.

For example, if you use the setup type auto without a default extractor for one line item field, you must use
it for all the other line item fields that you add to your schema.
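The line item rule can be checked mechanically. This hypothetical Python sketch assumes that an extraction type is the combination of setup type and whether a default extractor is used; it raises an error if line item fields mix combinations. Header fields aren’t passed in, because they may mix extraction types.

```python
def check_line_item_setup(line_item_setups: list[tuple[str, bool]]) -> None:
    """Raise if line item fields mix extraction types.

    Each entry is (setup_type, uses_default_extractor) for one line item field.
    """
    distinct = set(line_item_setups)
    if len(distinct) > 1:
        raise ValueError(f"line item fields mix extraction types: {sorted(distinct)}")
```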

 Caution

Always validate information extracted using generative AI before using it for critical applications.

If you prefer not to use generative AI to extract information from documents, select the setup type auto
with a default extractor (standard document types only). Alternatively, select the setup type manual
(standard and custom document types) when adding data fields to your schema.

 Note

As of October 9, 2023, the setup type default is no longer available for new schemas. If an existing schema
includes fields added before this date with the setup type default, you can use only this setup type when
adding new fields. Schemas created before this date that don’t yet include any fields offer you the choice of
auto or manual as setup type.

Because SAP schemas include fields added before October 9, 2023, when you copy these schemas, the
only setup type available is default.
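The rule for the legacy setup type default can be sketched as follows. The function is hypothetical and takes the setup types of the fields that a schema already contains.

```python
def allowed_setup_types(existing_setup_types: list[str]) -> set[str]:
    """Setup types offered when adding a new field to an existing schema."""
    # Fields added before October 9, 2023 with "default" lock the schema to it.
    if "default" in existing_setup_types:
        return {"default"}
    # New schemas, and older schemas without fields, offer auto or manual.
    return {"auto", "manual"}
```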

Related Information

Extraction Using Generative AI: Languages [page 94]
Add Fields to Schema Version [page 199]
Extraction Using Generative AI: Best Practices [page 273]

13.2.4.5 Activate/Deactivate Schema

Procedure

1. In the left navigation pane, choose Schema Configuration.
2. On the Configurations screen, choose the row containing the schema you want to activate. You now see the
schema details.
3. To activate the schema, click the Activate button. On the Configurations screen, the schema status
changes to “ACTIVE”.

If a schema doesn’t yet have any data fields, the Activate button is grayed out.
4. When a schema has the status “ACTIVE”, the Deactivate button replaces the Activate button.

 Note

If you wish to change or delete a schema that is active, you must first click Deactivate. When you
deactivate a schema, its status on the Configurations screen changes to “INACTIVE”. To enter your
changes, choose Edit (pen icon). Once you’ve completed your changes, activate the schema again.

 Restriction

You can’t deactivate schemas that provide the basis for templates. Otherwise, any changes to the
schema would affect the field definitions for the relevant templates.

Before deactivating a schema of this kind, first deactivate all templates based on it and then delete
them.
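The status rules for schemas described in this and the preceding sections can be modeled as a small state machine. This Python sketch is hypothetical: it tracks field names and the names of templates based on the schema, and raises an error where the UI would block an action.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaState:
    name: str
    status: str = "INACTIVE"  # "ACTIVE", "INACTIVE", or "DRAFT"
    fields: list[str] = field(default_factory=list)
    based_templates: list[str] = field(default_factory=list)

    def activate(self) -> None:
        # Activate is grayed out while the schema has no data fields.
        if not self.fields:
            raise RuntimeError("schema has no data fields")
        self.status = "ACTIVE"

    def deactivate(self) -> None:
        # Templates based on the schema must be deactivated and deleted first.
        if self.based_templates:
            raise RuntimeError(
                "deactivate and delete these templates first: "
                + ", ".join(self.based_templates)
            )
        self.status = "INACTIVE"

    def add_field(self, name: str) -> None:
        if self.status == "ACTIVE":
            raise RuntimeError("deactivate the schema before adding fields")
        self.fields.append(name)
        self.status = "DRAFT"  # adding a field puts the schema in DRAFT
```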

13.2.4.6 Delete Schema

Procedure

1. In the left navigation pane, choose Schema Configuration.
2. On the Configurations screen, select the checkbox for the row containing the schema you want to delete.

You can’t delete a schema that has the value “YES” in the SAP Schema column.
3. If the schema has the status “ACTIVE”, you must deactivate it before you can delete it. In this case, click
Deactivate.

 Restriction

You can’t deactivate schemas that provide the basis for templates. Otherwise, any changes to the
schema would affect the field definitions for the relevant templates.

Before deactivating a schema of this kind, first deactivate all templates based on it and then delete
them.

4. Click Delete and then Yes to delete the selected schema. The schema is removed from the Schemas list.

13.2.5 Template

Use this Document Information Extraction UI feature to create, reuse, edit, and delete templates based on
schemas and document types. End users can select templates together with a corresponding schema to
extract information from business documents of the appropriate type and structure.

Context

 Note

This feature is available only to users with the following administrator role:

• Document_Information_Extraction_UI_Templates_Admin

For additional information on using templates, see the best practices under Template [page 265].

Templates are based on schemas and enable you to show the position of extraction fields in a particular
document layout. After creating a template, you use the Document feature to associate one or more
documents with it. You then edit the extraction results for these documents, indicating the location of fields
and their values.

Templates are essential for processing custom document types. However, you can also use them with standard
document types to fine-tune extraction results.

 Tip

If you follow the guidance in General Recommendations and Limitations [page 266], you only have to edit
the extraction results for one document that you associate with your template.

Use this feature to do the following:

• Add Template [page 253]
• Add Documents and Activate/Deactivate Template [page 253]
• Edit Template [page 255]
• Export/Import Template [page 256]
• Create Template from Document Extraction Results [page 256]
• Delete Template [page 257]

13.2.5.1 Add Template

Procedure

1. Open the Document Information Extraction UI, as described in Subscribing to the Document Information
Extraction UI [page 234].
2. Click the Template icon in the left navigation pane.
3. Click Create a new template (add icon) at the top right.
4. Enter a name and optionally a description for the new template. Select the appropriate document type
(either Invoice, Payment Advice, Purchase Order, or Custom). Choose the schema you wish to use as a
basis for the new template. Click Create.
5. Choose OK to see the template details.

The Extraction Fields tab shows the header fields and line item fields from the schema you specified.

6. Choose the Extraction Fields tab and then choose Edit on that tab.

 Note

This step and the ones that follow are optional and apply only if you want to assign a fixed value to
one or more extraction fields.
7. Enter a value that you wish to associate with all instances of a particular field.

For example, if you intend to use your template only for documents from one supplier, you could enter the
name of that supplier as the fixed value for the senderName field.
8. Repeat the preceding step for any other fields that you want to assign fixed values to.
9. Save your entries.
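Fixed values can be pictured as constant overrides for the fields you assign them to. This Python sketch is hypothetical. It assumes that results are a flat mapping of header field names to values and that a fixed value always takes precedence over an extracted one; this section doesn’t state how the service resolves such conflicts.

```python
def apply_fixed_values(extracted: dict[str, object], fixed: dict[str, object]) -> dict[str, object]:
    """Return a copy of the header results with fixed template values applied."""
    result = dict(extracted)  # leave the input mapping unchanged
    result.update(fixed)      # assumption: a fixed value always wins
    return result
```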

13.2.5.2 Add Documents and Activate/Deactivate Template

Context

To add documents to a template, you use the Document feature of the Document Information Extraction UI.
Adding documents to templates, as described here, helps improve accuracy.

 Restriction

The document and the template that you wish to add it to must share the same schema. If the document
and template have different schemas, you can’t add the document to the template.

If no schema was selected when the document was uploaded to the Document Information Extraction UI,
you can’t add the document to a template. In this case, Add to Template is grayed out.

Procedure

1. Choose the Document icon in the left navigation pane.
2. Add a document or documents as described in the chapter Add Document [page 240].
3. Choose the row with the document that you want to work with.

You now see the document details. It’s best if the file has at least 2 line items.
4. Edit the extraction results for the document as described in View and Edit Extraction Results [page 242].

You can confirm the document at this point. It’s not necessary to save the document. When you associate
a document with a template, the Document Information Extraction UI saves the extraction results
automatically.

 Remember

If you associate a document with a template and then use that template to extract information from
the same document, the extraction values can differ from the ones you entered and confirmed during
editing.

The technical reason for differences of this kind is that the Document Information Extraction UI
extracts data based on heuristics and not on exact matching of bounding boxes.

5. To add this document to a template, choose Add to Template at the top of the pane on the right of the
screen.
6. Select the relevant template from the dropdown and choose Add.

The document file is added to the template that you selected. It’s displayed as an associated document on
the details page for this template.
7. Repeat the preceding steps to add more documents to your template.

 Restriction

You can add a maximum of 5 documents to a template.

8. If you want to remove associated documents from a template, first choose the Template icon in the left
navigation pane.
9. Then select the relevant template.
10. Choose the (broken link) icon in the Action column of the Associated Documents tab.
11. Finally, choose OK to confirm the action.
12. Activate a template in status “DRAFT” to use it to extract results from documents similar to the ones
associated with it.

The template status changes from “DRAFT” to “ACTIVE”.
13. Deactivate a template in status “ACTIVE” to edit it, delete it, or make it no longer available for Document
Information Extraction.

The template status changes from “ACTIVE” to “DRAFT”.
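The conditions for associating documents with a template, and the status changes that follow, can be summarized in code. This hypothetical Python sketch identifies schemas by an ID string and uses None for documents uploaded without a schema.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_DOCUMENTS_PER_TEMPLATE = 5


@dataclass
class Template:
    name: str
    schema_id: str
    status: str = "DRAFT"  # "DRAFT" or "ACTIVE"
    documents: list[str] = field(default_factory=list)

    def add_document(self, doc_id: str, doc_schema_id: Optional[str]) -> None:
        # Add to Template is grayed out for documents without a schema.
        if doc_schema_id is None:
            raise ValueError("document was uploaded without a schema")
        if doc_schema_id != self.schema_id:
            raise ValueError("document and template must share the same schema")
        if len(self.documents) >= MAX_DOCUMENTS_PER_TEMPLATE:
            raise ValueError("a template can have at most 5 documents")
        self.documents.append(doc_id)

    def activate(self) -> None:
        self.status = "ACTIVE"

    def deactivate(self) -> None:
        self.status = "DRAFT"
```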

13.2.5.3 Edit Template

Find out how to make changes to templates.

Context

If you want to make changes to a template, you can do so using the Edit function. You can change the template
name and description. In addition, you can select a different schema for the template. Changing the schema
makes a new set of extraction fields available for the template.

 Restriction

If a template is currently active, you must deactivate it before you can edit it.

Procedure

1. Click the Template icon in the left navigation pane.
2. Select the template that you want to edit.
3. Click Edit.

The Edit Template dialog appears. Here, you can change the name and description by editing the
corresponding fields.

You can also select a different schema for your template. To change the schema, do the following.
4. Choose the Schema dropdown and select a schema from the list.

 Note

This list includes only schemas that match the document type for which the template was originally
created.

5. Click Save to complete your changes.

 Remember

If you’ve already edited extraction results for sample documents associated with your template, these
edits are preserved following the change of schema only for fields that appear in both the old and the
new schema. After changing the schema, you can annotate the newly added fields in your existing
sample documents.
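Which edits survive a change of schema comes down to set operations on field names. This hypothetical Python sketch assumes that edits are stored per field name.

```python
def preserved_annotations(annotations: dict[str, object], new_schema_fields: set[str]) -> dict[str, object]:
    """Keep only edits for fields that also exist in the new schema."""
    return {name: value for name, value in annotations.items() if name in new_schema_fields}


def fields_to_annotate(old_fields: set[str], new_fields: set[str]) -> set[str]:
    """Fields added by the schema change, which you can now annotate."""
    return new_fields - old_fields
```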

13.2.5.4 Export/Import Template

Avoid duplicated effort by reusing existing templates in different clients.

Context

You’ve created a template in a test client by following the steps in Add Template [page 253] and Add
Documents and Activate/Deactivate Template [page 253]. You’re now happy with your new template and want
to export it from the current client before importing it into your production client.

The steps presented here assume the following:

• You’re still in the test client.
• You’ve selected your new template from the Templates list and are now on the screen showing the template
details.

Procedure

1. Choose Export.

Document Information Extraction downloads the template to your local machine. The download includes
the schema.json and template.json files and a folder with the associated documents.
2. Choose Change Client and select the production client to which you want to import your template.

The Document Information Extraction UI displays the Templates list for the production client.
3. Choose (upload icon) and navigate to the folder you downloaded in Step 1.
4. Select the folder and choose Open.

The new template appears in the list. Users can now select this template when adding documents of the
appropriate type to the Document Information Extraction UI.
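Before importing an exported template into another client, you may want to check that the download is complete. This Python sketch checks a local folder for schema.json and template.json, verifies that both parse as JSON, and looks for at least one subfolder, because the name of the folder with the associated documents isn’t specified here. The function name is hypothetical, and the sketch assumes that the download has already been extracted to a folder if it arrives as an archive.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("schema.json", "template.json")


def check_template_export(export_dir: str) -> list[str]:
    """Return a list of problems found in an exported template folder."""
    root = Path(export_dir)
    if not root.is_dir():
        return [f"not a folder: {export_dir}"]
    problems = []
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name} is not valid JSON: {exc}")
    # Assumption: any subfolder may hold the associated documents.
    if not any(p.is_dir() for p in root.iterdir()):
        problems.append("no subfolder with associated documents")
    return problems
```

An empty list means the folder passed all three checks.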

13.2.5.5 Create Template from Document Extraction Results

This feature enables you to quickly and easily create templates when adding documents to the Document
Information Extraction UI.

Context

You’ve added a document by following the steps in Add Document [page 240] and View and Edit Extraction
Results [page 242].

 Remember

To create a template based on document extraction results, you must use a schema when adding the
document.

Before creating a template from the document extraction results, make sure you meet the following
prerequisites:

• You’ve used an appropriate schema when adding your document for extraction.
• The document you want to base your template on has the status “READY”.

Procedure

1. Choose Create Template on the Documents UI.
2. Enter a name for your template (mandatory) and a description (optional), then choose Create.

The template detail screen appears, showing your new template with the preprocessing status “DONE”.

You can now use your template in the same way you’d use one created directly using the Template feature.
3. Activate, edit, export, or delete your template, as described in Add Documents and Activate/Deactivate
Template [page 253], Edit Template [page 255], Export/Import Template [page 256], and Delete Template [page 257].

13.2.5.6 Delete Template

Procedure

1. In the left navigation pane, click the Template icon.
2. On the Templates screen, click the row containing the template you want to delete.

You see the template details.

 Restriction

If a template is currently active, you must deactivate it before you can delete it.

3. Click Delete and then OK to delete the selected template.

The template is removed from the Templates list.

14 Best Practices

Find out about recommended approaches for optical character recognition, the main features of the Document
Information Extraction service, data enrichment, and extraction using generative AI.

The quality of your extraction results depends on a wide range of factors. This section is intended to help you
get the best out of the Document Information Extraction service. It includes the following information:

• General recommendations on how to get better extraction and enrichment results using OCR best
practices.
• Decision procedures, recommendations, and tips on how to use the schema configuration, template, and
document features of Document Information Extraction.
• Important considerations when using generative AI to extract information from documents automatically.

Related Information

Optical Character Recognition (OCR): Best Practices [page 258]


Schema Configuration: Best Practices [page 259]
Template: Best Practices [page 265]
Document: Best Practices [page 270]
Data Enrichment: Best Practices [page 271]
Extraction Using Generative AI: Best Practices [page 273]

14.1 Optical Character Recognition (OCR): Best Practices

To get better extraction and enrichment results, bear in mind the following when uploading document files to
the Document Information Extraction service:

• Use page size A4 (Europe) or letter (United States).


• Portrait orientation is preferable.
• Use a high-quality scan.
• A handwriting detection feature is available. At present, this feature detects only handwriting in English.
• The ideal resolution is 300 dpi. For good quality, you need at least 150 dpi. A higher resolution (above 300
dpi) generally doesn’t improve extraction results. Be aware that very large files take longer to preprocess
because the service scales them down to 300 dpi. In addition, the service ignores colors and converts images
to grayscale.
• Make sure that the text isn’t blurred.
• The service extracts dark text on a light background more accurately than light text on a dark background.
• Avoid handwritten additions, such as texts, numbers, checkmarks, or underlining, as well as highlighting
with marker pens. Additions of this kind can lead to poor OCR and extraction results.

• Words that are oriented differently (for example, rotated 90 degrees) or have a much larger or much
smaller font than those on the rest of the page aren’t detected.
• Avoid very small fonts and fonts that weren’t used in the training samples.
• Where text takes up only a small area of the page, excessive zoom-in or cropping can cause extraction
issues.
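To check a scan against these recommendations before upload, you can estimate its effective resolution from the image’s pixel dimensions. The following Python sketch assumes that the image covers the whole page (A4 or letter). The function names and the 5% tolerance above 300 dpi are illustrative choices, not part of the service.

```python
# Pre-upload check for scan resolution and orientation.
# Assumption: the image covers the full page; names and tolerance are illustrative.

MM_PER_INCH = 25.4

# Portrait page sizes in inches: A4 is 210 x 297 mm, US letter is 8.5 x 11 in.
PAGE_SIZES_IN = {
    "A4": (210 / MM_PER_INCH, 297 / MM_PER_INCH),
    "letter": (8.5, 11.0),
}

IDEAL_DPI = 300
MINIMUM_DPI = 150


def effective_dpi(width_px: int, height_px: int, page: str = "A4") -> float:
    """Estimate the scan resolution of a portrait page image."""
    width_in, height_in = PAGE_SIZES_IN[page]
    # Take the smaller ratio as a conservative estimate.
    return min(width_px / width_in, height_px / height_in)


def assess_scan(width_px: int, height_px: int, page: str = "A4") -> str:
    """Return a short verdict based on the OCR recommendations."""
    if width_px > height_px:
        return "landscape: rotate to portrait before upload"
    dpi = effective_dpi(width_px, height_px, page)
    if dpi < MINIMUM_DPI:
        return f"too low ({dpi:.0f} dpi): rescan at {IDEAL_DPI} dpi"
    if dpi > IDEAL_DPI * 1.05:
        return f"higher than needed ({dpi:.0f} dpi): the service scales it down to {IDEAL_DPI} dpi"
    return f"ok ({dpi:.0f} dpi)"


print(assess_scan(2480, 3508))  # A4 page scanned at 300 dpi
print(assess_scan(827, 1169))   # A4 page scanned at 100 dpi
```

Treat the result as a hint only: the effective resolution of a cropped scan, or of an image embedded in a PDF, can differ from this estimate.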

 Tip

• See Supported Document Types and File Formats [page 84].


• See also Supported Languages and Countries/Regions [page 86] and Technical Constraints [page
275].

14.2 Schema Configuration: Best Practices

Learn about best practices for using schemas to upload documents to the Document Information Extraction
UI.

For best results, we strongly recommend that you always use a schema when uploading standard document
types to the Document Information Extraction UI. You can also upload documents of this type directly, without
a schema or template. However, using a schema has the following benefits:

• You don’t have to select extraction fields manually every time you submit a document.
• There’s no risk of inconsistent settings for different documents.

If you’re uploading custom document types, always use a schema.

 Note

To use the Schema Configuration feature to create, copy, and edit schemas, you must have the
administrator rights provided by the following role collection:

• Document_Information_Extraction_UI_Templates_Admin

If you have the Document_Information_Extraction_UI_End_User role, you can use any available
schemas, except SAP schemas, to upload documents.

The steps involved in adding a schema differ depending on whether the document type is standard or custom.
For details of the respective processes, see the subtopics in this section.

Related Information

Standard Document Types [page 260]


Custom Document Types [page 262]

14.2.1 Standard Document Types

Configure schemas for standard document types.

The Document Information Extraction UI supports the following standard document types:

• Invoice
• Payment advice
• Purchase order

 Tip

We strongly recommend using a schema to process standard document types.

The following image outlines the steps and settings for processing standard document types with or without a
template.

[Image: processing standard document types. Related topics: Template: Best Practices [page 265], Create Schema [page 246], Setup Types [page 249], and Add Document [page 240].]

 Remember

SAP schemas provide a set of typical fields with default extractors for standard document types. If
you don’t want to configure schemas for standard document types from scratch, you can select the
appropriate SAP schema unedited when you add a document or create a template on the Document
Information Extraction UI. No configuration is needed when you use SAP schemas in this way.

You can also create your own schema by copying the SAP schema for the relevant standard document type.
You can then edit this copy, choosing some or all of the fields from the SAP schema as a basis for your own
schema and adding custom fields, as required.

When deciding whether to use a schema, bear in mind the following points:

• It’s better to use a preconfigured SAP schema than to use no schema at all.
• SAP schemas only support the setup type default.

Extraction Methods, Setup Types, and Default Extractors

You can use the following extraction methods for header fields in schemas for standard document types:

• Template – setup type manual without default extractor


• Machine learning models of the Document Information Extraction service – setup type auto with default
extractor
• Generative AI – setup type auto without default extractor

 Restriction

The generative AI extraction method is available only with the service plan Document Information
Extraction, premium edition (premium_edition).

 Remember

You can use different extraction methods for header fields in the same schema. However, you can’t combine
different extraction methods for line items in the same schema.

For example, if you use the setup type auto without a default extractor for one line item field, you must use
it for all the other line item fields that you add to your schema.
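You can check the line-item constraint before you save a schema. The following Python sketch uses a hypothetical, simplified field model: the `Field` class, its attributes, and the extractor names are assumptions, not the service’s schema format. It maps each field to an extraction method as described in the list of extraction methods above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified view of a schema field (not the API format)."""
    name: str
    setup_type: str                          # "auto" or "manual"
    default_extractor: Optional[str] = None  # name of a default extractor, if any


def extraction_method(field: Field) -> str:
    """Map a field to an extraction method for standard document types."""
    if field.setup_type == "manual":
        return "template"
    if field.default_extractor is not None:
        return "machine learning"
    return "generative AI"


def methods_used(fields: list[Field]) -> set[str]:
    """Set of extraction methods used by the given fields."""
    return {extraction_method(f) for f in fields}


def line_items_consistent(line_item_fields: list[Field]) -> bool:
    """Line item fields must all use the same extraction method."""
    return len(methods_used(line_item_fields)) <= 1


line_items = [
    Field("quantity", "auto", default_extractor="quantity"),
    Field("batchCode", "auto"),  # no default extractor: generative AI
]
print(line_items_consistent(line_items), sorted(methods_used(line_items)))
```

Header fields aren’t subject to this check, because you can mix extraction methods for them in the same schema.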

Default Extractors

Templates generally deliver better results for custom header fields than for custom line items. To get the
best extraction results when using a template or the machine learning models of the Document Information
Extraction service with standard document types, configure default extractors for header and line item fields as
follows:

• Header fields: Don’t use default extractors for custom header fields. You can then use a template to edit
them.
• Line items: Use default extractors, wherever possible.

To access the default extractors when configuring a schema, choose Header Fields or Line Item Fields and then
choose Add. Next, select the relevant data type and the setup type auto. You can now select the appropriate
extractor for the data type from the Default Extractor dropdown.

Related Information

Custom Document Types [page 262]


Add Data Fields [page 247]
Setup Types [page 249]

14.2.2 Custom Document Types

Configure schemas for custom document types.

Custom documents are documents that don’t belong to the standard document types in Document
Information Extraction. There are many types of custom documents; common examples include powers of
attorney, birth certificates, and résumés. If you want to process documents of this kind, always use a
schema.

The following image outlines the steps and settings for processing custom document types with and without a
template.

[Image: processing custom document types. Related topics: Create Schema [page 246], Template: Best Practices [page 265], and Setup Types [page 249].]

Extraction Methods and Setup Types

You can use the following combinations of extraction methods and setup types for header fields in schemas for
custom document types:

• Template: setup type manual


• Generative AI: setup type auto

 Restriction

The generative AI extraction method is available only with the service plan Document Information
Extraction, premium edition (premium_edition).

 Note

Default extractors aren’t available for custom document types.

 Remember

You can use different extraction methods for header fields in the same schema. However, you can’t combine
different extraction methods for line items in the same schema.

For example, if you use the setup type auto for one line item field, you must use it for all the other line item
fields that you add to your schema.

Related Information

Add Data Fields [page 247]


Setup Types [page 249]

14.3 Template: Best Practices

Decide whether to use a template when uploading documents to the Document Information Extraction UI and
make the relevant settings.

When you opt to use a schema (recommended), you must also decide whether to use a template to upload
your documents. The associated procedure is as follows:

[Image: deciding whether to use a template. Related topics: Template: Best Practices [page 265], Standard and Custom Tables [page 267], and Add Document [page 240].]

 Note

To use the Template feature to create templates, you must have the administrator rights provided by the
following role collection:

• Document_Information_Extraction_UI_Templates_Admin

The Document Information Extraction UI delivers best results with standard table structures. If your
documents include custom fields, we recommend using a template. This approach allows you to edit extraction
results for fields that don’t have default extractors. Edit all custom header fields. If the line items in your
documents are in a standard table structure, also edit the line items. However, if the table has a custom
structure, don’t edit the line items.

If the documents don’t include custom fields, and only a few of the documents share the same template layout,
don’t use a template. In this case, upload the documents using a schema only.

If your documents don’t include custom fields, but many of them share the same template layout, use a
template. If the line items in your documents are in a standard table structure, edit the line items. However, if
the table has a custom structure that is likely to cause issues with the template approach, don’t edit the line
items.
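The decision rules in this section can be summarized as a small function. This Python sketch only illustrates the guidance above: `advise` and `TemplateAdvice` are hypothetical names, and the three inputs are judgments you make about your own documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateAdvice:
    use_template: bool
    edit_custom_header_fields: bool
    edit_line_items: bool


def advise(has_custom_fields: bool, many_share_layout: bool,
           standard_table: bool) -> TemplateAdvice:
    """Apply the template recommendations from this section."""
    if has_custom_fields:
        # Use a template, edit all custom header fields, and edit line items
        # only if the table has a standard structure.
        return TemplateAdvice(True, True, standard_table)
    if many_share_layout:
        # No custom fields, but many documents share the same layout.
        return TemplateAdvice(True, False, standard_table)
    # No custom fields and only a few documents share a layout: schema only.
    return TemplateAdvice(False, False, False)


print(advise(has_custom_fields=True, many_share_layout=False, standard_table=False))
```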

 Note

If there are extraction errors when using templates, refer to the subsections of these template best
practices.

Related Information

General Recommendations and Limitations [page 266]


Standard and Custom Tables [page 267]

14.3.1 General Recommendations and Limitations

Follow best practices and be aware of limitations when using templates to extract information from custom and
standard document types.

Templates are essential when extracting information from custom document types, for which Document
Information Extraction has no pre-trained models. In addition, templates can help you fine-tune results when
extracting information from standard document types. (See Standard Document Types [page 260].)

Whether you use templates to extract information from custom or standard document types, note the
recommendations here and in Standard and Custom Tables [page 267]:

• Use templates only with well-structured form-like documents such as the following: structured forms,
application forms, certificates, prescriptions, and personal IDs.
• If possible, process one-page documents only. Otherwise, the results can be less accurate.

Note the following constraints on header and line item fields:

• If the same header field appears on more than one page, the Document Information Extraction UI extracts
this field only once.
• Templates support multiple tables per page, provided they all have a standard structure and the same table
headers. Multiple tables that are horizontally placed aren’t supported.
• Nested table structures (with items grouped in the same line) cause issues.
• Items that overlap horizontally (for example, different items in the same column) also cause problems.
• Header and line item fields with identical or very similar formatting prevent the template from
distinguishing the header from the main part of the table. As a result, the template can’t detect where
the table starts.
• If adjacent columns are too close to each other, the Document Information Extraction UI can’t distinguish
them. In such cases, the service extracts the contents of multiple columns as a single value.

 Caution

If there are extraction errors when using templates, check for the following issues:

• The document for upload has significant page rotation or tilt (15 degrees or more).
• The page size and margins differ between the document for upload and the associated document.
• The position of the image differs between the document for upload and the associated document.
• The line items in the document for upload differ slightly from those in the associated document.
• The images include scanning noise, for example, background images and bleed-through, where text on the
back of the document is visible on the front.
• The OCR results are poor.

These issues result in fields failing to map to their expected positions. In such cases, extraction can
either be incorrect (wrong value) or fail entirely (no value). If extraction fails, the system falls back to the
pre-trained global model, which can result in incorrect extraction.

Related Information

Standard and Custom Tables [page 267]


Optical Character Recognition (OCR): Best Practices [page 258]
Technical Constraints [page 275]

14.3.2 Standard and Custom Tables

Compare the tables in your documents with examples of standard and custom structures.

If you use a template to extract information from tables, you get the best results from simple, well-structured
layouts (standard tables). By contrast, custom tables can cause issues.

Before using a template, compare the tables in your documents with the following examples of standard and
custom tables.

 Remember

Whether you’re extracting information from standard or custom tables, bear the following layout-related
points in mind:

• If you use a template, make sure that the header and line item fields are formatted differently from
each other. If they have very similar or identical formatting, the template can’t distinguish the header
from the main part of the table and therefore can’t detect where the table starts.
• Make sure that adjacent table columns aren't too close to each other. If they are, the Document
Information Extraction UI can’t distinguish them. As a result, it extracts the contents of multiple
columns as a single value.

Standard Tables

For best results, use tables with the standard structures shown here.

In the following examples, the column headings correspond to the header fields, and the line items appear
directly under them.

Headers Arranged Horizontally from Left to Right

Material number | Description | Quantity | Unit price | Total price
123             | Product 1   | 1        | EUR 12.35  | EUR 12.35
234             | Product 2   | 2        | EUR 2.35   | EUR 2.35

Headers Arranged Horizontally from Left to Right: No Nested Structures

Material number | Description          | Quantity | Unit price | Total price
123             | Product 1            | 1        | EUR 12.35  | EUR 12.35
                | Description covering |          |            |
                | several lines        |          |            |
234             | Product 2            | 2        | EUR 2.35   | EUR 2.35
                | Description covering |          |            |
                | several lines        |          |            |

As shown in both of the preceding tables, headers are arranged horizontally from left to right in standard
tables. If a column includes content that covers more than one line (as in the Description column of the second
table), this content isn’t nested. In other words, it’s not spread across multiple columns.

See the contrasting examples in the Custom Tables section.

Custom Tables

Tables structured as shown in this section can cause issues during extraction and deliver poorer results.

Headers Arranged Vertically

Material number | 123       | 234
Description     | Product 1 | Product 2
Quantity        | 1         | 2
Unit price      | EUR 12.35 | EUR 2.35
Total price     | EUR 12.35 | EUR 2.35

Nested Structures

Items Overlapping Horizontally

 Tip

If your documents include custom tables, we recommend using default extractors for all line items when
configuring the corresponding schema. If you then decide to use the Template function with your schema,
you don’t have to edit the extraction results for the line items.

 Note

If you follow the guidance in this subsection but still have extraction errors, refer to the general
recommendations for using templates.

Related Information

General Recommendations and Limitations [page 266]

14.4 Document: Best Practices

Make the recommended settings for uploading documents to the Document Information Extraction UI.

We recommend always using a schema when uploading documents to the Document Information Extraction
UI. Schemas enable you to manage fields for extraction centrally, reducing manual effort and inconsistencies.

When you add documents, the decision procedure is as follows:

[Image: decision procedure for adding documents. Related topics: Add Document [page 240], Schema Configuration: Best Practices [page 259], and Template: Best Practices [page 265].]

If you want to use a schema without a template, simply select the appropriate schema and then upload your
documents to the Document Information Extraction UI.

If you want to use a schema with a template and know the template name, select the template from the
dropdown in the Select Document step. If you’re unsure which template to use, choose Detect Automatically.
The service then finds the best template for your document.

 Tip

When uploading documents using a schema, you may find that a suitable template isn’t available. In this
case, you can create a template based on the extraction results for your documents. For details of how to
do this, see Create Template from Document Extraction Results [page 256].

To create templates in this way, you need the admin rights provided by the following role collection:

• Document_Information_Extraction_UI_Templates_Admin

14.5 Data Enrichment: Best Practices

Data enrichment is a powerful feature that matches vendors, customers, employees, and products found on a
document with master data uploaded to the Document Information Extraction service.

To improve the performance of the data enrichment feature, make sure that your master data is up to date and
activated. To get the best possible matching results, observe the following recommendations:

• Don’t use placeholder values for individual fields that lack a value. Remove these fields instead.
• Always include the keys name and address1 and populate them with a valid supplier or customer name
and address. Otherwise, the enrichment is unlikely to work as intended.
• Whenever possible, include taxId and bankAccount information in your businessEntity records. These
two fields improve enrichment results.
• Always keep in mind that uploaded master data must be activated before it can be used for enrichment. If
automatic activation (default) is enabled, this process can take up to four hours.

 Tip

With large numbers of data records and for better control, use manual data activation. While automatic
data activation is more convenient in many cases, it can lead to unexpected results, especially if
triggered during the upload of new data records.

• Make sure to select the correct subtype when uploading the data (supplier for vendors or senders, and
customer for buyers or receivers).
• Currently, products are matched by materialNumber only. This means that data enrichment only works
for product line items that include a materialNumber on the document.
• If you upload a product entity without a materialNumber, this entity won’t be matched. Always include a
valid materialNumber when uploading product master data.
• We continuously improve the normalization of values to optimize matching. To benefit from these
improvements, reupload your entire master data from time to time, for example, once a quarter.
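Two of these recommendations lend themselves to a simple check before upload: choosing the subtype from the business role, and making sure that every product record has a materialNumber. The following Python sketch is illustrative. The helper names, the `id` key on product records, and the sample values are assumptions.

```python
# Subtype by business role, as described in the list above.
SUBTYPE_BY_ROLE = {
    "vendor": "supplier",
    "sender": "supplier",
    "buyer": "customer",
    "receiver": "customer",
}


def subtype_for(role: str) -> str:
    """Return the enrichment subtype for a business role."""
    try:
        return SUBTYPE_BY_ROLE[role.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown business role: {role!r}") from None


def split_products(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate product records with a materialNumber from those without one."""
    matchable, unmatchable = [], []
    for record in records:
        material = str(record.get("materialNumber") or "").strip()
        (matchable if material else unmatchable).append(record)
    return matchable, unmatchable


# Hypothetical product records for illustration.
products = [
    {"id": "P001", "materialNumber": "MAT-4711"},
    {"id": "P002"},                          # no materialNumber: won't be matched
    {"id": "P003", "materialNumber": " "},  # blank value: treated as missing
]
ready, to_fix = split_products(products)
print([p["id"] for p in ready], [p["id"] for p in to_fix])
```

Records returned in the second list need a valid materialNumber before you upload them; otherwise they can’t be matched.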

Request Examples
Not recommended – Create Enrichment Data [page 167] request payload:

payload:
{
  "value": [
    {
      "id": "BE0001",
      "name": "Emma Dowerg",
      "accountNumber": "SK2421",
      "address1": "Amalie-Klemm-Platz 0/9, 48581, Geithain",
      "address2": "none",          // Do not add custom placeholder values
      "city": "Geithain",
      "countryCode": "DE",
      "postalCode": "48581",
      "state": "unknown",          // Do not add custom placeholder values
      "email": "e.dowerf@mustermail.com",
      "phone": "",                 // Do not leave empty values
      "bankAccount": "DE345982837402",
      "taxId": "DE435531312"
    }
  ]
}
type: businessEntity
clientId: c_00
subtype: supplier

Recommended – Create Enrichment Data [page 167] request payload (do not use fields with custom
placeholders or empty values):

payload:
{
  "value": [
    {
      "id": "BE0001",
      "name": "Emma Dowerg",
      "accountNumber": "SK2421",
      "address1": "Amalie-Klemm-Platz 0/9, 48581, Geithain",
      "city": "Geithain",
      "countryCode": "DE",
      "postalCode": "48581",
      "email": "e.dowerf@mustermail.com",
      "bankAccount": "DE345982837402",
      "taxId": "DE435531312"
    }
  ]
}
type: businessEntity
clientId: c_00
subtype: supplier
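You can apply these payload recommendations automatically before you call the API. The following Python sketch removes empty and placeholder values from a businessEntity record and reports missing keys. The set of placeholder strings and the function name are assumptions; adjust them to the conventions in your own master data.

```python
# Placeholder strings treated as "no value" (an assumption; adjust as needed).
PLACEHOLDERS = {"none", "unknown", "n/a", "null", "-"}
REQUIRED_KEYS = ("name", "address1")
RECOMMENDED_KEYS = ("taxId", "bankAccount")


def clean_business_entity(record: dict) -> tuple[dict, list[str]]:
    """Drop empty and placeholder values and report missing keys."""
    cleaned = {}
    for key, value in record.items():
        text = "" if value is None else str(value).strip()
        if not text or text.lower() in PLACEHOLDERS:
            continue  # remove the field instead of sending a placeholder
        cleaned[key] = value
    warnings = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in cleaned]
    warnings += [f"recommended key missing: {k}" for k in RECOMMENDED_KEYS if k not in cleaned]
    return cleaned, warnings


# The record from the "not recommended" example.
raw = {
    "id": "BE0001",
    "name": "Emma Dowerg",
    "accountNumber": "SK2421",
    "address1": "Amalie-Klemm-Platz 0/9, 48581, Geithain",
    "address2": "none",
    "city": "Geithain",
    "countryCode": "DE",
    "postalCode": "48581",
    "state": "unknown",
    "email": "e.dowerf@mustermail.com",
    "phone": "",
    "bankAccount": "DE345982837402",
    "taxId": "DE435531312",
}
cleaned, warnings = clean_business_entity(raw)
payload = {"value": [cleaned]}
print(payload, warnings)
```

Applied to the record from the not-recommended example, the function keeps the same fields as the recommended payload and returns no warnings.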

Related Information

Enrichment Data API [page 166]


Data Variants [page 172]
Data Duplicates [page 173]

14.6 Extraction Using Generative AI: Best Practices

Find out about best practices for using generative AI to extract information from documents.

 Restriction

Extraction using generative AI is available with the service plan Document Information Extraction,
premium edition (premium_edition) only. See Service Plans [page 77] and Metering and Pricing [page
79].

You can also use an SAP BTP trial account to try out extraction using generative AI. Follow the tutorial:
Use Trial to Extract Information from Custom Documents with Generative AI and Document Information
Extraction.

 Caution

Bear the following in mind when using the Document Information Extraction service to process documents
using generative AI:

Confidence Scores: The Document Information Extraction service returns confidence scores for extracted
results. These values are usually reliable when the service uses a pre-trained model. Be aware, however,
that they can’t be relied on when the service uses generative AI to extract information.

Coordinates: Result objects returned by the API and the Document Information Extraction UI include
coordinates indicating the assumed location of extracted items of information on the page. These
coordinates are intended to let users see where the service extracted information and check manually for
errors. Even if the extraction results are correct, some coordinates may be missing or incorrect. Therefore,
coordinates can’t be relied on when the service extracts information automatically using generative AI.

See also Get Result [page 138] and View and Edit Extraction Results [page 242].

The better you describe the information that you want to extract using generative AI, the better your results will
be.

When adding fields to a schema, pay particular attention to their names and associated descriptions.

 Tip

When entering field names and descriptions, it’s often useful to imagine that you’re explaining what you
want to extract to a person with no prior knowledge.

With this point in mind, we recommend the following best practices:

• Consider the wording of names and descriptions carefully, making sure that they’re accurate, complete,
and unambiguous.
• Write your definitions in English, even if documents for extraction are in a different language.
• Make sure that field names are self-explanatory and don’t include abbreviations or acronyms.

 Example

Use purchaseOrderNumber, not pon or id1.

• If one field can have different names, include as many of these names as possible in your description.

 Example

The Order Number field may be called Your Reference in some documents.

• If there are multiple fields with similar names, add all the fields to your schema, even if only one is needed
in the downstream application. Doing so simplifies processing because you can be sure of extracting a
value automatically, which you can later correct manually, if necessary.

 Example

The field names receiver material number and sender material number are very similar and therefore
could be confused with each other.

• Use generic terms rather than business roles in field names.
If a document doesn’t include labels indicating business roles, such as vendor or customer, the extraction
model may not know these roles. So, if you want to extract a vendor address from an invoice document, it’s
best to use senderAddress rather than vendorAddress.
• To simplify subsequent processing, make sure that your description includes the desired output format for
results.

 Example

If you want a value extracted from a document to be output in uppercase, you can specify this
formatting in the description.
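Some of these recommendations can be checked mechanically when you draft schema fields. The following Python sketch applies simple heuristics: a camelCase pattern, a minimum name length, a list of business-role words, and a minimum description length. These thresholds and the function name are assumptions. A check like this can’t judge whether a description is accurate, complete, or unambiguous.

```python
import re

# Heuristic checks mirroring the naming guidance above.
# The pattern, length thresholds, and word list are assumptions.
CAMEL_CASE = re.compile(r"[a-z]+(?:[A-Z][a-z]+)*")
ROLE_WORDS = ("vendor", "supplier", "customer", "buyer")
MIN_NAME_LENGTH = 4
MIN_DESCRIPTION_WORDS = 5


def review_field(name: str, description: str) -> list[str]:
    """Return naming and description issues for a draft schema field."""
    issues = []
    if not CAMEL_CASE.fullmatch(name):
        issues.append("use camelCase words without digits or symbols")
    if len(name) < MIN_NAME_LENGTH:
        issues.append("name looks like an abbreviation; spell it out")
    if any(word in name.lower() for word in ROLE_WORDS):
        issues.append("prefer generic terms such as sender or receiver over business roles")
    if len(description.split()) < MIN_DESCRIPTION_WORDS:
        issues.append("description is too short to explain the field")
    return issues


drafts = {
    "pon": "PO number",
    "purchaseOrderNumber": "Number of the purchase order. May also be labeled Your Reference.",
    "vendorAddress": "Full postal address of the vendor.",
    "senderAddress": "Full postal address of the party that sent the document.",
}
for field_name, text in drafts.items():
    print(field_name, review_field(field_name, text))
```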

Related Information

Extraction Using Generative AI: Languages [page 94]


Add Fields to Schema Version [page 199]
Setup Types [page 249]

15 Technical Constraints

All Document Information Extraction endpoints exposed to the end user have strict technical limits. See details
in the following table.

 Note

The technical limits listed here are relevant only to users of the service plans Base Edition
(blocks_of_100) and Premium Edition (premium_edition) for enterprise accounts. See Service Plans
[page 77].

Variable Maximum Limit

Document file size 50 MB

Uploaded documents per hour per tenant 2000

Pages per document 100

Number of clients created per tenant 5000

Number of clients created in one API call 5000

Number of enrichment data records per tenant 100,000

Number of schemas per client 1000

Number of header fields and line items per schema 500

Number of templates per schema 1000

Number of associated documents per template 5
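If you upload documents from your own application, you can check some of these limits on the client side before calling the service. The following Python sketch is an illustration only: it interprets 50 MB as 50 × 1024 × 1024 bytes and treats the hourly upload limit as a sliding 60-minute window. Both are assumptions about how the service counts, and the class and function names are hypothetical.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" means 1024 x 1024 bytes


@dataclass(frozen=True)
class Limits:
    max_file_bytes: int
    max_pages_per_document: int


# Values from the table above (enterprise accounts).
ENTERPRISE_LIMITS = Limits(max_file_bytes=50 * MB, max_pages_per_document=100)


def check_document(size_bytes: int, page_count: int,
                   limits: Limits = ENTERPRISE_LIMITS) -> list[str]:
    """Return the limits that a document would exceed (empty list if none)."""
    problems = []
    if size_bytes > limits.max_file_bytes:
        problems.append(f"file size {size_bytes / MB:.1f} MB exceeds {limits.max_file_bytes // MB} MB")
    if page_count > limits.max_pages_per_document:
        problems.append(f"{page_count} pages exceed {limits.max_pages_per_document}")
    return problems


class HourlyUploadBudget:
    """Client-side throttle; assumes a sliding 60-minute window."""

    def __init__(self, limit: int = 2000, window_seconds: float = 3600.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of recent uploads

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record an upload and return True if it fits within the budget."""
        now = time.monotonic() if now is None else now
        # Forget uploads that fall outside the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True
```

Call `try_acquire` before each upload, and queue or delay the document when it returns False.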

 Note

The Document Information Extraction service supports extraction from single or multiple tables. A single
table can extend across multiple pages. It’s not possible to extract information from multiple tables if they
have different sets of line item fields.

 Tip

See the following sections of the Document Information Extraction documentation for other useful
information:

• Supported Document Types and File Formats [page 84]


• Supported Languages and Countries/Regions [page 86]
• Optical Character Recognition (OCR): Best Practices [page 258]

 Restriction

Use only the following types of characters for the IDs of clients, enrichment data records, system and
company codes, and for the names of templates, schemas, and schema header and line item fields:

• letters (lowercase and uppercase)


• numbers
• underscore “_”
• hyphen “-”
• period “.”
• comma “,”
• ampersand “&”
• dollar sign “$”
• hashtag “#”
• tilde “~”
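To catch invalid names before you send a request, you can check them against the allowed character set. This Python sketch assumes that "letters" means the ASCII letters A to Z and a to z; if the service also accepts other letters, widen the pattern accordingly. The function names are hypothetical.

```python
import re

# Allowed: letters, digits, and _ - . , & $ # ~ (from the list above).
# Assumption: "letters" means ASCII A-Z and a-z only.
ALLOWED = re.compile(r"[A-Za-z0-9_.,&$#~-]+")


def is_valid_name(value: str) -> bool:
    """True if value is non-empty and uses only allowed characters."""
    return ALLOWED.fullmatch(value) is not None


def invalid_characters(value: str) -> set[str]:
    """Characters in value that the restriction doesn't allow."""
    return {ch for ch in value if ALLOWED.fullmatch(ch) is None}


for candidate in ("c_00", "invoice-schema.v2", "my schema/v1"):
    print(candidate, is_valid_name(candidate), sorted(invalid_characters(candidate)))
```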

Related Information

Free Tier Option and Trial Account Technical Constraints [page 276]

15.1 Free Tier Option and Trial Account Technical Constraints

When using the free tier option for Document Information Extraction or a trial account, be aware of the
following technical limits:

 Note

The technical limits listed here are relevant only to users of the Free service plan for enterprise accounts
and the Base Edition (blocks_of_100) service plan for trial accounts. See Service Plans [page 77].

Variable Maximum Limit

Uploaded document pages per tenant in a rolling 30-day period 50

 Tip
The rolling period consists of the past 30 days. The total number of document pages available at any time is
calculated based on how many pages you’ve uploaded during these 30 days.

Let’s say that you upload your first documents to the service on June 1, when you add 5 document pages. Up
to and including June 29, you then upload 35 more pages. Because of the 50-page limit for the rolling 30-day
period, you can upload only 10 more document pages on June 30.

If you don’t upload any pages on June 30 and wait until July 1, you can now add up to 15 more pages to the
service. This is because the 5 pages you uploaded on June 1 no longer count: they now fall outside the 30-day
rolling period.

Pages per document 40

Number of clients created per tenant 1

 Tip
A default client is created following tenant provisioning, enabling you to use the service immediately.

Number of enrichment data records per tenant 10

Number of schemas per client 1000

Number of header fields and line items per schema 500

Number of templates per tenant 3

Number of associated documents per template 5
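You can reproduce the rolling-window arithmetic from the tip on uploaded document pages in a few lines. This Python sketch assumes that the window on a given day covers that day and the 29 days before it, which matches the June and July example; the year 2024 and the function name are arbitrary choices for illustration.

```python
from datetime import date, timedelta

PAGE_LIMIT = 50
WINDOW_DAYS = 30


def remaining_pages(uploads: list[tuple[date, int]], on_day: date,
                    limit: int = PAGE_LIMIT, window_days: int = WINDOW_DAYS) -> int:
    """Pages that can still be uploaded on `on_day`.

    Assumes the window covers `on_day` and the preceding window_days - 1 days.
    """
    window_start = on_day - timedelta(days=window_days - 1)
    used = sum(pages for day, pages in uploads if window_start <= day <= on_day)
    return max(limit - used, 0)


# 5 pages on June 1, then 35 more pages by June 29 (year chosen arbitrarily).
history = [(date(2024, 6, 1), 5), (date(2024, 6, 29), 35)]
print(remaining_pages(history, date(2024, 6, 30)))
print(remaining_pages(history, date(2024, 7, 1)))
```

The two calls return 10 and 15, the same values as in the tip.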

 Note

You can't change the details of the default client, of previously created customized clients, or of enrichment
data records. Instead, delete the client or data records, and then create new ones with the updated details.
For more information, see Client API [page 106] and Enrichment Data API [page 166].

See also Tutorials [page 101].

16 Extracted Header Fields

The following table lists the header fields that Document Information Extraction can extract.

Supported
Document Enrich-
Category Field Name Field Label Description Type Type ment Data

amounts currencyCode Currency Three-character combination codes invoice String


Code
representing each one of the world cur- payment
rencies in circulation. For example: Advice
purchas
• AUD for Australian dollar
eOrder
• CAD for Canadian dollar
• CHF for Swiss Franc
• EUR for euro
• GBP for Great Britain pound (ster-
ling)
• USD for U.S. dollar

amounts grossAmount Gross Amount Invoice amount including taxes and invoice Number
shipping/handling costs.

amounts grossAmount Payment Amount to be paid. payment Number


Amount Advice

amounts grossAmount Total Amount Sum of subtotal, taxes, special han- purchas Number
dling charges, and shipping charges, eOrder
without discounts, or total amount due
and payable.

amounts netAmount Net Amount Invoice amount without taxes and ship- invoice Number

ping/handling costs.

amounts netAmount Sub Total Amount without taxes and ship- purchas Number
Amount eOrder
ping/handling costs.

amounts shippingAmou Shipping Shipping and handling charges. invoice Number


nt Amount

amounts taxAmount Tax Amount The tax amount applied to this docu- invoice Number
ment.

Document Information Extraction


278 PUBLIC Extracted Header Fields
Supported
Document Enrich-
Category Field Name Field Label Description Type Type ment Data

amounts | taxId | Supplier Tax ID | The number used to identify the supplier's company for tax purposes. | invoice | String | Used for BusinessEntity [page 170] sender and receiver enrichment.
amounts | taxId | Customer Tax ID | Tax identifier of the organization sending the payment advice. | paymentAdvice | String | Used for BusinessEntity [page 170] sender and receiver enrichment.
amounts | taxId | Tax ID | Tax identifier of the sender’s business entity. Unique to each sender. | purchaseOrder | String | Used for BusinessEntity [page 170] sender and receiver enrichment.
amounts | taxIdNumber | Tax ID Number | Tax identifier number of the sender’s business entity. Unique to each sender. | purchaseOrder | String
amounts | taxName | Tax Description | A brief description of the tax. For example: California sales tax. | invoice | String
amounts | taxRate | Tax Rate | Primary tax rate applied to the document. | invoice | Number
contact | barcode | Barcode | The decoded content of the QR code. For business cards, the vCard standard is supported. A vCard, also known as VCF (Virtual Contact File), is a file format standard for electronic business cards; it can contain name and address information, phone numbers, email addresses, URLs, logos, photographs, and audio clips. | businessCard | String
contact | buildingName | Building Name | Name of the building in the address. | businessCard | String
contact | city | City | Name of the city in the address. | businessCard | String

contact | departmentName | Department | The area of the company in which a person works. | businessCard | String
contact | email | Email | Email address. | businessCard | String
contact | faxNumber | Fax Number | Fax phone number. | businessCard | String
contact | firstName | First Name | The name that stands first in one's full name. | businessCard | String
contact | fixedLine | Fixed Line | Landline phone number. | businessCard | String
contact | houseNumber | House Number | Number of the house in the address. | businessCard | String
contact | lastName | Last Name | Surname or family name. | businessCard | String
contact | middleName | Middle Name | Name between one's first name and surname. | businessCard | String
contact | mobile | Mobile Phone | Mobile phone number. | businessCard | String
contact | namePrefix | Name Prefix | Title used before a person's name. | businessCard | String
contact | nameSuffix | Name Suffix | Title used after a person's name. | businessCard | String
contact | organizationName | Organization Name | Company name. | businessCard | String
contact | poBox | Post Office Box Number | Post office box number. | businessCard | String
contact | role | Role | The position one has in a company. | businessCard | String
contact | state | State | Name of the state in the address. | businessCard | String
contact | streetName | Street Name | Name of the street in the address. | businessCard | String
contact | website | Website | Set of related web pages located under a single domain name, typically created by a single person or company. | businessCard | String
contact | zipCode | Zip Code | Postal code of the address. | businessCard | String

details | barcode | Barcode | The decoded content of the QR code or barcode. For example: a URL (or some other text) that can be used for further processing. For more information, see Barcode Header Field in Invoice Documents [page 285]. | invoice | String
details | purchaseOrderNumber | Purchase Order | Number of the buyer’s purchase order. | invoice | String
details | quantity | Quantity | Quantity of goods or services. | purchaseOrder | Number
document | documentDate | Invoice Date | Date of the invoice document. | invoice | Date
document | documentDate | Payment Date | Date of the payment advice document. | paymentAdvice | Date
document | documentDate | Purchase Order Date | Date of the purchase order document. | purchaseOrder | Date
document | documentNumber | Invoice Number | Number that identifies this invoice. | invoice | String
document | documentNumber | Payment Reference | Number of the payment advice that references the payment. | paymentAdvice | String
document | documentNumber | Purchase Order Number | Number that identifies this purchase order. | purchaseOrder | String
payment | discount | Discount | Amount deducted from the gross amount. | invoice | String
payment | dueDate | Due Date | Expected date of payment in extended ISO 8601 format (YYYY-MM-DD). | invoice | Date
payment | paymentTerms | Payment Terms | Payment terms as found on the invoice document. Payment terms are a combination of the payment due date and the discount rate or penalty rate. | invoice | String
payment | paymentTerms | Payment Terms | Indicates when and how payments should be made. | purchaseOrder | String
receiver | receiverAddress | Buyer Address | Address of the organization that ordered the goods or services. | invoice | String | Used for BusinessEntity [page 170] receiver enrichment.

receiver | receiverContact | Buyer Contact | Name of the employee who should receive this invoice. | invoice | String | Used for Employee [page 171] enrichment.
receiver | receiverId | Supplier ID | A unique code that identifies the supplier. | purchaseOrder | String
receiver | receiverName | Buyer Name | Name of the organization that ordered the goods or services. | invoice | String | Used for BusinessEntity [page 170] receiver enrichment.
receiver | receiverTaxId | Buyer Tax ID | Tax identifier of the buyer's business entity. Unique to each buyer. | invoice | String
sender | senderAddress | Supplier Address | Address of the organization generating this invoice. | invoice | String | Used for BusinessEntity [page 170] sender enrichment.
sender | senderAddress | Customer Address | Address of the organization sending the payment advice. | paymentAdvice | String | Used for BusinessEntity [page 170] sender enrichment.
sender | senderAddress | Sender Address | Address of the sender, with the street, city, and country/region of the sender in a single box. | purchaseOrder | String | Used for BusinessEntity [page 170] sender enrichment.
sender | senderBankAccount | Supplier Bank Account | Bank account of the organization generating this invoice. | invoice | String | Used for BusinessEntity [page 170] sender and receiver enrichment.

sender | senderBankAccount | Sender Bank Account | Bank account number of the sender. | purchaseOrder | String | Used for BusinessEntity [page 170] sender and receiver enrichment.
sender | senderCity | Sender City | City or town name of the sender's address. | purchaseOrder | String
sender | senderCountryCode | Sender Country | Country/Region code of the sender's address. | purchaseOrder | String
sender | senderDistrict | Sender District | District name of the sender's address. | purchaseOrder | String
sender | senderEmail | Sender Email | Email address of the sender. | purchaseOrder | String
sender | senderExtraAddressPart | Sender Extra Address | Any part of the sender's address not included in the other address fields. | purchaseOrder | String
sender | senderFax | Sender Fax | Fax number of the sender. | purchaseOrder | String
sender | senderHouseNumber | Sender House Number | House number of the sender's address. | purchaseOrder | String
sender | senderId | Sender ID | A unique code that identifies the sender. | purchaseOrder | String
sender | senderName | Supplier Name | Name of the organization generating this invoice. | invoice | String | Used for BusinessEntity [page 170] sender enrichment.
sender | senderName | Customer Name | Name of the organization sending the payment advice. | paymentAdvice | String | Used for BusinessEntity [page 170] sender enrichment.
sender | senderName | Sender Name | Name of the sender of the document (usually the sending company). | purchaseOrder | String | Used for BusinessEntity [page 170] sender enrichment.

sender | senderPhone | Sender Phone | Telephone number of the sender. | purchaseOrder | String
sender | senderPostalCode | Sender Postal Code | Postal code of the sender's address. | purchaseOrder | String
sender | senderState | Sender State | State or province name of the sender's address. | purchaseOrder | String
sender | senderStreet | Sender Street | Street name of the sender's address. | purchaseOrder | String
shipTo | deliveryDate | Delivery Date | Date of the delivery in extended ISO 8601 format (YYYY-MM-DD). | invoice, purchaseOrder | Date
shipTo | deliveryNoteNumber | Delivery Note Number | Unique identifier, shown on the invoice, of the delivery note that accompanies the goods. | invoice | String
shipTo | shippingTerms | Shipping Terms | Indicates when and how the goods should be delivered. | purchaseOrder | String
shipTo | shipToAddress | Shipping Address | Address to which the goods will be shipped, with the street, city, and country/region in a single box. | purchaseOrder | String
shipTo | shipToCity | Shipping City | City or town name of the shipping address. | purchaseOrder | String
shipTo | shipToCountryCode | Shipping Country | Country/Region code of the shipping address. | purchaseOrder | String
shipTo | shipToDistrict | Shipping District | District name of the shipping address. | purchaseOrder | String
shipTo | shipToEmail | Shipping Email | Email address for the shipping address. | purchaseOrder | String
shipTo | shipToExtraAddressPart | Shipping Extra Address | Any part of the shipping address not included in the other address fields. | purchaseOrder | String
shipTo | shipToFax | Shipping Fax Number | Fax number for the shipping address. | purchaseOrder | String
shipTo | shipToHouseNumber | Shipping House Number | House number of the shipping address. | purchaseOrder | String
shipTo | shipToName | Shipping Company Name | Company name for the shipping address. | purchaseOrder | String

shipTo | shipToPhone | Shipping Telephone Number | Telephone number for the shipping address. | purchaseOrder | String
shipTo | shipToPostalCode | Shipping Postal Code | Postal code of the shipping address. | purchaseOrder | String
shipTo | shipToState | Shipping State | State or province name of the shipping address. | purchaseOrder | String
shipTo | shipToStreet | Shipping Street | Street name of the shipping address. | purchaseOrder | String

16.1 Barcode Header Field in Invoice Documents

When the barcode header field is requested for extraction, the Document Information Extraction service scans
the whole document for 1D and 2D barcodes and provides the extracted content of the barcode as a string
value. The service can detect multiple barcodes in the same document and provide all the detected content in
the extracted results. The most common types of 1D and 2D barcodes are supported by this field, for example:

• Code39
• Code128
• DataMatrix
• EAN
• Interleaved 2 of 5
• PDF417
• QRCode
• UPC

Document Quality and Extraction

Document quality affects the extraction result. For example, if the barcode in a low-quality (low-resolution) image
of a scanned document is not identifiable, the extraction may not return any barcode header field. The quality
of a decoded barcode also affects the prediction confidence score of the barcode header field. Use high-quality
(high-resolution) images to increase the chance of extracting the barcodes in the document.
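Because the service can return several barcodes for one document, and because barcode quality influences the confidence score, a consumer typically collects every barcode value and sets aside low-confidence ones. The following sketch assumes a hypothetical result layout in which header fields are a list of dictionaries with name, value, and confidence keys; the 0.5 threshold is an illustrative assumption, not a service default.

```python
from typing import Iterable, List


def collect_barcodes(header_fields: Iterable[dict], min_confidence: float = 0.5) -> List[str]:
    """Return decoded barcode strings whose confidence meets the threshold.

    The field layout and the default threshold are assumptions for illustration.
    """
    barcodes = []
    for field in header_fields:
        if field.get("name") != "barcode":
            continue  # Only the barcode header field is of interest here.
        if field.get("confidence", 0.0) >= min_confidence:
            barcodes.append(str(field.get("value", "")))
    return barcodes


header_fields = [
    {"name": "barcode", "value": "https://example.com/inv/42", "confidence": 0.93},
    {"name": "barcode", "value": "4006381333931", "confidence": 0.31},
    {"name": "documentNumber", "value": "42", "confidence": 0.88},
]
print(collect_barcodes(header_fields))
```

Lowering min_confidence keeps more candidates at the cost of accepting codes that may have been decoded from a poor-quality image.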

17 Extracted Line Items

The following table lists the fields that Document Information Extraction can extract from line items.

Category | Field Name | Field Label | Description | Supported Document Type | Type | Enrichment Data
amounts | currencyCode | Currency Code | Three-letter codes representing each of the world's currencies in circulation. For example: AUD for Australian dollar, CAD for Canadian dollar, CHF for Swiss franc, EUR for euro, GBP for Great Britain pound (sterling), and USD for U.S. dollar. Caution: The currencyCode line item was deprecated in February 2024. It's no longer available for extraction. | purchaseOrder | String
amounts | deductionAmount | Deductions | Deductions for a document because of damages or late delivery. | paymentAdvice | Number
amounts | discountAmount | Discount Amount | Discount received for a document. | paymentAdvice | Number
amounts | netAmount | Amount | Total amount of the line item (typically unit price * quantity). | invoice, paymentAdvice, purchaseOrder | Number
details | customerMaterialNumber | Customer Material Number | Unique code that identifies a specific good or service in a customer catalog or system. | purchaseOrder | String | Used for Product [page 172] enrichment.
details | description | Description | Textual description of goods or services. | invoice, purchaseOrder | String

details | materialNumber | Material Number | Unique code that identifies a specific good or service in a supplier catalog or system. | invoice | String | Used for Product [page 172] enrichment.
details | purchaseOrderNumber | Purchase Order Number | Number of the associated purchase order (if available at line item level). | invoice | String
details | quantity | Quantity | Quantity of goods or services. | invoice, purchaseOrder | Number
details | supplierMaterialNumber | Supplier Material Number | Unique code that identifies a specific good or service in a supplier catalog or system. | purchaseOrder | String | Used for Product [page 172] enrichment.
details | unitOfMeasure | Unit of Measure | The UN/CEFACT code of the unit of measure. For example: EA for each, HR for hour, and YR for year. | invoice, purchaseOrder | String
details | unitPrice | Unit Price | Price for a single instance of an object. | invoice, purchaseOrder | Number
document | documentDate | Document Date | Date of the invoice document. | paymentAdvice | Date
document | documentDate | Document Date | Requested delivery date. | purchaseOrder | Date
document | documentNumber | Document Number | Document number that is used by the receiver. | paymentAdvice | String
item | itemNumber | Item Number | Item number that is used by the receiver. | purchaseOrder | String
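The netAmount line item is described as typically unit price * quantity. Because the relation only holds typically (for example, discounts or rounding can break it), a consumer might flag mismatching line items for review rather than reject them. The sketch below assumes each line item is a dictionary keyed by field name and uses a hypothetical tolerance of 0.01.

```python
from decimal import Decimal


def flag_inconsistent_line_items(line_items, tolerance=Decimal("0.01")):
    """Return indexes of line items where netAmount differs from unitPrice * quantity.

    Items missing any of the three fields are skipped, because the check
    cannot be performed for them.
    """
    flagged = []
    for index, item in enumerate(line_items):
        try:
            expected = Decimal(str(item["unitPrice"])) * Decimal(str(item["quantity"]))
            actual = Decimal(str(item["netAmount"]))
        except KeyError:
            continue  # Field not extracted for this line item.
        if abs(expected - actual) > tolerance:
            flagged.append(index)
    return flagged


items = [
    {"description": "Bolts", "quantity": 10, "unitPrice": "2.50", "netAmount": "25.00"},
    {"description": "Nuts", "quantity": 4, "unitPrice": "1.25", "netAmount": "6.00"},
    {"description": "Washers", "quantity": 3},
]
print(flag_inconsistent_line_items(items))
```

Returning indexes instead of raising keeps the decision about how to handle a mismatch with the consuming application.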

18 Security

Get an overview of the security information that applies to Document Information Extraction, and learn about
the main security aspects of the service and its components.

Related Information

Data Protection and Privacy [page 288]


Auditing and Logging Information [page 291]
Front-End Security [page 293]

18.1 Data Protection and Privacy

Introduction

Data protection is associated with numerous legal requirements and privacy concerns. In addition to
compliance with general data privacy regulation, it is necessary to consider compliance with industry-specific
legislation in different countries/regions. SAP provides specific features and functions to support compliance
with regard to relevant legal requirements, including data protection. SAP does not give any advice on whether
these features and functions are the best method to support company, industry, regional, or country/region-specific
requirements. Furthermore, this information does not give any advice or recommendation with regard
to additional features that would be required in particular IT environments; decisions related to data protection
must be made on a case-by-case basis, under consideration of the given system landscape and the applicable
legal requirements.

Note

SAP software supports data protection by providing security features and specific data protection-relevant
functions such as functions for the simplified blocking and deletion of personal data. SAP does not provide
legal advice in any form. The definitions and other terms used in this document are not taken from any
given legal source.

Document Information Extraction may process personal data, such as employee names and email addresses,
depending on the information available in documents and enrichment data.

All data processed by the service is stored in the SAP BTP, Cloud Foundry environment. Document Information
Extraction generally processes the following data types:

Data required by Document Information Extraction

Data | Purpose
Inference Documents | Refers to documents that are submitted by users to receive machine learning predictions.
Data Feedback Collection Documents | Refers to documents that are submitted by users to receive machine learning predictions, and to be used to retrain the service's machine learning models through the data feedback collection feature.
Documents Associated with Templates | Refers to documents that are submitted by users and associated with templates to extract information from other similar business documents.
Enrichment Data | Refers to enrichment data records, for example, supplier name and supplier address. The service matches your existing structured data (typically master data records) with the information extracted from documents.

Read Access Logging

Document Information Extraction does not persist any sensitive personal data. For this reason, it does not log
read access to sensitive personal data.

Information Report

The data from inference documents and data feedback collection documents used by Document Information
Extraction is controlled and managed by the consuming application that calls the Document Information
Extraction APIs. Document Information Extraction does not create or modify inference or retraining data
provided by the consuming application. Therefore, it is not possible for Document Information Extraction to
provide a retrieval function to identify data of specific individuals.

We recommend that the consuming application that uses Document Information Extraction provide its users
with reports on the personal data it transfers to Document Information Extraction for processing. After every
change of the data in the customer system, customers should call the Create Enrichment Data [page 167]
endpoint.

Deletion of Personal Data

The following table lists the retention period and deletion details for all data types required by the Document
Information Extraction service.

Deletion of personal data is logged using audit logging services. For more information, see Audit Logging in the
Cloud Foundry Environment.

Data | Deletion
Inference Documents | The default retention period for inference documents is 7 days. You can customize the retention period for inference documents uploaded to the service, from 1 to 30 days, by setting the documentRetentionTimeDays key with the Create Configuration [page 115] endpoint. You can delete inference data at any time using the Delete Document [page 165] endpoint, even before the retention period expires.
Data Feedback Collection Documents | There is no default retention period for retraining data documents. You can delete all retraining data at any time using the Create Configuration [page 115] and Delete Configuration [page 124] endpoints. You can also delete individual documents previously submitted for retraining at any time using the Delete Document [page 165] endpoint. If the performPIICheck subconfiguration is set to true, the service automatically scans all submitted documents and tries to exclude every document in which Personally Identifiable Information (PII) is detected from being used for retraining and improving the service. It is the customer's responsibility to ensure that no personal data is submitted when using the data feedback collection feature.
Documents Associated with Templates | The documents uploaded to the document feature and associated with templates are not deleted automatically. To minimize the processing of personal data, do not use sample documents that contain personal data.
Enrichment Data | Enrichment data containing personal data is deleted automatically when customers delete the service instances. You also control the enrichment data retention period by using the Delete Enrichment Data (Synchronous) - Deprecated [page 181] and Delete Enrichment Data (Asynchronous) [page 182] endpoints to delete enrichment data records at any point in time.
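The documentRetentionTimeDays key accepts values from 1 to 30 days, and 7 days applies when nothing is configured, so a client can validate the value before calling the Create Configuration [page 115] endpoint. The sketch below only builds and validates the entry; the flat dictionary it returns is a hypothetical request body, so check Create Configuration [page 115] for the exact format.

```python
from typing import Optional

DEFAULT_RETENTION_DAYS = 7  # Documented default for inference documents.
MIN_RETENTION_DAYS = 1      # Documented lower bound.
MAX_RETENTION_DAYS = 30     # Documented upper bound.


def build_retention_config(days: Optional[int] = None) -> dict:
    """Return a validated documentRetentionTimeDays entry.

    The flat dictionary layout is a hypothetical request body.
    """
    if days is None:
        days = DEFAULT_RETENTION_DAYS
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(days, bool) or not isinstance(days, int):
        raise TypeError("documentRetentionTimeDays must be an integer")
    if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"documentRetentionTimeDays must be between {MIN_RETENTION_DAYS} "
            f"and {MAX_RETENTION_DAYS} days, got {days}"
        )
    return {"documentRetentionTimeDays": days}


print(build_retention_config(14))
```

Validating on the client side turns an out-of-range value into an immediate, descriptive error instead of a rejected request.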

Change Log

The application does not update enrichment data automatically. Any update of enrichment data made at the
customer's request is logged using audit logging services. For more information, see Audit Logging in
the Cloud Foundry Environment.

Consent

According to the Personal Data Processing Agreement for SAP Cloud Services, SAP acts as the data processor.
Customers are therefore responsible for obtaining the relevant consent to process personal data, including,
where applicable, approval by controllers to use SAP as a processor.

18.2 Auditing and Logging Information

The following table lists the security events that are logged by the Document Information Extraction service.

Security events written in audit logs


Event grouping | What events are logged | How to identify related log events
Authentication related events | Authentication success | Successful login attempt for tenant {tenant_id} on {instance_id} on {time}
Authentication related events | Authentication failure | Failed login attempt for tenant {tenant_id} on {instance_id} on {time}
Client related events | Client(s) created | "Tenant" and ID consisting of: targetTenant {tenant_id}; (Multiple) Attribute(s) with name "client {client_name}" and state change from None to "CREATED"
Client related events | Client(s) deleted | "Tenant" and ID consisting of: targetTenant {tenant_id}; (Multiple) Attribute(s) with name "client {client_name}" and state change from "CREATED" to "DELETED"
Dataset related events | Modification of dataset (enrichment data) | Modification of dataset:{dataset_id} successful; Modification of dataset:{dataset_id} failed
Dataset related events | Deletion of dataset (enrichment data) | Deletion of dataset:{dataset_id} successful; Deletion of dataset:{dataset_id} failed
Document related events | Deletion of documents (customer documents, for example, invoices uploaded to the service) | Deletion of document:{document_id} successful; Deletion of document:{document_id} failed
Document related events | Document access attempt | Document access attempt by {user_id} of {tenant_id} on {document_id}
Document related events | Document updated | Successful / failed modification attempt by {user_id} of {tenant_id} on {document_id}; Attribute with name "extractions" was changed
Document related events | Document confirmed | Successful / failed modification attempt by {user_id} of {tenant_id} on {document_id}; Attribute with name "status" was changed to "CONFIRMED"
Document related events | Document deleted | Successful / failed modification attempt by {user_id} of {tenant_id} on {document_id}; Attribute with name "status" was changed to "DELETED"
Tenant related events | Tenant provision | "Tenant provisioned" and ID consisting of: targetTenant {tenant_id}; Attribute with name "state" was changed from "DOES_NOT_EXIST" to "PROVISIONED"
Tenant related events | Tenant de-provision | "Tenant de-provisioned" and ID consisting of: targetTenant {tenant_id}; Attribute with name "state" was changed from "PROVISIONED" to "DEPROVISIONED"
Tenant related events | Tenant saas-subscription | "Tenant SAAS Subscription" and ID as targetTenant {tenant_id}; Attribute with name "state" was changed from "DOES_NOT_EXIST" to "SAAS_SUBSCRIBED"
Tenant related events | Tenant saas-unsubscription | "Tenant SAAS UnSubscription" and ID as targetTenant {tenant_id}; Attribute with name "state" was changed from "SAAS_SUBSCRIBED" to "SAAS_UNSUBSCRIBED"

Additional information: the notations used in the log events are defined as follows:

• {client_name}: ID of a client created with the Create Client [page 107] endpoint.
• {dataset_id}: ID of the dataset (enrichment data).
• {document_id}: ID of a document uploaded to the service.
• {instance_id}: ID of the service instance used to access the service.
• {tenant_id}: ID of the tenant used to access the service.
• {time}: Time stamp of when a log was created. You can use time stamps to sort the logs by time.
• {user_id}: ID of the user that accessed the service and performed document-related tasks.

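The dataset and document deletion and modification events follow fixed message templates such as Deletion of document:{document_id} successful. When reviewing exported audit log messages, a consumer might parse these templates to extract the object ID and outcome. The sketch below assumes the messages are available as plain strings that match the templates in the table exactly; real audit log entries may wrap them in additional structure, and the AuditEvent type is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# Matches "<Modification|Deletion> of <dataset|document>:<id> <successful|failed>".
_PATTERN = re.compile(
    r"^(?P<action>Modification|Deletion) of (?P<kind>dataset|document):"
    r"(?P<object_id>\S+) (?P<outcome>successful|failed)$"
)


class AuditEvent(NamedTuple):
    action: str      # "modification" or "deletion"
    kind: str        # "dataset" or "document"
    object_id: str   # {dataset_id} or {document_id}
    successful: bool


def parse_audit_message(message: str) -> Optional[AuditEvent]:
    """Parse a dataset/document audit message; return None if it does not match."""
    match = _PATTERN.match(message.strip())
    if match is None:
        return None
    return AuditEvent(
        action=match["action"].lower(),
        kind=match["kind"],
        object_id=match["object_id"],
        successful=match["outcome"] == "successful",
    )


print(parse_audit_message("Deletion of document:7f3c9a successful"))
```

Messages from other event groups, such as tenant or authentication events, return None and can be handled by separate patterns.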
Related Information

Audit Logging in the Cloud Foundry Environment

18.3 Front-End Security

The Document Information Extraction UI (User Interface) is a web application that supports the following
features:

• SAPUI5 Frame option to avoid clickjacking attacks
• Cross-site request forgery (CSRF) protection
• Cross-site scripting (XSS) output encoding during SAPUI5 rendering
• Secure socket layer (SSL) transport layer encryption using HTTPS
• Access to business data only after authentication and with sufficient authorizations using SAP Business Technology Platform (SAP BTP) identity management and SAP BTP role-based access management (RBAM)
• Cross-site scripting countermeasures
• Session inactivity timeout (15 minutes)
• Rate limiting for document upload
• Data access audit log for viewing extracted documents
• Data change audit log for changing/confirming extraction results
• Data change audit log for deleting a document

19 Accessibility Features in Document Information Extraction

To optimize your experience of Document Information Extraction, SAP Business Technology Platform (SAP
BTP) provides features and settings that help you use the software efficiently.

Note

Document Information Extraction runs on the SAP BTP cockpit. For this reason, the accessibility features
of the SAP BTP cockpit apply. For more information, see the accessibility documentation for the SAP BTP cockpit
on SAP Help Portal at Accessibility Features in SAP BTP Cockpit.

The Document Information Extraction UI is based on SAPUI5. It provides accessibility support in its tools
and customer documentation. For more information on keyboard handling for SAPUI5 UI elements and screen-
reader support for SAPUI5 controls, see Accessibility for End Users.

20 Monitoring and Troubleshooting

Find out how to get support, and explore solutions to potential issues.

Related Information

Getting Support [page 295]


Troubleshooting [page 296]
Download Troubleshooting Data [page 241]

20.1 Getting Support

If you encounter an issue with this service, we recommend that you follow the procedure below.

Check Platform Status


Check the availability of the platform at SAP Trust Center.

For more information about selected platform incidents, see Root Cause Analyses.

Check Guided Answers


In the SAP Support Portal, check the Guided Answers section for SAP Business Technology Platform. You
can find solutions for general platform issues as well as for specific services there.

Contact SAP Support


You can report an incident or error through the SAP Support Portal. For more information, see Getting Support.

Please use the following component for your incident:

Component Name | Component Description
CA-ML-BDP | Services related to Business Document Processing

When submitting the incident, we recommend including the following information:

• Region information (for example, Canary, EU10, or US10)
• Subaccount technical name
• The URL of the page where the incident or error occurs
• The steps or clicks used to replicate the error
• Screenshots, videos, or the code entered
• Any business documents (for example, invoices) with which there have been extraction issues

Related Information

Built-In Support [page 238]

20.2 Troubleshooting

This section describes possible reasons for the following potential issues with Document Information Extraction:

• Problem: You Receive Status Code 4** [page 296]
• Problem: You Receive Status Code 400 [page 297]
• Problem: You Receive Status Code 401 [page 297]
• Problem: You Receive Status Code 413 [page 298]
• Problem: You Receive Status Code 415 [page 298]
• Problem: You Receive Status Code 422 [page 299]
• Problem: You Receive Status Code 429 [page 299]
• Problem: You Receive Status Code 500 [page 300]

20.2.1 Problem: You Receive Status Code 4**

If you receive a 4** status code for your request (such as 400, 401, or 422), check that you are submitting
the request correctly. In most cases, the problem can be fixed in the request: for example, the authentication
information may be missing, the request may use the wrong HTTP method (GET, POST, or DELETE), or the
payload may be invalid.
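The split between client errors (4**) and server errors (500) decides the next step: fix the request, wait and retry, or report an incident. The following sketch maps the status codes covered in this section to a first troubleshooting step; the function name and hint texts are illustrative summaries of this section, not part of the service API.

```python
def troubleshooting_hint(status_code: int) -> str:
    """Return a first troubleshooting step for a response status code."""
    hints = {
        400: "Check the Content-Type header, the JSON payload, required fields, and the Authorization header.",
        401: "Check the credentials; request a new token if it has expired.",
        413: "Send a smaller file or fewer objects per request.",
        415: "Check the content type and that the file format is supported.",
        422: "Check that referenced IDs exist and that the document can be parsed.",
        429: "Rate limit reached; wait before sending more requests.",
    }
    if status_code in hints:
        return hints[status_code]
    if 400 <= status_code < 500:
        return "Client error: check authentication, HTTP method, and payload."
    if 500 <= status_code < 600:
        return "Server error: create an incident on component CA-ML-BDP."
    return "No troubleshooting hint for this status code."


print(troubleshooting_hint(422))
```

Codes without a specific entry fall back to the general 4** or server-error advice.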

20.2.2 Problem: You Receive Status Code 400

Output Code

Status: 400 Bad Request

{
  "errors": [
    {
      "code": "string",
      "message": "string"
    }
  ]
}

Possible reasons:

A 400 error means that the request is malformed. This can be because of one of the following reasons:

• The request does not have the correct Content-Type header (usually application/json)
• The request payload is not valid JSON
• The request payload does not contain some of the required fields and files
• The authorization token was not included in the headers. The error message is "Authorization
token was not found in headers". The header should look like Authorization: Bearer
eyJhbGc....
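Several of the 400 causes listed above can be caught on the client before sending. The sketch below checks a JSON request for the Content-Type and Authorization: Bearer headers and verifies that the payload parses, and it extracts the code and message pairs from a 400 body in the errors format of the sample output. The function names are hypothetical, and the check applies only to requests whose body is JSON, since other requests may legitimately use a different Content-Type.

```python
import json


def precheck_json_request(headers: dict, body: str) -> list:
    """Return likely causes of a 400 Bad Request for a JSON request.

    An empty list means none of the checked problems was found.
    """
    problems = []
    # HTTP header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    content_type = normalized.get("content-type", "").split(";")[0].strip()
    if content_type != "application/json":
        problems.append("Content-Type header is not application/json")
    if not normalized.get("authorization", "").startswith("Bearer "):
        problems.append("Authorization token was not found in headers")
    try:
        json.loads(body)
    except json.JSONDecodeError:
        problems.append("request payload is not valid JSON")
    return problems


def parse_bad_request(response_text: str) -> list:
    """Return (code, message) pairs from a 400 body with an "errors" array."""
    payload = json.loads(response_text)
    return [(err.get("code", ""), err.get("message", "")) for err in payload.get("errors", [])]
```

For example, precheck_json_request({}, "not json") reports all three problems at once.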

20.2.3 Problem: You Receive Status Code 401

Output Code

Status: 401 Unauthorized

{
  "error": {
    "statusCode": 401,
    "message": "..."
  }
}

Possible reasons:

A 401 error means that you did not supply correct authentication information. This can be because of one of
the following reasons:

• You provided an invalid tenant password


• You provided an invalid authentication token or the authentication token has expired
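An expired token is one of the listed causes of a 401. Bearer tokens of the form eyJhbGc... are typically JSON Web Tokens (JWT), whose payload carries an exp claim holding the expiry time in seconds since the Unix epoch. Assuming your token is a JWT, the sketch below reads that claim to check expiry locally before sending a request. It does not verify the signature, so treat it as a client-side hint only, never as an authorization decision; the 30-second leeway is an arbitrary choice.

```python
import base64
import json
import time
from typing import Optional


def token_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """Return True if the JWT has expired or expires within `leeway` seconds.

    Only the payload is decoded; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload_b64 = parts[1]
    # JWTs use base64url without padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return False  # No exp claim: expiry cannot be judged from the token.
    current = time.time() if now is None else now
    return current >= exp - leeway
```

Refreshing the token when this returns True avoids a round trip that would otherwise end in a 401.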

20.2.4 Problem: You Receive Status Code 413

Output Code

Status: 413 Request Entity Too Large

{
  "error": {
    "statusCode": 413,
    "message": "..."
  }
}

Possible reasons:

A 413 status code indicates that your request is too large: either you are sending a file that is too large, or you
are trying to process too many objects in a single request.
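When a 413 is caused by too many objects in a single request, splitting the objects into smaller batches is a common remedy. The sketch below splits a sequence into batches of at most max_batch_size items; the limit is a caller-supplied assumption, because this section does not state the maximum request size the service accepts.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items with at most max_batch_size elements."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


records = [{"id": str(n)} for n in range(7)]
for batch in batches(records, 3):
    print(len(batch))  # Each batch would be sent as a separate request.
```

Because batches is a generator, the ValueError for an invalid size is raised when iteration starts, not when the function is called.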

20.2.5 Problem: You Receive Status Code 415

Output Code

Status: 415 Unsupported File Type

{
  "error": {
    "statusCode": 415,
    "message": "..."
  }
}

Possible reasons:

You get a 415 status code when you use the wrong content type or file format. See Supported Document Types
and File Formats [page 84].

20.2.6 Problem: You Receive Status Code 422

Output Code

Status: 422 Unprocessable Entity

{
  "error": {
    "statusCode": 422,
    "message": "..."
  }
}

Possible reasons:

You get a 422 status code when your request payload references a clientId, senderId, or documentId that does
not exist. For example, you will get this error if you try to create a document for a client that does not exist.

You may also get this error if the document you upload cannot be parsed.

20.2.7 Problem: You Receive Status Code 429

Output Code

Status: 429 Rate Limit Exceeded

{
  "error": {
    "statusCode": 429,
    "message": "..."
  }
}

Possible reasons:

You get a 429 status code when you have reached the rate limit for your user, that is, when you have made too
many requests.
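Instead of failing immediately on a 429, a client can wait and retry. The sketch below retries with exponential backoff. It takes a caller-supplied send function that returns an HTTP status code, so it does not depend on any particular HTTP library; the retry count and delays are illustrative assumptions, not limits published for the service.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() and retry while it returns 429, doubling the delay each time.

    Returns the last status code received.
    """
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
        sleep(base_delay * (2 ** attempt))
        status = send()
    return status
```

Adding random jitter to each delay helps keep many clients from retrying in lockstep, and if a response includes a Retry-After header, honoring it is preferable to a fixed schedule.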

20.2.8 Problem: You Receive Status Code 500

A 500 status code indicates a server error, not an issue with your request. It usually means an error in the
Document Information Extraction application code. To report 500 errors, create an incident on the component
CA-ML-BDP, as described in Getting Support [page 295].

Important Disclaimers and Legal Information

Hyperlinks
Some links are classified by an icon and/or a mouseover text. These links provide additional information.
About the icons:

• Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your
agreements with SAP) to this:

• The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.

• SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any
damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.

• Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering an SAP-hosted Web site. By using
such links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this
information.

Videos Hosted on External Platforms


Some videos may point to third-party video hosting platforms. SAP cannot guarantee the future availability of videos stored on these platforms. Furthermore, any
advertisements or other content hosted on these platforms (for example, suggested videos or by navigating to other videos hosted on the same site) are not within
the control or responsibility of SAP.

Beta and Other Experimental Features


Experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by
SAP at any time for any reason without notice. Experimental features are not for productive use. You may not demonstrate, test, examine, evaluate or otherwise use
the experimental features in a live operating environment or with data that has not been sufficiently backed up.
The purpose of experimental features is to get feedback early on, allowing customers and partners to influence the future product accordingly. By providing your
feedback (e.g. in the SAP Community), you accept that intellectual property rights of the contributions or derivative works shall remain the exclusive property of SAP.

Example Code
Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax
and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of
example code unless damages have been caused by SAP's gross negligence or willful misconduct.

Bias-Free Language
SAP supports a culture of diversity and inclusion. Whenever possible, we use unbiased language in our documentation to refer to people of all cultures, ethnicities,
genders, and abilities.

www.sap.com/contactsap

© 2024 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form
or for any purpose without the express permission of SAP SE or an SAP
affiliate company. The information contained herein may be changed
without prior notice.

Some software products marketed by SAP SE and its distributors
contain proprietary software components of other software vendors.
National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for
informational purposes only, without representation or warranty of any
kind, and SAP or its affiliated companies shall not be liable for errors or
omissions with respect to the materials. The only warranties for SAP or
SAP affiliate company products and services are those that are set forth
in the express warranty statements accompanying such products and
services, if any. Nothing herein should be construed as constituting an
additional warranty.

SAP and other SAP products and services mentioned herein as well as
their respective logos are trademarks or registered trademarks of SAP
SE (or an SAP affiliate company) in Germany and other countries. All
other product and service names mentioned are the trademarks of their
respective companies.

Please see https://www.sap.com/about/legal/trademark.html for
additional trademark information and notices.

