Drobac et al., 2023 - Google Patents
An OCR pipeline for transforming parliamentary debates into linked data: Case ParliamentSampo–Parliament of Finland on the semantic webDrobac et al., 2023
View PDF- Document ID
- 7622086608060185815
- Author
- Drobac S
- Sinikallio L
- Hyvönen E
- Publication year
- Publication venue
- Digital Humanities in the Nordic and Baltic Countries Publications
External Links
Snippet
This paper presents the OCR pipeline created for ParliamentSampo-Parliament of Finland on the Semantic Web, a Linked Open Data (LOD) service, data infrastructure, and semantic portal for studying Finnish political culture, language, and networks of the Members of …
- 230000001131 transforming 0 title description 5
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2288—Version control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/30707—Clustering or classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30908—Information retrieval; Database structures therefor; File system structures therefor of semistructured data, the undelying structure being taken into account, e.g. mark-up language structure data
- G06F17/30914—Mapping or conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30011—Document retrieval systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F17/30 and subgroups
- G06F2216/11—Patent retrieval
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Atkins et al. | Corpus design criteria | |
Oostdijk et al. | The construction of a 500-million-word reference corpus of contemporary written Dutch | |
Surdeanu | Overview of the TAC2013 Knowledge Base Population Evaluation: English Slot Filling and Temporal Slot Filling. | |
Lapponi et al. | The Talk of Norway: a richly annotated corpus of the Norwegian parliament, 1998–2016 | |
US8201085B2 (en) | Method and system for validating references | |
Papadopoulos et al. | The IMPACT dataset of historical document images | |
Drobac et al. | An OCR pipeline for transforming parliamentary debates into linked data: Case ParliamentSampo–Parliament of Finland on the semantic web | |
Sinikallio et al. | Plenary debates of the Parliament of Finland as linked open data and in Parla-CLARIN markup | |
CN110609983A (en) | Structured decomposition method for policy file | |
Maarand et al. | A comprehensive comparison of open-source libraries for handwritten text recognition in Norwegian | |
Vuković | Representing variation in a spoken corpus of an endangered dialect: the case of Torlak | |
Rubinstein | Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew | |
CN110083654A (en) | A kind of multi-source data fusion method and system towards science and techniques of defence field | |
Galibert et al. | Extended named entity annotation on ocred documents: From corpus constitution to evaluation campaign | |
Hladká et al. | Compiling Czech parliamentary stenographic protocols into a corpus | |
Milioni | Automatic transcription of historical documents: Transkribus as a tool for libraries, archives and scholars | |
Tolonen et al. | Scaling Up Bibliographic Data Science. | |
Thammarak et al. | Automated data digitization system for vehicle registration certificates using google cloud vision API | |
Booth et al. | BLN600: A parallel corpus of machine/human transcribed nineteenth century newspaper texts | |
Boenig et al. | Labelling OCR Ground Truth for Usage in Repositories | |
Rydberg-Cox | Digitizing Latin incunabula: Challenges, methods, and possibilities | |
CN117591571A (en) | Intelligent document writing system for assisting writing | |
Smith et al. | Analytical developments for the Homer Multitext: palaeography, orthography, morphology, prosody, semantics | |
Stewart et al. | A new generation of textual corpora: mining corpora from very large collections | |
Repolusk et al. | The KuiSCIMA Dataset for Optical Music Recognition of Ancient Chinese Suzipu Notation |