Published September 18, 2023 | Version 1.2
Dataset
Open
SCI-3000: A Novel Dataset for the Task of Figure, Table and Caption Extraction from Scientific PDFs
Description
This dataset contains bounding boxes for figures, tables, and captions on 34,791 pages extracted from 3,000 open-access scientific publications in the fields of medicine, chemistry, physics, computer science, and technology. The underlying publications are also included as PDF files.
For more details, refer to the README file.
Files
SCI-3000-full.zip
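The internal layout of the archive is not described on this page (the README is the authoritative source), so a reasonable first step after downloading is to take an inventory of its contents. The following is a minimal sketch that counts archive members by file extension using only the Python standard library. The function name `count_extensions` and the local path are illustrative assumptions, and nothing here presumes a particular annotation format.

```python
import os
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(archive_path):
    """Return a Counter mapping lowercase file extension to member count."""
    counts = Counter()
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts


# Assumption: the archive was downloaded into the current working directory.
ARCHIVE = "SCI-3000-full.zip"

if os.path.exists(ARCHIVE):
    for extension, count in count_extensions(ARCHIVE).most_common():
        print(f"{extension}\t{count}")
```

The extension counts should make it easy to tell the PDF files apart from the annotation files before consulting the README for the exact schema.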
Additional details
Related works
- Is supplement to: Conference paper, 10.1007/978-3-031-41676-7_14 (DOI)