MVFR: Multilingual Visual Font Recognition Synthetic Dataset

With advances in deep learning and computer vision, the field of Visual Font Recognition (VFR) has evolved rapidly. From browser extensions to mobile and web apps, several efficient systems now exist for identifying fonts from images. However, progress for languages other than English has been limited, largely because of insufficient data. To address this gap, we have created a synthetic image dataset for VFR covering four languages: Bangla, Hindi, Russian, and Spanish. Each language has a dedicated folder with 10 subfolders (one per font) containing 5,000 images each, for a total of 200,000 images. We also provide the Python generator script used to create the dataset. It can generate synthetic VFR image data for additional languages, helping to advance VFR research in low-resource languages.
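Because each language folder holds one subfolder per font, the images can be turned into labelled samples by walking the directory tree. The following is a minimal sketch, not part of the published generator. It assumes the layout `<root>/<language>/<font>/<image>` and the image extensions listed in `IMAGE_EXTENSIONS`. The helper name `index_dataset` is hypothetical, and the actual folder and file names in the Mendeley Data release may differ.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: images are stored with one of these extensions; adjust to
# match the files actually present in the downloaded dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def index_dataset(root: str) -> Tuple[List[Tuple[Path, str, str]], List[str]]:
    """Walk <root>/<language>/<font>/<image> and return (samples, class_names).

    Hypothetical helper. Each sample is (image_path, language, font).
    class_names lists every "language/font" pair in sorted order so it can
    serve as a label index. Files that are not at the assumed depth, or that
    lack an image extension, are ignored.
    """
    root_path = Path(root)
    samples: List[Tuple[Path, str, str]] = []
    for language_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        for font_dir in sorted(p for p in language_dir.iterdir() if p.is_dir()):
            for image_path in sorted(font_dir.iterdir()):
                if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
                    samples.append((image_path, language_dir.name, font_dir.name))
    # Combine language and font so identically named fonts in different
    # languages still map to distinct classes.
    class_names = sorted({f"{lang}/{font}" for _, lang, font in samples})
    return samples, class_names
```

For example, `samples, classes = index_dataset("path/to/MVFR")` followed by `label_of = {name: i for i, name in enumerate(classes)}` gives an integer label per language/font pair (the path is a placeholder). Using `language/font` as the class name keeps fonts from different languages distinct even if two languages share a font name.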

The dataset has been published in IEEE Data Descriptions and can be accessed from IEEE Xplore.
The dataset can also be accessed directly from Mendeley Data.

Value of the Data

The MVFR dataset is a comprehensive synthetic dataset and a valuable resource for researchers working in the Visual Font Recognition domain. Researchers and developers can use it to explore deep learning architectures that recognize visual font styles.
The dataset contains a total of 200,000 images across 4 languages. For each language, we employed 10 popular fonts and generated images of 5,000 distinct common words in each font, so each language has 50,000 images. This large collection can be used to benchmark and compare the performance of models ranging from traditional approaches to advanced deep learning architectures.
This is the first-ever large open-source VFR dataset for these languages. Researchers can use it to develop visual font recognition tools and applications for each language, e.g., browser extensions and mobile apps.
Beyond VFR, the dataset can also support other computer vision applications; for instance, researchers can use it to experiment with optical character recognition (OCR) across diverse font styles.
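Since every font contributes the same number of images, a split that is stratified by font keeps all classes equally represented when benchmarking models. The sketch below shows one way to do this with the Python standard library. It is not a split defined by the dataset authors; the helper name `stratified_split`, the 80/10/10 ratios, and the seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    items: Sequence[T],
    labels: Sequence[Hashable],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed train/val/test ratios
    seed: int = 42,  # assumed seed, fixed for reproducibility
) -> Dict[str, List[T]]:
    """Split items into train/val/test so each label keeps the same proportions.

    Hypothetical helper. Items are grouped by label, each group is shuffled
    with a seeded RNG, and then cut according to `ratios`; any rounding
    remainder in a group goes to the test split.
    """
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)

    rng = random.Random(seed)
    splits: Dict[str, List[T]] = {"train": [], "val": [], "test": []}
    # Iterate labels in a stable order so the same seed gives the same split.
    for label in sorted(groups, key=str):
        group = groups[label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        splits["train"].extend(group[:n_train])
        splits["val"].extend(group[n_train:n_train + n_val])
        splits["test"].extend(group[n_train + n_val:])
    return splits
```

For MVFR, `items` could be image paths and `labels` the font (or `language/font`) of each image. With 5,000 images per font, the default ratios yield 4,000 training, 500 validation, and 500 test images per font.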

Sample Data Instances

Citation

@ARTICLE{10680521,
  author={Tonmoy, Moshiur Rahman and Adnan, Akhtaruzzaman and Saha, Aloke Kumar and Mridha, M. F. and Dey, Nilanjan},
  journal={IEEE Data Descriptions}, 
  title={Descriptor: Multilingual Visual Font Recognition Dataset (MVFR)}, 
  year={2024},
  volume={1},
  number={},
  pages={1-6},
  doi={10.1109/IEEEDATA.2024.3460768}}
