A Graph Framework for Describing Visual Compositions and Modelling Uncertainty.
Bridging the gap between flexible humanities research and high-performance data engineering.
This project is optimized for an interactive documentation experience.
- Github Pages: If you prefer the reader experience, please switch to Github Pages.
- Zenodo Users: For the full interactive experience, latest updates, and proper navigation, please switch to our GitHub Repository.
.
├── docs/ # Detailed technical specifications (Architecture etc.)
├── modules/
│ ├── 00_core/ # The domain-agnostic engine (Cypher Statements, Fixtures)
│ ├── 05_ontology/ # Multi-language support and semantic metadata (WIP)
│ └── 10_numismatics/ # Domain extension for ancient coinage (WIP)
└── README.md # This file
To investigate and search hundreds of thousands of coin datasets for our Imagines Nummorum project in real-time, we required a high-performance, scalable solution.
While CIDOC CRM remains the academic gold standard for linked data in cultural heritage, implementing it at this specific scale and complexity often results in systems that are practically unusable for high-speed querying. Furthermore, we needed a native way to model scientific uncertainty and expert reasoning — areas where current standards offer no satisfactory high-performance solutions.
IN.IDEA is our tool, not our goal. It was born out of the necessity to achieve a comprehensive iconographical and semantic indexing of ancient Greek coinage. We invite the community to join us on this 25-year mission (2025–2050) and to adapt or evolve our architecture for their own research needs.
IN.IDEA is designed to act as a semantic workbench for scholars in numismatics, archaeology, and visual studies. It transforms static image descriptions into a dynamic, multidimensional knowledge graph.
- Complex Iconographical Queries: Perform advanced searches that go beyond simple keywords. You can query specific scene structures, such as: "Show me all units where a deity (Entity) is holding (Relation) a weapon (Object)".
- Modeling the "Maybe": In the humanities, truth is rarely binary. IN.IDEA allows you to capture uncertainty by assigning certainty values (0.0–1.0) and documenting conflicting hypotheses or expert reasoning for every identification.
- Separating Sight from Meaning: Maintain a clean distinction between a neutral formal description (what is seen, the segment) and its epistemic interpretation (what it represents, the Platonic idea).
- Tracking Provenance & Methodology: Every statement is linked to an Agent (Human/AI), ensuring that the history of an interpretation is fully traceable and bibliographically grounded. Source References and Methological information enable detailled reasoning.
- Flexible Ontological Mapping: The framework is architecturally agnostic. While it serves as the foundation for the upcoming ThING (Thesaurus Iconographicus Nummorum Graecorum), you can model and plug in any hierarchical classification system or domain-specific ontology to meet your research requirements.
- Quantifying Similarity: Utilize the Hub-and-Spoke model to identify and quantify "diffuse" similarities between compositions that are difficult to capture through traditional direct edges.
- Modular Extension Framework: IN.IDEA is designed as a "lean", domain-agnostic core engine that can be easily extended to suit your specific research field. You can develop your own domain-specific modules or feature-rich extensions to add custom labels and properties without breaking the underlying four-layer logic.
IN.IDEA is engineered for production-grade reasoning, moving beyond "traditional" academic graphs to a robust, scalable architecture.
Instead of modelling facts as binary edges, IN.IDEA reifies the act of interpretation as a first-class node (Interpretation):
- Atomic Accountability: Every semantic claim is a node, not a property on an edge, allowing for complex metadata (Certainty, Reasoning, Methodology) without "Property-Graph-Bloat".
- Provenance Tracking: Every semantic link carries metadata:
Agent(Human/AI),Methodology,Certainty( to ), and aReasoning Statement. - Deterministic RAG: Provides a high-fidelity roadmap for LLMs, mitigating hallucinations by forcing the model to traverse explicitly modeled uncertainty layers.
Standard graph models often suffer from unpredictable traversal costs due to deep ontological recursions. IN.IDEA eliminates expensive :IS_A* lookups by materializing the hierarchy:
- O(1) Hierarchy Checks: Ancestral lookups in the concept tree are converted into simple array-membership checks using the
concept_path_idsproperty. - Path-Length Invariant: The distance from a physical
Unitto its semanticConceptis structurally capped (e.g., 4 for Entities, 6 for Events). - Performance Guarantee: By fixing traversal depth and materializing hierarchical paths, the model achieves near-constant O(1) vertical traversal complexity relative to the schema depth and O(log n) lookup efficiency, ensuring native compatibility with horizontally scalable graph engines.
To avoid the O(n²) "dense graph" trap where edges grow exponentially, IN.IDEA utilizes a Centroid-based Hub-and-Spoke model:
- Topological Compression: Diffuse similarity is managed via
CompositionParallelhubs rather than direct edges between compositions. - Resource Efficiency: This reduces edge density by orders of magnitude, allowing the system to handle many compositions on commodity hardware without index collapse.
IN.IDEA is designed as a strict Read-Optimized Projection of a normalized relational database:
- Integrity-First: The graph layer acts as a high-performance materialized view, ensuring ACID compliance and formal data integrity at the source.
- Traceability: Every node carries a unique identifier (
_id) mapped directly from the relational SSoT primary keys.
graph LR
%% Nodes
A(Concrete)
B(Abstract)
C(Object Layer)
D(Formal Analysis Layer)
E(Epistemic Layer)
F(Ontological Layer)
%% Edges
A ----> B
C --> D
D --> E
E --> F
The framework distinguishes between observation and interpretation through a strict four-layer logic:
| Layer | Focus | Key Nodes |
|---|---|---|
| I: Object | Physical/Virtual Object/Carrier | Unit |
| II: Formal Analysis | Structural Primitives (Neutral) | Composition, Entity, Relation ... |
| III: Epistemic | Uncertainty & Reasoning | Interpretation, Source, Agent, Comparison |
| IV: Ontological | Abstract Concepts (ThING) | Concept |
To understand the core logic and explore the graph, we recommend the following paths:
Before diving into the code, please review these documents to understand the epistemic and structural foundations:
- Architecture Overview: Learn about the four-layer logic, epistemic reification, and the performance-centric design philosophy.
- Core Fixtures: Understand how abstract concepts are instantiated. This document includes references to visualizations of the example scenarios.
For a live demonstration of the IN.IDEA framework, we provide a pre-configured Docker environment that automatically seeds the graph with our core logic and test data. Important Note: This Docker setup is intended for local demo and development purposes only. It is not hardened for production environments.
- Set up the Environment: if you need to change default password or ports, run
cp .env.example .envand edit the related keys. - Run the docker command:
docker-compose upand wait for the seeding to be completed - Access the Neo4j Browser: Open your browser and navigate to
http://localhost:7474(User: noe4j, PW: password).
- The seeder has already loaded the constraints, indices, and fixtures from the
modules/00_coredirectory. - You can immediately run the examples from queries.md in the Neo4j Browser.
- Reset: use
docker-compose down -vto revert any changes to the data
If you want to build some datasets on you own, we recommand reading at least how-to-annotate-using-in-idea.md and nodes-and-edges.md.
IN.IDEA is open-sourced software created and maintained by Jan Köster and licensed under the Apache 2.0 license for the Academy Project "Imagines Nummorum" at the Berlin-Brandenburg Academy of Sciences and Humanities. This project is part of the "Akademienprogramm", funded by German federal and state governments, which serves to preserve, secure and make present our cultural heritage. It is coordinated by the Union of German Academies of Sciences and Humanities.
Contact: For any IN.IDEA related topic we prefer direct communication on Github, for any contact to our initiative, see Contact
@software{koester_jan_2026_IN_IDEA,
author = {Köster, Jan},
title = {{IN.IDEA: A Graph Framework for Describing Visual
Compositions and Modelling Uncertainty}},
month = jan,
year = 2026,
publisher = {Zenodo},
version = {v0.9.3},
doi = {10.5281/zenodo.18160255},
url = {https://doi.org/10.5281/zenodo.18160255}
}
In alignment with the epistemological focus of this project, transparency regarding the creation process is paramount. This graph model was developed with the assistance of Google Gemini (v2.5 Flash and 3 Pro), which served as an interactive dialogue partner and research tool.
The AI's contribution included the following areas:
- Conceptual Brainstorming: Exploratory dialogue to refine the concept of the model.
- Best Practices Integration: Research and suggestions for adapting established patterns from graph theory and Neo4j.
- Coherence Validation: Reviewing and stress-testing the consistency of the terminology and structural logic used in the graph.
- Drafting Support: Generating initial outlines and structural sketches for the publication/documentation.
Note on Authorship: While the AI provided support as described above, all final decisions, code implementations, and text formulations were curated, verified, and finalized by the maintainer mentioned above. The AI acted solely as an assistive tool, not as an autonomous agent/co-author.
This framework is officially titled IN.IDEA. The name is a functional synthesis:
- IN: Relfects the primary scope of application (Imagines Nummorum)
- IDEA: An acronym for Iconographical Definitional Epistemological Architecture, describing the underlying logical framework for graph-based knowledge representation.
Note: This software is independent and not affiliated with any brand including the word or acronym idea. It is a non-commercial, research-driven, open-sourced software architecture.