8000 feat(write): OWL by jdreo · Pull Request #412 · biocypher/biocypher · GitHub
[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(write): OWL #412

Merged
merged 19 commits into from
Mar 5, 2025
Merged

feat(write): OWL #412

merged 19 commits into from
Mar 5, 2025

Conversation

jdreo
Copy link
Collaborator
@jdreo jdreo commented Feb 6, 2025

A writer able to output a self-contained populated OWL file:

  • Based on RDFWriter.
  • Embbed the input ontology within the output one.
  • Using the input's namespaces.
  • Allows two edge models:
    • using owl:ObjectProperty, but losing edge properties,
    • or using intermediate owl:Class, keeping edge properties.

Internal changes:

  • Adds get_rdf_graph to Ontology.
  • Rename _rdf:subject_to_uri to to_uri.
  • Adds a compatible as_uri to deal with explicit namespaces.
  • Adds find_uri for terms with unknown namespaces.

@jdreo jdreo force-pushed the feat-write-owl branch 2 times, most recently from d91b716 to 257e6f6 Compare February 12, 2025 09:18
A writer able to output a self-contained populated OWL file:
- Based on RDFWriter.
- Embbed the input ontology within the output one.
- Using the input's namespaces.
- Allows two edge models:
    - using owl:ObjectProperty, but losing edge properties,
    - or using intermediate owl:Class, keeping edge properties.

Internal changes:
- Adds `get_rdf_graph` to Ontology.
- Rename _rdf:`subject_to_uri` to `to_uri`.
- Adds a compatible `as_uri` to deal with explicit namespaces.
- Adds `find_uri` for terms with unknown namespaces.

- Adds a documentation.
- Adds a test, following the RDF one.
@jdreo jdreo marked this pull request as ready for review February 12, 2025 09:33
@jdreo
Copy link
Collaborator Author
jdreo commented Feb 12, 2025

From what I can see, failing CI tests are in another module than the ones impacted by this PR, so I'm asking for a review anyway.

@slobentanzer
Copy link
Collaborator

I think this failure is formatting-related; have you run the pre-commit locally? :)

@jdreo jdreo mentioned this pull request Feb 12, 2025
11 tasks
@jdreo
Copy link
Collaborator Author
jdreo commented Feb 12, 2025

I have, yes, and it's passing.

@slobentanzer
Copy link
Collaborator

Wasn't passing for me, how were you running it? There was a duplicated log statement causing some module-level imports to not be at the top, which is what ticked off Ruff.

Copy link
Collaborator
@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition, thanks!

(Feel free to ignore the partial CI failure, the GitHub runner for Windows is acting up on the download tests, this will be fixed in time.)

Based on a superficial review, I like the changes, but I also saw several comments in the code that make me suspect there is some procedural or philosophical issue we may need to address; yet you have not marked this in your PR comment I think. Would you mind elaborating on how you see the various open problems that you indicated and what should be the practical implications? Pragmatically, what should we attempt to fix within this PR?

Maybe @LoesvdBiggelaar also wants to have a look?

@jdreo
Copy link
Collaborator Author
jdreo commented Feb 13, 2025

I forgot to remove some FIXME tags here and there, I will do a pass.

I am currently trying to summarize my thoughts regarding the architecture, and I felt this was going above this specific PR. Where should I post them?

@slobentanzer
Copy link
Collaborator

Thanks, all makes sense to me. I see no problems with merging the (maybe cleaned up) PR here to not keep it around for too long. As indicated in some of my other answers, I think a Discussion thread would be good to talk about the complexity / philosophy issues. Individual issues and PRs should then be created for the things we prioritise. D'accord? :)

@jdreo
Copy link
Collaborator Author
jdreo commented Feb 13, 2025

Agreed. As a general comment regarding the complexity of ontologies, I generally think that the "world modeling" ontologies are too complex, but I also refrain from being too judgmental of their modeling approach, since they also thought hard about how to solve hard problems that can be very close to ours (like how to model uncertainties on associations, if I follow the BioLink vocabulary).

Copy link
Collaborator
@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdreo this looks good, but the windows runner fails with

FAILED test/output/write/graph/test_owl.py::test_owl_write_data[4] - UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 331719: character maps to <undefined>

I don't have a windows machine to debug; you can probably find more info in the failed CI action's logs. Let me know if I can be of help.

Copy link
Collaborator
@LoesvdBiggelaar LoesvdBiggelaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great feature!

@jdreo
Copy link
Collaborator Author
jdreo commented Mar 3, 2025

@jdreo this looks good, but the windows runner fails with

FAILED test/output/write/graph/test_owl.py::test_owl_write_data[4] - UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 331719: character maps to <undefined>

I don't have a windows machine to debug; you can probably find more info in the failed CI action's logs. Let me know if I can be of help.

I don't have access to a Windows machine, either. And the character is nowhere to be found in any test file. The error occurs in RDFlib, when parsing the ontology (test_owl.py:51), when trying to decode from CP1252 (a Windows encoding), which does not have a \x9D character.

Maybe there is a config somewhere in the OS/Python interpreter, to default file encodings to UTF-8? I have absolutely no clue about how to use Windows (my last use of it was in 2002, if you can believe it).

@slobentanzer
Copy link
Collaborator

my last use of it was in 2002, if you can believe it

good for you :D

Maybe there is a config somewhere in the OS/Python interpreter, to default file encodings to UTF-8?

I can throw it at my various AI assistants and see what comes up.

jdreo added 2 commits March 4, 2025 08:30
- 'rdf_format' marked as deprecated.
- Simple renaming as 'file_format'.
- Used in RDF and OWL writers.
- Adds deprecation warnings.

BREAKING CHANGE: file_format replaces rdf_format
Copy link
Collaborator
@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdreo nice work! I managed to fix the windows issue with a little help from my generative friend.

I will fix some minor things we noted, like allowing ttl, and will make sure to migrate your docs changes to the new mkdocs version we have created in parallel; then merge. :)

@slobentanzer slobentanzer linked an issue Mar 4, 2025 that may be closed by this pull request
This was linked to issues Mar 4, 2025
@jdreo
Copy link
Collaborator Author
jdreo commented Mar 5, 2025

As a side note, the diagrams in the OWL doc would look better, would this PR be merged in Pygments: pygments/pygments#2869 and the related tag be added to the code block (this would also ease the CSS fix).

- Add 'ttl' to the list of supported RDF formats
- Remove conversion from 'ttl' to 'turtle' as 'ttl' is directly supported by rdflib
- Add tests for both RDF and OWL writers with 'ttl' format
- Update docs to reflect addition

fixes Allow `ttl` as format spec biocypher#422
slobentanzer added a commit that referenced this pull request Mar 5, 2025
fixes Migrate new OWL docs to `mkdocs` version #423
Copy link
Collaborator
@slobentanzer slobentanzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added ttl as valid format and updated the new docs on mkdocs-migration to include the additions and changes made here. The docs on that branch are the default now, please introduce any further changes there; will be merged soon and replace the Sphinx docs.

Again, thanks @jdreo for the nice addition, and thanks @LoesvdBiggelaar for reviewing as well. :)

Merging and bumping minor version now. 🎉

@slobentanzer slobentanzer merged commit 2054ca5 into biocypher:main Mar 5, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate new OWL docs to mkdocs version Allow ttl as format spec [Bug]: RDF support
3 participants
0