Create a single source for metadata #6

Open
opened 2026-04-21 15:46:52 +00:00 by linarphy · 8 comments
Owner

Implement a metadata.toml? And integrate it so that pyproject gets its version from it, and tags are created from this file.

Author
Owner

Working on a Software Project Ontology to store metadata for software projects.
I already have a first draft (still missing some obvious properties).

Author
Owner

I should look at:

- https://github.com/codemeta/codemeta/issues/357:
  - https://softwareunderstanding.github.io/software_types/release/1.0.0/
- DOAP: https://github.com/ewilderj/doap/wiki
  - more resources: https://www.w3.org/wiki/SemanticWebDOAPBulletinBoard
  - rationale: https://web.archive.org/web/20041210114105/http://www-106.ibm.com/developerworks/xml/library/x-osproj.html
- SoftwareDescriptionOntology: https://knowledgecaptureanddiscovery.github.io/SoftwareDescriptionOntology/release/1.9.0/index-en.html
- OpenSource Metadata Framework: https://www.ibiblio.org/osrt/omf/template.html
  - reference: https://www.ibiblio.org/osrt/omf/omf_elements

Author
Owner

Focus on OpenSourceMetadata Framework:

It distinguishes Author, Maintainer and Contributor. I would prefer a `has_contributor` property linked to an `Actor` that can have one or more `has_role`, each being a value from a specific controlled vocabulary that would contain something like `translation`, `maintainer`, `add new feature`, etc.
For ideological reasons, I want to avoid something like `author`, which would make it seem like someone has to "own" a project, while I’m against the existence of private property (and even more against private property of an idea or a project!).
On second thought, if I go for an event-driven ontology, I do not need a `has_role` associated with an `Actor`. I can have `Event`s that create, update or delete `Task`s, which would then have `has_category`? I can still have a partially event-driven ontology to keep a simpler way to store contributor information.
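
A minimal sketch of the non-event-driven variant, assuming the role vocabulary is a closed enumeration. `Role`, `Actor` and `Project` are illustrative Python stand-ins for the ontology classes, and the `identifier` field and the sample values are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Hypothetical controlled vocabulary for contributor roles."""

    TRANSLATION = "translation"
    MAINTAINER = "maintainer"
    ADD_NEW_FEATURE = "add new feature"


@dataclass
class Actor:
    name: str
    has_role: set[Role] = field(default_factory=set)


@dataclass
class Project:
    identifier: str
    has_contributor: list[Actor] = field(default_factory=list)


project = Project("example-project")
project.has_contributor.append(Actor("Y", {Role.MAINTAINER, Role.TRANSLATION}))


def is_known_role(value: str) -> bool:
    """Return True only if the value belongs to the controlled vocabulary."""
    try:
        Role(value)
    except ValueError:
        return False
    return True


print(is_known_role("maintainer"))
print(is_known_role("author"))
```

There is deliberately no `author` entry: a value outside the vocabulary is simply rejected.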

Title would be `has_preferred_identifier` for me (coming from CIDOC-CRM).

Date is quite problematic as it only gives information about the "last modification". I would prefer having an event-driven ontology (where one could store information in a way like `"Add feature X" has_actor "Y"`, `"Add feature X" has_date "2000-01-01"` and `"Add feature X" is_after "Add feature W"`, or `"12th observation of the project" has_date "2000-01-01"`, `"12th observation of the project" change_governance_state "Abandoned"`).
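
The statements above can be sketched as plain (subject, predicate, object) triples. This is only an illustration with Python tuples, not an RDF serialization; the date attached to "Add feature W" is invented for the example. It shows that the "last modification" date and the set of contributors fall out of the events instead of being stored on the project:

```python
from datetime import date

# Each statement is a (subject, predicate, object) triple.
# The date of "Add feature W" is a made-up value for illustration.
triples = [
    ("Add feature W", "has_actor", "Y"),
    ("Add feature W", "has_date", date(1999, 12, 1)),
    ("Add feature X", "has_actor", "Y"),
    ("Add feature X", "has_date", date(2000, 1, 1)),
    ("Add feature X", "is_after", "Add feature W"),
]


def last_modification(statements):
    """Derive the most recent date from all dated events."""
    return max(o for _, p, o in statements if p == "has_date")


def contributors(statements):
    """Derive the contributors from the actors of all events."""
    return {o for _, p, o in statements if p == "has_actor"}


print(last_modification(triples))
print(contributors(triples))
```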

Version is related to Date: in an event-driven ontology, it would not be a property of the project itself but the result of an event. It needs a reasoner though, so maybe this adds too much complexity for little gain?
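
To give an idea of the extra work this implies, here is a small sketch that infers the current version from release events ordered only by `is_after`. The event names and the `produces_version` predicate are assumptions, and the function is a hand-written stand-in for what a real reasoner would do:

```python
# Hypothetical release events; "produces_version" is an assumed predicate.
triples = [
    ("Release A", "produces_version", "1.0.0"),
    ("Release B", "produces_version", "1.0.1"),
    ("Release C", "produces_version", "1.1.0"),
    ("Release B", "is_after", "Release A"),
    ("Release C", "is_after", "Release B"),
]


def current_version(statements):
    """Return the version produced by the only event that no other event follows."""
    versions = {s: o for s, p, o in statements if p == "produces_version"}
    # An event that another event "is_after" cannot be the latest one.
    superseded = {o for _, p, o in statements if p == "is_after"}
    latest = [event for event in versions if event not in superseded]
    if len(latest) != 1:
        raise ValueError("the event order does not single out one latest release")
    return versions[latest[0]]


print(current_version(triples))
```

Even this toy version has to handle an ambiguous ordering, which hints at the complexity cost compared with a plain `version` property.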

Subject and Keywords: I need to use a taxonomy/controlled vocabulary, at least for subject… Is that a new issue in itself?

Resource-type: Not using subject for the type of the resource itself is interesting… But how to characterize the different resources? I don’t think it’s one-dimensional. Related to the Subject and Keywords task. I first need to find a decent dimensionality to characterize the type of a software project. I need to define the limit of the ontology here. If I try to go too deep, I’ll lose time and make things too complex. Aaand… I read it again and realized I had misunderstood. It’s about resources linked to a project. I think I should not include resources outside of the project (or of other projects), and should consider these resources as part of the project.

Format: Not needed if I follow the last idea.

Resource identifier can be another `has_preferred_identifier` (or title can just be a `has_identifier` and I would have a unique preferred identifier?).

Language is a hard part. Should I focus on it later? Or just avoid it, but it IS metadata about the project… A hard part indeed.

Relation is not necessary.

I find coverage too precise. It still raises a good point. Can a project have "geographic specificity"? Distribution and kernel are focused on GNU/Linux, not sure if they are needed? Maybe an optional parameter of an "OS" entity? OS and architecture are important, I think.

Rights management seems too complex; just having an SPDX license identifier should be fine.

Author
Owner

Focus on The Software Description Ontology

It’s more modern and it’s related to codemeta (and I will need to work with codemeta for a project that I do for my lab -> OSCARS). Nice. It has a graph representation: NICCCEEEE.

Still an author… I can remove it though (or make it "null" by default?). Same for publisher. However, information about the current mirrors that publish the project/make it available is important. I must not conflate where the source code/artifacts/binaries are with where the project is stored; we only talk about metadata here.

Contact person is good metadata, but I would prefer Contact information or something that is not directly tied to one person/org. I mean, we want to contact a human in the end, so "contact person" is not that bad. It just needs to have other attributes.

It takes funding into account, nice to have.

I don’t understand what a "Visualization" is.

The other things seem to be about the software features/functionalities, not interesting for what I need.

Author
Owner

Focus on DOAP

It has an RDF file to download and edit with Protégé! Nice.

Will do it later.

Off-topic, but https://web.archive.org/web/20231203233338/https://www.lespetitescases.net/les-technos-du-web-semantique-ont-elles-tenu-leurs-promesses is very interesting (found it randomly while searching for "blogs that would use rdf with BlogEd" (so a random subject too \o/))

Author
Owner

DOAP seems to "force" some miscellaneous choices, like "repo type", which are not useful for what I want. The repo belongs to a deeper information layer of the project. Same for "online account".

I don’t like foaf for a simple reason: the fact that properties like "jabber ID", "Skype ID", etc. exist, which is not dynamic at all (and didn’t age well). There is "OnlineAccount", which is more general. So maybe I should still use it. It’s really one of the most used ontologies right now.
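
A small sketch of why the generic form ages better, assuming one record type that loosely mirrors FOAF's `OnlineAccount` with its `accountServiceHomepage` and `accountName` properties. The Python class, the field names and the sample values are illustrative only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnlineAccount:
    # Loosely mirrors foaf:accountServiceHomepage and foaf:accountName.
    service_homepage: str
    account_name: str


# Hypothetical accounts: a new service only needs a new value, not a new property.
accounts = [
    OnlineAccount("https://chat.example", "y"),
    OnlineAccount("https://forge.example", "y-dev"),
]


def accounts_on(service_homepage: str) -> list[str]:
    """List the account names registered on a given service."""
    return [a.account_name for a in accounts if a.service_homepage == service_homepage]


print(accounts_on("https://forge.example"))
```

With one property per service ("jabber ID", "Skype ID"), every new or dead service would instead require a change to the vocabulary itself.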

It characterizes the user base, which is an interesting separation to make when considering the project "topic". I should focus on working out the minimal number of dimensions needed to define a "topic" for my use case (which is: describe a project so that it’s easy to quickly understand what it is about).

There are "security policy" and "security contact"; are they interesting for my case?

It’s quite similar to The Software Description Ontology, and is as interesting.

If I do my own, I should definitely create mappings for these two ontologies.

Author
Owner

I created a repository to continue the work on this ontology: https://git.linarphy.net/linarphy/software-project-metadata-ontology

Author
Owner

This is too much work for now, delaying until 1.1.0.

Reference
linarphy/galaxy-graph#6