Add new PURL type: 'git' (for generic git repositories)#823

Open
darakian wants to merge 11 commits into package-url:main from darakian:add-git-purl-type

Conversation

@darakian
Contributor

@darakian darakian commented Mar 4, 2026

This PR adds a purl type which is intended to be used for project source code stored in a git repository. resolves #780

In its current form it mimics the swift type definition for the namespace and name definitions. Version is defined as a git reference based on the conversation in #780 Ref: https://git-scm.com/book/en/v2/Git-Internals-Git-References

Happy to iterate, but wanted to get this up 👍

@darakian
Contributor Author

darakian commented Mar 4, 2026

@pombredanne I was going to add a human doc file as well in the types-doc/ folder, but the headers on those all mention that they're auto generated, so.... I skipped that. Happy to write up some human prose elsewhere.

@mjherzog
Member

mjherzog commented Mar 4, 2026

@darakian The types-doc folder is currently reserved for the auto-generated type documentation, but we do need a place to put additional human friendly documentation at the PURL type level.
See also:

It would be helpful if you draft and share a markdown file with the additional documentation that you have in mind.

@alilleybrinker alilleybrinker left a comment

A couple of minor thoughts.

Comment thread types/git-definition.json
"$id": "https://packageurl.org/types/git-definition.json",
"type": "git",
"type_name": "Git",
"description": "Git-based source packages",

Nit: would "source repositories" be clearer / more accurate than "source packages"?

Contributor Author

It could be. I was trying to stay with the package verbiage to align with the rest of the package url types, but I'm not opposed to swapping. @pombredanne do you have a preference one way or another?

Comment thread types/git-definition.json Outdated
"version_definition": {
"requirement": "optional",
"native_name": "A git reference",
"note": "The version is a git reference. Ideally a commit or tag."

Is there a mechanism to explicitly link to the documentation for git references? Just to be maximally clear what "git reference" means in this context.

Contributor Author

That's a good call out. It looks like a few of the types have url references in their type definitions, so I added 8c9402c

@darakian
Contributor Author

darakian commented Mar 4, 2026

@mjherzog I take it that there's no specific place to add those docs yet? Can do on drafting them, but where should I create the file?

@jkowalleck jkowalleck added the PURL type: new Register a new PURL type label Mar 5, 2026
Comment thread types/git-definition.json
"note": "The version is a git reference (https://git-scm.com/book/en/v2/Git-Internals-Git-References). Ideally a commit or tag."
},
"examples": [
"pkg:git/codeberg.org/forgejo/forgejo/@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md",
Member

Isn't this supposed to be exactly the (canonical) Git clone URL? If so, why is the .git suffix being omitted in this example?

Similarly, shouldn't a trailing / be omitted?

Contributor Author

Interestingly enough I can't find a primary source reference talking about the .git suffix. The best I was able to come across in searching is this stack overflow question where there is an assertion (without reference) that a trailing .git is a naming convention
https://stackoverflow.com/questions/8686691/what-does-the-git-mean-in-a-git-url
the primary git reference shows them off in examples but does not seem to explain them directly
https://git-scm.com/docs/git-clone.html#_git_urls

Testing locally (and also with the trailing /) I see some differences but they all come down to which url is populated in the local .git dir.
ex. when cloning this repo the three different ways I see

jon~/g/randos❯❯❯ diff test-1/.git/config test-2/.git/config
9c9
< 	url = https://github.com/package-url/purl-spec
---
> 	url = https://github.com/package-url/purl-spec/
...
jon~/g/randos❯❯❯ diff test-1/.git/config test-3/.git/config
9c9
< 	url = https://github.com/package-url/purl-spec
---
> 	url = https://github.com/package-url/purl-spec.git

For the purposes of identifying the actual code they all seem equivalent. I suppose there's a choice for us: deviate from what is allowed in the world of git and git tools and force a normalization, or align with the git world for ease of interoperability. Based on the conversation in #780 I believe the preference is to align with the norms of the git world.

That said, looking into the git url doc did make me think that the namespace and name definition should be altered slightly to align with how git urls are defined upstream.
See: 88e820a

@andrew you don't have any insight into .git and trailing /s do you?

Member

Interestingly enough I can't find a primary source reference talking about the .git suffix.

I believe its (conventional) use is simply hosting-platform-specific. I'm simply looking at the platform's "copy URL to clipboard" UI to see if that platform prefers to have a ".git" suffix in the URL or not.

So I agree that we can't have a general rule here, but for the scope of the examples, which name concrete platforms, IMO we should adhere to the preference of that platform.

Contributor

I believe the .git suffix comes from the original git-http-backend CGI script, which served repositories over HTTP by mapping the URL path directly to the filesystem.

As far as I know all the major forges support both with and without .git when cloning

Contributor

It is commonly removed by forges, not Git itself, meaning it's not possible to tell for any given Git URL whether the .git is optional or not.

Forgejo: https://codeberg.org/forgejo/forgejo/src/commit/df79ccf7d8b69f63b7cb66d340e26ce1e3e79c89/routers/web/repo/githttp.go#L62
GitLab: https://gitlab.com/gitlab-org/gitlab/-/blob/358f46317b3613756ee6c471379d4cbdc33d9197/config/routes/git_http.rb#L58-68

Member

@sschuberth would you like to propose an example to add/edit?

Well, any Git clone URL of a GitHub project would do, I guess, so for example:

pkg://git/github.com/oss-review-toolkit/ort.git@86c6b09b93db996689735d1eaabaa86a1051a319#cli/build.gradle.kts

Member

@sschuberth sschuberth Mar 10, 2026

However, if we give such an example, we should probably make clear that this is not the "canonical form" of a GitHub PURL, and people should prefer pkg://github/... instead.

Contributor

pkg:git/projects.blender.org/blender/blender.git and pkg:git/projects.blender.org/blender/blender are equivalent, but PURL implementations cannot be expected to know.
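
To make the point concrete: a plain string comparison treats the two PURLs as different packages. The sketch below uses a hypothetical helper, `repo_key`, that strips a trailing `/` and a trailing `.git` before comparing. That stripping is an assumption about forge behavior, not something Git or the PURL spec guarantees, so it is a heuristic at best.

```python
def repo_key(purl: str) -> str:
    """Return a heuristic comparison key for a pkg:git PURL (hypothetical helper)."""
    # Drop the subpath (after '#'), qualifiers (after '?'), and version (after '@').
    body = purl.split("#", 1)[0].split("?", 1)[0].split("@", 1)[0]
    # ASSUMPTION: a trailing '/' and a trailing '.git' do not change which
    # repository is meant. Forges commonly behave this way, but nothing
    # guarantees it for an arbitrary host.
    body = body.rstrip("/")
    if body.endswith(".git"):
        body = body[: -len(".git")]
    return body


with_suffix = "pkg:git/projects.blender.org/blender/blender.git"
without_suffix = "pkg:git/projects.blender.org/blender/blender"

print(with_suffix == without_suffix)                      # exact string match
print(repo_key(with_suffix) == repo_key(without_suffix))  # heuristic match
```

A tool that relies on such a key can report false matches on hosts where the two paths are distinct repositories, which is why it cannot be a general rule.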

Contributor Author

@sschuberth how do you feel about the example (stolen from @matt-phylum) added in 8b1d9f0 ?

As for the advice about preferring the github type over the more primitive git type; totally agree. That's probably something to add to the human readable doc that @mjherzog mentioned. Maybe that's a follow up PR to add that though? I'm not sure there's a paved path for that doc yet (please correct me if I'm wrong).

Member

@sschuberth how do you feel about the example (stolen from @matt-phylum) added in 8b1d9f0 ?

Fine with me.

As for the advice about preferring the github type over the more primitive git type; totally agree. [...] Maybe that's a follow up PR to add that though?

Fine with me as well to do that as a follow-up; we just should not forget about giving that advice.

sschuberth added a commit to oss-review-toolkit/ort that referenced this pull request Mar 5, 2026
This prematurely adds support for the upcoming PURL type for generic Git
repositories [1] to avoid an empty PURL being created from the `id` if
the VCS host cannot be determined.

[1]: package-url/purl-spec#823

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
@matt-phylum
Contributor

Git supports multiple different protocols, with HTTP over TLS now being the most common. Can a Git PURL be constructed for:

  • A filesystem path, which may point to a network resource on a company LAN?
  • HTTP without TLS?
  • HTTPS on a different port?
  • SSH?
  • SSH on a different port?
  • The Git protocol (eg git://git.kernel.org/pub/scm/bluetooth/bluez.git)?

@darakian
Contributor Author

@matt-phylum transports were mostly discussed synchronously in the purl community meetings (though I did mention it in #780 (comment) ), but transports are out of scope. The goal with this type is to describe content rather than access method.
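
As a rough illustration of "content rather than access method", the sketch below reduces URL-style clone addresses to a single `pkg:git` PURL by dropping the scheme, user info, and port. `purl_from_clone_url` is a hypothetical helper. The `https://` and `ssh://` variants of the kernel.org address, and the port number, are made up for the example. Whether a non-default port belongs in the PURL is not settled in this thread, so dropping it here is an assumption. Percent-encoding is not handled.

```python
from urllib.parse import urlsplit


def purl_from_clone_url(url: str) -> str:
    """Build a pkg:git PURL from a URL-style clone address (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.hostname:
        # Plain filesystem paths and scp-like "user@host:path" addresses have
        # no URL host component, so this sketch does not cover them.
        raise ValueError(f"no host found in {url!r}")
    # ASSUMPTION: scheme, user info, and port only describe how to reach the
    # repository, so they are dropped.
    return f"pkg:git/{parts.hostname}{parts.path}"


addresses = [
    "git://git.kernel.org/pub/scm/bluetooth/bluez.git",
    "https://git.kernel.org/pub/scm/bluetooth/bluez.git",         # hypothetical variant
    "ssh://git@git.kernel.org:2222/pub/scm/bluetooth/bluez.git",  # hypothetical variant
]
purls = {purl_from_clone_url(address) for address in addresses}
print(purls)
```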

Comment thread types/git-definition.json
"note": "The version is a git reference (https://git-scm.com/book/en/v2/Git-Internals-Git-References). Ideally a commit or tag."
},
"examples": [
"pkg:git/codeberg.org/forgejo/forgejo/@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md",

This example uses the #, followed by a path to a file in the repository, but it doesn't look like this behavior is specified in this document.

Also, we likely ought to specify / explain in whatever documentation is made for this type that the path part must be stripped if you're going to git clone a repository.
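
A minimal sketch of that stripping step, using the example PURL from the snippet above. `split_git_purl` and `clone_url` are hypothetical helpers, not part of any PURL library. The `https://` prefix is an assumption, since the type does not define a transport. Percent-decoding and validation are left out.

```python
from typing import NamedTuple, Optional


class GitPurlParts(NamedTuple):
    repository: str          # host and path, kept exactly as written in the PURL
    version: Optional[str]   # git reference, ideally a commit or tag
    subpath: Optional[str]   # path inside the repository; not part of the clone URL


def split_git_purl(purl: str) -> GitPurlParts:
    """Split a pkg:git PURL into repository, version, and subpath."""
    prefix = "pkg:git/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a pkg:git PURL: {purl!r}")
    rest = purl[len(prefix):]
    rest, _, subpath = rest.partition("#")      # subpath follows '#'
    rest, _, _qualifiers = rest.partition("?")  # qualifiers are ignored here
    repository, _, version = rest.partition("@")
    return GitPurlParts(repository, version or None, subpath or None)


def clone_url(parts: GitPurlParts) -> str:
    # ASSUMPTION: HTTPS is used; the PURL itself says nothing about transport.
    return "https://" + parts.repository


parts = split_git_purl(
    "pkg:git/codeberg.org/forgejo/forgejo/"
    "@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md"
)
print(clone_url(parts))  # address to clone, without version or subpath
print(parts.version)     # reference to check out after cloning
print(parts.subpath)     # file to look at inside the checkout
```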

Contributor Author

I'm not opposed to expanding on the # path behavior, but it is also shared with the github and bitbucket types and I think comes from the file_name qualifier here
https://github.com/package-url/purl-spec/blob/main/docs/common-qualifiers.md
I'll be honest, I built from the example of the github type which has that same behavior here
https://github.com/package-url/purl-spec/blob/main/types/github-definition.json#L30

The # character is not explicitly mentioned in the common qualifiers doc, so perhaps this is a spec level clarification rather than just for this type?


Hm, good references. I suppose this goes to the purl folks for where information should be communicated. I'm in favor of being as explicit as possible (even if that means that the specs for other types need to change to be more detailed).

Member

@alilleybrinker In any case we need to start with documenting details outside of the <purl-type>-definition.json files that are based on the Schema included in ECMA-427 (Clause 6). Making schema level changes based on the most complex cases is problematic and we need many good examples before proposing schema changes.

There are 3 other places to put this information:

  • Test cases for a PURL type
  • PURL type background documentation
  • Specific PURL type information in the How to documentation

We have flexibility for adding new documentation and we now have an efficient process for publishing documentation at www.packageurl.org.

Contributor Author

Alright, with a little help I found the reference @alilleybrinker. I was looking for a file specifier and the term I should have been looking for was subpath. So, I believe we're good on this being documented as normal construction
https://packageurl.org/docs/purl/specification#subpath

@darakian
Contributor Author

darakian commented Apr 7, 2026

I let this get away from me, but added some tests. Was there anything else we wanted prior to merging? Was the human doc something to do async?

@mjherzog
Member

mjherzog commented Apr 8, 2026

@darakian We need the PR to include:

  • Update to purl-types-index.json
  • Generation of the type documentation

We are still working on better documentation for this but the basic steps are in CONTRIBUTING.md (formerly in README-dev.md)

For the bonus PURL type documentation, please add your file to the new docs/drafts folder. I suggest the naming:
docs/drafts/types/git-documentation.md for now.

I will ask @johnmhoran to review the test cases.

@mjherzog mjherzog changed the title Add a generic git purl type Add new PURL type: git (for generic git repositories) Apr 13, 2026
@mjherzog mjherzog changed the title Add new PURL type: git (for generic git repositories) Add new PURL type: 'git' (for generic git repositories) Apr 13, 2026
@darakian
Contributor Author

@mjherzog Ok. I've updated the PR with both the rendered types-doc and a human doc following your naming suggestion. The human doc is a little bare bones, but I tried to give a basic intro to the uninitiated as well as some human text for the different parameters.

Corrected typo 'verison' to 'version'

Labels

PURL type: new Register a new PURL type

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add new PURL Type: 'git'

7 participants