Add new PURL type: 'git' (for generic git repositories) #823
darakian wants to merge 11 commits into package-url:main
This PR adds a purl type intended for project source code stored in a git repository. Resolves package-url#780. In its current form it mimics the swift type definition for the namespace and name definitions. Version is defined as a git reference, based on the conversation in package-url#780. Ref: https://git-scm.com/book/en/v2/Git-Internals-Git-References Happy to iterate, but wanted to get this up 👍
@pombredanne I was going to add a human doc file as well in the
@darakian The
It would be helpful if you draft and share a markdown file with the additional documentation that you have in mind.
  "$id": "https://packageurl.org/types/git-definition.json",
  "type": "git",
  "type_name": "Git",
  "description": "Git-based source packages",
Nit: would "source repositories" be clearer / more accurate than "source packages"?
It could be. I was trying to stay with the package verbiage to align with the rest of the package url types, but I'm not opposed to swapping. @pombredanne do you have a preference one way or another?
  "version_definition": {
    "requirement": "optional",
    "native_name": "A git reference",
    "note": "The version is a git reference. Ideally a commit or tag."
Is there a mechanism to explicitly link to the documentation for git references? Just to be maximally clear what "git reference" means in this context.
That's a good call out. It looks like a few of the types have url references in their type definitions, so I added 8c9402c
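As a rough illustration of the "ideally a commit or tag" note, a consumer could check whether a version string has the shape of a full commit ID. This is only a sketch: the helper name `looks_like_full_commit` is hypothetical, and it assumes SHA-1 object names (40 lowercase hex characters); a repository using SHA-256 object names would need 64 characters instead.

```python
import re

# Assumption: a full commit ID is a 40-character lowercase SHA-1 hex string.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def looks_like_full_commit(version: str) -> bool:
    """Return True if the version has the shape of a full SHA-1 commit ID."""
    return FULL_SHA1.fullmatch(version) is not None


print(looks_like_full_commit("a72d2c07cfca03b55371089de6aa230d8c951fa0"))  # True
print(looks_like_full_commit("v1.2.3"))  # False
```

A shape check like this cannot tell whether the commit actually exists in the repository, and a tag or branch name that fails the check may still be a perfectly valid git reference.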
@mjherzog I take it that there's no specific place to add those docs yet? Can do on drafting them, but where should I create the file?
    "note": "The version is a git reference (https://git-scm.com/book/en/v2/Git-Internals-Git-References). Ideally a commit or tag."
  },
  "examples": [
    "pkg:git/codeberg.org/forgejo/forgejo/@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md",
Isn't this supposed to be exactly the (canonical) Git clone URL? If so, why is the .git suffix being omitted in this example?
Similarly, shouldn't a trailing / be omitted?
Interestingly enough, I can't find a primary source reference talking about the .git suffix. The best I was able to come across in searching is this Stack Overflow question, where there is an assertion (without reference) that a trailing .git is a naming convention:
https://stackoverflow.com/questions/8686691/what-does-the-git-mean-in-a-git-url
The primary git reference shows them off in examples but does not seem to explain them directly:
https://git-scm.com/docs/git-clone.html#_git_urls
Testing locally (and also with the trailing /), I see some differences, but they all come down to which URL is populated in the local .git dir.
For example, when cloning this repo the three different ways, I see:
jon~/g/randos❯❯❯ diff test-1/.git/config test-2/.git/config
9c9
< url = https://github.com/package-url/purl-spec
---
> url = https://github.com/package-url/purl-spec/
...
jon~/g/randos❯❯❯ diff test-1/.git/config test-3/.git/config
9c9
< url = https://github.com/package-url/purl-spec
---
> url = https://github.com/package-url/purl-spec.git
For the purposes of identifying the actual code they all seem equivalent. I suppose there's a choice for us: either deviate from what is allowed in the world of git and git tools and force a normalization, or align with the git world for ease of interoperability. Based on the conversation in #780 I believe the preference is to align with the norms of the git world.
That said, looking into the git URL doc did make me think that the namespace and name definitions should be altered slightly to align with how git URLs are defined upstream.
See: 88e820a
@andrew you don't have any insight into .git and trailing /s do you?
Interestingly enough I can't find a primary source reference talking about the .git suffix.
I believe its (conventional) use is simply hosting-platform-specific. I'm simply looking at the platform's "copy URL to clipboard" UI to see if that platform prefers to have a ".git" suffix in the URL or not.
So I agree that we can't have a general rule here, but for the scope of the examples, which name concrete platforms, IMO we should adhere to the preference of that platform.
I believe the .git suffix comes from the original git-http-backend CGI script, which served repositories over HTTP by mapping the URL path directly to the filesystem.
As far as I know, all the major forges support cloning both with and without .git.
It is commonly removed by forges, not Git itself, meaning it's not possible to tell for any given Git URL whether the .git is optional or not.
Forgejo: https://codeberg.org/forgejo/forgejo/src/commit/df79ccf7d8b69f63b7cb66d340e26ce1e3e79c89/routers/web/repo/githttp.go#L62
GitLab: https://gitlab.com/gitlab-org/gitlab/-/blob/358f46317b3613756ee6c471379d4cbdc33d9197/config/routes/git_http.rb#L58-68
@sschuberth would you like to propose an example to add/edit?
Well, any Git clone URL of a GitHub project would do, I guess, so for example:
pkg://git/github.com/oss-review-toolkit/ort.git@86c6b09b93db996689735d1eaabaa86a1051a319#cli/build.gradle.kts
However, if we give such an example, we should probably make clear that this is not the "canonical form" of a GitHub PURL, and people should prefer pkg://github/... instead.
pkg:git/projects.blender.org/blender/blender.git and pkg:git/projects.blender.org/blender/blender are equivalent, but PURL implementations cannot be expected to know.
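To make the point above concrete, here is a minimal sketch of why a plain string comparison treats the two forms as different packages. The helper `strip_git_suffix` is hypothetical and is not part of the PURL spec or of any PURL library; it only shows what a forced normalization would look like, and it assumes a purl without version, qualifiers, or subpath.

```python
def strip_git_suffix(purl: str) -> str:
    """Hypothetical heuristic: drop a trailing '/' and a trailing '.git'.

    Assumes the purl has no version, qualifiers, or subpath.
    """
    purl = purl.rstrip("/")
    return purl[: -len(".git")] if purl.endswith(".git") else purl


a = "pkg:git/projects.blender.org/blender/blender.git"
b = "pkg:git/projects.blender.org/blender/blender"

print(a == b)  # False
print(strip_git_suffix(a) == strip_git_suffix(b))  # True
```

As noted earlier in this thread, whether the suffix is optional is decided by the forge rather than by Git, so a heuristic like this could merge two repositories that are actually distinct on some host.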
@sschuberth how do you feel about the example (stolen from @matt-phylum) added in 8b1d9f0 ?
As for the advice about preferring the github type over the more primitive git type; totally agree. That's probably something to add to the human readable doc that @mjherzog mentioned. Maybe that's a follow up PR to add that though? I'm not sure there's a paved path for that doc yet (please correct me if I'm wrong).
@sschuberth how do you feel about the example (stolen from @matt-phylum) added in 8b1d9f0 ?
Fine with me.
As for the advice about preferring the github type over the more primitive git type; totally agree. [...] Maybe that's a follow up PR to add that though?
Fine with me as well to do that as a follow-up; we just should not forget about giving that advice.
This prematurely adds support for the upcoming PURL type for generic Git repositories [1] to avoid an empty PURL being created from the `id` if the VCS host cannot be determined. [1]: package-url/purl-spec#823 Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
Git supports multiple different protocols, with HTTP over TLS now being the most common. Can a Git PURL be constructed for:
@matt-phylum transports were mostly discussed synchronously in the purl community meetings (though I did mention it in #780 (comment)), but transports are out of scope. The goal with this type is to describe content rather than access method.
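One way to picture "content rather than access method" is that clone URLs using different transports reduce to the same host and path. The sketch below is an assumption about how a tool might do that reduction, not something this PR specifies; the helper name `repo_location` is hypothetical, and it only handles `scheme://` URLs and the scp-like `[user@]host:path` syntax.

```python
from urllib.parse import urlsplit


def repo_location(clone_url: str) -> str:
    """Hypothetical helper: reduce a clone URL to 'host/path', ignoring transport."""
    if "://" in clone_url:
        parts = urlsplit(clone_url)
        host, path = parts.hostname or "", parts.path
    else:
        # scp-like syntax: [user@]host:path
        host_part, _, path = clone_url.partition(":")
        host = host_part.rpartition("@")[2]
    return f"{host}/{path.strip('/')}"


for url in (
    "https://github.com/package-url/purl-spec",
    "ssh://git@github.com/package-url/purl-spec",
    "git@github.com:package-url/purl-spec",
):
    print(repo_location(url))  # github.com/package-url/purl-spec each time
```

The sketch deliberately leaves a `.git` suffix untouched, since the thread has not settled on normalizing it.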
    "note": "The version is a git reference (https://git-scm.com/book/en/v2/Git-Internals-Git-References). Ideally a commit or tag."
  },
  "examples": [
    "pkg:git/codeberg.org/forgejo/forgejo/@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md",
This example uses the #, followed by a path to a file in the repository, but it doesn't look like this behavior is specified in this document.
Also, we likely ought to specify / explain in whatever documentation is made for this type that the path part must be stripped if you're going to git clone a repository.
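A minimal sketch of that stripping step, assuming the general purl layout `pkg:type/namespace/name@version?qualifiers#subpath`. The function `split_git_purl` and its return keys are hypothetical; the sketch skips percent-decoding and assumes that any `@` in the string separates the version, so a real purl library should be preferred outside of illustration.

```python
def rsplit_once(text: str, sep: str) -> tuple[str, str]:
    """Split once from the right; return (text, '') when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, "")


def split_git_purl(purl: str) -> dict[str, str]:
    """Hypothetical helper: separate repository, version, and subpath."""
    rest, subpath = rsplit_once(purl, "#")
    rest, _qualifiers = rsplit_once(rest, "?")
    scheme, _, rest = rest.partition(":")
    if scheme != "pkg":
        raise ValueError("not a purl")
    purl_type, _, rest = rest.lstrip("/").partition("/")
    if purl_type.lower() != "git":
        raise ValueError("not a git purl")
    path, version = rsplit_once(rest, "@")
    return {
        "repository": path.strip("/"),  # what a clone would target
        "version": version,             # git reference to check out
        "subpath": subpath.strip("/"),  # path inside the working tree
    }


parts = split_git_purl(
    "pkg:git/codeberg.org/forgejo/forgejo/"
    "@a72d2c07cfca03b55371089de6aa230d8c951fa0#options/locale_readme.md"
)
print(parts["repository"])  # codeberg.org/forgejo/forgejo
print(parts["subpath"])     # options/locale_readme.md
```

Only the `repository` part is relevant to `git clone`; the version and subpath apply after the clone, as a reference to check out and a path within the checked-out tree.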
I'm not opposed to expanding on the # path behavior, but it is also shared with the github and bitbucket types, and I think it comes from the file_name qualifier here:
https://github.com/package-url/purl-spec/blob/main/docs/common-qualifiers.md
I'll be honest, I built from the example of the github type which has that same behavior here
https://github.com/package-url/purl-spec/blob/main/types/github-definition.json#L30
The # character is not explicitly mentioned in the common qualifiers doc, so perhaps this is a spec level clarification rather than just for this type?
Hm, good references. I suppose this goes to the purl folks for where information should be communicated. I'm in favor of being as explicit as possible (even if that means that the specs for other types need to change to be more detailed).
@alilleybrinker In any case we need to start with documenting details outside of the <purl-type>.definition.json files that are based on the Schema included in ECMA-427 (Clause 6). Making schema level changes based on the most complex cases is problematic and we need many good examples before proposing schema changes.
There are 3 other places to put this information:
- Test cases for a PURL type
- PURL type background documentation
- Specific PURL type information in the How to documentation
We have flexibility for adding new documentation and we now have an efficient process for publishing documentation at www.packageurl.org.
Alright, with a little help I found the reference @alilleybrinker. I was looking for a file specifier and the term I should have been looking for was subpath. So, I believe we're good on this being documented as normal construction
https://packageurl.org/docs/purl/specification#subpath
I let this get away from me, but added some tests. Was there anything else we wanted prior to merging? Was the human doc something to do async?
@darakian We need the PR to include:
We are still working on better documentation for this but the basic steps are in
For the bonus PURL type documentation, please add your file to the new
I will ask @johnmhoran to review the test cases.
@mjherzog Ok. I've updated the PR with both the rendered
Corrected typo 'verison' to 'version'