Git LFS development guidelines
This page contains developer-centric information for GitLab team members. For the user documentation, see Git Large File Storage.
Controllers and Services
The methods for authentication defined here are inherited by all the other LFS controllers.
After authentication, the batch action is the first action called by the Git LFS client during downloads and uploads (such as pull, push, and clone).
Provides a payload to Workhorse that includes a path for Workhorse to save the file to. The path could point to remote object storage.
Handles requests from Workhorse that contain information on a file that Workhorse already uploaded (see this middleware) so that GitLab can either:
- Create an LfsObject.
- Connect an existing LfsObject to a project with an LfsObjectsProject.
LfsObject and LfsObjectsProject
- Only one LfsObject is created for a file with a given oid (a SHA256 checksum of the file) and file size.
- LfsObjectsProject records associate LfsObjects with Projects. They determine if a file can be accessed through a project.
- These objects are also used for calculating the amount of LFS storage a given project is using.
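The relationship described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Rails models: the LfsStore class and its method names are hypothetical, and the only behavior taken from this page is that one object exists per oid and size, that project links gate access, and that the links drive storage accounting.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LfsStore:
    """Hypothetical in-memory stand-in for LfsObject and LfsObjectsProject."""

    objects: dict = field(default_factory=dict)  # (oid, size) -> content
    links: set = field(default_factory=set)      # (project_id, oid, size)

    def add_file(self, project_id: int, content: bytes) -> str:
        oid = hashlib.sha256(content).hexdigest()
        key = (oid, len(content))
        # Only one object is kept per (oid, size), however many projects push it.
        self.objects.setdefault(key, content)
        # The link is what grants the project access to the object.
        self.links.add((project_id, *key))
        return oid

    def can_access(self, project_id: int, oid: str) -> bool:
        return any(p == project_id and o == oid for p, o, _ in self.links)

    def storage_used(self, project_id: int) -> int:
        # Storage is computed from the objects linked to the project.
        return sum(size for p, _, size in self.links if p == project_id)
```

Pushing the same content to two projects yields one stored object and two links, so each project is charged for the file even though it is stored once.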
For more information, see
Handles the lock API for LFS. Delegates mostly to corresponding services:
These services create and delete
- This endpoint responds with a payload that allows a client to check if there are any files being pushed that have locks that belong to another user.
- A client-side lfs.locksverify configuration can be set so that the client aborts the push if locks exist that belong to another user.
- The existence of locks belonging to other users is also validated on the server side.
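The lock check described above can be sketched as follows. This is a simplified illustration under stated assumptions: the "ours" and "theirs" keys mirror the naming used by the Git LFS lock verification API, but the flat lock dictionaries (path and owner as plain strings) and the function names are hypothetical.

```python
def partition_locks(locks, current_user):
    """Split locks into those owned by the current user and those owned by others."""
    ours = [lock for lock in locks if lock["owner"] == current_user]
    theirs = [lock for lock in locks if lock["owner"] != current_user]
    return {"ours": ours, "theirs": theirs}


def push_allowed(pushed_paths, locks, current_user):
    """Return False if any pushed path is locked by another user."""
    theirs = partition_locks(locks, current_user)["theirs"]
    locked_by_others = {lock["path"] for lock in theirs}
    return not (locked_by_others & set(pushed_paths))
```

The same predicate applies on both sides: the client can use it to abort early, and the server can use it to reject a push that a client sent anyway.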
sequenceDiagram
    autonumber
    alt Over HTTPS
        Git client-->>Git client: user-supplied credentials
    else Over SSH
        Git client->>gitlab-shell: git-lfs-authenticate
        activate gitlab-shell
        activate GitLab Rails
        gitlab-shell->>GitLab Rails: POST /api/v4/internal/lfs_authenticate
        GitLab Rails-->>gitlab-shell: token with expiry
        deactivate gitlab-shell
        deactivate GitLab Rails
    end
- Clients can be configured to store credentials in a few different ways. See the Git LFS documentation on authentication.
gitlab-shell. See the Git LFS documentation concerning
- gitlab-shell makes a request to the GitLab API.
- GitLab responds to gitlab-shell with a token, which is used in subsequent requests. See the Git LFS documentation concerning authentication.
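To illustrate the SSH path, the sketch below shows how a client might consume the token response. It assumes the JSON shape described in the Git LFS authentication documentation (href, header, and expires_in fields). The function names and the sample values in the usage note are hypothetical.

```python
import json


def auth_headers(raw_response: str) -> dict:
    """Extract the HTTP headers to attach to subsequent LFS requests."""
    data = json.loads(raw_response)
    return dict(data.get("header", {}))


def token_expired(issued_at: float, expires_in: int, now: float) -> bool:
    """Treat the token as expired once expires_in seconds have elapsed."""
    return now >= issued_at + expires_in
```

A response such as `{"href": "https://gitlab.example.com/group/project.git/info/lfs", "header": {"Authorization": "Basic ..."}, "expires_in": 1800}` would yield a single Authorization header that the client reuses until the token expires.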
sequenceDiagram
    Note right of Git client: Typical Git clone things happen first
    Note right of Git client: Authentication for LFS comes next
    activate GitLab Rails
    autonumber
    Git client->>GitLab Rails: POST project/namespace/info/lfs/objects/batch
    GitLab Rails-->>Git client: payload with objects
    deactivate GitLab Rails
    loop each object in payload
        Git client->>GitLab Rails: GET project/namespace/gitlab-lfs/objects/:oid/ (<- This URL is from the payload)
        GitLab Rails->>Workhorse: SendfileUpload
        Workhorse-->> Git client: Binary data
    end
- Git LFS requests the ability to download files, using the authorization header obtained during authentication.
- GitLab responds with the list of objects and where to find them. See LfsApiController#batch.
- Git LFS makes a request for each file, using the href from the previous response. See how downloads are handled with the basic transfer mode.
- GitLab redirects to the remote URL if remote object storage is enabled. See SendFileUpload.
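The batch exchange in the steps above can be sketched from the client's point of view. The request and response shapes follow the Git LFS Batch API (an operation, a list of oid and size pairs, and per-object actions with an href). The helper names are hypothetical and no network call is made.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(operation, objects):
    """Build headers and a JSON body for a batch request.

    operation is "download" or "upload"; objects is a list of (oid, size) pairs.
    """
    body = {
        "operation": operation,
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    return headers, json.dumps(body)


def download_hrefs(batch_response):
    """Map each oid to the href the client should GET next."""
    return {
        obj["oid"]: obj["actions"]["download"]["href"]
        for obj in batch_response.get("objects", [])
        if "download" in obj.get("actions", {})
    }
```

Objects without a download action (for example, objects the server reports as missing) are skipped, so the client only issues requests for hrefs it was given.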
sequenceDiagram
    Note right of Git client: Typical Git push things happen first.
    Note right of Git client: Authentication for LFS comes next.
    autonumber
    activate GitLab Rails
    Git client ->> GitLab Rails: POST project/namespace/info/lfs/objects/batch
    GitLab Rails-->>Git client: payload with objects
    deactivate GitLab Rails
    loop each object in payload
        Git client->>Workhorse: PUT project/namespace/gitlab-lfs/objects/:oid/:size (URL is from payload)
        Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/authorize
        GitLab Rails-->>Workhorse: response with the path to upload to
        Workhorse->>Workhorse: Upload
        Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/finalize
    end
- Git LFS requests the ability to upload files.
- GitLab responds with the list of objects and where to upload them. See LfsApiController#batch.
- Git LFS makes a request for each file, using the href from the previous response. See how uploads are handled with the basic transfer mode.
- GitLab responds with a payload that includes a path for Workhorse to save the file to. The path could point to remote object storage. See LfsStorageController#upload_authorize.
- Workhorse does the work of saving the file.
- Workhorse makes a request to GitLab with information on the uploaded file so that GitLab can create an LfsObject. See LfsStorageController#upload_finalize.
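The upload sequence above relies on three related paths and on the oid and size carried in the URL. The sketch below builds those paths as they appear in the diagram and shows one way to check a saved file against the requested oid and size. The helper names are hypothetical, and whether GitLab performs exactly this comparison when finalizing is an assumption.

```python
import hashlib


def upload_paths(project_path, oid, size):
    """Return the upload, authorize, and finalize paths for one object."""
    base = f"{project_path}/gitlab-lfs/objects/{oid}/{size}"
    return {
        "upload": base,
        "authorize": f"{base}/authorize",
        "finalize": f"{base}/finalize",
    }


def digest_stream(stream, chunk_size=64 * 1024):
    """Compute the SHA256 hex digest and byte count without loading the whole file."""
    sha = hashlib.sha256()
    total = 0
    while chunk := stream.read(chunk_size):
        sha.update(chunk)
        total += len(chunk)
    return sha.hexdigest(), total


def matches_request(stream, expected_oid, expected_size):
    """Check that the saved content matches the oid and size from the URL."""
    oid, size = digest_stream(stream)
    return oid == expected_oid and size == expected_size
```

Reading in chunks keeps memory use flat for large files, which matters because LFS objects are large by design.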
In April 2019, Francisco Javier López hosted a Deep Dive (GitLab team members only:
on the GitLab Git LFS implementation to share domain-specific
knowledge with anyone who may work in this part of the codebase in the future.
You can find the recording on YouTube,
and the slides on Google Slides
and in PDF.
This deep dive was accurate as of GitLab 11.10, and while specific
details may have changed, it should still serve as a good introduction.
Including LFS blobs in project archives
Introduced in GitLab 13.5.
The following diagram illustrates how GitLab resolves LFS files for project archives:
sequenceDiagram
    autonumber
    Client->>+Workhorse: GET /group/project/-/archive/master.zip
    Workhorse->>+Rails: GET /group/project/-/archive/master.zip
    Rails->>+Workhorse: Gitlab-Workhorse-Send-Data git-archive
    Workhorse->>Gitaly: SendArchiveRequest
    Gitaly->>Git: git archive master
    Git->>Smudge: OID 12345
    Smudge->>+Workhorse: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
    Workhorse->>+Rails: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
    Rails->>+Workhorse: Gitlab-Workhorse-Send-Data send-url
    Workhorse->>Smudge: <LFS data>
    Smudge->>Git: <LFS data>
    Git->>Gitaly: <streamed data>
    Gitaly->>Workhorse: <streamed data>
    Workhorse->>Client: master.zip
- The user requests the project archive from the UI.
- Workhorse forwards this request to Rails.
- If the user is authorized to download the archive, Rails replies with an HTTP header of Gitlab-Workhorse-Send-Data with a base64-encoded JSON payload prefaced with git-archive. This payload includes the SendArchiveRequest binary message, which is encoded again in base64.
- Workhorse decodes the Gitlab-Workhorse-Send-Data payload. If the archive already exists in the archive cache, Workhorse sends that file. Otherwise, Workhorse sends the SendArchiveRequest to the appropriate Gitaly server.
- The Gitaly server calls git archive <ref> to begin generating the Git archive on-the-fly. If the include_lfs_blobs flag is enabled, Gitaly enables a custom LFS smudge filter via the -c filter.lfs.smudge=/path/to/gitaly-lfs-smudge Git option.
- git identifies a possible LFS pointer using the gitaly-lfs-smudge and provides the LFS pointer via the standard input. Gitaly provides GL_INTERNAL_CONFIG as environment variables to enable lookup of the LFS object.
- If a valid LFS pointer is decoded, gitaly-lfs-smudge makes an internal API call to Workhorse to download the LFS object from GitLab.
- Workhorse forwards this request to Rails. If the LFS object exists and is associated with the project, Rails sends ArchivePath either with a path where the LFS object resides (for local disk) or a pre-signed URL (when object storage is enabled) via the Gitlab-Workhorse-Send-Data HTTP header with a payload prefaced with send-url.
- Workhorse retrieves the file and sends it to the gitaly-lfs-smudge process, which writes the contents to the standard output.
- git reads this output and sends it back to the Gitaly process.
- Gitaly sends the data back to Workhorse.
- The archive data is sent back to the client.
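The Gitlab-Workhorse-Send-Data header used in the steps above can be illustrated with a small round trip. This page only states that the value is a base64-encoded JSON payload prefaced with a name such as git-archive or send-url. The colon separator, the URL-safe base64 variant, and the "URL" parameter name in the usage note are assumptions made for this sketch.

```python
import base64
import json


def encode_send_data(prefix, params):
    """Build a header value of the form '<prefix>:<base64 JSON>' (separator assumed)."""
    raw = json.dumps(params).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{prefix}:{payload}"


def decode_send_data(header_value):
    """Split the prefix from the payload and decode the JSON parameters."""
    prefix, _, payload = header_value.partition(":")
    return prefix, json.loads(base64.urlsafe_b64decode(payload))
```

For example, `encode_send_data("send-url", {"URL": "https://objects.example.com/presigned"})` produces a value that the receiving side can dispatch on by prefix before decoding the parameters.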
In step 7, the gitaly-lfs-smudge filter must talk to Workhorse, not to Rails, or an invalid LFS blob is saved. To support this, GitLab 13.5 changed the default Omnibus configuration to have Gitaly talk to Workhorse instead of Rails.
One side effect of this change: the correlation ID of the original request is not preserved for the internal API requests made by Gitaly (gitaly-lfs-smudge), such as the one made in step 8. The correlation IDs for those API requests are random values until this Workhorse issue is resolved.
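The smudge step of the archive flow (decoding a pointer from standard input, then asking Workhorse for the object) can be sketched as follows. The pointer layout follows the Git LFS pointer file format. The function names and the base URL in the usage note are hypothetical, and the request path is copied from the diagram on this page.

```python
from urllib.parse import urlencode

POINTER_VERSION = "version https://git-lfs.github.com/spec/v1"


def parse_pointer(text):
    """Return (oid, size) for a valid LFS pointer, or None for anything else."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != POINTER_VERSION:
        return None
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not oid.startswith("sha256:") or not size.isdigit():
        return None
    return oid[len("sha256:"):], int(size)


def lfs_lookup_url(workhorse_base, oid, gl_repository):
    """Build the internal lookup URL; the base must point at Workhorse, not Rails."""
    query = urlencode({"oid": oid, "gl_repository": gl_repository})
    return f"{workhorse_base}/internal/api/v4/lfs?{query}"
```

With a hypothetical Workhorse address of `http://localhost:8181`, a pointer for oid 12345 in project-1234 resolves to the same request shown in the diagram.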
- Blog post: Getting started with Git LFS
- User documentation: Git Large File Storage (LFS)
- GitLab Git Large File Storage (LFS) Administration for self-managed instances