    Git File format &mdash; dulwich 0.10.0 documentation
    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
  <div class="section" id="git-file-format">
<h1>Git File format<a class="headerlink" href="#git-file-format" title="Permalink to this headline">¶</a></h1>
<p>For a better understanding of Dulwich, we&#8217;ll start by explaining most of
Git&#8217;s secrets.</p>
<p>Open the &#8220;.git&#8221; folder of any Git-managed repository. You&#8217;ll find folders
like &#8220;branches&#8221; and &#8220;hooks&#8221;. We&#8217;re only interested in &#8220;objects&#8221; here. Open it.</p>
<p>You&#8217;ll mostly see folders named with two hex digits. Git identifies content by its SHA-1
digest. The 2 hex digits of a folder name plus the 38 hex digits of a file name inside that folder
form the 40-character (or 20-byte) id of the Git objects you&#8217;ll manage in
Dulwich.</p>
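<p>As a minimal sketch of this naming scheme, the hypothetical helper <tt class="docutils literal"><span class="pre">loose_object_path</span></tt> below (not part of Dulwich or Git) splits a 40-character id into its folder and file parts. The id used is the well-known id of the empty blob.</p>

```python
import os


def loose_object_path(git_dir, hex_sha):
    """Return the path where a loose object with this id would be stored."""
    if len(hex_sha) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    # The first 2 hex digits name the folder, the remaining 38 name the file.
    return os.path.join(git_dir, "objects", hex_sha[:2], hex_sha[2:])


hex_sha = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(loose_object_path(".git", hex_sha))
# The same id in binary form is 20 bytes long.
print(len(bytes.fromhex(hex_sha)))
```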
<p>We&#8217;ll first study the three main objects:</p>
<ul class="simple">
<li>The Commit;</li>
<li>The Tree;</li>
<li>The Blob.</li>
</ul>
<div class="section" id="the-commit">
<h2>The Commit<a class="headerlink" href="#the-commit" title="Permalink to this headline">¶</a></h2>
<p>You&#8217;re used to generating commits with Git. You have set up your name and
e-mail, and you know how to see the history using <tt class="docutils literal"><span class="pre">git</span> <span class="pre">log</span></tt>.</p>
<p>A commit file looks like this:</p>
<div class="highlight-python"><pre>commit &lt;content length&gt;&lt;NUL&gt;tree &lt;tree sha&gt;
parent &lt;parent sha&gt;
[parent &lt;parent sha&gt; if several parents from merges]
author &lt;author name&gt; &lt;author e-mail&gt; &lt;timestamp&gt; &lt;timezone&gt;
committer &lt;committer name&gt; &lt;committer e-mail&gt; &lt;timestamp&gt; &lt;timezone&gt;

&lt;commit message&gt;</pre>
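<p>The layout above can be sketched in a few lines of Python. This is an illustration only, not how Dulwich builds commits: <tt class="docutils literal"><span class="pre">make_commit</span></tt> is a hypothetical helper, the author details are invented, and the tree id is the well-known id of the empty tree. The object id is the SHA-1 digest of the header plus the content.</p>

```python
import hashlib


def make_commit(tree_sha, parent_shas, author, committer, message):
    """Assemble the uncompressed bytes of a commit object."""
    lines = [b"tree " + tree_sha]
    # A root commit has no parent line; a merge has several.
    lines += [b"parent " + sha for sha in parent_shas]
    lines.append(b"author " + author)
    lines.append(b"committer " + committer)
    body = b"\n".join(lines) + b"\n\n" + message
    # The header is the object type, the content length, then a NUL byte.
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return header + body


who = b"Jane Doe <jane@example.com> 1400000000 +0000"  # invented identity
raw = make_commit(
    tree_sha=b"4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    parent_shas=[],
    author=who,
    committer=who,
    message=b"Initial commit\n",
)
commit_id = hashlib.sha1(raw).hexdigest()
print(commit_id)
```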
<p>But where are the changes you committed? The commit contains a reference to a
tree.</p>
<div class="section" id="the-tree">
<h2>The Tree<a class="headerlink" href="#the-tree" title="Permalink to this headline">¶</a></h2>
<p>A tree is a collection of file information, the state of a single directory at
a given point in time.</p>
<p>A tree file looks like this:</p>
<div class="highlight-python"><pre>tree &lt;content length&gt;&lt;NUL&gt;&lt;file mode&gt; &lt;filename&gt;&lt;NUL&gt;&lt;item sha&gt;...</pre>
<p>And repeats for every file in the tree.</p>
<p>Note that the SHA-1 digest is in binary form here.</p>
<p>The file mode is like the octal argument you could give to the <tt class="docutils literal"><span class="pre">chmod</span></tt>
command, except that it is in an extended form that distinguishes regular files from
directories and other types.</p>
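<p>To make the entry layout concrete, here is a hedged sketch of a parser for an uncompressed tree object. <tt class="docutils literal"><span class="pre">parse_tree</span></tt> is a hypothetical helper written for this page, not a Dulwich function, and the two entries in the sample are invented. They point to the well-known ids of the empty blob and the empty tree.</p>

```python
import binascii
import stat


def parse_tree(raw):
    """Split an uncompressed tree object into (mode, name, hex sha) tuples."""
    header, _, body = raw.partition(b"\x00")
    kind, length = header.split(b" ")
    if kind != b"tree" or int(length) != len(body):
        raise ValueError("not a well-formed tree object")
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = int(body[pos:space], 8)  # the mode is stored as octal text
        name = body[space + 1:nul]
        sha = body[nul + 1:nul + 21]  # 20 raw bytes, not 40 hex digits
        entries.append((mode, name, binascii.hexlify(sha).decode("ascii")))
        pos = nul + 21
    return entries


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
body = (b"100644 hello.txt\x00" + binascii.unhexlify(EMPTY_BLOB)
        + b"40000 src\x00" + binascii.unhexlify(EMPTY_TREE))
raw = b"tree " + str(len(body)).encode("ascii") + b"\x00" + body

for mode, name, sha in parse_tree(raw):
    kind = "directory" if stat.S_ISDIR(mode) else "file"
    print(oct(mode), kind, name.decode("utf-8"), sha)
```

<p>The upper bits of the mode carry the entry type, which is why the standard <tt class="docutils literal"><span class="pre">stat</span></tt> helpers can tell a directory from a regular file here.</p>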
<p>We now know how our files are referenced but we haven&#8217;t found their actual
content yet. That&#8217;s where the reference to a blob comes in.</p>
<div class="section" id="the-blob">
<h2>The Blob<a class="headerlink" href="#the-blob" title="Permalink to this headline">¶</a></h2>
<p>A blob is simply the content of a file you are versioning.</p>
<p>A blob file looks like this:</p>
<div class="highlight-python"><pre>blob &lt;content length&gt;&lt;NUL&gt;&lt;content&gt;</pre>
<p>If you change a single line, another blob will be generated by Git at commit
time. This is how Git can quickly check out any version in time.</p>
<p>Conversely, several identical files with different filenames generate
only one blob. That&#8217;s mostly why renames are so cheap and efficient in Git.</p>
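<p>Both properties follow from the id being a digest of the header plus the content, with no filename involved. A small sketch using only the standard library (<tt class="docutils literal"><span class="pre">blob_id</span></tt> is a hypothetical helper, not a Dulwich function):</p>

```python
import hashlib


def blob_id(content):
    """Return the hex id Git would give to a blob with this content."""
    raw = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    return hashlib.sha1(raw).hexdigest()


readme = b"first line\nsecond line\n"
copy_of_readme = b"first line\nsecond line\n"  # same bytes, other filename
edited = b"first line\nsecond line, edited\n"

print(blob_id(readme) == blob_id(copy_of_readme))  # identical content, one blob
print(blob_id(readme) == blob_id(edited))          # changed content, new blob
print(blob_id(b""))  # the empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```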
<div class="section" id="dulwich-objects">
<h2>Dulwich Objects<a class="headerlink" href="#dulwich-objects" title="Permalink to this headline">¶</a></h2>
<p>Dulwich implements these three objects with an API that gives easy access to the
information you need, while abstracting away some of the other techniques Git uses to
speed up operations and save space.</p>
<div class="section" id="more-about-git-formats">
<h2>More About Git formats<a class="headerlink" href="#more-about-git-formats" title="Permalink to this headline">¶</a></h2>
<p>These three objects make up most of the contents of a Git repository and are
used for the history. They can either appear as simple files on disk (one file
per object) or in a <tt class="docutils literal"><span class="pre">pack</span></tt> file, which is a container for a number of these
objects.</p>
<p>There is also an index of the current state of the working copy in the
repository as well as files to track the existing branches and tags.</p>
<p>For a more detailed explanation of object formats and SHA-1 digests, see:
<a class="reference external" href=""></a></p>
<p>Just note that recent versions of Git compress object files using zlib.</p>
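<p>As a rough illustration of that compression step, the sketch below round-trips an object through zlib using only the standard library. The helper names are hypothetical, and it covers loose object files only, not pack files.</p>

```python
import zlib


def encode_loose_object(kind, content):
    """Build the bytes of a loose object file: header + content, deflated."""
    raw = kind + b" " + str(len(content)).encode("ascii") + b"\x00" + content
    return zlib.compress(raw)


def decode_loose_object(compressed):
    """Inflate a loose object file and return (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    kind, length = header.split(b" ")
    if int(length) != len(content):
        raise ValueError("length in header does not match content")
    return kind, content


stored = encode_loose_object(b"blob", b"My file content\n")
print(decode_loose_object(stored))
```

<p>Reading a real object would mean passing the bytes of a file under the &#8220;objects&#8221; folder to <tt class="docutils literal"><span class="pre">decode_loose_object</span></tt>.</p>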

