<?xml version="1.0" encoding="utf-8" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="Docutils 0.6: http://docutils.sourceforge.net/" /> <title>Dulwich Tutorial</title> <style type="text/css"> /* :Author: David Goodger (goodger@python.org) :Id: $Id: html4css1.css 5951 2009-05-18 18:03:10Z milde $ :Copyright: This stylesheet has been placed in the public domain. Default cascading style sheet for the HTML output of Docutils. See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to customize this style sheet. */ /* used to remove borders from tables and images */ .borderless, table.borderless td, table.borderless th { border: 0 } table.borderless td, table.borderless th { /* Override padding for "table.docutils td" with "! important". The right padding separates the table cells. */ padding: 0 0.5em 0 0 ! important } .first { /* Override more specific margin styles with "! important". */ margin-top: 0 ! important } .last, .with-subtitle { margin-bottom: 0 ! important } .hidden { display: none } a.toc-backref { text-decoration: none ; color: black } blockquote.epigraph { margin: 2em 5em ; } dl.docutils dd { margin-bottom: 0.5em } /* Uncomment (and remove this text!) 
to get bold-faced definition list terms dl.docutils dt { font-weight: bold } */ div.abstract { margin: 2em 5em } div.abstract p.topic-title { font-weight: bold ; text-align: center } div.admonition, div.attention, div.caution, div.danger, div.error, div.hint, div.important, div.note, div.tip, div.warning { margin: 2em ; border: medium outset ; padding: 1em } div.admonition p.admonition-title, div.hint p.admonition-title, div.important p.admonition-title, div.note p.admonition-title, div.tip p.admonition-title { font-weight: bold ; font-family: sans-serif } div.attention p.admonition-title, div.caution p.admonition-title, div.danger p.admonition-title, div.error p.admonition-title, div.warning p.admonition-title { color: red ; font-weight: bold ; font-family: sans-serif } /* Uncomment (and remove this text!) to get reduced vertical space in compound paragraphs. div.compound .compound-first, div.compound .compound-middle { margin-bottom: 0.5em } div.compound .compound-last, div.compound .compound-middle { margin-top: 0.5em } */ div.dedication { margin: 2em 5em ; text-align: center ; font-style: italic } div.dedication p.topic-title { font-weight: bold ; font-style: normal } div.figure { margin-left: 2em ; margin-right: 2em } div.footer, div.header { clear: both; font-size: smaller } div.line-block { display: block ; margin-top: 1em ; margin-bottom: 1em } div.line-block div.line-block { margin-top: 0 ; margin-bottom: 0 ; margin-left: 1.5em } div.sidebar { margin: 0 0 0.5em 1em ; border: medium outset ; padding: 1em ; background-color: #ffffee ; width: 40% ; float: right ; clear: right } div.sidebar p.rubric { font-family: sans-serif ; font-size: medium } div.system-messages { margin: 5em } div.system-messages h1 { color: red } div.system-message { border: medium outset ; padding: 1em } div.system-message p.system-message-title { color: red ; font-weight: bold } div.topic { margin: 2em } h1.section-subtitle, h2.section-subtitle, h3.section-subtitle, 
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle { margin-top: 0.4em } h1.title { text-align: center } h2.subtitle { text-align: center } hr.docutils { width: 75% } img.align-left, .figure.align-left{ clear: left ; float: left ; margin-right: 1em } img.align-right, .figure.align-right { clear: right ; float: right ; margin-left: 1em } .align-left { text-align: left } .align-center { clear: both ; text-align: center } .align-right { text-align: right } /* reset inner alignment in figures */ div.align-right { text-align: left } /* div.align-center * { */ /* text-align: left } */ ol.simple, ul.simple { margin-bottom: 1em } ol.arabic { list-style: decimal } ol.loweralpha { list-style: lower-alpha } ol.upperalpha { list-style: upper-alpha } ol.lowerroman { list-style: lower-roman } ol.upperroman { list-style: upper-roman } p.attribution { text-align: right ; margin-left: 50% } p.caption { font-style: italic } p.credits { font-style: italic ; font-size: smaller } p.label { white-space: nowrap } p.rubric { font-weight: bold ; font-size: larger ; color: maroon ; text-align: center } p.sidebar-title { font-family: sans-serif ; font-weight: bold ; font-size: larger } p.sidebar-subtitle { font-family: sans-serif ; font-weight: bold } p.topic-title { font-weight: bold } pre.address { margin-bottom: 0 ; margin-top: 0 ; font: inherit } pre.literal-block, pre.doctest-block { margin-left: 2em ; margin-right: 2em } span.classifier { font-family: sans-serif ; font-style: oblique } span.classifier-delimiter { font-family: sans-serif ; font-weight: bold } span.interpreted { font-family: sans-serif } span.option { white-space: nowrap } span.pre { white-space: pre } span.problematic { color: red } span.section-subtitle { /* font-size relative to parent (h1..h6 element) */ font-size: 80% } table.citation { border-left: solid 1px gray; margin-left: 1px } table.docinfo { margin: 2em 4em } table.docutils { margin-top: 0.5em ; margin-bottom: 0.5em } table.footnote { border-left: 
solid 1px black; margin-left: 1px } table.docutils td, table.docutils th, table.docinfo td, table.docinfo th { padding-left: 0.5em ; padding-right: 0.5em ; vertical-align: top } table.docutils th.field-name, table.docinfo th.docinfo-name { font-weight: bold ; text-align: left ; white-space: nowrap ; padding-left: 0 } h1 tt.docutils, h2 tt.docutils, h3 tt.docutils, h4 tt.docutils, h5 tt.docutils, h6 tt.docutils { font-size: 100% } ul.auto-toc { list-style-type: none } </style> </head> <body> <div class="document" id="dulwich-tutorial"> <h1 class="title">Dulwich Tutorial</h1> <div class="contents topic" id="contents"> <p class="topic-title first">Contents</p> <ul class="simple"> <li><a class="reference internal" href="#introduction" id="id1">Introduction</a><ul> <li><a class="reference internal" href="#git-repository-format" id="id2">Git repository format</a></li> <li><a class="reference internal" href="#the-commit" id="id3">The Commit</a></li> <li><a class="reference internal" href="#the-tree" id="id4">The Tree</a></li> <li><a class="reference internal" href="#the-blob" id="id5">The Blob</a></li> <li><a class="reference internal" href="#dulwich-objects" id="id6">Dulwich Objects</a></li> <li><a class="reference internal" href="#more-about-git-formats" id="id7">More About Git formats</a></li> </ul> </li> <li><a class="reference internal" href="#the-repository" id="id8">The Repository</a></li> <li><a class="reference internal" href="#initial-commit" id="id9">Initial commit</a></li> <li><a class="reference internal" href="#playing-again-with-git" id="id10">Playing again with Git</a></li> <li><a class="reference internal" href="#changing-a-file-and-commit-it" id="id11">Changing a File and Commit it</a></li> <li><a class="reference internal" href="#adding-a-file" id="id12">Adding a file</a></li> <li><a class="reference internal" href="#removing-a-file" id="id13">Removing a file</a></li> <li><a class="reference internal" href="#renaming-a-file" id="id14">Renaming a 
file</a></li> <li><a class="reference internal" href="#conclusion" id="id15">Conclusion</a></li> </ul> </div>
<div class="section" id="introduction">
<h1><a class="toc-backref" href="#id1">Introduction</a></h1>
<div class="section" id="git-repository-format">
<h2><a class="toc-backref" href="#id2">Git repository format</a></h2>
<p>To understand Dulwich better, we'll start by explaining how Git stores its data.</p>
<p>Open the ".git" folder of any Git-managed repository. You'll find folders like "branches", "hooks"... We're only interested in "objects" here. Open it.</p>
<p>You'll mostly see folders named with 2 hex digits. Git identifies content by its SHA-1 digest. The 2 hex digits of a folder name plus the 38 hex digits of a file name inside that folder form the 40-character (or 20-byte) id of the Git objects you'll manage in Dulwich.</p>
<p>We'll first study the three main objects:</p>
<ul class="simple"> <li>The Commit;</li> <li>The Tree;</li> <li>The Blob.</li> </ul>
</div>
<div class="section" id="the-commit">
<h2><a class="toc-backref" href="#id3">The Commit</a></h2>
<p>You're used to generating commits with Git. You have set up your name and e-mail, and you know how to see the history using <tt class="docutils literal">git log</tt>.</p>
<p>A commit file looks like this:</p>
<pre class="literal-block">
commit &lt;content length&gt;&lt;NUL&gt;tree &lt;tree sha&gt;
parent &lt;parent sha&gt;
[parent &lt;parent sha&gt; if several parents from merges]
author &lt;author name&gt; &lt;author e-mail&gt; &lt;timestamp&gt; &lt;timezone&gt;
committer &lt;author name&gt; &lt;author e-mail&gt; &lt;timestamp&gt; &lt;timezone&gt;

&lt;commit message&gt;
</pre>
<p>But where are the changes you committed? The commit contains a reference to a tree.</p>
</div>
<div class="section" id="the-tree">
<h2><a class="toc-backref" href="#id4">The Tree</a></h2>
<p>A tree is a collection of file information, the state of your working copy at a given point in time.</p>
<p>A tree file looks like this:</p>
<pre class="literal-block">
tree &lt;content length&gt;&lt;NUL&gt;&lt;file mode&gt; &lt;filename&gt;&lt;NUL&gt;&lt;blob sha&gt;...
</pre>
<p>And repeats for every file in the tree.</p>
<p>Note that, for an unknown reason, the SHA-1 digest is in binary form here (20 raw bytes instead of 40 hex digits).</p>
<p>The file mode is like the octal argument you could give to the <tt class="docutils literal">chmod</tt> command, except that it is in extended form to tell regular files from directories and other types.</p>
<p>We now know how our files are referenced but we haven't found their actual content yet. That's where the reference to a blob comes in.</p>
</div>
<div class="section" id="the-blob">
<h2><a class="toc-backref" href="#id5">The Blob</a></h2>
<p>A blob is simply the content of a file you are versioning.</p>
<p>A blob file looks like this:</p>
<pre class="literal-block">
blob &lt;content length&gt;&lt;NUL&gt;&lt;content&gt;
</pre>
<p>If you change a single line, another blob will be generated by Git at commit time. This is how Git can quickly check out any version in time.</p>
<p>Conversely, several identical files with different filenames generate only one blob. That's mostly why renames are so cheap and efficient in Git.</p>
</div>
<div class="section" id="dulwich-objects">
<h2><a class="toc-backref" href="#id6">Dulwich Objects</a></h2>
<p>Dulwich implements these three objects with an API that gives easy access to the information you need, while hiding some of the tricks Git uses to speed up operations and save space.</p>
</div>
<div class="section" id="more-about-git-formats">
<h2><a class="toc-backref" href="#id7">More About Git formats</a></h2>
<p>These three objects make up the bulk of a Git repository. The rest is branch information and optimizations.</p>
<p>For instance there is an index of the current state of the working copy.
There are also pack files to group several small objects in a single indexed file.</p>
<p>For a more detailed explanation of object formats and SHA-1 digests, see: <a class="reference external" href="http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html">http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html</a></p>
<p>Just note that Git compresses object files using zlib.</p>
</div>
</div>
<div class="section" id="the-repository">
<h1><a class="toc-backref" href="#id8">The Repository</a></h1>
<p>After this introduction, let's start directly with code:</p>
<pre class="literal-block">
>>> from dulwich.repo import Repo
</pre>
<p>Every object is accessed through the Repo object. You can open an existing repository or you can create a new one. There are two types of Git repositories:</p>
<blockquote>
<p>Regular Repositories -- They are the ones you create using <tt class="docutils literal">git init</tt> and use daily. They contain a <tt class="docutils literal">.git</tt> folder.</p>
<p>Bare Repositories -- There is no ".git" folder. The top-level folder itself contains the "branches", "hooks"... folders. These are used for published repositories (mirrors).</p>
</blockquote>
<p>Let's create a folder and turn it into a repository, like <tt class="docutils literal">git init</tt> would:</p>
<pre class="literal-block">
>>> from os import mkdir
>>> mkdir("myrepo")
>>> repo = Repo.init("myrepo")
>>> repo
&lt;Repo at '/tmp/myrepo/'&gt;
</pre>
<p>You can already look at the structure of the "myrepo/.git" folder, though it is mostly empty for now.</p>
</div>
<div class="section" id="initial-commit">
<h1><a class="toc-backref" href="#id9">Initial commit</a></h1>
<p>When you use Git, you generally add or modify content.
As our repository is empty for now, we'll start by adding a new file:</p>
<pre class="literal-block">
>>> from dulwich.objects import Blob
>>> blob = Blob.from_string("My file content\n")
>>> blob.id
'c55063a4d5d37aa1af2b2dad3a70aa34dae54dc6'
</pre>
<p>Of course you could create a blob from an existing file using <tt class="docutils literal">from_file</tt> instead.</p>
<p>As said in the introduction, file content is kept separate from the file name. Let's give this content a name:</p>
<pre class="literal-block">
>>> from dulwich.objects import Tree
>>> tree = Tree()
>>> tree.add(0100644, "spam", blob.id)
</pre>
<p>Note that "0100644" is the octal form for a regular file with common permissions. You can hardcode such modes or you can build them with the <tt class="docutils literal">stat</tt> module.</p>
<p>The tree state of our repository still needs to be placed in time. That's the job of the commit:</p>
<pre class="literal-block">
>>> from dulwich.objects import Commit, parse_timezone
>>> from time import time
>>> commit = Commit()
>>> commit.tree = tree.id
>>> author = "Your Name &lt;your.email@example.com&gt;"
>>> commit.author = commit.committer = author
>>> commit.commit_time = commit.author_time = int(time())
>>> tz = parse_timezone('-0200')
>>> commit.commit_timezone = commit.author_timezone = tz
>>> commit.encoding = "UTF-8"
>>> commit.message = "Initial commit"
</pre>
<p>Note that the initial commit has no parents.</p>
<p>At this point, the repository is still empty because all operations happen in memory. Let's "commit" it.</p>
<blockquote>
<pre class="doctest-block">
>>> object_store = repo.object_store
>>> object_store.add_object(blob)
</pre>
</blockquote>
<p>Now the ".git/objects" folder contains a first SHA-1 file. Let's continue saving the changes:</p>
<pre class="literal-block">
>>> object_store.add_object(tree)
>>> object_store.add_object(commit)
</pre>
<p>Now the physical repository contains three objects but still has no branch.
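</p>
<p>As a rough check of the object format described in the introduction, these three objects can be read back with the standard library alone. The sketch below is not part of Dulwich: <tt class="docutils literal">read_loose_objects</tt> is a hypothetical helper name, and it assumes that every object is still stored as a loose, zlib-compressed file (nothing has been packed yet).</p>
<pre class="literal-block">
import hashlib
import os
import zlib


def read_loose_objects(git_dir):
    """Yield (object id, type, size) for each loose object in git_dir/objects."""
    objects_dir = os.path.join(git_dir, "objects")
    for prefix in sorted(os.listdir(objects_dir)):
        folder = os.path.join(objects_dir, prefix)
        # Loose objects live in folders named with 2 hex digits;
        # this skips other entries such as "info" and "pack".
        if len(prefix) != 2 or not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            with open(os.path.join(folder, name), "rb") as f:
                raw = zlib.decompress(f.read())
            # Layout: type, space, content length, NUL, content.
            header, content = raw.split(b"\0", 1)
            kind, size = header.split(b" ")
            # The id is the SHA-1 of the uncompressed header plus content,
            # so it must match the folder name plus the file name.
            if hashlib.sha1(raw).hexdigest() != prefix + name:
                raise ValueError("corrupt object %s%s" % (prefix, name))
            yield prefix + name, kind.decode("ascii"), int(size)
</pre>
<p>Calling <tt class="docutils literal">list(read_loose_objects("myrepo/.git"))</tt> at this point should return three entries, one blob, one tree and one commit, whose ids correspond to <tt class="docutils literal">blob.id</tt>, <tt class="docutils literal">tree.id</tt> and <tt class="docutils literal">commit.id</tt>.</p>
<p>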
Let's create the master branch like Git would:</p>
<pre class="literal-block">
>>> repo.refs['refs/heads/master'] = commit.id
</pre>
<p>The master branch now has a commit to start from, but Git itself would not know which branch is the current one. That's another reference:</p>
<pre class="literal-block">
>>> repo.refs['HEAD'] = 'ref: refs/heads/master'
</pre>
<p>Now our repository is officially tracking a branch named "master" referring to a single commit.</p>
</div>
<div class="section" id="playing-again-with-git">
<h1><a class="toc-backref" href="#id10">Playing again with Git</a></h1>
<p>At this point you can come back to the shell, go into the "myrepo" folder and type <tt class="docutils literal">git status</tt> to let Git confirm that this is a regular repository on branch "master".</p>
<p>Git will tell you that the file "spam" is deleted, which is normal because Git is comparing the repository state with the current working copy. And we have no working copy at all, because Dulwich doesn't need one!</p>
<p>You can check out the last state using <tt class="docutils literal">git checkout <span class="pre">-f</span></tt>. The force flag will prevent Git from complaining that there are uncommitted changes in the working copy.</p>
<p>The file <tt class="docutils literal">spam</tt> appears and, unsurprisingly, contains the same bytes as the blob:</p>
<pre class="literal-block">
$ cat spam
My file content
</pre>
<div class="attention">
<p class="first admonition-title">Attention!</p>
<p class="last">Remember to recreate the repo object when you modify the repository outside of Dulwich!</p>
</div>
</div>
<div class="section" id="changing-a-file-and-commit-it">
<h1><a class="toc-backref" href="#id11">Changing a File and Commit it</a></h1>
<p>Now that we have a first commit, the next one will show a difference.</p>
<p>As seen in the introduction, it's about making a path in a tree point to a new blob. The old blob stays in the repository, so the diff can still be computed.
The tree is altered and the new commit's task is to point to this new version.</p>
<p>In the following examples, we assume we still have the <tt class="docutils literal">repo</tt> and <tt class="docutils literal">tree</tt> objects from the previous chapter.</p>
<p>Let's first build the blob:</p>
<pre class="literal-block">
>>> spam = Blob.from_string("My new file content\n")
>>> spam.id
'16ee2682887a962f854ebd25a61db16ef4efe49f'
</pre>
<p>An alternative is to alter the previously constructed blob object:</p>
<pre class="literal-block">
>>> blob.data = "My new file content\n"
>>> blob.id
'16ee2682887a962f854ebd25a61db16ef4efe49f'
</pre>
<p>In any case, update the blob id known as "spam". You also have the opportunity to change its mode:</p>
<pre class="literal-block">
>>> tree["spam"] = (0100644, spam.id)
</pre>
<p>Now let's record the change:</p>
<pre class="literal-block">
>>> c2 = Commit()
>>> c2.tree = tree.id
>>> c2.parents = [commit.id]
>>> c2.author = c2.committer = author
>>> c2.commit_time = c2.author_time = int(time())
>>> c2.commit_timezone = c2.author_timezone = tz
>>> c2.encoding = "UTF-8"
>>> c2.message = 'Changing "spam"'
</pre>
<p>In this new commit we record the changed tree id and, most importantly, the previous commit as the parent. Parents are actually a list because a commit may have several parents after merging branches.</p>
<p>It remains to record this whole new family:</p>
<pre class="literal-block">
>>> object_store.add_object(spam)
>>> object_store.add_object(tree)
>>> object_store.add_object(c2)
</pre>
<p>You can already ask git to introspect this commit using <tt class="docutils literal">git show</tt> with the value of <tt class="docutils literal">c2.id</tt> as an argument. You'll see the difference with the previous blob recorded as "spam".</p>
<p>You won't see it using git log because the head is still the previous commit.
It's easy to remedy:</p>
<pre class="literal-block">
>>> repo.refs['refs/heads/master'] = c2.id
</pre>
<p>Now all git tools will work as expected. But don't forget that the repository is still open in Dulwich!</p>
</div>
<div class="section" id="adding-a-file">
<h1><a class="toc-backref" href="#id12">Adding a file</a></h1>
<p>If you have followed along so far, the next lesson will be straightforward.</p>
<p>We need a new blob:</p>
<pre class="literal-block">
>>> ham = Blob.from_string("Another\nmultiline\nfile\n")
>>> ham.id
'a3b5eda0b83eb8fb6e5dce91ecafda9e97269c70'
</pre>
<p>But the same tree:</p>
<pre class="literal-block">
>>> tree["ham"] = (0100644, ham.id)
</pre>
<p>And a new commit, whose parent is the previous one:</p>
<pre class="literal-block">
>>> c3 = Commit()
>>> c3.tree = tree.id
>>> c3.parents = [c2.id]
>>> c3.author = c3.committer = author
>>> c3.commit_time = c3.author_time = int(time())
>>> c3.commit_timezone = c3.author_timezone = tz
>>> c3.encoding = "UTF-8"
>>> c3.message = 'Adding "ham"'
</pre>
<p>Save it all:</p>
<pre class="literal-block">
>>> object_store.add_object(ham)
>>> object_store.add_object(tree)
>>> object_store.add_object(c3)
</pre>
<p>Update the head:</p>
<pre class="literal-block">
>>> repo.refs['refs/heads/master'] = c3.id
</pre>
<p>A call to <tt class="docutils literal">git show</tt> will confirm the addition of "ham".</p>
<p>Remember you can also call <tt class="docutils literal">git checkout <span class="pre">-f</span></tt> to make it appear.</p>
<p>Well... Adding "ham" was not such a good idea... We'll remove it.</p>
</div>
<div class="section" id="removing-a-file">
<h1><a class="toc-backref" href="#id13">Removing a file</a></h1>
<p>Removing a file just means removing its entry in the tree.
The blob won't be deleted because Git tries to preserve the history of your repository.</p>
<p>It's all pythonic:</p>
<pre class="literal-block">
>>> del tree["ham"]
>>> c4 = Commit()
>>> c4.tree = tree.id
>>> c4.parents = [c3.id]
>>> c4.author = c4.committer = author
>>> c4.commit_time = c4.author_time = int(time())
>>> c4.commit_timezone = c4.author_timezone = tz
>>> c4.encoding = "UTF-8"
>>> c4.message = 'Removing "ham"'
</pre>
<p>Here we only have the new tree and the commit to save:</p>
<pre class="literal-block">
>>> object_store.add_object(tree)
>>> object_store.add_object(c4)
</pre>
<p>And of course update the head:</p>
<pre class="literal-block">
>>> repo.refs['refs/heads/master'] = c4.id
</pre>
<p>If you don't trust me, ask <tt class="docutils literal">git show</tt>. ;-)</p>
</div>
<div class="section" id="renaming-a-file">
<h1><a class="toc-backref" href="#id14">Renaming a file</a></h1>
<p>Remember that the file name and the content are distinct. So renaming a file is just a matter of associating a blob id with a new name.
We won't store more content, and the operation will be painless.</p>
<p>Let's transfer the blob id from the old name to the new one:</p>
<pre class="literal-block">
>>> tree["eggs"] = tree["spam"]
>>> del tree["spam"]
</pre>
<p>As usual, we need a commit to store the new tree id:</p>
<pre class="literal-block">
>>> c5 = Commit()
>>> c5.tree = tree.id
>>> c5.parents = [c4.id]
>>> c5.author = c5.committer = author
>>> c5.commit_time = c5.author_time = int(time())
>>> c5.commit_timezone = c5.author_timezone = tz
>>> c5.encoding = "UTF-8"
>>> c5.message = 'Rename "spam" to "eggs"'
</pre>
<p>As for a deletion, we only have a tree and a commit to save:</p>
<pre class="literal-block">
>>> object_store.add_object(tree)
>>> object_store.add_object(c5)
</pre>
<p>It remains to move the head to the new commit:</p>
<pre class="literal-block">
>>> repo.refs['refs/heads/master'] = c5.id
</pre>
<p>As a last exercise, see how <tt class="docutils literal">git show</tt> illustrates it.</p>
</div>
<div class="section" id="conclusion">
<h1><a class="toc-backref" href="#id15">Conclusion</a></h1>
<p>You'll find the <tt class="docutils literal">test.py</tt> program with some tips I use to ease generating objects.</p>
<p>You can also make Tag objects, but this is left as an exercise to the reader.</p>
<p>Dulwich abstracts much of the Git plumbing, so there is more to explore.</p>
<p>Dulwich is also able to clone and push repositories.</p>
<p>That's all folks!</p>
</div>
</div>
</body>
</html>