
GitHub is getting easier

Today, GitHub announced that you can move and rename files within your repositories directly on their website, rather than by making the changes locally, committing and then pushing them. I confess that I didn’t even know, until today, that you could create and edit files on the GitHub website in the first place. I’ve just been using it as a freetard’s on-line repository – commit, push and forgeddaboutit.

I think this changes the game both for the user and for GitHub itself. Normally, one would create and edit files locally, commit them to the local Git repository, and then push the changes up to GitHub (or any on-line Git repository such as those hosted by SourceForge or BitBucket). This work-flow is so natural to me that I rarely visit the GitHub web-site, except when I need to create a new, blank repository. Now a novice user can very easily create and initialise a new Git repository, create and edit files, rename and move them, create branches, fork and merge – all without so much as a nod to a command line or a Git client. So the dirty mechanics of Git recede into the background and what we have is an easy-to-use web interface/client to create and manage versions of files, with a social sharing aspect to boot.
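For comparison, here is roughly what the local route looks like when scripted. This is only a sketch: it assumes Git is installed and on the PATH, and the helper names (git, rename_locally) and the placeholder commit identity are made up for illustration.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        # The -c options supply a placeholder identity so commits work anywhere.
        ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com", *args],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def rename_locally(repo: Path, old: str, new: str, message: str) -> None:
    """Rename a tracked file and record the rename as a local commit."""
    # git mv will not create missing target folders, so make them first.
    (repo / new).parent.mkdir(parents=True, exist_ok=True)
    git(repo, "mv", old, new)
    git(repo, "commit", "-m", message)
    # Publishing is a separate, later step, for example:
    # git(repo, "push", "origin", "HEAD")
```

Calling rename_locally(repo, "notes.txt", "docs/notes.txt", "Move notes into docs") does locally, in two Git commands plus a push, what the web interface now offers as a single edit.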

GitHub is therefore de-emphasising the difficult and opaque “Git” aspect and increasing its “Hub-ness”. I think this potentially increases the user base and positions it for services and features that are not necessarily tied to just geeky and boring Source Control Management. It’s not that Git is getting sexy, but that file version control and repository forking are getting easier. It also means that users might spend more time at the GitHub web-site rather than in their Git client or IDE. Ads next, maybe?

I propose a new word:

giterate
adjective
able to use Git literately.

As in, “Hey, Steve’s getting quite giterate these days!”

Why is Git scary?

There have been a few occasions where I’ve tried to explain how the distributed nature of Git works to an interested listener, usually a Subversion (or perhaps CVS) user, who just downright couldn’t get it and got annoyed at not getting it. Clearly I didn’t explain it well, but if you don’t know how Git works, and you’ve only ever used a centralised repository system, it sure can sound kind of mysterious and scary. How the hell can you have more than one repository? Which one is the real, or canonical, one? What if someone codes something phenomenal, but it’s in their own repo; how do we get that in the build? What if there are a hundred different versions? How can you integrate the work of different developers? What if there’s a conflict? What if…

I think the missing piece of information that can help in understanding Git is that there actually is a central, or canonical, repository. It’s the one you do your build from. Let’s take the Linux kernel as an example. There are probably thousands of copies of this pulled from the main Git repository, perhaps by hobbyist developers, or by people actually working on the kernel. But when it comes to doing the build, the agreed main repository is the one to use. One of the most useful things you can do to get a better understanding of Git is to watch Linus Torvalds explain it at a Google Tech Talk in 2007 (Git has come a long way since then):

A great takeaway from that video is the slide (at about 12:30) that shows the difference between centralised and distributed systems. I’ve made my own versions.

Here’s a centralised system such as Subversion or CVS:

[Image: centralised]

And a distributed system like Git:

[Image: distributed]

In the first diagram, each user has to commit their work to the one central repository, let’s call it The Central Scrutinizer. To pay homage to the Central Scrutinizer you have to be online. You’re out of luck if you want to commit some code when you’re on a plane. Each user works in isolation, checking their changes in and checking things out hoping there’s no conflict that requires a merge.

In the second diagram you can see that there is one main repository in the centre, let’s call it Le Big Mac, but there are many satellite repositories owned by users that seem to have formed sub-groups. These can cluster together, creating and refining their own secret sauce that can be pushed to Le Big Mac when they’re ready.

So, if you have a problem visualising how Git works, just remember The Central Scrutinizer and Le Big Mac.

Bundling a Java Runtime Environment (JRE) with an Eclipse RCP application

I’ve figured out how to bundle the Oracle Java Runtime Environment (JRE) 7 with the Mac version of Archi, an Eclipse-based Rich Client Platform (RCP) application that I developed. There’s no real need to do this at the moment, because the first time an application with a dependency on desktop Java, such as Archi, is run on a Mac, and Java is not currently installed on the system, the user will be informed that the application needs it and Java will be automagically downloaded and installed. This of course ensures that you can easily run your favourite Java-based applications such as Archi, XMind, SmartGit, Eclipse, IDEA and a lot of other useful tools. (Note that Apple only supports JRE 6.)

For Archi I ensure that the user has reasonable choices – on Windows they can either use the forget-about-it installer, which includes its own local copy of the JRE (which is used only by Archi and for no other purpose, not even the Browser), or download the manual zip file, which means that the user needs to manually install their own copy of Java. On Linux, the user will probably want to compile the source anyway and knows what type of Java framework they want on their system (probably OpenJDK). But on the Mac, the user has to let the system install Apple’s version of Java (version 6) or, if they prefer to use the latest version 7, they have to manually download and install the JRE from Oracle’s website. This is a bit of a pain. I tried this myself and could only get the JDK to work, not the JRE. Whilst Lion, Mountain Lion and Mavericks OS X will install JRE 6, this might not always be the case in future versions of OS X.

The advantages of bundling a local copy of the JRE with Archi are:

  • The user doesn’t have to worry about installing Java (or even care that the application requires it)
  • The JRE is local and is only used by the application and is therefore “sandboxed”
  • It isn’t installed as an extension in the Browser (this is the real vector for trojans and viruses)
  • When the user deletes the application off their system, they also delete the local JRE – an instant complete uninstall

Disadvantages:

  • The download size and application footprint are bigger (adds about another 140 MB or so when unpacked)
  • Each Java-based application will have its own copy of the JRE when only one system-wide copy is necessary, so you could end up with some disk bloat
  • …can’t think of any more 🙂

So, how do we bundle a copy of the Mac JRE 7 with Archi, or any Eclipse-based RCP application for that matter?

I already do this for the Windows version of Archi by simply copying the “jre” folder of the official JRE (with its “lib” and “bin” sub-directories) and putting it at the root level of the Archi installation. This procedure works for any other Eclipse-based application, including Eclipse itself.

On OS X this has only been possible since Oracle’s later versions of JRE 7, and later versions of Eclipse itself. The same principle applies on a Mac as for Windows – include the JRE in a “jre” folder at the root level of the application. Of course, as Archi is delivered as a self-contained application bundle on Mac (Archi.app) the “jre” folder sits inside the Archi.app bundle at the same level as the other folders:

Archi.app
   |______ configuration
   |______ Contents
   |______ plugins
   |______ jre

So, how do we make a re-distributable copy of the JRE to add to the application bundle? The only way I could figure out how to do this was to firstly install the JDK onto a Mac and then make a copy of some sub-folders and files:

  1. Install Oracle’s JDK 7 on a Mac (not the JRE)
  2. Copy the “/Library/Java/JavaVirtualMachines/jdk1.7.0_xx.jdk” folder and rename the copy to “jre” (xx = the two-digit version number of the JDK)
  3. Delete everything in the copy’s “Contents/Home” sub-folder except for the “jre” sub-folder

You end up with a slimmed-down JRE with this folder structure:

jre
   |______ Contents
            |______ Home
                     |______ jre
            |______ Info.plist
            |______ MacOS

This “jre” folder then needs to be added to the Archi.app bundle.
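The copy-and-prune steps above could be scripted along these lines. Treat it as a rough sketch rather than the procedure actually used: the function name make_slim_jre is made up, and it simply assumes the installed JDK folder has the Contents/Home layout shown above.

```python
import shutil
from pathlib import Path


def make_slim_jre(jdk_dir: Path, dest: Path) -> Path:
    """Copy an installed JDK folder to `dest` and keep only Contents/Home/jre."""
    # symlinks=True copies symbolic links as links rather than following them.
    shutil.copytree(jdk_dir, dest, symlinks=True)

    home = dest / "Contents" / "Home"
    for entry in home.iterdir():
        if entry.name == "jre":
            continue  # this is the part we want to keep
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return dest
```

Everything outside Contents/Home (Info.plist and the MacOS folder) is left untouched, which matches the folder structure listed above.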

Note – as I use a Windows build machine and an Ant script to create the installation archives for Archi I found that some files in the JRE folder lost their executable bit and so some files didn’t work. To get around this I simply zipped up the copied JRE folder on a Mac and used this zip file as the source of the Mac JRE, so that the Ant script simply copies the zip file’s content to the overall target installation archive, preserving file attributes (including an “alias” type file).
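To illustrate the executable-bit problem, here is a small Python sketch (not the Ant script mentioned above) that zips a folder on a Unix-like system and unpacks it again with the recorded permission bits re-applied. The function names are made up, and symbolic links, such as the “alias” type file, are not handled; they would need extra care.

```python
import os
import zipfile
from pathlib import Path


def zip_tree(src_dir: Path, zip_path: Path) -> None:
    """Zip every regular file under `src_dir`, using paths relative to it."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # ZipFile.write stores the file's Unix mode in the entry.
                zf.write(path, path.relative_to(src_dir).as_posix())


def unzip_keeping_modes(zip_path: Path, dest: Path) -> None:
    """Extract a zip and re-apply the Unix permission bits stored in it."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest)
            # The upper 16 bits of external_attr hold the Unix mode, if any.
            mode = (info.external_attr >> 16) & 0o7777
            if mode and not info.is_dir():
                os.chmod(extracted, mode)
```

A plain extractall() does not restore these bits, which is one way files can end up losing their executable flag.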

I haven’t rolled this out yet, but might do so for a future version of Archi. It can only be the 64-bit version, though, as Oracle’s JRE 7 only supports 64-bit.

Update – some people have emailed to ask how I get the Archi Eclipse application into an Archi.app application bundle in the first place. By default, an Eclipse product export doesn’t do this, so I’ve written an Ant script that moves the “plugins” and “configuration” folders down one level into the .app bundle, moves the “Archi” executable launcher file down a level into the Archi.app bundle, and modifies the Info.plist file to adjust the path to the launcher file. The folder structure inside the app bundle looks like this:

[Image: archi folders]

The Info.plist file is modified to set the launcher path:

<key>CFBundleExecutable</key>
<string>launcher/Archi</string>
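If you’d rather script that change than edit the file by hand, Python’s standard plistlib module can do it. This is a sketch only, not the Ant script described above, and the function name set_launcher_path is made up.

```python
import plistlib
from pathlib import Path


def set_launcher_path(info_plist: Path, launcher: str = "launcher/Archi") -> None:
    """Point CFBundleExecutable in an Info.plist at the given launcher path."""
    with info_plist.open("rb") as fp:
        info = plistlib.load(fp)

    info["CFBundleExecutable"] = launcher

    # Write the modified dictionary back as an XML property list.
    with info_plist.open("wb") as fp:
        plistlib.dump(info, fp)
```

All other keys in the file are read and written back unchanged.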

Gamification and Vacuous Neologisms

I read a post this morning by Nigel Green – Four G’s: Gartner, Gamification Getting Things Done & Game Theory

A nice post, fair enough. But the part that made me choke on my cornflakes* was this quote from Gartner’s Steve Prentice:

We all do Gamification already. Gamification is when we create a To Do List and enjoy the satisfaction of ticking items off and finally completing the list. It gives us focus and goals to achieve.

No, no, no, no, no! When I tick a task off of my To Do list it’s because I damn well made myself do it in spite of not wanting to. It’s called “discipline”. Do I need another word for this?

What’s the value in using another word for something I already do? When I’m on my daily jog I set myself small goals to make it more interesting, such as “run to the next lamp-post”. “Gamification!” goes the cry. Setting goals and achieving them is now “Gamification”. Groan. So what added value does this re-branding provide? To me, none. To a consultant, an academic, or an author, possibly a whole lot – opportunities for workshops, for consultancy, perhaps a paper, or a trend-setting “How To” book.

A quick Googling of the neologism led me to this article from How Stuff Works:

McGonigal believes that if people worldwide could play more, not less, in the right game scenario, their experience could help solve some of the world’s biggest problems like hunger, poverty and global conflict.

My heart sinks.

And this:

In his 2010 book “Game-Based Marketing,” co-authored with writer Joselin Linder, Zichermann defines a related term he coined: funware. Funware describes the everyday activities we’re already engaging in that we consider a game. Zichermann explains that business should look for ways to apply funware in their marketing. Funware, he says, is the core component in applying gamification to business.

My heart sinks even further. This is the kind of nonsense Douglas Adams would have included in the “B” Ark.

But, sadly, I need to get back to work, there’s a bug I need to fix. Damn, if only I had some Funware to fix it.

(* Disclaimer – I don’t actually eat cornflakes for breakfast, preferring instead that prince of foods, the muffin)

MOOCs

Most of the recent anti-MOOC commentary by the cleverati sounds more like sour grapes to me. One bogus argument is that courses achieve a low completion rate. 10% of several thousand is doing OK by anybody’s book.

Here are some comments from Tucker Balch who’s actually taught a MOOC:

The cost for a MOOC is zero. All a student need do is provide an email address, and click a button labeled “sign me up.”

Failing a course at a university is costly in many ways for a student. Besides the time and funds lost, there’s the cost of that “F” on the transcript. There are no such costs associated with MOOCs.

But MOOC completion rates aren’t really low in the context of Internet engagement. A click through rate of 5% for a google ad is considered a strong success. Convincing 5% to engage intellectually for 8 weeks is, I think, a big deal.

A refreshing change to the tiresome armchair punditry of those who typically haven’t taken a MOOC or taught one. It reminds me of the brouhaha in the 1980s when the UK Musicians’ Union tried to limit the use of Samplers because they feared that “real” musicians would be done out of a job. That’s the real issue here, isn’t it? The bogus edutech cleverati weren’t consulted, MOOCs have been launched without their (unwanted) say-so, and they’re basically out of a job.

Forking Hell? Git, GitHub, and the Rise of Social Coding

Wired recently ran an article about GitHub – the web service that provides version control management repositories using the Git version control system. What makes this interesting is that, as they point out in the follow-up story, it is a self-referential experiment in version control because, in addition to publishing their story about GitHub on the Wired site, they also published it on GitHub itself. What they did essentially was to eat their own dog food (or put their money where their mouth is, if you prefer) by sharing the text of the article under a Creative Commons licence for anyone to download, edit, modify, translate into another language and then resubmit. An example of crowd-sourcing, right?

Reading the Wired article reminded me of the realisations that occurred to me when I first started to use Git and GitHub. I became aware of the greater potential for distributing and sharing different types of material and the possibilities for social interactivity that it might engender. I found it extraordinary, and still do, that something as geeky as Source Control Management had suddenly become cool, and that there was a buzz around GitHub similar to that coming from Facebook and Twitter.

But first, we need to take a short break for some Git 101…

Git

Git is a distributed or Decentralised Version Control System (DVCS). Prior to Git (and Mercurial, another DVCS) developers managed versions of their shared source code in systems such as CVS and Subversion. Typically, when using these “legacy” systems, a developer downloads any new changes to the shared code-base from the code repository, merges these with their own local changes, edits and adds new code, and then re-submits the new work back to a centralised repository situated on the local network or on a public hosted repository, such as SourceForge. This system ensures that there is a full history of all changes to the code (a record of who did what) and creates “tags” that mark released versions and milestones. The system works somewhat like the revision history used in Wiki software.

But two words immediately stand out here: Code and Centralised. Let’s be honest, these Source Control Management systems were, and still are, typically used by developers for storing their code, whether that be Java code, C++ code, or HTML. And each of those coders is uploading and downloading that code from one single, centralised repository. But what if the server goes down? What if you’re working off-line? What if two people try to upload conflicting changes at the same time? And what if you don’t “do” code?

Enter Linus Torvalds, creator of the Linux operating system. In 2005, after struggling to manage the Linux code-base using an ad hoc system of “patches” sent by users via email or other precarious systems, Torvalds created Git, a version control system designed to ease the burden of source control management and which enables developers to create their own branches or “forks” of the code. These “forks” can remain in the hands of the forker, acting as their own version of Linux, or they may be shared with Torvalds or other developers, who would then choose to merge the changes into the main code-base or reject them. I recommend that you watch Torvalds describe Git and its genesis in his own inimitable way in this YouTube video of a talk he gave to Google some years ago. At the very least, it’s entertaining.

The beauty of Git is that because the user clones a local copy of the material in the repository onto their hard drive, they no longer have to be on-line in order to “commit” changes. They can commit as many branches and versions as they like to their local copy and then upload (or “push”) their commits to an on-line repository whenever they happen to be on-line, or not at all. Furthermore, their copy is, in a sense, the repository, being an exact copy of the source. And this is the killer feature: there is no “main” repository, as it’s a distributed system. If a hundred users create a hundred clones of a source repository then there are a hundred separate, and potentially different, versions in existence as each person contributes their own edits and additions. But think about this for a moment: if there are a hundred separate versions of the source material then which one is the “right”, or canonical, one?

For it is written, “…realise the truth. There is no master repository”.

It’s an egalitarian system. Each clone, or copy, of the code can be regarded as the “master” repository. Or not. I can assemble a group of co-workers and we can work collaboratively on our version and merge our changes together and then “push” those changes to yet another group working on the same material. These are self-organising systems.

Having said that, it can be useful to have one nominated de facto repository that hosts the “authoritative” version that can be used for a build, for example if it’s code for an application, or as a starting point for people to clone their own copies. This obviously requires some kind of centralised public presence. Enter GitHub.
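To make that concrete, here is a toy sketch in which a bare repository in a local folder stands in for the nominated “authoritative” one (where a hosting service would normally sit). It assumes Git is installed; the names alice, bob and central.git, and the placeholder commit identity, are all made up for illustration.

```python
import subprocess
from pathlib import Path


def git(cwd: Path, *args: str) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com", *args],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def demo(workdir: Path) -> str:
    """Commit in one clone, push to the agreed repository, clone it elsewhere."""
    central = workdir / "central.git"
    git(workdir, "init", "--bare", str(central))

    # Alice clones, then commits locally; no network is needed for the commit.
    alice = workdir / "alice"
    git(workdir, "clone", str(central), str(alice))
    (alice / "README.txt").write_text("secret sauce\n")
    git(alice, "add", "README.txt")
    git(alice, "commit", "-m", "First commit, made off-line")

    # Only the push talks to the agreed repository.
    git(alice, "push", "origin", "HEAD")

    # Bob starts from the agreed repository and receives Alice's work.
    bob = workdir / "bob"
    git(workdir, "clone", str(central), str(bob))
    return (bob / "README.txt").read_text()
```

Nothing in Git itself marks central.git as special; it is “authoritative” only because everyone agrees to push to it and build from it.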

GitHub

GitHub is a web-based hosting service for software development projects that use Git. Certainly, Git can be difficult for the non-technical user (and indeed for many avowed techies) and GitHub attempts to make the process a whole lot easier. GitHub’s first tag-line was “Git Hosting: no longer a pain in the ass”.

GitHub provides free (and paid for) facilities to host Git repositories but perhaps more importantly, provides social networking services that aid collaboration and sharing. For example, I can comment on your code, fork your code, ask you to merge my changes into your version (a “pull” request), edit your Wiki, watch your repository and follow users in a similar way to following users on Twitter.

GitHub makes any Git interaction painless, quick and easy. I can browse its collection of repositories, find something interesting, click on the “Fork” button and immediately GitHub will create a clone of the repository and add it to my user space. I then get to work on the cloned copy in any way I wish. Using Git from the command line is a daunting process for the non-technically inclined, but using GitHub means that you don’t even have to install Git on your computer. Get to work on a project from your iPhone if you like.

It’s been said that GitHub is “Facebook for geeks”, because you can also share snippets of code, text, or anything as a “Gist” – perhaps this feature is akin to a “Twitter for geeks”?

Hold up a moment. Isn’t the idea behind Git that it’s supposed to be decentralised and distributed? Surely GitHub is now functioning as a central repository? Not really. GitHub is just one node amongst potentially many. I myself use three repositories for the code for Archi: a backup on a network drive, one at SourceForge and one at GitHub. These are all mirrors of exactly the same material. GitHub provides visualisation and social tools, and don’t forget the “hub” part of the name. It is centralised, but only in the same way that a common room acts as a meeting place for social interaction. You’re free to grab what you want and go. The moment you clone a Git repository you too represent another node in the distributed graph.
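Keeping several mirrors in step is just a matter of pushing the same commits to more than one remote. A minimal sketch, assuming Git is installed; the helper names and the placeholder commit identity are made up, and the remote names and URLs are whatever you choose to pass in.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def git(cwd: Path, *args: str) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com", *args],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def push_to_all(repo: Path, remotes: Dict[str, str]) -> List[str]:
    """Register each remote if needed, push the current branch to it,
    and return the sorted names of all remotes now configured."""
    existing = set(git(repo, "remote").split())
    for name, url in remotes.items():
        if name not in existing:
            git(repo, "remote", "add", name, url)
        git(repo, "push", name, "HEAD")
    return sorted(git(repo, "remote").split())
```

Each remote ends up with the same commits, so any one of them can serve as the starting point for a fresh clone.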

Not Only for Developers and Code

As I said earlier, it may seem that Git and GitHub are designed only for developers to manage their code. It turns out that this isn’t the case. What’s interesting is that riding on the back of its cool factor, GitHub is increasingly being used to host and share material that isn’t code. Writers are using it to version and share their novels, musicians to promote their songs and invite remixes, and artists to capture their work-flow.

Writers

As the Wired article mentioned earlier demonstrates, writers are now starting to use Git and GitHub to manage their novels, poems, and articles. Clearly, some writers are finding this invaluable for distributing their work and crowd-sourcing revisions. If they keep fine-grained versions of their masterpiece by committing to a Git repository as they progress, an interesting archival record is made, as Cory Doctorow points out:

…prior to the computerized era, writers produced a series [of] complete drafts on the way to publications, complete with erasures, annotations, and so on. These are archival gold, since they illuminate the creative process in a way that often reveals the hidden stories behind the books we care about.

And not only that, Doctorow notes that a service such as GitHub can provide you with the means (and the incentive) to publish some or all of your projects to a public repository or to a private site and, furthermore, the publisher can check out the latest revision of an author’s text when it’s time to publish an updated version.

It sounds attractive: push your latest novel to a Git repository, commit your changes, branch off different versions and let your readers and your publisher choose the version they want. Fork that!

Musicians

Durham-based band, the Bristol 7’s, last year released their album, “The Narwhalingus EP” on GitHub under a Creative Commons licence “to see what the world could do with it”. The release, if we can call it that, comprises the final mixes and the individual tracks as MP3 files. The band invites everyone to:

Fork the repo, sing some harmony, steal my guitar solo, or add a Trance beat. Whatever you want to do, just tell us about it, so we can hear what’s become of our baby!

At the time of writing they have nine forks. Actually, ten, since I just forked it. Now where’s that guitar…

Artists

Cory Doctorow has discovered that Mark V of Electric Puppet Theatre is using Git to produce automated “making of” videos of his workflow. V says:

Electric Puppet Theatre is a web comic that I draw in Inkscape, using git for version control. A neat side effect of using git is that I can make a ‘making of’ video for each 24 page issue by playing the git repository through ffmpeg. The linked page contains animations for the first two issues as well as instructions on creating this type of animation (touching on how to make both ogg and youtube-compatible webm animations).

Again, a perfect example of versioning being used to illuminate the creative process and using Git like a playback script.
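The first half of that idea, walking the history and writing out each revision of a drawing as a numbered frame, might look something like this. It is a guess at the general approach, not V’s actual script: the function names are made up, Git is assumed to be installed, the file is assumed to exist in every commit that touched it, and turning the frames into a video (rendering and ffmpeg) is left out entirely.

```python
import subprocess
from pathlib import Path
from typing import List


def git(cwd: Path, *args: str) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", "-c", "user.name=Example", "-c", "user.email=example@example.com", *args],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def export_revisions(repo: Path, tracked_file: str, out_dir: Path) -> List[Path]:
    """Write each committed version of `tracked_file` to a numbered frame file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Oldest commit first, limited to commits that changed the file.
    revisions = git(repo, "rev-list", "--reverse", "HEAD", "--", tracked_file).split()

    frames = []
    for index, revision in enumerate(revisions):
        # "git show <revision>:<path>" prints the file as it was in that commit.
        content = subprocess.run(
            ["git", "show", f"{revision}:{tracked_file}"],
            cwd=repo,
            check=True,
            capture_output=True,
        ).stdout
        frame = out_dir / f"frame_{index:04d}{Path(tracked_file).suffix}"
        frame.write_bytes(content)
        frames.append(frame)
    return frames
```

The numbered files can then be fed to whatever rendering and encoding tools you prefer.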

More Uses

Here’s another very interesting potential use for Git. The Wired article cites the case of Ryan Blair, a technologist with the New York State Senate who wants to see citizens “fork the law”:

[He] thinks it could even give citizens a way to fork the law – proposing their own amendments to elected officials. A tool like GitHub could also make it easier for constituents to track and even voice their opinions on changes to complex legal code. “When you really think about it, a bill is a branch of the law”, he says. “I’m just in love with the idea of a constituent being able to send their state senator a pull request.”

In the world of education, Git could be used to promote the use of Open Educational Resources (OERs) in the class-room. A teacher or lecturer could create their own set of resources under a Creative Commons licence, “push” them to their public GitHub repository and use this as a starting point for distribution, and hopefully attract contributions in order to crowd-source a richer set of materials.

GitHub as a Record of Achievement

Something else occurs to me. If I have all my code publicly available on GitHub with all my Gists, forks, contributions, interactions and so on, doesn’t this constitute a kind of open record of my personal achievements, competence, abilities and social interaction? It’s common practice for some employers to peruse potential employees’ Facebook pages to discover some juicy bits of background information. Might I not be able to exploit GitHub to show how I’ve been actively engaged in the developer community, or to show off my coding chops?

Git and Me Sitting in a Tree…

Did I mention that I think Git rocks? I was forced to use Git at the start of 2011 when the CVS repository that I relied on at SourceForge became unavailable for several days due to a malicious attack. Ten days later and still with no repository on-line, I vowed never again to be reliant on a single centralised point of failure. This is the beauty of distributed. In fact, Linus Torvalds admits to never backing up his laptop since the work he does is so heavily cloned that his stuff is just “out there”, each user’s copy being a backup in itself.

Even if you are the sole developer/writer/artist/musician on a project, don’t assume that you won’t benefit from Git and GitHub. At least you’ll have an archive of past versions and experimental branches to fall back on if things go wrong or you lose your work. For my current project, Archi, I use four Git repositories. Two are deemed to be official or canonical, one at SourceForge and one at GitHub; the other two are one named “experimental” at SourceForge and a local backup. The “experimental” repository allows me to keep experimental work.

Git is good, Git is social, all the cool kids use Git. I can have as many Git repositories as I like, for free, and I don’t have to be on-line when I work. Git, where have you been all my life? And even if you never use Git, who can resist the mascot of GitHub, the mighty Octocat…

[Image: octocat]

Footnote

OK, I admit it. I should have done the obvious and uploaded the text of this post to GitHub…

If you want to find out more about Git I suggest reading Scott Chacon’s useful book “Pro Git”. As for Git tools, I personally don’t use the command line as I like to see what I’m doing. I recommend SmartGit as it’s cross-platform and free for non-commercial usage.

