Posted by Ancestry Team on February 26, 2015 in Development, Operations

You are probably aware of Git submodules. If not, you may want to read about them from Atlassian and from Git itself. In summary, Git provides a way to embed a reference to a separate project within a main project while treating the two as separate entities (each with its own versioning, commits, etc.). This article applies to any project that uses submodules, irrespective of programming language.

Recently, my team had issues working with submodules. These ranged from changes in project structure to abrupt changes in the tools and commands needed when working on projects that involve submodules. Some in the industry consider Git submodules an anti-pattern and hold that the ideal solution is to reference shared code only as precompiled packages (e.g. NuGet, Nexus).

This post is a short reflection on how we can restrict submodule use to certain scenarios, and how best to work day to day on projects that use submodules.

When should you use submodules?

  • When you want to share code with multiple projects
  • When you want to work on the code in the submodule at the same time as you work on the code in the main project

For the opposite view, this blog highlights several scenarios where Git submodules will actually make your project management a living nightmare.

Why use submodules?

To solve the following problems:

  • When a new person starts working on a project, you’ll want to spare them from having to find out which individual shared repositories need to be cloned. The main repository should be self-contained.
  • In a CI (continuous integration) environment, simply hooking up the shared Git repositories as material pulls would be detrimental because the versions of the modules are not coupled. Any modification to a shared repository would trigger the dependent CI pipelines, and a breaking change could block a pipeline.
  • Projects need to be developed independently. For example, ConsumerProject1 and ConsumerProject2, which both depend on SharedProject, can each be worked on without worrying that a breaking change will affect the other's pipeline status (which may block development and deployment of the separate projects/services).

How should submodules be restricted?

We found that the best way to prevent complexity from creeping into this methodology is to do the following:

  • Avoid nested submodule structures, that is, submodules that contain other submodules. Nested submodules may share the same submodules as their siblings, creating duplicates. Thus, the parent repository would NEVER be a shared project.
  • Depending on the development environment (e.g. Visual Studio), submodules should only be worked on when opened through the solution file of the parent repository. This ensures that relative paths work consistently across the other parent repositories that consume the same submodules.
  • Submodules should always be added to the root of the parent repository (for consistency).
  • The parent repository is responsible for satisfying the dependency requirements of its submodules by linking the necessary submodules together, similar to the responsibility of an IoC (Inversion of Control) container (e.g. Unity).
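
As a rough illustration of the root-level rule above, the following sketch adds a shared repository as a submodule directly under the root of a parent repository. It builds throwaway repositories in a temporary directory, so the paths, the demo identity, and the lib.txt file are assumptions made for the demo; the ConsumerProject1 and SharedProject names are borrowed from the example earlier in this post. The protocol.file.allow override is only there because the demo adds the submodule from a local path, which recent Git versions refuse by default; with a real remote URL it should not be needed.

```shell
#!/bin/sh
# Sketch: add a shared repository as a submodule at the root of a parent repository.
# All repositories here are hypothetical and live in a throwaway temp directory.
set -e

work=$(mktemp -d)
cd "$work"

# Identity used for the demo commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the shared project, with one committed file.
git init -q SharedProject
( cd SharedProject && echo "shared code" > lib.txt && git add lib.txt && git commit -q -m "Initial shared code" )

# The parent repository that consumes the shared project.
git init -q ConsumerProject1
cd ConsumerProject1

# Add the submodule directly under the parent's root directory.
# protocol.file.allow=always is needed only because the source is a local path.
git -c protocol.file.allow=always submodule add "$work/SharedProject" SharedProject
git commit -q -m "Add SharedProject as a submodule"

# .gitmodules records the path and URL of the submodule.
cat .gitmodules
# The status line shows the commit that the parent currently pins.
git submodule status
```

The .gitmodules file records only the path and URL; the exact submodule commit is stored in the parent's own commit, which is what couples the versions of the two projects.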

What are the main Git commands when working with submodules?

  • When cloning a repository that contains submodules:
    git clone --recursive <url>
  • When pulling in the latest changes for a repository (e.g. the parent project) that contains submodules:
    git pull
    git submodule update --init --recursive

    The ‘update’ command updates the contents of the submodule folders after the preceding git pull has updated the submodule commit references in the parent project (yes, it’s weird).

  • When you want to update all submodules for a repository to the latest commit on their remote-tracking branches:
    git submodule update --init --remote --recursive
  • By default, the submodule update commands above leave your submodules in a detached HEAD state. Before you begin work, create a branch in the parent and in each submodule to track your changes:
    git checkout -b <branchname>
    git submodule foreach 'git checkout -b <branchname>'
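
To make the pull, update, and branch sequence above more concrete, here is a minimal sketch that replays it against throwaway local repositories. The repository names (shared, parent, clone), the branch name demo-branch, the lib.txt file, and the small g wrapper are all hypothetical and exist only for this demo. It assumes a default Git configuration (for example, submodule.recurse is not set). The wrapper is needed only because the demo uses local paths, which recent Git versions refuse for submodule transfers by default.

```shell
#!/bin/sh
# Sketch of the pull / update / branch sequence, using throwaway local repos.
# All names (shared, parent, clone, demo-branch) are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Allow local-path submodule transfers for the demo only.
g() { git -c protocol.file.allow=always "$@"; }

# Setup: a shared repo, and a parent that embeds it as a submodule.
git init -q shared
( cd shared && echo v1 > lib.txt && git add lib.txt && git commit -q -m "v1" )
git init -q parent
( cd parent && g submodule add "$work/shared" shared && git commit -q -m "Add shared" )

# A teammate clones the parent together with its submodules.
g clone -q --recursive "$work/parent" clone

# Meanwhile, shared moves to v2 and the parent records the new commit.
( cd shared && echo v2 > lib.txt && git commit -q -am "v2" )
( cd parent/shared && g pull -q --ff-only origin HEAD )
( cd parent && git add shared && git commit -q -m "Bump shared to v2" )

# Teammate: pull moves the recorded reference, but not the submodule files.
cd clone
g pull -q --ff-only
before=$(cat shared/lib.txt)

# update checks out the commit that the parent now references.
g submodule update --init --recursive
after=$(cat shared/lib.txt)
echo "before update: $before, after update: $after"

# Leave the detached HEAD state before starting work.
git checkout -q -b demo-branch
git submodule foreach 'git checkout -q -b demo-branch'
```

Running git submodule status between the pull and the update should show a leading + on the submodule line, which is Git's way of saying that the checked-out submodule commit differs from the one the parent records.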

 

So, how do you use Git submodules? What best practices do you use to keep Git submodules well-managed and within a certain amount of complexity? Do share your experience and feedback in the comment section below.