
Consolidating SecureDrop Workstation’s Git repositories to make development easier

January 30, 2024

As the SecureDrop team previously announced, we’re shifting our focus in order to graduate SecureDrop Workstation from its pilot phase. One of the first steps we’ve taken in this direction is to reorganize and consolidate the project's Git repositories to make development and releases easier and faster.

In summary:

Background

SecureDrop Workstation is built on top of Qubes OS and embodies the concept of “security by compartmentalization”: the workstation is split into multiple components, each running in its own virtual machine (VM).

SecureDrop Workstation Dataflow diagram

There are four key components:

  • Client: Graphical Qt application that journalists interact with.
  • Export: Application responsible for document export and printing.
  • Log: Centralized logging from all other VMs.
  • Proxy: Bespoke HTTP proxy to limit outbound traffic.

Most of the components interact with one another; for example, the client sends documents to export and makes API requests through the proxy.

Because each component runs in its own VM, it originally seemed reasonable to give each one its own Git repository, making the boundaries between them clear. Unfortunately, we've learned the hard way that this setup creates more work while offering few benefits. Changes to one component often require changes to another, which means awkwardly coordinating and testing multiple pull requests.

We also had to keep all the repository boilerplate (CI manifests, linters, helper scripts, etc.) up to date in multiple places, which didn't always happen. In early 2023, we mapped out the tools used across our primary repositories to see just how far they had diverged.

We’re already starting to see the benefits of consolidating; instead of two pull requests across the client and export repositories, we just have a single one (still in progress) to add VeraCrypt support.

Case study: Proxy and SDK

Another good case study is the planned changes to the proxy component and SDK. At our November 2023 team retreat, we worked out a plan to rearchitect the proxy, laying the groundwork for features like resuming file downloads and progress reporting. Under the old setup, this would have required changes across three Git repositories:

  1. The proxy itself
  2. The SDK, which invokes the proxy
  3. The client, which pulls in the SDK as a Python package dependency

In contrast, the new repository setup requires changes to just one repository, because we can update the SDK and the proxy together.

We can also simplify how the SDK works, since it's no longer a stand-alone package. It currently has two modes of operation: one for development, which makes HTTP requests using the requests library, and one for production installs, which makes HTTP requests through the proxy VM.

Now that the proxy lives in the same Git repository, we can assume it'll always be present and have the SDK unconditionally route requests through the proxy, optionally also crossing the Qubes VM boundary. This lets us remove some code and eliminate one difference between development and production modes.

Merging repositories with --allow-unrelated-histories

When consolidating the repositories, we wanted to keep the Git history intact, not only for tools like git blame but also so that past contributors would be properly attributed. Git makes it straightforward to merge two (or more!) repositories together with git merge --allow-unrelated-histories (see the git merge documentation).

The workflow was roughly to move files into a separate directory (to avoid merge conflicts) and then merge them into the main branch. For example, at the command line:

$ git clone git@github.com:freedomofpress/securedrop-client.git sd-client
$ cd sd-client
sd-client$ git remote add export git@github.com:freedomofpress/securedrop-export.git
sd-client$ git fetch export && git checkout -b export-main export/main  # local branch from the export repo's main
sd-client$ mkdir export && git ls-tree --name-only HEAD | xargs -I{} git mv {} export/  # move every tracked entry, dotfiles included
sd-client$ git commit -m "Move files into export/ directory"
sd-client$ git checkout main
sd-client$ git merge --allow-unrelated-histories export-main
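Repeating these steps for each component is easier with a small wrapper. Below is a minimal sketch of one, not the script we actually used. It assumes bash and Git 2.9 or later (the release that introduced --allow-unrelated-histories). The function name import_repo_into_subdir is hypothetical, and it assumes the repository being imported has no top-level entry with the same name as the target subdirectory.

```shell
# Hypothetical helper: merge another repository's history into the current
# branch, with all of that repository's files placed under <name>/.
#   $1 = name for the remote and the subdirectory (e.g. "export")
#   $2 = URL or local path of the repository to import
#   $3 = branch to import from that repository (default: main)
import_repo_into_subdir() {
  local name="$1" url="$2" branch="${3:-main}"
  local target
  target="$(git symbolic-ref --short HEAD)" || return  # branch to merge into

  # Fetch the other repository and check out its branch locally.
  git remote add "$name" "$url" || return
  git fetch --quiet "$name" || return
  git checkout --quiet -b "$name-$branch" "$name/$branch" || return

  # Move every tracked top-level entry (dotfiles included) into $name/ so
  # the merge below cannot conflict with files on the target branch.
  mkdir "$name" || return
  git ls-tree -z --name-only HEAD | while IFS= read -r -d '' entry; do
    git mv "$entry" "$name/" || exit 1  # exits only the pipeline's subshell
  done || return
  git commit --quiet -m "Move files into $name/ directory" || return

  # Switch back and join the two unrelated histories with a merge commit.
  git checkout --quiet "$target" || return
  git merge --quiet --no-edit --allow-unrelated-histories "$name-$branch"
}
```

From the sd-client checkout on main, running import_repo_into_subdir export git@github.com:freedomofpress/securedrop-export.git should reproduce the transcript above. The branch argument defaults to main, which is an assumption about each repository being imported. Moving files with git mv rather than plain mv stages the renames directly and avoids the shell glob's blind spots, such as dotfiles and the target directory itself.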

If you try to git blame a file to see who changed it in the past, it works just fine! And if you view the file history, GitHub will let you know it was renamed and link you to the pre-merge history.
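To check this locally instead of on GitHub, git log --follow traces a file's history across a rename. git blame follows whole-file renames on its own. The self-contained sketch below builds a throwaway repository and moves a file into a subdirectory, as in the workflow above. It then confirms that both commands still reach the original commit. Every repository, file, and author name in it is a hypothetical stand-in.

```shell
#!/usr/bin/env bash
# Throwaway demo: does history survive moving a file into a subdirectory?
# All file names and author identities here are hypothetical.
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Original Author"
git config user.email "original@example.com"

# The original commit, before the reorganization.
printf 'def export():\n    pass\n' > main.py
git add main.py
git commit -q -m "Add export entry point"

# A different person reorganizes, as in the "Move files into export/" step.
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
mkdir export
git mv main.py export/
git commit -q -m "Move files into export/ directory"

# --follow makes git log continue past the rename into the older history.
follow_log="$(git log --follow --format=%s -- export/main.py)"
printf '%s\n' "$follow_log"

# git blame follows whole-file renames automatically, so every line is
# still attributed to whoever originally wrote it, not to the mover.
blame_authors="$(git blame --line-porcelain export/main.py | sed -n 's/^author //p' | sort -u)"
printf '%s\n' "$blame_authors"
```

Running it should print both commit subjects, newest first, followed by Original Author as the only author credited in the blame output.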

Final thoughts

This change carries some risks. We could end up with too many files in one place, or it could become hard to keep track of everything (as has happened to some extent in the SecureDrop server Git repository, which is a true monorepo). We also want newcomers to be able to find their way around our codebases and contribute easily, and consolidation could make that harder.

There is one other Git repository we haven't discussed yet: securedrop-workstation, which contains the code for installation and provisioning. We have intentionally not consolidated this repository yet, because its contents run in a quite different context (dom0, running Fedora) than the other components (locked-down Debian VMs, shipped as .debs). Some changes will still need to span both the workstation and client repositories, and we'll continue to reevaluate whether keeping them separate helps or hinders us.
