RFC - Design Doc - BPMN Git Sync

BPMN Git Sync

Nominated Owner: @geoff

Summary

BPMN specifications are stored in the database and versioned there, which approximates the familiar Git workflow, but not as well. AppSmith has Git Sync functionality that stores the configuration of AppSmith pages in a Git repo, which is a more familiar workflow for developers. Git branching would also streamline the deployment of changes across environments, allowing code review and approval.

Retrospective

This section is essential to allow us to learn from the things we are implementing.
Retro completed?

Motivation

Why do this? What use-case does it support? What is the expected outcome?

  • BPMN specifications are essentially code that should be under version control
  • Being stored as data makes them different from all other code and creates an additional step when deploying code changes across environments
  • Versioning in the database does provide the ability to approve and roll back changes, but it is more complex to deploy across environments
  • Moving the source control function to Git means that customers can control their source code using Git rules and branch protections, like all their other code.
  • Most projects are doing this manually now, exporting their BPMN definitions and saving them into a repo. This change would improve that experience by making it a native part of the platform.

Guide-level explanation

Following the AppSmith pattern for how this could work, an administrator would configure the connection to the git repo in the Admin UI, setting up the URL and authentication details.
AppSmith also allows the administrator to configure the default branch and branch protection. Changes to the app are prevented if the user has selected a protected branch.
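
A minimal sketch of what that stored configuration could look like, assuming a TypeScript backend; the type and field names are illustrative assumptions rather than a settled schema:

  // Illustrative shape of the Git Sync settings an administrator would save.
  // Field names are hypothetical; authentication could equally be an SSH deploy key.
  interface GitSyncConfig {
    remoteUrl: string;           // e.g. "https://github.com/example-org/bpmn-specs.git"
    authToken: string;           // personal access token or a reference to a stored secret
    defaultBranch: string;       // e.g. "main"
    protectedBranches: string[]; // branches where edits in the app are blocked
  }

  // Editing is blocked when the user has a protected branch checked out.
  function isEditingAllowed(config: GitSyncConfig, currentBranch: string): boolean {
    return !config.protectedBranches.includes(currentBranch);
  }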

When editing a BPMN, the user would be able to:

  • Select or create a branch
  • Pull from the git repo
  • Push local changes to the git repo
  • Merge requests would be handled directly in the git repo.

Reference-level explanation

The integration with GitHub would be best done from the backend, both to keep the frontend simple and to allow other data objects to be stored in Git in the future.

The following Mutations are envisioned:

  • SetupGitSync: A mutation that establishes the connection to the git repo
  • SetupGitBranchProtection: A mutation that allows the user to store branches that should be protected from changes
  • GitPull(branch): This will pull the files from the git repo. The mutation would then synchronise them with the data stored in the database
  • GitPush(branch): This will read the data from the database and push it to the repo. It’s effectively like a file export to git
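
A hedged sketch of how these mutations could appear in the GraphQL schema is below; the input and result types are assumptions for illustration only, and the names simply mirror the list above (by GraphQL convention they would likely be camelCased):

  // Hypothetical schema fragment for the mutations listed above.
  // Input and payload shapes are assumptions, not an agreed design.
  const typeDefs = /* GraphQL */ `
    input SetupGitSyncInput {
      remoteUrl: String!
      authToken: String!
      defaultBranch: String!
    }

    type GitSyncResult {
      branch: String!
      commitSha: String
      specificationsSynced: Int!
    }

    type Mutation {
      SetupGitSync(input: SetupGitSyncInput!): Boolean!
      SetupGitBranchProtection(branches: [String!]!): Boolean!
      GitPull(branch: String!): GitSyncResult!  # repo -> database
      GitPush(branch: String!): GitSyncResult!  # database -> repo
    }
  `;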

Drawbacks

Why we should NOT do this
We are currently working on data migration functionality that will allow users to migrate data from one environment to another, which will address much of the current need. Git Sync is a significant development effort. AppSmith took over 12 months to get their Git Sync functionality to the point where it is usable.

Extending this to other configuration data, like Equipment Versions, Rules, etc., is not straightforward, as the definition of what is configuration data and what is runtime data can depend on what has been integrated with other systems and what has not. It becomes a user configuration to determine what should be under source control and what is runtime data.

Rationale and alternatives

What alternatives have been considered, and what is the rationale for not choosing them?

Prior art

For BPMN, the current process in most projects is to manually export the BPMN specification file and add it to a git repo. The files can then be loaded through the UI and saved into the database in a different environment.
For Master data, there are several pieces of prior art. In one project, we have spreadsheets of master data contained in the repo for the customer’s extension microservice. The build pipeline builds a container that includes the spreadsheets, and there is a custom mutation which can be called to load the data from the spreadsheets into the database.
On other projects, GraphQL mutation input JSON files are stored in a git repo and are manually deployed to environments.
In other projects, master data is manually maintained through the admin UI in each environment.

Unresolved Questions

What parts of the design still need to be resolved before this can be started?

What related issues are out of scope of this design?

Future Possibilities

  • Master Data Git Sync

To clarify: would this completely replace the existing workflow specification versioning that’s currently done in Dgraph?

At the moment, we have very limited use for previous versions. Eventually, I can see BPMN being used for more long-running processes, where it will be important to keep previous versions in the database.

Some additional questions:

  • Would this mean that the “canonical view” of the BPMN is the XML stored in Git? Currently that canonical view is provided by the UI and we somewhat discourage even looking at the XML.

  • Storage and branches: Would this be a “repo-per-bpmn” or “monorepo-of-bpmns” model? Do we support branches (i.e. draft BPMNs are in a branch and then merged to main to activate them)?

  • I think there is a fairly nice version → tag link that we could use to maintain versions in the longer term. However, this interacts interestingly with the previous point.

  • How do we handle outside updates? If a BPMN developer changes the BPMN by hand in the repository, should we:

    • Expect them to execute GitPull(); or
    • Automatically sync, ArgoCD style, and always match the BPMNs in the repo.
  • Should we look to integrate MRs? GitLab and GitHub have a fairly nice “create MR on push” interface that we could use on new drafts to allow a developer to push their changes but only activate the BPMN on merge.

  • Finally, we probably should look at how round-tripping through the UI for small changes impacts the diff of the XML. We have a few BPMNs stored as tests, but these almost never see the UI, so I’m not sure whether we’ll get somewhat stable output for a BPMN that has one node and two connections changed.

These are great questions. Thanks @Jarrah

The question of the “canonical view” is a great one. Should it be the XML, or should it be the JSON representation of the WorkflowSpecification from the database?

We use the XML now, but one limitation is that we cannot integrate workflow specifications with external systems (like DeltaV and Syncade, for example). We’ve not really had a use-case where we’ve had to use the BPMN for actual business processes rather than data pipelines, so we really haven’t run into the issue yet, but I suspect it’s only a matter of time.

I use several services that offer GitHub integration, and of all of them, I like how Vercel handles it best. This is similar to the ArgoCD deployment style.

You set up a project and attach a repo/branch to it. They watch the branch for commits. When a commit takes place, they run your actions, evaluate, and potentially deploy. It’s simple, but I’m sure it wasn’t effortless to develop.

A simplified approach to the workflow could look something like this:

  • We allow clients to map their BPMN to a repo. As long as they can specify the repo and branch, it’s up to them whether to use a monorepo, a repo per BPMN, or a hybrid.

  • We can support watching tags/releases, which would help with teams using a trunk-based development model.

  • Outside syncs are handled if and only if they take place in the watched branch/tag.

  • We can leave MR/PRs as is and allow the client to use them in whatever workflow they already use. Hopefully, their watched branch has some kind of protection rules requiring a pull request and approval requirements, but it’s up to them to enforce that.

We would support the ability to pull directly from the watched branch or another branch in the UI. However, the expected update pattern would be that the user pushes new content to the watched branch, and the BPMN is updated automatically.
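
A minimal sketch of that automatic path, assuming a GitHub push webhook and Express on the backend; the endpoint path, the gitPull helper, and the payload handling are all hypothetical:

  // Sketch of webhook-driven sync in the Vercel/ArgoCD style described above.
  // Assumes a GitHub "push" webhook pointed at this endpoint; gitPull() is a
  // hypothetical stand-in for the GitPull mutation's backend logic.
  import express from "express";

  const WATCHED_BRANCH = "main"; // per-project setting; illustrative default

  async function gitPull(repo: string, branch: string): Promise<void> {
    // Placeholder: fetch the BPMN files from the repo and synchronise them
    // with the workflow specifications stored in the database.
  }

  const app = express();

  app.post("/webhooks/git", express.json(), async (req, res) => {
    // GitHub push payloads set ref to "refs/heads/<branch>".
    const ref: string = req.body?.ref ?? "";
    const branch = ref.replace("refs/heads/", "");
    const repo: string | undefined = req.body?.repository?.full_name;

    // Only pushes to the watched branch trigger a sync; everything else is ignored.
    // NOTE: a real endpoint must verify the webhook signature before trusting the payload.
    if (repo && branch === WATCHED_BRANCH) {
      await gitPull(repo, branch);
    }
    res.sendStatus(204);
  });

  app.listen(8080);

A real implementation would also need to map the repository to the right project and decide what to do when the automatic sync conflicts with unsaved edits in the UI.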

There’s lots to like about this approach.

Does it decouple the development experience from the execution experience?

How would this approach change the way we think about the BPMN development IDE?

Just to mention one other benefit of this: it will also make it easier for newcomers to find examples of elements by searching the synced repo. Even grep is probably enough for this task if the project follows standard BPMN naming conventions.