Deferring architecture decisions

A couple of weeks ago I started reading The Software Architect Elevator by Gregor Hohpe. In the book Gregor says: "The more uncertain about the future I am, the more value I derive from deferring a decision." After reading that sentence I started to think about one very special "start-up" project.

This is a self-reflection on a project in which I was involved three years ago. Overall, the project was a once-in-a-lifetime experience. All stakeholders were highly motivated to work hard towards a common, very important goal. We all knew that the stakes were high and the potential results outstanding.

I would say that we worked together like a start-up company. Bureaucracy was minimal and everyone had the attitude and passion to get this working. The vision of what we wanted to achieve was quite clear, but time was our biggest enemy. We soon realized that we needed to work hard to get this to market as soon as possible. We also knew that we needed to be agile and flexible, withstand external pressure, and tolerate a high frequency of change.

I'm sure that the way we managed to defer decisions, especially technical and architectural ones, was one reason why this project was successful.

What did we know in the beginning?

We needed to reinvent old, complex, highly manual, and fragile processes as digital ones. We knew that if the digitalization of this core process succeeded, the new digital way of working would open up a lot of new possibilities for end users.

There were a lot more unknowns than knowns in the beginning. These unknown factors are not unique, though. I'm sure that many projects start without knowing all the facts that I'll mention next.

The unknown new domain area

As said, we were under time pressure all the time, which didn't help us understand this new domain area and its nuances. From a domain modeling point of view, we tried to be as flexible as possible because we knew that changes would definitely be coming.

Unknown lifecycle of the system

Like a start-up company, we didn't know in the beginning whether this would be a huge success or whether we should find a new project after 6 months. There were also many external factors beyond our reach that would affect how long the system would be needed.

Uncertainty about the expected lifecycle of the system caused a lot of discussion in the beginning, and for good reason, because it also affects architectural and technical decisions. We wrestled with this dilemma for quite a long time and eventually started to make decisions based on the assumption that the system would be long-lived. This was the right decision, because after one year of work we saw that the system had potential for other purposes as well.

Unknown user volume and usage patterns

We had some guesses about user volumes, and we knew that if this skyrocketed, user volumes could be substantial. We also learned along the way that there would be significant usage peaks and that we needed to handle those as well. During the first months, it also became clear that the system had national potential and interest.

Changing environment and external decision makers

It was already clear in the beginning that there would be a lot of external decision-makers over whom we would have little influence. Their decisions would heavily affect our work, but we could not anticipate how. We knew that we needed to be flexible and ready for changes, and sometimes also to revert decisions. This would also heavily affect technical design and decision-making.

Success requires integrations

We knew that multiple outbound and inbound integrations would be needed to unlock all the benefits of digitalizing the process. Our system would be the key component orchestrating the process across other systems.

Operational reliability

It was clear that if the system skyrocketed, its popularity would be substantial and we would need to provide superior operational reliability for our end users. Reliability was one of our top priorities because we knew how crucial the system was for achieving our important common goals.

Security and privacy

We knew from the beginning that the domain area would require special handling and focus from a security and privacy point of view. It was clear that this would increase our workload and that we needed to integrate security deeply into our processes, practices, and mindset.

What kinds of decisions did we defer?

Next, I'll go through the architectural decisions that we deferred during the project.

From a modular monolith to microservices

This was one of the first architectural decisions that we deferred. We postponed moving to a microservices architecture until we knew more about how the system would be used and how many developers would be involved in the development work.

There were so many unknown factors, and our team was small (5 people) in the beginning, so a modular monolith was a good way to start. A modular monolith basically means that the application is deployed as one unit but its code is divided into independent modules. Because the modules are independent, they can be broken out into individual microservices later if necessary.
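To make the idea concrete, here is a minimal sketch in Python. It is not code from the project: the module names (RecordsApi, InProcessRecords, NotificationService) and the data are hypothetical. The point is only that one module talks to another through a small contract, so the in-process implementation could later be replaced by a client that calls a separate service.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Record:
    record_id: int
    owner: str


class RecordsApi(Protocol):
    """Public contract of a hypothetical 'records' module."""

    def get_record(self, record_id: int) -> Record: ...


class InProcessRecords:
    """Monolith phase: the records module runs in the same process."""

    def __init__(self) -> None:
        self._rows = {1: Record(1, "alice")}  # made-up sample data

    def get_record(self, record_id: int) -> Record:
        return self._rows[record_id]


class NotificationService:
    """A second module that depends only on the RecordsApi contract."""

    def __init__(self, records: RecordsApi) -> None:
        self._records = records

    def message_for(self, record_id: int) -> str:
        record = self._records.get_record(record_id)
        return f"Record {record.record_id} updated for {record.owner}"


notifications = NotificationService(InProcessRecords())
print(notifications.message_for(1))
```

If the records module were later extracted into its own service, only the class passed to NotificationService would need to change, as long as the new class keeps the same get_record contract.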

It didn't make any sense to start immediately with a microservice architecture because we didn't know what the lifecycle of the system would be, whether the system would skyrocket, or how many developers would eventually work on it (scaling teams).

During the project our knowledge, the requirements, and the system evolved, and we moved more toward a microservice architecture. The microservice architecture enabled us to scale the teams as well as the individual technical components independently.

We started developing one application for specific needs with 5 people. The system then expanded rapidly, so that when I left the project there were 7 internal UI applications and APIs, 3 outbound integrations, and 5 inbound integrations. In the end, over 20 people were working on the different microservices of the system.

Be cloud native and scale later

It's important to develop applications as cloud-native from the very beginning so you can truly benefit from the scaling that cloud computing offers. You don't need to know immediately how many users will use the service or what kind of usage patterns they will have. Cloud services like Azure enable scaling both horizontally and vertically, so you don't need to know up front how much compute power you need.

As said, we didn't know the usage patterns of the application or the number of users. Sizing compute was one of the decisions that we deferred, because we could adjust compute capacity based on actual need.

We used Azure App Service autoscaling together with an event-driven pattern and post-processing to absorb peaks in the number of users.
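As an illustration of what "adjusting compute based on need" means, here is a rough sketch in Python of the kind of rule a horizontal autoscaler evaluates. This is not the Azure App Service API and not our actual configuration: the function name, the CPU thresholds, and the instance limits are all assumptions made for the example.

```python
def desired_instances(
    current: int,
    cpu_percent: float,
    scale_out_at: float = 70.0,  # hypothetical threshold
    scale_in_at: float = 30.0,   # hypothetical threshold
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return the instance count a simple CPU-based rule would ask for."""
    if cpu_percent > scale_out_at:
        # Under load: add one instance, but never exceed the maximum.
        return min(current + 1, maximum)
    if cpu_percent < scale_in_at:
        # Mostly idle: remove one instance, but keep at least the minimum.
        return max(current - 1, minimum)
    return current


print(desired_instances(current=2, cpu_percent=85.0))
print(desired_instances(current=2, cpu_percent=10.0))
```

The useful part for deferring decisions is that the thresholds and limits are just settings. They can be tuned after real usage data arrives instead of being guessed before launch.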

The transition from CRUD to domain-driven

Everything started from a very simple use case: persisting records to a database and showing data from the database. Domain-driven design would have been a bit too heavy for this purpose and, as said earlier, in the beginning we didn't know the lifecycle of the system or the possible dimensions of the domain model either. These facts drove us to start with a CRUD (create, read, update, delete) type of approach because it was simple, efficient, and very quick to develop. Our understanding of the domain had only just started to grow.

Later, when our use cases became more and more complex from a business logic point of view, we started to move towards domain-driven design.

This was another example of an architectural decision that we deferred. We started small and changed the plan when we had more information and knowledge.
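The difference between the two styles can be sketched in a few lines of Python. The Application entity and its statuses are hypothetical and only serve as an example, since the real domain is not described here. In the CRUD style any field can be overwritten, while in the domain-driven style the entity itself guards the business rules.

```python
from dataclasses import dataclass

# CRUD style: rows are plain dictionaries and any field may be overwritten.
crud_table = {}


def crud_update(record_id, fields):
    crud_table.setdefault(record_id, {}).update(fields)


# Domain-driven style: the (hypothetical) entity enforces its own rules.
@dataclass
class Application:
    application_id: int
    status: str = "draft"

    def submit(self) -> None:
        if self.status != "draft":
            raise ValueError("only a draft can be submitted")
        self.status = "submitted"

    def approve(self) -> None:
        if self.status != "submitted":
            raise ValueError("only a submitted application can be approved")
        self.status = "approved"


# CRUD happily jumps straight to "approved"; nothing checks the transition.
crud_update(1, {"status": "approved"})

# The entity only reaches "approved" through the allowed steps.
application = Application(application_id=2)
application.submit()
application.approve()
print(application.status)
```

While the use cases are simple, the CRUD version is less code and quicker to change. The entity version starts to pay off once rules like "approve only after submit" appear.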

Towards API management

In the beginning, our system had a small number of external API consumers. At that time our externally consumable API endpoints were easy to manage from an operational point of view. Later we started to get more and more API consumers, so we decided to start using centralized API management. API management also provided developer portal functionality for external developers and improved the governance of our APIs.

We made this architectural change when the time was right and we had a true need for the functionality.
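To show what "centralized" buys you, here is a toy gateway in Python. It is only a sketch of the general idea, not the API of any real API management product: the subscription keys, the consumer names, the call limit, and the /status route are all invented for the example.

```python
from collections import Counter

# Hypothetical subscription keys mapped to consumer names.
SUBSCRIPTIONS = {"key-abc": "partner-a", "key-xyz": "partner-b"}
CALL_LIMIT = 2  # hypothetical per-consumer limit for this sketch
calls = Counter()


def backend_get_status():
    """Stand-in for a real backend API endpoint."""
    return {"status": "ok"}


ROUTES = {"/status": backend_get_status}


def gateway(path, subscription_key):
    """Apply the shared policies once, then forward to the backend."""
    consumer = SUBSCRIPTIONS.get(subscription_key)
    if consumer is None:
        return 401, {"error": "unknown subscription key"}
    if calls[consumer] >= CALL_LIMIT:
        return 429, {"error": "call limit reached"}
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": "no such route"}
    calls[consumer] += 1
    return 200, handler()


print(gateway("/status", "key-abc"))
print(gateway("/status", "not-a-key"))
```

With a handful of consumers, checks like these can live inside each API. As the number of consumers grows, keeping them in one place is what makes operations and governance manageable.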

From synchronous to asynchronous processing

When we learned more about user volumes and usage patterns, we started to move more processing to asynchronous background jobs. Our application contained many functions that didn't require an immediate response to the user after submission, so the transition to asynchronous processing was a clear choice.

It was easy to start with synchronous processing and move to asynchronous processing once we had more information about the peaks and patterns.
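The pattern can be sketched with nothing more than Python's standard library. This is an in-process illustration, assuming a single worker thread and an in-memory queue; in a real cloud setup the queue would be a managed messaging service, and the names submit, worker, and results are made up for the example.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def submit(job_id, payload):
    """Accept the request and return immediately; the work happens later."""
    jobs.put((job_id, payload))
    return "accepted"


def worker():
    """Background worker that drains the queue at its own pace."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        job_id, payload = item
        results[job_id] = payload.upper()  # stand-in for slow processing
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = submit(1, "form data")  # the user gets an answer right away
jobs.put(None)                   # ask the worker to stop after the backlog
jobs.join()                      # wait until everything has been processed
thread.join()
print(status, results)
```

During a peak the queue simply grows and the worker catches up afterwards, which is why this fits functions that don't need an immediate result.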

What did we achieve by deferring decisions?

This wasn't an easy journey, but overall the project was successful. I think deferring decisions enabled us to always start with a small and often simpler solution that required less work. I believe that this also steered us away from over-engineering. Of course, this didn't happen without pain and problems, but we managed to solve them because we were allowed to refactor and change direction. Time to market was the most important KPI for us, and I believe that deferring decisions helped us achieve it.

Figure: Modular monolith architecture in the beginning

Figure: Microservices-based architecture after evolution
