Notes on

Fundamentals of Software Architecture

by Mark Richards & Neal Ford



1. Introduction

Defining Software Architecture

Software architecture is hard to define precisely. The field moves rather quickly. There are quips like “it’s about the important stuff” that obviously don’t tell us much. Likewise, it used to be described as dealing with all the things that are costly to change later. However, with microservices, that is no longer the case.

Most books or resources about it are dated by now.

The key thing to keep in mind is that it’s changing quickly. Likewise, it can only be understood in context.

Our capabilities are always growing, which unlocks new ways of doing things.
Hardware gets better, faster, bigger. We devise new means to operate. We invent.
So that changes how we build things.

The authors define software architecture as consisting of

  • the structure of the system: the type of architecture style(s) the system is implemented in.
    • E.g. microservices, layered, or microkernel.
  • architecture characteristics: these define the success criteria of the system, which are generally orthogonal to the non-domain functionality of the system.
    • E.g. availability, reliability, testability, scalability, security, agility, fault tolerance, elasticity, recoverability, performance, deployability, learnability.
  • architecture decisions: defining the rules for how the system should be constructed.
    • E.g. only the business and service layers in a layered architecture can access the database (presentation layer can’t make direct database calls).
    • If a decision can’t be implemented in one part of the system for some reason, that decision can be broken with a variance – basically a formalization so you can analyze the exception.
  • design principles: more like a guideline than a hard-and-fast rule.
    • E.g. prefer REST or gRPC for communication between services.

Expectations of an Architect

  • Make architecture decisions: define architecture decisions and design principles to guide technology decisions. That means you don’t just decide that the team uses React.js, but rather you instruct them to use a reactive-based framework for frontend web development.
  • Continually analyze the architecture and the current technology environment and then recommend solutions for improvement
  • Keep current with latest trends
  • Ensure compliance with decisions
  • Diverse exposure and experience — don’t need deep expertise in all, but at least familiarity with a variety of technologies. Go beyond your comfort zone. Aggressively.
  • Have business domain knowledge
  • Possess interpersonal skills
  • Understand and navigate politics: your decisions will be challenged. Learn to negotiate and navigate politics.

Laws of Software Architecture

First law: Everything in software architecture is a trade-off.

If you think you’ve found something that isn’t a trade-off, you likely just haven’t found the trade-off yet.

Second law: Why is more important than how.

I. Foundations

2. Architectural Thinking

Four main aspects of thinking like an architect:

  • Understanding the difference between architecture and design and knowing how to collaborate with development teams to make architecture work
  • Having a wide breadth of technical knowledge while also having a certain level of technical depth – to see solutions and possibilities that others can’t
  • Understanding, analyzing, and reconciling the trade-offs between various solutions
  • Understanding the importance of business drivers & how they translate to architectural concerns

All architects should code and be able to maintain a certain level of technical depth.
However, they shouldn’t become the bottleneck by taking over code within the critical path of a project.

3. Modularity

The book uses modularity to describe “a logical grouping of related code, which could be a group of classes in an object-oriented language or functions in a structured or functional language.”

They don’t have to be grouped physically, just logically, to fit the definition.

Measuring Modularity

Focusing on three language-agnostic metrics that help us understand modularity:

  • cohesion
  • coupling
  • connascence

Cohesion

The extent to which the parts of a module should be contained within the same module – how related they are to each other.

A cohesive module is one where all the parts should be packaged together, as breaking it down further would require coupling the parts together between modules.

Cohesion measures, from best to worst:

  • Functional cohesion: every part of the module is related to the others, and it contains everything essential to function.
  • Sequential cohesion: two modules interact, one outputting data that becomes the input for the other.
  • Communicational cohesion: two modules form a communication chain, each operating on information and/or contributing to some output.
  • Procedural cohesion: two modules must execute code in a particular order.
  • Temporal cohesion: modules are related based on timing dependencies.
  • Logical cohesion: data within modules is related logically but not functionally.
  • Coincidental cohesion: elements in a module are not related other than being in the same file.

You can use the Chidamber and Kemerer Lack of Cohesion in Methods (LCOM) metric to measure the structural cohesion of a module.

LCOM is the sum of sets of methods not shared via sharing fields.
Say we have a class with two private fields, a and b. Many of the methods only access a, and many others only access b. So the sum of the sets of methods not shared via sharing fields (a and b) is high, indicating a high LCOM for the class, meaning it scores high in lack of cohesion in methods.
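The intent behind LCOM can be sketched in a few lines: group a class’s methods by the fields they access; methods that share no fields fall into disjoint groups, and many disjoint groups indicate low cohesion. Below is a minimal sketch (the method and field names are invented, and this is a simplified grouping count, not the exact Chidamber–Kemerer formula):

```python
# Methods of a hypothetical class mapped to the fields each one accesses.
method_fields = {
    "get_a": {"a"}, "set_a": {"a"}, "compute_a": {"a"},
    "get_b": {"b"}, "set_b": {"b"},
}

def cohesion_groups(method_fields):
    """Union methods that share at least one field; return the disjoint groups."""
    groups = []  # each group is a (set_of_methods, set_of_fields) pair
    for method, fields in method_fields.items():
        merged = ({method}, set(fields))
        rest = []
        for methods, flds in groups:
            if flds & merged[1]:  # shares a field with this group -> merge
                merged = (merged[0] | methods, merged[1] | flds)
            else:
                rest.append((methods, flds))
        groups = rest + [merged]
    return groups

print(len(cohesion_groups(method_fields)))  # 2
```

Two disjoint groups suggests the class scores high in lack of cohesion: it may really be two classes sharing a file.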

Coupling

Yourdon and Constantine’s book Structured Design defined many concepts, including the afferent and efferent coupling metrics.

Afferent coupling: measures number of incoming connections to a code artifact (component, class, function, etc.).

Efferent coupling: measures outgoing connections to other code artifacts.

This book also presents some other metrics:

Instability is the ratio of efferent coupling to the sum of both efferent and afferent coupling. It determines the volatility of the code base. If it has a high degree of instability, it breaks more easily when changed due to high coupling.
If a class calls many other classes to delegate work, the calling class is susceptible to breaking if one or more of the called methods change.

Abstractness is the ratio of abstract artifacts (abstract classes, interfaces, etc.) to the total number of artifacts (abstract plus concrete), so it ranges from 0 to 1.

Distance from the main sequence is a derived metric based on instability and abstractness: D = |A + I − 1|.

Code shouldn’t be so abstract that it’s difficult to use. And code with too much implementation and not enough abstraction becomes brittle and hard to maintain.
The distance metric just helps calculate that.
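These three metrics are simple ratios, so a worked sketch fits in a few lines (the dependency counts are invented; abstractness here follows the usual definition of abstract elements over total elements):

```python
def instability(efferent: int, afferent: int) -> float:
    """I = Ce / (Ce + Ca): 1.0 means the component only depends on others."""
    return efferent / (efferent + afferent)

def abstractness(abstract: int, concrete: int) -> float:
    """A = abstract elements / total elements."""
    return abstract / (abstract + concrete)

def distance_from_main_sequence(a: float, i: float) -> float:
    """D = |A + I - 1|: 0 lies on the ideal balance line, 1 is worst."""
    return abs(a + i - 1)

# A package with 8 outgoing and 2 incoming dependencies,
# containing 1 abstract and 9 concrete types:
i = instability(efferent=8, afferent=2)   # 0.8: highly unstable
a = abstractness(abstract=1, concrete=9)  # 0.1: mostly concrete
print(distance_from_main_sequence(a, i))  # ~0.1: close to the main sequence
```

An unstable but concrete package like this one is fine; a stable, concrete package (low I, low A) would land in the “zone of pain,” far from the main sequence.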

We have other metrics for code, like cyclomatic complexity. However, it can’t distinguish between essential complexity (the underlying domain is complex) and accidental complexity (the code is more complex than it needs to be).

Connascence

Connascence recasts the afferent and efferent coupling metrics for object-oriented languages.

As defined by Meilir Page-Jones, the creator of the concept:

Two components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system.
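A tiny illustration of static connascence (specifically connascence of name, with a made-up invoicing example): the two functions below both depend on the key "total", so renaming it in one requires modifying the other to keep the system correct.

```python
# Producer and consumer are connascent by name: both must agree on "total".
def build_invoice(amount: float) -> dict:
    return {"total": amount}           # rename this key...

def format_invoice(invoice: dict) -> str:
    return f"Due: {invoice['total']}"  # ...and this lookup must change too.

print(format_invoice(build_invoice(99.5)))  # Due: 99.5
```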

4. Architecture Characteristics Defined

A key responsibility of the architect is defining, discovering, and analyzing all the things the software must do that aren’t directly related to the domain functionality: the architecture characteristics.

An architecture characteristic meets three criteria:

  • Specifies a non-domain design consideration
    • E.g. specifying a certain level of performance for the application.
  • Influences some structural aspect of the design
    • E.g. security is almost always a concern, but becomes very important when designing payment processing applications.
  • Is critical or important to application success
    • There are many things you could consider, but shouldn’t. Choose the fewest architecture characteristics you can.

There are also implicit and explicit architecture characteristics.

  • Implicit ones rarely appear in requirements, but are necessary still. E.g. availability, reliability, and security. Or low latency for high-frequency trading firms.
  • Explicit architecture characteristics appear in the requirements documents or other specific instructions.

Architecture Characteristics (Partially) Listed

There isn’t really an exhaustive list. I’m including these because they’re useful as a reference.

Operational Architecture Characteristics

  • Availability: how long the system needs to be available. If 24/7, how can you quickly recover from failure?
  • Continuity: disaster recovery capability.
  • Performance: includes stress testing, peak analysis, analysis of the frequency of functions used, capacity required, response times.
  • Recoverability: business continuity requirements – in case of disaster, how fast should it be online again? Affects backup strategy and redundancy.
  • Reliability/safety: does the system need to be fail-safe, or is it mission critical in a way that affects lives? If it fails, will it cost a large amount of money?
  • Robustness: ability to handle error and boundary conditions while running (e.g. internet outage, power outage, hardware failure).
  • Scalability: ability for the system to perform and operate as the number of users or requests increases.

Structural Architecture Characteristics

Need to consider code structure as well. Ensure code quality: good modularity, controlled coupling, readable code, etc.

  • Configurability: ability for end user to change aspects of the software’s configuration.
  • Extensibility: importance of plugging new pieces of functionality in.
  • Installability: ease of system installation on necessary platforms.
  • Leverageability/reuse: ability to leverage common components across products.
  • Localization: support for multiple languages, units of measure, and currencies.
  • Maintainability: how easy is it to apply changes and enhance the system?
  • Portability: does it need to run on more than one platform?
  • Supportability: level of technical support needed by the application. Level of logging and other facilities required to debug.
  • Upgradeability: how easy/fast is it to upgrade from a previous version on servers and clients?

Cross-Cutting Architecture Characteristics

These are not easily categorized, yet still important.

  • Accessibility: access to all users, including those with disabilities.
  • Archivability: will data need to be archived or deleted after some time?
  • Authentication: security requirements to ensure users are who they say they are.
  • Authorization: security requirements to ensure users can access only certain functions within the application.
  • Legal: what legislative constraints is the system operating in (e.g. GDPR)? What reservation rights does the company require? Any regulations regarding the way the application is to be built or deployed?
  • Privacy: ability to hide transactions from internal company employees.
  • Security: should data be encrypted in the database? Should it be encrypted for network communication between internal systems? What type of authentication needs to be in place for remote user access?
  • Usability/achievability: level of training required for users to achieve their goals with the application/solution.

Trade-Offs and Least Worst Architecture

Applications can only support a few architecture characteristics.
Many architecture characteristics have an impact on the others. E.g. if security is important, it’ll affect performance (more encryption, indirection, etc.).

It’s rare to be able to design a system and be able to maximize every single architecture characteristic. Instead, you’ll make decisions based on trade-offs between several competing concerns.

Never shoot for the best architecture, but rather the least worst architecture.

5. Identifying Architecture Characteristics

  • Identifying architectural characteristics is crucial for creating or validating an architecture
  • Architects uncover characteristics through domain concerns, requirements, and implicit knowledge
  • Domain concerns
    • Translating domain concerns to architectural characteristics is essential
    • Keep the list of characteristics short to avoid overcomplicating the design
    • What matters? Scalability? Security? Performance? Fault tolerance? Or a combination of all four?
    • Prioritizing characteristics can be challenging; focus on the top three most important ones (in any order)
    • Architects and stakeholders often face communication challenges due to different vocabularies. Here’s a translation in the form of “user says → translation to architecture characteristics”
      • Mergers and acquisitions → interoperability, scalability, adaptability, extensibility
      • Time to market → agility, testability, deployability
      • User satisfaction → performance, availability, fault tolerance, testability, deployability, agility, security
      • Competitive advantage → agility, testability, deployability, scalability, availability, fault tolerance
      • Time and budget → simplicity, feasibility
    • Agility alone != time to market; multiple characteristics often work together to address a domain concern (e.g. agility + testability + deployability → time to market)
  • Requirements
    • Requirements documents can provide explicit architectural characteristics. E.g. the expected number of users.
    • Implicit domain knowledge is valuable for identifying additional characteristics
    • Architecture katas, developed by Ted Neward, offer practice in deriving characteristics from domain descriptions.

There are no wrong answers in architecture, only expensive ones.

Above all, it is critical for the architect to collaborate with the developers, project manager, operations team, and other co-constructors of the software system. No architecture decision should be made in isolation from the implementation team (doing so leads to the dreaded Ivory Tower Architect anti-pattern).

6. Measuring and Governing Architecture Characteristics

There are multiple common problems with defining architecture characteristics:

  • Many characteristics are vague. How do you design for agility? There are many different perspectives on common terms (e.g. by context)
  • There are wildly varying definitions of the characteristics. Even within the same organization, different departments may disagree on the definition of common characteristics like performance.
  • They’re too composite. Many desirable architecture characteristics comprise many others at a smaller scale. E.g. developers decompose agility into characteristics like modularity, deployability, and testability.

Having objective definitions for architecture characteristics solves all three of the above problems.

Operational Measures

Many characteristics have obvious direct measures (e.g. performance and scalability), but even those may be ambiguous.

Some projects look at general performance, e.g. through how long requests take, response cycles, etc.
You can also make ‘budgets’ for specific parts of the application, e.g. a performance budget of 500 ms for first-page render.

It’s best to define your metrics using statistical analysis.
You might measure scale over time, build statistical models, and then raise alarm if real-time metrics fall outside the prediction models.
Either the model is incorrect or something is amiss. Both are important to know.
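As a minimal sketch of that idea (the historical data and the three-sigma threshold are invented for illustration): fit a simple statistical model to past response times and flag live measurements that fall outside the prediction band.

```python
import statistics

# Historical response times in ms (illustrative data).
history = [120, 130, 125, 118, 135, 128, 122, 131]

mean = statistics.mean(history)
stdev = statistics.stdev(history)

def within_model(observed_ms: float, sigmas: float = 3.0) -> bool:
    """Raise an alarm when a live metric falls outside mean +/- sigmas * stdev."""
    return abs(observed_ms - mean) <= sigmas * stdev

print(within_model(127))   # True: consistent with the model
print(within_model(400))   # False: either the model or the system is wrong
```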

Structural Measures

Structural characteristics are harder to measure. How do you measure well-defined modularity?

One measure of code is complexity, defined by cyclomatic complexity (CC).
It’s a code-level metric that can provide a measure of the complexity of code at the function/method, class, or application level.

It uses concepts from graph theory. The idea is centered around decision points, which cause different execution paths.
A decision statement is, for example, an if statement.

The CC formula is

CC = E − N + 2

where E is the number of edges (decisions) and N is the number of nodes (lines of code). This is a simplification for a single function/method. For fan-out calls to other methods (a.k.a. connected components in graph theory), the more general formula is

CC = E − N + 2P

where P represents the number of connected components.

The industry threshold for CC is under 10, but under 5 is preferred.
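To make the formula concrete: for a single structured function, CC also works out to the number of decision points plus one, which is easier to compute than building the full graph. A rough sketch using Python’s ast module (counting each if/for/while and boolean-operator chain as one decision is an approximation, not an exact CC implementation):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate CC for Python source: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = sum(
        isinstance(node, (ast.If, ast.For, ast.While, ast.BoolOp))
        for node in ast.walk(tree)
    )
    return 1 + decisions

code = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(code))  # 3: two decisions plus the straight path
```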

Process Measures

Agility can be hard to measure, as it’s often a composite characteristic of multiple characteristics like testability and deployability.

While these are also difficult to measure, there are ways to do it.
For testability, code coverage is perhaps not the best metric: you can have 100% code coverage with assertions so poor they provide no confidence in the code’s correctness.

Deployability can be measured by things like the ratio of successful to failed deployments, how long deployment takes, issues/bugs raised by deployments, and so on.

Governance and Fitness Functions

How can you ensure developers respect the established (and prioritized) architecture characteristics?

Governance is an important responsibility of an architect.
The Building Evolutionary Architectures book developed a family of techniques, called fitness functions, used to automate many aspects of architecture governance.

Fitness functions in this context are mechanisms to assess architecture characteristics, or a combination of them, objectively.

The concept comes from evolutionary computing (particularly, genetic algorithms). It’s an objective measure that helps evaluate how well a solution (or candidate) performs in achieving the desired outcome.
In evolutionary algorithms, where multiple solutions are evolved and mutated over time, the fitness function guides the selection process by determining which solutions are closer to the optimal result.

Since the fitness function concept here denotes any such mechanism, there are lots of examples: metrics, monitors, unit testing libraries, chaos engineering, etc.
As long as they’re used to assess the architecture characteristics.

As an example of a fitness function that tests modularity, take cyclic dependencies.
How can we prevent them? Code reviews are an option, but they aren’t perfect. Instead, you can run static analysis on the code, which readily detects any cyclic dependencies. Just wire that into CI.
There’s also the “distance from main sequence” fitness function.

Just make sure developers understand the purpose of the fitness function before you enforce it.

You can use tools like ArchUnit and NetArchTest for enforcing architectural rules.

Netflix’s Chaos Monkey and Simian Army are examples of fitness functions in production.

7. Scope of Architecture Characteristics

The scope of architecture characteristics has narrowed from system-level to component-level due to modern engineering techniques and architectures like microservices.

Connascence: Two components are connascent if a change in one requires the other to be modified to maintain system correctness.

Types of connascence:

  • Static: Discoverable via static code analysis
  • Dynamic: Concerning runtime behavior. Further, there are two types:
    • Synchronous: calls between two distributed services have the caller wait for response from the callee
    • Asynchronous: calls allow fire-and-forget semantics in an event-driven architecture, enabling operational differences between services

Example: if two services in a microservice architecture share the same class definition of some class, they are statically connascent with each other, as changing the class requires changes to both services.

Architecture quantum: An independently deployable artifact with high functional cohesion and synchronous connascence.

Recall that the word “quantum” comes from physics and refers to the minimum amount of any physical entity involved in an interaction.

Key components of architecture quantum:

  • Independently deployable: an architecture quantum includes all necessary components to function independently from other parts of the architecture
  • High functional cohesion: cohesion in component design refers to how well the contained code is unified in purpose. High functional cohesion implies that an architecture quantum does something purposeful
    • This matters a lot in microservice architectures
  • Synchronous connascence: implies synchronous calls within an application context or between distributed services that form this architecture quantum
    • Operational symmetry is required between services during a synchronous call, as differences in scalability or reliability can lead to timeouts and failures
    • A common solution is to use some asynchronous element as a link between services, like a message queue to temporarily buffer differences (event-driven architecture)

Architecture characteristics should be defined at the quantum level rather than system level in modern systems.

8. Component-Based Thinking

Components are the physical manifestation of modules in software architecture.
They can be e.g. libraries, subsystems, layers, or services.
Architects define, refine, manage, and govern components within an architecture.

Component scope ranges from simple wrappers to complex subsystems or services.
We use components as the fundamental modular building block in architecture.

One of the first things you do when you start a project is to identify components. But before you can do that, you need to know how to partition the architecture.
You need to decide on top-level partitioning of components: technical or domain-based.

Technical partitioning organizes components by capabilities (e.g., presentation, business rules).
You may see a design like this (from the top down):

  • Presentation
  • Business rules
  • Service
  • Persistence

And that block talks to some database.
Layered architecture, like above, represents technical top-level partitioning. Another example is MVC (model-view-controller), which is a type of layered architecture.

Conway’s Law: Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.
For example, it is common for organizations to partition workers based on technical capabilities.

Domain partitioning organizes components by workflows or business domains.
This was inspired by Eric Evans’s book, Domain-Driven Design.
In DDD, architects identify domains or workflows that are independent and decoupled from each other.
The microservice architecture style is based on this philosophy.
For example, you may partition into CatalogCheckout, ShipToCustomer, Analytics, UpdateInventory, Reporting, UpdateAccounts. Each of these may have a persistence library and a separate layer for business rules, but the top-level partition is on domains.

Component identification is an iterative process involving multiple steps:

  • Identifying initial components
  • Assigning requirements to components
  • Analyzing roles and responsibilities
  • Analyzing architecture characteristics
  • Restructuring components

Finding the proper granularity for components is challenging.

Common component design techniques:

  • Talking to developers, business analysts, domain experts
    • Beware the entity trap: if you’re just creating “manager” components that define simple CRUD operations, you might as well just use a library that can create that for you. You aren’t defining an architecture, but rather a component-relational mapping. Database relations are not workflows. Think about the actual workflows of the application.
  • Actor/Actions approach: identify actors who perform activities with the application and the actions those actors may perform
  • Event storming: comes from Domain-Driven Design. You assume the project will use messages and/or events to communicate between various components. So you determine which events occur in the system based on requirements and identified roles, and build components around those event and message handlers
    • This works great in distributed architectures (e.g. microservice architecture) that use events and messages because it helps you define the messages that’ll be used in the system
  • Workflow approach: similar to event storming, but without the explicit constraint of building a message-based system. You identify workflows by identifying the key roles, determining the kinds of workflows those roles engage in, and build components around that

II. Architecture Styles

Architecture styles are defined as the overarching structure of how the user interface and backend source code are organized (e.g. within layers of a monolith or separately deployed services) and how that source code interacts with a data store.

Architecture patterns are lower-level design structures that help form specific solutions within an architecture style (e.g. how to achieve scalability or high performance within a set of operations or between services).

9. Foundations

  • Fundamental patterns in software architecture recur throughout history
  • Layered architecture is a common and long-standing pattern
  • Big Ball of Mud:
    • An anti-pattern lacking discernible structure
    • Characterized by haphazard organization, sprawling code, and unregulated growth
    • Often results from lack of governance and rapid, unplanned development
    • Causes difficulties in change management, deployment, testing, scalability, and performance
    • Basically: spaghetti architecture
  • Unitary Architecture:
    • Originally, software and hardware were a single entity
    • Gradually separated as needs for more sophisticated capabilities grew — we now use distributed systems
    • Now mainly found in embedded systems and constrained environments
  • Client/Server Architecture:
    • Separates functionality between frontend and backend
    • Two-tier architecture with various implementations:
      • Desktop (application) + database server
      • Browser + web server
  • Three-tier Architecture:
    • Popular in the late 1990s
    • Includes database tier, application tier, and frontend
    • Corresponded with network-level protocols like CORBA and DCOM
    • Facilitated building distributed architectures
  • Modern architectures often incorporate capabilities from earlier distributed systems as tools or patterns

Monolithic Versus Distributed Architecture

The architecture styles described in this book

  • Monolithic: single deployment unit of all code
    • Layered architecture
    • Pipeline architecture
    • Microkernel architecture
  • Distributed: multiple deployment units connected through remote access protocols
    • Service-based architecture
    • Event-driven architecture
    • Space-based architecture
    • Service-oriented architecture
    • Microservice architecture

Distributed styles offer more power in terms of performance, scalability, and availability, but have significant tradeoffs.
Some of these issues were described by L. Peter Deutsch in The Fallacies of Distributed Computing. The fallacies are:

  • Fallacy #1: The Network Is Reliable
    • Networks have become more reliable over time, yet remain generally unreliable
    • Impacts all distributed architectures, as all of them rely on the network for communication
  • Fallacy #2: Latency Is Zero
    • Remote calls take longer than local calls (milliseconds vs. nanoseconds)
    • Architects must know average latency and 95th-99th percentile in production
    • Chaining multiple service calls can significantly increase overall latency
  • Fallacy #3: Bandwidth Is Infinite
    • Distributed architectures consume more bandwidth than monolithic ones
    • Stamp coupling (a type of coupling where modules share complex data structures) can lead to excessive data transfer between services
    • Solutions include private APIs, field selectors, GraphQL, consumer-driven contracts (CDCs), and internal messaging
  • Fallacy #4: The Network Is Secure
    • Distributed architectures increase the surface area for threats
    • Each endpoint must be secured, even for interservice communication
      • Impacts performance in highly-distributed architectures
  • Fallacy #5: The Topology Never Changes
    • Network topology changes frequently
      • The network topology includes routers, hubs, switches, firewalls, networks, and appliances (etc.)
    • Changes can invalidate latency assumptions and trigger timeouts
    • Architects must maintain communication with operations and network administrators
  • Fallacy #6: There Is Only One Administrator
    • Large companies have multiple network administrators
    • Requires increased coordination and communication in distributed architectures
  • Fallacy #7: Transport Cost Is Zero
    • Distributed architectures have higher monetary costs
      • Requires additional hardware, servers, gateways, firewalls, subnets, and proxies
    • Architects should analyze current topology for capacity, bandwidth, latency, and security
  • Fallacy #8: The Network Is Homogeneous
    • Networks often consist of multiple hardware vendors
    • Heterogeneous hardware may not always integrate seamlessly

Additional challenges in distributed architecture beyond the eight fallacies:

  • Distributed logging:
    • Root-cause analysis is difficult due to multiple logs in different locations and formats
    • Monolithic applications have a single log, making tracing easier
    • Tools like Splunk help consolidate logs but don’t solve all complexities
  • Distributed transactions:
    • More complex than monolithic transactions
    • Rely on eventual consistency instead of ACID transactions
    • Trade-off: high scalability and availability at the cost of data consistency
    • Solutions include transactional sagas and BASE transactions
  • Contract maintenance and versioning:
    • Challenging due to decoupled services owned by different teams
    • Involves agreeing on behavior and data between client and service
    • Requires complex communication models for version deprecation
  • These issues are more prevalent in distributed architectures compared to monolithic ones

10. Layered Architecture

  • Layered architecture (n-tiered) is a common and simple style
  • Often results from Conway’s law: matches the organizational communication structure
  • Typically consists of four layers: presentation, business, persistence, and database
  • Can have physical topology variants with different deployment configurations
    • E.g. you may have physical deployments separated like this — with parentheses denoting grouping of multiple layers:
    • (Presentation → Business → Persistence) → Database
    • Presentation → (Business → Persistence) → Database
    • Presentation → Business → Persistence → Database
  • Each layer has specific roles and responsibilities
  • Promotes separation of concerns and clear roles for developers
  • Is technically partitioned rather than domain-partitioned
    • In some architectures you partition by domain responsibility, like Orders, Cart, and Shop. Here, we’re grouping by technical responsibility.
  • Layers can be open or closed, affecting request flow
    • Closed layers support the concept of “layers of isolation”
      • Layers have to be closed for layers of isolation. If the presentation layer could directly access the persistence layer, then any changes made to the persistence layer would impact both the business layer and presentation layer. Then you have a tightly coupled application with layer interdependencies, which is rather brittle as well as difficult and expensive to change.
    • Layers of isolation allow for easier changes and replacements: changes in one layer generally don’t impact or affect components in another, given the contracts between the layers remain unchanged.
      • It also lets you replace any layer without impacting the other layers (assuming well-defined contracts and the use of the business delegate pattern)
    • Open means you can ‘skip’ over the layer, closed means requests have to go through it.
  • Additional layers can be added to enforce architectural constraints
  • Good starting point for applications with uncertain final architecture
  • Prone to the “architecture sinkhole anti-pattern”
    • Architecture sinkhole: When requests move from layer to layer as simple pass-through processing with no business logic performed within each layer
    • Every layered architecture has at least some scenarios that fall into this anti-pattern. You can follow The 80/20 Rule: it’s acceptable if only 20% of the requests are sinkholes. If 80% are, the layered architecture is probably not the correct architecture style for the problem domain.
      • Or you can open all layers, accepting the increased difficulty in making changes.
  • Suitable for small, simple applications or websites
  • Cost-effective and familiar to developers
  • Less suitable for large applications due to decreased maintainability and agility
  • Characteristics ratings:
    • High in cost-effectiveness and simplicity
    • Low in deployability and testability
    • Medium in reliability
    • Low in elasticity and scalability
    • Low to medium in performance
    • Poor in fault tolerance and availability
  • Monolithic nature limits scalability and performance optimization
  • Not well-suited for high-performance or highly available systems
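The closed-layer rule described above can be sketched in code. This is a minimal, hypothetical example (all class and method names are invented): the presentation layer only knows the business layer, and only the persistence layer touches the data store, so swapping the persistence layer never ripples up to presentation.

```java
// Hypothetical sketch of a closed layered flow: each layer calls only the
// layer directly beneath it ("layers of isolation").

// Persistence layer: the only component that "talks to the database".
class PersistenceLayer {
    String findCustomerName(int id) {
        return "customer-" + id; // stand-in for a real database query
    }
}

// Business layer: the only layer the presentation layer may call.
class BusinessLayer {
    private final PersistenceLayer persistence = new PersistenceLayer();

    String getCustomerGreeting(int id) {
        // business logic lives here, not in the presentation layer
        return "Hello, " + persistence.findCustomerName(id);
    }
}

// Presentation layer: depends on BusinessLayer only, never on PersistenceLayer.
public class PresentationLayer {
    private static final BusinessLayer business = new BusinessLayer();

    public static String render(int customerId) {
        return "<p>" + business.getCustomerGreeting(customerId) + "</p>";
    }

    public static void main(String[] args) {
        System.out.println(render(42)); // <p>Hello, customer-42</p>
    }
}
```

If PresentationLayer held a PersistenceLayer field directly, a persistence change would touch two layers at once — exactly the brittleness the closed-layer rule prevents.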

11. Pipeline Architecture Style

  • Pipeline Architecture Style
    • Also known as “pipes and filters” architecture
    • Common in Unix terminal shell languages (Bash, Zsh)
    • Parallels in functional programming languages and MapReduce model
  • Topology
    • Consists of pipes and filters
    • Pipes form unidirectional, point-to-point communication channels between filters
      • Accepts input from one source and always directs output to another
    • Filters are independent, (generally) stateless, and perform single tasks
      • Handle composite tasks with a sequence of filters rather than a single one
  • Types of Filters
    • Producer: Starting point; outbound only
    • Transformer: Accepts input, optionally transforms it, forwards it (similar to map)
    • Tester: Tests criteria on input, optionally produces output (similar to reduce)
    • Consumer: Termination point; may persist or display the final result
  • Benefits
    • Encourages compositional reuse
    • Simple but powerful
  • Example Applications
    • Electronic Data Interchange (EDI)
    • ETL (extract, transform, load) tools
    • Orchestrators and mediators (e.g., Apache Camel)
  • Example Scenario
    • Service telemetry data sent to Apache Kafka
    • Service Info Capture filter captures data from Kafka
    • Data passed to Duration Filter or Uptime Filter based on criteria
    • Relevant data processed by Duration Calculator or Uptime Calculator
    • Final results persisted by Database Output consumer
  • Extensibility
    • New filters can be easily added to existing pipelines
  • Characteristics Ratings
    • Cost: Low
    • Simplicity: High
    • Modularity: High (due to separation of concerns)
    • Deployability and Testability: Medium (better than layered architecture but still a monolith)
    • Reliability: Medium (less network dependency but monolithic testing required)
    • Elasticity and Scalability: Very Low (difficult in monolithic deployments)
    • Fault Tolerance: Low (entire application impacted by single point failures)
    • Availability: Low (high mean time to recovery, especially for large applications)
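The four filter types above can be sketched as a tiny composition. This is a hypothetical example (names invented, the "pipes" are just hand-offs between stages): a producer emits telemetry lines, a tester keeps only the relevant ones, a transformer extracts a value, and a consumer collects the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Minimal pipes-and-filters sketch: each filter does one thing and is
// unaware of the others; the "pipe" is the unidirectional hand-off.
public class Pipeline {

    // Producer: starting point, outbound only.
    static List<String> produce() {
        return List.of("uptime 99", "duration 120", "uptime 97");
    }

    // Tester: passes input along only if it meets a criterion.
    static Predicate<String> uptimeOnly = line -> line.startsWith("uptime");

    // Transformer: transforms each input it receives (similar to map).
    static Function<String, Integer> extractValue =
            line -> Integer.parseInt(line.split(" ")[1]);

    // Consumer: termination point; here it just collects the results.
    public static List<Integer> run() {
        List<Integer> sink = new ArrayList<>();
        for (String line : produce()) {             // pipe: producer -> tester
            if (uptimeOnly.test(line)) {            // pipe: tester -> transformer
                sink.add(extractValue.apply(line)); // pipe: transformer -> consumer
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [99, 97]
    }
}
```

Because filters are independent and stateless, a new tester or transformer can be spliced into the chain without touching the others — the extensibility point noted above.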

12. Microkernel Architecture Style

  • The microkernel architecture style (also known as the plug-in architecture) is decades old but still widely used.
  • Suited for product-based applications and non-product custom business applications.
  • Topology:
    • Consists of two main components: a core system and plug-in components.
    • Application logic is split between these components, enhancing extensibility, adaptability, and isolation.
  • Core System:
    • Minimal functionality required to run the system.
    • Example: Eclipse IDE, starting as a basic text editor until plugins are added.
    • Simplifies core logic by offloading cyclomatic complexity to plug-in components.
    • Can be implemented as a layered architecture, modular monolith, or split into domain services.
  • Plug-In Components:
    • Standalone and independent, containing specialized logic or custom features.
    • Can isolate volatile code for better maintainability and testability.
    • Communication with core is usually point-to-point but can also be via REST or messaging.
    • Can be compile-based or runtime-based, with runtime plugins added/removed without entire redeployment.
    • Can be implemented as shared libraries (e.g., JAR, DLL, Gem) and namespaces/packages.
    • The plugins usually don’t connect to a central database. The core system takes on that responsibility. They can, however, have their own, separate data stores.
  • Registry:
    • Core system uses a registry to keep track of available plug-in modules.
    • Can be simple (internal map structure) or complex (registry tools like Apache ZooKeeper).
  • Contracts:
    • Standardized contracts between core and plug-ins for behavior, input, and output.
    • Can be implemented using XML, JSON, or objects.
    • If you have no control over the contract used by the plugin (e.g. it’s developed by 3rd party developers), it’s common to create an adapter between the plug-in contract and your standard contract so your core system doesn’t need specialized code for each plug-in.
  • Examples and Use Cases:
    • Common in tools like Eclipse, PMD, Jira, Jenkins.
    • Used in applications like insurance claims processing and tax preparation software to manage complex rules.
  • Architecture Characteristics Ratings:
    • Simplicity and cost are high strengths.
    • Weaknesses include scalability, elasticity, and fault tolerance due to its typically monolithic nature.
    • Offers domain partitioning and technical partitioning for flexibility.
    • Testability, deployability, reliability, modularity, and extensibility rated slightly above average.

Example (Java):

// From
public void assessDevice(String deviceID) {
	if (deviceID.equals("iPhone6s")) {
		assessIphone6s();
	} else if (deviceID.equals("iPad1")) {
		assessiPad1();
	} else if (deviceID.equals("Galaxy5")) {
		assessGalaxy5();
	} else { ... }
}

// To
public void assessDevice(String deviceID) throws Exception {
	String plugin = pluginRegistry.get(deviceID);
	Class<?> theClass = Class.forName(plugin);
	Constructor<?> constructor = theClass.getConstructor();
	DevicePlugin devicePlugin = (DevicePlugin) constructor.newInstance();
	devicePlugin.assess();
}

// Registry (the three entries below illustrate alternative access styles;
// a real registry would hold one entry per plug-in)
static Map<String, String> registry = new HashMap<String, String>();
static {
	// Point-to-point access example
	registry.put("iPhone6s", "iPhone6sPlugin");
	// Messaging example
	registry.put("iPhone6s", "iphone6s.queue");
	// RESTful example
	registry.put("iPhone6s", "https://atlas:443/assess/iphone6s");
}

// Maintain contracts for the plugins
public interface AssessmentPlugin {
	public AssessmentOutput assess();
	public String register();
	public String deregister();
}

public class AssessmentOutput {
	public String assessmentReport;
	public Boolean resell;
	public Double value;
	public Double resellPrice;
}

13. Service-Based Architecture Style

  • A hybrid of the microservices architecture style.
  • Flexible and less complex/costly than microservices or event-driven architectures (even though it’s distributed).
  • Popular for business-related applications.
  • Topology:
    • Distributed macro layered structure.
    • Separately deployed user interface and coarse-grained services, with a monolithic database.
    • Services deployed like monolithic applications (e.g., EAR file, WAR file).
    • Typically 4-12 services sharing a single database.
  • Service Deployment:
    • Usually a single instance per service; can scale with multiple instances if needed.
    • Remote access protocols include REST, RPC, messaging, or SOAP.
    • Direct access from user interface via a service locator pattern or optional API/proxy layer.
  • Database:
    • Shared monolithic database; SQL queries and joins used similar to traditional monolithic architectures.
    • Database partitioning can address impact of schema changes.
  • Topology Variants:
    • User interface variants:
      • Single monolithic user interface spans all services
      • Domain-based user interface spans some services (e.g. 2 user interfaces, each spanning 2 services)
      • Or e.g. one user interface per service
    • Database variations:
      • All services share one database
      • A service has its own database, the rest share
      • Each service has its own database
    • Essentially, you can break it down to what fits your situation.
    • Possible to federate UI and databases by domain service.
  • Service Design:
    • Coarse-grained domain services typically using layered architecture (API facade, business layer, persistence layer).
    • Internal orchestration of business requests (e.g., order placement) differs from microservices’ external orchestration.
  • Transaction Integrity:
    • Uses ACID transactions for data integrity.
    • Unlike microservices which use eventual consistency (BASE transactions).
  • Database Partitioning:
    • Services with a service-based architecture usually share a single, monolithic database due to the small number of services.
    • But if you don’t do it properly, a table schema change can potentially impact every service, so database changes can become costly.
    • Logical partitioning and federated shared libraries can mitigate database change impacts.
      • Recommends fine-grained logical partitions for better control.
  • Characteristics Ratings:
    • Agility, testability, deployability, fault tolerance, and availability rate four stars of five.
    • Scalability three stars, elasticity two stars.
    • Simplicity and cost-efficiency highlighted.
    • Reliable due to less network traffic and better resource use.
  • When to Use:
    • Pragmatic and flexible, suitable for domain-driven design.
    • Preserves ACID transactions better than other distributed architectures.
    • Avoids complexities of fine-grained service coordination (orchestration and choreography).
    • Well-suited for applications needing modularity without high complexity.
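The internal-orchestration point above can be sketched as follows. This is a hypothetical example (class and method names invented): the whole order-placement workflow happens inside one coarse-grained service, so it can run against the shared database as a single ACID transaction instead of being coordinated across services.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a coarse-grained domain service in a service-based architecture:
// the workflow is orchestrated internally, within one (conceptual) transaction.
public class OrderService {

    private final List<String> auditLog = new ArrayList<>();

    public String placeOrder(String item, int quantity) {
        // begin transaction (single shared database, so plain ACID applies)
        createOrder(item, quantity);
        decrementInventory(item, quantity);
        recordPayment(item, quantity);
        // commit transaction — all three steps succeed or fail together
        return "order-placed:" + item;
    }

    private void createOrder(String item, int qty)        { auditLog.add("order:" + item); }
    private void decrementInventory(String item, int qty) { auditLog.add("inventory:-" + qty); }
    private void recordPayment(String item, int qty)      { auditLog.add("payment:" + item); }

    public List<String> audit() { return auditLog; }

    public static void main(String[] args) {
        OrderService service = new OrderService();
        System.out.println(service.placeOrder("book", 2)); // order-placed:book
    }
}
```

In microservices, these three steps would likely be separate services, pushing the coordination (and the consistency problem) outside any single transaction boundary.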

14. Event Driven Architecture Style

  • Distributed asynchronous architecture for scalable and high-performance applications.
  • Consists of decoupled event processing components that process events asynchronously.
  • Used for both small and large-scale applications.
  • Can be standalone or integrated with other architectures like microservices.
  • Most applications follow the request-based model.
  • Request-Based Model:
    • Traditional approach using a request orchestrator to direct requests to processors.
    • Suitable for deterministic, data-driven requests (e.g., retrieving order history).
    • For example, the request orchestrator is a user interface (or API layer, or enterprise service bus). You might think of a user making a request on a webpage, which then directs the request to the appropriate request processors.
  • Event-Based Model:
    • Reacts to events and triggers actions based on specific situations (e.g., online auction bids).
    • Used for dynamic actions requiring high responsiveness and flexibility.
  • There are two main topologies within event-driven architecture.
  • Topologies in Event-Driven Architecture:
    • Broker Topology:
      • No central mediator, event flow distributed across event processors via message brokers.
      • Consists of initiating events, event brokers, event processors, and processing events.
        • Initiating event: starts the flow
        • Event broker: e.g. RabbitMQ
        • Event processor: processes the requests
        • Processing event: an event raised after a processor has processed the event it took on (e.g. for further processing by another processor)
      • Highly decoupled with parallel processing, high scalability, responsiveness, performance, and fault tolerance. But lacks workflow control, recoverability, restart capabilities, and has challenges with data inconsistency
      • Used when you require a high degree of responsiveness and dynamic control over the processing of an event.
    • Mediator Topology:
      • Central mediator manages and controls workflow of events.
      • Components:
        • Initiating event: starts the flow, is sent to an initiating event queue, which is accepted by the event mediator
        • Event queues
        • Event mediator: only knows the steps involved in processing the event and therefore generates corresponding processing events that are sent to dedicated event channels (usually event queues) in a point-to-point messaging fashion
        • Event channels
        • Event processors: listen to the dedicated event channels, process the event, and usually respond back to the mediator that they’ve completed the work
      • There are usually multiple mediators, and these are usually associated with a particular domain or grouping of events
        • Reduces single point of failure issue and increases overall throughput and performance
      • Used for complex workflows requiring coordination of multiple event processors.
      • Offers better error handling, recoverability, and restart capabilities, but introduces more coupling and lower scalability and performance.
      • Commonly used when you require control over the workflow of an event process.
  • Error Handling and Asynchronous Capabilities:
    • Asynchronous communication improves responsiveness but complicates error handling.
    • The workflow event pattern can be used for reactive error handling without impacting responsiveness.
      • Workflow event pattern uses delegation, containment, and repair via a workflow delegate.
      • Event producer sends data to event consumer asynchronously through a message channel.
      • On error, the event consumer delegates the error to the workflow processor and moves to the next message.
      • This process maintains overall responsiveness by immediately processing the next message. If the event consumed had to spend time figuring out the error, that’s time away from processing the next message.
      • Workflow processor analyzes the error, e.g. with a static and deterministic error handling method, or possibly by using machine learning to detect anomalies.
      • Attempts to repair the data programmatically and resends it to the queue.
      • Event consumer reprocesses the repaired data as a new message.
      • Unresolvable errors are sent to a dashboard queue, where someone can manually fix the message via a dashboard interface.
      • Manually fixed messages are then resubmitted to the original queue.
    • There are multiple asynchronous messaging techniques to mitigate data loss risks, like:
      • Persisted message queues + synchronous send.
        • Broker receives message, stores it in memory and on some physical data store, so we can recover it if the broker goes down.
        • Synchronous send does a blocking wait in the message receiver until the broker has acknowledged that the message has been persisted.
      • Client acknowledge mode.
        • By default, when a message is de-queued, it’s immediately removed from the queue (auto acknowledge mode). Client acknowledge mode keeps the message in the queue and attaches the client ID to the message so no other consumers can read it.
        • So if the consumer crashes, the message is still in the queue, waiting for it.
      • Use ACID transactions + Last participant support (LPS). Using LPS removes the message from the persisted queue by acknowledging that processing has been completed and that the message has been persisted. So we ensure the message made it into the database.
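The workflow event pattern described above can be sketched with in-memory queues. This is a hypothetical example (all names and message formats invented): the consumer never repairs errors itself — it delegates to the workflow processor and immediately moves on, which is what preserves responsiveness.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the workflow event pattern: delegation, containment, and repair.
public class WorkflowEventDemo {

    static Deque<String> mainQueue = new ArrayDeque<>();
    static Deque<String> workflowQueue = new ArrayDeque<>();  // delegated errors
    static Deque<String> dashboardQueue = new ArrayDeque<>(); // manual fixes
    static List<String> processed = new ArrayList<>();

    // Event consumer: processes good messages, delegates bad ones and moves on.
    static void consume() {
        String msg;
        while ((msg = mainQueue.poll()) != null) {
            if (msg.startsWith("amount=")) {
                processed.add(msg);     // happy path
            } else {
                workflowQueue.add(msg); // delegate the error, keep going
            }
        }
    }

    // Workflow processor: attempts a deterministic repair; resubmits on
    // success, escalates unresolvable messages to the dashboard queue.
    static void repair() {
        String msg;
        while ((msg = workflowQueue.poll()) != null) {
            if (msg.startsWith("amt=")) {
                mainQueue.add(msg.replace("amt=", "amount=")); // programmatic fix
            } else {
                dashboardQueue.add(msg); // unresolvable: manual intervention
            }
        }
    }

    public static List<String> run() {
        mainQueue.add("amount=10");
        mainQueue.add("amt=25");   // malformed but repairable
        mainQueue.add("garbage");  // unrepairable
        consume();                 // good message processed, errors delegated
        repair();                  // one fixed and resubmitted, one escalated
        consume();                 // repaired message reprocessed as a new message
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run());          // [amount=10, amount=25]
        System.out.println(dashboardQueue); // [garbage]
    }
}
```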
  • Broadcasting allows producers to send messages to multiple subscribers without knowing the recipients.
  • Request-reply messaging provides pseudosynchronous communication (uses correlation IDs or temporary queues).
    • What if you need an order ID when ordering something? Or a confirmation number when booking a flight? Then you need some synchronous communication between services or event processors. We get that with request-reply messaging.
    • Each event channel within request-reply messaging consists of two queues: a request queue and a reply queue.
    • The initial request for information is asynchronously sent to the request queue, and control is returned to the message producer.
    • Then it does a blocking wait on the reply queue, waiting for response.
    • The message consumer receives and processes the message and sends the response to the reply queue.
    • The producer then receives the message with the response data.
    • There are two primary techniques for implementing request-reply messaging: using a correlation ID or using a temporary queue for the reply queue.
      • Correlation ID is contained in the message header. It’s a field in the reply message that’s usually set to the message ID of the original request message.
      • Using a temporary queue for the reply queue: create a temp queue when the request is made and delete it when the request ends. Then you don’t need a correlation ID because the temp queue is a dedicated queue only known to the event producer for the specific request.
  • Choosing Between Request-Based and Event-Based Models:
    • Request-based for structured, data-driven workflows.
    • Event-based for dynamic, action-oriented processes requiring scalability and real-time decision making.
  • Hybrid Event-Driven Architectures:
    • Combines event-driven architecture with other styles like microservices for enhanced scalability and performance.
  • Architecture Characteristics Ratings:
    • High performance, scalability, and fault tolerance.
    • Challenges with simplicity and testability due to nondeterministic event flows.
    • Highly evolutionary, allowing easy addition of new features and functionality.
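The request-reply flow described earlier can be sketched with two queues and a correlation ID. This is a hypothetical, single-process example (names invented, and the consumer runs inline where a real system would have a separate process): the producer sends to the request queue, then blocks on the reply queue until a reply whose correlation ID matches its request's message ID arrives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of request-reply (pseudosynchronous) messaging with a correlation ID.
public class RequestReplyDemo {

    record Message(String id, String correlationId, String body) {}

    static Deque<Message> requestQueue = new ArrayDeque<>();
    static Deque<Message> replyQueue = new ArrayDeque<>();

    // Consumer side: process the request and copy its message ID into the
    // reply's correlation ID so the producer can match it up.
    static void consumerStep() {
        Message request = requestQueue.poll();
        String orderId = "order-123"; // stand-in for real processing
        replyQueue.add(new Message("m2", request.id(), orderId));
    }

    // Producer side: send asynchronously, then wait for the matching reply.
    public static String placeOrder() {
        Message request = new Message("m1", null, "place order");
        requestQueue.add(request);
        consumerStep(); // in a real system this runs elsewhere, concurrently
        // "blocking wait": take replies until the correlation ID matches
        while (true) {
            Message reply = replyQueue.poll();
            if (reply != null && request.id().equals(reply.correlationId())) {
                return reply.body();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(placeOrder()); // order-123
    }
}
```

The temporary-queue variant replaces the correlation check with a dedicated reply queue per request, known only to that producer.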

15. Space-Based Architecture Style

  • Space-Based Architecture Style addresses high scalability, elasticity, and high concurrency. Also useful for apps with variable and unpredictable concurrent user volumes.
  • Traditional web-based architecture experiences bottlenecks at the web server, application server, and database server layers.
  • Scaling out web servers can lead to bottlenecks at the application and database layers.
  • In high-volume applications with a large concurrent user load, the database is usually the final limiting factor in how many transactions you can process concurrently.
  • Space-based architecture removes the central database as a synchronous constraint, leveraging replicated in-memory data grids and asynchronous data updates.
    • Keeps application data in memory and is replicated among all active processing units.
    • When a processing unit updates data, it asynchronously sends the data to the database (usually via messaging with persistent queues).
    • Processing units dynamically start up and shut down with user load changes, ensuring high scalability.
    • Since there’s no central database involved in the standard transactional processing of the application, the database bottleneck is removed.
  • Components of space-based architecture:
    • Processing unit: contains application logic, in-memory data grid, and replication engine.
    • Virtualized middleware: manages and coordinates the processing units. Manages data synchronization and request handling, includes messaging grid, data grid, processing grid, and deployment manager.
      • Messaging grid: manages input requests and session state. When a message comes in, the messaging grid determines which active processing units are available to receive the request & forwards it to one of those. Could be simple round-robin or a more complex next-available algorithm. Usually implemented with a typical web server with load-balancing capabilities (e.g., HAProxy or Nginx)
      • Data grid: implemented solely within the processing units as a replicated cache, but if you require an external controller or use a distributed cache, this functionality would reside in both the processing units and in the data grid in the virtualized middleware. Since the messaging grid can send data to any active processing unit, each must contain exactly the same data in their in-memory data grids.
      • Processing grid: optional component that manages orchestrated request processing when there are multiple processing units involved in a single business request.
      • Deployment manager: manages the dynamic startup and shutdown of processing unit instances based on load conditions. Continually monitors response times and user loads, starts a new processing unit when load increases, and shuts processing units down when load decreases.
    • Data pumps: mechanisms used to transfer data between processing units and databases in a space-based architecture. They operate asynchronously, meaning the processing units don’t directly interact with the database in real time but send updates through the data pump, ensuring the system maintains eventual consistency between the in-memory cache and the database.
    • Data writers: update the database with information from data pumps.
    • Data readers: read data from the database when necessary and send it to processing units upon startup.
  • Data collisions can occur due to replication latency, but are calculated and managed based on update rate, number of instances, cache size, and latency.
  • Space-based architecture can be deployed both in cloud-based and on-prem environments, supporting hybrid topologies.
  • Different caching models:
    • Replicated caching: fast, fault-tolerant, but might face issues with large cache sizes or high update rates.
    • Distributed caching: consistent data but lower performance and fault tolerance.
  • Near-cache model: hybrid of in-memory data grids and distributed cache, not recommended due to inconsistent performance among processing units.
  • Suitable for applications with unpredictable spikes in user load, like online concert ticketing systems and auction systems.
  • Main characteristics: high elasticity, scalability, and performance.
  • Trade-offs include complexity, testing difficulty, and higher costs due to resource utilization and licensing.
  • Partitioning type is both domain partitioned and technically partitioned. Quanta vary based on the communication between processing units and user interfaces.
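The data-pump idea above can be sketched in a few lines. This is a hypothetical example (names invented; real systems use an in-memory data grid product and a message broker): writes touch only the in-memory grid and are queued for the pump, while a separate data writer drains the pump into the database, giving eventual consistency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a space-based processing unit with an asynchronous data pump.
public class ProcessingUnit {

    static Map<String, String> inMemoryGrid = new HashMap<>();             // replicated cache
    static BlockingQueue<String[]> dataPump = new LinkedBlockingQueue<>(); // async channel
    static Map<String, String> database = new HashMap<>();                 // eventual copy

    // Transaction path: touch memory only, never the database directly.
    public static void update(String key, String value) {
        inMemoryGrid.put(key, value);
        dataPump.offer(new String[] {key, value}); // hand off to the pump
    }

    // Data writer: runs separately, drains the pump into the database.
    public static void drainPump() {
        String[] change;
        while ((change = dataPump.poll()) != null) {
            database.put(change[0], change[1]);
        }
    }

    public static String lookup(String key) { return database.get(key); }

    public static void main(String[] args) {
        update("ticket-1", "sold");
        // here the grid is current but the database may still lag behind
        drainPump(); // eventual consistency: the database catches up
        System.out.println(lookup("ticket-1")); // sold
    }
}
```

Because no synchronous database call sits on the transaction path, the database stops being the concurrency bottleneck — the core trade being made by this style.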

16. Orchestration-Driven Service-Oriented Architecture

  • Orchestration-driven service-oriented architecture (SOA) evolved in the late 1990s to accommodate rapid enterprise growth with distributed computing.
  • Was influenced by expensive and complex commercial systems, pushing for reuse and technical partitioning.
  • Notable layers: Business Services, Enterprise Services, Application Services, Infrastructure Services, and an Orchestration Engine.
    • Business services were defined by domain behavior and served as an entry point, but had no actual code.
    • Enterprise services were reusable, fine-grained implementations provided by dedicated teams for specific business domains.
    • Application services were one-off implementations created for particular needs without reuse intentions.
    • Infrastructure services were focused on operational tasks like monitoring and authentication, managed by a centralized infrastructure team.
    • The Orchestration Engine acted as a core component, coordinating various service implementations and handling transactions declaratively.
  • The architecture’s focus on reuse led to significant coupling between components, making changes risky and requiring coordinated deployments.
  • The challenge of reuse led to inefficiency and frustration due to the increased complexity when consolidating multiple service needs.
  • The architecture struggled to support modern engineering metrics like deployability and testability effectively due to inherent complexity.
  • It did achieve some success in elasticity and scalability due to vendor efforts, although performance suffered.
  • This architecture style represents a milestone, highlighting the difficulties of distributed transactions and the limits of heavy technical partitioning, eventually leading to the development of microservices.

17. Microservices Architecture

  • Microservices architecture is a highly popular style.
  • It was named early on and popularized by a 2014 blog post by Martin Fowler and James Lewis.
  • Microservices architecture is inspired by domain-driven design (DDD) and emphasizes the concept of bounded context (from DDD).
  • Bounded context allows decoupling between internal components while avoiding coupling to external elements.
  • The architecture prioritizes high decoupling over reuse, leading to service duplication rather than shared dependencies. Reuse is beneficial, but often leads to coupling (programmers tend to ensure reuse through inheritance or composition).
  • Microservices have distributed architecture; each service runs in its own process, and includes all necessary parts to operate independently, including databases and such.
  • Performance is a notable concern due to the distributed nature requiring network calls and security checks.
  • The architecture avoids transactions across service boundaries (because it’s distributed), so determining the granularity of your services is important!
  • Each service models a domain (or subdomain) or workflow.
  • It’s a common mistake to make the services too small, and then you have to build communication links back between the services to do useful work. Don’t read too much into ‘micro’, just don’t make them ‘gigantic.’
  • Guidelines:
    • Purpose: Each microservice should focus on one main function within the system, based on a clear domain or workflow.
    • Transactions: Identify workflows where entities interact. Aim to avoid transactions between services as they can cause problems in distributed systems.
    • Choreography: If services need to communicate a lot to work, it might be better to combine them into a single, larger service to reduce communication issues.
    • Iteration: Perfect service design comes through refining and improving over time, not on the first attempt.
  • Data isolation is key: avoid shared schemas and databases to reduce coupling.
  • Microservices often utilize API layers for operational tasks, which offer a good location in the architecture for common tasks, either via indirection as a proxy or a tie into operational facilities (such as a naming service).
    • API layer shouldn’t be used as a mediator or for orchestration — that violates the bounded context
  • What if you really need coupling? You may have some parts of your architecture that would benefit from coupling (e.g. for operational concerns: monitoring, logging, circuit breakers).
    • The sidecar pattern offers a solution. Have the common operational concerns appear within each service as a separate component. Then when you have to update the operational concern, it’s just that component.
    • You can build a service mesh for unified control access for concerns like logging and monitoring
    • You can use service discovery to build elasticity into microservices architectures. Instead of invoking a single service, your request goes through a service discovery tool, which can monitor the number and frequency of requests, and spin up new instances of services to handle scale or elasticity concerns.
      • Service discovery is often included in the service mesh.
      • API layer is often used to host the service discovery. Then it’s a single place for user interfaces or other calling systems to find and create services in an elastic, consistent way.
  • Different UI styles: monolithic frontends and microfrontends
  • Communication in microservices: you need to decide on synchronous or asynchronous communication.
    • Microservices usually use ‘protocol-aware heterogeneous interoperability’
    • Protocol-aware: each service should know how to call other services, so we usually standardize how particular services call each other (REST, message queues, etc.). Services must know (or discover) which protocol to use to call other services.
    • Heterogeneous: services may be written in different languages, so heterogeneous suggests that should be supported
    • Interoperability: describes services calling each other
  • Choreography and Orchestration are two methods for managing inter-service communication (essentially the broker and mediator patterns, respectively)
  • The saga pattern handles distributed transactions
  • Microservices’ strengths: scalability, elasticity, and evolutionary adaptability.
  • Despite performance issues, the style supports fast-paced, modern business environments with high fault tolerance.
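The sidecar idea above can be sketched like this. It's a hypothetical, in-process illustration (names invented; real sidecars are separate containers or processes): operational concerns live in one component deployed alongside every service, so updating logging or monitoring means updating the sidecar, not each service's domain code.

```java
import java.util.ArrayList;
import java.util.List;

// Sidecar: holds operational concerns (logging/monitoring) common to services.
class Sidecar {
    final List<String> log = new ArrayList<>();
    void record(String event) { log.add(event); } // operational hook
}

public class PaymentService {
    private final Sidecar sidecar = new Sidecar();

    public String pay(String orderId) {
        sidecar.record("request:" + orderId);  // operational concern
        String result = "paid:" + orderId;     // domain logic stays untouched
        sidecar.record("response:" + orderId); // operational concern
        return result;
    }

    public List<String> operationalLog() { return sidecar.log; }

    public static void main(String[] args) {
        PaymentService service = new PaymentService();
        System.out.println(service.pay("o-1"));       // paid:o-1
        System.out.println(service.operationalLog()); // [request:o-1, response:o-1]
    }
}
```

The service mesh mentioned above is, roughly, the set of all sidecars plus a control plane for managing them uniformly.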

18. Choosing the Appropriate Architecture Style

  • Choosing an appropriate architectural style is context-dependent. You should consider things like trade-offs, strategic goals, and domain requirements.
  • The field isn’t static. Architectural styles evolve. There’ll likely be new styles in a few years that fit the tools we have available then.
    • Be aware of industry trends to make informed decisions on architectural styles.
  • Some key decisions:
    • Monolith versus distributed: Assess whether a single set of architecture characteristics suffices or if different system parts need different characteristics.
    • Data location: Decide where data should reside.
    • Communication between services: Choose between synchronous (use by default) and asynchronous communication.
  • The design process results in an architecture topology and includes:
    • Decision records for complex design decisions.
    • Architecture fitness functions to maintain principles and operational characteristics.

Part III. Techniques and Soft Skills

19. Architecture Decisions

Your core responsibility is making architecture decisions that guide technical choices in development.
Good architecture decisions involve gathering information, justifying, documenting, and communicating the decision.

Anti-patterns and how to overcome them

  • Covering Your Assets: Avoiding decisions out of fear of mistakes.
    • Make decisions at the last responsible moment with adequate information.
    • Collaborate with development teams for feasible decision implementation.
  • Groundhog Day: Repeatedly discussing decisions due to lack of justification.
    • Provide complete justifications, including business and technical reasons.
    • Align decisions with business value (cost, time to market, user satisfaction, strategic positioning).
  • Email-Driven Architecture: Losing track of decisions due to poor communication.
    • Use a single system of record for decisions, not just email.
    • Communicate decisions only to those directly impacted.

Decisions impacting structure, non-functional characteristics, dependencies, interfaces, or construction techniques are considered architectural.

Architecture Decision Records (ADRs)

  • ADRs document decisions with sections: Title, Status, Context, Decision, Consequences, Compliance, and Notes.
  • Status can be Proposed, Accepted, or Superseded.
  • Consider criteria like cost, cross-team impact, and security when determining decision approval level.
  • Context specifies forces and alternatives guiding the decision.
    • Emphasizing the “why” (justification) is key to understanding decisions.
  • Consequences detail decision impacts and trade-offs.
  • Compliance outlines how decisions are measured and governed.
  • Notes capture metadata like authorship and modification history.
  • ADRs should be stored systematically, ideally outside of code repositories for broader access.
  • They provide a structured way for documenting and understanding software architecture.
  • ADRs can establish standards by justifying their existence and documenting implications.
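Putting those sections together, a minimal ADR might look like the following sketch. The numbering, services, and decision here are hypothetical, purely for illustration:

```markdown
# ADR 001: Asynchronous messaging between order and notification services

## Status
Accepted

## Context
The order service must inform the notification service when an order ships.
Alternatives considered: a synchronous REST call, or asynchronous messaging.

## Decision
Use asynchronous messaging. Notification is not on the critical path, and a
queue decouples the two services so order processing is unaffected by
notification outages.

## Consequences
Requires a message broker and queue monitoring; notification delivery
becomes eventually consistent.

## Compliance
An architecture fitness function verifies that the order service has no
direct synchronous dependency on the notification service.

## Notes
Author, approval date, and last-modified date are recorded here.
```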

22. Making Teams Effective

A software architect’s role extends beyond designing the technical architecture; it includes guiding the development team to implement the architecture effectively.
You don’t just throw your diagrams over the fence to the developers.

Successful architects create productive teams by establishing appropriate boundaries and providing the right level of guidance.

Team Boundaries
Architects influence team success by setting constraints within which developers operate. Boundaries that are too tight can stifle creativity and cause frustration, while those that are too loose can lead to confusion and misalignment.
The goal is to establish boundaries that are “just right,” enabling developers to work efficiently while adhering to the architectural vision.

Architect Personalities
There are three primary architect personalities, each impacting team boundaries differently:

  1. Control Freak Architect: Imposes excessive constraints and micromanages details, leading to tight boundaries that hinder team productivity and morale.
  2. Armchair Architect: Provides minimal guidance and is detached from the development process, resulting in loose boundaries where teams lack direction and struggle with implementation.
  3. Effective Architect: Strikes a balance by offering appropriate guidance and constraints, fostering collaboration, and being available to support the team. This approach creates optimal boundaries for effective teamwork.

How Much Control?
An effective architect adjusts their level of control based on five key factors:

  • Team Familiarity: New teams may need more guidance, while established teams require less.
  • Team Size: Larger teams often necessitate more control to coordinate efforts effectively.
  • Overall Experience: Teams with less experience may benefit from additional mentoring and oversight.
  • Project Complexity: Complex projects demand closer architectural involvement.
  • Project Duration: Longer projects might require sustained control to maintain focus and momentum.

Team Warning Signs
Architects should watch for indicators that a team may be struggling:

  • Process Loss (Brooks' Law): Adding more people to a team can reduce efficiency because of the increased coordination overhead.
  • Pluralistic Ignorance: When everyone agrees to (but privately rejects) a norm because they think they are missing something obvious.
  • Diffusion of Responsibility: In larger teams, individuals may feel less accountable, causing tasks to be neglected.

Leveraging Checklists
Checklists can enhance team effectiveness by ensuring critical steps aren’t overlooked:

  • Developer Code Completion Checklist: Verifies code meets standards before being considered complete.
  • Unit and Functional Testing Checklist: Ensures thorough testing, including edge cases often missed.
  • Software Release Checklist: Prevents deployment issues by confirming all release steps are completed.

Effective checklists are concise, focus on error-prone areas, and avoid procedural tasks.
You don’t need checklists for everything. Knowing when to use them and when not to is important.
The more checklists you create, the less likely developers are to use them.
Try to find items that can be verified through automation and can therefore be removed from the checklist.

Don’t worry about stating the obvious in a checklist. It’s the obvious stuff that’s usually skipped or missed.
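As a toy illustration of that principle, a checklist can be treated as data: items that a build script can verify drop off the list, leaving only the error-prone human steps. The item names below are made up for the example, not from the book.

```python
# Toy sketch: a code-completion checklist where automatable items are
# filtered out, leaving only the steps that still need a human check.
# All item names here are illustrative assumptions.

checklist = [
    {"item": "code compiles with no warnings", "automated": True},
    {"item": "unit tests pass", "automated": True},
    {"item": "edge cases for negative input covered", "automated": False},
    {"item": "public methods documented", "automated": False},
]

def manual_items(items):
    """Return only the checklist items that still need a human check."""
    return [entry["item"] for entry in items if not entry["automated"]]

print(manual_items(checklist))
# → ['edge cases for negative input covered', 'public methods documented']
```

The shorter the remaining manual list, the more likely developers are to actually use it.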

Providing Guidance
Architects should offer clear design principles to guide developers, such as:

  • Encouraging overlap analysis to prevent redundant functionality.
  • Requiring both technical and business justifications for new libraries or tools.

This helps developers make informed decisions that align with the project’s goals.
