Fifteen. Distillation

[Image: four equations] James Clerk Maxwell, A Treatise on Electricity and Magnetism, 1873

These four equations, along with the definitions of their terms and the body of mathematics they rest on, express the entirety of classical nineteenth-century electromagnetism.

How do you focus on your central problem and keep from drowning in a sea of side issues? A LAYERED ARCHITECTURE separates domain concepts from the technical logic that makes a computer system run, but in a large system, even the isolated domain may be unmanageably complex.

Distillation is the process of separating the components of a mixture to extract the essence in a form that makes it more valuable and useful. A model is a distillation of knowledge. With every refactoring to deeper insight, we abstract some crucial aspect of domain knowledge and priorities. Now, stepping back for a strategic view, this chapter looks at ways to distinguish broad swaths of the model and distill the domain model as a whole.

As with many chemical distillations, the separated by-products are themselves made more valuable by the distillation process (as GENERIC SUBDOMAINS and COHESIVE MECHANISMS), but the effort is motivated by the desire to extract that one particularly valuable part, the part that distinguishes our software and makes it worth building: the “CORE DOMAIN.”

Strategic distillation of a domain model does all of the following:

  1. Aids all team members in grasping the overall design of the system and how it fits together

  2. Facilitates communication by identifying a core model of manageable size to enter the UBIQUITOUS LANGUAGE

  3. Guides refactoring

  4. Focuses work on areas of the model with the most value

  5. Guides outsourcing, use of off-the-shelf components, and decisions about assignments

This chapter lays out a systematic approach to strategic distillation of the CORE DOMAIN, explains how to share a view of it effectively within the team, and provides the language to talk about what we are doing.

Figure 15.1. A navigation map for strategic distillation

Like a gardener pruning a tree, clearing the way for the growth of the main branches, we are going to apply a suite of techniques to hew away distractions in the model and focus our attention on the part that matters most. . . .

CORE DOMAIN

In designing a large system, there are so many contributing components, all complicated and all absolutely necessary to success, that the essence of the domain model, the real business asset, can be obscured and neglected.

A system that is hard to understand is hard to change. The effect of a change is hard to foresee. A developer who wanders outside his or her own area of familiarity gets lost. (This is particularly true when bringing new people into a team, but even an established member of the team will struggle unless code is very expressive and organized.) This forces people to specialize. When developers confine their work to specific modules, it further reduces knowledge transfer. With the compartmentalization of work, smooth integration of the system suffers, and flexibility in assigning work is lost. Duplication crops up when a developer does not realize that a behavior already exists elsewhere, and so the system becomes even more complex.

Those are some of the consequences of any design that is hard to understand, but there is another, equally serious risk from losing the big picture of the domain:

The harsh reality is that not all parts of the design are going to be equally refined. Priorities must be set. To make the domain model an asset, the model’s critical core has to be sleek and fully leveraged to create application functionality. But scarce, highly skilled developers tend to gravitate to technical infrastructure or neatly definable domain problems that can be understood without specialized domain knowledge.

Such parts of the system seem interesting to computer scientists, and are perceived to build transferable professional skills and provide better resume material. The specialized core, that part of the model that really differentiates the application and makes it a business asset, typically ends up being put together by less skilled developers who work with DBAs to create a data schema and then code feature-by-feature without drawing on any conceptual power in the model at all.

Poor design or implementation of this part of the software leads to an application that never does compelling things for the users, no matter how well the technical infrastructure works, no matter how nice the supporting features are. This insidious problem can take root when a project lacks a sharp picture of the overall design and the relative significance of the various parts.

One of the most successful projects I’ve joined initially suffered from this syndrome. The goal was to develop a very complex syndicated loan system. Most of the strong talent was happily working on database mapping layers and messaging interfaces while the business model was in the hands of developers new to object technology.

The single exception, an experienced object developer working on a domain problem, devised a way of attaching comments to any of the long-lived domain objects. These comments could be organized so that traders could see the rationale they or others recorded for some past decision. He also built an elegant user interface that gave intuitive access to the flexible features of the comment model.

These features were useful and well designed. They went into production.

Unfortunately, they were peripheral. This talented developer modeled his interesting, generic way of commenting, implemented it cleanly, and put it into users’ hands. Meanwhile an incompetent developer was turning the mission-critical “loan” module into an incomprehensible tangle that the project very nearly did not recover from.

The planning process must drive resources to the most crucial points in the model and design. To do that, those points must be identified and understood by everyone during planning and development.

Those parts of the model distinctive and central to the purposes of the intended applications make up the CORE DOMAIN. The CORE DOMAIN is where the most value should be added in your system.

Therefore:

Boil the model down. Find the CORE DOMAIN and provide a means of easily distinguishing it from the mass of supporting model and code. Bring the most valuable and specialized concepts into sharp relief. Make the CORE small.

Apply top talent to the CORE DOMAIN, and recruit accordingly. Spend the effort in the CORE to find a deep model and develop a supple design—sufficient to fulfill the vision of the system. Justify investment in any other part by how it supports the distilled CORE.

Distilling the CORE DOMAIN is not easy, but it does lead to some easy decisions. You’ll put a lot of effort into making your CORE distinctive, while keeping the rest of the design as generic as is practical. If you need to keep some aspect of your design secret as a competitive advantage, it is the CORE DOMAIN. There is no need to waste effort concealing the rest. And whenever a choice has to be made (due to time limitations) between two desirable refactorings, the one that most affects the CORE DOMAIN should be chosen first.

The patterns in this chapter make the CORE DOMAIN easier to see and use and change.

Choosing the CORE

We are looking at those parts of the model particular to representing your business domain and solving your business problems.

The CORE DOMAIN you choose depends on your point of view. For example, many applications need a generic model of money that could represent various currencies and their exchange rates and conversions. On the other hand, an application to support currency trading might need a more elaborate model of money, which would be considered part of the CORE. Even in such a case, there may be a part of the money model that is very generic. As insight into the domain deepens with experience, the distillation process can continue by separating the generic money concepts and retaining only the specialized aspects of the model in the CORE DOMAIN.

In a shipping application, the CORE could be the model of how cargoes are consolidated for shipping, how liability is transferred when containers change hands, or how a particular container is routed on various transports to reach its destination. In investment banking, the CORE could include the models of syndication of assets among assignees and participants.

One application’s CORE DOMAIN is another application’s generic supporting component. Still, throughout one project, and usually throughout one company, a consistent CORE can be defined. Like every other part of the design, the identification of the CORE DOMAIN should evolve through iterations. The importance of a particular set of relationships might not be apparent at first. The objects that seem obviously central at first may turn out to have supporting roles.

The discussion in the following sections, particularly GENERIC SUBDOMAINS, will give more guidelines for these decisions.

Who Does the Work?

The most technically proficient members of project teams seldom have much knowledge of the domain. This limits their usefulness and reinforces the tendency to assign them to supporting components, sustaining a vicious circle in which lack of knowledge keeps them away from the work that would build domain knowledge.

It is essential to break this cycle by assembling a team that pairs strong developers, who have a long-term commitment and an interest in becoming repositories of domain knowledge, with one or more domain experts who know the business deeply. Domain design is interesting, technically challenging work when approached seriously, and developers can be found who see it this way.

It is usually not practical to hire short-term, outside design expertise for the nuts and bolts of creating the CORE DOMAIN, because the team needs to accumulate domain knowledge, and a temporary member is a leak in the bucket. On the other hand, an expert in a teaching/mentoring role can be very valuable by helping the team build its domain design skills and facilitating the use of sophisticated principles that team members probably have not mastered.

For similar reasons, it is unlikely that the CORE DOMAIN can be purchased. Efforts have been made to build industry-specific model frameworks, conspicuous examples being the semiconductor industry consortium SEMATECH’s CIM framework for semiconductor manufacturing automation, and IBM’s “San Francisco” frameworks for a wide range of businesses. Although this is a very enticing idea, so far the results have not been compelling, except perhaps as PUBLISHED LANGUAGES facilitating data interchange (see Chapter 14). The book Domain-Specific Application Frameworks (Fayad and Johnson 2000) gives an overview of the state of this art. As the field advances, more workable frameworks may be available.

Even so, there is a more fundamental reason for caution: The greatest value of custom software comes from the total control of the CORE DOMAIN. A well-designed framework may be able to provide high-level abstractions that you can specialize for your use. It may save you from developing the more generic parts and leave you free to concentrate on the CORE. But if it constrains you more than that, then there are three likely possibilities.

  1. You are losing an essential software asset. Back off restrictive frameworks in your CORE DOMAIN.

  2. The area treated by the framework is not as pivotal as you thought. Redraw the boundaries of the CORE DOMAIN to the truly distinctive part of the model.

  3. You don’t have special needs in your CORE DOMAIN. Consider a lower-risk solution, such as purchasing software to integrate with your applications.

One way or another, creating distinctive software comes back to a stable team accumulating specialized knowledge and crunching it into a rich model. No shortcuts. No magic bullets.

AN ESCALATION OF DISTILLATIONS

The various distillation techniques that make up the rest of this chapter can be applied in almost any order, but there is a range in how radically they modify the design.

A simple DOMAIN VISION STATEMENT communicates the basic concepts and their value with a minimum investment. The HIGHLIGHTED CORE can improve communication and help guide decision making—and still requires little or no modification to the design.

More aggressive refactoring and repackaging explicitly separate GENERIC SUBDOMAINS, which can then be dealt with individually. COHESIVE MECHANISMS can be encapsulated with versatile, communicative, and supple design. Removing these distractions disentangles the CORE.

Repackaging a SEGREGATED CORE makes the CORE directly visible, even in the code, and facilitates future work on the CORE model.

And most ambitious is the ABSTRACT CORE, which expresses the most fundamental concepts and relationships in a pure form (and requires extensive reorganizing and refactoring of the model).

Each of these techniques requires a successively greater commitment, but a knife gets sharper as its blade is ground finer. Successive distillation of a domain model produces an asset that gives the project speed, agility, and precision of execution.

To start, we can boil off the least distinctive aspects of the model. GENERIC SUBDOMAINS provide a contrast to the CORE DOMAIN that clarifies the meaning of each. . . .

GENERIC SUBDOMAINS

Some parts of the model add complexity without capturing or communicating specialized knowledge. Anything extraneous makes the CORE DOMAIN harder to discern and understand. The model clogs up with general principles everyone knows or details that belong to specialties which are not your primary focus but play a supporting role. Yet, however generic, these other elements are essential to the functioning of the system and the full expression of the model.

There is a part of your model that you would like to take for granted. It is undeniably part of the domain model, but it abstracts concepts that would probably be needed for a great many businesses. For example, a corporate organization chart is needed in some form by businesses as diverse as shipping, banking, or manufacturing. For another example, many applications track receivables, expense ledgers, and other financial matters that could all be handled using a generic accounting model.

Often a great deal of effort is spent on peripheral issues in the domain. I personally have witnessed two separate projects that have employed their best developers for weeks in redesigning dates and times with time zones. While such components must work, they are not the conceptual core of the system.

Even if such a generic model element is deemed critical, the overall domain model needs to make prominent the most value-adding and special aspects of your system, and needs to be structured to give that part as much power as possible. This is hard to do when the CORE is mixed with all the interrelated factors.

Therefore:

Identify cohesive subdomains that are not the motivation for your project. Factor out generic models of these subdomains and place them in separate MODULES. Leave no trace of your specialties in them.

Once they have been separated, give their continuing development lower priority than the CORE DOMAIN, and avoid assigning your core developers to the tasks (because they will gain little domain knowledge from them). Also consider off-the-shelf solutions or published models for these GENERIC SUBDOMAINS.
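As a rough illustration of this separation, the sketch below keeps a generic money model free of any business-specific vocabulary, while a CORE class depends on it. All names here (DistillationSketch, Money, LoanShare, and the module names in the comments) are hypothetical, chosen only for illustration, and are not taken from any project described in this chapter. Nested classes stand in for separate MODULES so that the example fits in one file.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

// Illustrative sketch only: every name below is hypothetical.
// Nested classes stand in for separate MODULES (packages).
final class DistillationSketch {

    // Would live in a generic module, e.g. "generic.money" (assumed name).
    // Knows nothing about loans, shipping, or any other specialty.
    static final class Money {
        private final BigDecimal amount;
        private final Currency currency;

        Money(BigDecimal amount, Currency currency) {
            this.amount = Objects.requireNonNull(amount);
            this.currency = Objects.requireNonNull(currency);
        }

        static Money of(String amount, String currencyCode) {
            return new Money(new BigDecimal(amount), Currency.getInstance(currencyCode));
        }

        Money plus(Money other) {
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException("Currency mismatch");
            }
            return new Money(amount.add(other.amount), currency);
        }

        Money times(BigDecimal factor) {
            return new Money(amount.multiply(factor), currency);
        }

        BigDecimal amount() { return amount; }
        Currency currency() { return currency; }
    }

    // Would live in a CORE module, e.g. "core.loan" (assumed name).
    // Specialized knowledge stays here; it depends on Money, never the reverse.
    static final class LoanShare {
        private final String lender;
        private final BigDecimal fraction; // lender's portion of the loan, between 0 and 1

        LoanShare(String lender, BigDecimal fraction) {
            this.lender = Objects.requireNonNull(lender);
            this.fraction = Objects.requireNonNull(fraction);
        }

        // How much of a payment belongs to this lender.
        Money portionOf(Money payment) {
            return payment.times(fraction);
        }

        String lender() { return lender; }
    }
}
```

The dependency runs one way only: the generic class could be handed to another team or swapped for a purchased component without touching any loan concept, which is the property the pattern asks for.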

You may have a few extra options when developing these packages.

Option 1: An Off-the-Shelf Solution

Sometimes you can buy an implementation or use open source code.

Advantages

• Less code to develop.

• Maintenance burden externalized.

• Code is probably more mature, used in multiple places, and therefore more bulletproof and complete than homegrown code.

Disadvantages

• You still have to spend the time to evaluate it and understand it before using it.

• Quality control being what it is in our industry, you can’t count on it being correct and stable.

• It may be overengineered for your purposes; integration could be more work than a minimalist homegrown implementation.

• Foreign elements don’t usually integrate smoothly. There may be a distinct BOUNDED CONTEXT. Even if not, it may be difficult to smoothly reference ENTITIES from your other packages.

• It may introduce platform dependencies, compiler version dependencies, and so on.

Off-the-shelf subdomain solutions are worth investigating, but they are usually not worth the trouble. I’ve seen success stories in applications with very elaborate workflow requirements that used commercially available external workflow systems with API hooks. I’ve also seen success with an error-logging package that was deeply integrated into the application. Sometimes GENERIC SUBDOMAIN solutions are packaged in the form of frameworks, which implement a very abstract model that can be integrated with and specialized for your application. The more generic the subcomponent, and the more distilled its own model, the better the chance that it will be useful.

Option 2: A Published Design or Model

Advantages

• More mature than a homegrown model and reflects many people’s insights

• Instant, high-quality documentation

Disadvantage

• May not quite fit your needs or may be overengineered for your needs

Tom Lehrer (the comedic songwriter from the 1950s and 1960s) said the secret to success in mathematics was, “Plagiarize! Plagiarize. Let no one’s work evade your eyes. . . . Only be sure always to call it please, research.” Good advice in domain modeling, and especially when attacking a GENERIC SUBDOMAIN.

This works best when there is a widely distributed model, such as the ones in Analysis Patterns (Fowler 1996). (See Chapter 11.)

When the field already has a highly formalized and rigorous model, use it. Accounting and physics are two examples that come to mind. Not only are these very robust and streamlined, but they are widely understood by people everywhere, reducing your present and future training burden. (See Chapter 10, on using established formalisms.)

Don’t feel compelled to implement all aspects of a published model, if you can identify a simplified subset that is self-consistent and satisfies your needs. But in cases where there is a well-traveled and well-documented—or better yet, formalized—model available, it makes no sense to reinvent the wheel.

Option 3: An Outsourced Implementation

Advantages

• Keeps core team free to work on the CORE DOMAIN, where most knowledge is needed and accumulated.

• Allows more development to be done without permanently enlarging the team, but without dissipating knowledge of the CORE DOMAIN.

• Forces an interface-oriented design, and helps keep the subdomain generic, because the specification is being passed outside.

Disadvantages

• Still requires time from the core team, because the interface, coding standards, and any other important aspects need to be communicated.

• Incurs significant overhead of transferring ownership back inside, because code has to be understood. (Still, overhead is less than for specialized subdomains, because a generic model presumably requires no special background to understand.)

• Code quality can vary. This could be good or bad, depending on the relative caliber of the two teams.

Automated tests can play an important role in outsourcing. The implementers should be required to provide unit tests for the code they deliver. A really powerful approach—one that helps ensure a degree of quality, clarifies the spec, and smooths reintegration—is to specify or even write automated acceptance tests for the outsourced components. Also, “outsourced implementation” can be an excellent combination with “published design or model.”
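One way to picture such an acceptance test: the core team owns a small interface plus a check that any delivered implementation must pass. The sketch below is an illustration under assumptions, not a prescribed process. The names OutsourcingSketch, TimeZoneConverter, acceptanceFailures, and JavaTimeConverter are hypothetical, as are the sample cases; the stand-in implementation simply delegates to the standard java.time library so that the example runs.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names are hypothetical.
final class OutsourcingSketch {

    // The interface the core team specifies and hands to the outside implementers.
    interface TimeZoneConverter {
        Instant toInstant(LocalDateTime localTime, String zoneId);
    }

    // Acceptance test written by the core team. It depends only on the interface,
    // so it can be run against whatever implementation is delivered.
    // Returns a list of failure messages; an empty list means acceptance.
    static List<String> acceptanceFailures(TimeZoneConverter converter) {
        List<String> failures = new ArrayList<>();
        check(failures, converter, "2003-01-15T12:00", "UTC", "2003-01-15T12:00:00Z");
        check(failures, converter, "2003-01-15T09:00", "Asia/Tokyo", "2003-01-15T00:00:00Z");
        return failures;
    }

    private static void check(List<String> failures, TimeZoneConverter converter,
                              String local, String zone, String expectedUtc) {
        Instant expected = Instant.parse(expectedUtc);
        Instant actual = converter.toInstant(LocalDateTime.parse(local), zone);
        if (!expected.equals(actual)) {
            failures.add(local + " " + zone + ": expected " + expected + " but got " + actual);
        }
    }

    // Stand-in for a delivered implementation; delegates to java.time.
    static final class JavaTimeConverter implements TimeZoneConverter {
        @Override
        public Instant toInstant(LocalDateTime localTime, String zoneId) {
            return localTime.atZone(ZoneId.of(zoneId)).toInstant();
        }
    }
}
```

Because the check is written against the interface alone, it doubles as an executable part of the specification passed outside, and it can be rerun unchanged when ownership of the code moves back inside.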

Option 4: An In-House Implementation

Advantages

• Easy integration.

• You get just what you want and nothing extra.

• Temporary contractors can be assigned.

Disadvantages

• Ongoing maintenance and training burden.

• It is easy to underestimate the time and cost of developing such packages.

Of course, this too combines well with “published design or model.”

GENERIC SUBDOMAINS are the place to try to apply outside design expertise, because they do not require deep understanding of your specialized CORE DOMAIN, and they do not present a major opportunity to learn that domain. Confidentiality is of less concern, because little proprietary information or business practice will be involved in such modules. A GENERIC SUBDOMAIN lessens the training burden for those not committed to deep knowledge of the domain.

Over time, I believe our ideas of what constitutes the CORE model will narrow, and more and more generic models will be available as implemented frameworks, or at least as published models or analysis patterns. For now, we still have to develop most of these ourselves, but there is great value in partitioning them from the CORE DOMAIN model.

Example: A Tale of Two Time Zones

Twice I’ve watched as the best developers on a project spent weeks of their time solving the problem of storing and converting times with time zones. While I’m always suspicious of such activities, sometimes it is necessary, and these two projects provide almost perfect contrast.

The first was an effort to design scheduling software for cargo shipping. To schedule international transports, it is critical to have accurate time calculations, and because all such schedules are tracked in local time, it is impossible to coordinate transports without conversions.

Having clearly established their need for this functionality, the team proceeded with development of the CORE DOMAIN and some early iterations of the application using the available time classes and some dummy data. As the application began to mature, it was clear that the existing time classes were not adequate, and that the problem was very intricate because of the variations between the many countries and the complexity of the International Date Line. With their requirements by now even clearer, they searched for an off-the-shelf solution, but found none. They had no option but to build it themselves.

The task would require research and precision engineering, so the team leaders assigned one of their best programmers. But the task did not require any special knowledge of shipping and would not cultivate that knowledge, so they chose a programmer who was on the project on a temporary contract.

This programmer did not start from scratch. He researched several existing implementations of time zones, most of which did not meet requirements, and decided to adapt the public-domain solution from BSD Unix, which had an elaborate database and an implementation in C. He reverse-engineered the logic and wrote an import routine for the database.

The problem turned out to be even harder than expected (involving, for example, the import of databases of special cases), but the code got written and integrated with the CORE and the product was delivered.

Things went very differently on the other project. An insurance company was developing a new claims-processing system, and planned to capture the times of various events (time of car crash, time of hail storm, and so on). This data would be recorded in local time, so time zone functionality was needed.

When I arrived, they had assigned a junior, but very smart, developer to the task, although the exact requirements of the app were still in play and not even an initial iteration had been attempted. He had dutifully set out to build a time zone model a priori.

Because no one knew what would be needed, it was assumed that the model should be flexible enough to handle anything. The programmer assigned to the task needed help with such a difficult problem, so a senior developer was assigned to it also. They wrote complex code, but no specific application was using it, so it was never clear that the code worked correctly.

The project ran aground for various reasons, and the time zone code was never used. But if it had been, simply storing local times tagged with the time zone might have been sufficient, even with no conversion, because this was primarily reference data and not the basis of computations. Even if conversion had turned out to be necessary, all the data was going to be gathered from North America, where time zone conversions are relatively simple.
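To make the simpler alternative concrete, here is a minimal sketch of a local time stored together with its time zone tag, using the standard java.time library (which postdates the projects described here). The class name TaggedLocalTime and its methods are hypothetical. Conversion is included only as an optional extra, since the claims data was primarily reference data.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Objects;

// Illustrative sketch only: TaggedLocalTime is a hypothetical name.
// Stores the time exactly as recorded, plus the zone it was recorded in.
final class TaggedLocalTime {
    private final LocalDateTime localTime;
    private final ZoneId zone;

    TaggedLocalTime(LocalDateTime localTime, ZoneId zone) {
        this.localTime = Objects.requireNonNull(localTime);
        this.zone = Objects.requireNonNull(zone);
    }

    static TaggedLocalTime of(String isoLocalTime, String zoneId) {
        return new TaggedLocalTime(LocalDateTime.parse(isoLocalTime), ZoneId.of(zoneId));
    }

    // Reference-data use: show what was recorded, with no conversion at all.
    String describe() {
        return localTime + " [" + zone + "]";
    }

    // Only needed if conversion ever turns out to be required.
    Instant toInstant() {
        return localTime.atZone(zone).toInstant();
    }
}
```

A class this small would have covered the reference-data case, which is the point of the contrast: the requirement, once known, may call for far less than a model built to handle anything.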

The main cost of this attention to the time zones was the neglect of the CORE DOMAIN model. If the same energy had been placed there, they might have produced a functioning prototype of their own application and a first cut at a working domain model. Furthermore, the developers involved, who were committed long-term to the project, should have been steeped in the insurance domain, building up critical knowledge within the team.

One thing both projects did right was to cleanly segregate the GENERIC time zone model from the CORE DOMAIN. A shipping-specific or insurance-specific model of time zones would have coupled the model to this generic supporting model, making the CORE harder to understand (because it would contain irrelevant detail about time zones). It would have made the time zone MODULE harder to maintain (because the maintainer would have to understand the CORE and its interrelationship with time zones).

We technical people tend to enjoy definable problems like time zone conversion, and we can easily justify spending our time on them. But a disciplined look at priorities usually points to the CORE DOMAIN.

Generic Doesn’t Mean Reusable

Note that while I have emphasized the generic quality of these subdomains, I have not mentioned the reusability of code. Off-the-shelf solutions may or may not make sense for a particular situation, but assuming that you are implementing the code yourself, in-house or outsourced, you should specifically not concern yourself with the reusability of that code. This would go against the basic motivation of distillation: that you should be applying as much of your effort to the CORE DOMAIN as possible and investing in supporting GENERIC SUBDOMAINS only as necessary.

Reuse does happen, but not always code reuse. Model reuse is often a better level of reuse, as when you use a published design or model. And if you have to create your own model, it may well be valuable in a later related project. But while the concept of such a model may be applicable to many situations, you do not have to develop the model in its full generality. You can model and implement only the part you need for your business.

Though you should seldom design for reusability, you must be strict about keeping within the generic concept. Introducing industry-specific model elements will have two costs. First, it will impede future development. Although you need only a small part of the subdomain model now, your needs will grow. By introducing anything to the design that is not part of the concept, you make it much more difficult to expand the system cleanly without completely rebuilding the older part and redesigning the other modules that use it.

Second, and more important, those industry-specific concepts belong either in the CORE DOMAIN or in their own, more specialized, subdomains, and those specialized models are even more valuable than the generic ones.

Project Risk Management

Agile processes typically call for managing risk by tackling the riskiest tasks early. XP specifically calls for getting an end-to-end system up and running immediately. This initial system often proves a technical architecture, and it is tempting to build a peripheral system that handles some supporting GENERIC SUBDOMAIN because these are usually easier to analyze. But be careful; this can defeat the purpose of risk management.

Projects face risk from both sides, with some projects having greater technical risks and others greater domain modeling risks. The end-to-end system mitigates risk only to the extent that it is an embryonic version of the challenging parts of the actual system. It is easy to underestimate the domain modeling risk. It can take the form of unforeseen complexity, inadequate access to business experts, or gaps in key skills of the developers.

Therefore, except when the team has proven skills and the domain is very familiar, the first-cut system should be based on some part of the CORE DOMAIN, however simple.

The same principle applies to any process that tries to push high-risk tasks forward: the CORE DOMAIN is high risk because it is often unexpectedly difficult and because without it, the project cannot succeed.

Most of the distillation patterns in this chapter show how to change the model and code to distill the CORE DOMAIN. However, the next two patterns, DOMAIN VISION STATEMENT and HIGHLIGHTED CORE, show how the use of supplemental documents can, with a very minor investment, improve communication and awareness of the CORE and focus development effort. . . .

DOMAIN VISION STATEMENT

At the beginning of a project, the model usually doesn’t even exist, yet the need to focus its development is already there. In later stages of development, there is a need for an explanation of the value of the system that does not require an in-depth study of the model. Also, the critical aspects of the domain model may span multiple BOUNDED CONTEXTS, but by definition these distinct models can’t be structured to show their common focus.

Many project teams write “vision statements” for management. The best of these documents lay out the specific value the application will bring to the organization. Some mention the creation of the domain model as a strategic asset. Usually the vision statement document is abandoned after the project gets funding, and it is never used in the actual development process or even read by the technical staff.

A DOMAIN VISION STATEMENT is modeled after such documents, but it focuses on the nature of the domain model and how it is valuable to the enterprise. It can be used directly by the management and technical staff during all phases of development to guide resource allocation, to guide modeling choices, and to educate team members. If the domain model serves many masters, this document can show how their interests are balanced.

Therefore:

Write a short description (about one page) of the CORE DOMAIN and the value it will bring, the “value proposition.” Ignore those aspects that do not distinguish this domain model from others. Show how the domain model serves and balances diverse interests. Keep it narrow. Write this statement early and revise it as you gain new insight.

A DOMAIN VISION STATEMENT can be used as a guidepost that keeps the development team headed in a common direction in the ongoing process of distilling the model and code itself. It can be shared with nontechnical team members, management, and even customers (except where it contains proprietary information, of course).

A DOMAIN VISION STATEMENT gives the team a shared direction. Some bridge between the high-level STATEMENT and the full detail of the code or model will usually be needed. . . .

HIGHLIGHTED CORE

A DOMAIN VISION STATEMENT identifies the CORE DOMAIN in broad terms, but it leaves the identification of the specific CORE model elements up to the vagaries of individual interpretation. Unless there is an exceptionally high level of communication on the team, the VISION STATEMENT alone will have little impact.

Even though team members may know broadly what constitutes the CORE DOMAIN, different people won’t pick out quite the same elements, and even the same person won’t be consistent from one day to the next. The mental labor of constantly filtering the model to identify the key parts absorbs concentration better spent on design thinking, and it requires comprehensive knowledge of the model. The CORE DOMAIN must be made easier to see.

Significant structural changes to the code are the ideal way of identifying the CORE DOMAIN, but they are not always practical in the short term. In fact, such major code changes are difficult to undertake without the very view the team is lacking.

Structural changes in the organization of the model, such as partitioning GENERIC SUBDOMAINS and a few others to come later in this chapter, can allow the MODULES to tell the story. But as the only means of communicating the CORE DOMAIN, this is too ambitious to shoot for straight away.

You will probably need a lighter solution to supplement these aggressive techniques. You may have constraints that prevent you from physically separating the CORE. Or you may be starting out with existing code that does not differentiate the CORE well, but you really need to see the CORE, and share that view, to effectively refactor toward better distillation. And even at an advanced stage, a few carefully selected diagrams or documents provide mental anchor points and entry points for the team.

These issues arise equally for projects that use elaborate UML models and those (such as XP projects) that keep few external documents and use the code as the primary repository of the model. An Extreme Programming team might be more minimalist, keeping these supplements more casual and more transient (for example, a hand-drawn diagram on the wall for all to see), but these techniques can fold nicely into the process.

Marking off a privileged part of a model, along with the implementation that embodies it, is a reflection on the model, not necessarily part of the model itself. Any technique that makes it easy for everyone to know the CORE DOMAIN will do. Two specific techniques can represent this class of solutions.

The Distillation Document

Often I create a separate document to describe and explain the CORE DOMAIN. It can be as simple as a list of the most essential conceptual objects. It can be a set of diagrams focused on those objects, showing their most critical relationships. It can walk through the fundamental interactions at an abstract level or by example. It can use UML class or sequence diagrams, nonstandard diagrams particular to the domain, carefully worded textual explanations, or combinations of these. A distillation document is not a complete design document. It is a minimalist entry point that delineates and explains the CORE and suggests reasons for closer scrutiny of particular pieces. The reader is given a broad view of how the pieces fit and guided to the appropriate part of the code for more details.

Therefore (as one form of HIGHLIGHTED CORE):

Write a very brief document (three to seven sparse pages) that describes the CORE DOMAIN and the primary interactions among CORE elements.

All the usual risks of separate documents apply.

  1. The document may not be maintained.

  2. The document may not be read.

  3. By multiplying the information sources, the document may defeat its own purpose of cutting through complexity.

The best way to limit these risks is to be absolutely minimalist. Staying away from mundane detail and focusing on the central abstractions and their interactions allows the document to age more slowly, because this level of the model is usually more stable.

Write the document to be understood by the nontechnical members of the team. Use it as a shared view that delineates what everyone needs to know, and a guide by which all team members may start their exploration of the model and code.

The Flagged CORE

On my first day on a project at a major insurance company, I was given a copy of the “domain model,” a two-hundred-page document, purchased at great expense from an industry consortium. I spent a few days wading through a jumble of class diagrams covering everything from the detailed composition of insurance policies to extremely abstract models of relationships between people. The quality of the factoring of these models ranged from high-school project to rather good (a few even described business rules, at least in the accompanying text). But where to start? Two hundred pages.

The project culture heavily favored abstract framework building, and my predecessors had focused on a very abstract model of the relationship of people with each other, with things, and with activities or agreements. It was actually a nice analysis of these relationships, and their experiments with the model had the quality of an academic research project. But it wasn’t getting us anywhere near an insurance application.

My first instinct was to start slashing, finding a small CORE DOMAIN to fall back on, then refactoring that and reintroducing other complexities as we went. But the management was alarmed by this attitude. The document was invested with great authority. Its production had involved experts from across the industry, and in any event they had paid the consortium far more than they were paying me, so they were unlikely to weigh my recommendations for radical change too heavily. But I knew we had to get a shared picture of our CORE DOMAIN and get everyone’s efforts focused on that.

Instead of refactoring, I went through the document and, with the help of a business analyst who knew a great deal about the insurance industry in general and the requirements of the application we were to build in particular, I identified the handful of sections that presented the essential, differentiating concepts we needed to work with. I provided a navigation of the model that clearly showed the CORE and its relationship to supporting features.

A new prototyping effort started from this perspective, and quickly yielded a simplified application that demonstrated some of the required functionality.

Two pounds of recyclable paper was turned into a business asset by a few page tabs and some yellow highlighter.

This technique is not specific to object diagrams on paper. A team that uses UML diagrams extensively could use a “stereotype” to identify core elements. A team that uses the code as the sole repository of the model might use comments, maybe structured as Javadoc, or might use some tool in its development environment. The particular technique doesn’t matter, as long as a developer can effortlessly see what is in and what is out of the CORE DOMAIN.

Therefore (as another form of HIGHLIGHTED CORE):

Flag each element of the CORE DOMAIN within the primary repository of the model, without particularly trying to elucidate its role. Make it effortless for a developer to know what is in or out of the CORE.

The CORE DOMAIN is now clearly visible to those working with the model, with a fairly small effort and low maintenance, at least to the extent that the model is factored fine enough to distinguish the contributions of parts.
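For a team that keeps the model in code, the flag can be as light as a marker on each class. The following is a minimal sketch in Python, not a prescribed mechanism: the decorator `core_domain`, the registry `CORE_ELEMENTS`, and the helper `is_core` are hypothetical names introduced only for illustration, and the classes merely borrow names from the cargo shipping example later in this chapter.

```python
# Hypothetical registry of flagged element names (an assumption, not a standard API).
CORE_ELEMENTS = set()


def core_domain(cls):
    """Flag a class as part of the CORE DOMAIN without changing its behavior."""
    cls._is_core_domain = True
    CORE_ELEMENTS.add(cls.__name__)
    return cls


def is_core(cls):
    """Report whether this class itself was flagged (subclasses do not inherit the flag)."""
    return vars(cls).get("_is_core_domain", False)


@core_domain
class Cargo:
    """The thing being delivered."""


@core_domain
class HandlingStep:
    """One operation performed on a Cargo."""


class Invoice:
    """Billing plays a supporting role, so it is deliberately left unflagged."""


print(sorted(CORE_ELEMENTS))
```

The flag says nothing about the role each element plays; it only makes membership in the CORE visible at the point where a developer reads the class.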

The Distillation Document as Process Tool

Theoretically on an XP project, any pair (two programmers working together) can change any code in the system. In practice, some changes have major implications, and call for more consultation and coordination. When working in the infrastructure layer, the impact of a change may be clear, but it may not be so obvious in the domain layer, as typically organized.

With the concept of the CORE DOMAIN, this impact can be made clear. Changes to the model of the CORE DOMAIN should have a big effect. Changes to widely used generic elements may require a lot of code updating, but they still shouldn’t create the conceptual shift that CORE changes do.

Use the distillation document as a guide. When developers realize that the distillation document itself requires change to stay in sync with their code or model change, then consultation is called for. Either they are fundamentally changing the CORE DOMAIN elements or relationships, or they are changing the boundaries of the CORE, including or excluding something different. Dissemination of the model change to the whole team is necessary by whatever communication channels the team uses, including distribution of a new version of the distillation document.

If the distillation document outlines the essentials of the CORE DOMAIN, then it serves as a practical indicator of the significance of a model change. When a model or code change affects the distillation document, it requires consultation with other team members. When the change is made, it requires immediate notification of all team members, and the dissemination of a new version of the document. Changes outside the CORE or to details not included in the distillation document can be integrated without consultation or notification and will be encountered by other members in the course of their work. Then the developers have the full autonomy that XP suggests.
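The rule in the preceding paragraph can be stated as a small check. This is only a sketch under stated assumptions: that the team keeps a list of the element names the distillation document describes, and that a change can be summarized as the set of element names it touches. The function `needs_consultation` and the sample names are hypothetical.

```python
def needs_consultation(changed_elements, distillation_elements):
    """Return the changed elements that the distillation document describes."""
    return sorted(set(changed_elements) & set(distillation_elements))


# Hypothetical contents of a distillation document, reduced to element names.
distillation_document = {"Cargo", "HandlingStep", "CustomerAgreement"}

touched = needs_consultation({"Invoice", "Cargo"}, distillation_document)
if touched:
    print("Consult the team before changing:", ", ".join(touched))
```

An empty result corresponds to the case in which the developers keep their full autonomy; a nonempty one means the document, and therefore the team, must be brought along.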

Although the VISION STATEMENT and HIGHLIGHTED CORE inform and guide, they do not actually modify the model or the code itself. Partitioning GENERIC SUBDOMAINS physically removes some distracting elements. The next patterns look at ways to structurally change the model and the design itself to make the CORE DOMAIN more visible and manageable. . . .

COHESIVE MECHANISMS

Encapsulating mechanisms is a standard principle of object-oriented design. Hiding complex algorithms in methods with intention-revealing names separates the “what” from the “how.” This technique makes a design simpler to understand and use. Yet it runs into natural limits.

Computations sometimes reach a level of complexity that begins to bloat the design. The conceptual “what” is swamped by the mechanistic “how.” A large number of methods that provide algorithms for resolving the problem obscure the methods that express the problem.

This proliferation of procedures is a symptom of a problem in the model. Refactoring toward deeper insight can yield a model and design whose elements are better suited to solving the problem. The first solution to seek is a model that makes the computation mechanism simple. But now and then the insight emerges that some part of the mechanism is itself conceptually coherent. This conceptual computation will probably not include all of the messy computations you need. We are not talking about some kind of catch-all “calculator.” But extracting the coherent part should make the remaining mechanism easier to understand.

Therefore:

Partition a conceptually COHESIVE MECHANISM into a separate lightweight framework. Particularly watch for formalisms or well-documented categories of algorithms. Expose the capabilities of the framework with an INTENTION-REVEALING INTERFACE. Now the other elements of the domain can focus on expressing the problem (“what”), delegating the intricacies of the solution (“how”) to the framework.

These separated mechanisms are then placed in their supporting roles, leaving a smaller, more expressive CORE DOMAIN that uses the mechanism through the interface in a more declarative style.

Recognizing a standard algorithm or formalism moves some of the complexity of the design into a studied set of concepts. With such a guide, we can implement a solution with confidence and little trial and error. We can count on other developers knowing about it or at least being able to look it up. This is similar to the benefits of a published GENERIC SUBDOMAIN model, but a documented algorithm or formal computation may be found more often because this level of computer science has been studied more. Still, more often than not you will have to create something new. Make it narrowly focused on the computation and avoid mixing in the expressive domain model. There is a separation of responsibilities: The model of the CORE DOMAIN or a GENERIC SUBDOMAIN formulates a fact, rule, or problem. A COHESIVE MECHANISM resolves the rule or completes the computation as specified by the model.

Example: A Mechanism in an Organization Chart

I went through this process on a project that needed a fairly elaborate model of an organization chart. This model represented the fact that one person worked for another, and in which branches of the organization, and it provided an interface by which relevant questions might be asked and answered. Because most of these questions were along the lines of “Who, in this chain of command, has authority to approve this?” or “Who, in this department, is capable of handling an issue like this?” the team realized that most of the complexity involved traversing specific branches of the organizational tree, searching for specific people or relationships. This is exactly the kind of problem solved by the well-developed formalism of a graph, a set of nodes connected by arcs (called edges) and the rules and algorithms needed to traverse the graph.

A subcontractor implemented a graph traversal framework as a COHESIVE MECHANISM. This framework used standard graph terminology and algorithms familiar to most computer scientists and abundantly documented in textbooks. By no means did he implement a fully general graph. It was a subset of that conceptual framework that covered the features needed for our organization model. And with an INTENTION-REVEALING INTERFACE, the means by which the answers are obtained are not a primary concern.

Now the organization model could simply state, using standard graph terminology, that each person is a node, and that each relationship between people is an edge (arc) connecting those nodes. After that, presumably, mechanisms within the graph framework could find the relationship between any two people.

If this mechanism had been incorporated into the domain model, it would have cost us in two ways. The model would have been coupled to a particular method of solving the problem, limiting future options. More important, the model of an organization would have been greatly complicated and muddied. Keeping mechanism and model separate allowed a declarative style of describing organizations that was much clearer. And the intricate code for graph manipulation was isolated in a purely mechanistic framework, based on proven algorithms, that could be maintained and unit-tested in isolation.
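The separation described here can be sketched in a few lines. This is not the framework from the project, which is not shown in the text; it is a minimal illustration in Python that assumes a directed graph and a breadth-first search. The names `Graph`, `Organization`, `reports_to`, and `chain_of_command` are hypothetical.

```python
from collections import deque


class Graph:
    """Mechanism: a deliberately small directed graph with breadth-first search."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, source, target):
        self._edges.setdefault(source, []).append(target)
        self._edges.setdefault(target, [])

    def path(self, start, goal):
        """Return the shortest path from start to goal as a list, or None."""
        if start not in self._edges:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                route = []
                while node is not None:
                    route.append(node)
                    node = previous[node]
                return route[::-1]
            for neighbor in self._edges[node]:
                if neighbor not in previous:
                    previous[neighbor] = node
                    queue.append(neighbor)
        return None


class Organization:
    """Model: declares who reports to whom and leaves traversal to the Graph."""

    def __init__(self):
        self._reporting = Graph()

    def reports_to(self, person, manager):
        self._reporting.add_edge(person, manager)

    def chain_of_command(self, person, executive):
        return self._reporting.path(person, executive)


org = Organization()
org.reports_to("Ana", "Bo")
org.reports_to("Bo", "Cy")
print(org.chain_of_command("Ana", "Cy"))  # ['Ana', 'Bo', 'Cy']
```

The organization model states facts in its own terms; the search algorithm could be replaced without touching `Organization`, and `Graph` can be unit-tested with no knowledge of people or departments.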

Another example of a COHESIVE MECHANISM would be a framework for constructing SPECIFICATION objects and supporting the basic comparison and combination operations expected of them. By employing such a framework, the CORE DOMAIN and GENERIC SUBDOMAINS can declare their SPECIFICATIONS in the clear, easily understood language described in that pattern (see Chapter 10). The intricate operations involved in carrying out the comparisons and combinations can be left to the framework.
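A framework of this kind might look something like the following sketch. It assumes that combination is offered through Python's `&`, `|`, and `~` operators, which is a choice made for this illustration and not something the pattern requires. `Container`, `Armored`, and `Ventilated` are hypothetical domain elements.

```python
from dataclasses import dataclass


class Specification:
    """Mechanism: supplies the combination operations for every SPECIFICATION."""

    def is_satisfied_by(self, candidate):
        raise NotImplementedError

    def __and__(self, other):
        return _Combined(self, other, all_required=True)

    def __or__(self, other):
        return _Combined(self, other, all_required=False)

    def __invert__(self):
        return _Negated(self)


class _Combined(Specification):
    def __init__(self, left, right, all_required):
        self._parts = (left, right)
        self._all_required = all_required

    def is_satisfied_by(self, candidate):
        results = (part.is_satisfied_by(candidate) for part in self._parts)
        return all(results) if self._all_required else any(results)


class _Negated(Specification):
    def __init__(self, inner):
        self._inner = inner

    def is_satisfied_by(self, candidate):
        return not self._inner.is_satisfied_by(candidate)


# --- Domain side: only the rules are stated here (hypothetical examples). ---
@dataclass(frozen=True)
class Container:
    armored: bool
    ventilated: bool


class Armored(Specification):
    def is_satisfied_by(self, candidate):
        return candidate.armored


class Ventilated(Specification):
    def is_satisfied_by(self, candidate):
        return candidate.ventilated


# A rule declared by combination; how it is evaluated is the framework's concern.
sealed_vault = Armored() & ~Ventilated()
print(sealed_vault.is_satisfied_by(Container(armored=True, ventilated=False)))
```

The domain classes contain one-line predicates, and the composite rule reads as a statement about containers; the evaluation logic lives entirely in the base classes.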

GENERIC SUBDOMAIN Versus COHESIVE MECHANISM

Both GENERIC SUBDOMAINS and COHESIVE MECHANISMS are motivated by the same desire to unburden the CORE DOMAIN. The difference is the nature of the responsibility taken on. A GENERIC SUBDOMAIN is based on an expressive model that represents some aspect of how the team views the domain. In this it is no different than the CORE DOMAIN, just less central, less important, less specialized. A COHESIVE MECHANISM does not represent the domain; it solves some sticky computational problem posed by the expressive models.

A model proposes; a COHESIVE MECHANISM disposes.

In practice, unless you recognize a formalized, published computation, this distinction is usually not pure, at least not at first. In successive refactoring it could either be distilled into a purer mechanism or be transformed into a GENERIC SUBDOMAIN with some previously unrecognized model concepts that would make the mechanism simple.

When a MECHANISM Is Part of the CORE DOMAIN

You almost always want to remove MECHANISMS from the CORE DOMAIN. The one exception is when a MECHANISM is itself proprietary and a key part of the value of the software. This is sometimes the case with highly specialized algorithms. For example, if one of the distinguishing features of a shipping logistics application were a particularly effective algorithm for working out schedules, that MECHANISM could be considered part of the conceptual CORE. I once worked on a project at an investment bank in which highly proprietary algorithms for rating risk were definitely in the CORE DOMAIN. (In fact, they were held so closely that even most of the CORE developers were not allowed to see them.) Of course, these algorithms are probably a particular implementation of a set of rules that really predict risk. Deeper analysis might lead to a deeper model that would allow those rules to be explicit, with an encapsulated solving mechanism.

But that would be another incremental improvement in the design, for another day. The decision as to whether to go that next step would be based on a cost-benefit analysis: How difficult would it be to work out that new design? How difficult is the current design to understand and modify? How much easier would it be with a more advanced design, for the type of people who would be expected to do the work? And of course, does anyone have any idea what form the new model might take?

Example: Full Circle: Organization Chart Reabsorbs Its MECHANISM

Actually, a year after we completed the organization model in the previous example, other developers redesigned it to eliminate the separation of the graph framework. They felt the increased object count and the complication of separating the MECHANISM into a separate package were not warranted. Instead, they added node behavior to the parent class of the organizational ENTITIES. Still, they retained the declarative public interface of the organization model. They even kept the MECHANISM encapsulated, within the organizational ENTITIES.

These full circles are common, but they do not return to their starting point. The end result is usually a deeper model that more clearly differentiates facts, goals, and MECHANISMS. Pragmatic refactoring retains the important virtues of the intermediate stages while shedding the unneeded complications.

Distilling to a Declarative Style

Declarative design and “declarative style” are topics of Chapter 10, but that design style deserves special mention in this chapter on strategic distillation. The value of distillation is being able to see what you are doing: cutting to the essence without being distracted by irrelevant detail. Important parts of the CORE DOMAIN may be able to follow a declarative style, when the supporting design provides an economical language for expressing the concepts and rules of the CORE while encapsulating the means of computing or enforcing them.

COHESIVE MECHANISMS are by far most useful when they provide access through an INTENTION-REVEALING INTERFACE, with conceptually coherent ASSERTIONS and SIDE-EFFECT-FREE FUNCTIONS. MECHANISMS and supple designs allow the CORE DOMAIN to make meaningful statements rather than calling obscure functions. But an exceptional payoff comes when part of the CORE DOMAIN itself breaks through to a deep model and starts to function as a language that can express the most important application scenarios flexibly and concisely.

A deep model often comes with a corresponding supple design. When a supple design reaches maturity, it provides an easily understood set of elements that can be combined unambiguously to accomplish complex tasks or express complex information, just as words are combined into sentences. At that point, client code takes on a declarative style and can be much more distilled.

Factoring out GENERIC SUBDOMAINS reduces clutter, and COHESIVE MECHANISMS serve to encapsulate complex operations. This leaves behind a more focused model, with fewer distractions that add no particular value to the way users conduct their activities. But you are unlikely ever to find good homes for everything in the domain model that is not CORE. The SEGREGATED CORE takes a direct approach to structurally marking off the CORE DOMAIN. . . .

SEGREGATED CORE

Elements in the model may partially serve the CORE DOMAIN and partially play supporting roles. CORE elements may be tightly coupled to generic ones. The conceptual cohesion of the CORE may not be strong or visible. All this clutter and entanglement chokes the CORE. Designers can’t clearly see the most important relationships, leading to a weak design.

By factoring out GENERIC SUBDOMAINS, you clear away some of the obscuring detail from the domain, making the CORE more visible. But it is hard work identifying and clarifying all these subdomains, and some of them don’t seem worth the trouble. Meanwhile, the all-important CORE DOMAIN is left entangled with the residue.

Therefore:

Refactor the model to separate the CORE concepts from supporting players (including ill-defined ones) and strengthen the cohesion of the CORE while reducing its coupling to other code. Factor all generic or supporting elements into other objects and place them into other packages, even if this means refactoring the model in ways that separate highly coupled elements.

This is basically taking the same principles we applied to GENERIC SUBDOMAINS but from the other direction. The cohesive subdomains that are central to our application can be identified and partitioned into coherent packages of their own. What is done with the undifferentiated mass left behind is important, but not as important. It can be left more or less where it was, or placed into packages based on prominent classes. Eventually, more and more of the residue can be factored into GENERIC SUBDOMAINS, but in the short term any easy solution will do, just so the focus on the SEGREGATED CORE is retained.

The steps needed to refactor to SEGREGATED CORE are typically something like these:

  1. Identify a CORE subdomain (possibly drawing from the distillation document).

  2. Move related classes to a new MODULE, named for the concept that relates them.

  3. Refactor code to sever data and functionality that are not directly expressions of the concept. Put the removed aspects into (possibly new) classes in other packages. Try to place them with conceptually related tasks, but don’t waste too much time being perfect. Keep focused on scrubbing the CORE subdomain and making the references from it to other packages explicit and self-explanatory.

  4. Refactor the newly SEGREGATED CORE MODULE to make its relationships and interactions simpler and more communicative, and to minimize and clarify its relationships with other MODULES. (This becomes an ongoing refactoring objective.)

  5. Repeat with another CORE subdomain until the SEGREGATED CORE is complete.

The Costs of Creating a SEGREGATED CORE

Segregating the CORE will sometimes make relationships with tightly coupled non-CORE classes more obscure or even more complicated, but that cost is outweighed by the benefit of clarifying the CORE DOMAIN and making it much easier to work on.

The SEGREGATED CORE will let you enhance the cohesion of that CORE DOMAIN. There are many meaningful ways of breaking down a model, and sometimes in the creation of a SEGREGATED CORE a nicely cohesive MODULE may be broken, sacrificing that cohesion for the sake of bringing out the cohesiveness of the CORE DOMAIN. This is a net gain, because the greatest value-added of enterprise software comes from the enterprise-specific aspects of the model.

The other cost, of course, is that segregating the CORE is a lot of work. It must be acknowledged that a decision to go to a SEGREGATED CORE will potentially absorb developers in changes all over the system.

The time to chop out a SEGREGATED CORE is when you have a large BOUNDED CONTEXT that is critical to the system, but where the essential part of the model is being obscured by a great deal of supporting capability.

Evolving Team Decision

As with many strategic design decisions, an entire team must move to a SEGREGATED CORE together. This step requires a team decision process and a team disciplined and coordinated enough to carry out the decision. The challenge is to constrain everyone to use the same definition of the CORE while not freezing that decision. Because the CORE DOMAIN evolves just like every other aspect of a design, experience working with a SEGREGATED CORE will lead to new insights into what is essential and what is a supporting element. Those insights should feed back into a refined definition of the CORE DOMAIN and of the SEGREGATED CORE MODULES.

This means that new insights must be shared with the team on an ongoing basis, but an individual (or programming pair) cannot act on those insights unilaterally. Whatever the process is for joint decisions, whether consensus or team leader directive, it must be agile enough to make repeated course corrections. Communication must be effective enough to keep everyone together in one view of the CORE.

Example: Segregating the CORE of a Cargo Shipping Model

We start with the model shown in Figure 15.2 as the basis of software for cargo shipping coordination.

Figure 15.2

Note that this is highly simplified compared to what would likely be needed for a real application. A realistic model would be too cumbersome for an example. Therefore, although this example may not be complicated enough to drive us to a SEGREGATED CORE, take a leap of imagination to treat this model as being too complex to interpret easily and deal with as a whole.

Now, what is the essence of the shipping model? Usually a good place to start looking is the “bottom line.” This might lead us to focus on pricing and invoices. But we really need to look at the DOMAIN VISION STATEMENT. Here is an excerpt from this one.

. . . Increase visibility of operations and provide tools to fulfill customer requirements faster and more reliably…

This application is not being designed for the sales department. It is going to be used by the front-line operators of the company. So let’s relegate all money-related issues to (admittedly important) supporting roles. Someone has already placed some of these items into a separate package (Billing). We can keep that, and further recognize that it plays a supporting role.

The focus needs to be on the cargo handling: delivery of the cargo according to customer requirements. Extracting the classes most directly involved in these activities produces a SEGREGATED CORE in a new package called Delivery, as shown in Figure 15.3.

Figure 15.3. Reliable delivery in adherence with customer requirements is the core goal of this project.

For the most part, classes have just moved into the new package, but there have been a few changes to the model itself.

First, the Customer Agreement now constrains the Handling Step. This is typical of the insights that tend to arise as the team segregates the CORE. As attention is focused on effective, correct delivery, it becomes clear that the delivery constraints in the Customer Agreement are fundamental and should be explicit in the model.

The other change is more pragmatic. In the refactored model, the Customer Agreement is attached directly to the Cargo, rather than requiring a navigation through the Customer. (It will have to be attached when the Cargo is booked, just as the Customer is.) At actual delivery time, the Customer is not as relevant to operations as the agreement itself. In the other model, the correct Customer had to be found, according to the role it played in the shipment, and then queried for its Customer Agreement. This interaction would clog up every story you set out to tell about the model. The new association makes the most important scenarios as simple and direct as possible. Now it becomes easy to pull the Customer out of the CORE altogether.
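The effect of the new association can be illustrated with a short sketch. The figures are not reproduced here, so the attributes below are assumptions made for the example: `max_transit_days` stands in for whatever delivery constraints a real Customer Agreement would carry, and `book_cargo` and `can_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingStep:
    description: str
    transit_days: int


@dataclass(frozen=True)
class CustomerAgreement:
    max_transit_days: int  # hypothetical delivery constraint

    def permits(self, step):
        """The agreement constrains the Handling Step."""
        return step.transit_days <= self.max_transit_days


@dataclass(frozen=True)
class Customer:
    name: str
    agreement: CustomerAgreement


@dataclass
class Cargo:
    customers_by_role: dict       # role name -> Customer, kept for supporting features
    agreement: CustomerAgreement  # attached directly when the Cargo is booked

    def can_schedule(self, step):
        # Before the refactoring this would have read something like:
        #   self.customers_by_role["shipper"].agreement.permits(step)
        return self.agreement.permits(step)


def book_cargo(shipper):
    """Attach both the Customer and its agreement at booking time."""
    return Cargo(customers_by_role={"shipper": shipper}, agreement=shipper.agreement)


cargo = book_cargo(Customer("Acme", CustomerAgreement(max_transit_days=10)))
print(cargo.can_schedule(HandlingStep("rail to port", transit_days=3)))
```

With the agreement held by the Cargo, the delivery scenario no longer mentions Customer at all, which is what makes it possible to move Customer out of the CORE package.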

And what about pulling Customer out, anyway? The focus is on fulfilling the Customer’s requirements, so at first Customer seems to belong in the CORE. Yet the interactions during delivery do not usually need to involve the Customer class now that the Customer Agreement is available directly. And the basic model of a Customer is pretty generic.

A strong argument could be made for Leg to remain in the CORE. I tend to be minimalist in the CORE, and the Leg has tighter cohesion with Transport Schedule, Routing Service, and Location, none of which needed to be in the CORE. But if a lot of the stories I wanted to tell about this model involved Legs, I’d move it into the Delivery package and suffer the awkwardness of its separation from those other classes.

In this example, all the class definitions are the same as before, but often distillation requires refactoring the classes themselves to separate the generic and domain-specific responsibilities, which can then be segregated.

Now that we have a SEGREGATED CORE, the refactoring is complete. But the Shipping package we are left with is just “everything left over after we pulled out the CORE.” We can follow up with other refactorings to get more communicative packaging, as shown in Figure 15.4.

Figure 15.4. Meaningful MODULES for non-CORE subdomains follow after the SEGREGATED CORE is complete.

It might take several refactorings to get to this point; it doesn’t have to be done all at once. Here, we’ve ended up with one SEGREGATED CORE package, one GENERIC SUBDOMAIN, and two domain-specific packages in supporting roles. Deeper insight might eventually produce a GENERIC SUBDOMAIN for Customer, or it might end up more specialized for shipping.

Recognizing useful, meaningful MODULES is a modeling activity (as discussed in Chapter 5). Developers and domain experts collaborate in strategic distillation as part of the knowledge crunching process.

ABSTRACT CORE

Even the CORE DOMAIN model usually has so much detail that communicating the big picture can be difficult.

We usually deal with a large model by breaking it into narrower subdomains that are small enough to be grasped and placing them in separate MODULES. This reductive style of packaging often works to make a complicated model manageable. But sometimes creating separate MODULES can obscure or even complicate the interactions between the subdomains.

When there is a lot of interaction between subdomains in separate MODULES, either many references will have to be created between MODULES, which defeats much of the value of the partitioning, or the interaction will have to be made indirect, which makes the model obscure.

Consider slicing horizontally rather than vertically. Polymorphism gives us the power to ignore a lot of the detailed variation among instances of an abstract type. If most of the interactions across MODULES can be expressed at the level of polymorphic interfaces, it may make sense to refactor these types into a special CORE MODULE.

We are not looking for a technical trick here. This is a valuable technique only when the polymorphic interfaces correspond to fundamental concepts in the domain. In that case, separating these abstractions decouples the MODULES while distilling a smaller and more cohesive CORE DOMAIN.

Therefore:

Identify the most fundamental concepts in the model and factor them into distinct classes, abstract classes, or interfaces. Design this abstract model so that it expresses most of the interaction between significant components. Place this abstract overall model in its own MODULE, while the specialized, detailed implementation classes are left in their own MODULES defined by subdomain.

Most of the specialized classes will now reference the ABSTRACT CORE MODULE but not the other specialized MODULES. The ABSTRACT CORE gives a succinct view of the main concepts and their interactions.
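The dependency structure can be sketched as follows. The example is shown as a single file for brevity, with comments marking what would be separate MODULES; the concepts `Carrier` and `Itinerary` and the fixed transit times are hypothetical, chosen only to show specialized classes depending on the abstractions and not on each other.

```python
from abc import ABC, abstractmethod


# --- abstract_core MODULE (hypothetical): fundamental concepts and their interaction ---
class Carrier(ABC):
    @abstractmethod
    def transit_days(self, origin, destination):
        """Days needed to move cargo between two locations."""


class Itinerary:
    """Expresses how carriers combine, using only the abstract type."""

    def __init__(self):
        self._legs = []

    def add_leg(self, carrier, origin, destination):
        self._legs.append((carrier, origin, destination))

    def total_days(self):
        return sum(c.transit_days(o, d) for c, o, d in self._legs)


# --- rail MODULE: references abstract_core only ---
class RailCarrier(Carrier):
    def transit_days(self, origin, destination):
        return 2  # placeholder figure for the sketch


# --- sea MODULE: references abstract_core only ---
class OceanCarrier(Carrier):
    def transit_days(self, origin, destination):
        return 14  # placeholder figure for the sketch


itinerary = Itinerary()
itinerary.add_leg(RailCarrier(), "Chicago", "Long Beach")
itinerary.add_leg(OceanCarrier(), "Long Beach", "Shanghai")
print(itinerary.total_days())  # 16
```

Neither specialized class knows the other exists, yet the interaction between them is fully expressed in the abstract MODULE. This pays off only if `Carrier` and `Itinerary` are fundamental domain concepts and not merely convenient seams.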

The process of factoring out the ABSTRACT CORE is not mechanical. For example, if all the classes that were frequently referenced across MODULES were automatically moved into a separate MODULE, the likely result would be a meaningless mess. Modeling an ABSTRACT CORE requires a deep understanding of the key concepts and the roles they play in the major interactions of the system. In other words, it is an example of refactoring to deeper insight. And it usually requires considerable redesign.

The ABSTRACT CORE should end up looking a lot like the distillation document (if both were used on the same project, and the distillation document had evolved with the application as insight deepened). Of course, the ABSTRACT CORE will be written in code, and therefore more rigorous and more complete.

DEEP MODELS DISTILL

Distillation does not operate only on the gross level of separating parts of the domain away from the CORE. It also means refining those subdomains, especially the CORE DOMAIN, through continuously refactoring toward deeper insight, driving toward a deep model and supple design. The goal is a design that makes the model obvious, a model that expresses the domain simply. A deep model distills the most essential aspects of a domain into simple elements that can be combined to solve the important problems of the application.

Although a breakthrough to a deep model provides value anywhere it happens, it is in the CORE DOMAIN that it can change the trajectory of an entire project.

CHOOSING REFACTORING TARGETS

When you encounter a large system that is poorly factored, where do you start? In the XP community, the answer tends to be one of these:

  1. Just start anywhere, because it all has to be refactored.

  2. Start wherever it is hurting. I’ll refactor what I need to in order to get my specific task done.

I don’t hold with either of these. The first is impractical except in a few projects staffed entirely with top programmers. The second tends to pick around the edges, treating symptoms and ignoring root causes, shying away from the worst tangles. Eventually the code becomes harder and harder to refactor.

So, if you can’t do it all, and you can’t be pain-driven, what do you do?

  1. In a pain-driven refactoring, you look to see if the root involves the CORE DOMAIN or the relationship of the CORE to a supporting element. If it does, you bite the bullet and fix that first.

  2. When you have the luxury of refactoring freely, you focus first on better factoring of the CORE DOMAIN, on improving the segregation of the CORE, and on purifying supporting subdomains to be GENERIC.

This is how to get the most bang for your refactoring buck.