In their book Data Strategy, Sid Adelman, Larissa Moss, and Majid Abai make the following statement:
Working without a data strategy is analogous to a company allowing each department and each person within each department to develop its own financial chart of accounts. This empowerment allows each person in the organization to choose his own numbering scheme. [...] Even to those of us who don't wear green eyed shades, the resulting chaos is obvious and easy to predict – (pg. 3, para 2).
What astounds me is how a risk that seems so blatantly obvious is completely missed by the majority of enterprises. In a recent study, Forrester Research found that 74% of more than 400 companies surveyed view data strategy as critical or very important, but only 17% of them had a mature data strategy in place – (Topic Overview: Information Architecture by Gene Leganza, January 21, 2010, Forrester Research).
When you consider that most enterprises are outsourcing a substantial part of their core business systems, it is frightening that they do not have a strategy in place. The result is that each vendor defines its own view of the data, and the enterprise loses control of what happens with its application infrastructure.
Scott Busse, in an article entitled Describing a Data Strategy to a Business Leader, called this uncontrolled, evolutionary data strategy a "waxy build-up" that leads to "higher costs, rigid processes, and a lack of insight into enterprise data" – (para 4).
In this article we will briefly look at what Data Strategy is, and then focus on how data architectural integrity can be maintained in the Enterprise Architecture process.
The Scope of this Article
While the data architecture discipline owns the responsibility for the overall data strategy, I want to narrow the scope of data architecture for the purposes of this discussion. A formal, mature data strategy supports the enterprise by providing at least the following:
- Enterprise Data Model – The EDM is a model that defines all enterprise-level entities at a logical level, their relationships to each other, the life-cycle of each entity, what systems and services act upon it, and where the entity is used in the enterprise. It also defines the attributes of each entity and forms the basis for a common language for the enterprise.
- Canonical Model – The Canonical Model is a physical data model that is used by all applications and services to interact with each other throughout the enterprise. We'll talk more about the canonical model later.
- Metadata Management – Metadata is data that describes the entities and attributes in the Enterprise Data Model. So, for example, an account number may consist of 10 digits, of which the first 9 are a sequence number and the last digit is a check-digit. All of that information is metadata about the account number. This is a simple example, and in a large enterprise data model, there is a lot of metadata associated with the model.
- Data Quality – The GIGO principle (Garbage-In, Garbage-Out) is absolutely true of data, and when strategic business decisions are made based on invalid data, the results can be catastrophic. Ensuring that data integrity is maintained is a key part of any data strategy, and it has to be enforced in the data architecture as much as it is in the processes that govern the data.
- Data Governance – Managing and maintaining data from an operational point of view, ensuring that it is backed up, determining how long it is maintained in various stores, ensuring that the data stores perform well, and developing a high-availability and disaster-recovery policy all fall under the banner of data governance.
- Security and Compliance – Security and Compliance need to be closely aligned with the Data Governance component of a data strategy. Compliance is responsible for ensuring that obligations such as SOX and PCI are met. Security is responsible for ensuring data availability to the authorized channels while securing it from unauthorized ones.
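To make the account-number example from the Metadata Management item a little more tangible, here is a minimal Python sketch of metadata being used to validate a value. The `AttributeMetadata` class and its field names are hypothetical, and the check-digit algorithm is an assumption: nothing above says which algorithm the account number uses, so I have plugged in the well-known Luhn algorithm purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetadata:
    """Hypothetical metadata record describing one attribute of an entity."""
    entity: str
    attribute: str
    length: int           # total number of digits in the value
    sequence_digits: int  # leading digits that form the sequence number


# Metadata for the account number: 10 digits, 9 sequence digits + 1 check digit.
ACCOUNT_NUMBER = AttributeMetadata(
    entity="Account", attribute="account_number", length=10, sequence_digits=9
)


def luhn_check_digit(sequence: str) -> int:
    """Compute a Luhn check digit (an assumed algorithm, chosen for illustration)."""
    total = 0
    # Walk the sequence right to left, doubling every other digit,
    # starting with the rightmost one.
    for position, char in enumerate(reversed(sequence)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid(value: str, meta: AttributeMetadata) -> bool:
    """Check a value against the length and check-digit rules in its metadata."""
    if not (value.isascii() and value.isdigit() and len(value) == meta.length):
        return False
    sequence = value[:meta.sequence_digits]
    check = value[meta.sequence_digits:]
    return int(check) == luhn_check_digit(sequence)
```

Under these assumptions, `is_valid("1234567897", ACCOUNT_NUMBER)` passes while `is_valid("1234567890", ACCOUNT_NUMBER)` fails. The point is that the rule lives in the metadata rather than being buried in each application.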
These six areas are far more than I plan to cover in this article. I want to talk about how you go about deriving the Enterprise Data Model components and the Canonical Model from the business process models that were created for the Exchange Integration Project that I blogged about a week or two ago.
The Business Process
The process at left deals with what happens when a sales person initiates a call with a prospect. The sales person initiates the call.
If the call does not succeed (i.e. the call fails, or the prospect is not available to take the call), the sales person documents the fact that the call was attempted in the system.
If the call goes through (i.e. the prospect is available for a conversation), a discussion takes place and the sales person documents the conversation. Once the call is complete, the sales person documents the details of the call and determines the next steps to be taken, including any follow-up calls.
There is no question about the fact that I am way over-simplifying the business process in this discussion, but in order to keep this post as concise as possible, I need to do that. You can assume that some of the items that are noted as activities really need to be spelled out in more detail, and the business processes that are documented in the System swim-lane need a lot more information, but please cut me that slack… I need to keep this explanation short.
The Business Entities
One of the first things we can do with this model is define some of the business entities in this enterprise. In this process we have two entities that are really actors in the process:
- Prospect – The person who may be interested in goods and services from the enterprise; and
- Sales Person – The person responsible for pitching the sales of the goods and services to the prospect.
There are also several pieces of information that we care about that are flowing between these entities:
- Call – The fact that a call took place is important. We need to know the date and time of the call and how the call progressed;
- Discussion – The discussion that took place between the sales person and the prospect probably contains some very important information that would be useful to the enterprise if it were recorded; and
- Next Steps – The action that is to be taken as a result of the call could range from a follow-up call to the generation of a quote, or even an order.
So let's take these entities and model them to get an idea of what they look like.
All of the entities that I referenced above are now part of this business entity model. We obviously have the prospect and sales person. We also have a call and a call next step. You'll note that the discussion got absorbed into the call as an element.
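For readers who think more easily in code than in diagrams, the business entity model can be approximated with a few Python dataclasses. This is only a rough sketch: the attribute names (`name`, `started_at`, `succeeded`, and so on) are my own assumptions for illustration, not attributes taken from the model itself.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Prospect:
    """The person who may be interested in goods and services."""
    name: str


@dataclass
class SalesPerson:
    """The person responsible for pitching to the prospect."""
    name: str


@dataclass
class CallNextStep:
    """An action resulting from a call (follow-up call, quote, order, ...)."""
    description: str


@dataclass
class Call:
    """A call between a sales person and a prospect."""
    prospect: Prospect
    sales_person: SalesPerson
    started_at: datetime              # date and time of the call
    succeeded: bool                   # False when only an attempt was made
    discussion: Optional[str] = None  # the discussion, absorbed into the call
    next_steps: List[CallNextStep] = field(default_factory=list)
```

Note that there is no separate `Discussion` class: it has become an optional element of `Call`, which stays empty when the call was only attempted.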
Now clearly, there is a lot more work that can be done to dig into each of these entities and refine them. Through an iterative cycle with the business, it is not hard to understand that a sales person is generally responsible for a prospect.
Moreover, a sales person is technically either an employee or contractor. That means that their information is available from other sources and in this context, they are simply performing a role in the business.
By the same token, a prospect may very well be an existing customer, in which case you want to make sure that the information that you have for this prospect is stored for the customer. When the customer is a large organization, the prospect may, in fact, be a new department in the organization to which you have not sold before.
If a lead management system is present, the prospect may be an existing lead. So what is the difference between a prospect and a lead? Well, a lead is a tip that you get about someone who may be interested in the product. A prospect is a lead for which the prospect-to-customer conversion process has already begun.
A call is simply a way of interacting with a prospect. Other ways are by means of meetings, e-mails, letters, and other communication devices. Thus, the call is really an interaction.
Lastly, the call next step may be related to an appointment in the sales person's calendar, an order, or a quote.
So as you dig into each of these entities in more detail, it becomes clear that the model is significantly more complex and detailed than the business entity model we started defining above.
The point that is worth bearing in mind, though, is that these business entities were fairly easy to derive from the business process model that we discussed above. Although this process becomes more complex as the business processes are more complex, the same basic rules apply:
- Each swim-lane is a major business entity;
- Each line that crosses a swim-lane is a significant item of business data that needs to be tracked; and
- By iteratively digging in to each of the business entities, it is fairly easy to identify numerous other affected entities.
Once the business entities have been defined, it is important to understand the impact that each business process has on a business entity. To do this, I like to use a tool known as an Entity-Event Matrix. It is effectively a table with the list of entities across the top and the list of processes down the side. Each cell contains the actions that may be performed by that process on the entity. In the following table I have restricted my actions to create, read, update or delete, but in reality this becomes a set of actions that specify the details of the changes made.
|Process||Prospect||Sales Person||Call||Call Next Step|
|Receive Sales Call|
|Initiate Sales Call||Create, Read||Read||Create|
|Discuss Prospective Sale|
|Record Discussion Information||Read, Update||Update|
|Document Call Attempt||Update||Create|
|Process Call Attempt||Update||Update|
|Await Call Termination|
|Document Next Actions||Update||Create, Update|
|Process Next Actions||Update||Update|
This matrix evolves as the model that it is associated with grows. Once the matrix has been completed for all business processes and aggregated together, any entity can be extracted from the matrix and a life-cycle for the entity can be produced.
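Extracting a life-cycle from the matrix is easy to sketch in code. The Python below holds a few rows of an entity-event matrix as a nested dictionary and pulls out everything that happens to one entity, in process order. Treat the cell values as assumptions: they are one plausible reading of three of the rows in the table above, and the function name `entity_life_cycle` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# process -> entity -> set of actions. Only three processes are shown, and
# the assignment of actions to entities is an assumed reading of the table.
ENTITY_EVENT_MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "Initiate Sales Call": {
        "Prospect": {"Create", "Read"},
        "Sales Person": {"Read"},
        "Call": {"Create"},
    },
    "Record Discussion Information": {
        "Prospect": {"Read", "Update"},
        "Call": {"Update"},
    },
    "Document Next Actions": {
        "Call": {"Update"},
        "Call Next Step": {"Create", "Update"},
    },
}


def entity_life_cycle(
    matrix: Dict[str, Dict[str, Set[str]]], entity: str
) -> List[Tuple[str, List[str]]]:
    """Return (process, actions) pairs for one entity, in process order."""
    return [
        (process, sorted(cells[entity]))
        for process, cells in matrix.items()
        if entity in cells
    ]
```

Calling `entity_life_cycle(ENTITY_EVENT_MATRIX, "Call")` gives the call's story under these assumptions: it is created when the sales call is initiated and updated by the two later processes. A real matrix would aggregate every business process in the enterprise, not just the three shown here.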
At this point we have only discussed things from a business point of view. In a mature organization, the above deliverables are a function of the Business Architecture discipline (and I'm lumping Business Architecture and Business Analysis together).
The next step is to take these business entities and compare them against the Enterprise Data Model to determine whether there are any gaps.
The Enterprise Data Model
The terms "Enterprise Data Model," "Enterprise Conceptual Data Model," "Enterprise Logical Data Model," and "Canonical Model" are often bandied about in the industry somewhat interchangeably. So to make sure we are on the same page, I am going to define the Enterprise Data Model as follows:
An Enterprise Data Model is a logical model of the entities that exist across an enterprise, including their relationships, attributes and methods, as well as usage and value, and it is designed as a communication tool between all business, IT and external stakeholders.
There are several key things to bear in mind about the enterprise data model:
- It is Enterprise-wide. That means that it is the definitive guide to all entities that make the business work;
- It is a logical model. That means that it does not have a physical manifestation, but it is used as the basis for defining the canonical model, which I will discuss a little later;
- It is a communication device. That means that it is intended to help the business communicate with the business architects and all the other layers in IT. It effectively provides the common language that allows business units to properly communicate between themselves. It also serves as a reference point for external communication with partners, vendors, and outsourcing partners.
- It is more than just an ER model. Beyond the entities and attributes that make it up, it also provides an impact analysis tool that allows the business to determine what systems are affecting different entities, where attributes are being realized, and what value data elements have to the business.
These are just four of the things that the Enterprise Data Model is; there are several others, but they are not germane to this discussion.
The model shown at left deliberately excludes attributes to simplify the diagram, but in this model, the data architect has taken the business entity model, mapped the entities and attributes to the enterprise entities and attributes, and identified any gaps in the enterprise data model. For these purposes, the calendar item entity is the only one that was missing.
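The gap analysis itself boils down to a set difference, which the following Python sketch illustrates. The entity names in the enterprise data model and the mapping from business entities to enterprise entities are assumptions based on the discussion above (sales person as an employee, call as an interaction, and so on); they are not a transcription of the actual model.

```python
from typing import Dict, Set

# Assumed mapping from each business entity to the enterprise entities
# it needs. These names are illustrative only.
BUSINESS_TO_ENTERPRISE: Dict[str, Set[str]] = {
    "Prospect": {"Prospect"},
    "Sales Person": {"Employee"},
    "Call": {"Interaction"},
    "Call Next Step": {"Calendar Item", "Order", "Quote"},
}

# Assumed contents of the enterprise data model before this project.
ENTERPRISE_DATA_MODEL: Set[str] = {
    "Prospect", "Customer", "Lead", "Employee", "Contractor",
    "Interaction", "Order", "Quote",
}


def find_gaps(mapping: Dict[str, Set[str]], model: Set[str]) -> Set[str]:
    """Return the enterprise entities that are required but not yet modeled."""
    required = set().union(*mapping.values())
    return required - model
```

With these inputs, `find_gaps(BUSINESS_TO_ENTERPRISE, ENTERPRISE_DATA_MODEL)` returns `{"Calendar Item"}`, which matches the single gap identified above. In practice the same comparison has to be made at the attribute level as well.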
In a mature organization, the enterprise data model is subject to a rigorous release cycle, much the same way as application software is.
Of course, the enterprise data model is not only affected by changes that are introduced by new business requirements. Often, during the process of building components of the enterprise solution, additional entities and attributes are identified that should be included in the enterprise data model. Generally, this will trigger an architectural review of the enhancement, as it is likely that other, more serious gaps exist. Nonetheless, it is very possible that additional entities and attributes can be introduced from sources outside business architecture.
Once the enterprise data model changes are approved, focus shifts to the canonical model.
The Canonical Model
Perhaps the most misunderstood, maligned concept in service-oriented architectures is the idea of a canonical model. Gregor Hohpe and Bobby Woolf described the concept as a design pattern for enterprise integration in their book Enterprise Integration Patterns. While they were probably the first to document the pattern, there is nothing new about the idea. Many people have been using it for a very long time.
For example, about 12 years ago I was working on an EDI integration with a number of external parties. Each of the parties had a different standard and each of them needed data from one or more of the other parties. The simplest way to achieve that integration was to transform the data format from one party to a common format and then transform it to another format for the receiving party. This had nothing to do with SOA.
Actually, back as far as 1991, I was on a project that involved a conversion from several systems into a new system. Each of the old systems had data that contributed to the new system and all of the contributed data needed to be written in a single transaction to ensure referential integrity. The conclusion was to create a common model that could be addressed by all of the interested systems, and then transform data into that common model before moving it to the new system.
A canonical model is therefore a layer above all applications: each application transforms its data to and from the canonical model in order to integrate with other systems.
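A quick bit of arithmetic shows why this pays off as the number of systems grows. The Python sketch below counts one-way translators under a simplifying assumption that every system needs to exchange data with every other system; the function names are hypothetical.

```python
def point_to_point_translators(systems: int) -> int:
    """One translator per ordered pair of distinct systems."""
    return systems * (systems - 1)


def canonical_translators(systems: int) -> int:
    """One translator into and one out of the canonical model per system."""
    return 2 * systems


# Compare the two approaches for a handful of system counts.
for n in (2, 3, 5, 10):
    print(n, point_to_point_translators(n), canonical_translators(n))
```

The two approaches break even at three systems. Beyond that, the point-to-point count grows quadratically while the canonical count grows linearly: ten systems need 90 translators one way and 20 the other.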
In the context of our example, the canonical model is responsible for providing the transformation layer between the CRM application, the LDAP Server and the Microsoft Exchange Server. To create a calendar item in Microsoft Exchange Server, we need to draw data from the CRM application and the LDAP Server and then send it across to Microsoft Exchange in the correct format.
The CRM application has the Call Next Step data in a certain format. The LDAP server has directory information about the organizer and the attendees for the calendar item that needs to be created in Microsoft Exchange Server.
By creating a Calendar Item in the canonical model that conforms to a certain standard format, we can extract data from CRM and LDAP, transform it, and store it in a canonical Calendar Item. We then ship the calendar item to the Microsoft Exchange Server and transform it into the Exchange Web Services item format, which means nothing to anyone other than Exchange. Other systems that wish to make use of the Calendar Item can also perform their own transformations to their native formats.
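Here is a minimal Python sketch of that flow. Everything in it is an assumption made for illustration: the shape of the CRM record, the LDAP lookup (reduced to a dictionary of id to e-mail address), the fields of `CanonicalCalendarItem`, and especially the target payload, which is a stand-in dictionary and not the real Exchange Web Services schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class CanonicalCalendarItem:
    """Hypothetical canonical form of a calendar item."""
    subject: str
    start: datetime
    end: datetime
    organizer_email: str
    attendee_emails: List[str]


def to_canonical(crm_next_step: Dict, ldap_directory: Dict[str, str]) -> CanonicalCalendarItem:
    """Combine CRM next-step data with LDAP directory data (assumed shapes)."""
    start = datetime.fromisoformat(crm_next_step["due"])
    return CanonicalCalendarItem(
        subject=crm_next_step["description"],
        start=start,
        end=start + timedelta(minutes=crm_next_step["duration_minutes"]),
        organizer_email=ldap_directory[crm_next_step["sales_person_id"]],
        attendee_emails=[ldap_directory[a] for a in crm_next_step["attendee_ids"]],
    )


def to_exchange_payload(item: CanonicalCalendarItem) -> Dict:
    """Transform to a stand-in for Exchange's native format (not the real schema)."""
    return {
        "Subject": item.subject,
        "Start": item.start.isoformat(),
        "End": item.end.isoformat(),
        "Organizer": item.organizer_email,
        "RequiredAttendees": list(item.attendee_emails),
    }


# Example inputs, entirely made up.
crm_record = {
    "description": "Follow-up call",
    "due": "2010-02-01T10:00:00",
    "duration_minutes": 30,
    "sales_person_id": "u100",
    "attendee_ids": ["u200"],
}
directory = {"u100": "sam@example.com", "u200": "pat@example.com"}

canonical_item = to_canonical(crm_record, directory)
payload = to_exchange_payload(canonical_item)
```

The design point is that `to_canonical` knows nothing about Exchange and `to_exchange_payload` knows nothing about CRM or LDAP. Another system that wants calendar items only needs its own transformation from `CanonicalCalendarItem`.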
More than just data, the canonical model also contains the methods that operate on canonical model objects. Moreover, the canonical model is often implemented as a relational model, an object-relational model, and an XML model.
Up until now, we have not dealt with persistence models at all – that is, models that are created to persist the data to a physical data store. As the canonical model effectively defines the entities and attributes for the enterprise, it can often be easily adapted for use as the physical database schema for an application. Clearly, the database architecture specialist for the type of database being used should be involved in defining and implementing the persistence store for that platform.
Hopefully, by now, the value of a data architecture is becoming clear. Among other things, it provides:
- A mechanism for communication with all stakeholders in the business;
- A tool for impact analysis to determine the outcome of changes made;
- A strong mechanism for integration between heterogeneous systems;
- A mechanism for integration with external (partner/vendor) systems; and
- Governance over the way that data is stored and manipulated.
Once the data entities in an enterprise have been defined, the enterprise is much more able to adapt to changes in the business.
Where do you start?
Most data architecture initiatives fail very quickly because the sponsors fail to realize the following:
- A data architecture is a living model. It evolves as the business grows;
- Shutting 20 people away in a room for two years to develop a data architecture simply won't cut it. You cannot boil the ocean;
- The job is never done. The data architecture will continue growing permanently so any effort needs to be seen as a long-term program;
- Quantifying the benefit of data architecture is hard. It's easier to justify when you can look back and say "Here… we failed here and it cost us 20 million because we didn't have data architecture in place."
Ideally you need to start with executive sponsorship from CxO-level people. That doesn't always happen, but often it is possible to start by using the approach I have highlighted here. Derive your business entities for a project, build these into an enterprise data model that can grow over time, implement a canonical model that will also grow as the business evolves, and be fastidious about making sure that you know what all the processes are that operate on the data.
At the beginning of this article I talked about Data Strategy as opposed to data architecture. Ideally you want to attack the architecture as part of an overall strategy that addresses the issues that I raised above. The key thing that data architecture can help with is better quality data, and that is the lifeblood of modern business. If the data is correct, it is easy to extract virtually any intelligence that can be used to support business decisions, and that is the focus of information architecture.