SHARED DATA: DATA MODELS FOR THE FUTURE

- Falk Borgmann

Some 78 years ago, Konrad Zuse invented the first computer. Around 50 years ago, the first prototypical electronic messages were exchanged in the USA. And the XML format was released about 25 years ago. If we map these 78 years of computer history onto the development of humankind, then the idea of exchanging data between independent computer systems was invented in the Stone Age, so to speak. Not that it’s a bad concept: it has proved to be very successful. In fact, entire disciplines and branches of IT now spend all their time focusing on these interfaces.

Interfaces: when it comes to data exchange, no-one can really stand them—they are usually thought of as complicated, expensive, and good for nothing but problems. Yet our modern IT infrastructure would be lost without them. All of us can surely think of a major IT solution that ran into interface trouble at some point in its productive life.

And today, at a time when everyone is talking about digitalization and innovation, we can safely assume that in some areas of data exchange the conventional notion of the interface is going to change over the next few years. These changes are not about developing improved processes or formats; that has little to do with genuine innovation and is more a case of ‘old wine in new bottles’. Instead, the question to be answered is how some systems will share and process data in the future. New ideas are being developed, and it’s high time that the old ways of thinking were jettisoned in several areas. That’s not to say we should attempt to consolidate all of our core business processes onto a single technical data platform so as to be rid of all of our interfaces. Perhaps that is even technically possible, but it’s often only a small step from a sensible innovation to cloud cuckoo land. If an overall concept is to have a decent chance of success and be taken seriously, it must offer a valid set of prospects for the world we actually live in.

If we dive deeper than the business logic layer of IT applications—into the technical architecture—then we discover the individual data stores used by conventional IT systems. First and foremost, these will be databases and file stores. Let’s assume, then, that we don’t want to consolidate the business logic in our solutions but some of these underlying data persistence objects. Let’s also assume that one way of achieving this would be to have the individual systems work not with similar data but with the exact same data. As an example, let’s imagine the till receipt issued by a major retail chain. This receipt is typically produced at the cashier’s till and transferred as a physical data record, via an EAI system, to ERP, data warehouse, and archive systems. In this example, the same data record is copied, transferred, and stored three times over, only to then be processed by the business/contextual logic of the respective system.

Let’s now imagine an alternative setup, where the till generates the receipt directly in a data store that fulfils all of the legal requirements for storing tax records. It would then be an advantage if the ERP or data warehouse system took its information directly from this same data store, without needing any data logistics infrastructure as middleware. In this scenario, we would neither have a conventional data interface nor need its associated handling. Troubleshooting and technical clearing effort would shrink considerably, monitoring and operational overhead would drop, and, not least, a large part of the EAI infrastructure could be retired. For a large company, we’re already talking about very significant savings on resources and costs. Of course, the business systems in this scenario need to be capable of handling the data structures as provided.
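To make the contrast concrete, here is a minimal Python sketch of the shared-store variant. The names (Receipt, SharedReceiptStore) are purely illustrative, and an in-memory dictionary stands in for what would in reality be a distributed, audit-proof store; the point is simply that the till writes the receipt exactly once and the ERP and data warehouse read that same record, with no EAI copy jobs in between.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List
import uuid


@dataclass(frozen=True)
class Receipt:
    """A single till receipt, stored once and never copied."""
    receipt_id: str
    store_id: str
    lines: List[dict]   # e.g. [{"sku": "4711", "qty": 2, "price": 1.99}]
    total: float
    created_at: str


class SharedReceiptStore:
    """Hypothetical shared persistence layer used by POS, ERP and DW alike.

    An in-memory dict stands in for what would really be a distributed,
    audit-proof store; only the access pattern matters here.
    """

    def __init__(self) -> None:
        self._records: Dict[str, Receipt] = {}

    # POS side: the till writes the receipt exactly once.
    def write(self, receipt: Receipt) -> str:
        if receipt.receipt_id in self._records:
            raise ValueError("receipts are immutable once written")
        self._records[receipt.receipt_id] = receipt
        return receipt.receipt_id

    # Consumer side: ERP, data warehouse (and auditors) read the same record.
    def read(self, receipt_id: str) -> Receipt:
        return self._records[receipt_id]

    def query_by_store(self, store_id: str) -> List[Receipt]:
        return [r for r in self._records.values() if r.store_id == store_id]


# The till creates the record directly in the shared store ...
store = SharedReceiptStore()
rid = store.write(Receipt(
    receipt_id=str(uuid.uuid4()),
    store_id="BERLIN-03",
    lines=[{"sku": "4711", "qty": 2, "price": 1.99}],
    total=3.98,
    created_at=datetime.now(timezone.utc).isoformat(),
))

# ... and ERP and data warehouse work on that very record, instead of
# receiving three separately transferred copies via an EAI system.
erp_view = store.read(rid)
dw_total = sum(r.total for r in store.query_by_store("BERLIN-03"))
```

In a real deployment, immutability, retention, and access control would of course have to be enforced by the storage infrastructure itself rather than in application code as shown here.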

If we look at the typical IT infrastructure in a conventionally run company, we find ERP, data warehouse, CRM, archive, and EAI systems, among others. So it’s obvious: we have fairly big and complex ‘general stores’ inside our IT departments today. These are silo-like entities, which in most cases are connected together with interfaces for data exchange. Incidentally, the same is true of the modules offered by the major system providers. When a sales rep talks about ‘fully integrated modules’, they are often actually talking about well-tested interfaces between the separate parts of the system. Even inside these ‘closed’ systems, data are often stored redundantly on expensive storage media. Deliberate, designed-in data redundancy can be a good thing, but these silo systems do not distribute data in a targeted way across low-cost (cloud) storage, for example; they generally hold it on dedicated, expensive instances of the individual systems.


Diagram of typical backend systems (using POS data as an example)

But why shouldn’t these systems at least share their data in places where this would make sense from a technical perspective? Surely an archive and a data warehouse could work with identical data, for example? Actually, no: because these are physically and technically separate systems. But does it have to be this way? Redundant infrastructure and data provisioning are expensive and not always a good idea. So the question we face is how to set up an alternative model that merges the data persistence objects of the various systems at exactly the points where doing so adds value. If we understand a data record as actually being a service for the various business requirements, this takes us a quantum leap away from the silo-focused theoretical concepts of the 1990s.

To get a technical handle on this vision, it’s instructive to look at the three big IT hypes of the last few years. What do Big Data, blockchain, and artificial intelligence have in common? They are all essentially based on concepts that are typically or exclusively centered on distributed systems. So the key concept is distributed systems. A distributed system is a computer system made up of independent instances that, when viewed from the outside, act as a single system. As the name already suggests, its components can run at different locations, which also makes it a natural basis for failure-tolerant systems. Next, we merely need to consider this kind of distributed system as the data persistence layer in a shared data services concept: participating systems save their data to a shared, distributed store. Some business functions of these systems may even become superfluous, since they are supplied ‘for free’, so to speak, as part of the service model. One example here would be the conventional long-term archive used in a tax records context.
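To illustrate the ‘supplied for free’ point, here is a hedged sketch, again in Python with made-up names, of a shared data service in which legal retention is a property of the record itself. Written this way, every participating system gets long-term, deletion-protected storage as a side effect of writing, so a separate archive application no longer has to receive and hold its own copy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class StoredRecord:
    payload: dict
    retain_until: Optional[datetime]   # None = no legal retention period


class SharedDataService:
    """Hypothetical shared, distributed persistence service (simplified).

    Retention is attached to the record itself, so every participating
    system gets 'archiving' as a side effect of writing; no separate
    archive application has to receive and store its own copy.
    """

    def __init__(self) -> None:
        self._data: Dict[str, StoredRecord] = {}

    def write(self, key: str, payload: dict, retention_years: int = 0) -> None:
        retain_until = None
        if retention_years:
            retain_until = datetime.now(timezone.utc) + timedelta(days=365 * retention_years)
        self._data[key] = StoredRecord(payload, retain_until)

    def read(self, key: str) -> dict:
        return self._data[key].payload

    def delete(self, key: str) -> None:
        record = self._data[key]
        if record.retain_until and datetime.now(timezone.utc) < record.retain_until:
            raise PermissionError("record is under legal retention and cannot be deleted")
        del self._data[key]


# A till receipt written with a ten-year retention period is, in effect,
# archived the moment it is written; ERP and DW simply read the same record.
service = SharedDataService()
service.write("receipt/2024/0815", {"total": 3.98}, retention_years=10)
print(service.read("receipt/2024/0815"))
```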

By taking this approach further, even shared-infrastructure models between individual companies would be possible. In this setup, a vendor’s billing department would no longer transfer its invoices by electronic mail/messaging, but would generate them directly in a shared data store to which the vendor’s customer also has access rights. Has the invoice been sent? Was it sent correctly? Who is responsible for the processing error? Is it signed? Does it need to be signed for a specific situation? All of these questions would be superfluous. Instead, we would have a ‘Data-Record-as-a-Service’.
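One way to picture this ‘Data-Record-as-a-Service’ idea is a shared store in which the vendor’s billing system creates the invoice once and simply grants the customer read access. The sketch below is again only illustrative: the class and party names are invented, and a plain dictionary stands in for the shared, access-controlled store.

```python
from typing import Dict, Set


class SharedInvoiceStore:
    """Hypothetical cross-company store: no invoice is ever 'sent'.

    The vendor creates the record exactly once; access rights decide who
    may read it. Questions such as 'has the invoice arrived?' no longer
    apply, because both parties look at the same record.
    """

    def __init__(self) -> None:
        self._invoices: Dict[str, dict] = {}
        self._acl: Dict[str, Set[str]] = {}   # invoice_id -> parties with read access

    def create(self, invoice_id: str, owner: str, payload: dict) -> None:
        self._invoices[invoice_id] = payload
        self._acl[invoice_id] = {owner}

    def grant_read(self, invoice_id: str, party: str) -> None:
        self._acl[invoice_id].add(party)

    def read(self, invoice_id: str, party: str) -> dict:
        if party not in self._acl[invoice_id]:
            raise PermissionError(f"{party} has no access to {invoice_id}")
        return self._invoices[invoice_id]


# The vendor's billing system generates the invoice directly in the shared store ...
store = SharedInvoiceStore()
store.create("INV-2024-001", owner="vendor-billing",
             payload={"amount": 129.00, "currency": "EUR"})
store.grant_read("INV-2024-001", "customer-ap")   # instead of mailing a copy

# ... and the customer's accounts-payable system reads the very record the vendor created.
invoice = store.read("INV-2024-001", party="customer-ap")
```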

The challenges involved with this kind of architecture are certainly non-trivial, but they are not impossible to master. As always, the complexity depends on the actual specification and the actual requirements.

Around 25 years ago, when XML was invented, the era of the mobile phone also dawned, with basic text-messaging functionality. Today, we use our phones to take pictures, shoot video, and pay at the till. Our phones can get us from A to B, and we share our daily lives with our followers on social media. While some might not call this progress, it’s still true that ideas are changing about how people, and companies, interact with one another.

The conventional idea of an interface, where a data record is sent from one computer system to another by a dedicated software and server infrastructure and is then processed and stored on expensive storage media, may be a successful model, but it is also on the way out in some areas of IT. For some use cases, the future lies in distributed systems with shared data models. In the long term, these can be used to make significant savings in IT costs while streamlining technical processes.

An enormous potential exists here, one which the major software/systems providers and their customers have either not yet identified or perhaps do not wish to. This is understandable to an extent: some manufacturers provide specialist solutions to related issues and, not least, are earning good money with unnecessarily complicated projects in the fields of cloud, Big Data, and AI. Distributed approaches are arriving along with new application areas and technologies, and it is only a matter of time until new models start supplanting the ‘legacy IT’ found in corporate IT departments. Bowing to the pressure of digitalization, dozens of companies are already working hard on ideas and approaches in this field. That these models and technologies will establish themselves is no longer in any doubt; it’s merely a question of how the world of conventional IT will face up to them.
