Why IT Costs Explode (and what your best practices have got to do with it) Escape From Data Darkness webcast series - episode 1
How To Lay The Foundation For Digital Transformation
Dave & Chris follow up on the topic in their next webcast on Jan. 27, 5pm (CET): Register now.
Dave McComb, Semantic Arts
Dave McComb is the thought leader in how to overcome IT complexity in business environments. With more than 40 years of practical experience in governmental and enterprise IT management, he has seen all the pitfalls that cause digitalization efforts to fail. Dave is the author of the thought-provoking books “Software Wasteland” and “The Data-Centric Revolution”.
Q&A with webcast audience
For technical reasons we couldn't answer the questions from the audience during the webcast. So Dave provided us with the answers later on.
Your slide on exploding IT costs names dependency, complexity and redundancy. Can you break those three factors down in a few sentences (how they relate to the problem and each other)?
Dependency is when one part of a solution could be adversely impacted if another part changed. If your code is dependent on your schema, which most code is (it takes a heck of a lot of discipline to reduce this), then when the schema changes your code is impacted. This is why a factory that makes a million shampoo bottles doesn’t worry that there is a change between the 900,000th bottle and the 1,000,000th one. The bottles just don’t have the opportunity to have a dependency relation. But any system of 1M lines of code has it in spades. Try to change any one line.
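To make the schema dependency concrete, here is a minimal sketch in Python. The column names and the mailing_label function are hypothetical and not taken from the webcast; the point is only that the function breaks when the schema changes, even though none of its own lines did.

```python
# Hypothetical schema: a customer row as it might come back from a query.
row_v1 = {"cust_name": "Acme Corp", "cust_city": "Denver"}


def mailing_label(row):
    # The two column names are hard-coded here, so this function
    # depends on the schema staying exactly as it is.
    return f"{row['cust_name']}, {row['cust_city']}"


label = mailing_label(row_v1)
print(label)

# The schema changes: one column is renamed. mailing_label is untouched,
# yet it now fails.
row_v2 = {"customer_name": "Acme Corp", "cust_city": "Denver"}

try:
    mailing_label(row_v2)
    broke = False
except KeyError as missing:
    broke = True
    print(f"schema change broke the code: missing column {missing}")
```

Multiply this by every function that names a column and you get a rough picture of why a schema change ripples through a large code base.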
Complexity is the product of numerical complexity (how many parts there are) and interaction complexity (how complex their interaction is). In the shampoo bottle example, we have high numerical complexity but virtually no interaction. Some of the common techniques such as abstraction and isolation can help reduce the degree of interaction. But what we’ve found far more effective is to reduce the number of concepts that have to be dealt with. From millions to thousands. That level of complexity reduction is possible, we’ve shown it many times, and it really is beneficial. Not only does it reduce the complexity by a factor of 1000 (pretty good by itself), but it moves the model from the realm that no one can understand to the realm where a devoted analyst could. One more level of complexity reduction, down to hundreds, and you get business analysts who can participate.
Redundancy: no one complains about purely functional redundancy with no change in representation (database backups, caches etc). It’s when data is copied, renamed, restructured, and potentially changed in some other system that redundancy causes all the work. There is the ongoing work of reconciliation (figuring out the differences in two sets of data and then trying to establish which one is right), and much of what goes under the heading of data quality is also dealing with redundancies (deduplication to start with, but also the inconsistencies that grow in databases when they evolve separately).
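The reconciliation work described here can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the two systems (crm and billing), their field names, the records, and the FIELD_MAP that undoes the renaming. It only finds the differences; deciding which copy is right remains a human or business-rule question.

```python
# Hypothetical copies of the same customer data held by two systems.
# The billing copy renamed the fields, and one value has drifted.
crm = {
    "1234": {"name": "Acme Corp", "city": "Denver"},
    "5678": {"name": "Globex", "city": "Boston"},
}
billing = {
    "1234": {"customer_nm": "Acme Corp", "town": "Boulder"},
    "9999": {"customer_nm": "Initech", "town": "Austin"},
}

# The renaming has to be undone before anything can be compared.
FIELD_MAP = {"customer_nm": "name", "town": "city"}


def normalize(record):
    """Translate billing field names back to the CRM field names."""
    return {FIELD_MAP.get(field, field): value for field, value in record.items()}


def reconcile(a, b):
    """Return keys found in only one system, plus field-level conflicts."""
    b_norm = {key: normalize(rec) for key, rec in b.items()}
    only_a = sorted(a.keys() - b_norm.keys())
    only_b = sorted(b_norm.keys() - a.keys())
    conflicts = {}
    for key in sorted(a.keys() & b_norm.keys()):
        fields = sorted(a[key].keys() | b_norm[key].keys())
        diff = {
            f: (a[key].get(f), b_norm[key].get(f))
            for f in fields
            if a[key].get(f) != b_norm[key].get(f)
        }
        if diff:
            conflicts[key] = diff
    return only_a, only_b, conflicts


only_crm, only_billing, conflicts = reconcile(crm, billing)
print(only_crm, only_billing, conflicts)
```

Note that none of this code exists when there is a single, shared representation of the data; it is overhead created purely by the copy.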
How can I change the mindset of my employees?
This is a tough one. Most people are not visionary; they struggle to project the implications of a change. They mostly want to deal with what they know. If you find some who are more visionary, then you can tell a more visionary tale.
However, most visionaries (you may be one yourself) are not leaders and do not have access to budget. You will need to ally yourself with someone who does. That said, the most useful first thing you can do is implement some specific project, with existing, real data. Proofs of concept with synthetic data don’t cut it. The non-abstract thinkers don’t make the leap. But when people see their own data and they see it hooked up in a way they hadn’t seen before, often the light bulb goes on. Then it’s just a matter of execution.
Which brings us to developers. Most developers don’t get this. Many developers are intellectually arrogant. They will pick up the technology, try it for a while, tell you what is wrong with it and go back to what they were doing. When developers want to solve a problem, the first thing they do is bring it to the latest stack of technology they’ve used recently, get it all configured and start to work on the problem. When they get ambitious, they will add something that will enhance their resume; these days that would be ML or GPT-3.
If you turn the project over to the developers, it’s probably dead. Your two best hopes are: direct them tightly enough that you can dictate the tools and some of the key design decisions they make, or outsource it to someone you trust. Once it’s working, most of the internal arguments about how “it’s stupid,” “won’t scale,” etc. will be muted.
Why don't the big software providers offer a solution?
This is the 180-degree opposite of their core business models. Their business model depends on the persistent belief that application systems are big and complex, and that they require small armies of consultants to successfully traverse all the risks of the project. When it finally becomes clear, it will be far too late. They, along with all their peers, will be scrapping for those last few mega projects.
What about data lakes and data warehousing? Are they not tools to tackle the data problem?
These are partial solutions. In one way they are data-centric, in that we’ve gotten the data from the far-flung places where it resides to a central place. In the early days of the data warehouse, people were pretty diligent about “conforming” the incoming data to a fairly limited number of fact and dimension tables. But as anyone who has been watching has noticed, there has been less and less conformance and more and more just dumping. Which, coupled with the tsunami of external data, led to the data lake.
Both the data warehouse and the data lake rely on co-location for value add. In a data-centric architecture co-location is nice and we will shoot for it, especially among data that has complex relationships with other data, but it’s not essential. Because the model is federateable, we can get at the data where it rests. The real issue for us with the data lake and the data warehouse is that there is no long-term migration path. The warehouse is completely dependent on being fed by a legacy system.
In data-centric systems you have the data in the center and well defined. You can begin trimming functionality off the legacy system, one use case at a time, until you can just decommission it.
When you say “a single, simple, extensible, federateable data model” – does that imply that all knowledge graphs in an enterprise should eventually connect (assuming appropriate data security is in place)?
They should at least connect at the “Core TBox” level. In semantics, the TBox (Terminological Box) is where new terms (classes and properties) are defined. The “Core TBox” is the enterprise ontology. As the divisions extend the core, they should be extending known Core TBox concepts. The other bit they should strive for is to have as many of the identifiers of specific individuals aligned as they can. This may not always be possible, so you may need to build an entity resolution service and a central place to keep all the aliases for a given entity (Division X calls this company “Customer 1234” but Division Y calls the same company “Customer 3333”).
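The “central place to keep all the aliases” could look something like the following Python sketch. The AliasRegistry class, its method names, and the canonical identifier urn:example:org/acme are all hypothetical. The sketch covers only the lookup side; working out that two records refer to the same company (the matching step of entity resolution) is assumed to have happened already.

```python
class AliasRegistry:
    """Central store mapping division-local identifiers to one canonical id."""

    def __init__(self):
        self._canonical = {}  # (division, local_id) -> canonical id
        self._aliases = {}    # canonical id -> set of (division, local_id)

    def register(self, canonical_id, division, local_id):
        key = (division, local_id)
        existing = self._canonical.get(key)
        if existing is not None and existing != canonical_id:
            # One local identifier must never point at two different entities.
            raise ValueError(f"{key} is already an alias of {existing}")
        self._canonical[key] = canonical_id
        self._aliases.setdefault(canonical_id, set()).add(key)

    def resolve(self, division, local_id):
        """Return the canonical id for a local identifier, or None if unknown."""
        return self._canonical.get((division, local_id))

    def aliases(self, canonical_id):
        """Return every known (division, local_id) pair for an entity, sorted."""
        return sorted(self._aliases.get(canonical_id, set()))


registry = AliasRegistry()
registry.register("urn:example:org/acme", "Division X", "Customer 1234")
registry.register("urn:example:org/acme", "Division Y", "Customer 3333")

same_entity = (
    registry.resolve("Division X", "Customer 1234")
    == registry.resolve("Division Y", "Customer 3333")
)
print(same_entity, registry.aliases("urn:example:org/acme"))
```

In a knowledge graph the same idea would more likely be expressed as identity links between nodes rather than as a Python dictionary; the dictionary is used here only to keep the sketch self-contained.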
Dave, when is your third book coming out?
I’ve decided that the “Data-Centric Pattern Language” will be coming out first as a podcast. I have about 8 episodes in the can. When I get 4 more, we launch.