In Conversation with Arif and Max: Data Engineers, Data Mesh Practitioners, and Good Friends in Data

About this Episode

Maarten is in conversation with Arif Wider, Professor of Software Engineering at HTW Berlin, and Max Schultze, Data Engineering Manager at Zalando. Arif and Max are friends in data who share a passion and interest in data mesh.

“Data Mesh in Practice: How to Set Up a Data-Driven Organization” is the book co-authored by Arif and Max, and it’s where this conversation starts. They draw on their practical experience and opinions as they tackle some common misconceptions and answer some challenging questions, offering insights, "a-ha" and "oh yeah" moments, and valuable takeaways for anyone embarking on their data mesh journey, and everyone looking to make conscious decisions on how they address data at scale.

There is something for everyone - from data engineers and practitioners, to Berliners, and anyone who believes in the power of people to make the change that will help organizations realize the value of data.

Episode Transcript

Natasha

Welcome to the Soda Podcast and welcome to season one of the series In Conversation With. Just like good data helps the world go around, so do good conversations. Your host is Maarten Masschelein, CEO and founder of Soda Data. In this series, Maarten will be in conversation with practitioners, technologists, and change makers who all share a passion in making meaningful connections and rethinking traditional practices. They'll be talking about data, what makes their world go around, and sharing their thoughts, perspective, and ideas that we think will inspire you to be a part of the conversation and be a part of the change. Without further ado, here's your host Maarten Masschelein.

Maarten

Hi everyone. I'm Maarten. I'm the CEO of Soda Data and this is In Conversation With. Today, I'm delighted to welcome both Arif Wider and Max Schultze. Arif and Max are known because they connected and they also decided to write a book together on one of the most talked about or sought after topics and methodologies in today's data world. And that is data mesh. So today is a conversation about data mesh and I'm very keen to have this conversation because data mesh is probably in more than half of the conversations we have with data leaders today. I personally like it a lot because it provides a framework for data teams to think about how they can grow and mature their organization to be more effective with data. And it's a quite opinionated piece, as well, that talks about both the social dynamics as well as the technical architecture.

So today's In Conversation With Arif and Max, they will talk about their experience of working together, they will talk about their experience in the data space, but also of course, their book, which is ‘Data Mesh in Practice: How to Set Up a Data Driven Organization’.

So let's start with some introductions before we dive in. Arif and Max, welcome to In Conversation With. I'm delighted that you're both here. Could you please introduce yourselves? So what should we know about you and what makes you a great listen for the podcast?

Arif

Absolutely. Yeah, great to be here. Thanks for kicking this off today here, Maarten. So, my name is Arif. I used to be the head of data and AI for Thoughtworks Germany so I was working in consulting full time for quite a few years. Working a lot with data teams on the ground, as an engineer, and often leading those teams, trying to help data teams to get out of trouble. And now the last two years I went back to academia and I'm a full professor of software engineering, still continuing to work both as an independent consultant and as a fellow consultant with Thoughtworks. This is really what I do these days. And actually, to be honest, data mesh is still the topic that keeps me busy, pretty much every single day, for the last three years, I would say. It's the topic that I'm busy with all the time and trying to help as many clients, people, conferences.

Maarten

Awesome. And you Max?

Max

Yeah. Also from my side, thanks for having me here today. It's really amazing to get this chance to speak to such an amazing audience as well. From my side, I am actually having somewhat of a similar background as compared to Arif when it comes to the engineering space, when it comes to the data space. I've been a data practitioner for almost a decade now, mostly with a background, both on the engineering side, but also specifically on providing data platforms, but more focused on specifically working in bigger organizations. I came from a background even back in University where I was already getting the chance to be lucky enough to get to participate in working on Apache Flink because that was built directly at my university and that really kicked off my interest in data really, really early. And then after some initial endeavors, I ended up at Zalando, Europe's biggest online platform for fashion. I've been working there as a data engineer for many, many years, starting to build up the first version of the data lake of the company, and eventually transitioning into a leadership role where I'm now also working as a data engineering manager that's responsible for essentially the storage layer of the data lake of the company.

Also, at that time, of course, I was running into a lot of issues when it comes to building data platforms, when it comes to offering data platforms to a larger organization, and that's what also directly drove me into the hands of data mesh as well, where ultimately I also met Arif and from where we started going on to common endeavors and really started publicly advocating data mesh and creating content in the data mesh space for almost three years now, already.

Maarten

Awesome. Arif, you said something that piqued my interest. You said a fellow with Thoughtworks in Germany. That was right?

Arif

Yeah.

Maarten

Can you tell us a bit more, what that means, or what that entails?

Arif

Yeah. Yeah. So the thing is when I made the move back to academia two years ago, it was really important to me to be able to not only stay in touch with industry, but actually also to stay in touch with Thoughtworks as a company and also as a community, because this is really a very exciting tech community and it has always been. Because of that, I talked to the folks at Thoughtworks when I decided to take on the professor role and we agreed that I can continue working for Thoughtworks. Of course, in a different capacity, because I do the full time professorship, which means I'm only allowed to work up to 20% of my time in industry but luckily this is not only allowed by my university, but also encouraged. 

Maarten

Awesome. Well, I think we're bringing a lot of experience, not only in the world of data engineering, but much broader to the table today and that's what I'm really excited about. We'll go into the practical side of data mesh, the implementation, the real life things, the roadblocks, I guess, you hit or the experiences that you have. I'm really excited to dive into those. Max, maybe I want to follow up on your story, on your backgrounds. Is there anything in your career in the last couple of years, maybe, or even further back that had a very important life changing or impactful, maybe even defining moment that you want to talk about?

Max

Oh, definitely. The thing that really pops to my mind immediately there is the point in time when I made a transition towards leadership. Because that was like really the point where I was transitioning from doing a lot of hands on things in the data space and really building these things for direct impact for the people that were then using them, towards actually transitioning into a role that was scaling the impact to a much, much broader thing. Being much less hands on, still getting the chance to tinker around with some ideas in architecture discussions with the team and stuff like that, still really enjoy that, but getting the possibility to actually influence a big company in a much broader scale. And this was really exciting for me and it really came with two different facets. On the one hand, of course, now being responsible for a team and getting the chance to actually also work with other very gifted individuals, help them grow and see their trajectory over time. That was incredibly exciting and really motivated me. But at the same time, again, the part where I would almost say I switched my team lead hat off and I put the data mesh hat on, when it really came towards influencing the company as a whole in terms of the direction that we are taking with data. This very much went into areas like defining strategic outlook, right? This went into directions of sitting with individual teams and actually helping them to follow up on some of the things that we had been discussing and preaching for quite a while. We are really starting to make things happen in an end-to-end way, not just focusing on dedicated technical features, but really bringing this to a broader scope.

Maarten

So how did you guys meet? What's the story there? When did this all get started?

Arif

I worked as a consultant full-time with Thoughtworks and that is pretty much how we met because I was leading a team at Zalando. And Zalando was the client of Thoughtworks at the time. And in fact, we were not working in the same team or not directly, but I somehow knew of Max because he was somehow involved in getting us there. Or specifically, I think, he organized a meetup where Emily Gorcenski, who's now the Head of Data and AI at Thoughtworks Germany, spoke and then somehow this ended up with us being at Zalando. When I worked at clients for Thoughtworks, I was often seeing great things. That people built great things, that people create great things or follow fantastic practices, et cetera. And so, I have this almost a tradition that I usually then pick someone from the client and say, "Hey, this is pretty cool what we're doing here. Should we talk about this publicly or write something about this?" And in this case, data mesh was just a pretty new topic at the time, really new, and I knew of Max. Max was already leading many of the data infrastructure efforts at the time at Zalando, so I think I just shot him a message. Then we met in the Zalando office and were brainstorming what we could talk about.

Maarten

Awesome. It's great to hear that Zalando, it's a success story in Europe. That it has served as the incubator or the innovation center for this to all start and that, that's all possible. That's really great to hear.

Now, tell us a bit about, or tell the listeners a bit about where you are located and maybe, before we dive into all the data mesh specific topics, what are some favorite things you like doing outside or other than data mesh? 

Max

Yeah. Maybe I can take that one on first, before handing it over to Arif. Well, first of all, we are both actually in Berlin, Berlin, Germany, and that is also where Zalando is actually sitting so that was of course fueling the initiative that we took upon ourselves here. But I have to honestly say I was born in Berlin. I grew up here, went to school, university, the whole journey and I'm super happy that I actually got the chance to spend my life in this amazing city. I still love it. I still very much love that it has an incredibly international environment where there's so many people from so many countries all over the world. It started off a lot as, I think there was a slogan at some point, it said something like, "Poor, but sexy," where it attracted a lot of people that just came here to meet, to party, to spend all the nights out. And I was very much among them and very much enjoyed that.

But by now, as well, Berlin has become a lot of a tech hub as well. It has drawn in a lot of big name companies as well, that have now started building up tech hubs within the city. Again, fueling the international environment because they are again, bringing in a lot of international talents that then come into the city and again, continue building up this amazing international environment. 

One thing maybe to throw in from me, from a personal side, I'm a huge gamer. I love playing games, be it board games, card games, video games, console, mobile, whatever. You can shoot anything at me and I will get excited, definitely. I was even traveling around for many years, playing “Magic: The Gathering” on a competitive basis and playing major tournaments with thousands of participants all over Europe and sometimes even beyond that and I even got to meet my wife that way.

Maarten

Awesome. That's very cool. And Arif? 

Arif

It's also pretty easy for me to answer the question of what keeps me intrigued outside of data. And that is most of the time coffee, to be honest. I'm really quite a coffee geek if you can say so. So yeah, I'm always looking into the newest gadgets and gear and stuff that you can have about coffee. I have an espresso machine and different grinders and different techniques and stuff to do your hand brew and always look for different coffee to get. Whenever I travel, I basically just look up the best coffee shops to go to. So this is pretty much what I'm really into. 

Max

But be honest, Arif. The coffee is also what actually keeps you going in the data space. 

Arif

Absolutely. 

Maarten

So Max, if someone were to visit Berlin, what is the one thing they cannot leave Berlin without doing? 

Max

So I already mentioned that it's a super international environment and I already mentioned that this, of course, also for a very long time brought in a lot of people just to party hard. I definitely would say you have to experience the nightlife. There's so many amazing clubs, there's amazing bars in various parts of the city. It really doesn't matter where you end up, you will always find some place to actually be around. But be careful as well, a party rarely starts before midnight. So usually people, like, meet and warm up with some drinks, maybe meeting as early as 10 or so. If you go to a club before midnight, A, there's a good chance it's still closed, and B, even if you get in, the dance floor is probably empty. There's even a good saying, "Where has been the night from Friday to Monday," because people, they really enjoy their party in Berlin and I can absolutely recommend doing that as well.

Maarten

I can't say I've never partaken myself. I've learned that there's a thing called day partying in Berlin. You just basically go wake up in the morning, bright and early at 7 AM and that's really prime time. I've heard it's amazing.

Max

Yes, definitely, definitely. And you know, you can just get up at seven and go party and meet the people that have already partied the whole night through. 

Maarten

Nice. Well, I'm sure we've given our listeners some inspiration and I definitely recommend going to Berlin, it's a fantastic place. But enough about Berlin. So one of the key reasons we wanted you guys on the podcast is because of the book. So it's called “Data Mesh in Practice: How to Set Up a Data-Driven Organization” and it was published, if I'm not mistaken, in 2021. Right, so last year, with O'Reilly. First question, how did you come up with the idea to write a book?

Arif

That's actually pretty easy. I think we gave a talk at the Databricks Data and AI conference three years ago, I guess, or two years. I don't know. 

Max

2020 I think it was. 

Arif

Covid messes with your sense of time. I think as one of the effects of this talk, we were approached by O'Reilly first to give a series of trainings about data mesh, online live training. I think then at some point, I don't know, after the first training or something, we were approached by O'Reilly again, asking if we were interested in writing a little book. That's when we talked about this and thought that might be a good idea.

Maarten

That's very cool. There's, I think, various degrees of understanding what data mesh really is and definitely how to apply it. I think there's still quite some interpretations as well. How do you guys think it should be understood or how would you summarize it? 

Max

Well, it's actually quite interesting because as you mentioned it's a huge hype and a huge buzzword and everybody talks about it, but rarely anyone actually understands what's the meaning behind it. But what is actually pretty important from my perspective is that you understand what are the principles that actually stand behind the term. Because, ultimately, data mesh is trying to address data at scale from an organizational angle. It's really trying to ask people to make more conscious decisions about how they are dealing with their data. Who's owning data, who's responsible for data and to really turn data into something that is tightly integrated into the value generation chain of a company, instead of coming from the situation that many companies have been in for way too long or still are. Where data is provided somewhere as a by-product of some applications, ends up in the storage layer, sometimes even by accident, gets picked up just at random and somebody starts building a production use case on top of that, right?

That unfortunately is the situation that like many companies are still in. And I don't want to lie for many cases, that's still the case for us as well. But this is really where data mesh is asking you to turn around on how you are actually thinking about what you're doing with your data and to put in the conscious effort. And to turn around and really make it a proactive work with your data to really get the best out of it and get the value that the data actually ingrains. 

Arif

Yeah, I think the one message that I always want to bring home about data mesh is, that it's about people. I think I've given quite a few talks with exactly that title. I think also Zhamak [Dehghani] recently is pushing that point quite a bit, that she is always emphasizing that data mesh is a sociotechnical approach. So in fact, I would say data mesh is not only about people, right? So technology is really an important part of data mesh, but I think it's very easy to see data mesh as a successor of maybe the data warehouse or the data lake paradigm. But this means that you come from purely technical approaches and then, you think data mesh is the next technical successor to that, but that's really not what it is, right? Data mesh really adds the people's perspective to it and this is really the important thing there. It is really an approach that, or a paradigm that, really looks at ownership structures, responsibility structures, interactions between people and not only how data is stored, how data flows, et cetera. 

Maarten

That makes total sense. The people part has always been, I think, the hardest, and especially at a time like this where the data space is evolving so, so quickly. So it's very, very hard for a lot of people to keep up with all of the latest and greatest, all of the changes that are happening, all of the dynamics in the landscape. And it must have also been, for you, a very interesting journey. I remember from one of our other podcast series, Data Dream Team, Jesse Anderson, who's the host of that, he had a conversation with Zhamak and she was talking about how interesting it was to be kind of at the bleeding edge, writing about all of these concepts and having to kind of distill and make them very, as easy as possible to explain and share, so how difficult that journey was.

And I'm sure that was very similar when you're kind of implementing this, you're at the bleeding edge. There's some theory, but the practice and theory there, I think there might have been a big gap. So is there anything that you want to share that you've learned while implementing kind of data mesh into practice? 

Max

Well, the interesting thing is, and that is also something that Zhamak once mentioned to us is that data mesh is nothing that has been freshly invented from the scratch and just popped up out of nowhere, but much rather, it's something that already developed in many, many companies at the same time, but ultimately, Zhamak did us the favor to give that thing a name so that we actually have something to talk about, right? 

And this is honestly also how the whole Zalando data mesh journey started because we looked at what was proposed in the original article and we realized that there's actually quite an overlap to things that we are already doing. And that was actually also where, even though that was really early after the name became public and started to be talked about quite frequently, we got the chance to already give the talk that Arif earlier mentioned to talk about our practical experiences because it was not just about, “Hey, now there's this data mesh thing. Let's try to actually make that happen,” but it was also much more a reflection of where do we actually stand already? What are the things that we have already done? And now that we actually have something, some way to call that, how can we reflect on these experiences and share them with others so that we can give some practical advice as well to the rest of the community around that. 

Maarten

That makes a lot of sense. I really want to hone in on this people's part. I'm also super, not curious, but I think it's a core aspect, but before we do that, I want to know a bit more about the book. So without spoiling it for any of our listeners who haven't read the book yet, take us through a high level summary. What are some of the key points you're making? What are the key steps that companies need to go through when setting up a data driven organization? Maybe some stories.

Max

Yeah, generally speaking, I think one of the key points of the book is the practical experiences and the practical examples that we have actually thrown in there. But generally speaking, the book is built up in two main chapters, the first one where we talk a little bit about the whole background about data mesh, where it's coming from, what it's actually about, what are some of the problems that it's addressing as well.

But then we try to turn that around and shift gears and say, "Well, how can you now actually get started? What are the things that you can dive into?" And like even going through the different stages on how a data mesh develops within the company, really from the first steps that you are taking, the first things that you're getting involved with, some tips and tricks on really how to get started from square one, but then developing this further and how can you actually scale the mesh once you have made the first steps in your organization? Up until the point where you really dive into how can you sustain this over a longer time? 

And again, one of the selling points I would say is a lot of small practical examples that we've thrown in there. A lot of pitfalls and best practices that we have encountered through various different organizations that we have been talking to over time, and that is really where we wanted to build that up. And maybe we can even share a small, anecdotal story when it comes to that, to showcase one of the practical examples that we have integrated there. 

Arif

Yeah. I think we called it Data Mesh in Practice for a reason. There are other books about data mesh that simply explain what data mesh is, and we really wanted to go a bit deeper into how do you do data mesh? And yeah, I think what is really at the core of the book is this data mesh journey, and the different stages of this data mesh journey. Because as we know, this is really a journey that will probably take several years. So there are really very different problems and different challenges that you have at different stages of that journey. And this is really what we look into from a very practical, hands-on perspective.

Max

Maybe going a bit on a tangent here, but I think just reflecting on one of the practical examples that we put in there for the purpose of making the things that we are trying to convey just more relatable to the people, right? One of the examples that we put in there, and again, those are somewhat made up examples, but they are too close to reality to just push them aside. And this particular one, maybe I can share, is about a data analyst that joins a team that has been working in a technical setup for quite a while, an engineering team, and they were measuring also some KPIs of how their services are doing. And now with this analyst joining, for the first time, this team was in the situation that they wanted to build some KPIs that did not only rely on the data that they were producing by themselves, but also on some data of another team.

And this analyst got pointed towards a contact in that other team and they reached out and they were asking for, well, about that data, about that service that the data was coming from and about how they can use that. And they got pushed back saying, "No, we are not responsible for this. And we don't know how to help you." And yeah, so they just had this new job and the new team got this first appointment, and after two weeks, they were already stuck.

Trying to then talk to the colleagues again, they learned that there was a central data lake team and that the central data lake team might be able to pick up the conversation to help them. So they reached out to the central team and conveniently enough, again, it took them some time to actually get an initial response, but conveniently enough, they actually got a hold of someone that was able to sit with them and to actually help them and to dig deep into, for instance, finding this one data set that was mentioned before that they wanted to use for their KPIs, and identifying where it was actually coming from. And who would have guessed, the original owner of the application where the data was coming from was in fact the very first team that they already talked to.

Now with that knowledge, they again reached out to that team. And then that respective team realized that they screwed up, that in fact, this was within their area of responsibility, just that there has always been one person that was taking care of that service that had all the knowledge about that service and that person had just left the company. So they were in this amazing situation that they did not know anything about that anymore. The analyst, well, they were stuck in this situation and they had to actually figure out what to do there, but ultimately, it took them six weeks or so to actually find the right data that they actually wanted to have for their particular use case and to be able to use that, and to in the end fulfill a job that took them less than an hour to actually make it happen, right?

And this is just one of the stories that, again, way too close to reality. I've seen many, many things like this happen all the time everywhere to just showcase how important it is. Not only to look at the technical parts around the data that you're working with, but to really understand the people component of that, right, the part of ownership, the part of responsibility, and to really ingrain into working with data and, yeah, owning the data so that you are actually able to also extract the value from the data in a reliable fashion in such a company.

Arif

Yeah. So I think it's really kind of those real world stories that are usually about actual people, the problems of actual people that really inspired large parts of the book. So I don't know, I think this particular story of mine is not shared in that way in the book, but it's kind of the one that I have seen again and again as part of my work as a consultant. It happens so often that I came into a central data team or a data team with central data responsibility. And they were so buried in firefighting work basically, and couldn't innovate at all, but were really just facing the complaints of what felt like the entire company. And they got blamed that nothing is working anymore and nothing is moving anymore when it comes to data, et cetera. And it's really this pretty bad situation of central data teams that has influenced many of my ideas about this.

Maarten

I think this will be very relatable to many of our listeners. Striking this balance between agility, but also stability, I think is a very difficult one to strike. And it's very often also, I guess, not visible in the organization and so how much time goes into all of this firefighting. In the past, I think this problem would to some extent be kind of tackled with what we call data governance programs, right, to various degrees of success. Well, let's keep it at that. Probably some, a little bit less successful than we hoped. How is data mesh different from data governance? And maybe as an extension to that, are there any lessons that we can learn from these data governance programs from the past?

Max

Yeah, I definitely think so. One of the biggest things to say first is that data governance, of course, is a part of data mesh, right? The approaches that data mesh has taken might be very much different compared to these central data governance programs that you just mentioned before where somebody was running around more like a data governance police that was with a stick behind the teams, trying to force them to actually take care of some of the rules that they had made up in their ivory tower. 

Of course, we have seen many times that, as you mentioned, too, various degrees of success, rather on the low end on that side, but this is exactly the part where data mesh tries to turn it around and take a more decentralized approach as well to on the one hand side, generally, the concept of data ownership and data responsibility, but to deeper ingrain that into the culture and to not have, again, somebody run around with a stick and force the people to do that, but to actually build incentives for people and to prove to the business as well that there's value behind having people actually take on their responsibility about data.

And of course, there's a whole governance aspect to that as well that comes through things like the moment you start decentralizing things, that means there's a high risk that you will start building up a lot of silos, right? There's a real risk that things start drifting apart and starting to move into totally different directions. And this is, of course, something where again, you need to really make sure that you catch those things early and you make sure that there is still enough alignment for the cases where it really matters.

And this is where also the governance side in data mesh takes a much more federated approach where you actually bring in different individuals from their respective domains to, well, have their say and to speak to each other on a regular basis to understand what are the global concerns that go beyond just the scope of their respective domain. You want to keep the things that belong to one domain, and that only concern that domain, inside of that domain, because that enables, again, agility and speed, and actually allows people to move fast, right? But for the things where there are touch points between the domains, with concerns that go beyond that, there still needs to be the possibility to connect those people as well with each other and to actually talk about that. Arif, is there anything that you want to maybe add specifically for the automation part of that?

Arif

Yeah. Yeah. I was indeed about to go a bit into that. So, I mean governance is basically about making people do the right thing. I mean governance should really be about, I don't know, data protection policies, those kind of things that are really important, right? And I think the important thing about data mesh is that it's all about ownership and responsibility. So the idea is to not force people to do the right thing, but instead create a responsibility and maybe also incentive structure so that people are aware that they should do the right thing and they feel responsible to do the right thing. And the important thing here is that you support the people in the right way, because it doesn't help to just say, "Hey, you have to do this and you also have to do this. And on top, you also have to do this," But instead, it needs to be really easy to do the right thing and you need to get as much support as possible.

And so this is where this whole computational federated governance aspect of data mesh comes in. So the idea is to give as many tools to people that allow for a certain amount of automation. So to give one of several possible examples, the people on the ground who own the data, who create the data, et cetera, they know what's in the data. So they're the ones who can, for instance, say, "Hey, this data field here, this really needs to be encrypted and secured very well because it contains important personal information," et cetera. So they're the ones who really know what's in the data and how it should be treated.

So now, the thing is those people, they need to be enabled to tag the data, for instance, accordingly, and somehow use their knowledge about the data to say, "Hey, this is important here," but they shouldn't be the ones who need to think about the details, how that encryption is happening, et cetera. So they need good tools so they can simply use their knowledge, make good use of it, say, "This piece of data here is kind of sensitive," and they know how to tag it, what to do. And then, they need to be able to rely on automation and on a platform that the right things are then done with the data accordingly.
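To make the tagging idea concrete, here is a minimal Python sketch. It is an illustration only and not something described in the episode: the `Field` class, the `"pii"` tag, and the `pseudonymize` and `apply_policies` helpers are all hypothetical names, and a SHA-256 digest stands in for whatever encryption a real platform would apply. The point is the split of responsibilities: the domain team only declares what is sensitive, and the platform decides how to protect it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A column in a data product's schema, tagged by the owning domain team."""
    name: str
    tags: frozenset = frozenset()


def pseudonymize(value: str) -> str:
    # Placeholder protection (assumption): a one-way SHA-256 digest.
    # A real platform would plug in its own encryption or masking here.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_policies(schema, record):
    """Platform-side automation: protect every field the owners tagged as 'pii'."""
    protected = {}
    for f in schema:
        value = record[f.name]
        protected[f.name] = pseudonymize(str(value)) if "pii" in f.tags else value
    return protected


# The domain team contributes only its knowledge of the data: the tag.
schema = [Field("order_id"), Field("email", frozenset({"pii"}))]
record = {"order_id": 42, "email": "jane@example.com"}
protected = apply_policies(schema, record)
```

In this sketch `order_id` passes through unchanged while `email` is replaced by a digest, without the data owners writing any protection logic themselves.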

Maarten

That makes a lot of sense. So you're creating a platform which could be a platform for people, but also for tools so that the different teams and parts of the organization can do the right things when it comes to data and they can do it on their own, they could do it in a self-serve manner. That would be my summary of that. Does that make sense to you guys as well? Or would you add something?

Arif

Absolutely. 

Maarten

Very cool. All right, so now we have that mission. What can go wrong when we're doing this? And I'm sure there's a plethora of things that can go wrong, but what would you want to share with our listeners that are embarking on this journey? 

Max

I guess we can start off just with the most basic thing that is probably the earliest that people can be doing wrong as well in their journeys, which is not actually understanding what data mesh is about, starting and embarking on a journey based on just hype and buzz and start running without actually really understanding what's the direction that you're taking. And I always love to summarize it as do not commit to the buzzword, but commit to the pillars behind. And this is super, super important as well that you actually need to understand and have deeper conversations as well with some people around you to really reflect on what is the real meaning of the principles behind it. And especially, what is the real meaning of those in the context of your organization?

'Cause one of the most important things to realize is as well, data mesh is different in every organization. Everyone has their own very specific needs for their field of work and that differs from org to org, from company to company. And first figuring out what does data mesh actually mean by the book, but then also reflecting on that and understanding what it means for your company, that is one of the first steps before you really want to get going. And we've seen many companies make that mistake. And I have to frankly say we did the very same at Zalando as well of trying to name something data mesh at the very first step when the whole thing just came out and then, realizing that this is not a good idea only a year later when we reflected much more on where this journey is actually taking us.

Arif

I mean data mesh is honestly a pretty advanced paradigm and a set of pretty advanced practices. So I would say there's also a certain level of maturity that you as a company already want to have before you go into data mesh. So if you have a company that really is very early in their experience with product thinking, for instance, it's really hard to go the next step already and think about data products. You first have to have a good idea of how do you do product management and product development at a company before you can get into the even more complex topic of building data products and doing data product management, for instance. 

Max

And just to share another small anecdote from one of our trainings, there were people that were approaching us who were all like, "Yeah, this whole data product thinking is really nice, but how can I do this federated governance?" And I'm like, "Well, maybe at first you need to understand that one builds on top of the other and that you cannot just start making your governance federated if not even your central team knows what that truly means, but even more so if your decentral teams have no idea how to work with data in the first place."

And that was really, we heard a lot of people that tried to jump ahead of themselves, really trying to first build the platform before they even have the first use case or even especially because, again, a lot of companies have governance issues and they heard there's a new kid on the block that talks about something, something governance. So that is the one thing that they try to immediately jump to.

Maarten

There are already quite some very, very good recommendations in there, not getting ahead of yourself, not caving into the buzzword lingo, and really studying the different concepts and doing some form of maturity assessment on the different areas. If you're mature in one, but maybe not, as you said, Arif, in product thinking or in product management, that could lead to pitfalls. So that's very, very helpful. Any other things top of mind that you would share, other either lessons that you've learned or that you're still learning that you want to share with the audience? 

Arif

So I think the thing that we also keep repeating again and again, but it is really important is that we believe it's important to start small, right? I think you usually need some form of management backing in order to successfully do such a cultural transformation. But nevertheless, it's really important to start with a really specific use case. Start with a team of motivated people who really want to change their way of working, yeah, who are happy to challenge certain things. And then, create a small lighthouse project, a lighthouse example that other people in the company can then look at, right, and see, “Hey, this is interesting to do things in a different way.” Maybe then, the people from that first team can share how things work better or maybe what the specific challenges were. And this way, you can really kind of tell a story, tell different stories and, yeah, drive this culture and those practices through a company. But it really doesn't work if you, for instance, create a huge platform project and plan for 12 months of platform development, and then a six-month rollout of the platform for the whole company.

So it's far more important to look at a specific use case. And you also, you want to have advantages already after three to six months, right? This isn't about kind of a huge upfront investment. This is more of a marathon, right? You will need to stick with it for quite a while, but at the same time, you already want to reflect on it a couple of months in, and you want to make sure that it's already working for you to some extent, and that you already get something out of it.

Maarten

I think that's great advice for organizations that are taking on data mesh. So start small, do it end to end. Make sure you have business value as early as possible. I think those are great points. So are there any particular roles that are most impacted by data mesh?

Max

I would say, first of all, the not very fair answer is probably all of them, but of course there are some that are much more impacted than others. On the one hand side, the engineering teams that are now asked to actually take on more responsibility about the data that they are producing, they would very much be affected as well because they might need to pick up a new tool or two and expand their skill set to actually go a bit deeper into data in general, especially when they're coming more from, let's say a software engineering background, and now they are really asked to take on some challenges around data. The most important role I would say, is probably going to be the data product manager. We have emphasized product thinking for data quite a couple times already today.

And it is really important that you have somebody in the team that has this on top of their mind all the time. That is experienced in product thinking in general, but also is becoming more and more an expert in the data space. And this is really to understand where the value from your data is coming from, to really understand your stakeholder landscape, to be able to communicate with them on a regular basis, to understand what they need, what is the impact that you're actually having on the rest of the organization? And to use this input as well to prioritize the work within the team. And this is really what I would say is probably the role that's the most impacted by that, because that is the one that is essentially driving the adoption of data ownership and data responsibility within the respective teams. And that's the one that really needs to understand best where the business impact is actually coming from that you're trying to reach with the services that you are offering, not just from an IT side, but specifically also from the data side.

Arif

Yeah, I would very much agree with that assessment, that it's really the data product management that is most affected, simply because that role usually doesn't exist, so it needs to be created. And it's incredibly hard to build up those people, to be honest. And you cannot find those people because this is just being developed now, so you want to look for people who have an interest in data and analytics, but who also have experience or an interest in product management, and then you want to bring those two sides together. So you probably need to look for people who have prior experience with either of the two sides and then see how you can build up the other side. So that is really a pretty tough one, but a very important one.

And every time that we manage to successfully build up those people, companies have been very grateful and happy to have those people around. Another role that I would still mention is actually that of the data scientist. It might already be affected before introducing something like data mesh. But let's say depending on the maturity of your company, there are still a lot of companies where data scientists work in an isolated manner or way in their little labs or pockets or so, and they're not integrated in general engineering practices, such as continuous delivery, checking all your work into version control, versioning your models, your data, et cetera. And I think data mesh is actually not really talking about this much, but in a fully functional data mesh, there is no way around integrating data scientists with all the other engineering and product roles. And depending on the company, this can be quite a big shift for the data science role.

Maarten

That makes sense. And I'd love to double click on the data product manager role a little bit. So you already mentioned that it's, in most cases, not there yet. My assumption would be of course, in companies that deliver software products, you'll have more product managers, so you might have an easier time to find them. But if you're not a software company, you might have a harder time. Ideally it's somebody that has one of both skill sets, so either product management or in data, and I guess it could be a variety of backgrounds or roles in data. Would there be any other thoughts or recommendations, maybe even a high level job description, just to help our listeners more think about or visualize what that role would actually look like in a day-to-day, how they can look for or find these people within their organization? Any pointers there that come to mind?

Arif

Yeah, that is a tough one. So from my experience, I would probably start to look for people who have a passion for product development, or basically people who work. So I would probably first look for people who don't mind talking to stakeholders, understanding the needs of customers and consumers, because this is what you need to have a passion for, that you try to understand what the needs, what the desires of other people, usually within the company are. And then I think you can ideally look whether among those people, there are some that are also interested in data, and if not, you can try to upskill them on that side. Of course, it can also work the other way around, but I think it's really important that you find people who like to work with people, and are not only enthusiastic about data and data technology and analytics results, et cetera.

Maarten

Max, anything you want to add there?

Max

Well, I can also speak a bit from my experience here, in the sense of, on the one hand side, we were lucky enough to some extent, to be already more on the side that you mentioned before, that we had already been a pretty product-driven company at that time. So even a lot of the, let's say software engineering teams that were providing a lot of the data, they already had product support for the parts of the services that they were providing. And from that angle, of course, it makes it easier to take somebody that already has the product thinking, already has a network internally of stakeholders that they are regularly communicating with, and to upskill them on the data side, than to have it the other way around, where you have some people that might be passionate about data, but they are more focused on technology and diving into the details and really being interested in the analytics of the specific facts that they're actually looking at.

From that angle, I think we've always been on the more lucky side, let's say to start that journey. Of course, we are growing as well. There's new teams that are being built up and there's, again, also people that we need to bring in from the side or that we need to newly introduce to these roles. And I have seen people as well that were coming from a data engineering space or from a data analytics space that started to move into product as well. And again, one of the biggest traits that Arif also highlighted, those were usually the people that were very interested in working with people and communicating all the time, and understanding and trying to dig deep into what their stakeholders actually need. So from that angle, the experiences are really matching here from my side as well.

Maarten

That makes tons of sense.

Nice. So what will the topic of the second book be about, then?

Max

I think we have to admit, this is not the fully sized book that you would expect on a topic like this. On the one hand side, of course, we've written it rather early throughout the development of the data mesh topic in general, but that also makes it a great intro point. I think it has now reached a size in the format where you can easily run through that in a day and get a very good grasp on what the topic is about, and what are some of the key points that you need to start thinking about when it comes to reflecting this on your organization? Now, the interesting part is, of course, we are curious to dive deeper into that. Since we wrote the book, there's still plenty of practical examples and practical experience that we are collecting each and every day when applying those things in practice.

And as you might have noticed, we are very curious individuals to also share these things with the community and with a broader audience. And I think it's really important to reflect on these things as well, again, after a certain amount of time and after the topic itself has matured, but also after the organizations that we are working with have matured in their adoptions of the topics. And I think one of the biggest points that we want to follow up on is really to continue outlining the journey and to start gathering more and more practical input as well, and again share these practical examples with the community and turn that around so that people can really have something tangible that they can work with, that they can touch and that they can really connect with when it comes to applying these things to their own data mesh journey as well.

Arif

Yeah. I think as Max said, the second book will probably be the first book in an extended version, fleshing it out at various places. Because yeah, we are experiencing so many interesting challenges as we work with different people, with different companies, and they're just really very interesting questions, like, I don't know, how do you build consumer driven contracts for the output ports of a data product? Really going into detail and sharing best practices and maybe pitfalls. I think this is really what we want to go deeper into.

Maarten

Awesome.

Max

And of course, because of Arif, we also have to bring data mesh to academia.

Arif

Oh, yes. Of course.

Maarten

Very nice.

Arif

We will.

Maarten

Very good. So later in this series, we'll have a guest from Google's site reliability engineering team join us, and we're going to dive into the pillars of site reliability engineering, but also how it applies to the data space. Arif, do you have any views or perspectives on this that you want to share?

Arif

Yeah, sure. So in fact, I think Zhamak has also often said that data mesh is, to a certain degree, about bringing engineering principles or practices from operations engineering to the data space. And so data mesh is a lot about bringing things that now have a name, which is site reliability engineering, to the data space. So I think data mesh is a lot about working professionally, so to say, with data and building contracts, for instance, or also building service level agreements, those kind of things about data. And I think this has, as always with data mesh, not only a technical component. So it's not only about describing a technical contract as with an API or so, but it also has this semantic component, where you really maybe also want to write a document about the assumptions that you have about the data. 

How do you expect the data to behave, to develop into the future? How often is that updated? How reliable is that particular stream, et cetera? And of course, it's great if you can formalize many of those things and then maybe even automatically check those parts of a contract, but I feel that many of the things that you need to talk about in data cannot be, or it's pretty hard to check them automatically at this point in time. So I think it's still worthwhile to also just write down your expectations and your assumptions about data and agree about that with your data stakeholders. This is really the important point there. An analogy that I often use is that a federated data governance team, as we envision it in data mesh, as Max said earlier, shouldn't act as a data governance police, but more, I would say like a notary. Basically someone who helps a discussion between two parties, between the data consumers and the data producers, and helps them create a good contract. Help them to ask the right questions, help that the right things get included in the contract, et cetera, to prevent arguments later. This is what a good notary is doing. And I think this is an important part of this SRE idea but as you see again, bringing it also a bit more to the people aspect.
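As a rough illustration of that split between what can be checked automatically and what can only be written down, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than something from the episode: the `DataContract` class, its fields, and the `check_contract` function are hypothetical, and the checks cover only two formalizable expectations (required columns and freshness). The prose assumptions travel with the contract but are deliberately not machine-checked.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract between a data producer and its consumers."""
    required_columns: set        # formalizable: checked automatically
    max_staleness: timedelta     # formalizable: checked automatically
    assumptions: list = field(default_factory=list)  # written down, agreed, not checked


def check_contract(contract, columns, last_updated, now):
    """Return human-readable violations of the machine-checkable parts."""
    violations = []
    missing = contract.required_columns - set(columns)
    if missing:
        violations.append(f"missing columns: {sorted(missing)}")
    if now - last_updated > contract.max_staleness:
        violations.append("data is staler than agreed")
    return violations


contract = DataContract(
    required_columns={"order_id", "amount"},
    max_staleness=timedelta(hours=24),
    assumptions=["amount is in EUR and includes VAT"],
)

now = datetime(2022, 8, 18, 12, 0, tzinfo=timezone.utc)
healthy = check_contract(
    contract, ["order_id", "amount", "currency"], now - timedelta(hours=2), now
)
broken = check_contract(contract, ["order_id"], now - timedelta(hours=30), now)
```

Here `healthy` is an empty list, while `broken` reports both the missing `amount` column and the stale update. The statement about currency and VAT stays a written agreement between the two parties, which is the part the notary analogy is about.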

Maarten

That is very interesting and definitely something we second or echo at Soda. We've been tirelessly working on making language for these expectations available for all your producers and consumers. Really resonates with me as well.

Arif

Awesome. 

Maarten

You've mentioned the term data product, and I've personally struggled a little bit with that. Data products, in my mind, are a finished product. It's like a machine learning model, for example, that's kind of driving a feature in your application, or it's a dashboard that your operations teams or analysts are using. When I read Data Mesh for the first time, it explained data as a product, so that we should think of a data set or of data as a product. And therefore how I understood it was that the data sets that we, for example, as a data team publish on the mesh are products in and of themselves. And I'd love to hear if you share that opinion that there's some confusion, or is it something that's very clear cut in your minds?

Arif

Yeah. You're not the only one struggling with that term. And indeed, there's a lot of confusion about that term, data product. First of all, you're right. The key idea about data product is that product thinking is applied to data. In this case, I would even say that for instance, if you only apply SRE, site reliability engineering principles on a dataset, that is not a data product, because you haven't applied product thinking. And the latter I think is even more important. Now where the confusion comes from is that as you said, well, if I build a machine learning model or a dashboard and I apply product thinking, then this must be a data product. And this is indeed not the case. And the second term that I usually introduce here is actually that of a data application.

And I usually say that you have data products and you have data applications that build on top of data products. And now the way, or what is distinguishing a data product from a data application is that you have not only applied product thinking on a data product, but you have also invested in reusability and composability. And that is the thing which you do not do when you build, for instance, a dashboard. A dashboard, you can build it as a great product, but you don't build a dashboard so that another dashboard can be built on top of that dashboard. And this is really the thing. So what creates that confusion is that data mesh has this organizational aspect to it, where all the product thinking, et cetera comes in, but it has also this architecture aspect to it.

And as we say, the architectural, what is it…The architectural quantum of data mesh is that of data product. And this idea of an architectural quantum that only works when you make that investment in making this a building block. And this is exactly what makes a data product. A data product is a data set, or it can also be a machine learning model, something that is providing data where product thinking has been applied to, and that serves as a building block to build other data products off it, or build data applications on top of it.
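One way to picture that distinction is the following minimal Python sketch. It is an assumption-laden illustration, not an implementation from the book or the episode: the `DataProduct` class, its `output_port` method, and the sample rows are all hypothetical. A data product exposes an output port that other data products can build on, while the dashboard at the end only consumes and offers nothing to compose with.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DataProduct:
    """A reusable building block: it serves data through an output port."""

    def __init__(self, name: str, produce: Callable[[], Iterable[Row]]):
        self.name = name
        self._produce = produce

    def output_port(self) -> List[Row]:
        # The interface consumers and downstream data products rely on.
        return list(self._produce())


# A source-aligned data product (sample rows are made up for the example).
orders = DataProduct("orders", lambda: [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
])


def revenue_rows() -> List[Row]:
    # Composability: this data product is built from another one's output port.
    totals: Dict[str, int] = {}
    for row in orders.output_port():
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return [{"customer": c, "revenue": t} for c, t in totals.items()]


revenue_per_customer = DataProduct("revenue_per_customer", revenue_rows)


def render_dashboard(product: DataProduct) -> str:
    # A data application: it consumes a data product but exposes no output port.
    return "\n".join(f"{r['customer']}: {r['revenue']}" for r in product.output_port())


dashboard_text = render_dashboard(revenue_per_customer)
```

Both `orders` and `revenue_per_customer` can serve as inputs to further data products, whereas the rendered dashboard text is an end point, which mirrors the reusability and composability investment described above.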

Maarten

That is super helpful. Will change my vocabulary to start using data applications now. That makes a ton of sense to me. Thank you for that.

Arif

Sure. You're welcome. 

Maarten

Very good. Arif, Max, thank you. It's been a great conversation. A few points I'm taking away are, the difference between data products and data applications, what to do next time we're in Berlin, and how to start with a Data Mesh initiative. Start small, prove value early and take it from there. It's been fascinating to hear how you have put Data Mesh into Practice. Thank you for sharing that with our listeners. Thank you for sharing that with the data community.

Arif

Thank you so much. It was a pleasure to talk to you here.

Max

Yeah, it was great to be here. 

Natasha

That was a great conversation. When our peers share, it's an opportunity to listen where others have tried, where others have succeeded, we can learn. Oh, for the power of trusted relationships and this data community. Join the journey and get connected. Follow Soda to be the first to know about new conversations as soon as they drop. We'll meet you back here soon at the Soda Podcast.

Aug 18, 2022
-
S1 Ep06
In Conversation with Arif and Max: Data Engineers, Data Mesh Practitioners, and Good Friends in Data