Data Dream Team
S1 Ep14

Focusing on the Outcome with Harvinder Atwal

About this Episode

Harvinder Atwal is the Chief Data Scientist at Moneysupermarket Group who is passionate about people, algorithms, processes, and technology. He believes that the data lifecycle is a team sport and will share his experience on structuring and organizing teams, product thinking, and hiring, training, retaining, and growing skills for the roles in your data team.

Listen to Jesse and Harvinder as they discuss how Harvinder has applied Lean Thinking, DevOps, and Agile methodologies to develop the right organisation and structure for data management and governance.


Episode Transcript


Welcome to the Soda Podcast. We're talking about the Data Dream Team with Jesse Anderson. There's a new approach needed to align how the organization, the team, the people are structured and organized around data. New roles, shifted accountability, breaking silos, and forging new channels of collaboration. The lineup of guests is fantastic. We're excited for everyone to listen, learn, and like. Without further ado, here's your host, Jesse Anderson.


With me today is Harvinder Atwal. Harvinder Atwal is a data professional with an extensive career using analytics. He's done all kinds of things. He is well-known within the DataOps community. In fact, he even wrote a book about it. We're going to talk more about that. We'll talk about data science and engineering, and then how we deal with the talent shortages we're dealing with. Harvinder, welcome. Would you mind introducing yourself a little bit more?


Thanks Jesse. Thanks for having me. So I'm Harvinder. I'm a data professional. I've been working in data and using algorithms for 25 years now, long before it was the world's sexiest profession. I started off actually in operational research, in a role in the government. From there, I moved on to an organization called Apax, where I was looking at the impact of the internet and chip-and-PIN on payments and finance. From there, I went back into more traditional OR at British Airways, working on a wide variety of problems across the whole airline, from pricing, engineering, and manpower planning to fuel hedging, you name it. From there, I moved on to Lloyds Banking Group, and that was my first experience of large-scale analytics using customer data.

So Lloyds at that time had invested a huge amount of money and resources well ahead of other competitors into building analytical capability. I moved from Lloyds onto Dunnhumby. So those of you who don't know who Dunnhumby are, they were the sort of pioneers of loyalty schemes. So using sort of supermarket transactional data for Tesco and Kroger in the U.S. to drive loyalty to those businesses. And then more recently, I've been working at Moneysupermarket, which is the UK's largest price comparison site, allowing consumers to compare prices on a whole range of products from insurance, credit cards, home broadband. There's literally hundreds of products that you can come to our website and get a better deal on.

Along the way, I've seen a lot of change in data and analytics. And we'll be talking a lot more about that through the podcast.


Well, I'm excited to have you here because you have such a wide variety of experience. And for those of you who don't know what OR is, that's operations research. If you've never had the chance to talk to somebody about it, it's fascinating because of the sheer amount of data they look at from an operations perspective.

And so, as you've looked over your career, you've actually shared your knowledge with people. You wrote a book called Practical DataOps: Delivering Agile Data Science at Scale, and this was an incredible book. And if you haven't read it, I highly suggest you do that. If you've already read Data Teams, my book, you know that I deal with DataOps a little bit, and I really consider Harvinder's book, Practical DataOps, to be complementary and really expand on that subject even more than I did. So I would highly suggest you read that as well.

So with that said, tell us a little bit about your book and why you chose to write that.


Yeah. So as I mentioned, I've been working in data analytics for 25 years and a lot of the problems are still the same; they're still the same challenges. So if you look at the data about how well data is used in organizations, it's not exactly pretty. According to NewVantage Partners, only 7.3% of organizations say that their data analytics is excellent. Only 22% of companies, according to Gartner, say that they're currently seeing a return on their investment in big data analytics. So there's definitely a challenge there. And the reason I wanted to write the book was to share some of my experiences from Moneysupermarket and earlier, of how you try to overcome these challenges and drive better outcomes through data.

And one of the problems is that a lot of people have been repeating the same sort of things over the years, around the need to deliver actionable insights, to make the organization more data literate, to have self-service analytics and so on. But these things clearly weren't making a difference.

So working in a digital-first organization, I had a lot of experience working alongside software engineers, software developers, and product specialists: product managers, product owners. And I realized that they also had similar challenges in terms of how you deliver business impact and beneficial outcomes when you're dealing with a lot of complexity and uncertainty. So I started to apply a lot of their practices and methodologies to data, starting in the area of personalization. And I found that, with adoption, they could actually help deliver a lot more value to the organization. And I wanted to share that experience.

So at that time, I didn't know that it fell under the area of DataOps. I thought I was just applying best-practice methodologies from related domains. It was only later, when I came across the work of people like Kate and others, that I realized that actually this wasn't unique to Moneysupermarket at all. It was quite a common challenge across all organizations. And there were several people thinking along the same lines.

So to give people kind of my definition of DataOps, I know there are a few competing versions. Some people more narrowly describe it as applying DevOps principles to data. I think of it a little bit more widely. For me, it's about employing best practice from software development. So yes, DevOps, but also agile methodologies and lean thinking, and some of the work that has come around in product development. So think of things like lean startup, and how you can apply that to data.

But data itself has some unique challenges compared to those areas. So although on the one hand you can think of it as data product development, and you can apply product development and product management techniques to it, there are nuances to data. So for example, if you're building a software product, a classic example is code for a calculator: you can develop that code, you can test it, and you can be pretty confident that the code will work as well in a hundred years' time as it does today. The problem with data is that data is constantly changing. So data isn't always captured in a uniform and consistent way over time. There can be lots of edge cases in the data that you don't necessarily know about at the point you're doing development with data.
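The calculator-versus-data contrast can be sketched in a few lines of Python. This is illustrative only, not from the episode: the field name and checks are hypothetical, but they show why code tests can be run once while data checks have to run on every new batch.

```python
# Illustrative only: a calculator function can be verified once with fixed
# tests, but a data pipeline needs checks that run on every incoming batch,
# because the data itself keeps changing.

def add(a, b):
    return a + b

# Code test: if this passes today, it passes in a hundred years' time.
assert add(2, 3) == 5

def validate_batch(rows):
    """Return a list of problems found in a batch of incoming records."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("price") is None:
            problems.append(f"row {i}: missing price")          # missing data
        elif not isinstance(row["price"], (int, float)):
            problems.append(f"row {i}: price is not numeric")   # changed format
        elif row["price"] < 0:
            problems.append(f"row {i}: negative price")         # edge case
    return problems

batch = [{"price": 9.99}, {"price": None}, {"price": "9.99"}]
print(validate_batch(batch))
# prints ['row 1: missing price', 'row 2: price is not numeric']
```

The first record passes; the other two are exactly the kind of edge cases that only show up once real data starts flowing, which is why such checks belong in production rather than in a one-off test suite.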

So it's about adapting those techniques to data. It's combining the best practices from lean manufacturing, software development, and product development, and applying those to data. And really, my goal was just to share my experience and to help organizations overcome some of these challenges and get away from a lot of the legacy thinking, which still exists in a lot of organizations, around how you actually use data.

So historically, a lot of organizations have been... You build a data warehouse, you put some BI tooling on top, and off people go. Or more recently, it's about hiring data scientists, giving them a MacBook Pro, getting them to create AI and ML models, and hopefully some magic will happen and they'll create lots of value. Those of us who have worked in this industry for a long time know that's not the case at all.


It is not the case. One of the things I really appreciate about what you did and other books of that same vein is when people share their personal experience, that firsthand experience, there's no substitute for it. And that's part of the reason why I liked your book so much; you shared your firsthand experience doing this. So could you share with us another story about data? What is your greatest story that you've never told before?


Okay. So this is, I guess, not a DataOps story. It's more of an interesting story from my career, from when I was at British Airways.

So going back to about 2005, there was a fuel supply depot just to the north of London, which exploded. At the time, it was, I think, the biggest fire or explosion in Europe since World War II. And it was so big that even though I lived about 30 miles away, it shook the windows of our house. Now, I didn't really think too much about it, but then I got a text from my manager at British Airways. At the time, I was seconded to his team, the fuel team, looking at things like optimal fuel hedging strategies. And he just texted me to say, "Make sure you're in early tomorrow." So I was just curious, and didn't really make the connection between the two things.

So I went in the next day and he told me that the fuel depot actually supplied a third of Heathrow Airport's fuel. So if you do the math, you realize that within three days, the airport was going to run out of fuel, and there was no way of making up that supply any other way. And I had basically a day or two to come up with some sort of plan. The plan I came up with was to get aircraft to fly in excess fuel, so that they needed to take on less at the airport for their outward journey. So basically, I had to do a whole lot of complex modeling, very, very quickly, to understand how much excess fuel different aircraft on different routes could bring in.

So aircraft have lots of limitations, like maximum takeoff weight, so they can't carry unlimited amounts of fuel. They also have maximum landing weights as well. So if they land too heavy, with too much fuel, they will break the undercarriage, which is also not great.
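The two weight limits Harvinder describes can be expressed as a toy calculation. This is a hypothetical sketch, not his actual model, and all the figures are made up for illustration:

```python
# A toy sketch of the fuel-tankering constraint: how much extra fuel an
# aircraft can carry inbound is capped by both its maximum takeoff weight
# (MTOW) at departure and its maximum landing weight (MLW) on arrival.
# All numbers below are invented for illustration, not real aircraft data.

def max_excess_fuel(zero_fuel_weight, trip_fuel, mtow, mlw):
    """Max extra fuel (same units as the weights) an aircraft can tanker in.

    zero_fuel_weight: aircraft plus payload, excluding fuel
    trip_fuel:        fuel burned on the inbound flight
    mtow:             maximum takeoff weight
    mlw:              maximum landing weight
    """
    # Takeoff limit: departure weight (zero-fuel + trip fuel + excess)
    # must not exceed MTOW.
    takeoff_headroom = mtow - zero_fuel_weight - trip_fuel
    # Landing limit: arrival weight (zero-fuel + excess, trip fuel burned)
    # must not exceed MLW; land too heavy and you break the undercarriage.
    landing_headroom = mlw - zero_fuel_weight
    return max(0, min(takeoff_headroom, landing_headroom))

# Hypothetical narrow-body, weights in tonnes:
print(max_excess_fuel(zero_fuel_weight=60, trip_fuel=8, mtow=78, mlw=66))
# prints 6: here the landing weight, not the takeoff weight, is the binding limit
```

Repeating this per aircraft type and route, and summing across every inbound flight, gives the kind of airport-wide excess-fuel estimate the story describes.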

So I had to do all this modeling, not just for British Airways, but for all the aircraft and airlines at Heathrow. And I managed to build a very, very accurate model, because later on, the airport operator shared details of how much fuel all the different flights had actually used beforehand. And I came very, very close in my predictions to how much fuel each aircraft could carry and needed for any particular journey, in any aircraft combination.

So I guess, that's one of my proudest moments in my career. I was able to keep basically the whole of Heathrow flying by calculating exactly how much each pilot needed to bring in, in excess fuel at the lowest possible cost from around the world. And probably, very, very few people know how close Heathrow actually came to actually shutting down for several months as a result of that explosion.


That is an awesome story. I loved it on many levels. Did you at least get a plaque?


No, I wish I did. No, I think there were just a few pats on the back.


All right. Unsung hero, Harvinder Atwal, you heard it here. Okay, so going back to what you were talking about before. We talked a little bit about DataOps. Tell me more about the structures for those teams. Were you doing exactly what you talked about in the book for DataOps, or were you doing something different?


So maybe let's take a step back here around team structures. The fact is, there's no perfect team structure. Every organization structures itself in a different way. Amazon is different from Google. They're different from Facebook, and so on. And it's really about choosing, I call it, the least bad option for your organization. As I mentioned, there's no perfect way. You're going to have to make some compromises somewhere. Now, the way I look at it is, I've tried many different models.

So typically, in terms of team orientation, you might have functional teams. A functional team is organized by its technical expertise. So you might have a team of data scientists, data engineers, testers, BI developers, and so on. Now, that's really good for resource utilization, because it means everyone's always busy. As soon as someone is available, you can hand them a project or a piece of work to do. But it's not necessarily great for speed, because there are actually interdependencies between teams. So if you think of it as a data life cycle and a data journey, you have raw data, which is your start point, which is the data that you capture, through to your data product at the end. There are lots of processes, lots of transformations that data goes through from that raw state to a usable state.

And every time you introduce a team, you introduce handoffs, you introduce backlogs, you introduce prioritization bun fights, and so on. And you also become very aware of imbalances between teams. So certain teams will become a bottleneck for many, many of the other teams, because they're not resourced sufficiently.

So functional teams are historically the way a lot of organizations have organized their data teams, but as I said, while it's great for resource utilization, it's not great for doing things very quickly.

There've been many other sorts of team orientations considered by lots of different organizations. They also have their downsides. If you have multiple reporting lines, even if they're dotted, there are questions around who responsibility, accountability, and prioritization actually sit with. It gets very confusing.

And then there are domain oriented teams. So a domain might be a customer, product, value stream, segment, market and so on. That's the kind of model that I've come to favor, having tried some of the others, which is to organize the data teams by a domain. So a domain for us might be some of our brands. It might be marketing. It might be the customer, anything at customer levels, like personalization or communications.

And the idea there is that essentially teams are long-standing and they're focused on the areas, so they get to understand both the data and the stakeholders, and also what's happening externally. The downside there though is that these teams could in theory become isolated from each other. So if you've got data scientists split across multiple teams, how do you enforce best practice, reusability, consistency of hiring, and promotion and so on. So that's where you have to be kind of very careful to make sure that these connections remain between these functional specialists.

And there are various ways you can do that: the Spotify model's chapters and guilds, communities of practice, centers of excellence, a specialist coordinator, and so on. So I've explored various models, and that's the model that I've settled on, which is what I think works best for an organization of our size. Obviously, when you get larger, you could consider changing your model again. Google is very famous for having very functional teams, as is Apple.

And then there's another dimension which often gets overlooked, which is that the reporting lines and the team orientation don't necessarily have to be the same. So a lot of organizations might start with a very centralized team, which will work on whatever they think is the highest priority across the whole of the organization, or across the stakeholders they work with. Now, that's good in a way. In that you're constantly making sure that you're working on the most valuable projects, but the downsides there are that you become a little bit detached.

You become a bit of a consultancy and potentially a bit marginalized from the rest of the organization. And also, because you're working on the highest priority things, which could be very different next week from what they were last week, there's very little continuity. So if you recall, DataOps is very much product versus project thinking. The difference being that whereas a project might have a very defined end date, a product will be long-lasting. And therefore, you need to maintain that product, you need to iterate it. And it becomes quite difficult if you're a centralized team jumping from project to project to have that product mindset.

I guess you could go to the other extreme, which is decentralized teams. And you can go to, like, the Netflix version of that, which is very decentralized, integrated teams, where they'll have data science for marketing, data science for products. And those teams are very independent. They could even be using completely different technology.

But again, I think that the best approach here is a little bit more of a hybrid approach, where you have a central data function, but the teams are domain orientated and sort of embedded, or at least very closely aligned to a particular domain area. Kind of a very, very long answer.

So the summary there is, I've tried lots of different approaches to how you structure teams. The truth is, there's no perfect way, but the one I've settled on for our organization is domain-orientated teams, long-standing, within a central function. But those teams and those functional specialists do need to be tied together, so that you're ensuring that best practice is spread across all those teams.


So you mentioned that there's no one size fits all, but let's say I'm a CXO, CTO, chief analytics officer, or VP. And I'm looking at my teams, how do I see when they're organized improperly? What would be that manifestation at that level?


Yeah. So I guess there are a few red flags. One is when teams grow too large and communication within teams becomes a challenge. So when you find that communication within a team is not working particularly well, that people aren't aware of what other people are doing, or there are problems sequencing work within a team, that's usually a red flag that a team perhaps needs to be split.

Another one would be if your teams are becoming bottlenecks for other teams. So for example, if constantly people are coming to you for escalation saying, "I need my work prioritized, but it's being blocked by blah." Then that's also usually a red flag to say, "Look, maybe we are not organized to optimize the flow." And there could be a number of reasons for that. So it could be, you have bottlenecks in terms of skills, in which case it could just be a case of hiring the right people, or it could be that actually your team has become too much of a functional silo for other teams. And actually needs to become more cross-functional.

The other red flag for me would be where there are imbalances in terms of the output of your teams and the outcomes they're driving. So you might have one team which is creating really great outcomes, and another which, when you ask them to demonstrate the value they're creating for the organization, struggles to justify what they're actually creating and doing with their time.


Thinking about the roles on a team, coming at this from different angles, what should we do if we can't build a full team? So to start off with a discussion, I think there are startups, where they just plain don't have enough people yet. They're just too new. What do you think they should do? This is a common question I get and you probably get too.


Yeah, so startups are interesting because there's almost like a chicken and egg problem. So typically, they want to hire a data scientist in order to essentially look at their data and understand what they can do with it. Well, they might even have defined problems, where they think a data scientist or a data analyst as a specialist can come in. But the challenge is that the data scientists and the data analysts need good quality data. They need access to data. They need access to the platforms and technology to do their job. And if they don't have the support of a data engineer, they're going to struggle to do that themselves. Or if they try to do that themselves, and do not have the right skills, they are building up technical debt and big problems for themselves in the future.

But the problem is if you bring in a data engineer first, then the organization will be asking, "Well, what value are they creating?" It very much does depend on the startup and their stage and what they're trying to achieve. But I think on average, on balance, I do think it's best to invest in the data foundations first. Make sure that you have good quality data that people can work with, that you have the right platforms and technology in place, and then bring in data scientists and data analysts to work alongside that data engineer to refine that further.


So now let's further refine that question. This is the other side of it. It's a medium-sized, it's a large-sized business, and they're just starting out. What would you say to them?


So if it's a medium-sized business, they can actually afford to hire, I would say, a bigger team to begin with. So I spoke a bit about domain-oriented teams and cross-functional teams. And the reason I'm keen on cross-functional teams is because of the famous McKinsey research, which looked at companies across many different industries and tried to separate what made the companies who use data very successfully different from those that didn't. And they had about eight key findings.

So what they found was that those breakaway organizations were twice as likely to use cross-functional teams and agile methodologies as those companies which were struggling to use data. But the challenge is that not all roles or personas really fit within a cross-functional team. So for example, you wouldn't necessarily want a security expert or an architect in every team. So there comes a stage where you need to do some separation between the data platform itself and the users of that data platform.

And I think for a medium sized organization, you can do that split. So you can have a team who are responsible for data management and the data platform, which will consist primarily of data engineers, maybe some solutions architects, DevOps engineers, security experts, depending on the size. And then you have more domain oriented cross-functional teams that use those platforms. So I think as the organization starts to get bigger, you can have more specialist roles, but you do start to then have that distinction between more generalists and the platforms and the cross-functional analytics teams.


Hopefully, the people listening to this podcast are CXOs and VPs. It's usually medium-sized and big companies saying, "How can I do this as cheaply as possible, with as few people as possible?" That means they haven't fully bought in. Or in some rare cases, it means that they just didn't know; now you know. And knowing is half the battle. But yes, you need it. So a great response, Harvinder, thank you.

As you look at the data team, what's your favorite role there?


Oh, that's like trying to choose between your children. I have to admit, I have a lot of respect for data engineers. I think it's definitely probably the most challenging data role. And the reason I think that is, data engineers usually sit between the data producers and the data consumers. By sitting in the middle, one of the challenges they have is that they're dealing with data producers who don't necessarily have any incentive to produce good quality data, or to often even communicate changes in that data capture process. So they have to deal with ever-changing data. Often, they're not notified of those changes.

And then, they're dealing with data consumers who are very demanding. So they're always asking for more data, and part of that is natural. If you're a data scientist, you often don't know or don't understand what data you actually need until you start exploring it. But it's those data consumers that create the models, that create the analytics, that create the recommendations that kind of get the plaudits. So data engineers are often the very unsung heroes within data teams. And so for that reason, I think they deserve a lot of respect.


Well, thank you. Sometimes my writing is accused of being too data engineering focused. I do that to overcome the focus that's on the data scientist. So thank you for shining that light on the data engineers and why their job is so difficult.


Yeah. I mean, I think they do deserve the spotlight in a lot of organizations. There's an old saying that some roles don't get noticed until things go wrong, and then everyone's pointing fingers. But most of the time, things do go right. And people should definitely be rewarded for that.


So one of the things I like to do is more like group therapy, saying, "That problem that you're experiencing is actually relatively common." So would you mind telling us about a challenge that you've encountered frequently when you're dealing with data teams?


Yeah. I mean, that challenge I spoke about with data producers is quite common. But I think there are a couple of common challenges.

One is with data producers: they often don't understand how the data is used within an organization. They don't think of the downstream consequences of the changes they might make. And there are lots of things that could change. As a digital business, we may decide that we're going to change the format of the data we capture, or worse, not capture data which is very valuable, not check the quality of that data, not notice that data has gone missing. And so the onus is on downstream teams, and often it's the consumer of that data who's the first person to say, "Actually, this output, this report, or this prediction doesn't look right." And then you trace it all the way back up to data capture, which isn't right. So that's a very common challenge.

And it leads to very defensive practices: lots of effort goes into making sure that you're looking at anomalies, that you have lots of tests and monitoring in place. But it's better to try to solve the root cause there, which is to go to these teams and say, "Look, data is extremely valuable to an organization. These are the ways that it's used." It's a team sport, basically. The entire data life cycle is a team sport. We can all be more successful if we just collaborate better and communicate better.

And then at the other extreme are data consumers. The challenge is that they often don't understand what they need or how best to use data. So a lot of effort goes into that. Firstly, I think the best way to overcome this problem is to be very much aligned on outcomes, to share targets and objectives with your stakeholders, to make sure that you're working together and you're fully aligned. And a lot of work needs to go into making sure that you are communicating how data is best used to reach those objectives. Because one of the challenges is that everyone, or most people, in organizations are quite numerate, so they think they know how to use data. But the challenge is that a little bit of knowledge can be very dangerous.

The world has moved on in terms of the analytical techniques and how you can use data, predictive and prescriptive analytics, to reach your goals. But a lot of people who are not from a data science background, or an operational research background like myself, wouldn't know that. And so, naturally, they would think about using data in a different way. So those are the two core challenges that I see: data consumers don't know how to get the best out of data, and data producers don't know how that data is used and how they can help data teams be more successful.


We've talked a lot about what the data team is responsible for. What do you think they aren't responsible for?


Yeah. So for me, we want data to help make better decisions, ultimately, at the end of the day. So whether it's going onto Netflix and getting a recommendation for the next movie or TV episode that you should watch, that's a decision, right? Of all the content that Netflix has on its platform, what is the most relevant to display to you on the screen at that point in time? Or for Amazon, what is the right product? Or for Google Maps, what is the best route for you? These are all examples of how you use data to make a better decision than you would be able to in the absence of data, having to go on gut feel and human judgment.

So if you think about it from that perspective, the role of the data teams is to support data-driven decision-making and specifically, analytical decision-making. Or it could be data products that enable other people or other systems or processes to make better decisions.

And so for me, anything that falls outside that remit should not be the responsibility of a data team. So looking at transactions so that you can bill your customers, for example, I don't necessarily think that should be the remit of a data team. So that's where I'd draw the cutoff. If it's an operational use of data that doesn't need decisioning, it's just very straightforward business logic, that falls outside the remit of the data teams. Although it can be pretty gray, because the same data can be used for multiple purposes.


Excellent. As we put together this podcast, we talked a little bit about the difficulties we're having, and this is a difficulty that we're seeing not just in Europe, we're seeing this in the U.S., I have clients in South America who are hitting this, we're having a talent shortage right now. And that talent shortage is in all areas of data teams. So that leaves companies with some difficulties that both of us have highlighted. So to start off with, how would we try to look internally at people and try to see if they are viable parts of a data team?


Yeah. I mean, the supply-demand imbalance continues to persist. I was looking at LinkedIn's Jobs on the Rise report the other day, and data scientists are still in the top 15 fastest-growing jobs, both in the U.S. and the UK. Data engineering was up there too. So it's a common challenge globally.

I think looking internally, yes, I have had some experience of people making the transition from, say, software engineering to data engineering, or from an analytically orientated role to data analyst, and then on to data scientist. And I think there are some pros in that approach. Naturally, hiring someone who has the aptitude and ambition, and an understanding of the organization today and how it works, is helpful. I think there are also some cons.

So I think working with data requires a particular mindset. Going back to the example I shared earlier, if you're in software engineering, things are a little bit more black and white: when you test your code, it either passes its tests or it doesn't. With data, you are dealing with both code, which needs to be correct, but also data, and edge cases, and lots of things that can go wrong. You can have missing data, duplicate data, changing formats, data out of bounds, and so on. And then at the data science end, you need to have an understanding of advanced analytics, so ML and so on. So I do think you need a particular mindset and a set of foundational technical skills to begin with. Some people in the organization may have that and can make the transition, and we have done that with people.

But I think there are actually relatively few people with those existing skills in the organization. You will find some. There are always really good people who are curious, ambitious, willing to learn and pick up new skills, who are very hungry and eager to make that transition, and you can work with them. We offer training, we sponsor people for certifications and so on, to help them on that journey. But I do think you need to think very carefully about whether you have the right people internally, versus trying to hire people externally and creating a pipeline that way.


Speaking of those external people, I have a sneaking suspicion that there will be people trying to break into data engineering and data science who are listening to this podcast. I saw a meme the other day that gave a choice of which video to watch. Do you watch the YouTube video about how to pass a FANG employment interview, or do you actually buckle down, watch the introductory video, and build your skills? And everybody is asking the question, "How do I just pass that interview?" That's, in my opinion, the wrong way. So what's your advice there, Harvinder?


Getting into the FANGs is really tough. Obviously, they have many applicants per role and they set the bar very high. I mean, if people want to go for that moonshot, fine. I wouldn't want to stop anyone. But I think if you are completely new to the area, your best route is through a more progressive path.

Firstly, if you don't have any experience, then really, you need to demonstrate that you do actually have the aptitude for it. Thankfully, these days, there's lots of very high quality online training that you can undertake. I would try and find a way of demonstrating how you can apply that within your current role first. So that at least when you're applying for roles, you have evidence that, actually, "It's not just curiosity. I can apply what I've learned to real-world business problems."

And then the other thing I'd say is your technical skills and your technical foundation will obviously get you through the door and through the early years of your career. But the progression from there depends on other skills. Most people in organizations are not data literate, and they won't necessarily be so anytime soon. So it's about how you communicate what you can do for the rest of the organization in a way that they can understand and are persuaded and influenced by. So some of your softer skills start coming in, and then obviously it moves on to more general management skills: how do you manage teams, how do you prioritize, and then how do you become more strategic and create data strategies further down the line.

But at the entry level, I think it comes down to a number of things. One is demonstrating that you have the passion and curiosity, and that you have acted on that, so you have tried to learn. And really importantly, trying to evidence how you have applied that and created some benefit for the organization. That can be within a non-data role. Again, if you're completely new, a more progressive route into data science is better than going for the moonshot role.


As you look at what's happening right now, what are you keeping your eye on in terms of either technology, or tools, or modeling techniques? What are you excited about right now?


Yeah, I guess one of the things top of mind for us is ML Ops. We've gone through quite a technology transition in the eight years I've been at Moneysupermarket. It seems like ML and AI and cloud have been around for a long time, but a lot of things only really took off 5, 6, 7 years ago. And I think there are still a few areas where there's potential for improvement in some of the software and services, and ML Ops is one of them.

So that's an area we're actively exploring. For data analytics, we're pretty much fully on GCP now. And they've released Vertex AI, a unified AI platform, which is Google's sort of third iteration of a managed AI platform. Some of it is just branding and remarketing, but the whole ML Ops workflow is: how do you train models, develop them, test them, deploy them, monitor them in production, and then retrain them when necessary? That's an area in which I think there's still some work to be done. So we're quite interested in Google's solutions around that, particularly around machine learning feature stores, so the data inputs that go into the modeling process. There may not be a perfect solution yet, but there are some really good solutions, so we're still exploring that space.
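
The workflow described here (train, test, deploy, monitor, retrain) can be sketched as a simple loop with a deployment gate. This is an illustrative toy, not Vertex AI code; the helper functions, the "clicks" feature, and the error threshold are all invented for the example:

```python
# Toy sketch of a train / evaluate / gate / retrain loop.
def train(training_set):
    # Stand-in for model training: "learn" the mean of one feature.
    values = [row["clicks"] for row in training_set]
    return {"mean_clicks": sum(values) / len(values)}

def evaluate(model, holdout_set):
    # Stand-in metric: mean absolute error on held-out data.
    errors = [abs(row["clicks"] - model["mean_clicks"]) for row in holdout_set]
    return sum(errors) / len(errors)

def should_deploy(error, threshold=5.0):
    # The deployment gate: only ship models that beat the quality bar.
    # In production, the same check runs on live monitoring metrics,
    # and a failure triggers retraining instead of deployment.
    return error <= threshold

training_set = [{"clicks": 10}, {"clicks": 12}, {"clicks": 11}]
holdout_set = [{"clicks": 11}, {"clicks": 13}]

model = train(training_set)
error = evaluate(model, holdout_set)
print("deploy" if should_deploy(error) else "retrain")
```

A feature store slots in before `train` and `evaluate`: both read the same versioned feature values, so training and serving cannot silently drift apart.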

And I guess the other idea is interoperability between different services in the data space. A lot of these services still function a little bit independently. So how do you, for example, pass metadata, which is data about data, from one service to another? I think that's another area which is ripe for exploration.


You've touched on a point a couple of times over this podcast that I want to go deeper into. And you've talked about perfection, and this striving after perfection. Tell me more about that and why that's a problem.


Yeah, I think one of the challenges in this space is that people become a little bit too obsessed with algorithms and technology, looking for the perfect solution in those spaces. When actually, you want to start on the right-hand side, which is: what outcome and impact do you want to make? Then you work backwards from there to understand, "Okay, how do I deliver that outcome or impact, or at least test whether I can deliver it, in the quickest, easiest, most repeatable way?"

And I think one of the challenges may be linked to the fact that a lot of data scientists in organizations struggle to make an impact because they focus on the things which are in their control. They can spend a lot of time tweaking machine learning models, striving for the perfect model and the highest accuracy score they can, because that's within their control. That's as opposed to trying to persuade the stakeholders within the organization that they should be using data and ML to help automate their decisions, because that's a much harder conversation to have. Or talking to their IT team: "Right, how do we get access to the data we need, or the platforms to productionalize our data products and our outputs?"

And I think a good process and a good team will beat out the perfect technology every single time, because a good team, a good process will be constantly iterating and trying to understand how they can do better. So start with a minimum viable product, test out whether it's something that you should actually be developing further, and then you can always iterate and make it better, rather than trying to go for the perfect model and spending lots of time trying to deliver that, and then finding, well, actually it doesn't actually produce any meaningful impact at all.

And I think it's the same with data engineering and technology. People are always looking for the perfect technology that's going to solve their problems, that will automate everything for them. And it just doesn't exist. And it will never exist because things keep moving.

It's like that iPhone that you buy, that you think is absolutely amazing. In two years' time, you'll just be looking at it thinking, "This is so obsolete. I need to chuck it away and get the latest model now." And it's the same with technology. There's no point obsessing about the perfect technology, because it will be constantly evolving. It's just: what is the best technology for you to use at that point in time, knowing that you will have to evolve it?


So Harvinder, you're the last person we're talking to in this current data dream team series. We've talked to many practitioners and professionals on establishing data teams, methodologies for getting data management right, the importance of people, nurturing skills, knowing the gaps, of course, and seizing these opportunities to gain value from data. Could you share with us, what is it that you never compromise on?


So for me, it's that outcome focus. So I think you really must always be focused on the outcome that you're trying to drive and working back from there.

That requires quite a few things. Making sure that you have measurement in place, so you do understand the impact that you are making. It makes sure that you focus on the entire data lifecycle as well. You're never going to get a great outcome if you've got really bad data quality; it's garbage in, garbage out. So it forces you to focus on things like data management, data quality, your processes, and how you can improve the velocity within your team.

So that's what I always try to keep at the forefront of my mind. And then, if you focus on that, then everything else will naturally start to fall in place. And so, you will focus on all the right things if you are focused on delivering the right impacts.


Well, thank you so much, Harvinder, for all of your wisdom and insight. We really appreciate it. So with that, I'd like to thank everybody for listening, and I wish you the best of luck on your endeavors to create your own data dream teams.


Another great story, another perspective shared on data, and the tools, technologies, methodologies, and people that use it every day. I loved it. It was informative, refreshing, and just the right dose of inspiration. Remember to check for additional resources and more great episodes. We’ll meet you back here soon at the Soda Podcast.

Dec 2, 2021