20. Pioneers. with Jan Hegewald, Director of Engineering at Zalando

"Zalando is a broadly discussed case of migration towards microservices architecture. They've got almost 200 teams working on the site. Jan leads the Product and Category streams, which comprise 5 development teams. We discussed how it works in the real world with all its pros and cons. Jan shared very interesting cases of integrating cross-service features, how they set priorities, and how they leave space for the teams to keep up with the architecture and code quality." Piotr Karwatka, Host

Questions asked in this episode

How did your career start?

How did you get to Zalando?

Microservices - the Zalando story - tell us a few words about the motivation and the process.

What works, what doesn’t, what’s challenging in this architecture?

When does it make sense to implement a microservices architecture - the scale (people, features, traffic - what other measures to take)?

Zalando’s team structure - how are the services structured?

What about the tech stack? Is it fully unified, or do the teams have some flexibility in choosing technologies, and to what extent? Micro frontends?

Is it headless architecture? Do the dev teams consist of all the necessary roles (front, backend etc)?

How is it all managed - the agility vs. hierarchical product and business decisions.

Tell us about your own team and its structure?

How long does it take from the business decision for the feature to having it on production?

Product development process: from an idea to the deployment.

How do you integrate the features? How do you make it look and feel coherent? How do you optimize the experience?

How often do you deploy? What does the QA process look like? What if something breaks?

Challenge adding features vs architecture (vertical vs. horizontal improvements)?

Balancing business features with maintaining a healthy architecture - priorities.

Changes to the Product management roles?

Changes to the Engineering roles?

What are your future plans?

Which tech trends make you excited?


Transcript:

Piotr Karwatka: [00:00:00] Zalando's migration towards microservices architecture is still a model case for many engineers. It is one of the first eCommerce cases so broadly discussed, I guess, maybe the first one at such a scale. Actually, I was wondering how it works backstage. What are the pros, the cons, and the daily life of the team powering the biggest European fashion store?

Well, I can't wait to ask all those questions to my guest Jan Hegewald, Zalando's Director of Engineering of Product and Category Experience. Hi, Jan. Thanks for accepting my invitation.

Jan Hegewald: [00:01:46] Thanks for having me.

Piotr Karwatka: [00:01:48] Let's start with your experience and your background. Can you tell us a little bit more about yourself?

Jan Hegewald: [00:01:58] Yes, sure. So for my professional background, I studied computer science in the beginning, and then I worked for around 10 years in the consulting business, developing custom software for big clients. So I went through all the phases of software development, from requirements engineering to design, implementation, testing, rollout, project management, and so on.

And yeah, after that, I switched to another company that was also a smaller consulting firm. And there I was heading my first team. So I had the first opportunity to get experience in a leadership position that was more about process, program, and portfolio management methodology, but also very technology focused.

And at that time that didn't really fit my personal life anymore, because I had a child at that time already. And so I looked for something else. And then I figured out that a good opportunity would be in eCommerce. So I joined Yallo, which is a European price comparison platform. And I was Head of Technology there for the B2B business, for the integration of hundreds of thousands of sellers. And yeah, I worked there for four years, and after that, which is now around three years ago, I joined Zalando.

Piotr Karwatka: [00:03:17] Awesome story. So you first started in more engineering-focused roles. Have you coded yourself?

Jan Hegewald: [00:03:24] Yeah. Yeah, for sure. I started with C++ and later on, of course, a lot of Java, C#, and yeah.

Piotr Karwatka: [00:03:35] Solid, solid background, then managerial skills, and then you finally landed at Zalando with e-commerce experience from the previous job. Sounds like a perfect fit for the role, and your role is Engineering Director of Product and Category Experience, right? So what exactly does it mean?

Jan Hegewald: [00:03:52] That's a good question. So we structured our teams in Zalando in different areas, and Product and Category Experience is basically about making sure that we serve the right product data to our customers and that they see the right offers and so on. So I'm responsible for, like, three teams. The first one, which I started out being responsible for, is the product offer platform.

This is Zalando's central backend platform that combines prices, stock levels, and product data into offers. Of course, what we also do there is some business steering, like what is available where, and we also decide which one becomes the merchant of record if there are multiple offers by multiple merchants.

And then we serve that to all the different, what we call, sales channels: the different markets and channels we're active in. So this is really a backend, high-load data processing platform. That's the first team. And then secondly, I'm responsible for the engineering of the product experience team.

That's the name for the team that owns the product detail page, the page that you see when you look at one specific product, with everything: images, the description, the reviews, and so on and so forth. And currently I'm also responsible for a team that's called category experience.

With that team, we aim to elevate the experience of shopping on Zalando. So far, it had been very much tailored to fashion shopping, right? But for example, we also have beauty products. And if you're interested in beauty products, you have different questions as a customer, because then you're not necessarily only interested in sizes.

If you look at, let's say, face creams or something like that, it's more interesting what your skin type is, for example, and we want to build specific user experiences for those types of problems. Another area of that is also elevating, for example, designer fashion items from normal fashion items, to provide a really comfortable and unique experience for people shopping there.

Piotr Karwatka: [00:06:04] Gotcha. So, in other words, if I'm entering the site from Google directly to product pages or category pages, I'm interacting with the code deployed by one of your teams, or even all of them, because of the backend and frontend separation. That's fantastic. So you have a really huge impact on the whole Zalando experience.

Jan Hegewald: [00:06:32] Yeah, maybe. I mean, there are many other teams, right? This is just my part, and I have to put it into perspective. In Zalando we have around 200 delivery teams, and the ones I mentioned now are roughly, let me quickly count, more like five or six of them, with many more contributing to the whole experience.

Piotr Karwatka: [00:06:54] Gotcha. So what is the size of a typical team, one of those three you have?

Jan Hegewald: [00:07:02] That depends, but I think we go with the usual rules, something around maybe five to seven. It also depends on how you count, right? So for example, the product offer platform team mainly consists of backend engineers, while the product experience team consists of some backend engineers, but then mainly frontend engineers, partly working on web and partly working on apps, so Android and iOS. And then there are designers and product managers, of course, in every case. So it really depends on what we're doing and whether you count these people in or not.

So let's take this category experience team that has been newly built. It consists today of around 18 people, all together.

Piotr Karwatka: [00:07:56] Okay. So three projects and about five different teams, right? Because I didn't get it right and you corrected me. So that's their composition, right?

Jan Hegewald: [00:08:07] Yeah. The areas that I was referring to, I would rather count as departments, and within them there are several teams, which are counted differently. Yeah.

Piotr Karwatka: [00:08:20] Awesome. So I have some questions on how those teams work, I mean the development process, but maybe first I'll ask you something you have probably already answered many times.

I mean the microservices and Zalando story. So, can you say a few words about the motivation and the process? Were you a part of this process, or was it already there when you joined the company?

Jan Hegewald: [00:08:50] When I joined the company three years ago, it was mainly already in a microservices architecture. However, I know the history, at least from people telling it. Nevertheless, I think most companies that are sufficiently complex, even if they have moved to a microservice architecture, still have some hidden monoliths. And so do we, so it's not fully done. But yeah, I can also tell you about the history of how this went.

This was definitely before my time, but when Zalando started, which was more than 12 years ago now, it was of course running on a small system, as a typical startup, right? It was running on a Magento e-commerce system, and obviously it did quite well at that time. But at some point it encountered limits to being able to scale that. And then, how it normally goes, Zalando built something by themselves in order to cater to their specific needs and to be able to differentiate themselves from the competition. And they built that as one monolith, as so many do, and over time you then also experience the limits to scaling a monolith.

That was the time when they decided to go to a microservices architecture, and yeah, I think we are very well set up in that regard. So we're really having a microservice-oriented architecture. Otherwise we wouldn't be able to work with these 200 teams in parallel. Nevertheless, there is also some legacy left.

Piotr Karwatka: [00:10:29] You meant scalability in terms of scaling the teams and structures even more than the performance of the website, right? Scaling up the teams working on the code with a monolith is a nightmare.

Jan Hegewald: [00:10:44] Yeah. Yeah. Actually it's both, I would say. So one reason why you would like to have a microservice architecture is because it decouples the components and allows teams to work independently, right? And so you can work on features in parallel that you're not able to work on in parallel if you are in a monolith. So that's kind of the development scaling. The other thing, which you also mentioned, is the performance scaling, because if you're running monolithic software, you can typically only scale the whole, right? So if you have a bottleneck in, let's say, I don't know, checkout, to make up an example, you have to scale the whole application and all the underlying hardware it's running on, if it's on premise as it was at those times, in order to scale just one part. And this gets much easier, of course, in a microservice architecture, because you can scale the parts, hopefully, independently.

Piotr Karwatka: [00:11:37] Scaling only those parts that are under pressure is much easier, but on the other hand, I guess, managing the whole microservices architecture is more complex. I'll get to that in a second, but first let me just ask one more question about this migration process.

Any insights on how long it took to finalize this migration, or is it maybe something which is still going on? I mean, because I guess it was an iterative process.

Jan Hegewald: Yeah, it is. It definitely is. I would also always recommend doing it that way. And it's a really tough process. I can't say exactly, because I was not at the company at that time. However, it definitely took quite some time to move the largest part into a microservice-based architecture, like a year or more. And you're right, you can say it's not completely done. We have some small leftovers, but these are super minor. I would always think that doing it in a kind of incremental approach is good, because if you move from working on a monolithic architecture to a microservice architecture, you have to learn and get used to it, since it also comes with some pitfalls that you should avoid. And if you do everything at once, it's quite risky, I guess.

Piotr Karwatka: Actually, this touches on the next question I prepared, because I wanted to ask you what works and what doesn't in this architecture. You also worked at smaller shops, dev shops, so you can probably somehow compare it to working with smaller teams and simpler architectures. What are your thoughts on that?

Jan Hegewald: [00:13:30] Yeah, I think the main thing that works well is what we discussed before. It's the scalability that you get from it, both in terms of how much you can parallelize the work and increase your feature output velocity, let's say, and also how you can scale the performance of the system itself. When it comes to challenges, yeah, it's not all just benefits, right? It comes with some downsides too, for sure. And where I see the most important challenge is that you need some overarching governance. I mean, it sounds very anti-agile, but let me give you some examples. For example, we did an internal investigation into how many programming languages we use within Zalando, and the result was that it was 43.

Of course you could argue that it's one too many, since it's 43 and not 42, but you could also argue that it's far too many. Why is that a problem? I mean, we had some exotic languages that teams wanted to work with and decided to use. And then the teams kind of dissolved, and no one was able to pick up the work they had been doing before.

That's the problem. And also, what you want to achieve, and what we really try to tackle, is that in such a complex system you don't only optimize the work of one team, but also do overarching, bigger things. And for that, you need to be able to work across multiple teams together. And if you are so diverse in your technology, it becomes harder, right?

And so it is definitely a challenge, especially since at that time we were, I think, also quite famous for this Radical Agility, where teams have the autonomy to do what they want. This was, of course, kind of a misunderstanding. It was not intended, but this is where it stems from. And then there is the second issue that I want to highlight under this umbrella of the governance that is needed.

If you look into a microservice architecture, there is one paradigm, which is that you should build decoupled services, right? So they should be independent from others. That's the main principle. And you also apply that to data, so no databases, for example, should be shared. And we did the same.

When I came in, I was responsible, as I said, for this product offer platform, which at that time covered a narrower set of components than it comprises today, and then other components from other teams were later on handed over to us. And it turned out that today, if you look at the landscape, we have, for example, data stores that hold, in the end, the same information, all the offers, but we have four of them, because one was first part of one team and another was part of another team. One is serving the data, one is processing this stuff.

And the problem with that is that it's of course not cost efficient, because you really replicate the data, and at the volume that we are talking about, it makes a difference in terms of costs. And so you need to take care of such effects. Even though you want the teams to operate independently, you need to take care of the big picture, the overarching technology and architecture, to keep it efficient.

And yeah, efficient in terms of processing, but also efficient in terms of costs, for example. And so this is what you need to take care of, especially if the architecture really grows.

(...)