Peter Szaflarski On The Maturity of a Data-Centric Industry

Aug 5, 2021

What is your day-to-day like and is there anything exciting that you’re currently working on?

My day-to-day is not that glamorous when it comes down to it. I meet with external team members and my own data teams that I manage. The people on my teams are the ones who do all of the interesting work. I generally act as a liaison to other teams that need to interface with us but also external teams outside of Thomson Reuters, if they need a point of contact for solving their data problems within the Thomson Reuters suite of products.

In terms of exciting things, Thomson Reuters has grown as we have acquired a bunch of market leaders in the tax space, the legal space, and in other various industries. We have acquired these market leaders and they now have to be integrated into the suite of products. With that comes a whole slew of data problems. So what my team gets to do is work through all of those data problems that are presented by tax data being transferred to and from our systems at Thomson Reuters.

Can you talk to me a little bit about how the pandemic has impacted your work?

I bet that’s a question where you get almost the same answer every single time. In some ways, we’ve learned a lot and we’ve become a lot more efficient. I was surprised to find out that in most aspects of my team, our velocity has actually gone up. But that kind of velocity came pretty easily, probably because of two things. One, the distractions of being stopped at work for one thing or another have been removed. Two (and this is the worst side effect), the barrier between work and life has disappeared. People are willing to work longer, and that’s a bad thing that we have to keep an eye on.

What are some of the top trends you’re noticing right now in your industry? What is going to be big in 2021?

I almost feel as though a year’s time frame is too short, because within the industry on the data engineering side with AI and ML, things are coming up in that space with an expected hockey stick-like growth. When you have that hockey stick kind of growth you never really know where that inflection point is going to be and you don’t know if it’s going to be in 2021, 2022, or even 2030.

At a high level, one trend coming down the pipeline that I think is pretty interesting is the maturing of the whole data field. I don’t think there’s a good name for it right now; it’s seen as the child of computer engineering.

In some ways, it is exactly that, but to me there’s a world of difference between someone who works as a data scientist or a data engineer and someone who works as a computer engineer or a computer programmer or software developer, depending on what you call them. The data scientists and data engineers today are a lot of the time confused and mixed up and sometimes put into the same basket as a computer engineer.

But as the data space matures, I can see it bringing more refinement to data engineering: more of a focus on building robust data pipelines, better resources for documenting them, and a clearer definition of what the best practices are.

I think that in the whole data space we don’t have the resources that the computer engineers have around them today. I think that is something that we can expect will be coming up in the next five years and we will see the emergence of something like a DevOps Kung Fu or an Agile Manifesto but for data.

The other thing that I see is already starting is on the infrastructure side, where there is a massive shift towards managed services and less of a reliance on open source. I think the space is so new and we have a tendency to overestimate our abilities to solve problems in it and when it comes down to it, open-source projects, as brilliant as they are, are often difficult to understand, difficult to maintain, and difficult to build a team around.

The more we can lean on managed services to solve our problems, the better it’ll be. One example that is worth mentioning is the Snowflake IPO that happened not that long ago. There’s some skepticism about their growth and whether they’ll be able to support the numbers that they’re projecting. But based on the idea of solving data problems using managed services, I think they will succeed precisely because they are an easy-to-use but powerful managed data warehouse in this space.

What is your biggest data management challenge right now? What is keeping you up at night?

The problem that I’m trying to solve right now is that our customers are getting closer and closer to wanting to solve their own data problems and use our products to do it. You might say that’s not a big deal: just use the technologies that you just mentioned, build something on top of them, and then pass that on to your customer. That’s fine in principle, but if you have millions of customers who all expect their own data warehouse and expect you to supply it, that’s a very different problem from spinning up your own data warehouse and using it for yourself.

So it’s a combined problem of how do you handle large volumes of data, and then how do you pass that onto the product and then on to your customers? There are solutions to it, but they’re not always easy.

How are you attempting to solve this problem right now?

I have a motto on this and that is, “don’t be a hero”.

One of the problems in this space is that it has developed very quickly and anyone who’s been paying attention can see that it looks very different than it did in 2015. What people were using in 2015 is different from what people are using in 2021. So you can’t expect people to have the same expertise in the tools that we’re using today.

This goes back to my previous point where I said we’re trying to lean more and more on managed services. That’s how we’re trying to solve this right now: find technologies and players in the space that really know what they’re doing when it comes to building these things, and pass that problem on to them. I think the more we try to reinvent the wheel and solve these problems again for ourselves, the more we’ll fall flat on our face, and we’ll look very silly doing it.

What do you think of the current state of governance and ethics when it comes to AI and ML right now?

I think there are two ways to take this question. When it comes to governance and ethics in the space, most of the questions that I get are around things like retention policies for data: how do we handle data residency, how do we make sure that data passes through a certain country or does not pass through certain countries, and how do we make sure that data is deleted and no longer searchable or accessible? So we’re handling problems of governance or regulatory concerns.

Sometimes it is what our customers expect of us, so we’re trying to handle what their requirements are. That’s one way of looking at it, but then there is sort of an ethical spin on it. An example of this might be how accessible or inaccessible we make our customers’ data.

There’s also another whole class of concerns here and that is based on our understanding of where this industry’s going. This has to do with things like AI safety and questions around that which are completely independent of how we believe these things should be structured or how the industry should be structured.

Even with a robust regulatory framework, we could build these things in a way that is inherently dangerous to us and we’re not even realizing it today.

In some ways, these two forces are actually in opposition to each other. Because on the one hand, we have the regulatory concerns that are trying to essentially slow us down in terms of sharing data, whereas you have the other side of things which is trying to push us faster, trying to move us toward a paradigm where we can essentially automate the entire world. The faster we move on this one the less we really care about the regulatory concerns.

How will emerging technology play a key role in the development and evolution of Thomson Reuters’ service offerings for your clients?

The Thomson Reuters strategy is a growth-by-acquisition strategy, so from a purely business perspective, when emerging tech comes out and becomes viable or a key player in our space, it immediately becomes a potential acquisition target. That’s not my team’s purview, but I recognize the fact that we’re always looking for good opportunities to acquire a business and integrate with it.

From a technologist’s perspective, emerging tech is always pushing us to be better because they are the ones who are taking the risks, who are much more risk-loving, who can fail fast and move on to the next thing very quickly.

Should we be looking to that emerging tech for those kinds of ideas? I suppose so, right? But what we can’t do is fall in love with it, because we do have to remember what emerging tech really does, and that is fail fast and move on to the next thing. I find every once in a while we will fall in love too quickly with an idea and really try to push it without knowing where it’s going. So we need to have some way of validating emerging tech before we actually integrate it into our own way of working, our own business, or our own technology.

Are there any final words or topics you think the public needs more awareness around?

I would say let’s focus on not trying to be the hero, let’s focus on leaning on what people have already done right, leaning on those managed services by the companies that already know what they’re doing.

I immediately think back to one of my previous answers, which is my motto of “don’t be a hero”. I generally believe that because this industry has been moving so quickly. Computer engineering is a relatively new field, and the field we’re talking about here is only a fraction of that and is even newer. So someone who has built up a lot of confidence in one field over 20 years does not necessarily have the same knowledge in a field that’s only five years old. I find that sometimes we can get a little bit too confident in what we’re doing.

I think the best practices in the data space with respect to data science and data engineering are separate from the best practices that we already have in the computer engineering space.

Written by Nelufer Beebeejaun