Data Science Leaders: Mark Teflian

Data Science Leaders | Episode 12 | 42:29 | July 20, 2021

What Happens When You Bring Data Science and Data Engineering Under One Roof

Mark Teflian, VP, Data Science and Data Engineering
Charter Communications

It’s a common refrain among enterprise data science professionals: 70-80% of their time is spent on data wrangling and pipeline building. But what happens if you bring data science and data engineering together under one roof?

Mark Teflian, VP, Data Science and Data Engineering at Charter Communications (Spectrum), joins the show to share how bringing the functions together can help increase efficiency and productivity for everyone at an enterprise scale.

Mark covered:

  • Why data science and data engineering should be under one roof
  • How data science helped keep Americans connected when COVID-19 drove massive shifts in internet usage
  • Different ways to approach embedding data science into production systems

Dave Cole

Welcome to another episode of the Data Science Leaders podcast. I am your host, Dave Cole. Our guest today is Mark Teflian. Mark is the Vice President of Data Science and Data Engineering at Charter Communications. Welcome, Mark!

Mark Teflian

It’s nice to be here with you, Dave. Thanks for inviting me.

Dave Cole

You bet. So, the first thing on our agenda will be to talk about putting data science and data engineering under one roof. That is fairly unique, and I’d love to dive into that and just get your perspective on it, how that came to be, and what you’ve learned as to why you think it should be under one roof. Then the next thing, you’ve spent time at Charter. Charter has been impacted by COVID-19. How so? We will dive into that, a little teaser there, and Charter’s response and how data science played a role there and your team has played a role there is really fascinating. Then, last but not least, we talk a lot about the last mile within the podcast, and we’re going to be diving in to how to embed data science into products, so that integration between products and data science, and I know you have a unique perspective on that and that will be something we’ll dive into as our third and final topic.

So, Mark, why don’t we start with data science and data engineering. Why should a data science leader think about having them under one roof?

Mark Teflian

Yeah, it’s a great question, and it’s part of the evolution of learning. I would say the answer lies at the intersection of building modern data pipelines: what those data sets have to be engineered to, their reuse, and their enrichment between other data sets and models. Look at it this way, Dave: there’s a thorny enterprise problem that any data science leader faces. Data science and analytics professionals all lament that every analysis or model effort requires 70-80% of the project time in data acquisition, cleaning, and data prep. That whole cycle tends to be project based, and I think that’s why companies think these functions are separate, that these are project events or initiatives that are widely separate.

And so the problem set that we face is exacerbated when every project or report or model has its own isolated life cycle. That tends to happen, once again, when science (model building and output) and data pipeline engineering are widely separated, so there’s no sharing of the data intelligence, the business intelligence, or the final data output. When we started to look at that problem, given the enormous number of analytics data sets and the amount of data pipeline engineering that needed to be built, it became really apparent that by closing that gap between pipeline, data engineering, and model output, we could begin modeling work (irrespective of the type of modeling: supervised, unsupervised, discrete, continuous, it didn’t matter) in as little as one to three days, down from a few weeks.

In other words, Dave, just one thing to tie this together, and thanks for your patience: if you didn’t have that data pipeline engineering, you couldn’t bring the model into it. That’s very different from what we all know as, say, the CRISP-DM process. It’s not linear, right? It’s iterative, and that collapsing of time, and the value to the model output, made it obvious to us to bring data engineering and data science together.

Dave Cole

Right. So that 70-80% of a data scientist’s time being spent on data wrangling and pipeline building, that statistic has been around for a long time. Data engineering and data science have historically been separate teams, with varying degrees of collaboration; it really depends on the company, the culture, and everything else.

So I would love it if somebody were to actually start quantifying…like if a data science leader would actually look at a data scientist’s time to see how much of it goes there. Because data wrangling still happens, despite having a dedicated team. I think what you’re saying is it’s never going to go down to zero; data scientists are always hungry for new data, and they’ll want to organize it in a way that makes it easy for their algorithm, or just easier to analyze, or what have you. But if you put those teams under one roof, you’re seeing much quicker time to value: the data scientists can acquire the data sets, actually build the model, and get going. Do I have that right?

Mark Teflian

Yeah, you have that right, because there’s another dimension to it. If we have a true data-as-a-service model, the most important data-as-a-service consumers are your top data scientists working on the top prospective model outcomes for the business, whether that’s competitive value, ROI, or whatever the problem set is, and then you have the implications of linearity. I mean, look at the sheer volume. We’re running really big telemetry at Charter, about 7.2 petabytes right now, moving to 10, and think about the disparateness of those different analytics data sets. When they look like off-the-shelf assets, you get concurrency. Say you have concurrency on a hypothesis; you can start with unsupervised learning because of that.

Now, if you didn’t look at it that way, you as a data scientist would be in a wait state to start that modeling. So it’s the technology value of running the pipeline, but it’s also the sheer volume of data that you’re able to start working on. If you look at the structural innovation, instead of starting one EDA, now I can do several. So there’s a concurrency value too.

Dave Cole

Right, and so by concurrency…there’s the sheer volume of data, and data engineers typically are seen as the individuals who can handle the larger volumes of data. Some of the data wrangling-type tasks that a data scientist does, they can’t necessarily do at scale. So for example, a data scientist might not be able to build a pipeline in Spark, but a data engineer may. That certainly helps in dealing with large data volumes. On this concurrency point, I think your point is that if you have strong collaboration between a data scientist and a data engineer, the data scientist can be informing and telling that data engineer what he or she wants, and it can be done in parallel, that much quicker. By pairing them together, they’re dividing the labor and moving some of that 70-80% task over to the data engineer’s plate. Do I have that right?

Mark Teflian

Yeah, you’ve got that right. Your explanation was excellent. Think about a manufacturing model: there’s assemble-to-order and there’s order-to-custom. It’s the nature of these data assets too; they are analytics data sets. They are not reduced dimensional tables; they are engineered and built for the feature extraction that’s necessary and for reuse on enrichment. So you have these off-the-shelf data-as-a-service ADSs, where a lot of enrichment and joins take place because of that concurrency, and then there’s assembling to a custom order: that’s almost enough, but what else is needed through exploration? So if you look at exploratory data analysis and take it back to a model, that discovery cycle in the data pipeline has accelerated…

Dave Cole

Got it. What is ADS by the way?

Mark Teflian

Analytics data set.

Dave Cole

Analytics data set. Okay. Got it.

Mark Teflian

Yeah, yeah, different beast of data, right? A data set that has a specific, prescriptive lifecycle, has features that are engineered, labeled.

Dave Cole

You can say well-documented I’m sure?

Mark Teflian

That’s right, yeah. Not your mother’s BI table.

Dave Cole

Right. So I imagine under one roof, you’re seeing much, much stronger collaboration, but it also makes your job a bit more challenging. Not every data science leader out there, I think, has the capacity to manage a data engineering team as well. So I imagine from your background, you’ve had experience on both sides of the fence, so to speak. Is that correct?

Mark Teflian

Yeah, and also in our world of ModelOps and automation, there’s the whole network automation platform that animates our model operations too. So in a way, it’s a nice array of assets; you can do certain things that are harder to do if these capabilities are parceled out to different parts of the organization.

Dave Cole

I’m curious, who do you report to? Because I think you’re on a path to be a chief data and analytics officer. Does that sound right to you?

Mark Teflian

I report to Stephanie Mitchko-Beale, who is the CTO of Charter.

Dave Cole

CTO. Okay. Sometimes you see a chief data officer and then a chief analytics officer or a chief data scientist as separate roles. But if you have data engineering and data science under one roof, I’ve also seen the combined role, the chief data and analytics officer. So just throwing that out there. Stephanie, hopefully you’re listening to the podcast and you’ll take that under advisement.

So I can see a lot of benefits there. Do you have stories or anecdotes from your team that that 70-80% has been whittled down, and that your team tends to spend more time actually doing the data science, actually training the models? Is there anecdotal evidence you’ve heard from your team on that?

Mark Teflian

Yeah, yeah. In fact, we’re looking at how to measure that shift now that we’re at scale, but there are some variables that have changed it too. We’ve done a lot of extreme automation on data, data quality, algorithms, and the data prep cycles, which has accelerated things irrespective. But if I look at the sheer model production hypothesis results we’re getting, it’s actually moving faster, and proportional to the volume, disparity, and complexity of the data being modeled. So I just look at that in our productivity, because we’ve adapted data engineering and data science within the SAFe framework. That was hard too: how do you take our practice of engineering and data science and put it in with a SAFe framework? That gets a little bit into the applications piece at the end of our discussion, but to be honest with you, this is maybe a case of the cobbler’s kids having no shoes. We haven’t measured that exact shift yet, but we have the apparatus in place and we’re at a stable scale to do just that.

Dave Cole

You’re going there.

Mark Teflian

I think we’re seeing a collapse of that 70-80% in output value already, even though the data complexity to produce the higher value models is increasing also.

Dave Cole

That’s fantastic. I’m not sure if you did this purposefully, but we’ve been talking a lot about how data engineers can support their data scientists better by being under one roof, and you just gave me a few examples where data scientists can build models that help improve data quality, which helps out the data engineers. So it sounds like there’s a bidirectional, symbiotic relationship between both when you have them working this closely together.

Mark Teflian

There is, and you’ve picked up on that. In this thing called advanced analytics, we also have computer scientists, DataOps, ModelOps, so I’m blessed with a really integrated team that makes this velocity, speed with a direction, work. Whereas if you have those in different components, your comp sci guy is over here, your SMEs are off somewhere else. With data science and data engineers together, you start to bring in the platform architects, the computer science guys who understand analytics and the construct of these platforms, alongside your data scientists and data engineers. That’s what makes this work really well.

Dave Cole

Yeah, and we’ll get into the ModelOps. I know you’re eager to get to the third topic of the day, but I do want to get into the next topic, which is that Charter is a communications company, and it was disrupted by the COVID-19 pandemic, as I would imagine almost all communications companies were. Talk to us a little bit about what that disruption looked like and how data science played a role.

Mark Teflian

Yeah, sure. This was a great example of that response, so thanks for wanting to chat about it, Dave. Last year, we timed this inflection point at around March 12. It’s amazing that it’s 14 months later, it’s hard to believe, but immediately with the lockdowns that started, I believe when the NBA said we’re not playing anymore, there was this inflection point. For a network provider, what that meant is that the volume of traffic started to grow at a pace that compressed what would normally be 18 to 19 months of growth; it was like a time machine that took our volume to where we had projected the fourth quarter of this year to be. Our volume growth had been very predictable before that, but now this traffic, at an enormous scale, shifted across different regions and geographies, so the peak traffic times and the nature of the traffic dramatically changed.

For example, if you think about all the work from home that immediately started to take place, the traffic peaks on the upstream of the network just took off almost overnight. We were faced with huge accelerated volume growth as well as massive traffic shifts. Massive whiplash at scale. So what did that mean for us? First of all, we had a robust network architecture, highly invested, with flexible options, but the issue is, what do you do to immediately respond? This COVID response needed a data-driven approach to look at the problem immediately, without a traditional 12-month development window.

Dave Cole

Right. Didn’t have 12 months. Yeah.

Mark Teflian

So because this data science and engineering product was in place as data-as-a-service, we were able to produce seven new data assets, new product IDs enhancing the base analytics data sets, within three weeks. They were customized for the new COVID behavior and then integrated with 10 other existing engineered data assets that could give us a new, specific view of the data in near real time: peak hour, peak day, by state, by hub, by region, by customer. We had the data, so we could compute the implications of the shifts and then look at the augmentations and changes that would need to be implemented on the network.

So as far as sizes, the collection of data assets that changed came to a trillion rows, 105 terabytes, 17 data sets. A lot of data meat there, but we were able to start immediately on the model training for capacity planning and the visualization of what had changed. The implication of visualizing your network under COVID was very different from what it was previously. Then there were a lot of specific reports going to the FCC and other entities, because all of a sudden the internet was absolutely holding society together and serving as an e-commerce vehicle during COVID. It had an essential purpose, so that’s what we did.

So because of the data engineering and these analytics data sets, the changes, and then the model and ML changes and predictions, we could move rapidly, within a two to three week period, to reset the whole behavior of the network and all of the engineering complexity changes that we had to track and manage across the whole network footprint. At that time, that was nearly 29 million customers across the network.

Dave Cole

That’s a lot. First of all, the impact of COVID-19, it was not a gradual thing. There was, like you said, virtually overnight, states, counties, businesses were deciding that you had to work from home. So suddenly you have this increase in volume of traffic through Charter. Also the times, I would imagine, before COVID, peak traffic maybe happened early in the morning or when people got home from the office, let’s say, but now you’re seeing it more spread throughout the day and maybe even higher between the working hours because people are working from home, they’re on Zoom calls and things of that nature. Do I have that right?

Mark Teflian

Yeah, that’s a great summary. In fact, if you looked at peak traffic behavior on our networks, peak time was usually 7:00 to 11:00 on weekends, Saturday or Sunday, and that’s all the streaming video, downstream, game downloads, all that. Now all of a sudden it’s not on the weekends; it’s shifting to weekdays, because people’s work is weekdays. You’re seeing peaks on weekdays at 3:00 to 5:00 in the afternoon, or at 10:00 in the morning. So the whole tonnage of the traffic and the content behavior just whiplashed us. But once again, the resilience of Charter’s network, which had been invested in heavily, together with the right data science and data engineering in place, let us use that, plus the ML models we needed, to literally re-drive and respond to that quickly.

Dave Cole

Right, and then you mentioned your data science team said, “Hey, we need these seven data sets, ADSs I believe, the new acronym I learned today, and we need them created so that we can build new models to address I guess the future capacity challenges that the network is going to undergo thanks to COVID and within three weeks.”

Mark Teflian

If you think, “Okay, now I’ve got to compute by day, by hour, by peak, against the hierarchy of the network,” that’s totally different in the underlying engineering algorithms and rules for how you would even treat this network.

Dave Cole

Got it, and then so your data engineering team, having this data-as-a-service mindset, I imagine they had a data lake or data structures or what have you that made it very easy for them to pull the data necessary for your data science team to go ahead and build these models to address the future capacity concerns.

Mark Teflian

Exactly. The automation, and the way we break these analytics data sets down into product IDs and SKUs. Look at how fungible that was, given the data intelligence that was in place, when it came to building the new ADSs with the computed features necessary for the models. If we hadn’t had data-as-a-service in place like that, the data response would have been months and months.

Dave Cole

Right. That’s awesome.

Mark Teflian

I don’t even know how long it would have been.

Dave Cole

Let me dig on this one a little bit. Define data-as-a-service for our audience. I mean I can infer, but how would you define it?

Mark Teflian

It’s basically: I have all these different operands to apply to data. Let’s say it’s ETL; it doesn’t matter what my compute vehicle is, an embedded app, a deep learning app, we’ll get to that. I have all these different things I’m going to do to data, so my consumers can come into a place that looks like a data governance ontological search space, look at a piece of metadata, and all of the extant data and ADSs, finished goods and raw materials, are self-serve. They’re available for any of those specific operations, whether it’s our predictive automated ML or our optimization engines, so the data is a service specifically to those tools and applications. Just like you look at serverless in the application world in the cloud: the data is the service to them. You don’t have any of this, “Gee, I need to ask for this extract and I’ve got to call somebody up because I heard you might have a row sample that would be interesting…” And it’s also data-as-a-service because the storage and compute space is the service.

Dave Cole

Right.

Mark Teflian

So if you think of some of these ADSs, one of them loading 23 trillion rows a day, the service those ADSs provide and what they mean are all highly available and repeatable for the consumer and the producer of the data.

Dave Cole

Right, so if I’m a data scientist working on your team, it sounds like there’s an interface, a searchable interface that allows me to discover data on my own, discover these ADSs on my own, and then learn about them, learn a little bit about the data lineage, how current the data is, that sort of thing.
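[Editor’s note: a minimal sketch of the kind of self-serve, searchable ADS catalog described here. All class names, fields, and data are hypothetical illustrations, not Charter’s actual system.]

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ADS:
    """An analytics data set: engineered features plus discovery metadata."""
    name: str
    steward: str           # who owns the pipeline producing it
    last_loaded: date      # how current the data is
    lineage: list          # upstream sources it was derived from
    tags: set = field(default_factory=set)

class Catalog:
    """A minimal searchable interface over registered ADSs."""
    def __init__(self):
        self._entries = []

    def register(self, ads: ADS) -> None:
        self._entries.append(ads)

    def search(self, tag: str) -> list:
        """Self-serve discovery: find every ADS carrying a given tag."""
        return [a for a in self._entries if tag in a.tags]

# Illustrative usage: a data scientist discovers a data set on their own,
# then reads its lineage and freshness without calling anyone up.
catalog = Catalog()
catalog.register(ADS(
    name="peak_traffic_by_hub",
    steward="data-eng-network",
    last_loaded=date(2021, 7, 19),
    lineage=["raw_telemetry", "hub_topology"],
    tags={"capacity", "telemetry"},
))

hits = catalog.search("capacity")
```

In a real deployment the catalog would sit over a metadata store rather than an in-memory list, but the shape of the interaction (search, then inspect steward, lineage, and load date) is the point.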

Mark Teflian

Also you have a relationship, and this is where the governance practices of the past just broke down. If you look at some raw material that’s underway in the data pipeline, it also has a direct … okay, I’ve got these assemble-to-order ADS products that are just fine for my data science model, but wait a second, there’s some raw stuff in the mill that I see, and now I know who the data steward of that raw material is and what step it’s at, whether it’s an algorithm step or a data engineer working on it, so now I can start to bring that in. That’s what you talked about before, accelerating the visibility of what you need for your model exploration.

Dave Cole

Is it something that your team has created on your own? Or is this something that you bought off the shelf or is it an open source package or something like that?

Mark Teflian

It’s a combination, Dave. One of the things that spurred data-as-a-service was the reality of this one-off, confirmation-biased, project-by-project approach to data extracts and data engineering, and the reality that we needed to build ADSs that had to serve the value of advanced models (prescriptive, predictive, autonomous), a different model set from purely descriptive or diagnostic models, with high ROI based on the complexity engineered in for the model so it would have higher value. So it was that combination.

The other thing I’d be happy to admit is that along came Snowflake in the data service pipeline we created. The Snowflake model those guys built, the only ground-up analytics DBMS, was structured to fit right into a data-as-a-service model. That fit right into what we needed, versus a lot of other analytics DBMSs being stitched together from extant technologies: a little bit of this map-reduce database, this piece of Hadoop, that piece… With Snowflake, we didn’t have all these siloed targets to provide as a service, or all those transforms and mappings.

You had one place you used to build this data service. So because Snowflake came along, plus some other abstraction models we were able to build and validate as tool chains for it, it’s a little bit of our own invention and development, some open source, and some commercial off-the-shelf licensing.

Dave Cole

So the next topic I want to get into here is actually embedding data science into some of your production systems. We heard a bit about how agile your team is in addressing things as black-swan-like as the COVID-19 crisis. Now let’s talk about how agile your team is at getting models into production and what your philosophy is there. We’ve talked a lot about the data engineering team and the data science team, but there’s a third team too, which is the ModelOps team. Is that right?

Mark Teflian

Yeah, DataOps and ModelOps.

Dave Cole

DataOps and ModelOps, okay, got it.

Mark Teflian

Yeah, we had to figure this out too. People call us industrial-strength ML, right? If you think about a CI/CD process, you can’t separate the DataOps relevant to the ModelOps. So your versioning, like you talked about before, the specific versioning, provenance, AQM checks, those have got to be present operationally with your executable model. You can’t have a separation from the data that produced the model and some idea that we’ll be back in a couple of months to tune this thing. In a real industrial data model, that versioning has to be there with the model and the model tail.

Dave Cole

Yeah, the reproducibility. The unique element of data science is your code may change, but the data that is used to train the model, there’s a link there and you need to make sure that you capture that from a reproducibility standpoint. It’s not good enough just to store your code in Git or something like that. You also need to know what data set was used for that version of the code in order to train the model.
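[Editor’s note: one way to sketch the reproducibility link Dave describes, pairing a code version with a content hash of the training data, so each model traces back to exactly what produced it. The registry shape, commit string, and rows are illustrative assumptions, not a specific tool’s API.]

```python
import hashlib
import json

def dataset_fingerprint(rows: list) -> str:
    """Content hash of the training data, so a model can be traced
    back to the exact data that produced it."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def register_model(registry: dict, model_id: str,
                   code_version: str, rows: list) -> None:
    """Record the (code version, data version) pair alongside the model."""
    registry[model_id] = {
        "code_version": code_version,  # e.g. a Git commit SHA (illustrative)
        "data_version": dataset_fingerprint(rows),
    }

registry = {}
train_rows = [{"hub": "A", "peak_gbps": 12.4}, {"hub": "B", "peak_gbps": 9.1}]
register_model(registry, "capacity-model-v3", "3f9c2ab", train_rows)

# Reproducibility check: identical training data yields the same fingerprint,
# so a retrain can be verified against the registered data version.
same = registry["capacity-model-v3"]["data_version"] == dataset_fingerprint(train_rows)
```

The design choice is the pairing: code in Git alone is not enough, so the data fingerprint rides alongside the commit in the model’s record.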

Mark Teflian

That’s exactly right. And the other thing: it’s not necessarily just an alert to, let’s say, a NOC, a network operations center, where a model is alerting to a prospective degradation of a specific service. When your model is instructing a machine to change hundreds of thousands of devices, or act on them, you have the model feedback loop, as you said, against that versioning, but we also have a resource control loop. That’s another loop, based on the behavior and nature of the specific controller or machine state that was acted upon. So you need to couple data versioning and ModelOps for both the model feedback loop and the resource control loop, because the other end of some of our deployments are machines.

Dave Cole

Right.

Mark Teflian

They’re not a human consumer of a campaign execution or a campaign result or anything like that.

Dave Cole

So you’re saying that some of these models that you build are specific to certain types of machines, like actual equipment. So you need to know that this model was trained with this data set and only pertains to this type of machine. Is that correct?

Mark Teflian

Yeah, and this type of machine’s automation has a resource control loop with the specific state changes the model is instructing. So it’s not just, “We want to make the model better,” the feedback loop with the data; it’s also the state of the output, say the metric output of the machine’s behavior and the state the model instructed. Some of those things change; it’s different on Wednesday afternoon than on Thursday evening, so the automation controller feedback loop data is as critical to have in the model tail for tracking as the feedback loop for the model itself.

Dave Cole

This is overwhelming, Mark. I’m glad I’m not in your world.

Mark Teflian

Feedback is everything. It’s just that in our world, where you want to let a model be autonomous, and it’s autonomous to another machine, that puts this whole resource control loop on top of the normal model feedback loop for pure data. So just think of it as two layers of feedback.
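[Editor’s note: a toy illustration of the two feedback layers Mark describes: a model feedback loop on prediction error, and a resource control loop checking that a device actually reached the state the model instructed. All names and values are hypothetical.]

```python
from dataclasses import dataclass

@dataclass
class Action:
    device_id: str
    setting: str   # the state change the model instructed

def model_feedback(predicted: float, observed: float, log: list) -> None:
    """Layer 1: ordinary model feedback; prediction error is fed back
    so the model can be retrained or tuned."""
    log.append({"loop": "model", "error": observed - predicted})

def resource_control_feedback(action: Action, device_state: str, log: list) -> None:
    """Layer 2: resource control loop; did the machine actually reach
    the state the model instructed?"""
    log.append({
        "loop": "resource",
        "device": action.device_id,
        "converged": device_state == action.setting,
    })

# Illustrative usage: record both layers for one autonomous action.
log = []
model_feedback(predicted=0.8, observed=0.75, log=log)
act = Action(device_id="node-17", setting="upstream-boost")
resource_control_feedback(act, device_state="upstream-boost", log=log)
```

Both records would land in the "model tail" alongside the data and code versions, so tuning can use either layer.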

Dave Cole

Right. I could imagine why you have to set up a separate team for DataOps and ModelOps to keep track of all this. So for our audience out there, what advice do you have? Clearly you have a fairly sophisticated approach to ModelOps. We talked a little bit about reproducibility and things that you need to keep track of, but what other pieces of advice do you have for them?

Mark Teflian

On both ends of this, if we look at where things are going, it’s really critical, if you’re looking at models that are part of the innovation of the company, maybe experimental product design, or a whole new way your customer quality experience is going to become self-serve. You have some radical changes; they’re innovations, and they have to be plugged into the existing operation. That takes more of an end-to-end architecture that data science leaders and data engineering leaders have got to participate in. In other words, your entire data plane, its pickup and management, isn’t just inbound to us as data science leaders. We have to be part of that architecture and its nature. How I am fed is what I will produce. Sometimes we think all we’re doing is taking exhaust, rather than influencing the production of the data that I’m going to transform into an asset.

So think about the data you’re producing as an asset to the company. This isn’t exhaust from every project. Think about it becoming a product, as if you’re a third-party provider of data that you might sell to somebody else, even though you’re inside your own company. Thinking about that productization of data as an asset, based on advanced modeling, is really critical. So that’s one good thing to think about.

The other thing is that this is going to get a whole lot more fun, but more complex, in embedding. If we start to build big data enrichment into applications, like deep-learning-enabled applications, the value of that says you need to think about your data APIs and the specific application characteristics, not only of building the deep learning application, but of the distribution and egress of that application. As people like to say, the right data set has an infinite return on data. If we start to think about embedded deep learning models that can be developed and distributed with wide egress, the value of those applications to other applications is going to be enormous. It’s probably going to make our big data look small once you get to the other side of it. So on that end, start thinking about, and participating in, the change to the application architecture. If specific new applications are going to be designed, think about little things like, “Well, how do we get data versioning into GitHub?” Think about an application’s development pipeline: how is the data versioning and environment actually part of your GitHub, part of your whole continuous development?

What’s happening here is that all of the products of the data pipeline are growing and growing, so we’ve got to think about how to make the input run faster to produce these data assets. And on the other side, these aren’t traditional build-and-run models; these are highly enriched, deep-learning-based embedded applications you’re now producing. How are those produced within your traditional application feature development cycle in the business? I really think that as data science leaders, we have to face it: when we’re successful, we’re going to be ambient to the corporation’s processes.

Dave Cole

Yeah, I was just going to try to dumb it down a little bit. I think what you’re basically saying is, as a data science leader, don’t be afraid of going wide. Don’t be afraid of understanding how your data is actually being collected, of talking to the people who are building the applications that collect the data to suggest specific types of data, because eventually those will be features you use in your models. Then separately, go downstream from the typical model artifacts your data science team is creating and understand how those models are actually being embedded in the applications themselves. Really understand it; that will better inform what you’re doing. So go in both directions.

Mark Teflian

Because they’re not going to come looking for it. We have to go out there and evangelize it and work to see how this pipeline builds into these embedded applications. The other thing is the issue of co-opting. Your best SMEs in the company are going to want to be with your team. You don’t want your guys in the back office producing over-fitted models to spit out for them. If you bring your SMEs into this, they will want to be part of it, and by the way, it’s not to make them citizen data scientists. I don’t subscribe to that; I’m sorry, maybe many of my colleagues do. But this whole notion of citizen data scientists, if you think about it, in this view they’re not necessary, because we’re getting our data scientists and our data engineers into the front room, like a business analyst but in the sense of a data science analyst, and that’s what you need. We don’t need this intervening structure. It’s like Conway’s law: we assume that somehow there has to be another team, one that doesn’t actually run in this pipeline with the knowledge of the processes and relationships needed to deliver it.

You said it really well. Yeah, go wide right now because it’s going to happen, but also, it’s up to us to pull these innovations through and plug them into the existing operations.

Dave Cole

With experts, right?

Mark Teflian

With experts, yeah. I don’t think the existing operations are going to come find us.

Dave Cole

Right. Right.

Mark Teflian

It doesn’t work that way.

Dave Cole

Citizen data scientists: that’s not quite a tangent; it comes up quite a bit, right? The conversation of whether we can take SMEs and business analysts and turn them into data scientists. You’re saying no. Find people who are great data scientists and pair them up with SMEs, SMEs who can talk the talk and understand a bit about the world of data science, and vice versa, quite frankly. You also very much want your data scientists to understand the business, from how the data is being collected all the way to how their models are being used in production. I think that’s a reasonable expectation for your data science community.

Then also you have data engineers on your team. You have your ML engineers and your ModelOps team and your DataOps team, so it all makes perfect sense.

Well, this has been fantastic. I have learned a lot. My brain hurt at times, Mark, but in a good way. What you’ve built is really fantastic, and I do think you’re on the way to being the next chief data and analytics officer at Charter, so consider that my plug.

Mark Teflian

Dave, I really enjoyed it. We talk about feedback loops for everything, for machine learning; we all know that. But you’ve been a great feedback loop on this discussion too, because as we look inside ourselves at what we produce and how I explain it, I learned a lot from you about what the general takeaway would be.

Dave Cole

Great, well, check’s in the mail! I appreciate the kind words. Mark, it’s been great. If people want to reach out to you and find out a bit more, can they hit you up on LinkedIn?

Mark Teflian

Yeah, LinkedIn or mark.teflian@charter.com, my work world. That would be fine too.

Dave Cole

Awesome. Well, I appreciate you joining the Data Science Leaders podcast. I had a blast. Have a great rest of your week.

Mark Teflian

You too, Dave. I enjoyed it too. Take care of yourself, okay?

Dave Cole

Right. You too!
