After completing his PhD in physics, he realized in his first job as a Data Scientist at Blue Yonder that it takes a lot more than software and algorithms to deliver value to customers. That's when he dove into configuration management, the application life cycle, and the design, development, and operation of distributed systems in general. He is now a Senior Data Scientist at Blue Yonder, specializing in improving the efficiency of data science delivery.
What is data science, and what does it have to do with value delivery? What implications does it have for architectural paradigms like microservices? What does it take to operate a data science application? How do we monitor the value that data science adds? Do we really need Hadoop? Let's take a walk through a data science delivery pipeline.
Data science, big data, and predictive applications are buzzwords no one has been able to escape in recent years. Only slightly exaggerated: in the past it was enough for many companies to simply buy a Hadoop cluster to show they were doing "this data stuff". Times are changing, and companies are slowly realizing that data buried in a datacenter is worth nothing. The biggest value of data science lies in the data-driven automation of business decisions, which makes it an active part of the value stream, just like software development in tech companies. Because of this, the same arguments that motivate continuous delivery apply to the data science delivery process. Furthermore, because data science needs to pull data greedily from many different sources, it adds a new dimension of complexity to the continuous delivery pipeline. While the increased resource requirements can mostly be managed, the increased coupling caused by the sheer number of data sources spread across the whole business creates real trouble for modern modular, scalable architectures like microservices. At Blue Yonder, we have more than seven years of (sometimes painful) experience delivering and operating predictive applications as-a-service for our customers. In this talk I will share important lessons learned: how we deploy, how we test, how we monitor, and how we "crunch the numbers". In short, I will take you for a walk through our data science delivery pipeline.
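To make the "value add" question concrete, here is a minimal, purely illustrative sketch of the kind of check such a pipeline might run: it compares a model's prediction error against a naive baseline on recent actuals. All names, numbers, and thresholds are assumptions for illustration, not Blue Yonder's actual tooling.

```python
# Minimal sketch of a "value add" monitor: compare the model's prediction
# error against a naive baseline on recent ground truth. All names and
# thresholds are hypothetical, chosen only for illustration.
from statistics import mean


def mean_absolute_error(predictions, actuals):
    """Average absolute deviation between predictions and observed values."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


def value_add(model_predictions, baseline_predictions, actuals):
    """Relative error reduction of the model over a naive baseline.

    Positive values mean the model adds value; values near zero mean the
    pipeline could be replaced by the baseline heuristic.
    """
    baseline_error = mean_absolute_error(baseline_predictions, actuals)
    model_error = mean_absolute_error(model_predictions, actuals)
    return (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Toy daily-demand example: the baseline just repeats yesterday's value.
    actuals = [102, 98, 110, 95, 105]
    baseline = [100, 102, 98, 110, 95]   # naive "same as yesterday" forecast
    model = [101, 99, 108, 97, 104]      # hypothetical model output

    score = value_add(model, baseline, actuals)
    print(f"value add vs. baseline: {score:.1%}")
    # In a delivery pipeline, a monitor would alert if this drops below an
    # agreed threshold, e.g. a 10% error reduction over the baseline.
    assert score > 0.10, "model no longer beats the naive baseline"
```

A check like this is cheap to run on every deployment and turns the abstract question "does our data science still deliver value?" into a pass/fail signal the pipeline can act on.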