At the E-Commerce Show Asia (20th-21st April, Singapore) we’re excited to be welcoming John Berns, the head of data science at Lazada, who will be speaking on how next generation big data technologies are allowing almost infinite insights into customer behaviour. Ahead of the show we caught up with John to talk big data, analytics-driven business planning, and his industry heroes.
Which big e-commerce challenge/opportunity is front of mind for you going into 2016 and why?
Opportunity. I always see challenges as opportunities.
In the end, everything we’re working on is about using data to improve the Lazada customer experience.
My mental model of the data system is that it’s like a tree. The root is data acquisition: collecting all the data, which means not just the data you already have, but stretching and growing to collect the data you need. The trunk is data engineering: it provides support and moves the raw nutrients (data) to where they are needed. The branches are the applications created by the data science team. They connect the data to the leaves, which are our customers, sellers and internal stakeholders.
My challenge for 2016 is to grow the whole tree in a balanced manner. We will work on bringing more data into our analytics systems to build more sophisticated views of all aspects of our business: our customers, our customer service, our user experience, our products, our sellers, our marketing activities, our operations and our logistics.
My goal is to have the right data at the right place and time to deliver an amazing experience for all Lazada’s customers while improving internal efficiency and lowering costs.
Not too ambitious, is it?
How will this big data challenge/opportunity impact your work in the coming year?
It makes for a lot of work across many fronts: defining an overall strategy for how we use data, which includes working with stakeholders to refine their requirements; working with Engineering to acquire and deliver the data; and analyzing the data and creating reliable, scalable operational systems to deliver results.
It’s somewhat like choreographing a ballet; there are many moving pieces that all have to come together and make sense as a whole—and do that reliably every time, without fail.
Prioritizing and coordinating all the moving parts is going to be the biggest challenge. There is a lot we can do, but we also have to decide what’s going to make the biggest impact immediately and how we balance that against foundational work that delivers results in the long term.
Who is your industry hero and why?
I am going to go out on a limb and pick three: Nathan Marz from Twitter, and Jay Kreps and Martin Kleppmann, both formerly with LinkedIn.
All three have been working on open source projects to process vast amounts of streaming data in real time: Storm, Kafka and Samza, respectively.
The projects they worked on are physical manifestations of their big data philosophies, all of which have a common theme: we are moving away from static views of data to data as a function over time.
Databases capture the state of data at a point in time, while streams show how data changes over time. Streams are, in many ways, superior to databases. With streams (which are just changes to the state of data), you can always compute the present static state, or any previous state at any important time. But, more importantly from a data science perspective, you can analyze the changes and how they correlate over time.
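As a rough illustration of this idea (an editorial sketch, not code from Lazada or from any of the projects named above), the following Python replays a small stream of stock changes to rebuild the state at any chosen moment. The `StockChange` type, the `state_at` function and the sample events are all hypothetical names and made-up data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StockChange:
    """One event in the stream: a change to stock, not the stock level itself."""
    at: datetime
    sku: str
    delta: int  # +N for a delivery, -N for a sale


def state_at(events, when):
    """Replay every change up to and including `when` to rebuild stock levels."""
    levels = {}
    for event in sorted(events, key=lambda e: e.at):
        if event.at > when:
            break
        levels[event.sku] = levels.get(event.sku, 0) + event.delta
    return levels


# Made-up sample stream, for illustration only.
events = [
    StockChange(datetime(2016, 1, 4), "widget", +50),
    StockChange(datetime(2016, 1, 5), "widget", -8),
    StockChange(datetime(2016, 1, 6), "widget", -5),
]

print(state_at(events, datetime(2016, 1, 5)))  # {'widget': 42}
print(state_at(events, datetime(2016, 1, 6)))  # {'widget': 37}
```

A database table would typically hold only the last line of output; the stream keeps enough information to answer the same question for any earlier date.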
That’s a foundational concept in big data, and once you “get” it, everything about big data starts to make sense. You collect all the data at a granular level over time, and you use technologies like Hadoop and Spark to compute any past state, the present state or predict a future state. Very powerful stuff there.
Let’s take inventory, for example. It’s great if I know I have 37 widgets in stock. But is that enough to last me until the next shipment arrives? Maybe it is in a normal week, but what if the widget is on sale? What if it’s just before Christmas? What if it’s a sale just before Christmas? If I have a history of inventory levels (which is a stream of data), I can analyze historical trends and correlations with external events to predict whether 37 widgets is enough to meet customer demand given the present situation. I can also determine whether 37 is too many widgets and inform Operations that we should lower the inventory level.
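The widget question can be sketched in a few lines of Python. This is a deliberately naive editorial illustration, not the method John's team uses: it assumes demand equals the historical daily average multiplied by an `uplift` factor for events such as a sale. The function names, the sales history and the uplift value are all hypothetical.

```python
from statistics import mean


def expected_demand(daily_sales, days_until_shipment, uplift=1.0):
    """Naive forecast: average past daily sales, scaled for an event such as a sale."""
    return mean(daily_sales) * days_until_shipment * uplift


def enough_stock(on_hand, daily_sales, days_until_shipment, uplift=1.0):
    """True if current stock covers the forecast demand until the next shipment."""
    return on_hand >= expected_demand(daily_sales, days_until_shipment, uplift)


# Made-up history of daily widget sales (average of 5 per day).
history = [4, 6, 5, 5, 4, 6, 5]

# Normal week: forecast demand is 35, so 37 widgets are enough.
print(enough_stock(37, history, days_until_shipment=7))  # True

# Sale week, assuming demand doubles: forecast demand is 70, so 37 are not enough.
print(enough_stock(37, history, days_until_shipment=7, uplift=2.0))  # False
```

In practice the uplift would itself be estimated from the historical stream, for example by comparing sales during past promotions with ordinary weeks.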
Here are some good books by my “heroes” if you are interested in learning more:
Find out more about the fantastic big data insights John will bring to the E-Commerce Show Asia by downloading the brochure here!