As you click through an e-commerce site like Amazon or Overstock, many of the features you interact with, from recommendations and search ranking to display advertisement selection, are driven in real time by machine learning and artificial intelligence technologies.
Overstock’s Director of Machine Learning, Kamelia Aryafar, oversees a team of almost 20 data scientists and machine learning engineers who build these systems, helping make Overstock one of the top 50 online retailers in the world and Utah’s largest technology company.
Academia and Industry
Aryafar said she learned Logo — an educational programming language — when she was 5 or 6, starting her journey into computer science and AI. She encountered machine learning along the way, and ultimately completed a Ph.D. in AI/Machine Learning at Drexel University building large-scale classification models. Aryafar moved to Overstock last summer after spending more than four years as a senior data scientist at Etsy.
The switch from academia to industry was “definitely a learning curve,” Aryafar said. While they were both rewarding, she found them to be quite different environments. Industry tends to be more collaborative and less individual than academia, she said, and the problems that academics work on have a distinct character.
“When you’re in academia, as a Ph.D. student, your goal isn’t always to get something into production but more to write a research paper,” she said. “So the problem is well-defined, the scope is clear, and the scale is much, much smaller.”
She said moving to a production-focused mindset required some personal learning.
“When I transferred to industry the first thing I noticed was that a lot of things that I wished would work didn’t because they wouldn’t scale,” she said. “I had to learn a lot of engineering skills to be able to scale what I’m doing to a production-quality system.”
One thing Aryafar found less challenging in industry than in academia was access to computational resources. While the situation depends on the company, she said it’s generally easier to justify the cost of a GPU cluster when working in industry. Nevertheless, an important “part of the work in industry is to optimize whatever you’re working on to make sure it makes sense in terms of the resources you’re using.”
“As a data scientist, a lot of my time is actually spent on optimizing a lot of different jobs that I do just to make sure that they actually finish in a reasonable time,” Aryafar said. “But it’s definitely easier than academia to get access to resources.”
The divide between industry and academia hasn’t just affected Aryafar’s personal experience — she said it poses a major challenge for progress in the AI field as a whole.
“One of the things that’s holding the field back is a kind of disconnect between academia and industry,” she said. “I wish there could be more close collaboration.”
There’s something of a mismatch between the goals of academic and industry research, according to Aryafar; she said academia could focus more on what works at scale in industry, for example.
To bridge the gap with academia, Aryafar said she attends conferences and reviews papers to be published — something that “actually forces me to stay engaged and informed,” she said. Another step she takes to forge a connection with academics is hiring interns out of graduate programs.
Machine Learning at Overstock
Aryafar said Overstock’s current machine learning projects largely rely on the state-of-the-art techniques that have accelerated the field over the past several years: deep learning driven by distributed GPUs and large datasets.
According to Aryafar, there’s no specific background that her whole data science team shares — to the contrary, she said she believes having a range of different skill sets improves team performance.
“I think that anyone who has a good foundation of engineering skills and a quantitative mind can definitely catch up on machine learning,” she said.
Aryafar said the large number of online tutorials and open-source machine learning libraries makes it possible for an increasing number of people to teach themselves the skills they need to enter the field.
Open-source tools are also widely used by Overstock’s data scientists, Aryafar said. In terms of frameworks, Aryafar typically uses TensorFlow and Torch, but she’s open to team members using the technology they want.
“I believe that it should be the right tool for the job,” she said. “As long as it works I’m ok with it.”
Similarly, Aryafar said it’s important to avoid becoming too narrowly focused on a specific approach when selecting models and architectures.
“I love the trend [of more people entering AI] and I’m a big advocate of it, but I do want to mention something about keeping an open mind,” she said. “I have seen people who become very attached to one model. Part of AI is science and part of science is to keep that scientific method: analyze results and try to improve on them, not get attached to them.”
Continuing her philosophy of flexibility, Aryafar said there is no single best approach to integrating data scientists into development teams within Overstock. The choice depends on a number of factors, including the project type, the number of people on the team, and the characteristics of the individual data scientist.
“Over the years, I have seen different models, from embedded data scientists (within the product teams) to data scientists who are production-quality engineers too — they take everything from the idea to the actual aspects of production,” she said. “It really depends on the project.”
Aryafar said she believes data science teams will continue to drive growth in startups and the e-commerce industry in particular.
“To me the machine learning and AI teams are the ones that are going to grow and expand and make all the difference,” she said. “Of course, I’m biased.”