Machine learning as a service (MLaaS) is imperative to the success of many companies, as internal teams and organizations need to gain business intelligence from big data. Building a scalable MLaaS is a very challenging problem. In this paper, we present the scalable MLaaS we built for a company that operates globally. We focus on several scalability challenges and our technical solutions.
Li Erran Li received his Ph.D. in Computer Science from Cornell University in 2001. From 2001 to 2015, he worked as a researcher at Bell Labs, Alcatel-Lucent (acquired by Nokia). Since 2015, he has been a senior software engineer at Uber Technologies. He is also an adjunct professor in the Computer Science Department of Columbia University. He is an IEEE Fellow and an ACM Distinguished Scientist. His research interests are machine learning algorithms, systems, deep learning and AI.
Finding value in legacy industrial datasets: lessons from Amey’s Mercury platform — Stephen Gooberman-Hill (Amey)
Amey is one of the UK’s leading engineering asset management companies. We manage the design, build and maintenance of large public infrastructure estates – our clients include rail operators, airports and public utilities. The assets are specified for long lifetimes – an escalator is designed to last 30 years; a bridge hundreds.
Many of these assets are instrumented via a variety of legacy systems. We have designed and deployed a system called Mercury, which builds models of asset performance from this instrumentation data, and combines it with work order and other maintenance data to allow operations and maintenance teams to understand the performance of their assets.
Machine learning is integral to Mercury; problems include free-text matching, anomaly detection and fault prediction. I will talk about our experiences of applying ML techniques to legacy asset datasets, the issues we have faced, and how we have been able to provide actionable predictions of upcoming asset failures.
Dr Stephen Gooberman-Hill is a Principal Consultant in Amey’s Strategic Consulting and Technology Group. He is the solution originator of Amey’s Mercury data analytics system. He is currently managing a number of Mercury pilot deployments, and is also developing innovative data gathering and analytic solutions across a range of customers and partners.
A multi-pronged approach to speeding up predictive applications — Scott Leishman (Intel / Nervana Systems)
The development of predictive models is a time- and compute-intensive process that is highly iterative in nature. By carefully optimizing the right parts of the workflow, order-of-magnitude speed-ups can be achieved, leading to more accurate models in shorter periods of time. In this talk we'll touch on several different ways in which we've been able to drastically reduce the time to train deep learning models, from high-level library choices all the way down to leveraging custom silicon.
Scott has over nine years' experience creating machine-learning-based solutions to large-scale, real-world problems. Scott is currently the cloud team lead at Nervana Systems, focused on providing a highly optimized deep learning platform for customers across a variety of domains. Inside of work he can often be found pushing and reviewing code. Outside of work he can often be found running long distances and quaffing local craft beer, occasionally simultaneously.
Digital health has a problem. There are plenty of mobile applications being built to tackle just about every health-related issue out there, but most of these apps still lack quantitative data, predictive intelligence, and clinical validation. In this talk, Sean will discuss how we can aggregate smart home, wearable, connected health and ingestible data to create smarter, adaptive applications that patients, caregivers and providers can use to stay healthy outside the walls of a hospital.
Sean Lorenz is Founder & CEO of Senter, a startup creating a smart home health hub for healthy aging, as well as CTO for the Aging Well Institute. Dr. Lorenz was recently the Director of IoT Market Strategy for LogMeIn’s IoT platform, Xively. He has shaped business models and product strategies in several emerging markets including IoT, robotics, artificial intelligence and healthcare. He holds a PhD in Cognitive & Neural Systems from Boston University.
Transfer Learning and Fine-tuning Deep Convolution Neural Network model for Fashion images — Anusua Trivedi (Microsoft)
In this talk, we propose prediction techniques using deep learning on fashion images. We show how to build a generic deep learning model, which could be used with a fashion image to predict the clothing type in that image and generate fashion image description/captions. We propose a method to apply a pre-trained deep convolution neural network (DCNN) on images to improve prediction accuracy. We use an ImageNet pre-trained DCNN and apply fine-tuning to transfer the learned features to the prediction.
Anusua Trivedi is a Data Scientist at Microsoft’s Advanced Data Science & Strategic Initiatives team. She works on developing advanced Predictive Analytics & Deep Learning models. Prior to joining Microsoft, Anusua was a data scientist at a Supercomputer Center - Texas Advanced Computing Center (TACC). Anusua is a frequent speaker at machine learning and big data conferences.
Enterprise business analysts can no longer ignore the value of using public cloud-based machine learning solutions. The cost, quality, ease of use and rapid development of predictive APIs enable corporations to use such public cloud services to effectively model their private data. Within such hybrid public-private cloud environments, the need for API security is greater than ever. Building a proper API security infrastructure without a long-term API strategy can be a challenge, and can over time become costly and expose your organization to serious security risks.
Jason Macy is the Chief Technical Officer responsible for innovation and product strategy for global operations at Forum Systems. Jason has been a leading visionary for enterprise architecture design and the successful deployment of API identity and security technology. With hundreds of deployments worldwide, Jason's unique ability to pragmatically solve complex industry use cases and provide sustained engineering initiatives continues to forge the leadership role of Forum Systems product technology. Drawing from experience in virtually every industry sector, Jason has helped to evolve the Forum Sentry technology platform into the global leader in FIPS 140-2 API security and identity.
Meta Data Science: When all the world's data scientists are just not enough — Chalenge Masekera (Salesforce)
What if you had to build more models than there are data scientists in the world? Well, enterprise companies serving hundreds of thousands of businesses often have to do precisely this. In this talk, I'll describe our general purpose machine learning platform that automatically builds per-company optimized models for any given predictive problem at scale, beating out most hand tuned models.
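The idea of one automated pipeline producing a tuned model per company can be sketched as below. This is an assumed illustration, not Salesforce's actual platform (which the talk does not detail): the search space, company names and dataset generation are all hypothetical, using scikit-learn's cross-validated search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space: candidate model families and hyperparameters.
SEARCH_SPACE = [
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
]

def best_model_for(X, y):
    """Automatically pick the best model and hyperparameters for one tenant."""
    candidates = []
    for estimator, grid in SEARCH_SPACE:
        search = GridSearchCV(estimator, grid, cv=3).fit(X, y)
        candidates.append((search.best_score_, search.best_estimator_))
    return max(candidates, key=lambda c: c[0])[1]

# Every "company" gets its own optimized model from the same pipeline.
tenant_models = {}
for i, company in enumerate(["acme", "globex"]):
    X, y = make_classification(n_samples=200, random_state=i)  # stand-in data
    tenant_models[company] = best_model_for(X, y)
```

The point of the sketch is the shape of the problem: the data scientist designs the search once, and the platform runs it per tenant, so model count scales with compute rather than with headcount.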
Chalenge Masekera is a data scientist at Salesforce, where he builds machine learning models and analytics tools that enable real-time monitoring of system infrastructure and machine learning models, as well as executive dashboards, ensuring scalable machine learning pipelines. His previous experience also includes business intelligence consultancy. He has a Masters in Information Management and Systems from the University of California, Berkeley.
A few years ago, Hive brought SQL to Hadoop and enabled its widespread adoption by data analysts. Today, Spark has become the tool of choice for data engineers, who can build powerful data pipelines. However, Spark is fairly complex. Using it efficiently requires some understanding of the inner workings (shuffler, caching, memory, …). We will cover the challenges we faced in bringing Spark to an audience of less technical users, some of the solutions (like auto-tuning), and how improvements to Spark (memory management, statistics, new APIs, …) help bring its power to every data citizen.
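The "inner workings" in question are largely configuration knobs that less technical users should never have to see — the kind of settings an auto-tuning layer chooses for them. The values below are purely illustrative, not recommendations:

```
# spark-defaults.conf — illustrative values only
spark.sql.shuffle.partitions   64     # shuffle parallelism; the default of 200 rarely fits the data at hand
spark.memory.fraction          0.6    # share of heap for execution + storage (memory management)
spark.executor.memory          4g     # per-executor heap size
spark.serializer               org.apache.spark.serializer.KryoSerializer  # faster serialization for shuffles/caching
```

Getting any one of these wrong can mean spills, out-of-memory failures, or idle cores, which is why hand-tuning them is a barrier for non-expert users.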
Clément Stenac is a passionate software engineer, CTO of Dataiku. We are the makers of DSS, an integrated development environment that helps data analysts, scientists and engineers collaborate to build and run data applications. Clément was previously head of development at Exalead, leading the design and implementation of large-scale search engine software. He also has extensive experience with open source software, as a former developer of the VideoLAN (VLC) and Debian projects.
Accelerating Model Development and Deployment with the right API Abstraction — Dallin Akagi (DataRobot)
The amount of value that we can get out of our data depends on both the accuracy of the models built around them and the speed with which these can be built, tested, and deployed. In this talk we present the DataRobot API, explaining why we chose a higher-level modeling abstraction than other APIs. We will also share a use case illustrating how this abstraction level makes it possible to accelerate model development and deployment.
Dallin is a data scientist and engineer at DataRobot, building a REST API for automated machine learning. He previously worked in a computer vision lab for the Department of Defense studying neural networks and deep learning. He studied Computer Science at Caltech.
In just the last few years, machine learning has gone from something barely known outside of academic circles to a critically important tool for optimizing business operations. Even assuming an organization has a small team of ML experts, as the number of ML applications explodes, the pressure on these teams and their hand-tailored solutions brings innovation to a halt.
As a result, many organizations are beginning to realize that the solution is to bring ML to everyone as a standardized platform. So, what should you be looking for in a platform? As easy as it is to make a wishlist of features, it's equally easy to overlook the importance of automation. ML tasks are iterative by nature, and automation of the tasks and workflows is essential. In this talk you will see how WhizzML is making automation easy, reducing the need for experts, and putting the Machines back into Machine Learning.
Poul is Chief Infrastructure Officer at BigML. He has an MS degree in Mathematics as well as BS degrees in Mathematics, Physics and Engineering Physics. With 20-plus years of experience building scalable and fault-tolerant systems in data centers, Poul currently enjoys the benefits of programmatic infrastructure, hacking in Python to run BigML with only a laptop and a cloud.
How to use predictive APIs for 'Next Best Action' marketing, based on various datasets and BigML's infrastructure.
Datatrics makes predictive marketing accessible, actionable and easy to use. With Datatrics, small and medium-sized enterprises can easily integrate their data, gain valuable insights and get actionable results that help them - and their team - to reach marketing goals. As Chief Technology Officer, Bas is responsible for the strategic development of the platform. Previously, Bas has worked in similar roles for Green Orange Digital Marketing and the financial analytics startup StockFluence.