MLDB is an open-source database designed for machine learning. You can install it wherever you want and send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs. In this talk, we will cover how to build a Predictive API "end to end" from data exploration to model evaluation to deployment, all using only simple MLDB API calls from MLDB's Notebook interface.
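The store → explore → train → expose loop described above maps onto a handful of REST calls. The sketch below only assembles the requests rather than sending them, and the endpoint paths and payload shapes are illustrative approximations of MLDB's API rather than a verified transcript — check the MLDB documentation for the exact contracts:

```python
import json
from urllib.parse import quote

BASE = "http://localhost:8080/v1"  # assumed local MLDB instance

def rest(method, path, body=None):
    """Describe one REST call; in practice you would send it with `requests`."""
    return {"method": method, "url": BASE + path,
            "body": json.dumps(body) if body is not None else None}

calls = [
    # 1. Create a dataset and store a row
    rest("PUT", "/datasets/demo", {"type": "tabular"}),
    rest("POST", "/datasets/demo/rows",
         {"rowName": "r1", "columns": [["x", 1.3, 0], ["label", 1, 0]]}),
    # 2. Explore it with SQL
    rest("GET", "/query?q=" + quote("SELECT count(*) FROM demo")),
    # 3. Train a classifier as a procedure, exposing it as a function...
    rest("PUT", "/procedures/train_demo",
         {"type": "classifier.train",
          "params": {"trainingData": "SELECT {x} AS features, label FROM demo",
                     "functionName": "score_demo"}}),
    # 4. ...which then serves predictions over the same REST API
    rest("GET", "/functions/score_demo/application?input=" +
         quote(json.dumps({"features": {"x": 2.0}}))),
]

for c in calls:
    print(c["method"], c["url"])
```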
Jeremy Barnes is an entrepreneur and technology leader, active at the intersection of artificial intelligence and industry. He has 15 years of experience applying machine learning to develop innovative products. Prior to MLDB.ai, he co-founded Datacratic, an enterprise software company developing machine learning technology for the marketing industry. Before that, Jeremy co-founded Idilia, a computational linguistics company, where he was responsible for research and development of Idilia's machine-learning-based core computational linguistics technology.
Machine learning as a service (MLaS) is imperative to the success of many companies, as internal teams and organizations need to gain business intelligence from big data. Building a scalable MLaS is a very challenging problem. In this paper, we present the scalable MLaS we built for a company that operates globally. We focus on several scalability challenges and our technical solutions.
Li Erran Li received his Ph.D. in Computer Science from Cornell University in 2001. From 2001 to 2015, he worked as a researcher at Bell Labs, Alcatel-Lucent (acquired by Nokia). In 2015, he joined Uber Technologies as a senior software engineer. He is also an adjunct professor in the Computer Science Department of Columbia University. He is an IEEE Fellow and ACM Distinguished Scientist. His research interests are machine learning algorithms, systems, deep learning and AI.
Finding value in legacy industrial datasets: lessons from Amey’s Mercury platform — Stephen Gooberman-Hill (Amey)
Amey is one of the UK’s leading engineering asset management companies. We manage the design, build and maintenance of large public infrastructure estates – our clients include rail operators, airports and public utilities. The assets are specified for long lifetimes – an escalator is designed to last 30 years; a bridge hundreds.
Many of these assets are instrumented via a variety of legacy systems. We have designed and deployed a system called Mercury, which builds models of asset performance from this instrumentation data, and combines it with work order and other maintenance data to allow operations and maintenance teams to understand the performance of their assets.
Machine learning is integral to Mercury; problems include free-text matching, anomaly detection and fault prediction. I will talk about our experiences of applying ML techniques to legacy asset datasets, the issues we have faced, and how we have been able to provide actionable predictions of upcoming asset failures.
Dr Stephen Gooberman-Hill is a Principal Consultant in Amey’s Strategic Consulting and Technology Group. He is the solution originator of Amey’s Mercury data analytics system. He is currently managing a number of Mercury pilot deployments, and is also developing innovative data gathering and analytic solutions across a range of customers and partners.
A multi-pronged approach to speeding up predictive applications — Scott Leishman (Intel / Nervana Systems)
The development of predictive models is a time- and computationally intensive process that is highly iterative in nature. By carefully optimizing the right parts of the workflow, order-of-magnitude speed-ups can be achieved, leading to more accurate models in shorter periods of time. In this talk we'll touch on several different ways in which we've been able to drastically reduce the time to train deep learning models, from high-level library choices all the way down to leveraging custom silicon.
Scott has over nine years' experience creating machine-learning-based solutions to solve large-scale, real-world problems. Scott is currently the cloud team lead at Nervana Systems, focused on providing a highly optimized deep learning platform for customers across a variety of domains. Inside of work he can often be found pushing and reviewing code. Outside of work he can often be found running long distances and quaffing local craft beer, occasionally simultaneously.
Digital health has a problem. There are plenty of mobile applications being built to tackle just about every health-related issue out there, but most of these apps still lack quantitative data, predictive intelligence, and clinical validation. In this talk, Sean will discuss how we can aggregate smart home, wearable, connected health and ingestible data to create smarter, adaptive applications that patients, caregivers and providers can use to stay healthy outside the walls of a hospital.
Sean Lorenz is Founder & CEO of Senter, a startup creating a smart home health hub for healthy aging, as well as CTO for the Aging Well Institute. Dr. Lorenz was recently the Director of IoT Market Strategy for LogMeIn’s IoT platform, Xively. He has shaped business models and product strategies in several emerging markets including IoT, robotics, artificial intelligence and healthcare. He holds a PhD in Cognitive & Neural Systems from Boston University.
Transfer Learning and Fine-tuning a Deep Convolutional Neural Network Model for Fashion Images — Anusua Trivedi (Microsoft)
In this talk, we propose prediction techniques using deep learning on fashion images. We show how to build a generic deep learning model that can take a fashion image, predict the clothing type in that image, and generate fashion image descriptions/captions. We propose a method for applying a pre-trained deep convolutional neural network (DCNN) to images to improve prediction accuracy. We use an ImageNet pre-trained DCNN and apply fine-tuning to transfer the learned features to the fashion prediction task.
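The core idea of fine-tuning — freeze the pre-trained feature layers, train only a new classification head — can be illustrated with a deliberately tiny NumPy stand-in. This is not the DCNN pipeline from the talk: the "pre-trained" extractor here is just a fixed random projection and the "images" are random vectors, but the frozen/trainable split is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor (stand-in for ImageNet DCNN layers):
# its weights are never updated during fine-tuning.
W_frozen = rng.normal(size=(64, 64)) / 8.0
def extract_features(images):              # images: (n, 64) flattened toy "images"
    return np.tanh(images @ W_frozen)

# Toy "fashion" dataset: two clothing classes, separable in raw pixel space
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Fine-tuning: train only a new classification head on the frozen features
F = extract_features(X)
w, b = np.zeros(64), 0.0
for _ in range(500):                       # plain gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    g = p - y
    w -= 0.5 * F.T @ g / len(y)
    b -= 0.5 * g.mean()

acc = ((p > 0.5) == y).mean()
print(f"training accuracy of the new head: {acc:.2f}")
```

In a real pipeline the extractor would be the convolutional layers of an ImageNet model, and only the final dense layers would receive gradient updates.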
Anusua Trivedi is a Data Scientist at Microsoft’s Advanced Data Science & Strategic Initiatives team. She works on developing advanced Predictive Analytics & Deep Learning models. Prior to joining Microsoft, Anusua was a data scientist at a Supercomputer Center - Texas Advanced Computing Center (TACC). Anusua is a frequent speaker at machine learning and big data conferences.
BayesDB and VizGPM.js: open-source AI for visually exploring complex databases — Richard Tibbetts (MIT)
Artificially intelligent data products don’t have to be limited to answering simple natural language queries. Navigation, search, and retrieval of structured data, even by sophisticated domain experts, benefit from using AI to infer data’s latent structure. Using open source BayesDB and VizGPM.js, we demonstrate interfaces for browsing US census and software performance data.
Richard Tibbetts is a software entrepreneur, database and programming languages nerd, a Visiting Scientist at MIT Probabilistic Computing, and a leader of the BayesDB open source project (probcomp.csail.mit.edu/bayesdb). Prior to MIT, Richard was founder and CTO at StreamBase, a CEP company that merged with TIBCO in 2013. Richard is also the CEO of Empirical Systems, a stealth-mode startup.
Development and cloud deployment of machine learning models for heartbeat classification on data from wearable devices — Ikaro Silva (MC10)
Electrical heart signals are among the most commonly recorded and stored physiological data in healthcare. With cardiovascular disease being the single most common cause of death in the world, automatic analysis of cardiac signals under normal ambulatory conditions is expected to play a crucial role in helping clinicians identify health issues. A critical step towards this goal is the automatic classification of heartbeats. The purpose of this work is to showcase the development and deployment of a cloud system for classifying heartbeats collected from wearable devices.
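As a sketch of what the heartbeat-classification step involves at its simplest, the toy rule below (hypothetical, not MC10's actual algorithm) flags a beat as premature when its RR interval — the time between successive beats — is markedly shorter than the recent average:

```python
# RR intervals in seconds between successive heartbeats (made-up sequence;
# the 0.45 s interval corresponds to a premature beat)
rr = [0.80, 0.82, 0.81, 0.45, 0.95, 0.80, 0.79]

def flag_premature(rr_intervals, ratio=0.85, window=4):
    """Flag beats whose RR interval is < ratio * mean of the last `window` beats."""
    flags = []
    for i, rr_i in enumerate(rr_intervals):
        recent = rr_intervals[max(0, i - window):i]
        baseline = sum(recent) / len(recent) if recent else rr_i
        flags.append(rr_i < ratio * baseline)  # markedly shorter than recent beats
    return flags

print(flag_premature(rr))  # only the 0.45 s beat is flagged
```

Production classifiers use far richer morphology and rhythm features, but threshold rules like this are a common baseline.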
Dr. Ikaro Silva is a Data Scientist at MC10 Inc and is responsible for developing algorithms that process the biological signals collected through MC10's unique wearable form factors. Dr. Silva is also a research scientist at MIT, where he is involved in augmenting PhysioNet's open source software and research.
Data Science and Dev Ops teams live on opposite sides of a wall in most organizations. Despite the separation, these teams should work together to develop a coherent process to release analytic products, support those products and maintain sanity. We propose an institutional capability, ‘Analytic Operations’, to support data-driven processes within lines-of-business. We hope to share lessons learned practicing Analytic Ops and present a set of best practices for Analytic Ops teams. We also demo open source tools that reduce frictions between Data Science and Ops/Deployment teams.
Stuart Bailey is a partner and the Chief Technology Officer at the Open Data Group. He is a technologist and entrepreneur who has been focused on analytic and data intensive distributed systems for over two decades. Prior to Open Data Group, Stuart was the founder and most recently Chief Scientist of Infoblox (NYSE:BLOX), a Sequoia Capital-backed company. More than half the Fortune 500 rely on the Infoblox automated distributed system solutions for essential, software-based network control.
Mac Devine is an IBM Fellow currently serving as Vice President and CTO for Emerging Technologies. He is also a faculty member for the Cloud and Internet-of-Things Expo, and a member of the IoT Community Advisory Board.
[Keynote] Computational Privacy: The privacy bounds of human behavior — Yves-Alexandre de Montjoye (MIT Media Lab)
We're living in an age of big data, a time when most of our movements and actions are collected and stored in real time. Large-scale mobile phone, credit card, or browsing datasets dramatically increase our capacity to measure, understand, and potentially affect the behavior of individuals and collectives. The use of this data, however, raises legitimate privacy concerns. In this talk, I will first show how the mere absence of obvious identifiers such as name or phone number is often not enough to prevent re-identification. I will then discuss how, as the use of this data progresses, it will become increasingly important to consider whether sensitive information can be inferred from apparently innocuous data. Finally, I will discuss the impact of metadata on society and some of the solutions we have been developing to allow metadata to be used in a privacy-conscientious way.
Yves-Alexandre de Montjoye is a Research Scientist at the MIT Media Lab (and was previously a postdoctoral researcher at Harvard IQSS). His research aims at understanding how the unicity of human behavior impacts the privacy of individuals in large-scale metadata datasets. His work has been covered in The New York Times, BBC News, CNN, the Wall Street Journal, Harvard Business Review, Le Monde, Der Spiegel, Die Zeit, El País, and in reports of the World Economic Forum, United Nations, OECD, FTC, and the European Commission, as well as in his talks at TEDxLLN and TEDxULg.
Claudia Perlich is Chief Scientist at Dstillery (formerly Media6Degrees), where she designs, develops, analyzes and optimizes the machine learning that drives digital advertising to prospective customers of brands. She was selected as a member of Crain's NY annual 40 Under 40 list.
This talk showcases a patient health condition prediction and monitoring system, with the goal of providing predictive care and identifying risk factors before a patient's condition becomes more serious. We illustrate the end-to-end process of building this application, including how the machine learning model is trained and deployed, how to automate the data scoring process, and how to consume the results using a reporting/visualization tool.
Dr. Yan Zhang is a Senior Data Scientist on the Algorithms and Data Science team in the Data Group, Cloud & Enterprise, Microsoft. She builds predictive analytics models and generalizes machine learning solutions on cloud machine learning platforms. Her recent work includes cost prediction and fraud claim detection in the healthcare domain and predictive maintenance in IoT applications. She was a reviewer for the book "Predictive Analytics with Microsoft Azure Machine Learning, 2nd Edition", published in September 2015.
Enterprise business analysts can no longer ignore the value of using public cloud-based machine learning solutions. The cost, quality, ease-of-use and rapid development of predictive APIs enable corporations to use such public cloud services to effectively model their private data. Within such hybrid public-private cloud environments, the need for API security is greater than ever. Building a proper API security infrastructure without a long-term API strategy can be a challenge, and can over time become costly and expose your organization to serious security risks.
Jason Macy is the Chief Technical Officer responsible for innovation and product strategy for global operations at Forum Systems. Jason has been a leading visionary for enterprise architecture design and the successful deployment of API identity and security technology. With hundreds of deployments worldwide, Jason's unique ability to pragmatically solve complex industry use cases and provide sustained engineering initiatives continues to forge the leadership role of Forum Systems product technology. Drawing on experience from virtually every industry sector, Jason has helped to evolve the Forum Sentry technology platform into the global leader in FIPS 140-2 API security and identity.
Meta Data Science: When all the world's data scientists are just not enough — Chalenge Masekera (Salesforce)
What if you had to build more models than there are data scientists in the world? Well, enterprise companies serving hundreds of thousands of businesses often have to do precisely this. In this talk, I'll describe our general-purpose machine learning platform that automatically builds per-company optimized models for any given predictive problem at scale, beating out most hand-tuned models.
Chalenge Masekera is a data scientist at Salesforce, where he builds machine learning models, analytics tools for real-time monitoring of system infrastructure, and executive dashboards, while ensuring scalable machine learning pipelines. His previous experience includes business intelligence consultancy. He has a Masters in Information Management and Systems from the University of California, Berkeley.
A few years ago, Hive brought SQL to Hadoop and enabled its widespread adoption by data analysts. Today, Spark has become the tool of choice for data engineers, who can build powerful data pipelines. However, Spark is fairly complex. Using it efficiently requires some understanding of its inner workings (the shuffle, caching, memory, …). We will cover the challenges we faced in bringing Spark to an audience of less technical users, some of the solutions (like auto-tuning), and how improvements to Spark (memory management, statistics, new APIs, …) help bring its power to every data citizen.
Clément Stenac is a passionate software engineer and CTO of Dataiku, maker of DSS, an integrated development environment that helps data analysts, scientists and engineers collaborate to build and run data applications. Clément was previously head of development at Exalead, leading the design and implementation of large-scale search engine software. He also has extensive experience with open source software, as a former developer of the VideoLAN (VLC) and Debian projects.
Accelerating Model Development and Deployment with the right API Abstraction — Dallin Akagi (DataRobot)
The amount of value that we can get out of our data depends on both the accuracy of the models built around them and the speed with which these can be built, tested, and deployed. In this talk we present the DataRobot API, focusing on why we chose a higher-level modeling abstraction than other APIs. We will also share a use case illustrating how this abstraction level makes it possible to accelerate model development and deployment.
Dallin is a data scientist and engineer at DataRobot, building a REST API for automated machine learning. He previously worked in a computer vision lab for the Department of Defense studying neural networks and deep learning. He studied Computer Science at Caltech.
In just the last few years, Machine Learning has gone from something barely known outside of academic circles to a critically important tool for optimizing business operations. Even assuming an organization has a small team of ML experts, as the number of ML applications explodes, the pressure on these teams and their hand-tailored solutions brings innovation to a halt.
As a result, many organizations are beginning to realize that the solution is to bring ML to everyone as a standardized platform. So, what should you be looking for in a platform? As easy as it is to make a wishlist of features, it's equally easy to overlook the importance of automation. ML tasks are iterative by nature, and automation of these tasks and workflows is essential. In this talk you will see how WhizzML is making automation easy, reducing the need for experts, and putting the Machines back into Machine Learning.
Poul is Chief Infrastructure Officer at BigML. He has an MS degree in Mathematics as well as BS degrees in Mathematics, Physics and Engineering Physics. With 20-plus years of experience building scalable and fault-tolerant systems in data centers, Poul currently enjoys the benefits of programmatic infrastructure, hacking in Python to run BigML with only a laptop and a cloud.
How to use predictive APIs for 'Next Best Action' marketing, based on various datasets and BigML's infrastructure.
Datatrics makes predictive marketing accessible, actionable and easy to use. With Datatrics, small and medium-sized enterprises can easily integrate their data, gain valuable insights and get actionable results that help them - and their team - to reach marketing goals. As Chief Technology Officer, Bas is responsible for the strategic development of the platform. Previously, Bas has worked in similar roles for Green Orange Digital Marketing and the financial analytics startup StockFluence.
[Tutorial] Evaluating Failure Prediction Models for Predictive Maintenance — Shaheen Gauher (Microsoft)
Predictive Maintenance is about anticipating failures and taking preemptive actions. In the realm of predictive maintenance, the event of interest is an equipment failure. Modelling for Predictive Maintenance falls under the classic problem of modelling with imbalanced data, where only a small fraction of the data corresponds to failures. Such data poses several issues. In this talk I will highlight some of the pitfalls and challenges of building a model with such data and describe ways to circumvent the problems, using real use cases and examples.
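A minimal example of the imbalance pitfall: on failure data, a model that never predicts a failure can still score near-perfect accuracy, which is why recall (and precision) on the failure class matter far more. The counts below are made up for illustration:

```python
# 1,000 machine-days, of which only 10 are failures: the majority-class
# baseline looks great on accuracy but is useless for maintenance.
y_true = [1] * 10 + [0] * 990   # 1 = failure
y_pred = [0] * 1000             # "never fails" baseline model

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)       # fraction of real failures actually caught

print(f"accuracy = {accuracy:.1%}, recall = {recall:.0%}")  # 99.0% vs 0%
```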
Shaheen Gauher, PhD, is a Data Scientist in Information Management and Machine Learning at Microsoft. She develops end to end data driven advanced analytic solutions for external customers working across all verticals.
[Tutorial] Beyond Churn Prediction: An Introduction to Uplift Modelling — Pierre Gutierrez (Dataiku)
In several industries (e-business, telcos…), a common approach to diminishing user churn is to use machine learning to score individual customers by churn probability and target them with specific messaging or offers. However, this approach may be ineffective, since it does not optimize what is called "true lift" or "uplift": the effect of an action on the churning probability. This talk introduces uplift modelling in a tutorial-like format. We'll cover the basics of the theory as well as how to make it work in practice, and illustrate the talk with real-life examples.
Pierre Gutierrez is a senior data scientist at Dataiku. As a data science expert and consultant, Pierre has worked in sectors as diverse as e-business, retail, insurance, and telcos. He has experience in topics such as fraud detection, bot detection, recommender systems, and churn prediction.
We will look at ways of applying data science and machine learning to better understand customers and improve their user experience. From a practical industry-application perspective, we will discuss the following: measuring popularity, statistical significance in A/B testing, survival analysis, predictive lifetime value and recommendation systems. We will review the concepts and some of the math behind these, while also addressing the real world challenges faced by many of these implementations.
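For the "statistical significance in A/B testing" piece, the standard two-proportion z-test fits in a few lines of plain Python. The conversion counts below are invented for illustration:

```python
from math import sqrt, erf

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error of the difference
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * P(Z > |z|)
    return z, p_value

# Variant B converted 150/1000 vs. A's 120/1000 — significant at the 5% level?
z, p_val = ab_significance(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_val:.3f}")
```

The same helper also shows why sample size matters: halve both `n` values with the same rates and the result is no longer significant.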
Vinny is a Senior Data Scientist and Professor at Metis. Previously, he was a Lead Data Scientist at High 5 Games and an R&D Programmer at Blue Sky Studios (the animation company that made Ice Age, Rio and Peanuts).
- Abhi Yadav - CEO at DataXylo
- Apparao Kari - CEO at Cintell
- Rag Srinivas - Cloud Architect at IBM
- Snejina Zacharia - CEO at Insurify