MLDB is an open-source database designed for machine learning. You can install it wherever you want and send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs. In this talk, we will cover how to build a Predictive API "end to end" from data exploration to model evaluation to deployment, all using only simple MLDB API calls from MLDB's Notebook interface.
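The REST-driven workflow described above can be sketched as follows. The endpoint paths and configuration payloads here are illustrative assumptions modeled on MLDB's documented style (datasets, procedures, SQL training expressions); the snippet only builds the requests it would send, it does not contact a server.

```python
import json

# Hypothetical local MLDB instance; in the talk these calls are issued
# from MLDB's Notebook interface.
BASE = "http://localhost:8080/v1"

def create_dataset_call(dataset_id):
    """(url, payload) for the PUT that would create a mutable dataset."""
    return f"{BASE}/datasets/{dataset_id}", {"type": "sparse.mutable"}

def train_classifier_call(procedure_id, dataset_id):
    """(url, payload) for the PUT that would configure and run training."""
    payload = {
        "type": "classifier.train",
        "params": {
            "trainingData": f"SELECT {{* EXCLUDING (label)}} AS features, "
                            f"label FROM {dataset_id}",
            "modelFileUrl": "file://model.cls",
            "runOnCreation": True,
        },
    }
    return f"{BASE}/procedures/{procedure_id}", payload

# An HTTP client such as `requests` would PUT these to the server.
url, payload = train_classifier_call("train-model", "my_data")
print(url)
print(json.dumps(payload, indent=2))
```

The same pattern (a typed JSON config PUT to a resource URL) covers data loading, SQL exploration, training, and exposing the trained model as a scoring endpoint.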
Jeremy Barnes is an entrepreneur and technology leader, active at the intersection of artificial intelligence and industry. He has 15 years of experience applying machine learning to develop innovative products. Prior to MLDB.ai, he co-founded Datacratic, an enterprise software company developing machine learning technology for the marketing industry. Before that, Jeremy co-founded Idilia, a computational linguistics company, where he was responsible for research and development of Idilia's machine learning based core computational linguistics technology.
Machine learning as a service (MLaS) is imperative to the success of many companies, as internal teams and organizations need to gain business intelligence from big data. Building a scalable MLaS is a very challenging problem. In this paper, we present the scalable MLaS we built for a company that operates globally. We focus on several scalability challenges and our technical solutions.
Li Erran Li received his Ph.D. in Computer Science from Cornell University in 2001. From 2001 to 2015, he worked as a researcher at Bell Labs, Alcatel-Lucent (acquired by Nokia). In 2015, he joined Uber Technologies as a senior software engineer. He is also an adjunct professor in the Computer Science Department of Columbia University. He is an IEEE Fellow and an ACM Distinguished Scientist. His research interests are machine learning algorithms, systems, deep learning and AI.
Finding value in legacy industrial datasets: lessons from Amey’s Mercury platform — Stephen Gooberman-Hill (Amey)
Amey is one of the UK’s leading engineering asset management companies. We manage the design, build and maintenance of large public infrastructure estates – our clients include rail operators, airports and public utilities. The assets are specified for long lifetimes – an escalator is designed to last 30 years; a bridge hundreds.
Many of these assets are instrumented via a variety of legacy systems. We have designed and deployed a system called Mercury, which builds models of asset performance from this instrumentation data, and combines it with work order and other maintenance data to allow operations and maintenance teams to understand the performance of their assets.
Machine learning is integral to Mercury; problems include free-text matching, anomaly detection and fault prediction. I will talk about our experiences of applying ML techniques to legacy asset datasets, the issues we have faced, and how we have been able to provide actionable predictions of upcoming asset failures.
Dr Stephen Gooberman-Hill is a Principal Consultant in Amey’s Strategic Consulting and Technology Group. He is the solution originator of Amey’s Mercury data analytics system. He is currently managing a number of Mercury pilot deployments, and is also developing innovative data gathering and analytic solutions across a range of customers and partners.
A multi-pronged approach to speeding up predictive applications — Scott Leishman (Intel / Nervana Systems)
The development of predictive models is a time- and computationally intensive process that is highly iterative in nature. By carefully optimizing the right parts of the workflow, order-of-magnitude speed-ups can be achieved, leading to more accurate models in shorter periods of time. In this talk we'll touch on several different ways in which we've been able to drastically reduce the time to train deep learning models, from high-level library choices all the way down to leveraging custom silicon.
Scott has over nine years of experience creating machine-learning-based solutions to solve large-scale, real-world problems. Scott is currently the cloud team lead at Nervana Systems, focused on providing a highly optimized deep learning platform for customers across a variety of domains. Inside of work he can often be found pushing and reviewing code. Outside of work he can often be found running long distances and quaffing local craft beer, occasionally simultaneously.
Digital health has a problem. There are plenty of mobile applications being built to tackle just about every health-related issue out there, but most of these apps still lack quantitative data, predictive intelligence, and clinical validation. In this talk, Sean will discuss how we can aggregate smart home, wearable, connected health and ingestible data to create smarter, adaptive applications that patients, caregivers and providers can use to stay healthy outside the walls of a hospital.
Sean Lorenz is Founder & CEO of Senter, a startup creating a smart home health hub for healthy aging, as well as CTO for the Aging Well Institute. Dr. Lorenz was recently the Director of IoT Market Strategy for LogMeIn’s IoT platform, Xively. He has shaped business models and product strategies in several emerging markets including IoT, robotics, artificial intelligence and healthcare. He holds a PhD in Cognitive & Neural Systems from Boston University.
Transfer Learning and Fine-tuning Deep Convolution Neural Network model for Fashion images — Anusua Trivedi (Microsoft)
In this talk, we propose prediction techniques using deep learning on fashion images. We show how to build a generic deep learning model, which could be used with a fashion image to predict the clothing type in that image and generate fashion image description/captions. We propose a method to apply a pre-trained deep convolution neural network (DCNN) on images to improve prediction accuracy. We use an ImageNet pre-trained DCNN and apply fine-tuning to transfer the learned features to the prediction.
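As a minimal sketch of the transfer-learning idea described above, the fragment below treats a frozen "pretrained" network as a fixed feature extractor and trains only a new softmax head on toy "fashion" labels. The random projection stands in for a real ImageNet DCNN, and all data, shapes, and names are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen ImageNet-pretrained DCNN: a fixed random
# projection with a ReLU, mapping raw image vectors to 64-d features.
W_frozen = rng.normal(size=(512, 64))

def extract_features(images):
    """Frozen 'pretrained' feature extractor; weights are never updated."""
    return np.maximum(images @ W_frozen / np.sqrt(512), 0.0)

# Toy "fashion" data: 200 images of 512 raw pixels, 3 clothing classes,
# with class-dependent means so there is signal to transfer.
class_means = 0.5 * rng.normal(size=(3, 512))
y = rng.integers(0, 3, size=200)
X = class_means[y] + rng.normal(size=(200, 512))

# The transfer step: train only the new softmax head on frozen features.
F = extract_features(X)
W_head = np.zeros((64, 3))
onehot = np.eye(3)[y]
for _ in range(300):
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_head -= 0.5 * F.T @ (p - onehot) / len(X)   # cross-entropy gradient step

acc = float((np.argmax(F @ W_head, axis=1) == y).mean())
print(f"training accuracy of the transferred head: {acc:.2f}")
```

Full fine-tuning, as in the talk, additionally unfreezes some of the pretrained layers and continues training them at a small learning rate; the frozen-extractor version above is the simplest end of that spectrum.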
Anusua Trivedi is a Data Scientist on Microsoft's Advanced Data Science & Strategic Initiatives team. She works on developing advanced Predictive Analytics & Deep Learning models. Prior to joining Microsoft, Anusua was a data scientist at the Texas Advanced Computing Center (TACC), a supercomputing center. Anusua is a frequent speaker at machine learning and big data conferences.
BayesDB and VizGPM.js: open-source AI for visually exploring complex databases — Richard Tibbetts (MIT)
Artificially intelligent data products don’t have to be limited to answering simple natural language queries. Navigation, search, and retrieval of structured data, even by sophisticated domain experts, benefit from using AI to infer data’s latent structure. Using open source BayesDB and VizGPM.js, we demonstrate interfaces for browsing US census and software performance data.
Richard Tibbetts is a software entrepreneur, a database and programming languages nerd, a Visiting Scientist at MIT Probabilistic Computing, and a leader of the BayesDB (probcomp.csail.mit.edu/bayesdb) open source project. Prior to MIT, Richard was founder and CTO at StreamBase, a CEP company that merged with TIBCO in 2013. Richard is also the CEO of Empirical Systems, a stealth-mode startup.
Development and cloud deployment of machine learning models for heartbeat classification on data from wearable devices — Ikaro Silva (MC10)
Electrical heart signals are among the most recorded and stored physiological data in healthcare. With cardiovascular diseases being the single most common cause of death in the world, automatic analysis of cardiac signals under normal ambulatory conditions is expected to play a crucial role in helping clinicians identify health issues. A critical step towards this goal is the automatic classification of heartbeats. The purpose of this work is to showcase the development and deployment of a cloud system for classifying heartbeats collected from wearable devices.
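The classification step can be illustrated with a deliberately tiny sketch: extract a couple of per-beat features (peak amplitude and a width proxy) and assign each beat to the nearest class centroid. The synthetic waveforms, feature choices, and two-class setup are illustrative assumptions, not MC10's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def beat_features(beat):
    """Toy per-beat features: peak amplitude and a QRS-width proxy."""
    peak = beat.max()
    width = (beat > 0.5 * peak).sum()  # samples above half the peak
    return np.array([peak, width], dtype=float)

def synth_beat(kind):
    """Generate a crude synthetic heartbeat waveform (not real ECG)."""
    t = np.linspace(-1, 1, 100)
    if kind == "normal":          # narrow, tall QRS-like spike
        beat = 1.0 * np.exp(-(t / 0.05) ** 2)
    else:                         # "ectopic": wider, lower-amplitude complex
        beat = 0.6 * np.exp(-(t / 0.25) ** 2)
    return beat + 0.02 * rng.normal(size=t.size)

# "Train" a nearest-centroid classifier on labeled beats.
classes = ["normal", "ectopic"]
train = {k: np.array([beat_features(synth_beat(k)) for _ in range(50)])
         for k in classes}
centroids = {k: v.mean(axis=0) for k, v in train.items()}

def classify(beat):
    """Label a beat by its nearest class centroid in feature space."""
    f = beat_features(beat)
    return min(classes, key=lambda k: np.linalg.norm(f - centroids[k]))

print(classify(synth_beat("ectopic")))
```

A deployed cloud system replaces each piece: real beat detection on streamed sensor data, richer features or learned representations, and a stronger classifier served behind an API.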
Dr. Ikaro Silva is a Data Scientist at MC10 Inc and is responsible for developing algorithms that process the biological signals collected through MC10's unique wearable form factors. Dr. Silva is also a research scientist at MIT, where he is involved in augmenting PhysioNet's open source software and research.
Data Science and Dev Ops teams live on opposite sides of a wall in most organizations. Despite the separation, these teams should work together to develop a coherent process to release analytic products, support those products and maintain sanity. We propose an institutional capability, 'Analytic Operations', to support data-driven processes within lines of business. We hope to share lessons learned practicing Analytic Ops and present a set of best practices for Analytic Ops teams. We also demo open source tools that reduce friction between Data Science and Ops/Deployment teams.
Stuart Bailey is a partner and the Chief Technology Officer at the Open Data Group. He is a technologist and entrepreneur who has been focused on analytic and data intensive distributed systems for over two decades. Prior to Open Data Group, Stuart was the founder and most recently Chief Scientist of Infoblox (NYSE:BLOX), a Sequoia Capital-backed company. More than half the Fortune 500 rely on the Infoblox automated distributed system solutions for essential, software-based network control.
Mac Devine is an IBM Fellow currently serving as Vice President and CTO for Emerging Technologies. He is also a faculty member for the Cloud and Internet-of-Things Expo, and a member of the IoT Community Advisory Board.
[Keynote] Computational Privacy: The privacy bounds of human behavior — Yves-Alexandre de Montjoye (MIT Media Lab)
We're living in an age of big data, a time when most of our movements and actions are collected and stored in real time. Large-scale mobile phone, credit card, or browsing datasets dramatically increase our capacity to measure, understand, and potentially affect the behavior of individuals and collectives. The use of this data, however, raises legitimate privacy concerns. In this talk, I will first show how the mere absence of obvious identifiers such as name or phone number is often not enough to prevent re-identification. I will then discuss how, as the use of this data progresses, it will become increasingly important to consider whether sensitive information can be inferred from apparently innocuous data. Finally, I will discuss the impact of metadata on society and some of the solutions we have been developing to allow metadata to be used in a privacy-conscientious way.
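The re-identification point can be made concrete with a toy "unicity" measurement: given each user's set of spatiotemporal points, estimate the fraction of users who are uniquely pinned down by k randomly drawn points of their own trace. The data below is synthetic and the scale tiny; it only illustrates the measure, not the results on real mobility or credit card datasets.

```python
import random

random.seed(0)

# Toy traces: each of 200 users is a set of (location-cell, hour) points.
users = {
    u: {(random.randrange(30), random.randrange(24)) for _ in range(12)}
    for u in range(200)
}

def unicity(users, k, trials=200):
    """Fraction of sampled users made unique by k random points of their trace."""
    unique = 0
    for _ in range(trials):
        u = random.choice(list(users))
        pts = set(random.sample(sorted(users[u]), k))
        # Every user whose trace contains all k sampled points is a match.
        matches = [v for v, trace in users.items() if pts <= trace]
        if matches == [u]:
            unique += 1
    return unique / trials

for k in (1, 2, 4):
    print(k, round(unicity(users, k), 2))
```

Even in this toy setting, unicity climbs steeply with k: a single point rarely singles anyone out, while a handful of points usually does, which is why stripping names and phone numbers alone does not anonymize such data.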
Yves-Alexandre de Montjoye is a Research Scientist at the MIT Media Lab (and was previously a postdoctoral researcher at Harvard IQSS). His research aims at understanding how the unicity of human behavior impacts the privacy of individuals in large-scale metadata datasets. His work has been covered in The New York Times, BBC News, CNN, The Wall Street Journal, Harvard Business Review, Le Monde, Der Spiegel, Die Zeit, El País, and in reports of the World Economic Forum, United Nations, OECD, FTC, and the European Commission, as well as in his talks at TEDxLLN and TEDxULg.
For 2,500 years, knowledge in the West has consisted of justified true beliefs. But humans often cannot comprehend how Deep Learning comes up with its results. This differs from the role played by our traditional instruments of knowledge, and it poses challenges not just to the uses to which we put what we learn, but also to our idea of what knowledge itself is.
David Weinberger is a senior researcher at Harvard's Berkman Center for Internet & Society. From 2010 to 2014, he was Co-director of Harvard's Library Innovation Lab and led Harvard Library's Interoperability Initiative. Since Spring 2015, he has been a fellow at Harvard's Shorenstein Center for Media, Politics, and Public Policy (part of the Harvard Kennedy School).
David is co-author of the best-seller The Cluetrain Manifesto, which InformationWeek called the most important business book since Tom Peters's In Search of Excellence. He is the author of the critically acclaimed Small Pieces Loosely Joined: A Unified Theory of the Web and of Everything Is Miscellaneous: The Power of the New Digital Disorder. His latest book, Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, winner of two international Best Book of the Year awards, is about how the Net is transforming knowledge and expertise.