Battery Log Data Mining — Ramon Oliveira (Datart)
Jun 22, 3:00 pm

Battery life is critical for smart devices, but optimizing it requires cooperation from the entire software ecosystem. Wasteful software affects users' perception of a device's battery quality. Therefore, a large team within a producer of those smart devices is focused on identifying and correcting energy consumption bugs. Since the software ecosystem grows fast, that team faces many suspected issues, of which only a small fraction turn out to be genuine. Our project aims to streamline energy-related bug processing in devices of the company and its partners by automatically identifying anomalous behaviors related to battery drain using data mining and machine learning.

Ramon is a researcher at a Motorola-funded project and co-founder of Datart, a data science consulting startup. He is also an M.S. student at Unicamp working on Deep Learning and Prediction Uncertainty.

Defining Customer Value by Predicting Customer Conversion — Allan Dieguez (Creditas)
Jun 22, 2:30 pm

The objective of this presentation is to describe the challenges of modeling a customer conversion predictor using real leads data observed at different levels of the conversion funnel. This predictor is useful for segmenting customers by estimated conversion effort, which allows intelligence-based decision making in many areas of the company, such as marketing, customer success and credit analysis. The discussion will include the solution sketching process, as well as data extraction, feature engineering and model evaluation. It will also cover some challenges of using such models as a support system for operations analysts, as well as collecting their feedback to improve the solution.

Allan Dieguez is a Data Scientist at Creditas, responsible for conceptualizing, building and deploying optimization solutions in many areas of the company. He has 11 years of experience in designing and building Machine Learning solutions in many fields such as image recognition, NLP, dynamic pricing and data mining. He holds an MSc in Computer and Information Sciences from the Rio de Janeiro Federal University (UFRJ), Brazil.

Discovering the hidden treasure of data using graph analytic — Ana Paula Appel (IBM Research)
Jun 22, 2:00 pm

Graphs are used to map relations in unstructured data. Most company data comes from databases and is mined using traditional data mining approaches. However, modeling relational data as a graph can reveal useful insights and uncover relations among data that traditional data mining techniques ignore. In this work, we used graphs to map physician relations using claim data as a proxy, and this approach reveals interesting insights from a health insurance company's data.
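As a rough illustration of the idea in this abstract, the sketch below builds a weighted physician graph from claim records. The abstract does not say how relations are derived from claims, so the rule used here (two physicians are linked when they treated the same patient, weighted by the number of shared patients) is an assumption, and the function name `build_physician_graph` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_physician_graph(claims):
    """Return {(physician_a, physician_b): shared_patient_count}.

    Assumption: each claim is a (patient_id, physician_id) pair, and two
    physicians are related when at least one patient has claims with both.
    """
    physicians_by_patient = defaultdict(set)
    for patient, physician in claims:
        physicians_by_patient[patient].add(physician)

    edges = defaultdict(int)
    for physicians in physicians_by_patient.values():
        # Sort so each undirected edge always gets the same key.
        for a, b in combinations(sorted(physicians), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical claim records, for illustration only.
claims = [
    ("p1", "dr_a"), ("p1", "dr_b"),
    ("p2", "dr_a"), ("p2", "dr_b"), ("p2", "dr_c"),
    ("p3", "dr_c"),
]
graph = build_physician_graph(claims)
```

Heavier edges in `graph` then suggest physicians who frequently see the same patients, the kind of relation a row-by-row view of the claims table would not surface directly.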

Ana Paula Appel works at IBM Research - Brazil in the Social Data Analytics Group. She joined IBM Brazil in February 2012. Ana Paula has a B.S. in Computer Science from UFSCar, and an M.Sc. and a Ph.D., also in Computer Science, from USP under the guidance of Prof. Dr. Caetano Traina Jr. She spent a year as a visitor at Carnegie Mellon University (CMU) under the supervision of Prof. Dr. Christos Faloutsos. She also completed a postdoc at UFSCar.

[Tutorial] Building Machine Learning applications locally with Spark — Joel Pinho Lucas (Tail Target)
Jun 22, 12:00 pm

With huge amounts of heterogeneous data available, processing it and extracting knowledge requires ever more effort in building complex software architectures. In this context, Apache Spark provides a powerful and efficient approach for large-scale data processing. This talk will briefly introduce a powerful machine learning library (MLlib) along with a general overview of the Spark framework, describing how to launch applications within a cluster. A demo will then show how to simulate a Spark cluster on a local machine using images available in a Docker Hub public repository. Finally, another demo will show how to save time by using unit tests to validate jobs before running them in a cluster.

Joel received his Bachelor's degree in Computer Science from Universidade Federal de Pelotas (Brazil) in 2005 and his PhD in Informatics from Universidad de Salamanca (Spain) in 2010. His thesis topics were mainly within the Big Data and Recommender Systems fields. For two years, he was part of the R&D sector of HP Brazil and, subsequently, he was responsible for building recommender system architectures at Mobjoy Games. For the last three years, he has worked as a data scientist at Tail Target.

 

Deep Learning for Sentiment Analysis — André Barbosa (Elo7)
Jun 22, 11:30 am

Convolutional Neural Networks (CNNs) are already proven to be the state-of-the-art technique for image classification projects. However, some recent research has found that they can also be used for some text classification problems such as sentiment analysis. This talk presents some definitions of what CNNs are and shows a bit of code for building one in a small sentiment analysis project.

André Barbosa works as a Data Scientist/ML Engineer at Elo7, where he designs and develops machine learning solutions over a broad area that goes from computer vision to NLP. He holds a Bachelor's Degree in Information Systems from EACH/USP. A MOOC addict, he has completed several courses from websites like Coursera and is currently pursuing a Nanodegree in Deep Learning Foundations from Udacity.

Improving a recommendation engine with transfer learning — Pierre Gutierrez (Dataiku Labs)
Jun 22, 10:00 am

For online businesses, recommender systems are paramount. There is an increasing need to take into account all the user information to tailor the best product offer to each new user.

Part of that information is the content that the user actually sees: the visuals of the products. When it comes to products like luxury hotels, pictures of the room, the building or even the nearby beach can significantly impact users' decisions.

In this talk, we will describe how we improved an online vacation retailer's recommender system by using the information in images. We'll explain how to leverage open data and pre-trained deep learning models to derive information on user taste. We will use a transfer learning approach that enables companies to use state-of-the-art machine learning methods without needing deep learning expertise.

Pierre Gutierrez is a lead data scientist at Dataiku Labs in Paris, France. In the past few years, he has been working on state of the art Data Science and Machine learning problems in a large variety of sectors such as e-business, retail, insurance, or telcos. He has experience in topics such as fraud detection, bot detection, recommender systems, or churn prediction. Pierre has a pragmatic approach to data science and strongly believes in the power of transfer learning in image, text and AI.

A Tensorflow recommender system for news — Fabricio Vargas Matos (Hearst TV R&D)
Jun 22, 9:30 am

News recommendations are particularly challenging given the large volume of new content produced every day and how quickly its value to users deteriorates. This demands models and infrastructure able to deal with those nuances and serve a newly trained model about 100 times per day. In this presentation, you will get a detailed overview of how the R&D team of Hearst's TV division is putting together Google BigQuery, a Kubernetes cluster and Tensorflow to build a hybrid recommendation system combining model-based matrix factorization, content recency, and content semantics through NLP.

Fabricio joined the Hearst TV R&D team as a full-time Data Science Consultant in January 2017 to help design and implement their next-generation recommender system. He holds first-class BS (1999) and MS (2004) degrees in Computer Science, with a strong math background. He worked for a decade as a Senior Software Engineer and an Entrepreneur, and since 2016 he has been fully dedicated to Data Science.

Building Machine Learning Service in Your Business — Eric Chen (Uber)
Jun 22, 9:00 am

While building machine learning applications at Uber, we identified a set of common practices and painful procedures, and therefore built a machine learning platform as a service. Here we present the key components needed to build such a scalable and reliable machine learning service, which serves both our online and offline data processing needs.

Eric Chen is the Tech Lead/Manager for the Machine Learning Platform at Uber Inc. He is responsible for designing a scalable and shareable platform to cover machine learning practices for engineers and data scientists at Uber.

Learning from Data and Background Knowledge — Fabio Cozman (Universidade de São Paulo)
Jun 21, 4:00 pm

It is often possible to combine observed data with background knowledge, perhaps expressed as a set of rules or as a terminology. There has been significant work on the combination of data-centered methods, often based on probabilities, and knowledge-based cues, often based on logical languages. In this talk, we will examine a number of tools that can help one combine data and knowledge when trying to learn a model.

Fabio G. Cozman is Full Professor at Escola Politecnica, Universidade de São Paulo, where he is the head of the Mechatronics Department. After finishing his PhD at Carnegie Mellon University (USA), Fabio has worked on automating decision-making and on machine learning techniques.

Practical Machine Learning Models to Prevent Revenue Loss — Eiti Kimura and Flávio Clésio (Movile)
Jun 21, 3:00 pm

With today's high data volumes, there is a need to develop intelligent systems that can assist in data analysis and decision making. We offer a practical demonstration of using machine learning to create an intelligent application based on distributed system data. We'll show machine learning techniques applied in the development of a data analysis application that monitors distributed platforms with direct impact on company revenue, saving more than 3M dollars a year. We will also provide the source code of a practical demonstration of how to train machine learning models and perform predictions with Apache Spark.

Eiti is an IT coordinator and architect of distributed and high-performance platforms at Movile Brazil. Eiti has over 15 years of experience working with software development. He is an enthusiast of open technologies (he was an Apache Cassandra MVP from 2014 to 2015) and has vast experience with backend systems for carrier billing services, sending bulk text messages (SMS), and user action tracking. He holds a master's degree in electrical engineering with a specialization in software engineering.

Flávio Clésio is a specialist in machine learning and revenue assurance at Movile, where he helps to develop core intelligent applications to exploit revenue opportunities and automation in decision making. Prior to Movile, Flávio was a business intelligence consultant in financial markets, specifically in nonperforming loans. He holds a master's degree in computational intelligence applied to financial markets.

Shortening the time from analysis to deployment with ML-as-a-Service — Luiz Andrade (TEVEC Sistemas SA)
Jun 21, 2:30 pm

The daily job of a Data Scientist spans a variety of tasks, from improving model performance to dealing with framework implementation details. Machine Learning as a Service, a hot topic in the field, implies thinking about architecture to allow constant improvements in the performance of our products. This presentation shows one architecture design using RESTful resources, document-oriented databases and pre-trained pipelines to achieve real-time predictions of time series with high availability, scalability and the freedom for Data Scientists to work directly on improving the accuracy of our products. We fine-tuned it for time series forecasting, a very challenging field that still needs better solutions in terms of innovative modeling. The presentation will show how these decisions keep our Data Scientists focused on working with real data and thinking about improvements that can reach a large volume of time series instead of singular and localized actions.

Luiz has worked as COO at TEVEC since its foundation, promoting Machine Learning software development and implementation. He holds a degree in Civil Engineering and an MSc in Transportation Engineering from the Polytechnic School of the University of São Paulo, and graduated in Global Supply Chain and Logistics from the Massachusetts Institute of Technology.

 From 0 to ML in a Few Clicks — Poul Petersen (BigML)
Jun 21, 11:45 am

So you've heard of Machine Learning and are eager to make your first data-driven decision? In this presentation, Poul Petersen, CIO of BigML, will show you the quickest way to get started with Machine Learning. More importantly, you will see some examples of the type of end-to-end predictive applications that can be built quickly using a Machine Learning API, saving you the grief of trying to productize a roll-your-own solution when it is time to put your models into production.

Poul is Chief Infrastructure Officer at BigML. He has an MS degree in Mathematics as well as BS degrees in Mathematics, Physics and Engineering Physics. With 20-plus years of experience building scalable and fault-tolerant systems in data centres, Poul currently enjoys the benefits of programmatic infrastructure, hacking in Python to run BigML with only a laptop and a cloud.

[Tutorial] Feature Engineering — HJ van Veen (Nubank)
Jun 21, 11:15 am

Feature engineering is one of the most important, yet elusive, skills to master if you want to be a good data scientist. Machine learning competitions are hardly ever won with strong modeling techniques alone -- it is the combination of creative feature engineering and powerful modeling techniques that makes the difference. This tutorial will give the audience practical tips and tricks to improve the performance of machine learning algorithms. We will broadly look at feature engineering for applied machine learning, touching on subjects like: categorical vs. numerical variables, data cleaning, feature extraction, transformations, and imputation.
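To make two of the listed subjects concrete, here is a minimal sketch of mean imputation for a numerical variable and one-hot encoding for a categorical one. It is not taken from the tutorial itself: the helper names `impute_mean` and `one_hot` and the toy data are hypothetical, and a real project would more likely rely on a library such as pandas or scikit-learn.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode each category as a 0/1 indicator row; columns are sorted by name."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy data, for illustration only.
ages = [20, None, 40]          # numerical variable with a missing value
cities = ["SP", "RJ", "SP"]    # categorical variable

imputed_ages = impute_mean(ages)
columns, encoded = one_hot(cities)
```

Mean imputation is only one of several options (median, a constant, or a model-based fill are common alternatives), and one-hot encoding can become unwieldy for categorical variables with many distinct values.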

Currently, HJ van Veen is working as a Data scientist at Nubank.

Solving a business problem in 2 weeks using machine learning — Lucas Fonseca Navarro (Getninjas)
Jun 21, 9:45 am

Since Data Science became a very popular field, Machine Learning (ML) algorithms are being used more than ever in industry. With the tools available nowadays, ML techniques can solve many complex business problems both efficiently and quickly. In this presentation, we will show you how we designed and implemented an ML application to solve an emerging problem in just two weeks. We want to show how the dynamic, creativity-friendly environment of a startup company, combined with Machine Learning, can be powerful for creating efficient solutions.

Lucas completed his Bachelor's in Computer Science at the Federal University of Sao Carlos (2013) and his Master's in Computer Science, in the area of Artificial Intelligence, at the same university (2015). Currently, Lucas is working at Getninjas as Data Scientist and Product Owner in the Data Science Squad. Lucas has experience in Machine Learning, Business Intelligence, Data Visualization and Product Management.

Getting value from data science in the Telco business: the journey of Vivo Data Labs — Paula Fadul (Telefonica)
Jun 21, 9:15 am

Those who have had the chance to take a closer look can agree that data science may not be the holy grail for today's companies, but it raises the right questions and proposes better ways to achieve their goals. The point is: making our business truly smarter using data is no simple task. In this presentation, we will go through the main challenges faced by the data science team since its creation and how we were able to drive disruptive changes in our business processes, technology definitions, organizational structure and culture. Finally, we will provide a brief overview of the most relevant use cases we have delivered inside the company and the roadmap we have ahead.

Paula Fadul is the senior manager of the Data Labs team at Telefonica Brazil. She joined Telefonica in January 2008, at the start of her career, and has worked with Advanced Analytics since then, taking part in the evolution of Business Intelligence and the adoption of Big Data in the company. Paula holds a B.S. in Electrical Engineering from Universidade Federal de Uberlândia, a Master's in Marketing Management from Insper and a postgraduate degree in Digital Telecommunications Management from Universitat Politecnica de Catalunya, Spain.