Founder of 5 big data companies and Kaggle competition champion: the Internet industry's deep learning mistake is spending great effort on things with little impact

Author: Gregory Piatetsky, KDnuggets.

I conducted an exclusive interview with Jeremy Howard, a data scientist known as a "rock star". He talked about his latest deep learning online course, why Kaggle has achieved industry leadership, and the value of data scientists.

Jeremy Howard (@jeremyphoward) is a "rock star" in the field of data science. He excelled early, repeatedly earning the highest exam scores while studying in Australia, but he found school boring. He started his first "business" at the age of 12, selling pirated game software, and at 18 he was hired by McKinsey as a self-taught data analyst. A few years later, he founded Optimal Decisions Group, which used data analysis to help insurance companies increase their profits.

His second startup, FastMail, is the more popular and better known of the two. In the late 2000s, he sold both companies and began a simple "retirement" life: learning Chinese and building his own audio amplifiers.

Looking for a challenge, he entered a Kaggle competition in 2010 and won first place. He was then invited to join Kaggle as President and Chief Scientist, and helped Kaggle take the lead in the industry step by step.

After leaving Kaggle in December 2013, he founded Enlitic, a company that uses deep learning to improve medical diagnosis and clinical decision-making.

I first met Jeremy at the KDD-2011 conference, where he gave an unforgettable talk on deep learning. He used no slides, only a marker and a whiteboard, and explained his ideas in a simple way.

fast.ai is Jeremy's latest startup. He describes it in detail in the interview below.

Q1. Gregory Piatetsky (hereinafter referred to as GP): Tell us about your current startup, fast.ai. How is your "Deep Learning for Coders" course different from other deep learning courses?

Jeremy Howard (JH): There are many deep learning courses online, but none of them meets what we see as the most important need. We want to show people how to choose and use the most effective deep learning techniques to solve practical problems. We also want the material to be as simple as possible, so that programmers with no prior experience in the field can understand and master it.

Earlier teaching approaches lean heavily on mathematics and do not show how to solve problems directly; the programming exercises on Udacity are one example.

After analyzing many deep learning projects and courses, we realized that the most important technique to teach is transfer learning: using a model that has already been trained on a large data set as the starting point or baseline. This cuts training time by several orders of magnitude, yields a more accurate model, and requires far less data.
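As a rough illustration of the idea described above, here is a minimal pure-Python sketch, not taken from the fast.ai course. It assumes a tiny "pretrained" feature layer whose weights are made-up stand-ins for something learned on a large data set, keeps that layer frozen, and trains only a new linear head on a handful of target examples. All names (`pretrained_base`, `fine_tune_head`, and so on) are hypothetical, and the example only shows the mechanics of reusing frozen features, not the speed or accuracy gains mentioned in the interview.

```python
import math

def features(x, base):
    """Frozen 'pretrained' layer: one tanh unit per (weight, bias) pair."""
    return [math.tanh(w * x + b) for w, b in base]

def predict(x, base, head):
    """Linear head applied on top of the frozen features."""
    h = features(x, base)
    return sum(hw * hi for hw, hi in zip(head["w"], h)) + head["b"]

def mse(data, base, head):
    """Mean squared error of the model on (x, y) pairs."""
    return sum((predict(x, base, head) - y) ** 2 for x, y in data) / len(data)

def fine_tune_head(data, base, head, lr=0.05, epochs=300):
    """Gradient descent on the head only; the base is read, never updated."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(head["w"])
        grad_b = 0.0
        for x, y in data:
            h = features(x, base)
            err = predict(x, base, head) - y
            for i, hi in enumerate(h):
                grad_w[i] += 2 * err * hi / n
            grad_b += 2 * err / n
        head["w"] = [w - lr * g for w, g in zip(head["w"], grad_w)]
        head["b"] -= lr * grad_b
    return head

# Hypothetical stand-in for weights learned earlier on a large source data set.
pretrained_base = [(0.5, 0.0), (1.0, -0.5), (2.0, 0.5), (-1.0, 0.2)]
base_snapshot = list(pretrained_base)

# Small target data set: only 8 labelled examples of a new task.
target_data = [(x / 4.0, math.sin(x / 4.0)) for x in range(-4, 4)]

# The new head starts from zero and is the only part that gets trained.
head = {"w": [0.0] * len(pretrained_base), "b": 0.0}
loss_before = mse(target_data, pretrained_base, head)
fine_tune_head(target_data, pretrained_base, head)
loss_after = mse(target_data, pretrained_base, head)

print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In a real project the frozen base would be a large network trained on a data set such as ImageNet, and the head would be the final layers retrained for the new task; the toy above only mirrors that division of labor.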

We are also committed to teaching only research results that have proven themselves on real, practical problems. I have heard from many people who took our MOOC that they benefited a great deal, with large improvements in the accuracy and training speed of their models.

Q2. GP: Before founding fast.ai, you created Enlitic in 2014 with the goal of using deep learning to help doctors make diagnoses faster and more accurately. How much improvement can Enlitic deliver compared with professionally trained doctors?

JH: I don't know the latest developments, since I have been away for a few months. However, when I was learning how to apply deep learning to medicine, I found that the opportunities and potential in this area are huge. Most importantly, this technology has the chance to save lives and significantly reduce medical costs, especially in developing countries.

In clinical trials, Enlitic helped four top radiologists find and diagnose 7% more cancers. Across a wide range of cases, the doctors' misdiagnosis rate was 66%, while Enlitic's rate was 47%. (Source: Sydney Morning Herald report.)

Q3. GP: What are the obstacles to the widespread adoption of Enlitic or similar automation technology in the healthcare industry?

JH: One of the biggest obstacles is the lack of comprehensive data sets, meaning data sets that contain the history of medical tests, interventions, and treatment outcomes over a long period of time, linked across all patients. Only with such a data set can we build effective models for diagnostic testing and treatment recommendations that are based on actual treatment outcomes.

Another obstacle is the shortage of data scientists working in this field. In the Internet industry, you can see many smart and capable people working on things with little "impact", such as advertising systems, recommendation systems, and time-wasting social networks. This surprises me.

In addition, many deep learning researchers in academia are focused on "how to build a brain" rather than on solving the important problems facing humanity.

A further obstacle is that medical practitioners, especially clinical specialists, are so deeply specialized in their own areas that it is hard to find people who can advise us on the broader question of "solving medical problems".

Q4. GP: You gained widespread attention by taking first place in Kaggle competitions, and later became the President of Kaggle. What stands out from your time at Kaggle? For those who want to challenge your Kaggle ranking, what advice do you have?

JH: I learned a great deal about machine learning at Kaggle, probably more than in the previous two decades combined. In addition, over the past few months I have been studying Kaggle data sets again while preparing our course.

For people who want to improve their rankings, or machine learning practitioners who want to improve their skills, my advice is simple:

Submit an entry every day.

If you keep submitting every day, you will have learned a lot by the end of the competition. In your daily work you will rarely, if ever, get the chance to work with such rigorously defined data sets and metrics, or to compete against well-known data scientists in the industry.

Q5. GP: In the next 5 years, what skills should data scientists learn and improve to avoid being replaced by algorithms?

JH: I hope that the role of "data scientist" will be greatly reduced in the next few years. Instead, data science will be folded into other jobs and fields, such as those of medical experts, lawyers, and supply chain experts. Data scientists should therefore learn how an organization creates value, how different industries work, and how organizations are structured. Most importantly, they should work with domain experts from these organizations to increase their impact.

I don't know which technologies or skills will still be important in five years. What matters, I think, is your ability to learn and adapt.

Q6. GP: How do you expect deep learning technology to develop over the next 5 years? Will it eventually surpass humans in all areas, or are there some fields where humans will always stay ahead?

JH: First of all, it is difficult to know the limitations of deep learning, because we are still far from reaching its limits.

In fields built on creativity and displays of skill, humans will never be replaced, because people are only interested in watching other people perform. On creativity and art, for example, take a look at this article by Mike Loukides.

Q7. GP: You are the youngest member of Singularity University. What are you doing there?

JH: Actually, I don't think I am the youngest! I teach data science there. The most interesting part of each year is the Global Solutions Program, which selects 80 of the brightest and most passionate young people from all over the world to come together and work on some of humanity's most pressing problems. I am fortunate to teach them how to use data science to help solve those problems.

Q8. GP: Many readers and I are curious: why did you leave Kaggle and Enlitic?

JH: Leaving Kaggle was not a difficult decision. I never intended to work for the company full-time; I started out as a volunteer. To my surprise, we raised a lot of money from venture capital firms, and at that point I had no choice but to join full-time. Kaggle then made an unwise decision: it chose to devote its full effort to oil and gas related business, so it no longer made much sense for me to stay. Over the following year I immersed myself in deep learning, which led me to decide to enter the medical field.

Leaving Enlitic was very difficult for me. I stepped away from the company for a year to deal with an emergency at home. Before founding Enlitic, I spent a lot of time thinking about how to have a greater impact in medicine, and whether to enter academia or start a business. In hindsight, a startup that relies on outside investment is not a good vehicle for work that requires a lot of basic research, because investors who are eager to raise the company's valuation put too much pressure on the company and its employees.

That's why Rachel Thomas and I set up fast.ai, a self-sufficient research organization.

Q9. GP: What are your hobbies? Can you recommend a book you have read and liked recently?

JH: My greatest pleasure is playing with my young daughter. I love her curiosity about everything! I spend a lot of time reading deep learning papers, so I don't have much time for other books. In the evenings I like to listen to audiobooks, and these days I am listening to P. G. Wodehouse.

This article was written by Gregory Piatetsky and translated by Wei Jia. Please cite the source when reprinting.

Origin blog.51cto.com/14977574/2547169