12 Lessons from My First Year as a Machine Learning Engineer

Reposted from | Artificial Intelligence, Big Data and Deep Learning (ID: datayx)

Machine learning and data science are both broad terms. They span many fields and a great deal of knowledge, so what one data scientist does may be very different from what another does, and the same goes for machine learning engineers. What they usually share is using the past (data) to understand or predict (by building models) the future.

To put the points below in context, I should explain what my role was. I worked in a small machine learning consulting team. We did everything from data collection and cleaning to model building and service deployment, across every industry you can think of. Because the team was small, everyone wore many hats.

A machine learning engineer's daily routine:

At 9:00 in the morning I walk into the office, say hello to my colleagues, put my food in the refrigerator, pour a cup of coffee and walk to my desk. Then I sit down, look over my notes from the previous day and open Slack. I read the unread messages and open the articles and blog posts the team has shared. The field moves quickly, so it pays to keep an eye on the cutting edge.

Once I have cleared the unread messages, I spend some time browsing those articles and blog posts and carefully studying the parts that are hard to understand. Some of what I read turns out to be a great help with what I am working on. Reading generally takes me an hour or more, depending on the material. Some friends ask me why it takes so long.

As I see it, reading is the ultimate meta-skill. If there is a better way to do what I am currently doing, learning it right away saves me time and effort. There are exceptions: when a project deadline is approaching, I cut the reading short and get on with the project.

After reading, I review the previous day's work and check my notepad to see where I need to pick up. I can do this because my notepad is a running journal.

For example: "Processed the data into the correct format. The next step is to train a model on this data." If I run into difficulties along the way, I write something like: "There was a data mismatch. Next I will try to fix the mismatch and get a baseline before trying a new model."

Around 16:00 I tidy up my code: cleaning up messy code so it is readable, adding comments and combining pieces. Why do this? Because I often ask myself: what if someone else cannot follow this? If I had to read this code, what would I need most? Seen that way, time spent tidying the code is time well spent. Around 17:00, my code should be uploaded to GitHub.

That is the ideal day, but not every day goes like this. Sometimes an excellent idea arrives at 16:00, you follow it, and you may end up working through the night.

You should now have a general picture of a machine learning engineer's daily routine. Next, I will share what I learned from the experience.

1. It's all about the data

Machine learning engineers often focus on building a better model rather than on improving the data it is built on. Throwing enough computing power at a model can deliver exciting short-term results, but that should not always be the goal.

When you first join a project, spend a lot of time getting familiar with the data. In the long run, knowing the data well will save you far more time than it costs.

This does not mean you should not start small. For any new dataset, your goal should be to become the "expert" on it. Examine the distributions, find the different types of features, find the outliers and ask why they are outliers, and so on. If you cannot tell the story of the data you have, how can you expect a model to handle it well?
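As a rough illustration of these first checks, here is a minimal sketch in plain Python. The column of order values and the 1.5 x IQR outlier rule are assumptions made for the example; they do not come from any project described here.

```python
import statistics

def summarize(values):
    """Return basic distribution statistics and IQR-based outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Assumed rule of thumb: points beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical numeric column; 950 is a planted outlier.
order_values = [12, 15, 14, 13, 16, 15, 14, 950, 13, 12]
summary = summarize(order_values)
print(summary)
```

Flagging the outlier is only the start. The more useful step is asking why that value is there: a data entry error, a different unit, or a genuinely unusual case.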

Figure: an example exploratory data analysis lifecycle (the steps performed each time you encounter a new dataset).

2. Communication is harder than solving technical problems

Most of the obstacles I ran into were not technical but communication problems. There were technical challenges too, of course, but solving technical problems is our job as engineers.

Never underestimate the importance of internal and external communication. There is nothing worse than solving a technical challenge that turns out to be the wrong challenge. How does that happen? Externally, it comes from a mismatch between what the client is looking for and what we can provide. Internally, because many people wear many hats, it is hard to make sure everyone can devote themselves fully to one thing.

So how do you deal with these problems?

For external problems, the only answer is to keep communicating with the client. Does your client understand the service you can provide? Do you understand your client's needs? Do they understand what machine learning can and cannot offer? How can you communicate your ideas more effectively?

For internal problems, you can tell how hard internal communication is by the number of software tools built to solve it: Asana, Jira, Trello, Slack, Basecamp, Monday, Microsoft Teams. One of the most effective approaches I have found is a simple message update in the relevant project channel at the end of the day.

Is it perfect? No, but it seems to work. It gives me a chance to reflect on what I did, to tell everyone what I plan to work on next and where I need support, and even to get advice from the team. No matter how good an engineer you are, your ability to keep existing business and win new business depends on your ability to communicate.

3. Stable > state of the art

Here is a natural language problem we had: classifying text into different categories. The goal was to let users send a piece of text to a service and have it automatically classified into one of two categories. If the model was not confident in its prediction, the text was passed to a human classifier. The daily load was about 1,000 to 3,000 requests.
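The hand-off to a human can be sketched as a simple confidence threshold. This is a minimal illustration, not the service described above: the function name, the category labels and the 0.8 threshold are all assumptions, and the probabilities would come from whatever classifier is in use.

```python
def route(probabilities, threshold=0.8):
    """Accept the model's label when it is confident, otherwise defer to a human.

    `probabilities` maps each category to the model's predicted probability.
    The 0.8 threshold is an assumed value and should be tuned on real data.
    """
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    if confidence >= threshold:
        return {"label": label, "confidence": confidence, "handled_by": "model"}
    return {"label": None, "confidence": confidence, "handled_by": "human"}

# Hypothetical model outputs for two incoming texts.
confident = route({"category_a": 0.93, "category_b": 0.07})
uncertain = route({"category_a": 0.55, "category_b": 0.45})
print(confident["handled_by"], uncertain["handled_by"])
```

At a load of a few thousand requests a day, the threshold directly controls how much work lands on the human classifiers, so it is worth choosing against measured error rates rather than by feel.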

BERT was all the rage over the past year. But without Google-scale compute, training a BERT model for our problem was still complicated, because a lot would have had to change before going into production. Instead we used another method, ULMFiT. It is not the most advanced, but it gave satisfactory results and was easier to use.

4. The two most common pitfalls for machine learning beginners

There are two pitfalls in applying machine learning in production: the gap between coursework and project work, and the gap between a model in a notebook and a model in production (model deployment).

I studied machine learning through online courses to put together my own AI master's degree. But even after finishing many of the best courses, when I started working as a machine learning engineer I found that my skills were built on the structured backbone of coursework, while real projects are not organized like a curriculum.

I lacked a lot of specific knowledge that courses do not teach, such as how to question the data, and which data to explore and which to use.

How do you make up for this? I was lucky enough to work alongside some of the best people in Australia, but I was also willing to learn and willing to be wrong. Being wrong is not the goal, of course, but to get things right you have to figure out what is wrong.

If you are learning machine learning through a course, keep going with it, but apply what you are learning to your own projects to fill the gaps the course leaves.

As for deployment, I am still not very good at it. Fortunately, I have noticed a trend: machine learning engineering and software engineering are converging. With services such as Seldon, Kubeflow and Kubernetes, machine learning will soon become just another part of the stack. Building a model in Jupyter is simple, but how do you make that model available to thousands or even millions of people? This is what a machine learning engineer should be thinking about, and it is a precondition for machine learning to create value. Judging by recent discussions at Cloud Native events, however, few people outside the big companies know how to do it.

5. 20% time

20% time means that 20% of our time is spent learning. Learning is a loose term here: anything related to machine learning counts, and so does learning about the business. As a machine learning engineer, understanding the business can greatly improve your productivity.

If your business advantage is being the best at what you do, then the future of the business depends on you continuing to be the best at it, which means continuous learning.

6. One in ten papers is worth reading, and fewer are worth using

This is a rough metric. But explore any dataset or model and you will soon find that the pattern is universal. In other words, out of thousands of submissions each year, you might get 10 groundbreaking papers. And of those 10, five may come from the same institute or individual.

You cannot keep up with every new breakthrough. It is better to build a solid foundation in the basic principles and apply them, because those principles have stood the test of time.

This brings us to the problem of exploration versus exploitation.

7. Become your own biggest skeptic

The exploration versus exploitation problem is the hard choice between trying something new and reusing something that already works. You can deal with it by becoming your own biggest skeptic. Keep asking yourself: what benefit would replacing the old approach actually bring?

Exploitation: It is easy to run a model you have already used, get a high accuracy number and report it to the team as a new benchmark. But if you get a good result, remember to check your work, then check it again, and have your team do the same. You are an engineer, so you should have this awareness.

Exploration: Spending 20% of your time on exploration is a good decision, but a 70/20/10 split might be better. That means 70% of your time on the core product, 20% on building on top of the core product and 10% on moonshots (things for future use), even though these may not pay off immediately. I am ashamed to say I have never practised this in my role, but it is the direction I am moving in.

8. Toy problems are very useful

Toy problems help you understand many things, and they are especially useful for tackling a complex problem. First, build a simple problem. It could be a small subset of your data or an unrelated dataset. Find a solution to that problem, then scale it up to the full dataset. In a small team, the trick is to abstract the problem and then figure out how to solve it.
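One way to set up a toy problem is to carve out a small, reproducible sample and develop against it before touching the full dataset. The sketch below assumes a list of texts as the dataset; `make_toy_subset`, `word_counts`, the 10% fraction and the seed are hypothetical choices for illustration.

```python
import random

def make_toy_subset(records, fraction=0.1, seed=42):
    """Return a reproducible random sample holding `fraction` of the records."""
    rng = random.Random(seed)  # fixed seed so the toy problem is repeatable
    size = max(1, int(len(records) * fraction))
    return rng.sample(records, size)

def word_counts(texts):
    """Hypothetical stand-in for whatever processing step you are developing."""
    return [len(text.split()) for text in texts]

# Hypothetical full dataset of 1,000 short texts.
full_dataset = [f"sample text number {i}" for i in range(1000)]

toy = make_toy_subset(full_dataset)
toy_result = word_counts(toy)            # iterate quickly on the small sample first
full_result = word_counts(full_dataset)  # then run the same step on everything
print(len(toy_result), len(full_result))
```

Fixing the seed keeps the toy subset stable between runs, so a change in results comes from the code rather than from a different sample.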

9. Rubber ducking

If you have a problem, sitting and staring at the code may solve it, or it may not. At that point, try talking it through with a colleague and pretend they are your rubber duck. The problem often becomes much easier to solve.

"Ron, I'm trying to loop through this array and track its state while looping through another array and tracking its state, and then I want to combine those states into a list of tuples."

"Loops within loops? Why don't you vectorize it?" "Can I do that?" "Let's find out."

10. The number of models built from scratch is declining

This relates to the point that machine learning engineering and software engineering are converging.

Unless your data problem is very specific, it is probably similar to many others: classification, regression, time series forecasting, recommendation.

Services such as AutoML from Google and Microsoft offer world-class machine learning to anyone who can upload a dataset and choose a target variable. For developers, libraries such as fast.ai provide state-of-the-art models in a few lines of code, along with model zoos (collections of pre-built models). PyTorch Hub and TensorFlow Hub offer the same.

This means we still need to know the basic principles of data science and machine learning, but we should care more about how to apply them to practical problems to create value.

11. Math or code?

For the client problems I deal with, we are all code first, and all of the data science and machine learning code is Python. Sometimes I dabble in the math by reading a paper and reproducing it, but most existing frameworks already take care of the math. This is not to say that math is unnecessary. After all, machine learning and deep learning are both forms of applied mathematics.

At a minimum, knowing matrix operations, some linear algebra and some calculus, especially the chain rule, is enough to be a machine learning practitioner.
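To make the chain rule concrete, here is a small sketch that differentiates a squared-error loss for a one-weight model by hand and checks the result numerically. The loss, the values of `w`, `x` and `y`, and the function names are assumptions chosen for illustration.

```python
def loss(w, x, y):
    """Squared error of a one-weight linear model: L = (w * x - y) ** 2."""
    return (w * x - y) ** 2

def grad_chain_rule(w, x, y):
    """dL/dw via the chain rule, with u = w * x - y and L = u ** 2."""
    u = w * x - y
    dL_du = 2 * u  # outer derivative
    du_dw = x      # inner derivative
    return dL_du * du_dw

def grad_numeric(w, x, y, eps=1e-6):
    """Central finite-difference estimate of dL/dw, used as a sanity check."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)

w, x, y = 0.5, 3.0, 2.0  # hypothetical values
analytic = grad_chain_rule(w, x, y)
numeric = grad_numeric(w, x, y)
print(analytic, numeric)
```

Frameworks apply the same rule automatically across many layers, which is why knowing it well covers much of the calculus a practitioner needs day to day.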

Remember, for most practitioners most of the time, the goal is not to invent a new machine learning algorithm but to show clients how machine learning can help their business.

12. The work you did last year may be obsolete next year

This is the general trend, and it is becoming more and more evident as software engineering and machine learning engineering merge.

But this is also why you got into the industry. Frameworks will change and all kinds of useful libraries will change, but the underlying statistics, probability and mathematics stay the same. The biggest challenge remains: how to apply them to create value.

Now what?

There are plenty more pitfalls to work through on the way to becoming a machine learning engineer. If you are a beginner, getting a grip on these 12 is enough to start with.


Origin blog.csdn.net/red_stone1/article/details/102693811