Year-end summary and reflections of a big data algorithm engineer

It has been more than a month since I left my former employer. During this month I went back and forth holding technical exchanges with several companies, and it is also the first time I have had this long to calm down, think, and take stock. This is the fifth year since I graduated, and it happens to be the end of the year, so I am putting the two together into one small summary. It has no theme and no order of importance; it is purely a record, written down as the thoughts come.

1. Recommendation systems

For the last three or four years my main job has been building recommendation systems, and over those years I have read, if not thousands, then certainly hundreds of papers. This focus makes me believe I am at least at a relatively leading level in the industry when it comes to recommendation systems, but it is precisely this experience that has stamped me with a label: "recommendation system expert". That being so, let me talk about recommendation systems first.

"Recommendation system" is too large a term, so let us talk about the recommendation algorithm itself. The recommendation algorithm is really a composite problem: it can be made as shallow or as deep as you like. You can simply use the most basic content-based approach; a little more complex is collaborative filtering; deeper still are dimensionality-reduction algorithms based on SVD or LDA. If you want, there are rating-prediction algorithms such as SVD++ and ranking algorithms based on learning to rank. You can even transform the problem, converting recommendation into classification, or preprocess the data with a clustering algorithm before applying any of the algorithms above. There are plenty of tricks to play with.
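To make the "collaborative filtering" step on that ladder concrete, here is a minimal sketch of item-based collaborative filtering in plain Python. It is not taken from any system described in this post: the interaction data, the function names `item_similarities` and `recommend`, and the choice of cosine similarity over binary user vectors are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Toy implicit feedback: user -> set of items the user interacted with.
# The data is invented purely for illustration.
INTERACTIONS = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"c", "d"},
}


def item_similarities(interactions):
    """Cosine similarity between items over binary user vectors."""
    users_of = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_of[item].add(user)

    sims = defaultdict(dict)
    for i in users_of:
        for j in users_of:
            if i == j:
                continue
            overlap = len(users_of[i] & users_of[j])
            if overlap:
                sims[i][j] = overlap / sqrt(len(users_of[i]) * len(users_of[j]))
    return sims


def recommend(user, interactions, sims, k=2):
    """Score unseen items by summed similarity to the user's own items."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim
    # Sort by score descending, then by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


SIMS = item_similarities(INTERACTIONS)
print(recommend("u2", INTERACTIONS, SIMS))  # ['c', 'd']
```

The deeper methods mentioned above (SVD, LDA, SVD++, learning to rank) replace the similarity table with a learned model, but the overall shape stays the same: score the items a user has not seen, then rank them.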

Being an engineer in the recommendation field is a very "painful" thing, because whenever machine learning makes any breakthrough, you have to follow it. When the NLP field produced Word2Vec and GloVe, algorithm engineers in other fields could say they were not interested in NLP, but you had to keep track, because these models can help you build recommendation algorithms for text content. When deep learning made feature engineering for image recognition work better, you had to follow and learn it, because at last there might be a way to solve the meta-information problem in image recommendation. When the RecSys 2013 best paper greatly improved the efficiency of matrix factorization by reordering nodes to optimize the matrix blocking strategy, you had to follow it in order to update your own old offline algorithms. When Microsoft Research Asia came up with LightLDA, which lets LDA run in parallel across machines with low network traffic, you excitedly worked through dozens of pages of paper, because at last you would no longer have to put up with LDA's poor performance. This kind of tracking tends to be endless. But once you stop updating your knowledge, you fall far behind the academic community and end up as a mere "collaborative filtering" engineer.

But an algorithm reaches its full power only when it is fitted to the product, and choosing and tuning algorithms according to the product is a weak point for most of the algorithm engineers I know. To give a concrete example: we all know that in every competition, hybridizing multiple algorithms is the most important part, and people often ask me which mixing strategy is best, but in fact the answer depends heavily on the product itself. Take Tinder. Its product form shows only one person at a time, and you either like that person or you do not. In this situation you need a classifier that selects a suitable recommendation algorithm for each user, and you need to adjust that classifier using the user's real-time feedback, because a user who dislikes several people in a row may well be lost. For LinkedIn's user recommendation list, on the other hand, you can mix the results of several algorithms, because that ensures the list as a whole contains at least X items the user is interested in, whereas the items recommended by a single algorithm tend to converge on one another. And for a recommendation entry area shown on the home page, you may prefer an intersection strategy across algorithms, so that a small number of high-quality items satisfies the user's baseline expectations as far as possible and entices a click. So understanding the product's form and its mode of interaction, and customizing the algorithm to the product, is the important thing that marks a good algorithm engineer as opposed to an algorithm researcher.
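The difference between a mixing strategy for a list and an intersection strategy for a small entry area can be sketched as follows. This is only an illustration under assumptions: the function names, the round-robin interleaving rule, and the three example ranked lists are hypothetical and do not describe how LinkedIn or any other product actually does it.

```python
from itertools import zip_longest


def mix_round_robin(ranked_lists, k):
    """Interleave several ranked lists, skipping duplicates, keep the top k."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:k]


def mix_intersection(ranked_lists, k):
    """Keep only items every algorithm proposed, in the first list's order."""
    common = set(ranked_lists[0]).intersection(*ranked_lists[1:])
    return [item for item in ranked_lists[0] if item in common][:k]


# Hypothetical outputs of three recommenders for one user.
cf = ["a", "b", "c", "d"]
content = ["c", "e", "a", "f"]
popular = ["a", "c", "g", "h"]

# A long list benefits from diversity across algorithms.
print(mix_round_robin([cf, content, popular], k=6))  # ['a', 'c', 'b', 'e', 'g', 'd']

# A small home-page slot keeps only what all algorithms agree on.
print(mix_intersection([cf, content, popular], k=2))  # ['a', 'c']
```

Round-robin mixing trades some precision for coverage, which suits a long list; the intersection keeps fewer items but each one is backed by every algorithm, which suits a slot where only a handful of items can be shown.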

A good recommendation algorithm and a good recommendation system can indeed create a lot of value for a business. I once talked with the data director of a well-known e-commerce site, and their recommendation system really did raise sales by 15%. But mythologizing recommendation algorithms and becoming obsessed with them is an extreme, and so is looking down on them. A recommendation algorithm engineer should be clear about the bottlenecks of the algorithm itself and be able to estimate its real value to the business, and on that basis decide whether it is worth continuing to optimize. When to start and when to stop is a question every recommendation algorithm engineer must face and decide, but any discussion of it has to involve two other important parts of a recommendation system: the product and the data.

I cannot think of a better example, so let us take Taobao, and specifically Taobao's search and recommendation. First, I believe Taobao as a whole does not need recommendation algorithms, because Taobao is built around hit products, which account for most of its profit. I also believe that Taobao's KPI and operations culture are bound to make CTR the main KPI, so to that extent Taobao's operations staff will keep recommending hit products and will not take the risk of personalizing this part of the catalogue. This is a typical example of a product that is not suited to recommendation. On the other hand, because hit products are the ones being recommended, long-tail products cannot accumulate enough click data, which makes missing and biased data a very serious problem, and the recommendation algorithm cannot show its power. (I have not worked at Taobao, so this is purely an outsider's guess, and I believe most e-commerce sites are like this.)

Does this example negate the role of recommendation systems? I think a very simple analogy captures the situation: suppose there is 100 yuan lying on the ground every meter, and there may be 5 million yuan 10,000 meters away. The question is whether you bend down to pick up the 100 yuan, or run 10,000 meters to look for the 5 million.


Origin blog.csdn.net/chengxvsyu/article/details/92011394