Machine Learning, Data Science and the Financial Sector, Series 14: Artificial Intelligence, Big Data and Investment Management (10)

9. Establishing a data science team (Schroders)

Background
    Schroders established its Insights department in 2014, mainly to study how data can be used to improve the organization's returns and to deepen its insight into individual companies.

• Use case: generating investment ideas
    After the UN released a report on the theme of smart cities, a fund manager decided to find out which companies might benefit from this theme and from rising global urbanization.
    The first step was to search for articles containing "smart city" and "future", which returned tens of thousands of results. Machine learning techniques were then applied:
    1) A series of NLP algorithms extracts ideas, concepts, topics and keywords from the documents.
    2) The next step applies a dimensionality reduction algorithm. In practice a force-directed graph is used: each node represents a news article on a two-dimensional plane, and over successive iterations similar articles are drawn together while dissimilar articles tend to move apart. Over time, articles that discuss the same idea come together.
    3) At this point an unsupervised learning algorithm assigns the documents to different clusters. The final result is a map in which similar documents sit together and different clusters are shown in different colors.
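    The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the team's actual pipeline: TF-IDF weighting stands in for the unspecified NLP algorithms, the attraction and repulsion constants in the force-directed layout are arbitrary choices, k-means stands in for the unnamed unsupervised learning algorithm, and the six short "articles" are made up.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Step 1 (assumed technique): L2-normalised TF-IDF weights per document."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        vec = {t: c * math.log((1 + n) / (1 + df[t]))
               for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def force_layout(sim, steps=500, lr=0.05, seed=0):
    """Step 2: similar articles attract, every pair repels at short range."""
    rng = random.Random(seed)
    n = len(sim)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) + 1e-9
                # Attraction scales with similarity; the 0.1 repulsion
                # constant and the -5.0 cap are arbitrary illustration values.
                force = max(-5.0, sim[i][j] * dist - 0.1 / dist)
                fx += force * dx / dist
                fy += force * dy / dist
            pos[i][0] += lr * fx
            pos[i][1] += lr * fy
    return pos

def kmeans_2d(points, k=2, iters=20):
    """Step 3 (assumed technique): k-means with farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Made-up stand-ins for news articles on two different themes.
docs = [
    "smart city sensors traffic data",
    "smart city traffic sensors network",
    "city sensors smart grid traffic",
    "urban farming vertical crops water",
    "vertical farming crops urban yield",
    "crops water urban farming vertical",
]
vectors = tfidf_vectors(docs)
sim = [[cosine(a, b) for b in vectors] for a in vectors]
pos = force_layout(sim)
labels = kmeans_2d(pos, k=2)
print(labels)
```

    In a real setting the number of clusters is not known in advance, so the choice of k (or of a clustering method that does not need it) would itself be part of the analysis.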
    The company names mentioned in the articles are then overlaid on these clusters. The resulting visualization of news articles reveals some interesting features. In particular, company names that appear at the edge of the cluster map may be ones the fund manager has never heard of. Small companies surfaced this way may be investment opportunities, so the fund manager goes on to analyze their fundamentals and profitability to judge whether the business is sound. In this case, the company's stock was then added to the portfolio and returned more than 25%.
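    One simple way to make "the edge of the cluster" concrete is to rank articles by their distance from the cluster centroid and list the companies mentioned in the most distant ones. The sketch below assumes that reading; the positions and company names are hypothetical placeholders, and the source does not say how the edge was actually identified.

```python
import math

# Hypothetical layout positions for the articles of one cluster, with the
# made-up company names each article mentions.
articles = [
    {"pos": (0.0, 0.1), "companies": ["LargeCo"]},
    {"pos": (0.1, -0.1), "companies": ["LargeCo", "MidCo"]},
    {"pos": (-0.1, 0.0), "companies": ["MidCo"]},
    {"pos": (0.9, 0.8), "companies": ["SmallCo"]},
]

def edge_companies(articles, top_n=1):
    """Companies mentioned in the top_n articles farthest from the centroid."""
    cx = sum(a["pos"][0] for a in articles) / len(articles)
    cy = sum(a["pos"][1] for a in articles) / len(articles)
    ranked = sorted(articles,
                    key=lambda a: math.dist(a["pos"], (cx, cy)),
                    reverse=True)
    names = []
    for article in ranked[:top_n]:
        for name in article["companies"]:
            if name not in names:
                names.append(name)
    return names

print(edge_companies(articles))
```

    A list produced this way is only a starting point for the fundamental analysis described above, not an investment signal in itself.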

Team structure
    Ben is a portfolio manager on the global and international equities team. He believes that modern data science techniques and new data sources can be a powerful complement to traditional equity market analysis. Driven by this idea, Mark, a data scientist with 20 years of experience in data science and data analysis, joined the team as head of technology and data science. The team later recruited people with data science backgrounds from many different industries, and also hired graduates from universities.

Development process
    At the beginning, a key part of the process was brainstorming between investment experts and data scientists, which provided a very valuable opportunity to generate ideas that could be tested and shared across multiple teams.
    The first few years taught some lessons about how to build such a team from scratch inside a fundamentals-based asset manager. Everyone is different: some people like the idea of such a team and care about what these ideas can ultimately achieve, but not everyone is enthusiastic. What they have in common is that everyone wants to see results. When a valuable result is released, it generates a lot of positive internal feedback about what the team can do. This led many investors to want to get involved, suggesting datasets and giving feedback on what they find useful.
    All investment departments have begun to use the work of the Insights department, either through specific data analysis tasks or through reports generated by automated tools. The work has many users and a broad range of uses, from analyzing a single stock to working with macro data.

AI / Big Data technology
    Technology has always been a core element of this team's work. The datasets that generate insight cannot be handled in Excel, so the team has focused on specific tools and techniques.
    The first category is big data, which the team handles in two ways: AWS Redshift (a data warehouse service on Amazon's public cloud) and a local deployment of Hadoop / Hive.
    The second category is geospatial data, for which the team uses two tools: QGIS (an open source GIS) and PostGIS (a spatial database extension for PostgreSQL).
    The team has strong capabilities in predicting and interpreting patterns in data, including machine learning and Bayesian inference, and also uses Hadoop, Spark and GPUs.
    The main development languages are R and Python, and tools are deployed to production using Docker and Kubernetes. Simple visualizations are built in Tableau, and dashboards are built with R Shiny and Dash. An important task of the presentation layer is to convey the relevant information so that users grasp the insight immediately.
    From the point of view of tracking time series, the Bayesian approach is the most effective at identifying when a major change has occurred.
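    As a minimal sketch of what Bayesian change detection in a time series can look like, the example below computes the posterior probability of each possible location of a single shift in mean. The model is an assumption made for illustration, not the team's method: Gaussian noise with known standard deviation sigma, flat priors on the mean before and after the change, a uniform prior on the change location, and made-up data.

```python
import math

def changepoint_posterior(series, sigma=1.0):
    """Posterior over tau, the index of the first point after a mean shift.

    Assumed model: Gaussian noise with known sigma, flat priors on the two
    segment means, uniform prior on tau. Integrating out each segment mean
    leaves a factor len(seg) ** -0.5 * exp(-SS / (2 * sigma ** 2)), where SS
    is the sum of squared deviations from the segment mean.
    """
    n = len(series)
    log_post = {}
    for tau in range(1, n):
        total = 0.0
        for seg in (series[:tau], series[tau:]):
            mean = sum(seg) / len(seg)
            ss = sum((y - mean) ** 2 for y in seg)
            total += -0.5 * math.log(len(seg)) - ss / (2 * sigma ** 2)
        log_post[tau] = total
    # Normalise in a numerically stable way.
    peak = max(log_post.values())
    weights = {t: math.exp(v - peak) for t, v in log_post.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

# Illustrative values only: the level shifts from about 0 to about 2.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.9, 2.2, 1.8, 2.0]
posterior = changepoint_posterior(series, sigma=0.5)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

    Unlike a simple threshold rule, the output is a full distribution over change locations, so an ambiguous shift shows up as probability spread across several candidate dates rather than a single hard answer.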

Key points
    A senior sponsor is required, one who truly believes in and supports this change.
    The team needs the right mix of skills.
    Experts from other industries and domain experts are both essential.
    The team needs to be very clear about what it should do and what it can do, so that it knows where to focus its work.
    Compared with executing trades on a signal without taking other factors into account, signals from big data are better suited to helping build a view of a company and its operating environment.

Origin blog.csdn.net/weixin_43171270/article/details/104046070