LinkedIn open-sources Feathr, an enterprise-grade, high-performance feature store

LinkedIn today announced that it is open-sourcing Feathr, the feature store it built to simplify machine learning (ML) feature management and improve developer productivity.

A feature store is a data management system for machine learning features, covering both the feature engineering code and the feature data. It serves as a central repository of documented, well-designed, access-controlled features that can be reused by many different ML models across teams. It ingests data from various sources and applies defined transformations, aggregations, validations, and other operations to create features. The feature store then registers the available features and makes them ready to be retrieved and consumed by ML training pipelines and inference services.
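The flow described above (ingest raw data, transform and aggregate it, validate it, then register the result for retrieval) can be illustrated with a minimal in-memory sketch. It is not Feathr's API: the `Event` record, the `member_total_page_views` feature, and the `FeatureStore` class are hypothetical names chosen for illustration, and a real feature store would persist values to offline and online storage rather than to a Python dict.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One raw record from a hypothetical activity log."""
    member_id: int
    page_views: int


def total_page_views(events: list[Event]) -> dict[int, int]:
    """Transformation + aggregation: sum page views per member."""
    totals: dict[int, int] = defaultdict(int)
    for event in events:
        totals[event.member_id] += event.page_views
    return dict(totals)


def validate_non_negative(values: dict[int, int]) -> dict[int, int]:
    """Validation: refuse to publish a feature containing negative counts."""
    bad = sorted(k for k, v in values.items() if v < 0)
    if bad:
        raise ValueError(f"negative page views for members {bad}")
    return values


class FeatureStore:
    """Toy central repository: feature name -> {entity id: feature value}."""

    def __init__(self) -> None:
        self._features: dict[str, dict[int, int]] = {}

    def register(self, name: str, values: dict[int, int]) -> None:
        """Publish materialized values under a shared feature name."""
        self._features[name] = dict(values)

    def get(self, name: str, entity_id: int, default: int = 0) -> int:
        """Retrieve a value, as a training pipeline or inference service would."""
        return self._features[name].get(entity_id, default)


# Usage: compute, validate, and register a feature, then read it back by name.
events = [Event(1, 3), Event(2, 5), Event(1, 4)]
store = FeatureStore()
store.register("member_total_page_views", validate_non_negative(total_page_views(events)))
```

Here `store.get("member_total_page_views", 1)` returns 7, the sum of member 1's two events.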

At LinkedIn, dozens of applications use Feathr to define features, compute them for training, deploy them in production, and share them across teams. According to LinkedIn, teams using Feathr needed significantly less time to add new features to their model training workflows and saw better runtime performance than with the application-specific feature pipelines they used before.

"A few years ago, we noticed a pattern: teams were overburdened by the growing cost of maintaining their feature preparation pipelines, which hurt their productivity in innovating and improving their applications," LinkedIn said in the announcement. "These pipelines must pool time-sensitive data from many sources, join features with training labels in a point-in-time-correct manner, and persist features to storage for low-latency online serving. They also need to ensure that features are prepared the same way in training and inference environments to prevent training-serving skew."
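The "point-in-time-correct" requirement means that each training label may only be joined with feature values that were already known at the label's timestamp; otherwise information from the future leaks into training. A minimal sketch of such a join, assuming each entity's feature history is a list of `(timestamp, value)` pairs sorted by time (the `Observation` type and `point_in_time_join` function are hypothetical, not part of Feathr's API):

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A training label for one entity at one point in time."""
    entity_id: int
    timestamp: int  # e.g. epoch seconds
    label: int


def point_in_time_join(
    labels: list[Observation],
    feature_history: dict[int, list[tuple[int, float]]],
) -> list[tuple[Observation, float | None]]:
    """Attach to each label the latest feature value recorded at or before its timestamp.

    feature_history maps entity_id -> [(timestamp, value), ...] sorted by timestamp.
    Values recorded after the label's timestamp are never used.
    """
    joined = []
    for obs in labels:
        history = feature_history.get(obs.entity_id, [])
        times = [t for t, _ in history]
        # Index of the first entry strictly later than the label time.
        idx = bisect.bisect_right(times, obs.timestamp)
        value = history[idx - 1][1] if idx > 0 else None
        joined.append((obs, value))
    return joined


# Usage: entity 42's feature changed at t=100, 200, and 300.
history = {42: [(100, 0.1), (200, 0.5), (300, 0.9)]}
labels = [Observation(42, 250, 1), Observation(42, 50, 0), Observation(7, 250, 0)]
joined_values = [value for _, value in point_in_time_join(labels, history)]
```

The label at t=250 gets 0.5 (the value from t=200), not the later 0.9; the label at t=50 and the entity with no history get `None`, so `joined_values` is `[0.5, None, None]`.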

According to LinkedIn, preparing and managing features had become one of the most time-consuming parts of running its ML applications at scale.

As an abstraction layer, Feathr gives users a common namespace for defining features and a common platform for computing and serving them, so that features can be referenced "by name" from within ML workflows. Feathr also provides advanced support for feature transformations, enabling users to experiment with new features built on top of the raw data.
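As a rough illustration of defining features in a shared namespace and requesting them by name, here is a toy registry in which a derived feature (a click-through rate) is built on top of two raw-data features. The `FeatureRegistry` class and the feature names are hypothetical; Feathr's actual definition API differs, and this sketch omits cycle detection and persistence.

```python
from __future__ import annotations

from typing import Callable, Dict

Row = Dict[str, float]
# A definition receives the raw row plus a lookup for other named features.
FeatureFn = Callable[[Row, Callable[[str], float]], float]


class FeatureRegistry:
    """Hypothetical common namespace: each feature name maps to exactly one definition."""

    def __init__(self) -> None:
        self._defs: Dict[str, FeatureFn] = {}

    def define(self, name: str, fn: FeatureFn) -> None:
        if name in self._defs:
            raise KeyError(f"feature {name!r} is already defined")
        self._defs[name] = fn

    def compute(self, names: list[str], raw: Row) -> Dict[str, float]:
        """Compute the requested features by name, reusing intermediate results."""
        cache: Dict[str, float] = {}

        def get(name: str) -> float:
            if name not in cache:
                cache[name] = self._defs[name](raw, get)
            return cache[name]

        return {name: get(name) for name in names}


registry = FeatureRegistry()
registry.define("clicks", lambda raw, get: raw["clicks"])
registry.define("impressions", lambda raw, get: raw["impressions"])
# Derived feature defined on top of the two raw-data features.
registry.define(
    "ctr",
    lambda raw, get: get("clicks") / get("impressions") if get("impressions") else 0.0,
)

features = registry.compute(["ctr"], {"clicks": 3.0, "impressions": 12.0})
```

Because the workflow asks only for `"ctr"`, it does not need to know how that value is derived, and every workflow that requests the same name gets the same definition.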

Feathr's abstraction creates producer and consumer roles for features. Producers define features and register them with Feathr; consumers then access or import groups of features into their ML model workflows.
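The producer/consumer split can be sketched as a catalog in which producing teams publish named feature groups and consuming teams import them into their workflows. The ownership check is an assumption added to reflect the access control a feature store provides; `FeatureGroup`, `Catalog`, and the team and feature names are all hypothetical and do not come from Feathr.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureGroup:
    name: str
    owner: str  # the producing team
    features: tuple[str, ...]


class Catalog:
    """Hypothetical shared catalog of feature groups."""

    def __init__(self) -> None:
        self._groups: dict[str, FeatureGroup] = {}

    def publish(self, group: FeatureGroup) -> None:
        """Producer side: register a group; only the owning team may replace it."""
        existing = self._groups.get(group.name)
        if existing is not None and existing.owner != group.owner:
            raise PermissionError(f"{group.name!r} is owned by {existing.owner!r}")
        self._groups[group.name] = group

    def import_group(self, name: str) -> tuple[str, ...]:
        """Consumer side: get the feature names a model workflow should request."""
        return self._groups[name].features


catalog = Catalog()
# A producing team publishes its features once...
catalog.publish(FeatureGroup("member_activity", "feed-team", ("page_views_7d", "sessions_7d")))
# ...and a consuming team imports them without re-implementing the pipeline.
model_inputs = catalog.import_group("member_activity")
```

Consumers only see feature names, so a producer can improve how a feature is computed without every consuming team changing its code.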

The LinkedIn team is continuing to build out the ecosystem around Feathr with new infrastructure and tooling, including CI/CD support for feature engineering. With it, users will be able to create upgraded versions of widely shared ML features and then test them against the existing models that depend on those features.
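A CI/CD check for an upgraded feature could, for example, score an existing model on both the current and the candidate version of a feature and fail if a quality metric drops by more than a set tolerance. The sketch below assumes a toy threshold model, accuracy as the metric, and a hypothetical `max_drop` tolerance; it does not describe LinkedIn's actual tooling.

```python
from __future__ import annotations

from typing import Callable, Sequence

Model = Callable[[float], int]


def accuracy(model: Model, feature_values: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of examples the model classifies correctly from this feature version."""
    correct = sum(1 for x, y in zip(feature_values, labels) if model(x) == y)
    return correct / len(labels)


def check_feature_upgrade(
    model: Model,
    current: Sequence[float],
    candidate: Sequence[float],
    labels: Sequence[int],
    max_drop: float = 0.01,
) -> bool:
    """Pass if the candidate version does not lower accuracy by more than max_drop."""
    baseline = accuracy(model, current, labels)
    upgraded = accuracy(model, candidate, labels)
    return upgraded >= baseline - max_drop


def existing_model(x: float) -> int:
    """Stand-in for a deployed model that depends on the feature."""
    return int(x > 0.5)


labels = [1, 0, 1, 0]
v1 = [0.9, 0.2, 0.4, 0.1]      # current feature version: accuracy 0.75
v2 = [0.8, 0.3, 0.6, 0.2]      # candidate upgrade: accuracy 1.0
broken = [0.1, 0.9, 0.2, 0.8]  # faulty upgrade: accuracy 0.0

upgrade_ok = check_feature_upgrade(existing_model, v1, v2, labels)
broken_ok = check_feature_upgrade(existing_model, v1, broken, labels)
```

In a pipeline, a `False` result would block the new feature version from being published; `upgrade_ok` is `True` and `broken_ok` is `False` here.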

 


Source: www.oschina.net/news/191999/linkedin-open-sources-feathr