[Introduction to Apache Lens]

Lens provides a Unified Analytics interface. It aims to cut data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for each analytical query. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as one.


At a high level, the project provides these features:

1) A simple metadata layer that provides an abstract view over tiered data stores.

2) A single shared schema server based on the Hive Metastore. This schema is shared by data pipelines (HCatalog) and analytics applications.

3) OLAP Cube QL, a high-level SQL-like language to query and describe data sets organized in data cubes.

4) A JDBC driver and Java client libraries to issue queries, and a CLI for ad hoc queries.

5) Lens application server: a REST server that allows users to query data, make schema changes, schedule queries, and enforce quota limits on queries.

6) A driver-based architecture that allows plugging in reporting systems such as Hive, columnar data warehouses, Redshift, etc.

7) Cost-based engine selection, which allows optimal use of resources by selecting the best execution engine for a given query based on the query cost.


Apache Lens provides a unified data analytics interface. Lens cuts down on data analytics silos by providing a single view across multiple tiered data stores and an optimal execution environment for analytical queries. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as a single system.


Main features of this project:

1) A simple metadata layer that provides an abstract view over tiered data stores.

2) A single shared schema server based on the Hive Metastore. The schema is shared by data pipelines (HCatalog) and analytics applications.

3) OLAP Cube QL, a high-level SQL-like language used to query and describe data sets organized in data cubes.

4) A JDBC driver and Java client libraries for issuing queries.

5) Lens application server: a REST server that allows users to query data, change schemas, schedule queries, and enforce quota limits on queries.

6) A driver-based architecture that allows plugging in reporting systems such as Hive, columnar data warehouses, Redshift, etc.

7) Cost-based engine selection, which optimizes resource use by automatically selecting the best execution engine for a query based on its estimated cost.
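
The idea behind cost-based engine selection can be illustrated with a small sketch. This is not Lens code: the `Driver` class, the `select_driver` function, and the toy cost figures below are all hypothetical names and numbers invented for illustration. The sketch only assumes that each pluggable engine can report an estimated cost for a query (or decline it), and that the server picks the cheapest engine that accepts.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Driver:
    """Hypothetical stand-in for a pluggable execution engine (e.g. Hive)."""
    name: str
    # Returns an estimated cost for the query, or None if this
    # driver cannot execute the query at all.
    estimate_cost: Callable[[str], Optional[float]]


def select_driver(query: str, drivers: Sequence[Driver]) -> Driver:
    """Return the driver that reports the lowest estimated cost."""
    candidates = []
    for driver in drivers:
        cost = driver.estimate_cost(query)
        if cost is not None:  # skip drivers that decline the query
            candidates.append((cost, driver))
    if not candidates:
        raise ValueError("no driver can execute this query")
    # Compare on cost only; Driver objects themselves are not ordered.
    return min(candidates, key=lambda pair: pair[0])[1]


# Toy cost models (invented): the columnar store is cheap but only
# holds summary tables, while Hive can run anything at a higher cost.
hive = Driver("hive", lambda q: 100.0)
columnar = Driver(
    "columnar",
    lambda q: None if "raw_events" in q else 10.0,
)

summary_choice = select_driver("select sum(sales) from sales_summary", [hive, columnar])
raw_choice = select_driver("select * from raw_events", [hive, columnar])
print(summary_choice.name, raw_choice.name)
```

In this sketch a query over a summary table goes to the cheaper columnar store, while a query over raw data falls back to Hive because the columnar driver declines it. A real cost model would be derived from table sizes, partitions, and engine load rather than fixed constants.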
