1. Introduction to Apache Zeppelin
Apache Zeppelin is an open source, web-based notebook for interactive data analysis. Its browser-based interface lets data engineers and data scientists analyze, visualize, and share data using languages and tools such as Scala, Python, SQL, and R. It integrates with data processing systems such as Apache Spark, Flink, and Hive through an interpreter plug-in architecture, so users can easily use and switch between different processing engines.
Its main functions include:
1. Notebook interface: Provides an interactive web interface in which users can write and run code, view results, visualize data, and manage and share notebooks.
2. Multi-language support: Zeppelin supports multiple languages, such as Scala, Python, R, SQL, etc., allowing users to choose the most suitable programming language for the task.
3. Interpreter plug-in system: Zeppelin supports different data processing engines, such as Apache Spark, Flink, Hive, etc., through interpreter plug-ins. Users can install different interpreters according to their needs.
4. Data visualization: A series of built-in data visualization tools can be used to generate various charts, such as histograms, pie charts, line charts, and tables, without exporting data to other platforms.
5. Real-time collaboration and sharing: Supports multi-person real-time collaboration and notebook sharing, making it easier for team members to communicate and share analysis results.
6. Security: Provides an access control system based on users and roles, which can limit access to notebooks and interpreters to ensure data security.
In general, Apache Zeppelin is a powerful interactive data analysis tool, which is suitable for scenarios such as data exploration, model development, visualization and sharing, and provides a flexible and efficient analysis platform for data engineers and scientists.
PS: You can also read the introduction on the official Apache Zeppelin website (zeppelin.apache.org).
2. Quick installation (based on Docker)
docker run -d --name zeppelin0.9 -p 8888:8080 apache/zeppelin:0.10.1
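To unpack the command: `-d` runs the container in the background, `--name` is only a label (here `zeppelin0.9`, even though the image tag is `0.10.1`), and `-p 8888:8080` maps host port 8888 to port 8080 inside the container, where Zeppelin listens by default. A minimal sketch for checking that the container is up, assuming the container name and host port from the command above; the `docker` calls are skipped if the Docker CLI is not on the PATH:

```shell
# Values taken from the docker run command above.
CONTAINER_NAME="zeppelin0.9"   # label only; the image itself is 0.10.1
HOST_PORT=8888                 # left-hand side of -p 8888:8080
ZEPPELIN_URL="http://localhost:${HOST_PORT}"

# Show the container's status and port mapping if Docker is available.
if command -v docker >/dev/null 2>&1; then
  docker ps --filter "name=${CONTAINER_NAME}" \
    --format '{{.Names}}  {{.Status}}  {{.Ports}}' || true
  # Recent startup messages; Zeppelin may need a moment before it is reachable.
  docker logs --tail 20 "${CONTAINER_NAME}" 2>&1 || true
fi

printf 'Zeppelin UI: %s\n' "${ZEPPELIN_URL}"
```

If the container is listed with a port mapping ending in `8888->8080/tcp`, the web interface should be reachable at the printed address.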
3. Usage
After the container starts, open http://localhost:8888 in a browser.
Configure Interpreters
1. Configure the jdbc interpreter to connect to MySQL
2. Create a new notebook, select the jdbc interpreter, and run SQL to query the data in the database
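The two steps above might look as follows. This is a sketch under assumptions, not the original author's exact settings: the MySQL host, port, database, user, and table names are hypothetical placeholders, while the property names (`default.driver`, `default.url`, `default.user`, `default.password`) are the ones shown on the Zeppelin jdbc interpreter settings page. Because Zeppelin runs inside a container here, `localhost` in the JDBC URL would point at the container itself rather than the machine running Docker. The MySQL driver also has to be added as an interpreter dependency (an artifact such as `mysql:mysql-connector-java`, in a version that matches your server).

```shell
# Hypothetical connection details; replace with your own.
# host.docker.internal resolves to the host on Docker Desktop; on Linux it
# typically needs --add-host=host.docker.internal:host-gateway on docker run.
MYSQL_HOST="host.docker.internal"
MYSQL_PORT=3306
MYSQL_DB="demo"
JDBC_URL="jdbc:mysql://${MYSQL_HOST}:${MYSQL_PORT}/${MYSQL_DB}"

# Step 1: values to enter on the jdbc interpreter settings page.
# com.mysql.cj.jdbc.Driver is the Connector/J 8.x class name;
# Connector/J 5.x uses com.mysql.jdbc.Driver instead.
INTERPRETER_PROPS=$(cat <<EOF
default.driver   = com.mysql.cj.jdbc.Driver
default.url      = ${JDBC_URL}
default.user     = demo_user
default.password = <your password>
EOF
)

# Step 2: text of a notebook paragraph; the first line selects the interpreter.
PARAGRAPH_TEXT=$(cat <<'EOF'
%jdbc
SELECT * FROM demo_table LIMIT 10
EOF
)

printf '%s\n\n%s\n' "$INTERPRETER_PROPS" "$PARAGRAPH_TEXT"
```

The script only prints the values; they still have to be entered in the Zeppelin UI, and the paragraph text pasted into a notebook.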
Some Zeppelin concepts explained:
Interpreter: the backend that executes code, for example jdbc, spark, python, shell, or markdown
Notebook: can be thought of as a page
Paragraph: a unit of content to run
There can be multiple Paragraphs under one Notebook (as shown in the screenshot below, one Notebook has two Paragraphs)
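To make the relationship concrete, here is a rough sketch of one Notebook holding two Paragraphs, each starting with a `%interpreter` line that picks its Interpreter. It builds the JSON body that Zeppelin's REST API accepts for creating a note (`POST /api/notebook`). The notebook name, paragraph titles, table name, and the `RUN_AGAINST_ZEPPELIN` switch are hypothetical; the request is only sent when that variable is set, and it assumes an unsecured Zeppelin at `localhost:8888` as in the installation step.

```shell
ZEPPELIN_URL="http://localhost:8888"

# One Notebook with two Paragraphs: markdown text, then a SQL query via jdbc.
# The quoted heredoc keeps \n literal, which JSON reads as a line break.
PAYLOAD=$(cat <<'EOF'
{
  "name": "demo notebook",
  "paragraphs": [
    {"title": "Intro", "text": "%md\n## Query results"},
    {"title": "Query", "text": "%jdbc\nSELECT * FROM demo_table LIMIT 10"}
  ]
}
EOF
)

# Print the payload so it can be inspected before anything is sent.
printf '%s\n' "$PAYLOAD"

# Hypothetical opt-in switch: only call the server when explicitly requested.
if [ -n "${RUN_AGAINST_ZEPPELIN:-}" ]; then
  curl -s -X POST -H "Content-Type: application/json" \
    -d "$PAYLOAD" "${ZEPPELIN_URL}/api/notebook"
fi
```

Running it with `RUN_AGAINST_ZEPPELIN=1` set should create the notebook on the server, where the two paragraphs can then be run from the web interface.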