Presto Deployment Guide

1. Presto overview

  Presto is an open source distributed SQL query engine for interactive analytic queries, supporting data volumes from gigabytes to petabytes.

  Presto was designed and written from the ground up for interactive analytics, approaching the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

  Presto queries data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores. A single Presto query can combine data from multiple sources, which allows analytics across the entire organization.

  Presto is a distributed system that runs on multiple servers. A complete installation consists of one coordinator and multiple workers. A client, such as the Presto CLI, submits a query to the coordinator. The coordinator parses, analyzes, and plans the query, then distributes the processing to the workers.

  

2. Official documentation: https://prestodb.github.io/docs/current/installation.html

3. Download: wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.220/presto-server-0.220.tar.gz

4. Unpack, then create the configuration folders and files

cd /data/bigdata/presto && mkdir etc && touch etc/{config.properties,jvm.config,log.properties,node.properties} && mkdir etc/catalog

5. Edit the configuration files

vim etc/config.properties  # configuration properties: the Presto server's configuration

Every Presto server can act as both a coordinator and a worker. In a large cluster, however, a dedicated machine should serve as the coordinator for performance reasons.
A coordinator's etc/config.properties should contain at least the following:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://ip_address:8080

The following is the most basic worker configuration:
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://ip_address:8080

If you are testing on a single machine, that machine acts as both the coordinator and a worker. The configuration is then:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://ip_address:8080

Explanation of the configuration items above:

coordinator: whether this Presto instance runs as a coordinator, that is, accepts queries from clients and manages the execution of each query.
node-scheduler.include-coordinator: whether work may be scheduled on the coordinator. In a large cluster, letting one Presto server act as both coordinator and worker reduces query performance: worker tasks take up most of the machine's resources, leaving too little for the critical work of scheduling, managing, and monitoring query execution.
http-server.http.port: the port of the HTTP server. Presto uses HTTP for all internal and external communication.
task.max-memory=1GB: the maximum memory a single task (the part of a query plan executed on a specific node) may use. This limits the number of groups in a GROUP BY, the size of the right-hand table in a JOIN, the number of rows in an ORDER BY, and the number of rows processed by a window function. Tune it to the number of concurrent queries and their complexity. If it is set too low, many queries will fail; if it is set too high, the JVM may run out of memory. The sample configurations above do not set this property; they use query.max-memory and query.max-memory-per-node instead.
discovery-server.enabled: Presto uses the Discovery service to find all nodes in the cluster. Every Presto instance registers itself with the Discovery service at startup. To simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded Discovery service, which shares the HTTP server with Presto and therefore uses the same port.
discovery.uri: the URI of the Discovery server. Because the embedded Discovery service is enabled in the coordinator, this is the URI of the Presto coordinator. Replace the host and port (ip_address:8080 in the samples above) with the values for your environment. Note: this URI must not end with a slash.
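
The three variants of config.properties above differ in only a few lines, so they can be generated from one template. The following is a minimal sketch, not part of Presto: the function name write_presto_config, its arguments, and the role labels coordinator, worker, and single are assumptions made for illustration. The property values are the ones from the samples above.

```shell
# Sketch: write etc/config.properties for one of the three roles described above.
# write_presto_config and its role labels are illustrative names, not Presto tooling.
write_presto_config() {
  local role="$1"      # coordinator | worker | single
  local etc_dir="$2"   # e.g. /data/bigdata/presto/etc
  local uri="$3"       # e.g. http://ip_address:8080
  local is_coordinator=""
  local include_line=""
  local discovery_line=""
  local max_mem="50GB"

  # discovery.uri must not end with a slash.
  case "$uri" in
    */) echo "discovery.uri must not end with a slash: $uri" >&2; return 1 ;;
  esac

  case "$role" in
    coordinator)
      is_coordinator=true
      include_line="node-scheduler.include-coordinator=false"
      discovery_line="discovery-server.enabled=true" ;;
    worker)
      is_coordinator=false ;;
    single)
      is_coordinator=true
      include_line="node-scheduler.include-coordinator=true"
      discovery_line="discovery-server.enabled=true"
      max_mem="5GB" ;;
    *)
      echo "unknown role: $role" >&2; return 1 ;;
  esac

  mkdir -p "$etc_dir" || return 1
  {
    echo "coordinator=$is_coordinator"
    if [ -n "$include_line" ]; then echo "$include_line"; fi
    echo "http-server.http.port=8080"
    echo "query.max-memory=$max_mem"
    echo "query.max-memory-per-node=1GB"
    if [ -n "$discovery_line" ]; then echo "$discovery_line"; fi
    echo "discovery.uri=$uri"
  } > "$etc_dir/config.properties"
}
```

For example, write_presto_config worker /data/bigdata/presto/etc http://ip_address:8080 would produce the basic worker configuration, with ip_address replaced by the coordinator's address.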

vim etc/jvm.config  # JVM configuration: command-line options for the JVM

-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

vim etc/log.properties  # log levels

com.facebook.presto=INFO

vim etc/node.properties  # node properties: environment configuration specific to each node

# Cluster name. All Presto nodes in the same cluster must use the same environment name.
node.environment=production
# Unique identifier of this Presto node. Every node needs its own node.id, and the value must stay the same across restarts and upgrades. If several Presto instances run on one machine, each instance needs its own node.id.
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# Data directory. Keeping it outside the installation directory makes Presto easier to upgrade.
node.data-dir=/data/bigdata/presto_data
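
Because node.id has to be unique per node yet stable across restarts, it helps to generate it once and never overwrite it. The following is a minimal sketch under stated assumptions: the function name write_node_properties and its arguments are illustrative, and it assumes either the uuidgen command or the Linux file /proc/sys/kernel/random/uuid is available as a UUID source.

```shell
# Sketch: create etc/node.properties once per node with a freshly generated node.id.
# write_node_properties is an illustrative name, not part of Presto.
write_node_properties() {
  local etc_dir="$1"       # e.g. /data/bigdata/presto/etc
  local environment="$2"   # e.g. production
  local data_dir="$3"      # e.g. /data/bigdata/presto_data
  local file="$etc_dir/node.properties"
  local id=""

  # Keep an existing file so node.id survives restarts and upgrades.
  if [ -f "$file" ]; then
    echo "keeping existing $file" >&2
    return 0
  fi

  # Assumption: one of these two UUID sources exists on the machine.
  if command -v uuidgen >/dev/null 2>&1; then
    id=$(uuidgen)
  elif [ -r /proc/sys/kernel/random/uuid ]; then
    id=$(cat /proc/sys/kernel/random/uuid)
  else
    echo "no UUID source found" >&2
    return 1
  fi

  mkdir -p "$etc_dir" || return 1
  printf 'node.environment=%s\nnode.id=%s\nnode.data-dir=%s\n' \
    "$environment" "$id" "$data_dir" > "$file"
}
```

Calling write_node_properties /data/bigdata/presto/etc production /data/bigdata/presto_data a second time leaves the file, and therefore the node.id, untouched.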

vim etc/catalog/jmx.properties  # catalog properties: configuration for connectors (data sources)

connector.name=jmx

vim etc/catalog/mysql.properties  # MySQL data source

connector.name=mysql
connection-url=jdbc:mysql://mysql_ip:3336
connection-user=root
connection-password=xxxxx

6. Run Presto

  Background: cd /data/bigdata/presto/bin && ./launcher start

  Foreground: cd /data/bigdata/presto/bin && ./launcher run

  Web monitoring UI: http://ip_address:8080/ui/
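
After a background start it can take a while before the server accepts requests. The following is a minimal sketch of a readiness check, assuming curl is installed and assuming the server answers on the /v1/info path of its HTTP port; the function name wait_for_presto and its arguments are illustrative.

```shell
# Sketch: poll the server until it answers over HTTP or the attempts run out.
# wait_for_presto is an illustrative name, not part of Presto.
wait_for_presto() {
  local base_url="$1"         # e.g. http://ip_address:8080
  local attempts="${2:-30}"   # number of tries, one second apart
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # Assumption: the Presto HTTP server answers GET requests on /v1/info.
    if curl --silent --fail --max-time 2 "$base_url/v1/info" >/dev/null 2>&1; then
      echo "Presto is answering at $base_url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done

  echo "no answer from $base_url after $attempts attempts" >&2
  return 1
}
```

A typical sequence would be ./launcher start followed by wait_for_presto http://ip_address:8080; ./launcher status reports whether the background process is running.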

7. Deploy the Presto client: https://prestodb.github.io/docs/current/installation/cli.html

  Download: wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.220/presto-cli-0.220-executable.jar

  Rename: cp presto-cli-0.220-executable.jar presto-cli

  Grant execute permission: chmod +x presto-cli

  Connect to the MySQL data source:

/data/bigdata/presto-cli --server localhost:8080 --catalog mysql --schema moodscat

Parameters:
--catalog: the data source to connect to. The catalog name is taken from the name of the properties file, here etc/catalog/mysql.properties.
--schema: the database (schema) to use within that data source.

8. Perform data query:
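
Besides the interactive prompt, the CLI can run a single statement with its --execute option. The following is a minimal sketch that wraps the connection settings from step 7; the function name presto_query and the PRESTO_* variable names are assumptions made for illustration, not something the CLI defines.

```shell
# Sketch: run one SQL statement through the CLI and print the result.
# presto_query and the PRESTO_* variables are illustrative names; the defaults
# repeat the values used in step 7.
presto_query() {
  local sql="$1"
  "${PRESTO_CLI:-/data/bigdata/presto-cli}" \
    --server "${PRESTO_SERVER:-localhost:8080}" \
    --catalog "${PRESTO_CATALOG:-mysql}" \
    --schema "${PRESTO_SCHEMA:-moodscat}" \
    --execute "$sql"
}
```

For example, presto_query "SHOW TABLES" lists the tables in the schema, and presto_query "SELECT * FROM some_table LIMIT 10" reads a few rows, where some_table is a placeholder for one of your own tables.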

 

  

 


Origin www.cnblogs.com/chenjw-note/p/10978878.html