Translation of the Spark Website Homepage

Official website: http://spark.apache.org/


Download
Libraries (SQL and DataFrames, Spark Streaming, MLlib, Third-Party Projects)
Documentation (including frequently asked questions)
Examples
Community
Developers
Apache Software Foundation


Apache Spark is a fast and general engine for large-scale data processing.


Speed:
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. 
==> Speed: up to 100x faster than Hadoop MapReduce when running in memory, and up to 10x faster on disk (the comparison is based on logistic regression).
Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
==> DAG: computation happens in memory, and acyclic data flow (a directed acyclic graph of stages) is supported.
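As a rough sketch (not taken from the homepage) of why in-memory computing speeds up iterative jobs, the following Scala snippet caches an RDD and scans it repeatedly; it assumes the `sc` SparkContext provided by the Spark shell and a hypothetical input path:

val points = sc.textFile("hdfs:///data/points.txt")    // hypothetical path
  .map(line => line.split(",").map(_.toDouble))
  .cache()                                              // keep parsed records in memory

// Repeated passes (as in logistic regression) reuse the cached partitions
// instead of re-reading the file from disk on every iteration.
for (i <- 1 to 10) {
  val above = points.filter(p => p(0) > 0.5).count()
  println(s"iteration $i: $above points above the threshold")
}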


Ease of Use:
Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells.
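As a minimal illustration of those high-level operators (a sketch, not text from the homepage), here is a word count that could be typed line by line into the Scala shell; it assumes the shell's `sc` SparkContext and a hypothetical input path:

val counts = sc.textFile("hdfs:///data/input.txt")   // hypothetical path
  .flatMap(line => line.split(" "))                  // split each line into words
  .map(word => (word, 1))                            // pair every word with a count of 1
  .reduceByKey(_ + _)                                // sum the counts per word

counts.take(10).foreach(println)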


Generality:
Combine SQL, streaming, and complex analytics.
Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
==> Spark lets you use SQL, DataFrames, MLlib, GraphX, and Spark Streaming seamlessly within a single application.
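To make "combine these libraries in the same application" concrete, here is a small sketch (assuming a Spark release with the SparkSession API and a hypothetical JSON file of people) that mixes the DataFrame API and SQL in one program:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("CombineLibraries").getOrCreate()

val people = spark.read.json("hdfs:///data/people.json")   // DataFrame API, hypothetical path
people.createOrReplaceTempView("people")                    // make it visible to SQL

// The same data queried with SQL, in the same application.
val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()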


Runs Everywhere:
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.
==> Spark can read from a wide range of underlying data sources, not just HDFS: Hive, HBase, Tachyon, Cassandra, and other Hadoop data sources are also supported.
Spark can run in its own standalone cluster mode, on Hadoop YARN, or on Apache Mesos, among other deployment options.
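As a sketch of how uniform access to these storage systems looks in code (the paths are hypothetical, and reading from S3 assumes the Hadoop AWS connector is on the classpath), the same textFile call works across them; `sc` is again the Spark shell's SparkContext:

val fromHdfs  = sc.textFile("hdfs://namenode:8020/data/events.log")
val fromS3    = sc.textFile("s3a://my-bucket/data/events.log")
val fromLocal = sc.textFile("file:///tmp/events.log")

println(fromHdfs.count() + fromS3.count() + fromLocal.count())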


Community:
Spark is used at a wide range of organizations to process large datasets. You can find example use cases at the Spark Summit conference, or on the Powered By page.
There are many ways to reach the community:
·Use the mailing lists to ask questions.
·In-person events include numerous meetup groups and Spark Summit.
·We use JIRA for issue tracking.


Contributors:
Apache Spark is built by a wide set of developers from over 200 companies. Since 2009, more than 1000 developers have contributed to Spark!
The project's committers come from 19 organizations.
If you'd like to participate in Spark, or contribute to the libraries on top of it, learn how to contribute.


Getting Started:
Learning Spark is easy whether you come from a Java or Python background:
·Download the latest release; you can run Spark locally on your laptop.
·Read the quick start guide.
·Spark Summit 2014 contained free training videos and exercises.
·Learn how to deploy Spark on a cluster.


Reposted from blog.csdn.net/xiaogao2017/article/details/78665457