6. Best Practices for Handling Big Data with Python on AWS

Author: Zen and the Art of Computer Programming

1. Introduction

Big data processing is one of the most common data analysis workloads in enterprises. Amazon Web Services (AWS) provides many tools to help users store, process, and analyze big data. Below I share some methods and techniques for processing big data on AWS, which I hope readers will find helpful.

This article is aimed at engineers with some foundation in Python programming. If you are not yet familiar with Python or with big data processing on AWS, consider reading introductory material on those topics first.

Note: All code below is written for Python 3.

2. Explanation of basic concepts and terms

Amazon EC2 (Elastic Compute Cloud)

EC2 is Amazon's elastic computing service. Users can quickly deploy virtual machines or containerized applications on the platform and have applications and environments configured automatically, achieving pay-as-you-go pricing and high availability.

EC2 can run on multiple types of hardware, including standard x86 servers, high-performance computing hardware, and GPU accelerator cards. EC2 provides users with strong reliability and service-level guarantees, and instance configurations can be adjusted flexibly.
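As a concrete illustration, here is a minimal sketch of launching an EC2 instance from Python with boto3, the AWS SDK for Python. The AMI ID, region, and tag values are placeholders chosen for this example, not values from the article; a real launch also requires AWS credentials to be configured.

```python
def build_run_request(ami_id, instance_type="t3.micro", count=1):
    """Build the keyword arguments for EC2's run_instances call.

    Kept as a pure function so the request can be inspected or
    tested without touching AWS.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            # Example tag; adjust to your own naming scheme.
            "Tags": [{"Key": "Project", "Value": "bigdata-demo"}],
        }],
    }

def launch_instance(ami_id, instance_type="t3.micro"):
    """Launch one EC2 instance and return its instance ID.

    Requires AWS credentials (e.g. via environment variables or
    ~/.aws/credentials) and a valid AMI ID for your region.
    """
    import boto3  # AWS SDK for Python: pip install boto3
    ec2 = boto3.client("ec2", region_name="us-east-1")
    resp = ec2.run_instances(**build_run_request(ami_id, instance_type))
    return resp["Instances"][0]["InstanceId"]
```

Separating request construction from the API call makes it easy to review exactly what will be launched (instance type, count, tags) before spending money on compute.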

Amazon S3 (Simple Storage Service)

S3 is an object storage service that provides storage of and access to static resources. It offers users a simple, scalable, and secure cloud storage platform for various types of data, such as images, videos, audio, files, and backups.

S3 supports a variety of storage features, including low latency, high availability, storage tiering, redundant backup, cross-region replication, built-in versioning, and data reporting and auditing. The RESTful API provided by S3 lets applications upload, download, and manage objects programmatically over HTTP.
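The sketch below shows how the S3 API is typically used from Python via boto3. Uploading large big-data files is normally done with multipart uploads; boto3's `upload_file` switches to multipart automatically above a configurable threshold (8 MiB by default). The bucket and key names here are placeholders, and the part-planning helper is a hypothetical illustration of how part counts are derived, not part of the boto3 API.

```python
PART_SIZE = 8 * 1024 * 1024  # 8 MiB, boto3's default multipart threshold

def plan_multipart(total_bytes, part_size=PART_SIZE):
    """Return the number of parts a multipart upload would need.

    Pure helper for illustration: ceiling division of the object
    size by the part size, with a minimum of one part.
    """
    return max(1, -(-total_bytes // part_size))

def upload_file(bucket, key, local_path):
    """Upload a local file to s3://bucket/key.

    Requires AWS credentials. boto3 handles multipart upload,
    retries, and concurrency internally.
    """
    import boto3  # AWS SDK for Python: pip install boto3
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
```

For example, a 100 MiB log file would be split into 13 parts of 8 MiB, which boto3 can upload in parallel.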

Origin blog.csdn.net/universsky2015/article/details/132621560