EMR usage scenarios: processing large amounts of data with inconsistent structures.
EMR node EBS encryption: LUKS or EBS encryption
EMR HBase high availability: build additional EMR HBase read-replica clusters in different AZs.
EMR master nodes batch initialization: 1. custom bootstrap scripts, 2. custom AMI
EMR master nodes must be in a subnet.
Encrypted root device volume on cluster nodes = custom AMI or security configuration.
EMR Auto-Scaling=instance group
The permission mechanism for EMR to access S3: EMRFS role mapping. The service role for cluster EC2 instances (the EC2 instance profile) does not need any S3 permissions of its own. Additional IAM roles are assumed by that role on behalf of different users or groups. (That is, add the service role for cluster EC2 instances to the trust policy of each additional role.)
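The trust relationship in the note above can be sketched as an IAM trust policy document. This is a minimal illustration only: the account ID is a placeholder, and the role that assumes the additional roles is assumed here to be the default `EMR_EC2_DefaultRole`.

```python
import json


def build_trust_policy(cluster_role_arn: str) -> dict:
    """Trust policy that lets the cluster's role assume an additional EMRFS role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": cluster_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID and role name (assumptions, not from the notes).
CLUSTER_ROLE_ARN = "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole"

policy = build_trust_policy(CLUSTER_ROLE_ARN)
print(json.dumps(policy, indent=2))
```

The same document would be attached as the trust policy of each additional role, while the S3 permissions themselves live in each additional role's permission policy.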
CloudWatch Events + Lambda can start a transient EMR cluster; with KeepJobFlowAliveWhenNoSteps=False, the cluster shuts down automatically once its steps finish.
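A rough sketch of the Lambda side of this pattern, assuming boto3's `run_job_flow` call (where the flag is spelled `KeepJobFlowAliveWhenNoSteps`). The bucket names, script path, instance types, and release label are hypothetical placeholders, and the default EMR roles are assumed to exist in the account.

```python
def build_transient_cluster_request(name: str, log_uri: str, script_uri: str) -> dict:
    """Build RunJobFlow parameters for a cluster that terminates after its steps."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False means: shut the cluster down when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "nightly-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical bucket and script names, for illustration only.
request = build_transient_cluster_request(
    name="transient-etl",
    log_uri="s3://example-emr-logs/",
    script_uri="s3://example-emr-code/job.py",
)


def lambda_handler(event, context):
    """Entry point a scheduled CloudWatch Events rule could invoke."""
    import boto3  # imported lazily so the builder above runs without boto3

    response = boto3.client("emr").run_job_flow(**request)
    return {"JobFlowId": response["JobFlowId"]}
```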
EMR block public access = account-level setting (configured per Region).
Glue invokes EMR = Step Functions.
EMRFS
EMRFS consistency: 1. Object metadata in DynamoDB, 2. Retry rules.
EMRFS does not support SSE-C (customer-provided keys), but supports SSE-KMS and SSE-S3.
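EMRFS writes fail with a 503 "Slow Down" error: 1. Spread keys across more prefixes (S3 request rates are limited per prefix: about 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second), 2. Increase the EMRFS retry limit.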
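The two remedies in the note above can be illustrated at the application level. This is a sketch under stated assumptions, not EMRFS's own implementation: `SlowDown` is a hypothetical stand-in for the throttling error, and the hash-prefix scheme is one common way to spread keys across prefixes.

```python
import hashlib
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an S3 503 "Slow Down" throttling error."""


def sharded_key(key: str, shards: int = 16) -> str:
    """Prepend a hash-derived prefix so writes spread across `shards` prefixes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02x}/{key}"


def retry_with_backoff(op, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op(), retrying on SlowDown with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the throttling error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# The same key always maps to the same prefix, so readers can recompute it.
print(sharded_key("logs/2024/01/01/part-0000.csv"))
```

On a real cluster the retry side is normally handled by EMRFS configuration rather than application code; the function here only shows the backoff idea.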
EMR lists objects slowly: increase the RCU of the EMRFS DynamoDB metadata table.
QuickSight
QuickSight is integrated with CloudTrail.
Scatter plot = determine whether two measures are correlated.
When QuickSight accesses a newly added S3 bucket through Athena and reports a SPICE error: in the QuickSight console, grant QuickSight permission to access that S3 bucket.
QuickSight can perform federated queries directly, connecting to Salesforce, MySQL, and S3.
When QuickSight accesses Redshift for the first time, QuickSight's IP range must be added to the Redshift security group.
QuickSight Enterprise Edition has the ML-Powered forecast (forecast widget) function, which can be used when it comes to requiring minimum effort in the algorithm.
Enterprise import data limit is 500 GB, Standard limit is 25 GB.
QuickSight connects to Redshift across Regions: add QuickSight's IP range to the Redshift security group, or use VPC peering + a Redshift-managed endpoint via the public network.
QuickSight cannot directly read Parquet files on S3 but can read JSON, CSV, and XLSX formats.
QuickSight Enterprise sharing + permission management = group + folder.
Security
Encryption at rest is only supported by Enterprise edition.
QuickSight does not support encryption with customer-provided keys.
QuickSight with on-premises AD = AD Connector + QuickSight Enterprise edition (AD Connector or SAML 2.0).
S3 Select supports compression formats (gzip, bzip2), S3 Glacier Select does not support compression formats.
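As an illustration of the S3 Select side of this note, the sketch below builds the parameters for boto3's `select_object_content`, with the compression type declared in `InputSerialization`. The bucket, key, and SQL expression are hypothetical, and the object is assumed to be a gzip-compressed CSV with a header row.

```python
SUPPORTED_COMPRESSION = {"NONE", "GZIP", "BZIP2"}


def build_select_request(bucket: str, key: str, expression: str,
                         compression: str = "GZIP") -> dict:
    """Build SelectObjectContent parameters for a (possibly compressed) CSV object."""
    if compression not in SUPPORTED_COMPRESSION:
        raise ValueError(f"unsupported compression type: {compression}")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # first row holds column names
            "CompressionType": compression,
        },
        "OutputSerialization": {"JSON": {}},
    }


# Hypothetical bucket, key, and query, for illustration only.
select_request = build_select_request(
    bucket="example-data-bucket",
    key="sales/2024/data.csv.gz",
    expression="SELECT s.region, s.amount FROM S3Object s WHERE s.region = 'EU'",
)

# With boto3 installed and credentials configured, the call would look like:
#   response = boto3.client("s3").select_object_content(**select_request)
#   for event in response["Payload"]:
#       if "Records" in event:
#           print(event["Records"]["Payload"].decode("utf-8"))
```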
Lake Formation blueprint: import data from RDS and AWS CloudTrail into S3; for data already in S3, use a Glue crawler directly to build the data lake.
Lake Formation supports cross-account catalog and permission management (IAM + Lake Formation permissions).
Other
OpenSearch
Amazon OpenSearch uses IAM for permission management.
Amazon OpenSearch JVMMemoryPressure problem = too many shards.
Amazon OpenSearch UltraWarm storage (no need to move data back to hot storage) = a quick way to query infrequently used data.
DynamoDB
DynamoDB does not support Join.
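Because there is no server-side join, the application has to combine items itself. A minimal sketch of such an application-side join follows; the `orders` and `customers` data are hypothetical in-memory stand-ins for items that would really be fetched with calls such as `GetItem` or `BatchGetItem`.

```python
def join_orders_with_customers(orders, customers_by_id):
    """Attach the customer name to each order, skipping orders with no match."""
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:
            joined.append({**order, "customer_name": customer["name"]})
    return joined


# Hypothetical items, shaped like rows read from two separate tables.
orders = [
    {"order_id": "o1", "customer_id": "c1"},
    {"order_id": "o2", "customer_id": "c9"},  # no matching customer
]
customers_by_id = {"c1": {"name": "Ada"}}

joined_rows = join_orders_with_customers(orders, customers_by_id)
print(joined_rows)
```

A common alternative is to denormalize at write time, so that the data a query needs already sits in a single item or partition.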
DynamoDB = JSON data, reads in milliseconds.
other
Fast Data Curation = DMS + S3 + Glue Crawler.
AWS Data Exchange = share data based on a subscription model.
ADF + Amazon Connect (call center service) integration; Amazon AppFlow connects directly to Salesforce and ServiceNow.
Amazon Kendra = search service integrated with ML.
SNS message filtering: each subscription can configure its own filter policy.
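To make the filter policy idea concrete, here is a small sketch of how a per-subscription policy decides whether a message is delivered. It models only exact string matching on message attributes (SNS also offers other operators, which are not covered here), and the attribute names and values are hypothetical.

```python
import json


def matches(filter_policy: dict, message_attributes: dict) -> bool:
    """Return True when every policy attribute has an allowed value in the message."""
    for name, allowed_values in filter_policy.items():
        if message_attributes.get(name) not in allowed_values:
            return False  # attribute missing or value not in the allowed list
    return True


# Hypothetical policy: deliver only order events from one store.
filter_policy = {
    "event_type": ["order_placed", "order_cancelled"],
    "store": ["example_corp"],
}

print(matches(filter_policy, {"event_type": "order_placed", "store": "example_corp"}))
print(matches(filter_policy, {"event_type": "order_shipped", "store": "example_corp"}))

# The policy is stored on the subscription as a JSON string, e.g. with boto3:
#   sns.set_subscription_attributes(
#       SubscriptionArn=subscription_arn,
#       AttributeName="FilterPolicy",
#       AttributeValue=json.dumps(filter_policy),
#   )
policy_json = json.dumps(filter_policy)
```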