Big data: data table operations, partition tables
Landing a job in 2022 takes a strong combination of credentials, ability, and luck. In a hiring winter, when the big companies freeze headcount, many algorithm students may end up looking for development or test-development roles instead.
For test development you have to learn databases: SQL in general, and Oracle in particular. Many financial companies and security agencies are required to use Oracle, which is considered more secure and more powerful than SQL Server, so it is worth studying. Most importantly, if you plan to take the civil-service exam for network police, do not even sign up without this knowledge; you would be wasting your time.
Likewise, since the data-analysis position in the network police exam tests data-mining fundamentals, starting today we will work through the data-mining material properly. The most important part is big data: the aptitude test and the interview are minor hurdles, while the written test on big-data technology is the hardest and matters most.
Article Directory
- Big data: data table operations, partition tables
-
- Big Data: Partition Tables
- bucket table
- modify table
- complex operation array type
- map data type
- struct data type
- In short, Hive is a SQL framework built on top of MapReduce: you write SQL and it runs distributed computation. Reviewing this knowledge will be very helpful for the future network police exam.
- Summary
Big Data: Partition Tables
Physically, each partition is a separate folder on HDFS. The syntax is partitioned by (column, column type). For example, data loaded for May lands in the May partition; the partition column then acts like an extra field on the table. Each new partition value creates another subfolder. Multi-level partitioning maps to nested directories: three partition levels are equivalent to three levels of file directories. After injecting data, the partition column narrows the scope of a query, and the filter conditions work just like in standard SQL.
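The notes above can be sketched in HiveQL as follows; the table and column names are invented for illustration:

```sql
-- Each partition value becomes its own HDFS subfolder.
CREATE TABLE sales (
  id     INT,
  amount DOUBLE
)
PARTITIONED BY (sale_month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load data into the May partition.
LOAD DATA LOCAL INPATH '/tmp/sales_may.txt'
INTO TABLE sales PARTITION (sale_month = '2022-05');

-- Multi-level partitioning: nested folders, e.g. year/month.
CREATE TABLE logs (
  msg STRING
)
PARTITIONED BY (yr STRING, mon STRING);

-- The partition column narrows the query to one folder.
SELECT * FROM sales WHERE sale_month = '2022-05';
```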
bucket table
Bucketing is for load balancing: the number of files is fixed, and the number of reducers equals the number of buckets, presumably so that rows can be routed to the matching reducer conveniently. The keywords are clustered by (column) into k buckets. You pick a column to bucket on, and rows are distributed across buckets by the hash of that column's value, just like the hash tables we learned in algorithms.
Loading data into a bucketed table: you cannot load data into it directly. The table is created with clustered by; when importing the data, the query uses cluster by, with no "ed". Looking at HDFS, you can see the specified number of buckets, here 3. Rows are bucketed by the cid column, and the principle is hash-table mapping: hash(cid) % 3 decides which bucket a row belongs to. The data has to be split into three parts, and you cannot do that directly, because the destination bucket must be computed, and any computation has to go through MapReduce. LOAD DATA does not trigger a MapReduce job, so it cannot bucket the data, and the contents of each bucket would not come out right. The purpose of bucketing is that certain data is guaranteed to live in the same bucket, so you never have to look in another bucket for it. For a join on the bucketing column, you can simply merge the corresponding buckets, since the data is naturally grouped.
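A minimal sketch of the two-step loading described above (table and file names are made up; exact settings vary by Hive version):

```sql
-- Bucketed table: rows go to bucket hash(cid) % 3.
CREATE TABLE orders_bucketed (
  cid    INT,
  amount DOUBLE
)
CLUSTERED BY (cid) INTO 3 BUCKETS;

-- LOAD DATA cannot bucket rows (no MapReduce job is triggered),
-- so stage the raw data in an ordinary table first...
CREATE TABLE orders_staging (
  cid    INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/tmp/orders.txt' INTO TABLE orders_staging;

-- ...then insert via a query, which does run MapReduce.
-- Note CLUSTER BY (no "ed") in the query, vs. CLUSTERED BY in the DDL.
SET hive.enforce.bucketing = true;
INSERT INTO TABLE orders_bucketed
SELECT cid, amount FROM orders_staging CLUSTER BY cid;
```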
modify table
You can rename the table; change table properties, such as switching between an internal (managed) table and an external one; add a partition, rename a partition, or drop a partition (a partition is just a folder-level classification). If the table is not partitioned, simply skip the partition operations. You can also add columns.
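These operations can be sketched as follows; the table, column, and partition names are invented for illustration:

```sql
ALTER TABLE sales RENAME TO sales_2022;                          -- rename the table
ALTER TABLE sales_2022 SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');  -- managed -> external
ALTER TABLE sales_2022 ADD PARTITION (sale_month = '2022-06');   -- add a partition (folder)
ALTER TABLE sales_2022 PARTITION (sale_month = '2022-06')
  RENAME TO PARTITION (sale_month = '2022-07');                  -- rename a partition
ALTER TABLE sales_2022 DROP PARTITION (sale_month = '2022-07');  -- drop a partition
ALTER TABLE sales_2022 ADD COLUMNS (region STRING);              -- add a column
```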
complex operation array type
Array elements are separated by commas in the data file, and size() counts the number of elements. Whether it is Python, Java, C++, or SQL/Hive, these collection types are all similar; the core idea stays the same.
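A sketch of an array column, with invented names; the elements in the data file are separated by the collection delimiter, here a comma:

```sql
CREATE TABLE users (
  name    STRING,
  hobbies ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ',';

-- Access elements by index and count them with size().
SELECT name, hobbies[0], size(hobbies) FROM users;
```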
map data type
For a map column, the collection items are separated by '#', and within each item the key and value are separated by ':'. The map type is something Hive offers beyond plain SQL. It works like a dictionary in Python, i.e. key-value pairs, and it is easy to use.
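A sketch of a map column using the delimiters just described (names are invented for illustration):

```sql
-- Entries are separated by '#', key and value by ':'.
CREATE TABLE students (
  name   STRING,
  scores MAP<STRING, INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY '#'
  MAP KEYS TERMINATED BY ':';

-- An input line might look like: Tom<TAB>math:90#english:85
SELECT name, scores['math'] FROM students;
```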
struct data type
A struct groups several named fields into one value, like a struct in C.
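A sketch of a struct column, with invented names:

```sql
CREATE TABLE employees (
  name STRING,
  addr STRUCT<city: STRING, street: STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ',';

-- Access struct fields with dot notation, as in C.
SELECT name, addr.city FROM employees;
```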
In short, Hive is a SQL framework built on top of MapReduce: you write SQL and it runs distributed computation. Reviewing this knowledge will be very helpful for the future network police exam.
Summary
Tip: key lessons learned:
1)
2) Learn Oracle well; even in an economic winter, passing the written test and landing an offer will definitely not be a problem. It is also a prerequisite for taking the civil-service network police exam.
3) When grinding for AC in a written test, space complexity may not matter, but in an interview you must consider both optimal time complexity and optimal space complexity.