Creating a Spark SQL table from a local Excel sheet

As a beginner, this was my first attempt to import an Excel sheet into our database and create a table from it. Because I am new to Spark SQL, I made a lot of small mistakes along the way, so I am recording them here.

Source data processing

The first step is to process the Excel sheet. Because it involves company secrets and privacy, I am withholding the specific data; the sheet has six columns. When I first built it, I carelessly dragged across some blank cells a few times, and this later caused the blank cells to be loaded as null data as well.
The data has to be in a form the database can read, so we want to save it as a txt file with "," as the delimiter. Probably because of the Excel version, I could not export comma-separated text directly (only tab-separated), so I saved the sheet in CSV format and then changed the file extension.
Here I made a mistake: the first line of the txt file contains the column names, which we do not need. Only the values of each column should remain, so the header line can be deleted manually.
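
The manual cleanup described above (deleting the header line and the blank cells that were exported by accident) could also be scripted. The following is a minimal Python sketch, not what was actually done for this post. It assumes the export is an ordinary comma-separated file, that only the first six columns hold real data, and that no value contains a comma. The function names clean_rows and to_delimited_text are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = 6  # the sheet in this post has six columns


def clean_rows(csv_text, n_cols=EXPECTED_COLUMNS):
    """Return data rows only: no header, no blank rows, no extra blank columns."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line that holds the column names
    cleaned = []
    for row in reader:
        # Assumption: anything beyond the first n_cols cells is blank filler.
        row = [cell.strip() for cell in row[:n_cols]]
        if not any(row):  # every cell is empty, so this is a blank row
            continue
        row += [""] * (n_cols - len(row))  # pad short rows to a fixed width
        cleaned.append(row)
    return cleaned


def to_delimited_text(rows):
    """Join cells with ',' so the lines match FIELDS TERMINATED BY ','."""
    return "".join(",".join(row) + "\n" for row in rows)
```

The text of the exported file would be passed to clean_rows, and the output of to_delimited_text written to the txt file that gets uploaded.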

This is one of the lines.

Uploading to the database

Our company uses a self-developed database platform. In short, you upload the txt file to a directory under Repository, and then it can be accessed with SQL.

SQL code

CREATE EXTERNAL TABLE default.eip_rewards_usage (Asset varchar(50),Platform varchar(50),UserOrBatch varchar(50),NT_Login varchar(50),y_Date DECIMAL(18,0),Sum_ACCESS_CNT DECIMAL(18,0))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/taiwang/eip_rewards_usage_test/';

There are six columns in total, and the first four are strings. Because I could not find the relevant documentation, I tried keywords such as String and StringType several times before finally settling on varchar to define the variable-length columns.
(Revision: I previously defined these columns with char plus a length. That was wrong: char is fixed-length, so at query time the remaining characters have to be padded out to the full length, which looks silly.)
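
The column definitions above also suggest a simple sanity check that could be run on the txt file before it is uploaded: four text fields of at most 50 characters, followed by two whole-number fields of at most 18 digits. This is only a sketch in plain Python. The helper name check_line is hypothetical, and it assumes no value contains a comma.

```python
MAX_TEXT_LEN = 50      # varchar(50) in the table definition
N_TEXT_COLUMNS = 4     # Asset, Platform, UserOrBatch, NT_Login
N_NUMBER_COLUMNS = 2   # y_Date, Sum_ACCESS_CNT, both DECIMAL(18,0)
MAX_DIGITS = 18


def check_line(line):
    """Return a list of problems found in one comma-separated line."""
    fields = line.rstrip("\n").split(",")
    expected = N_TEXT_COLUMNS + N_NUMBER_COLUMNS
    if len(fields) != expected:
        return [f"expected {expected} fields, got {len(fields)}"]
    problems = []
    # The first four fields must fit into varchar(50).
    for i, value in enumerate(fields[:N_TEXT_COLUMNS]):
        if len(value) > MAX_TEXT_LEN:
            problems.append(f"field {i + 1} is longer than {MAX_TEXT_LEN} characters")
    # The last two fields must be whole numbers with at most 18 digits.
    for i, value in enumerate(fields[N_TEXT_COLUMNS:], start=N_TEXT_COLUMNS):
        digits = value[1:] if value.startswith("-") else value
        ok = digits.isascii() and digits.isdigit() and len(digits) <= MAX_DIGITS
        if not ok:
            problems.append(f"field {i + 1} is not a whole number of at most {MAX_DIGITS} digits")
    return problems
```

An empty list means the line matches the table definition; anything else points at the field that would probably load as null or be cut off.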

LOCATION does not need to point at the exact txt file we uploaded. It only needs to be the directory that contains the file; every file in that directory will be scanned.

Then refresh the table once, and it works.

refresh table eip_rewards_usage;
select * from eip_rewards_usage;

If the creation fails or you want to change the definition, drop the original table and create it again, or simply create the table under a different name.

drop table default.eip_rewards_usage;

Finally: because I really am a novice intern, getting a single line of SQL to run can keep me happy all day. I wrote this article mainly to motivate myself.

Origin blog.csdn.net/DUTwangtaiyu/article/details/103601288