- MySQL import to HDFS
# Start Hadoop
$ start-dfs.sh
$ start-yarn.sh
$ jps
2706 NameNode
3334 ResourceManager
3495 NodeManager
3112 SecondaryNameNode
3848 Jps
2873 DataNode
# energydata is the database name, average_price_by_state is the table name
$ sqoop import --connect jdbc:mysql://localhost:3306/energydata --username abc -P --table average_price_by_state -m 1
Test the result:
$ hadoop fs -tail average_price_by_state/part-m-00000 | less
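By default Sqoop writes each row as one line of comma-separated text, so the part files can be consumed by any script. A minimal Python sketch of reading one record back, using the column order of average_price_by_state (the sample line itself is invented for illustration):

```python
# Column order of average_price_by_state, as shown by `desc` in Hive.
COLUMNS = ["year", "state", "sector", "residential", "commercial",
           "transportation", "other", "total"]

def parse_record(line):
    """Split one comma-delimited Sqoop record into a dict keyed by column."""
    values = line.rstrip("\n").split(",")
    return dict(zip(COLUMNS, values))

# A made-up sample line in the exported format:
sample = "2015,CA,Total Electric Industry,16.99,15.23,9.21,14.35,15.42"
print(parse_record(sample)["state"])  # -> CA
```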
- MySQL import to Hive
First delete the target directory on HDFS, since Sqoop refuses to import into a directory that already exists:
$ hadoop fs -rm -r /user/hadoop/average_price_by_state
$ sqoop import --connect jdbc:mysql://localhost:3306/energydata --username abc --table average_price_by_state -P --hive-import --fields-terminated-by ',' --lines-terminated-by '\n' --null-string 'null' -m 1
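The three formatting flags control how Sqoop renders each record before handing it to Hive: fields joined by ',', records ended by '\n', and SQL NULLs written as the literal string 'null'. A small Python sketch of that rendering (the row values are hypothetical):

```python
def render_row(values, field_sep=",", line_sep="\n", null_string="null"):
    """Mimic Sqoop's text rendering: join fields, write NULL as null_string."""
    rendered = [null_string if v is None else str(v) for v in values]
    return field_sep.join(rendered) + line_sep

# A hypothetical row with one NULL column:
print(render_row([2015, "CA", None, 16.99]))  # -> 2015,CA,null,16.99
```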
If you encounter an error:
ERROR tool.ImportTool: Import failed: java.io.IOException: Exception thrown in Hive
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:358)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
... 9 more
Caused by: java.lang.NoSuchMethodError: com.lmax.disruptor.dsl.Disruptor.<init>(Lcom/lmax/disruptor/EventFactory;ILjava/util/concurrent/ThreadFactory;Lcom/lmax/disruptor/dsl/ProducerType;Lcom/lmax/disruptor/WaitStrategy;)V
at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.start(AsyncLoggerDisruptor.java:97)
at org.apache.logging.log4j.core.async.AsyncLoggerContext.maybeStartHelper(AsyncLoggerContext.java:97)
at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:240)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:158)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:131)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:101)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:188)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:173)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:106)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:98)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:81)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
... 14 more
Online posts attribute this to a version conflict between Hive and the disruptor library; my quick-and-dirty fix here is simply to disable the asynchronous logging that depends on disruptor:
$ vim $HIVE_HOME/conf/hive-site.xml
Find this property and change its value to false:
<property>
<name>hive.async.log.enabled</name>
<value>false</value>
<description>
Whether to enable Log4j2's asynchronous logging. Asynchronous logging can give
significant performance improvement as logging will be handled in separate thread
that uses LMAX disruptor queue for buffering log messages.
Refer https://logging.apache.org/log4j/2.x/manual/async.html for benefits and
drawbacks.
</description>
</property>
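A quick way to confirm the change took effect is to parse the file and read the property back, since hive-site.xml uses the standard Hadoop configuration XML layout. A sketch using an inline sample; in practice you would point it at $HIVE_HOME/conf/hive-site.xml:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for hive-site.xml, in the standard Hadoop config layout.
SAMPLE = """<configuration>
  <property>
    <name>hive.async.log.enabled</name>
    <value>false</value>
  </property>
</configuration>"""

def get_property(xml_text, name):
    """Return the <value> of the <property> whose <name> matches, else None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(get_property(SAMPLE, "hive.async.log.enabled"))  # -> false
```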
If you encounter an error:
17/11/26 11:24:20 ERROR orm.CompilationManager: It seems as though you are running sqoop with a JRE.
17/11/26 11:24:20 ERROR orm.CompilationManager: Sqoop requires a JDK that can compile Java code.
17/11/26 11:24:20 ERROR orm.CompilationManager: Please install a JDK and set $JAVA_HOME to use it.
17/11/26 11:24:20 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Could not start Java compiler.
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:187)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
$ sudo apt-get install openjdk-7-*
Then make sure $JAVA_HOME points at the JDK, not a JRE.
Test:
$ hive
hive> show tables;
OK
average_price_by_state
Time taken: 0.103 seconds, Fetched: 1 row(s)
hive> desc average_price_by_state;
OK
year int
state string
sector string
residential double
commercial double
transportation double
other double
total double
Time taken: 0.322 seconds, Fetched: 8 row(s)
- MySQL import to HBase
mysql> CREATE DATABASE logdata;
Query OK, 1 row affected (0.01 sec)
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'localhost';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON logdata.* TO 'hive'@'%' IDENTIFIED BY '!QAZ2wsx';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> use logdata;
Database changed
mysql> create table weblogs (ipyear varchar(255) NOT NULL PRIMARY KEY,
    -> january int(11) DEFAULT NULL,
    -> february int(11) DEFAULT NULL,
    -> march int(11) DEFAULT NULL,
    -> april int(11) DEFAULT NULL,
    -> may int(11) DEFAULT NULL,
    -> june int(11) DEFAULT NULL,
    -> july int(11) DEFAULT NULL,
    -> august int(11) DEFAULT NULL,
    -> september int(11) DEFAULT NULL,
    -> october int(11) DEFAULT NULL,
    -> november int(11) DEFAULT NULL,
    -> december int(11) DEFAULT NULL);
Query OK, 0 rows affected (0.02 sec)
mysql> SHOW TABLES;
+-------------------+
| Tables_in_logdata |
+-------------------+
| weblogs |
+-------------------+
1 row in set (0.00 sec)
mysql>
$ sudo mysql -u root -p logdata --local-infile=1
mysql> LOAD DATA LOCAL INFILE '/home/hadoop/hadoop-fundamentals-master/data/weblogs.csv' INTO TABLE weblogs FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES;
Query OK, 27300 rows affected (0.30 sec)
Records: 27300 Deleted: 0 Skipped: 0 Warnings: 0
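LOAD DATA expects every line of weblogs.csv to carry the table's 13 columns in order (ipyear, then the twelve monthly counts), comma-separated, with one header line that IGNORE 1 LINES skips. A Python sketch that writes a file in that shape; the two data rows and the ipyear key format are invented for illustration:

```python
import csv

MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]

def write_weblogs_csv(path, rows):
    """Write the header line (skipped by IGNORE 1 LINES) plus the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ipyear"] + MONTHS)  # 13 columns, matching the table
        writer.writerows(rows)

# Two invented rows: a hypothetical ipyear key plus twelve monthly counts.
write_weblogs_csv("weblogs_sample.csv",
                  [["10.0.0.1_2014"] + [0] * 12,
                   ["10.0.0.2_2014"] + [5] * 12])
```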
$ start-hbase.sh
$ sqoop import --connect jdbc:mysql://localhost:3306/logdata --username hive -P --table weblogs --hbase-table weblogs --column-family traffic --hbase-row-key ipyear --hbase-create-table -m 1
20/01/19 17:47:29 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=243631
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=1
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=24144
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=24144
Total vcore-milliseconds taken by all map tasks=24144
Total megabyte-milliseconds taken by all map tasks=36216000
Map-Reduce Framework
Map input records=27300
Map output records=27300
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=856
CPU time spent (ms)=10110
Physical memory (bytes) snapshot=203603968
Virtual memory (bytes) snapshot=3019489280
Total committed heap usage (bytes)=79167488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
20/01/19 17:47:29 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 45.3857 seconds (0 bytes/sec)
20/01/19 17:47:29 INFO mapreduce.ImportJobBase: Retrieved 27300 records.
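With --hbase-row-key ipyear and --column-family traffic, each MySQL row roughly becomes one HBase row keyed by ipyear, whose cells are traffic:&lt;column&gt;=&lt;value&gt; for the remaining columns. A simplified Python sketch of that mapping (the row values are hypothetical, and details such as Sqoop skipping NULL columns are omitted):

```python
def to_hbase_cells(row, row_key_col="ipyear", family="traffic"):
    """Map a relational row (dict) to (row_key, {family:qualifier: value})."""
    row_key = row[row_key_col]
    cells = {f"{family}:{col}": str(val)
             for col, val in row.items() if col != row_key_col}
    return row_key, cells

# A hypothetical weblogs row reduced to two month columns:
key, cells = to_hbase_cells({"ipyear": "10.0.0.1_2014",
                             "january": 3, "february": 0})
print(key, cells["traffic:january"])  # -> 10.0.0.1_2014 3
```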