Example data-config.xml configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/video"
              user="root"
              password="root"
              batchSize="-1"/>
  <document>
    <entity name="video" pk="v_id"
            query="SELECT * FROM y2_video"
            deltaImportQuery="select * from y2_video where v_id='${dataimporter.delta.v_id}'"
            deltaQuery="select v_id from y2_video where v_create_at>UNIX_TIMESTAMP('${dataimporter.last_index_time}')">
      <field column="v_id" name="v_id"/>
      <field column="v_title" name="v_title"/>
      <field column="v_thumb" name="v_thumb"/>
      <field column="v_url" name="v_url"/>
      <field column="v_tags" name="v_tags"/>
      <field column="v_create_at" name="v_create_at"/>
      <field column="v_last_index_time" name="v_last_index_time"/>
    </entity>
  </document>
</dataConfig>
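The batchSize="-1" setting is important. With the MySQL driver it makes the JDBC data source stream rows instead of buffering the whole result set; without it, a full import of a table with millions of rows runs out of memory.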
entity name="video" pk="v_id"
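The pk attribute is also important: without it the import is very slow, and delta imports rely on it to identify rows.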
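The deltaQuery statement should return only the table's id key (here v_id). Each returned key is then substituted into ${dataimporter.delta.v_id} in the deltaImportQuery above, which is what makes incremental (delta) imports work.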
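To make the relationship between deltaQuery and deltaImportQuery concrete, here is a minimal Python sketch that mimics the two steps. It is only a simulation under stated assumptions: it uses an in-memory sqlite3 table in place of MySQL, compares v_create_at against a plain integer timestamp instead of calling UNIX_TIMESTAMP, and the function names delta_import and demo are hypothetical. It is not how Solr implements delta imports internally.

```python
import sqlite3


def delta_import(conn, last_index_time):
    """Mimic a two-step delta import: find changed keys, then fetch each row."""
    # Step 1 (role of deltaQuery): return ONLY the primary key of rows
    # created after the last index time.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "SELECT v_id FROM y2_video WHERE v_create_at > ? ORDER BY v_id",
            (last_index_time,),
        )
    ]

    # Step 2 (role of deltaImportQuery): fetch the full row for each key,
    # the way ${dataimporter.delta.v_id} is filled in per changed row.
    docs = []
    for v_id in changed_ids:
        row = conn.execute(
            "SELECT v_id, v_title, v_create_at FROM y2_video WHERE v_id = ?",
            (v_id,),
        ).fetchone()
        docs.append({"v_id": row[0], "v_title": row[1], "v_create_at": row[2]})
    return docs


def demo():
    # Assumption: a tiny stand-in table with integer Unix timestamps.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE y2_video ("
        "v_id INTEGER PRIMARY KEY, v_title TEXT, v_create_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO y2_video VALUES (?, ?, ?)",
        [(1, "old", 100), (2, "new", 300), (3, "newer", 400)],
    )
    # Only rows created after timestamp 200 should be re-imported.
    return delta_import(conn, last_index_time=200)


if __name__ == "__main__":
    for doc in demo():
        print(doc)
```

The sketch shows why deltaQuery only needs the key column: the first query is just a list of which rows changed, and the second query loads the actual field values for each of those rows.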