http://wiki.apache.org/solr/DataImportHandler
Goal
Import data from a relational database
Environment
apache-solr-dataimporthandler-3.4.0.jar, apache-solr-dataimporthandler-extras-3.4.0.jar and the database driver jar must be placed in the $solr.home/lib directory
Configure solrconfig.xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
  </lst>
</requestHandler>
Configure data-config.xml, using MySQL; the table structure is the same as in the example (the db core in example-DIH)
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/testsolr?autoReconnect=true&amp;characterEncoding=utf8&amp;useUnicode=true"
              user="root" password="123456" />
  <!-- pk id must be lowercase; uppercase causes an error because Map.containsKey is case-sensitive -->
  <document>
    <entity name="item" pk="id"
            query="select * from item"
            deltaImportQuery="select * from item where ID ='${dataimporter.delta.id}'"
            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
      <entity name="feature" pk="ITEM_ID"
              query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
        <entity name="category" pk="ID"
                query="select DESCRIPTION as cat from category where ID = '${item_category.CATEGORY_ID}'"
                deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
                parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}"/>
      </entity>
    </entity>
  </document>
  <!-- Alternative: all delta conditions combined into a single deltaQuery on the root entity
  <document name="products">
    <entity name="item" pk="id"
            query="select * from item"
            deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
            deltaQuery_1="select id from item where last_modified > '${dataimporter.last_index_time}'"
            deltaQuery="select id from item where id in (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}') or id in (select item_id as id from item_category where item_id in (select id as item_id from category where last_modified > '${dataimporter.last_index_time}') or last_modified > '${dataimporter.last_index_time}' ) or last_modified > '${dataimporter.last_index_time}'" >
      <entity name="feature" pk="ITEM_ID" query="select description as features from feature where item_id='${item.ID}'">
      </entity>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID" query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID" query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
        </entity>
      </entity>
    </entity>
  </document>
  -->
</dataConfig>
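Because data-config.xml is plain XML, a quick sanity check before running reload-config is to parse it and print the entity nesting. A minimal Python sketch, assuming an abbreviated copy of the configuration in SAMPLE_CONFIG (a hypothetical name). It uses Python's ElementTree rather than Solr's own parser, so it only catches well-formedness errors such as an unescaped & in the JDBC URL, not DIH-specific mistakes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated copy of data-config.xml (delta attributes omitted).
SAMPLE_CONFIG = """<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/testsolr?autoReconnect=true&amp;useUnicode=true"
              user="root" password="123456" />
  <document>
    <entity name="item" pk="id" query="select * from item">
      <entity name="feature" pk="ITEM_ID" query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"/>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID" query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID" query="select DESCRIPTION as cat from category where ID = '${item_category.CATEGORY_ID}'"/>
      </entity>
    </entity>
  </document>
</dataConfig>"""


def entity_tree(xml_text):
    """Return (depth, name, pk) for every <entity>, in document order."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on e.g. a bare '&'
    result = []

    def walk(elem, depth):
        # Only direct <entity> children; nested ones are handled by recursion.
        for child in elem.findall("entity"):
            result.append((depth, child.get("name"), child.get("pk")))
            walk(child, depth + 1)

    for document in root.findall("document"):
        walk(document, 0)
    return result


if __name__ == "__main__":
    for depth, name, pk in entity_tree(SAMPLE_CONFIG):
        print("  " * depth + f"{name} (pk={pk})")
```

Replacing &amp; with a bare & in SAMPLE_CONFIG makes ET.fromstring fail, which mirrors the error Solr reports for the same mistake.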
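Pay attention to the case of id in pk="id": it must match the column name returned by deltaQuery exactly (lowercase id in "select id from item ..."), because the lookup uses Map.containsKey, which is case-sensitive
For more dataSource parameters, see http://wiki.apache.org/solr/DataImportHandler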
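The pitfall can be illustrated in Python, whose dict keys are case-sensitive in the same way as a Java HashMap's. This is an analogy, not DIH code: it assumes the deltaQuery row is keyed by the column label exactly as written in the SQL (select id ...), as the comment in the config suggests, and delta_pk_value is a hypothetical helper.

```python
# Hypothetical deltaQuery result row, keyed by the label written in "select id from item ...".
delta_row = {"id": "1"}


def delta_pk_value(row, pk):
    """Return the pk value from a deltaQuery row, mimicking a case-sensitive
    Map.containsKey check; raise KeyError if the pk name's case does not match."""
    if pk not in row:  # dict membership compares keys case-sensitively
        raise KeyError(f"pk {pk!r} not in deltaQuery columns {sorted(row)}")
    return row[pk]


print(delta_pk_value(delta_row, "id"))  # case matches, prints 1
try:
    delta_pk_value(delta_row, "ID")     # 'ID' != 'id', so this fails
except KeyError as err:
    print("error:", err)
```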
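entity attributes
- query: the SQL that fetches the data
- deltaQuery: SQL that returns the primary keys of rows added or modified since last_index_time
- parentDeltaQuery: for a child entity, SQL that maps its changed rows to the primary keys of the parent entity, so the parent document is re-indexed
- deletedPkQuery: ? (presumably returns the primary keys of rows deleted from the database, so that a delta-import removes them from the index; not tried yet)
- deltaImportQuery: SQL that fetches each changed row during a delta-import (via ${dataimporter.delta.<pk>}); if it is omitted, DIH generates one from query, which may be wrong, so it is better to write it yourself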
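How deltaQuery and deltaImportQuery work together for the root entity during a delta-import can be sketched in Python. This is a simplified model under stated assumptions, not DIH's implementation: in-memory SQLite stands in for MySQL, the item table is cut down to three columns, resolve is a naive hypothetical stand-in for DIH's ${...} substitution, and child entities, parentDeltaQuery and deletedPkQuery are left out.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; item is trimmed to the columns used here.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows accessible by column name
conn.executescript("""
create table item (id text primary key, name text, last_modified text);
insert into item values ('1', 'old item', '2011-10-01 08:00:00');
insert into item values ('2', 'changed item', '2011-10-20 09:00:00');
""")

# Same SQL as the item entity in data-config.xml, ${...} placeholders included.
DELTA_QUERY = "select id from item where last_modified > '${dataimporter.last_index_time}'"
DELTA_IMPORT_QUERY = "select * from item where ID ='${dataimporter.delta.id}'"


def resolve(template, variables):
    """Naive ${name} replacement. No escaping: illustration only, never for untrusted input."""
    for name, value in variables.items():
        template = template.replace("${" + name + "}", value)
    return template


def delta_import(conn, last_index_time, pk="id"):
    """deltaQuery yields the changed primary keys; deltaImportQuery then
    fetches each full row by that key. Returns the rows as dicts ("documents")."""
    sql = resolve(DELTA_QUERY, {"dataimporter.last_index_time": last_index_time})
    docs = []
    for row in conn.execute(sql).fetchall():
        row_sql = resolve(DELTA_IMPORT_QUERY, {"dataimporter.delta." + pk: row[pk]})
        docs.extend(dict(r) for r in conn.execute(row_sql))
    return docs


if __name__ == "__main__":
    print(delta_import(conn, "2011-10-15 00:00:00"))
    # [{'id': '2', 'name': 'changed item', 'last_modified': '2011-10-20 09:00:00'}]
```

Only the row modified after last_index_time comes back, which is why a correct deltaImportQuery matters: it decides which columns end up in the re-indexed document.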
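Full import
http://localhost:8983/solr/db/dataimport?command=full-import
(db is the core name in the multi-core example-DIH setup; on a single-core install the path is /solr/dataimport, as in the URLs below)
Delta import
http://localhost:8983/solr/dataimport?command=delta-import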
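Other commands:
Check status and results: http://localhost:8983/solr/dataimport
Reload the configuration (run after editing the config file, to avoid restarting the server): http://localhost:8983/solr/dataimport?command=reload-config
Abort: http://localhost:8983/solr/dataimport?command=abort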
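All of these commands are plain HTTP GET requests, so they are easy to script. A small standard-library Python sketch; BASE_URL, dih_url and run are hypothetical names, and the core path has to be adjusted to the setup (for example /solr/db/dataimport for the example-DIH db core). Extra query parameters, such as clean=false for full-import, are appended as given.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Adjust host, port and core path to your installation.
BASE_URL = "http://localhost:8983/solr/dataimport"


def dih_url(command=None, base_url=BASE_URL, **params):
    """Build a DataImportHandler URL; command=None gives the plain status URL."""
    query = dict(params)
    if command is not None:
        query = {"command": command, **query}  # keep command as the first parameter
    return base_url + ("?" + urlencode(query) if query else "")


def run(command=None, **params):
    """Send the request and return the raw XML response (needs a running Solr)."""
    with urlopen(dih_url(command, **params)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    for cmd in ("full-import", "delta-import", "reload-config", "abort", None):
        print(dih_url(cmd))
```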
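After running a command, check that the returned XML looks normal and whether the server log shows any exceptions; after the import, query the index and check that the data matches the database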
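Checking the returned XML can also be scripted. The sketch pulls the top-level status and the statusMessages counters out of a response. SAMPLE_RESPONSE is an abbreviated, assumed example of the shape of a DIH status response, not a captured one, and the exact message names may differ between versions, so compare it with what the server actually returns. A counter such as Total Documents Processed can then be checked against select count(*) on the database.

```python
import xml.etree.ElementTree as ET

# Assumed, abbreviated shape of a DIH status response (not captured from a real run).
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">3</str>
    <str name="Total Rows Fetched">10</str>
    <str name="Total Documents Processed">3</str>
    <str name="Total Documents Skipped">0</str>
  </lst>
</response>"""


def dih_status(xml_text):
    """Return (status, messages): the top-level <str name="status"> text and
    a dict of the entries inside <lst name="statusMessages">."""
    root = ET.fromstring(xml_text)
    status_elem = root.find("str[@name='status']")  # direct child only
    status = status_elem.text if status_elem is not None else None
    messages = {}
    lst = root.find("lst[@name='statusMessages']")
    if lst is not None:
        for item in lst:
            messages[item.get("name")] = item.text
    return status, messages


if __name__ == "__main__":
    status, messages = dih_status(SAMPLE_RESPONSE)
    print(status, messages.get("Total Documents Processed"))
```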
conf/dataimport.properties stores last_index_time; Solr updates this timestamp after each import
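The stored value is what ${dataimporter.last_index_time} resolves to in the next delta-import. A minimal Python reader, assuming the file consists of a comment line plus key=value lines with colons escaped as \: (standard java.util.Properties output), and that per-entity keys such as item.last_index_time may appear next to last_index_time; the sample content is illustrative, not copied from a real run.

```python
import re
from datetime import datetime

# Illustrative content in java.util.Properties format (colons escaped).
SAMPLE_PROPERTIES = r"""#Thu Oct 20 10:30:00 CST 2011
item.last_index_time=2011-10-20 10\:30\:00
last_index_time=2011-10-20 10\:30\:00
"""


def read_dataimport_properties(text):
    """Parse simple key=value lines; skips blank lines and #/! comments.
    Does not handle line continuations or \\uXXXX escapes."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        # Drop the backslash of escapes like '\:' (enough for this file).
        props[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return props


def last_index_time(props, entity=None):
    """Return the global or per-entity last_index_time as a datetime."""
    key = f"{entity}.last_index_time" if entity else "last_index_time"
    return datetime.strptime(props[key], "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    props = read_dataimport_properties(SAMPLE_PROPERTIES)
    print(last_index_time(props))  # 2011-10-20 10:30:00
```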
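What about rows deleted from the database? Their documents should presumably be removed from the Solr index as well. Can this be done with a delete flag (soft delete)? Is that the best approach?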
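One way to implement the delete-flag idea: keep the rows in the database with a flag set, exclude flagged rows from query and deltaQuery, and return their keys from deletedPkQuery so that a delta-import removes them from the index. A Python sketch of that split, with in-memory SQLite standing in for MySQL; the deleted column is a hypothetical addition to the item table, and in data-config.xml the ? placeholders would be '${dataimporter.last_index_time}'. Not tested against a real DIH setup.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "deleted" is a hypothetical flag column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table item (id text primary key, name text, deleted integer default 0, last_modified text);
insert into item values ('1', 'unchanged', 0, '2011-10-01 08:00:00');
insert into item values ('2', 'updated', 0, '2011-10-20 09:00:00');
insert into item values ('3', 'removed', 1, '2011-10-20 09:30:00');
""")

LAST_INDEX_TIME = "2011-10-15 00:00:00"

# deltaQuery: changed rows that still exist, to be re-indexed.
DELTA_QUERY = "select id from item where deleted = 0 and last_modified > ?"
# deletedPkQuery: rows flagged since the last import, to be removed from the index.
DELETED_PK_QUERY = "select id from item where deleted = 1 and last_modified > ?"


def plan_delta(conn, last_index_time):
    """Return (ids to re-index, ids to delete), each sorted."""
    to_update = sorted(r[0] for r in conn.execute(DELTA_QUERY, (last_index_time,)))
    to_delete = sorted(r[0] for r in conn.execute(DELETED_PK_QUERY, (last_index_time,)))
    return to_update, to_delete


if __name__ == "__main__":
    print(plan_delta(conn, LAST_INDEX_TIME))  # (['2'], ['3'])
```

The full-import query would also need "where deleted = 0", otherwise a full rebuild would bring the flagged rows back into the index.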
MORE:
DataImportHandlerDeltaQueryViaFullImport