Solr DataImportHandler (DIH)

http://wiki.apache.org/solr/DataImportHandler

Goal

Import data from a relational database.

Environment

Place apache-solr-dataimporthandler-3.4.0.jar, apache-solr-dataimporthandler-extras-3.4.0.jar, and the database driver JAR in the $solr.home/lib directory.

Configure solrconfig.xml

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
    </lst>
  </requestHandler>

Configure data-config.xml. This example uses MySQL, with the same table structure as the db example in example-DIH.

<dataConfig>
	<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/testsolr?autoReconnect=true&amp;characterEncoding=utf8&amp;useUnicode=true" user="root" password="123456" />	
	<!-- pk must be lowercase "id"; uppercase fails because Map.containsKey is case-sensitive -->
	<document>
		<entity name="item" pk="id" 
					query="select * from item"
					deltaImportQuery="select * from item where ID ='${dataimporter.delta.id}'"
					deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">

			<entity name="feature" pk="ITEM_ID"
					query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
					deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
					parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>

			<entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
						query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
						deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
						parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
					<entity name="category" pk="ID"
						query="select DESCRIPTION as cat from category where ID = '${item_category.CATEGORY_ID}'"
						deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
						parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}"/>
			</entity>

		</entity>
	</document>
	
	<!-- Alternative: all delta logic consolidated into a single deltaQuery
	<document name="products">
		<entity name="item" pk="id"
				query="select * from item"
				deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
				deltaQuery_1="select id from item where last_modified > '${dataimporter.last_index_time}'"

				deltaQuery="select id from item where 
								id in (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
								or id in (select item_id as id from item_category where
											category_id in (select id as category_id from category where last_modified > '${dataimporter.last_index_time}')
									 		or last_modified > '${dataimporter.last_index_time}'
								)
								or last_modified > '${dataimporter.last_index_time}'" >


			<entity name="feature" pk="ITEM_ID"
					query="select description as features from feature where item_id='${item.ID}'">
			</entity>
			<entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
					query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
				<entity name="category" pk="ID"
					query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
				</entity>
			</entity>
		</entity>
	</document>
	-->
</dataConfig>

Mind the case of id in pk="id": lowercase works here, while uppercase fails.

For more dataSource parameters, see http://wiki.apache.org/solr/DataImportHandler

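Entity attributes

  • query: SQL that fetches the data for a full import
  • deltaQuery: SQL that returns the primary keys of rows changed since the last import
  • parentDeltaQuery: SQL that returns the primary keys of parent entity rows affected by a change in this entity
  • deletedPkQuery: SQL that returns the primary keys of rows deleted since the last import, so the matching documents can be removed from the index
  • deltaImportQuery: SQL that fetches the full row for each changed key during a delta import; if it is missing, DIH derives one from query (and may get it wrong), so it is safer to write it yourself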
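
To make the roles of deltaQuery and deltaImportQuery more concrete, here is a small Python sketch that imitates the two-step delta flow against an in-memory SQLite table. It is not how DIH is implemented internally; the sample rows, the resolve helper, and the simplified ${...} substitution are all assumptions made for illustration.

```python
import re
import sqlite3

# The two queries mirror the item entity in data-config.xml above.
DELTA_QUERY = "select id from item where last_modified > '${dataimporter.last_index_time}'"
DELTA_IMPORT_QUERY = "select * from item where id = '${dataimporter.delta.id}'"


def resolve(template, variables):
    """Replace each ${name} placeholder with its value (simplified, no escaping)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(variables[m.group(1)]), template)


def delta_import(conn, last_index_time):
    """Step 1: collect changed keys. Step 2: fetch the full row for each key."""
    changed = conn.execute(
        resolve(DELTA_QUERY, {"dataimporter.last_index_time": last_index_time})
    ).fetchall()
    rows = []
    for (item_id,) in changed:
        sql = resolve(DELTA_IMPORT_QUERY, {"dataimporter.delta.id": item_id})
        rows.extend(conn.execute(sql).fetchall())
    return rows


def demo():
    """Build a two-row sample table (hypothetical data) and run one delta pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table item (id text primary key, name text, last_modified text)")
    conn.executemany(
        "insert into item values (?, ?, ?)",
        [
            ("1", "old item", "2011-09-01 00:00:00"),
            ("2", "changed item", "2011-09-27 12:00:00"),
        ],
    )
    return delta_import(conn, "2011-09-20 00:00:00")


if __name__ == "__main__":
    print(demo())
```

Only the row modified after the given timestamp comes back, which is why a missing or wrong deltaImportQuery shows up as documents that never get refreshed.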

Full import

http://localhost:8983/solr/db/dataimport?command=full-import

Delta import

http://localhost:8983/solr/dataimport?command=delta-import

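Other commands:

Check status: http://localhost:8983/solr/dataimport

Reload the configuration (run this after editing the config file to avoid restarting the server): http://localhost:8983/solr/dataimport?command=reload-config

Abort a running import: http://localhost:8983/solr/dataimport?command=abort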
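
Since every command is just a query parameter on the same handler URL, a small helper can build them all. The sketch below assumes the handler is registered at /dataimport on localhost:8983 as in the URLs above; the dih_url function and its keyword arguments are hypothetical names, not part of Solr.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/dataimport"  # handler path from solrconfig.xml
COMMANDS = ("full-import", "delta-import", "reload-config", "abort", "status")


def dih_url(command=None, base_url=BASE_URL, **params):
    """Return the DIH request URL; with no command, the bare handler URL reports status."""
    if command is None:
        return base_url
    if command not in COMMANDS:
        raise ValueError(f"unknown DIH command: {command}")
    query = {"command": command, **params}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    for name in COMMANDS:
        print(dih_url(name))
```

Extra request parameters go in as keyword arguments, for example dih_url("full-import", clean="false") to keep the existing index contents during a full import. The resulting URL can be opened in a browser or passed to urllib.request.urlopen.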

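After running a command, check that the returned XML looks normal and watch the server log for exceptions. Once the import finishes, query Solr and compare the results with the database.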
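
Checking the returned XML by eye gets tedious, so here is a rough Python sketch that pulls out the status and the counters. The sample response is hand-written in the shape a DIH status response usually has; the exact elements and counter names on your server may differ, so treat SAMPLE_RESPONSE and the summarize helper as assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumption), not captured from a real server.
SAMPLE_RESPONSE = """<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Rows Fetched">12</str>
    <str name="Total Documents Processed">4</str>
  </lst>
</response>"""


def summarize(xml_text):
    """Return (status, {message name: value}) from a DIH-style response."""
    root = ET.fromstring(xml_text)
    status = root.findtext("str[@name='status']")
    messages = {
        node.get("name"): node.text
        for node in root.findall("lst[@name='statusMessages']/str")
    }
    return status, messages


if __name__ == "__main__":
    status, messages = summarize(SAMPLE_RESPONSE)
    print(status)
    for name, value in messages.items():
        print(f"{name}: {value}")
```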

conf/dataimport.properties stores last_index_time; Solr updates this timestamp after each import.
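
If you want to see which timestamp the next delta import will use, the file can be read directly. The sketch below assumes the usual Java properties layout, where colons in the value are escaped with a backslash, and a yyyy-MM-dd HH:mm:ss timestamp; the sample content and the read_last_index_time helper are hypothetical.

```python
from datetime import datetime

# Sample file content (assumption); Java's Properties.store escapes ':' as '\:'.
SAMPLE = r"""#Tue Sep 27 10:30:00 CST 2011
last_index_time=2011-09-27 10\:30\:00
"""


def read_last_index_time(text):
    """Return last_index_time as a datetime, or None if the key is absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if key.strip() == "last_index_time":
            cleaned = value.strip().replace("\\:", ":")  # undo the colon escaping
            return datetime.strptime(cleaned, "%Y-%m-%d %H:%M:%S")
    return None


if __name__ == "__main__":
    print(read_last_index_time(SAMPLE))
```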

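What about rows deleted from the database? The matching documents should be removed from the Solr index as well. One option is to mark rows with a deletion flag rather than deleting them outright, though it is unclear whether that is the best approach.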
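
One way the deletion flag could work, sketched in Python: keep the row, set a flag, and let a deletedPkQuery select the flagged keys, e.g. deletedPkQuery="select id from item where deleted = 1 and last_modified > '${dataimporter.last_index_time}'". The deleted column is an assumption (it is not in the example schema), and the SQLite table below only demonstrates which keys such a query would return.

```python
import sqlite3

# Hypothetical deletedPkQuery; the "deleted" flag column is an assumption.
DELETED_PK_QUERY = (
    "select id from item "
    "where deleted = 1 and last_modified > :last_index_time"
)


def deleted_ids(conn, last_index_time):
    """Return the keys flagged as deleted since the given timestamp."""
    rows = conn.execute(DELETED_PK_QUERY, {"last_index_time": last_index_time})
    return [row[0] for row in rows]


def demo():
    """Three sample rows (hypothetical data): only one was flagged after the last import."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table item (id text primary key, deleted integer, last_modified text)"
    )
    conn.executemany(
        "insert into item values (?, ?, ?)",
        [
            ("1", 0, "2011-09-27 12:00:00"),  # changed, still live
            ("2", 1, "2011-09-27 12:05:00"),  # flagged after the last import
            ("3", 1, "2011-09-01 08:00:00"),  # flagged before the last import
        ],
    )
    return deleted_ids(conn, "2011-09-20 00:00:00")


if __name__ == "__main__":
    print(demo())
```

With this approach the flag update must also touch last_modified, otherwise the flagged row is never picked up. The deltaQuery would probably also need a deleted = 0 condition so flagged rows are not re-indexed.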

MORE:

multiple datasources

DataImportHandlerDeltaQueryViaFullImport


Reposted from simplehappy.iteye.com/blog/1178273