How to query SequoiaDB array fields from Spark SQL

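SequoiaDB is a Chinese-developed, enterprise-grade distributed database. SequoiaDB itself is a NoSQL database that stores data in key-value form, and Spark sits on top of it as the SQL parsing layer. This article shows how to query SequoiaDB array fields with Spark SQL.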

The following concrete example illustrates the approach:

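1. Create the collections in SDB and insert the sample documents (the commands assume the collection space foo already exists):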

db.foo.createCL("array1", {ShardingKey:{_id:1}, ShardingType:"hash", AutoSplit:true})

db.foo.array1.insert({id:1, empList: [{name:"Tom", age:30}, {name:"Jack", age:40}]})

db.foo.array1.insert({id:2, empList: [{name:"Nacy", age:25}, {name:"Wendy", age:35}]})

db.foo.createCL("array2", {ShardingKey:{_id:1}, ShardingType:"hash", AutoSplit:true})

db.foo.array2.insert({id:3, empList: [{name:"Tom", age:30}, {name:"Jack", age:40}]})

db.foo.array2.insert({id:4, empList: [{name:"Nacy", age:25}, {name:"Wendy", age:35}]})

2. Create the corresponding tables in spark-sql:


CREATE table sdb_array1 ( id int, empList array<struct<name:string, age:int>>) using com.sequoiadb.spark OPTIONS ( host 'sdbserver1:11810', collectionspace 'foo', collection 'array1');

CREATE table sdb_array2 ( id int, empList array<struct<name:string, age:int>>) using com.sequoiadb.spark OPTIONS ( host 'sdbserver1:11810', collectionspace 'foo', collection 'array2');

select * from sdb_array1;

select * from sdb_array2;

Note:

The basic patterns are array<TYPE> and struct<COLUMN:TYPE, COLUMN:TYPE, ...>; the column definition above combines the two.

For details on Hive and Spark data types, see the following documents:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

https://spark.apache.org/docs/1.2.0/sql-programming-guide.html
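With the schema above, a single array element and its struct fields can also be projected in the select list, not only used in a filter. The following is a minimal sketch against the sdb_array1 table defined in step 2. The column aliases first_name, first_age and emp_count are names chosen for this illustration, and the availability of the size() function is assumed for the Spark version in use.

```sql
-- empList[0] is the first struct in the array;
-- .name and .age read the fields of that struct.
-- size() returns the number of elements in the array.
SELECT id,
       empList[0].name AS first_name,
       empList[0].age  AS first_age,
       size(empList)   AS emp_count
FROM sdb_array1;
```

With the sample data from step 1, this should return Tom/30 for id 1 and Nacy/25 for id 2, each with a count of 2.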

3. Use specific values inside the array as query conditions:

select * from sdb_array1 where empList[0].name='Tom';

select * from sdb_array2 where empList[1].name='Wendy';

select * from sdb_array1 where empList[0].age=30;

select * from sdb_array2 where empList[1].age=35;

select * from sdb_array1 where empList[0].name='Tom' union all select * from sdb_array2 where empList[1].age=35;
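The queries above match on a fixed array position, so empList[0].name='Tom' finds Tom only when he is the first element. When the position is not known in advance, one possible approach is to flatten the array with LATERAL VIEW explode and filter on the flattened rows. This is a sketch that assumes the spark-sql session supports Hive's LATERAL VIEW syntax; the aliases a, e and emp are names chosen for this illustration.

```sql
-- Turn each element of empList into its own row,
-- then filter on the fields of that element.
SELECT a.id, emp.name, emp.age
FROM sdb_array1 a
LATERAL VIEW explode(a.empList) e AS emp
WHERE emp.name = 'Jack';
```

With the sample data from step 1, this should match the document with id 1, where Jack is the second element of the array.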



 


Reposted from blog.csdn.net/u014439239/article/details/81906889