Oracle实验三 基于大数据集的数据库操作

版权声明:@YDDUONG https://blog.csdn.net/y_dd6011/article/details/86370955

一、实验内容

    1.以常用“名字大全”与“百家姓”数据集为基础,生成不小于1千万条stud记录,要求,姓名的重复率不超过10%,学号以ABCD16EFGH为格式模板,即其中16是固定的,AB为从01到80,CD为从01到90,EF为01到50,GH为01到32;性别中,男、女占比为99%到99.5%。TEL与E-mail不作要求,但不能全空。Birthday要求从‘19940101’到‘19990731’分布。要求记录ORACLE数据文件的大小变化。(需要编制过程)

    2.分别测试stud有主键与没有主键情形下生成记录的时间。

    3.建立基于name的索引index_name,测试建立的时间与建立索引前后查询某一姓名及某一姓的时间长度。

    4.测试索引index_name建立前后,分姓(简单地理解为姓名的第1,2位)的记录数统计时间。

    5.按学号首位建立10个分区分别为part_0到part_9,测试建立分区前后分首位统计人数与分专业(EF位)统计人数的时间差别。

二、实验分析

    1.实验思路

    分别完成姓名表,学号表,学生其他信息表的建立和数据生成,最后将三张表数据同时插入学生信息表。

    2.实验过程

    0)写一个记录系统时间的过程,方便记录时间差

    1)导入基本的姓名表

    2)创建姓名各个字的视图

扫描二维码关注公众号,回复: 4893810 查看本文章

    3)将视图进行笛卡尔积插入学生姓名表

    4)成学生学号表

    5)建立相关随机函数:性别,电话,邮箱,出生日期

    6)生成学生其他信息表数据

    7)将三张表数据同时插入学生信息表

    8)记录各个操作的时间

    3.生成姓名表分析

    对原始的姓名创建3个视图,分别选出第一、二、三个字,其分别有572、1504和1200条记录,对于生成千万条数据,随机选出500、1200和20条记录就可达到实验要求。

    4.生成学生学号分析

--  ABCD16EFGH:每次生成两位,分四批生成,可以在1分钟左右完成生成

--建立过程中要使用的表

drop table t_student_sno_gh_j432;

drop table t_student_sno_ef_j432;

drop table t_student_sno_cd_j432;

drop table t_student_sno_ab_j432;

create table t_student_sno_gh_j432(sno number(10));

create table t_student_sno_ef_j432(sno number(10));

create table t_student_sno_cd_j432(sno number(10));

create table t_student_sno_ab_j432(sno varchar2(10));

--通过批处理,建立过程分批生成:

declare

i int;

begin

for i in 1..32 loop--先生成班级内学生序号

insert into t_student_sno_gh_j432 values(i);

end loop;

for i in 1..50 loop--生成班级序号

insert /*+ append */ into t_student_sno_ef_j432 select 160000+i*100+t_student_sno_gh_j432.sno from t_student_sno_gh_j432;

commit;

end loop;

for i in 1..90 loop--生成专业序号

insert /*+ append */ into t_student_sno_cd_j432 select i*1000000+t_student_sno_ef_j432.sno from t_student_sno_ef_j432;

commit;

end loop;

for i in 1..80 loop--生成学院序号

insert /*+ append */ into t_student_sno_ab_j432 select substr(10000000000+i*100000000+t_student_sno_cd_j432.sno,2,10) from t_student_sno_cd_j432;

commit;

end loop;

end;

/

    5.生成学生其他信息表分析

    1)建立随机生成性别函数f_getSex_j432,在性别表中插入143条记录,其中男女占142条,还有一条为“其他”,每次调用函数会随机返回一条记录,可是男女占比为99.3%,达到实验要求。

    2)另外建立随机产生电话函数f_getTel_j432、随机产生邮箱函数f_getEmail_j432、随机产生生日函数f_getBirthday_j432

    3)通过批处理产生大量数据

--建立基本信息模板表

create table t_sequence_id(

id number(10),

sex varchar2(32),

tel varchar2(32),

email varchar2(32),

birthday date);

 

--存储大量的学生其他信息,千万级以上

drop table t_stud_other_information_j432;

create table t_stud_other_information_j432(

id number(16),

sex varchar2(32),

tel varchar2(32),

email varchar2(32),

birthday date);

 

--生成1万条数据模板

begin

delete from t_sequence_id;

for i in 0..9999 loop

insert into t_sequence_id(id, sex, tel, email, birthday) values(i,f_getsex_j432,f_gettel_j432,f_getemail_j432,f_getbirthday_j432);

end loop;

end;

/

--select * from t_sequence_id;

 

--每1万条数据插入一次,得到12000000条数据

begin

delete from t_stud_other_information_j432;

for i in 1..1200 loop

insert /*+ append */ into t_stud_other_information_j432(id, sex, tel, email,birthday) select i * 10000 +t_sequence_id.id as MSISDN,t_sequence_id.sex,t_sequence_id.tel,t_sequence_id.email,t_sequence_id.birthday from t_sequence_id;

commit;

end loop;

end;

/

6.生成完整学生信息表分析

--通过表格自带的序号来连接表格,并且可以防止笛卡尔积

insert into t_student_j432(sno,sname, sex, tel, email, birthday)

select X.sno, A.sname, B.sex, B.tel, B.email, B.birthday

from

 (select rownum rownum_X,sno from t_student_sno_ab_j432) X,

 (select rownum rownum_A,sname from t_stu_name_j432) A,

 (select rownum rownum_B,sex,tel,email,birthday from t_stud_other_information_j432) B

where rownum_A = rownum_B and rownum_A = rownum_X;

7.记录各个操作时间差分析

create table t_record_time_j432(things varchar2(128),time varchar2(32));

create or replace procedure p_record_time_j432(input varchar2) as

temp varchar2(32);

begin

select to_char(systimestamp,'yyyy-mm-dd hh24:mi:ss.ff') into temp from dual;

insert into t_record_time_j432(things) values(input);

update t_record_time_j432 set time=temp where things=input;

end;

/

show error

三、实验结果

本次记录中得到以下实验数据:

导入原始姓名表用时为69.970000秒

学生信息表没有主键生成时间为40.811000秒

学生信息表有主键生成时间为144.011000秒

没有姓名索引,查询一条姓名的时间为2.108000秒

有姓名索引,查询一条姓名的时间为1.680000秒

没有分区,按学号首位ID统计人数的时间为2.295000秒

没有分区,按学号首位ID统计人数的时间为1.780000秒

 

实验过程记录:(整个记录保存在txt文件内,使用Notepad++查看)

 

 

 

四、实验小结

    本次实验主要遇到的问题是数量级太大,效率低。最开始的时候,是想建立整张学生信息表,先导入姓名,对整张表做一个游标,不断生成其他信息,55分钟大约生成160万信息,太慢。然后查找资料可以进行批处理,以及可以先拆开生成不同的表格数据,最后再统一插入学生信息表,可大大加快效率,实现实验要求。

    另一个问题就是记录时间,我在网上查到可以用表格自带的时间戳,我用过或发现误差太大,然后我想到获取毫秒级的系统时间,在每次操作前后都获取一次系统时间,通过分析时间差来达到实验目的,所以就有了实验源代码之初的记录时间的过程。

五、完整代码

--建立精确记录时间过程,方便记录系统时间(表格自带的时间戳,误差太大)
--drop table t_record_time_j432;
create table t_record_time_j432(things varchar2(128),time varchar2(32));
create or replace procedure p_record_time_j432(input varchar2) as
temp varchar2(32);
begin 
select to_char(systimestamp,'yyyy-mm-dd hh24:mi:ss.ff') into temp from dual;
insert into t_record_time_j432(things) values(input);
update t_record_time_j432 set time=temp where things=input;
end;
/
show error
exec p_record_time_j432('第一次执行记录时间过程')
col things format a64;
col time format a32;
select * from t_record_time_j432;

--1.创建基本姓名表
drop table sname;
create table sname(sname varchar(32));
--2.插入基本姓名
exec p_record_time_j432('开始导入原始姓名')
@D:\YDDUONG\大学课程学习资料\大三课程\orcale\ODB\name_insert.txt
commit;
exec p_record_time_j432('导入原始姓名结束')
--3.制造足够多的姓名
--  为名字的每个字创建视图
drop view v_name1_j432;
drop view v_name2_j432;
drop view v_name3_j432;
drop view v_name12_j432;
drop view v_name123_j432;
create view v_name1_j432 as select * from (select distinct substr(sname,1,1) na1 from sname order by dbms_random.value) where rownum<=500;
create view v_name2_j432 as select * from (select distinct substr(sname,2,1) na2 from sname order by dbms_random.value) where rownum<=1200;
create view v_name3_j432 as select * from (select distinct substr(sname,3,1) na3 from sname order by dbms_random.value) where rownum<=20;
create view v_name12_j432 as select concat(na1,na2) na12 from v_name1_j432,v_name2_j432;
create view v_name123_j432 as select concat(na12,na3) na123 from v_name12_j432,v_name3_j432;

--4创建t_stu_name_j432学生姓名表
drop table t_stu_name_j432;
create table t_stu_name_j432(sname varchar(16));

--5为姓名表t_stu_name_j432插入120200000条学生姓名
insert into t_stu_name_j432(sname) select na12 from v_name12_j432;
insert into t_stu_name_j432(sname) select na123 from v_name123_j432;
exec p_record_time_j432('完成为姓名表生成120200000条记录')
--select count(*) from t_stu_name_j432;

--6.生成规范学号表
--  ABCD16EFGH:每次生成两位,分四批生成,可以在1分钟左右生成
--建立过程中要使用的表
drop table t_student_sno_gh_j432;
drop table t_student_sno_ef_j432;
drop table t_student_sno_cd_j432;
drop table t_student_sno_ab_j432;
create table t_student_sno_gh_j432(sno number(10));
create table t_student_sno_ef_j432(sno number(10));
create table t_student_sno_cd_j432(sno number(10));
create table t_student_sno_ab_j432(sno varchar2(10));

--建立过程:
declare
i int;
begin
for i in 1..32 loop
	insert into t_student_sno_gh_j432 values(i);
end loop;
for i in 1..50 loop
	insert /*+ append */ into t_student_sno_ef_j432 select 160000+i*100+t_student_sno_gh_j432.sno from t_student_sno_gh_j432;
	commit;
end loop;
for i in 1..90 loop
	insert /*+ append */ into t_student_sno_cd_j432 select i*1000000+t_student_sno_ef_j432.sno from t_student_sno_ef_j432;
	commit;
end loop;
for i in 1..80 loop
	insert /*+ append */ into t_student_sno_ab_j432 select substr(10000000000+i*100000000+t_student_sno_cd_j432.sno,2,10) from t_student_sno_cd_j432;
	commit;
end loop;
end;
/
exec p_record_time_j432('完成11520000条学号ID生成')

--7.1批量处理sex, tel, email, birthday等数据
--   产生随机性别:男、女占比为99%到99.5%
--   在性别表t_sex_j432中,143人,男女分别71人,一人性别为其它。随机查询一条记录时,可使男、女占比为99.3%
--   函数f_getSex_j432返回值为随机性别
create or replace function f_getSex_j432 return varchar2
is
f_numb number;
f_sex varchar2(8);
begin
select trunc(dbms_random.value(1,143))) into f_numb from dual;
if f_numb<=71 then
	f_sex='男';
elsif f_numb<=142 then
	f_sex='女';
else
	f_sex='其他';
end if;
return f_sex;
end;
/
show error

--7.2产生随机电话
--   函数f_getTel_j432会返回一个随机手机号
create table t_tel_j432(tel varchar2(4));
insert into t_tel_j432 values('132');
insert into t_tel_j432 values('135');
insert into t_tel_j432 values('156');
insert into t_tel_j432 values('151');
insert into t_tel_j432 values('138');
insert into t_tel_j432 values('139');
insert into t_tel_j432 values('183');
insert into t_tel_j432 values('187');
insert into t_tel_j432 values('153');
insert into t_tel_j432 values('150');
insert into t_tel_j432 values('186');
insert into t_tel_j432 values('188');
create or replace function f_getTel_j432 return varchar2
is
getTel varchar2(12);
numb_head varchar2(4);
numb_other varchar2(16);
begin
select tel into numb_head from (select tel from t_tel_j432 order by dbms_random.value) where rownum=1;
select substr(cast(dbms_random.value as varchar2(32)),3,8) into numb_other from dual;
getTel:=numb_head||numb_other;
return getTel;
end;
/
show error

--7.3随机产生邮箱
--   函数f_getEmail_j432会返回一个随机邮箱号
create table t_email_j432 (email varchar2(16));
insert into t_email_j432 values('126');
insert into t_email_j432 values('139');
insert into t_email_j432 values('sohu');
insert into t_email_j432 values('sina');
insert into t_email_j432 values('163');
insert into t_email_j432 values('foxmail');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');
insert into t_email_j432 values('qq');

create or replace function f_getemail_j432 return varchar2
is
getEmail varchar2(32);
em_name varchar2(16);
em_own varchar2(16);
begin
select substr(cast(dbms_random.value as varchar2(32)),3,11) into em_name from dual;
select email into em_own from (select email from t_email_j432 order by dbms_random.value) where rownum=1;
getEmail:=em_name||'@'||em_own||'.com';
return getEmail;
end;
/
show error

--7.4产生随机生日
--   函数f_getBirthday_j432会返回一个在[19940101,19990701]时间内的日期
create or replace function f_getBirthday_j432 return date
is
birthday date;
begin
select to_date(trunc(dbms_random.value(2449354,2451186)),'J') into birthday from dual;
return birthday;
end;
/
show error
exec p_record_time_j432('性别、手机号、邮箱、出生日期等随机函数生成完毕')

--7.5生成完整学生表除学号和姓名之外的所有信息
--建立基本信息模板表
create table t_sequence_id(
id number(10),
sex varchar2(32),
tel varchar2(32),
email varchar2(32),
birthday date);

--存储大量的学生其他信息,千万级以上
drop table t_stud_other_information_j432;
create table t_stud_other_information_j432(
	id number(16),
	sex varchar2(32),
	tel varchar2(32),
	email varchar2(32),
	birthday date);

--生成1万条数据模板
begin
delete from t_sequence_id;
for i in 0..9999 loop
insert into t_sequence_id(id, sex, tel, email, birthday) values(i,f_getsex_j432,f_gettel_j432,f_getemail_j432,f_getbirthday_j432);
end loop;
end;
/
--select * from t_sequence_id;

--每1万条数据插入一次,得到12000000数据
begin
delete from t_stud_other_information_j432;
for i in 1..1200 loop
insert /*+ append */ into t_stud_other_information_j432(id, sex, tel, email,birthday) select i * 10000 +t_sequence_id.id as MSISDN,t_sequence_id.sex,t_sequence_id.tel,t_sequence_id.email,t_sequence_id.birthday from t_sequence_id;
commit;
end loop;
end;
/
exec p_record_time_j432('完成生成12000000条其他信息生成')

exec p_record_time_j432('学生信息表没有主键,生成11520000条记录,开始时间')
drop table t_student_j432;
create table t_student_j432(
	sno varchar2(10),--学生学号
	sname varchar2(32),
	sex varchar2(32),
	tel varchar2(32),
	email varchar2(32),
	birthday date);
--生成完整的学生信息表:将学号,姓名和其他信息,放在一起
insert into t_student_j432(sno,sname, sex, tel, email, birthday) select X.sno,A.sname, B.sex, B.tel, B.email, B.birthday from (select rownum rownum_X,sno from t_student_sno_ab_j432) X, (select rownum rownum_A,sname from t_stu_name_j432) A, (select rownum rownum_B,sex,tel,email,birthday from t_stud_other_information_j432) B where rownum_A = rownum_B and rownum_A = rownum_X;
exec p_record_time_j432('学生信息表没有主键,生成11520000条记录,完成时间')

exec p_record_time_j432('学生信息表有主键,生成11520000条记录,开始时间')
drop table t_student_j432;
create table t_student_j432(
	sno varchar2(10) primary key,--学生学号
	sname varchar2(32),
	sex varchar2(32),
	tel varchar2(32),
	email varchar2(32),
	birthday date);
insert into t_student_j432(sno,sname, sex, tel, email, birthday) select X.sno,A.sname, B.sex, B.tel, B.email, B.birthday from (select rownum rownum_X,sno from t_student_sno_ab_j432) X, (select rownum rownum_A,sname from t_stu_name_j432) A, (select rownum rownum_B,sex,tel,email,birthday from t_stud_other_information_j432) B where rownum_A = rownum_B and rownum_A = rownum_X;
exec p_record_time_j432('学生信息表有主键,生成11520000条记录,完成时间')

--为学生表添加其他约束
alter table t_student_j432 add constraint ck_student_sex check(sex in('男','女','其它'));
alter table t_student_j432 add constraint ck_student_email check(email like '%@%.%');
alter table t_student_j432 add constraint ck_student_birthday check(birthday>=to_date('19940101','yyyymmdd') and birthday<=to_date('19990731','yyyymmdd'));
--随机查询100名同学信息
set linesize 300;
set pagesize 1000;
col sno format a11;
col sname format a8;
col sex format a6;
col email format a26;
col tel format a12;
alter session set nls_date_format = 'yyyy-mm-dd';
select * from (select * from t_student_j432 order by dbms_random.value()) where rownum<=100;
select count(*) from t_student_j432;

col things format a64;
col time format a32;
select * from t_record_time_j432;
exec p_record_time_j432('没有姓名索引,查询一条姓名的开始时间')
select * from t_student_j432 where sname='周平环';
exec p_record_time_j432('没有姓名索引,查询一条姓名的结束时间')

exec p_record_time_j432('没有姓名索引,查询某一姓氏人数的开始时间')
select * from t_student_j432 where sname like '周%';
exec p_record_time_j432('没有姓名索引,查询某一姓氏人数的结束时间')

exec p_record_time_j432('没有姓名索引,统计某一姓氏人数的开始时间')
select count(*) from t_student_j432 where sname like '周%';
exec p_record_time_j432('没有姓名索引,统计某一姓氏人数的结束时间')

exec p_record_time_j432('没有姓名索引,统计某一姓名第二个字相同人数的开始时间')
select count(*) from t_student_j432 where sname like '_平%';
exec p_record_time_j432('没有姓名索引,统计某一姓名第二个字相同人数的结束时间')
--创建索引
drop index i_stu_sname_j432;
exec p_record_time_j432('开始创建姓名索引')
create index i_stu_sname_j432 on t_student_j432(sname);
exec p_record_time_j432('完成创建姓名索引')

exec p_record_time_j432('有姓名索引,查询一条姓名,开始时间')
select * from t_student_j432 where sname='周平环';
exec p_record_time_j432('有姓名索引,查询一条姓名,结束时间')

exec p_record_time_j432('有姓名索引,查询某一姓氏人数的开始时间')
select * from t_student_j432 where sname like '周%';
exec p_record_time_j432('有姓名索引,查询某一姓氏人数的结束时间')

exec p_record_time_j432('有姓名索引,统计某一姓氏人数的开始时间')
select count(*) from t_student_j432 where sname like '周%';
exec p_record_time_j432('有姓名索引,统计某一姓氏人数的结束时间')

exec p_record_time_j432('有姓名索引,统计某一姓名第二个字相同人数的开始时间')
select count(*) from t_student_j432 where sname like '_平%';
exec p_record_time_j432('有姓名索引,统计某一姓名第二个字相同人数的结束时间')
exec p_record_time_j432('没有分区,按学号首位ID统计人数的开始时间')
select count(*) from t_student_j432 where sno like '5%';
exec p_record_time_j432('没有分区,按学号首位ID统计人数的结束时间')

exec p_record_time_j432('没有分区,按专业统计人数的开始时间')
select count(*) from t_student_j432 where sno like '______01%';
exec p_record_time_j432('没有分区,按专业统计人数的结束时间')

--重新建立学生表,并分区
drop table t_student_j432;
create table t_student_j432(
	sno varchar2(10) primary key,--学生学号
	sname varchar2(32),
	sex varchar2(32),
	tel varchar2(32),
	email varchar2(32),
	birthday date)partition by range(sno)(
partition part_0 values less than ('1000160000'),
partition part_1 values less than ('2000160000'),
partition part_2 values less than ('3000160000'),
partition part_3 values less than ('4000160000'),
partition part_4 values less than ('5000160000'),
partition part_5 values less than ('6000160000'),
partition part_6 values less than ('7000160000'),
partition part_7 values less than ('8000160000'),
partition part_8 values less than ('9000160000'),
partition part_9 values less than (maxvalue));
insert into t_student_j432(sno,sname, sex, tel, email, birthday) select X.sno,A.sname, B.sex, B.tel, B.email, B.birthday from (select rownum rownum_X,sno from t_student_sno_ab_j432) X, (select rownum rownum_A,sname from t_stu_name_j432) A, (select rownum rownum_B,sex,tel,email,birthday from t_stud_other_information_j432) B where rownum_A = rownum_B and rownum_A = rownum_X;
--为学生表添加其他约束
alter table t_student_j432 add constraint ck_student_sex check(sex in('男','女','其它'));
alter table t_student_j432 add constraint ck_student_email check(email like '%@%.%');
alter table t_student_j432 add constraint ck_student_birthday check(birthday>=to_date('19940101','yyyymmdd') and birthday<=to_date('19990731','yyyymmdd'));

exec p_record_time_j432('有分区,按学号首位ID统计人数的开始时间')
select count(*) from t_student_j432 where sno like '5%';
exec p_record_time_j432('有分区,按学号首位ID统计人数的结束时间')

exec p_record_time_j432('有分区,按专业统计人数的开始时间')
select count(*) from t_student_j432 where sno like '______01%';
exec p_record_time_j432('有分区,按专业统计人数的结束时间')

col things format a64;
col time format a32;
select * from t_record_time_j432 order by time;
spool off;

猜你喜欢

转载自blog.csdn.net/y_dd6011/article/details/86370955