Big Data Miscellaneous

3.1 MongoDB: sort

db.getCollection('test_zhouketao').find({}).sort({'_id': 1})

Set the incremental condition: {$and:[{'daas.lastUpdateTime' :{$gte : '${begin_date}'}},{'daas.lastUpdateTime' :{$lt : '${end_date}'}}]}

 

3.2 Checking microservice logs inside the container

docker ps | grep adapter

docker exec -it [docker_id] bash

/usr/local/tomcat/logs

/data/admatmp

Command to copy files from the container to the server: docker cp [docker_id]:source_path dst_path

To check why an online publishing task failed: tail -f zte-itp-dcp-adapter.log | grep 'edw_test_quote_adma_0805'

3.3 SparkSQL commands executed on the server

Spark environment server address (test tasks in the test environment): 10.5.21.133, root / Zte_133_it. To enter the SparkSQL environment: 1) sparksql; 2) use zxvmax; 3) show create table ods_alm_test

3.4 HDFS commands executed on the server

HDFS environment (test tasks in the test environment): 10.5.21.133. Command to view a path: hdfs dfs -ls /

3.5 Kafka log sample

zte-itp-dcp-adapter_app INFO 10237221 10237221 - 192.169.213.147 10.31.6.126 Apache-HttpClient/4.5.2 (Java/1.8.0_191) d.zte.com.cn /zte-itp-dcp-adapter/task/assign {} 192.169.123.116 POST 2019-08-29 12:32:57 +0800 ok- - - - - HTTP/1.1 - - - - - zte-itp-dcp-datamodeling zte-itp-dcp-adapter TaskCommitService - - - - zte-itp-dcp-datamodeling192.169.65.235^1566915850240^218278 - 8913088503369124154 -5415999034902090158 {"X-Emp-No":"10237221"} - zte-itp-dcp-adapter {"virtualTaskSet":[{"virtualTaskName":"edw_intrmgmt_qm","errCode":"0131","errInfo":"table(zxvmax.edw_intrmgmt_qm_optical_binding_relation)not found","taskSet":[{"taskName":"edw_intrmgmt_qm_power_ic_delivery_info_qy_temp2","id":0,"errCode":"0131","errInfo":"table(zxvmax.edw_intrmgmt_qm_optical_binding_relation)not found"}]}],"taskPreCheckRsp":""} http-nio-8080-exec-10 zte-itp-dcp-adapter/ 358 com.zte.itp.dcp.adapter.task.service.TaskCommitService

 

 com.zte.itp.dcp.adapter.callback.service.CallbackServiceImpl [79] -| send Kafka msg,key:task_assign_result_msg,msg:{"processId":"e3135bbbc38c4fb495928e9ee55e1564","taskAssignResults":[{"isSuccess":"true","taskId":"367017264220766208","taskList":[{"errInfo":"","insID":"4601","isSuccess":"true","taskName":"ods_alm_liuyue_test_0905_4_liuyue_res1_ztetmp(2019-12-02 07:15:00)"},{"errInfo":"","insID":"4602","isSuccess":"true","taskName":"ods_alm_liuyue_test_0905_4_zte02"}],"taskVerCode":"V1.0.6"}]}

 

3.6 Oracle database incremental extraction conditions

LAST_UPDATED_DATE>=to_date($[lastexcutedate],'YYYY-MM-DD HH24:mi:SS')

sync_date >= to_date(substr($[lastexcutedate]-5,1,10),'yyyy-MM-DD') and sync_date < sysdate and to_char(sync_date,'yyyy-MM-dd')>='2020-01-20'

3.6.1 MySQL time conversion functions

select STR_TO_DATE(update_date,'%Y-%m-%d %H:%i:%s') as my_date,DATE_FORMAT(createdate,'%Y-%m-%d %H:%i:%s') from liuyue_test_hour;

update_date >= STR_TO_DATE(substr($[lastexcutedate]-5,1,10),'%Y-%m-%d %H:%i:%s') and update_date < sysdate() and date_format(update_date,'%Y-%m-%d')>='2020-01-20'

3.7 Querying the log when an adma task fails to run

bundleresult

3.8 Internal and external tables

A table created without the EXTERNAL keyword is an internal table (managed table); a table created with the EXTERNAL keyword and a storage path outside the Hive data warehouse (e.g. HDFS:/namespace/spark/) is an external table;

the differences:

1) internal table data is managed by Hive itself, while external table data is managed by HDFS;

2) internal table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), while the storage location of external table data is set by the user;

3) dropping an internal table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are not deleted;

4) modifications to an internal table are synchronized to the metadata directly, while after modifying the structure or partitions of an external table, a repair is needed (MSCK REPAIR TABLE table_name;)

3.9 View the CREATE TABLE statement

show create table zxvmax.edw_dim_employee_cs

3.10 Dimension table SQL explained:

tmp01: (1) all data from the ods table; (2) surrogate keys are generated here only to satisfy the table structure, and are regenerated in tmp03

tmp02: (1) records whose valid flag is set to N; (2) rows whose slowly-changing columns have changed; (3) rows that have been deleted from the ods table

tmp03: (1) all records whose valid flag is not N; records whose last-update time is blank have it set to the current time; (2) new records appended for slow changes (all surrogate keys are regenerated here; new records are generated for slowly-changed rows and for new natural keys);

The data of tmp02 + tmp03 is inserted into the logical table. (Wangliang Wei)
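The tmp02/tmp03 flow above can be sketched in miniature. This is only an illustrative sketch: the dim/ods tables, their columns (sk, nk, attr, valid), and the data are all made up, and sqlite3 stands in for SparkSQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Current dimension table: surrogate key, natural key, tracked attribute, valid flag
cur.execute("CREATE TABLE dim (sk INTEGER, nk TEXT, attr TEXT, valid TEXT)")
cur.executemany("INSERT INTO dim VALUES (?,?,?,?)", [
    (1, "a", "x", "Y"),   # unchanged in ods
    (2, "b", "x", "Y"),   # changed in ods
    (3, "c", "x", "Y"),   # deleted from ods
])

# Today's full ods snapshot
cur.execute("CREATE TABLE ods (nk TEXT, attr TEXT)")
cur.executemany("INSERT INTO ods VALUES (?,?)", [("a", "x"), ("b", "y"), ("d", "x")])

# tmp02: old versions to expire -- rows whose tracked column changed or whose
# natural key disappeared from ods get valid flag N
cur.execute("""
CREATE TABLE tmp02 AS
SELECT d.sk, d.nk, d.attr, 'N' AS valid
FROM dim d LEFT JOIN ods o ON d.nk = o.nk
WHERE o.nk IS NULL OR d.attr <> o.attr
""")

# tmp03: current versions (including new versions of changed rows and new
# natural keys); all surrogate keys are regenerated here
cur.execute("""
CREATE TABLE tmp03 AS
SELECT ROW_NUMBER() OVER (ORDER BY o.nk) AS sk, o.nk, o.attr, 'Y' AS valid
FROM ods o
""")

# New dimension = expired history (tmp02) + current versions (tmp03)
rows = cur.execute("""
SELECT nk, attr, valid FROM tmp02
UNION ALL
SELECT nk, attr, valid FROM tmp03
ORDER BY nk, valid
""").fetchall()
print(rows)
```

Row b keeps an expired N version alongside its new Y version, row c survives only as an expired version, and row d appears only as a new Y version.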

3.10 Incremental fact zipper-table SQL explained:

Each day the zipper table sets the records that arrive again from ods to N, and saves the day's full data from ods as Y, with the start time set to the current time.

tmp1: (1) all data from the ods table; (2) start time is the current time, end time is empty

tmp2: the end time of the original edw records is set to the start time from tmp1, and the end time of the ods records is NULL (for all data currently active in ods, the originally valid records are set to invalid)

tmp3: records from the original table and tmp2 whose end time is not NULL get valid flag N; records in tmp2 whose end time is NULL are set to Y
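A minimal runnable sketch of the zipper logic, assuming a hypothetical zip table with start_time/end_time/flag columns (sqlite3 stands in for SparkSQL; table names and data are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

NOW = "2020-01-21 00:00:00"  # load time, used as the new start time

# Zipper table: open records have end_time NULL and valid flag Y
cur.execute("CREATE TABLE zip (id TEXT, val TEXT, start_time TEXT, end_time TEXT, flag TEXT)")
cur.executemany("INSERT INTO zip VALUES (?,?,?,?,?)", [
    ("1", "x", "2020-01-20 00:00:00", None, "Y"),
    ("2", "x", "2020-01-20 00:00:00", None, "Y"),
])

# Today's full ods extract (tmp1: start time = NOW, end time empty)
cur.execute("CREATE TABLE ods (id TEXT, val TEXT)")
cur.executemany("INSERT INTO ods VALUES (?,?)", [("1", "y"), ("2", "x")])

# tmp2: close original records whose id arrives again from ods --
# their end time becomes tmp1's start time
cur.execute("UPDATE zip SET end_time = ? WHERE id IN (SELECT id FROM ods)", (NOW,))

# tmp3: a non-NULL end time means the record is expired (flag N) ...
cur.execute("UPDATE zip SET flag = 'N' WHERE end_time IS NOT NULL")

# ... and the day's full ods data is inserted as open Y records
cur.executemany("INSERT INTO zip VALUES (?, ?, ?, NULL, 'Y')",
                [(i, v, NOW) for i, v in cur.execute("SELECT id, val FROM ods").fetchall()])

rows = cur.execute("SELECT id, val, flag, end_time FROM zip ORDER BY id, flag").fetchall()
print(rows)
```

Each id ends up with a closed N record (end time = today's start time) plus an open Y record, even when the value did not change, which matches the "set reappearing ods records to N, save the day's full data as Y" rule above.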

 

3.11 Data integration incremental SQL explained:

Part I: updated records

Part II: new records

Part III: unchanged records

3.12 Data integration incremental SQL without partition-drift handling explained:

Case 1: the updated data contains only new partition values; the updates do not overlap the original partitions, so the original partitions are not rewritten.

Part I: updated records

Part II: new records

Part III: unchanged records (the data stays in its original partition, including rows that have drifted to another partition)

Case 2: the updated data contains new partition values and overlaps the original partitions, so the original partitions are rewritten.

Part I: updated records

Part II: new records

Part III: rows found as duplicates across partitions whose ids were not updated in the original table (i.e. the data itself did not change, but its partition drifted)
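A minimal sketch of detecting drifted rows and then rewriting the affected partitions so no duplicate survives. The full_t/inc tables and the pt partition column are hypothetical, and a plain sqlite3 table stands in for the real partitioned storage:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Existing table, "partitioned" by pt
cur.execute("CREATE TABLE full_t (id TEXT, val TEXT, pt TEXT)")
cur.executemany("INSERT INTO full_t VALUES (?,?,?)",
                [("1", "x", "2020-01-19"), ("2", "x", "2020-01-19")])

# Incremental batch: id 1 is updated and drifts to partition 2020-01-20
cur.execute("CREATE TABLE inc (id TEXT, val TEXT, pt TEXT)")
cur.executemany("INSERT INTO inc VALUES (?,?,?)", [("1", "y", "2020-01-20")])

# Drifted rows: same id, different partition value -- without drift handling
# these would be duplicated across two partitions
drifted = cur.execute("""
SELECT f.id, f.pt AS old_pt, i.pt AS new_pt
FROM full_t f JOIN inc i ON f.id = i.id
WHERE f.pt <> i.pt
""").fetchall()
print(drifted)  # the old copy of id 1 must be dropped from partition 2020-01-19

# Drift handling: drop the old copies of incoming ids (rewriting their old
# partitions), then add the incremental data in its new partition
cur.execute("DELETE FROM full_t WHERE id IN (SELECT id FROM inc)")
cur.execute("INSERT INTO full_t SELECT * FROM inc")
rows = cur.execute("SELECT id, val, pt FROM full_t ORDER BY id").fetchall()
print(rows)
```

After the rewrite, id 1 exists only in its new partition, which is the behaviour of the drift-handling SQL in 3.13; skipping the DELETE step reproduces the duplicate described in case 2 above.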

3.13 Data integration incremental SQL with partition-drift handling explained:

Part I: updated records

Part II: new records

Part III: unchanged records

3.14 left anti join and left semi join

select * from table1 t1 left anti join table2 t2 on t1.name = t2.name --- left anti join takes the part of table 1 with the intersection removed

select * from table1 t1 left semi join table2 t2 on t1.name = t2.name --- left semi join takes the intersection of table 1 and table 2

In short, left anti join and left semi join take only the fields of the primary (left) table and cannot reference the fields of table 2. So the following cannot be written:

select t2.name from table1 t1 left anti join table2 t2 on t1.name = t2.name
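The same semantics can be checked with standard SQL, since left anti/semi join are equivalent to NOT EXISTS / EXISTS subqueries. A runnable sketch with made-up data (sqlite3 has no anti/semi join syntax, so the equivalents are used):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE table1 (name TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("b",), ("c",)])
cur.execute("CREATE TABLE table2 (name TEXT)")
cur.executemany("INSERT INTO table2 VALUES (?)", [("b",), ("c",), ("d",)])

# left anti join: rows of table1 with NO match in table2
anti = cur.execute("""
SELECT t1.name FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.name = t1.name)
ORDER BY t1.name
""").fetchall()

# left semi join: rows of table1 that DO have a match in table2
semi = cur.execute("""
SELECT t1.name FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.name = t1.name)
ORDER BY t1.name
""").fetchall()
print(anti, semi)
```

Both result sets carry only table1's columns, which is exactly why selecting t2.name from an anti/semi join is invalid.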

3.15 Code completion tool: TabNine

edw_engsrv_dpm_site_bom_satisfaction --python task --- 251 environment

3.16 Alibaba Cloud UI test automation: F2etest

3.17 Kimball dimensional modeling theory

3.18 PostgreSQL data source error

java.lang.NumberFormatException: For input string: "2.97664e+06"

Execute the following statement in the database; if the execution result is not empty, it indicates a user-permission problem, and authorization needs to be added:

select t1.table_schema as owner,
       t1.table_name,
       cast(obj_description(t2.relfilenode, 'pg_class') as varchar) as table_comment,
       coalesce(t2.reltuples, 0) as table_rows,
       concat(t1.table_schema, '.', t1.table_name) as schema_table_name
from information_schema.TABLES t1, pg_class t2
where t1.table_name = t2.relname

Extracting fields from the table:

select t3.column_name,
       format_type(t2.atttypid, t2.atttypmod) as data_type,
       col_description(t2.attrelid, t2.attnum) as column_comment,
       '0' as data_length
from pg_class t1, pg_attribute t2, information_schema.COLUMNS t3
where t1.relname = t3.table_name
  and t2.attname = t3.column_name
  and t2.attrelid = t1.oid
  and t2.attnum > 0
  and t3.table_schema = ?
  and t3.table_name = ?

Spark_pg execution error:

select concat(b."NAME", '.', a."TBL_NAME") as schema_table_name,
       a."OWNER" as owner,
       a."TBL_NAME" as table_name,
       0 as table_rows
from "TBLS" a, "DBS" b
where a."DB_ID" = b."DB_ID"

3.19 PostgreSQL: query a user's table permissions in a database

select * from INFORMATION_SCHEMA.role_table_grants where grantee='postgres' and table_schema ='postgres';

3.20 Alibaba DataX documentation

https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md

 


Origin www.cnblogs.com/yahutiaotiao/p/12631790.html