How to manually implement TryInsert and InsertOrUpdate

In day-to-day development we sometimes need customized insert behavior. For example: insert a record only if it does not already exist; or insert it when absent and update it otherwise. We call the former TryInsert and the latter InsertOrUpdate (also known as upsert). Most ORM frameworks ship with such functions, but when you need to bulk-insert data the built-in versions are often not good enough. Let us implement TryInsert and InsertOrUpdate by hand, from the SQL side.

The two currently popular open-source RDBMSs lag behind the SQL standard here, and the earlier standards had no syntax for this at all, so we split the discussion into a MySQL part and a Postgres part and use each system's own dialect to solve the two problems above.

The MySQL part

How it works

insert ignore into

If the insert would fail with an error (duplicate primary key or unique key), the error is downgraded to a warning and the affected-row count is 0. This can be used to implement TryInsert().
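MySQL's INSERT IGNORE is hard to demonstrate portably, but SQLite's INSERT OR IGNORE has the same ignore-on-duplicate semantics, so here is a quick sketch of the behavior (an assumed analogy using SQLite, not MySQL itself; table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT)")

cur = conn.execute("INSERT OR IGNORE INTO user VALUES (1, 'alice')")
print(cur.rowcount)  # 1: the row was inserted

cur = conn.execute("INSERT OR IGNORE INTO user VALUES (1, 'bob')")
print(cur.rowcount)  # 0: duplicate key, the insert is silently skipped
```

The second statement neither raises nor overwrites: the existing row for key 1 is left untouched, which is exactly the TryInsert contract.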

replace into

replace shares essentially the same syntax as insert. It is a MySQL extension and the official InsertOrUpdate; the basic logic of a replace statement is as follows:

ok := Insert()
if !ok {
  if duplicate-key {  // on a duplicate key, delete the old row and insert again
    Delete()
    Insert()
  }
}

From this we can see how a replace statement reports affected rows: a plain insert affects 1 row; an update (delete, then insert) affects 2 rows.
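The delete-then-insert behavior has a side effect worth knowing: any column you leave out of the REPLACE is reset to its default, because the old row is gone. SQLite's INSERT OR REPLACE works the same way, so a sketch with it (again an analogy, not MySQL; names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT, address TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'alice', 'Paris')")

# REPLACE deletes the conflicting row first, so the old address is lost, not preserved
conn.execute("INSERT OR REPLACE INTO user (user_id, username) VALUES (1, 'bob')")
row = conn.execute("SELECT username, address FROM user WHERE user_id = 1").fetchone()
print(row)  # ('bob', None)
```

If you need unmentioned columns to survive, REPLACE is the wrong tool; an on-duplicate/on-conflict update is.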

Insert into ... on duplicate key update

This is also a MySQL extension. The logic of ... on duplicate key update is almost the same as replace; the main differences are that a real update modifies the row in place rather than deleting and re-inserting it, and that if the inserted values equal the existing ones the affected-row count is 0 by default, i.e. identical old and new values are treated as a no-op.

Sample code

The following is an example in Go:

type User struct {
  UserID   int64  `gorm:"column:user_id"`
  Username string `gorm:"column:username"`
  Password string `gorm:"column:password"`
  Address  string `gorm:"column:address"`
}

func BulkTryInsert(data []*User) error {
  str := make([]string, 0, len(data))
  param := make([]interface{}, 0, len(data)*4) // 4 columns per row
  for _, d := range data {
    str = append(str, "(?,?,?,?)")
    param = append(param, d.UserID, d.Username, d.Password, d.Address)
  }
  stmt := fmt.Sprintf("INSERT IGNORE INTO table_name(user_id,username,password,address) VALUES %s", strings.Join(str, ","))
  return DB.Exec(stmt, param...).Error
}

func BulkUpsert(data []*User) error {
  str := make([]string, 0, len(data))
  param := make([]interface{}, 0, len(data)*4) // 4 columns per row
  for _, d := range data {
    str = append(str, "(?,?,?,?)")
    param = append(param, d.UserID, d.Username, d.Password, d.Address)
  }
  stmt := fmt.Sprintf("REPLACE INTO table_name(user_id,username,password,address) VALUES %s", strings.Join(str, ",")) // only this SQL statement differs from BulkTryInsert
  return DB.Exec(stmt, param...).Error
}

The Postgres part

How it works

Insert into ... on conflict (...) do nothing

on conflict must name the key that can conflict, such as a primary key or unique constraint. The SQL means exactly what it says: when the given key conflicts with an existing row, do nothing. That is TryInsert.
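SQLite has adopted this Postgres-style upsert clause (since SQLite 3.24), so the semantics can be sketched without a Postgres server (the table and data below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER, address TEXT, username TEXT, "
             "UNIQUE (user_id, address))")
conn.execute("INSERT INTO user VALUES (1, 'Paris', 'alice')")

# (user_id, address) conflicts with the existing row: the statement is a silent no-op
cur = conn.execute("INSERT INTO user VALUES (1, 'Paris', 'bob') "
                   "ON CONFLICT (user_id, address) DO NOTHING")
print(cur.rowcount)  # 0
```

Note that the conflict target must match a unique index or constraint on the table, otherwise the statement is rejected.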

Insert into ... on conflict (...) do update set (...)

This SQL is more involved. On the surface the Postgres syntax allows more freedom than MySQL's, but in practice it is verbose and clumsy, not as pragmatic as MySQL: set requires you to spell out, column by column, which properties to update on conflict, which is rather unfriendly. It is written as follows, where EXCLUDED refers to the row that was proposed for insertion:

INSERT INTO ... ON CONFLICT (user_id, address) DO UPDATE SET password=EXCLUDED.password, username=EXCLUDED.username
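SQLite understands this clause too, including the EXCLUDED pseudo-table (written lower-case as excluded), so here is a runnable sketch of the update path (a SQLite analogy with made-up names, not Postgres itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER, address TEXT, username TEXT, password TEXT, "
             "UNIQUE (user_id, address))")
conn.execute("INSERT INTO user VALUES (1, 'Paris', 'alice', 'old-pw')")

# On conflict, overwrite only the listed columns with the values we tried to insert
conn.execute("INSERT INTO user VALUES (1, 'Paris', 'bob', 'new-pw') "
             "ON CONFLICT (user_id, address) DO UPDATE SET "
             "password = excluded.password, username = excluded.username")
row = conn.execute("SELECT username, password FROM user WHERE user_id = 1").fetchone()
print(row)  # ('bob', 'new-pw')
```

Unlike REPLACE, the existing row is updated in place, so any column not listed in the SET clause keeps its old value.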

Sample code

This time let us imagine a practical scenario. Python is often used for scientific computing, and pandas is our favorite computing package. Its io layer provides convenient one-call functions for reading and writing files and database tables, such as to_sql for writing to a database. But to_sql has its limits: it can only do plain inserts, so it is powerless for TryInsert and for upserting into a non-empty table. For now we can only implement these by hand.

Per the analysis above, every such table needs a UniqueConstraint (or primary key) defined so that this syntax can be used. An example is given below:

# using sqlalchemy
Base = declarative_base()

# split a list into consecutive chunks of size n
def chunks(a, n):
    return [a[i:i + n] for i in range(0, len(a), n)]

class DBUser(Base):
  __tablename__ = 'user'  # at least one of UniqueConstraint and PrimaryKey is required
  __table_args__ = (UniqueConstraint('user_id', 'address'),
                    {'schema': 'db'})
  id = Column(Integer, primary_key=True)  # auto-generated surrogate key; SQLAlchemy needs a primary key
  user_id = Column(BigInteger)
  username = Column(String(200))
  password = Column(String(200))
  address = Column(String(200))

  def dtype(self):  # the dtype mapping that pandas needs
    d = {c.name: c.type for c in self.__table__.c}
    if 'id' in d:
      del d['id']  # id is auto-generated; drop it from the dtype handed to pandas
    return d

  def fullname(self):
    return self.__table_args__[-1]['schema'] + '.' + self.__tablename__

  # If DBUser also exposed its unique-constraint column list, the two functions
  # below could be made fully generic; they are only meant as an illustration.
  def bulk_upsert(self, engine, data):
    col = list(self.dtype().keys())
    col_str = '(' + ','.join(col) + ')'
    update_col = ['{0}=EXCLUDED.{0}'.format(c) for c in col]
    value_str = []
    value_args = []
    for d in data:
      value_str.append('(' + ','.join(['%s'] * len(col)) + ')')
      for k in col:
        value_args.append(d[k])

    stmt = ('insert into ' + self.fullname() + ' ' + col_str + ' values ' +
            ','.join(value_str) +
            ' on conflict (user_id, address) do update set ' + ','.join(update_col))
    engine.execute(stmt, value_args)

  def bulk_upsert_chunk(self, engine, data, n=1000):
    for a in chunks(data, n):
      self.bulk_upsert(engine, a)

Origin: www.cnblogs.com/ripley/p/12045098.html