Write your own database: basic preparations for SQL query processing

Anyone who has used a relational database knows SQL, the language handled by the database's query module. A SQL statement describes what data the user wants and how that data should be processed. SQL is grounded in relational algebra, which rests on three basic operations: select, project, and product. Each operation takes one or more tables as input. Select outputs a table with the same columns as its input but with some rows removed; project outputs the input table with some columns removed; product outputs every possible combination of records from its input tables.
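To make the three operations concrete before we build the real versions, here is a small in-memory sketch. Row, Table, Select, Project and Product are hypothetical names used only for this illustration and are not part of the query package built below. It shows how a join query such as `SELECT SName, DName FROM STUDENT, DEPT WHERE MajorId = DId` (an example query, not one from the article) can be expressed as a select over a product, followed by a project.

```go
package main

import "fmt"

// Row maps column names to values and a Table is a list of rows.
// This in-memory model is a simplification for illustration only.
type Row map[string]string
type Table []Row

// Select keeps the rows that satisfy pred; the columns are unchanged.
func Select(t Table, pred func(Row) bool) Table {
	var out Table
	for _, r := range t {
		if pred(r) {
			out = append(out, r)
		}
	}
	return out
}

// Project keeps every row but only the listed columns.
func Project(t Table, cols []string) Table {
	out := make(Table, 0, len(t))
	for _, r := range t {
		nr := Row{}
		for _, c := range cols {
			nr[c] = r[c]
		}
		out = append(out, nr)
	}
	return out
}

// Product pairs every row of a with every row of b.
// It assumes the two tables have disjoint column names.
func Product(a, b Table) Table {
	out := make(Table, 0, len(a)*len(b))
	for _, ra := range a {
		for _, rb := range b {
			nr := Row{}
			for k, v := range ra {
				nr[k] = v
			}
			for k, v := range rb {
				nr[k] = v
			}
			out = append(out, nr)
		}
	}
	return out
}

func main() {
	student := Table{
		{"SName": "joe", "MajorId": "10"},
		{"SName": "amy", "MajorId": "20"},
	}
	dept := Table{
		{"DId": "10", "DName": "compsci"},
		{"DId": "20", "DName": "math"},
	}
	// SELECT SName, DName FROM STUDENT, DEPT WHERE MajorId = DId
	joined := Select(Product(student, dept), func(r Row) bool { return r["MajorId"] == r["DId"] })
	for _, r := range Project(joined, []string{"SName", "DName"}) {
		fmt.Println(r["SName"], r["DName"])
	}
}
```

Product produces all 2 × 2 = 4 combinations, Select keeps the two whose MajorId matches DId, and Project drops every column except SName and DName.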

We won't go deeper into relational algebra for now; its details will emerge as we implement the code. From a code point of view, the results of all three operations can be described by one unified interface. Create a folder named query under the project's root directory, add a file named interface.go inside it, and give it the following content:

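```go
package query

import (
	"record_manager" // provides RID, used by UpdateScan
)

// Scan is the common interface for the results of select, project and product.
type Scan interface {
	BeforeFirst()
	Next() bool
	GetInt(fldName string) int
	GetString(fldName string) string
	// This package defines its own Constant type (see constant.go).
	GetVal(fldName string) *Constant
	HasField(fldName string) bool
	Close()
}
```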
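Judging from its method set, a Scan is meant to be consumed like a cursor: call BeforeFirst, loop while Next returns true, read fields of the current record, and Close when done. The standalone sketch below shows that protocol with a hypothetical in-memory memScan implementing a subset of Scan (GetVal is left out because it needs the Constant type defined later). Neither memScan nor sumInts is part of the article's code.

```go
package main

import "fmt"

// Scan mirrors a subset of the query.Scan interface above.
type Scan interface {
	BeforeFirst()
	Next() bool
	GetInt(fldName string) int
	GetString(fldName string) string
	HasField(fldName string) bool
	Close()
}

// memScan is a hypothetical in-memory Scan used only for illustration.
// pos == -1 means "before the first record".
type memScan struct {
	rows []map[string]any
	pos  int
}

func newMemScan(rows []map[string]any) *memScan { return &memScan{rows: rows, pos: -1} }

func (m *memScan) BeforeFirst() { m.pos = -1 }

// Next advances the cursor and reports whether it now points at a record.
func (m *memScan) Next() bool {
	if m.pos < len(m.rows) {
		m.pos++
	}
	return m.pos < len(m.rows)
}

func (m *memScan) GetInt(f string) int       { return m.rows[m.pos][f].(int) }
func (m *memScan) GetString(f string) string { return m.rows[m.pos][f].(string) }

func (m *memScan) HasField(f string) bool {
	if len(m.rows) == 0 {
		return false
	}
	_, ok := m.rows[0][f]
	return ok
}

func (m *memScan) Close() {}

// sumInts consumes any Scan the standard way:
// rewind, loop on Next, read the current record, then Close.
func sumInts(s Scan, fld string) int {
	defer s.Close()
	total := 0
	s.BeforeFirst()
	for s.Next() {
		total += s.GetInt(fld)
	}
	return total
}

func main() {
	s := newMemScan([]map[string]any{
		{"SName": "joe", "GradYear": 2021},
		{"SName": "amy", "GradYear": 2020},
	})
	fmt.Println(sumInts(s, "GradYear")) // 4041
}
```

Because sumInts depends only on the interface, the same loop works unchanged for a table scan, a selection, a projection or a product.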

In the previous chapter we implemented TableScan. The Scan definition above differs little from TableScan, because both operate on tables. A Scan object is an abstract representation of the result of executing a SQL statement. As anyone who develops database applications knows, a query result may correspond directly to records stored in a table, or it may be a view: a representation of records after some processing, with no stored counterpart on disk. Consequently, some query results can be modified and others cannot; for example, the result of a select can be modified.

We therefore define a second interface, UpdateScan, which builds on Scan and adds methods for modifying the result of a SQL statement:

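```go
// UpdateScan extends Scan with methods that modify the underlying records.
type UpdateScan interface {
	Scan
	SetInt(fldName string, val int)
	SetString(fldName string, val string)
	// Uses this package's Constant, the same type GetVal returns.
	SetVal(fldName string, val *Constant)
	Insert()
	Delete()
	GetRid() *record_manager.RID
	MoveToRid(rid *record_manager.RID)
}
```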

Implementing these interfaces requires a few more concepts. Since the scans carry out the operations a SQL statement specifies, we first need objects that represent the parts of a statement. The first is Predicate, which represents the query condition that follows the where keyword. Suppose a query contains the following clause:

where (GradYear > 2021 or MOD(GradYear, 4) = 0) and MajorId = DId

To represent the condition "(GradYear > 2021 or MOD(GradYear, 4) = 0) and MajorId = DId" in code, we create a Predicate object. This is where things get a little tricky, because the analysis touches on compiler theory. First, the keywords or and and split the condition into smaller components: GradYear > 2021, MOD(GradYear, 4) = 0, and MajorId = DId. Each of these components is represented by a Term.

Now look at GradYear > 2021. The operator > splits it into two parts: GradYear on the left and 2021 on the right. Likewise, the = sign splits MOD(GradYear, 4) = 0 into a left part and a right part. Each of these parts is represented by an Expression.

Finally, we decompose the expressions themselves. MOD(GradYear, 4) breaks down into MOD, GradYear and 4: MOD names an operation, GradYear is a field name, and 4 is a constant, represented by a Constant.
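The decomposition above forms a tree: and/or nodes at the top, comparison terms below them, and operations, field names and constants at the leaves. The standalone sketch below builds that tree for the example clause and evaluates it against one record. Record, Expr, Const, Field, Mod, Cond, Term, And and Or are hypothetical types for this illustration only; in particular, this sketch's Term supports both > and =.

```go
package main

import "fmt"

// Record stands in for the current row of a scan (hypothetical).
type Record map[string]int

// Expr is an expression node: a constant, a field name, or an operation.
type Expr interface{ Eval(r Record) int }

type Const int
type Field string
type Mod struct{ A, B Expr } // the MOD(x, y) operation

func (c Const) Eval(Record) int   { return int(c) }
func (f Field) Eval(r Record) int { return r[string(f)] }
func (m Mod) Eval(r Record) int   { return m.A.Eval(r) % m.B.Eval(r) }

// Cond is a boolean node: a comparison term, or an and/or of conditions.
type Cond interface{ Holds(r Record) bool }

type Term struct {
	Op       string // "=" or ">"
	Lhs, Rhs Expr
}
type And struct{ L, R Cond }
type Or struct{ L, R Cond }

func (t Term) Holds(r Record) bool {
	l, rv := t.Lhs.Eval(r), t.Rhs.Eval(r)
	switch t.Op {
	case "=":
		return l == rv
	case ">":
		return l > rv
	}
	panic("unsupported operator " + t.Op)
}
func (a And) Holds(r Record) bool { return a.L.Holds(r) && a.R.Holds(r) }
func (o Or) Holds(r Record) bool  { return o.L.Holds(r) || o.R.Holds(r) }

// where builds (GradYear > 2021 or MOD(GradYear, 4) = 0) and MajorId = DId.
func where() Cond {
	return And{
		Or{
			Term{">", Field("GradYear"), Const(2021)},
			Term{"=", Mod{Field("GradYear"), Const(4)}, Const(0)},
		},
		Term{"=", Field("MajorId"), Field("DId")},
	}
}

func main() {
	w := where()
	fmt.Println(w.Holds(Record{"GradYear": 2020, "MajorId": 10, "DId": 10})) // true: 2020 % 4 == 0
	fmt.Println(w.Holds(Record{"GradYear": 2019, "MajorId": 10, "DId": 10})) // false
}
```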

For a better understanding, consider a specific predicate:

SName = 'joe' and MajorId = DId

Let's use a piece of pseudocode to see how to construct a Predicate object:

```go
// Build the term SName = 'joe'
lhs1 := NewExpressionWithString("SName")
joe := "joe"
c := NewConstantWithString(&joe)
rhs1 := NewExpressionWithConstant(c)
t1 := NewTerm(lhs1, rhs1)
// Build the term MajorId = DId
lhs2 := NewExpressionWithString("MajorId")
rhs2 := NewExpressionWithString("DId")
t2 := NewTerm(lhs2, rhs2)

// Combine the two terms with "and"
pred1 := NewPredicate(t1)
pred2 := NewPredicate(t2)
pred1.ConjoinWith(pred2)
```

Let's take a look at the corresponding code implementation of Constant, Expression, and Term. First, create the constant.go file. The implementation code is as follows:

```go
package query

import (
	"strconv"
)

type Constant struct {
	ival *int
	sval *string
}

func NewConstantWithInt(ival *int) *Constant {
	return &Constant{
		ival: ival,
		sval: nil,
	}
}

func NewConstantWithString(sval *string) *Constant {
	return &Constant{
		ival: nil,
		sval: sval,
	}
}

func (c *Constant) AsInt() int {
	return *c.ival
}

func (c *Constant) AsString() string {
	return *c.sval
}

func (c *Constant) Equals(obj *Constant) bool {
	if c.ival != nil && obj.ival != nil {
		return *c.ival == *obj.ival
	}

	if c.sval != nil && obj.sval != nil {
		return *c.sval == *obj.sval
	}

	return false
}

func (c *Constant) ToString() string {
	if c.ival != nil {
		return strconv.FormatInt((int64)(*c.ival), 10)
	}

	return *c.sval
}
```
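Because NewConstantWithInt and NewConstantWithString store the pointer they receive, a Constant shares memory with the caller's variable: changing the variable later changes the constant too. The standalone sketch below copies just enough of Constant to show this; newIntConstantCopy is a hypothetical alternative constructor that takes the value and stores a pointer to its own copy.

```go
package main

import "fmt"

// Trimmed copy of Constant, for demonstration only.
type Constant struct {
	ival *int
	sval *string
}

func NewConstantWithInt(ival *int) *Constant { return &Constant{ival: ival} }

func (c *Constant) AsInt() int { return *c.ival }

// newIntConstantCopy is a hypothetical alternative that copies the value,
// so the Constant no longer shares memory with the caller's variable.
func newIntConstantCopy(v int) *Constant { return &Constant{ival: &v} }

func main() {
	x := 2021
	shared := NewConstantWithInt(&x)
	copied := newIntConstantCopy(x)
	x = 1999 // mutate the caller's variable after building both constants
	fmt.Println(shared.AsInt(), copied.AsInt()) // 1999 2021
}
```

Sharing is harmless as long as callers never reuse the variable, but a value-taking constructor removes the hazard entirely.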

Let's take a look at the implementation of Expression and create the file expression.go. The implementation code is as follows:

```go
package query

import (
	"record_manager"
)

// Expression is either a constant or a field name.
type Expression struct {
	val     *Constant
	fldName string
}

func NewExpressionWithConstant(val *Constant) *Expression {
	return &Expression{
		val:     val,
		fldName: "",
	}
}

func NewExpressionWithString(fldName string) *Expression {
	return &Expression{
		val:     nil,
		fldName: fldName,
	}
}

func (e *Expression) IsFieldName() bool {
	return e.fldName != ""
}

func (e *Expression) AsConstant() *Constant {
	return e.val
}

func (e *Expression) AsFieldName() string {
	return e.fldName
}

func (e *Expression) Evaluate(s Scan) *Constant {
	// An expression is either a constant or a field name. For a field name,
	// look up the field's value in the scan's current record.
	if e.val != nil {
		return e.val
	}

	return s.GetVal(e.fldName)
}

// AppliesTo reports whether the expression can be evaluated against sch:
// a constant always can, a field name only if sch contains that field.
func (e *Expression) AppliesTo(sch *record_manager.Schema) bool {
	if e.val != nil {
		return true
	}

	return sch.HasFields(e.fldName)
}

func (e *Expression) ToString() string {
	if e.val != nil {
		return e.val.ToString()
	}

	return e.fldName
}
```

Now for Term. Create a file named term.go with the following code:

```go
package query

import (
	"math"
	"record_manager"
)

// Term compares two expressions for equality: lhs = rhs.
type Term struct {
	lhs *Expression
	rhs *Expression
}

func NewTerm(lhs *Expression, rhs *Expression) *Term {
	return &Term{
		lhs: lhs,
		rhs: rhs,
	}
}

// IsSatisfied reports whether both sides evaluate to equal constants
// for the current record of s.
func (t *Term) IsSatisfied(s Scan) bool {
	lhsVal := t.lhs.Evaluate(s)
	rhsVal := t.rhs.Evaluate(s)
	return rhsVal.Equals(lhsVal)
}

func (t *Term) AppliesTo(sch *record_manager.Schema) bool {
	return t.lhs.AppliesTo(sch) && t.rhs.AppliesTo(sch)
}

// ReductionFactor estimates how much this term shrinks a result.
// Plan is defined later, when we study how SQL statements are parsed and executed.
func (t *Term) ReductionFactor(p Plan) int {
	if t.lhs.IsFieldName() && t.rhs.IsFieldName() {
		lhsName := t.lhs.AsFieldName()
		rhsName := t.rhs.AsFieldName()
		if p.DistinctValues(lhsName) > p.DistinctValues(rhsName) {
			return p.DistinctValues(lhsName)
		}
		return p.DistinctValues(rhsName)
	}

	if t.lhs.IsFieldName() {
		return p.DistinctValues(t.lhs.AsFieldName())
	}

	if t.rhs.IsFieldName() {
		return p.DistinctValues(t.rhs.AsFieldName())
	}

	// Both sides are constants: the term is either always true or always false.
	if t.lhs.AsConstant().Equals(t.rhs.AsConstant()) {
		return 1
	}
	return math.MaxInt
}

// EquatesWithConstant returns c if the term has the form fldName = c, else nil.
func (t *Term) EquatesWithConstant(fldName string) *Constant {
	if t.lhs.IsFieldName() && t.lhs.AsFieldName() == fldName && !t.rhs.IsFieldName() {
		return t.rhs.AsConstant()
	} else if t.rhs.IsFieldName() && t.rhs.AsFieldName() == fldName && !t.lhs.IsFieldName() {
		return t.lhs.AsConstant()
	}
	return nil
}

// EquatesWithField returns F if the term has the form fldName = F, else "".
func (t *Term) EquatesWithField(fldName string) string {
	if t.lhs.IsFieldName() && t.lhs.AsFieldName() == fldName && t.rhs.IsFieldName() {
		return t.rhs.AsFieldName()
	} else if t.rhs.IsFieldName() && t.rhs.AsFieldName() == fldName && t.lhs.IsFieldName() {
		return t.lhs.AsFieldName()
	}

	return ""
}

func (t *Term) ToString() string {
	return t.lhs.ToString() + "=" + t.rhs.ToString()
}
```

The Term code above is hard to follow for now. Its logic depends on material covered later, and it uses a Plan object that we will create when we study SQL parsing and execution. We give the code here first; it will make more sense in retrospect once we have covered that material. Note also that this Term only models equality: IsSatisfied compares its two sides with Equals and ToString joins them with "=", so a comparison such as GradYear > 2021 is not yet supported.

Next comes Predicate, which goes in a file named predicate.go.

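A minimal sketch of what predicate.go could contain follows, assuming, as the ConjoinWith call in the earlier pseudocode suggests, that a Predicate is a conjunction (and) of Terms. NewPredicate, ConjoinWith and IsSatisfied are names used elsewhere in this article; the variadic NewPredicate signature, ToString, and the simplified Scan, rowScan and Term stand-ins (field = constant only, string values) are hypothetical, chosen so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Scan is a minimal stand-in: one current record, values as strings.
type Scan interface {
	GetVal(fldName string) string
}

type rowScan map[string]string

func (r rowScan) GetVal(f string) string { return r[f] }

// Term is a simplified stand-in for term.go's Term: field = constant only.
type Term struct {
	fld, val string
}

func (t *Term) IsSatisfied(s Scan) bool { return s.GetVal(t.fld) == t.val }
func (t *Term) ToString() string        { return t.fld + "=" + t.val }

// Predicate is a conjunction (and) of terms.
type Predicate struct {
	terms []*Term
}

// NewPredicate creates a predicate from zero or more terms (hypothetical variadic signature).
func NewPredicate(terms ...*Term) *Predicate {
	return &Predicate{terms: terms}
}

// ConjoinWith ands the terms of other into p.
func (p *Predicate) ConjoinWith(other *Predicate) {
	p.terms = append(p.terms, other.terms...)
}

// IsSatisfied reports whether every term holds for the current record.
// An empty predicate is always satisfied, like a query with no where clause.
func (p *Predicate) IsSatisfied(s Scan) bool {
	for _, t := range p.terms {
		if !t.IsSatisfied(s) {
			return false
		}
	}
	return true
}

func (p *Predicate) ToString() string {
	parts := make([]string, len(p.terms))
	for i, t := range p.terms {
		parts[i] = t.ToString()
	}
	return strings.Join(parts, " and ")
}

func main() {
	pred := NewPredicate(&Term{"SName", "joe"})
	pred.ConjoinWith(NewPredicate(&Term{"MajorId", "10"}))
	fmt.Println(pred.ToString())                                             // SName=joe and MajorId=10
	fmt.Println(pred.IsSatisfied(rowScan{"SName": "joe", "MajorId": "10"})) // true
	fmt.Println(pred.IsSatisfied(rowScan{"SName": "amy", "MajorId": "10"})) // false
}
```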

With these pieces in place, let's look at the code that processes the result of a select operation. We give a basic implementation here; its logic will be easier to follow once we have covered in detail how SQL is parsed and how the database engine processes the parse result. Create a file named select_scan.go in the query directory with the following code:

```go
package query

import (
	"record_manager"
)

// SelectionScan passes through only the records of the underlying scan
// that satisfy its predicate.
type SelectionScan struct {
	scan UpdateScan
	pred *Predicate
}

// Compile-time check that SelectionScan implements UpdateScan.
var _ UpdateScan = (*SelectionScan)(nil)

func NewSelectionScan(s UpdateScan, pred *Predicate) *SelectionScan {
	return &SelectionScan{
		scan: s,
		pred: pred,
	}
}

func (s *SelectionScan) BeforeFirst() {
	// Delegate to the wrapped scan; calling s.BeforeFirst() here would recurse forever.
	s.scan.BeforeFirst()
}

// Next advances the underlying scan until a record satisfies the predicate.
func (s *SelectionScan) Next() bool {
	for s.scan.Next() {
		if s.pred.IsSatisfied(s) {
			return true
		}
	}

	return false
}

func (s *SelectionScan) GetInt(fldName string) int {
	return s.scan.GetInt(fldName)
}

func (s *SelectionScan) GetString(fldName string) string {
	return s.scan.GetString(fldName)
}

func (s *SelectionScan) GetVal(fldName string) *Constant {
	return s.scan.GetVal(fldName)
}

func (s *SelectionScan) HasField(fldName string) bool {
	return s.scan.HasField(fldName)
}

func (s *SelectionScan) Close() {
	s.scan.Close()
}

func (s *SelectionScan) SetInt(fldName string, val int) {
	s.scan.SetInt(fldName, val)
}

func (s *SelectionScan) SetString(fldName string, val string) {
	s.scan.SetString(fldName, val)
}

func (s *SelectionScan) SetVal(fldName string, val *Constant) {
	s.scan.SetVal(fldName, val)
}

func (s *SelectionScan) Delete() {
	s.scan.Delete()
}

func (s *SelectionScan) Insert() {
	s.scan.Insert()
}

func (s *SelectionScan) GetRid() *record_manager.RID {
	return s.scan.GetRid()
}

func (s *SelectionScan) MoveToRid(rid *record_manager.RID) {
	s.scan.MoveToRid(rid)
}
```
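The heart of SelectionScan is Next, which keeps advancing the wrapped scan until the predicate accepts the current record. The standalone sketch below reproduces that loop over a hypothetical in-memory memScan, with a plain function standing in for *Predicate; selectScan, memScan and the predicate-as-function choice are illustrative assumptions, not the article's types.

```go
package main

import "fmt"

type Scan interface {
	BeforeFirst()
	Next() bool
	GetInt(fldName string) int
}

// memScan is a hypothetical in-memory scan over int-valued records.
type memScan struct {
	rows []map[string]int
	pos  int
}

func (m *memScan) BeforeFirst() { m.pos = -1 }
func (m *memScan) Next() bool {
	if m.pos < len(m.rows) {
		m.pos++
	}
	return m.pos < len(m.rows)
}
func (m *memScan) GetInt(f string) int { return m.rows[m.pos][f] }

// selectScan mirrors SelectionScan; pred plays the role of *Predicate.
type selectScan struct {
	scan Scan
	pred func(Scan) bool
}

// BeforeFirst must delegate to the wrapped scan.
func (s *selectScan) BeforeFirst() { s.scan.BeforeFirst() }

// Next skips records until one satisfies the predicate.
func (s *selectScan) Next() bool {
	for s.scan.Next() {
		if s.pred(s) {
			return true
		}
	}
	return false
}

func (s *selectScan) GetInt(f string) int { return s.scan.GetInt(f) }

func main() {
	base := &memScan{rows: []map[string]int{
		{"GradYear": 2019}, {"GradYear": 2020}, {"GradYear": 2022},
	}, pos: -1}
	// where GradYear > 2019
	sel := &selectScan{scan: base, pred: func(s Scan) bool { return s.GetInt("GradYear") > 2019 }}
	sel.BeforeFirst()
	for sel.Next() {
		fmt.Println(sel.GetInt("GradYear")) // 2020, then 2022
	}
}
```

Because the predicate reads values through the selection scan itself, it sees exactly the record the caller will see.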

Next, create a file named project_scan.go. The ProjectScan type it defines implements the Scan interface:

```go
package query

// ProjectScan exposes only the fields listed in fieldList.
type ProjectScan struct {
	scan      Scan
	fieldList []string
}

// Compile-time check that ProjectScan implements Scan.
var _ Scan = (*ProjectScan)(nil)

func NewProjectScan(s Scan, fieldList []string) *ProjectScan {
	return &ProjectScan{
		scan:      s,
		fieldList: fieldList,
	}
}

func (p *ProjectScan) BeforeFirst() {
	p.scan.BeforeFirst()
}

func (p *ProjectScan) Next() bool {
	return p.scan.Next()
}

// The getters check the projected field list (not the underlying scan) and
// panic on a missing field, because Scan's signatures have no error result.
func (p *ProjectScan) GetInt(fldName string) int {
	if !p.HasField(fldName) {
		panic("ProjectScan: field " + fldName + " not found")
	}
	return p.scan.GetInt(fldName)
}

func (p *ProjectScan) GetString(fldName string) string {
	if !p.HasField(fldName) {
		panic("ProjectScan: field " + fldName + " not found")
	}
	return p.scan.GetString(fldName)
}

func (p *ProjectScan) GetVal(fldName string) *Constant {
	if !p.HasField(fldName) {
		panic("ProjectScan: field " + fldName + " not found")
	}
	return p.scan.GetVal(fldName)
}

func (p *ProjectScan) HasField(fldName string) bool {
	for _, s := range p.fieldList {
		if s == fldName {
			return true
		}
	}

	return false
}

func (p *ProjectScan) Close() {
	p.scan.Close()
}
```
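A small standalone demo of projection: over a record with two fields, a projection onto one of them reports only that field, and asking for the other fails. memScan, projectScan and tryGetInt are hypothetical stand-ins. The sketch's projectScan panics on a hidden field, since a Scan getter has no error result in its signature, and tryGetInt turns that panic back into an error with recover.

```go
package main

import "fmt"

type Scan interface {
	BeforeFirst()
	Next() bool
	GetInt(fldName string) int
	HasField(fldName string) bool
}

// memScan is a hypothetical in-memory scan over int-valued records.
type memScan struct {
	rows []map[string]int
	pos  int
}

func (m *memScan) BeforeFirst() { m.pos = -1 }
func (m *memScan) Next() bool {
	if m.pos < len(m.rows) {
		m.pos++
	}
	return m.pos < len(m.rows)
}
func (m *memScan) GetInt(f string) int { return m.rows[m.pos][f] }
func (m *memScan) HasField(f string) bool {
	if len(m.rows) == 0 {
		return false
	}
	_, ok := m.rows[0][f]
	return ok
}

// projectScan keeps only the fields in fields.
type projectScan struct {
	scan   Scan
	fields []string
}

func (p *projectScan) BeforeFirst() { p.scan.BeforeFirst() }
func (p *projectScan) Next() bool   { return p.scan.Next() }
func (p *projectScan) HasField(f string) bool {
	for _, name := range p.fields {
		if name == f {
			return true
		}
	}
	return false
}

// GetInt panics for a field outside the projection.
func (p *projectScan) GetInt(f string) int {
	if !p.HasField(f) {
		panic("field " + f + " not found")
	}
	return p.scan.GetInt(f)
}

// tryGetInt is a demo helper that turns GetInt's panic into an error.
func tryGetInt(s Scan, f string) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return s.GetInt(f), nil
}

func main() {
	base := &memScan{rows: []map[string]int{{"SId": 1, "GradYear": 2020}}, pos: -1}
	p := &projectScan{scan: base, fields: []string{"SId"}}
	p.BeforeFirst()
	for p.Next() {
		fmt.Println(p.GetInt("SId")) // 1
		_, err := tryGetInt(p, "GradYear")
		fmt.Println(err) // field GradYear not found
	}
}
```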

Finally, the product operation. Create product_scan.go with the following code:
```go
package query

// ProductScan returns every combination of a record from scan1 with a record from scan2.
type ProductScan struct {
	scan1 Scan
	scan2 Scan
}

func NewProductScan(s1 Scan, s2 Scan) *ProductScan {
	p := &ProductScan{
		scan1: s1,
		scan2: s2,
	}

	// Position scan1 on its first record; scan2 is assumed to be freshly
	// opened, i.e. positioned before its first record.
	p.scan1.Next()
	return p
}

func (p *ProductScan) BeforeFirst() {
	p.scan1.BeforeFirst()
	p.scan1.Next()
	p.scan2.BeforeFirst()
}

// Next works like a nested loop: scan1 is the outer loop and scan2 the inner one.
// When scan2 is exhausted, it is rewound and scan1 moves to its next record.
func (p *ProductScan) Next() bool {
	if p.scan2.Next() {
		return true
	}
	p.scan2.BeforeFirst()
	return p.scan2.Next() && p.scan1.Next()
}

func (p *ProductScan) GetInt(fldName string) int {
	if p.scan1.HasField(fldName) {
		return p.scan1.GetInt(fldName)
	}
	return p.scan2.GetInt(fldName)
}

func (p *ProductScan) GetString(fldName string) string {
	if p.scan1.HasField(fldName) {
		return p.scan1.GetString(fldName)
	}
	return p.scan2.GetString(fldName)
}

func (p *ProductScan) GetVal(fldName string) *Constant {
	if p.scan1.HasField(fldName) {
		return p.scan1.GetVal(fldName)
	}
	return p.scan2.GetVal(fldName)
}

func (p *ProductScan) HasField(fldName string) bool {
	return p.scan1.HasField(fldName) || p.scan2.HasField(fldName)
}

func (p *ProductScan) Close() {
	p.scan1.Close()
	p.scan2.Close()
}
```
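To watch the nested-loop behavior of Next, the standalone sketch below copies the ProductScan logic and runs it over two hypothetical in-memory scans with 2 and 3 records. memScan, newMemScan, productScan and newProductScan are illustrative stand-ins; newProductScan calls BeforeFirst, which has the same effect as the constructor above when both scans are freshly opened.

```go
package main

import "fmt"

type Scan interface {
	BeforeFirst()
	Next() bool
	GetInt(fldName string) int
	HasField(fldName string) bool
}

// memScan is a hypothetical in-memory scan over int-valued records.
type memScan struct {
	rows []map[string]int
	pos  int
}

func newMemScan(rows ...map[string]int) *memScan { return &memScan{rows: rows, pos: -1} }
func (m *memScan) BeforeFirst()                  { m.pos = -1 }
func (m *memScan) Next() bool {
	if m.pos < len(m.rows) {
		m.pos++
	}
	return m.pos < len(m.rows)
}
func (m *memScan) GetInt(f string) int { return m.rows[m.pos][f] }
func (m *memScan) HasField(f string) bool {
	if len(m.rows) == 0 {
		return false
	}
	_, ok := m.rows[0][f]
	return ok
}

// productScan copies ProductScan's logic: s1 is the outer loop, s2 the inner.
type productScan struct{ s1, s2 Scan }

func newProductScan(s1, s2 Scan) *productScan {
	p := &productScan{s1, s2}
	p.BeforeFirst()
	return p
}

func (p *productScan) BeforeFirst() {
	p.s1.BeforeFirst()
	p.s1.Next()
	p.s2.BeforeFirst()
}

func (p *productScan) Next() bool {
	if p.s2.Next() {
		return true
	}
	p.s2.BeforeFirst()
	return p.s2.Next() && p.s1.Next()
}

func (p *productScan) GetInt(f string) int {
	if p.s1.HasField(f) {
		return p.s1.GetInt(f)
	}
	return p.s2.GetInt(f)
}

func main() {
	a := newMemScan(map[string]int{"A": 1}, map[string]int{"A": 2})
	b := newMemScan(map[string]int{"B": 10}, map[string]int{"B": 20}, map[string]int{"B": 30})
	p := newProductScan(a, b)
	for p.Next() {
		fmt.Println(p.GetInt("A"), p.GetInt("B")) // 1 10, 1 20, 1 30, 2 10, 2 20, 2 30
	}
}
```

One caveat the trace makes visible: Next returns true whenever scan2 can advance, without checking that scan1 actually has a current record, so if scan1 is empty the first call still reports a record. Callers may want to guard against an empty outer scan.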
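The code in this section is mainly called later, when we parse and execute SQL, so its logic is hard to follow without that background. Once we have described SQL parsing and execution and start calling this code, the logic will become much clearer. For more content, search for "coding迪斯尼" on Bilibili.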


Origin blog.csdn.net/tyler_download/article/details/128981647