Indexes and Views in Databases Explained in Detail

Abstract: An index is a list of data and corresponding storage locations in a data table. Using an index can improve the speed of finding data in a table or view.

This article is shared from the Huawei Cloud Community article "Database Development Guide (6) Using Skills, Methods and Comprehensive Applications of Indexes and Views", author: bluetata.

1. Index

1.1 What is an index

An index is a list of data values together with the storage locations of the corresponding rows in a data table. Using an index can improve the speed of finding data in a table or view. Like the index of a book, it helps you locate and retrieve data quickly. In a database, an index is a structure that stores the values of one or more columns in sorted order, along with pointers to the locations of the actual data.

1.2 Index classification

Indexes in databases are mainly divided into two categories: clustered indexes and non-clustered indexes. SQL Server also provides unique indexes, indexed views, full-text indexes, XML indexes, and more. Clustered and non-clustered indexes are the basic types of indexes in the database engine and are the basis for understanding other types of indexes.

1.2.1 Clustered Index

In a clustered index, the data rows of the table are stored in the order of the index key; the leaf level of the index is the data itself. Because a table's rows can be stored in only one order, each table can have only one clustered index. Clustered indexes are often created on columns that are searched frequently or accessed sequentially (for example, by range). By default, a primary key constraint automatically creates a clustered index if the table does not already have one.
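The default described above can be checked in the catalog. This is a minimal sketch assuming SQL Server 2016 or later (for `DROP TABLE IF EXISTS`) and a hypothetical demo table `dbo.demo_student`; it declares a primary key and then shows that SQL Server created a clustered index for it.

```sql
-- Hypothetical demo table; DROP TABLE IF EXISTS needs SQL Server 2016+.
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id   int          not null primary key,  -- no clustered index yet, so the PK creates one
    name nvarchar(50) not null,
    age  int          null,
    sex  nchar(1)     null,
    cid  int          null
);
go
-- The primary key appears as the table's clustered index (index_id = 1).
select name, index_id, type_desc, is_primary_key
from sys.indexes
where object_id = object_id('dbo.demo_student');
```

If the clustered index should go on a different column, the key can be declared as `primary key nonclustered` instead.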

1.2.2 Non-clustered index

A non-clustered index does not change the physical order of the rows in the table. The index is stored separately from the data, and each index entry contains a pointer (row locator) to the corresponding data row: the clustered index key if the table has a clustered index, or a row ID if the table is a heap.

A non-clustered index is a good choice in the following situations:

  • The column has high uniqueness (many distinct values)
  • The query returns a relatively small number of rows
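The two situations above can be sketched as follows, again assuming SQL Server 2016+ and a hypothetical `dbo.demo_student` table and index name. The query filters on a highly selective column and returns one row, which is the kind of lookup a non-clustered index suits.

```sql
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
insert into dbo.demo_student (id, name, age)
values (1, N'Alice', 20), (2, N'Bob', 21), (3, N'Carol', 22);
go
-- Separate structure: each leaf entry holds name plus the clustered key (id)
-- as the row locator.
create nonclustered index idx_demo_student_name on dbo.demo_student (name);
go
-- Selective filter, small result: a good fit for an index seek on name.
-- age is not in the index, so each match may also need a key lookup.
select id, name, age
from dbo.demo_student
where name = N'Bob';   -- returns 2, Bob, 21
```

With only three rows the optimizer may still choose a scan; the benefit shows up on larger tables.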

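1.2.3 The difference between clustered and non-clustered indexes

The main differences between a clustered index and a non-clustered index can be summarized as follows:

  • Number per table : a table can have only one clustered index, but many non-clustered indexes (up to 999 in SQL Server 2008 and later).
  • Row order : a clustered index determines the order in which the table's rows are stored; a non-clustered index does not change it.
  • Leaf level : the leaf level of a clustered index contains the data rows themselves; the leaf level of a non-clustered index contains the index keys plus row locators that point to the data rows.
  • Lookups : a query that needs columns not contained in a non-clustered index requires an extra lookup into the table; a clustered index already holds every column.
  • Storage : a non-clustered index is a separate structure and needs additional storage space.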

1.2.4 Other types of indexes

In addition to the above indexes, there are the following types of indexes:

  • Unique index : ensures that no two rows have the same index key value. Both clustered and non-clustered indexes can be unique indexes.
  • Index with included columns : an index key can contain at most 16 columns and at most 900 bytes (SQL Server 2016 raised these limits to 32 columns, and to 1,700 bytes for non-clustered indexes). If you want an index to cover additional columns that would exceed these limits, you can add them as non-key columns with the INCLUDE clause.
  • Indexed view : improves the query performance of a view by materializing it. Once a unique clustered index is created on a view, the view's result set is stored permanently in the index, like a table with a clustered index.
  • XML index : an index on an xml data type column; it is a shredded, persisted representation of the XML binary large objects (BLOBs) in that column.
  • Full-text index : a special type of token-based functional index, built and maintained by the Full-Text Engine, that is used to search character strings for specific words and phrases.
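Three of these index types can be sketched in a few statements. This assumes SQL Server 2016+ and a hypothetical table `dbo.demo_student_ix`; all index and view names are illustrative. The SET options are the ones SQL Server requires when creating an indexed view.

```sql
set numeric_roundabort off;
set ansi_padding, ansi_warnings, concat_null_yields_null, arithabort,
    quoted_identifier, ansi_nulls on;
go
drop view if exists dbo.v_demo_class_size;   -- schema-bound view must be dropped first
drop table if exists dbo.demo_student_ix;
go
create table dbo.demo_student_ix (
    id   int           not null primary key,
    name nvarchar(50)  not null,
    age  int           null,
    cid  int           null,
    note nvarchar(max) null            -- (max) types cannot be index key columns
);
go
-- Unique index: no two rows may have the same name.
create unique nonclustered index uq_demo_student_ix_name
on dbo.demo_student_ix (name);

-- Included columns: cid is the only key column; age and note are stored at the
-- leaf level only, so queries filtering on cid and reading age/note need no lookup.
create nonclustered index idx_demo_student_ix_cid
on dbo.demo_student_ix (cid)
include (age, note);
go
-- Indexed view: needs SCHEMABINDING, two-part names and COUNT_BIG(*) with GROUP BY.
create view dbo.v_demo_class_size
with schemabinding
as
select cid, count_big(*) as student_count
from dbo.demo_student_ix
group by cid;
go
-- The unique clustered index stores the view's result set.
create unique clustered index cix_v_demo_class_size
on dbo.v_demo_class_size (cid);
go
```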

1.3 Create an index

1.3.1 Syntax

create [unique] [clustered | nonclustered]
index index_name
on table_name (column_name [, ...])
[with (fillfactor = x)]

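Parameter explanation:

unique: creates a unique index
clustered: creates a clustered index
nonclustered: creates a non-clustered index (the default when neither option is given)
fillfactor: the fill factor, an integer from 0 to 100 that sets the percentage of each leaf-level index page to fill with data when the index is created or rebuilt (0 and 100 both mean the pages are filled completely)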

1.3.2 Best practices for index naming

In MSSQL (SQL Server), following a few common index naming guidelines improves readability and maintainability. These guidelines apply not only to SQL Server but also to other databases such as MySQL and Oracle.

The following naming rules and suggestions are summarized from personal experience:

  1. Naming should be descriptive : the name of the index should clearly express its role and associated column or table. Using meaningful names makes it easier for other developers to understand the purpose of the index.
  2. Include table and column names : Including the relevant table and column names in the index name (long table names can be abbreviated, as long as the table can still be identified) makes the index easier to understand and avoids conflicts between indexes with the same name.
  3. Use a consistent naming convention : To improve consistency, a set of naming conventions can be defined and used across the database. For example, a specific prefix or suffix can be used to identify the type of index (such as idx_ for a nonclustered index).
  4. Avoid very long names : Index names should not be so long that they cause inconvenience when using the index. Try to use concise but descriptive names.
  5. Avoid reserved keywords and special characters : Make sure that index names do not conflict with MSSQL's reserved keywords or special characters to avoid syntax errors.
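A convention is easier to keep when it can be checked. The query below is a sketch that lists indexes on user tables whose names do not start with one of a set of prefixes; the prefixes `idx_`, `uq_`, `cix_` and `PK_` are a hypothetical team convention, not a SQL Server rule.

```sql
-- Lists indexes on user tables whose names do not use one of the
-- hypothetical team prefixes idx_, uq_, cix_ or PK_.
select s.name as schema_name,
       t.name as table_name,
       i.name as index_name,
       i.type_desc
from sys.indexes as i
join sys.tables  as t on t.object_id = i.object_id
join sys.schemas as s on s.schema_id = t.schema_id
where i.name is not null            -- heaps (index_id = 0) have no name
  and i.name not like 'idx[_]%'     -- [_] matches a literal underscore
  and i.name not like 'uq[_]%'
  and i.name not like 'cix[_]%'
  and i.name not like 'PK[_]%'
order by schema_name, table_name, index_name;
```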

1.3.3 Example of creating an index

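-- Regular (non-clustered) index
if exists (select * from sys.indexes
           where name = 'idx_stu_name' and object_id = object_id('student'))
    drop index idx_stu_name on student;
go
create index idx_stu_name
on student(name);
go
-- Composite unique clustered index
-- Assumes student has no clustered index yet (for example, its primary key
-- was declared nonclustered), because a table can have only one.
if exists (select * from sys.indexes
           where name = 'idx_uqe_clu_stu_name_age' and object_id = object_id('student'))
    drop index idx_uqe_clu_stu_name_age on student;
go
create unique clustered index idx_uqe_clu_stu_name_age
on student(name, age);
go
-- Non-clustered index
if exists (select * from sys.indexes
           where name = 'idx_cid' and object_id = object_id('student'))
    drop index idx_cid on student;
go
create nonclustered index idx_cid
on student(cid)
with (fillfactor = 30); -- fill factor: fill leaf pages to 30%, leaving 70% free
go
-- Clustered index
-- Only one clustered index is allowed per table, so the composite clustered
-- index created above has to be dropped before this one can be created.
if exists (select * from sys.indexes
           where name = 'idx_uqe_clu_stu_name_age' and object_id = object_id('student'))
    drop index idx_uqe_clu_stu_name_age on student;
if exists (select * from sys.indexes
           where name = 'idx_sex' and object_id = object_id('student'))
    drop index idx_sex on student;
go
create clustered index idx_sex
on student(sex);
go
-- Unique index (fails if name already contains duplicate values)
if exists (select * from sys.indexes
           where name = 'idx_name' and object_id = object_id('student'))
    drop index idx_name on student;
go
create unique index idx_name
on student(name);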

1.4 Suitable columns for creating indexes

In general, you can choose those columns that have a positive impact on query performance for index creation. The following is a summary:

Selectivity of a column : Selectivity is the ratio of the number of distinct values in a column to the total number of rows. If a column has high selectivity, that is, there are many different values, creating an index for this column may have a better effect. For example, creating an index on a column representing gender might not be of much help since there are only two possible values.
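Selectivity can be measured directly with a query. In this sketch an inline VALUES list of four hypothetical students stands in for a real table; on a real database you would replace it with the table, for example `from dbo.student`.

```sql
-- Selectivity = distinct values / total rows.
select count(distinct name) * 1.0 / count(*) as name_selectivity,  -- 4 / 4
       count(distinct sex)  * 1.0 / count(*) as sex_selectivity    -- 2 / 4
from (values (N'Alice', N'F'), (N'Bob', N'M'),
             (N'Carol', N'F'), (N'Dave', N'M')) as s(name, sex);
```

Multiplying by 1.0 avoids integer division, which would otherwise truncate the sex result to 0.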

Query Frequency : Observe the columns that are frequently used for query conditions. If a column is frequently used for search, filter, or join operations, creating an index on that column can improve query performance.

The size of the data table : For large tables, the impact of creating indexes may be more significant. Smaller tables may not require as many indexes, since the overhead of a full table scan is relatively small.

Data update frequency : The creation and maintenance of indexes will also increase the overhead of data write operations. If the data of a certain column changes frequently, creating an index may bring certain performance overhead.

Query performance optimization requirements : By analyzing the query execution plan, you can determine whether there are potential performance bottlenecks and consider creating indexes on the related columns to improve query performance.
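One way to measure an index's effect is to compare logical reads with and without it. This is a sketch assuming SQL Server 2016+ and a hypothetical `dbo.demo_student` table filled with 10,000 generated rows; the `INDEX(0)` table hint forces a scan of the clustered index so the two runs can be compared.

```sql
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
-- 10,000 hypothetical rows; the cross join only serves as a row source.
insert into dbo.demo_student (id, name, age)
select n, concat(N'student', n), 18 + n % 10
from (select top (10000) row_number() over (order by (select null)) as n
      from sys.all_objects as a cross join sys.all_objects as b) as numbers;
create nonclustered index idx_demo_student_name on dbo.demo_student (name);
go
set statistics io on;   -- prints logical reads per statement in the Messages tab
-- INDEX(0) forces a scan of the clustered index, ignoring idx_demo_student_name.
select id, name from dbo.demo_student with (index(0)) where name = N'student42';
-- Without the hint the optimizer can seek on idx_demo_student_name instead.
select id, name from dbo.demo_student where name = N'student42';
set statistics io off;
```

In SSMS, enabling "Include Actual Execution Plan" before running the batch shows the scan and the seek side by side.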

Please note that too many indexes may also bring maintenance overhead and storage costs, so you need to find a balance between the number of indexes and performance improvement. It is also important to regularly monitor and evaluate index usage to ensure that indexes are still having a positive impact on database performance.
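For the regular monitoring mentioned above, SQL Server exposes index usage counters in `sys.dm_db_index_usage_stats`. The sketch below lists reads versus writes per index in the current database; it needs VIEW SERVER STATE permission, and the counters reset when the instance restarts.

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) per index.
select object_name(s.object_id) as table_name,
       i.name as index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
from sys.dm_db_index_usage_stats as s
join sys.indexes as i
  on i.object_id = s.object_id and i.index_id = s.index_id
where s.database_id = db_id()
  and objectproperty(s.object_id, 'IsUserTable') = 1
order by s.user_updates desc;
```

An index with many updates but few seeks, scans, or lookups is a candidate for review. Indexes that have never been used since the restart do not appear in this view at all.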

1.5 Columns that are not suitable for indexing

Although indexing can improve query performance in some cases, not all columns are suitable for indexing. The following are some cases where columns are not suitable for indexing:

Low selectivity columns : If a column has very low selectivity, that is, the column has few distinct values, creating an index may not bring about a significant performance improvement. For example, it might not make much sense to create an index on a column like gender that has only a few possible values.

Frequently updated columns : If the value of a column is frequently modified, creating an index for the column may bring additional maintenance costs and performance overhead. Every update operation requires the index to be updated, which can affect write performance. In this case, you need to carefully evaluate whether you really need to create an index for this column.

Frequently queried but low-selectivity columns : Some columns are queried frequently but have few distinct values. In this case, despite the high query frequency, an index on the column may not provide a noticeable performance gain, because each value still matches a large share of the rows and the index is used only to a limited extent.

Large text or large binary columns : Indexing is generally less effective for columns that store large text or binary data, such as long text fields or image fields, because the index itself requires additional storage space and indexing operations on large values can become very time-consuming. In SQL Server, columns of type text, ntext, image, varchar(max), nvarchar(max), and varbinary(max) cannot be index key columns at all.
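If prefix searches on a large text column are still needed, one workaround is to index a short computed column derived from it. This is a sketch of that technique, assuming SQL Server 2016+ and a hypothetical `dbo.demo_article` table; the 100-character prefix length is an arbitrary choice. Indexing a computed column requires it to be deterministic and the SET options shown.

```sql
set numeric_roundabort off;
set ansi_padding, ansi_warnings, concat_null_yields_null, arithabort,
    quoted_identifier, ansi_nulls on;
go
drop table if exists dbo.demo_article;
go
create table dbo.demo_article (
    id    int           not null primary key,
    title nvarchar(200) not null,
    body  nvarchar(max) null,
    -- Deterministic computed column holding the first 100 characters of body.
    body_prefix as cast(left(body, 100) as nvarchar(100))
);
go
-- create index idx_demo_article_body on dbo.demo_article (body);
--   would fail: nvarchar(max) cannot be an index key column.
create nonclustered index idx_demo_article_body_prefix
on dbo.demo_article (body_prefix);
go
-- Prefix searches on body_prefix can use the index.
select id, title
from dbo.demo_article
where body_prefix like N'Indexes%';
```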

Infrequently used columns : For columns that are rarely used in queries, it may not make much sense to create an index. If a column is rarely used in query conditions or join operations, creating an index for it may only add additional overhead without actual performance improvement.

It should be noted that the situations listed above are only general guidelines, and whether it is suitable to create indexes depends on the specific database structure, query mode and performance requirements. When designing and creating an index, it should be evaluated on a case-by-case basis, and performance testing and optimization should be performed to ensure the effectiveness of the index.

2. View

2.1 What is a view

A view is a virtual table: it does not store data of its own (unless it is an indexed view), and the rows it returns are produced by its defining query each time the view is referenced.
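The "virtual" part can be seen by changing the base table after the view exists. This sketch assumes SQL Server 2016+ and uses a hypothetical table `dbo.demo_student` and view `dbo.v_demo_student`.

```sql
drop view if exists dbo.v_demo_student;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
go
create view dbo.v_demo_student
as
select id, name from dbo.demo_student;   -- only the query is stored
go
-- A row added to the base table after the view was created is visible through it.
insert into dbo.demo_student (id, name, age) values (1, N'Alice', 20);
select id, name from dbo.v_demo_student;   -- returns 1, Alice
```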

2.2 Why use a view instead of a table (a possible interview question)

If you are asked this question in an interview, it is recommended to structure your answer as follows.

First introduce the role of a table (for example, a table directly stores structured data, which can be inserted, deleted, updated, and so on), then explain what a view is, and then explain the benefits and necessity of views from two angles: reusability and security. Here is a brief summary:

  1. Simplify queries and improve reusability
    Imagine a wide personnel table with hundreds of columns, of which you only ever need name, gender, and age. You can create a view with just those three columns and query it directly. Similarly, if your personnel table is often joined with a resume table but only a few frequently used columns are taken from each, creating a view is clearly a good practice. Both cases also show how views simplify queries.
  2. Improve security
  • Views let you restrict users' direct access to sensitive data. A view controls which data users can see and manipulate, providing better security and privacy protection. Taking the same name, gender, and age columns as an example: if age is sensitive and some database users should only be able to query name and gender, you can create a view containing only those two columns and grant those users access to it.
  • In addition, an update through a view can only modify the columns that the view exposes, and users who have only been granted access to the view cannot change or drop it at will, which helps keep the data secure.
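The security point can be sketched with a hypothetical database user `demo_reader` (created without a login, for testing only) who may read a view that leaves out age, but has no permission on the base table. Because the view and the table have the same owner (dbo), SQL Server's ownership chaining lets the view read the table on the user's behalf. Assumes SQL Server 2016+.

```sql
drop view if exists dbo.v_demo_student_public;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null,
    age int null, sex nchar(1) null
);
insert into dbo.demo_student (id, name, age, sex) values (1, N'Alice', 20, N'F');
go
create view dbo.v_demo_student_public
as
select id, name, sex from dbo.demo_student;   -- age is deliberately left out
go
-- Hypothetical user without a login, used only to test permissions.
if user_id('demo_reader') is null
    create user demo_reader without login;
grant select on dbo.v_demo_student_public to demo_reader;
go
execute as user = 'demo_reader';
select id, name, sex from dbo.v_demo_student_public;   -- allowed
-- select age from dbo.demo_student;   -- would fail: no SELECT permission on the table
revert;
```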

After covering these two key points, you can elaborate further on your own. For example, a view can change the display order or the names of a table's columns; these are also advantages worth mentioning.

2.3 Create a view

Views are usually named according to a convention; a common one is the prefix v_ or view_ followed by the table name (or its abbreviation).

For example: v_student

-- Create a view
if exists (select * from sys.objects where name = 'v_student' and type = 'V')
    drop view v_student;
go
create view v_student
as
select id, name, age, sex from student;
go

2.4 Guidelines for creating views

Keep the following guidelines in mind when creating a view:

  1. View names must follow the rules for identifiers and must be unique within the schema; in particular, a view cannot have the same name as a table in that schema.
  2. You can create views on other views. Views can be nested up to 32 levels deep, and a view can have up to 1,024 columns.
  3. You cannot associate rules or DEFAULT definitions with views.
  4. The query that defines a view cannot contain a COMPUTE clause, a COMPUTE BY clause, or the INTO keyword.
  5. The query that defines a view cannot contain an ORDER BY clause unless the select list also contains a TOP clause (or the query uses OFFSET ... FETCH).
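Guideline 5 is easy to misread, so here is a short sketch with a hypothetical `dbo.demo_student` table and `dbo.v_demo_oldest` view (SQL Server 2016+). Inside the view, ORDER BY only decides which rows TOP keeps; it does not guarantee the order in which the view's rows come back.

```sql
drop view if exists dbo.v_demo_oldest;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
go
-- Rejected: ORDER BY without TOP/OFFSET is not allowed in a view definition.
-- create view dbo.v_demo_sorted as select id, name, age from dbo.demo_student order by age;
create view dbo.v_demo_oldest
as
select top (3) id, name, age
from dbo.demo_student
order by age desc;   -- only decides which 3 rows TOP keeps
go
-- For sorted output, put ORDER BY in the query that reads the view.
select id, name, age from dbo.v_demo_oldest order by age desc;
```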

The name of each column in the view must be specified in the following cases:

  • There are column order requirements (in some cases you may want the columns of the view's result set to appear in an order that differs from their order in the underlying tables)
  • A column in the view is derived from an arithmetic expression, a built-in function, or a constant
  • Two or more columns in the view would otherwise have the same name (usually because the view definition contains a join and the joined tables have columns with the same name)
  • You want to give a column in the view a different name (alias). Whether or not a column is renamed, it inherits the data type of the column it is derived from
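The following sketch triggers two of these cases at once: both joined tables have a `name` column, and one view column is an expression. The tables `dbo.demo_student` and `dbo.demo_class` and the column `birth_year_estimate` are hypothetical (SQL Server 2016+). A column list after the view name supplies the names; `AS` aliases in the select list would work equally well.

```sql
drop view if exists dbo.v_demo_student_class;
drop table if exists dbo.demo_student;
drop table if exists dbo.demo_class;
go
create table dbo.demo_class (id int not null primary key, name nvarchar(50) not null);
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null, cid int null
);
go
-- s.name and c.name clash, and the last column is an expression, so the
-- view must name its columns explicitly.
create view dbo.v_demo_student_class
    (student_id, student_name, class_name, birth_year_estimate)
as
select s.id, s.name, c.name, year(getdate()) - s.age
from dbo.demo_student as s
join dbo.demo_class as c on c.id = s.cid;
go
```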

2.5 Modify View

Modifying a view is similar to modifying a table: use the ALTER VIEW statement. Because ALTER VIEW must be the first statement in a batch, each one is followed by go. For example:

alter view v_student
as
select id, name, sex from student;
go
-- Rename the view's columns with a column list (编号 = ID, 名称 = name, 性别 = sex)
alter view v_student(编号, 名称, 性别)
as
 select id, name, sex from student
go
select * from v_student;
select * from information_schema.views;

2.6 Encrypted view

If you need to protect a view's query logic, for example to keep others from reading or copying it, you can create an encrypted view.

When a view is created WITH ENCRYPTION in SQL Server, the text of its defining query is stored in an obfuscated form. This means that others cannot directly view or analyze the view's query logic; for example, the VIEW_DEFINITION column of INFORMATION_SCHEMA.VIEWS returns NULL for it. SQL Server describes this as obfuscation rather than strong encryption, so it should not be treated as a hard security boundary.

-- Encrypted view
if exists (select * from sys.objects where name = 'v_student_info' and type = 'V')
    drop view v_student_info;
go
create view v_student_info
with encryption -- encrypt the view definition
as
 select id, name, age from student
go
-- view_definition is null for the encrypted view
select * from information_schema.views
where table_name = 'v_student_info';
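Besides INFORMATION_SCHEMA.VIEWS, the catalog functions also reveal whether a view is encrypted. This self-contained sketch uses a hypothetical table `dbo.demo_student` and view `dbo.v_demo_secret` (SQL Server 2016+).

```sql
drop view if exists dbo.v_demo_secret;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
go
create view dbo.v_demo_secret
with encryption
as
select id, name, age from dbo.demo_student;
go
select objectproperty(object_id('dbo.v_demo_secret'), 'IsEncrypted') as is_encrypted,  -- 1
       object_definition(object_id('dbo.v_demo_secret')) as definition_text;        -- NULL
-- exec sp_helptext 'dbo.v_demo_secret';   -- reports that the text is encrypted
```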

How to decrypt or modify an encrypted view:

If an encrypted view needs to be modified later, there are roughly three options:

  1. Recreate the view (drop the encrypted view first, then recreate it with the new query logic).
  2. Create a new view (create one with a different name and use the new view from then on).
  3. Decrypt it by brute force and then modify it (this generally requires third-party tools or outside help; I personally do not recommend this method).
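As a variant of option 1, ALTER VIEW can also replace an encrypted view's definition in place, as long as you have the full source of the new query; unlike drop-and-create, it keeps the permissions granted on the view. A sketch with a hypothetical `dbo.v_demo_secret` view (SQL Server 2016+):

```sql
drop view if exists dbo.v_demo_secret;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
go
create view dbo.v_demo_secret
with encryption
as
select id, name, age from dbo.demo_student;
go
-- ALTER VIEW replaces the whole definition, so the new query must be supplied
-- in full (keeping the source script under version control helps). Repeat
-- WITH ENCRYPTION if the new definition should stay encrypted.
alter view dbo.v_demo_secret
with encryption
as
select id, name from dbo.demo_student;
go
```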

2.7 Can a view be updated? (a possible interview question)

Can a view be updated? Under what circumstances can it be updated?

If the interviewer asks both of these questions, they are giving you a friendly hint. If they only ask "Can a view be updated?", be careful: the question may be a trap.

Views can be updated, but not in all cases.

View updates must obey the following rules:

  1. If a column of the view is computed from a field expression or a constant, INSERT and UPDATE operations on the view are not allowed, but DELETE operations are.
  2. If a column of the view comes from a library (built-in) function, the view cannot be updated.
  3. If the view definition contains a GROUP BY clause or an aggregate function, the view cannot be updated.
  4. If the view definition contains the DISTINCT option, the view cannot be updated.
  5. If the view definition contains a nested query whose FROM clause references the same base table the view is derived from, the view cannot be updated.
  6. If the view is derived from two or more base tables, the view cannot be updated (in SQL Server, a single INSERT, UPDATE, or DELETE through such a view can modify columns from only one of the base tables).
  7. A view defined on top of a view that cannot be updated cannot be updated either.
  8. A view defined on a single base table can be updated only if it contains the primary key (or an alternate key) of the base table and has no columns defined by expressions or functions.
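Rules 8 and 3 can be contrasted in a short sketch with a hypothetical `dbo.demo_student` table (SQL Server 2016+). The first view is a simple row-and-column subset of one table that keeps the primary key, so updates pass through it; it also uses WITH CHECK OPTION, a related SQL Server feature that rejects changes that would make a row disappear from the view. The second view aggregates and is read-only.

```sql
drop view if exists dbo.v_demo_adult;
drop view if exists dbo.v_demo_age_stats;
drop table if exists dbo.demo_student;
go
create table dbo.demo_student (
    id int not null primary key, name nvarchar(50) not null, age int null
);
insert into dbo.demo_student (id, name, age)
values (1, N'Alice', 20), (2, N'Bob', 17);
go
-- One base table, primary key included, no expressions: updatable.
create view dbo.v_demo_adult
as
select id, name, age from dbo.demo_student
where age >= 18
with check option;   -- rejects changes that would move a row out of the view
go
update dbo.v_demo_adult set name = N'Alice Smith' where id = 1;   -- succeeds
-- update dbo.v_demo_adult set age = 16 where id = 1;   -- fails the CHECK OPTION
go
-- GROUP BY and an aggregate (rule 3): this view is read-only.
create view dbo.v_demo_age_stats
as
select age, count(*) as student_count
from dbo.demo_student
group by age;
go
-- update dbo.v_demo_age_stats set student_count = 5 where age = 20;   -- fails
```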

 

Origin my.oschina.net/u/4526289/blog/10085149