One way to cope with very large amounts of data in a database: indexes

An index is a data structure that a database uses to speed up queries. Like the table of contents of a book, it lets the database system locate specific data without reading everything. An index is built from the values of one or more columns of a table, and each index has its own name. Once an index is created, the database system maintains this structure internally so that it can locate and retrieve specific rows faster.

The main benefits of an index are:

- Improve query performance: With an index, the database can go straight to the rows that meet the query conditions instead of scanning the whole table, which makes queries more efficient.
- Accelerate sorting: If a query sorts by one or more indexed columns, the database can read the rows in index order and reduce or avoid the separate sorting step.
- Optimize join operations: When tables are joined, an index on the join column helps the database system match related rows quickly and improves join performance.
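The sorting benefit can be observed directly. The following is a minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module) as the database and a small hypothetical "Students" table; other database systems expose their plans through their own commands, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (student_id INTEGER PRIMARY KEY, name TEXT, score REAL)"
)

def plan(sql):
    """Return the steps SQLite reports for a statement (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

sorted_query = "SELECT * FROM Students ORDER BY name"

# Without an index on name, SQLite has to sort the rows itself.
sort_plan_before = plan(sorted_query)

# With an index on name, the rows can be read in index order instead.
conn.execute("CREATE INDEX idx_name ON Students(name)")
sort_plan_after = plan(sorted_query)

print("before:", sort_plan_before)
print("after: ", sort_plan_after)
```

Before the index exists, the plan contains an explicit sorting step for the ORDER BY; afterwards that step is gone and the plan reads the table through `idx_name`.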

Let's compare how a query runs with and without an index:

Suppose there is a database table named "Students", which contains the following fields: student ID (student_id), name (name), age (age), gender (gender), and score (score).

Suppose we want to look up student information quickly by name. We can create an index on the "name" column of the "Students" table.

The SQL statement to create an index is as follows:

CREATE INDEX idx_name ON Students(name);

After the index is created, the database can use it to locate the matching rows quickly when a query is executed.

For example, to query the information of the student named "Tom":

SELECT * FROM Students WHERE name = 'Tom';

With the index in place, the database system first looks up the value in the index and then fetches and returns the matching rows.

Now suppose no index has been created on the "name" column of the "Students" table, and the same query is executed:

SELECT * FROM Students WHERE name = 'Tom';

The database system has to perform a full table scan to find the rows that meet the condition. It reads the entire "Students" table row by row and checks whether the "name" value of each row equals "Tom".

A full table scan can cause the following problems:

- Low performance: A full table scan has to visit every row, so the time it takes grows with the size of the "Students" table.
- High resource consumption: A query that cannot use an index needs more disk I/O and CPU, which makes it a relatively expensive operation for the database system.
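The difference between the two cases can be checked by asking the database for its query plan. Below is a minimal sketch, assuming SQLite through Python's built-in `sqlite3` module; the column types and sample rows are made up for illustration, and `EXPLAIN QUERY PLAN` is SQLite's command (other systems have their own equivalents).

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable steps of SQLite's plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the article only names the fields.
conn.execute(
    "CREATE TABLE Students ("
    "student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "gender TEXT, score REAL)"
)
# Made-up sample rows, only so that the table is not empty.
conn.executemany(
    "INSERT INTO Students (name, age, gender, score) VALUES (?, ?, ?, ?)",
    [("Tom", 18, "M", 90.0), ("Ann", 19, "F", 85.5), ("Li", 20, "M", 77.0)],
)

query = "SELECT * FROM Students WHERE name = 'Tom'"

# Same query, planned once before and once after the index exists.
plan_without_index = query_plan(conn, query)
conn.execute("CREATE INDEX idx_name ON Students(name)")
plan_with_index = query_plan(conn, query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index, SQLite reports a scan of the whole table; with it, the plan becomes a search that uses `idx_name`. The exact text of each step varies between SQLite versions.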

Two common index types: B-tree (BTREE) indexes and hash (HASH) indexes

Consider again a database table named "Students", this time with just the following fields: student ID (student_id), name (name), and age (age).

BTREE index:
- Assume that a B-tree index is created on the "name" column of the "Students" table.
- B-tree indexes keep their keys in sorted order, so they support range queries and sort operations efficiently. For example, to find students whose names sort between "Li" and "Zhang" (the lower bound must come first, otherwise no row can satisfy both conditions):
```
SELECT * FROM Students WHERE name >= 'Li' AND name <= 'Zhang';
```
For queries like this, the B-tree index lets the database find the first matching entry and then read the following entries in order until the upper bound is reached.

HASH index:
- Assume that a hash index is created on the "student_id" column of the "Students" table.
- A hash index passes the indexed value through a hash function to get a location in a hash table, so the rows that match an exact value can be found quickly. For example:
```
SELECT * FROM Students WHERE student_id = 12345;
```
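To make the contrast concrete, here is a small Python sketch of the two ideas. It is only an illustration under simplifying assumptions: a sorted list searched with `bisect` stands in for the B-tree (both keep keys in order), a plain `dict` stands in for the hash table, and the sample rows are made up. Real database indexes are considerably more elaborate.

```python
from bisect import bisect_left, bisect_right

# Made-up rows: (student_id, name, age)
rows = [
    (12345, "Li", 20),
    (12346, "Tom", 18),
    (12347, "Wang", 19),
    (12348, "Zhang", 21),
    (12349, "Ann", 19),
]

# Ordered index on name: (key, row position) pairs kept sorted by key.
name_index = sorted((name, pos) for pos, (_, name, _) in enumerate(rows))
name_keys = [key for key, _ in name_index]

def range_by_name(low, high):
    """Rows with low <= name <= high, located by two binary searches."""
    start = bisect_left(name_keys, low)
    end = bisect_right(name_keys, high)
    return [rows[pos] for _, pos in name_index[start:end]]

# Hash index on student_id: key -> row position, with no ordering at all.
id_index = {student_id: pos for pos, (student_id, _, _) in enumerate(rows)}

def find_by_id(student_id):
    """Row with exactly this student_id, or None if there is none."""
    pos = id_index.get(student_id)
    return None if pos is None else rows[pos]

in_range = range_by_name("Li", "Zhang")
match = find_by_id(12345)

print([name for _, name, _ in in_range])
print(match)
```

The ordered structure answers the range question by slicing a contiguous run of keys, which the dictionary cannot do because it keeps no order. The dictionary, in turn, answers the exact-match question with a single lookup. Calling `range_by_name` with the bounds swapped selects nothing, because the lower bound then sorts after the upper one.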

Summary: B-tree indexes suit range queries, sorting, and prefix matching (such as names that start with "Zh"), while hash indexes suit equality lookups. Because a B-tree keeps its keys in order, it supports more kinds of queries; a hash index cannot help with ranges or sorting, but its equality lookups can be faster. Choose the index type based on the queries you actually run, the characteristics of the data, and the index types your database engine supports.
