[Getting Started] ClickHouse Data Types


1. Introduction

ClickHouse supports a wide range of data types and flexible table structure designs. This post walks through ClickHouse's data types: what each one stores, when to use it, and how the choice affects storage space and query performance.

2. ClickHouse data types

| Category | Data type | Definition | Usage scenarios | Example |
|---|---|---|---|---|
| Basic | Integer (Int8, Int16, Int32, Int64) | Stores whole numbers, both positive and negative | Ages, counts, and other whole-number quantities | Age: 25, views: 1000 |
| Basic | Floating point (Float32, Float64) | Stores approximate decimal values in single or double precision | Prices, ratings, and other fractional values (for exact amounts, consider Decimal) | Price: 23.5, rating: 4.3 |
| Basic | String (String, FixedString) | Stores text data as variable-length or fixed-length strings | Names, addresses, descriptions | Name: "Zhang San", address: "Beijing" |
| Composite | Array | Stores multiple values of the same type | Tags of an article, products in an order | Tags: ["Technology", "Education"] |
| Composite | Enum | Stores one of a predefined set of values, mapping strings to integers | Limited, fixed value sets such as gender or order status | Gender: Male, order status: Paid |
| Composite | Tuple | Stores a fixed-length ordered list whose elements can each have a different type | Fixed-length combined data such as latitude and longitude or RGB color values | Longitude and latitude: (116.4, 39.9) |
| Composite | Map | Stores key-value pairs | User attributes, product metadata | User attributes: {"age": 25, "city": "Beijing"} |
| Composite | Nullable | Wraps any type so that a value may be NULL, tracked by a null flag | Fields that may be empty, such as user nicknames or product discounts | Nickname: "Zhang San" or NULL |
| Special | Date and time (Date, DateTime) | Stores dates and timestamps | User registration time, order creation time | Registration time: 2021-08-01 |
| Special | UUID | Stores a 16-byte universally unique identifier | Unique user IDs, unique order IDs | User ID: 550e8400-e29b-41d4-a716-446655440000 |
| Special | IP address (IPv4, IPv6) | Stores IP addresses | User IP addresses, server IP addresses | User IP: 192.168.1.1 |
| Special | AggregateFunction | Stores the intermediate state of an aggregate function | Pre-aggregated statistics such as average user active time or total product sales | State of an average that finalizes to 5.5 hours |

2.1 Basic data types

2.1.1 Integer type

Integer types store whole numbers. ClickHouse provides several signed integer types, such as Int8, Int16, Int32, and Int64, along with unsigned counterparts (UInt8, UInt16, UInt32, UInt64). Each type covers a different range: Int8 holds integers from -128 to 127, while Int32 holds integers from -2147483648 to 2147483647. Choose the narrowest type that safely covers the values you expect.

Usage scenarios: whole-number values such as ages and counts.
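The ranges above follow directly from the bit width of each type. The short Python sketch below is only an illustration of that arithmetic, not ClickHouse code; the helper names `int_range` and `smallest_signed_type` are made up for this example.

```python
def int_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return (min, max) for a fixed-width integer with the given bit size."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def smallest_signed_type(lo: int, hi: int) -> str:
    """Pick the narrowest IntN name (hypothetical helper) that holds lo..hi."""
    for bits in (8, 16, 32, 64):
        type_min, type_max = int_range(bits)
        if type_min <= lo and hi <= type_max:
            return f"Int{bits}"
    raise ValueError("range does not fit in Int64")


print(int_range(8))                   # (-128, 127)
print(smallest_signed_type(0, 120))   # Int8
print(smallest_signed_type(0, 1000))  # Int16
```

An age column fits in the narrowest type, while a view counter that can exceed 127 needs a wider one.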

2.1.2 Floating point type

Floating point types store approximate decimal values. ClickHouse provides two of them, Float32 and Float64. Float32 keeps about 7 significant decimal digits, while Float64 keeps about 15. Because the values are stored in binary, many decimal fractions cannot be represented exactly and are rounded.

Usage scenario: ratings, measurements, and other fractional values where small rounding errors are acceptable. For values that must be exact, such as monetary amounts, the Decimal type is usually the safer choice.
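To see what floating point precision means in practice, the sketch below rounds Python floats (which are 64-bit) to 32 bits with the standard `struct` module. It models the number format only and does not touch ClickHouse; `to_float32` is a helper name invented for this example.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


price = 23.5   # exactly representable in binary
rating = 4.3   # not exactly representable in binary

print(to_float32(price) == price)    # True
print(to_float32(rating) == rating)  # False: the 32-bit value is rounded differently
print(0.1 + 0.2 == 0.3)              # False even in 64-bit precision
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True with decimal arithmetic
```

The last two lines show why exact decimal arithmetic is often preferred for money.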

2.1.3 String type

String types store text data. ClickHouse offers String and FixedString. String holds values of arbitrary length, while FixedString(N) holds values of exactly N bytes: shorter values are padded with null bytes, and longer values are rejected.

Usage scenario: text such as names, addresses, and descriptions. FixedString suits values whose byte length is always the same, such as fixed-width codes.
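The fixed-length behavior can be modeled in a few lines of Python. This is a sketch of the padding rule only, with `to_fixed_string` as a hypothetical helper; note that the length is counted in bytes, not characters.

```python
def to_fixed_string(value: str, n: int) -> bytes:
    """Model FixedString(n): pad with null bytes, reject values longer than n bytes."""
    data = value.encode("utf-8")
    if len(data) > n:
        raise ValueError(f"{value!r} needs {len(data)} bytes, limit is {n}")
    return data.ljust(n, b"\x00")


print(to_fixed_string("CN", 4))        # b'CN\x00\x00'
print(len(to_fixed_string("北京", 6)))  # 6: two characters, six UTF-8 bytes
```

A value such as "Beijing" would raise an error for n = 4, which is why FixedString is a poor fit for free-form text.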

2.2 Composite data types

2.2.1 Array

The Array type stores multiple values of the same element type in a single column value. The element type can be any type, including basic types and other composite types.

Usage scenarios: the tags of an article, the products in an order, etc.

2.2.2 Enumeration types

The Enum type stores one of a predefined set of named values. Each name is mapped to an integer, and only the integer is stored, which saves storage space and speeds up comparisons.

Usage scenario: limited, fixed value sets such as gender or order status.
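The string-to-integer mapping behind an enumeration can be sketched in Python as follows. The status names and numbers are hypothetical; in ClickHouse the equivalent declaration would look like Enum8('created' = 1, 'paid' = 2, 'shipped' = 3).

```python
# Hypothetical order statuses and their stored integer codes.
ORDER_STATUS = {"created": 1, "paid": 2, "shipped": 3}
STATUS_BY_CODE = {code: name for name, code in ORDER_STATUS.items()}


def encode_status(name: str) -> int:
    """Return the integer stored for a status; unknown names are rejected."""
    if name not in ORDER_STATUS:
        raise ValueError(f"unknown status: {name!r}")
    return ORDER_STATUS[name]


def decode_status(code: int) -> str:
    """Return the status name for a stored integer code."""
    return STATUS_BY_CODE[code]


print(encode_status("paid"))  # 2
print(decode_status(3))       # shipped
```

Storing one small integer per row instead of the full string is where the space saving comes from.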

2.2.3 Tuple

The Tuple type is used to store fixed-length ordered lists. Each element in the tuple can be of any type.

Usage scenario: Store fixed-length combined data such as latitude and longitude, RGB color values, etc.

2.2.4 Map

The Map type stores key-value pairs, where all keys share one type and all values share another. Individual values can be looked up by key.

Usage scenarios: Store user attributes, product metadata, etc.

2.2.5 Nullable

The Nullable type wraps another type so that a value can be NULL in addition to any regular value of that type. ClickHouse keeps a separate flag for each value that records whether it is NULL.

Usage scenario: fields that may be empty, such as user nicknames or product discounts.
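One way to picture the null flag is as a second column stored next to the data. The Python sketch below is a simplified model of that idea, not ClickHouse's actual file layout; `split_nullable` is a made-up helper.

```python
def split_nullable(values: list, default):
    """Split values into (data, null_mask); NULL slots hold a default placeholder."""
    null_mask = [1 if v is None else 0 for v in values]
    data = [default if v is None else v for v in values]
    return data, null_mask


nicknames = ["Zhang San", None, "Li Si"]
data, null_mask = split_nullable(nicknames, default="")
print(data)       # ['Zhang San', '', 'Li Si']
print(null_mask)  # [0, 1, 0]
```

The extra mask is the reason a Nullable column costs more than its plain counterpart.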

2.3 Special data types

2.3.1 Date and time types

ClickHouse provides Date and DateTime types for storing dates and times. The Date type is used to store dates (year, month, day), and the DateTime type is used to store dates and times.

Usage scenarios: Store user registration time, order creation time, etc.
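Internally, a date can be represented as a count of days since 1970-01-01 and a timestamp as a count of seconds since the same epoch. The Python sketch below shows that encoding with the standard `datetime` module; the helper names are assumptions made for this example.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Number of days between 1970-01-01 and the given date."""
    return (d - EPOCH).days


def datetime_to_seconds(dt: datetime) -> int:
    """Unix timestamp (whole seconds) of a timezone-aware datetime."""
    return int(dt.timestamp())


print(date_to_days(date(2021, 8, 1)))                                # 18840
print(datetime_to_seconds(datetime(2021, 8, 1, tzinfo=timezone.utc)))  # 1627776000
```

Because dates become plain integers, range filters on them are simple numeric comparisons.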

2.3.2 UUID

The UUID type stores universally unique identifiers as 16-byte values. The type itself only stores identifiers; new random ones are produced by a function such as generateUUIDv4().

Usage scenarios: Store the unique ID of the user, the unique ID of the order, etc.
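Python's standard `uuid` module makes the size of a UUID easy to check. This sketch uses the sample identifier from the table above and is independent of ClickHouse.

```python
import uuid

# The textual form is 36 characters, but the value itself is 16 bytes.
user_id = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(len(str(user_id)))   # 36
print(len(user_id.bytes))  # 16

# A freshly generated random (version 4) identifier.
new_id = uuid.uuid4()
print(new_id.version)      # 4
```

Storing the 16-byte value instead of the 36-character string is what makes a dedicated UUID type worthwhile.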

2.3.3 IP address

ClickHouse provides IPv4 and IPv6 types for storing IP addresses.

Usage scenario: Store the user's IP address, the server's IP address, etc.
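An IPv4 address is a 32-bit number and an IPv6 address a 128-bit number, which is why dedicated types are more compact than strings. The sketch below shows this with Python's standard `ipaddress` module; `ipv4_to_uint32` is a helper name invented here.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Convert dotted-quad text to the 32-bit integer it represents."""
    return int(ipaddress.IPv4Address(text))


print(ipv4_to_uint32("192.168.1.1"))               # 3232235777
print(len(ipaddress.IPv4Address("192.168.1.1").packed))  # 4 bytes
print(len(ipaddress.IPv6Address("::1").packed))          # 16 bytes
```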

2.3.4 AggregateFunction

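The AggregateFunction type stores the intermediate state of an aggregate function rather than its final result. States from different rows or data parts can be merged later and finalized at query time, which makes incremental pre-aggregation possible.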

Usage scenarios: counting the average active time of users, calculating the total sales of goods, etc.
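The idea of an intermediate state is easiest to see for an average: instead of storing the average itself, store the running sum and count, which can be merged safely. The Python class below is a conceptual sketch with invented names (`AvgState`, `add`, `merge`, `result`), not ClickHouse's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Partial state of an average: enough information to merge and finalize."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "AvgState":
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other: "AvgState") -> "AvgState":
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        return self.total / self.count


def state_of(values: list[float]) -> AvgState:
    state = AvgState()
    for v in values:
        state = state.add(v)
    return state


# Two batches of active hours, aggregated separately and merged later.
batch_a = state_of([4.0, 6.0])
batch_b = state_of([5.0, 7.0, 8.0])
print(batch_a.merge(batch_b).result())  # 6.0
```

Averaging the two batch averages (5.0 and about 6.67) would give the wrong answer, which is why the state, not the result, has to be stored.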

2.4 Selection and use of data types

2.4.1 How to choose the appropriate data type

When selecting data types, factors such as the nature of the data, business needs, and query efficiency need to be considered. For example, if the data may be empty, you should choose the Nullable type; if the data is limited and fixed, you may consider using the Enum type.

2.4.2 Data type conversion

ClickHouse provides a series of functions that can convert between different data types. For example, you can use the toString function to convert a number to a string, and the toInt32 function to convert a string to an integer.

3. Differences between similar data types

  1. The difference between Int and UInt: Integers of type Int can be negative, while integers of type UInt can only be non-negative.

  2. The difference between Float32 and Float64: Float32 is a single-precision floating-point number, and Float64 is a double-precision floating-point number.

  3. The difference between String and FixedString: The String type can store strings of any length, while the FixedString type needs to specify the length of the string when defining.

  4. The difference between Date, DateTime, and DateTime64: The Date type is used to represent dates, DateTime is used to represent dates and times, and DateTime64 provides higher time precision.

  5. The difference between Enum8 and Enum16: Enum8 can store up to 256 enumeration values, while Enum16 can store up to 65536 enumeration values.

  6. The difference between Tuple and Nested: A Tuple stores one fixed group of values whose types may differ, while a Nested column stores a variable number of records with the same structure in each row, much like a small table inside a cell.

4. Other data types and considerations

4.1. ClickHouse geographical location data types

  1. Definition and use of Point, LineString, Polygon and other types:

Point: Represents a single location as a pair of coordinates, stored as Tuple(Float64, Float64). The examples here use the order (longitude, latitude), for example (30.5, 50.2).

LineString: Represents a series of connected line segments, stored as an array of points. For example: [(30.5, 50.2), (31.5, 51.2), (32.5, 52.2)].

Polygon: Represents a polygonal area, stored as an array of rings, where each ring is an array of points. The first ring is the outer boundary and any further rings are holes. For example: [[(30.5, 50.2), (31.5, 51.2), (32.5, 52.2), (30.5, 50.2)]]. This example closes the ring by repeating the first point as the last, a convention that many geospatial formats require.

Usage scenarios of the geographical location data type: It is suitable for storing and querying geospatial information, such as map applications, logistics, travel and other scenarios that require geographical location analysis.
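The nesting of these types (a line is a list of points, a polygon is a list of rings) can be sketched with plain Python tuples and lists. The type aliases and the `close_ring` helper below are illustrative assumptions, not a ClickHouse API.

```python
Point = tuple[float, float]   # (longitude, latitude), as in the examples above
Ring = list[Point]            # a sequence of points forming a boundary
Polygon = list[Ring]          # first ring is the outer boundary


def close_ring(ring: Ring) -> Ring:
    """Return a copy of the ring whose last point repeats the first."""
    if ring and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return list(ring)


open_ring: Ring = [(30.5, 50.2), (31.5, 51.2), (32.5, 52.2)]
polygon: Polygon = [close_ring(open_ring)]
print(polygon[0][0] == polygon[0][-1])  # True: the ring is closed
```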

4.2. ClickHouse null and non-null values

  1. The definition and use of NULL and non-nullable columns:

NULL: Represents a missing or unknown value. To allow it, declare the column with a Nullable type when defining the table, for example Nullable(String).

Non-nullable: The column does not accept NULL, so every row must have a value. This is the default: a column is non-nullable unless it is declared Nullable. NOT NULL is a column modifier rather than a separate data type.

Usage scenarios of null and non-null values: When data may be incomplete or some fields cannot be obtained, a Nullable type can store the gaps. If a column always has a value, leave it non-nullable, which avoids the extra null flag and generally improves storage efficiency and query performance.

4.3. ClickHouse data type conversion

  1. How to convert between different data types: You can use the CAST function for type conversion. For example, to convert a String value to Int32: CAST('123' AS Int32).

  2. Notes on data type conversion: When performing type conversion, you need to ensure that the original data can be successfully converted to the target type, otherwise data loss or conversion errors may result.
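A defensive conversion checks both that the text parses and that the result fits the target type. The Python sketch below models that idea for a 32-bit integer; `to_int32_or_none` is a hypothetical helper, and Python's `int()` is more lenient than a database parser (it accepts surrounding whitespace, for instance), so treat it as an approximation. ClickHouse has functions in the same spirit, such as toInt32OrNull.

```python
from typing import Optional

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def to_int32_or_none(text: str) -> Optional[int]:
    """Parse text as a 32-bit integer; return None instead of failing."""
    try:
        value = int(text)
    except ValueError:
        return None          # not a number at all
    if not INT32_MIN <= value <= INT32_MAX:
        return None          # a number, but outside the Int32 range
    return value


print(to_int32_or_none("123"))         # 123
print(to_int32_or_none("abc"))         # None
print(to_int32_or_none("9999999999"))  # None: too large for 32 bits
```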

4.4. Performance considerations for ClickHouse data types

ClickHouse's data type selection has an important impact on storage space and query performance. When designing your database architecture, consider the following performance factors to optimize your system's performance:

1. Storage space impact:

  • Data type size: Different data types occupy different amounts of storage space, and choosing smaller types saves space. For example, using Int8 instead of Int32 reduces the uncompressed size of an integer column to 1/4.
  • Compression: ClickHouse has powerful data compression capabilities that can significantly reduce the size of data storage. Depending on the characteristics of the data type, selecting appropriate compression algorithms and settings can further reduce storage space usage.

2. Query performance impact:

  • Computational complexity of data types: Certain data types may be more complex to compute and operate on than others. For example, comparisons and pattern matching of string types are generally more time-consuming than integer types. Selecting data types with lower computational complexity can improve query performance.
  • Indexing and filtering efficiency: Indexing and filtering are key to query performance. Some data types support more efficient indexing and filtering operations. For example, using a date type instead of a string type can make time range queries faster.
  • Ordering of the data: ClickHouse is a columnar database, so the values of each column are stored together. Columns whose values are sorted or contain long runs of similar values compress better and are cheaper to scan. For example, an ordered integer column is generally easier to compress and query than a column of unordered strings.
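The effect of ordering on compression can be demonstrated with a small experiment. The Python sketch below compresses the same integers twice, once shuffled and once sorted. It uses `zlib` from the standard library purely as a stand-in compressor (an assumption for this example; ClickHouse uses its own codecs), so the absolute numbers are not representative, only the direction of the difference.

```python
import random
import struct
import zlib


def compressed_size(values: list[int]) -> int:
    """Pack values as 4-byte unsigned integers and return the zlib-compressed size."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return len(zlib.compress(raw))


rng = random.Random(42)                                 # fixed seed for repeatability
values = [rng.randrange(100) for _ in range(10_000)]    # low-cardinality column

shuffled_size = compressed_size(values)
sorted_size = compressed_size(sorted(values))
print(sorted_size < shuffled_size)  # True: long runs of equal values compress far better
```

Both inputs hold exactly the same 40,000 bytes of data; only the order differs.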

When selecting a data type, weigh storage space, query performance, and the meaning of the data together. The best choice depends on the application and the characteristics of the data, so test candidate types against realistic workloads.

At the same time, be careful to avoid over-optimization. In some cases, the small storage space savings or query performance improvements may not be worth the complexity and additional development costs. Therefore, it is very important to evaluate and test the comprehensive performance in real scenarios.

| Data type | Storage space occupied (uncompressed) |
|---|---|
| UInt8 | 1 byte |
| Int8 | 1 byte |
| UInt16 | 2 bytes |
| Int16 | 2 bytes |
| UInt32 | 4 bytes |
| Int32 | 4 bytes |
| UInt64 | 8 bytes |
| Int64 | 8 bytes |
| Float32 | 4 bytes |
| Float64 | 8 bytes |
| Decimal(P, S) | 4, 8, 16, or 32 bytes, depending on the precision P (up to 9, 18, 38, or 76 digits) |
| String | The string's length in bytes plus a short length prefix |
| FixedString(N) | N bytes |
| Date | 2 bytes |
| DateTime | 4 bytes |
| Enum8 | 1 byte |
| Enum16 | 2 bytes |
| Array(T) | Depends on the element type and the number of elements |
| Tuple(T1, T2,…) | Sum of the sizes of the element types |
| Nullable(T) | Size of the underlying type plus a null flag per value |
| UUID | 16 bytes |
| IPv4 | 4 bytes |
| IPv6 | 16 bytes |
| Nested | Depends on the nested structure and the data types of the individual fields |

The above are some common ClickHouse data types and their general impact on storage space. Note that actual storage space may be affected by data compression, specific settings of the column engine, and other factors. Therefore, in specific applications, it is best to conduct actual testing and evaluation to obtain accurate storage space usage.
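For fixed-width types, a rough uncompressed size estimate is simply bytes per row times the number of rows. The Python sketch below does that arithmetic for a hypothetical five-column schema; it covers only a subset of fixed-width types, ignores compression and variable-length columns, and the names `FIXED_SIZES` and `uncompressed_bytes` are invented for this example.

```python
# Uncompressed size in bytes of some fixed-width types.
FIXED_SIZES = {
    "UInt8": 1, "Int8": 1,
    "UInt16": 2, "Int16": 2,
    "UInt32": 4, "Int32": 4,
    "UInt64": 8, "Int64": 8,
    "Float32": 4, "Float64": 8,
    "Enum8": 1, "Enum16": 2,
    "UUID": 16, "IPv4": 4, "IPv6": 16,
}


def uncompressed_bytes(column_types: list[str], rows: int) -> int:
    """Estimate raw storage for fixed-width columns, before any compression."""
    return rows * sum(FIXED_SIZES[t] for t in column_types)


# Hypothetical schema: user id, age, view count, rating, client IP.
schema = ["UUID", "UInt8", "Int64", "Float64", "IPv4"]
print(uncompressed_bytes(schema, 1_000_000))  # 37000000
```

Real usage will usually be lower after compression, so treat the result as an upper-bound starting point.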

Origin blog.csdn.net/wangshuai6707/article/details/132920839