Given a text file in which each line holds one stock record, write a program to extract all stock codes

Question: Given a text file in which each line holds one stock record, write a program to extract all stock codes. A stock code follows this rule: 6 digits,
followed by .SH or .SZ.
Example of file content:
2020-08-08; Ping An Bank (000001.SZ); 15.55; 294.00 billion
2020-08-08; Hengrui Medicine (600276.SH); 95.32; 495.65 billion (including non-tradable market value)
... ...
2020-08-08; CATL (300750.SZ); 205.32; 465.7 billion
Output:
['000001.SZ', '600276.SH', ..., '300750.SZ']

Answer:

import re

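# Open the file and read all lines
with open('file.txt', 'r') as f:
    lines = f.readlines()

# Define the regular expression: 6 digits, a literal dot, then SH or SZ.
# (A character class such as [SZ]{2} would match .SS/.ZZ/.ZS but never .SH.)
pattern = re.compile(r'\d{6}\.(?:SH|SZ)')

# Extract all stock codes
codes = []
for line in lines:
    match = pattern.search(line)
    if match:
        codes.append(match.group())

# Print all stock codes
print(codes)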

In this sample code, we first open the file and read all of its lines into a list. We then compile a regular expression that matches stock codes. Next, we loop over the lines and search each one for a code that matches the pattern. Each match is appended to a list, which is printed at the end.
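
The same steps can be sketched as a small function that works on any iterable of lines. This is only an illustration under a few assumptions: the function name extract_stock_codes is hypothetical, findall is used instead of search so that a line containing more than one code would contribute all of them, and the lookarounds in the pattern are an added safeguard that the original question does not require.

```python
import re

# Six digits, a literal dot, then SH or SZ. The lookarounds are an added
# assumption: they stop a longer digit run or a longer suffix from
# matching partially.
STOCK_CODE = re.compile(r'(?<!\d)\d{6}\.(?:SH|SZ)(?![A-Za-z])')


def extract_stock_codes(lines):
    """Return every stock code found in an iterable of text lines, in order."""
    codes = []
    for line in lines:
        # findall returns all non-overlapping matches in the line
        codes.extend(STOCK_CODE.findall(line))
    return codes


# Sample lines taken from the example file content in the question
sample = [
    '2020-08-08; Ping An Bank (000001.SZ); 15.55; 294.00 billion',
    '2020-08-08; Hengrui Medicine (600276.SH); 95.32; 495.65 billion',
    '2020-08-08; CATL (300750.SZ); 205.32; 465.7 billion',
]
print(extract_stock_codes(sample))
# prints ['000001.SZ', '600276.SH', '300750.SZ']
```

Because the function accepts any iterable of strings, an open file object can be passed to it directly instead of the list returned by readlines().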

Note that 'file.txt' is only a placeholder; replace it with the actual name of your file.

However, running this code may raise an error:

content = f.readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa1 in position 21: illegal multibyte sequence

Reason:

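This error usually means that the file's encoding does not match the encoding Python uses to decode it. When no encoding is given, open() falls back to the platform default, which here is GBK (as the error message shows), while the file is evidently stored in a different encoding, most likely UTF-8. You can specify the file's encoding explicitly, for example: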

with open('file.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

So when this problem occurs, the line

with open('file.txt', 'r') as f:

becomes

with open('file.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
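
If the encoding of the file is not known in advance, one possible approach is to try a short list of candidate encodings in turn. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name read_lines, the candidate list ('utf-8', 'gbk'), and the sample line written to a temporary file are all hypothetical.

```python
import os
import tempfile


def read_lines(path, encodings=('utf-8', 'gbk')):
    """Return the file's lines, decoded with the first encoding that works."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.readlines()
        except UnicodeDecodeError as exc:
            # This encoding cannot decode the file; remember why and try the next
            last_error = exc
    raise last_error


# Demo: write one line as GBK bytes, then read it back without naming the encoding
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stocks.txt')
    with open(path, 'wb') as f:
        f.write('2020-08-08; 平安银行 (000001.SZ); 15.55\n'.encode('gbk'))
    lines = read_lines(path)

print(lines)
```

UTF-8 is tried first because it is strict and rejects most non-UTF-8 byte sequences, whereas GBK can decode bytes from other encodings without an error and return garbled text. If you know the file's encoding, passing it to open() directly remains the simpler choice.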

 

 


Origin blog.csdn.net/CSH__/article/details/130471981