Converting numbers to Chinese in Python: the pure string method is fastest

Over the past few days, while writing a small program, I needed to convert Arabic numerals into Chinese numerals, for example '101' into '一百零一' and '10000' into '一万'.

A program like this has several tricky points:

1. Adding units: the digits need units such as '十', '百', '千' and '万'.

2. Removing redundant '零': when two or more consecutive digits are 0, Chinese reads only a single '零', and trailing zeros are not read at all. For example, '10001' is read as '一万零一', while '10000' is simply '一万'.

3. Keeping the larger units: while removing zeros, units such as '万' and '亿' must be kept, but '百' and '千' are not needed.
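
The three rules can be pinned down as a handful of expected results, which is handy for checking any converter. This is only an illustrative sketch: `EXPECTED` and `check` are names invented here, not part of the programs in this article.

```python
# Expected readings for a few numbers that exercise the three rules.
EXPECTED = {
    101: '一百零一',      # a single zero between digits is read as '零'
    1001: '一千零一',     # a run of zeros is still read as one '零'
    1010: '一千零一十',   # a trailing zero is not read
    10000: '一万',        # trailing zeros vanish, but '万' stays
    10001: '一万零一',    # '万' is kept in front of the '零'
}

def check(convert):
    """Return the (number, got, wanted) triples that `convert` gets wrong."""
    return [(n, convert(n), want)
            for n, want in EXPECTED.items()
            if convert(n) != want]
```

Calling `check(some_converter)` returns an empty list when all five cases pass.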

 

To solve this, I came up with two approaches (neither uses a ready-made library):

1. String substitution method:

Step 1: Convert the number with str() and then list(), and replace each character of '0123456789' with the matching character of '零一二三四五六七八九'.

Step 2: Add units where needed.

Step 3: Remove the redundant units and zeros.
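
For numbers below ten thousand, the three steps might be sketched as follows. This is a minimal illustration rather than one of the timed programs: `DIGIT_TABLE`, `UNITS` and `to_chinese_by_string` are names made up for the sketch, and the cleanup in step 3 is done with regular expressions for brevity.

```python
import re

# Hypothetical lookup tables for this sketch.
DIGIT_TABLE = str.maketrans('0123456789', '零一二三四五六七八九')
UNITS = ['', '十', '百', '千']  # unit of each position, counted from the right

def to_chinese_by_string(n):
    """Convert 0 < n < 10000 using string operations only."""
    # Step 1: str(), then list(), with every digit replaced by its character.
    chars = list(str(n).translate(DIGIT_TABLE))
    size = len(chars)
    # Step 2: attach a unit to every digit, including the zeros.
    text = ''.join(c + UNITS[size - 1 - i] for i, c in enumerate(chars))
    # Step 3: a '零' keeps no unit, runs of '零' collapse, a trailing '零' goes.
    text = re.sub('零[十百千]', '零', text)
    text = re.sub('零+', '零', text)
    return text.rstrip('零')
```

For 1001, step 2 produces '一千零百零十一', and step 3 reduces it to '一千零一'.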


2. Arithmetic method:

Step 1: Write a function that floor-divides the original number by the corresponding power of 10. For example, 200000 is divided by 100000. The quotient is the digit for that unit, here a 2 in the hundred-thousands place.

Step 2: Take the remainder of that division (using %), floor-divide it by its own corresponding power of 10, take the remainder again, and so on in a loop.

Step 3: After each division, convert the quotient into a Chinese digit and concatenate it with the Chinese unit for that position.

Step 4: If the remainder has two or more fewer digits than the number it came from, append a '零' to the string and skip the units in between. In theory, no extra '零' or unit is ever generated.
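
The loop described in these four steps can be sketched for numbers below ten thousand as shown here. It is an illustration under that assumption, not the timed Program 1, and `DIGITS`, `UNITS` and `to_chinese_by_math` are hypothetical names.

```python
DIGITS = '零一二三四五六七八九'
UNITS = ['', '十', '百', '千']  # indexed by the power of 10

def to_chinese_by_math(n):
    """Convert 0 < n < 10000 by repeated floor division and remainder."""
    text = ''
    power = len(str(n)) - 1
    while n:
        # Steps 1 and 2: quotient is the leading digit, remainder is the rest.
        digit, rest = divmod(n, 10 ** power)
        # Step 3: digit plus the unit of its position.
        text += DIGITS[digit] + UNITS[power]
        # Step 4: the remainder lost two or more digits, so say one '零'.
        if rest and len(str(n)) - len(str(rest)) >= 2:
            text += '零'
        n = rest
        power = len(str(rest)) - 1
    return text
```

For the example in step 1, `divmod(200000, 10 ** 5)` returns `(2, 0)`: a 2 in the hundred-thousands place and nothing left over.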


Interestingly, method 2 is a very refined idea. During implementation, the program can be simplified further by splitting the number (for example, splitting 100000 into 10 + 0000): only a routine for numbers up to the thousands place is needed, and a '万' is added after converting the part above it. Its control over redundant '零' is also excellent: apart from the end of the number, no useless '零' or unit is generated.
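
The split at '万' might look like the sketch below. It is not the author's code: `below_10000` and `to_chinese` are hypothetical helpers, and the sketch assumes 0 < n < 10**8.

```python
DIGITS = '零一二三四五六七八九'
UNITS = ['', '十', '百', '千']  # indexed by the power of 10

def below_10000(n):
    """Convert 0 <= n < 10000; returns '' for 0."""
    text = ''
    pending_zero = False
    for power in (3, 2, 1, 0):
        digit, n = divmod(n, 10 ** power)
        if digit:
            if pending_zero:
                text += '零'  # one '零' for the whole run of zeros
            text += DIGITS[digit] + UNITS[power]
            pending_zero = False
        elif text:
            pending_zero = True  # a zero after a non-zero digit
    return text

def to_chinese(n):
    """Convert 0 < n < 10**8 by splitting the number at the '万' boundary."""
    high, low = divmod(n, 10000)
    if high == 0:
        return below_10000(low)
    text = below_10000(high) + '万'
    if 0 < low < 1000:
        text += '零'  # the low part starts with zeros, e.g. 10 + 0001
    return text + below_10000(low)
```

With this split, 100000 becomes 10 and 0000, so only '一十' has to be converted before the '万' is added.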

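Method 1, on the other hand, fits the Python way of thinking very well: although it generates countless redundant '零' and units, it uses almost no arithmetic along the way, only simple statements such as if and del. For the final removal of redundant units, a regular expression can be used to improve efficiency.


Each method has its strengths and weaknesses. Based on the two approaches, I wrote three programs to complete the task and timed them to compare their efficiency:

Task: convert every number from 1 to 9999999 and produce the corresponding list.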
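
A timing harness for this kind of task might look like the sketch below. `time_conversion` and its `upper` parameter are invented for illustration, and `time.perf_counter()` is used here because it is intended for measuring intervals. The default range is far smaller than the full task so that a trial run finishes quickly.

```python
import time

def time_conversion(convert, upper=100000):
    """Convert 1..upper with `convert`; return the list and the elapsed seconds."""
    start = time.perf_counter()
    result = [convert(i) for i in range(1, upper + 1)]
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example run with str() standing in for a real converter.
rank, seconds = time_conversion(str, upper=1000)
print('Total time: %0.2fs' % seconds)
```

Passing the same `upper` to each converter keeps the comparison fair.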

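Program 1 approach: the arithmetic method. Floor-divide by the corresponding power of 10 and take the remainder, convert the quotient into a Chinese digit, append the corresponding unit, then repeat on the remainder. If the remainder is non-zero and two or more digits shorter than the number it came from, append a '零'.


Program 2 approach: the string method plus a regular expression. First turn the number into a list.

Replace every digit with its Chinese character and insert the units at the corresponding positions. If the digit at a position is '零', no unit is inserted and another '零' is inserted instead. Then a regular expression matches '零+' and replaces it with a single '零', the trailing '零' is removed, and the string is returned.


Program 3 approach: the pure string method. First turn the number into a list, replace the digits with Chinese characters and insert the units. Reverse the list, search it for '零', then check whether the element two places after the found position i, [i+2], is a unit or '零'. If it is, delete [i+1] and recurse. This removes all the redundant units and zeros that follow a zero. Then search again and delete the element just to the left of every '零', [i-1] (the unit that originally belonged to it), and finally delete the trailing '零'.

Enough talk, here is the code:

Program 1: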

num=['零','一','二','三','四','五','六','七','八','九']

k=['零','十','百','千','万','十','百']
import time

def rankid():
    rank=[]
    for i in range(9999999):
        a=tstr(i)
        rank.append(a)
    return rank

# Floor-divide, take the remainder and concatenate; return the resulting string and the remainder
def turn(x,y):
    if y>= 1:
        a=x//pow(10,y)
        b=x%pow(10,y)
        c=num[a]+k[y]
        if y>4 and b<pow(10,4):
            c+=k[4]
        if (len(str(x))-len(str(b))) >= 2 and b != 0:
            c+=k[0]
    else:
        a=x
        b=0
        c=num[a]

    return (c,b,)
# Call turn() repeatedly until every digit has been processed, then return the result
def tstr(x):
    c=turn(x,(len(str(x))-1))
    a=c[0]
    b=c[1]
    while b != 0:
        a+=turn(b,(len(str(b))-1))[0]
        b=turn(b,(len(str(b))-1))[1]

    return a

start=time.time()

ranki=rankid()
end=time.time()-start
print('Total time: %0.2fs'%end)

Total time: 362.93s

The program is elegant, but unfortunately too slow.


Program 2:

import re,time
# Main routine
def ranki():
    rank=[]
    for i in range(9999999):
        i=turn(i)
        rank.append(i)
    return rank
# If the number goes beyond '万', split it into two parts to save code and run time
def turn(x):
    i=str(x)
    if len(i) >4:
        i=tran(i[0:-4])+'万'+tran(i[-4:])
    else:
        i=tran(i[-4:])
    return i
# Convert the digits and insert the units; where the digit is '零', insert another '零' instead so the regex can collapse them
def tran(x):
    num=['零','一','二','三','四','五','六','七','八','九']
    kin=['零','十','百','千']
    x=list(reversed(x))
    for i in x:
        x[(x.index(i))]=num[int(i)]
    if len(x) >= 2:
        if x[1]==num[0]:
            x.insert(1,kin[0])
        else:
            x.insert(1,kin[1])
        if len(x) >= 4:
            if x[3]==num[0]:
                x.insert(3,kin[0])
            else:
                x.insert(3,kin[2])
            if len(x) >= 6:
                if x[5]==num[0]:
                    x.insert(5,kin[0])
                else:
                    x.insert(5,kin[3])
    x=delz(x)
    return x
# Remove the redundant '零'
# reversed() really does work on both lists and strings.
# The if statement avoids running the regex on data that does not need it
def delz(x):
    x=''.join(x)
    if '零零'in x:
        x=re.sub('零+','零',x)
    if x.startswith('零'):
        x=list(x)
        x.remove('零')
    x=reversed(x)
    x=''.join(x)
    return x
start=time.time()
rank=ranki()
end=time.time()-start
print('Total time: %0.2fs'%end)
Total time: 181.69s

It is twice as fast as the first.


Program 3:

num=['零','一','二','三','四','五','六','七','八','九']
kin=['十','百','千','万','零']
import time

def sadd(x):
    x.reverse()
    if len(x) >= 2:
        x.insert(1,kin[0])
        if len(x) >= 4:
            x.insert(3,kin[1])
            if len(x) >= 6:
                x.insert(5,kin[2])
                if len(x) >= 8:
                    x.insert(7,kin[3])
                    if len(x) >= 10:
                        x.insert(9,kin[0])
                        if len(x) >= 12:
                            x.insert(11,kin[1])

    x=fw(x)
    x=d1(x)
    x=d2(x)
    x=dl(x)
    return x
    
    
def rankis():
    rank=[]
    for i in range(9999999):
        i=list(str(i))
        for j in i:
            i[(i.index(j))]=num[int(j)]
        i=sadd(i)
        rank.append(i)
    return rank


def d1(x):
    if '零' in x:
        a=x.index('零')
        if a==0:
            del x[0]
            d1(x)
        else:
            if x[a+2] in ['十','百','千','万','零']:
                if x[a+1] != '万':
                    del x[a+1]
                    d1(x)     
    return x
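def d2(x):
    # Delete the element just before each '零' when it is a unit or another
    # '零'. In the reversed list that element is the zero's own unit at [i-1].
    # '万' is not in the list, so it is never deleted.
    i=1
    while i < len(x):
        if x[i]=='零' and x[i-1] in ['十','百','千','零']:
            del x[i-1]
            i=max(i-1,1)
        else:
            i+=1
    return x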

def fw(x):
    if len(x) >= 9:
        if x[8] == '零':
            del x[8]
    return x
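def dl(x):
    # Drop a '零' left at the front of the reversed list (the end of the
    # number), then restore the reading order and join into a string.
    if x and x[0]=='零':
        del x[0]
    x.reverse()
    x=''.join(x)
    return x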
start=time.time()
rank=rankis()
end=time.time()-start
print('Total time: %0.2fs'%end)
Total time: 123.68s

Although there is still some redundant code, it runs at an amazing speed.

Summary: my personal feeling is that Python is a string-first language, and strings can be used flexibly in many places. The program is still far from perfect, and I hope it can be made faster.

