This article explains how to convert Markdown files to plain text files. Markdown is a lightweight markup language for writing simply formatted documents. Sometimes, however, we need the content in another form, either as plain text for further processing or as HTML for viewing directly in a browser. Here is a simple way to do both in Python.
Convert to HTML
To convert Markdown files to HTML files, you can use Python's markdown library. First make sure the markdown library is installed. If it is not, you can install it with the following command:
```shell
pip install markdown
```
The Markdown file can then be converted to an HTML file with the following code:
```python
import markdown


def md_to_html(md_file, html_file):
    # Read the Markdown source
    with open(md_file, 'r', encoding='utf-8') as f:
        md_content = f.read()
    # Render the Markdown as an HTML fragment
    html_content = markdown.markdown(md_content)
    # Save the HTML to the output file
    with open(html_file, 'w', encoding='utf-8') as f:
        f.write(html_content)


md_file = 'example.md'  # path to the Markdown file
html_file = 'example.html'  # path of the converted HTML file
md_to_html(md_file, html_file)
```
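Replace example.md with the path to the Markdown file you want to convert, and example.html with the path where you want to save the HTML file. Note that markdown.markdown() returns an HTML fragment (elements such as <h1> and <p>, without <html> or <body> wrappers), so the result still contains markup rather than plain text.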
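If you want plain text out of that HTML, one option (not part of the original approach) is to strip the tags with Python's standard html.parser module. The sketch below assumes that approach; the names html_to_text and _TextExtractor are hypothetical, and the set of tags that end a line is an arbitrary choice you may want to adjust.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment, discarding every tag."""

    # Closing one of these tags starts a new line (an arbitrary choice; adjust as needed).
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "pre", "blockquote", "br"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        # Text between tags is kept as-is
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")


def html_to_text(html):
    """Strip tags from an HTML fragment and return its non-empty, trimmed lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, html_to_text(markdown.markdown(md_content)) would turn rendered Markdown into plain lines. Tags are dropped but their text is kept, so the text of a link survives while its URL does not; this differs from the line-based filter in the next section, which drops such lines entirely.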
Convert to txt
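If you want to drop images and links and keep only the plain text, we define a function md_to_txt() that takes two parameters: md_file, the path of the Markdown file, and txt_file, the directory in which the converted plain text file is saved. The function first reads the Markdown file with open() and splits its contents line by line into a list of strings, str_list. It then iterates over each line, skipping lines that contain specific keywords (such as ![ for images or https for links) and deleting specific phrases (such as 如下图所示:, Chinese for "as shown in the figure below:"). Each remaining line is appended to the variable txt_content. If the file contains title: or category: lines, their values are used to build the output file name (category_title.txt); otherwise the Markdown file name is used. Finally, the function writes txt_content to that file in the target directory and prints a completion message.
The code also defines two helpers: traverse_dir_files(), which the next section uses to collect the Markdown files, and remove_code_blocks(), which strips fenced code blocks but is not called by md_to_txt().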
```python
import os
import re


def traverse_dir_files(root_dir, ext=None, is_sorted=True):
    """
    List the files in a folder, walking subdirectories recursively.
    :param root_dir: root directory
    :param ext: file extension(s), e.g. '.md' or ['.md', '.markdown']
    :param is_sorted: whether to sort the result (slower for large trees)
    :return: list of file paths
    """
    if isinstance(ext, str):  # tuple('.md') would be ('.', 'm', 'd'), so wrap a single suffix
        ext = (ext,)
    paths_list = []
    for parent, _, file_names in os.walk(root_dir):
        for name in file_names:
            if name.startswith('.'):  # skip hidden files
                continue
            if ext and not name.endswith(tuple(ext)):  # filter by extension
                continue
            paths_list.append(os.path.join(parent, name))
    if is_sorted:
        paths_list.sort()
    return paths_list  # always a list, even when the folder is empty


def remove_code_blocks(text):
    # Remove fenced code blocks (``` ... ```) together with their contents
    return re.sub(r'```(.*?)```', '', text, flags=re.DOTALL)


def md_to_txt(md_file, txt_file):
    """Convert one Markdown file to a .txt file inside the directory txt_file."""
    txt_content = ''
    # Default output name: the Markdown file name without its extension
    title = os.path.basename(md_file).replace('.md', '').strip()
    with open(md_file, 'r', encoding='utf-8') as f:
        str_list = f.read().splitlines()
    for md in str_list:
        if '![' in md or 'https' in md:  # skip image lines and lines with links
            continue
        md = md.replace('如下图所示:', '')  # drop "as shown in the figure below:"
        txt_content += md + '\n'
        if 'title:' in md:  # a title: line overrides the file name
            title = md.replace('title:', '').strip()
        if 'category:' in md:  # prefix the category: "category_title"
            category = md.replace('category:', '').strip()
            title = category + '_' + title
    os.makedirs(txt_file, exist_ok=True)  # create the output directory if it does not exist
    with open(os.path.join(txt_file, title + '.txt'), 'w', encoding='utf-8') as f:
        f.write(txt_content)
    print("Conversion complete: %s" % md_file)
```
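Because md_to_txt() reads and writes files, its filtering rules are awkward to test on their own. The following sketch reimplements the same rules on a string; filter_markdown() and output_name() are hypothetical helpers, not part of the original code. Unlike md_to_txt(), it records the title and category separately and combines them afterwards, so the file name does not depend on which line comes first. It can also optionally remove fenced code blocks first, using a pattern equivalent to the one in remove_code_blocks().

```python
import os
import re

# Same rules as md_to_txt(): lines containing these markers are dropped...
SKIP_MARKERS = ("![", "https")
# ...and these phrases are deleted from the lines that are kept.
REMOVE_PHRASES = ("如下图所示:",)


def filter_markdown(md_text, strip_code=False):
    """Return (text, title, category) for a Markdown string.

    strip_code=True first removes fenced code blocks (three backticks to three
    backticks); md_to_txt() itself does not do this.
    """
    if strip_code:
        md_text = re.sub(r"`{3}.*?`{3}", "", md_text, flags=re.DOTALL)
    kept, title, category = [], None, None
    for line in md_text.splitlines():
        if any(marker in line for marker in SKIP_MARKERS):
            continue
        for phrase in REMOVE_PHRASES:
            line = line.replace(phrase, "")
        kept.append(line)
        if "title:" in line:
            title = line.replace("title:", "").strip()
        if "category:" in line:
            category = line.replace("category:", "").strip()
    text = "".join(line + "\n" for line in kept)  # one newline per kept line, like md_to_txt()
    return text, title, category


def output_name(md_path, title=None, category=None):
    """Build the .txt file name as [category_]title, falling back to the file name."""
    stem = title or os.path.basename(md_path).replace(".md", "").strip()
    return f"{category}_{stem}.txt" if category else f"{stem}.txt"
```

A quick check: for a document with the lines title: Tree and category: Notes, output_name("data/tree.md", title, category) gives Notes_Tree.txt, and without those lines it gives tree.txt.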
Traverse the specified directory
The function readlist() traverses all Markdown files in a given directory and calls md_to_txt() on each of them. It takes two parameters: path, the directory to traverse, and txt_dir, the directory where the converted plain text files are stored. The function calls traverse_dir_files(), defined above, to collect the paths of all files with the .md extension in the directory tree into the list path_list. It then iterates over the list and tries to convert each file with md_to_txt(). If an exception occurs during a conversion, it prints the file path and the error message and continues with the next file.
```python
def readlist(path, txt_dir):
    # Collect every .md file under path (recursively)
    path_list = traverse_dir_files(root_dir=path, ext='.md')
    for path_str in path_list:
        try:
            md_to_txt(path_str, txt_dir)
        except Exception as e:  # report the failure and move on to the next file
            print(path_str + ' ---------error-----------')
            print(e)
```
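One pitfall with extension filters: str.endswith() given tuple(ext) treats a plain string as a sequence of characters. tuple('.md') is ('.', 'm', 'd'), so 'notes.htm'.endswith(tuple('.md')) is True. Passing the extension as a list or tuple, such as ['.md'], avoids this. If you do not need the helper's options, pathlib's Path.rglob() offers a shorter way to collect the files. The sketch below assumes hidden files should still be skipped; find_markdown_files() is a hypothetical name, not part of the original code.

```python
from pathlib import Path


def find_markdown_files(root_dir, exts=(".md",)):
    """Return a sorted list of file paths under root_dir whose names end with exts.

    Hidden files (names starting with '.') are skipped, as in traverse_dir_files().
    """
    if isinstance(exts, str):  # accept a single suffix such as ".md"
        exts = (exts,)
    return sorted(
        str(p)
        for p in Path(root_dir).rglob("*")  # walks all subdirectories
        if p.is_file() and not p.name.startswith(".") and p.name.endswith(tuple(exts))
    )
```

readlist() could call find_markdown_files(path) in place of traverse_dir_files(); the returned values are ordinary path strings that md_to_txt() accepts.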
Finally, we can call readlist() from the script's main block to convert Markdown to plain text. For example, suppose we have a Markdown file data/tree.md and want to convert it to a plain text file saved in the data/txt directory. We can write the code like this:
```python
if __name__ == '__main__':
    md_dir = 'data'  # directory containing the Markdown files
    txt_dir = os.path.join('data', 'txt')  # directory for the converted plain text files
    readlist(md_dir, txt_dir)
```
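After running this code, the data/txt directory contains a plain text file converted from tree.md, named tree.txt. If the Markdown contains title: or category: lines, those values are used for the name instead (for example category_title.txt). The text content is the same as the original Markdown file, except that image lines, lines containing https links, and the phrase 如下图所示: have been removed.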
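If you would rather not edit the paths in the script each time, the two directories could be read from the command line instead. The sketch below uses the standard argparse module; parse_cli() and the option names md_dir and --txt-dir are hypothetical choices, not part of the original script.

```python
import argparse
import os


def parse_cli(argv=None):
    """Read the Markdown directory and the output directory from the command line."""
    parser = argparse.ArgumentParser(description="Convert Markdown files to plain text.")
    parser.add_argument("md_dir", help="directory containing the .md files")
    parser.add_argument("-o", "--txt-dir", default=None,
                        help="output directory (default: <md_dir>/txt)")
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    if args.txt_dir is None:  # mirror the example above: data -> data/txt
        args.txt_dir = os.path.join(args.md_dir, "txt")
    return args
```

The main block would then set args = parse_cli() and call readlist(args.md_dir, args.txt_dir), so running the script with the single argument data reproduces the example above.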