Haskell Lesson:a tokenizer

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/abcamus/article/details/79187854

一、关于pure和impure

Haskell作为一门纯函数式语言,副作用剥离是它的一大特色,基本上较为严格的遵循数学函数的形式。但是也带来了一些问题,譬如在实现一个文本编辑器的时候,需要把token记录下来,这个问题在C语言中很容易解决,搞个静态的变量即可,但是在Haskell中就没有那么容易了。可能需要这么做:

getToken >>= \s -> return (s:tokenList) >>= showInfo

getToken返回一个IO Token,通过Monad运算符(>>=)传递到一个lambda表达式,把IO Token中的Token提取出来,然后添加到tokenList后再给后一级函数。

二、Why important of purity?

purity并不是必须的。譬如实现一个编译器,必须要处理命令行输入,就不可能是完全pure的。事实上也不是所有的函数式语言都是pure的,譬如Lisp就是impure的。

三、一个tokenizer的haskell实现

这个tokenizer很简单,只是高亮了C语言的保留字。

分成了三个模块:

  • 主程序(parser)
  • LexInfo
  • DebugInfo

3.1 主parser

因为是文本界面,渲染选择了ansi-terminal这个lib,里面有提供cursor操作、清屏、着色等接口。每次通过getToken函数抓取一个token,传给parser,由parser处理后保存或者做其他处理。

{-# LANGUAGE ForeignFunctionInterface #-}
module Main
  (
    main
  ) where

import System.Console.ANSI
import System.Process
import System.IO
import System.Exit

import Control.Monad
import Data.Char(ord)

import LexInfo
import DebugInfo

resetScreen = do
  clearScreen
  cursorUpLine 1000
  return ()

isWordChar :: Char -> Bool
isWordChar ch = ch `elem` wordCharList
  where wordCharList = '_':['A'..'Z']++['a'..'z']

{- 'A' == UpArrow; 'B' == DownArrow;
   'C' == RightArrow; 'D' == LeftArrow-}
dealArrow :: Char -> IO()
dealArrow ch
  | ch == 'A' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorUpLine 1
  | ch == 'B' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorDownLine 1
  | ch == 'C' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorForward 1
  | ch == 'D' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorBackward 1
  | otherwise = return ()

cmdParser :: Char -> IO String
cmdParser ch
  | ch == 'q' = cursorBackward 1 >> clearFromCursorToLineEnd >> return "quit"
  | ch == 's' = cursorBackward 1 >> clearFromCursorToLineEnd >> return "savefile"
  | otherwise = cursorBackward 1 >> return "undefined"

getWord :: String -> IO String
getWord str = getChar >>= go str where
  go str ch = if ch == '\DEL'
                 then do
                   cursorBackward 3
                   clearFromCursorToLineEnd
                   getWord (drop 1 str)
                 else if ch == '\ESC' -- ^[[A == UpArrow
                   then do
                     getChar >> getChar >>= dealArrow
                     return $reverse str
                 else if ch == ':'
                   then
                   cursorBackward 1 >> clearFromCursorToLineEnd >> getChar >>= cmdParser
                 else if not (isWordChar ch)
                   then do
                     return $reverse (str)
                 else
                   getWord $ch:str

{- type 0 : normal word
   type 1 : keyword-}
parse :: String -> IO Token
parse str
  | str `elem` keywords  = do
    setSGR [SetColor Foreground Vivid Red] >> cursorBackward (length str + 1)
    putStr str >> putChar ' '
    setSGR [Reset]
    return (Token str 1)
  | str == "quit" = return (Token "" (-1))
  | str `elem` whiteSpace = return Empty
  | otherwise = return (Token str 0)
  where keywords = getKeyRsv 'c'
        whiteSpace = [" ", "\n", "\t"]

saveToken :: Handle -> Token -> IO (Token)
saveToken tmph token = if tokenType token == -1
                          then
                          hClose tmph >> exitSuccess
                          else do
                            hPutStr tmph $tokenValue token
                            return token

getToken :: IO Token
getToken = getWord "" >>= parse

main :: IO ()
main = do
  hSetBuffering stdin NoBuffering
  hSetBuffering stdout NoBuffering
  resetScreen

  forever $do
    let tokenList = [] in
      mainloop tokenList
        where mainloop tokenList = getToken >>= \s -> return (s:tokenList) >>= showInfo False >>= mainloop

3.2 LexInfo

LexInfo的话目前只是枚举了C语言的保留字,通过导出getKeyRsv函数,可以方便的添加其他语言对应的关键字。

module LexInfo
  (
    Token(..),
    getKeyRsv
  ) where

data Token = Token {
  tokenValue :: String,
  tokenType :: Int
                   } | Empty deriving (Show)

getKeyRsv :: Char -> [String]
getKeyRsv ch
  | ch == 'c' =
    ["auto", "break", "case", "char", "const", "continue", "default", "do", "double", "else", "enum", "extern", "float", "for", "goto", "if", "int", "long", "register", "return", "short", "signed", "sizeof", "static", "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"]
  | otherwise = [""]

3.3 DebugInfo

供调试用,提供showInfo接口,用来插入Monad链中辅助调试。

module DebugInfo
  (
    showInfo
  ) where

import LexInfo

showInfo :: Bool -> [Token] -> IO [Token]
showInfo flag tokenList = case flag of
                            True -> do
                              print tokenList >> return tokenList
                            otherwise -> return tokenList

欢迎关注微信公众号
这里写图片描述

猜你喜欢

转载自blog.csdn.net/abcamus/article/details/79187854