Memories of writing a parser for man pages

I generally enjoy being bored, but sometimes enough is enough. On a Sunday afternoon in 2015, I decided to start an open source project to overcome my boredom.

While looking for ideas, I stumbled upon a request by Mathias Bynens to build a "man page viewer built with web standards". Without thinking too much, I started writing a man page parser in JavaScript, which, after a lot of back and forth, ended up becoming Jroff.

Back then, I was familiar with man pages as a concept and had used them many times, but that was all I knew: I had no idea how they were generated or whether there was a standard in place. Two years later, here are some thoughts on the matter.

How man pages are written

The first thing that surprised me at the time was that man pages are, at their core, just plain text files stored somewhere in the system (you can check these directories with the manpath command).

These files contain not only the documentation, but also formatting information for a typesetting system from the 1970s called troff.

troff and its GNU implementation, groff, are programs that process a textual description of a document to produce a typeset version suitable for printing. Think of it as "what you describe is what you get" rather than WYSIWYG.

If you are not familiar with typesetting formats, you can think of them as Markdown on steroids, although that flexibility comes at the cost of a more complex syntax.
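To make the syntax a bit more concrete, here is a minimal sketch in Python. It embeds a tiny hand-written page that uses the man macros and separates control lines (requests and macro calls, which start with a dot or an apostrophe) from plain text lines. The page content, the SAMPLE constant and the split_lines helper are invented for this illustration and are not taken from Jroff.

```python
# Hypothetical, hand-written man page source using the `man` macro package.
# A raw string keeps roff escapes such as \- intact.
SAMPLE = r'''.TH HELLO 1 "2015-01-01" "hello 1.0" "User Commands"
.SH NAME
hello \- print a friendly greeting
.SH SYNOPSIS
.B hello
.RI [ name ]
'''


def split_lines(source):
    """Split roff source into (kind, line) pairs.

    Lines starting with "." or "'" are treated as control lines
    (requests or macro calls); everything else is text to typeset.
    """
    result = []
    for line in source.splitlines():
        kind = "control" if line.startswith((".", "'")) else "text"
        result.append((kind, line))
    return result


for kind, line in split_lines(SAMPLE):
    print(f"{kind:8}{line}")
```

Even in this small page, most lines are instructions to the typesetter rather than content, which is roughly what makes the format heavier to read than Markdown.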

groff files can be written by hand, or generated from other formats such as Markdown, LaTeX, HTML and so on using many different tools.

The reason groff and man pages are tied together is historical. The format has mutated over time, and its lineage is a chain of similarly named programs: RUNOFF > roff > nroff > troff > groff.

But that does not necessarily mean that groff is strictly tied to man pages: it is a general-purpose format that has been used to write books and even for phototypesetting.

It is also worth noting that groff can call a postprocessor to convert its intermediate output to a final format, which is not necessarily ASCII for terminal display. Some of the supported formats are TeX DVI, HTML, Canon, HP LaserJet4 compatible, PostScript, utf8 and so on.

Macros

Another cool feature of the format is its extensibility: you can write macros that enhance its basic functionality.

Given the long history of *nix systems, there are several macro packages that group useful macros together for specific purposes, depending on the output you want to generate, for example man, mdoc, mom, ms, mm and so on.

Man pages are usually written with the man and mdoc packages.

You can tell native groff commands apart from macros by the way the standard groff packages capitalize their macro names. In the man package, every letter of a macro name is uppercase, such as .PP, .TH, .SH and so on. In the mdoc package, only the first letter is uppercase: .Pp, .Dt, .Sh.
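The capitalization convention can be sketched as a small Python heuristic. The classify_macro function and its return labels are hypothetical names chosen for this example; it only encodes the naming pattern described here and does not check whether a macro actually exists in either package.

```python
def classify_macro(name):
    """Guess the family of a control-line name from its capitalization.

    Returns "man" for all-uppercase names, "mdoc" for names with only the
    first letter uppercase, and "request-or-other" for anything else
    (for example lowercase native requests such as .br or .sp).
    """
    name = name.lstrip(".")
    if name.isalpha() and name.isupper():
        return "man"
    if len(name) >= 2 and name.isalpha() and name[0].isupper() and name[1:].islower():
        return "mdoc"
    return "request-or-other"


for macro in (".PP", ".TH", ".Pp", ".Dt", ".br"):
    print(macro, classify_macro(macro))
```

A real parser would still need a table of known macros per package, since user-defined macros are free to ignore the convention.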

Challenges

Whether you are considering writing your own groff parser or are just curious, these are some of the problems I found most challenging.

Context-sensitive grammar

On the surface, the groff grammar is context-free. Unfortunately, because macros describe opaque bodies of tokens, the set of macros in a package may not itself implement a context-free grammar.

This kept me away (for better or worse) from the parser generators available at the time.

Nested macros

Most macros in the mdoc package are callable, which roughly means that macros can be used as arguments to other macros. For example, consider this:

  • The macro Fl (Flag) adds a dash to its argument, so Fl s produces -s
  • The macro Ar (Argument) provides a way to define the arguments of a tool
  • The macro Op (Optional) wraps its argument in brackets, because this is the standard idiom for marking something as optional
  • The combination .Op Fl s Ar file produces [-s file], because Op macros can be nested.
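The nesting described in this list can be sketched with a small recursive function in Python. This is only an illustration that covers the three macros mentioned here; the expand and render names are made up, and it is not how Jroff or groff actually implements mdoc.

```python
def expand(tokens):
    """Expand a list of mdoc-style tokens into a list of output words.

    Only Fl, Ar and Op are understood; any other token, or a macro with
    no argument after it, is passed through unchanged.
    """
    words = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_arg = i + 1 < len(tokens)
        if token == "Fl" and has_arg:
            # Flag: prefix the following token with a dash.
            words.append("-" + tokens[i + 1])
            i += 2
        elif token == "Ar" and has_arg:
            # Argument: emit the following token as is.
            words.append(tokens[i + 1])
            i += 2
        elif token == "Op":
            # Optional: expand the rest of the line, then wrap it in brackets.
            words.append("[" + " ".join(expand(tokens[i + 1:])) + "]")
            break
        else:
            words.append(token)
            i += 1
    return words


def render(line):
    """Render one control line such as '.Op Fl s Ar file'."""
    return " ".join(expand(line.lstrip(".").split()))


print(render(".Op Fl s Ar file"))  # prints [-s file]
```

The recursion in the Op branch is what lets the output of Fl and Ar end up inside the brackets.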

Lack of resources for beginners

What really confused me was the lack of a canonical, well-defined and clear source to consult. There is a lot of information online, but it assumes a great deal about the reader and takes time to grasp.

Interesting macros

To wrap up, here is a very short list of macros that I found interesting while developing jroff:

man package:

  • .TH: When writing a man page with the man macro package, the first line that is not a comment must be this macro. It takes five parameters: title, section, date, source, manual.
  • .BI: Bold alternating with italic (particularly useful for function signatures)
  • .BR: Bold alternating with roman (particularly useful for references to other man pages)
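As a rough sketch of the .TH rule, the Python snippet below finds the first line that is not a comment and maps its arguments to the five parameter names listed above. The function names and the sample page are hypothetical, and shlex is used only as an approximation of roff's double-quote handling: it does not understand roff escapes such as \-.

```python
import shlex

# Parameter names as listed above, in order.
TH_FIELDS = ("title", "section", "date", "source", "manual")


def first_non_comment_line(source):
    """Return the first non-empty line that is not a roff comment (.\\")."""
    for line in source.splitlines():
        if line.strip() and not line.startswith('.\\"'):
            return line
    return None


def parse_th(line):
    """Map the arguments of a .TH line to the five parameter names."""
    parts = shlex.split(line)
    if not parts or parts[0] != ".TH":
        raise ValueError("first non-comment line is not a .TH macro")
    args = parts[1:6]
    # Pad missing trailing arguments with empty strings.
    args += [""] * (len(TH_FIELDS) - len(args))
    return dict(zip(TH_FIELDS, args))


# Hypothetical page header, invented for this example.
PAGE = '.\\" a comment line\n.TH HELLO 1 "2015-01-01" "hello 1.0" "User Commands"\n.SH NAME\n'

print(parse_th(first_non_comment_line(PAGE)))
```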

mdoc package:

  • .Dd, .Dt, .Os: Similar to how the man package requires .TH, the mdoc package requires these three macros, in this particular order. Their names stand for: Document date, Document title and Operating system.
  • .Bl, .It, .El: These three macros are used to create lists, and their names are self-explanatory: Begin list, Item and End list.
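Both mdoc rules can be checked with a few lines of Python. The sketch below verifies that the first three macros of a page are .Dd, .Dt and .Os in that order, and collects the raw arguments of each .It between .Bl and .El. The sample page and all function names are invented for illustration, and nested lists are deliberately not handled.

```python
PROLOGUE = ["Dd", "Dt", "Os"]

# Hypothetical mdoc source, written for this example.
MDOC_SAMPLE = """.Dd January 1, 2015
.Dt HELLO 1
.Os
.Sh OPTIONS
.Bl -tag -width Ds
.It Fl s
.It Fl v
.El
"""


def macro_names(source):
    """Return the macro name of every control line, skipping comments."""
    names = []
    for line in source.splitlines():
        if line.startswith(".") and not line.startswith('.\\"'):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def has_valid_prologue(source):
    """True if the first three macros are Dd, Dt and Os, in that order."""
    return macro_names(source)[:3] == PROLOGUE


def list_items(source):
    """Collect the arguments of .It lines found between .Bl and .El."""
    items = []
    inside = False
    for line in source.splitlines():
        if line.startswith(".Bl"):
            inside = True
        elif line.startswith(".El"):
            inside = False
        elif inside and line.startswith(".It"):
            items.append(line[3:].strip())
    return items


print(has_valid_prologue(MDOC_SAMPLE))
print(list_items(MDOC_SAMPLE))
```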

via: monades.roperzh.com/memories-wr…

Author: Roberto Dip; Translator: wxy; Proofreader: wxy; Topic selection: lujun9972

This article was originally translated by LCTT and is proudly presented by Linux China.

Reproduced from: https://juejin.im/post/5d0114e851882574ce01412b

Origin blog.csdn.net/weixin_33859231/article/details/93183700