Translated from: Why Static Languages Suffer From Complexity?
Foreword
People in the programming language design community strive to make their languages more expressive, with stronger type systems, primarily to increase the efficiency of code development and to avoid code duplication in the final software. However, the more expressive their languages become, the more abruptly duplication penetrates the languages themselves.
This is what I mean by static-dynamic biformity: whenever you introduce a new abstraction in your language, it can reside at the static level, the dynamic level, or both. In the first two cases, where the abstraction exists at only one level, you introduce language inconsistency; in the third case, you inevitably introduce feature biformity.
As we know, the static level consists of statements executed at compile time; likewise, the dynamic level consists of statements executed at runtime. Thus, typical control flow operators (e.g. if/while/for/return), data structures, and procedures are dynamic, while type system features and syntactical macros are static. Essentially, most static language abstractions have their counterparts in the dynamic space and vice versa.
In the following sections, before elaborating further, let me show you how to implement logically equivalent programs using both static and dynamic approaches. Most of the examples are written in Rust, but they apply to any other general-purpose programming language with a sufficiently expressive type system; keep in mind that this article is language-agnostic and focuses on general PLT philosophy rather than any specific implementation. If you find the content too long, you can skip directly to the relevant section.
Record type - Array
Consider an everyday scenario of using record types:
struct Automobile {
wheels: u8,
seats: u8,
manufacturer: String,
}
fn main() {
let my_car = Automobile {
wheels: 4,
seats: 4,
manufacturer: String::from("X"),
};
println!(
"My car has {} wheels and {} seats, and it was made by {}.",
my_car.wheels, my_car.seats, my_car.manufacturer
);
}
(The size of Automobile here can be determined at compile time, so it is a static record type – Translator's Note)
The same can be implemented with arrays:
use std::any::Any;
#[repr(usize)]
enum MyCar {
Wheels,
Seats,
Manufacturer,
}
fn main() {
let my_car: [Box<dyn Any>; 3] = [Box::new(4), Box::new(4), Box::new("X")];
println!(
"My car has {} wheels and {} seats, and it was made by {}.",
my_car[MyCar::Wheels as usize]
.downcast_ref::<i32>()
.unwrap(),
my_car[MyCar::Seats as usize].downcast_ref::<i32>().unwrap(),
my_car[MyCar::Manufacturer as usize]
.downcast_ref::<&'static str>()
.unwrap()
);
}
If we specify an incorrect type in .downcast_ref, we will get a panic from the subsequent .unwrap(). But the logic of the program remains the same; we have merely lifted type checking up to runtime.
Going a step further, we can encode the static Automobile type as a heterogeneous list (heterogeneous list):
use frunk::{hlist, HList};
struct Wheels(u8);
struct Seats(u8);
struct Manufacturer(String);
type Automobile = HList![Wheels, Seats, Manufacturer];
fn main() {
let my_car: Automobile = hlist![Wheels(4), Seats(4), Manufacturer(String::from("X"))];
println!(
"My car has {} wheels and {} seats, and it was made by {}.",
my_car.get::<Wheels, _>().0,
my_car.get::<Seats, _>().0,
my_car.get::<Manufacturer, _>().0
);
}
This version enforces exactly the same type checking as automobile-static.rs (the previous code), but it also provides methods to operate on Automobile like an ordinary collection! For example, we might want to reverse our car:
assert_eq!(
my_car.into_reverse(),
hlist![Manufacturer(String::from("X")), Seats(4), Wheels(4)]
);
Or we might want to zip our car with someone else's car:
let their_car = hlist![Wheels(6), Seats(4), Manufacturer(String::from("Y"))];
assert_eq!(
my_car.zip(their_car),
hlist![
(Wheels(4), Wheels(6)),
(Seats(4), Seats(4)),
(Manufacturer(String::from("X")), Manufacturer(String::from("Y")))
]
);
... etc.
However, we may sometimes wish to apply type-level computation (referring to the type system's type equivalence, type compatibility, type inference, and native computation on types – Translator's Note) to ordinary structs and enums, but we cannot, because we cannot extract the structure of a type definition (its fields or variants and their signatures) from the corresponding type name, and we cannot derive a macro for a type if it lives outside our crate. To solve this problem, the Frunk developers created a derive macro, Generic, that inspects the internal structure of a type definition; the trait it implements has an associated type Repr which, for the implementing type, is equal to some form of operable HList. Still, all other types that don't derive this macro (aside from transparent types such as DTOs, which can) remain unscannable due to Rust's aforementioned limitations.
Sum type - Tree
One might find that sum types are well suited for representing AST nodes:
use std::ops::Deref;
enum Expr {
Const(i32),
Add(Box<Expr>, Box<Expr>),
Sub(Box<Expr>, Box<Expr>),
Mul(Box<Expr>, Box<Expr>),
Div(Box<Expr>, Box<Expr>),
}
use Expr::*;
fn eval(expr: &Box<Expr>) -> i32 {
match expr.deref() {
Const(x) => *x,
Add(lhs, rhs) => eval(&lhs) + eval(&rhs),
Sub(lhs, rhs) => eval(&lhs) - eval(&rhs),
Mul(lhs, rhs) => eval(&lhs) * eval(&rhs),
Div(lhs, rhs) => eval(&lhs) / eval(&rhs),
}
}
fn main() {
let expr: Expr = Add(
Const(53).into(),
Sub(
Div(Const(155).into(), Const(5).into()).into(),
Const(113).into(),
)
.into(),
);
println!("{}", eval(&expr.into()));
}
The same can be done using tagged trees:
use std::any::Any;
struct Tree {
tag: i32,
value: Box<dyn Any>,
nodes: Vec<Box<Tree>>,
}
const AST_TAG_CONST: i32 = 0;
const AST_TAG_ADD: i32 = 1;
const AST_TAG_SUB: i32 = 2;
const AST_TAG_MUL: i32 = 3;
const AST_TAG_DIV: i32 = 4;
fn eval(expr: &Tree) -> i32 {
let lhs = expr.nodes.get(0);
let rhs = expr.nodes.get(1);
match expr.tag {
AST_TAG_CONST => *expr.value.downcast_ref::<i32>().unwrap(),
AST_TAG_ADD => eval(&lhs.unwrap()) + eval(&rhs.unwrap()),
AST_TAG_SUB => eval(&lhs.unwrap()) - eval(&rhs.unwrap()),
AST_TAG_MUL => eval(&lhs.unwrap()) * eval(&rhs.unwrap()),
AST_TAG_DIV => eval(&lhs.unwrap()) / eval(&rhs.unwrap()),
_ => panic!("Out of range"),
}
}
fn main() {
let expr = /* Construction omitted... */;
println!("{}", eval(&expr));
}
Similar to our operations on struct Automobile, we can use frunk::Coproduct for the same representation.
Value - Associated type
We may wish to negate boolean values using the standard ! operator:
fn main() {
assert_eq!(!true, false);
assert_eq!(!false, true);
}
The same can be done with associated types:
use std::marker::PhantomData;
trait Bool {
type Value;
}
struct True;
struct False;
impl Bool for True {
    type Value = True;
}
impl Bool for False {
    type Value = False;
}
struct Negate<Cond>(PhantomData<Cond>);
impl Bool for Negate<True> {
type Value = False;
}
impl Bool for Negate<False> {
type Value = True;
}
const ThisIsFalse: <Negate<True> as Bool>::Value = False;
const ThisIsTrue: <Negate<False> as Bool>::Value = True;
In fact, the Turing-completeness of Rust's type system rests on this principle combined with type-level induction (as we'll see shortly). Every time you see an ordinary Rust value, know that it has a formal counterpart at the type level, in the computational sense. Every time you write an algorithm, it has a counterpart in the type system built from conceptually equivalent constructs. If you're interested in how, the article linked above provides a mathematical proof: first the author implements Smallfuck using dynamic features (sum types, pattern matching, recursion), and then using static features (logic on traits, associated types, etc.).
Recursion - Type-level induction
Let me show you one more example; this time, pay close attention!
use std::ops::Deref;
#[derive(Clone, Debug, PartialEq)]
enum Nat {
Z,
S(Box<Nat>),
}
fn add(lhs: &Box<Nat>, rhs: &Box<Nat>) -> Nat {
match lhs.deref() {
Nat::Z => rhs.deref().clone(), // I
Nat::S(next) => Nat::S(Box::new(add(next, rhs))), // II
}
}
fn main() {
let one = Nat::S(Nat::Z.into());
let two = Nat::S(one.clone().into());
let three = Nat::S(two.clone().into());
assert_eq!(add(&one.into(), &two.into()), three);
}
This is the Peano encoding of the natural numbers. In the add function, we use recursion to compute the sum, and pattern matching to figure out where to stop.
Since recursion corresponds to type induction, and pattern matching corresponds to multiple implementations, the same can be done at compile time (playground):
use std::marker::PhantomData;
struct Z;
struct S<Next>(PhantomData<Next>);
trait Add<Rhs> {
type Result;
}
// I
impl<Rhs> Add<Rhs> for Z {
type Result = Rhs;
}
// II
impl<Lhs: Add<Rhs>, Rhs> Add<Rhs> for S<Lhs> {
type Result = S<<Lhs as Add<Rhs>>::Result>;
}
type One = S<Z>;
type Two = S<One>;
type Three = S<Two>;
const THREE: <One as Add<Two>>::Result = S(PhantomData);
Derivation process (Translator's Note):
<One as Add<Two>>::Result
= <S<Z> as Add<Two>>::Result (since One = S<Z>)
= S<<Z as Add<Two>>::Result> (case II, with Lhs = Z)
= S<Two> (case I)
= S<S<S<Z>>> = Three
Here, impl ... for Z is the base (terminating) case, and impl ... for S<Lhs> is the inductive step (recursive case) – analogous to the pattern matching cases we used at runtime. Also, as in the first example, induction works by reducing the first argument towards Z via <Lhs as Add<Rhs>>::Result – just like add(next, rhs) – which again invokes pattern matching to push the computation further. Note that the two trait implementations really belong to a single logical function; they only appear separate because we perform pattern matching on type-level numbers (Z and S<Next>). This is somewhat similar to Haskell, where each pattern matching case looks like a separate function definition:
import Control.Exception
data Nat = Z | S Nat deriving Eq
add :: Nat -> Nat -> Nat
add Z rhs = rhs -- I
add (S next) rhs = S(add next rhs) -- II
one = S Z
two = S one
three = S two
main :: IO ()
main = assert ((add one two) == three) $ pure ()
Type-level logic reified
The purpose of this article is only to convey the intuition behind statics-dynamics biformity, not to provide a formal proof – for the latter, see an awesome library called type-operators (by the same person who implemented Smallfuck on types). Essentially, it's an algorithmic macro eDSL that boils down to type-level operations with traits: you can define algebraic data types and perform operations on them, similar to how you usually do in Rust, but ultimately the entire code stays at the type level. For more details, see the translation rules and the excellent guide by the same author. Another notable project is Fortraith, a "compile-time compiler that compiles Forth to compile-time trait expressions":
forth!(
: factorial (n -- n) 1 swap fact0 ;
: fact0 (n n -- n) dup 1 = if drop else dup rot * swap pred fact0 then ;
5 factorial .
);
The code above turns a simple factorial implementation into a computation on traits and associated types. After evaluation, you can retrieve the result like this:
println!(
"{}",
<<<Empty as five>::Result as factorial>::Result as top>::Result::eval()
);
After considering all of the above, it should be clear that wherever the logic lives, it remains the same: whether static or dynamic.
The unfortunate consequences of being static
Are you quite sure that all those bells and whistles, all those wonderful facilities of your so called powerful programming languages, belong to the solution set rather than the problem set?
Edsger Dijkstra
Today's programming languages don't focus on logic; they focus on the mechanics underlying the logic. They call boolean negation the simplest operator that must exist from the start, but negative trait bounds (which can be understood as negated pattern matching on traits; refer here – Translator's Note) are considered a controversial concept with "lots of questions". Most mainstream PLs support tree data structures in their standard libraries, but sum types haven't made it into them for decades. I can't imagine a single language without the if operator, but only a few PLs have mature trait bounds, let alone type-level pattern matching. This inconsistency forces software engineers to design low-quality APIs that are either dynamic and expose few compile-time checks, or static and forced to circumvent fundamental limitations of the host language, making their use increasingly obscure and difficult to understand. Combining static and dynamic in a single working solution is also complicated, because you can't call dynamic features in a static context. In terms of function colors, the dynamic color is red and the static color is blue.
In addition to this inconsistency, we have biformity. In languages like C++, Haskell, and Rust, this biformity takes its most perverse form: you can think of any so-called "expressive" programming language as two or more smaller languages put together: the C++ language and C++ templates/macros, the Rust language and type-level Rust plus declarative macros, etc. With this approach, every time you write something at the meta level, you can't reuse it in the host language and vice versa, violating the DRY principle (as we'll see in a minute). Additionally, biformity increases the learning curve, hampers language evolution, and ultimately results in feature bloat, where only a few hardcore coders can figure out what's going on in the code. Look at any production code in Haskell and you'll immediately see numerous GHC {-# LANGUAGE #-} clauses, each of which signifies a separate language extension:
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE ViewPatterns #-}
When the host language doesn't provide enough static functionality to facilitate development, some programmers go outright insane, creating entirely new compile-time metalanguages and eDSLs on top of existing languages. Thus, inconsistency has the dangerous property of turning into biformity:
[C++] We have template metaprogramming libraries like Boost/Hana and Boost/MPL that replicate the functionality of C++ for use at the meta level:
BOOST_HANA_CONSTANT_CHECK(
hana::take_while(hana::tuple_c<int, 0, 1, 2, 3>, hana::less.than(2_c))
==
hana::tuple_c<int, 0, 1>
);
constexpr auto is_integral =
hana::compose(hana::trait<std::is_integral>, hana::typeid_);
static_assert(
hana::filter(hana::make_tuple(1, 2.0, 3, 4.0), is_integral)
== hana::make_tuple(1, 3), "");
static_assert(
hana::filter(hana::just(3), is_integral)
== hana::just(3), "");
BOOST_HANA_CONSTANT_CHECK(
hana::filter(hana::just(3.0), is_integral) == hana::nothing);
typedef vector_c<int, 5, -1, 0, 7, 2, 0, -5, 4> numbers;
typedef iter_fold<
numbers,
begin<numbers>::type,
if_<less<deref<_1>, deref<_2>>, _2, _1>
>::type max_element_iter;
BOOST_MPL_ASSERT_RELATION(
deref<max_element_iter>::type::value, ==, 7);
[C] My own compile-time metaprogramming framework, Metalang99, does the same thing by (ab)using the C preprocessor. It grew to such an extent that I was forced to reimplement recursion through a combination of Lisp-like trampolines and continuation-passing style (CPS). Finally, I have a large number of list manipulation functions in its standard library, such as ML99_listMap, ML99_listIntersperse, and ML99_listFoldr, which arguably makes Metalang99, as a pure data transformation language, more expressive than C itself.
[Rust] In the first example of inconsistency, Automobile, we used Frunk's hlist. It's not hard to see that Frunk replicates some of the functionality of collections and iterators just to lift it to the type level. It would be cool to apply Iterator::map or Iterator::intersperse to an hlist, but we can't. Worse, if we still want to perform declarative data transformations, we have to maintain a one-to-one correspondence between iterator adapters and their type-level counterparts; whenever a utility is missing from hlist, we are stuck.
[Rust] Typenum is another popular type-level library: it allows integer computations to be performed at compile time by encoding integers as generics. In doing so, the part of the language responsible for integers finds its counterpart in statics, introducing yet more biformity. We can't just parameterize a type with (2 + 2) * 5; we have to write something like <<P2 as Add<P2>>::Output as Mul<P5>>::Output! The best you can do is write a macro that does the dirty work for you, but it would be mere syntactic sugar, and you'd still see tons of compile-time errors in the above notation anyway.
Sometimes software engineers find their language too primitive to express their ideas even in dynamic code. But they don't give up:
[Golang] Kubernetes, one of the largest Golang codebases, implements its own object-oriented type system in its runtime package.
[C] The VLC media player has a macro-based plugin API for representing media codecs; see, for example, the definition of the Opus module.
[C] The QEMU computer emulator builds on its custom object model: QObject, QNum, QNull, QList, QString, QDict, QBool, etc.
Recall Greenspun's famous tenth rule (yes, the one we all know: "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp." – Annotation): these hand-crafted metalanguages are often "ad hoc, informally specified, bug-ridden, and slow", with rather vague semantics and terrible documentation. The notion of metalinguistic abstraction simply doesn't work at scale, although the rationale for creating highly declarative, small domain-specific languages sounds cool at first glance. When a problem entity (or some intermediate mechanism) is expressed in the host language, you only need to understand how to chain calls together to get the job done – this is what we usually call an API; however, when this API is written in another language, then in addition to call sequences, you need to know the syntax and semantics of that language. This is very unfortunate for two reasons: the mental burden it places on developers, and the very limited number of developers able to maintain such a metalanguage. In my experience, hand-crafted metalanguages tend to quickly get out of control and spread throughout the codebase, making it harder to reason about. Not only is reasoning impaired, but also compiler-developer interaction: have you ever tried to misuse a complex typed or macro API? If yes, then you should be intimately familiar with incomprehensible compiler diagnostics, which can be summed up in the following screenshot:
Sad to say, but it now seems that an "expressive" PL means "hey, I seriously screwed up the number of features, but that's okay!"
Finally, a word must be said about metaprogramming in the host language. Using template systems like Template Haskell and Rust's procedural macros, we can process the host language's AST using that same language, which is nice in terms of biformity but unpleasant in terms of language inconsistency. Macros are not functions: we can't partially apply a macro to get a partially applied function (or vice versa), because they are simply different concepts – which can be a pain if we want to design a generic and easy-to-use library API. Personally, I consider procedural macros in Rust a huge design mistake, comparable to #define macros in plain C: the macro system has no knowledge of the language it manipulates beyond its pure syntax; you get slightly enhanced text replacement instead of a tool to extend the language gracefully. For example, suppose we have an enum named Either defined as follows:
pub enum Either<L, R> {
Left(L),
Right(R),
}
Now imagine we have an arbitrary trait Foo, and we want to implement it for Either<L, R> whenever both L and R implement it. It turns out that we can't apply a derive macro to Either to achieve this, even if the trait's name is known, because to do so the macro would have to know all of Foo's method signatures. Worse, Foo may be defined in a separate library, which means we cannot enhance its definition with the extra meta-information needed for derivation. While this might seem like a rare case, it's actually not; I highly recommend looking at tokio-util's Either, which is the exact same enum but implements Tokio-specific traits such as AsyncRead, AsyncWrite, AsyncSeek, etc. Now imagine having five such Eithers from different libraries in your project: what a headache! While type introspection (the ability to check object types or properties at runtime; you may be more familiar with the term "reflection" – Annotation) might be a compromise, it would still make the language more complex than it already is.
Idris: The way out?
One of the most fundamental features of Idris is that types and expressions are part of the same language – you use the same syntax for both.
Edwin Brady, the author of Idris
Let's think about how to solve this problem. If we made our language fully dynamic, we would have no problems with biformity and inconsistency, but we would quickly lose compile-time verification and end up debugging our programs in the middle of the night. The pain of dynamic type systems is well known.
The only way to solve this problem is to make a single language feature work both statically and dynamically, instead of splitting the same functionality in two. The ideal language abstraction is therefore both static and dynamic, yet it remains a single concept rather than two logically similar systems with different interfaces. A perfect example is CTFE (compile-time function evaluation), commonly known as constexpr: the same code can be executed at compile time in a static context, and at runtime in a dynamic context (e.g. when user input is requested). So instead of writing different code for compile time (static) and runtime (dynamic), we use the same representation.
One possible solution I see is dependent types (types that depend on values, corresponding to the universal and existential quantifiers of predicate logic; total, "strong" functional programming with dependent types is not Turing-complete, since it must sidestep the halting problem – Annotation). With dependent types, we can parameterize types not only with other types but also with values. The dependently typed language Idris has a type called Type, the "type of all types", thereby weakening the dichotomy between the type level and the value level. With such power, we can express typed abstractions that are usually either built into the language compiler/runtime or implemented via macros. Perhaps the most common and descriptive example is type-safe printf, which computes the types of its arguments dynamically, so let's have fun with it in Idris!
First, define the inductive data type Fmt and a function that computes it from a format string:
data Fmt = FArg Fmt | FChar Char Fmt | FEnd
toFmt : (fmt : List Char) -> Fmt
toFmt ('*' :: xs) = FArg (toFmt xs)
toFmt ( x :: xs) = FChar x (toFmt xs)
toFmt [] = FEnd
Later, we'll use this Fmt to generate a type for our printf function. The syntax is very similar to Haskell and should be understandable to the reader.
Now for the fun part:
PrintfType : (fmt : Fmt) -> Type
PrintfType (FArg fmt) = ({ty : Type} -> Show ty => (obj : ty) -> PrintfType fmt)
PrintfType (FChar _ fmt) = PrintfType fmt
PrintfType FEnd = String
What does this function do? It computes a type from the input parameter fmt. As usual, we split fmt into three cases and deal with them separately:
(FArg fmt). This case produces a type signature that takes an additional parameter, since FArg indicates that a printable argument will be supplied: {ty : Type} means that Idris will deduce the type ty automatically (an implicit parameter); Show ty is a type constraint saying that ty must implement Show; (obj : ty) is the printable argument we have to supply to printf; PrintfType fmt is a recursive call that processes the rest of the input. In Idris, recursive types are managed by recursive functions!
(FChar _ fmt). FChar represents an ordinary character in the format string, so here we ignore it and carry on with PrintfType fmt.
FEnd. This is the end of the input. Since we want printf to produce a String, we return String as a normal type.
Now suppose we have the format string "*x*", i.e. FArg (FChar 'x' (FArg FEnd)); what type will PrintfType generate? It's simple:
1. FArg: {ty : Type} -> Show ty => (obj : ty) -> PrintfType (FChar 'x' (FArg FEnd))
2. FChar: {ty : Type} -> Show ty => (obj : ty) -> PrintfType (FArg FEnd)
3. FArg: {ty : Type} -> Show ty => (obj : ty) -> {ty : Type} -> Show ty => (obj : ty) -> PrintfType FEnd
4. FEnd: {ty : Type} -> Show ty => (obj : ty) -> {ty : Type} -> Show ty => (obj : ty) -> String
Cool, now it's time to implement the printf we've always dreamed of:
printf : (fmt : String) -> PrintfType (toFmt $ unpack fmt)
printf fmt = printfAux (toFmt $ unpack fmt) [] where
printfAux : (fmt : Fmt) -> List Char -> PrintfType fmt
printfAux (FArg fmt) acc = \obj => printfAux fmt (acc ++ unpack (show obj))
printfAux (FChar c fmt) acc = printfAux fmt (acc ++ [c])
printfAux FEnd acc = pack acc
As you can see, PrintfType (toFmt $ unpack fmt) appears in the type signature, which means the type of printf as a whole depends on the input parameter fmt! But what does unpack fmt mean? Since printf takes fmt : String, we must first convert it to a List Char, because toFmt pattern-matches on a list of characters; as far as I know, Idris does not allow pattern matching on a String in the same way. Likewise, we unpack fmt before calling printfAux, since it also needs a List Char to accumulate the result.
Let's examine the implementation of printfAux:
(FArg fmt). Here we return a lambda that takes obj, invokes show on it, and appends the result to acc with the ++ operator.
(FChar c fmt). Just attach c to acc and call printfAux again on fmt.
FEnd. Although acc is a List Char, we must return a String (according to the last case of PrintfType), so we call pack on it.
Finally, let's test printf (printf.idr):
main : IO ()
main = putStrLn $ printf "Mr. John has * contacts in *." 42 "New York"
This will print Mr. John has 42 contacts in "New York". But what if we don't provide the 42?
Error: While processing right hand side of main. When unifying:
?ty -> PrintfType (toFmt [assert_total (prim__strIndex "Mr. John has * contacts in *." (prim__cast_IntegerInt (natToInteger (length "Mr. John has * contacts in *.")) - 1))])
and:
String
Mismatch between:
?ty -> PrintfType (toFmt [assert_total (prim__strIndex "Mr. John has * contacts in *." (prim__cast_IntegerInt (natToInteger (length "Mr. John has * contacts in *.")) - 1))])
and
String.
test:21:19--21:68
17 | printfAux (FChar c fmt) acc = printfAux fmt (acc ++ [c])
18 | printfAux FEnd acc = pack acc
19 |
20 | main : IO ()
21 | main = putStrLn $ printf "Mr. John has * contacts in *." "New York"
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Warning: compiling hole Main.main
Yes, Idris detected the error and reported a type mismatch! This is essentially how type-safe printf is achieved with first-class types. If you're curious what the same looks like in Rust, take a look at Will Crichton's attempt, which relies heavily on the heterogeneous lists we saw above. The downside of that approach should be pretty clear by now: in Rust, the language of the type system is different from the main language, whereas in Idris they are really the same thing, which is why we are free to define type-level functions as ordinary functions returning a Type and call them later in type signatures. Also, since Idris is dependently typed, you can even compute types from certain runtime values, which is not possible in languages like Zig.
I've anticipated the question: what's wrong with implementing printf as a macro? After all, println! works just fine in Rust. The problem is the macro itself. Think about it: why do programming languages need heavyweight macros? Because we might want to extend the language. Why would we extend it? Because the language doesn't fit our needs: we couldn't express something using regular language abstractions, so we decided to extend the language with ad-hoc meta-abstractions. In the main part of this article, I argued why this approach sucks: the macro system has no clue about how the language works; in fact, procedural macros in Rust are just a fancy name for the M4 preprocessor. You have integrated M4 into your language. Sure, that's better than an external M4, but it's still a 20th-century approach. Moreover, such macros don't even manipulate abstract syntax trees: structures like syn::Item, commonly used to write procedural macros, are really concrete syntax trees, or "parse trees". Types, on the other hand, are a natural part of the host language, which is why expressing programming abstractions through types means reusing language abstractions instead of resorting to ad-hoc mechanisms. Ideally, a programming language should have no macros at all, or only lightweight syntax rewriting rules (such as Scheme's syntax-rules or Idris' syntax extensions), to keep the language consistent and well-suited to its intended tasks.
Having said that, Idris eliminates the first biformity, values vs. generics, by introducing Type, the "type of all types". In doing so, it also resolves many of the other correspondences, such as recursion vs. type-level induction and functions vs. the trait mechanism; in turn, this lets us program as much as possible in one and the same language, even when dealing with highly generalized code. For example, you can even represent a list of types as List Type, just like List Nat or List String, and handle it as usual! This works thanks to the cumulative hierarchy of universes (see below). Since the generic parameter a of Data.List a is implicitly typed as Type, it can be Nat, String, or even Type itself; in the latter case, a will be inferred as being of Type 1. Such an infinite sequence of universes is needed to avoid Russell's paradox, by making every inhabitant "structurally smaller" than its type.
However, Idris is not a simple language. Our twenty-line printf example already uses a whole lot of features: inductive data types, dependent pattern matching, implicits, type constraints, etc. In addition, Idris has computational effects, elaborator reflection, codata types, and much more for theorem proving. With so many tools, you usually end up fiddling with the language instead of doing meaningful work. I find it hard to believe that, in their current state, dependently typed languages will find much production use; for now, in the programming world, they are little more than a fancy toy. Dependent types by themselves are too low-level.
Zig: Simpler, but for systems
In Zig, types are first-class citizens. They can be assigned to variables, passed as parameters to functions, and returned from functions. They can be assigned to variables, passed as arguments to functions, and returned from functions.
The Zig manual (Zig developers, n.d.)
Our last patient is the Zig programming language. Here is the compile-time implementation of printf in Zig (sorry, no syntax highlighting yet):
const std = @import("std");

fn printf(comptime fmt: []const u8, args: anytype) anyerror!void {
    const stdout = std.io.getStdOut().writer();
    comptime var arg_idx: usize = 0;
    inline for (fmt) |c| {
        if (c == '*') {
            try printArg(stdout, args[arg_idx]);
            arg_idx += 1;
        } else {
            try stdout.print("{c}", .{c});
        }
    }
    comptime {
        if (args.len != arg_idx) {
            @compileError("Unused arguments");
        }
    }
}

fn printArg(stdout: std.fs.File.Writer, arg: anytype) anyerror!void {
    if (@typeInfo(@TypeOf(arg)) == .Pointer) {
        try stdout.writeAll(arg);
    } else {
        try stdout.print("{any}", .{arg});
    }
}

pub fn main() !void {
    try printf("Mr. John has * contacts in *.\n", .{ 42, "New York" });
}
Here we use a feature called comptime: marking a function parameter comptime means it must be known at compile time. Not only does this allow aggressive optimization, it also opens up a whole treasury of metaprogramming facilities, most notably without separate macro-level or type-level sublanguages. The code above needs no further explanation: its simple logic should be clear to every programmer, and, unlike printf.idr, it does not look like the fruit of a mad genius' fancy.
If we omit 42, Zig will report a compilation error:
An error occurred:
/tmp/playground2454631537/play.zig:10:38: error: field index 1 outside tuple 'struct:33:52' which has 1 fields
try printArg(stdout, args[arg_idx]);
^
/tmp/playground2454631537/play.zig:33:15: note: called from here
try printf("Mr. John has * contacts in *.\n", .{ "New York" });
^
/tmp/playground2454631537/play.zig:32:21: note: called from here
pub fn main() !void {
The only inconvenience I encountered while developing printf was huge error messages, much like those of C++ templates. However, I concede that this could be solved (or at least mitigated) with more explicit type constraints. Overall, the design of Zig's type system seems sound: there is a single type of all types, called type, and with comptime we can compute types at compile time using regular variables, loops, and procedures. We can even perform type reflection via the @typeInfo, @typeName, and @TypeOf builtins! True, types can no longer depend on runtime values, but if you don't need a theorem prover, full dependent types might be overkill.
Everything is fine, except that Zig is a systems language. On its official website, Zig is described as a "general-purpose programming language", but I have a hard time agreeing with that statement. Yes, you can write almost any software in Zig, but should you? My experience maintaining high-level code in Rust and C99 says "no". The first reason is safety: if you make your systems language safe, you force programmers to deal with a borrow checker and ownership (or equivalent) issues that are completely unrelated to business logic (trust me, I know how painful that is); if you instead choose the C way of manual memory management, you'll have programmers debugging their code for hours, hoping that -fsanitize=address shows something meaningful. Moreover, if you build new abstractions on top of raw pointers, you end up with the likes of &str, AsRef<str>, Borrow<str>, and Box<str>. Please, I just want a UTF-8 string; most of the time, I don't care which of these alternatives it is.
The second reason has to do with the language runtime. To avoid hidden performance penalties, a systems language should have a minimal runtime: no default GC, no default event loop, and so on. A particular application, however, may need a runtime, for example an asynchronous one, so you end up dealing with custom runtime code somehow. Here we run into the whole set of problems known as function colouring (see above): having no facility in your language to abstract over synchronous and asynchronous functions splits it into two parts, sync and async. If you have a generic high-level library, for instance, it will inevitably be marked async in order to accept arbitrary user callbacks. To solve this problem properly, you need some form of effect polymorphism (for example, monads or algebraic effects), which is still a research topic. High-level languages inherently have fewer such problems to deal with, which is why most software is written in Java, C#, Python, and JavaScript. In Golang, conceptually, every function is async, and this default helps maintain consistency without resorting to complex type traits. Rust, in contrast, is widely recognized as a complex language, and there is still no standard way to write truly general-purpose asynchronous code in it.
Zig can still shine in large systems projects such as web browsers, interpreters, and operating system kernels: nobody wants these things to freeze unexpectedly. Zig's low-level capabilities make it convenient to manipulate memory and hardware devices, while its sound approach to metaprogramming (in the right hands) fosters understandable code structure. Pushing the language into high-level code, though, would only increase the mental burden without providing measurable benefits.
Progress is possible only if we train ourselves to think about programs without thinking of them as pieces of executable code
.
Edsger Dijkstra
epilogue
Static languages enforce compile-time checks, which is good. But they suffer from feature biformity and inconsistency, which is bad. Dynamic languages, on the other hand, suffer from these shortcomings to a lesser extent, but they lack compile-time checks. A hypothetical solution should take the best of both worlds.
Programming languages should be rethought.
Supplement
A brief introduction to some of the language features mentioned above.
Borrowing
Borrowing appears in Rust.
A resource can be borrowed either as any number of immutable references or as a single mutable reference.
A borrow's scope must be smaller than the owner's scope.
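The two rules above can be sketched in a few lines of Rust (the shout helper and the string contents are illustrative, not from the original article):

```rust
// A mutable borrow grants exclusive write access.
fn shout(msg: &mut String) {
    msg.push('!');
}

fn main() {
    let mut s = String::from("hello");
    let r1 = &s; // first immutable borrow: ok
    let r2 = &s; // second immutable borrow: also ok
    assert_eq!(r1, r2); // both read the same data
    // r1 and r2 are last used above, so their borrows end here,
    // and an exclusive mutable borrow becomes legal:
    shout(&mut s);
    assert_eq!(s, "hello!");
}
```

Had we called shout(&mut s) while r1 or r2 was still in use, the borrow checker would have rejected the program at compile time.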
Tuple structs
Appearing in Rust, these are structs in tuple form, meant for simple data that needs a named type (and is used often) but shouldn't be overly complicated:
struct Color(u8, u8, u8);
struct Point(f64, f64);
let black = Color(0, 0, 0);
let origin = Point(0.0, 0.0);
PhantomData
Rust only.
PhantomData is a zero-sized marker type.
Uses:
marking otherwise-unused type parameters;
controlling variance;
marking ownership relationships;
controlling automatic trait implementations (Send/Sync).
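The first use, marking an otherwise-unused type parameter, can be sketched as follows (the Length, Meters, and Feet names are invented for illustration):

```rust
use std::marker::PhantomData;

// Unit markers that exist only at the type level.
struct Meters;
struct Feet;

// PhantomData lets `Unit` appear in the type without storing anything.
struct Length<Unit> {
    value: f64,
    _unit: PhantomData<Unit>, // occupies no space at runtime
}

impl<Unit> Length<Unit> {
    fn new(value: f64) -> Self {
        Length { value, _unit: PhantomData }
    }
}

// Only lengths of the same unit may be added.
fn add<Unit>(a: Length<Unit>, b: Length<Unit>) -> Length<Unit> {
    Length::new(a.value + b.value)
}

fn main() {
    let a: Length<Meters> = Length::new(1.5);
    let b: Length<Meters> = Length::new(2.5);
    assert_eq!(add(a, b).value, 4.0);
    // add(Length::<Meters>::new(1.0), Length::<Feet>::new(1.0)); // would not compile
    // The marker really is zero-sized:
    assert_eq!(std::mem::size_of::<Length<Meters>>(), std::mem::size_of::<f64>());
}
```

Mixing Meters and Feet becomes a type error, yet the runtime representation is just a bare f64.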
Opaque and protocol types
Taking Swift as an example:
Protocol type: a type characterized by the set of methods it supports.
Opaque type: hides the concrete type of a return value; the compiler can see it, but the client cannot.
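Rust has a rough analogue of this pair: an impl Trait return type is opaque (the compiler knows the concrete type, the caller does not), while a dyn Trait value behaves like a protocol type. A minimal sketch, with invented helper names:

```rust
use std::fmt::Display;

// Opaque return type: the caller only knows "some Display";
// the compiler knows the concrete type (here, String).
fn make_greeting() -> impl Display {
    String::from("hello")
}

// Protocol-type analogue: any type implementing Display,
// dispatched dynamically behind a reference.
fn describe(x: &dyn Display) -> String {
    format!("<{}>", x)
}

fn main() {
    let g = make_greeting();
    assert_eq!(describe(&g), "<hello>");
    assert_eq!(describe(&42), "<42>");
}
```

The analogy is loose (Swift protocols and Rust traits differ in detail), but the opaque-vs-existential distinction is the same.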
implicit
Taking Scala as an example,
scala implicit
is used to implicitly pass parameters, including the implicit value of the function, implicit view (used for implicit conversion of parameter types), and implicit conversion (calling methods that do not exist in the class)
Traits
Taking Rust as an example: a trait declares a set of methods (shared behavior) that concrete types can implement, playing a role similar to interfaces or type classes in other languages.
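A minimal sketch of a trait with two implementors (the Area trait and the shape types are invented for illustration):

```rust
// A trait declares shared behavior; types opt in by implementing it.
trait Area {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }
struct Square { side: f64 }

impl Area for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

impl Area for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

// A generic function bounded by the trait (static dispatch).
fn double_area<T: Area>(shape: &T) -> f64 {
    2.0 * shape.area()
}

fn main() {
    assert_eq!(double_area(&Square { side: 3.0 }), 18.0);
}
```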
Case classes
Taking Scala as an example:
Case classes are good at modeling immutable data.
A case class has a default apply method for instantiating it.
Case classes are compared by structure (field values) rather than by reference.
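A rough Rust analogue of a case class is an immutable record that derives value-based equality, cloning, and a debug representation (the Point type is illustrative):

```rust
// Deriving these traits gives the struct case-class-like behavior:
// structural equality, hashing, copying, and printing.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a.clone();
    // Compared structurally (by field values), not by identity:
    assert_eq!(a, b);
    assert_ne!(a, Point { x: 0, y: 2 });
}
```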
Monoid
Taking Scala as an example: a monoid is a set equipped with an associative binary operation and an identity element.
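Since this article's examples are mostly in Rust, here is the same idea encoded as a Rust trait (the Monoid trait and fold_all helper are invented for illustration):

```rust
// A monoid: an associative combine operation plus an identity element.
trait Monoid {
    fn empty() -> Self;                  // identity element
    fn combine(self, other: Self) -> Self; // associative operation
}

impl Monoid for i32 {
    fn empty() -> Self { 0 }
    fn combine(self, other: Self) -> Self { self + other }
}

impl Monoid for String {
    fn empty() -> Self { String::new() }
    fn combine(self, other: Self) -> Self { self + &other }
}

// Any list of monoid values can be folded down to a single one.
fn fold_all<M: Monoid>(items: Vec<M>) -> M {
    items.into_iter().fold(M::empty(), M::combine)
}

fn main() {
    assert_eq!(fold_all(vec![1, 2, 3]), 6);
    assert_eq!(
        fold_all(vec!["foo".to_string(), "bar".to_string()]),
        "foobar"
    );
}
```

This is essentially what the Scala fold example below does with Foldable and implicits.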
Scala Context Bounds
A feature introduced in Scala 2.8, often used together with the type class pattern.
def foo[A](a: A)(implicit b: B[A]) = g(a)
// equivalent to the above: the context bound folds B into A,
// and the B[A] instance is passed implicitly
def foo[A : B](a: A) = g(a)
Because the implicit parameter has no name when a context bound is used, you need the implicitly method to retrieve the implicit value of that type from the context:
def fold1[F[_], A](list: F[A])(m: Monoid[A])(implicit f: Foldable[F]): A = {
  f.foldLeft(list)(m.zero)(m.combine)
}
// equivalent, with a context bound:
def fold[F[_]: Foldable, A](list: F[A])(m: Monoid[A]): A = {
  implicitly[Foldable[F]].foldLeft(list)(m.zero)(m.combine)
}
// the implicit Foldable[F] parameter is passed implicitly
Isomorphism
(Translator's note: I don't fully understand this code.)
// A pair of arbitrary case classes
case class Foo(i : Int, s : String)
case class Bar(b : Boolean, s : String, d : Double)
// Publish their `HListIso`'s
implicit def fooIso = Iso.hlist(Foo.apply _, Foo.unapply _)
implicit def barIso = Iso.hlist(Bar.apply _, Bar.unapply _)
// And now they're monoids ...
implicitly[Monoid[Foo]]
val f = Foo(13, "foo") |+| Foo(23, "bar")
assert(f == Foo(36, "foobar"))
implicitly[Monoid[Bar]]
val b = Bar(true, "foo", 1.0) |+| Bar(false, "bar", 3.0)
assert(b == Bar(true, "foobar", 4.0))