On this page:
OCamlDuce is a merger between OCaml and
See the README.cduce file for more information, The sub-directory tests/ has some simple examples. There is also an ocamldoc-generated documentation for the support library.
The build procedure for OCamlDuce is exactly the same as for OCaml: configure, make world, make install. The names of the tools are unchanged: ocaml,ocamlc,ocamlopt. Currently, OCamlDuce is based on CVS snapshots of OCaml (between 3.08.3 and the current release308 branch) and CDuce (between 0.3.91 and the head).
GODI users can upgrade an existing installation by adding this line to their etc/godi.conf file:
and by forcing a recompilation of the godi-ocaml-src and godi-ocaml packages.
In a nutshell, OCamlDuce extends OCaml with a new kind of values (x-values) to represent XML documents, fragments, tags, Unicode strings. In order to describe these values, it also extends the type algebra with so-called x-types. The philosophy behind these types is that they represent set of x-values. They can be very precise: indeed, each value can be seen as a singleton type (a set with a single value), and it is possible to form Boolean combinations of x-types (intersection, union, difference).
OCamlDuce's type system can be understood as a refinement of OCaml. For each sub-expression which is inferred to be of the x-kind (using OCaml unification based type-system), OCamlDuce will try to infer to best possible sound x-type. Here, best means smallest for the natural subtyping relation (set inclusion). The inference algorithm is actually a data-flow analysis: the x-type will collect all the values that can be produced by the expression, considering all the possible data-flow in the program. It it sometimes necessary to provide explicit type annotations to help the type checker infer this type, in particular when you define recursive functions or when you use iterators.
Subtyping is implicit for x-types: if an expression is inferred to be
of x-type t
, which is a subtype of s
, then
it is possible to use this expression in any context which expects a
value of type s
.
Most of the new language features are enclosed within double curly braces
{{ON}}{{...}}
. For instance, the following code sample
defines a value x
as an XML element (with tag
a
, an attribute href
, and a simple
string as content):
What appears between the curly braces is called an x-expression.
Similarly, there are x-types (as seen above), and also x-patterns.
The delimiters {{ON}}{{...}}
are only used
for syntactical reasons, to avoid clashed between OCaml and CDuce
syntaxes and lexical conventions. As a matter of fact,
an OCaml expression need not be a syntactical x-expression
(delimited by double curly braces) to evaluate to an x-value.
For instance, once x
has been declared as above,
the expression x
evaluates to an x-value.
It is possible to use an arbitrary
OCaml expression as part of an x-expression: it must simply be
protected by a new pair of double curly braces. For instance, there is
no if-then-else
construction for x-expressions, but you
can write:
Only the highlighted parts are parsed as x-expressions. The
if-then-else
sub-expression is parsed as an OCaml
expression, but its type is an x-type (namely {{ON}}{{[ 'a' |
'z' ]}}
).
X-values are intended to represent XML documents and fragments thereof: elements, tags, text, sequences. In this section, we present the x-value algebra, the syntax of the corresponding x-expression constructors and the associated x-types.
There are three kinds of atomic kind of x-values:
X-characters are different from OCaml characters. They can represent
the range of Unicode codepoints defined in the XML specification.
Character literals are delimited by single quotes. The escape
sequences \n, \r, \t, \b, \', \", \\ are recognized as usual. The
numerical escape sequence are written \n;
where n is an integer
literal (note the extra semi-colon). The source code is interpreted as
being encoded in iso-8859-1. As a consequence, Unicode characters which are not
part of the Latin1 character set must be introduced with this
numerical escape mechanism. The x-types for x-characters are:
c -- d
, where c
and
d
are literals (example: {{ON}}type t = {{ 'a'--'z'
}}
);Char
;Latin1Char
(defined as \0; -- \255;
).
X-integers are arbitrarily large. Literals must be written in decimal.
Negative literals must be in parenthesis. E.g.: (-3)
.
The x-types for x-integers are:
i -- j
, where i
and
j
are literals (example: {{ON}}type t = {{ 10--20
}}
); it is possible to replace i
or j
with **
to define open-ended intervals, e.g.
{{ON}}type pos = {{ 1 -- ** }}
;
Int
;Int32
(resp.
Int64
).
Qualified names are intended to represent XML tag names. Conceptually,
they are made of a namespace URI and a local name. Since URIs tends
to be long, literals are of the form `prefix:local
where local
is the local name and prefix
is an namespace prefix bound to some URI (in the scope of the
literal). The local name follows the definitions from
the XML Namespaces specification; a dot character must be protected
by a backslash and non-Latin1 characters are written as character
literals \n;
. See below for a
explanation on how to bind prefixes to URIs. To refer
to the default namespace (or the absence of namespace if not default
has been defined), the syntax is simply `local
.
The x-types for qualified names are:
Atom
;`ns:*
.
X-records are mainly used to represent the set of attributes of an XML
element. An x-record is a binding from a finite set of labels
to x-values. Labels follows the same syntax as for qualified names
without the leading backquote. However, if the namespace prefix is not
given, the default namespace does not apply (the namespace URI is
empty). The syntax for record x-expressions is { l1=e1
... ln=en }
where the li
are labels and the
ei
are x-expressions. Fields can also be separated with a
semi-colon. It is legal to omit the expression for a field; the label is then
taken as the content of the field (a value with this name must be
defined in the current scope), e.g.: {{ON}}let x = ... and y = ...
in {{ {x y z=3} }}
is equivalent to {{ON}}let x = ... and
y = ... in {{ {x=x y=y z=3} }}
. The types for x-records specify
which labels are authorized/mandatory, and what the types of the
corresponding fields are. There are two kind of record x-types:
{ l1=t1 ... ln=tn }
;
{ l1=t1 ... ln=tn .. }
(the final two colons are
in the syntax).
In both cases, it is possible to make one of the fields optional by changing = to =?.
The x-type of all x-record is thus { .. }
,
and the x-type of x-records with maybe a field l
of type Int
and maybe arbitrary other fields is
{ l=?Int .. }
.
X-sequences are finite and ordered collections of x-values.
The syntax for a sequence x-expression in
[ e1 ... en ]
(note that elements are not separated
by semi-colons as in OCaml list). Each item ei
can either be:
!e
where e
is an x-expression which
evaluates to a sequence (whose content is inserted in the sequence
which is currently defined); e.g.
let x = [ 2 3 ] in [ 1 !x 4 ]
is equivalent to
[ 1 2 3 4 ]
;[ 'abc' ]
is equivalent to [ 'a' 'b' 'c' ]
.
X-types for sequences are of the form [R]
where R
is a regular expression over x-types which
describe the possible contents of the sequences. The possible
forms of regular expressions are:
t
(one single element of x-type t
)R*
(zero or more repetitions)R+
(one or more repetitions)R?
(zero or one repetition)R1 R2
(sequence)R1|R2
(alternation)(R)
/t
(guard: the tail of the sequence must comply with
t
).PCDATA
(equivalent to Char*).
Strings are nothing but sequences of characters. There are two
predefined types String
and Latin1
(defined as [ Char* ]
and [ Latin1Char* ]
).
A string literal [ '...' ]
can also be written
"..."
(without the square brackets). Note that simple
(resp. double) quotes need to be escaped only when the string is
delimited with double (resp. simple) quotes.
An XML element is a triple of x-values. The syntax for
the corresponding x-expression constructor is
e3]]>
. When e1
is a
qualified name literal, it is possible to omit the leading
backquote and the surrounding parentheses. Similarly,
when e2
is an x-record literal, it is possible
to omit the curly braces and the parentheses. For instance,
one can simply write ['def']]]>
instead of ['def']]]>
.
XML element x-type are written t3]]>
,
and the same simplifications applies. For instance, if
the namespace prefix ns
has been defined,
the following is a legal x-type []]]>
;
it describes XML elements whose tag is in the namespace bound to
ns
, with an empty content, and with an arbitrary set of
attributes. An underscore in place of (t1)
is
equivalent to (Atom)
(any tag).
In the previous section, we have seen the syntax for x-values constructors (constant literals, sequence, record, element constructors). In this section, we describe the other kinds of x-expressions.
The arithmetic operators on integers follow the usual precedence.
They are written +,*,-,div,mod
(they are all infix).
Record concatenation: e1 ++ e2
. The x-expressions
e1
and e2
must evaluate to x-records.
The result is obtained by concatening them. If a field with the same
label is present in both records, the right-most one is selected.
Sequence concatenation: e1 @ e2
, equivalent
to [!e1 !e2]
.
If the x-expression e
evaluates to a record or an XML
element, the construction e.l
will extract the value of
field or attribute l
. Similarly, the construction
e.?l
will extract the value of field or attribute
l
if present, and return the empty sequence
[]
otherwise.
If the x-expression e
evaluates to a record,
the construction e -. l
will produce a new record
where the field l
has been removed (if present).
If the x-expression e
evaluates to an x-sequence,
the construction e/
will result in a new x-sequence
obtained by taking in order all the children of the XML elements
from the sequence e
. For instance, the x-expression
[ 1 2 3 ] 4 5 [ 6 7 8 ] ]/]]>
evaluates to the x-value [ 1 2 3 6 7 8 ]
.
If the x-expression e
evaluates to an x-sequence,
the construction e.(t)
(where t
is an
x-type) will result in a new x-sequence
obtained by filtering e
to keep only the elements
of type t
. For instance, the x-expression
[ 1 2 3 ] 4 5 [ 6 7 8 ] ].(Int)]]>
evaluates to the x-value [ 4 5 ]
.
If e
is an x-expression and t
is an x-type,
the construction (e :? t)
returns the same
result as e
if it has type t
, and otherwise
raises a Failure
exception whose argument explains
why this is not the case.
OCamlDuce comes with a powerful pattern matching operation.
X-patterns are described below.
The syntax for the pattern matching operation is:
match e with p1 -> e1 | ... | pn -> en
.
The type-system ensures exhaustivivity for the pattern matching
and infers precise types for the capture variables.
It is also possile to use x-pattern matching as a regular
OCaml expression; x-patterns must be surrounded by {{..}}, e.g.:
match e with {{p1}} -> e1 | ... | {{pn}} -> en
function {{p1}} -> e1 | ... | {{pn}} -> en
The x-expression let p=e1 in e2
is equivalent to
match e1 with p -> e2
. There is also an local binding
with an x-pattern in OCaml expressions: let {{p}}=e1 in
e2
.
OCamlDuce comes with a sequence iterator
map e with p1 -> e1 | ... | pn -> en
and
a tree iterator
map* e with p1 -> e1 | ... | pn -> en
.
For both constructions, the argument must evaluate to a sequence.
The map
iterator applies the patterns to each element
of this sequence in turns and produces a new sequence by concatenating
all the results (all the right-hand sides must thus produce a
sequence). The set of patterns must be exhaustive for all the possible
elements of the input sequence.
The tree iterator is similar except that the patterns need not be exhaustive. If some element of the input sequence is not matched, it is simply copied into the result unless it is an XML element. In this case, the transformation is applied recursively to its content.
As a convenience, some of the OCaml expression constructors are allowed as x-expressions (without a need to go back to OCaml with double curly braces): (unqualified) value identifiers and function calls.
We have seen how to write simple x-types. We can then combine them with Boolean connectives:
t1 & t2
: intersection;t1 | t2
: union;t1 - t2
: difference.
The empty x-type is written Empty
(it contains no value),
and the universal x-type is written Any
(it contains
all the x-values) or _
.
When an x-type has been bound to some OCaml identifier
({{ON}}type t = {{...}}
), it is possible to use
this identifier in another x-type. Recursive definitions
are allowed:
Note that x-values are always finite and acyclic. The type checker detects type definition which would yield empty types:
If t1
and t2
are record x-types,
we can combine them with the infix ++
operator, which
mimics the corresponding operator on expressions (record
concatenation). Similarly, we can use the infix @
concatenation operator on sequence x-types.
X-patterns follow the same syntax as X-types. In particular, any X-type is a valid X-pattern. In addition to X-types constructors, X-patterns can have:
(x := c)
where x is a capture
variable and c is
a literal x-constant (this pattern always succeeds and returns the
binding x->c).
In record x-patterns, it is possible to omit the =p
part of a field.
The content is then replaced with the label name considered as
a capture variable. E.g. { x y=p }
is equivalent to
{ x=x y=p }
.
It is also possible to add an "else" clause:
{ x = (a,_)|(a:=3) }
will accept any record with atmost the field x
. If the content
is a pair, the capture variable a will be bound to its component;
otherwise, it is set to 3
.
In regular expressions, it is possible to extract whole subsequences
with the notation x::R
, e.g.: [ _* x::Int+ _* ]
If the same sequence capture variable appears several times (or below a
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: [ (x::Int | _)* ]
will
collect in x
all the elements of type Int
from
a sequence.
The regexp operators +,*,?
are greedy by default (they match as long
as possible). They admit non-greedy variants +?,*?,??
.
The binding of namespace prefixes to URIs can be done either by toplevel phrases (structure items) or by local declarations:
The toplevel definitions can also appear in module interfaces (signatures). A toplevel prefix binding is not exported by a module: its scope is limited to the current structure or signature. It is possible to specify a default namespace, and to reset it:
Note that the value pretty-printer invented some prefix
for the namespace URI. The default prefix declaration also have a
local form let namespace "..." in ...
.
As we said above, the programmer is sometimes required to provide type annotations. To know where to put these annotation, it is necessary to get a basic understanding of how type-checking works.
The OCaml type-checker is run first to detect which sub-expressions are of the x-kind. A second ML type-checking pass is then done to introduce subsumption (implicit subtyping) steps where allowed. After these two passes, the OCamlDuce type checker obtains a data-flow summary of x-values in the whole compilation unit. This is a directed graph, whose edges represent either simple data-flow or complex operation on x-values. The nodes of the graph can be thought as x-type variables. A data-flow edge corresponds to a subtyping constraints, and an operation edge corresponds to a symbolic constraints which mimics the corresponding operation on values.
Some of the nodes are given an explicit type by the programmer, through type annotations (on expressions or function arguments) or the other usual mechanism in ML (data type declarations, signatures, ...).
Also, if there is a loop with only subtyping edges in the graph, all the nodes on the loop are merged together.
After this operation, the graph is required to be acyclic (assuming that the nodes with an explicit type are removed from the graph). It is the responsibility of the programmer to provide enough type annotation to achieve this property. Otherwise, a type error is issued.
In the example above, there is a cycle between the result type for
f
and the type for the sub-expression {{ON}}f
{{n-1}}
. It is here broken with a type annotation on the result; it could
have been broken by a type annotation on the expression {{ON}}f
{{n-1}}
, or on the function f
itself, or by a
module signature.
Let us study another simple example:
The type-checkers detects that the two x-values 2
and
3
can flow to the argument of f
. Its body
is thus type-checked with the assumption that x
has type
2--3
. The computed result type is then 3--4
.
The type-inference process described above is global by nature. The acyclicity condition is only imposed after a whole compilation unit has been type-checked by OCaml (and the information from the module interface as been integrated). When a type variable is inferred to be of the x-kind, it is never generalized. As a consequence, there is no parametric polymorphism on x-types.
In the toplevel, type-checking is done after each phrase. Consider the following session:
The function f
is inferred to have type
{{ON}}{{Empty}} -> {{Empty}}
because when the first
phrase is type-checked, the data-flow graph says that no value
can flow to x
, and thus the input type is empty
(and similarly for the result type). If the two phrases
were type-checked together (which would be the case it they had
been compiled by the compiler, not in the toplevel), the type checker
would have correctly inferred that the input type for f
must contain 2
.
Coercion from an x-type to a super type is automatic in OCamlDuce. However, this automatic subsumption does not carry over to OCaml type constructor, even if there are covariant. Consider:
The first attempt to define g
fails because the type for
x
is not an x-type and thus subsumption does not
apply. In the second attempt, we extract the two components of the
pair; since they are inferred to be x-values, subtyping applies to
both of them. Thus, when the pair (a,b)
is reconstructed,
it is legal to unify its type with the input type of f
.
The third definition for g
gives an alternative solution:
using explicit OCaml type coercions.
OCamlDuce strongly seperates regular OCaml values from the new x-values. They have different syntax, expressions, types, patterns, and even type-checking algorithms. This strong segregation is key point which allowed a simple integration between very different type systems.
At some point, it is still necessary to cross the frontier and translate OCaml values to x-values or the opposite.
Fortunately, OCamlDuce provides automatic translations in both
directions. Instead of double curly braces, you can
enclose x-expressions in curly brace+colon {: ... :}
(here, the ...
is an x-expression).
The effect is to translate the result of the x-expression
(which must be an x-value) to an OCaml value. Similarly,
in an x-expression, you can obtain the x-translation of
an OCaml value with the same syntax {: ... :}
(here, the ...
is an OCaml expression).
Here is how the translation works. To each OCaml type t
,
we associate an x-type T(t)
and a pair of translation
function between t
and T(t)
.
Actually, not all the features are supported. For instance,
free type variables, abstract types, object types, non-regular
recursive types cannot be translated. In particular, since
type variables are not allowed, the OCaml type must be fully known.
The translation for an OCaml type t
is by structural
induction on the definition of t
. Sum types are
translated to union types: a constant constructor A
is
translated to the qualified name `A
; a non-constant
constructor A of t1 * ... * tn
is translated to
<A>[ T(t1) ... T(tn) ]
. Closed polymorphic variants
have the same translation. Record types are translated to closed
record x-types. Some other translations:
Caml type t | X-type T(t) |
---|---|
int | Int |
int32 | Int32 |
int64 | Int64 |
string | Latin1 |
t list | [T(t)*] |
t array | [T(t)*] |
unit | [] |
char | Latin1Char |
{{t}} | t |
Here is an example:
In this example, the result type of the translation is inferred
to be {{ON}}{{ Int }} list
(because the type for
f
is given). The corresponding x-type
is {{ON}}{{ [Int*] }}
.