Type inference rules
Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the most intuitive way for the programmer, to make it behave as similar as possible to Ruby.
We’ll start with literals, C functions and some primitives. Then we’ll continue with flow
control structures, like if
, while
, and blocks. Then we’ll talk about the special
NoReturn
type and type filters.
Literals
Literals have a type of their own, known by the compiler:
true # Boolean
1 # Int32
"hello" # String
1.5 # Float64
C functions
When you define a C function you must tell the compiler its types:
lib C
fun sleep(seconds : UInt32) : UInt32
end
sleep(1_u32) # sleep has type UInt32
allocate
The allocate primitive gives you an uninitialized instance of an object:
class Foo
def initialize(@x)
end
end
Foo.allocate # Foo.allocate has type Foo
You don’t normally invoke it directly. Instead, you invoke new
, which
is automatically generated by the compiler to something like this:
class Foo
def self.new(x)
foo = allocate
foo.initialize(x)
foo
end
def initialize(@x)
end
end
Foo.new(1)
A similar primitive is Pointer#malloc
, which gives you a typed pointer to a
memory region:
Pointer(Int32).malloc(10) # has type Pointer(Int32)
Variables
Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expresision’s type changes, so the variable’s type changes).
a = 1 # 1 is Int32, so a is Int32
The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:
a = 1 # a is Int32
a.abs # ok, Int32 has a method 'abs'
a = "hello" # a is now String
a.size # ok, String has a method 'size'
To achieve this, the compiler remembers which expression was the last one assigned
to a variable. In the above example, after the first line the compiler knows that
a
has type Int32
, so a call to abs
is valid. In the third line we assign
a String
to it, so the compiler remembers this and, on the fourth line, it’s perfectly
valid to invoke size
on it.
Additionally, the compiler remembers that both an Int32
and a String
were
assigned to a
. When generating LLVM code, the compiler will represent a
as a union type that can be Int32 or String. It would be something like this in C:
struct Int32OrString { int type_id; union { int int_value; string string_value; } data; }
This might seem inefficient if we continually assign different types to the same variable.
However, the compiler knows that when you invoked abs
, a
was an Int32, so it never
checks the type_id
field: it directly uses the int_value
field. LLVM notices this
and optimizes this out, so in the generated code there will never be a union (the type_id
field is never read).
Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that we assigned to.
If
Let’s take a piece of Ruby code and analyze it:
if some_condition
a = 1
a.abs
else
a = "hello"
a.size
end
a.size
In Ruby, the only line that can fail at runtime is the last one. The first call to abs
will never fail, as an Int32
was assigned to a
. The first call to size
will
also never fail, as a String
was assigned to a
. However, after the if
, a
can
either be an Int32
or a String
.
So Crystal tries to keep this intuitive reasoning about a
’s type. When a variable is assigned
inside an if
’s then or else branch, the compiler knows that it will continue to have that type until the if
ends or until it is assigned a new expression. When an if
ends, the compiler will let a
have the type of the last expressions that it was assigned to in each branch.
The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”.
That’s because even though String
has a size
method, Int32
doesn’t.
In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and you will be sure that a variable has the type that you might think. But in some cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.
The if
has some more cases to take into account. For example, the variable a
might not exist
before the if
. In this case, if it’s not assigned in one of the branches, at the end of the
if
it will also contain the Nil type if it’s read:
if some_condition
a = 1
end
a # here a is Int32 or Nil
This, again, mimics Ruby’s behaviour.
Finally, an if
’s type is the union of the last expressions in both branches. If a branch
is missing, it’s considered to have a Nil
type.
While
A while
is in a way similar to an if
:
a = 1
while some_condition
a = "hello"
end
a # here a is Int32 or String
That’s because some_condition
might be falsy the first time.
However, since a while
is a loop there are some more things to consider. For example,
the last expression assigned to a variable inside a while
determines the type of that
variable in the next iteration. In this way, the type at the beginning of the loop
will be a union of the type before the loop and the type after the loop:
a = 1
while some_condition
a # here a is actually Int32 or String
a = false # here a is Bool
a = "hello" # here a is String
a.size # ok, a is String
end
a # here a is Int32 or String
Some other things to consider inside a while
are break
and next
. A break
makes the types right before the break add to the type at the exit of the while
:
a = 1
while some_condition
a # here a is Int32 or Bool
if some_other_condition
a = "hello" # we break, so at the exit a can also be String
break
end
a = false # here a is Bool
end
a # here a is Int32 or String or Bool
A next
adds the type to the beginning of the while
:
a = 1
while some_condition
a # here a is Int32 or String or Bool
if some_other_condition
a = "hello" # we next, so in the next iteration a can be String
next
end
a = false # here a is Bool
end
a # here a is Int32 or String or Bool
Blocks
Blocks are very similar to a while
: they can be executed zero or more times. So
the logic for variables’ types is very similar to that of a while
.
NoReturn
There’s a mysterious type in Crystal called NoReturn
. One such example is
C’s exit
function:
lib C
fun exit(status : Int32) : NoReturn
end
C.exit(1) # this is NoReturn
puts "hello" # this will never be executed
Another very useful method that is NoReturn is raise
: raising an exception.
The type basically means: after this point there’s nothing else. Nothing gets returned,
and nothing that comes afterwars is executed (of course, a rescue
will be executed
if there’s one surrounding the code, but the normal path won’t be executed).
The compiler knows about NoReturn
. For example, take a look at the following code:
a = some_int
if a == 1
a = "hello"
puts a.size # ok
raise "Boom!"
else
a = 2
end
a # here a can only be Int32
Remember that after an if
a variable’s type is the union of the types of
both branches. However, since the first branch ends there, because raise
is NoReturn
,
the compiler knows that code after the if
, if that branch is taken, will never be
executed. So it can definitely say: a
will only have the type of the else
branch.
The same logic applies when you have return
, break
or next
inside an if
.
Also, when you define a method whose type is NoReturn
, that method is in turn NoReturn
:
def raise_boom
raise "Boom!"
end
if some_condition
a = 1
else
raise_boom
end
a.abs # ok
Union of NoReturn
Remember that an if
’s type is the union of the last expressions of the if
’s branches.
What type has the following if
(and consequently the a
variable)?
a = if some_condition
raise "Boom!"
else
1
end
a # a is...?
Well, the then
branch is definitely NoReturn
. The else
branch is definitely
Int32
. We could conclude then that a
has type NoReturn
or Int32
.
However, NoReturn
means that nothing gets executed afterwards. So a
can only
be Int32
at the end of the previous snippet, and that’s how the compiler behaves.
With this we can implement a little method called not_nil!
. Here it is:
class Object
def not_nil!
self
end
end
class Nil
def not_nil!
raise "Nil assertion failed"
end
end
a = some_condition ? 1 : nil
a.not_nil!.abs # compiles!
a
’s type is Int32
or Nil
. One thing that we didn’t say yet is that when you
have a union type and you invoke a method on it, and all types respond to that method, the resulting
type is the union of the types of each method.
In this case, a.not_nil!
will have the type Int32
if a
is Int32
, or
NoReturn
if it’s Nil
(because of the raise
). Combining these types just
gives Int32
, so the above code is perfectly valid. And that’s how you can discard Nil
from a variable and turn it into a runtime exception if it turns out to be nil
. No special language
construct is needed. All is made with the logic explained so far.
Type filters
Now, what if we want to execute a method on a variable whose type is Int32
or Nil
,
but only if that variable is Int32
. If it’s Nil
, we don’t want to do anything.
We can’t use not_nil!
, because that will raise a runtime exception when nil.
We can define another method, try
:
class Object
def try
yield self
end
end
class Nil
def try(&block)
nil
end
end
a = some_condition ? 1 : nil
b = a.try &.abs # b is Int32 or Nil
(if you are not sure what &.abs
means, read this)
Since doing something depending on whether a value is Nil
or not is so common, Crystal
provides another way to do the above. This was shortly explained
here, but now we’ll explain it
better and combine it with the previous explanations.
If a variable is an if
’s condition, the compiler assumes the variable is not nil
inside the then
branch:
a = some_condition ? 1 : nil # a is Int32 or Nil
if a
a.abs # a is Int32
end
This makes sense: if a
is truthy then it means it is not nil
. Not only this,
but the compiler also makes a
’s type be that one after the if
, combined with
whatever type a
has in the else
branch. For example:
a = some_condition ? 1 : nil
if a
a.abs # ok, here a is Int32
else
a = 1 # here a is Int32
end
a.abs # ok, a can only be Int32 here
Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error in runtime), so it works in Crystal.
We call the above a “type filter”: a
’s type got filtered inside the if
’s
then
branch by removing Nil
from the possible types a
can have.
Another type filter happens when you do is_a?
:
a = some_condition ? 1 : nil
if a.is_a?(Int32)
a.abs # ok
end
And another type filter happens when you do responds_to?
:
a = some_condition ? 1 : nil
if a.responds_to?(:abs)
a.abs # ok
end
These are special methods, known by the compiler, and that’s why the compiler is
able to filter the types. On the contrary, the method nil?
is not special right now
so the following won’t work:
a = some_condition ? 1 : nil
if a.nil?
else
a.abs # should be ok, but now gives error
end
We’ll probably make nil?
a special method too, so it’s more consistent with the
rest of the language and the above works. We’ll also probably make the unary !
method
special, not overloadable, so you could do:
a = some_condition ? 1 : nil
if !a
else
a.abs # should be ok, but now gives error
end
Conclusion
In conclusion, as was said in the beginning of this post, we want Crystal to behave as much as possible as Ruby, and if something is intuitive and makes sense for the programmer to make the compiler understand it too. For example:
def foo(x)
return unless x
x.abs # ok
end
a = some_condition ? 1 : nil
b = foo(a)
The above shouldn’t give you a compile time error. The programmer knows that if x
was nil
inside foo
, the method returns. It follows that x
can never be
nil
afterwards so it’s ok to invoke abs
on it. How does the compiler know this?
Well, first, the compiler rewrites an unless
to an if
:
def foo(x)
if x
else
return
end
x.abs # ok
end
Next, inside the then
branch of the if
we know that x
is not nil
.
Inside the else
branch the method returns, so we don’t care about the type of x
afterwards. So, after the if
, x
can only be of type Int32
. This is idiomatic
code in Ruby, and so it is in Crystal if we carefully follow the language rules.
We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!