Design and implementation details
Baseline Step Proto
The internals of Step Runner operate on the baseline step definition,
which is defined as a Protocol Buffers message. All GitLab CI steps (and other
supported formats such as GitHub Actions) compile / fold to baseline steps.
Both step invocations in .gitlab-ci.yml and step definitions in step.yml
files will be compiled to baseline structures.
The term "step" means "baseline step" for the remainder of this document.
Each step includes a reference ref in the form of a URI. The method of
retrieval is determined by the protocol of the URI.
Steps and step traces have fields for inputs, outputs,
environment variables and environment exports.
After steps are downloaded and the step.yml is parsed,
a step definition def will be added.
If a step defines multiple additional steps, then the
trace will include sub-traces for each sub-step.
message Step {
  string name = 1;
  string step = 2;
  map<string,string> env = 3;
  map<string,google.protobuf.Value> inputs = 4;
}
message Definition {
  DefinitionType type = 1;
  Exec exec = 2;
  repeated Step steps = 3;
  message Exec {
    repeated string command = 1;
    string work_dir = 2;
  }
}
enum DefinitionType {
  definition_type_unspecified = 0;
  exec = 1;
  steps = 2;
}
message Spec {
  Content spec = 1;
  message Content {
    map<string,Input> inputs = 1;
    message Input {
      InputType type = 1;
      google.protobuf.Value default = 2;
    }
  }
}
enum InputType {
  spec_type_unspecified = 0;
  string = 1;
  number = 2;
  bool = 3;
  struct = 4;
  list = 5;
}
message StepResult {
  Step step = 1;
  Spec spec = 2;
  Definition def = 3;
  enum Status {
    unspecified = 0;
    running = 1;
    success = 2;
    failure = 3;
  }
  Status status = 4;
  map<string,Output> outputs = 5;
  message Output {
    string key = 1;
    string value = 2;
    bool masked = 3;
  }
  map<string,string> exports = 6;
  int32 exit_code = 7;
  repeated StepResult children_step_results = 8;
}
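To illustrate how a trace with sub-traces might be consumed, here is a minimal Python sketch that mirrors a few StepResult fields and walks children_step_results depth first. The dataclass, the walk helper and the sample step names are illustrative assumptions, not part of Step Runner.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    UNSPECIFIED = 0
    RUNNING = 1
    SUCCESS = 2
    FAILURE = 3


@dataclass
class StepResult:
    # Simplified stand-in for the proto message: only the step name,
    # status, exit code and child results are modelled here.
    name: str
    status: Status = Status.UNSPECIFIED
    exit_code: int = 0
    children_step_results: List["StepResult"] = field(default_factory=list)


def walk(result, depth=0):
    """Yield (depth, result) for a trace and all of its sub-traces, depth first."""
    yield depth, result
    for child in result.children_step_results:
        yield from walk(child, depth + 1)


# Hypothetical trace: a "steps"-type step with two sub-steps.
trace = StepResult("build", Status.FAILURE, 1, [
    StepResult("compile", Status.SUCCESS),
    StepResult("test", Status.FAILURE, 1),
])
failed = [r.name for _, r in walk(trace) if r.status is Status.FAILURE]
```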
Step Caching
Steps are cached locally by a key composed of location (URL), version
and hash. This prevents the exact same component
from being downloaded multiple times. The first time a step is
referenced it will be downloaded (unless local) and the cache will
return the path to the folder containing step.yml and the other
step files. If the same step is referenced again, the same folder
will be returned without downloading.
If a referenced step differs by version or hash from another cached step, it will be downloaded into a different folder and cached separately.
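The caching behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the StepCache class, the fetch callback, the folder naming and the example URL are all hypothetical, and the handling of local steps is left out.

```python
import tempfile
from pathlib import Path


class StepCache:
    """Maps a (location, version, hash) key to a local folder holding step.yml."""

    def __init__(self, root, fetch):
        self.root = Path(root)
        self.fetch = fetch  # hypothetical downloader: fetch(location, version, dest)
        self.folders = {}

    def get(self, location, version, digest):
        key = (location, version, digest)
        if key not in self.folders:
            # First reference: download into a fresh folder and remember it.
            dest = self.root / f"step-{len(self.folders)}"
            dest.mkdir(parents=True)
            self.fetch(location, version, dest)
            self.folders[key] = dest
        # Later references return the same folder without downloading.
        return self.folders[key]


downloads = []


def fake_fetch(location, version, dest):
    # Stand-in for a real download; it only records the call.
    downloads.append((location, version))
    (dest / "step.yml").write_text("type: exec\n")


cache = StepCache(tempfile.mkdtemp(), fake_fetch)
a = cache.get("https://example.com/steps/echo", "v1", "abc123")
b = cache.get("https://example.com/steps/echo", "v1", "abc123")
c = cache.get("https://example.com/steps/echo", "v2", "def456")
```

Here a and b are the same folder, while c is a separate folder because its version and hash differ, so only two downloads take place.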
Execution Context
State is kept by Step Runner across all steps in the form of an execution context. The context contains the output of each step, environment variables and overall job and environment metadata. The execution context can be referenced by expressions in GitLab CI steps provided by the workflow author.
Example of context available to expressions in .gitlab-ci.yml:
steps:
  previous_step:
    outputs:
      name: "hello world"
env:
  EXAMPLE_VAR: "bar"
job:
  id: 1234
Expressions in step definitions can also reference execution
context. However, they can only access overall
job and environment metadata and the inputs defined in step.yml.
They cannot access the outputs of previous steps. In order to
provide the output of one step to the next, the step input
values should include an expression which references another
step's output.
Example of context available to expressions in step.yml:
inputs:
  name: "foo"
env:
  EXAMPLE_VAR: "bar"
job:
  id: 1234
For example, this is not allowed in a step.yml file because steps
should not be coupled to one another:
spec:
  inputs:
    name:
---
type: exec
exec:
  command: [echo, hello, ${{ steps.previous_step.outputs.name }}]
This is allowed because the GitLab CI steps syntax passes data from one step to another:
spec:
  inputs:
    name:
---
type: exec
exec:
  command: [echo, hello, ${{ inputs.name }}]

steps:
- name: previous_step
  ...
- name: greeting
  inputs:
    name: ${{ steps.previous_step.outputs.name }}
Therefore, evaluation of expressions is done in two different kinds of context: one as a GitLab CI step and one as a step definition.
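A rough Python sketch of the two evaluation contexts follows. The lookup and interpolate helpers and the regular expression are assumptions made for illustration; the real expression language may support more than dotted paths.

```python
import re

# Matches ${{ dotted.path }}; the supported syntax is an assumption of this sketch.
EXPR = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, path):
    """Follow a dotted path such as steps.previous_step.outputs.name."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"{path!r} is not available in this context")
        value = value[part]
    return value


def interpolate(text, context):
    """Replace every ${{ path }} in text with its value from context."""
    return EXPR.sub(lambda m: str(lookup(context, m.group(1))), text)


shared = {"env": {"EXAMPLE_VAR": "bar"}, "job": {"id": 1234}}

# Context for expressions written in .gitlab-ci.yml: includes step outputs.
ci_context = {**shared, "steps": {"previous_step": {"outputs": {"name": "hello world"}}}}
name_input = interpolate("${{ steps.previous_step.outputs.name }}", ci_context)

# Context for expressions written in step.yml: inputs, but no step outputs.
definition_context = {**shared, "inputs": {"name": name_input}}
command = [interpolate(arg, definition_context)
           for arg in ["echo", "hello", "${{ inputs.name }}"]]
```

Evaluating the steps.previous_step expression against definition_context raises a KeyError in this sketch, which corresponds to the rule that step definitions cannot see the outputs of other steps.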
Step Inputs
Step inputs can be given in several ways. They can be embedded
directly into expressions in an exec command (as above), or they
can be embedded in expressions for environment variables set during
exec:
spec:
  inputs:
    name:
---
type: exec
exec:
  command: [greeting.sh]
env:
  NAME: ${{ inputs.name }}
Input Types
Input values are stored as strings, but they can also have a type associated with them. Supported types are:
string
bool
number
object
String type values can be any string. Bool type values must be either true
or false when parsed as JSON. Number type values must be a valid float64
when parsed as JSON. Object types will be a JSON serialization of
the YAML input structure.
For example, these would be valid inputs:
steps:
- name: my_step
  inputs:
    foo: bar
    baz: true
    bam: 1
Given this step definition:
spec:
  inputs:
    foo:
      type: string
    baz:
      type: bool
    bam:
      type: number
---
type: exec
exec:
  command: [echo, ${{ inputs.foo }}, ${{ inputs.baz }}, ${{ inputs.bam }}]
And it would output bar true 1
For an object type, these would be valid inputs:
steps:
- name: my_step
  inputs:
    foo:
      steps:
      - name: my_inner_step
        inputs:
          name: steppy
Given this step definition:
spec:
  inputs:
    foo:
      type: object
---
type: exec
exec:
  command: [echo, ${{ inputs.foo }}]
And it would output {"steps":[{"name":"my_inner_step","inputs":{"name":"steppy"}}]}
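The type rules above could be checked along these lines. This is a sketch, not the Step Runner implementation: the validate and serialize_object helpers are hypothetical names, and Python's json module stands in for whatever JSON parser is actually used.

```python
import json


def serialize_object(structure):
    """Store an object input as compact JSON, as in the example output above."""
    return json.dumps(structure, separators=(",", ":"))


def validate(stored, input_type):
    """Check a stored string against its declared type and return the parsed value."""
    if input_type == "string":
        return stored  # any string is valid
    parsed = json.loads(stored)  # raises ValueError if stored is not JSON
    if input_type == "bool" and isinstance(parsed, bool):
        return parsed
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (input_type == "number" and isinstance(parsed, (int, float))
            and not isinstance(parsed, bool)):
        return float(parsed)
    if input_type == "object" and isinstance(parsed, dict):
        return parsed
    raise ValueError(f"{stored!r} is not a valid {input_type}")


foo = serialize_object(
    {"steps": [{"name": "my_inner_step", "inputs": {"name": "steppy"}}]}
)
```

With these helpers, validate("true", "bool") and validate("1", "number") succeed, while validate("yes", "bool") raises a ValueError.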
Outputs
Output files are created into which steps can write their
outputs and environment variable exports. The file locations are
provided in the OUTPUT_FILE and ENV_FILE environment variables.
After execution, Step Runner will read the output and environment variable files and populate the trace with their values. The outputs will be stored under the context for the executed step, and the exported environment variables will be merged with the environment provided to the next step.
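The flow can be sketched as follows in Python. The KEY=value line format for both files is purely an assumption of this sketch (the format is not specified here), and run_step and parse_key_values are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile


def parse_key_values(path):
    """Parse KEY=value lines; this file format is an assumption of the sketch."""
    pairs = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                key, _, value = line.partition("=")
                pairs[key] = value
    return pairs


def run_step(command, env):
    """Run one exec step and return (exit_code, outputs, exports)."""
    with tempfile.TemporaryDirectory() as tmp:
        output_file = os.path.join(tmp, "output")
        env_file = os.path.join(tmp, "env")
        for path in (output_file, env_file):
            open(path, "w").close()  # create empty files for the step to fill
        step_env = {**os.environ, **env,
                    "OUTPUT_FILE": output_file, "ENV_FILE": env_file}
        exit_code = subprocess.run(command, env=step_env).returncode
        return exit_code, parse_key_values(output_file), parse_key_values(env_file)


# A stand-in step that writes one output and one environment export.
script = (
    "import os\n"
    "open(os.environ['OUTPUT_FILE'], 'a').write('name=hello world\\n')\n"
    "open(os.environ['ENV_FILE'], 'a').write('EXAMPLE_VAR=bar\\n')\n"
)
env = {}
exit_code, outputs, exports = run_step([sys.executable, "-c", script], env)
context = {"steps": {"previous_step": {"outputs": outputs}}}
env.update(exports)  # exports are merged into the environment of the next step
```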
Some steps can be of type steps and be composed of a sequence
of GitLab CI steps. These will be compiled and executed in sequence.
Any environment variables exported by nested steps will be available
to subsequent steps, and will be available to higher-level steps
when the nested steps are complete. That is, entering nested steps does
not create a new "scope" or context object. Environment variables
are global.
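A small Python sketch of this "no new scope" behaviour is shown below. The step dictionaries, the exports key (standing in for whatever a step writes to its environment file) and the step names are all hypothetical.

```python
def run_steps(steps, env, seen):
    """Run steps in sequence, sharing one env dict across all nesting levels."""
    for step in steps:
        seen[step["name"]] = dict(env)  # snapshot of what this step can see
        # Nested steps receive the same dict: no new scope is created.
        run_steps(step.get("steps", []), env, seen)
        # Stand-in for reading the step's exported environment variables.
        env.update(step.get("exports", {}))


job = [
    {"name": "setup", "steps": [
        {"name": "login", "exports": {"TOKEN": "abc"}},
        {"name": "check"},
    ]},
    {"name": "deploy"},
]
env, seen = {}, {}
run_steps(job, env, seen)
```

In this run, both the nested check step and the top-level deploy step see TOKEN, because the variable exported by login lands in the one shared environment.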
Containers
We've tried a couple of approaches to running steps in containers. In the end, we decided to delegate steps entirely to a step runner in the container.
Here are the options considered:
Delegation (chosen option)
A provision is made for passing complex structures to steps, which
is to serialize them as JSON (see Input Types above). In this way the actual
step to be run can be merely a parameter to a step running in a container.
So the outer step is a docker/run step with a command that executes
step-runner with a steps input parameter. The docker/run step will
run the container and then extract the output files from the container
and re-emit them to the outer steps.
This same technique will work for running steps in VMs or other isolation environments. Step Runner doesn't have to know anything about containerizing or isolating steps.
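As an illustration of delegation, the following Python sketch wraps a list of inner steps in an outer docker/run invocation, passing them as a JSON-serialized steps input. The delegate helper, the image and command input names, the outer step name and the sample image are assumptions; only docker/run, step-runner and the steps input come from the description above.

```python
import json


def delegate(inner_steps, image):
    """Wrap steps in a docker/run invocation that runs step-runner in a container."""
    return {
        "name": "run_in_container",  # hypothetical name for the outer step
        "step": "docker/run",
        "inputs": {
            "image": image,               # hypothetical input name
            "command": ["step-runner"],   # hypothetical input name
            # The inner steps travel as an object input, serialized as JSON.
            "steps": json.dumps(inner_steps, separators=(",", ":")),
        },
    }


inner = [{"name": "greeting", "inputs": {"name": "steppy"}}]
outer = delegate(inner, "alpine:latest")
```

Swapping docker/run for a step that starts a VM would leave delegate otherwise unchanged, which is the decoupling this option aims for.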
Special Compilation (rejected option)
When we see the image keyword in a GitLab CI step, we would download
and compile the "target" step, then manufacture a docker/run step
and pass the compiled exec command as an input. Then we would compile
the docker/run step and execute it.
However, this requires Step Runner to know how to construct a docker/run
step, which couples Step Runner with the method of isolation and makes
isolation in VMs and other methods more complicated.
Native Docker (rejected option)
The baseline step can include provisions for running a step in a
Docker container. For example, the step could include a ref "target"
field and an image field.
However this also couples Step Runner with Docker and expands the role of Step Runner. It is preferable to make Docker an external step that Step Runner execs in the same way as any other step.