Purpose

Describe how an artifact or set of artifacts was produced so that:

  • Consumers of the provenance can verify that the artifact was built according to expectations.
  • Others can rebuild the artifact, if desired.

This predicate is the recommended way to satisfy the SLSA provenance requirements.

Prerequisite

Understanding of SLSA Software Attestations and the larger in-toto attestation framework.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Model

Provenance is an attestation that the builder produced the subject software artifacts through execution of the buildDefinition.

Build Model

The model is as follows:

  • Each build runs as an independent process on a multi-tenant platform. The builder is the identity of this platform, representing the transitive closure of all entities that are trusted to faithfully run the build and record the provenance. (Note: The same model can be used for platform-less or single-tenant build systems.)

  • The build process is defined by a parameterized template, identified by buildType. Often a build platform only supports a single build type. For example, the GitHub Actions platform only supports executing a GitHub Actions workflow file.

  • All top-level, independent inputs are captured by the parameters to the template. There are two types of parameters:

    • externalParameters: the external interface to the build. In SLSA, these values are untrusted; they MUST be included in the provenance and MUST be verified downstream.

    • systemParameters: set internally by the platform. In SLSA, these values are trusted because the platform is trusted; they are OPTIONAL and need not be verified downstream. They MAY be included to enable reproducible builds, debugging, or incident response.

    Some (but not all) parameters are references to artifacts. For example, the external parameters for a GitHub Actions workflow include the source repository (artifact reference) and the path to the workflow file (string value).

  • All other artifacts fetched during initialization or execution of the build process are considered dependencies. The resolvedDependencies field captures these dependencies, if known.

  • During execution, the build process MAY communicate with the build platform’s control plane and/or build caches. This communication is not captured in the provenance but is subject to SLSA Requirements.

  • Finally, the build process outputs one or more artifacts, identified by subject.

For concrete examples, see the index of build types.

TODO: Align with the Build model.

Parsing rules

This predicate follows the in-toto attestation parsing rules. Summary:

  • Consumers MUST ignore unrecognized fields.
  • The predicateType URI includes the major version number and will always change whenever there is a backwards incompatible change.
  • Minor version changes are always backwards compatible and “monotonic.” Such changes do not update the predicateType.
  • Producers MAY add extension fields using field names that are URIs.
  • Optional fields MAY be unset or null, and should be treated equivalently. Both are equivalent to empty for object or array values.
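
The parsing rules above could be applied by a consumer roughly as follows. This is a minimal sketch in Python, not a reference implementation: the function name `read_build_definition` and the constant names are assumptions made for illustration, and the predicateType string is the draft value shown in the schema in this document.

```python
# Hypothetical helper; the names here are illustrative assumptions.
EXPECTED_PREDICATE_TYPE = "https://slsa.dev/provenance/v1?draft"
KNOWN_BUILD_DEFINITION_FIELDS = {
    "buildType",
    "externalParameters",
    "systemParameters",
    "resolvedDependencies",
}


def read_build_definition(statement: dict) -> dict:
    """Return the recognized buildDefinition fields in normalized form."""
    # A different predicateType may indicate an incompatible major version.
    if statement.get("predicateType") != EXPECTED_PREDICATE_TYPE:
        raise ValueError("unsupported predicateType")

    predicate = statement.get("predicate") or {}
    build_def = predicate.get("buildDefinition") or {}

    # Unrecognized fields, including URI-named extension fields, are
    # ignored rather than rejected.
    result = {
        name: value
        for name, value in build_def.items()
        if name in KNOWN_BUILD_DEFINITION_FIELDS
    }

    # Unset and null are equivalent; both mean "empty" for objects and arrays.
    for name in ("externalParameters", "systemParameters"):
        result[name] = result.get(name) or {}
    result["resolvedDependencies"] = result.get("resolvedDependencies") or []
    return result
```

With this sketch, a statement whose systemParameters is null and one that omits the field entirely normalize to the same result.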

Schema

NOTE: This section describes the fields within predicate. For a description of the other top-level fields, such as subject, see Statement.

{
    // Standard attestation fields:
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [...],

    // Predicate:
    "predicateType": "https://slsa.dev/provenance/v1?draft",
    "predicate": {
        "buildDefinition": {
            "buildType": string,
            "externalParameters": { [string]: #ParameterValue },
            "systemParameters": { [string]: #ParameterValue },
            "resolvedDependencies": [ ...#ArtifactReference ],
        },
        "runDetails": {
            "builder": {
                "id": string,
                "version": { [string]: string },
                "builderDependencies": [ ...#ArtifactReference ],
            },
            "metadata": {
                "invocationId": string,
                "startedOn": #Timestamp,
                "finishedOn": #Timestamp,
            },
            "byproducts": [ ...#ArtifactReference ],
        }
    }
}

#ParameterValue: {
    "artifactRef": #ArtifactReference
} | {
    "scalarValue": string
} | {
    "mapValue": { [string]: string }
} | {
    "arrayValue": [ ...string ]
}

#ArtifactReference: {
    "uri": string,
    "digest": {
        "sha256": string,
        "sha512": string,
        "sha1": string,
        // TODO: list the other standard algorithms
        [string]: string,
    },
    "localName": string,
    "downloadLocation": string,
    "mediaType": string,
}

#Timestamp: string  // <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>Z

Protocol buffer schema

Link: provenance.proto

syntax = "proto3";

package slsa.v1;

import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";

// NOTE: While this file uses snake_case as per the Protocol Buffers Style Guide, the
// provenance is always serialized using JSON with lowerCamelCase. Protobuf
// tooling performs this case conversion automatically.

message Provenance {
  BuildDefinition build_definition = 1;
  RunDetails run_details = 2;
}

message BuildDefinition {
  string build_type = 1;
  map<string, ParameterValue> external_parameters = 2;
  map<string, ParameterValue> system_parameters = 3;
  repeated ArtifactReference resolved_dependencies = 4;
}

message ParameterValue {
  // Logically a oneof, but oneof doesn't support repeated or map.
  ArtifactReference artifact_ref = 1;
  string scalar_value = 2;
  map<string, string> map_value = 3;
  repeated string array_value = 4;
}

message ArtifactReference {
  string uri = 1;
  map<string, string> digest = 2;
  string local_name = 3;
  string download_location = 4;
  string media_type = 5;
}

message RunDetails {
  Builder builder = 1;
  BuildMetadata metadata = 2;
  repeated ArtifactReference byproducts = 3;
}

message Builder {
  string id = 1;
  map<string, string> version = 2;
  repeated ArtifactReference builder_dependencies = 3;
}

message BuildMetadata {
  string invocation_id = 1;
  google.protobuf.Timestamp started_on = 2;
  google.protobuf.Timestamp finished_on = 3;
}

Provenance

REQUIRED for SLSA Build L1: buildDefinition, runDetails

Field Type Description
buildDefinition BuildDefinition

The input to the build. The accuracy and completeness are implied by runDetails.builder.id.

runDetails RunDetails

Details specific to this particular execution of the build.

BuildDefinition

REQUIRED for SLSA Build L1: buildType, externalParameters

Field Type Description
buildType string (TypeURI)

Identifies the template for how to perform the build and interpret the parameters and dependencies.

The URI SHOULD resolve to a human-readable specification that includes: an overall description of the build type; a list of all parameters (name, description, external vs system, artifact vs scalar vs…, required vs optional, etc.); unambiguous instructions for how to initiate the build given this BuildDefinition; and a complete example. Example: https://slsa.dev/github-actions-workflow/v0.1

externalParameters map (string→ParameterValue)

The parameters that are under external control, such as those set by a user or tenant of the build system. They MUST be complete at SLSA Build L3, meaning that there is no additional mechanism for an external party to influence the build. (At lower SLSA Build levels, the completeness MAY be best effort.)

The build system SHOULD be designed to minimize the size and complexity of externalParameters, in order to reduce fragility and ease verification. Consumers SHOULD have an expectation of what “good” looks like; the more information that they need to check, the harder that task becomes.

systemParameters map (string→ParameterValue)

The parameters that are under the control of the builder. The primary intention of this field is for debugging, incident response, and vulnerability management. The values here MAY be necessary for reproducing the build. There is no need to verify these parameters because the build system is already trusted, and in many cases it is not practical to do so.

resolvedDependencies array (ArtifactReference)

Collection of artifacts needed at build time, aside from those listed in externalParameters or systemParameters. For example, if the build script fetches and executes “example.com/foo.sh”, which in turn fetches “example.com/bar.tar.gz”, then both “foo.sh” and “bar.tar.gz” should be listed here.

The BuildDefinition describes all of the inputs to the build. It SHOULD contain all the information necessary and sufficient to initialize the build and begin execution.

The externalParameters and systemParameters are the top-level inputs to the template, meaning inputs not derived from another input. Each field is a map from parameter name to parameter value. Each parameter name MUST be unique across externalParameters and systemParameters. The following conventional names are RECOMMENDED when appropriate:

  • source: The primary input to the build.
  • config: The build configuration, if different from source.
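
The uniqueness requirement on parameter names can be checked mechanically. The following is a minimal sketch in Python; the function name `duplicated_parameter_names` is a hypothetical helper and not part of this specification.

```python
def duplicated_parameter_names(build_definition: dict) -> set:
    """Return names that appear in both externalParameters and systemParameters."""
    # Unset and null maps are treated as empty, per the parsing rules.
    external = build_definition.get("externalParameters") or {}
    system = build_definition.get("systemParameters") or {}
    return set(external) & set(system)


# A name used in both maps violates the uniqueness requirement.
conflicts = duplicated_parameter_names({
    "externalParameters": {"source": {"scalarValue": "a"}},
    "systemParameters": {"source": {"scalarValue": "b"}},
})
```

An empty result means the two maps share no parameter names.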

Guidelines:

  • Maximize the amount of information that is implicit from the meaning of buildType. In particular, any value that is boilerplate and the same for every build SHOULD be implicit.

  • Reduce parameters by moving configuration to input artifacts whenever possible. For example, instead of passing in compiler flags via an external parameter that has to be verified separately, require the flags to live next to the source code or build configuration so that verifying the latter automatically verifies the compiler flags.

  • If possible, architect the build system to use this definition as its sole top-level input, in order to guarantee that the information is sufficient to run the build.

  • In some cases, the build configuration is evaluated client-side and sent over the wire, such that the build system cannot determine its origin. In those cases, the build system SHOULD serialize the configuration in a deterministic way and record the digest without a uri. This allows one to consider the client-side evaluation as a separate “build” with its own provenance, such that the verifier can chain the two provenance attestations together to determine the origin of the configuration.

TODO: Explain the purpose of resolvedDependencies. Why do we need it? What goes in it? Is it OK for it to be incomplete? If a dependency is already pinned, does it need to be listed? How does one choose between resolvedDependencies and builderDependencies?

ParameterValue

REQUIRED: exactly one of the fields MUST be set.

Field Type Description
artifactRef ArtifactReference Reference to an artifact.
scalarValue string Scalar value.
mapValue map (string→string) Unordered collection of name/value pairs.
arrayValue array (string) Ordered collection of values.

For simplicity, only string values or collections of string values are supported.
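
As an illustration of the "exactly one field" rule, a check might look like the sketch below. The function name `parameter_value_kind` is hypothetical, and the sketch assumes that only absent or null fields count as unset (an empty string, map, or array counts as set).

```python
PARAMETER_VALUE_FIELDS = ("artifactRef", "scalarValue", "mapValue", "arrayValue")


def parameter_value_kind(value: dict) -> str:
    """Return the name of the single field that is set, or raise ValueError."""
    # Assumption of this sketch: null is treated the same as unset.
    set_fields = [f for f in PARAMETER_VALUE_FIELDS if value.get(f) is not None]
    if len(set_fields) != 1:
        raise ValueError(f"expected exactly one field, found {len(set_fields)}")

    kind = set_fields[0]
    payload = value[kind]
    # Only string values or collections of string values are supported.
    if kind == "scalarValue" and not isinstance(payload, str):
        raise ValueError("scalarValue must be a string")
    if kind == "mapValue" and not (
        isinstance(payload, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in payload.items())
    ):
        raise ValueError("mapValue must map strings to strings")
    if kind == "arrayValue" and not (
        isinstance(payload, list) and all(isinstance(v, str) for v in payload)
    ):
        raise ValueError("arrayValue must be an array of strings")
    return kind
```

The contents of artifactRef are not inspected here; they follow the ArtifactReference rules.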

RFC: The design of parameters is still not settled. We welcome feedback on this particular design and suggestions for alternatives. In particular:

  • How restrictive should we be? This is somewhat of a balance between making it easier for the builder vs verifier. A very restrictive type, such as only strings, makes it easier to set expectations but harder for a builder to describe reality. A very open type, such as an arbitrary JSON object, provides a lot of freedom to builders but possibly at the cost of complexity in terms of expectations.
  • Is there a better way to express types than using field names?
  • Do we need ArtifactReference? Would it instead make sense to just have the raw parameter here and then represent the digest in resolvedDependencies? What is the specific use case?

Alternatives considered so far:

  • Only allow strings (difficult for many builders)
  • Allow strings, maps of strings, or arrays of strings (current design)
  • Allow arbitrary JSON (challenge: how do we do ArtifactReference?)

ArtifactReference

REQUIRED: at least one of uri or digest

Field Type Description
uri string (URI)

URI describing where this artifact came from. When possible, this SHOULD be a universal and stable identifier, such as a source location or Package URL (purl).

digest DigestSet

One or more cryptographic digests of the contents of this artifact.

TODO: Decide on hex vs base64 in #533 then document it here.

localName string

The name for this artifact local to the build.

downloadLocation string (URI)

URI identifying the location that this artifact was downloaded from, if different and not derivable from uri.

mediaType string (MediaType)

Media type (aka MIME type) that this artifact was interpreted as.

Example:

{
    "uri": "pkg:pypi/pyyaml@6.0",
    "digest": {"sha256": "5f0689d54944564971f2811f9788218bfafb21aa20f532e6490004377dfa648f"},
    "localName": "PyYAML-6.0.tar.gz",
    "downloadLocation": "https://files.pythonhosted.org/packages/36/2b/61d51a2c4f25ef062ae3f74576b01638bebad5e045f747ff12643df63844/PyYAML-6.0.tar.gz",
    "mediaType": "application/gzip"
}
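
The "at least one of uri or digest" requirement can be sketched as a small Python check, applied here to the example above. The function name `check_artifact_reference` is a hypothetical helper. Digest values are only checked to be strings, because the encoding (hex vs base64) is not yet decided in this draft.

```python
def check_artifact_reference(ref: dict) -> None:
    """Raise ValueError if the reference has neither a uri nor a digest."""
    uri = ref.get("uri")
    digest = ref.get("digest") or {}
    if not uri and not digest:
        raise ValueError("ArtifactReference requires at least one of uri or digest")
    # The digest encoding is left unchecked; only the value type is verified.
    for algorithm, value in digest.items():
        if not isinstance(value, str):
            raise ValueError(f"digest[{algorithm!r}] must be a string")


example = {
    "uri": "pkg:pypi/pyyaml@6.0",
    "digest": {"sha256": "5f0689d54944564971f2811f9788218bfafb21aa20f532e6490004377dfa648f"},
    "localName": "PyYAML-6.0.tar.gz",
    "mediaType": "application/gzip",
}
check_artifact_reference(example)  # passes: both uri and digest are present
```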

RFC: Do we need all these fields? Is this adding too much complexity?

RunDetails

REQUIRED for SLSA Build L1: builder (unless id is implicit from the attestation envelope)

Field Type Description
builder Builder

Identifies the entity that executed the invocation, which is trusted to have correctly performed the operation and populated this provenance.

metadata BuildMetadata

Metadata about this particular execution of the build.

byproducts array (ArtifactReference)

Additional artifacts generated during the build that should not be considered the “output” of the build but that may be needed during debugging or incident response. For example, this might reference logs generated during the build and/or a digest of the fully evaluated build configuration.

In most cases, this SHOULD NOT contain all intermediate files generated during the build. Instead, this should only contain files that are likely to be useful later and that cannot be easily reproduced.

TODO: Do we need some recommendation for how to distinguish between byproducts? For example, should we recommend using localName?

Builder

REQUIRED for SLSA Build L1: id (unless implicit from the attestation envelope)

Field Type Description
id string (TypeURI)

URI indicating the transitive closure of the trusted builder.

TODO: In most cases this is implicit from the envelope layer (e.g. the public key or x.509 certificate), which is just one more thing to mess up. Can we rescope this to avoid the duplication and thus the security concern? For example, if the envelope identifies the build system, this might identify the tenant project?

TODO: Provide guidance on how to choose a URI, what scope it should have, stability, how verification works, etc.

version map (string→string)

Version numbers of components of the builder.

builderDependencies array (ArtifactReference)

Dependencies used by the orchestrator that are not run within the workload and that should not affect the build, but may affect the provenance generation or security guarantees.

TODO: Flesh out this model more.

The builder represents the transitive closure of all the entities that are, by necessity, trusted to faithfully run the build and record the provenance.

The id MUST reflect the trust base that consumers care about. How detailed to be is a judgement call. For example, GitHub Actions supports both GitHub-hosted runners and self-hosted runners. The GitHub-hosted runner might be a single identity because it’s all GitHub from the consumer’s perspective. Meanwhile, each self-hosted runner might have its own identity because not all runners are trusted by all consumers.

Consumers MUST accept only specific signer-builder pairs. For example, “GitHub” can sign provenance for the “GitHub Actions” builder, and “Google” can sign provenance for the “Google Cloud Build” builder, but “GitHub” cannot sign for the “Google Cloud Build” builder.
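
One way a consumer might enforce signer-builder pairs is an explicit allowlist, sketched below in Python. All identifiers here (the signer names, the builder URIs, and the function `is_trusted_pair`) are hypothetical. How the signer identity is obtained from the attestation envelope is out of scope for this sketch.

```python
# Hypothetical allowlist of (signer identity, builder id) pairs.
TRUSTED_PAIRS = {
    ("signer.example/github", "https://builder.example/github-actions"),
    ("signer.example/google", "https://builder.example/google-cloud-build"),
}


def is_trusted_pair(signer_id: str, builder_id: str) -> bool:
    """Accept provenance only if this signer is allowed to sign for this builder."""
    # The pair is checked as a unit: a trusted signer combined with a
    # trusted builder is still rejected unless that exact pair is listed.
    return (signer_id, builder_id) in TRUSTED_PAIRS
```

Checking the pair rather than each identity separately is what prevents one trusted signer from vouching for another signer's builder.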

Design rationale: The builder is distinct from the signer because one signer may generate attestations for more than one builder, as in the GitHub Actions example above. The field is required, even if it is implicit from the signer, to aid readability and debugging. It is an object to allow additional fields in the future, in case one URI is not sufficient.

RFC: Should we just allow builders to set arbitrary properties, rather than calling out version and builderDependencies? We don’t expect verifiers to use any of them, so maybe that’s the simpler approach? Or have a properties field that is an arbitrary object? (#319)

RFC: Do we want/need to identify the tenant of the build system, separately from the build system itself? If so, should it be a single id that combines both (e.g. https://builder.example/tenants/company1.example/project1), or two separate fields (e.g. {"id": "https://builder.example", "tenant": "https://company1.example/project1"})? What would the use case be for this? How should verification work?

BuildMetadata

REQUIRED: (none)

Field Type Description
invocationId string

Identifies this particular build invocation, which can be useful for finding associated logs or other ad-hoc analysis. The exact meaning and format are defined by builder.id; by default it is treated as opaque and case-sensitive. The value SHOULD be globally unique.

startedOn string (Timestamp)

The timestamp of when the build started.

finishedOn string (Timestamp)

The timestamp of when the build completed.
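
For debugging or analysis, a consumer might parse these timestamps using the `<YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>Z` form given in the schema. The following Python sketch assumes that form without fractional seconds; the helper names `parse_timestamp` and `build_duration_seconds` are hypothetical.

```python
from datetime import datetime, timezone

# Matches <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>Z; the trailing Z denotes UTC.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a Timestamp string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def build_duration_seconds(metadata: dict):
    """Return the build duration in seconds, or None if either field is unset."""
    started = metadata.get("startedOn")
    finished = metadata.get("finishedOn")
    if started is None or finished is None:
        return None
    return (parse_timestamp(finished) - parse_timestamp(started)).total_seconds()
```

Both fields are optional, so the sketch returns None rather than failing when either is missing.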

Verification

TODO: Describe how clients are expected to verify the provenance.

Index of build types

The following is a partial index of build type definitions. Each contains a complete example predicate.

TODO: Before marking the spec stable, add at least 1-2 other build types to validate that the design is general enough to apply to other builders.

Migrating from 0.2

To migrate from version 0.2 (old), use the following pseudocode. The meaning of each field is unchanged unless otherwise noted.

{
    "buildDefinition": {
        // The `buildType` MUST be updated for v1.0 to describe how to
        // interpret the parameters and dependencies.
        "buildType": /* updated version of */ old.buildType,
        "externalParameters": old.invocation.parameters + {
            // It is RECOMMENDED to rename "entryPoint" to something more
            // descriptive.
            "entryPoint": old.invocation.configSource.entryPoint,
            // OPTION 1:
            // If the old `configSource` was the sole top-level input,
            // (i.e. containing the source or a pointer to the source):
            "source": {
                "artifactRef": {
                    "uri": old.invocation.configSource.uri,
                    "digest": old.invocation.configSource.digest,
                },
            },
            // OPTION 2:
            // If the old `configSource` contained just build configuration
            // and a separate top-level input contained the source:
            "source": {
                "artifactRef": old.materials[indexOfSource],
            },
            "config": {
                "artifactRef": {
                    "uri": old.invocation.configSource.uri,
                    "digest": old.invocation.configSource.digest,
                },
            },
        },
        "systemParameters": old.invocation.environment,
        "resolvedDependencies": old.materials,
    },
    "runDetails": {
        "builder": {
            "id": old.builder.id,
            "version": null,  // not in v0.2
            "builderDependencies": null,  // not in v0.2
        },
        "metadata": {
            "invocationId": old.metadata.buildInvocationId,
            "startedOn": old.metadata.buildStartedOn,
            "finishedOn": old.metadata.buildFinishedOn,
        },
        "byproducts": null,  // not in v0.2
    },
}

The following fields from v0.2 are no longer present in v1.0:

  • entryPoint: Use externalParameters[<name>] instead.
  • buildConfig: No longer inlined into the provenance. Instead, either:
    • If the configuration is a top-level input, record its digest in externalParameters["config"].
    • Else if there is a known use case for knowing the exact resolved build configuration, record its digest in byproducts. An example use case might be someone who wishes to parse the configuration to look for bad patterns, such as curl | bash.
    • Else omit it.
  • metadata.completeness: Now implicit from builder.id.
  • metadata.reproducible: Now implicit from builder.id.
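
The migration can also be sketched as an executable Python function. This is a hedged illustration of OPTION 1 only (the old configSource was the sole top-level input), not a definitive converter. The function name `migrate_v02_predicate` and its arguments are hypothetical. The sketch copies parameter values as-is, as the pseudocode does, so it does not convert them into ParameterValue form; it also assumes systemParameters is populated directly from old.invocation.environment. Fields that did not exist in v0.2 are left unset.

```python
def migrate_v02_predicate(old: dict, new_build_type: str,
                          entry_point_name: str = "entryPoint") -> dict:
    """Convert a v0.2 predicate to the v1.0 draft layout (OPTION 1 only)."""
    invocation = old.get("invocation") or {}
    config_source = invocation.get("configSource") or {}
    old_metadata = old.get("metadata") or {}

    # externalParameters = old parameters + entry point + source reference.
    external = dict(invocation.get("parameters") or {})
    external[entry_point_name] = config_source.get("entryPoint")
    external["source"] = {
        "artifactRef": {
            "uri": config_source.get("uri"),
            "digest": config_source.get("digest"),
        }
    }

    return {
        "buildDefinition": {
            # The caller supplies the updated buildType; the old one is not reused.
            "buildType": new_build_type,
            "externalParameters": external,
            "systemParameters": invocation.get("environment") or {},
            "resolvedDependencies": old.get("materials") or [],
        },
        "runDetails": {
            "builder": {"id": (old.get("builder") or {}).get("id")},
            "metadata": {
                "invocationId": old_metadata.get("buildInvocationId"),
                "startedOn": old_metadata.get("buildStartedOn"),
                "finishedOn": old_metadata.get("buildFinishedOn"),
            },
        },
    }
```

Passing a more descriptive `entry_point_name` follows the recommendation to rename "entryPoint".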

Change history

v1.0 (DRAFT)

Major refactor to reduce misinterpretation, including a minor change in model.

  • Significantly expanded all documentation.
  • Altered the model slightly to better align with real-world build systems, align with reproducible builds, and make verification easier.
  • Grouped fields into buildDefinition vs runDetails.
  • Renamed parameters and environment to externalParameters and systemParameters, respectively. Both can now reference artifacts or string values.
  • Split and merged configSource into externalParameters.
  • Split and merged materials into resolvedDependencies, externalParameters, systemParameters, and builderDependencies.
  • Added localName, downloadLocation, and mediaType to artifact references.
  • Removed buildConfig; can be replaced with externalParameters["config"], byproducts, or simply omitted.
  • Removed completeness and reproducible; now implied by builder.id.
  • Added builder.version.
  • Added byproducts.

v0.2

Refactored to aid clarity and added buildConfig. The model is unchanged.

  • Replaced definedInMaterial and entryPoint with configSource.
  • Renamed recipe to invocation.
  • Moved invocation.type to top-level buildType.
  • Renamed arguments to parameters.
  • Added buildConfig, which can be used as an alternative to configSource to validate the configuration.

rename: slsa.dev/provenance

Renamed to “slsa.dev/provenance”.

v0.1.1

  • Added metadata.buildInvocationId.

v0.1

Initial version, named “in-toto.io/Provenance”.