Adds "Legacy Syntax Editions" to the GitHub code repository.

PiperOrigin-RevId: 574248480
pull/14418/head
Protobuf Team Bot 2023-10-17 13:29:25 -07:00 committed by Copybara-Service
parent cbef12f8cf
commit 8489d8d461
2 changed files with 145 additions and 0 deletions


@ -35,3 +35,4 @@ The following topics are in this repository:
* [Edition Evolution](edition-evolution.md)
* [Edition Naming](edition-naming.md)
* [Editions Feature Visibility](editions-feature-visibility.md)
* [Legacy Syntax Editions](legacy-syntax-editions.md)


@ -0,0 +1,144 @@
# Legacy Syntax Editions
**Author:** [@mkruskal-google](https://github.com/mkruskal-google)

**Approved:** 2023-09-08

Should proto2/proto3 be treated as editions?
## Background
[Edition Zero Features](edition-zero-features.md) lays out our plan for edition
2023, which will unify proto2 and proto3. Since early in the design process,
we've discussed the possibility of making proto2 and proto3 "special" editions,
but never laid out what exactly it would look like or determined if it was
necessary.

We recently redesigned editions to be represented as enums
([Edition Naming](edition-naming.md)), and also how edition defaults are
propagated to generators and runtimes
([Editions: Life of a FeatureSet](editions-life-of-a-featureset.md)). With these
changes, there could be an opportunity to special-case proto2 and proto3 in a
beneficial way.
## Problem Description
While the original plan was to keep editions and syntax orthogonal, doing that
naively means we'd be supporting two very different codebases. This has serious
maintenance costs, especially when it comes to test coverage. We could expect
sub-optimal test coverage of editions initially, which would gradually become
poor coverage of syntax later as files migrate to editions. Since we need to
support both syntax and editions long-term, this isn't ideal.

In the implementation of editions in C++, we decided to unify a lot of the
infrastructure to avoid this issue. We define global feature sets for proto2 and
proto3, and try to use those internally instead of checking syntax directly.
Pushing the syntax/editions branch earlier in the stack gives us a lot of
indirect test coverage for editions much earlier.

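
A minimal sketch of this pattern, written in Python for brevity and not taken
from the C++ implementation: the only syntax/editions branch lives in a
hypothetical `features_for_file` helper, and downstream checks such as
`is_closed_enum` look only at features. The feature names and values follow
edition 2023; the two global feature sets are a reduced, illustrative subset.

```python
from typing import Dict, Optional

FeatureSet = Dict[str, str]

# Illustrative subset of the global feature sets for proto2 and proto3.
PROTO2_FEATURES: FeatureSet = {
    "field_presence": "EXPLICIT",
    "enum_type": "CLOSED",
    "utf8_validation": "NONE",
}
PROTO3_FEATURES: FeatureSet = {
    "field_presence": "IMPLICIT",
    "enum_type": "OPEN",
    "utf8_validation": "VERIFY",
}


def features_for_file(syntax: str, edition_features: Optional[FeatureSet] = None) -> FeatureSet:
    """Hypothetical helper: the single place that branches on syntax vs. editions."""
    if syntax == "proto2":
        return PROTO2_FEATURES
    if syntax == "proto3":
        return PROTO3_FEATURES
    if syntax == "editions" and edition_features is not None:
        return edition_features
    raise ValueError(f"cannot determine features for syntax {syntax!r}")


# Everything below looks only at features, so proto2/proto3 files exercise
# the same code paths that editions files will use.
def is_closed_enum(features: FeatureSet) -> bool:
    return features["enum_type"] == "CLOSED"


def must_verify_utf8(features: FeatureSet) -> bool:
    return features["utf8_validation"] == "VERIFY"
```

With this split, an existing proto2 test of closed-enum handling also covers
the `enum_type = CLOSED` path that an editions file would take.
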
A separate issue is how Prototiller will support the conversion of syntax to
edition 2023. For features it knows about, we can hardcode defaults into the
transforms. However, third-party feature owners will have no way of signaling
what the old proto2/proto3 behavior was, so Prototiller won't be able to provide
any transformations for their features by default. They'd need to provide custom
Prototiller transforms hardcoding all of their features.
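
To make the gap concrete, here is a small Python sketch (not Prototiller's
actual logic) of the generic step a migration tool could take if every feature
recorded its proto2 default: compare it with the target edition's default and
pin any feature whose value would change. The defaults are an illustrative
subset, and `hypothetical_vendor.legacy_mode` is a made-up stand-in for a
third-party feature.

```python
from typing import Dict

FeatureSet = Dict[str, str]

# Illustrative subset of feature defaults; the vendor feature is hypothetical.
PROTO2_DEFAULTS: FeatureSet = {
    "field_presence": "EXPLICIT",
    "enum_type": "CLOSED",
    "repeated_field_encoding": "EXPANDED",
    "utf8_validation": "NONE",
    "hypothetical_vendor.legacy_mode": "ON",
}
EDITION_2023_DEFAULTS: FeatureSet = {
    "field_presence": "EXPLICIT",
    "enum_type": "OPEN",
    "repeated_field_encoding": "PACKED",
    "utf8_validation": "VERIFY",
    "hypothetical_vendor.legacy_mode": "OFF",
}


def overrides_for_migration(source: FeatureSet, target: FeatureSet) -> FeatureSet:
    """Features to set explicitly so behavior is unchanged after migration."""
    return {name: value for name, value in source.items() if target.get(name) != value}
```

If the vendor never declared a proto2 default, its feature simply would not
appear in `PROTO2_DEFAULTS`, and a generic tool would have nothing to compare
against, which is the situation described above.
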
## Recommended Solution
We recommend adding two new special editions to our current set:
```
enum Edition {
  EDITION_UNKNOWN = 0;
  EDITION_PROTO2 = 998;
  EDITION_PROTO3 = 999;
  EDITION_2023 = 1000;
}
```
These will be treated the same as any other edition, except by our parser,
which will reject `edition = "proto2"` and `edition = "proto3"` in proto files.
The real benefit is that this allows features to specify what their
proto2/proto3 defaults are, making it easier for Prototiller to handle
migration. It also allows generators and runtimes to unify their internals more
completely, treating proto2/proto3 files exactly the same as editions.
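
The following Python sketch illustrates the intended behavior under two
assumptions: feature defaults are declared as (edition, value) pairs, and
resolution picks the value for the latest edition that is not newer than the
file's edition, in the spirit of the default compilation described in
"Editions: Life of a FeatureSet". `parse_edition_statement`,
`edition_for_syntax`, and `resolve_default` are hypothetical names. Because the
legacy editions sort before `EDITION_2023`, they can carry their own defaults
without affecting newer editions.

```python
from enum import IntEnum
from typing import List, Tuple


class Edition(IntEnum):
    # Mirrors the proposed enum; legacy editions sort before EDITION_2023.
    EDITION_UNKNOWN = 0
    EDITION_PROTO2 = 998
    EDITION_PROTO3 = 999
    EDITION_2023 = 1000


def parse_edition_statement(value: str) -> Edition:
    """Hypothetical parser check for `edition = "...";` in a .proto file."""
    if value in ("proto2", "proto3"):
        raise ValueError(f'edition = "{value}" is not allowed; use syntax = "{value}"')
    if value == "2023":
        return Edition.EDITION_2023
    raise ValueError(f"unknown edition {value!r}")


def edition_for_syntax(syntax: str) -> Edition:
    """Internally, legacy syntax files are treated as the special editions."""
    return {"proto2": Edition.EDITION_PROTO2, "proto3": Edition.EDITION_PROTO3}[syntax]


def resolve_default(edition_defaults: List[Tuple[Edition, str]], edition: Edition) -> str:
    """Return the default declared for the latest edition <= `edition`."""
    applicable = [value for e, value in sorted(edition_defaults) if e <= edition]
    if not applicable:
        raise ValueError(f"no default declared for {edition.name}")
    return applicable[-1]


# Example feature with distinct proto2, proto3, and 2023 defaults.
FIELD_PRESENCE_DEFAULTS = [
    (Edition.EDITION_PROTO2, "EXPLICIT"),
    (Edition.EDITION_PROTO3, "IMPLICIT"),
    (Edition.EDITION_2023, "EXPLICIT"),
]
```

For example, resolving `FIELD_PRESENCE_DEFAULTS` for a proto3 file yields
`IMPLICIT`, while a file declaring `edition = "2023"` gets `EXPLICIT`.
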
### Serialized Descriptors
As we now know, there are a lot of serialized `descriptor.proto` descriptor sets
out there that need to continue working for O(months). In order to avoid
blocking edition zero for that long, we may need fallbacks in protoc for the
case where feature resolution *fails*. If the file is proto2/proto3, failure
should result in a fallback to the existing hardcoded defaults. We can remove
these later once we're willing to break stale `descriptor.proto` snapshots that
predate the changes in this doc.
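
A minimal Python sketch of that fallback, assuming feature resolution reports
failure with an exception. `FeatureResolutionError`, `resolve_with_fallback`,
and the hardcoded defaults table are hypothetical, and the table is a reduced
subset. The fallback is deliberately limited to proto2/proto3, so editions
files still surface the error.

```python
from typing import Callable, Dict

FeatureSet = Dict[str, str]

# Hypothetical hardcoded fallback defaults (illustrative subset).
HARDCODED_LEGACY_DEFAULTS: Dict[str, FeatureSet] = {
    "proto2": {"field_presence": "EXPLICIT", "enum_type": "CLOSED"},
    "proto3": {"field_presence": "IMPLICIT", "enum_type": "OPEN"},
}


class FeatureResolutionError(Exception):
    """Hypothetical error raised when feature resolution fails."""


def resolve_with_fallback(syntax: str, resolve: Callable[[], FeatureSet]) -> FeatureSet:
    """Try normal resolution; fall back to hardcoded defaults for proto2/proto3 only."""
    try:
        return resolve()
    except FeatureResolutionError:
        if syntax in HARDCODED_LEGACY_DEFAULTS:
            return dict(HARDCODED_LEGACY_DEFAULTS[syntax])
        # Editions files have no legacy behavior to fall back to.
        raise
```

Removing the fallback later amounts to deleting the `except` branch once stale
`descriptor.proto` snapshots no longer need to be supported.
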
### Bootstrapping
In order to get feature resolution running in proto2 and proto3, we need to be
able to support bootstrapped protos. For these builds, we can't use any
reflection without deadlocking, which means feature defaults can't be compiled
during runtime. We would have had to solve this problem anyway when it came time
to migrate these protos to editions, but this proposal forces our hand early.

Luckily, "Editions: Life of a FeatureSet" already set us up for this scenario,
and we have Blaze rules for embedding these defaults into code. For C++
specifically, this will need to be checked in alongside the other bootstrapped
protos. Other languages will be able to do this more dynamically via genrules.
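
As a rough illustration of what embedding defaults means, the Python sketch
below splits the work into a build-time step that resolves every feature for
every supported edition and emits the result as source code, and a runtime step
that only reads that literal, with no reflection involved. The function names
and the emitted `FEATURE_DEFAULTS` variable are hypothetical; the real build
rules and their output format may differ.

```python
import pprint
from typing import Dict, List, Tuple

# Hypothetical build-time input: (edition number, default) pairs per feature,
# using EDITION_PROTO2 = 998, EDITION_PROTO3 = 999, EDITION_2023 = 1000.
FEATURE_SPECS: Dict[str, List[Tuple[int, str]]] = {
    "field_presence": [(998, "EXPLICIT"), (999, "IMPLICIT"), (1000, "EXPLICIT")],
    "enum_type": [(998, "CLOSED"), (999, "OPEN")],
}
SUPPORTED_EDITIONS = [998, 999, 1000]


def compile_defaults(specs: Dict[str, List[Tuple[int, str]]],
                     editions: List[int]) -> Dict[int, Dict[str, str]]:
    """Build time: resolve every feature for every supported edition."""
    table: Dict[int, Dict[str, str]] = {}
    for edition in editions:
        resolved: Dict[str, str] = {}
        for name, defaults in specs.items():
            # The latest default declared at or before this edition wins.
            applicable = [value for e, value in sorted(defaults) if e <= edition]
            if applicable:
                resolved[name] = applicable[-1]
        table[edition] = resolved
    return table


def emit_source(table: Dict[int, Dict[str, str]]) -> str:
    """Build time: render the table as source to check in or generate."""
    return "FEATURE_DEFAULTS = " + pprint.pformat(table) + "\n"


# Runtime: loading the generated code needs no descriptors or reflection.
# exec() stands in for importing the generated module.
generated: dict = {}
exec(emit_source(compile_defaults(FEATURE_SPECS, SUPPORTED_EDITIONS)), generated)
FEATURE_DEFAULTS = generated["FEATURE_DEFAULTS"]
```
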
### Feature Inference
While we can calculate defaults using the same logic as in editions, actually
inferring "features" from proto2/proto3 needs some custom code. For example:
* The `required` keyword sets `LEGACY_REQUIRED` presence
* The `optional` keyword in proto3 sets `EXPLICIT` presence
* The `group` keyword implies `DELIMITED` encoding
* The `packed` option flips between `PACKED` and `EXPANDED` repeated field
  encoding

This logic needs to be written in code, and will need to be duplicated in every
language we support. Any language-specific feature transformations will also
need to be included in that language. To make this as portable as possible, we
will define inference functions per descriptor type: each type of descriptor
will have its own set of transformations that should be applied to its features
for legacy editions.
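
A minimal Python sketch of field-level inference, assuming a simplified,
hypothetical `LegacyField` record in place of a real `FieldDescriptorProto`. It
returns only the features a declaration overrides and layers them on top of
the proto2/proto3 edition defaults. The defaults table is an illustrative
subset, and other descriptor types (files, messages, enums) would need their
own functions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FeatureSet = Dict[str, str]

# Illustrative subset of the legacy edition defaults.
LEGACY_DEFAULTS: Dict[str, FeatureSet] = {
    "proto2": {
        "field_presence": "EXPLICIT",
        "repeated_field_encoding": "EXPANDED",
        "message_encoding": "LENGTH_PREFIXED",
    },
    "proto3": {
        "field_presence": "IMPLICIT",
        "repeated_field_encoding": "PACKED",
        "message_encoding": "LENGTH_PREFIXED",
    },
}


@dataclass
class LegacyField:
    """Hypothetical, simplified view of a proto2/proto3 field declaration."""
    syntax: str                    # "proto2" or "proto3"
    label: str = "optional"        # also covers proto3 singular fields
    proto3_optional: bool = False  # explicit `optional` keyword in proto3
    is_group: bool = False         # declared with the `group` keyword
    packed: Optional[bool] = None  # explicit [packed = ...] option, if present


def infer_field_features(field: LegacyField) -> FeatureSet:
    """Features implied by the declaration itself, not by edition defaults."""
    inferred: FeatureSet = {}
    if field.label == "required":
        inferred["field_presence"] = "LEGACY_REQUIRED"
    if field.syntax == "proto3" and field.proto3_optional:
        inferred["field_presence"] = "EXPLICIT"
    if field.is_group:
        inferred["message_encoding"] = "DELIMITED"
    if field.packed is not None:
        inferred["repeated_field_encoding"] = "PACKED" if field.packed else "EXPANDED"
    return inferred


def resolve_field_features(field: LegacyField) -> FeatureSet:
    """Legacy edition defaults first, then declaration-specific overrides."""
    resolved = dict(LEGACY_DEFAULTS[field.syntax])
    resolved.update(infer_field_features(field))
    return resolved
```

Keeping inference separate from defaults means the per-language code only
has to port `infer_field_features`; the defaults themselves come from the
feature definitions.
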
#### Pros
* Makes it clearer that proto2/proto3 are "like" editions
* Gives Prototiller a little more information in the transformation from
proto2/proto3 to editions (not necessarily 2023)
* Allows proto2/proto3 defaults to be specified in a single location
* Makes unification of syntax/edition code easier to implement in runtimes
* Allows cross-language proto2/proto3 testing with the conformance framework
mentioned in "Editions: Life of a FeatureSet"
#### Cons
* Adds special-case legacy editions, which may be somewhat confusing
* We will need to port feature inference logic across all languages. This is
arguably cheaper than maintaining branched proto2/proto3 code in all
languages though
## Considered Alternatives
### Do Nothing
If we do nothing, there will be no built-in unification of syntax and editions.
Runtimes could choose any point to split the logic.
#### Pros
* Requires no changes to editions code
#### Cons
* Likely results in lower test coverage
* May hide issues until we start rolling out edition 2023
* Prototiller would have to hard-code proto2/proto3 defaults of features it
knows, and couldn't even try to migrate runtimes it doesn't