Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272137: Make Collection and Optional classes streamable #5050

Closed
wants to merge 2 commits into from
Closed

8272137: Make Collection and Optional classes streamable #5050

wants to merge 2 commits into from

Conversation

CC007
Copy link

@CC007 CC007 commented Aug 9, 2021

create Streamable and ParallelStreamable interface and use them in Collection and Optional


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Integration blocker

 ⚠️ Title mismatch between PR and JBS for issue JDK-8272137

Issue

  • JDK-8272137: Make Iterable classes streamable ⚠️ Title mismatch between PR and JBS.

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5050/head:pull/5050
$ git checkout pull/5050

Update a local copy of the PR:
$ git checkout pull/5050
$ git pull https://git.openjdk.java.net/jdk pull/5050/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5050

View PR using the GUI difftool:
$ git pr show -t 5050

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5050.diff

Sorry, something went wrong.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
…n Collection and Optional
@CC007
Copy link
Author

CC007 commented Aug 9, 2021

Something seemed to have gone wrong with the jcheck

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Aug 9, 2021
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 9, 2021

Hi @CC007, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user CC007" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk
Copy link

openjdk bot commented Aug 9, 2021

@CC007 The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Aug 9, 2021
@CC007
Copy link
Author

CC007 commented Aug 9, 2021

/signed

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Aug 9, 2021
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 9, 2021

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Aug 13, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 13, 2021
@mlbridge
Copy link

mlbridge bot commented Aug 13, 2021

Webrevs

@CC007
Copy link
Author

CC007 commented Aug 13, 2021

I couldn't think of any, but if there are more places where these interfaces could be used, please tell me, so I can add them.

@liach
Copy link
Member

liach commented Aug 14, 2021

Aren't all iterable implementations effectively streamable if they properly implement spliterator? And the spliterator implementation can always be sequential or parallel, dependent on how you feed it into StreamSupport.

@forax
Copy link
Member

forax commented Aug 14, 2021

Hi Rick,
I do not think that such interfaces are a good addition to the JDK,
we do not want to promote the fact that you can create a Stream from an Iterable because the resulting Stream is not efficient (or is not as efficient as it could be) and we can already uses a Supplier if we want to to represent a factory of streams

See my comment on the bug for more details

@CC007
Copy link
Author

CC007 commented Aug 14, 2021

The name of this pull request is a bit wrong, but it had to be the same as the JBS name.

The interfaces don't make the Iterable have stream and parallelStream methods, but are two separate interfaces to denote that an object can be streamed. These interfaces are fully decoupled from the Iterable interface and it is merely suggested that these can be used together in cases that an object is both Iterable and Streamable. No implementation is implied and has to be provided by the object or its superclasses or other interfaces (like collection).
PS: my name is Rik, not Rick

@CC007
Copy link
Author

CC007 commented Aug 14, 2021

Aren't all iterable implementations effectively streamable if they properly implement spliterator? And the spliterator implementation can always be sequential or parallel, dependent on how you feed it into StreamSupport.

Yes indeed, but to add extra methods to the Iterable interface itself could have compatibility or possible security issues depending on whether these methods are abstract or not resp.

Therefore I made it a fully separate interface here and used it in places with known implementations (Collection and Optional) so that there's no changes in the exposed API of the other classes. The only difference is that you can now make use of the fact that these interfaces extend Streamable and/or ParallelStreamable

Copy link
Contributor

@briangoetz briangoetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to this change. These issues were explored in the JSR 335 EG and it was agreed that this abstraction did not carry its weight. In any case, it is premature to bring a PR for a significant API change that has not been discussed first on corelibs-dev. Please withdraw the PR, and if you want to continue the discussion, bring it to corelibs-dev.

@CC007
Copy link
Author

CC007 commented Aug 15, 2021

I object to this change. These issues were explored in the JSR 335 EG and it was agreed that this abstraction did not carry its weight. In any case, it is premature to bring a PR for a significant API change that has not been discussed first on corelibs-dev. Please withdraw the PR, and if you want to continue the discussion, bring it to corelibs-dev.

I agree with the spliterator performance issues and the ability to be able to create IntStreams from Iterable. Therefore option 1 and 2 from JDK-8272137 seem fairly unfeasible to implement. This was already hinted at there, but for different reasons.

This PR however implements option 4: an option that doesn't actually make the Iterable class have a stream method, but instead abstracts the stream method to an interface. This method is then implemented by Collection and Optional. It also allows other code to implement this interface.

Using this implementation, you don't have the issues that the JSR 335 EG specified that would be present if Iterable itself were to be made Streamable. You wouldn't want that anyway, because streamability and iterability are two similar but very separate concerns in Java.

@CC007 CC007 changed the title 8272137: Make Iterable classes streamable 8272137: Make Collection and Optional classes streamable Aug 15, 2021
@CC007
Copy link
Author

CC007 commented Aug 15, 2021

Changed the title, since it seemed to cause confusion.

@openjdk
Copy link

openjdk bot commented Aug 15, 2021

⚠️ @CC007 a branch with the same name as the source branch for this pull request (pr/5050) is present in the target repository. If you eventually integrate this pull request then the branch pr/5050 in your personal fork will diverge once you sync your personal fork with the upstream repository.

To avoid this situation, create a new branch for your changes and reset the pr/5050 branch. You can do this by running the following commands in a local repository for your personal fork. Note: you do not have to name the new branch NEW-BRANCH-NAME.

$ git checkout -b NEW-BRANCH-NAME
$ git branch -f pr/5050 a86ac0d1e3a6f02e587362c767abdf62b308d321
$ git push -f origin pr/5050

Then proceed to create a new pull request with NEW-BRANCH-NAME as the source branch and close this one.

@briangoetz
Copy link
Contributor

briangoetz commented Aug 15, 2021 via email

@CC007
Copy link
Author

CC007 commented Aug 15, 2021

I understand what you are proposing. I do not believe Streamable carries its weight.

That argument seems subjective. Could you elaborate on that in an objective manner?

I based this implementation on the Interface segregation principle and the reason I added this was because of a valid use case where it would be more convenient and logical to be able to stream a certain object, but since such an interface doesn't exist yet, the api didn't have a way to declare the object streamable, without it also declaring the object to be a collection or optional (or api specific streamable implementation).

@briangoetz
Copy link
Contributor

briangoetz commented Aug 15, 2021 via email

@CC007
Copy link
Author

CC007 commented Aug 15, 2021

I read through these posts and while I did see good reasons for not moving stream to Iterable, I didn't see any mention of a separate Streamable-like interface, so could you elaborate on that it "did not carry its weight"?

@amaembo
Copy link
Contributor

amaembo commented Aug 16, 2021

Mostly agreed with Brian. Judging from 7 years of using Stream API, I can say that this abstraction would not solve any real problem. If you need a way to create many identical streams on demand, just accept Supplier<Stream<T>>. This allows more flexibility for clients. They can not only supply myCollection::stream or myOptional::stream but also () -> Arrays.stream(myArray), () -> IntStream.range(...).boxed(), () -> myCollection.stream().filter(something) or whatever else. A dedicated Streamable interface is too limited and will require adapters in many cases but you can already adapt anything to Supplier<Stream<T>>. People already use Supplier<Stream<T>> idiom pretty often, so creating a new Streamable interface would add an API mess: some people would stick with Supplier and others would migrate to Streamable. So I vote to reject this PR.

I said "mostly" because I think that PR is a good starting point for discussion. It's much easier to explain which enhancement you are proposing if you already present some code. And we are already at corelibs-dev, as PR comments are mirrored there, and for some people, it's more comfortable to discuss via GitHub interface, as you don't have to subscribe and get tons of unrelated e-mails, you can concentrate on a single discussion only. So in my opinion, it's completely ok to write code and create a PR before the discussion, even if it's likely to be thrown away.

@openjdk
Copy link

openjdk bot commented Aug 16, 2021

⚠️ @CC007 a branch with the same name as the source branch for this pull request (pr/5050) is present in the target repository. If you eventually integrate this pull request then the branch pr/5050 in your personal fork will diverge once you sync your personal fork with the upstream repository.

To avoid this situation, create a new branch for your changes and reset the pr/5050 branch. You can do this by running the following commands in a local repository for your personal fork. Note: you do not have to name the new branch NEW-BRANCH-NAME.

$ git checkout -b NEW-BRANCH-NAME
$ git branch -f pr/5050 a86ac0d1e3a6f02e587362c767abdf62b308d321
$ git push -f origin pr/5050

Then proceed to create a new pull request with NEW-BRANCH-NAME as the source branch and close this one.

@kevinrushforth
Copy link
Member

@CC007 Please follow the course of action that @briangoetz has requested. This needs to be discussed on the appropriate mailing list (core-libs-dev in this case) prior to submitting a pull request. It is premature to review this PR.

I invite you to refer to the OpenJDK Developers' Guide, specifically the Socialize your change section.

@PaulSandoz
Copy link
Member

To reiterate: These issues were explored in the JSR 335 EG and it was agreed that this abstraction did not carry its weight.

Yes, we explored this when the Stream API was being designed. It's hard, if not impossible, to capture all decisions for posterity. In the passing years I don't think anything significantly new has arisen for us to revisit that decision.

@CC007
Copy link
Author

CC007 commented Aug 17, 2021

It is fine that the pull request is closed for now, but I agree with @amaembo that if this conversation is indeed mirrored there, then it shouldn't be a problem to discuss the proposal here, along with the proposed changes.

I actually worked out a usecase (See: https://github.com/CC007/InterfaceSegregationDemo), where I partially reimagined the Collection interface stack and its neighboring interfaces/classes. I also added a Demo class to show how I imagined to use these interfaces and to show how the granularity of these interfaces can be useful.

The fact that I have to create copies of interfaces and delegates for implementations makes me feel like something is lacking in the collection stack and that for this usecase Supplier<E> wouldn't cut it, but feel free to fork and refactor my code to prove me wrong.

PS: while creating the Demo class, I noticed that it couldn't be analyzed if the return value of a method with type <T extends Streamable<String> & Addable<String> & Iterable<String> & RandomAccess> could be safely cast to a new local variable with type <SA extends Streamable<String> & Addable<String> and that you can't declare variables of type <? extends Streamable<String> & Addable<String> (it won't let you use the & when using the ?), but I digress.

@CC007
Copy link
Author

CC007 commented Aug 17, 2021

Also, sorry for my Millennial lack of knowledge of older communication methods, but if I wanted to reply to a specific thread in a mailing list, how would I do that? Does it require certain content in the subject and/or quoted text in the message body?

It does feel way less intuitive than just clicking the comment box or clicking ... -> Quote reply on a comment in this PR

@liach
Copy link
Member

liach commented Aug 17, 2021

You can view the mailing lists at https://mail.openjdk.java.net/mailman/listinfo and subscribe there.
If you want to send a mail, just send one to say jdk-dev at openjdk.java.net (and carbon copy if you reply to a specific person so the list can see a copy)


On a separate note, @CC007 your model falls back to the fairly-old inheritance vs embodiment (extending a class vs using it as a field) debate. Nowadays, I think Java has evolved to reduce excessive reliances on inheritances, as there can be clashes (for example, a class cannot implement a comparator for two different type arguments). Also due to these potential of clashing, in a lot of java code, interface implementation classes avoid implementing multiple interfaces at once.

@CC007
Copy link
Author

CC007 commented Aug 17, 2021

The List and Collection interface was almost directly taken from java.util (apart from the small feature interfaces, that I extracted from them), so it would make sense that those classes use an older design.

Are you suggesting that there are deeper architectural issues with the hierarchy-heavy collection stack? Would it be better if a list contained a collection, rather than inheriting from it? If that is the case, anything that is proposed in this PR is merely aiming at symptoms of a possibility much larger underlying problem, which might require going back to the drawing board for all collection related features.

@liach
Copy link
Member

liach commented Aug 17, 2021

With modern java, you can always create a Streamable on your own (and have specifications/documentations that a simple Supplier lacks) and implement it like:

public interface Streamable<T> {
  Stream<T> stream();
  static Streamable<T> ofIterable(Iterable<T> iterable) {
    return () -> StreamSupport.stream(iterable.spliterator(), false);
  }
}

The lack of such an interface in the JDK is not really a problem.

Are you suggesting that there are deeper architectural issues with the hierarchy-heavy collection stack?

I am saying that your sample project has unnecessarily many interfaces for your collection model, where many of them have little virtue on their own. This interface mash is susceptible to accidentally introducing unwanted features into the hierarchy and having method or specification clashes for implementation.

Looking back on identifying efficiently streamable objects, people can usually find efficient streams by the method return types. Your interface doesn't allow people to identify BufferedReader.lines() as an efficiently streamable target, for instance.
In addition, I think the main purpose of Optional.stream() is to allow using it in stream flatmapping than to promote stream creation with optionals.

@CC007
Copy link
Author

CC007 commented Aug 17, 2021

Ah ok, I see your point. In the case that you want to have something be only Streamable, you can create an interface like this (fixed missing method type param and added ofCollection:

public interface Streamable<T> {
    
    Stream<T> stream();

    static <T> Streamable<T> ofIterable(Iterable<T> iterable) {
        return () -> StreamSupport.stream(iterable.spliterator(), false);
    }

    static <T> Streamable<T> ofCollection(Collection<T> collection) {
        return collection::stream;
    }
}

This will indeed allow you to only expose the stream() method, even to the degree that you can't even expose the other methods with type casting, which is a nice side effect. You could also add a static method for ofOptional, if required, but you made a good point about Optional.stream's general use case (though it could still be used as a stream when needed).

At first I didn't fully understand how that would resolve the issue for whenever you need something that is both Iterable and Streamable, but after rereading your comment I came up with the following interface:

/**
 * This is an interface that specifies that its content can be traversed and acted upon,
 * either through iteration or using streams
 */
public interface Traversable<T> extends Streamable<T>, Iterable<T> {

    static <T> Traversable<T> ofIterable(Iterable<T> iterable) {
        return new Traversable<T>() {
            @Override
            public Stream<T> stream() {
                return Streamable.ofIterable(iterable).stream();
            }

            @Override
            public Iterator<T> iterator() {
                return iterable.iterator();
            }
        };
    }

    static <T> Traversable<T> ofStreamable(Streamable<T> streamable) {
        return new Traversable<T>() {
            @Override
            public Stream<T> stream() {
                return streamable.stream();
            }

            @Override
            public Iterator<T> iterator() {
                return streamable.stream().iterator();
            }
        };
    }

    static <T> Traversable<T> ofCollection(Collection<T> collection) {
        return new Traversable<T>() {
            @Override
            public Stream<T> stream() {
                return collection.stream();
            }

            @Override
            public Iterator<T> iterator() {
                return collection.iterator();
            }
        };
    }
}

For anyone who's doubtful about the arguments against this change, you can find this code in use in demo2 in my demo repo: https://github.com/CC007/InterfaceSegregationDemo/tree/master/src/main/java/com/github/cc007/interfacesegregationdemo/demo2

@CC007 CC007 deleted the pr/5050 branch August 17, 2021 17:51
@CC007
Copy link
Author

CC007 commented Aug 17, 2021

I believe that the accompanying JBS issue (JDK-8272137) can be closed. Maybe a mention of how the discussion in this PR was resolved (with code examples) could be added there before closing it.

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from Brian Goetz on core-libs-dev:

Starting with a PR is not the way to add significant API surface to the JDK. You have to first build consensus on the mailing list for the design concept before we start talking about code.

FTR, Interfaces like these were already discussed and rejected in the JSR 335 design discussions.

Sent from my iPad

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from Brian Goetz on core-libs-dev:

For the third time: This discussion illustrates why the PR was
premature; the design was not agreed upon first.? High-level design
discussions (i.e., "is this a good design", "is this a good idea at
all", "are we solving the right problem", "does it need to be solved in
the JDK") should happen first on the discussion lists; if they happen on
a PR like this, that's clear evidence that important steps have been
skipped.

On 8/14/2021 10:48 PM, CC007 wrote:

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from Alan Bateman on core-libs-dev:

On 15/08/2021 14:29, Brian Goetz wrote:

For the third time: This discussion illustrates why the PR was
premature; the design was not agreed upon first.? High-level design
discussions (i.e., "is this a good design", "is this a good idea at
all", "are we solving the right problem", "does it need to be solved
in the JDK") should happen first on the discussion lists; if they
happen on a PR like this, that's clear evidence that important steps
have been skipped.

The OpenJDK developers guide has been getting some TLC in recent months.
One of the improvements is the "Socialize your change" section [1] which
does try to make it clear that API proposals should be discussed on the
mailing list before going near a PR.

-Alan

[1] https://openjdk.java.net/guide/#socialize-your-change

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from Alan Snyder on core-libs-dev:

Ah, if only one could define a type alias Streamable<T> = Supplier<Stream<T>>...

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from Alan Snyder on core-libs-dev:

This provides an opportunity for me to promote what I believe is a much more important missing interface, namely, an interface that supports a semantic replacement for type casting.

Using type casting (instanceof) is a really bad way to test an object for an optional capability.

The reason is that instanceof is strictly limited by how the object is implemented. It will only work if the object *directly* implements the interface. It does not support an object that might provide the requested interface using an auxiliary object. It doesn?t support delegation at all. If you try to wrap an object with a transparent wrapper implemented using delegation, the wrapper must support exactly the interfaces that you expect from the wrapped object. If some of those are optional, you wind up with many versions of the wrapper to ensure that instanceof will work on the wrapper as expected.

This is hardly a new idea. I?ve seen this idea in several major libraries. But, because Java does define its own version of this interface, this approach cannot be used in general.

I suspect it would be useful for some of the problems being discussed here.

For concreteness, this is how I define it:

public interface Extensible
{
<T> @Nullable T getExtension(@NotNull Class<T> c);
}

with a static method used in place of instanceof:

public static <T> @Nullable T getExtension(@Nullable Object o, @NotNull Class<T> c)
{
if (o == null) {
return null;
}

if (c.isInstance(o)) {
return c.cast(o);
}

if (o instanceof Extensible) {
Extensible x = (Extensible) o;
return x.getExtension(c);
}

return null;
}

On Aug 17, 2021, at 10:54 AM, CC007 <github.com+5381337+cc007 at openjdk.java.net> wrote:

On Mon, 9 Aug 2021 12:28:23 GMT, CC007 <github.com+5381337+CC007 at openjdk.org> wrote:

create Streamable and ParallelStreamable interface and use them in Collection and Optional

Ah ok, I see your point. In the case that you want to have something be only `Streamable`, you can create an interface like this (fixed missing method type param and added `ofCollection`:

public interface Streamable<T> {

Stream<T> stream();

static <T> Streamable<T> ofIterable(Iterable<T> iterable) {
return () -> StreamSupport.stream(iterable.spliterator(), false);
}

static <T> Streamable<T> ofCollection(Collection<T> collection) {
return collection::stream;
}
}

This will indeed allow you to only expose the `stream()` method, even to the degree that you can't even expose the other methods with type casting, which is a nice side effect. You could also add a static method for `ofOptional`, if required, but you made a good point about `Optional.stream`'s general use case (though it could still be used as a stream when needed).

@CC007
Copy link
Author

CC007 commented Sep 1, 2021

Mailing list message from Alan Snyder on core-libs-dev:

Ah, if only one could define a type alias Streamable = Supplier<Stream>...

If you use that, you'd lose some semantics that the method name would have given you. I feel that @liach 's solution is cleaner, since that preserves this method name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

None yet

7 participants