8270100: Fix some inaccurate GC logging #4727

simonis · 2021-07-08T15:19:01Z

If running with -Xlog:gc+heap*=trace the JVM will print the extra per thread amount which is added to the new generation on resize:

[0,105s][debug][gc,ergo,heap ] GC(0) New generation size 34112K->34176K [eden=27392K,survivor=3392K]
[0,105s][trace][gc,ergo,heap ] GC(0) [allowed 0K extra for 0 threads]

Currently this will always print "0K extra for 0 threads" no matter how much extra space was added.

Also, the shrink factor will always be printed to be 0%, even if we run with -XX:-ShrinkHeapInSteps which pins the shrink factor at 100%:

[13,213s][trace][gc,heap ] GC(34) shrink_bytes: 463564,0K current_shrink_factor: 0 new shrink factor: 0 _min_heap_delta_bytes: 192,0K
[13,239s][trace][gc,heap ] GC(34) Shrinking tenured generation from 531852K to 68288K

The fix is trivial.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8270100: Fix some inaccurate GC logging

Reviewers

Albert Mingkun Yang (@albertnetymk - Committer)
Thomas Schatzl (@tschatzl - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4727/head:pull/4727
$ git checkout pull/4727

Update a local copy of the PR:
$ git checkout pull/4727
$ git pull https://git.openjdk.java.net/jdk pull/4727/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4727

View PR using the GUI difftool:
$ git pr show -t 4727

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4727.diff

bridgekeeper · 2021-07-08T15:19:24Z

👋 Welcome back simonis! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-07-08T15:21:13Z

@simonis The following label will be automatically applied to this pull request:

hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-07-08T15:25:03Z

Webrevs

01: Full - Incremental (afef6ad4)
00: Full (dcb1f71e)

albertnetymk · 2021-07-09T08:05:45Z

Can you share the complete flags and the benchmark for easy reproducing? Somehow I can't reproduce the gc logs; the flags and the benchmark I used:

java -XX:+UseSerialGC -Xms2g -Xmx2g '-Xlog:gc,gc+heap*=trace:gc.log::filecount=0' -jar dacapo-9.12-MR1-bach.jar h2 -s huge -n 1

simonis · 2021-07-09T17:00:45Z

Hi @albertnetymk,

to see the first issue, you have to make the initial heap smaller than the the maximum heap. It can easily seen with your command line. Before the change:

$ java -XX:+UseSerialGC -Xms100m -Xmx2g '-Xlog:gc,gc+heap*=trace' -jar /share/soware/Java/dacapo-9.12-MR1-bach.jar h2 -s huge -n 1 | grep "allowed 0K extra"
[1,339s][trace][gc,ergo,heap ] GC(11)   [allowed 0K extra for 0 threads]
[2,066s][trace][gc,ergo,heap ] GC(17)   [allowed 0K extra for 0 threads]
[3,323s][trace][gc,ergo,heap ] GC(23)   [allowed 0K extra for 0 threads]
[5,560s][trace][gc,ergo,heap ] GC(29)   [allowed 0K extra for 0 threads]
...

After the change:

$ java -XX:+UseSerialGC -Xms100m -Xmx2g '-Xlog:gc,gc+heap*=trace' -jar /share/soware/Java/dacapo-9.12-MR1-bach.jar h2 -s huge -n 1 | grep "allowed 0K extra"
[1,397s][trace][gc,ergo,heap ] GC(11)   [allowed 5K extra for 1 threads]
[2,202s][trace][gc,ergo,heap ] GC(17)   [allowed 5K extra for 1 threads]
[3,468s][trace][gc,ergo,heap ] GC(23)   [allowed 5K extra for 1 threads]
[5,699s][trace][gc,ergo,heap ] GC(29)   [allowed 5K extra for 1 threads]
...

To see the second issue, you obviously will have to shrink the heap and set -XX:-ShrinkHeapInSteps. You can use the Uncommit.java program attached below. Before the change:

$ java -XX:+UseSerialGC -Xms100m -Xmx1g '-Xlog:gc,gc+heap*=trace' -XX:-ShrinkHeapInSteps Uncommit
...
[9,759s][trace][gc,heap      ] GC(11)     shrinking:  initSize: 68288,0K  maximum_desired_capacity: 90337,3K
[9,759s][trace][gc,heap      ] GC(11)     shrink_bytes: 572226,7K  current_shrink_factor: 0  new shrink factor: 0  _min_heap_delta_bytes: 192,0K
[9,780s][trace][gc,heap      ] GC(11) Shrinking tenured generation from 662564K to 90340K
...

After the change:

$ java -XX:+UseSerialGC -Xms100m -Xmx1g '-Xlog:gc,gc+heap*=trace' -XX:-ShrinkHeapInSteps Uncommit
...
[3,941s][trace][gc,heap      ] GC(11)     shrinking:  initSize: 68288,0K  maximum_desired_capacity: 92130,6K
[3,941s][trace][gc,heap      ] GC(11)     shrink_bytes: 575773,4K  current_shrink_factor: 100  new shrink factor: 100  _min_heap_delta_bytes: 192,0K
[3,958s][trace][gc,heap      ] GC(11) Shrinking tenured generation from 667904K to 92132K
...

Here's the example program:

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class Uncommit {

  static class KB {
    int[] kb;
    public KB() {
      kb = new int[256];
    }
  }

  static class MB {
    KB[] mb;
    public MB() {
      mb = new KB[1024];
      for (int i = 0; i < mb.length; i++) {
        mb[i] = new KB();
      }
    }
  }

  static class GB {
    MB[] gb;
    public GB() {
      gb = new MB[1024];
      for (int i = 0; i < gb.length; i++) {
        gb[i] = new MB();
      }
    }
  }

  static void printRSS() throws IOException {
    Files.lines(FileSystems.getDefault().getPath("/proc/self/status")).
      filter(l -> l.startsWith("VmRSS")).
      forEach(System.out::println);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Calling System.gc()");
    System.gc();
    printRSS();
    System.out.println("\nPress <Enter>");
    System.console().readLine();

    int size = Integer.parseInt(args.length > 0 ? args[0] : "512");
    MB mb[] = new MB[size];
    for (int i = 0; i < size; i++) {
      mb[i] = new MB();
    }
    System.out.println("Successfully allocated " + size + "MB memory");
    printRSS();
    System.out.println("\nPress <Enter>");
    System.console().readLine();
    for (int i = 0; i < size; i++) {
      mb[i] = null;
    }
    System.out.println("Successfully unlinked " + size + "MB memory");
    printRSS();
    System.out.println("\nPress <Enter>");
    System.console().readLine();
    int gcs = 5;
    if (args.length > 1) {
      gcs = Integer.parseInt(args[1]);
    }
    for (int i = 1; i <= gcs; i++) {
      System.gc();
      System.out.println("Performed " + i + ". System.gc()\n");
      printRSS();
      System.out.println("Press <Enter>");
      System.console().readLine();
      }
  }
}

albertnetymk · 2021-07-09T21:29:43Z

Thank you for the instruction, with which I can reproduce both issues.

As for the first fix, threads_count and thread_increase_size don't really depend on adjust_for_thread_increase, right? Then how about just changing the values in the log, sth like:

int threads_count = Threads::number_of_non_daemon_threads();;
size_t thread_increase_size = threads_count * NewSizeThreadIncrease;
log_trace(gc, ergo, heap)(
        "  [allowed " SIZE_FORMAT "K extra for %d threads]",
          thread_increase_size/K, threads_count);

As for the second fix, I think the following is more symmetric:

if (ShrinkHeapInSteps) {
  // assign current_shrink_factor and _shrink_factor
} else {
  // Shink 100% to the desired value
  _shrink_factor = current_shrink_factor = 100;
}

PS: currently, the assignment of current_shrink_factor is done at the beginning of the method, but it's quite far from its actual usage. Maybe move it to the if-branch above.

assert(_shrink_factor <= 100, "invalid shrink factor");
size_t current_shrink_factor = _shrink_factor;
_shrink_factor = 0;

simonis · 2021-07-12T08:22:09Z

threads_count and adjust_for_thread_increase get computed in adjust_for_thread_increase(). That's why they are passed in by reference. I don't think it makes sense to duplicate the computation of threads_count and adjust_for_thread_increase before we log them because the next time somebody will change adjust_for_thread_increase(), the values will be wrong again. This error occurred in the first place because JDK-8144527 has factored out the computation of threads_count and adjust_for_thread_increase into the new adjust_for_thread_increase() function, but did not pass the values back for logging.

For the second fix, I don't see why assigning to 100 to _shrink_factor and current_shrink_factor should be any better? These values aren't used except for logging if ShrinkHeapInSteps is false. Also, the assignment of current_shrink_factor can't easily be moved into the if-branch because _shrink_factor has to be reset to 0 before we potentially expand the heap (see comment But if we recompute size without shrinking, it goes back to 0%.). But on heap expansion, we'll return from the method early, before even reaching the shrinking logic.

albertnetymk · 2021-07-12T18:50:32Z

duplicate the computation of threads_count and adjust_for_thread_increase

I don't view a method call and accessing a global as much duplication. Anyway, this is subjective.

I don't see why assigning to 100 to _shrink_factor and current_shrink_factor should be any better?

Having a logic-free (does nothing but print) logger brings less surprise. IOW, many expect a logger faithfully reflect the actual internal states with no distortion.

These values aren't used except for logging if ShrinkHeapInSteps is false.

That's true, but I prefer that the logger doesn't know ShrinkHeapInSteps, and just prints _shrink_factor and current_shrink_factor as they are.

But on heap expansion, we'll return from the method early, before even reaching the shrinking logic.

I see; thank you for pointing it out. How about sth like this?

void CardGeneration::compute_new_size() {
  ... 
  if (capacity_after_gc < minimum_desired_capacity) {
    ...
    // expanding the heap; reset shrink factor
    _shrink_factor = 0;
    return;
  }

  if (capacity_after_gc > maximum_desired_capacity) {
    ...
    if (ShrinkHeapInSteps) {
      current_shrink_factor = _shrink_factor;
      _shrink_factor = ...
    } else {
      // Shrink 100% to the desired value
      current_shrink_factor = _shrink_factor = 100;
    }
    // log internal states
  }
}

simonis · 2021-07-13T06:16:32Z

@albertnetymk your latest proposal is still changing the current semantics. Before, the shrink factor was reset on every invocation of CardGeneration::compute_new_size(). With your proposal, it will only be reset if we expand the heap.

My patch is really just a trivial fix of some logging errors. I'm not against changing or improving it, but before I introduce any behavioral changes, I'd like to hear a second opinion.

tschatzl · 2021-07-13T09:08:14Z

I would prefer to not change semantics for heap sizing without lots of testing, and discussing the results. People have come to expect particular behavior, and actually "decaying" the shrink factor on every CardGeneration::compute_new_size() call makes sense. This makes shrinking the heap more slowly and less surprising. Consider the situation when you have the following sequence of shrink (s), no change (n) heap changes:

s - s - s - n - n - ... lots of n ... - n - s

On the last s, the GC would shrink 100% of the suggested value, which is kind of surprising given that the many "no-changes" gcs that would have indicated the application is kind of stable now (or you started another round in your benchmark). Generally it is also a good idea to be very reluctant with heap decreases as it typically has a significant performance impact during further operation. Too much heap shrinking tends to also just cause the next gc(s) to increase the heap again, i.e. ending up in wasted cpu time :)

Fwiw, for this issue, I would split adjust_for_thread_increase into two methods: one calculating the thread_increase_size, and the original adjust_for_thread_increase take that value as a parameter.

Repeating the calculation at the logging as suggested by @albertnetymk gives different results on boundary conditions (i.e. the overflow checks are missing. Strangely on any overflow condition, thread_increase_size does not factor in at all into the desired_new_size. I think this is a bug; my preferred alternative would be to maximize the thread_increase_size to just not overflow instead - but that is another issue(!)). Another reason why I would prefer to separate the calculation of that value from the calculation of the desired_new_size is that these seem to logically be two separate steps, so that makes sense anyway imho.

I agree with @albertnetymk that it is preferable to not have the conditions in the log message to make them as simple as possible - so I would probably just set current_shrink_factor and _shrink_factor to 100 at the top of CardGeneration::compute_new_size if ShrinkHeapInSteps is false. In addition to that probably put these variable declarations at the start of the shrinking calculations block, somewhere close to

size_t shrink_bytes = 0;

Calling Threads::number_of_non_daemon_threads another time for the log message does not seem to be a problem, but if you wanted, that calculate_thread_increase_size() could also return the thread count used for the calculation.

Of course, I did not run and test my suggestions.

Three people, three opinions, but maybe there's a middle ground to be found somewhere :p

albertnetymk · 2021-07-13T09:40:07Z

Before, the shrink factor was reset on every invocation of CardGeneration::compute_new_size(). With your proposal, it will only be reset if we expand the heap.

Indeed, my suggestion is flawed.

"decaying" the shrink factor on every CardGeneration::compute_new_size() call makes sense.

Agree.

so I would probably just set current_shrink_factor and _shrink_factor to 100 at the top of CardGeneration::compute_new_size if ShrinkHeapInSteps is false

Sounds good to me.

Re NewSizeThreadIncrease, what's the motivation for having it (used only by Serial) in the first place? Going through the git-log, I can only find the overflow-fix from JDK-8144527.

simonis · 2021-07-14T08:18:57Z

@albertnetymk, @tschatzl thanks for your feedback. From what I understand we all agree that this change should only fix the reported GC logging issues without changing the semantics or behavior of the underlying GC algorithm. My new version of the fix tries to achieve exactly this based on your recommendations:

I've introduced a new helper function calculate_thread_increase_size() which takes the number of threads as an argument and returns the thread_increase_size which is then passed to calculate_thread_increase_size().
For the shrink factor I now set _shrink_factor to 100 right in the constructor if ShrinkHeapInSteps is not set and never change it. Notice that ShrinkHeapInSteps is not manageable, so this should be fine. If ShrinkHeapInSteps is true, _shrink_factor will be reset to zero at every invocation of CardGeneration::compute_new_size() just as before the change.

With these changes the logging issues are fixed without any change in functionality.

openjdk · 2021-07-14T11:19:22Z

@simonis This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8270100: Fix some inaccurate GC logging

Reviewed-by: ayang, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 68 new commits pushed to the master branch:

357fe09: 8247351: [aarch64] NullPointerException during stack walking (clhsdb "where -a")
589a1d5: 8270155: ARM32: Improve register dump in hs_err
b6d7653: 8264419: Remove has_max_index argument from G1BlockOffsetTablePart::block_at_or_preceding
4a7ccf3: Merge
8583aab: 8270025: DynamicCallSiteDesc::withArgs doesn't throw NPE
d32e42c: 8270184: [TESTBUG] Add coverage for jvmci ResolvedJavaType.toJavaName() for lambdas
b2416b6: 8269281: java/foreign/Test{Down,Up}call.java time out
bd95c0c: 8269635: Stress test SEGV while emitting OldObjectSample
00ef65f: 8269525: Deadlock during Volano with JFR
e631add: 8259848: Interim javadoc build does not support platform links
... and 58 more: https://git.openjdk.java.net/jdk/compare/bca570c56ee17cb4735a8360ec79a3ca22049d05...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

simonis · 2021-07-14T17:37:29Z

Thanks for the reviews @albertnetymk, @tschatzl!

simonis · 2021-07-14T17:38:06Z

/integrate

openjdk · 2021-07-14T17:39:28Z

Going to push as commit fb86d13.
Since your change was applied there have been 70 commits pushed to the master branch:

b1bb05b: 8269592: [JVMCI] Optimize c2v_iterateFrames
c0d4eff: 6506405: Math.abs(float) is slow
357fe09: 8247351: [aarch64] NullPointerException during stack walking (clhsdb "where -a")
589a1d5: 8270155: ARM32: Improve register dump in hs_err
b6d7653: 8264419: Remove has_max_index argument from G1BlockOffsetTablePart::block_at_or_preceding
4a7ccf3: Merge
8583aab: 8270025: DynamicCallSiteDesc::withArgs doesn't throw NPE
d32e42c: 8270184: [TESTBUG] Add coverage for jvmci ResolvedJavaType.toJavaName() for lambdas
b2416b6: 8269281: java/foreign/Test{Down,Up}call.java time out
bd95c0c: 8269635: Stress test SEGV while emitting OldObjectSample
... and 60 more: https://git.openjdk.java.net/jdk/compare/bca570c56ee17cb4735a8360ec79a3ca22049d05...master

Your commit was automatically rebased without conflicts.

openjdk · 2021-07-14T17:39:46Z

@simonis Pushed as commit fb86d13.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr label Jul 8, 2021

openjdk bot added the hotspot-gc label Jul 8, 2021

8270100: Fix some inaccurate GC logging

afef6ad

simonis force-pushed the JDK-8270100 branch from dcb1f71 to afef6ad Compare July 14, 2021 08:07

albertnetymk approved these changes Jul 14, 2021

View reviewed changes

tschatzl approved these changes Jul 14, 2021

View reviewed changes

openjdk bot added the ready label Jul 14, 2021

openjdk bot closed this Jul 14, 2021

openjdk bot added integrated and removed ready rfr labels Jul 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8270100: Fix some inaccurate GC logging #4727

8270100: Fix some inaccurate GC logging #4727

simonis commented Jul 8, 2021 •

edited by openjdk bot

Loading

bridgekeeper bot commented Jul 8, 2021

openjdk bot commented Jul 8, 2021

mlbridge bot commented Jul 8, 2021 •

edited

Loading

albertnetymk commented Jul 9, 2021

simonis commented Jul 9, 2021

albertnetymk commented Jul 9, 2021

simonis commented Jul 12, 2021

albertnetymk commented Jul 12, 2021

simonis commented Jul 13, 2021

tschatzl commented Jul 13, 2021

albertnetymk commented Jul 13, 2021

simonis commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

simonis commented Jul 14, 2021

simonis commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

8270100: Fix some inaccurate GC logging #4727

8270100: Fix some inaccurate GC logging #4727

Conversation

simonis commented Jul 8, 2021 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Reviewing

bridgekeeper bot commented Jul 8, 2021

openjdk bot commented Jul 8, 2021

mlbridge bot commented Jul 8, 2021 • edited Loading

Webrevs

albertnetymk commented Jul 9, 2021

simonis commented Jul 9, 2021

albertnetymk commented Jul 9, 2021

simonis commented Jul 12, 2021

albertnetymk commented Jul 12, 2021

simonis commented Jul 13, 2021

tschatzl commented Jul 13, 2021

albertnetymk commented Jul 13, 2021

simonis commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

simonis commented Jul 14, 2021

simonis commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

openjdk bot commented Jul 14, 2021

simonis commented Jul 8, 2021 •

edited by openjdk bot

Loading

mlbridge bot commented Jul 8, 2021 •

edited

Loading