Java garbage collection (GC)

Introduction

The purpose of this article is to introduce Java garbage collection (GC) in an easy way. I hope you find it helpful.

Stop-the-world

One term you need to know in GC is “stop-the-world”. Stop-the-world means that the JVM stops the application in order to run a GC. When it happens, every thread except the threads needed for the GC stops its work. The interrupted tasks resume only after the GC has completed. GC tuning often means reducing this stop-the-world time.
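To get a rough feel for how much time a program spends in GC, the standard java.lang.management API can be queried from inside the application. The sketch below is only an illustration: the class name GcPauseReport and the allocation loop are made up for this article, and getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not the same thing as pure stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPauseReport {
    /** Sums the accumulated collection time (in ms) reported by all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcTimeMillis();

        // Hypothetical workload: allocate short-lived garbage so the collector has work to do.
        long bytesAllocated = 0;
        for (int i = 0; i < 1_000_000; i++) {
            byte[] junk = new byte[1024];
            bytesAllocated += junk.length;
        }

        long after = totalGcTimeMillis();
        System.out.println("Allocated bytes: " + bytesAllocated);
        System.out.println("Approximate GC time during workload: " + (after - before) + " ms");
    }
}
```

The numbers depend on the heap size and on the collector in use, so treat them as a relative indicator rather than an exact pause measurement.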

Generational Garbage Collection process

Java code does not explicitly allocate memory and then free it. Some people set an object reference to null or call System.gc() to try to free memory explicitly. Setting a reference to null is not a big deal, but calling System.gc() can affect system performance drastically and should be avoided. Because the developer does not free memory explicitly in program code, the garbage collector finds the unnecessary (garbage) objects and removes them. The garbage collector was designed around two hypotheses:

  • Most objects soon become unreachable.
  • References from old objects to young objects only exist in small numbers.

Together, these hypotheses are called the weak generational hypothesis. To take advantage of it, the heap of the HotSpot VM is physically divided into a young generation and an old generation.

Young generation: Most newly created objects are located here. Since most objects soon become unreachable, many objects are created in the young generation and then disappear. When objects disappear from this area, we say a “minor GC” has occurred.

Old generation: The objects that did not become unreachable and survived in the young generation are copied here. It is generally larger than the young generation. As it is bigger in size, GC occurs less frequently here than in the young generation. When objects disappear from the old generation, we say a “major GC” (or a “full GC”) has occurred.
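You can ask a running JVM which heap areas it actually uses. The following is a minimal sketch using the standard MemoryPoolMXBean API; the class name HeapPools is made up for this article, and the pool names it prints depend on the JVM version and on the collector in use, so do not expect one fixed list.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    /** Returns the names of all memory pools that belong to the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long used = (usage == null) ? 0 : usage.getUsed();
            System.out.println(pool.getName() + ": " + used + " bytes used");
        }
    }
}
```

With a generational collector, the output typically contains separate pools for the Eden space, the Survivor space and the old generation.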

Hotspot Heap Structure

Fig 1 : Hotspot Heap Structure (Oracle)

The Permanent generation contains metadata required by the JVM to describe the classes and methods used in the application. The permanent generation is populated by the JVM at runtime based on classes in use by the application. In addition, Java SE library classes and methods may be stored here. Classes may get collected (unloaded) if the JVM finds they are no longer needed and space may be needed for other classes. The permanent generation is included in a full garbage collection.

An object in the old generation can reference an object in the young generation. To handle such cases, the old generation keeps a “card table”, in which each entry (card) covers a 512-byte chunk of the old generation. Whenever an object in the old generation references an object in the young generation, it is recorded in this table. When a GC runs for the young generation, only this card table is searched to find such references, instead of checking the references of every object in the old generation. The card table is maintained by a write barrier, a small piece of code that runs when a reference is written and that makes minor GCs faster. It adds a bit of overhead, but the overall GC time is reduced.
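The card table idea can be modelled in a few lines of Java. This is a toy simulation under simplifying assumptions, not HotSpot's implementation: the class CardTableSketch, its method names and the idea of passing a byte offset into the old generation are all invented for illustration. The only detail taken from the text is the 512-byte card size.

```java
import java.util.ArrayList;
import java.util.List;

class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes of old generation per card
    private final byte[] cards;        // one byte per card: 0 = clean, 1 = dirty

    CardTableSketch(long oldGenBytes) {
        // Round up so that a partially filled last chunk still gets a card.
        cards = new byte[(int) ((oldGenBytes + 511) >>> CARD_SHIFT)];
    }

    /** Maps a byte offset inside the old generation to the index of its card. */
    static int cardIndex(long offsetInOldGen) {
        return (int) (offsetInOldGen >>> CARD_SHIFT);
    }

    /** Simulated write barrier: runs whenever a reference field in the old generation is written. */
    void writeBarrier(long fieldOffset, boolean targetIsYoung) {
        if (targetIsYoung) {
            cards[cardIndex(fieldOffset)] = 1; // remember that this chunk points into the young generation
        }
    }

    /** A minor GC only needs to scan the chunks whose cards are dirty. */
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty.add(i);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.writeBarrier(1024, true);   // old -> young reference stored at offset 1024
        table.writeBarrier(3000, false);  // old -> old reference, nothing to record
        System.out.println("Cards to scan during minor GC: " + table.dirtyCards());
    }
}
```

The point of the design is that the cost of a minor GC depends on the number of dirty cards, not on the size of the old generation.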

Young generation

The young generation is divided into 3 spaces.

  • One Eden space
  • Two Survivor spaces

The spaces are used in the following order:

  1. The majority of newly created objects are located in the Eden space.
  2. After one GC in the Eden space, the surviving objects are moved to one of the Survivor spaces.
  3. After each later GC, the objects that survive in the Eden space and in the occupied Survivor space are copied into the other Survivor space, where they pile up together.
  4. The Survivor space they were copied from is then changed to a state where there is no data at all, and the two Survivor spaces swap roles for the next GC.
  5. The objects that are still alive after these steps have been repeated a number of times are moved to the old generation.

To summarize: the majority of new objects start in the Eden space, and after the first GC the surviving objects are moved to Survivor space 0 (S0). At the next GC, the survivors from Eden and S0 are copied to Survivor space 1 (S1) and S0 is emptied. This swapping of Survivor spaces is repeated many times. If we plot the space consumed in each Survivor space over time, we get two square waves that complement each other. At any given point in time, one of the Survivor spaces must be empty. If both Survivor spaces hold data, or if the usage of both stays at zero, then something is wrong in the system.
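The life of an object in the young generation can be sketched as a small simulation. Everything here is hypothetical and heavily simplified: the class YoungGenSketch, the Obj type and the tenuringThreshold parameter are made up for illustration, the spaces have no size limit, and liveness is decided by a predicate passed in by the caller. The sketch also assumes that every minor GC copies the live objects into the currently empty Survivor space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class YoungGenSketch {
    /** A simulated object: an id plus the number of minor GCs it has survived. */
    static final class Obj {
        final int id;
        int age;
        Obj(int id) { this.id = id; }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> s0 = new ArrayList<>();
    final List<Obj> s1 = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    private boolean copyIntoS0 = true;       // the first minor GC fills S0
    private final int tenuringThreshold;     // age at which an object is promoted

    YoungGenSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects always start in Eden. */
    Obj allocate(int id) {
        Obj o = new Obj(id);
        eden.add(o);
        return o;
    }

    /** Copies live objects out of Eden and the occupied Survivor space, then swaps roles. */
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> to = copyIntoS0 ? s0 : s1;
        List<Obj> from = copyIntoS0 ? s1 : s0;
        evacuate(eden, to, isLive);
        evacuate(from, to, isLive);
        copyIntoS0 = !copyIntoS0;
    }

    private void evacuate(List<Obj> source, List<Obj> to, Predicate<Obj> isLive) {
        for (Obj o : source) {
            if (!isLive.test(o)) {
                continue;                     // unreachable: simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);                   // survived long enough: promote
            } else {
                to.add(o);
            }
        }
        source.clear();                       // the evacuated space is left empty
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch(2);
        heap.allocate(1);
        heap.allocate(2);
        heap.allocate(3);
        heap.minorGc(o -> o.id != 2);         // object 2 has become unreachable
        heap.allocate(4);
        heap.minorGc(o -> true);
        System.out.println("S0=" + heap.s0.size() + " S1=" + heap.s1.size() + " old=" + heap.old.size());
    }
}
```

After every call to minorGc, one of the two Survivor lists is empty, which is the invariant described above.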

Memory allocation in Hotspot VM

In HotSpot VM, two techniques are used for faster memory allocations. One is called “bump-the-pointer,” and the other is called “TLABs (Thread-Local Allocation Buffers).”

The bump-the-pointer technique tracks the last object allocated in the Eden space. That object sits at the top of the Eden space. When a new object is created, the JVM only needs to check whether the object fits in the remaining Eden space. If it fits, it is placed in the Eden space and becomes the new top. This allows much faster memory allocation. In a multi-threaded environment, however, things are different: to allocate objects safely in an Eden space shared by many threads, a lock is needed, and performance drops because of lock contention. TLABs are HotSpot's solution to this problem. Each thread gets a small portion of the Eden space as its own share. As each thread can only access its own TLAB, bump-the-pointer allocation works without a lock.
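Both techniques can be sketched in plain Java. This is only a model of the idea, not how HotSpot is written: the class names, the use of plain offsets instead of real addresses, and the use of an atomic compare-and-set in place of a lock on the shared Eden are all assumptions made for this illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

class BumpAllocatorSketch {
    /** Shared Eden: every allocation must update the same top pointer atomically. */
    static final class Eden {
        private final long capacity;
        private final AtomicLong top = new AtomicLong(0);

        Eden(long capacity) { this.capacity = capacity; }

        /** Returns the start offset of the new block, or -1 if it does not fit. */
        long allocate(long size) {
            while (true) {
                long start = top.get();
                long end = start + size;
                if (end > capacity) {
                    return -1;                       // no room left: a GC would be needed
                }
                if (top.compareAndSet(start, end)) { // bump the pointer
                    return start;
                }
                // Another thread won the race; retry. This is where contention costs time.
            }
        }
    }

    /** Thread-local buffer: reserved from Eden once, then bumped without any synchronization. */
    static final class Tlab {
        private long top;
        private final long end;

        Tlab(Eden eden, long size) {
            long start = eden.allocate(size);
            if (start < 0) {
                throw new IllegalStateException("Eden is full");
            }
            this.top = start;
            this.end = start + size;
        }

        /** Only the owning thread calls this, so a plain field update is enough. */
        long allocate(long size) {
            if (top + size > end) {
                return -1;                           // TLAB exhausted: request a new one
            }
            long start = top;
            top += size;
            return start;
        }
    }

    public static void main(String[] args) {
        Eden eden = new Eden(1024);
        Tlab tlab = new Tlab(eden, 256);
        System.out.println("First object at offset " + tlab.allocate(64));
        System.out.println("Second object at offset " + tlab.allocate(64));
    }
}
```

The shared pointer is touched only once per TLAB instead of once per object, which is why the lock contention mostly disappears.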

GC for Old Generation

A GC of the old generation is basically performed when the old generation becomes full. There are 5 GC types in JDK 7.

  1. Serial GC
  2. Parallel GC
  3. Parallel Old GC (Parallel Compacting GC)
  4. Concurrent Mark & Sweep GC  (or “CMS”)
  5. Garbage First (G1) GC

Serial GC

The serial collector is the default for client style machines in Java SE 5 and 6. With the serial collector, both minor and major garbage collections are done serially (using a single virtual CPU). It uses a mark-compact collection method. This method moves older memory to the beginning of the heap so that new memory allocations are made into a single continuous chunk of memory at the end of the heap. This compacting of memory makes it faster to allocate new chunks of memory to the heap.

To enable the Serial Collector use:
-XX:+UseSerialGC

Parallel GC

The parallel garbage collector uses multiple threads to perform the young generation garbage collection. By default on a host with N CPUs, the parallel garbage collector uses N garbage collector threads in the collection. The number of garbage collector threads can be controlled with command-line options:
-XX:ParallelGCThreads=<desired number>

-XX:+UseParallelGC

With this command line option you get a multi-thread young generation collector with a single-threaded old generation collector.

-XX:+UseParallelOldGC

With the -XX:+UseParallelOldGC option, the GC is both a multithreaded young generation collector and a multithreaded old generation collector.

Compacting describes the act of moving objects in a way that there are no holes between objects. After a garbage collection sweep, there may be holes left between live objects. Compacting moves objects so that there are no remaining holes. A garbage collector can also be non-compacting. In HotSpot, however, the difference between the parallel collector and the parallel compacting collector is not whether the old generation gets compacted, but how: with -XX:+UseParallelGC alone the old generation is collected and compacted by a single thread, while the parallel compacting collector does that work with multiple threads.
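Compaction itself is easy to picture with an array standing in for the heap. In this minimal sketch, which assumes one slot per object and uses null to mark a hole left by the sweep, the live entries are slid toward the start of the array. The class and method names are invented for the example.

```java
import java.util.Arrays;

class CompactionSketch {
    /**
     * Slides all live (non-null) slots toward index 0, keeping their order.
     * Returns the index of the first free slot after compaction.
     */
    static int compact(Object[] heap) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) {
                continue;              // a hole: nothing to move
            }
            if (i != free) {
                heap[free] = heap[i];  // move the live object down
                heap[i] = null;        // its old slot becomes free
            }
            free++;
        }
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", null, "B", null, null, "C"};
        int firstFree = compact(heap);
        System.out.println(Arrays.toString(heap) + ", first free slot = " + firstFree);
    }
}
```

After compaction all free space is one continuous block at the end, so the next allocation can again be done by simply bumping a pointer.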

The Concurrent Mark Sweep (CMS) Collector

The Concurrent Mark Sweep (CMS) collector (also referred to as the concurrent low pause collector) collects the tenured generation. It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads. Normally the concurrent low pause collector does not copy or compact the live objects. A garbage collection is done without moving the live objects. If fragmentation becomes a problem, allocate a larger heap.

To enable the CMS Collector use:
-XX:+UseConcMarkSweepGC
and to set the number of threads use:
-XX:ParallelCMSThreads=<n>

The G1 Garbage Collector

The Garbage First or G1 garbage collector is available in Java 7 and is designed to be the long term replacement for the CMS collector. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector that has quite a different layout from the other garbage collectors described previously.

To enable the G1 Collector use:
-XX:+UseG1GC
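
To check which collector a JVM really ended up with after passing one of the options above, the program can list its collectors at runtime. This is a small sketch using the standard GarbageCollectorMXBean and RuntimeMXBean APIs; the class name ActiveCollectors is made up, and the collector names printed vary between JVM versions, so take them as informational only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ActiveCollectors {
    /** Returns one line per collector: its name and the memory pools it manages. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " -> " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // JVM options the process was started with, such as a -XX:+Use...GC flag if one was passed.
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Running the class once per option and comparing the output is a quick way to confirm that the flag took effect. There is usually one entry for the young generation collector and one for the old generation collector.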
