How to implement cache in Java

According to Wikipedia, a cache is a hardware or software component that stores data so that future requests for that data can be served faster.

Benefits:

  • Faster access to data, in O(1)
  • Expensive computations are performed only once, the first time

Types:

  • Memory cache
  • Database cache
  • Disk cache, etc

In this post, we’ll go through the steps to create an in-memory cache in Java.

To create a cache, we can simply use a map / dictionary data structure and get the expected O(1) for both the get and put operations.
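As a quick illustration, here is a minimal sketch of that idea (the class and key names are just for this example); a plain map already gives O(1) lookups, but it has no size limit and no eviction:

import java.util.HashMap;
import java.util.Map;

public class NaiveCacheDemo {
    public static void main(String[] args) {
        // A plain map as a "cache": O(1) average get/put, but no size limit and no eviction
        Map<String, String> cache = new HashMap<>();
        cache.put("user:42", "Alice");
        System.out.println(cache.get("user:42")); // Alice
        System.out.println(cache.get("user:99")); // null -> a cache miss
    }
}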

But, we can’t store everything in our cache. We have storage and performance limits.

A cache eviction algorithm is a way of deciding which element to evict when the cache is full. There are many algorithms, optimized for different use cases:

  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • First In First Out (FIFO)
  • Last In First Out (LIFO) etc.

There are only two hard things in computer science: cache invalidation and naming things. — Phil Karlton

In our design, we will use:

  • HashMap to get and put data in O(1)
  • Doubly linked list to determine which key to delete, with O(1) add and delete

Cache implementation strategy

We use a doubly linked list to determine which key to delete, with the benefit of adding and deleting keys in O(1).

First, we declare a model to store our key-value pair, a hit count, and references to the previous and next nodes.

public class CacheItem<K, V> {
    private K key;
    private V value;
    private int hitCount = 0; // LFU requires this
    private CacheItem<K, V> prev, next;

    public CacheItem(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public void incrementHitCount() {
        this.hitCount++;
    }
    // getters / setters omitted
}

Now, we can outline our cache class with basic functionalities of get, put, delete and size.

import java.util.HashMap;
import java.util.Map;

public class Cache<K, V> {
    private Map<K, CacheItem<K, V>> map;
    private CacheItem<K, V> first, last;
    private int size;
    private final int CAPACITY;
    private int hitCount = 0;
    private int missCount = 0;

    public Cache(int capacity) {
        CAPACITY = capacity;
        map = new HashMap<>(CAPACITY);
    }

    public void put(K key, V value) {
        // implemented below
    }

    public V get(K key) {
        // implemented below
        return null;
    }

    public void delete(K key) {
        deleteNode(map.get(key));
    }

    public int size() {
        return size;
    }

    public int getHitCount() {
        return hitCount;
    }

    public int getMissCount() {
        return missCount;
    }
}

We also include hit and miss counters to measure the performance of our cache.

Hit ratio = total hits / (total hits + total misses)
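As a sketch, this ratio could be exposed as a small helper on the cache class (getHitRatio is a hypothetical addition, not part of the outline above):

// Hypothetical helper: fraction of lookups served from the cache
public double getHitRatio() {
    int total = hitCount + missCount;
    return total == 0 ? 0.0 : (double) hitCount / total;
}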

For our put method:

  1. If the key already exists, update the value
  2. Otherwise, if the size has reached capacity, delete an existing node using the appropriate strategy
  3. Add the new node at the tail
public void put(K key, V value) {
    if (map.containsKey(key)) {
        // Key already present: just update the value in place
        map.get(key).setValue(value);
        return;
    }
    if (size() >= CAPACITY) {
        // Cache is full: evict according to the chosen strategy
        deleteNode(first);
    }
    CacheItem<K, V> node = new CacheItem<>(key, value);
    addNodeToLast(node);
    map.put(key, node);
}

In our delete method:

  • Remove the node from the map
  • Unlink all references associated with that node
private void deleteNode(CacheItem<K, V> node) {
    if (node == null) {
        return;
    }
    // Unlink the node from its neighbours
    if (node.getPrev() != null) {
        node.getPrev().setNext(node.getNext());
    }
    if (node.getNext() != null) {
        node.getNext().setPrev(node.getPrev());
    }
    if (last == node) {
        last = node.getPrev();
    }
    if (first == node) {
        first = node.getNext();
    }
    map.remove(node.getKey());
    size--;
}

New nodes are appended at the tail. The last pointer always references the most recently inserted node.

private void addNodeToLast(CacheItem<K, V> node) {
    if (last != null) {
        last.setNext(node);
        node.setPrev(last);
    }

    last = node;
    if (first == null) {
        first = node;
    }
    size++;
}

In our get method:

  • Get the node from the map
  • Increment the hit counter for that item (useful for least frequently used)
  • Reorder the linked list
public V get(K key) {
    if (!map.containsKey(key)) {
        missCount++;
        return null;
    }
    CacheItem<K, V> node = map.get(key);
    hitCount++;
    node.incrementHitCount();
    reorder(node);
    return node.getValue();
}

Reorder is the key to this implementation. The reorder logic (and which end we delete from) changes depending on the eviction algorithm.

Least Recently Used (LRU)

  • The most recently accessed node moves to the tail, along with newly added items. This way, the least recently used node is always at the head, and we can easily delete from first.
private void reorder(CacheItem<K, V> node) {
    if (last == node) {
        return;
    }
    // Detach the node from its current position
    if (first == node) {
        first = node.getNext();
    } else {
        node.getPrev().setNext(node.getNext());
    }
    node.getNext().setPrev(node.getPrev()); // safe: node is not the last
    // Move the node to the tail (most recently used position)
    last.setNext(node);
    node.setPrev(last);
    node.setNext(null);
    last = node;
}
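Assuming the Cache class above with the LRU reorder, a small usage sketch might look like this (keys and capacity are arbitrary):

public class LruDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = new Cache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");                     // "a" becomes the most recently used
        cache.put("c", "3");                // capacity reached: "b" is evicted
        System.out.println(cache.get("b")); // null -> miss
        System.out.println(cache.get("a")); // 1 -> hit
    }
}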

Least Frequently Used (LFU)

We have to keep items ordered by how frequently they are accessed.

To avoid being evicted, each accessed item needs to move up the list according to its frequency.

  • Iterate a loop that keeps swapping the node with its next node while the next node’s hit count is not greater than its own.
private void reorder(CacheItem<K, V> node) {
    if (last == node) {
        return;
    }
    CacheItem<K, V> nextNode = node.getNext();
    // Bubble the node towards the tail while its hit count is >= the next node's
    while (nextNode != null) {
        if (nextNode.getHitCount() > node.getHitCount()) {
            break;
        }
        if (first == node) {
            first = nextNode;
        }
        // Swap node and nextNode
        if (node.getPrev() != null) {
            node.getPrev().setNext(nextNode);
        }
        nextNode.setPrev(node.getPrev());
        node.setPrev(nextNode);
        node.setNext(nextNode.getNext());
        if (nextNode.getNext() != null) {
            nextNode.getNext().setPrev(node);
        }
        nextNode.setNext(node);
        nextNode = node.getNext();
    }
    if (node.getNext() == null) {
        last = node;
    }
}

There is an edge case here: if newly added items are appended at the tail, an eviction from the head can delete a high-frequency node while a brand-new item that has never been accessed survives.

To solve this issue, for LFU we add new nodes at the head (first), since a brand-new node is the least frequently accessed.

private void addNodeToFirst(CacheItem<K, V> node) {
    if (first != null) {
        node.setNext(first);
        first.setPrev(node);
    }
    first = node;

    if (last == null) {
        last = node;
    }
    size++;
}

First In First Out (FIFO)

  • To add a node we will use addNodeToLast, so the oldest item sits at the head
  • Delete the node from first (the oldest item)

Last In First Out (LIFO)

  • To add a node we will use addNodeToFirst, so the newest item sits at the head
  • Delete the node from first (the newest item); see the sketch below
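One way to make the insertion side switchable is a small strategy enum checked inside put. This is a hypothetical extension, not part of the code above:

// Hypothetical: let the cache decide where new nodes are inserted
public enum EvictionStrategy { LRU, LFU, FIFO, LIFO }

// Inside Cache.put, instead of always calling addNodeToLast:
if (strategy == EvictionStrategy.LFU || strategy == EvictionStrategy.LIFO) {
    addNodeToFirst(node); // new nodes start at the head
} else {
    addNodeToLast(node);  // LRU / FIFO: new nodes go to the tail
}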

Questions:

1. Could we use an array / ArrayList to solve the same problem with simpler code?

Yes, it would be simpler. The reasons for not using it:

  • ArrayList has O(n) complexity for insertion and deletion in the worst case
  • Searching the list also has a worst-case complexity of O(n)

2. Could we have used LinkedList instead of ArrayList?

Once again, yes. LinkedList has the benefit of adding and removing a node in O(1). The reason for not using it:

  • We would still face a search complexity of O(n)

3. For LFU, the reorder method has a complexity of O(n).

Yes, unfortunately. This is the best solution I have come up with so far. Please suggest a better approach.

4. We are using HashMap. It can suffer from hash collisions, in which case the performance will not be O(1) as mentioned.

Absolutely.

  • It’s always recommended to use a good hash function that causes fewer collisions.
  • Make sure the load factor remains 0.75 or less (see the example below).
  • If the number of entries exceeds capacity × load factor, HashMap automatically increases its capacity and rehashes the whole structure.
  • From Java 8, collisions within a bucket are also handled by converting it to a balanced tree, giving O(log n) performance.
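For reference, both the initial capacity and the load factor can be supplied to the HashMap constructor (the numbers below are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // 64 buckets initially; resize + rehash once entries exceed 64 * 0.75 = 48
        Map<String, Integer> map = new HashMap<>(64, 0.75f);
        map.put("answer", 42);
        System.out.println(map.get("answer"));
    }
}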

I’ve written a post with the implementation and explanation: HashMap Implementation for Java.

5. Is this implementation thread safe?

No. To make it thread safe:

  • We can use ConcurrentHashMap instead of HashMap
  • Use a synchronized block around the public methods (a sketch follows)
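A minimal sketch of the synchronized-block option, wrapping the Cache class above (a coarse-grained lock, not a tuned concurrent design):

public class SynchronizedCache<K, V> {
    private final Cache<K, V> delegate;
    private final Object lock = new Object();

    public SynchronizedCache(int capacity) {
        this.delegate = new Cache<>(capacity);
    }

    public void put(K key, V value) {
        synchronized (lock) {
            delegate.put(key, value);
        }
    }

    public V get(K key) {
        synchronized (lock) {
            return delegate.get(key);
        }
    }
}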

6. How can we add a time-based eviction strategy to auto-delete items?

To implement this, we need to run a timer thread that determines which items are eligible for eviction and deletes them.
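A rough sketch of that idea using a scheduled cleanup task; the ExpiringCache class, its TTL bookkeeping, and the method names are all hypothetical and separate from the cache built above:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExpiringCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final Map<K, Long> expiryAt = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner =
            Executors.newSingleThreadScheduledExecutor();

    public ExpiringCache(long cleanupIntervalMillis) {
        // Periodically remove entries whose deadline has passed
        cleaner.scheduleAtFixedRate(this::evictExpired,
                cleanupIntervalMillis, cleanupIntervalMillis, TimeUnit.MILLISECONDS);
    }

    public void put(K key, V value, long ttlMillis) {
        map.put(key, value);
        expiryAt.put(key, System.currentTimeMillis() + ttlMillis);
    }

    public V get(K key) {
        return map.get(key);
    }

    private void evictExpired() {
        long now = System.currentTimeMillis();
        expiryAt.forEach((key, deadline) -> {
            if (deadline <= now) {
                map.remove(key);
                expiryAt.remove(key);
            }
        });
    }
}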
