Information technology is key to almost any business application today, and the performance of that application is the next most important concern. The online space is crowded with applications accessed by thousands of users every second, so competition is cut-throat and survival depends on responsiveness. As a result, performance improvement strategies are evolving continuously. One of the most effective is caching: a proper caching implementation can dramatically reduce application response time.
In this article, we explore some of the widely used caching solutions in the open source marketplace. Our main focus is on Redis and Memcached, two market leaders that are used by many critical enterprise-level applications.
What is Caching?
A cache is temporary storage, usually in memory, that holds frequently accessed data, and caching is the process of putting such a store to work. Caching can be implemented on the client side (for example, browser caching or web server caching), on the server side (for example, application server caching), or as a combination of both. Client-side caching is mainly used for static data fetched from the server, that is, data that does not change frequently. Server-side caching, on the other hand, keeps data fetched from other sources in memory. A cache can run on a single node or across a distributed, clustered environment. The caching strategy should be chosen based on the use case.
Why Do We Need Caching?
We have already discussed what caching is and why it matters. The following points summarize the main benefits.
- Cost optimization through lower bandwidth usage
- Improved processing time
- Faster response
- Better user experience
- Low latency
Different Caching Strategies
Caching can be implemented at different levels. It can be as simple as using local memory or as complex as a distributed, clustered environment. Horizontal and vertical scaling can also play a part, although that depends more on the hardware configuration. So, different caching strategies can be implemented based on the requirements and the use case.
Now, we will discuss the two most important ways of caching: in-process caching and in-memory distributed caching. Let us explore the concepts in detail.
In-Process caching: This is the simplest way to implement caching. The cached object is stored inside the application instance and uses the same memory space as the application, so the application instance and its cached objects are bound to the same local memory. This is a perfect solution when you have a single instance running on a single node.
The problem with this strategy arises when multiple instances of an application each run their own in-process cache. Every instance has its own local copy of the cached data, and those copies are not kept in sync, so data synchronization across instances becomes a major headache. A proper synchronization mechanism has to be put in place, which is itself a complex task. The other challenge is performance: if the server has limited resources, the cache competes with the application for the same memory space.
So, the best-fit scenario for in-process caching is a single instance on a single node with a single cache.
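As a rough illustration of in-process caching, the following Python sketch keeps cached entries in a dictionary inside the application itself. The class name `InProcessCache`, the `load_price` function and the 60-second time-to-live are assumptions made for this example, not part of any particular framework.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class InProcessCache:
    """A tiny in-process cache: entries live in the application's own memory."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds
        # Maps key -> (expiry time, cached value).
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no call to the slow source
        value = loader(key)  # cache miss or expired entry: load and remember
        self._store[key] = (now + self._ttl, value)
        return value


loader_calls: List[str] = []


def load_price(item_id: str) -> float:
    """Hypothetical slow lookup standing in for a database query."""
    loader_calls.append(item_id)
    return 9.99


cache = InProcessCache(ttl_seconds=60.0)
first = cache.get_or_load("item:42", load_price)   # loads from the slow source
second = cache.get_or_load("item:42", load_price)  # served from local memory
```

Because the dictionary belongs to one process, a second application instance would hold its own separate copy, which is exactly the synchronization problem described above.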
In-Memory distributed caching: The in-memory distributed caching strategy is widely used by mid-size and large applications. It is more robust, and also more complex. It uses a key/value model to store objects, and the implementation is external to the application itself. A separate cache server stores the data and supports read and write operations. The cache runs in a distributed, clustered environment with multiple nodes that together present a single logical view of the distributed data. On the application side, separate caching clients (APIs) are used to interact with the cache server. These clients use hashing algorithms to identify the node on which a cached object is stored. This solution is a good fit where performance is the key factor, although performance is sometimes degraded by inter-process communication, object serialization and network latency.
In this article, we focus on Redis and Memcached, both of which are in-memory distributed caching solutions.
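One common way for a caching client to pick a node is consistent hashing. The sketch below is a simplified illustration of that idea, not the algorithm of any specific client library; the `HashRing` class, the node names and the choice of SHA-256 are assumptions for this example.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Maps keys to cache nodes using consistent hashing with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._ring: Dict[int, str] = {}      # ring position -> node name
        self._positions: List[int] = []      # sorted ring positions
        for node in nodes:
            # Each node is placed on the ring many times to spread the load.
            for i in range(replicas):
                position = self._hash(f"{node}#{i}")
                self._ring[position] = node
                bisect.insort(self._positions, position)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring position after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_right(self._positions, self._hash(key))
        return self._ring[self._positions[index % len(self._positions)]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
node = ring.node_for("user:1001")  # always the same node for the same key
```

The benefit of this design is that when a node is removed, only the keys that lived on that node move to another one; keys on the remaining nodes stay where they were.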
Challenges in Caching
Caching is an integral part of many applications and is without doubt a performance booster, but implementing it properly requires a lot of attention. It brings various challenges, depending on the application requirements. In this section, we talk about some of the major challenges to overcome.
Dynamic data and cache invalidation: A cache is not a database; it is temporary storage for the most frequently used data. The authoritative data always lives in the database or other back-end persistent storage. When data is changed, the database is updated immediately, but the cache may still hold the previous, stale value, which is no longer valid. So, there must be some mechanism in place to update or invalidate the cache and keep it relevant.
Cache failure: This is a major risk even with a proper cache implementation. When requested data is not found in the cache, the back-end database is queried, which increases the load on the database. If the cache fails entirely, the total load hits the database, which can be a serious concern. So, a proper fail-over strategy should be implemented, for example by using multi-level caching or a distributed caching system.
Deployment issues: Imagine that in a production environment you need to redeploy your application server or web server, or restart your servers. The local cache is cleared instantly, and server-level caches are affected as well. As a result, all user requests hit the database and performance slows down until the caches fill again. To overcome this issue, a proper deployment strategy has to be in place.
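One simple invalidation mechanism is to delete the cached entry whenever the underlying record is written, so that the next read reloads it. The sketch below illustrates this under strong simplifying assumptions: two plain dictionaries stand in for the database and the cache, and the function names are hypothetical.

```python
from typing import Dict, Optional

database: Dict[str, str] = {"user:1": "Alice"}  # stand-in for persistent storage
cache: Dict[str, str] = {}                      # stand-in for the cache


def read_user(key: str) -> Optional[str]:
    """Read through the cache, falling back to the database on a miss."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value  # remember the value for later reads
    return value


def update_user(key: str, value: str) -> None:
    """Write to the database first, then invalidate the stale cache entry."""
    database[key] = value
    cache.pop(key, None)


before = read_user("user:1")   # loads "Alice" and caches it
update_user("user:1", "Bob")   # database changes, cached copy is dropped
after = read_user("user:1")    # reloads the fresh value from the database
```

Deleting the entry rather than overwriting it keeps the write path simple; an expiry time on each entry is a common additional safeguard against entries that are never explicitly invalidated.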
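The multi-level idea can be sketched as follows: try the shared cache first, fall back to a local copy if the shared cache is unreachable, and query the database only as a last resort. Everything here is hypothetical scaffolding for illustration; `RemoteCacheStub` is not a real client library, and its `available` flag only simulates an outage.

```python
from typing import Callable, Dict, List, Optional


class CacheUnavailable(Exception):
    """Raised by the stand-in remote cache when it cannot be reached."""


class RemoteCacheStub:
    """Hypothetical stand-in for a remote cache client (not a real library)."""

    def __init__(self) -> None:
        self.available = True  # set to False to simulate an outage
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise CacheUnavailable(key)
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self._data[key] = value


def read_with_fallback(key: str, remote: RemoteCacheStub, local: Dict[str, str],
                       load_from_db: Callable[[str], str]) -> str:
    try:
        value = remote.get(key)          # level 1: shared remote cache
        if value is not None:
            local[key] = value           # keep a local copy for outages
            return value
    except CacheUnavailable:
        if key in local:                 # level 2: local copy shields the database
            return local[key]
    value = load_from_db(key)            # level 3: the database itself
    local[key] = value
    try:
        remote.set(key, value)
    except CacheUnavailable:
        pass                             # still down; the local copy serves later reads
    return value


db_calls: List[str] = []


def load_from_db(key: str) -> str:
    """Hypothetical database query that records each call."""
    db_calls.append(key)
    return f"value-of-{key}"


remote, local = RemoteCacheStub(), {}
read_with_fallback("item:1", remote, local, load_from_db)  # miss: one database call
remote.available = False                                    # simulate a cache outage
during_outage = read_with_fallback("item:1", remote, local, load_from_db)
```

In a real system the local copy would also need an expiry time, otherwise it could serve stale data long after the outage ends.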
Selection criteria – Redis or Memcached?
In this section, we discuss the caching options available in the open source world and how well they fit your application. We already know that Redis and Memcached are two caching solutions widely used across different applications, but we cannot choose between them blindly without checking our application requirements.
Let’s dive into more detail and explore the following key areas for selecting a caching solution for your application.
- Data type support: Data type support is one of the major criteria for choosing between Redis and Memcached. Memcached supports simple records with a key/value structure, preferably with String values. Redis supports five core data structures: String, Hash, List, Set and Sorted Set (newer versions add further types). This makes Redis more effective in many real-life scenarios. So, Memcached is preferred for a small volume of simple, static data, and Redis is suitable for complex data structures with a large volume of data.
- Memory management: Memory management is another key area to consider before selecting a caching solution. Redis and Memcached store cached data differently. Memcached keeps all data in physical memory only and performs no disk operations; it uses slab/chunk allocation for memory management. Early versions of Redis had a virtual memory mechanism that stored values on disk once physical memory reached a threshold. In that scheme, Redis kept all keys in memory and chose which values to swap based on a calculation known as swappability. A swapped value was deleted from memory, and when a client requested it, Redis swapped it back from disk to memory before sending it to the client. This feature was later deprecated and removed, so current Redis versions keep the whole data set in memory and rely on a configurable memory limit (maxmemory) with an eviction policy when memory fills up.
- Performance: Memcached tends to be faster when the data is static in nature and the volume is low. Because Memcached is memory-only storage, its performance decreases as the data volume increases. Redis, in contrast, performs better when the data is dynamic and complex in nature, and it also performs better with a huge volume of data and in a clustered environment.
- Persistence support: Redis and Memcached are both in-memory data stores, which means they serve data from memory. Redis, however, can persist its in-memory data to disk. Memcached, on the other hand, has no support for persisting its in-memory data.
Let’s explore Redis persistence support in more detail. It will help us select the proper caching mechanism for our applications.
Redis supports persistence in two forms: the RDB snapshot and the AOF log.
The RDB snapshot is the most widely used mechanism for persisting in-memory data. In this process, a snapshot of the current data is stored in a data file. Because the write is done in a separate process to a temporary file that is renamed only when complete, an interrupted snapshot does not corrupt the existing RDB file. The steps in the RDB snapshot and store process are as follows.
- Snapshot of the current data is taken as per scheduled configuration
- Current process creates a sub process
- Sub process writes the snapshot data in a temp file
- Temp file is renamed to an RDB file
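The write-to-temp-file-then-rename pattern in the steps above can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Redis itself is implemented: the JSON file format and the function names are assumptions, and the sub-process step is left out for brevity, so the snapshot is written by the calling process.

```python
import json
import os
import tempfile
from typing import Dict


def save_snapshot(data: Dict[str, str], target_path: str) -> None:
    """Write a point-in-time copy of `data` to `target_path` atomically."""
    snapshot = dict(data)  # copy, so later changes do not affect this snapshot
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix="temp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as temp_file:
            json.dump(snapshot, temp_file)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # make sure the bytes reach the disk
        # The rename replaces the old snapshot in one step, so readers see
        # either the complete old file or the complete new file.
        os.replace(temp_path, target_path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def load_snapshot(path: str) -> Dict[str, str]:
    with open(path, "r", encoding="utf-8") as snapshot_file:
        return json.load(snapshot_file)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "dump.json")
    save_snapshot({"item:1": "book"}, path)
    save_snapshot({"item:1": "book", "item:2": "pen"}, path)
    restored = load_snapshot(path)
    files_left = os.listdir(directory)
```

If the process dies while the temporary file is being written, the previous snapshot is untouched, which is the property that makes this approach safe.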
The other form of persistence is the AOF log, an Append Only File (AOF). It is a log file of standard Redis commands: every command that changes data is appended to the AOF, and replaying the file rebuilds the data.
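The append-and-replay idea can be illustrated with a toy store. The sketch below is an assumption-laden simplification: the `AppendOnlyStore` class and its one-command-per-line text format are invented for this example and do not match the real Redis file format, and keys must not contain spaces.

```python
import os
import tempfile
from typing import Dict, List


class AppendOnlyStore:
    """Toy key/value store that logs every change and replays the log on start."""

    def __init__(self, log_path: str) -> None:
        self._log_path = log_path
        self.data: Dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild the in-memory data by re-applying every logged command."""
        if not os.path.exists(self._log_path):
            return
        with open(self._log_path, "r", encoding="utf-8") as log:
            for line in log:
                self._apply(line.rstrip("\n").split(" ", 2))

    def _apply(self, parts: List[str]) -> None:
        if parts[0] == "SET":
            self.data[parts[1]] = parts[2]
        elif parts[0] == "DEL":
            self.data.pop(parts[1], None)

    def _append(self, command: str) -> None:
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(command + "\n")
            log.flush()
            os.fsync(log.fileno())  # sync each write so it survives a crash

    def set(self, key: str, value: str) -> None:
        self._append(f"SET {key} {value}")  # log first, then change memory
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._append(f"DEL {key}")
        self.data.pop(key, None)


with tempfile.TemporaryDirectory() as directory:
    log_path = os.path.join(directory, "appendonly.log")
    store = AppendOnlyStore(log_path)
    store.set("balance:42", "100")
    store.set("balance:42", "80")
    store.set("session:7", "active")
    store.delete("session:7")
    recovered = AppendOnlyStore(log_path).data  # simulates a restart
```

Syncing after every write keeps the log as complete as possible at the cost of slower writes, which is the usual trade-off with this style of persistence.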
- Cluster management: Memcached itself does not support a distributed mode or a clustered environment on the server side. The only way to distribute data is on the client side, where the client uses a distributed hashing algorithm to choose a server. Redis, on the other hand, supports clustering natively through Redis Cluster. Its distributed storage is on the server side and it can expand in a linear way.
Use case discussion: In this section, we focus on some practical use cases. The purpose is to identify which use cases suit Redis and which suit Memcached. This discussion will help you relate caching to real-world scenarios and select the proper caching framework.
- For a small or medium-sized e-commerce site, Memcached is suitable. Memcached is very efficient at storing static data that users access frequently, such as static HTML pages, item data and item prices.
- In a large e-commerce platform, where the data volume is huge, we should go for Redis. Redis is capable of managing a large volume of data in a clustered environment. Here, Memcached would not be a good fit.
- In a data-critical application, such as a banking, finance or insurance application, where you cannot afford to lose data, Redis would be a good fit. It can persist in-memory data to a data file (RDB snapshot or AOF file). The AOF should be selected to minimize data loss, ideally with the appendfsync always setting so that every write is synced to disk.
- In an application where you need to store complex data types (more specifically, different data structures), Redis should be selected, as Memcached only supports key/value data.
The purpose of this article is to explain the importance of caching and its different implementations. We have discussed different aspects of caching and covered the challenges involved and ways to address them. The main focus of the article is Redis and Memcached, the two most widely used caching solutions. Finally, we have covered some practical scenarios and the criteria for selecting a caching solution in each.