# Learn about Memoization in PHP

0
1747  Memoization is a programming technique that aims to cut down the recalculation of costly functions that would need to be called more than once. The easiest way to picture this to imagine a cache-like data structure, then in each pass of the function check to see if  re-calculations is needed.

Doing something like this can greatly cut down on computation time. Instead of having to recalculate values, you’ll simply use the values that have already been generated and continue processing the request with that. Once you get to the point where you need to recompute values millions of times you can pretty easily see the performance gains. This might sound a bit complicated at first, but once we put it into practice it will make much more sense.

First, let’s take a look at a very common problem: the Fibonacci sequence. This is a very famous sequence of number found both in mathematics and nature. The fibonacci sequence is defined as:

So each value of the sequence is generated by combining the previous two values. This generates the following for the first couple values of the sequence:

The problem we’ll be looking at today is to find the nth value of the sequence. A naive solution to this is to use plain recursion to generate the values. This makes sense since the problem itself is recursive in nature, each successive iteration uses the previous amount to generate it. This is all good for small values of n, but it’s pretty easy to see how this can grow quickly. If fact, with a little bit of understanding on time complexity we can tell that without any extra optimization this would run in exponential time.  Let’s take a look at that right now.

As you can see there are some issues with the implementation of this code. I tested it with a simple timer, and even for a value of n=35 I was pushing 10+ seconds for every request. That’s a tad long, and is likely to not be viable for any production environment. As this is an article about optimization I’m sure you can see what’s coming next; there is a much better solution for us.

First, let’s think through it really quickly and understand where the optimization is coming from. For example, say we wanted to compute what the value for the 5th number in the sequence is. Well, to that we’d have to calculate the 4th, 3rd, 2nd… etc. Hold on a second, if we wanted to calculate the 4th, then we’d have to figure out the 3rd, 2nd…etc. Right away I’m sure you can see the issue, we’re will have to re-compute all the values for each iteration of the sequence.

That seems horribly inefficient, and as you can see if you run the above code for any large value in practice. Wouldn’t it be nice if there was a way for us to only have to compute each value only once?

Going back to our topic, memoization and how it can speed things up. By taking each iteration and saving it into an array, we can save ourselves from having to compute each one more than once.

For this particular example an array is likely to be the best choice. If you’re working with something more complex, there may be a better structure to use, but that will be on a case by case basis. We won’t get into anything that complex, this is just going to cover a basic use case.

Going back to the analysis, once we calculate F(3), we can save that and use the value later to compute F(4) without having to recalculate F(3). Thinking about it from a conceptual standpoint, it’s pretty easy to see the performance gains. This is especially true when we start trying to calculate large values, the larger the value the more the performance gain.

Let’s take a look at how that might be implemented.

This improved version uses the concept of memoization. Each pass through the function saves the result to an array \$previousValues. You can see this as the result of each recursive call is saved into the array. As we get higher into the values of n, this prevents us from having to do extra computation, greatly saving time.

This significantly cuts down on computation time. I was able to call “fibonacci(100)” and it executed in under a second; a huge improvement over taking 10 seconds for only up to 35!

At this point you might be wondering if there are any downsides to this method. Like all things, there is. As we go through each iteration of the function, we are adding onto an array. It’s important to consider that when building out your application, the example shown is not “in place”. This means that that its memory consumption increases as the value of n increases. If you’re looking for values in the millions or billions you’ll need to be sure that you have enough memory in order to hold all that additional info.

You’re also adding complexity to the function. I’m sure if you look back at the two examples above, the first one is much easier to read than the second. This is an often overlooked aspect of software development and is an important consideration. In this case, the value of the time saved is likely to overwrite the other concerns, but in other cases where the performance gain might not be as pronounced, it may be a tougher decision. The important thing to take away is understand all of the aspects and make an informed decision of the best way to proceed.

The example above is also fairly trivial in that it only needs to store integer values. In some cases, you may need to store more complex data like strings or arrays. The important thing to realize is that this technique is very similar to caching, and to do correct caching is hard. Like all things, it’s important to understand the systems in place and work through it, accordingly.