MS Excel 2007 Tutorial: Array Formulas

Introduction

Array formulas are a powerful tool in Excel. An array formula is a formula that works with an array, or series, of data values rather than a single data value. There are two flavors of array formulas: first, there are those formulas that work with an array or series of data and aggregate it, typically using SUM, AVERAGE, or COUNT, to return a single value to a single cell. In this type of array formula, the result, while calculated from arrays, is a single value. We will examine this type of array formula first. The second flavor of array formulas is a formula that returns a result in to two or more cells. These types of array formulas return an array of values as their result.

SectionBreak

Single Value Result Array Formulas

For example, in its simple form, the formula =ROW(A1:A10) returns the number 1, which is the row number of the first cell in the range A1:A10. However, if this is entered as an array formula, it will return an array or series of numbers, each of which is the row number of a cell in the range A1:A10. That is, instead of returning the single value 1, it returns the array of numbers {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. (In standard notation, arrays are written enclosed in curly braces { }.) When using array formulas, you typically use a container function such as SUM or COUNT to aggregate the array to a single number result. Expanding on the example above, the formula =SUM(ROW(A1:A10)) entered normally will return a value of 1. This because in its normal mode, ROW(A1:A10) returns a single number, 1, and then SUM just sums that single number. However, if the formula is entered as an array formula, ROW(A1:A10) returns the array of row numbers and thenSUM adds up the elements of the array, giving a result of 55 ( = 1 + 2 + 3 + ... + 10).

ENTERING AN ARRAY FORMULA: To enter a formula as an array formula, type the formula in the cell and press the CTRL SHIFT and ENTER keys at the same time rather then just ENTER. You must do this the first time you enter the formula and whenever you edit the formula later. If you do this properly, Excel will display the formula enclosed in curly braces { }. You do not type in the braces -- Excel will display them automatically. If you neglect to enter the formula with CTRL SHIFT ENTER, the formula may return a #VALUE error or return an incorrect result.

SectionBreak

Creating Array Formulas Using Arrays Of Data

The IF function can be used in an array formula to test the result of multiple cell tests at one time. For example, you may want to compute the average of the values in A1:A5 but exclude numbers that are less than or equal to zero. For this, you would use an array formula with an IF function to test the cell values and an AVERAGE function to aggregate the result. The following formula does exactly this:

=AVERAGE(IF(A1:A5>0,A1:A5,FALSE))

This formula works by testing each cell in A1:A5 to > 0. This returns an array of Boolean values such as {TRUE, TRUE, FALSE, FALSE, TRUE}.

A BOOLEAN VALUE is a data type that contains either the value TRUE or the value FALSE. When converted to numbers in an arithmetic operation, TRUE is equivalent to 1 and FALSE is equivalent to 0. Most arithmetic functions like SUM and AVERAGE ignore Boolean values, so those values must be converted to numeric values before passing them to SUM or AVERAGE.

The IF function tests each of these results individually, and returns the corresponding value from A1:A5 if True or the value FALSE if false. Fully expanded, the formula would look like the following:

=AVERAGE(IF({TRUE,TRUE,FALSE,FALSE,TRUE},{A1,A2,A3,A4,A5}, {FALSE,FALSE,FALSE,FALSE,FALSE})

Note that the single FALSE value at the end of the original formula is expanded to an array of the appropriate size to match the array from the A1:A5 range in the formula. In array formulas, all arrays must be the same size. Excel will expand single elements to arrays as necessary, but will not resize arrays with more than one element to another size. If the arrays are not of the same size, you will get a #VALUE or in some cases a #N/A error.

When the IF function evaluates, the following intermediate array is formed: {A1, A2, FALSE, FALSE, A5}. This is a substitution of the TRUE elements with the values from A1:A5 and the FALSE elements by FALSE. Since the AVERAGEfunction is designed within Excel to ignore Boolean values (TRUE or FALSE values), it will average only elements A1, A2, and A5 ignoring the TRUE and FALSE values. Note that the FALSE value is not converted to a zero. It is ignored completely by the AVERAGE function.

Array formulas are ideal for counting or summing cells based on multiple criteria.
Consider the table of data shown to the right. It lists the number of products (column C) in different categories (column A) sold by various salesman (column B). To calculate the number of Fax machines sold by Brown, we can use the following array formula:

=SUM((A2:A10="Fax")*(B2:B10="Brown")*(C2:C10))

This function builds three arrays. The first array is a series of TRUE or FALSE values which are the results of comparingA2:A10 to the word "Fax". (Remember, Excel will expand the single "Fax" element to an array of items all of which are "Fax".) The second array is also a series of TRUE orFALSE values, the result of comparing B2:B10 to "Brown". (The single "Brown" element in the formula is expanded to an array of the appropriate size.) The third array is comprised of the number of units sold from the range C2:C10. These three arrays are multiplied together. When you multiply two arrays, the result is itself an array, each element of which is the product of the corresponding elements of the two arrays being multiplied. For example, {1, 2, 3} times {4, 5, 6}is {1*4, 2*5, 3*6} = {4, 10, 18}. When TRUE and FALSE values are used in any arithmetic operation, they are given the values 1 and 0, respectively. Thus in the formula above, Excel expands the formula into the three arrays:

(A2:A10="Fax") {TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE}
(B2:B10="Brown") {TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE}
(C2:C10) {1, 10, 20, 30, 40, 50, 60, 70, 80}

When these array are multiplied, treating TRUE equal to 1 and FALSE equal to 0, we get the array
{1, 0, 0, 0, 0, 0, 60, 0, 0}
which are the quantities of Brown's two Fax sales. The SUM function simply adds up the elements of the array and return a result of 61, the number of Fax machines sold by Brown.

You may have noticed that the logic of the formula tests Product equals "Fax" AND Salesman equals "Brown", but nowhere do we use the AND function. Here, we use multiplication to act as a logical AND function. Multiplication follows the same rules as the AND operator. It will return TRUE (or 1) only when both of the parameters are TRUE (or <> 0). If either or both parameters are FALSE (or 0), the result is FALSE (or 0).

SectionBreak

Logical Operations With Array Formulas

In addition to the logical AND operation using multiplication shown above, other logical operations can be performed arithmetically.

A logical OR operation can be accomplished with addition. For example,

=SUM(IF(((A2:A10="Fax")+(B2:B10="Jones"))>0,1,0))

will count the number of sales (not the number of units sold) in which the product was a Fax OR the salesman was Jones (or both). Addition acts as an OR because the result it TRUE (or <> 0) if either one or both of the elements are TRUE (<> 0). It is FALSE ( = 0) only when both elements are FALSE (or 0). This formula adds two arrays: the results of the comparisons A2:A10 to "Fax", and the results of the comparisons B2:B10 to "Jones". Each of these arrays is an array of TRUE and FALSE values, each element being the result of comparing one cell to "Fax" or "Jones". It then adds these two arrays. When you add two arrays, the result is itself an array, each element of which is the sum of the corresponding element of the original arrays. For example, {1, 2, 3} + {4, 5, 6} = {1+4, 2+5, 3+6} = {5, 7, 9}. For each element in the sum array (A2:A10="Fax")+(B2:B10="Jones"), if that element is greater than 0, IF returns 1, otherwise it returns 0. Finally, SUM just adds up the array.

An "exclusive or" or XOR operation is a comparison that returns TRUE when exactly one of the two elements is TRUE. XOR is FALSE if both elements are TRUE or if both elements are FALSE. Arithmetically, we can use the MOD operator to simulate an XOR operation. For example, to count the number of sales in which the product was a Fax XOR the salesman was Jones (excluding Faxes sold by Jones), we can use the following formula:

=SUM(IF(MOD((A2:A10="Fax")+(B2:B10="Jones"),2),1,0))

A "negative and" or NAND operation is a comparison that returns TRUE when neither or exactly one of the elements is TRUE, but returns FALSE if both elements are TRUE. For example, we can count the number of sales except those in which Jones sold a Fax with the formula:

=SUM(IF((A2:A10="Fax")+(B2:B10="Jones")<>2,1,0))

SectionBreak

Creating Sequences And Loops For Array Formulas

When you are constructing some types of array formulas, you need to create a sequence of numbers for a function to process as an array. As an example, consider an array formula that will compute the average of the Nth largest elements in a range. To do this, we will use the LARGE function to get the largest numbers, and then pass those numbers as an array to AVERAGE to compute the average. Normally, the LARGE function takes as parameters a range to process and a number indicating which largest value to return (1 = largest, 2 = second largest, etc.,). But LARGE does work with arrays for its second parameter. You might be tempted to type in the array in the formula yourself: =LARGE(A1:A10,{1,2,3}). While this will indeed work, it is tedious.

Instead, you can use the ROW function to return a sequence of numbers. When used in an array formula, the functionROW(m:n) will return an array of integers from m to n. Therefore, we can use ROW to create the array to pass to LARGE. This changes our array formula to =LARGE(A1:A10,ROW(1:3)). This brings us closer to a good formula, but two things remain.

First, if you insert a row between rows 1 through 3, Excel will change the row reference 1:3, and therefore the formula will average the wrong numbers. Second, the formula is locked into the three largest values. We can make it more flexible by making the number of elements to average a cell reference that can be easily changed. For example, we can specify that cell C1 contains the size of the array to pass to LARGE. This is accomplished with the INDIRECT function. (Click here for more information about INDIRECT.) The INDIRECT function converts a string representing a cell reference into an actual cell reference. The sub-formula ROW(INDIRECT("1:"&C1)) will return an array of numbers between 1 and the value in cellC1. Now, coming together the formula to average the N largest values in A1:A10 becomes:

=AVERAGE(LARGE(A1:A10,ROW(INDIRECT("1:"&C1))))

SectionBreak

Formulas That Return Arrays

The other type of array formula is one that returns an array of numbers as its result. These sort of array formulas are entered into multiple cells that are then treated as a group. For example, consider the formula =ROW(A1:A10). If this is entered into one cell, either as a normal formula or as an array formula, the result will be 1 in that single cell. If, however, you array enter it into a range of cells each cell will contain one element of the array. To do this, you first must select the range of cells in to which the array should be written, say C1:C10, type the formula =ROW(A1:A10), and then press CTRL SHIFT ENTER. The elements of the array {1, 2, ...., 10} will be written to the range of cells, with one element of the array in each cell. When you array enter a formula into an array of cells, Excel prevents you from modifying a single cell with that array range. You may select the entire range, edit the formula, and array-enter it again with CTRL SHIFT ENTER, but you cannot change a single element of the array.

Some of the built-in Excel functions return an array of values. These formulas must be entered into an array of cells. For example, the MINVERSE function returns the inverse of a matrix with an equal number of rows and columns. Since the inverse of a matrix is itself a matrix, the MINVERSE function must be entered into a range of cells with the same number of rows and columns as the matrix to be inverted. Therefore, if your matrix is in cells A1:B2 (two rows and two columns), you must select a range the same size, type the formula =MINVERSE(A1:B2) and press CTRL SHIFT ENTER rather than just ENTER. This enters the formula as an array formula into all the selected cells. If you were to use the MINVERSEfunction in a single cell, only the upper left corner value of the inverted matrix would be returned.

MS Excel 2007 Tutorial

Sunday, July 19, 2009

Array Formulas

No comments:

Post a Comment

Category

ARCHIVE

Followers