Variables elimination

From Ssdlpedia
Jump to: navigation, search

The rationale behind the Spartan desire to eliminate variables is that every variable places a burden on both the reader and the writer.

  1. Each variable has a name. The code writer must spend precious mental effort on picking a good name for the variable. The code reader must spend similar effort in figuring out the intention of the reader from the name.
  2. Variables tend to change. The code writer must invest resources in making sure that the variable is changed only when it is supposed to. The code reader must work out the set of locations that may change the variable, and understand the nature of these changes.
  3. Variables may be used by the code. The code writer must define a role for that variable which suggests where that variable may be read. The code reader must understand that role, and to properly appreciate that role, he must work out the list of code locations in which the variable may be read.

Contents

Variables Inlining

A variable that is used only once, is a perfect candidate for elimination. Simply replace the use of this variable with the expression that computes it. For example, instead of

double a = p.area();
 return a;

you can write

return p.area();

In the following example, drawn from the source code of JFLEX, variable b is used only once.

public int interpret(int[] in, int[] par) {
     boolean b = boolexp.interpret(in,par);
     if (b) 
       return exp1.interpret(in,par);
     else 
       return exp2.interpret(in,par);
 }

Eliminating this variable, we obtain

public int interpret(int[] in, int[] par) {
     if (boolexp.interpret(in,par)) 
         return exp1.interpret(in,par);
     else 
         return exp2.interpret(in,par);
 }

and by simple ternarization we get a single statement function

public int interpret(int[] in, int[] par) {
     return boolexp.interpret(in,par) ? exp1.interpret(in,par) : exp2.interpret(in,par);
 }

another ternarization steps yields

public int interpret(int[] in, int[] par) {
     return return (boolexp.interpret(in,par) ? exp1 : exp2).interpret(in,par);
 }


Inlining vs. Optimization and Clarity

Not all inlining opportunities should be exploited. Consider for example

for (a = 0; a < f(); a += g()) {
     ...
}

which may have resulted from inlining of a more explicit statement of the iteration boundaries

int f = f(), g = g();
for (a = 0; a < f; a += g) {
     ...
}

The semantics of these two code snippets is different though. In the former, functions f and g are evaluated in every iteration, while in the latter, they are evaluated only once.

In the case that these functions' return value is not the same, the two loops will execute differently, and the variables should not be inlined.

Should we use the inlined version of the code in the case that the two functions are pure, that is return the same value in all calls? If the purity of the functions is not obvious, then inlining may confuse the reader, and should not be used.

But, what if purity is obvious? The inlined version is then at least as clear as the non-inlined, is shorter, and uses fewer variables. This version may also mean, at least with some compilers, slower execution. Hasting to optimize may not be a good idea though. Remember that premature optimization is a prime example of dangerous creativity and originality and that the loop may execute only a handful of times, or may not even be reached in most paths of execution.

Return the result as soon as it is found

Many routines define a variable to store their result. This variable can often be eliminated, by returning the result as soon as it is found.

Example 1

Consider for example the following simple function, designed to check if a given token is in a list of keywords.

private boolean isKeyword(String token, String[] keywords) {
     boolean result = false;
     for (int i = 0; i < keywords.length; i++)
         if (token.equals(keywords[i])) 
             result = true;
     return result;
 }

By returning the result as soon as we find it, we can eliminate variable result. The resulting function is not only shorter but also a bit more efficient.

private boolean isKeyword(String token, String[] keywords) {
     for (int i = 0; i < keywords.length; i++)
         if (token.equals(keywords[i])) 
            return true;
     return false;
 }

Example 2

Here is another example, taken from the code of the CLASSYCLE project.

private ResultRenderer getRenderer() throws InstantiationException, IllegalAccessException, ClassNotFoundException {
     ResultRenderer renderer = new DefaultResultRenderer();
     if (_resultRenderer != null)
           renderer = (ResultRenderer) Class.forName(_resultRenderer).newInstance();
     return renderer;
 }

which can be written simply as

private ResultRenderer getRenderer() throws InstantiationException, IllegalAccessException, ClassNotFoundException {
     return _resultRenderer != null ? (ResultRenderer) Class.forName(_resultRenderer).newInstance() : DefaultResultRenderer();
 }

Example 3

A third example taken from the CLASSYCLE project is

public boolean isFulfilled(Vertex vertex) {
     boolean result = false;
     if (vertex != null) {
         Attributes attributes = vertex.getAttributes();
         if (attributes instanceof NameAttributes)
             result = _pattern.matches(((NameAttributes) attributes).getName());
         }
      return result;
 }

Returning the result as soon it is known also eliminates the first conditional

public boolean isFulfilled(Vertex vertex) {
     if (vertex == null) 
         return false;
     Attributes attributes = vertex.getAttributes();
     return attributes instanceof NameAttributes ? _pattern.matches(((NameAttributes) attributes).getName()) : false;
 }

Example 4

Here is an even trickier example, drawn again from the CLASSYCLE project.

private boolean matches(String string, int indexInString, int indexInConstantParts) {
         boolean result = true;
         if (indexInConstantParts < _constantParts.length) {
             String constantPart = _constantParts[indexInConstantParts];
             do {
                 int index = string.indexOf(constantPart, indexInString);
                 if (index < 0 || (indexInString == 0 && !_startsWithAnything && index > 0)) {
                     result = false;
                     break;
                 }
                 indexInString = index + constantPart.length();
                 result = matches(string, indexInString, indexInConstantParts + 1);
             } while (result == false);
         } else {
             result = result && (_endsWithAnything || indexInString == string.length());
         }
         return result;
     }

This example is quite tricky. The variable result is used in several places, and the flow of control is a bit confusing. Let us therefore start by removing changing the order of the branches in the conditional statement following the shortest first technique.

private boolean matches(String string, int indexInString, int indexInConstantParts) {
         boolean result = true;
         if (indexInConstantParts >= _constantParts.length)
             result = result && (_endsWithAnything || indexInString == string.length());
         } else {
             String constantPart = _constantParts[indexInConstantParts];
             do {
                 int index = string.indexOf(constantPart, indexInString);
                 if (index < 0 || (indexInString == 0 && !_startsWithAnything && index > 0)) {
                     result = false;
                     break;
                 }
                 indexInString = index + constantPart.length();
                 result = matches(string, indexInString, indexInConstantParts + 1);
             } while (result == false);
         } 
         return result;
     }

It now becomes a bit clearer that we can return the correct value immediately in the first branch.

private boolean matches(String string, int indexInString, int indexInConstantParts) {
         if (indexInConstantParts >= _constantParts.length)
             return _endsWithAnything || indexInString == string.length();
         boolean result = true;
         String constantPart = _constantParts[indexInConstantParts];
         do {
             int index = string.indexOf(constantPart, indexInString);
             if (index < 0 || (indexInString == 0 && !_startsWithAnything && index > 0)) {
                  result = false;
                  break;
             }
             indexInString = index + constantPart.length();
             result = matches(string, indexInString, indexInConstantParts + 1);
         } while (result == false);
         return result;
     }

Examining now the inner conditional statement, we see that if the break is executed, then the returned value is false, so we can rewrite the function as follows:

private boolean matches(String string, int indexInString, int indexInConstantParts) {
         if (indexInConstantParts >= _constantParts.length)
             return _endsWithAnything || indexInString == string.length();
         boolean result = true;
         String constantPart = _constantParts[indexInConstantParts];
         do {
             int index = string.indexOf(constantPart, indexInString);
             if (index < 0 || (indexInString == 0 && !_startsWithAnything && index > 0)) {
                  return false; 
             indexInString = index + constantPart.length();
             result = matches(string, indexInString, indexInConstantParts + 1);
         } while (result == false);
         return result;
     }

Now, we see that if the loop terminates normally, the value of result must be true. A closer inspection reveals that the value of result is computed right before the loop ending condition. So, we can eliminate it altogether by writing

private boolean matches(String string, int indexInString, int indexInConstantParts) {
         if (indexInConstantParts >= _constantParts.length)
             return _endsWithAnything || indexInString == string.length();
         String constantPart = _constantParts[indexInConstantParts];
         do {
             int index = string.indexOf(constantPart, indexInString);
             if (index < 0 || (indexInString == 0 && !_startsWithAnything && index > 0)) {
                  return false; 
             indexInString = index + constantPart.length();
         } while (!matches(string, indexInString, indexInConstantParts + 1));
         return true;
     }

Eliminate iteration and indexing variables using foreach loops

Array indices

A famous idiom for iterating over an array is

for (int i = 0; i < array.length; i++) {
         .... //Do something with array[i]
    }

But, if the loop's body never uses variable i nor modifies array[i], then the loop can be more concisely written as

for (T a: array) {
         .... //Do something with a
    }

where T is the type of elements of array array.

The above example of searching through an array of keywords, can thus be rewritten to eliminate variable i

private boolean isKeyword(String token, String[] keywords) {
     for (final String element : keywords)
         if (element.equals(token))
             return true;
     return false;
 }

The following represents a common structure of the main function of many file processing applications.

public static void main(String argv[]) {
     for (int i = 0; i < argv.length; i++) {
         try {
             System.out.println("Processing ["+argv[i]+"]");
             process(new FileReader(argv[i])));
             System.out.println("No errors processing " + argv[i]);
         } catch (Exception e) {
             e.printStackTrace(System.out);
             System.exit(1);
         }
     }
 }

With this technique, it is simplified to the following

public static void main(String argv[]) {
     for (final String f : argv) {
         try {
             System.out.println("Processing ["+f+"]");
             process(new FileReader(f)));
             System.out.println("No errors processing " + f);
         } catch (Exception e) {
             e.printStackTrace(System.out);
             System.exit(1);
         }
     }
 }

Collections

Old Java code iterating over collections uses an instance of class Iterator to carry out the iteration. This iteration variable can be eliminated.

The following function is drawn from class SecureRandom of package java.security in the Java runtime library.

private static String getPrngAlgorithm() {
     List provs = Providers.getProviderList().providers();
     for (Iterator t = provs.iterator(); t.hasNext();) {
     Provider p = (Provider) t.next();
     for (Iterator u = p.getServices().iterator(); u.hasNext();) {
         Service s = (Service) u.next();
             if (s.getType().equals("SecureRandom")) {
                 return s.getAlgorithm();
              }
         }
     }
     return null;
 }

In this function we have two iteration variables, t and u, both of which can be eliminated. Also, variable, provs is used only once, so it is safe to eliminate as well.

The result is:

private static String getPrngAlgorithm() {
     for (Provider p : Providers.getProviderList().providers())
         for (Service s : p.getServices())
             if (s.getType().equals("SecureRandom"))
                 return s.getAlgorithm();
     return null;
 }

The NOT metric is reduced by this simplification by 50% (from 113 to 56).

Inlining vs. Optimization and Clarity

Not all opportunities for inlining should be exploited. Consider for example

for (a = 0; a < f(); a += g()) {
     ...
}

which may have resulted from inlining of a more explicit statement of the iteration boundaries

int bound = f(), step = g();
for (a = 0; a < found; a += step) {
     ...
}

The semantics of these two code snippets is different though. In the former, functions f and g are evaluated in every iteration, while in the latter, they are evaluated only once.

In the case that these functions' return value is not the same, the two loops will execute differently, and the variables should not be inlined.

But, in the case that the two functions return the same value in all calls using the first form of the code may mean lost optimization opportunities. Choosing between the two versions is a matter of favoring between clarity of the design.

Chained Elimination

Sometimes, several steps are required to eliminate a variable. Consider for example the following constructore

protected Term(IndexFactory a, Predicate target, List<Attribute> as, Location loc) {
         super(a, loc);
         predicate = target;
         this.loc = loc;
         Attribute[] temp = new Attribute[as.size()];
         temp = as.toArray(temp);
         for (int i = 0; i < temp.length; ++i)
             temp[i] = as.get(i);
         attributes = Sequence.make(temp);
    }

We would like to eliminate both variable temp and i. This time, we cannot use an enhanced for loop syntax, since this iteration mechanism does not allow one update the underlying collection.

However, the elimination of i is rather easy if we remember that the List interface offers a toArray method, so we do not need variable i to iterate over the array. The tricky part is that for reasons related to the implementation of generics in Java this method requires an array as a parameter. (There is also an overloaded version of this method which takes no parameters. Alas, this version returns an array of Objects instead of an array of the right type.)

We thus write:

protected Term(IndexFactory a, Predicate target, List<Attribute> as, Location loc) {
         super(a, loc);
         predicate = target;
         this.loc = loc;
         Attribute[] temp = as.toArray(new Attribute[as.size()]);
         attributes = Sequence.make(temp);
    }

Now, we can eliminate temp altogether.

protected Term(IndexFactory a, Predicate target, List<Attribute> as, Location loc) {
         super(a, loc);
         predicate = target;
         this.loc = loc;
         attributes = Sequence.make(as.toArray(new Attribute[as.size()]));
    }

Finally, re-examining the specification of toArray, we see that it does not need an array of the correct size, so instead

attributes = Sequence.make(as.toArray(new Attribute[as.size()]));

we can write

attributes = Sequence.make(as.toArray(new Attribute[0]));


A Concluding Example

The following example is taken from the Java run time library (class ????? in packagecom.sun.corba.se.impl.activation), and was obviously written before the new syntax of for loops was introduced.

public int[] getActiveServers() {
         ServerTableEntry entry;
         int[] list = null;
         synchronized (serverTable) {
             // unlike vectors, list is not synchronized
             ArrayList servers = new ArrayList(0);
             Iterator serverList = serverTable.keySet().iterator();
             try {
                 while (serverList.hasNext()) {
                     Integer key = (Integer) serverList.next();
                     // get an entry
                     entry = (ServerTableEntry) serverTable.get(key);
                     if (entry.isValid() && entry.isActive()) {
                         servers.add(entry);
                     }
                 }
             } catch (NoSuchElementException e) {
                 // all done
             }
             // collect the active entries
             list = new int[servers.size()];
             for (int i = 0; i < servers.size(); i++) {
                 entry = (ServerTableEntry) servers.get(i);
                 list[i] = entry.getServerId();
             }
         }
         if (debug) {
             StringBuffer sb = new StringBuffer();
             for (int ctr = 0; ctr < list.length; ctr++) {
                 sb.append(' ');
                 sb.append(list[ctr]);
             }
             System.out.println("ServerManagerImpl: getActiveServers returns" + sb.toString());
         }
         return list;
     }

Concentrating on the first loop,

Iterator serverList = serverTable.keySet().iterator();
             try {
                 while (serverList.hasNext()) {
                     Integer key = (Integer) serverList.next();
                     // get an entry
                     entry = (ServerTableEntry) serverTable.get(key);
                     if (entry.isValid() && entry.isActive()) {
                         servers.add(entry);
                     }
                 }
             } catch (NoSuchElementException e) {
                 // all done
             }

we see that both variable key and variable serverList can be eliminated,

try {
                 for (ServerTableEntry e : serverTable.values())
                     if (e.isValid() && e.isActive())
                         servers.add(e);
             } catch (NoSuchElementException e) {
                 // all done
             }

In the second loop,

// collect the active entries
             list = new int[servers.size()];
             for (int i = 0; i < servers.size(); i++) {
                 entry = (ServerTableEntry) servers.get(i);
                 list[i] = entry.getServerId();
             }

we cannot eliminate any variables. Yet, the iteration is simplified by using the new syntax.

// collect the active entries
             list = new int[servers.size()];
             int i = 0;
             for (ServerTableEntry e : servers)
                 list[i++] = e.getServerId();

The third looping instruction

for (int ctr = 0; ctr < list.length; ctr++) {
                 sb.append(' ');
                 sb.append(list[ctr]);
             }

makes it possible to eliminate variable ctr:

for (int id : list)
                 sb.append(' ').append(id);

Applying all these modifications, and passing an appropriate generic parameter to the local variable servers ,the function now looks as follows:

public int[] getActiveServers() {
         int[] list;
         synchronized (serverTable) {
             // unlike vectors, list is not synchronized
             ArrayList<ServerTableEntry> servers = new ArrayList<ServerTableEntry>(0);
             try {
                 for (ServerTableEntry e : serverTable.values())
                     if (e.isValid() && e.isActive())
                         servers.add(e);
             } catch (NoSuchElementException e) {
                 // all done
             }
             // collect the active entries
             list = new int[servers.size()];
             int i = 0;
             for (ServerTableEntry e : servers)
                 list[i++] = e.getServerId();
         }
         if (debug) {
             StringBuffer sb = new StringBuffer();
             for (int id : list)
                 sb.append(' ').append(id);
             System.out.println("ServerManagerImpl: getActiveServers returns" + sb.toString());
         }
         return list;
     }

In total, the NOT metric is reduced from 235 to 164 (30% reduction).

Personal tools