AnyWhichWay
Indexing With JavaScript Objects

Home > Indexing With JavaScript Objects, Millions Of Ops/Second July 6th, 2016

Introduction

When the term "index" is used people generally think "database"; however, indexes can play a major role in many areas to speed up applications. This includes, but is not limited to, caching, memoizing, and message dispatch.

This article demonstrates how to quickly create indexes on JavaScript objects using serializable JavaScript objects. The approach leverages the thousands of hours that have already been put into optimizing JavaScript property lookup in the Google, Mozilla, and Microsoft JavaScript engines to create simple, serializable, super-fast indexes with a minimal amount of code.

This accomplished by:

The Details

Note, there are a number of shortcomings with the class below as documented in the comments; however, it is intended for tutorial purposes only. Addressing the shortcomings would obsfucate the tutorial. Additionally, the sample code has been written using older coding conventions in order to ensure cross-browser support, e.g. => functions are avoided.
function Index() {
	/* 
	* A mapping property provides the storage for the index. It takes the following form:
	*
	*  <className>.<property>.<value>.<type>.<cachedObjectsArray>
	*
	*  For example indexing {name: "Joe", age:25} and {name:"Mary", age:25} would result in this exploded JSON view:
	*
	*  {"Object":
	*		{"name":
	*			{"Joe":
	*				{"string": [{name: "Joe", age:25}]}},
	*			{"Mary":
	*				{"string": [{name: "Mary", age:25}]}
	*		},
	*		{"age":
	*			{"25": 
	*				{"number": [{name: "Joe", age:25},{name: "Mary", age:25}]}
	*		}
	*	}	
	*/
	this.mapping = {}
}
Index.prototype.add = function(object) {
	var me = this,
		className = object.constructor.name; // this will only work for named constructors
		
	// add a root property that correlates to the class of the object being indexed if one does not already exist
	me.mapping[className] = (me.mapping[className] ? me.mapping[className]  : {});
	
	// index all the property values
	Object.keys(object).forEach(function(property) {
		var value = object[property],
			type = typeof(value);
		
		// ignore values that are themselves objects, this can be addressed but is beyond the scope of the tutorial
		if(value && type==="object") {
			return;
		}	
		// add a property that correlates to each property in the object being indexed if one does not already exist
		me.mapping[className][property] = (me.mapping[className][property] ? me.mapping[className][property] : {});
		
		// add a property that correlates to the value of the property in the object being indexed if one does not already exist
		me.mapping[className][property][value] = (me.mapping[className][property][value] ? me.mapping[className][property][value] : {});
		
		// add a type property, if it does not already exists, which contains an array to store target objects with the same property, value and type
		// an Array is used for storage since it is directly serializable
		// a Set might be faster, but serialization would be lost unless an Index.prototype.toJSON method was written
		me.mapping[className][property][value][type] = (me.mapping[className][property][value][type] ? me.mapping[className][property][value][type] : []);
		
		// add the target object to the index
		if(me.mapping[className][property][value][type].indexOf(object)===-1) {
			me.mapping[className][property][value][type].push(object);
		}
	});
}
Index.prototype.find = function(pattern,className) {
	var me = this,
		results;
		
	function intersection() { 
		// A high speed intersection function is required. Details are beyond the scope of this tutorial; 
		// however, this is the fastest we have found: https://github.com/Benvie. Working code is in the bechmark file.
	}
	
	// if there are objects of className in the index
	if(me.mapping[className]) {
	
		// if every property in the pattern has ...
		if(Object.keys(pattern).every(function(property) {
			var value = pattern[property],
				type = typeof(value);
				
			// a cachedObjectsArray at the property, value and type
			if(me.mapping[className][property] && me.mapping[className][property][value] && me.mapping[className][property][value][type]) {
				// which when intersected with existing results
				results = (results ? intersection(results,me.mapping[className][property][value][type]) : me.mapping[className][property][value][type]);
			}
			
			// is non-empty
			return results && results.length>0;
		})) {
			// return the results
			return results
		} else { // a portion of the pattern match failed
			// return an empty array of results
			return [];
		}
	} else { // there are no objects of className
		// return an empty array of results
		return [];
	}
}

Enhancements

Removing Object From Index

Removing an object from an index requires adding a Index.prototype.remove method that is a modification Index.prototype.find so that it:

Supporting Sub Objects

Adding support for indexing property values that are themselves objects requires:

Once uniquely keyed objects are in play, if a non-enumerable .instances property is added to the index, then the cachedObjectsArray array stored in the type subfield can be changed into a regular object whose properties are the unique keys with the value "true" and the objects themselves can be stored in the .instances property. This will substantially improve insert and delete performance since array look-ups that traverse the length of the array looking to see if an object already exists for the property, value and type can be replaced with a built-in hash lookup and splice can be replaced with property deletion or nullification. However, unless further optimizations are made, search performance will drop since the terminal nodes will need to have their properties turned back into arrays of ids and ultimately resolved back to to actual object instances.

Additionally, using keys and adding a .instances property will make the index smaller when serializing to JSON since there will only be one direct reference to each object.

Memoizing

Memoizing is the process of caching function results such that when a function is called with the same arguments more than once, the cached value is returned. Conceptually, one can think of the arguments arrays associated with distinct function invocations as objects to be indexed. However, a slightly different index structure can be used since there will only be one result for any given combination of values, there are no property names perse, and the ordering of values has semantic implications. Whereas, when objects are indexed the ordering of properties does not have semantics and there can be distinct instances with indentical values for all properties.

The below memoization index structure leverages the same thousands of hours of performance enhancements to JavaScript as the above object indexing structure.

{
  <arg1Value1>: {
	<arg1Type1>: {
		<arg2Value>: {
			<arg2Type>: {
				...
				<argNValue>: {
					<argNType>: <function result>	
				}
			}
		},
		...
	}
  },
  <arg1Value2>: {
	...
  }
}

Below is a JSON view of an index from computing volume as function volume(l,w,h) { return l * w * h; } where it has been called three times, once as volume(1,2,3) and once as volume(1,3,2) and once as volume(3,2,1). Not than anyone would cache the results of such a trivial function in a real application!

{
  1: {
	number: {
	  2: {
		number: {
		  3: 6
		}
	  },
	  3: {
		number: {
		  2: 6
		}
	  }
	}
  },
  3: {
	number: {
	  2: {
		number: {
		  1: 6
		}
	  }
	}
  }
}

With this approach, resolving a function becomes a find operation against the index, which if it fails, results in the original computation being executed and the results being stored prior to a value being returned. The implementation of find can be done in such a way that it is at exactly the right place to insert a value when it fails, making the cache assignment trivial.

The approach actually results in a smaller, less complicated code base for indexing since there is only one value at the terminal nodes and the indexing process can be integrated with the look-up process at runtime when functions are invoked. It also results in a small memoization code base since built-in JavaScript structures are used to represent the memos and built-in access methods are used to look them up. See the source code of iMemoized.

Benchmarks & Performance

No claim is made that the indexing approach described in this article is the fastest. Sufficient benchmarks have not been conducted and it is also possible that specific situations could be addressed by purpose built indexing with a dramatically different implementation; however, we do believe that what we describe is generally useful.

High performance is dependent on value variability across the data being indexed. The lower the variability in values for any given property relative to the total number of objects, the faster the index can be searched. Although, even with a variability of values the same as the total number of objects (e.g. when searching based on unique object ids), using the index will probably be faster than linear search. More research on the topic of performance degradation vs memory utilization in likely real-world situations is required.

A Firefox benchmark of linear search vs indexed find across 100,000 objects for a single property value that varies from 1 to 100 is shown below:

Access this page to see the benchmark in action: indexingBenchmark.html. Note, Microsoft and Apple browsers are far slower than Chrome and Firefox, but still get a substaintial performance boost.

A comparison of using this approach in iMemoized vs fast-memoize and _lodash.memoize to support memoizing for computing a fibonacci value shows the following:

un-memoized x 7.47 ops/sec 5.46% (23 runs sampled)
test/benchmark.html:49 iMemoized.memoize x 20,765,906 ops/sec 4.40% (56 runs sampled)
test/benchmark.html:44 lodash _.memoize x 18,171,325 ops/sec 3.11% (54 runs sampled)
test/benchmark.html:44 fast-memoize x 8,275,435 ops/sec 1.07% (59 runs sampled)
test/benchmark.html:46 Fastest is iMemoize.memoize
It should be noted that iMemoized and lodash are within possible garbage collection variance of each other. It is entirely possible lodash will at times be slightly faster or iMemoized slightly slower.

Further Information

The following open source libraries from AnyWhichWay use this indexing approach. Their source code can be explored for further insight:

Copyright 2016, AnyWhichWay