Once again I would like to introduce guest blogger Hanan Kavitz of Applied Materials. Several months ago Hanan discussed several quirks with compiled Matlab DLLs. Today Hanan will discuss how they overcame a performance bottleneck with Matlab’s builtin rmfield function, exemplifying the general idea that we can sometimes improve performance by profiling the core functionality that causes a performance hotspot and optimizing it, even when it is part of a builtin Matlab function. For additional ideas on improving Matlab performance, search this blog for “Performance” articles, and/or get the book “Accelerating MATLAB Performance”.
I’ve been using Matlab for many years now and from time to time I need to profile low-throughput code. When I profile this code sometimes I realize that a computational ‘bottleneck’ is due to a builtin Matlab function (part of the core language). I can often find ways to accelerate such builtin functions and get significant speedup in my code.
I recently found Matlab’s builtin rmfield function to be too slow for my needs. It works great when one needs to remove a few fields from a small structure, but in our case we needed to remove thousands of fields from a structure containing about 5000 fields, and this happens in a function that is called many times inside an external loop. The program was noticeably sluggish.
It started when a co-worker asked me to look at code that looked just slightly more intelligent than this:
for i = 1:5000
    myStruct = rmfield(myStruct,fieldNames{i});
end
Running this code within a tic/toc pair yielded the following results:
>> tic; myFunc(); t1 = toc
t1 =
   25.7713
In my opinion, 25.77 secs for such simple functionality seems like an eternity…
The obvious thing was to change the code to the documented faster (vectorized) version:
>> tic; myStruct = rmfield(myStruct,fieldNames); t2 = toc
t2 =
    0.6097
This is obviously much better, but since rmfield is called many times in my application, I needed something even better. So I profiled rmfield and was not happy with the result.
The original code of rmfield (%matlabroot%/toolbox/matlab/datatypes/rmfield.m) looks something like this (I deleted some non-essential code for brevity):
function t = rmfield(s,field)
    % get fieldnames of struct
    f = fieldnames(s);

    % Determine which fieldnames to delete.
    idxremove = [];
    for i=1:length(field)
        j = find(strcmp(field{i},f) == true);
        idxremove = [idxremove;j];
    end

    % set indices of fields to keep
    idxkeep = 1:length(f);
    idxkeep(idxremove) = [];

    % remove the specified fieldnames from the list of fieldnames.
    f(idxremove,:) = [];

    % convert struct to cell array
    c = struct2cell(s);

    % find size of cell array
    sizeofarray = size(c);
    newsizeofarray = sizeofarray;

    % adjust size for fields to be removed
    newsizeofarray(1) = sizeofarray(1) - length(idxremove);

    % rebuild struct
    t = cell2struct(reshape(c(idxkeep,:),newsizeofarray),f);
When I profiled the code, the highlighted row (the find(strcmp(field{i},f) == true) line inside the loop) was the bottleneck I was looking for.
First, I noticed the '== true' part of the string comparison. While '==true' is not the cause of the bottleneck, it does leave an impression of bad coding style. Perhaps this code was created as some apprentice project, which might also explain its suboptimal performance.
The real performance problem here is that for each field that we wish to remove, rmfield compares it to all existing fields to find its location in a cell array of field names. This is algorithmically inefficient and makes the code hard to understand (just try it: it took me long, hard minutes).
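To make the difference concrete, here is a minimal sketch that locates the fields to remove in two ways: once with a full strcmp scan per field (the pattern used in the loop above) and once with a single ismember call. The field names and counts are made up for illustration, and ismember is just one possible way to do the lookup in one pass; it is not necessarily what any Matlab release does internally.

```matlab
% Hypothetical data: 5000 field names, of which the first 3000 are to be removed
f     = arrayfun(@(k) sprintf('field%d',k), (1:5000)', 'UniformOutput', false);
field = f(1:3000);

% Per-field scan: every removed field triggers a comparison against all of f
idxLoop = [];
for i = 1:numel(field)
    idxLoop = [idxLoop; find(strcmp(field{i}, f))]; %#ok<AGROW>
end

% Single set-membership call: all fields are located in one pass
idxSet = find(ismember(f, field));

% Both approaches locate the same fields
sameResult = isequal(sort(idxLoop), idxSet);
```

With 3000 fields to remove out of 5000, the loop performs 3000 separate scans over the full list of names, which is where the time goes.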
So, I created a variant of rmfield.m called fast_rmfield.m, as follows (again, omitting some non-essential code):
function t = fast_rmfield(s,field)
    % get fieldnames of struct
    f = fieldnames(s);
    [f,ia] = setdiff(f,field,'R2012a');

    % convert struct to cell array
    c = squeeze(struct2cell(s));

    % rebuild struct
    t = cell2struct(c(ia,:),f)';
This code is much shorter, easier to explain and maintain, but also (and most importantly) much faster:
>> tic; myStruct = fast_rmfield(myStruct,fieldNames); t3 = toc
t3 =
    0.0302
>> t2/t3
ans =
   20.1893
This resulted in a speedup of ~850x compared to the original version (of 25.77 secs), and ~20x compared to the vectorized version. A nice improvement in my humble opinion…
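If you want to try a comparison like this on your own machine, the following is a rough, self-contained sketch. It builds a hypothetical scalar struct (the size N, the field names and the values are all assumptions for illustration), removes half of its fields in three ways, and checks that the results agree. The third variant inlines the setdiff idea from fast_rmfield for the scalar-struct case, so the script runs without any extra file on the path. Timings will differ between machines and Matlab releases.

```matlab
% Hypothetical test data: a scalar struct with N fields, half of which are removed
N = 2000;
names      = arrayfun(@(k) sprintf('field%04d',k), (1:N)', 'UniformOutput', false);
myStruct   = cell2struct(num2cell((1:N)'), names, 1);
fieldNames = names(1:2:end);

% 1) One rmfield call per field
tic
s1 = myStruct;
for i = 1:numel(fieldNames)
    s1 = rmfield(s1, fieldNames{i});
end
tLoop = toc;

% 2) A single vectorized rmfield call
tic
s2 = rmfield(myStruct, fieldNames);
tVectorized = toc;

% 3) setdiff-based removal (scalar-struct case only)
tic
[keepNames, ia] = setdiff(names, fieldNames);   % names to keep and their indices
c  = struct2cell(myStruct);                     % N-by-1 cell array of field values
s3 = cell2struct(c(ia,:), keepNames, 1);        % rebuild from the kept rows
tSetdiff = toc;

% Sanity checks: same remaining fields and same values in all three results
sameFields = isequal(sort(fieldnames(s3)), sort(fieldnames(s2)));
sameValues = isequal(s1, s2) && isequal(s2, s3);

fprintf('loop: %.4f s, vectorized: %.4f s, setdiff: %.4f s\n', ...
        tLoop, tVectorized, tSetdiff);
```

Note that setdiff returns the remaining names in sorted order by default, so the field order of the result may differ from what rmfield produces. The zero-padded names used here sort in their original order, which keeps the comparison simple.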
The point in all this is that we can and should rewrite Matlab builtin functions when they are too slow for our needs, whether the cause is an algorithmic flaw (as in this case), extraneous sanity checks (as in the case of ismember or datenum), bad default parameters (as in the case of fopen/fwrite or scatter), or merely slow implementation (as in the case of save, cellfun, or the conv family of functions).
A good pattern is to save such code pieces in file names that hint at the original code. In our case, I used fast_rmfield to suggest that it is a faster alternative to rmfield.
Do you know of any other example of a slow implementation in a built-in Matlab function that can be optimized? If so, please leave a comment below.