During some recent performance tuning work, I took a look at a method
that was taking around 450 ms. The primary task of the function was to
build a relatively large DataSet object, with around 350 rows and over
110 columns. Sure, that’s over 38,500 cells, but 450 ms seemed extreme.
As a first step, I tried commenting out the actual logic that decided
what went into each and every cell and just sticking in the word “hello”
38,500 times. To my surprise, it hardly made a dent in the performance.
So, I took it a step further, and tried not putting in any data at all.
Sure enough, time fell down to close to zero!
How could copying the same string in over eat so much performance time?
Googling around didn’t turn up much, but I started poking around in the
DataSet object and discovered that it had a lot of power that I had
never used. It’s able to keep track of what rows have been modified,
what needed to be updated back in a database, etc. Clearly there was a
significant amount of logic behind it. With this in mind, I looked back
at exactly how we were creating the rows, and I spotted something
suspicious:
DataRow newRow = dataTable.NewRow(); dataTable.Rows.Add(newRow); newRow[col1] = value1; newRow[col2] = value2; ...
We were creating the row, adding it to the table, and then putting in
the values. When I was thinking of the DataSet as a glorified,
serializable hashtable, this shouldn’t matter much. However, since it
was clearly doing some sophisticated state management, could modifying a
row after it was in the table be much more expensive than before it was
in the table? Perhaps the DataSet was working hard to keep track of all
the modifications we were doing on the theory that we might need it.
I tried making the simple change of moving step of adding the row until
after it was populated with data:
DataRow newRow = dataTable.NewRow(); newRow[col1] = value1; newRow[col2] = value2; ... dataTable.Rows.Add(newRow);
Boom! Construction time dropped down to close to zero. Clearly, that
small detail really mattered.
So, lesson learned: don’t add the row to a DataSet until it is
populated.
hhhh