Refactoring and performance pitfalls

I spent this last week profiling and trying to figure out why the C++ version turned out to be slower than the C one, so I am going to share some tips to follow when refactoring in order to keep a decent level of performance.

Avoid implementing copy constructors if you don’t need to

When I was implementing classes such as Vector3 and Matrix4, I wrote my own copy constructor and assignment operator using memcpy. It turned out that my version was slower than what the compiler would have generated on its own, so if you don’t have a specific need for them, just don’t implement them (you’ll also have less code to maintain!).
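A minimal sketch of that advice, assuming a plain-float Vector3 (the layout and accessor names are illustrative, not the actual ResidualVM code): no copy constructor or assignment operator is declared, so the compiler generates them, and a static_assert documents that the type stays trivially copyable, which lets the compiler copy it as plain data.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical minimal vector type: no user-declared copy constructor,
// copy assignment or destructor, so the compiler generates them.
class Vector3 {
public:
	Vector3() : _v{0.0f, 0.0f, 0.0f} {}
	Vector3(float x, float y, float z) : _v{x, y, z} {}

	float getX() const { return _v[0]; }
	float getY() const { return _v[1]; }
	float getZ() const { return _v[2]; }

private:
	float _v[3];
};

// The implicit copy operations keep the type trivially copyable; this check
// stops compiling if someone later adds a hand-written copy constructor or
// assignment operator.
static_assert(std::is_trivially_copyable<Vector3>::value,
              "Vector3 should stay trivially copyable");
```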

Never assume that return value optimization will be employed

This goes hand in hand with function inlining: you should never assume that the compiler will apply RVO (return value optimization) to functions you write that return objects by value. When I replaced functions that returned a new object, used in a classic assignment, with functions that write their result into an object passed by reference, I got a huge increase in performance, so consider this if you notice suspicious performance problems after a refactoring.
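To make the two styles concrete, here is a hedged sketch with hypothetical Vector4 and Matrix4 types (not the project’s real classes). The first overload returns a new object and relies on the compiler eliding the copy of the local result; the second writes directly into an object supplied by the caller, which is the kind of change described above.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Vector4 {
	float x, y, z, w;
};

struct Matrix4 {
	float m[4][4]; // row-major: m[row][column]

	// Style 1: returns a new object. Whether the copy of the named local `r`
	// is elided (NRVO) is up to the compiler and its settings.
	Vector4 transform(const Vector4 &v) const {
		Vector4 r;
		r.x = m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z + m[0][3] * v.w;
		r.y = m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z + m[1][3] * v.w;
		r.z = m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z + m[2][3] * v.w;
		r.w = m[3][0] * v.x + m[3][1] * v.y + m[3][2] * v.z + m[3][3] * v.w;
		return r;
	}

	// Style 2: writes into a caller-supplied object, so no temporary is
	// created. `out` must not alias `v`, or the inputs would be overwritten
	// while still being read.
	void transform(const Vector4 &v, Vector4 &out) const {
		assert(&out != &v);
		out.x = m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z + m[0][3] * v.w;
		out.y = m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z + m[1][3] * v.w;
		out.z = m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z + m[2][3] * v.w;
		out.w = m[3][0] * v.x + m[3][1] * v.y + m[3][2] * v.z + m[3][3] * v.w;
	}
};
```

With the second style, a call such as `b = t.transform(a);` becomes `t.transform(a, b);`.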

Be careful about function inlining

Even if you put the code in the header file and use the keyword “inline”, you’re just giving the compiler a hint; you have no guarantee that the code will actually be inlined. It’s easy to assume that an inline function performing some simple operation and returning a new value will be inlined (and that the compiler will use RVO to remove the unnecessary temporary object), but more often than not the compiler will ignore you and generate more overhead than you expected in the first place.
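If profiling shows that a small, hot function is not being inlined, one common option is a compiler-specific attribute that is stronger than the inline keyword. The sketch below wraps MSVC’s __forceinline and the GCC/Clang always_inline attribute in a hypothetical FORCE_INLINE macro; even so, the result should be confirmed in the profiler or the generated assembly.

```cpp
#include <cassert>

// Hypothetical macro for a stronger inlining request than `inline`.
// MSVC may still decline a __forceinline request in some cases, and GCC
// reports an error when it cannot honour always_inline.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

struct Vector3 {
	float x, y, z;
};

// Small helper called in hot loops that we want expanded at every call site.
FORCE_INLINE Vector3 add(const Vector3 &a, const Vector3 &b) {
	Vector3 r = {a.x + b.x, a.y + b.y, a.z + b.z};
	return r;
}
```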
One more piece of advice: if you’re using the Visual Studio profiler, use its Compare Performance Reports feature, as it will help you a lot in finding out exactly which functions are slower than before and by how much.
And that’s all for this week!

Refactoring part 2

As planned, I am still working on refactoring the maths code, although I’ve encountered some problems along the way.

The renderer is now working, although there are some minor lighting issues that I am still trying to address. The code, however, is now much cleaner and more readable than before, and to show this here are some snippets from before and after the refactoring:

Before refactoring –

float *m;
V4 *n;
 
if (c->lighting_enabled) {
 // eye coordinates needed for lighting
 
 m = &c->matrix_stack_ptr[0]->m[0][0];
 v->ec.X = (v->coord.X * m[0] + v->coord.Y * m[1] +
    v->coord.Z * m[2] + m[3]);
 v->ec.Y = (v->coord.X * m[4] + v->coord.Y * m[5] +
    v->coord.Z * m[6] + m[7]);
 v->ec.Z = (v->coord.X * m[8] + v->coord.Y * m[9] +
    v->coord.Z * m[10] + m[11]);
 v->ec.W = (v->coord.X * m[12] + v->coord.Y * m[13] +
    v->coord.Z * m[14] + m[15]);
 
 // projection coordinates
 m = &c->matrix_stack_ptr[1]->m[0][0];
 v->pc.X = (v->ec.X * m[0] + v->ec.Y * m[1] + v->ec.Z * m[2] + v->ec.W * m[3]);
 v->pc.Y = (v->ec.X * m[4] + v->ec.Y * m[5] + v->ec.Z * m[6] + v->ec.W * m[7]);
 v->pc.Z = (v->ec.X * m[8] + v->ec.Y * m[9] + v->ec.Z * m[10] + v->ec.W * m[11]);
 v->pc.W = (v->ec.X * m[12] + v->ec.Y * m[13] + v->ec.Z * m[14] + v->ec.W * m[15]);
 
 m = &c->matrix_model_view_inv.m[0][0];
 n = &c->current_normal;
 
 v->normal.X = (n->X * m[0] + n->Y * m[1] + n->Z * m[2]);
 v->normal.Y = (n->X * m[4] + n->Y * m[5] + n->Z * m[6]);
 v->normal.Z = (n->X * m[8] + n->Y * m[9] + n->Z * m[10]);
 
 if (c->normalize_enabled) {
  gl_V3_Norm(&v->normal);
 }
} else {
 // no eye coordinates needed, no normal
 // NOTE: W = 1 is assumed
 m = &c->matrix_model_projection.m[0][0];
 
 v->pc.X = (v->coord.X * m[0] + v->coord.Y * m[1] + v->coord.Z * m[2] + m[3]);
 v->pc.Y = (v->coord.X * m[4] + v->coord.Y * m[5] + v->coord.Z * m[6] + m[7]);
 v->pc.Z = (v->coord.X * m[8] + v->coord.Y * m[9] + v->coord.Z * m[10] + m[11]);
 if (c->matrix_model_projection_no_w_transform) {
  v->pc.W = m[15];
 } else {
  v->pc.W = (v->coord.X * m[12] + v->coord.Y * m[13] + v->coord.Z * m[14] + m[15]);
 }
}
 
v->clip_code = gl_clipcode(v->pc.X, v->pc.Y, v->pc.Z, v->pc.W);

After Refactoring –

Matrix4 *m;
Vector4 *n;
 
if (c->lighting_enabled) {
 // eye coordinates needed for lighting
 
 m = c->matrix_stack_ptr[0];
 v->ec = m->transform3x4(v->coord);
 
 // projection coordinates
 m = c->matrix_stack_ptr[1];
 v->pc = m->transform(v->ec);
 
 m = &c->matrix_model_view_inv;
 n = &c->current_normal;
 
 v->normal = m->transform3x3(n->toVector3());
 
 if (c->normalize_enabled) {
  v->normal.normalize();
 }
} else {
 // no eye coordinates needed, no normal
 // NOTE: W = 1 is assumed
 m = &c->matrix_model_projection;
 
 v->pc = m->transform3x4(v->coord);
 if (c->matrix_model_projection_no_w_transform) {
  v->pc.setW(m->get(3,3)); 
 }
}
 
v->clip_code = gl_clipcode(v->pc.getX(), v->pc.getY(), v->pc.getZ(), v->pc.getW());
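For reference, here is a hedged sketch of what the three Matrix4 helpers used in the refactored snippet might look like, derived term by term from the “before” code. It assumes a row-major float m[4][4] (so m[row][col] matches the old flat m[row * 4 + col] access) and simplified Vector3/Vector4 structs with public X, Y, Z, W fields; the project’s real classes encapsulate their data instead.

```cpp
#include <cassert>

// Hypothetical minimal types with only the members needed here.
struct Vector3 {
	float X, Y, Z;
};

struct Vector4 {
	float X, Y, Z, W;
};

class Matrix4 {
public:
	float m[4][4]; // row-major, same layout as the old flat m[0..15] access

	float get(int row, int col) const { return m[row][col]; }

	// 3D point with an implicit W = 1: the old "+ m[3]", "+ m[7]", ... terms.
	Vector4 transform3x4(const Vector3 &v) const {
		Vector4 r;
		r.X = v.X * m[0][0] + v.Y * m[0][1] + v.Z * m[0][2] + m[0][3];
		r.Y = v.X * m[1][0] + v.Y * m[1][1] + v.Z * m[1][2] + m[1][3];
		r.Z = v.X * m[2][0] + v.Y * m[2][1] + v.Z * m[2][2] + m[2][3];
		r.W = v.X * m[3][0] + v.Y * m[3][1] + v.Z * m[3][2] + m[3][3];
		return r;
	}

	// Full 4x4 product, as used for eye -> projection coordinates.
	Vector4 transform(const Vector4 &v) const {
		Vector4 r;
		r.X = v.X * m[0][0] + v.Y * m[0][1] + v.Z * m[0][2] + v.W * m[0][3];
		r.Y = v.X * m[1][0] + v.Y * m[1][1] + v.Z * m[1][2] + v.W * m[1][3];
		r.Z = v.X * m[2][0] + v.Y * m[2][1] + v.Z * m[2][2] + v.W * m[2][3];
		r.W = v.X * m[3][0] + v.Y * m[3][1] + v.Z * m[3][2] + v.W * m[3][3];
		return r;
	}

	// Upper-left 3x3 block only, no translation (used for normals).
	Vector3 transform3x3(const Vector3 &v) const {
		Vector3 r;
		r.X = v.X * m[0][0] + v.Y * m[0][1] + v.Z * m[0][2];
		r.Y = v.X * m[1][0] + v.Y * m[1][1] + v.Z * m[1][2];
		r.Z = v.X * m[2][0] + v.Y * m[2][1] + v.Z * m[2][2];
		return r;
	}
};
```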

As you can see, the code is more readable and the operations performed are clearly stated.
While trying to fix some issues that arose during the refactoring, I also stumbled upon the git command “stash”: it lets you set aside the uncommitted changes in your working copy and reapply them later. I used it to keep my changes while switching between branches to run the old and new versions of the code and fix the issues I found, so I highly recommend reading more about it and learning how to use it!

Refactoring and code readability

Today I’m going to write a post about how refactoring is useful to increase the readability (and thus, maintainability) of the code.

I will start with a basic example: TinyGL uses a struct called V3 to represent three-dimensional vectors:

struct V3 {
	float v[3];
};

The API also defines some functions to work with vectors (namely construction, normalization, matrix-vector multiplication, copy, etc.):

V3 gl_V3_New(float x, float y, float z);
int gl_V3_Norm(V3 *a);
void gl_MulM3V3(V3 *a, const M4 *b, const V3 *c);
void gl_MoveV3(V3 *a, const V3 *b);

Which in turn leads to code like this:

V3 a, b;
a = gl_V3_New(5,5,5);
M4 transform;
gl_MulM4V3(&b, &transform, &a); // the C API takes pointers

My goal with this refactoring is to increase readability by introducing a class Vector3, which will make creating and using vectors much easier (I also plan to write classes that represent matrices, so that operations between vectors and matrices can be expressed in a much more concise and intuitive way):

// forward declarations: Matrix4 refers to these before they are defined
class Vector3;
class Vector4;

class Matrix4 {
  public:
	Matrix4();
	Matrix4(const Matrix4 &other);
 
	Matrix4 &operator=(const Matrix4 &other);
	Matrix4 operator*(const Matrix4 &b) const;
	static Matrix4 identity();
 
	Matrix4 transpose() const;
	Matrix4 inverse_ortho() const;
	Matrix4 inverse() const;
	Matrix4 rotation() const;
 
	Vector3 transform(const Vector3 &vector) const;
	Vector4 transform(const Vector4 &vector) const;
  private:
	float m[4][4];
};

class Vector3 {
  public:
	Vector3();
	Vector3(const Vector3 &other);
	Vector3(float x, float y, float z);
	Vector3 &operator=(const Vector3 &other);
 
	static Vector3 normal(const Vector3 &v);
 
	Vector3 operator*(float value) const;
	Vector3 operator+(const Vector3 &other) const;
	Vector3 operator-(const Vector3 &other) const;
 
  private:
	float v[3];
};

Those changes would make that previous snippet look like this:

Vector3 a(5,5,5);
Matrix4 transform;
Vector3 b = transform.transform(a);

As this little snippet shows, the code is more readable; plus, now that vectors are represented by classes, we can also overload binary operators such as + and - to express vector addition and subtraction concisely (instead of having to write the operation separately for each component).
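A possible implementation of those operators, sketched on a hypothetical Vector3 with private storage and getX/getY/getZ accessors (illustrative names, not necessarily the project’s): each operator builds and returns a new vector and is declared const, so it can be applied to const objects and temporaries.

```cpp
#include <cassert>

// Hypothetical Vector3 with component-wise arithmetic operators.
class Vector3 {
public:
	Vector3() : _v{0.0f, 0.0f, 0.0f} {}
	Vector3(float x, float y, float z) : _v{x, y, z} {}

	float getX() const { return _v[0]; }
	float getY() const { return _v[1]; }
	float getZ() const { return _v[2]; }

	// Component-wise addition.
	Vector3 operator+(const Vector3 &other) const {
		return Vector3(_v[0] + other._v[0], _v[1] + other._v[1], _v[2] + other._v[2]);
	}

	// Component-wise subtraction.
	Vector3 operator-(const Vector3 &other) const {
		return Vector3(_v[0] - other._v[0], _v[1] - other._v[1], _v[2] - other._v[2]);
	}

	// Scaling by a scalar.
	Vector3 operator*(float value) const {
		return Vector3(_v[0] * value, _v[1] * value, _v[2] * value);
	}

private:
	float _v[3];
};
```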

That’s all for now! I will try to update the blog again this week to show more examples.

Beginning of Summer of Code

Google Summer of Code begins today, the 19th of May.

I thought it would be nice to share in this blog post my plan for the next weeks and how I am going to proceed with it.

According to my project proposal, I will be working on math code refactoring for the next 2 weeks: I chose to put this task first because refactoring will allow me to get used to the codebase and its coding conventions.
Moreover, refactoring the math code will make the later optimization work easier in two ways. First, since I am going to refactor a substantial amount of code, I need to make sure that the library runs at the same speed as before once the process is complete, so I will need to set up some sort of performance benchmark to verify that the refactoring didn’t hurt performance. Second, refactoring will make the whole codebase more readable, which will make it easier to spot performance bottlenecks and address them.
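One simple way to set up such a benchmark is a timing helper that runs the same workload many times and reports the average duration; comparing the numbers for the old and the refactored code then gives a before/after check. The sketch below uses std::chrono::steady_clock; the helper name averageMicroseconds is hypothetical, and a real test would render the same scene with both versions of the library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `iterations` times and returns the
// average wall-clock time per iteration, in microseconds.
template <typename Func>
double averageMicroseconds(Func work, std::size_t iterations) {
	assert(iterations > 0);
	auto start = std::chrono::steady_clock::now();
	for (std::size_t i = 0; i < iterations; ++i) {
		work();
	}
	auto end = std::chrono::steady_clock::now();
	// Convert the clock's native duration to floating-point microseconds.
	std::chrono::duration<double, std::micro> elapsed = end - start;
	return elapsed.count() / static_cast<double>(iterations);
}
```

Make sure the timed work has an observable effect (for example, writing to a volatile variable), otherwise the optimizer may remove it and the measurement becomes meaningless.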

Going more in depth about the task, I will need to rewrite most of the maths functions from C to C++, which will involve writing Vector3 and Vector4 classes and a Matrix class.
One of my goals for the design of these classes is to make sure that a SIMD implementation will be possible in the future without having to change all the code that uses those classes.
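A hedged sketch of one way to keep that option open, assuming a hypothetical Vector4 (the getX/setW accessors mirror the ones used in the refactored renderer snippet, but the rest is illustrative): calling code never touches the storage directly, the data is 16-byte aligned, and arithmetic lives in member functions, so a SIMD version would only need to change the class internals.

```cpp
#include <cassert>

// Hypothetical Vector4 whose storage is hidden behind accessors and
// operators, so its internals can change without touching callers.
class Vector4 {
public:
	Vector4() : _v{0.0f, 0.0f, 0.0f, 0.0f} {}
	Vector4(float x, float y, float z, float w) : _v{x, y, z, w} {}

	float getX() const { return _v[0]; }
	float getY() const { return _v[1]; }
	float getZ() const { return _v[2]; }
	float getW() const { return _v[3]; }
	void setW(float w) { _v[3] = w; }

	// Plain scalar loop for now; a SIMD version would replace only this body.
	Vector4 operator+(const Vector4 &other) const {
		Vector4 r;
		for (int i = 0; i < 4; ++i) {
			r._v[i] = _v[i] + other._v[i];
		}
		return r;
	}

private:
	// 16-byte alignment so four floats can be loaded as one SIMD register.
	alignas(16) float _v[4];
};

static_assert(alignof(Vector4) == 16, "Vector4 storage should be 16-byte aligned");
```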

I think that’s all for this blog post. I will keep you updated with code snippets in the future; thanks for reading!

Google Summer of Code 2014!

Hello everyone!

I am extremely happy to be blogging this today, as I have been accepted into the Google Summer of Code 2014 programme!

I will be working on ResidualVM, and my goal will be to optimize and refactor TinyGL (a software implementation of OpenGL).

I am still working on my university projects, but I will be done soon, and I can’t wait to share this code journey with other people. I intend to write about the challenges I face during the project, and I think this might be valuable for others interested in optimization and software design in general.

Thanks for reading this!

Buffer code refactoring and pixel blending

In the past week I’ve been working on refactoring the “z buffer” code.
The main purpose of this class is to store and manage all the rendering data produced inside TinyGL.

I started off by moving all the external C functions into the struct ZBuffer (which was subsequently renamed to FrameBuffer), and then I removed all direct access to the rendering data by encapsulating it in member functions. This also opened up the possibility of implementing different pixel blending logic directly inside the class, without having to modify every external access to add it.
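The sketch below illustrates that idea with a hypothetical FrameBuffer whose pixels are private, so every write goes through a single writePixel() function where blending can be applied. The BlendFactor values loosely mirror OpenGL’s GL_ONE, GL_ZERO, GL_SRC_ALPHA and GL_ONE_MINUS_SRC_ALPHA, and the result follows the default OpenGL blend equation, result = src * srcFactor + dst * dstFactor, clamped to [0, 1]; the names and layout are illustrative, not the actual TinyGL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blend factors, loosely mirroring the OpenGL ones.
enum class BlendFactor { One, Zero, SrcAlpha, OneMinusSrcAlpha };

struct Color {
	float r, g, b, a; // components in [0, 1]
};

// Hypothetical framebuffer: the pixel storage is private, so every write
// goes through writePixel(), the single place where blending is applied.
class FrameBuffer {
public:
	FrameBuffer(int width, int height)
		: _width(width), _height(height),
		  _pixels(static_cast<std::size_t>(width) * height, Color{0.0f, 0.0f, 0.0f, 1.0f}) {
		assert(width > 0 && height > 0);
	}

	void enableBlending(BlendFactor src, BlendFactor dst) {
		_blending = true;
		_srcFactor = src;
		_dstFactor = dst;
	}
	void disableBlending() { _blending = false; }

	void writePixel(int x, int y, const Color &src) {
		assert(x >= 0 && x < _width && y >= 0 && y < _height);
		Color &dst = _pixels[index(x, y)];
		if (!_blending) {
			dst = src;
			return;
		}
		// Default blend equation: src * srcFactor + dst * dstFactor.
		const float s = factor(_srcFactor, src);
		const float d = factor(_dstFactor, src);
		dst.r = clamp01(src.r * s + dst.r * d);
		dst.g = clamp01(src.g * s + dst.g * d);
		dst.b = clamp01(src.b * s + dst.b * d);
		dst.a = clamp01(src.a * s + dst.a * d);
	}

	Color readPixel(int x, int y) const { return _pixels[index(x, y)]; }

private:
	std::size_t index(int x, int y) const {
		return static_cast<std::size_t>(y) * _width + x;
	}

	// Only constant and source-alpha factors are modelled in this sketch.
	static float factor(BlendFactor f, const Color &src) {
		switch (f) {
		case BlendFactor::One: return 1.0f;
		case BlendFactor::Zero: return 0.0f;
		case BlendFactor::SrcAlpha: return src.a;
		case BlendFactor::OneMinusSrcAlpha: return 1.0f - src.a;
		}
		return 1.0f;
	}

	static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

	int _width;
	int _height;
	std::vector<Color> _pixels;
	bool _blending = false;
	BlendFactor _srcFactor = BlendFactor::One;
	BlendFactor _dstFactor = BlendFactor::Zero;
};
```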

While refactoring the z buffer code, I also had to heavily rewrite some functions that performed triangle rasterization, as they relied too much on macros and other C-style performance tricks. I replaced those functions with a single templated function that handles all the different cases at compile time, so the compiler generates a different version of the function for each combination of template parameters.
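A minimal sketch of that technique, assuming a hypothetical fillSpan() that fills one horizontal span with optional depth testing and color interpolation (the real TinyGL rasterizer works on whole triangles and more attributes): the two bool template parameters are compile-time constants, so the compiler can discard the disabled branches and generate a specialized function for each combination, which is what the macro-based variants did by hand.

```cpp
#include <cassert>

// Hypothetical span filler. Each instantiation, e.g. fillSpan<true, false>,
// is a separate function in which the disabled branches are dead code.
// Returns the number of pixels actually written.
template <bool kDepthTest, bool kSmoothShading>
int fillSpan(float *colorBuf, float *depthBuf, int x0, int x1,
             float z, float dz, float c, float dc) {
	assert(x0 <= x1);
	int written = 0;
	for (int x = x0; x < x1; ++x) {
		bool visible = true;
		if (kDepthTest) {
			visible = z < depthBuf[x]; // keep the fragment if it is closer
		}
		if (visible) {
			colorBuf[x] = c;
			if (kDepthTest) {
				depthBuf[x] = z;
			}
			++written;
		}
		z += dz;
		if (kSmoothShading) {
			c += dc; // interpolate color; flat shading keeps c constant
		}
	}
	return written;
}
```

In C++17, `if constexpr` would guarantee that the disabled branches are discarded at compile time; with a plain `if` on a template parameter, as here, optimizing compilers routinely remove them.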

My task for this week is to implement the function glBlendFunc, which will allow the renderer to support alpha, additive and subtractive blending. To implement it, I did some research on how blending should be performed and what the results on screen should look like, and I found this image that shows visually what should happen with every combination of parameters: