Fixed Function Pipeline Faster For Sprites?

15 comments, last by VoxycDev 5 years, 2 months ago

I'm getting some unexpected results with my new sprite renderer, which uses OpenGL ES 2.0. It performs much worse than my old sprite renderer from 5 years ago, which uses OpenGL ES 1.1 (no shaders). All I'm doing is displaying a 16x16 grid of quads and moving and zooming it around a little bit. You can see the difference in the video below:

Video to demonstrate the issue

Clearly, the fixed pipeline runs smoothly, but my supposedly fast one-draw-call shader program chugs (when I tried one draw call per quad, it was naturally even slower). This is not what I expected.

  • How can I speed up my new sprite renderer?
  • Is the fixed function pipeline naturally just more adapted to vertex data that changes more often? (like a new VBO on every frame)
  • I could just re-write the new renderer in OpenGL ES 1.1 again, but then I will lose compatibility with desktop OpenGL. This is a bad idea, right?
  • Can I emulate the fixed-function pipeline with shaders? Is there code out there that does this? What tricks did they use in it to get sprites to render so fast?

Old Fixed-Function Code:


            for (int z = 0; z <= mTileEdit.mCurLevel; z++) {
                for (int y = 0; y < tm.mSizeY; y++) {
                    for (int x = 0; x < tm.mSizeX; x++) {

                        int t = tm.get(x, y, z);

                        if (t > 0 && t < 256) {
                            
                            // Set alpha
                            float alpha = 1.0f;
                            if (Lozoware.getMP().get("name").equals("pixeledit") || Lozoware.getMP().get("name").equals("edit3d")) {
                                alpha = 1.0f - ((float)z / (float)tm.mSizeZ);
                            }

                            // Set color
                            gl.glColor4f(tm.mPalette.mRed[t],
                                    tm.mPalette.mGreen[t],
                                    tm.mPalette.mBlue[t], alpha);

                            // Vertex buffer
                            bb = ByteBuffer.allocateDirect((6 * 3) * 3 * 4);
                            bb.order(ByteOrder.nativeOrder());
                            FloatBuffer buf = bb.asFloatBuffer();

                            float bottomLeftX = x * mGLTileSizeX;
                            float bottomLeftY = y * mGLTileSizeY;
                            float topLeftX = x * mGLTileSizeX;
                            float topLeftY = y * mGLTileSizeY + mGLTileSizeY;
                            float bottomRightX = x * mGLTileSizeX + mGLTileSizeX;
                            float bottomRightY = y * mGLTileSizeY;
                            float topRightX = x * mGLTileSizeX + mGLTileSizeX;
                            float topRightY = y * mGLTileSizeY + mGLTileSizeY;

                            buf.position(0);

                            buf.put(topLeftX);
                            buf.put(topLeftY);
                            buf.put(0);

                            buf.put(bottomRightX);
                            buf.put(bottomRightY);
                            buf.put(0);

                            buf.put(bottomLeftX);
                            buf.put(bottomLeftY);
                            buf.put(0);

                            buf.put(topLeftX);
                            buf.put(topLeftY);
                            buf.put(0);

                            buf.put(topRightX);
                            buf.put(topRightY);
                            buf.put(0);

                            buf.put(bottomRightX);
                            buf.put(bottomRightY);
                            buf.put(0);

                            buf.position(0);

                            // Draw
                            gl.glEnableClientState(GL10.GL_VERTEX_ARRAY);

                            gl.glVertexPointer(3, GL10.GL_FLOAT, 0, buf);

                            gl.glDrawArrays(GL10.GL_TRIANGLES, 0, 6); // 6 vertices (2 triangles) were written above

                            gl.glDisableClientState(GL10.GL_VERTEX_ARRAY);
                        }
                    }
                }
            }


            gl.glFlush();

New OpenGL ES 2.0 Code:


	int numVerts = 0;
	int numQuads = 0;

	// Alloc enough data for all sprites
	for (const auto & pair: objects) {
	  Object * obj = pair.second;

	  if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {
	    numVerts += 6;
	    numQuads += 1;
	  }
	}

	int floatsPerVert = 26;

	float * data = new float[numVerts * floatsPerVert];

	int cursor = 0;

	// Quad/sprite index
	int q = 0;

	// Fill data for all sprites
	for (const auto & pair: objects) {
	  Object * obj = pair.second;

	  if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {

	    // Add sprite
	    texAtlas.add(obj->textureName);

	    if (texAtlas.getNeedsRefresh())
	      texAtlas.refresh();

	    // Set modelview matrix
	    glm::mat4 mvMatrix;
	    glm::mat4 scaleToNDC;
	    glm::mat4 cameraRotate;
	    glm::mat4 cameraTranslate;
	    glm::mat4 rotate;

	    #ifdef PLATFORM_OPENVR
	    scaleToNDC = glm::scale(glm::mat4(), glm::vec3(VRSCALE, VRSCALE, VRSCALE));
	    #else
	    scaleToNDC = glm::scale(glm::mat4(), glm::vec3(NDC_SCALE, NDC_SCALE, NDC_SCALE));
	    #endif

	    if (obj->alwaysFacePlayer)
	      rotate = glm::rotate(glm::mat4(), glm::radians(-camera->yaw), glm::vec3(0, 1, 0)) // Model yaw
	        * glm::rotate(glm::mat4(), glm::radians(camera->pitch), glm::vec3(1, 0, 0)); // Model pitch
	    else
	      rotate = glm::rotate(glm::mat4(), glm::radians(-obj->yaw), glm::vec3(0, 1, 0)) // Model yaw
	        * glm::rotate(glm::mat4(), glm::radians(-obj->pitch), glm::vec3(1, 0, 0)); // Model pitch

	    cameraRotate = glm::rotate(glm::mat4(), glm::radians(camera->roll), glm::vec3(0, 0, 1)) // Camera roll
	      * glm::rotate(glm::mat4(), -glm::radians(camera->pitch), glm::vec3(1, 0, 0)) // Camera pitch
	      * glm::rotate(glm::mat4(), glm::radians(camera->yaw), glm::vec3(0, 1, 0)); // Camera yaw

	    cameraTranslate = glm::translate(glm::mat4(), glm::vec3(-camera->position.x, -camera->position.y, -camera->position.z)); // Camera translate

	    #ifdef PLATFORM_OPENVR
	    mvMatrix =
	      glm::make_mat4((const GLfloat*) g_poseEyeMatrix.get())
	      * scaleToNDC
	      * cameraRotate
	      * cameraTranslate
	      * glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
	      * rotate
	      * glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
	    #else
	    mvMatrix =
	      scaleToNDC
	      * cameraRotate
	      * cameraTranslate
	      * glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
	      * rotate
	      * glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
	    #endif

	    //   ______
	    // |\\5   4|
	    // |0\\    |
	    // |  \\   |
	    // |   \\  |
	    // |    \\3|
	    // |1__2_\\|

	    // Corner position (x, y) and texture coordinate (u, v) for each vertex.
	    // Triangle 1 is vertices 0-1-2, triangle 2 is vertices 3-4-5.
	    static const float corners[6][4] = {
	      {-1.0f,  1.0f, 0.0f, 1.0f}, // Vertex 0
	      {-1.0f, -1.0f, 0.0f, 0.0f}, // Vertex 1
	      { 1.0f, -1.0f, 1.0f, 0.0f}, // Vertex 2
	      { 1.0f, -1.0f, 1.0f, 0.0f}, // Vertex 3
	      { 1.0f,  1.0f, 1.0f, 1.0f}, // Vertex 4
	      {-1.0f,  1.0f, 0.0f, 1.0f}  // Vertex 5
	    };

	    for (int v = 0; v < 6; v++) {
	      // Position
	      data[cursor + 0] = corners[v][0];
	      data[cursor + 1] = corners[v][1];
	      data[cursor + 2] = 0.0f;
	      data[cursor + 3] = 1.0f;

	      // Texture coordinates, remapped into the atlas
	      UV input;
	      input.u = corners[v][2];
	      input.v = corners[v][3];
	      UV output = texAtlas.getUV(obj->textureName, input);

	      data[cursor + 4] = output.u;
	      data[cursor + 5] = output.v;

	      // Modelview matrix, one column per vec4 attribute (repeated for every vertex)
	      for (int col = 0; col < 4; col++)
	        for (int row = 0; row < 4; row++)
	          data[cursor + 6 + col * 4 + row] = mvMatrix[col][row];

	      // Color
	      data[cursor + 22] = obj->color.r;
	      data[cursor + 23] = obj->color.g;
	      data[cursor + 24] = obj->color.b;
	      data[cursor + 25] = obj->color.a;

	      cursor += floatsPerVert;
	    }

	    q++;
	  }
	}

	#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
	// Generate VAO
	glGenVertexArrays(1, (GLuint*) &vao);
	checkGLError("glGenVertexArrays");
	glBindVertexArray(vao);
	checkGLError("glBindVertexArray");
	#endif

	// Generate VBO
	glGenBuffers(1, (GLuint * ) & vbo);
	checkGLError("glGenBuffers");
	glBindBuffer(GL_ARRAY_BUFFER, vbo);
	checkGLError("glBindBuffer");

	// Load data into VBO
	glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 6 * floatsPerVert * q, data, GL_STATIC_DRAW);
	checkGLError("glBufferData");

	// Delete data
	delete[] data;

	// Get aspect
	float width = PLAT_GetWindowWidth();
	float height = PLAT_GetWindowHeight();
	#ifdef PLATFORM_OPENVR
	float aspect = 1.0;
	#else
	float aspect = width / height;
	#endif

	// DRAW
	glEnable(GL_CULL_FACE);
	checkGLError("glEnable");
	glFrontFace(GL_CCW);
	checkGLError("glFrontFace");

	glCullFace(GL_BACK);
	checkGLError("glCullFace");

	glEnable(GL_BLEND);
	checkGLError("ShapeRenderer glEnable");
	#ifndef PLATFORM_ANDROID
	glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
	checkGLError("ShapeRenderer glBlendFunc");
	#endif

	// Add program to OpenGL environment
	int curProgram = -1;
	curProgram = programMain;

	glUseProgram(curProgram);
	checkGLError("SpriteRenderer glUseProgram");

	#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
	// Bind the VAO
	glBindVertexArray(vao);
	checkGLError("glBindVertexArray");
	#endif

	// Bind the VBO
	glBindBuffer(GL_ARRAY_BUFFER, vbo);
	checkGLError("glBindBuffer");

	// Set the projection matrix
	glm::mat4 projMatrix;

	#if defined PLATFORM_OPENVR
	projMatrix = glm::make_mat4((const GLfloat*) g_projectionMatrix.get());
	#else
	projMatrix = glm::perspective(VIEW_FOV, aspect, 0.001f, 1000.0f);
	#endif

	setMatrix(curProgram, "projectionMatrix", projMatrix);

	setUniform4f(curProgram, "globalColor", globalColor.x, globalColor.y, globalColor.z, globalColor.w);

	int t = texAtlas.getGlTexId();

	glActiveTexture(GL_TEXTURE0);
	checkGLError("glActiveTexture");

	glBindTexture(GL_TEXTURE_2D, t);

	setUniform2f(curProgram, "vTexSpan", 1.0, 1.0);
	setUniform1f(curProgram, "useTexture", 1.0);

	setUniform1f(curProgram, "fadeNear", 600.0 * NDC_SCALE);
	setUniform1f(curProgram, "fadeFar", 900.0 * NDC_SCALE);

	// Set attributes
	setVertexAttrib(curProgram, "vPosition", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 0);
	setVertexAttrib(curProgram, "vTexCoords", 2, GL_FLOAT, false, floatsPerVert * sizeof(float), 4);

	setVertexAttrib(curProgram, "mvMatrixPt1", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 6);
	setVertexAttrib(curProgram, "mvMatrixPt2", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 10);
	setVertexAttrib(curProgram, "mvMatrixPt3", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 14);
	setVertexAttrib(curProgram, "mvMatrixPt4", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 18);

	setVertexAttrib(curProgram, "vColor", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 22);

	// Draw
	glDrawArrays(GL_TRIANGLES, 0, q * 6);
	checkGLError("glDrawArrays");

	#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
	// Reset
	glBindVertexArray(0);
	glBindTexture(GL_TEXTURE_2D, 0);
	glUseProgram(0);
	#endif

	// Delete VAO and VBO
	glDeleteBuffers(1, (GLuint*) &vbo);
	#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
	glDeleteVertexArrays(1, (GLuint*) &vao);
	#endif

Shader Code:


	//
	// VERTEX SHADER ES 2.0
	//

	const char * vertexShaderCodeES20 =

	  "attribute vec4 vPosition;"\
	"varying lowp vec4 posOut; "\
	"attribute vec2 vTexCoords;"\
	"varying lowp vec2 vTexCoordsOut; "\
	"uniform vec2 vTexSpan;"\
	"attribute vec4 vNormal;"\
	"varying vec4 vNormalOut;"\
	"attribute vec4 vVertexLight; "\
	"varying vec4 vVertexLightOut; "\
	"uniform mat4 projectionMatrix; "\
	"varying lowp float distToCamera; "\

	"attribute vec4 mvMatrixPt1; "\
	"attribute vec4 mvMatrixPt2; "\
	"attribute vec4 mvMatrixPt3; "\
	"attribute vec4 mvMatrixPt4; "\

	"attribute vec4 vColor; "\
	"varying vec4 vColorOut;"\

	"attribute mat4 oldmvMatrix; "\

	"void main() {"\

	"  mat4 mvMatrix; "\

	"  mvMatrix[0] = mvMatrixPt1; "\
	"  mvMatrix[1] = mvMatrixPt2; "\
	"  mvMatrix[2] = mvMatrixPt3; "\
	"  mvMatrix[3] = mvMatrixPt4; "\

	"  gl_Position = projectionMatrix * mvMatrix * vPosition; "
	"  vTexCoordsOut = vTexCoords * vTexSpan; "\
	"  posOut = gl_Position; "\

	"  vec4 posBeforeProj = mvMatrix * vPosition;"\
	"  distToCamera = -posBeforeProj.z; "\

	"  vColorOut = vColor; "\
	"}\n";

	//
	// FRAGMENT SHADER ES 2.0
	//

	const char * fragmentShaderCodeES20 =

	  "uniform sampler2D uTexture; "\
	"uniform lowp vec4 vColor; "\
	"uniform lowp vec4 globalColor; "\
	"varying lowp vec2 vTexCoordsOut; "\
	"varying lowp vec4 posOut; "\
	"uniform lowp float useTexture; "\

	"uniform lowp float fadeNear; "\
	"uniform lowp float fadeFar; "\

	"varying lowp float distToCamera; "\
	"varying lowp vec4 vColorOut; "\

	"void main() {"\

	"   lowp vec4 f = texture2D(uTexture, vTexCoordsOut.st); "\
	"   if (f.a == 0.0) "\
	"       discard; "\

	"	lowp float visibility = 1.0; "\
	"   lowp float alpha = 1.0; "\

	"   if (distToCamera >= fadeFar) discard; "\

	"   if (distToCamera >= fadeNear) "\
	"		alpha = 1.0 - (distToCamera - fadeNear) * 3.0; "\

	"   if (useTexture == 1.0)"\
	"   {"\
	"      gl_FragColor = texture2D(uTexture, vTexCoordsOut.st) * vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
	"   }"\
	"   else"\
	"   {"\
	"      gl_FragColor = vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
	"   }"\
	"}\n";

The rest of the new code is here:

TextureAtlas.cpp

Renderer.cpp

https://github.com/dimitrilozovoy/Voxyc/


Did you profile? CPU or GPU bottleneck?
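
One way to start answering that question is to time the CPU-side vertex fill on its own, separately from the GL calls. Below is a minimal sketch, assuming `std::chrono::steady_clock` is precise enough for per-frame timing. `ScopedTimer` and `fillVertices` are hypothetical names, with `fillVertices` standing in for the per-frame loop from the post. Keep in mind that GL calls are queued, so a CPU timer around a draw call mostly measures submission cost unless `glFinish()` is called before the timer stops.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: adds the elapsed wall-clock time (in milliseconds)
// to a caller-owned total when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalMs)
        : totalMs_(totalMs), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        totalMs_ += std::chrono::duration<double, std::milli>(end - start_).count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& totalMs_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for the per-frame vertex fill (6 vertices x 26 floats per sprite).
// Returns a checksum so the loop is not optimized away.
float fillVertices(int numSprites) {
    assert(numSprites >= 0);
    float checksum = 0.0f;
    for (int i = 0; i < numSprites * 6 * 26; ++i)
        checksum += static_cast<float>(i % 7);
    return checksum;
}

// Usage inside the frame function:
//   double fillMs = 0.0;
//   { ScopedTimer t(fillMs); fillVertices(256); }
//   // log fillMs and compare it with the total frame time
```

If the fill accounts for only a small part of the frame time, the bottleneck is more likely in buffer creation, upload, or the GPU itself.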

 

The new code seems to do some extra things as well.

  • `mvMatrix` is being made for each sprite and then stored in every vertex? That is a lot of data. Normally for rendering tiles, I'll just do the X/Y addition in the CPU code (directly, no matrix); then, if the game has, say, camera rotation/zooming and I need a matrix, I'll just have a single one for all world sprites in the entire frame (as a uniform).

    I also found doing the translation directly helps avoid FP rounding errors that can cause visible seams between adjacent tiles/sprites.
     
  • Not sure what the cost of `glGenBuffers`, `glGenVertexArrays`, etc. is. The code I have here appears to re-use the same one, replacing the contents with `glBufferData`. I also believe STATIC is slower to "upload" than DYNAMIC or STREAM.
     
  • What is `texAtlas.add(obj->textureName);`? You're not rebuilding a texture dynamically, are you? Even if not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string key; if it's doing string map lookups for every sprite, that is not ideal.
     
  • Also not sure on the cost of things like `setVertexAttrib`. You should be able to do this once, and it is saved with the `GL_ARRAY_BUFFER` (possibly all in one go, e.g. `glVertexAttribPointer`)
     
  • Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.
     
  • The "useTexture" path calls `texture2D` twice, I am not sure this will be optimised out.
     
  • The `mvMatrix * vPosition` multiplication is done twice, again not sure it will optimise that.
     
  • What is vTexSpan for? Seems like extra work. Likewise for the unused normals.
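
The first point in the list above (do the X/Y addition on the CPU and keep one matrix for the whole frame as a uniform) could be sketched as follows. This is only an illustration under assumptions: `buildGrid` and `kFloatsPerVert` are hypothetical names, the UVs are plain 0..1 per quad with no atlas remap, and the camera matrix would be uploaded separately as a single uniform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Floats per vertex in this sketch: x, y, u, v (versus 26 in the posted layout).
constexpr int kFloatsPerVert = 4;

// Fills `out` with two triangles per tile for a cols x rows grid.
// Tile offsets are added directly on the CPU; no per-sprite matrix is stored.
void buildGrid(int cols, int rows, float tileW, float tileH, std::vector<float>& out) {
    assert(cols >= 0 && rows >= 0);

    // Unit-quad corners; each pair is used both as (x, y) offset and as (u, v).
    // Same vertex order as the posted code: 0-1-2, then 3-4-5.
    static const float corners[6][2] = {
        {0.0f, 1.0f}, {0.0f, 0.0f}, {1.0f, 0.0f},
        {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}
    };

    out.clear(); // keeps the capacity from earlier frames, so no reallocation
    out.reserve(static_cast<std::size_t>(cols) * rows * 6 * kFloatsPerVert);

    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            for (const auto& c : corners) {
                out.push_back((x + c[0]) * tileW); // position x
                out.push_back((y + c[1]) * tileH); // position y
                out.push_back(c[0]);               // u
                out.push_back(c[1]);               // v
            }
        }
    }
}
```

For a 16x16 grid this produces 6,144 floats per frame, compared with 39,936 floats when every vertex also carries a 4x4 matrix and a color (26 floats per vertex).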

 

I haven't fully read through your code or your github but:

That code you posted for OpenGL ES2, is it meant to be pseudocode? There are no functions. It is not clear what you are doing as a once off process and what you are doing per frame. The general idea in graphics programming (and game programming in general) is usually to move as much code into a once off process (on starting game or level etc) and do as little as possible per frame.

As such, to move a viewpoint, you typically don't change the vertex data, you might change, e.g. a matrix representing the view / camera transform and pass it as a uniform. This is very cheap to do for the GPU.

If you do need to dynamically change vertex data each frame, you should explicitly tell the API that it is dynamic (rather than static and unchanging) on creation. You have to be very careful using dynamic vertex buffers so as not to drastically affect performance by stalling the pipeline. In some cases this means creating e.g. 3 copies of a dynamic VB and using them in turn on each frame. In some cases the API helps do this for you; it is a good idea to try both and compare if you are not sure.

You also absolutely do not want to be making any dynamic allocations / deallocations either on the GPU or CPU each frame.

You recreate the array each frame. Consider making a big array of sprites and not using all of them, or, if you have a constant number of sprites, use glBufferSubData. You also don't need glGenBuffers every frame; it's only needed when you change the size of the vertex buffer. Anyway, you gave no idea what ur doing
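
The "big array, use only part of it" idea might look roughly like this on the CPU side. `SpritePool` is a hypothetical class, not something from the posted engine. The GL calls are shown only as comments because they need a live context, and `GL_DYNAMIC_DRAW` is one possible usage hint to try, not a guaranteed win.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity pool: storage is allocated once, and a prefix
// of it is refilled every frame without any new/delete.
class SpritePool {
public:
    SpritePool(std::size_t maxSprites, std::size_t floatsPerSprite)
        : floatsPerSprite_(floatsPerSprite),
          data_(maxSprites * floatsPerSprite),
          used_(0) {
        assert(floatsPerSprite > 0);
    }

    // Rewind to the start of the pool; no allocation happens here.
    void beginFrame() { used_ = 0; }

    // Returns storage for the next sprite, or nullptr when the pool is full.
    float* next() {
        if ((used_ + 1) * floatsPerSprite_ > data_.size()) return nullptr;
        float* p = data_.data() + used_ * floatsPerSprite_;
        ++used_;
        return p;
    }

    std::size_t spriteCount() const { return used_; }
    std::size_t usedBytes() const { return used_ * floatsPerSprite_ * sizeof(float); }
    std::size_t capacityBytes() const { return data_.size() * sizeof(float); }
    const float* data() const { return data_.data(); }

private:
    std::size_t floatsPerSprite_;
    std::vector<float> data_;
    std::size_t used_;
};

// Once, at startup:
//   glGenBuffers(1, &vbo);
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, pool.capacityBytes(), nullptr, GL_DYNAMIC_DRAW);
// Every frame, after filling the pool:
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferSubData(GL_ARRAY_BUFFER, 0, pool.usedBytes(), pool.data());
//   glDrawArrays(GL_TRIANGLES, 0, pool.spriteCount() * 6);
```

With this layout the buffer object is only regenerated when the maximum sprite count changes.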

13 hours ago, SyncViews said:

Did you profile? CPU or GPU bottleneck?

No. I will. Thank you.

13 hours ago, SyncViews said:

The new code seems to do some extra things as well.

  • `mvMatrix` is being made for each sprite and then stored in every vertex? That is a lot of data. Normally for rendering tiles, I'll just do the X/Y addition in the CPU code (directly, no matrix); then, if the game has, say, camera rotation/zooming and I need a matrix, I'll just have a single one for all world sprites in the entire frame (as a uniform).

Yes, since I was looking for a way to draw all sprites with one call, I decided to make mvMatrix an attribute. Should I try sending it as an array of uniforms? Maybe I can send an index of the sprite as an attribute and then get mvMatrix out of a uniform array based on that? Then, I think I will be constrained by maximum size of a uniform array. These guys here talk about values around 512 maximum floats (32 matrices max). Granted, this is a conversation from 2008 so the limits must have risen since then. I'm looking to draw around 256-1024 sprites (32x32 grid would be nice), and it should be as butter-smooth as OpenGL 1.1 was. It would suffice if I could have 256 matrices in a uniform array if that speeds things up. Can I?

Thing is, even though right now it's just a grid, the sprites are supposed to be stretchable/bendable, like they were in my old fixed pipeline code, so yes, each corner of each poly does have to have a completely unique position on every frame. What I'm building is an editor for a flexible mesh of voxels, where you can stretch each corner and morph it into interesting architecture or landscapes. This worked perfectly in my old engine but it was Java and fixed-function.

13 hours ago, SyncViews said:
  • Not sure what the cost of `glGenBuffers`, `glGenVertexArrays`, etc. is. The code I have here appears to re-use the same one, replacing the contents with `glBufferData`. I also believe STATIC is slower to "upload" than DYNAMIC or STREAM.

Got it. Will try to re-use the same VBO and will try DYNAMIC and STREAM. Other people have mentioned this as well below. Thank you.

13 hours ago, SyncViews said:
  • What is `texAtlas.add(obj->textureName);`? You're not rebuilding a texture dynamically, are you? Even if not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string key; if it's doing string map lookups for every sprite, that is not ideal.

It makes sure the texture is in the texture atlas. It's rebuilt as-needed (only when a brand new texture is added). You're right, I probably should get rid of string map lookup here. But in this particular case there is only one texture so array size is 1, so it's not the bottleneck.

13 hours ago, SyncViews said:
  • Also not sure on the cost of things like `setVertexAttrib`. You should be able to do this once, and it is saved with the `GL_ARRAY_BUFFER` (possibly all in one go, e.g. `glVertexAttribPointer`)

setVertexAttrib just calls all the gl functions needed to set up an attribute. Good point, though. I should try to do this once if I can. This is not the only program/renderer that runs in the engine though, so I assumed I have to re-set-up all the attributes on every frame for every program. Is that not the case?

13 hours ago, SyncViews said:
  • Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.
     

I'm not super worried about the gaps between the sprites. This is only for an editor, not for rendering in the game. As long as it's smooth and I can quickly build vast landscapes and cities out of voxels, that's all I care about.

13 hours ago, SyncViews said:

 

  • The "useTexture" path calls `texture2D` twice, I am not sure this will be optimised out.
     
  • The `mvMatrix * vPosition` multiplication is done twice, again not sure it will optimise that.

 

Thank you, will try to see if I can only calculate these once.
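
For reference, here is one way the two shaders could be arranged so that `mvMatrix * vPosition` and `texture2D` are each evaluated once. It is a sketch that has not been compiled on a device: the attribute, uniform, and varying names follow the posted code, the unused declarations are dropped, the always-1.0 `visibility` factor is folded away, and `occursOnce` is a hypothetical helper used only to sanity-check the strings.

```cpp
#include <cassert>
#include <string>

// Vertex shader: the modelview transform is computed once and reused.
const std::string kVertexShaderOnce =
    "attribute vec4 vPosition;\n"
    "attribute vec2 vTexCoords;\n"
    "attribute vec4 mvMatrixPt1;\n"
    "attribute vec4 mvMatrixPt2;\n"
    "attribute vec4 mvMatrixPt3;\n"
    "attribute vec4 mvMatrixPt4;\n"
    "attribute vec4 vColor;\n"
    "uniform mat4 projectionMatrix;\n"
    "uniform vec2 vTexSpan;\n"
    "varying lowp vec2 vTexCoordsOut;\n"
    "varying lowp float distToCamera;\n"
    "varying lowp vec4 vColorOut;\n"
    "void main() {\n"
    "  mat4 mvMatrix = mat4(mvMatrixPt1, mvMatrixPt2, mvMatrixPt3, mvMatrixPt4);\n"
    "  vec4 posBeforeProj = mvMatrix * vPosition;\n"
    "  gl_Position = projectionMatrix * posBeforeProj;\n"
    "  vTexCoordsOut = vTexCoords * vTexSpan;\n"
    "  distToCamera = -posBeforeProj.z;\n"
    "  vColorOut = vColor;\n"
    "}\n";

// Fragment shader: the texture is sampled once and the result is reused.
const std::string kFragmentShaderOnce =
    "uniform sampler2D uTexture;\n"
    "uniform lowp vec4 globalColor;\n"
    "uniform lowp float useTexture;\n"
    "uniform lowp float fadeNear;\n"
    "uniform lowp float fadeFar;\n"
    "varying lowp vec2 vTexCoordsOut;\n"
    "varying lowp float distToCamera;\n"
    "varying lowp vec4 vColorOut;\n"
    "void main() {\n"
    "  lowp vec4 f = texture2D(uTexture, vTexCoordsOut);\n"
    "  if (f.a == 0.0) discard;\n"
    "  if (distToCamera >= fadeFar) discard;\n"
    "  lowp float alpha = 1.0;\n"
    "  if (distToCamera >= fadeNear)\n"
    "    alpha = 1.0 - (distToCamera - fadeNear) * 3.0;\n"
    "  lowp vec4 base = (useTexture == 1.0) ? f : vec4(1.0);\n"
    "  gl_FragColor = base * vColorOut * vec4(1.0, 1.0, 1.0, alpha) * globalColor;\n"
    "}\n";

// Hypothetical helper: true when `needle` appears exactly once in `src`.
bool occursOnce(const std::string& src, const std::string& needle) {
    assert(!needle.empty());
    const std::size_t first = src.find(needle);
    return first != std::string::npos &&
           src.find(needle, first + 1) == std::string::npos;
}
```

Whether the driver's compiler would have removed the duplicate work anyway is hard to know in advance, so measuring before and after is the safer route.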

13 hours ago, SyncViews said:
  • What is vTexSpan for? Seems like extra work. Likewise for the unused normals.

 

It's texture span. It's basically how many voxels a texture spans before it repeats. I still have it in the shader for some legacy reason I think. I can't remember what I was going to do with this value, but I think it was important for something once before.

7 hours ago, lawnjelly said:

That code you posted for OpenGL ES2, is it meant to be pseudocode? There are no functions. It is not clear what you are doing as a once off process and what you are doing per frame. The general idea in graphics programming (and game programming in general) is usually to move as much code into a once off process (on starting game or level etc) and do as little as possible per frame.

It's C++ that runs on every frame. Will clarify these things in the future.

7 hours ago, lawnjelly said:

As such, to move a viewpoint, you typically don't change the vertex data, you might change, e.g. a matrix representing the view / camera transform and pass it as a uniform. This is very cheap to do for the GPU.

If you do need to dynamically change vertex data each frame, you should explicitly tell the API that it is dynamic (rather than static and unchanging) on creation. You have to be very careful using dynamic vertex buffers so as not to drastically affect performance by stalling the pipeline. In some cases this means creating e.g. 3 copies of a dynamic VB and using them in turn on each frame. In some cases the API helps do this for you; it is a good idea to try both and compare if you are not sure.

You also absolutely do not want to be making any dynamic allocations / deallocations either on the GPU or CPU each frame.

Yes, in this case the goal is to change vertex data on every frame. Will definitely try using one VBO without recreating it, set it as dynamic, then do updates to it. Thank you.
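
The "3 copies used in turn" variant mentioned in the quote could be tracked with a small round-robin helper like the one below. `BufferRing` is a hypothetical name. The handles would come from a single `glGenBuffers` call at startup, and whether three copies actually help on a given driver is something to measure, not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical ring of buffer handles used round-robin, so the buffer written
// this frame is not the one the GPU may still be reading from.
class BufferRing {
public:
    explicit BufferRing(std::vector<unsigned> handles)
        : handles_(std::move(handles)), current_(handles_.size() - 1) {
        assert(!handles_.empty());
    }

    // Advances to the next buffer and returns its handle; call once per frame.
    unsigned acquire() {
        current_ = (current_ + 1) % handles_.size();
        return handles_[current_];
    }

private:
    std::vector<unsigned> handles_;
    std::size_t current_; // starts at the last slot so the first acquire() returns handles_[0]
};

// Per frame (needs a GL context, so shown as comments):
//   GLuint vbo = ring.acquire();
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferSubData(GL_ARRAY_BUFFER, 0, usedBytes, vertexData);
//   ... set attribute pointers and draw ...
```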

7 hours ago, _WeirdCat_ said:

You recreate the array each frame. Consider making a big array of sprites and not using all of them, or, if you have a constant number of sprites, use glBufferSubData. You also don't need glGenBuffers every frame; it's only needed when you change the size of the vertex buffer. Anyway, you gave no idea what ur doing

Thank you for the brilliant suggestion. Will do exactly that. Thanks for pointing out that I have no idea what I'm doing. I guess this is why I'm here, so I can learn from you and one day, maybe, have an idea of what I'm doing.

@SyncViews, just an idea. What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.
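
Splitting the sprites into several draw calls, each limited by how many matrices fit in the uniform array, comes down to simple chunking. A minimal sketch follows, with `Batch` and `makeBatches` as hypothetical names. It assumes each vertex would also carry a sprite index attribute so the shader can pick its matrix out of the array.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One draw call's worth of sprites: [first, first + count).
struct Batch {
    int first;
    int count;
};

// Splits numSprites into consecutive batches of at most maxPerBatch sprites.
std::vector<Batch> makeBatches(int numSprites, int maxPerBatch) {
    assert(numSprites >= 0 && maxPerBatch > 0);
    std::vector<Batch> batches;
    for (int first = 0; first < numSprites; first += maxPerBatch)
        batches.push_back({first, std::min(maxPerBatch, numSprites - first)});
    return batches;
}

// Per batch (needs a GL context, so shown as comments):
//   glUniformMatrix4fv(matricesLocation, batch.count, GL_FALSE, &matrices[batch.first][0][0]);
//   glDrawArrays(GL_TRIANGLES, batch.first * 6, batch.count * 6);
```

For example, 256 sprites with 64 matrices per call gives 4 draw calls, which matches the estimate in the post.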

12 minutes ago, VoxycDev said:

What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.

What are those sprites actually defined by? Do they rotate in world space, scale, and translate? In that case you still only need a 3x4 matrix, not a 4x4 matrix, which saves you an entire 4f vector per sprite in the uniform array. Generally, the low-end limit for a uniform array is 256 4f vectors. If your sprites do not rotate (which they should not; if they do, call them general quads rather than sprites), you can use a single 4f vector: three components for position and the fourth as a scale factor along all 3 axes.

1 minute ago, JohnnyCode said:

What are those sprites actually defined by? Do they rotate in world space, scale, and translate? In that case you still only need a 3x4 matrix, not a 4x4 matrix, which saves you an entire 4f vector per sprite in the uniform array. Generally, the low-end limit for a uniform array is 256 4f vectors. If your sprites do not rotate (which they should not; if they do, call them general quads rather than sprites), you can use a single 4f vector: three components for position and the fourth as a scale factor along all 3 axes.

Well, ideally I want a multi-purpose blazing-fast particle system. But you're absolutely right! I should try to pass only as much data as is absolutely required for the task.
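
To put numbers on that trade-off, the arithmetic can be written out as below. `spritesPerBatch` is a hypothetical helper, and the 256-vector budget is simply the figure quoted above. On OpenGL ES 2.0 the real limit can be queried with `glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...)` and may be lower on some devices.

```cpp
#include <cassert>

// Number of sprites that fit in one draw call, given the total vec4 uniform
// budget, the vec4s reserved for other uniforms (e.g. 4 for a projection
// matrix), and the vec4s each sprite needs.
int spritesPerBatch(int maxUniformVectors, int reservedVectors, int vectorsPerSprite) {
    assert(vectorsPerSprite > 0);
    assert(reservedVectors >= 0 && maxUniformVectors >= reservedVectors);
    return (maxUniformVectors - reservedVectors) / vectorsPerSprite;
}
```

With a 256-vector budget and nothing reserved, a 4x4 matrix per sprite (4 vectors) allows 64 sprites per call, a 3x4 matrix (3 vectors) allows 85, and a single position-plus-scale vector allows 256.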

48 minutes ago, VoxycDev said:
8 hours ago, _WeirdCat_ said:

You recreate the array each frame. Consider making a big array of sprites and not using all of them, or, if you have a constant number of sprites, use glBufferSubData. You also don't need glGenBuffers every frame; it's only needed when you change the size of the vertex buffer. Anyway, you gave no idea what ur doing

Thank you for the brilliant suggestion. Will do exactly that. Thanks for pointing out that I have no idea what I'm doing. I guess this is why I'm here, so I can learn from you and one day, maybe, have an idea of what I'm doing.

I think you misread weirdcat, I think he was implying that how you do it will depend on what exactly you want to achieve, 'gave no idea' rather than 'have no idea'! :) 

I don't think you are that far from something decent.. it is more a case of jigging things around to make it more efficient for the hardware, reducing the amount of unnecessary expensive calls and repeat work. To go further on my suggestion about separating your once off work from your per frame work, you could have something like this:


void Game_Start() // one off stuff on game creation, create shaders, textures maybe, vertex buffers etc?
void Game_End() // free resources etc used by the whole game
  
void Level_Start() // one off stuff dependent on a game level .. might have some resources
void Level_End() // free level resources
  
void Frame_Update() // stuff you want to do on your frame, updating if necessary and drawing using the resources you have already created

That kind of scheme is fine to get started. Be aware, though, that on some platforms you can 'lose the device' for the 3D and sometimes more (say, if the user starts playing another game in between, or alt-tabs; from memory this may be the case on Android). In that case you need to recreate your GPU resources, so it makes sense to reuse the same bit of code you would use on game / level start for resource creation. It can often be a good idea to use pools as weirdcat suggested: allocate more than you need at the start, then use whatever you need on each frame (this is true for main memory as well as GPU resources).

The other thing is that you appear to be recreating and compiling the shader on every frame, which will probably kill performance. Again, move this to one-off code and reuse the shader. After all this is done you can reassess whether there are any bottlenecks.
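
The split between one-off and per-frame work, including recovery from a lost context, could be structured roughly like this. `SpriteRenderer` is a hypothetical skeleton, and the GL work is replaced by counters so that the sketch runs without a context. In real code `createResources()` would compile and link the shaders and create the buffers, while `drawFrame()` would only bind, upload, and draw.

```cpp
#include <cassert>

// Hypothetical renderer skeleton separating one-off setup from per-frame work.
class SpriteRenderer {
public:
    // One-off work (Game_Start): compile/link shaders, create VBOs and textures.
    // Also called again after the GL context has been lost.
    void createResources() {
        ++resourceBuilds_;
        ready_ = true;
    }

    // Called by the platform layer when the GL context is lost.
    void onContextLost() { ready_ = false; }

    // Per-frame work (Frame_Update): reuse what createResources() made.
    void drawFrame() {
        if (!ready_) createResources(); // only after startup or a lost context
        assert(ready_);
        ++framesDrawn_;
    }

    int resourceBuilds() const { return resourceBuilds_; }
    int framesDrawn() const { return framesDrawn_; }

private:
    bool ready_ = false;
    int resourceBuilds_ = 0;
    int framesDrawn_ = 0;
};
```

Drawing three frames in a row builds the resources once; a lost context followed by another frame builds them a second time.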

