C++ BC5 decompressor

Graphics and GPU Programming

Started by Josh Klint November 20, 2021 10:34 AM

7 comments, last by Josh Klint 2 years, 7 months ago

Josh Klint

1,469

Author

November 20, 2021 10:34 AM

I am trying to implement a BC5 decompressor in C++ using this C# code as a guide:

https://github.com/yretenai/bcffnet/blob/develop/BCFF/BlockDecompressor.cs

Any ideas what is wrong? Here is my decompress block code:

float GetRed(int Red0, int Red1, unsigned char Index)
{
	if (Index == 0)
	{
		return float(Red0) / 255.0f;
	}
	if (Index == 1)
	{
		return float(Red1) / 255.0f;
	}
	float Red0f = float(Red0) / 255.0f;
	float Red1f = float(Red1) / 255.0f;
	if (Red0 > Red1)
	{
		Index -= 1;
		return (Red0f * float(7 - Index) + Red1f * float(Index)) / 7.0f;
	}
	else
	{
		if (Index == 6)
		{
			return 0.0f;
		}
		if (Index == 7)
		{
			return 1.0f;
		}
		Index -= 1;
		return (Red0f * float(5 - Index) + Red1f * float(Index)) / 5.0f;
	}
}

unsigned char GetIndex(uint64_t strip, int offset)
{
	return (unsigned char)((strip >> (3 * offset + 16)) & 0x7);
}

void BC5ReadBlock(uint64_t strip, std::array<float, 16>& block)
{
	unsigned char Red0, Red1;
	memcpy(&Red0, &strip, 1);
	memcpy(&Red1, &strip + 1, 1);
	for (int i = 0; i < 16; ++i)
	{
		int index = GetIndex(strip, i);
		block[i] = GetRed(Red0, Red1, index);
	}
}

Texture loading code:

float r, g;
std::array<float, 16> rblock, gblock;
uint64_t strip;
for (x = 0; x < blocks.x; ++x)
{
	for (y = 0; y < blocks.y; ++y)
	{
		stream->Read(&strip, 8);
		BC5::BC5ReadBlock(strip, rblock);
		stream->Read(&strip, 8);
		BC5::BC5ReadBlock(strip, gblock);
		int index = 0;
		for (int bx = 0; bx < 4; bx++)
		{
			for (int by = 0; by < 4; by++)
			{
				pixmap->WritePixel(x * 4 + bx, y * 4 + by, RGBA(Clamp(Floor(rblock[index] * 255.0f),0,255), Clamp(Floor(gblock[index] * 255.0f),0,255), 0, 255));
				index++;
			}
		}
	}
}

10x Faster Performance for VR: www.ultraengine.com

Josh Klint

1,469

Author

November 20, 2021 10:45 AM

I can see the above image is rotated and flipped, but the pixel colors are still not correct.

Programmer71

175

November 20, 2021 07:04 PM

Have you tried swapping the RGB channels? Also, OpenGL flips the texture in the Y direction.

L. Spiro

25,818

November 21, 2021 08:32 AM

Why use that code as a template rather than just the code provided by Microsoft®?
https://docs.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-programming-guide-resources-block-compression#bc5

I’ve already used it to create working C++ code that handles all permutations:

/**
 * BC5U -> RGBA32F conversion.
 *
 * \param _pui8Src Source texels.
 * \param _pui8Dst Destination texels known to be in RGBA32F format.
 * \param _ui32Width Width of the image.
 * \param _ui32Height Height of the image.
 * \param _ui32Depth Depth of the image.
 * \param _pvParms Optional parameters for the conversion.
 */
template <unsigned _bSrgb /*Unused for now.*/, unsigned _bLumAlpha>
bool CDds::Bc5uToRgba32F( const uint8_t * _pui8Src, uint8_t * _pui8Dst, uint32_t _ui32Width, uint32_t _ui32Height, uint32_t _ui32Depth, void * /*_pvParms*/ ) {
	struct LSI_BC5_BLOCK {
		uint64_t ui64Red;
		uint64_t ui64Green;
	};
	const LSI_BC5_BLOCK * pbbBlocks = reinterpret_cast<const LSI_BC5_BLOCK *>(_pui8Src);
	uint32_t ui32BlocksW = (_ui32Width + 3) / 4;
	uint32_t ui32BlocksH = (_ui32Height + 3) / 4;
		
	struct LSI_RGBA32 {
		float fTexels[4];
	};
	LSI_RGBA32 * prgbaTexels = reinterpret_cast<LSI_RGBA32 *>(_pui8Dst);
	float fPaletteR[8];
	uint8_t ui8IndicesR[16];
	float fPaletteG[8];
	uint8_t ui8IndicesG[16];
	// Size per slice.
	uint32_t ui32SrcSliceSize = ui32BlocksW * ui32BlocksH;
	uint32_t ui32SliceSize = _ui32Width * _ui32Height;
	for ( uint32_t Z = 0; Z < _ui32Depth; ++Z ) {
		for ( uint32_t Y = 0; Y < ui32BlocksH; ++Y ) {
			for ( uint32_t X = 0; X < ui32BlocksW; ++X ) {
				uint64_t ui64ThisBlock = pbbBlocks[Z*ui32SrcSliceSize+Y*ui32BlocksW+X].ui64Red;
				DecodeBC4U( ui64ThisBlock, fPaletteR );
				Bc4Indices( ui64ThisBlock, ui8IndicesR );

				ui64ThisBlock = pbbBlocks[Z*ui32SrcSliceSize+Y*ui32BlocksW+X].ui64Green;
				DecodeBC4U( ui64ThisBlock, fPaletteG );
				Bc4Indices( ui64ThisBlock, ui8IndicesG );
				for ( uint32_t I = 0; I < 16; ++I ) {
					uint32_t ui32ThisX = X * 4 + I % 4;
					uint32_t ui32ThisY = Y * 4 + I / 4;
					if ( ui32ThisX < _ui32Width && ui32ThisY < _ui32Height ) {
						LSI_RGBA32 * prgbaRow0 = &prgbaTexels[Z*ui32SliceSize+ui32ThisY*_ui32Width+ui32ThisX];
						if ( _bLumAlpha ) {
							(*prgbaRow0).fTexels[0] = fPaletteR[ui8IndicesR[I]];
							(*prgbaRow0).fTexels[1] = fPaletteR[ui8IndicesR[I]];
							(*prgbaRow0).fTexels[2] = fPaletteR[ui8IndicesR[I]];
							(*prgbaRow0).fTexels[3] = fPaletteG[ui8IndicesG[I]];
						}
						else {
							(*prgbaRow0).fTexels[0] = fPaletteR[ui8IndicesR[I]];
							(*prgbaRow0).fTexels[1] = fPaletteG[ui8IndicesG[I]];
							(*prgbaRow0).fTexels[2] = 0.0f;
							(*prgbaRow0).fTexels[3] = 1.0f;
						}
					}
				}
			}
		}
	}
	return true;
}

/**
 * Decodes a single block of BC4U.
 *
 * \param _ui64Block The block to decode.
 * \param _pfPalette The created palette (contains 8 entries).
 */
void CDds::DecodeBC4U( uint64_t _ui64Block, float * _pfPalette ) {
	_pfPalette[0] = ((_ui64Block >> 0) & 0xFF) / 255.0f;
	_pfPalette[1] = ((_ui64Block >> 8) & 0xFF) / 255.0f;
	if ( _pfPalette[0] > _pfPalette[1] ) {
		// 6 interpolated color values.
		_pfPalette[2] = (6.0f * _pfPalette[0] + 1.0f * _pfPalette[1]) / 7.0f;	// Bit code 010.
		_pfPalette[3] = (5.0f * _pfPalette[0] + 2.0f * _pfPalette[1]) / 7.0f;	// Bit code 011.
		_pfPalette[4] = (4.0f * _pfPalette[0] + 3.0f * _pfPalette[1]) / 7.0f;	// Bit code 100.
		_pfPalette[5] = (3.0f * _pfPalette[0] + 4.0f * _pfPalette[1]) / 7.0f;	// Bit code 101.
		_pfPalette[6] = (2.0f * _pfPalette[0] + 5.0f * _pfPalette[1]) / 7.0f;	// Bit code 110.
		_pfPalette[7] = (1.0f * _pfPalette[0] + 6.0f * _pfPalette[1]) / 7.0f;	// Bit code 111.
	}
	else {
		// 4 interpolated color values.
		_pfPalette[2] = (4.0f * _pfPalette[0] + 1.0f * _pfPalette[1]) / 5.0f;	// Bit code 010.
		_pfPalette[3] = (3.0f * _pfPalette[0] + 2.0f * _pfPalette[1]) / 5.0f;	// Bit code 011.
		_pfPalette[4] = (2.0f * _pfPalette[0] + 3.0f * _pfPalette[1]) / 5.0f;	// Bit code 100.
		_pfPalette[5] = (1.0f * _pfPalette[0] + 4.0f * _pfPalette[1]) / 5.0f;	// Bit code 101.
		_pfPalette[6] = 0.0f;													// Bit code 110.
		_pfPalette[7] = 1.0f;													// Bit code 111.
	}
}

/**
 * Gets the indices from a BC4 block. 
 *
 * \param _ui64Block The block to decode.
 * \param _pui8Indices The 16 indices as extracted from the block.
 */
void LSE_CALL CDds::Bc4Indices( uint64_t _ui64Block, uint8_t * _pui8Indices ) {
	_ui64Block >>= 16;
	for ( uint32_t I = 0; I < 16; ++I ) {
		(*_pui8Indices++) = _ui64Block & 0x7;
		_ui64Block >>= 3;
	}
}

Your code is very branch-heavy and has quite a few redundancies. Whether or not that matters to you right now, it would be a lot easier to see what is going wrong if you decoded all of the color values and all of the indices fully before trying to output pixels. As it is, your flow is very hard to follow, which contributes to your woes.
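
One thing a decode-first layout makes easier to spot: in the posted BC5ReadBlock, `&strip + 1` is arithmetic on a `uint64_t*`, so it points eight bytes past `strip` rather than at its second byte, and `Red1` ends up being read from outside the block. A minimal sketch of pulling the fields out with shifts instead, assuming the block was loaded as a little-endian 64-bit value (the helper names here are hypothetical and not from either poster's code):

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
// Endpoint 0 occupies bits 0..7 of the block, endpoint 1 bits 8..15.
inline std::uint8_t Bc4Endpoint0(std::uint64_t block) {
    return static_cast<std::uint8_t>(block & 0xFF);
}

inline std::uint8_t Bc4Endpoint1(std::uint64_t block) {
    return static_cast<std::uint8_t>((block >> 8) & 0xFF);
}

// 3-bit selector for texel i (0..15), stored above the two endpoint bytes.
inline std::uint8_t Bc4Selector(std::uint64_t block, int i) {
    return static_cast<std::uint8_t>((block >> (16 + 3 * i)) & 0x7);
}
```

Shifting the value avoids taking addresses at all, so there is no pointer type to get wrong.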

You also need to explain what the source of your image data is. Did you read that back from the GPU? Did you open a file and try to decode it from there? Depending on the source of your data you may need to rearrange some data, for example with byte-swapping.
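
As a rough sketch of what that rearranging can look like: if the data comes from a file where each 8-byte block is stored with the first endpoint in the first byte, the 64-bit value can be assembled byte by byte so the result does not depend on the host's byte order. `LoadLE64` is a hypothetical helper name, and the file layout is an assumption about the data source:

```cpp
#include <cstdint>

// Hypothetical helper: build a 64-bit block from 8 bytes stored
// least-significant byte first, regardless of host endianness.
inline std::uint64_t LoadLE64(const std::uint8_t* bytes) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

On a little-endian machine this gives the same result as copying the 8 bytes straight into a `uint64_t`; on a big-endian one it performs the byte swap implicitly.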

Finally, “Floor(gblock[index] * 255.0f)” is going to ruin your RGB values. A value of 233.999 will be rounded down instead of up. Either use Round() instead of Floor() or multiply by 255.5f, not 255.0f. This isn’t your main problem, but it is one issue that is trivial to solve while you are at it.
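
For illustration, a minimal round-to-nearest conversion using the standard library instead of the engine's Floor()/Clamp() (the name `ToUnorm8` is made up for this sketch):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a normalized float to an 8-bit channel,
// clamping to [0, 1] first and rounding to the nearest integer.
inline std::uint8_t ToUnorm8(float v) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}
```

With this, a scaled value of 233.999 becomes 234, where flooring would give 233.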

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Josh Klint

1,469

Author

November 22, 2021 06:32 AM

L. Spiro said:
Why use that code as a template rather than just the code provided by Microsoft®?

You are awesome and cool, thank you. :D

Josh Klint

1,469

Author

November 22, 2021 07:43 AM

I wonder why best-fit normals are not the standard for normal maps? I used them in our deferred renderer for screen normals with great results.

JoeJ

4,263

November 22, 2021 12:16 PM

Josh Klint said:
I wonder why best-fit normals are not the standard for normal maps? I used them in our deferred renderer for screen normals with great results.

Interesting, I did not know about this. I failed to understand how it works, but saw that it needs a LUT lookup to uncompress.
So I wonder how this compares to sphere → octahedral mapping, which I thought was the best compression method. It needs no LUT, just a bit of analytical math.

Edit: I think I got it: it searches the cells of an N^3 grid that have roughly the same direction from the origin, and keeps the best match. Thus compression is offline only, we still have 3 components, and we need the LUT.

Here's a nice blog post comparing alternatives: https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/
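
For reference, the "bit of analytical math" behind the octahedral mapping mentioned above can be sketched roughly as follows. This is a common formulation of the technique, not code taken from the linked post, and the types and function names (`Vec2`, `Vec3`, `OctEncode`, `OctDecode`) are made up for the example. It assumes the input normal is unit length:

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Sign that treats zero as positive, so folding never collapses an axis.
inline float SignNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Project a unit normal onto the octahedron |x|+|y|+|z| = 1, then unfold
// the lower half so the result fits in the square [-1, 1] x [-1, 1].
inline Vec2 OctEncode(Vec3 n) {
    const float invL1 = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
    float x = n.x * invL1;
    float y = n.y * invL1;
    if (n.z < 0.0f) {
        const float fx = (1.0f - std::fabs(y)) * SignNotZero(x);
        const float fy = (1.0f - std::fabs(x)) * SignNotZero(y);
        x = fx;
        y = fy;
    }
    return {x, y};
}

// Inverse: rebuild z from the two stored values, undo the fold, normalize.
inline Vec3 OctDecode(Vec2 e) {
    Vec3 n{e.x, e.y, 1.0f - std::fabs(e.x) - std::fabs(e.y)};
    if (n.z < 0.0f) {
        const float x = (1.0f - std::fabs(n.y)) * SignNotZero(n.x);
        const float y = (1.0f - std::fabs(n.x)) * SignNotZero(n.y);
        n.x = x;
        n.y = y;
    }
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return {n.x / len, n.y / len, n.z / len};
}
```

Only two components are stored, and decoding is a handful of arithmetic operations with no table lookup. In practice the two values would still be quantized to whatever bit depth the target format offers.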