
Stacking Code

public interface IBlog { string Dump(Stream consciousness); }

Compressing BLOBs in the Database

Tuesday, 4 January, 2011 @ 8:19 PM < Adam Boddington
Tags: Building Neno, NHibernate, SQL Server

This is post #33 in the Building Neno series. See the Building Neno project page for a description of the project and instructions on how to access the source code for this post.

I recently started storing the attachments for my posts in the database.

It works, but I've noticed some of the BLOBs could benefit from compression before being stored. If I compress them before saving and decompress them after retrieval, I could save a bit of database space. Compression like this is typically worth considering if:

  • No other application is using the BLOB data directly. In this particular application, all integration will be done through the service layer -- nothing talks to the database directly except for the application itself.
  • Database space is restricted.
  • The network between the database and web server is restricted.
  • Web server CPU cycles are in abundance.

The number of times a BLOB is retrieved, and therefore decompressed, absolutely has to be taken into account. The small saving in disk space may well be a false economy if the web server is placed under significant load as a result.

Having the option to compress can certainly be handy, even if it's never used.

The fact that my hoster charges for database disk space and not for web server CPU cycles has nothing to do with me implementing this. Honest. ;)
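One way to judge the trade-off above is to measure it on a representative payload before committing to anything. The following is a minimal sketch, not part of the Neno source: the class name CompressionProbe and the sample payload are made up for illustration, and it assumes the framework's GZipStream is an acceptable stand-in for whatever engine ends up being used.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only; not part of the Neno source.
public static class CompressionProbe
{
    // Returns the size in bytes of the payload after gzip compression.
    public static int CompressedLength(byte[] payload)
    {
        using (var outStream = new MemoryStream())
        {
            // Disposing the GZipStream writes the final block and the trailer.
            using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
                gzipStream.Write(payload, 0, payload.Length);

            return outStream.ToArray().Length;
        }
    }

    public static void Main()
    {
        // Made-up sample: highly repetitive text, which compresses very well.
        // Formats that are already compressed (JPEG, ZIP) typically will not shrink.
        byte[] payload = Encoding.UTF8.GetBytes(new string('a', 100000));

        Stopwatch timer = Stopwatch.StartNew();
        int compressed = CompressedLength(payload);
        timer.Stop();

        Console.WriteLine("Original: {0} bytes, compressed: {1} bytes, took {2} ms",
            payload.Length, compressed, timer.ElapsedMilliseconds);
    }
}
```

Running something like this against a few real attachments gives a rough idea of both sides of the trade: bytes saved in the database versus milliseconds spent on the web server.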

Compression Engine

The first thing to do is hook up a replaceable mechanism for compressing and decompressing. I'll call it an engine and I'll put it in the StackingCode.Moja.Compression namespace. I'll keep it simple for now: I just need to compress and decompress arrays of bytes -- I'm not working with streams just yet.

public static class Engine
{
    public static byte[] Compress(byte[] bytes)
    {
        return Container.Get<IEngineProvider>().Compress(bytes);
    }

    public static byte[] Decompress(byte[] bytes)
    {
        return Container.Get<IEngineProvider>().Decompress(bytes);
    }
}

IEngineProvider is the interface for my replaceable engine.

public interface IEngineProvider
{
    byte[] Compress(byte[] bytes);
    byte[] Decompress(byte[] bytes);
}

While I'm having fun in this namespace, I'll whip up a quick implementation of IEngineProvider to use by default. It's all replaceable anyway, so if my first implementation is crappy, I can replace it later.

Since version 2.0 of the .NET Framework, there have been two classes to help with compression: System.IO.Compression.DeflateStream and System.IO.Compression.GZipStream. Both use the Deflate algorithm; GZipStream wraps the output in the gzip format, which adds a header and a CRC check. I'm going to use GZipStream first up since its format is a bit more widely used.

using System.IO;
using System.IO.Compression;

public class GZipEngineProvider : IEngineProvider
{
    #region IEngineProvider Members

    public byte[] Compress(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            // The GZipStream must be closed before outStream is read,
            // otherwise the trailer won't be written.
            using (var inStream = new MemoryStream(bytes))
            using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
                inStream.CopyTo(gzipStream);

            // ToArray still works after the GZipStream has closed outStream.
            return outStream.ToArray();
        }
    }

    public byte[] Decompress(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            // Read the compressed bytes through the GZipStream into outStream.
            using (var inStream = new MemoryStream(bytes))
            using (var gzipStream = new GZipStream(inStream, CompressionMode.Decompress))
                gzipStream.CopyTo(outStream);

            return outStream.ToArray();
        }
    }

    #endregion
}
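The comment about closing the compression stream is worth seeing in action. The sketch below is an illustration only (TrailerDemo and its method names are made up, not part of the Neno source): it compresses the same bytes twice, once reading the buffer while the GZipStream is still open and once after it has been disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class for illustration only.
public static class TrailerDemo
{
    // Snapshots the buffer while the GZipStream is still open, so anything
    // the stream has not written yet (at least the gzip trailer) is missing.
    public static byte[] CompressReadTooEarly(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
        {
            gzipStream.Write(bytes, 0, bytes.Length);
            return outStream.ToArray(); // evaluated before gzipStream is disposed
        }
    }

    // Snapshots the buffer only after the GZipStream has been disposed.
    public static byte[] CompressReadAfterClose(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
                gzipStream.Write(bytes, 0, bytes.Length);

            return outStream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] sample = Encoding.UTF8.GetBytes("Hello, Neno! Hello, Neno! Hello, Neno!");

        Console.WriteLine("Read too early:   {0} bytes", CompressReadTooEarly(sample).Length);
        Console.WriteLine("Read after close: {0} bytes", CompressReadAfterClose(sample).Length);
    }
}
```

The early read always comes back shorter, and the missing bytes are exactly the ones a decompressor needs to finish the job. The exact sizes depend on the framework version, so none are quoted here.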

Hooking up a specific engine is done through my IoC container provider, Ninject in this case.

Bind<IEngineProvider>().To<GZipEngineProvider>().InSingletonScope(); // Singleton
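To show what "replaceable" could look like in practice, here is a sketch of a second provider built on DeflateStream, plus a round-trip check that works against any IEngineProvider. DeflateEngineProvider and RoundTripDemo are hypothetical names and are not in the Neno source; the interface is restated only so the sketch compiles on its own.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Restated from the post so the sketch compiles on its own.
public interface IEngineProvider
{
    byte[] Compress(byte[] bytes);
    byte[] Decompress(byte[] bytes);
}

// Hypothetical alternative provider; not part of the Neno source.
public class DeflateEngineProvider : IEngineProvider
{
    public byte[] Compress(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            // Dispose the DeflateStream before reading so the final block is written.
            using (var deflateStream = new DeflateStream(outStream, CompressionMode.Compress))
                deflateStream.Write(bytes, 0, bytes.Length);

            return outStream.ToArray();
        }
    }

    public byte[] Decompress(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            using (var inStream = new MemoryStream(bytes))
            using (var deflateStream = new DeflateStream(inStream, CompressionMode.Decompress))
                deflateStream.CopyTo(outStream);

            return outStream.ToArray();
        }
    }
}

public static class RoundTripDemo
{
    // True when decompressing the compressed bytes gives back the original.
    public static bool RoundTrips(IEngineProvider provider, byte[] original)
    {
        byte[] restored = provider.Decompress(provider.Compress(original));
        return restored.SequenceEqual(original);
    }

    public static void Main()
    {
        byte[] sample = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog.");
        Console.WriteLine(RoundTrips(new DeflateEngineProvider(), sample));
    }
}
```

Swapping it in would presumably just mean pointing the binding at the new class. One caveat: BLOBs already stored by one engine can't be read by another, so any existing compressed data would need to be decompressed with the old engine first.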

Domain Model

I'm using a File class right now to hold information about my attachments, including a reference to the Blob object itself. Since I can see myself reusing this code, I'm going to move the File and Blob classes down to StackingCode.Moja.DomainModel. Then I'm going to make a new class to handle compressible files.

using System.ComponentModel;
using StackingCode.Moja.Compression;

namespace StackingCode.Moja.DomainModel
{
    public abstract class CompressibleFile<TId, TBlob, TBlobId> : File<TId, TBlob, TBlobId> where TBlob : Blob<TBlobId>
    {
        private bool _isCompressed;

        public override byte[] Bytes
        {
            get
            {
                if (IsCompressed)
                    return Engine.Decompress(Blob.Bytes);

                return Blob.Bytes;
            }
            protected set
            {
                if (IsCompressed)
                    Compress(value);
                else
                    PassThrough(value);
            }
        }

        [DisplayName("Compressed")]
        public virtual bool IsCompressed
        {
            get { return _isCompressed; }
            set
            {
                if (!_isCompressed && value)
                    Compress(Blob.Bytes);

                if (_isCompressed && !value)
                    Decompress(Blob.Bytes);

                _isCompressed = value;
            }
        }

        [DisplayName("Compressed Size")]
        public virtual int? CompressedSize { get; private set; }

        private void Compress(byte[] bytes)
        {
            Blob.Bytes = Engine.Compress(bytes);
            CompressedSize = Blob.Bytes.Length;
        }

        private void Decompress(byte[] bytes)
        {
            Blob.Bytes = Engine.Decompress(bytes);
            CompressedSize = null;
        }

        private void PassThrough(byte[] bytes)
        {
            Blob.Bytes = bytes;
            CompressedSize = null;
        }
    }
}

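I want to be able to choose whether or not to compress each file, so I've included a boolean for that. Changing IsCompressed on an existing file re-encodes the stored bytes on the spot: switching it on compresses Blob.Bytes and records CompressedSize, while switching it off decompresses them and clears CompressedSize.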
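The toggling behaviour is easier to follow in a cut-down form. The sketch below is a simplified stand-in, not the real CompressibleFile: ToggleSketch is a made-up class with no Blob, no NHibernate and no Engine, the gzip helpers are inlined, and the StoredBytes property exists only so the demo can show what would end up in the database.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Simplified, hypothetical stand-in for CompressibleFile.
public class ToggleSketch
{
    private byte[] _stored = new byte[0]; // what would be written to the database
    private bool _isCompressed;

    public int? CompressedSize { get; private set; }

    // Exposed only so the demo can inspect the stored form.
    public byte[] StoredBytes { get { return _stored; } }

    // Callers always see the uncompressed bytes, whatever the stored form is.
    public byte[] Bytes
    {
        get { return _isCompressed ? Gunzip(_stored) : _stored; }
        set
        {
            _stored = _isCompressed ? Gzip(value) : value;
            CompressedSize = _isCompressed ? _stored.Length : (int?)null;
        }
    }

    // Flipping the flag re-encodes whatever is currently stored.
    public bool IsCompressed
    {
        get { return _isCompressed; }
        set
        {
            if (!_isCompressed && value) { _stored = Gzip(_stored); CompressedSize = _stored.Length; }
            if (_isCompressed && !value) { _stored = Gunzip(_stored); CompressedSize = null; }
            _isCompressed = value;
        }
    }

    private static byte[] Gzip(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            using (var gzipStream = new GZipStream(outStream, CompressionMode.Compress))
                gzipStream.Write(bytes, 0, bytes.Length);
            return outStream.ToArray();
        }
    }

    private static byte[] Gunzip(byte[] bytes)
    {
        using (var outStream = new MemoryStream())
        {
            using (var inStream = new MemoryStream(bytes))
            using (var gzipStream = new GZipStream(inStream, CompressionMode.Decompress))
                gzipStream.CopyTo(outStream);
            return outStream.ToArray();
        }
    }

    public static void Main()
    {
        var file = new ToggleSketch();
        file.Bytes = Encoding.UTF8.GetBytes(new string('x', 5000));
        Console.WriteLine("Stored uncompressed: {0} bytes", file.StoredBytes.Length);

        file.IsCompressed = true;
        Console.WriteLine("Stored compressed: {0} bytes", file.CompressedSize);
        Console.WriteLine("Bytes still returns: {0} bytes", file.Bytes.Length);

        file.IsCompressed = false;
        Console.WriteLine("Stored again uncompressed: {0} bytes", file.StoredBytes.Length);
    }
}
```

The point of the design is that callers never need to know which form is stored: Bytes always hands back the original content, and the flag only changes what sits behind it.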

The Neno domain model classes are just concrete descendants with the ID types specified.

namespace StackingCode.Neno.DomainModel
{
    public class File : CompressibleFile<int, Blob, int>
    {
    }

    public class Blob : Blob<int>
    {
    }
}

NHibernate and SQL Server

The NHibernate mapping file for CompressibleFile has something new. Because the IsCompressed property has logic in its setter, I need to tell NHibernate to use the backing field when hydrating objects for me.

<property name="IsCompressed" access="nosetter.camelcase-underscore" />
<property name="CompressedSize" />
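To illustrate why the access strategy matters, the sketch below uses plain reflection to approximate what field access does: it writes the backing field directly, so the setter logic never runs. This is not NHibernate's own code, and Flagged and NoSetterDemo are made-up names; the setter's side effects are replaced with a simple counter.

```csharp
using System;
using System.Reflection;

// Hypothetical class for illustration only.
public class Flagged
{
    private bool _isCompressed;

    // Counts setter invocations so the demo can show whether the setter ran.
    public int SetterCalls { get; private set; }

    public bool IsCompressed
    {
        get { return _isCompressed; }
        set
        {
            SetterCalls++; // stands in for the compress/decompress side effects
            _isCompressed = value;
        }
    }
}

public static class NoSetterDemo
{
    // Writes the backing field directly, bypassing the property setter.
    // The "camelcase-underscore" naming strategy maps the property
    // IsCompressed to the field _isCompressed.
    public static void Hydrate(Flagged target, bool value)
    {
        FieldInfo field = typeof(Flagged).GetField(
            "_isCompressed", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(target, value);
    }

    public static void Main()
    {
        var flagged = new Flagged();
        Hydrate(flagged, true);

        // Prints "IsCompressed = True, setter calls = 0".
        Console.WriteLine("IsCompressed = {0}, setter calls = {1}",
            flagged.IsCompressed, flagged.SetterCalls);
    }
}
```

Without this, loading an already compressed file from the database would run the setter and compress the stored bytes a second time.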

And because I'm modifying a table that already has data in it, I can avoid a migration script by specifying a default value on the new NOT NULL column.

[IsCompressed]   BIT            DEFAULT ((0)) NOT NULL,
[CompressedSize] INT            NULL,

Result

All that's left are a couple of changes to my attachment views. Now I can compress attachments whenever I feel the need.

[Image: Compressed Attachment]

Check out the source code for the full implementation.

Powered by Neno, ASP.NET MVC, NHibernate, and small furry mammals. Copyright 2010 - 2011 Adam Boddington.
Version 1.0 Alpha (d9e7e4b68c07), Build Date Sunday, 30 January, 2011 @ 11:37 AM