Register | Login

Stacking Code

public interface IBlog { string Dump(Stream consciousness); }

BLOBs, NHibernate and SQL Server

Saturday, 18 December, 2010 @ 9:48 PM < Adam Boddington
Tags: ASP.NET MVC, Building Neno, Markdown, NHibernate, SQL Server

This is post #27 in the Building Neno series. Please click here for a description of the Building Neno project and instructions on how to access the source code for this post.

One of the tricks to having BLOBs in the application is to ensure they're only pulled from the database when they're actually needed. Pulling 5MB of binary data into web server memory when it's not going to be used is wasteful, but not really a shooting offence. When working with sets of BLOBs however, pulling 100 x 5MB of binary data into web server memory is going to be noticeable -- and yes, someone might be reaching for a blunt object. It sounds like such an obvious situation to avoid, but in this day an age of ORMs, it can require a little management to get right.

Often the data surrounding the BLOB bytes is required, and required quite often. Things like file name, content type, size. I might want to display those details frequently. But I usually only want the actual bytes when it comes time to view or download the BLOB.

In NHibernate 2.x, the best way to avoid pulling the bytes until they were needed was to put the bytes into their own class and make that class a lazy loaded property.

In NHibernate 3.x, the separate class isn't needed anymore, but I haven't had time to grok how that works just yet. I'll be rolling with the way I know for now.

Implementation

Here's what the separate blob class looks like in Neno.

public class Blob : Entity<int>
{
    [Required]
    public virtual byte[] Bytes { get; set; }
}

And here is its NHibernate mapping file.

<class xmlns="urn:nhibernate-mapping-2.2" name="StackingCode.Neno.DomainModel.Blob, StackingCode.Neno" mutable="true" table="Neno_Blob">
    <id name="Id">
        <generator class="identity" />
    </id>
    <version name="Version" />
    <property name="Bytes" type="BinaryBlob" />
</class>

The underlying T-SQL datatype is VARBINARY (MAX). The BinaryBlob special type is to let NHibernate know this property is bigger than 8,000 bytes.

Here's what the surrounding file looks like.

public class File : Entity<int>
{
    public File()
    {
        Blob = new Blob();
    }

    [Required]
    [StringLength(100)]
    public virtual string Name { get; set; }

    [Required]
    [StringLength(10)]
    public virtual string Extension
    {
        get { return Path.GetExtension(Name); }
        private set { }
    }

    [DisplayName("Content Type")]
    [Required]
    [StringLength(100)]
    public virtual string ContentType { get; set; }

    public virtual byte[] Bytes
    {
        get { return Blob.Bytes; }
    }

    [StringLength(50)]
    public virtual string Hash { get; private set; }

    public virtual int Size { get; private set; }

    [Required]
    protected virtual Blob Blob { get; private set; }

    public virtual void Store(byte[] bytes)
    {
        if (bytes == null)
            throw new ArgumentNullException("bytes");

        Blob.Bytes = bytes;
        Hash = Convert.ToBase64String(new MD5CryptoServiceProvider().ComputeHash(bytes));
        Size = bytes.Length;
    }

    // ...
}

And its NHibernate mapping file.

<class xmlns="urn:nhibernate-mapping-2.2" name="StackingCode.Neno.DomainModel.File, StackingCode.Neno" discriminator-value="F" mutable="true" table="Neno_File">
    <id name="Id">
        <generator class="identity" />
    </id>
    <discriminator column="Discriminator" />
    <version name="Version" />
    <property name="Name" />
    <property name="Extension" />
    <property name="ContentType" />
    <property name="Hash" />
    <property name="Size" />
    <many-to-one name="Blob" cascade="all" />
    <subclass name="StackingCode.Neno.DomainModel.Attachment, StackingCode.Neno" discriminator-value="A">
        <property name="Slug" />
    </subclass>
</class>

The file always has a blob -- there's no way for it to not have one. In the NHibernate mapping file, I've cascaded all operations on file objects down to the blob property, so the bytes are saved if they're changed, and if the file is deleted, the blob is too. But the best part, the blob is only retrieved from the database when something accesses File.Bytes in code. The rest of the file's properties can be freely accessed without the blob being retrieved from the database.

The file domain class isn't the attachment class. I've made the distinction because I expect to have other types of files in the application in the near future. Here's what the attachment domain class looks like.

public class Attachment : File
{
    private string _slug;

    [Required]
    [StringLength(100)]
    public virtual string Slug
    {
        get { return _slug.Sluggify(Name); }
        set { _slug = value; }
    }
}

Attack of the slugs... Slugs again! I've got slugs on attachments because I plan to download my attachments by appending the attachment slug to the post URL. For example...

BLOB Size Limitations

SQL Server will happily accept 2GB of BLOB bytes. Unfortunately the way these BLOBs are set up with NHibernate, the maximum is much less than that. That's simply because of the way the bytes are being uploaded, passed off to NHibernate, and then saved in the database.

When a HTTP request comes in, there's a stream attached to it for the uploaded file. NHibernate wants an array of bytes, so the stream is read into a new array. The content length from the HTTP request can't always be trusted, so sometimes it's read into another stream first, then copied to an array. The web server ends up holding up to three copies of the data in memory before passing one of them off to NHibernate for transmittal to the database. An 80MB file results in a 240MB spike in RAM usage -- and that can result in out of memory errors. For small files however, like the less than 1MB files I plan to upload, this setup is just fine.

If larger files are needed there are plenty of ways to get the application to play nice. One approach that has worked for me in the past is to short-circuit NHibernate and pass the stream straight down to the database. A transaction is opened, NHibernate is flushed to get the other changes into the database, then custom T-SQL is used to write the stream from the request straight into the database -- and then everything is committed. With this approach the web server never holds more than a small chunk of the stream at a time. I've managed to upload 1GB files using this method.

I'll save the details of how to do this for another post.

Database Performance

Another issue with putting BLOBs in databases is the performance hit it can incur. Mixing normal data with binary data does tend to slow things down. If the normal data was by itself, the database would be able to access anything it needed in a certain amount of time. With binary data mixed in, the database has to access a much larger storage area to find the same amount of normal data. I'm massively oversimplifying, but that's the general gist of it.

Luckily SQL Server can alleviate this problem through files and filegroups. By having a secondary filegroup just for binary data, the normal data is kept isolated in the primary filegroup, keeping queries and other operations as snappy as they would otherwise be.

CREATE TABLE [dbo].[Neno_Blob] (
    [Id]                    INT             IDENTITY (1, 1) NOT NULL,
    [Version]               INT             NOT NULL,
    [Bytes]                 VARBINARY (MAX) NOT NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC)
    ON [PRIMARY]/* TEXTIMAGE_ON [SECONDARY]*/)

I don't have that luxury with my hosted SQL Server though, but hopefully my attachments won't be big enough to matter.

More Implementation

One of the gotchas in building views with upload controls is making sure the form has the right attributes on it. Specifically an enctype of multipart/form-data.

<h2>Edit Post Attachment</h2>
<%
    using (Html.BeginForm("edit", "postattachment", new { PostId = Request["PostId"] }, FormMethod.Post, new { enctype = "multipart/form-data" }))
    {%>

To find the uploaded file inside the postback action, look in the Request.Files collection.

The last item of note in the attachments implementation is how to open or download the bytes.

public ActionResult PostAttachmentByPublishDateSlugAndAttachmentSlug(int year, int month, int day, string slug, string attachmentSlug, bool? download)
{
    var publishDate = new DateTime(year, month, day);
    Post post = Container.Get<IPostService>().GetPostByPublishDateAndSlug(publishDate, slug);

    // ...

    Attachment attachment = post.Attachments.SingleOrDefault(a => a.Slug == attachmentSlug);

    // ...

    if (download.HasValue && download.Value)
        Response.AddHeader("Content-Disposition", "attachment; filename=\"" + attachment.Name + "\"");
    else
        Response.AddHeader("Content-Disposition", "inline; filename=\"" + attachment.Name + "\"");

    return File(attachment.Bytes, attachment.ContentType);
}

By switching around the content disposition based on an optional parameter, I can use the same action to open the attachment (useful for img tags) or download it (useful for zip files).

Here's an example of both on the same attachment...

Markdown

[![Screenshot](/blog/2010/12/18/blobs-nhibernate-and-sql-server/screenshot "Screenshot")](/blog/2010/12/18/blobs-nhibernate-and-sql-server/screenshot?Download=True)

HTML

<a href="/blog/2010/12/18/blobs-nhibernate-and-sql-server/screenshot?Download=True"><img src="/blog/2010/12/18/blobs-nhibernate-and-sql-server/screenshot" alt="Screenshot" title="Screenshot" /></a>

Result

Screenshot

Make sure to check out the source code to see the full implementation.

There are 0 comments.


Comments

Leave a Comment

Please register or login to leave a comment.


Older
Files in Databases

Newer
Replacing Spring with Ninject

Older
Files in Databases

Newer
Replacing Spring with Ninject

browse with Pivot


About


Projects

Building Neno


RSS
Recent Posts

Codility Nitrogenium Challenge
OS X Lock
HACT '13
Codility Challenges
Priority Queue


Tags

Architecture (13)
ASP.NET (2)
ASP.NET MVC (13)
Brisbane Flood (1)
Building Neno (38)
C# (4)
Challenges (3)
Collections (1)
Communicator (1)
Concurrency Control (2)
Configuration (1)
CSS (5)
DataAnnotations (2)
Database (1)
DotNetOpenAuth (2)
Entity Framework (1)
FluentNHibernate (2)
Inversion of Control (5)
JavaScript (1)
jQuery (4)
Kata (2)
Linq (7)
Markdown (4)
Mercurial (5)
NHibernate (20)
Ninject (2)
OpenID (3)
OS X (1)
Pivot (6)
PowerShell (8)
Prettify (2)
RSS (1)
Spring (3)
SQL Server (5)
T-SQL (2)
Validation (2)
Vim (1)
Visual Studio (2)
Windows Forms (3)
Windows Service (1)


Archives


Powered by Neno, ASP.NET MVC, NHibernate, and small furry mammals. Copyright 2010 - 2011 Adam Boddington.
Version 1.0 Alpha (d9e7e4b68c07), Build Date Sunday, 30 January, 2011 @ 11:37 AM