I just migrated my blog to the latest version of BlogEngine.NET 2.5.0.6.

I had a shock when I saw the number of spam comments on the blog!

447883 spam comments! Wow. So I started cleaning up with the BlogEngine tools, but it was damn slow, and there was no way to stop the "delete all" once it had started.

So I stopped the web site, which was a bad idea because one XML file ended up damaged. As I always make a backup before doing something like that, I was on the safe side and just reverted the files.

Then I used 7-Zip to zip the posts folder, which is located in App_Data and was 338 MB. Again, wow.

I downloaded the zip file to my local machine, installed BlogEngine and imported the posts.

I thought it would be faster on my machine because it is a recent one, but it was still too slow to process 447833 spam messages.

So, as a developer, I went on and wrote a little application to do it. After cleaning up the spam, which took less than 10 seconds, I checked the size of the posts folder.

Quite a difference! And BlogEngine showed me the results.

And here is the code. It uses .NET Framework 4 and parallel queries (PLINQ) to process the files:

#region using

using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

#endregion

namespace BlogEngineSpamDelete
{
    internal class Program
    {
        private static void Main(string[] args)
        {
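            var files = Directory.GetFiles(@"C:\Temp\blogengine\posts", "*.xml");

            // ForAll runs FixPost on several files at once; a plain foreach
            // over AsParallel() would still process the files one by one.
            files.AsParallel().ForAll(FixPost);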
        }

        private static void FixPost(string file)
        {
            XDocument doc;
            using (var stream = File.OpenRead(file))
            {
                doc = XDocument.Load(stream);
            }

            var comments = from comment in doc.Descendants(XName.Get("comment", String.Empty))
                           select comment;

            var spamComments = from comment in comments.ToArray()
                               let data = new CommentState(comment.Attribute("spam").Value,
                                                           comment.Attribute("approved").Value,
                                                           comment.Attribute("deleted").Value) 
                               where ShouldDeleteSpamAndUnApproved(data)
                               select comment;

            foreach (var spamComment in spamComments)
            {
                spamComment.Remove();
            }

            using (var writer = XmlWriter.Create(file, new XmlWriterSettings {Indent = true}))
            {
                doc.WriteTo(writer);
            }
        }

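        // Conservative rule (not used by FixPost): delete only unapproved
        // comments that are flagged as spam or already marked as deleted.
        private static bool ShouldDeleteSpam(CommentState commentState)
        {
            return !commentState.Approved && 
                   (commentState.Spam || commentState.Deleted);
        }
        
        // Aggressive rule (used by FixPost): delete every comment that is
        // unapproved, flagged as spam, or marked as deleted.
        private static bool ShouldDeleteSpamAndUnApproved(CommentState commentState)
        {
            return !commentState.Approved || 
                   commentState.Spam ||
                   commentState.Deleted;
        }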

        private class CommentState
        {
            public CommentState(String spam, String approved, String deleted)
            {
                Approved = bool.Parse(approved);
                Spam = bool.Parse(spam);
                Deleted = bool.Parse(deleted);
            }

            public bool Approved { get; private set; }
            public bool Spam { get; private set; }
            public bool Deleted { get; private set; }
        }
    }
}
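
Before running a cleanup like this on real files, it can help to see what each of the two rules would remove. The following is only a minimal sketch: the class name SpamFilterSketch, its helper methods and the sample XML are hypothetical and made up for illustration. The only things taken from the code above are the comment element and its spam, approved and deleted attributes. It counts matching comments in an in-memory document and does not touch any file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpamFilterSketch
{
    // Parses a boolean attribute the same way CommentState does (bool.Parse).
    private static bool Flag(XElement comment, string name)
    {
        return bool.Parse(comment.Attribute(name).Value);
    }

    // Same logic as ShouldDeleteSpam: unapproved AND (spam OR deleted).
    public static bool IsUnapprovedSpam(XElement c)
    {
        return !Flag(c, "approved") && (Flag(c, "spam") || Flag(c, "deleted"));
    }

    // Same logic as ShouldDeleteSpamAndUnApproved: unapproved OR spam OR deleted.
    public static bool IsSpamOrUnapproved(XElement c)
    {
        return !Flag(c, "approved") || Flag(c, "spam") || Flag(c, "deleted");
    }

    // Counts the comments a rule would delete, without modifying the document.
    public static int CountMatches(XDocument doc, Func<XElement, bool> rule)
    {
        return doc.Descendants("comment").Count(rule);
    }

    // Hypothetical sample data; a real BlogEngine post file contains much more.
    public static XDocument SamplePost()
    {
        return XDocument.Parse(
            "<post><comments>" +
            "<comment id='a' approved='True' spam='False' deleted='False' />" +
            "<comment id='b' approved='False' spam='True' deleted='False' />" +
            "<comment id='c' approved='False' spam='False' deleted='False' />" +
            "<comment id='d' approved='True' spam='True' deleted='False' />" +
            "</comments></post>");
    }
}
```

Calling SpamFilterSketch.CountMatches(SpamFilterSketch.SamplePost(), SpamFilterSketch.IsUnapprovedSpam) returns 1 (only comment b), while the same call with SpamFilterSketch.IsSpamOrUnapproved returns 3 (comments b, c and d). The aggressive rule also removes comments that are merely waiting for approval, so it is worth a dry run like this if some pending comments might be legitimate.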

Update: I also posted the code on Bitbucket: https://bitbucket.org/lkempe/blogenginespamdelete