I just migrated my blog to the latest version of BlogEngine.NET, 2.5.0.6.
I was shocked when I saw how much spam I had on the blog!
447883 spam comments! Wow. So I started the cleanup with the BlogEngine tools, but it was damn slow, and once you start a "delete all" there is no way to stop it.
So I stopped the web site, which was a bad idea because it left one XML file damaged. Since I always make a backup before doing something like that, I was on the safe side and just restored the files.
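Because the cleanup program later in this post rewrites every post file in place, that backup habit matters. Here is a minimal sketch in C#, assuming the posts live under a single folder such as App_Data\posts. The `PostsBackup` class and its `posts-backup-<timestamp>` naming scheme are hypothetical, not something BlogEngine.NET provides. It copies the folder, including any subfolders, into a timestamped sibling folder and returns that folder's path.

```csharp
using System;
using System.IO;

// Hypothetical helper: copies a folder into a timestamped sibling folder
// before anything in it is rewritten.
public static class PostsBackup
{
    public static string Backup(string postsFolder)
    {
        // Normalize the path and drop a trailing separator so GetFileName works.
        string source = Path.GetFullPath(postsFolder)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        string target = Path.Combine(
            Path.GetDirectoryName(source),
            Path.GetFileName(source) + "-backup-" + DateTime.Now.ToString("yyyyMMdd-HHmmss"));
        Directory.CreateDirectory(target);

        foreach (string file in Directory.GetFiles(source, "*", SearchOption.AllDirectories))
        {
            // Rebuild the file's path relative to the source folder under the target.
            string relative = file.Substring(source.Length + 1);
            string destination = Path.Combine(target, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(destination));

            // overwrite: false, so an existing backup is never silently replaced.
            File.Copy(file, destination, false);
        }
        return target;
    }
}
```

For example, `PostsBackup.Backup(@"C:\Temp\blogengine\posts")` would create a `posts-backup-<timestamp>` folder next to the original. Because `File.Copy` is called without overwriting, running it twice within the same second fails instead of mixing two backups.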
Then I used 7-Zip to zip the posts folder, located in App_Data, which was 338 MB. Again, wow.
I downloaded the zip file to my local machine, installed BlogEngine and imported the posts.
I thought it would be faster on my machine because it is a recent one, but it was still too slow to process 447833 spam messages.
So, being a developer, I wrote a small application to do it. The cleanup took less than 10 seconds, and here is the size of the posts folder afterwards:
Quite a difference! And here is BlogEngine showing me the results:
Here is the code. It targets .NET Framework 4 and uses PLINQ (Parallel LINQ) to process the files in parallel:
#region using
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
#endregion

namespace BlogEngineSpamDelete
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Each post is stored as one XML file in App_Data\posts.
            var files = Directory.GetFiles(@"C:\Temp\blogengine\posts", "*.xml");

            // ForAll runs FixPost on worker threads; a plain foreach over
            // AsParallel() would run the loop body on the calling thread only.
            files.AsParallel().ForAll(FixPost);
        }

        private static void FixPost(string file)
        {
            // Load the whole post first so the file can be overwritten afterwards.
            XDocument doc;
            using (var stream = File.OpenRead(file))
            {
                doc = XDocument.Load(stream);
            }

            // Materialize the matches before removing them from the tree.
            var spamComments = (from comment in doc.Descendants("comment")
                                let data = new CommentState(comment.Attribute("spam").Value,
                                                            comment.Attribute("approved").Value,
                                                            comment.Attribute("deleted").Value)
                                where ShouldDeleteSpamAndUnApproved(data)
                                select comment).ToArray();

            foreach (var spamComment in spamComments)
            {
                spamComment.Remove();
            }

            // Overwrite the original file with the cleaned document.
            using (var writer = XmlWriter.Create(file, new XmlWriterSettings {Indent = true}))
            {
                doc.WriteTo(writer);
            }
        }

        // Lenient rule (not used above): delete only unapproved comments
        // that are also flagged as spam or deleted.
        private static bool ShouldDeleteSpam(CommentState commentState)
        {
            return !commentState.Approved &&
                   (commentState.Spam || commentState.Deleted);
        }

        // Strict rule: keep only comments that are approved, not spam and not deleted.
        private static bool ShouldDeleteSpamAndUnApproved(CommentState commentState)
        {
            return !commentState.Approved ||
                   commentState.Spam ||
                   commentState.Deleted;
        }

        // Parsed "True"/"False" flags of a single comment element.
        private class CommentState
        {
            public CommentState(String spam, String approved, String deleted)
            {
                Approved = bool.Parse(approved);
                Spam = bool.Parse(spam);
                Deleted = bool.Parse(deleted);
            }

            public bool Approved { get; private set; }
            public bool Spam { get; private set; }
            public bool Deleted { get; private set; }
        }
    }
}
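Since the program overwrites the post files, it can help to try the deletion rules on an in-memory document first. The sketch below assumes a simplified post layout: `comment` elements carrying `approved`, `spam` and `deleted` attributes with "True"/"False" values, as the program above reads them. The `SpamFilterDemo` class, its sample XML and its method names are hypothetical; `Strict` mirrors `ShouldDeleteSpamAndUnApproved` and `Lenient` mirrors `ShouldDeleteSpam`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Dry run of the two deletion rules on a hypothetical, simplified post.
public static class SpamFilterDemo
{
    // Simplified sample; real BlogEngine.NET post files carry more data.
    public const string SamplePost =
        "<post><comments>" +
        "<comment id=\"1\" approved=\"True\" spam=\"False\" deleted=\"False\" />" +
        "<comment id=\"2\" approved=\"False\" spam=\"True\" deleted=\"False\" />" +
        "<comment id=\"3\" approved=\"True\" spam=\"False\" deleted=\"True\" />" +
        "<comment id=\"4\" approved=\"False\" spam=\"False\" deleted=\"False\" />" +
        "</comments></post>";

    private static bool Flag(XElement comment, string name)
    {
        return bool.Parse(comment.Attribute(name).Value);
    }

    // Mirrors ShouldDeleteSpamAndUnApproved: keep only approved, clean comments.
    public static bool Strict(XElement c)
    {
        return !Flag(c, "approved") || Flag(c, "spam") || Flag(c, "deleted");
    }

    // Mirrors ShouldDeleteSpam: delete only unapproved comments that are
    // also flagged as spam or deleted.
    public static bool Lenient(XElement c)
    {
        return !Flag(c, "approved") && (Flag(c, "spam") || Flag(c, "deleted"));
    }

    // Removes the matching comments and returns the ids that remain.
    public static string[] Clean(string xml, Func<XElement, bool> shouldDelete)
    {
        XDocument doc = XDocument.Parse(xml);
        // The Remove extension method copies the sequence before detaching
        // nodes, so querying and removing from the same tree is safe.
        doc.Descendants("comment").Where(shouldDelete).Remove();
        return doc.Descendants("comment").Select(c => c.Attribute("id").Value).ToArray();
    }
}
```

`SpamFilterDemo.Clean(SpamFilterDemo.SamplePost, SpamFilterDemo.Strict)` leaves only comment 1, while the lenient rule leaves comments 1, 3 and 4: comment 3 is approved, and comment 4 is unapproved but flagged as neither spam nor deleted.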
Update: I also posted the code on Bitbucket: https://bitbucket.org/lkempe/blogenginespamdelete