26 January 2010

Sphinx Reindexing Gotcha

For the past year or so, I've used Sphinx for all my full-text search needs. Along with Thinking Sphinx, it's one of the simplest yet most powerful search tools available in the Ruby world. Although there are other search solutions (like Solr) and other tools for using Sphinx from Ruby (like UltraSphinx), I prefer Sphinx and Thinking Sphinx for their ease of use.

Like most full-text search tools out there, Sphinx needs your searchable data to be indexed before the search engine can work its magic. How often you reindex depends on your application's requirements. I use delta indexing for some of my data models, and I usually set a cron job to reindex every 30 to 60 minutes. I also have a Capistrano task to reindex during deployment. This way, I can just deploy and any changes I've made in my models will be indexed before the new version of the app comes up.

This week, however, I had a problem during a deployment to our staging server. Sphinx indexing was failing, and the Thinking Sphinx error pointed at a newly created field in my database (named 'visible', on a Rails model named 'Club') that I wanted to include in the index:

Cannot automatically map attribute visible in Club to an
equivalent Sphinx type (integer, float, boolean, datetime, string as ordinal).
You could try to explicitly convert the column's value in your define_index
block:
  has "CAST(column AS INT)", :type => :integer, :as => :column

Everything was working smoothly on my development machine, so I was wondering what was up. After reviewing the deployment log for a while, I discovered that the problem was a simple one. I had forgotten that I had set up Capistrano to reindex Sphinx before running any pending database migrations, using Capistrano's after "deploy:update_code" hook. So obviously, if the field isn't in the database yet, it can't be indexed!

The fix was simple: reindex Sphinx after any migrations are run. So I moved the reindexing task to run just before restarting the server (or touching the tmp/restart.txt file to restart all Passenger processes), which is essentially the last step of Capistrano's deploy task, using before "deploy:restart". It's really a simple thing, but I think it can be easily overlooked.
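Here is a rough sketch of what that hook can look like in a Capistrano 2 style config/deploy.rb. The sphinx:reindex task name is made up for illustration, and the rails_env fallback is an assumption. The rake task it calls, thinking_sphinx:index, ships with Thinking Sphinx. This is a config fragment, not my exact deploy file.

```ruby
# config/deploy.rb (fragment, Capistrano 2 style)

namespace :sphinx do
  desc "Reindex Sphinx through Thinking Sphinx's rake task"
  task :reindex, :roles => :app do
    # Assumption: fall back to production when :rails_env is not set.
    env = fetch(:rails_env, "production")
    run "cd #{current_release} && rake thinking_sphinx:index RAILS_ENV=#{env}"
  end
end

# Hook in right before the restart, so pending migrations have already run.
# The old, broken version hooked in with: after "deploy:update_code", ...
before "deploy:restart", "sphinx:reindex"
```

With this in place, running cap deploy:migrations should update the code, migrate the database, reindex, and only then restart the app.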

07 January 2010

MongoShort - URL Shortener using Sinatra and MongoDB

One of the things I've been working on recently at BarterQuest is adding some basic Twitter integration. After seeing Facebook implement their own URL shortener recently, I got the idea to make our own URL shortener, for use when sending Twitter messages. The main purpose is that I hope doing this will make users trust the URL more, since we'll only be generating these short URLs internally, meaning that the shortened URL really comes from us. It also makes for a fun little project. Seeing that I've been toying with Sinatra and MongoDB for quite a while now, earlier this week I whipped up a URL shortening server in a day or so.

Today, I christened this code MongoShort and released it on GitHub. MongoShort is an extremely simple shortening service: it only has two actions, one to create a shortened URL and the other to redirect a shortened URL to the full URL. Since this app is really simple and there are so many more fully-featured URL shorteners out there, I don't expect anyone to use this nearly as much as I do. I released MongoShort because I hope others can use it as a starting point for Sinatra, MongoDB and the awesome MongoMapper gem, and perhaps as inspiration to use these tools.
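To give a feel for those two actions without any dependencies, here is a tiny in-memory sketch in plain Ruby. It is not MongoShort's code: the TinyShortener class, its method names and the base-36 counter keys are all invented for illustration, and a Hash stands in for MongoDB.

```ruby
# Hypothetical stand-in for a URL shortener's two actions.
# A Hash replaces the database; nothing here is taken from MongoShort.
class TinyShortener
  def initialize
    @urls = {}     # short key => full URL
    @counter = 0   # source of unique keys
  end

  # "Create" action: store the URL and return its short key.
  # Shortening the same URL twice returns the same key.
  def create(full_url)
    existing = @urls.key(full_url)
    return existing if existing

    @counter += 1
    key = @counter.to_s(36)  # base 36 keeps keys short: 1, 2, ... z, 10, ...
    @urls[key] = full_url
    key
  end

  # "Redirect" action: look up the full URL, or nil for an unknown key.
  def resolve(key)
    @urls[key]
  end
end
```

In a real Sinatra app, the create action would be a POST route that stores the record, and the redirect action a GET route that looks up the key and issues a redirect.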

I hope you can take a look at MongoShort and send some comments about it in my direction! I'm always up for learning more about the tools I used.

24 December 2009

ReCAPTCHA - Quick Gotcha

On some sites I've worked on, I have used ReCAPTCHA as a 'captcha' service to help minimize automated registrations on those sites. When using Ruby on Rails, I use the nice ReCAPTCHA plugin to set everything up for me.

This week, we made some updates to one of these sites, and after checking around, we noticed that the places where ReCAPTCHA was used were all screwed up.

Messed-Up ReCAPTCHA

Now, we didn't catch this while testing, because we disable the captcha services in all environments except production. I did some digging around, and it turns out that our designer had recently made a CSS change that applied to all tables:

table {
  table-layout: fixed;
}

I started probing around this area because we didn't change any of the default styling for ReCAPTCHA, and it builds the whole section using HTML tables. So using the ever-so-awesome Firebug, I removed this CSS property, and voila, ReCAPTCHA was looking all nice and beautiful, as it did before.

Fixed ReCAPTCHA

Of course, the aforementioned table-layout property was there for a reason, and I didn't want to remove it from our CSS files, screw everything else up and cause more stress to our designer (she's had enough these past couple of weeks!). So after noticing that the ReCAPTCHA <table> tag had a class name of its own, I just added the following CSS rule.

.recaptchatable {
  table-layout: auto;
}

This fixed the problem at hand. So be careful when introducing CSS rules that apply to every instance of an HTML element, since they also hit markup generated by third parties!

08 October 2009

MongoDB and MongoMapper - The Future?

In the past couple of weeks, I've been playing around with different document-oriented databases. Specifically, I've been focusing on MongoDB. In reality, all the document-oriented databases out there today are great, and they're a refreshing change of pace from relational databases, not least because you don't have to worry about database schemas. However, MongoDB feels closer to 'home' for me, as someone who has worked strictly with relational databases since college. Also, it's blazing fast: most benchmarks I've seen around the Internet show MongoDB running much faster than the alternatives, including both document-oriented databases and relational databases.

Besides databases, I've also been playing with Sinatra, a micro-framework in Ruby. I have a ton of ideas for small sites, where using a framework like Rails or Merb would be overkill. So I thought, "I like Sinatra and MongoDB... Why not create an app using these two awesome tools?" So I've been doing some toy projects (some stuff will be released to my GitHub account soon), and I've really been enjoying it so far.

For connectivity to MongoDB, there are a few tools out there. MongoDB has excellent drivers for many different languages, both officially supported and community supported. There's an official Ruby driver, which will install itself as a gem. This will provide Ruby with most of the functionality to connect and interact with MongoDB.

However, to make things even easier, there are other tools that hook into the power of the MongoDB Ruby driver. The one I'm using the most is MongoMapper. MongoMapper works sort of like DataMapper: you map your database fields to keys on a class, and you can then use them as attribute methods to write and fetch data.

I'll update this post later whenever I release any project where I use Sinatra and MongoDB together.

23 September 2009

Rails date_select Madness

I love using the Rails date_select helper. It's wonderful how one simple line of code can automatically generate all you need for your users to select a month, day and year. However, I have one small annoyance with it. A user can select an invalid date (like February 30 or September 31), and Rails will automatically roll the extra days over into the next month, instead of marking that field / attribute as invalid. For example, if the user selected February 31 and submitted the form, Rails would happily set the date to March 3 (in a non-leap year).

Now, that really isn't an issue for alarm on my part, and in most cases I would not handle these types of mistakes (or toying around, which is most likely what I suspect if this occurs). But one of the apps I'm working on required that the user should not be able to enter these bogus dates.

The easiest way I found to do this was to do a quick check using Ruby's Date class with the selected values. The Date class raises an ArgumentError if you try to initialize it with a bogus date. I put this in my controller action where the information, including the selected date, was going to be saved:

def create
  @user = User.new(params[:user])

  # date_select submits the parts as (1i) = year, (2i) = month, (3i) = day.
  selected_year = params[:user]['birthdate(1i)'].to_i
  selected_month = params[:user]['birthdate(2i)'].to_i
  selected_day = params[:user]['birthdate(3i)'].to_i
  # Date.new takes (year, month, day) and raises ArgumentError on a bogus date.
  Date.new(selected_year, selected_month, selected_day)

  # Other logic here, including saving the record and redirecting on
  # success, or rendering the form on failure.
rescue ArgumentError
  @user.errors.add(:birthdate, 'is an invalid date')
  # Clear the birthdate, so it doesn't show the rolled-over date in the view.
  @user.birthdate = nil
  render :action => 'new'
end

The code above does all the regular Rails stuff of creating a new object with the form data. However, I added one additional statement, which creates a new Date object. I don't use this Date object for anything. I simply initialize it so that if an incorrect date is selected in the date_select helper, it raises an ArgumentError, which I then handle properly. The action renders the form again, displaying the standard Rails error message indicating that the birthdate is invalid.
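If you'd rather not rely on an exception for control flow, Ruby's standard library also has Date.valid_date?, which answers the same question with true or false. Below is a minimal sketch of that idea. The helper name valid_date_parts? is made up, and it assumes the params hash uses the usual date_select keys ((1i) for year, (2i) for month, (3i) for day).

```ruby
require 'date'

# Hypothetical helper: true when the date_select parts for the given
# attribute form a real calendar date.
# Assumes keys like "birthdate(1i)" (year), "(2i)" (month), "(3i)" (day).
def valid_date_parts?(params, attribute)
  year  = params["#{attribute}(1i)"].to_i
  month = params["#{attribute}(2i)"].to_i
  day   = params["#{attribute}(3i)"].to_i
  # A missing part becomes 0 via to_i, which is never a valid month or day.
  Date.valid_date?(year, month, day)
end
```

In the controller, the check could then be written as unless valid_date_parts?(params[:user], :birthdate), adding the error and rendering the form inside that branch instead of in a rescue clause.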

I avoided using plugins, since I didn't have many cases where it warranted using an external library. I'm also sure I could monkey-patch this, so that all my date_select helpers worked this way, but I only had it in two places. Is there some other way of doing this? There might be. But judging by the search results I encountered while looking into this, I don't think anyone cares if the user enters bogus dates.

Older Posts

04 September 2009 - JSON Gem (1.1.9) Weirdness with Rails

28 August 2009 - Snow Leopard App Fixes

23 August 2009 - jQuery and Rails

22 August 2009 - Website Screenshots - Take 1 (Using Selenium)

20 August 2009 - Markup languages for this blog

20 August 2009 - Informality rocks!

19 August 2009 - Why do I have this blog?