Neil Crookes

Learnings and Teachings on Web Application Development & CakePHP

Jan

30

CakePHP Site Search with Yahoo! BOSS

A complete turnkey solution for integrating Yahoo! BOSS powered site search functionality into your CakePHP application.

Providing the ability for users to search your whole site for information is pretty damn important.

You can build your own search functionality by searching your database, but as the number of datatypes or tables in your db increases, and their relationships get more complex, this gets harder to do, and it’s even harder to do well.

An alternative is something like Lucene, but is there not a better solution, easier to implement? Of course there is.

What you need is search functionality with the search and ranking technology of a search engine, but with results restricted to your own site and ideally without having to include sponsored links or any branding or anything like that.

What you need to do is Build your Own Search Service.

Well, actually, you don’t, because Yahoo! has done it for you!

BOSS (Build your Own Search Service) is Yahoo!’s open search web services platform. The goal of BOSS is simple: to foster innovation in the search industry. Developers, start-ups, and large Internet companies can use BOSS to build and launch web-scale search products that utilize the entire Yahoo! Search index. BOSS gives you access to Yahoo!’s investments in crawling and indexing, ranking and relevancy algorithms, and powerful infrastructure. By combining your unique assets and ideas with our search technology assets, BOSS is a platform for the next generation of search innovation, serving hundreds of millions of users across the Web.

So, that’s the hard part covered. The next step is integrating it into your CakePHP application. The good news is, I’ve done this for you as well.

I’ve written a CakePHP datasource for the Yahoo! BOSS service that uses CakePHP’s built-in HttpSocket class to make requests and provides both web search and spelling suggestion functionality.

The web search can be limited to one or more sites, and the results contain key terms related to the result.
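To give a feel for what a site-restricted request looks like, here is a rough, standalone sketch of building the search URL. It is not the code from the datasource: the function name is made up for illustration, and the endpoint and the appid, format, start, count and sites parameter names are my assumptions about the BOSS API, so check them against the Yahoo! BOSS documentation. In the datasource the resulting URL would be fetched with HttpSocket.

```php
<?php
/**
 * Build a BOSS-style web search URL.
 * Hypothetical helper: endpoint and parameter names are assumptions.
 */
function buildBossSearchUrl($term, $appId, $sites = array(), $start = 0, $count = 10) {
    // Assumed web search endpoint; verify against the BOSS documentation.
    $base = 'http://boss.yahooapis.com/ysearch/web/v1/';

    $params = array(
        'appid'  => $appId,
        'format' => 'json',
        'start'  => (int) $start,
        'count'  => (int) $count,
    );

    // Accept one site as a string or several as an array,
    // and send them as a single comma-separated value.
    $sites = array_filter((array) $sites);
    if (!empty($sites)) {
        $params['sites'] = implode(',', $sites);
    }

    // The search term goes in the path, so it is encoded separately.
    return $base . rawurlencode($term) . '?' . http_build_query($params, '', '&');
}
```

Passing no sites leaves the sites parameter out altogether, which under these assumptions would search the whole web rather than one site.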

The other cool thing is it can easily be used in conjunction with custom paginateCount and paginate model methods to make use of CakePHP’s built-in pagination controller logic and helpers.
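Here is a minimal sketch of how that pairing can work, written as plain PHP so it runs without CakePHP. The two methods use the parameter order CakePHP 1.x passes to custom paginate and paginateCount model methods. Everything else is hypothetical: the class names, the fake source standing in for the datasource, and the 'term' conditions key are only there for illustration and are not taken from the files on GitHub.

```php
<?php
// Hypothetical stand-in for the datasource: serves slices of a fixed result list.
class FakeSearchSource {
    private $all;

    public function __construct(array $all) {
        $this->all = $all;
    }

    public function search($term, $start, $count) {
        return array(
            'total'   => count($this->all),
            'results' => array_slice($this->all, $start, $count),
        );
    }
}

// Hypothetical model sketch; a real one would extend AppModel.
class SearchSketch {
    private $source;

    public function __construct($source) {
        $this->source = $source;
    }

    // Returns one page of results: the page number and limit
    // are translated into the offset the search source expects.
    public function paginate($conditions, $fields, $order, $limit, $page = 1, $recursive = null, $extra = array()) {
        $start = ($page - 1) * $limit;
        $response = $this->source->search($conditions['term'], $start, $limit);
        return $response['results'];
    }

    // Returns the total number of matches so the paginator can work out the page count.
    public function paginateCount($conditions = null, $recursive = 0, $extra = array()) {
        $response = $this->source->search($conditions['term'], 0, 1);
        return $response['total'];
    }
}
```

The only real work is the offset arithmetic: page 3 with a limit of 10 becomes a request starting at result 20.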

The datasource itself, and all the files you need to integrate the functionality into your site, are available on my GitHub account. The files are:

  • app/config/database.php
    Merge with your existing database.php file.
  • app/config/routes.php
    Merge with your existing routes.php file. Gives you nice urls like http://domain.com/search/<term>
  • app/controllers/searches_controller.php
    Contains the results() action
  • app/models/datasources/yahoo_boss_source.php
    Where the magic (it’s pretty simple actually) happens
  • app/models/search.php
    Calls methods in the datasource
  • app/views/searches/results.ctp
    Search results view
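
For reference, a route that produces URLs of that shape could look something like the fragment below. This is only a sketch using CakePHP’s Router::connect; the exact route in the routes.php on GitHub may differ.

```php
<?php
// app/config/routes.php (sketch; the route shipped in the repository may differ)
// Anything after /search/ is passed to SearchesController::results() as an argument.
Router::connect('/search/*', array('controller' => 'searches', 'action' => 'results'));
```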

So, to add it to your app:

  1. Copy the files into your app
  2. Register for a Yahoo! developer app ID
  3. Add it to the yahooBoss config array in app/config/database.php
  4. Set the value of the ‘sites’ key in the config array to your own site

As follows:

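// In app/config/database.php, alongside your existing $default config
var $yahooBoss = array(
  'datasource' => 'yahoo_boss',
  'sites' => 'http://your.site.here', // the site to restrict results to
  'app_id' => 'your_app_id_here',     // your Yahoo! developer app ID
);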

Now point your browser to http://your.site.here/search

See it in action. It’s configured to search http://www.neilcrookes.com. Try a search for CakePHP, and to see the spelling suggestion, search for CakePHO. Note, I’ve hidden the key terms in the search results, but you can view source to see what they look like.

Are there any disadvantages? A couple I’ve noticed, but you can live with them. First, the Ts and Cs of Yahoo! BOSS say you have to use the click URL they send you in the search results (they send you the real URL too). The click URL is a link to Yahoo!, who then redirect the user to the proper page on your site; I think it’s for link tracking or something. The other: it highlights how crap your page titles are!
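
In a view, honouring the click URL requirement comes down to linking to one URL while displaying the other. A minimal sketch, assuming each result is an array with clickurl, url and title keys holding plain text (those key names are my assumption about the response format, and the helper name is made up):

```php
<?php
/**
 * Render one search result as a link.
 * Hypothetical helper: the 'clickurl', 'url' and 'title' keys are assumptions.
 */
function renderResultLink(array $result) {
    // The link points at the tracking URL, as the terms require...
    $href = htmlspecialchars($result['clickurl'], ENT_QUOTES, 'UTF-8');
    // ...while the visitor sees the real address of the page.
    $shown = htmlspecialchars($result['url'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($result['title'], ENT_QUOTES, 'UTF-8');

    return '<a href="' . $href . '">' . $title . '</a> <cite>' . $shown . '</cite>';
}
```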

20 Responses so far

That’s very nifty! Thanks for sharing :-)

[...] Neil Crookes » CakePHP Site Search with Yahoo! BOSS A CakePHP datasource and associated controllers to create a simple (to implement), paginated Yahoo powered site search. (tags: cakephp yahoo search datasource) [...]

[...] through the site and can be used to redirect them back to previous pages. Next is Neil’s CakePHP implementation of Yahoo!’s BOSS, which right now is implemented as a set of files that you drop into your [...]

Nice work, Neil. I hacked an implementation together of this as well last summer – and well, yours is much better and Cake-ish. I can learn from this, thanks!

Hey Marc, cheers mate, glad I can return the favour.

[...] the CakePHP digest I posted the other day I linked to Neil Crookes’ CakePHP datasource for using Yahoo! Search BOSS. BOSS stands for Build your Own Search Service and is a cool way to [...]

Really great work Neil! I was looking to implement BOSS on cakephp as I previously used a searchable behavior but that gets quite cumbersome when trying to index multiple models. Then saw that you’d already done the hard yards. Thanks!

Multiple sites did not work for me as an array. I fixed this. If you want the updated yahoo_boss_source.php file, email me and I’ll send it to you, as I couldn’t submit it to github.

Thanks for this! I am looking forward to start using it!

I have got a question. For this to work, does my site need to be in Yahoo!’s index already? I guess so, or does it crawl my site as soon as I start using this?

Thanks!

Thanks, this is excellent! Rarely does anything work so easily and with so little setup. The only issue I had was with a $startQuote / $endQuote being undefined but I just added those variables to the dataSource and all behaved just fine.

Thanks again, this saved a lot of time!

If I wanted to use BOSS to search the web instead of my site, should I just leave the ‘sites’ value blank?

Does yahoo need to index your site for this to work? I have it working on all sites but my newest that isn’t being read on yahoo search either.

Thanks

Thanks for your comments everyone.

Yes, Yahoo needs to index your site first for this to work.

To use it to search the web, I think you can probably just leave the sites value blank. Check the Yahoo BOSS documentation for more information.

Thank you very much, this will help us a lot and will save us a lot of time, effort, and resources in terms of implementation!
Thanks again Neil!

[...] and provided a single results set from multiple models/sources. Normally I’d use the CakePHP Yahoo BOSS site search I wrote and blogged about previously, but this particular app requires users to login to access [...]

Hi Neil,

Whoops! I just followed your excellent instructions, but then discovered that BOSS would no longer be free: http://techcrunch.com/2010/08/17/yahoo-webmasters-search-tools-bing-2012/

Doh!

Dan

Shit yeah I tweeted it http://twitter.com/neilcrookes/status/21561557852 and meant to update the post, but completely forgot. Sorry dude. However, I reckon that they’ll only charge for high volumes of requests. I wish they’d confirm that though.

Apparently there may be some YQL Boss support–unclear whether or not that is free.

http://developer.yahoo.net/blog/archives/2010/08/api_updates_and_changes.html

We’ll see what shakes out.

Pls, how can I use this for a full web search engine like google?
Thanks
