A number of people have recently asked me how to set up real-time alerts for
different things -- Jive, competitors, their own name (vanity?), etc.
I've tested a number of ways to get this "persistent search"
functionality, whereby I am notified any time one of the many search
engines out there (Google, Yahoo, Ask, Technorati, etc.) finds a
specific keyword. Let's use "Jive Software" as an example. Here is what
I want to do:
- know any time a blog mentions the words "Jive Software"
- know any time a static website adds a page that contains "Jive Software"
- be notified ASAP so I can react
Now, to facilitate this, there are some process points:
- I want to avoid clogging my inbox, so this has to be done via RSS, not email (so Google Alerts won't fit the bill).
- I also want to over-subscribe and build logic to filter the noise, rather than subscribe to only one or two sources, so I don't miss anything. That rules out the persistent search functions built into RSS readers like Attensa, where you could set up searches for both "Jive Software" and "Clearspace" but would get two copies of every article that mentions both.
So, I turn to Yahoo Pipes. This tool is absolutely fantastic. A life-saver. And for you Perl ninjas out there, you can make it do even more than I can. So here's what it does: it takes in data (RSS/XML, JSON, whatever), lets you mash it up (splice, sort, filter, rename, etc.), and then spits it out in whatever fashion you want (RSS, JSON, email, text message).
Back to our "Jive Software" example. I want to set up a "pipe" to monitor this term (a pipe is a free-standing tool that sucks in data, processes it, and spits it back out):
- Log in to your Yahoo account
- Go to http://pipes.yahoo.com/pipes
- Go to my profile http://pipes.yahoo.com/techpaulogy
- Hover over my pipe that says "Persistent Search – Jive"
- Click on the "Clone" link in the upper-right of the gray highlighted area that pops up
- Click on the "My Pipes" link in the top navbar
- You should see a new pipe in your list of Pipes that is a copy of mine.
- Click on the title of the copied pipe to go to its management page. You'll see the current live results for "Jive Software" searches on Google, Yahoo, Ask, Technorati, etc.
- Click on the "Edit Source" link. Now for the magic (click the image -- it's 1600 px wide, might need to resize or download to see it best):
Here, you can see the guts of the pipe. It uses modules to fetch the source feeds, in one case re-maps some data fields to sync up with each other, combines all the sources, sorts them by date, filters out duplicates, and outputs the new feed. Voila! Play around. Now that it's been copied to your account, you can break it to your heart's content. Envision expanding this to also track searches for competitors, Jive product names, industry analysts, etc. (you'll see another pipe in my account called "Splice - Jive Feed" that combines this persistent search pipe with a shared OPML file of enterprise software blogs -- it's a work in progress).
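If it helps to see the same flow outside of Pipes, here is a rough Python sketch of what the modules do: re-map fields, combine the sources, sort by date, and filter duplicates. It is not how Pipes itself is implemented. The item layout (`title`, `link`, `published`), the function names, and the example.com URLs are all made up for illustration, and it assumes dates have already been parsed into `datetime` objects.

```python
from datetime import datetime, timezone


def remap(item, mapping):
    """Rename fields so every source uses the same keys (hypothetical layout)."""
    return {mapping.get(key, key): value for key, value in item.items()}


def merge_feeds(feeds):
    """Combine several lists of items, sort newest first, drop duplicate links."""
    combined = [item for feed in feeds for item in feed]
    combined.sort(key=lambda item: item["published"], reverse=True)

    seen_links = set()
    unique = []
    for item in combined:
        if item["link"] not in seen_links:
            seen_links.add(item["link"])
            unique.append(item)
    return unique


# Made-up sample data standing in for two search-engine feeds.
feed_a = [
    {"title": "Jive Software news", "link": "http://example.com/a",
     "published": datetime(2008, 2, 5, tzinfo=timezone.utc)},
    {"title": "Clearspace review", "link": "http://example.com/b",
     "published": datetime(2008, 2, 3, tzinfo=timezone.utc)},
]
# This source calls its fields "url" and "date", so they get re-mapped first.
feed_b = [
    remap({"title": "Jive Software news", "url": "http://example.com/a",
           "date": datetime(2008, 2, 4, tzinfo=timezone.utc)},
          {"url": "link", "date": "published"}),
    remap({"title": "Another mention", "url": "http://example.com/c",
           "date": datetime(2008, 2, 6, tzinfo=timezone.utc)},
          {"url": "link", "date": "published"}),
]

merged = merge_feeds([feed_a, feed_b])
for item in merged:
    print(item["published"].date(), item["link"])
```

De-duplicating on the link is just one choice; two engines can report the same page under slightly different URLs, so a real filter might need to compare titles as well.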
Who else has played with this? Any Perl ninjas out there want to help me write logic (based on regular expressions) that can help standardize dates in all the various feeds to enable better sorting? Right now it's pretty hit or miss. Also, I'd love to explore publishing some shared "best-practice" OPML files (one for competitors' blogs, one for enterprise software blogs, etc.) that can be plugged in -- I was having trouble parsing the OPML file in the example "Splice" pipe.
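On the date question, here is one possible starting point, sketched in Python rather than as Pipes logic. It assumes the feeds use only two date styles: the RFC 822 style common in RSS (`Tue, 05 Feb 2008 14:30:00 -0800`) and the ISO 8601 style common in Atom (`2008-02-05T22:30:00Z`). Instead of hand-written regular expressions for everything, it leans on the standard library parsers and uses a regex only to tell the two styles apart. The function name `normalize_date` is made up, and treating dates without a time zone as UTC is an assumption.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def normalize_date(text):
    """Parse an RSS- or Atom-style date string and return it in UTC."""
    text = text.strip()
    if re.match(r"\d{4}-\d{2}-\d{2}", text):
        # ISO 8601 style; rewrite a trailing "Z" so older Pythons accept it.
        parsed = datetime.fromisoformat(re.sub(r"Z$", "+00:00", text))
    else:
        # RFC 822 style, as used by most RSS 2.0 feeds.
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: a date with no time zone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


rss_style = normalize_date("Tue, 05 Feb 2008 14:30:00 -0800")
atom_style = normalize_date("2008-02-05T22:30:00Z")
print(rss_style.isoformat())
print(atom_style.isoformat())
```

Once every item carries a UTC `datetime`, sorting across sources becomes a plain comparison. Feeds with looser date formats than these two would still need extra handling.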

