A number of people have recently asked me how to set up real-time alerts for
different things -- Jive, competitors, their own name (vanity?), etc.
I've tested a number of ways to facilitate this "persistent search"
functionality, whereby I am notified any time one of the many search
engines out there (Google, Yahoo, Ask, Technorati, etc.) finds a
specific keyword. Let's use "Jive Software" as an example. Here is what
I want to do:
- know any time a blog mentions the words "Jive Software"
- know any time a static website adds a page that contains "Jive Software"
- be notified ASAP so I can react
Now, to make this work, I have a couple of requirements:
- I want to avoid clogging my inbox, so this has to be done via RSS, not email (so Google Alerts won't fit the bill).
- I also want to over-subscribe and build logic to filter the noise
-- rather than subscribing to only one or two sources -- so I don't
miss anything. That rules out the persistent search functions built
into RSS readers like Attensa, where you could set up searches for both "Jive Software" and "Clearspace" and get two copies of every article that mentions both.
So, I turn to Yahoo Pipes.
This tool is absolutely fantastic. A life-saver. And for you Perl
ninjas out there, you can make it do even more than I can. So here's what
it does: takes in data (RSS/XML, JSON, whatever), lets you mash it
up (splice, sort, filter, rename, etc.), and then spits it out in
whatever fashion you want (RSS, JSON, email, text message).
Back to our "Jive Software" example. I want to set up a "pipe" (a
free-standing tool that sucks in data, processes it, and spits it back
out) to monitor this term. The quickest way to get one is to copy mine:
- Log in to your Yahoo account
- Go to http://pipes.yahoo.com/pipes
- Go to my profile http://pipes.yahoo.com/techpaulogy
- Hover over my pipe that says "Persistent Search – Jive"
- Click on the "Clone" link in the upper-right of the gray highlighted area that pops up
- Click on the "My Pipes" link in the top navbar
- You should see a new pipe in your list of Pipes that is a copy of mine.
- Click on the title of the copied pipe to go to its management
page. You'll see the current live results for "Jive Software" searches
on Google, Yahoo, Ask, Technorati, etc.
- Click on the "Edit Source" link. Now for the magic (click the image -- it's 1600 px wide, might need to resize or download to see it best):
Here, you can see the guts of the pipe. It uses modules to fetch the
source feeds, in one case re-maps some data fields to sync up with each
other, combines all the sources, sorts them by date, filters out
duplicates, and outputs the new feed. Voila! Play around. Now that it's
been copied to your account, you can break it to your heart's content.
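If you think better in code than in boxes and wires, here is a rough Python sketch of the same stages: re-map fields, combine, sort by date, filter duplicates. It is not how Pipes works internally, and everything in it is an assumption for illustration: the function names, the field names (`headline`, `url`, `date`), and the sample items are all made up, and a real version would fetch and parse the feeds instead of using hard-coded lists.

```python
from datetime import datetime, timezone


def remap(item, mapping):
    """Rename keys so every source uses the same field names."""
    return {mapping.get(key, key): value for key, value in item.items()}


def combine(*feeds):
    """Concatenate the items of all feeds into one list."""
    return [item for feed in feeds for item in feed]


def sort_newest_first(items):
    """Sort by the (already parsed) publication date, newest first."""
    return sorted(items, key=lambda item: item["published"], reverse=True)


def drop_duplicates(items, key="link"):
    """Keep only the first item seen for each value of `key`."""
    seen = set()
    unique = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            unique.append(item)
    return unique


def run_pipe(feeds_with_mappings):
    """Re-map each feed, then combine, sort, and de-duplicate."""
    normalized = [
        [remap(item, mapping) for item in feed]
        for feed, mapping in feeds_with_mappings
    ]
    return drop_duplicates(sort_newest_first(combine(*normalized)))


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical search results from two sources with different field names.
blog_results = [
    {"title": "Jive Software ships update", "link": "http://example.com/a",
     "published": utc(2007, 5, 2)},
    {"title": "Notes on Jive Software", "link": "http://example.com/b",
     "published": utc(2007, 5, 1)},
]
web_results = [
    {"headline": "Jive Software ships update", "url": "http://example.com/a",
     "date": utc(2007, 5, 2)},
    {"headline": "Jive Software profile", "url": "http://example.com/c",
     "date": utc(2007, 5, 3)},
]

result = run_pipe([
    (blog_results, {}),  # already uses the target field names
    (web_results, {"headline": "title", "url": "link", "date": "published"}),
])
for item in result:
    print(item["published"].date(), item["link"])
```

Duplicates are matched on the link here, which is only one possible choice; matching on the title would catch the same story published at two different URLs, at the cost of occasionally merging unrelated posts.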
Envision expanding this to also track searches for competitors, Jive
product names, industry analysts, etc. (you'll see another pipe in my account
called "Splice - Jive Feed" that combines this persistent search pipe
with a shared OPML file of enterprise software blogs -- it's a work in
progress).
Who else has played with this? Any Perl ninjas out there want to help
me write logic (based on regular expressions) that can help standardize
dates across all the various feeds to enable better sorting? Right now it's
pretty hit or miss. Also, I'd love to explore publishing some shared
"best-practice" OPML files (one for competitors' blogs, one for
enterprise software blogs, etc.) that can be plugged in -- I was having
trouble parsing the OPML file in the example "Splice" pipe.
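To make the date question concrete, here is a minimal sketch of the kind of logic I have in mind, written in Python rather than Perl and meant to run outside of Pipes. It assumes the feeds use one of the two common styles, RFC 822 dates (typical for RSS) or ISO 8601 dates (typical for Atom), and it treats a date with no time zone as UTC, which is a guess and not something the feeds guarantee. The function name `normalize_date` is made up for this example.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# ISO 8601 dates start with YYYY-MM-DD; anything else is tried as RFC 822.
ISO_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def normalize_date(text):
    """Return a UTC datetime for a feed date string, or None if unparseable."""
    text = text.strip()
    try:
        if ISO_PREFIX.match(text):
            # Rewrite a trailing "Z" as an explicit offset so that older
            # Python versions of fromisoformat() accept it too.
            parsed = datetime.fromisoformat(re.sub(r"[Zz]$", "+00:00", text))
        else:
            parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: dates without a time zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for raw in [
    "Wed, 02 May 2007 14:30:00 GMT",
    "2007-05-02T07:30:00-07:00",
    "2007-05-02T14:30:00Z",
    "not a date",
]:
    print(repr(raw), "->", normalize_date(raw))
```

Once every item carries a date in the same form, sorting the combined feed stops being hit or miss; items that come back as `None` can be dropped or pushed to the end.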