
Brmson / BlanQA

January 27th, 2014

I have recently been dabbling in Natural Language Processing, in particular Question Answering. I have been fascinated by the success of IBM Watson and have gradually come to believe that this technology can serve as a great basis for autonomous agents operating in the complex world of human knowledge. (I later came across Project Aristo – I’m not alone.) This approach, compared to projects like OpenCog that aim to create autonomous agents understanding and operating in the physical world, seems to offer many advantages – but let’s talk about that some other time.

Let’s say we wanted to take a stab at approximating IBM Watson with easily available technology, in “at home” conditions (or rather, “at hackerspace” – I gave this aim the temporary callsign “Project Brmson”). What’s the best we can do?

So I took a look at the current open source question-answering technologies and found – well, just one framework worth building on, and nothing that would be immediately usable by anyone. I have put together a short survey of the current landscape.

The only OSS framework I found that (i) could be used with not-so-many modifications to produce something functional, and (ii) would be a good base to build a truly good system on, is OAQA / OpenQA. It seems appealing from multiple viewpoints: it builds on the UIMA unstructured data processing platform, which also underlies IBM Watson; it originates at CMU, which collaborated with IBM in this area; and, well, it’s the only platform that already exists anyway, so it’s a good starting point for someone who has no prior clue about the field. An honorable mention goes to OpenEphyra, basically a non-UIMA OAQA predecessor from the same institution; it’s not a good base for new systems, but it can be mined for a lot of NLP functionality.

In my first stab, I checked whether there actually is a working QA system built on top of OAQA, and the answer was non-obvious. There is a helloqa project, but its master branch can currently do nothing useful. However, there is also a prototype branch that can actually answer some terrorism-related questions! It doesn’t work out of the box, but our fork does if you follow the instructions. Overall, though, the project seems to be a bit of a hack and not a good base for a universal system usable by anyone but the original author.


So I set out to rewrite the helloqa-prototype from scratch on top of OAQA and build a different, clean and extendable QA pipeline (that shares bits of the original code and is much simpler). Thus, behold the project BlanQA! :-)

BlanQA is focused on universality, practicality and user-friendliness. That means there is relatively detailed documentation and easy-to-follow installation instructions (try BlanQA out yourself!). By default, BlanQA offers an interactive mode and answers questions on top of a Project Gutenberg corpus; but you can also connect it to IRC (#brmson @ freenode) or run it on top of Wikipedia.

BlanQA is still a very stupid program at this point. It gets the answer right about 10-30% of the time, depending on how nicely you ask. But it’s more important as a base on top of which you can add clever algorithms (the smartest parts of BlanQA are currently borrowed from the OpenEphyra project, mainly guessing the type of the answer – is it a person? a location? an amount of something?). And if you want an OSS question-answering engine now, BlanQA is where to turn!
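To give a feel for what “guessing the type of the answer” means, here is a toy sketch in Python. It is not how OpenEphyra or BlanQA actually do it; the cue table, the type labels and the `guess_answer_type` function are all made up for illustration, and a real system would use far richer patterns or a trained classifier.

```python
import re

# Hypothetical cue table (an assumption for illustration only):
# ordered (pattern, answer type) pairs; the first matching pattern wins.
ANSWER_TYPE_CUES = [
    (r"^(who|whom|whose)\b", "PERSON"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "DATE"),
    (r"^how (many|much)\b", "AMOUNT"),
]


def guess_answer_type(question: str) -> str:
    """Guess the expected answer type from the opening words of a question."""
    normalized = question.strip().lower()
    for pattern, answer_type in ANSWER_TYPE_CUES:
        if re.search(pattern, normalized):
            return answer_type
    # No cue matched: the later pipeline stages get no type constraint.
    return "UNKNOWN"


if __name__ == "__main__":
    for q in ["Who wrote Hamlet?", "Where is Prague?", "Why is the sky blue?"]:
        print(q, "->", guess_answer_type(q))
```

The guessed type can then be used to filter candidate answers, e.g. keeping only person names for a “who” question. This also hints at why the hit rate depends on how nicely you ask: a question that does not start with a recognized cue falls through to `UNKNOWN`.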


I want to develop this further, but the way ahead remains a little unclear. The thing is, OAQA appears to have significant architectural problems, as I realized while I continued hacking on BlanQA and learning more about both OAQA and the UIMA framework it builds on. The rest of this section is a bit technical; cf. also a quick intro to BlanQA architecture.

The basic UIMA principle is that each artifact (in this case: question, document/passage, answer) should have its own CAS (“piece of data” with a set of annotations and other featuresets derived from it) with a dedicated type system and appropriate Sofa (view of this piece of data). This would enable easy creation of stand-off annotations of e.g. fetched documents.

However, the OAQA model works with just a single CAS that has only the question text set as its Sofa, plus a variety of types mashed together and partitioned only into phase-based views. This seems to me a substantially less appealing option: it doesn’t allow the use of third-party UIMA annotators that expect their subject to be the Sofa, it might hurt scale-out, and it seems generally awkward to use. I actually have a hard time seeing what advantages using UIMA brings to the table in this model.
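The difference between the two models can be sketched with a toy stand-in for a CAS. This is plain Python, not the UIMA API (real UIMA is Java, and its CAS is far richer); `ToyCas`, `Annotation` and `capitalized_word_annotator` are hypothetical names I made up for illustration, and the example texts are invented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Stand-off annotation: a typed span of offsets into the Sofa text."""
    type_name: str
    begin: int
    end: int


@dataclass
class ToyCas:
    """Toy stand-in for a CAS: one subject text (Sofa) plus annotations on it."""
    sofa: str
    annotations: list = field(default_factory=list)
    # Arbitrary non-Sofa data, e.g. passages stuffed into the question CAS.
    features: dict = field(default_factory=dict)

    def annotate(self, type_name: str, begin: int, end: int) -> Annotation:
        if not (0 <= begin <= end <= len(self.sofa)):
            raise ValueError("annotation span lies outside the Sofa")
        ann = Annotation(type_name, begin, end)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann: Annotation) -> str:
        return self.sofa[ann.begin:ann.end]


def capitalized_word_annotator(cas: ToyCas) -> None:
    """Stand-in for a third-party annotator: it only ever looks at cas.sofa."""
    for match in re.finditer(r"\b[A-Z][a-z]+\b", cas.sofa):
        cas.annotate("CapitalizedWord", match.start(), match.end())


QUESTION = "Who wrote Hamlet?"
PASSAGE = "Hamlet was written by William Shakespeare."

# Model 1: one CAS per artifact. The passage is its own Sofa, so the
# generic annotator works on it unchanged.
passage_cas = ToyCas(PASSAGE)
capitalized_word_annotator(passage_cas)
per_artifact = [passage_cas.covered_text(a) for a in passage_cas.annotations]

# Model 2: a single CAS whose Sofa is the question; the passage is just a
# feature value, so the same annotator never sees it.
single_cas = ToyCas(QUESTION, features={"passages": [PASSAGE]})
capitalized_word_annotator(single_cas)
single = [single_cas.covered_text(a) for a in single_cas.annotations]

if __name__ == "__main__":
    print("per-artifact CAS:", per_artifact)
    print("single CAS:", single)
```

In the second model, the annotator only ever annotates the question text; getting it to process the passage would need custom glue code around it, which is exactly the awkwardness described above.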

So it seems the way forward for BlanQA (or likely a differently-named successor) is to break away from OAQA and build directly on top of UIMA (possibly with a hacked version of uima-ecd that supports multiple CASes, but that seems a rather intimidating proposition).


Tue Jan 28 2014 update: Note that we have started work on a new Question Answering engine, YodaQA, built on UIMA from scratch.

Categories: software