Pasky’s Log

Brmson / BlanQA

January 27th, 2014 No comments

I have recently been dabbling in Natural Language Processing, in particular Question Answering. I have been fascinated by the success of IBM Watson and have gradually came to believe that this technology can serve as a great basis of autonomous agents operating in the complex world of human knowledge. (I later came across Project Aristo – I’m not alone.) This approach, compared to projects like OpenCog that aim to create autonomous agents understanding and operating in the physical world, seems to offer many advantages – but let’s talk about that some other time.

Let’s say we wanted to take a stab on approximating IBM Watson with easily available technology, in “at home” conditions (or rather, “at hackerspace” – I gave this aim a temporary callsign “Project Brmson”). What’s the best we can do?

So I took a look at the current open source question-answering technologies and found – well, just one, and none that would be immediately usable by anyone. I have put together a short survey of the current landscape.

The only OSS framework I found that (i) could be used with not-so-many modifications to produce something functional, and (ii) would be a good base to build a truly good system on, is OAQA / OpenQA. It seems appealing from multiple viewpoints – it builds on the UIMA unstructured data processing platform which is also at the basis of IBM Watson, it originates at CMU which collaborated with IBM in this area; and, well, it’s the only platform that already exists anyway, so it’s a good starting point for someone who has no prior clue about the field. A honorable mention goes to OpenEphyra, basically a non-UIMA OAQA predecessor by the same institution; it’s not a good base to use for new systems, but can be sourced for a lot of NLP functionality.

In my first stab, I looked if there is actually a working QA system built on top of OAQA, and the answer was non-obvious. There is a helloqa project, but its master branch can currently do nothing useful. However, there is also a prototype branch that can actually answer some terrorism-related questions! It doesn’t work out of the box, but our fork does if you follow the instructions. But overally the project seems to be a bit of a hack and not a good base for a universal system usable by anyone but the original author.

So I set out to rewrite the helloqa-prototype from scratch on top of OAQA and build a different, clean and extendable QA pipeline (that shares bits of the original code and is much simpler). Thus, behold the project BlanQA! :-)

BlanQA is focused on universality, practicality and user-friendliness. That means there is a relatively detailed documentation and easy to follow installation instructions (try BlanQA out yourself!). By default, BlanQA offers interactive mode and will answer on top of Project Gutenberg corpus; but you can also connect it to IRC (#brmson @ freenode) or run on top of Wikipedia.

BlanQA is still a very stupid program at this point. It gets the answer right about 10-30% of the time, depending on how nicely you ask. But it’s more important as a base on top of which you can add clever algorithms (the smartest parts of BlanQA are currently outsourced from the OpenEphyra project, mainly guessing the type of the answer – is it a person? location? amount of something?). And if you want an OSS question-answering engine now, BlanQA is where to turn!

I want to develop this further, but the way ahead remains a little unclear. The thing is, OAQA appears to have significant architectural problems, as I realized while I continued hacking BlanQA and learning more about both OAQA and the UIMA framework it builds on top of. The rest of this section is a bit technical, c.f. also a quick intro to BlanQA architecture.

The basic UIMA principle is that each artifact (in this case: question, document/passage, answer) should have its own CAS (“piece of data” with a set of annotations and other featuresets derived from it) with a dedicated type system and appropriate Sofa (view of this piece of data). This would enable easy creation of stand-off annotations of e.g. fetched documents.

However, the OAQA model works with just a single CAS that has just the question text set as a Sofa and then a variety of types mashed together, partitioned only into phase-based views. This seems to me as a substantially less appealing option – it doesn’t allow to use third-party UIMA annotators that expect their subject to be the Sofa, it might be harmful for scaleout and it seems generally awkward to use; I actually have hard time seeing what advantages does using UIMA bring on the table in this model.

So it seems the way forward for BlanQA (or likely a differently-named successor) is to break away of OAQA and build directly on top of UIMA (possibly with a hacked version of uima-ecd that supports multiple CAS, but that seems as a bit intimidating proposition).

Tue Jan 28 2014 update: Note that we have started work on a new Question Answering engine YodaQA built on UIMA from scratch.

Categories: software Tags: ai, brmlab, brmson, java, nlp, qa, research

Texas Instrument Launchpad MSP430 and Linux II

June 13th, 2012 4 comments

So, thanks to very helpful Rickta59 on #43oh IRC channel, I got my Launchpad v1.5 serial communication working. The key piece of information I was missing:

If you are using hardware UART,
you must rotate the RX-TX jumpers by 90 degrees!

This is even drawn on the board, but it just didn’t occur to me that I need to do this simple thing. Most examples seem to use hardware UART, and Energia Serial class also uses hardware UART.

It is still very flaky:

For the first ten seconds, communication is impossible. Wait for timeout messages to appear in dmesg, then you can start communication.
When the board is sending data, something must be reading them on the host side. If not, the driver collapses and you need to replug the device.
The latter might be circumvented by direct USB communication without involving the tty driver.

So, it is rather fragile, but usable! Let’s enjoy our Launchpads for projects where this is not a big issue…

Categories: hardware, life Tags: arduino, brmlab, hardware, launchpad, microcontroller, msp430, usb

Texas Instrument Launchpad MSP430 and Linux

June 11th, 2012 2 comments

I found out that the situation with MSP430 is not as bad as it seemed. This post is mostly obsolete, but I’m leaving the text up for the benefit of Google index and other desperate people struggling with their Launchpad. :-)

~~This blogpost serves as a big fat warning to the future ones that might be about to follow in my footsteps:~~

Currently sold TI Launchpad MSP430
is not properly supported by Linux
as of 2012-06-01

It’s a sad reality but that’s just how it is, to the best of my knowledge, and after a lot of research and doing unbelievable things to kernel drivers etc. To clarify a bit, basic programming using mspdebug works, but you cannot communicate between host and board using USB serial. This seems to have worked with much older USB chips but not with the ones used by TI in current versions of the board (I got Launchpad with MSP-EXP430G2 ordered in May 2012).

Some fun technical details to help google index and guide others diagnosing this:

[186808.775510] usb 1-1.2: new full-speed USB device number 7 using ehci_hcd
[186808.891778] usb 1-1.2: New USB device found, idVendor=0451, idProduct=f432
[186808.891788] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[186808.891794] usb 1-1.2: Product: Texas Instruments MSP-FET430UIF
[186808.891800] usb 1-1.2: Manufacturer: Texas Instruments
[186808.891804] usb 1-1.2: SerialNumber: CFFF4695F6C11445
[186808.924900] cdc_acm 1-1.2:1.0: This device cannot do calls on its own. It is not a modem.
[186808.924914] cdc_acm 1-1.2:1.0: No union descriptor, testing for castrated device
[186808.925029] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
[186808.927595] usbcore: registered new interface driver cdc_acm
[186808.927603] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[186818.963279] generic-usb 0003:0451:F432.0001: usb_submit_urb(ctrl) failed
[186818.963332] generic-usb 0003:0451:F432.0001: timeout initializing reports
[186818.964177] generic-usb 0003:0451:F432.0001: hiddev0,hidraw0: USB HID v1.01 Device [Texas Instruments Texas Instruments MSP-FET430UIF] on usb-0000
:00:1a.0-1.2/input1
[186818.964262] usbcore: registered new interface driver usbhid
[186818.964269] usbhid: USB HID core driver

This is what my dmesg says the first time the board is plugged in. mspdebug works fine but any attempt of serial communication over /dev/ttyACM0 (talking to TI-provided sample UART code). OBTW if you are actually wondering how to compile and upload stuff on this baby:

msp430-gcc -mmcu=msp430g2553 -Wall -O3 -o uart_01_9600 msp430g2xx3_uscia0_uart_01_9600.c
mspdebug rf2500 prog\ uart_01_9600

For USB interface, TI includes its own crazy USB-enabled microcontroller on board that provides a HID-ish interface (for mspdebug) and an ACM-ish interface (for UART emulation) on a single port (which is nicely confusing). The serial part is supposed to be handled by ti_usb_3410_5052 kernel driver, which grabs a firmware and attempts to reflash the USB microcontroller so that it presents a more sensible serial USB interface (pretty crazy, eh?). However, the rf2500 variant of this chip appears to be too new and simply not supported either by the firmware or the firmware uploader.

Tweaking USB ids in the driver (f430 -> f432) does not help. Getting ti_3410.fw that Debian helpfully does not ship does not help. Manually binding the driver to USB does not help. The furthest I get is that the driver indeed tries to flash the ti_3410.fw firmware to device, but just times out doing that (I think maybe I bricked the serial part of the USB microcontroller by now):

[193053.430662] ti_usb_3410_5052 1-1.2:1.0: TI USB 3410 1 port adapter converter detected
[193054.443490] usb 1-1.2: ti_download_firmware - error downloading firmware, -110
[193054.443528] ti_usb_3410_5052: probe of 1-1.2:1.0 failed with error -5

Oh, and mspdebug rf2400 exit before any serial communication (I have found a tip somewhere) does not help either. An obviously-working UART code for MSP430G2553 would be welcome too, to triple-rule-out a uC-side firmware problem. (The launchpad board is awesome but rx/tx leds are sorely missing. I know, I could grab an oscilloscope… but how many hours have I already wasted by this?)

So, what seemed to be a great Arduino replacement turns to dust for me since the whole point of 80% of my Arduino projects is to talk to a computer… That said, if (after) you make it work, you will get one, or maybe even two Launchpads for free from me.

Categories: hardware, linux Tags: arduino, brmlab, hardware, launchpad, microcontroller, msp430, usb

Realtime Signal Analysis in Perl

September 24th, 2011 No comments

Archive

Brmson / BlanQA

Texas Instrument Launchpad MSP430 and Linux II

Texas Instrument Launchpad MSP430 and Linux

Realtime Signal Analysis in Perl