Pasky’s Log

YodaQAâ€™s abilities are enlarged by traffic domain

May 23rd, 2016 1 comment

Guest post by Petr Marek (source)

Everybody driving a car needs the navigation to get to the destination fast and avoid traffic jam. One of the biggest problems is how to enter fast the destination and how to find where are the congestions, what is the traffic situation. YodaQA Traffic is a project attempting to answer the traffic related questions quickly and efficiently. Drivers may ask questions in natural language like: â€œWhat is the traffic situation in the EvropskÃ¡ street?â€ or â€œWhat is the fastest route from Opletalova street to Kafkova street?â€ You can try out the prototype (demo available only for limited time) – try to ask for example â€œtraffic situation in the Wilsonova streetâ€ .

YodaQA Traffic still has some limitations. Currently we only have a browser version not suitable for smart phones. It is answering traffic questions for Pragueâ€™s streets only.

But as usual, this whole technology demo is open source – you can find it in the branch f/traffic-flow of our Hub project.

How does it work and where we get the data from?

All YodaQA are first analyzed to recognize and select traffic questions. We do it in two steps. The first step is to recognize the question topic. We use six topics like traffic situation, traffic incident or fastest route. The topic is determined by comparing semantic similarity of the userâ€™s question with a set of reference questions. We estimate the similarity with our Dataset-STS Scoring API. Each reference question is labeled by a â€œtopicâ€. The Sentence Pair Similarity algorithm selects the reference question â€œtopicâ€ with the highest similarity to the question.

Next we need to recognize the location, i.e. to recognize the street name. This is handled by another tool called the Label-lookup which we normally use for entity linking in YodaQA. It compares questions words with a list of all street names in the Prague. We exported the list of streets names in Prague from OpenStreetMap. We do not do exact match, we try to select the closest street name from the list.

The last step is to decide whether the question is really the traffic question, because the Dataset-STS API and Label-lookup can find topic and street name even in a pure movie question like â€œWhen was the Nightmare on Elm Street released?â€. The Dataset-STS and Label-lookup return not only topic or street name but also the score, fortunately. We created dataset of over 70 traffic questions and over 300 movies questions and founded the minimal score thresholds, with which the recognition makes the lowest classification error on this dataset.

Once we know the type of question and the location we start a small script accessing the traffic situation data from HERE Maps. The only complication is that the the API doesnâ€™t return traffic situation for particular street, but bounding box only. To overcome this problem we have to find a bounding box for a desired location, using an algorithm we developed for this purpose. Then we call the traffic flow API to acquire the information for all streets in the bounding box. Finally, we filter out the traffic situation for the desired street.

It was great fun to work on this application, it is not perfect but it shows how to create intelligent assistants helping people solving various everyday situations. We are also excited to see, how the users will use the new functionality of YodaQA and how it will help them.

Categories: ailao, software Tags: 3c, dataset-sts, geo, guest, maps, nlp, sps, yodaqa

Semantic Sentence Pair Scoring

May 20th, 2016 No comments

The blog has been a little bit silent – a typical sign of us working too hard to worry about that! But we’ll satisfy some of your curiosity in the coming weeks as we have about six posts in the pipeline.

The thing I would like to mention first is some fundamental research we work on now. I stepped back from my daily Question Answering churn and took a little look around and decided the right thing to focus for a while are the fundamentals of the NLP field so that our machine learning works better and makes more sense. Warning: We’ll use some scientific jargon in this one post.

So, in the first months of 2016 I focused huge chunk of my research on deep learning of natural language. That means neural networks used on unstructured text, in various forms, shapes and goals. I have set some audacious goals for myself, fell short in some aspects but still made some good progress hopefully. Here’s the deal – a lot of the current research is about processing a single sentence, maybe to classify its sentiment or translate it or generate other sentences. But I have noticed that recently, I have seen many problems that are about scoring a pair of two sentences. So I decided to look into that and try to build something that (A) works better, (B) actually has an API and we can use it anywhere for anything.

My original goal was to build awesome new neural network architectures that will turn the field on its head. But I noticed that the field is a bit of a mess – there is a lot of tasks that are about the same thing, but very little cross-talk between them. So you get a paper that improves the task of Answer Sentence Selection, but could the models do better on the Ubuntu Dialogue task then, or on Paraphrasing datasets? Who knows! Meanwhile, each dataset has its own format and a lot of time is spent only in writing the adapter code for it. Training protocols (from objectives to segmentation to embedding preinitializations) are inconsistent, and some datasets need a lot of improvement. Well, my goal turned to sorting out the field, cross-check the same models on many tasks and provide a better entry point for others than I had.

Software: Getting a few students of the 3C group together, we have created the dataset-sts platform for all tasks and models that are about comparing two sentences using deep learning. We have a pretty good coverage (of both tasks and models), and more brewing in some side branches. It’s in Python and uses the awesome Keras deep learning library.

Paper: To kick things off research-wise, we have posted a paper Sentence Pair Scoring: Towards Unified Framework for Text Comprehension where we summed up what we have learned early in the process. A few highlights:

We have a lofty goal of building an universal text comprehension model, a sort of black box that eats your sentences and produces embeddings that correspond to their meaning, which you can use for whatever task you need to do. Long way to go, but we have found that a simple neural model trained on very large data is doing pretty good in this exact setting, and even if applied to tasks and data that look very different from the original. Maybe we are on to something.
Our framework is state-of-art on the Ubuntu Dialogue dataset of 1M techsupport IRC dialogs, beating Facebook’s memory network models.
It’s hard to compare neural models because if you train a model 16 times with the same data, the result will always be somewhat different. Not a big deal with large test datasets, but a very big deal with small test datasets which are still popular in the research community. Almost all papers ignore this! If you look at evolution of performance of models in some areas like Answer Sentence Selection, we have found that most differences over the last year are deep below per-train variance we see.

Please take a look, and tell us what you think! We’ll shortly cover a follow-up paper here that we also already posted, and we plan to continue the work by improving our task and model coverage further, fixing a few issues with our training process and experimenting with some novel neural network ideas.

More to come, both about our research and some more product-related news, in a few days. We will also talk about how the abstract-sounding research connects with some very practical technology we are introducing.

Categories: ailao, software Tags: 3c, dataset-sts, deep learning, keras, nlp, sps

Keras for Binary Classification

January 13th, 2016 5 comments

So I didn’t get around to seriously (besides running a few examples) play with Keras (a powerful library for building fully-differentiable machine learning models aka neural networks) – until now. And I have been a bit surprised about how tricky it actually was for me to get a simple task running, despite (or maybe because of) all the docs available already.

The thing is, many of the “basic examples” gloss over exactly how the inputs and mainly outputs look like, and that’s important. Especially since for me, the archetypal simplest machine learning problem consists of binary classification, but in Keras the canonical task is categorical classification. Only after fumbling around for a few hours, I have realized this fundamental rift.

The examples (besides LSTM sequence classification) silently assume that you want to classify to categories (e.g. to predict words etc.), not do a binary 1/0 classification. The consequences are that if you naively copy the example MLP at first, before learning to think about it, your model will never learn anything and to add insult to injury, always show the accuracy as 1.0.

So, there are a few important things you need to do to perform binary classification:

Pass output_dim=1 to your final Dense layer (this is the obvious one).
Use sigmoid activation instead of softmax – obviously, softmax on single output will always normalize whatever comes in to 1.0.
Pass class_mode='binary' to model.compile() (this fixes the accuracy display, possibly more; you want to pass show_accuracy=True to model.fit()).

Other lessons learned:

For some projects, my approach of first cobbling up an example from existing code and then thinking harder about it works great; for others, not so much…
In IPython, do not forget to reinitialize model = Sequential() in some of your cells – a lot of confusion ensues otherwise.
Keras is pretty awesome and powerful. Conceptually, I think I like NNBlocks‘ usage philosophy more (regarding how you build the model), but sadly that library is still very early in its inception (I have created a bunch of gh issues).

(Edit: After a few hours, I toned down this post a bit. It wasn’t meant at all to be an attack at Keras, though it might be perceived by someone as such. Just as a word of caution to fellow Keras newbies. And it shouldn’t take much to improve the Keras docs.)

Categories: ailao, software Tags: keras, ml, nlp, python

Linked Data Mashups

November 13th, 2015 No comments

I’m still working on YodaQA and there is quite some interest in it in my mailbox. One thing leads to another and our startup Ailao already has a few first customers, we work together on various related semantic NLP / search projects.

In YodaQA, we have a much neater web interface as well as a mobile app as the natural way to interact with a QA system is using your voice. Plus, on a limited domain (movies), we are getting pretty close to crossing the 80% mark for accuracy on simpler questions, entering the “magic zone” where people might start really trusting the system. A few essential blocks for that are still in the pipeline, though.

I’ll try to post a bit more about YodaQA and other work we are doing in the coming weeks / months (as well as some of my hobby projects, of course).

For a course of Jan Å edivÃ½, I prepared a presentation on building apps around the semantic web and linked data. See it here for an intro to the tech, it also includes two silly web mashups that might be inspiring.

Categories: ailao, software Tags: lod, mashup, nlp, rdf, sparql, yodaqa

Brmson / BlanQA

January 27th, 2014 No comments

I have recently been dabbling in Natural Language Processing, in particular Question Answering. I have been fascinated by the success of IBM Watson and have gradually came to believe that this technology can serve as a great basis of autonomous agents operating in the complex world of human knowledge. (I later came across Project Aristo – I’m not alone.) This approach, compared to projects like OpenCog that aim to create autonomous agents understanding and operating in the physical world, seems to offer many advantages – but let’s talk about that some other time.

Let’s say we wanted to take a stab on approximating IBM Watson with easily available technology, in “at home” conditions (or rather, “at hackerspace” – I gave this aim a temporary callsign “Project Brmson”). What’s the best we can do?

So I took a look at the current open source question-answering technologies and found – well, just one, and none that would be immediately usable by anyone. I have put together a short survey of the current landscape.

The only OSS framework I found that (i) could be used with not-so-many modifications to produce something functional, and (ii) would be a good base to build a truly good system on, is OAQA / OpenQA. It seems appealing from multiple viewpoints – it builds on the UIMA unstructured data processing platform which is also at the basis of IBM Watson, it originates at CMU which collaborated with IBM in this area; and, well, it’s the only platform that already exists anyway, so it’s a good starting point for someone who has no prior clue about the field. A honorable mention goes to OpenEphyra, basically a non-UIMA OAQA predecessor by the same institution; it’s not a good base to use for new systems, but can be sourced for a lot of NLP functionality.

In my first stab, I looked if there is actually a working QA system built on top of OAQA, and the answer was non-obvious. There is a helloqa project, but its master branch can currently do nothing useful. However, there is also a prototype branch that can actually answer some terrorism-related questions! It doesn’t work out of the box, but our fork does if you follow the instructions. But overally the project seems to be a bit of a hack and not a good base for a universal system usable by anyone but the original author.

So I set out to rewrite the helloqa-prototype from scratch on top of OAQA and build a different, clean and extendable QA pipeline (that shares bits of the original code and is much simpler). Thus, behold the project BlanQA! :-)

BlanQA is focused on universality, practicality and user-friendliness. That means there is a relatively detailed documentation and easy to follow installation instructions (try BlanQA out yourself!). By default, BlanQA offers interactive mode and will answer on top of Project Gutenberg corpus; but you can also connect it to IRC (#brmson @ freenode) or run on top of Wikipedia.

BlanQA is still a very stupid program at this point. It gets the answer right about 10-30% of the time, depending on how nicely you ask. But it’s more important as a base on top of which you can add clever algorithms (the smartest parts of BlanQA are currently outsourced from the OpenEphyra project, mainly guessing the type of the answer – is it a person? location? amount of something?). And if you want an OSS question-answering engine now, BlanQA is where to turn!

I want to develop this further, but the way ahead remains a little unclear. The thing is, OAQA appears to have significant architectural problems, as I realized while I continued hacking BlanQA and learning more about both OAQA and the UIMA framework it builds on top of. The rest of this section is a bit technical, c.f. also a quick intro to BlanQA architecture.

The basic UIMA principle is that each artifact (in this case: question, document/passage, answer) should have its own CAS (“piece of data” with a set of annotations and other featuresets derived from it) with a dedicated type system and appropriate Sofa (view of this piece of data). This would enable easy creation of stand-off annotations of e.g. fetched documents.

However, the OAQA model works with just a single CAS that has just the question text set as a Sofa and then a variety of types mashed together, partitioned only into phase-based views. This seems to me as a substantially less appealing option – it doesn’t allow to use third-party UIMA annotators that expect their subject to be the Sofa, it might be harmful for scaleout and it seems generally awkward to use; I actually have hard time seeing what advantages does using UIMA bring on the table in this model.

So it seems the way forward for BlanQA (or likely a differently-named successor) is to break away of OAQA and build directly on top of UIMA (possibly with a hacked version of uima-ecd that supports multiple CAS, but that seems as a bit intimidating proposition).

Tue Jan 28 2014 update: Note that we have started work on a new Question Answering engine YodaQA built on UIMA from scratch.

Categories: software Tags: ai, brmlab, brmson, java, nlp, qa, research

Pasky’s Log

Archive

YodaQAâ€™s abilities are enlarged by traffic domain

How does it work and where we get the data from?

Semantic Sentence Pair Scoring

Keras for Binary Classification

Linked Data Mashups

Brmson / BlanQA

Recent Comments

Categories

Blogroll

Licence