Thursday, December 2, 2010

Exciting Day at Nexiwave

Everyone at Nexiwave is excited today: this is the announcement date for Nexiwave Audio Search being offered as a standard feature on the UbiCast video hosting platform. More news will be coming out in 30 minutes from Nexiwave, UbiCast and NVIDIA.

Thursday, August 19, 2010

Our MIT Cluster

Nexiwave has quite a strong tie with MIT... Well, we still have a big cluster at an MIT facility. This cluster is also shared with some MIT student-run activities. The recent addition of two GPU machines has given us trouble with MIT Facilities... They have come three times to reset the circuit breaker for us. We are obviously drawing too much power, and we had to turn off half of the machines. Looks like a proper solution is required to solve this once and for all.

Thursday, July 15, 2010

Java ThreadPool and Large Training

A slightly shocking discovery about the Java ThreadPoolExecutor implementation: our large training cluster would somehow never fill up with a specific kind of training task. The culprit was eventually identified as the ThreadPoolExecutor, which gracefully queued up all the tasks. I had always thought new threads would be created whenever the queue is not empty and the number of threads is less than maximumPoolSize. Obviously, I did not read the fine print in the JavaDoc: once corePoolSize threads are running, a new thread is created only when the queue refuses a task. The JavaDoc describes three queuing strategies: Direct Handoffs (never queue, always create a new thread until the maximum is reached), Unbounded Queues (always queue, so the maximum is never reached) and Bounded Queues (queue up to a certain size, then start creating threads up to the maximum, then reject).

My assumption, Direct Handoffs + Unbounded Queue (use as many threads as possible until maximumPoolSize is reached, then queue up without bound), was obviously not one of them, even though I would think this is definitely the favourable strategy in most cases. Anyway, we implemented this using a customized LinkedBlockingQueue.
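
One way such a strategy can be sketched (not necessarily how ours is written) is a queue whose offer() always refuses, so the executor tries to start a thread first, combined with a rejection handler that enqueues the task once the maximum is reached. The names GrowFirstPool and HandoffFirstQueue are made up for this illustration, and the 60-second keep-alive is an arbitrary choice:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch: grow to the maximum number of threads first, then queue without bound. */
final class GrowFirstPool {

    /** Hypothetical queue that refuses offer() so the executor tries to add a thread first. */
    static final class HandoffFirstQueue extends LinkedBlockingQueue<Runnable> {
        private static final long serialVersionUID = 1L;

        @Override
        public boolean offer(Runnable task) {
            // ThreadPoolExecutor.execute() queues via offer(); returning false
            // makes it attempt to start a new worker (up to maximumPoolSize).
            return false;
        }
    }

    static ThreadPoolExecutor create(int maxThreads) {
        // Invoked only when no more threads may be started: queue the task instead.
        RejectedExecutionHandler enqueueWhenSaturated = (task, executor) -> {
            if (executor.isShutdown()) {
                throw new RejectedExecutionException("pool is shut down");
            }
            try {
                // put() does not go through the overridden offer(), so the task is queued.
                executor.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while queueing", e);
            }
        };
        // corePoolSize 0 lets idle threads die after the keep-alive time.
        return new ThreadPoolExecutor(0, maxThreads, 60L, TimeUnit.SECONDS,
                new HandoffFirstQueue(), enqueueWhenSaturated);
    }
}
```

Note that this sketch starts a new thread for every task until the maximum is reached, even if another thread is idle. If the threads never need to shrink back, setting corePoolSize equal to maximumPoolSize with a plain LinkedBlockingQueue gives much the same behaviour with no custom code.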

Lesson learned: (like dealing with the bank) always read the fine print ;).

Thursday, May 27, 2010

Computing Paradigm changes...

Nick has a very interesting point in his entry here: we effectively moved speech indexing/recognition techniques to a different domain with the introduction of the GPU. I couldn't agree with him more. With the performance gain we reported, many techniques for speeding up acoustic scoring become more or less irrelevant, since the bottleneck has shifted to other areas. In this case, the grow step itself.

This is clear from the performance matrix that I posted.

In other news, we are attacking the remaining "grow" part...

Friday, May 21, 2010

GPU and Speech Processing

We have been planning this for ages: utilizing the GPU in Nexiwave's speech processing. Now we have finally done it. It has not yet been officially released to our production decoding environment, but our test environment already shows a dramatic speed improvement with the GPU. A few things that are well worth noting:
  • do follow the best practices from NVIDIA :)
  • the critical best practices are:
      • find those loops
      • memory coalescing
The first is quite obvious. The second is subtle: adjacent GPU threads should access adjacent memory addresses so that the hardware can combine their accesses. Anyway, our speed data is as follows:

with GPU:
[java] Score 2134 0.0000s 0.0000s 0.0570s 0.0084s 17.9790s
[java] Grow 11238 0.0000s 0.0000s 0.1950s 0.0040s 45.2970s
[java] Scoring-Cuda 2124 0.0010s 0.0000s 0.0020s 0.0010s 2.1580s

without GPU (64 bit):
[java] Score 2134 0.0000s 0.0000s 0.4000s 0.0488s 104.1230s
[java] Grow 11238 0.0000s 0.0000s 0.1980s 0.0032s 36.0170s

without GPU (32 bit):
[java] Score 9607 0.0000s 0.0000s 0.2150s 0.0343s 329.9720s
[java] Prune 57599 0.0000s 0.0000s 0.0910s 0.0005s 30.7650s
[java] Grow 57642 0.0000s 0.0000s 0.7900s 0.0059s 339.4380s

Quite amazing, isn't it? Total acoustic scoring time (the last column) dropped from 104s to about 18s. Also note that the GPU part itself only took about 2s.

edit: If you must know how much was shipped to the GPU: that's 15 million loops per 0.1s of audio.
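
For reference, the headline ratios can be worked out from the 64-bit totals above, assuming the last column is the total time (which matches the scoring totals quoted above). The class and method names are only for illustration:

```java
/** Illustrative arithmetic on the 64-bit totals reported above (last column, in seconds). */
final class ScoringSpeedup {
    // Totals copied from the timing output in this post.
    static final double SCORE_CPU = 104.1230;
    static final double SCORE_GPU = 17.9790;
    static final double GROW_CPU = 36.0170;
    static final double GROW_GPU = 45.2970;

    /** How many times faster the scoring stage became with the GPU. */
    static double scoreSpeedup() {
        return SCORE_CPU / SCORE_GPU;
    }

    /** Fraction of the combined Score + Grow time that is spent in Grow. */
    static double growShare(double score, double grow) {
        return grow / (score + grow);
    }

    public static void main(String[] args) {
        System.out.printf("score speedup: %.2fx%n", scoreSpeedup());
        System.out.printf("grow share without GPU: %.0f%%%n", 100 * growShare(SCORE_CPU, GROW_CPU));
        System.out.printf("grow share with GPU: %.0f%%%n", 100 * growShare(SCORE_GPU, GROW_GPU));
    }
}
```

Scoring comes out a bit under 6x faster, while Grow goes from roughly a quarter of the combined time to well over two thirds of it.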

Monday, May 10, 2010

Java Coding standard wars

Nick and I have had quite a few fights over technical stuff. I guess I was educated too much by Yoav Shapiro, SDM06, regarding the spaghetti code theory. I have to say I totally agree with Yoav that it is totally necessary in a start-up environment: get the stuff done first. Well, I sure have seen that in many startup companies ;).

Nick, on the other hand, is from the dream world. Well, I have to say Nick did teach me some good discipline over the years, such as demanding detailed comments when checking in code. No complaints about that. However, I do think we can lose some trivial stuff, such as:
  • Line length should be no more than 80 characters (he keeps reformatting my code and makes me verify it every time ;)). Who cares in today's world: we all have big monitors and we never print code on paper (in order to save the environment ;)).
  • No underscores in class names. I am actually not sure if this is bad practice. Maybe it is.
  • Whether closing braces should be on the same line or not. Come on, I just like keeping them on the same line. It looks cleaner.
Anyhow, I suppose we should compromise and meet in the middle. However, I still think the first priority is to get the job done.

Eclipse WTP "Java EE Module Dependencies" not saving, or persisting

OK, I fumbled with the "amazing" Eclipse WTP project and its nice "Java EE Module Dependencies" option. There were a few projects that I just couldn't add to the web project as Java Utility Project dependencies. They were working before (for nearly two years ;)). Apparently (as the svn log says: Nick), Nick's cleanup effort two weeks ago removed a needed "nature" setting from the .project file of those projects.

The solution was simple indeed. Just add this nature to the .project file of the Java utility projects:


There went over 8 hours of my time ;). And here are my two wishes:
1. Eclipse WTP should really give a warning if this nature is missing. Currently, there is no warning whatsoever. This is very bad, verrrrrrrrrrrry bad.
2. Nick can do more meaningful work than messing around with settings like this ;). (No offence, Nick :)).

As far as the nature goes, you may also need this nature:


Thursday, April 29, 2010

Amazing Audio Search Accuracy Achieved

OK, this is really a back-post. During our tests back in February, our system achieved amazing search accuracy: with three-word searches, we had 93% search accuracy. Note that this also means proper search ranking: the target result is within the top three search results.

Like any other search engine, the more words used in a search, the better job we do. Our five-word search accuracy was a whopping 97%.

If you are interested in the details of these tests, feel free to give me a shout.

Dreadful IE/FF compatibility issues

During the past two weeks, Xenia, our head of web marketing, has rolled out a nice new look for the landing pages of both Nexiwave and SearchMyMeetings. Not sure if you noticed, but we hit an IE/FF compatibility problem. Even though all pages looked quite fine in FF, they looked quite bad in IE, especially the images.

As usual, the problem is yet again the CSS selector support in IE. Dreamweaver, which Xenia was using, generated the child selector "header-link>IMG", which FF happily honours, but IE just ignores.

The fix was simple enough: replace the > with a space, which turns the child selector into a descendant selector that IE understands. Problem solved.

Monday, April 26, 2010

ant junit task: when forking, clonevm and dir don't play nicely together

While writing the site monitor code, I decided to pass some control variables to the test cases using Java system properties. I edited the ant junit task and set clonevm="true".

Apparently, this broke the dir attribute that was also being passed to the junit task. I noted (after many hours of wondering "heh, it was working before") that "dir" will not take effect if you set clonevm to true. It seems clonevm also clones the current directory property.

It might be a good idea for junit task to still honour the dir property even if clonevm is true.
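
Until then, one possible workaround is to have the tests resolve their base directory from an explicit system property rather than from the forked JVM's working directory. This is a minimal sketch, not our actual monitor code: the class name TestSettings and the property name monitor.base.dir are hypothetical, and the property would have to be supplied by the build (for example through a sysproperty element nested in the junit task).

```java
import java.io.File;

/** Illustrative helper for tests that are configured through Java system properties. */
final class TestSettings {

    /** Reads a control variable, falling back to a default when it is not set. */
    static String controlValue(String name, String fallback) {
        return System.getProperty(name, fallback);
    }

    /**
     * Resolves the directory the tests should work in. The hypothetical
     * "monitor.base.dir" property wins; otherwise the JVM's current
     * directory ("user.dir") is used.
     */
    static File baseDir() {
        String dir = System.getProperty("monitor.base.dir", System.getProperty("user.dir"));
        return new File(dir).getAbsoluteFile();
    }
}
```

A test would then open its files relative to TestSettings.baseDir() instead of relying on relative paths.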

First blog

OK, so this blog is for the tech people at Nexiwave. I hope that we can dump some of our interesting findings into this blog, and that it can serve the greater good :).