2018. március 26., hétfő

Flickering with Google Chrome on KUbuntu 16.04

There are many flickering issues with Intel GPU and Google Chrome on Linux. I use Ubuntu 16.04 with KDE. I hide the window title bars to get the maximal usable screen space on my laptop. Chrome flickered with some web pages heavily, but only if the window title bar was hidden. I tried a few fixes from askubuntu:


But eventually it turned out, I have to change the KWin rendering backend to XRender and the flickering stopped without disabling any GPU acceleration.

2017. május 8., hétfő

Zapcc Compiler

Once I noticed that a small startup company (Ceemple) introduced a modified Clang compiler (Zapcc) to reduce the compilation time. Instead of a disk-based cache like ccache, they cache some intermediate compilation pass results in memory (at least that is my bet) and reuse it for later compilation units. I modified the CMake build system in my robotics research project (AiBO+) and measured their claims. I found out that they don't overpromise. My project is about 90 kLOC now and zapcc was 30% faster than vanilla Clang 4.0.

Gcc 5.4: 6:40 min
Clang 4.0: 6 min
Zapcc: 4 min

I highly recommend their product. They don't just promise, but deliver. And it is really a drop-in replacement for gcc/Clang on Linux 64-bit environment.

2016. november 10., csütörtök

When you can use folly::ThreadLocalPtr instead of boost::thread_specific_ptr

In this post, I compare two solutions for thread local storage:

1. boost::thread_specific_ptr in Boost library which provides a cross-platform solution to store data per thread.
URL: http://www.boost.org/

2. Facebook's folly has folly::ThreadLocalPtr which has a very similar interface to boost::thread_specific_ptr for easy adaption and it is claimed to be 4x faster. The disadvantage of folly is that it supports only 64 bit Mac OS X+Linux distributions (e.g no Windows) and it requires a modern C++ compiler with C++11 support.

URL: https://github.com/facebook/folly
URL: https://github.com/facebook/folly/blob/master/folly/docs/ThreadLocal.md

I did not write a synthetic case for performance testing which may justify how faster folly is, but I come up with a real world use case from my research project. I develop an artificial intelligence for the old Sony ERS-7 robot dogs, and as part of my efforts, I can simulate the AI in a thread on my host Linux machine to use multithreaded testing. When I replaced Boost with folly in the testing, I was surprised that it is really faster. Obviously not 4x, but it is still much faster. On the other hand, I saw a significant difference in the folly performance when different compilers were used and decided to create this blog post.

Test case: I run one AI simulation test case in a single thread 100 times and the averaged runtime is shown in the diagrams below. In one place, my code was inefficient, using the thread local pointer (TLP) inside a frequently called loop. Because of this performance bug, the long execution time was mainly caused by this TLP bottleneck. After I moved the TLP usage outside the loop, the performance gains were not relevant with folly anymore. I still think that it has some value to publish these numbers how folly can improve a situation when TLP is heavily used or not.

Compiler setup

Important compiler switches: -O3 -std=gnu++14
I-don't-think-so-relevant compiler switches: -mfpmath=sse -march=core2 -fPIC

Gcc 5.4 came the official Ubuntu Xenial repositories
Gcc 6.2 was installed from a toolchain testing PPA for Ubuntu (https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test)
Clang 3.9 and 4.0 (4.0~svn286079) were installed from LLVM repositories (http://apt.llvm.org).

Case 1

So in this case, the TLP was used inefficiently, causing a large part of the runtime. As we can see, the the Boost results are almost identical, gcc is a bit faster than clang. However, Folly is not only faster with gcc, but Clang provides a much better performance than gcc. Although gcc 6 improved the performance a bit over gcc 5, but clang 4.0 is still 17.7 % faster than gcc 6.2.

Left axis is execution time in milliseconds. Lower is better.

Case 2

After the TLP usage was fixed in my codes, the compilers delivered almost identical results since the TLP access time did not play such main role like in Case 1. Boost and folly results are quite similar, but clang was a bit faster with Folly by a small margin. gcc 5.4 was unexpectedly faster with Folly, but I would assume it was a coincidence of some optimizations since that compiler was the slowest with folly in Case 1.

Left axis is execution time in milliseconds. Lower is better.


No silver bullets here. If you have a program which heavily use thread local pointers under Linux or Mac OS X, it is recommended to try folly to gain some speed. Otherwise Boost provides a generic cross-platform solution when TLP is not used all over the place.

2016. március 11., péntek

Unity Build macro for CMake

A half year ago, I came across some techniques to speed up the C++ compilation for bigger projects. As my evergrowing AiBO+ project is around 87000 C++ source lines without 3rd party libraries now, it really became important to keep the compilation time low. Apart from the ccache integration in my CMake build system, I decided to try out the Unity Build method. When I looked around the internet, I found that the Unity Build would shorten the build duration significantly and some people crafted some small CMake scripts, but these solutions were incomplete. Either they did not handle the Qt's moc files at all or all Unity files were regenerated each time when a CMake reconfigure was initiated. An other example is Cotire which does not handle the dependencies correctly (https://github.com/sakra/cotire/issues/77).

My script is based on Christoph Heindl's work although heavily modified. So here it is:


The features of my Unity Build script for CMake:
- Easy to add to existing projects
- Configurable extension for the generated Unity Build files (c, cpp, cc etc.)
- Easy to exclude certain problematic sources from the Unity Build file generation
- Source file count limit for the Unity Build files
- Working out-of-source builds
- Qt support (handling moc files correctly)
- Track the source file changes with md5 hashes
- Regenerate the Unity Build files only when really needed

- The Unity Build files are not removed with "make clean".

- The UNITY_GENERATE_MOC() macro is optional. I wrote this lightweight moc file generation for Qt instead of the default slower moc invocations in CMake.

Basic usage of my script:

- Copy UnityBuild.cmake to your project's root directory.
- Include in the root CMakeLists.txt:


- In any source directory, use like this:

    first.c ;
    second.c ;
    third.c ;

# Parameters:
# 1. Name prefix for the generated Unity Build files
# 2. CMake variable which contains the source files
# 3. Unit size (source file count) per Unity Build file
# 4. Extension for the Unity Build files to invoke the correct compiler

# Optional: Add any problematic source file which are not suitable for Unity Build and
# you are lazy to fix it.


2015. augusztus 13., csütörtök

Shishi odoshi vs. nyúl

Szal, egy nyúl ma azt hitte, hogy felzabálja a büszke bokrunkat a kertben. Aztán a shishi odoshi helyrerakta.

2015. augusztus 6., csütörtök

A new update of my artificial intelligence for Sony ERS-7 robots

Here comes the new update of my AiBO+ AIBOWare for Sony ERS-7: AiBO+ 3.1 - Let's Play!

New changes in this release:

- New sound profiles with voice acting (text-to-speech, Marissa, Gillian)
- A game can be played with AIBO
- Reworked pick-up mode/fall over/poking detection
- Configurable adaptive volume control
- Quicker responses for petting
- Less stress on the motors
- Many small fixes

The AiBO+ AIBOWare must be installed in a PMS stick and an AiBO+ Client application can be run under Windows, Ubuntu Linux and Android.

The installation instructions and a detailed user guide are available on this page:


2015. augusztus 3., hétfő

Second astrophoto - Shapley 1

Shapley 1 (planetary nebula)

Captured with telescope T32 of iTelescope network, Side Springs Observatory, Australia.

Capture parameters: LRGB, 14x120, 4x120, 4x120, 4x120.

It is my second trial with astroimaging. I was a bit surprised that I got better results when I tried to work with uncalibrated images. I tried various filters, but none of them given reasonable results with shutter speed 300 seconds.