Naoki Shibata
Computer Scientist, Associate Professor at Nara Institute of Science and Technology

49 followers
Naoki's posts

Post has attachment
And now SLEEF 2.100 is released.
AVX-512F and LLVM Clang Extended Vectors are newly supported in this release.

To add support for AVX-512F, I cleaned up the code, and portability is greatly improved as a result. SLEEF can utilize _mm512_getexp_pd and _mm_getmant_pd.

LLVM Clang Extended Vectors can also be used for porting SLEEF to other architectures. The quality of the code generated with Clang Extended Vectors is pretty good, and it is much better than what I got when I tried a similar approach with gcc's vector extension.

Post has attachment
Today I released SLEEF 2.90.
In this release, all the reported bugs should be fixed.

Post has attachment
We received a best demo award at IPSJ DPS workshop 2016.

Post has attachment
A port of SLEEF to the Julia language is now underway, thanks to the JuliaMath community.

Post has attachment
Today I released version 1.33 of SSRC.
I am trying to release a new version of one of my software packages every month, but I skipped last month because I was too busy.

In this release of SSRC, compilation support for Mac OS X and support for ARM NEON instructions have been added.

Post has attachment
I've just released a new version of SSRC. More than 10 years have passed since the last release. In this release, the FFT subroutine has been replaced, and the new FFT can utilize AVX and SSE instructions.

Post has attachment
I released version 1.10 of the BAUM library. It can now be used from Java. A few potentially serious bugs have also been fixed.

I don't know if anyone is reading this, but here are some tips for lowering CPU load when doing processing with OpenCL on NVIDIA GPUs.

1. Always use pinned memory when you transfer data between a device and the host. Buffer reads and writes always block and waste CPU time if you don't use pinned memory.

2. Never use a command that blocks. With NVIDIA GPUs, all blocking commands involve a busy wait that wastes CPU time. Avoid them; instead, manually poll and sleep to wait until a command finishes.

3. If you don't use blocking commands, you need to manually flush the command queue.
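
Tips 2 and 3 can be sketched roughly as follows. This is only a minimal illustration in plain C, not code from any of my packages. The names wait_by_polling, flush_fn, is_done_fn and fake_command are made up for this example. In a real OpenCL program, I would expect the flush callback to wrap clFlush on the command queue, and the is_done callback to query CL_EVENT_COMMAND_EXECUTION_STATUS with clGetEventInfo and compare the result against CL_COMPLETE. The pinned memory from tip 1 is not shown here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for nanosleep */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical callbacks. With OpenCL, flush would wrap clFlush(queue), and
   is_done would check an event's execution status via clGetEventInfo. */
typedef void (*flush_fn)(void *ctx);
typedef int (*is_done_fn)(void *ctx);  /* nonzero once the command finished */

/* Flush once, then poll with a sleep between status queries instead of
   calling a blocking command. Returns the number of status queries made. */
static int wait_by_polling(flush_fn flush, is_done_fn is_done,
                           void *ctx, long sleep_ns) {
    struct timespec ts;
    int polls = 1;

    assert(sleep_ns >= 0);
    ts.tv_sec = sleep_ns / 1000000000L;
    ts.tv_nsec = sleep_ns % 1000000000L;

    flush(ctx);                 /* tip 3: submit the queued commands */
    while (!is_done(ctx)) {     /* tip 2: poll, never block */
        nanosleep(&ts, NULL);   /* give the CPU back while waiting */
        polls++;
    }
    return polls;
}

/* Stand-in for a device command: it makes progress only after a flush and
   reports completion once `remaining` status queries have been used up. */
typedef struct {
    int flushed;
    int remaining;
} fake_command;

static void fake_flush(void *ctx) {
    ((fake_command *)ctx)->flushed = 1;
}

static int fake_is_done(void *ctx) {
    fake_command *c = ctx;
    if (!c->flushed) return 0;  /* never completes without a flush */
    if (c->remaining > 0) {
        c->remaining--;
        return 0;
    }
    return 1;
}
```

The sleep interval is a trade-off: a longer sleep lowers CPU load further but adds latency before the host notices that the command has finished.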

Post has attachment
I released a new software package titled "BAUM : Library for Recognizing Blur-Resistant Markers."