January 06, 2017

Alexa Skill written in D

As I wrote last week in my post about Alexa, I started looking into developing custom skills for it.

Custom skills for Alexa are like apps for mobile devices. Amazon makes it easy to upload the source into their Lambda service. Custom skills are web services: everything happens in the cloud. AWS Lambda lets you choose between:

  • C#
  • Java
  • Node.js
  • Python

When facing the options, I remembered an article I had read on the dlang forums about how to run a D application there: AWS Lambda Functions in D

Why D then?

So why choose D over the alternatives?

  • blazing fast
  • statically typed
  • insane compile time functionality
  • personal taste and freedom to choose

Blazing fast? Yes, and this is meant in two ways: 1. fast compile times 2. fast execution times

D was designed to be fast. Written by an experienced C++ compiler writer, it was created to compile quickly. And since D is a systems programming language and even shares the same compiler backend as a lot of other systems languages, it is about as fast as C++ and co. Why is performance especially interesting for an Alexa skill? Because if you host it on AWS Lambda you pay per use, which means you pay for CPU and memory usage.

Template programming for the sane: D's metaprogramming capability is so strong because it lets the average programmer go wild with it. I am using it here to delegate incoming requests to the right endpoint and to create simple-to-use REST interfaces to existing services like openwebif.
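To make the pay-per-use point concrete, here is a small sketch of how billed usage could be estimated. It assumes the usual Lambda model in which usage is the configured memory multiplied by the run time, with the run time rounded up to a billing increment. The 100 ms increment, the function name and the two example invocations are assumptions for illustration, not figures from this post.

```python
import math

def billed_gb_seconds(duration_ms, memory_mb, increment_ms=100):
    """Billed usage of one invocation in GB-seconds.

    increment_ms is an assumed billing granularity: the run time is
    rounded up to the next multiple of it before being charged.
    """
    billed_ms = math.ceil(duration_ms / increment_ms) * increment_ms
    return (memory_mb / 1024) * (billed_ms / 1000)

# Two hypothetical invocations of the same skill:
# a lean native binary versus a slower, more memory-hungry runtime.
lean = billed_gb_seconds(duration_ms=80, memory_mb=128)
heavy = billed_gb_seconds(duration_ms=450, memory_mb=512)

print(f"lean:  {lean:.4f} GB-s")
print(f"heavy: {heavy:.4f} GB-s")
print(f"ratio: {heavy / lean:.0f}x")
```

Both run time and memory footprint enter the product, so a faster and smaller executable pays off twice.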

You are not familiar with D yet? Then I have some resources for you to start:

January 05, 2017

Use Machine Learning To Increase Sales From Your Predictable Customers

Using machine learning and Python, we can find customers with predictable month to month patterns and target our sales team based on those patterns.

January 01, 2017

Alexa meets german developer

About two weeks ago I suddenly got my invitation to order an Echo Dot. The Echo device family is the entry point to Alexa, Amazon's natural language processing service. It is like having Apple's Siri or Microsoft's Cortana at home.

I have to admit I was not entirely sure about this tech before I got it, simply because I could not imagine its practical usefulness before getting my hands on it.


My quick summary: this kind of speech control interface to modern technology (or VUI, Voice User Interface, as Amazon calls it) will revolutionize the way we interact with devices in the near future. I did not expect it to be as far along as it already is. There are still things to do, like keeping context in the ‘conversations’, but for me the experience was mind-blowing: control your home using Alexa, control your music streaming with Alexa, have her wake you up in the morning and stop the playback of chillout music at night when you sleep, have her manage your todo and shopping lists … and lots more.

Developing for Alexa

Only after I had booted up Alexa for the first time did a colleague of mine tell me he was developing extensions for Alexa himself. Amazon did a great job allowing nerds like me to extend Alexa with custom skills in virtually any language, because to interface with it you only have to accept and provide HTTP and JSON (docs). This allowed me to use my favorite hobby programming language: dlang
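To illustrate how little is needed on the skill side, here is a minimal sketch of a handler that turns a JSON request body into a JSON response. It is written in Python only to keep it short and runnable; the field names follow my understanding of the Alexa Skills Kit JSON interface and are heavily trimmed, and the intent name HelloIntent and the function name are made up for the example.

```python
import json

def handle_request(body: str) -> str:
    """Turn the JSON body Alexa POSTs to the skill into a JSON reply."""
    event = json.loads(body)
    request = event["request"]

    if request["type"] == "LaunchRequest":
        # The skill was opened without a specific intent.
        text, end_session = "Welcome.", False
    elif request["type"] == "IntentRequest":
        # Dispatch on the intent name; a real skill would branch here.
        intent = request["intent"]["name"]
        text, end_session = f"You asked for {intent}.", True
    else:
        text, end_session = "", True

    reply = {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }
    return json.dumps(reply)

# Hypothetical, trimmed request as it might arrive over HTTP.
sample = json.dumps({
    "version": "1.0",
    "request": {"type": "IntentRequest", "intent": {"name": "HelloIntent"}},
})
reply = json.loads(handle_request(sample))
print(reply["response"]["outputSpeech"]["text"])
```

Any language that can parse and emit JSON behind an HTTP endpoint can do the same, which is what makes D a viable choice.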

The first skill I developed so far is already on github: alexa-openwebif

I will cover details on the development of custom skills using D in a future post.

Food for thought for the guys at the Amazon Alexa team

Despite my enthusiasm, there are a lot of topics I have criticism to raise about. None of it is major, but I am hoping it will be tackled before the big rollout in Germany starts (right now you can only sign up to get invited).

no multi language support and foreign skills

Right now you have to switch your language in the Alexa app; you cannot just talk English or tell Alexa to switch to English. Even if you change the language of your device you cannot access the thousands of existing skills in the U.S.; you have to switch the country of your whole Amazon account to the U.S. It is not clear to me why it has to be that tough to use the original U.S. skills in Germany; after all, English is a very commonly spoken language here.

This makes it even harder for developers to find out which skills are actually worth developing from scratch. U.S. skills will likely be translated at some point.

built in slot types not in german

As you can see in the slot-type-reference, most of the predefined slots that Amazon provides are not supported in German yet. This makes it very hard to convert skills, and it is harder for German devs to create the same user experience that U.S. users benefit from. This has to change before the open rollout of Alexa happens in Germany. This may be one of the reasons why there are almost no skills in German yet.

updating skill through api

As a developer I am a fan of automating as much as possible. There is already a lot of functionality I can automate when it comes to Alexa skill development because AWS is so CLI-friendly. The Amazon developer console, on the other hand, is just a web portal. It would be great if the whole update lifecycle of a skill could be automated:

  • set activation (invocation) name
  • add/set languages
  • update language model:
    • update intent schema
    • update custom slot types
    • sample utterances

lang tags in ssml

To define how Alexa is supposed to speak text, the system supports SSML as a markup language. What I would like to have is a language markup (like <language lang="english">) to give Alexa a hint on how to pronounce certain foreign words.

custom settings per skill

Alexa skills are plain web APIs in the cloud. As a skill developer you don't have a means to save any state for the user accessing the skill on the device. Users are used to configuring Alexa in their Alexa app, but this only applies to settings Amazon specifies (like language, volume and such). I would like to see a skill settings panel that lets the user specify settings per skill, which Amazon then provides to us developers when calling our API. For most skills this would be more than enough, compared to being forced to provide a custom user portal with OAuth account linking and the whole ceremony.

smaller issues

  • crossing things off todo list is not supported
  • sleep timer badly translated (setting a sleep timer using the example in ‘things to try’ does not work 99% of the time)
  • Alexa skill search broken on amazon.com (seems to be limited to U.S. users, which makes it particularly hard for devs to assess what skills are already there)
  • loud music sometimes prevents Alexa from hearing the magic word that activates her
  • you can’t control your FireTV using your echo dot - that is simply pathetic

Happy New Year

The inevitable shift in perception that comes with age, when the passage of time seems to speed up, is beginning to have an impact on my programming hobby. I'm less eager to begin an exploratory project or game prototype, knowing that I'm almost certainly going to abandon it in the end. Time has become more precious to me, so I've become more conservative in how I spend it. I haven't played guitar in months. Where I used to play games regularly, often daily, I now play only once or twice a month and rarely more than twenty or thirty minutes in a sitting. I still have multiple interests competing for my attention, but I'm less inclined to take that first step than in the past, when I used to flit impulsively from interest to interest and project to project. I never would have started learning D back in 2003 if it hadn't been for my impulsive nature, but that very nature made it difficult for me to actually complete most projects. Read More

December 29, 2016

Year End Renovations

For quite a while, I've been wanting to change the workflow I use with my personal blogs. The primary motivation is that I'm just tired of Wordpress. I've been eager to make the transition to a purely static site. I've also wanted to scale down my VPS plan, as the one I've been paying for the past couple of years is massively bigger than I need, especially since Linode gave everyone free upgrades a little while back. I finally made the move to the cheaper server a couple of weeks ago and I'm now taking the opportunity to initiate the process of moving over to a static site. Read More

December 24, 2016

Profiling with perf and friends

Just a short tutorial on using perf and friends to figure out where to start with optimizations. Our example will be dmd compiling the release build of libphobos2.so.

First of all, figure out the command we’re interested in.

cd phobos
make -f posix.mak | grep -F libphobos2.so
../dmd/src/dmd -conf= -I../druntime/import  -w -dip25 -m64 -fPIC -O -release -shared -debuglib= -defaultlib= -ofgenerated/linux/release/64/libphobos2.so.0.73.0 -L-soname=libphobos2.so.0.73 ../druntime/generated/linux/release/64/libdruntime.so.a -L-ldl std/array.d std/ascii.d std/base64.d std/bigint.d std/bitmanip.d ...


A very good start for a high-level overview is perf stat, which reports CPU event counts.

perf stat -r 5 -- ../dmd/src/dmd -conf= -I../druntime/import  -w ...
       2932.072376      task-clock (msec)         #    0.968 CPUs utilized            ( +-  0.34% )
                13      context-switches          #    0.004 K/sec                    ( +-  2.92% )
                 3      cpu-migrations            #    0.001 K/sec
           230,120      page-faults               #    0.078 M/sec                    ( +-  0.00% )
    10,942,586,352      cycles                    #    3.732 GHz                      ( +-  0.34% )  (34.19%)
    14,322,043,503      instructions              #    1.31  insn per cycle           ( +-  0.06% )  (50.00%)
     3,009,171,058      branches                  # 1026.295 M/sec                    ( +-  0.30% )  (32.70%)
        78,587,057      branch-misses             #    2.61% of all branches          ( +-  0.24% )  (30.76%)

       3.029178061 seconds time elapsed

It will already color numbers that are extremely off.
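The derived values in the comment column can be reproduced from the raw counters, which helps when reading them. The sketch below (in Python, just for the arithmetic) uses the counts printed above; the variable names are mine. Since the counters were multiplexed (the trailing percentages show how long each one was actually running), the results are scaled estimates, not exact counts.

```python
# Counter values copied from the perf stat output above.
task_clock_ms = 2932.072376
cycles = 10_942_586_352
instructions = 14_322_043_503
branches = 3_009_171_058
branch_misses = 78_587_057

seconds = task_clock_ms / 1000.0

# Average clock frequency while the task was running.
ghz = cycles / seconds / 1e9
# Instructions per cycle: how much work each cycle retires on average.
ipc = instructions / cycles
# Share of branches that were mispredicted.
branch_miss_pct = 100.0 * branch_misses / branches

print(f"{ghz:.3f} GHz")
print(f"{ipc:.2f} insn per cycle")
print(f"{branch_miss_pct:.2f}% of all branches")
```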


toplev is another great tool to get a more detailed and more understandable high-level overview.

./toplev.py --level 2 taskset -c 0 -- ../dmd/src/dmd -conf= -I../druntime/import  -w ...
C0    FE      Frontend_Bound:                             34.75 %           [  2.92%]
    This category represents slots fraction where the
    processor's Frontend undersupplies its Backend...
    Sampling events:  frontend_retired.latency_ge_8:pp
C0    FE      Frontend_Bound.Frontend_Latency:            24.38 %           [  2.92%]
    This metric represents slots fraction the CPU was stalled
    due to Frontend latency issues...
    Sampling events:  frontend_retired.latency_ge_16:pp frontend_retired.latency_ge_32:pp
C0    BAD     Bad_Speculation:                            14.05 %           [  2.92%]
C0    BAD     Bad_Speculation.Branch_Mispredicts:         13.65 %           [  2.92%]
    This metric represents slots fraction the CPU has wasted due
    to Branch Misprediction...
    Sampling events:  br_misp_retired.all_branches
C0-T0         MUX:                                         2.92 %
    PerfMon Event Multiplexing accuracy indicator
C1    FE      Frontend_Bound:                             42.02 %           [  2.92%]
C1    FE      Frontend_Bound.Frontend_Latency:            31.16 %           [  2.92%]
C1-T0         MUX:                                         2.92 %
C2    FE      Frontend_Bound:                             40.71 %           [  2.92%]
C2    FE      Frontend_Bound.Frontend_Latency:            34.68 %           [  2.92%]
C2    BAD     Bad_Speculation:                            10.23 %           [  2.92%]
C2    BAD     Bad_Speculation.Branch_Mispredicts:          9.66 %           [  2.92%]
C2    BE      Backend_Bound:                              35.74 %           [  2.92%]
C2    BE/Mem  Backend_Bound.Memory_Bound:                 21.60 %           [  2.92%]
    This metric represents slots fraction the Memory subsystem
    within the Backend was a bottleneck...
C2    RET     Retiring:                                   13.77 %           [  2.92%]
C2    RET     Retiring.Microcode_Sequencer:                8.49 %           [  5.84%]
    This metric represents slots fraction the CPU was retiring
    uops fetched by the Microcode Sequencer (MS) unit...
    Sampling events:  idq.ms_uops
C2-T0         MUX:                                         2.92 %
C3    FE      Frontend_Bound:                             36.71 %           [  2.92%]
C3    FE      Frontend_Bound.Frontend_Latency:            45.72 %           [  2.93%]
C3    BAD     Bad_Speculation:                            11.72 %           [  2.92%]
C3    BAD     Bad_Speculation.Branch_Mispredicts:         11.28 %           [  2.91%]
C3    BE      Backend_Bound:                              37.37 %           [  2.92%]
C3    BE/Mem  Backend_Bound.Memory_Bound:                 23.81 %           [  2.92%]
C3    RET     Retiring:                                   13.83 %           [  2.92%]
C3    RET     Retiring.Microcode_Sequencer:                8.74 %           [  5.84%]
C3-T0         MUX:                                         2.91 %
C0-T1         MUX:                                         2.92 %
C1-T1         MUX:                                         2.92 %
C2-T1         MUX:                                         2.92 %
C3-T1         MUX:                                         2.91 %
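One way to sanity-check such output: in the top-down method the four level-1 categories (Frontend_Bound, Bad_Speculation, Backend_Bound, Retiring) split up all pipeline slots, so they should add up to roughly 100%. The sketch below checks this for cores C2 and C3, the two cores for which all four categories were printed above. The dictionary layout is just for illustration.

```python
# Level-1 percentages copied from the toplev output above.
level1 = {
    "C2": {"Frontend_Bound": 40.71, "Bad_Speculation": 10.23,
           "Backend_Bound": 35.74, "Retiring": 13.77},
    "C3": {"Frontend_Bound": 36.71, "Bad_Speculation": 11.72,
           "Backend_Bound": 37.37, "Retiring": 13.83},
}

# Sum the four categories per core; multiplexing makes this inexact.
totals = {core: round(sum(parts.values()), 2) for core, parts in level1.items()}

for core, total in totals.items():
    print(f"{core}: {total:.2f} % of slots accounted for")
```

Small deviations from 100% are expected, because the events are multiplexed and therefore only sampled part of the time.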

The level of detail can be selected using --level X (with X from 1-5), also see Selecting the right level and multiplexing, and it can record and plot events over time.

./toplev.py --level 3 -I 10 -x, -o x.csv taskset -c 0 -- ../dmd/src/dmd -conf= -I../druntime/import  -w ...
./tl-barplot.py x.csv --cpu C0-T0 -o toplev_dmd_barplot.png



perf record is the workhorse for drilling down into performance problems up to instruction level. The basic work-flow is recording events and then using the interactive perf-report to analyze them.

perf record -- ../dmd/src/dmd -conf= -I../druntime/import  -w ...
perf report

Another interesting mode is recording call-graphs.

perf record -g -- ../dmd/src/dmd -conf= -I../druntime/import  -w ...
perf report

It’s useful to play around with the --freq= option to collect more samples, and the --event= option to gather events other than the default cycles, e.g. branch-misses. Ask perf list for all available events. While neither of perf record’s call-graph collection methods, frame pointers or DWARF backtraces, works for all of dmd, using frame pointers (perf record -g or perf record --call-graph fp instead of perf record --call-graph dwarf) captures most of it.


The latest addition to my optimization toolbox is CPU Flame Graphs, a bunch of scripts to visualize profiles with call-graphs. After converting the profiler-specific stack traces (stackcollapse-perf.pl for perf), flamegraph.pl will generate an interactive SVG file. We limit the stack depth to not kill the browser or SVG viewer.

perf script --max-stack 24 | ./stackcollapse-perf.pl > profile.folded
# less profile.folded
./flamegraph.pl profile.folded > profile.svg
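The folded file is also easy to inspect without the SVG. Each line holds one call stack, with frames separated by semicolons, followed by a space and the sample count. As a rough sketch, the Python snippet below sums samples by the innermost frame to list the functions where time is actually spent. The stacks and frame names in it are invented for the example, not taken from a real dmd profile.

```python
from collections import Counter

def self_samples(folded_lines):
    """Sum sample counts by leaf frame of each folded stack."""
    totals = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # Format: "frame1;frame2;...;leaf count"
        stack, _, count = line.rpartition(" ")
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals

# Hypothetical contents of profile.folded.
sample = [
    "dmd;main;parse 30",
    "dmd;main;semantic;resolve 50",
    "dmd;main;codegen;resolve 20",
]
hot = self_samples(sample)
for name, count in hot.most_common(2):
    print(name, count)
```

With a real profile, pass the lines of profile.folded to self_samples in place of the sample list.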