Thursday, May 14, 2009

Small update...

...I'm not dead. I'm still working on the whole thing.

The current version of Halebopp is dead, but I learned a lot:
  • Clutter is not as extraordinary it seems. Sure, it renders everything in GL, but it doesn't give you much freedom over its widgets. All coordinates are stored as integers to support embedded targets better, which mades a zoomable UI a pain to implement.
  • Writing your own alternative to Clutter is no fun, especially when writing a music editor is your main goal.
  • I hate traditional widgets, windowed layouts, text and cursors. I love multi-touch, images, zoomable interfaces, animations. I want to be able to control everything with my fingers. I need to reinvent the interface here. What I dream of does not exist yet. The subprojects name is "Mjoo" (A revived old project of mine).
  • Offline audio rendering is an optimization, not a paradigm. It gets too complicated, especially if you want (some) live input support. I'm better off rendering DSP in the traditional way, with a bit of mix-ahead cache magic to use up free CPU cycles.
  • I hate writing UI code directly. I love to set up simple databindings and static/functional structure. I explored that concept in Holygrail (another subproject), and it went pretty beautiful. So I actually have an idea what I'm doing here.
  • Nobody needs components in Python. Simple enumeration of modules and auto-import is fine. Anything else becomes self-serving rather quickly.
  • Function decorators in Python are mighty.
  • The logging module is good enough for any log output job.
  • SQLite is too low-level to be used directly. Writing your own wrapper doesn't cut it. SQLAlchemy does the trick - for any kind of SQL RDBMS.
  • SQL blobs suck for asynchronous R/W. Use files and store only references in your DB.
  • I hate trees. I love flat lists and dictionaries.
  • Python can't do DSP work efficiently and never will.
  • Python can't do multithreading efficiently and never will. Use multiple processes and RPC.
  • Boost.Python is a friend to keep, but never wrap anything not especially written for it.
  • SCons remains a great build system, as long as you don't get creative; stick to best practice.
  • It's impossible to track DB changes directly from multiple processes using the DB file alone. I assume best is some kind of RPC document server doing all transactions and providing changesets to clients - like Verse for 3D editors.
  • C++ is a dish best served stupid. Stick to the STL, no clever tricks, especially if you plan to expose stuff to Python.
  • Twisted is a great framework for writing anything socket related.
  • SndObj has nice processing plugins and Python bindings.
  • Python SWIG bindings can be understood by Boost.Python.
  • Don't do anything nobody asked for. No docs, no tests, no website, no mailing list until there is actually something that you use yourself productively. Community building is for later.
  • Mercurial is fun to work with, so is Eclipse and it's numerous plugins, including PyDev.
  • Top down development doesn't work for science and experiments. Don't make big plans.

I'm currently working on the second revision of Halebopp. Right now the (small) plan is (in this order) to get the document server working, write a player client, write an editing frontend.

I'm not trying to get revolutionary with the DSP technology this time. I want to support a DSP graph for SndObj objects and, later, a sequencer to script events. Also, something to easily import and record waves.

So long.

Friday, March 27, 2009

User Interface 101: UI Data Binding

The traditional method of writing UI logic is hard. It requires writing an enourmous amount of housekeeping code, such as event handlers and update methods. Non-interactive testing is almost impossible to realize.

I propose a method to bind an user interface to data using only a set of declarations, describing data constraints and UI/data relations. This set is called "binding".

Here are a few applications for bindings:
  • A property view, allowing to select items from a list and change their attributes.
  • A spatial view, allowing to position and resize items on a surface.
  • A complete application made of cascading spatial and property views.
A binding connects a model and a graphical UI in both directions. It can be described in an XML document using a dedicated DTD scheme, obsoleting imperative micromanagement of dialogs.

Both model and UI should be reflective: their properties and elements can be enumerated and queried by the binding. If either model or UI is not reflective, additional XML documents are required, providing information about structure and constraints.

The model can be of any of these following format types:
  • SQL database
  • XML DOM
  • JSON tree
  • Python class tree
  • C structures
The binding will initially support following UI frameworks:
  • GTK+
  • Glade
  • Some canvas (Clutter, GooCanvas?)
A binding supports following model datatypes:
  • Numbers (integers, floats)
  • Strings
  • Lists of mixed types
  • Compounds (structures, classes)
  • References
  • Functions
On the UI side, a binding supports following display and editing methods:
  • Text/numeric entries
  • Combo/choice boxes
  • Radio buttons
  • Check boxes
  • Buttons
  • List views
  • Tree views
  • Sliders
  • Scroll bars
  • Progress bars
  • Canvas items
More will follow.

Sunday, March 15, 2009

User Interface 101: Device Independent User Input Processing

This is the first post of a new series called "User Interface 101", where I try to invent a functional, description-based approach to writing graphical user interfaces bottom-up, devoid of any imperative baggage. The first post deals with one of the basic building blocks of human-computer interaction, user input.

Users operate computers only through input devices. An input device, no matter what shape, can be divided into separate elements, of which each element carries a value and an activity state.

Internally, we are going to represent each input device element as an abstract "input source", which is designed to yield any information a computer program would require to successfully interpret user input, while burdening programmers with device specific interface details only if necessary.

An input source describes a single input device element such as a key on a computer/MIDI keyboard, a button on a mouse or joypad, an IR pointing device, a joystick or a finger on a touch pad. For multi-dimensional input devices (e.g. mouse, pen, touch pad), each input source describes a single dimension.

An input source yields one single (signed integer) value at a time contained within a hardware dependent range r0..r1, where r1 > r0. This enables tracking of both digital key/button strokes/presses/clicks and analog stick/mouse/pen/finger movements for a single dimension.

An input source may also yield an activity state, if supported by the device, which indicates if the input device element is enabled/alive/in use. This enables tracking of caps/scroll/num lock key states, cursor dead zones and retraction of pen/finger/device.

An input source may optionally point to a descriptor containing information that may help to identify the source and interpret the values.

An input source may be tagged as relative or absolute in its descriptor. Relative axes yield delta values. Adding delta values subsequently, one may acquire an absolute value, contained within an unspecified range. This enables tracking of rotary wheels and mouse movements while repositioning the device.

An input source may describe a glyph or command it represents in its descriptor. This enables querying a scan code or button index.

An input source may describe an one to three dimensional position vector in its descriptor, indicating the sources spatial position on the device in meters, relative to either the center or a corner of the device. This enables to map tasks to input sources based on their adjacency and location.

An input source may describe the dimension it operates on in its descriptor, if it is part of a composite source. Zero indicates no specific dimension. This aids with building multi-dimensional values from multiple input sources.

An input source may describe a default resting point or range in its descriptor that the control returns to when not operated by the user. This enables to determine whether a key/mod wheel/button/axis is resting.

Imperative GUI Programming Sucks

I have recently had my eyes on Haskell, and I liked what I saw. I spent about 20 years of my life writing programs in imperative tongue, and the more I knew, the more trivial the solving of problems became, the more I found the language to be obstrusive and low level.

Why do I need to be bothered with rolling out the exact details of a routine, when I have exact logic relationships in my head? Can't I put my understanding of the abstracts in letters and the compiler figures out all the implications and details?

Turns out, functional programming, even if it looks a bit ugly at times, does exactly what I want. Haskell is a functional programming language, where the specification is the program. With such a concept, you theoretically would write even less code than you would in Python, while making even less mistakes.

But this world is not quite ready for functional approaches. All that old bullshit comes back through Haskells imperative back door of monads and I/O operations, and if you do a lot of interfacing, there is right now virtually no difference between Haskell and Python or C++. I had a look at the GTK+ binding for Haskell and it outright sucks. There I am again, writing event handlers and filling GUI's sequentially. The specification is again not in the program, but in my head.

As a GUI programmer, how can I get closer to writing a GUI cleanly from specification, without muddling much with the details?

Since the web is said to be the future, I had a look at different web technologies. There is a functional language in use on the web for quite a time, and it is called XSLT. This XML based descriptive language specifies how an XML can be translated into something else, either another XML or a completely different format. No imperative I/O. Just pattern matching and templates, and the XSL transform does the rest. You can even transform the XSL document itself using XSLT.

You could build a GUI from this. One part of the job, at least, in a primitive way. You could turn the XML into an HTML form using pure XSLT, but there is still the problem of turning user input back into XML. You would also want to update only parts of the source DOM on user input, along with various validation mechanisms. Source data and front end would have to be connected in both ways based on a formal specification. I searched the web up and down for this "two-way XSLT", but found nothing yet.

Is it really possible that in the year 2009 there is no need for a functional approach to binding data and front end? Or is it simply too hard, or silly to think of? I can't believe that. From my perspective, all these different approaches to editing data - buttons, edit fields, lists, trees, knobs, sliders, graphs - boil down to about at most 30 primitive concepts in combination, reimplemented over and over, requesting each and every GUI programmer ever coming along to stupidly assemble every dialog as if it was the first he had ever written. The technology we use today does not scale. It should become easier the more interfaces you do, instead it becomes tedious. Refactoring is a nightmare because GUI tests are way too hard to write.

If I only were able to write down the specifications of an GUI and get my working program right away, testing steps would be self-evident, they could even be automatically determined. Changing a GUI flow would become a lot easier. I would literally simply change connections instead of rewriting dialogs and tests. Somebody told me: everything that is hard to do is worth doing. Screw this guy. This stuff should be trivial. Figuring out the specification should be the hard work, not working like a code monkey.

Imagine how easier experimentation with new paradigms would become if we had enough time to test different ideas for user interfaces. How utterly convenient and elegant these applications would be, regarding both code and usage.

If you know of such a project in the works, please let me know. I am desperate.

Monday, February 9, 2009

The Rendering Graph: What Renders When?

Using the current pipeline for mixing tracks, I tried to write a metronome synth for QGrids, but that was actually way too mindbending to be fun.

To explain: traditional DSP works with sequential rendering, where each samples value kind of depends on the previous one, and intermediate states need to be saved. Since my concept renders audio offline, and only subportions of a clip may change, the architecture requires that any clip can render one buffer of its range (8192 samples) independent of location.

Given this limitation, it is a bit tricky to render e.g. a metronome sound from somewhere in the middle of the clip. I need to calculate what kind of click I currently am at (to determine pitch) and how far I am away from the clicks location (to determine phase and volume). Right now, this is possible, but kind of impractical, given that most of the time, I still know pitch, phase and volume from the previous location.

The concept breaks apart as soon as I employ delay lines and reverbs, which strongly rely on previous states and additional delay buffers. It is still possible to calculate values independently, but it requires to be able to look up input sample values from other locations. Chances are usually big that these locations have been rendered before, and so I would often recalculate stuff that I already know. Another issue is that I would not be able to use traditional DSP code, because it is usually written for linear signal processing.

Three questions emerge, a.) when I have already rendered a part of the pipeline, how can I prevent to render it again? b.) how can I simulate linear signal processing to simplify algorithms? - and since we are at it, c.) how can I neatly parallelize all this?

a.) Transparent bouncing. Whenever a clip renders, its rendered output is usually written to a cache on disk. When the output is needed again, I just read it from the cache. Since disk space is far from being sparse these days, that should usually not be a problem. DONE

b.) Bounce linearly. Render a clip from start to finish in usual block sizes, and store intermediate states like phase, volume, voices, if applicable. DONE

c.) Invalidate dependencies. Fragments of a clip are dependent on other fragments. Root fragments need to be rendered first, then dependent fragments can be rendered. Most of the time, unrelated clips can render in parallel.

Monday, January 19, 2009

Plans for Halebopp 2009.1.1

The next year has arrived and a new release of Halebopp is due. That would be easy to manage if I had actually done anything in the past weeks. Since I have not, here is a list of things I want to address before the next release.

This is more like a laundry list for myself, so please don't be too disappointed about its formality.

  • I feel blocked by the impression that the app could already use some refactoring. Here are a few points of critique:
  • The song database wrapper is monolithic and can not be extended by components. Accessing objects in the database has to be individually written, although I know that it can be done in a more generic way. A solution would be to add support for transparent DB objects, which, based on a list of typed attributes, can serialize attribute changes to and from the database. I call the class SQLRecordProxy. A class could inherit from SQLRecordProxy to be able to transparently access the database through its attributes. DONE
  • I do not feel comfortable at all with wave files being contained in the database. It makes debugging hard, and requires a non-SQL blob I/O interface. Doing this with raw binary files will simplify streaming and recording code, and makes auditing in an external program possible. Additionally, C++ code could access wave data without interfacing to Python. DONE
  • Importing a wave stalls the program for the time of loading. I need to import from streams, perhaps threaded.
  • Building wave visualizations can take a long time. This should be done in a thread as well.
  • I need rhythm in the program to make it useful. Q-grids will provide a facility to organize rhythm and tempo. SQLRecordProxy will ease the implementation of Q-grids. PARTIALLY DONE
  • It must be possible to duplicate clips.
  • Multiselection is whack and needs to be standardized and properly reflected visually.
  • I must find a good way to browse samples. If that isn't possible, allow drag and drop for samples at least.
Update: I am currently working on refactoring Halebopp as planned above, here are some remarks on implementation as I go along (just for housekeeping and showing off):
  • SQLRecordProxy was finally named SQLRecord, and works as described, there is a set of loose tests contained in the sqlobject module as well.
  • I chopped down the song database facade and used SQLRecord for waves, tracks and loops. The code is much easier to read and extend now. The song database has been renamed to "song.db2", because table structure is incompatible to earlier versions.
  • Waves and digest data have been moved from blobs to external .raw and .digest files stored in a /wavestore subfolder relative to the database. The files contain no header, just pure data, so the files size equals the length of the sample.
  • QGrids have a kind of visual representation, but are practically not editable at the moment, neither do they provide a metronome.

Sunday, December 14, 2008

Halebopp Kokino Released

It is done! The first alpha release of Halebopp, nicknamed "Kokino" is now available for the popular Ubuntu Linux platform.

Kokino is an ancient megalithic observatory in Macedonia, used to track the movement of sun and moon on the eastern horizon. Just like Kokino is one of the first primitive attempts of humans to touch the sky, our first release of Halebopp is meant to be only the first step of an ambitious project: a music composition application for everybody.

The source code is available at our development site at Google Code. If you need assistance or you want to take part in the development process, feel free to join the Halebopp Alpha Users Group. There are still 34 free seats left :)

Thanks to everybody who participated in this release!