Wednesday, April 24, 2013

Function pointer type compatibility

I've been wondering how function pointers get passed around inside the GObject framework. Some example code seems to play fast and loose with function pointer types, relying on the fact that C's Undefined Behaviour can also include doing what one hopes. After some source-diving I finally found how GLib calls the callback. In one example, it calls through a function pointer of this type:
typedef void (*GMarshalFunc_VOID__UINT_POINTER) (gpointer instance, guint arg_0, gpointer arg_1, gpointer data);
That's for a signal handler whose signature is void (*)(gpointer instance, guint arg_0, gpointer arg_1, gpointer data): the instance, like every other pointer argument, arrives as a plain gpointer. Because GtkCellRendererText * and void * are not compatible types, it's actually wrong (it invokes undefined behaviour) to simply copy the signal handler signatures from the GTK+ documentation! For the GtkCellRendererText "edited" signal, for example, the correct function declaration would have to be:
void user_function(gpointer renderer, gpointer path, gpointer new_text, gpointer user_data);
I'm not sure if I want to be that pure. Too much boilerplate type conversion code. Maybe the reasonable compromise is to continue using pointers to specific types, but to make sure that at least the number of arguments matches what the marshaller functions demand. I think it's far more likely that a C implementation will be sensitive to a mismatched number of arguments (consider how cdecl vs pascal calling convention specifiers in some compilers determine a function's activation record: with pascal, the callee pops its own arguments, so a callee expecting a different number of them leaves the caller's stack pointer wrong) than that void * will have a different representation than GtkCellRendererText *.
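For comparison, here is roughly what the "pure" approach looks like, as a self-contained C sketch. The GLib/GTK+ types are replaced by local stand-ins (FakeCellRendererText is hypothetical) so that it compiles without GTK+, and the function-pointer type is modelled on the kind of typedef glib-genmarshal generates rather than copied from it. The handler's parameter types match the pointer type exactly, and the conversions to specific types move inside the handler.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the GLib/GTK+ types (assumption: defined here only so the
   sketch compiles without GTK+ installed). */
typedef void *gpointer;
typedef char gchar;
typedef struct { int id; } FakeCellRendererText;   /* hypothetical */

/* Shape of the type a generated marshaller calls through: every pointer
   argument, instance and strings included, is a plain gpointer. */
typedef void (*MarshalFunc_VOID__STRING_STRING)(gpointer data1, gpointer arg_1,
                                                 gpointer arg_2, gpointer data2);

static int seen_id;
static const gchar *seen_path;
static const gchar *seen_text;

/* "Pure" handler: its type matches the marshaller's exactly, so calling it
   through that pointer type is well-defined. The conversions happen inside. */
static void on_edited(gpointer renderer, gpointer path, gpointer new_text,
                      gpointer user_data)
{
    FakeCellRendererText *r = renderer;   /* void * -> object pointer: fine */

    (void)user_data;
    seen_id = r->id;
    seen_path = path;
    seen_text = new_text;
}

/* What the marshaller does once it has unpacked its arguments (simplified). */
static void emit_edited(MarshalFunc_VOID__STRING_STRING callback,
                        FakeCellRendererText *renderer, const gchar *path,
                        const gchar *new_text, gpointer user_data)
{
    assert(callback != NULL);
    callback(renderer, (gpointer)path, (gpointer)new_text, user_data);
}

/* By contrast, casting a handler declared as
     void h(FakeCellRendererText *, gchar *, gchar *, gpointer);
   to MarshalFunc_VOID__STRING_STRING and calling it through that type is
   undefined behaviour, even though it "works" on common ABIs. */
```

The cost is visible: one local variable or cast per argument, in every handler.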

Tuesday, February 12, 2013

Re-redesign the gEDA slotting mechanism

The slotting mechanism that is the primary subject of my gEDA fork seems to work, and solves the opamp problem, presumably also the transistor problem, and supports heterogeneous slots, unlike the more inflexible [1] stock gEDA mechanism.

But the design of the mechanism is broken: it doesn't play nicely with hierarchical designs, especially not ones that re-use schematics as distinct copies of a functional block. John Doty pointed this use case out to me; he's probably one of the heavier users of gEDA's hierarchical nature.

Essentially, the attributes my slotting mechanism currently uses point the wrong way. It is the symbols that point to the slots they inhabit, thereby pointing "up" in the hierarchy of schematics, which is a graph (hopefully an acyclic one) and not a tree. Because the hierarchy is a graph, a schematic may have multiple parents: schematics that contain a symbol with a source= attribute pointing to it. While it would be possible to store multiple upwards-pointing attributes in a sub-schematic, doing so would damage the utility of that page as a reusable element, as it accumulated slotting-related attributes from all the projects which used it as a sub-schematic.

So the slotting attributes can't point "up". Can they point "down" instead? It would be better, but perhaps still too inflexible: schematics with sub-schematics can themselves be sub-schematics. gEDA's hierarchy of schematics isn't limited in depth [2]. Only a toplevel schematic for a particular assembly [3] can sensibly assign slots in concrete parts to the symbols below it that need them, since only it has no super-schematic and therefore cannot appear in a project in multiple instances.

Then, if slots need to point "down" to the symbols occupying them, we'll need not just a pointer to the symbol object, but a path through the hierarchy by which to reach it. Without the path, it would be impossible to disambiguate references to the same symbol in a sub-schematic that has multiple parents. Something like this should do:

slotsymbol=opamp3:48aa3670-55de-4dad-9587-f54e9f196837/c2490daf-a9e8-4ec8-9941-62845fc9bb29

Interpretation: 48aa3670-55de-4dad-9587-f54e9f196837 is the UUID of a hierarchical symbol (a "COMPLEX" in libgeda jargon), perhaps a block symbol for a bandpass filter.


Then, one of its source= attributes will point to a sub-schematic which will have another symbol on it (perhaps a generic opamp triangle symbol) whose UUID is c2490daf-a9e8-4ec8-9941-62845fc9bb29. That particular opamp function is assigned to the slot "opamp3".
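To make the proposed format concrete, here is a minimal C sketch of splitting a slotsymbol= value into its slot name and its path of UUIDs. The struct, the function name and the fixed size limits are assumptions for illustration, not anything in libgeda, and only the length of each UUID is checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_NAME_MAX  32
#define UUID_LEN       36   /* canonical 8-4-4-4-12 text form */
#define PATH_DEPTH_MAX 8

/* Hypothetical parsed form of a slotsymbol= value. */
struct slot_ref {
    char slot[SLOT_NAME_MAX];
    char path[PATH_DEPTH_MAX][UUID_LEN + 1];
    size_t depth;   /* UUIDs in the path: outermost first, target symbol last */
};

/* Parse "SLOT:UUID/UUID/.../UUID". Returns 0 on success, -1 if malformed. */
static int parse_slotsymbol(const char *value, struct slot_ref *out)
{
    const char *colon, *p;
    size_t len;

    assert(value != NULL && out != NULL);
    colon = strchr(value, ':');
    if (colon == NULL)
        return -1;
    len = (size_t)(colon - value);
    if (len == 0 || len >= SLOT_NAME_MAX)
        return -1;
    memcpy(out->slot, value, len);
    out->slot[len] = '\0';

    out->depth = 0;
    p = colon + 1;
    for (;;) {
        const char *slash = strchr(p, '/');
        size_t n = slash != NULL ? (size_t)(slash - p) : strlen(p);

        if (n != UUID_LEN || out->depth == PATH_DEPTH_MAX)
            return -1;   /* wrong-length component, or path too deep */
        memcpy(out->path[out->depth], p, n);
        out->path[out->depth][n] = '\0';
        out->depth++;
        if (slash == NULL)
            return 0;
        p = slash + 1;
    }
}
```

For the example attribute this yields slot "opamp3" and a two-element path: the filter block's UUID first, then the opamp symbol's.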


Having these paths encoded in the slotting associations will allow gschem to show different pin numbers on the same symbol, depending on which instance of a sub-schematic one is looking at. gschem cannot do this yet, of course; I will have to code this extra behaviour.
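That instance-dependent display could hinge on a lookup keyed on the whole path rather than on the symbol's own UUID. A minimal C sketch, assuming a hypothetical table of assignments stored on the toplevel schematic; none of these names exist in gschem or libgeda.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot table on a toplevel schematic: each entry maps the full
   UUID path of one instance to the slot it occupies. */
struct slot_assignment {
    const char *path;   /* "parentUUID/childUUID", as in slotsymbol= */
    const char *slot;   /* e.g. "opamp3" */
};

/* Return the slot for the instance reached via 'path', or NULL if none.
   Keying on the whole path, not just the symbol's own UUID, is what lets one
   symbol in a shared sub-schematic occupy different slots under different
   parent instances. */
static const char *slot_for_instance(const struct slot_assignment *table,
                                     size_t n, const char *path)
{
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (strcmp(table[i].path, path) == 0)
            return table[i].slot;
    return NULL;
}
```

With entries for two instances of the same filter block, the same opamp symbol resolves to different slots (say opamp3 and opamp4) depending on which parent's path gschem is displaying.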

I should reverse the associations now, before anyone really starts using my fork for its slotting mechanism. Patches welcome - it's a big job.

[1] Stock gEDA doesn't understand heterogeneous slots, conflates the symbol for a function (a NAND gate, for example) with the symbol for a part (correspondingly, a 74LS00 in this example), relies on the user to manually track slot assignments, and relies on fragile hacks to produce the correct netlist (setting identical refdes= attributes - or, worse, using lowercase suffixes to trigger special-case treatment in PCB). You end up with multiple symbols with a split identity between function and part, that can easily take on incompatible attributes. Imagine two NAND gates intended to be gates in the same chip, but each carries a mutually incompatible footprint= attribute. Let's not even think of the different pin numbering schemes of the various package styles - stock gEDA would demand that you edit the slotdef= attributes. How baroque!

[2] The hierarchy of schematics needs to be acyclic though; I can't see any good coming from a cycle of schematics. There is a similar issue in the component library: the gEDA file format allows any object to appear as part of the graphical representation of a symbol, including other symbols, and specifically including itself. Semantically invalid, but syntactically okay. A hare and tortoise algorithm would be able to detect cycles, and one day I'll get around to adding such a check.

[3] We could have a design consisting of a backplane and a set of identical daughterboards; each of these daughterboards would have identically-numbered parts, and this wouldn't be a problem, because the daughterboard is an entire sub-project. The important bit here is that only the topmost schematic of a particular assembly, subproject, whatever you want to call a distinct domain of refdes values, should carry the slot assignments for all the abstract symbols and slots below it.
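Footnote [2] mentions hare and tortoise. That algorithm follows a single successor pointer, so it suits one chain of source= inclusions; a schematic hierarchy branches, so this sketch swaps in a different technique, a depth-first search with three-colour marking, which finds a cycle anywhere in the graph, self-references included. It is C over a hypothetical adjacency matrix, not libgeda's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical graph: edge[a][b] != 0 means schematic (or symbol) a
   references b, via source= or by embedding it. */
struct graph {
    size_t n;
    unsigned char edge[MAX_NODES][MAX_NODES];
};

enum colour { WHITE, GREY, BLACK };   /* unvisited, on DFS stack, finished */

static int visit(const struct graph *g, size_t v, enum colour *c)
{
    size_t w;

    c[v] = GREY;
    for (w = 0; w < g->n; w++) {
        if (!g->edge[v][w])
            continue;
        if (c[w] == GREY)
            return 1;   /* back edge: w is still on the stack, so a cycle */
        if (c[w] == WHITE && visit(g, w, c))
            return 1;
    }
    c[v] = BLACK;
    return 0;
}

/* Returns 1 if the graph contains a cycle (including a self-loop), else 0. */
static int has_cycle(const struct graph *g)
{
    enum colour c[MAX_NODES];
    size_t v;

    assert(g->n <= MAX_NODES);
    for (v = 0; v < g->n; v++)
        c[v] = WHITE;
    for (v = 0; v < g->n; v++)
        if (c[v] == WHITE && visit(g, v, c))
            return 1;
    return 0;
}
```

A sub-schematic shared by two parents is not reported: reaching an already finished (BLACK) node is fine, and only an edge back to a node still on the stack closes a cycle.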

Tuesday, January 29, 2013

dpkg MD5 checksums

My OpenOffice installation stopped working a few days ago after I Changed Nothing (tm) [1], so one avenue of investigation was to check for any unexplained changes to installed files. That had happened to me once before [2], back when I worked at Prism, so "obviously" I felt I should check out that possibility again:

$ md5sum -c --quiet /var/lib/dpkg/info/*.md5sums

After much disk grinding (sometimes I'm sure I'm about to see a puff of hard disk powder come out of the fan exhaust), what seems to be a smoking gun:

usr/bin/gnuplot: FAILED
md5sum: WARNING: 1 of 45 computed checksums did NOT match

This is interesting! So I download the deb for gnuplot-x11 and unpack it manually (with binutils' ar), and find the same "wrong" checksum. A friend repeated the procedure and found the same "wrong" checksum, so I'm no longer suspecting a fancy worm/virus that infects new gnuplot binaries as they appear on the filesystem.

It turns out that these mismatching packages have preinst scripts that "divert" files (with dpkg-divert), which invalidates the naive checksum. The diverted files are still around, but their names no longer match what's in the lists of MD5 checksums.
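The rule a diversion-aware checker has to apply fits in a few lines of C. This sketch assumes the diversions have already been read into memory (for example from `dpkg-divert --list`, which prints lines roughly of the form "diversion of FROM to TO by PACKAGE"); the struct and function names are hypothetical, and this is not how debsums(1) is actually written. Note also that the .md5sums files list paths relative to /, as in the usr/bin/gnuplot line above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one diversion. package == NULL stands
   for a local diversion, which applies to every package. */
struct diversion {
    const char *from;
    const char *to;
    const char *package;
};

/* Where does package 'pkg''s copy of 'path' actually live on disk?
   If some other package diverted 'path', pkg's copy was installed under the
   diverted name, and the diverting package's own copy occupies 'path'. */
static const char *installed_path(const struct diversion *d, size_t n,
                                  const char *path, const char *pkg)
{
    size_t i;

    assert(path != NULL && pkg != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(d[i].from, path) != 0)
            continue;
        if (d[i].package != NULL && strcmp(d[i].package, pkg) == 0)
            return path;   /* the diverter keeps the original name */
        return d[i].to;    /* everyone else's copy was moved aside */
    }
    return path;           /* not diverted */
}
```

A naive `md5sum -c` effectively always uses `path`, so it compares one package's checksum against the diverter's file.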

And that's where laziness bites me in the behind: I knew by the time I started on my wild goose chase that debsums(1) checks checksums, but since I didn't have it installed and felt too lazy to install it, I decided to just run the checksum files through md5sum(1). After all that effort to explain these mismatched checksums, I installed debsums(1) anyway and discovered that it knows how to follow diversions!

Now, I'm back to still wanting to know why OpenOffice stopped working.

[1] I upgraded google-chrome, but that update involved only its own package. ooffice seemed to stop working after I tried to open some document that caused it to crash, but I no longer recall the exact sequence of events.

[2] It was almost ten years ago when gethostbyname(3) or some nearby interface seemed to stop working. Suddenly no programs could connect to anything on the Internet anymore. After a bit of bug-chasing I noticed that libc's contents had changed. I don't remember what led me to check that with rpm, but I did. I must have suspected cosmic rays, because I made a copy of libc before rebooting, in order to freeze the corrupted memory contents onto stable storage. Sure enough, after the reboot libc was fine (clearly having been reloaded from the uncorrupted copy on disk), and a diff of a hexdump showed some six bytes that differed, right inside gethostbyname(3). To this day I don't know how I might have forced the kernel to re-read what must have been a very frequently-accessed page.

Monday, January 14, 2013

Waterfall spectrogram

A few days ago, while I was visiting my dad, he got a call from Leon, who was transmitting at 137kHz at the time, asking my dad to listen for the signal. We didn't hear anything convincing, but it got me thinking: with some DSP we could grope out the signal from under the noise floor.

I quickly hacked up a pipeline on my laptop involving Pulseaudio's parec, my own FFT tool, and gnuplot. Sure enough, there seemed to be an (inaudible) audio signal at about 390Hz, clearly visible in an approximately 10 second integration; transforming T seconds of audio gives frequency bins 1/T apart, so 10 seconds gives a resolution of about 0.1Hz. But I wasn't satisfied; I wanted to see the short-term spectra scrolling across the screen in real time. I ended up hacking into the night, but was ultimately frustrated by some GTK+ weirdness. (One needs to set a file descriptor to non-blocking mode when calling g_io_add_watch.) I wasn't able to finish anything useful during my visit.
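The non-blocking fix itself is a plain POSIX call. Here is a minimal C sketch, assuming a watch callback that reads from a pipe such as the one parec writes into; `set_nonblocking` and `drain` are hypothetical helper names, not GLib API (GLib can also set the flag on a channel with g_io_channel_set_flags(channel, G_IO_FLAG_NONBLOCK, &error)). One plausible reason the flag matters: a callback that reads until the pipe is empty will, on a blocking descriptor, sit inside read() once the data runs out and freeze the main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into non-blocking mode, keeping its other status flags.
   Returns 0 on success, -1 with errno set on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (flags & O_NONBLOCK)
        return 0;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read whatever is available right now, up to len bytes. Returns the count
   (0 if nothing is waiting or at end of file), or -1 on a real error.
   On a blocking fd, the read() after the data runs out would wait instead. */
static long drain(int fd, char *buf, size_t len)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);

        if (n > 0)
            total += (size_t)n;
        else if (n == 0)
            break;                      /* end of file */
        else if (errno == EINTR)
            continue;                   /* interrupted: retry */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                      /* pipe empty for now */
        else
            return -1;
    }
    return (long)total;
}
```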

Last night I achieved victory:

This image is just a looped GIF showing two 1-second slices where I said something or my dog bumped the table. The vertical axis spans DC to 22kHz, which obscures much of the interesting stuff in the lower-frequency band where most of the information in speech lives. I'm not finished with this tool; it needs at least some zooming functionality, to adjust the range of frequencies shown, and to expand and contract the colour range I use to indicate spectral intensity. You can grab a copy of the code from repo.or.cz and help out, if you like.

Thursday, December 27, 2012

That damn halting problem

I'm fixing my gEDA fork so that it will work with Guile 2.0. There are several good reasons for doing so, including moving with the times and hopefully cleaner valgrind output. There's an interesting new feature in this new major release of Guile: scheme source files get compiled automatically. (They get cached under $HOME/.cache/guile, in case you're wondering.)

All that cleverness presumably leads to faster-running code (at the cost of slower startup), but it also results in this output to stderr:


;;; compiling /usr/local/geda/share/gEDA/system-gafrc
;;; /usr/local/geda/share/gEDA/system-gafrc:31:18: warning: possibly unbound variable `build-path'
;;; /usr/local/geda/share/gEDA/system-gafrc:40:24: warning: possibly unbound variable `build-path'
;;; /usr/local/geda/share/gEDA/system-gafrc:43:0: warning: possibly unbound variable `load-scheme-dir'
;;; /usr/local/geda/share/gEDA/system-gafrc:48:19: warning: possibly unbound variable `build-path'
;;; compiled /home/berndj/.cache/guile/ccache/2.0-LE-8-2.0/usr/local/geda/share/gEDA/system-gafrc.go
;;; compiling /usr/local/geda/share/gEDA/scheme/geda.scm
;;; compiled /home/berndj/.cache/guile/ccache/2.0-LE-8-2.0/usr/local/geda/share/gEDA/scheme/geda.scm.go

system-gafrc loads geda.scm with a call to (load-from-path "geda.scm") that occurs before the first use of build-path or load-scheme-dir. Unfortunately, the compiler can't "see through" the call to load-from-path, and hence can't see that geda.scm provides the definitions of the functions system-gafrc calls.

The compiler can't see through these calls in the general case, because to do so would be to solve the halting problem. Imagine that just above the call to (load-from-path "geda.scm") there were a chunk of code that wrote a definition for build-path if and only if it searched ℤ³ and found a triple of positive integers such that x³ + y³ = z³. We know now (thanks, Andrew Wiles) that this is impossible, so substitute a search based on any other as yet unproven conjecture, if you like. (This is a counter to the student's objection to the impossibility of solving the halting problem, an objection my younger self raised too: "But I can show that these programs halt!")
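The same thought experiment as a C sketch (C rather than Scheme; the function names are hypothetical). The bounded search always halts and can be tested. The unbounded one returns, and so reaches the point where build-path would be defined, only if a counterexample exists, so deciding whether code after a call to it is reachable means deciding the n = 3 case of Fermat's Last Theorem.

```c
#include <assert.h>

/* Is there a triple of positive integers x <= y < z, for this z,
   with x^3 + y^3 == z^3? */
static int has_cubic_triple(unsigned long long z)
{
    unsigned long long x, y, z3 = z * z * z;

    for (y = 1; y < z; y++)
        for (x = 1; x <= y; x++)
            if (x * x * x + y * y * y == z3)
                return 1;
    return 0;
}

/* Bounded search over z <= limit: always halts. */
int cubic_triple_up_to(unsigned long long limit)
{
    unsigned long long z;

    for (z = 2; z <= limit; z++)
        if (has_cubic_triple(z))
            return 1;
    return 0;
}

/* Unbounded search: returns only if a counterexample exists, which is the
   only path on which build-path would get defined. Never call this.
   (Ignores the eventual wrap-around of 64-bit arithmetic.) */
void search_then_define(void)
{
    unsigned long long z;

    for (z = 2; ; z++)
        if (has_cubic_triple(z))
            return;   /* ...a definition of build-path would go here */
}
```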

And that is why it isn't reasonable to expect guile's auto-compiler to know that build-path and load-scheme-dir are, in fact, defined before use. That's a separate matter from whether it should warn or not.

Wednesday, October 31, 2012

Life in gEDA?

Vital signs weak, but detectable

A few days ago the geda-user mailing list suddenly remembered that development on gEDA/gaf had slowed to a near standstill. Some soul-searching ensued, and as usual John Doty castigated anyone daring to suggest that gschem's UI could stand a bit of improvement. If it ain't broke, don't fix it! And, as usual, others castigated him in turn for being unreasonable. In the years I've known his mailing list persona, I've found that I disagree with him about the "micro" things: whether, and how, to use keyboard shortcuts, whether to integrate various tools more closely, and whether and how to add attributes that do special things. In my fork's case, such attributes implement my solution to the "transistor problem", as well as to the related "package pin numbering problem", both specific instances of the broader light-versus-heavy symbols problem.

My fork is found

And somebody noticed my fork! I'm glad, both because I'm vain and want to be noticed, and because I hope it's a sign that the cliqueishness / insularity of a few years ago is giving way to a more outward-looking culture. Perhaps they have to: commits to the main repository have slowed to a near standstill and KiCAD has all but eaten gEDA's lunch ever since CERN chose the former over the latter. Perhaps, like some people afflicted with depression, the patient has now suffered enough and decides to get better. We'll see.

The truth shall set you free (but first it will cause pain)

There's also what I find a more credible explanation for why Anthony Blake quit working on gEDA than the notion that (presumably - see footnote [1]) KMK "drove him away". I stand by what I told Ales during an IRC conversation in #geda about the disintegrating developer community:

2011-05-19 04:58:21 <berndj>    i could also submit that when people quit, they may indicate "excuses" rather than "reasons" if, say, real-life circumstances were causing them to want to quit soon anyway

If DJ has, as I suspect, the real reason here, then I think a few apologies are due to the victims of the toxic-people witch-hunt. If apologies are too personal for what may have been an emergent and collective phenomenon, then at least an olive branch.

Return from self-imposed exile?

I don't know yet. I've long suspected that I may have been a supposed "toxic person" - it is my perhaps somewhat paranoid way to make sense of how comprehensively my non-trivial patches have failed to find any traction with any of the "gEDA developers" clique. I'm done with explaining, at length, the features that my patches implement, and why I think they are a good idea. Clearly they're forgettable, and not as self-evidently the superior solution to the problems they seek to address as other folks' favoured solutions. No matter. I'm fluent in C, and I can keep maintaining my fork so that I, at least, can have my abstract slots and, one day when I teach pcb to do the other half of the work, pin and gate swapping back-annotations. Perhaps I'll just speak when spoken to, and not earlier.


[1] I'm piecing together the details from fragments, as Ales only vaguely described the contents of the "I quit!" private emails he had received. Both KMK and JPD seemed to be the black sheep around the time the elders were discussing the breakdown of the community, but overall, I think the context of the IRC discussion hints that they attributed Anthony's departure to KMK's "abusiveness".

Tuesday, August 21, 2012

My response to "A Generation Lost in the Bazaar"

Poul-Henning Kamp argues that a whole generation of programmers is "lost" in the bazaar and wouldn't even recognize a cathedral if they saw one. (Strangely, although he refers to ESR's The Cathedral and the Bazaar, he seems to use the words "bazaar" and "cathedral" with different meanings: PHK's "cathedral" seems to be an analogy for a designed program, rather than an analogy for a xenophobic elite.)

I added a comment, and a few hours later it was gone! ACM was censoring me! Luckily Google had indexed the page after I had commented but before my comment disappeared, and I was able to reconstruct nearly my whole comment by searching for words and phrases I knew I had used, and copying the search results' excerpts where I recognized them. It was only a temporary blackout, though; when I checked again while writing this, my comment was back. Here it is, with some elaboration. (Sentences I missed in italics. Pretty good reconstruction, eh?)
Much of your rant seems predicated on the badness of the autotools. While some of your criticisms are certainly valid (even though I might not necessarily agree on what the response should be), you seem to be burning the sources of inelegant software in effigy, by attacking the autotools. The creators of the autotools are not the ones responsible for the problems which the autotools were designed to address!

Secondly, some of your criticisms (here and in your autocrap rant) against the autotools are again misplaced: you seem to want to hold the autotools' creators responsible for the misuses to which others put them, sometimes due to innocent ignorance, other times due to an arrogant sense of superiority (poor translation: I want to convey the German word "besserwisserisch").

As others have pointed out, I'm not surprised either that there are features in a large dependency graph that go into a black hole, like the TIFF support you mention. My curiosity is somewhat piqued to know how this irony resolves. And specifically, is this a run-time dependency or a build-time dependency?

But here's another irony: you don't recognize the deliberate architecture to which the autotools were built. I wasn't around at the time to know if they were originally built cathedral-style (their development now seems very bazaar-like), and you might not appreciate the architectural style used, but boy, you'd better believe it, much of your interaction with them was *designed*. In my experience most of the "friction" in interacting with the autotools is due to the very sort of meta-ignorance you write of: not even knowing what the tools' raison d'être is, or "disagreeing" with them (as if that is possible). I like to say: Those who do not understand the autotools are doomed to reinvent them, poorly. Please understand that I mean a philosophical understanding; I'm sure you know very well what AC_CHECK_LIB does, for example. And yes, I think I can agree with your criticism of keeping around 20-years-superseded feature checks.

P.S. I know I'm conflating cathedral-as-designed-artifact with cathedral-as-home-of-a-priesthood.
Others argued, and I agree, that a bazaar is the only style that is capable of delivering the gamut of systems we use today. PHK's lament reads a bit like a town planner despairing that there are no cities that consist only of cathedrals, that there's just so much damn chaos in the streets, with people just building their houses where they please! But that's pining for an unworthy goal: centrally planned economies don't work - a universally planned city would result in suburbs that are convenient to administer, but are not what their residents desire. Likewise, in the software world, one could have a landscape of only cathedrals, but then the bazaars would be gone, and so would be the users who shop at the bazaars.

I think it's important to consider the results of the evolutionary pressures acting on the software development industry. Clearly, the unruly bazaar strategies have largely displaced the cathedrals. In some instances we have even had direct invasions of individual members of the cathedral population: witness GCC's conversion to bazaar style with the 2.95 release. More recently came another such conversion, from XFree86 to Xorg. I'm not aware of any conversions going the other way. This sort of conversion just doesn't happen consistently if there aren't strong ecological advantages to the bazaar strategy. And I suppose that's my objection to PHK's portrayal of our modern industry as a bunch of unruly wet-behind-the-ears kids who should get off his lawn, dammit, distilled into one word: ecology.