Boehm Garbage Collector
November 10th, 2007
Toying with the Nokia 770 has been teaching me more than I ever wanted to know about C, gcc, and the other things we do to write software for handhelds. I had known about the existence of garbage collectors for C and C++ but had always assumed they worked similar to GObject. A GObject programmer must call g_object_ref() and g_object_unref() to manipulate the reference count of each object on the heap. When the count reaches zero the space is reclaimed. A Boehm GC, rather, can function as a drop-in replacement for malloc() and free(). To my amazement the free() procedure isn’t required at all and can be replaced with a noop. The word “automagically” doesn’t even annoy me in this case since the Boehm GC is quite a trick. It can determine when an object on the heap is no longer required by comparing places where pointers usually live with it’s internal book. For example, a function that calls malloc usually sets a pointer on the stack to the allocated memory. When the stack shrinks the reference is gone and the dead memory can be automatically freed.
The main benefit, for me anyway, is that I can write consing functions in C. Any linked list implementation has a foreach method. Foreach is easy because it doesn’t have to keep the return value of its lambda argument. Map, filter, and almost every other good thing that comes from functional programming requires consing up a new list. Allocating the right amount of memory for each return value is doable. The difficult part is the bookeeping required to later free all the unreferenced lists. A functional program might map, filter, then reduce in a single line as it takes the GC for granted. Manual allocation is a better fit for building fewer, less disposable structures. I’m not sure I’ll use map that much with C but it would be a good way to test-drive the Boehm GC.
There has to be a catch. I’m still wide eyed that the Boehm GC even works. It might be a sign that Mozilla and X, the two biggest hogs on my system, use it.
Duck Typed Declarations of War
August 21st, 2007
Lessons from computer programming for U.S. government.
Duck-typing has gained a lot of prominence as a means to facilitate code reuse and refactoring. Proponents argue that the absence of type declarations results in shorter, easier to understand code that is simpler to maintain. For example, an objects which prints a list of its members names need not concern itself with the type of those members as long as each member has a name. If the object walks like a duck and talks like a duck, it’s a duck and it can be used and abused anywhere a duck might be needed.
Most statutory law code is duck-typed. A felon is someone who as been convicted of a felony and so on. There is no law that says John Doe must go to jail. John Doe only goes to jail if he is a criminal as declared in the statutes and implemented in precedent. Why then should the executive have a law that directs it to attack Iraq or Iran when what it really needs is a law that says attack Our Enemies? Despite the current limitations and red tape, it is possible to wage war to an interface. This benefits in a code of law that is easier to modify in the face of changing and conflicting requirements such as finding evidence of WMD, spreading democracy, stopping nuclear proliferation, bringing justice to Sadam, retaliation for Sept. 11, preventing terrorism, controlling oil supply, stabilization in the region, world peace, religious freedom, etc.
Rather than obtain a declaration of war from congress against a specific nation, a prudent executive should obtain permission to wage war on an abstract entity. Good picks include Terrorists and “Evildoersh” but even these may prove too constrictive for later refactoring. Goals always tend to change in any large project so it would be ideal to get a war declaration on the super, Our Enemies. Future implementations of democratic government should have better support for waging war directly on Our Enemies. Until then it’s possible to wage war on The Terrorists while raising few public exceptions.
Of course this is only proof of concept material and you shouldn’t trust it until TestCase::Iran is merged.
Get the source!
Ruby again: Any string can be a method name
May 9th, 2007
Mattz wrote that he designed Ruby with the intent of least surprise.
While tooling around with Liquid Templates I was surprised to learn that a method can begin with a capital letter. What a pleasant surprise. A little experimentation in irb provided more insight into allowable method names. Check it out.
irb(main):005:0> self.class.send :define_method, "Is this a funny name for a method?" do
irb(main):006:1* "Yes, but Ruby doesn't care."
irb(main):007:1> end
=> #<Proc:0xb7cf1900@(irb):5>
irb(main):008:0> self.send "Is this a funny name for a method?"
=> "Yes, but Ruby doesn't care."
Crazy!
Why directories?
May 8th, 2007
I had this crazy idea the other day that I can’t get out of my head. It’s sort of Copernican revolutionist and perhaps plain wrong so I don’t blame you if you disagree.
I was thinking it might be for the best if we could scrap directories in the filesystem. A regular file would then be linked to children, along with it’s stream. In other words, merge directories and files into one type. I suppose Dennis Ritchie thought about these things and implemented directories as a separate type from regular files because they ARE a separate type. Regular files are streams of data. Directories are a list of files, both regular and not. I should thank my lucky stars that the two are polymorphic enough to be listed along side each other when I type ls.
From a user perspective though it is sort of bothersome. I remember an old version of Word asking me if I wanted to link to a picture or include the picture data in file. Now, if Word files were implemented as directories we wouldn’t need to let the application worry about such things (assuming FAT had symlinks). I could cd mywordfile.doc and inspect the contents to see for myself whether or not it contained images or links to images. Database implementations almost always use a directory per database; why not every other program? Usually I want to create a rich (as in treelike) document, not a stream, but the application saves my data as a stream. Shouldn’t the OS handle this?
Positives of blurring regular files and directories
- One less first class data-type. Our filesystems now have at least three; regular files, directories, and sym-links. Hardlinks are not really first class. What I am proposing is to scrap the concept of directories and hardlink files in a hierarchical manner. I would never want to lose symlinks.
- No more non-standard formats for embedded images, etc.
- No more tar. One less step to do before or after transporting data.
- Less multipart-mixed sections in email because we could attach one file that contained the others.
- Less XML on disk, more on the wire. Perhaps if it were less of a chore to transmit a directory people would use them more often. We would see configuration files, hierarchical in nature like MS registry files, that use the OS instead of XML.
Negatives and Unknowns
- We can already accomplish this by using a convention like index.html, where the directory is made to contain data of it’s own as if it where a regular file.
- The transport problem is largely solved by tar, and I do mean largely. tar—help is 244 lines. Maybe email clients should open tar files instead of expecting us to include all the components of a document via multipart-mixed.
- Possibly lots of small files eating up blocks.
- Large number of open file handles.
- How would file permissions work? Same as they do now?
- Dennis Ritchie was probably right with our current implementation. The best way to avoid a leaky abstraction is to avoid abstraction. Directories and regular files are different enough that they warrant exposing different types to the user.
Finding a file by filename, the fast way.
April 9th, 2007
# Get list of files for grepping at my leisure
33 03 * * * root find / -mount > /root/files
Now look at the time I'm saving. Here I count the C files on the local disk using both a cached, and a non-cached list of files.
$ time sudo find / -mount -name *.c | wc
265 265 15190
real 1m17.288s
user 0m1.168s
sys 0m2.772s
$ time sudo grep "\.c$" /root/files | wc
265 265 15190
real 0m0.309s
user 0m0.280s
sys 0m0.020s
Of course you need to set logical permissions on the /root/files file. And note that on my system this is a 35MB file with almost 500,000 lines so be careful not to open it in MS Word :)
*** Update 05-08-2007
slocate is better!