Wednesday, March 04, 2009

signal slot qt/moc vs templates

http://doc.trolltech.com/4.5/templates.html

Why Doesn't Qt Use Templates for Signals and Slots?

Templates are a built-in mechanism in C++ that allows the compiler to generate code on the fly, depending on the type of the arguments passed. As such, templates are highly interesting to framework creators, and we do use advanced templates in many places in Qt. However, there are limitations: there are things that you can easily express with templates, and there are things that are impossible to express with templates. A generic vector container class is easily expressible, even with partial specialisation for pointer types, while a function that sets up a graphical user interface based on an XML description given as a string is not expressible as a template. And then there is a gray area in between: things that you can hack with templates at the cost of code size, readability, portability, usability, extensibility, robustness and ultimately design beauty. Both templates and the C preprocessor can be stretched to do incredibly smart and mind-boggling things. But just because those things can be done does not necessarily mean doing them is the right design choice.
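
For illustration only (this sketch is not from the Qt documentation), a generic vector with a partial specialisation for pointer types might look roughly like the following; the class and member names are made up and the bodies are simplified:

    #include <cstddef>
    #include <vector>

    // Primary template: stores elements by value.
    template <typename T>
    class Vector {
    public:
        void append(const T &value) { data.push_back(value); }
        const T &at(std::size_t i) const { return data[i]; }
    private:
        std::vector<T> data;
    };

    // Partial specialisation for pointer types: the same interface, but every
    // Vector<T *> stores plain void pointers, so one underlying container type
    // is shared across all pointer instantiations (less generated code).
    template <typename T>
    class Vector<T *> {
    public:
        void append(T *value) { data.push_back(value); }
        T *at(std::size_t i) const { return static_cast<T *>(data[i]); }
    private:
        std::vector<void *> data;
    };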

There is an important practical challenge we have to mention: due to the inadequacies of various compilers, it is still not possible to fully exploit the template mechanism in cross-platform applications. Code unfortunately is not meant to be published in books, but compiled with real-world compilers on real-world operating systems. Even today, many widely used C++ compilers have problems with advanced templates. For example, you cannot safely rely on partial template specialisation, which is essential for some non-trivial problem domains. Some compilers also have limitations with regard to template member functions, which make it hard to combine generic programming with object-oriented programming. However, we do not perceive these problems as a serious limitation in our work. Even if all our users had access to a fully standards-compliant modern C++ compiler with excellent template support, we would not abandon the string-based approach used by our meta object compiler for a template-based signals and slots system. Here are five reasons why:

Syntax matters

Syntax isn't just sugar: the syntax we use to express our algorithms can significantly affect the readability and maintainability of our code. The syntax used for Qt's signals and slots has proved very successful in practice. The syntax is intuitive, simple to use and easy to read. People learning Qt find the syntax helps them understand and utilize the signals and slots concept -- despite its highly abstract and generic nature. Furthermore, declaring signals in class definitions ensures that the signals are protected in the sense of protected C++ member functions. This helps programmers get their design right from the very beginning, without even having to think about design patterns.
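
To make this concrete, here is a minimal sketch of the declaration side of that syntax, in the spirit of the examples in Qt's documentation; the Counter class is made up for illustration, and the signal body is generated by moc rather than written by hand:

    #include <QObject>

    class Counter : public QObject
    {
        Q_OBJECT                        // enables signals, slots and the meta object
    public:
        Counter() : m_value(0) {}
        int value() const { return m_value; }

    public slots:
        void setValue(int value)
        {
            if (value != m_value) {
                m_value = value;
                emit valueChanged(value);   // emitting is an ordinary statement
            }
        }

    signals:
        void valueChanged(int newValue);    // implementation generated by moc

    private:
        int m_value;
    };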

Code Generators are Good

Qt's moc (Meta Object Compiler) provides a clean way to go beyond the compiled language's facilities. It does so by generating additional C++ code which can be compiled by any standard C++ compiler. The moc reads C++ source files. If it finds one or more class declarations that contain the Q_OBJECT macro, it produces another C++ source file which contains the meta object code for those classes. The C++ source file generated by the moc must be compiled and linked with the implementation of the class (or it can be #included into the class's source file). Typically moc is not called manually, but automatically by the build system, so it requires no additional effort by the programmer.
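
Once the moc-generated code is compiled and linked in, connections are established at run time with the string-based SIGNAL() and SLOT() macros. A small usage sketch, assuming the hypothetical Counter class declared above:

    #include <QObject>

    // Counter is the made-up class sketched in the previous section.
    void wire(Counter *a, Counter *b)
    {
        // Qt 4 string-based connection: the signatures are checked against the
        // meta object information that moc generated for Counter.
        QObject::connect(a, SIGNAL(valueChanged(int)),
                         b, SLOT(setValue(int)));

        a->setValue(12);    // b's setValue(12) slot is invoked through the signal
    }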

The moc is not the only code generator Qt uses. Another prominent example is the uic (User Interface Compiler). It takes a user interface description in XML and creates C++ code that sets up the form. Outside Qt, code generators are common as well. Take, for example, rpc and idl, which enable programs or objects to communicate over process or machine boundaries, or the vast variety of scanner and parser generators, with lex and yacc being the most well-known; these take a grammar specification as input and generate code that implements a state machine. The alternatives to code generators are hacked compilers, proprietary languages or graphical programming tools with one-way dialogs or wizards that generate obscure code during design time rather than compile time. Rather than locking our customers into a proprietary C++ compiler or into a particular Integrated Development Environment, we enable them to use whatever tools they prefer. Instead of forcing programmers to add generated code into source repositories, we encourage them to add our tools to their build system: cleaner, safer and more in the spirit of UNIX.

GUIs are Dynamic

C++ is a standardized, powerful and elaborate general-purpose language. It's the only language that is exploited on such a wide range of software projects, spanning every kind of application from entire operating systems, database servers and high-end graphics applications to common desktop applications. One of the keys to C++'s success is its scalable language design that focuses on maximum performance and minimal memory consumption whilst still maintaining ANSI C compatibility.

For all these advantages, there are some downsides. For C++, the static object model is a clear disadvantage over the dynamic messaging approach of Objective C when it comes to component-based graphical user interface programming. What's good for a high end database server or an operating system isn't necessarily the right design choice for a GUI frontend. With moc, we have turned this disadvantage into an advantage, and added the flexibility required to meet the challenge of safe and efficient graphical user interface programming.

Our approach goes far beyond anything you can do with templates. For example, we can have object properties. And we can have overloaded signals and slots, which feels natural when programming in a language where overloads are a key concept. Our signals add zero bytes to the size of a class instance, which means we can add new signals without breaking binary compatibility. Because we do not rely on excessive inlining as done with templates, we can keep the code size smaller. Adding new connections just expands to a simple function call rather than a complex template function.
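
Overloads, for instance, are selected simply by spelling out the signature in the connection. A hedged sketch with made-up sender and receiver classes, again in the Qt 4 string syntax:

    #include <QObject>
    #include <QString>

    class SpinBox : public QObject              // hypothetical sender
    {
        Q_OBJECT
    signals:
        void valueChanged(int newValue);            // overloaded signal...
        void valueChanged(const QString &newText);  // ...same name, other argument type
    };

    class Display : public QObject              // hypothetical receiver
    {
        Q_OBJECT
    public slots:
        void showNumber(int n) { (void)n; }
        void showText(const QString &s) { (void)s; }
    };

    void wire(SpinBox *box, Display *display)
    {
        // The signature string picks the overload; no template tricks are needed.
        QObject::connect(box, SIGNAL(valueChanged(int)),
                         display, SLOT(showNumber(int)));
        QObject::connect(box, SIGNAL(valueChanged(QString)),
                         display, SLOT(showText(QString)));
    }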

Another benefit is that we can explore an object's signals and slots at runtime. We can establish connections using type-safe call-by-name, without having to know the exact types of the objects we are connecting. This is impossible with a template-based solution. This kind of runtime introspection opens up new possibilities, for example GUIs that are generated and connected from Qt Designer's XML ui files.
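
A small sketch of that kind of introspection, using the Qt 4 meta object API; the function name is made up:

    #include <QMetaMethod>
    #include <QMetaObject>
    #include <QObject>
    #include <cstdio>

    // List every signal and slot offered by any QObject-derived instance,
    // without knowing its concrete type at compile time.
    void dumpSignalsAndSlots(const QObject *object)
    {
        const QMetaObject *meta = object->metaObject();
        for (int i = 0; i < meta->methodCount(); ++i) {
            QMetaMethod method = meta->method(i);
            if (method.methodType() == QMetaMethod::Signal)
                std::printf("signal: %s\n", method.signature());
            else if (method.methodType() == QMetaMethod::Slot)
                std::printf("slot:   %s\n", method.signature());
        }
    }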

Calling Performance is Not Everything

Qt's signals and slots implementation is not as fast as a template-based solution. While emitting a signal is approximately the cost of four ordinary function calls with common template implementations, Qt requires effort comparable to about ten function calls. This is not surprising since the Qt mechanism includes a generic marshaller, introspection, queued calls between different threads, and ultimately scriptability. It does not rely on excessive inlining and code expansion and it provides unmatched runtime safety. Qt's iterators are safe while those of faster template-based systems are not. Even during the process of emitting a signal to several receivers, those receivers can be deleted safely without your program crashing. Without this safety, your application would eventually crash with a difficult-to-debug freed-memory read or write error.

Nonetheless, couldn't a template-based solution improve the performance of an application using signals and slots? While it is true that Qt adds a small overhead to the cost of calling a slot through a signal, the cost of the call is only a small proportion of the entire cost of a slot. Benchmarking against Qt's signals and slots system is typically done with empty slots. As soon as you do anything useful in your slots, for example a few simple string operations, the calling overhead becomes negligible. Qt's system is so optimized that anything that requires operator new or delete (for example, string operations or inserting/removing something from a template container) is significantly more expensive than emitting a signal.

Aside: If you have a signals and slots connection in a tight inner loop of a performance critical task and you identify this connection as the bottleneck, think about using the standard listener-interface pattern rather than signals and slots. In cases where this occurs, you probably only require a 1:1 connection anyway. For example, if you have an object that downloads data from the network, it's a perfectly sensible design to use a signal to indicate that the requested data arrived. But if you need to send out every single byte one by one to a consumer, use a listener interface rather than signals and slots.
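
A minimal sketch of such a listener interface for the byte-by-byte case, with made-up names; the coarse-grained "data arrived" notification can of course remain a signal:

    // Hypothetical 1:1 listener interface for a per-byte hot path.
    class ByteConsumer
    {
    public:
        virtual ~ByteConsumer() {}
        virtual void consumeByte(char byte) = 0;    // direct virtual call, no signal machinery
    };

    class Downloader
    {
    public:
        explicit Downloader(ByteConsumer *consumer) : m_consumer(consumer) {}

        void handleIncoming(const char *data, int length)
        {
            // Tight inner loop: one virtual call per byte.
            for (int i = 0; i < length; ++i)
                m_consumer->consumeByte(data[i]);
        }

    private:
        ByteConsumer *m_consumer;
    };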

No Limits

Because we had the moc for signals and slots, we could add other useful things to it that could not be done with templates. Among these are scoped translations via a generated tr() function, and an advanced property system with introspection and extended runtime type information. The property system alone is a great advantage: a powerful and generic user interface design tool like Qt Designer would be a lot harder to write - if not impossible - without a powerful and introspective property system. But it does not end here. We also provide a dynamic qobject_cast() mechanism that does not rely on the system's RTTI and thus does not share its limitations. We use it to safely query interfaces from dynamically loaded components. Another application domain is dynamic meta objects. We can, for example, take ActiveX components and create meta objects around them at runtime. Or we can export Qt components as ActiveX components by exporting their meta objects. You cannot do either of these things with templates.
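
As a hedged illustration of the property system and of qobject_cast() (the Document class and its title property are made up for the example):

    #include <QObject>
    #include <QString>
    #include <QVariant>

    class Document : public QObject     // hypothetical, for illustration only
    {
        Q_OBJECT
        Q_PROPERTY(QString title READ title WRITE setTitle)
    public:
        QString title() const { return m_title; }
        void setTitle(const QString &title) { m_title = title; }
    private:
        QString m_title;
    };

    void example(QObject *unknown)
    {
        // Generic property access through the meta object, by name.
        unknown->setProperty("title", QString("Report"));
        QVariant value = unknown->property("title");

        // qobject_cast() works across dynamically loaded components and does
        // not depend on the compiler's RTTI support.
        if (Document *doc = qobject_cast<Document *>(unknown))
            doc->setTitle(value.toString());
    }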

C++ with the moc essentially gives us the flexibility of Objective-C or of a Java Runtime Environment, while maintaining C++'s unique performance and scalability advantages. It is what makes Qt the flexible and comfortable tool we have today.

Sunday, March 01, 2009

Linux vs BSD, by Linus Torvalds

http://www.linux.com/articles/45571

I recently asked Linus Torvalds for his thoughts on the relative strengths and weaknesses of Linux and BSD, and about how much synergy there might be between the Linux kernel and the BSDs.
I prefaced my query to Linus by recounting my observations from a Usenix conference in San Diego a few years ago. He was a speaker that day, and a group of BSD users came right down to the front row to hear him. In fact, they laughed and joked with him, and eventually gave Linus one of the beanies with horns on it they were wearing, a familiar symbol to BSD fans.

They may have been surprised by his reaction. I was. He took the beanie they offered, put it on, and wore it during his entire presentation. No big deal, the leader of the Linux kernel wearing BSD colors. He defused what could have been a contentious moment.

NewsForge: I want to ask you a few uninformed questions about the similarities, differences, and synergy -- if any -- between the Linux kernel and the BSDs.

Torvalds: I really don't much like the comparisons. In many ways they aren't even valid, since "better" always ends up depending on "for what?" and "according to what criteria?".

NF: BSD is still considered by some to be more "technically correct" than the Linux kernel. Do you think the BSDs are better technically than the Linux kernel?

Torvalds: Linux has a much wider audience, in many ways. That ranges from supporting much wider hardware (both in the driver sense and in the architecture sense) to actual uses. The BSDs tend to be focused in specific areas, while I have always personally felt that any particular focus on any particular use is a bad thing.

Which one is "better"? To me, Linux is much better, since to me, the important thing for an OS is how well it performs under different patterns, be they embedded, server, or desktop, or just some totally crazy person in a basement trying something new.

But some people disagree with me, and like to limit their work to specific areas, and like the fact that developers have one cohesive goal, and don't care about anything else. Some people consider the Linux development model "too permissive," in other words -- they want the project to concentrate on X, where X is some random area that they care about.

Which mindset is right? Mine, of course. People who disagree with me are by definition crazy. (Until I change my mind, when they can suddenly become upstanding citizens. I'm flexible, and not black-and-white.)

NF: If the BSDs were better technically five years ago, has the playing field leveled since then?

Torvalds: I don't think they were better five years ago (see above), and I don't think the question really makes sense.

Are there areas where you could point to "X does Y better"? Oh, sure, that's inevitable. But exactly because Linux tries to be "good enough" for everybody, you'll find a lot of areas where Linux is better (often a lot better -- as in "it works"), and then you'll find a few narrow areas where one particular BSD version will be better.

To me, it's largely a mentality issue. I said "good enough," and that's really telling. The BSD people (and keep in mind that I'm obviously generalizing) are often perfectionists. They hone something specific for a long time, and then they frown on anything that doesn't meet their standards of perfection. The OpenBSD single-minded focus on security is a good example.

In contrast, one of my favorite mantras is "perfect is the enemy of good," and the idea is that "good enough" is actually a lot more flexible than some idealized perfection. The world simply isn't black-and-white, and I recognize a lot of grayness. I often find black-and-white people a bit stupid, truth be told.

NF: Is sharing between BSD and the Linux kernel a common occurrence? And if so, does it go both ways?

Torvalds: It's quite rare on the kernel level. It happens occasionally, mainly in drivers, and sometimes on the "idea" level (don't get me wrong -- it's not an acrimonious setting, and people do talk about things). But the fact is, it's usually more effort to share things and try to synchronize and agree on them than it is to have independent projects.

On a user level, there's obviously tons of sharing, since there you don't have the communication issues, and user projects tend to be pretty independent of each other (and the kernel) anyway.

NF: Are there parts of BSD today that you would like to see adopted in the kernel?

Torvalds: I certainly don't have any specifics, but that's not saying that I'd be against it. It just means that I don't know anything about BSD technical internals, so I'm the wrong person to ask. Ask somebody who uses both.

Note: Tune in Wednesday for the views of BSD leaders as they answer the same questions posed to Linus.

Linux: monolithic kernels vs microkernels, Linus vs Tanenbaum

http://en.wikipedia.org/wiki/Tanenbaum-Torvalds_debate

Surprise: Linus is not so terrible

It was Tanenbaum who started the unconstructive criticism

Linus responded harshly, but he then tried to defuse the row with one public reply and another private one




The debate

[Figures from the article: a monolithic kernel runs kernel space entirely in supervisor mode; photos of Dr. Andrew S. Tanenbaum (ast in comp.os.minix) and Linus Torvalds]

While the debate started out relatively moderate, with both parties making only general statements about kernel design, it would get progressively more detailed and sophisticated with every round of posts. Besides kernel design, the debate branched into several other topics, such as which microprocessor architecture would win out over others in the future. Besides Tanenbaum and Torvalds, several other people joined the debate, including Peter MacDonald, an early Linux kernel developer and creator of one of the first distributions, Softlanding Linux System; David S. Miller, one of the core developers of the Linux kernel; and Theodore Ts'o, the first North American Linux kernel developer.

“Linux is obsolete”

The first occurrence of this debate was recorded on January 29, 1992, when Tanenbaum first posted his criticism of the Linux kernel to comp.os.minix, noting how the monolithic design was detrimental to its abilities, in a post titled Linux is obsolete.[1] While he initially did not go into great technical detail to explain why he felt that the microkernel design was better, he did suggest that it was mostly related to portability, arguing that the Linux kernel was too closely tied to the x86 line of processors to be of any use in the future, as this architecture would be superseded by then. To put things into perspective, he remarked that writing a monolithic kernel in 1991 was "a giant step back into the 1970s".

Since the criticism was posted in a public newsgroup, Torvalds was able to respond to it directly. He did so a day later, arguing that MINIX had inherent design flaws (naming the lack of multithreading as a specific example), while acknowledging that he found the microkernel design superior “from a theoretical and aesthetical” point of view.[4] He also claimed that since he was developing the Linux kernel in his spare time and giving it away for free (Tanenbaum's MINIX was not free at that time), Tanenbaum should not object to his efforts. Furthermore, he mentioned that he had developed Linux specifically for the Intel 80386, partly because the project was a way for him to learn more about the architecture; while he conceded that this made the kernel itself less portable than MINIX, he argued it was an acceptable design principle, as it made the application programming interface much simpler and more portable. For this reason, he stated, “linux is more portable than minix. [sic]”

[Figure: microkernel architecture relies on user-space server programs]

Following Linus' reply, Tanenbaum argued that the limitations of MINIX related to his being a professor: the system had to be able to run on the rather limited hardware of the average student, which he noted was an Intel 8088-based computer, sometimes even without a hard drive.[11] Linux was, at that time, specifically built for the Intel 80386, a significantly more powerful (and expensive) processor. Tanenbaum also specifically stated: “[...] as of about 1 year ago, there were two versions [of MINIX], one for the PC (360K diskettes) and one for the 286/386 (1.2M). The PC version was outselling the 286/386 version by 2 to 1.” He noted that even though Linux was free, it would not be a viable choice for his students, as they would not be able to afford the expensive hardware required to run it, and that MINIX could be used on “a regular 4.77 MHz PC with no hard disk.” To this, Kevin Brown, another user of the Usenet group, replied that Tanenbaum should not complain about Linux's ties to the 386 architecture, as it was the result of a conscious choice rather than a lack of knowledge about operating system design, stating “[...] an explicit design goal of Linux was to take advantage of the special features of the 386 architecture. So what exactly is your point? Different design goals get you different designs.”[12] He also stated that designing a system specifically for cheap hardware would cause it to have portability problems in the future. Even though MINIX did not fully support the newer hardware that Linux did, making Linux a better choice for those who actually owned that hardware, Tanenbaum argued that since the x86 architecture would be outdone by other architecture designs in the future, he did not need to address the issue, noting “Of course 5 years from now that will be different, but 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5.” He stated that the Linux kernel would eventually fall out of favour as hardware progressed, because it was so closely tied to the 386 architecture.[11] (See the section “Erroneous predictions” for a detailed account of this claim.)

Torvalds attempted to end the discussion at that point, stating that he felt he should not have overreacted to Tanenbaum's initial statements, and that he was composing a personal e-mail to him to apologize.[13] However, he would continue the debate at a later time.

Aftermath

Despite this debate, Torvalds and Tanenbaum appear to be on good speaking terms; Torvalds wants it understood that he holds no animosity towards Tanenbaum, and Tanenbaum underlines that disagreements about ideas or technical issues should not be interpreted as personal feuds.[14]

Erroneous predictions

When the full initial debate was published in the O'Reilly Media book Open Sources: Voices from the Open Source Revolution in 1999, the book described the debate as exemplifying “the way the world was thinking about OS design at the time”.[14]

The 386 processor was then the most widespread chip “by several times”, according to participant Kevin Brown, with the 486 used in high-end computers and the 286 almost obsolete; the World Wide Web was not yet widely used. One of Tanenbaum's arguments against Linux was that it was too closely tied to the x86 line of processors, which he claimed was “not the way to go”.[1] However, as of 2009, x86 remains the overwhelmingly dominant CPU architecture on desktop computers. Linux has since been ported to many other processor architectures, including x86-64, ARM, IA-64, 680x0, MIPS, POWER/PowerPC, and SPARC.

Another recurring topic in the debate discusses alternatives to Linux and MINIX, such as GNU and 4.4BSD. Tanenbaum suggested the former in his first post, stating that unlike Linux, it was a “modern” system.[1] In his second post, he mentioned that “[...] 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5”.[11] Several debaters disagreed that GNU was a suitable alternative. Kevin Brown called it vaporware, and stated that Linux would likely benefit from the x86 architecture which would continue to be common and become more accessible to a general audience. Theodore Ts'o, an early Linux contributor, said that while a microkernel approach would have benefits, “[...] Linux is here, and GNU isn't — and people have been working on Hurd for a lot longer than Linus has been working on Linux”.[15] Torvalds, aware of GNU's efforts to create a kernel, stated “If the GNU kernel had been ready last spring, I'd not have bothered to even start my project: the fact is that it wasn't and still isn't.”[16]

4.4BSD-Lite would not be available until two years later due to the USL v. BSDi lawsuit, filed by AT&T's subsidiary Unix System Laboratories against Berkeley Software Design, which pertained to the intellectual property related to UNIX. The lawsuit slowed development of the free-software descendants of BSD for nearly two years while their legal status was in question. As Linux did not have such legal ambiguity, systems based on it gained greater support. A settlement in USL v. BSDi was reached in January 1994, and 4.4BSD was released in June. (While the final release was in 1995, several free systems based on this release have been maintained since, including FreeBSD, OpenBSD and NetBSD.)

The Samizdat incident

On 23 March 2004, Kenneth Brown, president of the Alexis de Tocqueville Institution, interviewed Tanenbaum. This was a prelude to the pending publication of a book by Brown titled Samizdat: And Other Issues Regarding the 'Source' of Open Source Code. The book claims that Linux was initially illegally copied from MINIX. Tanenbaum published a strong rebuttal, defending Torvalds,[17] and stated at that time:

I would like to close by clearing up a few misconceptions and also correcting a couple of errors. First, I REALLY am not angry with Linus. HONEST. He's not angry with me either. I am not some kind of "sore loser" who feels he has been eclipsed by Linus. MINIX was only a kind of fun hobby for me. I am a professor. I teach and do research and write books and go to conferences and do things professors do. I like my job and my students and my university. [...] I wrote MINIX because I wanted my students to have hands-on experience playing with an operating system. After AT&T forbade teaching from John Lions' book, I decided to write a UNIX-like system for my students to play with. [...] I was not trying to replace GNU/HURD or Berkeley UNIX. Heaven knows, I have said this enough times. I just wanted to show my students and other students how you could write a UNIX-like system using modern technology. A lot of other people wanted a free production UNIX with lots of bells and whistles and wanted to convert MINIX into that. I was dragged along in the maelstrom for a while, but when Linux came along, I was actually relieved that I could go back to professoring. [...] Linus seems to be doing excellent work and I wish him much success in the future. While writing MINIX was fun, I don't really regard it as the most important thing I have ever done. It was more of a distraction than anything else. The most important thing I have done is produce a number of incredibly good students, especially Ph.D. students. See my home page for the list. They have done great things. I am as proud as a mother hen. To the extent that Linus can be counted as my student, I'm proud of him, too. Professors like it when their students go on to greater glory.[18]

Re: clarification on git, central repositories and commit access lists

http://lwn.net/Articles/246381/

Is Linus rude, authoritarian and egocentric?
Debatable.

What is certain is that he is a brilliant programmer and a constructive guy who gets a lot done and helps a lot




From: Linus Torvalds
To: Adam Treat
Subject: Re: clarification on git, central repositories and commit access lists
Date: Mon, 20 Aug 2007 11:41:05 -0700 (PDT)
Message-ID:
Cc: kde-core-devel-AT-kde.org

On Sun, 19 Aug 2007, Adam Treat wrote:
>
> I just watched your talk on git and wanted to ask for clarification on a
> few points. Many of us in the KDE community are interested in git and
> some even contemplate using git as the official SCM tool in the future.

As you are probably aware, some people have tried to import the whole KDE
history into git. Quite frankly, the way git works (tracking whole trees
at a time, never single files), that ends up being very painful, because
it's an "all or nothing" approach.

So I'm hoping that if you guys are seriously considering git, you'd also
split up the KDE repository so that it's not one single huge one, but with
multiple smaller repositories (ie kdelibs might be one, and each major app
would be its own), and then using the git "submodule" support to tie it
all together.

> However, I think a few issues have been confused and want to see if you
> can clarify.

Sure.

> Your talk focused heavily on the evils of a central repository versus
> the benefits of a distributed model. However, I wonder if what you
> actually find distasteful is not a central repository per se, but rather
> designing an SCM that relies upon *communication* with a central
> repository to do branching/merging or offline development.

I certainly agree that almost any project will want a "central" repository
in the sense that you want to have one canonical default source base that
people think of as the "primary" source base.

But that should not be a *technical* distinction, it should be a *social*
one, if you see what I mean. The reason? Quite often, certain groups would
know that there is a primary archive, but for various reasons would want
to ignore that knowledge: the reasons can be any of

- Release management: you often want the central "development" repository
to be totally separate from the release management tree. Yes, you
approximate that with branches, but let's face it, the people involved
usually have a lot of overlap, but the overlap is not total, and the
*interest* isn't necessarily the same.

For an example of "release management", think of multiple different
vendors. They would probably always start with your "central" release
tree (which in turn may well be different from your central development
tree!), but vendors invariably have their own timetables and customer
issues, so they usually need to make decisions that may not even make
sense for the "official" tree.

Examples of this in the kernel is how my tree is the central
development tree, then we have the "stable" tree (which is a *separate*
thing, maintained totally separately, but obviously based on my
releases), and then each vendor tends to have their own "release
trees". They are all different, they all have different policies and
reasons for existence, and they are *all* "central" depending on who
looks at them.

- Branching. Yes, you can branch in a truly centralized model too, but
it's generally a "big issue" - the branches are globally visible
things, and you need permission from the maintainers of the centralized
model too.

Both of those are *horrible* mistakes: the "globally visible" part
means that if you're not sure this makes sense, you're much less likely
to begin a branch - even if it's cheap, it's still something that
everybody else will see, and as such you can't really do "throwaway"
development that way. And let's face it, many cool ideas turn out to be
totally idiotic, but it might take a long time until it's obvious that
it was a bad idea.

So you absolutely need *private* branches, that can become "central" for
the people involved in some re-architecting, even if they never ever
show up in the "truly central" repository. That's a huge deal for
development.

The other problem is the "permission from maintainers" thing: I have an
ego the size of a small planet, but I'm not _always_ right, and in that
kind of situation it would be a total disaster if everybody had to ask
for my permission to create a branch to do some re-architecting work.

The fact that anybody can create a branch without me having to know
about it or care about it is a big issue to me: I think it keeps me
honest. Basically, the fundamental tool we use for the kernel makes
sure that if I'm not doing a good job, anybody else can show people
that they do a better job, and nobody is really "inconvenienced".

Compare that to some centralized model, and something like the gcc/egcs
fork: the centralized model made the fork so painful that it became a
huge political fight, instead of just becoming an issue of "we can do
this better"!

There are other reasons for having a *social* network that tends to have
one or two fairly central nodes, but not having a *technical* limitation
that enforces that. But the above are the two biggest and most important
reasons, I think.

> After all, your repository acts as a de-facto central repository of the
> linux kernel in as much as everyone pulls from it. Without such a
> central place to pull the linux kernel would not exist, rather what
> you'd have is a bunch of forks which perhaps merge with each other from
> time to time.

Well, I do want to make it clear that we *do* have such forks that pull
from each other too. So the kernel actually does use the technology, it's
just that you have to be involved in the particular subprojects to even
know or care about it!

So it's not strictly true that there is a single "central" one, even if
you ignore the stable tree (or the vendor trees). There are subsystems
that end up working with each other even before they hit the central tree
- but you are right that most people don't even see it. Again, it's the
difference between a technical limitation, and a social rule: people use
multiple trees for development, but because it's easier for everybody to
have one default tree, that's obviously what most people who aren't
actively developing do.

To put this in a KDE perspective: it would make tons and tons of sense to
have one central place (kde.org) that most developers know about, and
where they would fetch their sources from. But for various reasons (and
security is one of them), that may not be the main place where most "core
developers" really work. You would generally want to have separate places
that are secure, and those separate places may be *different* for
different developer groups.

For a kernel example: the "public" git tree is on the public kernel.org
servers (including "git.kernel.org"), but that is actually not a machine
that any developers really ever push to directly.

Many kernel developers use other kernel.org machines (because we have the
infrastructure), but others will use their own setups entirely, because
they might have issues like bandwidth (ie kernel.org may be reasonably
well connected, but while it has mirrors elsewhere, the main machines are
in the US, so some European developers prefer to just use servers that are
closer).

So if you look at my merge messages, for example, you'll see things like
merges from lm-sensors.org, git.kernel.dk, ftp.linux-mips.org, oss.sgi.com
etc etc. The point being that yes, there is a central place that people
know about, but at the same time, much of the *development* really happens
outside that central place!

> For any software project to exist as opposed to a bunch of forks I think
> you *have to have* a central repository from which everyone pulls, no?
> Of course many branches might exist, but those branches must pull from a
> central repository if they want to share *at least some* common code.

Practically speaking, you'd generally have one or a few central
repositories, yes. But no, it really doesn't have to be a single one. And
I'm not just talking about mirroring (which is really easy with a
distributed setup), I'm literally talking about things like some people
wanting to use the "stable" tree, and not my tree at all, or the vendor
trees.

And they are obviously *connected*, but it doesn't have to be a totally
central notion at all.

Think of the git trees as people: some people are more "central" than
others, but in the end, the kernel is actually fairly unusual (at least
for a big project) in having just *one* person that is so much in the
"center" that everybody knows about him.

In most other projects, you literally would have different groups that
handle different parts. In the KDE group, for example, there really is no
reason why the people who work on one particular application should ever
use the same "central" repository as the people who work on another app
do.

You'd have a *separate* group (that probably also maintains some central
part like the kdelibs stuff) that might be in charge of *integrating* it
all, and that integration/core group might be seen to outsiders as the
"one central repository", but to the actual application developers, that
may actually be pretty secondary, and as with the kernel, they may
maintain their own trees at places like ftp.linux-mips.org - and then just
ask the core people to pull from them when they are reasonably ready.

See? There's really no more "one central place" any more. To the casual
observer, it *looks* like one central place (since casual users would
always go for the core/integration tree), but the developers themselves
would know better. If you wanted to develop some bleeding edge koffice
stuff, you'd use *that* tree - and it might not have been merged into the
core tree yet, because it might be really buggy at the moment!

This is one of the big advantages of true distribution: you can have that
kind of "central" tree that does integration, but it doesn't actually have
to integrate the development "as it happens". In fact, it really really
shouldn't. If you look at my merges, for example, when I merge big changes
from somebody else who actually maintains them in a git tree, they will
have often been done much earlier, and be a series of changes, and I only
merge when they are "ready".

So the core/central people should generally not necessarily even do any
real development at all: the tree that people see as the "one tree" is
really mostly just an integration thing. When the koffice/kdelibs/whatever
people decide that they are ready and stable, they can tell the
integration group to pull their changes. There's obviously going to be
overlap between developers/integrators (hopefully a *lot* of overlap), but
it doesn't have to be that way (for example, I personally do almost *only*
integration, and very little serious development).

> A central repository is also necessary for projects like KDE to enable
> things like buildbots and commit mailing lists.

I disagree.

Yes, you want a central build-bot and commit mailing list. But you don't
necessarily want just *one* central build-bot and commit mailing list.

There's absolutely no reason why everybody would be interested in some
random part of the tree (say, kwin), and there's no reason why the people
who really only do kwin stuff should have to listen to everybody else's
work. They may well want to have their *own* build-bot and commit mailing
list!

So making one central one is certainly not a mistake, but making *only* a
central one is. Why shouldn't the groups that do specialized work have
specialized test-farms? The kernel does. The NFS stuff, for example, tends
to have its own test infrastructure.

Also, it's a mistake to think that one site has to do everything. That's
not what we do in the kernel, for example. Yes, we have kernel.org, and
it's reasonably central, but that doesn't mean that everything has to, or
even should, happen within that organization.

So we've had people do build-bots and performance regressions, and
specialized testing *outside* of kernel.org. For example, intel and others
have done things like performance regression testing that required
specialized hardware and software (eg TPC-C performance numbers).

So we do commit mailing lists from kernel.org, but (a) that doesn't mean
that everything else should be done from that central site and (b) it also
doesn't mean that subprojects shouldn't do their *own* commit mailing
lists. In fact, there's a "gitstat" project (which tracks the kernel, but
it's designed to be available for *any* git project), and you can see an
example of it in action at

http://tree.celinuxforum.org/gitstat

(or get the source code from sourceforge), and the point is that all of
this was done entirely *outside* the kernel.org framework.

So centralized is not at all always good. Quite the reverse: having
distributed services allows *specialized* services, and it also allows the
above kind of experimental stuff that does some (fairly simple, but maybe
it will expand) data-mining on the project!


> These tools are important to the way we work and provide for many eyes
> constantly reviewing changes to the codebase as well as regular
> regression testing across diverse platforms. In the future, whether git
> or svn, I see no advantages in getting rid of a central repository from
> which everyone pulls. I wonder whether you really disagree.

So I do disagree, but only in the sense that there's a big difference
between "a central place that people can go to" and "ONLY ONE central
place".

See? Distribution doesn't mean that you cannot have central places - but
it means that you can have *different* central places for different
things. You'd generally have one central place for "default" things
(kde.org), but other central places for more specific or specialized
services!

And whether it's specialized by project, or by things like the above
"special statistics" kind of thing, or by usage, is another matter! For
example, maybe you have kde.org as the "default central place", but then
some subgroup that specializes in mobility and small-memory-footprint
issues might use something like kde.mobile.org as _their_ central site,
and then developers would occasionally merge stuff (hopefully both ways!)

> In your talk you also focus on the evils of commit access lists,
> comparing and contrasting with the web of trust the kernel uses where
> you have no commit access lists at all. However, isn't the kernel model
> just a special case? The linux kernel has a de-facto commit access list
> of one: you.

No, really. It doesn't. It's the one you see from the outside, but the
fact is, different sub-parts of the kernel really do use their own trees,
and their own mailing lists. You, as a KDE developer, would generally
never care about it, so you only _see_ the main one.

> This might work well for the kernel, but I fail to see how this really
> reduces politics. Many are still constantly pushing and arguing to
> merge their branches upstream into your repository. Would having a
> central repository where you and all your trusted lieutenants push their
> changes really be very different?

Yes it would be. You only see the end result now. You don't see how those
lieutenants have their own development trees, and while the kernel is
fairly modular (so the different development trees seldom have to interact
with each others), they *do* interact. We've had the SCSI development tree
interact with the "block layer" development tree, and all you ever see is
the end result in my tree, but the fact is, the development happened
entirely *outside* my tree.

The networking parts, for example, merge the crypto changes, and I then
merge the end result of the crypto _and_ network changes.

Or take the powerpc people: they actually merge their basic architecture
stuff to me, but their network driver stuff goes through Jeff Garzik - and
you as a user never even realize that there was another "central" tree for
network driver development, because you would never use it unless you had
reported a bug to Jeff, and Jeff might have sent you a patch for it, or
alternatively he might have asked if you were a git user, and if so,
please pull from his 'e1000e' branch.

For an example of this, go to

http://git.kernel.org/

and look at all the projects there. There are lots of kernel subprojects
that are used by developers - exactly so that if you report a bug against
a particular driver or subsystem, the developer can tell you to test an
experimental branch that may fix it.

> The KDE community has a very large commit access list and it is quite
> easy to join. Having a central git repository with a large set of
> committers would seem to map well with our community. I fail to see any
> harm in this model. The web of trust would still exist, it would just
> be much larger and more inclusive than the model the kernel uses. I
> wonder if you disagree.

Hey, you can use your old model if you want to. git doesn't *force* you to
change. But trust me, once you start noticing how different groups can
have their own experimental branches, and can ask people to test stuff
that isn't ready for mainline yet, you'll see what the big deal is all
about.

Centralized _works_. It's just *inferior*.

> Another sticking point is the performance implications of a git
> repository managing something the size of the KDE project. I understand
> the straightforward solution: just define content boundaries with a
> separate git repo for each submodule: kdelibs.git, kdebase.git,
> kdesupport.git, etc, etc. And then have a super git repo with hooks
> that point to these submodules. However, I think this leads to a few
> problems.
>
> What if I want to make a commit to kdelibs that will require changes in
> other modules for them to compile. I will no longer be able to make a
> single atomic commit with changes to multiple submodules, right?

Sure you will. It's hierarchical, though.

What happens is that you do a single commit in each submodule that is
atomic to that *private* copy of that submodule (and nobody will ever see
it on its own, since you'd not push it out), and then in the supermodule
you make *another* commit that updates the supermodule to all the changes
in each submodule.

See? It's totally atomic. Anybody that updates from the supermodule will
get one supermodule commit, and when that in turn fetches all the
submodule changes, you never have any inconsistent state.

> Also, won't we lose history when moving files/content between
> submodules?

Yes. If you move stuff between repositories, you do lose history (or
rather, it breaks it as far as git is concerned - you still obviously have
both *pieces* of history, but to see it, you'd have to manually go and
look).

The point of submodules is that they are totally independent entities in
their own right, so that you can develop on a submodule without having to
even know about or care about the supermodule.

Git actually does perform fairly well even for huge repositories (I fixed
a few nasty problems with 100,000+ file repos just a week ago), so if you
absolutely *have* to, you can consider the KDE repos to be just one single
git repository, but that unquestionably will perform worse for some things
(notably, "git annotate/blame" and friends).

But what's probably worse, a single large repository will force everybody
to always download the whole thing. That does not necessarily mean the
whole *history* - git does support the notion of "shallow clones" that
just download part of the history - but since git at a very fundamental
level tracks the whole tree, it forces you to download the whole "width"
of the tree, and you cannot say "I want just the kdelibs part".

> And how will we break up the existing history between all of these
> submodules?

There's a few options for that.

One is to just import the SVN history per directory in the first place,
but that makes it hard to then tie the history together in the
supermodule.

The better approach is probably to import the *whole* thing (which will
require a rather beefy machine), and then split it up from within git.
There are various tools on the git side to basically rewrite the history
in other formats, including splitting up a bigger repository (google for
"git-split", for example).

But I certainly won't lie to you: importing all the history of KDE is
going to be a fairly big project, and it will require people who have good
git knowledge to set it up. I suspect (judging by some noises I've seen on
the git mailing list and irc channel) that you have those kinds of people
already, but it may well be a good idea to _avoid_ doing it as one big
"everything at once" kind of event.

So seriously, I would suggest that if there is currently some smaller part
of the KDE SVN tree, and the people who work on that part are already more
familiar with git than most KDE people necessarily are, I suspect that the
best thing to do is to convert just that piece first, and have people
migrate in pieces. Because any SCM move is going to be a learning process
(the CVS->SVN one is much easier than most, since they really are largely
just different faces of the same coin - no real changes in how things
fundamentally work as far as the user experience is concerned).

> Finally, a couple points... CVS/SVN might be stupid and moronic, but I
> think it is good to note they are not nearly as bad as some other SCM's.
> Many SCM's used by some of the largest codebases in the world are still
> lock-based. If you think it is difficult to branch/merge using a
> central server, remember that some poor folks can't even *change a
> single file* without asking the central server for permission.

Sure. Crap exists. That doesn't make CVS/SVN _good_. It just means that
there are even worse things out there.

> It is also good to note that a free distributed SCM was not available
> until recently. The kernel community might have had a special deal with
> BitKeeper, but the same didn't apply to all open source projects AFAIK.
> When KDE moved to svn it was the best tool for the job. That might have
> changed when git became easier to use, but at the time it was simply too
> big of a barrier for new developers and too new. And from what I
> understand git support on other platforms is a recent development.

Git works pretty well on any random unix (although most users are on
Linux, with a reasonable minority on OS X - everything else tends to be
pretty spotty, and can at times require that you add compiler options
etc).

The native windows support is pretty recent, and still in flux. It's now
apparently quite usable, although I don't think there's any real
integration with any native Windows development environments (ie it's all
either command line or the "native" git visualization tools like git-gui
or gitk).

Linus


Linus Torvalds and C++

http://article.gmane.org/gmane.comp.version-control.git/57918

Linus Torvalds is not very diplomatic when expressing himself

Add to that someone annoying him on the mailing list with a piece of not very constructive criticism...


From: Linus Torvalds linux-foundation.org>
Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String Library.
Newsgroups: gmane.comp.version-control.git
Date: 2007-09-06 17:50:28 GMT
On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
>
> When I first looked at Git source code two things struck me as odd:
> 1. Pure C as opposed to C++. No idea why. Please don't talk about portability,
> it's BS.

*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot
of substandard programmers use it, to the point where it's much much
easier to generate total and utter crap with it. Quite frankly, even if
the choice of C were to do *nothing* but keep the C++ programmers out,
that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles
Bader jokingly said "to piss you off", but it's actually true. I've come
to the conclusion that any programmer that would prefer the project to be
in C++ over C is likely a programmer that I really *would* prefer to piss
off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using
the "nice" library features of the language like STL and Boost and other
total and utter crap, that may "help" you program, but causes:

- infinite amounts of pain when they don't work (and anybody who tells me
that STL and especially Boost are stable and portable is just so full
of BS that it's not even funny)

- inefficient abstracted programming models where two years down the road
you notice that some abstraction wasn't very efficient, but now all
your code depends on all the nice object models around it, and you
cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and
portable C++ ends up to limit yourself to all the things that are
basically available in C. And limiting your project to C means that people
don't screw that up, and also means that you get a lot of programmers that
do actually understand low-level issues and don't screw things up with any
idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary
objective, the "advantages" of C++ is just a huge mistake. The fact that
we also piss off people who cannot see that is just a big additional
advantage.

If you want a VCS that is written in C++, go play with Monotone. Really.
They use a "real database". They use "nice object-oriented libraries".
They use "nice C++ abstractions". And quite frankly, as a result of all
these design decisions that sound so appealing to some CS people, the end
result is a horrible and unmaintainable mess.

But I'm sure you'd like it more than git.

Linus