A good general rule is to keep all your own work, including configuration stuff (in script files if need be) as plaintext. Be able to clean everything but your raw source out of the way whenever necessary, and be able to rebuild automatically.
Your repository and configuration management system's most important job is to give you security. Every bit of complexity you add increases the danger that the system will fail. The team that abdicates control to a whizz-bang client-server configuration management system risks corruption through loss of referential integrity within that system. The resultant chaos - all work stops, efforts commence to find some way to have confidence in the backups, people start to identify what they must rework, morale disappears - shouldn't happen to a dog. It's not hard to build a good configuration management system out of system scripts using shopping lists, with something as simple as SCCS or RCS providing the underlying version control.
The same logic - that the objective is security, so simplicity increases reliability and confidence when system and users are under stress - applies to backup as well. When something goes wrong, the most important thing is to know that you have rolled the system back to its exact state at a given time in the past. Incremental backups with convoluted strategies for handling changes to the directory tree can reduce confidence that they've worked OK even when they have. Or rather, they should reduce it. The team that automatically cleans down to plaintext, streams everything off to tape, and rebuilds every night knows where it is, and on this solid foundation can spend its time making real progress. It is actually smarter than the sophisticated team that spends a month trying to baseline itself.
Keep it simple. Never fix anything - you never know if you found all the problems. Clean down and reload instead. Always be able to reformat your disk, reinstall your tools, retrieve your plaintext repository, reconfigure and rebuild. This brings total security and saves all that time you have to spend worrying about viruses. Who cares?
Conceptual integrity requires that you define an overall approach to error handling to use throughout your design and stick to it. How will you indicate failures? What idioms will programmers use to elegantly test for errors without breaking the main flow? It should not be necessary to make a second call to find out either whether an error occurred or what it was - otherwise code will bloat. Errors should ideally be testable for on the function return to allow for terse idioms, which make for apprehendable code and beyond that, the quality plateau:
if((fp = fopen(...)) == NULL)
{
// Error
}
or
if(!DoTheBusiness())
{
// Error
}
One hears many excuses for bloating up the error handling logic to the point where a rigorously coded function call can take so many lines of code it isn't possible to see the big picture any more. As mappers, we know that accreting so many Worthy coding standards you can't write a beautiful program is just missing the point.
In procedural (and some object) code, there is a debate about error handling strategy that is worth addressing. Two approaches appear in the debate. The first says that the called subroutine should not return until it has done what it was asked, and we might call it `total delegation'. The other says that the called subroutine should proceed until it encounters a problem, and then tidy up any mess it has made and bale out, telling the caller what has gone wrong - we'll call it `ready failure'.
The attractions of total delegation are that it makes for very clean code in the caller, it can be very efficient, and it devolves responsibility for maintaining the state of lower levels to the levels themselves. The disadvantage is that it only works if the caller doesn't need to actually handle the consequences of error using its own context. This limits it to systems programming type situations, where the application really can have no knowledge of the goings on below, and if the lower layer really can't sort the problem out, any kind of exit niceties above would be inappropriate because the OS is going to sling the process out or panic trap.
Ready failure always allows the caller to make a response to problems, and nested calls can pop their stack until they reach a layer that can deal with the problem. The error handling logic is threaded through every layer, but it can be minimised with careful coding, and both author and maintainer know how to operate within the scheme. In addition, one can ensure that a call trace is always produced, showing how the process got into trouble so that the situation can be reproduced.
We don't think there should be a debate, because where total delegation is appropriate at all, it is at the lowest level; ready failure belongs everywhere else. Mixing the two is a nightmare in code because it breaks conceptual integrity.
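As a sketch only (the functions and messages here are invented for illustration), ready failure might look like this: each layer does what it can, tidies up its own mess on failure, and returns a status the caller can test tersely, logging as the stack pops so a call trace accumulates.

#include <stdio.h>

/* Hypothetical layered example: each function returns 1 on success, 0 on
   failure, tidies up after itself, and logs as the stack unwinds. */

static int write_record(FILE *fp, const char *rec)
{
    if (fputs(rec, fp) == EOF)
    {
        fprintf(stderr, "write_record: fputs failed\n");  /* lowest layer reports detail */
        return 0;
    }
    return 1;
}

static int save_world(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
    {
        fprintf(stderr, "save_world: could not open %s\n", path);
        return 0;
    }
    if (!write_record(fp, "hello\n"))
    {
        fprintf(stderr, "save_world: write failed, removing partial file\n");
        fclose(fp);
        remove(path);                                     /* tidy up the mess we made */
        return 0;
    }
    return fclose(fp) != EOF;
}

int main(void)
{
    if (!save_world("world.dat"))
    {
        fprintf(stderr, "main: Could not Save World\n");  /* the caller decides what to do */
        return 1;
    }
    return 0;
}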
Some object languages provide exceptions, which allow the automatic collapse of the call stack to a layer qualified to handle the error. These are a great way to decouple the main flow from error handling. An important point to remember is that an exception can sometimes be thrown a lot higher if you just want to handle it than if you want to know how the process got into trouble. An error message from a low level saying
Could not write() datafile ftell() = 246810
followed by another saying
Could not Save World
just doesn't help with debug. You can throw exceptions up a layer at a time without compromising the main flow, and should think about doing this.
Do not abuse exceptions to create weird control flow in company time. In particular, do not hide longjmp()s in macros and call them from handlers. If you wish to experiment with the Powers of Darkness, do it at home. We all have to do it, but it is madness after all, and your colleagues might get the wrong idea and start rationalising it.
Isn't it strange that we are producing languages today that are becoming anally retentive to the point where it takes ages just to get the const declarations set up right in function prototypes, but allow us to pull stunts with control flow that we'd never have tried to pull off in assembler after we got just a few K to play with?
Try to avoid leaving assert()s and conditional compilations of debug macros littering your code. You cannot achieve the necessary sufficiency of the quality plateau with all that junk lying around.
As well as emphasising the importance of dialogue, this raises an often overlooked point. Does the user really want you to implement the failure mode described in such detail in the URD? Might a system that just works be acceptable? Of course it would be, but many teams just go ahead and implement the failure exactly as the URD said.
A modern legend at ICL tells that when they bought their first load of boards from Fujitsu, they specified that there would be a 1% rejection rate. So just before the first batch of 100 shipped, a senior Fujitsu executive picked the top board out of the crate and smashed it with a hammer before repacking it.
Apart from the need to manage state transitions and execute rarely exercised code, often across distributed platforms during conditions of failure - which is just asking for trouble - there is always a deeper problem with systems of this kind.
First we are in normal running. Then we enter failure mode. Then recovery mode. What happens if we now fail? Do we have failure during recovery from failure mode? Recovery from failure during recovery from failure mode? It is so easy to introduce a need for this kind of infinite regress into modal systems, and not even recognise it. Of course, if in your design every level of failure and recovery in the regress is identical then you are OK - all you have to do is prove this is the case.
If you can collapse the infinite regress then you can probably take the next step - eliminate the normal and recovery modes altogether and stay in failure mode! (Or eliminate the normal and failure modes and stay in recovery mode if you'd rather see it that way.) Then there are no co-ordinated state transitions to manage across multiple platforms while the gremlins are shoving the power leads in and out. The system need not ever know that it is under sustained real world attack and this is the fourth time it has tried to process a bunch of transactions. It need not know its context to do The Right Thing, if you have defined The Right Thing carefully enough.
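A sketch of the idea, with invented file names: every start-up runs the same path, deriving what remains to be done from durable state rather than from any notion of which `mode' it is in. It does not matter whether this is the first attempt or the fourth.

#include <cstdio>
#include <string>

// Invented example: transactions are listed one ID per line in journal.txt,
// and a marker file <id>.done records completion. Processing is idempotent,
// so re-running after any failure is always safe.
static bool already_done(const std::string &id)
{
    FILE *marker = std::fopen((id + ".done").c_str(), "r");
    if (marker) { std::fclose(marker); return true; }
    return false;
}

static void process(const std::string &id)
{
    // ... do the real, idempotent work here ...
    FILE *marker = std::fopen((id + ".done").c_str(), "w");  // record completion last
    if (marker) std::fclose(marker);
}

int main()
{
    FILE *journal = std::fopen("journal.txt", "r");
    if (!journal) return 1;              // nothing to do is also a known state
    char line[64];
    while (std::fgets(line, sizeof line, journal))
    {
        std::string id(line);
        while (!id.empty() && (id[id.size() - 1] == '\n' || id[id.size() - 1] == '\r'))
            id.erase(id.size() - 1);
        if (!id.empty() && !already_done(id))
            process(id);
    }
    std::fclose(journal);
    return 0;
}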
Having multiple modes for handling failure really is much less necessary than most people think, and avoiding them makes huge gains in controlling complexity. If we wish to keep control and understanding of our designs, we must minimise complexity everywhere we can. On the win side of this equation is the quality plateau. On the lose side is the interaction of complexity with other complexity, producing the vast growth in the system's state space called `combinatorial explosion'.
The thing is, database normalisation concepts apply everywhere, for the same reasons. Never store a thing, and another unconnected thing somewhere else that asserts that the thing exists. Let data control its own structure, and it won't get tangled. Ritualistic use of data structures often includes an aspect of pretending to be in control by picking up one's toothbrush with chopsticks. If we create the financial accounts structure in Box A, and a complicated description of Box A in Box B, we can spend a long time thrashing about in Box B and never have to address the fact that we really don't understand what is happening in Box A at all!
Don't fall into this trap. Let data represent themselves, or as Laurie Anderson said in Big Science,
Let X = X
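As a small illustration, with invented structures: the redundant version keeps a second container that merely asserts what the first already knows, and the two can drift apart; the normalised version lets the data represent itself, so there is nothing to keep in step.

#include <map>
#include <set>
#include <string>

// Box A and Box B: the accounts, and a separate claim about the accounts.
struct RedundantLedger
{
    std::map<std::string, double> balances;   // Box A: the accounts themselves
    std::set<std::string> existing_accounts;  // Box B: a claim about Box A
};

// Let X = X: an account exists exactly when it has an entry.
struct Ledger
{
    std::map<std::string, double> balances;
    bool exists(const std::string &name) const
    {
        return balances.find(name) != balances.end();
    }
};

int main()
{
    Ledger l;
    l.balances["savings"] = 100.0;
    return l.exists("savings") ? 0 : 1;
}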
Be aware of which system resources clean themselves up when the owning process dies (such as semaphores) and prefer them.
Avoid `cleanup processes' that run around non-deterministically on the system clock with slash and burn rights against all your system's resources. Try to use initialisation protocols that start by determining a known state and move forward from there instead. An example might be:
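A minimal sketch, with invented file names, of such a protocol: establish the known state first, then move forward from it, assuming nothing about what a dead predecessor may have left behind.

#include <cstdio>

// On start-up, reach a known state (no stale work file) before doing anything
// else, rather than relying on a roaming cleanup process to tidy up later.
int main()
{
    std::remove("work.tmp");                  // known state: no leftovers
    FILE *work = std::fopen("work.tmp", "w"); // move forward from that state
    if (work == NULL)
        return 1;                             // could not even reach the known state
    std::fputs("begin\n", work);
    std::fclose(work);
    return 0;
}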
Or, grasp branch firmly with left paw before releasing right paw!
Object designs are intended to create a formalised Knight's Fork, by providing an approach which explicitly relates real world objects to viable system semantics via object programming languages (be they Eiffel or UML code generators). When producing these designs, their creators tend to represent everything that is in the real world today, rather than tomorrow, when the system will be in use. The major difference is that tomorrow the system will exist in the user's world - today it does not. So analysts regularly produce little pictures of the user's world in the future that contain everything but the very computer system that is central to the whole scenario.
Meanwhile, the internal design of the system is also hampered by the lack of representation of the system itself. One might say that the real world system and the internal system are the same thing in both real and abstract worlds, and hence this identity forms the appearance of the Knight's Fork in its most basic form in object designs.
The two deep questions when finding objects and how they link together are:
With a clear System class in the design it's a lot easier to draw out the instantiation hierarchy, as well as see where things like GUI and tape I/O come from, let alone use cases that are triggered by wall-clock time! This doesn't mean that functionality can't be moved out into specialised classes later in the design, but it does give the user's world reality an equal footing with the system reality in the design, so the result will satisfy both criteria.
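A sketch, with invented class names, of what giving the system itself a place in the design might look like: the System class sits at the root of the instantiation hierarchy and owns the things that otherwise have no obvious home, such as the GUI, the tape I/O and the wall clock.

#include <vector>
#include <memory>

// Invented classes for illustration only.
class Gui       { public: void redraw() {} };
class TapeDrive { public: void flush()  {} };
class WallClock { public: bool tick_due() const { return false; } };

class Account { /* ... domain classes found in the user's world ... */ };

// The System class represents the delivered system itself, so the design
// contains the computer that will sit in the user's world tomorrow.
class System
{
public:
    void run()
    {
        if (clock.tick_due())   // use cases triggered by wall-clock time
            tape.flush();       // now have an obvious owner
        gui.redraw();
    }
private:
    Gui gui;
    TapeDrive tape;
    WallClock clock;
    std::vector<std::unique_ptr<Account>> accounts;  // root of the instantiation hierarchy
};

int main()
{
    System s;
    s.run();
    return 0;
}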
Getting to grips with an abstract set of classes floating around in the air with no rigorous way to get a reality check can be as painful as anything else when you don't know what you are doing.
Of course, the need for a System class disappears if one is interested in simply modelling, rather than automating, business flows where the control systems are not represented. What would be the point in that? Here we stress the point that engineering informed by the mapping cognitive strategy involves more than a set of procedural actions. It means bounding your own problem, clarifying your own desires, and finding the optimal point of leverage between problem dynamics and system semantics. If your design won't benefit from having a System class, don't use one!
A memory leak occurs when a program claims memory from its environment (with malloc() in C under UNIX or DOS, or the operator new in C++), and then forgets to give it back when it is finished with it. This can sometimes damage other processes on the same platform, because some OSes will allow one process to gobble up all available system memory and swap! Even if the OS is sophisticated enough to limit the amount of real memory it will allocate to a single process on a multiprocessor, the application can soon gobble up its own quota, which usually ends up failing in a user-visible fashion, at the very least by having the application lobbed out of the system by the OS! So memory leaks are A Bad Thing.
This is why people sell, and buy, memory leak detectors. The trouble is, memory leaks are a symptom of problems, not a cause. It isn't hard to call free(), or delete objects that are no longer needed. Using delete on collections of pointers to active objects is just plain sloppy, as is over-writing their addresses. What if these objects have, say, callbacks registered with the GUI? How will you get rid of them? Destructors out of control mean programmer out of control. If a programmer can't show control over objects enough to avoid memory leaks, how can we know that anything else is right?
Conceptual integrity is one of the strongest supports in keeping control of objects. A useful general rule (although like all rules it is not always appropriate) is to say that the layer that constructs a module should also be responsible for destroying it. This at least focuses attention on the object's lifecycle, and not just the few aspects of its behaviour that might be indicated in use case diagrams.
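A sketch, with invented class names, of the layer that constructs also destroying: the Session below owns the lifecycle of the connections it creates, including deregistering the hypothetical GUI callbacks, so nothing leaks and nothing dangles.

#include <vector>
#include <cstddef>

// Invented classes for illustration.
class Gui
{
public:
    void register_callback(void *)   {}
    void deregister_callback(void *) {}
};

class Connection
{
public:
    explicit Connection(Gui &g) : gui(g) { gui.register_callback(this); }
    ~Connection()                        { gui.deregister_callback(this); }
private:
    Gui &gui;
};

// The layer that constructs also destroys: Session creates Connections and is
// the one place responsible for deleting them again.
class Session
{
public:
    explicit Session(Gui &g)
    {
        for (int i = 0; i < 3; ++i)
            connections.push_back(new Connection(g));
    }
    ~Session()
    {
        for (std::size_t i = 0; i < connections.size(); ++i)
            delete connections[i];        // not just dropped or over-written
    }
private:
    std::vector<Connection *> connections;
};

int main()
{
    Gui gui;
    Session session(gui);   // destruction happens where construction did
    return 0;
}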
Therefore timeouts are A Bad Thing. In deliverables they make the system's state space vastly bigger, so making its behaviour much harder for the designer to predict. In debugging, they can make the conditions under which the fault occurred impossible to reproduce. Don't use them unless you absolutely must.
Communications layers are often obliged to use timeouts, because when it comes down to it, the only way to find out if a remote box wants to play is to send it a message, and wait to see if it sends one back. How long should one wait? The `Byzantine Generals' Problem' illustrates this. So most modern systems have timeouts in the comms layer, but this is not an excuse to use them all over the place, and where they must be used, they should be hidden within an encapsulated object that can be replaced by a deterministic event generator (such as a key press) for debug.
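A sketch with invented names: hide the timeout behind an interface, so that the deliverable uses the real clock while a debug build substitutes a deterministic trigger (here a key press) and the failing sequence of events can be replayed at will.

#include <ctime>
#include <cstdio>

// Invented interface: the rest of the system only ever asks "has the reply
// taken too long?" and never touches the clock directly.
class ReplyTimeout
{
public:
    virtual ~ReplyTimeout() {}
    virtual bool expired() = 0;
};

// Deliverable implementation: a real wall-clock timeout.
class ClockTimeout : public ReplyTimeout
{
public:
    explicit ClockTimeout(int seconds) : deadline(std::time(0) + seconds) {}
    bool expired() { return std::time(0) >= deadline; }
private:
    std::time_t deadline;
};

// Debug implementation: expires exactly when the tester says so.
class ManualTimeout : public ReplyTimeout
{
public:
    bool expired()
    {
        std::puts("Press return to expire the timeout");
        std::getchar();
        return true;
    }
};

int main()
{
    ClockTimeout t(5);
    while (!t.expired())
        ;                   // in the deliverable, poll the comms layer here instead of spinning
    std::puts("timed out");
    return 0;
}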
At a requirements elicitation level, we can wander around the particular part of the problem domain we are supposed to be tackling, without ever knowing if we have yet captured all the issues that are relevant. To gain confidence that we have not missed anything, we need to widen our gaze, so that we can see where our outputs go to, and where our inputs come from. We need to find a way to present these flows so that we can see the big picture at a glance. Reams of prose or fat folders full of Data Flow Diagrams are of no help at all here (although they may well be needed elsewhere on the project), because they will not allow us to see at a glance that there are no loose ends. If we can see that there are no loose ends, we can be reasonably confident that there are no hidden horrors that we will discover during implementation. This is an example of the mapper technique of problem bounding.
At an architectural and detailed design level, the same idea applies. During our contemplation of our design we represent our ideas to ourselves in as many ways as we can, and challenge them to see if we can break them. It is important that we use some feature of the design, such as the number of possible input states, to show that the system we design will be robust in all cases, by showing that we have considered all cases. This does not mean that we attempt to enumerate all cases - instead we find a means to group them, and show that we have considered all groups.
When single-stepping code with a graphical symbolic debugger, at each decision we should consider all the circumstances under which the path we are taking would be followed, and all cases where the other path would be followed.
In all these situations, design for test begins by laying our work out so that its correctness is visible to inspection. In this light it is interesting to consider what we mean by a mathematical proof. The purpose of proof is usually described as being to show that a proposition is the case. That is a very packer, activity-centred way of seeing things. The mapper description of the purpose of proof is that it shows us the proposition in a new light, in which the truth of the proposition is obvious to inspection. For mappers, a proof doesn't just establish a fact, it increases our understanding as well. We have recently seen computer-assisted proofs that fulfill the packer purpose, but do nothing for the mapper purpose. Because they do not exploit the leverage which comes from understanding, these proofs are also weaker. Is it necessarily the case that the correctness of the code (let alone the architecture of the computer) that is going to perform the search is obvious to inspection?
Wise architects usually layer their designs so that there are discrete stages visible in the transition from end-user facing code to OS facing code. Every one of these layers provides an opportunity to write a little test application. These opportunities should usually be taken, because although it may seem like a high up-front cost, tracking down bugs that do not have well-defined test points in the layering can explode the final test phase and add enormous time costs just before delivery. To fully exploit these test opportunities, we should consider test when defining the APIs for our layers. Is it possible to simplify the API definition so that we can reduce the proportion of all possible calls that are meaningless? Each layer must either validate its input, or if time is really critical, require prevalidated inputs. Test must ensure that this logic works properly, as well as the job the layer is supposed to be doing. If the API can be simplified, the test requirement is automatically simplified at the same time.
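A sketch with invented names: a little layer whose API is narrow enough to validate cheaply, plus the throwaway command-line tester that every such layer deserves.

#include <cstdio>
#include <cstdlib>

// Invented layer API: narrow on purpose, so the proportion of meaningless
// calls is small and validation is cheap. Returns 0 on success, -1 on
// invalid input.
int set_volume(int percent)
{
    if (percent < 0 || percent > 100)
        return -1;                 // validate input at the layer boundary
    /* ... drive the layer below here ... */
    return 0;
}

// The little test application for this layer: scriptable from the command line.
int main(int argc, char **argv)
{
    if (argc != 2)
    {
        std::fprintf(stderr, "usage: voltest <percent>\n");
        return 2;
    }
    int result = set_volume(std::atoi(argv[1]));
    std::printf("set_volume(%s) -> %d\n", argv[1], result);
    return result == 0 ? 0 : 1;
}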
Considerations that apply between layers also apply between runtime processes. Most non-trivial systems require several processes to co-operate either on a single platform or across a network. The functionality of these processes should be divided up so that it is possible to test them, ideally in isolation and from a command-line or script.
Sometimes we cannot avoid introducing discontinuities into the solution where none exists in the problem. For example, if our database is so big we must spread it across several machines (and our COTS RDBMS isn't managing this for us) we need to recognise the points where the logic of our programs must change to look on another system, and test that this change is negotiated correctly.
Designers of object systems have a particularly easy strategy available for automating test. Every class (or key classes at the discretion of the architect) can have a matching class defined that exercises the methods of the system class. This works so well because the class declaration forces the surface area of the class to be tested into a standardised and well-defined format (this is what objects are all about). So each class can carry with it its own test code, which just needs to be called from a little application wrapper to automate the test. These test classes are sometimes called `yang' classes (the deliverable classes are the `yin' classes).
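A sketch with invented names: a yin class and its matching yang class, whose single entry point a little wrapper can call from the nightly build.

#include <cstdio>

// Yin: a deliverable class (invented for illustration).
class Counter
{
public:
    Counter() : value(0) {}
    void increment()     { ++value; }
    int  get() const     { return value; }
private:
    int value;
};

// Yang: the matching test class. The class declaration above fixes the
// surface area it has to exercise.
class CounterTest
{
public:
    bool run() const
    {
        Counter c;
        if (c.get() != 0) return false;
        c.increment();
        c.increment();
        return c.get() == 2;
    }
};

// The little application wrapper that automates the test.
int main()
{
    bool ok = CounterTest().run();
    std::puts(ok ? "Counter: PASS" : "Counter: FAIL");
    return ok ? 0 : 1;
}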
There are two benefits that can be attained when automated test is in place. The first is that the tests can be run every night, as part of the build process. It does programmers no end of good to come in and find an email from the development environment saying that everything the entire team has developed to date is still working properly. When the email says that something is broken, they don't waste days trying to find out what is wrong with their new layer when in fact the problem has appeared two levels down. The second benefit is that automated test code cannot slide out of date as documentation can. If the automated test compiles, links, and passes, then we know that the description of the behaviour of the tested code that it contains is true.
These ideas of the definition and execution of automated tests are especially important on very sophisticated projects where dynamic configuration management and incremental compilation tools out of science fiction books allow hundreds of developers to hack away like demented monkeys on cocaine without even stopping for sleep let alone thought. (Said the authorial voice rhetorically.) Checkpointing and running full tests from the ground up should not be considered an interruption of work - it is a very cheap way of buying confidence in the solidity of the ground. As an added benefit, such events can become team festivals as module after module, layer after layer, announces its own successful build and test on the configuration manager's workstation. It is at these festivals that the team can naturally reflect on all that they have achieved to date, because the first festival should be simply to compile and run a `Hello world!' program and prove that the compiler is working properly, while the last produces a working product that is deliverable to the customer with all objectives achieved.
Do any local time adjustment in exactly one place - a function called LocalAdjustTime() or some such. UNIX has wonderful support for doing this stuff right, and sadly few sites ever use it properly.
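A sketch of the idea (the function name follows the text's example; the details are invented): keep and pass around UTC internally, and convert only at the display edge, in one place.

#include <ctime>
#include <cstdio>

// All timestamps inside the system are UTC (time_t). Conversion to the
// operator's local time happens here and only here.
std::tm LocalAdjustTime(std::time_t utc)
{
    return *std::localtime(&utc);   // relies on the TZ configuration UNIX provides
}

int main()
{
    std::time_t now = std::time(0);          // stored, logged, compared as UTC
    std::tm local = LocalAdjustTime(now);    // converted only for display
    char buf[64];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %Z", &local);
    std::printf("local display time: %s\n", buf);
    return 0;
}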
The same thinking applies to time zones. Your users may well work all over the planet and want to talk in terms of their local times, but your network should use GMT (or UTC if you really mean UTC) throughout, and files should be so timestamped. Sequencing issues with files created on computers with different clock settings still absorb far too many programmer hours. One manager of an international network went to her local domestic furnishings store and bought forty hideous, 1950s pointy-style, identical clocks and a boxful of spare batteries. Over the following year she took a clock with her whenever she visited a remote office, set it to the correct GMT time and hung it on the wall. By the end of that year, the persistent undercurrent of difficulties that were ultimately caused by sequencing issues magically went away, because every operator had a very big reminder of what time to set the system clocks to when they rebooted them.
Another example of an avoidable discontinuity of problem domain is the two kinds of money most countries like to maintain. There are always 100 pence in the pound or cents in the dollar, so just store the pence or the cents, and if the user really wants a decimal point printed out after the second digit, put a 2 in the database somewhere and use an output routine that does the database lookup. That way sensible currencies like pesetas and lira don't end up causing problems because one must remember not to print out the redundant decimal point...
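A sketch with invented names: store integer minor units, keep the number of decimal places as data (the 2, or 0, looked up per currency), and let one output routine do the formatting.

#include <cstdio>
#include <string>

// Amounts are stored as integer minor units (pence, cents, lire...).
// decimal_places comes from the database; positive amounts only in this sketch.
std::string format_amount(long minor_units, int decimal_places)
{
    char buf[32];
    if (decimal_places == 0)
    {
        std::sprintf(buf, "%ld", minor_units);
        return buf;
    }
    long divisor = 1;
    for (int i = 0; i < decimal_places; ++i) divisor *= 10;
    std::sprintf(buf, "%ld.%0*ld", minor_units / divisor,
                 decimal_places, minor_units % divisor);
    return buf;
}

int main()
{
    std::printf("%s\n", format_amount(123456, 2).c_str());  // 1234.56 pounds
    std::printf("%s\n", format_amount(123456, 0).c_str());  // 123456 pesetas
    return 0;
}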
The difficulties associated with the Year 2000 problem repeatedly reveal an issue that programmers seem to have to discover over and over again. Programming languages provide data types because within the type there is a permissible set of operations that one can either perform or not, and if one is reading code and the operations are being performed, one can see them. OO languages extend this facility to any kind of data we wish to control in this way, by giving us Abstract Data Types (ADTs). The real difficulty with Year 2000 is not the way so many programmers coded the year into two digits - in days gone by that was a necessary storage saving, and some Year 2000 prone software is quite old. The problem is the way some programmers chopped up their two-digit dates and tucked them away all over the place, without using a consistent subroutine, macro or even code fragment to do it. That means that to dig out the Year 2000 problems one must read and understand every single line of these horrible blathering old programs.
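A sketch with an invented class: had the two-digit packing been hidden behind one abstract data type like this, fixing it would mean reading one class, not every line of the program.

#include <cstdio>

// All knowledge of how years are packed into two digits lives in this one
// class, so a change of representation is a change here only.
class ShortDate
{
public:
    ShortDate(int year, int month, int day)
        : yy(year % 100), mm(month), dd(day) {}                  // the storage-saving decision, made once
    int year() const  { return yy + (yy < 70 ? 2000 : 1900); }   // and the windowing fix, made once
    int month() const { return mm; }
    int day() const   { return dd; }
private:
    int yy, mm, dd;
};

int main()
{
    ShortDate d(1999, 12, 31);
    std::printf("%04d-%02d-%02d\n", d.year(), d.month(), d.day());
    return 0;
}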
First, one should distinguish between the security requirements of one's products, which come from the users' requirements, and the security needs of one's own development environment. These may be linked - where, for example, the security of the product depends on the confidentiality of the source code - but linkage is not equivalence. In products, don't just add `security' features by force of habit. Is it really necessary to associate a password with every user ID in your product? Do you need user IDs at all? Can non-contentious functionality be provided behind a `guest' ID that requires no password? Every password in your product must be remembered and maintained, reducing ergonomic viability and increasing cost of ownership, for the little darlings are sure to forget their passwords.
Next, there are two kinds of security threat: malicious and inadvertent. Your product may need to guard against malicious threats, but if you need to guard your own development environment against malicious threats from within (we assume you are a grown-up and have a firewall), you have bigger problems than tweaking a few file permissions will sort out for you. So give up on malicious threats at work. As for inadvertent threats, such as accidentally deleting the entire source tree, you have your backups, don't you? Placing a high-cost security overhead on every operation in a development environment to guard against `disasters' that are in fact low-cost when, if ever, they happen is misguided. As programmers become more familiar with the doctrine of the personal layered process, even these low-cost errors reduce in number, and the development of shared mental models and mapper jargon within teams means that informal `etiquettes' develop readily, such as the cry `Reinitialising the test database - all OK?' before cleaning up trashed test data. These outcry etiquette elements are the only acceptable shouting in the hated open plan office, and just about the only valid reason for them. It is not a good enough reason, however.
So don't lock up your development environment to the point where changing anything at all requires every team member present to type in their passwords. Don't create or adopt a configuration management system that stops a developer dead at eight o'clock at night when he or she is on a roll but can't even book out a hackable file for read to try something out. Not only does this directly impede your project: it is also an emotionally painful experience that you dump on the most highly motivated animal in the commercial world - a programmer in Deep Hack Mode. What has this person done to hurt you?
And finally, don't allow any element of the packer article of faith that we must know exactly who did exactly what at exactly what time with respect to absolutely everything to cloud your thinking. If your project is a team co-ordinated by etiquette and formalised as necessary you have a chance. If it is a bazaar regulated by detailed records you are doomed anyway.
This file last updated 9 November 1997. Copyright (c) Alan G Carter and Colston Sanger 1997. Contact info: alancarter@dial.pipex.com, colston@gid.co.uk