You can help to make this program better. If you fix bugs or implement new
features, I'd be grateful if you sent me patches. For a list of interesting
projects, and for a brief summary of how UAE works, see below.

A few guidelines for anyone who wants to help:
- Please contact me before you implement major new features. Someone
  else might be doing the same thing already. This has already happened :-(
  Even if no one else is working on this feature, there might be alternative
  and better/easier/more elegant ways to do it.
- If you have more than one Kickstart, try your code with each one.
- Patches are welcome in any form, but diff -u or diff -c output is preferred.
  If I get whole source files, the first thing I do is run diff on them. You
  can save me some work here (and make my mailbox smaller).

Some possible projects, in order of estimated difficulty:
- Someone running *BSD on an x86 might want to try using X86.S on such a
  system. It's likely that only configure needs to be modified.
- Add gamma correction
- If the serial port still isn't working (I've got no idea, I don't use it),
  fix it.
- Someone with a 68020 data sheet might check whether all opcodes are
  decoded correctly and whether all instructions really do what they are 
  supposed to do (I'm pretty sure it's OK by now, but you never know...).
- Add more 2.0 packets to filesys.c
- Multi-thread support is there now, it just needs someone to test it on an SMP
  machine and to fix it so it improves speed instead of slowing the thing 
  down.
- Improve the Kickstart replacement to boot more demos.
- Snapshots as in CPE. Will need to collect all the variables containing
  important information. Fairly easy, but boring. (Use core dumps instead :-)
  _If_ someone attempts this, please be more clever than the various CPC
  emulators and dump state only at one fixed point in the frame, preferably
  the vsync point. Also talk with Petter about this.
- Find out why uae.device has to be mounted manually with Kick 1.3.
  The problem seems to be that we don't have a handler for it. I _think_ what
  we need is the seglist of the standard filesystem handler. Problem is,
  DOS hasn't been started when the devices are initialized and so we can't get
  to the DosBase->RootNode->FileHandlerSeg pointer, and then there is the
  confusing matter of BCPL GlobVecs and other weird stuff...
- Some incompatibilities might be fixed with user-modifiable fudge variables
  the same way it's done in various C64 emulators.
- With the new display code, it would probably be easier than before to
  implement ECS resolutions - however, a lot of places rely on the OCS timing
  parameters and display sizes.
- Figure out a diskfile format that supports every possible non-standard
  format.
- Implement 68851 MMU. I have docs now. Not among the most necessary things.
  Should be done like exception 3 handling: add code to genamode in gencpu.c.
- Implement AGA support. Some bits and pieces exist.
- Reimplement Amiga OS. (Well-behaved) Amiga programs could then be made
  to use the X Window System as a "public screen". Of course, not all the
  OS would have to be re-done, only Intuition/GFX/Layers (which is enough).
  [Started, look at gfxlib.c - not usable yet.]
- Find some extremely clever ways to optimize the smart update methods. Some
  ideas:
  a) Always use memcmp() to check for bitplane differences. If no differences
     are found, see if BPLxDELAY got modified, if so, scroll.
     Problems:
      * You'd still have to draw a few pixels around the DIW borders. Not very
        hard.
      * Scrolling with memcpy in video memory can be terribly slow (no, I
        shouldn't have bought the cheaper video card with DRAMs)
      * At least every 16 pixels a full update has to be done since the
        bitplane pointers get updated after that. And that's with the slowest
        scrolling - if the playfield scrolls faster, the benefit approaches
        zero.
     You could also do vertical scrolling tests, but similar problems arise - 
     where should one check? One line above/below? What about faster
     scrolling? You could use the bitplane pointers as hints, but with
     double/triple buffering this gets problematic, too.
     On the whole, I don't think it would be worth the effort, even if it
     works very well for a few games.
  b) Well, there is no b). If I thought of something I forgot it while
     writing a).
- Port it to Java and Emacs Lisp
- A formal proof of correctness would be nice.


Source file layout

src/      contains (mostly) machine-independent C code.
include/  contains header files included by C code.
md-*/     CPU and compiler dependent files, linked to machdep by configure
od-*/     operating system dependent files, linked to osdep by configure
td-*/     thread library dependent files, linked to threaddep by configure
sd-*/     Sound code. sd-* is only for sound systems which are not OS specific
          or for which no "od-*" directory exists. Linked to sounddep
targets/  Contains header files which contain some information about which
          options a specific port of UAE understands.


Coding style

As long as your code is hidden in a file buried in md-*/ or od-*/ where I
never have a look at it, you can probably get away with not following these
guidelines. 

* Do not include CR characters.
* Do not use GNU C extensions if you can't hide them in a macro or in a
  system-specific file so that an alternative implementation is available
  when GNU C is not used.
  This applies to _all_ OS/CPU/compiler specific details. Basically, nothing
  of that sort should appear in src/*.c (we're a bit away from that goal at
  the moment, but it's getting better).
* Make sure your code does not make assumptions about type sizes other than
  the minimum widths allowed by C. If you need specific type sizes, use the
  uae_u32 type and its friends.
* Set up your editor so that tab characters round up to the next position
  where ((cursorx-1) % 8) == 0, i.e. 8 space tabs. Do not use 4 space tabs,
  that makes the code awful to read on other machines and worse to edit.
* Lines can be up to 132 characters wide. Use SVGATextMode for the Linux
  console, or use a windowing system in a high resolution.
* C++ comments are a no-no in C code.
* Indentation - look at some code in custom.c and try to follow it. Don't
  use GNU 2-space-in-weird-places indentation, I find it awful. But _do_
  follow the GNU rules for adding whitespace in expressions, and those for
  breaking up multiple-line if statements.
  Fixed indentation rules almost never make sense - break the rules if that
  makes your code more readable.
  Hint: Get jed from space.mit.edu, /pub/davis. It can indent your code
  automatically. Put the following into your .jedrc, and it will come out
  right:
  C_INDENT		= 4;
  C_BRACE		= 0;
  C_BRA_NEWLINE		= 0;
  C_Colon_Offset	= 1;
  C_CONTINUED_OFFSET	= 4;
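
The two portability rules above (hide GNU C extensions behind a macro, and
don't assume type sizes) might look roughly like this in practice. This is
only a sketch: sketch_u32, SKETCH_INLINE and swap_words are made-up names
for illustration; the real uae_u32 and friends are set up elsewhere.

```c
#include <limits.h>

/* Hypothetical stand-in for uae_u32: pick a type that is exactly 32 bits. */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int sketch_u32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef unsigned long sketch_u32;
#else
#error "no 32 bit unsigned type found"
#endif

/* The GNU C keyword is hidden in a macro; other compilers get a plain
 * static function instead. */
#ifdef __GNUC__
#define SKETCH_INLINE __inline__
#else
#define SKETCH_INLINE
#endif

/* Swap the upper and lower 16 bit halves of a 32 bit value. */
static SKETCH_INLINE sketch_u32 swap_words (sketch_u32 v)
{
    return ((v & 0xFFFF) << 16) | ((v >> 16) & 0xFFFF);
}
```

Code in src/*.c would then only ever see sketch_u32 and SKETCH_INLINE, never
the compiler-specific spelling.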


How it works

Let's start with the memory emulation. All addressable memory is split into
banks of 64K each. Each bank can define custom routines accessing bytes, 
words, and longwords. All banks that really represent physical memory just 
define these routines to write/read the specified amount of data to a chunk 
of memory. This memory area is organized as an array of uae_u8, which means 
that those parts of the emulator that want to access memory in a linear 
fashion can get a (uae_u8 *) pointer and use it to circumvent the overhead of
the put_*() and get_*() calls. That is done, for example, in the
pfield_doline() function which handles screen refreshes.
Memory banks that represent hardware registers (such as the custom chip bank
at 0xDF0000) can trap reads/writes and take any necessary actions.
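
As a rough illustration of the bank scheme, here is a minimal sketch. It
assumes a 24 bit address space and word accessors only; bank_sketch,
init_banks_sketch and the *_sketch accessors are hypothetical names, and the
typedefs merely stand in for UAE's own uae_u8/uae_u32.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t uae_u8;         /* stand-ins for UAE's own typedefs */
typedef uint32_t uae_u32;

#define BANK_SHIFT 16           /* 64K per bank */
#define NUM_BANKS 256           /* assumes a 24 bit address space */

typedef struct {
    uae_u32 (*wget) (uae_u32 addr);
    void (*wput) (uae_u32 addr, uae_u32 w);
    uae_u8 *(*xlate) (uae_u32 addr);    /* NULL: no linear access */
} bank_sketch;

static uae_u8 ram[0x10000];     /* backing store for one RAM bank */
static uae_u32 last_custom_reg, last_custom_val;

/* RAM bank: words are stored big-endian in a plain byte array. */
static uae_u32 ram_wget (uae_u32 addr)
{
    uae_u8 *p = ram + (addr & 0xFFFE);
    return ((uae_u32) p[0] << 8) | p[1];
}

static void ram_wput (uae_u32 addr, uae_u32 w)
{
    uae_u8 *p = ram + (addr & 0xFFFE);
    p[0] = (uae_u8) (w >> 8);
    p[1] = (uae_u8) w;
}

static uae_u8 *ram_xlate (uae_u32 addr)
{
    return ram + (addr & 0xFFFF);
}

/* Register bank: writes are trapped. Here they are only recorded. */
static uae_u32 custom_wget (uae_u32 addr)
{
    (void) addr;
    return 0xFFFF;
}

static void custom_wput (uae_u32 addr, uae_u32 w)
{
    last_custom_reg = addr & 0xFFFF;
    last_custom_val = w & 0xFFFF;
}

/* Unmapped bank: reads return 0, writes are ignored. */
static uae_u32 dummy_wget (uae_u32 addr) { (void) addr; return 0; }
static void dummy_wput (uae_u32 addr, uae_u32 w) { (void) addr; (void) w; }

static bank_sketch ram_bank = { ram_wget, ram_wput, ram_xlate };
static bank_sketch custom_bank = { custom_wget, custom_wput, NULL };
static bank_sketch dummy_bank = { dummy_wget, dummy_wput, NULL };
static bank_sketch *banks[NUM_BANKS];

static void init_banks_sketch (void)
{
    int i;
    for (i = 0; i < NUM_BANKS; i++)
        banks[i] = &dummy_bank;
    banks[0x00] = &ram_bank;
    banks[0xDF] = &custom_bank;         /* the bank at 0xDF0000 */
}

/* Every access goes through the bank selected by the upper address bits. */
static uae_u32 get_word_sketch (uae_u32 addr)
{
    return banks[(addr >> BANK_SHIFT) & (NUM_BANKS - 1)]->wget (addr);
}

static void put_word_sketch (uae_u32 addr, uae_u32 w)
{
    banks[(addr >> BANK_SHIFT) & (NUM_BANKS - 1)]->wput (addr, w);
}
```

Code that wants linear access would call the xlate entry once and then work
on the returned pointer directly, skipping the per-access dispatch.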

To provide a good emulation of graphical effects, only one thing is vital:
Copper and playfield emulation have to be kept absolutely synchronous. If the
copper writes to (say) a color register in a specific cycle, the playfield 
hardware needs to use the new information in the next word of data it
processes.
UAE 0.1 used to call routines like do_pfield() and do_copper() each time the
CPU emulator had finished an instruction. That was one of the reasons why it
was so slow. Recent versions try to draw complete scanlines in one piece. This
is possible if the copper does not write to any registers affecting the
display during that scanline. Therefore, drawing the line is deferred until
the last cycle of the line. However, sometimes a register which affects how
the screen will look is modified before the end of the line (think of copper
plasmas). That's what "struct decision thisline_decision" is for. It is
initialized at the start of each line. During the line, whenever a vital
register is changed, one of the decide_*() functions is called and may modify
thisline_decision. There are several independent decisions:
 - which DIW should be used
 - where does data fetch start/stop (or is the line in the border altogether)
 - where should sprites be drawn (note: the same sprite can appear more than
   once on one scanline, see Turrican I world 3 levels 1 and 3 for the best
   example)
 - what are the playfield pointers at the start of DDF. Related, what data do
   they point to.
 - what are the playfield modulos at the end of DDF
 - copper magic with the colors is remembered for later use
 - so is copper magic with the bitplane delay values. I used to think there
   was no useful application for modifying BPLCON1 while data is being
   displayed, but Sanity demos can make Amiga emulator programmers look real
   old.

All of this is remembered while the raster line is processed by the hardware.
After the line (at hsync), all the decisions are made if they weren't made
before. At that point the line can be drawn by playfield_draw_line.
Additionally, all the decisions from the previous displayed frame are saved
and compared with the new ones, since often lines are not modified between
frames. This saves a lot of redrawing work.
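
The frame-to-frame comparison could be sketched like this. The struct is a
heavily cut-down, hypothetical version of a line decision (the real
struct decision has more fields), and the data_changed flag is an assumption
standing in for "the bitplane data this line points to was modified".

```c
#define MAX_LINES 313

/* Hypothetical, cut-down line decision. */
typedef struct {
    int diwfirstword, diwlastword;      /* which DIW is used */
    int plfstrt, plflen;                /* where data fetch starts, how long */
    unsigned int bplcon0;
} line_decision_sketch;

static line_decision_sketch prev_frame[MAX_LINES];
static int lines_drawn;

/* Compare field by field; memcmp on a struct could trip over padding. */
static int same_decision (const line_decision_sketch *a,
                          const line_decision_sketch *b)
{
    return a->diwfirstword == b->diwfirstword
        && a->diwlastword == b->diwlastword
        && a->plfstrt == b->plfstrt
        && a->plflen == b->plflen
        && a->bplcon0 == b->bplcon0;
}

/* Called at hsync once all decisions for the line are final.
 * Returns 1 if the line had to be redrawn, 0 if it could be skipped. */
static int finish_line_sketch (int lineno, const line_decision_sketch *now,
                               int data_changed)
{
    if (!data_changed && same_decision (&prev_frame[lineno], now))
        return 0;
    prev_frame[lineno] = *now;  /* remember for the next frame */
    lines_drawn++;              /* the real code would draw the line here */
    return 1;
}
```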

The CPU emulator no longer has to call all sorts of functions after each
instruction. Instead, it keeps a list of events that are scheduled (timer
interrupts, hsync and vsync events) and their "arrival time". Only the time
for the next event is checked after each CPU instruction. If it's higher than
the current cycle counter, the CPU can continue to execute.
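
A minimal sketch of that event check, under a few assumptions: the names
(event_sketch, do_cycles_sketch and so on) are made up for illustration, and
LINE_CYCLES is just an illustrative line length, not a value taken from the
UAE sources.

```c
#define EV_HSYNC 0
#define EV_VSYNC 1
#define EV_TIMER 2
#define EV_MAX 3

#define LINE_CYCLES 228         /* illustrative value only */

typedef struct {
    int active;
    unsigned long evtime;       /* cycle at which the event is due */
    void (*handler) (void);
} event_sketch;

static event_sketch events[EV_MAX];
static unsigned long cycles;            /* current cycle counter */
static unsigned long next_event_time;   /* earliest evtime of active events */
static int hsync_count;

/* Recompute the arrival time of the nearest active event. */
static void schedule_next_sketch (void)
{
    int i;
    next_event_time = ~0UL;
    for (i = 0; i < EV_MAX; i++)
        if (events[i].active && events[i].evtime < next_event_time)
            next_event_time = events[i].evtime;
}

/* Example handler: count the line and reschedule for the next one. */
static void hsync_sketch (void)
{
    hsync_count++;
    events[EV_HSYNC].evtime += LINE_CYCLES;
}

static void init_events_sketch (void)
{
    events[EV_HSYNC].active = 1;
    events[EV_HSYNC].evtime = LINE_CYCLES;
    events[EV_HSYNC].handler = hsync_sketch;
    schedule_next_sketch ();
}

/* Called after each emulated instruction with the cycles it took. */
static void do_cycles_sketch (unsigned long n)
{
    int i;
    cycles += n;
    if (next_event_time > cycles)
        return;                 /* common case: one comparison, nothing due */
    for (i = 0; i < EV_MAX; i++)
        if (events[i].active && events[i].evtime <= cycles)
            events[i].handler ();
    schedule_next_sketch ();
}
```

The point of the design is the early return: in the common case the CPU loop
pays for a single comparison per instruction instead of several function
calls.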

Things that can't be supported with the current "decision" model:
  - Changes in lores/hires mode during one line. Dunno whether that was ever
    used in reality.
  - Changes to the bitplane DMA bit during one line. Hardly useful and not
    likely to be used. [but there are at least two programs which do ugly
    things like that, and there are some hacks in UAE that make those programs
    work (Magic 12 Ray of Hope 2 is one of these demos)]
  - Changes in bitplane data during one line. If programs do this kind of
    thing, it's most likely accidental and the program is broken. Can happen
    with programs that use the blitter incorrectly, like all the Andromeda
    demos.
  - others? (fill in if you can think of anything)

All in all, it's unlikely that this causes compatibility problems. If it does,
fudge values could be introduced (although that sort of thing gets messy
quickly).


* Native code vs. 68k code

It is possible to call native code from 68k code; autoconf.c has some routines
which make setting up a call trap very easy. However, it is not as easy to
call 68k code from native C code, at least not while Amiga Exec multitasking
is running. You ask why?

Amiga process1 calls native function foo
Native function foo calls some 68k function and goes into 68k mode
Amiga context switch happens, process1 is put to sleep and process2 gets run.
Amiga process2 calls native function foo
Native function foo calls some 68k function and goes into 68k mode
Amiga context switch happens, process2 is put to sleep and process1 gets run.
Amiga process1 completes the 68k function called by foo and returns from 68k mode.

There. Now we are in function foo again. When it called the 68k code, process2
was active. Now process1 is active, and the function we called in process2
hasn't completed yet. What a mess.

To get around this, you need to do some stack magic. Code to do this exists,
but it must be adapted for each port, since setting up a different stack is
completely non-portable.


* How multithreading in filesys.c works

AmigaOS is nice enough to start one process for each mounted filesystem. All
of these run in the 68k emulation code, i.e. in the main UAE thread. This is
the reason why multithreading is desirable: if the main UAE thread blocks
waiting for I/O, the CPU emulation can't continue to run. Since the Amiga OS
is capable of multi-tasking, it is possible that other code could run until
the I/O operation is complete. The most important bit of code that can run is
the code that moves the mouse pointer - it's unpleasant if the pointer does
not follow mouse movement during disk/CD accesses.

When a packet is received by the filesys.asm code, filesys_handler is called.
This function always runs in the main UAE thread.
 - In the single-threaded case, this function performs the action that was
   requested, then returns 0 to indicate "action completed, reply packet".
   Nothing else is performed.
 - In the multi-threaded case, filesys_handler figures out which unit the
   packet was for and sends the packet to the UAE thread responsible for
   handling this unit. filesys_handler returns 1 to indicate: queue the
   packet. Also, one (at that point unused) field in the packet is set to
   0 to indicate that the action was not completed.

The latter case is the interesting one. The thread that got the packet does
the following:
 - perform the action as usual
 - set the "command complete" field in the packet to -1.
 - send a message to the AmigaOS (!) filesystem process. However, it can't do
   that without some effort. We can't call 68k code from the emulator easily.
   So we have to use an Amiga interrupt. The filesystem init code sets up an
   Exec IntServer for the EXTER interrupt, and hsync_handler() checks
   periodically whether the filesystem needs an interrupt and raises one if
   necessary.
   Only one dummy message is used per filesystem unit, which is allocated at
   startup. This means that there must be some locking to prevent the unit
   thread from sending the same message twice to the same port. To determine
   whether the message is free, three counts are kept. "cmds_sent" is
   incremented by the UAE thread whenever it has completed a command.
   "cmds_acked" is set to the value cmds_sent had at the point when the
   interrupt handler got invoked and decided it must send a message. Finally,
   cmds_complete is set to this value at the time the AmigaOS process receives
   the dummy message. Whenever cmds_acked == cmds_complete, the dummy message
   is free to be sent again.
   
The EXTER interrupt basically walks through the units, looks at the cmds_*
fields and sends the dummy message to the Amiga filesystem process when
possible and necessary.

When the Amiga filesystem process receives such a dummy message, it does the
following:
 - update cmds_complete as described above.
 - walk through the queue of unprocessed commands and see which ones now have
   a status of -1, indicating that they are finished. These are removed from
   the queue and replied to.
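
The handshake with the three counts can be sketched as follows. This is a
single-threaded model for illustration only: unit_sketch and the function
names are hypothetical, the actual message sending is left out, and real
multi-threaded code needs proper synchronization around cmds_sent.

```c
/* Hypothetical per-unit state; only the three counts are modelled. */
typedef struct {
    unsigned int cmds_sent;     /* bumped by the unit thread */
    unsigned int cmds_acked;    /* set by the EXTER interrupt */
    unsigned int cmds_complete; /* set by the Amiga filesystem process */
} unit_sketch;

/* Unit thread: a command has been completed. */
static void unit_command_done_sketch (unit_sketch *u)
{
    u->cmds_sent++;
}

/* EXTER interrupt: send the dummy message when possible and necessary.
 * Returns 1 if the message was sent. */
static int exter_int_sketch (unit_sketch *u)
{
    unsigned int sent = u->cmds_sent;

    if (u->cmds_acked != u->cmds_complete)
        return 0;               /* dummy message still in flight */
    if (u->cmds_acked == sent)
        return 0;               /* nothing new to report */
    u->cmds_acked = sent;
    return 1;                   /* the real code would send the message here */
}

/* Amiga filesystem process: the dummy message has arrived. After this the
 * process would walk its queue and reply to the finished packets. */
static void process_got_message_sketch (unit_sketch *u)
{
    u->cmds_complete = u->cmds_acked;
}
```

Note that commands finished while the message is in flight are not lost:
cmds_sent keeps counting, and the next interrupt after the message has been
received picks them up.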


* Calltraps at fixed locations

F0FF00: return from 68k mode.
F0FF10: must have gotten lost somewhere ;)
F0FF20: used by filesys.c to store away some information from the startup
        packet.
F0FF30: filesys_handler().
F0FF40: startup_handler(), handles only the startup packet for each
        filesystem.
F0FF50: used by the EXTER interrupt which we set up for the filesystem.
F0FF60: used by the uaectrl/uae-control programs (see uaelib.c)
F0FF70: used by the task that gets set up for the mouse emulation.

* How the compiler works

.. yet to be written. To be decided, in fact.


Portability

This section was out of date. I'll rewrite it.
Some day.