chipKIT® Development Platform

Inspired by Arduino™

Mix Arduino and Assembler?

Created Sat, 13 Oct 2012 21:17:50 +0000 by radiosky


radiosky

Sat, 13 Oct 2012 21:17:50 +0000

Is there a way to create/call assembler routines using the MPIDE? I want to do some fast pin toggling within my program. digitalWrite/Read is far too slow.


EmbeddedMan

Sat, 13 Oct 2012 21:57:20 +0000

The answer to your question is YES, but it is strongly discouraged. The main reason is that writing any assembly prevents the compiler from doing any optimization of the function. It's also hard to write MIPS assembly that's as good as what the compiler (gcc) can do.

HOWEVER - you don't need assembly for what you want to do. You can directly access the GPIO pins from C, and doing so will be exactly as fast as doing it in assembly. (i.e. one instruction) Just use the PIC32 registers.

For example, if you wanted to set PORTC bit 3, you could just write LATCSET = 0x0008;

If you wanted to clear bit 0 of PORTE, you could write LATECLR = 0x0001;

These get translated into single instructions by the compiler. You can't get any faster than that. :-)
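As a rough host-side model of what the SET, CLR and INV registers do to the latch, here is a minimal sketch. The real registers are memory-mapped hardware; the `lat_set`, `lat_clr` and `lat_inv` helpers below are hypothetical names made up for illustration and are not part of any PIC32 header.

```c
#include <stdint.h>

/* Hypothetical host-side model: a plain value stands in for a LATx latch. */

/* Models a write to LATxSET: every 1 bit in mask sets that latch bit. */
uint32_t lat_set(uint32_t lat, uint32_t mask) { return lat | mask; }

/* Models a write to LATxCLR: every 1 bit in mask clears that latch bit. */
uint32_t lat_clr(uint32_t lat, uint32_t mask) { return lat & ~mask; }

/* Models a write to LATxINV: every 1 bit in mask toggles that latch bit. */
uint32_t lat_inv(uint32_t lat, uint32_t mask) { return lat ^ mask; }
```

In this model, `lat_set(0, 0x0008)` gives 0x0008 (bit 3 set, like LATCSET = 0x0008) and `lat_clr(0x00FF, 0x0001)` gives 0x00FE (bit 0 cleared, like LATECLR = 0x0001). On the chip the hardware does the read-modify-write itself, which is why a single store is enough.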

*Brian


WestfW

Sun, 14 Oct 2012 09:07:25 +0000

EmbeddedMan wrote: "These get translated into single instructions by the compiler. You can't get any faster than that. :-)"

No, they don't. RISC doesn't do that. IIRC, you're looking at 4 instructions:

1. load high half of address into pointer reg
2. load low half of address into (the same) pointer reg
3. load bit constant into temp reg
4. store bit constant from temp reg to memory via pointer reg

For some loops, the address and constant will stay in registers, leaving just the store...

More analysis and discussion:
viewtopic.php?f=6&t=30
viewtopic.php?f=7&t=448
(That was for the pre-open-source version. It's possible that the open-source version is somewhat different.)


EmbeddedMan

Sun, 14 Oct 2012 13:36:41 +0000

If that were true, how am I able to generate a 40MHz square wave on an IO pin using my 80MHz part? :-) It has to be one instruction per bit flip, otherwise that wouldn't work.

Yes, and I do understand how all of this works - if you have a number of bit-flip instructions in a row, the compiler sets everything up at the beginning and is able to execute the store instruction repeatedly every clock. And of course your explanation is correct about MIPS and RISC in general.

The more important point here though is that you can make GPIO accesses in C just as fast as you can in assembler. And so there is no good reason to accept the negatives that come with using assembler just for GPIO access.

*Brian


majenko

Sun, 14 Oct 2012 20:09:47 +0000

Knocking up a little program in RetroBSD (which uses the ChipKit compiler):

#include <machine/pic32mx.h>

void main()
{
    LATCSET=0x0008;
}

and compiling it with a disassembly output, gives (amongst other stuff) a listing of what the LATCSET=0x0008 compiles into:

void main()
{
        LATCSET=0x0008;
7f00805c:       24030008        li      v1,8
7f008060:       3c02bf88        lui     v0,0xbf88
7f008064:       ac4360a8        sw      v1,24744(v0)
}

So that's three instructions for one set command - load immediate, load upper immediate, and store word.
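To see how the lui/sw pair in that listing forms the register address, here is a small sketch of the arithmetic. The function name `effective_address` is hypothetical; the constants come straight from the listing.

```c
#include <stdint.h>

/* Address used by "sw rt, offset(base)" after "lui base, upper":
   lui places upper in the top 16 bits, and sw adds the
   sign-extended 16-bit offset to it. */
uint32_t effective_address(uint16_t upper, int16_t offset)
{
    return ((uint32_t)upper << 16) + (uint32_t)(int32_t)offset;
}
```

With the values above, `effective_address(0xbf88, 24744)` is 0xBF8860A8, since 24744 is 0x60A8. The compiler splits the 32-bit address into a 16-bit upper half for lui and a 16-bit offset folded into the store, which is why no separate "load low half" instruction appears.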


EmbeddedMan

Sun, 14 Oct 2012 21:02:57 +0000

And if you do

LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;
LATCINV = 0x0001;

You'll get :

! LATCINV = 0x0001;
0x9D00468C: LUI V1, -16504
0x9D004690: ADDIU V0, ZERO, 1
0x9D004694: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D004698: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D00469C: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046A0: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046A4: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046A8: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046AC: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046B0: SW V0, 24748(V1)
! LATCINV = 0x0001;
0x9D0046B4: SW V0, 24748(V1)

And that's how you get single instruction bit manipulation - 40MHz square wave out of an 80MHz part.
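The 40MHz figure follows from simple arithmetic, sketched below. This assumes one instruction per clock with no stalls; the function name `square_wave_hz` is hypothetical.

```c
#include <stdint.h>

/* Output frequency when the pin is inverted once every
   cycles_per_toggle CPU cycles. One full period of the square
   wave needs two toggles (one to go high, one to go low). */
uint32_t square_wave_hz(uint32_t cpu_hz, uint32_t cycles_per_toggle)
{
    return cpu_hz / (2u * cycles_per_toggle);
}
```

`square_wave_hz(80000000, 1)` gives 40000000, matching the back-to-back SW case. If every toggle cost the full three-instruction sequence instead, the same formula gives about 13.3MHz.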

Is it practical? Not for much, really. But it does work, and I have used the idea on projects, sending data to 12 bits of parallel IO at a rate of about 20MHz.

*Brian


majenko

Sun, 14 Oct 2012 21:28:28 +0000

Here's one that would be better off with manual assembly writing:

void main()
{
        while(1) LATC = LATC + 1;
}

void main()
{
        while(1) LATC = LATC + 1;
7f00805c:       3c02bf88        lui     v0,0xbf88
7f008060:       8c4360a0        lw      v1,24736(v0)
7f008064:       24630001        addiu   v1,v1,1
7f008068:       ac4360a0        sw      v1,24736(v0)
7f00806c:       0bc02018        j       7f008060 <main+0x4>
7f008070:       00000000        nop
}

Notice that the jump instruction goes back to the second line of assembly, which loads the contents of the LATC register back into v1, even though that is what is already in v1. Jumping to the addiu instruction (main+0x8) would save an instruction and perform the same job - only faster.
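For anyone who wants to check where that jump lands from the raw instruction word, here is a minimal sketch of the MIPS32 J-type target calculation. The function name `j_target` is hypothetical; the addresses and encodings are taken from the listing.

```c
#include <stdint.h>

/* Target of a MIPS32 "j" instruction located at address pc.
   The low 26 bits of the instruction word are a word index;
   the top 4 bits come from the address of the delay slot (pc + 4). */
uint32_t j_target(uint32_t pc, uint32_t instr)
{
    uint32_t region = (pc + 4u) & 0xF0000000u;
    uint32_t index  = instr & 0x03FFFFFFu;
    return region | (index << 2);
}
```

`j_target(0x7f00806c, 0x0bc02018)` works out to 0x7f008060, the lw instruction at main+0x4, which agrees with the disassembler's annotation.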


WestfW

Mon, 15 Oct 2012 08:26:08 +0000

Oh! It does look like the peripherals are now compile-time constants rather than link time symbols. That should indeed cut the 4-instruction seq to only three, and probably make use of multiple peripheral registers nicer as well.


jumpin_jack

Mon, 15 Oct 2012 19:45:54 +0000

@majenko

I'd imagine that something like this would be a little faster:

void main ()
{
  unsigned int value;
  while(1) LATC = ++value;
}

void main ()
{
9d000188:	3c03bf88 	lui	v1,0xbf88
  unsigned int value;
  while(1) LATC = ++value;
9d00018c:	24420001 	addiu	v0,v0,1
9d000190:	ac6260a0 	sw	v0,24736(v1)
9d000194:	24420001 	addiu	v0,v0,1
9d000198:	ac6260a0 	sw	v0,24736(v1)
9d00019c:	0b400064 	j	9d000190 <foo+0x8>
9d0001a0:	24420001 	addiu	v0,v0,1

majenko

Mon, 15 Oct 2012 20:15:36 +0000

It's just the same.

Instead of "load, add, store, jump" you have "store, add, store, jump" - the same number of instructions.


jumpin_jack

Mon, 15 Oct 2012 21:47:19 +0000

No, I don't think it's the same. It's "store add store add jump". The second add is in the jump's branch delay slot.

I think your first example has your unwanted load because LATC is declared 'volatile' so really the compiler has to generate code to load it again. By using a local variable without volatile, you can get rid of that load and also get rid of the nop in the branch delay slot.
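The two loop styles can be compared in a small host-side sketch. `fake_latc` is a hypothetical stand-in for LATC so the code builds off-target, and the function names are made up. Unlike the example above, the local counter is given an explicit start value here instead of being left uninitialized.

```c
#include <stdint.h>

/* Hypothetical stand-in for LATC; declared volatile like the
   real register so the compiler may not cache its value. */
volatile uint32_t fake_latc;

/* Register-based form: every pass must read the volatile
   register again before adding 1 and storing it back. */
void count_via_register(unsigned passes)
{
    while (passes--) fake_latc = fake_latc + 1;
}

/* Local-counter form: the count lives in a non-volatile local,
   so each pass only needs an add and a store to the register. */
void count_via_local(uint32_t start, unsigned passes)
{
    uint32_t value = start;
    while (passes--) fake_latc = ++value;
}
```

Starting from zero, both functions leave the same value in `fake_latc` after the same number of passes. The difference is only in what the compiler is allowed to drop: the volatile read in the first form has to stay, while the local in the second form can sit in a register.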