chipKIT® Development Platform

Inspired by Arduino™

comparing I/O speeds with Arduino

Created Thu, 26 May 2011 23:35:12 +0000 by jbeale


jbeale

Thu, 26 May 2011 23:35:12 +0000

Just trying out the new Uno32 board using the "Arduino 0022-chipkit-win-20110521" environment. The sketch below shows that the digitalWrite() operation executes about 5x faster on the Uno32.

void loop() {
  digitalWrite(13, HIGH);   // set the LED on
  digitalWrite(13, LOW);    // set the LED off
  digitalWrite(13, HIGH);   // set the LED on
  digitalWrite(13, LOW);    // set the LED off
  delay(1000);
}

// viewing waveforms on scope:
// Arduino: 3.82 usec between "high" & "low"
// Uno32: 740 nsec between "high" & "low", about 5x faster

However, I'd like to try the faster direct port write (which can generate an 8 MHz square wave on Arduino), at 62.5 ns per instruction:

DDRB = B00111111;   // set Port B pins 0-5 as outputs
PORTB = B00100000;  // PortB.5 = Arduino Digital Pin 13 high
PORTB = B00000000;  // PortB pins low (after 62.5 nsec)

What is the Arduino "Pin13" mapped to on the Uno32? It doesn't seem to be Port B pin 5, at any rate. And I see the data direction register (DDRB) is not defined when compiling for Uno32.


jbeale

Fri, 27 May 2011 00:21:12 +0000

Ok, by toggling PortG bit 6 (matching Arduino Digital 13) I see I can generate a 40 MHz square wave output, except that the first transition takes twice as long (some kind of pipeline issue, I suppose):

PORTG = B00000000;  // PortG pins low
PORTG = B01000000;  // PortG.6 = Arduino Digital Pin 13 high
PORTG = B00000000;  // PortG pins low
PORTG = B01000000;  // PortG.6 = Arduino Digital Pin 13 high

jasonk

Fri, 27 May 2011 01:00:25 +0000

The fastest way to toggle a pin on PIC32 is going to be something like

while(1)
    {
      LATGINV = B01000000;
    }

For PIC32, usually the LAT registers are best for output and the PORT registers are best for input. The *INV (invert) registers are the fastest way to toggle an SFR bit. There are also *SET and *CLR registers that allow for quick bit sets and bit clears.
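
The SET/CLR/INV behaviour described above can be modelled on a desktop machine. What follows is a host-side sketch only, not PIC32 code: `fake_lat_t` and the `lat_*` helpers are hypothetical names made up for illustration, standing in for what the hardware does when LATxSET, LATxCLR or LATxINV is written. It tries to show that each write affects only the bits that are 1 in the written value, so no read-modify-write is needed.

```c
#include <stdint.h>

/* Hypothetical stand-in for one LATx output latch (not a real SFR). */
typedef struct {
    uint32_t lat;
} fake_lat_t;

/* Models a write to LATxSET: bits that are 1 in mask become 1. */
static void lat_set(fake_lat_t *p, uint32_t mask) { p->lat |= mask; }

/* Models a write to LATxCLR: bits that are 1 in mask become 0. */
static void lat_clr(fake_lat_t *p, uint32_t mask) { p->lat &= ~mask; }

/* Models a write to LATxINV: bits that are 1 in mask are toggled. */
static void lat_inv(fake_lat_t *p, uint32_t mask) { p->lat ^= mask; }
```

With mask 0x40 (bit 6, the same value as B01000000 above), two successive calls to `lat_inv` return the latch to its starting value, and the other bits are never disturbed.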


WestfW

Fri, 27 May 2011 05:59:47 +0000

which can generate an 8 MHz square wave on Arduino

I don't think we ever got the AVR Arduino to generate anything faster than 5.33 MHz using a code loop. All the jump instructions take at least 2 clocks, so the fastest loop using the pin-toggle capability (PINB = 1) is 3 clocks, and 4 clocks with explicit port writes (PORTB=1; PORTB=0; also no longer square...). I guess repeating code without looping will get you faster square waves (16 MHz, even?).

Extensive discussion here: http://arduino.cc/forum/index.php/topic,4324.0.html

What is the Arduino "Pin13" mapped to on the Uno32? It doesn't seem to be Port B pin 5, at any rate. And I see the data direction register (DDRB) is not defined when compiling for Uno32.

"RG6". If you drop below the Arduino library level (even below the C library level), so that you're dealing with chip-specific pin and port names, you're not going to get instant compatibility. That's true even with the AVR-based Arduinos (pin 13 is PORTB bit 7 on the MEGAs, for instance).


WestfW

Fri, 27 May 2011 06:04:45 +0000

the first transition takes twice as long (some kind of pipeline issue?)

Is there an explanation for the first transition issue? Can I exactly control the waveform (perfect square wave, or whatever) at the sacrifice of speed (some sort of sync instruction?) I know that the large MIPS cores I worked with had all sorts of special hacks to use when it became necessary to access physical memory or bypass caches and such, but I was never involved at that level and I don't know what (if anything) carries over to the M4K versions...


WestfW

Fri, 27 May 2011 08:36:22 +0000

the digitalWrite() operation executes 5x faster on the Uno32.

That's actually rather cool. I could say that this is about what you ought to expect; bit IO code in a tight loop is not going to benefit much from 32bit-ness, so any speedup would be mostly due to the increased clock rate. But I am surprised that it runs so close to the actual ratio (16 vs 80MHz: 5x speedup.)
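
One way to see why the speedup tracks the clock ratio so closely is to convert both scope readings into clock cycles. The sketch below is only that arithmetic: the helper name `cycles_for` is hypothetical, and the inputs are the figures quoted in the first post (3.82 usec at 16 MHz, 740 nsec at 80 MHz).

```c
/* Number of clock cycles that fit into a duration.
   duration_ns is in nanoseconds, clock_mhz in MHz. */
static double cycles_for(double duration_ns, double clock_mhz)
{
    return duration_ns * clock_mhz / 1000.0;
}
```

With those inputs, `cycles_for(3820.0, 16.0)` is about 61 cycles and `cycles_for(740.0, 80.0)` is about 59 cycles, so under this simple model one digitalWrite() costs roughly the same number of clocks on both boards.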


Mark

Sat, 28 May 2011 14:19:23 +0000

What is the Arduino "Pin13" mapped to on the Uno32? It doesn't seem to be Port B pin 5, at any rate. And I see the data direction register (DDRB) is not defined when compiling for Uno32.

The IO ports for PIC32 are completely different; things like DDRx are NOT going to be the same at all, nor should they be expected to be. Microchip uses TRISx instead. TRIS stands for TRISTATE register. Also, on the PIC, setting the data direction bit to 1 makes the pin an input and 0 makes it an output: the complete opposite of AVR. It was the same way back in my early years (70s/80s) with Motorola and Intel: Motorola was 1=output/0=input and Intel was the opposite.
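
The opposite polarity can be written down as a one-line conversion. This is a minimal host-side sketch, not code from either core: `ddr_to_tris` and `tris_is_output` are hypothetical helpers, and the 8-bit width is an assumption chosen to match the AVR example earlier in the thread (PIC32 ports are wider).

```c
#include <stdint.h>

/* AVR DDRx uses 1 = output; PIC TRISx uses 1 = input.
   Converting a direction mask from one to the other is a bitwise NOT. */
static uint8_t ddr_to_tris(uint8_t ddr)
{
    return (uint8_t)~ddr;
}

/* Returns 1 if the given bit is configured as an output in a TRIS value. */
static int tris_is_output(uint8_t tris, int bit)
{
    return ((tris >> bit) & 1u) == 0;
}
```

For example, the AVR setting B00111111 (pins 0-5 as outputs) corresponds to a TRIS-style value of B11000000.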

Below is the mapping for the MAX32 board. I will put up a web page with this detail. If anyone wants to look, this info can be found in pins_arduino.c. This applies to the Arduino/AVR code as well as the PIC32 code.

Digital pin 00 = F 2 RX Data
Digital pin 01 = F 8 TX Data
Digital pin 02 = E 8
Digital pin 03 = D 0 Timer 1
Digital pin 04 = C 14
Digital pin 05 = D 1 Timer 2
Digital pin 06 = D 2 Timer 3
Digital pin 07 = E 9
Digital pin 08 = D 12
Digital pin 09 = D 3 Timer 4
Digital pin 10 = D 4 Timer 5
Digital pin 11 = C 4
Digital pin 12 = A 2
Digital pin 13 = A 3 On board LED
Digital pin 14 = F 13
Digital pin 15 = F 12
Digital pin 16 = F 5
Digital pin 17 = F 4
Digital pin 18 = D 15
Digital pin 19 = D 14
Digital pin 20 = A 15
Digital pin 21 = A 14
Digital pin 22 = C 2
Digital pin 23 = C 3
Digital pin 24 = C 0
Digital pin 25 = F 3
Digital pin 26 = G 3
Digital pin 27 = G 2
Digital pin 28 = G 15
Digital pin 29 = G 7
Digital pin 30 = E 7
Digital pin 31 = E 6
Digital pin 32 = E 5
Digital pin 33 = E 4
Digital pin 34 = E 3
Digital pin 35 = E 2
Digital pin 36 = E 1
Digital pin 37 = E 0
Digital pin 38 = D 10
Digital pin 39 = D 5
Digital pin 40 = B 11
Digital pin 41 = B 13
Digital pin 42 = B 12
Digital pin 43 = G 8
Digital pin 44 = A 10
Digital pin 45 = F 0
Digital pin 46 = F 1
Digital pin 47 = D 6
Digital pin 48 = D 8
Digital pin 49 = D 11
Digital pin 50 = G 7
Digital pin 51 = G 8
Digital pin 52 = G 6
Digital pin 53 = G 9
Digital pin 54 = B 0 Analog Input 0
Digital pin 55 = B 1 Analog Input 1
Digital pin 56 = B 2 Analog Input 2
Digital pin 57 = B 3 Analog Input 3
Digital pin 58 = B 4 Analog Input 4
Digital pin 59 = B 5 Analog Input 5
Digital pin 60 = B 6 Analog Input 6
Digital pin 61 = B 7 Analog Input 7
Digital pin 62 = B 8 Analog Input 8
Digital pin 63 = B 9 Analog Input 9
Digital pin 64 = B 10 Analog Input 10
Digital pin 65 = B 11 Analog Input 11
Digital pin 66 = B 12 Analog Input 12
Digital pin 67 = B 13 Analog Input 13
Digital pin 68 = B 14 Analog Input 14
Digital pin 69 = B 15 Analog Input 15
Digital pin 70 = A 0
Digital pin 71 = A 1
Digital pin 72 = A 4
Digital pin 73 = A 5
Digital pin 74 = D 9
Digital pin 75 = C 13
Digital pin 76 = D 13
Digital pin 77 = D 7
Digital pin 78 = G 1
Digital pin 79 = G 0
Digital pin 80 = A 6
Digital pin 81 = A 7
Digital pin 82 = G 14
Digital pin 83 = G 12
Digital pin 84 = G 13
Digital pin 85 = A 9
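
A list like this turns into a port letter plus a bit mask, which is what the direct register writes need. The sketch below is a host-side illustration only: the struct, the table name `max32_pins`, and both helpers are hypothetical, and the table holds just a few rows copied from the list above rather than the real contents of pins_arduino.c.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int  pin;   /* board "digital pin" number */
    char port;  /* port letter, 'A'..'G' */
    int  bit;   /* bit position within that port */
} pin_map_t;

/* A few rows copied from the Max32 list above; not the full table. */
static const pin_map_t max32_pins[] = {
    {  3, 'D', 0 },
    { 13, 'A', 3 },
    { 52, 'G', 6 },
    { 54, 'B', 0 },
};

/* Linear search; returns NULL for pins missing from this partial table. */
static const pin_map_t *find_pin(int pin)
{
    size_t n = sizeof max32_pins / sizeof max32_pins[0];
    for (size_t i = 0; i < n; i++) {
        if (max32_pins[i].pin == pin)
            return &max32_pins[i];
    }
    return NULL;
}

/* Bit mask to use with the port's LATx/LATxINV style registers. */
static uint32_t pin_mask(const pin_map_t *p)
{
    return (uint32_t)1 << p->bit;
}
```

For digital pin 13 this gives port A with mask 0x08, and for digital pin 3 it gives port D with mask 0x01.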


kasperkamperman

Thu, 20 Oct 2011 14:10:31 +0000

Does anyone have a concrete example of this:

while(1)

For example, how can I do a fast write on digital pin 3 with this method?


WestfW

Sun, 23 Oct 2011 06:30:21 +0000

while (1) {
    digitalWrite(3, 1);
    digitalWrite(3, 0);
  }

370.4kHz

while (1) {
    LATDINV = 1;
  }

About 20MHz. Ugly waveform. Weird object code:

void loop()
{
9d00135c:       3c03bf88        lui     v1,0xbf88
  while (1) {
    LATDINV = 1;
9d001360:       24020001        li      v0,1
9d001364:       ac6260ec        sw      v0,24812(v1)
9d001368:       ac6260ec        sw      v0,24812(v1)
9d00136c:       0b4004d9        j       9d001364 <loop+0x8>
9d001370:       00000000        nop

(huh. Could you put the store in the branch delay slot and have a zero-length loop?) (Apparently yes, but to no speed advantage. The following is still about 20MHz):

asm volatile(" .set noreorder\n"
  " lui $3, 0xbf88\n"
  " li $2, 1\n"
  "wewloop: j wewloop\n"
  " sw $2, 24812($3)\n"
  " .set reorder\n"
  );

WestfW

Sun, 23 Oct 2011 07:10:50 +0000

A loop with the LATINV and two nop's in the loop and a nop in the branch delay slot gives me really close to 8MHz...

And unrolling the loop (consecutive sw instructions) gives almost 40MHz. These all sound like expected results: 80MHz instruction rate, 4 instructions per waveform for the loop, 2 per waveform for the unrolled code.
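
The instruction counting above can be put into a small formula. This is a simplified model, assuming one instruction per clock with no stalls or cache effects; the helper name `toggle_freq_mhz` is hypothetical. A full waveform period needs two toggles, so the output frequency is the clock divided by twice the number of instructions executed per toggle.

```c
/* Output frequency in MHz for a pin-toggle loop, assuming one instruction
   per clock and instr_per_toggle instructions from one toggle to the next. */
static double toggle_freq_mhz(double clock_mhz, int instr_per_toggle)
{
    return clock_mhz / (2.0 * instr_per_toggle);
}
```

At 80 MHz this gives 20 MHz for store plus jump (2 instructions per toggle), 40 MHz for back-to-back stores (1 per toggle), and 8 MHz for store, two nops, jump and delay-slot nop (5 per toggle), which lines up with the measured figures of about 20 MHz, almost 40 MHz and close to 8 MHz.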


jumpin_jack

Sun, 23 Oct 2011 20:36:46 +0000

In this case, do you think executing from RAM may help? I don't have hardware with me at the moment to test.

void __attribute__((longcall,section(".ramfunc"))) togglepin(void)
{
  while (1) {
    LATDINV = 1;
  }
}

void loop() {
 togglepin();
}

Also, do you think that the core timer interrupt used by the delay function may cause IO to slow? Does anyone know how often the core timer interrupt fires? Would it be possible to disable the core timer interrupt and enable it only for a call to a delay?

Thanks.


EmbeddedMan

Sun, 23 Oct 2011 22:37:51 +0000

The core timer interrupt (normally) fires once every millisecond, and it takes very little time (<1uS I think) to execute. If it does slow down the I/O, you would just see a little delay once every millisecond.

*Brian


WestfW

Sun, 23 Oct 2011 22:41:47 +0000

do you think executing from RAM may help?

No, I think 20MHz is the theoretical frequency that you get if you're running at 80MHz. One "toggle" instruction and one "jump" for each half-cycle of the waveform, 4 instructions for each full cycle. 80/4 = 20.

do you think that the core timer interrupt used by the delay function may cause IO to slow?

I assume that the core timer interrupt will put a serious glitch into the 20MHz waveform approximately every millisecond. It appears to be 100+ instructions long :-( Other interrupts would be similarly disruptive. It shouldn't otherwise "slow down io."
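
To put a rough size on that glitch, the arithmetic can be sketched as follows. The inputs are assumptions taken from this thread: an interrupt handler of 100 instructions (the lower bound of "100+"), one instruction per clock at 80 MHz, and a 20 MHz output waveform. Both helper names are hypothetical.

```c
/* Time spent in an interrupt handler, in microseconds, assuming one
   instruction per clock. */
static double isr_time_us(double instructions, double clock_mhz)
{
    return instructions / clock_mhz;
}

/* Number of waveform periods that would have fit into that time. */
static double periods_missed(double isr_us, double wave_mhz)
{
    return isr_us * wave_mhz;
}
```

Under those assumptions the handler takes about 1.25 microseconds, which is the length of 25 periods of a 20 MHz waveform, once per timer tick.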

Yes, you could turn off the timer, or turn off interrupts entirely, to remove the glitch. It depends on what you are trying to do; if all you want is a 40MHz square wave, you might as well use a $2 oscillator instead of a ChipKit...

What ARE you trying to do?