chipKIT® Development Platform

Inspired by Arduino™

DigitalWriteFast library for Uno32?

Created Thu, 06 Oct 2011 13:02:28 +0000 by TRC


TRC

Thu, 06 Oct 2011 13:02:28 +0000

Hello all, I'm new to Chipkit, just received my Uno32. I switched from Arduino to ChipKit because of the better speed.

Well, with Arduino I have been using the custom DigitalWriteFast library a lot. The library can be found here: (Hmm, as a new user I can't post URLs) but Google: code google digitalwritefast

I must say it is quite a speed boost. This simple code:

void setup(){
  pinMode(8, OUTPUT);
}

void loop(){
  digitalWrite(8, HIGH);
  digitalWrite(8, LOW);
}

Gives a 120 kHz square wave on the Arduino. The same code gives 633 kHz on the Uno32 and 710 kHz on the Maple.

But on the Arduino and using the DigitalWriteFast library and this code:

#include <digitalWriteFast.h>

void setup(){
  pinMode(8, OUTPUT);
}

void loop(){
  digitalWriteFast(8, HIGH);
  digitalWriteFast(8, LOW);
}

Gives a 1.14 MHz square wave on the Arduino. I know this square wave has a fair bit of ringing on the rising and falling slopes, but it does show that there is much speed to be gained from this faster library (roughly 9.5 times, going from 120 kHz to 1.14 MHz). That goes not only for DigitalWriteFast but for DigitalReadFast as well.

Now my obvious question is, is such a library also available for the Uno32? Or if I dare to ask can someone please make this?

I know direct port manipulation would provide the biggest gain, but I am not familiar with it. I think there are a lot of people like me in the ChipKit and Arduino community who would be more than happy to see a faster version of the DigitalRead and DigitalWrite commands; besides, it is easier to use.

Kind regards, TRC


aleandro

Mon, 24 Oct 2011 23:31:49 +0000

Very good point! I'm exactly looking for this too. Any news about this?


WestfW

Tue, 25 Oct 2011 06:19:34 +0000

I'll take a stab at it. My EMACS-Foo is strong!

Is the PIC32 compiler smart enough to turn LATG |= 4; into LATGSET = 4; and similar? That would make it quite a bit easier, but I'm not quite sure I like that idea (and it doesn't seem to be). I don't suppose there is a generic register name for ioports for which such optimization is done?


WestfW

Tue, 25 Oct 2011 07:33:10 +0000

The code produced is a little weird. Shouldn't the compiler know that it doesn't need to reload v1 ?

digitalWriteFast(3,1);
9d00135c:       24020001        li      v0,1
9d001360:       3c03bf88        lui     v1,0xbf88
9d001364:       ac6260e8        sw      v0,24808(v1)
    digitalWriteFast(3,0);
9d001368:       3c03bf88        lui     v1,0xbf88
9d00136c:       ac6260e4        sw      v0,24804(v1)

It does different weird things in a loop. a0 and v1 with the same contents, eh? And duplication of the loop guts ?

void loop()
{  
9d00135c:       3c04bf88        lui     a0,0xbf88
9d001360:       3c03bf88        lui     v1,0xbf88
  while(1) {
    digitalWriteFast(3,1);
9d001364:       24020001        li      v0,1
9d001368:       ac8260e8        sw      v0,24808(a0)
    digitalWriteFast(3,0);
9d00136c:       ac6260e4        sw      v0,24804(v1)
}

void loop()
{  
  while(1) {
    digitalWriteFast(3,1);
9d001370:       ac8260e8        sw      v0,24808(a0)
    digitalWriteFast(3,0);
9d001374:       ac6260e4        sw      v0,24804(v1)
9d001378:       0b4004da        j       9d001368 <loop+0xc>
9d00137c:       00000000        nop

(Hmm. Looks like the failure to realize that the LATxCLR and LATxSET have the same upper register contents, and could theoretically share an address register, is due to the build environment using the linker to define IO register addresses instead of hacking them into compile-time constants. At compile time, the compiler has no way of knowing the values of LATDCLR and LATDSET, so it can't optimize their "commonality."

That's pretty bogus. No wonder microchip needs a special (not open source) global optimizer... Grr.)

I don't suppose that there's a separate set of structure-based definitions that would finesse that? If there were:

struct ioport_ {
  int port;
  int set;
  int clr;
  int inv;
  int lat;
  int tris;
  // etc
} *PORTD_s = &PORTDBASE;

it would be easier for the compiler to do somewhat better. At least some things, anyway. (I don't see anything. Hmmph. ALL the SFRs are in 0xBF80xxxx or 0xBF88xxxx; you could commit two registers to those base addresses and access everything without reloading...)
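A minimal sketch of what such a structure-based definition could look like, run against plain memory rather than real SFRs. The type names `regset_t` and `ioport_t` and the field order are assumptions for illustration; the order is chosen so the offsets agree with the TRISG, LATGCLR and LATGSET addresses quoted further down the thread (0x6180, 0x61A4, 0x61A8), which put LATxCLR and LATxSET 0x24 and 0x28 bytes past TRISx.

```c
#include <stddef.h>
#include <stdint.h>

/* One SFR plus its CLR/SET/INV companions, 4 bytes each (hypothetical type). */
typedef struct {
    volatile uint32_t reg;
    volatile uint32_t clr;
    volatile uint32_t set;
    volatile uint32_t inv;
} regset_t;

/* One I/O port as a contiguous block of register sets (assumed field order). */
typedef struct {
    regset_t tris;
    regset_t port;
    regset_t lat;
} ioport_t;

/* Offsets from the port base, known at compile time. */
#define LAT_CLR_OFFSET (offsetof(ioport_t, lat) + offsetof(regset_t, clr))
#define LAT_SET_OFFSET (offsetof(ioport_t, lat) + offsetof(regset_t, set))

/* With one base pointer, both stores can share the same address register. */
static inline void port_set(ioport_t *p, uint32_t mask) { p->lat.set = mask; }
static inline void port_clr(ioport_t *p, uint32_t mask) { p->lat.clr = mask; }
```

Because every access goes through a single base pointer plus a constant offset, the compiler has what it needs to load the upper address bits once; whether it actually does so depends on the compiler and optimization level.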


WestfW

Wed, 26 Oct 2011 05:58:25 +0000

I don't suppose anyone has a digitalWrite() test fixture and sketch of some kind where they could substitute in digitalWriteFast() and see if it still works? Testing 44 pins is a daunting task :-( (and it took me a bit to figure out where "pin 42" went!)

The implementation I did used editor macros to create the giant compile-time conditional for digitalPinToPort() and similar, so it ought to either be correct or not work at all, but I don't feel good about putting it out there without some more significant testing.

It looks like the typical digitalWriteFast() will take three instructions (at least for the first invocation): load the IOPORT base address into a register, load the bit into a register, and store the bit into the IOPORT clr/set address. Depending on "stuff", subsequent writes could take as little as one instruction. The existence of the SET/CLR registers makes additional instructions to ensure atomicity unnecessary, which is nice.
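To illustrate the atomicity point, here is a small host-side simulation (not PIC32 code). It contrasts a read-modify-write such as `LATx |= mask`, which can lose an update made by an interrupt between the load and the store, with a single store to a SET register. All names (`sim_lat`, `sim_store_set`, `rmw_with_interrupt`, `set_with_interrupt`) are hypothetical, and the "interrupt" is just a function call placed at the awkward moment.

```c
#include <stdint.h>

/* Simulated latch register; on real hardware this would be LATx. */
static uint32_t sim_lat;

/* Models the hardware effect of a single store to LATxSET. */
static void sim_store_set(uint32_t mask) { sim_lat |= mask; }

/* "LATx |= mask" as load/modify/store, with an interrupt in the middle. */
uint32_t rmw_with_interrupt(uint32_t mask, uint32_t isr_mask)
{
    uint32_t tmp = sim_lat;     /* load */
    sim_store_set(isr_mask);    /* interrupt handler sets another bit */
    tmp |= mask;                /* modify the stale copy */
    sim_lat = tmp;              /* store: the handler's bit is overwritten */
    return sim_lat;
}

/* "LATxSET = mask" is one store, so there is no window to interrupt. */
uint32_t set_with_interrupt(uint32_t mask, uint32_t isr_mask)
{
    sim_store_set(isr_mask);    /* handler runs before (or after) the store */
    sim_store_set(mask);
    return sim_lat;
}
```

Starting from a cleared latch, the read-modify-write version ends up with only the caller's bit set, while the SET-register version keeps both bits.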


WestfW

Wed, 26 Oct 2011 07:04:31 +0000

For the adventuresome and/or curious: https://github.com/WestfW/ChipKit/blob/master/FastDigital.h


Ddall

Wed, 26 Oct 2011 10:32:41 +0000

Any way you could make this work on MAX32? You'd be my hero :roll:


jasonk

Wed, 26 Oct 2011 16:44:26 +0000

Looks like the failure to realize that the LATxCLR and LATxSET have the same upper register contents, and could theoretically share an address register, is due to the build environment using the linker to define IO register addresses instead of hacking them into compile-time constants.

Yes, this is a deficiency in the compiler that we plan to address. We could work around the problem with something like this:

void foo(void)
{
  register unsigned int BASEPTR0=0xBF880000;
  register unsigned int BASEPTR1=0xBF800000;
  register unsigned int BASEPTR2=0xBFC00000;

 #define TRISG (*(volatile unsigned int*)(BASEPTR0+0x6180))
 #define LATGCLR (*(volatile unsigned int*)(BASEPTR0+0x61A4))
 #define LATGSET (*(volatile unsigned int*)(BASEPTR0+0x61A8))

        TRISG=0x01;
        LATGSET=0x01;
        LATGCLR=0x01;
        TRISG++;
}

We plan to add some compiler features for improving SFR access in the coming year. Among them are removing the unnecessary LUI instructions. We will also look at auto-converting bit sets, clears, & toggles to assignments to the corresponding SET/CLR/INV register. We've had other priorities so far, but these items are making their way to the top of our long list of potential improvements.


WestfW

Wed, 26 Oct 2011 20:04:20 +0000

So what's the advantage of having the SFRs defined at link time rather than compile time?


jasonk

Wed, 26 Oct 2011 20:32:30 +0000

So what's the advantage of having the SFRs defined at link time rather than compile time?

Theoretically, libraries from vendors can be provided in precompiled form and can work on different device variants. For instance, a library compiled for a PIC32MX1 could still work on a PIC32MX7, which may have some SFRs at different addresses. In practice, it probably won't happen because the devices have different peripheral sets.

Like I said though, we are likely going to add a new attribute to our port of GCC so that we can specify an absolute address for a variable and then the compiler can take advantage of the absolute address when generating code.


WestfW

Wed, 26 Oct 2011 23:39:17 +0000

libraries from vendors can be provided in precompiled form

ahh! that makes sense. It even sounds valuable, though I'm not sure how well it would work out in practice (even with similar peripheral sets.) I used to get really uncomfortable thinking what would actually be required to meet the requirement of "user-linkable with new versions of LGPL libraries" in embedded environments.

Can the compilers be given a "hint" that a given linker section has a constant base address on a 64k boundary? It might even be implementable with the current compiler and some "creative" uses of existing features (heh. Adding "paging" at source level makes the final code smaller...) Sorta like:

#define SFRPAGE ((((unsigned int)&LATBSET) & 0xFFFF0000))
#define xLATBSET *((volatile unsigned int *)((SFRPAGE) + (((unsigned int)&LATBSET) & 0xFFFF)))
#define xLATBCLR *((volatile unsigned int *)((SFRPAGE) + (((unsigned int)&LATBCLR) & 0xFFFF)))

(alas, this doesn't quite work. gcc doesn't seem to understand that the masking doesn't end up dividing the (unresolved) externals into upper and lower 16bit halves, so it ends up doing actual "andi" operations. I think I have the expression correct enough that it should work... Inline assembler could probably do it...)


WestfW

Thu, 27 Oct 2011 05:18:29 +0000

Any way you could make this work on MAX32?

Done. The second one is easier.

These are "slightly" tested, which is to say that they compile, pin13 blinks using the "fast" versions (and pinMode to set to output), the "last" pin works (pin43 (2nd LED) on Uno32, pin85 on Max32), and several random pins in between will flash an LED.


KurtE

Sat, 19 Nov 2011 18:54:47 +0000

Hi, I am new here, so please forgive me if I am asking an obvious question. I have been programming for a long time, and over the last year or so I have been playing around with Arduinos, including some prototype boards that will be coming out at Lynxmotion.

Awhile ago I ported the Lynxmotion hexapod code (Phoenix) over to the Arduino environment and thought it might be fun to try it out on the Uno32. But the first thing I ran into was several of the libraries I was using have not been ported over to the Uno32, like the PSX_LIB library for the PS2 by Bill Porter. The problem is that he is using his own macros and the like to set and clear the IO lines. So I thought I would take a stab at this.

Recently on the Arduino, I wanted to control two Wii nunchucks on one board, but they both have the same I2C address. I found the software I2C library was too slow, so I built my own along the lines of DigitalWriteFast. The issue I had with using DigitalWriteFast itself was that I wanted to pass pin numbers in to the init function, and DigitalWriteFast only works with constants. So what I did instead was to unroll the digitalWrite, so that my init code would do something like:

// Some globals or class variables...
volatile uint32_t	*SCLlatchPort;
uint16_t SCLPinBit;
...

// Init code 
    uint8_t				port;

	port	=	digitalPinToPort(SCLPin);
	SCLPinBit =	digitalPinToBitMask(SCLPin);
	SCLlatchPort=	portOutputRegister(port);
// may need to grab data to read ... like wise for SDA

Then when I needed to change SCL to low, I would just do something like:

*SCLlatchPort &= ~SCLPinBit;

Or if I needed to go high I would:

*SCLlatchPort |= SCLPinBit;

Note: these were done through macros or inline functions...

So the question is, would this work on the Uno32? Also, would the set and clear turn out to be atomic operations? If not, what do I need to do to make them atomic? I ran into problems earlier where the PS2 library was not atomic and walked over the changes that the Servo library was making during interrupts.
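One possible way to get both run-time pin numbers and single-store writes, sketched under assumptions: resolve the pin once at init time, but cache the addresses of the latch SET and CLR registers instead of the latch itself. The `fast_pin_t` type and its functions are hypothetical, and the registers here are simulated in ordinary memory; on a chipKIT board the register block would have to come from the core (the `_dwf()` code later in this thread uses `portRegisters()` and `iop->lat.set` / `iop->lat.clr` for that).

```c
#include <stdint.h>

/* Simulated register set: a register plus its CLR/SET/INV companions. */
typedef struct {
    volatile uint32_t reg, clr, set, inv;
} regset_t;

/* Pin handle resolved once at init time (hypothetical type). */
typedef struct {
    volatile uint32_t *set_reg;  /* address of LATxSET for this pin */
    volatile uint32_t *clr_reg;  /* address of LATxCLR for this pin */
    uint32_t mask;               /* bit mask of the pin within its port */
} fast_pin_t;

/* Do the slow lookup work once; 'bit' is the bit position within the port. */
void fast_pin_init(fast_pin_t *fp, regset_t *lat, unsigned bit)
{
    fp->set_reg = &lat->set;
    fp->clr_reg = &lat->clr;
    fp->mask = (uint32_t)1 << bit;
}

/* Each call is one store of the mask; the latch is never read back. */
static inline void fast_pin_high(const fast_pin_t *fp) { *fp->set_reg = fp->mask; }
static inline void fast_pin_low(const fast_pin_t *fp)  { *fp->clr_reg = fp->mask; }
```

Because neither function reads the latch, there is no read-modify-write window for an interrupt handler (such as the Servo library's) to fall into, which is the property WestfW describes for the SET/CLR registers earlier in the thread.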

Again sorry if I went too far off topic here or if the question(s) are obvious.

Thanks Again Kurt


mikael

Tue, 22 Nov 2011 00:12:36 +0000

I've tested WestfW's DigitalWriteFast library in my attempt to port the Arduino glcd lib to chipKIT. digitalWriteFast works perfectly, but pinModeFast and digitalReadFast do not seem to work as expected. Tested with a Max32 board, using pins 71-83.


WestfW

Wed, 23 Nov 2011 07:08:47 +0000

pinModeFast and digitalReadFast do not seem to work as expected.

Rats! Any specifics? I thought my testing covered pinModeFast(), but I'll admit that digitalReadFast() didn't get much more than a "compiles and doesn't crash" sort of test.


bperrybap

Sat, 24 Dec 2011 00:13:25 +0000

Please Oh Please, can we not wander down the digitalWriteFast() path on this platform? There is no need to have separate functions for "fast" vs regular.

It is better to use appropriate macro wrappers around functions so that if the arguments are constants you get the fast direct port I/O, and if not, it calls a function (e.g. the digitalWrite() function) to do the mappings at run time.

That way there is one and only one API and if you use constants, things just happen a lot faster.

I never understood why the Arduino development team wouldn't accept this solution.

Take a look at what Paul has done with his Teensy core. If you use constants, with his core, you automagically get the faster code.

IMHO, this is the way things should work.

--- bill


WestfW

Sat, 24 Dec 2011 02:25:50 +0000

It is better to use appropriate macro wrappers

In fact, this implementation is exactly that, and is based off the discussions that have occurred in the Arduino Forums. However, lacking complete testing or official blessings, they have to have SOME other name!

#ifndef digitalWriteFast
#define digitalWriteFast(P, V) \
do {								\
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {		\
	if (V) {							\
	    *(_dpin_to_set_macro(P)) = _dpin_to_bitmask_macro(P);	\
	} else {							\
	    *(_dpin_to_clr_macro(P))  = _dpin_to_bitmask_macro(P);	\
	}								\
    } else  digitalWrite((P), (V));					\
}while (0)
#endif  //#ifndef digitalWriteFast

(The big part of the effort is coming up with the _dpin_to_xxx macros. In this case, they are simple but long and ugly cascaded ternary expressions built from the pins_xxx file using editor macros, said macros being included in the source, for posterity...)
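A scaled-down sketch of the cascaded-ternary idea, for a made-up three-pin board with simulated registers. The macro names (`PIN_TO_SET`, `PIN_TO_CLR`, `PIN_TO_MASK`, `WRITE_FAST`), the pin map and `slow_write()` are all hypothetical stand-ins, not the contents of FastDigital.h. It relies on the GCC/Clang builtin `__builtin_constant_p`.

```c
#include <stdint.h>

/* Simulated LATxSET/LATxCLR registers for two ports (stand-ins for SFRs). */
static volatile uint32_t sim_latd_set, sim_latd_clr;
static volatile uint32_t sim_latf_set, sim_latf_clr;
static int slow_calls;  /* how often the run-time fallback was used */

/* Hypothetical 3-pin board: pin 0 = D2, pin 1 = D3, anything else = F0. */
#define PIN_TO_SET(P)  ((P) == 0 ? &sim_latd_set : (P) == 1 ? &sim_latd_set : &sim_latf_set)
#define PIN_TO_CLR(P)  ((P) == 0 ? &sim_latd_clr : (P) == 1 ? &sim_latd_clr : &sim_latf_clr)
#define PIN_TO_MASK(P) ((P) == 0 ? (1u << 2) : (P) == 1 ? (1u << 3) : (1u << 0))

/* Run-time fallback, standing in for the core digitalWrite(). */
void slow_write(int pin, int val)
{
    slow_calls++;
    if (val) *PIN_TO_SET(pin) = PIN_TO_MASK(pin);
    else     *PIN_TO_CLR(pin) = PIN_TO_MASK(pin);
}

/* Constant arguments take the inline path; anything else falls back. */
#define WRITE_FAST(P, V)                                          \
do {                                                              \
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {     \
        if (V) { *PIN_TO_SET(P) = PIN_TO_MASK(P); }               \
        else   { *PIN_TO_CLR(P) = PIN_TO_MASK(P); }               \
    } else {                                                      \
        slow_write((P), (V));                                     \
    }                                                             \
} while (0)
```

With a literal pin number, every `(P) == n` comparison is a compile-time constant, so an optimizing compiler can reduce the whole ternary chain to one address and one mask. A real board needs one ternary arm per pin, which is where the "long and ugly" comes from.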


bperrybap

Sat, 24 Dec 2011 04:01:04 +0000

I know and understand the DigitalWriteFast code. (I participated in the long discussion over the Arduino forum).

I have 1700 lines of macros and inline functions that I created myself for the Arduino AVR based glcd library that not only maps individual pins for port i/o but can detect adjacent pins in a port and do multiple pin updates concurrently - nibble and byte when possible.

What I'm trying to say is let's not intentionally go down the DigitalWriteFast path again on this platform except for testing the implementation. IMHO, DigitalWriteFast is solving the problem in the wrong way by doing it at the wrong level.

Instead, I'd like to see us push to get the final working solution incorporated into the real core code instead of having to live with a less optimal solution that is forced into using alternate API names to get the faster code.

In other words, I believe that the optimization should be totally transparent to the user. The user should not have to use a different set of APIs to get faster code when the core code can make it happen automatically.

I understand the usefulness of going through an interim stage while all the various macros are sorted out. But after that, I think it makes sense to be moved into the core code so everything benefits from it automatically.

--- bill


Jacob Christ

Tue, 27 Dec 2011 13:12:40 +0000

What I'm trying to say is let's not intentionally go down the DigitalWriteFast path again on this platform except for testing the implementation. IMHO, DigitalWriteFast is solving the problem in the wrong way by doing it at the wrong level.

Bill,

Although I don't have much experience with the Arduino or the community, I agree completely with you. Also, in working with the maintainers I would say that they are very open to implementation suggestions, so much so that just posting a suggested code change as an issue in the repo will get it implemented.

https://github.com/chipKIT32/chipKIT32-MAX

Jacob


WestfW

Wed, 15 Feb 2012 09:45:29 +0000

So, remember this really ugly implementation of DigitalWriteFast? It relied on gcc being able to optimize away long and complex ternary statements containing constants. It was fast, but it relied on awful-looking C macros that were completely separate from the main pins_arduino.c structures, and thus subject to maintenance nightmares.

Well, it turns out that gcc will also optimize indexing into static arrays with a constant, so we can get ALMOST as fast an implementation, using the code that already exists. No maintenance nightmare, and a pretty closely parallel implementation, and no particularly ugly macros.

variants/xxx/Board_Data.c gets a slight set of modifications so that it can make either global arrays (for pins_arduino.c) or static arrays that disappear when only indexed by constants:

+#if defined(OPT_BOARD_DATA_STATIC)
+#define MAYBESTATIC static
+#else
+#define MAYBESTATIC
+#endif
 
 /* ------------------------------------------------------------ */
 /*                                     Data Tables                                                                     */
@@ -56,7 +61,7 @@
 ** the TRIS register for the port. This is used for setting the
 ** pin direction.
 */
-const uint32_t port_to_tris_PGM[] = {
+MAYBESTATIC const uint32_t port_to_tris_PGM[] = {
        NOT_A_PORT,                             //index value 0 is not used
 
 #if defined(_PORTA)
@@ -108,7 +113,7 @@
 /* This table is used to map the digital pin number to the port
 ** containing that pin.
 */
-const uint8_t digital_pin_to_port_PGM[] = {
+MAYBESTATIC const uint8_t digital_pin_to_port_PGM[] = {

And then fastio.h gets code to include the arrays, and uses inline functions that closely parallel wiring_digital.c. It even gets to include some of the sanity checking:

#define OPT_BOARD_DATA_STATIC 1
#define OPT_BOARD_DATA 1
#define OPT_BOARD_INTERNAL 1
#include <p32xxxx.h>
#include <WProgram.h>
#include <Board_Data.c>

/*
 * This looks a lot like digitalWrite, but uses the static arrays and is inline.
 * When called with constants, it should optimize down to a single instruction.
 */
static inline void _dwf(uint8_t pin, uint8_t val)
{
    p32_ioport *	iop;
    unsigned int		port;
    unsigned int		bit;

	//* Get the port number for this pin.
	if ((pin >= NUM_DIGITAL_PINS) ||
	    ((port = digitalPinToPort(pin)) == NOT_A_PIN))
	{
		return;
	}

	//* Obtain pointer to the registers for this io port.
	iop = (p32_ioport *)portRegisters(port);

	//* Obtain bit mask for the specific bit for this pin.
	bit = digitalPinToBitMask(pin);

	//* Set the pin state
	if (val == LOW)
	{
		iop->lat.clr = bit;
	}
	else
	{
		iop->lat.set = bit;
	}
}


#define digitalWriteFast(P, V)  \
do {								\
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {	\
	_dwf(P, V);						\
    } else {							\
	digitalWrite((P), (V));					\
    }								\
} while (0)

This makes me feel a lot warmer and fuzzier than the previous implementation. It ought to be MUCH easier to add to the different variants, and could have "partial" support added to the core with no impact...
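A stripped-down illustration of the static-array approach, using a hypothetical four-pin board and simulated registers in place of the real Board_Data.c tables. The table contents, `NUM_PINS` and `write_pin()` are assumptions made for the example.

```c
#include <stdint.h>

#define NUM_PINS   4   /* hypothetical tiny board */
#define NOT_A_PORT 0   /* port index 0 is unused, as in the real tables */

/* Simulated per-port LATxSET/LATxCLR registers, indexed by port number. */
static volatile uint32_t sim_set[3], sim_clr[3];

/* static const tables: when only indexed by constants, an optimizing
 * compiler can fold the lookups and discard the tables themselves. */
static const uint8_t  pin_to_port[NUM_PINS] = { 1, 1, 2, NOT_A_PORT };
static const uint32_t pin_to_mask[NUM_PINS] = { 1u << 0, 1u << 5, 1u << 2, 0 };

/* Same shape as the run-time code, including the sanity check. */
static inline void write_pin(uint8_t pin, uint8_t val)
{
    uint8_t port;

    if (pin >= NUM_PINS || (port = pin_to_port[pin]) == NOT_A_PORT)
        return;                         /* invalid pin: do nothing */

    if (val)
        sim_set[port] = pin_to_mask[pin];
    else
        sim_clr[port] = pin_to_mask[pin];
}
```

The appeal of this layout is that the inline function reads like the ordinary run-time implementation and shares its data, so there is no second, hand-maintained copy of the pin map.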


rasmadrak

Tue, 21 Feb 2012 21:40:55 +0000

You're the man! :)


bperrybap

Tue, 15 Oct 2013 01:01:55 +0000

OK, so I know this thread is old, but I'm just getting around to needing fast I/O on the PIC32 for my glcd library, and all the digitalWriteFast()/digitalReadFast() stuff annoys the !@#!@# out of me, not to mention it is a pain to deal with. So let me explain further about not needing to go down the digitalXXXFast() path again.

While I believe that a better, more integrated solution should be supplied with the IDE for an "it just works" out-of-the-box experience, even when implementing faster I/O by using the simple wrapper macros in a header file like several of the FastDigital header files are doing, there is simply no need to create a new/alternate API for the faster I/O. The way C macro expansion works, you can use the exact name for the macro wrapper as the name of the function it is wrapping. This feature is specifically mentioned in the cpp documentation. This allows you to wrap a function without the calling code ever having to know about it. It is one of the many subtle behaviors of cpp that often comes in handy. I use it quite often. So for example, a tweak from the FastDigital.h header works perfectly:

#define digitalWrite(P, V) \
do {								\
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {		\
	if (V) {							\
	    *(_dpin_to_set_macro(P)) = _dpin_to_bitmask_macro(P);	\
	} else {							\
	    *(_dpin_to_clr_macro(P))  = _dpin_to_bitmask_macro(P);	\
	}								\
    } else  digitalWrite((P), (V));					\
}while (0)

Because of the way cpp works, it does not cause macro recursion. In the above example, the calling code will get the macro instead of the function and the macro can then call the real digitalWrite() when necessary. Defined this way instead, the user simply includes the FastDigital header and does not have to modify his code.
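A small host-side demonstration of the same-name trick, with a simulated register. `pin_write`, `sim_lat` and `fn_calls` are hypothetical names; the counter exists only to show which path ran. Note that the function has to be defined before the macro (or its definition would itself be macro-expanded).

```c
#include <stdint.h>

static int fn_calls;               /* times the real function ran */
static volatile uint32_t sim_lat;  /* simulated latch register */

/* The "real" function, standing in for the core digitalWrite(). */
static void pin_write(int pin, int val)
{
    fn_calls++;
    if (val) sim_lat |= (1u << pin);
    else     sim_lat &= ~(1u << pin);
}

/* Wrapper macro with the same name. While a macro is being expanded,
 * its own name is not expanded again, so the else branch below is a
 * plain call to the function defined above. */
#define pin_write(P, V)                                           \
do {                                                              \
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {     \
        if (V) { sim_lat |= (1u << (P)); }                        \
        else   { sim_lat &= ~(1u << (P)); }                       \
    } else {                                                      \
        pin_write((P), (V));                                      \
    }                                                             \
} while (0)
```

A caller writing `pin_write(3, 1);` gets the inline path without any change to its source, while `pin_write(p, 1);` with a run-time `p` still reaches the function.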

--- bill


guymc

Thu, 17 Oct 2013 00:40:58 +0000

This sounds like a great idea. So I asked a few of our run-time engineers to take a closer look.

Turns out there may be complications with PPS mapping on MX1,2 devices, and with pull-ups for Change Notification pins. These concerns might be resolvable, but there is also a third issue. A new I/O architecture is under development that is definitely not compatible with this proposal. In the interest of compatibility across board variants, I don’t think we can implement this macro in the core system.

Nevertheless, thank you for taking the time to write up a clear explanation of how it could work. That's useful info for those of us with limited C preprocessor (cpp) experience.


bperrybap

Thu, 17 Oct 2013 01:53:54 +0000

This sounds like a great idea. So I asked a few of our run-time engineers to take a closer look. Turns out there may be complications with PPS mapping on MX1,2 devices, and with pull-ups for Change Notification pins. These concerns might be resolvable, but there is also a third issue. A new I/O architecture is under development that is definitely not compatible with this proposal. In the interest of compatibility across board variants, I don’t think we can implement this macro in the core system. Nevertheless, thank you for taking the time to write up a clear explanation of how it could work. That's useful info for those of us with limited C preprocessor (cpp) experience.

Huh? I'm following what you are saying. You can wrap CPP macros on top of anything. There should be no compatibility issue with respect to variants; the macros just have to be smart enough to deal with things.

I have a set of macros that are thousands of lines long that I use on AVR to do raw port i/o. They automatically determine if bits are adjacent on ports and can do multi bit i/o.

In my openGLCD library I have layers and layers of macros that make decisions based on all kinds of information, including board types, which in some cases has to be determined by looking at analog to digital pin mappings from other macros down in the variant files. (Arduino IDE does not tell you which board the code is being compiled for)

All of this is done at compile time with help from lots of cpp macros.

The key is macros need to be very high up if not on top of the API rather than down in the middle of a design.

My suggestion is that anybody involved with defining an I/O architecture will need to be VERY proficient at CPP. The reason is that as much as possible needs to be done at compile time rather than run time to get the performance up. There will have to be trade-offs between what can be done at compile time vs run time given the API. But often you gain significant advantages by taking advantage of the capabilities of CPP and using smart macros.

There are also some things that can be done in C++ that can really help, but then you can't use it in regular C code.

For me, one thing that is missing from the existing Arduino I/O API is multi-bit I/O, i.e. the ability to set/read multiple I/O pins with a single API call. This allows the underlying code to optimize the I/O to reduce the number of register operations and dramatically speed up the I/O process.

For an example of what I'm talking about see my avrio header file: http://code.google.com/p/mcu-io/source/browse/trunk/avr/avrio/avrio.h
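As a rough illustration of the multi-bit idea only (this is not how avrio.h does it, which works at compile time with macros), here is a run-time sketch that groups pins by port and issues one store per port and direction. The pin map, `write_pins()` and the simulated SET/CLR registers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS 2
#define NUM_PINS  5

/* Simulated per-port LATxSET/LATxCLR registers and a store counter. */
static volatile uint32_t sim_set[NUM_PORTS], sim_clr[NUM_PORTS];
static int store_count;

/* Hypothetical pin map: which port and bit each pin number lives on. */
typedef struct { uint8_t port; uint8_t bit; } pin_desc_t;
static const pin_desc_t pin_map[NUM_PINS] = {
    {0, 0}, {0, 1}, {0, 2}, {1, 4}, {1, 5}
};

/* Write up to 32 pins in one call; bit i of 'values' is the level
 * for pins[i]. Pins on the same port are merged into a single store. */
void write_pins(const uint8_t *pins, size_t count, uint32_t values)
{
    uint32_t set_mask[NUM_PORTS] = {0};
    uint32_t clr_mask[NUM_PORTS] = {0};

    for (size_t i = 0; i < count && i < 32; i++) {
        if (pins[i] >= NUM_PINS)
            continue;                       /* skip invalid pin numbers */
        const pin_desc_t *d = &pin_map[pins[i]];
        uint32_t m = (uint32_t)1 << d->bit;
        if ((values >> i) & 1u) set_mask[d->port] |= m;
        else                    clr_mask[d->port] |= m;
    }

    for (int p = 0; p < NUM_PORTS; p++) {
        if (set_mask[p]) { sim_set[p] = set_mask[p]; store_count++; }
        if (clr_mask[p]) { sim_clr[p] = clr_mask[p]; store_count++; }
    }
}
```

Writing four pins one at a time would take four register stores; here the three pins that share a port cost two stores (one SET, one CLR) and the fourth pin costs one.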

--- bill


ricklon

Mon, 21 Oct 2013 22:15:55 +0000

If anyone would like to combine this into a 3rd party library on github. I could include it in the next build of MPIDE.

Let me know if there is an issue getting the code into Github. I can help with that.

-Rick


bperrybap

Mon, 21 Oct 2013 23:41:22 +0000

Were you referring to DigitalFast, or the avrio stuff I've done?

When is the next build?

In the big picture, I still believe that the digitalXXXfast() API is the wrong way to go. I strongly believe that it should be transparent to the actual user sketch code rather than being a new API.

If it were to be handled as a 3rd party library, then I believe that the user simply includes a header file, and his existing code with no modifications, continues to work, using the existing digitalWrite()/digitalRead() API, but the code gets faster if certain parameters are constants.

i.e. the header file just creates wrapper macros that sit on top of the existing API as I mentioned earlier.

The reason I ask about timing for the next release, is that while I'm willing to do it, I've still got a bunch of stuff on my plate to get my openGLCD library pushed out.

As far as actual implementation goes, there are some open items, like: does the implementation have to work for both C and C++?

Also, it would get much easier to do and maintain if there could be a slightly different declaration in the Board_Data.c files to allow turning on/off a static declaration so that the code could pull the data from the actual IDE supplied pin data tables vs having to create and maintain parallel tables in macros.

And if we are talking about going that far for integration, why not just include it in the core code and be done with it, so all sketches benefit from the additional performance when parameters allow it?

--- bill