chipKIT® Development Platform

Inspired by Arduino™

New to ChipKit/MPIDE development, some questions

Created Thu, 14 Jul 2016 10:16:27 +0000 by physecfed


physecfed

Thu, 14 Jul 2016 10:16:27 +0000

Hey all,

I'm fairly new to the ChipKit boards and so I had a bunch of questions regarding them, particularly the WiFire if there are any differences between them (that's the one I have right now). I've done plenty of Arduino dev work, albeit not for a couple years, and I've certainly done enough embedded systems stuff, but I'm green as grass to this particular ecosystem. So, here go the questions.

  1. In general, is the MPIDE development environment for PIC32MX/MZ targets equivalent to that of AVRs, such as the Arduino? That is, do the pin functions such as pinMode generally work the same, taking board number macros as arguments? Are there any headers I need to include, or are these implicitly included/linked by the IDE at compile-time? (I write most of my code in external editors)

  2. Do "bare-metal" I/O routines work in MPIDE, such as flushing a byte or a word to an entire port (e.g. I/O port B) at once? I'm working on a project that involves porting over some parallel interface code, and that code can be optimized by making the data lines contiguous, provided MPIDE provides facilities for port-level, rather than pin-level, I/O. Does MPIDE support reading and writing the TRISx/PORTx/LATx register macros directly?

  3. Does the peripheral pin select (PPS) functionality have to be tinkered with much? For example, in order to enable UART1, do I have to enable the UART functionality on RD14/RD15 or does the appropriate Serial object constructor handle that "behind the scenes"? I'm a bit used to MPLAB in that respect.

  4. Does interrupt handler attachment and functionality generally work the same as in Arduino environments (that is, via attachInterrupt(), attachment to a void, no-args handler)?

  5. What is the WiFire's (MZ) floating-point performance generally like? I'm working with some code sections that involve heavy use of long double and want to ensure that it won't sap down the core too much.

Also, if an admin/moderator could PM me about some account concerns, I'd greatly appreciate it.

Best regards, physecfed


majenko

Thu, 14 Jul 2016 11:51:24 +0000

  1. In general, is the MPIDE development environment for PIC32MX/MZ targets equivalent to that of AVRs, such as the Arduino? That is, do the pin functions such as pinMode generally work the same, taking board number macros as arguments? Are there any headers I need to include, or are these implicitly included/linked by the IDE at compile-time? (I write most of my code in external editors)

The chipKIT environment is (mostly) API compatible with Arduino. Everything from pinMode() through analogRead(), analogWrite(), pulseIn(), Serial.println(), etc. works exactly the same. Many Arduino libraries will just work directly. SPI.h, Wire.h, etc., are all provided and work just the same.

  2. Do "bare-metal" I/O routines work in MPIDE, such as flushing a byte or a word to an entire port (e.g. I/O port B) at once? I'm working on a project that involves porting over some parallel interface code, and that code can be optimized by making the data lines contiguous, provided MPIDE provides facilities for port-level, rather than pin-level, I/O. Does MPIDE support reading and writing the TRISx/PORTx/LATx register macros directly?

Yes, they do. You have a full PIC32 C++ compiler and headers which give you full access to all the registers. It's basically the open-source portion of XC32 v1.40.

  3. Does the peripheral pin select (PPS) functionality have to be tinkered with much? For example, in order to enable UART1, do I have to enable the UART functionality on RD14/RD15 or does the appropriate Serial object constructor handle that "behind the scenes"? I'm a bit used to MPLAB in that respect.

That is all handled behind the scenes. The pin assignments are all defined in a set of board files (Board_Defs.h and Board_Data.c) and the Serial code (and Wire, and SPI, etc) assign those pins to the right places using PPS. You don't have to do anything - unless you want to change the pin assignments to something different.

  4. Does interrupt handler attachment and functionality generally work the same as in Arduino environments (that is, via attachInterrupt(), attachment to a void, no-args handler)?

In general, yes. The only caveat is that the PIC32 only understands RISING and FALLING. There's no CHANGE or LOW support in the chip. There has been an attempt at providing LOW emulation, but I don't know if that has made it into the publicly distributed core yet.

  5. What is the WiFire's (MZ) floating-point performance generally like? I'm working with some code sections that involve heavy use of long double and want to ensure that it won't sap down the core too much.

If you have a Rev C and choose the right board in the IDE then the FPU is used. If you have a Rev B, or a Rev C but don't choose the right board, then the FPU won't be used.

I use the FPU for some complex 3D graphical transformations and calculations. It speeds my program up about 5x. Depending on just what you are doing you may get more than 5x (some people have reported up to 10x greater performance) or less.

Here is a video comparing with FPU and without FPU:

https://youtu.be/qY1Mq8s5YH8

Left is without FPU, right is with FPU.


physecfed

Thu, 14 Jul 2016 23:08:19 +0000

The chipKIT environment is (mostly) API compatible with Arduino. Everything from pinMode() through analogRead(), analogWrite(), pulseIn(), Serial.println(), etc. works exactly the same. Many Arduino libraries will just work directly. SPI.h, Wire.h, etc., are all provided and work just the same.

Okay. So the differences between the PIC32 and AVR are for the most part abstracted into the existing Arduino functions. Do the timers also follow the same pattern, or will I need to write out a set of library routines to access them?

Yes, they do. You have a full PIC32 C++ compiler and headers which give you full access to all the registers. It's basically the open-source portion of XC32 v1.40.

Now, for the fun part. Reading the datasheet and manuals, those registers are 32 bits wide, with only the lower 16 bits being used, and the "available" bits are based on which pins of that port happen to be bonded out.

Are there any special precautions that need to be taken (in regards to casting, endianness or manipulating byte-wide data) in order to write to or read from the port? Would this code work?

void wrioregb (uint8_t byte) {
    TRISB &= (uint32_t) 0x00;
    LATB  |= (uint32_t) byte;
}

Would the cast to an explicitly-32-bit data type even be necessary, or would the port macros handle it appropriately if the byte was written to them directly? I might be a bit too worried about this because I've been dealing with systems where endianness/alignment is an issue, and over the past few years I've been writing way too much Ada (where types have an intrinsic hatred for one another) for the benefit of my C skills.

In general, yes. The only caveat is that the PIC32 only understands RISING and FALLING. There's no CHANGE or LOW support in the chip. There has been an attempt at providing LOW emulation, but I don't know if that has made it into the publicly distributed core yet.

Ah. RISING/FALLING seem to be all that I need, at least for the time being. I'm working on some embedded GPS applications and I just want to be able to use the PPS output (pulses high for 50-100ms every second) for some interval routines. That should also explain the interest in the FPU - I'm playing around a little with Vincenty's formulae (calculating the surface distance between two GPS coordinates) and it involves a pretty heavy set of floating-point routines.

I've been trying to strip it down in order to get it into the ballpark of accuracy where it could be relied upon to run within one second, or preferably half a second.

Is the MZ on the Wi-Fire configured by default to work at top speed (200 MHz) or will I need to configure that myself?

physecfed


majenko

Thu, 14 Jul 2016 23:19:56 +0000

Okay. So the differences between the PIC32 and AVR are for the most part abstracted into the existing Arduino functions. Do the timers also follow the same pattern, or will I need to write out a set of library routines to access them?

Timers are very different. They aren't part of the Arduino API and are only available through external libraries. I made a library a while back to handle timers on the PIC32:

  • https://github.com/MajenkoLibraries/Timer

Now, for the fun part. Reading the datasheet and manuals, those registers are 32 bits wide, with only the lower 16 bits being used, and the "available" bits are based on which pins of that port happen to be bonded out. Are there any special precautions that need to be taken (in regards to casting, endianness or manipulating byte-wide data) in order to write to or read from the port? Would this code work?

void wrioregb (uint8_t byte) {
    TRISB &= (uint32_t) 0x00;
    LATB  |= (uint32_t) byte;
}

Would the cast to an explicitly-32-bit data type even be necessary, or would the port macros handle it appropriately if the byte was written to them directly? I might be a bit too worried about this because I've been dealing with systems where endianness/alignment is an issue, and over the past few years I've been writing way too much Ada (where types have an intrinsic hatred for one another) for the benefit of my C skills.

No need to cast anything at all. By the way, there are handy "SET" and "CLR" register variants for speeding your code up.

LATBSET = myByte;

is the same as:

LATB |= myByte;

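but is far more efficient: it is a single atomic write rather than a read-modify-write sequence. By the way, "byte" is already taken in the Arduino API (it is a typedef for "uint8_t"), so it is best avoided as a variable name.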
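One thing worth spelling out about the wrioregb() example: an OR-only write (whether LATB |= x or LATBSET = x) can only turn bits on, so whatever was on the port before is not fully replaced. Below is a minimal host-side sketch of the SET/CLR behaviour, assuming the data lines sit on the low 8 bits of the port. The port_model_t struct and the function names are hypothetical stand-ins so it can run on a PC; on the board the equivalent writes would go to LATBCLR and LATBSET.

```c
#include <stdint.h>

/* Host-side stand-in for one port latch (hypothetical; on the board this
   would be the real LATB register from the device headers). */
typedef struct {
    uint32_t lat;
} port_model_t;

/* Model of LATxSET: every 1 bit in mask sets that latch bit. */
void lat_set(port_model_t *p, uint32_t mask) { p->lat |= mask; }

/* Model of LATxCLR: every 1 bit in mask clears that latch bit. */
void lat_clr(port_model_t *p, uint32_t mask) { p->lat &= ~mask; }

/* Put value on the low 8 pins and leave the other pins untouched. */
void write_low_byte(port_model_t *p, uint8_t value)
{
    lat_clr(p, 0xFFu);   /* on hardware: LATBCLR = 0xFF;  */
    lat_set(p, value);   /* on hardware: LATBSET = value; */
}

/* OR-only write, as in wrioregb(): old 1 bits survive. */
uint32_t demo_or_only(void)
{
    port_model_t p = { 0xA5F0u };
    lat_set(&p, 0x3Cu);
    return p.lat;        /* 0xA5FC, not the intended 0xA53C */
}

/* Clear-then-set write: the low byte is fully replaced. */
uint32_t demo_clr_then_set(void)
{
    port_model_t p = { 0xA5F0u };
    write_low_byte(&p, 0x3Cu);
    return p.lat;        /* 0xA53C */
}
```

Note that with clear-then-set the low 8 pins are briefly all low between the two writes. If that glitch matters on a parallel bus, a single LATB = (LATB & ~0xFF) | value; avoids it, at the cost of being a non-atomic read-modify-write.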

Ah. RISING/FALLING seem to be all that I need, at least for the time being. I'm working on some embedded GPS applications and I just want to be able to use the PPS output (pulses high for 50-100ms every second) for some interval routines. That should also explain the interest in the FPU - I'm playing around a little with Vincenty's formulae (calculating the surface distance between two GPS coordinates) and it involves a pretty heavy set of floating-point routines. I've been trying to strip it down in order to get it into the ballpark of accuracy where it could be relied upon to run within one second, or preferably half a second. Is the MZ on the Wi-Fire configured by default to work at top speed (200 MHz) or will I need to configure that myself? physecfed

Yes, it's 200MHz by default.


physecfed

Fri, 15 Jul 2016 00:44:38 +0000

Ah, okay. Final question -

I've done some digging in the past and come up fairly dry, but is there a good reference to the instructions (the part of the MIPS ISA) supported by the PIC32 core as well as their timings?

I'm asking because there is some code that I've tinkered with in the past (and that I'd be interested in running again on a platform with more power) that has very specific timings in place. So, I'm looking to burn clock cycles with NOPs, e.g.

asm volatile ("nop\n\t"
              "nop");

In order to do this, I need to know how many cycles a NOP will execute in on the PIC32 MIPS core.


majenko

Fri, 15 Jul 2016 09:51:57 +0000

In the CPU manual (http://ww1.microchip.com/downloads/en/DeviceDoc/61113E.pdf) there is a list of related MIPS documents. One of them is the MIPS32 instruction set:

  • https://imagination-technologies-cloudfront-assets.s3.amazonaws.com/documentation/MD00086-2B-MIPS32BIS-AFP-06.03.pdf

That details every instruction available to you in great depth. However, it doesn't give the number of cycles per instruction, simply because MIPS is a pipelined architecture. It's effectively doing up to 5 different things on 5 different instructions at once, so the timing of one instruction can be affected by the timing of another one higher up in the pipeline. You can't say "This instruction takes this number of clock cycles" because it's far more complex than that.

The M4K Processor Core document from MIPS details the pipeline (and all the rest of the internals of the CPU):

  • https://imagination-technologies-cloudfront-assets.s3.amazonaws.com/documentation/MD00249-2B-M4K-SUM-02.03.pdf

That said, if there are 5 NOPs in the pipeline then each one will take one clock cycle. In general a single-word (32-bit) instruction will require 5 clock cycles to make its way through the 5-stage pipeline, but it can complete one for each clock cycle since there are 5 in the pipeline at any one time.
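To put numbers on that, here is a minimal sketch of the arithmetic for an idealised pipeline with no stalls: the first instruction needs one clock per stage to get through, and each later one completes one clock after the one before it. The function name is made up for illustration, and real code will see interlocks that this ignores.

```c
#include <stdint.h>

/* Total clocks for a run of instructions through an ideal pipeline:
   the first instruction takes `stages` clocks, every following one
   completes one clock later. Assumes no stalls or interlocks. */
uint32_t pipeline_cycles(uint32_t instructions, uint32_t stages)
{
    if (instructions == 0u) {
        return 0u;
    }
    return stages + instructions - 1u;
}
```

So one NOP on its own takes 5 clocks end to end, 5 NOPs take 9, and 1000 NOPs take 1004: in a long run each extra NOP costs one clock.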

The biggest culprit for "flexible" timing is MUL or DIV. These instructions get farmed out asynchronously to the multiplier/divide unit (MDU), and if the result is requested (with MFLO / MFHI) before it is ready the pipeline will stall until the MDU is finished working on the problem. I guess the same (though I haven't checked) will be true of the FPU - if you request the result before it's ready the pipeline will stall until it is. So any NOPs following in the pipeline will take as long as it takes for the MUL, etc, to complete.

Another critical thing to watch with a pipeline is when you get a branch instruction, which needs special handling, since what is following it in the pipeline is destined to be executed (or is already being executed) regardless of the results of the branch. This is called the "Branch Delay Slot" and is the instruction immediately following a branch operation. It is common to put a NOP in the branch delay slot to negate this, but it can be used to execute one last instruction before the branch takes effect, as long as you don't expect that operation to affect the outcome of the branch, since that has already been decided at that point.

Section 2.10 of the M4K Processor Core document mentions instruction timing:

Most instructions can be issued at a rate of one per clock cycle. In order to adhere to the sequential programming model, the issue of an instruction must sometimes be delayed. This [is] to ensure that the result of a prior instruction is available. Table 2.5 details the instruction interactions that prevent an instruction from advancing in the processor pipeline.

There then follows a table of instruction combinations that will have an "interlock" delay and how many clock cycles that delay will be. For instance, any instruction that performs a load, followed by any instruction that would consume the results of that load, has a single clock cycle delay added to allow the result of the load to be available to the consumer instruction.

For that reason instruction ordering can be critical to efficient software design. For instance it is far more efficient to:

Load A
Load B
Use A
Use B

rather than the more logical:

Load A
[interlock]
Use A
Load B
[interlock]
Use B

since the CPU could be loading B instead of interlocking waiting for Load A to finish before being able to Use A.
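The difference between the two orderings can be counted with a toy model. This is only a sketch under one assumption taken from the description above: an instruction that uses the result of the load immediately before it costs one extra clock, and everything else issues at one per clock. The types and function names are invented for the example.

```c
#include <stddef.h>

typedef enum { OP_LOAD, OP_USE } op_kind_t;

typedef struct {
    op_kind_t kind;
    char      reg;   /* value written by a load or read by a use */
} op_t;

/* One clock per instruction, plus one interlock clock whenever an
   instruction uses the value loaded by the instruction right before it. */
unsigned count_cycles(const op_t *ops, size_t n)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && ops[i].kind == OP_USE &&
            ops[i - 1].kind == OP_LOAD && ops[i - 1].reg == ops[i].reg) {
            cycles++;            /* load-use interlock */
        }
        cycles++;                /* the instruction itself */
    }
    return cycles;
}

/* Load A, Load B, Use A, Use B */
unsigned cycles_interleaved(void)
{
    const op_t ops[] = {
        { OP_LOAD, 'A' }, { OP_LOAD, 'B' }, { OP_USE, 'A' }, { OP_USE, 'B' }
    };
    return count_cycles(ops, sizeof ops / sizeof ops[0]);
}

/* Load A, Use A, Load B, Use B */
unsigned cycles_naive(void)
{
    const op_t ops[] = {
        { OP_LOAD, 'A' }, { OP_USE, 'A' }, { OP_LOAD, 'B' }, { OP_USE, 'B' }
    };
    return count_cycles(ops, sizeof ops / sizeof ops[0]);
}
```

In this model the interleaved order issues in 4 clocks and the "logical" order in 6, which matches the two listings above.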

Another thing of note with the M4K core is the "Pipeline Bypass" feature. This allows for the result of one instruction to be fed directly to the input of the next instruction without waiting for it to be written back to the register file first. That removes many of the stalls a dependent instruction would otherwise suffer, so back-to-back dependent instructions can usually still issue at one per clock. It's part of how a 200MHz processor can achieve 330 DMIPS. (DMIPS is a benchmark score relative to a reference machine rather than a count of native instructions, so a figure above the clock rate doesn't mean an instruction completes in less than one clock.) Clever, eh?

So as you can see, it's impossible to say for sure "This instruction will take X clock cycles to execute" - the closest you can say is "You can usually issue one instruction per clock cycle, but not always, and how many clock cycles that instruction will take to complete depends on what has gone before."
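Given that, one alternative to counting NOPs is to busy-wait on the CPU's free-running core timer (the CP0 Count register) instead. The sketch below covers only the arithmetic, so it runs on a PC: it assumes the core timer advances once every two CPU clocks, and the function names are hypothetical. How the counter is actually read on the board is left out.

```c
#include <stdint.h>

/* Convert a delay in nanoseconds to core timer ticks.
   Assumption: the core timer counts at half the CPU clock.
   Rounds up so the delay is never shorter than requested. */
uint32_t ns_to_core_ticks(uint32_t ns, uint32_t cpu_hz)
{
    uint64_t ticks_per_sec = cpu_hz / 2u;
    return (uint32_t)(((uint64_t)ns * ticks_per_sec + 999999999ull)
                      / 1000000000ull);
}

/* Has `wait` ticks passed since `start`? Unsigned subtraction keeps
   this correct when the 32-bit counter wraps around. */
int ticks_elapsed(uint32_t start, uint32_t now, uint32_t wait)
{
    return (uint32_t)(now - start) >= wait;
}
```

At 200MHz, a 1 microsecond delay works out to 100 ticks. The resolution is two CPU clocks, so this suits delays of tens of nanoseconds and up rather than single-cycle padding.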