chipKIT issues on Linux Mint 13

bperrybap
Posts: 48
Joined: Sat Nov 19, 2011 8:45 pm

Post by bperrybap » Mon Jun 13, 2016 1:11 am

I looked here for the issue:
https://github.com/sergev/pic32prog/issues
but didn't see a new issue.
Is the repo somewhere else?

It seems odd that it is a DTR handling issue since I see the board always getting reset.
The failure seems to occur when the FTDI chip has received a bunch of bytes before pic32prog is run.

For the time being, I've got a wrapper script that retries if the pic32prog upload fails. That works because the first upload attempt resets the board, and the second attempt starts before the sketch has had time to push out any serial data, since the bootloader is so slow at starting up the application code.
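
The wrapper script itself isn't shown. A minimal C++ sketch of the same retry idea, assuming the upload step can be wrapped in a callable that returns 0 on success (for example something that runs pic32prog via std::system; the real command line isn't in the post, so `upload` is a hypothetical stand-in):

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper mirroring the wrapper-script idea: run an upload step
// up to maxAttempts times and stop at the first success (status 0).
// `upload` stands in for running pic32prog; its command line is not given.
int runWithRetry(const std::function<int()>& upload, int maxAttempts) {
    assert(maxAttempts > 0);
    int status = -1;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        status = upload();
        if (status == 0) {
            break;  // upload succeeded, no need to retry
        }
        // Per the observation above, the next attempt reaches the slow
        // bootloader before the sketch starts sending serial data.
    }
    return status;  // status of the last attempt made
}
```

Keeping the upload step as a callable separates the retry policy from how pic32prog is invoked, so the policy can be tried without a board attached.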



--- bill

Post by bperrybap » Mon Jun 13, 2016 6:25 am

So I tracked down the I2C issue.
Note: this has nothing to do with Mint 13, but since I mentioned it earlier, I thought I'd do a follow-up.

After many hours with a logic analyzer, here is what is happening.
MPIDE 0023 works.
MPIDE 0150 and the latest chipKIT core both have an issue, but they break slightly differently.
(All three have different code in this area, with 0023 using entirely different low-level TwoWire code.)

It turns out that if you call Wire.beginTransmission(address), then Wire.endTransmission(), and then Wire.beginTransmission(address) again with no delay, things screw up.
You do this when probing the I2C bus for devices: endTransmission() returns the ACK/NACK status, so you can tell whether there is a device at that address.
From experimentation, there needs to be a delay of at least 20us between the endTransmission() and the next beginTransmission().
The 0150 MPIDE Wire code seems to get stuck sending out the same address over and over again.
The latest Wire code in the Arduino core checks the bus status in beginTransmission() and then aborts, thinking someone else has the bus.
I don't believe this is a slew rate issue, since when doing back-to-back data transfers there is only 17us between each byte.
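
The probe loop with the empirical 20us gap can be sketched as follows. To keep it compilable off-target, the Wire calls go through a hypothetical `ProbeBus` interface; on a board its methods would map onto Wire.beginTransmission(), Wire.endTransmission() and delayMicroseconds(). Treating a 0 return from endTransmission() as "device ACKed" follows the usual Arduino Wire convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Wire API so the probe logic builds on a host.
struct ProbeBus {
    virtual void beginTransmission(std::uint8_t address) = 0;
    virtual std::uint8_t endTransmission() = 0;  // 0 == address was ACKed
    virtual void delayMicroseconds(unsigned us) = 0;
    virtual ~ProbeBus() = default;
};

// Scan [first, last] and return the addresses that ACKed. The fixed gap after
// each endTransmission() is the workaround found by experiment (>= 20us).
std::vector<std::uint8_t> probeBus(ProbeBus& bus, std::uint8_t first,
                                   std::uint8_t last, unsigned gapMicros = 20) {
    assert(first <= last);
    std::vector<std::uint8_t> found;
    for (unsigned addr = first; addr <= static_cast<unsigned>(last); ++addr) {
        bus.beginTransmission(static_cast<std::uint8_t>(addr));
        if (bus.endTransmission() == 0) {
            found.push_back(static_cast<std::uint8_t>(addr));
        }
        bus.delayMicroseconds(gapMicros);  // gap before the next START
    }
    return found;
}
```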

There is nothing in the I2C spec that mandates any sort of delay here, so I'm not sure why the delay is needed.
Maybe the PIC32 I2C hardware is trying to give other masters more time to jump in?
In any case, it isn't obvious that a delay is needed here, as no other cores require one.

I'm not sure if this is really a bug or a PIC32 I2C h/w limitation that needs to be documented.

--- bill

majenko
Site Admin
Posts: 2165
Joined: Wed Nov 09, 2011 7:51 pm
Location: UK

Post by majenko » Mon Jun 13, 2016 9:49 am

Which board are you using for this? It may be something specific in the silicon of the chip on that board. I have a large range of boards here that I can test it on to see if it's something in the core, or something in the chip.

Also, what I2C device(s) do you have connected up? It may be something on the bus that's interfering in some way and holding the clock line low for a short period when it shouldn't, or causing noise on the data line making the master think there's a collision.
Why not visit my shop? http://majenko.co.uk/catalog
Universal IDE: http://uecide.org
"I was trying to find out if it was possible to only eat one Jaffa Cake. I had to abandon the experiment because I ran out of Jaffa Cakes".

Post by majenko » Mon Jun 13, 2016 9:53 am

bperrybap wrote:
I looked here for the issue:
https://github.com/sergev/pic32prog/issues
but didn't see a new issue.
Is the repo somewhere else?

It seems odd that it is a DTR handling issue since I see the board always getting reset.

It's a pull request, not an issue: https://github.com/sergev/pic32prog/pull/43

Simply put, when a new baud rate was being selected, the current terminal settings were being saved. They shouldn't have been. The normal course of action is: open the serial port, save the termios data somewhere, put your own settings in place, do whatever you want on the serial line, then put back the settings you saved and close the port. When an alternate baud rate was selected, the current settings were saved again. That overwrote the real saved settings with the current custom settings, so when the port was closed the wrong settings were put back, leaving the port in a bad state.
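
The save-once discipline can be sketched with POSIX termios. `SerialPort` is an illustrative name, not pic32prog's code; the point is that setBaud() changes the live settings but never touches the saved copy, which is what gets restored on close.

```cpp
#include <cassert>
#include <cstdlib>    // posix_openpt() etc., only for trying this on a pty
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// Minimal sketch: open, save once, apply custom settings, restore, close.
class SerialPort {
public:
    explicit SerialPort(const char* path) {
        assert(path != nullptr);
        fd_ = open(path, O_RDWR | O_NOCTTY);
        // Save the original settings exactly once, right after opening.
        if (fd_ >= 0 && tcgetattr(fd_, &saved_) == 0) {
            haveSaved_ = true;
        }
    }
    ~SerialPort() { closePort(); }
    SerialPort(const SerialPort&) = delete;
    SerialPort& operator=(const SerialPort&) = delete;

    bool isOpen() const { return fd_ >= 0 && haveSaved_; }

    // Change the baud rate on the *current* settings. saved_ is deliberately
    // left alone: re-saving here is the bug, since it would replace the
    // original settings with our custom ones.
    bool setBaud(speed_t speed) {
        if (!isOpen()) return false;
        termios t{};
        if (tcgetattr(fd_, &t) != 0) return false;
        if (cfsetispeed(&t, speed) != 0 || cfsetospeed(&t, speed) != 0) return false;
        return tcsetattr(fd_, TCSANOW, &t) == 0;
    }

    // Put back the settings captured at open time, then close the descriptor.
    bool closePort() {
        if (fd_ < 0) return false;
        const bool restored = haveSaved_ && tcsetattr(fd_, TCSANOW, &saved_) == 0;
        close(fd_);
        fd_ = -1;
        return restored;
    }

private:
    int fd_ = -1;
    termios saved_{};
    bool haveSaved_ = false;
};
```

On Linux this can be exercised without hardware by creating a pseudo-terminal with posix_openpt()/grantpt()/unlockpt()/ptsname() and passing the slave path to the constructor; that is the only reason <cstdlib> is included.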

Opening the port without DTR being asserted cleared out the FTDI's buffers and let the LEDs blink again briefly, giving the appearance of a reset. I confirmed on my oscilloscope that it wasn't resetting: there was no low blip on the reset line after the first use of pic32prog.

Post by bperrybap » Mon Jun 13, 2016 5:38 pm

majenko wrote:
Which board are you using for this? It may be something specific in the silicon of the chip on that board. I have a large range of boards here that I can test it on to see if it's something in the core, or something in the chip.

Also, what I2C device(s) do you have connected up? It may be something on the bus that's interfering in some way and holding the clock line low for a short period when it shouldn't, or causing noise on the data line making the master think there's a collision.

I'm using a Uno32, serial # D402405, purchased in Dec 2011.
I'm testing with PCF8574 and MCP23008 devices.
But for what is going on here, they aren't really involved on the bus.

I didn't see anything out of place on the clock or data lines.
See the attached analyzer shots with and without the delay.
The SCL clock is only 100 kHz and the analyzer sample rate is 24 MHz.
You can see the time between the STOP and the next START.

The code running is attempting to discover devices by probing addresses, starting at 0x20 and incrementing by 1.
Without the delay you will see what looks like addresses being skipped, but in fact the code in beginTransmission() is rejecting requests for a period of time, so the requests for those addresses are silently dropped on the floor, since the function returns void.
The other code has a 100us delay after the endTransmission() before the next beginTransmission().

Different versions of the code handle beginTransmission() differently;
here is the latest beginTransmission():

Code:

void TwoWire::beginTransmission(uint8_t address)
{
    DTWI::I2C_STATUS i2cStatus = di2c.getStatus();

    // if someone else has the bus, then we won't get it; get out
    if(i2cStatus.fBusInUse  && !i2cStatus.fMyBus)
    {
        return;
    }
   
    // we only want to loop on this with a repeated start
    // otherwise it will pass on the first try 
    while(!di2c.startMasterWrite(address) && di2c.getStatus().fMyBus);
}

What is happening is that, for about 20us after a previous endTransmission(), that if statement aborts the master start setup by returning.
I'm not sure why, but from a larger perspective it seems bad to have a function that can behave in two different ways, one of them a fatal error, without returning any status.
Given that the Arduino version returns void, having the chipKIT version return a status would create a portability issue.
Wouldn't it be better to spin here, either forever or perhaps with a timeout, so that it "just works" rather than silently aborting/failing?
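
The "spin with a timeout" alternative could look roughly like this. `BusStatus` and the `getStatus` callback are hypothetical stand-ins for DTWI::I2C_STATUS and di2c.getStatus(), and std::chrono is used only so the sketch compiles on a host; on the board a micros()-based deadline would play the same role.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical snapshot of the two flags tested in beginTransmission().
struct BusStatus {
    bool fBusInUse;  // a START has been seen on the bus
    bool fMyBus;     // ...and this driver owns the transfer
};

// Poll until the bus is free (or already ours) instead of giving up on the
// first busy reading. Returns false if it is still held when the timeout
// expires, so the caller can surface an error rather than fail silently.
bool waitForBus(const std::function<BusStatus()>& getStatus,
                std::chrono::microseconds timeout) {
    assert(timeout.count() >= 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + timeout;
    for (;;) {
        const BusStatus s = getStatus();
        if (!(s.fBusInUse && !s.fMyBus)) {
            return true;   // inverse of the condition the quoted code bails out on
        }
        if (Clock::now() >= deadline) {
            return false;  // still busy after the timeout
        }
    }
}
```

Since the Arduino beginTransmission() returns void, a port of this would presumably have to remember a timeout and report it from endTransmission(); that is a design assumption, not something settled in the thread.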

--- bill
Attachments
Screenshot-Saleae Logic 1.1.15 - [Connected] - [24 MHz, 2 M Samples].png: with added 100us delay
Screenshot-Saleae Logic 1.1.15 - [Connected] - [24 MHz, 2 M Samples]-1.png: without delay

Post by majenko » Mon Jun 13, 2016 6:19 pm

I guess we need to narrow down which of those two terms is causing the negative result: is it that it thinks the bus is in use (fBusInUse), or that it's not the owner of the bus (fMyBus)?

The former is simply the state of the S bit in the I2CxSTAT register, a flag that says whether a start bit has been seen. The latter is slightly more complex:

Code:

fMyBus          = fI2COn && !fBusError && fBusInUse && curStateFreeze != I2C_IDLE;

fBusError is true if curStateFreeze (whatever that is) is equal to I2C_BUS_ERROR, or if the BCL bit in the I2CxSTAT register is set (BCL is Bus CoLlision). So the module has to be turned on, not in error, in use, and not idle for the bus to be considered "mine". If it turns out it thinks the bus is not "mine", we'd then need to dig deeper to see which of those four terms is saying the wrong thing.

So I'd start with splitting that "if" into two separate ones and performing some form of debugging (Serial print, light an LED, whatever) to see which of them is triggering it.
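
Splitting the combined condition amounts to checking each term of fMyBus on its own. A host-side model of that check follows; the enum values and the `BusFlags` fields are assumptions for illustration (I2C_IDLE == 0 is chosen to match the debug output later in the thread, where curStateFreeze:0 goes with curStateFreeze!=I2C_IDLE:0).

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the terms quoted above; enum values are assumptions.
enum I2CState { I2C_IDLE = 0, I2C_BUSY = 1, I2C_BUS_ERROR = 2 };

struct BusFlags {
    bool fI2COn;              // I2C module enabled
    bool fBCL;                // I2CxSTAT BCL bit (bus collision)
    bool fBusInUse;           // I2CxSTAT S bit (START seen)
    I2CState curStateFreeze;  // snapshot of the driver's state machine
};

bool busError(const BusFlags& f) {
    return f.curStateFreeze == I2C_BUS_ERROR || f.fBCL;
}

bool myBus(const BusFlags& f) {
    return f.fI2COn && !busError(f) && f.fBusInUse && f.curStateFreeze != I2C_IDLE;
}

// Same logic as myBus(), split into one test per term so the failing term
// can be printed or signalled (Serial, LED, ...) on its own.
std::string whyNotMyBus(const BusFlags& f) {
    if (!f.fI2COn) return "I2C module off";
    if (busError(f)) return "bus error";
    if (!f.fBusInUse) return "no START seen";
    if (f.curStateFreeze == I2C_IDLE) return "state machine idle";
    assert(myBus(f));  // every term passed, so the combined test must agree
    return "";
}
```

Fed the flags bperrybap reports later (fI2COn:1, fBusInUse:1, fBusError:0, curStateFreeze:0), this model points at the idle-state term.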

Post by bperrybap » Mon Jun 13, 2016 6:25 pm

In the I2C spec:
http://www.nxp.com/documents/user_manua ... f#G1659003 (table 10, page 48)
the minimum bus free time tBUF between a STOP and a START for Standard mode (100 kHz) is only 4.7us,
yet you can insert an additional delay of 15us and the Wire code will still have the issue.

--- bill

Post by majenko » Mon Jun 13, 2016 7:11 pm

OK, using a WiFire Rev C (the newest one, with the fewest silicon bugs), I am seeing exactly what you describe.

Code:

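#include <Wire.h>

void setup() {
    Wire.begin();                  // join the I2C bus as master
}

void loop() {
    // Probe addresses 20..59 (decimal) back to back, with no delay
    // between endTransmission() and the next beginTransmission().
    for (int i = 20; i < 60; i++) {
        Wire.beginTransmission(i);
        Wire.endTransmission();    // ACK/NACK result ignored in this test
    }
}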

It skips and skips and skips, to the extent that the addresses appear almost random. Add a 20µs delay at the very beginning of beginTransmission() and you get perfectly incrementing addresses.

So if it is something in the silicon then it's something that has been in the silicon since way back when and has never been either spotted or fixed in the numerous copy-and-pastings of that logic block in the designs since then.

So now to debug...

Post by bperrybap » Mon Jun 13, 2016 7:30 pm

From the comment, it looks like the code is trying to work around some h/w issues.

So here is the state of the flags when it occurs:
TwoWire begin abort: curStateFreeze:0, fBusInUse:1, fBusError:0, fI2COn:1, fMyBus:0, curStateFreeze!=I2C_IDLE:0

I don't understand enough about the h/w and how the state machine works (without spending more time than I want to) to be able to comment much further.

The older MPIDE Wire code didn't have this issue, but then it was different code and didn't do all these checks, so maybe multi-master didn't work with that older code.

--- bill

Post by majenko » Mon Jun 13, 2016 7:32 pm

OK, I think I see what is happening here actually...

It's not a case of delaying between transfers, but actually a case of delaying during transfers. I think it's failing because the new beginTransmission() is being called whilst the previous transfer is still in progress. The state I2C_IDLE is set in the interrupt routine when the .P flag is set (STOP condition detected). Since that's interrupt driven, it's asynchronous, and nothing blocks waiting for the previous transfer to finish before the next one starts.

So I think the problem is actually going to be in endTransmission(), which isn't properly detecting that the transmission is still in progress and waiting for it to finish.
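
The suspected race can be illustrated with a toy model: a hypothetical `SimI2C` whose simulated interrupt routine only reports idle a few ticks after a STOP is queued. The latency is an arbitrary parameter, not a measured value, and the model simply skips end() when begin() is rejected, which the real sketch does not do.

```cpp
#include <cassert>
#include <vector>

// Toy, host-side model of the suspected race (not the chipKIT code). After
// end() queues a STOP, the simulated ISR reports idle stopLatency ticks later.
class SimI2C {
public:
    explicit SimI2C(int stopLatency) : stopLatency_(stopLatency) {
        assert(stopLatency >= 0);
    }

    bool idle() const { return ticksUntilIdle_ == 0; }

    // One unit of time passes; the "ISR" makes progress on a pending STOP.
    void tick() {
        if (ticksUntilIdle_ > 0) --ticksUntilIdle_;
    }

    // Like the quoted beginTransmission(): refuses (returns false here) if
    // the previous transfer has not reached idle yet.
    bool begin() const { return idle(); }

    // Queue a STOP. With waitForStop == true, block until the ISR reports
    // idle, which is what the post suggests endTransmission() should do.
    void end(bool waitForStop) {
        ticksUntilIdle_ = stopLatency_;
        while (waitForStop && !idle()) tick();
    }

private:
    int stopLatency_;
    int ticksUntilIdle_ = 0;
};

// Probe `count` consecutive addresses back to back, one tick apart, and
// return the offsets whose START was actually issued.
std::vector<int> probeAll(SimI2C& bus, int count, bool waitForStop) {
    std::vector<int> started;
    for (int i = 0; i < count; ++i) {
        if (bus.begin()) {
            started.push_back(i);
            bus.end(waitForStop);
        }
        bus.tick();
    }
    return started;
}
```

For example, with a SimI2C(3), probeAll(bus, 8, false) starts only offsets 0, 3 and 6 (a regular pattern that looks like skipped addresses), while probeAll(bus, 8, true) starts all eight.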

Looking at the return value of endTransmission(), it's returning 2 every time with nothing on the bus:

Code:

    // if not my bus, then the beginMaster failed and we either had
    // a collision or the slave acked, in either case report a NACK from the slave
    if(!di2c.getStatus().fMyBus)
    {
        return(2);
    }

So it looks like it thinks it doesn't own the bus. But why? Let's break fMyBus down and take a look...

OK. By adding some more checks for the individual parts of that if, it looks like fBusError is being set. So that is either a collision (I doubt it) or the current state being I2C_BUS_ERROR. I cannot (yet) fathom the logic that sets I2C_BUS_ERROR (it is very cryptic). I need to dig into the interrupt registers to work it out.

Oh, and I can definitely confirm that commenting out that entire "if" cures the problem.