Created Sun, 26 Apr 2015 09:50:14 +0000 by HunterR
Sun, 26 Apr 2015 09:50:14 +0000
I've ported the SDFat library (Mar 24 2015) to ChipKit Uno32 on UECIDE 0.8.8.alpha11 using Digilent hardware DSPI (and "default" hardware SPI functions) and significantly improved both the bandwidth and latency. I've also ported it to MPIDE 0150-20150318 but with far more hackish solutions since it still uses Arduino Core 0023 (and not Core 1.0 like UECIDE). I'll need to document all the changes later. For now I've temporarily jumped ship to UECIDE since it required fewer hacks to the core.
More appropriately, I added "DSPI"/"PIC32MX" code to the stock SDFat library -- so greiman could theoretically clean it up and include it directly. DSPI could be omitted completely if I knew the PIC32 direct register access codes/functions.... the STM32 shim uses that approach.
I've accomplished this through a "compatibility shim" file called "SdSpiPIC32.cpp" analogous to "SdSpiSTM32F1.cpp" or "SdSpiTeensy3.cpp" included with SDFatLib. Here are the current results from my work: I'm sure if someone who actually knew what they were doing cleaned up my methods it could probably be boosted another 2-3x.
HARDWARE USED: ChipKit Uno32 80Mhz, SPI @ 13.33Mhz with Sparkfun CANbus shield with SD card (on top of a Seeeduino CANbus card with 6-pin SPI header and "Mega pin hack" to route the SPI pins correctly) 64GB Sandisk Extreme (Plus) labeled as 95MB/s symmetric write/read. SDXC FAT32 formatted 64GB microSD card with 1 primary partition.
Results from SDFat->Bench example: (4096 buffer, 13.33Mhz SPI) -- some debug functions visible. 960KB/s Write - 1175KB/s Read - latency of ~4200/3500 usec.
Use a freshly formatted SD for best performance.
Type any character to start
Free RAM: 0
Raw ChipKit: [199] 200000
Raw ChipKit: [2] 13333333
Type is FAT32
Card size: 63.86 GB (GB = 1E9 bytes)
Manufacturer ID: 0X3
OEM ID: SD
Product: SE64G
Version: 8.0
Serial number: 0XDEADBEEF
Manufacturing date: 12/2014
File size 5 MB
Buffer size 4096 bytes
Starting write test, please wait.
write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
957.67,12269,4088,4272
964.70,11783,4151,4242
960.25,11811,4085,4261
962.09,11789,4085,4253
962.47,11806,4087,4251
960.25,11788,4085,4260
964.51,11784,4151,4242
959.88,11804,4085,4262
960.06,11786,4085,4262
964.51,11787,4088,4242
Starting read test, please wait.
read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1176.07,4079,3388,3482
1175.79,4087,3375,3482
1175.79,4132,3372,3482
1175.79,4126,3372,3482
1175.79,4085,3372,3482
Done
Type any character to start
Compare this to the results using the Soft SPI library currently shipped with MPIDE [and BenchSD sketch from CodeBender]. The MPIDE default sketches also have problems with cards >32GB I might add... unsigned long long (uint64_t) print functions are needed in the CardInfo example. Infuriatingly, the setSPIspeed functions are silent and hollow "return true"s if you dig deep enough. Software SPI! :geek:
Results from MPIDE SD.h -> BenchSD example: (4096 buffer, SPI is Soft -- no speed option) 250KB/s Write - 510KB/s Read - latency of ~17000/8000 usec.
Type any character to start
File size 5MB
Buffer size 4096 bytes
Starting write test. Please wait up to a minute
Write 246.02 KB/sec
Maximum latency: 244474 usec, Minimum Latency: 13847 usec, Avg Latency: 16643 usec
Starting read test. Please wait up to a minute
Read 511.48 KB/sec
Maximum latency: 8961 usec, Minimum Latency: 7883 usec, Avg Latency: 8007 usec
Done
Type any character to start
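To put the two benchmark runs side by side, here is a small sketch that computes the speedup from the figures quoted above. The function names are made up for illustration and belong to neither library. The cross-check assumes the bench sketches count KB as 1000 bytes and write one buffer per latency sample.

```cpp
#include <cassert>

// Ratio of the new throughput to the old one (both in KB/s).
inline double speedup(double newKBps, double oldKBps) {
    assert(oldKBps > 0.0);
    return newKBps / oldKBps;
}

// Throughput implied by moving one buffer per average latency period.
// Assumption: KB = 1000 bytes, one buffer per latency sample.
inline double impliedKBps(double bufferBytes, double avgLatencyUsec) {
    assert(avgLatencyUsec > 0.0);
    return bufferBytes / avgLatencyUsec * 1000.0;  // bytes/usec -> KB/s
}
```

speedup(960.25, 246.02) comes out at roughly 3.9x for writes and speedup(1175.79, 511.48) at roughly 2.3x for reads. impliedKBps(4096, 4272) gives about 959 KB/s, close to the reported SDFat write speed, so the average latency and the throughput figures agree with each other.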
One additional test I was able to run only with SDFat was the lowLatencyLogger -- which was the whole point of my endeavor. I need to record CAN packets that are coming in at 20 bytes/frame * ~600 frames/second in real time, plus relay them to two MCP2515 CAN interfaces simultaneously, all on the same SPI bus! I've managed to get the latency down to 404 microseconds; recording from 4 ADC channels (sketch default) adds another 30 microseconds. (I'm not sure if this is a decent figure for Arduino, as I have no reference -- but the point is it went down a lot!) :D
Results from SDFat->lowLatencyLogger example: (18+1* buffers, 13.33Mhz SPI) *some #defines modified for the Uno32's bigger RAMEND, 19 buffers is probably excessive/never would be used... but whatever. One line of unoptimized code in sketch moved, minimum stable LOG_INTERVAL_USEC = 450, one "." printed for every 512byte block written. Max block write: 404 microseconds, 2222Hz for 4 ADC channels (default)
Buffers is [18]
RAMEND is [3FFF]
type:
c - convert file to csv
d - dump data to Serial
e - overrun error details
r - record data
Creating new file
Erasing all data
Logging - type any character to stop
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
...............................................................................................................
Truncating file
File renamed: data00.bin
Max block write usec: 404
Record time sec: 27.042
Sample count: 60093
Samples/sec: 2222.21
Overruns: 0
Done
Buffers is [18]
RAMEND is [3FFF]
type:
c - convert file to csv
d - dump data to Serial
e - overrun error details
r - record data
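For reference, the figures in the lowLatencyLogger run above can be checked with a little arithmetic. This is only a sketch: the function names are made up for illustration, and the block-fill estimate assumes CAN frames are packed back to back into 512-byte blocks, which the post does not state.

```cpp
#include <cassert>
#include <cstdint>

// Sample rate produced by a fixed logging interval in microseconds.
inline double sampleRateHz(uint32_t logIntervalUsec) {
    assert(logIntervalUsec > 0);
    return 1e6 / logIntervalUsec;
}

// Time in microseconds for incoming frames to fill one SD block.
// Assumption: frames are stored back to back with no padding.
inline double blockFillTimeUsec(uint32_t bytesPerFrame, uint32_t framesPerSec,
                                uint32_t blockBytes) {
    assert(bytesPerFrame > 0 && framesPerSec > 0);
    return 1e6 * blockBytes /
           (static_cast<double>(bytesPerFrame) * framesPerSec);
}
```

With LOG_INTERVAL_USEC = 450, sampleRateHz(450) gives about 2222 Hz, which matches the Samples/sec line in the log. At 20 bytes * 600 frames/s, blockFillTimeUsec(20, 600, 512) is roughly 42.7 ms, so under that assumption a 404 usec block write leaves plenty of headroom on the bus for the CAN relaying.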
GIST OF THE MODIFICATIONS: Modifications to core print libraries mainly required "removing" the size_t return in MPIDE (or adding, in UECIDE's case), or adding typedefs/defines/functions related to "FlashStringHelper". And some changes to SDFat to create a hollow get total FreeMem() function since the methods illustrated didn't work for some reason, and I am not knowledgeable enough about PIC32 to code one. (I first touched an Arduino maybe 3 months ago!) :ugeek:
I utilize 32-bit wide transfers by intercepting SPI 8-bit buffered transfers of (length n > 4) and temporarily switching into (32-bit) SPI mode. Unfortunately, it also means I have to swap endianness with a macro before/after every transfer operation since the four 8-bit bytes come out of the sxBuf.reg backwards. I'm certain I could optimize it further, but I am unsure what code directly accesses the SPI registers -- it's pointed at by private member "pspi" of the DSPI class, so I stopped here. But even with the overhead included, it is still much faster!!
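The byte-order problem described above can be sketched on a host machine without touching any SPI registers. This is not the macro from the shim; swap32, packMsbFirst and splitTransfer are hypothetical names. The sketch assumes the SPI peripheral shifts each 32-bit word out most significant bit first, which is why a plain little-endian load of four buffer bytes would put them on the wire in reverse order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the byte order of a 32-bit word.
inline uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Build the word to write to the SPI buffer so that buf[0] is clocked
// out first (assumption: the word is shifted out MSB first).
inline uint32_t packMsbFirst(const uint8_t* buf) {
    assert(buf != nullptr);
    return (static_cast<uint32_t>(buf[0]) << 24) |
           (static_cast<uint32_t>(buf[1]) << 16) |
           (static_cast<uint32_t>(buf[2]) << 8) |
            static_cast<uint32_t>(buf[3]);
}

// How a buffered transfer of n bytes splits into 32-bit words plus
// leftover bytes sent in 8-bit mode. Transfers of n <= 4 stay 8-bit.
struct Split {
    size_t words;
    size_t tailBytes;
};

inline Split splitTransfer(size_t n) {
    if (n <= 4) return Split{0, n};
    return Split{n / 4, n % 4};
}
```

A 512-byte SD block splits into 128 words with no tail, so the per-byte overhead is paid a quarter as often. On a little-endian core, packMsbFirst(buf) equals swap32 applied to a direct 32-bit load of the same four bytes.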
HACKS/DISCLAIMERS: SPI prescaler of "2": 80Mhz / (2 * (2 + 1)) = 13.33Mhz. AFAIK the BRG register on the PIC32 is discrete and doesn't allow anything between 13.33Mhz (prescaler of 2) and 20Mhz (prescaler of 1). But I found the Sandisk Extreme would not operate reliably (failed after init) at 20Mhz. So 13.33Mhz it is. Some hacks were done to "SdFatConfig.h" to change the SPI_SCK_INIT_DIVISOR to 199 for PIC32 (end-result frequency of 200000) as well. And changes to the Init function to not "bin" the divisor to certain #defined values (2,4,8,16,32,64,128). [By default the CASE statement lumps it in at SPI_CLOCK_DIV128 = """16Mhz"""/128 = 125Khz (= Pic32 BRG prescaler #define of 319).] Now I can init 100% of the time reliably.... and it's still within the SD card handbook spec.
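The prescaler arithmetic above follows the formula used in the post, clock = 80Mhz / (2 * (BRG + 1)). A minimal sketch of it, with illustrative function names and the 80Mhz peripheral clock passed in as a parameter rather than read from the hardware:

```cpp
#include <cassert>
#include <cstdint>

// SPI clock for a given BRG value: F_SCK = F_PB / (2 * (BRG + 1)).
inline uint32_t spiClockHz(uint32_t pbClockHz, uint32_t brg) {
    return pbClockHz / (2u * (brg + 1u));
}

// Smallest BRG value whose resulting clock does not exceed maxHz.
inline uint32_t brgForMaxHz(uint32_t pbClockHz, uint32_t maxHz) {
    assert(maxHz > 0);
    // ceil(pbClockHz / (2 * maxHz)), done in 64 bits to avoid overflow
    const uint64_t twoMax = 2ull * maxHz;
    const uint64_t div = (pbClockHz + twoMax - 1) / twoMax;
    return div == 0 ? 0u : static_cast<uint32_t>(div - 1);
}
```

spiClockHz(80000000, 2) is 13333333 and spiClockHz(80000000, 199) is 200000, the same two values shown on the "Raw ChipKit" debug lines in the benchmark log. brgForMaxHz(80000000, 15000000) returns 2, which illustrates the gap between 13.33Mhz and 20Mhz: any requested limit in between falls back to the slower setting.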
Also, there were some weird inefficiencies within SDFatLib I cleaned up. I'm new to this.... maybe they are some kind of fix... but they looked like buggy code so I just optimized them according to "how the documentation says they should work".
I'm pretty sure the SPI.beginTransaction() code was never working... I'm pretty sure the DSPI code is incompatible with it anyway. It is theoretically possible to support it by storing a "SPISettings" type object in the class somewhere, but I'm not mucking with that part of the code.. it's inscrutable.
EnhancedBuffering was not used for these benchmarks since my Uno32 doesn't even support it! Meaning it could be improved even further! However, it is theoretically supported in the UECIDE DSPI libraries used by my "compatibility shim" -- the #defines are there.
Files fiddled with (mainly, for reference): SdSpi.h (init code), SdFatConfig.h (Init speed), SdSpiCard.cpp (ChipselectHigh/Low had a spurious 0xff transfer "insure MISO goes high impedance"?? -- plus I used a hack... hardcoded LATDSET/LATDCLR to speed up CS pin flipping)
Okay, that's a wall of text, but if I disappear tomorrow someone else should be able to replicate it. Libs are GPLv3 so I plan to release them. How does GIT work? :ugeek: I don't want to ruin someone else's repo by accident! Maybe unexpectedly or majenko can chime in?
Edit: I've attached the obligatory "Works for me" zip file with all the things I think I modified. (Also, in case I suddenly make it not-work.) THE ATTACHED IS AN UTTER HACK AND IS SPECIFIC TO MY HARDWARE ALONE. Put the core files in your Core folder, the SDfat folder in your UECIDE/libraries folder, and the other 2 are my test projects and benchmarks.
Stuff to keep aware of in "worksforme":
Edit 2: Hack-ey version removed, proper universal version uploaded. I've left the "worksforme" ZIP in case someone wants to see the debug functions.
Sun, 26 Apr 2015 09:59:37 +0000
Brilliant!!!
This is a job I have been meaning to get round to tackling some time myself, and you have now saved me lots of work :D
You should get your code onto Github. The normal way would be to create a fork of Greiman's repository into your own area, then clone that locally, add your changes to it, and commit & push it back to your area on Github. From there you can then make a "pull request" to try and get your changes into the official SdFat repository. At the very least I would then be able to easily create a libsdfat package for UECIDE from your fork so anyone could then use it at the press of a button...
Sun, 26 Apr 2015 10:34:50 +0000
I think I modified some of the ChipKit core though.... "print.h" and "print.cpp".
That sort of thing needs to be addressed by the "pending 1.6.x core update" that's been in developer hell for the past...... (how many??) years. Hence why I jumped ship as soon as I saw UECIDE had Core 1.0 support. Apparently the "FlashStringHelper" stuff was added later 1.0->1.6.
Specific functions added (adapted from my Arduino 1.6.?? installation) -- GIT to SdFat won't fix this!
UECIDE\cores\ChipKit\pic32\print.h
size_t print(const __FlashStringHelper *);
size_t println(const __FlashStringHelper *);
UECIDE\cores\ChipKit\pic32\print.cpp
#define PGM_P const char * <-------- I have no idea why I needed this... but it had to be in this file or I got compiler errors
size_t print(const __FlashStringHelper *ifsh) { .... }
size_t println(const __FlashStringHelper *ifsh) { .... }
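The bodies of those two functions are elided above. As a rough idea of what such overloads can look like on a part where string constants are readable through ordinary pointers, here is a host-side sketch. MockPrint is a stand-in written for this example, not the chipKIT Print class, and the F() macro here is a simplified assumption rather than the core's definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

class __FlashStringHelper;  // opaque tag type, only ever used as a pointer

// Simplified stand-in for the Arduino F() macro (assumption: no PSTR needed).
#define F(s) (reinterpret_cast<const __FlashStringHelper*>(s))

// Mock of a Print-like class that collects output in a std::string.
class MockPrint {
public:
    std::string out;

    size_t write(const char* s, size_t n) { out.append(s, n); return n; }
    size_t print(const char* s) { return write(s, std::strlen(s)); }
    size_t println() { return write("\r\n", 2); }

    // Flash-string overload: cast back to a plain C string and print it.
    size_t print(const __FlashStringHelper* ifsh) {
        assert(ifsh != nullptr);
        return print(reinterpret_cast<const char*>(ifsh));
    }

    // Same, followed by a line ending; returns total characters written.
    size_t println(const __FlashStringHelper* ifsh) {
        size_t n = print(ifsh);
        return n + println();
    }
};
```

With this mock, println(F("Type is FAT32")) appends the text plus "\r\n" to out and returns the character count, which is the size_t return value that SDFat expects from the print functions.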
Also I had a very hard time understanding the class inheritance on class SdCard -> ???????. I wasn't sure how I could optimize the ChipSelect stuff or add SPITransaction support.
You also need to add a "DSPI" object which should be able to be passed or initialized in the constructor. (like the cut-down version of SDFat that comes stock in UECIDE) But the way the class hierarchy is laid out I couldn't figure out how to add a "DSPI" private member to the SdSpi class "after-the-fact" in my "shim" cpp... or if I need to typedef a new SdSpiPic32 class -> [SdSpi -> SpiDefault_t -> m_spi_t]. But I'm thinking that because the Teensy code doesn't do this..... neither should we.
You can't add class variables ex-post-facto... so for now I have hardcoded the DSPI object as DSPI0 and have it floating around as a global in the "shim". Also, you mentioned some kind of "library within a library" problem... I think I ran into that. I have to #define <DSPI.h> in my sketch itself or it won't be included. Even if I have a #define in the "shim" .cpp it isn't added automatically.
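One way the "pass the bus object into the constructor" idea could look, as opposed to a global in the shim, is sketched below. Every name here (SpiBus, LoopbackBus, SdSpiLayer) is hypothetical and none of it is taken from SdFat or DSPI; a real version would wrap a DSPI object behind the same small interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bus interface.
class SpiBus {
public:
    virtual ~SpiBus() = default;
    virtual uint8_t transfer(uint8_t out) = 0;
};

// Stand-in bus for host testing: records what was sent and always
// answers 0xFF, like an idle MISO line.
class LoopbackBus : public SpiBus {
public:
    std::vector<uint8_t> sent;
    uint8_t transfer(uint8_t out) override {
        sent.push_back(out);
        return 0xFF;
    }
};

// Hypothetical SD SPI layer that is handed its bus at construction
// instead of reaching for a global object.
class SdSpiLayer {
public:
    explicit SdSpiLayer(SpiBus& bus) : m_bus(bus) {}

    void send(const uint8_t* buf, size_t n) {
        for (size_t i = 0; i < n; ++i) m_bus.transfer(buf[i]);
    }

    uint8_t receive() { return m_bus.transfer(0xFF); }

private:
    SpiBus& m_bus;
};
```

The trade-off is a virtual call per byte, which matters on a path this hot; a block-transfer method on the interface would likely be needed before this could compete with the numbers above.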
Sun, 26 Apr 2015 10:44:50 +0000
You mean #include <DSPI.h> not #define I think ;)
In UECIDE if DSPI is included in the main .h file for the library then UECIDE will include DSPI in the list of libraries it needs for your sketch, so no need for it in your sketch. That's not the case in MPIDE though, which needs the DSPI in your sketch to make it work, which is why I never back-ported my tweaked SD library from UECIDE to MPIDE since it has that library requirement problem. The only easy way around it is for chipKIT to drop MPIDE and just use UECIDE (which may actually happen).
Did you have to modify the UECIDE chipKIT core for this, or just the MPIDE one? If it's just the MPIDE one then the changes are already there in MPIDE's github repository since UECIDE just uses that these days. We are a gnat's whisker away from full 1.6.x compatibility now.
Sun, 26 Apr 2015 12:52:23 +0000
The #include <DSPI.H> was in a CPP (the shim) so your UECIDE didn't pick up on it. Hence I had to manually add <DSPI.H> to my sketch in UECIDE.
I had to modify both MPIDE and UECIDE, although MPIDE was more extensive.
I downloaded the MPIDE off the installer on the main website. I picked the "newer"/"unstable" one. All the print functions return "VOID" according to the core files. I have no idea where/if there is a newer version.
I've added the obligatory "works for me" ZIP dump with all my work. I realllllly don't want to have to learn new software (GIT), register another account, push, pull, etc just to put it out there. If you (or someone knowledgeable about these things) could look at what I did and re-implement it in a cleaner manner, that would be ideal.
I'd like to avoid code fragmentation as much as possible.... there's already 4 different SPI implementations (just for ChipKit!) I don't want to add another. Adding it directly to SDFat's trunk itself would be ideal.
Sun, 26 Apr 2015 14:02:21 +0000
Just the two "Flash Helper" print functions added to Print? I'll get those added into the chipKIT core and into the MPIDE github repo.
It looks like you may be mixing SPI and DSPI calls, which may cause some rather nasty problems if you're not careful. I think we need to go through the whole library and work out how the structure's meant to work and try and stop it using SPI and only use DSPI, if that's possible...
Sun, 26 Apr 2015 14:11:22 +0000
Urrrrghh... This library is horrible. I'm surprised you got it working as well as you did - well done.
It's going to take some time trying to untangle all the hard coded configuration stuff and weird typedefs. Has he never heard of polymorphism!?
Sun, 26 Apr 2015 14:13:08 +0000
I think it's SUPPOSED to use "regular SPI" (SPI.begin() SPI.transfer() etc) if in "SdFatConfig.h" you set "#define SD_SPI_CONFIGURATION" to 1. That option is just poorly documented. Only if it is "0" will it try to use these "optimized SPI" commands... I think we should leave it that way.
The "optimized SPI" for Arduino (#ifdef AVR) is inlined at the bottom of the SdSpi.h header file. The "optimized SPI" for others is in the "compatibility shims".
If there's ever an issue with the "optimized SPI" in the future, people can change that 1 #define and go on with their day, albeit more slowly.
Also, I think I made two more changes.... the PGM_P define too. I don't think that was there... or I botched an #include somewhere. Like I said... I'm new at this. :oops:
Again, I'd really like to keep it unified with the main SDFat repo.. so people can just download the .ZIP to their lib folder and use it. I'm not sure what DSPI offers is even compatible with the "Arduino standard", like "SPITransaction" etc.
Sun, 26 Apr 2015 14:19:03 +0000
A diff of the core just showed those two functions. The rest was already there.
The problem with "regular SPI" if you're also using DSPI is that SPI and DSPI are different beasts and they have slightly different interfaces. You should use one or the other. Ideally we'd have "regular SPI" only making DSPI calls, or have a separate configuration for DSPI. Also the SPI library doesn't work at all with MZ based boards, so wants to be completely avoided if at all possible.
Sun, 26 Apr 2015 14:26:15 +0000
Oohhh... thank you for that warning.
My project ( http://chipkit.net/forum/viewtopic.php?f=15&t=3264&p=13637#p13637 ) involves 2 other SPI devices on the same bus... CAN shields... which use the "default" SPI interface. (I planned to add interrupt-based transfers, but then it all got confusing so I gave up and used stock.) This SD stuff I was benchtesting separately.
You've probably just saved me a few days of scratching my head tracking down ghost bugs!! I'll try to incorporate DSPI only in the CAN library via #defines, so I can still easily cross-test it on my Mega.
Edit: I just clean-installed UECIDE and used the SDFat lib. After adding the print.h/cpp functions, it complains about PGM_P being undefined. I did a little more digging and I find PIC32 doesn't have/need AVR/PGMspace.h ....
Edit 2: I found the "patch" header file that defines-away the non-needed features ( core/chipkit/pic32/avr/pgmspace.h ). However, the " #define PGM_P const char *" line is missing. Do you have a newer version of the file? One that isn't available through the internal updater?
Sun, 26 Apr 2015 15:19:10 +0000
Ah no, I see the PGM_P is in your Print.cpp - I missed that when looking at the diff.
I may move that to avr/pgmspace.h
Sun, 26 Apr 2015 19:25:48 +0000
OK, so I went ahead and learned the bare minimum in GIT to commit the changes. I also put a shirt and tie on the code so it was more presentable. :ugeek:
It now doesn't use any of the hard-coded hacks, and is ready to be used by anyone. (Feel free to pull the complete library from my Github fork here.)
Attached to this post are the modified files that need to be added to UECIDE. (Temporarily, until the modifications can be pushed out over an update.) I'll work on sorting out the MPIDE modifications more... slowly.
Mon, 27 Apr 2015 14:41:25 +0000
So, my pull request was rejected. He said he only plans on supporting the Arduino IDE and doesn't use the PIC32 for any project.
I was looking into speeding up the process by using DMA SPI, but I learned my Uno32 doesn't support DMA. If someone who knows more about these things would like to try their hand at speeding up SD card access on PIC32 further, please investigate this route. Also, it may help to convert the SDFat library to use 32-bit variables wherever possible.
http://chipkit.net/forum/viewtopic.php?f=7&t=182
For the time being, please use my GIThub fork of SDFat if you want PIC32 support.
Mon, 27 Apr 2015 19:54:30 +0000
Yeah, DMA driven SPI block transfers is something I've been meaning to sort out in DSPI. I have been trying to decide on the best way to do it. Ideally I'd like to have some kind of DMA channel manager in the core so libraries can request an available DMA channel to use so you don't get conflicts (and if there is no DMA it gets no channel), but that kind of thing may be too heavyweight so it cancels out some of the benefits of using DMA in the first place.
So the opposite end of the scale is to have the DMA channel and code hard coded in the DSPI library, and that means that other libraries that use DMA could conflict with it all. So some way of changing the channel is needed, and that generally means manually editing a header file to change the channel, which I really want to avoid if at all possible.
DMA is really at its most effective when you can do background transfers. Pre-emptive caching, where it pre-fetches the next block you are most likely to want to read, or flushes a write cache to the SD card in the background, is where DMA really would shine. I did make a start on my own SD card library (FileSystem) which is actually aimed at being a generic file system management library for more than just SD cards, but it's not got too far yet. It uses more advanced caching and would benefit from DMA transfers big time, especially when flushing its cache blocks to the SD card.
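The "DMA channel manager" idea from this post, where a library asks for a free channel and simply gets none on a part without DMA, could be as light as a bitmask. The sketch below is hypothetical: DmaChannelManager and kNoDmaChannel are names invented here, nothing like it is claimed to exist in the chipKIT core, and it does no locking against interrupts.

```cpp
#include <cassert>
#include <cstdint>

// Returned when no channel is available (or the part has no DMA).
constexpr int kNoDmaChannel = -1;

// Hypothetical allocator for kChannels hardware DMA channels.
// Instantiate with 0 on parts without DMA: acquire() then always fails.
template <int kChannels>
class DmaChannelManager {
    static_assert(kChannels >= 0 && kChannels <= 32, "bitmask holds 32 channels");

public:
    // Hand out the lowest free channel, or kNoDmaChannel if none is free.
    int acquire() {
        for (int ch = 0; ch < kChannels; ++ch) {
            const uint32_t bit = 1u << ch;
            if ((m_inUse & bit) == 0) {
                m_inUse |= bit;
                return ch;
            }
        }
        return kNoDmaChannel;
    }

    // Give a channel back so another library can claim it.
    void release(int ch) {
        assert(ch >= 0 && ch < kChannels);
        m_inUse &= ~(1u << ch);
    }

private:
    uint32_t m_inUse = 0;
};
```

A library such as DSPI could call acquire() once at begin() and fall back to polled transfers when it gets kNoDmaChannel, so the per-transfer cost of the manager is zero. Whether that is light enough to keep the benefit of DMA is the open question raised above.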