Make yourself at Ohm (Brian Dwyer)

Zero-Copy: CUDA, OpenCV and NVidia Jetson TK1: Part 2 (2017-03-15)<div class="tr_bq">
In this part 2 post I want to illustrate the difference in technique between the common 'device copy' method and the 'unified memory' method, which is better suited to shared-memory architectures such as NVidia's Tegra K1/X1 processors used on the NVidia Jetson development kits. I'll show an example using just a CUDA kernel as well as one utilizing OpenCV gpu::functions().</div>
<br />
<h3>
<i>1. CUDA kernels: Device Copy method</i></h3>
<i><br /></i>
For this example, I've written a simple CUDA kernel that takes a fixed 640x480 matrix of depth values (delivered by an Xbox 360 Kinect) and converts the points to XYZ coordinates while rotating them. This example only computes the Y dimension, but I can provide a full XYZ function as well; the math is fairly simple. The code may seem a bit intense, but try not to worry about what's inside the CUDA kernel for now.<br />
<br />
Kernel Code:<br />
<br />
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;">__global__ void cudaCalcXYZ_R2( float *dst, float *src, float *M, float heightCenter, float widthCenter, float scaleFactor, float minDistance)<br />{</span><br />
<blockquote class="tr_bq">
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;"> //__shared__ float jFactor;<br /> __shared__ float shM[3];<br /> float nx,ny,nz, nzpminD, jFactor;<br /> int blockCapacity;<br /> int index;<br /> if(threadIdx.x == 0)<br /> {<br /> shM[0] = M[4];<br /> shM[1] = M[5];<br /> shM[2] = M[6];<br /> }<br /> index = blockIdx.x*blockDim.x + threadIdx.x;<br /> nz = src[index];<br /> jFactor = ((float)blockIdx.x - heightCenter)*scaleFactor;<br /> nzpminD = nz + minDistance;<br /> nx = ((float)threadIdx.x - widthCenter )*(nzpminD)*scaleFactor;<br /> ny = (jFactor)*(nzpminD);<br /> //Solve for only Y matrix (height vlaues)<br /> __syncthreads();<br /> dst[index] = nx*shM[0] + ny*shM[1] + nz*shM[2];</span></blockquote>
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;"><br /></span>
Basically, a float pointer is passed in as <i>src</i> (the depth data); it is manipulated to produce the 'Y' values, which are stored in another float pointer, <i>dst</i>. In a device-copy implementation of the CUDA kernel, the data pointed to by <i>src</i> must first be copied to device memory using <i>cudaMemcpy</i>(). Below is an example of how to do this (an 'h' prefix means host (CPU) while 'd' means device (GPU)):<br />
<br />
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;">{<br />int rows = 480;<br />int cols = 640;<br />float* h_src, h_dst; //Host matrices<br />float* d_src, d_dst; //Device matrices<br />float* h_m, d_m; //4x4 rotation matrix (host/device)<br /><br />//Allocate device copies using cudaMalloc<br />cudaMalloc( (void **)&d_src, sizeof(float)*rows*480);<br />cudaMalloc( (void **)&d_dst, sizeof(float)*rows*480);<br />cudaMalloc( (void **)&d_m, sizeof(float)*16);<br /><br />//Allocate host pointers<br /> h_src = (float*)malloc(sizeof(float)*rows*cols);<br /> h_dst = (float*)malloc(sizeof(float)*rows*cols); </span><br />
<div>
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;">h_m = (float*)malloc(sizeof(float)*4*4);<br /><br /> //Copy all matrices from host to device<br /> cudaMemcpy( d_src, h_src, sizeof(float)*rows*cols, cudaMemcpyHostToDevice);<br /> cudaMemcpy( d_m, h_m, sizeof(float)*16, cudaMemcpyHostToDevice);<br /><br />//Run the kernel<br /> cudaCalcXYZ_R2<<< rows , cols>>>(d_dst, d_src, d_m, 240, 320, 0.0021, -10);<br /><br />//Wait for GPU to finish<br /> cudaDeviceSynchronize();<br /><br />//Copy the result back to host memory<br /> cudaMemcpy( h_dst, d_dst, sizeof(float)*rows*cols, cudaMemcpyDeviceToHost);<br /><br />}</span><br />
<div>
<br />
<h3>
<i>2. CUDA kernels: Unified Memory method</i></h3>
<br />
Here we are going to utilize the same kernel as in the example above, but this time we avoid the memory copies altogether by using the CUDA unified memory (UVA) technique. Instead of <i>cudaMalloc</i>() we have to use <i>cudaMallocManaged</i>():<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">{<br />cudaSetDeviceFlags(cudaDeviceMapHost); //Support for mapped pinned allocations<br /><br /><span style="color: #444444;">int rows = 480;</span><br style="color: #444444;" /><span style="color: #444444;">int cols = 640;</span><br style="color: #444444;" /><span style="color: #444444;">float* h_src, h_dst; //Src and Dst matrices</span><br style="color: #444444;" /><span style="color: #444444;">float* h_m; //4x4 rotation matrix</span></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="color: #444444;"><br /></span>//Allocate float*s for CUDA. No need to allocate host and device separately<br /> cudaMallocManaged(&h_src, sizeof(float)*ros*</span><span style="color: #444444; font-family: "courier new" , "courier" , monospace;">cols</span><span style="font-family: "courier new" , "courier" , monospace;">);<br /> cudaMallocManaged(&h_M, sizeof(float)*4*4);<br />cudaMallocManaged(&h_dst, sizeof(float)*</span><span style="font-family: "courier new" , "courier" , monospace;">ros</span><span style="font-family: "courier new" , "courier" , monospace;">*</span><span style="color: #444444; font-family: "courier new" , "courier" , monospace;">cols</span><span style="font-family: "courier new" , "courier" , monospace;">);</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">//Run the kernel</span></div>
<div>
<span style="color: #444444; font-family: "courier new" , "courier" , monospace;">cudaCalcXYZ_R2<<< rows , cols>>>(</span><span style="font-family: "courier new" , "courier" , monospace;">h_dst</span><span style="color: #444444; font-family: "courier new" , "courier" , monospace;">, </span><span style="font-family: "courier new" , "courier" , monospace;">h_src</span><span style="color: #444444; font-family: "courier new" , "courier" , monospace;">, </span><span style="font-family: "courier new" , "courier" , monospace;">h_m</span><span style="color: #444444; font-family: "courier new" , "courier" , monospace;">, 240, 320, 0.0021, -10);</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br />//Wait for GPU to finish<br /> cudaDeviceSynchronize();</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;">//Done, now h_dst contains the results</span><span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">So now we have completely eliminated copying over <span style="font-family: inherit;">1.23</span>MB (640x480x4<span style="font-family: inherit;"> </span>bytes) prior to running the kernel as well as eliminated copying 1.<span style="font-family: inherit;">23</span>MB (640x480x4 bytes) after the kernel has finished. Imagine trying to achieve real-time performance on a robot reading a Kinect sensor at 30FPS, needlessly copying more than <span style="font-family: inherit;">73.3MB</span> a second into the same RAM!</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><b>[Note]:</b> This code would only function on an architecture such as the NVidia Tegra X/K processors, so no sense in trying to run it on your discrete GPU in your laptop or desktop (it just won't work!).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<br />
<h3>
<i>3. OpenCV GPU functions: Device Copy method</i></h3>
OpenCV has a module called GPU, written in CUDA, that lets you take advantage of GPU acceleration for various functions. There is plenty of documentation online on how to use OpenCV's CUDA support, so I will only go over the very basics. The example we will use is the per-element multiplication of two matrices <i>a</i> and <i>b</i>, where the result is stored in <i>c</i>. Using the 'device copy' method, here is how to do so with OpenCV's gpu function <i>gpu::multiply()</i>:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">//variables/pointers</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">int rows = 480;<br />int cols = 640;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">float* h_a, h_b, h_c;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">float* d_a, d_b, d_c;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Allocate memory for host pointers</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">h_a = (float*)malloc(sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">h_b = (float*)malloc(sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">h_c = (float*)malloc(sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Allocate memory for device pointers</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cudaMalloc( (void **)&d_a, sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cudaMalloc( (void **)&d_b, sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cudaMalloc( (void **)&d_c, sizeof(float)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Mats (declaring them using available pointers)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_a(cvSize(cols, rows), CV_32F, h_a);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_b(cvSize(cols, rows), CV_32F, h_b);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_c(cvSize(cols, rows), CV_32F, h_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Gpu Mats (declaring with available pointers)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_a(cvSize(cols, rows), CV_32F, d_a);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_b(cvSize(cols, rows), CV_32F, d_b);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_c(cvSize(cols, rows), CV_32F, d_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"></span><span style="font-family: "courier new" , "courier" , monospace;">//Let's assume our host matrices are filled with actual data, then copy them to the device matrices</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">dmat_a.upload(hmat_a);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">dmat_b.upload(hmat_b);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Run gpu::multiply()</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::multiply(dmat_a, dmat_b, dmat_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Copy the result back to the host</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">dmat_c.download(hmat_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//Result now in hmat_c, required copying matrix a, b and c...</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<br />
<h3>
<i>4. OpenCV GPU functions: Unified Memory method</i></h3>
<div>
<i><br /></i></div>
You'll notice that in the above example I've been allocating memory to pointers for my images, rather than letting OpenCV allocate memory when a Mat or GpuMat is declared. This is required for this section on using OpenCV GpuMats without uploading and downloading data to and from GPU memory on chips such as the Jetson IC. There is another, less obvious reason I use this method: for real-time performance on embedded processors, it is more efficient to allocate memory for objects early on, before any operations that run cyclically. As long as you can spare the memory, this is an effective way to increase performance (the trade-off being that some RAM stays reserved for the life of the program). If you need to free these allocations dynamically, look into <i>cudaFree</i>() and <i>cudaFreeHost</i>().<br />
<br />
Now, on to eliminating the download() and upload() OpenCV function calls.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cudaSetDeviceFlags(cudaDeviceMapHost); //Support for mapped pinned allocations</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">//variables/pointers</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">int rows = 480;<br />int cols = 640;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">float* h_a, h_b, h_c;</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">//Allocate memory for device pointers</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cudaMallocManaged(&</span><span style="font-family: "courier new" , "courier" , monospace;">h_a</span><span style="font-family: "courier new" , "courier" , monospace;">, sizeof(</span><span style="font-family: "courier new" , "courier" , monospace;">float</span><span style="font-family: "courier new" , "courier" , monospace;">)*rows*cols);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "courier new" , "courier" , monospace;">cudaMallocManaged(&</span>h_b<span style="font-family: "courier new" , "courier" , monospace;">, sizeof(</span>float<span style="font-family: "courier new" , "courier" , monospace;">)*rows*cols);</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: "courier new" , "courier" , monospace;">cudaMallocManaged(&</span>h_c<span style="font-family: "courier new" , "courier" , monospace;">, sizeof(</span>float<span style="font-family: "courier new" , "courier" , monospace;">)*rows*cols);</span></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">//Mats (declaring them using pointers)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_a(cvSize(cols, rows), CV_32F, h_a);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_b(cvSize(cols, rows), CV_32F, h_b);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Mat hmat_c(cvSize(cols, rows), CV_32F, h_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">//Gpu Mats (declaring with the same pointers!)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_a(cvSize(cols, rows), CV_32F, h_a);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_b(cvSize(cols, rows), CV_32F, h_b);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::GpuMat dmat_c(cvSize(cols, rows), CV_32F, h_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">//Run gpu::multiply()</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">gpu::multiply(dmat_a, dmat_b, dmat_c);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">//Result now in hmat_c, no copying required!</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
Much like in the CUDA unified memory example, this method will only function on hardware with unified memory architecture (Jetson ICs for example). Now you do not need to bother using OpenCV download and upload methods for your algorithms.<br />
<br />
Enjoy the speedups!</div>
</div>
Zero-Copy: CUDA, OpenCV and NVidia Jetson TK1: Part 1 (2017-03-08)

If you aren't yet familiar with NVidia's embedded ECU releases (NVidia Jetson TK1, TX1 and, coming soon, TX2), they are definitely something to dig into. NVidia has been embedding their GPU architectures on the same IC as decent-speed processors (like a quad-core ARM Cortex-A15). The Jetson TK1 is by far the most affordable ($100-200), and is an excellent option for bringing high-performance computing to your mobile robotics projects. I'm posting my findings on a slight difference between programming with CUDA on NVidia discrete GPUs and on NVidia's embedded TK/TX platforms (in regards to their memory architecture).<br />
<br />
CUDA is a C-based framework developed by NVidia to allow developers to write code for parallel processing using NVidia's GPUs. Typically the main CPU is considered the 'Host' while the GPU is considered the 'Device'. The general flow for using the GPU for general purpose computing is as follows:<br />
<ol>
<li>CPU: Transfers data from host-memory to device-memory</li>
<li>CPU: Command CUDA process to run on GPU</li>
<li>CPU: Either do other work or block (waiting) until the GPU has finished</li>
<li>CPU: Transfer data from device-memory to host-memory </li>
</ol>
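The four steps above map one-to-one onto CUDA runtime calls. A minimal hedged sketch (the `scale` kernel, its body and its launch shape are placeholders of my own, and error checking is omitted):

```cpp
#include <cuda_runtime.h>

__global__ void scale(float *data, float k)   // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= k;
}

void run(float *h_buf, int n)                 // assumes n is a multiple of 256
{
    float *d_buf;
    cudaMalloc((void **)&d_buf, sizeof(float) * n);

    // 1. transfer host-memory -> device-memory
    cudaMemcpy(d_buf, h_buf, sizeof(float) * n, cudaMemcpyHostToDevice);
    // 2. command the CUDA process to run on the GPU
    scale<<<n / 256, 256>>>(d_buf, 2.0f);
    // 3. block until the GPU has finished (or do other CPU work first)
    cudaDeviceSynchronize();
    // 4. transfer device-memory -> host-memory
    cudaMemcpy(h_buf, d_buf, sizeof(float) * n, cudaMemcpyDeviceToHost);

    cudaFree(d_buf);
}
```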
This is the general flow, and you can read up on it in more depth elsewhere. The GPU and CPU on the NVidia TK/TX SoCs have a Unified Memory Architecture (UMA), meaning they share RAM. Typical discrete GPU cards have their own RAM, hence the necessity of copying data to and from the device. When I learned this, I figured the memory-copy process could be eliminated altogether on the Jetson!<br />
<br />
I started out learning how to do this purely with simple CUDA kernels. Generally, the CUDA compiler will not allow a CUDA function (or kernel) to operate on data-types that have CPU-type pointers etc. I came across different memory-methods of using CUDA:<br />
<ul>
<li>CUDA Device Copy</li>
<li>CUDA Zero Copy</li>
<li>CUDA UVA (Unified Memory)</li>
</ul>
It took me a while to get used to these different techniques, and since I was not sure which one was appropriate, I did some profiling to find out which gave the best speed-up (with Device Copy as the baseline). It turned out that CUDA UVA was the best method for coding on the Jetson TK1's embedded GPU.<br />
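For reference, zero copy, the one option of the three not shown in code in these posts, uses mapped pinned host memory plus a device-side alias of the same allocation. A hedged sketch (error checking omitted):

```cpp
#include <cuda_runtime.h>

void zero_copy_sketch(int n)
{
    cudaSetDeviceFlags(cudaDeviceMapHost);  // must precede context creation

    float *h_buf, *d_alias;
    // pinned host allocation that the GPU is allowed to map
    cudaHostAlloc((void **)&h_buf, sizeof(float) * n, cudaHostAllocMapped);
    // device pointer aliasing the same physical memory
    cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0);

    // kernels take d_alias, the CPU reads/writes h_buf, no cudaMemcpy at all
    // (on the Jetson both pointers ultimately refer to the same RAM)

    cudaFreeHost(h_buf);
}
```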
<br />
However, I still ran into a problem using OpenCV on the Jetson. OpenCV has a CUDA module, but it is designed around two different matrix types: Mat for the CPU and gpu::GpuMat for the GPU, so you cannot call OpenCV gpu::functions on CPU Mat objects. OpenCV actually has you do the same thing as 'device copy' in CUDA, using its methods for copying a CPU Mat to the GPU and vice-versa. When I realized this, I was stunned that there was no unified memory method (to my knowledge) in OpenCV. So all OpenCV gpu::functions required needless memory copying on the Jetson! On an embedded device this is an extreme bottleneck, and I was already hitting the wall with my programs working with the Kinect IR sensor and image data.<br />
<br />
So after quite a bit of sandbox-style experimentation, I found the correct approach to casting Mat pointers into GpuMat pointers without doing any memory copy, while maintaining the CUDA UVA style. My original program with the Kinect sensor ran at 7-10FPS, and that was with cutting the resolution down from 640x480 to 320x240. With the new approach of avoiding any memory copy, I was able to achieve a full 30FPS at the full 640x480 (processing all of the depth data from the IR sensor).<br />
<br />
I will post code on my github and update this with the link soon.<br />
<br />
<a href="http://ohmwardbond.blogspot.com/2017/03/zero-copy-cuda-opencv-and-nvidia-jetson_15.html">Move on to Part 2 for examples</a>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com1tag:blogger.com,1999:blog-8260024469636924539.post-86056076788005350232017-03-06T10:30:00.002-05:002017-03-10T15:55:03.150-05:00Ethernet-Based IMU for ROS<div class="Textbody">
So I've been doing a bit of development with Robot Operating System (ROS), and a while back I needed a flexible solution for acquiring inertial measurement data from multiple sensors. My intention was to have an embedded controller responsible for collecting IMU data and delivering it to a master controller. Since in ROS it is easy to swap between a laptop and an embedded-Linux device as your platform, I decided to make a portable Ethernet-based IMU out of the MPU9150 and a Raspberry Pi (or pcDuino).</div>
<div class="Textbody">
<br /></div>
<div class="Textbody">
I had previously made my own C/C++ Linux-based API for communicating with the MPU9150 (3-axis gyroscope, accelerometer and magnetometer). This project is an extension: an embedded server sends periodic packets containing the latest IMU data (so as of now, the embedded IMU device does not run ROS). I wrote a ROS node to connect to this server, receive the data and publish it as a ROS message. Everything seems to be working out quite well, and I am currently receiving IMU data at 100Hz over Ethernet (less over wifi). I can now add additional Ethernet-based IMUs to the project with little complexity. Below is my wireless version.</div>
<div class="Textbody">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFjyxhTQVhhjy2maTvP-7YttFNn9SwAVmNEA9l7anNvc6FFzFhGHD9ZT5XBLaJt83BIpPegdLcs3VqPyCYf7G-L_kZ8jkdAlIE1WftKm0tZbXY_lX9CqbMEagALaKPtg06kKbMDMftgJ3-/s1600/imu_battery.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFjyxhTQVhhjy2maTvP-7YttFNn9SwAVmNEA9l7anNvc6FFzFhGHD9ZT5XBLaJt83BIpPegdLcs3VqPyCYf7G-L_kZ8jkdAlIE1WftKm0tZbXY_lX9CqbMEagALaKPtg06kKbMDMftgJ3-/s640/imu_battery.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pcduino-3, MPU9150 and a USB battery pack</td></tr>
</tbody></table>
<div class="Textbody">
<i style="text-align: center;"><br /></i></div>
<div class="Textbody">
For show, I set up a battery-powered pcDuino-3 connected to an IMU as the server over wifi (eliminating wires, which get in the way for a hand-held demo). On my ROS device (laptop) I have a node dedicated to receiving the IMU data packets and publishing them in ROS as a sensor message that other ROS nodes can subscribe to. Below is a video of real-time plotting of the received IMU data using rqt. The video is not the best quality because my phone is not so great, but you can see that the top plot is the gyro z-axis measurement and the bottom plot is the accelerometer x-axis measurement.</div>
<div class="Textbody">
<o:p></o:p></div>
<div class="Textbody">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/odOY1IQ3FBI/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/odOY1IQ3FBI?feature=player_embedded" width="320"></iframe></div>
<div class="Textbody" style="text-align: center;">
<o:p><a href="https://youtu.be/odOY1IQ3FBI" target="_blank">Plotting Wireless IMU in ROS </a></o:p></div>
<div class="Textbody">
<br /></div>
<div class="Textbody">
I also wrote an additional ROS node to subscribe to the IMU messages and integrate the gyro data to estimate the sensor's orientation around the z-axis (what a compass would tell you). You can see in this short video that the integration is somewhat reliable. There is definitely gyro drift over time, which would eventually corrupt the estimate, but that's where filtering techniques (for example, Kalman filtering) come in to assist with state estimation. Video below:</div>
<div class="Textbody">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/xIeBdFcI8ak/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/xIeBdFcI8ak?feature=player_embedded" width="320"></iframe></div>
<div class="Textbody" style="text-align: center;">
<a href="https://youtu.be/xIeBdFcI8ak" target="_blank">Heading Estimate Using Gyro with Wireless IMU in ROS </a></div>
<div class="Textbody" style="text-align: center;">
<br /></div>
<div class="Textbody" style="text-align: left;">
I will be posting my code for this project on my github soon; I will update this once it is ready (and hopefully make better videos).</div>
<br />
<div class="Standard">
<br /></div>
Self-Balancing Robot (2014-12-10)

In the last class of my Masters, I decided to build a self-balancing robot for the final project. What I liked about this project is that it involved several very relevant areas of embedded systems in one project. I'll try to write this post the way I wrote the presentation.<br />
<br />
The class was 'Mixed Signal Embedded Systems', and revolved around the Cypress chip called the PSoC4 (programmable system-on-chip). The chip is interesting in that it has both an ARM Cortex-M0 processor AND some (emphasis on 'some') programmable logic (PLDs). It also has 2 embedded amplifiers and some other interesting hardware components (all configurable from the ARM via registers). Anyway, this post is not about this chip, but feel free to read up on it; it is another example of the direction in which embedded systems are heading.<br />
<br />
<h3>
<b>Motivations for this project</b></h3>
<ul>
<li>Interface to inertial measurement sensors (Gyros/Accelerometers)</li>
<li>Employ light-duty sensor fusion for robot pose estimation</li>
<li>Implement an embedded PID controller to maintain robot balance</li>
</ul>
<br />
<h3>
<b>The problem: The Inverted Pendulum</b></h3>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz4XX4-NxMVSFhpChidit0QBNGBLdl3WtuNy8-PEs6Y4yCuh6U6vtvPSBxgouYRdmltLZOc-F23y_L_mDavEtFyRmpy1YmjjupTWbySGWxO2VpIln1FrdkLxCswmW0sOQ1nTvxL6KsEObW/s1600/pendulum.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz4XX4-NxMVSFhpChidit0QBNGBLdl3WtuNy8-PEs6Y4yCuh6U6vtvPSBxgouYRdmltLZOc-F23y_L_mDavEtFyRmpy1YmjjupTWbySGWxO2VpIln1FrdkLxCswmW0sOQ1nTvxL6KsEObW/s1600/pendulum.png" height="298" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The goal of the inverted pendulum problem is to keep the pendulum from tipping over. Without intervention, the pendulum is naturally unstable and will eventually tip over. A controller must counteract this tendency by applying a force in the horizontal direction.<br />
<br />
<h3>
<b><span style="font-size: large;">Inertial Measurement Sensors</span></b></h3>
The MPU9150 was used for inertial measurement sensing for this project. The MPU9150 is:<br />
<br />
<ul>
<li>3 axis accelerometer</li>
<li>3 axis gyro</li>
<li>3 axis compass</li>
<li>...all integrated in a single IC</li>
</ul>
<br />
The benefit of having these components on a single chip is that the error caused by misaligned axes is greatly reduced and controlled by the manufacturing process. This also makes the packaging both smaller and cheaper.<br />
<br />
<br />
The sensor data is accessed over the I2C protocol, so a microcontroller is needed to handle this (I wrote a library in both C and C++, which I will post to my github page and link here).<br />
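On Linux the I2C transaction itself is short. A hedged sketch of reading the raw accelerometer registers (register addresses are from the MPU9150 register map as I recall it, so double-check the datasheet; the bus number varies by board, and the device only appears at 0x68 when its AD0 pin is low; error checking omitted, needs the actual hardware to run):

```c
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);  /* bus number varies by board */
    ioctl(fd, I2C_SLAVE, 0x68);           /* MPU9150 address (AD0 low)  */

    uint8_t reg = 0x3B;                   /* ACCEL_XOUT_H */
    uint8_t buf[6];
    write(fd, &reg, 1);                   /* set the register pointer    */
    read(fd, buf, 6);                     /* X, Y, Z high/low byte pairs */

    int16_t ax = (int16_t)((buf[0] << 8) | buf[1]);
    int16_t ay = (int16_t)((buf[2] << 8) | buf[3]);
    int16_t az = (int16_t)((buf[4] << 8) | buf[5]);
    printf("raw accel: %d %d %d\n", ax, ay, az);

    close(fd);
    return 0;
}
```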
<br />
<h3>
<b>Accelerometers</b></h3>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ7xv1VeF_IGWnnrluRheeXVUTuvoVoOroWTzON86m-g6h0LneHkBzVl11YljOssdrzBD5Jjq2gfK7kc9_5n3yu7GXUiBqtu89DGx3W_WE8MOpZPZ_OdMWe8NfwZXarz7iNaTvv5o5Vksk/s1600/IMU_AccelDiagramPNG.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ7xv1VeF_IGWnnrluRheeXVUTuvoVoOroWTzON86m-g6h0LneHkBzVl11YljOssdrzBD5Jjq2gfK7kc9_5n3yu7GXUiBqtu89DGx3W_WE8MOpZPZ_OdMWe8NfwZXarz7iNaTvv5o5Vksk/s1600/IMU_AccelDiagramPNG.PNG" height="180" width="200" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both;">
Accelerometers measure linear acceleration. The diagram above shows the 3-axis accelerometer</div>
<div class="separator" style="clear: both;">
placement inside the MPU9150. When an accelerometer is stationary, it measures a net acceleration equal to the gravitational acceleration of 9.8m/s^2 (at least on earth). The orientation of this sensor with respect to the gravity vector (let's call it earth's z-axis) can be estimated using trigonometry:</div>
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiem1LP3wBb7HlZqHzfUEwLwI7BxpnSWGA3-5RgrbGfJWdanw4QRwj4pT0qI8FOrxVZ_dN8ZyTUVhK6zSgCbPxe-SS98ylzXo3KfDOCiSRE9rA55eA7lTxTTAMq7zzrmslhYSGhS4JA0Ik/s1600/equation_angleaccelerometer.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiem1LP3wBb7HlZqHzfUEwLwI7BxpnSWGA3-5RgrbGfJWdanw4QRwj4pT0qI8FOrxVZ_dN8ZyTUVhK6zSgCbPxe-SS98ylzXo3KfDOCiSRE9rA55eA7lTxTTAMq7zzrmslhYSGhS4JA0Ik/s1600/equation_angleaccelerometer.PNG" height="48" width="320" /></a></div>
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
This estimate is accurate only when no external forces are being applied. When external forces are applied, such as the robot moving or tipping, the orientation cannot be accurately estimated.</div>
<div class="separator" style="clear: both;">
<br /></div>
<h3>
<b>Gyros</b></h3>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbs1HUltc3XmHpxC05-1JdTfdFYkgmprcPWQOoGY6Vr2WwyLRBUpU-NDjalBw2ulj1-KXih7HAYM0rWiQEyCH6v-djeutSqcaKbAE9QS5d1uxgjLhN5GB0nfsYojFQ7v4MG4h0kr90kXdf/s1600/IMU_GyroDiagram.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbs1HUltc3XmHpxC05-1JdTfdFYkgmprcPWQOoGY6Vr2WwyLRBUpU-NDjalBw2ulj1-KXih7HAYM0rWiQEyCH6v-djeutSqcaKbAE9QS5d1uxgjLhN5GB0nfsYojFQ7v4MG4h0kr90kXdf/s1600/IMU_GyroDiagram.PNG" height="197" width="200" /></a></div>
<b><br /></b>
Gyros measure the rate of angular change. The diagram above shows the 3-axis gyro placement inside the MPU9150. Gyros are not susceptible to linear forces such as vibration or movement; they are sensitive only to angular displacement. Gyros cannot directly measure physical orientation, but their readings can be integrated over time to estimate angle.<br />
<br />
<div style="text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWR3kx-m0ZUOZfm_oN9VxR9i8QwEcR5g6vVUj_TSCcP84M6f7tuiF4tzkRQ7V-3qwEcK-7I1BnFSyqLcql3WeHE2DoiZzdJBQ9wRg7w3Vc5xPgoEZxVBBle_dGZ3X6BqOQESRYHwWzn3s6/s1600/equation_anglegyro.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWR3kx-m0ZUOZfm_oN9VxR9i8QwEcR5g6vVUj_TSCcP84M6f7tuiF4tzkRQ7V-3qwEcK-7I1BnFSyqLcql3WeHE2DoiZzdJBQ9wRg7w3Vc5xPgoEZxVBBle_dGZ3X6BqOQESRYHwWzn3s6/s1600/equation_anglegyro.PNG" height="37" width="200" /></a></div>
<div style="text-align: center;">
<b>dt</b> is the control loop time.</div>
<div style="text-align: center;">
<br /></div>
Gyros suffer from a phenomenon known as drift: they report a small rate of change even when no physical rotation is occurring. When integrating, this error accumulates over time, causing an increasingly inaccurate estimate of angle.<br />
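The integration in the equation above is a one-line update per control-loop tick; a brief C sketch (names are mine, not from this project's code) makes the drift problem concrete:

```c
/* One gyro-integration step: angle += rate * dt.
 * rate_dps is the gyro reading in degrees/second, dt the loop period
 * in seconds. A constant bias of only 0.01 deg/s in rate_dps grows
 * into 0.6 deg of error after one minute -- this is drift. */
double gyro_integrate(double angle, double rate_dps, double dt)
{
    return angle + rate_dps * dt;
}
```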
<br />
Also note the axis alignment in the two diagrams: the x-axis gyro is perpendicular to the accelerometer's y-z plane, so the angle computed from the accelerometer's x-z plane (as shown in the equation) is the same angle measured by the gyro's y-axis.<br />
<br />
<h3>
<b><span style="font-size: large;">Estimating Robot Pose</span></b></h3>
<div>
<b><span style="font-size: large;"><br /></span></b></div>
<div>
In order to accurately estimate angle using gyros and accelerometers, the sensor readings must be combined in a way that exploits each sensor's strengths to compensate for the other's weaknesses. This technique is known as <b>sensor fusion</b>. Several pose estimation methods exist, each with its own trade-offs; for this project, a complementary filter was used because it is easily implemented on a low-power microcontroller. The equation below shows how this is implemented:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQ2WIIc9lHcTRcr4l76gydkiafE7s07MiMRUJ4sJKdyPbIOv_WFxIKYQAVjmiRsnWv6_3ClRKioiz2Q9FE-p5cQp9Fjl25Yut7gOrlsxrJf2FzPMlQhfxDMsUm8v4eGTBPIV3y9a1fouHx/s1600/eq_sensorfusion.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQ2WIIc9lHcTRcr4l76gydkiafE7s07MiMRUJ4sJKdyPbIOv_WFxIKYQAVjmiRsnWv6_3ClRKioiz2Q9FE-p5cQp9Fjl25Yut7gOrlsxrJf2FzPMlQhfxDMsUm8v4eGTBPIV3y9a1fouHx/s1600/eq_sensorfusion.PNG" height="27" width="400" /></a></div>
<br />
Alpha weights the two angle estimates before they are combined. Since the accelerometer is more prone to adding noise to the system, a large alpha is chosen to give the gyro integration more weight in the equation, effectively applying a low-pass filter to the accelerometer. For this project, an alpha of 0.99 was used (along with a control loop time of 10 milliseconds).<br />
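One iteration of the filter can be written out as a brief C sketch (variable names are illustrative; alpha = 0.99 and dt = 0.01 s match the values used in this project):

```c
/* One complementary-filter update: blend the integrated gyro rate
 * (trusted short-term) with the accelerometer angle (trusted long-term).
 * alpha near 1.0 favors the gyro path; (1 - alpha) low-pass filters
 * the noisy accelerometer estimate. */
double complementary_update(double angle, double gyro_rate_dps,
                            double accel_angle, double dt, double alpha)
{
    return alpha * (angle + gyro_rate_dps * dt)
         + (1.0 - alpha) * accel_angle;
}
```

Calling this once per fixed 10 ms loop keeps the gyro integration consistent; the accelerometer term slowly pulls the estimate back toward the true gravity reference, canceling drift.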
<br />
<h3>
<span style="font-size: large;">PID Control System</span></h3>
</div>
<div>
<span style="font-size: large;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuJRAcJeXiF5_XlcRje8bhSLkB79QexrO6caBlafZI7QLrt31fvnTvbFzQNR1vdGJGsllEyl0Io265leCh6QpogjxgyRIivRFrSWrhnZIdiLIJbItcsS7E2Sr_KB6uPABM6fId7ANt7gQs/s1600/PID_Diagram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuJRAcJeXiF5_XlcRje8bhSLkB79QexrO6caBlafZI7QLrt31fvnTvbFzQNR1vdGJGsllEyl0Io265leCh6QpogjxgyRIivRFrSWrhnZIdiLIJbItcsS7E2Sr_KB6uPABM6fId7ANt7gQs/s1600/PID_Diagram.png" height="213" width="320" /></a></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
Proportional-Integral-Derivative (PID) is a control-loop feedback technique that uses the error between a set-point and the observed output of a system (often referred to as the plant). A PID controller is tuned by varying the three gains associated with the control algorithm: Kp, Ki and Kd. Different choices for these constants shape the system's response in terms of response time, overshoot and oscillation.<br />
<br />
Kp: The proportional term; depends on the present error only.<br />
Ki: The integral term; depends on the accumulation of past errors.<br />
Kd: The derivative term; a prediction of future error based on its current rate of change.<br />
<br />
The following is a pseudo-code example of a PID algorithm implemented in a control loop.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi27vwXjfdsOpnmWGmJgTdk1uIju6GVPEa8g0S2nu94hqw6GnGEOBy-frSw50LYcE_cRTB9WncgqYVAJzl-q6XSwn8D5QQgitPigeMolAVcdkmwpO47U85y64dBUB2elxSmmHnq4VyJTkoH/s1600/PID_CCODE_EX.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi27vwXjfdsOpnmWGmJgTdk1uIju6GVPEa8g0S2nu94hqw6GnGEOBy-frSw50LYcE_cRTB9WncgqYVAJzl-q6XSwn8D5QQgitPigeMolAVcdkmwpO47U85y64dBUB2elxSmmHnq4VyJTkoH/s1600/PID_CCODE_EX.png" height="68" width="200" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The output of the PID algorithm is the sum of the gain terms multiplied by their function of the error. The error is simply the difference between the target (in this case an angle) and the observed state (the angle estimated by the sensor fusion algorithm). It is important to keep the control-loop rate consistent (so use a timer interrupt, and avoid doing extra processing in the loop).</div>
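The pseudo-code in the image can be fleshed out as a C sketch (the state layout and names here are illustrative, not the robot's actual code):

```c
/* Minimal PID state and one update step. dt must match the fixed
 * control-loop period for Ki and Kd to behave consistently. */
typedef struct {
    double kp, ki, kd;   /* tuned gains */
    double integral;     /* accumulated error */
    double prev_error;   /* error from the previous loop */
} pid_state;

double pid_step(pid_state *pid, double setpoint, double measured, double dt)
{
    double error = setpoint - measured;
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}
```

In the balancing robot the setpoint is the upright angle, the measurement is the fused angle estimate, and the output drives the motor speed/direction.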
<br />
<br />
<h3>
<span style="font-size: large;">Results</span></h3>
</div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
I built my robot using parts from one of my other robots to save on cost. I built a cheap frame out of small wood squares, a dowel, and hot glue (all bought at Michaels for less than $5):</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjA4O_1W80-FKnEqnrOnHL30p_6GSB8ptOuq7XcMxH9j6XMPOysx64IxyYqqdmqBWb2ZCg44IH1lgdp2LbaZxX517FiLbqJPyYurFYILhcdXv8muXJIilb14tNsBwmU30CN5X-VVYDeHspC/s1600/robot_corner_shot_1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjA4O_1W80-FKnEqnrOnHL30p_6GSB8ptOuq7XcMxH9j6XMPOysx64IxyYqqdmqBWb2ZCg44IH1lgdp2LbaZxX517FiLbqJPyYurFYILhcdXv8muXJIilb14tNsBwmU30CN5X-VVYDeHspC/s1600/robot_corner_shot_1.JPG" height="240" width="320" /></a></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
Starting off with just a P controller (Kp &gt; 0, Ki = Kd = 0), the system was nowhere near useful. The robot would overshoot and bang itself on the floor; I thought it would destroy itself before I would be able to stabilize it. Fortunately, after adding some derivative gain (Kd), the robot was able to keep itself upright, with some oscillation. When I added a disturbance (a little push), the robot would travel horizontally until it eventually fell over. After adding some integral gain (Ki), the robot stabilized itself very well: at this point it would recover quickly from a slight push, with some oscillation and a small amount of horizontal travel. My first (and favorite) set of gains was: Kp = 8, Ki = 0.5, Kd = 10. These gains won't mean much to you, as they are fit to my robot/system, which depends on my control loop, motor response, motor torque, speed resolution, robot height, and so on. You would have to experiment with your own system to find a suitable set of gains.</div>
<div>
<br /></div>
<div>
What I can do is share my next set of gains as a comparison, to show the change in performance. I wanted to get rid of the oscillation, so I increased the derivative gain from 15 to 60. To keep my robot stable I had to increase the Kp and Ki gains slightly. This was an improvement, in that the robot no longer oscillated so much while keeping its balance in the undisturbed case, and when I did add a slight push, it corrected its angle almost instantly. However, the robot had to travel a much larger horizontal distance to achieve this response.</div>
<div>
<br /></div>
<div>
I learned that added disturbances (like pushes) are like adding energy to the system. To deal with the extra energy, the robot can either oscillate back and forth a few times while minimizing horizontal travel, or travel considerably farther in the horizontal direction just to maintain the angle I had programmed.</div>
<div>
<br /></div>
<div>
Anywho, here is a video of the robot in action:<br />
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/w2DpNIaAsqY?feature=player_embedded' frameborder='0'></iframe></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
I have a github site, so I'll post the code there. I have both C and C++ libraries for the MPU9150. The C library, however, depends on the PSoC 4, but you can strip what you need from it rather easily. I'll also include an application I wrote in Processing (see processing.org), which the robot used to communicate with my PC so I could better understand what was going on:</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Link to the code:</div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://github.com/johnnyonthespot/self-balancing-robot-psoc4/tree/master">https://github.com/johnnyonthespot/self-balancing-robot-psoc4/tree/master</a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Thanks!</div>
<br /></div>
Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com2tag:blogger.com,1999:blog-8260024469636924539.post-36153626122429477142014-07-29T11:06:00.002-04:002014-07-29T11:16:23.692-04:00Autonomous Robot with ZynqIn an earlier post, I talked up the Xilinx Zynq (an IC with both FPGA and Microcontroller). In my Advanced Embedded System Design course, we had to build an autonomous robot that can navigate around with some form of intelligence and seek out certain objects to 'destroy'. Now the term 'destroy' was really left up to the students to define; for our robot a laser pointer was used to mark enemies. Identifying 'enemies' was a challenge, so we incorporated a camera so the robot could see and track on its own. Meet our robot (below):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEzNrMhm8kJbKQHfFs4Vq7DVeXZkqNWp_3swjCUiZH2-B8_VZ0glf0vLwKsTLG-SygBK2z7dF1QVFYthhwMf0oS_nBbL2u-X42nvwCD0bR9XP8dedX01XaGfbWal16wi8ZxUtt58pLx19i/s1600/robot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEzNrMhm8kJbKQHfFs4Vq7DVeXZkqNWp_3swjCUiZH2-B8_VZ0glf0vLwKsTLG-SygBK2z7dF1QVFYthhwMf0oS_nBbL2u-X42nvwCD0bR9XP8dedX01XaGfbWal16wi8ZxUtt58pLx19i/s1600/robot.png" height="456" width="640" /></a></div>
<br />
Equipped with:<br />
<br />
<ul>
<li> <a href="http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,1198&Prod=ZYBO">Zybo</a></li>
<li>OV7670 Camera</li>
<li><a href="http://www.arduino.cc/">Arduino</a></li>
<li><a href="http://www.robotshop.com/en/sabertooth-dual-regenerative-motor-driver.html?utm_source=google&utm_medium=base&utm_campaign=GoogleUSA">Sabertooth</a> Motor Controller</li>
<li>IR Proximity Sensors</li>
<li>LiPo Batteries (12V + 7V)</li>
<li>Pan/Tilt Servo motors</li>
<li>Laser</li>
</ul>
Custom logic was designed onto the FPGA portion of the Zynq, in order to maneuver the robot, control the pan/tilt bracket, capture frame data from the camera, and lastly - 'fire the laser'. The ARM portion of the Zynq was used as the algorithm prototyping environment (in C) to make use of the custom FPGA interfaces. The Arduino was used to configure the OV7670 over the I2C-like communications. The following diagram shows how all components were interfaced.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN_TmxUa4V802TThF-u9jgxMOwVsM0tsJ_WvcdDNCV3buBs5CF-oEW8fn6vvpzMiOL8KsRpJCN8eIx5CAZElWOONd5jWFsr-KKJOVGbVlRQO0WTPQ4J_NCImdxted3E3IRdVtM6wCXyqj4/s1600/diagram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN_TmxUa4V802TThF-u9jgxMOwVsM0tsJ_WvcdDNCV3buBs5CF-oEW8fn6vvpzMiOL8KsRpJCN8eIx5CAZElWOONd5jWFsr-KKJOVGbVlRQO0WTPQ4J_NCImdxted3E3IRdVtM6wCXyqj4/s1600/diagram.png" height="324" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The OV7670 was chosen for its parallel data interface and low cost ($20), allowing us to reach an image capture rate of 30 fps. However, you get what you pay for in terms of ease of interfacing: I had to design custom logic in VHDL to perform frame captures and store them in block RAM on the FPGA, and it's hard to debug what you can't see. Nonetheless, after days of toying around, the Zybo was finally able to see. To view what the Zybo saw, I wrote a quick program to transfer images from the Zynq to my laptop over a serial connection (using <a href="http://processing.org/">Processing</a>). Below is the process flow for debugging the images.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLrFvoRs-Qfk_u6wjZ0wGTlX_IBSJ16Cs6glwff9gWpJOPnROSp9uGfmMrI9krclURRzRrLeYrjXYlQfnNpC53igQQgL4yN4pRPUXhOFH75k8kmHWmvMkw-5jV0nPBcTXTstul_I_XWAeP/s1600/imageprocessflow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLrFvoRs-Qfk_u6wjZ0wGTlX_IBSJ16Cs6glwff9gWpJOPnROSp9uGfmMrI9krclURRzRrLeYrjXYlQfnNpC53igQQgL4yN4pRPUXhOFH75k8kmHWmvMkw-5jV0nPBcTXTstul_I_XWAeP/s1600/imageprocessflow.png" height="315" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
The custom FPGA logic which interfaced to the OV7670 was designed to either stream frames into the block-ram, or take a single snapshot and leave it in the block-ram. The FPGA had to interface/synchronize to the vsync, href, pixel-clock, and 8-bit data of the OV7670. It also needed to interface with the ARM processor and to FPGA-based block-RAM. Below is the block model of the custom camera control as shown in <a href="http://www.xilinx.com/products/design-tools/vivado/">Vivado</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqvUu2dR0m0KEr1i_XmEUBdI1_9G36m2LrHxfuUmrI4TXj7Lstlcg2A5gb6NuiHy-Juo61nCbtk09a_Cb306BfoT_tJ0WDIEjbMGvvIBnR9O2tctOnipkLIS0gKzXYyDCULXpLLs8YkxQk/s1600/vivado_om7670.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqvUu2dR0m0KEr1i_XmEUBdI1_9G36m2LrHxfuUmrI4TXj7Lstlcg2A5gb6NuiHy-Juo61nCbtk09a_Cb306BfoT_tJ0WDIEjbMGvvIBnR9O2tctOnipkLIS0gKzXYyDCULXpLLs8YkxQk/s1600/vivado_om7670.png" height="264" width="320" /></a></div>
We couldn't have done it without camera register settings provided by this source: <a href="http://hamsterworks.co.nz/mediawiki/index.php/OV7670_camera">Hamsterworks - OV7670</a><br />
<br />
Otherwise the images didn't turn out so well:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBVXjekmFRx0HOzeenCvZNli9unbxoWnIeUx0VGS6y5Qe0p7SB3r4tYj1VXrsQB7MdDl8jxLO8x0efJaPU7aFIhS6vpoZj-ha6XIELPipVjPnViHGh7zonVs4dE9ACr7fEhrLfuj8tHLXR/s1600/zybo_funny.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBVXjekmFRx0HOzeenCvZNli9unbxoWnIeUx0VGS6y5Qe0p7SB3r4tYj1VXrsQB7MdDl8jxLO8x0efJaPU7aFIhS6vpoZj-ha6XIELPipVjPnViHGh7zonVs4dE9ACr7fEhrLfuj8tHLXR/s1600/zybo_funny.png" height="271" width="320" /></a></div>
<br />
I'll post some videos of the robot in action soon...Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com0tag:blogger.com,1999:blog-8260024469636924539.post-10588378058268570522014-04-29T14:54:00.001-04:002015-12-18T02:27:58.798-05:00Xilinx Zynq7000 and the ZyboIf you are into getting your hands on the latest embedded technology, the Zynq7000 is a great platform to get familiar with. Xilinx, a leader in FPGA design (with which I have no affiliation), partnered with ARM to create the very first microprocessor with an on-chip FPGA (or is it an FPGA with an on-chip processor?).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1-YBgZs_QIjQjwsDEfMzcT4O7uD4xEtfQt9X0Im7vIZR0CM-mWV2J6CbAXcH8g0UH3NP7Dg5ETimcoCIQjjyMsjexPXhIYNR6HfLUOc_Gv1E_jmlJof2Sd9dwVi-FDFfB03oZOwt9lOkC/s1600/zynq.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1-YBgZs_QIjQjwsDEfMzcT4O7uD4xEtfQt9X0Im7vIZR0CM-mWV2J6CbAXcH8g0UH3NP7Dg5ETimcoCIQjjyMsjexPXhIYNR6HfLUOc_Gv1E_jmlJof2Sd9dwVi-FDFfB03oZOwt9lOkC/s400/zynq.jpg" width="355" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.xilinx.com/products/silicon-devices/soc/zynq-7000/" target="_blank">Xilinx Zynq7000</a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
This technology combines the power of an FPGA for high-speed parallel tasks with a dual-core ARM processor offering standard microcontroller peripherals (CAN, I2C, SPI, UART, etc.). I was lucky enough to be introduced to this device in my Advanced Embedded System Design course at Oakland University, and will continue working with it as long as I can keep making sense of partitioning my embedded design projects into both 'some hardware' and 'some software'. I purchased the Zybo development board made by Digilent, however there are others out there (ZedBoard, MicroZed by Avnet).<br />
<br />
To paint a better picture of why this is so useful, imagine using a microcontroller both to interface to a camera and to perform image processing in order to make a decision (say you are trying to track an object). In terms of cameras, you are limited to those with slower serial interfaces (SPI, UART). If you really wanted to interface a microcontroller to a camera with a faster parallel interface, you would have to spend more money on a fast enough controller, and even then it wouldn't have much time left for anything other than image processing and frame grabbing! On the flip side, with an FPGA alone, the decision-making becomes harder to implement, takes longer to develop, and is harder to debug.<br />
<br />
Having an FPGA and a microcontroller on the same IC solves this exact problem. Now you can develop your frame grabbing for a higher-speed camera, and even do some image processing, on the FPGA. In this way, the FPGA can be treated as a custom co-processor for the application running on the processor, and the FPGA-to-processor interface is seen by your application as either 'just another memory-mapped peripheral' or even an 'external interrupt'.<br />
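To make the 'just another memory-mapped peripheral' idea concrete, here is a hedged C sketch; the register layout, bit assignments, and the example base address are invented for illustration and are not the actual interface from this project:

```c
#include <stdint.h>

/* Illustrative register layout for a camera-capture peripheral.
 * volatile keeps the compiler from caching reads/writes to what is
 * really hardware state on the other side of the AXI bus. */
typedef struct {
    volatile uint32_t control;  /* bit 0: start capture (hypothetical) */
    volatile uint32_t status;   /* bit 0: frame ready   (hypothetical) */
    volatile uint32_t pixel;    /* read port into frame block RAM      */
} cam_regs;

/* In real use 'regs' would point at the peripheral's AXI base address,
 * e.g. (cam_regs *)0x43C00000 on a Zynq (address is made up here). */
void cam_start(cam_regs *regs)
{
    regs->control |= 1u;        /* set the start-capture bit */
}

int cam_frame_ready(const cam_regs *regs)
{
    return (int)(regs->status & 1u);
}
```

From the ARM side, driving the FPGA logic is then just plain pointer dereferences, which is exactly why prototyping the algorithms in C on the processor is so convenient.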
<br />
The example I used comes directly from my project, where I used the FPGA to communicate with a faster camera (well, faster than UART- or SPI-based) and let the processor dictate what operations to perform on the image, or even stream the image into the processor's RAM for forwarding to a remote PC via UART. This was for an autonomous robot project, which I will add posts on in the near future.<br />
<br />
I recommend checking this technology out. It is great to see innovations such as this, because they have the power to take industries in new directions. If you are interested in getting yourself one, I suggest either the ZedBoard or the Zybo.<br />
<br />
<a href="http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,719,1197&Prod=ZYBO" target="_blank">Link to Digilent's Zybo</a><br />
<br />Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com0tag:blogger.com,1999:blog-8260024469636924539.post-4566235099804136892012-04-20T01:09:00.001-04:002015-01-02T11:28:14.409-05:00Audio LocalizationJust a little introduction on my interest in this topic:<br />
I've always been intrigued by the way our mind is configured to interpret sound signals. To those of you with two working ears: Ever notice when you hear a noise, you know which direction it came from?... I mean you just 'know'; you don't have to sit down, grab a pencil and notepad and plot waveforms to triangulate the angle in which the sound likely originated. These calculations are done in the background of our mind. That's right, you and I (and even our pet cats) are pre-programmed to utilize these functions without having to 'think' about it. This way we can use our main processor-time for more important tasks.<br />
<br />
I wanted to experiment with methods that the brain uses for indicating the direction of sound. A little background on the two methods:<br />
<ul>
<li>Interaural Level Difference: the difference in amplitude of a sound as measured by two or more sensors</li>
<li>Interaural Time Difference: the difference in arrival time of a sound at two sensors</li>
</ul>
A few links on the topic:<br />
<a href="http://www.ise.ncsu.edu/kay/msf/sound.htm" target="_blank">http://www.ise.ncsu.edu/kay/msf/sound.htm</a><br />
<a href="http://en.wikipedia.org/wiki/Sound_localization" target="_blank">http://en.wikipedia.org/wiki/Sound_localization</a><br />
<ul>
</ul>
For simplicity, I focused on the 1st method and implemented an 'object tracking' approach.<br />
<br />
Components used:<br />
<ul>
<li>Arduino Uno</li>
<li>2 Phidget Sound Sensors</li>
<li>Continuous Servo Motor</li>
</ul>
To continue to maintain simplicity, I chose the Phidget Sound Sensors because they output a 0-5 V signal representing measured volume (as opposed to a raw signal from a microphone). This also allows a slower processor (such as the ATmega328) to be quick enough for the task. Below is a pic of the system (made for a class project).<br />
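The interaural-level-difference decision itself reduces to a comparison with a dead-band. A minimal C sketch (the function name, threshold, and return convention are my own, not the Phidget API):

```c
/* Compare the two sensor volumes (e.g. 0-1023 ADC counts) and decide
 * which way to rotate the servo: -1 = toward the left sensor,
 * +1 = toward the right, 0 = hold. The dead-band keeps the servo from
 * hunting back and forth when the levels are nearly equal. */
int ild_direction(int left, int right, int deadband)
{
    int diff = right - left;
    if (diff > deadband)  return 1;
    if (diff < -deadband) return -1;
    return 0;
}
```

On the Arduino this would be called each loop with the two `analogRead()` values, nudging the continuous servo in the returned direction.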
<br />
Here is a functional diagram of the system I drew up:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1RTLTgLTZGGu0dW_tH4P8sxv_HwlZMl7ojBeCrnRey_UIojmBNRWyy9Yz078kAhS1ntog1p5As8ZTJmSkLcG0xC4hL2W3SeXatUfrDb7nfFOWmyFQlk8JhlbVQkK0y-iflnsIrvomvYzs/s1600/AudioDiagram.bmp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1RTLTgLTZGGu0dW_tH4P8sxv_HwlZMl7ojBeCrnRey_UIojmBNRWyy9Yz078kAhS1ntog1p5As8ZTJmSkLcG0xC4hL2W3SeXatUfrDb7nfFOWmyFQlk8JhlbVQkK0y-iflnsIrvomvYzs/s320/AudioDiagram.bmp" height="223" width="320" /></a></div>
<div style="text-align: left;">
A high-level schematic<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikiofZCEYP2UF7jsy1gSZHibEncOrAN3gOrSXW8pZ85Vv7nV9dIHxlVkrylgQJon178rKboC9rYVh7SOB2LBgQ-s9pObCpa8dSr8KYggzIp4Ze975w8tDkGxvQXKLwp6fg2fX0DKgDyKYJ/s1600/schematic.bmp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikiofZCEYP2UF7jsy1gSZHibEncOrAN3gOrSXW8pZ85Vv7nV9dIHxlVkrylgQJon178rKboC9rYVh7SOB2LBgQ-s9pObCpa8dSr8KYggzIp4Ze975w8tDkGxvQXKLwp6fg2fX0DKgDyKYJ/s320/schematic.bmp" height="172" width="320" /></a></div>
<br />
A picture of the project</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTCGfD3u3fO3CY04oHWLZrS29WY8kI_bechJ7YQ2sMSUURdQSqT1jxPQnBEAKMkTIg-9qTBK1G7JIBsdvxDAHkLXRvthVM2IAmabVVm7KDUPBOn-tGidisdJT1A_6EuB7by2i4OGigHNyp/s1600/IMAG0586.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTCGfD3u3fO3CY04oHWLZrS29WY8kI_bechJ7YQ2sMSUURdQSqT1jxPQnBEAKMkTIg-9qTBK1G7JIBsdvxDAHkLXRvthVM2IAmabVVm7KDUPBOn-tGidisdJT1A_6EuB7by2i4OGigHNyp/s320/IMAG0586.jpg" height="191" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
And at last, a video! (May be loud). We used the "Air-Horn" Phone-App</div>
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i.ytimg.com/vi/GbMaHHulh_s/0.jpg" height="266" width="320"><param name="movie" value="http://www.youtube.com/v/GbMaHHulh_s?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" />
<param name="bgcolor" value="#FFFFFF" />
<embed width="320" height="266" src="http://www.youtube.com/v/GbMaHHulh_s?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" type="application/x-shockwave-flash"></embed></object></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://youtu.be/GbMaHHulh_s" target="_blank">http://youtu.be/GbMaHHulh_s</a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
Video with improved code<br />
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i.ytimg.com/vi/_GYjfpB4mR0/0.jpg" height="266" width="320"><param name="movie" value="http://www.youtube.com/v/_GYjfpB4mR0?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" />
<param name="bgcolor" value="#FFFFFF" />
<embed width="320" height="266" src="http://www.youtube.com/v/_GYjfpB4mR0?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" type="application/x-shockwave-flash"></embed></object></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com2tag:blogger.com,1999:blog-8260024469636924539.post-83414146237405377762011-05-03T21:47:00.011-04:002015-01-02T11:30:50.407-05:00Return of the RC car!!!I've been very busy since the last post, but I finally got back into the groove with the RC car. I wanted this car to be completely modifiable, so the inside of the electronics control box = breadboards + velcro (which works pretty darn well). Here is a photo of the wiring.<br />
<br />
Included in this box:<br />
1. Boarduino (Atmega 328) controller<br />
2. 18v15 Pololu Motor Driver (rear motor drive)<br />
3. Dual-Axis Compass Module - HMC6352<br />
4. XBee Pro 60 mW U.FL module: for wireless communication<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtN_JSJ89-vDOvWFD0C36AC44cNpNMbQXclLbLqjvywWbLTUenxNQuCeQKqktn4ADQao8n_2BwxhV3q7Cx-C55kcDovMF-YWRyloJliQo3PDdMWNVJCknEU8MoZPgEO6NIZMm8GpxONsDd/s1600/CarWiring1.jpg"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtN_JSJ89-vDOvWFD0C36AC44cNpNMbQXclLbLqjvywWbLTUenxNQuCeQKqktn4ADQao8n_2BwxhV3q7Cx-C55kcDovMF-YWRyloJliQo3PDdMWNVJCknEU8MoZPgEO6NIZMm8GpxONsDd/s400/CarWiring1.jpg" id="BLOGGER_PHOTO_ID_5602679057164135858" style="cursor: hand; display: block; height: 239px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a> <br />
<br />
I had to replace the stock steering motor/potentiometer setup with a servo motor to simplify the programming (I was able to eliminate the steering control-loop code). I also broke the ICSP header out of the box for programming (I had to wire up a switch to disconnect both the external power and the XBee's transmit line from the microcontroller in order to program it externally).<br />
<br />
I'm working on the code for wireless control/feedback. Eventually I intend on adding a camera, ultrasonic rangefinder and GPS module. <br />
<br />
Anyway, here is a short video on my car's first run with both functional steering/driving. (given the small area, I had it just do circles in my kitchen).<br />
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.blogger.com/video.g?token=AD6v5dzyI1YJ8a_nirZoA37sXN0hddhS4wU3CnyiBSIE-e1n-ebxZf7THYorzqRsCzD2vwijiLKlRSsDBarDeDet_g' class='b-hbp-video b-uploaded' frameborder='0'></iframe>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com0tag:blogger.com,1999:blog-8260024469636924539.post-54056097602985800102010-08-04T21:20:00.005-04:002015-01-02T11:30:19.558-05:00RC CarI'm currently putting together a new project where I'm attempting to automate the control of an old RC Car. I've ripped out the original guts except for the drive motor and steering motor. After numerous attempts at a DIY motor controller for the drive motor (and popping through transistors like popcorn), I've come to the conclusion that this motor needed a driver that could handle the spike currents without melting under pressure. Here is a list of all the parts that will be going into this project, and later I will provide updates on my progress.<br />
<br />
Parts:<br />
<br />
1. Original RC Chassis (including suspension and wheels/tires).<br />
2. Original RC 7.2V Battery Packs (2)<br />
3. Original rear-wheel drive motor (specs unknown except that it draws 3.5 A running and 15 A at stall).<br />
4. Original steering motor + original feedback potentiometer (specs also unknown)<br />
5. Boarduino (Atmega 328) controller<br />
6. 18v15 Pololu Motor Driver (to properly drive the rear motor)<br />
7. Dual Axis Compass Module - HMC6352<br />
8. Xbee Pro 60mW U.FL modules + antenna (2): for wireless communication<br />
9. A few 2n2907/2n2222 BJTs for driving the steering motor<br />
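Drivers like the 18v15 are typically commanded with a direction line plus a PWM duty (check Pololu's documentation for the board's actual pin names). The mapping from a single signed speed command to those two inputs is simple; here's a rough sketch of it in plain C++, where the struct fields and ranges are my own stand-ins rather than anything from my actual car code:

```cpp
#include <cassert>
#include <cstdlib>

// What gets written to a sign/magnitude motor driver: a level for the
// direction input and an 8-bit duty for analogWrite() on the PWM input.
// Field names are hypothetical placeholders.
struct DriveCommand {
    bool dirForward;   // level to write on the driver's direction input
    int  pwmDuty;      // 0..255 duty cycle for the PWM input
};

// Convert a signed speed command (-255..255, clamped) into driver inputs.
DriveCommand speedToDrive(int speed) {
    DriveCommand cmd;
    cmd.dirForward = (speed >= 0);
    cmd.pwmDuty = std::abs(speed);
    if (cmd.pwmDuty > 255) cmd.pwmDuty = 255;  // clamp to 8-bit PWM range
    return cmd;
}
```

On the Arduino side these two values would just go out through digitalWrite() and analogWrite() to whichever pins the driver is wired to.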
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8XzNzTnh34w2WTm-HQgYZ2OkxMOv0hQ5XfaEp8ehxBX-3aai6gKBUKEU_TCu8_HJ1mg1TUMNUckC1temlYuMzBcpbjeASBW0YALovap7rzfilHGKWpCfHQR2p5UUC6Umy3Crj06L_O8Nn/s1600/1.JPG"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8XzNzTnh34w2WTm-HQgYZ2OkxMOv0hQ5XfaEp8ehxBX-3aai6gKBUKEU_TCu8_HJ1mg1TUMNUckC1temlYuMzBcpbjeASBW0YALovap7rzfilHGKWpCfHQR2p5UUC6Umy3Crj06L_O8Nn/s320/1.JPG" id="BLOGGER_PHOTO_ID_5501732287046530386" style="cursor: hand; cursor: pointer; display: block; height: 214px; margin: 0px auto 10px; text-align: center; width: 320px;" /></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpJUYdVyCJs-QDNmbLjd8dbaPgOtVMMu7XtZKrqqES7Trv3lpCt7nBlT2kJmOPbM9-PAw3GpKwHO8V0bm81RhHoPmrh6l1PX7XqlSTK9bEDClbah6h8MYxIyYSd49KIiBmTsgXjpwAlgDk/s1600/2.JPG"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpJUYdVyCJs-QDNmbLjd8dbaPgOtVMMu7XtZKrqqES7Trv3lpCt7nBlT2kJmOPbM9-PAw3GpKwHO8V0bm81RhHoPmrh6l1PX7XqlSTK9bEDClbah6h8MYxIyYSd49KIiBmTsgXjpwAlgDk/s400/2.JPG" id="BLOGGER_PHOTO_ID_5501733461647661170" style="cursor: hand; cursor: pointer; display: block; height: 267px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com0tag:blogger.com,1999:blog-8260024469636924539.post-90624682955937576292010-04-25T01:23:00.011-04:002015-01-02T11:28:49.920-05:00Light Follower - 2 Axis<img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsT91SoyhyphenhyphenDDUJH7sXqEXpD94pGK4K8lA_wFtdp1ZNcD9FY212YbS5uWs28Qfx5Zvl81LpQBrRLF4naEFw3lVGlnmE0UY02Bg8WnnpWgPVlo209U6Zb1ntY0-NnTBQZbIx8Ifsda3-Vo19/s320/setup1.JPG" id="BLOGGER_PHOTO_ID_5463944400728215026" style="cursor: hand; display: block; height: 214px; margin: 0px auto 10px; text-align: center; width: 320px;" /><br />
<div align="left">
<div>
Here it is - my rendition of a 2-axis light follower. I went ahead and clipped the IR Cam from the Wiimote to cut down on weight. I put the camera, along with the required oscillator, in a project box (from Radio Shack). I also included an LED on the box; it lights up when light is seen by the camera. A serial port is used to connect the box to the microcontroller.</div>
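To give an idea of the tracking loop, here is a rough sketch of one pan/tilt update step, assuming the camera's 0-1023 x / 0-767 y coordinate range with (512, 384) as frame center. The step size, deadband, and sign conventions are hypothetical tuning values (the signs flip depending on how the servos are mounted), not the exact numbers from my code:

```cpp
#include <cassert>

// One proportional tracking step: nudge the pan/tilt servo angles toward
// the brightest blob the camera reports. Gains/deadband are placeholders.
void trackStep(int blobX, int blobY, int &panDeg, int &tiltDeg) {
    const int deadband = 30;            // ignore small errors to avoid jitter
    int errX = blobX - 512;             // horizontal error from frame center
    int errY = blobY - 384;             // vertical error from frame center
    if (errX >  deadband) panDeg  += 1;
    if (errX < -deadband) panDeg  -= 1;
    if (errY >  deadband) tiltDeg += 1;
    if (errY < -deadband) tiltDeg -= 1;
    // clamp to the usual 0..180 degree hobby-servo range
    if (panDeg  < 0)   panDeg  = 0;
    if (panDeg  > 180) panDeg  = 180;
    if (tiltDeg < 0)   tiltDeg = 0;
    if (tiltDeg > 180) tiltDeg = 180;
}
```

Calling this once per camera report and writing the two angles out with the Servo library is essentially all the follower does.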
<div>
</div>
<div>
Here are some Pictures:</div>
<br />
<div>
</div>
<img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGqJelmAzX3j8q2iZyYnFd-1qdgRMUGfwJ54s3sLvAEcqyt2ggmTgMHnc5OywLwiSGPvNjZglPBwRfU6jWnkYWor_81VBmE5-WCZkrf6veHd__nO7wvHQ3ofqGxPgB0mMnM6V4W_hIHf2i/s320/Inside+Camera.JPG" id="BLOGGER_PHOTO_ID_5463945322145565186" style="cursor: hand; display: block; height: 214px; margin: 0px auto 10px; text-align: center; width: 320px;" /><br />
<div align="left">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSQ9g7b0483mQSsCduR2fqyq4jLPUi2eQ7-VD2mpBMe1ihmoAU1rpKFE6_nI6o-k6-h73kpX-tOmQBXJy2mSg3c-nUMicY8lB9NhVLizTFatUtqkg3BZ9Pac5I5gOjEup9g2kx2ThD8vt0/s1600/Parts.JPG"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSQ9g7b0483mQSsCduR2fqyq4jLPUi2eQ7-VD2mpBMe1ihmoAU1rpKFE6_nI6o-k6-h73kpX-tOmQBXJy2mSg3c-nUMicY8lB9NhVLizTFatUtqkg3BZ9Pac5I5gOjEup9g2kx2ThD8vt0/s320/Parts.JPG" id="BLOGGER_PHOTO_ID_5463943741801855666" style="cursor: hand; display: block; height: 214px; margin: 0px auto 10px; text-align: center; width: 320px;" /></a></div>
<div align="left">
Here is a video:</div>
<div align="center">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.blogger.com/video.g?token=AD6v5dyA8FszCKBF7rAr4cWqP32zPLQkB2C3y79zejCJ4FhhD1chvHV3rQoe9VNYREgcldPD2QmPlLuAC47Pah1eSA' class='b-hbp-video b-uploaded' frameborder='0'></iframe></div>
<div align="center">
<a href="http://www.youtube.com/watch?v=y41o7KieRgw" target="_new">http://www.youtube.com/watch?v=y41o7KieRgw</a></div>
<div align="left">
</div>
<div align="left">
Here is a link to the code:</div>
<div align="left">
<a href="http://www.wiimoteproject.com/wiimote-accelerometer-and-motions-detecting-projects/light-follower-2-axis/">http://www.wiimoteproject.com/wiimote-accelerometer-and-motions-detecting-projects/light-follower-2-axis/</a></div>
</div>
Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com6tag:blogger.com,1999:blog-8260024469636924539.post-31988863935717394072010-03-17T02:26:00.006-04:002010-03-17T02:37:07.171-04:00Nintendo nunchucks as Orientation SensorsMy senior design project was to make a wirelessly controlled robotic arm, that mimics human arm movements. The closest we got was the movement of a shoulder joint and an elbow joint at a very high accuracy and low time delay.<br /><br />I made a sensor system out of 2 Wii Nunchucks, an Arduino and some external circuitry to switch between nunchuck sensor reading. At a deadline of one of our presentations, a plastic gear from our arm had chipped some of its teeth so we weren't able to give motion demos. I whipped up a program using Processing (processing.org) to communicate with the Arduino and move a 3D simulation of an arm based on the sensor outputs.<br /><br />I thought I'd share a few screenshots of the program. I'll have both the Arduino code, and processing code up soon.<br /><br /><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh25iL2T7TdmGSXCY95CcJ07yFQSVdtz8dEh6o9UbJQypg0RkOZ3SkvlvZOJ8kSwcoUKDf2Xd1eXOdP0gl0hcaniDxjiFbtVVtRMRiKqDDrz8dvkn_ArGkm56oNspVjXlGr3z5QHe1kNHel/s1600-h/wrist.jpg"><img id="BLOGGER_PHOTO_ID_5449487608098059090" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; CURSOR: hand; HEIGHT: 248px; TEXT-ALIGN: center" alt="" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh25iL2T7TdmGSXCY95CcJ07yFQSVdtz8dEh6o9UbJQypg0RkOZ3SkvlvZOJ8kSwcoUKDf2Xd1eXOdP0gl0hcaniDxjiFbtVVtRMRiKqDDrz8dvkn_ArGkm56oNspVjXlGr3z5QHe1kNHel/s400/wrist.jpg" border="0" /></a><br /><div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg53VmhJ1gDMZDWr2tX2p9CsJymZTwTDZ_pzI-0agmiBBNJMWVqH5_j6Ioyo1FVB8C1yE9BAj8-vBhIRqhVkWQPUxzbwRNwWLh7MLJ1x5BMltcstSZaDcW7pskFS9ohy9MGcR0bokFSYTNx/s1600-h/uarmfarm.jpg"><img id="BLOGGER_PHOTO_ID_5449487542046225938" style="DISPLAY: block; 
MARGIN: 0px auto 10px; WIDTH: 400px; CURSOR: hand; HEIGHT: 332px; TEXT-ALIGN: center" alt="" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg53VmhJ1gDMZDWr2tX2p9CsJymZTwTDZ_pzI-0agmiBBNJMWVqH5_j6Ioyo1FVB8C1yE9BAj8-vBhIRqhVkWQPUxzbwRNwWLh7MLJ1x5BMltcstSZaDcW7pskFS9ohy9MGcR0bokFSYTNx/s400/uarmfarm.jpg" border="0" /></a><br /><br /><div> </div><div><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifVALKsZnhNyIYfipTw0Zpg2sUbjURojXZu_tXWlK_2IFWYNCDgzivTBa8N4OwsES94-0Imtrfsa3X7Cg8LkuBxfQwmVAYnvddt6-PdlRSZDqLbdTiMGjwlcqchTbgiUTSim5aTI4rbOGX/s1600-h/elbow.jpg"><img id="BLOGGER_PHOTO_ID_5449487454794654466" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; CURSOR: hand; HEIGHT: 338px; TEXT-ALIGN: center" alt="" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifVALKsZnhNyIYfipTw0Zpg2sUbjURojXZu_tXWlK_2IFWYNCDgzivTBa8N4OwsES94-0Imtrfsa3X7Cg8LkuBxfQwmVAYnvddt6-PdlRSZDqLbdTiMGjwlcqchTbgiUTSim5aTI4rbOGX/s400/elbow.jpg" border="0" /></a></div></div><br /><p> </p><p>Processing is a great program for doing graphical manipulations. It can also compile code to executable files.</p>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com0tag:blogger.com,1999:blog-8260024469636924539.post-22381652180383187792010-03-17T00:03:00.001-04:002015-01-02T11:29:47.920-05:00Wiimote light follower with servoEverybody is familiar with the infamous Wiimote. When I look at it, I think about all the useful sensors/gadgets that this little $40 package (new) comes with. Recently I've been playing with the IR Camera (it's really just a light sensing camera with an IR filter). This particular camera is a standalone module that outputs coordinates of the 4 brightest "images", all via I2C communication.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg52ui9LSxxwI9Mfeh0WLHd5vRpsDO6qENO6mWkmaDZSO0oQvSLSMup7XYB27_aOU5pzIu6DOtEoEDxNOF077HprG4txkWjcjXtnerkh84KgoddA-ebPvlEz4QzlfoETu-9zskFm6-ru_ek/s1600-h/top.JPG"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg52ui9LSxxwI9Mfeh0WLHd5vRpsDO6qENO6mWkmaDZSO0oQvSLSMup7XYB27_aOU5pzIu6DOtEoEDxNOF077HprG4txkWjcjXtnerkh84KgoddA-ebPvlEz4QzlfoETu-9zskFm6-ru_ek/s400/top.JPG" id="BLOGGER_PHOTO_ID_5449455394347295842" style="cursor: hand; display: block; height: 400px; margin: 0px auto 10px; text-align: center; width: 267px;" /></a><br />
I've only seen hacks with the Wiimote cam where the camera is desoldered/removed from the Wiimote. However, at $40 a pop that seemed like a waste of a perfectly good Wiimote. Instead of removing the cam, I only made 1 small modification, which was drilling a very tiny hole near the camera and soldering a connection to its "Clock" pin (which needs a 24 MHz signal to replace the internal oscillator). Once you have this done, all you need to do is plug a cord into the Wiimote peripheral port to use anything on its I2C bus.<br />
Moving on, I attached the Wiimote on a homemade stand that was fixed onto a continuous-rotation servo motor (servo without feedback). Add a little duct tape, and that servo isn't moving for at least an hour.<br />
<br />
<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSUvBTOIs0dcna68DnlAdzKyKJMgaMRD1tdrgDAfd58mS2v8Hz1RGBvRZxSD5P93hCW2EXvmT7yJTaN-jLFdpWRNGnniD-fRjti72GZZ1VbCx_Y3aMH09xA11-D445fWHzGhYIIG3j2XUO/s1600-h/side+view.JPG"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSUvBTOIs0dcna68DnlAdzKyKJMgaMRD1tdrgDAfd58mS2v8Hz1RGBvRZxSD5P93hCW2EXvmT7yJTaN-jLFdpWRNGnniD-fRjti72GZZ1VbCx_Y3aMH09xA11-D445fWHzGhYIIG3j2XUO/s400/side+view.JPG" id="BLOGGER_PHOTO_ID_5449454973304513730" style="cursor: hand; display: block; height: 267px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
Using an IR Camera library already created (thanks to Hobley – <a href="http://www.stephenhobley.com/">http://www.stephenhobley.com/</a>), I used an Arduino to receive points from the camera and follow the first object (light source) it notices. </div>
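With a continuous-rotation servo, the follow behavior boils down to mapping the blob's horizontal error to a pulse width around the servo's 1500 µs "stop" point. Here's a minimal sketch of that mapping; the 400 µs maximum offset and the sign convention are placeholders to tune for your servo and mounting, not values from my sketch:

```cpp
#include <cassert>

// Map the tracked blob's x coordinate (0..1023, center 512) to a pulse
// width in microseconds for a continuous-rotation servo. 1500 us means
// "hold still"; the proportional offset sets turn speed and direction.
int blobToServoPulseUs(int blobX) {
    int err = blobX - 512;                // negative: light left of center
    long offset = (long)err * 400 / 512;  // scale error to +/-400 us max
    return 1500 + (int)offset;
}
```

The returned value would go straight into something like the Servo library's writeMicroseconds() each time the camera reports a blob.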
<div>
<br />
The code can be found here:</div>
<div>
<a href="http://www.wiimoteproject.com/wiimote-accelerometer-and-motions-detecting-projects/wiimote-light-follower-with-servo">http://www.wiimoteproject.com/wiimote-accelerometer-and-motions-detecting-projects/wiimote-light-follower-with-servo</a></div>
<div>
</div>
<br />
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.blogger.com/video.g?token=AD6v5dxa2vIgRjSFMFE3jU6012hjPaHe9lHf6umyOH80ub1ID2-VGA5JzzS3PD2Pe9vwW8APCMKSfT3PzonBLGJX5g' class='b-hbp-video b-uploaded' frameborder='0'></iframe><br />
<a href="http://www.youtube.com/watch?v=Nj7UqjP-z6U">http://www.youtube.com/watch?v=Nj7UqjP-z6U</a>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com8tag:blogger.com,1999:blog-8260024469636924539.post-33379080321537097442010-03-15T23:07:00.000-04:002015-01-02T11:29:32.606-05:00Two Wii Nunchucks with one arduinoIn the midst of a senior design project, it was decided that we wanted to use 2 Wii Nunchucks as accelerometers to measure orientation of a human arm (1 for the upper arm and 1 for the forearm). If you understand I2C communication, you'll see there is no way to use 2 Nunchucks on the same I2C bus without some sort of external circuitry (all Nunchucks have the same slave address, leaving nothing to distinguish between the two when attempting to receive data).<br />
<br />
I drew up a simple and cheap solution to interface two (or more) Wii Nunchucks on the same I2C bus. This is useful for projects that require multiple accelerometers at a cheap price.<br />
<br />
Here is all you will need:<br />
<br />
2 npn switching transistors (I used 2N3904)<br />
2 current limiting resistors (I used 200 Ohm)<br />
<br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0eNsmZSh2LiwKG9H0nj1JYenH7azymZItkgtuNUkdJAqHg0snIgXRalOOkkw5z2Ws0AWPXbOOqfDeE_gPlLuNpPUJxXw6uPnP_PFI5AaK-77Y5cG8L64gRbtRnHg9V442QHqm1LDVyNv6/s1600-h/diagram2nunchucks.jpg"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0eNsmZSh2LiwKG9H0nj1JYenH7azymZItkgtuNUkdJAqHg0snIgXRalOOkkw5z2Ws0AWPXbOOqfDeE_gPlLuNpPUJxXw6uPnP_PFI5AaK-77Y5cG8L64gRbtRnHg9V442QHqm1LDVyNv6/s400/diagram2nunchucks.jpg" id="BLOGGER_PHOTO_ID_5449068941803477730" style="cursor: hand; display: block; height: 319px; margin: 0px auto 10px; text-align: center; width: 391px;" /></a><br />
Just connect all nunchuck Power (PWR), Clock (SCL), and Ground (GND) wires to the same corresponding spots on your microcontroller. The microcontroller's SDA line can be connected to the outputs of both transistors. <br />
<br />Programming notes: <br />
<br />In order to perform a read, all you have to do is set the pin of the corresponding transistor to HIGH (5 V in our case), write/read to the I2C bus, then set that pin to LOW (0 V) to disconnect that nunchuck from the bus. Also, during startup you must initialize each nunchuck individually in order to operate both nunchucks correctly.<br />
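To make that sequence concrete, here is a sketch of the select/read/deselect pattern in plain C++, with digitalWrite() and the actual Wire.h traffic stubbed out so the gating logic can run anywhere. The pin numbers and helper names are hypothetical; in a real Arduino sketch the stubbed read would be the usual Wire.beginTransmission()/Wire.requestFrom() nunchuck traffic:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of pin/bus activity so the gating logic can be checked off-hardware.
std::vector<std::string> busLog;

// Stand-in for Arduino's digitalWrite(): records the level driven onto the
// base resistor of one nunchuck's switching transistor.
void digitalWrite(int pin, bool level) {
    busLog.push_back("pin" + std::to_string(pin) + (level ? "=HIGH" : "=LOW"));
}

// Stand-in for the real Wire.h request/receive traffic.
void readNunchuckOverI2C() {
    busLog.push_back("i2c-read");
}

// The pattern from the post: connect one nunchuck's data line through its
// transistor, do the read, then release the bus for the other nunchuck.
void readNunchuck(int selectPin) {
    digitalWrite(selectPin, true);   // transistor on: this nunchuck owns SDA
    readNunchuckOverI2C();
    digitalWrite(selectPin, false);  // transistor off: bus free again
}
```

Calling readNunchuck() for each select pin in turn in loop() gives you both accelerometers over the one bus.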
Good luck,<br />
B Dwyer (aka: johnnyonthespot)<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEYOhpcu-vLzDczmn-bHofePZjOExf1Ch3oP_ylduFbAqcet-enYBHynbZKnMN_GHDMdzYLKfpn3n72-JIoFNgS9lL7OjkSgRqkHZkw5Bggc8vjWGEhoPF5H9rFfh8C6p8MJrhY6OTRmxO/s1600-h/diagram2nunchucks.jpg"></a>Brian Dwyerhttp://www.blogger.com/profile/09292143349696415766noreply@blogger.com8