Wednesday, March 15, 2017

Zero-Copy: CUDA, OpenCV and NVidia Jetson TK1: Part 2

In this Part 2 post I want to illustrate the difference in technique between the common 'device copy' method and the 'unified memory' method, which is better suited to memory architectures such as NVidia's Tegra K1/X1 processors used on the NVidia Jetson development kits. I'll show an example using just a CUDA kernel as well as an example utilizing OpenCV gpu:: functions.


1. CUDA kernels: Device Copy method


For this example, I've written a simple CUDA kernel that takes a fixed matrix (640x480) of depth values (delivered by an Xbox 360 Kinect) and converts it to XYZ coordinates while rotating the points. This example only computes the Y dimension, but I can provide a full XYZ function as well; the math is fairly simple. The code may seem a bit intense, but try not to think about what's inside the CUDA kernel for now.

Kernel Code:

__global__ void cudaCalcXYZ_R2( float *dst, float *src, float *M, float heightCenter, float widthCenter, float scaleFactor, float minDistance)
{
    //Row of the rotation matrix needed for the Y output, shared by the whole block
    __shared__ float shM[3];
    float nx, ny, nz, nzpminD, jFactor;
    int index;

    if(threadIdx.x == 0)
    {
        shM[0] = M[4];
        shM[1] = M[5];
        shM[2] = M[6];
    }

    index = blockIdx.x*blockDim.x + threadIdx.x;   //One thread per pixel
    nz = src[index];                               //Raw depth value
    jFactor = ((float)blockIdx.x - heightCenter)*scaleFactor;
    nzpminD = nz + minDistance;
    nx = ((float)threadIdx.x - widthCenter)*(nzpminD)*scaleFactor;
    ny = (jFactor)*(nzpminD);

    //Solve for only the Y matrix (height values)
    __syncthreads();   //Make sure shM[] is populated before it is read
    dst[index] = nx*shM[0] + ny*shM[1] + nz*shM[2];
}

Basically, a float pointer is passed in as src (the depth data) and manipulated to produce the 'Y' values, which are stored in another float pointer, dst. In a device copy implementation of the CUDA kernel, the data pointed to by src must first be copied to device memory using cudaMemcpy(). Below is an example of how to do this ('h' prefixes mean host (CPU) pointers, while 'd' prefixes mean device (GPU) pointers):

{
    int rows = 480;
    int cols = 640;
    float *h_src, *h_dst;   //Host matrices
    float *d_src, *d_dst;   //Device matrices
    float *h_m, *d_m;       //4x4 rotation matrix (host/device)

    //Allocate device copies using cudaMalloc
    cudaMalloc( (void **)&d_src, sizeof(float)*rows*cols);
    cudaMalloc( (void **)&d_dst, sizeof(float)*rows*cols);
    cudaMalloc( (void **)&d_m, sizeof(float)*16);

    //Allocate host pointers
    h_src = (float*)malloc(sizeof(float)*rows*cols);
    h_dst = (float*)malloc(sizeof(float)*rows*cols);
    h_m   = (float*)malloc(sizeof(float)*16);

    //(fill h_src with depth data and h_m with the rotation matrix here)

    //Copy the input matrices from host to device
    cudaMemcpy( d_src, h_src, sizeof(float)*rows*cols, cudaMemcpyHostToDevice);
    cudaMemcpy( d_m, h_m, sizeof(float)*16, cudaMemcpyHostToDevice);

    //Run the kernel: one block per row, one thread per column
    cudaCalcXYZ_R2<<< rows, cols >>>(d_dst, d_src, d_m, 240, 320, 0.0021, -10);

    //Wait for the GPU to finish
    cudaDeviceSynchronize();

    //Copy the result back to host memory
    cudaMemcpy( h_dst, d_dst, sizeof(float)*rows*cols, cudaMemcpyDeviceToHost);
}
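One thing the snippet above leaves out is cleanup. When you are done with the buffers (at the end of that same block), the device allocations should be released with cudaFree() and the host allocations with free():

    //Free device memory
    cudaFree(d_src);
    cudaFree(d_dst);
    cudaFree(d_m);

    //Free host memory
    free(h_src);
    free(h_dst);
    free(h_m);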


2. CUDA kernels: Unified Memory method


Here we are going to use the same kernel as in the example above, but this time we avoid the memory copies altogether by using CUDA unified memory (UVA). Instead of cudaMalloc(), we use cudaMallocManaged():

{
    cudaSetDeviceFlags(cudaDeviceMapHost); //Support for mapped pinned allocations

    int rows = 480;
    int cols = 640;
    float *h_src, *h_dst;   //Src and Dst matrices
    float *h_m;             //4x4 rotation matrix

    //Allocate the buffers for CUDA. No need to allocate host and device copies separately
    cudaMallocManaged(&h_src, sizeof(float)*rows*cols);
    cudaMallocManaged(&h_dst, sizeof(float)*rows*cols);
    cudaMallocManaged(&h_m,   sizeof(float)*16);

    //Run the kernel
    cudaCalcXYZ_R2<<< rows, cols >>>(h_dst, h_src, h_m, 240, 320, 0.0021, -10);

    //Wait for the GPU to finish
    cudaDeviceSynchronize();

    //Done, h_dst now contains the results
}

So now we have completely eliminated copying roughly 1.23MB (640x480x4 bytes) before running the kernel, as well as copying another 1.23MB back after the kernel has finished. Imagine trying to achieve real-time performance on a robot reading a Kinect sensor at 30FPS while needlessly copying more than 73MB a second back and forth within the same physical RAM!

[Note]: This zero-copy benefit only applies on a unified memory architecture such as the NVidia Tegra K/X processors, where the CPU and GPU share the same physical RAM. On a discrete GPU in your laptop or desktop the copies still happen behind the scenes, so don't expect the same speedup there.
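If you want your code to check at runtime whether it is actually on a unified-memory part before taking this path, the CUDA runtime can tell you. A minimal sketch (the fields below are standard members of cudaDeviceProp):

{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   //Query device 0

    //'integrated' is non-zero on SoCs (e.g. Tegra) where the CPU and GPU share physical RAM
    //'canMapHostMemory' indicates support for mapped (zero-copy) host allocations
    bool sharedMemoryArch = (prop.integrated != 0) && (prop.canMapHostMemory != 0);

    //Only take the zero-copy path when sharedMemoryArch is true
}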


3. OpenCV GPU functions: Device Copy method

OpenCV ships with a module called 'gpu', written in CUDA, that provides GPU-accelerated versions of various functions. There is plenty of documentation online on how to use it, so I will only go over the very basics. The example we will use is the per-element multiplication of two matrices a and b, with the result stored in c. Using the 'device copy' method, here is how to do so with OpenCV's gpu::multiply():

{
    //variables/pointers
    int rows = 480;
    int cols = 640;

    float *h_a, *h_b, *h_c;
    float *d_a, *d_b, *d_c;

    //Allocate memory for host pointers
    h_a = (float*)malloc(sizeof(float)*rows*cols);
    h_b = (float*)malloc(sizeof(float)*rows*cols);
    h_c = (float*)malloc(sizeof(float)*rows*cols);

    //Allocate memory for device pointers
    cudaMalloc( (void **)&d_a, sizeof(float)*rows*cols);
    cudaMalloc( (void **)&d_b, sizeof(float)*rows*cols);
    cudaMalloc( (void **)&d_c, sizeof(float)*rows*cols);

    //Mats (declaring them using the available host pointers)
    Mat hmat_a(cvSize(cols, rows), CV_32F, h_a);
    Mat hmat_b(cvSize(cols, rows), CV_32F, h_b);
    Mat hmat_c(cvSize(cols, rows), CV_32F, h_c);

    //GpuMats (declaring them using the available device pointers)
    gpu::GpuMat dmat_a(cvSize(cols, rows), CV_32F, d_a);
    gpu::GpuMat dmat_b(cvSize(cols, rows), CV_32F, d_b);
    gpu::GpuMat dmat_c(cvSize(cols, rows), CV_32F, d_c);

    //Assuming the host matrices are filled with actual data, copy them to the device matrices
    dmat_a.upload(hmat_a);
    dmat_b.upload(hmat_b);

    //Run gpu::multiply()
    gpu::multiply(dmat_a, dmat_b, dmat_c);

    //Copy the result back to the host
    dmat_c.download(hmat_c);

    //Result is now in hmat_c; this required copying matrices a, b and c...
}


4. OpenCV GPU functions: Unified Memory method


You'll notice that in the above example I allocated memory for the image data myself and handed the pointers to OpenCV, rather than letting OpenCV allocate memory when a Mat or GpuMat is declared. This is required for this section on using OpenCV GpuMats without having to upload and download data to and from GPU memory on chips such as the Jetson's. There is another, less obvious reason I use this method: for real-time performance on embedded processors, it is more efficient to allocate memory for objects early on, before any operations that run cyclically. As long as you can spare the memory, this is an effective way to increase performance (the trade-off being some RAM that is never freed up). If you do need to free these allocations dynamically, look into cudaFree() and cudaFreeHost().
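For reference, the pairing is simple (the buffer names here are just placeholders): anything from cudaMalloc() or cudaMallocManaged() is released with cudaFree(), and pinned host memory from cudaHostAlloc()/cudaMallocHost() is released with cudaFreeHost():

    cudaFree(d_or_managed_buffer);   //From cudaMalloc() / cudaMallocManaged()
    cudaFreeHost(h_pinned_buffer);   //From cudaHostAlloc() / cudaMallocHost()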

Now, on to eliminating the OpenCV upload() and download() calls.

{
    cudaSetDeviceFlags(cudaDeviceMapHost); //Support for mapped pinned allocations

    //variables/pointers
    int rows = 480;
    int cols = 640;

    float *h_a, *h_b, *h_c;

    //Allocate unified (managed) memory; these pointers are valid on both CPU and GPU
    cudaMallocManaged(&h_a, sizeof(float)*rows*cols);
    cudaMallocManaged(&h_b, sizeof(float)*rows*cols);
    cudaMallocManaged(&h_c, sizeof(float)*rows*cols);

    //Mats (declaring them using the pointers)
    Mat hmat_a(cvSize(cols, rows), CV_32F, h_a);
    Mat hmat_b(cvSize(cols, rows), CV_32F, h_b);
    Mat hmat_c(cvSize(cols, rows), CV_32F, h_c);

    //GpuMats (declaring them with the same pointers!)
    gpu::GpuMat dmat_a(cvSize(cols, rows), CV_32F, h_a);
    gpu::GpuMat dmat_b(cvSize(cols, rows), CV_32F, h_b);
    gpu::GpuMat dmat_c(cvSize(cols, rows), CV_32F, h_c);

    //Run gpu::multiply()
    gpu::multiply(dmat_a, dmat_b, dmat_c);

    //Result is now in hmat_c, no copying required!
}

Much like the CUDA unified memory example, this trick only makes sense on hardware with a unified memory architecture (the Jetson ICs, for example). Now you no longer need to bother with OpenCV's upload and download methods in your algorithms.
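If you use this pattern a lot, it is handy to wrap it in a small helper that returns both views of the same managed buffer. This is just a sketch under the same assumptions as above (OpenCV 2.x gpu module, unified-memory hardware); the helper name is my own:

//Allocate one managed buffer and wrap it with both a Mat (CPU view) and a GpuMat (GPU view)
void createSharedMat(int rows, int cols, int type, Mat& hMat, gpu::GpuMat& dMat)
{
    void* ptr = NULL;
    cudaMallocManaged(&ptr, rows * cols * CV_ELEM_SIZE(type));
    hMat = Mat(rows, cols, type, ptr);          //CPU view
    dMat = gpu::GpuMat(rows, cols, type, ptr);  //GPU view of the same memory
}

After a gpu:: call writes into the GpuMat view, a cudaDeviceSynchronize() before reading the Mat view on the CPU is a cheap way to stay safe.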

Enjoy the speedups!

Wednesday, March 8, 2017

Zero-Copy: CUDA, OpenCV and NVidia Jetson TK1: Part 1

If you aren't yet familiar with NVidia's embedded ECU releases (the Jetson TK1, TX1 and, coming soon, TX2), they are definitely something to dig into. NVidia has been embedding their GPU architectures on the same IC as decent-speed processors (like a quad-core ARM Cortex-A15). The Jetson TK1 is by far the most affordable ($100-200) and is an excellent option for bringing some high-performance computing to your mobile robotics projects. I'm posting my findings on a slight difference between programming with CUDA on NVidia discrete GPUs and on NVidia's embedded TK/TX platforms (with regard to their memory architecture).

CUDA is a C-based framework developed by NVidia to allow developers to write code for parallel processing using NVidia's GPUs. Typically the main CPU is considered the 'Host' while the GPU is considered the 'Device'. The general flow for using the GPU for general purpose computing is as follows:
  1. CPU: Transfers data from host-memory to device-memory
  2. CPU: Command CUDA process to run on GPU
  3. CPU: Either do other work or block (waiting) until the GPU has finished
  4. CPU: Transfer data from device-memory to host-memory 
This is the general flow, and you can read up on it in more depth elsewhere. As for the GPU and CPU on the NVidia TK/TX SoCs: they have a Unified Memory Architecture (UMA), meaning they share RAM. Typical discrete GPU cards have their own RAM, hence the necessity of copying data to and from the device. When I learned this, I figured the memory-copy process could be eliminated altogether on the Jetson!

I started out learning how to do this purely with simple CUDA kernels. Generally, a CUDA function (or kernel) cannot operate directly on ordinary CPU-side pointers and data types. I came across several different memory methods for using CUDA:
  • CUDA Device Copy
  • CUDA Zero Copy
  • CUDA UVA (Unified Memory)
It took me a while to get used to these different techniques, and I was not sure which one was appropriate, so I did some profiling to find out which gave me the best speed-up (with Device Copy as the baseline). It turned out that CUDA UVA was the best method for the Jetson TK1's embedded GPU.

However, I still ran into a problem using OpenCV on the Jetson. OpenCV has a CUDA module, but it is designed around two different matrix types: Mat for the CPU and gpu::GpuMat for the GPU, so you cannot call OpenCV gpu:: functions on CPU Mat objects. OpenCV essentially has you follow the same 'device copy' pattern as plain CUDA, using its upload/download methods to copy a CPU Mat to the GPU and vice versa. When I realized this, I was stunned that there was no unified memory path (to my knowledge) in OpenCV. So all OpenCV gpu:: functions required needless memory copying on the Jetson! On an embedded device this is an extreme bottleneck, as I was already hitting the wall with my programs working with the Kinect IR sensor and image data.

So after quite a bit of sandbox-style experimentation, I found the correct approach to casting Mat pointers into GpuMat pointers without doing any memory copy while maintaining the CUDA UVA style. My original program with my Kinect sensor ran at 7-10FPS, and that was with cutting the width and height down from 640x480 to 320x240. With my new approach of avoiding any memory copy, I was able to achieve a full 30FPS at the full 640x480 resolution (and that is on all of the depth data from the IR sensor).

I will post code on my github and update this with the link soon.

Move on to Part 2 for examples

Monday, March 6, 2017

Ethernet-Based IMU for ROS

So I've been doing a bit of development with the Robot Operating System (ROS), and a while back I needed a flexible solution for acquiring inertial measurement data from multiple sensors. My intention was to have an embedded controller responsible for collecting IMU data and delivering it to a master controller. Since in ROS it is easy to swap between a laptop and an embedded Linux device as your platform, I decided to make a portable Ethernet-based IMU out of the MPU9150 and a Raspberry Pi (or pcDuino).

I had previously written my own C/C++ Linux API for communicating with the MPU9150 (3-axis gyroscope, accelerometer and magnetometer). This project extends it with an embedded server that sends periodic packets containing the latest IMU data (so, as of now, the embedded IMU device does not run ROS). I wrote a ROS node to connect to this server, receive the data and publish it as a ROS message. Everything seems to be working out quite well, and I am currently receiving IMU data at 100Hz over Ethernet (less over WiFi). I can now add additional Ethernet-based IMUs to the project with little extra complexity. Below is my wireless version.

Pcduino-3, MPU9150 and a USB battery pack

For show, I set up a battery-powered pcDuino-3 connected to an IMU as the server over WiFi (eliminating wires, which get in the way for a hand-held demo). On my ROS device (a laptop) I have a node running that is dedicated to receiving the IMU data packets and publishing them in ROS as a sensor message that other ROS nodes can subscribe to. Below is a video of real-time plotting of the received IMU data using rqt. The video is not the best quality because my phone is not so great, but you can see that the top plot is the Gyro-Z measurement and the bottom plot is the accelerometer x-axis measurement.


I also wrote an additional ROS node to subscribe to the IMU messages and perform some integration on the gyro data to estimate the sensor's orientation around the z-axis (similar to what a compass would tell you). You can see in this short video that the integration is somewhat reliable. There is definitely gyro drift over time, which would end up corrupting the estimate in the long term, but that's where filtering techniques come in to assist with state estimation (for example, Kalman filtering). Video below:



I will be posting my code for this project on my GitHub soon; I will update this once it is ready (and hopefully with better videos).


Wednesday, December 10, 2014

Self-Balancing Robot

In the last class of my Masters, I decided to build a self-balancing robot for the final project. What I liked about this project is that it involved a few very relevant areas of embedded systems all in one place. I'll try to write this post the way I wrote the presentation.

The class was 'Mixed Signal Embedded Systems', and it revolved around the Cypress PSoC4 (programmable system-on-chip). The chip is interesting in that it has both an ARM Cortex-M0 processor AND some (emphasis on 'some') programmable logic (PLDs). It also has 2 embedded amplifiers and some other interesting HW components (all configurable from the ARM via registers). Anyway, this post is not about this chip, but feel free to read up on it; it is another example of the direction in which embedded systems are heading.

Motivations for this project

  • Interface to inertial measurement sensors (Gyros/Accelerometers)
  • Employ light-duty sensor fusion for robot pose estimation
  • Implement an embedded PID controller to maintain robot balance

The problem: The Inverted Pendulum



The goal of the inverted pendulum problem is to keep the pendulum from tipping over. Without intervention, the pendulum is naturally unstable and will eventually fall. A controller must counteract this tendency by applying a force in the horizontal direction.

Inertial Measurement Sensors

The MPU9150 was used for inertial measurement sensing in this project. The MPU9150 is:

  • a 3-axis accelerometer
  • a 3-axis gyro
  • a 3-axis compass
  • ...all integrated in a single IC

The benefit of having these components on a single chip is that the error caused by misaligned axes is greatly reduced and controlled by the manufacturing process. This also makes the packaging both smaller and cheaper.


The sensor data is accessed over the I2C protocol, so a microcontroller is needed to read it (I wrote libraries in both C and C++, which I will post to my GitHub page and link here).

Accelerometers


Accelerometers measure linear acceleration. The diagram above shows the 3-axis accelerometer placement inside the MPU9150. When an accelerometer is stationary, it measures a net acceleration equal to the gravitational force of 9.8 m/s^2 (at least on Earth). The orientation of this sensor with respect to the gravitational force (let's call it Earth's z-axis) can be estimated using trigonometry:
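A common form of this estimate, assuming the angle of interest lies in the sensor's x-z plane (ax and az being the raw accelerometer readings along the x and z axes):

    //Tilt angle from the accelerometer alone, in radians
    float accelAngle = atan2f(ax, az);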


This estimate is accurate only when no external forces are being applied. When external forces are applied, such as the robot moving or tipping, the orientation cannot be accurately estimated from the accelerometer alone.

Gyros



Gyros measure the rate of angular change. The above diagram shows the 3-axis gyro placement inside the MPU9150. Gyros are not susceptible to linear forces such as vibrations or movement; they are only sensitive to angular displacements. Gyros cannot be used to directly measure physical orientation, but their readings can be integrated over time.


dt is the control loop time.
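The integration itself is just one accumulation per loop iteration (gyroRate being the calibrated gyro reading):

    //Accumulate the gyro rate once per control loop
    gyroAngle = gyroAngle + gyroRate * dt;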

Gyros suffer from a phenomenon known as drift, where they show a small rate of change even though no actual rate of change is occurring physically. When integrating, this error adds up over time, causing an inaccurate estimate of angle.

Also note from the 2 pictures of the axis alignment: the x-axis gyro is perpendicular to the y-z plane of the accelerometer. So the angle using the accelerometer's x-z plane (as shown in the equation) would be the same angle using the gyro's y-axis.

Estimating Robot Pose


In order to accurately estimate angle using gyros and accelerometers, one has to combine these sensor readings in a way that takes advantage of their strengths in order to make up for their weaknesses. This technique is known as sensor fusion. There are a few different pose estimation methods, each with its own strengths and weaknesses. For this project, one that is easily implemented on a low-power microcontroller, called complementary filtering, is used. The equation below shows how this is implemented:


Alpha is used to weight the two angle estimates before they are combined. Since the accelerometer is more prone to adding noise to the system, a large alpha is chosen to give the gyro integration more weight, effectively applying a low-pass filter to the accelerometer. For this project, an alpha of 0.99 was used (along with a control loop time of 10 milliseconds).
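In code the whole filter is a single line per loop iteration. A sketch, assuming accelAngle and gyroRate come from the snippets above and angle holds the fused estimate:

    #define ALPHA 0.99f   //Weighting factor
    #define DT    0.01f   //10 ms control loop

    //Complementary filter: trust the gyro in the short term, the accelerometer in the long term
    angle = ALPHA * (angle + gyroRate * DT) + (1.0f - ALPHA) * accelAngle;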

PID Control System



Proportional-Integral-Derivative (PID) control is a feedback technique that acts on the error between a set-point and the observed output of a system (often referred to as the plant). A PID controller must be tuned by varying the 3 gains associated with the control algorithm: Kp, Ki and Kd. Different choices for these constants shape the system's response in terms of response time, overshoot and oscillation.

Kp: The proportional term. Depends on the present error only.
Ki: Integral term. Depends on the accumulation of past errors.
Kd: Derivative term. Prediction of future errors.

The following is a pseudo-code example of implementing a PID algorithm in a control loop.
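A representative C sketch of such a loop body is shown below; the gain names match the text, while DT and the state variables are my own illustration:

    //Called once per control period (every DT seconds)
    float pidUpdate(float target, float measured)
    {
        static float integral  = 0.0f;
        static float prevError = 0.0f;

        float error = target - measured;          //Present error
        integral   += error * DT;                 //Accumulation of past errors
        float deriv = (error - prevError) / DT;   //Trend of the error
        prevError   = error;

        return Kp * error + Ki * integral + Kd * deriv;
    }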


The output of the PID algorithm is the sum of the gain terms, each multiplied by its function of the error. The error is simply the difference between the target (in this case an angle) and the observed state (the angle estimated by the sensor fusion algorithm). It is important to keep the control loop rate consistent (so use a timer interrupt, or avoid doing extra processing in the loop).


Results


I built my robot using parts from one of my other robots to save on cost. I built a cheap frame out of small wood squares, a dowel and hot glue (all bought at Michaels for less than $5):


Starting off with just a P controller (Kp > 0, Ki = Kd = 0), the system was nowhere near useful. The robot would over-shoot and bang itself on the floor; I thought it would destroy itself before I could stabilize it. Fortunately, after adding some derivative gain (Kd), the robot was able to keep itself upright, with some oscillation. When I added a disturbance (a little push), the robot would travel horizontally until it eventually fell over. After adding some integral gain (Ki), the robot seemed to stabilize itself very well. At this point it would re-stabilize quickly from a slight push, with some oscillation and a small amount of horizontal travel. My first (and favorite) set of gains was: Kp = 8, Ki = 0.5, Kd = 10. These gains won't mean much to you, as they are specific to my robot/system, which depends on my control loop rate, motor response, motor torque, speed resolution, robot height... and so on. You would have to experiment with your own system to find a suitable set of gains.

What I can do is share my next set of gains as a comparison to show the change in performance. I wanted to get rid of the oscillation, so I increased the derivative gain from 15 to 60, and in order to keep the robot stable I had to increase the Kp and Ki gains slightly. This was good in that the robot did not oscillate as much while balancing with no added disturbance, and when I did add a slight push, the robot corrected its angle almost instantly. However, it was forced to travel a much larger horizontal distance to maintain this response.

I learned that added disturbances (like pushes) are like adding energy to the system. To deal with the extra energy, the robot can either oscillate back and forth a few times while minimizing horizontal travel, or travel quite a bit farther horizontally just to maintain the angle I had programmed.

Anywho, here is a video of the robot in action:


I have a GitHub site, so I'll post the code there. I have both C and C++ libraries for the MPU9150. The C library, however, is dependent on the PSoC4, but you can strip what you need from it rather easily. I'll also include an application I wrote in Processing (see processing.org) which the robot used to communicate with my PC so I could better understand what was going on:

Link to the code:

Thanks!

Tuesday, July 29, 2014

Autonomous Robot with Zynq

In an earlier post, I talked up the Xilinx Zynq (an IC with both an FPGA and a microcontroller). In my Advanced Embedded System Design course, we had to build an autonomous robot that could navigate around with some form of intelligence and seek out certain objects to 'destroy'. Now, the term 'destroy' was really left up to the students to define; for our robot, a laser pointer was used to mark enemies. Identifying 'enemies' was a challenge, so we incorporated a camera so the robot could see and track on its own. Meet our robot (below):


Equipped with:

  • Zybo
  • OV7670 Camera
  • Arduino
  • Sabertooth Motor Controller
  • IR Proximity Sensors
  • LiPo Batteries (12V + 7V)
  • Pan/Tilt Servo motors
  • Laser
Custom logic was designed into the FPGA portion of the Zynq to maneuver the robot, control the pan/tilt bracket, capture frame data from the camera and, lastly, 'fire the laser'. The ARM portion of the Zynq was used as the algorithm prototyping environment (in C) to make use of the custom FPGA interfaces. The Arduino was used to configure the OV7670 over its I2C-like (SCCB) interface. The following diagram shows how all the components were interfaced.


The OV7670 was chosen for its parallel data interface and low cost ($20), allowing us to obtain an image capture rate of 30fps. However, you get what you pay for in terms of ease of interfacing. I had to design custom logic in VHDL to perform frame captures and store them in block RAM on the FPGA. It's hard to debug what you can't see. Nonetheless, after days of toying around, the Zybo was finally able to see. In order to view what the Zybo saw, I wrote a quick program to transfer images from the Zynq to my laptop over a serial connection (using Processing). Below is the process flow for debugging the images.


The custom FPGA logic that interfaced to the OV7670 was designed to either stream frames into the block RAM, or take a single snapshot and leave it in the block RAM. The FPGA had to interface/synchronize to the vsync, href, pixel clock and 8-bit data bus of the OV7670. It also needed to interface with the ARM processor and with the FPGA-based block RAM. Below is the block model of the custom camera control as shown in Vivado.

We couldn't have done it without camera register settings provided by this source: Hamsterworks - OV7670

Otherwise the images didn't turn out so well:

I'll post some videos of the robot in action soon...

Tuesday, April 29, 2014

Xilinx Zynq7000 and the Zybo

If you are into getting your hands on the latest embedded technology, the Zynq-7000 is a great platform to get familiar with. Xilinx, a leader in FPGA design (with which I have no affiliation), partnered with ARM to create one of the first microprocessors with an on-chip FPGA (or is it an FPGA with an on-chip processor?).


This technology combines the power of an FPGA for high-speed parallel tasks with a dual-core ARM processor carrying standard microcontroller peripherals (CAN, I2C, SPI, UART, etc.). I was lucky enough to be introduced to this device in my Advanced Embedded System Design course at Oakland University, and will continue working with it as long as I can keep making sense of partitioning my embedded design projects into 'some hardware' and 'some software'. I purchased the Zybo development board made by Digilent; however, there are others out there (ZedBoard, MicroZed by Avnet).

To paint a better picture of why this is so useful, imagine using a microcontroller to both interface to a camera and perform image processing in order to make a decision (maybe you are trying to track an object). In terms of cameras you can interface to, you are limited to those with slower serial interfaces (SPI, UART). If you really wanted to connect a microcontroller to a camera with a faster parallel interface, you would have to spend more money on a fast enough controller, and even then it wouldn't have much time left for anything other than frame-grabbing and image processing! On the flip side, if you were to use an FPGA alone, the decision-making logic becomes harder to implement, takes longer to develop and is harder to debug.

Having an FPGA and microcontroller on the same IC solves this exact problem. Now you can implement frame-grabbing for the higher-speed camera, and even some image processing, on the FPGA. In this way, the FPGA can be treated as a custom co-processor for your application running on the processor, and the FPGA-to-processor interface is seen by your application as either 'just another memory-mapped peripheral' or an 'external interrupt'.
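To give a feel for the 'memory-mapped peripheral' view from the software side, here is a generic Linux userspace sketch; the base address and register offset are made-up placeholders, and in a bare-metal application you would simply dereference the physical address assigned to the FPGA interface in Vivado:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define FPGA_BASE_ADDR 0x43C00000u   //Placeholder AXI base address
#define REG_CONTROL    0x00          //Placeholder register offset

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return 1;

    //Map one page of the FPGA's register space into this process
    volatile uint32_t *regs = (volatile uint32_t *)mmap(NULL, 4096,
            PROT_READ | PROT_WRITE, MAP_SHARED, fd, FPGA_BASE_ADDR);

    regs[REG_CONTROL / 4] = 1;       //e.g. tell the custom logic to grab a frame

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}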

The example I used comes directly from my project, where I used the FPGA to communicate with a faster camera (well, faster than UART- or SPI-based ones) and allowed the processor to dictate what operations to perform on the image, or even stream the image into the processor's RAM for forwarding to a remote PC via UART. This was for an autonomous robot project, which I will add posts about in the near future.

I recommend checking this technology out. It is great to see innovations such as this, because they have the power to take industries into new directions. If you are interested in getting yourself one, I suggest either the Zedboard or the Zybo.

Link to Digilent's Zybo

Friday, April 20, 2012

Audio Localization

Just a little introduction on my interest in this topic:
I've always been intrigued by the way our mind is configured to interpret sound signals. To those of you with two working ears: ever notice that when you hear a noise, you know which direction it came from?... I mean you just 'know'; you don't have to sit down, grab a pencil and notepad and plot waveforms to triangulate the angle the sound likely came from. These calculations are done in the background of our minds. That's right, you and I (and even our pet cats) are pre-programmed to use these functions without having to 'think' about it. This way we can spend our main processor time on more important tasks.

I wanted to experiment with methods that the brain uses for indicating the direction of sound. A little background on the two methods:
  • Interaural Level Difference: the difference in amplitude between two or more sensors
  • Interaural Time Difference: the difference in arrival time between two (or more) signals
A few links on the topic:
http://www.ise.ncsu.edu/kay/msf/sound.htm
http://en.wikipedia.org/wiki/Sound_localization
For simplicity, I focused on the 1st method and implemented an 'object tracking' approach.

Components used:
  • Arduino Uno
  • 2 Phidget Sound Sensors
  • Continuous Servo Motor
To keep things simple, I chose the Phidget Sound Sensors because they output a 0-5 Volt signal representing the measured volume (as opposed to a raw microphone signal). This also allows a slower processor (such as the ATmega328) to be quick enough for the task. Below is a pic of the system (made for a class project).
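The tracking logic itself only takes a few lines of Arduino code. A sketch of the idea (pin assignments, servo speed values and the dead-band threshold are illustrative, not the exact values from the project):

#include <Servo.h>

Servo pan;                       //Continuous-rotation servo
const int DEADBAND = 20;         //Ignore small level differences (ADC counts)

void setup()
{
    pan.attach(9);               //Servo signal pin (illustrative)
}

void loop()
{
    int left  = analogRead(A0);  //Phidget sound sensor outputs, 0-5V
    int right = analogRead(A1);
    int diff  = left - right;    //Interaural Level Difference

    if (diff > DEADBAND)       pan.write(80);    //Rotate toward the louder (left) side
    else if (diff < -DEADBAND) pan.write(100);   //Rotate toward the right side
    else                       pan.write(90);    //90 = stop for a continuous servo

    delay(20);
}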

Here is a functional diagram of the system I drew up:

A high-level schematic

A picture of the project

And at last, a video! (May be loud). We used the "Air-Horn" Phone-App

Video with improved code


Tuesday, May 3, 2011

Return of the RC car!!!

I've been very busy since the last post, but I finally got back into the groove with the RC car. I wanted this car to be completely modifiable, so the inside of the electronics control box = breadboards + velcro (which works pretty darn well). Here is a photo of the wiring.

Included in this box:
1. Boarduino (ATmega328) controller
2. 18v15 Pololu Motor Driver (rear motor drive)
3. Dual-Axis Compass Module - HMC6352
4. XBee Pro 60mW U.FL module: for wireless communication



I had to replace the stock steering motor/potentiometer setup with a servo motor to simplify the programming (I'm now able to eliminate the steering control loop code). I also broke the ICSP header out of the box for programming (I had to wire up a switch to disconnect both the external power and the XBee's transmit line from the microcontroller in order to program it externally).

I'm working on the code for wireless control/feedback. Eventually I intend on adding a camera, ultrasonic rangefinder and GPS module.

Anyway, here is a short video of my car's first run with both steering and driving functional (given the small area, I just had it do circles in my kitchen).

Wednesday, August 4, 2010

RC Car

I'm currently putting together a new project where I'm attempting to automate the control of an old RC car. I've ripped out the original guts except for the drive motor and steering motor. After numerous attempts at a DIY motor controller for the drive motor (and popping through transistors like popcorn), I've come to the conclusion that this motor needs a driver that can handle the spike currents without melting under pressure. Here is a list of all the parts going into this project; later I will provide updates on my progress.

Parts:

1. Original RC chassis (including suspension and wheels/tires)
2. Original RC 7.2V battery packs (2)
3. Original rear-wheel drive motor (specs unknown except that it draws 3.5A running and 15A stalled)
4. Original steering motor + original feedback potentiometer (specs also unknown)
5. Boarduino (ATmega328) controller
6. 18v15 Pololu Motor Driver (to properly drive the rear motor)
7. Dual-Axis Compass Module - HMC6352
8. XBee Pro 60mW U.FL modules + antennas (2): for wireless communication
9. A few 2N2907/2N2222 BJTs for driving the steering motor



Sunday, April 25, 2010

Light Follower - 2 Axis


Here it is - my rendition of a 2-axis light follower. I went ahead and clipped the IR camera from the Wiimote to cut down on weight. I put the camera, along with the required oscillator, in a project box (from RadioShack). I also included an LED on the box; it lights up when the camera sees a light source. A serial port is used to connect the box to the microcontroller.
Here are some Pictures:


Here is a video:
Here is a link to the code: