SURESH S
Contact: +91-9611225463
E-Mail: suresh.david86@gmail.com
SENIOR LEVEL PROFESSIONAL
EMBEDDED DSP PROJECTS | PARALLEL COMPUTING
Location Preference: Bangalore | Chennai | Hyderabad
CORE COMPETENCIES
DSP Projects
Parallel Computing
Image Processing / Analysis
Video, Audio & Speech Codecs
DSP Assembly Programming
Algorithm Processing
Application Development
Testing & Inspection
Troubleshooting / Debugging
Multimedia Framework
Project Management
Team Building & Leadership
PROFILE SUMMARY
• Offering over 8 years of experience with a strong analytical bent and a wide knowledge of
processing algorithms and techniques
• Presently associated with Cerium Systems Pvt. Ltd., Bangalore as Technical Lead
• Spearheaded the design, analysis, implementation and debugging of DSP projects (entailing
video / speech / audio / image systems) and parallel computing work with OpenCL, OpenGL,
C++ AMP and HSA
• Skilled at creating and developing high performance, algorithmic trading models using maths
and statistics
• Expertise in delivering high quality DSP audio & voice processing software and working
seamlessly with diverse software teams to integrate DSP deliverables into the framework
• Comprehensive experience in video / audio encoding and encryption with focus on all
compression algorithms, video formats, standards
• Skilled in evaluating and integrating third-party deliverables needed for the overall DSP
software solution
• In-depth knowledge of processor architectures and their programming, including ARM Cortex-A8
with NEON, Cortex-A9 (dual core), ARM11, CEVA-X1620, TMS320C64x, Blackfin, MIPS, ARM926EJ-S,
AMD GPU and ARM GPU
• Expertise in testing, debugging and resolving bugs in the Mobile Multimedia Framework
(MMF) for audio, video, speech and image codecs on VC++ & ARM926EJ-S platforms
• Enhanced vision algorithmic programs, rules and formulas for semi device equipment and
assessing and installing vision system operation particularities
• Proficiency in creating, enhancing and codifying particular software programs and backing
up the creation and enhancement of commodities
• Drove the development of image processing, image analysis, and image understanding
solutions for multiple image and video sensor data applications
ORGANIZATIONAL EXPERIENCE
Jan’14- till date Cerium Systems Pvt. Ltd. Bangalore as Technical Lead
Key Result Areas:
• Leading the design, specification, development, testing and production of software for video / speech / audio / imaging systems
• Developing and implementing real-time and offline signal processing algorithms in C/C++ and collaborating with research teams
and product management to identify and develop critical signal processing innovations for the company
• Involved in Algorithm optimization, assisting with automated testing of embedded DSP while staying up to date on industry trends
and best practices and upholding company best practices in engineering and research
• Developing & implementing algorithms and optimizing image processing codes on OpenCL and HSA
• Directing the development of various OpenCL 1.2/2.0 samples in AMD APPSDK Installer for AMD
• Optimizing the JM H.264 Baseline Encoder and integrating it into FFmpeg on a Cortex-A15 (quad core) / Mali GPU / NEON platform
• Integrating AMR-NB, AMR-WB and MIDI decoders into FFmpeg on Cortex-A9 dual core with NEON and MIPS platforms
• Ensuring the development & optimisation of the MP3 decoder and its integration into the FFmpeg multimedia framework on ARM Cortex-A8 with NEON
• Steering real-time implementation of the EVRC-WB speech coder on the CEVA-X1622 processor
• Implementing & integrating image and video processing, storage, and display systems; participating in system design, integration
and integration testing
• Conducting various experiments and tests and re-evaluating internal representation algorithm rules, formulas and programs for
technical and special procedures and methods
• Providing specialized and technological suggestions to supervise and direct the algorithm operative functions of the
creators and developers
PREVIOUS EXPERIENCE
Aug’10-Jan’14 Samsung India Software Center, Noida as Lead Engineer
Jul’09-Aug’10 Adeptte Microsystems Pvt. Limited, Chennai as Senior Design Engineer
Jun’07-Jul’09 DigiBee Microsystems Limited, Chennai as Member Technical Staff
ACADEMIC QUALIFICATIONS
• B.E. (ECE) from S.A Engineering College, Anna University in 2007 with 78.33%
• 12th from Sathyaa Mat. Higher Secondary School, Chennai with 86.75%
• 10th from Sathyaa Mat. Higher Secondary School, Chennai with 77.36%
KNOWLEDGE PURVIEW
• Optimization of:
o Ultrasound medical display pipeline using OpenCL & DX9
o JM H.264 Baseline Encoder and its integration into FFmpeg on Cortex-A15 (quad core) / Mali GPU / NEON platform
• Development of:
o OpenCL 1.2/2.0 samples in the AMD APP SDK installer for AMD
o Optimized image processing algorithms on OpenCL and HSA
o MP3 decoder and its integration into the FFmpeg multimedia framework on ARM Cortex-A8 NEON platform
o LATM-AAC parser implementation and AAC decoder on MIPS platform
o Video post process on Blackfin processor
o Scaling and color conversion for image post processing on CEVA-X1620 platform
• Implementation of:
o EVRC-WB speech coder on CEVA-X1622 processor
o MP3 encoder on TMS320C64XX platform
o G.711 WB, G.729AB and iLBC speech codecs on TMS320C64XX platform
• Post-silicon testing of DGB501 SoC
• Integration of AMR-NB, AMR-WB and MIDI decoders into FFmpeg on Cortex-A9 dual core with NEON and MIPS platforms
IT SKILLS
• DSP Processors: ARM, Cortex-A8 NEON, Cortex-A9 (dual core), ARM926EJ-S, TMS320C64XX, CEVA-X1620,
MIPS32 (34K & 74K cores) and Blackfin
• Languages: C, OpenCL, Assembly language, Perl, TCL, Matlab, C++ AMP, OpenCV, DX9 and OpenMP
• Tools: Visual Studio, SmartNCodeIDE (CEVA), Code Warrior, RealView (ARM), Code Composer
Studio (TI), Visual, Eclipse, CQ, Perforce and the static analyser Prevent
• OS: Windows & Linux
ACHIEVEMENTS
• Received the Best Team Player Award in 2011 while working at Samsung R&D, Delhi
• Received a cash award for successfully completing a critical project at Adeptte Microsystems Pvt. Ltd. before the proposed
completion time
PERSONAL DETAILS
Date of Birth: 25th April 1985
Languages known: English, Tamil, Telugu & Hindi
Address: No: c4, 1st main road, PWD road, B Narayanapura, Bangalore – 560016
(Please refer to the Annexure for project details)
ANNEXURE:
ORGANIZATIONAL PROJECTS
Title: Optimisation of Ultrasound Medical Pipeline Using OpenCL & DX9
Client: AMD India Pvt. Ltd.
Description: The project goal was to implement ultrasound algorithms on the GPU using OpenCL. The
ultrasound pipeline used various DSP algorithms such as FFT, median filter, fkstolts and zshift. All the DSP
algorithms were optimised using AMD CodeXL.
Key Result Areas:
• Optimized the display pipeline using OpenCL-DX9 interop and optimized the float resize module in the
pipeline using OpenCL
• Used a separate pthread for display to increase the performance of the pipeline
• Avoided unnecessary GPU-CPU-GPU data transfers for the sake of display through OpenCL-DX9
interop; increased GPU usage since the entire pipeline runs on the GPU. DX9 was used to support
Windows 7
Title: Development of Optimised Image Algorithms in OpenCL
Client: AMD India Pvt. Ltd.
Description: Development of accelerated embedded modules in the image processing field, covering well-known image
algorithms such as image scaling, median filtering, dilation/erosion, affine rotation, flip/transpose, LoG (Laplacian of
Gaussian) filtering and histogram. The key goal of this project was to achieve the best optimisation of each algorithm
for parallel computing using a parallel computing language such as OpenCL. The samples were also developed for
Heterogeneous System Architecture (HSA), which has a unified memory system for CPU and GPU. Devices used
were the AMD Kaveri APU and AMD Carrizo APU.
Key Result Areas:
• Worked on both compute-intensive and memory-intensive algorithms; maximized the performance of each
algorithm by analysing the behaviour of the OpenCL kernel using AMD CodeXL
• Achieved the best optimization by maximizing cache hits, 100% utilization of the GPU device, efficient
utilization of local memory, increased write combining, better logic with built-in functions, OpenCL
buffers better suited to the application, and avoidance of LDS conflicts
• Benchmarked the quality of the modules against OpenCV
• Developed 90% of the samples with best optimization, unit testing and documentation
Title: Development of OpenCL 1.2/2.0 Samples for AMD APP SDK Installer
Client: AMD India Pvt. Ltd.
Description: The AMD APP SDK installer provides various samples in different domains in various parallel computing
languages such as OpenCL, C++ AMP, Bolt and Aparapi. The samples explain different features of the GPU
hardware that increase performance. Algorithms that process massive data are well suited to the GPU.
Key Result Areas:
• Developed samples such as Binary Search using OpenCL 2.0 features (device-side enqueue), along with samples
involving other OpenCL 2.0 features as well as OpenCL 1.2 features
• Created a few samples using C++ AMP
Title: Optimization of JM H.264 Baseline Encoder & Integration into FFmpeg on Cortex-A15 (quad core) /
Mali GPU / NEON Platform
Description: The H.264/AVC format has been widely adopted in applications such as video conferencing,
broadcasting and movie recording. Similarly, parallel computing is the future, since advanced parallel
processors such as Nvidia's GPUs are becoming more powerful. The project therefore used optimisations across
multiple processors: blocks were optimised using the GPU, the quad-core CPU and the NEON coprocessor. Initially, the
best search algorithms were chosen based on the quality and performance trade-off. Later, a few modifications
were made to the algorithm.
Key Result Areas:
• Worked in OpenCL on Nvidia's GT240 on the motion compensation model for the POC; selected the best
motion estimation algorithm and optimised ME and MC on the ARM GPU
• Reviewed NEON assembly, steered process documentation and ran tools for memory leaks,
corruption and code coverage
Title: Added MIDI Support on Samsung DTV (MIPS 34Kc platform & ARM Cortex-A9 (dual core) with NEON
coprocessor platform)
Description: The goal of the project was to add MIDI support to FFmpeg. The MIDI code was taken from open source. The MIDI
player supports GM and adds chorus and reverb effects.
Key Result Areas:
• Added a plug-in for the parser, init, decode and delete modules in FFmpeg and conducted sanity-level testing
for a reasonable level of performance on both the MIPS and dual-core A9 (with NEON support) processors
Title: Development, Optimisation & Integration of MP3 Decoder on FFmpeg on ARM Cortex-A8 NEON Platform
Description: This project supports all layers of MPEG-1 and additionally supports the lower sampling rates and lower
bit rates specified in the MPEG-2 and MPEG-2.5 standards. The multi-threaded FFmpeg version 0.6.9 was used for
integration. Testing and documentation were completed.
Title: Development of LATM-AAC Parser Implementation on FFmpeg on ARM Cortex-A8 Platform
Description: The LATM parser code was developed based on the specification. This code was integrated into FFmpeg as a parser,
and the AAC decoder was then used to decode the LATM AAC file. Testing was done only for single-stream AAC
content. Regression testing with different vectors was carried out on the ARM Cortex-A8 NEON target.
Title: Implementation of EVRC-WB on CEVA-X1622 Platform
Description: EVRC-WB is a wideband extension of EVRC. It splits the signal into two bands, a lower band and a higher band, and
processes them separately.
Key Result Areas:
• Converted the code from C++ to C
• Resolved build errors and ensured the code worked properly in the Visual Studio solution
• Ported the code to the SmartNcode CEVA emulator; performed C optimisation and compiler optimisation, and added
intrinsics to the basic ops
• Wrote hand-coded assembly for critical modules
Title: Implementation of MP3 Encoder on TMS320C64XX Platform
Description: The MP3 encoder is based on psychoacoustic principles and yields good compression.
Key Result Areas:
• Converted critical modules from floating point to fixed point; implemented various C optimisation techniques
• Utilized compiler settings to reduce code size and cycle count
• Wrote intrinsics for basic ops and hand-coded assembly for time-critical modules; made the code
re-entrant and relocatable at run time
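The floating-point to fixed-point conversion mentioned above can be illustrated with a minimal Q15 sketch. The Q15 format, the type name `q15_t` and the helpers `float_to_q15` and `q15_mul` are assumptions chosen for illustration; the resume does not state which fixed-point format or helper functions the encoder actually used.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: 16-bit signed value with 15 fractional bits, covering [-1.0, 1.0).
   The format and every name in this sketch are illustrative assumptions. */
typedef int16_t q15_t;

/* Convert a float in [-1.0, 1.0] to Q15 with rounding and saturation. */
q15_t float_to_q15(float x)
{
    assert(x >= -1.0f && x <= 1.0f);
    float scaled = x * 32768.0f;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;  /* round half away from zero */
    if (scaled >= 32767.0f) return 32767;       /* +1.0 is not representable */
    if (scaled <= -32768.0f) return -32768;
    return (q15_t)scaled;                       /* cast truncates toward zero */
}

/* Multiply two Q15 values with rounding and saturation. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;       /* Q30 intermediate */
    /* Back to Q15; assumes an arithmetic right shift for negative values,
       which is what mainstream compilers provide. */
    int32_t rounded = (product + (1 << 14)) >> 15;
    if (rounded > 32767) rounded = 32767;            /* only -1.0 * -1.0 overflows */
    return (q15_t)rounded;
}
```

For example, `q15_mul(float_to_q15(0.5f), float_to_q15(0.5f))` yields 8192, the Q15 representation of 0.25. On a DSP, the saturating multiply would typically map to a single intrinsic rather than this portable C.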
Title: Video Post Process on Blackfin BF561 Processor
Description: Post processing is mainly performed to remove unwanted noise, unsmooth colour transitions, etc.,
thus increasing the quality of the video.
Key Result Areas:
• Initially used a bilinear transformation algorithm for the video post process, and then implemented bicubic
interpolation
Title: Implementation of G.711.1, G.729AB & iLBC Speech Codecs on TMS320C64XX Platform
Description: The G.711.1 encoder contains three layers for different bit rates, which use log-companded PCM,
embedded PCM extension with adaptive bit allocation, and weighted vector quantization coding principles.
The decoder includes techniques to improve quality, such as frame erasure concealment (FERC) and
a post filter. The G.729AB vocoder conforms to the ITU-T G.729 Annex A & Annex B recommendations and can be
used effectively in a wide range of applications for simultaneous voice and data transmission, especially in
telephony over packet networks. iLBC is a low bit-rate speech coder used effectively in VoIP
applications.
Key Result Areas:
• Converted critical modules from floating point to fixed point; implemented various C optimisation techniques
• Utilized compiler settings to reduce code size and cycle count
• Wrote intrinsics for basic ops and hand-coded assembly for time-critical modules; made the code
re-entrant and relocatable at run time
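The log-companded PCM named in the description can be sketched with classic G.711 mu-law coding. This is a minimal illustration of the companding idea only, not the layered G.711.1 encoder from the project; the function names `ulaw_encode` and `ulaw_decode` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one 16-bit linear PCM sample as an 8-bit mu-law code word. */
uint8_t ulaw_encode(int16_t pcm)
{
    int mag = pcm;   /* work in int so that -(-32768) cannot overflow */
    int sign = 0;
    if (mag < 0) { mag = -mag; sign = 0x80; }
    if (mag > 32635) mag = 32635;   /* clip so the bias cannot overflow 15 bits */
    mag += 0x84;                    /* bias of 132 */

    /* The segment (exponent) is the position of the highest set bit
       between bit 14 (segment 7) and bit 7 (segment 0). */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;
    assert(exponent >= 0 && exponent <= 7);

    int mantissa = (mag >> (exponent + 3)) & 0x0F;   /* 4 bits below the leading one */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);   /* stored inverted */
}

/* Decode an 8-bit mu-law code word back to linear PCM. */
int16_t ulaw_decode(uint8_t code)
{
    code = (uint8_t)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int mag = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -mag : mag);
}
```

Small amplitudes get fine quantization steps and large amplitudes coarse ones, which is why an input of 1000 decodes to 988 rather than exactly 1000.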
Title: Post-Silicon Testing of DGB501 SoC (CEVA-X1620)
Description: The DGB501 SoC is a dual-core processor chip, which has a DSP and an ARM processor. The responsibility in
this project was to verify the working of the DSP subsystem in the SoC. This testing makes sure that all
functional units of the chip, including all peripherals, are working correctly.
Key Result Areas:
• Administered the coding for memory access, looping and control instructions and also monitored other
instructions; wrote Perl scripts for some of the tests
• Tested on the CEVA development board using Perl and TCL scripts and resolved all the bug issues
• Performed documentation
Title: Real-time Implementation of Acoustic Echo Cancellation (LMS Algorithm) on CEVA-X1620
Description: Echo cancellation is the process of removing echo from a voice
communication in order to improve voice quality on a telephone call. Echo cancellation involves first
recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or
received signal. Once the echo is recognized, it can be removed by 'subtracting' it from the transmitted or
received signal. The highly optimised code required < 4 MCPS on the CEVA development board.
Key Result Areas:
• Managed coding and optimisation of time-critical modules for MCPS and PM constraints
• Integrated and tested on the CEVA development board
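The recognize-then-subtract idea described above can be sketched as a plain floating-point LMS adaptive filter. This is a minimal illustration under stated assumptions, not the optimised CEVA implementation: the tap count `LMS_TAPS`, the type `lms_state` and the function names are hypothetical, and a real-time version would normally use fixed-point arithmetic and a much longer filter.

```c
#include <assert.h>
#include <stddef.h>

#define LMS_TAPS 8   /* hypothetical filter length, kept small for illustration */

typedef struct {
    float w[LMS_TAPS];   /* adaptive coefficients: the estimated echo path */
    float x[LMS_TAPS];   /* delay line of far-end samples, x[0] is the newest */
    float mu;            /* adaptation step size */
} lms_state;

void lms_init(lms_state *s, float mu)
{
    assert(s != NULL && mu > 0.0f);
    for (size_t i = 0; i < LMS_TAPS; ++i) {
        s->w[i] = 0.0f;
        s->x[i] = 0.0f;
    }
    s->mu = mu;
}

/* Process one sample pair.
   far_end: the reference signal that causes the echo.
   mic:     the captured signal that contains the echo.
   Returns the error signal, i.e. mic with the estimated echo subtracted. */
float lms_process(lms_state *s, float far_end, float mic)
{
    /* Shift the delay line and insert the newest far-end sample. */
    for (size_t i = LMS_TAPS - 1; i > 0; --i)
        s->x[i] = s->x[i - 1];
    s->x[0] = far_end;

    /* Estimate the echo as the filter output. */
    float echo_estimate = 0.0f;
    for (size_t i = 0; i < LMS_TAPS; ++i)
        echo_estimate += s->w[i] * s->x[i];

    /* Subtract the estimate, then adapt the coefficients along the error. */
    float error = mic - echo_estimate;
    for (size_t i = 0; i < LMS_TAPS; ++i)
        s->w[i] += s->mu * error * s->x[i];

    return error;
}
```

Calling `lms_process` once per sample with a step size such as 0.05 lets the coefficients converge towards the echo path, after which the returned error contains mostly the near-end signal.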
Title: Integration of GSM Speech, MPEG-4 & H.264 Video Codecs with Mobile Multimedia DSP Framework on
CEVA-X1620 Platform
Description: All the optimised and standalone-tested codecs, namely the GSM speech codecs and the MPEG-4 and H.264 video
codecs, were integrated with the DSP software framework. The key challenges were proper organization
and management of the code and data memory of all the codecs. The lack of a data cache in the system meant that
an efficient memory management scheme for stack, state and table memory had to be implemented.
Discrepancies in codec MCPS and memory requirements between the standalone and
integrated versions were resolved.
Key Result Areas:
• Integrated and tested all the multimedia codecs on the CEVA development board; resolved issues related
to memory and MCPS constraints
• Performed proper memory management to match the MCPS of the integrated version with the standalone
version
Title: Development & Implementation of Image Post Processing on CEVA-X1620
Description: Image post processing is essential for any mobile application, as the decoded images need to be scaled to
the dimensions of the LCD screen. Various image post-processing functions are needed, such as scaling and
zooming of the decoded image, besides the required color conversions. All the functions were developed in
C and tested using different types of images. These functions need to be highly optimised so that they do
not consume processor MIPS. This not only reduces the time needed to display the image on the
LCD but also saves critical battery power. All functions were optimised at C level and later coded for further
optimisation. The highly optimised code required < 12 MCPS for VGA on the CEVA development board.
Key Result Areas:
• Involved in coding of scaling of a given image, rotation of the scaled image and color conversion from
YUV420 to RGB555, RGB565, RGB24 and RGB32 on the CEVA development board
• Integrated and tested on the CEVA development board and performed documentation
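The YUV420 to RGB565 conversion listed above can be sketched in integer arithmetic as follows. The BT.601 limited-range coefficients, the planar memory layout and the function names `yuv_to_rgb565` and `yuv420p_to_rgb565` are assumptions for illustration; the resume does not say which conversion matrix or layout the project used.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one pixel using 8.8 fixed-point BT.601 coefficients (assumed). */
uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;

    /* Division by 256 keeps the sketch portable for negative values;
       an optimised build would use a shift instead. */
    uint8_t r = clamp_u8((298 * c + 409 * e + 128) / 256);
    uint8_t g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    uint8_t b = clamp_u8((298 * c + 516 * d + 128) / 256);

    /* Pack as 5 bits red, 6 bits green, 5 bits blue. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a planar YUV420 frame: every 2x2 block of luma samples
   shares one U and one V sample. Width and height must be even. */
void yuv420p_to_rgb565(const uint8_t *y_plane, const uint8_t *u_plane,
                       const uint8_t *v_plane, uint16_t *dst,
                       int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            int chroma = (row / 2) * (width / 2) + (col / 2);
            dst[row * width + col] = yuv_to_rgb565(y_plane[row * width + col],
                                                   u_plane[chroma],
                                                   v_plane[chroma]);
        }
    }
}
```

The other output formats differ only in the final packing step, for example 5 bits per channel for RGB555 or one byte per channel for RGB24.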
Title: Validation of Mobile Multimedia Software Framework on ARM926EJ-S Versatile Board
Description: The Mobile Multimedia Framework (MMF) enables one to use all the multimedia services available on the
mobile handset. The MMF is built over a DGB framework, which serves as an abstraction layer between the
mobile services and the operating system. The MMF integrates various multimedia service components
such as an Audio Ringer, Image Viewer, Voice Player & Recorder, and Media Player. Perl scripts were
written to automate the entire validation process. All the unit-tested and optimised codecs were
integrated with the MMF and tested in real time on the ARM development platform. Multiple components
that work together in a typical mobile application were also tested extensively. A number of bugs were
identified in the MMF; these were documented, reported to the development team and tracked to
closure using bug-tracking tools. A few trivial bugs were identified and fixed during the validation process
itself.
Key Result Areas:
• Tested all the multimedia codecs on the VC and ARM Versatile platforms
• Resolved the critical bug issues related to framework integration

More Related Content

PDF
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
PPTX
Cross platform computer vision optimization
PPTX
Software Parallelisation & Platform Generation for Heterogeneous Multicore Ar...
PDF
UplinQ - enhance qualcomm® snapdragon™ audio using android audio ap_is
DOCX
Vignesh_Resume_7years
PDF
DCC Labs Overview
PDF
RESUME_VLSI
DOC
Muruganandam_7years
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
Cross platform computer vision optimization
Software Parallelisation & Platform Generation for Heterogeneous Multicore Ar...
UplinQ - enhance qualcomm® snapdragon™ audio using android audio ap_is
Vignesh_Resume_7years
DCC Labs Overview
RESUME_VLSI
Muruganandam_7years

What's hot (18)

PDF
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
PDF
DCC Labs Company Presentation
DOCX
Prabhat Ravi Resume
PDF
Rahul_Ramani_Profile
DOC
Excellent opportunities in Bangalore and Chennai
PDF
TeamSpirit
DOC
ajay_Profile
PPTX
Velocity-EHF for Android
PDF
I Tprogramming
PDF
GY-HM790E
PDF
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
DOC
Resume
PDF
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
DOC
PrashantSoni_exp_embeddedSwDevelopment_latest
PDF
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
PDF
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
DOC
DamonKoachResume
PDF
Introduction to the Graphics Pipeline of the PS3
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
 
DCC Labs Company Presentation
Prabhat Ravi Resume
Rahul_Ramani_Profile
Excellent opportunities in Bangalore and Chennai
TeamSpirit
ajay_Profile
Velocity-EHF for Android
I Tprogramming
GY-HM790E
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
Resume
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
PrashantSoni_exp_embeddedSwDevelopment_latest
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
DamonKoachResume
Introduction to the Graphics Pipeline of the PS3
Ad

Similar to Resume_suresh_final (20)

DOCX
shvp_07
RTF
AMIT PATIL- Embedded OS Professional
DOC
Resume
DOC
RalphRes_12_29
PDF
Rajas mhaskar resume2k19
PDF
Luxi Cheng's Resume
DOCX
HARISH_Resume_Embedded_SW
DOCX
NeerajSharma_EmbeddedSoftwareDeveloper
PPT
Ramprakash
DOC
YuanYiPeng_161017
DOCX
virendra
PDF
Shalabh_resume
PDF
Senior software software engineer
PDF
BourrezCVEnglish
DOC
Ankur_Sharma Resume
DOC
PDF
MarcoMorenoResume
DOCX
Bhavin_Resume
DOCX
kripashree
PDF
Ramesh.resume_iiith
shvp_07
AMIT PATIL- Embedded OS Professional
Resume
RalphRes_12_29
Rajas mhaskar resume2k19
Luxi Cheng's Resume
HARISH_Resume_Embedded_SW
NeerajSharma_EmbeddedSoftwareDeveloper
Ramprakash
YuanYiPeng_161017
virendra
Shalabh_resume
Senior software software engineer
BourrezCVEnglish
Ankur_Sharma Resume
MarcoMorenoResume
Bhavin_Resume
kripashree
Ramesh.resume_iiith
Ad

Resume_suresh_final

  • 1. SURESH S Contact: +91-9611225463 E- Mail: suresh.david86@gmail.com S E N I O R L E V E L P R O F E S S I O N A L E M B E D D E D D S P P R O J E C T S | P A R A L L E L C O M P U T I N G Location Preference: Bangalore |Chennai| Hyderabad CORE COMPETENCIES DSP Projects Parallel Computing Image Processing / Analysis Video, Audio & Speech Codecs DSP Assembly Programming Algorithm Processing Application Development Testing & Inspection Troubleshooting / Debugging Multimedia Framework Project Management Team Building & Leadership PROFILE SUMMARY • Offering over 8 years of experience with a strong analytical bent and a wide knowledge processing algorithms and techniques • Presently associated with Cerium Systems Pvt. Ltd. Bangalore as Technical Lead • Spearheaded designing, analysing, implementation and debugging of DSP projects (entailing video / speech / audio/ image systems) and Parallel computing field with OpenCL, Open GL, C++Amp and HSA • Skilled at creating and developing high performance, algorithmic trading models using maths and statistics • Expertise in delivering high quality DSP audio & voice processing software AND working seamlessly with diverse software teams to integrate DSP deliverables into framework • Comprehensive experience in video / audio encoding and encryption with focus on all compression algorithms, video formats, standards • Skilled in evaluating and integrating third-party deliverables needed towards the overall DSP software solution • In-depth knowledge of software tools / methodologies including Arm-Cortexa8-Neon, CortexA9 (Dual core), ARM11, CEVAX1620, TMS320C64X, Blackfin, MIPS, ARM-926EJS Processors, AMD GPU and ARM GPU architectures and its programming • Expertise in testing, debugging and resolving bug issues of Mobile Multimedia Framework (MMF) for Audio, Video, Speech, Image codes on VC+ & ARM926EJS platform • Enhanced vision algorithmic programs, rules and formulas for semi device equipment and assessing and installing 
vision system operation particularities • Proficiency in creating, enhancing and codifying particular software programs and backing- up the creation and enhancement of commodities • Drove the development of image processing, image analysis, and image understanding solutions for multiple image and video sensor data application ORGANIZATIONAL EXPERIENCE Jan’14- till date Cerium Systems Pvt. Ltd. Bangalore as Technical Lead Key Result Areas: • Leading the design, specification, development, testing and production of software for video / speech / audio / imaging system • Developing and implementing real-time and offline signal processing algorithms in C/C++ and collaborating with research teams and product management to identify and develop critical signal processing innovations for the company • Involved in Algorithm optimization, assisting with automated testing of embedded DSP while staying up to date on industry trends and best practices and upholding company best practices in engineering and research • Developing & implementing algorithms and optimizing image processing codes on OpenCL and HSA • Directing the development of various OpenCL 1.2/2.0 samples in AMD APPSDK Installer for AMD • Optimizing jm H.264 Baseline Encoder, integration in Ffmpeg on Cortexa15(four core)/MaliGPU/Neon Platform • Integrating AMR-NB & AMR-WB, Midi-decoder on FFMPEG on Cortex-A9 Dual core with Neon and MIPS platform • Ensuring the development & optimisation of Mp3 decoder, integration in Ffmpeg Multimedia Framework on Arm-CortexA8 -Neon . 
• Steering real-time implementation of the EVRC-WB speech coder on the CEVA-X1622 processor
• Implementing and integrating image and video processing, storage, and display systems; participating in system design, integration, and integration testing
• Conducting experiments and tests, and re-evaluating internal representation algorithm rules, formulas, and programs for technical and special procedures and methods
• Providing specialized technical guidance to supervise and direct the algorithm work of developers
PREVIOUS EXPERIENCE
Aug’10-Jan’14 Samsung India Software Center, Noida as Lead Engineer
Jul’09-Aug’10 Adeptte Microsystems Pvt. Limited, Chennai as Senior Design Engineer
Jun’07-Jul’09 DigiBee Microsystems Limited, Chennai as Member Technical Staff
ACADEMIC QUALIFICATIONS
• B.E. (ECE) from S.A Engineering College, Anna University in 2007 with 78.33%
• 12th from Sathyaa Mat. Higher Secondary School, Chennai with 86.75%
• 10th from Sathyaa Mat. Higher Secondary School, Chennai with 77.36%
KNOWLEDGE PURVIEW
• Optimization of:
o Ultra-Sound Medical Display Pipeline using OpenCL & DX9
o JM H.264 Baseline Encoder, integration in FFmpeg on Cortex-A15 (four core) / Mali GPU / Neon platform
• Development of:
o OpenCL 1.2/2.0 samples in AMD APPSDK Installer for AMD
o Optimized image processing algorithms on OpenCL and HSA
o MP3 decoder, integration in FFmpeg Multimedia Framework on ARM Cortex-A8 Neon platform
o LATM-AAC parser implementation and AAC decoder on MIPS platform
o Video post-processing on Blackfin processor
o Scaling and color conversion for image post-processing on CEVA-X1620 platform
• Implementation of:
o EVRC-WB speech coder on CEVA-X1622 processor
o MP3 encoder on TMS320C64XX platform
o G.711 WB, G.729AB and iLBC speech codecs on TMS320C64XX platform
• Post-silicon testing of DGB501 SoC
• Integration of AMR-NB & AMR-WB and MIDI decoder in FFmpeg on Cortex-A9 dual core with Neon and on MIPS platform
IT SKILLS
• DSP Processors: ARM, Cortex-A8 Neon, Cortex-A9 (Dual Core), ARM926EJS, TMS320C64XX, CEVA-X1620, MIPS32 34K & 74K cores and Blackfin
• Languages: C, OpenCL, Assembly language, Perl, TCL, Matlab, C++ AMP, OpenCV, DX9 and OpenMP
• Tools: Visual Studio, SmartNCodeIDE (CEVA), Code Warrior, Real View (ARM), Code Composer Studio (TI), Visual, Eclipse, CQ & Perforce and static analyser Prevent
• OS: Windows & Linux
ACHIEVEMENTS
• Received the Best Team Player Award in 2011 while working at Samsung R&D Delhi
• Received a cash award at Adeptte Microsystems Pvt. Ltd. for completing a critical project ahead of the proposed completion time
PERSONAL DETAILS
Date of Birth: 25th April 1985
Languages known: English, Tamil, Telugu & Hindi
Address: No: c4, 1st main road, PWD road, B Narayanapura, Bangalore – 560016
(Please refer to the Annexure for project details)
ANNEXURE: ORGANIZATIONAL PROJECTS

Title: Optimisation of Ultra-Sound Medical Pipeline Using OpenCL & DX9
Client: AMD India Pvt. Ltd.
Description: The project goal was to implement the ultrasound algorithm on the GPU using OpenCL. The pipeline used various DSP algorithms such as FFT, median filter, fkstolts and zshift. All the DSP algorithms were optimised using AMD CodeXL.
Key Result Areas
• Optimized the display pipeline using OpenCL-DX9 interop and optimized the float resize module in the pipeline using OpenCL
• Used a separate pthread for display to increase the performance of the pipeline
• Avoided unnecessary GPU-CPU-GPU data transfers for the sake of display through OpenCL-DX9 interop; increased GPU usage since the entire pipeline runs on the GPU. DX9 was used to support Windows 7

Title: Development of optimised image algorithms in OpenCL
Client: AMD India Pvt. Ltd.
Description: Development of accelerated embedded modules for image processing, covering well-known image algorithms such as image scaling, median filtering, dilation/erosion, affine rotation, flip/transpose, LoG (Laplacian of Gaussian) filtering and histogram. The key goal was to optimise each algorithm for parallel computing using a parallel computing language such as OpenCL. The samples were also developed for Heterogeneous System Architecture (HSA), which has a unified memory system for CPU and GPU. Devices used were the AMD Kaveri APU and AMD Carrizo APU.
Key Result Areas
• Worked on both compute-intensive and memory-intensive algorithms; maximized performance by analysing the behaviour of the OpenCL kernels using AMD CodeXL
• Achieved the best optimization by maximizing cache hits, reaching 100% utilization of the GPU device, using local memory efficiently, increasing write-combining, improving logic with built-in functions, choosing OpenCL buffers better suited to the application, and avoiding LDS conflicts
• Benchmarked the quality of the modules against OpenCV
• Developed 90% of the samples with best optimization, unit testing and documentation

Title: Development of OpenCL 1.2/2.0 Samples for AMD APPSDK Installer
Client: AMD India Pvt. Ltd.
Description: The AMD APPSDK Installer provides samples across different domains in various parallel computing languages such as OpenCL, C++ AMP, Bolt and Aparapi. The samples explain different features of the GPU hardware that increase performance. Algorithms that process massive data are well suited to the GPU.
Key Result Areas
• Developed samples such as Binary Search using OpenCL 2.0 features (device-side enqueue), along with samples involving other OpenCL 2.0 and OpenCL 1.2 features
• Created a few samples using C++ AMP

Title: Optimization of JM H.264 Baseline Encoder & Integration in FFmpeg on Cortex-A15 (four core) / Mali GPU / Neon Platform
Description: The H.264/AVC format has been widely adopted in applications such as video conferencing, broadcasting and movie recording. Parallel computing is likewise growing in importance as parallel processors such as Nvidia’s GPUs become more powerful, so the project applied optimisations across multiple processors: blocks were optimised on the GPU, the quad-core CPU and the Neon coprocessor. Search algorithms were first chosen on the quality/performance trade-off, and a few modifications to the algorithm were made later.
Key Result Areas
• Worked in OpenCL on Nvidia’s GT240 on the motion compensation model for the POC; selected the best motion estimation algorithm and optimised ME and MC on the ARM GPU
• Reviewed Neon assembly, steered process documentation, and ran tools for memory leaks, corruption and code coverage
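
The H.264 encoder project above mentions choosing a motion estimation algorithm on a quality/performance trade-off, without naming the algorithm. As a point of reference only, the sketch below shows the textbook baseline such choices are usually compared against: exhaustive (full-search) block matching with the sum of absolute differences (SAD) as the cost. It is not the algorithm selected in the project; the function names, block size and search range are illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search):
    """Try every displacement within +/- `search` pixels and return
    ((dx, dy), cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates whose block would fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Full search evaluates (2 x search + 1)^2 candidates per block, which is why faster search patterns are normally weighed against it, and why the independent per-candidate SAD computation maps naturally onto GPU or SIMD hardware.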
Title: Added MIDI support on Samsung DTV (MIPS 34Kc platform & ARM Cortex-A9 (dual core) with Neon coprocessor platform)
Description: The project added MIDI support to FFmpeg. The MIDI code was taken from open source. The MIDI player supports GM and adds Chorus and Reverb effects.
Key Result Areas
• Added plug-ins for the parser, init, decode and delete modules in FFmpeg and conducted sanity-level testing for a reasonable level of performance on both the MIPS and the dual-core A9 (with Neon) processors

Title: Development, Optimisation & Integration of MP3 Decoder in FFmpeg on ARM Cortex-A8 Neon platform
Description: The decoder supports all layers of MPEG1 and additionally supports the lower sampling rates and lower bit rates specified in the MPEG2 & MPEG2.5 standards. FFmpeg version 0.6.9 (multi-threaded version) was used for integration. Testing and documentation were completed.

Title: Development of LATM-AAC parser implementation in FFmpeg on ARM Cortex-A8 platform
Description: The LATM parser code was developed from the specification. The code is integrated in FFmpeg as a parser, and the AAC decoder is then used to decode the LATM AAC file. Testing was done only for single-stream AAC content. Regression testing with different vectors was carried out on the ARM Cortex-A8 Neon target.

Title: Implementation of EVRC-WB on CEVA-X1622 platform
Description: EVRC-WB is a wideband extension of EVRC. It splits the signal into a lower band and a higher band and processes each separately.
Key Result Areas
• Converted the code from C++ to C
• Resolved build errors and ensured the code worked correctly in the Visual Studio solution
• Ported the code to the SmartNcode CEVA emulator; performed C optimisation and compiler optimisation, and added intrinsics for basic ops
• Wrote hand-coded assembly for critical modules

Title: Implementation of MP3 encoder on TMS320C64XX platform
Description: The MP3 encoder is based on psychoacoustic principles and yields good compression.
Key Result Areas
• Converted critical modules from floating point to fixed point; implemented various C optimisation techniques
• Used compiler settings to reduce code size and cycle count
• Wrote intrinsics for basic ops and hand-coded assembly for time-critical modules; made the code re-entrant and relocatable at run time

Title: Video Post-Processing on Blackfin BF561 Processor
Description: Post-processing is carried out mainly to remove unwanted noise, unsmooth colour transitions and similar artefacts, thereby increasing the quality of the video.
Key Result Areas
• Used a bilinear transformation algorithm initially for video post-processing, and then implemented bicubic interpolation

Title: Implementation of G.711.1, G.729AB & iLBC Speech Codecs on TMS320C64XX platform
Description: The G.711.1 encoder contains three layers, depending on bit rate, which use log-companded PCM, an embedded PCM extension with adaptive bit allocation, and weighted vector quantization coding principles. The decoder includes techniques to improve quality such as frame erasure concealment (FERC) and a post filter. The G.729AB vocoder conforms to the ITU-T G.729 Annex A & Annex B recommendations and can be used effectively in a wide range of applications for simultaneous voice and data transmission, especially telephony over packet networks. iLBC is a low bit-rate speech coder used effectively in VoIP Internet applications.
Key Result Areas
• Converted critical modules from floating point to fixed point; implemented various C optimisation techniques
• Used compiler settings to reduce code size and cycle count
• Wrote intrinsics for basic ops and hand-coded assembly for time-critical modules; made the code re-entrant and relocatable at run time
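
Both TMS320C64XX projects above list a floating-point to fixed-point conversion of critical modules. The résumé does not state which fixed-point format was used; the sketch below assumes the Q15 format common on 16-bit DSP data paths (1 sign bit, 15 fractional bits) purely for illustration, and all helper names are hypothetical.

```python
Q15_MIN, Q15_MAX = -32768, 32767  # range of a signed 16-bit word


def saturate(value):
    """Clamp to the 16-bit range instead of wrapping around on overflow."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in [-1.0, 1.0) to Q15 by scaling with 2**15."""
    return saturate(int(round(x * 32768.0)))


def q15_to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768.0


def q15_mul(a, b):
    """Multiply two Q15 values: the Q30 product is rounded by adding half an
    LSB, shifted back down to Q15, and saturated."""
    return saturate((a * b + (1 << 14)) >> 15)
```

For example, `q15_mul(to_q15(0.5), to_q15(0.5))` represents 0.25. Saturation matters for the one product that cannot be represented: -1.0 times -1.0 would be +1.0, which clamps to the largest Q15 value.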
Title: Post-Silicon Testing of DGB501 SoC (CEVA-X1620)
Description: The DGB501 SoC is a dual-core processor chip with a DSP and an ARM processor. The responsibility in this project was to verify the working of the DSP subsystem in the SoC; this testing ensures that all functional units of the chip, including all peripherals, work correctly.
Key Result Areas
• Handled test coding for memory access, looping and control instructions, and monitored other instructions; wrote Perl scripts for some of the tests
• Tested on the CEVA development board using Perl and TCL scripts and resolved all the bug issues
• Performed documentation

Title: Real-time Implementation of Acoustic Echo Cancellation (LMS Algorithm) on CEVA-X1620
Description: Echo cancellation is a technique used in telephony to remove echo from a voice communication in order to improve voice quality on a call. It involves first recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or received signal. Once the echo is recognized, it can be removed by 'subtracting' it from the transmitted or received signal. The highly optimised code achieved < 4 MCPS on the CEVA development board.
Key Result Areas
• Managed coding and optimisation of time-critical modules for MCPS and PM constraints
• Integrated and tested on the CEVA development board

Title: Integration of GSM Speech, MPEG4 & H.264 Video Codecs with Mobile Multimedia DSP Framework on CEVA-X1620 platform
Description: All the optimised and standalone-tested codecs, including GSM speech codecs and MPEG4 and H.264 video codecs, were integrated with the DSP software framework. The key challenges were proper organization and management of the code and data memory of all the codecs. The lack of a data cache in the system meant that an efficient memory management scheme for stack, state and table memory had to be implemented. Discrepancies in codec MCPS and memory requirements between the standalone and integrated versions were resolved.
Key Result Areas
• Integrated and tested all the multimedia codecs on the CEVA development board; resolved issues related to memory and MCPS constraints
• Performed memory management so that MCPS in the integrated version matched the standalone version

Title: Development & Implementation of Image Post-Processing on CEVA-X1620
Description: Image post-processing is essential for any mobile application, as the decoded images need to be scaled to the dimensions of the LCD screen. Various image post-processing functions are needed, such as scaling and zooming of the decoded image, besides the required color conversions. All the functions were developed in C and tested using different types of images. These functions need to be highly optimised so that they consume as few processor MIPS as possible. This not only reduces the time needed to display the image on the LCD but also saves critical battery power. All functions were optimised at C level and later re-coded for further optimisation. The highly optimised code achieved < 12 MCPS for VGA on the CEVA development board.
Key Result Areas
• Coded scaling of a given image, rotation of the scaled image, and color conversion from YUV420 to RGB555, RGB565, RGB24 and RGB32 on the CEVA development board
• Integrated and tested on the CEVA development board and performed documentation

Title: Validation of Mobile Multimedia Software Framework on ARM926EJS Versatile Board
Description: The Mobile Multimedia Framework (MMF) enables one to use all the multimedia services available on the mobile handset. The MMF is built over a DGB framework, which serves as an abstraction layer between the mobile services and the operating system. The MMF integrates various multimedia service components such as an Audio Ringer, Image Viewer, Voice Player & Recorder, and Media Player. Perl scripts were written to automate the entire validation process. All the unit-tested and optimised codecs were integrated with the MMF and tested in real time on the ARM development platform. Multiple components that work together in a typical mobile application were also tested extensively. A number of bugs were identified in the MMF; these were documented, reported to the development team and tracked to closure using bug-tracking tools. A few trivial bugs were identified and fixed during the validation process itself.
Key Result Areas
• Tested all the multimedia codecs on the VC and ARM Versatile platforms
• Resolved the critical bug issues related to framework integration
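
The acoustic echo cancellation project above describes recognizing the delayed copy of the transmitted signal and subtracting it. A minimal floating-point sketch of how an LMS adaptive filter does this is given below. It is not the fixed-point CEVA-X1620 implementation; the function names, tap count, step size and the simulated test signal are all illustrative assumptions.

```python
import random


def lms_echo_cancel(far_end, mic, taps=4, mu=0.1):
    """Adapt an FIR estimate of the echo path with the LMS update
    w <- w + mu * e * x, and return (residual signal, final weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # "subtract" the recognized echo
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def simulated_call(n=3000, delay=2, gain=0.5, seed=0):
    """Hypothetical test signal: the microphone picks up only the far-end
    signal, delayed by `delay` samples and attenuated by `gain`."""
    rng = random.Random(seed)
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.0] * delay + [gain * s for s in far[:n - delay]]
    return far, mic
```

With the simulated signal, the weight at index 2 converges towards the echo gain and the residual decays towards zero, because the filter has learned the delay and attenuation of the echo path. A real deployment also has to handle near-end speech and background noise, which this sketch ignores.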