Algorithm Development and High-Performance Software Implementation: Areas I've done serious analysis and coding in include:
Programming Languages & Skills: Fluency in C, C++, OOP, inline assembly (both Intel and AT&T/GCC syntax, x86, SSE SIMD), multithreading (mainly Pthreads and various thread-affinity extensions thereto), floating-point, multiword arithmetic, instruction set architectures (mainly x86_64 and ARM Neon).
Data Structures: Fluent in implementation of graphs, lists, trees, queues, heaps, etc, both in roll-your-own fashion and using the C++ STL, also with use of high-efficiency GCC extensions such as hash set and hash map; significant experience with matrix computations, discrete transforms & convolutions.
|B. S.||The University of Michigan, Ann Arbor||1985||Aerospace Engineering|
|M. S.||The University of Michigan, Ann Arbor||1987||Aerospace Engineering|
|M. A.||The University of Michigan, Ann Arbor||1991||Mathematics|
|Ph.D.||The University of Michigan, Ann Arbor||1993||Aerospace Engineering|
|Sep 2007 - Nov 2011||
Senior Staff R&D Engineer
|Wrote the schematic-versus-layout-netlist comparison code for the SDL (schematic-driven layout) toolsuite of the Custom Designer (CD) chip-design tool. SDL was a brand-new project group budgeted for 4-6 full-time developers when I joined it. Early stages of SDL involved deploying a project infrastructure centered around a pair of schematic and layout netlists, resulting from parsing and elaboration of design data stored in the OpenAccess (OA) database atop which CD is built. The verify engine (which I had major responsibility for) compared the 2 netlists and flagged mismatches of various kinds (user had control over which of the possible menu of detectable errors were to be flagged) via OA markers created in the layout. These markers were used by a subsequent eco-fix engine, which supported both batch-mode and incremental (user-selected) error repair.|
|Jan 2006 - Aug 2007||
Senior Software Engineer
Agate Logic, Inc.
|Agate is the company which bought out Leopard Logic's IP in mid-2005. Ongoing SW tools development, mainly in the areas of router, architectural modeling and optimization, architectural development, quality of results of the p&r tool flow, and helping train and work with a group of young engineers in the Agate Logic Beijing office. Also developed a general method for hybrid planar/hierarchical integrated-circuit interconnect while there (Cf. U.S. Patent 7,786,757).|
|Nov 2001 - Oct 2004||
Staff Software Engineer
Leopard Logic, Inc.
|The Leopard Logic "router guy." More-or-less sole responsibility for development of the production router tools for both the original LL FPGA core and later for the hybrid Gladiator CLD core. For the hierarchical FP core, devised a novel "reverse routing" algorithm that is both extremely fast and provably optimal in terms of mux usage (Cf. U.S. Patent 7,725,863). This allowed us to reduce the area of the chip by roughly one-third and (despite the reduced mux counts) routinely route designs having over 90% cell utilization. For Gladiator, the challenge was to quickly develop and productize a placement engine and fast, efficient global and detail router for a mask-programmable heterogenous logic fabric consisting of 64K 4-lut-based corecells, 64 RAM/MAC pairs running through the MP base in 8 vertical columns, surrounded by a set of horizontal and vertical outer routing channels and 8 I/O banks, each with a different capacity.|
|Jul 2000 - Oct 2001||
Adaptive Silicon, Inc.
Los Gatos, CA
|Major responsibility for development of the detail-router component of the integrated Verdi toolsuite for the ASi embeddable FPGA core, as well as numerous high-performance graph-theoretic utility algorithms|
|Jul 1993 - Jun 1999||
Dept. of Mech.&Aero. Engineering
Case Western Reserve University
10900 Euclid Avenue
Cleveland, OH 44106-7222
|Taught undergraduate and graduate classes in Computational Fluid Dynamics, Numerical Methods in Engineering, Advanced Fluid Mechanics. Supervised the Senior Project capstone design class, as well as two PhD dissertations (Dr. Sih-Tsan Lee and Dr. Kamran Eftekhari) and several Masters' theses.|
7,725,863, Reverse Routing Methods for Integrated Circuits Having an Hierarchical Interconnect Architecture, Granted 25. May. 2010.
7,786,757:, Integrated Circuits with Hybrid Planar Hierarchical Architecture and Methods for Interconnecting Their Resources, Granted 31. August 2010.
January 2018: Mlucas used to verify the 50th known Mersenne prime. I did an 82-hour verify run using an AVX2 build of the code on a 32-core Xeon E5-2698 v3 @ 2.30GHz. Separately, Andreas Höglund did a verify run using an AVX512 build on one of the recently-introduced Amazon Web Services C5 instances based on the Intel Skylake Xeon architecture and its 512-bit vector-arithmetic capability.
January 2016: 49th Known Mersenne prime verified using a multithreaded AVX2-based build of my Mlucas code.
February 2013: 48th Known Mersenne prime verified using a multithreaded SSE2-based build of my Mlucas code.
November 2009: Mlucas 3.0 beta code released - main feature is optimized SSE2 vector-double-precision support for the x86_64 (Intel core2 and AMD64) chip families.
April 2009: 47th Known Mersenne prime verified using my Mlucas code.
September 2008: 45th and 46th Known Mersenne prime verified using a multithreaded in-development version of my Mlucas code. M45 was a new world-record prime = 243112609-1, a number having a whopping 12,978,189 digits! The parallel code achieves an average processor utilization (compared to single-threaded) of 70% running 16-threaded on 8 dual-cores of a Sparc VII server and 4 quad cores of a Sparc VIII for large-array double-precision FFTs of lengths 2048K and 4096K floating doubles, respectively.
1999-2004: Using my Mlucas code, performed official independent-hardware-and-software verification runs of the newly-discovered Mersenne primes M6972593 (2098960 decimal digits), M13466917 (4053946 decimal digits), M20996011 (6320430 decimal digits) and M24036583 (7235733 decimal digits).
1999-2000: In collaboration with Richard Crandall and Jason Papadopoulos, established the composite character of the 24th Fermat Number. At the time, this was the largest number ever resolved as prime or composite via nonfactorial compositeness test.
1999: Lucas, a prototype version of my ever-evolving discrete-weighted-transform-based Lucas-Lehmer code for testing Mersenne numbers, makes it into the SPEC 2000 floating-point benchmark suite.
1993: Honorable mention 2nd place, McIvor Award (for outstanding research in Applied Mechanics), The University of Michigan College of Engineering.
1992: Outstanding Graduate Student Achievement Award, Department of Aerospace Engineering, The University of Michigan.
1990-1991: NASA Graduate Student Researcher Fellowship, Lewis Research Center.
1989: Research Partnership Award, Horace H. Rackham School of Graduate Studies, University of Michigan.
1987: Ralph W. Dobbins Fellow, Department of Aerospace Engineering, The University of Michigan.
Development of fast numerical algorithms for transform-based large-integer arithmetic, primality proving, and factorization. Development of advanced fast Fourier transform (FFT) algorithms optimized for modern cache-based microprocessor architectures, and investigation of (in collaboration with Peter L. Montgomery) hybrid floating-point/integer transform algorithms for factorization and primality testing of very large integers, targeted for architectures with a good balance of floating-point and integer capabilities such as the then-cutting-edge Alpha 21264, Intel IA-64, AMD Opteron and Apple/Motorola PowerPC/AltiVec). In 1999 (in collaboration with Richard E. Crandall and Jason S. Papadopoulos) conducted computational proof of the composite character of the twenty-fourth Fermat number (5050446 decimal digits) using a highly efficient floating-point-based code of my own writing, running on a MIPS R10000 single-processor workstation. (JSP conducted a second, independent floating-point-based proof, and REC organized a distributed volunteer effort which used checkpoint files generated by the floating-point runs to effect a parallel, all-integer ``wavefront'' check of the Pépin squares using multiple machines, including 5 Pentia paid for by a grant from the Number Theory Foundation.)
Applied mathematics/numerical analysis:
Mathematical analysis of finite-difference, finite-volume and spectral methods to solve linear and nonlinear ordinary and partial differential equations; weakly and strongly nonlinear hydrodynamic stability theory; nonlinear wave propagation and singularity formation; sensitivity analysis of hydrodynamic stability operators. Development of high-accuracy numerical algorithms for the generalized linear eigenproblem
Ax=cBx. The latter algorithm is far more accurate than the generalized-eigensystem routines in LAPACK package, considered the state of the art in numerical linear algebra.
Theoretical and computational fluid dynamics:
Application of finite-difference, finite-volume and spectral methods to solve linear and nonlinear ordinary and partial differential equations arising in boundary-layer theory, swirling flows, compressible aerodynamics and hydrodynamic stability theory.
Classic Iterative Methods for Polynomials:
In 1994, after reading a brief but very interesting SIAM Review Classroom Note by J. Gerlach ("Accelerated Convergence in Newton's Method," SIAM Review 36, 272-276), discovered an elegant recurrence formula for generating Newton-style iterative schemes of arbitrarily high order: For f(x) sufficiently smooth, an Nth-order-converging iterative scheme is defined by
|xk+1 = xk -||f f'
[(f')2 - f f''/2]
|xk+1 = xk -||f [(f')2 - f f''/2]
[(f')3 - f f' f'' + f2 f'''/6]
Study of the structure and dynamics of concentrated vortices in aerodynamics, geophysical and bio-fluid dynamics, and of particular phenomena such as vortex instability and breakdown.
Theoretical and numerical hydrodynamic stability theory:
Study of linear and nonlinear stability of swirling and nonswirling shear flows, the mathematical behavior and numerical solution of equations arising in this context, and applicability to understanding of physical flow instabilities including (but not limited to) shear-layer instabilities, swirling-flow instabilities/vortex breakdown and transition to turbulence.