
林煜轩 Yuxuan Lin
yl6061@columbia.edu | +1 (646) 705-7015 | +86 19533230071
Hi! I am currently pursuing an M.S. in Computer Engineering at Columbia University, expecting to graduate in February 2027.
Prior to this, I earned a B.S. in Computer Engineering from the University of Illinois Urbana-Champaign and a B.Eng. in Electronic & Computer Engineering from Zhejiang University through the ZJU-UIUC dual-degree program.

Publication

Research Experiences
My research explores scalable and efficient training/inference systems, as well as generative models.


- Formulated PET (Positron Emission Tomography) reconstruction as a Bayesian inference problem.
- Applied Normalizing Flows (RealNVP, Glow) to model the posterior distribution of tracer activity.
- Implemented a Deep Probabilistic Imaging pipeline, enhancing reconstruction accuracy and providing crucial uncertainty estimates for clinical diagnosis.
- Scaled training with multi-GPU parallelization (3.7x speedup) while maintaining image quality.

- Created a Generative AI-powered Blender plugin in Python to automate the 3D design workflow, supporting 3D prototype management, segmentation, and Gaussian ↔ Mesh conversion.
- Engineered text-to-3D model generation using Transformer/Diffusion-based Gaussian Splatting and mesh rendering pipelines, cutting average modeling time for artists by 50%.
- Deployed a Flask-based backend to track user interactions and deliver 3D generation services.
Professional Experiences
I had fun doing internships in software development.

- Developed and integrated core control modules for a commercial autonomous cleaning robot running Linux, covering device state management, inter-module communication, version control, logging, and testing; the product has been released to market.
- Implemented sensor control and data acquisition logic on FreeRTOS, ensuring precise microcontroller control and real-time performance.

- Hold weekly office hours and discussion sessions to strengthen students' grasp of core discrete mathematics concepts.
- Designed and graded assignments and exams, ensuring alignment with learning objectives.
Selected Projects
I am familiar with C/C++ and Linux. I also have experience with CUDA, Golang (Gin, GORM), Python (PyTorch, Flask), SQL/NoSQL, x86 Assembly, Quartus, and UE5.

- Built an intelligent web-based decision system for U.S. stock trading, addressing the latency bottlenecks of serial multi-agent workflows while improving decision transparency and usability.
- Designed an architecture pairing a Gin gateway with a FastAPI inference service; persisted reasoning chains and decisions in PostgreSQL (GORM); containerized core microservices with Docker for rapid full-stack deployment.
- Implemented parallel multi-agent execution and dependency control using LangGraph, reducing end-to-end latency by ~70% compared to serial pipelines; supported both API-based and local LLM inference.

- Collaborated in a team of 4 to establish an end-to-end simulation pipeline driven by real execution traces from deep learning training and inference workloads, enabling reusable power- and performance-aware analysis for large-scale systems.
- Implemented event collection and conversion for multi-GPU distributed fine-tuning (FSDP + LoRA) at the collective-communication and kernel-execution levels, covering 10+ mainstream models including LLaMA, Qwen, BERT, and ResNet.

- In a team of 2, developed an FPGA-based Plants vs. Zombies game as a System-on-Chip (SoC), integrating SystemVerilog hardware modules with a Nios II soft-core processor.
- Implemented VGA display, sprite rendering, collision detection, and USB keyboard input modules based on FSM control and ROM buffering, achieving responsive real-time interaction and smooth 60 FPS gameplay on a 640x480 monitor.
- Integrated hardware-software co-design in Platform Designer, connecting VGA, USB, SDRAM, and logic modules via the Avalon bus and verifying timing through ModelSim simulation.

- Led a team of 4 to engineer an ESP32-based hardware-software co-designed IoT system, enabling real-time video streaming and asynchronous control of lighting and motor modules.
- Implemented a Flask backend with SQLite for local data storage, and deployed YOLO/MediaPipe inference service for real-time pose estimation and repetition counting.
- Built a Vue3 (UniApp) frontend to visualize workout sessions and user history, and integrated DeepSeek API to provide personalized fitness insights.

- Optimized the forward pass of a LeNet-5 convolutional layer using CUDA with an advanced GEMM kernel.
- Applied techniques including streams, Tensor Cores, memory tiling, FP16 arithmetic, and loop unrolling to maximize throughput.
- Achieved a 27,909x speedup over the CPU implementation and a 36% speedup over the parallel baseline.

- Led a team of 4 to construct a Linux-like operating system kernel from scratch using C and x86 assembly.
- Developed OS modules and services including virtual memory, a file system, terminal display, interrupt, system call, and exception handling, and device drivers for the keyboard, RTC, and PIT.
- Implemented switching between kernel and user modes, multi-terminal switching, and multi-process scheduling.

- In a team of 5, developed an adventure puzzle-solving game demo inspired by Infinity Blade, using Unreal Engine 5 and Blueprints.
- Implemented core gameplay mechanics including health and attack systems, collectible items, and AI enemies for 4 integrated levels (Lava Parkour, Laser Puzzle, Riddle Maze, and Traffic Jam), delivering varied gameplay.
Relevant courses:
Adapted design and styles from Jon Barron.
