Research Projects

My research journey spans both robotics and AI security, exploring the intersection of intelligent systems, security, and privacy. Currently, I'm focusing on making robotic systems more capable through advanced imitation learning techniques while also investigating security vulnerabilities in AI systems.

🤖

UCLA Robot Intelligence Lab (URIL)

Directed by Professor Yuchen Cui, URIL focuses on imitation learning, reinforcement learning, and human-robot interaction for complex robotic systems.

Learning where to look: Gaze-Guided Active Perception for Long-Horizon Imitation Learning

Imitation Learning Gaze Tracking Computer Vision Robotics

Developing gaze-guided active perception methods to improve long-horizon imitation learning by inferring human visual intent and dynamically decomposing complex tasks.

Key Contributions:

Utilized Meta Aria as a gaze tracking system and collaborated on homography-based and LightGlue+RNN transformer methods for translating human perspective to machine perspective
Architected a probabilistic gaze prediction model using MDN, GMM, and UNet to infer human visual intent
Trained a gaze-centric skill selector using LSTM that dynamically chooses robot skills at inference time
Improved long-horizon tasks' success rates by 16% through foveated vision model implementation
Presented research at Meta's Research Summit for Egocentric Perception 2025 and UCLA Research Symposium

Technical Details:

Foveated Vision Model: Applies Gaussian noise and reduces intensity to visual data outside human gaze point to simulate attention
System Integration: Assembled URIL ACT ALOHA bimanual teleoperation hardware system with ROS+OpenCV real-time data pipeline
Architecture Adaptation: Adapted ALOHA's ACT Architecture and benchmarked SOTA IL policies (Vanilla BC, Diffusion Policy, ACT)

Multi-sensory Electronics-integrated Robotic Limb for Intelligent manipulation (MERLIN)

Dexterous Manipulation Sensor Integration Collaborative Robotics Data Collection

Co-initiating a collaboration between URIL and RoMeLa to develop a high-quality dexterous-hand data-collection system with joint sensors, cameras, and tactile sensing.

Key Contributions:

Co-initiated MERLIN as a collaboration between URIL and RoMeLa (Robotics Mechanism Laboratory)
Co-designed a high-quality dexterous-hand data-collection glove with joint sensors, cameras, and tactile sensing
Integrated and extended Python APIs on ROHand platform to replay and visualize data from the prototype glove
Collecting dexterous hand demonstrations with MERLIN and DexUMI and training imitation policies to compare performance across data-collection modalities

Project Goals:

Create comprehensive dataset of dexterous manipulation tasks
Compare performance of different data-collection modalities (MERLIN vs DexUMI)
Develop robust imitation learning policies for complex manipulation tasks

🔒

UCLA Security and Privacy Lab

Directed by Prof. Yuan Tian Focusing on developing secure intelligent systems and investigating vulnerabilities in AI/ML systems, particularly in the context of adversarial attacks and LLM security.

Adversarial Attack and Defense on Images

Adversarial ML Computer Vision Security Defense Mechanisms

Investigating adversarial attacks on image classification and text-to-image generation models, and developing defense mechanisms to protect against malicious manipulations.

Key Contributions:

Tested gradient matching & ascent algorithms to introduce unwanted results for image classification models
Researched the effects of Encoder and Diffusion Attack on text-to-image generation models
Trained a protection model adding 1.6% perturbation to an image and prevented malicious generation on the targeted image

Research Focus:

Attack Methods: Gradient-based attacks on classification models, encoder/diffusion attacks on generative models
Defense Strategies: Adversarial training, input perturbation for protection
Impact: Preventing malicious content generation in text-to-image systems

Security Flaws in Translation: A Study of Prompt Injection in LLM Translators

LLM Security Prompt Injection Vulnerability Analysis Defense Mechanisms

Conducting security evaluation of prompt injection vulnerabilities in LLM-based translation systems and developing detection/defense mechanisms.

Key Contributions:

Conducted a security evaluation of prompt injection vulnerabilities in RedNote's LLM-based translation mode
Engineered automated data collection scrapers to gather injection prompts from comments on test posts
Developed a Gemini-based analysis pipeline utilizing 3/4 of manual data for few-shot prompt analysis
Investigated detection and defense mechanisms by performing attention analysis on malicious prompts

Methodology:

Data Collection: Automated scraping of injection prompts from social media/test platforms
Analysis Pipeline: Gemini-based system for few-shot prompt analysis and classification
Defense Research: Attention analysis to understand and mitigate prompt injection attacks

Publications

2023

A robust and novel semantic segmentation deep neural network for robotic surgery vision with a single RGB camera

Shi, Y.

In Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022) (Vol. 12610, pp. 73-77). SPIE.

研究项目

我的研究领域涵盖机器人与网络安全，探索智能系统、安全与隐私的交叉点。目前，我专注于通过先进的模仿学习技术提升机器人系统的能力，同时研究人工智能系统的安全漏洞。

🤖

UCLA 机器人智能实验室 (URIL)

由崔雨晨教授指导，URIL专注于复杂机器人系统的模仿学习、强化学习和人机交互。

学习注视点：用于长程模仿学习的视线引导主动感知

模仿学习视线追踪计算机视觉机器人学

开发视线引导的主动感知方法，通过推断人类视觉意图和动态分解复杂任务来改进长程模仿学习。

主要贡献：

使用Meta Aria作为视线追踪系统，并合作开发了基于单应性和LightGlue+RNN transformer的方法，将人类视角转换为机器视角
构建了使用MDN、GMM和UNet的概率视线预测模型来推断人类视觉意图
训练了基于LSTM的视线中心技能选择器，在推理时动态选择机器人技能
通过实施中央凹视觉模型，将长程任务的成功率提高了16%
在Meta 2025年自我中心感知研究峰会和UCLA研究研讨会上展示了研究成果

技术细节：

中央凹视觉模型： 在人类注视点外的视觉数据上应用高斯噪声并降低强度以模拟注意力
系统集成： 组装了URIL ACT ALOHA双手遥操作硬件系统，配备ROS+OpenCV实时数据管道
架构适配： 适配了ALOHA的ACT架构，并对SOTA IL策略（Vanilla BC、Diffusion Policy、ACT）进行了基准测试

用于智能操作的多感官电子集成机器人肢体 (MERLIN)

灵巧操作传感器集成协作机器人数据收集

共同发起URIL和RoMeLa之间的合作，开发具有关节传感器、摄像头和触觉感知的高质量灵巧手数据收集系统。

主要贡献：

共同发起了URIL和RoMeLa（机器人机构实验室）之间的MERLIN合作项目
共同设计了具有关节传感器、摄像头和触觉感知的高质量灵巧手数据采集手套
在ROHand平台上集成和扩展了Python API，以回放和可视化来自原型手套的数据
使用MERLIN和DexUMI收集灵巧手演示数据，并训练模仿策略以比较不同数据采集模式的性能

项目目标：

创建灵巧操作任务的全面数据集
比较不同数据采集模式（MERLIN与DexUMI）的性能
为复杂操作任务开发稳健的模仿学习策略

🔒

UCLA 安全与隐私实验室

由田园教授指导，专注于开发安全的智能系统，并研究AI/ML系统中的漏洞，特别是在对抗性攻击和LLM安全方面。

图像对抗攻击与防御

对抗性机器学习计算机视觉安全防御机制

研究图像分类和文本到图像生成模型的对抗性攻击，并开发防御机制以防范恶意操作。

主要贡献：

测试了梯度匹配和上升算法，对图像分类模型引入不良结果
研究了编码器和扩散攻击对文本到图像生成模型的影响
训练了一个保护模型，对图像添加1.6%的扰动，防止了针对目标图像的恶意生成

研究重点：

攻击方法： 基于梯度的分类模型攻击，生成模型的编码器/扩散攻击
防御策略： 对抗训练，用于保护的输入扰动
影响： 防止文本到图像系统中的恶意内容生成

翻译中的安全漏洞：LLM翻译器中的提示注入研究

LLM安全提示注入漏洞分析防御机制

对基于LLM的翻译系统中的提示注入漏洞进行安全评估，并开发检测/防御机制。

主要贡献：

对小红书基于LLM的翻译模式进行了提示注入漏洞的安全评估
设计了自动数据收集爬虫，从测试帖子的评论中收集注入提示
开发了基于Gemini的分析流程，利用3/4的手动数据进行少样本提示分析
通过对恶意提示进行注意力分析，研究了检测和防御机制

方法论：

数据收集： 从社交媒体/测试平台自动爬取注入提示
分析流程： 基于Gemini的系统，用于少样本提示分析和分类
防御研究： 注意力分析以理解和减轻提示注入攻击

发表论文

2023

用于机器人手术视觉的鲁棒新颖语义分割深度神经网络（仅使用单个RGB相机）

石逸可

第三届人工智能与计算机工程国际会议 (ICAICE 2022) (Vol. 12610, pp. 73-77). SPIE.