A Guide to Building a SpringBoot Speech Recognition Service with Qwen3-ASR-1.7B

张开发

• 2026/6/3 21:51:38 • 15 min read

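1. Introduction

Speech recognition is changing how we interact with devices. From smart assistants to speech-to-text applications, it has become part of everyday life. Alibaba's recently open-sourced Qwen3-ASR-1.7B model, with its multilingual support and recognition accuracy, gives developers a strong open-source option.

The model's most attractive feature is its coverage: it recognizes 52 languages and dialects (30 major languages and 22 Chinese dialects) and can even transcribe singing over background music. For Java developers, the core question this article answers is how to integrate that capability into a SpringBoot project quickly.

Whether you want to add voice input to an application or need to convert large volumes of audio to text, following the steps below should give you a working speech recognition service within about half an hour.

2. Environment Preparation and Project Setup

2.1 System Requirements and Dependencies

First make sure your development environment meets the following requirements:

- JDK 17 or later (the `record` used in section 3.3 needs Java 16+, so JDK 11 is not enough)
- Maven 3.6+ or Gradle 7+
- At least 8 GB of RAM (model inference is memory-hungry)
- Python 3.8+ (for the model inference environment)

Add the required dependencies to the SpringBoot project's pom.xml:

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>
        <!-- For handling multimedia files -->
        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>

2.2 Model Environment Configuration

Qwen3-ASR-1.7B runs in Python, so the project needs an embedded Python inference service. Create the directory src/main/python and add a requirements.txt:

    # Minimum versions (">=" is assumed); a newer transformers release
    # may be needed for this model, so check the model card.
    torch>=2.0.0
    transformers>=4.30.0
    librosa>=0.10.0
    soundfile>=0.12.0
    numpy>=1.21.0

3. Core Service Implementation

3.1 Python Inference Service Wrapper

Create the Python speech recognition service class in asr_service.py. The `__main__` block at the end lets the Java layer run the script with an audio path and read the transcription from standard output.

    import sys

    import librosa
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor


    class QwenASRService:
        def __init__(self, model_name="Qwen/Qwen3-ASR-1.7B"):
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
            # float16 is only reliable on GPU; fall back to float32 on CPU
            self.dtype = torch.float16 if self.device == "cuda" else torch.float32
            # Loading through AutoModelForSpeechSeq2Seq follows this article;
            # confirm the supported loading path on the model card.
            self.model = AutoModelForSpeechSeq2Seq.from_pretrained(
                model_name,
                torch_dtype=self.dtype,
                low_cpu_mem_usage=True,
            ).to(self.device)
            self.processor = AutoProcessor.from_pretrained(model_name)

        def transcribe_audio(self, audio_path):
            # Load the audio file and resample it to 16 kHz
            audio_input, sample_rate = librosa.load(audio_path, sr=16000)

            # Turn the waveform into model inputs
            inputs = self.processor(
                audio_input,
                sampling_rate=sample_rate,
                return_tensors="pt",
                padding=True,
            )

            # Move tensors to the device; cast float tensors to the model dtype
            inputs = {
                k: (v.to(self.device, dtype=self.dtype)
                    if torch.is_floating_point(v) else v.to(self.device))
                for k, v in inputs.items()
            }

            # Run inference
            with torch.no_grad():
                outputs = self.model.generate(**inputs)

            # Decode the result
            return self.processor.batch_decode(outputs, skip_special_tokens=True)[0]


    if __name__ == "__main__":
        # Command-line entry point: python asr_service.py <audio_path>
        print(QwenASRService().transcribe_audio(sys.argv[1]))

3.2 SpringBoot Service Layer

Create a Java service class that calls the Python inference script. It builds a new ProcessBuilder for every request (a shared, mutable builder is not thread-safe), stores the upload under a sanitized file name, and removes the temporary files in a finally block.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    import org.springframework.stereotype.Service;
    import org.springframework.web.multipart.MultipartFile;

    @Service
    public class SpeechRecognitionService {

        private static final String SCRIPT = "src/main/python/asr_service.py";

        public String transcribeAudio(MultipartFile audioFile) {
            Path tempDir = null;
            Path audioPath = null;
            try {
                // Save the uploaded audio under a sanitized name (keeps the extension)
                tempDir = Files.createTempDirectory("audio_");
                String original = audioFile.getOriginalFilename();
                String safeName = "upload_"
                        + (original == null ? "" : original.replaceAll("[^A-Za-z0-9._-]", "_"));
                audioPath = tempDir.resolve(safeName);
                Files.copy(audioFile.getInputStream(), audioPath,
                        StandardCopyOption.REPLACE_EXISTING);

                // Call the Python script; its stderr goes to the JVM's stderr
                Process process = new ProcessBuilder("python", SCRIPT, audioPath.toString())
                        .redirectError(ProcessBuilder.Redirect.INHERIT)
                        .start();

                // Read all of stdout before waiting so the child cannot block on a full pipe
                String result = new String(
                        process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
                if (process.waitFor() != 0) {
                    throw new IOException("Python process exited with a non-zero status");
                }
                return result;
            } catch (IOException e) {
                throw new RuntimeException("Speech recognition failed", e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("Speech recognition was interrupted", e);
            } finally {
                // Clean up temporary files
                deleteQuietly(audioPath);
                deleteQuietly(tempDir);
            }
        }

        private static void deleteQuietly(Path path) {
            try {
                if (path != null) {
                    Files.deleteIfExists(path);
                }
            } catch (IOException ignored) {
                // Best-effort cleanup
            }
        }
    }

Note that this design starts a Python process, and therefore loads the model, on every request. Section 4.2 lists ways to reduce that cost.

3.3 RESTful API

Create a controller class that exposes the web endpoints. Error responses carry the status "error" so that clients can tell them apart from successful transcriptions.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;
    import org.springframework.web.multipart.MultipartFile;

    @RestController
    @RequestMapping("/api/speech")
    public class SpeechRecognitionController {

        @Autowired
        private SpeechRecognitionService speechService;

        @PostMapping("/transcribe")
        public ResponseEntity<TranscriptionResponse> transcribeAudio(
                @RequestParam("audio") MultipartFile audioFile) {

            if (audioFile.isEmpty()) {
                return ResponseEntity.badRequest()
                        .body(new TranscriptionResponse("Please upload an audio file", "error"));
            }

            try {
                String transcription = speechService.transcribeAudio(audioFile);
                return ResponseEntity.ok(new TranscriptionResponse(transcription));
            } catch (Exception e) {
                return ResponseEntity.internalServerError()
                        .body(new TranscriptionResponse(
                                "Processing failed: " + e.getMessage(), "error"));
            }
        }

        @GetMapping("/health")
        public ResponseEntity<String> healthCheck() {
            return ResponseEntity.ok("Speech recognition service is running");
        }
    }

    record TranscriptionResponse(String text, String status) {
        public TranscriptionResponse(String text) {
            this(text, "success");
        }
    }

4. Advanced Features and Optimization

4.1 Batch Processing

For scenarios that involve large numbers of audio files, we can add batch processing. The service below submits each file to the executor defined in section 4.2. It passes the executor to CompletableFuture.supplyAsync explicitly, because an @Async method invoked from inside its own class bypasses Spring's proxy and would run synchronously.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executor;
    import java.util.stream.Collectors;

    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;
    import org.springframework.web.multipart.MultipartFile;

    @Service
    public class BatchSpeechRecognitionService {

        private final SpeechRecognitionService speechService;
        private final Executor executor;

        public BatchSpeechRecognitionService(SpeechRecognitionService speechService,
                                             @Qualifier("taskExecutor") Executor executor) {
            this.speechService = speechService;
            this.executor = executor;
        }

        public CompletableFuture<String> transcribeAsync(MultipartFile audioFile) {
            return CompletableFuture.supplyAsync(
                    () -> speechService.transcribeAudio(audioFile), executor);
        }

        public List<String> transcribeBatch(List<MultipartFile> audioFiles) {
            // Start every transcription first, then wait for all of them
            List<CompletableFuture<String>> futures = audioFiles.stream()
                    .map(this::transcribeAsync)
                    .collect(Collectors.toList());

            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        }
    }

4.2 Performance Optimization Suggestions

To improve service performance, consider the following measures:

- Model warm-up: load the model ahead of time when the service starts.
- Process pooling: maintain a pool of Python processes to avoid creating and destroying them for every request.
- Caching: cache the results for identical audio files.
- Asynchronous processing: use a message queue to handle large volumes of requests.

The thread pool used for asynchronous work can be configured like this:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.task.TaskExecutor;
    import org.springframework.scheduling.annotation.EnableAsync;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    @EnableAsync
    public class AsyncConfig {

        @Bean
        public TaskExecutor taskExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(4);
            executor.setMaxPoolSize(8);
            executor.setQueueCapacity(100);
            executor.setThreadNamePrefix("asr-worker-");
            executor.initialize();
            return executor;
        }
    }

5. Practical Examples

5.1 Audio Upload Page

Create a simple HTML page to test the speech recognition endpoint:

    <!DOCTYPE html>
    <html>
    <head>
        <title>Speech Recognition Test</title>
    </head>
    <body>
        <h2>Upload an audio file for recognition</h2>
        <form id="uploadForm" enctype="multipart/form-data">
            <input type="file" name="audio" accept="audio/*" required>
            <button type="submit">Start recognition</button>
        </form>
        <div id="result"></div>

        <script>
            document.getElementById('uploadForm').addEventListener('submit', async (e) => {
                e.preventDefault();
                const formData = new FormData();
                formData.append('audio', e.target.audio.files[0]);

                try {
                    const response = await fetch('/api/speech/transcribe', {
                        method: 'POST',
                        body: formData
                    });
                    const result = await response.json();
                    document.getElementById('result').innerText = `Result: ${result.text}`;
                } catch (error) {
                    console.error('Recognition failed:', error);
                }
            });
        </script>
    </body>
    </html>

5.2 Common Audio Formats

Qwen3-ASR-1.7B supports a variety of audio formats, but in practice it is best to convert everything to WAV for the most consistent results. The conversion itself is left as a stub here:

    import java.io.File;

    import org.springframework.stereotype.Component;
    import org.springframework.web.multipart.MultipartFile;

    @Component
    public class AudioPreprocessor {

        public File convertToWav(MultipartFile audioFile) {
            // Implement the format conversion here,
            // for example with FFmpeg or a Java audio library.
            throw new UnsupportedOperationException("Audio conversion is not implemented yet");
        }

        public boolean validateAudioFormat(MultipartFile file) {
            String contentType = file.getContentType();
            return contentType != null
                    && (contentType.startsWith("audio/") || contentType.equals("video/mpeg"));
        }
    }

6. Summary

Following the steps in this article, we integrated the Qwen3-ASR-1.7B speech recognition model into a SpringBoot project. From environment setup to API development, the process is not complicated. The key is understanding how to connect a Python machine learning model to a Java web service.

In practice the model's recognition accuracy is impressive, especially for Chinese dialects and multilingual scenarios. The model is large and needs a fair amount of hardware, but with sensible optimization and asynchronous processing it can meet the needs of most applications.

If you run into problems, start by testing with simple audio files and refine the pipeline step by step. For production, also add monitoring, logging, and failure recovery to keep the service stable.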
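Section 4.2 lists caching the results for identical audio files among the optimizations. The following is a minimal sketch of that idea, assuming that "identical" means byte-for-byte equal. The `TranscriptionCache` class is hypothetical and not part of the original project: it keys results by the SHA-256 digest of the audio bytes and takes the actual transcription step as a `Function`, so it could wrap `SpeechRecognitionService` or any other backend.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: caches transcriptions by the SHA-256 digest of the audio bytes. */
final class TranscriptionCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<byte[], String> transcriber;

    /** @param transcriber the real transcription step, called only on a cache miss */
    public TranscriptionCache(Function<byte[], String> transcriber) {
        this.transcriber = transcriber;
    }

    /** Returns the cached text for these bytes, transcribing them on first sight. */
    public String transcribe(byte[] audioBytes) {
        return cache.computeIfAbsent(
                sha256Hex(audioBytes), key -> transcriber.apply(audioBytes));
    }

    /** Number of distinct audio payloads seen so far. */
    public int size() {
        return cache.size();
    }

    /** Lower-case hexadecimal SHA-256 digest of the given bytes. */
    public static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

The map in this sketch is unbounded, and `computeIfAbsent` runs the transcription while holding a lock on part of the map, so a production version would likely add a size limit or expiry, for example through a dedicated caching library.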