Post content
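Algorithm help wanted: matching subtitle translations to their source lines

There are two subtitle texts, the original and its translation. The original has more lines than the translation, because one translated line is often produced from several original lines. The task is to pair every original line with its translated line so that bilingual subtitles can be produced.

Example source input (line count n: 2<n<100):
At that time, TV was doing quite a lot of singles, and I don't think that one was particularly expensive. I don't think it could have been a movie, but it was fine for TV, because they finance things differently. I know this is supposed to be about film, but if I'd made XX or OOXX as movies, there would've been 20,000-30,000 tickets sold on four sad prints.

Example translation input (line count m: 2<=m<n<100):
那会儿,电视剧集频繁上档,制作费用不算高
电影行不通,但电视剧可以,因为电视与电影的筹资方式迥异
我知道我们本该聊电影,但如果把《懊悔》或《不后悔》搬上大银幕
最终可能也就只能在几家孤零影院,卖出两三万张票

Desired matching (one of several acceptable results; it only has to respect the sequential nature of subtitles, so that nobody has to jump around while reading):
At that time... | 那会儿,电...
and I don't think.. | 那会儿,电...
I don't think it.. | 电影行不通,但..
because they.. | 电影行不通,但..
but if I'd made.. | 我知道我们本该..
there would've been | 最终可能也就..

How can a matching like this be computed?

The algorithm has to be implementable in code, because subtitle production is an automated process.
Matching about 50 lines against about 50 lines should finish within 3 minutes on a consumer-grade PC (a film has roughly 600 subtitle lines, so about half an hour in total).
The solution must be local and free, with no reliance on the computing power of external cloud platforms.

I have already discussed a few approaches with GPT, but none of them looked very workable, so I am posting the question here in case there is an expert among the channel's followers.

#讨论 🏳️🌈 酷儿影视频道 see.queers.top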
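One possible starting point, offered as a minimal sketch rather than a finished answer: treat the task as splitting the n source lines into m consecutive non-empty blocks, one block per translation line, and pick the split by dynamic programming. The cost used here is an assumption of this sketch, not something from the post: each block's share of the total source characters should be close to its translation line's share of the total translation characters. The function name `align_by_length` and the toy strings are hypothetical. Length is only a crude proxy for meaning, so a locally computed similarity score could replace the cost without changing the rest of the code.

```python
from itertools import accumulate


def align_by_length(src, tgt):
    """Assign each source line to one translation line, in order.

    Returns a list `assign` with len(src) entries, where assign[i] is the
    index in `tgt` matched to src[i]. Indices never decrease and every
    translation line is used at least once.
    """
    n, m = len(src), len(tgt)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= len(tgt) <= len(src)")

    # Prefix sums of source lengths, so any block length is a subtraction.
    prefix = [0] + list(accumulate(len(s) for s in src))
    total_src = prefix[-1] or 1
    total_tgt = sum(len(t) for t in tgt) or 1
    share_tgt = [len(t) / total_tgt for t in tgt]

    inf = float("inf")
    # best[j][k]: minimal cost of covering src[:k] with tgt[:j].
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0

    for j in range(1, m + 1):
        # Keep at least one source line for each remaining translation line.
        for k in range(j, n - (m - j) + 1):
            for i in range(j - 1, k):
                if best[j - 1][i] == inf:
                    continue
                share_src = (prefix[k] - prefix[i]) / total_src
                cost = best[j - 1][i] + (share_src - share_tgt[j - 1]) ** 2
                if cost < best[j][k]:
                    best[j][k] = cost
                    back[j][k] = i

    # Follow the back-pointers from the end to recover the blocks.
    assign = [0] * n
    k = n
    for j in range(m, 0, -1):
        i = back[j][k]
        for line in range(i, k):
            assign[line] = j - 1
        k = i
    return assign


# Toy data (hypothetical): two short lines pair with the long translation.
print(align_by_length(["aaaa", "bbbb", "cc", "dd"], ["xxxxxxxx", "yyyy"]))
# [0, 0, 1, 1]
```

The three nested loops take on the order of m·n² steps, which for n<100 is under a million cost evaluations per batch, and everything runs locally with the standard library only. Because the blocks are consecutive by construction, the result cannot force jumpy reading, although the block boundaries are only as good as the length heuristic.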