SingVisio User Study
Thank you for participating in this user study for the SingVisio interactive visual analysis system for diffusion-based singing voice conversion.
Before taking the questionnaire, please read the system instructions: https://x8gvg3n7v3.feishu.cn/wiki/LsPxwZc11iKz35krGf7cUjagnfe
To access SingVisio on the CUHKSZ campus, visit http://10.26.1.178:8080; for off-campus users, visit https://dsvc.openmmlab.org.cn/.
To access both links, please paste them into your browser.
This questionnaire comprises three parts with 35 questions and is estimated to take 18-24 minutes to complete.
If you have any questions or suggestions, please contact us (mingxuanwang1@link.cuhk.edu.cn; chaorenwang@link.cuhk.edu.cn; xueliumeng@cuhk.edu.cn).
Part One: Background
*
1. How many years of experience do you have in the field of machine learning?
Less than one year
One to three years
More than three years
*
2. How many years of experience do you have in the field of music and singing processing?
Less than one year
One to three years
More than three years
Part Two: Objective Evaluation
Task 1: Time-wise step comparison
Configuration: Step Comparison Mode, with Source Singer 1, Target Singer 5, and Song 1
*
3. How does the audio quality of the converted singing voice evolve throughout the diffusion generation process (from step 999 to step 0)?
There is a gradual increase in noise.
The audio alternates between being noisy and intelligible.
The audio gradually transitions from noisy to intelligible.
*
4. How do the F0 curve and mel spectrogram change during the diffusion generation process?
They gradually become clearer.
They gradually become more blurred.
They alternate between clarity and blurriness.
*
5. What is the F0 range of the final converted result (step 0) for source singer 1, song 1, and target singer 5?
270-450 Hz
100-250 Hz
450-550 Hz
*
6. At which step does the content of the converted singing voice start to become relatively intelligible?
Step 500
Step 100
Step 10
Task 2: Time-wise metric comparison
Configuration: Metric Comparison Mode, with Source Singer 2, Target Singer 6, and Song 17
*
7. What is the relationship between the F0CORR (F0 Correlation) metric and the model's performance?
No relation
Positive correlation; higher values indicate better model performance
Negative correlation; lower values indicate better model performance
*
8. What aspect of audio does the FAD (Fréchet Audio Distance) metric evaluate?
Intelligibility
Timbre similarity
Sound quality
*
9. How does the MCD (Mel-cepstral Distortion) curve change as the diffusion steps progress?
No change
Gradually decreasing
Gradually increasing
*
10. What does the changing trend of Dembed (Singer Similarity) during the diffusion generation process indicate?
The timbre similarity between the converted result and the target singer's voice steadily improves.
The timbre similarity between the converted result and the target singer's voice steadily diminishes.
The content similarity between the converted result and the target singer's content steadily improves.
Task 3: Pair-wise SVC comparison with different target singers
Configuration: Target Singer Comparison Mode, with Source Singer 12 and Song 12. The target singers are specified in each question.
*
11. In the Target Singer Comparison mode, is there a difference in timbre between target singer 8 and target singer 13?
Yes, there is a difference
No, there is no difference
*
12. In the Target Singer Comparison mode, what are the fundamental frequency (F0) ranges of target singer 8 and target singer 9, respectively?
150-450 Hz and 70-170 Hz
150-200 Hz and 150-200 Hz
300-400 Hz and 200-250 Hz
*
13. In the Target Singer Comparison mode, with source singer 12, song 12, and target singers 8 and 9 respectively, at diffusion step 30, which result has clearer harmonics (horizontal bright lines) in the mel spectrogram?
Song 12: Singer 12 -> Singer 8
Song 12: Singer 12 -> Singer 9
Both are the same
Task 4: Pair-wise SVC comparison with different source singers
Configuration: Source Singer Comparison Mode, with Source Singer 8 & 12, Target Singer 13, and Song 12
*
14. What are the timbres of source singer 8 and singer 12, respectively?
Male and male
Female and male
Female and female
*
15. From the projected embedding of Step+Noise+Condition (Last layer) shown in the projection view, we can observe the motion trajectories of the steps corresponding to the two conversion processes:
The motion trajectories completely overlap.
It is unclear what the motion trajectories look like.
The direction of motion is consistent, but the specific trajectories are different.
*
16. What are the timbre and fundamental frequency range of the final generated results at diffusion step 0?
Female, 100-300 Hz
Male, 100-300 Hz
Male, 200-400 Hz
Task 5: Pair-wise SVC comparison with different source songs
Configuration: Song Comparison Mode, with Source Singer 12, Target Singer 13, and Song 12 & 13
*
17. Are the contents of the two sources (Singer 12 Song 12 and Singer 12 Song 13) consistent with the content of the target (Singer 13)?
The content is different.
The content is the same.
One is the same, the other is different.
*
18. For the final two converted results at step 0 (Song 12: Singer 12 -> Singer 13 and Song 13: Singer 12 -> Singer 13), into whose timbre should they be converted?
Into the timbre of Singer 12 for both conversions.
Into the timbre of Singer 12 for the first and Singer 13 for the second conversion.
Into the timbre of Singer 13 for both conversions.
*
19. For the final two converted results at step 0 (Song 12: Singer 12 -> Singer 13 and Song 13: Singer 12 -> Singer 13), which singing content do they match?
The singing content in Target (Singer 13)
The singing content in Source (Singer 12 Song 12) and Source (Singer 12 Song 13)
The singing content in Source (Singer 12 Song 13) and Source (Singer 12 Song 12)
Part Three: Subjective Evaluation
Explainability
*
20. It is easy to compare the diffusion generation results at different steps.
Strongly disagree
Strongly agree
1
2
3
4
5
*
21. The metric curve over diffusion steps is helpful for analyzing the changes in metrics during the diffusion generation process.
Strongly disagree
Strongly agree
1
2
3
4
5
*
22. The pairwise comparison of converted results under two different source singer conditions is helpful for understanding the singing voice conversion task.
Strongly disagree
Strongly agree
1
2
3
4
5
*
23. The system is helpful for understanding the working mechanism of the iterative generation process in a diffusion model.
Strongly disagree
Strongly agree
1
2
3
4
5
Analysis (Functionality)
*
24. The system supports interactive selection and comparison of generated results at different diffusion steps.
Strongly disagree
Strongly agree
1
2
3
4
5
*
25. The system supports the display and observation of metric trends that change with the diffusion steps.
Strongly disagree
Strongly agree
1
2
3
4
5
*
26. The system supports pairwise comparison under different target singer conditions.
Strongly disagree
Strongly agree
1
2
3
4
5
*
27. The system supports the visualization of intermediate results in diffusion-based singing voice conversion.
Strongly disagree
Strongly agree
1
2
3
4
5
Visual Design (Effectiveness)
*
28. The projection view is very helpful for observing and controlling the diffusion steps.
Strongly disagree
Strongly agree
1
2
3
4
5
*
29. The step view is very helpful for an overall observation of the diffusion generation process.
Strongly disagree
Strongly agree
1
2
3
4
5
*
30. The comparison view is helpful in displaying information under different modes.
Strongly disagree
Strongly agree
1
2
3
4
5
*
31. The interactive design of the system is effective.
Strongly disagree
Strongly agree
1
2
3
4
5
Usability (User-friendly UI)
*
32. It is easy to learn and use the system.
Strongly disagree
Strongly agree
1
2
3
4
5
*
33. I would recommend the system to others who need it.
Strongly disagree
Strongly agree
1
2
3
4
5
User Information
*
34. Please tell us your name.
*
35. Please tell us your email address.
36. If you have any suggestions, please enter them below.