Thoughts on Sentence Collector
Anything? I have a lot to say.
- The How to absolutely needs rules for each language.
- A guide for each language (or a link to one). At the very least, I think the Playbook and the Collector's How to should be translated.
- Filter out specific sentences, for example by foreign-language characters or by number of characters. We need rules for each language.
- This relates to the guidelines, which should also cover "grey areas". For example, some arrangement should be made for sensitive sentences involving sexual language, politics, or religion.
- For example, let's say there is a sentence stating "facts of history". It's quite common for what is considered "governing" in country A to be considered "aggression" in country B. We need to keep in mind that people everywhere will read the sentence.
- Sex is an important form of expression for us humans, but it's also a pretty difficult issue. Some sentences are likely painful for some people to say aloud, such as 私はエッチです (I'm lecherous.) or 彼女は彼に身を任せた (she gave herself to him.) or おっぱい (boobs). It should also be noted that the terms and conditions allow minors to read.
- In fact, the Japanese source text says, 私よりもっとエッチな人もいて安心しました。 (I was relieved to see that some people were even more sexually active than me.) What do you guys think?
- I admit that this is a poor translation of エッチ. But when we say "エッチ", Japanese speakers definitely perceive something sexual in it. (And we read it aloud!)
- Change the design of the review button. Why the thumb design? Yes, thumb up means good and thumb down means no good. But it's hard to tell them apart, for two reasons.
- First of all, we don't have that gesture in Japan. This means that there are probably other countries and regions of the world that don't have a thumbs up or thumbs down culture. As with services like YouTube, I always feel like I'm in a "foreign country" when I see these regional expressions. Sometimes I wonder if Common Voice is really trying to engage with people around the world.
- In a word, this is about universal design. We are trying to collect distinctive data, but ironically, Common Voice itself should not be "distinctive".
- Secondly, it's misleading. The only difference between the two buttons is the thumb itself. Why didn't they simply use "yes" and "no"? A "yes" button and a "no" button are mentioned in the How to, which adds to the confusion. Or did the buttons use to be "yes" and "no" buttons? Even if so, I'd like to see the description changed.
- Another cause could be that the sentence blocks are close together. Sometimes the thumbs are all lined up in a row and I can't tell which thumb belongs to which sentence. We should at least draw a border or alternate the color of the blocks.
- Change the color of the review button, because, as I mentioned above, it's confusing. Ideally, the buttons should be clearly distinguishable, such as green for "yes" and red for "no" (yes, I know this combination is difficult for colorblind people to see. Another combination should be used, and "approval" and "rejection" should be clearly distinguishable by letters and symbols alone). I think turning black when pressed is a good design.
- Corpus links. Ref: We need a text corpus link
- A page per sentence where we can see the metadata. (Is this not realistic because of the huge number of them?)
- Show the exact number of unreviewed sentences.
- Show the progress of total and unreviewed sentences, e.g., as colored bars.
- "My added sentences" page, like the Rejected Sentences page. This is handy when doing a self-review.
- Discussion Button. Discuss with other users the sentences that a reviewer cannot judge. (It might be better to create a page for a sentence only when this button is pressed. As I said before, the number of sentences is huge.)
- As for one-sentence discussions, ideally they should be possible within the Collector tool. Discourse is too cumbersome, with too much information. We need a page where we can comment on "specific sentences". Yes, that's what @irvin was suggesting in post #10.
- Hide Button (if that's appropriate). I've mentioned this in We need a Q&A. It should also allow us to see only what we've hidden (i.e., a "Hidden sentences" page).
- The ability to search for sentences.
- The ability to check the source of a sentence.
- The ability to search for the source of a sentence.
- The ability to check the user who added or reviewed the sentence.
- The ability to flag the sentence. For later review. A "hold". For example, when we want to review a difficult sentence after we've consulted a dictionary.
- A hold period will be set up. When the period expires, the flag will be removed and other users will be able to review the sentence.
- The ability to add sentences from a text file (.txt).
- There should be rules for file formatting as well. Like one sentence per line.
- The text file can be previewed before it is submitted.
- We are notified of filtered sentences and can see what caused the filtering (e.g., the offending letters turn red).
- Dark Mode. Makes long hours of work easier.
- Would it be faster to use Dark Reader or Midnight Lizard? (Sorry, I didn't try. Which means I don't want to put in an add-on!)
- We could distribute user stylesheets, which can be applied with, for example, Stylus.
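To make the text-file upload and per-language filter items in the list above more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `LanguageRule` class, the `RULES` values, and the `preview_upload` function are hypothetical and are not the Collector's actual code or limits. It reads one sentence per line, applies a per-language character set and length limit, and reports the position of each offending character so a preview could highlight it (e.g., in red).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageRule:
    """Hypothetical per-language rule set (not the Collector's real schema)."""
    max_chars: int
    allowed: re.Pattern  # matches exactly one allowed character


# Illustrative values only; real rules would be decided per language community.
RULES = {
    "en": LanguageRule(max_chars=125, allowed=re.compile(r"[A-Za-z ,.'?!-]")),
    # Hiragana, katakana, common kanji, and basic Japanese punctuation.
    "ja": LanguageRule(max_chars=50, allowed=re.compile(r"[\u3040-\u30ff\u4e00-\u9fff、。！？]")),
}


def check_sentence(sentence, rule):
    """Return ([(index, char), ...] for disallowed characters, too_long flag)."""
    bad = [(i, ch) for i, ch in enumerate(sentence) if not rule.allowed.fullmatch(ch)]
    return bad, len(sentence) > rule.max_chars


def preview_upload(text, lang):
    """Split a .txt upload (one sentence per line) into accepted and filtered."""
    rule = RULES[lang]
    accepted, filtered = [], []
    for line in text.splitlines():
        sentence = line.strip()
        if not sentence:
            continue  # ignore blank lines
        bad, too_long = check_sentence(sentence, rule)
        if bad or too_long:
            filtered.append({"sentence": sentence, "bad_chars": bad, "too_long": too_long})
        else:
            accepted.append(sentence)
    return accepted, filtered


accepted, filtered = preview_upload("こんにちは。\nHello world\n", "ja")
for item in filtered:
    print(item["sentence"], item["bad_chars"], item["too_long"])
```

Returning the character positions rather than just a pass/fail result is what would let the preview show where the problem is before the file is submitted.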
I sometimes look back and review the sentences I've "ignored", so a random order is a bit inconvenient for me. But,
- To flag the sentences we care about.
- To search for sentences.
If these are possible, we might try to introduce them.
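As a rough illustration of flagging with a hold period (the "hold" idea from my list above), here is a small Python sketch. The `HoldRegistry` class and the seven-day `HOLD_PERIOD` are assumptions I made up for this example; the actual period and storage would need to be decided. While a hold is active only the flagging user can review the sentence, and once it expires anyone can.

```python
from datetime import datetime, timedelta

HOLD_PERIOD = timedelta(days=7)  # assumed value, for illustration only


class HoldRegistry:
    """Hypothetical in-memory store of 'hold' flags on sentences."""

    def __init__(self, period=HOLD_PERIOD):
        self.period = period
        self._holds = {}  # sentence_id -> (user, expires_at)

    def holder(self, sentence_id, now):
        """Return the user holding the sentence, or None if no active hold."""
        entry = self._holds.get(sentence_id)
        if entry is None:
            return None
        user, expires_at = entry
        if now >= expires_at:
            # The hold period has expired: remove the flag.
            del self._holds[sentence_id]
            return None
        return user

    def flag(self, sentence_id, user, now):
        """Put a sentence on hold for later review by `user`."""
        current = self.holder(sentence_id, now)
        if current is not None and current != user:
            raise ValueError("sentence is already on hold by another user")
        self._holds[sentence_id] = (user, now + self.period)

    def can_review(self, sentence_id, user, now):
        """Anyone may review unless another user holds the sentence."""
        current = self.holder(sentence_id, now)
        return current is None or current == user
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside, is only a choice to keep the sketch easy to test.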
It's hard for me to notice mistakes when there are similar sentences in a row. Well, I generally agree that "boring" is the right word.
But, as @irvin says in post #14, we should not overlook the benefits of making connections (relationships) between each sentence.
Users should show everyone why they are rejecting the sentence.
- Press reject button.
- The choices are displayed.
  - Incorrect. (e.g., misspellings, lack of.)
  - Inappropriate language. (e.g., sexual language, hate speech, etc.)
  - It's hard to pronounce.
  - Can't understand the meaning.
  - Other (user enters)
- Select and reject.
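The steps above could be modelled roughly like this in Python. It is only a sketch: the `RejectReason` enum, the `Rejection` record, and the `reject` function are hypothetical names I chose, not part of the Collector. The one rule it enforces is that choosing "Other" requires the user to enter an explanation.

```python
from dataclasses import dataclass
from enum import Enum


class RejectReason(Enum):
    """The choices shown after pressing the reject button (taken from the list above)."""
    INCORRECT = "Incorrect"
    INAPPROPRIATE = "Inappropriate language"
    HARD_TO_PRONOUNCE = "It's hard to pronounce"
    UNCLEAR_MEANING = "Can't understand the meaning"
    OTHER = "Other"


@dataclass(frozen=True)
class Rejection:
    """Hypothetical record stored with a rejected sentence so everyone can see why."""
    sentence_id: int
    reviewer: str
    reason: RejectReason
    comment: str = ""


def reject(sentence_id, reviewer, reason, comment=""):
    """Select a reason and reject; 'Other' must come with a user-entered comment."""
    comment = comment.strip()
    if reason is RejectReason.OTHER and not comment:
        raise ValueError('"Other" requires a comment from the user')
    return Rejection(sentence_id, reviewer, reason, comment)
```

Storing the reason with the sentence is what would later let a re-posted sentence show why it was rejected.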
I agree with the idea of re-posting after a fix. But who is going to fix it? The user who added the sentence? Another user?
Wouldn't the work get done faster if the rejected sentences were public and could be fixed by any user?
In any case, for a re-posted sentence, we should be able to
- see that it has been reposted.
- see why it was rejected.
That is how it should work. To maintain neutrality, it should be reviewable by anyone other than the user who rejected it.
Protesting the rejection
I also think the user who added the sentence needs to be able to protest (required if the user wants to re-post the sentence without fixing it). For example, let's say it was rejected because of a "misspelling". But it could be that the user who rejected it just didn't know the words or grammar. Therefore,
- The sentence is rejected.
- The user who added the sentence presses "protest button" (or simply "publish button" or "discuss button")
- A sentence discussion page is created.
- Each user gives their opinion on the page.
I think this kind of process is an appropriate way to maintain neutrality.
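The protest process above is essentially a small state machine, which could be sketched like this in Python. The `Status` values and the `Sentence` class are hypothetical names for illustration only, and I am assuming that only the user who added the sentence can press the protest button.

```python
from enum import Enum, auto


class Status(Enum):
    PENDING = auto()
    REJECTED = auto()
    IN_DISCUSSION = auto()


class Sentence:
    """Hypothetical sentence record with a protest/discussion workflow."""

    def __init__(self, text, author):
        self.text = text
        self.author = author
        self.status = Status.PENDING
        self.rejected_by = None
        self.discussion = None  # becomes a list of (user, opinion) once created

    def reject(self, reviewer):
        self.status = Status.REJECTED
        self.rejected_by = reviewer

    def protest(self, user):
        """The author presses the protest button; a discussion page is created."""
        if self.status is not Status.REJECTED:
            raise ValueError("only rejected sentences can be protested")
        if user != self.author:
            raise PermissionError("only the user who added the sentence can protest")
        self.discussion = []
        self.status = Status.IN_DISCUSSION

    def comment(self, user, opinion):
        """Any user gives their opinion on the discussion page."""
        if self.status is not Status.IN_DISCUSSION:
            raise ValueError("no discussion page exists for this sentence")
        self.discussion.append((user, opinion))
```

Creating the discussion page only at the protest step also fits the concern about the huge number of sentences, since pages exist only where someone asked for one.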
Hmmm, are these filters shared by all users? Or are they configurable on a per-user basis? Personally, I think it's best to have both. I think the filters we share should be carefully considered.
It's important to be able to see clearly in the Collector tool which users added and reviewed a sentence. But when reviewing a sentence, the user's information becomes noise. The "Review" screen should hide it, and the "Search" screen should show the metadata of the sentence (user, source, etc.). In this way, judgments about sentences are kept distinct from judgments about users.
Why do we want to use user filters? That's the key question. Most reasons to filter users stem from a problem with a sentence. Therefore,
- Rejected sentence
- Reposted sentence
- Sentence with copyright issues
For the above, allow the users involved to be added to each user's individual filter. Then add the most problematic users to a shared filter.
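A minimal sketch of how individual and shared user filters might work together, in Python. The function name and the data shapes (`added_by` key, sets of user names) are assumptions for illustration, not how the Collector stores anything.

```python
def visible_sentences(sentences, viewer, personal_filters, shared_filter):
    """Hide sentences added by users in the shared filter or the viewer's own filter.

    sentences:        list of dicts with 'text' and 'added_by' keys (hypothetical shape)
    personal_filters: dict mapping a viewer to the set of users they filter out
    shared_filter:    set of users filtered out for everyone
    """
    blocked = shared_filter | personal_filters.get(viewer, set())
    return [s for s in sentences if s["added_by"] not in blocked]


sentences = [
    {"text": "Sentence one.", "added_by": "user_a"},
    {"text": "Sentence two.", "added_by": "user_b"},
    {"text": "Sentence three.", "added_by": "user_c"},
]
personal = {"me": {"user_b"}}   # my individual filter
shared = {"user_c"}             # filter shared by all users

print([s["text"] for s in visible_sentences(sentences, "me", personal, shared)])
```

Keeping the two filters as separate sets means the shared one can be reviewed carefully while each user stays free to manage their own.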
I would rather have a source filter. If there is an alleged copyright violation in a sentence, we can exclude it.
I think @irvin's opinion in post #3 is reasonable. Yes, even if we make the source searchable, not all users will submit a "source"... I agree with the filtering itself.
I think it's an option. I don't need it (because I'll check it myself).
- Two hundred sentences were added today!
- Twenty sentences need user comments.
The frequency of sending is also important. Once a day, a week, or a month? Or every time a sentence is added? We might also want options to be notified of "only rejected sentences" or "only sentences that require discussion".
Maybe we don't need a self-review when we upload. If we're able to do a self-review anyway, there would be no reason to interrupt the upload process. This should have been written in Sentence Collector - Review before Submit. Sorry.
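The notification options could look something like this in Python. This is only a sketch under my own assumptions: `Frequency`, `NotificationSettings`, `build_digest`, and the event kinds are made-up names, and the actual scheduling (when a digest is sent) is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Frequency(Enum):
    EVERY_SENTENCE = "every time a sentence is added"
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass
class NotificationSettings:
    """Hypothetical per-user options; scheduling by `frequency` is out of scope here."""
    frequency: Frequency = Frequency.WEEKLY
    only_rejected: bool = False
    only_needs_discussion: bool = False


TEMPLATES = {
    "added": "{n} sentences were added.",
    "rejected": "{n} sentences were rejected.",
    "needs_discussion": "{n} sentences need user comments.",
}


def build_digest(events, settings):
    """Summarize events (dicts with a 'kind' key) according to the user's options."""
    wanted = set()
    if settings.only_rejected:
        wanted.add("rejected")
    if settings.only_needs_discussion:
        wanted.add("needs_discussion")
    if not wanted:
        wanted = set(TEMPLATES)  # no restriction chosen: report everything
    counts = Counter(e["kind"] for e in events if e["kind"] in wanted)
    return [TEMPLATES[k].format(n=counts[k]) for k in TEMPLATES if counts[k]]
```

For example, calling `build_digest(events, NotificationSettings(only_needs_discussion=True))` would leave out the "added" line and keep only the sentences waiting for comments.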
Tell people from the platform
On the platform, let people know that the text is also being collected by volunteers. Perhaps people who only record and validate voices don't know about it. Currently, when we run out of sentences to read, we are guided to the Collector tool. But I think sentence collection should be mentioned by the platform itself, because sentence collection is in fact just as important as recording!