Thread by @Mazi_Obinna, In #machinelearning, the quality of the #data goes a long way toward [...]

Victor Umunna

Mazi_Obinna

In #machinelearning, the quality of the #data goes a long way toward determining the quality of the result. Just as in manufacturing, the higher the #quality of the input, the more likely it is that the final product is of high quality as well: “garbage in, garbage out”.

Garbage data here refers to data that are poorly labelled, inaccurate, or biased. “Poor data quality is enemy number one to the widespread, profitable use of machine learning,” says Thomas C. Redman, aka “The Data Doc”, one of the original pioneers of data quality management.

To make sure you get the right data, you must first clarify your objective and know your data source. This will ensure that your data are aligned with the values and goals of your current project. Then, build plenty of time for data quality fundamentals into your overall project plan, and challenge assumptions at every stage.
It is entirely possible to build a culture of data quality into machine learning projects, and accuracy can be achieved through comprehensive testing, cleaning, and auditing of data.
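
As a rough illustration of what auditing data can look like in practice, here is a minimal sketch in plain Python. It is not from the thread: the function name `audit_records`, the field names, and the choice of checks (missing values, exact duplicates, unexpected labels, class balance) are all assumptions made for the example, and a real project would likely need more.

```python
from collections import Counter


def audit_records(records, label_key, allowed_labels):
    """Return a small data-quality report for a list of dict records.

    Hypothetical helper for illustration. Assumes every field value is
    hashable (strings, numbers, None) and that keys are strings.
    """
    missing = 0              # records with at least one None or empty field
    bad_labels = 0           # records whose label is not in allowed_labels
    duplicates = 0           # exact repeats of an earlier record
    seen = set()
    label_counts = Counter()  # class balance over valid labels only

    for rec in records:
        if any(v is None or v == "" for v in rec.values()):
            missing += 1

        label = rec.get(label_key)
        if label in allowed_labels:
            label_counts[label] += 1
        else:
            bad_labels += 1

        # Order-independent fingerprint of the record for duplicate detection.
        fingerprint = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)

    return {
        "total": len(records),
        "missing": missing,
        "duplicates": duplicates,
        "bad_labels": bad_labels,
        "label_counts": dict(label_counts),
    }


if __name__ == "__main__":
    sample = [
        {"text": "good movie", "label": "pos"},
        {"text": "bad movie", "label": "neg"},
        {"text": "good movie", "label": "pos"},   # exact duplicate
        {"text": "", "label": "pos"},             # missing text
        {"text": "meh", "label": "neutral"},      # label outside the allowed set
    ]
    print(audit_records(sample, "label", {"pos", "neg"}))
```

A skewed `label_counts` is one possible early hint of bias in how the data were collected, though it does not prove it; the counts are only a prompt to go back and question the data source.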
