Surveys under scrutiny; Pulse Asia stands by method

By: Krixia Subingsubing - 3 years ago

MANILA, Philippines - With only four days left until Election Day, some of the country’s top statisticians are at loggerheads over whether survey designs need to be updated to reflect public sentiment more accurately, in light of survey results that indicate a victory for Ferdinand Marcos Jr.

Statistics experts like Romulo Virola, former secretary-general of the National Statistical Coordination Board (NSCB), and Dr. Peter Cayton of the University of the Philippines, believe that the recent Pulse Asia surveys showing Marcos way ahead of his closest rival, Vice President Leni Robredo, had under- and overrepresented certain sectors.

Both believe that those in Classes A and B as well as the 18-41 age group were underrepresented, while there was an overrepresentation of those in Classes D and E. Virola also believes there was underrepresentation of those who reached college.

Cayton said that over- or underrepresentation meant that the “proportion of sample agents from a survey may be higher or lower than what is typically expected from a larger population.”

Virola clarified that he did not think Pulse Asia used a wrong sampling method, but that the over- and underrepresentation was the result of its post-stratification process, which focused on regional stratifications over sociodemographic group (SDG) profiles.

He noted that several studies and polls abroad have shown that age, class, and educational attainment have stronger impacts on voter preferences.

‘Flaws’

Virola tried to correct for these “flaws” by reweighting the results of the March 16-21, 2022, Pulse Asia survey, which showed a 56-24 gap between Marcos and Robredo.

He did this using the 2017 socioeconomic classification system (1SEC) developed by the UP School of Statistics (to adjust for underrepresentation of the ABC classes); the distribution of educational attainment of the voting-age population from the Philippine Statistics Authority (to adjust for underrepresentation of those who reached college); and Comelec data on registered voters by age (to adjust for underrepresentation of young voters).
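The general idea behind this kind of reweighting can be illustrated with a minimal sketch. It is not Virola’s actual computation: the age groups, sample shares, population shares and support levels below are all hypothetical values chosen only to show the arithmetic, and the function name `poststratify` is made up for this illustration.

```python
def poststratify(sample_share, population_share, support):
    """Compare a raw estimate with one reweighted to population shares.

    Each argument maps a group name to a proportion between 0 and 1.
    """
    groups = list(sample_share)
    # Raw estimate: each group counts according to its share of the sample.
    raw = sum(sample_share[g] * support[g] for g in groups)
    # Adjusted estimate: each group counts according to its population share.
    adjusted = sum(population_share[g] * support[g] for g in groups)
    # Weight per respondent in each group (above 1 means underrepresented).
    weights = {g: population_share[g] / sample_share[g] for g in groups}
    return raw, adjusted, weights


# HYPOTHETICAL numbers, not taken from Pulse Asia, Comelec or Virola.
sample_share = {"18-41": 0.40, "42-57": 0.35, "58+": 0.25}
population_share = {"18-41": 0.55, "42-57": 0.28, "58+": 0.17}
support_for_candidate = {"18-41": 0.45, "42-57": 0.60, "58+": 0.65}

raw, adjusted, weights = poststratify(
    sample_share, population_share, support_for_candidate
)
print(f"raw estimate:      {raw:.1%}")
print(f"adjusted estimate: {adjusted:.1%}")
print("weights:", {g: round(w, 3) for g, w in weights.items()})
```

In this made-up case, giving more weight to the underrepresented younger group lowers the candidate’s estimated share, because that group is assumed to support the candidate less. The size and direction of any real adjustment depend entirely on the actual shares and on the support assumed within each group.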

Since the numbers barely moved from the March to the April 16-21, 2022, survey and Pulse Asia did not change its methodology from the first poll, “whatever the problem was from the very beginning was still there,” Virola told the Inquirer.

He admitted, however, that his computations were based on an “arbitrary” sharing of votes (60-40 in favor of Robredo) based on the assumption that there were relatively more Robredo supporters among the youth as well as those with higher educational and socioeconomic backgrounds.

Massive rallies

These assumptions, he said, were based partly on Google Trends data showing massive interest for Robredo. “Even though these are arbitrary metrics, I don’t think these are unreasonable given what is happening on the ground,” he said, referring to Robredo’s massive rallies.

His computations show that Marcos will still lead even after adjusting the nationwide count by socioeconomic class (53.7 percent versus 29.3 percent) and educational attainment (48.8 percent versus 31.2 percent).

However, adjusting the vote among those aged 18-41 and 42-57 shows Robredo taking over the lead narrowly with 40.4 percent to 39.6 percent.

Virola’s computations sought to fill the gaps in Pulse Asia’s sampling. But there are clashing opinions on whether over- or undersampling has significant implications for the research design.

Cayton said it could mean “some inherent deviation, at the very least.”

“If a group is under and overrepresented, the estimates tend to be a little more deviant in the way that it favors the overrepresented group than the underrepresented group,” he said. “If the deviation is very large, that might affect the outcomes in terms of whether it could be reliable and accurate.”

Cayton also tried ensemble methods that merged Pulse Asia survey data with Google Trends data, under the assumption that big data could also be a reliable metric of public sentiment.

His computations also bring Marcos and Robredo to a statistical tie. But he is also the first to admit that “there are a lot of heavy assumptions under this model.”

Men Sta. Ana, coordinator for the think tank Action for Economic Reforms, said the sampling used by Pulse Asia was “close to the true distribution,” especially since the demographic description of the respondents emerged only after conducting the random survey.

“Random variation is not a systematic bias. It just happens precisely because the result stems from randomness,” he said. A well-designed random survey “will result in a random variation that is insignificant,” he added.

Even without members from Classes A and B, who are notoriously difficult to interview and belong to the top 1 percent of households, in the mix, the variance would remain very small, Sta. Ana said.

Never compromised

Pulse Asia defended its methodology, which it had used for decades.

The margin of error for each SDG reflected the “variance for the SDG,” given its share of the total sample of the survey. It also corrected, “to a significant extent, what Dr. Virola finds as an under/oversampling of specific SDGs,” Pulse Asia president Ronald Holmes said in a statement.
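How a subgroup’s margin of error grows as its share of the sample shrinks can be shown with a rough sketch. It uses the textbook formula for a proportion under simple random sampling, which is an assumption made here for illustration; it is not Pulse Asia’s published procedure, and multistage designs usually carry additional design effects. The total sample size and subgroup share are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95 percent margin of error for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


# HYPOTHETICAL figures chosen only to illustrate the relationship.
total_n = 2400          # assumed total number of respondents
subgroup_share = 0.10   # assumed share of one sociodemographic group
subgroup_n = round(total_n * subgroup_share)

moe_total = margin_of_error(total_n)
moe_subgroup = margin_of_error(subgroup_n)

print(f"full sample (n={total_n}): +/- {moe_total:.1%}")
print(f"subgroup (n={subgroup_n}): +/- {moe_subgroup:.1%}")
```

Under these assumptions, a group that makes up a tenth of the sample has a margin of error roughly three times that of the full sample, which is why estimates for small groups are read with more caution.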

He rejected claims that Pulse Asia had been “bought” and its work compromised. Such doubts about scientific polls “only deepen polarization and distrust and contribute to the continued erosion of an already extremely feeble democratic order,” Holmes said.

“Those who make these unfair and unjust criticisms bear the responsibility for their baseless accusations feeding into the spiral of disinformation and malinformation that affects our society,” he said.

Based on location

Research companies, like Pulse Asia, use multistage probability sampling based on location. Thus, the data on socioeconomic classes come after the survey when respondents are grouped into classes.

According to Jose Ramon Albert, a senior research fellow at the state-owned think tank Philippine Institute for Development Studies (PIDS), it is impossible to sample households across socioeconomic and income groups as no one has a complete listing.

“Pulse Asia and SWS (Social Weather Stations) tables that list [socioeconomic status] are ‘afterthoughts’ from data collected in the survey, just as the tables on [PIDS] income groups. They are themselves data from the surveys,” Albert said.

Google appropriately warns that information available on its search trends page is not a substitute for polling data as users may want to know more about a party or politician for any number of reasons, without intending to vote for them.

Surveys and online trends reflect public preference and interest at the time the survey was taken or the data were collected, but people can and do change their minds up to the day of the elections.

After the 2016 presidential elections, the exit polls of SWS showed that voters decided on their choice for president a little later: 18 percent of those interviewed said they made their choice only on Election Day itself; another 15 percent decided only during the period May 1 to 8; 12 percent made their decision in April; 8 percent in March; and 46 percent in February or earlier.

WITH A REPORT FROM INQUIRER RESEARCH
