Issue 001 -- AI Content Pollution in Search
On Saturdays/weekends, I usually go out to play badminton. After playing, I sometimes have dinner with friends. The photo above is from one of those dinners - Wushan Paper-Wrapped Fish 😋. Unfortunately, I injured my foot playing badminton on Friday, so I'll be resting this weekend.
>> Topics to Discuss
Recently, a hot topic has emerged: discovering AI-generated content cited within AI search engines.
What problem does this cause? Simply put, it leads to "misinformation propagation and hallucination overlap."
Imagine this: AI-generated answers still have hallucinations. The emergence of RAG (Retrieval Augmented Generation) aims to avoid these hallucinations by referencing correct sources. However, if RAG references AI-generated content and then the AI reinterprets it, the hallucination becomes even more severe, making the content unusable and hard to trust.
Moreover, some users have discovered that searching site:doubao.com
on Google yields over tens of millions of results!
Of course, it seems that Doubao has since made corrections, and the excessive SEO optimizations have been taken down.
However, I believe this is something that search engines like Google should handle. Here are some of my questions:
- Did Google not notice this? Can't they customize a strategy?
- Is it that they can't distinguish AI content and don't know what to do?
- Or is the AI-generated content still valuable? For example, users can directly search for corresponding AI answers without consuming tokens and phrasing questions.
Even if Doubao has corrected it, what about other AI sites, especially those personal AI content sites? There are many examples of people doing anything for traffic. When I use a search engine, I naturally want to find content I need. How to filter and whether it can be filtered should be the job of the search engine.
Just like how many sites used to stuff unrelated keywords to increase search hits and were then eliminated by PageRank, which contributed to Google's success.
Furthermore, if you create a search engine now that can use some strategy to distinguish AI content, wouldn't it attract many dissatisfied users? After all, some people will always prefer non-AI-written content, given that AI content still has hallucinations.
>> Must Read
Google Meet Introduces Adaptive Audio
A very useful feature! When multiple computers join a meeting in the same room, everyone can be heard clearly without awkward echoes and audio feedback.
GPT-4-turbo vs GPT-4o
If I only use text and don't focus on multimodal, which one performs better? There is a discussion on the official forum, and several top-rated answers suggest that GPT-4-turbo seems to perform better.
that GPT-4-turbo is a lot better than GPT-4o.
GPT 4 turbo is much better for step by step tasks.
Here is also a detailed blog comparing their performances:
- Data Extraction: GPT-4o shows better performance than GPT-4 Turbo but still lacks accuracy in complex tasks.
- Classification: GPT-4o has the highest precision, making it the best choice to avoid false positives. GPT-4 Turbo shows lower accuracy.
- Conversational Reasoning: GPT-4o has made significant improvements in some reasoning tasks but still needs improvement. GPT-4 Turbo finds these tasks more challenging.
- Latency: GPT-4o has lower latency, responding faster than GPT-4 Turbo.
- Throughput: GPT-4o has a throughput of 109 tokens per second compared to GPT-4 Turbo's 20 tokens per second.
While GPT-4o is the clear winner in quality and latency, it may not be the best model for every task.
>> Useful Tools
Git Command Cheat Sheet
Still not familiar with Git commands? Quickly save this Git command cheat sheet!
Full-Text Search Engine Library Tantivy
A full-text search engine library, an alternative to ElasticSearch, written in Rust, supports Chinese, delivers blazing performance, supports REST API, and can be called from Python, open-sourced under the MIT license.
Gradient Animation Generator
A great choice for creating background for a website homepage, with impressive effects and various configurable parameters:
Terminal Text Effects Library
This Python-written terminal text effects library gained nearly a thousand points on HN in just a few hours. It's super cool!
Logo Generator
The FAV0 weekly logo was generated using this tool website. No fancy stuff, just like the logos of X and DEV communities, black background with white text. High recognition is the first element of logo design!
Manage Fonts via npm
A must-have tool for front-end developers! A great font website with thousands of open-source fonts, which can be directly installed and version-controlled via npm.
Hand-Drawn Style UI Component Library
A hand-drawn style UI component library that can be directly used in different frameworks such as React, Vue, Svelte, or HTML. Perfect for creating some fun pages!
HTTPS Related Learning Materials
ChatTTS
- Conversational TTS: ChatTTS is optimized for conversational tasks, achieving natural and smooth speech synthesis while supporting multiple speakers.
- Fine-Grained Control: The model can predict and control fine-grained prosodic features, including laughter, pauses, and filler words.
- Better Prosody: ChatTTS surpasses most open-source TTS models in prosody. Pre-trained models are also available for further research.
In summary, it's impressive and open-source!
>> Interesting Finds
What is AGI
Reddit users provided a bunch of interesting answers:
How to Learn Programming Well?
Relying on CV (Copy-Paste) engineering has always been a popular joke among programmers. To address this question, one user provided this image:
Why Can't Hou Yi Shoot on Weekdays?
Because once he shoots, he'll lose his job.
Cloudflare Bug
Recently, while purchasing a domain on CF, I saw this page where the data wasn't rendered:
>> Worth Reading
Reading Programming Languages is Not the Same as Reading Languages
This article studies how, when we program, the part of the brain activated is not the area for language processing but the multiple demand network related to mathematics. However, it does not precisely replicate the cognitive demands of mathematics.
Language is just a tool. Once we learn how to express it, we focus more on the logic behind the language.
16 Practices for Good Web Design
Why, After 6 Years, I'm Over GraphQL
GraphQL is very flexible: it allows "schema queries," where a single interface can return data with different fields, more or less.
- How to control permissions for each field?
- How to rate-limit each interface, as the data returned by a single interface can vary greatly?
- When field resolvers hit external data sources: N+1 problem...
Redirecting APIs from HTTP to HTTPS Seems Wrong
It should be completely disabled, quickly fail, and return an error code like 403. Interestingly, OpenAI seems to have changed their interface strategy based on this article.