An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates. Experts say the problem is bigger than that.
Ah yes, Mr. Professor, mind telling us how you came to this conclusion?
To me you come off like an early 1900s fearmonger à la “There will never be a flying machine, humans aren’t meant to be in the sky and it’s physically impossible”.
If you literally meant that there is no such thing yet, then sure, we haven’t reached AGI yet. But the rest of your sentence is very disingenuous toward the thousands of scientists and developers working on precisely these issues and also extremely ignorant of current developments.
I pasted a 1k-line C++ file into Gemini, along with a screenshot and a trace log, and asked it to find the bug. It reasoned for about 5 minutes. Extract of the solution:
There has to be an issue in this sequence for i=21:
1. V_to_next_xy (norm) is (5.00004113497265e-08, -0.999999999999999, 0). (Correct)
2. Unnormalized perpendicular P_un = Vec3(V_to_next_xy.y(), -V_to_next_xy.x(), 0.0) should be (-0.999999999999999, -5.00004113497265e-08, 0).
3. temp_bisector becomes P_un.
4. temp_bisector.normalize() is called.
5. The result of this normalization is logged as temp_bisector (result of collinear path): (-1.78813934326169e-07, -0.999999999999984, 0).
This step (5) is where the numerical error occurs. The vector (-0.999…, -5e-8, 0) when normalized should not become (-1.78e-7, -0.999…, 0).
The latter is almost parallel to (0, -1, 0), while the former is almost parallel to (-1, 0, 0).
It’s as if temp_bisector.normalize() for the input (-0.999…, -5e-8, 0) is producing something wildly incorrect, or the vector being normalized was actually (-1.78e-7 * L, -0.999… * L, 0) for some L.
What if temp_bisector = Vec3(V_to_next_xy.y(), -V_to_next_xy.x(), 0.0); itself has precision issues?
If V_to_next_xy.y() is -0.999999999999999 (15 nines), this is -(1.0 - 1e-15).
If V_to_next_xy.x() is 5.00004113497265e-08.
Vec3 tmp = Vec3(-(1.0-1e-15), -5.00004113497265e-08, 0.0).
This should be fine.
The most plausible candidate is Vec3::normalize() failing on this specific input vector (-0.999999999999999, -5.00004113497265e-08, 0).
Perhaps length() computes sqrt(x*x + y*y + z*z).
x*x = (-(1.0-1e-15))^2 = (1.0-1e-15)^2 = 1.0 - 2e-15 + 1e-30.
y*y = (-5e-8)^2 = 2.5e-15.
x*x + y*y = 1.0 - 2e-15 + 1e-30 + 2.5e-15 = 1.0 + 0.5e-15 + 1e-30.
length = sqrt(1.0 + 0.5e-15 + 1e-30). This is extremely close to 1.0.
If this length calculation has any error causing it to be, for example, abs(y) by mistake, then x/length would become x/abs(y) and y/length would become y/abs(y) = sign(y).
x/abs(y) = -1.0 / 5e-8 = -2e7. This is not what’s happening.
The logged value for temp_bisector (-1.78813934326169e-07, -0.999999999999984, 0) is suspicious because its X component -1.78813934326169e-07 is very close to the X component of V_to_prev_xy (norm) from loop i=21, which is -5.00004113497265e-08, divided by a factor. Or perhaps V_to_next_xy.x() itself was somehow used more directly than -V_to_next_xy.x() in the Y component of the vector passed to normalize.
The logged temp_bisector (result of collinear path) for i=21 is (-1.78813934326169e-07, -0.999999999999984, 0).
Consider Loop i=20: temp_bisector (result of collinear path): (1, 0, 0). Its input for perpendicular construction was V_to_next_xy (norm) = (X_v: 4.9999318885022e-08, Y_v: -0.999999999999999, Z_v: 0).
Perpendicular from this is (Y_v, -X_v, 0) = (-0.999…, -4.999…e-8, 0). Normalizing this gives (-1, -4.999…e-8, 0).
The log for i=20 says it becomes (1,0,0). This is also wrong. It should be (-1, -very_small_y, 0).
The fallback temp_bisector = Vec3(1,0,0) only happens if temp_bisector.normalize() (the one on the perpendicular) also fails.
It seems the normalize() call for the vector constructed as Vec3(V_to_next_xy.y(), -V_to_next_xy.x(), 0.0) is problematic when one component is very small.
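For anyone who wants to check the extract's arithmetic, here is a minimal sketch that rebuilds the i=21 step in plain double precision. The Vec3 struct and the helper names are hypothetical stand-ins, since the actual class from the 1k-line file isn't shown in the thread; only the input vector is taken from the log.

```cpp
#include <cmath>

// Hypothetical stand-in for the thread's Vec3; the real class is not shown.
struct Vec3 {
    double x, y, z;
};

// Perpendicular in the XY plane, as in the extract: (x, y, 0) -> (y, -x, 0).
Vec3 perpendicular_xy(const Vec3& v) {
    return {v.y, -v.x, 0.0};
}

// Plain double-precision normalization; returns the input unchanged
// if its computed length is zero.
Vec3 normalized(const Vec3& v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0) return v;
    return {v.x / len, v.y / len, v.z / len};
}

// Uses V_to_next_xy (norm) exactly as logged for i=21.
Vec3 expected_bisector_i21() {
    const Vec3 v_to_next{5.00004113497265e-08, -0.999999999999999, 0.0};
    return normalized(perpendicular_xy(v_to_next));
}
```

Under these assumptions the result stays almost parallel to (-1, 0, 0), which matches the extract's point that the logged (-1.78e-7, -0.999…, 0) cannot come from a correct normalization of that input.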
It correctly identified that sqrt(_v[0]*_v[0] + _v[1]*_v[1] + _v[2]*_v[2]) had too little precision and that using std::hypot(_v[0], _v[1], _v[2]) would likely solve it.
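To illustrate why std::hypot can be the safer choice, here is a rough sketch comparing the two length formulas. Vec3, length_naive, length_hypot and normalize are hypothetical names, not the code from the original file, and the underflow case shown is just one known failure mode of the naive formula, not necessarily the one that bit here. The three-argument std::hypot needs C++17.

```cpp
#include <cmath>

// Hypothetical stand-in for the thread's Vec3; the real class is not shown.
struct Vec3 {
    double x, y, z;
};

// Naive length: the squares can underflow to 0 (or overflow to inf)
// before the square root is taken.
double length_naive(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// std::hypot (three-argument overload, C++17) is specified to avoid
// undue overflow or underflow in intermediate steps.
double length_hypot(const Vec3& v) {
    return std::hypot(v.x, v.y, v.z);
}

// Normalizes in place; returns false if the length is zero or not finite,
// so the caller can decide on a fallback direction.
bool normalize(Vec3& v) {
    const double len = length_hypot(v);
    if (len == 0.0 || !std::isfinite(len)) return false;
    v.x /= len;
    v.y /= len;
    v.z /= len;
    return true;
}
```

With a vector like (3e-200, 4e-200, 0), the naive version sees a length of zero because both squares underflow, while the hypot version still normalizes it to roughly (0.6, 0.8, 0). Returning a bool instead of silently falling back is a design choice of this sketch; it makes a failed normalization visible in a trace log.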
If this is just autocomplete, then I agree that it’s a pretty fancy one.